1findr, Dimensions and other free mega-indexes - a review of the space and a size comparison
Recently, someone reached out and asked if I had a list of academic libraries that chose not to have their own branded discovery service. I think for most reasonably sized and resourced university libraries the answer is not many. Utrecht is a well-known exception, but the question remains: could you choose not to follow the crowd and forgo your own discovery service? One of the main arguments against following Utrecht's lead of "delivery not discovery", which Utrecht themselves pointed out, was that without your own discovery service (or rather a leased discovery index) you would be at the mercy of Google and its uncertain business model.
But with more free big discovery indexes entering the field, including Microsoft Academic, Digital Science's Dimensions and 1Science's 1Findr, free aggregators like BASE and CORE, and others like Lens.org and Scilit built around Crossref metadata, is this still such a big concern?
Similarly, when web scale discovery services and Google Scholar started to be adopted, commentators wondered if the days of big expensive A&Is like Scopus and Web of Science were numbered, as they did not bring in actual content. After all, web scale discovery indexed more items (and was likely to grow as more publishers joined in) than Scopus and Web of Science, and it seemed a waste of resources to keep A&Is just for the metadata. That is of course notwithstanding the fact that the quality of metadata was far superior in A&Is, particularly subject-specific ones (see for example Christina Pikas's rants on "pile of crap" and this defense of A&Is).
The need for citation metrics
Since then, while some libraries have cancelled one or the other, web scale discovery did not move aggressively towards citation metrics (only recently did Primo make a small move with the citation trails feature, but this is meant for browsing rather than bibliometrics) and instead relied on existing A&I content for that, which took the wind out of that argument: at the very least, users still needed Scopus and Web of Science for bibliometrics. Google Scholar did have some bibliometrics (e.g. Google Scholar Metrics), but for individuals you often had to use scraping tools like Harzing's Publish or Perish (which, by the way, also works with Microsoft Academic, Scopus, Web of Science and Crossref via APIs) or browser extensions like Scholarometer to get at the data, and it was really unusable for all but small-scale individual citation counting.
In fact, Elsevier and Clarivate have doubled down on this desire for bibliometrics on a large scale (i.e. by institution) by launching SciVal and InCites, which require Scopus and Web of Science respectively.
A possible disruption opportunity?
Given this, it seems to me that there is some room for disruption here. Imagine an index that
a) had a large index of content (say anything above 50-60 million articles, and perhaps 100 million+ if including other content types) and did full-text searching,
b) had its own robust bibliometrics and citation metrics, and
c) allowed easy access to the data (e.g. mass export features, APIs).
I could see such a service challenging comprehensive A&Is on the one hand and web scale discovery services on the other.
I would say that until recently the closest candidate for this was Microsoft Academic, which combines technology akin to Google Scholar's with a more traditional search UI (lots of facets) and even has a graph API. Still, given this is from the giant Microsoft, there have been few attempts to court libraries, as it doesn't make sense for Microsoft to enter this tiny market.
But could the emergence of Digital Science's Dimensions (see recent reviews here and here) and even more recently 1Science's 1Findr be this challenge?

1Science 1Findr with 89 million articles

Digital Science Dimensions with 93 million items
Intriguingly, both services provide a freemium tier that serves the discovery needs of researchers well and has features on par with Google Scholar.

Dimensions offers 3 versions - Dimensions (free), Dimensions Plus and Dimensions Analytics. Possibly Dimensions Plus matches up with Scopus, and Dimensions Analytics with SciVal and/or bits of PURE?
Of the two, the freemium version of Dimensions currently offers more features. The free version of Dimensions lets you create a free account to mass export up to 500 records, which is pretty generous for a free product and sufficient for most individual researchers, while currently I don't see any mass or batch export functions in 1Findr.

Export up to 500 free records in free version of Dimensions
The Dimensions UI offers more facets (journal/source titles and the ability to filter to a set of reputable journals) than 1Findr. I also understand that compared to 1Findr, Dimensions goes beyond just articles and includes monographs, but neither covers as many material types as typical web scale discovery indexes like EDS/Summon/Primo/Worldcat, which include ebooks, newspapers, video platforms, image archives and more.
Dimensions also offers a free but limited Dimensions Metrics API (see guide), though with conditions.

Dimensions Metrics API Documentation
Given a DOI, the free API will produce citation metrics like times cited, Relative Citation Ratio and Field Citation Ratio. You can even get an indicator showing highly cited articles for articles in the top 1%, 5% or 10% by Field Citation Ratio.
Though you can't get the metadata of the article and I'm not sure what the limits of the API are, this is still quite a development, as you now have a new source of article-level citation metrics that can be leveraged (subject to conditions) in a relatively systematic way for free.
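To make this concrete, here is a minimal sketch of what calling the free metrics API might look like in Python. The endpoint URL and the response field names are my assumptions based on the documentation mentioned above, so check the official Dimensions Metrics API docs before relying on them.

```python
import json
import urllib.request

# Assumed endpoint shape for the free Dimensions Metrics API (verify in docs).
METRICS_URL = "https://metrics-api.dimensions.ai/doi/{doi}"

def fetch_metrics(doi):
    """Fetch the citation-metric record for a single DOI (network call)."""
    with urllib.request.urlopen(METRICS_URL.format(doi=doi)) as resp:
        return json.load(resp)

def summarise(metrics):
    """Pull out the headline metrics discussed above; field names assumed."""
    return {
        "times_cited": metrics.get("times_cited"),
        "relative_citation_ratio": metrics.get("relative_citation_ratio"),
        "field_citation_ratio": metrics.get("field_citation_ratio"),
    }
```

Since the API only returns metrics and not article metadata, you would still need another source (e.g. Crossref) to resolve the DOI into a title and author list.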
I suspect, though, that both freemium versions of Dimensions and 1Findr have been limited to around the level of functionality of Google Scholar and Microsoft Academic, plus a bit more. They will be adequate for article searching and perhaps some simple citation work by individual researchers. Features like mass and bulk downloading will remain reserved for paid accounts.
Both free versions (but available in paid versions) also do not have the ability to filter by institution or affiliation (though there appears to be a possible workaround in dimensions here - though not reliable). As a sidenote, for free versions this functionality seems easily possible only in Microsoft Academic (example here) and Lens (example here).
Currently both Dimensions and 1Findr also reserve the ability to integrate with institutional holdings for the paid version (which makes sense given this costs money to maintain). This means that in the final analysis they may be unlikely to lure users away from Google Scholar (which provides this feature), despite some better filtering functions, but this might not be their intention anyway.
The impact of Open on mega-indexes
This new generation of mega-indexes also differs from earlier ones in that finding open access, or at least free-to-read, articles is baked in by default. Both Dimensions and 1Findr try to show open access or at least free-to-read versions of articles when available.
Dimensions uses Unpaywall to detect these articles, while 1Findr, I believe, uses its own crawler (in fact, 1Science began primarily as a company built around the need for accurate data on open access uptake).
As I mentioned in the past and in recent conference presentations at the CNI Spring Membership Meeting and Computers in Libraries 2018, existing A&Is like Scopus and Web of Science, as well as web scale discovery services like Primo and Summon, have begun to do the same with varying degrees of reliability. See the table in the next section for comparisons.

Primo OA indicator in Primo May 2018 release
The declining value of article metadata indexes
One of the more interesting developments in the discovery space in recent times, I think, is how metadata for articles and conference papers has become almost free and easy to obtain.
Large sources of article metadata from big free OA aggregators like BASE and CORE, plus Crossref, SHARE, Pubmed etc., mean that creating a big mega-index of articles isn't as hard as in the past.
Dimensions itself is stated to use Crossref metadata as a base, supplemented by full-text processing from publisher partners.
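As a rough illustration of how cheap article metadata has become, the sketch below builds a harvesting request against the public Crossref REST API and trims each returned work down to the handful of fields a minimal index needs. The helper function names are my own invention; the API parameters (filter, rows, cursor, mailto) are real Crossref ones, but check the Crossref REST API documentation for current limits.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"

def harvest_url(from_date, rows=1000, cursor="*", mailto="you@example.org"):
    """Build a Crossref request for records indexed on or after from_date.

    cursor="*" starts deep paging over the whole result set; supplying a
    mailto address puts the harvester in Crossref's "polite" pool.
    """
    params = {
        "filter": "from-index-date:" + from_date,
        "rows": rows,
        "cursor": cursor,
        "mailto": mailto,
    }
    return CROSSREF_WORKS + "?" + urlencode(params)

def extract_core_fields(item):
    """Reduce one Crossref work record to the fields a bare-bones index needs."""
    issued = (item.get("issued", {}).get("date-parts") or [[None]])[0]
    return {
        "doi": item.get("DOI"),
        "title": (item.get("title") or [""])[0],
        "type": item.get("type"),
        "year": issued[0],
    }
```

A daily crawler in the style of Scilit would simply run this with yesterday's date and follow the next-cursor value returned in each response.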
An example is Lens.org, which I became aware of recently.

Lens looks like a very ambitious undertaking.
"The Lens is building an open platform for Innovation Cartography. Specifically, the Lens serves nearly all of the patent documents in the world as open, annotatable digital public goods that are integrated with scholarly and technical literature along with regulatory and business data...
Within the next two years, we expect to host over 95% of the world's patent information and link to most of the scholarly literature, creating open public innovation portfolios of individuals and institutions. Using all open source components, we are working to create open schemas by which patent documents can be used to teach and communicate, rather than confuse and intimidate.
Underlying data and analytics will be available to the public with APIs...." - about lens
The undertaking seems to be patent-oriented, but what is interesting is where all the datasets come from.

By combining datasets from Crossref, Pubmed and Microsoft Academic, they have created a pretty comprehensive index of articles.
Another interesting example I was made aware of recently is Scilit, another index based on extracting Pubmed and Crossref data.

"Scilit is a comprehensive, free database for scientists using a new method to collate data and indexing scientific material. Our crawlers extract the latest data from CrossRef and PubMed on a daily basis. This means that newly published articles are added to Scilit immediately."
I've also written quite a bit in the past about large global open aggregators like BASE and CORE. It might be strange to include such aggregators in this comparison, but they have grown to really staggering sizes.
While their mandate is to harvest and crawl only open access repositories and sources, the nature of such sources means they also include massive amounts of non-free, metadata-only records, and both BASE and CORE have grown to over 120 million metadata records (about 40 million for article-like content). It's unclear to me whether this systematically includes the published literature in, say, Web of Science, but it is surely closing in (see the size comparison below).

Other examples of search services that rely on free metadata from Crossref include ScienceOpen.
All these developments seem to indicate that the value of a discovery index, particularly for article-type content, is diminishing. The citation metrics in Scopus and Web of Science still have value, though developments like open citations are slowly chipping away at this.

50% of articles in Crossref have open citations
Final thoughts and comparisons
The explosion in free mega-indexes of articles means that libraries that want to forgo web scale discovery no longer need to rely only on Google Scholar, as many (arguably slightly weaker) alternatives exist.
Libraries that want to control their own discovery fate and blend in only journal articles might potentially have a fifth option with the paid versions of the Dimensions or 1Findr APIs.
But this option only extends to journal articles and perhaps monographs, while web scale discovery goes beyond them to newspapers, magazines, ebooks, image archives, online streaming video and more, so web scale discovery might not be under threat until or unless these two tools add more content types.
In fact, 1Findr is currently used in a complementary way with discovery services like Primo, or as a service to look for free-to-read articles when there is no subscription (similar to Unpaywall).
Most intriguing would be if a consortium tried to build an open source discovery index around metadata from Crossref, perhaps together with open repository networks like SHARE or others (particularly if COAR's Next Generation Repositories recommendations bear fruit) and progress on open citations.
All in all, more competition from both the new commercial services and the free open alternatives may put some pressure on existing providers of comprehensive mega-indexes such as Scopus and Web of Science.
Size comparisons of these mega-indexes
Below is my own attempt to compare the sizes of these indexes. Fortunately, most of them have an interface that makes it easy to determine the size of the index.
Some allow you to do a "blank search" to get all results and then slice by facets (Summon, Dimensions, Lens), others let you search for everything with an asterisk (1Findr) or allow an advanced search without typing in search terms (Scopus, WOS).
The table below is inspired by "Coverage and Overlap of 1findr, Dimensions, Scopus, and Web of Science" but with slightly different settings and a few more comparisons.

I would have loved to add Google Scholar to the comparison, but all methods of getting at its size, such as running "illogical boolean searches" like -site:fdfdgdfgs to get back all results or filtering by date range in advanced search, gave obviously wrong results.
I also couldn't figure out a way to use the Microsoft Academic search interface to reliably get at the data but Lens includes Microsoft Academic data so it provides a rough proxy. See also "Who Has All the Content?"
Do take all these results with a big pinch of salt. I've tried to restrict their scope to cover journal articles and conference paper like items but obviously this isn't an exact science. Indexes may also differ in their ability to dedupe and the ones best at deduping will show a lower figure.
Still, one suspects that when the number of entries in your results equals or exceeds that in Web of Science or Scopus, you are probably covering most of what your researchers care about, so further size may have diminishing returns.
A somewhat more interesting comparison that until recently wasn't even a factor is the ability to detect free to read or Open Access (OA) articles.
Possibly, the differences in %OA does signal a little about the ability or at least the choices of these tools to find and link to OA or free to read articles.
Scopus has the lowest OA% because it finds only Gold Journals. Summon (plus Primo which will get the OA indicator next month) clearly also needs some work for OA detection.
Web of Science and Dimensions both use Unpaywall for OA detection, but I understand Web of Science only shows links for a subset of results (acceptedVersion and publishedVersion but not submittedVersion), while Dimensions shows all versions; hence, as expected, Dimensions shows a higher %.
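The version distinction is worth sketching. Assuming the Unpaywall v2 record format, where each oa_location carries a version field, the difference between the two policies comes down to a simple filter; this is my own illustration of the idea, not either vendor's actual code.

```python
def oa_links(record, allowed_versions=("acceptedVersion", "publishedVersion")):
    """Return OA URLs from an Unpaywall-style record, filtered by version.

    The default mimics the stricter policy (no preprints/submittedVersion);
    pass allowed_versions=None to accept every version.
    """
    locations = record.get("oa_locations") or []
    if allowed_versions is None:
        return [loc["url"] for loc in locations]
    return [loc["url"] for loc in locations
            if loc.get("version") in allowed_versions]
```

With the same Unpaywall data, the permissive policy will always report an OA percentage at least as high as the strict one, which is consistent with the gap observed between the two services.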
Currently 1Findr seems to be the best at finding free to read articles in my quick and dirty comparison.
Still, all in all, OA detection, particularly in institutional repositories, remains challenging. I always try these new indexes that have OA detection against my own test set, and they all fail, with the exception of BASE, which I took some effort to optimize for.

