Lens.org - detailed review of a new open discovery and citation index

I first read about Lens.org via a tweet on my way back from a conference in April 2018. There seemed to be something in the water at the time, as there was an explosion of new discovery services and indexes in the past few months, including Digital Science's Dimensions, 1Science's 1Findr, Scilit and the newly resurgent challenge to Google Scholar posed by Microsoft Academic.
At the time, I took a quick look at Lens's about page to get a better sense of what the data included.

I immediately noted that Lens had a strong focus on patent searching (its predecessor Patent Lens was built in 1999 to "render the global patent system more transparent"). Article search was the area I was most interested in, so I perhaps prematurely dismissed Lens as mainly Microsoft Academic Graph (MAG) + patent searching.
However, I have come to realise that Lens might in fact be a far more exciting development than I thought.
While it is true that the scholarly search portion of Lens is perhaps dominated by the voluminous data from Microsoft Academic Graph (MAG), Lens is far more than the sum of its parts, combining open data from half a dozen open data sources.
The other significant thing about Lens that differentiates it from the other search, discovery and citation indexes is that it is run by Cambia, a non-profit that seems committed to producing an open, free-to-use alternative to commercially owned and licensed indexes.
Most importantly, Lens is not just open data; there is a surprisingly feature-rich interface that goes with it that can go toe to toe with many commercial interfaces. Given that the tool is still very new (scholarly citations from Microsoft Academic Graph were added only 2 weeks ago), this is pretty impressive, and the speed of updates so far makes me wonder how far it will go, though of course it is still early days.
This is going to be a detailed review in two parts. The first part will focus on the features offered by Lens, with a focus on the Scholarly search portion of Lens.
This will be followed by a simple test of the data in Lens by extracting items with a given affiliation and comparing the overlap with data from Scopus/Scival.
A detailed analysis of the results will be provided in my next post.
Introduction to Lens
Lens is supported by Queensland University of Technology and Cambia, which is self-described as "a non-profit social enterprise that for over two decades has been dedicated to democratizing problem solving using science and technology."
I had never heard of Cambia until now, but I understand they have been in the patent search space (starting from life sciences before expanding to other patents) since the 90s. With funding from organizations like the Rockefeller Foundation and the Bill & Melinda Gates Foundation, they have slowly expanded the capabilities of Patent Lens, pooling data from various patent offices around the world and adding innovative features like Patent Sequence Search. I'm probably not doing full justice to the history of Cambia and Lens; please read the timeline for more details.
Lens makes the following pledge.

Open, free, privacy - pledges by Lens
Privacy conscious, free (no premium options) and data that is open? Sounds too good to be true. The librarian in me wonders about sustainability and business models, but as this post focuses on the features of Lens, I shall refrain from commenting further.
Understanding the Lens Record
The most interesting aspect of Lens to me, besides the patent-related functionality, is how it blends together data from various open data sources, including Microsoft Academic Graph, Crossref, PubMed, CORE and Unpaywall.
This is very similar in concept to OpenAIRE's DOIBoost project, which aims to enhance Crossref metadata with MAG, ORCID and Unpaywall data, but besides the difference in sources used, Lens has a full infrastructure around it and allows anyone to query and use it for free without any technical expertise.
The idea here is that by combining all these open sources, one can cover the gaps in any single data source to provide a super "meta" record that is better than any individual source.
Below is one particularly complete record from the search results.

A particularly complete record in Lens search engine result page
Besides Unpaywall's links to open copies, the latest version of Lens even tries to use the OCLC WorldCat registry to redirect you to your institution's resolver to access items via institutional subscriptions.
Unfortunately, my hunch is that not many libraries maintain their OpenURL settings in the WorldCat registry, so this might not work for many people. Also, the release notes state that this feature currently works only for records with a DOI, though this might change in future versions.
Most of the tags listed, such as "open access", "patents", "affiliation", "Field of study" and "Clinical trial", reflect data from the various sources, and I can hazard a guess as to where most of them come from.
In case you are wondering, the "Full text" tag indicates that Lens has the full text of this paper indexed, probably via CORE, possibly the largest open repository of scholarly full text.
Clicking into the record, you will see this impressive full record.

A particularly complete Lens record - Full record view
On the top right you can see various PIDs (persistent IDs) from Crossref, Microsoft Academic, Pubmed and Pubmed Central, as well as a PID created by Lens itself, so you can see which sources the combined record uses and click through to resolve the PIDs.

PIDs in the Lens record
To learn more about the Lens ID, refer to this help file, but it is essentially "15 digits long, broken in five groups where every 3rd digit is separated from the next with a dash (-), followed by a check digit".
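For illustration, here is a minimal Python sketch of handling Lens IDs, based only on the description quoted above: it checks the 3-3-3-3-3 digit layout and strips the dashes (handy later when a field expects the bare digits). The example ID is made up, and since the quoted help text does not spell out the check-digit algorithm, the check digit is not actually validated here.

```python
import re

# Shape of a Lens ID per the quoted help text: five groups of three digits
# separated by dashes, with the last digit acting as a check digit.
# (Pattern check only - the check-digit algorithm itself is not described
# in the quoted snippet, so it is not validated.)
LENS_ID_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{3}-\d{3}-\d{3}$")

def is_well_formed_lens_id(lens_id: str) -> bool:
    """Return True if the string matches the documented 3-3-3-3-3 digit layout."""
    return bool(LENS_ID_PATTERN.match(lens_id))

def strip_hyphens(lens_id: str) -> str:
    """Remove the dashes - useful when a search field expects the bare digits."""
    return lens_id.replace("-", "")

# Hypothetical example ID, for illustration only:
print(is_well_formed_lens_id("012-345-678-901-234"))  # True
print(strip_hyphens("012-345-678-901-234"))           # 012345678901234
```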
Further down the record, you get more information, in particular citation counts from scholarly material and patents, the references in the paper, funding info, links to the Open Access version and more.

Citation metrics in Lens full record
In many ways this combined record compares well against commercial A&I records from Scopus, Web of Science and Dimensions.

Lens Full record is comparable to commercial alternatives
Of course, not all records will be that complete; it all hinges on the completeness of the various sources Lens draws from.
Lens as a discovery tool

Lens default search box offers Patent (default) and article search
On the Lens homepage you can choose to search for patents or for articles (by clicking on the "Scholar" button). If you just enter your query and hit enter, patents will be the default search.

Use the drop down arrow to do a structured (advanced field) search
Perhaps of interest to power users will be the "Structured Search" option; this is effectively the advanced search. The "classification viewer" option is for patent search.

Lens - structured field search for articles
The structured search will have a toggle to switch between Scholarly articles and patents.
Looking at the fields for Scholarly search can be awe-inspiring; as you can see, there is a wide variety of possible fields to search, all thanks to the various data sources Lens pulls from.

Some fields you can use in structured field search
You get the usual title, abstract, author, keywords, pub type etc. fields.
Significantly, you get "Author affiliation Name", so you can search by institution, something you can't do easily with most free services (the exception is Microsoft Academic, though there you can't filter directly but need to use the semantic search: type and select from the drop-down). I'll talk more about this later.

Restricting to affiliations in Microsoft Academic
You also get less commonly seen fields related to:
Controlled vocabulary fields like MeSH (from PubMed), Field of Study (from Microsoft Academic)
Funding (e.g. Funding Name, Funding Country, Funding ID etc.)
Clinical trials (e.g. Clinical Trial Registry, Clinical Trial ID)
Chemical substances (e.g. Chemical Registry Name)
References (by Reference ID and by Reference Lens ID)
Sources (e.g. Source Country, Publisher, ISSN, Source Subject etc.)
Conference (e.g. name, location, year etc.)
Open Access (e.g. Color, License)
I'm not too familiar with some of these fields, but it's obvious that some, like the Open Access data, are drawn from Unpaywall; a lot of the STEM-related fields are drawn from PubMed; sources are probably from Crossref, etc., while the more common ones are merged from more than one source.
In a way, the fact that it has a lot of filters is also a weakness. It isn't easy to tell how completely populated the fields are. For example, where is the funding data drawn from? How complete is it? Does every record have an author affiliation field? That said, there are flags you can use to restrict your search.

Flags in Lens
For example, you could restrict your search to only records with funding data, to open access records, or even to only records for which Lens has the full text indexed.
Boolean, saved queries, collections and search history
Lens has full-blown support for Boolean operators, which together with comprehensive filters, search history, saved queries and saved collections (comparable to the "saved lists" feature in many databases) means Lens is feature-rich enough even for power users. But in some sense this may not be as relevant as it seems, as I wonder how many users will use Lens as a pure discovery tool when Google Scholar exists.

Search History in Lens
The shadow of Google Scholar looms large in discovery
I would rate the emergence of Google Scholar as a scholarly discovery tool as perhaps the most significant development in this field in the last 20 years. While it is by no means perfect and lacks some features wanted by power users, its existence and popularity are the reason why it is extremely difficult now to charge for a general purpose scholarly article discovery index. (My view is that Scopus and Web of Science mainly persist due to their importance as citation indexes, while other subject-specific A&I databases persist because of specific disciplinary features.)
This is reflected perhaps by the fact that one of the newest entries, Digital Science's Dimensions, provides a freemium discovery search that matches up with Google Scholar but adds the conventional filters that Google Scholar lacks and even throws in some visualizations of results.
Microsoft Academic is almost eerily similar as a discovery tool. (To be fair, both Dimensions and Microsoft Academic boast about using the latest machine learning and natural language processing techniques for entity recognition and topic modelling, but we are talking about features users can use, and it's arguable how much these techniques actually improve the relevancy of results compared to what Google Scholar is doing.) So it is no surprise that Lens as a scholarly discovery tool follows the same playbook when displaying results, showing not only results and filters but also a visualization panel on the right.

Search result page in lens with filters and visualization panel
Still, Lens is perhaps serviceable as a discovery service if for some reason you want a change of pace from Google Scholar and want a tool that is designed for power users.
The size of Lens vs other indexes
For sure the size of the index is not an issue; as of writing, Lens has over 194 million records, which would cover the vast majority of peer-reviewed material and scholarly items you would want to find.
For comparison, at the time of writing Microsoft Academic boasts 200 million records, which makes me wonder again how much this one source dominates Lens and whether the scholarly search of Lens is really just Microsoft Academic.
That said, searching Microsoft Academic and Lens with the same search keywords will give you very different results (far fewer in Microsoft Academic) because Microsoft Academic uses a "semantic search" rather than traditional keyword matching!
For further context on the sizes of databases, the latest research as of 2018 estimates the sizes of the following databases:
Google Scholar - 389 million
ProQuest (19 selected databases) - 279 million
EBSCOhost (375 selected databases) - 132 million
Bielefeld Academic Search Engine (BASE) - 117 million
Web of Science (10 databases including core collection) - 105 million
Scopus - 72 million
Web of science (core collection) - 67 million
So clearly Lens is up there with the biggest, except for the mighty Google Scholar of course.
You could use Lens as a discovery tool by typing keywords, but the fact that the default ordering is by scholarly cites gives you a slight cue that perhaps this isn't the main purpose.
Still, you can change the sorting to any of these:
Relevancy
Scholarly citations
Patent citations
Publication date
Source Subject
Source type
Visualization panel in Lens
Somewhat similar to Dimensions, you can see visualizations of the results set on the right of the screen. You can choose to open this as a full screen.

Full page visualization panel in Lens
By default you get institutions displayed as a logo grid as well as various visualizations of authors, journals, Field of Study, year of publication etc.
But you can visualize almost any of the other fields such as Funding, Open Access Color, Open Access License, Conference related fields, Citation type etc.
I was wondering what the "citation type" visualization does; it basically shows you the source the data comes from.
For instance, take this visualization of the result set from searching for items with my institution as author affiliation (filtered to 2013-2017).

Citation type in Lens
As you can see, the result set has 2,345 records, but magid (Microsoft Academic ID) alone contributes 2,329 records. Crossref (DOIs) are almost as numerous, and there are very few from PubMed (pmid) or PubMed Central (pmcid) sources.
This makes a lot of sense since my institution does not do research in the life sciences. It also shows there is a huge overlap between Crossref records and MAG, and in this case at least MAG "knows of" almost all the records in the result set, though no doubt merging Crossref and MAG might provide better quality data than using MAG alone.
Of course, the real selling point about Lens is that it shows you the citation counts of result sets at various levels up to institutions.
Let's talk about that.
Lens as a citation index and benchmarking tool
In many ways, tools like Scopus and Web of Science are valued today less for discovery per se than for the citation metrics they provide. I suspect many libraries subscribe to them for this very reason.
To be fair, Google Scholar also has metrics and you can find papers citing the paper you are looking at, but the functionality is limited, though admittedly the author profiles in Google Scholar have had good take-up rates.
What Google Scholar and most freemium services like Dimensions lack is the ability to compare by institutions and to export in large batches.
Many institutions pay Elsevier and Clarivate large sums for Scopus/Scival and/or Web of Science/Incites to access such features for benchmarking institutions. Research offices and libraries hunger for information on such benchmarks and measures of impact.
Can Lens do some of this for free? Surprisingly, yes.
Like any normal citation index, Lens allows you to find the scholarly citations of each paper by clicking on it. Until two weeks ago, this citation count included only open Crossref citations, which are known to be lacking due to hold-outs like Elsevier, IEEE and ACS, but now it includes citations from Microsoft Academic Graph, which of course makes Lens citations far more complete, though perhaps at the cost of some accuracy, as MAG data is mostly extracted automatically by bots.
Another piece of good news: Lens allows you to filter by affiliation, a feature usually not available in free versions. To do so, go to the structured search, select the Scholarly toggle and use the author affiliation field.

Using structured search to search for items with affiliation
For my institution this gets me almost 6k records. You can add a bunch of filters after that, but the main trick is that you can do the same visualization as before to study aspects of the output of your institution.

Visualization of institution output in Lens
For example you get handy stats for total cites from scholarly material and patents to the almost 6k publications of my institution. So for my result set of 5,933 records (a quick sanity check of these numbers is sketched after the list):
134 of them were cited by patents
This came from 382 patents
There were 496 total patent citations
3,962 records were cited by Scholarly items
This came from 75,226 Scholarly items
With total cites of 90,855
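As a quick sanity check, here are the kinds of simple rates one can derive from the figures reported above in a few lines of Python. The raw numbers are the ones from the list; the derived percentages and averages are my own arithmetic.

```python
# Quick sanity check of the figures reported above for the 5,933-record set.
total_records       = 5933
cited_by_patents    = 134
patent_citations    = 496
cited_by_scholarly  = 3962
scholarly_citations = 90855

print(f"Share cited by patents:    {cited_by_patents / total_records:.1%}")          # ~2.3%
print(f"Share cited by scholarly:  {cited_by_scholarly / total_records:.1%}")        # ~66.8%
print(f"Cites per record:          {scholarly_citations / total_records:.1f}")       # ~15.3
print(f"Cites per cited record:    {scholarly_citations / cited_by_scholarly:.1f}")  # ~22.9
```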
Citations from patents rather than from scholarly items are perhaps a different way of measuring the impact of articles than the usual citation counts. Elsevier's Scival has an "Economic impact" tab which shows this very metric. I will talk more about this in the next blog post.
Besides citation metrics, you can also get the same visualizations as before; in particular you can see how much of the output is open access, by license, by top journal titles, by collaborating institutions etc.
Saved queries and batch exports
But besides showing useful statistics, Lens would be less useful as a tool if one were unable to batch export the results for further study.
In fact, Lens is very capable here too. Once you have registered an account (possibly linked to LinkedIn or ORCID), you can select all the results in a search and add them to a collection.

Adding results to collections in Lens
Below shows some collections I have created. Some are for institutions for my comparisons and some are for authors.

My collections in Lens
Such collections can be set up to be shared publicly or privately and, most importantly, allow a batch export of up to 50k records in CSV, BibTeX, RIS or JSON formats.

Exporting up to 50k records in Lens
The last is a particularly generous feature, given that Scopus only allows 20k exports. This should be sufficient for most bibliometric analysis of even the largest institutions (in a batch or two).
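Once you have a CSV export in hand, a few lines of Python are enough to start poking at it. Below is a minimal sketch; the file name and column names ("Lens ID", "Title", "Publication Year", "Scholarly Citation Count") are assumptions for illustration rather than documented Lens field names, so check the header row of your own export and adjust accordingly.

```python
import pandas as pd

# Minimal sketch of working with a Lens collection export (CSV).
# NOTE: file name and column names are illustrative assumptions.
df = pd.read_csv("lens_export.csv")

# Basic descriptive stats on the exported set.
print(len(df), "records exported")
print(df["Publication Year"].value_counts().sort_index())  # output per year
print(df["Scholarly Citation Count"].describe())           # citation distribution

# Top 10 most-cited items in the collection.
top10 = df.sort_values("Scholarly Citation Count", ascending=False).head(10)
print(top10[["Lens ID", "Title", "Scholarly Citation Count"]])
```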
Exporting of references and citations
It is somewhat disappointing, though, that while you can export a bunch of fields, unlike Scopus or Web of Science you can't export the references, which can be useful for bibliometric analysis using tools like VOSviewer.
One workaround, which doesn't quite scale for large datasets, is to go to the full record of each item and click on the citations tab.
Then, lower down, select "XXX references" and click on "view articles in Scholar Search".

On the resulting search result page, add the results to a new collection. Rinse and repeat as required, then do a batch export. This also works if you select "XXX Scholarly Citations" instead.

Here's a tip: you can search directly for the references of an item in Lens by going to the structured search, choosing the field "Reference ID" and entering the Lens ID of the item (without hyphens).
Some preliminary thoughts on Lens as a citation benchmarking tool
Lens is clearly still in a state of evolution, so what follows is just a preliminary assessment.
Compared to expensive tools like Scival and Incites or Dimensions (for analytics), it lacks more sophisticated metrics like the Relative Citation Ratio (RCR), Field Weighted Citation Impact (FWCI) and even the simple H-index, though if you really wanted to, you could sort by citation count and manually get the H-index. I've found sorting by citations somewhat quirky and occasionally the sorting doesn't work properly, though reloading often fixes it.
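If you do export the data, deriving the H-index yourself is straightforward. Here is a minimal sketch, assuming you already have the per-paper citation counts (e.g. from a CSV export); the example counts are made up.

```python
def h_index(citation_counts):
    """H-index: the largest h such that h papers each have at least h citations."""
    counts = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts, for illustration only:
print(h_index([10, 8, 5, 4, 3, 2]))  # 4 (four papers each have at least 4 citations)
```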
Lens currently still lacks an API but I'm told this will be coming soon.
Of course, Scival and Incites have much better features for benchmarking than Lens (e.g. the ability to plot various institutions against each other), but to be fair Lens should be matched up against Scopus/Web of Science rather than Scival/Incites, and against those Lens is more than capable feature-wise.
That said, citation indexes need to be trusted to be accepted. As of now, there are only 2 citation indexes that are taken seriously - Scopus and Web of Science. Both of them are over a decade old and are used (if only partly) for many real-world decisions, in particular university rankings and faculty assessment.
Google Scholar is the third one often mentioned by faculty who like the higher citation counts, but IMHO it is unlikely to be seriously accepted in, say, university rankings because the data is not transparent and cannot be easily studied (lack of information, batch exports and an API).
Digital Science's Dimensions is my bet to be the third widely accepted citation index (because they have made inroads by working with institutions), though it will take years if not decades for this to occur.
But none of these are free; will Lens eventually make the grade as an accepted citation index?
How good is the data in Lens?
The elephant in the room, of course, is that I have been talking about the features of Lens without considering the quality of the data. Given that Microsoft Academic Graph is a big part of the data, one might suspect the quality relies heavily on that source. One also wonders how well the merging process works: for example, which source takes priority if there is a conflict? How does the mapping occur between the publication types listed by Crossref, PubMed and Microsoft Academic Graph?
I will cover a couple of tests I did in greater depth in my next blog post, but here are some preliminary results from a test comparing items in Lens with the author affiliation "Singapore Management University" against the output I obtained from Scival.

Output was matched first using DOI, followed by a fuzzy match of titles (a title match was accepted only if the publication years were within a year of each other).
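For the curious, here is a minimal sketch of that two-stage matching in Python. The file names, column names ("DOI", "Title", "Year") and the 0.9 title-similarity threshold are illustrative assumptions, not the exact script behind the figures below.

```python
import pandas as pd
from difflib import SequenceMatcher

# Sketch of the two-stage matching described above.
lens = pd.read_csv("lens_smu.csv")      # assumed export file names
scival = pd.read_csv("scival_smu.csv")

def norm(s):
    return str(s).strip().lower()

# Stage 1: exact match on (normalised) DOI, skipping rows with no DOI.
lens["doi_norm"] = lens["DOI"].map(norm)
scival["doi_norm"] = scival["DOI"].map(norm)
doi_matched = lens[lens["DOI"].notna()].merge(
    scival[scival["DOI"].notna()], on="doi_norm", suffixes=("_lens", "_scival"))

# Stage 2: fuzzy title match for the leftovers, accepted only if the
# publication years differ by at most one.
unmatched = lens[~lens["doi_norm"].isin(doi_matched["doi_norm"])]
title_matches = []
for _, l in unmatched.iterrows():
    for _, s in scival.iterrows():
        if abs(int(l["Year"]) - int(s["Year"])) <= 1 and \
           SequenceMatcher(None, norm(l["Title"]), norm(s["Title"])).ratio() >= 0.9:
            title_matches.append((l["Title"], s["Title"]))
            break

overlap = len(doi_matched) + len(title_matches)
print(f"Overlap: {overlap} of {len(lens)} Lens records ({overlap / len(lens):.1%})")
```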
As shown above, the overlap is fairly respectable (52.5%). A slight surprise was that the output of Lens for my institution (2,345) was only slightly bigger than Scival's (2,271).
An interesting study to compare against would be this one by Leiden University on "Accuracy of affiliation information in Microsoft Academic", given that Microsoft data is a big component of Lens, though that study compared against Web of Science rather than Scopus.
I then analysed the differences in results, which led to quite interesting insights, but that will have to wait for the next post.

