Unpaywall Journals, InstantILL, and the expansion of Semantic Scholar - new developments announced during Open Access Week 2019 that caught my eye.
When you have been in the library industry for a while, you start noticing patterns. For example, many developments intended to make a big splash (e.g. mergers, launches of new products and services) are often announced at big events like ALA Annual in June.
To a much smaller extent we are starting to see this as well during Open Access Week.
This year we see exciting developments by two non-profit organizations: Our Research (probably still better known by its former name, ImpactStory, and as the creators of Unpaywall) and Open Access Button.
While both started off with browser extensions that helped users find open access copies (see my blog post on developments around such browser extensions), they have thankfully started to diverge in the areas they are working on.
In October 2019, the following were announced or officially revealed:
Our Research announced a new tool - Unpaywall Journals, which "is a data dashboard that combines journal-level citations, downloads, Open Access statistics and more, to help librarians confidently manage their serials collections". To expand on this: using Unpaywall data and your usage stats, it projects your usage for the next five years and, together with parameters you set (rate of ILL, ILL cost, expected annual price increases, cost of subscription, etc.), predicts the net impact of cancelling subscriptions and relying on open access articles plus ILL.
Open Access Button
announced a new API with additional capabilities, in particular "api routes for accessing library subscriptions"
InstantILL was released in beta and "we’ll invite the over 300 libraries on our waiting list to test InstantILL’s simple, self-setup process to roll out the tool", with a new website demo.
announced a partnership with Montana State to make self-archiving simple with Shareyourpaper.org.
Towards the end of the week, yet another non-profit, The Allen Institute for Artificial Intelligence, announced that Semantic Scholar has been enhanced with data from Microsoft Academic Graph, leading to a big jump in coverage from 40 million to over 170 million records (for context, Scopus has around 70 million records).
This is exciting news. Semantic Scholar, first launched in 2015, has many interesting features such as citation sentiments (which tell you why a cite occurred - whether for background, methodology or results), but it used to cover mainly the Computer Science and Biomedical domains; it now supports most major disciplines, from Art to Sociology.
Incidentally, all three developments are by non-profits, and as such are either free (InstantILL and Semantic Scholar) or low-cost (Unpaywall Journals).
Want to know more about these tools? Read on.
Quick links to sections of this blog post
Finding influential citations to your papers - Semantic Scholar

Semantic Scholar
Microsoft Academic Graph (MAG) seems to be on a roll recently, with more and more search engines incorporating this data-set.
Semantic Scholar now joins Scinapse, Lens.org and Microsoft Academic itself in using the broad and deep store of academic content collected by Microsoft's crawlers (similar to Google Scholar's) to create a search index.
Given that MAG is open data available under an ODC-BY license, it doesn't take a prophet to predict more will follow in their footsteps. But why would we be interested in Semantic Scholar when the others exist?
One thing about Semantic Scholar, as the name implies, is that it puts its own spin on the content (which, incidentally, draws on more sources than just MAG).
According to the FAQ "Our engine analyzes publications and extracts important features using machine learning techniques. The resulting influential citations, images and key phrases allow our engine to “cut through the clutter” and give you results that are more relevant and impactful to your work."
I tend to get skeptical when I see search tools claim "machine learning techniques" but Semantic Scholar might be the real deal.
I will reserve another blog post to compare Lens.org, Scinapse, Microsoft Academic and Semantic Scholar, so in this section I will briefly mention the interesting features without evaluating them.

Semantic Scholar filters
On the surface, the Semantic Scholar interface isn't anything special: you have the expected filter types like publication type, author, journals and conferences. There's a nice button to filter to "lit reviews", which "utilizes a series of heuristics to determine if a publication is classified as an overview by analyzing the document’s content and wording." Interestingly, there is also a publication type for meta-analysis.
As they draw from sources such as PubMed, Microsoft Academic, etc., this means a flood of non-publication material, so it's a nice touch that supplemental content (e.g. video, news, clinical trial) gets a separate filter rather than being lumped under the publication type filter - signalling that it is treated differently, which it indeed is.
They try to relate each of these supplemental materials to the traditional publication, so on each item detail page you can see a supplementary material section that allows you to look at slides, videos and clinical trials associated with the paper.
Among the post-search filters there is nothing much to shout about: date range, author and the ability to watch or explore topics.

Post filters in Semantic Scholar
It is interesting to note that Semantic Scholar uses both "Fields" and "Topics" categories.
Fields, which are broader categories, seem to be inherited from the 19 first-level Field of Study terms in MAG's hierarchy.

Fields filter in Semantic Scholar that matches up with MAG's Field of Study
Topics, on the other hand, are more granular and are generated by Semantic Scholar's own in-house machine learning techniques.

Topics tagged to each Semantic Scholar article
Exploring the topics, you get a very nifty view that allows you to browse other related topics, both broader and narrower.

Exploring topics in Semantic Scholar
This is of course similar to "old school" abstracting and indexing databases with a proper thesaurus, but the cool thing is that all these topics are auto-extracted by Semantic Scholar using the latest machine learning/NLP techniques. I'm really curious how this compares to Microsoft Academic's own "Field of Study" topics, which are also auto-generated and hierarchical.

Exploring Microsoft Academic Graph - Field of Study
Still, all these features are not unique to Semantic Scholar, so here we go into one that is - the way Semantic Scholar displays citation sentiments.
The idea of citation sentiments is this: conventional approaches to citations only record that a citation from one paper to another occurred, but we have no idea what that citation means.
Readers of this blog will of course immediately think of scite.ai which uses supervised machine learning to build a model to try to predict if a cite was a mere mention, a supporting cite or a contradicting cite.
Yet there are other ways to do citation sentiments, and prior to scite.ai, Semantic Scholar was already classifying citations into:
a) Cites of results
b) Cites of methods
c) Cites of background

Citation sentiments in Semantic Scholar
On top of that, they also tag citations from papers that were considered to be "highly influenced" by the cited paper.
If you are interested in the technical details of the machine learning behind this, refer to the papers Structural Scaffolds for Citation Intent Classification in Scientific Publications and Identifying Meaningful Citations (see also my blog post here for a very brief summary).
This is not new of course, but now this feature works for most disciplines!
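To make the three buckets above concrete, here is a minimal Python sketch that groups citing papers by intent. The sample data is entirely made up, and the `intent` and `isInfluential` field names reflect how the public Semantic Scholar paper API labels these classifications, to the best of my knowledge:

```python
from collections import defaultdict

# Illustrative sample, loosely shaped like citation objects from the
# public Semantic Scholar paper API (fields trimmed; values invented).
citations = [
    {"title": "Paper A", "intent": ["background"], "isInfluential": False},
    {"title": "Paper B", "intent": ["methodology"], "isInfluential": True},
    {"title": "Paper C", "intent": ["result", "background"], "isInfluential": False},
    {"title": "Paper D", "intent": [], "isInfluential": False},
]

def group_by_intent(citations):
    """Bucket citing papers by citation intent (background/methodology/result)."""
    buckets = defaultdict(list)
    for c in citations:
        # A citation can carry several intents, or none at all.
        for intent in c.get("intent") or ["unclassified"]:
            buckets[intent].append(c["title"])
    return dict(buckets)

def influential(citations):
    """Citations flagged as 'highly influenced' by the cited paper."""
    return [c["title"] for c in citations if c["isInfluential"]]

print(group_by_intent(citations))
print(influential(citations))
```

Note how a single citing paper (Paper C) can legitimately appear in more than one bucket.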
Like many new search engines, Semantic Scholar encourages you to create a profile and claim your own papers. Because it calculates which papers are highly influenced by which other papers, this also opens up opportunities for interesting visualizations like this one.

Profile showing authors that influenced you and authors you influenced the most
I have recently been exploring how science mapping tools like VOSviewer and CiteSpace are increasingly going beyond the standard Scopus and Web of Science to support data exported from alternative citation sources for visualization.
I noted back then that VOSviewer supported Semantic Scholar for network visualization (via the free API) but at the time it wasn't that useful for me because of the limited subject areas.

VOSviewer supports Semantic Scholar among other sources
Of course, what would be really interesting is if VOSviewer could limit the visualization to citations with certain sentiments.
For example, imagine a citation network that, instead of using all citations, used only citations for methods and results (or each individually) - how different would it look? Would we get any additional insight if we visualized all citations regardless of sentiment but color-coded them by whether they were highly influential?
Let's move on to our next new tool - InstantILL
An easy way to deliver articles - InstantILL
One way of looking at the work of Open Access Button is that they try to look at the whole workflow and ecosystem around open access, identify friction and work to improve inefficiencies in these areas.
One way open access is used in libraries is to reduce the need for ILL/DDS. When a request is made for an article, staff at many libraries will do manual checks for free-to-read copies.
DeliverOA which supports various ILL systems such as
Alma
Clio
ILLiad
Tipasa
tries to reduce the manual checking involved. If library staff are using, say, Alma, they simply go to the resource sharing request page whenever a request comes in and click the button.

Instructions for Open Access Button working on Alma resource sharing requests page
But why not filter out requests that have OA versions even earlier, i.e. at the point the user makes the request?
Of course, many users make ILL/DDS requests via the link resolver, and these days more and more link resolvers are starting to incorporate OA-checking functionality. However, what if the user enters the request through a form?
This is where InstantILL comes into play. It is an improvement on the old-school ILL/DDS forms most libraries use for users to enter requests.
What makes it superior? Firstly, there is a simple, intuitive search form that allows you to search by full article title, citation, DOI or URL, and the system makes its best guess at what you want.
In the demo, it shows what happens when you enter a DOI.

InstantILL Search box for making requests - enter title, citation, DOI or URL
It responds with the item it thinks you are looking for, a link to a copy if available (it knows what subscriptions you have via your link resolver, and presumably whether there is a free OA copy - though in this example it shows a subscribed copy), and lastly a box for you to enter your email if you want the ILL/DDS process to proceed as normal.

InstantILL form responds with a link to online version
If you enter a title instead, the same process should occur. In the example, a free copy is found.

InstantILL finds a free version
In both examples above, if for some reason InstantILL identifies the wrong item, you can click "this is not the article I searched" and correct the pre-populated fields.

If InstantILL suggests a wrong item, you can correct it
InstantILL was first announced earlier this year, but the Medium post states that it is now in beta, and you can try a demo at https://instantill.org/. The nice thing about InstantILL is that it can act as a front end, so that a request is sent to your ILL system only once it is confirmed to be necessary.
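The triage the form performs can be sketched roughly as a cascade: check library holdings first, then look for a free OA copy, and only then fall through to a real ILL request. This is a minimal Python sketch with hypothetical lookup functions; the real InstantILL logic is certainly more involved:

```python
def resolve_request(item_id, subscribed_lookup, oa_lookup):
    """Sketch of an InstantILL-style triage.
    subscribed_lookup / oa_lookup are hypothetical callables that
    return a URL for the item, or None if nothing is found."""
    url = subscribed_lookup(item_id)
    if url:
        return ("subscribed", url)   # library already has access
    url = oa_lookup(item_id)
    if url:
        return ("open_access", url)  # free legal copy found
    return ("ill", None)             # fall through to the ILL system

# Toy lookup tables for illustration (DOIs and URLs are invented).
holdings = {"10.1000/sub": "https://publisher.example/sub"}
oa_copies = {"10.1000/oa": "https://repo.example/oa.pdf"}

print(resolve_request("10.1000/sub", holdings.get, oa_copies.get))
print(resolve_request("10.1000/oa", holdings.get, oa_copies.get))
print(resolve_request("10.1000/xyz", holdings.get, oa_copies.get))
```

The point of the cascade is simply that the costly step (staff-mediated ILL) happens only after the two cheap automated checks fail.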
All this looks very nice, and the only minor improvement I can think of comes from a much earlier (2006!) proposal by Peter Murray to use AJAX to autocomplete fields as users typed into the old-school citation linker.
Why not do some smart autocomplete as the user types the title? Since the form can already resolve titles (via Crossref?), why not autocomplete using MAG/Crossref/Lens, etc.? Perhaps cost is the issue?
But I guess in this day of ubiquitous cut and paste, autocompleting titles is just a small improvement.
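For what it's worth, such an autocomplete would not need much. Here is a sketch of building a Crossref works query for a partially typed title; `query.bibliographic`, `rows` and `select` are real Crossref REST API parameters, while everything else (debouncing, fetching, rendering suggestions) is left out as an assumption about how one might wire it up:

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"

def autocomplete_url(partial_title, rows=5):
    """Build a Crossref works query suitable for title autocomplete.
    A real form would debounce keystrokes and fetch this URL as the
    user types, showing the top matches as suggestions."""
    params = {
        "query.bibliographic": partial_title,  # free-text title/citation matching
        "rows": rows,                          # keep the suggestion list short
        "select": "title,DOI",                 # trim the response payload
    }
    return f"{CROSSREF_WORKS}?{urlencode(params)}"

print(autocomplete_url("structural scaffolds for citation"))
```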
Business model for instantILL
And how much do all these features cost? Nothing!

There is a "leadership" tier that seems to give more analytics and better support, but the main features mentioned above are all free!
Overall, I suspect many libraries will benefit from replacing whatever form or email link they have on their website for ILL (though I wonder about more complicated library setups with many purchasing options), but how big will the impact be?
My impression is that most ILL/DDS requests generally come through the link resolver, but perhaps a superior ILL form will change this?
An easier way to fill up your repository - shareyourpaper.org
It's well known that the main problem facing institutional repositories is the difficulty of getting researchers to deposit, even with an institutional mandate.
There are two aspects to it: firstly, making researchers aware of and motivated to self-archive, and secondly, getting them to actually go through the steps of depositing.
The first aspect can be solved by systematically testing to see what kind of appeal is likely to motivate researchers to self-archive.
This is where a set of professionally tested email templates can be important, and Open Access Button provides them.

Email templates by Open Access Button to encourage self archiving
To make matters worse, many researchers will actually deposit their papers in subject repositories, ResearchGate, Academia.edu and anywhere else but their institutional repository.
There are many reasons for this, but one of them must be the poor depositing interface of many institutional repositories.
That said, some institutions do have excellent systems that make depositing as easy as possible. Take the University of Manchester's deposit form.

Apparently all you have to do is drag the author accepted manuscript into the form and enter the title and date of acceptance
This triggers an amazing process where the system automatically tries to promote the just-published research by auto-summarizing the paper, finding free copies and tweeting this information to Twitter accounts that might be interested (identified using altmetrics). But I digress.
Open Access Button's shareyourpaper.org tool is yet another attempt to help with this.



"Shareyourpaper.org will allow libraries to significantly improve the deposit workflow and enable author-driven self-archiving without technical expertise or repository migration. This new tool will provide an institutional deposit portal, with direct deposit links for your repository and notifications of deposit. Instead of requiring complex repository integrations, shareyourpaper.org will deposit papers into Zenodo (an open-source repository operated by CERN) then support libraries pulling articles into their existing repository through bulk ingest and other mechanisms".
Like InstantILL, shareyourpaper.org has a fully functional free version and a leadership version.

Using projections of open access availability to guide subscription decisions - Unpaywall Journals
So far we have talked about initiatives and tools that improve workflows around open access, e.g. those relating to ILL and filling institutional repositories. These improvements, while important and likely to slowly yield gains, are unlikely to make a big immediate impact on the current scholarly communication system.
The next tool - Unpaywall Journals - potentially could.
The idea is simple: as the amount of open access grows, why not use that information to inform subscription renewal decisions?
The idea of using open access levels either to adjust cost per use or as a factor in evaluating renewals has been mooted in the literature as well as in blog posts as far back as 2017, and a recent paper gives an example of a library using 1Findr data (a tool similar to Unpaywall that gives OA levels of journals, since acquired by Elsevier) to help evaluate journals for renewal.
But in case you are unfamiliar with the idea...
As a simplistic example: if 80% of the articles published in a journal (say in publication year 2018) are free to read, and you estimate that the remaining 20% would generate X DDS requests at a cost of Y per transaction, it may turn out that X * Y is far lower than the subscription price, so you might actually cancel the title.
In reality it is a lot more complicated than that, of course. For example, you would need to project into the future both the growth of open access articles and the future use of articles in the cancelled journal.
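The simplistic break-even arithmetic can be written out as a few lines of Python (all numbers hypothetical, and none of the real-world complications above are modelled):

```python
def breakeven(subscription_cost, annual_requests, oa_share, ill_cost):
    """Simplistic cancel-or-keep arithmetic: only the non-OA share of
    requests turns into paid ILL/DDS transactions after cancellation."""
    paid_requests = annual_requests * (1 - oa_share)
    ill_spend = paid_requests * ill_cost
    return subscription_cost - ill_spend  # positive = cancelling saves money

# Hypothetical journal: $5,000 subscription, 200 requests/year,
# 80% of articles free to read, $15 per ILL transaction.
saving = breakeven(5000, 200, 0.80, 15)
print(saving)  # 5000 - (40 * 15) = 4400
```

Here cancelling would save a hypothetical $4,400 a year; with a low OA share or cheap subscription the number goes negative and the title is worth keeping.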
I am still trying to puzzle out exactly how it works, but a recent preprint - The Future of OA: A Large-Scale Analysis Projecting Open Access Publication and Readership - sets out the methodology.
The paper, which is interesting in its own right, projects that based on current trends in open access growth and usage, by 2025 44% of all journal articles will be available as open access and 70% of article views will be of open access articles.
While it is possible to project open access growth trends, predicting article views seems a tall order, since it would require getting your hands on historic usage trends across all journals - and you might wonder how one could predict views of open access articles at all.
The key seems to be that, through Unpaywall usage data, they have a good view of which requested items are open access, and this allows them to fit a model predicting the usage of articles in a journal as time goes by.
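To illustrate the flavour of such a projection - and only the flavour, since the preprint fits far richer models to real Unpaywall data - here is a toy sketch where a journal's OA share grows by a fixed number of percentage points per year:

```python
def project_oa_share(current_share, annual_growth, years):
    """Toy projection: the OA share of a journal grows by a fixed
    number of percentage points per year, capped at 100%.
    (Purely illustrative; the real methodology is in the preprint.)"""
    shares = []
    share = current_share
    for _ in range(years):
        share = min(1.0, share + annual_growth)
        shares.append(round(share, 2))
    return shares

# Hypothetical journal: 30% OA today, growing ~4 points per year.
print(project_oa_share(0.30, 0.04, 5))  # [0.34, 0.38, 0.42, 0.46, 0.5]
```

Feed a projection like this into the break-even arithmetic for each future year and you get a rough five-year picture of what cancellation would cost or save.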
NOTE: Since I wrote this, Unpaywall Journals has released an official demo with a very different interface. The concept is exactly the same though.

I'm probably explaining this imperfectly, but roughly speaking, libraries provide journal titles, costs and COUNTER downloads to Unpaywall Journals.
Libraries can then adjust parameters like the cost of ILL per transaction, the ILL request percentage, the big deal's annual cost increase, etc.

Some people might think downloads are not everything and want to weight decisions by institutional authorship in journals or by cites to those journals. Unpaywall Journals extracts such information from Microsoft Academic, and the library can choose to give it extra weight, counted as additional downloads, using the "Downloads to add for each citation" and "Downloads to add for each authorship" parameters.
With all this information set, for each journal you can see whether it is worth cancelling it and relying on DDS or not.

So for example, for the journal Biomaterials you can see that, across five years and given the parameters set (DDS cost, expected requests, expected annual increases), cancelling the journal would actually save $13,925.
One thing you might notice is that, by default, the tool takes into account ResearchGate, the back catalog and open access. "Other delayed" refers to methods that are neither open access nor ILL - for example, emailing the author or, dare I say it, Sci-Hub.
One would assume it would be possible to adjust the model to ignore ResearchGate copies, or to include only certain types of Green OA (say, final published versions and author accepted manuscripts only).
At the higher level you can also do similar projections at the bundle/big deal level.
You can try different combinations of renewed titles to see what percentage of uses would be instant and what percentage delayed.
Alternatively, you could enter a figure - say 33% of the big deal cost - click "simulate", and it will try to give you the bundle of journals to subscribe to that maximizes instant use.
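One plausible way to implement such a "simulate" feature is a greedy knapsack heuristic: rank journals by instant uses per dollar and buy down the list until the budget runs out. This is only my guess at the approach, sketched with hypothetical numbers (and note that a greedy pick is not always optimal for this kind of budget problem):

```python
def pick_bundle(journals, budget):
    """Greedy knapsack sketch: spend a fixed budget on the journals
    that buy the most instant use per dollar.
    journals: list of (name, cost, instant_uses) tuples - all hypothetical."""
    ranked = sorted(journals, key=lambda j: j[2] / j[1], reverse=True)
    bundle, spent = [], 0
    for name, cost, uses in ranked:
        if spent + cost <= budget:  # take the journal if it still fits
            bundle.append(name)
            spent += cost
    return bundle, spent

journals = [
    ("Journal A", 4000, 1200),  # 0.30 uses per dollar
    ("Journal B", 1000, 500),   # 0.50 uses per dollar
    ("Journal C", 2500, 400),   # 0.16 uses per dollar
]
print(pick_bundle(journals, 5000))
```

With a $5,000 budget the greedy pick takes Journals B and A; an exact optimizer (e.g. integer programming over the 0-1 knapsack) could do better in trickier cases, which may well be what the real tool uses.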
Overall this is a very intriguing tool that has the potential to really change the game when it comes to journal negotiations.
If libraries start using this tool to cancel or negotiate big deals/transformative deals, one could imagine publishers responding by tightening Green OA rules.
Normally I would say the chance of this happening quickly is low, as librarians tend to be a conservative lot. However, the people behind Unpaywall, Jason Priem and Heather Piwowar, have won a lot of goodwill in the community over the years thanks to a proven track record; barely four days after the announcement they had 400+ libraries on the waiting list and had announced that one consortium has subscribed. No doubt more will follow.
AWESOME news: We expect to announce soon that @OhioLINK (118 libraries!) will be the first consortial user of Unpaywall Journals!!! Stay tuned....
— Unpaywall (@unpaywall) October 24, 2019
Another factor is that the subscription cost itself is set at a very affordable US$1,000. As they point out, given the high cost of journal subscriptions, cancelling even one journal based on this tool's advice would easily earn back the annual rate.
Conclusion
Things are looking up in the open access/data world. For the few who have persisted to the end of this article, I'll see you in the next article!

