The Petrol Tank for AI Discovery Might Be Running Dry as Publishers Close Access to Scholarly Content Such as Abstracts Due to AI Incentives
Elicit.com, Consensus, and Undermind.ai are among the new leading comprehensive cross-disciplinary “AI-powered academic search engines” today.
Traditional databases such as Web of Science and Scopus have also added AI features (Web of Science Research Assistant and Scopus AI, respectively), but in my view they are still playing catch-up; Scopus AI only recently added Deep Research.
Yet these tools, alongside many others, have depended heavily on open scholarly metadata (especially Open Abstracts and full text), typically sourced from aggregators like Semantic Scholar Academic Graph/Semantic Scholar Open Research Corpus, OpenAlex, and to a lesser extent Lens.org and CORE. Even other AI-powered search tools like SciSpace and Scite.ai, which emphasize their own proprietary indexes, likely derive a significant portion of their data from open sources, complemented by specific publisher agreements.
For years, this reliance on open metadata functioned as a kind of “free ride.” While the Initiative for Open Abstracts (I4OA) has not matched the success of its older sibling, the Initiative for Open Citations, in persuading major publishers like Elsevier or Springer Nature to deposit abstracts directly into Crossref (see list of committed publishers for abstracts), open aggregators found other means.
A November 2024 analysis by Bianca Kramer, for instance, revealed that OpenAlex had obtained substantially more abstracts from these very publishers, often through multiple methods, including web scraping.
As of November 2024, Bianca reported:
Overall, the proportion of journal articles from 2022–2024 with Crossref DOIs for which abstracts from Elsevier articles that are available is 82% in OpenAlex.
Crucially, even though Elsevier, IEEE, and ACS did not deposit abstracts in Crossref or OpenAlex at all, the share of their articles with abstracts in OpenAlex is far above 0%. It is likely that other open sources (e.g., Semantic Scholar’s Open Corpus) and proprietary indexes (e.g., SciSpace) achieved similar coverage.
The Rise of AI Changes the Game: The End of the “Free Ride”
In November 2024, Bianca sounded the alarm:
In November 2022, OpenAlex removed abstracts for Springer Nature’s non-Open Access content (see GitHub commit).
and
Now, two years later (in November 2024) Elsevier seems to have taken similar steps to have abstracts from closed articles removed from OpenAlex (see GitHub commit). The effect of this step is already visible in the OpenAlex web UI and API, and is expected to affect the data snapshot from November 2024 forward. With previous abstract coverage for Elsevier at 82% in OpenAlex, this will result in the loss of approximately 1,1M abstracts for current closed journal articles, and up to 11.5M abstracts for closed journal articles of all years.
September 2025: what’s changed?
Update (2 Oct 2025): the analysis here was done before the OpenAlex Walden update/rewrite. As of 2 Oct, the results from the Walden beta look quite different.
Taking Elsevier as an example, I ran a query on the OpenAlex API to verify the situation:
https://api.openalex.org/works?filter=type:types/article,primary_location.source.publisher_lineage:p4310320990,publication_year:2022-2024,has_doi:true&group_by=has_abstract
At the time of writing, OpenAlex returns:
With abstracts (2022–2024): 533,080 works
Without abstracts (2022–2024): 1,833,524 works
That’s 533,080 / 2,366,604 = 22.5% with abstracts in OpenAlex, a dramatic fall from the 82% Bianca reported in November 2024.
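The coverage figure above can be reproduced offline from the two counts. The sketch below is a minimal Python illustration, not the script used for the original analysis: it rebuilds the query URL quoted earlier from its filter parts and computes the share of works with abstracts. `build_group_by_url` and `abstract_coverage` are hypothetical helper names; only the URL and the counts come from this post.

```python
BASE_URL = "https://api.openalex.org/works"


def build_group_by_url(filters: dict[str, str], group_by: str) -> str:
    """Join key:value filter pairs into a grouped /works query URL.

    No percent-encoding is applied, so the result matches the URL
    quoted in the post character for character.
    """
    filter_str = ",".join(f"{key}:{value}" for key, value in filters.items())
    return f"{BASE_URL}?filter={filter_str}&group_by={group_by}"


def abstract_coverage(with_abstracts: int, without_abstracts: int) -> float:
    """Return the fraction of works that have an abstract."""
    total = with_abstracts + without_abstracts
    if total == 0:
        raise ValueError("no works counted")
    return with_abstracts / total


# Filter values copied from the Elsevier query in the post.
elsevier_filters = {
    "type": "types/article",
    "primary_location.source.publisher_lineage": "p4310320990",
    "publication_year": "2022-2024",
    "has_doi": "true",
}

elsevier_url = build_group_by_url(elsevier_filters, "has_abstract")
# Counts as reported by OpenAlex at the time of writing.
elsevier_coverage = abstract_coverage(533_080, 1_833_524)

print(elsevier_url)
print(f"Elsevier 2022-2024 abstract coverage: {elsevier_coverage:.1%}")
```

Passing a different pair of counts to `abstract_coverage` gives the coverage for another publisher in the same way.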
More tellingly, when filtering for works with abstracts that are not Open Access, the result is empty. In other words, the only Elsevier works in OpenAlex with abstracts now appear to be Open Access.
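One way to run such a check is to add two more filters to the same kind of query and look at the total count. The sketch below is an assumption about how the check could be written, not the exact query used here: it relies on the OpenAlex `has_abstract` and `open_access.is_oa` work filters, and `closed_with_abstracts_url` and `is_empty_result` are hypothetical helper names. The sample response is a made-up stand-in shaped like the `meta.count` field the API returns, so the code runs without a network call.

```python
BASE_URL = "https://api.openalex.org/works"


def closed_with_abstracts_url(publisher_id: str, years: str) -> str:
    """Build a /works query for non-OA articles that still have an abstract."""
    filters = [
        "type:types/article",
        f"primary_location.source.publisher_lineage:{publisher_id}",
        f"publication_year:{years}",
        "has_doi:true",
        "has_abstract:true",        # only works that carry an abstract
        "open_access.is_oa:false",  # only works that are not Open Access
    ]
    # per-page=1 keeps the response small; only meta.count is needed.
    return f"{BASE_URL}?filter={','.join(filters)}&per-page=1"


def is_empty_result(response: dict) -> bool:
    """True when the query matched no works at all."""
    return response["meta"]["count"] == 0


url = closed_with_abstracts_url("p4310320990", "2022-2024")

# Illustrative stand-in for a parsed JSON response, not real API output.
sample_response = {"meta": {"count": 0}, "results": []}

print(url)
print("No closed works with abstracts:", is_empty_result(sample_response))
```

Fetching `url` and passing the parsed JSON to `is_empty_result` would give the live answer, which may differ from what is described here, particularly after the Walden update.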
A similar query for Springer Nature shows:
With abstracts: 483,478 works
Without abstracts: 865,304 works
Coverage: 483,478 / 1,348,782 = 35.8%, compared to slightly under 50% in November 2024
Again, the Springer Nature works in OpenAlex with abstracts appear to be exclusively Open Access.
Implications for AI Search Tools and the Broader Research Ecosystem
Assuming the analysis above is mostly correct, OpenAlex (and likely other open sources) is no longer permitted to display abstracts from Elsevier and Springer Nature unless the articles are Open Access. While I have not verified other sources like the Semantic Scholar Open Corpus, I suspect there is similar pressure on any index that exposes abstracts from non-Open Access articles.
The impact of this shift is profoundly negative, extending beyond just AI search tools:
Degraded AI Discovery: Semantic Scholar’s corpus, OpenAlex, and, to a degree, Lens.org or CORE are the key repositories of titles and abstracts that many “AI-powered search” tools (e.g., Undermind.ai, Consensus, Elicit) rely on. If abstracts from Elsevier and Springer Nature are taken down, incumbents that previously ingested them may continue to function for a while (unless explicitly asked to purge), but going forward their effectiveness will be severely constrained. New entrants will be hit harder: for Elsevier 2022–2024 they are limited to roughly 22% abstract coverage in OpenAlex, instead of the 80%+ available just months ago, which significantly degrades discovery effectiveness.
Reduced Searchability in general: Beyond AI, researchers using any platform that relies on these open aggregators will find that even normal keyword searches within these databases will no longer effectively retrieve closed-access articles from these major publishers, even if the abstract itself would have matched their query. This fundamentally compromises the ability to efficiently scope and filter literature.
Hinders Bibliometric and Scientometric Research: Open metadata, including abstracts, is a cornerstone for researchers studying the structure and evolution of science using tools like VOSviewer, CiteSpace, Bibliometrix, and pyBibX. The systematic loss of this data compromises the ability to conduct reproducible, large-scale analyses of scientific literature without resorting to expensive, proprietary tools.
Non-academic deep research might still work: Interestingly, because the abstracts remain on Elsevier and Springer Nature pages, LLMs that search websites directly (e.g., ChatGPT with web search, or OpenAI/Gemini/Perplexity Deep Research) could theoretically still access them. However, this is a less efficient, less structured, and potentially unstable approach. Publishers could also block these AI agents, and relying on direct scraping is not a sustainable, scalable solution for building comprehensive academic knowledge graphs.
Conclusion — where this leaves libraries, vendors, and our students
If AI-powered discovery is the shiny sports car, open abstracts are the petrol. We’ve just learned what happens when the tank runs dry. The sudden collapse from ~80% abstract coverage to ~20–35% for two of the largest publishers doesn’t just ding recall metrics; it quietly shifts the playing field. Incumbents that cached content early will coast for a while; newcomers will stall at the start line. More importantly, researchers—our users—lose signal at precisely the stage where LLM-assisted search ought to shine: scoping, filtering, and “is this worth my time?”
We, of course, understand why publishers are doing this: abstracts have become more valuable for training and discovery, as AI companies are hungry for training data and willing to pay publishers for it (see Ithaka’s Generative AI Licensing Agreement Tracker). Abstracts, once a mere summary, are now a strategic asset.
The solution is less clear, but our collective response must be robust. This critical juncture calls for:
Advocacy: Libraries, institutions, and open science advocates must redouble efforts to push for genuinely open metadata policies. We need to explore linking this advocacy to subscription renewal negotiations, making the value of open abstracts part of the conversation with publishers. See also the Barcelona Declaration on Open Research Information.
Educating Researchers: Librarians must take the lead in educating faculty and students about the changing landscape of AI search tools. This includes explaining the data sources that underpin these tools, their new limitations, and the potential biases introduced by restricted access to abstracts. Researchers need to understand why some AI tools may now miss significant portions of the literature and how to navigate these challenges. They may also be the best advocates to publishers for keeping abstracts open as they care profoundly about making their papers discoverable.
Achieving high Open Access rates: Of course, if every article were Open Access, the concern over Open Abstracts would be pretty much moot. While we have seen great strides in Open Access rates over the last decade, I think it will take a while before we achieve near-universal Open Access.
Gemini 2.5 Pro and GPT-5 helped with clean-up.




Hi Kukuh
Just to be clear, I am not a health or medical librarian, so take my answers with a pinch of salt. I can't say I have heard of this 90% published literature figure. But I guess if you mean "reputable" journals in biomedicine, it is probably right.
AI search tools, unless they are based on established databases, tend to be vague about their sources. But if they do name a source, it will typically be the Semantic Scholar corpus, OpenAlex, or their own proprietary source (which tends to be them indexing free sources).
Then you can look up papers that analyse these sources, such as OpenAlex and Semantic Scholar, and how they compare to Scopus, Web of Science, PubMed, etc.
Though note that even if two AI tools, e.g. Elicit and Undermind, both claim to use Semantic Scholar, they may choose to index different subsets (e.g. exclude certain types), and the indexes they derive will not be identical.
In general, OpenAlex and Semantic Scholar would be much broader sources than typical databases, but the quality would be a lot more mixed.
Might this have the potential of making open access articles more discoverable than closed? https://www.science.org/content/article/open-access-papers-draw-more-citations-broader-readership