The Initiative for Open Abstracts (I4OA) launches & why I support it - some reflections

Do note that what follows are my own personal reflections from working on I4OA and may not represent the views or opinions of I4OA as a whole.
Last month, I4OA (the Initiative for Open Abstracts) was launched.
The mission of I4OA is simple, as stated on its home page:
"I4OA calls on all scholarly publishers to open the abstracts of their published works, and where possible to submit them to Crossref."
The founding members of I4OA included representatives from:
Scholarly publishers e.g. Sage Publishing, Royal Society
Infrastructure organizations e.g. Crossref
Researchers & bibliometricians e.g. Ludo Waltman, Cameron Neylon, David Shotton, Cassidy R. Sugimoto, and many more.
It also included two librarians: Bianca Kramer of Utrecht University and myself.
So why did I join I4OA?
As I noted at the I4OA launch event, the promise of I4OA is to allow people to easily access and use clean, well-formatted abstracts from a centralized source with minimal effort.
Compare this with the current situation of extracting messy, often scraped data from varied sources and/or from expensive commercial databases that may have selective coverage (e.g. Web of Science).
Why is this important? A common response I've encountered on hearing about this is to ask, "aren't all abstracts freely available to read already?"
The answer is yes, sort of. For humans, most abstracts are freely available to read on publisher webpages. However, for a machine that can and will read thousands of papers, it is far more difficult. Typically you would need to crawl thousands of webpages and scrape the data out, which is not only resource-intensive but can also lead to poor-quality data.
Clearly, easy access to clean, well-formatted abstracts would be a boon to researchers who need to do text and data mining (TDM) on abstracts, and we should not underestimate the potential benefits such knowledge extraction can bring, particularly with recent improvements in NLP.
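To make the contrast with scraping concrete, here is a minimal sketch of what a "centralized source" can look like in practice. Crossref's public REST API returns a work's metadata as JSON, and when the publisher has deposited an abstract it appears in an `abstract` field, usually as JATS-tagged XML. The helper names (`fetch_work`, `clean_abstract`), the placeholder email and the simple regex tag-stripping are my own illustrative assumptions, not an official I4OA or Crossref client.

```python
import json
import re
import urllib.parse
import urllib.request
from html import unescape
from typing import Optional

# Crossref's public REST API endpoint for single works.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str, mailto: str = "you@example.org") -> dict:
    """Fetch the Crossref metadata record for one DOI (needs network access).

    Putting a mailto in the User-Agent follows Crossref's "polite" etiquette;
    the address here is a hypothetical placeholder.
    """
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    req = urllib.request.Request(
        url, headers={"User-Agent": f"abstract-sketch/0.1 (mailto:{mailto})"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["message"]


def clean_abstract(work: dict) -> Optional[str]:
    """Turn the (usually JATS-tagged) abstract into plain text.

    Returns None when the publisher did not deposit an abstract.
    """
    raw = work.get("abstract")
    if not raw:
        return None
    # Drop a leading <jats:title>Abstract</jats:title> heading, if present.
    text = re.sub(r"<jats:title>.*?</jats:title>", " ", raw, flags=re.DOTALL)
    # Replace the remaining markup tags (e.g. <jats:p>, <jats:italic>) with spaces.
    text = re.sub(r"<[^>]+>", " ", text)
    # Decode entities such as &amp; and collapse runs of whitespace.
    return re.sub(r"\s+", " ", unescape(text)).strip()


if __name__ == "__main__":
    # Offline example shaped like the "message" part of a Crossref record.
    sample = {
        "abstract": "<jats:title>Abstract</jats:title>"
        "<jats:p>Open abstracts &amp; open citations.</jats:p>"
    }
    print(clean_abstract(sample))
```

In practice you would call `clean_abstract(fetch_work(doi))` for each DOI of interest, one predictable request per paper instead of crawling each publisher's landing page. Replacing tags with spaces keeps paragraphs apart but can leave a stray space before punctuation that follows inline markup, which seems acceptable for a sketch.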

But what if you don't do TDM directly? Why would you care?
One thing that has dawned on me recently is that, looking back at my blog posts over the last 2-3 years, a common theme seems to be how the freeing up of open metadata and content has led to an explosion of innovative applications, ideas and tools.
In particular, I have watched with admiration how I4OA's sister organization, I4OC (the Initiative for Open Citations), successfully pushed up the share of open citations available in Crossref from 1% to 50% by lobbying publishers to deposit their citations in Crossref and make them open.

Citation Gecko - a citation mapping tool that uses the open citations in Crossref
The percentage of open citations would be just a figure if it were not used in practical ways, but as I started exploring tools like Citation Gecko, VOSviewer and a growing number of similar tools, I started to realise that the effectiveness of such tools is directly related to how complete the metadata available in Crossref and other open sources is.
The same is happening with abstracts, with many tools like Open Knowledge Maps, Scholarcy, Iris.AI, discovery tools, research information management systems (RIMs) and more all becoming more effective as the amount of open abstracts increases.

Open Knowledge Maps - a visualization tool that benefits from open abstracts
So when I was invited to join I4OA last year, I decided to take the plunge and try to make the world a little better in a small way.
Stakeholder support for I4OA
Besides the founding members, we at I4OA also have the support of a large number of scholarly communication stakeholders, including library consortia, funders, vendors and more.

https://i4oa.org/#stakeholders
Publishers that currently support I4OA can be found here.
Experience working in I4OA
Being part of I4OA was intimidating for me, as I was really just a librarian blogger, while the team was made up of many well-known, knowledgeable people from the publishing, Open Science and Open Access worlds, many of whom have done much to improve the scholarly communication ecosystem and, in the final analysis, science and scholarship.
In particular, watching all the members of the team, whether from infrastructure organizations (Crossref), research institutions (bibliometricians, Open Access activists) or even publishers, work tirelessly, using their talent, energy and networks to convince more publishers to make abstracts and citations open, drove home the point that perhaps not all publishers should be painted with the same brush. Some are indeed trying in their own way to make things a little better (something I knew on paper but didn't really feel until now).
Things are never absolute, of course, but looking at the publisher and stakeholder supporters on both the I4OA and I4OC lists might give you some hints as to who might be considered more "on the side of the angels".
I4OA is a humble and modest proposal for improvement
One of the things that surprised me was that a few people reacted against I4OA because they felt it was a "distraction" from the goal of achieving Open Access.
Others wondered why I4OA did not specify licenses for the abstracts made available in Crossref, meaning that some of the abstracts might still be under copyright.
You can find the official answers to these two questions and more in the I4OA FAQ.
My personal take is that I4OA is, at its heart, a small and modest proposal to improve the scholarly communication system using mechanisms that are already part of publisher workflows, i.e. depositing abstracts and other metadata into Crossref when they mint DOIs.
On copyright, I am not a lawyer, but what I have been hearing is that most text and data mining would fall under "transformative use" and so would mostly be okay.
I can't speak for everyone on I4OA, but I personally, of course, support Open Access, and if I could wave a wand today to make it happen, I would.
Do note, however, that even open access journals may not deposit abstracts and metadata properly in Crossref or support TDM properly, so I4OA has benefits even for fully Gold OA journals by ensuring this is done properly.
But as we all know, the multitude of problems facing Open Access isn't likely to be solved in the near future, so why not work to improve what we can now?
Moreover, I don't believe advocating for open metadata and abstracts will impede open access advocacy in any way.
I must admit, I don't really buy the argument that achieving high rates of open metadata and abstracts will lead people to give up fighting for Open Access, because the benefits of Open Access are not the same as those of open abstracts and metadata.
After all, text and data mining isn't the only reason, and arguably isn't even the main reason, many people are pushing for Open Access.
Conclusion
The future of open abstracts and metadata (including citations) is only just beginning.
However, as Christian Zimmermann, who has worked tirelessly on RePEc (Research Papers in Economics), notes, many of the benefits and applications we hope to achieve with open abstracts and metadata aren't exactly new; they are well known from his work with RePEc in economics. Hopefully, I4OA and related initiatives can help bring such gains to other disciplines as well.
But how do we stand right now?
An analysis by Crossref and some members of I4OA found that:
As of September 1, 2020, abstracts were available for 21% of all journal articles in Crossref in the period 2018-2020. It is important to note that not all items in Crossref will have an abstract, so even at the best of times we do not expect a figure of 100%.
For comparison, in Web of Science (Science Citation Index Expanded, Social Sciences Citation Index, and Arts & Humanities Citation Index), 86% of all journal publications in 2018 and 2019 that have a DOI also have an abstract.
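For readers who want a rough, do-it-yourself look at abstract coverage, here is a minimal sketch using Crossref's public REST API. As far as I know this is not how the Crossref/I4OA analysis was done. The filter names (`type`, `from-pub-date`, `until-pub-date`, `has-abstract`) and `rows=0` come from Crossref's API; the helper names and date range are illustrative assumptions, and any live count will differ from the September 2020 figure as more abstracts are deposited.

```python
import json
import urllib.parse
import urllib.request

# Crossref's public REST API endpoint for querying works.
API = "https://api.crossref.org/works"


def build_count_url(from_date: str, until_date: str, with_abstract: bool) -> str:
    """Build a query URL that counts journal articles published in a date range."""
    filters = [
        "type:journal-article",
        f"from-pub-date:{from_date}",
        f"until-pub-date:{until_date}",
    ]
    if with_abstract:
        # Restrict to records where the publisher deposited an abstract.
        filters.append("has-abstract:true")
    # rows=0 asks only for the total count, not the records themselves.
    query = urllib.parse.urlencode({"filter": ",".join(filters), "rows": 0})
    return f"{API}?{query}"


def total_results(url: str) -> int:
    """Return the total-results count for a query URL (needs network access)."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return json.load(resp)["message"]["total-results"]


def abstract_share(with_abstract: int, total: int) -> float:
    """Percentage of records that have an abstract (0.0 if there are none)."""
    return 100.0 * with_abstract / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical date range mirroring the 2018-2020 window discussed above.
    print(build_count_url("2018-01-01", "2020-12-31", with_abstract=True))
    print(build_count_url("2018-01-01", "2020-12-31", with_abstract=False))
    # A live estimate would be:
    # abstract_share(total_results(url_with), total_results(url_all))
```

Keeping the URL building, the network call and the arithmetic separate means the first and last parts can be checked offline, while the live counts are fetched only when you actually want a current figure.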
Yes! This is a fantastic idea! Can you imagine how many amazing things we could build if all abstracts were open??? SO MANY AMAZING THINGS. https://t.co/QdPSaJV3Iy
— Jason Priem #BLM (@jasonpriem) September 24, 2020
where to begin! Search/discovery tools, summarization, entity mining and graphing, uncovering research fronts (emerging and historical), coauthor discovery, automated hypothesis generation, automated lit reviews, teaching tools, argumentation/claim graphs, nanopub extraction...
— Jason Priem #BLM (@jasonpriem) September 26, 2020
Want to help? Do consider signing up to support I4OA as a stakeholder.

A big thanks to Gulcin Gribb and Bethany Wilkes (former University Librarian and current University Librarian, respectively, of my institution), as well as the Provost, Timothy Clark, for supporting my involvement in this initiative.

