Crossref Participation reports - assessing the quality of metadata provided by publishers
I was recently invited to the Changing Landscape of Science and Technology Libraries (CLSTL) 2019 conference. It would have been my second time speaking there, but unfortunately, due to personal reasons, I had to give it a miss.
Still, I contributed a recorded talk for the event.
My talk was about how, compared to a decade ago, tons of scholarly metadata is now freely and openly available to be leveraged via APIs.
While other organizations like PubMed, OpenCitations, BASE, CORE and Microsoft's MAG have contributed to the availability of this data deluge by extracting data from articles and by harvesting and enriching existing sources, it cannot be denied that the PID providers/infrastructure, namely Crossref, DataCite and ORCID (with ROR, the Research Organization Registry, soon to join them), are perhaps central to this change.

As I write this, I am also thinking of ARL-backed SHARE and publisher-backed CHORUS as additional sources of open metadata on scholarly items.

https://share.osf.io/
Checking publisher metadata quality via Crossref Participation reports
That said, my sense is that Crossref metadata, accessed in particular through the Crossref metadata search API, is still the most widely known and used source of scholarly metadata. The newer and lesser-known Crossref Event Data API produces, I think, even more exciting data by documenting "events" around Crossref DOIs, e.g. the number of times DOIs are mentioned on Reddit, Wikipedia, blogs and so on.
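If you want to poke at the Event Data API yourself, here is a minimal Python sketch that pulls the events recorded against a single DOI and tallies them by source. I'm assuming the requests library and the response fields as I understand them from the Event Data documentation; the DOI below is Crossref's own test DOI, so swap in a real one:

```python
import requests
from collections import Counter

# Crossref Event Data API: events whose object is our DOI
resp = requests.get(
    "https://api.eventdata.crossref.org/v1/events",
    params={
        "obj-id": "https://doi.org/10.5555/12345678",  # Crossref's test DOI
        "mailto": "you@example.com",  # identify yourself, per API etiquette
        "rows": 100,
    },
)
resp.raise_for_status()
events = resp.json()["message"]["events"]

# Tally events by source (reddit, wikipedia, newsfeed, ...)
print(Counter(event["source_id"] for event in events))
```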
Of course, the main issue is that Crossref data, which has reached 100 million DOIs, is only as good as what publishers deposit in Crossref. While there are some minimal standards on what must be deposited (e.g. title, author, journal issue), publishers can differ a lot in what else they capture and deposit. For instance, some will include ORCIDs, references, abstracts and so on, while others may not.
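You can check this unevenness for yourself by pulling a single record from the Crossref REST API and seeing which of the optional fields the publisher deposited. A minimal sketch, assuming the requests library; the DOI is just an example, and the field names are as the works route returned them when I checked:

```python
import requests

doi = "10.1038/nature12373"  # an example DOI; substitute your own
record = requests.get(f"https://api.crossref.org/works/{doi}").json()["message"]

# Optional extras that publishers may or may not have deposited
print("has abstract:  ", "abstract" in record)
print("has references:", record.get("reference-count", 0) > 0)
print("has ORCIDs:    ", any("ORCID" in author for author in record.get("author", [])))
print("has funders:   ", "funder" in record)
```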
Is there a way to tell which publishers are doing a better job?
Until recently this was tough, but now you can check Crossref Participation reports.
Just enter the publisher's name and you will see a nice dashboard. Below is the report for Springer Nature.

Crossref Participation report for Springer Nature
By default, it shows you the report for all current content (the last two years) for journal articles, but you can change it to other content types like books, book chapters and even datasets. For journals you can even restrict it to specific journal titles to see how well the favorite journal you love to submit to handles metadata.
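Incidentally, the per-journal numbers are also available programmatically: the REST API's journals route returns coverage statistics for a single title. A small sketch, assuming Nature's e-ISSN below and the coverage field names the API returned when I checked:

```python
import requests

issn = "1476-4687"  # Nature's e-ISSN, as an example
journal = requests.get(f"https://api.crossref.org/journals/{issn}").json()["message"]

# Coverage values come back as fractions between 0 and 1
coverage = journal["coverage"]
for field in ("references-current", "orcids-current", "funders-current"):
    print(f"{field}: {coverage.get(field, 0):.0%}")
```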
Back in the dashboard, you can see, for instance, that for journals (current content) there are 748,950 articles deposited in Crossref by Springer Nature (look at the text just below the "Content Type: Journal articles" navigation bar).
Of these articles, 92% have references, and more importantly 99% of those references are open (yes, some publishers like Elsevier and ACS keep their references closed). You can also see that 33% of these articles have ORCIDs, 31% have FundRef (Funder Registry) IDs, and so on.
Here's another interesting comparison, with the publisher ACS (American Chemical Society).

Crossref Participation report for ACS
On the face of it, ACS deposits very good metadata for journals for the last two years: 89% of articles have references, an astonishing 88% have ORCIDs, and 81% have Funder Registry IDs. But the sad thing is that 0% of the deposited references are open, i.e. none of them are accessible to anyone.
Regular readers of this blog will have read my past posts on the work of I4OC (Initiative for Open Citations) to convince publishers to make their references open, so I won't belabor the point.
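Underneath, these dashboards draw on the members route of the Crossref REST API, so you can do the same comparison in a few lines of Python. A rough sketch: I'm assuming the name search surfaces the right member record as the top hit (verify this in practice!) and that the coverage keys are still named the way they were when I checked:

```python
import requests

def member_coverage(name):
    """Look up a Crossref member by name and return its coverage stats."""
    hits = requests.get(
        "https://api.crossref.org/members", params={"query": name}
    ).json()["message"]["items"]
    top = hits[0]  # naive: take the top hit; check it's the member you meant
    return top["primary-name"], top["coverage"]

for publisher in ("Springer Nature", "American Chemical Society"):
    name, coverage = member_coverage(publisher)
    print(name)
    for key in ("references-current", "orcids-current", "funders-current"):
        print(f"  {key}: {coverage.get(key, 0):.0%}")
```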
Not sure what each metric is measuring? Click on the info icon.

Clicking on the info icon gives you more details on each metric
Analysis of Crossref Participation reports
The nice thing about these reports is that you can apparently also get the data via APIs. This allows you to download the data in bulk for analysis.
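For instance, something like the sketch below pages through the whole members list and collects each member's coverage statistics into one table. I'm assuming offset paging at up to 1,000 rows per page; note that the REST API caps offsets (at 10,000 results when I last checked), so paging through the very long tail of members may need another approach:

```python
import requests

BASE = "https://api.crossref.org/members"
rows, offset, members = 1000, 0, []

while True:
    page = requests.get(BASE, params={"rows": rows, "offset": offset}).json()["message"]
    members.extend(page["items"])
    offset += rows
    # Stop at the end of the list (or wherever the API's offset cap kicks in)
    if not page["items"] or offset >= page["total-results"]:
        break

# One row per member: its name plus whatever coverage metrics are present
table = [{"name": m["primary-name"], **m.get("coverage", {})} for m in members]
print(len(table), "members downloaded")
```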
I was planning to try this myself, but it seems Ted Habermann, a metadata expert, has beaten me to it with his post Metadata Evolution - Crossref Participation Reports.
One of the more interesting things I've learnt is that the API gives you 11 metadata field metrics: the 10 shown on the web page plus an additional 11th, affiliation!
My reading is that being able to track the affiliations of authors accurately is now one of the "hotter" areas that many providers are competing on (notice that freemium services like Dimensions reserve filtering by affiliation for paid customers only). After all, a large part of scholarly metrics is now driven by the interest in comparing institutions in terms of citations and the like, and you can't really do that without clean affiliation data. Possibly this new addition is related to the new ROR (Research Organization Registry) mentioned above.
There are fascinating visualizations using radar plots to show the differences between current and backfile items; I highly encourage you to read it. A follow-up article tries to tease out possible relationships between the size of deposits and the completeness of metadata deposits.
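If you want to try a similar radar plot yourself, here is a minimal matplotlib sketch. The coverage numbers are placeholders (the "current" ones loosely echo the Springer Nature figures above); in practice you would substitute values pulled from the members API as in the earlier snippets:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder coverage fractions for one member - substitute real API values
fields = ["references", "orcids", "funders", "licenses", "abstracts"]
current = [0.92, 0.33, 0.31, 0.50, 0.20]
backfile = [0.80, 0.05, 0.10, 0.30, 0.05]

angles = np.linspace(0, 2 * np.pi, len(fields), endpoint=False).tolist()
angles += angles[:1]  # repeat the first angle to close the polygon

ax = plt.subplot(polar=True)
for values, label in ((current, "current"), (backfile, "backfile")):
    closed = values + values[:1]
    ax.plot(angles, closed, label=label)
    ax.fill(angles, closed, alpha=0.15)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(fields)
ax.set_ylim(0, 1)
ax.legend(loc="lower right")
plt.show()
```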
Conclusion
All this is very interesting, I guess, but are there any implications for librarians? The first thing that springs to mind is to hold publishers to higher standards when it comes to metadata; after all, we pay them large sums, whether in subscriptions or article processing charges. But to be honest, I don't see this happening, as historically we have been more interested in getting access to the content than to the metadata.
That said, a lot of free and open tools, as well as proprietary tools we use such as CRIS systems, often rely on Crossref metadata, and the better the quality of that metadata, the more efficient our work can be.
It also cannot have escaped your notice that the more plentiful and the higher in quality such open metadata becomes, the more it eats into the value of A&I (abstract and indexing) providers, particularly those that provide coverage of non-grey-literature items.
While A&I providers like Scopus, Web of Science or Dimensions benefit from cleaner open metadata as well, it could potentially diminish the value of the additional cleaning and refinement work they do.
Lastly, I think knowing how to check the quality of the metadata deposited by publishers can be helpful when troubleshooting, when helping faculty who are starting journals, or for editors who want to hold publishers to some standard.

