Why Open Access definitions are confusing

Apr 06, 2021

Synopsis: The definition of Open Access is far more complicated and debated than it appears at first glance. This goes beyond advocates simply disagreeing on how much "openness" is needed before something counts as Open Access (for example, someone disputing that articles licensed CC-BY-NC count as Open Access).

Rather, as open access categories become more nuanced and granular, our attempts to force open access definitions into single color/metal labels (Green, Gold, Bronze, Platinum) and to act as if they differ in just one dimension are overly simplistic, when in fact the definitions vary on multiple dimensions.

In the examples below, I review some of the recent confusions that I have encountered personally, which show the problems such simplistic thinking can cause.

The solution is to decompose open access categories along multiple dimensions rather than one (e.g. user rights, prestige, cost, peer review, immediacy), to avoid confusing, counter-intuitive or arbitrary definitions.

I'm indebted to discussions with experts such as Cameron Neylon, Bianca Kramer, Lisa Janicke Hinchliffe, Richard Orr and more on Twitter. Any misunderstandings of the topic and misrepresentations of their positions (if any) are mine.

Disagreements about Open Access

Open Access is one subject that always seems deceptively simple but the more I study it the more nuances I notice.

People disagree about

a) Goals - Should the aim be to achieve OA, period, or to achieve OA at a much lower cost than currently, or even a total revolution of the system with legacy publishers removed entirely?

b) Methods - Even if people agree on the goals (e.g. achieve OA at a reasonable cost), they disagree on the methods. For example, should we sign transformative deals in the style of UC and Elsevier, or should we unbundle big deals using the Unsub service, in the style of the State University of New York (SUNY) system, reducing subscription fees so that the savings can be used to fuel other open access approaches?

c) Facts - A lot of disagreements over methods come from disagreements over facts or expectations of researcher/institution behavior. For example, an age-old debate concerned publisher OA embargoes and, ultimately, the effect of the level of OA (including Green OA) on renewals. It was long believed there was no evidence that institutions would cancel a title even if there were zero publisher embargo periods, but how could you test this when levels of OA were low? Today, with services like Unsub that use levels of open access as a major component of decisions to cancel titles, this assumption is being challenged.

Unsub guided tour from Jason Priem on Vimeo.

Another recent example of disagreement over facts is the question of researcher behavior if APCs become the dominant mode. One critique of the transformative deals currently being signed is that the APCs being agreed upon are simply too high and favour existing big legacy publishers in this new APC world.

I would guess that supporters of UC-style transformative deals may even agree with this critique, but pin their hopes on being able, in an APC world, to transfer some of the "pain" to researchers (unlike in a subscription world, where readers are shielded), which would lead to price competition and eventually to APCs dropping. Not everyone agrees this will happen, and those who don't may think this UC deal with Elsevier is not a good idea.

Given all these disagreements, are you surprised to know that even definitions of Open Access are disputed?

Definitions of Open Access

I have been following what I dub the "OA measurement" literature for some time now, because I am convinced that high levels of OA will be disruptive for us academic librarians. I have therefore made a point of closely watching the studies that try to measure the level of OA over the years (see this presentation I gave that briefly covers some of the analysis until 2018).

Below is one of the first reports indicating that there are substantial pools of papers that are free to download.

Proportion of open access papers published in peer-reviewed journals at the European and world levels—1996–2013

Another paper I consider seminal in this area is The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles, which applied the open Unpaywall technology to analyse the amount and types of Open Access. This highly influential paper, whose authors include the cofounders of ImpactStory (now known as Our Research), also coined the term "Bronze Open Access", referring to the large number of free-to-read papers found on publisher platforms without any open license.

Many other papers that study OA levels have since followed, most of which use the open Unpaywall API or dumps as the measuring tool of choice, as alternatives such as building your own crawlers to harvest from the web, or even scraping Google Scholar results on a large scale, aren't easy.

As such, while not a researcher in this area, I consider myself fairly up to date on the issues and methodology, and am even a bit of a nerd about how the Unpaywall API (the most common way open access is measured) works, following along on the mailing list.

This unfortunately also led me down the rabbit hole of thinking about how Open Access is actually defined and measured.

Disagreements about "Real OA"

Experts well versed in the area reading the sections above would probably realise many of the OA measurement studies I have been citing are not quite measuring the same thing.

For example, in the very first study, the authors were tracking "Free to download/read" papers, which may not correspond to Open Access papers which have open licenses.

This of course leads to the distinction between Gratis OA (which removes price barriers) and Libre OA (which removes price and permission barriers), and to the question of whether Gratis OA is open enough to be classed as Open Access.

I've also seen arguments that only copies of papers licensed with CC-BY are truly Open Access, while copies with CC-BY-NC-ND are not really open access, to the dismay of institutional repository managers who can often only post papers with such licenses.

And of course people quibble over whether categories like "Delayed OA", where papers are available only after embargoes (or are even deposited by choice much later), really count as OA. In practice, this is hard to track, so most studies ignore the issue.

Rick Anderson goes in-depth into a lot more of the diversity in definitions here.

Still these issues aren't really that puzzling to me. These disagreements just reflect what use cases are valued.

Disagreements around OA definitions - Category errors?

In recent months, I ran into issues around OA definitions that blew my mind, basically because they weren't simply disagreements about what counts as OA.

People were using the same labels to mean different things without realising it.

I am coming to realise these confusions result from what philosophers like to call "category errors."

Confusion 1 - How do you account for OA status for copies of papers that span more than one OA category?

The first nuance came to me on Twitter via Cameron Neylon who has this view that OA colors are "orthogonal" (he has also written a lot on this topic).

In a complicated discussion with Bianca Kramer on OA definitions, he makes this remark in response to a scenario in which a paper initially has Bronze and Green OA copies and the Bronze copy then disappears after a while.

Then it stops being bronze (which as you know I hate as a category anyway...) but the question of whether it is green is entirely orthogonal.

It is (or is not) bronze &
Is (or is not) green

No relationship between those two characteristics.
— CⓐmeronNeylon (@CameronNeylon) June 27, 2020

Confused? Let me explain with a simpler example using just Green OA and Gold OA while ignoring other colors.

Imagine a person has the following set of papers, with the location of each copy in parentheses:

Paper A - Gold OA copy (PLOS ONE) + Green OA (PMC)

Paper B- Green OA only (Institutional repository)

Paper C - Gold OA copy only (PLOS ONE)

Paper D&E - no OA copy anywhere (paywall copies on publisher platform)

Here's a question - What percentage of the papers are Green OA? What percentage of the papers are Gold OA?

Let's for a moment simplify matters and assume all copies, even the Green OA ones, are versions of record/final publisher versions.

Answer 1 :

Gold OA % = 2/5 = 40% (Paper A and C)

Green OA% = 1/5 = 20% (Paper B)

Not OA% = 2/5 = 40% (Paper D and E)

Answer 2 :

There are actually two Gold OA copies (Paper A and C), but also two Green OA copies (Paper A and B), with two more non-OA papers as before, for a total of 6 copies.

Gold OA% = 2/6 = 33.3% (Paper A and C)

Green OA% = 2/6 = 33.3% (Paper A and B)

Non OA = 2/6 = 33.3% (Paper D and E)

Edit : Bianca Kramer suggests a third answer that the denominator should be 5 (the unique number of papers) rather than 6 (the total number of copies) which makes sense but runs into the problem the percentages don't sum to 100%, which may or may not be an issue.

Using that calculation, we get

Green OA% = 2/5 = 40%

Gold OA% = 2/5= 40%

Non OA% = 2/5 = 40%
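The three counting rules can be checked with a short Python sketch. The function names, the dictionary layout and the "Gold beats Green" priority list are assumptions made for illustration; only the five papers and the arithmetic come from the example above.

```python
# Each paper maps to the set of OA colors of its copies (empty set = paywalled only).
papers = {
    "A": {"gold", "green"},
    "B": {"green"},
    "C": {"gold"},
    "D": set(),
    "E": set(),
}

# Assumed priority when a paper may carry only one label: Gold beats Green.
PRIORITY = ["gold", "green"]


def best_label(colors):
    """Return the single highest-priority color, or 'none' if there is no OA copy."""
    for color in PRIORITY:
        if color in colors:
            return color
    return "none"


def count_labels(papers):
    """Count every color a paper has; a paper with no OA copy counts once as 'none'."""
    counts = {"gold": 0, "green": 0, "none": 0}
    for colors in papers.values():
        for color in (colors or {"none"}):
            counts[color] += 1
    return counts


def answer_1(papers):
    """One label per paper; denominator = number of papers."""
    counts = {"gold": 0, "green": 0, "none": 0}
    for colors in papers.values():
        counts[best_label(colors)] += 1
    return {k: v / len(papers) for k, v in counts.items()}


def answer_2(papers):
    """All labels counted; denominator = number of copies (6 here)."""
    counts = count_labels(papers)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def answer_3(papers):
    """All labels counted; denominator = number of unique papers (5 here)."""
    counts = count_labels(papers)
    return {k: v / len(papers) for k, v in counts.items()}


for name, fn in [("Answer 1", answer_1), ("Answer 2", answer_2), ("Answer 3", answer_3)]:
    shares = fn(papers)
    print(name, {k: f"{v:.1%}" for k, v in shares.items()})
```

Only the third rule produces shares that add up to more than 100% (120% here), because Paper A is counted under both colors while the denominator stays at five papers.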

Why two different answers

Which answer seems more intuitive to you?

The crux seems to be if a paper is available in several locations and falls into different color categories, do you feel that the paper must have one singular "best" OA/color status?

If so you agree with Answer 1 and let's call this approach the "best OA version approach".

The other approach which corresponds with Answer 2 , let's call "multi OA label approach".

Obviously, to use the "best OA version approach", you must have in mind some rule for deciding which OA/color status has priority when it has copies that fits multiple categories.

In our example, if a paper has both Gold OA and Green OA versions, most people would say the best version is Gold OA, so this paper has the OA status of Gold, not Green.

What do you mean by Green OA?

Another way of thinking of this disagreement is what you really mean by "Green OA".

On the surface this seems to be a simple concept - a paper is Green OA if there is a copy hosted on a repository.

But notice you can get mixed up depending on whether you use the concept of Green OA as "Unique Green OA" or "Total Green OA".

If you define a paper as Green OA only if it is "Unique Green" (no Gold OA copies), that corresponds to what I have been calling the "best OA version approach".

If you instead define Green OA as "Total Green OA", that is a very different thing, since Green OA would include every paper that has a Green OA copy, even if there is already a Gold OA copy.

This isn't a distinction I just made up btw.

For example, if I am interpreting them correctly below, Bianca Kramer thinks more in terms of "Total Green", while for Lisa Janicke Hinchliffe, unique Green OA is the only Green OA that counts.

Yeah, that's what makes no sense to me. :)

/gnight!
— Lisa Janicke Hinchliffe (@lisalibrarian) February 28, 2021

When there is a Gold OA copy and the same copy is in a repository, Lisa Hinchliffe doesn't consider the copy in the repository as Green OA but rather "it is a Gold OA article retrieved from a repository rather than the publisher website."

Fortunately we have no problem being in disagreement. :) While I realize that people call this "green OA" - to me it is a Gold OA article retrieved from a repository rather than the publisher website.
— Lisa Janicke Hinchliffe (@lisalibrarian) February 28, 2021

Why this matters

I would guess that for most people, Answer 1 is more intuitive. But having in mind a single color status for an article also leads to complications, as we shall see.

So you might be thinking this is all academic; what does it matter? Leaving aside that we are starting to see rankings and metrics based on open access rates, discovery systems and search engines are also starting to add OA status as filters and facets. This includes library discovery services, A&I indexes like Scopus and Web of Science, and newer discovery services like Lens.org.

The OA filter in Scopus

As such, it is natural to wonder how they are dealing with this issue, to better understand how to use the OA filters.

In other words, if they detect a paper has both Green and Gold versions (most likely via Unpaywall), what do they display in the facets?

Let's assume that we have a discovery service like Lens.org that shows only one record for each article but can use the Unpaywall service to tell in advance all the locations with free OA copies (whether it chooses to display all or some of them is moot here).

In fact, such a setup is common in discovery services and databases these days, e.g. Scopus, Web of Science etc.

Using the same example from above, what do the filters show?

The figure above corresponds to a system based on Answer 1- "best OA version approach", where an article can be either Green or Gold but not both. In other words, the sole Green OA that shows in the filter is for articles that are UNIQUE GREEN ONLY.

If you filter to the Green OA here, you only get 1 paper showing (Paper B but not Paper A)

If instead they follow the thinking in Answer 2, "multi OA label approach" where an article can be both Gold and Green OA, they will get this instead

In this conception of OA, when you filter to Green OA, you will get two papers showing (Paper A & B from above example).

Further complications - Green or Bronze?

Which would you prefer if you were consulted to define the OA definitions for these search engines? As a librarian I can see arguments for either definition.

For example, the first set of filters has the advantage of simplicity. When you filter to Green OA, you get articles that are unique Green OA. One can argue that this is a valuable thing to know, because it shows the Green OA that truly adds value (if a Green OA copy is accompanied by a Gold OA copy, chances are the Green OA copy doesn't add anything besides some redundancy).

On the other hand, such a setup loses information. If for some reason you wanted to filter down to the set of articles which had both Green OA and Gold OA versions you could not do so.

Assuming a system like Lens.org with fully flexible facets and filters that allowed you to include or exclude values, a conception of OA where you listed all colors of a paper rather than choose the best one, would allow you to pull out any answer you wanted with the combination of inclusion and exclusions.
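To make the difference between the two kinds of facet concrete, here is a minimal Python sketch using the same five papers. The function names and the include/exclude parameters are hypothetical and do not describe how Lens.org, Scopus or Web of Science actually implement their filters.

```python
# Same example as before: each paper maps to the colors of its OA copies.
papers = {
    "A": {"gold", "green"},
    "B": {"green"},
    "C": {"gold"},
    "D": set(),
    "E": set(),
}

# Assumed priority for the single-label facet: Gold beats Green.
PRIORITY = ["gold", "green"]


def best_label(colors):
    """Highest-priority color present, or None when there is no OA copy."""
    for color in PRIORITY:
        if color in colors:
            return color
    return None


def filter_best_version(papers, color):
    """'Best OA version' facet: a paper appears under exactly one color."""
    return sorted(p for p, colors in papers.items() if best_label(colors) == color)


def filter_multi_label(papers, include=(), exclude=()):
    """'Multi OA label' facet: keep papers having all included and no excluded colors."""
    return sorted(
        p
        for p, colors in papers.items()
        if all(c in colors for c in include) and not any(c in colors for c in exclude)
    )


print(filter_best_version(papers, "green"))                             # ['B']
print(filter_multi_label(papers, include=["green"]))                    # ['A', 'B']
print(filter_multi_label(papers, include=["green", "gold"]))            # ['A']
print(filter_multi_label(papers, include=["green"], exclude=["gold"]))  # ['B']
```

The last call shows that the multi-label facet can reproduce the "unique Green" result by excluding Gold, whereas the single-label facet cannot recover the set of papers that have both colors.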

But say you want to keep things simple, have your heart set on Answer 1 and want filters like the first set. You are still not out of the woods.

Because in reality there are not just Green OA and Gold OA! Even if you leave aside OA labels like Diamond/Platinum OA (which we will discuss later), most OA filters include "Bronze OA" (papers that are free to read on publisher platforms but have no open license), because this has been found to be a big proportion of free-to-read papers.

So now, if you want to adopt the "best OA version approach", you need a rule for prioritizing all three types of OA.

Clearly Gold OA (via hybrid journals or fully Gold OA journals) has priority over Green. But what about the case where a paper has two copies, one "Bronze" and one "Green"? In such a scenario, should it show "Bronze" or "Green"?

You can argue for "Green" first because the "Bronze" version is often unstable (lacking an open license, it can easily be closed).

But you could also argue that the Bronze version almost always tends to be the version of record, while the Green OA copy more often isn't. So from a reader's point of view, you probably want to be able to filter down to all articles with at least a Bronze copy. That is something you can't do if you adopt the "best OA version approach" and define the OA status of a paper as Green when both Green and Bronze copies exist, because then the Bronze filter returns Bronze-only papers.
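The effect of the tie-breaking rule can be seen in a small Python sketch. The three papers and the two priority lists are hypothetical and chosen only to show how the Bronze facet changes with the rule.

```python
def best_label(colors, priority):
    """Return the first color in `priority` that the paper has, else 'closed'."""
    for color in priority:
        if color in colors:
            return color
    return "closed"


def facet(papers, color, priority):
    """Papers whose single 'best' label equals `color` under the given priority."""
    return sorted(p for p, colors in papers.items() if best_label(colors, priority) == color)


# Two candidate rules; both put Gold first and differ only on Green versus Bronze.
GREEN_OVER_BRONZE = ["gold", "green", "bronze"]
BRONZE_OVER_GREEN = ["gold", "bronze", "green"]

# Hypothetical papers and the colors of their free copies.
papers = {
    "P1": {"bronze", "green"},
    "P2": {"bronze"},
    "P3": {"green"},
}

print(facet(papers, "bronze", GREEN_OVER_BRONZE))  # ['P2']
print(facet(papers, "bronze", BRONZE_OVER_GREEN))  # ['P1', 'P2']
print(facet(papers, "green", BRONZE_OVER_GREEN))   # ['P3']
```

Whichever rule is chosen, one of the two facets silently drops P1, which is the information loss described above.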

For what it's worth, I believe most OA facets in databases and academic search engines adopt the "best OA version approach"; let me know if you know of a case that does not.

And regarding OA facets in search engines/databases, afaik both Web of Science and Scopus use the multi-label approach for green OA, rather than 'best OA version'

(and then there are differences in what versions (submitted, accepted, published) are included in green OA...) pic.twitter.com/qv2ak7Mqsq
— Bianca Kramer (@MsPhelps) April 7, 2021

Reflection

The other major drawback of adopting a single best OA version is that this best OA label might shift over time. As the Unpaywall developer notes, insisting on a best OA label can result in colors shifting from, say, Green to Bronze or Bronze to Green as copies drop in and out.
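A small Python sketch of that drift, under the assumption (made purely for illustration, not taken from any particular service) that a publisher-hosted Bronze copy outranks a repository copy when picking a single label:

```python
# Hypothetical priority order for picking one "best" label; an assumption for
# illustration, not the rule any particular service uses.
PRIORITY = ["gold", "bronze", "green"]


def best_label(colors):
    """Single-label view: the highest-priority color present, else 'closed'."""
    for color in PRIORITY:
        if color in colors:
            return color
    return "closed"


def flags(colors):
    """Multi-label view: one independent yes/no answer per color."""
    return {color: color in colors for color in PRIORITY}


# Snapshots of the copies found for one paper at two points in time.
before = {"bronze", "green"}  # free copy on the publisher site plus a repository copy
after = {"green"}             # the publisher has since closed the free copy

print(best_label(before), "->", best_label(after))            # bronze -> green
print(flags(before)["green"], "->", flags(after)["green"])    # True -> True
print(flags(before)["bronze"], "->", flags(after)["bronze"])  # True -> False
```

In the single-label view the paper appears to change color even though its repository copy never moved. In the multi-label view only the Bronze flag changes.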

This brings us back to the first quote made by Cameron Neylon that OA definitions are entirely orthogonal and we shouldn't try to assign a best OA color.

So we would disagree with this. This stays bronze & also becomes green. The idea of a “best” colour is a category error for us because it conflates too many things. But all that’s fine for us bcs you give all locations so we can parse out.
— CⓐmeronNeylon (@CameronNeylon) June 27, 2020

Part of why we got into this mess is that we decided the OA colors are distinct categories, the way real colors are, when in fact they overlap in a lot of ways. Green, Gold, Bronze and Hybrid OA do not measure or vary on just one dimension, but on several, so you can't just compare them directly and choose the "best" one.

As Cameron Neylon says this is a "Category error".

In the next issue, confusion arises again from the attempt to force OA into a one-dimensional label.

Confusion 2 - How do you define Gold OA Journals? Article level vs Journal level

When you first learn about open access, you learn about the common errors that many people tend to make about OA.

One particularly common error is for people to think the Gold OA route is defined by the payment of APCs (article processing charges).

As anyone can tell you, the original definition of "Green OA" just means articles made open access available via repositories and "Gold OA" means articles made open access available via journal platforms.

There is no mention of the business model at all. While research shows there are more articles made available via Gold OA that involve payment of APCs, there is still a sizable number of Gold OA journals that do not involve payment of APCs (most journals in DOAJ), dubbed Diamond/Platinum OA, and they generate a significant percentage of the articles made OA via the Gold OA method.

There is suspicion that publishers prefer to think that "Gold OA" = APCs to normalize the idea of paying APCs for OA, but beyond that no-one disagrees that the articles made available via Gold OA are not all or even mostly funded by APCs.

Here though I will quickly pose a question: are Hybrid Journals considered OA journals? The answer is no, because the definition of an OA journal includes the requirement that all the articles in the journal are OA. So even though some articles in a Hybrid Journal are OA, and even Gold OA, hybrid journals themselves are not OA journals.

This points to the difference between applying the terminology "open access" at the journal level and at the article level: at the journal level you need to add the condition "Are all the papers in the journal OA?", which isn't something you need to consider at the article level.

Here's one attempt to distinguish between Gold and Green at article and journal levels.

Created by Jeroen Bosman & Bianca Kramer, last modified 20210301

Article level vs Journal level confusion

Recently, I tweeted this:

Am i fighting a losing battle to correct people whenever they define Gold OA = APC? should i even borther?
— Aaron Tay (@aarontay) February 26, 2021

Initial reactions to this tweet were as expected, with people telling me to keep fighting the good fight, including one from Peter Suber himself.

Thanks and plz keep it up.

Gold OA is OA delivered by journals, regardless of the journal's business model. When there are fees, we can call it fee-based or APC-based gold OA. If we give in, then we no longer have a good term for journal-based OA regardless of business model.
— Peter Suber (@petersuber) February 26, 2021

That seems to settle it right?

All Gold OA Journals = All Gold APC Journals!

Then I noticed this tweet, which suggested to me that "Gold OA *Journals*" will definitely involve APCs.

My mind was blown when I realised that some people define Gold OA *journals* (as distinct from article-level definitions) this way.

In a Scholarly Kitchen article, Lisa Hinchliffe distinguishes between "Article level Open Access", which is defined the usual way, and "Journal level Open Access".

First she defines Hybrid journals:

"In considering open access at the journal level, green and bronze articles are typically set aside. Subscription journals that offer an option for publishing an article gold open access, which is predominantly done through an APC payment, are known as hybrid journals."

So far, so normal. Then comes the mind-blowing part for me:

"For fully open access journals, the two most common categories are gold and diamond (which is also sometimes called platinum). The distinction between these two categories is not the status of the articles. In both, the articles are gold open access. Rather, the distinction is the funding model. For a gold open access journal, the business model is APC payment by or on behalf of the author. For a diamond (platinum) open access journal, the journal is financed in some other way. " - Lisa Hinchliffe, emphasis mine

In other words, when Lisa says Gold OA Journal, she means only journals like PLOS ONE that use APCs. This excludes journals like the Journal of the Medical Library Association (and many others in DOAJ), where the articles are 100% open access on the journal platform but do not involve APCs at all; she places these in a separate and distinct category called Diamond/Platinum OA journals.

To be clear, this school of thought still holds that articles made OA via the Gold route, i.e. Gold OA *articles*, do not necessarily imply APCs. Some become Gold OA articles via Hybrid journals or Diamond/Platinum journals as well.

But when they talk about Gold OA Journals, APCs are definitely involved. If a journal would otherwise be a Gold OA journal but does not involve APCs, it is a Diamond/Platinum OA journal.

Compare this to the way I think.

Gold OA journals are a superset of Gold APC journals like PLOS ONE AND Diamond OA journals like the Journal of the Medical Library Association.

Using this definition, Gold OA journals definitely do not all involve APCs.
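The two journal-level conventions can be put side by side in a short Python sketch. The `Journal` class, its two boolean fields and the label strings are illustrative assumptions; the third journal is a made-up hybrid title, while the first two are the examples named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Journal:
    name: str
    all_articles_oa: bool  # journal-level condition: is every article OA?
    charges_apc: bool      # business model: does publishing OA involve an APC?


def labels_superset_view(journal):
    """Gold = any fully OA journal; APC-based and Diamond are sub-types of Gold."""
    if not journal.all_articles_oa:
        return set()  # e.g. a hybrid journal is not an OA journal at all
    return {"gold", "gold-apc" if journal.charges_apc else "diamond"}


def labels_separate_view(journal):
    """Gold = fully OA journal funded by APCs; Diamond is a distinct category."""
    if not journal.all_articles_oa:
        return set()
    return {"gold"} if journal.charges_apc else {"diamond"}


journals = [
    Journal("PLOS ONE", all_articles_oa=True, charges_apc=True),
    Journal("Journal of the Medical Library Association", all_articles_oa=True, charges_apc=False),
    Journal("Hypothetical Hybrid Journal", all_articles_oa=False, charges_apc=True),
]

for j in journals:
    print(j.name, sorted(labels_superset_view(j)), sorted(labels_separate_view(j)))
```

The two views agree on the hybrid journal and on PLOS ONE being Gold. They differ only on whether the non-APC journal still carries the Gold label.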

Problems with Gold OA journals = APCs

My first thought on learning that people define Gold OA journals as journals that definitely have APCs as a business model was that it is an odd choice.

At the "article level", "gold" has nothing to do with the business model at all, but has to do with the platform, so it seems really confusing to reuse the same word "gold" and let it mean something else at the "journal level".

The other problem is that if you follow this definition, the thousands of DOAJ journals that don't involve APCs (and the at least 17,000 Diamond journals estimated in this study) suddenly become non-Gold journals (counting only as Diamond/Platinum journals), which I'm pretty sure is weird for a lot of people.

Lisa disagrees of course, and the discussion continued here on Twitter, though it seems that ultimately she is against the idea of determining a document's status based on where the file is served from, as you saw earlier.

Confusion 3 - If papers are in preprint servers like arXiv but are never or not yet published, are they considered Green OA copies or Gold OA?

The next set of problems arises once you realize that the concepts of Green and Gold OA implicitly assume the paper is published in a journal somewhere, so that a Gold OA version is at least possible.

That's true. Preprint repositories are another place where the DOI URL = Publisher rule breaks down. We started calling papers on the various *rxivs green since most people think of them as repositories. But often there is no subsequent published version - so what color is that?
— Richard Orr (@unpaywall_dev) June 25, 2020

For a paper that never ends up formally published in a journal, there will be only one version, residing in a preprint server such as arXiv. Such a version is typically called Green OA (because it is in a repository), but when you think about it you run into problems, such as wondering whether such a version can be considered a Version of Record or not.

Category error - root of all the OA definition problems

Why so much confusion? Do the confusions above have anything in common?

Here's a recent reflection on Diamond Journals.

"Diamond. The term is made to do a lot of work. Used in open-access publishing circles to refer to modes in which neither the author nor the reader pays, it implies rarity. It denotes scarcity. It signifies refinement. It seems, if the marketing of De Beers is anything to go by, to be the ultimate expression of love. Forever. A girl’s best friend."

"Diamond, however, is also a category error. While “gold” and “green” open access refer to conditions of availability – at the publisher and in a repository respectively – diamond denotes the financial conditions under which a publisher operates. Hence, while the term is meant to connote a supreme condition in which infrastructure exists transparently without financial demands from authors or readers, it also helps to cement the false certainty, erroneously held by many researchers, that “gold” open access means article processing charges (APCs)." Martin Paul Eve, emphasis mine

Lisa herself says pretty much the same thing in a different way.

Personally, I think we're trying too hard to create categories/subcategories rather than a matrix of characteristics. I'd be fine if we killed off the labels! (But, we won't).
— Lisa Janicke Hinchliffe (@lisalibrarian) February 27, 2021

Ultimately, all this confusion arises from us trying to squeeze a multidimensional concept into one label.

In Unbundling Open Access dimensions: a conceptual discussion to reduce terminology inconsistencies, the authors anticipate almost every confusion I have listed above.

They mention, for example, the confusion over the definition of Gold OA journals, models where papers are not published via normal journal titles, delayed OA and more.

On the Gold OA journal vs platinum OA journal question they write:

"For example, the Gold label, which was originally intended to mean only that the venue of publication was a journal (Suber,2008b), is now bundled with other assumptions, namely, that the article is made OA immediately upon publication, and that the article has a license that provides ample rights to all users. There is not a widespread agreement, however, on whether Gold OA requires the payment of an APC. This depends on whether Diamond/Platinum OA journals are considered a subset of Gold, or a separate type of OA. In either case, it would be an arbitrary decision that is not initially self-evident."

The crux of their argument and proposal is that we have to define OA along multiple dimensions rather than one. They

"believe, that, as Suber and Harnad did ten years ago (Suber, 2008a), it is time to rethink again, and consider the best ways to interact with the concept of OA and all its variants. We believe that the original Gold/Green and Libre/Gratis distinctions are no longer capable of capturing all the permutations and nuances of OA that exist nowadays. We are not alone in this belief (Danowski, 2018; Neylon, 2013). Therefore, we think it is necessary to unbundle all the different aspects of OA and consider them as separate dimensions. " (emphasis mine)

They propose six dimensions, namely:

Cost
Prestige
User rights
Stability
Immediacy
Peer review
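As a rough illustration of what a "matrix of characteristics" could look like in practice, here is a Python sketch with one field per dimension listed above. Reducing each dimension to a yes/no value, the field names and the three example records are all simplifying assumptions for illustration; they are not the scales or definitions used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OACopy:
    """One freely readable copy of a paper, described dimension by dimension."""
    label: str            # free-text description, for display only
    author_pays: bool     # Cost (assumed proxy: is a fee paid by or for the author?)
    journal_hosted: bool  # Prestige (assumed proxy: is it served by the journal itself?)
    open_license: bool    # User rights (assumed proxy: does it carry an open license?)
    stable: bool          # Stability (assumed proxy: is access expected to persist?)
    immediate: bool       # Immediacy (assumed proxy: free from the day of publication?)
    peer_reviewed: bool   # Peer review


def select(copies, **criteria):
    """Return labels of copies whose fields match every keyword criterion."""
    return [
        c.label
        for c in copies
        if all(getattr(c, field) == value for field, value in criteria.items())
    ]


# Illustrative records only; real copies would need to be assessed case by case.
copies = [
    OACopy("APC-funded journal article", True, True, True, True, True, True),
    OACopy("free-to-read publisher copy without a license", False, True, False, False, True, True),
    OACopy("preprint that was never published", False, False, True, True, True, False),
]

print(select(copies, open_license=True, peer_reviewed=True))
print(select(copies, author_pays=False, stable=True))
```

With this layout no single color has to be chosen. Each question a reader or funder cares about becomes a query over the dimensions that matter to them.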

Conclusion

This has been a long and difficult blog post to write, and I wrote it mostly to try to straighten out my own thoughts on why I was so confused ....

Aaron Tay's Musings about Librarianship
