Three+one bonus Google Scholar improvements that I wish Google would do.
One of the central themes of my blog is academic discovery (100+ of its 300+ posts are tagged "Discovery"), and when it comes to discovery in academia, Google Scholar looms over everything we do.
Why is this so? For those not in the know, Google Scholar is pretty much unsurpassed in index size for journal-type content. By that I don't just mean metadata (title, abstract, author etc.) but, more importantly, full text, which it can search through to match your keywords. As such, it is an all-in-one discovery tool that works for almost all disciplines and updates extremely quickly, with the trademark magical relevancy (probably helped mostly by full-text matching), all wrapped up in the ease of use you expect of Google services. All at the unbeatable price of free.
I am also of the view that Google Scholar has a secret sauce that is hard for rivals to duplicate: its ownership of Google Books. Google Scholar may on occasion mix in results from Google Books, though it's unclear what is included and how often this occurs.
This is why it is now almost impossible to charge for a new general-purpose academic search engine if all it does is allow keyword searches and display results in the usual way (see my long article comparing Google Scholar with other citation discovery engines).
Despite the efforts of well-funded rivals such as Microsoft Academic, Semantic Scholar, Lens.org and the free version of Dimensions to compete even on "free", it is still an uphill battle for them to lure users away from Google Scholar.
The dominance and popularity of Google Scholar
For example, if I were forced to choose only one academic search engine without knowing the task in advance, I would choose Google Scholar in a heartbeat. I suspect this has been the default view for many, if not most, researchers for some time now.
Google Scholar isn't the best way to find scholarly literature, but it's 98% as good and 400% easier than the next-best alternative
— Paul Musgrave (@profmusgrave) April 17, 2021
The recent publication "How Readers Discover Content in Scholarly Publications 2021" notes:
"Google Scholar continues to be the dominant search engine used for journal discovery in the US and most European countries. It is also the most popular search engine for journal discovery in China, although the popularity of Baidu continues..."
In terms of disciplines:
"Google Scholar is by far the most important discovery resource for people working and studying in the broad area of Humanities and Social Science... A&Is are still the most important search tool for people in high-income countries working in the broad subject area of life sciences (either in the medical or academic sector). This has been consistently true since 2005. However, Google Scholar is catching up"
I was a little surprised by the importance of abstracting and indexing (A&I) databases, but notably this applies only to researchers in the life sciences, where the specialized life sciences search tool PubMed is ubiquitous (Meta, a Chan Zuckerberg Initiative project focusing on biomedical areas, is another one to watch).
So yes, it is possible for specialized search tools to outperform Google Scholar, but even that isn't easy or guaranteed. I've seen quite a lot of literature comparing Google or Google Scholar with subject A&Is in which the subject A&Is fail to come out on top on both recall and precision. Here's a recent example.
Warning: This doesn't mean you should use Google Scholar alone for systematic reviews!
What can Google Scholar do to improve?
Still, Google Scholar isn't perfect, and one can think of many ways it could improve. For example, it clearly lacks easy bulk export of data. Fixing this would push bibliometrics forward a great deal, and it is an area where rivals like Dimensions are taking aim by providing APIs etc. Still, I suspect this deficiency reflects a lack of interest on the part of the Google Scholar team, or perhaps a legal constraint, and is unlikely to change.
In any case, for general literature review, bulk export of data isn't a critical use case for most researchers, except those doing bibliometrics, so this doesn't detract much from the usefulness of Google Scholar.
For this blog post, I will focus on "small" improvements (or so I would think) I would like to see made to Google Scholar that will help the average researcher in doing literature review.
Unlike, say, the recent "A better Google Scholar", I'm not going to suggest large-scale overhauls to the interface or functionality, but rather small tweaks (or so I think).
They are:
Google Scholar should have a show references function and "search within referencing articles" function
Google Scholar should allow search alerts for "Related articles" and base personalized recommendations on papers in "My library"/Google Scholar Library and not just papers in the Google Scholar profile
Add Google's "About this result" panel, which tells you what is being matched in each search result, to Google Scholar
Bonus feature - filtering full-text matches to citation statements and contexts
1. Google Scholar should have a show references function and "search within referencing articles" function
Researchers love to follow forward and backward citations of relevant papers to find related papers. This "pearl growing" technique with citations is a time-honoured way to explore the literature jungle and helps mitigate the issue of people using different terms/keywords.
I've noticed that in some of the literature, using citations this way is called "citation chasing" or "citation searching", while citation pearl growing is defined to exclude such techniques. Here I use the more commonly seen definition.
Not just citation pearl growing!
It is important to note that tracing citation relationships, or citation pearl growing, is not the be-all and end-all method for finding relevant papers. It's not even the only type of pearl growing technique! An Analysis of Citation Recommender Systems: Beyond the Obvious showed that in practice authors would cite papers that had no connection, or only a very loose one, to the other cited papers.
Papers that were "very loosely connected" to the other cited papers were difficult to find (low recall) using citation-based recommendation methods, including those based on co-citation, bibliographic coupling, collaborative filtering (where citing papers correspond to users and citations correspond to items) and PaperRank (a random walk from seed papers along citation links).
Worse yet, this wasn't a rare problem: almost 50% of cited papers in that study had at most one citation link. However,
about 46% of the papers that were loosely connected (2 or less links) share at least one common author with the seed papers. 60% appear in one of the same venues, 95% share at least one keyword.
This confirms that the general practice of supplementing citation pearl growing by
looking at publications of interesting authors
browsing papers by common keywords
browsing papers in common venues (journal sites)
is important. The paper confirmed this hypothesis by showing that recommendation methods that combine author/venue/keyword with citation techniques did better at retrieving such loosely connected papers.
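To make two of the citation-based measures mentioned above concrete, here is a minimal sketch of co-citation and bibliographic coupling counts on a toy citation graph. The paper IDs and the `references` mapping are made up for illustration, and this is not the method used in the paper discussed above, just the textbook definitions.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of papers it references.
references = {
    "P1": {"A", "B", "C"},
    "P2": {"A", "B"},
    "P3": {"B", "C", "D"},
}


def bibliographic_coupling(refs):
    """Count shared references for every pair of citing papers."""
    scores = {}
    for p, q in combinations(sorted(refs), 2):
        shared = len(refs[p] & refs[q])
        if shared:
            scores[(p, q)] = shared
    return scores


def co_citation(refs):
    """Count how often each pair of cited papers shares a reference list."""
    scores = defaultdict(int)
    for cited in refs.values():
        for a, b in combinations(sorted(cited), 2):
            scores[(a, b)] += 1
    return dict(scores)


coupling = bibliographic_coupling(references)
cocited = co_citation(references)
```

In this toy graph, "A" and "D" never appear in the same reference list, so neither would be recommended from the other by co-citation alone. That is the kind of loosely connected paper that author, venue or keyword signals can help recover.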

Note: In the literature the terms "citations", "references" etc are often used in conflicting ways despite attempts to standardise. In this blog post, I will simply refer to "references" as the citations that appear in the bibliography/reference sections of papers.
Also, a recent citation-based literature mapping tool, ResearchRabbit, has tapped into this common workflow to provide an interface that helps reduce friction when doing citation pearl growing.

ResearchRabbit's "All References" and "All Citations" functions support citation pearl growing
How does Google Scholar compare? We will see how Google Scholar actually supports all these methods of searching except for one glaring exception.
Google Scholar has a great cited by function
Google Scholar famously has a "cited by" function, coupled with a marvellous way to "search within citing articles".


Google Scholar's "Cited by" function and "Search within citing articles" feature
The "search within citing articles" feature leverages what I consider Google Scholar's greatest strength, its full-text index, which allows you to quickly narrow down thousands of citing articles to what you want.

Google Scholar's search alert feature
Given that you can even create search alerts on such a search, what else can you ask for? While competitors like Semantic Scholar allow you to filter citations of a paper by additional metadata fields such as "Field of Study" and "Journal title", I don't think these are critical features, even if Google Scholar bothered to add them.


Semantic Scholar provides additional ways to filter citations
However Google Scholar lacks a way to view all references
What I think Google Scholar really lacks is the opposite function - a function showing all references of a paper.
This is a very common feature in most academic search indexes, from the newer Semantic Scholar, Lens.org and Dimensions (free version) to even Scopus, which was released in the same year as Google Scholar.

Lens.org allows viewing of references of papers
While you can of course manually copy and paste the references of a paper into Google Scholar, why not just display them with a single click in Google Scholar? I presume Google Scholar has that information...

Mockup on how Google Scholar can support views of references
This seems an obvious thing to do, so why isn't this in Google Scholar? Perhaps the worry is this function would lead to an overcrowded search engine results page and perhaps it wouldn't be worth it?
Still I would think being able to do backwards citations as well as forward citations would be amazing.
Google Scholar's help page itself suggests: "If the search results are too specific for your needs, check out what they're citing in their "References" sections. Referenced works are often more general in nature." So why not make it easier?
Now imagine this paired with the equivalent "search within referencing articles" function. I suspect this feature would be very helpful when used on review articles with huge reference lists.

Mockup on how Google Scholar can support "Search within referenced articles"
Part of me notes that including such a feature would bring complications, such as how or whether to present items that are not indexed in Google Scholar. But Google Scholar already handles this for citing items with the [CITATION] tag, so I guess a similar thing could be done for references.
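As a rough illustration of what a "show references" view with a search box might do behind the scenes, here is a small sketch. Everything in it is hypothetical: the `Reference` record, the sample titles and the `show_references` function are invented for this post, and matching is done on titles only, whereas Google Scholar could presumably match against full text.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    title: str
    indexed: bool  # True if the (hypothetical) index holds a full record


# Hypothetical reference list of a single review article.
REFERENCES = [
    Reference("Google Scholar as a tool for systematic reviews", True),
    Reference("Citation analysis in research evaluation", True),
    Reference("Unpublished report on library discovery", False),
]


def show_references(refs, query=""):
    """List references, optionally keeping only those matching every query term.

    References that are not in the index are still shown, but labelled
    with a tag in the style of Google Scholar's citation-only entries.
    """
    terms = query.lower().split()
    lines = []
    for ref in refs:
        title = ref.title.lower()
        if all(term in title for term in terms):
            label = "" if ref.indexed else "[CITATION] "
            lines.append(label + ref.title)
    return lines
```

Calling `show_references(REFERENCES)` lists all three items, while `show_references(REFERENCES, "library")` keeps only the unindexed report, shown with its tag.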
2. Google Scholar should allow search alerts for "Related articles" and base personalized recommendations on papers in "My library"/Google Scholar Library and not just papers in the Google Scholar profile
Google Scholar doesn't make a big deal about it, but it actually has multiple ways of recommending articles.
Firstly it has personalized recommendations based on your Google Scholar profile.

Personalized recommendation in Google Scholar based on your profile
These are great and I find them more on target than practically every alert system I have tested.
For example, the very first one right now is extremely relevant and covers the very topic I have been studying and blogging about for the past few months! However, there are grave limitations, because the recommendations are based on works in your Google Scholar profile. As such, they work great only if I already have a record in those areas (the second recommendation above is clearly based on the topic of one of my papers) but are useless when I am starting off in a new area, which is exactly when I need them!
I've thought of using various workarounds to bypass this, but none seem worth the effort. What Google could do is simply base personalized recommendations on the papers you save/star into "My Library"/Google Scholar Library. Currently, I find the library feature strangely pointless if you already use a reference manager.
Secondly, you have author/profile based recommendations/alerts where you can follow "people"

How this works is, you can go to a specific Google Scholar profile, and choose to get alerts of
New publications by the author
New citations to this author's works
New articles related to the author's research
I have tried this function, and the main issue with these author-based recommendations is that they are often less than useful. Most researchers work in more than one area over their career, so choosing any of these options tends to surface papers that may be totally irrelevant, particularly if you are interested in their past research.
Thirdly, you can create alerts based on citations of individual articles, or even on searches that match terms within the articles citing an individual article, as I have shown above.
This is probably the best recommendation feature in Google Scholar if you want focused recommendations assuming you can't get personalized ones based on your profile.
Is that all?
In fact there is a fourth "recommendation" option, the "related articles" link below each article.

I couldn't find much coverage of this feature, except that it exists and that "It finds documents similar to the given search result." according to the help page.
We do not know how it works. Does it work on citations (explains why some recent papers don't have the option)? Text similarity? Collaborative filtering using user behavior? Possibly it could be similar to the personalized recommendations you get when you have such papers in your profile? Who knows?
In any case, having tried it out in areas I am familiar with, it seems to me that it gives pretty high-quality recommendations, supplementing the earlier methods.
While not knowing how it roughly works doesn't preclude it being useful, I think there is a strange lack here. This is what Google Scholar shows when you click on "Related articles"

Current Google Scholar related articles page
You can't quite see from the screenshot above but this article has over 60 "related articles". So an obvious improvement would be to add the typical Google Scholar facets on the left to filter by year range, sort by relevance/date etc. (See mockup below).
But more importantly, what could make this function even better I think is to allow alerts for related articles.
Currently, I would need to constantly run this related articles function to see if there was anything new.
Instead, adding related articles as an alert would pair nicely with the cited by alerts.
Such a feature would add yet another recommendation avenue based on papers (rather than authors) and would not require you to have published in that area.
All in all here's my mock up of the screen for related articles.

Mockup of improved "related articles" screen for Google Scholar
3. Add Google's "About this result" panel, which tells you what is being matched in each search result, to Google Scholar
As noted in my last blog post, Google enhanced the "About this result" panel, first introduced in Feb 2021, to provide guidance on exactly what Google is matching. Note that this currently seems to be US only.

The "About this result" panel in Google shows you what Google used to match your search query for each result
Google is of course known for tweaking and even overriding what the user types in as a keyword query if it thinks it knows better what you want. It's unclear to me how much of this applies to Google Scholar (I suspect Google Scholar does less of this).
Still, there is no doubt that Google Scholar can and does tend to include synonyms or "terms related to the keywords" you enter.
There are various things you can do in Google to try to prevent this, but the most straightforward one, "verbatim mode", doesn't exist in Google Scholar.
As such, it would be great to know what related words Google Scholar is throwing in and to remove them when necessary.
This is why the "About this result" panel in Google might be useful to include in Google Scholar as well, as it shows you what related words are being matched.

A Google search and the resulting "About this result" panel for one result
Bonus feature - filtering full-text matches to citation statements and contexts
This last bonus feature I suspect is a much bigger "Ask" than the ones above, but I believe it would make a huge difference....
As mentioned above, one of the major advantages of Google Scholar over almost all other academic search engines is that it can access the full text of most papers. It is believed that publishers give its crawlers almost full access to index full text behind paywalls. Given how dominant Google Scholar is, any publisher that did not do this would fall behind its competitors in the visibility stakes, which is why almost everyone does.
One of the most powerful features resulting from this, though seldom mentioned, is that Google Scholar can not only match obscure search terms in the full text but also show the matches in the search results.

Full text matches in Google Scholar
This is often insanely useful. In the above example, you can quickly see that in the second and third results the words "Wikipedia" and "Singapore" are matched mostly because of the authors' affiliations, while the first result is really about a Wikipedia article on Singapore.
Only Google Scholar can do this well, because it has most of the full text. The soon-to-be-closed Microsoft Academic, and the corresponding data in Microsoft Academic Graph, does not seem to match full text at all. I believe that's because it only uses the full text to extract citation sentiments and perhaps other metadata needed for machine learning.
So Google Scholar is the only one with almost complete full text that shows matches in the search result snippets. Case closed, right?
I do think Google Scholar might be missing a trick. As hard as it is to believe, the fact that Google Scholar matches full text can sometimes be a weakness, particularly when there is no way to control where the full-text match occurs.
While many search engines like EuroPMC allow you to control in which paper section the match can occur, I think there is an even more powerful way to filter full-text matches.

EuroPMC advanced search allows matches in sections of papers
What am I talking about? I am talking about matches in the citation context of papers.
This is of course, the same feature as scite's new citation statement search feature which I blogged about last month.
Note that since I blogged about this, the search has become faster and now supports Boolean operators, brackets defining groups, exact searches, wildcards, fuzzy term operators and pretty much everything supported by Elasticsearch.
The idea is this: instead of matching your keywords over the full text (which scite does not store anyway), this feature restricts your search to text in the "citation statement", plus one sentence before and after it (together, the citation context).
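To make the idea of a citation context concrete, here is a minimal sketch of how one might pull out the citing sentence plus one sentence on either side, and then search only within those windows. This is my own toy illustration, not scite's actual pipeline: the citation marker pattern, the naive sentence splitter and the sample text are all assumptions.

```python
import re

# Assumption: citations look like "(Author, 2020)" or "[12]"; real papers vary widely.
CITATION_MARKER = re.compile(r"\([A-Z][^()]*\d{4}\)|\[\d+\]")

# Made-up sample text for illustration.
SAMPLE = (
    "Search tools differ in coverage. "
    "Google Scholar is not recommended as a standalone tool for systematic reviews [12]. "
    "Other databases should be added. "
    "The authors thank their colleagues. "
    "We use Wikipedia data from Singapore."
)


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def citation_contexts(text):
    """Return each citing sentence with one sentence before and after it."""
    sentences = split_sentences(text)
    contexts = []
    for i, sentence in enumerate(sentences):
        if CITATION_MARKER.search(sentence):
            start = max(0, i - 1)
            contexts.append(" ".join(sentences[start:i + 2]))
    return contexts


def search_contexts(text, query):
    """Keep only the citation contexts that contain every query term."""
    terms = query.lower().split()
    return [c for c in citation_contexts(text) if all(t in c.lower() for t in terms)]
```

With the sample text, a search for "systematic standalone" returns the one citation context, while "wikipedia singapore" returns nothing, because those words appear only outside any citation context. A plain full-text match would have hit both.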

Citation statements in scite
On the face of it, this doesn't seem to be a major improvement. However, once you have tried it and compared it with Google Scholar, you will realise it is quite a groundbreaking one.
I think the main thrust is this. Not all the full-text is equally important to match, but the citation statements (and contexts) generally are statements that make claims, discuss or distinguish findings of papers.
This often leads to a situation where such searches almost give you an instant answer. In the example below, I did the following search in scite's citation statement search
google scholar suitable systematic search
and a nice paper pops up that is not only relevant but instantly tells you the answer: nope, it is probably not advised as a standalone tool.

Citation statement search - is Google scholar suitable for systematic review
This isn't cherry-picking. In my older blog posts, I show many examples, such as entering
a) the name of a theory, e.g. Terror Management Theory
b) the name of a dataset, e.g. Datastream
c) or any topic at all, e.g. rohingya hypertension, chromosome missegregation rate hela, p-hacking psychology, williamsburg rents rise
and it produces stunningly good results where you can almost read off the answers from the context matched.
For theory names, e.g. Terror Management Theory, you would typically get examples of how different papers defined the theory or how they used or tested it.
For dataset names, e.g. Datastream, you would get statements of how the datasets were used or collected etc.
This is not to say Google Scholar gave bad results. In fact, it usually gave very relevant results, sometimes even the same papers cited in scite's citation statements! However, you often had to download the paper and read the full text to get a sense of the research.
Below shows the same search in Google Scholar.

Google Scholar search - google scholar suitable systematic search
While the first and third results were very relevant, Google Scholar was matching some of the terms in the title and some in the full text, so the snippet displayed was less useful than scite's snippet, which came entirely from the full text, and not just any full text but the citation statement.
As scite themselves put it about Google Scholar:
The results presented are small snippets of the text where the word occurs or simply the metadata of the publication. These results tend to lack relevance and precision because the keywords a simply matched anywhere in the text and it can be very hard to understand if the search results are going to be useful unless you got in and purchase the full text article.
In comparison, scite's
Citation Statement search shows you exactly why and where a keyword or phrase was used by showing the entire citation context including not only fully complete surrounding sentences ... by presenting results in context, you are presented with much more relevant and contextualized search results that can ensure you are not wasting time or money investigating irrelevant research.
The question is should Google Scholar try to parse out citation sentences? The effort probably isn't trivial given the mass of full text they have.
Still, I've been studying and exploring academic discovery almost since I started as a librarian, and this innovation, small as it seems, feels like one of the most impactful steps I have seen in recent years.
Matching citation statements rather than just metadata and full text alone often gives better results, and even when the results are not outright better, they are different enough to be worth using.
The other thing to note is that scite currently has only limited full text to extract citation statements from: open access papers and papers from publishers such as Wiley that it has partnerships with (currently 900m citation statements). Already the results are quite stunning.
Now imagine if Google Scholar with all their full text started to do the same......
Conclusion
I've suggested 3 "small" feature improvements and 1 big "Ask" that I wish Google Scholar would do. Of course, while from the user point of view these feature updates seem small, it probably is not a small undertaking to actually code them in.....

