A Google Scholar trick - combining citations from multiple items and filtering by keyword in a single search!
I recently received an interesting question: given multiple items (articles, books, etc.), how can you efficiently identify all the citations of these items, combine them, dedupe them, and filter the list with arbitrary keywords in as few steps as possible?
In this blog post, I share an unofficial trick I found for doing this in Google Scholar, which may be particularly helpful if you are looking for citations of classic or seminal books. I also briefly mention other indexes and search engines that let you do this quickly and easily.
Introduction
I was recently looking into systematic reviews and read that citation searching of included papers is considered "good practice", so knowing how to do this efficiently for multiple papers seems worthwhile.
But how can this be done?
My first thought was that this can indeed be done in well-known abstracting and indexing databases like Scopus and Web of Science. For Scopus, for example, this involves
1. Searching for the items in question and adding them to a saved list.
2. Going to the saved list and clicking "view cited by" and doing further filtering.
A similar method exists for Web of Science via the "marked list" and "Citation Report" functions.
While I can't figure out how to do this natively in Lens.org, a tool called Citation Chaser exists.
What about Google Scholar?
But how can you do this for Google Scholar?
Google Scholar is of particular interest because its index is unrivaled in size. In particular, it handles books very well, and with its ability to group different variants together it is probably the only index that works well for citations of books (the Web of Science Book Citation Index is generally not as good in comparison) as well as articles.
EDIT: Besides the method below, the other best method is to get Harzing's Publish or Perish software (version 8) and use the "Retrieve citing works" function. This is limited to the top 1,000 results. You can work around this limit to some extent by partitioning your search by year ranges using the new duplicate current search function, but eventually Google Scholar will throw a fit and the captchas will drive you nuts if you have, say, 10k results.
In the example below, I looked for citations to C. Wright Mills' book "The sociological imagination", followed by filtering those citations by the keyword <Singapore or "Hong Kong">
You can of course also search within citations using the usual operators like author:<name>, intitle:<title> etc but here I keep it simple.

Cited by feature in Google Scholar

Search within citing articles in Google Scholar
This is probably one of my favourite features in Google Scholar since unlike competing products this searches the full text and not just the metadata. But the question remains how can you do the same but for multiple cited items? Say you wanted to also include citations to Putnam's "Bowling alone: The collapse and revival of American community" before filtering to items with the search terms <Singapore OR "Hong Kong"> ? How can you do this efficiently in one Google Scholar search?
I used two books as an example here, but any of the items indexed in Google Scholar e.g. article, conference proceeding can be used as well.
Are you destined to do each search separately, somehow bulk download each list of citations (possibly using Harzing's Publish or Perish, but good luck trying that for >1,000 citations) and then combine and dedupe?
Alternatively, you could individually "star" all the citing items to add them to your saved library and filter from there, but good luck doing that if there are 20,000 citations, as you can't bulk "star" all results, or even all results on a page.
Hacking Google Scholar "cites" parameter
Let's look closely again at the URL above that allows you to search within citations of the book. It looks like this
https://scholar.google.com/scholar?hl=en&as_sdt=2005&sciodt=0%2C5&cites=17209805240088812907&scipsc=1&q=Singapore+OR+%22Hong+Kong%22&btnG=
You can guess from the URL structure that the latter portion (the q= parameter, in blue) is the keyword search, i.e. Singapore OR "Hong Kong" within the citations.
Even more interesting is the portion that says &cites=<random string>. Could this random string be an identifier for the cited item? Indeed it is. To be more precise, this ID, 17209805240088812907, refers to the work that Google Scholar has identified as the book The Sociological Imagination, under which it has grouped 28 variants at the time of writing.
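To make the URL structure concrete, here is a minimal sketch that pulls the two interesting parameters out of the URL above using Python's standard library. It only parses the string locally and makes no request to Google Scholar. The interpretation of cites and q is the one guessed at in this post, not documented behaviour.

```python
from urllib.parse import urlsplit, parse_qs

# The search-within-citations URL copied from the post.
url = ("https://scholar.google.com/scholar?hl=en&as_sdt=2005&sciodt=0%2C5"
       "&cites=17209805240088812907&scipsc=1"
       "&q=Singapore+OR+%22Hong+Kong%22&btnG=")

# parse_qs decodes percent-escapes and '+' and returns a list of values per key.
# keep_blank_values=True keeps the empty btnG parameter as well.
params = parse_qs(urlsplit(url).query, keep_blank_values=True)

cites_id = params["cites"][0]   # the identifier of the cited (grouped) item
keywords = params["q"][0]       # the keyword filter applied to the citing items

print(cites_id)   # 17209805240088812907
print(keywords)   # Singapore OR "Hong Kong"
```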

28 variants of the book The Sociological Imagination grouped by Google Scholar under one identifier
Now let's do the same search but for Putnam's "Bowling alone: The collapse and revival of American community" and search within the citations of this book with the same keywords as before

Search within citing articles of Putnam's Bowling alone
This is the URL we get and I will color code the string in the same way
https://scholar.google.com/scholar?hl=en&as_sdt=2005&sciodt=0%2C5&cites=125483733959826388&scipsc=1&q=Singapore+OR+%22Hong+Kong%22&btnG=
The pattern now becomes clear. The string in red is indeed some ID referencing the item in Google Scholar.
But how do we do the same search but reference two different IDs at the same time? Let's be clear, we want to
1. Find all items citing "The sociological imagination"
2. Find all items citing "Bowling alone"
3. Combine the two sets of citing items and dedupe
4. Search within the combined set from step 3 with the keywords Singapore OR "Hong Kong".
All in one search.
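For contrast, this is roughly what the four steps look like if you do them by hand on exported lists. It is a minimal sketch under stated assumptions: the record layout (an "id" and a "text" field), the function name combine_and_filter and the sample records are all made up for illustration, and real exports would need a more careful dedupe key than a ready-made id.

```python
def combine_and_filter(citing_a, citing_b, keywords):
    """Union two lists of citing records, dedupe by id, keep keyword matches."""
    merged = {}
    for record in list(citing_a) + list(citing_b):
        # setdefault keeps the first copy of each id, which dedupes the union.
        merged.setdefault(record["id"], record)
    lowered = [k.lower() for k in keywords]
    # Keep a record if any keyword appears in its text (case-insensitive OR).
    return [r for r in merged.values()
            if any(k in r["text"].lower() for k in lowered)]


# Hypothetical sample data, not real Google Scholar results.
citing_mills = [
    {"id": "p1", "text": "Youth culture in Singapore"},
    {"id": "p2", "text": "Class and status in Britain"},
]
citing_putnam = [
    {"id": "p1", "text": "Youth culture in Singapore"},
    {"id": "p3", "text": "Civic life in Hong Kong"},
]

result = combine_and_filter(citing_mills, citing_putnam, ["Singapore", "Hong Kong"])
print([r["id"] for r in result])   # ['p1', 'p3']
```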
Let's try the logical thing and just append the second ID to the first, separated by a comma. In other words
https://scholar.google.com/scholar?hl=en&as_sdt=2005&sciodt=0%2C5&cites=17209805240088812907,125483733959826388&scipsc=1&q=Singapore+OR+%22Hong+Kong%22&btnG=

Searching within citing articles of two cited items in Google Scholar
And this is the answer!
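If you would rather not edit URLs by hand, here is a minimal sketch that assembles such a URL from a list of identifiers. The function name cited_by_search_url is made up, and the fixed parameters (hl, as_sdt, sciodt, btnG) are simply copied from the URLs in this post. Whether Google Scholar needs all of them is an assumption, and the whole trick is unofficial. The code only builds a string and sends nothing.

```python
from urllib.parse import urlencode


def cited_by_search_url(cites_ids, query=None):
    """Build a Google Scholar 'cited by' URL for one or more cited-item IDs."""
    params = {
        "hl": "en",
        "as_sdt": "2005",
        "sciodt": "0,5",
        # The trick from the post: join several identifiers with commas.
        "cites": ",".join(str(i) for i in cites_ids),
    }
    if query:
        # scipsc=1 plus q restricts the search to the citing items.
        params["scipsc"] = "1"
        params["q"] = query
    params["btnG"] = ""
    # safe="," leaves the commas unescaped, as in the hand-edited URL.
    return "https://scholar.google.com/scholar?" + urlencode(params, safe=",")


url = cited_by_search_url(
    ["17209805240088812907", "125483733959826388"],
    'Singapore OR "Hong Kong"',
)
print(url)
```

Passing a longer list of identifiers works the same way, and leaving out the query gives the combined list of citing items without a keyword filter.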
Adding more than two cited identifiers to the search in Google Scholar
By the way, nothing restricts you to just two cited identifiers. You can add even more, separated by commas.
For example, below are three cited item identifiers and their respective lists of citing items
a) Items that cite the first identifier in Google Scholar
b) Items that cite the second identifier in Google Scholar
c) Items that cite the third identifier in Google Scholar
Try this Google Scholar search which combines cited items of all 3 identifers.
To refine this combined list of citing items further, append
&scipsc=1&q=<normal Google Scholar search syntax>
For example, to filter the combined list of citing items with the keyword University, try
Does the answer make sense?
Looking at each search query separately, we can see you get
a) "about 2,200 results" for the first search based on the first cited item + keyword filter
b) "about 7,830 results" for the second search based on the second cited item + keyword filter
Compare this with the result we got by the hack that attempts to do this together which is
c) "about 9,930 results" for the combined search based on both cited items + keyword filter
Given a + b = 10,030, the answer of "about 9,930" seems reasonable enough if we assume there are some duplicates.
That said, given that all the queries are returning "about x results", I wouldn't put too much stock in the precise numbers, but at least they look to be in the right ballpark.
To further check on this method, I have done some testing where the search query returns smaller numbers and precise counts and done some individual checking and so far it works as expected.
Yet another test of the method
Compare, say, the search below, which looked at items citing a paper of mine and filtered them by the keyword Singapore. It comes to 14 hits.

and here's another search of another cited item also filtered to Singapore which yields 5 hits

So what happens when we combine this using the hack/trick?

You get 18 hits, one less than summing both searches individually. When I did a check of the returned results of both searches, I found they had one hit in common, which explained the difference.
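The arithmetic behind this check is plain inclusion-exclusion: the combined count should equal the sum of the two individual counts minus the number of hits they share. A minimal sketch follows. The function names are made up, and the second call uses the approximate "about" figures from the earlier example, so that result is only indicative.

```python
def expected_combined_hits(hits_a, hits_b, overlap):
    """Size of the deduped union of two result sets (inclusion-exclusion)."""
    return hits_a + hits_b - overlap


def implied_overlap(hits_a, hits_b, combined):
    """Number of shared hits implied by the three observed counts."""
    return hits_a + hits_b - combined


# Small test from the post: 14 and 5 hits with one hit in common.
print(expected_combined_hits(14, 5, 1))    # 18

# Earlier example, using the approximate counts reported by Google Scholar.
print(implied_overlap(2200, 7830, 9930))   # 100
```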
Alternative method?
If you are not comfortable with hacking the URL, you can also try a straight Google Scholar search like this
<keyword> AND <"title of cited item 1"> OR <"title of cited item 2">, e.g.
Singapore OR "Hong Kong" AND "The sociological imagination" OR "Bowling alone: The collapse and revival of American community"
*Note: Google actually supports an implied AND, so strictly speaking you should not include the AND operator in the search, but in most cases adding it will not make any difference.
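Here is a minimal sketch that assembles this kind of query string, with the AND left implied, and turns it into a search URL. The function name title_search_query is made up. How Google Scholar groups OR relative to the implied AND is not documented in this post, so treat the grouping as an assumption and check the results.

```python
from urllib.parse import urlencode


def title_search_query(keywords, titles):
    """Join keywords with OR, then quoted titles with OR (AND is implied)."""
    def quote_phrase(term):
        # Only multi-word keywords need quotes to be searched as a phrase.
        return f'"{term}"' if " " in term else term

    keyword_part = " OR ".join(quote_phrase(k) for k in keywords)
    title_part = " OR ".join(f'"{t}"' for t in titles)
    return f"{keyword_part} {title_part}"


query = title_search_query(
    ["Singapore", "Hong Kong"],
    ["The sociological imagination",
     "Bowling alone: The collapse and revival of American community"],
)
print(query)
print("https://scholar.google.com/scholar?" + urlencode({"q": query}))
```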
Why the alternative method may often work
This works by exploiting the fact that most items have indexed full text in Google Scholar and this includes the references. So when you search <"title of cited item 1">, Google Scholar is usually able to see the title somewhere in the reference section and pick that up.
However there are some cons with this method.
The main issue is using quotes around the cited item. Determining whether a reference is a cite to another item is often very tricky. A straightforward match on the quoted title string usually works if the title appears in the reference, but spacing, punctuation marks, encoding issues or even slightly different variant titles (which may or may not be a citing error) will often cause the full text search to fail, so this method misses a citing item.
It is likely that Google Scholar, like other citation indexes, uses cited-by algorithms that are more sophisticated than simply matching the exact quoted title (some sort of normalization plus fuzzy matching), which increases recall of citing items.
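To illustrate the difference, here is a toy sketch of "normalization plus fuzzy match" using only Python's standard library. It is emphatically not Google Scholar's algorithm, which is not public. The function names, the 0.85 threshold and the sample reference strings are all assumptions chosen for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents, and collapse punctuation and spacing."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def cites_title(reference, title, threshold=0.85):
    """Guess whether a reference string cites the given title."""
    ref, target = normalize(reference), normalize(title)
    if target in ref:
        return True
    # Slide a window as long as the title (in words) over the reference
    # and accept the reference if any window is similar enough.
    ref_words, n = ref.split(), len(target.split())
    windows = (" ".join(ref_words[i:i + n])
               for i in range(max(1, len(ref_words) - n + 1)))
    return any(SequenceMatcher(None, w, target).ratio() >= threshold
               for w in windows)


title = "Bowling alone: The collapse and revival of American community"
# Hypothetical reference with different case, punctuation and "&" for "and".
reference = ("Putnam, R. D. (2000). Bowling Alone - the collapse & revival "
             "of American community. Simon & Schuster.")
unrelated = ("Mills, C. W. (1959). The sociological imagination. "
             "Oxford University Press.")

print(title in reference)               # exact quoted match: False
print(cites_title(reference, title))    # normalized fuzzy match: True
print(cites_title(unrelated, title))    # unrelated reference: False
```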
Related to this, the trick above references identifiers (e.g. 17209805240088812907) that refer to grouped items, which lets you pull all citations to the grouped entity rather than to the individual variant represented by the quoted title string.
This can be both good or bad depending on how accurate the grouping is though.
On one hand, if the cited item you are looking for has slightly different titles in its different formats, say from preprint to published paper (or, in the case of a book, different imprints), and Google Scholar is able to group them correctly, then the hack which references the identifier will be able to catch all citations, even those that cite variants.
On the other hand, if Google Scholar is wrongly grouping items that should not be grouped together (which does happen), the citations from those items will be wrongly considered in the search.
The main con of the trick or hack I suggest above is that it is purely unofficial. I have found that using it too often will cause Google to start throwing a lot of captchas, perhaps because it thinks you are up to some suspicious business. So use it in moderation and at your own risk!
Conclusion
This is an unusual little Google Scholar trick that might be useful. But I cannot vouch for its reliability except to say I have tried a couple of cases, and they generally work as expected, at least by Google standards.
I also do not claim to be the first to discover this. The moment I got it to work, I had a vague feeling of déjà vu, as if I had encountered this trick before, but I can't seem to remember or find when or where.
If you have tried this, what is your experience so far?

