A Deep Dive into EBSCOhost's Natural Language Search and Web of Science Smart Search - Two bundled "AI-powered" search (II)

Web of Science Smart Search - unique transparent hybrid search

Jul 14, 2025

In the first part of this series, I covered EBSCOhost’s new Natural Language Search (NLS), which uses a Large Language Model (LLM) to expand a user's input query into a Boolean search query that is then run over the conventional search system. In this article, I will focus on Web of Science’s Smart Search, first launched in April 2025. Similar to the offering from EBSCOhost, this is bundled with your product at no additional cost. Also, like EBSCOhost NLS, it focuses only on improving your search results, particularly when you type in natural language, and does not generate direct answers using retrieval augmented generation.

It is important not to confuse Web of Science Smart Search with the paid add-on, Web of Science Research Assistant, which does retrieval augmented generation. See my review of it here.

How Web of Science Smart Search works

Web of Science's Smart Search, like any “AI powered search” today, aims to simplify the search experience by allowing users to input queries in natural language, much like they would describe them, rather than requiring complex Boolean operators and syntax.

But what exactly does it do?

First, here are some minor features mentioned in the press release:

Typeahead (which if you squint is almost by definition “AI-powered”) is hardly a wow feature. Query and abstract translation is more interesting, but at its current stage it supports only Chinese, which I guess is handy when searching for COVID-related topics and the like.

The more detailed support pages also suggest this mode handles typos.

But how does Semantic Search work? We are told:

Semantic search enables users to effortlessly expand search criteria to ensure comprehensive results. Smart Search automatically finds semantically similar terms and incorporates them into the user’s query to locate content beyond the entered keywords.

and

Natural language querying facilitates more efficient search. Users can enter queries as they would describe them rather than worrying about Boolean operators and complex search syntax.

also, interestingly

Combined searches for papers and people simplify the user experience by including both papers and researcher profiles in results.

But let’s focus on the “papers” side of things in this review.

The core of Smart Search, however, is its hybrid (or combined) approach that combines two distinct search methods: a traditional Boolean search and a semantic search. This "combined search" functionality runs two or more search methods simultaneously.

While a hybrid search that uses both keyword search and semantic search, so that each compensates for the other's weaknesses, is a common setup, the interface also allows you to toggle between the combined results and the individual "Boolean" and "Semantic" results, which is rarely seen.

But what does “Semantic” here mean?

Academic search that incorporates “semantic” search these days almost certainly means at least one of two things (or both):

a) it uses an LLM to convert natural language input to a Boolean Search (e.g. Web of Science Research Assistant)

b) it uses a (dense) embedding/vector based method to match results (e.g. part of Scopus AI)

In a simplified picture of the second approach, both the query and the documents are run through an embedding model to generate vector representations (embeddings), and a comparison function such as cosine similarity is used to calculate a relevance or closeness score.
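To make the idea concrete, here is a minimal Python sketch of embedding-based matching. The three-dimensional vectors are made-up toy values standing in for the output of a real embedding model, and nothing here reflects how Web of Science actually implements its search.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": hypothetical values, not produced by any real model.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "doc_on_topic": [0.8, 0.2, 0.1],
    "doc_off_topic": [0.0, 0.1, 0.9],
}

# Score each document against the query, then sort from closest to farthest.
scores = {doc_id: cosine_similarity(query_vec, vec) for doc_id, vec in doc_vecs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

The document whose vector points in nearly the same direction as the query vector ends up first in `ranked`, even though no keywords are compared at any point.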

Unfortunately, looking at the existing documentation around this feature, I don’t see any mention of “vector search” or “embedding search”, terms that are typically mentioned, if only in the detailed support pages. It is confusing what is actually going on.

It sounds like there is some query interpreter that does NER (Named entity recognition) and the part about “used to generate a structured Web of Science query” sounds like it is generating a Boolean Search.

The examples at the end of the detailed support pages don’t clarify it for me either. I can’t tell if the semantic part is just LLMs (or some other “AI” NLP parser) helping to generate a Boolean search query or if this is just the Boolean part and the Semantic part is using embeddings to match.

That said, while Web of Science documentation doesn't explicitly use the terms "vector search" or "embedding search," it does state that "AI-driven technology is used to interpret the meaning and context of the search" and that Smart Search "automatically finds semantically similar terms and incorporates them into the user’s query to locate content beyond the entered keywords." This, combined with the fact that the system can handle multilingual searches, strongly suggests that an embedding-based approach is at play. Many modern embedding models are multilingual, which would explain the ability to match queries across languages.

What do the filter options mean?

While hybrid searches are not unusual now, this is usually done under the hood with no way to distinguish result sets from the different search methods. One of the most unique aspects of Web of Science Smart Search is the ability to filter and split results by "Boolean," "Semantic," and "Combined Semantic and Boolean."

This is a feature I have always wished to be in the interface, as it allows for a more nuanced analysis of search results.

For example, a user could first examine the standard keyword results and then, with a simple filter change, view the additional results retrieved through the semantic search (the complementary method).

Looking at the default “Combined Semantic and Boolean” results in this interface, you can see that some entries are labelled “Semantic Search result” while others are not labelled.

How do you interpret this tag?

While the documentation does not clarify, I believe the "Semantic Search result" label is applied only to results that are unique to the semantic search and not also found in the Boolean search.

I came to this conclusion because my tests show the Semantic Search results never overlap the Boolean results! This is unlikely to happen unless the search already pre-filters the Boolean set away when showing the Semantic results.
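The behaviour I am inferring can be sketched in a few lines of Python. This is my hypothesis about the filter logic, not documented behaviour; the record IDs and the function name `split_hybrid_results` are invented for illustration.

```python
def split_hybrid_results(boolean_ids, semantic_ids):
    """Return (boolean_view, semantic_view, combined_view) under the hypothesis
    that the Semantic view hides anything the Boolean search already found."""
    boolean_set = set(boolean_ids)
    # Keep the semantic order, but drop records the Boolean search also retrieved.
    semantic_only = [rid for rid in semantic_ids if rid not in boolean_set]
    # Combined view: every record once; only semantic-only records carry the label.
    combined = [(rid, False) for rid in boolean_ids]
    combined += [(rid, True) for rid in semantic_only]
    return list(boolean_ids), semantic_only, combined

boolean_view, semantic_view, combined_view = split_hybrid_results(
    ["A", "B", "C"],       # hypothetical Boolean hits
    ["B", "D", "A", "E"],  # hypothetical raw semantic hits
)
```

Under this assumption the two filtered views can never overlap, which matches what I observed. The combined list here is simply concatenated and makes no attempt to model the real relevance ranking.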

Another clue is to notice how the combined ranking works.

Take the following entry that is ranked #1 in the default “Combined Semantic and Boolean” relevancy ranking.

Yet when I filtered to Boolean, it was ranked only #9.

If the combined ranking is based on both lists, then you would expect the semantic search not only to have retrieved this article but also to have ranked it higher than 9th, in order to bump it up to #1 overall.

In fact, #1 is a recent systematic review on the topic, and I fully agree with its #1 ranking.

But when I restrict to the Semantic filter, this article does not appear at all! Most probably this is because that filter shows only results unique to the semantic search.
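Web of Science does not document how it merges the two rankings. Purely to illustrate the reasoning above, here is a sketch using reciprocal rank fusion (RRF), a common way to combine ranked lists. The choice of RRF, the constant `k=60`, and all document IDs are my assumptions, not anything confirmed by the vendor.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of IDs: each list contributes 1 / (k + rank) per item."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical Boolean ranking: the systematic review sits at position 9.
boolean_ranking = [f"b{i}" for i in range(1, 9)] + ["review", "b10"]
# Hypothetical raw semantic ranking (before any de-duplication): review is first.
semantic_ranking = ["review", "s2", "s3"]

fused = reciprocal_rank_fusion([boolean_ranking, semantic_ranking])
boolean_only = reciprocal_rank_fusion([boolean_ranking])
```

In this toy setup the review tops the fused list only because the semantic list also ranks it highly; on the Boolean list alone it stays in 9th place. That is why its absence from the Semantic filter points to de-duplication rather than to the semantic search having missed it.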

Boolean Search result

Basic testing indicates that filtering to "Boolean" in Smart Search yields the same number of results as a conventional keyword search in Web of Science.

For example, the query

"impact of climate change on biodiversity"

in the normal Web of Science (Core Collection only) returns 20,190 results for my institution. The exact same search in Smart Search, when filtered to "Boolean," provides the same number of results, and the order is largely similar, demonstrating the reproducibility of Smart Search.

Semantic Search result

While not explicitly confirmed, several factors point to the semantic search being an embedding-based search.

First, unlike other systems that use LLMs to generate Boolean queries (such as EBSCOhost NLS, Scopus AI, and even Web of Science Research Assistant itself), the Smart Search interface does not display the converted Boolean search query.

Second, for most queries, the semantic search returns a maximum of 100 results. This is characteristic of embedding-based methods, which are designed to efficiently find the top K closest results.
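A fixed cap like this is what you would expect from top-K retrieval. Here is a minimal sketch, assuming similarity scores have already been computed. The scores are randomly generated placeholders, the corpus size is arbitrary, and K = 100 mirrors the cap I observed rather than any documented setting.

```python
import heapq
import random

def top_k(scores, k=100):
    """Return the k (doc_id, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

random.seed(0)  # fixed seed so the toy data is reproducible
# Placeholder similarity scores for a hypothetical corpus of 20,000 documents.
corpus_scores = {f"doc{i}": random.random() for i in range(20_000)}

results = top_k(corpus_scores, k=100)
```

However large the corpus, the result list never exceeds K, unlike a Boolean search, which returns every matching record.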

The multilingual capabilities of Smart Search also support the embedding theory. While an LLM could potentially translate a query to run in both English and Mandarin, a multilingual embedding model covering many languages would be a more seamless way to match queries across languages simultaneously.

Lastly, in part 1, I showed that EBSCOhost’s NLS failed to match a certain known relevant paper with the query “Is there an Open Access Citation Advantage” because the constructed Boolean search strategy was not loose enough to match a paper that lacks the word “advantage”.

Yet Web of Science Smart Search successfully located the desired paper as a "Semantic Search only result," although it was ranked low.

Furthermore, when filtering to "Semantic Search," which I believe shows only semantic search results, other relevant results can be found, demonstrating the potential of this approach to uncover valuable studies that might be missed by a traditional keyword search. (Note: there are only a few studies that use RCTs, so the semantic search added a lot of value.)

Conclusion

In conclusion, Web of Science Smart Search is a valuable addition to the landscape of AI-powered academic search tools. It shares similarities with other tools in its use of natural language processing and a hybrid search model that combines traditional Boolean logic with modern semantic search capabilities.

However, Smart Search distinguishes itself with a few unique features. The most notable (and the one I wish others would copy) is its transparent and filterable hybrid search results. The ability for users to toggle between "Combined Semantic and Boolean," "Boolean," and "Semantic" results provides a level of insight into the search process that is not commonly found in other tools.

This transparency allows researchers to see the distinct contributions of both search methods and understand how they are being combined to produce the final ranking.

Aaron Tay's Musings about Librarianship
