Microsoft Academic Preview - Microsoft's new rankings of journals, institutions, authors & conferences
It seems like only a short while ago, in May last year, that I blogged about the new iteration of Microsoft's academic search, dubbed "Microsoft Academic", which was then in beta and eventually left beta at the end of the year. But it seems they are at it again: this time there is a Microsoft Academic Preview.

At first glance this looks like just a run-of-the-mill facelift.

Don't get me wrong, overall I think this is an improvement. For example, the collection of citations to export now sits at the top right, which is more in line with what we see in many academic databases, and I like the new display of the "topics" (the test tube icon) below each article, allowing you to go to the topic page for that topic.
Clicking on, say, the "Corporate Transparency" icon brings you to the topic page for Corporate Transparency.

From the topic page, you can see top authors, journals, conferences and institutions in that topic. You can filter by year if you wish as well.
It also serves as a topic browser: you can browse to parent topics and related topics.
It's an interesting approach. In a typical database search, I would expect clicking on a topic to further narrow the search to an even smaller set containing only articles on that topic, but here it brings you to a brand-new topic page for that topic.
But beyond that, I also noticed by accident that Microsoft Academic is providing analytics pages.
Of these, the analytics for journals, conferences, institutions and authors provide rankings, which makes them very interesting.
I have some interest in bibliometrics but cannot be considered a bibliometrician, so I will just mention some of the things I noticed and leave the rigorous evaluation to the real experts.
The analytics pages with rankings are all pretty similar, so I will review the ones of most interest to me: journal analytics, institution analytics and author analytics.
Journal Analytics

Like most of the analytics pages, the journal analytics page allows you to use the topic browser to browse down to lower-level topics.
The main issue I have with the topic browser is that while there is a "filter topic" field, it is pretty useless.

The filter by topic only matches the top-level topics
I was expecting "filter topic" to be able to search through all the topics, but sadly it only matches the topics already showing.
So, for example, if you searched for "Library Science" or "Accounting", it would show nothing.
This means you have to find those topics by trial and error. For the record, to get to the Library Science topic you need to click on Computer Science, while the Accounting topic is under the main topic Business.

To get to the topic "Library Science", you need to know to filter to "Computer Science" first.
More troubling is that while similar journal rankings by Google, Clarivate and Scopus have at best two levels and are easy to figure out, I've found examples in Microsoft Academic where the hierarchy goes down six levels from the top (see the example below). Good luck finding such a topic without a search across all topics.

Topic: Private equity secondary market, six levels below the top. At its peak this topic generated 60-80 papers annually
Like the other analytics pages with rankings, you can choose to rank by citations, number of publications, H-index, or something called "Rank", which I have no idea how it is calculated.

You can also restrict rankings to 1/5/10/all years.
If you look carefully, there is an asterisk next to the citations option. Looking at the FAQ, I see this:

Yes, citations in Microsoft Academic are estimates, and in fact this isn't new to this preview version. But when these estimated citations are used for rankings, I can imagine it leading to some questions.
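As an aside, since H-index is one of the ranking options here, it may help to recall how it is computed from per-paper citation counts. A minimal sketch in Python (my own illustration, with made-up numbers):

```python
def h_index(citation_counts):
    """Largest h such that at least h papers have h or more citations each."""
    counts = sorted(citation_counts, reverse=True)
    return sum(1 for rank, c in enumerate(counts, start=1) if c >= rank)

# Made-up example: 3 papers have at least 3 citations each,
# but there aren't 4 papers with at least 4 citations, so h = 3.
print(h_index([10, 8, 3, 2, 1]))  # -> 3
```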
Journal rankings in Microsoft Academic vs Google Scholar Metrics
Google Scholar of course famously released Google Scholar Metrics in April 2012 and has generally continued to update the list yearly. How similar or different are the two?
Both require you to guess where, say, Library Science would fall. In Google's case, it (or rather "Library & Information Science") is placed under the main category "Social Science".
Google Scholar Metrics has been critiqued for including non-journals like RePEc, SSRN and arXiv in the listing. Playing around with topics like Economics and Physics, we see that Microsoft Academic's journal rankings include these non-journals as well, and they tend to rank high when restricting to more recent periods (as expected).
For example, you can find SSRN at #1 in the Economics ranking (1 year) and arXiv in Physics, though to be fair, as with the Google Scholar Metrics rankings, it isn't the whole repository that is counted but subsets of it, e.g. the Economics network in SSRN.
A difference is that Google Scholar Metrics gives you rankings by language; as of writing this covers about a dozen languages including English. Rankings by language aren't available in Microsoft Academic.
To be fair, it isn't necessarily true that Microsoft Academic indexes only English journals; a quick search suggests some Spanish journals are indexed as well. There just isn't a ranking by language.
On the other hand, Google Scholar Metrics is updated only once a year. I could be wrong, but I get the impression Microsoft Academic's rankings are updated in near real time. Also, you get a ranking of 100 journals per topic as opposed to just 20 in Google's.
Comparison of the two rankings in Library Science
Let's try comparing the two.
Microsoft's "Library Science" ranking of journals by H-index over the last 5 years is as follows for the top 20:
1. The Library Quarterly
2. The Journal of Academic Librarianship
3. College & Research Libraries
4. Nature
5. Journal of Documentation
6. Library & Information Science Research
7. portal: Libraries and the Academy
8. Journal of Librarianship and Information Science
9. Journal of the Medical Library Association
10. Reference Services Review
11. South African Journal of Libraries and Information Science
12. New Library World
13. Journal of Library Administration
14. Reference and User Services Quarterly
15. New Review of Academic Librarianship
16. PLOS ONE
17. Archival Science
18. Journal of Librarianship and Scholarly Communication
19. American Archivist
20. Australian Academic & Research Libraries
Compare this to Google Scholar Metrics' ranking of "Library & Information Science" journals by h5-index.

I guess it is fair to say that the Google Scholar list looks much better by far. The Microsoft list has entries like Nature (#4) and PLOS ONE (#16) that look out of place.
Even among the journals that do belong, the ordering can be a bit odd; for instance, The Library Quarterly at #1 is surprising.
Some of the differences, I suspect, are due to the fact that Google Scholar's list covers "Library & Information Science", while Microsoft's covers only "Library Science", with "Information Science" as a further subtopic.
So, for example, the Journal of the Association for Information Science and Technology, a top-notch journal in the information science field, is rightly ranked #1 in Google's list but only #45 in Microsoft's list. The moment you drill down to the subtopic Information Science, however, it jumps to #3.
Similarly, Scientometrics, #2 in Google's list, is #4 in Microsoft's subtopic Information Science.
Still, it looks really odd to have the likes of Nature in there. Surely, whether the topic is "Information Science" or "Library & Information Science", Nature shouldn't be in the ranks.
On the surface, it is hard to see which articles in Nature led it to be ranked so highly in the "Library Science" topic.
But Microsoft Academic's "semantic search" (aka typing slowly and choosing suggestions) comes to the rescue.
Semantic search in Microsoft Academic
Following the same method as in the video above, you can quickly narrow down to content that is tagged with the topic Library Science, is in the journal Nature, and has a date after 2012.

Finding papers with the topic Library Science from the journal Nature, published after 2012
By typing slowly "nature library science" and "after 2012" and selecting the drop-down suggestions, you can filter down to precise results that are in the journal Nature, have the topic "Library Science" and were published after 2012. I'll let you decide for yourself whether the articles included by Microsoft's auto-tagging of topics are tagged accurately enough.
You can do similar tricks with institutions, authors and conferences by selecting the right icons in the autosuggestions (e.g. test tube icon = topic, book icon = journal, calendar icon = date).
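Incidentally, the same filter can be expressed programmatically. At the time of writing, Microsoft offers an Academic Knowledge API with an evaluate endpoint that takes structured query expressions. The sketch below is my own illustration; the endpoint URL and subscription key are assumptions you should check against the current documentation.

```python
import requests

# Sketch of the same filter as a structured query against the Academic
# Knowledge API's evaluate endpoint. The URL and key below are assumptions;
# check Microsoft's current documentation for your own setup.
API_URL = "https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate"
API_KEY = "YOUR-SUBSCRIPTION-KEY"  # placeholder

params = {
    # journal = Nature AND field of study = library science AND year > 2012
    "expr": "And(Composite(J.JN='nature'),"
            "Composite(F.FN='library science'),Y>2012)",
    # title, year, citation count, journal name, field-of-study names
    "attributes": "Ti,Y,CC,J.JN,F.FN",
    "count": 20,
}
resp = requests.get(API_URL, params=params,
                    headers={"Ocp-Apim-Subscription-Key": API_KEY})
for paper in resp.json().get("entities", []):
    print(paper.get("Y"), paper.get("Ti"))
```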
Ranking journals by thousands of topics vs few categories
It seems to me that Google Scholar's classification of journals into categories is quite conservative and produces results similar to Scopus's or Web of Science's, where each journal is classified into a few selected categories (typically fewer than 50 classes).
Microsoft Academic, on the other hand, auto-classifies articles into thousands of topics (or fields of study), around 229,000 topics at the time of writing, and a journal can be ranked in as many of them as it has papers tagged with those topics.

This is no doubt a big part of why the rankings are so different, and it explains how journals like Nature can appear so high in so many rankings; the toy sketch below illustrates the mechanics.
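Here is a minimal sketch (my own illustration with made-up data, not Microsoft's actual pipeline) of what paper-level topic tagging does to journal rankings: a generalist journal with even a couple of highly cited papers tagged "library science" will enter, and can top, that topic's ranking.

```python
from collections import defaultdict

# Made-up papers: (journal, topics the paper is tagged with, citation count).
papers = [
    ("Nature", {"library science", "biology"}, 900),
    ("Nature", {"library science"}, 450),
    ("College & Research Libraries", {"library science"}, 60),
    ("College & Research Libraries", {"library science"}, 35),
]

# Group citation counts per (topic, journal) pair, as paper-level tagging implies.
cites = defaultdict(list)
for journal, topics, cc in papers:
    for topic in topics:
        cites[(topic, journal)].append(cc)

# Rank journals within one topic by total citations (h-index would behave similarly).
topic = "library science"
ranking = sorted(((j, sum(c)) for (t, j), c in cites.items() if t == topic),
                 key=lambda pair: pair[1], reverse=True)
print(ranking)  # Nature tops the topic with only two tagged papers
```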
Institutional rankings
As interesting as journal rankings are, there are quite a few of them already, both paid ones by Scopus and Web of Science and free ones like Eigenfactor.org, SCImago Journal Rank, CWTS, Scopus metrics and Google itself.
Ranking institutions by topic is a different ball game. A lot of effort is needed to do proper author entity disambiguation and then assignment to institutions (which have to be normalized as well).
This is perhaps why both Clarivate and Elsevier charge for access to products like InCites and SciVal for tracking and benchmarking institutions.
Also, as I've noted in my blog post comparing big mega-indexes, it's not a coincidence that the freemium versions of Dimensions and 1Findr, which also have multiple citation metrics, do not allow you to filter by institution. Clearly the functionality of filtering by institution for comparisons is something that can be commercialised.
There are only two exceptions (the other being Microsoft Academic itself). One is Lens.org, which combines datasets from Crossref, PubMed and Microsoft Academic. It allows you to filter to a topic, say History, and its analytics page will show the top institutions, journals, etc. in that topic.

Analytics in Lens when filtered to topic = History
This is definitely worth studying more, but it has only one citation metric (total cites), and it's unclear how good the data quality is.
The thing about Microsoft Academic is that they seem to be working hard on this issue. In a blog post, "How Microsoft Academic uses knowledge to address the problem of conflation/disambiguation", they talk about the efforts they make in this area.
They not only use AI to disambiguate author entities using the information "cues" in the Microsoft Academic Graph, but also use data mined from authors' websites and online CVs. Given Microsoft now owns LinkedIn, one can imagine the possibilities....
They write:
"The second source of information we use is a recent development that we introduced last week. Our data scientists have developed a method for mining data from authors’ web sites and online CVs. Taking advantage of Microsoft’s web-scale infrastructure, by analyzing billions of documents found on the web, the team has taught the machine to recognize web pages that belong to researchers or may be CVs. Those pages have a set of common characteristics, a notable one being a list of publications. The list of publications found online is then compared with the data in the Microsoft Academic Knowledge Graph and used to inform the decision about whether authors with identical names are the same person or not.
Our approach is unique in scale. For identifying author homepages on the Web, we scan billions of documents from the Bing Index. We compare information from author homepages with information from more than 170 M papers in the Microsoft Academic Graph. In those 170 M papers, we see mentions of more than 600 M author names. By using the entity linking methods described here, we are able to conflate the number of authors from 500 M to about 200 M."
They claim the system is designed to be conservative and will not add papers to authors unless it is sure. Authors can also claim profiles and papers, as in Google Scholar (this claiming currently works on the production system, not the preview version).
All in all, they claim "Microsoft Academic provides one of the best conflation/disambiguation experiences on the Web". It's hard to say if they are right without further testing. On paper, Google might be the only other entity that can match their expertise and resources, and Google Scholar is pretty good at spotting papers published by you once you have a profile.
Still, currently Google Scholar does not rank institutions or authors, so it's unclear whether they are focusing that much on the problem.
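To make the publication-list idea concrete, here is a toy sketch of the general approach described in the quote above (my own illustration, not Microsoft's actual method): compare the publications mined from a CV against the papers attached to each candidate same-name author record, and merge only when the overlap is strong.

```python
def normalize(title: str) -> str:
    """Crude normalization so formatting differences don't block matches."""
    return "".join(ch for ch in title.lower() if ch.isalnum())

def overlap(cv_titles, record_titles):
    """Fraction of the CV's publications also found in the candidate record."""
    cv = {normalize(t) for t in cv_titles}
    rec = {normalize(t) for t in record_titles}
    return len(cv & rec) / len(cv) if cv else 0.0

# Made-up CV and two candidate author records sharing the same name.
cv_pubs = ["A Study of Library Metrics", "Open Access and Citations"]
candidates = {
    "J. Smith (record 1)": ["A study of library metrics", "Citation Cartels"],
    "J. Smith (record 2)": ["Quantum Dots In Vivo"],
}

THRESHOLD = 0.5  # arbitrary; a conservative system would demand more evidence
for record, titles in candidates.items():
    score = overlap(cv_pubs, titles)
    print(record, round(score, 2), "merge" if score >= THRESHOLD else "keep separate")
```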
How good are the institutional rankings?

Microsoft academic institution ranking page
The Microsoft Academic institution ranking page is pretty similar to the journal page. You do get an additional map at the bottom of the page.
You get the same filters for topics. For this quick comparison, I narrowed down to Business -> Accounting and compared the ranking list from Microsoft Academic with the BYU accounting rankings of institutions.
The BYU accounting rankings of institutions can be broken down by method and topic, and give rankings for the last 6 years, the last 12 years and all years.
The closest comparison is between the BYU accounting ranks (ALL, last 6 years) and Microsoft Academic's last 5 years.
Taking the BYU rankings as a base, this is how it looks when we compare against Microsoft Academic's total citations and H-index.

BYU accounting ranks (last 6 years) vs Microsoft Academic citations & H-index (last 5 years)
As you can see, while the lists are not wildly different, neither are they particularly similar. In particular, BYU's #9 and #17 are not even in Microsoft Academic's top 100 using either total citations or H-index.
That said, these two rankings are built in very different ways.
The major difference is that the BYU rankings include only articles from the top 12 accounting journals, while Microsoft's obviously includes all articles tagged with the topic of accounting. Another difference is that the BYU rankings are constructed using the authors' current affiliations. So, for example, if Researcher A moved from Duke University to UNSW, all of A's papers will be credited to UNSW, even those written while affiliated with Duke. This is, I believe, quite different from the way Microsoft Academic does it; the sketch below contrasts the two approaches.
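A minimal sketch of the two credit-assignment rules, using made-up data (my own illustration, assuming Microsoft Academic credits the affiliation recorded on the paper itself):

```python
from collections import Counter

# Each made-up paper records the affiliation the author held when it was written.
papers = [
    {"author": "Researcher A", "affiliation_at_publication": "Duke University"},
    {"author": "Researcher A", "affiliation_at_publication": "Duke University"},
    {"author": "Researcher A", "affiliation_at_publication": "UNSW"},
]
current_affiliation = {"Researcher A": "UNSW"}

# BYU-style: every paper is credited to the author's *current* institution.
byu_credit = Counter(current_affiliation[p["author"]] for p in papers)

# Affiliation-at-publication style (how I believe Microsoft Academic counts):
# each paper is credited to the institution listed on the paper itself.
at_pub_credit = Counter(p["affiliation_at_publication"] for p in papers)

print(byu_credit)     # Counter({'UNSW': 3})
print(at_pub_credit)  # Counter({'Duke University': 2, 'UNSW': 1})
```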
Author rankings
This blog post is very long already, so I will just give a quick screenshot of the author analytics. It has almost exactly the same options, the same browsing of topics, etc.

Author analytics for the topic Accounting
I won't do any comparisons, though you can do that against the BYU accounting rankings, SSRN rankings, etc.
You also get trending authors, which is probably even more useful for more granular topics, to see who the rising authors in a new area are.

Publications analytics
Lastly, if you go to publication analytics and refine to a topic, you get four ranking tables on one page: for institutions, authors, journals and conferences.
Conclusion
This service is still a preview, and rankings are a very tricky business, so there is no doubt it will be improved further.
What I find interesting is that Microsoft is currently all but giving this data away for free. I have heard from various sources that a few institutions are working with Microsoft Research to use data from the Microsoft Academic Graph to create dashboards and do benchmarking.
Will this become a disruptive force? Who knows....

