Google Scholar’s co-founder, Anurag Acharya. Credit: Amit Basu

Google Scholar, the popular free search engine for scholarly literature, revealed an unexpected feature on 23 March: it is keeping track of whether research papers covered by funders’ public-access mandates are free to read.

A scientist’s Google Scholar profile now displays how many of their papers should be free to read because a funder requires it; how many actually are; and how many are not. The search engine also encourages authors to make non-compliant papers public, if necessary simply by uploading them to their Google Drive. Researchers’ reactions have been mixed. Some have called it a ‘wall of shame’ and criticized it for mistakes — but others have welcomed it for prompting researchers to make their papers public.

Anurag Acharya, the co-founder of Google Scholar, explained to Nature how the tracking works — and how it might change in the future.

Why are you doing this?

The idea came up a couple of years ago, as funder mandates were becoming a bigger part of the scholarly ecosystem. There are an incredible number of public-access mandates from funders all over the world, and we thought authors would want to keep track and know what they’re expected to do. We wanted to provide this for authors to see what is really a part of the publication process. In this public-access world that we are gradually moving to, the publication process only ends when the paper is both published and available to the rest of the world to read.

How does it work?

We automatically detect when funding agencies are acknowledged as supporting a research work. We look for around 2,000 wording variations in an article’s text — ‘funded by’, ‘supported by’ and so on — to pick that up. We also found 175 funders that have publicly documented their mandates, including the dates they apply from, so that we can point authors to these documents.

For papers that appear to be supported by a funder with a relevant documented mandate, we check if we can find freely available versions of those works at any website (with the publisher’s version prioritized). If we can’t, we show the author the mandate that we think is relevant, and invite them to make their work available. We tell them to check if it’s actually available, or whether they can upload the paper to a funding agency or institutional repository. As a final fallback, we invite them to upload their paper to their Google Drive.
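The steps Acharya describes can be sketched roughly in code. What follows is an illustration only, not Google Scholar's implementation: the two acknowledgement phrases stand in for the roughly 2,000 wording variations he mentions, and the funder name, mandate date, policy URL and every class and function name are hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Two sample phrasings; the real system reportedly looks for around 2,000.
ACK_PHRASES = ("funded by", "supported by")


@dataclass(frozen=True)
class Mandate:
    funder: str
    applies_from: date  # date from which the mandate applies
    policy_url: str     # where the funder documents the mandate


# Hypothetical funder and policy, used purely for illustration.
MANDATES = {
    "Example Research Council": Mandate(
        "Example Research Council", date(2015, 1, 1), "https://example.org/policy"
    ),
}


@dataclass
class Paper:
    title: str
    text: str
    published: date
    free_url: Optional[str] = None  # a freely readable copy, if one was found


def acknowledged_funders(text: str) -> list:
    """Return known funders named in a sentence that has an acknowledgement phrase."""
    found = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        if any(phrase in lowered for phrase in ACK_PHRASES):
            for name in MANDATES:
                if name.lower() in lowered and name not in found:
                    found.append(name)
    return found


def applicable_mandate(paper: Paper) -> Optional[Mandate]:
    """Return the first mandate already in force when the paper was published."""
    for name in acknowledged_funders(paper.text):
        mandate = MANDATES[name]
        if paper.published >= mandate.applies_from:
            return mandate
    return None


def classify(paper: Paper) -> str:
    """Label a paper 'not_mandated', 'available' or 'not_available'."""
    if applicable_mandate(paper) is None:
        return "not_mandated"
    return "available" if paper.free_url else "not_available"


def summarize(papers: list) -> dict:
    """Count mandated papers and how many of them are free to read."""
    counts = Counter(classify(p) for p in papers)
    available = counts["available"]
    not_available = counts["not_available"]
    return {
        "mandated": available + not_available,
        "available": available,
        "not_available": not_available,
    }
```

In this sketch, a paper counts towards the profile totals only if it names a known funder in an acknowledgement sentence and was published on or after that funder's mandate date. Matching plain phrases in this way would produce the kinds of false positives and misses that Acharya acknowledges.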

What if you make mistakes?

This automated process is bound to make mistakes, such as incorrectly assuming a funder supported a paper, or not seeing a public version of a paper, perhaps because of difficulty indexing a particular repository. But the author can always click ‘make a correction’ on their article, and tell us about the mistake.

What counts as freely available?

We simply check whether a public version of an article is free to read. We do not check if it has an open-access licence or whether it is peer-reviewed. This is because our first focus is to let people read research. You have to walk before you can run, especially at this big scale. So we do not track everything that some funding agencies currently specify in their mandates.

Are you competing with other services that track open-access research, such as Unpaywall?

No. Unpaywall is doing a great job. It tracks articles’ DOIs [digital object identifiers] and makes browser extensions so that if you visit a scholarly article, it sees the DOI and points you to free versions. And its data on DOIs feeds into scholarly databases, such as Scopus and Web of Science, to help them track a paper’s open-access status. We don’t make browser extensions and we aren’t able to make our bulk data available. Our approach is for the individual scientist.

Google Scholar profiles now show how many of a researcher’s papers should be free to read because a funder requires it, and whether they actually are (green) or are not (red).

Some scientists say that uploading a paper to Google Drive doesn’t make it easily discoverable outside the Google ecosystem — other services can’t easily harvest it.

That is true. As a scholarly search engine, we aim to point people to papers elsewhere. But Google Drive is the last resort to allow people to make their papers available. It’s not ideal. That’s why we have messaging encouraging authors to try other ways to make papers public first.

What happens if someone uploads a paper that they don’t own the rights to?

Publishers would detect it (if it’s made public) and could insist that it be taken down.

And what if a person adjusts their record falsely — correcting an article to say it doesn’t need to be public access, for instance, or claiming papers that aren’t theirs?

This is possible, but we don’t see much of this behaviour. We allow people to claim or add their own papers on their Google Scholar profile, and to decide for themselves what to publicly present. We do this because, for instance, we don’t want to insist to authors that we know a paper is or isn’t theirs. In practice, we don’t see much gaming because, for scholars, the reputational cost of being seen to do it is large and obvious.

How might this service change in the future?

We would like to work with funding agencies to discuss additional customization. For instance, the US National Institutes of Health requires that scientists deposit articles in PubMed Central — so perhaps we could include that requirement in our tracking. But we have no plans to look at open-access licences: the licence information is often not in articles so we cannot track it.