After being kicked out of a hotel conference room where they had participated in a three-day open-science workshop and hackathon, a group of computer scientists simply moved to an adjacent hallway. There, Heather Piwowar, Jason Priem and Cristhian Parra worked all night on software to help academics to illustrate how much of their work was freely available on the Internet. They realized how much time had passed only when they noticed hotel staff starting to prepare for breakfast.
That all-nighter, back in 2011, laid the foundation for Unpaywall. This free service locates open-access articles and points users who would otherwise hit a paywall to copies of papers that have been legally archived and are freely available on other websites. Since one part of the technology was released in 2016, it has become indispensable for many researchers. And firms that run established scientific search engines are starting to take advantage of Unpaywall.
On 26 July, Elsevier announced plans to integrate Unpaywall into its Scopus database searches, allowing it to deliver millions more free-to-read papers to users than it does currently. Scopus’s embrace of Unpaywall, along with similar moves by other search engines, means that much more open-access content is now at researchers’ fingertips. These deals are also enabling funders, librarians and others to study open-access publishing trends comprehensively for the first time.
“Unpaywall is a ground-breaking development,” says Alberto Martín-Martín, who studies bibliometrics and science communication at the University of Granada in Spain. “It takes us one step closer to achieving a true open research infrastructure.”
After participating in the 2011 hackathon, Piwowar and Priem founded a non-profit organization called Impactstory, in Vancouver, Canada, where they refined Unpaywall. (Parra is now a consultant at the World Bank in Asunción, Paraguay.)
Research by Priem and Piwowar published in August 2017 in PeerJ Preprints (using Unpaywall, naturally) suggests that almost half of the recent research papers that people search for online are available for free (ref. 1). But, says Priem, “there is a terrific gap between the availability and discoverability” of these papers, and it is this problem Unpaywall hopes to solve.
Unpaywall consists of a database that includes a list of almost 20 million freely available scholarly articles. Most researchers access it using a browser plug-in that was released in 2017. The service works by searching for a queried paper’s unique digital tag — a string of numbers and letters known as its DOI, or digital object identifier — against those of articles gathered from 50,000 journals and repositories.
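The DOI matching described above can be pictured with a minimal sketch. It is an illustration only, not Unpaywall's actual code: the index is a hypothetical in-memory dictionary, the repository address is invented, and the names `normalize_doi` and `find_free_copy` are assumptions made for this example. The one firm point it relies on is that DOIs are case-insensitive, so both sides of the comparison are reduced to the same bare, lower-case form.

```python
from typing import Optional

# Common ways a DOI is written as a link or label (lower-case for matching).
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Reduce a DOI or DOI link to a bare, lower-case identifier."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


# Hypothetical index: normalized DOI -> address of a free copy.
# The repository URL is invented for illustration.
FREE_COPIES = {
    "10.7287/peerj.preprints.3119v1": "https://repository.example.org/3119v1.pdf",
}


def find_free_copy(raw_doi: str, index: dict = FREE_COPIES) -> Optional[str]:
    """Return the address of a free copy, or None if the DOI is not indexed."""
    return index.get(normalize_doi(raw_doi))
```

Calling `find_free_copy("https://doi.org/10.7287/PeerJ.Preprints.3119v1")` returns the example address, whereas a DOI that is absent from the index returns `None`.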
In June 2017, Unpaywall was integrated into a popular science search engine called Web of Science, which is operated by Clarivate Analytics. Dimensions, a service run by Digital Science that launched this year, used Unpaywall from the get-go. These companies, and now Elsevier, pay a subscription fee for a feed of Unpaywall’s database that is updated weekly.
Impactstory also offers free access to the Unpaywall database (updated twice a year for non-subscribers), the browser plug-in and an interface that allows programmers to interact with Unpaywall to retrieve data.
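For programmers, a query through that interface might look roughly like the sketch below. The article does not describe the interface in detail, so the endpoint address, the `email` parameter and the response fields `is_oa` and `best_oa_location` are assumptions based on the service's public REST interface and should be checked against its current documentation. The function names are invented for this example.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint; not named in the article.
API_ROOT = "https://api.unpaywall.org/v2/"


def build_request_url(doi: str, email: str) -> str:
    """Build the lookup address for one DOI (the email identifies the caller)."""
    return f"{API_ROOT}{quote(doi, safe='/')}?email={quote(email)}"


def best_free_url(record: dict) -> Optional[str]:
    """Pick the free-to-read address out of one response record, if any."""
    if not record.get("is_oa"):
        return None
    location = record.get("best_oa_location") or {}
    return location.get("url")


def lookup(doi: str, email: str) -> Optional[str]:
    """Query the service over the network and return a free address or None."""
    with urlopen(build_request_url(doi, email), timeout=10) as response:
        return best_free_url(json.load(response))
```

Only `lookup` touches the network; the other two functions can be tried offline with a hand-written record such as `{"is_oa": True, "best_oa_location": {"url": "https://example.org/paper.pdf"}}`.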
Since its launch, Unpaywall’s technology has also been integrated into many university-library discovery systems, so that users can easily find freely available versions of research papers in institutional repositories. These archives, which are operated by universities, funders and others, host a large portion of articles in Unpaywall’s database, but were difficult to search systematically in the past.
Scientists using Scopus can filter their results to find freely available papers, but the database links to only around 1.5 million papers published in fully open-access journals. Once Unpaywall’s integration is complete in November 2018, searches carried out on Scopus for free-to-read literature will also find articles on publisher platforms, even if the journal publishes a mix of open-access and paywalled articles.
This will boost the number of freely available articles in Scopus to 7 million, which is still around 13 million fewer than the almost 20 million listed in Unpaywall’s database as freely available (see 'Unpaywall Revolution'). This gap exists because Scopus will not link to articles posted in repositories.
Chris Banks, director of library services at Imperial College London, says she is perplexed by the fact that Scopus will not surface the majority of free-to-read content in repositories. Unpaywall is handy precisely because it uncovers these hard-to-find papers, she adds.
Large citation databases such as Scopus and Web of Science list the majority of all research articles. By combining the records in these databases with Unpaywall data, researchers can systematically measure the proportion of the literature that is freely available, a feat that wasn’t previously possible. Scopus and Web of Science searches can also be filtered according to the nationality of the authors, their institution and subject area, so free-to-read articles can be identified according to these and other criteria.
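The kind of measurement this makes possible can be sketched in a few lines. The records below are invented for illustration and do not come from Scopus, Web of Science or Unpaywall; the field names `country`, `subject` and `is_free`, and the function `free_share_by`, are likewise assumptions made for this example.

```python
from collections import defaultdict


def free_share_by(records, key):
    """Return, for each value of `key`, the fraction of articles that are free."""
    totals = defaultdict(int)
    free = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        if record["is_free"]:
            free[group] += 1
    return {group: free[group] / totals[group] for group in totals}


# Invented sample: citation-database records joined with a free/not-free flag.
records = [
    {"doi": "10.1000/a", "country": "ES", "subject": "biology", "is_free": True},
    {"doi": "10.1000/b", "country": "ES", "subject": "physics", "is_free": False},
    {"doi": "10.1000/c", "country": "UK", "subject": "biology", "is_free": True},
]

share_by_country = free_share_by(records, "country")
share_by_subject = free_share_by(records, "subject")
```

The same function serves for any filter the citation database exposes: passing `"subject"` instead of `"country"` regroups the same records by field.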
The US National Institute of Mental Health (NIMH), which has an overall budget of around US$1.5 billion, is working with Impactstory to develop a bespoke tool that uses Unpaywall. The agency’s goal is to determine the extent to which researchers working at NIMH laboratories in Bethesda, Maryland, and nearby Rockville are making their papers, data and source code freely available.
Priem says that Impactstory hopes to offer other institutions a system similar to the one it is developing with NIMH. Meanwhile, some universities and funders are already innovating with Unpaywall. Researchers at the University of Barcelona and the Polytechnic University of Catalonia in Spain have used Unpaywall to measure the proportion of articles published by researchers at their institutions that are freely available.
For Priem, making Unpaywall a go-to tool for researchers is just the start. Last month, Impactstory secured a US$850,000 grant to create a search engine aimed at non-scientists. It will also use artificial intelligence to summarize journal articles in its database in plain language, so that non-specialists can understand them. “20 million articles are free for everyone to read but might as well be closed if there is no way for any average person to access it,” he says. “We’re not yet finished.”
Nature 560, 290–291 (2018)
1. Piwowar, H. et al. Preprint at PeerJ Preprints https://doi.org/10.7287/peerj.preprints.3119v1 (2017).