Massive open index of scholarly papers launches

An ambitious free index of more than 200 million scientific documents, which catalogues publication sources, author information and research topics, has been launched.

The index, called OpenAlex after the ancient Library of Alexandria in Egypt, also aims to chart connections between these data points to create a comprehensive, interlinked database of the global research system, say its founders. The database, launched on 3 January, replaces Microsoft Academic Graph (MAG), a free alternative to subscription-based platforms such as Scopus, Dimensions and Web of Science. MAG was discontinued at the end of 2021.

“It’s just pulling lots of databases together in a clever way,” says Euan Adie, founder of Overton, a London-based firm that tracks the research cited in policy documents. Overton had been getting its data from various sources, including MAG, ORCID, Crossref and directly from publishers, but has now switched to using only OpenAlex, in the hope of making the process easier.

Improved coverage

Microsoft’s move to close MAG, announced last May, worried some academics and others who used its data to conduct studies and build research tools.

In response to MAG’s closure, non-profit scholarly services firm OurResearch in Vancouver, Canada, created OpenAlex, using part of a US$4.5-million grant from London-based charity Arcadia Fund. The index is currently accessible through an application programming interface, or API, that can perform complex searches. A simpler search-engine interface is scheduled to launch in February.

OpenAlex draws its data from MAG’s existing records and from other sources including Wikidata identifiers, ORCID, Crossref and ROR, says Jason Priem, co-founder of OurResearch.

The tool is also integrated with Unpaywall, a database that Priem and OurResearch co-founder Heather Piwowar launched in 2017 and that contains more than 30 million open-access articles. “We now have much better coverage of open access than MAG ever did,” Priem says. “Not only can we tell you where the free-to-read copies of any particular article live, but we can also tell you the licence and the version of that article.”

Priem says that OpenAlex updates every fortnight by bringing in more data from its sources. The tool goes a step further towards openness than MAG did, he adds, because OpenAlex’s data is freely available under a CC0 copyright licence for anyone to build on. That means that if OpenAlex were to be discontinued, any researcher could pick up where OurResearch left off instead of having to rebuild the whole database from scratch.

Easy set-up

OpenAlex is also free to use, thanks to sponsorship from Amazon Web Services, and requires no registration or log-in information, making the process more user-friendly, says Priem. This differs from MAG, for which users had to log into Azure, Microsoft’s cloud-hosting system, and pay a small fee to download their data set. Priem says that his firm might consider rolling out a premium, pay-to-use tier of OpenAlex for users who want super-fast access, but a free up-to-date version will always be available.

OpenAlex is “written in such a way that’s very easy for somebody to pick up and use”, says Adie. He adds that it took him only about 20 minutes to get started on OpenAlex, compared with three to four days with MAG. “The downside is that Microsoft had a lot of technical capability that they could apply to Microsoft Academic. So we’ll have to see how OurResearch does without that,” Adie says.

Roar Bakken Stovner, who studies researchers’ citation patterns at Oslo Metropolitan University, says that it took him around two hours to start working with OpenAlex, compared with around a week with MAG. “For somebody who is more computer savvy, MAG might be easier,” he says. “For researchers who want to try small projects on their own, OpenAlex will be way easier to start with.”

Frode Opdahl, chief executive of Keenious, a start-up firm based in Tromsø, Norway, which scans millions of papers to suggest relevant references, says he’s pleased with the documentation published about OpenAlex. “It makes it a lot easier to work with and implement into our product,” he says.



