The number of new papers on the COVID-19 pandemic is doubling every two weeks, and shows no sign of slowing. Many of these papers are published first on preprint servers, which means they are made public before having undergone peer review. This makes it all the harder to judge their merit. Now, one start-up company says that its platform — called Scite.ai — can automatically tell readers whether papers have been supported or contradicted by later academic work.
Unlike conventional citation-metrics tools, Scite.ai tells users how often a paper has been supported or contradicted by the studies that cite it, as well as how many times it has simply been mentioned. The resulting reports display citations in the context in which they are mentioned, allowing users to assess for themselves how the paper is being cited.
So far, Scite.ai has analysed more than 16 million full-text scientific articles from publishers such as BMJ Publishing Group in London and Karger in Basle, Switzerland. But that is just a fraction of the scientific literature. “They’re limited by the literature they can get hold of and the machine-learning algorithms,” notes Jodi Schneider, an information scientist at the University of Illinois at Urbana–Champaign.
Still, the tool, accessible through a searchable website and as Chrome and Firefox browser plug-ins, can provide clarity. In March, the site’s developers pointed its artificial intelligence (AI)-based engine at a database, which at the time included 30,000 papers on different kinds of coronavirus, to help provide context about how much weight each of the articles might carry (see go.nature.com/35nchkp). They found that one 22 February preprint (ref. 1), which indicated that higher levels of certain immune-signalling molecules are associated with more-severe cases of COVID-19, was supported by a preprint (ref. 2) from another group just five days later (see the Scite.ai report at go.nature.com/2ztuokb).
Conversely, users who search Scite.ai for a preprint that suggested HIV contributed to the formation of the new coronavirus will find that it was contradicted by two follow-ups and supported by none (see go.nature.com/2vtdfxd). (The preprint’s authors have since withdrawn it for revision in response to researchers’ comments on the work.) At the moment, Scite.ai’s analysis of the COVID-19 database of papers is not fully automated, so the tool sometimes analyses new preprints only after a delay.
Scite.ai gets about 1,000 visitors a day and has some 2,700 registered users, a number that has been growing since 20 March, when the site began requiring users to register to view the full citation analysis for a given paper.
Citation counts are conventionally seen by researchers as measures of influence. But a high citation count does not mean that a paper is sound, says Elizabeth Suelzer, a reference librarian at the Medical College of Wisconsin Libraries in Milwaukee. Former physician Andrew Wakefield’s infamous retracted 1998 study that claimed a link between autism and vaccines is highly cited, she notes, but most of those citations are negative. Without a thorough citation analysis, “it would be hard to tell why the article was so highly cited”, Suelzer explains. That, she says, is why a tool such as Scite.ai could be helpful. Other tools that help readers to judge a paper’s standing include the Retraction Watch plug-in that flags retracted articles for the Zotero reference-management software.
Josh Nicholson, co-founder and chief executive of Scite.ai, first recognized the need for such a tool in 2012. Nicholson was pursuing a PhD in cell biology at Virginia Polytechnic Institute and State University in Blacksburg when he read a Nature commentary on scientific reproducibility that was making waves (ref. 3). In it, a researcher formerly at the biotechnology company Amgen in Thousand Oaks, California, revealed that scientists there had been unable to reproduce the findings of 47 out of 53 ‘landmark’ cancer studies. That spurred Nicholson and biologist Yuri Lazebnik, then at Yale University in New Haven, Connecticut, to propose a new citation metric to indicate whether a given study or its conclusions have been verified by subsequent reports (ref. 4). The pair launched Scite.ai in April last year.
At the heart of Scite.ai is a machine-learning algorithm that scans research articles to identify which papers they cite, and to determine whether they support, contradict or simply mention those papers. The algorithm mines the text of articles from publisher partners, including the Rockefeller University Press in New York City and Wiley in Hoboken, New Jersey. Scite.ai has also had preliminary conversations with Springer Nature in Heidelberg, Germany, which publishes Nature, Nicholson says. According to Nicholson, eight out of every ten papers flagged by the tool as supporting or contradicting a study are correctly categorized.
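The kind of report described above can be pictured with a short Python sketch. This is not Scite.ai's algorithm, which has not been made public. It assumes that a classifier has already labelled each citation statement, and it only tallies those labels for one cited paper. The class name, function names, paper identifiers and snippets are all invented for illustration.

```python
from dataclasses import dataclass

# The three categories named in the article.
LABELS = ("supporting", "contradicting", "mentioning")


@dataclass(frozen=True)
class CitationStatement:
    """One citation of a paper, with its surrounding text (hypothetical schema)."""
    citing_paper: str  # identifier of the article that makes the citation
    cited_paper: str   # identifier of the article being cited
    snippet: str       # text around the citation, shown to the reader
    label: str         # one of LABELS, assumed to come from a classifier


def tally_citations(statements, cited_paper):
    """Count how often `cited_paper` is supported, contradicted or mentioned."""
    counts = {label: 0 for label in LABELS}
    for statement in statements:
        if statement.cited_paper != cited_paper:
            continue
        if statement.label not in counts:
            raise ValueError(f"unknown label: {statement.label!r}")
        counts[statement.label] += 1
    return counts


def precision(correct, flagged):
    """Share of flagged citations whose label turned out to be correct."""
    if flagged == 0:
        raise ValueError("no flagged citations to evaluate")
    return correct / flagged


# Invented example data: paper-A is contradicted twice and never supported.
EXAMPLE_STATEMENTS = [
    CitationStatement("paper-X", "paper-A", "Our data do not confirm ...", "contradicting"),
    CitationStatement("paper-Y", "paper-A", "In contrast to earlier work ...", "contradicting"),
    CitationStatement("paper-Z", "paper-A", "See also earlier reports ...", "mentioning"),
    CitationStatement("paper-Z", "paper-B", "Consistent with previous findings ...", "supporting"),
]

print(tally_citations(EXAMPLE_STATEMENTS, "paper-A"))
# {'supporting': 0, 'contradicting': 2, 'mentioning': 1}
print(precision(8, 10))
# 0.8
```

The `precision` helper restates Nicholson's figure: if eight of every ten citations flagged as supporting or contradicting are correctly categorized, the precision for those flags is 0.8. It says nothing about citations that the classifier left as mentions.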
Although the machine-learning algorithm at the heart of Scite.ai has not been made public, Giovanni Colavizza, an AI scientist at the University of Amsterdam who is currently a visiting researcher at the Alan Turing Institute in London, says that “their results are sound and precise”, from what he can tell. “Most citations are classified as ‘mentions’, because the classifier is trained to be cautious, which is reasonable, too,” says Colavizza, who is a user of the platform and whose team has analysed data from the start-up in the past.
James Heathers, a data scientist at Northeastern University in Boston, Massachusetts, likes the way that, for each paper, Scite.ai shows snippets of the other articles in which that paper’s citations appear, saving him from having to look up each referring paper and hunt for this context. “Every time I’m exploring a complicated topic from scratch, I’m using this,” Heathers says of Scite.ai. “The sentiment analysis seems to work really well,” he adds, referring to how Scite.ai categorizes positive and negative citations.
But Scite.ai is limited by the papers it can access. The tool has analysed some 16 million full-text articles, whereas there are more than 53 million scientific articles in the Web of Science (specifically, the Science Citation Index Expanded), and some 112 million articles are registered with Crossref, the registration agency for digital object identifiers (DOIs). Many of these articles are behind paywalls and are therefore inaccessible to Scite.ai.
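As a rough sense of scale, the figures above can be turned into ratios. The sketch below uses only the rounded numbers quoted in this article. It compares sizes and does not measure true coverage, because the 16 million analysed articles are not necessarily a subset of either index, and the Web of Science figure is a lower bound ("more than 53 million"). The variable and function names are illustrative.

```python
# Rounded figures quoted in the article.
ANALYSED = 16_000_000         # full-text articles analysed by Scite.ai
WEB_OF_SCIENCE = 53_000_000   # Science Citation Index Expanded ("more than")
CROSSREF = 112_000_000        # articles registered with Crossref ("some")


def size_ratio(analysed, total):
    """Return analysed / total as a fraction between 0 and 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return analysed / total


print(f"{size_ratio(ANALYSED, WEB_OF_SCIENCE):.0%}")  # 30%
print(f"{size_ratio(ANALYSED, CROSSREF):.0%}")        # 14%
```

On these numbers, the analysed corpus is about 30% the size of the Web of Science index at most, and about 14% the size of the Crossref registry.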
“The tool in general is pretty promising,” but it is far from having comprehensive training data, says Dario Taraborelli, a science programme officer at the Chan Zuckerberg Initiative in San Francisco, California, who formerly served as head of research at the Wikimedia Foundation, also in San Francisco.
Nicholson acknowledges that getting access to papers on which to run citation analyses is a challenge, and says the company is working to expand that access. But he says Scite.ai has made strides in forming partnerships with publishers for this purpose. “No one is looking at citations like we are doing, largely because we had to get access to the content,” he says.
In April, the Scite.ai team released a preprint (ref. 5) analysing almost 2 million Wikipedia pages. The team reported that, of the more than 800,000 scientific articles cited in the online encyclopedia, almost 18% were not mentioned in subsequent studies and 39% were mentioned by other studies but were neither supported nor contradicted by them. Two per cent of the studies had been contradicted by all papers that cited them and lacked any supporting citations. Nicholson says that nearly 30% of articles referenced in Wikipedia have a supporting citation, compared with about 12% of articles in Web of Science, but he’s concerned that contradicted studies included in Wikipedia might spread bad information among the public.
Contradictory citations are rare in the literature, Nicholson notes, in part because academic circles are small, and negative comments can come back to haunt you. Scite.ai, he says, could induce a cultural shift. “Our hope is that we will encourage people to support more explicitly as well as contradict more explicitly.”