Most researchers have good reason to grumble about peer review: it is time-consuming and error-prone, and the workload is unevenly spread, with just 20% of scientists taking on most reviews.
Now peer review by artificial intelligence (AI) is promising to improve the process, boost the quality of published papers — and save reviewers time.
A handful of academic publishers are piloting AI tools to do anything from selecting reviewers to checking statistics and summarizing a paper’s findings.
In June, software called StatReviewer, which checks that statistics and methods in manuscripts are sound, was adopted by Aries Systems, a peer-review management system owned by Amsterdam-based publishing giant Elsevier.
And ScholarOne, a peer-review platform used by many journals, is teaming up with UNSILO of Aarhus, Denmark, which uses natural language processing and machine learning to analyse manuscripts. UNSILO automatically pulls out key concepts to summarize what the paper is about.
Crucially, in all cases, the job of ruling on what to do with a manuscript remains with the editor.
“It doesn’t replace editorial judgement but, by God, it makes it easier,” says David Worlock, a UK-based publishing consultant who saw the UNSILO demonstration at the Frankfurt Book Fair in Germany last month.
Calling the shots
UNSILO uses semantic analysis of the manuscript text to extract what it identifies as the main statements. This gives a better overview of a paper than the keywords typically submitted by authors, says Neil Christensen, sales director at UNSILO. “We find the important phrases in what they have actually written,” he says, “instead of just taking what they’ve come up with five minutes before submission.”
UNSILO then identifies which of these key phrases are most likely to be claims or findings, giving editors an at-a-glance summary of a study’s results. It also highlights whether the claims are similar to those from previously published papers, which could be used to detect plagiarism or simply to place the manuscript in context with related work in the wider literature.
“The tool’s not making a decision,” says Christensen. “It’s just saying: ‘Here are some things that stand out when comparing this manuscript with everything that’s been published before. You be the judge.’”
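As a rough illustration of comparing a manuscript with published work, the sketch below ranks earlier papers by how many two-word phrases they share with a new manuscript. This is not UNSILO's method, which the company describes only as semantic analysis and machine learning. The functions `phrases`, `jaccard` and `most_similar`, and the toy corpus, are hypothetical names invented for this example, and simple phrase overlap stands in for the real analysis.

```python
import re

def phrases(text, n=2):
    """Return the set of n-word phrases in a text (lower-cased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Share of phrases common to both sets, from 0 (none) to 1 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def most_similar(manuscript, corpus, top=3):
    """Rank published papers (title -> text) by phrase overlap with the manuscript."""
    target = phrases(manuscript)
    scored = [(jaccard(target, phrases(text)), title) for title, text in corpus.items()]
    return sorted(scored, reverse=True)[:top]

# Toy corpus: assumed example data, not real papers.
corpus = {
    "A": "sleep loss impairs memory consolidation in mice",
    "B": "graphene oxide membranes filter salt water",
}
ranked = most_similar("sleep loss impairs memory consolidation in adult humans", corpus)
print(ranked)
```

As in Christensen's description, the output is a ranked list for an editor to inspect, not a verdict: a high overlap could signal plagiarism or simply closely related work.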
UNSILO’s prototype gets information from the PubMed Central scholarly database, which lets it compare new manuscripts with the full text of 1.7 million published biomedical research papers — a large, but limited, data set. The company says it will soon add more than 20 million further PubMed papers. Its collaboration with ScholarOne, which is owned by Clarivate Analytics in Philadelphia, Pennsylvania, will give it access to many more again, including those in Clarivate’s Web of Science database.
Giuliano Maciocci, who leads an innovation team at the journal eLife in Cambridge, UK, says that UNSILO is an interesting solution to some of the headaches in peer review, but it isn’t something eLife would consider adopting. “We’re not entirely convinced it would be particularly useful in the context of a journal such as ours, where manual, expert curation is very important,” he says.
Worlock notes that there are several similar tools emerging. He is on the board of Wizdom.ai in London, a start-up owned by publishers Taylor & Francis, which is developing software that can mine paper databases and extract connections between different disciplines and concepts. He says that this kind of tool will be useful beyond peer review, for tasks such as writing grant applications or literature reviews.
From plagiarism to p values
Many platforms, including ScholarOne, already have automatic plagiarism checkers. And services including Penelope.ai examine whether the references and the structure of a manuscript meet a journal’s requirements.
Some can flag issues with the quality of a study, too. The tool statcheck, developed by Michèle Nuijten, a methodologist at Tilburg University in the Netherlands, and her colleagues, assesses the consistency of authors’ statistics reporting, focusing on p values. The journal Psychological Science runs all its papers through the tool, and Nuijten says that other publishers are keen to integrate it into their review processes.
When Nuijten’s team analysed papers published in psychology journals, they found¹ that roughly 50% contained at least one statistical inconsistency. In one in eight papers, the error was serious enough that it could have changed the statistical significance of a published result.
“That’s worrisome,” she says. She’s not surprised that reviewers miss such mistakes, however. “Not everyone has time to go over all the numbers. You focus on the main findings or the general story.”
For now, statcheck is limited to analysing manuscripts that use the American Psychological Association’s reporting style for statistics.
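The kind of consistency check described here can be sketched in a few lines: extract a reported test statistic and p value, recompute the p value from the statistic, and compare the two. The example below is a simplified illustration, not statcheck's own code. It assumes two-tailed z-tests reported in a form such as "z = 2.05, p = .04", ignores rounding of the test statistic itself, and uses hypothetical names (`APA_Z`, `two_tailed_p_from_z`, `check_reported_stats`) throughout.

```python
import math
import re

# Assumed pattern for z-test reports such as "z = 2.05, p = .04" or "z = 3.1, p < .01".
APA_Z = re.compile(r"\bz\s*=\s*(-?\d*\.?\d+)\s*,\s*p\s*([<=])\s*(\d?\.\d+)")

def two_tailed_p_from_z(z):
    """Two-tailed p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def check_reported_stats(text, alpha=0.05):
    """Recompute each reported p value and flag mismatches."""
    findings = []
    for match in APA_Z.finditer(text):
        z = float(match.group(1))
        comparator = match.group(2)
        reported_str = match.group(3)
        reported = float(reported_str)
        computed = two_tailed_p_from_z(z)

        if comparator == "=":
            # Allow for rounding to the number of decimals the authors reported.
            decimals = len(reported_str.split(".")[1])
            consistent = abs(computed - reported) <= 0.5 * 10 ** -decimals + 1e-12
            claimed_significant = reported < alpha
        else:  # "p < x": the recomputed value must fall below the stated bound
            consistent = computed < reported
            claimed_significant = reported <= alpha

        # A mismatch that also changes which side of alpha the result falls on.
        decision_error = (not consistent) and (claimed_significant != (computed < alpha))
        findings.append({
            "z": z,
            "reported_p": reported,
            "computed_p": computed,
            "consistent": consistent,
            "decision_error": decision_error,
        })
    return findings

results = check_reported_stats("z = 2.05, p = .04; z = 1.50, p = .04; z = 2.50, p = .03")
for r in results:
    print(r)
```

In this toy input, the second result is the serious kind of error the article describes: the reported p value suggests significance at the 0.05 level, whereas the recomputed value does not.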
By contrast, the creators of StatReviewer, Timothy Houle at Wake Forest University School of Medicine in North Carolina and Chadwick DeVoss, CEO of tech start-up NEX7 in Wisconsin, claim that the tool can assess statistics in standard formats and presentation styles from multiple scientific fields. To do this, it checks that papers correctly include things such as sample sizes, information about blinding of subjects and baseline data.
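A checklist screen of this kind might look something like the sketch below, which searches a manuscript for mentions of each required item and lists the ones it cannot find. The article does not describe how StatReviewer works internally, so this is only an assumption-laden illustration: the keyword patterns, the `REPORTING_CHECKLIST` table and the `missing_items` function are all hypothetical.

```python
import re

# Hypothetical keyword patterns for the three items named in the text.
REPORTING_CHECKLIST = {
    "sample size": re.compile(r"\bsample size\b|\bn\s*=\s*\d+", re.IGNORECASE),
    "blinding": re.compile(r"\bblind(ed|ing)?\b|\bmasked\b", re.IGNORECASE),
    "baseline data": re.compile(r"\bbaseline\b", re.IGNORECASE),
}

def missing_items(manuscript_text):
    """Return the checklist items that the manuscript never appears to mention."""
    return [
        item
        for item, pattern in REPORTING_CHECKLIST.items()
        if not pattern.search(manuscript_text)
    ]

text = "We enrolled patients (n = 120). Baseline characteristics are shown in Table 1."
print(missing_items(text))
```

A keyword match shows only that a topic is mentioned, not that it is reported correctly, which is one reason such a screen would support a human reviewer and not replace one.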
StatReviewer can also identify markers of fraudulent behaviour, says DeVoss. “Things like, did they game some statistical rules, or did they flat-out make up data? If the risk is higher than what the journal is used to seeing, they can look into the details.”
Algorithm on trial
DeVoss says that StatReviewer is being tested by dozens of publishers. A 2017 trial with the open-access publisher BioMed Central in London was inconclusive because the tool did not analyse enough manuscripts, but it nonetheless provided some insights (BioMed Central is now planning a follow-up).
StatReviewer caught things that human reviewers missed, says Amy Bourke-Waite, communications director for open research at Springer Nature, which owns BioMed Central and publishes Nature (Nature’s news team is editorially independent of Springer Nature). For example, it was good at catching papers that did not meet required standards, such as following CONSORT, a reporting standard for randomized trials used by many publishers.
Bourke-Waite also reports that authors who took part said that they were as happy responding to StatReviewer reports as they would have been to a human reviewer’s.
Occasionally, she says, StatReviewer got things wrong — but sometimes its slip-ups drew authors’ attention to unclear reporting in their manuscripts.
Limits of automation
Even if the trials prove successful, DeVoss expects that only some journals will want to pay to have all their manuscripts scanned. So he and his colleagues are targeting authors, too, hoping that they will use the tool to check their manuscripts before submission.
There are potential pitfalls to AI in peer review in general. One concern is that machine-learning tools trained on previously published papers could reinforce existing biases in peer review. “If you build a decision-making system based on the articles which your journal has accepted in the past, it will have in-built biases,” says Worlock.
And if an algorithm provides a single overall score after evaluating a paper, as StatReviewer does, there might be a temptation for editors to cut corners and simply rely on that score in deciding to reject a paper, says DeVoss.
Algorithms are not yet smart enough to allow an editor to accept or reject a paper solely on the basis of the information they extract, says Andrew Preston, co-founder of Publons, a New Zealand-based peer-review-tracking start-up acquired by Clarivate Analytics that is using machine learning to develop a tool to recommend reviewers. “These tools can make sure a manuscript is up to scratch, but in no way are they replacing what a reviewer would do in terms of evaluation.”
Nuijten agrees: “The algorithms are going to need some time to perfect, but it makes sense to automate a lot of things because a lot of things in peer review are standard.”
Nature 563, 609-610 (2018)