CERN, Europe’s particle-physics laboratory, produces vast amounts of data, which are stored at its computer centre (pictured) and analysed with the help of artificial intelligence (AI). UK funders want to know whether AI could also assist in peer reviewing thousands of research outputs for nationwide quality audits. Credit: Dean Mouhtaropoulos/Getty

Efforts to ease the workloads of peer reviewers by using artificial intelligence (AI) are gathering pace — with one country’s main research-evaluation exercise actively looking into ways of harnessing the technology.

A study commissioned by the United Kingdom’s main public research-funding bodies is examining how algorithms can assist in conducting peer review on journal articles submitted to the UK’s Research Excellence Framework (REF).

The REF, a national quality audit that measures the impact of research carried out at UK higher-education institutions, is a huge undertaking. In the latest iteration, the results of which were published in May 2022, more than 185,000 research outputs were evaluated from more than 76,000 academics based at 157 UK institutions. The results will determine how approximately £2 billion (US$2.2 billion) of funding is distributed among UK institutions each year.

The next REF is expected to take place in 2027 or 2028, and the new study will test whether AI could make the process less burdensome for the hundreds of referees involved in judging research outputs.

The funders that carry out the REF handed over peer-review data from just under 150,000 scientific papers to Mike Thelwall, a data scientist at the University of Wolverhampton, UK. These papers had been evaluated as part of the latest REF. Such data, outlining scores given to individual journal articles, are usually destroyed. But the funders — Research England, the Scottish Funding Council, the Higher Education Funding Council for Wales, and the Department for the Economy, Northern Ireland — gave Thelwall and his colleagues access first.

What’s the score?

Thelwall ran various AI programs on the data to see whether algorithms could yield scores similar to the ratings that REF peer reviewers gave the journal articles. The AI programs base their calculations on bibliometric data, and metadata including keywords in abstracts, titles, and article text.

“We’re looking at whether the AI [programs] could give information that the peer reviewers would find helpful in any way,” Thelwall says. For instance, he adds, AI could perhaps suggest a score that referees could consider during their assessment of papers. Another possibility, Thelwall notes, is AI being used as a tiebreaker if referees disagree strongly on an article — similarly to how REF panels already use citation data.

It seems “incredibly obvious and plausible” that AI should have a role in the REF process, says Eamon Duede, who studies the use of AI technologies in science at the University of Chicago in Illinois. “It’s just not entirely clear what that role is.” But Duede disagrees that AI should be used to assign scores to manuscripts. “I think this is a mistake.”

Anna Severin, a health consultant in Munich, Germany, who has used AI to analyse the peer-review process itself, goes further: “I don’t think AI should replace peer review,” or parts of it, she says. Severin, who works at management consultancy Capgemini Invent, worries that people could become overly reliant on algorithms and misuse AI tools. “All the administrative tasks and the processes surrounding and supporting the actual peer-review process — that’s really an area where AI and machine learning could help with reducing workload.”

One possible application of AI could be to find suitable peer reviewers — often a difficult task, and one fraught with biases and conflicts of interest. Recent analyses have shown that researchers are increasingly declining peer-review requests. This is especially true of a select minority who are constantly bombarded with such requests.

Thelwall says that although it is theoretically possible to use AI to find referees, this was not in the remit of his current project. “Some of the panel members, particularly the chairs, spend a lot of time allocating individual panel members to the outputs,” he notes.

AI has previously been used to streamline peer review and make it more robust. For example, some journals have implemented statcheck — an open-source tool developed by researchers in the Netherlands that trawls through papers and flags statistical errors — in their peer-review process. Some publishers are also using software to catch scientists who are doctoring data.

Algorithms have also been used to measure the rigour of scientific papers and the thoroughness of peer-review reports. But some efforts, such as using AI models to predict the future impact of papers, have drawn a fierce backlash from researchers — partly owing to a lack of transparency about how the technology works.

Thelwall agrees that the inner workings of AI systems are often not transparent, and are open to abuse through manipulation, but he argues that conventional peer review is already opaque. “We all know that reviewer judgments for journals often diverge quite a lot.”

Thelwall’s results will be released in November. The UK funders then plan to decide early in 2023 how to proceed with future REF exercises, on the basis of the study’s findings.

Catriona Firth, associate director for research environment at Research England, says that whatever the outcome, it is important that use of AI does not simply place more burdens on the REF process. “Even if we could do it and even if it were robust, would it actually save that much time by the time you trained up the algorithm?” She adds: “We don’t want to make things any more complicated than they need to be.”