Amid concerns that the torrent of research papers on COVID-19 could be leading to an excess of sloppy work, a computer scientist who develops automated programs to spot duplicate images in research papers is running his software across the world’s coronavirus preprints. So far, his efforts have yielded mixed success and some controversy.

“I felt hopeless that I was not able to help with the pandemic in any way — but maybe I can help with what I know,” says Daniel Acuna, at Syracuse University in New York. In 2018, Acuna reported the development of algorithms to screen for matching images, including flipped, resized or rotated pictures, across tens of thousands of papers at a time. The software was pioneering because it tried to spot duplicate images — which can be the result of honest mistakes or sometimes denote misconduct — on a vast scale. Research-integrity experts generally still spot errors by eye, or use software to examine matches between a small number of images across one or a few papers.

Acuna’s programs are still experimental, and are being tested by journals and research institutions, he says. But COVID-19 research seemed like a useful testbed, too. This June, he downloaded 3,500 preprints from the bioRxiv and medRxiv servers, two key repositories for coronavirus studies, and used his proprietary software to extract and compare their images — about 21,000 in total.

Controversial results

In four hours, Acuna says, the software picked up around 400 instances of potentially duplicated images. Most turned out not to be problematic, says Acuna, but he selected 24 papers that he thought contained “interesting” duplicate images. In July, he posted these on a website he created, and also flagged the issues publicly on the paper-discussion site PubPeer.

The act created a stir. Some paper authors responded to say that Acuna’s software had correctly picked up duplicate images, and that they would correct their errors, or that the mistakes had already been corrected in peer-reviewed versions of the work. “I think the tool, if proven accurate, should be implemented in PubMed as a default,” wrote Giuseppe Ballistreri, a virologist at the University of Helsinki in Finland, who thanked Acuna for doing the check.

But others said the duplicates weren’t mistakes. Elisabeth Bik, a consultant image-analyst who is renowned for her ability to spot problems in papers by eye, says she wished that Acuna had asked her first about some of his matches. The software “still needs human supervision to make sure it does not falsely label appropriate duplications as inappropriate”, she wrote on one PubPeer thread. Acuna agrees: what the software flags always needs to be reviewed by a person, and it can’t yet contextualize whether what it picks up is problematic, he says. “I still think it is useful because it is picking up things that would be hard for humans to catch.”

Some researchers said that Acuna’s software had got it wrong entirely, flagging images that were similar but not matching. For instance, Priyamvada Acharya, at Duke University in Durham, North Carolina, asked Acuna to delete his posting about a paper she had co-authored, which showed similar views of the same molecule. “We appreciate your intention and strongly support the effort. However, the implementation needs to be tweaked. Some evaluation or curating by a human is essential,” she and co-authors wrote in a letter. As a result of the feedback, Acuna has removed around one-third of the analyses from his website, and set the site’s listings to private-access, so that authors can see his findings only if they obtain an access key from him.

Overall, automated screening of research papers is far from perfect, and “still needs an expert to interpret and understand”, says Jana Christopher, an image-integrity analyst in Heidelberg, Germany, who reviewed Acuna’s findings for Nature’s news team. She says some of the duplicates flagged by Acuna look problematic, but others don’t. More broadly, automated image-checking work still focuses too heavily on finding duplicates, and cannot yet pick up all forms of data manipulation, she says.

Room for improvement

One problem for software, says Acuna, is that the popular PDF file format can hamper automated tools’ ability to extract images. This May, for instance, Bik tweeted about an image duplication in a Nature paper on COVID-19, which also appeared in a February preprint version, but Acuna’s software didn’t pick that up because of the PDF problem, he says. (The paper’s authors say they’ve asked for a correction; the journal says that it is looking into the issue.) Researchers who work on automated software to spot errors in DNA and RNA sequences made the same complaint about the PDF format in May.

With tens of thousands of preprints, reviewed papers, commentaries, letters and editorials published on the new coronavirus so far this year, numerous researchers have worried about low-quality research, errors, wasted effort and opportunism or even fraud. Scientists have warned of a “deluge of poor-quality research” and are being swamped with review requests as journals try to rush out peer-reviewed results.

Overall, more than 20 research studies about COVID-19 have been withdrawn or retracted from preprint servers or journals, according to the media site Retraction Watch. But it’s premature to conclude that such work is being retracted at higher rates than the rest of the literature, the site’s journalists wrote in a paper this month. Greater scrutiny of COVID-19 papers means that flaws are being detected more frequently than they might otherwise, they added.

Acuna says that he intends to keep up his automated end of that scrutiny — although he knows that duplicated images are only a small part of wider concerns. He’ll carry on analysing COVID-19 preprints — now at 5,500 on bioRxiv and medRxiv, and rising — notifying the authors of any issues first, and only making concerns public if they don’t respond. (Some authors haven’t responded to concerns raised in Acuna’s first batch of analyses, or to Nature’s requests for comment.) Acuna will compare their images to a much larger set of research papers on the database PubMed, which could pick up other instances of image re-use. “I like authors to be aware that someone is doing this,” he says.