A computer-aided analysis of almost 12,000 human-genetics papers has found more than 700 studies with errors in the DNA or RNA sequences of their experimental reagents [1]. That amounts to a “problem of alarming proportions”, because it suggests that a worrying fraction of studies on human genes are not reliable, says the team that conducted the analysis, led by cancer researcher Jennifer Byrne at the University of Sydney in Australia. The mistakes could be accidental but, researchers suspect, might sometimes point to fraud.
Byrne has been spotting errors in genetics research since 2015, when she found flaws in five papers reporting a common experiment: using a short stretch of DNA to inactivate a gene in cancer cells. The studies reported the wrong nucleotide sequences for the experiments they claimed to perform. Those weren’t the only problems with the papers, which also used similar language and figures; Byrne suspects they were the product of a paper mill, in which third-party companies create papers to order. Four have since been retracted.
Byrne continued to find papers with similar mistakes [2]. By 2017, she had teamed up with computer scientist Cyril Labbé at the University of Grenoble Alpes in France to create software called Seek & Blastn, which flags potential errors. It extracts short nucleotide sequences from papers and runs them through Blastn, a public tool for searching nucleotide-sequence databases, to check whether they match the human gene they’re supposed to target. Researchers manually check each flagged mismatch.
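The extract-and-compare idea can be illustrated with a minimal Python sketch. This is not the Seek & Blastn code: it swaps the Blastn search for a plain substring lookup, and the function names, the 15–30 letter length bounds, the gene label GENE_X and its reference sequence are all invented for illustration.

```python
import re

# Runs of 15-30 unambiguous DNA letters. The length bounds are an assumption
# chosen for this sketch, not a parameter taken from Seek & Blastn.
SEQ_PATTERN = re.compile(r"\b[ACGT]{15,30}\b", re.IGNORECASE)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_sequences(text):
    """Pull candidate nucleotide sequences out of free text."""
    return [match.upper() for match in SEQ_PATTERN.findall(text)]


def flag_mismatches(text, claimed_gene, reference):
    """List sequences found on neither strand of the claimed target.

    `reference` maps a gene label to its sequence. A real pipeline would
    query a sequence database here; the substring test is a stand-in.
    """
    target = reference[claimed_gene]
    flagged = []
    for seq in extract_sequences(text):
        if seq not in target and reverse_complement(seq) not in target:
            flagged.append(seq)
    return flagged


# Hypothetical data: GENE_X and both reagent sequences are made up.
reference = {"GENE_X": "ATGGCCTTAGCAGGATCCGTACGATTGCACCTGA"}
methods_text = (
    "GENE_X was amplified with the primer 5'-GCAGGATCCGTACGATTG-3' "
    "and silenced with 5'-TTTTTCCCCCGGGGGAAAAA-3'."
)
print(flag_mismatches(methods_text, "GENE_X", reference))
```

In this toy case only the second reagent is flagged, because it appears on neither strand of the invented target. As in the workflow described above, anything flagged this way would still need a manual check before being counted as an error.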
To gauge the extent of the issue, the team screened papers in two journals that they knew had previously published work with flaws: Gene (where the team checked all 7,400 original papers published from 2007 to 2018) and Oncology Reports (all 3,800 open-access papers published between 2014 and 2018). After manual inspection, around 12% of the papers checked in Oncology Reports turned out to have nucleotide-sequence flaws; in Gene, the figure was only 2%.
Byrne and her group also picked out three subfields of cancer genetics in which they had previously found problems: papers on the effect of a specific type of microRNA (a small strand of RNA) in cancer cells; papers reporting using the drugs cisplatin or gemcitabine on cancer cells or in people with cancer; and papers reporting the effects of wiping out the activity of any of 17 genes to determine their function in cancer cells. More than 25% of around 600 papers studied in these subfields contained nucleotide-sequence mistakes, the team found.
In total, 712 papers (around 6% of the total screened) had errors in the sequences of their nucleotide reagents, the researchers say in a preprint posted on bioRxiv on 31 July [1]. These studies appeared in 78 journals and in total had been cited more than 17,000 times. (Twenty-nine of the studies appeared in journals published by Springer Nature, which says it will investigate the papers in question; Nature’s news team is editorially independent of its publisher.) Just 11 of the 712 studies had been retracted and 3 had expressions of concern, the preprint notes. Byrne says that her team has now e-mailed editors at all the relevant journals or publishers whose contact details they could find, and some have replied to say that they’ll investigate the papers.
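The "around 6%" figure can be checked against the corpus sizes quoted earlier. The short Python sketch below assumes that the three corpora do not overlap and takes "around 600" as exactly 600; neither assumption is confirmed by the preprint as described here.

```python
# Corpus sizes as quoted in the article. Treating them as non-overlapping,
# and "around 600" as exactly 600, are assumptions of this sketch.
screened = {
    "Gene (2007-2018)": 7400,
    "Oncology Reports (2014-2018)": 3800,
    "targeted cancer-genetics subfields": 600,
}
flawed = 712

total = sum(screened.values())
share = flawed / total
print(f"{flawed} of {total} screened papers = {share:.1%}")
```

Under those assumptions the total is 11,800 papers, in line with the "almost 12,000" quoted at the top of the article, and the flawed share rounds to 6%.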
The editor-in-chief of Oncology Reports, virologist Demetrios Spandidos, told Nature that he has been contacted by Byrne but has not yet had enough time to evaluate all data described in the manuscript. The editor-in-chief of Gene, molecular biologist Andre Van Wijnen, did not respond to Nature’s request for comment.
Some problems could be unintentional errors, says Byrne. But, she says, the researchers found many inaccuracies that are much less plausibly honest mistakes, such as PCR (polymerase chain reaction) reagents — supposed to amplify DNA — that don’t target any known human gene or sequence.
“The unacceptably high rates of human gene function papers with incorrect nucleotide sequences that we have discovered represent a major challenge to the research fields that aim to translate genomics investments to patients,” the researchers write. At best, it wastes the time of scientists trying to follow up such studies; at worst, papers might have been faked and subfields of research could be unreliable.
Many of the problematic papers flagged in the analysis had authors affiliated with Chinese hospitals. Other sleuths have tied papers from authors affiliated with these institutions to possible paper mills — and hundreds of such studies have been retracted in the past year. Byrne suspects that some of the papers her team spotted could also be from paper mills.
Because Byrne’s team picked out a specific subset of papers, it’s not clear whether the findings are representative of genetics literature as a whole, says Olavo Amaral, a researcher who studies reproducibility and other aspects of scientific practice at the Federal University of Rio de Janeiro in Brazil. But the analysis serves as a warning of problems that need fixing. “Considering that such errors could be detected by automatic tools, I think this speaks to the fact that regular peer review does a very bad job at spotting easy-to-detect mistakes that could be better handled by more systematic forms of quality control,” Amaral says.
“This work calls for the pressing need to develop community standards and guidelines on good practices to ensure high quality and reliable gene-function research,” adds Xihong Lin, a biostatistician at Harvard University in Cambridge, Massachusetts.
Some journals (including Gene) already suggest to editors that they can use Seek & Blastn to check for errors in nucleotide sequences in submitted manuscripts, says Byrne. She’d like to see that continue. “We’re continuing to apply Seek & Blastn to screen new journals, and some of these results are even more startling,” she says.
1. Park, Y. et al. Preprint at bioRxiv https://doi.org/10.1101/2021.07.29.453321 (2021).
2. Byrne, J. A. & Labbé, C. Scientometrics 110, 1471–1493 (2017).