Two scientists have rolled out a program that spots incorrect gene sequences reported in experiments — and have used it to identify flaws in more than 60 papers, almost all of them studies of cancer.

Jennifer Byrne, a cancer researcher at the Kids Research Institute of the Children’s Hospital at Westmead in Sydney, Australia, and Cyril Labbé, a computer scientist at the University of Grenoble Alpes in Grenoble, France, made public an early version of the program, called Seek & Blastn, in October. Now they want other researchers to test the program and help to improve it. They then plan to offer it to journal editors and publishers as an addition to the tools that most already use to check papers, such as software to detect plagiarism.

Byrne has been working on identifying errors in human cancer papers since 2015, when she noticed problems with five papers on gene function in cancer cells. The authors of the papers described performing a common experiment in which they inactivated a gene using a short targeted nucleotide sequence, to observe the gene’s effects on tumour cells. Byrne was familiar with the gene because she was part of the team that reported it in 1998, and she realized that the papers reported using the wrong nucleotide sequences for the experiments they claimed to conduct. Two of these papers have since been retracted, and another two are expected to be retracted on 21 November.

Experimental errors

After noticing similar errors in another 25 papers, Byrne and Labbé developed the Seek & Blastn tool to discover more papers with incorrectly identified nucleotide fragments. The software extracts nucleotide sequences from uploaded papers and cross-checks them against a public nucleotide database using the Nucleotide Basic Local Alignment Search Tool (Blastn).
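The extraction step can be pictured as a simple text scan. The following sketch is illustrative only, not the authors’ code: it assumes that reported sequences appear as uninterrupted runs of the bases A, C, G and T, and the 15-base minimum length used to skip ordinary words is an assumption.

```python
import re

# Candidate nucleotide sequences: uninterrupted runs of A/C/G/T at
# least 15 bases long (the length cutoff is an illustrative choice).
NUCLEOTIDE_RUN = re.compile(r"\b[ACGTacgt]{15,}\b")

def extract_sequences(paper_text: str) -> list[str]:
    """Pull primer- or siRNA-like sequences out of a paper's plain text."""
    return [m.group(0).upper() for m in NUCLEOTIDE_RUN.finditer(paper_text)]
```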

“Seek & Blastn tries to find mismatches between the claimed status of a sequence — what the paper says it does — and what the sequence actually is,” says Byrne. A mismatch is flagged, for instance, when a sequence described as targeting a human gene finds no match in the nucleotide database. Sequences described as non-targeting that do match a human sequence in the database are also flagged.
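In outline, each check reduces to a single comparison between a sequence’s claimed status and a search result. The sketch below uses Biopython’s interface to NCBI’s online blastn service to illustrate the idea; it is not the published tool, and the function name and significance cutoff are assumptions.

```python
from Bio.Blast import NCBIWWW, NCBIXML  # Biopython's NCBI BLAST client

def status_matches(sequence: str, claimed_targeting: bool) -> bool:
    """Compare a paper's claim about a sequence with what blastn finds.

    claimed_targeting is True if the paper says the sequence targets a
    human gene, False if it is described as a non-targeting control.
    """
    # Search the sequence against human RefSeq transcripts.
    handle = NCBIWWW.qblast(
        "blastn",
        "refseq_rna",
        sequence,
        entrez_query="Homo sapiens[Organism]",
        expect=0.01,  # illustrative significance cutoff
    )
    record = NCBIXML.read(handle)
    has_human_hit = bool(record.alignments)

    # A "targeting" sequence should hit a human transcript; a
    # "non-targeting" control should not. Anything else is a mismatch.
    return claimed_targeting == has_human_hit
```

A screening wrapper along these lines would flag a paper whenever status_matches returned False for any sequence the paper reports.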

So far, the program detects only misidentified human sequences, says Labbé, but the pair hope to develop it to check sequences from other species, such as mice. The program also struggles to pick up misidentified sequences if the description is unclear in the original paper. This can cause the program to miss some mistakes and to flag papers that have no errors, so all papers put through the software should also be checked manually, he says.

The pair say that they used Seek & Blastn to detect mismatched sequences in another 60 papers. Many of these manuscripts have other problems, such as poor-quality images and graphs, and large chunks of overlapping text, all of which make some of the papers “strikingly similar” to each other, says Byrne. With the help of colleagues, they are now checking the papers manually.

Although some errors are minor or accidental, Byrne says that most of the mismatches they have detected could invalidate the papers’ results and conclusions. When you see these incorrectly identified sequences, she says, “you do get concerned about how the results were produced and whether the results in the paper actually reflect the experiments that were done”.

In a 2016 study in Scientometrics, Byrne and Labbé reported 48 problematic papers, including the 30 that had incorrectly identified nucleotide fragments; all were written by authors from China. The duo did not publicly identify the papers, apart from the five from 2015, but privately contacted journal editors, Byrne says. Many of the editors have not responded, she says, but three more papers have been retracted. In total, the pair have identified incorrect sequences in more than 90 papers.

Automated tools such as Seek & Blastn are most valuable if they are used to promote good scientific practice and encourage scientists to avoid errors in the first place, rather than just catch people out, says statistician David Allison at Indiana University in Bloomington, who has spotted many papers with substantial errors. Such tools could also help to quantify error rates in particular journals and fields, he says.

Matt Hodgkinson, head of research integrity for the open-access publisher Hindawi in London, which retracted two of the papers from its journal BioMed Research International, says he could see publishers using Seek & Blastn as part of the article-screening process. “It would depend on the cost and ease of use, whether it can be used and interpreted at scale,” says Hodgkinson. Staff or academic editors would also need to check the output, given the risk of false positives, he says.