Reproducibility: A tragedy of errors

Allison, David B.; Brown, Andrew W.; George, Brandon J.; Kaiser, Kathryn A.

doi:10.1038/530027a


Comment
Published: 03 February 2016


Nature volume 530, pages 27–29 (2016)


Mistakes in peer-reviewed papers are easy to find but hard to fix, report David B. Allison and colleagues.

Credit: Illustration by David Parkins

Just how error-prone and self-correcting is science? We have spent the past 18 months getting a sense of that.


We are a group of researchers working on obesity, nutrition and energetics. In the summer of 2014, one of us (D.B.A.) read a research paper in a well-regarded journal estimating how a change in fast-food consumption would affect children's weight, and he noted that the analysis applied a mathematical model that overestimated effects by more than tenfold. We and others submitted a letter¹ to the editor explaining the problem. Months later, we were gratified to learn that the authors had elected to retract their paper. In the face of popular articles proclaiming that science is stumbling, this episode was an affirmation that science is self-correcting.

Sadly, in our experience, the case is not representative. In the course of assembling weekly lists of articles in our field, we began noticing more peer-reviewed articles containing what we call substantial or invalidating errors. These involve factual mistakes or veer substantially from clearly accepted procedures in ways that, if corrected, might alter a paper's conclusions.

After attempting to address more than 25 of these errors with letters to authors or journals, and identifying at least a dozen more, we had to stop — the work took too much of our time. Our efforts revealed invalidating practices that occur repeatedly (see ‘Three common errors’) and showed how journals and authors react when faced with mistakes that need correction.


We learned that post-publication peer review is not consistent, smooth or rapid. Many journal editors and staff members seemed unprepared or ill-equipped to investigate, take action or even respond. Too often, the process spiralled through layers of ineffective e-mails among authors, editors and unidentified journal representatives, often without any public statement added to the original article. Some journals that acknowledged mistakes required a substantial fee to publish our letters: we were asked to spend our research dollars on correcting other people's errors.


As academics who publish, review papers or serve as editors, we appreciate that these issues are complicated. And we feel that journal editors are dedicated and sincere in their efforts. Nevertheless, the scientific community must improve.

Science relies essentially but complacently on self-correction, yet scientific publishing raises severe disincentives against such correction. One publisher states that it will charge the author who initiates withdrawal of a published paper US$10,000.

Here we summarize our experience, the main barriers we encountered, and our thoughts on how to make published science more rigorous. (Details of other resolved issues are available on request.)

Six problems

Editors are often unable or reluctant to take speedy and appropriate action. For one paper, we obtained raw data deposited online, received institutional approval to reanalyse the data, and submitted a letter to the editor (through the manuscript-submission system) describing a need for correction within two weeks. After nine months, we asked the journal why, at minimum, an expression of concern had not been posted. An editor admitted that they had not anticipated the process taking as long as it had. The journal communicated its decision to accept our letter and retract the article 11 months after our submission. The letter and retraction have yet to be published.


Where to send expressions of concern is unclear. Journals rarely state whom to contact about potentially invalidating errors. We had to guess whether to send letters to a staff member or editor, formally submit the letter as a manuscript, or contact the authors of a paper directly. On a few occasions, we opted to contact authors when an apparent invalidating error may have merely been an ambiguous description. In unequivocal cases, we usually contacted the journal. Often, journals provided no way to contact editors directly, and editorial staff corresponded without identifying themselves; we were unsure whether editors were involved.

Journals that acknowledged invalidating errors were reluctant to issue retractions. In one case, we and others found that a paper had mistakenly argued that a statistical adjustment introduced bias, and we submitted a letter to the editor through the journal's submission system². An external statistical review subsequently commissioned by the journal confirmed the error. The authors were asked to retract the article, but they refused. The journal ultimately posted the authors' response to our letter and a summary of commissioned reviewers' criticism. An accompanying editorial published³ by the journal stated that “it is each author's responsibility to make sure that statistical procedures are correctly used and valid for the study submitted”.


Journals charge authors to correct others' mistakes. For one article that we believed contained an invalidating error, our options were to post a comment in an online commenting system or pay a 'discounted' submission fee of US$1,716. With another journal from the same publisher, the fee was £1,470 (US$2,100) to publish a letter. Letters from the journal advised that “we are unable to take editorial considerations into account when assessing waiver requests, only the author's documented ability to pay”. The Committee on Publication Ethics, an independent body that provides advice on how to handle research misconduct, asserts that readers should not have to pay to read retractions. To our knowledge, no authority has discussed whether third parties should be charged to correct errors.


No standard mechanism exists to request raw data. When we were able to access data online, we could quickly confirm suspected errors. In at least two cases, we requested data from the authors but received summaries of calculations instead. Sometimes we received no data at all, at which point it was not clear whether journal staff should step in. One journal did retract a paper when its authors refused to show their data or explain discrepancies that we had identified and alerted the journal to in a letter⁴.

Working directly with authors can delay correction. After we contacted authors about another paper, they offered to reanalyse the data to address our concerns. After a month with no response, we submitted a letter of concern to the journal. The letter was peer-reviewed and accepted within three weeks. The authors, when made aware of the pending publication of our letter, e-mailed us to state that they would prepare a reply, and we asked the journal not to publish our letter so that we could collaborate with the original authors. That process is ongoing, ten months after we identified the error.

Informal expressions of concern are overlooked. Although online platforms such as PubMed Commons offer a convenient way to comment on published papers, they do not include a mediating role for journal editors, and the comments are not incorporated into the literature. Posted concerns are rarely prominent on journals' websites and are not cross-referenced in any useful way. As a result, readers may assume that a flawed paper is correct, potentially leading to misinformed decisions in science, patient care and public policy.


In one case, we chose to post a comment on the journal website and on PubMed Commons after months of private correspondence, in which the authors shared some supplementary data and said that they were preparing a full response. The concerns have been acknowledged but remain unaddressed 15 months after we contacted authors and the journal, and 6 months after we posted our comment (see go.nature.com/fv8tr2).

What can be done?

Journals have guidelines for paper submissions and peer review. The Committee on Publication Ethics has outlined recommendations for journals to address problems in areas such as authorship and review. But there is little formal guidance for post-publication corrections. (For our recommendations, see ‘Fixing post-publication review’.)

Table: Fixing post-publication review

Journals, publishers and scientific societies should standardize, streamline and publicize these processes. Authors and journals should share data and code quickly when questions arise. Researchers can aid this process by accessing statistical expertise for experimental design and analysis.

Ideally, anyone who detects a potential problem with a study will engage, whether by writing to authors and editors or by commenting online, and will do so in a collegial way. Scientists who engage in post-publication review often do so out of a sense of duty to their community, but this important work does not come with the same prestige as other scientific endeavours. Recognizing and incentivizing such activities could go a long way to cleaning up the literature.


Our work was not a systematic search; we simply looked more closely at papers that caught our eye and that we were prepared to assess. We do not know the rate of errors or the motivations behind them (that is, whether they are honest mistakes or a 'sleight of statistics'). But we showed that a small team of investigators with expertise in statistics and experimental design could find dozens of problematic papers while keeping abreast of the literature. Most were detected simply by reading the paper.

A more formal survey would help to determine whether our experiences reflect science in general and whether our recommendations are feasible or effective. Others working to correct the scientific record have encountered similar challenges. Ben Goldacre, a physician and campaigner who is leading COMPare, a project that checks that clinical trials report the outcomes they said they would, told Retraction Watch: “This is a phenomenally laborious process. Not a week goes by that we don't curse the day we set out to do this.”

Robust science needs robust corrections. It is time to make the process less onerous.

Three common errors

Credit: A. Barrington Brown, Gonville and Caius College/SPL

As the influential twentieth-century statistician Ronald Fisher (pictured) said: “To consult the statistician after an experiment is finished is often merely to ask him to conduct a post mortem examination. He can perhaps say what the experiment died of.”

Too many of our post-publication reviews were indeed post mortems. Some studies used inappropriate or non-randomization methods, despite stating that their studies were randomized (see, for example, ref. 5 and go.nature.com/x2l9zz). Others described mathematically or physiologically impossible results: p-values greater than 1, or an average height change of about 7 centimetres in adults over 8 weeks⁴,⁶.

Frequent errors, once recognized, can be kept out of the literature with targeted education and policies. Three of the most common are outlined below. These and others are described in depth in an upcoming publication⁷.

1 Mistaken design or analysis of cluster-randomized trials. In these studies, all participants in a cluster (for example, a cage, school or hospital) are given the same treatment. The number of clusters (not just the number of individuals) must be incorporated into the analysis. Otherwise, results often seem, falsely, to be statistically significant⁸,⁹. Increasing the number of individuals within clusters can increase power, but the gains are minute compared with increasing clusters. Designs with only one cluster per treatment are not valid as randomized experiments, regardless of how many individuals are included.
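The diminishing return from adding individuals to existing clusters can be illustrated with the widely used design-effect approximation, 1 + (m − 1) × ICC, where m is the cluster size and ICC is the intraclass correlation. This formula and the numbers below are not taken from the article: the ICC of 0.05 and the cluster counts are hypothetical, and the approximation assumes equal cluster sizes. Treat it as a minimal sketch rather than a full power calculation.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC.

    Assumes equal cluster sizes and a single common intraclass
    correlation (both are simplifying assumptions).
    """
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_clusters: int, cluster_size: int, icc: float) -> float:
    """Number of independent observations the clustered sample is 'worth'."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)


if __name__ == "__main__":
    icc = 0.05  # hypothetical intraclass correlation

    # Baseline: 10 clusters of 20 individuals (200 people in total).
    print(effective_sample_size(10, 20, icc))   # about 103

    # Ten times more people per cluster (2,000 people in total).
    print(effective_sample_size(10, 200, icc))  # about 183

    # Ten times more clusters instead (also 2,000 people in total).
    print(effective_sample_size(100, 20, icc))  # about 1026
```

Under these assumptions, no amount of within-cluster recruitment can push the effective sample size past n_clusters / ICC (here 200), whereas adding clusters scales it almost linearly.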

2 Miscalculation in meta-analyses. Effect sizes are often miscalculated when meta-analysts are confronted with incomplete information and do not adapt appropriately. Another problem is confusion about how to calculate the variance of effects. Different study designs and meta-analyses require different approaches. Incorrect or inconsistent choices can change effect sizes, study weighting or the overall conclusions⁴.
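One way such a miscalculation can arise, offered here purely as a hypothetical illustration and not as a case documented in the article, is treating a reported standard error as if it were a standard deviation when computing a standardized mean difference. The sketch below uses invented study values (25 participants per arm, means of 2.0 and 1.0, standard error 0.4) and a simple pooled-SD Cohen's d without small-sample correction.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """Recover a standard deviation from a standard error of the mean."""
    return se * math.sqrt(n)


def standardized_mean_difference(mean_t: float, mean_c: float,
                                 sd_t: float, sd_c: float,
                                 n_t: int, n_c: int) -> float:
    """Cohen's d using the pooled standard deviation of the two arms."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Hypothetical trial: the paper reports means and standard errors only.
    n, mean_t, mean_c, reported_se = 25, 2.0, 1.0, 0.4

    # Correct handling: convert the standard error to a standard deviation.
    sd = sd_from_se(reported_se, n)
    correct = standardized_mean_difference(mean_t, mean_c, sd, sd, n, n)

    # Mistaken handling: plug the standard error in as if it were the SD.
    mistaken = standardized_mean_difference(mean_t, mean_c,
                                            reported_se, reported_se, n, n)

    print(correct)   # 0.5
    print(mistaken)  # 2.5, inflated by a factor of sqrt(n)
```

In this invented example the mix-up inflates the effect size fivefold, which would also distort the study's weight in any pooled estimate.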

3 Inappropriate baseline comparisons. In at least six articles, authors tested for changes from the baseline in separate groups; if one was significant and one not, the authors (wrongly) proposed a difference between groups. Rather than comparing 'differences in nominal significance' (the DINS error), differences between groups must be compared directly. For studies comparing two equal-sized groups, the DINS error can inflate the false-positive rate from 5% to as much as 50% (ref. 10).
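The 50% figure for the DINS error can be reproduced with a short calculation. The sketch below is an illustration under simplifying assumptions that are not from the article: two independent, equal-sized groups with no true difference between them, a two-sided 5% z-test with known variance, and a shared true change from baseline expressed as a standardized `shift` (mean change × √n / SD). All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

# Two-sided 5% critical value of the standard normal, about 1.96.
Z_CRIT = NormalDist().inv_cdf(0.975)


def within_group_power(shift: float) -> float:
    """Chance that one group's change-from-baseline z-test is significant."""
    nd = NormalDist()
    return (1 - nd.cdf(Z_CRIT - shift)) + nd.cdf(-Z_CRIT - shift)


def dins_false_positive_rate(shift: float) -> float:
    """Chance that exactly one of the two groups is 'significant'.

    Both groups share the same true change, so any claimed
    between-group difference is a false positive.
    """
    p = within_group_power(shift)
    return 2 * p * (1 - p)


def simulate(shift: float, n_sims: int = 20000, seed: int = 0):
    """Monte Carlo check: returns (DINS rate, direct-comparison rate)."""
    rng = random.Random(seed)
    dins = direct = 0
    for _ in range(n_sims):
        # Draw each group's z-statistic for change from baseline.
        z1 = rng.gauss(shift, 1.0)
        z2 = rng.gauss(shift, 1.0)
        # DINS error: one within-group test significant, the other not.
        dins += (abs(z1) > Z_CRIT) != (abs(z2) > Z_CRIT)
        # Correct approach: test the between-group difference directly.
        direct += abs((z1 - z2) / math.sqrt(2)) > Z_CRIT
    return dins / n_sims, direct / n_sims


if __name__ == "__main__":
    print(dins_false_positive_rate(0.0))     # 0.095 when nothing changes at all
    print(dins_false_positive_rate(Z_CRIT))  # about 0.5, the worst case
    print(simulate(Z_CRIT))                  # roughly (0.5, 0.05)
```

The worst case arises when each within-group test has about 50% power, so that which group reaches significance is effectively a coin toss; the direct between-group comparison stays near the nominal 5% regardless of the shared change.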

References

1. Brown, A. W. et al. Child. Obes. 10, 542–545 (2014).
2. Li, P. et al. Obes. Facts 8, 127–129 (2015).
3. Hauner, H. Obes. Facts 8, 125–126 (2015).
4. George, B. J., Brown, A. W. & Allison, D. B. J. Paramedical Sci. 6, 153–154 (2015).
5. George, B. J., Goldsby, T. U., Brown, A. W., Li, P. & Allison, D. B. Int. J. Yoga 9, 87–88 (2016).
6. Thomas, D. M. et al. World J. Acupunct. Moxibustion 25, 66–67 (2015).
7. George, B. J. et al. Obesity (in the press).
8. Brown, A. W. et al. Am. J. Clin. Nutr. 102, 241–248 (2015).
9. Obesity 23, 2522 (2015).
10. Bland, J. M. & Altman, D. G. Am. J. Clin. Nutr. 102, 991–994 (2015).


Author information

Authors and Affiliations

David B. Allison is a distinguished professor in the Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Alabama, USA.
Andrew W. Brown is a scientist in the Office of Energetics and the Nutrition Obesity Research Center, University of Alabama at Birmingham, Alabama, USA.
Brandon J. George is a statistician in the Office of Energetics, University of Alabama at Birmingham, Alabama, USA.
Kathryn A. Kaiser is an instructor in the Office of Energetics and the Nutrition Obesity Research Center, University of Alabama at Birmingham, Alabama, USA.


Corresponding author

Correspondence to David B. Allison.

Ethics declarations

Competing interests

D.B.A. is on multiple editorial boards and receives financial compensation from Frontiers in Genetics, Obesity and The American Journal of Clinical Nutrition. A.W.B. and K.A.K. are on the editorial board for Frontiers in Nutrition Methodology, but receive no compensation in their roles.
