News
Published: 03 August 2017

Big names in statistics want to shake up much-maligned P value

Dalmeet Singh Chawla

Nature volume 548, pages 16–17 (2017)

One of scientists’ favourite statistics — the P value — should face tougher standards, say leading researchers.

Science is in the throes of a reproducibility crisis, and researchers, funders and publishers are increasingly worried that the scholarly literature is littered with unreliable results. Now, a group of 72 prominent researchers is targeting what they say is one cause of the problem: weak statistical standards of evidence for claiming new discoveries.

In many disciplines, the significance of findings is judged by P values. They are used to test (and dismiss) a ‘null hypothesis’, which generally posits that the effect being tested for doesn’t exist. A P value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true, so the smaller the P value, the harder the results are to explain by chance alone. Results are conventionally deemed ‘statistically significant’ when this value is below 0.05.
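
To make the definition concrete, here is a minimal Python sketch of a two-sided one-sample z-test. It assumes normally distributed measurements with a known standard deviation, a textbook simplification (real studies usually use t-tests or more complex models), and the study numbers are hypothetical.

```python
import math


def two_sided_p_value(sample_mean: float, null_mean: float, sd: float, n: int) -> float:
    """P value for a two-sided z-test of H0: true mean == null_mean.

    Assumes the population standard deviation `sd` is known.
    """
    # How many standard errors the observed mean lies from the null value.
    z = (sample_mean - null_mean) / (sd / math.sqrt(n))
    # Under H0, Z is standard normal and P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical study: 50 measurements with mean 0.3, known SD 1, null mean 0.
p = two_sided_p_value(sample_mean=0.3, null_mean=0.0, sd=1.0, n=50)
print(f"P = {p:.3f}")
print("significant at 0.05? ", p < 0.05)
print("significant at 0.005?", p < 0.005)
```

Here z ≈ 2.12 and P ≈ 0.034: significant at the conventional 0.05 level, but under the stricter 0.005 threshold discussed below it would count only as “suggestive”.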

But many scientists worry that the 0.05 threshold has allowed too many false positives into the literature, a problem exacerbated by P hacking, in which researchers analyse their data in many different ways (for example, gathering data without first formulating a hypothesis to test, then searching the results for patterns) until they find something that can be reported as statistically significant.
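
The sketch below simulates one hypothetical form of P hacking: a study in which no real effect exists, the researcher tests 20 unrelated outcome variables, and any outcome crossing P < 0.05 gets reported. The setup (20 outcomes, 30 observations each, standard normal noise, known-SD z-tests, a fixed random seed) is an illustrative assumption, not taken from any real study.

```python
import math
import random


def z_test_p(values: list[float]) -> float:
    """Two-sided z-test P value for H0: mean == 0, assuming a known SD of 1."""
    z = sum(values) / len(values) * math.sqrt(len(values))
    return math.erfc(abs(z) / math.sqrt(2))


def hacked_study_finds_effect(rng: random.Random, n_outcomes: int = 20,
                              n_obs: int = 30, alpha: float = 0.05) -> bool:
    """Simulate one null study that reports any outcome with P < alpha."""
    for _ in range(n_outcomes):
        # Pure noise: the null hypothesis is true for every outcome.
        data = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if z_test_p(data) < alpha:
            return True  # a "significant" pattern that could be written up
    return False


rng = random.Random(2017)  # fixed seed so the run is reproducible
n_studies = 1000
hits = sum(hacked_study_finds_effect(rng) for _ in range(n_studies))
print(f"Share of null studies reporting an effect: {hits / n_studies:.2f}")
```

Under these assumptions the share lands near 1 − 0.95²⁰ ≈ 0.64, far above the nominal 5% error rate of a single pre-specified test.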

So, in a provocative manuscript posted on the PsyArXiv preprint server on 22 July¹, researchers argue that P-value thresholds should be lowered to 0.005 for the social and biomedical sciences. The final paper is set to be published in Nature Human Behaviour.

“Researchers just don’t realize how weak the evidence is when the P value is 0.05,” says Daniel Benjamin, one of the paper’s co-lead authors and an economist at the University of Southern California in Los Angeles. He thinks that claims with P values between 0.05 and 0.005 should be treated merely as “suggestive evidence” instead of established knowledge.
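
Why is P = 0.05 weak evidence? One widely cited calibration, from Sellke, Bayarri and Berger (the 2001 American Statistician paper credited in the figure below), bounds the Bayes factor in favour of the null hypothesis from below by −e·p·ln(p), for p < 1/e. The sketch applies that bound and, assuming (hypothetically) even prior odds on the null and the alternative, converts it into a lower bound on the probability that a ‘significant’ result is a false alarm.

```python
import math


def min_bayes_factor(p: float) -> float:
    """Sellke-Bayarri-Berger lower bound on the Bayes factor for H0 vs H1.

    Valid for 0 < p < 1/e; smaller values mean stronger evidence against H0.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("bound applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def min_false_alarm_prob(p: float, prior_prob_null: float = 0.5) -> float:
    """Lowest posterior probability of H0 consistent with the bound."""
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    posterior_odds = prior_odds * min_bayes_factor(p)
    return posterior_odds / (1.0 + posterior_odds)


for p in (0.05, 0.005):
    bf = min_bayes_factor(p)
    print(f"P = {p}: evidence against H0 at most {1 / bf:.1f} to 1; "
          f"P(H0 | data) at least {min_false_alarm_prob(p):.0%}")
```

With even prior odds, a result at P = 0.05 still leaves at least about a 29% chance that the null hypothesis is true, against roughly 7% at P = 0.005.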

Other co-authors include two heavyweights in reproducibility: John Ioannidis, who studies scientific robustness at Stanford University in California, and Brian Nosek, executive director of the Center for Open Science in Charlottesville, Virginia.

Credit: R. NUZZO; SOURCE: T. SELLKE ET AL. AM. STAT. 55, 62–71 (2001)

Super-sized samples

One problem with lowering P-value thresholds is that doing so may increase the rate of false negatives (concluding that effects do not exist when in fact they do), says Casper Albers, a researcher in psychometrics and statistics at the University of Groningen in the Netherlands. To counter that problem, Benjamin and his colleagues suggest that researchers increase sample sizes by 70%, which they say would keep false-negative rates from rising while still dramatically reducing false-positive rates. But Albers thinks that, in practice, only well-funded scientists would have the means to do this.
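
The 70% figure can be roughly reproduced with the standard normal-approximation sample-size formula for a two-sided test of a fixed effect size, in which the required n is proportional to (z₁₋α/₂ + z₁₋β)². The sketch assumes 80% power (β = 0.2), a common convention; the preprint’s exact assumptions may differ.

```python
from statistics import NormalDist


def relative_sample_size(alpha_new: float, alpha_old: float, power: float = 0.8) -> float:
    """Ratio of sample sizes needed to keep `power` when alpha changes.

    Uses the normal approximation n ∝ (z_{1-alpha/2} + z_{power})**2
    for a two-sided test of a fixed effect size.
    """
    std_normal = NormalDist()  # mean 0, SD 1
    z_power = std_normal.inv_cdf(power)

    def n_factor(alpha: float) -> float:
        return (std_normal.inv_cdf(1 - alpha / 2) + z_power) ** 2

    return n_factor(alpha_new) / n_factor(alpha_old)


ratio = relative_sample_size(alpha_new=0.005, alpha_old=0.05)
print(f"Sample size must grow by about {ratio - 1:.0%}")
```

Because the effect size cancels in the ratio, the same increase of roughly 70% applies whatever the effect being studied, as long as the normal approximation holds.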

Shlomo Argamon, a computer scientist at the Illinois Institute of Technology in Chicago, says there is no simple answer to the problem, because “no matter what confidence level you choose, if there are enough different ways to design your experiment, it becomes highly likely that at least one of them will give a statistically significant result just by chance”. More-radical changes such as new methodological standards and research incentives are needed, he says.
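
Argamon’s point can be made quantitative under a simplifying assumption: if k analyses are statistically independent and the null hypothesis is true, the chance that at least one gives P < α is 1 − (1 − α)^k. Real analysis choices are usually correlated, which typically lowers this rate somewhat, so treat the numbers as a rough illustration. The sketch also solves for how many analyses it takes to reach a 50% chance of a spurious hit at each threshold.

```python
import math


def chance_of_spurious_hit(alpha: float, k: int) -> float:
    """P(at least one of k independent null tests has P < alpha)."""
    return 1.0 - (1.0 - alpha) ** k


def analyses_for_even_odds(alpha: float) -> int:
    """Smallest k giving at least a 50% chance of a spurious hit."""
    return math.ceil(math.log(0.5) / math.log(1.0 - alpha))


for alpha in (0.05, 0.005):
    print(f"alpha = {alpha}: 20 analyses -> "
          f"{chance_of_spurious_hit(alpha, 20):.0%} chance of a false positive; "
          f"{analyses_for_even_odds(alpha)} analyses for even odds")
```

The stricter threshold cuts the 20-analysis rate from about 64% to about 10%, but a large enough space of analysis choices (here roughly 139) still makes a chance finding more likely than not, which is why Argamon argues that thresholds alone cannot solve the problem.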

Lowering P-value thresholds may also exacerbate the “file-drawer problem”, in which studies with negative results are left unpublished, says Tom Johnstone, a cognitive neuroscientist at the University of Reading, UK. But Benjamin says all research should be published, regardless of P value.

Moving goalposts

Other scientific fields have already cracked down on P values, and in 2015 one psychology journal banned them. Particle physicists, who collect reams of data from atom-smashing experiments, have long demanded a P value below 0.0000003 (or 3 × 10⁻⁷) because of concerns that a less stringent threshold could lead to mistaken claims, notes Valen Johnson, a statistician at Texas A&M University in College Station and a co-lead author of the paper. More than a decade ago, geneticists took similar steps to establish a threshold of 5 × 10⁻⁸ for genome-wide association studies, which look for differences between people with a disease and those without across hundreds of thousands of DNA-letter variants.
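
The particle-physics figure corresponds to the ‘five sigma’ convention: the one-sided probability that a standard normal variable exceeds 5 is about 2.9 × 10⁻⁷, which rounds to the 3 × 10⁻⁷ quoted above. The genome-wide threshold is commonly explained as a Bonferroni correction, dividing 0.05 by roughly one million independent common variants; that derivation is background knowledge rather than something stated in the article, and the sketch below simply reproduces both numbers.

```python
import math


def one_sided_tail(sigma: float) -> float:
    """P(Z > sigma) for a standard normal Z."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))


def bonferroni_threshold(family_alpha: float, n_tests: int) -> float:
    """Per-test threshold keeping the family-wise error rate <= family_alpha."""
    return family_alpha / n_tests


print(f"5 sigma -> P = {one_sided_tail(5.0):.2g}")
print(f"0.05 / 1,000,000 tests -> {bonferroni_threshold(0.05, 1_000_000):.0e}")
```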

Yet other scientists have abandoned P values in favour of more-sophisticated statistical tools, such as Bayesian tests, which require researchers to define and test two alternative hypotheses. But not all researchers will have the technical expertise to carry out Bayesian tests, says Johnson, who thinks that P values can still be useful for gauging whether a hypothesis is supported by evidence. “P value by itself is not necessarily evil.”
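
A Bayesian test weighs how well each of two explicitly stated hypotheses predicts the data. In the simplest case, where each hypothesis fixes a single parameter value, the Bayes factor is just the ratio of their likelihoods. The coin-flip example below is hypothetical: it compares H0, a fair coin (heads probability 0.5), with H1, a coin biased to 0.7, after 14 heads in 20 flips. Practical Bayesian tests usually place a prior distribution over the alternative rather than a single value, so this is a deliberately simplified sketch.

```python
import math


def binomial_log_likelihood(heads: int, flips: int, prob_heads: float) -> float:
    """Log-likelihood of `heads` in `flips` tosses with P(heads) = prob_heads."""
    return (math.log(math.comb(flips, heads))
            + heads * math.log(prob_heads)
            + (flips - heads) * math.log(1.0 - prob_heads))


def bayes_factor(heads: int, flips: int, p_alt: float, p_null: float = 0.5) -> float:
    """Bayes factor BF10 = P(data | H1) / P(data | H0) for two point hypotheses."""
    return math.exp(binomial_log_likelihood(heads, flips, p_alt)
                    - binomial_log_likelihood(heads, flips, p_null))


bf = bayes_factor(heads=14, flips=20, p_alt=0.7)
print(f"BF10 = {bf:.1f}: the data are {bf:.1f} times more likely under H1 than H0")
```

Values above 1 favour H1 and values below 1 favour H0; unlike a P value, the result depends on which alternative hypothesis is specified.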

References

Benjamin, D. et al. Preprint on PsyArXiv http://osf.io/preprints/psyarxiv/mky9j (2017).

About this article

Cite this article

Singh Chawla, D. Big names in statistics want to shake up much-maligned P value. Nature 548, 16–17 (2017). https://doi.org/10.1038/nature.2017.22375

Issue Date: 03 August 2017
DOI: https://doi.org/10.1038/nature.2017.22375
