News
Published: 07 March 2016

Statisticians issue warning over misuse of P values

Monya Baker

Nature volume 531, page 151 (2016)

Policy statement aims to halt missteps in the quest for certainty.

Misuse of the P value — a common test for judging the strength of scientific evidence — is contributing to the number of research findings that cannot be reproduced, the American Statistical Association (ASA) warns in a statement released today¹. The group has taken the unusual step of issuing principles to guide use of the P value, which it says cannot determine whether a hypothesis is true or whether results are important.

This is the first time that the 177-year-old ASA has made explicit recommendations on such a foundational matter in statistics, says executive director Ron Wasserstein. The society’s members had become increasingly concerned that the P value was being misapplied in ways that cast doubt on statistics generally, he adds.

In its statement, the ASA advises researchers to avoid drawing scientific conclusions or making policy decisions based on P values alone. Researchers should describe not only the data analyses that produced statistically significant results, the society says, but all statistical tests and choices made in calculations. Otherwise, results may seem falsely robust.
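
A rough sketch of why undisclosed tests can make results seem falsely robust: if every null hypothesis is true and the tests are independent (both simplifying assumptions made here for illustration, not claims from the ASA statement), the chance that at least one of several tests falls below a 0.05 threshold grows quickly with the number of tests. The function name and the 0.05 default below are illustrative choices.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests gives
    P < alpha when every null hypothesis is actually true.

    Assumes independent tests; real analyses of the same data are usually
    correlated, so treat this as an approximation.
    """
    # Each test stays above alpha with probability (1 - alpha);
    # all n_tests do so with probability (1 - alpha) ** n_tests.
    return 1 - (1 - alpha) ** n_tests


def report(test_counts=(1, 5, 20)):
    """Build one summary line per number of tests, for illustration."""
    return [
        f"{k} test(s): {chance_of_any_false_positive(k):.1%} chance of a false positive"
        for k in test_counts
    ]
```

Under these assumptions, a single test carries a 5% chance of a false positive, whereas 20 tests carry roughly a 64% chance that at least one looks significant. Reporting only the one that cleared the threshold hides that context.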

Véronique Kiermer, executive editor of the Public Library of Science journals, says that the ASA’s statement lends weight and visibility to longstanding concerns over undue reliance on the P value. “It is also very important in that it shows statisticians, as a profession, engaging with the problems in the literature outside of their field,” she adds.

Weighing the evidence

P values are commonly used to test (and dismiss) a ‘null hypothesis’, which generally states that there is no difference between two groups, or that there is no correlation between a pair of characteristics. The smaller the P value, the less likely an observed set of values would occur by chance — assuming that the null hypothesis is true. A P value of 0.05 or less is generally taken to mean that a finding is statistically significant and warrants publication. But that is not necessarily true, the ASA statement notes.

A P value of 0.05 does not mean that there is a 95% chance that a given hypothesis is correct. Instead, it signifies that if the null hypothesis is true, and all other assumptions made are valid, there is a 5% chance of obtaining a result at least as extreme as the one observed. And a P value cannot indicate the importance of a finding; for instance, a drug can have a statistically significant effect on patients’ blood glucose levels without having a therapeutic effect.
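
One way to make this definition concrete is a permutation test, sketched below with made-up inputs. It is an illustration, not a method taken from the ASA statement: the function name, the number of shuffles and the seed are all assumptions. The null hypothesis is that group labels do not matter, so the labels are shuffled many times and the P value is the share of shuffles whose difference in means is at least as extreme as the one observed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation P value for a difference in group means.

    Illustrative sketch: estimates how often relabelling the data at random
    (i.e. assuming the null hypothesis of no group difference) produces a
    difference at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1

    # Count the observed labelling as one permutation, so the estimate
    # is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

The number returned describes how surprising the data are if the null hypothesis holds. It says nothing about the probability that the hypothesis itself is true, nor about whether the difference is large enough to matter in practice.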

Giovanni Parmigiani, a biostatistician at the Dana-Farber Cancer Institute in Boston, Massachusetts, says that misunderstandings about what information a P value provides often crop up in textbooks and practice manuals. A course correction is long overdue, he adds. “Surely if this happened twenty years ago, biomedical research could be in a better place now.”

Frustration abounds

Criticism of the P value is nothing new. In 2011, researchers trying to raise awareness about false positives gamed an analysis to reach a statistically significant finding: that listening to music by the Beatles makes undergraduates younger². More controversially, in 2015, a set of documentary filmmakers published conclusions from a purposely shoddy clinical trial — supported by a robust P value — to show that eating chocolate helps people to lose weight. (The article has since been retracted.)

But Simine Vazire, a psychologist at the University of California, Davis, and editor of the journal Social Psychological and Personality Science, thinks that the ASA statement could help to convince authors to disclose all of the statistical analyses that they run. “To the extent that people might be sceptical, it helps to have statisticians saying, ‘No, you can’t interpret P values without this information,’” she says.

More drastic steps, such as the ban on publishing papers that contain P values instituted by at least one journal, could be counter-productive, says Andrew Vickers, a biostatistician at Memorial Sloan Kettering Cancer Center in New York City. He compares attempts to bar the use of P values to addressing the risk of automobile accidents by warning people not to drive — a message that many in the target audience would probably ignore. Instead, Vickers says that researchers should be instructed to “treat statistics as a science, and not a recipe”.

But a better understanding of the P value will not take away the human impulse to use statistics to create an impossible level of confidence, warns Andrew Gelman, a statistician at Columbia University in New York City.

“People want something that they can't really get,” he says. “They want certainty.”

References

1. Wasserstein, R. L. & Lazar, N. A. The American Statistician (advance online publication, 2016).
2. Simmons, J. P., Nelson, L. D. & Simonsohn, U. Psychol. Sci. 22, 1359–1366 (2011).

About this article

Cite this article

Baker, M. Statisticians issue warning over misuse of P values. Nature 531, 151 (2016). https://doi.org/10.1038/nature.2016.19503

Published: 07 March 2016
Issue Date: 10 March 2016
DOI: https://doi.org/10.1038/nature.2016.19503
