Nature | News

Statisticians issue warning over misuse of P values

Policy statement aims to halt missteps in the quest for certainty.


Misuse of the P value — a common test for judging the strength of scientific evidence — is contributing to the number of research findings that cannot be reproduced, the American Statistical Association (ASA) warns in a statement released today [1]. The group has taken the unusual step of issuing principles to guide use of the P value, which it says cannot determine whether a hypothesis is true or whether results are important.

This is the first time that the 177-year-old ASA has made explicit recommendations on such a foundational matter in statistics, says executive director Ron Wasserstein. The society’s members had become increasingly concerned that the P value was being misapplied in ways that cast doubt on statistics generally, he adds.

In its statement, the ASA advises researchers to avoid drawing scientific conclusions or making policy decisions based on P values alone. Researchers should describe not only the data analyses that produced statistically significant results, the society says, but all statistical tests and choices made in calculations. Otherwise, results may seem falsely robust.

Véronique Kiermer, executive editor of the Public Library of Science journals, says that the ASA’s statement lends weight and visibility to longstanding concerns over undue reliance on the P value. “It is also very important in that it shows statisticians, as a profession, engaging with the problems in the literature outside of their field,” she adds.

Weighing the evidence

P values are commonly used to test (and dismiss) a ‘null hypothesis’, which generally states that there is no difference between two groups, or that there is no correlation between a pair of characteristics. The smaller the P value, the less likely an observed set of values would occur by chance — assuming that the null hypothesis is true. A P value of 0.05 or less is generally taken to mean that a finding is statistically significant and warrants publication. But that is not necessarily true, the ASA statement notes.
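
To make the mechanics concrete, here is a minimal sketch of the kind of test being described: a two-sample t-test of the null hypothesis that two groups share the same mean. The language (Python, with NumPy and SciPy) and every number in it are illustrative assumptions, not drawn from any study the article discusses.

```python
# Minimal sketch: two-sample t-test of the null hypothesis that two
# groups share the same mean. All numbers are invented for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=100.0, scale=15.0, size=50)
treated = rng.normal(loc=100.0, scale=15.0, size=50)  # same distribution, so the null is true

t_stat, p_value = stats.ttest_ind(control, treated)
print(f"t = {t_stat:.2f}, P = {p_value:.3f}")

# Because the null holds by construction, P is uniformly distributed:
# re-running with different seeds dips below 0.05 about 5% of the time.
```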

A P value of 0.05 does not mean that there is a 95% chance that a given hypothesis is correct. Instead, it signifies that if the null hypothesis is true, and all other assumptions made are valid, there is a 5% chance of obtaining a result at least as extreme as the one observed. And a P value cannot indicate the importance of a finding; for instance, a drug can have a statistically significant effect on patients’ blood glucose levels without having a therapeutic effect.
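
The glucose example lends itself to the same kind of hypothetical sketch: with a large enough trial, a clinically negligible shift in mean blood glucose still yields a vanishingly small P value. The sample size, means, and units below are assumptions chosen purely to illustrate the point.

```python
# Statistically significant but therapeutically meaningless: a tiny,
# clinically irrelevant effect detected with a very large sample.
# All figures are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000
placebo = rng.normal(loc=100.0, scale=20.0, size=n)  # blood glucose, mg/dL
drug = rng.normal(loc=99.5, scale=20.0, size=n)      # a 0.5 mg/dL shift

t_stat, p_value = stats.ttest_ind(placebo, drug)
print(f"mean difference = {placebo.mean() - drug.mean():.2f} mg/dL")
print(f"P = {p_value:.2e}")  # far below 0.05, yet the effect is trivial
```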

Giovanni Parmigiani, a biostatistician at the Dana-Farber Cancer Institute in Boston, Massachusetts, says that misunderstandings about what information a P value provides often crop up in textbooks and practice manuals. A course correction is long overdue, he adds. “Surely if this happened twenty years ago, biomedical research could be in a better place now.”

Frustration abounds

Criticism of the P value is nothing new. In 2011, researchers trying to raise awareness about false positives gamed an analysis to reach a statistically significant finding: that listening to music by the Beatles makes undergraduates younger [2]. More controversially, in 2015, a set of documentary filmmakers published conclusions from a purposely shoddy clinical trial — supported by a robust P value — to show that eating chocolate helps people to lose weight. (The article has since been retracted.)
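
The kind of “gaming” described above can be reproduced in a few lines: test enough noise-only outcomes and one will usually cross the 0.05 threshold. The sketch below is not the analysis from either study mentioned; the group sizes and the number of outcomes are arbitrary assumptions.

```python
# Multiple comparisons on pure noise: run 20 independent t-tests on
# random data and report the smallest P value.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_subjects, n_outcomes = 30, 20
group_a = rng.normal(size=(n_subjects, n_outcomes))
group_b = rng.normal(size=(n_subjects, n_outcomes))

p_values = [stats.ttest_ind(group_a[:, k], group_b[:, k]).pvalue
            for k in range(n_outcomes)]
print(f"smallest of {n_outcomes} P values: {min(p_values):.3f}")

# The chance that at least one of 20 independent tests falls below 0.05
# is 1 - 0.95**20, roughly 64%, even though no real effect exists.
```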

But Simine Vazire, a psychologist at the University of California, Davis, and editor of the journal Social Psychological and Personality Science, thinks that the ASA statement could help to convince authors to disclose all of the statistical analyses that they run. “To the extent that people might be sceptical, it helps to have statisticians saying, ‘No, you can't interpret P values without this information’,” she says.

More drastic steps, such as the ban on publishing papers that contain P values instituted by at least one journal, could be counter-productive, says Andrew Vickers, a biostatistician at Memorial Sloan Kettering Cancer Center in New York City. He compares attempts to bar the use of P values to addressing the risk of automobile accidents by warning people not to drive — a message that many in the target audience would probably ignore. Instead, Vickers says that researchers should be instructed to “treat statistics as a science, and not a recipe”.

But a better understanding of the P value will not take away the human impulse to use statistics to create an impossible level of confidence, warns Andrew Gelman, a statistician at Columbia University in New York City.

“People want something that they can't really get,” he says. “They want certainty.”

Journal name: Nature
Volume: 531
Pages: 151
DOI: 10.1038/nature.2016.19503

References

  1. Wasserstein, R. L. & Lazar, N. A. The American Statistician, advance online publication (2016).

  2. Simmons, J. P., Nelson, L. D. & Simonsohn, U. Psychol. Sci. 22, 1359–1366 (2011).

