Big names in statistics want to shake up much-maligned P value

One of scientists’ favourite statistics — the P value — should face tougher standards, say leading researchers.

Science is in the throes of a reproducibility crisis, and researchers, funders and publishers are increasingly worried that the scholarly literature is littered with unreliable results. Now, a group of 72 prominent researchers is targeting what they say is one cause of the problem: weak statistical standards of evidence for claiming new discoveries.

In many disciplines the significance of findings is judged by P values. They are used to test (and dismiss) a ‘null hypothesis’, which generally posits that the effect being tested for doesn’t exist. The smaller the P value found for a set of results, the less likely it is that results at least that extreme would arise by chance alone if the null hypothesis were true. By convention, results are deemed ‘statistically significant’ when this value is below 0.05.
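
As a rough illustration of that definition, the sketch below computes a one-sided P value for a hypothetical coin-flip experiment. The null hypothesis is that the coin is fair, and the P value is the probability of seeing at least the observed number of heads under that assumption. The function name and the example counts are illustrative and are not taken from the manuscript.

```python
from math import comb


def one_sided_p_value(heads: int, flips: int) -> float:
    """Probability of at least `heads` heads in `flips` tosses of a fair coin."""
    # Count every outcome at least as extreme as the observed one,
    # then divide by the total number of equally likely sequences.
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips


if __name__ == "__main__":
    # Hypothetical data: 60 heads in 100 flips.
    p = one_sided_p_value(60, 100)
    print(f"P = {p:.4f}")
```

With these made-up counts the result is roughly 0.028, which would pass the conventional 0.05 threshold but not the proposed 0.005 one.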

But many scientists worry that the 0.05 threshold has caused too many false positives to appear in the literature, a problem exacerbated by a practice called P hacking, in which researchers gather data without first creating a hypothesis to test, and then look for patterns in the results that can be reported as statistically significant.

So, in a provocative manuscript posted on the PsyArXiv preprint server on 22 July (ref. 1), researchers argue that P-value thresholds should be lowered to 0.005 for the social and biomedical sciences. The final paper is set to be published in Nature Human Behaviour.

“Researchers just don’t realize how weak the evidence is when the P value is 0.05,” says Daniel Benjamin, one of the paper’s co-lead authors and an economist at the University of Southern California in Los Angeles. He thinks that claims with P values between 0.05 and 0.005 should be treated merely as “suggestive evidence” instead of established knowledge.
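
One way to put a number on how weak that evidence can be is the calibration from Sellke, Bayarri and Berger (2001), the source credited for the article's figure. It gives a lower bound of −e·p·ln(p) on the Bayes factor in favour of the null hypothesis, valid for p < 1/e. The sketch below inverts that bound to get the most generous odds in favour of a real effect. Whether the manuscript relies on exactly this calibration is an assumption here, and the function names are illustrative.

```python
import math


def min_bayes_factor_for_null(p: float) -> float:
    """Sellke-Bayarri-Berger lower bound on the Bayes factor favouring the null."""
    if not 0 < p < 1 / math.e:
        raise ValueError("the bound applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def max_odds_for_effect(p: float) -> float:
    """Upper bound on the odds (effect : no effect) that a P value can support."""
    return 1 / min_bayes_factor_for_null(p)


if __name__ == "__main__":
    for p in (0.05, 0.005):
        print(f"P = {p}: at most about {max_odds_for_effect(p):.1f} to 1")
```

Under this bound, a P value of 0.05 supports odds of at most about 2.5 to 1 in favour of an effect, and 0.005 supports at most about 14 to 1.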

Other co-authors include two heavyweights in reproducibility: John Ioannidis, who studies scientific robustness at Stanford University in California, and Brian Nosek, executive director of the Center for Open Science in Charlottesville, Virginia.

Credit: R. NUZZO; SOURCE: T. SELLKE ET AL. AM. STAT. 55, 62–71 (2001)

Super-sized samples

One problem with reducing P-value thresholds is that it may increase the odds of a false negative — stating that effects do not exist when in fact they do — says Casper Albers, a researcher in psychometrics and statistics at the University of Groningen in the Netherlands. To counter that problem, Benjamin and his colleagues suggest that researchers increase sample sizes by 70%; they say that this would avoid increasing rates of false negatives, while still dramatically reducing rates of false positives. But Albers thinks that in practice, only well-funded scientists would have the means to do this.
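
The 70% figure can be roughly reproduced with the textbook sample-size approximation for a two-sided z-test, in which the required sample size is proportional to (z for the threshold + z for the power) squared. The sketch below assumes 80% power and the same effect size before and after the change; both are assumptions made for illustration, not details given in the article.

```python
from statistics import NormalDist


def relative_sample_size(alpha_old: float, alpha_new: float, power: float = 0.8) -> float:
    """Ratio of sample sizes needed to keep `power` when the threshold changes.

    Uses the normal approximation n ~ (z_{1 - alpha/2} + z_{power})^2
    for a two-sided test with a fixed effect size.
    """
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    z_power = z(power)
    n_old = (z(1 - alpha_old / 2) + z_power) ** 2
    n_new = (z(1 - alpha_new / 2) + z_power) ** 2
    return n_new / n_old


if __name__ == "__main__":
    ratio = relative_sample_size(0.05, 0.005)
    print(f"Sample size must grow by about {100 * (ratio - 1):.0f}%")
```

Under these assumptions the ratio comes out at about 1.7, in line with the increase the authors suggest.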

Shlomo Argamon, a computer scientist at the Illinois Institute of Technology in Chicago, says there is no simple answer to the problem, because “no matter what confidence level you choose, if there are enough different ways to design your experiment, it becomes highly likely that at least one of them will give a statistically significant result just by chance”. More-radical changes such as new methodological standards and research incentives are needed, he says.
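
Argamon's point can be sketched with a simple calculation: if a study can be analysed in several ways and no real effect exists, the chance that at least one analysis crosses the threshold is 1 − (1 − threshold)^k. The sketch assumes the k analyses are independent, which real analyses of the same data usually are not, so it is an idealized illustration only.

```python
def chance_of_false_positive(threshold: float, analyses: int) -> float:
    """Probability that at least one of `analyses` independent null tests is 'significant'."""
    # Each test stays below the threshold with probability (1 - threshold);
    # all of them do so with that probability raised to the number of tests.
    return 1 - (1 - threshold) ** analyses


if __name__ == "__main__":
    for threshold, analyses in [(0.05, 20), (0.005, 20), (0.005, 500)]:
        chance = chance_of_false_positive(threshold, analyses)
        print(f"threshold {threshold}, {analyses} analyses: {chance:.0%}")
```

A stricter threshold lowers the chance for a fixed number of analyses, but enough analytical flexibility pushes it back up, which is why the threshold alone does not settle the problem.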

Lowering P-value thresholds may also exacerbate the “file-drawer problem”, in which studies with negative results are left unpublished, says Tom Johnstone, a cognitive neuroscientist at the University of Reading, UK. But Benjamin says all research should be published, regardless of P value.

Moving goalposts

Other scientific fields have already cracked down on P values, and in 2015 one psychology journal banned them. Particle physicists, who collect reams of data from atom-smashing experiments, have long demanded a P value below 0.0000003 (or 3 × 10⁻⁷) because of concerns that a less stringent threshold could lead to mistaken claims, notes Valen Johnson, a statistician at Texas A&M University in College Station and a co-lead author of the paper. More than a decade ago, geneticists took similar steps to establish a threshold of 5 × 10⁻⁸ for genome-wide association studies, which look for differences between people with a disease and those without across hundreds of thousands of DNA-letter variants.
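
For orientation, the sketch below shows where thresholds of this size can come from. It assumes the particle-physics figure corresponds to the field's "five-sigma" convention (a one-sided tail of the standard normal distribution) and that the genetics figure corresponds to a 0.05 threshold divided across one million independent tests (a Bonferroni-style correction). Both readings are commonly cited rationales rather than statements from the article.

```python
from statistics import NormalDist


def one_sided_p_from_sigma(sigma: float) -> float:
    """Upper-tail probability of a standard normal beyond `sigma` standard deviations."""
    # By symmetry, the upper tail beyond +sigma equals the CDF at -sigma.
    return NormalDist().cdf(-sigma)


def bonferroni_threshold(alpha: float, tests: int) -> float:
    """Per-test threshold that keeps the overall false-positive rate near `alpha`."""
    return alpha / tests


if __name__ == "__main__":
    print(f"five sigma: P = {one_sided_p_from_sigma(5):.2e}")
    print(f"0.05 over one million tests: P = {bonferroni_threshold(0.05, 1_000_000):.0e}")
```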

Yet other scientists have abandoned P values in favour of more-sophisticated statistical tools, such as Bayesian tests, which require researchers to define and test two alternative hypotheses. But not all researchers will have the technical expertise to carry out Bayesian tests, says Johnson, who thinks that P values can still be useful for gauging whether a hypothesis is supported by evidence. “P value by itself is not necessarily evil.”

References

  1. Benjamin, D. et al. Preprint on PsyArXiv http://osf.io/preprints/psyarxiv/mky9j (2017).

Related links in Nature Research

Statisticians issue warning over misuse of P values 2016-Mar-07

Reproducibility: A tragedy of errors 2016-Feb-03

How scientists fool themselves – and how they can stop 2015-Oct-07

Statistics: P values are just the tip of the iceberg 2015-Apr-28

Psychology journal bans P values 2015-Feb-26

Scientific method: Statistical errors 2014-Feb-12

Cite this article

Singh Chawla, D. Big names in statistics want to shake up much-maligned P value. Nature 548, 16–17 (2017). https://doi.org/10.1038/nature.2017.22375
