Redefine statistical significance

Benjamin, Daniel J.; Berger, James O.; Johannesson, Magnus; Nosek, Brian A.; Wagenmakers, E.-J.; Berk, Richard; Bollen, Kenneth A.; Brembs, Björn; Brown, Lawrence; Camerer, Colin; Cesarini, David; Chambers, Christopher D.; Clyde, Merlise; Cook, Thomas D.; De Boeck, Paul; Dienes, Zoltan; Dreber, Anna; Easwaran, Kenny; Efferson, Charles; Fehr, Ernst; Fidler, Fiona; Field, Andy P.; Forster, Malcolm; George, Edward I.; Gonzalez, Richard; Goodman, Steven; Green, Edwin; Green, Donald P.; Greenwald, Anthony G.; Hadfield, Jarrod D.; Hedges, Larry V.; Held, Leonhard; Hua Ho, Teck; Hoijtink, Herbert; Hruschka, Daniel J.; Imai, Kosuke; Imbens, Guido; Ioannidis, John P. A.; Jeon, Minjeong; Jones, James Holland; Kirchler, Michael; Laibson, David; List, John; Little, Roderick; Lupia, Arthur; Machery, Edouard; Maxwell, Scott E.; McCarthy, Michael; Moore, Don A.; Morgan, Stephen L.; Munafó, Marcus; Nakagawa, Shinichi; Nyhan, Brendan; Parker, Timothy H.; Pericchi, Luis; Perugini, Marco; Rouder, Jeff; Rousseau, Judith; Savalei, Victoria; Schönbrodt, Felix D.; Sellke, Thomas; Sinclair, Betsy; Tingley, Dustin; Van Zandt, Trisha; Vazire, Simine; Watts, Duncan J.; Winship, Christopher; Wolpert, Robert L.; Xie, Yu; Young, Cristobal; Zinman, Jonathan; Johnson, Valen E.

doi:10.1038/s41562-017-0189-z

Comment
Published: 01 September 2017

Nature Human Behaviour volume 2, pages 6–10 (2018)

We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries.

The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on ‘statistically significant’ findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (for example, multiple testing, P-hacking, publication bias and under-powered studies). However, we believe that a leading cause of non-reproducibility has not yet been adequately addressed: statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating statistically significant findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems.

For fields where the threshold for defining statistical significance for new discoveries is P < 0.05, we propose a change to P < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields. Results that would currently be called significant but do not meet the new threshold should instead be called suggestive. While statisticians have known the relative weakness of using P ≈ 0.05 as a threshold for discovery and the proposal to lower it to 0.005 is not new^1,2, a critical mass of researchers now endorse this change.

We restrict our recommendation to claims of discovery of new effects. We do not address the appropriate threshold for confirmatory or contradictory replications of existing claims. We also do not advocate changes to discovery thresholds in fields that have already adopted more stringent standards (for example, genomics and high-energy physics research; see the ‘Potential objections’ section below).

We also restrict our recommendation to studies that conduct null hypothesis significance tests. We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data, such as Bayes factors or other posterior summaries based on clearly articulated model assumptions, are preferable to P values. However, changing the P value threshold is simple, aligns with the training undertaken by many researchers, and might quickly achieve broad acceptance.

Strength of evidence from P values

In testing a point null hypothesis H₀ against an alternative hypothesis H₁ based on data x_obs, the P value is defined as the probability, calculated under the null hypothesis, that a test statistic is as extreme as or more extreme than its observed value. The null hypothesis is typically rejected, and the finding is declared statistically significant, if the P value falls below the (current) type I error threshold α = 0.05.
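As a concrete illustration of this definition, the following minimal sketch computes a two-sided P value for a z statistic. It assumes the test statistic is standard normal under H₀; the function name `two_sided_p_value` is hypothetical and does not come from the article.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided P value for a z statistic, assuming z ~ N(0, 1) under the null."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    for z in (1.96, 2.81):
        print(f"z = {z:.2f} -> two-sided P = {two_sided_p_value(z):.4f}")
```

Under this normality assumption, |z| ≈ 1.96 sits at the 0.05 threshold and |z| ≈ 2.81 sits at roughly 0.005.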

From a Bayesian perspective, a more direct measure of the strength of evidence for H₁ relative to H₀ is the ratio of their probabilities. By Bayes’ rule, this ratio may be written as:

$$\frac{\Pr\left(H_{1}\mid x_{\rm obs}\right)}{\Pr\left(H_{0}\mid x_{\rm obs}\right)}=\frac{f\left(x_{\rm obs}\mid H_{1}\right)}{f\left(x_{\rm obs}\mid H_{0}\right)}\times \frac{\Pr\left(H_{1}\right)}{\Pr\left(H_{0}\right)}\equiv {\rm BF}\times \left({\rm prior}\;{\rm odds}\right) \tag{1}$$

where BF is the Bayes factor that represents the evidence from the data, and the prior odds can be informed by researchers’ beliefs, scientific consensus, and validated evidence from similar research questions in the same field. Multiple-hypothesis testing, P-hacking and publication bias all reduce the credibility of evidence. Some of these practices reduce the prior odds of H₁ relative to H₀ by changing the population of hypothesis tests that are reported. Prediction markets³ and analyses of replication results⁴ both suggest that for psychology experiments, the prior odds of H₁ relative to H₀ may be only about 1:10. A similar number has been suggested in cancer clinical trials, and the number is likely to be much lower in preclinical biomedical research⁵.

There is no unique mapping between the P value and the Bayes factor, since the Bayes factor depends on H₁. However, the connection between the two quantities can be evaluated for particular test statistics under certain classes of plausible alternatives (Fig. 1).
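One such class of alternatives gives a closed-form bound. The sketch below assumes the bound of Sellke, Bayarri and Berger (listed in the references), under which the Bayes factor in favour of H₁ is at most 1/(−e·p·ln p) for p < 1/e. Whether this is exactly one of the curves behind the numbers quoted in this article is an assumption here, and the function name `bayes_factor_bound` is hypothetical.

```python
import math


def bayes_factor_bound(p: float) -> float:
    """Upper bound on the Bayes factor for H1 given a P value p (valid for 0 < p < 1/e)."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("bound applies only for 0 < p < 1/e")
    # Bound of the form 1 / (-e * p * ln p); ln p is negative, so the result is positive.
    return 1.0 / (-math.e * p * math.log(p))


if __name__ == "__main__":
    for p in (0.05, 0.005):
        print(f"P = {p}: Bayes factor bound = {bayes_factor_bound(p):.2f}")
```

For P = 0.05 and P = 0.005 this evaluates to roughly 2.5 and 14, which appears consistent with the lower ends of the Bayes factor ranges quoted in the article.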

**Fig. 1: Relationship between the P value and the Bayes factor.**

A two-sided P value of 0.05 corresponds to Bayes factors in favour of H₁ that range from about 2.5 to 3.4 under reasonable assumptions about H₁ (Fig. 1). This is weak evidence from at least three perspectives. First, conventional Bayes factor categorizations⁶ characterize this range as ‘weak’ or ‘very weak’. Second, we suspect many scientists would guess that P ≈ 0.05 implies stronger support for H₁ than a Bayes factor of 2.5 to 3.4. Third, using equation (1) and prior odds of 1:10, a P value of 0.05 corresponds to at least 3:1 odds (that is, the reciprocal of the product $\frac{1}{10}\times 3.4$) in favour of the null hypothesis!
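The arithmetic behind the third point can be checked with a short sketch of equation (1). The function name `posterior_odds` is hypothetical; the Bayes factors and the 1:10 prior odds are the values quoted in the text.

```python
def posterior_odds(bayes_factor: float, prior_odds: float) -> float:
    """Posterior odds of H1 versus H0 from equation (1): BF x prior odds."""
    return bayes_factor * prior_odds


if __name__ == "__main__":
    prior = 1 / 10  # prior odds of H1 relative to H0, as suggested for psychology
    for bf in (2.5, 3.4):  # Bayes factor range quoted for a two-sided P value of 0.05
        odds_h1 = posterior_odds(bf, prior)
        print(f"BF = {bf}: odds for H1 = {odds_h1:.2f}, odds for H0 = {1 / odds_h1:.2f}")
```

With a Bayes factor of 3.4 the odds in favour of the null are about 2.9 to 1, which the text rounds to 3:1; with a Bayes factor of 2.5 they are 4 to 1.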

Why 0.005

The choice of any particular threshold is arbitrary and involves a trade-off between type I and type II errors. We propose 0.005 for two reasons. First, a two-sided P value of 0.005 corresponds to Bayes factors between approximately 14 and 26 in favour of H₁. This range represents ‘substantial’ to ‘strong’ evidence according to conventional Bayes factor classifications⁶.

Second, in many fields the P < 0.005 standard would reduce the false positive rate to levels we judge to be reasonable. If we let φ denote the proportion of null hypotheses that are true, 1 – β the power of tests in rejecting false null hypotheses, and α the type I error/significance threshold, then as the population of tested hypotheses becomes large, the false positive rate (that is, the proportion of true null effects among the total number of statistically significant findings) can be approximated by:

$${\rm False}\;{\rm positive}\;{\rm rate}\approx \frac{\alpha \phi }{\alpha \phi +\left(1-\beta \right)\left(1-\phi \right)} \tag{2}$$

For different levels of the prior odds that there is a true effect, $\frac{1-\phi }{\phi }$, and for significance thresholds α = 0.05 and α = 0.005, Fig. 2 shows the false positive rate as a function of power 1−β.

**Fig. 2: Relationship between the P value threshold, power, and the false positive rate.**

In many studies, statistical power is low⁷. Figure 2 demonstrates that low statistical power and α = 0.05 combine to produce high false positive rates.

For many, the calculations illustrated by Fig. 2 may be unsettling. For example, the false positive rate is greater than 33% with prior odds of 1:10 and a P value threshold of 0.05, regardless of the level of statistical power. Reducing the threshold to 0.005 would reduce this minimum false positive rate to 5%. Similar reductions in false positive rates would occur over a wide range of statistical powers.
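These figures follow directly from equation (2). The sketch below is a minimal implementation under the same large-population approximation; the function and parameter names are hypothetical, and the prior odds are converted to φ using φ = 1 / (1 + prior odds of a true effect).

```python
def false_positive_rate(alpha: float, power: float, prior_odds_h1: float) -> float:
    """Approximate false positive rate from equation (2).

    prior_odds_h1 is (1 - phi) / phi, the prior odds that there is a true effect.
    """
    phi = 1.0 / (1.0 + prior_odds_h1)  # proportion of null hypotheses that are true
    false_positives = alpha * phi
    true_positives = power * (1.0 - phi)
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    prior_odds = 1 / 10
    for alpha in (0.05, 0.005):
        for power in (0.2, 0.5, 0.8, 1.0):
            rate = false_positive_rate(alpha, power, prior_odds)
            print(f"alpha = {alpha}, power = {power}: false positive rate = {rate:.3f}")
```

Because the rate decreases as power increases, its minimum occurs at a power of 1. At prior odds of 1:10 that minimum is one third for α = 0.05 and about 4.8% for α = 0.005, in line with the values stated above.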

Empirical evidence from recent replication projects in psychology and experimental economics provides insights into the prior odds in favour of H₁. In both projects, the rate of replication (that is, significance at P < 0.05 in the replication in a consistent direction) was roughly double for initial studies with P < 0.005 relative to initial studies with 0.005 < P < 0.05: 50% versus 24% for psychology⁸, and 85% versus 44% for experimental economics⁹. Although based on relatively small samples of studies (93 in psychology, and 16 in experimental economics, after excluding initial studies with P > 0.05), these numbers are suggestive of the potential gains in reproducibility that would accrue from the new threshold of P < 0.005 in these fields. In biomedical research, 96% of a sample of recent papers claim statistically significant results with the P < 0.05 threshold¹⁰. However, replication rates were very low⁵ for these studies, suggesting a potential for gains by adopting this new standard in these fields as well.

Potential objections

We now address the most compelling arguments against adopting this higher standard of evidence.

The false negative rate would become unacceptably high

Evidence that does not reach the new significance threshold should be treated as suggestive, and where possible further evidence should be accumulated; indeed, the combined results from several studies may be compelling even if any particular study is not. Failing to reject the null hypothesis does not mean accepting the null hypothesis. Moreover, the false negative rate will not increase if sample sizes are increased so that statistical power is held constant.

For a wide range of common statistical tests, transitioning from a P value threshold of α = 0.05 to α = 0.005 while maintaining 80% power would require an increase in sample sizes of about 70%. Such an increase means that fewer studies can be conducted using current experimental designs and budgets. But Fig. 2 shows the benefit: false positive rates would typically fall by factors greater than two. Hence, considerable resources would be saved by not performing future studies based on false premises. Increasing sample sizes is also desirable because studies with small sample sizes tend to yield inflated effect size estimates¹¹, and publication and other biases may be more likely in an environment of small studies¹². We believe that efficiency gains would far outweigh losses.
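The figure of about 70% can be reproduced for one simple case. The sketch below assumes a two-sided z-test with a fixed true effect size, for which the required sample size is proportional to (z₁₋α/₂ + z_power)². This is only one member of the range of tests the text refers to, and the function name is hypothetical.

```python
from statistics import NormalDist


def relative_sample_size(alpha_old: float, alpha_new: float, power: float = 0.80) -> float:
    """Ratio of required sample sizes when moving from alpha_old to alpha_new.

    Assumes a two-sided z-test with a fixed effect size, where the required
    n is proportional to (z_{1 - alpha/2} + z_{power}) ** 2.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function

    def required(alpha: float) -> float:
        return (z(1.0 - alpha / 2.0) + z(power)) ** 2

    return required(alpha_new) / required(alpha_old)


if __name__ == "__main__":
    ratio = relative_sample_size(0.05, 0.005, power=0.80)
    print(f"Sample size ratio: {ratio:.2f} (increase of {100 * (ratio - 1):.0f}%)")
```

Under these assumptions the ratio is close to 1.7, that is, an increase of roughly 70% in sample size.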

The proposal does not address multiple-hypothesis testing, P-hacking, publication bias, low power, or other biases (for example, confounding, selective reporting, and measurement error), which are arguably the bigger problems

We agree. Reducing the P value threshold complements — but does not substitute for — solutions to these other problems, which include good study design, ex ante power calculations, pre-registration of planned analyses, replications, and transparent reporting of procedures and all statistical analyses conducted.

The appropriate threshold for statistical significance should be different for different research communities

We agree that the significance threshold selected for claiming a new discovery should depend on the prior odds that the null hypothesis is true, the number of hypotheses tested, the study design, the relative cost of type I versus type II errors, and other factors that vary by research topic. For exploratory research with very low prior odds (well outside the range in Fig. 2), even lower significance thresholds than 0.005 are needed. Recognition of this issue led the genetics research community to move to a ‘genome-wide significance threshold’ of 5 × 10⁻⁸ over a decade ago. And in high-energy physics, the tradition has long been to define significance by a ‘5-sigma’ rule (roughly a P value threshold of 3 × 10⁻⁷). We are essentially suggesting a move from a 2-sigma rule to a 3-sigma rule.
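The correspondence between sigma rules and P value thresholds can be sketched as follows, assuming a standard normal test statistic. Reading the 5-sigma figure as a one-sided tail probability and the 2-sigma and 3-sigma rules as two-sided thresholds is an interpretation made here, not something the text states; the function names are hypothetical.

```python
import math


def upper_tail(z: float) -> float:
    """One-sided tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided(z: float) -> float:
    """Two-sided tail probability P(|Z| >= |z|) for a standard normal Z."""
    return 2.0 * upper_tail(abs(z))


if __name__ == "__main__":
    print(f"5-sigma, one-sided: {upper_tail(5.0):.2e}")
    print(f"2-sigma, two-sided: {two_sided(2.0):.4f}")
    print(f"3-sigma, two-sided: {two_sided(3.0):.4f}")
```

On this reading, 5 sigma corresponds to about 3 × 10⁻⁷, 2 sigma to about 0.046 and 3 sigma to about 0.003, which is why 0.05 and 0.005 are described as approximately 2-sigma and 3-sigma rules.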

Our recommendation applies to disciplines with prior odds broadly in the range depicted in Fig. 2, where use of P < 0.05 as a default is widespread. Within those disciplines, it is helpful for consumers of research to have a consistent benchmark. We feel the default should be shifted.

Changing the significance threshold is a distraction from the real solution, which is to replace null hypothesis significance testing (and bright-line thresholds) with more focus on effect sizes and confidence intervals, treating the P value as a continuous measure, and/or a Bayesian method.

Many of us agree that there are better approaches to statistical analyses than null hypothesis significance testing, but as yet there is no consensus regarding the appropriate choice of replacement. For example, a recent statement by the American Statistical Association addressed numerous issues regarding the misinterpretation and misuse of P values (as well as the related concept of statistical significance), but failed to make explicit policy recommendations to address these shortcomings¹³. Even after the significance threshold is changed, many of us will continue to advocate for alternatives to null hypothesis significance testing.

Concluding remarks

Ronald Fisher understood that the choice of 0.05 was arbitrary when he introduced it¹⁴. Since then, theory and empirical evidence have demonstrated that a lower threshold is needed. A much larger pool of scientists is now asking a much larger number of questions, possibly with much lower prior odds of success.

For research communities that continue to rely on null hypothesis significance testing, reducing the P value threshold for claims of new discoveries to 0.005 is an actionable step that will immediately improve reproducibility. We emphasize that this proposal is about standards of evidence, not standards for policy action nor standards for publication. Results that do not reach the threshold for statistical significance (whatever it is) can still be important and merit publication in leading journals if they address important research questions with rigorous methods. This proposal should not be used to reject publications of novel findings with 0.005 < P < 0.05 properly labelled as suggestive evidence. We should reward quality and transparency of research as we impose these more stringent standards, and we should monitor how researchers’ behaviours are affected by this change. Otherwise, science runs the risk that the more demanding threshold for statistical significance will be met to the detriment of quality and transparency.

Journals can help transition to the new statistical significance threshold. Authors and readers can themselves take the initiative by describing and interpreting results more appropriately in light of the new proposed definition of statistical significance. The new significance threshold will help researchers and readers to understand and communicate evidence more accurately.

References

1. Greenwald, A. G. et al. Psychophysiology 33, 175–183 (1996).
2. Johnson, V. E. Proc. Natl Acad. Sci. USA 110, 19313–19317 (2013).
3. Dreber, A. et al. Proc. Natl Acad. Sci. USA 112, 15343–15347 (2015).
4. Johnson, V. E. et al. J. Am. Stat. Assoc. 112, 1–10 (2016).
5. Begley, C. G. & Ioannidis, J. P. A. Circ. Res. 116, 116–126 (2015).
6. Kass, R. E. & Raftery, A. E. J. Am. Stat. Assoc. 90, 773–795 (1995).
7. Szucs, D. & Ioannidis, J. P. A. PLoS Biol. 15, e2000797 (2017).
8. Open Science Collaboration. Science 349, aac4716 (2015).
9. Camerer, C. F. et al. Science 351, 1433–1436 (2016).
10. Chavalarias, D. et al. JAMA 315, 1141–1148 (2016).
11. Gelman, A. & Carlin, J. Perspect. Psychol. Sci. 9, 641–651 (2014).
12. Fanelli, D., Costas, R. & Ioannidis, J. P. A. Proc. Natl Acad. Sci. USA 114, 3714–3719 (2017).
13. Wasserstein, R. L. & Lazar, N. A. Am. Stat. 70, 129–133 (2016).
14. Fisher, R. A. Statistical Methods for Research Workers (Oliver & Boyd, Edinburgh, 1925).
15. Sellke, T., Bayarri, M. J. & Berger, J. O. Am. Stat. 55, 62–71 (2001).

Acknowledgements

We thank D. L. Lormand, R. Royer and A. T. Nguyen Viet for excellent research assistance.

Author information

Authors and Affiliations

Center for Economic and Social Research and Department of Economics, University of Southern California, Los Angeles, CA, 90089-3332, USA
Daniel J. Benjamin
Department of Statistical Science, Duke University, Durham, NC, 27708-0251, USA
James O. Berger, Merlise Clyde & Robert L. Wolpert
Department of Economics, Stockholm School of Economics, Stockholm, SE-113 83, Sweden
Magnus Johannesson & Anna Dreber
University of Virginia, Charlottesville, VA, 22908, USA
Brian A. Nosek
Center for Open Science, Charlottesville, VA, 22903, USA
Brian A. Nosek
Department of Psychology, University of Amsterdam, Amsterdam, 1018 VZ, The Netherlands
E.-J. Wagenmakers
School of Arts and Sciences and Department of Criminology, University of Pennsylvania, Philadelphia, PA, 19104-6286, USA
Richard Berk
Department of Psychology and Neuroscience, Department of Sociology, University of North Carolina Chapel Hill, Chapel Hill, NC, 27599-3270, USA
Kenneth A. Bollen
Institute of Zoology — Neurogenetics, Universität Regensburg, Universitätsstrasse 31, 93040, Regensburg, Germany
Björn Brembs
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, PA, 19104, USA
Richard Berk, Lawrence Brown & Edward I. George
Division of the Humanities and Social Sciences, California Institute of Technology, Pasadena, CA, 91125, USA
Colin Camerer
Department of Economics, New York University, New York, NY, 10012, USA
David Cesarini
The Research Institute of Industrial Economics (IFN), Stockholm, SE-102 15, Sweden
David Cesarini
Cardiff University Brain Research Imaging Centre (CUBRIC), Cardiff, CF24 4HQ, UK
Christopher D. Chambers
Northwestern University, Evanston, IL, 60208, USA
Thomas D. Cook
Mathematica Policy Research, Washington, DC, 20002-4221, USA
Thomas D. Cook
Department of Psychology, Quantitative Program, Ohio State University, Columbus, OH, 43210, USA
Paul De Boeck
School of Psychology, University of Sussex, Brighton, BN1 9QH, UK
Zoltan Dienes & Andy P. Field
Department of Philosophy, Texas A&M University, College Station, TX, 77843-4237, USA
Kenny Easwaran
Department of Psychology, Royal Holloway University of London, Egham Surrey, TW20 0EX, UK
Charles Efferson
Department of Economics, University of Zurich, 8006, Zurich, Switzerland
Ernst Fehr
School of BioSciences and School of Historical & Philosophical Studies, University of Melbourne, Parkville, VIC, 3010, Australia
Fiona Fidler
Department of Philosophy, University of Wisconsin — Madison, Madison, WI, 53706, USA
Malcolm Forster
Department of Psychology, University of Michigan, Ann Arbor, MI, 48109-1043, USA
Richard Gonzalez
Stanford University, General Medical Disciplines, Stanford, CA, 94305, USA
Steven Goodman
Department of Ecology, Evolution and Natural Resources SEBS, Rutgers University, New Brunswick, NJ, 08901-8551, USA
Edwin Green
Department of Political Science, Columbia University in the City of New York, New York, NY, 10027, USA
Donald P. Green
Department of Psychology, University of Washington, Seattle, WA, 98195-1525, USA
Anthony G. Greenwald
Institute of Evolutionary Biology School of Biological Sciences, The University of Edinburgh, Edinburgh, EH9 3JT, UK
Jarrod D. Hadfield
Weinberg College of Arts & Sciences Department of Statistics, Northwestern University, Evanston, IL, 60208, USA
Larry V. Hedges
Epidemiology, Biostatistics and Prevention Institute (EBPI), University of Zurich, 8001, Zurich, Switzerland
Leonhard Held
National University of Singapore, Singapore, 119077, Singapore
Teck Hua Ho
Department of Methods and Statistics, Universiteit Utrecht, Utrecht, 3584 CH, The Netherlands
Herbert Hoijtink
School of Human Evolution and Social Change, Arizona State University, Tempe, AZ, 85287-2402, USA
Daniel J. Hruschka
Department of Politics and Center for Statistics and Machine Learning, Princeton University, Princeton, NJ, 08544, USA
Kosuke Imai
Stanford University, Stanford, CA, 94305-5015, USA
Guido Imbens
Departments of Medicine, of Health Research and Policy, of Biomedical Data Science, and of Statistics and Meta-Research Innovation Center at Stanford (METRICS), Stanford University, Stanford, CA, 94305, USA
John P. A. Ioannidis
Advanced Quantitative Methods, Social Research Methodology, Department of Education, Graduate School of Education & Information Studies, University of California, Los Angeles, CA, 90095-1521, USA
Minjeong Jeon
Department of Life Sciences, Imperial College London, Ascot, SL5 7PY, UK
James Holland Jones
Department of Earth System Science, Stanford, CA, 94305-4216, USA
James Holland Jones
Department of Banking and Finance, University of Innsbruck and University of Gothenburg, Innsbruck, A-6020, Austria
Michael Kirchler
Department of Economics, Harvard University, Cambridge, MA, 02138, USA
David Laibson
Department of Economics, University of Chicago, Chicago, IL, 60637, USA
John List
Department of Biostatistics, University of Michigan, Ann Arbor, MI, 48109-2029, USA
Roderick Little
Department of Political Science, University of Michigan, Ann Arbor, MI, 48109-1045, USA
Arthur Lupia
Department of History and Philosophy of Science, University of Pittsburgh, Pittsburgh, PA, 15260, USA
Edouard Machery
Department of Psychology, University of Notre Dame, Notre Dame, IN, 46556, USA
Scott E. Maxwell
School of BioSciences, University of Melbourne, Parkville, VIC, 3010, Australia
Michael McCarthy
Haas School of Business, University of California at Berkeley, Berkeley, CA, 94720-1900A, USA
Don A. Moore
Johns Hopkins University, Baltimore, MD, 21218, USA
Stephen L. Morgan
MRC Integrative Epidemiology Unit, University of Bristol, Bristol, BS8 1TU, UK
Marcus Munafó
UK Centre for Tobacco and Alcohol Studies, School of Experimental Psychology, University of Bristol, Bristol, BS8 1TU, UK
Marcus Munafó
Evolution & Ecology Research Centre and School of Biological, Earth and Environmental Sciences, University of New South Wales, Sydney, NSW, 2052, Australia
Shinichi Nakagawa
Department of Government, Dartmouth College, Hanover, NH, 03755, USA
Brendan Nyhan
Department of Biology, Whitman College, Walla Walla, WA, 99362, USA
Timothy H. Parker
Department of Mathematics, University of Puerto Rico, Rio Piedras Campus, San Juan, PR, 00936-8377, Puerto Rico
Luis Pericchi
Department of Psychology, University of Milan-Bicocca, Milan, 20126, Italy
Marco Perugini
Department of Cognitive Sciences, University of California, Irvine, CA, 92617, USA
Jeff Rouder
Université Paris Dauphine, 75016, Paris, France
Judith Rousseau
Department of Psychology, The University of British Columbia, Vancouver, V6T 1Z4, BC, Canada
Victoria Savalei
Department Psychology, Ludwig-Maximilians-University Munich, Leopoldstraße 13, 80802, Munich, Germany
Felix D. Schönbrodt
Department of Statistics, Purdue University, West Lafayette, IN, 47907-2067, USA
Thomas Sellke
Department of Political Science, Washington University in St. Louis, St. Louis, MO, 63130-4899, USA
Betsy Sinclair
Government Department, Harvard University, Cambridge, MA, 02138, USA
Dustin Tingley
Department of Psychology, Ohio State University, Columbus, OH, 43210, USA
Trisha Van Zandt
Department of Psychology, University of California, Davis, CA, 95616, USA
Simine Vazire
Microsoft Research, 641 Avenue of the Americas, 7th Floor, New York, NY, 10011, USA
Duncan J. Watts
Department of Sociology, Harvard University, Cambridge, MA, 02138, USA
Christopher Winship
Department of Sociology, Princeton University, Princeton, NJ, 08544, USA
Yu Xie
Department of Sociology, Stanford University, Stanford, CA, 94305-2047, USA
Cristobal Young
Department of Economics, Dartmouth College, Hanover, NH, 03755-3514, USA
Jonathan Zinman
Department of Statistics, Texas A&M University, College Station, TX, 77843, USA
Valen E. Johnson

Corresponding authors

Correspondence to Daniel J. Benjamin, Magnus Johannesson or Valen E. Johnson.

Ethics declarations

Competing interests

One of the 72 authors, Christopher Chambers, is a member of the Advisory Board of Nature Human Behaviour. Christopher Chambers was not a corresponding author and did not communicate with the editors regarding the publication of this article. The other authors declare no competing interests.

Electronic supplementary material

Supplementary information

Supplementary Methods

About this article

Cite this article

Benjamin, D.J., Berger, J.O., Johannesson, M. et al. Redefine statistical significance. Nat Hum Behav 2, 6–10 (2018). https://doi.org/10.1038/s41562-017-0189-z

Published: 01 September 2017
Issue Date: January 2018
DOI: https://doi.org/10.1038/s41562-017-0189-z
