Abstract
Empirical results hinge on analytical decisions that are defensible, arbitrary and motivated. These decisions probably introduce bias (towards the narrative put forward by the authors), and they certainly involve variability not reflected by standard errors. To address this source of noise and bias, we introduce specification curve analysis, which consists of three steps: (1) identifying the set of theoretically justified, statistically valid and non-redundant specifications; (2) displaying the results graphically, allowing readers to identify consequential specification decisions; and (3) conducting joint inference across all specifications. We illustrate the use of this technique by applying it to three findings from two different papers, one investigating discrimination based on distinctively Black names, the other investigating the effect of assigning female versus male names to hurricanes. Specification curve analysis reveals that one finding is robust, one is weak and one is not robust at all.
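The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation (their code is deposited at OSF): the simulated data, the two analytic choices (outcome transformation and trimming rule), the standardized mean difference as the estimate, and the label-permutation test on the median estimate are all hypothetical choices made for this example.

```python
import itertools
import math
import random
import statistics

# Hypothetical analytic choices; a real application would list the
# theoretically justified options for the study at hand.
TRANSFORMS = {"raw": lambda y: y, "log1p": math.log1p, "sqrt": math.sqrt}
KEEP_SHARE = {"no trim": 1.0, "drop top 5%": 0.95, "drop top 10%": 0.90}

# Step 1: the specification set is every combination of the choices above.
SPECS = list(itertools.product(TRANSFORMS, KEEP_SHARE))


def make_data(n=300, effect=1.0, seed=0):
    """Simulate a two-group study with a skewed outcome (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        treated = rng.random() < 0.5
        outcome = math.exp(rng.gauss(0.0, 1.0)) + (effect if treated else 0.0)
        rows.append((treated, outcome))
    return rows


def estimate(rows, spec):
    """Treated-minus-control mean difference, in SD units, for one specification."""
    transform_name, trim_name = spec
    transform = TRANSFORMS[transform_name]
    ordered = sorted(y for _, y in rows)
    cutoff = ordered[int(KEEP_SHARE[trim_name] * len(ordered)) - 1]
    kept = [(t, transform(y)) for t, y in rows if y <= cutoff]
    treated = [y for t, y in kept if t]
    control = [y for t, y in kept if not t]
    spread = statistics.pstdev([y for _, y in kept])
    return (statistics.mean(treated) - statistics.mean(control)) / spread


def specification_curve(rows):
    """Step 2, in numeric form: all estimates sorted from smallest to largest."""
    return sorted((estimate(rows, spec), spec) for spec in SPECS)


def joint_p_value(rows, n_perm=500, seed=1):
    """Step 3: permutation test of the median estimate across specifications."""
    rng = random.Random(seed)
    observed = statistics.median(e for e, _ in specification_curve(rows))
    labels = [t for t, _ in rows]
    outcomes = [y for _, y in rows]
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between treatment and outcome
        shuffled = list(zip(labels, outcomes))
        null_median = statistics.median(estimate(shuffled, s) for s in SPECS)
        if abs(null_median) >= abs(observed):
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    data = make_data()
    for effect_size, (transform_name, trim_name) in specification_curve(data):
        print(f"{effect_size:+.3f}  {transform_name:6s} {trim_name}")
    print("joint p-value:", joint_p_value(data))
```

Printing the sorted estimates next to the choices that produced them is a text stand-in for the graphical display; in practice the same table would be plotted so that readers can see which decisions move the estimate. Shuffling the treatment labels generates the distribution of the median estimate when there is no effect, which is one simple way to obtain a single p-value for the whole set of specifications.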
Data availability
The datasets used for both demonstrations have been deposited at OSF: https://osf.io/9rvps/
Code availability
The code used to generate all figures and calculations, including those in the Supplementary information, has been deposited at OSF: https://osf.io/9rvps/
Change history
09 October 2020
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
Acknowledgements
The authors received no specific funding for this work.
Author information
Contributions
U.S., J.P.S. and L.D.N. jointly developed the ideas surrounding specification curve analysis and wrote the manuscript. U.S. developed and implemented the inferential approach to specification curve analysis and conducted all analyses.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Primary handling editor: Stavroula Kousta
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Notes 1–5, Supplementary Figs. 1–10 and references.
About this article
Cite this article
Simonsohn, U., Simmons, J.P. & Nelson, L.D. Specification curve analysis. Nat Hum Behav 4, 1208–1214 (2020). https://doi.org/10.1038/s41562-020-0912-z
DOI: https://doi.org/10.1038/s41562-020-0912-z