Empirical results hinge on analytical decisions that are defensible, arbitrary and motivated. These decisions probably introduce bias (towards the narrative put forward by the authors), and they certainly involve variability not reflected by standard errors. To address this source of noise and bias, we introduce specification curve analysis, which consists of three steps: (1) identifying the set of theoretically justified, statistically valid and non-redundant specifications; (2) displaying the results graphically, allowing readers to identify consequential specification decisions; and (3) conducting joint inference across all specifications. We illustrate the use of this technique by applying it to three findings from two different papers, one investigating discrimination based on distinctively Black names, the other investigating the effect of assigning female versus male names to hurricanes. Specification curve analysis reveals that one finding is robust, one is weak and one is not robust at all.
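The three steps can be made concrete with a small sketch. What follows is an illustration on simulated data, not the authors' implementation: the function names (`ols_slope`, `specification_curve`, `joint_p_value`), the choice of specifications (every subset of two hypothetical covariates crossed with two hypothetical outlier rules), and the use of the median estimate as the joint test statistic are all assumptions made for brevity. The permutation step further assumes that the predictor was randomly assigned, so that shuffling it is a fair way to simulate the null of no effect.

```python
import itertools
import numpy as np


def ols_slope(y, x, controls):
    """OLS coefficient on x, with an intercept and optional control columns."""
    X = np.column_stack([np.ones(len(y)), x] + list(controls))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


def specification_curve(y, x, control_pool, outlier_rules):
    """Steps 1 and 2: estimate every specification and sort the estimates.

    A specification here is one subset of control columns combined with
    one outlier rule (a function mapping y to a boolean keep-mask).
    """
    estimates = []
    n_controls = control_pool.shape[1]
    for k in range(n_controls + 1):
        for subset in itertools.combinations(range(n_controls), k):
            for keep in outlier_rules:
                mask = keep(y)
                controls = [control_pool[mask, j] for j in subset]
                estimates.append(ols_slope(y[mask], x[mask], controls))
    # Sorting gives the curve that would be plotted in step 2.
    return np.sort(np.array(estimates))


def joint_p_value(y, x, control_pool, outlier_rules, n_perm=200, seed=0):
    """Step 3: permutation test on the median estimate across specifications."""
    rng = np.random.default_rng(seed)
    observed = np.median(specification_curve(y, x, control_pool, outlier_rules))
    null_medians = np.empty(n_perm)
    for i in range(n_perm):
        x_shuffled = rng.permutation(x)  # breaks any link between x and y
        curve = specification_curve(y, x_shuffled, control_pool, outlier_rules)
        null_medians[i] = np.median(curve)
    # Two-sided: share of shuffled medians at least as extreme as observed.
    return float(np.mean(np.abs(null_medians) >= abs(observed)))


# Simulated data (hypothetical): a randomly assigned binary predictor with a
# true effect of 1.0, plus two covariates that also influence the outcome.
rng = np.random.default_rng(42)
n = 200
x = rng.integers(0, 2, size=n).astype(float)
control_pool = rng.normal(size=(n, 2))
y = 1.0 * x + control_pool @ np.array([0.3, -0.2]) + rng.normal(size=n)

outlier_rules = [
    lambda v: np.ones(len(v), dtype=bool),           # keep every observation
    lambda v: np.abs(v - v.mean()) < 2.5 * v.std(),  # drop extreme outcomes
]

curve = specification_curve(y, x, control_pool, outlier_rules)
p = joint_p_value(y, x, control_pool, outlier_rules)
print(f"{len(curve)} specifications, median estimate {np.median(curve):.2f}, p = {p:.3f}")
```

Plotting `curve` against its rank, with markers underneath showing which controls and outlier rule produced each point, would give the graphical display of step 2. For observational data, where shuffling the predictor is not justified, a different resampling scheme would be needed for step 3.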
The datasets used for both demonstrations have been deposited at OSF: https://osf.io/9rvps/
The authors received no specific funding for this work.
The authors declare no competing interests.
Peer review information Primary handling editor: Stavroula Kousta
Simonsohn, U., Simmons, J.P. & Nelson, L.D. Specification curve analysis. Nat Hum Behav 4, 1208–1214 (2020). https://doi.org/10.1038/s41562-020-0912-z