Being able to replicate scientific findings is crucial for scientific progress [1–15]. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 2015 [16–36]. The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies. We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators. Consistent with these results, the estimated true-positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility. Furthermore, we find that peer beliefs of replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.
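The percentages in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' analysis. It recomputes the replication rates from the reported counts and then, under the simplifying assumption (ours, not stated in the abstract) that false positives have a true effect of zero, multiplies the estimated true-positive rate by the relative effect size of true positives. The function and variable names are hypothetical.

```python
# Counts reported in the abstract: 21 replications in total.
TOTAL_STUDIES = 21


def replication_rate(replicated: int, total: int = TOTAL_STUDIES) -> float:
    """Share of studies that replicated, as a percentage."""
    return 100.0 * replicated / total


# Primary indicator (13) and the range across complementary indicators (12 to 14).
rates = {n: round(replication_rate(n)) for n in (12, 13, 14)}
print(rates)

# Assumption (not from the abstract): false positives have a true effect of
# zero, so the expected relative effect size across all studies is the
# true-positive rate times the relative effect size of true positives.
TRUE_POSITIVE_RATE = 0.67
RELATIVE_EFFECT_TRUE_POSITIVES = 0.71
expected_relative_effect = TRUE_POSITIVE_RATE * RELATIVE_EFFECT_TRUE_POSITIVES
print(f"{expected_relative_effect:.2f}")
```

Under that assumption the product comes to roughly 0.48, which is in line with the reported average relative effect size of about 50%.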
McNutt, M. Reproducibility. Science 343, 229 (2014).
Baker, M. Is there a reproducibility crisis? Nature 533, 452–454 (2016).
Munafò, M. R. et al. A manifesto for reproducible science. Nat. Hum. Behav. 1, 0021 (2017).
Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005).
Prinz, F., Schlange, T. & Asadullah, K. Believe it or not: how much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 10, 712 (2011).
Begley, C. G. & Ellis, L. M. Drug development: raise standards for preclinical cancer research. Nature 483, 531–533 (2012).
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376 (2013).
Maniadis, Z., Tufano, F. & List, J. A. One swallow doesn’t make a summer: new evidence on anchoring effects. Am. Econ. Rev. 104, 277–290 (2014).
Freedman, L. P., Cockburn, I. M. & Simcoe, T. S. The economics of reproducibility in preclinical research. PLoS Biol. 13, e1002165 (2015).
Klein, R. A. et al. Investigating variation in replicability: a ‘many labs’ replication project. Soc. Psychol. 45, 142–152 (2014).
Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015).
Camerer, C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351, 1433–1436 (2016).
Ebersole, C. R. et al. Many Labs 3: evaluating participant pool quality across the academic semester via replication. J. Exp. Soc. Psychol. 67, 68–82 (2016).
Klein, R. A. et al. Many Labs 2: investigating variation in replicability across sample and setting. Adv. Methods Prac. Psychol. Sci. (in the press).
Ackerman, J. M., Nocera, C. C. & Bargh, J. A. Incidental haptic sensations influence social judgments and decisions. Science 328, 1712–1715 (2010).
Aviezer, H., Trope, Y. & Todorov, A. Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science 338, 1225–1229 (2012).
Balafoutas, L. & Sutter, M. Affirmative action policies promote women and do not harm efficiency in the laboratory. Science 335, 579–582 (2012).
Derex, M., Beugin, M.-P., Godelle, B. & Raymond, M. Experimental evidence for the influence of group size on cultural complexity. Nature 503, 389–391 (2013).
Duncan, K., Sadanand, A. & Davachi, L. Memory’s penumbra: episodic memory decisions induce lingering mnemonic biases. Science 337, 485–487 (2012).
Gervais, W. M. & Norenzayan, A. Analytic thinking promotes religious disbelief. Science 336, 493–496 (2012).
Gneezy, U., Keenan, E. A. & Gneezy, A. Avoiding overhead aversion in charity. Science 346, 632–635 (2014).
Hauser, O. P., Rand, D. G., Peysakhovich, A. & Nowak, M. A. Cooperating with the future. Nature 511, 220–223 (2014).
Janssen, M. A., Holahan, R., Lee, A. & Ostrom, E. Lab experiments for the study of social-ecological systems. Science 328, 613–617 (2010).
Karpicke, J. D. & Blunt, J. R. Retrieval practice produces more learning than elaborative studying with concept mapping. Science 331, 772–775 (2011).
Kidd, D. C. & Castano, E. Reading literary fiction improves theory of mind. Science 342, 377–380 (2013).
Kovács, Á. M., Téglás, E. & Endress, A. D. The social sense: susceptibility to others’ beliefs in human infants and adults. Science 330, 1830–1834 (2010).
Lee, S. W. S. & Schwarz, N. Washing away postdecisional dissonance. Science 328, 709 (2010).
Morewedge, C. K., Huh, Y. E. & Vosgerau, J. Thought for food: imagined consumption reduces actual consumption. Science 330, 1530–1533 (2010).
Nishi, A., Shirado, H., Rand, D. G. & Christakis, N. A. Inequality and visibility of wealth in experimental social networks. Nature 526, 426–429 (2015).
Pyc, M. A. & Rawson, K. A. Why testing improves memory: mediator effectiveness hypothesis. Science 330, 335 (2010).
Ramirez, G. & Beilock, S. L. Writing about testing worries boosts exam performance in the classroom. Science 331, 211–213 (2011).
Rand, D. G., Greene, J. D. & Nowak, M. A. Spontaneous giving and calculated greed. Nature 489, 427–430 (2012).
Shah, A. K., Mullainathan, S. & Shafir, E. Some consequences of having too little. Science 338, 682–685 (2012).
Sparrow, B., Liu, J. & Wegner, D. M. Google effects on memory: cognitive consequences of having information at our fingertips. Science 333, 776–778 (2011).
Wilson, T. D. et al. Just think: the challenges of the disengaged mind. Science 345, 75–77 (2014).
Bohannon, J. Replication effort provokes praise—and ‘bullying’ charges. Science 344, 788–789 (2014).
Gilbert, D. T., King, G., Pettigrew, S. & Wilson, T. D. Comment on "Estimating the reproducibility of psychological science". Science 351, 1037 (2016).
Anderson, C. J. et al. Response to comment on "Estimating the reproducibility of psychological science". Science 351, 1037 (2016).
Ioannidis, J. P. A. Why most discovered true associations are inflated. Epidemiology 19, 640–648 (2008).
Simmons, J. P., Nelson, L. D. & Simonsohn, U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011).
Etz, A. & Vandekerckhove, J. A Bayesian perspective on the Reproducibility Project: Psychology. PLoS One 11, e0149794 (2016).
Gelman, A. & Stern, H. The difference between “significant” and “not significant” is not itself statistically significant. Am. Stat. 60, 328–331 (2006).
Cumming, G. Replication and P intervals: P values predict the future only vaguely, but confidence intervals do much better. Perspect. Psychol. Sci. 3, 286–300 (2008).
Verhagen, J. & Wagenmakers, E.-J. Bayesian tests to quantify the result of a replication attempt. J. Exp. Psychol. Gen. 143, 1457–1475 (2014).
Simonsohn, U. Small telescopes: detectability and the evaluation of replication results. Psychol. Sci. 26, 559–569 (2015).
Patil, P., Peng, R. D. & Leek, J. T. What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspect. Psychol. Sci. 11, 539–544 (2016).
Wagenmakers, E.-J. et al. Bayesian inference for psychology. Part II: example applications with JASP. Psychon. Bull. Rev. 25, 58–76 (2017).
Lee, M. D. & Wagenmakers, E.-J. Bayesian Cognitive Modeling: A Practical Course (Cambridge Univ. Press, Cambridge, 2013).
Dreber, A. et al. Using prediction markets to estimate the reproducibility of scientific research. Proc. Natl Acad. Sci. USA 112, 15343–15347 (2015).
Benjamin, D. et al. Redefine statistical significance. Nat. Hum. Behav. 2, 6–10 (2018).
Jeffreys, H. Theory of Probability (Oxford Univ. Press, Oxford, 1961).
Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
Arrow, K. J. et al. The promise of prediction markets. Science 320, 877–878 (2008).
Nosek, B. A., Ebersole, C. R., DeHaven, A. & Mellor, D. M. The preregistration revolution. Proc. Natl Acad. Sci. USA 115, 2600–2606 (2018).
Nosek, B. A. et al. Promoting an open research culture: author guidelines for journals could help to promote transparency, openness, and reproducibility. Science 348, 1422–1425 (2015).
Neither Nature Human Behaviour nor the publisher had any involvement with the conduct of this study prior to its submission to the journal. For financial support we thank: the Austrian Science Fund FWF (SFB F63, START-grant Y617-G11), the Austrian National Bank (grant OeNB 14953), the Behavioral and Neuroeconomics Discovery Fund (C.F.C.), the Jan Wallander and Tom Hedelius Foundation (P2015-0001:1 and P2013-0156:1), the Knut and Alice Wallenberg Foundation (Wallenberg Academy Fellows grant to A.D.), the Swedish Foundation for Humanities and Social Sciences (NHS14-1719:1), the Netherlands Organisation for Scientific Research (Vici grant 016.Vici.170.083 to E.-J.W.), the Sloan Foundation (G-2015-13929) and the Singapore National Research Foundation’s Returning Singaporean Scientists Scheme (grant NRF-RSS2014-001 to T.-H.H.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank the following people for assistance with the experiments and analyses: D. van den Bergh, P.-C. Bindra, J. van Doorn, C. Huber, A. Ly, M. Marsman and J. Zambre.
The authors declare no competing interests.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Camerer, C.F., Dreber, A., Holzmeister, F. et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav 2, 637–644 (2018). https://doi.org/10.1038/s41562-018-0399-z