Abstract

Being able to replicate scientific findings is crucial for scientific progress [1–15]. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 2015 [16–36]. The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies. We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators. Consistent with these results, the estimated true-positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility. Furthermore, we find that peer beliefs of replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.
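
The percentages quoted in the abstract follow from the study counts out of 21. The snippet below is only a minimal sketch of that arithmetic; the function and constant names are illustrative assumptions and do not come from the authors' analysis code.

```python
# Counts are taken from the abstract; all names here are hypothetical.
TOTAL_STUDIES = 21


def replication_rate(replicated: int, total: int = TOTAL_STUDIES) -> int:
    """Return the share of replicated studies as a whole-number percentage."""
    return round(100 * replicated / total)


if __name__ == "__main__":
    indicators = [
        ("significant effect in the original direction", 13),
        ("lowest complementary indicator", 12),
        ("highest complementary indicator", 14),
    ]
    for label, count in indicators:
        print(f"{label}: {count}/{TOTAL_STUDIES} = {replication_rate(count)}%")
```

Rounding to whole percentages matches how the abstract reports the figures (62%, 57% and 67%).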

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. McNutt, M. Reproducibility. Science 343, 229 (2014).
  2. Baker, M. Is there a reproducibility crisis? Nature 533, 452–454 (2016).
  3. Munafò, M. R. et al. A manifesto for reproducible science. Nat. Hum. Behav. 1, 0021 (2017).
  4. Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005).
  5. Prinz, F., Schlange, T. & Asadullah, K. Believe it or not: how much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 10, 712 (2011).
  6. Begley, C. G. & Ellis, L. M. Drug development: raise standards for preclinical cancer research. Nature 483, 531–533 (2012).
  7. Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
  8. Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376 (2013).
  9. Maniadis, Z., Tufano, F. & List, J. A. One swallow doesn’t make a summer: new evidence on anchoring effects. Am. Econ. Rev. 104, 277–290 (2014).
  10. Freedman, L. P., Cockburn, I. M. & Simcoe, T. S. The economics of reproducibility in preclinical research. PLoS Biol. 13, e1002165 (2015).
  11. Klein, R. A. et al. Investigating variation in replicability: a ‘many labs’ replication project. Soc. Psychol. 45, 142–152 (2014).
  12. Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015).
  13. Camerer, C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351, 1433–1436 (2016).
  14. Ebersole, C. R. et al. Many Labs 3: evaluating participant pool quality across the academic semester via replication. J. Exp. Soc. Psychol. 67, 68–82 (2016).
  15. Klein, R. A. et al. Many Labs 2: investigating variation in replicability across sample and setting. Adv. Methods Prac. Psychol. Sci. (in the press).
  16. Ackerman, J. M., Nocera, C. C. & Bargh, J. A. Incidental haptic sensations influence social judgments and decisions. Science 328, 1712–1715 (2010).
  17. Aviezer, H., Trope, Y. & Todorov, A. Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science 338, 1225–1229 (2012).
  18. Balafoutas, L. & Sutter, M. Affirmative action policies promote women and do not harm efficiency in the laboratory. Science 335, 579–582 (2012).
  19. Derex, M., Beugin, M.-P., Godelle, B. & Raymond, M. Experimental evidence for the influence of group size on cultural complexity. Nature 503, 389–391 (2013).
  20. Duncan, K., Sadanand, A. & Davachi, L. Memory’s penumbra: episodic memory decisions induce lingering mnemonic biases. Science 337, 485–487 (2012).
  21. Gervais, W. M. & Norenzayan, A. Analytic thinking promotes religious disbelief. Science 336, 493–496 (2012).
  22. Gneezy, U., Keenan, E. A. & Gneezy, A. Avoiding overhead aversion in charity. Science 346, 632–635 (2014).
  23. Hauser, O. P., Rand, D. G., Peysakhovich, A. & Nowak, M. A. Cooperating with the future. Nature 511, 220–223 (2014).
  24. Janssen, M. A., Holahan, R., Lee, A. & Ostrom, E. Lab experiments for the study of social-ecological systems. Science 328, 613–617 (2010).
  25. Karpicke, J. D. & Blunt, J. R. Retrieval practice produces more learning than elaborative studying with concept mapping. Science 331, 772–775 (2011).
  26. Kidd, D. C. & Castano, E. Reading literary fiction improves theory of mind. Science 342, 377–380 (2013).
  27. Kovács, Á. M., Téglás, E. & Endress, A. D. The social sense: susceptibility to others’ beliefs in human infants and adults. Science 330, 1830–1834 (2010).
  28. Lee, S. W. S. & Schwarz, N. Washing away postdecisional dissonance. Science 328, 709 (2010).
  29. Morewedge, C. K., Huh, Y. E. & Vosgerau, J. Thought for food: imagined consumption reduces actual consumption. Science 330, 1530–1533 (2010).
  30. Nishi, A., Shirado, H., Rand, D. G. & Christakis, N. A. Inequality and visibility of wealth in experimental social networks. Nature 526, 426–429 (2015).
  31. Pyc, M. A. & Rawson, K. A. Why testing improves memory: mediator effectiveness hypothesis. Science 330, 335 (2010).
  32. Ramirez, G. & Beilock, S. L. Writing about testing worries boosts exam performance in the classroom. Science 331, 211–213 (2011).
  33. Rand, D. G., Greene, J. D. & Nowak, M. A. Spontaneous giving and calculated greed. Nature 489, 427–430 (2012).
  34. Shah, A. K., Mullainathan, S. & Shafir, E. Some consequences of having too little. Science 338, 682–685 (2012).
  35. Sparrow, B., Liu, J. & Wegner, D. M. Google effects on memory: cognitive consequences of having information at our fingertips. Science 333, 776–778 (2011).
  36. Wilson, T. D. et al. Just think: the challenges of the disengaged mind. Science 345, 75–77 (2014).
  37. Bohannon, J. Replication effort provokes praise—and ‘bullying’ charges. Science 344, 788–789 (2014).
  38. Gilbert, D. T., King, G., Pettigrew, S. & Wilson, T. D. Comment on "Estimating the reproducibility of psychological science". Science 351, 1037 (2016).
  39. Anderson, C. J. et al. Response to comment on "Estimating the reproducibility of psychological science". Science 351, 1037 (2016).
  40. Ioannidis, J. P. A. Why most discovered true associations are inflated. Epidemiology 19, 640–648 (2008).
  41. Simmons, J. P., Nelson, L. D. & Simonsohn, U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011).
  42. Etz, A. & Vandekerckhove, J. A Bayesian perspective on the Reproducibility Project: Psychology. PLoS One 11, e0149794 (2016).
  43. Gelman, A. & Stern, H. The difference between “significant” and “not significant” is not itself statistically significant. Am. Stat. 60, 328–331 (2006).
  44. Cumming, G. Replication and P intervals: P values predict the future only vaguely, but confidence intervals do much better. Perspect. Psychol. Sci. 3, 286–300 (2008).
  45. Verhagen, J. & Wagenmakers, E.-J. Bayesian tests to quantify the result of a replication attempt. J. Exp. Psychol. Gen. 143, 1457–1475 (2014).
  46. Simonsohn, U. Small telescopes: detectability and the evaluation of replication results. Psychol. Sci. 26, 559–569 (2015).
  47. Patil, P., Peng, R. D. & Leek, J. T. What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspect. Psychol. Sci. 11, 539–544 (2016).
  48. Wagenmakers, E.-J. et al. Bayesian inference for psychology. Part II: example applications with JASP. Psychon. Bull. Rev. 25, 58–76 (2017).
  49. Lee, M. D. & Wagenmakers, E.-J. Bayesian Cognitive Modeling: A Practical Course (Cambridge Univ. Press, Cambridge, 2013).
  50. Dreber, A. et al. Using prediction markets to estimate the reproducibility of scientific research. Proc. Natl Acad. Sci. USA 112, 15343–15347 (2015).
  51. Benjamin, D. et al. Redefine statistical significance. Nat. Hum. Behav. 2, 6–10 (2018).
  52. Jeffreys, H. Theory of Probability (Oxford Univ. Press, Oxford, 1961).
  53. Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
  54. Arrow, K. J. et al. The promise of prediction markets. Science 320, 877–878 (2008).
  55. Nosek, B. A., Ebersole, C. R., DeHaven, A. & Mellor, D. M. The preregistration revolution. Proc. Natl Acad. Sci. USA 115, 2600–2606 (2018).
  56. Nosek, B. A. et al. Promoting an open research culture: author guidelines for journals could help to promote transparency, openness, and reproducibility. Science 348, 1422–1425 (2015).

Acknowledgements

Neither Nature Human Behaviour nor the publisher had any involvement with the conduct of this study prior to its submission to the journal. For financial support we thank: the Austrian Science Fund FWF (SFB F63, START-grant Y617-G11), the Austrian National Bank (grant OeNB 14953), the Behavioral and Neuroeconomics Discovery Fund (C.F.C.), the Jan Wallander and Tom Hedelius Foundation (P2015-0001:1 and P2013-0156:1), the Knut and Alice Wallenberg Foundation (Wallenberg Academy Fellows grant to A.D.), the Swedish Foundation for Humanities and Social Sciences (NHS14-1719:1), the Netherlands Organisation for Scientific Research (Vici grant 016.Vici.170.083 to E.-J.W.), the Sloan Foundation (G-2015-13929) and the Singapore National Research Foundation’s Returning Singaporean Scientists Scheme (grant NRF-RSS2014-001 to T.-H.H.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank the following people for assistance with the experiments and analyses: D. van den Bergh, P.-C. Bindra, J. van Doorn, C. Huber, A. Ly, M. Marsman and J. Zambre.

Author information

Author notes

  1. These authors contributed equally: Colin F. Camerer, Anna Dreber, Felix Holzmeister, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, Gideon Nave, Brian A. Nosek, Thomas Pfeiffer.

Affiliations

  1. California Institute of Technology, Pasadena, CA, USA

    • Colin F. Camerer
  2. Department of Economics, Stockholm School of Economics, Stockholm, Sweden

    • Anna Dreber, Magnus Johannesson, Adam Altmejd, Emma Heikensten & Siri Isaksson
  3. Department of Banking and Finance, University of Innsbruck, Innsbruck, Austria

    • Felix Holzmeister, Jürgen Huber, Michael Kirchler & Julia Rose
  4. NUS Business School, National University of Singapore, Singapore, Singapore

    • Teck-Hua Ho
  5. Centre for Finance, Department of Economics, University of Göteborg, Göteborg, Sweden

    • Michael Kirchler
  6. The Wharton School, University of Pennsylvania, Philadelphia, PA, USA

    • Gideon Nave & Dylan Manfredi
  7. Department of Psychology, University of Virginia, Charlottesville, VA, USA

    • Brian A. Nosek, Nick Buttrick & Anup Gampa
  8. Center for Open Science, Charlottesville, VA, USA

    • Brian A. Nosek, Nick Buttrick, Anup Gampa & Lily Hummer
  9. New Zealand Institute for Advanced Study, Auckland, New Zealand

    • Thomas Pfeiffer
  10. Office of the Senior Deputy President and Provost, National University of Singapore, Singapore, Singapore

    • Taizan Chan
  11. John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA

    • Yiling Chen
  12. Spotify Sweden AB, Stockholm, Sweden

    • Eskil Forsell
  13. Department of Economics, LMU Munich, Munich, Germany

    • Taisuke Imai
  14. Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands

    • Eric-Jan Wagenmakers
  15. School of Management, Harbin Institute of Technology, Harbin, China

    • Hang Wu

Authors

  1. Colin F. Camerer
  2. Anna Dreber
  3. Felix Holzmeister
  4. Teck-Hua Ho
  5. Jürgen Huber
  6. Magnus Johannesson
  7. Michael Kirchler
  8. Gideon Nave
  9. Brian A. Nosek
  10. Thomas Pfeiffer
  11. Adam Altmejd
  12. Nick Buttrick
  13. Taizan Chan
  14. Yiling Chen
  15. Eskil Forsell
  16. Anup Gampa
  17. Emma Heikensten
  18. Lily Hummer
  19. Taisuke Imai
  20. Siri Isaksson
  21. Dylan Manfredi
  22. Julia Rose
  23. Eric-Jan Wagenmakers
  24. Hang Wu

Contributions

C.F.C., A.D., F.H., J.H., T.-H.H., M.J., M.K., G.N., B.A.N. and T.P. designed the research. C.F.C., A.D., F.H., T.-H.H., J.H., M.J., M.K., D.M., G.N., B.A.N., T.P. and E.-J.W. wrote the paper. T.C., A.D., E.F., F.H., T.-H.H., M.J., T.P. and Y.C. helped to design the prediction market part. F.H. and E.-J.W. analysed the data. A.A., N.B., A.G., E.H., F.H., L.H., T.I., S.I., D.M., J.R. and H.W. carried out the replications (including re-estimating the original estimate with the replication data). All authors approved the final manuscript.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Brian A. Nosek.

Supplementary information

  1. Supplementary Information

    Supplementary Methods, Supplementary References, Supplementary Tables 1–7 and Supplementary Figures 1–9

  2. Reporting Summary

About this article

DOI

https://doi.org/10.1038/s41562-018-0399-z
