Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015


Being able to replicate scientific findings is crucial for scientific progress1,2,3,4,5,6,7,8,9,10,11,12,13,14,15. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 201516,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36. The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies. We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators. Consistent with these results, the estimated true-positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility. Furthermore, we find that peer beliefs of replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.

Access options

Rent or Buy article

Get time limited or full article access on ReadCube.


All prices are NET prices.

Fig. 1: Replication results after stage 1 and stage 2.
Fig. 2: Replication results for two complementary replication indicators.
Fig. 3: Default Bayes factors (one sided) for the 21 replications.
Fig. 4: Prediction market and survey beliefs.


  1. 1.

    McNutt, M. Reproducibility. Science 343, 229 (2014).

    CAS  PubMed  Google Scholar 

  2. 2.

    Baker, M. Is there a reproducibility crisis? Nature 533, 452–454 (2016).

    CAS  PubMed  Google Scholar 

  3. 3.

    Munafò, M. R. et al. A manifesto for reproducible science. Nat. Hum. Behav. 1, 0021 (2017).

    Google Scholar 

  4. 4.

    Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005).

    PubMed  PubMed Central  Google Scholar 

  5. 5.

    Prinz, F., Schlange, T. & Asadullah, K. Believe it or not: how much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 10, 712 (2011).

    CAS  PubMed  Google Scholar 

  6. 6.

    Begley, C. G. & Ellis, L. M. Drug development: raise standards for preclinical cancer research. Nature 483, 531–533 (2012).

    CAS  PubMed  Google Scholar 

  7. 7.

    Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  8. 8.

    Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376 (2013).

    CAS  PubMed  Google Scholar 

  9. 9.

    Maniadis, Z., Tufano, F. & List, J. A. One swallow doesn’t make a summer: new evidence on anchoring effects. Am. Econ. Rev. 104, 277–290 (2014).

    Google Scholar 

  10. 10.

    Freedman, L. P., Cockburn, I. M. & Simcoe, T. S. The economics of reproducibility in preclinical research. PLoS Biol. 13, e1002165 (2015).

    PubMed  PubMed Central  Google Scholar 

  11. 11.

    Klein, R. A. et al. Investigating variation in replicability: a ‘many labs’ replication project. Soc. Psychol. 45, 142–152 (2014).

    Google Scholar 

  12. 12.

    Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015).

    Google Scholar 

  13. 13.

    Camerer, C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351, 1433–1436 (2016).

    CAS  PubMed  Google Scholar 

  14. 14.

    Ebersole, C. R. et al. Many Labs 3: evaluating participant pool quality across the academic semester via replication. J. Exp. Soc. Psychol. 67, 68–82 (2016).

    Google Scholar 

  15. 15.

    Klein, R. A. et al. Many Labs 2: investigating variation in replicability across sample and setting. Adv. Methods Prac. Psychol. Sci. (in the press).

  16. 16.

    Ackerman, J. M., Nocera, C. C. & Bargh, J. A. Incidental haptic sensations influence social judgments and decisions. Science 328, 1712–1715 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  17. 17.

    Aviezer, H., Trope, Y. & Todorov, A. Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science 338, 1225–1229 (2012).

    CAS  PubMed  Google Scholar 

  18. 18.

    Balafoutas, L. & Sutter, M. Affirmative action policies promote women and do not harm efficiency in the laboratory. Science 335, 579–582 (2012).

    CAS  PubMed  Google Scholar 

  19. 19.

    Derex, M., Beugin, M.-P., Godelle, B. & Raymond, M. Experimental evidence for the influence of group size on cultural complexity. Nature 503, 389–391 (2013).

    CAS  PubMed  Google Scholar 

  20. 20.

    Duncan, K., Sadanand, A. & Davachi, L. Memory’s penumbra: episodic memory decisions induce lingering mnemonic biases. Science 337, 485–487 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  21. 21.

    Gervais, W. M. & Norenzayan, A. Analytic thinking promotes religious disbelief. Science 336, 493–496 (2012).

    CAS  PubMed  Google Scholar 

  22. 22.

    Gneezy, U., Keenan, E. A. & Gneezy, A. Avoiding overhead aversion in charity. Science 346, 632–635 (2014).

    CAS  PubMed  Google Scholar 

  23. 23.

    Hauser, O. P., Rand, D. G., Peysakhovich, A. & Nowak, M. A. Cooperating with the future. Nature 511, 220–223 (2014).

    CAS  PubMed  Google Scholar 

  24. 24.

    Janssen, M. A., Holahan, R., Lee, A. & Ostrom, E. Lab experiments for the study of social-ecological systems. Science 328, 613–617 (2010).

    CAS  PubMed  Google Scholar 

  25. 25.

    Karpicke, J. D. & Blunt, J. R. Retrieval practice produces more learning than elaborative studying with concept mapping. Science 331, 772–775 (2011).

    CAS  PubMed  Google Scholar 

  26. 26.

    Kidd, D. C. & Castano, E. Reading literary fiction improves theory of mind. Science 342, 377–380 (2013).

    CAS  PubMed  Google Scholar 

  27. 27.

    Kovacs, Á. M. & Téglás, E. & Endress, A. D. The social sense: susceptibility to others’ beliefs in human infants and adults. Science 330, 1830–1834 (2010).

    CAS  PubMed  Google Scholar 

  28. 28.

    Lee, S. W. S. & Schwarz, N. Washing away postdecisional dissonance. Science 328, 709 (2010).

    CAS  PubMed  Google Scholar 

  29. 29.

    Morewedge, C. K., Huh, Y. E. & Vosgerau, J. Thought for food: imagined consumption reduces actual consumption. Science 330, 1530–1533 (2010).

    CAS  PubMed  Google Scholar 

  30. 30.

    Nishi, A., Shirado, H., Rand, D. G. & Christakis, N. A. Inequality and visibility of wealth in experimental social networks. Nature 526, 426–429 (2015).

    CAS  PubMed  Google Scholar 

  31. 31.

    Pyc, M. A. & Rawson, K. A. Why testing improves memory: mediator effectiveness hypothesis. Science 330, 335 (2010).

    CAS  PubMed  Google Scholar 

  32. 32.

    Ramirez, G. & Beilock, S. L. Writing about testing worries boosts exam performance in the classroom. Science 331, 211–213 (2011).

    CAS  PubMed  Google Scholar 

  33. 33.

    Rand, D. G., Greene, J. D. & Nowak, M. A. Spontaneous giving and calculated greed. Nature 489, 427–430 (2012).

    CAS  Google Scholar 

  34. 34.

    Shah, A. K., Mullainathan, S. & Shafir, E. Some consequences of having too little. Science 338, 682–685 (2012).

    CAS  PubMed  Google Scholar 

  35. 35.

    Sparrow, B., Liu, J. & Wegner, D. M. Google effects on memory: cognitive consequences of having information at our fingertips. Science 333, 776–778 (2011).

    CAS  PubMed  Google Scholar 

  36. 36.

    Wilson, T. D. et al. Just think: the challenges of the disengaged mind. Science 345, 75–77 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  37. 37.

    Bohannon, J. Replication effort provokes praise—and ‘bullying’ charges. Science 344, 788–789 (2014).

    CAS  PubMed  Google Scholar 

  38. 38.

    Gilbert, D. T., King, G., Pettigrew, S. & Wilson, T. D. Comment on "Estimating the reproducibility of psychological science". Science 351, 1037 (2016).

    CAS  PubMed  Google Scholar 

  39. 39.

    Anderson, C. J. et al. Response to comment on "Estimating the reproducibility of psychological science". Science 351, 1037 (2016).

    CAS  PubMed  Google Scholar 

  40. 40.

    Ioannidis, J. P. A. Why most discovered true associations are inflated. Epidemiology 19, 640–648 (2008).

    PubMed  Google Scholar 

  41. 41.

    Simmons, J. P., Nelson, L. D. & Simonsohn, U. False-positive psychology undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011).

    PubMed  Google Scholar 

  42. 42.

    Etz, A. & Vandekerckhove, J. A Bayesian perspective on the Reproducibility Project: Psychology. PLoS One 11, e0149794 (2016).

    PubMed  PubMed Central  Google Scholar 

  43. 43.

    Gelman, A. & Stern, H. The difference between “significant” and “not significant” is not itself statistically significant. Am. Stat. 60, 328–331 (2006).

    Google Scholar 

  44. 44.

    Cumming, G. Replication and P intervals: P values predict the future only vaguely, but confidence intervals do much better. Psychol. Sci. 3, 286–300 (2008).

    Google Scholar 

  45. 45.

    Verhagen, J. & Wagenmakers, E.-J. Bayesian tests to quantify the result of a replication attempt. J. Exp. Psychol. Gen. 143, 1457–1475 (2014).

    PubMed  Google Scholar 

  46. 46.

    Simonsohn, U. Small telescopes: detectability and the evaluation of replication results. Psychol. Sci. 26, 559–569 (2015).

    PubMed  Google Scholar 

  47. 47.

    Patil, P., Peng, R. D. & Leek, J. T. What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspect. Psychol. Sci. 11, 539–544 (2016).

    PubMed  PubMed Central  Google Scholar 

  48. 48.

    Wagenmakers, E.-J. et al. Bayesian inference for psychology. Part II: example applications with JASP. Psychon. Bull. Rev. 25, 58–76 (2017).

    PubMed Central  Google Scholar 

  49. 49.

    Lee, M. D. & Wagenmakers, E.-J. Bayesian Cognitive Modeling: A Practical Course (Cambridge Univ. Press, Cambridge, 2013).

  50. 50.

    Dreber, A. et al. Using prediction markets to estimate the reproducibility of scientific research. Proc. Natl Acad. Sci. USA 112, 15343–15347 (2015).

    CAS  PubMed  Google Scholar 

  51. 51.

    Benjamin, D. et al. Redefine statistical significance. Nat. Hum. Behav. 2, 6–10 (2018).

    PubMed  Google Scholar 

  52. 52.

    Jeffreys, H. Theory of Probability (Oxford Univ. Press, Oxford, 1961).

  53. 53.

    Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).

    Google Scholar 

  54. 54.

    Arrow, K. J. et al. The promise of prediction markets. Science 320, 877–878 (2008).

    CAS  PubMed  Google Scholar 

  55. 55.

    Nosek, B. A., Ebersole, C. R., DeHaven, A. & Mellor, D. M. The preregistration revolution. Proc. Natl Acad. Sci. USA 115, 2600–2606 (2018).

    CAS  PubMed  Google Scholar 

  56. 56.

    Nosek, B. A. et al. Promoting an open research culture: author guidelines for journals could help to promote transparency, openness, and reproducibility. Science 348, 1422–1425 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

Download references


Neither Nature Human Behaviour nor the publisher had any involvement with the conduct of this study prior to its submission to the journal. For financial support we thank: the Austrian Science Fund FWF (SFB F63, START-grant Y617-G11), the Austrian National Bank (grant OeNB 14953), the Behavioral and Neuroeconomics Discovery Fund (C.F.C.), the Jan Wallander and Tom Hedelius Foundation (P2015-0001:1 and P2013-0156:1), the Knut and Alice Wallenberg Foundation (Wallenberg Academy Fellows grant to A.D.), the Swedish Foundation for Humanities and Social Sciences (NHS14-1719:1), the Netherlands Organisation for Scientific Research (Vici grant 016.Vici.170.083 to E.-J.W.), the Sloan Foundation (G-2015-13929) and the Singapore National Research Foundation’s Returning Singaporean Scientists Scheme (grant NRF-RSS2014-001 to T.-H.H.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank the following people for assistance with the experiments and analyses: D. van den Bergh, P.-C. Bindra, J. van Doorn, C. Huber, A. Ly, M. Marsman and J. Zambre.

Author information




C.F.C., A.D., F.H., J.H., T.-H.H., M.J., M.K., G.N., B.A.N. and T.P. designed the research. C.F.C., A.D., F.H., T.-H.H., J.H., M.J., M.K., D.M., G.N., B.A.N., T.P. and E.-J.W. wrote the paper. T.C., A.D., E.F., F.H., T.-H.H., M.J., T.P. and Y.C. helped to design the prediction market part. F.H. and E.-J.W. analysed the data. A.A., N.B., A.G., E.H., F.H., L.H., T.I., S.I., D.M., J.R. and H.W. carried out the replications (including re-estimating the original estimate with the replication data). All authors approved the final manuscript.

Corresponding author

Correspondence to Brian A. Nosek.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary References, Supplementary Tables 1–7 and Supplementary Figures 1–9

Reporting Summary

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Camerer, C.F., Dreber, A., Holzmeister, F. et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav 2, 637–644 (2018).

Download citation

Further reading


Quick links