Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015

Camerer, Colin F.; Dreber, Anna; Holzmeister, Felix; Ho, Teck-Hua; Huber, Jürgen; Johannesson, Magnus; Kirchler, Michael; Nave, Gideon; Nosek, Brian A.; Pfeiffer, Thomas; Altmejd, Adam; Buttrick, Nick; Chan, Taizan; Chen, Yiling; Forsell, Eskil; Gampa, Anup; Heikensten, Emma; Hummer, Lily; Imai, Taisuke; Isaksson, Siri; Manfredi, Dylan; Rose, Julia; Wagenmakers, Eric-Jan; Wu, Hang

doi:10.1038/s41562-018-0399-z

Letter
Published: 27 August 2018

Colin F. Camerer,
Anna Dreber,
Felix Holzmeister (ORCID: orcid.org/0000-0001-9606-0427),
Teck-Hua Ho,
Jürgen Huber,
Magnus Johannesson (ORCID: orcid.org/0000-0001-8759-6393),
Michael Kirchler,
Gideon Nave,
Brian A. Nosek (ORCID: orcid.org/0000-0001-6797-5476),
Thomas Pfeiffer (ORCID: orcid.org/0000-0002-0592-577X),
Adam Altmejd (ORCID: orcid.org/0000-0002-4248-0677),
Nick Buttrick,
Taizan Chan,
Yiling Chen,
Eskil Forsell,
Anup Gampa,
Emma Heikensten,
Lily Hummer,
Taisuke Imai (ORCID: orcid.org/0000-0002-0610-8093),
Siri Isaksson,
Dylan Manfredi,
Julia Rose,
Eric-Jan Wagenmakers &
Hang Wu

Nature Human Behaviour volume 2, pages 637–644 (2018)

Abstract

Being able to replicate scientific findings is crucial for scientific progress^{1–15}. We replicate 21 systematically selected experimental studies in the social sciences published in Nature and Science between 2010 and 2015^{16–36}. The replications follow analysis plans reviewed by the original authors and pre-registered prior to the replications. The replications are high powered, with sample sizes on average about five times higher than in the original studies. We find a significant effect in the same direction as the original study for 13 (62%) studies, and the effect size of the replications is on average about 50% of the original effect size. Replicability varies between 12 (57%) and 14 (67%) studies for complementary replicability indicators. Consistent with these results, the estimated true-positive rate is 67% in a Bayesian analysis. The relative effect size of true positives is estimated to be 71%, suggesting that both false positives and inflated effect sizes of true positives contribute to imperfect reproducibility. Furthermore, we find that peer beliefs of replicability are strongly related to replicability, suggesting that the research community could predict which results would replicate and that failures to replicate were not the result of chance alone.
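
As a quick arithmetic check on the figures quoted in the abstract, the sketch below recomputes the replication percentages from the study counts (12, 13 and 14 out of 21). It then multiplies the reported true-positive rate by the reported relative effect size of true positives. That second step rests on a simplifying assumption that is not taken from the paper: false positives are treated as contributing zero effect. The helper name `percent` and the variable names are illustrative only, and this is not the authors' Bayesian model.

```python
# Counts and rates are taken from the abstract; names are hypothetical.
N_STUDIES = 21  # number of replicated studies


def percent(n_replicated, n_total=N_STUDIES):
    """Share of replicated studies as a whole-number percentage."""
    return round(100 * n_replicated / n_total)


# Replication counts under the different indicators reported in the abstract.
rates = {n: percent(n) for n in (12, 13, 14)}
print(rates)  # {12: 57, 13: 62, 14: 67}

# Illustrative assumption (not the paper's model): false positives have a
# true effect of zero, so only true positives contribute to the average.
TRUE_POSITIVE_RATE = 0.67
RELATIVE_EFFECT_TRUE_POSITIVES = 0.71
implied_mean_relative_effect = TRUE_POSITIVE_RATE * RELATIVE_EFFECT_TRUE_POSITIVES
print(f"{implied_mean_relative_effect:.2f}")  # 0.48
```

Under that assumption the implied average relative effect size is roughly 0.48, which is in line with the "about 50%" average reported above.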

**Fig. 1: Replication results after stage 1 and stage 2.**

**Fig. 2: Replication results for two complementary replication indicators.**

**Fig. 3: Default Bayes factors (one sided) for the 21 replications.**

**Fig. 4: Prediction market and survey beliefs.**

References

McNutt, M. Reproducibility. Science 343, 229 (2014).
Baker, M. Is there a reproducibility crisis? Nature 533, 452–454 (2016).
Munafò, M. R. et al. A manifesto for reproducible science. Nat. Hum. Behav. 1, 0021 (2017).
Ioannidis, J. P. A. Why most published research findings are false. PLoS Med. 2, e124 (2005).
Prinz, F., Schlange, T. & Asadullah, K. Believe it or not: how much can we rely on published data on potential drug targets? Nat. Rev. Drug Discov. 10, 712 (2011).
Begley, C. G. & Ellis, L. M. Drug development: raise standards for preclinical cancer research. Nature 483, 531–533 (2012).
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Button, K. S. et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376 (2013).
Maniadis, Z., Tufano, F. & List, J. A. One swallow doesn’t make a summer: new evidence on anchoring effects. Am. Econ. Rev. 104, 277–290 (2014).
Freedman, L. P., Cockburn, I. M. & Simcoe, T. S. The economics of reproducibility in preclinical research. PLoS Biol. 13, e1002165 (2015).
Klein, R. A. et al. Investigating variation in replicability: a ‘many labs’ replication project. Soc. Psychol. 45, 142–152 (2014).
Open Science Collaboration. Estimating the reproducibility of psychological science. Science 349, aac4716 (2015).
Camerer, C. F. et al. Evaluating replicability of laboratory experiments in economics. Science 351, 1433–1436 (2016).
Ebersole, C. R. et al. Many Labs 3: evaluating participant pool quality across the academic semester via replication. J. Exp. Soc. Psychol. 67, 68–82 (2016).
Klein, R. A. et al. Many Labs 2: investigating variation in replicability across sample and setting. Adv. Methods Prac. Psychol. Sci. (in the press).
Ackerman, J. M., Nocera, C. C. & Bargh, J. A. Incidental haptic sensations influence social judgments and decisions. Science 328, 1712–1715 (2010).
Aviezer, H., Trope, Y. & Todorov, A. Body cues, not facial expressions, discriminate between intense positive and negative emotions. Science 338, 1225–1229 (2012).
Balafoutas, L. & Sutter, M. Affirmative action policies promote women and do not harm efficiency in the laboratory. Science 335, 579–582 (2012).
Derex, M., Beugin, M.-P., Godelle, B. & Raymond, M. Experimental evidence for the influence of group size on cultural complexity. Nature 503, 389–391 (2013).
Duncan, K., Sadanand, A. & Davachi, L. Memory’s penumbra: episodic memory decisions induce lingering mnemonic biases. Science 337, 485–487 (2012).
Gervais, W. M. & Norenzayan, A. Analytic thinking promotes religious disbelief. Science 336, 493–496 (2012).
Gneezy, U., Keenan, E. A. & Gneezy, A. Avoiding overhead aversion in charity. Science 346, 632–635 (2014).
Hauser, O. P., Rand, D. G., Peysakhovich, A. & Nowak, M. A. Cooperating with the future. Nature 511, 220–223 (2014).
Janssen, M. A., Holahan, R., Lee, A. & Ostrom, E. Lab experiments for the study of social-ecological systems. Science 328, 613–617 (2010).
Karpicke, J. D. & Blunt, J. R. Retrieval practice produces more learning than elaborative studying with concept mapping. Science 331, 772–775 (2011).
Kidd, D. C. & Castano, E. Reading literary fiction improves theory of mind. Science 342, 377–380 (2013).
Kovács, Á. M., Téglás, E. & Endress, A. D. The social sense: susceptibility to others’ beliefs in human infants and adults. Science 330, 1830–1834 (2010).
Lee, S. W. S. & Schwarz, N. Washing away postdecisional dissonance. Science 328, 709 (2010).
Morewedge, C. K., Huh, Y. E. & Vosgerau, J. Thought for food: imagined consumption reduces actual consumption. Science 330, 1530–1533 (2010).
Nishi, A., Shirado, H., Rand, D. G. & Christakis, N. A. Inequality and visibility of wealth in experimental social networks. Nature 526, 426–429 (2015).
Pyc, M. A. & Rawson, K. A. Why testing improves memory: mediator effectiveness hypothesis. Science 330, 335 (2010).
Ramirez, G. & Beilock, S. L. Writing about testing worries boosts exam performance in the classroom. Science 331, 211–213 (2011).
Rand, D. G., Greene, J. D. & Nowak, M. A. Spontaneous giving and calculated greed. Nature 489, 427–430 (2012).
Shah, A. K., Mullainathan, S. & Shafir, E. Some consequences of having too little. Science 338, 682–685 (2012).
Sparrow, B., Liu, J. & Wegner, D. M. Google effects on memory: cognitive consequences of having information at our fingertips. Science 333, 776–778 (2011).
Wilson, T. D. et al. Just think: the challenges of the disengaged mind. Science 345, 75–77 (2014).
Bohannon, J. Replication effort provokes praise—and ‘bullying’ charges. Science 344, 788–789 (2014).
Gilbert, D. T., King, G., Pettigrew, S. & Wilson, T. D. Comment on "Estimating the reproducibility of psychological science". Science 351, 1037 (2016).
Anderson, C. J. et al. Response to comment on "Estimating the reproducibility of psychological science". Science 351, 1037 (2016).
Ioannidis, J. P. A. Why most discovered true associations are inflated. Epidemiology 19, 640–648 (2008).
Simmons, J. P., Nelson, L. D. & Simonsohn, U. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22, 1359–1366 (2011).
Etz, A. & Vandekerckhove, J. A Bayesian perspective on the Reproducibility Project: Psychology. PLoS One 11, e0149794 (2016).
Gelman, A. & Stern, H. The difference between “significant” and “not significant” is not itself statistically significant. Am. Stat. 60, 328–331 (2006).
Cumming, G. Replication and P intervals: P values predict the future only vaguely, but confidence intervals do much better. Perspect. Psychol. Sci. 3, 286–300 (2008).
Verhagen, J. & Wagenmakers, E.-J. Bayesian tests to quantify the result of a replication attempt. J. Exp. Psychol. Gen. 143, 1457–1475 (2014).
Simonsohn, U. Small telescopes: detectability and the evaluation of replication results. Psychol. Sci. 26, 559–569 (2015).
Patil, P., Peng, R. D. & Leek, J. T. What should researchers expect when they replicate studies? A statistical view of replicability in psychological science. Perspect. Psychol. Sci. 11, 539–544 (2016).
Wagenmakers, E.-J. et al. Bayesian inference for psychology. Part II: example applications with JASP. Psychon. Bull. Rev. 25, 58–76 (2017).
Lee, M. D. & Wagenmakers, E.-J. Bayesian Cognitive Modeling: A Practical Course (Cambridge Univ. Press, Cambridge, 2013).
Dreber, A. et al. Using prediction markets to estimate the reproducibility of scientific research. Proc. Natl Acad. Sci. USA 112, 15343–15347 (2015).
Benjamin, D. et al. Redefine statistical significance. Nat. Hum. Behav. 2, 6–10 (2018).
Jeffreys, H. Theory of Probability (Oxford Univ. Press, Oxford, 1961).
Kass, R. E. & Raftery, A. E. Bayes factors. J. Am. Stat. Assoc. 90, 773–795 (1995).
Arrow, K. J. et al. The promise of prediction markets. Science 320, 877–878 (2008).
Nosek, B. A., Ebersole, C. R., DeHaven, A. & Mellor, D. M. The preregistration revolution. Proc. Natl Acad. Sci. USA 115, 2600–2606 (2018).
Nosek, B. A. et al. Promoting an open research culture: author guidelines for journals could help to promote transparency, openness, and reproducibility. Science 348, 1422–1425 (2015).
CAS PubMed PubMed Central Google Scholar

Acknowledgements

Neither Nature Human Behaviour nor the publisher had any involvement with the conduct of this study prior to its submission to the journal. For financial support we thank: the Austrian Science Fund FWF (SFB F63, START-grant Y617-G11), the Austrian National Bank (grant OeNB 14953), the Behavioral and Neuroeconomics Discovery Fund (C.F.C.), the Jan Wallander and Tom Hedelius Foundation (P2015-0001:1 and P2013-0156:1), the Knut and Alice Wallenberg Foundation (Wallenberg Academy Fellows grant to A.D.), the Swedish Foundation for Humanities and Social Sciences (NHS14-1719:1), the Netherlands Organisation for Scientific Research (Vici grant 016.Vici.170.083 to E.-J.W.), the Sloan Foundation (G-2015-13929) and the Singapore National Research Foundation’s Returning Singaporean Scientists Scheme (grant NRF-RSS2014-001 to T.-H.H.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We thank the following people for assistance with the experiments and analyses: D. van den Bergh, P.-C. Bindra, J. van Doorn, C. Huber, A. Ly, M. Marsman and J. Zambre.

Author information

These authors contributed equally: Colin F. Camerer, Anna Dreber, Felix Holzmeister, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, Gideon Nave, Brian A. Nosek, Thomas Pfeiffer.

Authors and Affiliations

California Institute of Technology, Pasadena, CA, USA
Colin F. Camerer
Department of Economics, Stockholm School of Economics, Stockholm, Sweden
Anna Dreber, Magnus Johannesson, Adam Altmejd, Emma Heikensten & Siri Isaksson
Department of Banking and Finance, University of Innsbruck, Innsbruck, Austria
Felix Holzmeister, Jürgen Huber, Michael Kirchler & Julia Rose
NUS Business School, National University of Singapore, Singapore, Singapore
Teck-Hua Ho
Centre for Finance, Department of Economics, University of Göteborg, Göteborg, Sweden
Michael Kirchler
The Wharton School, University of Pennsylvania, Philadelphia, PA, USA
Gideon Nave & Dylan Manfredi
Department of Psychology, University of Virginia, Charlottesville, VA, USA
Brian A. Nosek, Nick Buttrick & Anup Gampa
Center for Open Science, Charlottesville, VA, USA
Brian A. Nosek, Nick Buttrick, Anup Gampa & Lily Hummer
New Zealand Institute for Advanced Study, Auckland, New Zealand
Thomas Pfeiffer
Office of the Senior Deputy President and Provost, National University of Singapore, Singapore, Singapore
Taizan Chan
John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
Yiling Chen
Spotify Sweden AB, Stockholm, Sweden
Eskil Forsell
Department of Economics, LMU Munich, Munich, Germany
Taisuke Imai
Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
Eric-Jan Wagenmakers
School of Management, Harbin Institute of Technology, Harbin, China
Hang Wu

Contributions

C.F.C., A.D., F.H., J.H., T.-H.H., M.J., M.K., G.N., B.A.N. and T.P. designed the research. C.F.C., A.D., F.H., T.-H.H., J.H., M.J., M.K., D.M., G.N., B.A.N., T.P. and E.-J.W. wrote the paper. T.C., A.D., E.F., F.H., T.-H.H., M.J., T.P. and Y.C. helped to design the prediction market part. F.H. and E.-J.W. analysed the data. A.A., N.B., A.G., E.H., F.H., L.H., T.I., S.I., D.M., J.R. and H.W. carried out the replications (including re-estimating the original estimate with the replication data). All authors approved the final manuscript.

Corresponding author

Correspondence to Brian A. Nosek.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary References, Supplementary Tables 1–7 and Supplementary Figures 1–9

Reporting Summary

About this article

Cite this article

Camerer, C.F., Dreber, A., Holzmeister, F. et al. Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nat Hum Behav 2, 637–644 (2018). https://doi.org/10.1038/s41562-018-0399-z

Received: 06 March 2018
Accepted: 06 July 2018
Published: 27 August 2018
Issue Date: September 2018
DOI: https://doi.org/10.1038/s41562-018-0399-z
