In the past decade, behavioural science has gained influence in policymaking but suffered a crisis of confidence in the replicability of its findings. Here, we describe a nascent heterogeneity revolution that we believe these twin historical trends have triggered. This revolution will be defined by the recognition that most treatment effects are heterogeneous, so the variation in effect estimates across studies that defines the replication crisis is to be expected as long as heterogeneous effects are studied without a systematic approach to sampling and moderation. When studied systematically, heterogeneity can be leveraged to build more complete theories of causal mechanism that could inform nuanced and dependable guidance to policymakers. We recommend investment in shared research infrastructure to make it feasible to study behavioural interventions in heterogeneous and generalizable samples, and suggest low-cost steps researchers can take immediately to avoid being misled by heterogeneity and begin to learn from it instead.
We thank C. Dweck, G. Wu, K. Milkman, A. Duckworth, G. Cohen, G. Chapman, J. Risen, G. Walton and S. Brady for helpful feedback on previous drafts.
The authors declare no competing interests.
Peer review information Nature Human Behaviour thanks Balazs Aczel, Ashley Whillans and Timothy Wilson for their contribution to the peer review of this work.
Bryan, C.J., Tipton, E. & Yeager, D.S. Behavioural science is unlikely to change the world without a heterogeneity revolution. Nat Hum Behav 5, 980–989 (2021). https://doi.org/10.1038/s41562-021-01143-3