Researchers in the social and behavioural sciences periodically debate whether their research should be used to address pressing issues in society. To provide a few examples, in the 1940s psychologists discussed using research to address problems related to intergroup relations, problems brought to the fore by the Holocaust and other acts of rampant prejudice. In the 1990s, psychologists debated whether their research should inform legal decision-making. In the 2010s, psychologists argued for advising branches of government as economists often do. And now, in 2020, psychologists and other social and behavioural scientists are arguing that our research should inform the response to the new coronavirus disease (henceforth COVID-19)1,2.

We are a team mostly consisting of empirical psychologists who conduct research on basic, applied and meta-scientific processes. We believe that scientists should apply their creativity, efforts and talents to serve our society, especially during crises. However, the way that social and behavioural science research is often conducted makes it difficult to know whether our efforts will do more good than harm. We will provide some examples from the field of social-personality psychology, where most of us were trained, to illustrate our concerns. This focus is not meant to imply that our field alone suffers from the issues we will discuss. Instead, a growing meta-science literature suggests that many other social and behavioural disciplines have encountered dynamics similar to those faced by our field.

What are those dynamics? First, study participants, mainly students, are drawn from populations that are in Western (mostly US), educated, industrialized, rich and democratic (WEIRD) societies3. Second, even with this narrow slice of the population, the effects in published papers are not estimated with precision, sometimes barely ruling out trivially small effects under ostensibly controlled conditions. Third, many studies use a narrow range of stimuli and do not test for stimulus generalizability4. Fourth, many studies examine effects on measures, such as self-report scales, that are infrequently validated or linked to behaviour, much less to policy-relevant outcomes5. Fifth, independently replicated findings, even under ideal circumstances, are rare. Finally, our studies often fail to account for deeper cultural, historical, political and structural factors that play important moderating roles during the process of translation from basic findings to application. Together, these issues produce empirical insights that are more heterogeneous than might be apparent from a scan of the published literature.
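
To make the precision problem concrete, consider a minimal sketch (the effect and sample sizes are hypothetical, chosen only to be typical of the literature): a two-group study with 50 participants per group and an observed standardized mean difference of 0.40 yields a 95% confidence interval that stretches from essentially zero to a large effect.

```python
# Hypothetical illustration of the precision problem: the 95% confidence
# interval for a standardized mean difference (Cohen's d) from a typical
# two-group study barely excludes trivially small effects.
import math

def d_confidence_interval(d, n1, n2, z=1.96):
    # Approximate large-sample standard error of Cohen's d
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

low, high = d_confidence_interval(d=0.40, n1=50, n2=50)
print(f"95% CI for d: [{low:.2f}, {high:.2f}]")  # roughly [0.00, 0.80]
```

An interval this wide is consistent with anything from a negligible effect to a large one, which is exactly the kind of hidden heterogeneity described above.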

Confident applications of social and behavioural science findings, then, require first and foremost an assessment of evidence quality, together with a weighing of heterogeneity and of the trade-offs and opportunity costs that follow. We must identify reliable findings that can be applied, have been investigated in the nations for which the application is intended and are derived from investigations using diverse stimuli. This assessment of how ‘ready’ an intervention is must be included when persuading decision-makers to apply social and behavioural science evidence, particularly in crisis situations when lives are at stake and resources are limited. Not doing so can have disastrous consequences.

Here we propose one approach for assessing the quality of evidence before application and dissemination. Specifically, we draw inspiration from the US National Aeronautics and Space Administration (NASA)’s ‘technology readiness levels’ (TRLs)6, a benchmarking system for systematically evaluating how ready a technology is for deployment, which the European Commission has used to judge the operational readiness of scientific applications beyond space flight. TRLs rank a technology’s readiness for application from 1 to 9 (see Fig. 1). At TRL1, basic principles have been reliably observed, reported and translated to a formal model. At TRL2, basic principles have been developed and tested in an application area. It is not until TRL4, when a prototype is developed, that tests are run in environments that are as representative of the eventual application area(s) as possible. Later, at TRL6, the system is tested in a ‘real’ environment (such as ground-to-space). At the very highest level (TRL9), the system has been ‘flight-proven’ through successful mission operations. These TRLs provide a useful framework to jump-start conversations about how to assess the readiness of social and behavioural science evidence for application and dissemination.

Fig. 1: NASA technology readiness levels. Original image source: https://www.nasa.gov/directorates/heo/scan/engineering/technology/txt_accordion1.html.

Introducing evidence readiness levels

The desire to “directly inform policy and individual and collective behaviour in response to the pandemic” (p. 461)1 overlooks existing evidence frameworks and the challenges we identify, illustrating the need for a simple taxonomy that can be kept at hand during crises. As a very preliminary step to this end, we propose a social and behavioural science variant of TRLs: evidence readiness levels (ERLs; Fig. 2).

Fig. 2: Proposed social and behavioural sciences evidence readiness levels.

There are several frameworks for assessing evidence quality across different scientific fields. The one that comes closest to what we envision is the Society for Prevention Research’s standards for prevention interventions7, as they incorporate standards for efficacy, dissemination and feedback loops from crisis to theory. However, none of the existing frameworks captures the meta-scientific insights generated in our field in the last decade.

Our ERLs do not map perfectly onto NASA’s TRLs, and we should not expect them to; there are many differences between behavioural and rocket science. In the social and behavioural sciences, we think this process should start with defining the problem(s) in collaboration with the stakeholders most likely to implement the interventions (ERL1). These concepts can then be further developed in consultation with people in the target settings to gather preliminary information about how setting or context might alter processes (ERL2). From there, researchers can conduct systematic reviews and other meta-syntheses to select evidence that could potentially be applied (ERL3). These systematic reviews require a number of bias-detection techniques: it is well known that the behavioural sciences suffer from publication bias and other practices that compromise the integrity of research evidence. Some findings may be reliable, but the onus is on us to identify which are and which are not, and which do or do not generalize. Even so, these systematic reviews must be done with an awareness that the currently available statistical techniques do not completely correct for bias and that the resultant findings are at most at ERL3.
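
As one concrete example of the bias-detection step at ERL3, the sketch below implements Egger’s regression test for small-study effects, a standard check for funnel-plot asymmetry in a meta-analysis. The per-study effect sizes and standard errors are hypothetical placeholders.

```python
# Minimal sketch of Egger's regression test for small-study effects,
# one of several bias-detection techniques a systematic review might use.
# The per-study effect estimates and standard errors are hypothetical.
import numpy as np
from scipy import stats

effects = np.array([0.45, 0.38, 0.52, 0.41, 0.60, 0.12, 0.08])
ses = np.array([0.20, 0.18, 0.25, 0.22, 0.30, 0.05, 0.04])

z = effects / ses        # standardized effects
precision = 1.0 / ses    # inverse standard errors

# OLS regression of standardized effect on precision; an intercept far
# from zero suggests funnel-plot asymmetry (possible publication bias).
X = np.column_stack([np.ones_like(precision), precision])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
resid = z - X @ beta
n, k = X.shape
sigma2 = resid @ resid / (n - k)
cov = sigma2 * np.linalg.inv(X.T @ X)
t_int = beta[0] / np.sqrt(cov[0, 0])
p = 2 * stats.t.sf(abs(t_int), df=n - k)
print(f"Egger intercept = {beta[0]:.2f}, t = {t_int:.2f}, p = {p:.3f}")
```

In practice, a review would pair such a test with other sensitivity analyses (for example, trim-and-fill or selection models), precisely because no single correction is adequate on its own.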

After the systematic review stage, one can gather information about stimulus and measurement validity and equivalence for application in the target setting (ERL4). Next, researchers, in consultation with local experts, should consider the potential benefits and harms associated with applying candidate solutions (ERL5) and generate estimates of effects in a pilot sample (ERL6). With preliminary effects in hand, the team can then begin to test for heterogeneity in low-stakes (ERL7) and higher-stakes (ERL8) samples and settings, which would build the confidence necessary to apply the findings in the real target setting or crisis situation (ERL9).
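
Because the ladder is cumulative (a project cannot meaningfully sit at ERL6 while skipping ERL4), its logic can be summarized in a few lines of code. The sketch below is a hypothetical illustration of that cumulative rule, with level descriptions paraphrasing the text above; it is not an operational assessment tool.

```python
# Hypothetical sketch of the ERL ladder's cumulative logic: a body of
# evidence sits at the highest level for which it, and every level
# below it, satisfies the criteria. Descriptions paraphrase the text.
ERL_CRITERIA = {
    1: "problem defined with the stakeholders who would implement",
    2: "concepts refined in consultation with target settings",
    3: "systematic review / meta-synthesis with bias detection",
    4: "stimulus and measurement validity checked for the setting",
    5: "benefits and harms weighed with local experts",
    6: "effect estimates obtained from a pilot sample",
    7: "heterogeneity tested in low-stakes samples and settings",
    8: "heterogeneity tested in higher-stakes samples and settings",
    9: "applied and iteratively re-evaluated in the target setting",
}

def evidence_readiness_level(levels_met: set) -> int:
    """Highest ERL k such that levels 1..k are all met."""
    k = 0
    while (k + 1) in ERL_CRITERIA and (k + 1) in levels_met:
        k += 1
    return k

# Pilot effects (ERL6) without a validity check (ERL4) still leave a
# project at ERL3: the ladder cannot skip rungs.
print(evidence_readiness_level({1, 2, 3, 5, 6}))  # -> 3
```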

Even at ERL9, evidence evaluation continues; applications of social and behavioural work, particularly in a crisis, should be iterative, so that high-quality evidence is fed back to evaluate the effectiveness of the intervention and to develop critical and flexible improvements. Feedback should be grounded in collaboration between basic and applied researchers, as well as with stakeholders, to ensure that the resulting evidence is relevant and actionable. Failure to continually re-evaluate interventions in light of new data could lead to unnecessary harm in cases where even the best available evidence proves inadequate to predict an intervention’s real-world effects.

A benchmarking system such as the ERLs requires us to think carefully about which of our research findings can be applied credibly, and it guides where research investments should be made. For example, we can better recognize that our goal of gathering reliable insights (ERL3) provides a necessary foundation for further collective efforts that scaffold towards scalable and generalizable interventions (ERL7). Engaging community experts, identifying relevant theories and collecting extensive observations are key to framing challenges and working with interdisciplinary teams to address them (ERL1). Behavioural scientists from different cultures can then discuss how interventions may need to differ across contexts and cultures. The multidisciplinary and multi-stakeholder nature of ERLs requires us to fundamentally rethink how we produce, and communicate confidence in, application-ready findings.

The current crisis provides a chance for social and behavioural scientists to question how we understand and communicate the value of our scientific models in terms of ERLs. It also requires us to communicate those ERLs to policy-makers so that they know whether we are making educated guesses (ERL3 or below) or can be confident about the application of our findings because we have tested and replicated them in representative environments (ERL7). When providing policy advice on the basis of scientific evidence, it is important to understand, and to be able to explain, whether and how recommendations would affect individuals under the range of circumstances most relevant to the crisis in question (ERL7).

Even if findings reach ERL3 after an assessment of the evidence quality of primary studies, we have little way of knowing what positive, or unintended negative, consequences an intervention might have when applied to a new situation. We are concerned to see social and behavioural scientists making confident claims about the utility of scientific findings for solving COVID-19 problems without regard for whether those findings are based on the kind of scientific methods that would move them up the ERL ladder1. The absence of recognized benchmarking systems makes this challenging. While it is tempting to qualify uncertainty instead by using non-committal language about the possible utility of existing findings (for example, ‘may’, ‘could’), this approach is fundamentally flawed because public conversations generally ignore such rhetorical caveats8. Scientists should actively communicate uncertainty, particularly when speaking to crises. Communicating that findings are only at ERL3 or ERL4 would empower policy-makers by giving them a clear understanding of how to weight our advice in terms of their options. Reaching a higher ERL is extremely complicated and will require radical changes in the way we conduct research, not only in response to crises.

How social and behavioural scientists can advance their ERLs

The field of genetics started in a position similar to the one in which many behavioural sciences find themselves now: small, independently collected samples that produced unreliable findings. Attempts to identify candidate genes for many constructs of interest kept stalling at TRL1/ERL4. In one prominent example, 52 patients provided genetic material for an analysis of the relationship between the 5-HTT gene and major depression9, and the reported association spurred enormous interest in the biological mechanisms underlying depression. Unfortunately, as with the current situation in psychology, these early results were contradicted by failed replication studies10.

Technological advances in genotyping unlocked different approaches for geneticists. Instead of working in isolated teams, geneticists pooled resources via consortium studies, thereby accelerating scientific progress and improving the quality of their evidence. Their recent studies (with samples that sometimes exceed 1,000,000) dwarf previous candidate-gene studies in terms of sample size11. To accomplish this, geneticists devoted considerable time to developing research workflows, data harmonization systems and processes that increased the accuracy of their measurements. The new methodologies are not without flaws: for example, there is substantial scope for expanding the representativeness of study cohorts. But the progress that consortium research in genetics has made in a short time is impressive.

In recent years we have observed similar progress in the psychological sciences: a move from single, small-sample studies to large-scale replications12,13 and novel studies14, and the building of the prerequisite infrastructure to facilitate team science. One example is the Psychological Science Accelerator (PSA), a large standing network with experts facilitating study selection, data management, ethics and translation15. While the PSA is making important progress, problems surrounding measurement validity, sample generalizability and organizational diversity (40% of its leadership is from North America) still limit the network’s ability to interpret findings accurately and present material challenges to the applicability of its projects. Therefore, the PSA will require substantial improvement and investment before it can generate practical ERL7-level evidence and further develop our proposed framework.

The COVID-19 crisis underscores the critical need to bring the social and behavioural sciences into line with other mature sciences. Diverse consortia of researchers with expertise in philosophy, ethics, statistics, and data and code management are needed to produce the kind of research required to better understand people the world over. Realizing this mature, inclusive and efficient model necessitates a shift in the knowledge-production and evaluation models that guide the social and behavioural sciences.

Be cautious when applying social and behavioural science to policy

On balance, we hold the view that the social and behavioural sciences have the potential to help us better understand our world. However, we are less sanguine about whether many areas of the social and behavioural sciences are mature enough to provide such understanding, particularly when considering life-and-death issues like a pandemic. We believe that, rather than appealing to policy-makers to recognize our value, we should focus on earning the credibility that legitimates a seat at the policy table. The ERL taxonomy is one sample roadmap for achieving this level of maturity as a science and for communicating our current state of evidence accurately and honestly. Collaborations among large and diverse teams with local knowledge and multidisciplinary expertise can help us move up the evidence ladder. Equally important, studies in the behavioural sciences must be designed to move up this ladder incrementally: an ERL6 study built on a shaky ERL1 foundation will be of little use. Moving up requires investment, thought and, most important of all, epistemic humility. Until we have a systematic and iterative research framework, we believe that behavioural scientists should carefully consider whether well-intentioned advice may do more harm than good.