Integrating explanation and prediction in computational social science

Hofman, Jake M.; Watts, Duncan J.; Athey, Susan; Garip, Filiz; Griffiths, Thomas L.; Kleinberg, Jon; Margetts, Helen; Mullainathan, Sendhil; Salganik, Matthew J.; Vazire, Simine; Vespignani, Alessandro; Yarkoni, Tal

doi:10.1038/s41586-021-03659-0

Perspective
Published: 30 June 2021

Nature volume 595, pages 181–188 (2021)

Abstract

Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end we make two contributions. The first is a schema for thinking about research activities along two dimensions—the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes—and how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining prediction and explanation, which we call integrative modelling, and to outline some practical suggestions for realizing this goal.

Author information

These authors contributed equally: Jake M. Hofman, Duncan J. Watts

Authors and Affiliations

Microsoft Research, New York, NY, USA
Jake M. Hofman
Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA
Duncan J. Watts
The Annenberg School of Communication, University of Pennsylvania, Philadelphia, PA, USA
Duncan J. Watts
Operations, Information, and Decisions Department, University of Pennsylvania, Philadelphia, PA, USA
Duncan J. Watts
Graduate School of Business, Stanford University, Stanford, CA, USA
Susan Athey
Department of Sociology, Princeton University, Princeton, NJ, USA
Filiz Garip & Matthew J. Salganik
Department of Psychology, Princeton University, Princeton, NJ, USA
Thomas L. Griffiths
Department of Computer Science, Princeton University, Princeton, NJ, USA
Thomas L. Griffiths
Department of Computer Science, Cornell University, Ithaca, NY, USA
Jon Kleinberg
Department of Information Science, Cornell University, Ithaca, NY, USA
Jon Kleinberg
Oxford Internet Institute, University of Oxford, Oxford, UK
Helen Margetts
Public Policy Programme, The Alan Turing Institute, London, UK
Helen Margetts
Booth School of Business, University of Chicago, Chicago, IL, USA
Sendhil Mullainathan
Melbourne School of Psychological Sciences, University of Melbourne, Melbourne, Victoria, Australia
Simine Vazire
Laboratory for the Modeling of Biological and Socio-technical Systems, Northeastern University, Boston, MA, USA
Alessandro Vespignani
Department of Psychology, University of Texas at Austin, Austin, TX, USA
Tal Yarkoni

Contributions

J.M.H. and D.J.W. conceptualized and helped to write and prepare the manuscript. They contributed equally to these efforts. All authors were involved in and discussed the structure of the manuscript at various stages of its development.

Corresponding authors

Correspondence to Jake M. Hofman or Duncan J. Watts.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature thanks Noortje Marres, Melanie Mitchell and Scott Page for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hofman, J.M., Watts, D.J., Athey, S. et al. Integrating explanation and prediction in computational social science. Nature 595, 181–188 (2021). https://doi.org/10.1038/s41586-021-03659-0

Received: 23 February 2021
Accepted: 20 May 2021
Published: 30 June 2021
Issue Date: 08 July 2021
DOI: https://doi.org/10.1038/s41586-021-03659-0
