EDITORIAL
01 July 2021

The powers and perils of using digital data to understand human behaviour

Computational social science is a powerful research tool. But it needs its different disciplines to find a common language.


Seated passengers on the subway using their mobile phones. Computational social scientists have been using data from mobile phones to study the coronavirus pandemic. Credit: Paul Seheult/Eye Ubiquitous/Universal Images Group/Getty

What are the causes of vaccine hesitancy? How can people be encouraged to exercise more? What can governments do to improve the well-being of citizens?

Social scientists researching these questions observe how people behave, record data on those behaviours and then augment this knowledge by interviewing and/or polling those whom they are studying. Carrying out research in this way is a time-consuming and manual process. Moreover, it is difficult to obtain large amounts of data simultaneously.

But now, researchers have access to an unprecedented amount of social data, generated every second by continuous interactions on digital devices or platforms. These include data that trace people’s movements, purchases and online social interactions — which are all proving extraordinarily powerful for research. As a result, work weaving large data analysis with social questions, known as computational social science, has witnessed huge growth in recent years.


During the course of the coronavirus pandemic alone, researchers have been able to access millions of mobile-phone records to study how people’s movement changed during the pandemic and the impact of those changes on how SARS-CoV-2 spread. They have been able to access anonymized credit-card purchase histories to study how people are spending money during the pandemic — information which is then used to understand how COVID-19 is affecting various sectors of the economy.

Using computers to analyse large data sets dates back to the earliest mainframe computers — and has been central to the work of actuaries and national statistics offices, both of which have long been important resources for studies of society and people. But the wealth of real-time and individual-level information is now unparalleled in its power to track trends, make predictions and inform decisions. And its availability puts it in reach of practically every social-science discipline: researchers in fields from psychology to economics and political science can now rely on data to enhance investigations of key societal questions.

Power and responsibility

At the same time, researchers need to remember that gathering and sharing such personal data — practices that are currently largely unregulated — pose many challenges to society. These include risks from increased surveillance, and the danger that people could be reidentified from otherwise anonymized data.
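The reidentification risk can be made concrete with a small sketch. The Python below counts how many records share each combination of indirect identifiers in a data set that holds no names. The records, the field names (postcode, birth_year, gender) and both helper functions are hypothetical and chosen only for illustration. A record whose combination is unique could, in principle, be matched to a person by anyone who holds a second data set containing the same fields.

```python
from collections import Counter

# Hypothetical records with names removed; field names are illustrative only.
records = [
    {"postcode": "02115", "birth_year": 1980, "gender": "F"},
    {"postcode": "02115", "birth_year": 1980, "gender": "F"},
    {"postcode": "02115", "birth_year": 1975, "gender": "M"},
    {"postcode": "02139", "birth_year": 1990, "gender": "F"},
    {"postcode": "02139", "birth_year": 1990, "gender": "F"},
]


def group_sizes(rows, fields):
    """Count how many rows share each combination of the given fields."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


def smallest_group(rows, fields):
    """Size of the rarest combination; 1 means at least one row is unique."""
    sizes = group_sizes(rows, fields)
    return min(sizes.values()) if sizes else 0


def rows_at_risk(rows, fields):
    """Rows whose combination of fields is shared by no other row."""
    sizes = group_sizes(rows, fields)
    return [row for row in rows if sizes[tuple(row[f] for f in fields)] == 1]


fields = ["postcode", "birth_year", "gender"]
k = smallest_group(records, fields)
at_risk = rows_at_risk(records, fields)
print(f"smallest group size: {k}; rows at risk: {len(at_risk)}")
```

The smallest group size is the quantity that k-anonymity approaches try to keep above a chosen threshold. A check of this kind is only a first screen, because it says nothing about what other data sets an attacker might hold.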


There are also concerns that people whose data are being used have not fully consented to this — and wider worries about the economic monopoly of tech corporations that own the majority of the data. These digital traces tend to be left disproportionately by relatively wealthy people in developed countries, biasing attempts to draw global conclusions. Acknowledging and working with these issues is key to ethical computational social science that promotes real societal progress.

The need to blend expertise in the social sciences with the skills required to collect, clean and analyse large data sets means that computational social science requires teams of researchers who can field a remarkably diverse set of expertise and skills. But with collaborations across disciplines come other challenges.

This week, Nature is publishing a special collection of articles with the objective of bridging the research disciplines and perspectives on doing science that underpin computational social science. We’re highlighting ways in which communities of social, natural and computational scientists can learn to better work together, to complement each other and overcome shared challenges.

Stronger bridges

To begin with, the varied disciplines need to overcome language barriers in which the same terms have different meanings. For example, in many of the social sciences (such as psychology and sociology), ‘prediction’ often refers to a correlation; in the physical sciences (such as physics, computer science and engineering), it usually means a forecast. True transdisciplinary research requires scientists first to learn each other’s languages, and then to develop a shared understanding of terms.
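The two senses of 'prediction' can be shown side by side in a short sketch. The Python below uses hypothetical toy numbers (weekly exercise hours and a well-being score) and helper names that are assumptions made for illustration. The first function measures 'prediction' as an association within the data already collected; the second fits a line to the early observations and reports how far its forecasts fall from the later observations it has not seen.

```python
from math import sqrt

# Hypothetical toy data: weekly exercise hours and a well-being score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
score = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8]


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """'Prediction' as association: correlation within the observed data."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


def forecast_error(x, y, n_train):
    """'Prediction' as forecast: mean absolute error on held-out points."""
    slope, intercept = fit_line(x[:n_train], y[:n_train])
    errors = [abs(slope * a + intercept - b) for a, b in zip(x[n_train:], y[n_train:])]
    return mean(errors)


r = pearson_r(hours, score)
mae = forecast_error(hours, score, n_train=6)
print(f"in-sample correlation: {r:.3f}")
print(f"held-out forecast error: {mae:.3f}")
```

In this toy case the two numbers tell a similar story, but they answer different questions: a strong correlation describes the data in hand, whereas a forecast error describes how well a model performs on observations that were not used to build it.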

But the divide can run deeper than language, into how to curate, analyse and interpret data to explain a phenomenon. Jake Hofman at Microsoft Research in New York City and colleagues argue that computational social science could most effectively answer research questions by combining complementary approaches. For example, researchers building a numerical forecast on, say, the causes of traffic jams would combine data on traffic flows with insights from drivers on their reasons for taking particular routes.


The results of any study are determined by not only the analytical strategies used, but also the quality of the data — and this becomes particularly delicate when dealing with social data. The vast amounts of available data that make computational social science possible — such as tweets or location data from phones — are usually not gathered for research purposes and so can easily be misinterpreted.

That is why, as David Lazer at Northeastern University in Boston, Massachusetts, and colleagues write, researchers who work with large data sets must resist drawing conclusions from just the trends or patterns seen in the numbers, and should account for factors that could affect a result. To extract real meaning from data, researchers need to ensure that they carefully define the objects of their measurement according to theory, validate them and interpret them appropriately.

The widespread influence of algorithms is another source of potential error, as Claudia Wagner at the Leibniz Institute for the Social Sciences in Mannheim, Germany, and colleagues explain. They note that the algorithms that pervade our societies influence individual and group behaviour in many ways — meaning that any observations describe not just human behaviour, but also the effects of algorithms on how people behave. They argue that the theories that inform social science need to be updated to acknowledge these influences; without these theories and a clear understanding of the impact of algorithms on the available data, researchers will not be able to draw meaningful conclusions.

Yet another complicating factor for computational social science is that large data sets are often the private property of commercial enterprises. Academic scientists need to liaise with corporations to obtain access, and this might introduce even more bias. This is partly because, for companies, data are valuable — and therefore sharing data is a risk to their bottom line. That is among the reasons why firms tend to restrict what they share, as Jathan Sadowski at Monash University in Melbourne, Australia, and colleagues highlight. But in light of the potential of these data to provide societal benefits, companies — together with academic researchers and public bodies — need to collectively engage with these questions and set standards for quality, access and data ownership.

Ways forward

There are ways to obtain data that can be useful and reliable, as Mirta Galesic at the Santa Fe Institute in New Mexico and colleagues describe in an article on ‘human social sensing’. This is the study of how individuals gather information on others in their social networks. For instance, researchers could predict a swing in political opinions by interviewing people and asking them what their friends are talking about. Gathering data about people from other people can help to avoid some of the biases seen in self-reported data, and has the added benefit of generating anonymous data: the researchers never need to know any personal or sensitive details about the people whom they are receiving information about.


Another area ripe for growth lies in the intersection of infectious-disease modelling and behavioural science. As Caroline Buckee of the Harvard T. H. Chan School of Public Health in Boston and colleagues argue, an accurate model of contagion and infection requires researchers to understand the cultures and behaviours of people who have been — or might be — infected. It is hard to predict a disease’s path without considering these and other social aspects of transmission. Structured and widespread collaborations cutting across disciplines are key to achieving this.

The pandemic has shown how lives can be saved when large-scale data sets are harnessed for science. This potential is only starting to be realized as researchers with backgrounds in computer science or applied mathematics join with social scientists. These relationships must deepen and encompass researchers in more fields — such as ethics, responsible research and science and technology studies — to ensure that we avoid known pitfalls and that we use these data in a way that maximizes gained knowledge and minimizes potential harm.

Transdisciplinary co-working is rarely easy, but it is essential for both better decisions and robust outcomes. Nature is committed to fostering this conversation, helping scientists to learn each other’s languages so that researchers can together make more progress on some of society’s most pressing problems.

Nature 595, 149-150 (2021)

doi: https://doi.org/10.1038/d41586-021-01736-y
