The ongoing COVID-19 pandemic highlights the continuing importance of global infectious disease threats, and the need to develop rigorous scientific theories to understand, quantify, and forecast the risks that pathogens pose to humanity. One of the most important lessons of the pandemic so far is that the central forces shaping local and global variation in disease burden and dynamics have been social, not biological. Although substantial biological questions remain unanswered, the multiple waves of infection that have been driven by shifting control policies and the heterogeneous public response to them1,2, as well as the disproportionate impact of the disease on poor and marginalized communities around the world3,4,5,6, are the defining features of the pandemic’s trajectory on local and global scales.

Epidemiological models that describe the spread of infectious diseases through populations have been developed during the pandemic to understand and predict pathogen transmission and to guide public health policies7,8. As a tool for synthesizing current knowledge, identifying key drivers of transmission, and planning public health policy, such models have a long history in research and public health9,10, and are increasingly used to make decisions about health policy and global funding11. Although uncertainties about biological aspects of pathogen transmission may be problematic for modelling, it is the social context—which is important not only in terms of model structure and parameterization but also with respect to the availability and interpretation of epidemiological data—that often presents the biggest challenges for capturing the essential features of disease dynamics8,12,13.

Human societies are structured by cultural forces that define social relations, particularly between kin, and the spread of infection reflects these social structures—starting with the household or family unit, and extending to the structure of workplaces and public spaces, and the physical layouts of villages, towns, cities, and countries. The purpose of a model, whether purely theoretical or fit to data to inform decision-making in a specific context, will determine how detailed these social aspects of transmission need to be, with the adage that a model should be ‘as simple as possible but no simpler’ likewise taking on different meanings depending on the model’s intended function. Intrinsic to this decision is a question of scale13: capturing population-level dynamics may not require individual-level detail about social interactions, but a model intended to understand local drivers of transmission may.

Data about social relationships that are relevant for modelling pathogen transmission are traditionally collected by censuses and other surveys14,15,16, but the expansion of access to the internet has started to open up possibilities for a more expansive, real-time, and global approach to the collection of survey data17,18,19 and the development of relevant social science theories about human behaviour. Furthermore, new data streams from mobile devices—for example, via social media—are offering vast, relatively unexplored datasets about human mobility on a global scale20,21. Despite the recent marked increase in the availability of these new datasets—a trend that has accelerated during the COVID-19 pandemic22,23,24—challenges remain in using them to parameterize transmission models. In particular, the extent to which data from mobile phones provide an accurate proxy for contact rates that spread disease remains unclear25,26. In fact, it is still difficult to parameterize social aspects of transmission in mechanistic models even in the context of sophisticated approaches to modelling and the addition of powerful new data. Nevertheless, as social scientists embrace and grapple with new data streams, infectious disease modellers have the opportunity to use them in the context of transmission models as “a way of thinking clearly”9 about the social drivers of epidemics.

Here we describe key social parameters that modellers must consider to effectively capture the dynamics of pathogen transmission on different scales. We draw a distinction between models in which appropriately disaggregated data about baseline human social dynamics—grounded in local knowledge—can provide mechanistic insights into disease transmission, and the challenges introduced when the quality, resolution, or paucity of epidemiological and behavioural data may constrain predictive power even if the social aspects of a model are well-specified. We also discuss the importance of understanding and predicting deviations from baseline behaviour that result from infection and public health policies, an issue that may need to be addressed using a fundamentally different kind of model structure. Models are increasingly informing target product profiles, global strategies, and investments in global public health programs for many infectious diseases. Social and behavioural aspects of transmission are often ignored in the name of generalizability and parsimony, with authors adopting the language of physics to justify these simplifications while also claiming to provide public health value to specific populations. Too often the integration of these models in decision-making processes at the national level remains weak. We believe that one of the most important challenges for our field is the development of flexible frameworks that integrate social contexts that are relevant for disease, a challenge that requires closer collaboration between social scientists and infectious disease epidemiologists.

Parameterizing local contact rates

All mechanistic frameworks of infectious disease transmission make assumptions about how frequently people are exposed to disease, for example owing to close physical contact between susceptible and infectious people (the contact rate), and about the probability of infection when exposure occurs. In the well-studied susceptible–infectious–recovered (SIR) model—first developed by Kermack and McKendrick nearly a century ago27—mixing within a single homogeneous population, and therefore infection risk, was assumed to be random. The conceptual separation of the transmission coefficient into social and biological components was not the norm until the 1980s, as it became increasingly apparent that the population dynamics of HIV were driven disproportionately by transmission within particular demographic groups, which reflected highly non-random patterns of sexual contact28,29,30,31,32,33,34. This separation provides conceptual differentiation of the social and cultural forces that drive uneven infection risk within a population from the biology of transmission itself30,35,36, and is an essential feature to include to incorporate the effect of heterogeneous social interactions on transmission (Box 1).
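
The separation of the transmission coefficient into a social and a biological component can be made concrete with a minimal sketch. The code below is an illustration only, not a model from the studies cited here: it integrates the standard random-mixing SIR equations with a simple Euler scheme and writes the transmission coefficient as the product of a contact rate and a per-contact probability of infection. The function name, the parameter values, and the step size are all assumptions chosen for illustration.

```python
def simulate_sir(contact_rate, p_infection, recovery_rate,
                 population, initial_infectious, days, dt=0.1):
    """Euler integration of a random-mixing SIR model.

    Returns the final (S, I, R) values and the peak number infectious.
    """
    # Social component (contacts per day) times biological component
    # (probability of infection per contact).
    beta = contact_rate * p_infection
    s = float(population - initial_infectious)
    i = float(initial_infectious)
    r = 0.0
    peak = i
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i / population * dt
        new_recoveries = recovery_rate * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return s, i, r, peak


# Hypothetical values: same pathogen biology, two different contact rates.
N = 100_000
baseline = simulate_sir(10.0, 0.05, 0.2, N, 10, days=300)
distanced = simulate_sir(3.0, 0.05, 0.2, N, 10, days=300)
print(f"peak infectious, 10 contacts/day: {baseline[3]:.0f}")
print(f"peak infectious,  3 contacts/day: {distanced[3]:.0f}")
```

With these hypothetical values the biology of the pathogen is identical in both runs. Only the contact rate differs, which moves the basic reproduction number (the transmission coefficient divided by the recovery rate) from 2.5 to 0.75 and prevents the outbreak from growing.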

In reality, assumptions of random mixing are always violated, even at the local and within-household scales, and the extent to which models must account for departures from them will depend on the mode of transmission of the pathogen and the purpose of the model. Kin structures are at the heart of all communities (Fig. 1). Comprehensive diary studies have revealed strong, age-structured mixing patterns related to household structures of nuclear families and peer groups shaped by schooling and patterns of employment14,16, but these contact rates can change over time17 and vary substantially around the world37. Recent technological developments have facilitated bluetooth-based and GPS studies of contact patterns37,38,39, providing rich, granular data with increasingly large sample sizes, and a foundation for developing general principles of human interaction that could be used in epidemic models. Passively collected, aggregated mobile phone data have also become increasingly available on a near real-time basis and at scale21. During the COVID-19 pandemic, many modellers have begun to examine whether local mobility metrics and foot-traffic data are useful proxies for contact rates within populations40,41,42,43,44,45,46. This can be effective when changes in mobility that occur on a scale that is measurable using mobile phone data are strongly correlated with contact rates. For example, at the beginning of the COVID-19 pandemic, Badr et al.46 used aggregated mobility metrics from mobile phones to show that marked reductions in mobility occurred throughout the USA in mid-March of 2020, regardless of local social distancing policies, and that this was strongly associated with a drop in COVID-19 growth rates across the country. In this case, aggregated mobility data provided a meaningful proxy for the contact rates that drove changes in transmission on a county level. 
In general, however, if contact rates are decoupled from mobility patterns measured in this way, which the authors suggest occurred in the USA after April 2020 (ref. 25), an understanding of local transmission patterns still requires local data collection and/or contextual knowledge.
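
Age-structured mixing of the kind measured in diary studies is usually represented as a contact matrix. As a rough sketch, and assuming a hypothetical three-group matrix rather than any measured values, the infection hazard for group i can be written as the per-contact probability of infection multiplied by the sum, over groups j, of the daily contacts that a person in group i has with group j, weighted by the prevalence in group j.

```python
def force_of_infection(contact_matrix, p_infection, infectious, group_sizes):
    """Per-capita daily infection hazard for each group.

    contact_matrix[i][j] is the mean number of daily contacts that one
    person in group i has with people in group j.
    """
    prevalence = [inf / size for inf, size in zip(infectious, group_sizes)]
    return [
        p_infection * sum(c * prev for c, prev in zip(row, prevalence))
        for row in contact_matrix
    ]


# Hypothetical groups: children, working-age adults, older adults.
# Values are chosen so that total contacts between any two groups agree
# in both directions (contacts[i][j] * size[i] == contacts[j][i] * size[j]).
contacts = [
    [8.0, 6.0, 0.5],
    [2.0, 6.0, 1.0],
    [0.5, 3.0, 1.5],
]
group_sizes = [20_000, 60_000, 20_000]
infectious = [200, 60, 20]  # infection concentrated in children

hazards = force_of_infection(contacts, 0.05, infectious, group_sizes)
for name, hazard in zip(["children", "adults", "older adults"], hazards):
    print(f"{name}: {hazard:.6f} per day")
```

Setting every entry of the matrix to the same value recovers the random-mixing assumption, in which all groups face the same hazard regardless of who is infected.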

Fig. 1: The nature of the kin structures in a community strongly influences who mixes with whom.

Kin structures can affect patterns of interaction at all geographic scales. For example, at the local level, in some cultures extended families live and interact within the same house or complex. Kin relationships also strongly affect normal patterns of visiting behaviour, as much movement in all human cultures involves visiting kin. In places without running water, it is common for laundry to be done in streams or rivers, and this is often an important social activity, especially for women and children. In some cultures, women and men have different roles, including differential participation in labour migration and agricultural work, and women may be confined to family compounds during the day. At the regional level, families regularly congregate for larger-scale social events such as weddings and holidays, with longer-distance travel connecting communities. When disease does occur, it is often measured by routine surveillance systems that only identify cases when treatment is sought, and when they are correctly diagnosed and reported.

Careful examination of the interactions between people inside and outside their households that lead to heterogeneous infection risk can produce epidemiological models that yield powerful and generalizable insights, despite their specificity. For example, for the Aedes mosquito-borne virus that causes dengue fever, household variation in disease incidence has often been assumed to almost exclusively reflect the spatial distribution of mosquito vectors. However, by closely monitoring people’s movements in relation to dengue clusters in Iquitos, Peru, Stoddard et al.47,48 showed that patterns of inter-household mobility associated with visiting friends and family were also a major driver of dengue transmission and needed to be considered in addition to spatial variation in mosquito densities. Similarly, by combining detailed survey data and precise location information about an outbreak of another Aedes mosquito-borne disease, Chikungunya, in Bangladesh, Salje et al.49 reconstructed transmission chains to show that this disease was highly localized to socially connected households within particular communities, and that the 1.5 times higher risk of infection among women coincided with the 1.5 times higher likelihood that they stayed in the home during the day. The power of these studies reflects the combination of social science data and rich epidemiological information, coupled with sophisticated analytics. It is not that models without these details would be wrong per se, but rather that the addition of social science data provides important mechanistic insights into how transmission works on this local scale; behaviours and patterns of household visiting will define the course of any particular outbreak and must be understood when generating context-specific policy.

These concerns in turn emphasize the continuing importance of on-the-ground data collection from surveys, the value of gender-disaggregated data (which the WHO only made standard practice for global health statistics in 2019; ref. 50), and the role of local knowledge in model and study design. Local knowledge is also key for interpreting epidemiological data used to fit and validate models (Box 2). Despite this, local social phenomena are often left out of disease models12, sometimes because the models are developed in a different context, for use at a different scale, or for academic purposes by researchers who are unaware of local realities, and sometimes because the social science data are time-consuming to collect or unavailable. The rich new data sources discussed above raise the question of how much detail should be included in order to understand the mechanisms that drive disease or to capture population-level dynamics at different scales. In all models, a trade-off exists between parsimony and realism that hinges on the scale and purpose of the model: while it is certainly true that “it makes no sense to convey a beguiling sense of ‘reality’ with irrelevant detail, when other equally important factors can only be guessed at,”9 it is also the case that a failure to capture the critical deviations from assumptions of random mixing may lead to weak predictions, misspecified estimates of transmission51, and poor policy decisions. Therefore, matching the model and data structures to the scale of the research or policy question becomes the most important challenge for capturing the social drivers of epidemiological dynamics within a population.

Regional mobility and between-population transmission

Travel outside the community also plays a key role in spreading diseases (Fig. 1), and spatial models of infectious diseases often incorporate travel as a migration rate between populations. Traditionally, simple, theoretically derived gravity and radiation models—both based on the reasonable idea that large populations attract travellers, but not requiring specific data about mobility—have often been used as fixed parameters to describe mobility dynamics in these metapopulation models52,53,54. The increasing availability of mobile-phone-derived data on regional mobility is permitting the validation of these frameworks in real-world settings. The results of such studies suggest that gravity models systematically underestimate the volume of long-distance travel in our highly connected world, and may do poorly in rural areas55,56,57. They also highlight the importance of seasonal patterns of connectivity or asymmetric population shifts, such as holiday travel or displacement due to conflict or natural disasters. A recent comparison of aggregated mobile phone data from three countries showed that seasonal patterns of travel are a general feature of modern societies58. For example, in Kenya, this seasonal flux in population density, coinciding with school term times, was shown to be a stronger predictor of the regional patterns of the childhood infection rubella than rainfall and other explanations, explaining Kenya’s unusual three-peak pattern of rubella incidence59. Using mobile phone data to measure travel patterns in Bangladesh, Mahmud et al.60 observed large travel surges occurring out of the capital city of Dhaka to all parts of Bangladesh during the Eid festivals. In 2017, this holiday coincided with a large Chikungunya outbreak in the city, which spread throughout Bangladesh after the holiday, just as outbreaks of respiratory viruses in the global north at the end of December are often associated with holiday travel61,62.
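
For concreteness, a gravity model in its common textbook form predicts the flow between two populations as proportional to the product of their sizes, each raised to a power, divided by a power of the distance between them. The sketch below uses that general form with hypothetical populations, distances, and exponents; it is not the parameterization used in any of the studies cited above.

```python
def gravity_flows(populations, distances, scale=1.0,
                  origin_exp=1.0, dest_exp=1.0, distance_exp=2.0):
    """Predicted travel volume between every pair of populations.

    flows[i][j] = scale * N_i**origin_exp * N_j**dest_exp / d_ij**distance_exp
    """
    n = len(populations)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # no flow from a population to itself
                flows[i][j] = (
                    scale
                    * populations[i] ** origin_exp
                    * populations[j] ** dest_exp
                    / distances[i][j] ** distance_exp
                )
    return flows


# Hypothetical capital city, town, and village; distances in km.
populations = [1_000_000, 50_000, 5_000]
distances = [
    [0.0, 100.0, 300.0],
    [100.0, 0.0, 250.0],
    [300.0, 250.0, 0.0],
]
flows = gravity_flows(populations, distances)
print(f"city -> town:    {flows[0][1]:.0f}")
print(f"city -> village: {flows[0][2]:.0f}")
```

Because the exponents on origin and destination are equal here and the distances are symmetric, the predicted flows are symmetric and constant in time. This is one reason a fixed gravity parameterization cannot by itself reproduce holiday surges or other asymmetric, seasonal shifts of the kind described above.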

Mobile phone data therefore provide, for the first time, valuable insights into the relative travel routes of millions of people between different places, as well as into asymmetric movement patterns and large shifts in population density. There are limits to the insights mobile phone data streams can afford, however, primarily related to a gap in social insight: they are spatially coarse relative to the contact patterns that spread disease, as previously discussed; they have implicit biases (for example, they do not include children and other people without phones); and they usually do not tell us anything about who is travelling and why. It will be important to measure and quantify bias and representativeness in these datasets26,63,64 as they are used more routinely, as well as to engage in meaningful efforts to standardize approaches to both analysis and privacy for this relatively new public health application26.

Analogous to the problems with random mixing assumptions in single-population models, the contact rate between populations often reflects travel by particular subsets of the population; in other words, the probability of travelling is not randomly distributed, but the mobility rates in most models assume that it is, and usually mobile phone data are not disaggregated demographically, in order to preserve the privacy of subscribers. When mobility data are available that are disaggregated, by gender for example, striking differences in mobility may emerge. A study of an urban setting in Latin America41 illustrates this heterogeneity well—women are both more localized in their movements and visit fewer locations than men, which may be important for infection dynamics in a given setting, depending on the pathogen. Surveys have shown that women with children also travel less to urban centres across sub-Saharan Africa compared to other demographics56. In a rural setting in Bangladesh, a survey of patients with malaria showed similar trends, with men travelling much greater distances than women65. Given that global gender roles often follow this pattern, it is likely that these findings are general and relevant for building robust disease models, especially when combined with gender-disaggregated health data. Contextual understanding coupled with disaggregated data, often generated using traditional social science approaches, is therefore not something we can do away with in the era of big data. Rather, these new data streams are uncovering dynamics that will be much more powerful when complemented by social science data and analysis.

Some of these gender differences in regional mobility are related to occupational activities, which are another important driver of contact rates between populations. Labour migration has long been studied by social scientists, but is challenging to incorporate into epidemic models. Labour migration in particular demographic groups, for example linked to forest and plantation work, agriculture and livestock, or gold mining66,67, has been known for decades to drive regional patterns of malaria transmission68. In the Sahel region of Africa, where the malaria burden is intense and highly seasonal, pastoral livestock farming is a key economic activity, and pastoralist communities are highly mobile as they search for pasture and water for their livestock, often in areas that expose them to malaria, and they exhibit seasonal migration within and between countries (Fig. 2). Rapid environmental changes and competition for land have adversely affected pastoralist production systems, which has resulted in conflicts, volatility in mobility patterns, and various other adaptive behaviours that are hard to generalize, but are fundamental to malaria transmission and control in the region.

Fig. 2: Transhumant pastoralists cross through these climatic zones during the course of the year.

Transhumance is a type of nomadism in which the seasonal movement of people is driven by the need for livestock pasture. In the rainy season, pastoralists in the Sahel region spread into rich, but short-lived, pastures, while they move further south with the onset of the dry season. After spending the height of the dry season in the more humid south, they move back north before the beginning of the agricultural activities of the rainy season94. These seasonal movements involve both cross-border (red arrows) and national (blue arrows) travel patterns that shape the lifestyles of these populations. The influence of seasonal movements of particular subsets of a population on these different national and international scales is challenging to measure and capture within transmission models of disease. Adapted with permission from ref. 95.

Mathematical frameworks of malaria, which were among the first epidemiological models to be developed for any infectious disease69, struggle to accommodate this kind of mobility. In fact, projections for future scenarios of malaria transmission under various interventions in sub-Saharan Africa that are based in part on mechanistic frameworks may not include any mobility parameters70. Without an understanding of these social contexts, models may assume that the prevalence of infection reflects local transmission characteristics, rather than imported infections. However, surveillance data from around the world suggest that imported infections actually represent the majority of cases in some settings. In a study in Nairobi, for example, two-thirds of patients with malaria tested in a facility in an informal settlement had a history of travel and nearly 80% of those who had travelled had visited counties with high malaria transmission71. In settings with frequent importation, therefore, policy targets and funding should focus on managing infections in travellers, not local mosquito control, and models of malaria transmission that fail to account for mobility will fail to capture the key socioeconomic mechanisms that drive the disease. Although these human aspects of malaria transmission continue to be emphasized as major impediments to elimination72, both generalizable and specific mobility frameworks are lacking, and mobility is often ignored in the malaria transmission models that are used to guide elimination scenario planning, leading to the mistaken general assumption that low-incidence regions are straightforward elimination targets73.

However, new data streams (not only from mobile phones but also from surveys and malaria parasite genetic data, which yield insights into the relatedness of different parasite populations) are allowing more sophisticated modelling approaches to identifying the ‘sources’ and ‘sinks’ of malaria infections. For example, Chang et al.74 combined mobile phone data with parasite genetic data and surveys to model the spread of malaria in rural Bangladesh. Modelling the expected flow of parasites using these different inputs as mobility parameters showed that there was broad agreement between models; parasites moved east to west as people travelled between the forests and more populous regions, with the survey confirming the importance of labour migration to the forest. In many ways, the relatively sophisticated modelling approach confirmed what the National Malaria Control Program already knew (that people get malaria in the forest), but it provided useful evidence for this local knowledge, as well as estimates about the volumes of importation and specific routes and hotspots on which to focus interventions. Efforts to combine and validate new data streams, as well as more theoretical models such as gravity and radiation models57, with surveys and other social science tools will be an important next step in the development of general mobility frameworks that describe labour migration around the world.

Epidemics are not like the weather

So far we have focused on behaviours that can be included as model parameters, such as contact rates between groups and baseline travel behaviour. In these cases, more, better quality, or different data about contact rates can improve model accuracy. However, people's behaviour can also change in response to their awareness of, and information about, a disease. This can create feedback between the real (prevalence-based) or perceived (belief-based) risk of an infection, assessed by individuals in a population based on available information, and the behaviours that in turn drive its transmission, such as contact rate or the use of preventive interventions75. Here, human behaviour must be built into epidemiological models, with social parameters mechanistically linked to changes in the disease itself or beliefs about the disease76.

For endemic diseases for which prevention requires active participation by affected communities, such as treatment seeking and the use of preventive measures, understanding human behaviour in the context of risk perception and avoidance is essential for the development of dynamic frameworks to predict the impact of public health policies. Treatment seeking and adherence to drug regimens in the context of tuberculosis77,78, the use of condoms in the prevention of HIV transmission79,80, and sleeping under insecticide treated nets to prevent malaria81 all represent examples of complex human behaviours—particularly adherence to treatment and other interventions—that are challenging to integrate into model frameworks. In fact, these three major infectious disease threats are arguably among the most challenging for which to create robust theoretical frameworks in the context of interventions, for this reason82.

Epidemics, and the public alarm they can generate, create particularly strong feedback between behaviour and disease dynamics. For example, Epstein et al.83 modelled two interacting contagion processes that describe the spread of infectious disease and the spread of fear about the epidemic, which leads individuals to effectively remove themselves from the population. These social–epidemiological feedback loops lead to more complex disease dynamics than expected under a model with fixed behaviours; fear of the disease drives behavioural changes in contact rate as people take steps to isolate themselves, leading to flattened epidemic peaks and multiple waves of infection as the perceived and/or real risk of disease fluctuates. This is exactly what we have observed during the COVID-19 pandemic, with the social response to specific interventions such as social distancing driving the variable course of SARS-CoV-2 incidence around the world2. The publicly available data from online information sharing on platforms such as Twitter, where information and misinformation spread in parallel with the epidemic itself, will provide a rich source of information for investigating how different societies have reacted to the pandemic84. These social media data come with the same caveats discussed above in the context of mobile phone data, and the representativeness of new data streams in different contexts should be reported by data providers, and adequately measured—through social science studies, among others—and accounted for in modelling research and application.
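
One simple way to see why such feedback flattens epidemic peaks is to let the contact rate fall as prevalence rises. The sketch below is a deliberately reduced illustration under stated assumptions, not the coupled-contagion model of Epstein et al.: fear is not modelled as a separate contagion, and the contact rate is simply divided by one plus a hypothetical sensitivity parameter times current prevalence. Reproducing multiple waves would require additional structure, such as delayed or decaying risk perception, that is omitted here.

```python
def sir_with_feedback(baseline_contacts, p_infection, recovery_rate,
                      population, initial_infectious, days,
                      sensitivity=0.0, dt=0.1):
    """Euler-integrated SIR model in which contacts fall as prevalence rises.

    sensitivity = 0 recovers fixed behaviour. Returns the list of
    infectious counts at each time step.
    """
    s = float(population - initial_infectious)
    i = float(initial_infectious)
    trajectory = [i]
    for _ in range(int(round(days / dt))):
        prevalence = i / population
        # Behavioural response (assumed form): more infection, fewer contacts.
        contacts = baseline_contacts / (1.0 + sensitivity * prevalence)
        new_infections = contacts * p_infection * s * i / population * dt
        new_recoveries = recovery_rate * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        trajectory.append(i)
    return trajectory


# Hypothetical values; only the behavioural sensitivity differs between runs.
N = 100_000
fixed = sir_with_feedback(10.0, 0.05, 0.2, N, 10, days=300)
responsive = sir_with_feedback(10.0, 0.05, 0.2, N, 10, days=300,
                               sensitivity=200.0)
print(f"peak with fixed behaviour:      {max(fixed):.0f}")
print(f"peak with responsive behaviour: {max(responsive):.0f}")
```

In this sketch the social parameter (contacts) is a function of the epidemic state rather than a fixed input, which is the structural difference from models in which behaviour is set in advance.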

In this context, the COVID-19 pandemic has created an incredible natural experiment on a global scale, with similar policies being enacted around the world in diverse social contexts. Similarities and differences in the trajectories of local epidemics of the same virus reflect the variable populations involved. Some social dynamics related to the contact rate parameters that we have discussed above do appear to be generalizable over time and in different contexts, and have been repeated around the world during the COVID-19 pandemic. In early 2020, Kissler et al.85 measured SARS-CoV-2 prevalence in women who gave birth at different New York City hospitals and found that there were marked differences across the city, ranging from about 10% in Manhattan to as high as 50% in the Bronx. Analysis of mobility data from Facebook users over the same time period showed that these local variations in incidence were strongly associated with continuing commuting behaviour in neighbourhoods with lower socioeconomic status, consistent with the inability of essential workers to lock down. In fact, this inability of lower-income people to reduce mobility as much as those in wealthier neighbourhoods has been associated with differential disease burden and mortality in cities around the world5,86.

Another characteristic response to policies during the pandemic has been an emptying out of urban centres. Throughout history, people have fled urban centres when an epidemic hits, whether due to an outbreak of cholera in historical London, or due to a perceived but non-existent outbreak of bubonic plague that caused mass panic and the displacement of hundreds of thousands of people from Surat, India in 1994 (ref. 76). These shifts in population density and increases in long-distance travel have important and general implications for understanding infectious diseases and designing public health policies. During the COVID-19 pandemic, mobile phone data from around the world have uncovered similar behavioural responses to lockdown policies and travel restrictions, with comparable dynamics occurring in urban centres in the USA, France, Spain, India, and Bangladesh87. It is likely that the social and demographic factors that drive this urban–rural migration vary greatly in these different settings, with the exodus from Manhattan perhaps representing wealthy people going to country homes and the movement patterns of people in Bangladesh corresponding to the movement of workers in response to the closing and re-opening of garment factories. These differences emphasize the power of coupling large-scale datasets with local context, and highlight the importance of further setting-specific research to untangle the general versus local drivers of these behaviours.

There is currently strong interest in the development of national epidemic modelling and forecasting centres in the USA and elsewhere, with parallels being drawn to the evolution of weather forecasting services. The cases described above, however, call into question the extent to which disease forecasting efforts can be compared to weather forecasts: are shifting behaviours based on local and global information about epidemics (and related policies) ever going to be predictable in the way that physical laws are? If a weatherman forecasts rain and everyone stays at home, it still rains. Not so with infectious diseases. We argue that short-term forecasting efforts that use ensemble or consensus approaches88,89,90 are promising, and that for models with the goal of making predictions rather than understanding mechanisms, simple approaches are often adequate, and may even be preferable because they are more tractable. This is highlighted by disease forecasting efforts for COVID-19 (ref. 88), Ebola (ref. 89), and influenza (ref. 91), in which simple models can produce predictions as powerful as those of more complex models over a short timescale. However, a deeper understanding of social and behavioural aspects of risk perception and decision-making is likely to be needed to make medium-term predictions and mechanistic statements about possible future trajectories of epidemics, or to develop models designed to inform interventions in different settings.

Key to these latter models will be more research to understand how people’s behaviours will change in response to particular policies; in this case, experimental or quasi-experimental evidence may be required to improve the predictive power of models that include social–epidemiological feedback mechanisms92. We argue, therefore, that rather than creating one large model to forecast disease outbreaks, like a weather forecast, developing distributed research capacity that can respond to specific outbreaks in particular contexts by designing models flexibly and with feedback from local health systems is perhaps a wiser investment.

Thinking clearly about modelling epidemics

If mathematical models are “no more and no less than a way of thinking clearly,”9 then it is essential for modellers to think clearly about how modelling decisions about social aspects of transmission reflect the model’s function. In particular, what social phenomena contribute to mechanistic aspects of transmission that are important for population-level dynamics, are these measurable, and, if not, how do they constrain the model’s utility in different contexts? What principles should guide decisions about the scale of a model (for example, individual versus population) given the questions that are being addressed, and what kinds of data are needed to parameterize and validate the resulting model?

For example, there are important distinctions between epidemiological models developed with the scientific goal of understanding disease ecology and epidemiological models developed to inform public health policies. Models with primarily academic or theoretical goals tend to err on the side of abstraction, whereas policy-relevant models may have a more ‘realistic’ depiction of contact rates. Indeed, there is sometimes an implicit assumption in the infectious disease modelling field that integrating setting-specific social interactions in a disease model will inevitably detract from its generalizability, limiting its relevance beyond a particular case study. We argue against this dogma and propose that for models intended to understand mechanisms driving outbreaks, particularly on local scales, “data is the plural of anecdote”93. It is often by understanding specific social contexts and integrating insights from multiple applications to different contexts that general principles can be drawn. This philosophy requires a distributed generation of knowledge that is unwieldy to integrate into a unified theory, but ultimately can lead to general principles about what can, and what cannot, be ignored about local social contexts for models with different purposes.

In the public health context, substantial investment in modelling capacity is needed at the local and regional levels (not just in the context of dynamical modelling, but also for general statistical and quantitative capacity) to translate the sophisticated and data-rich approaches now available to us into better decision-making. This would ensure that models are being used appropriately in the context of policy. For example, many of the decisions we have discussed, such as whether a simple or a complex model is needed or whether new or different data streams would be helpful, require local literacy in quantitative methods that may be lacking. Partnerships between academic centres or research institutes and public health agencies and governments, as well as better training infrastructure, are therefore needed on an ongoing basis and in the context of endemic pathogens, so that modelling tools can be developed rapidly when a crisis such as COVID-19 occurs.