Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

# Crowding and the shape of COVID-19 epidemics

## Abstract

The coronavirus disease 2019 (COVID-19) pandemic is straining public health systems worldwide, and major non-pharmaceutical interventions have been implemented to slow its spread1,2,3,4. During the initial phase of the outbreak, dissemination of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was primarily determined by human mobility from Wuhan, China5,6. Yet empirical evidence on the effect of key geographic factors on local epidemic transmission is lacking7. In this study, we analyzed highly resolved spatial variables in cities, together with case count data, to investigate the role of climate, urbanization and variation in interventions. We show that the degree to which cases of COVID-19 are compressed into a short period of time (peakedness of the epidemic) is strongly shaped by population aggregation and heterogeneity, such that epidemics in crowded cities are more spread over time, and crowded cities have larger total attack rates than less populated cities. Observed differences in the peakedness of epidemics are consistent with a meta-population model of COVID-19 that explicitly accounts for spatial hierarchies. We paired our estimates with globally comprehensive data on human mobility and predict that crowded cities worldwide could experience more prolonged epidemics.

## Main

Predicting the epidemiology of the COVID-19 pandemic is a priority for guiding epidemic responses around the world. China has undergone its first epidemic wave, and, remarkably, cities across the country are now reporting few or no locally acquired cases8. Analyses have indicated that the spread of COVID-19 from Hubei to the rest of China was driven primarily by human mobility from Wuhan6,9, and that the stringent measures to restrict human movement and public gatherings within and among cities in China were associated with bringing local epidemics under control5. Key uncertainties remain as to which geographic factors drive the local transmission dynamics of COVID-19, and initial analysis suggests a limited role of climate in determining epidemic growth10.

Spatial heterogeneity in infectious disease transmission can be influenced by local differences in population or human movements, such that high local population densities might catalyze the spread of new pathogens due to higher contact rates with susceptible individuals11,12. For respiratory pathogens, the temporal clustering of cases in an epidemic (that is, the shortest period during which most cases are observed) varies with increased indoor crowding and socio-economic and climatic factors13,14,15,16,17,18. The temporal concentration of cases is minimized when incidence is spread evenly across time and increases as incidence becomes more concentrated in particular days, as has been observed for influenza13. In any given location, a higher temporal concentration of cases might require a larger surge capacity in the public health system19, especially for an emerging respiratory pathogen such as COVID-19 (ref. 20).

## Results

### Spatial population structure predicts the shape of epidemics of COVID-19

China and Italy provide detailed epidemiological time series for COVID-19 (refs. 2,21,22) across a wide range of geographic contexts; hence, the outbreaks in these countries provide an opportunity to evaluate the role of local factors in shaping epidemic behavior. We used daily epidemiological data from Chinese cities23,24 and Italian provinces, climate and population data and the response to local interventions as measured by human mobility data from Baidu25 and the COVID-19 Aggregated Mobility Research Dataset (https://www.google.com/covid19/mobility/) to identify drivers of transmission, with a focus on how the temporal clustering of cases differs between prefectures in China and provinces in Italy. A summary of the main findings, limitations and policy implications of our study is shown in Table 1.

We used daily incidence data of confirmed COVID-19 cases aggregated at the prefectural level (n = 293) in China (Fig. 1a) and at the province level (n = 108) in Italy. Prefectures and provinces are administrative units that typically have one urban center (Fig. 1b). We aggregated daily individual-level data collected from official government reports22. Epidemiological data in each prefecture were truncated to exclude dates before the first and after the last day of reported cases during the first epidemic. Cases reported after March 1, 2020, that were imported from outside China were excluded from the analysis. All epidemiological data from Hubei Province were excluded because of the lack of prefecture-level epidemiological data and issues with consistent reporting before January 20, 2020. The shape of epidemic curves varied among prefectures, with some showing a rapid rise and decline in reported cases and others showing more prolonged epidemics (Fig. 1a and Extended Data Fig. 1).

To characterize the temporal clustering of cases for each prefecture and province, we calculated the Shannon diversity index of the distribution of incident cases13. We defined the incidence distribution pij for a given city to be the proportion of COVID-19 cases during the first epidemic wave j that occurred on day i. The Shannon index of incidence for a given prefecture and year is given by $${\it{v}}_{\it{j}} = \left( { - \mathop {\sum }\limits_i p_{ij}\log p_{ij}} \right)^{ - 1}$$. Because vj is a function of the disease incidence curve in each location, rather than of absolute incidence values, it is less sensitive to varying reporting rates among cities. The Shannon index is maximal when all cases occur on the same day and minimal when each day of the epidemic has the same number of incident cases (for example, ‘flat’ epidemic curves). It is highly correlated with alternative measures of epidemic peakedness, such as the proportion of cases that occur at the peak ± 1 d (Extended Data Fig. 2). The total attack rate of reported COVID-19 cases in each prefecture is strongly negatively correlated with the Shannon index in China (Fig. 1c); hence, less peaked epidemics have a larger total attack rate (Pearson’s r = −0.67, 95% confidence interval (CI), −0.73 to −0.59, P < 0.01; for Italy, R2 = 0.33, P < 0.01). We hypothesize that this variation among cities in total attack rate and the temporal clustering of cases is the result of the spatial organization of human populations.

To test this hypothesis, we used Lloyd’s index of mean crowding13,26, treating the population count of each spatial grid cell as an individual unit (Fig. 1). The term ‘mean crowding’ used here is a specific geographic metric that summarizes both population density and how density is distributed across a prefecture (that is, patchiness; Fig. 1). Higher values of Lloyd’s index suggest a spatially aggregated population structure. For example, Xi’an has high values of crowding, whereas Bozhou has a similar population density but a population that is more evenly distributed across the prefecture (Fig. 1b). We performed log-linear regression modeling to determine the association between the temporal clustering of cases with socio-economic and environmental variables, including reductions in population flows during the outbreak period (Methods).

We found that the temporal clustering of cases was significantly negatively correlated with the mean number of contacts (P < 0.01) but positively correlated with mean population density (P < 0.01) and varies widely across China and Italy (Fig. 2 and Supplementary Table 1). This observation contrasts with the expectations of simple and classical epidemiological models, which predict higher peakedness in crowded areas due to the increased availability of susceptible individuals27,28. The spatial scale at which this relationship is best explained was 10 × 10 km, but results were statistically significant at all spatial scales between 1 and 50 km2 (Extended Data Fig. 3; P < 0.01). Mean specific humidity and population mobility remained significantly negatively correlated with epidemic peakedness when included in a multivariate model with crowding (Supplementary Table 1; all P < 0.01).

Using weekly human mobility data, we found that within-city human mobility during the outbreak was correlated with the temporal clustering of cases—that is, prefectures that have larger reductions in mobility also have lower epidemic peakedness (Extended Data Fig. 4 and Supplementary Table 1; P < 0.01). When we combined mobility reduction in a model with crowding and humidity, we found that these variables each remained significant predictors of the temporal clustering of cases (Extended Data Table 1; P < 0.01). These results suggest that, although measures to reduce mobility can successfully lead to a flattening of the epidemic curve, population crowding is an independent contributor to the shape of epidemics in these two countries.

Our multivariate model can explain a large fraction of the variation in epidemic peakedness among Chinese cities and Italian provinces, and sensitivity analyses confirm the robustness of our results to potential noise in location-specific incidence distributions (R2 = 0.638; Extended Data Fig. 2, Supplementary Table 1 and Extended Data Fig. 5). To evaluate the out-of-sample performance of our model, we 1) performed n-fold cross validation at the prefecture level in China (Spearman’s ρ = 0.61; 95% bootstrap CI, 0.52–0.68; P < 0.01); 2) used the fitted model in China to estimate peak intensity at the corresponding administrative level 2 locations—that is, province level, in Italy (Spearman’s ρ = 0.57; 95% bootstrap CI, 0.41–0.69; P < 0.01); and 3) performed n-fold cross validation at the province level in Italy (Spearman’s ρ = 0.65; 95% bootstrap CI. 0.52–0.76; P < 0.01). These results suggest that the model is robust to both within- and between-country out-of-sample testing (Extended Data Fig. 6).

To evaluate the potential effect of the temporal clustering of cases on the peak attack rate and total attack rate, we performed a simple linear regression (Supplementary Table 2). For locations that have a single peak, the attack rate at the peak is highest in two settings: 1) in crowded locations with high population size (prefectures that also experience high total attack rates); and 2) in locations that have lower population and lower crowding and, therefore, high temporal clustering of cases (Extended Data Fig. 7). Other prefectures that have low population and low crowding sometimes experience very short outbreaks with a small peak attack rate, suggesting local stochastic extinction possibly due to limited mixing between populations. We hypothesize that the observation that high peak attack rates can sometimes be found in low crowding areas is related to rare super-spreading events as observed in Bergamo, Italy, or Mulhouse, France.

### Simulation of COVID-19 epidemics in hierarchically structured populations

We hypothesize that the mechanism underlying our central observation—that more crowded cities experience less peaked outbreaks—is that crowding enables sustained transmission among households and through a city’s population, leading incidence to be widely distributed through time. To explore this proposed mechanism, we simulated stochastic epidemic dynamics in two types of populations. Simple, well-mixed transmission models in which contact rates are high in crowded regions were not consistent with our findings, because they predict that crowded regions would have more temporally clustered outbreaks. To capture realistic contact patterns, we created hierarchically structured populations29 in which individuals had high rates of contact within their social units (which are defined broadly and could represent households, care homes, hospitals, prisons, etc); lower rates with individuals from other units but within the same neighborhoods; and relatively rare contact with other individuals in other neighborhoods within the same prefecture (Fig. 3a). These assumptions are consistent with reports that most onward transmission after lockdowns were implemented occurred in households or in other close-contact situations2,30. In this scenario, less crowded prefectures often had more peaked and shorter outbreaks that were isolated to specific neighborhoods, whereas more crowded prefectures could sustain drawn-out outbreaks of larger final size, which jumped among the more highly connected neighborhoods (Fig. 3b,c). Further, if the reproduction number of COVID-19 is over-dispersed31,32,33, then crowding could enable local outbreaks to spread more widely due to the availability of contacts34.

We also simulated outbreak dynamics under extensive social distancing measures, as observed in Chinese prefectures (75% reduction in contact rates35,36). If social distancing reduces non-household contacts by the same relative amount in all locations, there will be more contacts remaining in crowded areas, because baseline contact rates are higher. Consequently, outbreaks in crowded regions could be larger and take longer to end after intervention (Fig. 3d, Fig. 1c and Extended Data Fig. 1).

Using the fitted model from China paired with globally comprehensive covariates, we extrapolated our results to cities across the world (Fig. 4). Human mobility data from Baidu were not available for locations outside of China. Therefore, we used aggregated human mobility data from Google’s COVID Mobility Research Dataset (Methods) to capture relative differences in human mobility through time. At the global scale, cities in yellow are predicted to have concentrated and peaked epidemics, whereas cities in blue are predicted to have more prolonged outbreaks (Fig. 4b; a full list is provided in the Supplementary Information). In general, the epidemics in coastal cities were less peaked and were larger and more prolonged, which could be attributable to high levels of population crowding in coastal cities. These predictions rely on fitted relationships of the first epidemic curves from Chinese and Italian cities and, therefore, should be interpreted very cautiously when generalizing to other settings.

## Discussion

Our findings confirm previous work on the peakedness of epidemics transmission for influenza in cities13. Our work provides empirical support for the role of spatial organization in determining infectious disease dynamics29,37 and, specifically, spatial variability in transmission parameters38. Furthermore, with lower total incidence in small cities compared to larger cities, the risk of resurgence could be elevated owing to lower population immunity after the first wave of the epidemic. Higher seroprevalence for COVID-19 in urban areas39 provides initial data to support these findings; however, there remains an urgent need to expand serological data collection and provide a full picture of attack rates across cities worldwide40. Even though our model does not account for over-dispersion in COVID-19 transmission, there is a theoretical link between the reproduction number in heterogeneous environments and Lloyd’s crowding index of aggregation41, such that the reproduction number increases with higher aggregation34. We report that, in dense cities, reductions in mobility tend to be larger, which potentially elevates the effectiveness of non-pharmaceutical interventions in dense cities42. However, assessing the effect of within-city connectivity and its spatial heterogeneity on disease dynamics will be critical to further our understanding of how COVID-19 spreads in urban areas. We found that there is an association between climatic factors and the peakedness of epidemics, but particular caution will need to be applied in interpreting these relationships outside the two studied countries (Italy and China). More work is needed to provide causal evidence for the effect of climatic factors on transmission dynamics of COVID-19 during the pandemic and post-pandemic phases10.

Currently, non-pharmaceutical interventions are the primary control strategy for COVID-19. As a result, public health measures are often focused on ‘flattening the curve’ to lower the risk of essential services running out of capacity. We show that spatial context, especially crowding, are important factors for assessing the shape of epidemic curves. Therefore, it will be critical to view non-pharmaceutical interventions through the perspective of crowding—that is, how does an intervention reduce the circle of contacts of an average individual—in cities across the world.

## Methods

### Epidemiological data

No officially reported line list was available for cases in China. We used a standardized protocol43 to extract individual-level data from December 1, 2019, to March 30, 2020. Sources were mainly official reports from provincial, municipal or national health governments. Data included basic demographics (age and sex), travel histories and key dates (dates of onset of symptoms, hospitalization and confirmation). Data were entered by a team of data curators on a rolling basis, and technical validation and geo-positioning protocols were applied continuously to ensure validity. A detailed description of the methodology is available22. Lastly, total numbers were matched with officially reported data from China and other government reports. Daily case counts from Italian provinces (n = 107) were extracted from the Presidenza del Consiglio dei Ministri Dipartimento della Protezione Civile (https://github.com/pcm-dpc/COVID-19).

### Estimating epidemic peakedness

Epidemic peakedness was estimated for each prefecture by calculating the inverse Shannon entropy of the distribution of COVID-19 cases. Inverse Shannon entropy was used to fit time series of other respiratory infections (influenza)13. The inverse Shannon entropy of incidence for a given prefecture in 2020 is then given by $$v_j = \left( { - \mathop {\sum}\nolimits_i {p_{ij}\log p_{ij}} } \right)^{ - 1}$$. Because vj is a function of incidence distribution in each location rather than raw incidence, it is invariant under differences in overall reporting rates between cities or attack rates. We then assessed how peakedness $${\it{v}} \propto \mathop {\sum}\nolimits_j {v_j}$$ varied across geographic areas in China. As an alternative measure of temporal clustering of cases, we estimated the proportion of cases at the peak ± 1 d (Extended Data Fig. 2).

### Proxies for COVID-19 interventions using within-city human mobility data from China

Estimates of within-city reductions of human mobility between the period before and after the lockdown was implemented on January 23, 2020, were extracted from Lai et al.36. Daily measures of human mobility were extracted from the Baidu Qianxi web platform to estimate the proportion of daily movement within prefectures in China. Relative mobility volume was available from January 2, 2020, to January 25, 2020. For each city, change in relative mobility was defined by mi=mil(lockdown)/mib(baseline), where mi is defined as mobility in prefecture i. Baidu’s mapping service is estimated to have a 30% market share in China, and more data can be found5,6.

### Data on drivers of transmission of COVID-19

Prefecture-specific population counts and densities were derived from the 2020 Gridded Population of The World, a modeled continuous surface of population estimated from national census data and the United Nations World Population Prospectus44. Population counts are defined at a 30-arc-second resolution (approximately 1 km × 1 km at the equator) and extracted within administrative 2 level cartographic boundaries defined by the National Bureau of Statistics of China. Lloyd’s mean crowding, $$\frac{{\left[ {\mathop {\sum }\nolimits_i \left( {q_i - 1} \right)q_i} \right]}}{{\mathop {\sum }\nolimits_i q_i}}$$, was estimated for each prefecture, where qi represents the population count of each non-zero pixel within a prefecture’s boundary and the resulting value estimates an individual’s mean number of expected neighbors13,45. When fitting the models, we consider the numerator $$\left[ {\mathop {\sum}\nolimits_i {\left( {q_i - 1} \right)q_i} } \right]$$, which we refer to as ‘contacts’, and the denominator $$\mathop {\sum}\nolimits_i {q_i}$$ (that is, population size) as separate predictors. We note that a negative slope for ‘contacts’ and a positive slope for ‘population’ support a negative coefficient for Lloyd’s mean crowding.

Daily temperature (°F), relative humidity (%) and atmospheric pressure (Pa) at the centroid of each prefecture was provided by The Dark Sky Company via the Dark Sky API and aggregated across a variety of data sources. Specific humidity (kg/kg) was then calculated using the R package humidity16. Meteorological variables for each prefecture were then averaged across the entirety of the study period.

### Statistical analysis

We normalized the values of epidemic peakedness between 0 and 1 and, for all non-zero values, fit a generalized linear model of the form

$${\mathrm{log}}\left( {Y_j} \right)\sim \beta _0 + \beta _1{\mathrm{log}}\left( {C_j} \right) + \beta _2{\mathrm{log}}\left({h_j}\right) + \beta _3{\mathrm{log}}\left( {P_j} \right) + \beta _4{\mathrm{log}}\left( {f_j} \right) + \beta _5{\mathrm{log}}\left( {t_j} \right)$$

where, for each prefecture j, Y is the scaled inverse Shannon entropy measure of epidemic peakedness derived from the COVID-19 time series; C is the mean number of contacts26,46; h is the mean specific humidity over the reporting period in kg/kg; P is the estimated population density; f is the relative change in population flows within each prefecture; and t is daily mean temperature.

### Projecting epidemic peakedness in cities around the world

We selected 310 urban centers from the European Commission Global Human Settlement Urban Centre Database and their included cartographic boundaries47. To ensure global coverage, up to the five most populous cities in each country were selected from the 1,000 most populous urban centers recorded in the database. Population count, crowding and meteorological variables were then estimated following identical procedures used to calculate these variables in the Chinese prefectures. Weather measurements were averaged over the 2-month period starting on February 1, 2020.

The parameters from the model of epidemic peakedness predicted by humidity, crowding and population size (Supplementary Table 1, model 6) were used to estimate relative peakedness in the 310 urban centers. A full list of predicted epidemic peakedness values can be found in Supplementary Table 3.

### Global human mobility data

We used the Google COVID-19 Aggregated Mobility Research Dataset, which contains anonymized relative mobility flows aggregated over users who have turned on the Location History setting, which is off by default. This is similar to the data used to show how busy certain types of places are in Google Maps, helping identify when a local business tends to be the most crowded. The mobility flux is aggregated per week, between pairs of approximately 5-km2 cells worldwide, and for the purpose of this study aggregated for 310 cities worldwide. We calculated both mobility within each city’s shapefile and mobility coming into each city. For each city, change in relative mobility was defined by mi = mil(April)/mib(December), where mi is defined as mobility in city i.

To produce this data set, machine learning was applied to log data to automatically segment it into semantic trips48. To provide strong privacy guarantees, all trips were anonymized and aggregated using a differentially private mechanism49 to aggregate flows over time (https://policies.google.com/technologies/anonymization). This research is done on the resulting heavily aggregated and differentially private data. No individual user data were ever manually inspected; only heavily aggregated flows of large populations were handled.

All anonymized trips were processed in aggregate to extract their origin and destination location and time. For example, if users traveled from location a to location b within time interval t, the corresponding cell (a, b, t) in the tensor would be n ± err, where err is Laplacian noise. The automated Laplace mechanism adds random noise drawn from a zero-mean Laplace distribution and yields (𝜖, δ)-differential privacy guarantee of 𝜖 = 0.66 and δ = 2.1 × 10−29. The parameter 𝜖 controls the noise intensity in terms of its variance, whereas δ represents the deviation from pure 𝜖-privacy. The closer they are to zero, the stronger the privacy guarantees. Each user contributes, at most, one increment to each partition. If they go from a region a to another region b multiple times in the same week, they contribute only once to the aggregation count.

These results should be interpreted in light of several important limitations. First, the Google mobility data are limited to smartphone users who have opted in to Google’s Location History feature, which is off by default. These data might not be representative of the population as whole, and, furthermore, their representativeness might vary by location. Importantly, these limited data are viewed only through the lens of differential privacy algorithms, specifically designed to protect user anonymity and obscure fine detail. Moreover, comparisons across, rather than within, locations are descriptive only because these regions can differ in substantial ways.

### Simulating epidemic dynamics

We simulated a simple stochastic SIR model of infection spread on weighted networks created to represent hierarchically structured populations. Individuals were first assigned to households using the distribution of household sizes in China (data from the United Nations Population Division; mean, 3.4 individuals). Households were then assigned to ‘neighborhoods’ of ~100 individuals, and all neighborhood members were connected with a lower weight. A randomly chosen 10% of individuals were given ‘external’ connections to individuals outside the neighborhood. The total population size was n = 1,000. Simulations were run for 300 d, and averages were taken over 20 iterations. The SIR model used a per-contact transmission rate of 𝛽 = 0.15 per day and recovery rate 𝛾 = 0.1 per day. For the simulations without interventions, the weights were wHH = 1, wNH = 0.01 and wEX = 0.001 for the crowded prefecture and wEX = 0.0001 for the less crowded prefecture. For the simulations with interventions, the household and neighborhood weights were the same, but we used wEX = 0.01 for the crowded prefecture and wEX = 0.001 for the ‘sparse’ prefecture. The intervention reduced the weight of all connections outside the household by 75%.

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

We collated epidemiological data from publicly available data sources (news articles, press releases and published reports from public health agencies) that are described in full in ref. 22. Epidemiological and spatial data used in this study are available via Github (https://github.com/Emergent-Epidemics/COVID_crowding). The Google COVID-19 Aggregated Mobility Research Dataset used for this study is available with permission from Google. Code and data are also available at https://zenodo.org/record/4056578#.X3IFF5NKiek.

## Code availability

The code associated with the data analysis and statistics is available from https://github.com/Emergent-Epidemics/COVID_crowding. The simulation code is available from https://github.com/alsnhll/SIRNestedNetwork. Code and data are also available at https://zenodo.org/record/4056578#.X3IFF5NKiek.

## References

1. 1.

Fraher, E. P. et al. Ensuring and sustaining a pandemic workforce. N. Engl. J. Med. 382, 2181–2183 (2020).

2. 2.

Leung, K., Wu, J. T., Liu, D. & Leung, G. M. First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: a modelling impact assessment. Lancet 395, 1382–1393 (2020).

3. 3.

Ji, Y., Ma, Z., Peppelenbosch, M. P. & Pan, Q. Potential association between COVID-19 mortality and health-care resource availability. Lancet Glob. Health 8, e480 (2020).

4. 4.

Rosenbaum, L. Facing Covid-19 in Italy—ethics, logistics, and therapeutics on the epidemic’s front line. N. Engl. J. Med. 382, 1873–1875 (2020).

5. 5.

Tian, H. et al. An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China. Science 368, 638–642 (2020).

6. 6.

Kraemer, M. U. G. et al. The effect of human mobility and control measures on the COVID-19 epidemic in China. Science 368, 493–497 (2020).

7. 7.

Lipsitch, M., Swerdlow, D. L. & Finelli, L. Defining the epidemiology of Covid-19—studies needed. N. Engl. J. Med. 382, 1194–1196 (2020).

8. 8.

World Health Organization. Coronavirus disease 2019 (COVID-19) Situation Report - 71 https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200331-sitrep-71-covid-19.pdf?sfvrsn=4360e92b_8 (2020).

9. 9.

Zhao, S. et al. Quantifying the association between domestic travel and the exportation of novel coronavirus (2019-nCoV) cases from Wuhan, China in 2020: a correlational analysis. J. Travel Med. 27, 1–3 (2020).

10. 10.

Baker, R. E., Yang, W., Vecchi, G. A., Metcalf, C. J. E. & Grenfell, B. T. Susceptible supply limits the role of climate in the COVID-19 pandemic. Science 369, 315–319 (2020).

11. 11.

Rocklöv, J. & Sjödin, H. High population densities catalyse the spread of COVID-19. J. Travel Med. 27, taaa038 (2020).

12. 12.

Kraemer, M. U. G. et al. Big city, small world: density, contact rates, and transmission of dengue across Pakistan. J. R. Soc. Interface 12, 20150468 (2015).

13. 13.

Dalziel, B. D. et al. Urbanization and humidity shape the intensity of influenza epidemics in U.S. cities. Science 362, 75–79 (2018).

14. 14.

Shaman, J., Pitzer, V. E., Viboud, C., Grenfell, B. T. & Lipsitch, M. Absolute humidity and the seasonal onset of influenza in the continental United States. PLoS Biol. 8, e1000316 (2010).

15. 15.

Gog, J. R. et al. Spatial transmission of 2009 pandemic influenza in the US. PLoS Comput. Biol. 10, e1003635 (2014).

16. 16.

Shaman, J. & Kohn, M. Absolute humidity modulates influenza survival, transmission, and seasonality. Proc. Natl Acad. Sci. USA 106, 3243–3248 (2009).

17. 17.

Chetty, R. et al. The association between income and life expectancy in the United States, 2001–2014. JAMA 315, 1750–1766 (2016).

18. 18.

Kissler, S. M., Tedijanto, C., Goldstein, E., Grad, Y. H. & Lipsitch, M. Projecting the transmission dynamics of SARS-CoV-2 through the postpandemic period. Science 21, 1–9 (2020).

19. 19.

Crawford, J. M. et al. Laboratory surge response to pandemic (H1N1) 2009 outbreak, New York City Metropolitan Area, USA. Emerg. Infect. Dis. 16, 8–13 (2010).

20. 20.

Grasselli, G., Pesenti, A. & Cecconi, M. Critical care utilization for the COVID-19 outbreak in Lombardy, Italy. JAMA 323, 1545–1546 (2020).

21. 21.

Li, Q. et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N. Engl. J. Med. 382, 1199–1207 (2020).

22. 22.

Xu, B. et al. Epidemiological data from the COVID-19 outbreak, real-time case information. Sci. Data 7, 106 (2020).

23. 23.

Xu, B. et al. Epidemiological data from the COVID-19 outbreak, real-time case information. figshare https://doi.org/10.6084/m9.figshare.11949279 (2020).

24. 24.

Xu, B. & Kraemer, M. U. G. Open access epidemiological data from the COVID-19. Lancet Infect. Dis. 3099, 30119 (2020).

25. 25.

Aurora Big Data. 2017 Mobile Map App Research Report: Which of the Highest, the Baidu, and Tencent Is Strong? https://baijiahao.baidu.com/s?id=1590386747028939917&wfr=spider&for=pc. (2017)

26. 26.

Lloyd, M. ‘Mean crowding’. J. Anim. Ecol. 36, 1–30 (1967).

27. 27.

May, R. M. & Anderson, R. M. Spatial heterogeneity and the design of immunization programs. Math. Biosci. 72, 83–111 (1984).

28. 28.

Anderson, R. M. & May, R. M. Infectious Diseases of Humans: Dynamics and Control (Oxford University Press, 1991).

29. 29.

Watts, D. J., Muhamad, R., Medina, D. C. & Dodds, P. S. Multiscale, resurgent epidemics in a hierarchical metapopulation model. Proc. Natl Acad. Sci. USA 102, 11157–11162 (2005).

30. 30.

Report of the WHO–China Joint Mission on Coronavirus Disease 2019 (COVID-19) https://www.who.int/docs/default-source/coronaviruse/who-china-joint-mission-on-covid-19-final-report.pdf 16–24 (2020).

31. 31.

Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E. & Getz, W. M. Superspreading and the effect of individual variation on disease emergence. Nature 438, 355–359 (2005).

32. 32.

Kucharski, A. J. et al. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. Lancet Infect. Dis. 3099, 1–7 (2020).

33. 33.

Riou, J. & Althaus, C. L. Pattern of early human-to-human transmission of Wuhan 2019 novel coronavirus (2019-nCoV), December 2019 to January 2020. Euro. Surveill. 25, 1–5 (2020).

34. 34.

Southwood, T. R. in Ecological Methods (ed Southwood, T. R.) 7–69 (Springer Netherlands, 1978).

35. 35.

Zhang, J. et al. Changes in contact patterns shape the dynamics of the COVID-19 outbreak in China. Science 368, 1481–1486 (2020).

36. 36.

Lai, S. et al. Effect of non-pharmaceutical interventions to contain COVID-19 in China. Nature 585, 410–413 (2020).

37. 37.

Sattenspiel, L. Simulating the effect of quarantine on the spread of the 1918–19 flu in central Canada. Bull. Math. Biol. 65, 1–26 (2003).

38. 38.

Meyers, L. A. Contact network epidemiology: bond percolation applied to infectious disease prediction and control. Bull. Am. Math. Soc. 44, 63–87 (2006).

39. 39.

Kissler, S. M. et al. Reductions in commuting mobility predict geographic differences in SARS-CoV-2 prevalence in New York City. Nat. Commun. 16, 4674 (2020).

40. 40.

Lipsitch, M., Swerdlow, D. L. & Finelli, L. Defining the epidemiology of Covid-19 — studies needed. N. Engl. J. Med. 382, 1194–1196 (2020).

41. 41.

Mat, N. F. C., Edinur, H. A., Razab, M. K. A. A. & Safuan, S. A single mass gathering resulted in massive transmission of COVID-19 infections in Malaysia with further international spread. J. Travel Med. 27, taaa059 (2020).

42. 42.

Flaxman, S. et al. Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries. Nature 584, 257–261 (2020).

43. 43.

Ramshaw, R. E. et al. A database of geopositioned Middle East respiratory syndrome coronavirus occurrences. Sci. Data 6, 318 (2019).

44. 44.

Doxsey-Whitfield, E. et al. Taking advantage of the improved availability of census data: a first look at the gridded population of the world, version 4. Pap. Appl. Geogr. 1, 226–234 (2015).

45. 45.

Reiczigel, J., Lang, Z., Rózsa, L. & Tóthmérész, B. Properties of crowding indices and statistical tools to analyse parasite crowding data. J. Parasitol. 91, 245–252 (2005).

46. 46.

Wade, M. J., Fitzpatrick, C. L. & Lively, C. M. 50-year anniversary of Lloyd’s ‘mean crowding’: ideas on patchy distributions. J. Anim. Ecol. 87, 1221–1226 (2018).

47. 47.

Florczyk, A. et al. GHS-UCDB R2019A - GHS Urban Centre Database 2015, multitemporal and multidimensional attributes https://data.jrc.ec.europa.eu/dataset/53473144-b88c-44bc-b4a3-4583ed1f547e (2019).

48. 48.

Bassolas, A. et al. Hierarchical organization of urban mobility and its connection with city livability. Nat. Commun. 10, 4817 (2019).

49. 49.

Wilson, R. J. et al. Differentially private SQL with bounded user contribution. Preprint at https://arxiv.org/abs/1909.01917 (2019).

## Acknowledgements

The authors thank K. Cordiano for statistical assistance. We thank the members of the Open COVID-19 Data Working Group. The members of the group are listed in the Supplementary Note. B.R. acknowledges funding from Google.org. M.U.G.K. acknowledges funding from the European Commission H2020 program (MOOD project) and a Branco Weiss Fellowship. O.G.P., M.U.G.K., A.E.Z. and H.T. acknowledge funding from the Oxford Martin School. H.T. acknowledges funding from the Beijing Science and Technology Planning Project (Z201100005420010). A.L.H. and A.N. acknowledge funding from the National Institutes of Health (DP5OD019851). The funding bodies had no role in study design, data collection and analysis, preparation of the manuscript or the decision to publish. All authors saw and approved the manuscript.

## Author information

Authors

### Contributions

M.U.G.K., O.G.P. and S.V.S. conceived the research. B.R., A.L.H., A.N., B.A., S.V.S. and M.U.G.K. analyzed the data. B.R. and S.V.S. analyzed the human mobility data. C.D., O.G.P., M.U.G.K. and S.V.S. interpreted the data. M.U.G.K. wrote the first draft of the manuscript. All authors contributed to the interpretation of results and manuscript writing.

### Corresponding authors

Correspondence to Samuel V. Scarpino or Oliver G. Pybus or Moritz U. G. Kraemer.

## Ethics declarations

### Competing interests

SVS is a paid consultant with Pandefense Advisory and Booze Allen Hamilton; is on the advisory board for BioFire Diagnostics Trend Surveillance, which includes paid consulting; and holds unexercised options in Iliad Biotechnologies. These entities provided no financial support associated with this research, did not have a role in the design of this study, and did not have any role during its execution, analyses, interpretation of the data and/or decision to submit.

Peer review information Jennifer Sargent was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Extended data

### Extended Data Fig. 1 Proportion of daily cases in prefectures in China.

a, shows the ten flattest epidemics and b, shows the most peaked epidemics. Red and blue curves indicate the average across these prefectures.

### Extended Data Fig. 2

Left panels: Proportion of cases at the peak (± 1 day) vs. inverse Shannon entropy for prefectures in China (n = 262) and regions in Italy (n = 107). Right panels: Proportion of days in the epidemic curve that had cases above the 50th percentile, normalized by the largest reported number of cases versus inverse Shannon entropy for prefectures in China (n = 262) and regions in Italy (n = 107).

### Extended Data Fig. 3

Epidemic peakedness is well explained by covariates at spatial scales from 1 – 50 km in China a, and Italy b.

### Extended Data Fig. 4 Relationship between population density and reduction in mobility in cities in China.

Each dot represents one city in China. Data on human mobility are extracted from Baidu Inc. and are available from Lai et al. 202036.

### Extended Data Fig. 5 Crowding and the temporal clustering of transmission of COVID-19 in Italy.

a, negative association between log10 of epidemic peakedness, as measured by Shannon’s diversity index (Methods), and log population crowding, as measure by Lloyd’s mean crowding (Methods). The point sizes indicate the size of the population in each city, b, Map of epidemic peakedness in Italy at the provincal level. Blue and green colours indicate lower peakedness and red and yellow colours higher peakedness. Grey prefectures had either no or very limited amount of reported cases. Values were rescaled so that Shannon index in each province = (Shannon index—min(Shannon index)/(max(Shannon index)—min(Shannon index)).

### Extended Data Fig. 6

Out-of-sample prediction (n-fold cross validation) over all prefectures in China a, b, and all provinces in Italy c, d. In-sample prediction in China (R2 = 0.32) compares well to out-of-sample predictions (R2 = 0.28). In-sample prediction in Italy (R2 = 0.38) compares well to out of sample predictions (R2 = 0.32).

### Extended Data Fig. 7

Total attack rate, peak attack rate, inverse Shannon entropy and population sizes for: a, shows prefectures in China with low peakedness and high variance (as measured by the variance in the first difference of the time series of daily new cases). These prefects have high population and high crowding. b, shows prefectures in China with high intensity and high variance. c, shows prefectures in China that have low peakedness and low variance. d, shows prefectures in China that have high peakedness and low variance.

## Supplementary information

### Supplementary Information

Supplementary Table 1: Regression model results of variables predicting temporal clustering of the epidemic in prefectures in China (n = 262) and provinces in Italy (n = 109). Supplementary Table 2: Relationship among peak incidence, total incidence and temporal clustering of cases. Supplementary Table 3: 310 global cities and their predicted temporal clustering of cases.

### Supplementary Note

Members of the Open COVID-19 Data Working Group

## Rights and permissions

Reprints and Permissions

Rader, B., Scarpino, S.V., Nande, A. et al. Crowding and the shape of COVID-19 epidemics. Nat Med 26, 1829–1834 (2020). https://doi.org/10.1038/s41591-020-1104-0

• Accepted:

• Published:

• Issue Date:

• ### Big data insight on global mobility during the Covid-19 pandemic lockdown

• , Zbigniew Galar
• , Robert Walasek
• , Grzegorz Zimon
•  & Per Engelseth

Journal of Big Data (2021)

• ### COVID-19 and future pandemics: a global systems approach and relevance to SDGs

•  & Saroj Jayasinghe

Globalization and Health (2021)

• ### Incidence and risk factors of COVID-19-like symptoms in the French general population during the lockdown period: a multi-cohort study

• Fabrice Carrat
• , Mathilde Touvier
• , Gianluca Severi
• , Laurence Meyer
• , Florence Jusot
• , Nathanael Lapidus
• , Delphine Rahib
• , Nathalie Lydié
• , Marie-Aline Charles
• , Pierre-Yves Ancel
• , Alexandra Rouquette
• , Xavier de Lamballerie
• , Marie Zins
•  & Nathalie Bajos

BMC Infectious Diseases (2021)

• ### Estimating the effect of social inequalities on the mitigation of COVID-19 across communities in Santiago de Chile

• Nicolò Gozzi
• , Michele Tizzoni
• , Matteo Chinazzi
• , Leo Ferres
• , Alessandro Vespignani
•  & Nicola Perra

Nature Communications (2021)

• ### Variation in microparasite free-living survival and indirect transmission can modulate the intensity of emerging outbreaks

• C. Brandon Ogbunugafor
• , Miles D. Miller-Dickson
• , Victor A. Meszaros
• , Lourdes M. Gomez
• , Anarina L. Murillo
•  & Samuel V. Scarpino

Scientific Reports (2020)