Abstract
Sudden, large-scale and diffuse human migration can amplify localized outbreaks of disease into widespread epidemics^{1,2,3,4}. Rapid and accurate tracking of aggregate population flows may therefore be epidemiologically informative. Here we use 11,478,484 counts of mobile phone data from individuals leaving or transiting through the prefecture of Wuhan between 1 January and 24 January 2020 as they moved to 296 prefectures throughout mainland China. First, we document the efficacy of quarantine in ceasing movement. Second, we show that the distribution of population outflow from Wuhan accurately predicts the relative frequency and geographical distribution of infections with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) until 19 February 2020, across mainland China. Third, we develop a spatiotemporal ‘risk source’ model that leverages population flow data (which operationalize the risk that emanates from epidemic epicentres) not only to forecast the distribution of confirmed cases, but also to identify regions that have a high risk of transmission at an early stage. Fourth, we use this risk source model to statistically derive the geographical spread of COVID-19 and the growth pattern based on the population outflow from Wuhan; the model yields a benchmark trend and an index for assessing the risk of community transmission of COVID-19 over time for different locations. This approach can be used by policymakers in any nation with available data to make rapid and accurate risk assessments and to plan the allocation of limited resources ahead of ongoing outbreaks.
Main
Tracking population flows is especially important in the context of the outbreak of COVID-19 in China and the rest of the world. This outbreak emerged in Wuhan (a prefecture-level city in the province of Hubei) in the run-up to the Chinese Lunar New Year’s Eve on 24 January 2020, which is associated with the annual Chunyun mass migration (which can involve as many as three billion trips). The potential scale and range of the diffusion of the outbreak was particularly alarming given the position of Wuhan as a central hub in China’s rail and aviation networks and given the severity of COVID-19.
We used nationwide mobile phone data to track population outflow from Wuhan and linked this to COVID-19 infection counts by location at the prefecture level. Our data include 296 prefectures in 31 provinces and regions in China (average population 4.40 million, 94.07% of China’s population). Mobile phone geolocation data, which can reliably quantify human movement, provide precise, verifiable and real-time information^{5,6,7,8,9,10,11}. We conceptualized epidemiological morbidity as a function of the movement of the human population from a disease epicentre. We therefore normalize disease risk to the population inflow from Wuhan rather than to the size of the local population.
Our approach differs from previous studies in which individual mobility and disease spread^{1,2,3,4,12,13} were linked, as we used real-time data about actual movement, focussed on aggregate population flows rather than individual tracking, and implemented a new modelling approach. That is, other recent studies on COVID-19 have used historical population flow data (for example, data on Chunyun migrations from previous years) to estimate case exportation during the current outbreak^{14,15,16,17,18}. However, the benefits of observing rather than estimating population movements are substantial as inaccurate predictions can have important consequences for policymaking: underreaction can result in disease spread and overreaction can lead to medically, socially and economically inefficient policies. Moreover, in contrast to previous approaches to epidemiological modelling^{12,13,14,15,16,17,18}, we take advantage of detailed data about the population flow that emanated from the source of the outbreak to develop a population-flow-based risk source model to test the extent to which population flow data can capture the spatiotemporal dynamics of the spread of the SARS-CoV-2 virus.
To measure the total aggregate population outflow from Wuhan before the region was quarantined on 23 January 2020, we used countrywide data (provided by a major national carrier) that tracked all of the movements out of Wuhan between 1 January and 24 January 2020. The onset of symptoms of the first recorded case of COVID-19 in Wuhan was 1 December 2019; by 19 February 2020 (the end of our study period), 74,576 infected cases had been verified in mainland China according to data from the China Center for Disease Control and Prevention^{19,20,21}. Our time period includes the time at which the news about the outbreak initially appeared (on 31 December 2019 and 9 January 2020) and the annual Lunar New Year migration (which culminated on 24 January 2020). The dataset included any mobile phone user who had spent at least 2 h in Wuhan during this period and it tracked the total daily flow of such individuals to all other prefectures throughout mainland China. Locations were detected when users simply had their phones on. The dataset includes two measures of population outflow: the customer count of the carrier and their extrapolated count of total population movement. We use the latter in our primary analyses and the former as a robustness check (Supplementary Information).
We defined population flow as the total aggregate count of people who entered any given prefecture from Wuhan during the whole observation period (1–24 January 2020). Because Wuhan (population of 11.08 million people in 2018) is a major transportation hub, many of these people were travellers passing through rather than residents. The definition is also weighted by the number of transits through Wuhan since some people may have entered and exited Wuhan on several occasions in January (especially if they lived in neighbouring prefectures). This can be thought of as a linear weighting of additional infection and transmission risk from repeated transits. There were 11,478,484 counts of movements from Wuhan: 8,685,007 to other prefectures within Hubei and 2,793,477 to prefectures in other provinces.
Key dates during this period were 24 January, Lunar New Year’s Eve (outbound holiday travel is typically completed before this evening), and 23 January, when Wuhan was quarantined. We analysed the efficacy of the quarantine (Fig. 1b, c), which was manifested in a reduction of 52% and 38% in inter- and intra-provincial population outflow, respectively, on 23 January 2020 compared with 22 January 2020 (when there were 546,324 and 141,208 counts of intra- and extra-provincial travel, respectively), and a further reduction of 94% and 84% on 24 January 2020 compared with 23 January 2020. With the imposition of the quarantine, first in Wuhan (and two neighbouring prefectures) at 10:00 on 23 January 2020, and then in 12 other prefectures in Hubei by the end of the day on 24 January 2020, population outflow from Wuhan almost completely stopped (the average daily outflow thereafter was just 1,087 people to all prefectures outside of Hubei, which probably comprised government workers).
We combined the population flow dataset with the count and geographical location of confirmed cases of COVID-19 nationwide (Fig. 1a), which used consistent and stringently enforced case ascertainment during this period. As of 19 February 2020, there were 74,576 infected cases in mainland China, of which 29,549 occurred outside of Wuhan and there were 2,118 fatalities (according to data from the China Center for Disease Control and Prevention).
Population flow from Wuhan was hypothesized to export the virus to other locations, where it caused local outbreaks (that is, either by importation or community transmission (refs. ^{19,20,21})). Indeed, we find a strong correlation between total population flow and the number of infections in each prefecture (Fig. 2a, b). Consistent with our hypothesis, the cumulative number of infections is highly correlated with aggregate population outflow from Wuhan from 1 to 24 January 2020, and the correlation increases over time from r = 0.522 on 24 January 2020 to r = 0.919 on 5 February 2020, and increases further to r = 0.952 on 19 February 2020 (P < 0.001 for all) (Fig. 2a–c). As there is little travel throughout the country during this period, the population outflow variable is comparable to a lagged variable in a time series. The correlation exhibited the same robust pattern even when different time windows of population outflow were used (Extended Data Fig. 1). The correlation between population outflow from Hubei province (excluding Wuhan itself) and the number of infections in each prefecture (Fig. 2c) followed a similar pattern but was substantially weaker; this correlation increased from r = 0.365 on 24 January 2020 to r = 0.583 on 19 February 2020.
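The correlational analysis can be illustrated with a minimal sketch. The function below computes Pearson’s r between per-prefecture outflow and cumulative case counts on two dates. The five-prefecture toy numbers and the names `outflow`, `cases_by_date` and `r_by_date` are hypothetical and are not taken from the study data; they only mimic the qualitative pattern of a correlation that strengthens over time.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: aggregate outflow from the source to five prefectures.
outflow = [120_000, 45_000, 8_000, 2_500, 900]

# Hypothetical cumulative confirmed cases on an early and a late date.
cases_by_date = {
    "early": [30, 40, 2, 9, 1],
    "late": [1_150, 460, 75, 30, 8],
}

# One correlation per date, computed across prefectures.
r_by_date = {d: pearson_r(outflow, c) for d, c in cases_by_date.items()}
for d, r in r_by_date.items():
    print(f"{d}: r = {r:.3f}")
```

In the study itself the correlation is computed across n = 296 prefectures for each reporting date; the sketch only shows the mechanics.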
For completeness we compared the predictive strength of aggregate population outflow to other factors, such as the relative frequency of Baidu search engine queries for virus-related terms in each prefecture (for example, novel coronavirus, flu, SARS, atypical pneumonia and surgical mask)^{22,23,24}, the gross domestic product (GDP) and population size of each prefecture, and other movement variables. Each of these factors became less predictive of local outbreak size over time, either for the number of cumulative cases or the number of daily reported cases (Fig. 2c, d and Extended Data Figs. 2, 3).
We also evaluated a gravity model^{4,13}. Gravity models were originally developed to model flow volumes or other interactions between geographical areas based simply on distance between two regions and their populations. Here, we use a special case of the gravity model with only the population variable for the ‘recipient’ prefecture as Wuhan is always the ‘donor’ and thus a constant value (Supplementary Information 4.1). This model (with a significantly negative parameter for distance) predicts the high quantity of travel from Wuhan to other prefectures in Hubei and to geographically proximate provinces (Fig. 1). However, it does not explain the high traffic of population outflow to more distant coastal cities. That outflow does not strictly follow a gravity model is not surprising given the rationales for Chunyun migration patterns, which are primarily based on social connections^{8,25}.
Furthermore, we tested a gravity model to predict the infection count. Although the population size of the recipient prefecture and distance were significant predictors (P < 0.001), a mediation analysis shows that population flow from Wuhan mediates the effect of distance. Figure 2c, d illustrates why this is the case. Aggregate population flow from Wuhan exhibits a high and progressively stronger correlation with infection prevalence in destination locations over time. By contrast, the predictive strength of the distance from Wuhan, population size and GDP (an alternative source of gravity) of each prefecture shows no increases or decreases over time. There is no advantage to using distance to estimate population flow and infection spread when the actual population flow is observable, as in our case.
Next, we used two sets of models (one cross-sectional and one dynamic) to statistically model and benchmark the extent to which aggregate population outflow from Wuhan predicts the spread and distribution of infections with SARS-CoV-2 across mainland China. We developed what we call a risk source model that leverages observed population flow data to operationalize the risk emanating from the epidemic source.
We first modelled the effect of outflow on infection by using the following multiplicative exponential model:
\[ y_{i} = c \prod_{j=1}^{m} e^{\beta_{j} x_{ji}} \, e^{\sum_{k=1}^{n} \lambda_{k} I_{ik}} \qquad (1) \]
in which y_{i} is the number of the cumulative (or daily) confirmed cases in prefecture i (depending on the model); x_{1i} is the cumulative population outflow from Wuhan to prefecture i from 1 to 24 January 2020; x_{2i} is the GDP of prefecture i; x_{3i} is the population size of prefecture i; m is the number of variables included; and c and β_{j} are parameters to estimate. λ_{k} is the fixed effect for province k; n is the number of prefectures considered in the analysis; I_{ik} is a dummy for prefecture i and I_{ik} = 1, if i ∈ k (prefecture i belongs to province k), otherwise I_{ik} = 0 (Supplementary Information).
We applied a nonlinear least-squares method (Levenberg–Marquardt algorithm) to estimate the parameters of a model with confirmed cases as the dependent variable and aggregate Wuhan population outflow from 1–24 January 2020 as the sole predictor variable (R^{2} = 0.772 on 24 January to R^{2} = 0.946 on 19 February) and a model with population size and GDP as additional covariates (R^{2} = 0.809 on 24 January to R^{2} = 0.967 on 19 February) (Supplementary Tables 1, 2). Although these additional covariates improve the fit, the parameter for population flow from Wuhan becomes increasingly dominant, whereas the GDP and population of a prefecture become increasingly less predictive over time. Overall, the performance of the models continuously improved as more infected cases were confirmed, suggesting that the spreading pattern of the virus gradually converged to the distribution of the population outflow from Wuhan to other prefectures in China. As a robustness check, we evaluate a model using daily confirmed cases and find consistent results (Supplementary Tables 3, 4).
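As a rough illustration of fitting the single-predictor form y_{i} = c·e^{βx_{i}}, the sketch below uses ordinary least squares on log-transformed counts. This is a deliberately simpler stand-in, not the Levenberg–Marquardt nonlinear fit described above: the two approaches agree only when errors are multiplicative, and the R^{2} returned here is on the log scale, so it is not comparable to the values reported in the text. Province fixed effects are omitted, and the function name, the scaling of `x` and the toy values are all assumptions.

```python
import math


def fit_log_linear(x, y):
    """OLS fit of log(y) = log(c) + beta * x.

    Returns (c, beta, r_squared), where r_squared is measured on the log scale.
    """
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired observations")
    if any(v <= 0 for v in y):
        raise ValueError("log transform requires strictly positive counts")
    log_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(x) / n
    mean_ly = sum(log_y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_ly) for xi, li in zip(x, log_y))
    beta = sxy / sxx
    log_c = mean_ly - beta * mean_x
    ss_res = sum((li - (log_c + beta * xi)) ** 2 for xi, li in zip(x, log_y))
    ss_tot = sum((li - mean_ly) ** 2 for li in log_y)
    return math.exp(log_c), beta, 1.0 - ss_res / ss_tot


# Hypothetical noiseless data: x is a scaled outflow measure (assumed units),
# y is generated from known parameters so the fit can be checked.
x = [0.1, 0.5, 1.0, 2.0, 3.0]
true_c, true_beta = 5.0, 1.2
y = [true_c * math.exp(true_beta * xi) for xi in x]

c_hat, beta_hat, r2 = fit_log_linear(x, y)
print(f"c = {c_hat:.3f}, beta = {beta_hat:.3f}, log-scale R^2 = {r2:.4f}")
```

Prefectures with zero confirmed cases cannot be log-transformed, which is one reason a nonlinear least-squares fit on the raw counts may be preferable in practice.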
The logic behind this convergence over time, as well as the predictive strength of the model, is that population flow from Wuhan to other prefectures fundamentally determines the eventual distribution of total infections in China. During the earliest phase of the outbreak, before the quarantine of Wuhan, there was a relative lack of awareness of the virus and few countermeasures preventing its spread. SARS-CoV-2 should thus have spread relatively randomly across the entire prefecture of Wuhan; that is, our results imply that the number of infected people was uniformly distributed (statistically speaking) in the population outflowing from Wuhan into different prefectures across the country.
Using the daily predicted cases in model (1), we are also able to calculate a daily risk score for prefectures based on the difference between the number of predicted and confirmed cases on any given date (Supplementary Information). A higher-than-expected level of infection suggests more community transmission (that is, ‘underperforming’ compared to the benchmark derived from the outflow population from Wuhan). On the other hand, ‘overperforming’ prefectures, with fewer cases than expected, are also noteworthy, as they could have implemented highly successful public health measures (or may be prone to inaccurate data reporting). For example, Extended Data Fig. 4 identifies prefectures with transmission risk index values above the upper bound of the 90% confidence interval on 29 January, and the crossing of this threshold was indeed associated with imminent quarantine. The predictive strength of aggregate population flow from Wuhan and the overall fit of model (1) over time can also act as an early warning index of an epidemiological transition; they reflect the degree to which imported infections are dominant at any point in time. If model strength decreases significantly at any location, this may indicate that community transmission may be overtaking imported cases.
We next developed a spatiotemporal model to explore changes in distribution and growth of COVID-19 across all prefectures over time (rather than on individual dates) (Supplementary Information 3.2). We use a Cox proportional hazards framework and replace the constant scaling parameter of model (1) with a time-varying hazard rate function λ_{0}(t), which has the S-shaped form (for example, logistic, generalized logistic or Gompertz functions^{26,27}) that epidemics typically follow:
\[ \lambda(t \mid x_{i}) = \lambda_{0}(t) \prod_{j=1}^{m} e^{\beta_{j} x_{ji}} \, e^{\sum_{k=1}^{n} \lambda_{k} I_{ik}} \qquad (2) \]
in which λ(t | x_{i}) is the hazard function describing the number of cumulative confirmed cases at time t given population outflow from Wuhan to prefecture i, and other variables x_{i} = {x_{1i}, x_{2i}, …, x_{mi}} are the realized values of the covariates for prefecture i; the other notation is the same as model (1).
This model extends our risk source model to a dynamic context; it incorporates all infected cases across all locales and dates to statistically derive the COVID-19 epidemic curve and growth pattern across mainland China. We used the same method as before to estimate the parameters (Supplementary Information). When using only the single variable of total population outflow from Wuhan (from 1 to 24 January 2020) to each other prefecture, we observe R^{2} = 0.927 for the exponential–logistic model (Fig. 3a); the inclusion of local population and GDP increases R^{2} to 0.957 (alternate models are in Supplementary Table 5).
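To make the structure of the exponential–logistic variant concrete, the sketch below evaluates a model (2)-style prediction with a logistic baseline λ_{0}(t) = K / (1 + e^{−r(t − t_{0})}). The parameterization (K, r, t_{0}), every numeric value and the function names are assumptions chosen for illustration; they are not the fitted estimates, which are reported in the Supplementary Information.

```python
import math


def logistic_baseline(t, K, r, t0):
    """S-shaped baseline: K / (1 + exp(-r * (t - t0)))."""
    return K / (1.0 + math.exp(-r * (t - t0)))


def predicted_cases(t, x, betas, K, r, t0, province_effect=0.0):
    """Model (2)-style prediction: baseline(t) * exp(sum_j beta_j * x_j + lambda_k)."""
    if len(x) != len(betas):
        raise ValueError("x and betas must have the same length")
    linear = sum(b * v for b, v in zip(betas, x)) + province_effect
    return logistic_baseline(t, K, r, t0) * math.exp(linear)


# Hypothetical parameter values, chosen only for illustration.
params = dict(betas=[0.8], K=50.0, r=0.3, t0=12.0)
days = range(0, 27)

# Two hypothetical prefectures that differ only in their (scaled) outflow.
curve_high = [predicted_cases(t, [3.0], **params) for t in days]
curve_low = [predicted_cases(t, [1.0], **params) for t in days]

# Proportional structure: the ratio of the two curves is constant over time.
print(curve_high[0] / curve_low[0], curve_high[-1] / curve_low[-1])
```

Because the covariates enter only through the multiplicative term, every prefecture shares the same baseline shape and differs by a constant factor, which is what allows the fitted curve to serve as a common benchmark.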
We use a similar logic as above to contrast the expected and observed outcomes to gauge epidemiological risk. Here, model predictions serve as reference patterns across time (Extended Data Figs. 5, 6). The differences in the growth trends between the number of predicted and confirmed cases can signal higher levels of SARS-CoV-2 community transmission. We use the integral of the differences over time to create a total transmission risk index (normalized by subtracting the mean and dividing by the standard deviation) and identify a list of prefectures above and below the 90% confidence interval (Extended Data Fig. 7 and Supplementary Table 11). Indeed, our model identifies a list of statistically significant underperforming prefectures; in most of these cases, we observed the subsequent imposition of quarantine (Extended Data Figs. 5, 6, Supplementary Information and Supplementary Table 12). On the other hand, prefectures with lower trends than expected might have had more successful public health measures. Figure 3b shows the dynamic shifts in the risk index score for selected prefectures, which enables the monitoring of prefectures to analyse which prefectures performed better in controlling the transmission risk over time.
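The index construction described above can be sketched as follows: integrate the gap between confirmed and predicted cumulative cases over time, standardize across prefectures, and flag values outside a two-sided 90% band. The trapezoidal rule with daily steps, the sample standard deviation, and the normal-approximation cut-off of ±1.645 are assumptions made for this sketch, as are the six hypothetical prefectures A to F and their counts; the exact procedure is given in the Supplementary Information.

```python
import statistics

Z_90 = 1.645  # assumed two-sided 90% cut-off under a normal approximation


def integrated_gap(confirmed, predicted):
    """Trapezoidal integral (daily steps) of confirmed minus predicted cases."""
    diffs = [c - p for c, p in zip(confirmed, predicted)]
    return sum((a + b) / 2 for a, b in zip(diffs, diffs[1:]))


def risk_index(gaps):
    """Standardize integrated gaps: subtract the mean, divide by the sample SD."""
    values = list(gaps.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return {name: (gap - mean) / sd for name, gap in gaps.items()}


# Hypothetical (confirmed, predicted) cumulative series over four days.
series = {
    "A": ([10, 30, 70, 120], [10, 20, 30, 40]),
    "B": ([5, 10, 15, 20], [5, 10, 15, 20]),
    "C": ([8, 15, 22, 30], [8, 16, 24, 32]),
    "D": ([3, 6, 10, 13], [3, 6, 9, 12]),
    "E": ([12, 20, 29, 40], [12, 22, 31, 41]),
    "F": ([7, 14, 20, 27], [7, 13, 20, 26]),
}

gaps = {name: integrated_gap(c, p) for name, (c, p) in series.items()}
index = risk_index(gaps)
high = sorted(n for n, z in index.items() if z > Z_90)   # more cases than benchmark
low = sorted(n for n, z in index.items() if z < -Z_90)   # fewer cases than benchmark
print(high, low)
```

With these toy numbers only prefecture A exceeds the upper cut-off; in the terminology of the text it would be the ‘underperforming’ locale that warrants closer monitoring.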
In summary, using detailed mobile-phone geolocation data to compute aggregate population movements, we track the transit of people from Wuhan to the rest of mainland China up to 24 January 2020. The geographical flow of people anticipates the subsequent location, intensity and timing of outbreaks in the rest of mainland China up to 19 February 2020. These data outperform other measures, such as population size, wealth or distance from the risk source. We modelled the epidemic curves of COVID-19 across different locales using population flows and showed that deviations from model predictions served as tools to detect the burden of community transmission.
The logic of our population-flow-based risk source model differs from classic epidemiological models that rely on assumptions regarding population mixing, population compartment sizes and viral properties. By assuming that risk arises from human population movements, our risk source model is able to parsimoniously capture the distribution of the epidemic. The model has several advantages: it makes no assumptions regarding travel patterns or effective distance effects; allows for nonlinear estimations; generates a non-arbitrary, source-linked risk score; and is easily adapted to other empirical contexts. Notably, the multiplicative functional form can also accommodate multiple risk sources, for example, for countries in which there are multiple disease epicentres. As an example, we evaluated the distinct impact of population flow from Hubei (excluding Wuhan) as an alternative risk source in our models, and found that it had little impact on the spread and growth of COVID-19 in the country (Supplementary Tables 6, 10).
We focused on the relative strength of the outbreak in each area, rather than the absolute number of cases, although one can predict the number of cases by using reported data to calibrate the parameters of the model. A key contribution of our approach is to robustly characterize the structure or relative distribution of cases across different geographical areas and over time, which is driven fundamentally by the cumulative outflow from Wuhan. Moreover, another benefit is that non-systematic inaccuracy of COVID-19 case-finding is relatively unimportant as long as we capture the distribution of population flow accurately over time, which we do.
Our approach is generalizable to any dataset that captures population movements (for example, train-ticketing or car-tolling data). This method can also be implemented in a live fashion (if suitable data are available) to facilitate policy decisions, for example, the allocation of resources and manpower across specific geographical locales based on the predicted strength of the epidemic. This could also yield a dynamic performance metric when contrasted against real-time reports of infections, and, as we show, identify which areas have higher virus transmission risk or more effective measures.
Other techniques to forecast the levels of an epidemic in defined populations in advance have, of course, been proposed—whether the use of online search behaviour^{22,23,24} or the use of network sensors (that is, the monitoring of people who are at heightened risk of falling ill given their network position)^{28}. Our approach relies on data regarding population flow. Indeed, historical (that is, baseline) information about population flows—undisturbed by the imposition of quarantines or by publicity regarding outbreaks, both of which happened here—could also be valuable to public health experts and government officials when new outbreaks occur.
When people move, they take contagious diseases with them. Their movements are thus a harbinger of the future status of an epidemic, and this offers the prospect of using data-analytic techniques to control an epidemic before it strikes too hard.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Data availability
Data necessary to reproduce the primary results of this study are included in the Article and its Supplementary Information.
Code availability
Code necessary to reproduce the primary results of this study is included in the Article and its Supplementary Information.
References
1. Colizza, V., Barrat, A., Barthélemy, M. & Vespignani, A. The role of the airline transportation network in the prediction and predictability of global epidemics. Proc. Natl Acad. Sci. USA 103, 2015–2020 (2006).
2. Halloran, M. E. et al. Ebola: mobility data. Science 346, 433 (2014).
3. Brockmann, D. & Helbing, D. The hidden geometry of complex, network-driven contagion phenomena. Science 342, 1337–1342 (2013).
4. Balcan, D. et al. Multiscale mobility networks and the spatial spreading of infectious diseases. Proc. Natl Acad. Sci. USA 106, 21484–21489 (2009).
5. Brockmann, D., Hufnagel, L. & Geisel, T. The scaling laws of human travel. Nature 439, 462–465 (2006).
6. González, M. C., Hidalgo, C. A. & Barabási, A.-L. Understanding individual human mobility patterns. Nature 453, 779–782 (2008).
7. Onnela, J.-P., Arbesman, S., González, M. C., Barabási, A.-L. & Christakis, N. A. Geographic constraints on social network groups. PLoS ONE 6, e16939 (2011).
8. Lu, X., Bengtsson, L. & Holme, P. Predictability of population displacement after the 2010 Haiti earthquake. Proc. Natl Acad. Sci. USA 109, 11576–11581 (2012).
9. Yan, X.-Y., Wang, W.-X., Gao, Z.-Y. & Lai, Y.-C. Universal model of individual and population mobility on diverse spatial scales. Nat. Commun. 8, 1639 (2017).
10. Csáji, B. C. et al. Exploring the mobility of mobile phone users. Physica A 392, 1459–1473 (2013).
11. Wesolowski, A. et al. Quantifying the impact of human mobility on malaria. Science 338, 267–270 (2012).
12. Adda, J. Economic activity and the spread of viral diseases: evidence from high frequency data. Q. J. Econ. 131, 891–941 (2016).
13. Viboud, C. et al. Synchrony, waves, and spatial hierarchies in the spread of influenza. Science 312, 447–451 (2006).
14. Wu, J. T., Leung, K. & Leung, G. M. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet 395, 689–697 (2020).
15. Wu, J. T. et al. Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China. Nat. Med. 26, 506–510 (2020).
16. Chinazzi, M. et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science 368, 395–400 (2020).
17. Du, Z. et al. Risk for transportation of coronavirus disease from Wuhan to other cities in China. Emerg. Infect. Dis. 26, 1049–1052 (2020).
18. Li, R. et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 368, 489–493 (2020).
19. Wu, F. et al. A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020).
20. Zhu, N. et al. A novel coronavirus from patients with pneumonia in China, 2019. New Engl. J. Med. 382, 727–733 (2020).
21. Chan, J. F.-W. et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. Lancet 395, 514–523 (2020).
22. Ginsberg, J. et al. Detecting influenza epidemics using search engine query data. Nature 457, 1012–1014 (2009).
23. Lazer, D., Kennedy, R., King, G. & Vespignani, A. The parable of Google flu: traps in big data analysis. Science 343, 1203–1205 (2014).
24. Viboud, C. & Vespignani, A. The future of influenza forecasts. Proc. Natl Acad. Sci. USA 116, 2802–2804 (2019).
25. Massey, D. S. & España, F. G. The social process of international migration. Science 237, 733–738 (1987).
26. Bürger, R., Chowell, G. & Lara-Díaz, L. Y. Comparative analysis of phenomenological growth models applied to epidemic outbreaks. Math. Biosci. Eng. 16, 4250–4273 (2019).
27. Roosa, K. et al. Short-term forecasts of the COVID-19 epidemic in Guangdong and Zhejiang, China: February 13–23, 2020. J. Clin. Med. 9, 596 (2020).
28. Christakis, N. A. & Fowler, J. H. Social network sensors for early detection of contagious outbreaks. PLoS ONE 5, e12948 (2010).
Acknowledgements
We thank a major unnamed national carrier for providing the anonymised and aggregated data enabling the computation of population movements. J.J. is supported by the National Natural Science Foundation of China (72042009 and 71490722) and Shenzhen Institute of Artificial Intelligence and Robotics for Society (2020INT001). J.S.J. is supported by the Research Grants Council of Hong Kong (14505217). X.L. is supported by the National Natural Science Foundation of China (82041020, 91846301, 71771213, 71901067 and 61773120) and the Science and Technology Department of Sichuan Province (2020YFS0007). G.X. is supported by the National Natural Science Foundation of China (71704052 and 91846301) and Hunan Provincial Key Laboratory of New Retailing Virtual Reality Technology (2015TP). We thank staff at the telecom carrier for their assistance in data preparation. This work was deemed exempt from IRB review.
Author information
Contributions
All authors contributed equally to the paper. J.S.J., J.J. and N.A.C. conceived the research. J.J., Y.Y., X.L., J.S.J. and G.X. analysed the data. J.S.J. and N.A.C. wrote the paper. J.J., J.S.J. and X.L. obtained funding. All authors contributed to research design, analytical development and critical revisions.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Nature thanks Jukka-Pekka Onnela and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Time-window sensitivity test for the correlational analysis.
a, b, Pearson’s correlation (n = 296 prefectures) between the cumulative number of confirmed cases and population outflow from Wuhan on different days ranging from 1 to 14 days before 24 January 2020 for the cumulative number of diagnosed cases over time (a) and the number of newly diagnosed (daily) cases over time (b). Daily outflow is used for the calculation; for example, t = 3 indicates that the correlation is measured by daily outflow from Wuhan on 21 January 2020 with the cumulative number of confirmed cases from 24 January 2020 onwards. c, d, Pearson’s correlation (n = 296 prefectures) during 3 different (8-day) time periods from 1 to 24 January 2020 between population outflow and the cumulative number of diagnosed cases over time (c) and the number of newly diagnosed (daily) cases over time (d).
Extended Data Fig. 2 Correlation with alternative population movement measures.
a, b, Pearson’s correlation (n = 296 prefectures) between alternative publicly available movement measurements from the 2018 City/Prefectures Statistical Year Book of China (with aggregate population outflow data from Wuhan from 1 to 24 January 2020 as a reference) and COVID-19 count using the cumulative number of confirmed cases over time (a) and the number of daily confirmed cases over time (b). Foreign tourist, domestic tourist, and ‘highway, airway and waterway passenger’ numbers reflect inter-prefecture travel, while bus passengers and the number of taxis reflect local travel.
Extended Data Fig. 3 Search terms and correlation with confirmed cases.
a, Search frequency of Baidu search terms related to the COVID-19 outbreak: the search terms are direct translations of the Chinese keywords that Baidu users used during the study period (note the official WHO name ‘COVID-19’ was only announced on 11 February 2020). b, Pearson’s correlation (n = 296 prefectures) between Baidu search terms and the (cumulative) number of confirmed cases of COVID-19 over time. The initially high and then decreasing predictive strength of search may reflect the fact that, initially, high volumes of information search about the virus signalled stronger risk perception in any given prefecture (for example, because of early reported cases, having more relatives in Wuhan, and so on), but that, over time, information saturation reduced the impetus for specific searches.
Extended Data Fig. 4 Prefectures with a high transmission risk index on 29 January 2020.
The predicted structure of the spread of the SARS-CoV-2 virus can be used as a benchmark to identify which locales deviate significantly. As model (1) predicts the number of cases in a prefecture based on the population outflow from Wuhan (that is, imported cases and the initial transmission of the virus within the local community), a greater difference between predicted and confirmed cases indicates a higher level of community transmission. Prefectures to the left of the dashed line have community transmission risk index values that were higher than the upper bound of the 90% confidence interval. Our model identified Wenzhou as having the most severe community transmission risk on 29 January 2020; the government announced a full quarantine of the prefecture on 2 February 2020.
Extended Data Fig. 5 Benchmark (predicted) versus actual virus growth in the prefectures of Hubei province.
Model (2) used aggregate population outflow from Wuhan from 1 to 24 January 2020 to provide a reference growth pattern (that is, epidemic curves) for the spread of COVID-19 across time and space, without making a priori assumptions about the growth pattern or mechanism. Differences in the growth trends between predicted and confirmed cases can signal higher levels of COVID-19 community transmission (Supplementary Table 11). The discrete jumps in confirmed cases in some prefectures after 13 February 2020 reflected a change in the infection count criteria of local governments; clinically diagnosed cases came to be included in total confirmed case counts in those prefectures (within Hubei province).
Extended Data Fig. 6 Benchmark (predicted) versus actual virus growth in selected prefectures outside of Hubei province.
Model (2) used aggregate population outflow from Wuhan from 1 to 24 January 2020 to provide a reference growth pattern (that is, epidemic curves) for the spread of COVID-19 across time and space, without making a priori assumptions about the growth pattern or mechanism. Differences in the growth trends between predicted and confirmed cases can signal higher levels of COVID-19 community transmission (Supplementary Table 11).
Extended Data Fig. 7 The distribution of the transmission risk index.
The transmission risk index \(({\overline{\varDelta }}_{i})\) is the normalized score of the integral of the differences between the actual number of confirmed infected cases and predicted numbers in our model. Prefectures above the 90% confidence interval of the index are likely to experience more local community transmission than imported cases, and prefectures below the 90% confidence interval may have a better performance in the control of the virus (Supplementary Table 11).
Extended Data Fig. 8 Robustness check of model (2) with different time lags and time-window lengths.
We explored which time window and time lags of aggregate population outflow best explain the spread and intensity of COVID-19. Time window refers to how many days of outflow data were used; time lag (0 to 23) is how many days before 24 January 2020 the time window starts. For example, analyses using time lag = 1 and time window = 2 use outflow data between 23 and 24 January 2020. The surfaces show that a more recent time lag improves the R^{2} (a) as well as the parameter value (b) of the population outflow coefficient in model (2).
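The lag and window definitions in this caption can be read as simple date arithmetic. The helper below is a hypothetical illustration of that reading, in which the window starts `time_lag` days before 24 January 2020 and spans `time_window` consecutive days, both endpoints included; the function name and the validation rules are assumptions.

```python
from datetime import date, timedelta

LAST_DAY = date(2020, 1, 24)  # final day of the outflow observation period


def outflow_window(time_lag, time_window):
    """Return (start, end) dates, inclusive, for a given time lag and window length."""
    if not 0 <= time_lag <= 23:
        raise ValueError("time_lag must be between 0 and 23")
    if time_window < 1:
        raise ValueError("time_window must be at least 1 day")
    start = LAST_DAY - timedelta(days=time_lag)
    end = start + timedelta(days=time_window - 1)
    if end > LAST_DAY:
        raise ValueError("window extends beyond 24 January 2020")
    return start, end


# The caption's own example: time lag = 1, time window = 2.
print(outflow_window(1, 2))
```

Under this reading, time lag = 23 with time window = 24 recovers the full 1–24 January observation period.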
Supplementary information
Supplementary Information
This file provides detailed methods, robustness checks, regression output, supplementary analyses, and code needed to reproduce our analyses.
Supplementary Data
Data file for the primary analyses reported in the paper, combining aggregate population outflow data from Wuhan from 1–24 January 2020 with cumulative COVID-19 case counts as of 19 February 2020 and other data for 296 prefectures in mainland China.
About this article
Cite this article
Jia, J.S., Lu, X., Yuan, Y. et al. Population flow drives spatiotemporal distribution of COVID-19 in China. Nature 582, 389–394 (2020). https://doi.org/10.1038/s41586-020-2284-y
DOI: https://doi.org/10.1038/s41586-020-2284-y