Real-time tracking and prediction of COVID-19 infection using digital proxies of population mobility and mixing

Digital proxies of human mobility and physical mixing have been used to monitor viral transmissibility and effectiveness of social distancing interventions in the ongoing COVID-19 pandemic. We develop a new framework that parameterizes disease transmission models with age-specific digital mobility data. By fitting the model to case data in Hong Kong, we are able to accurately track the local effective reproduction number of COVID-19 in near real time (i.e., no longer constrained by the delay of around 9 days between infection and reporting of cases) which is essential for quick assessment of the effectiveness of interventions on reducing transmissibility. Our findings show that accurate nowcast and forecast of COVID-19 epidemics can be obtained by integrating valid digital proxies of physical mixing into conventional epidemic models.

T racking the spread of COVID-19 infection in real time has been an elusive goal, given the necessary delay between infection and reporting. This delay consists of the incubation period (around 6 days), time between symptom onset and diagnosis (around 3 days), and the duration between confirmation and reporting (around half day) 1,2 . Therefore, there is around 9 days of delay even with instantaneous updating of case reports, assuming that testing is adequate to capture a consistent proportion of cases.
We previously estimated that possibly 44% of all COVID-19 infection events were pre-symptomatic transmission, i.e., during the last 2-3 days of the index-case incubation period; and 95% of all transmission would have taken place by day 5 after symptom onset 3 . Moreover, COVID-19, like SARS and MERS previously, show remarkable clustering and potential for superspreading events 4 . These are especially important if they take place in highrisk settings, such as nursing homes and institutional contexts. Taken together, it remains an urgent priority to develop new analytics that would allow truly real-time monitoring of transmissibility, thus the application of timely public health interventions in mitigation.
Digital proxies of human mobility and physical mixing have been shown to provide useful insights into disease transmission [5][6][7][8][9][10][11] . Specifically, during the ongoing COVID-19 pandemic, data of human mobility and mixing have been used to monitor the effectiveness of social distancing interventions 5,9,10 . Here, using COVID-19 in Hong Kong as an example, we describe a framework that integrates such digital proxies into conventional epidemic models to (i) track transmissibility in near real time; and (ii) generate nowcast and short-term forecast of the pandemic.

Results
Transmissibility of COVID-19 in Hong Kong. The first imported and local case of COVID-19 in Hong Kong was confirmed on January 23, 2020, and January 30, 2020, respectively. In response to the pandemic, Hong Kong imposed progressively more restrictive interventions on inbound travelers thereafter. As such, the transmissibility of imported and local cases was expected to be inherently different and should be estimated separately, hence we stratified the local cases in Hong Kong into secondary cases of local and imported cases (based on their epidemiologic linkages documented in the COVID-19 surveillance database compiled by the Government Center for Health Protection) and estimated the transmissibility of local and imported cases over time (see "Methods" for details).
To estimate the transmissibility of COVID-19 in Hong Kong, we first approximated the epidemic curves by date of infection by deconvoluting the epidemic curves by date of onset with the incubation period [12][13][14] . We then applied the method described in Thompson et al. 15 to estimate the instantaneous effective reproduction number (R t ), which was defined as the mean number of secondary infections generated by a typically infectious case at time t. We call the resulting estimates of R t for local cases our empirical R t estimates. We used simulations to verify that this method for generating empirical R t estimates was generally accurate, except when the daily case count fell below ten ( Supplementary Fig. 1).
The empirical R t estimate was around 2.5 when community transmission first began in mid-January. R t then dropped rapidly to around 1 during late-January after Wuhan was locked down on 23 January and stringent nonpharmaceutical interventions were implemented across all major cities in mainland China (Fig. 1). R t hovered around 1 in February, during which aggressive physical distancing measures, such as "work-from-home" among civil servants as well as many other businesses, were implemented. Due to the relaxation of these measures and the importation of cases among more than 75,000 returnees from infected countries during the first week of March 16 , R t rebounded sharply to mid-January levels (i.e., around 2.5) within a week. As the cases generated by such increase in transmission began to be registered by surveillance during the second week of March (i.e., around 9 days of infection-to-reporting delay), R t began to drop, probably due to spontaneous resumption of physical distancing measures among the general public in response to the observed increase in community transmission. NPIs such as "work-from-home" were reintroduced on 21 March and R t continued to drop thereafter to subcritical levels (i.e., <1) in April.
Digital proxies for population mixing. The basic premise of our framework was that intracity mobility and physical mixing relevant to the local spread of COVID-19 in Hong Kong could be gauged by the digital transactions made on Octopus cards, which are ubiquitously used by the Hong Kong population for their daily public transport and small retail payments (https://www. octopus.com.hk/tc/consumer/index.html). We demonstrated the validity of our premise by calculating the Pearson's correlation coefficients between these digital proxies and the posterior mean of the empirical R t estimates.
The empirical R t estimates were highly correlated with the number of Octopus transactions for transport: the Pearson's correlation coefficients between the two were r = 0.62, 0.68, 0.80, and 0.76 for children, students, adults, and the elderly, respectively ( Fig. 2 and Supplementary Fig. 2). These results support our premise that Octopus transport transactions were valid digital proxies for population mixing. The correlation between Octopus retail transactions and the empirical R t estimates was low except for fast-food retail among adults (r = 0.71, which was still lower than that for adult transport). Consequently, we did not use retail transactions as digital proxies.
Integrating the digital proxies into an epidemic model. Assuming that Octopus transport transactions were valid digital proxies for population mixing, we integrated them into an agestructured (0-11, 12-18, 19-64, and ≥65 years) susceptible (S)infectious (I)-removed (R) model, assuming that the contact matrix can be parameterized by optimally scaling the age-specific digital proxies (see "Methods"). We inferred these scaling factors and other model parameters by fitting the model to the epidemic curve of local cases between January 22, 2020 and May 31, 2020 (the date of symptom onset of the last case was 28 April, but this case was reported on 31 May and there were no cases reported between 15 and 31 May) 17,18 .
The R t estimates from the fitted model (i.e., the dominate eigenvalue of the next-generation matrix in the fitted model at time t) had a significantly higher correlation with the empirical R t estimates than the constituent digital proxies (r = 0.98, Fig. 3). The deviations between the two in late-February (during which empirical R t was lower) and early-March (during which empirical R t was higher) might be due to low case counts (which tends to cause oscillations in the empirical R t estimates; see Supplementary  Fig. 1). We estimated that only 23% (13-47%) of all local COVID-19 infections in Hong Kong had been ascertained by the official surveillance system ( Supplementary Fig. 6). We performed the same model fitting at five earlier time points of the epidemic: 2 March, 14 March, 17 March, 22 March, and 4 April. The posterior distributions of the scaling factors that translate the digital proxies into the contact matrix were largely the same over the course of the epidemic (Supplementary Fig. 3). That is, the inferred mechanistic dependence of transmission dynamics on the digital proxies was stable over time, giving credence to the epidemiologic validity of our framework. Incorporating household contact patterns into our framework did not improve its performance (see Supplementary Information), probably because local cases have been generated mostly by community transmission.
Nowcasting and forecasting the spread of COVID-19 in Hong Kong. The traditional framework of generating empirical R t estimates from epidemic curves could not provide real-time estimates of R t because there was an inevitable delay of around 9 days between infection and case reporting on average (e.g., 6 days of incubation period plus 3 days of lead time between symptoms onset and case reporting). In contrast, our framework could be used to translate the digital proxies (which can be autonomously compiled by Octopus daily) into real-time estimates of R t . More importantly, the fitted model could be used to (i) nowcast the epidemic (i.e., estimating the number of cases that have already been generated but not yet registered by surveillance due to the infection-to-reporting delay); and (ii) generate shortterm epidemic forecast by making assumptions on how the digital proxies (i.e., population mixing) might evolve over the forecast time horizon.
We used the metrics proposed in Funk et al. (namely sharpness, bias, ranked probability score (RPS), Dawid-Sebastiani score (DSS), and absolute error (AE)) to assess the predictive performance of our forecasts ( Supplementary Fig. 7) 19 . Sharpness, RPS, DSS, and AE dropped progressively between 30 January and 28 February, suggesting that the performance of our forecast model was improving as more mobility and case data became available ( Supplementary Fig. 7). However, all metrics suggested that the forecasts were less accurate on 29 February, 15-17 March, and 22-23 March ( Fig. 3 and Supplementary Fig. 8, in the sense that the observed case counts were near or outside the 95% prediction intervals). These prediction errors were likely due to the occurrence of superspreading events, which tend to result in more explosive growth in incidence 20 , and the stochastic effects in an otherwise low-prevalence setting. Specifically, our forecast underestimated the number of new onsets on 29 February which comprised a cluster from a religious group and another cluster seeded by returnees on the Diamond Princess Cruise. Similarly, our forecast underestimated incidence on 22-23 March which comprised a large cluster from a bar setting 17 . Because SSEs were rare in the training data and not explicitly modeled in our framework, the forecast model underestimated incidence when SSEs occurred. On the other hand, our forecast overestimated incidence on 15-17 March probably because the true prevalence was very low (on the order of tens) and the associated stochasticity in transmission dynamics resulted in an epidemic take-off that occurred slower than predicted by our deterministic framework. Notwithstanding the prediction errors attributed to SSEs and very low prevalence, our framework generated largely robust epidemic nowcasts during February-April: the estimated number of cases and its 95% prediction intervals from the fitted model (which were sufficiently tight for practical purposes) provided a reasonably robust inference of the number of cases that had already been generated but not yet registered by surveillance due to the infection-to-reporting delay (Fig. 3). Under the assumption that population mixing would remain at status quo for 6 days following the time of prediction, our epidemic forecasts were congruent with the observed epidemic curve during February-April except when SSEs occurred or when prevalence was very low (Fig. 3).
Finally, although the R t estimates and scaling factors for contact matrix parameterization were sensitive to assumptions regarding the generation time distribution (as expected), the accuracy and precision of the nowcasts and forecasts were unaffected (Supplementary Figs. 9 and 10).

Discussion
Epidemic dynamics of directly transmitted infectious diseases, including COVID-19, is shaped by contact patterns which can fluctuate substantially over time due to spontaneous behavioral changes in physical mixing among the general public as well as interventions mandated by health officials (e.g., different components of "lockdowns") 21 . Although the conventional method of using social contact surveys to gauge contact patterns has provided valuable insights into the epidemiology of many infectious diseases (e.g., influenza, varicella, the current COVID-19 pandemic, etc.) [22][23][24] , it might be difficult to acquire real-time updates of population mixing on a daily basis in the context of epidemic surveillance, especially in settings with limited resources. Mobile and location-based technology offer a complementary and efficient solution-the digital footprints of human mobility and activities registered by ubiquitously used mobile platforms (e.g., Octopus cards in Hong Kong, WeChat, and Alipay in mainland China, Oyster cards in the UK, Facebook and Google in the US, etc.) can be harnessed to generate near real-time proxies of population mixing with very high frequency and spatiotemporal resolution at very low cost. In this study, we have illustrated how accurate nowcast and short-term forecast of COVID-19 epidemics can be obtained using epidemic models parameterized with valid digital proxies even when population mixing was varying widely on a weekly or even daily basis.
The robustness of such a framework hinges on the identification of proxies that can provide representative and epidemiologically valid descriptions of human mobility and mixing among different age groups over time. The correlation between Octopus transactions and empirical R t estimates would be weaker if the former were not stratified by age and transaction categories ( Supplementary Fig. 2). Other digital proxies for human mobility in Hong Kong such as CityMapper 25 and Google's community mobility reports 26 (Supplementary Fig. 4) had a lower correlation with empirical R t estimates than Octopus transport transactions, probably because Octopus has much higher uptake across all age groups and areas. Similarly, we found that Baidu's transport data for Beijing, Shanghai, Shenzhen, and Wenzhou (the four Chinese cities for which empirical R t estimates have been reported in our previous study 27 ) also correlated well with the respective local R t ( Supplementary Fig. 5). In principle, jurisdictions that can track R t in near real-time would be able to adopt a reinforcement learning approach to optimize their intervention portfolios via rapid iterative cycles of transmissibility assessments and portfolio readjustments.
Our methods leveraged aggregate data instead of individuallevel data from mobile phone usage. While individual-level data could probably provide more information to improve the temporal and spatial resolution of COVID-19 transmissibility, the use of personally identifiable information must be considered more thoughtfully to avoid the associated social, ethical and legal challenges 6 . Aggregate data of population mobility and activity, such as the Octopus card usage and Baidu's intracity traffic indices, are by default anonymized and therefore should not elicit privacy concerns.
Our current framework has several limitations. First, daily usage of Octopus cards among adults is much higher than that among children (e.g., public transportation is free for children aged below 3 years) and the elderly (e.g., the daily activities of many seniors tend to occur within walking distance of their neighborhood). As such, we posit that the explanatory and predictive power of our framework could be enhanced by including proxies that are more specific to young children as well as older adults (e.g., relative changes in customer volumes of "dim sum" restaurants that are regularly patronized by the elderly in Hong Kong).
Second, it is unlikely that big data such as Octopus transactions can reflect physical mixing within households. However, this seems to have little impact on how well our framework could nowcast and forecast the COVID-19 epidemic in Hong Kong, probably because community transmission has so far been the major driving force for local spread of COVID-19. Nonetheless, our framework should be extended to deal with more general transmission scenarios by integrating the community contact patterns inferred from digital proxies with household contact patterns obtained from conventional social contact surveys.
Third, our framework tracked real-time changes in physical mixing but not temporal changes in the probability that these contacts conduce disease transmission. The latter might depend strongly on infection-prevention behaviors (e.g., wearing of masks and heightened personal hygiene) which can vary substantially as public sentiments regarding epidemic control fluctuate. Again, this seems to have little impact on the performance of our framework in this study, probably because infectionprevention precaution has consistently remained at very high levels among the general public in Hong Kong since the first emergence of COVID-19 in January 2020 16 . Future refinements of our framework should include tracking of factors that might affect transmissibility other than population mixing, which includes infection-prevention behaviors of the general public as well as climatic conditions 28 .
Fourth, our framework is based on changes in population mixing and therefore might not be able to provide accurate epidemic forecast when there are not enough data from large clusters or superspreading events to train the model (Fig. 3). Changes in population mixing correlate with the average disease transmissibility well, but the occurrence and scale of superspreading events are often associated with substantial stochasticity. To improve our framework, estimation of the scale of superspreading events should be integrated into the model once the relevant data are available from outbreak investigations.
Fifth, our framework is based on deterministic epidemic dynamics and hence is not yet designed to cope with stochasticity associated with very low disease prevalence (e.g., at fadeout levels). To generate a more accurate nowcast and forecast in such epidemic settings, the disease transmission model and inference of its parameters in our framework should be extended to account for stochastic epidemic dynamics 29 .
In conclusion, we have shown that digital proxies of population mobility and mixing can be integrated into conventional epidemic models to nowcast and forecast epidemics. As the penetration of mobile and location-based technology continues to rise globally in the future, many jurisdictions would be able to harness various streams of real-time big data to generate timely and robust epidemic intelligence. Indeed, in the context of COVID-19 control, such data are already readily accessible in countries such as China (e.g., WeChat and Alipay transactions) for designing policies that maximize economic productivity while maintaining the effective reproductive number below 1 until safe and effective vaccines become widely accessible.
Methods COVID-19 data in Hong Kong. The daily number of confirmed COVID-19 cases in Hong Kong by date of symptoms onset was provided by the Centre for Health Protection (CHP). Patients who met certain clinical, epidemiological, or laboratory criteria were classified as suspected, probable, or confirmed cases in Hong Kong 2 . Although some SARS-CoV-2 infections were asymptomatic at the time of reporting, for simplicity we used the term "case" to include all symptomatic and asymptomatic SARS-CoV-2 infections. A confirmed case is defined as a patient whose specimens have been laboratory-confirmed to contain virologic or serologic evidence of infection with SARS-CoV-2, irrespective of symptoms or epidemiologic linkage. The testing strategy in Hong Kong has been relatively consistent between late-January and the end of April 2 .
The CHP classified all COVID-19 cases into six categories: imported case, local case, possibly local case, epidemiologically linked with imported case, epidemiologically linked with local case and epidemiologically linked with possibly local case (https://www.coronavirus.gov.hk/eng/index.html). Transmissibility of imported and local cases was expected to be very different because intensive nonpharmaceutical interventions had been imposed on travelers arriving from COVID-19 affected regions since January 2020. Specifically, 14-day home quarantine has been mandated for all individuals arriving in Hong Kong from mainland China since 8 January, from South Korea since 25 February, from Iran and affected areas in Italy since 1 March, from Italy and affected regions in France, Germany, Japan, and Spain since 14 March, from the Schengen Area since 17 March and from all overseas countries and territories since 20 March 16 .
Octopus data in Hong Kong. The basic premise of our framework was that intracity mobility and physical mixing relevant to the local spread of COVID-19 in Hong Kong could be gauged by the digital transactions made on Octopus cards, which are ubiquitously used by the Hong Kong population for their daily public transport and small retail payments (https://www.octopus.com.hk/tc/consumer/ index.html). The Octopus cards are used by 99% of the population of Hong Kong aged 16 to 65 and the system handles more than 14 million transactions, worth over HK$180 million on a daily basis. (http://www.octopus.com.hk/en/corporate/ about-octopus/profile/services/index.html).
We obtained the daily number of Octopus transactions in two major categories, namely transport and retail, among four types of cards, namely child (for children aged 3 and 11), student (for primary, secondary school, and university students <26 years old as well as student cardholders who receive discounted fares on selected routes), adult (for non-student adults aged under 65), and elder (for older adults aged 65 years or above). The daily numbers of Octopus transactions were normalized assuming the average number of Octopus transactions of each category of each age group between January 1, 2020 and January 15, 2020 was 100%.
Model, inference, and analysis. Our analysis comprised the following steps:

Estimate the instantaneous reproductive number R t of local cases in Hong
Kong on each day t between 22 January (day 0) and 15 May (day T) in the renewal equation model built in the "EpiEstim" package. 2. Correlate the resulting time series of empirical R t estimates (posterior means) with the daily volumes of different types of Octopus transactions. Select digital traces with high correlation coefficients with R t as mobility proxies for parameterizing the contact matrix of the SIR-type epidemic model in step 3. 3. On each day t, optimize the scaling factors that translate the digital proxies into the contact matrix by fitting the epidemic model to the epidemic curve between day 0 and t. 4. On each day t, use the fitted model from step 3 to (retrospectively) nowcast and forecast the number of new cases and compare with the corresponding realized epidemic curve. We describe the details of each of these steps in the following subsections.
Step 1: estimating the instantaneous reproductive number of COVID-19 in Hong Kong. The instantaneous effective reproduction number R t is defined as the average number of secondary cases generated by cases on day t. If R t > 1, the epidemic is expanding at time t, whereas R t < 1 indicates that the epidemic size is shrinking at time t. Since the epidemic curves provided by the CHP are based on the dates of symptom onset, we used a deconvolution-based method to reconstruct the COVID-19 epidemic curves by dates of infections [12][13][14] . Specifically, we used f incubation , the probability density function (pdf) of the incubation period, to deconvolute the time series of the daily number of symptom onsets to reconstruct an epidemic curve of a daily number of new infections. We assumed the incubation period distribution was Gamma with a mean and SD of 6.5 and 2.6 days 1 .
To account for the differential transmissibility between local and imported cases, we decomposed the epidemic curves by imported cases, local cases linked with imported cases, and local cases based on the CHP case classification: imported cases include only "imported cases" defined by CHP; local cases linked with imported cases include only cases "epidemiologically linked with imported case" defined by CHP; local cases include "local case", "possibly local case", cases "epidemiologically linked with the local case", and cases "epidemiologically linked with possibly local case" defined by CHP.
We then computed R t for imported and local cases separately from the respective epidemic curves using the "EpiEstim" R package developed by Thompson et al. 15 . We called the resulting estimates of R t for local cases our empirical R t estimates. We used the default prior in "EpiEstim" R package, i.e., assuming a prior distribution for the effective reproduction number with mean of 5 and SD of 5. We assumed that the generation time distribution was gamma with (i) mean 5.2 days and coefficient of variation 0.33 in the base case; and (ii) mean 4.2 and 6.2 days with the same coefficient of variation in the sensitivity analysis. We assumed a 7-day time window in R t estimation with "EpiEstim" because the Octopus transport volume during weekends was on average 60-70% that during weekdays.
Step 2: select digital traces as proxies for population mobility and mixing in Hong Kong. We obtained the number of Octopus transactions in two major categories, namely transport and retail, among four types of cards, namely child (for children aged 3 to 11), student (for primary, secondary school, and university students who receive discounted fares on selected routes; some university students are >18 years), adult (for non-student adults aged under 65), and elder (for older adults aged 65 years or above). We used g a,c (t) to denote the normalized number of transactions for card type a and payment category c on day t (such that g a,c (t) = 1 on 1 January 2020). We calculated the Pearson's correlation coefficient between each of these digital proxies g a,c (t), and the posterior mean of our empirical R t estimates. We found that all transport transactions exhibited a correlation coefficient of 0.5 or above with R t while retail transactions exhibited a much lower correlation. As such, we selected only the age-specific transport transactions as digital proxies for population mixing in the epidemic model in step 3. Specifically, we assumed that the number of infectious contacts between age group a and b (outside households) at time t could be modeled as γ a g a,tran (t) γ b g b,tran (t) where γ a > 0 and γ b > 0 were the scaling factors for the digital proxy of age group a and b (to be inferred in step 3), respectively. Under such formulation, the average rate at which an individual in age group a made infectious contacts with age group b at time t was Step 3: optimize the weights of the digital proxies by fitting the epidemic model to the observed number of confirmed cases on each day (retrospectively). We used our previous age-structured SIR model to simulate the transmission of COVID-19 in Hong Kong 30 : where • m was the number of age groups in the population.
• S a (t) and R a (t) were the numbers of susceptible and removed individuals in age group a at time t.
• I a (t,τ) was the number of infectious individuals in age group a at time t who were infected at time t-τ.
• N a was the total number of people in age group a.

ARTICLE
• π a (t) was the force of infection on age group a at time t.
• f GT was the pdf of the generation time.
The time-varying next-generation matrix for this SIR model was: where T g was the mean generation time.
The effective reproduction number R t corresponded to the dominant eigenvalue of NGM(t) 31,32 . The incidence rate of infection and reported onsets in age group a at time t were calculated as follows: where p report was the proportion of infections ascertained by the CHP. We assumed that the epidemic was seeded by M local infections on January 22, 2020.
The set of parameters that were subject to statistical inference, which we denoted by θ, included: (i) the seed size M; (ii) the scaling factors γ a 's; and (iii) the ascertainment proportion p report . We estimated θ from the daily number of symptom onsets reported in Hong Kong (see Supplementary Table 1 for the list of inferred parameters) assuming that the observed number of cases was Poisson distributed with a mean equal to the number of cases in the fitted model. The likelihood function was Y where λ t ¼ P a A a;onset t ð Þ and n t were the expected (from the model) and observed number of reported onsets on day t. The statistical inference was performed in a Bayesian framework with noninformative (flat) priors using Markov Chain Monte Carlo. We used P t (θ) to denote the posterior distribution of θ obtained by fitting the model to epidemic data up to day t. All analyses were conducted in MATLAB 2020a and R 4.0.0.
Step 4: nowcast and forecast the daily number of confirmed cases using the fitted models. On each day t, we drew 5000 samples of θ from P t (θ) to parameterize the age-structured transmission model and simulated the number of new confirmed cases for day t (i.e., nowcast) as well as day t + 1,…, t + 6 (i.e., 6-day forecast) in each of these models. We then compared these simulated epidemic trajectories with the corresponding observed case counts in the CHP data to evaluate the predictive performance of the fitted model.
Incorporating household contacts into the framework. Conventionally, contact matrices for epidemic models of human-to-human transmissible respiratory diseases are constructed using data from social contact surveys 33 . Although we have previously conducted a social contact survey and estimated the contact matrix in Hong Kong (Supplementary Tables 2 and 3), the contact patterns outside households have likely changed substantially since then due to the COVID-19 pandemic as well as social unrest that has been ongoing since 2019 21 . Let H ab be the average number of household contacts that an individual in age group a had in age group b from our previous survey (Supplementary Table 3). As discussed in the main text, the contact matrix parameterized with digital proxies largely corresponded to social contacts outside the households that drove community transmission. In our sensitivity analysis, we assumed that the relative contact pattern within the household was the same as that in our previous survey and incorporate the household contact matrix into our framework as follows: where μ was a weight parameter subject to statistical inference.
Empirical estimates of R t in mainland Chinese cities. For Shenzhen and Wenzhou, dates of symptom onset were available for most cases. As in our analysis for Hong Kong, we used f incubation , the probability density function (pdf) of the incubation period, to deconvolute the time series of the daily number of symptom onsets to reconstruct an epidemic curve of the daily number of new infections. For Beijing and Shanghai, dates of symptom onset of many cases were not available. Thus we used f infection-reporting , the pdf of the time between infection and case reporting, to deconvolute the time series of the daily number of confirmed cases accordingly. We assumed f incubation and f onset-reporting were independent such that the pdf of the time between infection and reporting was f onsetÀreporting ðt À uÞf incubation ðuÞdu. We assumed the distribution of the time between symptom onset and reporting was Gamma with mean and standard deviation (SD) of 4.3 and 3.2 days, based on 186 cases reported in January-February 2020 in Beijing 27 , which was consistent with the time between symptom onset and case reporting since February across China from the WHO-China Joint Mission Report 34 .
In the vast majority of provinces and cities in mainland China, 14-day centralized quarantine had been mandated for individuals who had been to Hubei within 14 days from 23 January until late-April 27 . Cases in Beijing, Shanghai, Shenzhen, and Wenzhou were only categorized as imported and local cases. Consequently, we could not estimate R t of local and imported cases separately as we did in our analysis for Hong Kong. Instead, we estimated R t of local and imported cases in these cities by modifying the method by Thompson et al. 15 as follows.
For each city, let I t be the total number of new infections on day t which comprised both local I local t À Á and imported I imported t infections, i.e., . We assumed that the generation time distribution was the same for imported and local infectors. Let ρ t be the relative infectiousness of imported infections on day t (ρ t ∈ [0,1] due to the 14-day mandatory quarantine on imported infections). Given ; ρ 0 ; ; ρ tÀ1 where F GT was the cdf of the generation time. We then estimated R t using the same Bayesian method described in Thompson  Intracity mobility and activity indices of cities in mainland Chinese cities. To estimate the intracity mobility levels of mainland Chinese cities, we obtained publicly available indices of intracity traffic volumes based on location-based services provided by Baidu (https://qianxi.baidu.com/). The index is a normalized ratio of a city's population with intracity movement within 24 h to a city's residential population, though the precise details of the normalization algorithm have not been made publicly available by Baidu on their website.