Abstract
As countries in the world review interventions for containing the pandemic of coronavirus disease 2019 (COVID-19), important lessons can be drawn from the study of the full transmission dynamics of its causative agent, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), in Wuhan (China), where vigorous non-pharmaceutical interventions have suppressed the local outbreak of this disease^{1}. Here we use a modelling approach to reconstruct the full-spectrum dynamics of COVID-19 in Wuhan between 1 January and 8 March 2020 across 5 periods defined by events and interventions, on the basis of 32,583 laboratory-confirmed cases^{1}. Accounting for presymptomatic infectiousness^{2}, time-varying ascertainment rates, transmission rates and population movements^{3}, we identify two key features of the outbreak: high covertness and high transmissibility. We estimate that 87% (lower bound, 53%) of the infections before 8 March 2020 were unascertained (potentially including asymptomatic and mildly symptomatic individuals), and that the basic reproduction number (R_{0}) was 3.54 (95% credible interval 3.40–3.67) in the early outbreak, much higher than that of severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS)^{4,5}. We observe that multi-pronged interventions had considerable positive effects on controlling the outbreak, decreasing the reproduction number to 0.28 (95% credible interval 0.23–0.33) and, by projection, reducing the total infections in Wuhan by 96.0% as of 8 March 2020. We also explore the probability of resurgence following the lifting of all interventions after 14 consecutive days of no ascertained infections; we estimate this probability at 0.32 and 0.06 on the basis of models with 87% and 53% unascertained cases, respectively, highlighting the risk posed by substantial covert infections when changing control measures. These results have important implications when considering strategies of continuing surveillance and interventions to eventually contain outbreaks of COVID-19.
Main
COVID-19, caused by SARS-CoV-2, was detected in Wuhan in December 2019^{6}. The high population density, together with increased social activities before the Chinese New Year, catalysed the outbreak; its spread was expedited by massive human movement during the Chunyun holiday travel season from 10 January 2020^{3}. Shortly after human-to-human transmission was confirmed, the Chinese authorities implemented an unprecedented cordon sanitaire of Wuhan on 23 January to contain the geographical spread of the disease. This was followed by a series of non-pharmaceutical interventions to reduce virus transmission, including suspension of all intra- and inter-city transportation, compulsory mask wearing in public places, cancellation of social gatherings, and home quarantine of individuals with presumed infections, those with COVID-19-related symptoms and their close contacts^{1}. From 2 February, a strict stay-at-home policy for all residents and the centralized isolation and quarantine of all patients, individuals suspected to have contracted the virus and their close contacts were implemented to stop household and community transmission. In addition, designated community workers carried out a city-wide door-to-door universal survey of symptoms during 17–19 February to identify previously undetected symptomatic cases. These interventions, together with improved medical resources and the redeployment of healthcare personnel from all over the country, have crushed the epidemic curve and reduced the attack rate in Wuhan, and may shed light on global efforts to control outbreaks of COVID-19^{1}.
Recent studies have revealed important transmission features of COVID-19, including the infectiousness of asymptomatic^{7,8,9,10} and presymptomatic^{2,11,12} individuals. Furthermore, the number of ascertained cases was much smaller than that estimated from international cases exported from Wuhan before the travel suspension^{3,13,14}, which implies a substantial number of unascertained cases. Using reported cases from 375 cities in China, a previous modelling study concluded that a sizeable number of unascertained cases, despite having lower transmissibility, had facilitated the rapid spread of COVID-19^{15}. In addition, accounting for unascertained cases has refined the estimation of the case fatality risk of COVID-19^{16}. Modelling both ascertained and unascertained cases is therefore important for interpreting transmission dynamics and epidemic trajectories.
On the basis of comprehensive epidemiological data from Wuhan^{1}, we delineated the full dynamics of COVID-19 in the epicentre by extending the susceptible–exposed–infectious–recovered (SEIR) model to include presymptomatic infectiousness (P), unascertained cases (A) and case isolation in the hospital (H), generating a model that we name SAPHIRE (Fig. 1, Methods, Extended Data Tables 1, 2). We modelled the outbreak from 1 January 2020 across 5 time periods that were defined on the basis of key events and interventions: 1–9 January (before Chunyun), 10–22 January (Chunyun), 23 January to 1 February (cordon sanitaire), 2–16 February (centralized isolation and quarantine) and 17 February to 8 March (community screening). We assumed a constant population size of 10 million with equal numbers of daily inbound and outbound travellers (500,000 before Chunyun, 800,000 during Chunyun and 0 after the cordon sanitaire)^{3}. Furthermore, we assumed that the transmission rate and ascertainment rate did not change in the first two periods (because few interventions were implemented before 23 January), whereas these rates were allowed to vary in later periods to reflect the strengths of different interventions. We estimated these rates across periods by Markov chain Monte Carlo (MCMC) and further converted the transmission rate into the effective reproduction number (R_{e}) (Methods).
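As a reading aid, the five-period schedule can be written as a lookup table. This is a minimal Python sketch, not code from the SAPHIRE repository: the names `PERIODS`, `period_of` and `rate_label` are hypothetical, the period boundaries and values of n come from the paragraph above, and the per-period D_q values are those given in the Methods.

```python
from datetime import date

# The five modelling periods with daily inbound/outbound travellers (n) from the text.
# D_q (median days from symptom onset to isolation) per period is from the Methods.
PERIODS = [
    # (label, first day, last day, n, D_q)
    ("before Chunyun", date(2020, 1, 1), date(2020, 1, 9), 500_000, 21),
    ("Chunyun", date(2020, 1, 10), date(2020, 1, 22), 800_000, 15),
    ("cordon sanitaire", date(2020, 1, 23), date(2020, 2, 1), 0, 10),
    ("centralized isolation and quarantine", date(2020, 2, 2), date(2020, 2, 16), 0, 6),
    ("community screening", date(2020, 2, 17), date(2020, 3, 8), 0, 3),
]


def period_of(day):
    """Return (period index 1-5, n, D_q) for a calendar day in the study window."""
    for k, (_, first, last, n, d_q) in enumerate(PERIODS, start=1):
        if first <= day <= last:
            return k, n, d_q
    raise ValueError(f"{day} is outside 1 January to 8 March 2020")


def rate_label(day):
    """Suffix of the b and r parameters in force: periods 1 and 2 share b_12 and r_12."""
    k, _, _ = period_of(day)
    return "12" if k <= 2 else str(k)


print(period_of(date(2020, 1, 23)), rate_label(date(2020, 1, 15)))
```

Because b and r are shared by the first two periods, a day in either of them maps to the same parameter suffix, `12`, even though n differs between them.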
We first simulated epidemic curves with two periods to validate our parameter estimation procedure (Methods, Extended Data Fig. 1). Our method could accurately estimate R_{e} and the ascertainment rates when the model was correctly specified, and was robust to misspecification of the duration from the onset of symptoms to isolation and of the relative transmissibility of unascertained versus ascertained cases. As expected, estimates of R_{e} were positively correlated with the specified latent and infectious periods, and the estimated ascertainment rates were positively correlated with the specified ascertainment rate in the initial state.
Using confirmed cases exported from Wuhan to Singapore (Extended Data Table 3), we conservatively estimated the ascertainment rate during the early outbreak in Wuhan to be 0.23 (95% confidence interval 0.14–0.42; unless specified otherwise, all parenthetical ranges refer to the 95% credible interval) (Methods). We then fit the daily incidences in Wuhan from 1 January to 29 February, assuming an initial ascertainment rate of 0.23, and predicted the trend from 1 March to 8 March (Methods). Our model fit the observed data well, except for the outlier on 1 February; this outlier might be due to approximate-date records for many patients admitted to the field hospitals set up after 1 February (Fig. 2a). After a series of multifaceted public health interventions, R_{e} decreased from 3.54 (3.40–3.67) and 3.32 (3.19–3.44) in the first two periods to 1.18 (1.11–1.25), 0.51 (0.47–0.54) and 0.28 (0.23–0.33) in the later three periods (Fig. 2b, Extended Data Tables 4, 5). We estimated the cumulative number of infections up until 8 March, including unascertained cases, to be 258,728 (204,783–320,145) if the trend of the fourth period was assumed to continue (Fig. 2c), 818,724 (599,111–1,096,850) if the trend of the third period was assumed (Fig. 2d) or 6,302,694 (6,275,508–6,327,520) if the trend of the second period was assumed (Fig. 2e), compared with estimated total infections of 249,187 (198,412–307,062) obtained by fitting data from all 5 periods (Fig. 2a). Correspondingly, these numbers translate into a 3.7%, 69.6% and 96.0% reduction of infections by the measures taken in the fifth period, the fourth and fifth periods combined, and the last three periods combined, respectively.
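The percentage reductions follow from comparing the all-period fit with each counterfactual total. A minimal Python check using the point estimates quoted above (credible intervals are ignored, and `reduction` is a hypothetical helper):

```python
# Cumulative infections up to 8 March (posterior point estimates quoted in the text).
FITTED_ALL_PERIODS = 249_187

# Counterfactual totals, keyed by the measures whose effect they isolate.
COUNTERFACTUAL = {
    "fifth period": 258_728,               # trend of period 4 continued
    "fourth and fifth periods": 818_724,   # trend of period 3 continued
    "last three periods": 6_302_694,       # trend of period 2 continued
}


def reduction(counterfactual_total, observed_total=FITTED_ALL_PERIODS):
    """Fraction of infections averted relative to a counterfactual scenario."""
    return 1 - observed_total / counterfactual_total


for label, total in COUNTERFACTUAL.items():
    print(f"{label}: {100 * reduction(total):.1f}%")  # 3.7%, 69.6%, 96.0%
```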
We estimated low ascertainment rates throughout: 0.15 (0.13–0.17) for the first two periods, and 0.14 (0.11–0.17), 0.10 (0.08–0.12), and 0.16 (0.13–0.21) for the remaining three periods (Extended Data Table 6). Even with the universal screening of the community for symptoms that was implemented from 17 February to 19 February, the ascertainment rate was raised only to 0.16. On the basis of the fitted model using data from 1 January to 29 February, we projected the cumulative number of ascertained cases to be 32,577 (30,216–34,986) by 8 March, close to the reported number of 32,583. This was equivalent to an overall ascertainment rate of 0.13 (0.11–0.16) given the estimated total infections of 249,187 (198,412–307,062). The model also projected that the number of daily active infections (including presymptomatic, ascertained and unascertained infections) peaked at 55,879 (43,582–69,571) on 2 February and dropped afterwards to 701 (436–1,043) on 8 March (Fig. 2f). If the trend remained unchanged, the number of ascertained infections would have first become zero on 27 March (95% credible interval 20 March to 5 April), and the clearance of all infections would have occurred on 21 April (8 April to 12 May) (Extended Data Table 7). The first day of zero ascertained cases in Wuhan was reported on 18 March, indicating enhanced interventions in March.
We used stochastic simulations to investigate the implications of unascertained cases for continuing surveillance and interventions^{17} (Methods). Because of latent, presymptomatic and unascertained cases, the source of infection would not be completely cleared shortly after the first day of zero ascertained cases. We found that if control measures were lifted 14 days after the first day of zero ascertained cases, the probability of resurgence, defined as the number of active ascertained cases exceeding 100, could be as high as 0.97, with the surge predicted to occur on day 34 (27–47) after lifting controls (Fig. 3). If we were to impose a more stringent criterion of lifting controls only after observing no ascertained cases for 14 consecutive days, the probability of resurgence would drop to 0.32, with possible resurgence delayed to day 42 (33–55) after lifting controls (Fig. 3). Although based on a simplified model, these results highlight the risk of ignoring unascertained cases when switching intervention strategies.
We performed a series of sensitivity analyses to test the robustness of our results by smoothing the outlier data point on 1 February, as well as varying the lengths of latent and infectious periods, the duration from the onset of symptoms to isolation, the ratio of transmissibility in unascertained versus ascertained cases, and the initial ascertainment rate (Extended Data Tables 4–7, Supplementary Information). Our major findings, of a marked decrease in R_{e} after interventions and the existence of a substantial number of unascertained cases, were robust. Consistent with simulations, the estimated ascertainment rates were positively correlated with the specified initial ascertainment rate. When we specified the initial ascertainment rate as 0.14 or 0.42, the estimated overall ascertainment rate was 0.08 (0.07–0.10) and 0.23 (0.16–0.28), respectively. If we assume an extreme scenario with no unascertained cases in the early outbreak (which we term model ‘S8’ (Supplementary Information)), the estimated ascertainment rate would be 0.47 (0.39–0.58) overall, which would represent an upper bound of the ascertainment rate. Because of the higher ascertainment rate (compared to the main analysis) in this model, we estimated a lower probability of resurgence (0.06) when lifting controls after 14 days of no ascertained cases, and the resurgence was expected to occur on day 38 (29–52) after lifting controls (Fig. 3). A simplified model that assumes complete ascertainment at any time performed substantially worse than the full model (Extended Data Table 4, Supplementary Information).
Understanding the proportion of unascertained cases and their transmissibility is critical for prioritizing surveillance and control measures^{17}. Our finding of a large fraction of unascertained cases, despite the high level of surveillance in Wuhan, indicates the existence of many asymptomatic or mildly symptomatic individuals. It was previously estimated that asymptomatic individuals accounted for 18% of the infections on board the Diamond Princess cruise ship^{8} and 31% of the infected Japanese individuals who were evacuated from Wuhan^{9}. In addition, in a cohort of 210 women admitted for delivery between 22 March and 4 April in New York City (USA), 29 of 33 (88%) pregnant women infected with SARS-CoV-2 were asymptomatic^{10}. Several reports have also highlighted the difficulty of detecting cases of COVID-19: the detection capacity varied from 11% in low-surveillance countries to 40% in high-surveillance countries^{18,19}, and modelling of epidemics outside of Wuhan has suggested that the ascertainment rate was 24.4% in China (excluding Hubei province)^{14} and 14% in Wuhan before the travel ban^{15}. Consistent with these studies and with emerging serological studies showing that seroprevalence is much higher than the reported case prevalence in cities and countries worldwide^{20,21,22}, our analyses of data from Wuhan indicated an overall ascertainment rate between 8% and 23% (Extended Data Table 6, excluding the extreme scenario of model S8).
Our R_{e} estimate of 3.54 (3.40–3.67) before any interventions is at the higher end of the range of R_{0} values estimated by other studies that used early epidemic data from Wuhan^{6,23}. This discrepancy might be due to our modelling of unascertained cases, more complete case records in our analysis and/or the different time periods analysed. If we modelled from the first reported case of COVID-19 in Wuhan, we would estimate a lower R_{e} of 3.38 (3.28–3.48) before interventions (Extended Data Fig. 2), which remains much higher than those of SARS and MERS^{4,5}.
Our modelling study has delineated the full-spectrum dynamics of the COVID-19 outbreak in Wuhan and highlighted two key features of the outbreak: high covertness and high transmissibility. These two features have synergistically propelled the COVID-19 pandemic and posed considerable challenges to attempts to control the outbreak. However, the Wuhan case study demonstrates the effectiveness of vigorous and multifaceted containment efforts. In particular, despite relatively low ascertainment rates (owing to mild or absent symptoms in many infected individuals), the outbreak was controlled by interventions such as wearing face masks, social distancing and quarantining close contacts^{1}, which block transmission stemming from unascertained cases.
Given the limitations of our model discussed below, further investigations, such as a survey of the seroprevalence of SARS-CoV-2-specific antibodies, are needed to confirm our estimates. First, owing to delays in laboratory testing, we might have missed some cases and therefore underestimated the ascertainment rate (especially for the last period). Second, we excluded clinically diagnosed cases without laboratory confirmation to reduce false-positive diagnoses; however, this leads to an underestimation of ascertainment rates, especially for the third and fourth periods, during which many clinically diagnosed cases were reported^{1}. The variation in the estimated ascertainment rates across periods reflects the combined effect of evolving surveillance, interventions, medical resources and case definitions over time^{1,24}. Third, our model assumes homogeneous transmission within the population and ignores heterogeneity between groups by sex, age, geographical region and socioeconomic status^{25}. Furthermore, individual variation in infectiousness, such as superspreading events^{26}, is known to result in a higher probability of stochastic extinction for a fixed population R_{e} (ref. ^{27}); because our model does not include this variation, we might have overestimated the probability of resurgence. Finally, we could not evaluate the effect of individual interventions from the epidemic curve of a single city, because many interventions were applied simultaneously. Future work that models heterogeneous transmission between different groups, together with joint analysis of data from other cities, will provide deeper insights into the effectiveness of different control strategies^{28,29}.
Methods
Data of cases of COVID19 in Wuhan
We analysed the daily incidence data of COVID-19 presented in figure 1 of ref. ^{1}. In brief, information on cases of COVID-19 from 8 December 2019 to 8 March 2020 was extracted from the municipal Notifiable Disease Report System on 9 March 2020. The date of symptom onset (the self-reported date of developing symptoms such as fever, cough or other respiratory symptoms) and the date of confirmed diagnosis were collected. For consistency of the case definition across periods, we included only the 32,583 individuals who had a laboratory-confirmed positive test for SARS-CoV-2 by real-time reverse-transcription polymerase chain reaction (RT–PCR) assay or high-throughput sequencing of nasal and pharyngeal swab specimens. SAS software (version 9.4) was used in data collection.
Estimation of initial ascertainment rate using cases exported to Singapore
As of 10 May 2020, a total of 24 confirmed cases of COVID-19 in Singapore had been reported as imported from China, of which 16 were imported from Wuhan before the cordon sanitaire on 23 January; the first case arrived in Singapore on 18 January (Extended Data Table 3). On the basis of VariFlight Data (https://data.variflight.com/en/), the total number of passengers who travelled from Wuhan to Singapore between 18 January and 23 January 2020 was 2,722. Therefore, the infection rate among these passengers was 0.59% (95% confidence interval 0.30–0.88%). These individuals had symptom onset between 21 January and 30 January 2020. In Wuhan, a total of 12,433 confirmed cases had symptom onset in the same period, equivalent to a cumulative infection rate of 0.124% (95% confidence interval 0.122–0.126%), assuming a population size of 10 million for Wuhan. By further assuming complete ascertainment of early cases in Singapore (which is well known for its high level of surveillance^{18,19}), the ascertainment rate during the early outbreak in Wuhan was estimated to be 0.23 (95% confidence interval 0.14–0.42), corresponding to 0.77 (95% confidence interval 0.58–0.86) of the infections being unascertained. This represents a conservative estimate for two reasons: (1) the assumption of perfect ascertainment in Singapore ignored potential asymptomatic individuals^{8,9}; and (2) the number of imported cases with symptom onset between 21 January and 30 January was underestimated owing to the suspension of flights after the lockdown of Wuhan. Without direct information to estimate the initial ascertainment rate before 1 January 2020, we used these results based on Singapore data to set the initial value and the prior distribution of ascertainment rates in our model, and performed sensitivity analyses under various assumptions.
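The two infection-rate intervals quoted above are consistent with a normal-approximation (Wald) binomial interval; the text does not name the method, so that choice is an assumption in this minimal Python sketch. Rounding the centre and half-width reproduces 0.59% (0.30–0.88%) and 0.124% (0.122–0.126%). The sketch deliberately stops before combining the two rates into the ascertainment estimate of 0.23 (0.14–0.42), because that step is not spelled out here.

```python
import math


def wald_interval(k, n, z=1.96):
    """Proportion k/n and the half-width of a normal-approximation (Wald) interval."""
    p = k / n
    return p, z * math.sqrt(p * (1 - p) / n)


# 16 Wuhan-linked cases among 2,722 passengers flying Wuhan to Singapore, 18-23 January.
p_sg, h_sg = wald_interval(16, 2_722)
# 12,433 Wuhan cases with onset 21-30 January, out of an assumed 10 million residents.
p_wh, h_wh = wald_interval(12_433, 10_000_000)

print(f"Singapore passengers: {100 * p_sg:.2f}% ± {100 * h_sg:.2f}%")  # 0.59% ± 0.29%
print(f"Wuhan residents:      {100 * p_wh:.3f}% ± {100 * h_wh:.3f}%")  # 0.124% ± 0.002%
```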
The SAPHIRE model
We extended the classic SEIR model to a SAPHIRE model (Fig. 1, Extended Data Table 1), which incorporates three additional compartments to account for presymptomatic infectious individuals (P), unascertained cases (A) and cases isolated in the hospital (H). We chose to analyse data from 1 January 2020, when the Huanan Seafood Market was disinfected, and thus did not model the zoonotic force of infection^{3}. We assumed a constant population size of N = 10,000,000, with equal numbers of daily inbound and outbound travellers (n), in which n = 500,000 for 1–9 January, 800,000 for 10–22 January (owing to Chunyun) and 0 after the cordon sanitaire from 23 January^{3}. We divided the population into susceptible (S), exposed (E), P, A, ascertained infectious (I), H and removed (R) individuals. We introduced compartment H because ascertained cases would have a shorter effective infectious period owing to isolation, especially when medical resources were improved^{1}. We use italicized letters to denote the number of individuals in each corresponding compartment. The dynamics of these compartments across time (t) are described by the following set of ordinary differential equations:
\[\frac{dS}{dt}=n-\frac{bS(\alpha P+\alpha A+I)}{N}-\frac{nS}{N}\tag{1}\]
\[\frac{dE}{dt}=\frac{bS(\alpha P+\alpha A+I)}{N}-\frac{E}{D_{\rm e}}-\frac{nE}{N}\tag{2}\]
\[\frac{dP}{dt}=\frac{E}{D_{\rm e}}-\frac{P}{D_{\rm p}}-\frac{nP}{N}\tag{3}\]
\[\frac{dA}{dt}=\frac{(1-r)P}{D_{\rm p}}-\frac{A}{D_{\rm i}}-\frac{nA}{N}\tag{4}\]
\[\frac{dI}{dt}=\frac{rP}{D_{\rm p}}-\frac{I}{D_{\rm i}}-\frac{I}{D_{\rm q}}\tag{5}\]
\[\frac{dH}{dt}=\frac{I}{D_{\rm q}}-\frac{H}{D_{\rm h}}\tag{6}\]
\[\frac{dR}{dt}=\frac{A+I}{D_{\rm i}}+\frac{H}{D_{\rm h}}-\frac{nR}{N}\tag{7}\]
in which b is the transmission rate for ascertained cases (defined as the number of individuals that an ascertained case can infect per day); α is the ratio of the transmission rate of unascertained cases to that of ascertained cases; r is the ascertainment rate; D_{e} is the latent period; D_{p} is the presymptomatic infectious period; D_{i} is the symptomatic infectious period; D_{q} is the duration from illness onset to isolation; and D_{h} is the isolation period in hospital. R_{e} can be computed as
\[R_{\rm e}=\frac{\alpha b}{D_{\rm p}^{-1}+nN^{-1}}+\frac{(1-r)\alpha b}{D_{\rm i}^{-1}+nN^{-1}}+\frac{rb}{D_{\rm i}^{-1}+D_{\rm q}^{-1}}\tag{8}\]
in which the three terms represent infections contributed by presymptomatic individuals, unascertained cases and ascertained cases, respectively. We adjusted the infectious periods of each type of case by taking population movement \(\left(\frac{n}{N}\right)\) and isolation (\(D_{\rm q}^{-1}\)) into account.
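For readers who want to experiment, equations (1)–(8) translate directly into code. This minimal Python sketch is not the authors' R implementation: `saphire_rhs` and `effective_R` are hypothetical names, and the defaults (α = 0.55, D_e = 2.9, D_p = 2.3, D_i = 2.9, D_h = 30 days, D_q = 21 days, N = 10 million) are the main-analysis settings from the Methods. The final lines evaluate equation (8) at the two-period validation settings described in 'Simulation study for method validation'.

```python
def saphire_rhs(state, b, r, n, alpha=0.55, N=10_000_000,
                D_e=2.9, D_p=2.3, D_i=2.9, D_q=21.0, D_h=30.0):
    """Right-hand side of equations (1)-(7); state = (S, E, P, A, I, H, R)."""
    S, E, P, A, I, H, R = state
    force = b * S * (alpha * P + alpha * A + I) / N  # new infections per day
    out = n / N  # per-capita daily outflow; I and H do not travel
    dS = n - force - out * S  # all inbound travellers are susceptible
    dE = force - E / D_e - out * E
    dP = E / D_e - P / D_p - out * P
    dA = (1 - r) * P / D_p - A / D_i - out * A
    dI = r * P / D_p - I / D_i - I / D_q
    dH = I / D_q - H / D_h
    dR = (A + I) / D_i + H / D_h - out * R
    return dS, dE, dP, dA, dI, dH, dR


def effective_R(b, r, n, alpha=0.55, N=10_000_000, D_p=2.3, D_i=2.9, D_q=21.0):
    """Equation (8): presymptomatic + unascertained + ascertained contributions."""
    m = n / N
    return (alpha * b / (1 / D_p + m)
            + (1 - r) * alpha * b / (1 / D_i + m)
            + r * b / (1 / D_i + 1 / D_q))


print(round(effective_R(1.27, 0.2, 500_000, D_q=20), 1))  # 3.5 (validation period 1)
print(round(effective_R(0.41, 0.4, 0, D_q=5), 1))         # 1.2 (validation period 2)
```

Only S, E, P, A and R lose travellers at rate n/N; ascertained (I) and isolated (H) cases do not travel, which is why the ascertained-case term of R_e is discounted by isolation (1/D_q) rather than by travel (n/N).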
Parameter settings and initial states
Parameter settings for the main analysis are summarized in Extended Data Table 2. We set α = 0.55 according to ref. ^{15}, assuming lower transmissibility for unascertained cases. Compartment P contains both ascertained and unascertained cases in the presymptomatic phase. We set the transmissibility of P to be the same as that of unascertained cases, because it has previously been reported that the majority of cases are unascertained^{15}. We assumed an incubation period of 5.2 days and a presymptomatic infectious period of D_{p} = 2.3 days^{2,6}. Thus, the latent period was D_{e} = 5.2 − 2.3 = 2.9 days. Because presymptomatic infectiousness was estimated to account for 44% of the total infections from ascertained cases^{2}, we set the mean total infectious period as \((D_{\rm p}+D_{\rm i})=\frac{D_{\rm p}}{0.44}=5.2\) days, assuming constant infectiousness across the presymptomatic and symptomatic phases of ascertained cases^{12}; thus, the mean symptomatic infectious period was D_{i} = 5.2 − 2.3 = 2.9 days. We set a long isolation period of D_{h} = 30 days; because isolated cases do not contribute to transmission and the likelihood depends only on P, this parameter has no effect on our fitting procedure or the final parameter estimates. The duration from the onset of symptoms to isolation was set to D_{q} = 21, 15, 10, 6 and 3 days in periods 1–5, respectively, on the basis of the median time from onset to confirmed diagnosis in each period^{1}.
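The durations follow from simple arithmetic on the three published inputs. A minimal Python check (variable names are ours):

```python
incubation = 5.2      # incubation period (days)
D_p = 2.3             # presymptomatic infectious period (days)
presym_share = 0.44   # share of infections by ascertained cases that occur presymptomatically

D_e = incubation - D_p          # latent period
D_total = D_p / presym_share    # D_p + D_i, assuming constant infectiousness
D_i = D_total - D_p             # symptomatic infectious period

print(f"D_e = {D_e:.1f}, D_p + D_i = {D_total:.2f}, D_i = {D_i:.2f}")
```

The unrounded values are 5.23 and 2.93 days; the text rounds them to 5.2 and 2.9 days.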
On the basis of the settings above, we specified the initial state of the model on 31 December 2019 (Extended Data Table 1). The initial number of ascertained symptomatic cases I(0) was specified as the number of ascertained cases with symptom onset during 29–31 December 2019. We assumed the initial ascertainment rate was r_{0}, and thus the initial number of unascertained cases was \(A(0)=r_{0}^{-1}(1-r_{0})I(0)\). We denote by P_{I}(0) and E_{I}(0) the numbers of ascertained cases with symptom onset during 1–2 January 2020 and 3–5 January 2020, respectively. The initial numbers of exposed and presymptomatic individuals were then set as \(E(0)=r_{0}^{-1}E_{\rm I}(0)\) and \(P(0)=r_{0}^{-1}P_{\rm I}(0)\), respectively. We assumed r_{0} = 0.23 in our main analysis, on the basis of the point estimate using the Singapore data (described in ‘Estimation of initial ascertainment rate using cases exported to Singapore’).
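The initial-state rules scale ascertained-case counts by 1/r_0. A minimal Python sketch; `initial_states` is a hypothetical helper, and the counts in the usage line are invented for illustration, not the Wuhan data.

```python
def initial_states(I0, P_I0, E_I0, r0=0.23):
    """Initial compartment sizes on 31 December 2019 from ascertained-case counts.

    I0:   ascertained cases with symptom onset 29-31 December 2019
    P_I0: ascertained cases with symptom onset 1-2 January 2020
    E_I0: ascertained cases with symptom onset 3-5 January 2020
    r0:   initial ascertainment rate (0.23 in the main analysis)
    """
    return {
        "I": I0,
        "A": (1 - r0) / r0 * I0,  # A(0) = r0^-1 (1 - r0) I(0)
        "P": P_I0 / r0,           # P(0) = r0^-1 P_I(0)
        "E": E_I0 / r0,           # E(0) = r0^-1 E_I(0)
    }


# Invented counts, for illustration only.
print(initial_states(I0=23, P_I0=23, E_I0=46))
```

Setting r0 = 1 (as in sensitivity analysis S8) gives A(0) = 0 and leaves P(0) and E(0) equal to the ascertained counts.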
Estimation of parameters in the SAPHIRE model
Considering the time-varying strength of control measures, we assumed b = b_{12} and r = r_{12} for the first two periods, b = b_{3} and r = r_{3} for period 3, b = b_{4} and r = r_{4} for period 4, and b = b_{5} and r = r_{5} for period 5. We assumed that the observed number of ascertained cases with symptom onset on day d, denoted x_{d}, follows a Poisson distribution with rate \(\lambda_{d}=rP_{d-1}D_{\rm p}^{-1}\), in which P_{d−1} is the expected number of presymptomatic individuals on day (d − 1). We fit the observed data from 1 January to 29 February (d = 1, 2, …, D, and D = 60) and used the fitted model to predict the trend from 1 March to 8 March. Thus, the likelihood function is
\[L(b_{12},b_{3},b_{4},b_{5},r_{12},r_{3},r_{4},r_{5})=\prod_{d=1}^{D}\frac{e^{-\lambda_{d}}\lambda_{d}^{x_{d}}}{x_{d}!}\tag{9}\]
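On the log scale, equation (9) is a sum over days, which is how it would typically be evaluated inside an MCMC sampler. A minimal Python sketch: `log_likelihood` is a hypothetical name, and the expected presymptomatic counts P_{d−1} are assumed to come from integrating the model for a given set of b and r values, a step not shown here.

```python
import math


def log_likelihood(x, P_prev, r_daily, D_p=2.3):
    """Log of the Poisson likelihood in equation (9).

    x[k]       observed ascertained onsets on day d = k + 1
    P_prev[k]  expected presymptomatic count on day d - 1 from the model
    r_daily[k] ascertainment rate in force on day d (r_12, r_3, r_4 or r_5)
    """
    total = 0.0
    for x_d, P, r in zip(x, P_prev, r_daily):
        lam = r * P / D_p  # lambda_d = r * P_{d-1} / D_p
        if lam == 0:
            if x_d > 0:
                return -math.inf  # a positive count is impossible when lambda_d = 0
            continue
        total += x_d * math.log(lam) - lam - math.lgamma(x_d + 1)
    return total
```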
We estimated b_{12}, b_{3}, b_{4}, b_{5}, r_{12}, r_{3}, r_{4} and r_{5} by MCMC with the delayed rejection adaptive Metropolis algorithm implemented in the R package BayesianTools (version 0.1.7)^{30}. We used a non-informative flat prior of Unif(0, 2) for b_{12}, b_{3}, b_{4} and b_{5}. For r_{12}, we used an informative prior of Beta(7.3, 24.6), obtained by matching the first two moments of the estimate using Singapore data (described in ‘Estimation of initial ascertainment rate using cases exported to Singapore’). We reparameterized r_{3}, r_{4} and r_{5} by
\[{\rm logit}(r_{3})={\rm logit}(r_{12})+\delta_{3},\quad{\rm logit}(r_{4})={\rm logit}(r_{3})+\delta_{4},\quad{\rm logit}(r_{5})={\rm logit}(r_{4})+\delta_{5}\]
in which \({\rm logit}(r)=\log\left(\frac{r}{1-r}\right)\). In the MCMC, we sampled δ_{3}, δ_{4} and δ_{5} from the prior of N(0, 1). We set a burn-in period of 40,000 iterations and continued to run 100,000 iterations with a sampling step size of 10 iterations. We repeated the MCMC with three different sets of initial values and assessed convergence by trace plots and the multivariate Gelman–Rubin diagnostic^{31} (Supplementary Information). Parameter estimates are presented as posterior means and 95% credible intervals from 10,000 MCMC samples. All analyses were performed in R (version 3.6.2), and the Gelman–Rubin diagnostic was calculated using the gelman.diag function in the R package coda (version 0.19.3).
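Two small pieces of the estimation set-up can be sketched in Python: recovering Beta parameters from a mean and variance (the method of moments named above), and mapping δ_3, δ_4 and δ_5 back to ascertainment rates. The helper names are hypothetical, and the sequential logit-scale mapping in `later_rates` is an assumption about the reparameterization, flagged again in the code. Neither helper reproduces the delayed rejection adaptive Metropolis sampler itself.

```python
import math


def beta_from_moments(mean, var):
    """Method-of-moments Beta(a, b) parameters for a given mean and variance."""
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def logit(p):
    return math.log(p / (1 - p))


def expit(x):
    return 1 / (1 + math.exp(-x))


def later_rates(r12, delta3, delta4, delta5):
    """ASSUMPTION: each delta shifts the previous period's rate on the logit scale."""
    r3 = expit(logit(r12) + delta3)
    r4 = expit(logit(r3) + delta4)
    r5 = expit(logit(r4) + delta5)
    return r3, r4, r5


# Round trip: the mean and variance of Beta(7.3, 24.6) give back (7.3, 24.6).
a, b = 7.3, 24.6
mean = a / (a + b)
var = a * b / ((a + b) ** 2 * (a + b + 1))
print(round(mean, 3), [round(v, 2) for v in beta_from_moments(mean, var)])  # 0.229 [7.3, 24.6]
```

The mean of Beta(7.3, 24.6) is about 0.229, in line with the point estimate of 0.23 from the Singapore data.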
Stochastic simulations
We used stochastic simulations to obtain the 95% credible interval of a fitted or predicted epidemic curve. Given a set of parameter values from MCMC, we performed the following multinomial random sampling:
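\((U_{{\rm S}\to{\rm E}},U_{{\rm S}\to{\rm O}},U_{{\rm S}\to{\rm S}})\sim{\rm Multinomial}(S_{t-1};p_{{\rm S}\to{\rm E}},p_{\rm O},1-p_{{\rm S}\to{\rm E}}-p_{\rm O})\)
\((U_{{\rm E}\to{\rm P}},U_{{\rm E}\to{\rm O}},U_{{\rm E}\to{\rm E}})\sim{\rm Multinomial}(E_{t-1};p_{{\rm E}\to{\rm P}},p_{\rm O},1-p_{{\rm E}\to{\rm P}}-p_{\rm O})\)
\((U_{{\rm P}\to{\rm I}},U_{{\rm P}\to{\rm A}},U_{{\rm P}\to{\rm O}},U_{{\rm P}\to{\rm P}})\sim{\rm Multinomial}(P_{t-1};p_{{\rm P}\to{\rm I}},p_{{\rm P}\to{\rm A}},p_{\rm O},1-p_{{\rm P}\to{\rm I}}-p_{{\rm P}\to{\rm A}}-p_{\rm O})\)
\((U_{{\rm I}\to{\rm H}},U_{{\rm I}\to{\rm R}},U_{{\rm I}\to{\rm I}})\sim{\rm Multinomial}(I_{t-1};p_{{\rm I}\to{\rm H}},p_{{\rm I}\to{\rm R}},1-p_{{\rm I}\to{\rm H}}-p_{{\rm I}\to{\rm R}})\)
\((U_{{\rm A}\to{\rm R}},U_{{\rm A}\to{\rm O}},U_{{\rm A}\to{\rm A}})\sim{\rm Multinomial}(A_{t-1};p_{{\rm A}\to{\rm R}},p_{\rm O},1-p_{{\rm A}\to{\rm R}}-p_{\rm O})\)
\((U_{{\rm H}\to{\rm R}},U_{{\rm H}\to{\rm H}})\sim{\rm Multinomial}(H_{t-1};p_{{\rm H}\to{\rm R}},1-p_{{\rm H}\to{\rm R}})\)
\((U_{{\rm R}\to{\rm O}},U_{{\rm R}\to{\rm R}})\sim{\rm Multinomial}(R_{t-1};p_{\rm O},1-p_{\rm O})\)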
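in which O denotes the status of outflow population, \(p_{\rm O}=nN^{-1}\) denotes the outflow probability and the other quantities are status transition probabilities, including \(p_{{\rm S}\to{\rm E}}=b(\alpha P_{t-1}+\alpha A_{t-1}+I_{t-1})N^{-1}\), \(p_{{\rm E}\to{\rm P}}=D_{\rm e}^{-1}\), \(p_{{\rm P}\to{\rm I}}=rD_{\rm p}^{-1}\), \(p_{{\rm P}\to{\rm A}}=(1-r)D_{\rm p}^{-1}\), \(p_{{\rm I}\to{\rm H}}=D_{\rm q}^{-1}\), \(p_{{\rm I}\to{\rm R}}=p_{{\rm A}\to{\rm R}}=D_{\rm i}^{-1}\) and \(p_{{\rm H}\to{\rm R}}=D_{\rm h}^{-1}\). The SAPHIRE model described by equations (1)–(7) corresponds to the following discrete-time stochastic dynamics:
\[S_{t}=S_{t-1}-U_{{\rm S}\to{\rm E}}-U_{{\rm S}\to{\rm O}}+n\tag{10}\]
\[E_{t}=E_{t-1}+U_{{\rm S}\to{\rm E}}-U_{{\rm E}\to{\rm P}}-U_{{\rm E}\to{\rm O}}\tag{11}\]
\[P_{t}=P_{t-1}+U_{{\rm E}\to{\rm P}}-U_{{\rm P}\to{\rm I}}-U_{{\rm P}\to{\rm A}}-U_{{\rm P}\to{\rm O}}\tag{12}\]
\[A_{t}=A_{t-1}+U_{{\rm P}\to{\rm A}}-U_{{\rm A}\to{\rm R}}-U_{{\rm A}\to{\rm O}}\tag{13}\]
\[I_{t}=I_{t-1}+U_{{\rm P}\to{\rm I}}-U_{{\rm I}\to{\rm H}}-U_{{\rm I}\to{\rm R}}\tag{14}\]
\[H_{t}=H_{t-1}+U_{{\rm I}\to{\rm H}}-U_{{\rm H}\to{\rm R}}\tag{15}\]
\[R_{t}=R_{t-1}+U_{{\rm I}\to{\rm R}}+U_{{\rm A}\to{\rm R}}+U_{{\rm H}\to{\rm R}}-U_{{\rm R}\to{\rm O}}\tag{16}\]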
We repeated the stochastic simulations for all 10,000 sets of parameter values sampled by MCMC to construct the 95% credible interval of the epidemic curve by the 2.5 and 97.5 percentiles at each time point.
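The multinomial update and the percentile band can be sketched with NumPy's `Generator.multinomial` and `numpy.percentile`. This is a minimal illustration, not the authors' R code: `stochastic_step` and `credible_band` are hypothetical names, the defaults are the main-analysis settings, and the starting counts in the usage lines are invented.

```python
import numpy as np


def stochastic_step(state, b, r, n, rng, alpha=0.55, N=10_000_000,
                    D_e=2.9, D_p=2.3, D_i=2.9, D_q=21.0, D_h=30.0):
    """Advance integer compartment counts by one day with multinomial draws.

    Returns the new state and the number of people who travelled out that day.
    """
    S, E, P, A, I, H, R = (state[k] for k in "SEPAIHR")
    p_O = n / N  # daily outflow probability

    def draw(count, probs):
        # Split `count` people among the listed moves; the remainder stays put.
        stay = max(0.0, 1.0 - sum(probs))
        return rng.multinomial(count, list(probs) + [stay])[:-1]

    U_SE, U_SO = draw(S, [b * (alpha * P + alpha * A + I) / N, p_O])
    U_EP, U_EO = draw(E, [1 / D_e, p_O])
    U_PI, U_PA, U_PO = draw(P, [r / D_p, (1 - r) / D_p, p_O])
    U_AR, U_AO = draw(A, [1 / D_i, p_O])
    U_IH, U_IR = draw(I, [1 / D_q, 1 / D_i])  # ascertained cases do not travel
    (U_HR,) = draw(H, [1 / D_h])              # neither do isolated cases
    (U_RO,) = draw(R, [p_O])

    new = {
        "S": S - U_SE - U_SO + n,  # inbound travellers are all susceptible
        "E": E + U_SE - U_EP - U_EO,
        "P": P + U_EP - U_PI - U_PA - U_PO,
        "A": A + U_PA - U_AR - U_AO,
        "I": I + U_PI - U_IH - U_IR,
        "H": H + U_IH - U_HR,
        "R": R + U_IR + U_AR + U_HR - U_RO,
    }
    outflow = U_SO + U_EO + U_PO + U_AO + U_RO
    return {k: int(v) for k, v in new.items()}, int(outflow)


def credible_band(curves):
    """Pointwise 2.5th and 97.5th percentiles across simulated curves (rows)."""
    return np.percentile(np.asarray(curves, dtype=float), [2.5, 97.5], axis=0)


# Usage with hypothetical starting counts (not the Wuhan initial state).
rng = np.random.default_rng(2020)
start = {"S": 9_998_000, "E": 800, "P": 500, "A": 400, "I": 200, "H": 0, "R": 100}
curves = []
for _ in range(100):
    state, daily_I = dict(start), []
    for _ in range(20):
        state, _ = stochastic_step(state, b=1.27, r=0.2, n=500_000, rng=rng)
        daily_I.append(state["I"])
    curves.append(daily_I)
low, high = credible_band(curves)
```

Each draw subtracts at most the current compartment size, so counts stay non-negative, and the population changes only through inbound travellers (n) and the sampled outflow.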
Prediction of epidemic ending date and the risk of resurgence
Using the stochastic simulations described in ‘Stochastic simulations’, we predicted the first day of no new ascertained cases and the date of clearance of all active infections in Wuhan, assuming continuation of the same control measures as the last period (that is, same parameter values).
We also evaluated the risk of outbreak resurgence after lifting control measures. We considered lifting all controls (1) at t days after the first day of zero ascertained cases, or (2) after a consecutive period of t days with no ascertained cases. After lifting controls, we set the transmission rate b, ascertainment rate r and population movement n to be the same as the first period, and continued the stochastic simulation to the stationary state. Time to resurgence was defined as the number of days from lifting controls to when the number of active ascertained cases (I) reached 100. We performed 10,000 simulations with 10,000 sets of parameter values sampled from MCMC (as described in ‘Estimation of parameters in the SAPHIRE model’). We calculated the probability of resurgence as the proportion of simulations in which resurgence occurred, as well as the time to resurgence conditional on the occurrence of resurgence.
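The two lifting rules and the resurgence summary can be made concrete with a small Python sketch. The function names are hypothetical, and three conventions are our assumptions: day indices start at 0, under the consecutive-days rule controls are lifted on the day after the t-th zero day, and time to resurgence is summarized by its median. Resurgence is counted when active ascertained cases reach 100, as defined above.

```python
import statistics


def lift_day(daily_new_ascertained, t, rule):
    """Day index on which all controls are lifted, or None if the rule is never met.

    rule "after_first_zero":  t days after the first day with no new ascertained cases
    rule "consecutive_zeros": the day after t consecutive days with none
    """
    if rule not in ("after_first_zero", "consecutive_zeros"):
        raise ValueError(f"unknown rule: {rule}")
    run = 0
    for day, count in enumerate(daily_new_ascertained):
        if count > 0:
            run = 0
            continue
        if rule == "after_first_zero":
            return day + t
        run += 1
        if run == t:
            return day + 1
    return None


def resurgence_summary(post_lift_I, threshold=100):
    """Probability of resurgence and median time to it, among simulations.

    Each element of post_lift_I lists active ascertained cases (I) by day,
    with index 0 being the day controls are lifted.
    """
    times = []
    for curve in post_lift_I:
        hit = next((d for d, i in enumerate(curve) if i >= threshold), None)
        if hit is not None:
            times.append(hit)
    prob = len(times) / len(post_lift_I)
    median_time = statistics.median(times) if times else None
    return prob, median_time
```

The consecutive-days rule can never lift controls earlier than the first-zero rule with the same t, which is why it yields the lower resurgence probability reported in the Main text.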
Simulation study for method validation
To validate the method, we performed two-period stochastic simulations (equations (10) to (16)) with transmission rate b = b_{1} = 1.27, ascertainment rate r = r_{1} = 0.2, daily population movement n = 500,000 and duration from illness onset to isolation D_{q} = 20 days for the first period (so that R_{e} = 3.5 according to equation (8)), and b = b_{2} = 0.41, r = r_{2} = 0.4, n = 0 and D_{q} = 5 days for the second period (so that R_{e} = 1.2 according to equation (8)). Both periods lasted 15 days, the initial ascertainment rate was set to r_{0} = 0.3, and the other parameters and initial states were set as in our main analysis (Extended Data Tables 1, 2). We repeated the stochastic simulations 100 times to generate 100 datasets. For each dataset, we applied our MCMC method to estimate b_{1}, b_{2}, r_{1} and r_{2}, setting all other parameters and initial values to their true values. We translated b_{1} and b_{2} into (R_{e})_{1} and (R_{e})_{2} according to equation (8), and focused on evaluating the estimates of (R_{e})_{1}, (R_{e})_{2}, r_{1} and r_{2}. We also tested the robustness to misspecification of the latent period D_{e}, presymptomatic infectious period D_{p}, symptomatic infectious period D_{i}, duration from illness onset to isolation D_{q}, ratio of transmissibility between unascertained and ascertained cases α, and initial ascertainment rate r_{0}. In each test, we changed the specified value of one parameter (or initial state) to be 20% lower or higher than its true value, and kept all other parameters unchanged. When we changed the value of r_{0}, we adjusted the initial states A(0), P(0) and E(0) according to Extended Data Table 1.
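The misspecification tests form a one-at-a-time grid: 6 settings, each scaled by 0.8 and 1.2, give 12 fits per dataset. A minimal Python sketch with hypothetical names; scaling both per-period D_q values together is our assumption, and the rescaling of A(0), P(0) and E(0) when r_0 changes is left out.

```python
# True values used to simulate the validation datasets (D_q is per period).
TRUE = {"D_e": 2.9, "D_p": 2.3, "D_i": 2.9, "D_q": (20.0, 5.0), "alpha": 0.55, "r0": 0.3}


def scaled(value, factor):
    """Scale a scalar, or every per-period entry of a tuple, by `factor`."""
    if isinstance(value, tuple):
        return tuple(v * factor for v in value)
    return value * factor


def misspecified_settings(true=TRUE, factors=(0.8, 1.2)):
    """One fitting setting per (parameter, factor); all others stay at their true values."""
    settings = []
    for name, value in true.items():
        for factor in factors:
            setting = dict(true)
            setting[name] = scaled(value, factor)
            settings.append((name, factor, setting))
    return settings


for name, factor, setting in misspecified_settings():
    print(f"{name} x{factor}: {setting[name]}")
```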
For each simulated dataset, we ran the MCMC method with 20,000 burn-in iterations and an additional 30,000 iterations. We sampled parameter values every 10 iterations, resulting in 3,000 MCMC samples. We took the mean across the 3,000 MCMC samples as the final estimate and displayed results for the 100 repeated simulations.
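The thinning arithmetic, as a minimal Python sketch (hypothetical helper names; whether the first or the last draw of each block of 10 is retained is an assumption):

```python
def thin(chain, burn_in, step=10):
    """Discard the burn-in draws, then keep every `step`-th remaining draw."""
    return chain[burn_in::step]


def posterior_mean(draws):
    return sum(draws) / len(draws)


# Validation runs: 20,000 burn-in + 30,000 further iterations -> 3,000 draws.
print(len(thin(range(50_000), 20_000)))    # 3000
# Main analysis: 40,000 burn-in + 100,000 further iterations -> 10,000 draws.
print(len(thin(range(140_000), 40_000)))   # 10000
```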
Sensitivity analyses for the real data
We designed nine sensitivity analyses to test the robustness of our results from real data. For each of the sensitivity analyses, we fixed parameters and initial states to be the same as the main analysis except for those mentioned below. For analysis (S1), we adjust the reported incidences from 29 January to 1 February to their average. We suspect the spike of incidences on 1 February might be caused by approximatedate records among some patients admitted to the field hospitals after 2 February. The actual dates for illness onset for these patients were likely to be spread between 29 January and 1 February. For analysis (S2), we assume an incubation period of 4.1 days (lower 95% confidence interval from ref. ^{6}) and a presymptomatic infectious period of 1.1 days (the lower 95% confidence interval from ref. ^{2} is 0.8 days, but our discrete stochastic model requires D_{p} > 1), equivalent to set D_{e} = 3 and D_{p} = 1.1, and adjust P(0) and E(0) accordingly. For analysis (S3), we assume an incubation period of 7 days (upper 95% confidence interval from ref. ^{6}) and a presymptomatic infectious period of 3 days (upper 95% confidence interval from ref. ^{2}), equivalent to set D_{e} = 4 and D_{p} = 3, and adjust P(0) and E(0) accordingly. For analysis (S4), we assume the transmissibility of the unascertained cases is α = 0.46 (lower 95% confidence interval from ref. ^{15}) of the ascertained cases. For analysis (S5), we assume the transmissibility of the unascertained cases is α = 0.62 (upper 95% confidence interval from ref. ^{15}) of the ascertained cases. For analysis (S6), we assume the initial ascertainment rate is r_{0} = 0.14 (lower 95% confidence interval of the estimate using Singapore data) and adjust A(0), P(0) and E(0) accordingly. For analysis (S7), we assume the initial ascertainment rate is r_{0} = 0.42 (upper 95% confidence interval of the estimate using Singapore data) and adjust A(0), P(0) and E(0) accordingly. 
For analysis (S8), we assume an initial ascertainment rate of r_{0} = 1 (the theoretical upper limit) and adjust A(0), P(0) and E(0) accordingly. For analysis (S9), we assume that there are no unascertained cases by fixing r_{0} = r_{12} = r_{3} = r_{4} = r_{5} = 1, and we compare this simplified model with the full model using the Bayes factor.
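The parameter overrides in (S2)–(S9) can be collected in one table, sketched below as a Python dictionary (a hypothetical layout). How A(0) is "adjusted accordingly" is not spelled out in this section; the sketch assumes that a fraction r_{0} of initial infections is ascertained, so that A(0) = I(0)(1 − r_{0})/r_{0}. That assumption is consistent with Extended Data Fig. 2, where I(0) = 1 and the main-analysis value r_{0} = 0.23 give A(0) = 3 after rounding. P(0) and E(0) are omitted because their adjustment rule is not given here.

```python
# Parameter overrides for sensitivity analyses S2-S9, transcribed from the text.
# The layout is hypothetical; parameters not listed keep their main-analysis
# values. S1 changes the data rather than the parameters, so it is omitted.
SCENARIOS = {
    "S2": {"D_e": 3.0, "D_p": 1.1},
    "S3": {"D_e": 4.0, "D_p": 3.0},
    "S4": {"alpha": 0.46},
    "S5": {"alpha": 0.62},
    "S6": {"r0": 0.14},
    "S7": {"r0": 0.42},
    "S8": {"r0": 1.0},
    "S9": {"r0": 1.0, "r12": 1.0, "r3": 1.0, "r4": 1.0, "r5": 1.0},
}


def initial_unascertained(i0, r0):
    """A(0) implied by I(0) ascertained cases when a fraction r0 is ascertained.

    ASSUMPTION (not stated in the text): A(0) = I(0) * (1 - r0) / r0,
    rounded to a whole number of people.
    """
    if not 0 < r0 <= 1:
        raise ValueError("r0 must lie in (0, 1]")
    return round(i0 * (1 - r0) / r0)


def apply_scenario(base, name):
    """Return a copy of the base parameters with one scenario's overrides applied."""
    params = dict(base)
    params.update(SCENARIOS[name])
    return params


if __name__ == "__main__":
    base = {"r0": 0.23}  # main-analysis r_0; other parameters omitted here
    print(apply_scenario(base, "S7"))      # {'r0': 0.42}
    print(initial_unascertained(1, 0.23))  # 3, matching Extended Data Fig. 2
    print(initial_unascertained(1, 1.0))   # 0: with r_0 = 1 no case starts unascertained
```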
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Data availability
The data analysed in this study are available on GitHub at https://github.com/chaolongwang/SAPHIRE.
Code availability
Code is available on GitHub at https://github.com/chaolongwang/SAPHIRE.
References
1. Pan, A. et al. Association of public health interventions with the epidemiology of the COVID-19 outbreak in Wuhan, China. J. Am. Med. Assoc. 323, 1915–1923 (2020).
2. He, X. et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nat. Med. 26, 672–675 (2020).
3. Wu, J. T., Leung, K. & Leung, G. M. Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study. Lancet 395, 689–697 (2020).
4. Wang, Y., Wang, Y., Chen, Y. & Qin, Q. Unique epidemiological and clinical features of the emerging 2019 novel coronavirus pneumonia (COVID-19) implicate special control measures. J. Med. Virol. 92, 568–576 (2020).
5. Lipsitch, M. et al. Transmission dynamics and control of severe acute respiratory syndrome. Science 300, 1966–1970 (2003).
6. Li, Q. et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N. Engl. J. Med. 382, 1199–1207 (2020).
7. Bai, Y. et al. Presumed asymptomatic carrier transmission of COVID-19. J. Am. Med. Assoc. 323, 1406–1407 (2020).
8. Mizumoto, K., Kagaya, K., Zarebski, A. & Chowell, G. Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship, Yokohama, Japan, 2020. Euro Surveill. 25, 2000180 (2020).
9. Nishiura, H. et al. Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19). Int. J. Infect. Dis. 94, 154–155 (2020).
10. Sutton, D., Fuchs, K., D’Alton, M. & Goffman, D. Universal screening for SARS-CoV-2 in women admitted for delivery. N. Engl. J. Med. 382, 2163–2164 (2020).
11. Tong, Z. D. et al. Potential presymptomatic transmission of SARS-CoV-2, Zhejiang Province, China, 2020. Emerg. Infect. Dis. 26, 1052–1054 (2020).
12. Ferretti, L. et al. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 368, eabb6936 (2020).
13. Kucharski, A. J. et al. Early dynamics of transmission and control of COVID-19: a mathematical modelling study. Lancet Infect. Dis. 20, 553–558 (2020).
14. Chinazzi, M. et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science 368, 395–400 (2020).
15. Li, R. et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science 368, 489–493 (2020).
16. Wu, J. T. et al. Estimating clinical severity of COVID-19 from the transmission dynamics in Wuhan, China. Nat. Med. 26, 506–510 (2020).
17. Lipsitch, M., Swerdlow, D. L. & Finelli, L. Defining the epidemiology of COVID-19 – studies needed. N. Engl. J. Med. 382, 1194–1196 (2020).
18. De Salazar, P. M., Niehus, R., Taylor, A., Buckee, C. O. & Lipsitch, M. Identifying locations with possible undetected imported severe acute respiratory syndrome coronavirus 2 cases by using importation predictions. Emerg. Infect. Dis. 26, 1465–1469 (2020).
19. Niehus, R., De Salazar, P. M., Taylor, A. R. & Lipsitch, M. Using observational data to quantify bias of traveller-derived COVID-19 prevalence estimates in Wuhan, China. Lancet Infect. Dis. 20, 803–808 (2020).
20. Levesque, J. & Maybury, D. W. A note on COVID-19 seroprevalence studies: a meta-analysis using hierarchical modelling. Preprint at https://doi.org/10.1101/2020.05.03.20089201 (2020).
21. To, K. K.-W. et al. Seroprevalence of SARS-CoV-2 in Hong Kong and in residents evacuated from Hubei province, China: a multi-cohort study. Lancet Microbe 1, e111–e118 (2020).
22. Xu, X. et al. Seroprevalence of immunoglobulin M and G antibodies against SARS-CoV-2 in China. Nat. Med. https://doi.org/10.1038/s41591-020-0949-6 (2020).
23. Liu, Y., Gayle, A. A., Wilder-Smith, A. & Rocklöv, J. The reproductive number of COVID-19 is higher compared to SARS coronavirus. J. Travel Med. 27, taaa021 (2020).
24. Tsang, T. K. et al. Effect of changing case definitions for COVID-19 on the epidemic curve and transmission parameters in mainland China: a modelling study. Lancet Public Health 5, e289–e296 (2020).
25. Zhang, J. et al. Changes in contact patterns shape the dynamics of the COVID-19 outbreak in China. Science 368, 1481–1486 (2020).
26. Liu, Y., Eggo, R. M. & Kucharski, A. J. Secondary attack rate and superspreading events for SARS-CoV-2. Lancet 395, e47 (2020).
27. Lloyd-Smith, J. O., Schreiber, S. J., Kopp, P. E. & Getz, W. M. Superspreading and the effect of individual variation on disease emergence. Nature 438, 355–359 (2005).
28. Tian, H. et al. An investigation of transmission control measures during the first 50 days of the COVID-19 epidemic in China. Science 368, 638–642 (2020).
29. Prem, K. et al. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study. Lancet Public Health 5, e261–e270 (2020).
30. Haario, H., Laine, M., Mira, A. & Saksman, E. DRAM: efficient adaptive MCMC. Stat. Comput. 16, 339–354 (2006).
31. Brooks, S. P. & Gelman, A. General methods for monitoring convergence of iterative simulations. J. Comput. Graph. Stat. 7, 434–455 (1998).
Acknowledgements
We thank H. Tian from Beijing Normal University for comments. This study was supported by the National Natural Science Foundation of China (91843302), the Fundamental Research Funds for the Central Universities (2019kfyXMBZ015) and the 111 Project (X.H., S.C., D.W., C.W., T.W.). X.L. is supported by Harvard University.
Author information
Contributions
T.W., X.L. and C.W. designed the study. X.H., S.C., X.L. and C.W. developed statistical methods. X.H., S.C. and D.W. performed data analysis. C.W. wrote the first draft of the manuscript. All authors reviewed and edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature thanks David Fisman and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Evaluation of the method on simulated data with two periods.
a, b, Illustration of one simulated dataset. We estimated b_{1}, b_{2}, r_{1} and r_{2} with the other parameters specified at their true values. The red points represent the mean estimates and the shaded areas indicate 95% credible intervals from 3,000 MCMC samples. c, Summary of results from 100 simulations. Each row represents an estimated parameter, as indicated on the right: (R_{e})_{1}, (R_{e})_{2}, r_{1} and r_{2}. The grey dashed line in each row represents the true value of the parameter being estimated. Each column represents a specified parameter, as indicated at the top: D_{e}, D_{p}, D_{i}, D_{q}, α and r_{0}, each of which was specified at its true value or at 20% below or above it. Each box summarizes estimates from 100 replicates: the horizontal line marks the median, the lower and upper edges of the box mark the interquartile range, and the whiskers mark the minimum and the maximum.
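The design of panel c, in our reading, varies one specified parameter at a time over three settings while the others stay at their true values, and summarizes each set of 100 replicate estimates as a box. The sketch below illustrates both steps. The function names and the true values in the usage lines are hypothetical, and quartiles are computed with the inclusive rule of Python's `statistics.quantiles`; the plotting software behind the figure may use a different quartile convention.

```python
import statistics

FACTORS = (0.8, 1.0, 1.2)  # 20% lower, true value, 20% higher


def misspecification_grid(true_values):
    """Yield (parameter, factor, specified value) for each box in panel c.

    Our reading of the design: one parameter is misspecified at a time,
    and all other parameters stay at their true values.
    """
    for name, value in true_values.items():
        for factor in FACTORS:
            yield name, factor, value * factor


def box_summary(estimates):
    """Five numbers drawn by each box: whisker ends, quartiles and median."""
    q1, median, q3 = statistics.quantiles(estimates, n=4, method="inclusive")
    return {"min": min(estimates), "q1": q1, "median": median,
            "q3": q3, "max": max(estimates)}


if __name__ == "__main__":
    # Hypothetical true values, for illustration only.
    for setting in misspecification_grid({"D_e": 2.0, "alpha": 0.5}):
        print(setting)
    print(box_summary(list(range(1, 10))))
    # {'min': 1, 'q1': 3.0, 'median': 5.0, 'q3': 7.0, 'max': 9}
```

With six specified parameters and three settings each, panel c contains 18 boxes per row, the three at the true value being the same configuration.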
Extended Data Fig. 2 Estimation of R_{0} using daily incidence data, starting from 9 December.
Following the main analysis, we assumed r_{0} = 0.23 and accordingly set I(0) = 1, A(0) = 3, E(0) = 17 and P(0) = H(0) = R(0) = 0. We assumed that the transmission rate b, the ascertainment rate r and the duration from illness onset to hospitalization D_{q} (set to 21 days) remained the same until 22 January 2020. All other settings were the same as in the main analysis. The shaded area in the plot indicates the 95% credible interval estimated by the deterministic model with 10,000 sets of parameter values sampled from the MCMC output. Unlike in the other analyses, we did not construct 95% credible intervals by stochastic simulation, because stochastic variation in the early days had very large effects owing to the low counts. The inset histogram shows the distribution of the estimated R_{0} from 9 December 2019 to 22 January 2020; the mean estimate was 3.38 (95% credible interval 3.28–3.48).
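The two summaries in this caption, a posterior mean with a 95% credible interval and a shaded band built from deterministic trajectories run with parameter sets drawn from the MCMC output, can be sketched as follows. This assumes an equal-tailed interval (2.5th and 97.5th percentiles, inclusive interpolation in Python's `statistics` module) and a band computed day by day; the original analysis may use a different quantile convention, and the draws and trajectories in the usage lines are made up.

```python
import statistics


def credible_interval_95(draws):
    """Mean and equal-tailed 95% interval (2.5th, 97.5th percentiles) of draws."""
    cuts = statistics.quantiles(draws, n=40, method="inclusive")  # 39 cut points
    return statistics.fmean(draws), (cuts[0], cuts[-1])


def pointwise_band(trajectories):
    """Per-day 95% band across trajectories, each a sequence of daily values."""
    band = []
    for day_values in zip(*trajectories):
        _, (low, high) = credible_interval_95(day_values)
        band.append((low, high))
    return band


if __name__ == "__main__":
    # Made-up numbers for illustration; these are not the R_0 draws from the paper.
    draws = list(range(0, 1001))
    print(credible_interval_95(draws))  # (500.0, (25.0, 975.0))
    trajectories = [[k, 2 * k] for k in range(0, 1001)]
    print(pointwise_band(trajectories))  # [(25.0, 975.0), (50.0, 1950.0)]
```

Computing the band pointwise means the shaded area need not contain any single simulated trajectory in full; it only bounds the central 95% of values on each day.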
Supplementary information
Supplementary Figures
This file contains Supplementary Figures 1–10.
About this article
Cite this article
Hao, X., Cheng, S., Wu, D. et al. Reconstruction of the full transmission dynamics of COVID-19 in Wuhan. Nature 584, 420–424 (2020). https://doi.org/10.1038/s41586-020-2554-8