Harnessing peak transmission around symptom onset for non-pharmaceutical intervention and containment of the COVID-19 pandemic

Within a short period of time, COVID-19 grew into a world-wide pandemic. Transmission by pre-symptomatic and asymptomatic viral carriers rendered intervention and containment of the disease extremely challenging. Based on reported infection case studies, we construct an epidemiological model that focuses on transmission around the symptom onset. The model is calibrated against incubation period and pairwise transmission statistics during the initial outbreaks of the pandemic outside Wuhan with minimal non-pharmaceutical interventions. Mathematical treatment of the model yields explicit expressions for the size of latent and pre-symptomatic subpopulations during the exponential growth phase, with the local epidemic growth rate as input. We then explore reduction of the basic reproduction number R0 through specific transmission control measures such as contact tracing, testing, social distancing, wearing masks and sheltering in place. When these measures are implemented in combination, their effects on R0 multiply. We also compare our model behaviour to the first wave of the COVID-19 spreading in various affected regions and highlight generic and less generic features of the pandemic development.

T he Coronavirus Disease 2019 (COVID-19) is a new contagious disease caused by the novel coronavirus (SARS-COV-2) 1 , which belongs to the genera of betacoronavirus, the same as the coronavirus that caused the SARS epidemic between 2002 and 2003 2 . COVID-19 has spread to more than 200 countries/regions, with over 102 million confirmed cases and 2.2 million lives claimed as of January 31, 2021 3 . The outbreak has been declared a pandemic and a public health emergency of international concern 4 . As the specific symptoms of COVID-19 are now well-publicised, symptomatic transmissions are being contained in most countries. However, disease transmission by pre-symptomatic and asymptomatic viral carriers is seen to be extremely difficult to deal with due to its hidden nature 5 . Clinical data reveals that viral load becomes significant before the symptom onset [6][7][8] . Epidemiological investigations have identified clear cases of presymptomatic transmission soon after the initial outbreak [9][10][11][12] . Estimates vary greatly among experts on the percentage of total transmission due to this group of viral carriers, ranging from as low as 18% to over 50% [13][14][15] . An early model-based study by Ferretti et al. 16 suggested that pre-symptomatic transmission alone could yield a basic reproduction number R 0,p = 0.9, close to the critical value of 1.0 that sustains epidemic growth. Under intense surveillance of the pandemic, pre-symptomatic and asymptomatic transmissions become the main focus in outbreak control 5 .
While the actual viral shedding is influenced by many factors, patient viral load during the course of disease progression is more universal. This suggests a modelling approach that starts with clinical observations of symptom onset, and treats disease transmission as a dependent process that is further shaped by living and social conditions, including control measures to reduce physical contact. Following this strategy, we first introduce a model for an unprotected population and calibrate the model parameters against clinical case reports during the initial outbreak. Subsequently, we estimate the percentage reduction in the basic reproduction number (estimated to be around 3.87 at an exponential growth rate of 0.3/day) due to contact tracing, mask wearing and other measures, individually or in combination. Additionally, we present our findings against the epidemic development curves around the world to highlight the level of social mobilisation required to contain COVID-19 spreading.

Results
A renewal process centred on symptom onset. In epidemiological studies, the central quantity is the average number of secondary infections per unit time r(t) by a viral carrier on day t since the individual's own infection 17,18 . In the case of COVID-19, disease transmission peaks around the symptom onset time of the individual 7,8 , as illustrated by the infectiousness curve shown in Fig. 1a (left panel). This property, when averaged over the population, gives an r(t) (Fig. 1a, right panel) that closely resembles the symptom onset time distribution, which we denote by p O (t) (Fig. 1a, middle panel). In fact, when the time window of transmission is narrowly centred around the symptom onset, we have approximately The mean reproduction number R E sets the overall level of disease transmission in the population, and equals the basic reproduction number R 0 when the infectious disease first breaks into a community. Its actual value could change over time due to factors such as the intervention and containment measures considered below. The shift parameter θ S (Fig. 1a, right panel) accommodates the actual shape of the infectiousness curve as well as effects resulting from intervention measures, e.g., isolation delays of infected cases.
To link up Eq. (1) with actual transmission data, we developed a compartmentalised epidemic spreading model as illustrated in Fig. 1b. A total of four phases are introduced to accommodate the infectiousness curve in Fig. 1a, left panel. Three of these phases reside in the pre-symptomatic period: a non-infectious latent phase L, followed by infectious phases A 1 and A 2 before and after the infectiousness peak. Starting from the day of infection, an individual first stays in the latent phase L. Transition to phase A 1 takes place at a rate α L (t) that depends on the elapsed time t since infection. Once in phase A 1 , the individual is infectious with a daily transmission rate β A . Duration of the A 1 phase is variable and follows Poisson statistics with an exit rate constant α A . On the other hand, duration of the succeeding phase A 2 is fixed at θ P , after which symptoms develop and the person enters the symptomatic phase S. Upon entering A 2 , the patient's disease transmission rate β B (τ) weakens with the elapsed time τ = t−t O + θ P to match the right-wing of the infectiousness curve. Note that, due to the variable duration of A 1 , the population-averaged infectiousness of this phase rises towards the symptom onset.
Applying the above rules of disease transmission to a large and well-mixed population, the number of new infections per unit time J L (T) on day T satisfies the renewal equation where the kernel function is given by ð Þdt 1 being the probability that an individual remains in the latent phase t days after infection. Derivation of these results are presented in Supplementary Section 1, together with dynamic equations governing the size of each subgroup.
Equations (2) and (3) can be solved by performing the Laplace transform. In this respect our model is equally tractable mathematically as the susceptible-exposed-infectious-recovered (SEIR) type models defined by a set of rate equations 19 . As we show below, the explicit representation of the temporal structure for disease progression and transmission in the present case facilitates direct model calibration from clinical data and also quantitative evaluation of intervention measures against epidemic development.
In Supplementary Section 1.3, we show that the mean reproduction number of the model is given by Þdτ being reproduction numbers associated with pre-symptomatic and symptomatic transmissions, respectively. When the right wing of the infectiousness curve in Fig. 1a takes the form of an exponentially decaying function β B τ ð Þ ¼ β A e Àα B τ with a sufficiently large decay rate α B , we recover Eq. (1) which was initially proposed on heuristic grounds. The shift parameter is given approximately by Parameter calibration. By combining three data sets 11,20,21 with a total of 347 infection cases outside the Hubei province in China, we estimated the incubation period statistics p O (t) (see Fig. 2a). Due to the difficulty in identifying a precise date of infection, a window is assigned to the incubation period in each case. A rudimentary way to deal with the uncertainty is to treat all possible values inside the window as equally likely. This procedure yields a statistical distribution for each of the three data sets as well as the conglomerated one, as shown by symbols in Fig. 2a.
Alternatively, viewing the data as samples of a common underlying probability distribution, we estimated p O (t) by likelihood maximisation (see the "Methods" section and Supplementary Section 2.1). Within the class of functions considered, the lognormal distribution combined with an exponential tail yields the largest likelihood value (Fig. 2a, red line). From day 6 onward, p O (t) follows an exponential decay with a rate of −0.31/day, with a 95% confidence interval (CI) of (−0.35, −0.27) per day. We have also examined other values (from day 4 to day 8) for the switch. In all cases, exponential tail decay rates are found to be round −0.31/ day (see Supplementary Section 2.1).
A data set of 77 pairwise transmissions in several eastern and southeastern Asian countries and regions during their initial COVID-19 outbreak was compiled by He et al. 7 We took 66 pairs with unique symptom onset of the primary case to estimate the transmission probability p I (Δt), with Δt measured from the symptom onset of the primary case. The results are shown in Fig. 2b (see the "Methods" section and Supplementary Section 2.2 for details). Under a maximum-likelihood estimation scheme, we considered three alternative forms for p I (Δt). All have exponential tails far away from the transmission peak, but differ in the way  Fig. 1 A stochastic model for COVID-19 disease progression, transmission and intervention. a The mean reproduction rate r(t) (black curve) of a patient on day t since infection is expressed as a convolution of the symptom onset time distribution p O (t) (red curve) and the infectiousness curve R E p I (Δt) (blue curve), where Δt is measured from the symptom onset. The mean reproduction number R E sets the overall level of the epidemic. The peak of the normalised infectiousness function p I (Δt) is shifted from the symptom onset by an amount θ P , which takes a positive value on the pre-symptomatic side. The peak of the mean reproduction rate r(t) is shifted from the peak of the symptom onset time distribution p O (t) by θ S . b A compartmentalised model. A person infected first goes through a non-infectious latent phase (L) until t L , followed by an infectious period that spans across symptom onset at t O . In the pre-symptomatic phase A, the person is infectious without symptoms. The A phase is further split into two subphases, A 1 with a constant transmission rate (orange region) and A 2 with a declining transmission rate (blue region). At the symptom onset time t O , the person enters the S phase, and continues to be infectious (light blue region). Contact tracing brings an infected person out of the transmission cycle at the point of isolation, while testing does so only when the result is positive.
the two wings are joined together in the peak region. In the first case, the two exponential tails join directly to produce a cusp in the middle. In the second case, a flat top of variable width is introduced. In the third case, the flat top is replaced by a parabolic cap to give a more rounded peak. It turns out that the cusp function, with its peak located at 0.68 days before the symptom onset, is the most probable for this data set (Fig. 2b). Decay rates for the left and right wings are given by 0.46/day and 0.54/day, respectively (see Supplementary Section 2.2.2). Duration of the A 1 phase follows a Poisson process and is hence exponentially distributed. This gives rise to an exponential tail of the population-averaged infectiousness curve prior to entering the A 2 phase. We therefore set the model parameters to α A = 0.46/day, θ P = 0.68 days, and β B τ ð Þ ¼ β A e Àα B τ with α B = 0.54/day. These values were used in our numerical calculations, with the corresponding CIs given in Table 1.
Serial interval. Xu et al. 22 compiled a database of 1407 COVID-19 transmission pairs outside the Hubei province in China between early January till mid-February 2020. Among them, 677 pairs have the symptom onset dates and social relationships of infector-infectees. A detailed analysis of the data set, stratified before, during and one week after the Wuhan lockdown on 23 January 2020, was carried out by Ali et al. 23 which showed reduction of the serial interval of symptom onsets by a factor of 3 over the 5 weeks. In Fig. 2c, we show the distribution of the serial interval data for the whole period (solid circles) and separately for the first (open squares) and last two weeks (open triangles) of the period. The red line gives the predicted serial interval distribution using our estimated values for p O (t) and p I (Δt) (see Supplementary Section 2). While the overall agreement with the unstratified data is good especially on the positive side, it is also evident that serial intervals can be affected by travel and prevention measures, such as the percentage of imported cases among the infectors, the typical length of isolation delays, etc., which changed substantially before and after the Wuhan lockdown. Such temporal effect on the serial interval can be simulated simply with a shape function that masks p I (Δt). For example, imported index cases who spent part of their infectious period outside the region shift p SI (t SI ) to the right. On the other hand, vigorous contact tracing shortens isolation delays significantly, which in turn shifts p SI (t SI ) to the left. The long-time tail of both p O (t) and p SI (t SI ) decays slower than the rates α A and α B associated with the infectiousness curve. We have computed the tail of the probability q L (t) to remain in the latent phase, whose decay rate matches that of p O (t) (see Supplementary Section 2.4). According to Eq. (1), the long-time tail of the mean reproduction rate r(t) can also be attributed to infected cases who have a long incubation period in their disease progression.
Mean reproduction number. Under Eq. (1), the well-known Lotka-Euler estimating equation 24 yields  Fig. 3a, which covers both the growth (λ > 0) and declining (λ < 0) phases of the epidemic. The slope of the curve at R E = 1 is given by 1/τ g , where τ g is the mean generation time and equals τ O −θ S = 6.19 days under Eq. (6). The intercept of the curve at R E = 0 gives an ultimate epidemic decay rate of −0.31/day when disease transmission comes to a complete halt.
To estimate the uncertainty in the computed R E −λ curve, we performed bootstrap analysis of the data used to obtain p O (t) and p I (Δt). The detailed procedure is described in Supplementary Section 2, with the result shown in Fig. 3a. At a growth rate of λ = 0.3/day, our estimated value for the basic reproduction number R 0 is 3.87 (95% CI [3.38, 4.48]).
Composition of the infected population. As we demonstrate in the Supplementary Information, the convolutional form of our main Eq. (2) enables many analytic results to be derived and evaluated with the calibrated parameters. Figure 3b shows the probabilities that a given individual is in one of the four phases on day t after infection, computed using the formula in Supplementary Table 1. The red line marks the boundary between the pre-symptomatic and symptomatic phases. The width of the orange-coloured region (A 1 phase), on the other hand, is proportional to α À1 A % 2 days. Figure 3c, obtained from the Laplace transforms of these curves, gives the percentage of the infected population in each of the four phases on a given day when the epidemic is growing at a rate λ. These curves allow for estimation of the hidden population in L, A 1 and A 2 phases from the knowledge of S in real-time. They form the basis for quantitative assessment of intervention measures discussed below. Note that at high growth rates, a larger percentage of the infected population is in the latent and pre-symptomatic phases, so that suppressing transmission by this group through, say mask wearing and social distancing, assumes a greater priority.
Testing and contact tracing. To break the transmission chain in the community, governments around the world have adopted two measures with varying levels of intensity: (1) testing and isolating infected individuals and (2) tracing and quarantining contacts of infected individuals.
For testing control, persons who were in close proximity to a confirmed infection case are asked to undergo voluntary or mandatory testing for infection, and quarantined when the result is positive. From Fig. 3b we see that, if the test is conducted shortly after infection, the individual has a high probability to still be in the latent phase, hence the test result is likely to be negative. On the other hand, if the test is conducted too late, the person may have already infected others so that the reduction of r(t) given by Eq. (1) is small. Therefore, there is an optimal window between the infection date and the test date, which we analyse in Supplementary Section 4.1.2. In Fig. 4a, we show the reduction of the basic reproduction number R 0 as a function of the reporting delay, assuming all suspected contacts are tested. At R 0 = 3.87, if the results become available immediately after testing, the reduction of R 0 is shown as the blue curve, better than the testing outcomes with one day delay (red curve). The largest reduction is obtained when the test is performed 3 days after the contact. This corresponds to the day when the width of the orange plus dark blue region in Fig. 3b is the widest. For contact tracing and quarantine, we show our results under the scenario that a fraction q c of infectees are tracked down and quarantined within a time window t trace since infection (Fig. 4b,  blue line). This would bring the mean reproduction number R E from R 0 = 3.87 to a value below 1 if full tracing and quarantine is executed within 6 days after contact. An 80% tracing efficiency shrinks the time window to 3-4 days for achieving the same effect (Fig. 4b, red line). Details can be found in Supplementary Social distancing and mask wearing. Other than government-led interventions to break the transmission chain, individual-led efforts, including social-distancing, mask-wearing, frequent handwashing, etc., can slow down or even stop the outbreak. Among them, radical shifts have taken place in people's attitudes towards population-wide mask wearing. It was practiced in most Asian countries since the initial phase of the outbreak, yet not adopted by the EU and USA until June 2020. As of August 2020, community mask use was recommended or required by most major public health bodies 25,26 . However, despite multiple experiments performed on measuring the trapping efficacy of masks on viral particles at individual's level 27-30 the aggregate impact of mask wearing at the population level is not yet clearly quantified. Given the now established risk of pre-symptomatic transmission, and the dominant role of droplet-mediated COVID-19 infections 31 , masks with relatively low efficacy for personal protection may nevertheless reduce the overall infections in a population 32 . Based on a previous study on influenza aerosols 33 , we constructed a semi-quantitative model to show that mask-wearing reduces r(t) and hence R E by a factor (1−e ⋅ p m ) 2 , where e is the efficacy of trapping viral particles inside the mask, and p m is the percentage of the mask-wearing population (see Supplementary Section 4.2). According to this model, even for masks with intermediate efficacy (e = 50%), population-wide mask-wearing at p m = 98% alone could bring down R E from its basic value R 0 = 3.87 to 1, assuming no social segregation of mask-wearing and non-maskwearing groups.
When combined with contact tracing (Fig. 4c), the two effects multiply. Figure 4c shows a heatmap of the reduced R E when contact tracing and isolation is completed within 5 days of infection. The solid black line indicates that the reduced R E reaches 1. For example, the combination of tracing of close contacts at 60% efficiency within 5 days and 60% of the general public wearing masks achieves the same purpose. This target line can be reached with lower percentages when close contacts can be found within 2 days of possible infection (dash-dotted line), but the numbers need to be higher when the time frame is relaxed to 8 days (dashed line).
Provincial outbreaks and containment in China. We examined the temporal progression of COVID-19 outbreaks in different Daily growth rate (1/day) Fig. 3 Basic model predictions. a The relationship between the epidemic growth rate λ and the mean reproduction number R E . The grey lines, generated using the data shown in Fig. 2 with bootstrapping, give the range of uncertainty in the estimated function. At λ = 0.3/day, R E = R 0 = 3.87. CI stands for confidence interval. b Probabilities for an infected individual being in each of the four phases on day t after infection. The thick red line indicates the boundary between the pre-symptomatic and symptomatic phases. c Percentage of the infected population in each of the phases when the epidemic is growing at a rate λ. The thick red curve indicates the boundary between the pre-symptomatic and symptomatic population.
parts of the world using the data available from the Johns Hopkins CSSE Repository 34 , with the aim to extract more universal aspects of the pandemic development in light of our model studies. In the case of China, we focused on the daily confirmed cases from various provinces since the Wuhan lockdown on January 23, 2020. Broadly speaking, the ascending and descending curves follow very similar exponential laws, while the time it took to achieve the crossover was affected by the overall extent of the epidemic as well as occurrences of smaller outbreaks. From the data, we define three phases of epidemic development. Phase I is characterised by an exponential growth of the epidemic. In the first week after the Wuhan lockdown, nearly all provinces registered a growth rate of~0.3/day (Fig. 5, region shaded in pink) in the newly confirmed cases. Reports indicate that most of the growth during this period was driven by imported cases from Hubei province, whose own growth continued at this rate for a longer period (Fig. 5a). The fraction of local infections during import-driven growth can be calculated and the result depends on the local value of R E through its mean reproduction rate Eq. (1) (see Supplementary Section 4.3).
Phase II is a crossover phase where public policies on border control and local intervention measures become increasingly stringent. On a logarithmic scale, data from the most affected provinces (except Hubei) show consistent behaviour. Closer examination, however, reveals the presence of sporadic outbreaks. Well-documented examples include prison cases in Hubei, Shandong and Zhejiang provinces 35 . Overall, under the swift and forceful implementation of COVID-19 surveillance, turnaround of the epidemic in provinces other than Hubei was reached in about 3 weeks after the Wuhan lockdown. In Fig. 5b and the Supplementary Fig. 5 (see Supplementary Section 4.4), we present simulation results using our model, assuming a linear decrease of R E from a local value of 2.0 to zero over a period ΔT, which indeed reproduces the data in Fig. 5. The more gradual change of R E assumed in our simulations can be interpreted as due to the progressive mobility control and isolation policies including additional lockdowns, which took place from February 4 to 10, 2020 36,37 , as well as allocation of massive resources by relevant authorities to conduct rigorous contact tracing and to rapidly expand isolation facilities for use by COVID-19 patients 38 .
Phase III, or the final descent, occurred when the intervention measures essentially terminated transmission in the community. The few that re-emerged were quickly traced and contained. Within our model, the newly confirmed cases in this period are identified with the shrinking number of individuals moving from the latent to the symptomatic phase, as one moves along the time axis in Fig. 3b (see also Supplementary Section 4.5). Strikingly, the observed decay rate in this phase reached the maximum value of 0.31/day predicted by our model, including data from Hubei province shown in Fig. 5a. This observation indicates that the infected cases were isolated at extremely high efficiency. Interestingly, a similar decay in the daily new cases is seen on the cruise ship Diamond Princess (Fig. 5b).
The first wave of COVID-19 outbreaks in other countries and regions. Figure 6a- Fig. 4 Reduction of the mean reproduction number upon intervention. a Testing. Results are given for testing with 0 or 1 day reporting delay (blue and red curves), respectively. CI confidence interval. b Contact tracing and isolation. Results are shown for 100% (blue) and 80% (red) success rates, respectively. The 95% CIs of the estimated quantities in a and b (shaded areas) were obtained through bootstrap resampling with 1000 replications symptom onset times of 347 cases 11,20,21 and exposure windows of 66 transmission pairs 7 . c Mask-wearing in combination with contact tracing. The heatmap gives the reduced R E when contact tracing is implemented within 5 days after infection, assuming a basal value of 3.87. The solid black line marks the percentages required to reduce R E to 1. The dash-dotted line and the dashed line map out the percentages required to flatten the epidemic growth when the time frame for contact tracing is reduced to 2 days or relaxed to 8 days, respectively.
countries and regions from late January till end of March 2020. Countries and regions in east Asia shown in Fig. 6a experienced the first wave sooner than the rest of the world, but the epidemic growth rate is much lower than other places due to the prevention measures in place such as border control and mask wearing by the general public. Despite these measures, South Korea documented a major outbreak in the second half of February that elevated the overall level of the epidemic in the country 39 (Fig. 6c). In countries in Europe and in the US, exponential growth of the pandemic, with a growth rate close to 0.3/day, were reported from the beginning of March onward (Fig. 6b), driven by local infections. The surging pandemic triggered an emergency response by public health authorities and governments at all levels. Towards the end of March, countries that adopted stringent intervention measures have seen a significant reduction of the pandemic growth rate (Fig. 6b). The government of Italy imposed a national quarantine on March 9, 2020 40 , after which growth in the number of newly confirmed cases slowed down 34 . On the other hand, South Korea implemented aggressive contact tracing and testing policies 41,42 , enabling the country to bring the outbreak to a much-reduced level at R E ≈ 1.0.
In Fig. 6d we show the estimated epidemic growth rate λ(T) against the cumulative number of confirmed cases N(T) in five representative countries. We computed the growth rate from the local slope of the ln N(T) against T curve, i.e., λ(T) = ln [N(T)/N (T−ΔT)]/ΔT, using a time window ΔT = 3 days. The interval between a few tens to a few thousands cumulative cases can be taken as the first phase of local outbreaks in these countries, where the estimated values of λ(T) remain approximately stable. Three of the five countries exhibited growth rates of~0.3/day during this period, while Iran and Japan assumed values above 0.4/day and around 0.1/day, respectively. It is evident that epidemic preparedness and cultural aspects significantly affected COVID-19 spreading in the local population, before government intervention and containment measures took effect. A more complete discussion of growth rates during the exponential phase in different countries and regions can be found in Supplementary Section 5.

Discussions
We have succeeded in developing a directly calibratable model for COVID-19 transmission by both pre-symptomatic and symptomatic viral carriers. This was made possible by focusing on transmission around the symptom onset, which is a prominent feature of the disease. Explicit mathematical expressions for the size of subpopulations in various phases of disease progression and the associated transmission risks are obtained. These results facilitate assessment of control measures, either to break the transmission chain or to reduce the overall level of social contacts in the community. For example, contact tracing, in combination with mask wearing in public places, can have a strong and immediate effect in bringing down epidemic growth. In reality, governments often take incremental steps in intervention measures to ease their impact on the economy and on people's livelihood. The quantitative treatment of epidemic control carried out in this study can serve as a reference in the decision-making process.   On a technical level, the modelling framework presented here is intuitive and flexible, and allows easy association of clinical features with population level pandemic development. This can be a significant advantage when the need arises to adapt the epidemic model to specific social environments and demographic composition. Our estimated incubation period distribution is in excellent agreement with other studies (see Table 1 for a comparison of key statistical features) and furthermore is not expected to change significantly over time. This places Eq. (1) as a convenient starting point for exploring temporal structures of epidemic development. The shift parameter θ S in the equation embodies, in an explicit form, changing patterns of disease transmission from symptomatic to the pre-symptomatic viral carriers, and hence can serve as an important index for epidemic control.
With regard to the quantitative predictions under specific intervention measures, the main uncertainty comes from estimation of their efficacy in reducing transmission from the infectious subpopulations identified in this study. As a baseline study, we estimated the infectiousness function p I (Δt) based on a relatively small data set of 66 transmission pairs which led to a sizable CI at 95% for its wings. This could improve as more carefully curated transmission cases during the initial outbreak become available. Response of the public to specific intervention measures is a complex topic that deserves extensive research in the future.
Finally, as with other epidemic models that assume a wellmixed population, our current modelling framework does not treat epidemic spreading in a heterogeneous population that exhibits complex spatio-temporal dynamics, nor does it consider significant differences in disease progression and transmission in different age groups. Some of the basic questions in COVID-19 epidemiological studies, such as whether pre-symptomatic spread constitutes a major contributor to disease transmission 43,44 , cannot have definitive answers without considering these additional factors. In a large population, while individual outbreaks in specific communities may still follow the dynamics proposed here with suitable values of R E , transmission across communities requires a separate treatment.

Methods
Key variables and parameters. We collect key variables and parameters of the compartmentalised model together with the estimated values in Table 1 for easy reference.
Incubation period distribution. We analysed incubation periods of a total of N = 347 cases by combining three datasets 11,20,21 . For most cases, the infection date can only be assigned to a time window of more than one day. Therefore, the actual incubation period falls between IPl i and IPu i , i = 1, …, N, where IPl i and IPu i are the lower and upper bounds for case i. We perform maximum-likelihood estimation of the underlying symptom onset time distribution p O (t), following a scheme proposed by Reich et al. 47 Considering the exponential tail observed in the real data, we write where A is the normalisation factor and θ denotes the set of parameters to be estimated. Transition to the exponential decay (with rate γ) takes place at t e . Following common practice in the epidemiological literature, we take p left (t) to be a truncated log-normal or Weibull distribution with two parameters in each case Continuity of derivatives at t e yields Thus we are left with a set of three independent parameters. To estimate these parameters from the data, we consider the likelihood function with IPl i À0:5 We performed optimisation and sensitivity analyses by scanning t e values from 4 to 8, and infinity for log-normal distribution and from 3 to 7, and infinity for Weibull distribution. The best estimate is obtained when p left (t) is a truncated lognormal distribution with t e = 6 (see Supplementary Sec. 2.1 for details).
We also performed bootstrap analysis to determine uncertainties in the estimated p O (t). This is done by generating 1000 re-sampled copies of the original dataset with 347 cases. The maximum-likelihood estimation of p O (t) is then performed for each of the re-sampled copies. The 95% CIs were obtained from the 1000 replications (see Table 1).
Infectiousness profile. Disease transmission is quantified by the infectiousness function p I (t), the probability density function for pairwise transmission at time t since the symptom onset of the infector. We infer p I (t) by maximum-likelihood estimation, using the infector-infectee pairs published by He et al. 7 . In this dataset, the infectee exposure windows were documented in addition to the symptom onset dates of both infectors and infectees (77 pairs in total). Among them, 66 pairs have a unique symptom onset date (see Source Data), which are used here.
Given the general form and the limited temporal resolution of the dataset, we adopted simple exponentials for the two wings of the infectiousness function joined in the middle by a cap function, where A is the normalisation factor. The infectiousness function transits to the left exponential tail at θ A and to the right exponential tail at θ B . Between θ A and θ B , it takes the form of f(t). We consider three different forms of f(t): • Model 1: f(t) = 1 and θ A = θ B = θ P (two exponential tails directly join at θ P ); Independent parameters θ = (α A ,α B ,θ P ).
We perform maximum-likelihood estimations using the dataset mentioned above, where each transmission pair i is associated with an exposure window W i = [Wl i ,Wu i ] relative to the symptom onset of the infector. The likelihood function is constructed as follows: where L i ¼ L θ; Wl i ; Wu i ð Þ ¼ Z Wu i þ0:5 Wl i À0:5 Sensitivity analysis is performed at a set of values for ϵ (Model 2) and χ (Model 3), respectively. In both cases, the best estimate degenerates into Model 1 (see Supplementary Section 2.2 for details).
The uncertainty in the estimated p I (t) is determined through bootstrapping with 1000 replications, with which the 95% CIs were obtained (see Table 1).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/ licenses/by/4.0/.