Evolution of disease transmission during the COVID-19 pandemic: patterns and determinants

Epidemic models are being used by governments to inform public health strategies to reduce the spread of SARS-CoV-2. They simulate potential scenarios by manipulating model parameters that control processes of disease transmission and recovery. However, the validity of these parameters is challenged by the uncertainty of the impact of public health interventions on disease transmission, and the forecasting accuracy of these models is rarely investigated during an outbreak. We fitted a stochastic transmission model on reported cases, recoveries and deaths associated with SARS-CoV-2 infection across 101 countries. The dynamics of disease transmission was represented in terms of the daily effective reproduction number (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt). The relationship between public health interventions and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt was explored, firstly using a hierarchical clustering algorithm on initial \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt patterns, and secondly computing the time-lagged cross correlation among the daily number of policies implemented, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt, and daily incidence counts in subsequent months. The impact of updating \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt every time a prediction is made on the forecasting accuracy of the model was investigated. We identified 5 groups of countries with distinct transmission patterns during the first 6 months of the pandemic. Early adoption of social distancing measures and a shorter gap between interventions were associated with a reduction on the duration of outbreaks. The lagged correlation analysis revealed that increased policy volume was associated with lower future \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt (75 days lag), while a lower \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt was associated with lower future policy volume (102 days lag). Lastly, the outbreak prediction accuracy of the model using dynamically updated \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt produced an average AUROC of 0.72 (0.708, 0.723) compared to 0.56 (0.555, 0.568) when \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt was kept constant. Monitoring the evolution of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt during an epidemic is an important complementary piece of information to reported daily counts, recoveries and deaths, since it provides an early signal of the efficacy of containment measures. Using updated \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt values produces significantly better predictions of future outbreaks. Our results found variation in the effect of early public health interventions on the evolution of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R_t$$\end{document}Rt over time and across countries, which could not be explained solely by the timing and number of the adopted interventions.

Throughout the manuscript, we make use of publicly available data of newly confirmed daily cases, recoveries and deaths as reported by Johns Hopkins University (JHU) from 22 Jan until 31 Dec 2020. Our SEIRD model has been coded in Python 3.8 and a replicable example of the model can be found at github.com/EliotZhu/EpiModel.
We used an extension of the traditional SEIRD model that allows for variation in delays from onset to report and the detection rate of testing in each country. In this way, if S k , E k , I k , R k , and D k represent respectively the susceptible, exposed, infectious, recovered and dead for each country or region k, their change in time is modeled as: where N k (t) = S k (t) + E k (t) + I k (t) + R k (t) + D k (t) is the total population after the travel ban if we ignore births and deaths unrelated to COVID-19. The effective transmission rate, β k (t) , and and the mean fatality rate, ξ k (t), for each country are assumed to follow geometric Brownian motions: Meanwhile, the clinical parameters σ -the mean rate of becoming symptomatic (that is 1/incubation period), and γ -the mean rate of recovery (that is 1/infectious period) are considered constant and obtained from the literature with σ = (5.2 days) −1 and γ = (2.9 days) −1 . We therefore implicitly assume individuals become infectious when they are symptomatic. The unknown parameters of interest, β k (t) and ξ k (t), are estimated by jointly fitting the daily observed cumulative number of confirmed cases ( C k ), confirmed recoveries ( R k ) and deaths ( D k ) using: where ρ k represents a delay between the actual incidence, recovery or death date and their reported date, and η k (t) , is the time-varying case detection rate. The time-varying transmission rate, β k (t) and mortality rate, ξ k (t) were estimated using sequential Monte Carlo (SMC) by jointly fitting them to three data sets: 1. Daily number of new cases reported in each country by date from onset.
2. Daily number of recoveries reported in each country.
3. Daily number of deaths reported in each country.
The SMC process proposes trajectories of the unobserved state process according to the probabilistic rules specified in equations (1) to (10). For each country, we generated 300 trajectories x n, j with 5000 particles (n = 1, 2, . . . , 5000, j = 1, 2, . . . , 300) for x = β k (t) and ξ k (t) respectively. We first used the model outputs to calculate expected trajectories for each of the three datasets we are fitting to, then we used a negative binomial observation model to compute the likelihood of fitting to these datasets, using the expected values from model outputs. The time varying basic reproduction number can be calculated as R t = β(t)/γ. The value of assumed delay from disease onset to reporting ρ k ∈ [0, 25], the detection rate at each time point η k (t) ∈ [0, 1], and the volatility of the random walks ζ k ∈ [0, 1] were estimated using grid search via the maximum likelihood approach (see Figures 1 to 3). Initial expected values of β k (0) and ξ k (0) were set to zero.
Profile likelihoods (see Figure 4 ) for each parameter were constructed based on the joint likelihood distribution of fitted counts of incidence, recovery and death. We assumed that the outbreak started with one infectious individual in each country and that the country's population was initially fully susceptible. The accuracy of the fitting is presented in Figure 5.

Additional results for clustering analysis
We present the dendrogram from the WPGMA clustering algorithm in Figure 6. Clusters 1, 2, 3, 4, and 5 are colored as pale blue, light blue, blue, light gray and gray respectively. The effective reproduction number was fitted since the outbreak onset in each country for the first 60 days. Across the 101 selected countries, the estimated R t peaked on average 29·6 (25·36,33·81) days before the crest in infections. The average peak duration was 5·4 (4·84,6·06) days. The adoption of social distancing policies took place on average 15·6 (13,18·23) days from the first reported case and 5·3 (1·69, 8·84) days before the estimated R t peak. 76·9% of the examined interventions were applied before the end of peak duration. We report the descriptives in the Table 1.  This table records selected descriptive statistics on the early patterns of counts, recoveries, deaths, and public health interventions, as well as on the corresponding estimated effective reproduction number per cluster. All values were averaged over the countries in each cluster and include the 95% confidence interval in brackets. Clusters were estimated from patterns of the effective reproduction number in the selected 101 countries as at 15th of May 2020 as described in the main text of the manuscript. The first section of the table (as indicated by the solid horizontal lines) displays information on the estimated effective reproduction number. The second section of the table records the adoption of the 12 recorded social distancing and economic interventions. The % columns record the percentage of countries that implemented the corresponding intervention in each cluster. The last section displays the average number of observed tests, reported death, reported infections per thousand population and reported death per infection.

The computation of time lagged cross correlation (TLCC)
To compute the time lagged cross correlation (TLCC) between two time series A(t) = sin(t/100) and B(t) = sin((t + 1)/100), where t ∈ {0, 1, 2, . . . , 1000} (see Figure 7), the correlation between the two time series was repeatedly estimated after shifting the time series B by one time step at a time (blue line in Figure 7). The shifted time (namely the offset) at which the correlation is maximum was then calculated. For example, in the example represented in Figure 8, the correlation has a maximum when B is shifted by 100 time steps. Similarly, we can also shift B backwards in time and find that the correlation is minimum when B is shifted backwards by 100 time steps.

Prediction accuracy and sensitivity to key parameters
Prediction of future outbreaks (defined as a major growth in cumulative incidence count in a 30-day window with a growth rate threshold value q ∈ [1, 3].) were conducted for each country and over 230 daily rolling windows from 15 May to 31 Dec 2020. The average AUROC by country (obtained by assuming different growth rate threshold) of this prediction can be found in Figure 9; its average by cluster is displayed in Table 2.  Across identified clusters in section 3.1 of the main manuscript, the prediction accuracy is best for clusters one and five (see Table 2).
To examine the sensitivity of the study findings to some model assumptions, we replicated the prediction study conducted in the main manuscript but assuming different levels of two key transmission parameters: the mean infectious period and the incubation period. We present the averaged AUROC over prediction windows and countries in Table 3. We observed an improvement in accuracy with longer mean infectious period or shorter incubation period.  This table records the average outbreak prediction accuracy (AUROC) using dynamic R t for different values of γ: 1/mean infectious period, and σ: 1/mean incubation period.
In Figure 10, we present the estimated mean R t by six selected countries and the predicted new incidence counts since 15 May 2020 to 15 Jan 2021 using the proposed SERID model with dynamic R t . Figure 10: Estimated mean R t (left panel) and predicted incidence counts (right panel) by country since 15 May 2020 to 15 Jan 2021.