Abstract
Many locations around the world have used real-time estimates of the time-varying effective reproductive number (\({R}_{t}\)) of COVID-19 to provide evidence of transmission intensity to inform control strategies. Estimates of \({R}_{t}\) are typically based on statistical models applied to case counts and often lag by more than a week because of the latent period and reporting delays. Because viral loads tend to decline over time since illness onset, the distribution of viral loads among confirmed cases can provide insights into epidemic trajectory. Here, we analyzed viral load data on confirmed cases during two local epidemics in Hong Kong and identified a strong correlation between temporal changes in the distribution of viral loads (measured by RT-qPCR cycle threshold values) and estimates of \({R}_{t}\) based on case counts. We demonstrate that cycle threshold values could be used to improve real-time \({R}_{t}\) estimation, enabling more timely tracking of epidemic dynamics.
Introduction
Monitoring the transmission of an emerging infectious disease in a timely manner is crucial for evaluating the effectiveness of public health and social measures and for informing better control policies. During the coronavirus disease 2019 (COVID-19) pandemic, real-time assessment of transmission has generally been achieved by monitoring the time-varying effective reproductive number, \({R}_{t}\). A number of statistical approaches have been developed to estimate \({R}_{t}\) from time series of daily case counts, recorded either by date of illness onset or by date of laboratory confirmation, or from time series of observed deaths^{1,2}. Although efforts have been made to reduce the lag in \({R}_{t}\) estimates^{3,4}, most of these approaches can only estimate \({R}_{t}\) with a lag of one week or more, because COVID-19 transmission can occur before illness onset^{5}, because of delays between individuals being infected and becoming polymerase chain reaction (PCR)-detectable and/or symptomatic (typically around 3–5 days for COVID-19)^{6}, and because of delays between illness onset and diagnosis. In Hong Kong, we estimated \({R}_{t}\) with a 7-day lag by accounting for presymptomatic transmission and reporting delays^{7,8}.
An individual infected with SARS-CoV-2 will typically experience a viral load that peaks around illness onset and decreases monotonically over the following two weeks^{9}. While viral loads vary across individuals, with some shedding more than others^{10,11}, the distribution of viral loads in a group of patients measured around illness onset will tend to have higher values than that in a group of patients measured later after infection^{12}. Collectively, higher population-level viral loads would correlate with more infected persons being early in the course of infection, and vice versa^{13}. Viral loads can be proxied by cycle threshold (Ct) values in the real-time quantitative reverse-transcription polymerase chain reaction (RT-qPCR) assay, with lower Ct values indicating higher viral loads.
A recent study showed that the distribution of viral loads among confirmed cases can provide inferences on transmission dynamics within populations: population-level Ct values skewed towards lower values indicate that more individuals have been recently infected, corresponding to an increasing rate of epidemic growth in the community, especially where a single strain dominates^{13}. The method was demonstrated in a modeling study using cross-sectional samples from a small-scale outbreak in Boston^{13}, and the correlation was also observed over an epidemic wave elsewhere^{14}. Here, we incorporated Ct values from COVID-19 cases in Hong Kong, a location with intense surveillance and case-finding efforts, to demonstrate that data on population viral load distributions from symptom-based surveillance could support real-time tracking of transmission.
Results
In Hong Kong, COVID-19 cases are detected through clinical diagnosis of individuals with acute respiratory symptoms and through public health surveillance in the community, with priority given to people with predefined high exposure risks^{15} (see Methods). Close contacts of confirmed cases are traced, placed in quarantine outside the home, and repeatedly tested. All laboratory-confirmed COVID-19 cases, including asymptomatic cases, are isolated in hospital and receive multiple RT-qPCR tests during their stay. After excluding imported cases, we analyzed the first available Ct value (from RT-qPCR tests targeting the E gene^{16}) for each confirmed case and characterized the daily distribution of Ct values (measured by mean and skewness) by sampling date. We included two consecutive epidemics, in July–August 2020 (i.e., the third wave) and November 2020 through March 2021 (i.e., the fourth wave), which were dominated by local transmission rather than imported cases^{8,17}.
A total of 8646 local COVID-19 cases were detected during the study periods, of which 77% (\(n=6700\)) were symptomatic. Cases who were asymptomatic at the time of testing were more likely than symptomatic cases to be epidemiologically linked with other known cases (81% vs. 61%, chi-squared test \(P < 0.001\)), suggesting that they were more likely to be detected through contact tracing or compulsory testing of populations at predefined high risk of exposure (see Methods) and could be detected earlier than symptomatic cases. Ct values were available for 96% (\(n=8268\)) of local cases, and these were included in further analyses. No included cases had been vaccinated, as the local COVID-19 vaccination program began in late February 2021, towards the end of the study period. Variants of concern (VoCs) were not reported among local cases during the study period, and two separate wild-type lineages dominated the two studied waves^{18}.
We first examined the correlation between the distribution of daily Ct values and local transmission dynamics^{8} (measured by the incidence-based \({R}_{t}\); see Methods). The temporal Ct distribution closely tracked the incidence-based \({R}_{t}\) over the two epidemic waves (Fig. 1; Supplementary Fig. 1a). Incidence-based \({R}_{t}\) was higher when average Ct values decreased (Spearman’s correlation coefficient, \(\rho =-0.79\), \(P < 0.001\) for the third wave and \(\rho =-0.52\), \(P < 0.001\) for the fourth wave) and when Ct values skewed towards lower values (i.e., greater skewness; \(\rho =0.80\), \(P < 0.001\) for the third wave and \(\rho =0.27\), \(P < 0.001\) for the fourth wave) (Fig. 1; Supplementary Table 1).
To confirm that the changes in the observed daily Ct distribution were mostly driven by epidemic dynamics rather than by individual variation in viral shedding, we extrapolated Ct values back to illness onset for symptomatic cases using the fitted association that Ct values increase 1.057 (95% confidence interval (CI): 1.050–1.063) per day after illness onset (Supplementary Fig. 2; see Methods). The distributions of Ct values at onset were less variable than those at sampling during the study period (coefficient of variation for skewness: 0.37 vs. 0.80) (Supplementary Figs. 3 and 4), suggesting a relatively stable peak viral load across individuals over the course of the epidemic.
To use Ct values for real-time assessment of COVID-19 transmission in the community, we fitted a log-linear regression of daily incidence-based \({R}_{t}\) on the daily mean and skewness of Ct values at sampling during the third wave (i.e., the training period; see Methods). The distribution of Ct values explained 72% of the observed variation in incidence-based \({R}_{t}\) during the training period (Supplementary Table 2). We then applied the trained model to the daily Ct distributions in the fourth wave (i.e., the testing period) to estimate \({R}_{t}\) in real time (i.e., Ct-based \({R}_{t}\)). The Ct-based method provided accurate real-time estimates of \({R}_{t}\) during the 7-day lag window inherent to the conventional incidence-based method (Fig. 2a). Ct-based and incidence-based \({R}_{t}\) were highly correlated in both the training (Spearman’s correlation coefficient, \(\rho =0.81\), \(P < 0.001\)) and testing periods (\(\rho =0.48\), \(P < 0.001\)) (Fig. 2b–d). Sensitivity analyses accounting for the potential impact of age on Ct distributions (Supplementary Fig. 5) and of changes in the proportion of symptomatic cases (Supplementary Fig. 6) on the resulting Ct-based \({R}_{t}\) showed that the high correlation between Ct-based and incidence-based \({R}_{t}\) remained.
We further validated our results by training the model on data from November to December 2020 (i.e., the early stage of the fourth wave) and predicting the later stage of the fourth wave and the third wave; predictions remained highly accurate (Supplementary Fig. 7). We also performed a 10-fold cross-validation in which we randomly assigned data from 6 July 2020 to 31 March 2021 to 10 validation sets. On average, 81% (range 75–85%) of the Ct-based and incidence-based \({R}_{t}\) estimates were directionally consistent across validation sets. These results suggest that the relationship between Ct distributions and \({R}_{t}\) estimates was not an artifact of temporal autocorrelation in incidence-based \({R}_{t}\). In addition, model predictions were insensitive to the choice of training period, provided the training period had sufficient samples (e.g., >30 samples per day, as suggested in Supplementary Table 4) and reflected changes in both epidemic growth and population Ct distributions. Such training periods often covered the transition point at which \({R}_{t}\) crossed 1 and spanned an epidemic peak in places with clear waves (Supplementary Fig. 8). Longer training periods did not necessarily perform better, possibly because of variability in the long tail of days with few samples (Supplementary Figs. 1 and 8).
We used synthetic data to examine the potential impact of case detection on our method. We simulated two consecutive epidemic waves (Supplementary Fig. 9a) using a compartmental transmission model and investigated various case detection scenarios (Supplementary Fig. 9b). We also simulated individual viral load trajectories, incubation periods, and onset-to-sampling intervals to determine Ct values at sampling for detected symptomatic cases in the simulations (see Methods). Ct-based \({R}_{t}\) recovered the simulated truth under limited and changing case detection (Supplementary Fig. 10). Specifically, Ct-based \({R}_{t}\) was correlated with the simulated truth under varying case detection (i.e., scenario 3; Spearman \(\rho\) = 0.77, 95% CI: 0.73–0.81) and under a certain degree of under-detection (i.e., scenario 4; Spearman \(\rho\) = 0.65, 95% CI: 0.53–0.75) (Supplementary Fig. 10c, d; Supplementary Table 7).
Discussion
In this study, we applied a simplified Ct-based method to provide precise estimates of daily \({R}_{t}\) and demonstrated that such a method could be used for real-time \({R}_{t}\) estimation. Conventionally, the main challenge in estimating \({R}_{t}\) in real time has been the delay between an individual being infected and becoming PCR-detectable or developing symptoms^{6,19}. Linking incidence-based \({R}_{t}\) to the population-level Ct distribution among samples collected on a given day mitigated the right-censoring issue (i.e., missing cases that were infected but not yet observed because of the latent period^{6}) encountered by incidence-based methods for assessing transmission^{1,6}. Although previous studies^{3,4} demonstrated nowcasting and projection of incidence-based \({R}_{t}\) during the right-censoring time window, those estimates were indicative values rather than genuine estimates informed by real-time empirical data.
The few studies that have used population viral loads to infer COVID-19 epidemics only provided probability distributions of a community’s estimated position on an epidemic curve^{13,20}, whereas our study provides precise longitudinal \({R}_{t}\) estimates using a method that requires less computational effort, further demonstrating the potential of Ct-based methods to improve real-time situational awareness. In addition, we showed that the daily Ct distribution could be used to track epidemics in a symptom- and contact-tracing-based surveillance setting such as Hong Kong, providing empirical support for the hypotheses generated by previous Ct-based studies^{13}. Temporal changes in population Ct distribution over an epidemic largely reflect changes in infection-to-sampling delays, which are determined by the infection-to-onset delay (i.e., incubation) and the onset-to-testing delay (i.e., testing delay). We showed that the onset-to-testing delay did not provide additional information beyond the population Ct distribution (Supplementary Table 5), suggesting that the observed changes in Ct distribution were largely driven by collective changes in the timing of exposure. In particular, we demonstrated that, in our setting, epidemic progress might better explain temporal changes in population Ct distribution than changes in testing delays arising from varied detection patterns.
Our simplified Ct-based method also provides an approach for real-time estimation of \({R}_{t}\) without requiring intensive surveillance of COVID-19 (i.e., accurate daily case counts by onset or diagnosis), which is especially important for areas and time periods with limited and/or changing surveillance capacity. Because the model’s main input was the distribution of Ct values among confirmed cases, our findings were less sensitive to changes in case reporting (e.g., due to changes in case definitions and/or testing capacity constraints), which could bias conventional incidence-based \({R}_{t}\) estimation if not accounted for^{21,22}. For example, our results showed that population-level Ct distributions remained informative for tracking epidemic changes over time despite changes in surveillance in Hong Kong, notably the expansion of testing capacity during the fourth wave, which detected more cases at earlier disease stages (i.e., asymptomatic cases; Supplementary Fig. 6)^{15}. This was further supported by the high accuracy of Ct-based \({R}_{t}\) under various case detection settings in our simulations. Of note, \({R}_{t}\) estimated from very few Ct samples (e.g., fewer than 30; Supplementary Table 4) collected on a given day can have larger uncertainty, though we believe this may not be an issue for most areas with widespread local COVID-19 transmission, even if testing capacity is limited.
Our work is not the first attempt to improve real-time tracking of COVID-19 transmission. Another study^{23} demonstrated that sewage surveillance could shorten prediction delays, providing signals two days ahead of test-positive cases, but this did not fully overcome the delays between being infected and being PCR-detectable^{6}. In addition, the probability of detecting viral RNA in sewage samples could be low, especially when community transmission is low. In such situations, our method, which leverages existing information from confirmed cases, may be less resource-consuming and faster to implement, provided that reporting delays can be shortened.
Future applications of our method may need adaptation to different populations, especially those in which viral load trajectories differ. In particular, populations with higher SARS-CoV-2 vaccination coverage may show higher average Ct values when \({R}_{t}\) is greater than 1, as lower viral loads have been found in cases who had received COVID-19 vaccines^{24}. Similarly, higher Ct values when \({R}_{t}\) is greater than 1 may be found in populations younger than Hong Kong’s, because of the generally lower viral loads observed among young people^{9}. Therefore, while we believe the intrinsic relationship between population viral loads and \({R}_{t}\) will remain valid (as long as the temporal relationship between infection and viral shedding holds), the model may need recalibration when applied to different populations.
At the time of this study, few VoCs had circulated in Hong Kong^{18}, so we could not validate the generalizability of our model for outbreaks dominated by VoCs. A modeling study suggested that differences in population-level Ct values of samples from symptom-based surveillance were more likely to reflect changes in viral load trajectories than differences in transmission rates across strains^{13,25}. As such, if variant infections have higher viral loads (i.e., lower Ct values)^{26}, this may lead to lower average Ct values when \({R}_{t}\) is greater than 1. Therefore, model performance should be monitored and information on factors affecting viral load levels (e.g., genomic surveillance) should be leveraged. For instance, poor model performance (e.g., low consistency between incidence-based and Ct-based estimates for more than a week) could indicate changes in the correlation between population Ct distribution and epidemic progress. In such cases, the factors driving the divergence (e.g., changes in circulating strains or in the age structure of the infected population) should be investigated to recalibrate the model.
To summarize, in this analysis we applied a simplified method to incorporate population-level viral loads into the real-time estimation of COVID-19 transmission under symptom-based surveillance. We demonstrated that the Ct-based method could provide accurate nowcasts of \({R}_{t}\), potentially allowing capacity-constrained regions to track local outbreaks quantitatively and in a timely manner. Our method may need adaptation to different populations and evolving strains, mainly to recalibrate the absolute extent to which population viral loads correlate with COVID-19 transmission.
Methods
Study settings
Hong Kong was among the first places to identify COVID-19 cases globally, with its first case detected in late January 2020^{8,27}. Cases were classified as “imported cases”, “local cases epidemiologically linked with imported cases”, “unlinked local cases”, and “local cases epidemiologically linked with local cases” according to their epidemiological characteristics and location of infection. All suspected COVID-19 cases were confirmed by RT-qPCR in a local centralized public health laboratory. A laboratory-confirmed COVID-19 case was defined as a local case if the case had not visited places outside Hong Kong in the 14 days before symptom onset (for symptomatic cases) or confirmation (for asymptomatic cases); otherwise, it was defined as an imported case.
By the time of analysis, four waves of transmission had occurred in Hong Kong. In this study, we restricted our analyses to the third (July 2020 to August 2020) and fourth (November 2020 to March 2021) waves, which were dominated by local transmission (89% of cases were local). Since we were interested in local transmission of COVID-19, we included only local cases (unlinked local cases and local cases epidemiologically linked with local cases) in our analyses. Given the stringent border controls since July 2020^{28} and the extremely small proportion of local cases linked with imported cases in Hong Kong (<0.1%), we assumed that all unlinked local cases were infected by other local cases. We did not include the first two waves (i.e., January to May 2020), as they consisted predominantly of imported cases and smaller clusters linked to those imported cases^{8,17}.
In Hong Kong, local COVID-19 cases were generally detected through clinical diagnosis targeting people with acute respiratory symptoms and through public health surveillance by health authorities targeting populations at predefined high risk of exposure (e.g., staff working at healthcare centers; residents of neighborhoods with any lab-confirmed cases)^{15}. Upon case confirmation, contact tracing was carried out based on epidemiological information, as described elsewhere^{29}. Among the 8646 local COVID-19 cases confirmed during the study periods, 77% (6700 of 8646) were symptomatic at detection and 65% (5651 of 8646) had epidemiological links with other known cases; among the 23% (1946 of 8646) who were asymptomatic at detection, 81% (1584 of 1946) were linked to other local cases. As such, COVID-19 surveillance in Hong Kong is largely symptom- and contact-tracing-based.
Data sources
Data on viral load of COVID-19 cases
In Hong Kong, all confirmed COVID-19 cases (including asymptomatic cases) were admitted to hospitals for isolation and standardized management, with their hospitalization records stored in the data system managed by the Hospital Authority (HA). Results of SARS-CoV-2 RT-qPCR tests (LightMix^{®} Modular SARS-CoV-2 (COVID-19) E-gene, TIB Molbiol/Roche, Berlin, Germany)^{16} were recorded as Ct values in the system. The Ct value is the number of amplification cycles needed for the fluorescent signal from the viral RNA in a specimen to reach a predefined threshold in the RT-qPCR assay. Therefore, the Ct value is inversely associated with viral load and can be used as a semi-quantitative measure of viral load. In our main analysis, we used Ct values to measure viral load and analyzed the first recorded Ct value for each local case during the study period (usually sampled on or one day before admission). Population viral load distributions were assessed by the date on which samples were collected.
Demographic and epidemiological information of confirmed COVID-19 cases
We obtained demographic and epidemiological information from the Department of Health of the Government of Hong Kong, including age, date of symptom onset, and case classification (i.e., local, imported, and contacts of local or imported cases).
Ethical approval for this study was obtained from the Institutional Review Board of the University of Hong Kong (IRB No. UW 20341).
Statistical methods
Estimating incidence-based \({R}_{t}\)
We estimated the incidence-based \({R}_{t}\) for local cases using an extension of the method of Cori et al.^{3,7,30}. Briefly, the number of local COVID-19 cases confirmed on each day \(t\), \({Q}_{1}\left(t\right)\), was deconvolved to estimate the number of infections on each day \(t\), \({Y}_{1}\left(t\right)\)^{31}. For the deconvolution, we assumed an incubation period with mean 5.2 days (SD 3.9)^{19} and a delay from illness onset to reporting with mean 4.7 days (SD 3.2 days; unpublished data), as empirically observed in Hong Kong. In this framework, the daily local \({R}_{t}\) (i.e., the incidence-based \({R}_{t}\) in our analysis) is the ratio between the number of new local infections at time \(t\), \({Y}_{1}\left(t\right)\), and the total infectiousness of cases at time \(t\), given by \(\sum_{k=1}^{t-1}{Y}_{1}\left(k\right){w}_{L}(t-k)\), where \({w}_{L}(t-k)\) denotes the probability of being infectious \(t-k\) days after infection. Transmission was modeled as a Poisson process, so that
\[ Y_1(t) \sim \mathrm{Poisson}\left(R_t \sum_{k=1}^{t-1} Y_1(k)\, w_L(t-k)\right) \tag{1} \]
\({w}_{L}(t-k)\) was estimated using the convolution of the incubation period (mean 5.2 days, SD 3.9)^{19} and the infectiousness profile relative to onset^{5} (details described elsewhere^{7}). To fully utilize available case count information and to provide more timely \({R}_{t}\) estimates under the incidence-based method, we used the smoothing method described in Cori et al.^{30} and calculated \({R}_{t}\) estimates over a time window of size \(\tau =14\) days ending at time \(t\), assuming that the transmission rate was constant over the period \(\left[t-\tau +1,t\right]\). We used a Markov chain Monte Carlo algorithm to estimate the incidence-based \({R}_{t}\), with a Gamma(1, 5) prior for \({R}_{t}\) (mean and SD both equal to 5)^{32}. To account for uncertainty in other parameters, such as the incubation period, we used the bootstrap approach of Salje et al.^{33} to reconstruct 200 epidemic curves and repeated the estimation for each. We then reported the mean and the 2.5% and 97.5% quantiles of the 200 \({R}_{t}\) estimates for each day \(t\). More details about the incidence-based \({R}_{t}\) estimation are described elsewhere^{7}.
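A minimal Python sketch of the sliding-window estimator follows (standard library only). It is not the authors’ implementation: it assumes the deconvolved infection series \(Y_1(t)\) and a discretized \(w_L\) are already available, and it swaps the MCMC step for the closed-form Gamma posterior of Cori et al., which holds when \(R_t\) is constant within the window and the prior is Gamma with shape 1 and scale 5. Posterior quantiles are approximated by Monte Carlo draws, the bootstrap over 200 reconstructed epidemic curves is omitted, and the function names are hypothetical.

```python
import random
from typing import List, Tuple


def total_infectiousness(incidence: List[float], w: List[float], t: int) -> float:
    """Lambda_t = sum_{k=1}^{t-1} Y(k) * w_L(t-k), with days indexed from 1.

    incidence[k - 1] holds Y(k); w[s - 1] holds w_L(s) for s = 1, 2, ...
    """
    total = 0.0
    for k in range(1, t):
        lag = t - k
        if lag <= len(w):
            total += incidence[k - 1] * w[lag - 1]
    return total


def sliding_window_rt(
    incidence: List[float],
    w: List[float],
    tau: int = 14,
    prior_shape: float = 1.0,
    prior_scale: float = 5.0,
    n_draws: int = 2000,
    seed: int = 1,
) -> List[Tuple[int, float, float, float]]:
    """(t, posterior mean, 2.5% quantile, 97.5% quantile) of R_t for each day t >= tau.

    R_t is assumed constant over the window [t - tau + 1, t]. With a Gamma(shape, scale)
    prior, the posterior is Gamma(shape + sum Y, 1 / (1/scale + sum Lambda)).
    """
    rng = random.Random(seed)
    n_days = len(incidence)
    lam = [total_infectiousness(incidence, w, t) for t in range(1, n_days + 1)]
    estimates = []
    for t in range(tau, n_days + 1):
        window = range(t - tau + 1, t + 1)
        shape = prior_shape + sum(incidence[s - 1] for s in window)
        scale = 1.0 / (1.0 / prior_scale + sum(lam[s - 1] for s in window))
        # random.gammavariate(alpha, beta) uses beta as the scale parameter
        draws = sorted(rng.gammavariate(shape, scale) for _ in range(n_draws))
        lower = draws[int(0.025 * n_draws)]
        upper = draws[int(0.975 * n_draws) - 1]
        estimates.append((t, shape * scale, lower, upper))
    return estimates
```

For example, a flat series `[100.0] * 60` with `w = [0.2] * 5` yields a posterior mean close to 1 on day 60, as expected when each day’s infections exactly replace the infectious mass of the preceding days.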
Temporal distribution of population-level Ct values
We analyzed the first available Ct value for each local COVID-19 case (i.e., \({y}_{j,t}\), where \(t\) is the calendar date on which the first sample was collected for individual \(j\)). To characterize the temporal distribution of population-level Ct values over the study period, we fitted a generalized additive model (GAM) to these data over calendar time:
\[ y_{j,t} = s(t) + \varepsilon_{j,t} \tag{2} \]
where \(s(t)\) is the smooth function of date \(t\) over the study period. 95% CIs of the smoothed average daily Ct were derived from 500 bootstraps (Fig. 1b; Supplementary Fig. 1a); in each bootstrap, we resampled the data on cases’ first available Ct values and refitted the GAM. We also illustrated temporal changes in delays from illness onset to sampling and found a pattern consistent with the temporal trend of Ct distributions (Supplementary Fig. 1). We did not include samples collected from 1 September 2020 to 31 October 2020 because few samples were collected on each day.
To verify that the observed temporal variations in population-level Ct distribution were not driven by variations in individual viral load trajectories, we estimated the Ct value on the date of illness onset based on the observed pattern of Ct values against time since onset (Supplementary Fig. 2). We fitted a log-linear regression of the first available Ct value for individual \(j\) (\({y}_{j}\)) on the interval between the individual’s illness onset and first sample collection (\({\delta }_{j}\)) and age group (\({a}_{j}\), modeled as categorical, i.e., 0–18, 19–64, and \(\ge\)65 years old):
\[ \ln(y_j) = \beta_0 + \beta_1\delta_j + \beta_2 a_j + \beta_3\,\delta_j a_j + \varepsilon_j \tag{3} \]
where \({\beta }_{1}\), \({\beta }_{2}\), and \({\beta }_{3}\) are the estimated coefficients for the interval between illness onset and first sample collection, age group, and their interaction, respectively. We then calculated the back-projected Ct value at illness onset by setting \({\delta }_{j}=0\). We chose the log-linear model because its Akaike information criterion (AIC) indicated a better fit than the linear model (−9177 and 38,279 for the log-linear and linear models, respectively).
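The log-linear regression and back-projection can be sketched in standard-library Python under two assumptions not stated in the text. First, because age group is categorical and fully interacted with \(\delta_j\), the ordinary least-squares point estimates coincide with fitting \(\ln(y_j)\) on \(\delta_j\) separately within each age group, which is what the sketch does. Second, “setting \(\delta_j = 0\)” is implemented by keeping each case’s own residual, i.e. \(\hat{y}_j = y_j \exp(-\text{slope}_g\,\delta_j)\), so individual variation is preserved. The helper names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fit_loglinear_by_group(
    ct: Sequence[float], delay: Sequence[float], group: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """OLS of ln(Ct) on days from onset to sampling, fitted within each age group.

    Returns {age_group: (intercept, slope)} on the log scale; exp(slope) is the
    multiplicative change in Ct per day since onset for that group.
    """
    points = defaultdict(list)
    for y, d, g in zip(ct, delay, group):
        points[g].append((d, math.log(y)))
    coefs = {}
    for g, pts in points.items():
        n = len(pts)
        mean_d = sum(d for d, _ in pts) / n
        mean_ly = sum(ly for _, ly in pts) / n
        sxx = sum((d - mean_d) ** 2 for d, _ in pts)
        sxy = sum((d - mean_d) * (ly - mean_ly) for d, ly in pts)
        slope = sxy / sxx  # requires at least two distinct delays per group
        coefs[g] = (mean_ly - slope * mean_d, slope)
    return coefs


def back_project_to_onset(
    ct: Sequence[float],
    delay: Sequence[float],
    group: Sequence[str],
    coefs: Dict[str, Tuple[float, float]],
) -> List[float]:
    """Ct at illness onset (delay = 0), keeping each case's own residual."""
    return [y * math.exp(-coefs[g][1] * d) for y, d, g in zip(ct, delay, group)]
```

Keeping the residual is a design choice: using only the fitted value at \(\delta_j = 0\) would give every case in an age group the same onset Ct, which would make the onset skewness uninformative.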
To compare the temporal trends of Ct values at sampling and at onset, we fitted a GAM of Ct values at sampling (Eq. (2)) or at onset (Eq. (4)) against smoothed calendar time over the third wave, on days when more than 30 samples were collected:
\[ \hat{y}_{j,t} = s(t) + \varepsilon_{j,t} \tag{4} \]
where \({\hat{y}}_{j,t}\) is the extrapolated Ct value at illness onset for individual \(j\) whose illness onset was on calendar date \(t\). We calculated the mean and skewness of the Ct values at sampling or at onset over each biweekly window throughout the study period. Both measures showed that Ct values at sampling were more variable than Ct values extrapolated to illness onset (Supplementary Figs. 3 and 4), suggesting that variations in individual viral load trajectories may not be the major driver of the observed temporal variation in population-level Ct distribution over our study period, during which only wild-type SARS-CoV-2 strains circulated locally.
Incorporating Ct distributions into \({R}_{t}\) estimation (Ct-based \({R}_{t}\))
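We used the mean (\({\bar{x}}_{t}\)) and skewness (\({b}_{t}\))^{34} to summarize the distribution of Ct values sampled on date \(t\):
\[ \bar{x}_t = \frac{1}{n_t}\sum_{i=1}^{n_t} y_{t,i} \tag{5} \]
\[ b_t = \frac{\frac{1}{n_t}\sum_{i=1}^{n_t}\left(y_{t,i}-\bar{x}_t\right)^3}{\left[\frac{1}{n_t-1}\sum_{i=1}^{n_t}\left(y_{t,i}-\bar{x}_t\right)^2\right]^{3/2}} \tag{6} \]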
where \({y}_{t,i}\) represents the \(i\)th (\(i=1,2,\ldots ,{n}_{t}\)) of the \({n}_{t}\) Ct values sampled on day \(t\). 95% CIs of the daily skewness \({b}_{t}\) were calculated from 500 bootstraps (Fig. 1c), resampling the data on cases’ first available Ct values in each bootstrap to recalculate the daily skewness.
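The daily summaries and their bootstrap interval can be sketched in standard-library Python. This assumes the \(b_1\)-type sample skewness (third central moment with divisor \(n_t\), divided by the cubed sample SD with divisor \(n_t - 1\)) and a percentile bootstrap over the day’s own Ct values; both choices and the function names are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def mean_and_skewness(values: Sequence[float]) -> Tuple[float, float]:
    """Daily mean and skewness b_t of the Ct values sampled on one day.

    b_t = m3 / s**3, with m3 the third central moment (divisor n) and s the
    sample standard deviation (divisor n - 1).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two Ct values")
    mean = sum(values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    skew = m3 / var ** 1.5 if var > 0 else 0.0  # all-equal values: define skewness as 0
    return mean, skew


def bootstrap_skewness_ci(
    values: Sequence[float], n_boot: int = 500, seed: int = 0
) -> Tuple[float, float]:
    """Percentile 95% CI for b_t from n_boot resamples (with replacement) of the day's values."""
    rng = random.Random(seed)
    stats: List[float] = sorted(
        mean_and_skewness(rng.choices(values, k=len(values)))[1] for _ in range(n_boot)
    )
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]
```

A day such as `[18, 19, 20, 21, 35]`, where most cases have low Ct values and a few have high ones, gives a positive skewness, matching the interpretation that Ct values skewed towards lower values correspond to greater skewness.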
We first calculated Spearman’s rank correlation coefficient (\(\rho\)) between the daily Ct distribution (i.e., daily mean and skewness) and the natural log-transformed incidence-based \({R}_{t}\) (Supplementary Table 1). To determine the best-fitting model for the association between daily Ct distribution and incidence-based \({R}_{t}\), we compared the AIC of a series of regression models over the training period (i.e., 6 July 2020 to 31 August 2020) that used different forms of the dependent variable and different summary measures as predictors (Supplementary Table 3). We compared models fitted to linear-scale and natural log-transformed incidence-based \({R}_{t}\), and models including different combinations of summary measures of daily Ct distributions (mean, median, and skewness). When no samples were collected on a given day, we imputed the daily Ct distribution summaries using their average over the preceding 7 days. The model regressing natural log-transformed incidence-based \({R}_{t}\) (\(\ln({R}_{t})\)) on the daily mean (\({\bar{x}}_{t}\)) and skewness (\({b}_{t}\)) of Ct values had the lowest AIC and was used in our main analyses (Supplementary Table 3):
\[ \ln(R_t) = \gamma_0 + \gamma_{\bar{x}}\bar{x}_t + \gamma_b b_t + \varepsilon_t \tag{7} \]
where \({\gamma }_{\bar{x}}\) and \({\gamma }_{b}\) are the regression coefficients for the daily mean and skewness of Ct values, reported in Supplementary Table 2 after exponential transformation.
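The selected model reduces to an ordinary least-squares fit of \(\ln(R_t)\) on the daily mean and skewness, combined with the 7-day imputation rule. A minimal standard-library Python sketch follows; it assumes the imputation averages only observed (not previously imputed) days, and all names are hypothetical. Exponentiating a fitted coefficient gives the multiplicative change in \(R_t\) per unit change in that predictor, which is one way to read coefficients reported after exponential transformation.

```python
import math
from typing import List, Optional, Sequence


def impute_missing_days(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill days with no samples (None) by the mean of observed values in the preceding 7 days."""
    filled: List[Optional[float]] = []
    for i, value in enumerate(series):
        if value is None:
            prev = [v for v in series[max(0, i - 7):i] if v is not None]
            value = sum(prev) / len(prev) if prev else None
        filled.append(value)
    return filled


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ct_model(
    mean_ct: Sequence[float], skew_ct: Sequence[float], rt: Sequence[float]
) -> List[float]:
    """OLS for ln(R_t) = g0 + g_mean * mean_t + g_skew * skew_t; returns [g0, g_mean, g_skew]."""
    rows = [[1.0, m, s] for m, s in zip(mean_ct, skew_ct)]
    y = [math.log(r) for r in rt]
    # Normal equations (X'X) g = X'y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(3)]
    return solve_linear(xtx, xty)


def predict_ct_based_rt(
    coef: Sequence[float], mean_ct: Sequence[float], skew_ct: Sequence[float]
) -> List[float]:
    """Ct-based R_t for new days (e.g., a testing period) from fitted coefficients."""
    g0, g_mean, g_skew = coef
    return [math.exp(g0 + g_mean * m + g_skew * s) for m, s in zip(mean_ct, skew_ct)]
```

In the study design, a fit like `fit_ct_model` would be run on third-wave days (training) and `predict_ct_based_rt` on fourth-wave days (testing).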
We explored the impact of the training period and sample size on our estimates. We trained our model over alternative training periods with various starting dates (either between 4 and 23 July 2020 or between 10 and 29 November 2020) and lengths of 30, 40, 50, or 60 days, compared their adjusted \(R^2\), and identified the time period covered by the best-fitting model among training periods of the same length (Supplementary Fig. 8). For sample size, we calculated Spearman correlation coefficients between incidence-based and Ct-based estimates within different intervals of daily sample size and found that Ct-based \({R}_{t}\) tended to be more accurate with more than 30 records per day (Supplementary Table 4).
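The sample-size check can be sketched as computing Spearman’s \(\rho\) between incidence-based and Ct-based \(R_t\) within bins of daily sample size. This standard-library Python version ranks ties by their average rank; the bin edges other than 30 are hypothetical, as are the function names.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks (nan if a series is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den if den > 0 else math.nan


def spearman_by_sample_size(
    n_samples: Sequence[int],
    rt_incidence: Sequence[float],
    rt_ct: Sequence[float],
    edges: Sequence[float] = (0, 10, 30, math.inf),
) -> Dict[Tuple[float, float], Optional[float]]:
    """Spearman's rho between incidence- and Ct-based R_t within bins (lo, hi] of daily sample size."""
    out: Dict[Tuple[float, float], Optional[float]] = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, n in enumerate(n_samples) if lo < n <= hi]
        if len(idx) < 3:
            out[(lo, hi)] = None  # too few days for a meaningful rho
        else:
            out[(lo, hi)] = spearman([rt_incidence[i] for i in idx], [rt_ct[i] for i in idx])
    return out
```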
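To assess whether our results would be affected by the age distribution of cases sampled on each day, we performed a sensitivity analysis by adding the mean age (\({\bar{a}}_{t}\)) of cases whose first sample was collected on day \(t\) to the main model (Eq. (7)):
\[ \ln(R_t) = \gamma_0 + \gamma_{\bar{x}}\bar{x}_t + \gamma_b b_t + \gamma_a\bar{a}_t + \varepsilon_t \tag{8} \]
where \(\gamma_a\) is the coefficient for the daily mean age.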
Results suggested similar predictions from models with and without considering cases’ age distribution (Supplementary Fig. 5).
To assess whether our results would be affected by changes in sampling strategies in Hong Kong, we first examined temporal changes in the proportion of symptomatic cases among all confirmed local cases (Supplementary Fig. 6a). We performed a sensitivity analysis fitting the main model (Eq. (7)) using only records from symptomatic cases and found no significant difference from our main results (Supplementary Fig. 6). We also adjusted for delays from illness onset to sampling in the main model (Eq. (7)) and found that accounting for changes in sample collection (coefficient β = 0.93, 95% CI: 0.87–1.01) did not alter the association between population Ct distribution and incidence-based \({R}_{t}\) (Supplementary Table 5).
Cross-validation of the model
To validate the generalizability of this Ct-based method, we fitted the main model (Eq. (7)) using data from an alternative training period, i.e., from 20 November 2020 to 19 December 2020 (the initial stage of the fourth wave) (Supplementary Fig. 7, Supplementary Table 2).
We further performed ten-fold cross-validation by randomly splitting the data between 6 July 2020 and 31 March 2021 into ten validation sets, after excluding days on which fewer than five Ct samples were collected. In each round, we held out one set for testing and trained the main model (Eq. (7)) on the remaining nine. We assessed the consistency between Ct-based and incidence-based \({R}_{t}\) in the testing set as the proportion of days on which both estimates were below 1 or both were above 1 (i.e., pointed in the same direction), out of the total number of days in that validation set. We also assessed prediction performance using the mean absolute error (MAE) between the Ct-based (\(E({R}_{t})\)) and incidence-based \({R}_{t}\) for each validation set:
\[{\rm{MAE}}=\frac{1}{{N}_{D}}\sum_{d\in D}\left|E({R}_{d})-{R}_{d}\right|\]
where \(d\) is a given date in the validation set \(D\) and \({N}_{D}\) is the number of days in that validation set. The MAE averaged 0.28 (range 0.25–0.34) across the ten validation sets, suggesting good predictive performance.
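The cross-validation loop and its two agreement metrics can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code. The random seed, the round-robin assignment after shuffling, and counting a day on which either estimate equals exactly 1 as a disagreement are choices made here. Fitting and predicting with Eq. (7) appear only as hypothetical placeholders (`fit_eq7`, `predict_eq7`).

```python
import random


def eligible_days(samples_per_day, min_samples=5):
    """Keep days with at least `min_samples` Ct samples."""
    return [d for d, n in sorted(samples_per_day.items()) if n >= min_samples]


def kfold_split(items, k=10, seed=2021):
    """Randomly partition items into k folds of near-equal size."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def direction_agreement(ct_rt, inc_rt):
    """Share of days on which both estimates are below 1 or both are above 1."""
    same = sum(1 for a, b in zip(ct_rt, inc_rt) if (a - 1) * (b - 1) > 0)
    return same / len(inc_rt)


def mean_absolute_error(ct_rt, inc_rt):
    """MAE = (1 / N_D) * sum over d in D of |E(R_d) - R_d|."""
    return sum(abs(a - b) for a, b in zip(ct_rt, inc_rt)) / len(inc_rt)


# Usage outline (fit_eq7 / predict_eq7 are hypothetical stand-ins for Eq. (7)):
# folds = kfold_split(eligible_days(samples_per_day))
# for i, test_days in enumerate(folds):
#     train_days = [d for j, fold in enumerate(folds) if j != i for d in fold]
#     model = fit_eq7(train_days)
#     ct_rt = predict_eq7(model, test_days)
#     inc_rt = [incidence_rt[d] for d in test_days]
#     print(direction_agreement(ct_rt, inc_rt), mean_absolute_error(ct_rt, inc_rt))
```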
Simulations
Transmission model
We used a susceptible-exposed-infectious-recovered (SEIR) model to simulate two consecutive epidemic waves in a closed population (n = 7.5 million, approximately the population of Hong Kong) in which 0.001% of individuals were initially infected. Briefly, we simulated infections with a stochastic SEIR model with compartments for susceptible (S), exposed but not yet infectious (E), infectious (I), and recovered (R) individuals, in which transitions from S to E, E to I, and I to R occur at rates \({\beta }_{t}SI/n\), \(\sigma E\), and \(\gamma I\), respectively.
Here \({\beta }_{t}={R}_{0}\gamma\) for \(t\ge {t}_{0}\); \(1/\sigma\) is the average time for individuals to transit from E to I (\(1/\sigma =5\) days)^{19}, and \(1/\gamma\) is the observed mean infectious period (\(1/\gamma =4\) days)^{13}. Detailed descriptions of the parameters are listed in Supplementary Table 6.
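One way to implement a stochastic SEIR model with a daily time step is a binomial chain. This is a minimal sketch under that assumption, since the text does not state how the stochastic transitions were implemented. Rates follow the conventions above (1/σ = 5 days, 1/γ = 4 days, β_t = R_0(t)γ). Placing the initial infections in E, and computing the simulation-truth \({R}_{t}\) as R_0(t)·S_t/n (the standard expression for a frequency-dependent SEIR model), are also assumptions. The binomial sampler is written by hand because `random.binomialvariate` only exists from Python 3.12.

```python
import math
import random


def binomial(rng, n, p):
    """Exact Binomial(n, p) draw by skipping geometric gaps between successes."""
    if n <= 0 or p <= 0.0:
        return 0
    if p >= 1.0:
        return n
    if p > 0.5:
        return n - binomial(rng, n, 1.0 - p)
    log_q = math.log1p(-p)
    successes, position = 0, 0
    while True:
        u = 1.0 - rng.random()  # uniform on (0, 1]
        position += int(math.log(u) / log_q) + 1  # trials until the next success
        if position > n:
            return successes
        successes += 1


def simulate_seir(n, r0_of_t, days, init_frac=1e-5, latent=5.0,
                  infectious=4.0, seed=1):
    """Daily-step stochastic SEIR (binomial chain); returns [(S, E, I, R), ...]."""
    rng = random.Random(seed)
    sigma, gamma = 1.0 / latent, 1.0 / infectious  # per-day rates
    e0 = max(1, round(n * init_frac))  # initial infections seeded in E (assumption)
    s, e, i, r = n - e0, e0, 0, 0
    history = [(s, e, i, r)]
    for t in range(days):
        beta_t = r0_of_t(t) * gamma
        new_e = binomial(rng, s, 1.0 - math.exp(-beta_t * i / n))
        new_i = binomial(rng, e, 1.0 - math.exp(-sigma))
        new_r = binomial(rng, i, 1.0 - math.exp(-gamma))
        s, e, i, r = s - new_e, e + new_e - new_i, i + new_i - new_r, r + new_r
        history.append((s, e, i, r))
    return history


def true_rt(history, r0_of_t, n):
    """Simulation-truth R_t = R0(t) * S_t / n for each simulated day (assumed form)."""
    return [r0_of_t(t) * s / n for t, (s, _, _, _) in enumerate(history)]


if __name__ == "__main__":
    # Small demo; n=7_500_000 with init_frac=1e-5 matches the text but runs slower.
    hist = simulate_seir(100_000, lambda t: 2.0, days=150, init_frac=1e-3)
    print(max(state[2] for state in hist))  # peak number infectious
```

The daily binomial chain trades exactness in continuous time for speed; an event-driven (Gillespie) simulation would be the alternative if the exact timing of individual transitions mattered.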
We used a synthetic, time-varying transmission rate \({\beta }_{t}\), which determines the underlying rate of transmission, to generate two consecutive epidemic waves, with \({R}_{0}=2.2\), \({R}_{0}^{3}=1.9\), and \({R}_{0}^{2}={R}_{0}^{4}=0.3\) for the successive phases of the two waves. Epidemic switch points were set at days 60 and 110, and changes in \({R}_{0}\) between switch points were interpolated with a cubic smoothing spline to give smooth transitions. We refer to the \({R}_{t}\) calculated under this SEIR model as the simulation truth.
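A smooth, piecewise R_0 schedule of this kind can be sketched as follows. The text used a cubic smoothing spline; as a simpler stand-in, this sketch blends successive values with a cubic "smoothstep" ramp, which is a different technique. The four R_0 values come from the text. How the two stated switch days map onto the four phases, the third switch day (160), and the 10-day ramp are hypothetical. A function like `r0_schedule` could serve as the R_0(t) input of a stochastic SEIR simulation.

```python
def smoothstep(x):
    """Cubic ease from 0 to 1 on [0, 1] with zero slope at both ends."""
    x = min(max(x, 0.0), 1.0)
    return x * x * (3.0 - 2.0 * x)


def r0_schedule(t, switch_days, values, ramp_days=10.0):
    """R0(t): values[0] before the first switch, blending to values[j + 1]
    over `ramp_days` days starting at switch_days[j]."""
    if len(values) != len(switch_days) + 1:
        raise ValueError("need exactly one more value than switch days")
    r0 = values[0]
    for j, day in enumerate(switch_days):
        r0 += (values[j + 1] - values[j]) * smoothstep((t - day) / ramp_days)
    return r0


# Values from the text; the third switch day (160) and the ramp are hypothetical.
R0_VALUES = [2.2, 0.3, 1.9, 0.3]
SWITCH_DAYS = [60, 110, 160]
schedule = [r0_schedule(t, SWITCH_DAYS, R0_VALUES) for t in range(220)]
```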
Symptom-based case detection
In symptom-based surveillance, we assumed that only infected individuals who developed symptoms (the number of symptomatic infections following a binomial distribution with \({p}_{{SymInf}}=0.6\)^{35,36}) would be detected, after illness onset. We assumed that the incubation period followed a lognormal distribution (mean = 5.2 days, SD = 3.9 days)^{19}, and we estimated the delay from onset to detection as a gamma distribution (shape = 1.83, rate = 0.43) using observations from Hong Kong.
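The symptom and delay draws can be sketched in Python as follows. This is a minimal illustration of the stated distributions, not the authors' simulation code. Two details are easy to get wrong. First, the lognormal is specified by its natural-scale mean and SD, so these must be converted to log-scale parameters. Second, `random.gammavariate` takes a shape and a *scale*, so the scale is 1/rate, giving a mean delay of 1.83/0.43 ≈ 4.3 days. The scenario-specific detection probability is applied separately and is not included here.

```python
import math
import random

P_SYMPTOMATIC = 0.6
INCUBATION_MEAN, INCUBATION_SD = 5.2, 3.9  # days, lognormal
DELAY_SHAPE, DELAY_RATE = 1.83, 0.43       # gamma, onset to detection (per day)


def lognormal_params(mean, sd):
    """Convert a lognormal's natural-scale mean/SD to log-scale (mu, sigma)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)


MU, SIGMA = lognormal_params(INCUBATION_MEAN, INCUBATION_SD)


def detect_symptom_based(infection_time, rng):
    """Return (onset_time, detection_time) for a symptomatic infection, else None."""
    if rng.random() >= P_SYMPTOMATIC:
        return None  # asymptomatic infections are never detected in this scheme
    onset = infection_time + rng.lognormvariate(MU, SIGMA)
    # gammavariate(alpha, beta) uses shape alpha and scale beta = 1 / rate
    detection = onset + rng.gammavariate(DELAY_SHAPE, 1.0 / DELAY_RATE)
    return onset, detection


# Usage: rng = random.Random(42); detect_symptom_based(0.0, rng)
```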
We simulated four scenarios representing different intensities of case detection (Supplementary Fig. 9):
1. Scenario 1: a fixed detection probability of 25%, representing stable detection, as was the case in Hong Kong^{15}.
2. Scenario 2: a fixed detection probability of 10%, representing stable but limited detection capacity.
3. Scenario 3: a detection probability that increased from 15% to 60% over the second simulated wave, representing an expansion of the case definition, as in the initial stage of the outbreak in mainland China^{22}.
4. Scenario 4: a fixed detection probability of 25%, except for a period of under-detection (reaching a minimum of 5%) during the initial stage of the second simulated wave.
Individual viral load trajectory
We simulated the viral load trajectory over the course of infection for each detected symptomatic case using a previously published method^{37}. We assumed that Ct values follow a unimodal trajectory reaching its minimum on the date of illness onset (so that the time from infection to the minimum Ct follows the incubation-period distribution^{19}), with the minimum Ct value (i.e., peak viral load) following a normal distribution with a mean of 22.3 and SD of 4.2^{11}. The duration of viral shedding after onset was assumed to be normally distributed with a mean of 17 days and SD of 0.94 days^{38}. For each detected individual, the sampled Ct value was then taken from their own Ct trajectory on day \(k\) after infection, where \(k\) is the interval between their dates of infection and detection.
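A simple individual Ct trajectory with these properties can be sketched as piecewise linear in Ct. The trajectory falls from an assumed detection limit at infection to its minimum at onset, then rises back to the limit at the end of shedding. This is an illustrative stand-in; the published method^{37} is not reproduced here. The linear shape and the detection-limit Ct of 40 are assumptions, while the normal parameters for the minimum Ct and the shedding duration come from the text.

```python
import random

CT_LIMIT = 40.0  # hypothetical Ct at the limit of detection (not from the text)


def draw_trajectory(incubation_days, rng):
    """Draw (incubation, minimum Ct, shedding duration) for one case."""
    ct_min = rng.gauss(22.3, 4.2)                 # peak viral load at onset
    shedding = max(rng.gauss(17.0, 0.94), 1e-6)   # days of shedding after onset
    return incubation_days, min(ct_min, CT_LIMIT), shedding


def ct_on_day(k, trajectory):
    """Ct value k days after infection on a piecewise-linear trajectory."""
    incubation, ct_min, shedding = trajectory
    if k <= 0:
        return CT_LIMIT
    if k <= incubation:  # linear fall from the limit to the minimum at onset
        return CT_LIMIT - (CT_LIMIT - ct_min) * k / incubation
    if k <= incubation + shedding:  # linear rise back to the limit
        return ct_min + (CT_LIMIT - ct_min) * (k - incubation) / shedding
    return CT_LIMIT


# Usage: traj = draw_trajectory(incubation_days=5.0, rng=random.Random(5))
#        sampled_ct = ct_on_day(detection_day - infection_day, traj)
```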
Daily \({R}_{t}\) and population-level Ct from simulations
Incidence-based \({R}_{t}\) was estimated from the synthetic case counts using the R package EpiNow2^{3}, which accounts for reporting delays and other sources of uncertainty. The incubation period and reporting delay used for deconvolution were assumed to follow the same distributions used to simulate symptom-based surveillance. The mean and variance of the generation interval under the SEIR model were specified as \({T}_{c}=1/\sigma +1/\gamma\) and \({\rm{Var}}=2{(\frac{{T}_{c}}{2})}^{2}\), respectively, where \(1/\sigma\) and \(1/\gamma\) are the average times for individuals to transit from E to I and from I to loss of infectiousness, respectively (see Supplementary Table 6). Further details are available at https://github.com/epiforecasts/EpiNow2.
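With the parameter values given for the transmission model (1/σ = 5 days, 1/γ = 4 days), the generation-interval specification above works out as follows. This is plain arithmetic. How these moments are passed to EpiNow2 (for example, as a parametric distribution) is not shown and is left to that package's documentation.

```python
import math


def seir_generation_interval(latent_mean, infectious_mean):
    """Mean and variance of the generation interval as specified in the text:
    T_c = 1/sigma + 1/gamma and Var = 2 * (T_c / 2) ** 2 (days, days^2)."""
    t_c = latent_mean + infectious_mean  # 1/sigma + 1/gamma
    var = 2.0 * (t_c / 2.0) ** 2
    return t_c, var, math.sqrt(var)


t_c, var, sd = seir_generation_interval(latent_mean=5.0, infectious_mean=4.0)
print(t_c, var, round(sd, 2))  # 9.0 40.5 6.36
```

For comparison, with exponentially distributed latent and infectious periods the exact SEIR generation-interval variance is 1/σ² + 1/γ² = 41 days², close to the 40.5 days² given by this specification.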
The daily distribution of population Ct values from the simulations was summarized by its mean and skewness, as in Eqs. (5) and (6). Under each scenario, the regression model (Eq. (7)) used to generate Ct-based \({R}_{t}\) was selected by comparing the adjusted R² of models fitted over different training periods during the first simulated wave (Supplementary Fig. 10); we then applied the selected model to estimate Ct-based \({R}_{t}\) for the days following the training period (the testing period). Spearman correlation coefficients \(\rho\) between Ct-based \({R}_{t}\) and the simulation truth were calculated to evaluate the accuracy of our estimates.
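The daily Ct summaries and the evaluation metric can be sketched in plain Python. Eqs. (5) and (6) are not reproduced in this section, so the skewness estimator used here is an assumption. It is the adjusted Fisher-Pearson coefficient G1, one of the measures compared by Joanes and Gill^{34}. Requiring at least three values per day, and returning 0 for a day with identical values, are also choices made here. Spearman's ρ is computed as the Pearson correlation of average ranks.

```python
import math
from statistics import mean


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness G1 (assumed estimator)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 values")
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    if m2 == 0:
        return 0.0  # identical values: treat as symmetric (assumption)
    m3 = sum((x - m) ** 3 for x in xs) / n
    return (m3 / m2 ** 1.5) * math.sqrt(n * (n - 1)) / (n - 2)


def ranks(xs):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            out[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def daily_ct_summary(ct_by_day):
    """{day: [Ct values]} -> {day: (mean Ct, skewness)} for days with >= 3 values."""
    return {d: (mean(v), sample_skewness(v))
            for d, v in ct_by_day.items() if len(v) >= 3}
```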
To assess how sampling variability in Ct values affects the accuracy of Ct-based \({R}_{t}\) estimates, we repeated each scenario 100 times and calculated the Spearman correlation coefficient between the estimated Ct-based \({R}_{t}\) and the simulation truth in each simulation. We calculated the mean and the 2.5% and 97.5% quantiles of the correlation coefficient across the 100 simulations for each scenario (Supplementary Table 7).
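Summarizing the 100 replicate correlation coefficients is a short computation. This sketch assumes linear-interpolation quantiles, which match R's default `quantile()` method (type 7). The quantile type used in the analysis is not stated, so this is an assumption. The replicate-generating function in the usage comment is hypothetical.

```python
import math
from statistics import mean


def quantile_type7(xs, p):
    """Linear-interpolation quantile, matching R's default (type 7)."""
    s = sorted(xs)
    h = (len(s) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def summarize_replicates(rhos):
    """Mean and 2.5% / 97.5% quantiles of per-simulation Spearman coefficients."""
    return mean(rhos), quantile_type7(rhos, 0.025), quantile_type7(rhos, 0.975)


# Usage (run_scenario is a hypothetical function returning one rho per seed):
# rhos = [run_scenario(scenario=1, seed=s) for s in range(100)]
# print(summarize_replicates(rhos))
```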
All statistical analyses were conducted in R version 4.1.2 (R Development Core Team, 2021).
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
All demographic and epidemiological information on confirmed COVID-19 cases is freely available from the Centre for Health Protection website (https://www.coronavirus.gov.hk/eng/index.html). Daily aggregate data (including case counts, incidence-based \({R}_{t}\), and Ct distributions) and the simulation data generated in this study have been deposited in the GitHub repository.
Code availability
All code for the analyses is available in the GitHub repository.
References
Gostic, K. M. et al. Practical considerations for measuring the effective reproductive number, Rt. PLOS Comput. Biol. 16, e1008409 (2020).
Flaxman, S. et al. Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe. Nature 584, 257–261 (2020).
Abbott, S. et al. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts [version 2; peer review: 1 approved with reservations]. Wellcome Open Res. https://doi.org/10.12688/wellcomeopenres.16006.2 (2020).
Parag, K. V. Improved estimation of time-varying reproduction numbers at low case incidence and between epidemic waves. PLOS Comput. Biol. 17, e1009347 (2021).
He, X. et al. Temporal dynamics in viral shedding and transmissibility of COVID-19. Nat. Med. 26, 672–675 (2020).
Kucirka, L. M., Lauer, S. A., Laeyendecker, O., Boon, D. & Lessler, J. Variation in false-negative rate of reverse transcriptase polymerase chain reaction-based SARS-CoV-2 tests by time since exposure. Ann. Intern. Med. 173, 262–267 (2020).
Tsang, T. K., Wu, P., Lau, E. H. Y. & Cowling, B. J. Accounting for imported cases in estimating the time-varying reproductive number of COVID-19 in Hong Kong. J. Infect. Dis. https://doi.org/10.1093/infdis/jiab299 (2021).
Real-time dashboard. https://covid19.sph.hku.hk/ (2021).
Jones, T. C. et al. Estimating infectiousness throughout SARS-CoV-2 infection course. Science 373, eabi5273 (2021).
Ke, R. et al. Daily sampling of early SARS-CoV-2 infection reveals substantial heterogeneity in infectiousness. Preprint at medRxiv https://doi.org/10.1101/2021.07.12.21260208 (2021).
Kissler, S. M. et al. Viral dynamics of acute SARS-CoV-2 infection and applications to diagnostic and public health strategies. PLOS Biol. 19, e3001333 (2021).
Wölfel, R. et al. Virological assessment of hospitalized patients with COVID-2019. Nature 581, 465–469 (2020).
Hay, J. A. et al. Estimating epidemiologic dynamics from cross-sectional viral load distributions. Science 373, eabh0635 (2021).
Tso, C. F., Garikipati, A., Green-Saxena, A., Mao, Q. & Das, R. Correlation of population SARS-CoV-2 cycle threshold values to local disease dynamics: exploratory observational study. JMIR Public Health Surveill. 7, e28265 (2021).
Yang, B. et al. Universal community nucleic acid testing for COVID-19 in Hong Kong reveals insights into transmission dynamics: a cross-sectional and modelling study. Clin. Infect. Dis. https://doi.org/10.1093/cid/ciab925 (2021).
Tsui, E. L. H. et al. Development of a data-driven COVID-19 prognostication tool to inform triage and step-down care for hospitalised patients in Hong Kong: a population-based cohort study. BMC Med. Inform. Decis. Mak. 20, 323 (2020).
Yang, B. et al. Changing disparities in coronavirus disease 2019 (COVID-19) burden in the ethnically homogeneous population of Hong Kong through pandemic waves: an observational study. Clin. Infect. Dis. https://doi.org/10.1093/cid/ciab002 (2021).
Gu, H. et al. Genomic epidemiology of SARS-CoV-2 under an elimination strategy in Hong Kong. Nat. Commun. 13, 736 (2022).
Li, Q. et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N. Engl. J. Med. 382, 1199–1207 (2020).
Andriamandimby, S. F. et al. Cross-sectional cycle threshold values reflect epidemic dynamics of COVID-19 in Madagascar. Preprint at medRxiv https://doi.org/10.1101/2021.07.06.21259473 (2021).
Pitzer, V. E. et al. The impact of changes in diagnostic testing practices on estimates of COVID-19 transmission in the United States. Am. J. Epidemiol. https://doi.org/10.1093/aje/kwab089 (2021).
Tsang, T. K. et al. Effect of changing case definitions for COVID-19 on the epidemic curve and transmission parameters in mainland China: a modelling study. Lancet Public Health 5, e289–e296 (2020).
Peccia, J. et al. Measurement of SARS-CoV-2 RNA in wastewater tracks community infection dynamics. Nat. Biotechnol. 38, 1164–1167 (2020).
Levine-Tiefenbrun, M. et al. Initial report of decreased SARS-CoV-2 viral load after inoculation with the BNT162b2 vaccine. Nat. Med. 27, 790–792 (2021).
Hay, J. A., Kennedy-Shaffer, L. & Mina, M. J. Viral loads observed under competing strain dynamics. Preprint at medRxiv https://doi.org/10.1101/2021.07.27.21261224 (2021).
Frampton, D. et al. Genomic characteristics and clinical effect of the emergent SARS-CoV-2 B.1.1.7 lineage in London, UK: a whole-genome sequencing and hospital-based cohort study. Lancet Infect. Dis. https://doi.org/10.1016/S1473-3099(21)00170-5 (2021).
Cowling, B. J. et al. Impact assessment of non-pharmaceutical interventions against coronavirus disease 2019 and influenza in Hong Kong: an observational study. Lancet Public Health 5, e279–e288 (2020).
Yang, B. et al. The differential importation risks of COVID-19 from inbound travellers and the feasibility of targeted travel controls: a case study in Hong Kong. Lancet Reg. Health West. Pac. https://doi.org/10.1016/j.lanwpc.2021.100184 (2021).
Adam, D. C. et al. Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong. Nat. Med. 26, 1714–1719 (2020).
Cori, A., Ferguson, N. M., Fraser, C. & Cauchemez, S. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am. J. Epidemiol. 178, 1505–1512 (2013).
Becker, N. G., Watson, L. F. & Carlin, J. B. A method of non-parametric back-projection and its application to AIDS data. Stat. Med. 10, 1527–1542 (1991).
Thompson, R. N. et al. Improved inference of time-varying reproduction numbers during infectious disease outbreaks. Epidemics 29, 100356 (2019).
Salje, H. et al. Reconstruction of antibody dynamics and infection histories to evaluate dengue risk. Nature 557, 719–723 (2018).
Joanes, D. N. & Gill, C. A. Comparing measures of sample skewness and kurtosis. J. R. Stat. Soc. Ser. D 47, 183–189 (1998).
Li, C. et al. Estimating the instantaneous asymptomatic proportion with a simple approach: exemplified with the publicly available COVID-19 surveillance data in Hong Kong. Front. Public Health https://doi.org/10.3389/fpubh.2021.604455 (2021).
Wu, P. et al. Suppressing COVID-19 transmission in Hong Kong: an observational study of the first four months. Res. Square https://doi.org/10.21203/rs.3.rs-34047/v1 (2021).
Quilty, B. J. et al. Quarantine and testing strategies in contact tracing for SARS-CoV-2: a modelling study. Lancet Public Health 6, e175–e183 (2021).
Cevik, M. et al. SARS-CoV-2, SARS-CoV, and MERS-CoV viral load dynamics, duration of viral shedding, and infectiousness: a systematic review and meta-analysis. Lancet Microbe 2, e13–e22 (2021).
Acknowledgements
We thank the Department of Health and Hospital Authority of the Food and Health Bureau of the Government of Hong Kong for providing the data for the analysis. This project was supported by the Health and Medical Research Fund, Food and Health Bureau, Government of the Hong Kong Special Administrative Region (grant no. COVID190118; B.J.C.), and the Theme-based Research Scheme (Project No. T11-712/19-N; B.J.C.) of the Research Grants Council of the Hong Kong SAR Government.
Author information
Contributions
All authors meet the ICMJE criteria for authorship. The study was conceived by B.J.C. and B.Y. Y.L., E.H.Y.L., J.Y.W., H.S.B., J.K.C., F.H., H.G., and T.K.T. prepared the data. B.Y. and Y.L. developed the model. Y.L. and B.Y. conducted the data analyses. S.C., D.C.A., S.T.A., N.H.L.L., T.K.T., P.W., G.M.L., and B.J.C. interpreted the results. Y.L. and B.Y. wrote the first draft of the paper. All authors provided critical review and revision of the text and approved the final version.
Ethics declarations
Competing interests
B.J.C. consults for AstraZeneca, GSK, Moderna, Roche, Sanofi Pasteur, and Pfizer. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks James Hay and Sen Pei for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Lin, Y., Yang, B., Cobey, S. et al. Incorporating temporal distribution of population-level viral load enables real-time estimation of COVID-19 transmission. Nat Commun 13, 1155 (2022). https://doi.org/10.1038/s41467-022-28812-9
DOI: https://doi.org/10.1038/s41467-022-28812-9