Abstract
The dynamics of SARS-CoV-2 replication and shedding in humans remain poorly understood. We captured the dynamics of infectious virus and viral RNA shedding during acute infection through daily longitudinal sampling of 60 individuals for up to 14 days. By fitting mechanistic models, we directly estimated viral expansion and clearance rates and overall infectiousness for each individual. Significant person-to-person variation in infectious virus shedding suggests that individual-level heterogeneity in viral dynamics contributes to ‘superspreading’. Viral genome loads often peaked days earlier in saliva than in nasal swabs, indicating strong tissue compartmentalization and suggesting that saliva may serve as a superior sampling site for early detection of infection. Viral loads and clearance kinetics of Alpha (B.1.1.7) and previously circulating non-variant-of-concern viruses were mostly indistinguishable, indicating that the enhanced transmissibility of this variant cannot be explained simply by higher viral loads or delayed clearance. These results provide a high-resolution portrait of SARS-CoV-2 infection dynamics and implicate individual-level heterogeneity in infectiousness in superspreading.
Main
Transmission of SARS-CoV-2 by both presymptomatic and asymptomatic individuals has been a major contributor to the explosive spread of this virus^{1,2,3,4,5}. Recent epidemiological investigations of community outbreaks have indicated that transmission of SARS-CoV-2 is highly heterogeneous, with a small fraction of infected individuals (often referred to as superspreaders) contributing a disproportionate share of forward transmission^{6,7,8}. Transmission heterogeneity has also been implicated in the epidemic spread of several other important viral pathogens, including measles and smallpox^{9}. Numerous behavioural and environmental explanations have been offered for transmission heterogeneity, but the extent to which the underlying features of the infection process within individual hosts contribute to the superspreading phenomenon remains unclear. Addressing this gap in knowledge will inform the design of more targeted and effective strategies for controlling community spread.
Viral infection is a highly complex process in which viral replication and shedding dynamics are shaped by the interplay between host and viral factors. Recent studies have suggested that the magnitude and/or duration of viral shedding in both nasal and saliva samples correlate with disease severity, highlighting the potential importance of viral dynamics in influencing infection outcomes^{10,11,12,13}. Variation in viral load has also been suggested to correlate with transmission risk^{14}. In addition to implications for pathogenesis and transmission, defining the contours of viral shedding dynamics is also critical for designing effective surveillance, screening and testing strategies^{15}. To date, studies aimed at describing the longitudinal dynamics of SARS-CoV-2 shedding have been limited by (1) sparse sampling frequency, (2) failure to capture the early stages of infection when transmission is most likely, (3) absence of individual-level data on infectious virus shedding kinetics and (4) biasing towards the most severe clinical outcomes^{16,17,18,19,20,21}. This is also true for viruses beyond SARS-CoV-2, because the dynamics of natural infection in humans have not been described in detail for any acute viral pathogen.
Here we capture the longitudinal viral dynamics of mild and asymptomatic early acute SARS-CoV-2 infection in 60 people by recording daily measurements of both viral RNA shedding (from mid-turbinate nasal swabs and saliva samples) and infectious virus shedding (from mid-turbinate nasal swabs) for up to 14 days. We reveal a striking degree of individual-level heterogeneity in infectious virus shedding, thus providing a partial explanation for the central role of superspreaders in community transmission of SARS-CoV-2. We also directly compare the shedding dynamics of Alpha (B.1.1.7) and previously circulating non-Alpha viruses, revealing no substantial differences in nasal or saliva shedding. Altogether, these results provide a high-resolution, multiparameter empirical profile of acute SARS-CoV-2 infection in humans and implicate person-to-person variation in infectious virus shedding in driving patterns of epidemiological spread of the pandemic.
Description of cohort and study design
During the fall of 2020 and spring of 2021, all faculty, staff and students at the University of Illinois at Urbana-Champaign were required to undergo at least twice weekly quantitative PCR with reverse transcription (RT–qPCR) testing for SARS-CoV-2 (ref. ^{22}). We leveraged this large-scale, high-frequency screening programme to enrol symptomatic, presymptomatic and asymptomatic SARS-CoV-2-infected individuals. We enrolled university faculty, staff and students who reported a negative RT–qPCR test result in the past 7 days and were either (1) within 24 h of a positive RT–qPCR result or (2) within 5 days of exposure to someone with a confirmed positive RT–qPCR result. These criteria ensured that we enrolled people within the first days of infection.
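The enrolment criteria above can be written as a simple predicate. The following is a minimal sketch only; the function and argument names are hypothetical and are not taken from the study's software.

```python
def is_eligible(days_since_negative_test, hours_since_positive=None,
                days_since_exposure=None):
    """Check the enrolment criteria stated in the text (illustrative only).

    A candidate needs a negative RT-qPCR result in the past 7 days and either
    (1) a positive result within the last 24 h or
    (2) an exposure to a confirmed case within the last 5 days.
    """
    recent_negative = days_since_negative_test <= 7
    recent_positive = hours_since_positive is not None and hours_since_positive <= 24
    recent_exposure = days_since_exposure is not None and days_since_exposure <= 5
    return recent_negative and (recent_positive or recent_exposure)


# Hypothetical candidates, for illustration
print(is_eligible(3, hours_since_positive=10))   # recent negative + new positive
print(is_eligible(2, days_since_exposure=4))     # recent negative + close contact
print(is_eligible(10, hours_since_positive=10))  # last negative test too old
```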
We collected both nasal and saliva samples daily for up to 14 days to generate a high-resolution portrait of viral dynamics during the early stages of SARS-CoV-2 infection. Participants also completed a daily online symptom survey. Our study cohort was primarily young (median age, 28 years; range, 19–73 years), non-Hispanic white and skewed slightly towards males (Supplementary Table 1). All infections were either mild or asymptomatic, and none of the participants were ever hospitalized for COVID-19. All participants in this cohort reported that they had never been previously infected with SARS-CoV-2, and none were vaccinated against SARS-CoV-2 at the time of enrolment.
Early SARS-CoV-2 viral dynamics vary significantly between individuals
To examine viral dynamics at the individual level, we plotted cycle threshold (Ct)/cycle number (CN) values from both saliva and nasal swab samples, Quidel SARS Sofia 2 antigen fluorescent immunoassay (FIA) results and viral culture data from nasal swabs, as a function of time relative to the lowest observed CN values (Fig. 1a and Extended Data Fig. 1). The RT–qPCR assay used for nasal swab samples reports CN values, an objective measure of the cycle number of the maximal rate of PCR signal increase, rather than Ct values; CN and Ct values are equivalent in suitability for quantitative estimates^{23}. In many cases we captured both the rise and fall of viral genome shedding in nasal and/or saliva samples. A comparison between individuals revealed substantial heterogeneity in shedding dynamics, with obvious differences in the duration of detectable infectious virus shedding, clearance kinetics and the temporal relationship between shedding in nasal and saliva compartments. Further, nine out of 60 individuals had no detectable infectious virus in nasal samples (Fig. 1a and Extended Data Fig. 1).
Generally, earlier positivity results in the viral culture assay (which suggests higher infectious viral loads) were associated with lower CN values in nasal samples (Fig. 1b). This is unsurprising, as both nasal viral genome load and viral infectivity were assayed using the same sample. Saliva Ct values tended to be higher than matched nasal samples, probably due in part to the lower molecular sensitivity of the specific saliva RT–qPCR assay used, which does not include an RNA extraction step^{24}. For both sample types the relationship between viral culture results and Ct/CN values was not absolute, because several nasal swab samples with CN values >30 also tested positive for infectious virus. These data indicate that caution must be exercised when using a simple Ct/CN value cutoff as a surrogate for infectious status.
We also assessed the relationship between antigen FIA and viral culture results, and found that participants tested positive by antigen FIA on 93% of the days on which they also tested positive by viral culture (Fig. 1c). This finding is consistent with earlier cross-sectional studies examining the relationship between antigen test positivity and infectious virus shedding^{25,26}.
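The concordance statistic reported above is the fraction of culture-positive days on which the antigen test was also positive. A minimal sketch of that calculation follows; the paired daily results are toy values chosen for illustration and are not study data.

```python
def antigen_positive_fraction(culture_results, antigen_results):
    """Fraction of viral-culture-positive days that were also antigen positive."""
    paired = [(c, a) for c, a in zip(culture_results, antigen_results) if c]
    if not paired:
        raise ValueError("no culture-positive days to evaluate")
    return sum(1 for _, antigen in paired if antigen) / len(paired)


# Toy daily results for one hypothetical participant (True = positive)
culture = [True, True, False, True, True]
antigen = [True, False, True, True, True]
print(antigen_positive_fraction(culture, antigen))
```

Days on which viral culture was negative do not enter the denominator, so this quantity says nothing about antigen positivity in the absence of culturable virus.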
While the symptom profiles self-reported by study participants varied widely across individuals, all cases were mild and did not require medical treatment (Extended Data Fig. 2). To determine whether any specific symptoms correlated with viral culture positivity, we compared the reported frequencies for each symptom on days where individuals tested viral culture positive or negative (Extended Data Fig. 3). Muscle aches, runny nose and scratchy throat were significantly more likely to be reported on days when participants were viral culture positive, suggesting these specific symptoms as potential indicators of infectious status. No other symptoms examined exhibited a clear association with viral culture status. Self-reported symptom data from this study may be partially skewed by having been collected after participants were notified of their initial positive test result or potential exposure.
Within-host mechanistic models capture viral dynamics in nasal and saliva samples
To better quantify the specific features of viral dynamics within individuals, we implemented five within-host mechanistic models based on models developed previously for SARS-CoV-2 and influenza infection (Methods, Fig. 2a and Extended Data Fig. 4)^{27,28,29}. We fit these models to viral genome loads derived from the observed Ct/CN values using a population mixed-effect modelling approach (Methods). The viral dynamics in nasal and saliva samples were distinct from each other in most individuals, indicating strong compartmentalization of the oral and nasal cavities. We thus fit the models to data from nasal and saliva samples separately. For each sample type, viral genome loads from four individuals remained very low or undetectable throughout the sampling period (Extended Data Fig. 1), suggesting that these individuals either (1) were enrolled late during infection despite having a recent negative test result or (2) exhibited highly irregular shedding dynamics. Because we were primarily interested in early infection dynamics, data from these individuals were excluded. Altogether, we selected data from 56 out of 60 individuals for each sample type for model fitting. Addition of the excluded individuals did not change the main conclusions (analysis not shown).
To identify factors that might partially explain the observed variation in individual-level dynamics, for each model we tested whether the age of participants or the infecting viral genotype (that is, non-B.1.1.7 versus B.1.1.7) covaried with any of the estimated model parameters in the model fitting. A total of 114 model variations were tested (see Methods). We compared the relative abilities of these model variations to capture RT–qPCR data using the corrected Akaike information criterion (AICc) and found that, in general, the refractory and effector cell models best describe data from nasal and saliva samples, respectively (Supplementary Tables 2 and 3). In the refractory model (Fig. 2a), we assumed that target cells can be rendered refractory to infection through the activity of soluble immune mediators released by infected cells such as interferon^{30}. In the best-fit immune effector cell model (Fig. 2a), we assumed that innate and adaptive immune cells are activated and recruited to eliminate infected cells, leading to increased viral clearance^{28}. See Supplementary Tables 4–6 for estimated values of the population and individual parameters and the fixed parameter values, respectively. Overall, these models described the observed Ct/CN values in both nasal and saliva samples very well (Fig. 2b).
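For readers unfamiliar with AICc-based model selection, the standard small-sample correction to the Akaike information criterion can be sketched as follows. The log-likelihoods, parameter counts and model labels below are hypothetical placeholders, not values from this study, and how the sample size n is counted in a mixed-effect setting is itself a modelling choice that this sketch does not address.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike information criterion (lower is better).

    AICc = 2k - 2 ln(L) + 2k(k + 1) / (n - k - 1)
    """
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


# Hypothetical candidate fits: label -> (log-likelihood, number of parameters)
candidates = {"model A": (-100.0, 5), "model B": (-98.5, 8)}
n_obs = 56  # illustrative sample size only

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In this toy comparison the extra parameters of the second model are not justified by its slightly higher likelihood, which is the trade-off the criterion is designed to capture.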
The frequent longitudinal sampling of participants during early infection provided a unique opportunity for precise quantification of viral load kinetics during the viral expansion phase, before the peak in genome shedding. We estimated the mean early exponential expansion rate, r, before peak viral load (growth rate, for short) to be 4.4 d^{–1} (s.d. ± 0.5 d^{–1}) in the nasal compartment. In the saliva compartment, the estimated growth rate was 8.8 d^{–1} (s.d. ± 1.8 d^{–1}), much higher than in the nasal compartment (Fig. 2c,d).
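To give the growth rates above a more intuitive scale, an exponential expansion rate r corresponds to a doubling time of ln(2)/r. The conversion below is a worked illustration of that standard relationship using the mean rates quoted in the text; it is not part of the original analysis.

```python
import math


def doubling_time_hours(growth_rate_per_day):
    """Doubling time (hours) for exponential growth at rate r (per day)."""
    return math.log(2) / growth_rate_per_day * 24.0


nasal_hours = doubling_time_hours(4.4)   # mean nasal growth rate from the text
saliva_hours = doubling_time_hours(8.8)  # mean saliva growth rate from the text
print(round(nasal_hours, 1), round(saliva_hours, 1))  # roughly 3.8 h and 1.9 h
```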
Viral clearance kinetics clearly differed between nasal and saliva samples (Fig. 2b–d). For nasal samples, viral genome loads decreased relatively quickly after peak, mostly driven by loss of productively infected cells, and we estimated an average death rate of productively infected cells at 2.5 d^{–1} (s.d. ± 0.4 d^{–1}); however, viral decline slowed over time. In saliva, post-peak viral genome loads declined initially at a slower rate than that in nasal samples. Consequently, we estimated a much smaller average death rate of productively infected cells in saliva during this phase, at 0.4 d^{–1} (s.d. ± 0.3 d^{–1}). However, our model suggested the existence of a second clearance phase with a more rapid decline occurring 1–2 weeks after infection, potentially due to the onset of effector cell and/or neutralizing antibody responses. Overall, we estimate that it takes on average 4.9 d (s.d. ± 0.5 d) and 3.9 d (s.d. ± 0.8 d) from infection to peak viral loads in the nasal and the saliva compartments, respectively (Fig. 2c,d). The average period from peak to undetectable genome viral load was 22.3 d (s.d. ± 8.3 d) and 14.9 d (s.d. ± 3.2 d) in the nasal and saliva compartments, respectively.
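To illustrate how a refractory-cell mechanism of the kind described for the nasal compartment can produce a rapid rise followed by clearance, the sketch below integrates a deliberately simplified target-cell model. It is not the fitted model from this study: the equations omit several features (for example an eclipse phase and reversion of refractory cells), all quantities are in normalized arbitrary units, and every parameter value except the infected-cell death rate of 2.5 d^{–1} is a hypothetical choice made so that the early growth rate is close to the nasal estimate.

```python
import math


def simulate_refractory_model(beta=10.0, phi=5.0, delta=2.5, p=10.0, c=10.0,
                              v0=1e-6, days=20.0, dt=0.001):
    """Forward-Euler integration of a simplified refractory-cell model.

    dT/dt = -beta*V*T - phi*I*T   (target cells infected or made refractory)
    dI/dt =  beta*V*T - delta*I   (productively infected cells)
    dV/dt =  p*I - c*V            (virus, arbitrary units)
    Returns (times, viral_loads).
    """
    target, infected, virus = 1.0, 0.0, v0
    times, loads = [0.0], [virus]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = beta * virus * target
        made_refractory = phi * infected * target
        d_target = -new_infections - made_refractory
        d_infected = new_infections - delta * infected
        d_virus = p * infected - c * virus
        target += dt * d_target
        infected += dt * d_infected
        virus += dt * d_virus
        times.append(step * dt)
        loads.append(virus)
    return times, loads


times, loads = simulate_refractory_model()
peak_index = max(range(len(loads)), key=loads.__getitem__)
# Early exponential growth rate, measured between day 0.5 and day 1.5
early_rate = math.log(loads[1500] / loads[500]) / (times[1500] - times[500])
print(round(times[peak_index], 2), round(early_rate, 2))
```

With these illustrative values the early growth rate is the positive root of (r + delta)(r + c) = beta*p, about 4.4 d^{–1}, and the viral load declines after the peak because target cells are either infected or rendered refractory.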
Interestingly, the model predicts a significant correlation (P < 0.01) in nasal samples between age and the Φ parameter, which describes the effectiveness of the antiviral immune response in rendering target cells refractory to infection (Fig. 2e). This suggests that innate immune responses are less effective at limiting SARS-CoV-2 in the nasal compartment of older individuals within our cohort, consistent with previous studies describing dysregulation of innate immunity to viral infection in aged individuals^{31,32,33}. There was no significant correlation between age and either growth rate or clearance rate in nasal samples (Extended Data Fig. 5).
Overall, we noted a surprising degree of discordance in viral dynamics between nasal and saliva samples for many participants. In most individuals (46 out of 54 analysed), viral genome shedding peaked at least 1 day earlier in saliva than in nasal samples (Fig. 2f). In contrast, the peak in nasal shedding preceded the saliva peak by at least 1 day in four individuals.
Significant heterogeneity in the infectious potential of individuals
We next examined the duration of infectious virus shedding in nasal samples, as a surrogate for the infectious potential of an individual. The number of days on which an individual tested positive by viral culture on nasal swabs varied widely (Fig. 3a). Nine out of 60 individuals tested negative by viral culture throughout the sampling period, whereas one individual tested positive for 9 days (Fig. 3a). We found a weak positive correlation between the duration of viral culture positivity and participant age (Fig. 3b). Of note, many study participants were viral culture positive on the first day of sample collection, suggesting that we failed to capture the onset of viral culture positivity for these individuals and thus may be underestimating the duration of infectious virus shedding for a subset of study participants.
To better quantify the infectious potential of each individual, we first used viral culture data as a measure for intrinsic infectiousness (infectiousness for short, below) to characterize how infectiousness depends on viral genome load. We fitted three alternative models as previously proposed^{27} to paired nasal RT–qPCR and viral culture data collected from each individual using a nonlinear mixed-effect modelling approach (see Extended Data Fig. 6 for workflow and Methods for details). Comparing models using AICc scores, we found that the relationship is best described by a saturation model where the infectious virus load is a Hill-type function of viral genome load (Fig. 3c, Extended Data Fig. 7 and Supplementary Table 7). See Supplementary Table 8 for the best-fit parameter values.
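A Hill-type saturation relationship of the kind described above can be sketched with the generic Hill form V_inf = V_m V^h / (V^h + K_m^h), where V is the viral genome load. The exact parameterization used in the fitted model is given in the Methods and supplementary tables; the symbol names and all parameter values below are assumptions chosen purely for illustration.

```python
def hill_infectious_load(genome_load, v_max, k_m, h):
    """Generic Hill-type saturation function of viral genome load.

    v_max: saturating infectious virus load (arbitrary units, hypothetical)
    k_m:   genome load at which the response is half-maximal (hypothetical)
    h:     Hill coefficient controlling steepness (hypothetical)
    """
    scaled = genome_load ** h
    return v_max * scaled / (scaled + k_m ** h)


# Illustrative parameters only, not the best-fit values
V_MAX, K_M, H = 1.0, 1e6, 0.5
for load in (1e3, 1e6, 1e9):
    print(load, hill_infectious_load(load, V_MAX, K_M, H))
```

The function is half-maximal at a genome load of k_m and flattens at high genome loads, which is one way a model can represent infectious virus output that does not keep pace with RNA copy number.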
Using the best-fit models, we estimated the infectiousness of each individual over the course of infection from their predicted genome viral loads and infectious viral loads (Extended Data Fig. 8). Note that the dataset allows us to estimate only a quantity that is a constant proportion of the infectious virus load (rather than its absolute value) across time and between individuals, and thus we report the predicted values in arbitrary units (a.u.) as a relative measure of infectiousness. Our model predicts that infectious virus shedding increases sharply when nasal CN values fall <22, and that the average amount of infectious virus shed is zero for CN values >29 (Fig. 3d). Importantly, there exists a high level of heterogeneity in infectiousness across different individuals that is not fully explained by differences in viral genome load (Fig. 3d). For example, at nasal CN values around 13, infectious virus shedding reached values >20 a.u. in three individuals while in 11 individuals it was <4 a.u. This suggests that viral Ct/CN values are not precisely predictive of infectiousness.
We next estimated the total infectiousness of each individual by integrating the area under the infectious virus load curve over the course of infection. This approach again revealed a large degree of heterogeneity in individual-level infectiousness, with >57-fold difference between the highest and lowest estimated infectiousness (104.0 and 1.8 a.u., respectively; Fig. 3e). We found that a gamma distribution with a shape parameter of 1.6 describes the distribution of individual infectiousness well (Fig. 3e). These data suggest that the previously reported heterogeneity in secondary transmission rates^{6,7} is likely to arise from a combination of heterogeneity in contact structure and heterogeneity in intrinsic infectiousness^{34}. This emphasizes the potential for a small subset of individuals that exhibit high intrinsic infectiousness to function as superspreaders if they have frequent and/or high-risk contacts during the infectious period. Finally, we observed a significant correlation between age and total infectiousness (P < 0.01, R^{2} = 0.21; Fig. 3f).
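The two quantitative ideas in this paragraph, integrating a curve to obtain total infectiousness and summarizing heterogeneity with a gamma distribution, can be illustrated as follows. The trapezoid rule is used here as a simple stand-in for whatever integration scheme was applied to the model trajectories, the example curve is a toy series, and the simulated gamma sample (shape 1.6, arbitrary scale of 1) is illustrative only.

```python
import random


def trapezoid_auc(times, values):
    """Area under a piecewise-linear curve using the trapezoid rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return area


def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the highest-ranked fraction of values."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


# Toy infectious-load curve (days, arbitrary units)
print(trapezoid_auc([0, 1, 2, 3], [0, 4, 2, 0]))

# Fold difference between the extremes reported in the text
fold = 104.0 / 1.8
print(round(fold, 1))

# Simulated total infectiousness drawn from a gamma distribution with shape 1.6
rng = random.Random(0)
simulated = [rng.gammavariate(1.6, 1.0) for _ in range(20000)]
print(round(top_share(simulated), 2))
```

The last value gives a feel for how unevenly infectiousness is spread under such a distribution: the most infectious fifth of simulated individuals accounts for considerably more than a fifth of the total.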
Analysis of B.1.1.7 viral dynamics
Finally, we asked whether infection with the B.1.1.7 (Alpha) variant of concern (VOC) is associated with any significant differences in viral dynamics that could potentially explain the enhanced transmissibility of this genotype^{35,36,37}. Previous studies have suggested that B.1.1.7 infection may result in higher peak viral loads or prolonged shedding compared with previously circulating genotypes^{38,39,40}. Within our cohort, 16 out of 60 individuals were infected with B.1.1.7.
Both the empirical data and our model analysis (Fig. 4a,c) suggest that the overall viral genome shedding dynamics in both nasal and saliva samples are indistinguishable between B.1.1.7 and non-B.1.1.7 infections (none of the latter were VOC genotypes except for a single P.1 (Gamma) infection; Supplementary Table 9). Although comparison of parameter estimates in nasal samples suggested a slightly slower growth rate and time to peak for B.1.1.7 versus non-B.1.1.7 (Fig. 4b), it is not clear whether this difference is biologically meaningful (Fig. 4a). Most importantly, we estimate that there is no significant difference between B.1.1.7 and non-B.1.1.7 viruses in total infectiousness in the nasal compartment (Fig. 4b). Previously, we have shown that the area under the logarithm of genome viral loads, denoted as AUC(log), may serve as a surrogate for infectiousness^{27}. Here we calculated AUC(log) from predicted viral load trajectories in the saliva compartment in each individual and found no difference between B.1.1.7 and non-B.1.1.7 viruses (Fig. 4d). These data indicate that other mechanisms not reflected in viral shedding dynamics drive the increased transmissibility of the B.1.1.7 (Alpha) variant.
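As a concrete illustration of the AUC(log) surrogate, the sketch below integrates the base-10 logarithm of a viral load trajectory with the trapezoid rule. The choice of log base, the floor applied to very low loads and the example trajectory are all assumptions made for this illustration; the definition used in the analysis is the one given in ref. ^{27}.

```python
import math


def auc_log10(times, viral_loads, floor=1.0):
    """Area under log10(viral load) over time (trapezoid rule).

    Loads below `floor` are clipped so that the logarithm stays non-negative;
    the floor is a hypothetical stand-in for a detection limit.
    """
    logs = [math.log10(max(load, floor)) for load in viral_loads]
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (logs[i] + logs[i - 1]) / 2.0
    return area


# Toy trajectory: days and genome copies per ml (illustrative values only)
print(auc_log10([0, 1, 2], [1e2, 1e6, 1e2]))
```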
Discussion
This study describes the results of daily multicompartment sampling of viral dynamics within dozens of individuals newly infected with SARS-CoV-2 and provides a comprehensive, high-resolution description of viral shedding and clearance dynamics in humans.
Superspreading, in which a small subset of infected individuals are responsible for a disproportionately large share of transmission events, has been identified as a major driver of community spread of SARS-CoV-2, SARS-CoV and many other acute viral pathogens^{6,7,9}. Superspreading is believed to arise from heterogeneity in both (1) contact structure between individuals arising from behavioural and environmental factors and (2) the intrinsic infectiousness of individuals^{9,34,41}. While heterogeneity in contact structure has been studied extensively^{42,43,44,45}, the extent of heterogeneity in infectiousness arising from individual-level viral dynamics remains unknown. Although several studies have attempted to quantify this^{20,34}, the lack of empirical measurement of viral genome load and infectious virus shedding dynamics during early infection, which is a critical period for SARS-CoV-2 transmission, prevents precise estimation.
To address this question, we empirically quantified infectious virus shedding through daily longitudinal sampling of individuals infected with SARS-CoV-2. The substantial heterogeneity in infectious virus shedding that we observed among individuals indicates that superspreading is probably driven by individual-level variation in specific features of the infection process, in addition to behavioural and environmental factors. We also found that heterogeneity in infectious virus shedding is only partly explained by individual-level heterogeneity in viral genome load dynamics, suggesting that additional factors such as variation in the timing and magnitude of the neutralizing antibody response might contribute^{46}. Our results here suggest caution in assessing the infectiousness of an individual using viral genome load data alone. Further, the absence of clear viral genetic correlates of infectiousness within this dataset suggests the existence of specific host determinants of superspreading potential. While we identified age as a significant correlate of infectiousness, additional determinants probably exist. Defining these correlates could aid future efforts to mitigate community spread of the virus by helping identify individuals with elevated risk of becoming superspreaders.
Our finding that viral shedding often peaks earlier in saliva than in the nasal compartment, sometimes by several days, corroborates a recent study of four individuals^{47} and has several important implications. First, saliva may be a more effective sample type than nasal swabs for detection of infected individuals before or early in the infectious period^{48}. Early detection and isolation of infected individuals is critical for breaking transmission chains^{15}. Moreover, early viral shedding from the oral cavity may contribute to the high prevalence of presymptomatic SARS-CoV-2 transmission. We were unable to directly assess viral infectivity in saliva, so it remains unclear whether the earlier peaks in viral RNA shedding that we observed in saliva reflect earlier shedding of transmission-competent virus. The earlier detection of virus in saliva also raises questions about the initial site of SARS-CoV-2 infection. A recent study demonstrated that both salivary glands and oral mucosal epithelium can support SARS-CoV-2 replication, suggesting that infection could be initiated within the oral cavity^{49}. Alternatively, if infection is initiated in the nasopharynx or soft palate, viral RNA might be detectable in saliva before detection in the mid-turbinate swabs used in this study. The discordance in shedding dynamics between oral and nasal samples that we observed in many participants is consistent with a significant degree of compartmentalization between these adjacent but distinct tissue sites, as has been observed in animal models of influenza virus infection^{50,51}.
The specific mechanisms driving the enhanced transmissibility of the B.1.1.7 variant remain poorly understood. Recent studies have identified alterations in the structural conformation of the spike protein and enhanced antagonism of innate immunity by B.1.1.7 as potential contributors^{52,53}. Contrary to previous clinical studies, we observed no significant differences in either peak viral loads or clearance kinetics between B.1.1.7 and non-B.1.1.7 viruses as measured in either nasal swabs or saliva. Our results are consistent with studies demonstrating the absence of a growth advantage for B.1.1.7 in primary human respiratory epithelial cells^{54,55}. Similarly, a recent longitudinal study of RNA shedding observed no significant differences in mean peak viral RNA loads, clearance kinetics or infection duration of the Alpha and Delta variants compared with non-VOCs^{39}. If the timing of symptom onset differs between B.1.1.7 and non-B.1.1.7 infections, it could potentially explain why cross-sectional analyses of viral loads might register lower Ct values for B.1.1.7 samples. These data suggest that the enhanced transmissibility of the B.1.1.7 variant may also be driven by features not reflected in shedding dynamics, for example enhanced environmental stability or a lower infectious dose threshold.
This study has several limitations that must be considered. First, our study cohort was limited to faculty, students and staff of the University of Illinois at Urbana-Champaign and did not include anyone who was hospitalized for COVID-19. The limited demographic and clinical profile of this cohort means that our results may not reflect the dynamics that occur during severe and lethal infections and/or in populations not well represented in our study. Second, there are multiple potential sources of technical variation that could contribute to noise in our experimental measurements. These include variability in sample collection quality and the potential for detection of subgenomic viral RNA in our RT–qPCR assays. While we took steps to minimize variation in sample collection quality, including having all sample collections remotely observed by trained study staff, it is possible that some of the sample-to-sample variation we observed is due to differences in sample quality. Finally, it must be noted that the results of viral culture assays performed on nasal swabs may not perfectly correlate with the actual transmission potential of an individual.
Altogether, our data provide a high-resolution view of the longitudinal viral dynamics of SARS-CoV-2 infection in humans and implicate individual-level heterogeneity in viral shedding as playing a critical role in community spread of this virus.
Methods
This study was approved by the Western Institutional Review Board, and all participants provided informed consent.
Participants
All on-campus students and employees of the University of Illinois at Urbana-Champaign were required to submit saliva for RT–qPCR testing every 2–4 days as part of the SHIELD campus surveillance testing programme. Individuals testing positive were instructed to isolate and were eligible to enrol in this study for a period of 24 h following receipt of their positive test result. Close contacts of individuals who tested positive (particularly those co-housed with them) were instructed to quarantine and were eligible to enrol for up to 5 days after their last known exposure to an infected individual. All participants were also required to have received a negative saliva RT–qPCR result in the 7 days before enrolment.
Individuals were recruited via either a link shared in an automated text message providing isolation information sent within 30 min of a positive test result, a call from a study recruiter or a link shared by an enrolled study participant or included in information provided to all quarantining close contacts. In addition, signs were used at each testing location and a website was available to inform the community about the study.
Participants were required to be at least 18 years of age, have a valid university ID, speak English, have Internet access and live within 8 miles of the university campus. After enrolment and consent, participants completed an initial survey to collect information on demographics and health history and were provided with sample collection supplies. Participants who tested positive before enrolment or during quarantine were followed for up to 14 days. Quarantining participants who continued to test negative by saliva RT–qPCR were followed for up to 7 days after their last exposure. All participants’ data and survey responses were collected in the Eureka digital study platform. All study participants were asked whether they had previously tested positive for SARS-CoV-2 or been vaccinated against SARS-CoV-2. All participants included in this cohort reported no previous SARS-CoV-2 infection and were unvaccinated at the time of enrolment.
Sample collection
Each day, participants collected the following samples while being remotely observed by trained study staff.

(1) Saliva (2 ml), into a 50-ml conical tube

(2) One nasal swab from a single nostril using a foam-tipped swab that was placed within a dry collection tube

(3) One nasal swab from the other nostril using a flocked swab that was subsequently placed in a collection vial containing 3 ml of viral transport medium (VTM). Swab and VTM manufacturer were not changed throughout the study.
The order of nostrils (left versus right) used for the two different swabs was randomized. For nasal swabs, participants were instructed to insert the soft tip of the swab at least 1 cm into the indicated nostril until they encountered mild resistance, rotate the swab around the nostril five times and leave it in place for 10–15 s. After daily sample collection, participants completed a symptom survey. A courier collected all participant samples within 1 h of sampling using a no-contact pickup protocol designed to minimize courier exposure to infected participants.
Saliva RT–qPCR
After collection, saliva samples were stored at room temperature and RT–qPCR was run within 12 h of initial collection in a Clinical Laboratory Improvement Amendments (CLIA)-certified diagnostic laboratory. The protocol for the covidSHIELD direct saliva-to-RT–qPCR assay used has been detailed previously^{24}. In brief, saliva samples were heated at 95 °C for 30 min followed by the addition of 2× Tris/Borate/EDTA buffer (TBE) at a 1:1 ratio (final concentration 1× TBE) and Tween-20 to a final concentration of 0.5%. Samples were assayed using the Thermo TaqPath COVID-19 assay.
Antigen testing
Foam-tipped nasal swabs were placed in collection tubes, transported in cold packs and stored at 4 °C overnight based on guidance from the manufacturer. The morning after collection, swabs were run through the Sofia SARS antigen FIA on Sofia devices according to the manufacturer’s protocol.
Nasal swab RT–qPCR
Collection tubes containing VTM and flocked nasal swabs were stored at −80 °C after collection and were subsequently shipped to Johns Hopkins University for RT–qPCR and virus culture testing. After thawing, VTM was aliquoted for RT–qPCR and infectivity assays. One millilitre of VTM from the nasal swab was assayed on the Abbott Alinity, according to the manufacturer’s instructions, in a College of American Pathologists and CLIA-certified laboratory.
Calibration curve for nasal swab RT–qPCR assay
Calibration curves for the Alinity assay were determined using digital droplet PCR (ddPCR) as previously described^{56}. Nasal swab samples previously quantified using the Alinity assay were stored in a freezer at −80 °C between initial quantification and extraction for calibration curves. Samples were extracted simultaneously using the Perkin Elmer Chemagic 360 automated extraction platform, with sample input and eluate volumes of 300 and 60 µl, respectively. RNA eluates were stored at −80 °C. Digital droplet RT–PCR was performed following the Bio-Rad EUA assay package insert (https://www.fda.gov/media/137579/download). A master mix was prepared per sample using the reagents provided in the ddPCR Supermix for Probes kit as follows: 5.5 µl of SuperMix (Bio-Rad), 2.2 µl of reverse transcriptase (Bio-Rad), 1.1 µl of dithiothreitol (Bio-Rad), 1.1 µl of CDC triplex SARS-CoV-2 primer and probe mix (IDT) and 7.1 µl of nuclease-free water; 17 µl of master mix was then transferred to a 96-well PCR plate and combined with 5 µl of RNA in eluate, and the plate was then loaded on to a QX200 automated droplet generator (Bio-Rad). The droplet-containing plate was then heat sealed with foil in a plate sealer (Bio-Rad) and placed on a C1000 Touch thermal cycler (Bio-Rad) to perform reverse transcription and amplification. Droplets were read using the QX200 droplet reader (Bio-Rad). Data were analysed with QuantaSoft Analysis Pro 1.0 software.
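The per-sample master mix volumes listed above sum to the 17 µl transferred to each well. A minimal sketch of that arithmetic, scaled to a batch of samples, is given below. The `overage` argument is a hypothetical convenience for pipetting losses and is not described in the protocol; it defaults to zero.

```python
# Per-sample reagent volumes in microlitres, as listed in the text
PER_SAMPLE_UL = {
    "SuperMix": 5.5,
    "reverse transcriptase": 2.2,
    "dithiothreitol": 1.1,
    "SARS-CoV-2 primer and probe mix": 1.1,
    "nuclease-free water": 7.1,
}


def master_mix_volumes(n_samples, overage=0.0):
    """Reagent volumes (µl) for n_samples reactions, with optional overage."""
    factor = n_samples * (1.0 + overage)
    return {name: round(volume * factor, 2) for name, volume in PER_SAMPLE_UL.items()}


per_sample_total = round(sum(PER_SAMPLE_UL.values()), 2)  # master mix per well
reaction_total = per_sample_total + 5.0                   # plus 5 µl of RNA eluate
print(per_sample_total, reaction_total)
print(master_mix_volumes(10))
```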
Virus culture from nasal swabs
Vero-TMPRSS2 cells were grown in complete medium (CM) consisting of DMEM with 10% foetal bovine serum (Gibco), 1 mM glutamine (Invitrogen), 1 mM sodium pyruvate (Invitrogen), 100 U ml^{–1} penicillin (Invitrogen) and 100 μg ml^{–1} streptomycin (Invitrogen)^{57}. Viral infectivity was assessed on Vero-TMPRSS2 cells as previously described using infection medium (identical to CM except that FBS is reduced to 2.5%)^{26}. When a cytopathic effect was visible in >50% of cells in a given well, the supernatant was harvested. The presence of SARS-CoV-2 was confirmed through RT–qPCR, as described previously, by extracting RNA from the cell culture supernatant using the Qiagen viral RNA isolation kit and performing RT–qPCR using N1 and N2 SARS-CoV-2-specific primers and probes, in addition to primers and probes for the human RNaseP gene with the CDC research-use-only 2019-Novel Coronavirus (2019-nCoV) Real-time RT–PCR primer and probe sequences, and utilizing synthetic RNA target sequences to establish a standard curve^{58}.
Viral genome sequencing and analysis
Viral RNA was extracted from 140 µl of heat-inactivated (30 min at 95 °C, as part of the protocol detailed in ref. ^{24}) saliva samples using the QIAamp viral RNA mini kit (Qiagen); 100 ng of viral RNA was used to generate complementary DNA using the SuperScript IV first strand synthesis kit (Invitrogen). Viral cDNA was then used to generate sequencing libraries utilizing the Swift SNAP Amplicon SARS-CoV-2 kit with additional coverage panel and unique dual indexing (Swift Biosciences), which were sequenced on an Illumina NovaSeq SP lane. Data were run through the nf-core/viralrecon workflow (https://nf-co.re/viralrecon/1.1.0) using the Wuhan-Hu-1 reference genome (NCBI accession NC_045512.2). Swift v.2 primer sequences were trimmed before variant analysis with iVar v.1.3.1 (https://doi.org/10.1186/s13059-018-1618-7), retaining all calls with an allele frequency of 0.01 or higher. Viral lineages were called using the Pangolin tool (https://github.com/cov-lineages/pangolin) v.2.4.2, pango v.1.2.6 and the 5/19/21 version of the pangoLEARN model based on the nomenclature system described in ref. ^{59}.
Statistics and reproducibility
Details of statistical analysis methods are given below. No statistical method was used to predetermine sample size. For some analyses, a small number of individuals were excluded for reasons detailed above, where relevant. Experiments were not randomized and the investigators were not blinded to allocation during experiments and outcome assessment.
Statistical analyses
The difference in the distribution of a parameter of interest between the non-B.1.1.7 and B.1.1.7 infection groups was assessed using univariate analysis, and P values were calculated using the Wilcoxon rank-sum test. Comparison of infectious virus shedding between the two groups was performed using multivariate analysis with age as an additional covariate. Levels of infectious viral shedding, after adjusting for age, were predicted by assuming an age of 28 years, that is, the median age of the cohort (Fig. 4c).
Generation of figures
All figures, except for Fig. 2a, were generated using RStudio. Figure 2a was generated using Microsoft PowerPoint.
Overview of model construction and parameter estimation
The goal of the quantitative analyses is to use mathematical models to characterize viral shedding dynamics based on both viral genome loads (as measured by RT–qPCR) and the presence or absence of infectious virus (as measured by viral culture assay). By analysing the model results, we quantify individual-level heterogeneity in both viral genome shedding dynamics and individual infectiousness. See Extended Data Fig. 6 for an overview of the analysis workflow.
First, we performed experiments to derive the calibration curves for transformation of Ct/CN values from RT–qPCR to viral genome loads (Viral genome load calibration from Ct/CN values). Note that, due to the nature of RT–qPCR assays and sampling noise, viral genome loads derived using calibration curves represent a proxy for the actual quantities. Nonetheless, this approach is the best available to derive viral genome loads for the purpose of viral dynamic modelling, and is widely used in understanding SARS-CoV-2 dynamics^{21,60}.
Second, we constructed viral dynamic models and fit these to viral genome loads (Viral dynamics models). We estimated key parameters governing infection processes in the nasal and the saliva-associated compartments, such as viral exponential growth rate before peak viral genome load and viral clearance rate. This allows us to characterize individual-level heterogeneity in infection kinetics.
Third, we constructed mathematical models to describe how the amount of infectious virus shed relates to changes in viral genome load, as measured by RT–qPCR (Modelling infectiousness of an individual). We fit the models to viral culture assay data. Using the best model and predicted viral genome load kinetics from the viral dynamics model, we predicted the extent of infectious virus shedding (that is, the infectiousness) for each individual and thus quantified the individual-level heterogeneity in infectiousness.
Viral genome load calibration from Ct/CN values
Viral genome load calibration: nasal samples
To calculate viral genome loads from CN values reported for nasal samples, we performed calibration curve experiments to empirically define the relationship between CN values obtained from the RT–qPCR assay used on nasal swab samples, and absolute viral genome loads within samples, as quantified by ddPCR. We quantified viral genome loads for 62 nasal samples with CN values ranging between 17 and 38. For each sample, absolute copy numbers of viral genomes were measured using two different N-gene-specific primer sets (N1 and N2). To account for technical noise between samples, we also determined the concentration of the host RNase P (RP) transcript as a control (Supplementary Table 10). We then normalized copy numbers of N1 and N2 targets by dividing by their corresponding RP target numbers and then multiplying by the mean RP concentration across all samples. Note that the unit of these measurements is per millilitre: this is because nasal swab samples were each collected in 3 ml of VTM.
Plotting the logarithm of normalized viral genome loads against the associated CN values shows a clear linear relationship, justifying the use of linear regression below. Linear regression lines with similar coefficients were used as calibration curves in other studies^{21,60}. We also note that the noise in viral genome loads is high when CN values are high (for example, >33), probably a reflection of increased noise when the signal is low^{26}. However, this high level of variation at high CN values does not affect the conclusions of our study, because the range of viral loads relevant to transmission is much higher (>10^{6} copies ml^{–1}; Fig. 3d).
We then performed linear regression on measured CN values and log_{10} viral genome loads (Extended Data Fig. 9). This led to the following formula for the relationship between CN values and viral genome load:
where V and CN denote the viral genome load and CN value, respectively. Note that, because of the high number of data points measured, the level of uncertainty in the regression line is minimal (Extended Data Fig. 9).
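As an illustration of the calibration procedure, the following minimal sketch normalizes N-target copy numbers by the RP control and fits log_{10} viral genome load against CN by ordinary least squares. The sample values, function names and resulting coefficients are hypothetical; they are not the data in Supplementary Table 10 or the regression line in Extended Data Fig. 9.

```python
import math

def normalize(n_copies, rp_copies, rp_mean):
    """Divide N-target copies by the sample's RP copies, then rescale by the mean RP."""
    return n_copies / rp_copies * rp_mean

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical (CN, N copies, RP copies) triples, for illustration only.
samples = [(18.0, 2.0e7, 900.0), (24.0, 4.0e5, 1100.0),
           (30.0, 7.0e3, 1000.0), (36.0, 1.5e2, 1000.0)]
rp_mean = sum(rp for _, _, rp in samples) / len(samples)
cn = [s[0] for s in samples]
log_v = [math.log10(normalize(n, rp, rp_mean)) for _, n, rp in samples]
intercept, slope = fit_line(cn, log_v)

def cn_to_load(cn_value):
    """Translate a CN value into a viral genome load using the fitted line."""
    return 10 ** (intercept + slope * cn_value)
```

A negative slope is expected, because each additional amplification cycle corresponds to a lower starting template concentration.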
Viral genome load calibration: saliva samples
Unlike for nasal samples, we were unable to measure the calibration curve using saliva samples taken from participants. To quantify the efficiency of the RT–qPCR assay used on saliva samples, we used data from calibration experiments in which saliva samples obtained from healthy donors were spiked with SARS-CoV-2 genomic RNA. More specifically, 0.9 ml of saliva from a healthy donor was spiked with 0.1 ml of 1.8 × 10^{8}, 5.4 × 10^{5} or 6.0 × 10^{4} RNA copies ml^{–1}. For samples spiked with 1.8 × 10^{8} RNA copies ml^{–1}, tenfold serial dilutions were performed to a final concentration of 1.8 × 10^{4} RNA copies ml^{–1}. A total of 24 samples were collected and Ct values of the N gene were then measured (Supplementary Table 11).
As above, we plotted the logarithm of viral loads against Ct values (Extended Data Fig. 10). The plot shows a clear linear relationship, justifying the use of linear regression below. We then performed linear regression on measured Ct values and log_{10} viral genome loads (Extended Data Fig. 10). This led to the following formula for the relationship between Ct values and viral genome load:
where V and Ct denote viral genome load and Ct value, respectively. As with the nasal calibration curve, the level of uncertainty in the regression line is minimal (Extended Data Fig. 10).
Note that a major difference between samples spiked with viral genomes and those taken from infected individuals is that the latter are likely to be noisier because of variation in the sample collection process. However, the two approaches should not differ substantially in assessing the efficiency of the RT–PCR protocol. The impact of noise in the nasal sample can be minimized by taking a large number of samples over a wide range of CN values, as we did for the nasal samples. Therefore, the calibration curves derived above represent an accurate translation of Ct/CN values to viral load.
Viral dynamics models
We constructed viral dynamics models to describe the dynamic changes in viral genome load. The viral genome load patterns in nasal and saliva samples are distinct from each other in many individuals, suggesting compartmentalization of infection dynamics in these two sample sites. Therefore, we use the models below to describe data collected from these two compartments separately. See Fig. 2a and Extended Data Fig. 4 for schematics of these models.
The target-cell-limited model
We first constructed a within-host model based on the target-cell-limited (TCL) model used for other respiratory viruses such as influenza^{61} and, more recently, SARS-CoV-2 (refs. ^{27,29,62}). We keep track of the total numbers of target cells (T), cells in the eclipse phase of infection (E; that is, infected cells not yet producing virus), productively infected cells (I) and viruses (V). The ordinary differential equations are:
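Written out from the rate definitions given for this model, and in the standard TCL form, the system referred to later as equation (1) can be reconstructed as:

```latex
\begin{aligned}
\frac{\mathrm{d}T}{\mathrm{d}t} &= -\beta V T,\\
\frac{\mathrm{d}E}{\mathrm{d}t} &= \beta V T - kE,\\
\frac{\mathrm{d}I}{\mathrm{d}t} &= kE - \delta I,\\
\frac{\mathrm{d}V}{\mathrm{d}t} &= \pi I - cV.
\end{aligned}
```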
In this model, target cells are infected by virus with rate constant β, cells in the eclipse phase become productively infected cells at per-capita rate k and productively infected cells die at per-capita rate δ. We use V to describe viruses measured in nasal or saliva samples, representing a proportion of the total virus in the compartment under consideration. Therefore, rate π is the product of viral production rate per infected cell and the proportion of virus that is sampled (see Ke et al.^{27} for a detailed derivation). Viruses are cleared at per-capita rate c.
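To make the behaviour of the TCL model concrete, the sketch below integrates it with a hand-written fourth-order Runge-Kutta scheme. All parameter values are hypothetical and chosen only to produce a plausible rise and fall of viral load; they are not the estimates reported in this study, and the solver is not the one used for the fits.

```python
def tcl_rhs(state, p):
    """Right-hand side of the TCL model for state = (T, E, I, V)."""
    T, E, I, V = state
    infection = p["beta"] * V * T
    return (-infection,
            infection - p["k"] * E,
            p["k"] * E - p["delta"] * I,
            p["pi"] * I - p["c"] * V)

def rk4_step(state, dt, p):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(slope, scale):
        return tuple(s + scale * d for s, d in zip(state, slope))
    s1 = tcl_rhs(state, p)
    s2 = tcl_rhs(shifted(s1, 0.5 * dt), p)
    s3 = tcl_rhs(shifted(s2, 0.5 * dt), p)
    s4 = tcl_rhs(shifted(s3, dt), p)
    return tuple(x + dt / 6.0 * (a + 2.0 * b + 2.0 * g + d)
                 for x, a, b, g, d in zip(state, s1, s2, s3, s4))

def simulate(p, state0, days, dt=0.005):
    """Return the list of states from t = 0 to t = days in steps of dt."""
    trajectory = [state0]
    for _ in range(int(round(days / dt))):
        trajectory.append(rk4_step(trajectory[-1], dt, p))
    return trajectory

# Hypothetical parameter values (per day), for illustration only.
params = {"beta": 3.75e-8, "k": 4.0, "delta": 1.5, "pi": 50.0, "c": 10.0}
dt = 0.005
# Initial state: 8e7 target cells, one cell in the eclipse phase, no virus yet.
trajectory = simulate(params, (8e7, 1.0, 0.0, 0.0), days=15.0, dt=dt)
viral_load = [state[3] for state in trajectory]
peak_day = viral_load.index(max(viral_load)) * dt
```

With these assumed values the viral load grows exponentially, peaks once target cells are depleted and then declines at a rate set mainly by δ.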
Refractory cell model
We extend the TCL model by including an early innate response, that is, the type-I/III interferon response, in which interferons are secreted from infected cells and bind to receptors on uninfected target cells, stimulating an antiviral response that renders them refractory to viral infection. Note that this is the best model to describe the viral genome load dynamics as measured by RT–qPCR from nasal samples.
We keep track of interferon (F) and cells refractory to infection (R), in addition to other quantities in the TCL model. The full ordinary differential equations (ODEs) for target cells, refractory cells and interferon are
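Based on the rate definitions given for this model (conversion of target cells at rate ϕFT, reversion at rate ρ, and interferon production and clearance at rates s and μ), the three equations can be reconstructed as follows; the equations for E, I and V are assumed to be unchanged from the TCL model:

```latex
\begin{aligned}
\frac{\mathrm{d}T}{\mathrm{d}t} &= -\beta V T - \phi F T + \rho R,\\
\frac{\mathrm{d}R}{\mathrm{d}t} &= \phi F T - \rho R,\\
\frac{\mathrm{d}F}{\mathrm{d}t} &= sI - \mu F.
\end{aligned}
```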
In this model, the impact of the innate immune response is to convert target cells into refractory cells at rate ϕFT where ϕ is a rate constant. Refractory cells can become target cells again at rate ρ. Interferon is produced and cleared at rates s and μ, respectively.
For simplicity, and due to a lack of empirical data on interferon responses in our study, we simplify the model by making the quasi-steady-state assumption that the interferon dynamics are much faster than the dynamics of infected cells and assume that \(\frac{{{\mathrm{d}}F}}{{{\mathrm{d}}t}} = 0\). Thus \(sI = \mu F\) or \(F = \frac{s}{\mu }I\).
Let \({\Phi} = \phi \frac{s}{\mu }\), so that the ODEs for the innate immunity model become:
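Substituting \(F = \frac{s}{\mu }I\) into the conversion term ϕFT gives ΦIT, so the reduced system can be reconstructed as follows (a sketch that assumes the E, I and V equations keep their TCL form):

```latex
\begin{aligned}
\frac{\mathrm{d}T}{\mathrm{d}t} &= -\beta V T - \Phi I T + \rho R,\\
\frac{\mathrm{d}R}{\mathrm{d}t} &= \Phi I T - \rho R,\\
\frac{\mathrm{d}E}{\mathrm{d}t} &= \beta V T - kE,\\
\frac{\mathrm{d}I}{\mathrm{d}t} &= kE - \delta I,\\
\frac{\mathrm{d}V}{\mathrm{d}t} &= \pi I - cV.
\end{aligned}
```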
Viral production reduction model
In addition to making target cells refractory to infection, the impact of interferons may include reducing virus production from infected cells. We include this action of interferons in the viral production reduction model. As above, we make the quasi-steady-state assumption that interferon dynamics are much faster than those of infected cells and assume that F is proportional to I. The ODEs for the model are:
where γ is a constant representing the effect of interferon in reducing viral production.
Immune effector cell model
Over the course of infection, immune effector cells are activated and recruited to kill infected cells. These immune effector cells include innate immune cells such as macrophages and natural killer cells, as well as cells developed during the adaptive immune response such as cytotoxic T lymphocytes and antibody-secreting B cells. To consider the impact of these immune effector cells, we develop a model (the effector cell model) based on a previous model for influenza infection^{28}. In this model, we assume that the death rate of infected cells is δ_{1} at the beginning of the infection. This may reflect the cytotoxic effects of viral infection. After time t_{1}, the death rate of infected cells increases by δ_{2}, where δ_{2} models the killing of infected cells by immune effector cells. The ODEs for the model are:
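Taking the description literally, as a step increase in the death rate at time t_{1}, the effector cell model can be sketched as the TCL system with a time-dependent death rate. The piecewise notation δ(t) is introduced here for illustration and is an assumption about the form of the original equations:

```latex
\begin{aligned}
\frac{\mathrm{d}T}{\mathrm{d}t} &= -\beta V T,\\
\frac{\mathrm{d}E}{\mathrm{d}t} &= \beta V T - kE,\\
\frac{\mathrm{d}I}{\mathrm{d}t} &= kE - \delta(t)\, I,\\
\frac{\mathrm{d}V}{\mathrm{d}t} &= \pi I - cV,
\end{aligned}
\qquad
\delta(t) =
\begin{cases}
\delta_1, & t < t_1,\\
\delta_1 + \delta_2, & t \ge t_1.
\end{cases}
```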
Note that this is the best model to describe the viral genome load dynamics as measured by RT–qPCR from saliva samples.
Combined model
In the full model, we combine the refractory cell model and immune effector cell model to consider both the immediate interferon response and immune effector response. The ODEs for the model are:
Choice of parameter values
Total target cell numbers
We calculate the total numbers of target cells in the nasal and saliva compartments by multiplying the total number of epithelial cells in these two compartments by the fraction of epithelial cells expected to be targets for SARS-CoV-2 infection.
For the total number of epithelial cells in the nasal compartment, we use the estimate from Baccam et al.^{61}, 4 × 10^{8} cells. This is calculated from the estimate that the surface area of the nasal turbinates is 160 cm^{2} (ref. ^{63}) and the surface area per epithelial cell is 2 × 10^{−11} to 4 × 10^{−11} m^{2} per cell (ref. ^{61}). For the saliva compartment, the total surface area of the mouth was estimated to be 214.7 cm^{2} (ref. ^{64}). Therefore, we estimate that the total number of epithelial cells in the mouth is approximately 4 × 10^{8} × 214.7/160 = 5.4 × 10^{8}.
Hou et al. estimated that the fraction of cells expressing angiotensin-converting enzyme 2 (that is, the receptor for SARS-CoV-2 entry) on the cell surface is approximately 20% in the upper respiratory tract^{65}. Therefore, in our model, the initial numbers of target cells in the nasal and saliva compartments are calculated as 4 × 10^{8} × 20% = 8 × 10^{7} and 5.4 × 10^{8} × 20% = 1.08 × 10^{8}, respectively.
Note that these estimates are approximations using available best estimates in the literature. For a standard viral dynamics model, the number of initial target cells and virus production rate are unidentifiable and only their product is identifiable^{66}. Thus, if the actual number of target cells differs from that estimated here, an increase in the initial number of target cells will lead to a corresponding decrease in the estimate of virus production rate, and vice versa.
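The target-cell arithmetic above can be retraced in a few lines. The variable names are illustrative; the numerical inputs are the literature estimates quoted in the text, and the mouth estimate is rounded to 5.4 × 10^{8} before applying the 20% fraction, as in the text.

```python
nasal_area_cm2 = 160.0          # surface area of the nasal turbinates
mouth_area_cm2 = 214.7          # total surface area of the mouth
nasal_epithelial_cells = 4e8    # estimate from Baccam et al.
ace2_fraction = 0.20            # fraction of cells expressing ACE2

# Area per cell implied by the nasal estimate (1 cm^2 = 1e-4 m^2).
area_per_cell_m2 = nasal_area_cm2 * 1e-4 / nasal_epithelial_cells

# Scale the nasal cell count by the ratio of surface areas.
mouth_epithelial_cells = nasal_epithelial_cells * mouth_area_cm2 / nasal_area_cm2

nasal_targets = nasal_epithelial_cells * ace2_fraction
# Round to 5.4e8 first, matching the value used in the text.
mouth_targets = round(mouth_epithelial_cells, -7) * ace2_fraction
```

The implied area per cell, 4 × 10^{−11} m^{2}, sits at the upper end of the quoted range, which is consistent with the choice of 4 × 10^{8} nasal epithelial cells.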
Initial number of infected cells
We assume that one cell in the compartment of interest is infected at the start of infection, E_{0} = 1, consistent with refs. ^{27,67}. The small number of infected cells is also consistent with recent work which estimated from sequencing data that the transmission bottleneck is small for SARS-CoV-2 and that there are probably between one and three infected cells at the initiation of infection^{68,69,70}. Note that, in an earlier work, we showed that changes in the number of initially infected cells of between one and five in the model do not substantially change the inference results^{27}.
Initial viral growth rate, r
For all models above, the initial growth of the viral population before peak viral genome load is dominated by viral infection. This means that the immune responses considered in our models act to change the viral growth trajectory substantially only at later time points^{71}. Thus, we derive an approximation to the initial viral growth rate using the TCL model only (equation (1)). This also serves as a good approximation for the other models.
We first make two simplifying assumptions commonly used in analysis of the initial dynamics of viral dynamic models^{72,73}. First, because at the initial stage of infection the number of infected cells is orders of magnitude lower than the number of target cells, we assume that the number of target cells is at a constant level, T_{0}. Second, the dynamics of viruses are much faster than those of infected cells. For example, viral clearance occurs on a timescale of minutes to hours, whereas the death of productively infected cells takes days. Therefore, we make the quasi-steady-state assumption, \(\frac{{{\mathrm{d}}V}}{{{\mathrm{d}}t}} \approx 0\), such that the concentration of virus is always proportional to the concentration of productively infected cells (that is, \(\pi I \approx cV\)). This gives \(V \approx \frac{\pi }{c}I\).
With these two assumptions, equation (1) becomes a system of linear ODEs with two variables, E and I:
The Jacobian matrix, J, for this system of ODEs is:
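For reference, substituting \(T = T_0\) and \(V \approx \frac{\pi }{c}I\) into the TCL equations for E and I gives the linear system and Jacobian below, with the variables ordered as (E, I). This is a reconstruction from the two assumptions stated above; it reproduces the eigenvalue expression given in the text.

```latex
\begin{aligned}
\frac{\mathrm{d}E}{\mathrm{d}t} &= \frac{\beta \pi T_0}{c}\, I - kE,\\
\frac{\mathrm{d}I}{\mathrm{d}t} &= kE - \delta I,
\end{aligned}
\qquad
J = \begin{pmatrix}
-k & \dfrac{\beta \pi T_0}{c}\\[1ex]
k & -\delta
\end{pmatrix}.
```

Setting \(\left| J - \lambda I \right| = 0\) then yields \(\lambda^2 + (k + \delta)\lambda + k\delta(1 - R_0) = 0\) with \(R_0 = \frac{\beta \pi}{\delta c}T_0\).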
The initial growth rate, r, is the leading eigenvalue of the Jacobian matrix of the ODE system. We calculate the eigenvalues, λ, for the Jacobian matrix above from \(\left| {J - \lambda I} \right| = 0\), where I is the identity matrix, and get:
\(\lambda = \frac{1}{2}\left[ { - \left( {k + \delta } \right) \pm \sqrt {\left( {k + \delta } \right)^2 + 4k\delta \left( {R_0 - 1} \right)} } \right]\), where \(R_0 = \frac{{\beta \pi }}{{\delta c}}T_0\).
Then, the leading eigenvalue (that is, the initial growth rate r) is the root with the positive sign:
$$r = \frac{1}{2}\left[ { - \left( {k + \delta } \right) + \sqrt {\left( {k + \delta } \right)^2 + 4k\delta \left( {R_0 - 1} \right)} } \right]$$
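As a numerical check, the sketch below evaluates the leading eigenvalue (the positive root of the expression for λ) and confirms that it satisfies the characteristic equation of the two-variable linear system. The parameter values are hypothetical and serve only to exercise the formula.

```python
import math

def initial_growth_rate(beta, pi, delta, c, k, T0):
    """Return (r, R0), where r is the leading eigenvalue of the linearized system."""
    R0 = beta * pi * T0 / (delta * c)
    disc = (k + delta) ** 2 + 4.0 * k * delta * (R0 - 1.0)
    return 0.5 * (-(k + delta) + math.sqrt(disc)), R0

def char_poly(lam, beta, pi, delta, c, k, T0):
    """det(J - lam*I) for J = [[-k, beta*pi*T0/c], [k, -delta]]."""
    return (-k - lam) * (-delta - lam) - k * beta * pi * T0 / c

# Hypothetical values (per day); T0 is the assumed initial number of target cells.
args = dict(beta=3.75e-8, pi=50.0, delta=1.5, c=10.0, k=4.0, T0=8e7)
r, R0 = initial_growth_rate(**args)
residual = char_poly(r, **args)

# At the threshold R0 = 1 the growth rate should vanish.
r_threshold, _ = initial_growth_rate(beta=3.75e-9, pi=50.0, delta=1.5,
                                     c=10.0, k=4.0, T0=8e7)
```

The growth rate is positive only when R_0 exceeds 1, which matches the usual threshold interpretation of R_0.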
Model fitting strategy
Fitting viral dynamic models to viral genome load data
We took a nonlinear mixed-effect modelling approach to fit the viral dynamic models to viral genome load data from all individuals simultaneously. All estimations were performed using Monolix (Monolix Suite 2019R2, Lixoft: https://lixoft.com/products/monolix/). We allowed random effects on the fitted parameters (unless specified otherwise). All population parameters, except for the starting time of simulation, t_{0}, are positive and therefore we assume that they follow lognormal distributions. For t_{0} we assume a normal distribution because t_{0} can be positive or negative.
The parameters β and π in the viral dynamic models strongly correlate with each other when the models are fitted to viral genome load data^{66}. We tested three choices in handling this correlation in fitting all five viral dynamic models: (1) a correlation is assumed between parameter β and π in Monolix; (2) parameter β has a fixed effect only (that is, its value is set to be the same across all individuals); and (3) parameter π has a fixed effect only.
To test whether the age of the individuals and/or the infecting viral genotype (categorized as either non-B.1.1.7 or B.1.1.7) explains the heterogeneous patterns in viral genome load trajectories across the cohort, we tested whether they covary with any of the fitted parameters in the model by setting the two variables as a continuous and a categorical covariate, respectively, in Monolix.
The assumptions on parameters β and π and the choice of parameters that covary with age or viral strain of infection led to a large number of model choices for fitting. Therefore, we took the following strategy to ensure that we identified the best model and parameter combinations to describe the data.

First, we tested the three assumptions about parameters β and π in the five viral dynamic models without any covariate and selected the best assumption for further analysis based on their corrected Akaike information criterion (AICc) scores.

Second, using the best assumption, we first tested the model by including the age of the individuals as a continuous covariate of all fitted parameter values with a random effect. We then took an iterative approach to test whether the covariate should be removed from any of the parameters in the model using Pearson’s correlation test in Monolix. The parameter(s) that has a nonsignificant P value (P > 0.05) or with the lowest P value is removed from the next round of parameter fitting. We iterated the process until all parameters were removed.

The best model variant with the lowest AICc score was then selected for analysis on whether parameter estimates differed in individuals infected by different viral strains. As before, we took an iterative approach. We first set the viral strain (that is, non-B.1.1.7 or B.1.1.7) as a categorical covariate of all fitted parameter values with a random effect in the model. We then tested whether the covariate should be removed from any of the parameters in the model using the analysis of variance in Monolix. The parameter(s) that has a nonsignificant P value (P > 0.05) or with the lowest P value is removed from the next round of parameter fitting. We iterated the process until all parameters were removed.

Finally, the model variant with the lowest AICc score was selected as the best model.
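The model comparison step in the strategy above can be illustrated with the standard small-sample correction to the AIC. The formula below is the textbook definition and may differ in detail from how Monolix counts parameters and observations; the −2 log-likelihood values, parameter counts and number of observations are hypothetical.

```python
def aic(neg2_log_lik, n_params):
    """Akaike information criterion from -2*log-likelihood."""
    return neg2_log_lik + 2.0 * n_params

def aicc(neg2_log_lik, n_params, n_obs):
    """AIC with the standard small-sample correction term."""
    correction = 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic(neg2_log_lik, n_params) + correction

# Hypothetical fits for the three assumptions about beta and pi:
# name -> (-2 log-likelihood, number of estimated parameters)
candidates = {
    "correlated beta and pi": (1500.0, 13),
    "beta fixed effect only": (1498.0, 12),
    "pi fixed effect only": (1506.0, 12),
}
n_obs = 600  # hypothetical number of viral load observations
scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

The variant with the lowest score is carried forward to the covariate tests in the next step.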
Prediction of viral genome load trajectories for non-B.1.1.7 and B.1.1.7 strains
We randomly sampled 5,000 sets of parameter combinations from the distribution specified by the best-fit population parameters (Supplementary Table 4). For the effector cell model for the saliva compartment, β and π are strongly correlated. We thus applied formulations such that correlations between the two parameter values are preserved in the random sampling in accordance with the estimated correlation coefficient. We simulated the best-fit model using the 5,000 sets of parameter combinations for each of the strains. The median and the 5th and 95th percentiles of viral genome loads at each time point are reported.
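One way to preserve a correlation between two lognormally distributed parameters during random sampling is to draw their logarithms from a bivariate normal distribution. The sketch below assumes that the estimated correlation applies on the log scale; the means, standard deviations and correlation coefficient are hypothetical and are not the values in Supplementary Table 4.

```python
import math
import random

def sample_correlated_lognormal(mu1, sd1, mu2, sd2, rho, n, rng):
    """Draw n pairs whose logarithms are bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((math.exp(mu1 + sd1 * z1), math.exp(mu2 + sd2 * z2)))
    return pairs

def percentile(values, q):
    """Percentile by linear interpolation between order statistics, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical population parameters for (beta, pi) on the natural-log scale.
pairs = sample_correlated_lognormal(math.log(4e-8), 0.5, math.log(50.0), 0.8,
                                    rho=0.8, n=5000, rng=rng)
log_corr = pearson([math.log(b) for b, _ in pairs],
                   [math.log(p) for _, p in pairs])
beta_summary = [percentile([b for b, _ in pairs], q) for q in (5, 50, 95)]
```

In a full analysis each sampled pair, together with the other sampled parameters, would be passed to the ODE solver, and `percentile` would be applied to the simulated viral loads at each time point rather than to the parameters themselves.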
Modelling infectiousness of an individual
We model how infectiousness depends on the viral genome load in an individual, similarly to the framework proposed in Ke et al.^{27}. Specifically, we first use the viral culture data collected in this study to infer how the level of infectious virus shed relates to viral genome loads as measured by RT–qPCR. From this model, we predict how the level of infectious virus shedding changes over time in each individual and how the overall infectiousness of the infection varies among participants.
Relationship between viral genome load and infectious viruses
We first consider three alternative models describing how the amount of infectious virus in a sample is related to viral genome load (derived from the CN values): the ‘linear’ model, ‘power-law’ model and ‘saturation’ model. In these models, because sampling is stochastic, we assume the number of infectious viruses in the sample used for the cell culture experiment to be a random variable, Y, that follows a Poisson distribution, with V_{inf} representing the expected number of infectious viruses (that is, \(V_{{\mathrm{inf}}} = E(Y)\)).

(1) The linear model: we assume that V_{inf} is proportional to the viral genome load, V, in the sample:
$$V_{{\mathrm{inf}}} = E(Y) = AV$$ (9)
where A is a constant.

(2) The power-law model: we assume that V_{inf} is related to the viral genome load, V, by a power function:
$$V_{{\mathrm{inf}}} = E(Y) = BV^h$$ (10)
where B and h are constants.

(3) The saturation model: we assume that V_{inf} is related to the viral genome load, V, by a Hill function:
$$V_{{\mathrm{inf}}} = E(Y) = \frac{{V_m V^h}}{{V^h + K_m^h}}$$ (11)
where V_{m} and K_{m} are constants and h is the Hill coefficient.
Probability of cell culture being positive
If each infectious virus has a probability \(\varrho\) to establish infection such that the cell culture becomes positive, the number of viruses that successfully establish an infection in cell culture is Poisson distributed with parameter \(\lambda = E\left( Y \right)\varrho = V_{{\mathrm{inf}}}\varrho\). Thus, the probability of one or more viruses successfully infecting the culture so that it tests positive is
$$p_{{\mathrm{positive}}} = 1 - e^{ - \lambda } = 1 - e^{ - V_{{\mathrm{inf}}}\varrho }$$
Substituting the expressions of V_{inf} from the three models above, we get the following expressions for p_{positive} from the three models (note that we use the subscripts ‘1’, ‘2’ and ‘3’ to denote the three models for V_{inf}):
$$p_{{\mathrm{positive}},1} = 1 - e^{ - DV}$$
where \(D = A\varrho\).
$$p_{{\mathrm{positive}},2} = 1 - e^{ - GV^h}$$
where \(G = B\varrho\).
$$p_{{\mathrm{positive}},3} = 1 - \exp \left( { - \frac{{JV^h}}{{V^h + K_m^h}}} \right)$$
where \(J = V_m\varrho\).
Note that, from the expressions above, it becomes clear that we will not be able to estimate parameters A, B and V_{m} in the three models because they appear as products with the unknown parameter \(\varrho\) in the equations. This means that the viral culture data do not allow us to estimate the absolute number of infectious viruses in a sample with a given viral genome load; instead, we are able to estimate a quantity that is a constant proportion of the actual number of infectious viruses over time and across individuals. Therefore, we report estimations of infectious viruses in arbitrary units. These estimates represent a relative measure of infectiousness. Two estimates measured at different time points and/or from different individuals can be compared using this method.
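The three expressions for the probability of a positive culture follow from the Poisson assumption, p = 1 − exp(−V_{inf}ϱ), and can be written as small functions. The function names and the parameter values in the example are hypothetical; the Hill term is evaluated as 1/(1 + (K_{m}/V)^{h}), which is algebraically the same as V^{h}/(V^{h} + K_{m}^{h}) but avoids raising a large viral load to a power.

```python
import math

def p_linear(V, D):
    """Linear model: V_inf * rho = D * V."""
    return 1.0 - math.exp(-D * V)

def p_power(V, G, h):
    """Power-law model: V_inf * rho = G * V**h."""
    return 1.0 - math.exp(-G * V ** h)

def p_saturation(V, J, h, Km):
    """Saturation model: V_inf * rho = J * V**h / (V**h + Km**h)."""
    if V <= 0:
        return 0.0
    hill = 1.0 / (1.0 + (Km / V) ** h)
    return 1.0 - math.exp(-J * hill)

# Hypothetical parameters: the curve saturates below 1 at high viral load.
loads = [1e4, 1e6, 1e8, 1e10]
curve = [p_saturation(V, J=2.0, h=1.5, Km=1e6) for V in loads]
ceiling = 1.0 - math.exp(-2.0)
```

Unlike the linear and power-law forms, the saturation form approaches 1 − exp(−J) rather than 1 as the viral genome load increases.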
Model fitting using a population effect modelling approach
For each sample, viral genome load and cell culture positivity were measured. Using these data, we estimate parameter values in the three models by minimizing the negative log-likelihood of the data.
More specifically, the likelihood of the m^{th} observation being positive or negative in cell culture is calculated as:
where V_{m} is the viral genome load of the mth observation.
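Because each culture result is binary, the likelihood of an observation is the model probability of positivity when the culture is positive and one minus that probability when it is negative. The sketch below sums the negative log-likelihood over observations for a fixed-effects-only version of the saturation model; it does not reproduce the mixed-effect estimation performed in Monolix, and the data and parameter values are hypothetical.

```python
import math

def p_positive(V, J=3.0, h=1.2, Km=1e7):
    """Saturation-model probability of a positive culture (hypothetical parameters)."""
    if V <= 0:
        return 0.0
    hill = 1.0 / (1.0 + (Km / V) ** h)
    return 1.0 - math.exp(-J * hill)

def neg_log_likelihood(loads, positives, p_model, eps=1e-12):
    """Sum of -log L_m, with L_m = p if the culture is positive and 1 - p otherwise."""
    total = 0.0
    for V, is_positive in zip(loads, positives):
        # Clamp to avoid log(0) when the model predicts exactly 0 or 1.
        p = min(max(p_model(V), eps), 1.0 - eps)
        total -= math.log(p) if is_positive else math.log(1.0 - p)
    return total

# Hypothetical paired observations: viral genome load and culture result.
loads = [1e4, 1e6, 1e8, 1e9]
positives = [False, False, True, True]
nll_matching = neg_log_likelihood(loads, positives, p_positive)
nll_reversed = neg_log_likelihood(loads, [not x for x in positives], p_positive)
```

Data in which high viral loads coincide with positive cultures give a much smaller negative log-likelihood under this model than the same loads with the outcomes reversed.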
Because we have the paired nasal RT–qPCR and viral culture data for each individual, we fit the three mathematical models using a nonlinear mixed-effect modelling approach. Again, all estimations were performed using Monolix. We allowed random effects on the fitted parameters (unless specified otherwise). All population parameters with a random effect are assumed to follow lognormal distributions.
To find the best model explaining the data, we tested models with different combinations of parameters either with or without a random effect (Supplementary Table 7). The model with the lowest AIC score was selected as the best model.
Note that, for each of the three models, we tested a model variation where all parameters in the models have fixed effects only, that is, a single set of parameters is used to explain viral culture data from every individual. In this case, there is no heterogeneity in parameter values across individuals. The resulting AIC scores are significantly worse than the best-fit model assuming random effects on parameters (Supplementary Table 7). This indicates that there is a substantial level of individual heterogeneity in the relationship between infectious virus shedding and viral genome loads (as shown in Fig. 3d).
Calculation of CIs of the cell culture positivity curve (Fig. 3c)
Similar to the procedures performed for prediction of CIs of viral genome load trajectories, we randomly sampled 5,000 sets of parameter combinations from the distribution specified by the best-fit population parameters of the best model, that is, the saturation model assuming that K_{m} has only a fixed effect (Supplementary Table 8). More specifically, we sampled parameters from a lognormal distribution for J and h, with their means and standard deviations at the best-fit values. Using the parameter combinations, we generated curves of probability of cell culture positivity at CN values ranging between 10 and 40. The median and the 5th and 95th percentiles of the probability of cell culture positivity at each CN value are reported.
Reporting Summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
All raw data used are included as a Supplementary table. Raw sequence data files can be found under BioProject ID PRJNA809434.
Code availability
Computer codes for the mathematical analyses in this paper are available at both https://github.com/BROOKELAB/Viraldynamicsmodeling and https://doi.org/10.5281/zenodo.6311388.
References
He, X. et al. Temporal dynamics in viral shedding and transmissibility of COVID19. Nat. Med. 26, 672–675 (2020).
Ferretti, L. et al. The timing of COVID19 transmission. Preprint at medRxiv https://www.medrxiv.org/content/10.1101/2020.09.04.20188516v2 (2020).
Szablewski, C. M. et al. SARSCoV2 transmission and infection among attendees of an overnight camp — Georgia, June 2020. MMWR Morb. Mortal. Wkly Rep. 69, 1023–1025 (2020).
Long, Q.X. et al. Clinical and immunological assessment of asymptomatic SARSCoV2 infections. Nat. Med. 26, 1200–1204 (2020).
Li, R. et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARSCoV2). Science 368, 489–493 (2020).
Sun, K. et al. Transmission heterogeneities, kinetics, and controllability of SARSCoV2. Science 371, eabe2424 (2021).
Adam, D. C. et al. Clustering and superspreading potential of SARSCoV2 infections in Hong Kong. Nat. Med. 26, 1714–1719 (2020).
Bi, Q. et al. Epidemiology and transmission of COVID19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. Lancet Infect. Dis. 20, 911–919 (2020).
LloydSmith, J. O., Schreiber, S. J., Kopp, P. E. & Getz, W. M. Superspreading and the effect of individual variation on disease emergence. Nature 438, 355–359 (2005).
Néant, N. et al. Modeling SARSCoV2 viral kinetics and association with mortality in hospitalized patients from the French COVID cohort. Proc. Natl Acad. Sci. USA 118, e2017962118 (2021).
Silva, J. et al. Saliva viral load is a dynamic unifying correlate of COVID19 severity and mortality. Preprint at medRxiv https://www.medrxiv.org/content/10.1101/2021.01.04.21249236v2 (2021).
The Massachusetts Consortium for Pathogen Readiness et al. SARSCoV2 viral load is associated with increased disease severity and mortality. Nat. Commun. 11, 5493 (2020).
Zheng, S. et al. Viral load dynamics and disease severity in patients infected with SARSCoV2 in Zhejiang province, China, JanuaryMarch 2020: retrospective cohort study. BMJ https://doi.org/10.1136/bmj.m1443 (2020).
Marks, M. et al. Transmission of COVID19 in 282 clusters in Catalonia, Spain: a cohort study. Lancet Infect. Dis. 21, P629–P636 (2021).
Larremore, D. B. et al. Test sensitivity is secondary to frequency and turnaround time for COVID19 screening. Sci. Adv. 7, eabd5393 (2021).
Kim, J. Y. et al. Viral load kinetics of SARSCoV2 infection in first two patients in Korea. J. Korean Med. Sci. 35, e86 (2020).
Wölfel, R. et al. Virological assessment of hospitalized patients with COVID2019. Nature 581, 465–469 (2020).
Lescure, F.X. et al. Clinical and virological data of the first cases of COVID19 in Europe: a case series. Lancet Infect. Dis. 20, 697–706 (2020).
Young, B. E. et al. Epidemiologic features and clinical course of patients infected with SARSCoV2 in Singapore. JAMA 323, 1488 (2020).
Jones, T. C. et al. Estimating infectiousness throughout SARSCoV2 infection course. Science 373, 6551 (2021).
Kissler, S. M. et al. Viral dynamics of acute SARSCoV2 infection and applications to diagnostic and public health strategies. PLoS Biol. 19, e3001333 (2021).
Ranoa, D. R. E. et al. Mitigation of SARSCoV2 transmission at a large public university. Preprint at medRxiv https://www.medrxiv.org/content/10.1101/2021.08.03.21261548v1 (2021).
Shain, E. B. & Clemens, J. M. A new method for robust quantitative and qualitative analysis of real-time PCR. Nucleic Acids Res. 36, e91 (2008).
Ranoa, D. R. E. et al. Saliva-based molecular testing for SARS-CoV-2 that bypasses RNA extraction. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2020.06.18.159434v1 (2020).
Monel, B. et al. Release of infectious virus and cytokines in nasopharyngeal swabs from individuals infected with non-alpha or alpha SARS-CoV-2 variants: an observational retrospective study. eBioMedicine 73, 103637 (2021).
Pekosz, A. et al. Antigen-based testing but not real-time polymerase chain reaction correlates with severe acute respiratory syndrome coronavirus 2 viral culture. Clin. Infect. Dis. 73, e2861–e2866 (2021).
Ke, R., Zitzmann, C., Ho, D. D., Ribeiro, R. M. & Perelson, A. S. In vivo kinetics of SARS-CoV-2 infection and its relationship with a person’s infectiousness. Proc. Natl Acad. Sci. USA 118, e2111477118 (2021).
Pawelek, K. A. et al. Modeling within-host dynamics of influenza virus infection including immune responses. PLoS Comput. Biol. 8, e1002588 (2012).
Goyal, A., Cardozo-Ojeda, E. F. & Schiffer, J. T. Potency and timing of antiviral therapy as determinants of duration of SARS-CoV-2 shedding and intensity of inflammatory response. Sci. Adv. 6, eabc7112 (2020).
Blanco-Melo, D. et al. Imbalanced host response to SARS-CoV-2 drives development of COVID-19. Cell 181, 1036–1045 (2020).
Shaw, A. C., Goldstein, D. R. & Montgomery, R. R. Age-dependent dysregulation of innate immunity. Nat. Rev. Immunol. 13, 875–887 (2013).
Molony, R. D. et al. Aging impairs both primary and secondary RIG-I signaling for interferon induction in human monocytes. Sci. Signal. 10, eaan2392 (2017).
Angioni, R. et al. Age-severity matched cytokine profiling reveals specific signatures in Covid-19 patients. Cell Death Dis. 11, 957 (2020).
Goyal, A., Reeves, D. B., Cardozo-Ojeda, E. F., Schiffer, J. T. & Mayer, B. T. Viral load and contact heterogeneity predict SARS-CoV-2 transmission and superspreading events. eLife 10, e63537 (2021).
Washington, N. L. et al. Emergence and rapid transmission of SARS-CoV-2 B.1.1.7 in the United States. Cell 184, P2587–2594 (2021).
Davies, N. G. et al. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science 372, eabg3055 (2021).
Volz, E. et al. Assessing transmissibility of SARS-CoV-2 lineage B.1.1.7 in England. Nature 593, 266–269 (2021).
Calistri, P. et al. Infection sustained by lineage B.1.1.7 of SARS-CoV-2 is characterised by longer persistence and higher viral RNA loads in nasopharyngeal swabs. Int. J. Infect. Dis. 105, 753–755 (2021).
Kissler, S. M. et al. Viral dynamics of SARS-CoV-2 variants in vaccinated and unvaccinated persons. N. Engl. J. Med. 385, 2489–2491 (2021).
Althaus, C. L. et al. A tale of two variants: spread of SARS-CoV-2 variants Alpha in Geneva, Switzerland, and Beta in South Africa. Preprint at medRxiv https://www.medrxiv.org/content/10.1101/2021.06.10.21258468v1 (2021).
Lakdawala, S. S. & Menachery, V. D. Catch me if you can: superspreading of COVID-19. Trends Microbiol. 29, P919–P929 (2021).
Mossong, J. et al. Social contacts and mixing patterns relevant to the spread of infectious diseases. PLoS Med. 5, e74 (2008).
Zhang, J. et al. Changes in contact patterns shape the dynamics of the COVID-19 outbreak in China. Science 368, 1481–1486 (2020).
Mistry, D. et al. Inferring high-resolution human mixing patterns for disease modeling. Nat. Commun. 12, 323 (2021).
Wallinga, J., Teunis, P. & Kretzschmar, M. Using data on social contacts to estimate age-specific transmission parameters for respiratory-spread infectious agents. Am. J. Epidemiol. 164, 936–944 (2006).
van Kampen, J. J. A. et al. Duration and key determinants of infectious virus shedding in hospitalized patients with coronavirus disease-2019 (COVID-19). Nat. Commun. 12, 267 (2021).
Savela, E. S. et al. SARS-CoV-2 is detectable using sensitive RNA saliva testing days before viral load reaches detection range of low-sensitivity nasal swab tests. Preprint at medRxiv https://doi.org/10.1101/2021.04.02.21254771 (2021).
Smith, R. L. et al. Longitudinal assessment of diagnostic test performance over the course of acute SARS-CoV-2 infection. J. Infect. Dis. 224, 976–982 (2021).
Huang, N. et al. SARS-CoV-2 infection of the oral cavity and saliva. Nat. Med. 27, 892–903 (2021).
Lakdawala, S. S. et al. The soft palate is an important site of adaptation for transmissible influenza viruses. Nature 526, 122–125 (2015).
Amato, K. A. et al. Influenza A virus undergoes compartmentalized replication in vivo dominated by stochastic bottlenecks. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2021.09.28.462198v2 (2021).
Cai, Y. et al. Structural basis for enhanced infectivity and immune evasion of SARS-CoV-2 variants. Science 373, 6555 (2021).
Thorne, L. G. et al. Evolution of enhanced innate immune evasion by SARS-CoV-2. Nature 602, 487–495 (2022).
Brown, J. C. et al. Increased transmission of SARS-CoV-2 lineage B.1.1.7 (VOC 2020212/01) is not accounted for by a replicative advantage in primary airway cells or antibody escape. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/2021.02.24.432576v2 (2021).
Ulrich, L. et al. Enhanced fitness of SARS-CoV-2 variant of concern Alpha but not Beta. Nature 602, 307–313 (2022).
Gniazdowski, V. et al. Repeated COVID-19 molecular testing: correlation of SARS-CoV-2 culture with molecular assays and cycle thresholds. Clin. Infect. Dis. 73, e860–e869 (2021).
Matsuyama, S. et al. Enhanced isolation of SARS-CoV-2 by TMPRSS2-expressing cells. Proc. Natl Acad. Sci. USA 117, 7001–7003 (2020).
Waggoner, J. J. et al. Triplex real-time RT-PCR for severe acute respiratory syndrome coronavirus 2. Emerg. Infect. Dis. 26, 1633–1635 (2020).
Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5, 1403–1407 (2020).
Han, M. S., Byun, J.-H., Cho, Y. & Rim, J. H. RT-PCR for SARS-CoV-2: quantitative versus qualitative. Lancet Infect. Dis. 21, P165 (2021).
Baccam, P., Beauchemin, C., Macken, C. A., Hayden, F. G. & Perelson, A. S. Kinetics of influenza A virus infection in humans. J. Virol. 80, 7590–7599 (2006).
Gonçalves, A. et al. Timing of antiviral treatment initiation is critical to reduce SARS-CoV-2 viral load. CPT Pharmacomet. Syst. Pharmacol. 9, 509–514 (2020).
Ménache, M. G. et al. Upper respiratory tract surface areas and volumes of laboratory animals and humans: considerations for dosimetry models. J. Toxicol. Environ. Health 50, 475–506 (1997).
Collins, L. M. C. & Dawes, C. The surface area of the adult human mouth and thickness of the salivary film covering the teeth and oral mucosa. J. Dent. Res. 66, 1300–1302 (1987).
Hou, Y. J. et al. SARS-CoV-2 reverse genetics reveals a variable infection gradient in the respiratory tract. Cell 182, 429–446 (2020).
Miao, H., Xia, X., Perelson, A. S. & Wu, H. On identifiability of nonlinear ODE models and applications in viral dynamics. SIAM Rev. Soc. Ind. Appl. Math. 53, 3–39 (2011).
Smith, A. P., Moquin, D. J., Bernhauerova, V. & Smith, A. M. Influenza virus infection model with density dependence supports biphasic viral decay. Front. Microbiol. 9, 1554 (2018).
Martin, M. A. & Koelle, K. Comment on ‘Genomic epidemiology of superspreading events in Austria reveals mutational dynamics and transmission properties of SARS-CoV-2’. Sci. Transl. Med. 13, eabh1803 (2021).
Braun, K. M. et al. Acute SARS-CoV-2 infections harbor limited within-host diversity and transmit via tight transmission bottlenecks. PLoS Pathog. 17, e1009849 (2021).
Valesano, A. L. et al. Temporal dynamics of SARS-CoV-2 mutation accumulation within and across infected hosts. PLoS Pathog. 17, e1009499 (2021).
Michael Lavigne, G., Russell, H., Sherry, B. & Ke, R. Autocrine and paracrine interferon signalling as ‘ring vaccination’ and ‘contact tracing’ strategies to suppress virus infection in a host. Proc. R. Soc. Lond. B Biol. Sci. 288, 20203002 (2021).
Perelson, A. S. & Nelson, P. W. Mathematical analysis of HIV-1 dynamics in vivo. SIAM Rev. Soc. Ind. Appl. Math. 41, 3–44 (1999).
Perelson, A. S. & Ke, R. Mechanistic modeling of SARS‐CoV‐2 and other infectious diseases and the effects of therapeutics. Clin. Pharmacol. Ther. 109, 829–840 (2021).
Acknowledgements
We thank S. Ahmed, C. Bell, N. Bouton, C. Brennen, J. Brown, C. Buie, E. Cler, G. Cole, T. Coleman, A. Dunnett, L. Engels, S. Feher, K. Fox, L. Freeman, Y. Gonzalez, M. Harris, D. Henness, D. Hiser, A. Hussain, D. Jackson, J. Jarrett, M. Jenkins, K. Kalonji, S. Kanku, S. Krauklis, M. Krouse, E. Leshoure, J. Lewis, M. Li, A. Lopez, G. Lopez, E. Luna, C. H. Luo, C. Mackey, Sk. McLain, Y. B. Melesse, M. O’Donnell, S. Pflugmacher, D. Piatt, S. Pierce, G. Quitanilla, A. Samad, M. Scroggins, M. Settles, M. Sinn, P. Varney, E. Vlach, R. Williams-Chatman and T. Young for their efforts supporting recruitment, enrolment, logistics and/or sample collection and processing. We thank J. Olgin, N. Peyser and X. Butcher for assistance with the Eureka platform, M. Lore for assistance with REDCap, M. Loots for assistance with administration, G. Snyder for assistance in development of study protocols and logistics and E. Iturriaga and J. Chen for study protocol development. We thank A. Neumann for helpful input and suggestions for improving the analyses. Finally, we thank A. Hernandez and C. Wright of the DNA Services Lab within the Roy J. Carver Biotechnology Center for assistance in establishment of a SARS-CoV-2 genomic sequencing protocol. Vero-TMPRSS2 cells were provided by the National Institute of Infectious Diseases, Japan. Sofia 2 devices and associated supplies were provided to Carle Foundation Hospital by Quidel, although Quidel played no role in the design of the study or the interpretation or presentation of the data. This work was supported by the National Heart, Lung, and Blood Institute at the National Institutes of Health (no. 3U54HL143541-02S2) through the RADx-Tech programme, to D.D.M., L.L.G. and C.B.B.
The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Institute of Biomedical Imaging and Bioengineering, the National Heart, Lung, and Blood Institute, the National Institutes of Health or the US Department of Health and Human Services. R.K. and C.B.B. were further supported by the Defense Advanced Research Projects Agency INTERCEPT programme through contract nos. R00676190 (to R.K.) and W911NF-17-2-0034 (to C.B.B.).
Author information
Contributions
Conceptualization was provided by R.K., R.L.S., W.J.H., Y.C.M., A.P., L.L.G. and C.B.B. Data curation was performed by R.L.S., B.B. and P.L. Formal analysis was carried out by R.K., P.P.M. and R.L.S. Funding acquisition was the responsibility of D.D.M., L.L.G. and C.B.B. Investigation was carried out by R.K., P.P.M., R.L.S., A.M., M.C., N.G., C.H.L., J.J., A.C., T.L., M.F., K.K.O.W., C.J.F., L.W., R.F., M.E.B., K.K.C., H.C., K.R.S., A.N.O., J.B. and M.L.R. Methodology was performed by R.K., R.L.S. and C.B.B. Project administration was overseen by D.C.E., K.R.S., S.B., S.L.G., C.R., J.Y. and J.Q. Software was provided by R.K., P.P.M. and R.L.S. H.H.M., Y.C.M., A.P., L.L.G. and C.B.B. supervised the study. Visualization was undertaken by R.K., P.P.M., R.L.S. and C.B.B. Writing of the original draft was done by R.K. and C.B.B., with writing, review and editing by R.K., P.P.M., R.L.S., Y.C.M., A.P., L.L.G. and C.B.B.
Ethics declarations
Competing interests
C.B.B. and L.W. are listed as inventors on a patent application for the saliva RT–qPCR test used in this study. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks David Eyre and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Remainder of individual plots.
Plots of longitudinal assay results from study participants not shown in Fig. 1a. A single asterisk next to the participant ID indicates B.1.1.7 variant infection, while double asterisks indicate P.1 variant infection.
Extended Data Fig. 2 Individual-level symptom data.
Self-reported symptom data from study participants, overlaid with viral culture status. Participants were asked to complete a survey through the Eureka digital study platform inquiring about the presence or absence of the indicated set of symptoms each day after sample collection.
Extended Data Fig. 3 Comparison of symptoms and viral culture status.
Plots show the proportions of either viral culture negative or viral culture positive days for which participants reported the indicated symptoms. The p-values for the Wilcoxon rank-sum test are reported. Data are only shown for individuals who reported the indicated symptom at least once.
Extended Data Fig. 4 Model structures.
Diagrams showing the structures of the additional three models (not shown in Fig. 2a) considered for describing viral load data. See Supporting Text for descriptions of the models.
Extended Data Fig. 5 Model parameter estimates as a function of age.
Plots showing the relationship between age and the indicated model parameter estimates for (A) the refractory cell model (nasal data) and (B) the immune effector cell model (saliva data). Linear regressions were performed on the data. R^{2} values and p-values are shown.
Extended Data Fig. 6 Analysis workflow.
Diagram indicating how empirical RT–qPCR and viral culture data were used to generate estimates of individual-level viral dynamics and infectiousness.
Extended Data Fig. 7 The saturation model accurately predicts the cell culture positivity data.
Lines denote the predicted probability of cell culture being positive. Dots denote cell culture positivity data, where a dot is at 1 or 0 when the cell culture is positive or negative, respectively.
Extended Data Fig. 8 Individual infectiousness plots.
Estimated infectiousness over time plotted for individual study participants. Dashed lines indicate inferred peak in infectiousness.
Extended Data Fig. 9 The relationship between genome viral load (y-axis; on a log_{10} scale) and CN value of the nasal samples.
The black line, that is the center of the error band, represents the linear regression calibration curve. The shading around the black line shows the standard error for the regression.
Extended Data Fig. 10 The relationship between genome viral load (y-axis; on a log_{10} scale) and Ct value of the saliva samples.
The black line, that is the center of the error band, represents the linear regression calibration curve. The shading around the black line shows the standard error for the regression.
Supplementary information
Supplementary Information
Supplementary Tables 1–11.
Supplementary Data 1
This table contains all raw data collected from study participants used in this manuscript.
About this article
Cite this article
Ke, R., Martinez, P.P., Smith, R.L. et al. Daily longitudinal sampling of SARS-CoV-2 infection reveals substantial heterogeneity in infectiousness. Nat Microbiol 7, 640–652 (2022). https://doi.org/10.1038/s41564-022-01105-z