Scaling SARS-CoV-2 wastewater concentrations to population estimates of infection

Monitoring the progression of SARS-CoV-2 outbreaks requires accurate estimation of the unobservable fraction of the population infected over time, in addition to the observed numbers of COVID-19 cases, because observed cases present a distorted view of the pandemic as test frequency and coverage change over time. The objective of this report is to describe and illustrate an approach that produces representative estimates of the unobservable cumulative incidence of infection by scaling daily concentrations of SARS-CoV-2 RNA in wastewater, exploiting the population's consistent contribution of fecal material to the sewage collection system.


Results
Figure 1a reports the daily numbers of total and positive COVID-19 tests conducted on residents of the four towns served by the WWTP. These data show the large increase in testing as the pandemic progressed, while the positive test results illustrate the two major COVID-19 waves experienced in the area, with the second wave yielding higher daily case rates over a longer duration than the first. Figure 1b plots the SARS-CoV-2 RNA concentration measured in sewage sludge over the same period, which also shows two waves of infection. Unlike the positive test counts, however, the wastewater data show a first wave that, though shorter in duration, peaks at concentrations similar to those of the second wave. This indicates that limited testing early in the pandemic reduced the accuracy of reported COVID-19 case information.
Applying Eq. (1) (see "Methods") to the data in Fig. 1b yields the cumulative fraction of the population infected over time (Fig. 2). Figure 2 also shows point estimates of the cumulative fraction infected in New Haven County (which subsumes the treatment plant population) produced by three independently developed, computationally intensive statistical models that use entirely different methods and data sources: COVID-19 cases and deaths (Model 1) [3]; cases, deaths, hospitalizations, and close-contact measures derived from cell phone geolocation data (Model 2) [4]; and deaths alone (Model 3) [5]. These model trajectories have strikingly similar shapes to, and bracket, the wastewater-based estimates: Model 1 [3] hugs the upper 95% confidence limit of the wastewater model, Model 2 [4] hugs its lower 95% confidence limit, and Model 3 [5] falls just beneath the point-estimate trajectory of Eq. (1).

Discussion
The use of wastewater-based epidemiology surged during the COVID-19 pandemic, with applications to outbreak detection and tracking of temporal trends [8][9][10]. This report presents a major advance for wastewater surveillance by using a simple scaling model to directly estimate the unobservable fraction of persons in a population infected over time from SARS-CoV-2 RNA concentrations in wastewater. This approach circumvents the problems of non-representative sampling inherent in observed COVID-19 cases, hospitalizations, and deaths, and in principle can be applied in any location where continuous wastewater sampling over time is possible.

There are some limitations to our study that invite further investigation. The data collection period ended before the emergence of the Delta and Omicron variants of concern. The mean shedding time may differ for such variants, which would imply a different time shift from the historical value assumed in our study, although at least one recent report has estimated that the mean generation times of the Alpha and Delta variants are similar to those of historical strains [11]. Infection with different variants could also change the scaling from SARS-CoV-2 RNA concentrations to infections. Whether mean shedding times and RNA concentrations differ for variants of concern relative to historical strains remains a topic for future research.
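The lag $\ell$ enters the wastewater scaling $C_t = (C_{t^*}/S_{t^*+\ell}) \times S_{t+\ell}$ described in Methods twice: it shifts the cumulative RNA series, and it sets the calibration denominator $S_{t^*+\ell}$. The Python sketch below illustrates, on a hypothetical synthetic series (not the study data), how the estimate at a later day would move if the lag for a new variant departed from the historical value of 9 days. The alternative lags of 7 and 11 days, the calibration index, and the function name `cumulative_incidence` are illustrative assumptions, not values estimated for any variant.

```python
import numpy as np


def cumulative_incidence(z, t_star, c_star, lag):
    """C_t = (C_{t*} / S_{t*+lag}) * S_{t+lag}, with S_t the cumulative sum of Z."""
    s = np.cumsum(np.asarray(z, dtype=float))  # S_t = Z_0 + ... + Z_t
    return c_star / s[t_star + lag] * s[lag:]  # element t holds C_t


# Hypothetical single-wave concentration series (NOT the study data)
days = np.arange(100)
z = np.exp(-0.5 * ((days - 50) / 10) ** 2)

# Hypothetical alternative lags around the historical value of 9 days;
# the calibration value 0.093 is held fixed at an illustrative index t* = 40
results = {}
for lag in (7, 9, 11):
    c = cumulative_incidence(z, t_star=40, c_star=0.093, lag=lag)
    results[lag] = c[60]
    print(f"lag={lag:2d} days: C_60 = {c[60]:.3f}")
```

Because every curve is pinned to $C_{t^*}$ at the calibration date, a different lag mainly changes how much of the subsequent RNA signal is attributed to infections after $t^*$.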

Methods
Daily counts of total tests and of confirmed and probable COVID-19 cases were provided by the Connecticut Department of Public Health (CT DPH).
Nucleic acid was extracted from primary sewage sludge of the New Haven, CT, USA wastewater treatment plant (which serves 200,000 residents) using commercial kits (Qiagen RNeasy PowerSoil Total RNA Kit and Zymo Quick-RNA Fecal/Soil Microbe Microprep). Nucleic acid concentrations were measured by spectrophotometry (NanoDrop, Thermo Fisher Scientific) and adjusted to 200 ng µL⁻¹, and SARS-CoV-2 RNA concentrations were quantified with a one-step qRT-PCR kit (Bio-Rad iTaq™ Universal Probes One-Step Kit) using the SARS-CoV-2 N1 and N2 primer sets, in accordance with previously described protocols [6,12]. SARS-CoV-2 RNA concentrations were quantified daily throughout the study period. Further details on the construction of the SARS-CoV-2 RNA concentration dataset appear in the Supplementary Information.
There are two fundamental assumptions in our scaling model: (1) because the population consistently discharges fecal material into the local sewage collection system, RNA concentrations provide a proportional measure of the extent of infection in the community; and (2) the concentration of SARS-CoV-2 RNA in sewage sludge lags the population incidence of SARS-CoV-2 infection in accord with the generation time distribution (also referred to as the shedding load distribution) from infection to transmission, the mean of which is approximately 9 days [7,13].

Let $\pi_t$ denote the fraction of the population newly infected on day $t$ (the incidence of infection) and let $\ell$ denote the mean generation lag. The SARS-CoV-2 RNA concentration measured on day $t$, $Z_t$, should then approximately reflect the incidence of infection $\ell$ days earlier, that is, $Z_t \approx k\,\pi_{t-\ell}$, where $k$ is a constant of proportionality. Consequently, the cumulative fraction of the population infected by the end of day $t$, $C_t = \sum_{j=0}^{t} \pi_j$, should approximately follow $C_t = k' \sum_{j=0}^{t} Z_{j+\ell}$, where $k' = 1/k$ is the constant of proportionality scaling SARS-CoV-2 RNA concentration to infections per person.

Let $S_t = \sum_{j=0}^{t} Z_j$ denote the cumulative sludge RNA observed through day $t$. Since $\sum_{j=0}^{t} Z_{j+\ell} = S_{t+\ell} - S_{\ell-1}$, we have $C_t \approx k' S_{t+\ell}$ when the concentrations over the first $\ell$ days of monitoring are negligible. Given the cumulative incidence $C_{t^*}$ as of some particular date $t^*$, the scaling constant can be evaluated from the relation $C_{t^*} = k' S_{t^*+\ell}$, yielding $k' = C_{t^*}/S_{t^*+\ell}$. Substituting back into the expression for cumulative incidence up to an arbitrary day $t$ gives

$$C_t = \frac{C_{t^*}}{S_{t^*+\ell}} \times S_{t+\ell}.$$

The cumulative incidence of infection in the population of 200,000 served by the WWTP was previously estimated as $C_{t^*} = 9.3\%$ (95% CI [6.43%, 12.17%]) as of $t^* =$ May 1, 2020, with a mean generation lag of 8.9 days [7], which we round up to $\ell = 9$. Substituting these values yields the scaling of cumulative RNA concentration to cumulative incidence shown in Fig. 2:

$$C_t = \frac{0.093}{S_{\text{May 10, 2020}}} \times S_{t+9}. \qquad (1)$$

Confidence intervals follow from the variance of $C_t$ estimated via the delta method [14].
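The scaling $C_t = (C_{t^*}/S_{t^*+\ell}) \times S_{t+\ell}$ reduces to a cumulative sum, a shift by $\ell$ days, and a single calibration ratio. The Python sketch below illustrates it on a hypothetical synthetic concentration series (not the study data); the function names `cumulative_incidence` and `delta_method_ci` and the calibration index `t_star=45` are illustrative. For the confidence interval it assumes, as a simplification, that only the calibration value $C_{t^*}$ is uncertain (the RNA sums are treated as fixed) and that its reported 95% CI reflects a normal approximation. Under that assumption $C_t$ is linear in $C_{t^*}$, so the delta method reduces to rescaling the standard error of $C_{t^*}$. The study's own variance calculation may include further terms.

```python
from statistics import NormalDist

import numpy as np


def cumulative_incidence(z, t_star, c_star, lag=9):
    """Scale daily RNA concentrations Z_t to cumulative incidence C_t.

    z      -- daily concentrations Z_0, Z_1, ..., one value per day
    t_star -- index of the calibration day t* within z
    c_star -- externally estimated cumulative fraction infected C_{t*}
    lag    -- mean generation lag ell in days
    Returns C_t for t = 0, ..., len(z) - 1 - lag.
    """
    s = np.cumsum(np.asarray(z, dtype=float))  # S_t = Z_0 + ... + Z_t
    k_prime = c_star / s[t_star + lag]         # k' = C_{t*} / S_{t*+ell}
    return k_prime * s[lag:]                   # C_t = k' * S_{t+ell}


def delta_method_ci(c_t, c_star, c_star_lo, c_star_hi, level=0.95):
    """Approximate CI for C_t, treating the RNA sums S as fixed (assumption).

    dC_t/dC_{t*} = S_{t+ell} / S_{t*+ell} = C_t / C_{t*}, so the delta method
    gives SE(C_t) = (C_t / C_{t*}) * SE(C_{t*}). SE(C_{t*}) is backed out of
    its reported CI under an assumed normal approximation.
    """
    c_t = np.asarray(c_t, dtype=float)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    se_star = (c_star_hi - c_star_lo) / (2 * z_crit)
    se_t = c_t / c_star * se_star
    return c_t - z_crit * se_t, c_t + z_crit * se_t


# Hypothetical synthetic series with two waves (NOT the study data)
days = np.arange(120)
z = (100 * np.exp(-0.5 * ((days - 40) / 8) ** 2)
     + 80 * np.exp(-0.5 * ((days - 90) / 12) ** 2))

c = cumulative_incidence(z, t_star=45, c_star=0.093, lag=9)
lo, hi = delta_method_ci(c, c_star=0.093, c_star_lo=0.0643, c_star_hi=0.1217)
print(round(c[45], 3))  # 0.093: the curve passes through the calibration point
```

By construction the estimated curve passes through $C_{t^*}$ on the calibration day, and the interval there reproduces the input CI; away from $t^*$ the interval widens or narrows in proportion to $C_t$.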

Data availability
All data employed in this report are available in the Excel file contained in the Source Data.