One year of modeling and forecasting COVID-19 transmission to support policymakers in Connecticut

To support public health policymakers in Connecticut, we developed a flexible county-structured compartmental SEIR-type model of SARS-CoV-2 transmission and COVID-19 disease progression. Our goals were to provide projections of infections, hospitalizations, and deaths, and estimates of important features of disease transmission and clinical progression. In this paper, we outline the model design, implementation and calibration, and describe how projections and estimates were used to meet the changing requirements of policymakers and officials in Connecticut from March 2020 to February 2021. The approach takes advantage of our unique access to Connecticut public health surveillance and hospital data and our direct connection to state officials and policymakers. We calibrated this model to data on deaths and hospitalizations and developed a novel measure of close interpersonal contact frequency to capture changes in transmission risk over time and used multiple local data sources to infer dynamics of time-varying model inputs. Estimated epidemiologic features of the COVID-19 epidemic in Connecticut include the effective reproduction number, cumulative incidence of infection, infection hospitalization and fatality ratios, and the case detection ratio. We conclude with a discussion of the limitations inherent in predicting uncertain epidemic trajectories and lessons learned from one year of providing COVID-19 projections in Connecticut.


Estimating hospitalizations coming from non-congregate settings
Our transmission model aims to capture disease spread in the community and excludes residents of congregate settings, whose contact patterns violate homogeneous mixing assumption. Available hospitalizations data do not disaggregate by the patient's place of residence at the time of diagnosis or hospitalization. We have therefore estimated the time-varying number of hospital admissions and hospitalizations census that came from congregate settings and excluded them from the observed time series. We received data on daily COVID-19 death counts in hospitals disaggregated by the type of residence (congregate vs. non-congregate) at the time of diagnosis or hospitalization from Connecticut Department of Public Health (CT DPH). Posterior distribution of model parameters was calibrated to the estimated time series of cumulative hospitalizations and hospitalizations census coming from non-congregate settings and to the observed time series of hospital deaths in this population.
The time-varying proportion of hospitalization census and cumulative hospitalizations coming from congregate settings was estimated as follows: 1. We first estimated the cumulative number of hospitalizations coming from congregate settings as the total number of hospital deaths among residents of congregate settings divided by the hospital case fatality ratio (HFR) among this population, and adjusting for an average hospital length of stay. The estimated HFR among residents of congregate settings in Connecticut is 0.38 and was obtained from the CT DPH based on a survey of a sample of 50 nursing homes, representing about a quarter of all nursing homes in Connecticut [1].
2. Next, we computed the cumulative proportion of hospitalizations coming from congregate settings as an estimate of the number of cumulative hospitalizations coming from congregate settings divided by the total cumulative number of hospitalizations as of the same date.
3. The proportion of hospital admissions coming from congregate settings varied over time. To approximate temporal dynamics of this proportion, we assumed that it follows the same pattern as the time-varying proportion of deaths among hospitalized residents of congregate settings among all hospital deaths: where d(t) denotes smoothed daily death counts calculated as a first order difference of spline-smoothed cumulative counts of respective time series. We then re-scaled this time-varying proportion relative to the cumulative proportion of deaths among hospitalized residents of congregate settings among all hospital deaths and lagged it back by the average hospital length of stay. The resulting time-varying function was then multiplied by the cumulative proportion of hospitalizations coming from congregate settings estimated in step 2 to obtain a time-varying estimate of the proportion of hospital admissions coming from congregate settings at time t.
This time-varying proportion was then applied to the CHA-reported daily hospital census and daily hospital admissions to estimate the number of current hospitalizations, daily hospital admissions, and cumulative hospitalizations coming from congregate settings, and respective time series coming from non-congregate settings follow directly.

Data smoothing for time-varying parameters
All time-varying parameters described in detail in the Methods section rely on smoothed functions of observed data. Figure S1 shows five functions of observed data series overlaid by a smooth function (red line), which is used as a component of time-varying model parameters.

Model calibration and Bayesian posterior inference
We calibrate the posterior distribution of model parameters to the estimated statewide hospitalizations and hospital deaths coming from non-congregate settings using a Bayesian approach with a Gaussian likelihood.
Model-based estimates of observed quantities are adjusted for reporting lags. The distributions of statewide hospitalizations census, cumulative hospitalizations, and hospital deaths are given by: where H(t, L H , prior on all three hyperparameters σ 2 h , σ 2 u , and σ 2 d with a = 0.5 and b = 5×10 6 at the end of the modeling period.
The prior was gradually relaxed as observed time series increased. For the purposes of future forecasting, as

Normalized close contact metric in Connecticut
Normalized metric Feb 2020 Apr 2020 Jun 2020 Aug 2020 Oct 2020 Dec 2020 opposed to estimation of epidemiologic features from the past model dynamics, we imposed a tighter prior on these hyperparameters to achieve reasonable posterior predictive intervals beyond the observation period.
We construct the posterior distribution over unknown parameters (θ, σ) as: We assume the date of epidemic onset to be February 16th, 2020 -21 days before the first case was officially confirmed in Connecticut on March 8th, 2020, and initialize the model with E 0 exposed individuals at the time of epidemic onset, setting the size of all downstream compartments to be zero. County-level distribution of E 0 is fixed and was estimated based on the county population size and dates of first registered case and death in each county.
Each likelihood term is weighted by the time-dependent weight z(t) times the weight assigned to a respective time series. We let the weight function z(t) take the following form, where z(t) is the weight assigned to an observation at time t, t ∈ {t 0 , ..., t max }, and the correspondence between {t 0 , ..., t max } and calendar time is set such that t = 0 corresponds to 90 days prior to the most recent observation. Parameter k z controls the smoothness of logistic function. We set k z = 0.01, resulting in a range of weights between 0.06 − 0.7 for the duration of observation period.
We set w h = 0.89, w u = 0.01, and w d = 0.1. We place a large weight on the hospitalizations census, since this time series is most sensitive to changes in epidemic dynamics, and a small weight on cumulative hospitalizations, since it measures a feature that is related to hospitalizations census. The range of observation times differ for different time series. For hospitalizations census and deaths, observation times start with the first non-zero observation. For cumulative hospitalizations, observation times start on May 29, 2020 when these data started being reported routinely.

Posterior sampling
Sampling from the joint posterior distribution of (θ, σ) given in (4) is performed using Markov Chain Monte Carlo (MCMC). We employ a hybrid algorithm that combines elliptical slice sampling (ESS) [2], Gibbs sampling, and Metropolis-Hastings sampling with random walk proposals. We first provide an overview of the steps in drawing samples from the full posterior distribution and then describe each steps in more details.
Update θ ESS |θ MH , σ: We use a rejection-free sampler (ESS) to sample a vector of random effects and a testing effect τ . The ESS operates by drawing samples from the ellipse defined by a Gaussian prior, and then accept or reject the samples by evaluating the likelihood component. Within the slice sampling step, the sampler moves along the generated ellipse and always accepts a new set of parameters.
Update θ MH |θ ESS , σ: We implement a Metropolis-Hastings algorithm with random walk proposals for θ MH . Proposals for L H and L D are made on a subset of integers bounded by a prior distribution on lags. All other elements of θ MH are continuous.
Update hyperparameters σ|θ: The hyperparameters σ 2 h , σ 2 u , σ 2 d , σ 2 are updated with Gibbs sampler steps: We ran 10 chains of the sampler for 22,000 iterations each, discarded the first 2,000 draws from each chain, and thinned each chain by a factor of 20. The final posterior sample combines the resulting thinned chains. Based on the visual inspection of individual parameter trace plots, we found that 10,000 iterations is sufficient for the chain to converge in practice. Figure

Prior and posterior distributions of model parameters
Let θ MH,CONT denote the subset of continuous transmission model parameters whose joint distribution is calibrated to observed data and updated with Metropolis-Hastings step: θ MH, CONT = (β 0 , q A , α I S , m H,0 , E 0 ). For each individual parameter θ ∈ θ MH,CONT , we specify a fixed support [θ min , θ max ], and put independent beta priors on the transformed parameter, i.e., where the shape parameters a θ and b θ are set to let θ have mean µ θ and standard deviation σ θ . Table S1 provides the summary of (µ θ , σ θ , θ min , θ max ) for all parameters in θ MH,CONT along with data sources. For parameters whose values were fixed, only the mean is given. Parameter τ is updated in the ESS step, therefore in calibration, we assume a Gaussian prior distribution on log(τ ), whose mean and SD are shown in Table S1.  Asymptomatic infections play an important role in transmission of SARS-CoV-2 [35][36][37], but estimates of the proportion of infections that do not exhibit symptoms vary substantially [3,4,38]. The true proportion of asymptomatic infections is important for projections and policy planning due to its relationship to evolving herd immunity. Reported estimates of asymptomatic proportion range between 6 -96%, and the authors of a recent review recommend a range between 40 -45% [3]. Another review reported an overall asymptomatic proportion estimate of 20%, and 31% among the studies that included follow-up [4]. Large population-based studies conducted in Spain and in the U.K. provide estimates of asymptomatic proportion between 22-36% [39,40]. We calculated age-adjusted weighted average of several estimates available from the literature: Nishiura et al.
[6] estimated 30.8% among Japanese citizens evacuated from Wuhan, China. We applied this estimate to age group 20-64 years old. Mizumoto et al. [7] estimated 17.9% among infections on the Diamond Princess cruise ship. We applied this estimate to age group 65 plus years old, which is the age group, in which most infections occurred. We assumed 65% for age group 0-19 years old, consistent with findings reported by Russell et al. [8], where 4 out of 6 infections in this age group were asymptomatic among passengers of the Diamond Princess. The average was weighted by the age distribution of Connecticut population, resulting in the estimate of q A = 0.37, which is consistent with systematic reviews and population-based studies. At the time of this writing, the best estimate of asymptomatic proportion recommended by the CDC was 40% [5]. We center the prior distribution of q A at 0.4 and allow it to vary between 0.3 and 0.5. The proportion of severe cases early in the epidemic (q I S ,0 ) is assumed to be 7% of symptomatic infections in line with [9][10][11]. Combined with the estimates of asymptomatic proportion, this translates into an estimate of severe proportion of 4.2% (3.5-4.9%) of all infections.
Duration of latency (1/δ) was fixed, since consistent estimates of this parameter are available from the literature, and because it is highly correlated with several calibrated parameters, including reporting lags and parameters that determine duration of infectiousness and time between infection and recovery or death. We assume an average of 4 days of latency [10,12,13] and 2 days of presymptomatic infectiousness [13][14][15] resulting in an average incubation period of 6 days consistent with [5,[16][17][18][19][20][21][22].
Parameters (α A , k A , α I M , α I S ) collectively determine the force of infection at any given time. Force of infection, transmission parameter β and initial number of exposed individuals E 0 together determine the early growth of the epidemic. Without additional data, all of these parameters cannot be simultaneously identified. We therefore fixed the values for a subset of these parameters based on available estimates and assumed a diffuse prior on β, which absorbs additional variation of parameters that determine the force of infection.
Duration of infectiousness of asymptomatic individuals is unknown, but is likely shorter than that of symptomatic individuals [23][24][25][26]. While several studies estimated that viral RNA could be detected in upper respiratory tract for 2-3 weeks [41,42], findings from [23,27,28] show that in mild-to-moderate symptomatic patients live virus could be isolated for a substantially shorter time period: up to 7-10 days from the day of symptom onset, suggesting the duration of infectiousness of up to 12 days. Based on these estimates and [14], we assume the duration of infectiousness of 7 days among asymptomatic individuals. Although multiple studies have shown similar viral loads among symptomatic, presymptomatic and asymptomatic cases [24,43,44], evidence suggests that asymptomatic individuals are less infectious that symptomatic, likely due to higher viral shedding while coughing and longer duration of infectiousness among symptomatic individuals [4,23,25,45,46]. Multiple estimates of relative infectiousness of asymptomatic individuals compared to symptomatic are available and range from close to zero to above one [4,13,25,29,30]. Generally, estimates that are based on attack rates are lower than those based on viral shedding or time until the first negative PCR test. In our analysis, we set relative infectiousness of asymptomatic cases to be 0.5 [4,13,29].
Although the duration of infectiousness of mild-to-moderately symptomatic cases may be up to 12 days [23,27,28], we assume that the majority of symptomatic individuals self-isolate shortly after developing symptoms. We set the duration of infectiousness of symptomatic cases to be 4 days [9, 10, 12], 2 of which are assumed to represent presymptomatic infectiousness [13][14][15]. We further assume that the duration of self-isolation until recovery is 7 days [23,27,28].
We calibrate the rate of hospitalization among severe cases (α I S ), assuming a mean of 9 days between the onset of infectiousness and hospitalization: 2 days of presymptomatic infectiousness plus 7 days between the onset of symptoms and hospitalization [32,33].

Definitions and estimation of epidemiologic parameters
We compute model-based estimates of the following epidemiologic parameters that describe SARS-CoV-2 transmission and COVID-19 disease progression:

Basic (R 0 ) and effective (R eff ) reproduction number
Basic reproduction number is defined as the expected number of people that a single infected individual will infect before recovery or death in a fully susceptible population. It is usually estimated at the beginning of an epidemic before any interventions intended to slow down transmission are implemented. Effective reproduction number has the same meaning, but adjusts for depletion of susceptible individuals and, potentially, effects of transmission-reducing interventions. It shows the expected number of people that a single infected individual will infect at time t as the epidemic progresses. We compute a model-based estimate of R eff as follows: where expressions for time-varying model parameters are provided in the Methods section of the main text, S(t) and D(t) are statewide numbers of susceptible and cumulative deceased individuals at time t, and N is the size of Connecticut population residing in non-congregate settings. We estimate R 0 using the formula for R eff at time zero (March 1, 2020).

Instantaneous and cumulative case-detection ratio (CDR)
CDR is defined as a proportion of all infections, including asymptomatic, that are detected and become known to the public health surveillance system. We estimate instantaneous CDR at time t as:

Infection hospitalization ratio (IHR)
IHR is defined as a proportion of all infections, including asymptomatic, which get admitted to a hospital. We compute an estimate of cumulative IHR as a ratio of reported cumulative hospitalizations to a model-based projection of a sum of hospitalized, recovered, and deceased individuals, adjusting for reporting lag L H .

Infection fatality ratio (IFR)
IFR is defined as a proportion of all infections, including asymptomatic, which result in death. We compute an estimate of cumulative IFR as a ratio of reported cumulative deaths to a model-based projection of a sum of recovered and deceased individuals, adjusting for reporting lag L D .

Hospital case fatality ratio (HFR)
HFR is defined as a proportion of patients admitted to a hospital who die. We compute instantaneous HFR at time t as a ratio of hospital deaths at time t to hospital admissions at time t − HLOS(t), where HLOS(t) is an average hospital length of stay at time t.