Differences in COVID-19 cyclicity and predictability among U.S. counties and states reflect the effectiveness of protective measures

During the COVID-19 pandemic, many quantitative approaches were employed to predict the course of disease spread. However, forecasting faces the challenge of inherently unpredictable spread dynamics, setting a limit to the accuracy of all models. Here, we analyze COVID-19 data from the USA to explain variation among jurisdictions in disease spread predictability (that is, the extent to which predictions are possible), using a combination of statistical and simulation models. We show that for half the counties and states the spread rate of COVID-19, r(t), was predictable at most 9 weeks and 8 weeks ahead, respectively, corresponding to at most 40% and 35% of an average cycle length of 23 weeks and 26 weeks. High predictability was associated with high cyclicity of r(t) and negatively associated with R0 values from the pandemic’s onset. Our statistical evidence suggests the following explanation: jurisdictions with a severe initial outbreak, and where individuals and authorities took strong and sustained protective measures against COVID-19, successfully curbed subsequent waves of disease spread, but at the same time unintentionally decreased its predictability. Decreased predictability of disease spread should be viewed as a by-product of positive and sustained steps that people take to protect themselves and others.

Bozzuto C, Ives AR (2023): Differences in COVID-19 cyclicity and predictability among U.S. counties and states reflect the effectiveness of protective measures. Scientific Reports.

Supplementary Figures
Fig. S1. Estimated r(t) time series at the county level. The estimated r(t) time series of the 100 analyzed counties are grouped according to the similarity of the dynamics. The similarity has been assessed using a dynamic time warping algorithm 1,2, and for illustrative purposes four groups have been used.
Fig. S2. Estimated r(t) time series at the state level. The estimated r(t) time series of the 49 analyzed states are grouped according to the similarity of the dynamics. The similarity has been assessed using a dynamic time warping algorithm 1,2, and for illustrative purposes four groups have been used.

Additional methods information

Estimation of COVID-19 spread rate r(t)
Here we give a summary description of our previously published approach 3 to estimate county-level and state-level spread rates r(t); please refer to the original publication for additional technical details. We employed a time-varying autoregressive model designed explicitly to reconstruct the non-observed time-varying spread rate using nonlinear, state-dependent error terms. The state-space model is

x(t) = x(t − 1) + r(t − 1)
r(t) = r(t − 1) + ω(t)
x*(t) = x(t) + φ(t)

For every jurisdiction, the variable x(t) is the unobserved, log-transformed value of weekly deaths at time t, and x*(t) is the observed count that depends on the observation uncertainty described by the random variable φ(t). The model assumes that the death count increases exponentially at rate r(t), and this latent variable r(t) changes through time as a random walk with ω(t) ~ N(0, σ²_ω). We assume that the count data follow a quasi-Poisson distribution. Thus, the expectation of counts at time t is exp(x(t)), and the variance is proportional to this expectation.
We fit the model using the Kalman filter 4 to compute the maximum likelihood. In addition to the process- and observation-error variances σ²_ω and σ²_φ, we estimated the initial value of x(t) and the initial value of r(t) at the start of the time series. Finally, we applied the Kalman smoother to generate time series of r(t) at the county and state levels. Note that the Kalman smoother is not an additional, 'external' method. Rather, after using the Kalman filter for parameter estimation and the reconstruction of r(t), the smoother uses all information available after fitting and 'retrospectively adjusts' all the time series involved in their entirety (e.g. 'true' death counts or the unobserved spread rate); see Durbin and Koopman 4 for further details.
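As an illustration of the filtering step, the following minimal sketch (our own construction, not the authors' code) implements a Kalman filter for the local-linear-trend model above, with a Gaussian observation error standing in for the quasi-Poisson assumption; all parameter values are hypothetical.

```python
def kalman_filter_trend(y, q, s, x0=0.0, r0=0.0):
    """Filter the model x(t)=x(t-1)+r(t-1), r(t)=r(t-1)+w(t), y(t)=x(t)+e(t).

    q: variance of the random-walk noise w(t) on the spread rate r(t)
    s: variance of the observation noise e(t) (Gaussian stand-in for quasi-Poisson)
    Returns the filtered estimates of the latent spread rate r(t).
    """
    x, r = x0, r0
    # State covariance P for (x, r); start with a diffuse-ish prior.
    p11, p12, p22 = 10.0, 0.0, 10.0
    r_filtered = []
    for obs in y:
        # Prediction step: x' = x + r, r' = r.
        x_pred, r_pred = x + r, r
        a11 = p11 + 2.0 * p12 + p22          # var(x')
        a12 = p12 + p22                      # cov(x', r')
        a22 = p22 + q                        # var(r') grows by the random walk
        # Update step with observation y = x + e.
        S = a11 + s                          # innovation variance
        k1, k2 = a11 / S, a12 / S            # Kalman gains for x and r
        v = obs - x_pred                     # innovation
        x, r = x_pred + k1 * v, r_pred + k2 * v
        p11 = (1.0 - k1) * a11
        p12 = (1.0 - k1) * a12
        p22 = a22 - k2 * a12
        r_filtered.append(r)
    return r_filtered
```

On a noiseless ramp y(t) = 0.1·t the filtered slope converges to 0.1; in the actual analysis the Kalman smoother would additionally be run backwards over these estimates.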

Predictive power for an ARMA(2,2) process

The variance of the transition distribution and the variance of the stationary distribution of the ARMA(2,2) model (eq. 1 in the main text) must be calculated to compute the predictability measure predictive power (PP(t), eq. 2). This can be done by recasting the ARMA(2,2) model as a vector autoregressive process of order one, that is, a 4-dimensional VAR(1) 5. To this end, writing the ARMA(2,2) model as r(t) = b₁r(t−1) + b₂r(t−2) + ε(t) + a₁ε(t−1) + a₂ε(t−2) and stacking the state vector z(t) = (r(t), r(t−1), ε(t), ε(t−1))ᵀ, the ARMA(2,2) model parameters populate the two jurisdiction-specific matrices

B = [b₁ b₂ a₁ a₂; 1 0 0 0; 0 0 0 0; 0 0 1 0],  Σ = σ²_ε m mᵀ,

where Σ is a variance-covariance matrix, and σ²_ε is the variance of the original univariate ARMA(2,2) process (eq. 1). The covariance matrix of the stationary distribution (V∞) in vectorized form is defined as

vec(V∞,VAR) = (I − B ⊗ B)⁻¹ vec(Σ),

where vec(·) is the vec-operator, I is an identity matrix of appropriate size, ⊗ denotes the Kronecker product, and the subscript VAR refers to the recast model. To compute the variance of the stationary distribution (as well as the transition distribution; see below) of the original univariate ARMA(2,2) process (σ²_∞(ARMA), which is a scalar), the two additional vectors l = (1 0 0 0)ᵀ and m = (1 0 1 0)ᵀ are needed, so that (superscript T means the transpose)

σ²_∞(ARMA) = lᵀ V∞,VAR l.

The time-dependent variance of the transition distribution of the original univariate ARMA(2,2) process (σ²_t(ARMA), which is a scalar) can be computed as

σ²_t(ARMA) = lᵀ V_t,VAR l,  where V_t,VAR := Σᵢ₌₀^(t−1) Bⁱ Σ (Bⁱ)ᵀ;

further technical details can be found in 5. The transition distribution gives the evolution of the process through time, starting at a specific point z₀ at time t₀ (both potentially multi-dimensional) when the variance var(z₀) = 0. We calculated the variance of the transition distribution for eq. 2 without considering spatial correlation, cor(εᵢ, εⱼ). The calculations are still correct for the marginal transition distribution, that is, the transition distribution for a single jurisdiction that does not depend on the other ones.
Further, the differences in the mean spread rate among jurisdictions (eq. 1) can also be ignored, because all estimates were nearly zero (Results in the main text).
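A compact numeric sketch of the companion-form computation described above (our own, not the authors' code): the matrices B and Σ are built from hypothetical ARMA(2,2) coefficients, the stationary covariance is obtained by iterating the Lyapunov recursion V ← BVBᵀ + Σ to convergence rather than solving the Kronecker-product equation, and predictive power is taken as PP(t) = (1 − σ²_t/σ²_∞)^(1/2), our reading of eq. 2.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_T(A):
    return [list(row) for row in zip(*A)]

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def arma22_pp(b1, b2, a1, a2, s2, horizons):
    """PP(t) for an ARMA(2,2) via its 4-dimensional VAR(1) companion form."""
    # State z(t) = (r(t), r(t-1), e(t), e(t-1)).
    B = [[b1, b2, a1, a2],
         [1.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
    m = [1.0, 0.0, 1.0, 0.0]                 # innovation enters r(t) and e(t)
    Sigma = [[s2 * mi * mj for mj in m] for mi in m]
    # Stationary covariance: iterate V <- B V B^T + Sigma to convergence.
    V_inf = [row[:] for row in Sigma]
    for _ in range(2000):
        V_inf = mat_add(mat_mul(mat_mul(B, V_inf), mat_T(B)), Sigma)
    var_inf = V_inf[0][0]                    # scalar stationary variance of r(t)
    # Transition variance at horizon t: same recursion started from Sigma.
    pp = {}
    V_t = [row[:] for row in Sigma]
    for t in range(1, max(horizons) + 1):
        if t in horizons:
            pp[t] = max(0.0, 1.0 - V_t[0][0] / var_inf) ** 0.5
        V_t = mat_add(mat_mul(mat_mul(B, V_t), mat_T(B)), Sigma)
    return pp
```

As expected, PP(t) declines towards zero as the transition variance approaches the stationary variance.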

Predictive power, estimation uncertainty, and structural uncertainty
In the preceding section we have shown how to compute predictive power, PP(t) (eq. 2), for a univariate model as used in the present study. We have assumed that all parameters are known with certainty, which clearly is not realistic. The flip side of such an approach, though, is that a computation disregarding estimation uncertainty offers a 'best case predictability': explicitly considering estimation uncertainty will lower it 5,6. In addition to estimation uncertainty, there is also structural uncertainty related to the model used 7,8. In this section we offer some related general considerations along with references to the specialized literature.
From eq. 2 in the main text it can be seen that, in order to compute PP, the variance of the transition distribution is scaled with respect to the variance of the stationary distribution. This ratio, σ²_t/σ²_∞, can be expanded to include estimation uncertainty. Both the transition and the stationary variance can be expressed as sums 5,6,

σ²_t = σ²′_t + Ω_t,  σ²_∞ = σ²′_∞ + Ω_∞,

where we use a prime to designate an exactly known variance, and the second summand in each equation is the covariance contribution stemming from estimation; if we set Ω_t = Ω_∞ ≡ 0, we recover PP as computed in the previous section. The computation of Ω_t and Ω_∞ is detailed in Lütkepohl 5 (ch. 3), including how to iteratively compute Ω_t for any forecast horizon t. Nonetheless, from a practical perspective the computation of PP under estimation uncertainty is probably easiest using a parametric bootstrap based on the estimated uncertainty around the parameters needed to compute PP. The main insight here is that including estimation uncertainty will reduce predictive power 6, and therefore using parameter point estimates to compute PP can be regarded as indicating a 'best case predictability': if the latter is high, then including estimation uncertainty might be a cautious approach for public health decision-making.
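For intuition, here is a parametric-bootstrap sketch (our own construction, not from the paper) using an AR(1) process, for which point-estimate predictive power has the closed form PP(t) = |b|ᵗ; the extra forecast-error variance caused by forecasting with a drawn coefficient b̃ instead of the point estimate is added to the transition variance before PP is recomputed. The point estimate and standard error are hypothetical.

```python
import math
import random

def pp_with_estimation_uncertainty(b_hat, se, t, n_boot=10000, seed=42):
    """AR(1) sketch: PP(t) from a point estimate vs. PP(t) after adding a
    parametric-bootstrap forecast-error contribution to the transition variance."""
    rng = random.Random(seed)
    pp_point = abs(b_hat) ** t
    # Extra forecast-error variance (scaled by the stationary variance) from
    # forecasting with b~ drawn around b_hat: E[(b_hat^t - b~^t)^2].
    extra = 0.0
    for _ in range(n_boot):
        b_tilde = rng.gauss(b_hat, se)
        extra += (b_hat ** t - b_tilde ** t) ** 2
    extra /= n_boot
    # PP^2 = 1 - sigma2_t/sigma2_inf = b_hat^(2t) - extra (floored at zero).
    pp_unc = math.sqrt(max(0.0, b_hat ** (2 * t) - extra))
    return pp_point, pp_unc
```

With, say, b̂ = 0.8, se = 0.05 and t = 4, the uncertainty-adjusted value comes out below the point estimate, illustrating why point-estimate PP is a 'best case'.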
Finally, there is also the issue of uncertainty related to the model used, i.e. structural uncertainty 7,8. An often-employed way to address model uncertainty is to select a 'best' model among competing models using an information criterion like AIC (or the small-sample version AICc) and subsequently to assume that the selected model is the correct or 'true' one. A general problem arises from using the same data (quite understandably for short time series) for both model selection and parameter estimation, often resulting in overly confident predictions, i.e. narrow prediction intervals (Refs. 7,8 and references therein). More fundamentally, there is no simple solution for tackling structural uncertainty 8. One approach could be to concomitantly use several sensible models via model averaging (e.g. Refs. 9,10). For example, Cramer et al. 11 have shown the usefulness of ensemble forecasts during the COVID-19 pandemic.
Nonetheless, in our study we refrain from applying such an approach, as it is not clear to us how to interpret averaged predictive power values (see also the caveats expressed in Ref. 12).

Simulation model
We designed the simulation model to determine the plausibility of the hypothesis that changes in the transmission rate of COVID-19 in response to death counts could explain the patterns of cyclicity and predictability observed in the state and county data. The simulation model is a modification of that presented in Ref. 3. It tracks the epidemic on a daily time scale and explicitly includes the time period from infection to subsequent transmission (infectiousness), and from infection to death; therefore, it is akin to an SEIR model.
The simulation model tracks the number of infected individuals on day t who were infected τ days previously, I(t, τ). After 25 days, they are all assumed to be recovered or dead.
The relative daily infectiousness of an infected individual, g(τ), is given by a Weibull distribution with mean 7.5 days and standard deviation 3.4 days 6 (Supplementary Fig. 3a in Ref. 3).
For an individual who dies, the day of death is given by a Weibull distribution with mean 18.5 days and standard deviation 3.4 days 6 (Supplementary Fig. 3b in Ref. 3). Finally, the time between initial infection and diagnosis is assumed to be log-normally distributed with mean 5.5 days and standard deviation 2.2 days 8 (Supplementary Fig. 3c in Ref. 3). Deaths occur according to the infection age category of individuals, I(t, τ), so that the probability of death of individuals that had been infected τ days earlier is governed by the day-of-death distribution and 1 − s. Here, s is the overall survival probability, assumed to be s = 98%; changes in this assumption had little effect on the simulation results. Once an individual dies, they are removed from the pool of individuals. These individuals are summed to calculate the daily death count, D(t). On day t, the number of new infections is β(t) exp(ε(t)) Σ_τ g(τ) I(t, τ); β(t) is the transmission rate and g(τ) scales the transmission rate by the infectiousness of individuals infected τ days earlier, I(t, τ). Here, we let β(t) = β_const (1 + c D(t − 14))⁻¹, so that an increasing number of deaths two weeks beforehand decreases the transmission rate below the constitutive transmission rate β_const; this transmission rate function is the same as eq. 3 in the main text, albeit here expressed on a daily timescale. The lognormal random variable exp(ε(t)) is included to represent environmental variation; its variance was selected to mimic the variation in r(t) observed in the real data. Note that the simulations produce values of r(t) that we analyzed directly, rather than estimating r(t) by applying a Kalman smoother 4 to the simulated values of D(t).
Although the time series were generated on a daily time scale, simulated data were subsequently aggregated to weeks for the analyses. The analyses of r(t) were performed on time series of length 40 weeks. The results of the analyses depended on the initial number of infected individuals and the week at which the analyses were initiated. For the analyses in the main text, we initially simulated 40 weeks of data starting with I(t, τ) = 10^-6. Other parameters were 10^5 and the variance of ε(t) = 0.03.
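The daily simulation can be sketched as follows (our own re-implementation sketch, not the authors' code): the Weibull shape/scale pairs are rough approximations to the stated means and standard deviations, β_const and the feedback coefficient c are hypothetical choices, and susceptible depletion and removal of the dead are omitted for brevity; infection cohorts are treated as continuous numbers rather than integer individuals.

```python
import math
import random

def weibull_weights(k, lam, n_days=25):
    """Discrete daily weights from a Weibull(k, lam) over a 25-day window."""
    F = lambda x: 1.0 - math.exp(-((x / lam) ** k))
    w = [F(d + 1) - F(d) for d in range(n_days)]
    tot = sum(w)
    return [x / tot for x in w]

def simulate(days=280, beta_const=2.5, c=0.05, s=0.98, n0=10.0, seed=1):
    """Daily epidemic with death-dependent transmission; returns daily and
    weekly death counts. Parameter values are illustrative only."""
    rng = random.Random(seed)
    g = weibull_weights(2.3, 8.5)           # infectiousness: mean ~7.5 d, sd ~3.4 d
    f_death = weibull_weights(6.5, 19.8)    # infection-to-death: mean ~18.5 d
    I = [0.0] * 25                          # I[tau]: number infected tau days ago
    I[0] = n0
    deaths = []
    for t in range(days):
        # Daily deaths from each infection-age cohort (survival prob s).
        D = (1.0 - s) * sum(I[tau] * f_death[tau] for tau in range(25))
        deaths.append(D)
        # Transmission suppressed by deaths two weeks beforehand.
        D_lag = deaths[t - 14] if t >= 14 else 0.0
        beta = beta_const / (1.0 + c * D_lag)
        eps = rng.gauss(0.0, math.sqrt(0.03))  # environmental variation
        new_inf = beta * math.exp(eps) * sum(g[tau] * I[tau] for tau in range(25))
        I = [new_inf] + I[:-1]                 # age the infection cohorts by one day
    weekly = [sum(deaths[7 * w:7 * w + 7]) for w in range(days // 7)]
    return deaths, weekly
```

With the feedback active, deaths rise and then settle into fluctuations around the level at which β(t) is pushed towards the replacement value, mimicking the damped cycles discussed in the main text.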
We analyzed each simulated time series separately, unlike the real time series, which were all analyzed simultaneously. Furthermore, fits using an ARMA(2,2) model were unstable for time series of length 40, and therefore we fit the simulated time series using an AR(2) model, estimating PP(4), the damping factor d, and the average cycle length (period) from the autoregressive coefficients in the same way as for the real data.
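The damping factor and period can be read off the estimated AR(2) coefficients via the complex roots of the characteristic equation; the sketch below uses the standard relation for r(t) = b₁r(t−1) + b₂r(t−2) + ε(t), which we presume underlies the reported quantities, though the paper's exact convention may differ.

```python
import math

def ar2_cycle_stats(b1, b2):
    """Damping factor and quasi-cycle period implied by AR(2) coefficients.

    The roots of lambda^2 - b1*lambda - b2 = 0, when complex, have modulus d
    (damping per time step) and angle theta (period = 2*pi/theta steps).
    Returns None when the roots are real, i.e. no quasi-cycle exists.
    """
    disc = b1 * b1 + 4.0 * b2
    if disc >= 0:
        return None
    d = math.sqrt(-b2)                          # modulus of the complex root pair
    theta = math.atan2(math.sqrt(-disc) / 2.0, b1 / 2.0)
    return d, 2.0 * math.pi / theta
```

Coefficients constructed from a known damping of 0.9 per week and a 23-week period (b₁ = 2·0.9·cos(2π/23), b₂ = −0.81) are recovered exactly.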

Fig. S3. Illustrative visualization of the transition distribution overlapping with the stationary distribution. For three illustrative AR(1) processes, the time-dependent transition distribution and the stationary distribution are shown, which are used to compute the predictability measure predictive power (PP(t), eq. 2); the variances here are shown as 95% confidence intervals, and the orange square is an arbitrary initial state. The quicker the transition distribution is 'absorbed' by the stationary one, the faster predictability is lost. Simulation parameters: lag-1 coefficients are given in the panel titles; r(t = 0) = 0.15 and var(r) = 0.001 for all three AR(1) processes.
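The 'absorption' shown in Fig. S3 can also be quantified without plotting: for an AR(1) with lag-1 coefficient b, the ratio of transition to stationary variance is 1 − b^(2t), so the horizon at which the transition distribution reaches a given fraction of the stationary variance has a closed form. The sketch below (our own; the lag-1 coefficients used in the usage note are hypothetical, as the panel titles are not reproduced here) computes that horizon; note that r(0) and var(r) cancel out of the ratio.

```python
import math

def absorption_horizon(b, frac=0.95):
    """Smallest horizon t at which the AR(1) transition variance reaches
    frac of the stationary variance: 1 - b**(2*t) >= frac."""
    return math.ceil(math.log(1.0 - frac) / (2.0 * math.log(abs(b))))
```

With lag-1 coefficients of 0.5, 0.8 and 0.95 the 95%-absorption horizons come out to 3, 7 and 30 steps, respectively: the closer b is to 1, the longer the transition distribution resists absorption and the longer predictability persists.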

Table S1. Predictability barrier as a function of a preset threshold.