Trade-offs between individual and ensemble forecasts of an emerging infectious disease

Probabilistic forecasts play an indispensable role in answering questions about the spread of newly emerged pathogens. However, uncertainties about the epidemiology of emerging pathogens can make it difficult to choose among alternative model structures and assumptions. To assess the potential for uncertainties about emerging pathogens to affect forecasts of their spread, we evaluated the performance 16 forecasting models in the context of the 2015-2016 Zika epidemic in Colombia. Each model featured a different combination of assumptions about human mobility, spatiotemporal variation in transmission potential, and the number of virus introductions. We found that which model assumptions had the most ensemble weight changed through time. We additionally identified a trade-off whereby some individual models outperformed ensemble models early in the epidemic, but on average the ensembles outperformed all individual models. Our results suggest that multiple models spanning uncertainty across alternative assumptions are necessary to obtain robust forecasts for emerging infectious diseases.


Priors on parameters common to all models
In each model that we considered, we iteratively estimated the reporting probability (⇢), dispersion parameter of the negative binomial distribution ( ), R multiplier (k), and the timing of the first infection (◆). When possible, we leveraged previous estimates of parameters to inform prior distributions for model parameters. In some instances, we used dengue-specific parameter estimates as priors for Zika-specific parameters [1].
For the reporting probability (⇢), we assumed a mean of 0.2 and a variance of 0.05. Although this mean and variance are not directly informed by an empirical study of Zika reporting, they are in line with what we would expect for dengue [2,3] and Zika [4]. We assumed that ⇢ was a beta random variable and, using the method of moments, we specified a prior distribution for ⇢ such that ⇢ ⇠ beta(0.44, 1.76) (10) which resulted in mean and variance consistent with our prior assumptions. The dispersion parameter of the negative binomial distribution accounts for variability in case reporting beyond that captured by ⇢. Lower values of the dispersion parameter indicate overdispersion, such that variability in cases cannot be explained by a single rate of case incidence, as would be generated by a Poisson distribution with rate ⇢I d,t . Given the likelihood of variation in the reporting probability over the course of the epidemic [5] and across departments [6], we specified a uniform prior for , ⇠ uniform(0, 1), which resulted in a level of overdispersion in reporting equal to at least a geometric distribution ( = 1) but potentially greater ( < 1).
To relate the transmission coefficient ( d,t ) to environmentally driven descriptions of the reproduction number (R d,t ), we used a multiplier (k) that applied to both the static and dynamic models of R. We specified a gamma prior distribution, k ⇠ gamma(2.25, 0.75), the parameters of which were chosen by moment matching to result in a mean of three and a variance of two.
To seed the Zika epidemic in Colombia, we assumed that undetected transmission could have been occurring at any time throughout the first 34 weeks of 2015. For reference, the first case was not reported until August 9, 2015 (week 35). Thus, we specified a uniform prior for the date of the initial introduction (◆ 1 ) between weeks 1 and 34 of 2015. We assumed that the location of the first introduction (l 1 ) could have been any of the departments in Colombia with equal prior probability.

Assumptions about human mobility
Spatially coupled models For the spatially coupled models, we assumed that transmission was coupled across departments by human mobility. In these models, we informed the spatial connectivity matrix in three ways: i) aggregated mobility patterns extracted from mobile phone call detail records (CDRs), ii) a gravity model, or iii) a radiation model. In the gravity and radiation models, T i,j is defined as the total number of trips from department i to department j. This takes the form T i,j = c in the gravity model and T i,j = T i N i N j (N i +s i,j )(N i +N j +s i,j ) in the radiation model, where N i and N j are population sizes at the origin i and destination j, d i,j is the distance between i and j, s i,j is the total population within radius d i,j from i, and T i is the total number of individuals who make a trip. The parameters c, ↵, , and were fitted to CDRs from Spain and validated in West Africa [7]. All three mobility models were rownormalized to correspond to the proportion of time spent by residents of department i in department j.

Spatially uncoupled models
For the spatially uncoupled models, we assumed that each department's epidemic occurred independently of all other departments. We used the same prior distributions as described above for ⇢, , k for each department. Under this assumption, we did not include a parameter for the location of the initial introduction into Colombia, as we instead were concerned with the initial introduction into that department. It was still necessary to specify the timing of that introduction and, for models that considered it, the timing of a second introduction. Following the rationale that undetected transmission could have occurred for up to 34 weeks prior to the first reported case in a given department, we assumed an even prior for the timing of the introduction(s) into a given department.

Dynamic model
The environmentally driven component of d,t for the dynamic model,R d,t (T d,t , OP d,t |c, , ↵, v), was defined as the product of two functions: one that depends on T d,t and one that depends on OP d,t .
The function of OP d,t was defined as c( log(1 OP d,t )), which converts occurrence probability into an expectation of mosquito abundance [8]. The constant c scales this expectation to account for uncertainty in its magnitude, which we estimated and assumed to have a gamma-distributed prior with a shape parameter of 16 and a rate parameter of 0.07. This choice of prior resulted in averageR d,t being equal to one when evaluated at the mean values of the prior for c.
The function of T d,t was based on a temperature-dependent description of the basic reproduction number by Mordecai et al. [9]. Specifically, we used the version of that model based on Briere functions for parameters that were not otherwise accounted for in estimates of mosquito occurrence probability [10]. Those parameters include the mosquito biting rate (a), mosquito-to-human transmission probability (b), human-to-mosquito transmission probability (c), average adult mosquito lifespan (lf ), and parasite development rate (P DR). Those functions combine to form the temperature-dependent component ofR d,t , We did not include other parameters from Mordecai et al. [9] related to mosquitoes, such as immature development rate and egg-to-adult survival rate, as the effects of those parameters were accounted for in estimates of OP d,t from Kraemer et al. [10].
To reduce the number of parameters that we needed to estimate, we worked with a simplified description of the temperature curves produced by eq. (13) (Fig. 5).
To do so, we took random draws from the posterior distribution of parameters from Mordecai et al. [9], computed functions of temperature according to eq. (13), and fitted parameters of a skew normal distribution to those curves by least squares. The skew normal distribution has three parameters-location ( ), scale (↵), shape ( )-that together control the mean, variance, and skew of the distribution, which is sufficient to approximate the uncertainty in posterior predictions of the temperature curves described by eq. (13). We then took the mean and covariance across those estimates of , ↵, and to describe their variation with a multivariate normal distribution, which was the prior distribution we used for those parameters at the first step of our particle filter.

Static model
The environmentally driven component of d for the dynamic model,R d (T d,t , OP d , P P P d ), was defined as the product of three functions: one that depends on T d,t , one that depends on OP d , and one that depends on P P P d . We used values ofR d at the 5 km x 5 km grid cell level as calculated by Perkins et al. [8]. To aggregate them to the department level, we took a population-weighted mean ofR d across 5 km x 5 km grid cells within each department. Although we made no other modifications to the calculation ofR d , we summarize the methodology used by Perkins et al. [8] in the interest of comparability with the dynamic model.
The function of OP d was defined as log(1 OP d )), which was the same procedure used to convert occurrence probability into an expectation of mosquito abundance as used in the dynamic model. For the static model, however, we followed Perkins et al. [8] and relied on a description of occurrence probability that was not defined on a time-varying basis [10].
Rather than estimate a scaling parameter like c in the dynamic model, we relied on a scaling parameter defined as a function of P P P d by Perkins et al. [8]. This function took the form of a monotonically decreasing, cubic spline function estimated with a shape-constrained additive model. The data that informed this estimate of both the relationship with P P P d and the magnitude ofR d were outbreak sizes of 12 chikungunya outbreaks and one Zika outbreak [8] that occurred prior to the ZIKV invasion of the Americas.
The function of T d,t was based on a temperature-dependent description of the basic reproduction number that includes temperature-dependent descriptions of mosquito mortality (µ) [11] and the extrinsic incubation period (n) [12]. Those functions contribute to an expression for monthly contributions to R d,t , along with constants that represent the mosquito-to-human transmission probability (b), human-to-mosquito transmission probability (c), mosquito biting rate (a), and rate of recovery from human infection (r). For each department d, we took the average of the six largest values of eq. (14) as the contribution of T d,t toR d .

One-introduction models
Each of the one-introduction models assumed infections were seeded at one point in time (◆ 1 ) in one location (l 1 ), as specified in the general model parameters section of the Supplementary Methods.

Two-introduction models
Each of the two-introduction models made similar assumptions about ◆ 1 and l 1 for the first introduction. For these models, we additionally specified the timing (◆ 2 ) and location (l 2 ) of the second introduction events, for which we assumed even priors. In reality, there were likely more than only one or two introductions throughout the epidemic, but genomic epidemiology suggests the majority of local transmission resulted from only one or two introductions [13].  Figure 1: Proportion of forecast trajectories predicting zero infections. Forecast trajectories are specific to each particle and this example is from the model using CDR-derived mobility data, static R, and one importation event, corresponding to Fig. 2. Parameters include the R multiplier (k), reporting rate (⇢), overdispersion parameter ( ), R t scalar (c), and the location ( ), shape (↵), and scale (⌫) parameters for the skew normal distribution. Blue indicates posterior estimates were higher than prior estimates, red indicates posterior estimates were lower than prior estimates, and grey indicates no difference. Areas in white indicate the corresponding model does not use that parameter. To calculate the relative difference, we subtracted the prior estimates of parameters from the posterior estimates of parameters and divided the difference by the prior to standardize over different parameter magnitudes. For comparison purposes, we left out the initial timing and initial location of ZIKV introduction parameters.  The negative binomial overdispersion parameter versus the reporting probability for one example model at each assimilation period. The overdispersion parameter in the negative binomial distribution represents the variability in the reporting probability. Pearson's correlation coefficient for the two parameters within a parameter set is listed in the top right corner. This example is from the model using CDR-derived mobility data, static R, and one importation event, corresponding to Fig. 1   Observed incidence with initial median forecast for each of the 16 models with no weeks of data yet assimilated forecasting models. Navy bars indicate Zika incidence data at weekly time interval [14]. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change. With the forecasts generally being at zero cases in the majority of the departments, we see that models were unlikely to forecast an epidemic to occur when no data was yet to be assimilated. 12 weeks of data assimilated into model Supplementary Figure 13: Observed incidence with median forecast for each of the 16 models with 12 weeks of data assimilated into forecasting models. Navy bars indicate Zika incidence data at weekly time interval [14]. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change. Observed incidence with median forecast for each of the 16 models with 24 weeks of data assimilated into forecasting models. Navy bars indicate Zika incidence data at weekly time interval [14]. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change.

Ensemble weight
Supplementary Figure 17: Ensemble weights of each model at each assimilation period. Ensemble weights were calculated using the expectation-maximization algorithm on short-term forecast performances, where forecast performance was assessed against 4-wk ahead incidence in a given department and nationally. Log scores were assessed using cumulative 4-wk ahead forecasts for each department and aggregated nationally. These log scores were then used in the expectation-maximization algorithm.  Figure 19: Forecasts for a spatially coupled (green) and uncoupled (yellow) models with one introduction and static R for three departments. Navy bars denote incidence data. Wide, light bands denote the 75% CrI, narrow, dark band denotes the 50% CrI, and thick line denotes the median forecast. Each row is from the same time point. Time points were chosen to be equally spaced out through the epidemic, with the first set of forecasts from the week of the first case, the second set of forecasts generated at 24 weeks after the first case was reported in Colombia, and the third set of forecasts generated at 48 weeks after the first case was reported in Colombia.  Figure 20: Fitted R and forecasts in two departments for models with two ZIKV introductions, cell phone mobility informed human movement, and a dynamic or static R. Models with dynamic R are depicted in red and models with a static R are depicted in blue. In plots of R, each line is a draw from the posterior, with a bold median line; horizontal black line depicts R = 1. The first set of forecasts in the middle column are from the peak week in both Risaralda and Córdoba, 24-28 weeks after the first case was reported in Colombia, and the second set are from 44-48 weeks after the first case was reported in Colombia. Vertical grey bars depict the forecasts and data considered when assessing short-term forecast performance. Narrow navy bars denote incidence data. Wide, light bands denote the 75% CrI, narrow, dark band denotes the 50% CrI, and thick line denotes the median forecast. Each row is from the same time point.  Figure 22: Incidence by department with mosquito occurrence probability trends. Blue bars denote weekly Zika incidence and green line denotes mosquito occurrence trends.   Figure 25: Observed incidence with initial expectation maximization algorithm-weighted ensemble forecast with no weeks of data yet assimilated into forecasting models. Navy bars indicate Zika incidence data at weekly time interval [14]. Light green band denotes 75% credible interval, darker green band denotes 50% credible interval, and the dark green line denotes median ensemble forecast. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change. Observed incidence with expectation maximization algorithm-weighted ensemble forecast with 12 weeks of data assimilated into forecasting models. Navy bars indicate Zika incidence data at weekly time interval [14]. Light green band denotes 75% credible interval, darker green band denotes 50% credible interval, and the dark green line denotes median ensemble forecast. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change.  Figure 27: Observed incidence with expectation maximization algorithm-weighted ensemble forecast with 24 weeks of data assimilated into forecasting models. Navy bars indicate Zika incidence data at weekly time interval [14]. Light green band denotes 75% credible interval, darker green band denotes 50% credible interval, and the dark green line denotes median ensemble forecast. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change.  Figure 28: Observed incidence with expectation maximization algorithm-weighted ensemble forecast with 36 weeks of data assimilated into forecasting models. Navy bars indicate Zika incidence data at weekly time interval [14]. Light green band denotes 75% credible interval, darker green band denotes 50% credible interval, and the dark green line denotes median ensemble forecast. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change. Observed incidence with expectation maximization algorithm-weighted ensemble forecast with 48 weeks of data assimilated into forecasting models. Navy bars indicate Zika incidence data at weekly time interval [14]. Light green band denotes 75% credible interval, darker green band denotes 50% credible interval, and the dark green line denotes median ensemble forecast. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change.   Figure 30: Observed incidence with initial equally weighted ensemble forecast with no weeks of data yet assimilated into forecasting models. Navy bars indicate Zika incidence data at weekly time interval [14]. Light green band denotes 75% credible interval, darker green band denotes 50% credible interval, and the dark green line denotes median ensemble forecast. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change. Navy bars indicate Zika incidence data at weekly time interval [14]. Light green band denotes 75% credible interval, darker green band denotes 50% credible interval, and the dark green line denotes median ensemble forecast. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change. Navy bars indicate Zika incidence data at weekly time interval [14]. Light green band denotes 75% credible interval, darker green band denotes 50% credible interval, and the dark green line denotes median ensemble forecast. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change. Navy bars indicate Zika incidence data at weekly time interval [14]. Light green band denotes 75% credible interval, darker green band denotes 50% credible interval, and the dark green line denotes median ensemble forecast. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change. Navy bars indicate Zika incidence data at weekly time interval [14]. Light green band denotes 75% credible interval, darker green band denotes 50% credible interval, and the dark green line denotes median ensemble forecast. Vertical line indicates the point at which the forecast was made, with data to the left of the line assimilated into the model. Forecasts to the right of the vertical line change as more data is assimilated into the model, while model fits to the left of the vertical line do not change.