Inter-annual variation in seasonal dengue epidemics driven by multiple interacting factors in Guangzhou, China

Vector-borne diseases display wide inter-annual variation in seasonal epidemic size due to their complex dependence on temporally variable environmental conditions and other factors. In 2014, Guangzhou, China experienced its worst dengue epidemic on record, with incidence exceeding the historical average by two orders of magnitude. To disentangle contributions from multiple factors to inter-annual variation in epidemic size, we fitted a semi-mechanistic model to time series data from 2005–2015 and performed a series of factorial simulation experiments in which seasonal epidemics were simulated under all combinations of year-specific patterns of four time-varying factors: imported cases, mosquito density, temperature, and residual variation in local conditions not explicitly represented in the model. Our results indicate that while epidemics in most years were limited by unfavorable conditions with respect to one or more factors, the epidemic in 2014 was made possible by the combination of favorable conditions for all factors considered in our analysis.

1.03 1.07 ! . ( 1.05 1.18 ! . ) 1.06 1.19 ! . * 1.04 1.14 ! . + 1.04 1.12 ! . , 1  Supplementary Fig. 4. Posterior predictive distribution of the difference between the first non-zero dengue incidence week to the last non-zero dengue incidence week in each year based on 2,000 simulations from the fitted transmission model. The red line shows observed difference between the first non-zero dengue incidence week to the last non-zero dengue incidence day for each year. Supplementary Fig. 9. The city of Guangzhou in the province of Guangdong, China. The city limits of Guangzhou are outlined in red with population per square kilometer 8 for the region and the entirety of China (upper right) 9 .

Supplementary Methods 1. Simpler alternative model.
In addition to the model we used throughout our analysis, we also considered a simpler alternative model to explore the extent to which our more complex model formulation was necessary to address our motivating questions about drivers of inter-annual variation in local incidence. As a prerequisite to addressing our motivating questions through factorial simulation experiments, it is essential that a model be capable of reproducing observed incidence patterns in Guangzhou during 2005-2015. If a model were to fail in this regard, then its predictions about incidence patterns under alternative conditions could not be viewed as reliable.

Model description
To formulate an alternative model with at least some key features of our primary model but that could be viewed as a simpler alternative, we followed the approach of Kraemer et al. 10 . Their model was also posed as a TSIR model and made use of the same effective number of cases in the previous generation, 0′ 2 . Consistent with assumptions of our primary model, we assumed that transmission had a low impact on population susceptibility and that 0 2 = 4(6) 0′ 2 . The transmission coefficient 4(6) depended on a concave function of temperature, a monotonically increasing function of mosquito density, and a time-varying function to capture residual variation in transmission, similar to 4 . (6) in our primary model. These component functions were modeled with cubic smoothing splines using the s function in the scam package 11 in R, with concavity imposed with the bs= 'cv' option and monotonicity imposed with the bs= 'mpi' option (see model code below). These constraints are consistent with the prior assumptions applied to our primary model. One of the most distinct differences between the structure of this model and that of our primary model is that the former only made use of temperature and mosquito density values at time t rather than lagged over the 49 days preceding t.

Model fit
We used the scam package 11 in R to fit this model to data from days t for which both 0 2 and 0′ 2 were positive (n=630). The scam model described a high proportion of the of variation in incidence, with the adjusted coefficient of variation, 9 : = 0.955. We then used this fitted model to simulate local dengue incidence over the entire timeframe of 4,017 days. Although the model described a high proportion of variation in the data to which it was fitted, the model-predicted incidence showed overall poor consistency with observed incidence patterns (Supplementary Methods 1 Fig. 1). For example, on the basis of Pearson's correlation coefficient between median values from model simulations and observed values, this model had a relatively low value of ? = 0.269 compared to ? = 0.966 for the more complex model used in our primary analysis. As a result, we conclude that use of the more complex model is indeed justified, as a relatively simple alternative model with similar features was unable to reproduce observed incidence patterns. It is likely that the most important reason that the more complex model displayed greater consistency with observed incidence patterns is that it was fitted specifically on the basis of its ability to reproduce those patterns rather than the simpler and more common alternative of being fitted on the basis of the model's ability to reproduce incidence patterns from one generation to the next. In particular, the fact that the simpler model did not account for days with zero incidence was likely a major limiting factor. Year

Supplementary Methods 2. Alternative model formulations.
To consider the possibility that model formulations other than that used in our primary analysis might improve model performance, we fitted and performed validation analyses for several alternative models. These alternative models included other covariates in addition to the two explicit covariates used in our primary analysis (temperature and mosquito density) and a constant, rather than time-varying, residual term (4 . ). Following the model used in our primary analysis, each alternative model was a variant of the TSIR model,  6)) The second alternative model we considered included precipitation as an additional covariate. We considered precipitation because standing water is required for juvenile mosquito stages 12 . The way in which precipitation affects DENV transmission is not always straightforward though. On the one hand, more precipitation can lead to higher transmission due to increased abundance of mosquitoes 12 . On the other hand, too much precipitation can flood standing water and flush eggs out of aquatic habitats 13 . Analogous to relationships between temperature and DENV transmission in the primary model, we specified an additional bivariate basis function for precipitation, W X (V 2TQ , Y), to allow for 4(6) to depend on weighted sums of daily effects of temperature, K 2TQ , mosquito abundance, I(6 − Y), and precipitation, V 2TQ . We used cubic Bsplines to describe these functions using the fda package in R 14 . The precipitation covariate surface was defined by parameters associated with a 3x3 matrix that defined the height of each component of the bivariate spline ranging 1-49 days for Y and 0-215 mm for V.
Alternative Model 3: 4(6) = H(I(6), K, V, 9[, 4 . (6)) The third alternative model we considered included both precipitation and relative humidity as additional covariates. Although precipitation and humidity may be somewhat collinear in their effects on mosquito abundance, as high relative humidity may be an indicator of rain, humidity has additionally been shown to independently affect adult mosquito mortality 15 . Again, analogous to relationships between temperature and DENV transmission in the primary model, we specified additional bivariate basis functions for precipitation, W X (V 2TQ , Y), and relative humidity, W \] (9[ 2TQ , Y), to allow for 4(6) to depend on weighted sums of daily effects of temperature, K 2TQ , mosquito abundance, I(6 − Y), precipitation, V 2TQ , and relative humidity, 9[ 2TQ . Like other surfaces, the relative humidity covariate surface was defined with a bivariate spline ranging from 1-49 days for Y and 0-100% for 9[.  6)) The fourth alternative model we considered included only relative humidity as a covariate in addition to the primary model. As precipitation affects DENV transmission in complex ways, we constructed a model that would isolate the effects of relative humidity on transmission. Similar to the primary model and Alternative Models 2 and 3, we specified a bivariate basis function for temperature, mosquito abundance, and relative humidity and allowed transmission to depend on the weighted sums of daily effects of these covariates.

Model fits
As with our primary model in the main text, we estimated the posterior distribution of parameters for Alternate Models 1-4 using a Sequential Monte Carlo algorithm implemented using the BayesianTools R library 16 .
To validate the performance of each fitted model, we used data on imported cases to seed 1,000 simulations of local DENV transmission over the entire 2005-2015 time period. Over the period as a whole, Alternative Models 3 & 4-which included 4 . (6) and relative humidity as an additional covariate-had daily medians of simulated local dengue incidence that were highly correlated with observed local incidence, with ?  Fig. 5). For the observed epidemiological "features" of local dengue incidence patterns, we found that each model performed similarly in terms of the number of years that the observed values fell within the 95% posterior predictive intervals (PPIs) (Supplementary Methods 2 Table  1). However, the range of the simulated values for the features differed greatly among models, due to differing levels of uncertainty in their predictions. For example, Alternative Model 1 had particularly wide ranges for each of the four epidemiological features (Supplementary Methods 2 Fig. 14), due to the lack of a time-varying residual term. Given that the model used in our primary analysis showed strong support for a heightened contribution of residual variation from 4 . (6) in 2014, it was not surprising that Alternative Model 1 performed poorly. It was more surprising, however, that Alternative Models 2-4 all performed worse than our primary model, given that they did not exclude any of the predictors used in our primary model. In principle, a model that is nested within another (i.e., our primary model is a special case of Alternative Models 2-4) should do no better than a model that it is nested within, because additional parameters should only deviate from null values (i.e., parameter values resulting in the special case) if they lead to an improvement in the model's performance. On some level, the fact that Alternative Models 2-4 did not exceed the performance of the primary model suggests that they likely did not converge on their true global optimum values. Thus, we cannot exclude the possibility that other parameterizations of Alternative Models 2-4 could perform better than our primary model. In addition to model performance though, Alternative Models 2-4 exhibited some undesirable characteristics, including: 1) stronger dependence on 4 . (6) (Supplementary Methods 2 Figs. 7,9); 2) erratic contributions from precipitation and relative humidity (Supplementary Methods 2 Figs. 7-9); and 3) erratic patterns in 4(6). In addition, there were substantial gaps in the precipitation data (red bands in Supplementary Methods 2 Fig. 1), which likely exacerbated these problems for Alternative Models 2 & 3. Most importantly, none of these results indicate potential to provide a better explanation for interannual variation in dengue incidence, which was the primary goal of our analysis.
As a result of these assessments of Alternative Models 1-4, we concluded that our primary model-which includes 4 . (6), temperature, and mosquito density as covariates-was an appropriate choice for modeling inter-annual variation in dengue incidence in our study area. Although precipitation and humidity are meaningful biologically 12,13 , we suspect that much of the influence of those variables may be mediated through their effects on mosquito density, which was modeled directly in our primary model and may not be easily improved on by adding data on precipitation and humidity.   Supplementary Methods 2 Fig. 10. Ranking of years by annual local incidence simulated with the transmission coefficient was fitted to b(c) + d + e f given data on local conditions and imported cases from that year (Alternative Model 1). Dark red corresponds to the lowest ranking (i.e., highest simulated local incidence), and dark blue corresponds to the highest ranking (i.e., lowest simulated local incidence). The height of a given segment of a given year's bar is proportional to the posterior probability that simulations of local incidence from that year were of a given rank relative to other years.

Supplementary
Supplementary Methods 2 Fig. 11. Ranking of years by annual local incidence simulated with the transmission coefficient was fitted to b(c) + d + g + e f (c) given data on local conditions and imported cases from that year (Alternative Model 2). Dark red corresponds to the lowest ranking (i.e., highest simulated local incidence), and dark blue corresponds to the highest ranking (i.e., lowest simulated local incidence). The height of a given segment of a given year's bar is proportional to the posterior probability that simulations of local incidence from that year were of a given rank relative to other years.
Supplementary Methods 2 Fig. 12. Ranking of years by annual local incidence simulated with the transmission coefficient was fitted to b(c) + d + g + hi + e f (c) given data on local conditions and imported cases from that year (Alternative Model 3). Dark red corresponds to the lowest ranking (i.e., highest simulated local incidence), and dark blue corresponds to the highest ranking (i.e., lowest simulated local incidence). The height of a given segment of a given year's bar is proportional to the posterior probability that simulations of local incidence from that year were of a given rank relative to other years.