Projecting the future incidence and burden of dengue in Southeast Asia

The recent global expansion of dengue has been facilitated by changes in urbanisation, mobility, and climate. In this work, we project future changes in dengue incidence and case burden to 2099 under the latest climate change scenarios. We fit a statistical model to province-level monthly dengue case counts from eight countries across Southeast Asia, one of the worst affected regions. We project that dengue incidence will peak this century before declining to lower levels with large variations between and within countries. Our findings reveal that northern Thailand and Cambodia will show the biggest decreases and equatorial areas will show the biggest increases. The impact of climate change will be counterbalanced by income growth, with population growth having the biggest influence on increasing burden. These findings can be used for formulating mitigation and adaptation interventions to reduce the immediate growing impact of dengue virus in the region.

2. The first two paragraphs of the results section may be clearer to the readers if switched. The section states that projections were made based on scenarios from socioeconomic and climate forecasts and then follows by describing the statistical model developed to predict dengue. Switching the two would be easier to follow sequentially, stating that the statistical model was fit to predict dengue and then applied to scenarios based on socioeconomic and climate forecasts.
3. The beginning of the results section could use some additional explanation of the GCMs for readers not as familiar with this topic. The strings of letters and numbers are not easy to understand out of context. An additional sentence or two qualitatively describing what they are and how they fit into this study would help clarify.
4. In the section "estimated determinants of dengue risk," interpreting the RMSE would help with additional context. Was 263 found averaging across all locations and months, or averaged across months, aggregated for all locations? It would also be beneficial to see how this compares to any summary statistics of monthly cases. I realize that RMSE does not perfectly translate to absolute error, but this context would be helpful in interpreting whether this is high or low accuracy.
5. The methods section describes the dengue data, and after seeing the supplemental table, it seems that not all 12*18 months were available for all eight countries. Were missing observations imputed or omitted from model fitting? It would also be clearer to the readers to state in the main text how many months of data were or were not available.

The paper by Colón-González fits a statistical model to province-level monthly dengue case counts from 8 Southeast Asian countries and then uses this model to project the dengue burden up to 2099 considering projected changes in a set of socioeconomic (mainly GDP), demographic, and climate variables. I'm generally skeptical of the usefulness of such long-term projections, not only because of the large sources of uncertainty but because it is quite likely that vaccines and vector-control interventions will become available before that time horizon. However, I think the question is broadly relevant (understanding how dengue burden will change as a function of multiple changing drivers) and shorter-term projections are useful. I also appreciate that, acknowledging the uncertainty of projections in the covariates used, the authors consider several GCM-SSP combinations. Nevertheless, I have several concerns about the manuscript and think it is not publishable in its current form. I summarize my main concerns below.

My main concern is that, other than stating the RMSE of the best fitting model, the paper does not present results on model fit or predictive performance. This makes it very difficult to evaluate the validity and robustness of the results presented since overfitting is always a big risk in this type of model. At the very least I'd like to see plots of the correlation between predicted and observed data points, for both the in-sample and out-of-sample observations. Furthermore, given that the principal output of this study is projections over periods that extend well beyond the observation time (216 months of data and 960 months of projection), I think that leave-one-out cross-validation is not enough. Blocked cross-validation (over blocks of space and time) might give a better sense of how competing models might perform when extrapolating.
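To make the suggestion concrete, blocked cross-validation over space and time can be sketched as below. This is only an illustrative sketch, not the procedure used in the manuscript: the function name `blocked_folds`, the block counts, and the assumption that unit identifiers are ordered so that neighbouring identifiers are spatially close are all hypothetical.

```python
from itertools import product


def blocked_folds(records, n_space_blocks, n_time_blocks):
    """Yield (train, test) index lists, holding out one space-time block at a time.

    records: list of (unit_id, month_index) tuples, one per observation.
    Assumption: sorting unit_id groups spatially adjacent units together.
    """
    units = sorted({u for u, _ in records})
    months = sorted({m for _, m in records})
    # Assign each unit and each month to a contiguous block label.
    space_block = {u: i * n_space_blocks // len(units) for i, u in enumerate(units)}
    time_block = {m: i * n_time_blocks // len(months) for i, m in enumerate(months)}

    for sb, tb in product(range(n_space_blocks), range(n_time_blocks)):
        test = [i for i, (u, m) in enumerate(records)
                if space_block[u] == sb and time_block[m] == tb]
        # Drop every observation that shares the held-out space block or the
        # held-out time block, so training data do not overlap the test block
        # in either dimension.
        train = [i for i, (u, m) in enumerate(records)
                 if space_block[u] != sb and time_block[m] != tb]
        if test and train:
            yield train, test


# Hypothetical toy panel: 4 units observed over 6 months.
records = [(u, m) for u in range(4) for m in range(6)]
folds = list(blocked_folds(records, n_space_blocks=2, n_time_blocks=3))
```

Unlike leave-one-out, each held-out block here is separated from the training data in both space and time, which is closer to the extrapolation setting the projections face.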

Similarly, I'd like to see more results to better understand what drives the projections. How much do the associations between covariates and outcome vary by country and over time? To what extent does the distribution of projected covariates match the distribution of covariates used for model fitting (in other words, how much is being extrapolated beyond the observed range of covariates?). I also have some concerns about how to interpret estimates of dengue burden that are aggregated (over all age groups) and not age-stratified. The clinical presentation of dengue is known to be age-dependent, and the age distribution of the population will continue to change over the next century. This, at the very least, should be discussed. Along the same line, the model does not consider changes in the age structure of the population, even though they are key determinants of both transmission dynamics and disease burden. Did the authors consider including birth rate, life expectancy, or other metrics of population structure in their models?
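One simple way to quantify the extrapolation question raised above is to report, for each covariate, the share of projected values that fall outside the range seen during model fitting. The sketch below assumes one covariate at a time; the function name and the temperature values are hypothetical and do not come from the manuscript.

```python
def fraction_outside_range(observed, projected):
    """Return the share of projected values outside [min(observed), max(observed)]."""
    lo, hi = min(observed), max(observed)
    outside = sum(1 for x in projected if x < lo or x > hi)
    return outside / len(projected)


# Hypothetical monthly mean temperatures (degrees Celsius).
observed_temperature = [20.0, 25.0, 30.0]
projected_temperature = [22.0, 31.0, 33.0, 19.0]
share = fraction_outside_range(observed_temperature, projected_temperature)
```

A range check of this kind is univariate, so it would not detect novel combinations of covariates that each stay within their own observed range.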

Response to Reviewers comments
Manuscript reference number: NCOMMS-22-23253
Title: Projecting the future incidence and burden of dengue in Southeast Asia

We wish to thank you for your comments as they have led to improvements to the manuscript. All changes are described below and can be summarised as follows:
• Added a paragraph indicating the added value of our study
• Incorporated a description of the general circulation models for non-specialists
• Included an analysis of the predictive ability of the model stratified by administrative unit, month of the year, and year in the time series
• Corrected and clarified the number of alternative models explored in the analysis
• Explained our rationale for restricting predictions to the period 2020-2099
• Added a figure to the supplementary information section showing the country-stratified contribution of the covariates in the model
• Discussed the utility of our scenario-based projections
• Replaced our time series cross-validation algorithm with a block cross-validation algorithm
• Discussed the age dependency of the clinical presentation of dengue cases

Reviewer #1 (Remarks to the Author):
In this study, the authors combined climate and non-climate factors associated with dengue incidence in order to project dengue burden in Southeast Asia for the remainder of the 21st Century. Fitting a monthly model to eight countries at the province-level showed that dengue incidence will increase steadily over the next several decades before declining by the end of the century. Overall, this is a strong study with interesting implications. The use of socioeconomic predictors to project dengue burden is particularly novel and interesting. Specific comments are below.
Thank you for the positive evaluation of our manuscript.

The introduction section describes previous studies on this topic and some of their limitations. In the last paragraph of this section, the authors should state more explicitly what sets this study apart. For example, is the greatest motivating strength the model form, the data sources, the two together, or something else?
We thank the reviewer for this suggestion. We have included the following text in the main body of the manuscript.
"The main contributions of this study are two-fold. First, our results derive from a model formulated using long-term dengue cases reported at a fine spatial scale across eight countries in Southeast Asia. This provides more robust results than proxy measures of transmission, such as vectorial capacity. Second, this study projects changes in dengue incidence based on different future scenarios of the most important determinants of dengue risk (i.e., climate, urbanisation, socioeconomic development, and human mobility) all of which have been important in shaping the expansion of dengue in previous years."

The first two paragraphs of the results section may be clearer to the readers if switched. The section states that projections were made based on scenarios from socioeconomic and climate forecasts and then follows by describing the statistical model developed to predict dengue. Switching the two would be easier to follow sequentially, stating that the statistical model was fit to predict dengue and then applied to scenarios based on socioeconomic and climate forecasts.
Thank you for the suggestion. We switched the paragraphs in the Results section as suggested.

The beginning of the results section could use some additional explanation of the GCMs for readers not as familiar with this topic. The strings of letters and numbers are not easy to understand out of context. An additional sentence or two qualitatively describing what they are and how they fit into this study would help clarify.
As suggested, we included an additional explanation of the GCMs as follows: "Briefly, GCMs are numerical models representing physical processes to depict the climate using a three-dimensional (ocean, cryosphere, and land surface) grid over the world."

In the section "estimated determinants of dengue risk," interpreting the RMSE would help with additional context. Was 263 found averaging across all locations and months, or averaged across months, aggregated for all locations? It would also be beneficial to see how this compares to any summary statistics of monthly cases. I realize that RMSE does not perfectly translate to absolute error, but this context would be helpful in interpreting whether this is high or low accuracy.
Thank you for highlighting this issue. For this revision, we used the mean absolute error (MAE) instead of the RMSE as the former is a more natural and unambiguous measure of error. We included the following explanation of its interpretation to the main body of the manuscript: "The mean absolute error (MAE) was used as a measure of predictive ability because it is a natural and unambiguous measure of average skill magnitude. The selected model had median cross-validated MAE of 37.5 cases per month averaged across all locations and time steps. This value should be interpreted relative to a total of 5,284,064 dengue cases across all locations over the study period, and a monthly mean of 114 (range 0-11,212) dengue cases across the region. We note that the MAE was larger in regions and months of the year with a higher number of dengue cases, and was typically lower than the mean number of monthly dengue cases (see Supplementary information). Analysing the MAE as a proportion of the observed cases, the median cross-validated MAE remained below one in 89% of the years in the series." We also added the following figures to the Supplementary Information section:
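As an aside on the error measures discussed in this response, the sketch below shows how MAE, RMSE, and MAE expressed as a proportion of mean observed cases relate to one another. It is a minimal illustration with made-up case counts, not the authors' evaluation code; the function names and the choice of the mean observed count as the denominator are assumptions.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def root_mean_squared_error(observed, predicted):
    """Square root of the average squared difference; weights large errors more."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def relative_mae(observed, predicted):
    """MAE divided by the mean observed count (assumed definition of 'proportion')."""
    mean_observed = sum(observed) / len(observed)
    return mean_absolute_error(observed, predicted) / mean_observed


# Hypothetical monthly case counts for one administrative unit.
observed = [100, 200, 0, 100]
predicted = [90, 230, 10, 100]
mae = mean_absolute_error(observed, predicted)       # 12.5 cases per month
rmse = root_mean_squared_error(observed, predicted)
rel = relative_mae(observed, predicted)              # 0.125
```

RMSE is never smaller than MAE on the same data, so a relative MAE below one indicates that the typical error is smaller than the typical monthly count.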

The methods section describes the dengue data, and after seeing the supplemental table, it seems that not all 12*18 months were available for all eight countries. Were missing observations imputed or omitted from model fitting? It would also be clearer to the readers to state in the main text how many months of data were or were not available.
We added a clarification in the text to indicate that missing observations were omitted from the model fitting.
"Missing observations (n = 7,361) were omitted from the model fitting restricting the analysis to the set of fully-observed observations."

When describing the statistical model, check the coefficients listed with each predictor. The mobility and air passenger coefficients appear to be mislabelled in the text.
Thank you for bringing this to our attention. We have checked and amended the coefficients.

The model selection section lists seven potential predictors considered for inclusion, six of which are included in the final model. The results section states that 252 models were considered; clarify how 252 model forms were derived. Including/excluding the seven parameters only yields 2^7 = 128 models.
We have removed the Model selection section and replaced it with a Model evaluation section that we believe is more appropriate for the current version of the manuscript. In this new section, we have corrected the calculations for the number of fitted models.
"In a sensitivity analysis, we fitted models using the six predictors in isolation, as well as all their possible combinations (n = 64 models) using a blocked cross-validation algorithm."

The model selection also lists three "other variables" that were considered for inclusion after the previous seven variables were applied in model fitting. Why were these variables considered after initial model fitting rather than alongside the other variables during the initial model fitting?
When fitting the initial predictive model, we wanted to include the major predictors known to be relevant for dengue dynamics (air temperature, number of consecutive dry days [a proxy for water availability for breeding sites], human population density, human mobility, air travel volume, and GDP). We tested whether other factors such as wind speed and specific humidity improved the predictive ability of the model in a series of preliminary analyses, but none of them improved the model skill. To avoid confusion, we have decided not to mention this preliminary analysis in this revised version of the manuscript.
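The model counts discussed in the preceding comments and responses (2^7 = 128 for seven candidate predictors, n = 64 for six) can be reproduced by enumerating predictor subsets. The sketch below is illustrative only; the short predictor labels are paraphrased from the response above, the seventh label is a placeholder, and whether the reported n = 64 counts an intercept-only model is an assumption, since six predictors give 63 non-empty subsets and 64 when the empty set is included.

```python
from itertools import combinations


def predictor_subsets(predictors, include_empty=False):
    """List every subset of predictors, optionally including the empty set."""
    start = 0 if include_empty else 1
    return [subset
            for k in range(start, len(predictors) + 1)
            for subset in combinations(predictors, k)]


# Labels paraphrased from the response; not the authors' variable names.
six = ["temperature", "dry_days", "population_density",
       "mobility", "air_travel", "gdp"]

non_empty = predictor_subsets(six)                      # 63 subsets
with_empty = predictor_subsets(six, include_empty=True)  # 64 subsets

# Seven candidate predictors (the seventh is a placeholder) give 2**7 subsets.
seven = six + ["seventh_predictor"]
all_seven = predictor_subsets(seven, include_empty=True)
```

Neither count reaches the 252 models mentioned in the original submission, which is consistent with the reviewer's request for clarification.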

Reviewer #2 (Remarks to the Author):
Colón-González et al. project the future incidence and burden of dengue in SE Asia under a variety of scenarios. The manuscript is well-written. I do have some comments regarding the design of the study.
We thank the reviewer for their positive comments.

1) The authors use 2000-2017 as the reference period and 2020-2099 for prediction. Please justify such split of the two periods. In particular, why the period of 2018-2019 is excluded.
Dengue data for the period 2018-2019 were not freely available for any of the countries at the time of the data acquisition and so they were not included in the historical period. The period 2020-2099 was selected to allow for predictions on complete decadal periods. This is now mentioned in the manuscript: "The period 2020-2099 was selected to allow for predictions on complete decadal periods."

2) Relatedly, as the prediction covers the period of 2019-2021, it would be good if the authors could demonstrate the potential influence of the Covid-19 pandemic on factors such as human mobility and thus the prediction of dengue incidence and burden. It is possible that the impact of the Covid-19 pandemic on the prediction over longer periods (i.e., by 2050 and 2080) may be marginal, but it should not be neglected. I would suggest that the authors use data from 2000-2019 (if data are available) for reference and 2022-2099 for prediction.
As stated above, at the time of the data acquisition, dengue records for the period 2018 onwards were not freely available for any of the countries. We acknowledge that this is a limitation of our study, as COVID-19 restrictions appear to have reduced dengue risk in the region, albeit for an unknown duration (https://www.sciencedirect.com/science/article/pii/S1473309922000251). We note, however, that national and international travel is rebounding fast in the region (https://www.iata.org/en/pressroom/2022-releases/2022-06-09-01/), and so any protective effects of human mobility restrictions might be limited to 1-2 years and might be cancelled out by higher population susceptibility leading to above-average incidence over the next few years.

3) Contribution of different factors to future changes is the key to targeting dominant drivers. The authors made the analysis for SE Asia as shown in Figure 5; how do the findings vary at finer scale(s)? For example, the dominant drivers may be different across eight countries by 2050 and 2080. It would be interesting to show the heterogeneities of the dominant drivers across countries and two periods.
We have disaggregated the results presented in Figure 5 by country (below) and presented them in the Supplementary Information.
6. When describing the statistical model, check the coefficients listed with each predictor. The mobility and air passenger coefficients appear to be mislabeled in the text.
7. The model selection section lists seven potential predictors considered for inclusion, six of which are included in the final model. The results section states that 252 models were considered; clarify how 252 model forms were derived. Including/excluding the seven parameters only yields 2^7 = 128 models.
8. The model selection also lists three "other variables" that were considered for inclusion after the previous seven variables were applied in model fitting. Why were these variables considered after initial model fitting rather than alongside the other variables during the initial model fitting?

Reviewer #2 (Remarks to the Author):
Colón-González et al. project the future incidence and burden of dengue in SE Asia under a variety of scenarios. The manuscript is well-written. I do have some comments regarding the design of the study.
1) The authors use 2000-2017 as the reference period and 2020-2099 for prediction. Please justify such split of the two periods. In particular, why the period of 2018-2019 is excluded.
2) Relatedly, as the prediction covers the period of 2019-2021, it would be good if the authors could demonstrate the potential influence of the Covid-19 pandemic on factors such as human mobility and thus the prediction of dengue incidence and burden. It is possible that the impact of the Covid-19 pandemic on the prediction over longer periods (i.e., by 2050 and 2080) may be marginal, but it should not be neglected. I would suggest that the authors use data from 2000-2019 (if data are available) for reference and 2022-2099 for prediction.
3) Contribution of different factors to future changes is the key to targeting dominant drivers. The authors made the analysis for SE Asia as shown in Figure 5; how do the findings vary at finer scale(s)? For example, the dominant drivers may be different across eight countries by 2050 and 2080. It would be interesting to show the heterogeneities of the dominant drivers across countries and two periods.

Reviewer #3 (Remarks to the Author):