Dengue is an acute systemic viral disease that has established itself globally in both endemic and epidemic transmission cycles. Dengue virus infection in humans is often inapparent1,6 but can lead to a wide range of clinical manifestations, from mild fever to potentially fatal dengue shock syndrome2. The lifelong immunity developed after infection with one of the four virus types is type-specific1, and progression to more serious disease is frequently, but not exclusively, associated with secondary infection by heterologous types2,5. No effective antiviral agents yet exist to treat dengue infection and treatment therefore remains supportive2. Furthermore, no licensed vaccine against dengue infection is available, and the most advanced dengue vaccine candidate did not meet expectations in a recent large trial7,8. Current efforts to curb dengue transmission focus on the vector, using combinations of chemical and biological targeting of Aedes mosquitoes and management of breeding sites2. These control efforts have failed to stem the increasing incidence of dengue fever epidemics and expansion of the geographical range of endemic transmission9. Although the historical expansion of this disease is well documented, the potentially large burden of ill-health attributable to dengue across much of the tropical and subtropical world remains poorly enumerated.

Knowledge of the geographical distribution and burden of dengue is essential for understanding its contribution to global morbidity and mortality burdens, in determining how to allocate optimally the limited resources available for dengue control, and in evaluating the impact of such activities internationally. Additionally, estimates of both apparent and inapparent infection distributions form a key requirement for assessing clinical surveillance and for scoping reliably future vaccine demand and delivery strategies. Previous maps of dengue risk have used various approaches combining historical occurrence records and expert opinion to demarcate areas at endemic risk10,11,12. More sophisticated risk-mapping techniques have also been implemented13,14, but the empirical evidence base has since been improved, alongside advances in disease modelling approaches. Furthermore, no studies have used a continuous global risk map as the foundation for dengue burden estimation.

The first global estimates of total dengue virus infections were based on an assumed constant annual infection rate among a crude approximation of the population at risk (10% in 1 billion (ref. 5) or 4% in 2 billion (ref. 15)), yielding figures of 80–100 million infections per year worldwide in 1988 (refs 5, 15). As more information was collated on the ratio of dengue haemorrhagic fever to dengue fever cases, and the ratio of deaths to dengue haemorrhagic fever cases, the global figure was revised to 50–100 million infections16,17, although larger estimates of 100–200 million have also been made10 (Fig. 1). These estimates were intended solely as approximations but, in the absence of better evidence, the resulting figure of 50–100 million infections per year is widely cited and currently used by the World Health Organization (WHO). As the methods used were informal, these estimates were presented without confidence intervals, and no attempt was made to assess geographical or temporal variation in incidence or the inapparent infection reservoir.
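The arithmetic behind those first constant-rate figures is easy to reproduce. The sketch below multiplies each assumed annual infection rate by its assumed population at risk, using only the rounded values quoted above; the function and variable names are illustrative and do not come from refs 5 or 15.

```python
def annual_infections(population_at_risk, annual_infection_rate):
    """Constant-rate estimate: infections = population at risk x annual rate."""
    return population_at_risk * annual_infection_rate


# Rounded inputs quoted in the text (refs 5 and 15).
estimate_ref5 = annual_infections(1_000_000_000, 0.10)
estimate_ref15 = annual_infections(2_000_000_000, 0.04)

print(f"ref. 5:  {estimate_ref5 / 1e6:.0f} million infections per year")
print(f"ref. 15: {estimate_ref15 / 1e6:.0f} million infections per year")
```

The two products bracket the 80–100 million range, which shows how strongly such estimates depend on the two assumed inputs.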

Figure 1: Global estimates of total dengue infections.

Comparison of previous estimates of total global dengue infections in individuals of all ages, 1985–2010. Black triangle, ref. 5; dark blue triangle, ref. 15; green triangle, ref. 17; orange triangle, ref. 16; light blue triangle, ref. 30; pink triangle, ref. 10; red triangle, apparent infections from this study. Estimates are aligned to the year of estimate and, if not stated, aligned to the publication date. Red shading marks the credible interval of our current estimate, for comparison. Error bars from ref. 10 and ref. 16 replicated the confidence intervals provided in these publications.

Here we present the outcome of a new project to derive an evidence-based map of dengue risk and estimates of apparent and inapparent infections worldwide on the basis of the global population in 2010. We compiled a database of 8,309 geo-located records of dengue occurrence from a systematic search, resulting from 2,838 published literature sources as well as newer online resources18 (see Supplementary Information, section A; the full bibliography4 and occurrence data are available from authors on request). Using these occurrence records we: chose a set of gridded environmental and socioeconomic covariates known, or proposed, to affect dengue transmission (see Supplementary Information, section B); incorporated recent work assessing the strength of evidence on national and subnational-level dengue present/absent status4 (Fig. 2a); and built a boosted regression tree (BRT) statistical model of dengue risk that addressed the limitations of previous risk maps (see Supplementary Information, section C) to define the probability of occurrence of dengue infection (dengue risk) within each 5 km × 5 km pixel globally (Fig. 2b). The model was run 336 times to reflect parameter uncertainty and an ensemble mean map was created (see Supplementary Information, section C). We then combined this ensemble map with detailed longitudinal information on dengue infection incidence from cohort studies and built a non-parametric Bayesian hierarchical model to describe the relationship between dengue risk and incidence (see Supplementary Information, section D). Finally, we used the estimated relationship to predict the number of apparent and inapparent dengue infections in 2010 (see Supplementary Information, section E). Our definition of an apparent infection is consistent with that used by the cohort studies: an infection with sufficient severity to modify a person’s regular schedule, such as attending school. This definition encompasses any level of severity of the disease.

Figure 2: Global evidence consensus, risk and burden of dengue in 2010.

a, National and subnational evidence consensus on complete absence (green) through to complete presence (red) of dengue4. b, Probability of dengue occurrence at 5 km × 5 km spatial resolution of the mean predicted map (area under the receiver operating characteristic curve of 0.81 (±0.02 s.d., n = 336)) from 336 boosted regression tree models. Areas with a high probability of dengue occurrence are shown in red and areas with a low probability in green. c, Cartogram of the annual number of infections for all ages as a proportion of national or subnational (China) geographical area.

We predict that dengue transmission is ubiquitous throughout the tropics, with the highest risk zones in the Americas and Asia (Fig. 2b). Validation statistics indicated high predictive performance of the BRT ensemble mean map, with an area under the receiver operating characteristic curve (AUC) of 0.81 (±0.02 s.d., n = 336) (see Supplementary Information, section C). Predicted risk in Africa, although more unevenly distributed than in other tropical endemic regions, is much more widespread than suggested previously. Africa has the poorest record of occurrence data and, as such, increased information from this continent would help to define better the spatial distribution of dengue within it and to improve its derivative burden estimates. Among the variables considered, high levels of precipitation and temperature suitability for dengue transmission were most strongly associated with elevated dengue risk, although low precipitation was not found to limit transmission strongly (see Supplementary Information, section C). Proximity to low-income urban and peri-urban centres was also linked to greater risk, particularly in highly connected areas, indicating that human movement between population centres is an important facilitator of dengue spread. These associations have previously been cited9, but have not been demonstrated at the global scale and highlight the importance of including socioeconomic covariates when assessing dengue risk.
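For readers less familiar with the validation statistic, the AUC can be read as the probability that a randomly chosen presence location receives a higher predicted risk than a randomly chosen absence location. The sketch below implements that pairwise definition directly; the scores are made-up illustrative values rather than model output, and the function name is an assumption.

```python
def auc(presence_scores, absence_scores):
    """Probability that a presence outranks an absence (ties count as half)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Illustrative predicted risks only (not taken from the study).
presence = [0.9, 0.8, 0.6, 0.4]
absence = [0.7, 0.3, 0.2, 0.1]

print(f"AUC = {auc(presence, absence):.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of presences from absences.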

We estimate that there were 96 million apparent dengue infections globally in 2010 (Table 1). Asia bore 70% (67 (47–94) million infections) of this burden, and is characterized by large swathes of densely populated regions coinciding with very high suitability for disease transmission. India19,20 alone contributed 34% (33 (24–44) million infections) of the global total. The disproportionate infection burden borne by Asian countries is emphasized in the cartogram shown in Fig. 2c. The Americas contributed 14% (13 (9–18) million infections) of apparent infections worldwide, of which over half occurred in Brazil and Mexico. Our results indicate that Africa’s dengue burden is nearly equivalent to that of the Americas (16 (11–22) million infections, or 16% of the global total), representing a significantly larger burden than previously estimated. This disparity supports the notion of a largely hidden African dengue burden, being masked by symptomatically similar illnesses, under-reporting and highly variable treatment-seeking behaviour6,9,20. The countries of Oceania contributed less than 0.2% of global apparent infections.

Table 1 Estimated burden of dengue in 2010, by continent

We estimate that an additional 294 (217–392) million inapparent infections occurred worldwide in 2010. These mild or asymptomatic infections are not detected by the public health surveillance system and have no immediate implications for clinical management6. However, the presence of this huge potential reservoir of infection has profound implications for: (1) correctly enumerating economic impact (for example, how many vaccinations are needed to avert an apparent infection) and triangulating with independent assessments of disability adjusted life years (DALYs)21; (2) elucidating the population dynamics of dengue viruses22; and (3) making hypotheses about population effects of future vaccine programmes23 (volume, targeting efficacy, impacts in combination with vector control), which will need to be administered to maximize cross-protection and minimize post-vaccination susceptibility.
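Taken together, the two central estimates give the total infection burden and the ratio of inapparent to apparent infections. The short calculation below uses only the point estimates quoted in the text and the 50–100 million range cited earlier; it ignores the credible intervals, and the variable names are illustrative.

```python
# Central estimates for 2010 quoted in the text (infections per year).
apparent = 96e6
inapparent = 294e6

# Widely cited range of 50-100 million infections per year.
who_low, who_high = 50e6, 100e6

total = apparent + inapparent
inapparent_per_apparent = inapparent / apparent
ratio_to_who_high = total / who_high

print(f"total infections: {total / 1e6:.0f} million")
print(f"inapparent per apparent infection: {inapparent_per_apparent:.2f}")
print(f"total relative to upper WHO figure: {ratio_to_who_high:.1f}x")
```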

The absolute uncertainties in the national burden estimates are inevitably a function of population size, with the greatest uncertainties in India, Indonesia, Brazil and China (see full rankings in Supplementary Table 4). In addition, comparing the ratio of the mean to the width of the confidence interval24 revealed the greatest contributors to relative uncertainty (see full rankings in Supplementary Table 4). These were countries with sparse occurrence points and low evidence consensus on dengue presence, such as Afghanistan or Rwanda (see Fig. 2a), or those with ubiquitous high risk, such as Singapore or Djibouti, for which our burden prediction confidence interval is at its widest (see Supplementary Information, section D, Fig. 2). Therefore, increasing evidence consensus and occurrence data availability in low consensus countries and assembling new cohort studies, particularly in areas of high transmission, will reduce uncertainty in future burden estimates. Our approach, uniquely, provides new evidence to help maximize the value and cost-effectiveness of surveillance efforts, by indicating where limited resources can be targeted to have their maximum possible impact in improving our knowledge of the global burden and distribution of dengue.

Our estimates of total infection burden (apparent and inapparent) are more than three times higher than the WHO predicted figure (Supplementary Information, section E). Our definition of an apparent infection is broad, encompassing any disruption to the daily routine of the infected individual, and consequently is an inclusive measurement of the total population affected adversely by the disease. Within this broad class, the severity of symptoms will affect treatment-seeking behaviours and the probability of a correct diagnosis in response to a given infection. Our definition is therefore more comprehensive than those of traditional surveillance systems which, even in the most efficient system, report a much narrower range of dengue infections. By reviewing our database of longitudinal cohort studies, in which total infections in the community were documented exhaustively, we find that the biggest source of disparity between actual and reported infection numbers is the low proportion of individuals with apparent infections seeking care from formal health facilities (see Supplementary Information, section E, Fig. 5 for full analysis). Additional biases are introduced by misdiagnosis and the systematic failure of health management information systems to capture and report presenting dengue cases. By extracting the average magnitude of each of these sequential disparities from published cohort and clinical studies, we can recreate a hypothetical reporting chain with idealized reporting and arrive at estimates that are broadly comparable to those that countries reported to the WHO. This is clearest in more reliable reporting regions such as the Americas. Systemic under-reporting and low hospitalization rates have important implications, for example, in the evaluation of vaccine efficacy based on reduced hospitalized caseloads. Inferences about these biases may be made from the comparison of estimated versus reported infection burdens in 2010, highlighting areas where particularly poor reporting might be strengthened (see Supplementary Information, section E).
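The sequential disparities in such a reporting chain act multiplicatively: each stage passes on only a fraction of the cases that reach it. The sketch below illustrates that structure only. The stage fractions are placeholders invented for illustration; the measured values are given in the Supplementary Information and are not reproduced here, and the function name is an assumption.

```python
def expected_reported_cases(apparent_infections, stage_fractions):
    """Apply each reporting stage in turn; every stage keeps a fraction of cases."""
    reported = float(apparent_infections)
    for stage_name, fraction in stage_fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"fraction for '{stage_name}' must lie in [0, 1]")
        reported *= fraction
    return reported


# HYPOTHETICAL fractions, chosen only to show the mechanics.
hypothetical_chain = [
    ("seeks care at a formal health facility", 0.30),
    ("correctly diagnosed as dengue", 0.50),
    ("captured and reported by the information system", 0.40),
]

reported = expected_reported_cases(1_000_000, hypothetical_chain)
print(f"{reported:,.0f} reported cases per 1,000,000 apparent infections")
```

Because the fractions multiply, even moderate losses at each stage leave only a small share of apparent infections in the reported totals.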

We have strived to be exhaustive in the assembly of contemporary data on dengue occurrence and clinical incidence and have applied new modelling approaches to maximize the predictive power of these data. It remains the case, however, that the empirical evidence base for global dengue risk is more limited than that available, for example, for Plasmodium falciparum25 and Plasmodium vivax26 malaria. Records of disease occurrence carry less information than those of prevalence and, as databases of the latter become more widespread, future approaches should focus on assessing relationships between seroprevalence and clinical incidence as a means of assessing risk27. Additional cartographic refinements are also required to help differentiate endemic- from epidemic-prone areas, to determine the geographic diversity of dengue virus types and to predict the distributions of future risk under scenarios of socioeconomic and environmental change.

The global burden of dengue is formidable and represents a growing challenge to public health officials and policymakers. Success in tackling this growing global threat is, in part, contingent on strengthening the evidence base on which control planning decisions and their impact are evaluated. It is hoped that this evaluation of contemporary dengue risk distribution and burden will help to advance that goal.

Methods Summary

We compiled a database of 8,309 geo-located occurrence records for the period 1960 to 2012 from a combination of published literature and online resources18. All records were standardized annually (that is, repeat records in the same location within a year merged as one occurrence) and underwent rigorous quality control. From a suite of potential environmental and socioeconomic covariates, we chose a relevant subset including: (1) two precipitation variables interpolated from global meteorological stations; (2) an index of temperature suitability for dengue transmission adapted from an equivalent index for malaria28; (3) a vegetation/moisture index; (4) demarcations of urban and peri-urban areas; (5) an urban accessibility metric; and (6) an indicator of relative poverty. We then built a disease distribution model using a boosted regression tree (BRT) framework. To compensate for the lack of absence data, we created an evidence-based probabilistic framework for generating pseudo-absences that mitigated the main biasing factors in pseudo-absence generation29, namely: (1) geographical extent; (2) number; (3) contamination bias; and (4) sampling bias. We then created an ensemble of 336 BRT models using different plausible combinations of these factors and representing independent samples of possible sampling distributions. We calculated the final probability of occurrence (risk) map as the central tendency of these 336 BRT models predicted at a 5 km × 5 km resolution. Exclusion criteria were based on the definitive extents of dengue4 and temperature suitability for dengue transmission28. Using detailed longitudinal information from 54 dengue cohort studies, we defined a relationship between the probability of dengue occurrence and inapparent and apparent incidence using a Bayesian hierarchical model. We defined a negative binomial likelihood function with constant dispersion and a rate characterized by a highly flexible data-driven Gaussian process prior. Uninformative hyperpriors were assigned hierarchically to the prior parameters and the full posterior distribution determined by Markov Chain Monte Carlo (MCMC) sampling. Using human population gridded data, estimates of dengue infections were then calculated nationally, regionally and globally for both apparent and inapparent infections.
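The final aggregation step can be pictured as a per-pixel product of predicted incidence and population, summed within each reporting unit. The sketch below shows that bookkeeping under strong simplifying assumptions: `toy_incidence` is a hypothetical linear placeholder standing in for the relationship estimated by the Bayesian model, and the pixel records are invented.

```python
from collections import defaultdict


def aggregate_infections(pixels, incidence_from_risk):
    """Sum incidence(risk) x population over pixels, grouped by country."""
    totals = defaultdict(float)
    for country, risk, population in pixels:
        totals[country] += incidence_from_risk(risk) * population
    return dict(totals)


def toy_incidence(risk):
    """HYPOTHETICAL placeholder: annual infections per person as a function of risk."""
    return 0.05 * risk


# Invented pixels: (country, probability of occurrence, population).
pixels = [
    ("A", 0.8, 10_000),
    ("A", 0.2, 5_000),
    ("B", 0.5, 20_000),
]

national = aggregate_infections(pixels, toy_incidence)
global_total = sum(national.values())
print(national, global_total)
```

Regional totals follow the same pattern, with pixels grouped by region instead of by country.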

Online Methods

Assembly of the occurrence database and its quality control

Occurrence data comprised point or polygon locations of confirmed dengue infection presence derived from both peer-reviewed literature and HealthMap alerts18,31 (see Supplementary Information, section A). An occurrence was defined as one or more laboratory or clinically confirmed infection(s) of dengue occurring at a unique location (a 5 km × 5 km pixel) within one calendar year. All occurrence data underwent manual review and automatic quality control to ensure information fidelity and precise geo-positioning. In total, 9,648 and 1,622 occurrence locations were obtained from literature searches and HealthMap, respectively. After the quality control procedures, our final data set contained 8,309 occurrence locations (5,216 point locations and 3,093 small polygon centroids) spanning a period from 1960 to 2012. We assumed that any record of dengue occurrence, regardless of its age, represented an environment permissive for the disease, as dengue has expanded from a focal disease in Asia to a cosmopolitan disease of the tropics.
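The definition of an occurrence as a unique pixel within a calendar year implies a simple de-duplication rule. The sketch below illustrates it for point records. The grid is an assumption: cells of 2.5 arc-minutes (24 per degree, roughly 5 km at the equator) are used as a stand-in for the study grid, and the coordinates are invented.

```python
import math

# ASSUMED grid: 24 cells per degree (2.5 arc-minutes, roughly 5 km at the equator).
CELLS_PER_DEGREE = 24


def pixel_id(lat, lon, cells_per_degree=CELLS_PER_DEGREE):
    """Map a latitude/longitude pair to integer (row, column) grid indices."""
    row = math.floor((lat + 90.0) * cells_per_degree)
    col = math.floor((lon + 180.0) * cells_per_degree)
    return row, col


def standardize_occurrences(records):
    """Merge repeat records: keep one occurrence per (pixel, year)."""
    return sorted({(pixel_id(lat, lon), year) for lat, lon, year in records})


# Invented records: (latitude, longitude, year).
records = [
    (13.760, 100.520, 2005),
    (13.761, 100.521, 2005),  # same pixel, same year: merged with the first
    (13.760, 100.520, 2006),  # same pixel, different year: kept
    (-22.900, -43.200, 2005),
]

occurrences = standardize_occurrences(records)
print(len(occurrences), "unique occurrences")
```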

Explanatory covariates

We assembled gridded global data for a suite of eight explanatory covariates. The covariates were chosen based on factors known or hypothesized to contribute to suitability for dengue transmission (see Supplementary Information, section B). These covariates included: (1) annual maximum and minimum precipitation variables from a Fourier-processed32 synoptic annual series interpolated from global meteorological stations33; (2) a biological model combining the effects of temperature on the extrinsic incubation period of dengue virus and lifespan of the Aedes aegypti vector to quantify the dengue-specific temperature suitability for transmission28,34,35; (3) Fourier-processed annual average normalized difference vegetation index36; (4) categorical demarcations of urban and peri-urban areas37; (5) an urban accessibility metric defining the travel time to the nearest city of 50,000 people or more by land- or water-based travel38; and (6) an indicator of relative poverty derived from the finest geographic scale data available for economic productivity and adjusted for purchasing power parity39. None of the covariate grids was adversely affected by multicollinearity (see Supplementary Information, section B), and all grids were standardized to ensure identical spatial resolution, extent and boundaries. For point records, the covariate value was that of the pixel containing the point. For polygon occurrence records, covariate values were averaged across the whole polygon.
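The two extraction rules for point and polygon records can be sketched on a toy grid. The grid values and cell indices below are invented, and the helper names are illustrative; a polygon is represented simply as the list of grid cells it covers.

```python
def point_covariate(grid, row, col):
    """Point record: take the value of the pixel containing the point."""
    return grid[row][col]


def polygon_covariate(grid, cells):
    """Polygon record: average the covariate over all pixels in the polygon."""
    values = [grid[row][col] for row, col in cells]
    return sum(values) / len(values)


# Invented 3 x 3 covariate grid.
grid = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]

point_value = point_covariate(grid, 1, 2)
polygon_value = polygon_covariate(grid, [(0, 0), (0, 1), (1, 0), (1, 1)])
print(point_value, polygon_value)
```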

Predicting the probability of occurrence (risk) of dengue transmission

We used a boosted regression tree (BRT) approach to establish a multivariate empirical relationship between the probability of occurrence of a dengue virus infection and the environmental conditions sampled at each site from the covariate suite. The BRT method has been shown to fit complicated response functions efficiently, while guarding against overfitting, and is therefore widely used for vector and disease distribution mapping40,41. The BRT approach combines regression trees42 with gradient boosting43, whereby an initial regression tree is fitted and iteratively improved upon in a forward stage-wise manner (boosting) by minimizing the variation in the response not explained by the model at each iteration (see Supplementary Information, section C).
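The forward stage-wise idea can be illustrated with a deliberately small example. The sketch below is not the model used in the study: it boosts single-split trees (stumps) on one covariate under squared-error loss, whereas a presence/absence model would normally use a binomial loss and deeper trees. All names, data and settings are assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single split that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # excludes the maximum, so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_value(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    fitted = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - f for y, f in zip(ys, fitted)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        fitted = [f + learning_rate * stump_value(stump, x)
                  for f, x in zip(fitted, xs)]
    return base, learning_rate, stumps


def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_value(s, x) for s in stumps)


# Invented data: one covariate, presence (1) or pseudo-absence (0).
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_boosted_stumps(xs, ys)
print(round(predict(model, 0.9), 3), round(predict(model, 0.1), 3))
```

The learning rate shrinks each tree's contribution, so many small corrections accumulate; this shrinkage is one of the ways boosting guards against overfitting.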

Like other niche mapping approaches, the BRT models require not only presence data but also absence data defining areas of disease absence and potentially unsuitable environmental conditions at unsampled locations. Because data on absence of disease are not definitive, pseudo-absence data are used instead to estimate areas of disease absence. No consensus approach has been developed to optimize the generation of pseudo-absence data and we therefore created an evidence-based probabilistic framework for generating pseudo-absences, incorporating the main biasing factors in pseudo-absence generation, namely: (1) geographical extent; (2) number; (3) contamination bias; and (4) sampling bias. To represent areas of absence, n_a pseudo-absence points29,44,45 were randomly generated based on dengue presence or absence certainty measures at a national or subnational level4. Pseudo-absence locations were restricted to a maximum distance μ from any recorded presence site46,47. Additionally, to compensate for ‘contamination’ of true but unobserved presences within the generated pseudo-absences48, n_p pseudo-presence points were generated using the same procedure used to generate the pseudo-absences. Variation in the parameter set π = {μ, n_a, n_p} resulted in independent samples of the possible states of the real distribution, with all parameter combinations representing a null distribution of possible states. Therefore, rather than using an individual parameter combination from π, we created an ensemble49 of 336 BRT models spanning reasonable ranges in π and evaluated the central tendency as the mean across all 336 BRT models (see Supplementary Information, section C). The final ensemble BRT model was used to predict a global map of the probability of occurrence of dengue virus infection at a 5 km × 5 km resolution.
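Two ingredients of this procedure lend themselves to a small sketch: restricting pseudo-absences to a maximum distance from any presence, and averaging predictions across an ensemble. The example below works on an invented unit square with planar distances, samples candidates uniformly (ignoring the evidence-consensus weighting and the pseudo-presences), and uses a hypothetical parameter grid; none of the values are those used to build the 336 models.

```python
import itertools
import math
import random


def sample_pseudo_absences(presences, n_absence, mu, rng, max_tries=100_000):
    """Rejection-sample points in the unit square within distance mu of a presence."""
    accepted = []
    tries = 0
    while len(accepted) < n_absence and tries < max_tries:
        tries += 1
        candidate = (rng.random(), rng.random())
        if any(math.dist(candidate, p) <= mu for p in presences):
            accepted.append(candidate)
    return accepted


def pixelwise_mean(maps):
    """Ensemble central tendency: mean prediction for each pixel across maps."""
    return [sum(values) / len(values) for values in zip(*maps)]


# Invented presences and a HYPOTHETICAL parameter grid.
presences = [(0.3, 0.3), (0.7, 0.6)]
mu_values = [0.1, 0.2, 0.4]
n_absence_values = [10, 20]

rng = random.Random(0)
pseudo_absence_sets = {
    (mu, n_a): sample_pseudo_absences(presences, n_a, mu, rng)
    for mu, n_a in itertools.product(mu_values, n_absence_values)
}

# Each set would feed one model; predictions are then averaged pixel by pixel.
ensemble = pixelwise_mean([[0.2, 0.4], [0.6, 0.8]])
print(len(pseudo_absence_sets), "parameter combinations;", ensemble)
```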

Estimation of dengue burden and populations at risk

Formal literature searches were conducted for serological dengue virus incidence surveys. Inclusion criteria were restricted to longitudinal surveys of seroconversion to dengue-virus-specific antibodies carried out in parallel with active symptom surveillance in a defined cohort. The surveys were abstracted, standardized and geopositioned (see Supplementary Information, section D). In total, 54 dengue incidence surveys were collected. Of these, 39 contained information about the ratio of inapparent to apparent infections.

The empirical relationship between incidence and the probability of occurrence was represented using a Bayesian hierarchical model. We defined a negative binomial likelihood function50 with constant dispersion and a rate characterized by a highly flexible data-driven Gaussian process prior51. The Gaussian process prior was parameterized with a quadratic mean function and a squared exponential covariance function51. Uninformative hyperpriors were assigned hierarchically to the prior parameters and the full posterior distribution determined by Markov Chain Monte Carlo (MCMC) sampling52. The entire model was fitted separately for apparent and inapparent infection incidences, with missing inapparent to apparent ratio values imputed in the MCMC. Using human population gridded data for the year 2010 (ref. 53), estimates of apparent and inapparent dengue infections were calculated nationally, regionally and globally. These estimates were then compared to national clinical cases reported to the WHO and differences between our cartographic estimates of infections and the WHO surveillance estimates were reconciled in a comparative analysis addressing key factors in traditional surveillance under-reporting (see Supplementary Information, section E).
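The named building blocks of this model have standard forms, sketched below. The parameterization is an assumption: the negative binomial is written in terms of a mean and a dispersion r (variance = mean + mean²/r), and the covariance uses a variance and a length scale. How the Gaussian process is linked to the rate, and the hyperprior choices, are not specified here and are left out.

```python
import math


def quadratic_mean(x, b0, b1, b2):
    """Quadratic mean function of the Gaussian process prior."""
    return b0 + b1 * x + b2 * x * x


def squared_exponential(x1, x2, variance, length_scale):
    """Squared exponential covariance between two inputs."""
    return variance * math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))


def neg_binomial_logpmf(count, mean, dispersion):
    """Log-probability of a count under a mean/dispersion negative binomial."""
    r = dispersion
    return (math.lgamma(count + r) - math.lgamma(r) - math.lgamma(count + 1)
            + r * math.log(r / (r + mean))
            + count * math.log(mean / (r + mean)))


# Illustrative values only.
print(squared_exponential(0.2, 0.5, variance=1.0, length_scale=0.3))
print(neg_binomial_logpmf(3, mean=5.0, dispersion=2.0))
```

With a constant dispersion, only the mean varies between observations, so all of the flexibility in the fitted curve comes from the Gaussian process prior on the rate.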