Aedes aegypti and Ae. albopictus are the main vectors transmitting dengue and chikungunya viruses. Despite being pathogens of global public health importance, knowledge of their vectors’ global distribution remains patchy and sparse. A global geographic database of known occurrences of Ae. aegypti and Ae. albopictus between 1960 and 2014 was compiled. Herein we present the database, which comprises occurrence data linked to point or polygon locations, derived from peer-reviewed literature and unpublished studies including national entomological surveys and expert networks. We describe all data collection processes, as well as geo-positioning methods, database management and quality-control procedures. This is the first comprehensive global database of Ae. aegypti and Ae. albopictus occurrence, consisting of 19,930 and 22,137 geo-positioned occurrence records respectively. Both datasets can be used for a variety of mapping and spatial analyses of the vectors and, by inference, the diseases they transmit.
Background & Summary
Aedes aegypti [=Stegomyia aegypti
1] and Ae. albopictus [=Stegomyia albopicta
1] are disease vectors for many important viral human diseases such as dengue, chikungunya and yellow fever2,
Aedes aegypti is a predominantly urban vector, utilising the abundance of artificial containers as larval sites and feeding almost exclusively on humans12. Aedes albopictus can more often be found in peri-urban and rural environments, feeding readily on a variety of mammalian (including humans) and avian species13.
Aedes mosquito surveys are performed to better understand ecological and epidemiological aspects of the vectors as well as to assist disease surveillance and control14,
The database described here contains information on the known global occurrences of the adults, pupae, larvae or eggs of Ae. aegypti and Ae. albopictus globally from 1960 and 2014.
By including data from a variety of sources we were able to create the largest currently available standardised up-to-date global dataset for both Ae. aegypti and Ae. albopictus (Fig. 1), containing 42,067 geo-positioned occurrences.
PubMed (http://www.ncbi.nlm.nih.gov/nlmcatalog/journals) was searched using the term ‘Aedes’ OR ‘aegypti’ OR ‘albopictus’ for the years 1960 to 2013. The Medical Subject Headings (MeSH) term technology used in the PubMed citation archive ensured all pseudonyms were automatically included (http://www.nlm.nih.gov/mesh) in the searches. The same process was repeated for ISI Web of Science (http://wok.mimas.ac.uk) and ProMED (http://www.promedmail.org). The searches were last updated on 15th November 2013. No language restrictions were placed on these searches; however, only those citations with a full title and abstract were retrieved. This resulted in a collection of 8,597 references, of which 2,804 unique articles were identified from their abstracts as potentially containing useable location data. In-house language skills allowed processing of all English, French, Portuguese and Spanish articles. Confirmed Aedes occurrences within these articles were entered into the database. Occurrences were classified as confirmed when the article clearly stated the presence of the vector at a specific time in a specific location. This includes transient populations, i.e., found in ports or only during the summer months. Only for Europe were we able to include information of transient versus established populations using expert opinions. Laboratory studies were included if the mosquito/larvae were collected from the wild specifically for the purpose of the study. Occurrences were recorded separately for both species. More specific information about ‘sub-species’ or ‘genetic characteristics’ for example, were recorded where available but not included in the final database. This information can be obtained from the authors upon request. Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.47v3c lists full references for each published record in the database.
In addition to the data directly sourced from published literature, primary and unpublished occurrence data from national entomological surveys were obtained through contact with administrators of these surveys when possible (Fig. 1). This includes primary data for Ae. albopictus provided from an earlier published article by Carvalho et al. 20 Collections have been part of the Levantamento Rapido de Indice para Aedes aegypti (LIRAa) in Brazil and are described in full elsewhere20,21. Similarly, Ae. aegypti primary data with geographic locations were also provided by an entomological survey directly from the Ministry of Health of Brazil for 2013. All occurrences for Brazil were classified as polygons (see geo-positioning methods) as they represent surveys conducted in Brazilian municipalities with their respective centroids being used as geographic information in the database. Primary and unpublished occurrence points for both species were provided by Elyazar (2014) for Indonesia. Moore and Barker (2014) provided summary data from the United States, collected between 1985–2014 from published and unpublished sources, including geographic location at the USA county administrative level which were included as polygons in the database. In addition, Teng (2014) added occurrence records for both species based on national entomological surveys performed in Taiwan between 2004–2013. European Ae. albopictus records were derived from the European Centre for Disease Prevention and Control (ECDC) funded TIGERMAPS and VBORNET datasets provided by Schaffner and Hendrickx (2013). Data were based on Nomenclature of territorial units for statistics 3 (NUTS3) level for Europe from 1979–2013. NUTS3 centroids were calculated in ArcGIS, occurrences classified as polygons and added to the database.
Geo-positioning of data from published sources
All available location information was extracted for each occurrence from the relevant primary research article. The site name was used together with all contextual information provided about the site position to determine its latitudinal and longitudinal coordinates using Google Maps (https://www.maps.google.co.uk), Google Earth (http://www.google.co.uk/intl/en_uk/earth), or other online geo-positional databases including Geonames (http://www.geonames.org), Fallingrain (http://www.fallingrain.com/world/index.html) or as a last resort, using simple Google searches. Place names are often duplicated within a country, so contextual information was used to ensure the right site was selected. When the site name was not found, information from the text, was also used to scan sites in the approximate area to check for alternate spelling of the site name. If the study site could be geo-positioned to a specific latitude and longitude within a 5 km×5 km pixel, it was termed a ‘point location’. For each occurrence that could not be assigned a single 5 km×5 km pixel, e.g., a large city, the occurrence was entered as a polygon data type. Polygon occurrences were subsequently classified based on the polygon size they correspond to as either between 5–10 km2, 10–25 km2, 25–100 km2 or >100 km2. All locations were then linked to administrative units as recognised by the FAO Global Administrative Unit Layer (GAUL) system22. This initial database then underwent spatial and temporal standardisation and finally technical validation.
Occurrence database management: spatial and temporal standardisation
As the database was compiled from many different sources and several institutions, it was first necessary to standardise the data entries such that identical locations which may have been geo-positioned slightly differently were given the same unique identifier. Point records were given the same unique identifier if they lay within the same 5 km×5 km pixel within a global grid. Finally, any record associated with a polygon measuring larger than 111 km×111 km at the equator (1 degree) was removed from the database (n=475, n=54 for Ae. aegypti and Ae. albopictus respectively).
Similarly, it was necessary to temporally standardise the database to avoid duplicates. We chose to define a single occurrence at a given unique location (as identified above) within one calendar year. This was particularly important for oversampled regions that undergo multiple yearly surveys such as Taiwan and involved a procedure which: (i) disaggregated any records which were in the same location but spanning multiple years into individual records for each respective year and then (ii) aggregated all records with the same unique location identifier and occurring within the same year to form a single occurrence record. This led to 1,112 and 370 records being removed for Ae. aegypti and Ae. albopictus, respectively.
This database is publicly available online as a comma-delimited file for both species independently for ease of use and the ability to import it into a variety of software programs (Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.47v3c). Each of the rows represents a single occurrence record (one or more Aedes cases in the same unique location within a single calendar year). The fields contained in the database are as follows:
VECTOR: Identifying the species; Ae. aegypti or Ae. albopictus
OCCURRENCE_ID: Unique identifier for each occurrence in the database after temporal and spatial standardisation.
SOURCE_TYPE: Published literature or unpublished sources with reference ID that corresponds to the full list of references in Data Citation 1: Dryad Digital Repository http://dx.doi.org/10.5061/dryad.47v3c.
LOCATION_TYPE: Whether the record represents a point or a polygon location.
POLYGON_ADMIN: Admin level or polygon size which the record represents when the location type is a polygon. −999 when the location type is a point (5 km×5 km).
X: The longitudinal coordinate of the point or polygon centroid (WGS1984 Datum).
Y: The latitudinal coordinate of the point or polygon centroid (WGS1984 Datum).
YEAR: The year of the occurrence.
COUNTRY: The name of the country within which the occurrence lies.
COUNTRY_ID: ISO alpha-3 country codes.
GAUL_AD0: The country-level global administrative unit layer (GAUL) code (see http://www.fao.org/geonetwork) which identifies the Admin-0 polygon within which any smaller polygons and points lie.
STATUS: Established versus transient populations.
The following procedures were carried out on the final database to ensure the accuracy and validity of the occurrence records.
A raster distinguishing land from water22 was created at a 5 km×5 km resolution and was used to ensure all occurrences were positioned on a valid land pixel (n=95 and n=64 records were removed for Ae. aegypti and Ae. albopictus respectively).
We cross-validated all of the unique occurrence locations against temperature-based Aedes population persistence metrics developed by Brady et al. 23 In brief, this classification was determined by modelling the effect of temperature on adult Ae. aegypti and Ae. albopictus survival and length of first gonotrophic cycle, the interaction of which determines whether the population can persist. Population persistence was then predicted on a global scale using interpolated meteorological data24. Occurrences that fell outside this range were re-checked to ensure the quality of the occurrence records.
The result is a database consisting of 19,930 and 22,137 geo-positioned occurrences in total worldwide for Ae. aegypti and Ae. albopictus respectively, broken down by region, location type and source type in Fig. 1. In Figs 2 and 3 the global geographic distribution of both species is displayed. Figs 4 and 5 show occurrence records by year and region for both Ae. aegypti and Ae. albopictus respsectively. The increase in occurrence records since 2013 is largely attributable to the LIRAa data from Brazil.
The dataset described here can be used to investigate the spatial and temporal patterns of Aedes distribution at multiple scales and resolutions. As Ae. aegypti and Ae. albopictus are invasive species, spreading to new areas via shipping routes and human movement25,
This dataset was first used in an ecological niche modelling framework along with a set of environmental covariates to map the global distribution of each species32. A generic code to produce the global risk maps is openly available as an R software package ‘seegSDM’ from GitHub (https://github.com/SEEG-Oxford/seegSDM). Such maps can help to guide vector surveillance efforts in countries where the distribution of both species is not well-known, but which are at high risk for importation of related viruses.
Regional biases in density of occurrence records are apparent and may be due to differences in the amount of regular surveillance, differences in the number of published studies and availability of routinely collected data. Use on a global scale, however, would need to take into account geographical sampling bias as done in Kraemer et al. using similarly biased background points in a presence-only niche modelling approach31,32. The method for accounting for sampling bias, however, might vary depending on the research question asked and methodology applied in subsequent analyses.
How to cite this article: Kraemer, M. U. G. et al. The global compendium of Aedes aegypti and Ae. albopictus occurrence. Sci. Data 2:150035 doi: 10.1038/sdata.2015.35 (2015).
Kraemer, M. U. G. Dryad Digital Repository http://dx.doi.org/10.5061/dryad.47v3c (2015).
M.U.G.K. is funded by the German Academic Exchange Service (DAAD) through a graduate scholarship. M.E.S. is funded by a project grant from the Bill and Melinda Gates Foundation via the VecNet consortium (http://vecnet.org). O.J.B. is funded by a BBSRC studentship. J.P.M. and G.R.W.W. are funded by the International Research Consortium on Dengue Risk Assessment Management and Surveillance (IDAMS, European Commission 7th Framework Programme (21803), :http://www.idams.eu, Publication #28). I.R.F.E. is funded by the Wellcome Trust (#B9RZGS0). VBORNET is an ECDC funded project (contract number ECDC/09/018). S.I.H. is funded by a Senior Research Fellowship from the Wellcome Trust (#095066) which also supports A.M. and K.A.D. S.I.H. and C.M.D. also acknowledge funding support from the RAPIDD program of the Science & Technology Directorate, Department of Homeland Security, and the Fogarty International Center, National Institutes of Health. FMS is funded by the Rhodes Trust.
We thank Dr Roseli La Corte dos Santos, Departamento de Morfologia, Universidade Federal de Sergipe for providing useful guidance, the Ministry of Health of Brazil for providing additional data, Bimandra Djaafara and Karin Dian Lestari of EOCRU for their work on geo-positioning data from Indonesia. We thank all contributors to the Europe Ae. albopictus dataset (a detailed list in Data Citation 1).
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0 Metadata associated with this Data Descriptor is available at http://www.nature.com/sdata/ and is released under the CC0 waiver to maximize reuse.