  • Letter

Genetic diversity of the African malaria vector Anopheles gambiae


The sustainability of malaria control in Africa is threatened by the rise of insecticide resistance in Anopheles mosquitoes, which transmit the disease1. To gain a deeper understanding of how mosquito populations are evolving, here we sequenced the genomes of 765 specimens of Anopheles gambiae and Anopheles coluzzii sampled from 15 locations across Africa, and identified over 50 million single nucleotide polymorphisms within the accessible genome. These data revealed complex population structure and patterns of gene flow, with evidence of ancient expansions, recent bottlenecks, and local variation in effective population size. Strong signals of recent selection were observed in insecticide-resistance genes, with several sweeps spreading over large geographical distances and between species. The design of new tools for mosquito control using gene-drive systems will need to take account of high levels of genetic diversity in natural mosquito populations.

Figure 1: Patterns of genomic variation.
Figure 2: Geographical population structure and migration.
Figure 3: Population size history.
Figure 4: Evolution and spread of insecticide resistance in the Vgsc gene.

Accession codes

Primary accessions

European Nucleotide Archive


  1. Hemingway, J. et al. Averting a malaria disaster: will insecticide resistance derail malaria control? Lancet 387, 1785–1788 (2016)

  2. Bhatt, S. et al. The effect of malaria control on Plasmodium falciparum in Africa between 2000 and 2015. Nature 526, 207–211 (2015)

  3. della Torre, A. et al. Molecular evidence of incipient speciation within Anopheles gambiae s.s. in West Africa. Insect Mol. Biol. 10, 9–18 (2001)

  4. Lawniczak, M. K. N. et al. Widespread divergence between incipient Anopheles gambiae species revealed by whole genome sequences. Science 330, 512–514 (2010)

  5. Tene Fossog, B. et al. Habitat segregation and ecological character displacement in cryptic African malaria mosquitoes. Evol. Appl. 8, 326–346 (2015)

  6. Diabaté, A. et al. Larval development of the molecular forms of Anopheles gambiae (Diptera: Culicidae) in different habitats: a transplantation experiment. J. Med. Entomol. 42, 548–553 (2005)

  7. Gimonneau, G. et al. A behavioral mechanism underlying ecological divergence in the malaria mosquito Anopheles gambiae. Behav. Ecol. 21, 1087–1092 (2010)

  8. Dao, A. et al. Signatures of aestivation and migration in Sahelian malaria mosquito populations. Nature 516, 387–390 (2014)

  9. Leffler, E. M. et al. Revisiting an old riddle: what determines genetic diversity levels within species? PLoS Biol. 10, e1001388 (2012)

  10. Hammond, A. et al. A CRISPR–Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae. Nat. Biotechnol. 34, 78–83 (2016)

  11. Lehmann, T. et al. The Rift Valley complex as a barrier to gene flow for Anopheles gambiae in Kenya. J. Hered. 90, 613–621 (1999)

  12. Lehmann, T. et al. Population structure of Anopheles gambiae in Africa. J. Hered. 94, 133–147 (2003)

  13. Slotman, M. A. et al. Evidence for subdivision within the M molecular form of Anopheles gambiae. Mol. Ecol. 16, 639–649 (2007)

  14. Pinto, J. et al. Geographic population structure of the African malaria vector Anopheles gambiae suggests a role for the forest–savannah biome transition as a barrier to gene flow. Evol. Appl. 6, 910–924 (2013)

  15. Cruickshank, T. E. & Hahn, M. W. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol. Ecol. 23, 3133–3157 (2014)

  16. Service, M. W. Mosquito (Diptera: Culicidae) dispersal—the long and short of it. J. Med. Entomol. 34, 579–588 (1997)

  17. Lee, Y. et al. Spatiotemporal dynamics of gene flow and hybrid fitness between the M and S forms of the malaria mosquito, Anopheles gambiae. Proc. Natl Acad. Sci. USA 110, 19854–19859 (2013)

  18. Neafsey, D. E. et al. SNP genotyping defines complex gene-flow boundaries among African malaria vector mosquitoes. Science 330, 514–517 (2010)

  19. Clarkson, C. S. et al. Adaptive introgression between Anopheles sibling species eliminates a major genomic island but not reproductive isolation. Nat. Commun. 5, 4248 (2014)

  20. Norris, L. C. et al. Adaptive introgression in an African malaria mosquito coincident with the increased usage of insecticide-treated bed nets. Proc. Natl Acad. Sci. USA 112, 815–820 (2015)

  21. Vicente, J. L. et al. Massive introgression drives species radiation at the range limit of Anopheles gambiae. Sci. Rep. 7, 46451 (2017)

  22. Nwakanma, D. C. et al. Breakdown in the process of incipient speciation in Anopheles gambiae. Genetics 193, 1221–1231 (2013)

  23. Li, S., Schlebusch, C. & Jakobsson, M. Genetic variation reveals large-scale population expansion and migration during the expansion of Bantu-speaking peoples. Proc. R. Soc. Lond. B 281, 20141448 (2014)

  24. Noor, A. M., Amin, A. A., Akhwale, W. S. & Snow, R. W. Increasing coverage and decreasing inequity in insecticide-treated bed net use among rural Kenyan children. PLoS Med. 4, e255 (2007)

  25. Mwangangi, J. M. et al. Shifts in malaria vector species composition and transmission dynamics along the Kenyan coast over the past 20 years. Malar. J. 12, 13 (2013)

  26. Davies, T. G. E., Field, L. M., Usherwood, P. N. R. & Williamson, M. S. A comparative study of voltage-gated sodium channels in the Insecta: implications for pyrethroid resistance in Anopheline and other Neopteran species. Insect Mol. Biol. 16, 361–375 (2007)

  27. Mitchell, S. N. et al. Metabolic and target-site mechanisms combine to confer strong DDT resistance in Anopheles gambiae. PLoS ONE 9, e92662 (2014)

  28. Edi, C. V. et al. CYP6 P450 enzymes and ACE-1 duplication produce extreme and multiple insecticide resistance in the malaria mosquito Anopheles gambiae. PLoS Genet. 10, e1004236 (2014)

  29. Jones, C. M. et al. Footprints of positive selection associated with a mutation (N1575Y) in the voltage-gated sodium channel of Anopheles gambiae. Proc. Natl Acad. Sci. USA 109, 6614–6619 (2012)

  30. Ross, R. Inaugural lecture on the possibility of extirpating malaria from certain localities by a new method. BMJ 2, 1–4 (1899)

  31. Sharakhova, M. V. et al. Update of the Anopheles gambiae PEST genome assembly. Genome Biol. 8, R5 (2007)

  32. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009)

  33. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011)

  34. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinformatics 11, 11.10.1–11.10.33 (2013)

  35. Delaneau, O., Howie, B., Cox, A. J., Zagury, J.-F. & Marchini, J. Haplotype estimation using sequencing reads. Am. J. Hum. Genet. 93, 687–696 (2013)

  36. Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009)

  37. Patterson, N., Price, A. L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006)

  38. Kopelman, N. M., Mayzel, J., Jakobsson, M., Rosenberg, N. A. & Mayrose, I. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K. Mol. Ecol. Resour. 15, 1179–1191 (2015)

  39. Bhatia, G., Patterson, N., Sankararaman, S. & Price, A. L. Estimating and interpreting FST: the impact of rare variants. Genome Res. 23, 1514–1521 (2013)

  40. Liu, X. & Fu, Y.-X. Exploring population size changes using SNP frequency spectra. Nat. Genet. 47, 555–559 (2015)

  41. Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. & Bustamante, C. D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009)

  42. Keightley, P. D., Ness, R. W., Halligan, D. L. & Haddrill, P. R. Estimation of the spontaneous mutation rate per nucleotide site in a Drosophila melanogaster full-sib family. Genetics 196, 313–320 (2014)

  43. Schrider, D. R., Houle, D., Lynch, M. & Hahn, M. W. Rates and genomic consequences of spontaneous mutational events in Drosophila melanogaster. Genetics 194, 937–954 (2013)

  44. Browning, B. L. & Browning, S. R. Detecting identity by descent and estimating genotype error rates in sequence data. Am. J. Hum. Genet. 93, 840–851 (2013)

  45. Browning, S. R. & Browning, B. L. Accurate non-parametric estimation of recent effective population size from segments of identity by descent. Am. J. Hum. Genet. 97, 404–418 (2015)

  46. Garud, N. R., Messer, P. W., Buzbas, E. O. & Petrov, D. A. Recent selective sweeps in North American Drosophila melanogaster show signatures of soft sweeps. PLoS Genet. 11, e1005004 (2015)

  47. Sabeti, P. C. et al. Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913–918 (2007)

  48. Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007)

  49. Sayre, R. G. et al. A New Map of Standardized Terrestrial Ecosystems of Africa (American Association of Geographers, 2013)

  50. Sharakhova, M. V. et al. Genome mapping and characterization of the Anopheles gambiae heterochromatin. BMC Genomics 11, 459 (2010)



The authors would like to thank the staff of the Wellcome Trust Sanger Institute Sample Logistics, Sequencing and Informatics facilities for their contributions. This work was supported by the Wellcome Trust (090770/Z/09/Z; 090532/Z/09/Z; 098051) and Medical Research Council UK and the Department for International Development (DFID) (MR/M006212/1). M.K.N.L. was supported by MRC grant G1100339. S.O.’L. and A.B. were supported by a grant from the Foundation for the National Institutes of Health through the Vector-Based Control of Transmission: Discovery Research (VCTR) program of the Grand Challenges in Global Health initiative of the Bill & Melinda Gates Foundation. D.W., C.S.W., H.D.M. and M.J.D. were supported by Award Numbers U19AI089674 and R01AI082734 from the National Institute of Allergy and Infectious Diseases (NIAID). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIAID or NIH. T.A. was supported by a Sir Henry Wellcome Postdoctoral Fellowship.

Author information

Details of author contributions are given in the consortium author list.

Corresponding authors

Correspondence to Alistair Miles, Martin J. Donnelly, Mara K. N. Lawniczak or Dominic P. Kwiatkowski.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer information: Nature thanks J. Pool and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Overview of population sampling.

Red circles show sampling locations for wild-caught mosquitoes. Colours in the map represent ecosystem classes; dark green represents forest ecosystems; see figure 9 in ref. 49 for a complete colour legend. The Congo Basin tropical rainforest is the large region of dark green in Central Africa. Sampling details for each site are shown in light grey boxes, including country (two-letter country code), location and year of collection, predominant ecosystem classification for the local region, and number and sex of individuals sequenced. For colony crosses, the direction of cross (colony of origin of mother and father) and number of offspring are shown. The inset map depicts geological fault lines in the East African rift system. Species assignment for Guinea-Bissau and Kenya specimens is uncertain; see main text. Sequencing depth per individual is shown as median (5th–95th percentile) for each population.

Extended Data Figure 2 Genome accessibility and haplotype validation.

a, Percentage of accessible bases in non-overlapping 400-kb windows. The schematic of chromosomes below shows chromatin state predictions from ref. 50. b, Haplotypes inferred in the crosses. Each panel shows either maternal or paternal haplotypes from a single cross. Each row within a panel represents a single progeny haplotype. Haplotypes are coloured by parental inheritance (blue denotes allele from parent’s first chromosome; red denotes allele from parent’s second chromosome). Switches between colours along a haplotype indicate recombination events. Regions that were within a run of homozygosity in the parent and thus not informative for haplotype validation are masked in grey. c, Error rate estimates for haplotypes inferred in wild-caught individuals. Top plots show estimates for the mean switch distance (red line), compared to the mean switch distance if heterozygotes were phased randomly (black line). Bottom plots show the switch error rate (probability of a switch error occurring between two adjacent heterozygous genotype calls).
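Where the true phase is known, as in the crosses, a switch error rate of the kind shown in panel c can be computed by comparing an inferred haplotype with the true one at heterozygous sites only. The minimal sketch below assumes alleles coded 0/1 at heterozygous sites; the function name is hypothetical, and the definition of mean switch distance used here (span of heterozygous sites divided by the number of correctly phased segments) is an assumption for illustration, not the consortium's exact procedure.

```python
from typing import Sequence, Tuple


def count_switch_errors(
    true_hap: Sequence[int],
    inferred_hap: Sequence[int],
    positions: Sequence[int],
) -> Tuple[int, float, float]:
    """Compare an inferred haplotype with the true haplotype at heterozygous
    sites only (alleles coded 0/1).

    Returns (number of switch errors, switch error rate, mean switch distance).
    Hypothetical helper: mean switch distance is taken here as the physical
    span of the heterozygous sites divided by (switches + 1).
    """
    if not (len(true_hap) == len(inferred_hap) == len(positions)):
        raise ValueError("inputs must have equal length")
    # True where the inferred haplotype carries the same allele as the truth.
    agrees = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch error is a change of phase orientation between adjacent hets.
    n_switches = sum(
        1 for k in range(1, len(agrees)) if agrees[k] != agrees[k - 1]
    )
    n_adjacent_pairs = len(agrees) - 1
    rate = n_switches / n_adjacent_pairs if n_adjacent_pairs > 0 else 0.0
    span = positions[-1] - positions[0] if positions else 0
    mean_switch_distance = span / (n_switches + 1)
    return n_switches, rate, mean_switch_distance


if __name__ == "__main__":
    truth = [0, 1, 1, 0, 1, 0]
    inferred = [0, 1, 0, 1, 0, 0]
    pos = [100, 200, 300, 400, 500, 600]
    print(count_switch_errors(truth, inferred, pos))
```

Because only changes in orientation are counted, an inferred haplotype that is the exact complement of the truth has no switch errors, which is the usual convention for this statistic.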

Extended Data Figure 3 Variant discovery and nucleotide diversity.

a, Number of variant alleles discovered per individual mosquito. Only females are plotted. b, Genetic diversity within populations. Nucleotide diversity (π) and Tajima’s D were calculated in non-overlapping 20-kb genomic windows. SNP density depicts the distribution of allele frequencies (site frequency spectrum) for each population, scaled such that a population with constant size over time is expected to have a constant SNP density over all allele frequencies. c, Average nucleotide diversity (π) and ratio of diversity between sex-linked (X) and autosomal (A) chromosomes in relation to gene architecture. d, Relationship between number of individuals sampled and the cumulative number of variant sites discovered (left), availability of conserved Cas9 target sites within genes (centre), and number of genes containing at least 1 conserved Cas9 target site which could thus be ‘targetable’ for gene drive (right).
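The windowed statistics in panel b can be illustrated with the minimal sketch below. It assumes biallelic SNPs summarized as alternate (or, for the spectrum, derived) allele counts among n sampled chromosomes within one window, and uses the standard Tajima (1989) constants. Interpreting the scaled SNP density as i·ξᵢ (the count of sites with i derived alleles multiplied by i) is an assumption consistent with the caption, since under a constant-size neutral model the expected ξᵢ is proportional to 1/i. Function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def site_pi(k: int, n: int) -> float:
    """Unbiased expected heterozygosity at one site with k alternate alleles
    among n sampled chromosomes."""
    return 2.0 * k * (n - k) / (n * (n - 1))


def window_pi(allele_counts: Sequence[int], n: int, n_accessible: int) -> float:
    """Nucleotide diversity per accessible base in one window."""
    return sum(site_pi(k, n) for k in allele_counts) / n_accessible


def tajimas_d(allele_counts: Sequence[int], n: int) -> float:
    """Tajima's D from alternate-allele counts at the SNPs of one window."""
    seg = [k for k in allele_counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return float("nan")
    theta_pi = sum(site_pi(k, n) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between pairwise and Watterson estimators, standardized.
    return (theta_pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def scaled_sfs(derived_counts: Sequence[int], n: int) -> List[float]:
    """Unfolded site frequency spectrum multiplied by derived allele count i,
    for i = 1..n-1; expected to be flat under constant population size."""
    sfs = Counter(k for k in derived_counts if 0 < k < n)
    return [float(i * sfs.get(i, 0)) for i in range(1, n)]
```

Applied genome-wide, these functions would be called once per non-overlapping 20-kb window, with n_accessible set to the number of accessible bases in that window.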

Extended Data Figure 4 ADMIXTURE analysis.

a, Ancestry proportions within individual mosquitoes for ADMIXTURE models from K = 2 to K = 10 ancestral populations. Each vertical bar represents the proportion of ancestry within a single individual, with colours corresponding to ancestral populations. These data are the average of the major q-matrix clusters derived by CLUMPAK analysis. b, Violin plot of cross-validation error for each of 100 replicates for each K value.

Extended Data Figure 5 Population structure and differentiation.

a, Principal components analysis of the 765 wild-caught mosquitoes. b, Average allele frequency differentiation (FST) between pairs of populations. The bottom left triangle shows average FST values for each population pair. The top right triangle shows the Z score for each FST value estimated via a block-jackknife procedure39. CM* denotes the Cameroon savannah sampling site only. c, Allele sharing in doubleton (f2) variants. The height of each coloured bar represents the probability of sharing a doubleton allele between two populations. Heights are normalized row-wise for each population.
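As a hedged illustration of panel b, the sketch below computes Hudson's FST as a ratio of averages, in the form recommended by Bhatia et al. (ref. 39), and a delete-one-block jackknife standard error from which a Z score follows. It assumes alternate-allele counts per SNP for two populations and a fixed number of SNPs per block; block size and function names are hypothetical choices, not parameters taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def hudson_terms(
    ac1: Sequence[int], n1: int, ac2: Sequence[int], n2: int
) -> Tuple[List[float], List[float]]:
    """Per-SNP numerator and denominator of Hudson's FST.

    ac1/ac2 are alternate-allele counts; n1/n2 are sampled chromosomes."""
    nums, dens = [], []
    for a1, a2 in zip(ac1, ac2):
        p1, p2 = a1 / n1, a2 / n2
        # Squared frequency difference corrected for finite sample size.
        nums.append((p1 - p2) ** 2
                    - p1 * (1 - p1) / (n1 - 1)
                    - p2 * (1 - p2) / (n2 - 1))
        dens.append(p1 * (1 - p2) + p2 * (1 - p1))
    return nums, dens


def fst_block_jackknife(
    nums: Sequence[float], dens: Sequence[float], block_size: int
) -> Tuple[float, float, float]:
    """Ratio-of-averages FST with delete-one-block jackknife standard error.

    Returns (fst, standard_error, z_score)."""
    starts = range(0, len(nums), block_size)
    block_num = [sum(nums[i:i + block_size]) for i in starts]
    block_den = [sum(dens[i:i + block_size]) for i in starts]
    k = len(block_num)
    if k < 2:
        raise ValueError("need at least two blocks")
    total_num, total_den = sum(block_num), sum(block_den)
    fst = total_num / total_den
    # Recompute FST with each block left out in turn.
    loo = [(total_num - bn) / (total_den - bd)
           for bn, bd in zip(block_num, block_den)]
    mean_loo = sum(loo) / k
    se = math.sqrt((k - 1) / k * sum((x - mean_loo) ** 2 for x in loo))
    z = fst / se if se > 0 else math.inf
    return fst, se, z
```

Summing numerators and denominators before dividing, rather than averaging per-SNP ratios, avoids the upward bias from rare variants that ref. 39 describes.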

Extended Data Figure 6 Ancestry informative markers.

Rows represent individual mosquitoes (grouped by population) and columns represent SNPs (grouped by chromosome arm). Colours represent species genotype. The column at the far left shows the species assignment according to the conventional molecular test based on a single marker on the X chromosome, which was performed for all individuals except Kenya (KE). The column at the far right shows the genotype for kdr variants in Vgsc codon 995. Lines at the lower edge show the physical locations of the AIM SNPs.

Extended Data Figure 7 Population size history.

a, Stairway plot of inferred histories for each population. The shaded area shows the 95% confidence interval from 199 bootstrap replicates. b, Inferred histories from three-epoch ∂a∂i models41. The thick line shows the history with the highest likelihood found by optimization; thin lines show 100 histories with the highest likelihoods from even sampling of the model parameter space. c, Inferred histories from ∂a∂i two-population models allowing for migration. For each population pair, solutions from the 5 optimization runs with the highest likelihoods are shown, with the thick line showing the history with the highest likelihood. In all panels, time and Ne are scaled assuming 11 generations per year and a mutation rate of μ = 3.5 × 10⁻⁹. Scaling of time and Ne is proportional to 1/μ; for example, if the true mutation rate were twice as high, estimates of time and Ne would be halved. ya, years ago.
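The 1/μ dependence follows from the usual conversions, sketched below under stated assumptions: θ = 4Neμ per site for a diploid population, and model time expressed in units of 2Ne generations (the convention used by ∂a∂i). The θ value is hypothetical and not a result from the paper; the 11 generations per year and μ = 3.5 × 10⁻⁹ are the values stated in the caption.

```python
GENERATIONS_PER_YEAR = 11  # value stated in the caption
MU = 3.5e-9                # per-site, per-generation rate stated in the caption


def ne_from_theta(theta_per_site: float, mu: float = MU) -> float:
    """Effective population size, assuming theta = 4 * Ne * mu (diploid)."""
    return theta_per_site / (4.0 * mu)


def years_from_scaled_time(t: float, ne: float,
                           gens_per_year: float = GENERATIONS_PER_YEAR) -> float:
    """Convert a time in units of 2 * Ne generations into years."""
    return t * 2.0 * ne / gens_per_year


if __name__ == "__main__":
    theta = 0.014  # hypothetical per-site theta, for illustration only
    for mu in (MU, 2 * MU):
        ne = ne_from_theta(theta, mu)
        years = years_from_scaled_time(0.5, ne)
        print(f"mu={mu:.1e}  Ne={ne:,.0f}  0.5 time units = {years:,.0f} years")
```

With these hypothetical inputs, Ne is about 1,000,000 and 0.5 time units correspond to roughly 90,900 years; doubling μ halves both, as the caption states.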

Extended Data Figure 8 Identity by descent and recent effective population size history.

a, Patterns of IBD sharing within populations. Each marker represents a pair of individuals. b, The distribution of IBD tract lengths within populations. c, Recent population size history for the Kenyan population inferred by the IBDNe program45. d, Comparison of the IBD tract length distribution between Kenya and four simulated demographic scenarios. e, Population size histories inferred by IBDNe (red dashed lines) from data generated by simulations (black line shows the simulated population size history). f, Comparison of patterns of IBD sharing generated by simulations (black contour lines) with Kenyan data (filled blue contours). See Supplementary Information 8.4 for details of simulations. ga, generations ago.

Extended Data Figure 9 Genome scans for signatures of recent selection.

a, Haplotype diversity. Each track plots the H12 statistic in non-overlapping windows over the genome. A value of 1 indicates low haplotype diversity within a window, expected if one or two haplotypes have risen to high frequency owing to recent selection. A value of 0 indicates high haplotype diversity, expected in neutral regions. b, XP-EHH scans. For each population comparison (for example, BF gambiae versus BF coluzzii), positive scores indicate longer haplotypes and therefore recent selection in the first population (for example, BF gambiae), and negative scores indicate selection in the second population (for example, BF coluzzii).
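The H12 statistic of panel a (ref. 46) pools the two most frequent haplotypes before summing squared haplotype frequencies, so it approaches 1 when one or two haplotypes dominate a window. The minimal sketch below assumes haplotypes given as strings of alleles and windows defined by a fixed number of SNPs; both representations and the function names are illustrative assumptions.

```python
from collections import Counter
from typing import List, Sequence


def h12(haplotypes: Sequence[str]) -> float:
    """H12 for one window: (p1 + p2)^2 + sum of p_i^2 for i >= 3, where
    p1 >= p2 >= ... are haplotype frequencies."""
    n = len(haplotypes)
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    if len(freqs) == 1:
        return 1.0  # a single haplotype: no diversity
    return (freqs[0] + freqs[1]) ** 2 + sum(p ** 2 for p in freqs[2:])


def h12_windows(haplotypes: Sequence[str], window_snps: int) -> List[float]:
    """H12 in non-overlapping windows of a fixed number of SNPs."""
    n_snps = len(haplotypes[0])
    return [
        h12([h[start:start + window_snps] for h in haplotypes])
        for start in range(0, n_snps - window_snps + 1, window_snps)
    ]
```

Pooling the top two haplotypes makes the statistic sensitive to soft sweeps, in which two or more haplotypes carrying a beneficial allele rise in frequency together.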

Extended Data Figure 10 Haplotype structure at metabolic insecticide-resistance loci.

Plot components are as described for Fig. 4. For both loci, SNPs shown in the bottom panel are all either non-synonymous or splice site variants, and are associated with one or more haplotypes under selection. a, Haplotype clustering using 1,375 SNPs within the region 3R: 28,591,663–28,602,280 spanning 8 genes (Gste1–Gste8). b, Haplotype clustering using 1,844 SNPs within the region 2R: 28,491,415–28,502,910 spanning 5 genes (Cyp6p1–Cyp6p5).

Supplementary information

Life Sciences Reporting Summary (PDF 71 kb)

Supplementary Information

This file contains Supplementary Text and Data – see contents page for details. (PDF 1759 kb)

Supplementary Data

This file contains Supplementary Table 1 (XLSX 50 kb)

Supplementary Data

This file contains Supplementary Table 2 (XLSX 42 kb)

Supplementary Data

This file contains Supplementary Table 3 (XLSX 14 kb)

About this article

Cite this article

The Anopheles gambiae 1000 Genomes Consortium. Genetic diversity of the African malaria vector Anopheles gambiae. Nature 552, 96–100 (2017).
