A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera


The honeybee Apis mellifera has major ecological and economic importance. We analyze patterns of genetic variation at 8.3 million SNPs, identified by sequencing 140 honeybee genomes from a worldwide sample of 14 populations at a combined total depth of 634×. These data provide insight into the evolutionary history and genetic basis of local adaptation in this species. We find evidence that population sizes have fluctuated greatly, mirroring historical fluctuations in climate, although contemporary populations have high genetic diversity, indicating the absence of domestication bottlenecks. Levels of genetic variation are strongly shaped by natural selection and are highly correlated with patterns of gene expression and DNA methylation. We identify genomic signatures of local adaptation, which are enriched in genes expressed in workers and in immune system– and sperm motility–related genes that might underlie geographic variation in reproduction, dispersal and disease resistance. This study provides a framework for future investigations into responses to pathogens and climate change in honeybees.


Figure 1: Geographic distribution of genetic variation and demographic history of honeybees.
Figure 2: Genetic variation associated with gene function.
Figure 3: Genes under positive selection.
Figure 4: Function of selected genes.




We thank I. Fries, J. Evans, M. Lodesani, M. Spivak, I. Arvidsson, A. Yusuf and P. Rosenkranz for providing samples. We also thank I. Jonasson, C. Tellgren-Roth, L. Nyberg and the Uppsala Genome Centre for performing the sequencing and mapping. Bioinformatic analyses were performed using resources at the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX). We thank I. Fries, D. Hultmark, A. Eyre-Walker, M. Jakobsson, L. Andersson and K. Lindblad-Toh for helpful discussions and comments on the manuscript. This study was funded by the Swedish Research Council Formas (grant 2010-1295).

Author information




M.T.W. conceived the study. A.W. and M.T.W. wrote the manuscript. A.W. performed all the analyses, including DNA extraction, quality control and variant calling, analysis of diversity and divergence, demographic analysis, gene annotation, selection analysis and GO analysis. M.T.W. contributed to all analyses. F.H. and G.W. contributed to the analysis of the levels of genetic variation and population substructure. B.D., M.K., N.H., Z.L.P.S., M.H.A., I.K., P.D.l.R. and C.W.P. provided samples and gave comments on the manuscript.

Corresponding authors

Correspondence to Andreas Wallberg or Matthew T. Webster.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Diploid genotype quality control using a haploid drone sequenced to 20× depth of coverage.

Haploid drone data were intentionally misspecified to be diploid in the FreeBayes SNP calling process, and variable positions shared by the drone and the population data set were compared. This figure shows the distributions of heterozygous genotypes in the population data set conditional on their genotype in the resequenced drone (blue, all population data set SNPs; pink, SNPs where the drone had a homozygous difference compared to the reference genome; gray, SNPs where the drone had a heterozygous genotype). Chromosomal positions with heterozygous drone genotypes demonstrated extremely elevated levels of heterozygous genotypes in the population data set (gray) compared to the L-shaped distribution of all population SNPs (blue) and the positions where the drone had homozygous differences relative to the reference sequence (pink). SNPs overlapping the heterozygous drone positions were therefore removed from the population data set, as they are unlikely to represent true SNPs.
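The filtering step described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `(chromosome, position)` tuple layout, the genotype labels and the function name `filter_drone_het_sites` are all hypothetical, and genotypes are assumed to have been called already.

```python
def filter_drone_het_sites(population_snps, drone_genotypes):
    """Drop population SNPs at positions where the haploid drone was called heterozygous.

    population_snps: iterable of (chromosome, position) tuples (hypothetical layout).
    drone_genotypes: dict mapping (chromosome, position) to one of
        "hom_ref", "hom_alt" or "het" (hypothetical labels).
    """
    # A haploid genome cannot be truly heterozygous, so a "het" call in the
    # drone marks a position where genotype calls are unreliable.
    suspect = {site for site, gt in drone_genotypes.items() if gt == "het"}
    # Keep the original order of the population SNP list.
    return [site for site in population_snps if site not in suspect]


snps = [("chr1", 100), ("chr1", 250), ("chr2", 40)]
drone = {("chr1", 100): "hom_alt", ("chr1", 250): "het"}
kept = filter_drone_het_sites(snps, drone)
```

Positions absent from the drone call set are kept here, because the caption only describes removing SNPs that overlap heterozygous drone positions.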

Supplementary Figure 2 Genotype representation.

(a) The number of genotypes called per sample (y axis) compared to the average depth of coverage of the sample (x axis). The number of SNP positions at which a sample had its genotype inferred varied between 5 and 8 million and is correlated with its depth of coverage (i.e., poorly represented samples had more unknown genotypes before alleles were imputed with BEAGLE). (b) The number of SNPs where each population had one or more genotypes called out of all ten samples (y axis) compared to the average depth of coverage for the population (x axis). Populations sequenced at lower depth generally had more missing genotypes than those sequenced at higher depth. The missing genotypes appeared to be partly randomly distributed among samples and were smoothed out at the population level. Of the 8.3 million SNPs in the data set, all populations (see legend in a) are represented by at least 5 genotypes at 8 million SNPs and by 8 genotypes at 7 million SNPs. The A. mellifera anatoliaca population had two samples sequenced at low coverage (A), resulting in the lowest genotype representation at the population level. In total, about 6% of the genotypes in the SNP matrix were missing and were imputed with BEAGLE.

Supplementary Figure 3 Neighbor-joining trees based on allele sharing distances.

Trees were constructed using different subsets of the data using samples from native subspecies. (a) All data. (b) Coding sequences. (c) Genic sequences (introns, exons and UTRs). (d) Intergenic sequences. All trees show consistent topologies.
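The caption does not state how the allele sharing distance was calculated. The sketch below uses one common definition for biallelic SNPs, in which two samples differ by 0, 0.5 or 1 at a site depending on how many alleles they fail to share. The 0/1/2 genotype coding, the treatment of missing calls and the function name are assumptions for illustration only.

```python
def allele_sharing_distance(g1, g2):
    """Mean allele sharing distance between two diploid samples.

    g1, g2: sequences of genotypes coded as alternate-allele counts (0, 1, 2),
    with None for a missing call. Both the coding and this definition of the
    distance are assumptions, not taken from the source.
    """
    total = 0.0
    n_sites = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip sites where either genotype is missing
        # 0 vs 2 shares no alleles (distance 1); 0 vs 1 or 1 vs 2 shares one
        # allele (distance 0.5); identical genotypes share both (distance 0).
        total += abs(a - b) / 2.0
        n_sites += 1
    if n_sites == 0:
        raise ValueError("no sites with both genotypes called")
    return total / n_sites


sample_x = [0, 1, 2, 2, None]
sample_y = [0, 2, 0, 1, 1]
distance_xy = allele_sharing_distance(sample_x, sample_y)
```

Applying this function to every pair of samples gives a symmetric distance matrix, which is the kind of input a neighbor-joining implementation expects.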

Supplementary Figure 4 Linkage disequilibrium decays rapidly in honeybees.

The linkage disequilibrium (LD) between SNPs as measured using the r² estimator (y axes) is shown over increasing distance between the SNPs (x axes). (a) LD was found to decay quickly across the natural, domestic and Africanized populations. All populations have an r² value of <0.2 at 1 kb. The rate of decay of LD reflected population size and was most rapid in the African populations. The highest LD was found among European bees. The domestic lineages had an intermediate decay of LD. (b) The corresponding decay of LD after grouping the samples according to major continental groups (A, n = 30; C, n = 20; M, n = 30; O, n = 20) shows that r² is below 0.1 in all groups at distances of greater than 1,500 bp.
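For readers unfamiliar with the r² statistic plotted here, the following sketch computes it for one pair of biallelic SNPs using the standard definition r² = D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA pB. It assumes phased haplotypes coded 0/1; the function name and input layout are illustrative and do not describe the software used in the study.

```python
def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    hap_a[i] and hap_b[i] are the alleles carried by haplotype i at SNP A
    and SNP B. Phased, complete data are assumed.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and of equal length")
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    # Frequency of haplotypes carrying allele 1 at both SNPs.
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom


r2_linked = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
r2_unlinked = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])
```

A decay curve like the one in this figure is obtained by averaging such pairwise values within bins of physical distance between SNPs.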

Supplementary Figure 5 Model-based clustering and ancestral inferences.

(a) A tree representation of the genetic relationships between the individual honeybee samples. fineSTRUCTURE was used to infer interrelationships among the samples, recovering a topology where most populations are identified as monophyletic clades (with a few outliers) and structured according to the four continental groups. South African A. mellifera capensis samples were recovered inside the A. mellifera scutellata population. The domestic bees clustered mainly with the bees from central Europe (C group). Africanized bees from Brazil were grouped with African bees (A group). (b) Allelic ancestry of each sampled honeybee. ADMIXTURE was used to infer allelic origins, and the samples were sorted according to fineSTRUCTURE topology. At K = 4, ADMIXTURE subdivides samples according to the four major groups inferred with fineSTRUCTURE. The domestic bees have mainly C-type alleles but also show some proportion of alleles with other origins. The Africanized bees appear to have mainly African alleles but also carry a remaining proportion of alleles originating from western Europe (M group). ADMIXTURE detects introgression of African alleles into the A. mellifera syriaca population. About 18% of A. mellifera syriaca alleles appear to be of African origin. (c) The co-ancestry matrix estimated in fineSTRUCTURE shows high levels of co-ancestry within the populations and groups and also detects introgression of African alleles into the A. mellifera syriaca population. (d) The cross-validation error according to the number of hypothetical ancestral groups (K). The cross-validation (CV) procedure implemented in ADMIXTURE was applied to all data (140 samples, including subspecies, domestic and Africanized samples; blue) and a data set including only the natural subspecies (100 samples; red). In both cases, the optimal value for K was found to be 4 (lowest CV errors), suggesting that the variation is best summarized as being subdivided according to the four ancestral groups.

Supplementary Figure 6 TreeMix analysis.

The relationships, divergence and major mixtures between populations were inferred and illustrated as trees using TreeMix. (a) The TreeMix ML tree explains 98.2% of the variation. It identifies the four major groups and places the root between the A + M and C + O clusters. (b) The most important mixing event (arrow) based on the residuals involves the migration of African alleles into the O group and explains an additional 0.6% of the variation. (c,d) The inferred interrelationships and major mixing event (arrow) after pooling the samples according to the continental groups are consistent with the population tree.

Supplementary Figure 7 FST and MDS analyses of population divergence.

(a) The fixation index (FST) matrix reconstructed from estimates of pairwise FST between all natural populations. There is evidence of large amounts of drift separating the populations of western and northern Europe (A. mellifera mellifera and A. mellifera iberiensis; M group) from those of central/southeastern Europe (A. mellifera carnica and A. mellifera ligustica; C group) and the introgression of African alleles into the A. mellifera syriaca population. (b) The corresponding FST matrix of pairwise distances between samples when aggregated according to the major groups is similar to that among populations. (c) Upper left, neighbor-joining (NJ) tree reconstructed from the pairwise FST distance matrix shows that the outgroup connects to the middle of the tree rather than within any of the groups. Lower right, the corresponding splits network reconstructed using SplitsTree indicates that the tree-like resolution is lowest at the basal positions among the groups. (d) The principal-component analysis (PCA) plot of the first two components (PC1 and PC2) as recovered by multidimensional scaling (MDS) in PLINK further supports the four major groups and the dominance of C-group alleles in domestic bees. The Africanized hybrid bees appear most similar to the native African samples.
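The caption does not say which FST estimator was used. As a rough illustration of what the index measures, the sketch below applies the textbook form FST = (HT − HS) / HT to a single biallelic SNP in two equally weighted populations. It ignores sample-size corrections, and the function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """Textbook F_ST for one biallelic SNP from two population allele frequencies.

    Uses F_ST = (H_T - H_S) / H_T with equal population weights. This is a
    simplification for illustration, not the estimator used in the source.
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        raise ValueError("F_ST is undefined when the SNP is monomorphic overall")
    return (h_t - h_s) / h_t


fst_fixed = fst_two_populations(1.0, 0.0)
fst_same = fst_two_populations(0.3, 0.3)
fst_partial = fst_two_populations(0.8, 0.2)
```

A fixed difference between two populations gives a value of 1 and identical allele frequencies give 0, which is why thresholds such as FST ≥ 0.9 in later figures pick out nearly fixed differences.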

Supplementary Figure 8 Genealogical concordance analyses.

(a) Population split events (blue dashed lines) in each group (A, M, C, O) as represented by the sampled honeybees were assumed to represent the origin of each population. (b) Population ages estimated by genealogical concordance (within, splits within the major groups, as shown in a; between, splits between groups). The proportion of concordant sites was computed from whole-genome patterns of allele sharing among samples and used to estimate the divergence times T in units of 1.5Ne. The effective population size was computed from the levels of sequence variation in each population. The mutation rate based on the divergence between A. mellifera and A. cerana (5.27 × 10⁻⁹) was used to transform ages into units of time, assuming a generation turnover of 1 year. Modern subspecies diverged around 20,000 YBP, whereas the major continental groups appear to have diverged around 200,000–300,000 YBP. The method is sensitive to external gene flow, as shown by the introgression of African alleles into the A. mellifera syriaca population, which overestimates the age of its split from A. mellifera anatoliaca. Because of admixture, the A. mellifera syriaca population was not included when estimating the ages of the splits between the O group and the other three groups.
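The unit conversion described in this caption can be sketched as follows. The caption states that T is measured in units of 1.5Ne generations and that one generation is taken as 1 year. It does not give the formula used to obtain Ne from sequence variation, so the relation θ = 3Neμ used below (a standard expectation for haplodiploid or X-linked loci) is an assumption, as are the function names and the example values.

```python
def ne_from_theta(theta_per_site, mutation_rate):
    """Effective population size from per-site diversity.

    Assumes theta = 3 * Ne * mu, the usual expectation for haplodiploids.
    The source does not state which relation was used.
    """
    if mutation_rate <= 0:
        raise ValueError("mutation rate must be positive")
    return theta_per_site / (3.0 * mutation_rate)


def coalescent_time_to_years(t_scaled, ne, generation_years=1.0):
    """Convert a divergence time given in units of 1.5 * Ne generations to years."""
    return t_scaled * 1.5 * ne * generation_years


# Hypothetical inputs chosen for round numbers, not values from the study.
ne_example = ne_from_theta(theta_per_site=3e-3, mutation_rate=1e-9)
years_example = coalescent_time_to_years(t_scaled=0.02, ne=ne_example)
```

With the mutation rate quoted in the caption, the same functions would be called with `mutation_rate=5.27e-9` and the diversity estimated for the population of interest.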

Supplementary Figure 9 PSMC analysis.

The historical effective population size (Ne) (y axes) was traced back in time (x axes) using the Pairwise Sequentially Markovian Coalescent (PSMC) model applied to four samples sequenced at 20× depth of coverage. PSMC reconstructs the variation of historical Ne values from patterns of heterozygosity along the chromosomes. (a) Analysis of the complete data set. Generation turnover time was specified to be 1 year and the mutation rate was estimated to be 0.53 × 10⁻⁸ mutations per base and generation, computed from divergence observed with the outgroup species A. cerana (confidence intervals computed from 125 bootstrap replicates). (b) Analysis of only transversions. Old coalescent events and historical population sizes were reconstructed with greater resolution when the genotype data were thinned to include only transversions, suggesting alternating patterns of population expansion and contraction that match the last two glaciation cycles (error bars are computed as in a; the transversion mutation rate was estimated to be 0.15 × 10⁻⁸).

Supplementary Figure 10 Genome-wide patterns of population differentiation.

(a) The mean fixation index (FST) of coding and noncoding SNPs detected in three comparisons (top, A versus MC; middle, C versus AM; bottom, M versus AC; A, Africa; M, northern/western Europe; C, southern/eastern Europe) as estimated for genes of different caste expression and CpG O/E categories. The FST of SNPs in coding regions (CDS + UTRs) is not significantly higher than the level of differentiation detected in noncoding (intron + intergenic) regions. There are no signs of genome-wide signals of positive selection as detected by FST on coding regions across all genes in the genome or among genes with caste-biased overexpression. The coding regions of low-GC/CpG genes have a small (12%) increase in mean FST compared to high-GC/CpG genes, a signal that is less clear in noncoding regions (NS, genes with unbiased expression; Q, queen-biased genes; W, worker-biased genes; D, drone-biased genes; LCpG, low-CpG genes; HCpG, high-CpG genes; LGC, low-GC genes; HGC, high-GC genes; 95% confidence intervals were estimated by bootstrap). (b) GC content in a 100-bp window around each SNP (y axes) is weakly correlated with FST (x axes; R² < 2%) and mainly detected as a 10–20% drop in the extreme tail along the FST distribution for SNPs in coding regions. (c) Substitutions in coding regions were analyzed for codon changes (proportion of nonsynonymous SNPs; y axes) across the FST spectrum (x axes). High-FST substitutions (FST ≥ 0.9; fnon = 0.42) were found to be significantly enriched for nonsynonymous changes between African and European bees (A versus MC) compared to low-FST substitutions (FST < 0.9; fnon = 0.28) and in the C versus AM comparison (FST ≥ 0.9; fnon = 0.31 versus FST < 0.9; fnon = 0.26) but not the M versus AC comparison (FST ≥ 0.9; fnon = 0.28 versus FST < 0.9; fnon = 0.28) by bootstrapping the SNPs and assessing the variation around the mean at the 5% level. The large enrichment of highly differentiated nonsynonymous SNPs suggests that protein sequences are under divergent positive selection in African and European bees.
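The comparison of nonsynonymous fractions by bootstrapping SNPs can be illustrated with a simple percentile bootstrap. The caption does not specify the bootstrap variant or the number of replicates, so the percentile method, the default of 1,000 replicates, the function name and the toy data below are all assumptions.

```python
import random


def bootstrap_fraction_ci(flags, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the fraction of True values.

    flags: list of booleans, one per SNP (True = nonsynonymous).
    Returns (point_estimate, lower_bound, upper_bound).
    """
    if not flags:
        raise ValueError("need at least one SNP")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(flags)
    estimates = []
    for _ in range(n_boot):
        # Resample SNPs with replacement and recompute the fraction.
        resample = rng.choices(flags, k=n)
        estimates.append(sum(resample) / n)
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return sum(flags) / n, lower, upper


# Toy data: 42 nonsynonymous SNPs out of 100, echoing fnon = 0.42 in the caption.
high_fst_flags = [True] * 42 + [False] * 58
point, ci_low, ci_high = bootstrap_fraction_ci(high_fst_flags)
```

Two FST classes would then be judged to differ at the 5% level when their intervals do not overlap, which is one simple reading of the procedure the caption describes.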

Supplementary Figure 11 Levels of nucleotide diversity around SNPs conditional on FST.

Neutral diversity (Watterson’s theta, θw) in the African bees was measured in 1-kb windows up to a distance of 250 kb on either side of every SNP detected between African and European bees and grouped into FST intervals (size = 0.05; the last interval contains the fixed SNPs). The mean diversity close to a SNP of a particular FST class (1–20 kb away) was compared to the diversity of all SNPs (at 100–250 kb away, matching the average diversity computed across the whole genome). (a) All SNPs. Diversity is reduced by 8% around SNPs with FST = 0.95–1.00 and by 23% around fixed SNPs with FST = 1 (pink, FST = 0.9–0.95; green, FST = 0.95–1.00; blue, FST = 1; 95% bootstrap confidence intervals are shown; average diversities computed for all lower FST classes shown in dark blue). (b) Coding SNPs. SNPs in coding regions are associated with the greatest reduction in diversity (FST(1) = 31%; FST(0.95–1.00) = 32%; FST(0.9–0.95) = 16%; FST(0.85–0.90) = 15%). (c) Intronic SNPs. It is mainly the fixed SNPs in introns that show reduced levels of variation (23%). (d) Intergenic SNPs. These SNPs are only weakly associated with reduced variation (FST(1) = 6%).
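Watterson's estimator, used for the diversity values in this figure, divides the number of segregating sites S in a window by the harmonic number a_n = 1 + 1/2 + ... + 1/(n − 1), where n is the number of sampled chromosomes. The sketch below computes a per-site value for one window and the relative reduction in diversity near a SNP compared with the background. The function names and example numbers are illustrative only.

```python
def watterson_theta(num_segregating, num_chromosomes, window_length):
    """Per-site Watterson's theta for a single window.

    num_segregating: number of segregating sites S in the window.
    num_chromosomes: number of sampled chromosomes n (must be at least 2).
    window_length: window size in base pairs (1,000 for the 1-kb windows above).
    """
    if num_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    if window_length <= 0:
        raise ValueError("window length must be positive")
    # a_n is the (n - 1)th harmonic number.
    a_n = sum(1.0 / i for i in range(1, num_chromosomes))
    return num_segregating / (a_n * window_length)


def diversity_reduction(theta_near, theta_background):
    """Fractional drop in diversity near a SNP relative to the background."""
    if theta_background <= 0:
        raise ValueError("background diversity must be positive")
    return 1.0 - theta_near / theta_background


theta_example = watterson_theta(num_segregating=5, num_chromosomes=2, window_length=1000)
reduction_example = diversity_reduction(theta_near=0.0077, theta_background=0.01)
```

In the second call, a local diversity of 0.0077 against a background of 0.01 corresponds to a 23% reduction, the same size of effect the caption reports around fixed SNPs.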

Supplementary Figure 12 Properties of highly differentiated SNPs.

The highly differentiated (top 0.1%) SNPs in the 3 pairwise comparisons between the African and European groups (6,179 SNPs with FST ≥ 0.89 in A versus MC; 4,799 SNPs with FST ≥ 0.95 in C versus AM; 6,283 SNPs with FST ≥ 0.96 in M versus AC) were subdivided according to honeybee genes, gene regions and caste-biased expression. (a) The CpG O/E levels of all putatively selected genes (green) associated with the highly differentiated SNPs between African and European bees are elevated compared to all genes (gray background). Likewise, the subsets of potentially selected caste-biased genes converge on high CpG O/E levels relative to all genes of the caste (Fig. 4b) (Q, queen; W, worker; D, drone), suggesting that unmethylated or differentially methylated genes are important targets of selection and adaptation in honeybees. (b) The functional associations of the Drosophila orthologs of the honeybee genes were analyzed with g:Profiler for functional enrichments. Across all contrasts and the 3 gene regions, 184 biological process (BP) terms were found to be significantly enriched in the SNP data sets, mostly driven by strong enrichments in intronic SNPs. The BP-associated noncoding regions were clustered into major categories using the REVIGO web service, spanning cellular development and signaling, regulation, response to stimulus and morphogenic development. SNPs in coding regions were significantly enriched for microtubule-based movement, detecting the large number of sperm motor proteins undergoing selection between African and European bees.

Supplementary Figure 13 Examples of genes with highly differentiated SNPs.

About 2,000–3,000 genes were associated with the top 0.1% of the most differentiated SNPs across three different comparisons (A versus MC, C versus AM, M versus AC; A, African; M, northern/western Europe; C, southern/eastern Europe; see Fig. 4a for details), including genes involved in the social, reproductive and immune systems. (a) Model of the vitellogenin receptor (yl) gene (light blue above graph; exons are shown as blocks; introns are shown as dashed lines; the arrow indicates the strand; neighboring genes shown in gray) involved in honeybee worker labor division and behavior. The allelic differentiations of SNPs among bees from different groups were estimated using the FST fixation index (y axis; circles, noncoding SNPs; filled circles, coding SNPs; triangles, nonsynonymous coding SNPs; FST measured in 1-kb windows shown as lines) and are illustrated at their coordinates along the gene (x axis). All honeybee groups (A, C, M) are associated with nonsynonymous SNPs at FST > 0.9, but the exons along the last 4 kb of the gene in C-group bees are the most differentiated. (b) Model of the microtubular dynein heavy chain 7 sperm motor protein gene (symbols as in a) showing highly differentiated SNPs and regions, found mainly in the A and M groups. The African bees have nonsynonymous SNPs distributed across several exons. (c) Model of the microtubular dynein intermediate chain 3 sperm motor gene (symbols as in a) showing a large cluster of nonsynonymous SNPs in an exon separating African and European bees. (d) Model of the immune gene hemocytin/hemolectin (Hml) showing several highly differentiated SNPs (FST > 0.9) in the African bees and a haplotype-like signature in M-group bees at lower fixation.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–13, Supplementary Tables 1 and 2, and Supplementary Note. (PDF 3757 kb)

Supplementary Data Set 1

Gene and gene region associations of the top 0.1% most highly differentiated SNPs among honeybee groups in three comparisons (A versus MC, C versus AM, M versus AC). (XLSX 591 kb)

Supplementary Data Set 2

Significantly enriched gene ontology (GO) terms of Drosophila orthologs of honeybee genes associated with the top 0.1% most highly differentiated SNPs. (XLSX 35 kb)


Cite this article

Wallberg, A., Han, F., Wellhagen, G. et al. A worldwide survey of genome sequence variation provides insight into the evolutionary history of the honeybee Apis mellifera. Nat Genet 46, 1081–1088 (2014).
