Declining genetic diversity of European honeybees along the twentieth century

The European honeybee (Apis mellifera) is a key pollinator and has in the last decades suffered significant population decline. A combination of factors, including decrease in genetic diversity and introduction of Varroa mites, have been suggested to be responsible for these losses, but no definitive cause has yet been appointed. In Europe not only have wild colonies been severely affected, but managed hives have had a massive decline in numbers. To test the hypothesis that honeybees’ genetic diversity has decreased in the recent past, we used reduced representation genome sequencing of 40 historical honeybee specimens collected in Natural History collections across Europe and compared them to genomic data from 40 individuals from extant populations (collected post 2006). Our results are consistent with the existence of five evolutionary lineages as previously described, and show a decrease in genetic diversity between historical and extant individuals of the same lineage, as well as high levels of admixture in historical specimens. Our data confirm that a loss of genetic diversity has occurred during the last century, potentially increasing honeybees’ vulnerability to contemporary ecological and anthropogenic stressors.

Scientific RepoRtS | (2020) 10:10520 | https://doi.org/10.1038/s41598-020-67370-2 www.nature.com/scientificreports/ fungi or bacteria) in museum samples still poses limitations to aDNA shotgun sequencing experiments, as most sequences yielded would not be from the specimen. DNA capture-enrichment methods, in contrast, allow targeted sequencing by selective enrichment of sequences of interest prior to sequencing, hence increasing the depth of sequencing over target regions and lowering costs per target 25 .
As honeybees are thought of being at least partially domesticated, they were not usually collected with purpose by naturalists, but rather as incidental bycatches when collecting other insect species thought to have greater natural history value. Curation and annotations were therefore sparse and incomplete. Despite this, several historical collections exist. We took advantage of this, and generated genomic data on historical European honeybee specimens, allowing us to explore the genetic diversity of the species over the last 150 years. In this study, we assess the genetic diversity of honeybees across Europe from different time periods using historical museum collections and ancient DNA (aDNA) techniques.

Material and methods
Data collection. Pin-dried specimens of Apis mellifera covering most of the natural range of honeybees in Europe were obtained from museum collections. As collections were not digitized at the time of sampling, all information from labels were recorded manually, including when available: date of sampling, location name, geographical coordinates, sex, and name of collector. None of the collections had unique identifiers (voucher numbers) for these specimens, so internal identifiers were used consisting of two letter country code, four-digit year of sampling, and, if more than one specimen matched the country/year combination, one letter (a through e). This is the only information available for these specimens, and we therefore do not know if their origin is from feral, wild or managed colonies. As bees in managed hives can freely move around their area, gene flow between managed and feral hives is most likely unrestricted, so we believe this does not affect the interpretation of our results. Subspecies/lineage information was not known a priori when selecting specimens, so sampling attempted to cover a wide geographic and temporal span.
The time span of the specimens is from 1850 to 2002 (See Fig. 1 and Supplementary Table S1). DNA extraction, Illumina library preparations, and PCR setups were performed in a dedicated ancient DNA laboratory. Total genomic DNA was extracted using the non-destructive method described in Gilbert et al. 20 and detailed in Campos and Gilbert 21 . Probes for in-solution, hybridization capture enrichment kits (MYcroarray) were designed for randomly selected gene locations present in gene set AmelOGSV3.2 26 , but also for immune 27  Analysis. Sequencing data were analysed using a set of custom scripts and software (SI Appendix). Adapter sequences were trimmed and filtered for N's and reads shorter than 30 bp were removed using AdapterRemoval 29 . Trimmed reads were initially mapped to Amel 4.5 17, 26 using bwa-0.7.5a-r405, with seed length disabled to improve mapping efficiency in ancient DNA datasets 30 . The alignments were sorted using Samtools 31 and filtered for PCR duplicates using Picard MarkDuplicates-1.88 (https ://picar d.sourc eforg e.net), and for paralogs using BWA. We used ANGSD (Analysis of Next Generation Sequencing Data) 32 for quality filtering and data processing. In all ANGSD analyses, we required a minimum mapping quality of 30 and minimum base quality score of 20. We calculated error rates using an outgroup individual and an error free individual. We randomly selected a modern sample from lineage M from Poland (SRR957058) as the error-free individual, and the outgroup individual used was a modern Apis cerana (SRR957079). We used mapDamage 33 to display nucleotide misincorporation patterns and rescale the quality scores in the bam files. After rescaling we recalculated error rates and compared them with the previous estimates. The rescaled sequences were used in subsequent analyses. We calculated genome-wide coverage in the modern individuals and depth of coverage within the capture regions for both modern and historical individuals. Five historical samples were excluded from further processing, as they had an average coverage below 0.5 ×. Genotype likelihoods were estimated based on the aligned reads and associated mapping, and sequencing quality scores for all individuals.
Population structure. We used NGSadmix version 32 to test the number of genetically distinguishable populations in our data 34 . As the presumptive number of evolutionary lineages in Apis mellifera is five, we ran NGSadmix for K between two and nine. The evolutionary history of the individuals was inferred using Neighbor-Joining 35 . Haploid genotypes from ancient and modern samples were obtained by randomly sampling one read per position of each of the samples with ANGSD. The tree was built using the program RapidNJ 36 . FigTree v.1.4.4 (Rambaut, 2012; https ://tree.bio.ed.ac.uk/softw are/figtr ee/) was used to visualize the tree. To compare the genetic diversity of our dataset with other published honeybee datasets, we built two haplotype networks using mitochondrial DNA sequences. In addition to the modern and historical samples, we used other Apis mellifera mitochondrial sequences downloaded from the NCBI website (Table 1). All mitochondrial sequences were aligned using MAFFT 37 . Cytochrome b (CytB) and the sequence spanning from the beginning of the Cytochrome Oxidase I to the end of Cytochrome Oxidase II (COI-COII) were extracted separately from the alignment, according to the NCBI sequence KM458618.1 38 . Haplotype networks were reconstructed using TempNet 39 . The procedure was repeated using the non-admixed individuals from subsets as defined in Supplementary Table S1.  Table S1). We divide each group further, including four unmixed or less mixed individuals, based on geographic location (Supplementary Table S1). We estimated the population scaled mutation rate θ and the neutrality test statistic Tajima's D according to the method described in Korneliussen et al. 40 . ANGSD was also used to estimate Watterson and Pairwise θ. We used F-statistics to investigate the genetic distance between the populations observed in NGSadmix. Reynold weighted F ST 41 was calculated using ANGSD, for each of the subgroups defined above. We looked for outlier levels of F ST to identify loci that have probably undergone geographically restricted positive selection 42

Results
In this study, we sequenced 46 historical honeybees from 17 different European countries. Depth of coverage in the targeted regions was between 0 × and 59 ×, with an average of 25.75 × (1.43 × genome-wide [0.0 × to 5.02 ×]). Depth of coverage within the targeted regions was on average 20 times higher than in other sites in the genome, ranging from a threefold to a 66-fold increase ( Table 2). Five historical honeybees with depth of coverage below 0.5 × in target regions were excluded from further analysis. Depth of coverage was 7 × in the Apis cerana specimen used as outgroup, while in the rest of the modern specimens, depth of coverage ranged from 12 to 27 × (Table 2). Quality control. Sequences from modern specimens showed relatively low error rates between 0.01% and 0.16%. Historical samples presented higher error rates (between 0.1 and 0.7%), which correspond mainly to post-mortem deamination C->T and G->A (Supplementary Figure S1). These affected mostly the beginning and end of reads, which is to be expected in museum preserved samples. After masking transitions with mapDamage, error rates were halved. Fig. 2A, in parallel with the admixture plot for K = 8 (see next section). This value of K was chosen as it mirrors the clades inferred from the phylogenetic tree. There are two main branches: the top one includes the two African lineages: A and Y. The next lineage to split is lineage O, followed by C and M. Finally, lineage C further subdivides in two groups.

Phylogeny. Phylogenetic relationships between all the individuals is represented in
The haplotype networks obtained from cytochrome b and COI-COII region can be seen in Fig. 3. Both networks show some degree of clustering according to lineages, with individuals belonging to lineages C and M sharing some haplotypes.  Figure S2). Admixture was high for many historical European samples. Individuals from Scandinavia, and some from the Netherlands, England, and France have high levels of admixture. The level of admixture observed for modern samples was low as previously reported in Harpur et al. 28 , but this does not mean admixture does not occur, as individuals were chosen for that study because they were not admixed. www.nature.com/scientificreports/ Nucleotide diversity was higher in historic populations. Supplementary Figure S3 shows the distribution of Watterson theta across the different chromosomes, and a plot with the values of Tajima's D can be seen in Supplementary Figure S4. Nucleotide diversity is highest in historical O lineage individuals and lowest in the Y lineage samples. Individuals belonging to lineages A and M seem to have similar levels of nucleotide diversity. On the other hand, lineage C has lower levels of diversity but not as low as lineage Y. Boxplots with theta Watterson (Fig. 4A) and Tajima's D (Fig. 4B) were made using the sites from the haploid sampling and with the group      Table S3). Two genes are common in the three comparisons: MRJP7, major royal jelly protein, and an unnamed gene that synthesizes a membrane protein (Table 4).

Discussion
Historic data supports five genomic lineages in honeybees. The topology of the Apis mellifera phylogenetic tree which includes both modern and historical samples corroborates previous phylogenetic inferences and its division in five lineages 28,43,44 . In this analysis, we did not compare our data with the sequences from Wallberg et al. 43 , as they used SOLiD sequencing chemistry, and combining data from two different sequencing technologies might have caused biases in our estimates. Nevertheless, Cridland et al. 44 successfully generated a phylogenetic tree which included representatives of A, C, M, O, and Y lineages based on whole genomes that, globally, is congruent with the phylogeny generated in this study based on targeted regions ( Fig. 2A). In both phylogenomic reconstructions, lineage Y seems highly divergent, originating from lineage A shortly after the separation of A/Y from C/M/O. Wallberg et al. 43 supplementary Table S1.
Scientific RepoRtS | (2020) 10:10520 | https://doi.org/10.1038/s41598-020-67370-2 www.nature.com/scientificreports/ was from lineage Y rather than A 44 . Our specimens from Jordan and Lebanon do not show any sign of admixture when K = 5 or more and are basal in the Neighbor-Joining tree in regard to the European lineages. This would imply that admixture from lineage A is more recent than 1976, when our Jordanian samples were collected, or reflect different sampling localities. Harpur et al. 28 did not find any significant admixture between lineages in their data; however, this was to be expected giving their sampling strategy to avoid admixed individuals.
Decrease in genetic diversity in modern honeybees compared to historic populations. Despite the limited sample size, we think that our estimates of genetic diversity are reliable, as both simulation and empirical studies indicate that a large sample size is not required when analysing a large number of SNP markers 45,46 .
Our results suggest that genetic diversity has decreased in European honeybees over the last century. This is supported by the lower nucleotide diversity found in modern C-M samples (π ≅ 0.001), compared to historical ones (π ≅ 0.003). Genetic diversity influences a wide range of phenotypes in honeybee colonies, from expression of antimicrobial compounds, resistance to pathogens, thermoregulation, foraging behaviour and colony defence 8 , all essential to colony survival, and response to environmental stress, with lower genetic diversity reducing the variation of these phenotypes as well. Tasks within a colony, such as defence and hygienic behaviour, are performed by a small subset of workers descendent from only some patrilineal lines 8 . Differences in propensity for certain tasks are believed to be influenced by genetics. For example, hygienic and non-hygienic colonies have a difference in gene expression in Cytochrome P450 gene and a limited number of other genes 47 . This means that when genetic diversity is decreased the number of workers in a colony performing some tasks may decrease or less specialized workers will perform such tasks, decreasing the efficiency of the colony 48 . This may originate from high selection pressure selecting for traits based only on queen performance but ignoring the genetic contribution of drones or failing to maintain sufficient levels of genetic diversity within a colony. We hypothesize that management practices that increase relatedness between colonies, as well as a reduction of number and density of colonies due to, for example, a decrease of suitable habitat availability, are the main factors contributing to the observed decrease of genetic diversity. At the population level, genetic diversity can be affected by selective sweeps, background selection, temporal fluctuations in the direction of selection on segregating alleles 49 , and the level of genetic recombination 48 .
In addition, decreasing density or fewer colonies, demographic expansions, as well as habitat fragmentation would also lower genetic diversity. Colony density in wild populations in Europe is much lower than in African savannahs, despite harsher environmental conditions. This has been associated with more intensive beekeeping in Europe 50 . It has also been found that abiotic factors, such as temperature and land use, are associated with both density of colonies and genetic diversity 51 .
Domestication and professional breeding aim at selecting individuals with specific traits, consciously or unconsciously narrowing genetic variation. Artificial selection on managed hives, however, would only have an indirect effect on wild colonies when drones from managed hives breed with wild queens, or new queens from a managed hive establish a new colony in the wild. This is very frequent, as beekeepers do not track all bees in their colonies. In any case, in many European countries, there are much less wild colonies than managed ones www.nature.com/scientificreports/ due to the lack of suitable nesting places, so gene flow between them has probably masked any genetic difference between them.
Our results seem to contradict those of a study from 2012 52 where Harpur and colleagues find the withincolony diversity to be higher in managed colonies than in wild ones, while our results show global patterns of decline within lineages. It should be emphasized that the two are studying different levels of genetic diversity, theirs within-colony diversity in admixed populations, and ours on a meta-population level looking only at nonadmixed individuals. It is not surprising that Harpur et al. found higher diversity within managed colonies, as beekeepers will bring in stock from other parts of the world, what they called the progenitor populations, and these will admix with the local populations despite beekeeper's intentions, creating higher diversity descendants as de la Rua highlights in a reply to that work 53  Protein is a family of nine genes and one pseudogene located in tandem on a 60 kb cluster located on chromosome 11 54 . This family of genes encodes secretory proteins that are the major protein content of Royal Jelly, a nutrient-rich substance produced by nurse bees used to feed the larvae, which is only found in some genus of Hymenoptera 55 . MRJP seems to have evolved recently deriving from the Yellow family of genes and it seems to have diversified independently in each species where it has been found. In honeybees, MRJP is mostly expressed in workers (particularly nurses) but also other castes 56,57 , and besides being involved in the production of Royal Jelly, it has also been associated with brain function 58 , caste determination and many aspects of eusociality 54 . These functions seem to derive from Royal Jelly's function in establishing division of labour in the colonies through determining the development of larvae into queens and worker. However, their biochemical function is not determined at the moment. MRJP 4 is down regulated in honeybee heads after infection 54 . Given that Royal Jelly Proteins affect many aspects of behaviour, nutrition and development, and that this pattern of selection is found not only when analysing modern bees alone 28 , but also when comparing historical bees to modern bees, we speculate that domestication can be responsible for the selection signal. While MRJP7 is but one gene within the MRJP/Yellow family, we speculate that selection on this gene, due to its association with nutrition and development, could be caused by selective pressure from beekeeping practices such as the desire for higher honey production or more fertile queens.
The observed decrease in genetic diversity could potentially have an impact on the ability of colonies to react and survive to current and upcoming threats, such as pathogens, pesticides, and climate change. The distribution of evolutionary lineages and admixture proportions in honeybees is not fully understood yet. Several geographical regions are under-sampled, such as most of Africa and areas of the Middle East. The fragmented nature of the sampling carried out in most honeybee genetic studies, has made the lineage nomenclature inconsistent, making comparisons among studies difficult, unreliable or impossible. Mapping with better precision the distribution of each lineage and areas of current and past admixture would help us to better understand the population dynamics of honeybees. Natural history collections with proper annotations of sampling locality and date prove once again to be an essential resource to study temporal trends and provide a glimpse of evolutionary processes occurring in historical times.

Data availability
The BAM files of the sequence data mapped to the Apis mellifera genome v4.5 have been deposited in the Short Read Archive under BioProject PRJNA505606. Custom scripts used to analyse the data can be accessed in https ://githu b.com/Lucia RT/code-bees.