Background & Summary

The straw-coloured fruit bat (Eidolon helvum) is a common, widely distributed, migratory species, occurring across sub-Saharan Africa and some offshore islands (Fig. 1)1,2. Since 2007, investigations into the epidemiology and ecology of zoonotic viral infections in E. helvum have been undertaken via longitudinal sampling of wild populations in Ghana. Complementing this, between 2008 and 2011, and in 2014, cross-sectional sampling events were undertaken to determine the genetic population structure of E. helvum, and to assess whether the serological findings in Ghana were representative across the species’ range (Fig. 1).

Figure 1: Map of E. helvum sampling locations for genetic and serological analyses.

Shading represents the distribution range of E. helvum. Sampling locations are numbered as in Table 1 (available online only). Adapted with permission from Mickleburgh et al.43 and Peel et al.12.

Four viruses were the focus of our serological surveys in E. helvum bats: Lagos bat virus (LBV), African henipaviruses, Achimota virus 1 (AchPV1) and Achimota virus 2 (AchPV2). Lagos bat virus is one of at least 15 known species in the Lyssavirus genus3 and has been isolated from E. helvum on multiple occasions4,5. An African henipavirus has yet to be isolated; however, a full genome sequence has been obtained (putative name: African bat henipavirus Eid_hel/GH-M74a/GHA/2009 (M74))6. Achimota viruses 1 and 2 are closely related rubulaviruses for which serological evidence suggestive of spillover to humans in Africa exists7.

The specific aims of the data collection were to:

  • Investigate whether antibodies to LBV, henipaviruses and Achimota viruses are present in E. helvum across its continental and island range, and to explore the antibody dynamics where possible.

  • Describe the genetic metapopulation structure of E. helvum using a combination of mitochondrial (mtDNA) and microsatellite markers.

  • Gather information on E. helvum distribution and seasonal patterns of reproduction.

  • Combine results from these multidisciplinary studies to make inferences about virus transmission dynamics, and ultimately make inferences on the spillover risk to human populations.

Samples in this dataset are from 2,827 bats from nine countries over an 8-year period (Fig. 2, Table 1 (available online only)). Raw data comprise spatial (roost location), seasonal (timing of sampling and seasonal birth pulses), morphological (forearm length, body weight), demographic (age, sex, reproductive status, mother-offspring relationships) and identification (individually numbered thumb-band) components. Generated data include genetic characterisation (mtDNA sequencing and microsatellite genotyping) and serological assay results (for LBV, henipaviruses and Achimota viruses).

Figure 2: Sampling intensity per month, by country.

Red represents high sampling intensity, and the number of samples collected per month is recorded within each grid cell. Records in the database with unknown collection date are not represented here (nine from Annobón, seven from Bioko, 14 from Príncipe, 10 from Rio Muni and 13 from São Tomé).

Table 1 Overview of Eidolon helvum sampling locations and sample types used in this study

Multiple publications have arisen from these data; however, many aspects remain unexplored. Demographic analyses have estimated birth and survival rates8,9, and explored the effect of hunting on the latter9. Variations in roost composition have suggested a fission-fusion social structure9. Serological analyses have identified: the presence of antibodies against LBV, henipaviruses and Achimota viruses in E. helvum in Ghana7,10,11 and more broadly across the species’ range, including isolated off-shore islands7,12,13; that these viruses circulate endemically in E. helvum in Ghana, with evidence of horizontal transmission7,14,15; and that E. helvum bats previously infected with LBV can survive long-term post-infection16. The henipavirus dataset was used to develop a Bayesian method to determine appropriate cutoffs for serological assays17. Population genetic analyses identified that E. helvum are panmictic across their continental range, but that genetically isolated populations exist on isolated islands12.

Other publications arising from these samples, but based on analyses not included here, include the development of a universal real-time assay and a pseudotype neutralisation assay for Lyssaviruses18,19, microsatellite loci characterisation20, estimation of divergence times between Eidolon sister species21, inference of movement ecology based on stable isotope ratios22, demonstration of Ebola antibodies16,23, identification of multiple novel viruses24, and novel Bartonella species in bat flies collected from E. helvum25,26.

This dataset contributes a substantial volume of data on the ecology of E. helvum and its viruses and will be valuable for a wide range of studies. In particular, an age-specific dataset such as this is rare and valuable for wildlife, especially bats. Further analyses could include viral transmission dynamic modelling in age-structured populations, including the use of cutting-edge Bayesian approaches to address complex epidemiological questions27; time-series analyses on 5 years of wild henipavirus serological data from the same study site in Ghana (n=1486 data points); investigation of seasonal reproductive asynchrony in wide-ranging species; ecological niche modelling; inference of island colonisation history; exploration of relationships between island and body size; and various spatial analyses of demographic, morphometric or serological data. Field samples (e.g. serum, blood cells, urine, skin samples) and extracted DNA from individual bats in this dataset exist in storage and the authors are open to collaborative requests to undertake further analyses.


Methods

These methods are expanded and modified versions of descriptions in our previous publications, as cited in each section below. All associated data can be found in ‘Eidolon helvum data 2007–2014.csv’ [Data Citation 1].

Capture and Data Collection

Capture and sampling information has been described previously (e.g. refs 10,12). Sampling locations comprised 13 E. helvum roosting sites in continental Africa, and 14 in the four main islands in the Gulf of Guinea (Fig. 1, further detail in Table 1 (available online only)). In the majority of locations, data are from a single sampling event (sometimes comprising multiple sampling sessions within a one-month period). Repeated sampling was conducted in Ghana (multiple sampling events per year over four years), Tanzania (one sampling event per year over two years) and Annobón (three sampling events over four years) as these locations were the focus of specific research studies. All fieldwork was undertaken under permits granted by national and local authorities (listed in Acknowledgements) and under ethics approval from the Zoological Society of London Ethics Committee (WLE/0489 and WLE/0467), using field protocols which followed ASM guidelines28. Bats were captured at the roost with mist nets (6–18 m; 38 mm) as they departed the roost site at dusk, or returned at dawn. Except for a proportion of bats that were euthanased for virological studies (n=238), bats were released following sampling. Additional samples and data were obtained from other research groups (n=152) and in collaboration with local hunters in São Tomé (n=102), where bats are hunted for human consumption.

Personal protective equipment (long clothing, face masks, eye protection and gloves) was worn during sample collection. Morphometric and demographic details were recorded from bats under manual restraint. Female reproductive status was assigned as non-reproductive, pregnant, or lactating, according to the descriptions provided in Table 2. The phase in the reproductive cycle (i.e. the time in months between the sampling date and the beginning of the last birthing season) was estimated based on published data and the pregnancy status of females (foetal size, assuming a true gestation period of 4 months (Mutere 1965)) or degree of juvenile development during sampling.

Table 2 Reproductive status classifications for female E. helvum.

Age was assessed by morphological characteristics (Table 3) and all individuals were placed into one of four age classes: Neonate (N; <2 months), Juvenile (J; 2–<6 months), Sexually Immature (SI; 6–<24 months) or Adult (A; ≥24 months). For a subset of samples, the timing of sampling in relation to the birthing season permitted further classification of SI individuals into 6-month age groups SI.1, SI.2 and SI.3 (6–<12, 12–<18, 18–<24 months, respectively). Additionally, for bats that were hunted or euthanased following capture, upper canine teeth were extracted, air dried and shipped to Matson’s Laboratory (USA) for histological examination to assess the number of tooth cementum annuli present29,30. Following previous studies31, it was assumed that each observed cementum layer represented one year. Each age estimation was scored with a certainty code: A: highest certainty of reported age (51% of samples, e.g. Fig. 3a,b), B: histological evidence supported a given age result ± 0.5–1.5 years (46% of samples, e.g. Fig. 3c), or C: tooth or section quality was too compromised to accurately age (3% of samples).
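The age-class boundaries above can be written down as a small lookup. The following is a minimal sketch only: in the study, classes were assigned from morphological characteristics (Table 3) rather than from a known age in months, so the function name `age_class` and its `split_si` option are illustrative assumptions, not part of the published workflow.

```python
# Age-class boundaries in months, as stated in the text:
# (code, lower bound inclusive, upper bound exclusive; None = open-ended).
AGE_CLASSES = [
    ("N", 0, 2),      # Neonate
    ("J", 2, 6),      # Juvenile
    ("SI", 6, 24),    # Sexually Immature
    ("A", 24, None),  # Adult
]


def age_class(age_months, split_si=False):
    """Return the age-class code for an age in months (hypothetical helper).

    With split_si=True, Sexually Immature bats are further divided into the
    6-month groups SI.1 (6-<12), SI.2 (12-<18) and SI.3 (18-<24).
    """
    if age_months < 0:
        raise ValueError("age_months must be non-negative")
    for code, low, high in AGE_CLASSES:
        if age_months >= low and (high is None or age_months < high):
            if code == "SI" and split_si:
                # Integer index of the 6-month band within the SI range.
                return "SI.%d" % (int((age_months - 6) // 6) + 1)
            return code
    raise ValueError("no age class matched")  # not reachable for valid input
```

Lower bounds are treated as inclusive and upper bounds as exclusive, matching the "2–<6 months" style of notation used in the text.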

Table 3 Age classification system used for E. helvum
Figure 3: Histological sections of upper canine teeth from E. helvum for cementum age analysis (Giemsa stain).

Photographs and captions courtesy of Gary Matson, Matson’s Laboratory, MT, USA. Each age estimation was scored with a certainty code: A: highest certainty of reported age, B: histological evidence supported a given age result ± 0.5–1.5 years, or C: tooth or section quality was too compromised to accurately age. (a) Bat ID 424. Cementum age 2, certainty code A. 100X. The tooth was in excellent histological condition, as indicated by the presence of periodontal membrane and good differential staining between annuli and light cementum. (b) Bat ID 62. Cementum age 6, certainty code A. 100X. Annuli are complex, with at least two components each year. A key feature of age analysis is resolving uncertainty about whether complex annuli or individual components are being used as age indicators. (c) Bat ID 44. Cementum age 13, certainty code B (13–15 yrs). 400X. The root tip of this tooth had been broken off during extraction. Missing cementum complicates age analysis, reducing the evidence available for evaluating whether annuli observed at one point may be clearly identifiable as components of complex annuli at another point.

Genetic and blood samples were collected under manual restraint. Wing membrane biopsies (4-mm) were placed into 70% alcohol. Up to 1 ml blood was collected from the propatagial vein using a citrated 1 ml syringe and placed into a plain 1.5 ml eppendorf tube.

Molecular methods

Molecular methods have been described previously12,20. Genomic DNA was extracted from E. helvum tissues (predominantly wing membrane biopsies, but also liver and muscle samples, all stored in ethanol) using DNeasy Blood and Tissue Kits (QIAGEN Ltd., Crawley, West Sussex, UK). DNA was quantified using Quant-iT PicoGreen dsDNA kits (Molecular probes, UK), and later using a Nanodrop ND-1000 Spectrophotometer (Thermo Fisher Scientific, UK) and diluted to a standard concentration.

Twenty E. helvum loci developed in a previous study20 were quality-checked using a subset of samples. Loci E and Ae were discarded due to difficulty in scoring or high error rates and data from locus Ag were re-binned and re-scored, correcting earlier issues with allelic dropout. In total, 170 continental and 385 island samples were run as multiplex PCRs at 18 loci (TSY, FWB, MNQX, AgPK, AcAfAi, AdAh) in 10 μl PCRs, containing 4 ng template DNA, 0.2 μM of each primer, and 5 μl Type-it Multiplex PCR Master Mix (QIAGEN Ltd.). Positive and negative controls were included on each plate and amplification was performed using the following conditions: 5 min at 94 °C; 30 cycles of 30 s at 95 °C, 90 s at 57 °C, and 30 s at 72 °C; then 30 s at 60 °C. Genotyping was performed by capillary electrophoresis using a Beckman CEQ 8000 (Beckman, UK). Allele sizes were scored automatically prior to manual verification. Genotyping data from 18 loci are provided in ‘Eidolon helvum data 2007–2014.csv’ [Data Citation 1]. Locus B has previously been identified as being X-linked20.

Fragments of the mitochondrial DNA cytochrome b (cytb) gene were amplified from continental samples by PCR using the generic primers L14722 (5′- CGA AGC TTG ATA TGA AAA ACC ATC GTT G)32 and H15149 (5′- AAA CTG CAG CCC CTC AGA ATG ATA TTT GTC CTC A)33 in 20 μl reactions, containing 0.1–1 ng template DNA, 0.2 μM of each primer, 0.25 mM of each dNTP, 1.5 mM MgCl2, 0.25 μl of Taq polymerase (Invitrogen), and 0.2 μl 10X reaction buffer and with the following conditions: 5 min at 94 °C; 40 cycles of 1 min at 93 °C, 1 min at 54 °C, and 2 min at 72 °C; then 7 min at 72 °C. Although these generic primers were adequate with continental samples (8% PCR failure), amplification from isolated Gulf of Guinea island samples was less successful (48% PCR failure). Shortened primers (EhM2814 (5′- GCT TGA TAT GAA AAA CCA TCG TTG) and EhM2815 (5′- CAG CCC CTC AGA ATG ATA TTT GT)) resulted in successful amplification when using Microzone MegaMix-Gold reagent (Microzone Ltd, UK). PCRs were performed in 20 μl reactions, containing 2 ng template DNA, 0.25 μM of each primer, and 10 μl MegaMix-Gold, using the following conditions: 5 min at 95 °C; 33 cycles of 30 s at 95 °C, 30 s at 53 °C, and 45 s at 72 °C. PCR products were checked by gel electrophoresis on 1% agarose gels, purified using Exosap-IT clean-up (USB Europe, Germany) and sequenced in both directions on an ABI 3730xl DNA Analyser (Applied Biosystems). Paired sequences were edited and aligned using the STADEN Package v1.6 (ref. 34). Multiple sequence alignment was performed using default settings in T-COFFEE35. Sequences were checked manually and trimmed to a standard length (397 bp) in JALVIEW v2 (ref. 36). No sequence differences were detected in 38 samples sequenced using both primer pairs, so data were combined.
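As a quick check on the primer sequences quoted above, the sketch below confirms that each shortened primer is contained within a generic primer and reports how many bases were trimmed from each end. The pairing of EhM2814 with L14722 and of EhM2815 with H15149 is inferred here from the sequences themselves, and the helper name `trimmed_bases` is hypothetical.

```python
# Primer sequences as printed in the text (5' to 3', spaces removed).
GENERIC = {
    "L14722": "CGAAGCTTGATATGAAAAACCATCGTTG",
    "H15149": "AAACTGCAGCCCCTCAGAATGATATTTGTCCTCA",
}
SHORTENED = {
    "EhM2814": "GCTTGATATGAAAAACCATCGTTG",
    "EhM2815": "CAGCCCCTCAGAATGATATTTGT",
}
# Assumed pairing (generic primer, shortened primer), inferred from sequence.
PAIRS = [("L14722", "EhM2814"), ("H15149", "EhM2815")]


def trimmed_bases(generic, short):
    """Return (bases removed at the 5' end, bases removed at the 3' end).

    Raises ValueError if the shortened primer is not a contiguous
    substring of the generic primer.
    """
    start = generic.find(short)
    if start < 0:
        raise ValueError("shortened primer not found within generic primer")
    return start, len(generic) - start - len(short)


if __name__ == "__main__":
    for generic_name, short_name in PAIRS:
        five, three = trimmed_bases(GENERIC[generic_name], SHORTENED[short_name])
        print(generic_name, "->", short_name, "5' trim:", five, "3' trim:", three)
```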

Data from 608 and 544 individuals are available for cytb and microsatellite analyses (at 18 loci), respectively (Table 1 (available online only)).

Serological analyses

Serological methods have been described previously7,10–14.

A modified fluorescent antibody virus neutralization (mFAVN) assay using the LBVNig56 isolate was used to detect neutralising antibodies against LBV10,37. Samples were tested in duplicate using threefold serial dilutions (representing reciprocal titres of 9, 27, 81, and 243–19,683). Human rabies immunoglobulin, LBV-positive rabbit serum, and rabies-vaccinated mouse serum were used as positive controls, and negative rabbit and mouse serum were used as negative controls. Titres were considered positive at IC100 endpoint reciprocal dilutions >1:9 (100% neutralisation of virus).
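The threefold dilution series and the positivity rule can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, and the intermediate titres between 243 and 19,683 are filled in by repeated multiplication by three rather than taken from the text.

```python
def threefold_titres(first=9, last=19683):
    """Reciprocal titres of a threefold serial dilution from first to last."""
    titres = []
    titre = first
    while titre <= last:
        titres.append(titre)
        titre *= 3
    return titres


def is_lbv_positive(endpoint_titre):
    """Apply the stated rule: positive if the reciprocal endpoint titre is >9."""
    return endpoint_titre > 9


if __name__ == "__main__":
    print(threefold_titres())
```

Under this rule a sample whose endpoint titre is exactly 9 is scored negative, and 27 is the lowest positive titre in the series.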

Henipavirus antibodies have been detected in African fruit bat samples using virus neutralisation assays, multiplexed microsphere assays and pseudotype assays developed to target other known henipaviruses (Hendra and Nipah viruses), and are presumed to represent cross-neutralisation or cross-reactivity12. Here, Luminex multiplexed microsphere binding assays were used to detect antibodies against henipaviruses (HeV and NiV). In these assays, purified recombinant expressed henipavirus soluble G glycoproteins38 are conjugated to internally coloured and distinguishable microspheres, allowing multiplexing. For African bat samples, stronger results were consistently observed in NiV binding assays and virus neutralisation tests13, so only NiV binding assay results are included in the dataset. Binding results are output as median fluorescence intensity (MFI) values of at least 100 microspheres for each virus type. In mid-December 2010, major repair work was undertaken on the Luminex machine being used for serological analysis. A subset of samples that had been analysed before the repairs was repeated to calibrate results (n=293). MFI values pre- and post-repair work were significantly different, making the use of a single cutoff inappropriate17. Two approaches were taken to designate results as seropositive or seronegative. First, a Bayesian mixture model was applied as described previously17. Cutoffs for pre- and post-repair work were determined so that samples above this cutoff were ≥ 99% likely to be in the seropositive distribution (MFI=156.1 and 127.5, respectively). Second, linear regression of pre- and post-maintenance ln(MFI) values demonstrated a significant linear relationship (Fig. 4, R² = 0.81, F-statistic: 1306 (1, 296), P<2.2e-16), and the variance decreased for higher MFI values (above the cutoff). Pre-maintenance MFI values were converted to post-maintenance values using the formula:

Figure 4: Correlation between ln(MFI) values pre- and post-repair of the Luminex machine used to run the assays, based on 293 samples.

The linear regression line is in red (R² = 0.81, F-statistic: 1306 (1, 296), P<2.2e–16).

NEW_MFI = exp(0.7795774 × ln(OLD_MFI) + 0.4392832).

The Bayesian mixture model was applied to these transformed and combined data using the same method. From this analysis, MFI values >94.2 were ≥ 99% likely to be in the seropositive distribution. Results from the two methods were compared and the second method resulted in the highest congruence between pre- and post-maintenance paired results (congruence in 266/298 samples versus 250/298 samples for the first method), and these data were therefore used in the final dataset. Raw MFI values are available on request.
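The second approach (converting pre-repair MFI values onto the post-repair scale, then applying the single cutoff of 94.2) can be sketched as below. The regression coefficients and cutoff are those quoted in the text; the function names and the `pre_repair` flag are assumptions made for illustration, and this is not the authors' analysis code.

```python
import math

# Regression coefficients and cutoff as quoted in the text.
SLOPE = 0.7795774
INTERCEPT = 0.4392832
COMBINED_CUTOFF = 94.2


def convert_pre_to_post(old_mfi):
    """Convert a pre-repair MFI value to the post-repair scale.

    Implements NEW_MFI = exp(SLOPE * ln(OLD_MFI) + INTERCEPT).
    """
    if old_mfi <= 0:
        raise ValueError("MFI must be positive to take its logarithm")
    return math.exp(SLOPE * math.log(old_mfi) + INTERCEPT)


def is_seropositive(mfi, pre_repair):
    """Classify a sample against the combined cutoff (hypothetical helper).

    Values measured before the repair are converted first; a value is
    called seropositive when it is strictly greater than the cutoff.
    """
    value = convert_pre_to_post(mfi) if pre_repair else mfi
    return value > COMBINED_CUTOFF
```

Because the fitted slope is below 1, the conversion compresses high pre-repair values, which is why a single cutoff only makes sense after the transformation.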

Antibodies against Achimota viruses 1 and 2 were detected using virus neutralisation assays7, with all testing in duplicate. Samples were diluted to 1:20 and incubated with 200 TCID50 of virus for 30 min at 37 °C prior to the addition of Vero cell suspension at an MOI equivalent to 0.01. Cell monolayers were assessed for evidence of virus neutralization 7 days post infection. Where sample volume permitted, positive samples were titrated in a 2-fold dilution series from 1:20 to 1:160 and retested using the same protocol.

Data Records

The data are contained in a single comma-separated file (.csv format), entitled ‘Eidolon helvum data 2007–2014’ (Data Citation 1). Each row below the header represents an individual bat (n=2,827), and the columns (n=68) contain sample identifier information, demographic and morphometric data, and results of genetic and serological assays. Full descriptions of the column titles are included in Table 4 (available online only).

Table 4 Descriptor codes for data file

Technical Validation

Molecular analyses

Recommendations for minimisation and assessment of errors that may occur during the sampling, DNA extraction, amplification, sequencing, genotyping and data analysis processes were followed where possible39 (Table 5).

Table 5 Measures adopted to minimise and allow assessment of errors which may occur during the sampling, DNA extraction, amplification, sequencing, genotyping and data analysis processes (adapted from Bonin et al. 2004; Table 4).

As previously described20, microsatellite loci were tested for evidence of departure from Hardy–Weinberg equilibrium (HWE) and genotypic disequilibrium using FSTAT 2.9 (ref. 40), with appropriate Bonferroni corrections for multiple testing. All loci were analysed in MICRO-CHECKER41 to test for null alleles, stuttering and large allelic dropout as a cause of departure from HWE. Additionally, since Locus M displayed extremely low polymorphism (99.1% of individuals were homozygous for a particular allele), this locus was included in all PCR plates as a positive control and to determine inter-assay variability in allele fragment length. Error rates for microsatellite loci are reported in Peel et al.20. Inter-assay genotyping variability, measured by the variation in fragment length of the dominant allele of locus M on each plate, was low (range 134.32−134.66) across 27 runs and two control samples. Locus Y proved difficult to confidently bin due to alleles of single nucleotide difference and was therefore not included in the dataset.

Error rates for cytb analyses were assessed by replicate extractions (performed on 2.4% of samples), replicate PCR and sequencing reactions (performed on 8–14% of extracted samples), and by inclusion of positive and negative controls for all extractions and PCRs. Poor quality mtDNA sequence traces were excluded. Background PCR and sequencing error rates of the new E. helvum cytb primers EhM2814 and EhM2815 were assessed by running 70 replicates of a single sample. PCR and sequencing error rates were calculated at the base-pair level. Sequencing error rate was negligible (0–0.01%) across samples repeated in duplicate, and no substitutions were observed in the 70 replicate sequences obtained from a single sample (Table 6).
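A base-pair-level error rate of the kind described above can be computed by counting mismatches between replicate sequences and a reference, then dividing by the total number of bases compared. The sketch below assumes that definition (the text does not spell out the exact calculation), and the function name `per_base_error_rate` is hypothetical.

```python
def per_base_error_rate(reference, replicates):
    """Fraction of base calls in the replicates that differ from the reference.

    All sequences are assumed to be aligned and trimmed to the same length
    (397 bp in the study); unequal lengths raise ValueError.
    """
    mismatches = 0
    total = 0
    for seq in replicates:
        if len(seq) != len(reference):
            raise ValueError("replicate length differs from reference length")
        # Count positions where the replicate base differs from the reference.
        mismatches += sum(1 for a, b in zip(reference, seq) if a != b)
        total += len(seq)
    if total == 0:
        raise ValueError("no replicate bases to compare")
    return mismatches / total
```

For scale, 70 replicates of a 397 bp fragment amount to 27,790 base calls, so zero observed substitutions corresponds to a rate of 0 over that many comparisons.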

Table 6 Mitochondrial DNA PCR repeat rates and sequencing error rates.

Serological analyses

All serological assays included positive and negative controls. Samples were tested in duplicate (LBV and Achimota viruses) or with 100 replicates (henipaviruses). Further validation procedures for multiplexed microsphere binding assays are presented as part of the methods, above.

Usage Notes

Users of these data are advised that importing the .csv data file (Data Citation 1) into Microsoft Excel can result in formatting errors, particularly with the column ‘Teeth.Age.Range’. Rather than opening the file with Excel (by double-clicking, for example), it is suggested that users instead select ‘File>Import>csv file>Delimited’, then select the ‘Teeth.Age.Range’ column and set the column data format as ‘Text’. Alternatively, importing and processing the data in the software ‘R’42 may be preferable.
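Another way to avoid spreadsheet auto-formatting is to read the file programmatically so that every field stays as text. The following is a minimal sketch using Python's standard `csv` module; the helper name, the `Bat.ID` column and the sample values are hypothetical and are used only to make the example self-contained, while ‘Teeth.Age.Range’ is the column named above.

```python
import csv
import io


def read_bat_records(handle):
    """Read the data file into a list of dicts, keeping every value as text.

    csv.DictReader performs no type conversion, so range-like strings in
    'Teeth.Age.Range' are preserved exactly as written in the file.
    """
    return list(csv.DictReader(handle))


# Hypothetical two-row sample standing in for the real file.
SAMPLE = "Bat.ID,Teeth.Age.Range\ndemo-1,2-3\ndemo-2,13-15\n"

if __name__ == "__main__":
    rows = read_bat_records(io.StringIO(SAMPLE))
    for row in rows:
        print(row["Bat.ID"], repr(row["Teeth.Age.Range"]))
```

For the real dataset, the same function can be given a file handle opened with `open(path, newline="", encoding="utf-8")`, where the path and the UTF-8 encoding are assumptions to be checked against the downloaded file. A simple sanity check is that the result has 2,827 rows with 68 keys each, as stated under Data Records.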

Additional Information

How to cite: Peel, A. J. et al. Bat trait, genetic and pathogen data from large-scale investigations of African fruit bats, Eidolon helvum. Sci. Data 3:160049 doi: 10.1038/sdata.2016.49 (2016).