In the last decade, ancient DNA research has grown rapidly and started to overcome several of its earlier limitations through Next-Generation-Sequencing (NGS). Among other advances, NGS allows direct estimation of sample contamination from modern DNA sources. First NGS-based approaches of estimating contamination measured heterozygosity. These measurements, however, could only be performed on haploid genomic regions, i.e. the mitochondrial genome or male X chromosomes, but provided no measures of contamination in the nuclear genome of females with their two X chromosomes. Instead, female nuclear contamination is routinely extrapolated from mitochondrial contamination estimates, but it remains unclear if this extrapolation is reliable and to what degree variation in mitochondrial to nuclear DNA ratios affects this extrapolation. We therefore analyzed ancient DNA from 317 samples of different skeletal elements from multiple sites, spanning a temporal range from 7,000 BP to 386 AD. We found that the mitochondrial to nuclear DNA (mt/nc) ratio negatively correlates with an increase in endogenous DNA content and strongly influenced mitochondrial and nuclear contamination estimates in males. The ratio of mt to nc contamination estimates remained stable for overall mt/nc ratios below 200, as found particularly often in petrous bones but less in other skeletal elements and became more variable above that ratio.
The emergence of Next-Generation-Sequencing (NGS) technologies has substantially advanced the field of ancient DNA (aDNA) research1. Besides its high throughput, NGS gave rise to analyses of ancient DNA-specific DNA damage to reveal patterns that authenticate ancient DNA. Post-mortem DNA continuously accumulates characteristic modifications, in particular deamination at the DNA fragment ends2,3,4. Deamination frequency thus increases over time and can therefore reveal the ancient origin of an aDNA sample3. As a second step required for authenticating ancient human DNA, NGS data allow estimating contamination levels of human DNA directly1. Earlier estimates of contamination levels in NGS data from early modern humans were based on so-called diagnostic positions on the mitochondrial DNA (mtDNA), i.e. nucleotide positions that differ between the sample and a comparative dataset of complete modern human mtDNA sequences from world-wide populations5. Reads exhibiting a different allele at these positions than the majority of reads likely constitute contamination. Today, Bayesian approaches allow even more precise estimates by identifying the proportion of sequencing reads that were considered to be authentic aDNA from the studied individual6.
While quantifications of contamination levels in the haploid and maternally inherited mtDNA are achieved by measuring levels of heterozygosity1, contamination estimates for the diploid nuclear genome can only be obtained for the haploid sex chromosomes in male individuals, especially for the larger X chromosome7. For female individuals, most current studies therefore restrict analyses to nuclear DNA sequences with typical aDNA damage patterns and thus sequences of trusted ancient origin, often reducing datasets by an order of magnitude8. Alternatively, female mtDNA contamination estimates are extrapolated to the nuclear level and only female samples with low mtDNA contamination estimates are used for population genetic analysis on the nuclear genome. Whether this extrapolation is reliable, however, remains untested, in particular given that mitochondrial to nuclear DNA ratios can substantially vary between and even within bone samples9, potentially affecting the extrapolation of mtDNA to nuclear contamination levels.
The reliability of this extrapolation may also depend on the chosen skeletal element given differences in endogenous DNA content and relative amounts of mitochondrial and nuclear DNA. Recent work has identified the petrous part of the temporal bone as a particularly rich source of endogenous DNA and thus an ideal candidate for aDNA studies10,11. Endogenous DNA portions in this skeletal element exceed that in other parts of the skeleton by up to a factor of 40012. This reduces the cost of shotgun sequencing for ancient human genomes to sufficient coverage10 and allows obtaining sequences from geographic regions with typically poor DNA preservation, such as the Near East, Remote Oceania and Africa13,14,15,16,17. The exceptionally good DNA preservation in petrous bones has been linked to the high bone density of the bony labyrinth10, which reaches adult size during gestation18 and shows reduced bone remodeling compared to the surrounding tissue19. These conditions seem to be ideal for DNA preservation. Initial studies suggest low mt/nc ratios for petrous bones20. We hypothesize that these low ratios allow reliable estimates of contamination with human DNA using mtDNA, especially for female individuals.
This is the first study to explore systematic differences in mt/nc ratios between skeletal elements and classically used negative controls, and the relationship between mt/nc ratios and contamination estimates. We compare mt/nc ratios and overall DNA preservation in petrous bones, teeth and other skeletal elements in newly produced and publically available aDNA datasets. Our data show that DNA contamination estimates in males strikingly vary with mt/nc ratios, and argue that the typically low mt/nc ratios in petrous bones make them ideal for making reliable contamination estimates using mtDNA.
Differences in the mt/nc ratio between sites
For each of the three tested skeletal elements we investigated samples from different sites (Table 1), labelled P1 through P7 for petrous bones, T1 through T4 for teeth, and B1 through B4 for other bones. The subsample pairs T3/B1, T4/B2 and P1/T1 originate from the same studies, respectively. Given that any systematic differences between sample origins in, for example, general preservation, laboratory protocols, soil type, climatic conditions and taphonomic alterations can introduce between-sample variation, we first assessed the degree of origin-related variation in mt/nc ratios within each skeletal element (Fig. 1). Indeed, we detected significant variation between sample origins in all skeletal elements (Petrous bones ANOVA F6,50 = 20.15, p < 0.001; Teeth ANOVA F3,181 = 25.25, p < 0.001; Bones ANOVA F3,71 = 9.26, p < 0.001), with at least one sample differing from all remaining ones in each skeletal element (Tukey-HSD tests, Fig. 1). In all the following statistical models comparing differences in mt/nc ratios between tissues, we therefore routinely – and conservatively – accounted for between-origin variation while maintaining the paired nature of samples from the same source by integrating study ID and site ID as random factors.
Differences in the mt/nc ratio between skeletal elements
To assess the mt/nc ratio of potential sources for contamination of archaeological samples, we also measured the mt/nc ratio of human contamination originating from plant extracts21 and laboratory negative controls. If mt/nc ratios of modern human contamination exceeded those of the ancient human DNA, the relative contamination impact would be stronger on the mtDNA than on the nDNA, and vice versa. We found no significant difference in the loge[mt/nc ratio] between both controls (ANOVA F1,21 = 2.33, p = 0.14), therefore combining them for all further analyses.
The mt/nc ratios showed significant variation (Fig. 2) between the three skeletal elements (petrous bones, teeth and bones) and controls (Chi² = 9.54, df = 3, p = 0.023). Petrous bones had significantly lower mt/nc ratios than teeth and showed a trend towards lower mt/nc ratios than other bone samples, while teeth and other bones showed similarly high ratios (Tukey HSD post hoc tests as indicated in Fig. 2). The mt/nc ratios of the combined control group were intermediate between the ratios derived from teeth and bones and those derived from petrous bones. In addition to these differences in mean mt/nc ratios, skeletal elements also showed striking differences in mt/nc ratio variation (Levene’s test F3,336 = 15.01, p < 0.001). Variation was again similar between teeth and other bones (Tukey HSD post hoc P = 0.75) but significantly larger in these two than in petrous bones (both P < 0.0001).
Factors influencing the mt/nc ratio
To assess the degree to which mt/nc ratios vary with sample characteristics within sample origins, we tested relationships with endogenous DNA content, age, and mapping stringency. Despite much between-sample variation, mt/nc ratios almost consistently decreased with increasing endogenous DNA content across skeletal elements and sampling origins (Fig. 3, mixed model endogenous DNA-effect Chi² = 6.57, df = 1, P = 0.010, overall regression slope = −0.224 ± 0.082, mean ± SE).
In contrast, we detected no significant relationship between radio carbon dates of the P2, P3, T3 and B2 samples and their loge[mt/nc ratio] (mixed model sample age-effect Chi² = 0.63, df = 1, P = 0.426). Changes in the mapping stringency in terms of mismatches allowed per read (BWA parameter –n, with tested values 0.001, 0.01, 0.1, 0.2 and 0.8 corresponding to 5, 3.7, 2, 1.7 and 0 mismatches for the average fragment size of T1 and T2 of 55 bp) showed some fluctuation in the mt/nc ratio in T1 and T2 teeth, but no directional effect on mean mt/nc ratios (linear regression F1,125 = 2.049, P = 0.155, R² = 0.016).
Effect of changes in mt/nc ratio on contamination estimates
To evaluate whether mitochondrial and nuclear contamination estimates vary with sample mt/nc ratios, we analysed male individuals where sufficient NGS coverage allowed reliable estimates of mtDNA and nuclear contamination levels. Mitochondrial and X-chromosomal contamination rates were by far most strongly biased towards mt-contamination in samples with high mt/nc ratios (Fig. 4). In contrast, where mt/nc ratios were small, nuclear contamination estimates were close to mtDNA contamination rates with an average ratio of 1.47 ± 0.44 SE. In addition, contamination ratios remained stable across loge[mt/nc ratio] between 0 and loge(200) = 5.3 (mixed model loge[mt/nc ratio]-effect Chi2 = 0.80, df = 1, p = 0.37), but strikingly increased with loge[mt/nc ratio] at values exceeding loge(200) = 5.3 (Chi² = 5.95, df = 1, p = 0.015). This stability is reflected in the fact that samples with a ratio below loge(200) = 5.3 are located on the bisecting line when plotting MT contamination against X contamination (Fig. 4).
In this work, we show that variation in the mt/nc ratio of ancient DNA samples strongly influences the estimates of contamination by human DNA, which is one of the most critical steps when analysinig ancient human DNA. Besides the characteristic age related misincorporation2, estimates of contamination rates are used for authentication of ancient human DNA. Currently available methods for contamination estimates of the mtDNA and nDNA levels measure heterozygosity in haploid regions of the mtDNA, or the X chromosome in males6,22,23. However, at present it is not possible to estimate contamination levels in the nDNA of female individuals. Recent studies of ancient human nDNA either restrict their analysis to male individuals with known nDNA contamination levels or extrapolate contamination levels of mtDNA to nDNA in the case of females.
Our results show that contamination levels extrapolated from the mtDNA to the nDNA might systematically underestimate the actual level of nDNA contamination. This appears particularly likely for teeth and other bones with a high mt/nc ratio, where such extrapolations (as often done for female specimens) may be highly misleading. For example, three “other bone” samples in the current study exhibited less than 3% contamination on the mtDNA, contrasting to nDNA-contamination between 16 and 30% measured on the X-chromosomes. All three samples were characterized by high mt/nc ratios spanning 870 to >56.000. For these samples, it is not possible to determine from the distribution of reads on the X and Y chromosome with certainty if they are males with a contaminated X chromosome or if they are females with male contamination. Either way, the contamination on the nuclear level is too high for further population genetic analysis and even in absence of Y chromosomal reads as an indicator of contamination, the extrapolation from the mtDNA contamination level could still underestimate the actual nuclear contamination level in case of high mt/nc ratios. Therefore, we advise to take the mt/nc ratio into consideration if only the mtDNA contamination estimate is used in order to choose female samples to be included in population genetic analysis. An effect of mt/nc ratio on the contamination estimates of mtDNA and nDNA as described above was already suggested in the context of analysing Neandertal nuclear and mtDNA9,24 and receives further support by our statistical analysis.
In contrast to samples with a high mt/nc ratio, samples with a ratio below 200 (loge = 5.3) seem to provide a rather reliable estimate of contamination on the mitochondrial level (Fig. 4). We included two types of possible contamination in our study: modern human DNA contamination in archaeological samples measured from aDNA plant extracts from 6,000 year old barley21 and contamination introduced during lab work measured from the negative controls carried along in the entire laboratory workflow. The mt/nc ratios of these potential contamination sources were intermediate to those detected in the three tested skeletal elements (Fig. 2). Therefore, these types of contamination contribute more to the contamination level of mtDNA than nDNA in samples showing lower ratios than these controls. Our systematic comparison of a large number of teeth, diverse bones and petrous bones indicates that the petrous bones display relatively low mt/nc ratios and hence have a high chance to show even lower contamination rates on the nDNA than extrapolated from their mtDNA. In addition, we found that the mt/nc ratio in petrous bones not only tended to be smaller compared to teeth and other bones but also exhibited significantly less variation (Fig. 2). The lower mt/nc ratios in petrous bones increase the chance to obtain reliable extrapolations of nDNA contamination rates from estimated mtDNA contamination. Furthermore, the extraordinarily good preservation of endogenous DNA in petrous bones makes it possible to overcome the limited preservation of aDNA in most environments12. Hansen and colleagues20 show that teeth can have even higher contents of endogenous DNA than petrous bones. In our dataset, however, the majority of petrous bones displayed clearly higher endogenous DNA contents than teeth, leading us to maintain petrous bones as the prime source for highly concentrated and little contaminated endogenous DNA.
From these influences of the mt/nc ratio on the contamination estimates, we conclude that the mt/nc ratio should be considered if mtDNA contamination estimates are used to select female samples for nDNA studies. Reasons for the observed variation in this ratio have already been subject of different studies and no clear reason for the elevation of the mt/nc ratio in some aDNA samples compared to modern DNA has been identified. One possibility could be a bias of the mt/nc ratio originating from bacteria, since mtDNA contains features of a bacterial chromosome and bacterial reads from the environmental background might map better to the mtDNA elevating the mt/nc ratio. However, mapping with different stringencies (different numbers of allowed mismatches per read, BWA parameter -n) did not reveal a particular trend. Allowing more mismatches (-n 0.001, -n 0.01 and -n 0.1) resulted in similar values in the ratio than more stringent mapping (-n 0.2 and -n 0.8). We could therefore not detect any effect of the bacterial background in aDNA extracts on the correlation between high mt/nc ratios and lower percentages of endogenous DNA.
It has been suggested that the higher ratios in aDNA might originate from better protection of DNA in the mitochondrion due to its double membrane25. In addition, Schwarz and colleagues25 and Allentoft and colleagues26 conclude from their observation of longer mtDNA fragments compared to nDNA that mtDNA decays more slowly than nDNA. This would result in increasing ratios through time. No such trend is detectable in our data for petrous bones (P2, P3) and teeth (T3), and only a weak trend in other bones (B2). Therefore, we found no indication that different decay rates cause the observed variation in the mt/nc ratio.
Another reason for differences in mt/nc ratios could be higher metabolic activity in particular skeletal elements, as suggested by Higgins and colleagues27 in their study of different parts of teeth. Their observation indicates that higher mt/nc ratios in dentine may originate from its higher metabolic activity as compared to enamel and cementum. This assumption, of different metabolic activity resulting in different mt/nc ratios could also be applied to other skeletal elements, the petrous bone in particular. In the cortical part of the petrous bone, bone remodelling is suppressed compared to surrounding tissue by a specific signalling-pathway, and the number of metabolic active osteoclasts and osteoblasts is reduced28. Furthermore, the more compact bone regions contain a higher number of osteocytes, which have lower numbers of mitochondria29. The number of mitochondria in osteocytes decreases from periosteal and endosteal surfaces towards the inner and denser bone parts30. Consequently, sampling the hardest part of the petrous bone results in low mt/nc ratios due low mt/nc ratio in this area ante-mortem. Within the densest parts of the cortical bone displaying low mt/nc ratios, the endogenous DNA is also better protected against the environmental background of bacteria, fungi and other microbes resulting in high percentages of endogenous DNA.
In conclusion, we showed that the mt/nc ratio is an important value in aDNA authentication. It strongly influences the accuracy of extrapolating nDNA contamination levels from mtDNA contamination estimates. This approach should be used with particular caution if mt/nc ratios exceed 200. Lower mt/nc ratios are often associated with high percentages of endogenous DNA, as typically found in the densest parts of the petrous bones. In this case, low mt/nc ratios are most likely derived from a low mtDNA concentration in this bone region ante-mortem, which is caused by a low metabolic activity.
Temporal bones, including the petrous bones have already proven to be valuable in physical anthropology. The compact structure of the petrous bone is ideal for preservation in archaeological contexts, therefore, it can be used to estimate the minimal number of individuals31. While age can only be roughly estimated depending on developmental stages32. In addition, the discovery of the higher percentage of endogenous DNA in petrous bone compared to other skeletal elements10,11 constitutes a substantial improvement for the field of aDNA. Our study confirms those findings and shows that mitochondrial to nuclear ratios provide a further argument to extract ancient DNA from petrous bones providing reliable human contamination assessment from both sexes.
Material and Methods
We combined shotgun data from 76 new ancient DNA samples from the late Neolithic dolmen burial site of Oberbipp in Switzerland33,34, the late Neolithic multiple burial of Muttenz35, the final Neolithic sites in Switzerland Spreitenbach36, Seengen35, Zurzach35 and cave finds from Wartau37 (detailed description of archaeological sites and laboratory workflow SI section 1 and 2) with published datasets (Table 1). In total, DNA from 317 ancient individuals and 23 controls was analysed for its mtDNA to nDNA (mt/nc) ratio. For each of the three investigated skeletal elements, samples originated from three sources. Different skeletal elements do not originate from the same individuals. Samples for petrous bones (n = 57) included 40 newly processed samples from the previous mentioned Swiss burials (P1, P4, P5, P6, P7), 13 from Gamba and colleagues10 (P2) and four from Broushaki and colleagues13 and Gallego-Llorente and colleagues14 (P3). Teeth samples (n = 185) included 36 from the Swiss burials (T1, T2), 88 from Allentoft and colleagues38 (T3) and 61 from Schuenemann and colleagues39 (T4). P1 and T1 from this study originate from the same multiple burial in Oberbipp, Switzerland but due to the comingled nature of the remains an assignment of the teeth to the petrous bones is not possible. Furthermore, the endogenous DNA in the teeth is too low to allow kinship analysis to identify identical individuals. Therefore, it is possible that teeth and petrous bones are from the same individuals. Finally, diverse bone samples from compact bone parts (n = 75) included 13 from Allentoft and colleagues38 (B1), 52 from Schuenemann and colleagues39 (B2) and eight from Günther and colleagues40 (B3) and two from Broushaki and colleagues13 (B3). Fastq files were downloaded from the European Nucleotide Archive. Radio carbon dates were used from Gamba and colleagues10, Broushaki and colleagues13, Gallego-Llorente and colleagues14 and Schuenemann and colleagues39.
All data were processed with the EAGER pipeline41. If necessary, adapters were removed and paired-end data was merged using Clip&Merge41. Mapping was performed with BWA with the mismatch parameter set to 0.01 and a seed length of 1,000. If necessary, PCR duplicates were removed using DeDup41. Mitochondrial to nuclear ratios were calculated by dividing the mean coverage of the mitochondrial genome by the mean coverage of the nuclear genome to take into account the different length of nuclear and mitochondrial genome. Samples with no reads on the mitochondrial chromosome resulting in a ratio of zero as found in some negative controls, plant extracts, and extremely bad preserved teeth samples were excluded.
Reads mapping to the mitochondrial genome were extracted from the BAM files and the mitochondrial genome was reconstructed using the software schmutzi22. Mitochondrial contamination was then estimated with a Bayesian approach as described in Fu and colleagues6. Sex determination was performed after Skoglund and colleagues42. X chromosomal contamination in males with more than 0.5-fold coverage on the X chromosome was estimated using ANGSD23.
Statistical analysis was performed in R version 3.4.3 (R Core Team 2017). We first used one-way ANOVA to assess differences in loge[mt/nc ratio] (i.e., the raw mt/nc-ratios loge-transformed to approach normality) between the sub-samples within each of the three different skeletal elements and controls, followed by Tukey HSD post hoc tests. Second, to assess overall differences in loge[mt/nc ratio] between skeletal elements and controls, we performed linear mixed models as implemented in the lme4-package for R43. Given substantial variation in loge[mt/nc ratio] between different sample origins and studies detected above, this model contained the source study as a random slope factor, allowing between-study variation in the main effect while maintaining the paired nature of samples from the same source. Pairwise Tukey HSD post hoc tests were calculated from this model using the lsmeans package for R44. Differences in the variance of the loge[mt/nc ratio] between skeletal elements and control were tested using Levene’s test, followed by Tukey HSD post hoc tests.
To assess how loge[mt/nc ratio] varies with DNA-content (log-transformed to approach normality) we constructed a linear mixed model as above with loge[mt/nc ratio] as the response variable, loge[endogenous DNA] as the predictor, and subsample as random slope factor allowing differential regression slopes between subsamples. In addition, we added skeletal element and its interaction with loge[endogenous DNA] as fixed factor to assess slope consistency between skeletal elements. Given that this interaction was clearly insignificant (Chi² = 0.18, df = 2, P = 0.91) we removed it from the final reported model. Information about endogenous DNA content was only available for all Swiss and Egyptian samples. An identical model structure was used to assess the relationship between sample age and loge[mt/nc ratio], now with sample age as the main predictor.
Differences in loge[mt/nc ratio] between different mapping stringencies were assessed using linear regression. Information about loge[mt/nc ratio] at different mapping stringencies was obtained for the teeth of the Swiss burials sites Oberbipp and Spreitenbach (T1) since they have low endogenous content and a high portion of bacterial background.
Finally, to assess how the ratio of mitochondrial over X-chromosomal contamination varies with loge[mt/nc ratio] we constructed linear mixed models as above, but separate for samples with low (<5.3) and high (>5.3) loge[mt/nc ratio]. Contamination ratio served as the response variable, loge[mt/nc ratio] as the predictor, and skeletal element as a random slope factor allowing differential responses to loge[mt/nc ratio].
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We would like to thank the archaeological services of the Cantons of Aargau, Baselland, Bern, and Sankt Gallen for providing access to their collections as well as Anne Gibbson for proof reading the manuscript.