Introduction

Molecular anthropologists are eager to extract DNA from palaeoanthropological remains, such as bones and teeth, in order to get a direct grip on the past.1 This has led to a sort of academic industry that produces an increasing amount of mitochondrial DNA (mtDNA) data from specimens kept in museums or churches, or dug out from graveyards or prehistoric burial grounds. Ancient DNA (aDNA), however, poses the fundamental problem of authenticity, especially when ancient DNA is not expected to be much different in sequence from modern DNA.2, 3, 4 Contamination from many sources at any stage is virtually unavoidable, for example ‘contaminant DNA … can be found in buffers and other reagent solutions’.5 Moreover, there is the inherent risk of post mortem damage,6, 7 phantom mutations,8, 9 and amplification of nuclear inserts of mtDNA.10, 11, 12 Since few would any longer believe in mtDNA results of the kind published in the early days of the field,2 for example ‘unusual’ sequences from Egyptian mummies,13 much emphasis is nowadays laid on stringent criteria when dealing with ancient mtDNA. In reality, however, things seem not to have moved on very much overall: ‘However, there is still concern that many studies are not paying enough attention to the exacting protocols needed to overcome the technical challenges of the discipline and to defend it from the ridicule that has plagued it in the past.’3

The meticulous application of the Nine Criteria of Authenticity14 would certainly spare the scientific community any further ‘exciting’ claims about, say, archaic Homo sapiens mtDNA15, 16 and would elevate the chances for authentic results in many cases, but it cannot prevent artefacts.17 The Nine Criteria are precautions that should normally be followed to increase reliability, and so they are regarded as more or less necessary conditions for justifying authenticity of ancient DNA sequencing attempts. In particular, as to repeatability of experiments, ‘the fact that a DNA sequence is found in two independent extracts is a necessary, but not sufficient, criterion of authenticity when human remains are analysed’.18 Thus, one arrives at the rather bleak outlook that ‘in the absence of further technical improvements, it is impossible to produce undisputable human mtDNA sequences from ancient human remains’.18 As to the art of ancient DNA analysis, there seems to be a division between leading experts who are sceptical about the aDNA boom16, 17, 18, 19 and an optimistic group of aDNA scholars who think they ‘did all the right things’20 with ancient DNA.

There may, though, be a chance to produce authentic ancient mtDNA from modern humans under extremely favourable conditions,21 viz., when the design of an ancient DNA study can incorporate additional internal controls for consistency. At the least, the targeted mtDNA fragments of a regional aDNA study should a priori (to the best of current knowledge) be different from all of the potential mtDNA of the broad archaeological and laboratory context. Only then should one set out to amplify ancient mtDNA, inasmuch as most contamination would then give a positive signal that can be differentiated from the expected authentic mtDNA. Such privileged conditions are, however, fairly unusual in ancient DNA projects. Imagine that a Japanese archaeological team (under hermetic conditions) freshly excavated some prehistoric bones and teeth from a burial site, say, in South America, and analysed the material in a laboratory in Japan; then, with the necessary precautions, some authentic results might be expected – provided that coding-region (and control-region) markers were employed which can clearly separate Asian from Native American mtDNA.22 However, if a coffin from a church in Italy were opened and its dusty contents analysed by a European team, then from the outset the results would be fraught with doubts about authenticity.

A posteriori control

There can be clear indications, even to the brave adherent of the Nine Criteria, that mtDNA sequences obtained from ancient material are nonetheless artefacts or at least bear dubious nucleotide changes. Three indicators can positively exclude or question authenticity of ancient mtDNA sequencing results a posteriori. First is the phylogeographic paradox (or the principle of ‘phylogenetic expectation’2): if the putative ancient mtDNA reflected typical mtDNA lineages of the human environment of the excavation team, curating staff, or lab personnel rather than mtDNA lineages that would be expected to have thrived in the geographic area of the ancient population, then contamination would likely have over-run any authentic DNA.23 This principle does not a priori exclude the possibility of unexpected findings but, quite to the contrary, should shield the expected findings from expected sources of contamination that could have entered through a long (and possibly not fully controllable or reconstructible) handling process. Second is the mosaic structure: if the putative ancient mtDNA haplotype, composed of several (more or less) overlapping fragments (from the D-loop) and complementary fragments (from the coding region), was unusual in the sense that the separate fragments are well in line with modern mtDNA lineages from different branches of the mtDNA phylogeny but their odd combination did not come close to any point in the phylogeny, then the compound haplotype would most likely constitute some artificial recombinant, suggesting contamination or sample mix-up.24 This principle does not mean that we should necessarily mistrust a single recurrent mutational event but rather a whole array of back mutations, which could perfectly well be explained by an artificial combination of distant haplotype motifs. Third is the abnormal mutational spectrum: if an agglomeration of unusual mutations was scattered across the mtDNA data deemed to be ancient, then post mortem changes and phantom mutations would have transformed the potentially authentic mtDNA to a degree that the resulting sequences would be virtually useless. In order for the latter two indicators to be applicable to an ancient DNA study, a moderate number of ancient individuals need to be tested for multiple different mtDNA fragments each.

General caveats

Amplified products of presumed ancient mtDNA are not always subjected to cloning and subsequent sequencing but sometimes to indirect procedures instead,25 particularly when only a limited number of additional diagnostic sites are of interest (to the researcher). Restriction fragment length polymorphism (RFLP) analysis and its interpretation may, however, even be tricky with modern mtDNA because of incomplete digestion or because bands may not be readily discernible and assignable to single mutations; its use in ancient DNA studies promotes ample scope for contaminating sequences to make their way into the results. Last but not least, the multiple testing for different enzymes and amplified fragments invites sample mix-up to act upon the outcomes (such as in a case of modern mtDNA26). This convenient but risky strategy to get around sequencing, even when merely confined to a second round of experiments for confirmation, is patently unsuitable for samples of degraded DNA. There are several cases of ‘ancient’ haplotypes where RFLPs and partial D-loop sequences constitute mosaic compound haplotypes.27, 28

Contamination and post mortem damage are not the only pitfalls the scholar of ancient DNA is confronted with. Low copy number of partly preserved mtDNA also entails an elevated risk of propagating sequencing artefacts due to suboptimal sequencing biochemistry and reading software. This constitutes not only a particular challenge to ancient DNA, but also remains a problem for modern mtDNA studies; see for example the Ladin data,8, 29, 30 the Syrian data,8, 31 or the Turkish mtDNA data.32 In particular, the latter data include an elevated number of otherwise rare or hitherto unobserved transversions (>5% of the sequences even carry two transversions). In half of the instances (16292A, 16351T, 16343C, 16322T, 16140A, 16258C, and 16303T) a simple pattern emerges: it is the successor nucleotide that is copied into the position in question, thus suggesting a sequencing problem (and lack of visual inspection of the sequencer outputs). In other cases, clerical and documentation errors as well as sample mix-up in the lab seem to dominate.24, 33, 34, 35 Many published datasets seem to suffer from such errors. There is no reason to believe that ancient DNA data are immune against these problems, which would thus come on top of the other notorious artefacts.28
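
The ‘successor nucleotide’ pattern described above lends itself to a mechanical check. The following is only a minimal sketch: the function name, the toy reference string, and its numbering are hypothetical and chosen for illustration (the toy string is not a stretch of rCRS). It tests whether a reported variant base equals the reference base at the next position.

```python
def copies_successor(reference: str, first_pos: int, pos: int, observed: str) -> bool:
    """Return True if the observed base equals the reference base at pos + 1.

    reference : reference sequence for the sequenced stretch
    first_pos : position number of reference[0]
    pos       : position of the reported variant
    observed  : variant base reported at pos
    """
    offset = pos - first_pos
    if offset < 0 or offset + 1 >= len(reference):
        raise ValueError("position or its successor lies outside the reference")
    return reference[offset + 1].upper() == observed.upper()


# Toy reference (hypothetical, NOT rCRS): positions 101..106 read A C G T A C.
TOY_REFERENCE = "ACGTAC"
TOY_FIRST_POS = 101

# Hypothetical variant calls: (position, reported base).
calls = [(102, "G"), (102, "T"), (104, "A")]
flags = [copies_successor(TOY_REFERENCE, TOY_FIRST_POS, p, b) for p, b in calls]
print(flags)
```

A high share of flagged transversions in a dataset would point to a reading problem rather than to genuine variation, although such a check cannot replace visual inspection of the sequencer outputs.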

The only way to shield against phantom mutations is to sequence and read both strands. This strategy is often implicitly alluded to but, alas, rarely applied in practice. Indeed, the polycytosine tract in HVS-I created by C at position 16189 effectively serves as a barrier for reading a strand beyond the tract in either direction (because of slippage and resulting sequence overlay).36 To circumvent this dilemma, two primer pairs are needed with one primer each flanking the C tract.37, 38 When, instead, only one pair of primers is used for a region covering 16189, then this reading problem is inevitable. Therefore, the statement ‘For precision, forward and reverse sequences were always obtained for each mtDNA segment and individual’39 must be false in the presence of sequences with a long C tract in HVS-I. Similar problems can arise in HVS-II after position 310. For example, whenever one sees the HVS-II positions 317, 320, 330, 343, and 345 mutated in concert,39 then the amplified product is a mixture of sequences with elongated C tracts 303–309 of different lengths.40 The occurrence of such a phenomenon would also clearly indicate that only the light strand was analysed.

The amplification strategy employed can adversely influence the outcome of an aDNA sequencing attempt. Nested PCR, in comparison to direct PCR, is known to be more vulnerable to background noise and phantom mutations, possibly introduced by low-fidelity Taq polymerase in the first round of PCR.41 ‘More importantly, the excess of PCR cycles and the handling of amplification products during the procedure introduce an increased risk of contamination’.42 A nested-PCR assay,43 without independent replication in another lab, thus bears a high risk of producing artefacts.

Phylogeographic paradox

A straightforward case of the phylogeographic paradox at work has been documented recently.43 Two of the ancient ‘Fuegian’ haplogroup D sequences bear the full motif (16294–16296–16304, constituting a typical mtDNA lineage of the European haplogroup T2) of the mtDNA from the Spanish researcher himself and another two, one from haplogroup C and D each, have the partial motif 16294–16296.43 This transition pair is essentially confined to haplogroup T2 (together with the 16126 transition outside the sequencing frame). Even the occurrences in major African haplogroups (L1c and L2) for which the 16294 transition is among the characteristic mutations are very small in number.44 The artificial status of this tandem mutation in the three haplogroup D sequences is underscored by the fact that the sequences are deprived of both transitions (16325 and 16362) beyond 16223 that would be expected for Native American haplogroup D lineages. One can therefore conclude that the tandem mutation 16294–16296 could hardly have arisen at least two times independently in those four ‘Fuegian’ mtDNA sequences but rather that they constitute recombinants generated through contamination. This demonstrates that the contamination controls exercised43 were insufficient.
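
The comparison underlying this argument can be sketched in a few lines. This is a hedged illustration only: the function names are hypothetical, and the two sample motifs are written to be consistent with the description above (16223 present, 16325 and 16362 absent) rather than copied from the published table. The sketch reports which share of a staff member's HVS-I motif reappears in a putative ancient sequence.

```python
from __future__ import annotations

import re


def parse_motif(motif: str) -> frozenset[int]:
    """Turn a motif such as '16294–16296–16304' into a set of positions."""
    return frozenset(int(site) for site in re.split(r"[–-]", motif) if site)


def staff_motif_share(sample: str, staff: str) -> float:
    """Fraction of the staff member's motif sites that occur in the sample."""
    staff_sites = parse_motif(staff)
    if not staff_sites:
        raise ValueError("staff motif is empty")
    return len(parse_motif(sample) & staff_sites) / len(staff_sites)


# Motif of the researcher's own haplogroup T2 lineage, as given in the text.
RESEARCHER = "16294–16296–16304"

# Illustrative sample motifs (assumed for this sketch, not taken from the data).
full_share = staff_motif_share("16223–16294–16296–16304", RESEARCHER)
partial_share = staff_motif_share("16223–16294–16296", RESEARCHER)
print(full_share, partial_share)
```

A share close to one on a haplogroup background where the motif is otherwise unknown is what makes contamination the more economical explanation; the threshold at which one becomes suspicious is a matter of judgement and is not fixed here.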

Mosaic structure

There is an interesting case of crosscontamination between samples in the ‘Fuegian’ data.43 The mtDNA lineage with motif 16223–16241–16342–16362 is the likely ancestral haplotype for the ‘Cayapa’ subhaplogroup of haplogroup D, as inferred from recorded instances.45, 46, 47, 48 To my knowledge, the only other appearance of the motif 16241–16342 is in an mtDNA lineage sampled in Vanuatu.49 Now, the ‘Cayapa’ motif, albeit without the expected 16362 transition, appears twice in the ‘Fuegian’ data: once in a haplogroup C sequence (F41) and once in a haplogroup D sequence (F69). Moreover, this haplogroup C sequence lacks the specific transitions at 16298, 16325, and 16327. Although single back mutations at any of these three sites have been found in haplogroup C lineages from Native Americans, a triple back mutation would be extremely implausible. Therefore, one is forced to conclude that the compound RFLP/HVS-I haplotype F41 constitutes a recombinant type.
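
The counting argument can be made explicit with a small sketch. The diagnostic sites are those named in the text (16298, 16325, and 16327 for haplogroup C; 16325 and 16362 for haplogroup D, each beyond 16223); the HVS-I site set used for F41 is my reading of the description above (the ‘Cayapa’ motif without 16362), and the function name is hypothetical.

```python
# HVS-I transitions expected beyond 16223 for the two Native American
# haplogroups discussed in the text.
DIAGNOSTIC_SITES = {
    "C": {16298, 16325, 16327},
    "D": {16325, 16362},
}


def back_mutations_needed(haplogroup: str, observed_sites: set[int]) -> list[int]:
    """List the expected diagnostic sites that are absent from the sequence.

    Each absent site would require one back mutation if the haplogroup
    assignment (e.g. from RFLP typing) and the HVS-I sequence were both
    authentic and from the same molecule.
    """
    return sorted(DIAGNOSTIC_SITES[haplogroup] - set(observed_sites))


# F41: typed as haplogroup C by RFLP; HVS-I sites assumed from the text.
f41_sites = {16223, 16241, 16342}
missing = back_mutations_needed("C", f41_sites)
print(missing, len(missing))
```

One missing site is compatible with a single back mutation; three at once is the situation that the text judges extremely implausible and takes as a sign of a recombinant compound haplotype.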

A similar strategy for screening an additional site by RFLP analysis was followed earlier.31 It was claimed that the following ‘genetic characterization of the body attributed to the evangelist Luke’ was obtained: transitional differences at sites 16235 and 16291 in HVS-I as well as the restriction-site change +7025 AluI (thus excluding haplogroup H membership). In contrast, haplogroup H status has been confirmed for some sequences bearing the motif 16235–16291 (sampled in Galicia: A Salas, personal communication). No attempt, however, was made to screen the total available mtDNA database or necessary new samples from modern populations in Europe or the Near East for this compound motif, so that it appears questionable whether the compound haplotype (16235–16291/+7025 AluI) is really authentic. Note that, in contrast to the role of RFLPs here, RFLP screening was elsewhere used for aDNA as ‘a strategy to authenticate the identity of ancient mitochondrial DNA (mtDNA), based on the previously established relationship between D-loop sequence substitutions and haplogroup-specific restriction site changes’,50 in other words, to seek for hints at mosaic patterns.

A likely case of mosaic structure is hidden in a recent high-profile publication,51 where the authors claimed to have retrieved authentic mtDNA information for two 24 000-year-old European (‘Cro-Magnon’) human specimens. The variation in the first hypervariable segment of the D-loop (HVS-I: 16024–16400) was assessed in two labs with 2–4 primer pairs. It was asserted that specimen Paglicci-25 is identical to rCRS and that Paglicci-12 differs from rCRS by a transition at 16223 in HVS-I. According to the authors, specific mtDNA sites outside HVS-I were also analysed (‘by amplification, cloning, and sequencing of the surrounding region’), but no details were given in the paper, except for the report that Paglicci-25 has −7025 AluI and bears nucleotides A at 73, G at 11719, and A at 12308, whereas Paglicci-12 shows G at 73, C at 10873, T at 10238, and AACC at 10397–10400. These additional analyses were carried out in only one lab and not duplicated in a second lab. The authors further asserted that the mtDNA of both specimens belongs to typical Near Eastern haplogroups. In particular, the mtDNA of specimen Paglicci-12, with claimed mutations at sites 73, 10873, and 16223 but none in the stretch 10397–10400 relative to rCRS, was regarded as a member of haplogroup N. They have, however, confused the roles of C and T at 10873 in the mtDNA phylogeny – in fact, C at both sites 10400 and 10873, as observed in Paglicci-12, indicates that this mtDNA haplotype should belong to neither of the Eurasian/Oceanian haplogroups M and N, which completely cover the non-African mtDNA pool. Therefore, we would be led to sort this mtDNA lineage into an (unknown) African subhaplogroup of the superhaplogroup L3; but this does not sit easily with the claimed nucleotide A at 10398.52 The most plausible explanation then is that we are seeing here a mosaic origin of the compound mtDNA haplotype for Paglicci-12. If these problems had been realized in time by the authors themselves, then further coding-region markers could have been analysed for a more thorough characterization of the targeted mtDNA.
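
The role of the two coding-region sites can be written down as a simple decision rule. This is a sketch under stated assumptions: the text gives the states for haplogroup N and for the ‘neither M nor N’ case, while the rule that haplogroup M carries T at 10400 together with C at 10873 is standard background knowledge supplied here, not a statement from the paper; the function name and return labels are hypothetical.

```python
def classify_by_10400_10873(nt_10400: str, nt_10873: str) -> str:
    """Coarse placement of an mtDNA haplotype from two coding-region sites.

    rCRS (a haplogroup N lineage) has C at 10400 and T at 10873.
    """
    a, b = nt_10400.upper(), nt_10873.upper()
    if (a, b) == ("C", "T"):
        return "N"
    if (a, b) == ("T", "C"):
        return "M"  # background knowledge, not stated in the text
    if (a, b) == ("C", "C"):
        return "neither M nor N"
    return "unexpected combination"


# Paglicci-12 as reported: AACC at 10397–10400 (so C at 10400), C at 10873.
paglicci_12 = classify_by_10400_10873("C", "C")
rcrs = classify_by_10400_10873("C", "T")
print(paglicci_12, rcrs)
```

Applied to the reported nucleotides, the rule places Paglicci-12 outside both M and N, which is the inconsistency with the published haplogroup N assignment discussed above.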

Multiple occurrences of the same tandem mutation on lineages from disjoint haplogroups can distinctly signal artefacts.8, 35 For instance, the ‘Etruscan’ data53 harbour the transition pair 16193–16219 thrice, on quite different HVS-I backgrounds, even separated by a restriction site: once jointly with +14766 MseI and twice with −14766 MseI. The transition 16219, however, is an infrequent mutation,8 which is essentially confined to haplogroup U6ab (having the motif 16172–16219)54 and also found within a specific subhaplogroup of haplogroup H6 (having the motif 16362–16482–239),55 and which otherwise has only sporadic occurrences. The claim that position 16219 mutates almost three times56 (or elevated to five times57) as fast as a HVS-I position on average cannot be substantiated with proper phylogenetic analyses. Although the motif 16193–16219 occurs in the ‘Etruscan’ data without the transitions 16172 or 16362, it is not necessary to assume that some phantom mutations reproduced this mutation pair since the motif very well matches the specific haplogroup H6 haplotype within the middle one of the three fragments (16024–16131, 16108–16260, and 16248–16384) that were generated by separate amplifications. One cannot exclude the possibility that some crosscontamination with 16193–16219 became dominant in several amplicons of the middle fragments.
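
Which amplification fragment carries which motif site can be checked directly from the fragment boundaries given above. The sketch below is illustrative only, and the function name is hypothetical; it lists the fragments that cover a given HVS-I position.

```python
# The three separately amplified HVS-I fragments reported for the
# 'Etruscan' data (inclusive position ranges, as given in the text).
FRAGMENTS = [(16024, 16131), (16108, 16260), (16248, 16384)]


def fragments_covering(position: int) -> list[tuple[int, int]]:
    """Return every fragment whose range includes the position."""
    return [(start, end) for start, end in FRAGMENTS if start <= position <= end]


for site in (16193, 16219, 16362):
    print(site, fragments_covering(site))
```

Both 16193 and 16219 fall only in the middle fragment, whereas 16362 falls only in the third, so a contaminating haplogroup H6 molecule that dominated the middle amplicon alone would yield 16193–16219 without 16362.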

Abnormal mutational spectrum

An abnormal mutational spectrum is often identifiable by several single extremely rare mutations or novel mutation pairs (tandem mutations), which are distributed across the dataset. The table that is supposed to display ‘Etruscan’ mtDNA variation53 includes, for instance, the transition 16334, which has so far been observed in only one published dataset – the ‘Ladins’,29 known for the high accumulation of sequencing artefacts.8 In the ‘Etruscans’53 the 16334 transition appears twice, but on different HVS-I backgrounds and connected with different RFLP results, viz. with +14766 MseI and −14766 MseI, respectively, so that one is led to assume even two independent mutations at site 16334 in this single small data set. Changes at 16095 are rare elsewhere, but here we see the transition 16095 as well as the transversion 16095G, which otherwise appear mainly in old datasets,58 most of which are of dubious quality. In this context, it is interesting to note that site 16095 received the top score (29 instances) for ‘N’ (undetermined nucleotide) among 850 HVS-I sequences59 that found their way into ‘D-Loop-BASE’ (a German forensic mtDNA database, which, however, has recently gone offline because of serious problems: http://www.d-loop-base.de/). In a recent screening of >5000 pairs of electropherograms for both strands, site 16095 was among the top four HVS-I sites for which the strands showed discrepant signals due to phantom mutations.40 It therefore seems that certain sites, such as 16095, may be prone to background noise and phantom mutations under suboptimal sequencing conditions. There are further mutations in the ‘Etruscan’ data which are otherwise extremely rare (such as the 16098 transition). For instance, the transition 16229 is reported only three times in the European mtDNA pool,60, 61, 62 but in the ‘Etruscans’ it appears on two different branches of the mtDNA phylogeny, and in one of the sequences even jointly with the neighbouring transition 16228. Worldwide, the latter transition was so far found in only one haplogroup D lineage,63 in two different haplogroup B lineages of one dataset,64 and in two further datasets,65, 66 both of which appear to be problematic.24, 67 Summarizing, there is ample evidence for the action of phantom mutations and/or post mortem damage on the Etruscan data.68, 69

With regard to sequence quality, the ‘Etruscan’ sequences53 and the modern Turkish sequences32 are in sharp contrast with the ‘Egyin Gol’ sequences.70 Two of the indicators (viz. phylogeographic paradox and abnormal mutational spectrum) are (partly) applicable but give no alarm in the case of the latter dataset. In particular, absolutely no signal of phantom mutations (or poor documentation) can be discerned in the ‘Egyin Gol’ data, which look perfectly like modern mtDNA data from Central Asia, whereas both former datasets suffer from obvious phantom mutations (or post mortem damage). It is, however, problematic that the authors70 did not adhere to the Nine Criteria, dismissing the most important requirements of Cloning and Independent replication. This could give the green light for high-throughput sequencing of ancient DNA to those laboratories which are much less successful with the technical side of mtDNA sequencing.

Conclusion

In summary, most aDNA studies31, 43, 51, 53 failed to utilize the full set of Nine Criteria of Authenticity. In particular, discarded results that might represent contamination were not always reported (as required for Control amplifications) and, most importantly, not all produced HVS-I sequences and none of the additional mtDNA fragments outside the HVS-I range were confirmed in a second lab (thus violating Independent replication), although all the mtDNA information obtained was essential for the conclusions made in the papers. The amplification ranges were chosen so economically that they hardly overlapped (contra the recommendation for Reproducibility and Cloning). This minimalist strategy of amplifying complementary fragments rather than widely overlapping mtDNA fragments has also the disadvantage that sample mix-up would go unnoticed in the lab. Multiple cloning cannot detect such problems when carried out in one go, since a wrongly drawn sample would influence all of the cloning experiments.24 It does not suffice to demonstrate authenticity of the sequences in an ancient population study by cloning only some53, 71 of the PCR products rather than all.

As the authors of the Nine Criteria lamented, ‘high-profile journals continue to publish studies that do not meet the necessary controls’.14 The situation has evidently not improved since then. Many researchers43, 72 still ignore most of the Nine Criteria – possibly because they believe that they are not really necessary – but give no indication as to why the criteria ignored are not relevant for the case under study. For example, as for the racemization test, one may argue that it is not infallible as a sine qua non condition and would rather serve as an informal guideline in the lab to see whether it might be worth pursuing the sequencing attempts when contrasting candidate samples. The Nine Criteria were, however, not discussed in a recent book73 on ancient DNA typing, thus missing ‘an opportunity to help eradicate the mistakes that are frequently made by newcomers to aDNA research’.74

Experiments have not always been well designed in the aDNA field31, 51, 53 because from the outset a phylogeographic paradox had no chance to show up and thus to separate expected contamination from expected authentic sequences. To exclude only lab personnel as potential contaminators is certainly not sufficient. For example, in the experiments with ‘Guanche’ mtDNA, the authors positively detected some contamination of controls but had to admit that ‘none of these contaminating sequences belong to the people working in the lab or to those known to be involved in the archaeological manipulation’.72 Untraceable (non-lab) contamination was also reported in a case in which multiple sequences were obtained from a single specimen.75 In some studies,51, 53 mosaic structure and abnormal mutational spectrum were not discovered in time since the search strategies employed were insufficient and only a very small fraction of the worldwide mtDNA database (currently comprising >30 000 HVS-I sequences) was taken for comparison. Cases where mosaic structure and abnormal mutational spectrum positively suggest nonauthenticity of ancient (or modern) mtDNA continue to find their way into high-profile journals. Researchers planning to perform aDNA studies on human bones or teeth should pause and reflect on whether they are really able to follow all necessary and further criteria for aDNA work, and to evaluate fresh aDNA data in the light of an edited worldwide database, so that they can eventually convince both reviewers and readers that the data obtained are authentic.76