A skullcap found in the Salkhit Valley in northeast Mongolia is, to our knowledge, the only Pleistocene hominin fossil found in the country. It was initially described as an individual with possible archaic affinities, but its ancestry has been debated since the discovery. Here, we determine the age of the Salkhit skull by compound-specific radiocarbon dating of hydroxyproline to 34,950–33,900 Cal. BP (at 95% probability), placing the Salkhit individual in the Early Upper Paleolithic period. We reconstruct the complete mitochondrial genome (mtDNA) of the specimen. It falls within a group of modern human mtDNAs (haplogroup N) that is widespread in Eurasia today. The results now place the specimen into its proper chronometric and biological context and allow us to begin integrating it with other evidence for the human occupation of this region during the Paleolithic, as well as wider Pleistocene sequences across Eurasia.
Mongolia has been a focus of attention for researchers interested in human origins for more than a century and substantial advances in the understanding of the Mongolian Paleolithic have been made in the last few decades1. The discovery of Paleolithic remains is, however, still mainly limited to restricted areas in the centre part of the country (Fig. 1, Supplementary Table 1). In addition to surface surveys, recent excavations revealed aspects of the Paleolithic occupation of the region, enabling archaeologists to begin integrating the Mongolian Paleolithic with other Pleistocene sequences across Eurasia. In congruence with the neighbouring regions of southern Siberia and the Siberian Altai, the Upper Paleolithic in Mongolia has been divided into three phases: Early, Middle and Late Upper Paleolithic. In contrast with the Lower and Middle Paleolithic, which are characterized by few sites, the start of the Upper Paleolithic is relatively well defined by the presumably persistence of earlier Mousterian or Levallois industries and the appearance of blade production in Northern Mongolia, mirroring the southern Siberian record2. The Early Upper Paleolithic (EUP) is usually subdivided into two phases. The first, spanning from 40,000 BP to 35,000 BP, includes the site of Rashaan Khad, which is about 150 km SW from Salkhit (dated at 39,100 ± 1000 BP (OxA-34324))3. The second, represented by assemblages from the Khangai Mountains and the Gobi Altai district, covers the period between 33,000 BP and 26,000 BP1,4,5. The Mousterian or Levallois elements disappear with the Middle Upper Paleolithic (MUP) in Mongolia. MUP sites have been identified only in the Orkhon River valley and are younger than ca. 25,000 BP1. Finally, the later phase of the Mongolian Upper Paleolithic (LUP) is characterized by microliths and lasts until the end of the Pleistocene1,2,3.
The skullcap analysed in this study was discovered in 2006 during mining operations in the Salkhit Valley of the Norovlin county in the Khentii province, eastern Mongolia (48°16′17.9″ N and 112°21′37.9″ E). It is, so far, the only Pleistocene human fossil found in the country. Subsequent investigations around the find-spot did not yield additional fossil remains. The skullcap has a mostly complete frontal as well as two partially complete parietal and nasal bones. Ages suggested for the specimen range from the Early Middle Pleistocene to the terminal Late Pleistocene. The presence of archaic features has led to a potential affiliation to an uncharacterized archaic hominin species. The specimen was thus initially referred to as Mongolanthropus6. The comparison of the dimensions of the skullcap with reference skullcaps by multidimensional scaling analysis showed similarities with Neanderthals, Homo erectus and Asian archaic Homo sapiens7. Based on the published photographs, Kaifu and Fujita8 suggested that the Salkhit specimen belongs to a terminal Pleistocene modern human. Additional comparisons with Middle and Late Pleistocene hominin fossils from northeast Asia (Zhoukoudian Locality 1, Dali, and Zhoukoudian Upper Cave) concluded that the peculiar features of the Salkhit skull are more likely to be regionally predominant modern human features than diagnostic features of an archaic species9,10.
Because of the dearth of hominin fossils recovered in Mongolia, the Salkhit skull represents a unique opportunity to investigate the types of humans that occupied the region during the Pleistocene. Here, we present the results of both chronometric and genetic analyses of the Salkhit specimen. This fossil dates to approximately 34–35 thousand years ago and its mitochondrial genome is of Eurasian modern human type.
Results and Discussion
Since its discovery in 2006, the Salkhit fossil (Reference number: 2006-4) has been kept at the Institute of History and Archaeology (formerly, Institute of Archaeology), Mongolian Academy of Sciences, in Ulaanbaatar. The specimen was sampled for radiocarbon and genetic analyses in four locations (B, C, D and E, Fig. 2) in the internal part of the skull at the posterior tip next to an area that had been sampled prior to this study (A, Fig. 2). In total, ~1 g of bone was sampled.
The Salkhit skull (Sample A, Fig. 2) was previously dated to 22,105 ± 45 BP by the Beta Analytic Inc. radiocarbon laboratory (Miami, Florida, USA), but this result has never been formally reported11. Unfortunately, no information is available about the sample preparation methods that had been applied. Another fraction of the same sample A was also dated at the Oxford Radiocarbon Accelerator Unit (ORAU) in 2010. It was pre-treated following the routine procedure comprising decalcification, base wash, reacidification, gelatinisation and ultrafiltration steps (Coded ‘AF’ in the ORAU), as described by Brock et al.12. It produced a date of 23,630 ± 160 BP (Table 1). The C:N atomic ratio of carbon to nitrogen for the extracted collagen was 3.5. Even though this value falls within a range of C:N values [2.9–3.5] that passes the quality control criteria of most radiocarbon facilities, the value is relatively high and could indicate that some of the contamination was not entirely removed from the sample. We therefore decided to re-date the skullcap using a compound-specific approach13. The higher efficiency of this method in removing contaminants from heavily contaminated samples and its ability to provide accurate 14C measurements has been demonstrated in several recent cases involving Paleolithic sites in France14, Croatia15, Russia16,17,18,19,20,21, and the Americas22,23.
A new bone sample was obtained and after extraction, the collagen was hydrolyzed and the amino acid hydroxyproline was isolated using preparative High Performance Liquid Chromatography (Prep-HPLC). The hydroxyproline was combusted, graphitised, and measured by AMS, producing a date of 30,430 ± 300 BP (OxA-X-2717-25). The C:N atomic ratio measured was 5.0, corresponding to the expected theoretical value for hydroxyproline (C5H9NO3) and indicating the absence of contamination in the sample after the Prep-HPLC treatment. The date were calibrated using OxCal 4.324 and the INTCAL13 calibration curve25, yielding an age range of 34,950–33,900 Cal BP (at 95% probability) (Table 1, Fig. 3). This is significantly older than the two dates obtained previously. Such differences have been observed before in cases where bones were heavily contaminated with conservation agents, indicating that the standard chemical pretreatment12 is sometimes not efficient enough to remove all high molecular-weight contamination that is present in a sample and sometimes even cross-linked to the collagen. The contamination was confirmed by additional chemical investigations using Pyrolysis Gas Chromatography coupled with Mass Spectrometry (Py-GC/MS). They revealed the presence of a silicone-based contaminant in the Salkhit bone probably used to make the endocast (Fig. 4)7. The hydrolysis of the collagen and isolation of the amino acid hydroxyproline obviated this problem.
DNA extraction and mtDNA enrichment
DNA was extracted from 30.9, 27.0 and 33.1 mg of powder removed from the sampling spots B, C and D, respectively (Fig. 2)26. Aliquots of the extracts were converted to single-stranded libraries27 and enriched for human mtDNA fragments using a present-day human mtDNA probe set28. The enriched library molecules were sequenced from both ends. Overlapping paired-end reads were merged to reconstruct full-length molecules sequences, which were then mapped to the revised Cambridge reference sequence of the human mitochondrial genome (rCRS, GenBank acc. no. NC_0120920). After discarding sequences shorter than 35 base pairs (bp) and fusing sequences with identical start and end coordinates, 264,884 unique mtDNA fragments remained (Table 2).
To assess whether or not molecules of ancient origin were present, we estimated the frequencies of cytosine (C) to thymine (T) mismatches along the sequences relative to the reference sequence. Such mismatches result from deamination of C residues in ancient DNA, particularly towards the ends of molecules29. The C to T mismatch frequencies ranged from 34.9% to 41.6% at the first (5’) position and from 20.6% to 25.7% at the last (3’) position of the sequences, respectively (Table 2) (Supplementary Fig. 1), with the library yielding the largest number of mitochondrial sequences exhibiting the lowest terminal C to T mismatch frequencies. The differences among the libraries suggest that, in addition to endogenous ancient DNA molecules, contaminating present-day human mtDNA molecules were extracted and incorporated in the libraries. This is further supported by the observation that DNA fragments that carry C to T mismatches at one end have higher frequencies of C to T mismatch at the other end than when all DNA fragments are considered (Table 2) (Supplementary Table 2)30. We conclude that the DNA libraries contain authentic ancient molecules carrying terminal C to T mismatches as well as present-day contaminating DNA molecules.
Contamination and Salkhit mtDNA consensus sequence
To reduce the impact of cytosine deamination on the reconstruction of the mitochondrial genome, we masked T bases at the first and last three positions of each mtDNA fragment before aligning them to the rCRS reference human mitochondrial genome. The estimation of present-day human contamination in each library relied on two positions (pos. 10,364 and pos. 14,578 in the rCRS reference genome) where the Salkhit mtDNA genome sequence differs from a worldwide panel of 311 present-day human mtDNA genomes. The proportions of sequences covering both positions and not matching the Salkhit-specific state are 19.6%, 35.2% and 42.6% for the libraries made out the extracts B, C and D, respectively (Table 2).
Because of the high level of contamination of the specimen, we called the Salkhit mtDNA consensus using only the 38,024 mtDNA fragments carrying C to T mismatches to the human reference mtDNA in one of their three first or three last positions recovered from the three DNA libraries. This significantly lowers the amount of modern human contaminating sequences and reduces any potential bias for contaminating molecules resulting from the use of modern human capture baits and reference sequence for mapping. The consensus was called for 16,567 positions, out of 16,569, where at least 80% of the fragments supported the same base (Supplementary Fig. 2) with an average coverage of 123 DNA fragments per nucleotide position. One of the two missing positions was in a C-homopolymer stretch (pos. 295–320), which after visual inspection was manually corrected, while the other (pos. 16,093) is covered by 117 fragments from which 88 carry a C while 29 carry a T (Supplementary Fig. 2). This observation might reflect heteroplasmy and was left uncalled.
For the alignments of deaminated fragments only, we estimated the level of contamination to 4.1%, 1.6% and 3.7% for libraries made from extracts B, C and D, respectively (Table 2). Given the coverage of the genome, this level of contamination is not expected to affect the accuracy of the consensus sequence.
MtDNA phylogeny and dating
The Salkhit mtDNA lineage was assigned to the modern human macro-haplogroup N using HaploGrep231 (based on mtDNA tree Build 1732). Haplogroup N and M are the two basal mtDNA haplogroups shared among all present-day non-Africans. While the mosaic of archaic-like and modern human-like morphological traits have made the assignment of the fossil to Pleistocene hominin groups difficult, we thus show that the mtDNA of the Salkhit individual is of modern human origin. However, the sequence does not carry any substitutions characteristic of known sub-haplogroups inside the haplogroup N. A maximum parsimony analysis (subtree Fig. 5) assigns the Salkhit mtDNA to an uncharacterized lineage which branches off the root of haplogroup N. It is therefore unlikely that the Salkhit mitochondrial lineage is directly ancestral to any present-day human mtDNA. Among ancient modern humans, only the mtDNA of the ~40,000-year-old Romanian Oase 1 individual33 falls outside the known sub-lineages of N or M, suggesting the existence of more mtDNA diversity among early modern humans in Eurasia than among later and present-day Eurasian populations.
The maximum parsimony tree (Fig. 5) suggests that the Salkhit mtDNA accumulated 10 substitutions since its divergence from its closest relatives in the N haplogroup. Using a mtDNA mutation rate of 2.67 × 10−8 substitution per site per year34, we estimate that the Salkhit mtDNA diverged from the root of the N haplogroup 19,000 to 28,000 years (95% HPD) before the Salkhit individual lived. Considering the reported divergence time of the N haplogroup between 70 and 50 ka ago35,36,37, this estimate is compatible with the calibrated radiocarbon date.
We also estimated the age of the Salkhit specimen mtDNA using a Bayesian method implemented in BEAST38 and 58 mtDNA sequences from archaeologically dated specimens as calibration points (Supplementary Table 3). Applying a strict clock and the mutation rate above we obtain a date between 12,910 and 39,410 years BP (95% HPD) with a median of 26,600 years BP, supported by a Bayesian inference of phylogeny tree (Supplementary Fig. 4). This date is consistent with the calibrated radiocarbon date as well as with the molecular date inferred from the number of private substitutions accumulated in the Salkhit mtDNA since its divergence from the root of the N haplogroup.
The morphological traits of the Salkhit skullcap have fueled the debate about the origins of the prehistoric humans of the region, where it is, to our knowledge, the only Pleistocene human fossil discovered. The complete mtDNA sequence reconstructed from the Salkhit skull falls within the variation of modern human mtDNAs, showing that the maternal lineage of the specimen is of modern human ancestry. The compound-specific radiocarbon date shows that the previous dates of 22 and 23 ka BP were too young. This is due to the presence of a silicon-based contaminant that could not be removed with other pretreatments. It is now clear that the Salkhit individual belongs within the EUP in Mongolia. Based on this new chronological evidence and its location, we hypothesize that the Salkhit individual, although found without any archaeological context, may have been familiar with the Levallois-like technology and blade production industry of the EUP found in Northern Mongolia and in Southern Siberia.
Two different methods were used to prepare the samples for AMS dating. First, samples were pre-treated following the routine procedure at the Oxford Radiocarbon Accelerator Unit (ORAU) comprising a decalcification, base wash, reacidification, gelatinisation and ultrafiltration (Coded ‘AF’ in the ORAU)12. The second sample taken from the skullcap was dated using the single amino acid radiocarbon dating method optimized at the ORAU13. This method involves separation of the underivatised amino acids from hydrolyzed bone collagen samples using preparative High Performance Liquid Chromatography (Prep-HPLC). The amino acid hydroxyproline is isolated by Prep-HPLC, combusted, graphitized and AMS dated. This pretreatment approach (Coded ‘HYP’ in the ORAU) is the most efficient technique to remove contaminants including, but not limited to, conservation materials (unless collagen-based glue has been applied). The %C, %N and atomic C/N ratio were measured using an automated carbon and nitrogen elemental analyzer (Carlo Erba EA1108) coupled with a continuous-flow isotope monitoring mass spectrometer (Europa Geo 20/20).
DNA sampling, extraction and mtDNA sequence determination
After the removal of approximately 1 mm of surface material using a sterile dentistry drill, which was performed to reduce present-day human DNA contamination, the sampling for DNA analyses was carried out in areas B, C and D (Fig. 2). DNA was extracted from 30.9 mg, 33.1 mg and 27 mg, respectively, using a silica-based protocol optimized for the recovery of short molecules from ancient material26. An extraction blank was also carried through each step of DNA extraction. Ten microliters of each extract (of a total of 50 µL) were converted into a single-stranded DNA library as described elsewhere27 using an automated liquid handling platform39. One positive and one negative control were included during library preparation40. No uracil-DNA glycosylase treatment was performed. After amplification with a pair of primers carrying unique index sequences41 and quantification using a NanoDrop ND-1000 (NanoDrop Technologies) photospectrometer, 1 µg of each sample library and the blank library were enriched for mtDNA using present-day human mtDNA probes (in 1-bp tiling density) as described28,34. Before sequencing, libraries were amplified in a one-cycle-PCR using Herculase II Fusion DNA polymerase (Agilent)42 with primers IS5 and IS643 to remove heteroduplices.
The libraries were pooled and sequenced on a MiSeq (Illumina Inc.) using a paired-end configuration of 76 + 7 cycles for each insert and index read41. Base calling was performed using Bustard (Illumina), and sequences that did not exactly match one of the expected index combinations were discarded. Adapter sequences were trimmed and overlapping paired-end reads merged using leeHom44 before mapping the sequences to the revised Cambridge Reference mitochondrial genome (rCRS, NC_0120920) using the Burrows-Wheeler Aligner (BWA)45 with parameters « -n 0.01 –o 2 –l 16500 »46. Mapped sequences with identical alignment start and end coordinates were collapsed using bam-rmdup (https://bitbucket.org/ustenzel/biohazard). Only overlap-merged sequences of at least 35 base pairs in length were retained for subsequent analyses.
After masking terminal C to T substitutions, a perl script was used to call the consensus mtDNA sequence (based on the «mpileup» command of SAMtools47) requiring a minimum coverage of 5 and a consensus support of at least 80% at each position.
The Salkhit mtDNA haplogroup was assigned using Haplogrep31 and mtPhyl5 (https://sites.google.com/site/mtphyl/home) based on the Phylotree database (mtDNA tree Build 17)32. The number of mutations that occurred in the Salkhit mtDNA since its divergence from the root of haplogroup N was inferred by maximum parsimony analysis using mtPhyl.
A multiple sequence alignment of the Salkhit mtDNA sequence to 115 complete mtDNAs of modern humans18,34,36 and one Neandertal mtDNA48 was generated using MAFFT49. We then estimated the molecular age of the Salkhit mtDNA based on its branch length in a Bayesian statistical framework38 as implemented in BEAST250 using the dataset above. We used tip dates of zero for present-day mtDNA sequences and individual fossil dates intervals as priors for the ancient mtDNA sequences (Supplementary Table 3). As determined by jModelTest2, we performed the analyses using a GTR +I+G model for nucleotide substitution with a gamma distributed rates among sites. A strict clock and an uncorrelated lognormal-distributed relaxed clock, both under a constant size population and a Bayesian skyline coalescent tree prior, were tested. To estimate the posterior distribution of each model, we performed two runs of the Markov Chain Monte Carlo (MCMC) algorithm implemented in the Beast software. Each MCMC run was performed with 50,000,000 iterations and sampled every 10,000 generations. After discarding 10% of the sampled iterations as burn-in, the output was analyzed with Tracer v1.5.0 (http://tree.bio.ed.ac.uk/software/tracer/). The effective sample sizes estimate supported the strict clock model over the relaxed clock model, while neither the constant size nor the Bayesian skyline coalescent models were rejected. The age of the Salkhit mtDNA was therefore estimated based on a strict clock model and a Bayesian skyline coalescent model.
Pyrolysis gas chromatography coupled with mass spectrometry
The analyses were performed on an EGA/PY-3030D Micro Furnace Pyrolyzer (Frontier Lab, Japan) connected to a 6890 gas chromatograph equipped with a split/splitless injector. An HP 5MS column (30 m × 0.25 mm, film thickness 0.25 µm, Agilent Technologies, USA) coupled with a deactivated silica pre-column (2 m × 0.32 mm, Agilent Technologies, USA) was used for the chromatographic separation. The program of the oven temperature used for the chromatographic separation was 40 °C isothermal for 6 min followed by a linear ramp at 20 °C/min up to 310 °C. Analyses were performed under a constant flow of helium at 1 mL/min, with a split ratio 1:20. The gas chromatograph was coupled with a 5973 Mass Selective Detector (Agilent Technologies, USA). The MS transfer line temperature was 300 °C. The mass spectrometer was operated in EI positive mode (70 eV, scanning m/z 50-600). The MS ion source was kept at 230 °C and the MS quadrupole at 150 °C.
The two samples, 450 µg of bone powder from the Salkhit skull and 250 µg of an uncontaminated mammoth bone sample from Latton (UK), used as a background standard at the ORAU, were successively placed in a stainless-steel cup and inserted in the micro-furnace where they were pyrolysed at 600 °C for 20 sec.
Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.
The complete mitochondrial DNA sequence of the Salkhit individual is deposited in the European Nucleotide Archive (http://www.ebi.ac.uk/ena) under the accession number PRJEB29760. Other data that support the findings of this study are available from the corresponding authors on reasonable request. A reporting summary for this Article is available as a Supplementary Information file.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Funding was provided by the Max Planck Society and the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013); ERC grant 324139 PalaeoChron awarded to Professor Tom Higham and ERC grant 694707-100 100 Archaic Genomes awarded to Professor Svante Pääbo. We thank the staff of the Oxford Radiocarbon Accelerator Unit at the University of Oxford for support in the lab, Cosimo Posth at the Max Planck Institute for the Science of Human History (Jena), Martin Sikora and Ashot Margaryan at the Centre for GeoGenetics, University of Copenhagen and Mateja Hajdinjak, Alexandre Huebner, Stéphane Peyrégne, Viviane Slon and Mark Stoneking at the Max Planck Institute for Evolutionary Anthropology, Leipzig for helpful discussions.