Temporal genomic data hold great potential for studying evolutionary processes such as speciation. However, sampling across speciation events would, in many cases, require genomic time series that stretch well back into the Early Pleistocene subepoch. Although theoretical models suggest that DNA should survive on this timescale (ref. 1), the oldest genomic data recovered so far are from a horse specimen dated to 780–560 thousand years ago (ref. 2). Here we report the recovery of genome-wide data from three mammoth specimens dating to the Early and Middle Pleistocene subepochs, two of which are more than one million years old. We find that two distinct mammoth lineages were present in eastern Siberia during the Early Pleistocene. One of these lineages gave rise to the woolly mammoth and the other represents a previously unrecognized lineage that was ancestral to the first mammoths to colonize North America. Our analyses reveal that the Columbian mammoth of North America traces its ancestry to a Middle Pleistocene hybridization between these two lineages, with roughly equal admixture proportions. Finally, we show that the majority of protein-coding changes associated with cold adaptation in woolly mammoths were already present one million years ago. These findings highlight the potential of deep-time palaeogenomics to expand our understanding of speciation and long-term adaptive evolution.
All sequence data (in .fastq format) for samples sequenced in this study are available through the European Nucleotide Archive under accession number PRJEB42269. Previously published data used in this study are available under accession numbers PRJEB24361 and PRJEB7929.
The custom code used in this study to evaluate read length cut-offs is available from GitHub (https://github.com/stefaniehartmann/readLengthCutoff).
Allentoft, M. E. et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. R. Soc. Lond. B 279, 4724–4733 (2012).
Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
Skoglund, P. et al. Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science 336, 466–469 (2012).
Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
Palkopoulou, E. et al. Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth. Curr. Biol. 25, 1395–1400 (2015).
Weir, J. T. & Schluter, D. Ice sheets promote speciation in boreal birds. Proc. R. Soc. Lond. B 271, 1881–1887 (2004).
Lister, A. M. The impact of Quaternary Ice Ages on mammalian evolution. Phil. Trans. R. Soc. Lond. B 359, 221–241 (2004).
Lister, A. M., Sher, A. V., van Essen, H. & Wei, G. The pattern and process of mammoth evolution in Eurasia. Quat. Int. 126–128, 49–64 (2005).
Werdelin, L. & Sanders, W. J. (eds) Cenozoic Mammals of Africa (Univ. California Press, 2010).
Lister, A. M. & Sher, A. V. Evolution and dispersal of mammoths across the Northern Hemisphere. Science 350, 805–809 (2015).
Repenning, C. A. Allophaiomys and the Age of the Olyor Suite, Krestovka Sections, Yakutia (US Government Printing Office, 1992).
Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, 15758–15763 (2013).
Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 38, e87 (2010).
Meyer, M. & Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010, db.prot5448 (2010).
Palkopoulou, E. et al. A comprehensive genomic history of extinct and living elephants. Proc. Natl Acad. Sci. USA 115, E2566–E2574 (2018).
Rohland, N. et al. Proboscidean mitogenomics: chronology and mode of elephant evolution using mastodon as outgroup. PLoS Biol. 5, e207 (2007).
Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).
Chang, D. et al. The evolutionary and phylogeographic history of woolly mammoths: a comprehensive mitogenomic analysis. Sci. Rep. 7, 44585 (2017).
Pečnerová, P. et al. Mitogenome evolution in the last surviving woolly mammoth population reveals neutral and functional consequences of small population size. Evol. Lett. 1, 292–303 (2017).
Barnes, I. et al. Genetic structure and extinction of the woolly mammoth, Mammuthus primigenius. Curr. Biol. 17, 1072–1075 (2007).
Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).
Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
Leppälä, K., Nielsen, S. V. & Mailund, T. admixturegraph: an R package for admixture graph manipulation and fitting. Bioinformatics 33, 1738–1740 (2017).
Skov, L. et al. Detecting archaic introgression using an unadmixed outgroup. PLoS Genet. 14, e1007641 (2018).
Lynch, V. J. et al. Elephantid genomes reveal the molecular bases of woolly mammoth adaptations to the Arctic. Cell Rep. 12, 217–228 (2015).
Mallet, J. Hybrid speciation. Nature 446, 279–283 (2007).
Lucas, S. G., Morgan, G. S., Love, D. W. & Connell, S. D. The first North American mammoths: taxonomy and chronology of early Irvingtonian (Early Pleistocene) Mammuthus from New Mexico. Quat. Int. 443, 2–13 (2017).
Gansauge, M.-T. & Meyer, M. Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat. Protocols 8, 737–748 (2013).
John, J. S. SeqPrep: tool for stripping adaptors and/or merging paired reads with overlap into single reads. GitHub https://github.com/jstjohn/SeqPrep (2011).
Schubert, M. et al. Improving ancient DNA read mapping against modern reference genomes. BMC Genomics 13, 178 (2012).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Feuerborn, T. R. et al. Competitive mapping allows for the identification and exclusion of human DNA contamination in ancient faunal genomic datasets. BMC Genomics 21, 844 (2020).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684 (2013).
Skoglund, P. et al. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc. Natl Acad. Sci. USA 111, 2229–2234 (2014).
Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next generation sequencing data. BMC Bioinformatics 15, 356 (2014).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0, 2013–2015. http://www.repeatmasker.org (2015).
Green, R. E. et al. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134, 416–426 (2008).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Meyer, M. et al. Palaeogenomes of Eurasian straight-tusked elephants challenge the current view of elephant evolution. eLife 6, e25413 (2017).
Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994).
Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772 (2012).
Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
Gill, M. S. et al. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol. Biol. Evol. 30, 713–724 (2013).
Lefort, V., Desper, R. & Gascuel, O. FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 32, 2798–2800 (2015).
Liu, L. et al. Genomic analysis on pygmy hog reveals extensive interbreeding during wild boar expansion. Nat. Commun. 10, 1992 (2019).
Frith, M. C., Hamada, M. & Horton, P. Parameters for accurate genome alignment. BMC Bioinformatics 11, 80 (2010).
McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Eden, E., Navon, R., Steinfeld, I., Lipson, D. & Yakhini, Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10, 48 (2009).
Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
T.v.d.V., P.P., D.D.-d.-M., M.D. and L.D. acknowledge support from the Swedish Research Council (2012-3869 and 2017-04647), FORMAS (2018-01640) and the Tryggers Foundation (CTS 17:109). A.G. is supported by the Knut and Alice Wallenberg Foundation (1,000 Ancient Genomes project). A.B. and P.S. were supported by the Francis Crick Institute (FC001595), which receives its core funding from Cancer Research UK, the UK Medical Research Council and the Wellcome Trust. P.S. was supported by the European Research Council (grant no. 852558), the Wellcome Trust (217223/Z/19/Z) and the Vallee Foundation. M.H., J.A.T., I.B., A.M.L. and G.X. were supported by NERC (grant no. NE/J010480/1) and the ERC StG grant GeneFlow (no. 310763). B.S. and J.O. were supported by the US National Science Foundation (DEB-1754451). P.N. was supported by RFBR (grant no. 13-05-01128). The authors also acknowledge support from Science for Life Laboratory, the Knut and Alice Wallenberg Foundation, the National Genomics Infrastructure funded by the Swedish Research Council, and Uppsala Multidisciplinary Center for Advanced Computational Science for assistance with massively parallel sequencing and access to the UPPMAX computational infrastructure. N. Clark at the Hunterian Museum provided access to the Scotland mammoth sample. Finally, we thank our late friend and colleague A. Sher, who defined and described the Olyorian sequence, collected large quantities of fossil vertebrate material (including all of the Early and Middle Pleistocene specimens studied here) and consistently promoted multidisciplinary studies on his finds.
The authors declare no competing interests.
Peer review information: Nature thanks Gloria Cuenca-Bescós, David Lambert, Krishna Veeramah and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, b, Upper third molars in lateral and cross-sectional views. c, Partial lower third molar in lateral and occlusal views. a, Chukochya (accession number PIN 3341-737). b, Krestovka (accession number PIN 3491-3) flipped horizontally. c, Adycha (accession number PIN 3723-511), occlusal view flipped horizontally. The lamellae are more closely spaced, and the enamel is thinner, in a (M. primigenius-like) than in b, c (M. trogontherii-like). d, Hypsodonty index versus lamellar length index of upper M3. e, Enamel thickness index versus basal lamellar length index of lower M3. Olyorian specimens that yielded DNA are labelled by site name. Green dashed line, convex hull summarizing Early to early Middle Pleistocene (about 1.5–0.5 Ma) North American Mammuthus samples (data points not shown). Green and blue squares, Early and Late Olyorian northeastern Siberian samples, respectively. Red and green circles, European M. meridionalis and M. trogontherii, respectively. Blue circles, M. primigenius from northeastern Siberia and Alaska. Note (i) the similarity of Krestovka and Adycha to other molars from the Early Olyorian, and to European steppe mammoths (M. trogontherii); (ii) the similarity of early North American mammoths to these (to molars of the Early Olyorian, in particular); and (iii) the similarity of Chukochya to M. primigenius. For site details, measurement definitions and data, see Supplementary Information section 1.
Extended Data Fig. 2 Sample age on the basis of biostratigraphy, palaeomagnetic reversals and genomic data.
Chart shows the stratigraphic position of the Kutuyakhian fauna, Phenacomys complex, and Early Olyorian and Late Olyorian faunas in relation to important European, northwest Asian and northern North American stratigraphic benchmarks. ELMA, European land mammal ages (small mammals); LMA, land mammal ages (large mammals); MN and MQ, European small mammal biozones; EEBU, East European biochronological units. Biostratigraphic- and palaeomagnetic-based chronological constraints for the specimens are provided, in comparison with the DNA-based age estimations.
Reads are aligned to the LoxAfr4 autosomes. For the three Early and Middle Pleistocene samples (Krestovka, Adycha and Chukochya), reads of 25–200-bp length are shown; 30–200-bp reads are shown for the remaining samples. Ultrashort reads (<35 bp) are denoted in red; these were shown to be enriched for spurious alignments, and therefore excluded from downstream analyses (Supplementary Information section 4). The mean read lengths (μ) were calculated using only the retained reads (≥35 bp).
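The length filter described in this caption can be illustrated with a minimal sketch. It assumes read lengths are already available as a plain list of integers; the function names and the toy data are hypothetical and are not taken from the study's pipeline or from the readLengthCutoff repository. Only the 35-bp cut-off comes from the caption.

```python
from statistics import mean

MIN_LEN = 35  # cut-off stated in the caption; reads shorter than this are excluded


def split_by_length(read_lengths, min_len=MIN_LEN):
    """Partition read lengths into (retained, ultrashort) lists."""
    retained = [n for n in read_lengths if n >= min_len]
    ultrashort = [n for n in read_lengths if n < min_len]
    return retained, ultrashort


def mean_retained_length(read_lengths, min_len=MIN_LEN):
    """Mean read length computed from retained reads only."""
    retained, _ = split_by_length(read_lengths, min_len)
    if not retained:
        raise ValueError("no reads at or above the length cut-off")
    return mean(retained)


# Toy read lengths (hypothetical, for illustration only).
toy_lengths = [25, 30, 34, 35, 40, 60, 85]
retained, ultrashort = split_by_length(toy_lengths)
mu = mean_retained_length(toy_lengths)
```

Computing the mean after filtering, rather than before, keeps the reported value from being pulled down by reads that never enter downstream analyses.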
The most ancient samples (Krestovka, Adycha and Chukochya) carry a greater frequency of cytosine deamination compared to younger permafrost-preserved woolly mammoth samples (Oimyakon and Wrangel) and the Columbian mammoth (M. columbi) specimen.
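As a rough illustration of what a cytosine deamination frequency measures, the sketch below counts C-to-T mismatches by distance from the 5' end of each read. It assumes ungapped (reference, read) string pairs of equal length; the function name and toy alignments are hypothetical. Dedicated tools such as mapDamage2.0, cited in the reference list, fit a statistical damage model, which this simple count does not attempt to reproduce.

```python
def c_to_t_frequency(alignments, n_positions=5):
    """Per-position C->T mismatch frequency near the 5' end of reads.

    alignments: iterable of (reference, read) strings, ungapped and
    oriented so that index 0 is the 5' end of the read.
    Returns a list where entry i is (#ref C read as T) / (#ref C) at
    position i, or None if no reference C was observed there.
    """
    ref_c = [0] * n_positions
    c_to_t = [0] * n_positions
    for ref, read in alignments:
        for i, (r, q) in enumerate(zip(ref[:n_positions], read[:n_positions])):
            if r == "C":
                ref_c[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else None
            for i in range(n_positions)]


# Toy alignments (hypothetical, for illustration only).
toy_alignments = [
    ("CACGT", "TACGT"),
    ("CCGTA", "CTGTA"),
    ("GCATT", "GCATT"),
    ("CATCG", "TATCG"),
]
freqs = c_to_t_frequency(toy_alignments)
```

In damaged ancient DNA, such frequencies are typically highest at the first positions of a read and decline towards its interior.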
The statistics reflect relative divergence between the genomes on the left and the right side. Lower values indicate reduced derived allele-sharing between the sample indicated on the left and the right of the graph, at sites for which the genome on the right panel is heterozygous. The lower the value, the more drift has occurred between the genomes (and thus the older their genetic divergence).
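One way to read this caption is as a proportion: among sites where the right-hand genome is heterozygous, how often does an allele sampled from the left-hand genome match the derived allele? The sketch below implements that reading only. The input layout, the function name and the handling of third alleles are assumptions; the exact definition used in the study is given in the Supplementary Information and may differ.

```python
def derived_allele_sharing(sites):
    """Fraction of heterozygous sites at which the left genome is derived.

    sites: iterable of (left_allele, right_genotype, ancestral_allele),
    where right_genotype is a pair of alleles from the right-hand genome.
    Only sites where the right genome carries exactly one ancestral and
    one derived allele are counted. Returns None if no site qualifies.
    """
    shared = 0
    total = 0
    for left_allele, (a, b), ancestral in sites:
        if a == b:
            continue  # right genome is homozygous here
        if ancestral not in (a, b):
            continue  # cannot polarize the site
        if left_allele not in (a, b):
            continue  # third allele in the left genome; skipped (assumption)
        derived = b if a == ancestral else a
        total += 1
        if left_allele == derived:
            shared += 1
    return shared / total if total else None


# Toy sites (hypothetical, for illustration only).
toy_sites = [
    ("T", ("C", "T"), "C"),  # left genome carries the derived allele
    ("C", ("C", "T"), "C"),  # left genome carries the ancestral allele
    ("G", ("G", "G"), "A"),  # right genome homozygous; ignored
    ("A", ("A", "G"), "G"),  # derived
    ("G", ("A", "G"), "G"),  # ancestral
]
sharing = derived_allele_sharing(toy_sites)
```

Under this reading, a lower value means the left-hand genome shares fewer of the right-hand genome's derived variants, which is consistent with an older divergence.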
The most parsimonious graph model (highest Bayes factor) of the phylogenetic relationships among mammoth lineages augmented with one admixture event. Branch lengths are given in f-statistic units multiplied by 1,000. Discontinuous lines show admixture events between lineages, and percentages represent admixture proportions.
a, The number of private alleles per 1,000 bp within genomic regions identified as woolly mammoth (M. primigenius) ancestry or ghost ancestry. b, Maximum-likelihood phylogenies for those genomic regions identified as ghost ancestry in the Columbian mammoth (M. columbi) genome. c, Maximum-likelihood phylogenies for regions identified as unadmixed ancestry.
Detailed description of methods and additional results, containing information on sample morphology and stratigraphy, laboratory methods for DNA extraction and sequencing, sequence data processing, and DNA authenticity assessment. Further information on mitogenome reconstruction, DNA-based dating, genetic phylogenies, and admixture analysis (f4-statistics, AdmixtureGraphs, TreeMix and ghost admixture) is also provided.
These tables contain information on sequencing data, specifically the number of sequence reads generated, and mapping and post-mortem DNA damage statistics. We also list all used priors and obtained posteriors from the mitochondrial BEAST analysis, all pairwise f4-statistics, and a list of all coding changes comparing mammoths to elephants.
About this article
Cite this article
van der Valk, T., Pečnerová, P., Díez-del-Molino, D. et al. Million-year-old DNA sheds light on the genomic history of mammoths. Nature 591, 265–269 (2021). https://doi.org/10.1038/s41586-021-03224-9