Million-year-old DNA sheds light on the genomic history of mammoths

Abstract

Temporal genomic data hold great potential for studying evolutionary processes such as speciation. However, sampling across speciation events would, in many cases, require genomic time series that stretch well back into the Early Pleistocene subepoch. Although theoretical models suggest that DNA should survive on this timescale [1], the oldest genomic data recovered so far are from a horse specimen dated to 780–560 thousand years ago [2]. Here we report the recovery of genome-wide data from three mammoth specimens dating to the Early and Middle Pleistocene subepochs, two of which are more than one million years old. We find that two distinct mammoth lineages were present in eastern Siberia during the Early Pleistocene. One of these lineages gave rise to the woolly mammoth and the other represents a previously unrecognized lineage that was ancestral to the first mammoths to colonize North America. Our analyses reveal that the Columbian mammoth of North America traces its ancestry to a Middle Pleistocene hybridization between these two lineages, with roughly equal admixture proportions. Finally, we show that the majority of protein-coding changes associated with cold adaptation in woolly mammoths were already present one million years ago. These findings highlight the potential of deep-time palaeogenomics to expand our understanding of speciation and long-term adaptive evolution.

Fig. 1: DNA-based phylogenies and specimen age estimates.
Fig. 2: Inferred genomic history of mammoths.

Data availability

All sequence data (in .fastq format) for samples sequenced in this study are available through the European Nucleotide Archive under accession number PRJEB42269. Previously published data used in this study are available under accession numbers PRJEB24361 and PRJEB7929.

Code availability

The custom code used in this study to evaluate read length cut-offs is available from GitHub (https://github.com/stefaniehartmann/readLengthCutoff).

References

  1. Allentoft, M. E. et al. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proc. R. Soc. Lond. B 279, 4724–4733 (2012).
  2. Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
  3. Skoglund, P. et al. Origins and genetic legacy of Neolithic farmers and hunter-gatherers in Europe. Science 336, 466–469 (2012).
  4. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
  5. Palkopoulou, E. et al. Complete genomes reveal signatures of demographic and genetic declines in the woolly mammoth. Curr. Biol. 25, 1395–1400 (2015).
  6. Weir, J. T. & Schluter, D. Ice sheets promote speciation in boreal birds. Proc. R. Soc. Lond. B 271, 1881–1887 (2004).
  7. Lister, A. M. The impact of Quaternary Ice Ages on mammalian evolution. Phil. Trans. R. Soc. Lond. B 359, 221–241 (2004).
  8. Lister, A. M., Sher, A. V., van Essen, H. & Wei, G. The pattern and process of mammoth evolution in Eurasia. Quat. Int. 126–128, 49–64 (2005).
  9. Werdelin, L. & Sanders, W. J. (eds) Cenozoic Mammals of Africa (Univ. California Press, 2010).
  10. Lister, A. M. & Sher, A. V. Evolution and dispersal of mammoths across the Northern Hemisphere. Science 350, 805–809 (2015).
  11. Repenning, C. A. Allophaiomys and the Age of the Olyor Suite, Krestovka Sections, Yakutia (US Government Printing Office, 1992).
  12. Dabney, J. et al. Complete mitochondrial genome sequence of a Middle Pleistocene cave bear reconstructed from ultrashort DNA fragments. Proc. Natl Acad. Sci. USA 110, 15758–15763 (2013).
  13. Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 38, e87 (2010).
  14. Meyer, M. & Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010, db.prot5448 (2010).
  15. Palkopoulou, E. et al. A comprehensive genomic history of extinct and living elephants. Proc. Natl Acad. Sci. USA 115, E2566–E2574 (2018).
  16. Rohland, N. et al. Proboscidean mitogenomics: chronology and mode of elephant evolution using mastodon as outgroup. PLoS Biol. 5, e207 (2007).
  17. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).
  18. Chang, D. et al. The evolutionary and phylogeographic history of woolly mammoths: a comprehensive mitogenomic analysis. Sci. Rep. 7, 44585 (2017).
  19. Pečnerová, P. et al. Mitogenome evolution in the last surviving woolly mammoth population reveals neutral and functional consequences of small population size. Evol. Lett. 1, 292–303 (2017).
  20. Barnes, I. et al. Genetic structure and extinction of the woolly mammoth, Mammuthus primigenius. Curr. Biol. 17, 1072–1075 (2007).
  21. Pickrell, J. K. & Pritchard, J. K. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8, e1002967 (2012).
  22. Patterson, N. et al. Ancient admixture in human history. Genetics 192, 1065–1093 (2012).
  23. Leppälä, K., Nielsen, S. V. & Mailund, T. admixturegraph: an R package for admixture graph manipulation and fitting. Bioinformatics 33, 1738–1740 (2017).
  24. Skov, L. et al. Detecting archaic introgression using an unadmixed outgroup. PLoS Genet. 14, e1007641 (2018).
  25. Lynch, V. J. et al. Elephantid genomes reveal the molecular bases of woolly mammoth adaptations to the Arctic. Cell Rep. 12, 217–228 (2015).
  26. Mallet, J. Hybrid speciation. Nature 446, 279–283 (2007).
  27. Lucas, S. G., Morgan, G. S., Love, D. W. & Connell, S. D. The first North American mammoths: taxonomy and chronology of early Irvingtonian (Early Pleistocene) Mammuthus from New Mexico. Quat. Int. 443, 2–13 (2017).
  28. Gansauge, M.-T. & Meyer, M. Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nat. Protocols 8, 737–748 (2013).
  29. John, J. S. SeqPrep: tool for stripping adaptors and/or merging paired reads with overlap into single reads. GitHub https://github.com/jstjohn/SeqPrep (2011).
  30. Schubert, M. et al. Improving ancient DNA read mapping against modern reference genomes. BMC Genomics 13, 178 (2012).
  31. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
  32. Feuerborn, T. R. et al. Competitive mapping allows for the identification and exclusion of human DNA contamination in ancient faunal genomic datasets. BMC Genomics 21, 844 (2020).
  33. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
  34. Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684 (2013).
  35. Skoglund, P. et al. Separating endogenous ancient DNA from modern day contamination in a Siberian Neandertal. Proc. Natl Acad. Sci. USA 111, 2229–2234 (2014).
  36. Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next generation sequencing data. BMC Bioinformatics 15, 356 (2014).
  37. Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0, 2013–2015. http://www.repeatmasker.org (2015).
  38. Green, R. E. et al. A complete Neandertal mitochondrial genome sequence determined by high-throughput sequencing. Cell 134, 416–426 (2008).
  39. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
  40. Meyer, M. et al. Palaeogenomes of Eurasian straight-tusked elephants challenge the current view of elephant evolution. eLife 6, e25413 (2017).
  41. Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994).
  42. Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. jModelTest 2: more models, new heuristics and parallel computing. Nat. Methods 9, 772 (2012).
  43. Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
  44. Gill, M. S. et al. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol. Biol. Evol. 30, 713–724 (2013).
  45. Lefort, V., Desper, R. & Gascuel, O. FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 32, 2798–2800 (2015).
  46. Liu, L. et al. Genomic analysis on pygmy hog reveals extensive interbreeding during wild boar expansion. Nat. Commun. 10, 1992 (2019).
  47. Frith, M. C., Hamada, M. & Horton, P. Parameters for accurate genome alignment. BMC Bioinformatics 11, 80 (2010).
  48. McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
  49. Eden, E., Navon, R., Steinfeld, I., Lipson, D. & Yakhini, Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics 10, 48 (2009).
  50. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

Acknowledgements

T.v.d.V., P.P., D.D.-d.-M., M.D. and L.D. acknowledge support from the Swedish Research Council (2012-3869 and 2017-04647), FORMAS (2018-01640) and the Tryggers Foundation (CTS 17:109). A.G. is supported by the Knut and Alice Wallenberg Foundation (1,000 Ancient Genomes project). A.B. and P.S. were supported by the Francis Crick Institute (FC001595), which receives its core funding from Cancer Research UK, the UK Medical Research Council and the Wellcome Trust. P.S. was supported by the European Research Council (grant no. 852558), the Wellcome Trust (217223/Z/19/Z) and the Vallee Foundation. M.H., J.A.T., I.B., A.M.L. and G.X. were supported by NERC (grant no. NE/J010480/1) and the ERC StG grant GeneFlow (no. 310763). B.S. and J.O. were supported by the US National Science Foundation (DEB-1754451). P.N. was supported by RFBR (grant no. 13-05-01128). The authors also acknowledge support from Science for Life Laboratory, the Knut and Alice Wallenberg Foundation, the National Genomics Infrastructure funded by the Swedish Research Council, and Uppsala Multidisciplinary Center for Advanced Computational Science for assistance with massively parallel sequencing and access to the UPPMAX computational infrastructure. N. Clark at the Hunterian Museum provided access to the Scotland mammoth sample. Finally, we thank our late friend and colleague A. Sher, who defined and described the Olyorian sequence, collected large quantities of fossil vertebrate material (including all of the Early and Middle Pleistocene specimens studied here) and consistently promoted multidisciplinary studies on his finds.

Author information

Contributions

L.D., A.M.L., B.S., M.H. and I.B. conceived the project. L.D., A.G., P.P. and D.D.-d.-M. designed the study together with P.N. and A.M.L. Laboratory work on Early and Middle Pleistocene samples was done by P.P., L.D., A.G. and M.D., and G.X. and J.A.T. conducted laboratory work on Late Pleistocene samples. P.P., T.v.d.V. and D.D.-d.-M. processed and mapped sequence data. T.v.d.V., S.H. and P.D.H. performed tests on DNA authenticity. T.v.d.V., J.O. and S.L. conducted phylogenetic and Treemix analyses. J.O. and T.v.d.V. computed genomic age estimates. T.v.d.V., A.B. and D.D.-d.-M. performed analyses on D statistics and f4 statistics and admixture graph models. T.v.d.V. performed analyses on population structure, and ghost admixture. T.v.d.V., E.S., F.R.F. and M.S. performed analysis on selection. L.D., P.D.H., M.H., B.S., A.G., M.S., P.S., P.N. and A.M.L. provided advice on the bioinformatic analyses and/or helped to interpret the results. P.N. and A.M.L. provided morphological analyses as well as palaeontological and geological information. The manuscript was written by T.v.d.V., P.P., D.D.-d.-M., P.N. and L.D., with contributions from all co-authors.

Corresponding authors

Correspondence to Tom van der Valk or Love Dalén.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Gloria Cuenca-Bescós, David Lambert, Krishna Veeramah and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Mammoth molars and morphometric comparisons.

a, b, Upper third molars in lateral and cross-sectional views. c, Partial lower third molar in lateral and occlusal views. a, Chukochya (accession number PIN 3341-737). b, Krestovka (accession number PIN 3491-3) flipped horizontally. c, Adycha (accession number PIN 3723-511), occlusal view flipped horizontally. The lamellae are more closely spaced, and the enamel is thinner, in a (M. primigenius-like) than in b, c (M. trogontherii-like). d, Hypsodonty index versus lamellar length index of upper M3. e, Enamel thickness index versus basal lamellar length index of lower M3. Olyorian specimens that yielded DNA are labelled by site name. Green dashed line, convex hull summarizing Early to early Middle Pleistocene (about 1.5–0.5 Ma) North American Mammuthus samples (data points not shown). Green and blue squares, Early and Late Olyorian northeastern Siberian samples, respectively. Red and green circles, European M. meridionalis and M. trogontherii, respectively. Blue circles, M. primigenius from northeastern Siberia and Alaska. Note (i) the similarity of Krestovka and Adycha to other molars from the Early Olyorian, and to European steppe mammoths (M. trogontherii); (ii) the similarity of early North American mammoths to these (to molars of the Early Olyorian, in particular); and (iii) the similarity of Chukochya to M. primigenius. For site details, measurement definitions and data, see Supplementary Information section 1.

Extended Data Fig. 2 Sample age on the basis of biostratigraphy, palaeomagnetic reversals and genomic data.

Chart shows the stratigraphic position of the Kutuyakhian fauna, Phenacomys complex, and Early Olyorian and Late Olyorian faunas in relation to important European, northwest Asian and northern North American stratigraphic benchmarks. ELMA, European land mammal ages (small mammals); LMA, land mammal ages (large mammals); MN and MQ, European small mammal biozones; EEBU, East European biochronological units. Biostratigraphic- and palaeomagnetic-based chronological constraints for the specimens are provided, in comparison with the DNA-based age estimations.

Extended Data Fig. 3 DNA-fragment length distributions for nine mammoths.

Reads are aligned to the LoxAfr4 autosomes. For the three Early and Middle Pleistocene samples (Krestovka, Adycha and Chukochya), reads of 25–200-bp length are shown; 30–200-bp reads are shown for the remaining samples. Ultrashort reads (<35 bp) are denoted in red; these were shown to be enriched for spurious alignments, and therefore excluded from downstream analyses (Supplementary Information section 4). The mean read lengths (μ) were calculated using only the retained reads (≥35 bp).
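The filtering rule in this caption can be illustrated with a minimal Python sketch. It assumes that read lengths are already available as a list of integers; the function names and the example values are hypothetical and are not taken from the authors' readLengthCutoff code.

```python
MIN_READ_LENGTH = 35  # cut-off stated in the caption (reads <35 bp are excluded)


def retained_read_lengths(lengths, min_length=MIN_READ_LENGTH):
    """Return only the read lengths at or above the cut-off."""
    return [n for n in lengths if n >= min_length]


def mean_retained_length(lengths, min_length=MIN_READ_LENGTH):
    """Mean read length computed from the retained reads only."""
    kept = retained_read_lengths(lengths, min_length)
    if not kept:
        raise ValueError("no reads at or above the cut-off")
    return sum(kept) / len(kept)


# Hypothetical read lengths (bp), for illustration only.
example_lengths = [25, 30, 34, 35, 40, 45, 60]
example_mean = mean_retained_length(example_lengths)
```

In this toy input, the three reads shorter than 35 bp are dropped and the mean is taken over the remaining four.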

Extended Data Fig. 4 Post-mortem cytosine deamination damage profiles at CpG sites.

The most ancient samples (Krestovka, Adycha and Chukochya) carry a greater frequency of cytosine deamination compared to younger permafrost-preserved woolly mammoth samples (Oimyakon and Wrangel) and the Columbian mammoth (M. columbi) specimen.
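As a rough illustration of what such a damage profile measures, the Python sketch below counts C-to-T mismatches at reference CpG sites by distance from the 5' end of each read. It is a simplified sketch under stated assumptions: alignments are supplied as gap-free (reference, read) string pairs oriented 5' to 3', a CpG whose G lies beyond the aligned segment is ignored, and the function name and toy data are hypothetical. It is not the mapDamage2.0 procedure cited in the reference list.

```python
from collections import defaultdict


def cpg_c_to_t_frequency(alignments, max_pos=10):
    """Per-position C-to-T frequency at reference CpG sites.

    alignments: iterable of (reference, read) pairs of equal-length,
    gap-free strings (an assumption of this sketch).
    Returns {position from 5' end: fraction of CpG cytosines read as T}.
    """
    cpg_sites = defaultdict(int)
    c_to_t = defaultdict(int)
    for ref, read in alignments:
        if len(ref) != len(read):
            raise ValueError("reference and read must have equal length")
        # Stop one base early so that ref[i + 1] always exists.
        for i in range(min(max_pos, len(ref) - 1)):
            if ref[i] == "C" and ref[i + 1] == "G":
                cpg_sites[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return {i: c_to_t[i] / cpg_sites[i] for i in sorted(cpg_sites)}


# Toy alignments, for illustration only.
example_alignments = [
    ("CGATCG", "TGATCG"),
    ("CGATCG", "CGATTG"),
    ("ACGT", "ACGT"),
]
example_profile = cpg_c_to_t_frequency(example_alignments)
```

A higher frequency near the read ends, as reported for the oldest samples, would appear as larger values at small positions in the returned dictionary.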

Extended Data Fig. 5 F(A|B) statistics.

The statistics reflect the relative divergence between the genome shown on the left and the genome shown on the right of the graph. Lower values indicate that the genome on the left shares fewer derived alleles with the genome on the right, at sites for which the genome on the right is heterozygous. The lower the value, the more drift has occurred between the genomes (and thus the older their genetic divergence).
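One way to read this caption is sketched below in Python: at sites where genome B is heterozygous, count how often an allele sampled from genome A is the derived one. The input layout, the use of a known ancestral allele to polarise each site, and all names are assumptions made for illustration; the authors' exact procedure is described in their Supplementary Information.

```python
def f_a_given_b(sites):
    """Fraction of B-heterozygous sites at which A carries the derived allele.

    sites: iterable of (a_allele, b_genotype, ancestral) tuples, where
    a_allele is one allele sampled from genome A, b_genotype is a pair of
    alleles from genome B and ancestral is the assumed ancestral allele.
    """
    informative = 0
    derived_in_a = 0
    for a_allele, (b1, b2), ancestral in sites:
        if b1 == b2:
            continue  # B is homozygous: site is not used
        if ancestral not in (b1, b2):
            continue  # site cannot be polarised with this ancestral allele
        informative += 1
        derived = b2 if b1 == ancestral else b1
        if a_allele == derived:
            derived_in_a += 1
    if informative == 0:
        raise ValueError("no sites at which B is heterozygous")
    return derived_in_a / informative


# Toy sites, for illustration only.
example_sites = [
    ("T", ("C", "T"), "C"),  # A carries the derived allele
    ("C", ("C", "T"), "C"),  # A carries the ancestral allele
    ("G", ("G", "G"), "A"),  # B homozygous: skipped
    ("A", ("G", "A"), "G"),  # A carries the derived allele
]
example_value = f_a_given_b(example_sites)
```

Under this reading, a genome A that diverged long ago from the population of B would carry few of B's segregating derived alleles and so give a low value.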

Extended Data Fig. 6 qpGraph model.

The most parsimonious graph model (highest Bayes factor) of the phylogenetic relationships among mammoth lineages augmented with one admixture event. Branch lengths are given in f-statistic units multiplied by 1,000. Discontinuous lines show admixture events between lineages, and percentages represent admixture proportions.
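For readers unfamiliar with f-statistic units, the following Python sketch computes the standard f4 statistic, the mean over sites of (a − b)(c − d) for allele frequencies a, b, c and d in four populations, and scales it by 1,000 as in the figure. It only illustrates the unit; it does not fit an admixture graph, and the function name and frequencies are hypothetical.

```python
def f4(freqs_a, freqs_b, freqs_c, freqs_d):
    """f4(A, B; C, D): mean over sites of (a - b) * (c - d).

    Each argument is a list of per-site allele frequencies (0.0 to 1.0)
    for one population; all four lists must cover the same sites.
    """
    n = len(freqs_a)
    if n == 0 or not (len(freqs_b) == len(freqs_c) == len(freqs_d) == n):
        raise ValueError("frequency lists must be non-empty and equal in length")
    total = sum(
        (a - b) * (c - d)
        for a, b, c, d in zip(freqs_a, freqs_b, freqs_c, freqs_d)
    )
    return total / n


# Toy allele frequencies at four sites, for illustration only.
example_f4 = f4(
    [1.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 1.0, 0.5],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
)
example_f4_scaled = example_f4 * 1000  # same scaling as the branch lengths
```

A value near zero is expected when the A–B and C–D frequency differences are uncorrelated, which is why departures from zero are used as evidence of shared drift or admixture.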

Extended Data Fig. 7 Ghost introgression analysis of the Columbian mammoth genome.

a, The number of private alleles per 1,000 bp within genomic regions identified as woolly mammoth (M. primigenius) ancestry or ghost ancestry. b, Maximum-likelihood phylogenies for those genomic regions identified as ghost ancestry in the Columbian mammoth (M. columbi) genome. c, Maximum-likelihood phylogenies for regions identified as unadmixed ancestry.

Supplementary information

Supplementary Information

Detailed description of methods and additional results, containing information on sample morphology and stratigraphy, laboratory methods for DNA extraction and sequencing, sequence data processing, and DNA authenticity assessment. Further information on mitogenome reconstruction, DNA-based dating, genetic phylogenies, and admixture analysis (f4-statistics, AdmixtureGraphs, TreeMix and ghost admixture) is also provided.

Reporting Summary

Supplementary Tables

These tables contain information on sequencing data, specifically the number of sequence reads generated, and mapping and post-mortem DNA damage statistics. We also list all used priors and obtained posteriors from the mitochondrial BEAST analysis, all pairwise f4-statistics, and a list of all coding changes comparing mammoths to elephants.

Cite this article

van der Valk, T., Pečnerová, P., Díez-del-Molino, D. et al. Million-year-old DNA sheds light on the genomic history of mammoths. Nature (2021). https://doi.org/10.1038/s41586-021-03224-9
