The sequencing of ancient DNA has enabled the reconstruction of speciation, migration and admixture events for extinct taxa1. However, the irreversible post-mortem degradation2 of ancient DNA has so far limited its recovery—outside permafrost areas—to specimens that are not older than approximately 0.5 million years (Myr)3. By contrast, tandem mass spectrometry has enabled the sequencing of approximately 1.5-Myr-old collagen type I4, and suggested the presence of protein residues in fossils of the Cretaceous period5—although with limited phylogenetic use6. In the absence of molecular evidence, the speciation of several extinct species of the Early and Middle Pleistocene epoch remains contentious. Here we address the phylogenetic relationships of the Eurasian Rhinocerotidae of the Pleistocene epoch7,8,9, using the proteome of dental enamel from a Stephanorhinus tooth that is approximately 1.77-Myr old, recovered from the archaeological site of Dmanisi (South Caucasus, Georgia)10. Molecular phylogenetic analyses place this Stephanorhinus as a sister group to the clade formed by the woolly rhinoceros (Coelodonta antiquitatis) and Merck’s rhinoceros (Stephanorhinus kirchbergensis). We show that Coelodonta evolved from an early Stephanorhinus lineage, and that this latter genus includes at least two distinct evolutionary lines. The genus Stephanorhinus is therefore currently paraphyletic, and its systematic revision is needed. We demonstrate that sequencing the proteome of Early Pleistocene dental enamel overcomes the limitations of phylogenetic inference based on ancient collagen or DNA. Our approach also provides additional information about the sex and taxonomic assignment of other specimens from Dmanisi. 
Our findings reveal that proteomic investigation of ancient dental enamel—which is the hardest tissue in vertebrates11, and is highly abundant in the fossil record—can push the reconstruction of molecular evolution further back into the Early Pleistocene epoch, beyond the currently known limits of ancient DNA preservation.
All of the mass spectrometry proteomics data have been deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD011008. Genomic BAM files used for Rhinocerotidae protein sequence translation and protein sequence alignments used for phylogenetic reconstruction are available on Figshare (https://doi.org/10.6084/m9.figshare.7212746).
The in-house R script used to align the peptide sequences confidently identified by the PEAKS searches is available upon request from the corresponding authors.
Cappellini, E. et al. Ancient biomolecules and evolutionary inference. Annu. Rev. Biochem. 87, 1029–1060 (2018).
Dabney, J., Meyer, M. & Pääbo, S. Ancient DNA damage. Cold Spring Harb. Perspect. Biol. 5, a012567 (2013).
Meyer, M. et al. Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins. Nature 531, 504–507 (2016).
Wadsworth, C. & Buckley, M. Proteome degradation in fossils: investigating the longevity of protein survival in ancient bone. Rapid Commun. Mass Spectrom. 28, 605–615 (2014).
Schweitzer, M. H. et al. Analyses of soft tissue from Tyrannosaurus rex suggest the presence of protein. Science 316, 277–280 (2007).
Schroeter, E. R. et al. Expansion for the Brachylophosaurus canadensis collagen I sequence and additional evidence of the preservation of Cretaceous protein. J. Proteome Res. 16, 920–932 (2017).
Willerslev, E. et al. Analysis of complete mitochondrial genomes from extinct and extant rhinoceroses reveals lack of phylogenetic resolution. BMC Evol. Biol. 9, 95 (2009).
Welker, F. et al. Middle Pleistocene protein sequences from the rhinoceros genus Stephanorhinus and the phylogeny of extant and extinct Middle/Late Pleistocene Rhinocerotidae. PeerJ 5, e3033 (2017).
Kirillova, I. et al. Discovery of the skull of Stephanorhinus kirchbergensis (Jäger, 1839) above the Arctic Circle. Quat. Res. 88, 537–550 (2017).
Lordkipanidze, D. et al. A complete skull from Dmanisi, Georgia, and the evolutionary biology of early Homo. Science 342, 326–331 (2013).
Eastoe, J. E. Organic matrix of tooth enamel. Nature 187, 411–412 (1960).
Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
Demarchi, B. et al. Protein sequences bound to mineral surfaces persist into deep time. eLife 5, e17092 (2016).
Welker, F. et al. Ancient proteins resolve the evolutionary history of Darwin’s South American ungulates. Nature 522, 81–84 (2015).
Chen, F. et al. A late Middle Pleistocene Denisovan mandible from the Tibetan Plateau. Nature 569, 409–412 (2019).
Nei, M. Molecular Evolutionary Genetics Vol. 75, 39–63 (Columbia Univ. Press, 1987).
Buckley, M., Warwood, S., van Dongen, B., Kitchener, A. C. & Manning, P. L. A fossil protein chimera; difficulties in discriminating dinosaur peptide sequences from modern cross-contamination. Proc. R. Soc. Lond. B 284, 20170544 (2017).
Gabunia, L. et al. Earliest Pleistocene hominid cranial remains from Dmanisi, Republic of Georgia: taxonomy, geological setting, and age. Science 288, 1019–1025 (2000).
Ferring, R. et al. Earliest human occupations at Dmanisi (Georgian Caucasus) dated to 1.85–1.78 Ma. Proc. Natl Acad. Sci. USA 108, 10432–10436 (2011).
Castiblanco, G. A. et al. Identification of proteins from human permanent erupted enamel. Eur. J. Oral Sci. 123, 390–395 (2015).
Stewart, N. A. et al. The identification of peptides by nanoLC-MS/MS from human surface tooth enamel following a simple acid etch extraction. RSC Advances 6, 61673–61679 (2016).
van Doorn, N. L., Wilson, J., Hollund, H., Soressi, M. & Collins, M. J. Site-specific deamidation of glutamine: a new marker of bone collagen deterioration. Rapid Commun. Mass Spectrom. 26, 2319–2327 (2012).
Catak, S., Monard, G., Aviyente, V. & Ruiz-López, M. F. Computational study on nonenzymatic peptide bond cleavage at asparagine and aspartic acid. J. Phys. Chem. A 112, 8752–8761 (2008).
Hunter, T. Why nature chose phosphate to modify proteins. Phil. Trans. R. Soc. Lond. B 367, 2513–2516 (2012).
Hu, J. C. C., Yamakoshi, Y., Yamakoshi, F., Krebsbach, P. H. & Simmer, J. P. Proteomics and genetics of dental enamel. Cells Tissues Organs 181, 219–231 (2005).
Tagliabracci, V. S. et al. Secreted kinase phosphorylates extracellular proteins that regulate biomineralization. Science 336, 1150–1153 (2012).
Cleland, T. P. Solid digestion of demineralized bone as a method to access potentially insoluble proteins and post-translational modifications. J. Proteome Res. 17, 536–542 (2018).
Antoine, P.-O. et al. A revision of Aceratherium blanfordi Lydekker, 1884 (Mammalia: Rhinocerotidae) from the Early Miocene of Pakistan: postcranials as a key. Zool. J. Linn. Soc. 160, 139–194 (2010).
Steiner, C. C. & Ryder, O. A. Molecular phylogeny and evolution of the Perissodactyla. Zool. J. Linn. Soc. 163, 1289–1303 (2011).
Hobolth, A., Dutheil, J. Y., Hawks, J., Schierup, M. H. & Mailund, T. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome Res. 21, 349–356 (2011).
Rieseberg, L. H. Evolution: replacing genes and traits through hybridization. Curr. Biol. 19, R119–R122 (2009).
Guérin, C. Les Rhinocéros (Mammalia, Perissodactyla) du Miocène Terminal au Pleistocène Supérieur en Europe occidentale, Comparaison avec les Espèces Actuelles (Documents du Laboratoire de Geologie de la Faculte des Sciences de Lyon, volume 79) (Univ. Claude-Bernard, 1980).
Deng, T. et al. Out of Tibet: Pliocene woolly rhino suggests high-plateau origin of Ice Age megaherbivores. Science 333, 1285–1288 (2011).
Orlando, L. et al. Ancient DNA analysis reveals woolly rhino evolutionary relationships. Mol. Phylogenet. Evol. 28, 485–499 (2003).
Yuan, J. et al. Ancient DNA sequences from Coelodonta antiquitatis in China reveal its divergence and phylogeny. Sci. China Earth Sci. 57, 388–396 (2014).
Penkman, K. E. H., Kaufman, D. S., Maddy, D. & Collins, M. J. Closed-system behaviour of the intra-crystalline fraction of amino acids in mollusc shells. Quat. Geochronol. 3, 2–25 (2008).
Hendy, J. et al. A guide to ancient protein studies. Nat. Ecol. Evol. 2, 791–799 (2018).
Wiśniewski, J. R., Zougman, A., Nagaraj, N. & Mann, M. Universal sample preparation method for proteome analysis. Nat. Methods 6, 359–362 (2009).
Cappellini, E. et al. Resolution of the type material of the Asian elephant, Elephas maximus Linnaeus, 1758 (Proboscidea, Elephantidae). Zool. J. Linn. Soc. 170, 222–232 (2014).
Kulak, N. A., Pichler, G., Paron, I., Nagaraj, N. & Mann, M. Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat. Methods 11, 319–324 (2014).
Mackie, M. et al. Palaeoproteomic profiling of conservation layers on a 14th century Italian wall painting. Angew. Chem. Int. Edn 57, 7369–7374 (2018).
Cappellini, E. et al. Proteomic analysis of a Pleistocene mammoth femur reveals more than one hundred ancient bone proteins. J. Proteome Res. 11, 917–926 (2012).
Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).
Zhang, J. et al. PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol. Cell. Proteomics 11, M111.010587 (2012).
The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Welker, F. et al. Palaeoproteomic evidence identifies archaic hominins associated with the Châtelperronian at the Grotte du Renne. Proc. Natl Acad. Sci. USA 113, 11162–11167 (2016).
Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
Gabriels, R., Martens, L. & Degroeve, S. Updated MS2PIP web server delivers fast and accurate MS2 peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Res. 47, W295–W299 (2019).
Tyanova, S., Temu, T. & Cox, J. The MaxQuant computational platform for mass spectrometry-based shotgun proteomics. Nat. Protocols 11, 2301–2319 (2016).
Colaert, N., Helsens, K., Martens, L., Vandekerckhove, J. & Gevaert, K. Improved visualization of protein consensus sequences by iceLogo. Nat. Methods 6, 786–787 (2009).
Korneliussen, T. S., Albrechtsen, A. & Nielsen, R. ANGSD: analysis of next generation sequencing data. BMC Bioinformatics 15, 356 (2014).
Briggs, A. W. et al. Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 38, e87 (2010).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Sea Urchin Genome Sequencing Consortium. The genome of the sea urchin Strongylocentrotus purpuratus. Science 314, 941–952 (2006).
Katoh, K. & Frith, M. C. Adding unaligned sequences into an existing alignment using MAFFT and LAST. Bioinformatics 28, 3144–3146 (2012).
Schliep, K. P. phangorn: phylogenetic analysis in R. Bioinformatics 27, 592–593 (2011).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Ronquist, F. et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542 (2012).
Rohland, N. & Hofreiter, M. Comparison and optimization of ancient DNA extraction. Biotechniques 42, 343–352 (2007).
Meyer, M. & Kircher, M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010, pdb.prot5448 (2010).
Schubert, M. et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat. Protocols 9, 1056–1082 (2014).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Dickinson, M. R., Lister, A. M. & Penkman, K. E. H. A new method for enamel amino acid racemization dating: a closed system approach. Quat. Geochronol. 50, 29–46 (2019).
E.C. and F.W. are supported by the VILLUM FONDEN (grant number 17649) and by the European Commission through a Marie Skłodowska-Curie (MSC) Individual Fellowship (grant number 795569). E.W. is supported by the Lundbeck Foundation, the Danish National Research Foundation, the Novo Nordisk Foundation, the Carlsberg Foundation, KU2016 and the Wellcome Trust. E.C., C.K., J.V.O., P.R. and D.S. are supported by the European Commission through the MSC European Training Network ‘TEMPERA’ (grant number 722606). M.M. and R.R.J.-C. are supported by the University of Copenhagen KU2016 (UCPH Excellence Programme) grant. M.M. is also supported by the Danish National Research Foundation award PROTEIOS (DNRF128). Work at the Novo Nordisk Foundation Center for Protein Research is funded in part by a donation from the Novo Nordisk Foundation (grant number NNF14CC0001). M.R.D. is supported by a PhD DTA studentship from NERC and the Natural History Museum (NE/K500987/1 & NE/L501761/1). K.P. is supported by the Leverhulme Trust (PLP-2012-116). L.R. and L.P. are supported by the Italian Ministry for Foreign Affairs (MAECI, DGSP-VI). L.P. was also supported by the EU-SYNTHESYS project (AT-TAF-2550, DE-TAF-3049, GB-TAF-2825, HU-TAF-3593 and ES-TAF-2997) funded by the European Commission. L.D. is supported by the Swedish Research Council (grant number 2017-04647) and FORMAS (grant number 2015-676). M.T.P.G. is supported by the ERC Consolidator Grant ‘Extinction genomics’ (grant number 681396). L.O. is supported by the ERC Consolidator Grant ‘PEGASUS’ (grant agreement number 681605). B.S., J.K. and P.D.H. are supported by the Gordon and Betty Moore Foundation. B.M.-N. is supported by the Spanish Ministry of Sciences (grant number CGL2016-80975-P) and the Generalitat de Catalunya, Spain (grant number 2017SGR 859). J.A. is supported by the Spanish Ministry of Sciences (grant number CGL2016-80000-P). R.F. is supported by the National Science Foundation (grant number 1025245).
The ancient DNA analysis was carried out using the facilities of the University of Luxembourg, the Swedish Museum of Natural History and UC Santa Cruz. We acknowledge support from the Science for Life Laboratory, the National Genomics Infrastructure (Sweden) and UPPMAX for providing assistance with massive parallel sequencing and computational infrastructure. Research at Dmanisi is supported by the John Templeton Foundation (grant number 52935), and the Shota Rustaveli Science Foundation (grant number 18-27262). We thank B. Triozzi and K. Murphy Gregersen for technical support.
The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Peer review information Nature thanks Benedikt Kessler, Tina Warinner and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
Extended Data Fig. 1 Generalized stratigraphic profiles for Dmanisi, indicating origins of the specimens.
a, Type section of the Dmanisi M5 excavation block. b, Stratigraphic profile of excavation area M6. M6 preserves a larger gully associated with the pipe-gully phase of stratigraphic–geomorphic development in stratum B1. The thickness of the stratum B1 gully fill extends to the basalt surface but includes ‘rip-ups’ of strata A1 and A2, showing that the deposits in stratum B1 post-date those of stratum A. c, Stratigraphic section of excavation area M17. Here, Stratum B1 was deposited after the erosion of stratum A deposits. The stratigraphic position of specimen Dm.5/157–16635 is highlighted with a red diamond. The Masavara basalt is about 50 cm below the base of the profile shown. d, Northern section of block 2. Following the collapse of a pipe and erosion to the basalt, the deeper part of this area was filled with local gully fill of strata B1x, B1y and B1z. Note the uniform burial of all stratum B1 deposits by strata B2, B3 and B4. The sampled specimens are indicated by the five-digit CGG numbers. Extended Data Table 1 provides both the CGG and GNM specimen numbers.
a, c, e, g, i, j, Peptide–spectrum match (PSM) sequence coverage of the proteins AMBN (a), ENAM (c), AMELX (e), AMTN (g), MMP20 (i) and ALB (j). Annotations include ‘amino acid position, amino acid called in that position (number of PSMs and peptides covering that position)’ for the phylogenetically informative single-amino-acid polymorphisms within Rhinocerotidae. b, d, f, h, Frequency (per cent) of phosphorylated (green) and unphosphorylated (red) PSMs per amino acid position for AMBN (b), ENAM (d), AMELX (f) and AMTN (h). Numbers within the bars provide the PSM counts. k, Violin plot of distribution of PSM coverage for all covered sites (n = 693), and for sites of phylogenetic relevance (single-amino-acid polymorphisms, n = 30). The box plots define the range of the data, with whiskers extending to 1.5× interquartile range, boxes denoting the 25th and 75th percentiles and dots indicating the median. All panels are based only on MaxQuant search results. The Supplementary Data contains examples of MS/MS spectra, and fragment-ion series alignments for each of the marked single-amino-acid polymorphisms.
Extended Data Fig. 3 Peptide and fragment-ion coverage of AMELX isoform 1 and isoform 2 from specimen Dm.M6/7.II.296–16856.
Peptides specific to AMELX isoform 1 and isoform 2 appear in the top and bottom parts of the figure, respectively. No AMELX isoform 2 is currently reported in public databases for the Cervidae group. Accordingly, the AMELX-isoform-2-specific peptides were identified by MaxQuant spectral matching against bovine (Bos taurus) AMELX isoform 2 (UniProt accession number P02817-2). AMELX isoform 2 (also known as leucine-rich amelogenin peptide (LRAP)) is a naturally occurring isoform of AMELX from the translation product of an alternatively spliced transcript.
Extent of intra-crystalline racemization in enamel for the free amino acid (FAA, x axis) fraction and the total hydrolysable amino acids (THAA, y axis) fraction for four amino acids (Asp plus Asn (here denoted Asx), Glu plus Gln (here denoted Glx), Ala and Phe). Note the differences in axis scale. Intra-crystalline data from Proboscidea enamel from a range of sites in the UK64 are shown for comparison (grey crosses). Taxa from both Dmanisi and the UK exhibit a similar relationship between FAA and THAA racemization, and R² values have been calculated on the basis of a polynomial relationship (order = 2, all > 0.93).
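The R² calculation mentioned in this caption can be illustrated with a minimal sketch: a second-order polynomial is fitted by least squares to paired FAA and THAA values, and R² is computed as one minus the ratio of the residual to the total sum of squares. The function names and the toy values below are assumptions made for illustration only; they are not the authors' code or measured racemization data, and the source does not state which software performed the fit.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2; returns (c0, c1, c2)."""
    # Normal equations (A^T A) c = A^T y for the basis 1, x, x^2.
    powers = [sum(x ** k for x in xs) for k in range(5)]
    aug = [
        [powers[i + j] for j in range(3)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(3)
    ]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, 3))
        coeffs[i] = (aug[i][3] - known) / aug[i][i]
    return tuple(coeffs)


def r_squared(xs, ys, coeffs):
    """Coefficient of determination for the fitted quadratic."""
    c0, c1, c2 = coeffs
    predicted = [c0 + c1 * x + c2 * x * x for x in xs]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy placeholder values (NOT measured data): x plays the role of FAA,
# y the role of THAA, generated from an exact quadratic.
toy_faa = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
toy_thaa = [0.1 + 0.5 * x + 0.3 * x * x for x in toy_faa]

toy_coeffs = fit_quadratic(toy_faa, toy_thaa)
toy_r2 = r_squared(toy_faa, toy_thaa, toy_coeffs)
```

Because the toy points lie exactly on a quadratic, the fit recovers the generating coefficients and R² is effectively 1; real racemization data would scatter around the curve and give a lower value.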
Annotated spectra including phosphorylated (here denoted ph) serine (S). a, Phosphorylation in the S-X-E motif of AMELX. b, Phosphorylation in the S-X-phosphorylated S motif of AMBN. Phosphorylation was independently observed in all three separate analyses of Dm.5/157–16635, including multiple spectra and peptides (Extended Data Fig. 2).
Extended Data Fig. 6 Phylogenetic relationships between the comparative reference dataset and specimen Dm.bXI–16857.
Consensus tree from Bayesian inference. The posterior probability of each bipartition is shown as a percentage to the left of each node.
a, Specimen Dm.6/151.4.A4.12–16630. b, Specimen Dm.69/64.3.B1.53–16631. c, Specimen Dm.8/154.4.A4.22–16639. d, Specimen Dm.M6/7.II.296–16856. Note the presence of deamidated glutamine (deQ) and asparagine (deN), oxidized methionine (oxM) and phosphorylated serine (phS).
a, Maximum-likelihood phylogeny obtained using PhyML and the protein alignment that excludes Dm.5/157–16635. b, Topologies obtained from 100 random replicates of the woolly rhinoceros (C. antiquitatis). In each replicate, the number of missing sites was similar to that observed for the Dm.5/157–16635 specimen (72.4% missingness). The percentage shown for each topology indicates the number of replicates in which that particular topology was recovered. c, As in b, but for the Javan rhinoceros (R. sondaicus). d, As in b, but for the black rhinoceros (D. bicornis).
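The replicate design described in panels b to d can be sketched as follows: a complete reference sequence is degraded until a target fraction of its sites is missing, and this is repeated to produce a set of replicates for tree building. This is only an illustrative sketch. It assumes that sites are masked uniformly at random and that the missing fraction is matched exactly, neither of which is specified in the caption; the function names, the `?` missing-data symbol and the toy sequence are hypothetical.

```python
import random


def mask_sites(sequence, missing_fraction, rng, missing_char="?"):
    """Return a copy of `sequence` with a random subset of sites masked."""
    n_sites = len(sequence)
    n_missing = round(n_sites * missing_fraction)
    # Assumption: masked positions are drawn uniformly at random.
    masked_positions = set(rng.sample(range(n_sites), n_missing))
    return "".join(
        missing_char if i in masked_positions else residue
        for i, residue in enumerate(sequence)
    )


def make_replicates(sequence, missing_fraction, n_replicates=100, seed=0):
    """Generate independently masked replicates of one sequence."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        mask_sites(sequence, missing_fraction, rng)
        for _ in range(n_replicates)
    ]


# Toy placeholder sequence (NOT a real rhinoceros protein): 160 residues.
toy_sequence = "ACDEFGHIKLMNPQRSTVWY" * 8

# 72.4% missingness, the value reported for specimen Dm.5/157-16635.
replicates = make_replicates(toy_sequence, 0.724)
```

Each masked replicate would then replace the original sequence in the alignment before the phylogeny is re-estimated, and the recovered topologies would be tallied across replicates.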
Supplementary Materials, Methods and Results, with Figures and Tables. Detailed description, enriched with figures and tables, of: (i) the studied specimens, (ii) the experimental procedures used to generate the data, and (iii) the results, both positive and negative, supporting the conclusions reported in the main text.
Selection of automatically and manually annotated MS/MS spectra, retrieved from ~1.77-Myr-old Dmanisi specimens as well as synthetic peptides, supporting the identification of: (i) phylogenetically informative amino acid positions observed in specimen Dm.5/157-16635, and (ii) phosphorylated sites.
Cite this article
Cappellini, E., Welker, F., Pandolfi, L. et al. Early Pleistocene enamel proteome from Dmanisi resolves Stephanorhinus phylogeny. Nature 574, 103–107 (2019). https://doi.org/10.1038/s41586-019-1555-y