  • Protocol

Deep-time phylogenetic inference by paleoproteomic analysis of dental enamel

Abstract

In temperate and subtropical regions, ancient proteins are reported to survive up to about 2 million years, far beyond the known limits of ancient DNA preservation in the same areas. Accordingly, their amino acid sequences currently represent the only source of genetic information available for phylogenetic inference involving species that went extinct too long ago to be amenable to ancient DNA analysis. Here we present a complete workflow, including sample preparation, mass spectrometric data acquisition and computational analysis, to recover and interpret million-year-old dental enamel protein sequences. During sample preparation, the proteolytic digestion step, usually an integral part of conventional bottom-up proteomics, is omitted to increase the recovery of the randomly degraded peptides spontaneously generated by extensive diagenetic hydrolysis of ancient proteins over geological time. We also describe the solutions we have adopted to (1) authenticate the endogenous origin of the protein traces we identify, (2) detect and validate amino acid variation in the ancient protein sequences and (3) attempt phylogenetic inference. Sample preparation and data acquisition can be completed in 3–4 working days, while subsequent data analysis usually takes 2–5 days. The workflow requires basic expertise in ancient biomolecule analysis, mass spectrometry-based proteomics and molecular phylogeny. Finally, we describe the limits of this approach and its potential for the reconstruction of evolutionary relationships in paleontology and paleoanthropology.
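Because the digestion step is omitted, peptides can begin and end at any residue rather than only at protease cleavage sites. The following minimal Python sketch, which is not part of the published workflow, illustrates how this enlarges the set of candidate peptides that a database search has to consider. The function names, the length window and the toy sequence are hypothetical choices made for illustration only.

```python
def unspecific_peptides(protein, min_len=7, max_len=25):
    """Return every contiguous sub-sequence of `protein` inside the length window.

    This mimics an 'unspecific cleavage' search space, where any peptide
    bond may have been broken by diagenetic hydrolysis.
    """
    peptides = set()
    n = len(protein)
    for start in range(n):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > n:
                break  # longer windows would run past the sequence end
            peptides.add(protein[start:end])
    return peptides


def tryptic_peptides(protein, min_len=7, max_len=25):
    """Return fully tryptic peptides with no missed cleavages.

    Simplification (assumption): cut after every K or R and ignore the
    proline rule and missed cleavages.
    """
    peptides = set()
    start = 0
    last = len(protein) - 1
    for i, residue in enumerate(protein):
        if residue in "KR" or i == last:
            piece = protein[start:i + 1]
            if min_len <= len(piece) <= max_len:
                peptides.add(piece)
            start = i + 1
    return peptides


if __name__ == "__main__":
    # Made-up toy sequence, NOT a real enamel protein.
    toy_protein = "ACDEFGHIKLMNPQRSTVWYACDEKFGHIR"
    print("tryptic candidates:   ", len(tryptic_peptides(toy_protein, 3, 10)))
    print("unspecific candidates:", len(unspecific_peptides(toy_protein, 3, 10)))
```

Even for a short toy sequence, the unspecific candidate set is much larger than the tryptic one, which is one reason why spectra identified in this kind of search benefit from the additional authentication and validation steps described in the workflow.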

Key points

  • Paleoproteomics has shown that it is possible to obtain useful phylogenetic information from dental enamel proteins up to 2 million years old. These proteins are heavily fragmented and chemically modified, making their recovery and analysis challenging.

  • The protocol describes how to (1) extract million-year-old dental enamel protein remains while minimizing contamination, (2) sequence them using high-resolution tandem mass spectrometry and (3) attempt molecular phylogenetic inference that has so far been impossible by other means.
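As a rough illustration of point (3), the sketch below compares aligned protein sequences while skipping positions that a fragmentary ancient sequence does not cover. It is a simplified pairwise-distance calculation, not the maximum-likelihood or Bayesian inference typically used for phylogenetic reconstruction. The sequences, the taxon names, the missing-data symbols and the choice to treat the isobaric residues isoleucine and leucine as equivalent are all assumptions made for this example.

```python
from itertools import combinations

# Assumed symbols for alignment gaps and unresolved residues.
MISSING = set("-?X")


def normalise(residue):
    """Collapse isoleucine into leucine (identical mass, assumed indistinguishable here)."""
    return "L" if residue == "I" else residue


def p_distance(seq_a, seq_b):
    """Proportion of differing sites among positions covered in both sequences.

    Returns (distance, compared_sites); distance is None when the two
    sequences share no covered position.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a in MISSING or b in MISSING:
            continue  # skip sites not recovered in one of the sequences
        compared += 1
        if normalise(a) != normalise(b):
            differences += 1
    distance = differences / compared if compared else None
    return distance, compared


if __name__ == "__main__":
    # Made-up toy alignment; names and residues are hypothetical.
    alignment = {
        "ancient_sample": "MP?PPH--HPGYLNF",
        "taxon_A":        "MPLPPHPGHPGYINF",
        "taxon_B":        "MPLPAHPGHPGYINF",
    }
    for (name_a, seq_a), (name_b, seq_b) in combinations(alignment.items(), 2):
        distance, sites = p_distance(seq_a, seq_b)
        print(name_a, name_b, distance, sites)
```

Reporting the number of compared sites alongside each distance makes it visible when a comparison rests on very few recovered positions.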

Fig. 1: Artistic representation of a cross-section of a human tooth.
Fig. 2: Schematic representation of the data analysis process.
Fig. 3: Authentication of the ancient, endogenous origin of the retrieved peptides.
Fig. 4: Validation of ancient protein sequence reconstruction.
Fig. 5: Polymer contamination.

Data availability

All raw data points underlying Fig. 3, originally published in ref. 12, are accessible through the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD011008.

Code availability

Scripts required for the protein database reconstruction and phylogenetic analysis can be found in the following GitHub repository: https://github.com/johnpatramanis/Nature_Prot_Enamel

References

  1. Higuchi, R., Bowman, B., Freiberger, M., Ryder, O. A. & Wilson, A. C. DNA sequences from the quagga, an extinct member of the horse family. Nature 312, 282–284 (1984).

  2. Pääbo, S., Gifford, J. A. & Wilson, A. C. Mitochondrial DNA sequences from a 7000-year old brain. Nucleic Acids Res. 16, 9775–9787 (1988).

  3. Hagelberg, E. & Clegg, J. B. Isolation and characterization of DNA from archaeological bone. Proc. Biol. Sci. 244, 45–50 (1991).

  4. Poinar, H. N. et al. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311, 392–394 (2006).

  5. Willerslev, E. et al. Ancient biomolecules from deep ice cores reveal a forested southern Greenland. Science 317, 111–114 (2007).

  6. Rasmussen, M. et al. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757–762 (2010).

  7. Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).

  8. Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).

  9. van der Valk, T. et al. Million-year-old DNA sheds light on the genomic history of mammoths. Nature 591, 265–269 (2021).

  10. Meyer, M. et al. A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505, 403–406 (2014).

  11. Lipson, M. et al. Ancient DNA and deep population structure in sub-Saharan African foragers. Nature 603, 290–296 (2022).

  12. Cappellini, E. et al. Early Pleistocene enamel proteome from Dmanisi resolves Stephanorhinus phylogeny. Nature 574, 103–107 (2019).

  13. Welker, F. et al. Enamel proteome shows that Gigantopithecus was an early diverging pongine. Nature 576, 262–265 (2019).

  14. Welker, F. et al. The dental proteome of Homo antecessor. Nature 580, 235–238 (2020).

  15. Warinner, C., Korzow Richter, K. & Collins, M. J. Paleoproteomics. Chem. Rev. 122, 13401–13446 (2022).

  16. Olsen, J. V., Ong, S.-E. & Mann, M. Trypsin cleaves exclusively C-terminal to arginine and lysine residues. Mol. Cell. Proteom. 3, 608–614 (2004).

  17. Stewart, N. A., Gerlach, R. F., Gowland, R. L., Gron, K. J. & Montgomery, J. Sex determination of human remains from peptides in tooth enamel. Proc. Natl Acad. Sci. USA 114, 13649–13654 (2017).

  18. Cappellini, E. et al. Proteomic analysis of a pleistocene mammoth femur reveals more than one hundred ancient bone proteins. J. Proteome Res. 11, 917–926 (2012).

  19. Mackie, M. et al. Palaeoproteomic profiling of conservation layers on a 14th century Italian wall painting. Angew. Chem. 57, 7369–7374 (2018).

  20. Rappsilber, J., Mann, M. & Ishihama, Y. Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat. Protoc. 2, 1896–1906 (2007).

  21. Parker, G. J. et al. Sex estimation using sexually dimorphic amelogenin protein fragments in human enamel. J. Archaeol. Sci. 101, 169–180 (2019).

  22. Rappsilber, J., Ishihama, Y. & Mann, M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 75, 663–670 (2003).

  23. Peng, W., Pronker, M. F. & Snijder, J. Mass spectrometry-based de novo sequencing of monoclonal antibodies using multiple proteases and a dual fragmentation scheme. J. Proteome Res. 20, 3559–3566 (2021).

  24. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).

  25. Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011).

  26. Zhang, J. et al. PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol. Cell. Proteom. 11, M111.010587 (2012).

  27. Orlando, L. et al. Ancient DNA analysis. Nat. Rev. Methods Prim. 1, 1–26 (2021).

  28. Renaud, G., Schubert, M., Sawyer, S. & Orlando, L. Authentication and assessment of contamination in ancient DNA. Methods Mol. Biol. 1963, 163–194 (2019).

  29. Radzicka, A. & Wolfenden, R. Rates of uncatalyzed peptide bond hydrolysis in neutral solution and the transition state affinities of proteases. J. Am. Chem. Soc. 118, 6105–6109 (1996).

  30. Iwata, T. et al. Processing of ameloblastin by MMP-20. J. Dent. Res. 86, 153–157 (2007).

  31. Yamakoshi, Y., Hu, J. C.-C., Fukae, M., Yamakoshi, F. & Simmer, J. P. How do enamelysin and kallikrein 4 process the 32-kDa enamelin? Eur. J. Oral. Sci. 114, 45–51 (2006).

  32. van Doorn, N. L., Wilson, J., Hollund, H., Soressi, M. & Collins, M. J. Site-specific deamidation of glutamine: a new marker of bone collagen deterioration. Rapid Commun. Mass Spectrom. 26, 2319–2327 (2012).

  33. Schroeter, E. R. & Cleland, T. P. Glutamine deamidation: an indicator of antiquity, or preservational quality? Rapid Commun. Mass Spectrom. 30, 251–255 (2016).

  34. Ramsøe, A. et al. DeamiDATE 1.0: site-specific deamidation as a tool to assess authenticity of members of ancient proteomes. J. Archaeol. Sci. 115, 105080 (2020).

  35. Tagliabracci, V. S. et al. Secreted kinase phosphorylates extracellular proteins that regulate biomineralization. Science 336, 1150–1153 (2012).

  36. Penkman, K. E. H., Kaufman, D. S., Maddy, D. & Collins, M. J. Closed-system behaviour of the intra-crystalline fraction of amino acids in mollusc shells. Quat. Geochronol. 3, 2–25 (2008).

  37. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).

  38. Katoh, K., Misawa, K., Kuma, K.-I. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).

  39. Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).

  40. Xiao, Y., Vecchi, M. M. & Wen, D. Distinguishing between leucine and isoleucine by integrated LC–MS analysis using an orbitrap fusion mass spectrometer. Anal. Chem. 88, 10757–10766 (2016).

  41. Gabriels, R., Martens, L. & Degroeve, S. Updated MS2PIP web server delivers fast and accurate MS2 peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Res. 47, W295–W299 (2019).

  42. Tiwary, S. et al. High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis. Nat. Methods 16, 519–525 (2019).

  43. Gessulat, S. et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat. Methods 16, 509–518 (2019).

  44. Wilhelm, M. et al. Deep learning boosts sensitivity of mass spectrometry-based immunopeptidomics. Nat. Commun. 12, 3346 (2021).

  45. Gilbert, C. et al. Species identification of ivory and bone museum objects using minimally invasive proteomics. Sci. Adv. 10, eadi9028 (2024).

  46. Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).

  47. Ronquist, F. et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542 (2012).

  48. Patramanis, I., Ramos-Madrigal, J., Cappellini, E. & Racimo, F. PaleoProPhyler: a reproducible pipeline for phylogenetic inference using ancient proteins. Peer Community J. 3, e112 (2023).

  49. Pamilo, P. & Nei, M. Relationships between gene trees and species trees. Mol. Biol. Evol. https://doi.org/10.1093/oxfordjournals.molbev.a040517 (1988).

  50. Takahata, N. Gene genealogy in three related populations: consistency probability between gene and population trees. Genetics 122, 957–966 (1989).

  51. Maddison, W. P. Gene trees in species trees. Syst. Biol. 46, 523–536 (1997).

  52. Nichols, R. Gene trees and species trees are not the same. Trends Ecol. Evol. 16, 358–364 (2001).

  53. Hobolth, A., Dutheil, J. Y., Hawks, J., Schierup, M. H. & Mailund, T. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome Res. 21, 349–356 (2011).

  54. Mailund, T., Munch, K. & Schierup, M. H. Lineage sorting in apes. Annu. Rev. Genet. 48, 519–535 (2014).

  55. Sousa, F., Bertrand, Y. J. K., Doyle, J. J., Oxelman, B. & Pfeil, B. E. Using genomic location and coalescent simulation to investigate gene tree discordance in Medicago L. Syst. Biol. 66, 934–949 (2017).

  56. Scally, A. et al. Insights into hominid evolution from the gorilla genome sequence. Nature 483, 169–175 (2012).

  57. Huerta-Sánchez, E. et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature 512, 194–197 (2014).

  58. Lanier, H. C., Huang, H. & Knowles, L. L. How low can you go? The effects of mutation rate on the accuracy of species-tree estimation. Mol. Phylogenet. Evol. 70, 112–119 (2014).

  59. Madupe, P. P. et al. Enamel proteins reveal biological sex and genetic variability within southern African Paranthropus. Preprint at bioRxiv https://doi.org/10.1101/2023.07.03.547326 (2023).

  60. Yu, Y., Yu, Y., Smith, M. & Pieper, R. A spinnable and automatable StageTip for high throughput peptide desalting and proteomics. Protoc. Exch. https://doi.org/10.1038/protex.2014.033 (2014).

  61. Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014).

  62. The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).

  63. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

  64. Hall, T. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41, 95–98 (1999).

  65. Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).

  66. Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).

  67. Posada, D. & Crandall, K. A. MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 817–818 (1998).

  68. Demarchi, B. et al. Protein sequences bound to mineral surfaces persist into deep time. eLife 5, e17092 (2016).

  69. Lacruz, R. S., Habelitz, S., Wright, J. T. & Paine, M. L. Dental enamel formation and implications for oral health and disease. Physiol. Rev. 97, 939–993 (2017).

  70. Blausen.com staff. Medical gallery of Blausen Medical 2014. WikiJournal Med. https://doi.org/10.15347/wjm/2014.010 (2014).

  71. Ahmadi, S. & Winter, D. Identification of poly(ethylene glycol) and poly(ethylene glycol)-based detergents using peptide search engines. Anal. Chem. 90, 6594–6600 (2018).

  72. Bartlett, J. D. Dental enamel development: proteinases and their enamel matrix substrates. ISRN Dent. 2013, 684607 (2013).

  73. Lu, Y. et al. Functions of KLK4 and MMP-20 in dental enamel formation. Biol. Chem. 389, 695–700 (2008).

  74. The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).

  75. Sayers, E. W. et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 50, D20–D26 (2022).

  76. Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Sayers, E. W. GenBank. Nucleic Acids Res. 44, D67–D72 (2016).

  77. The NCBI C++ Toolkit. National Center for Biotechnology Information (2003).

  78. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

  79. Prüfer, K. et al. Computational challenges in the analysis of ancient DNA. Genome Biol. 11, R47 (2010).

  80. Hendy, J. et al. A guide to ancient protein studies. Nat. Ecol. Evol. 2, 791–799 (2018).

Acknowledgements

E.C., F.W. and J.R.-M. were supported by the VILLUM FONDEN (grant no. 17649). E.C., P.L.R. and J.V.O. were supported by the European Commission through the MSC European Training Network ‘TEMPERA’ (grant agreement no. 722606). E.C., I.P., C.K., R.S.P., P.P.M., F.S.H. and J.V.O. are supported by the European Commission through the MSC European Training Network ‘PUSHH’ (grant agreement no. 861389). E.C., A.J.T., M.M. and J.V.O. were supported by the European Research Council under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 101021361). F.W. has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 948365). A.J.T. was supported by the Danish National Research Foundation award PROTEIOS (DNRF128). Work at The Novo Nordisk Foundation Center for Protein Research was funded in part by a generous donation from the Novo Nordisk Foundation (NNF14CC0001). We thank F. Racimo for critical reading of the manuscript and for the valuable comments and feedback.

Author information

Contributions

E.C. devised the sample preparation methodology. J.R.-M. and I.P. respectively devised and further developed the phyloproteomic pipeline. P.L.R. devised the quantitative analysis of low-abundance modifications. R.S.P. and P.P.M. specifically focused on the experimental design of ancient protein authentication and sequence validation. P.L.R., M.M., C.K., F.S.H. and J.V.O. specifically focused on the description of the MS analysis steps. Figures 2 and 5 were designed by C.K. Figure 3 was created by R.S.P., who also reanalyzed the data underlying it. Figure 4 was designed by R.S.P. and C.K. A.J.T. and E.C. wrote the manuscript with contributions from P.L.R., I.P., C.K., R.S.P., P.P.M., F.S.H., F.W., M.M. and J.R.-M. All authors read and commented on the manuscript.

Corresponding authors

Correspondence to Jazmín Ramos-Madrigal or Enrico Cappellini.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Protocols thanks Katerina Douka, Glendon Parker, Brett Phinney and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this protocol

Cappellini, E. et al. Nature 574, 103–107 (2019): https://doi.org/10.1038/s41586-019-1555-y

Welker, F. et al. Nature 576, 262–265 (2019): https://doi.org/10.1038/s41586-019-1728-8

Welker, F. et al. Nature 580, 235–238 (2020): https://doi.org/10.1038/s41586-020-2153-8

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Taurozzi, A.J., Rüther, P.L., Patramanis, I. et al. Deep-time phylogenetic inference by paleoproteomic analysis of dental enamel. Nat Protoc 19, 2085–2116 (2024). https://doi.org/10.1038/s41596-024-00975-3

  • DOI: https://doi.org/10.1038/s41596-024-00975-3
