Abstract
In temperate and subtropical regions, ancient proteins are reported to survive up to about 2 million years, far beyond the known limits of ancient DNA preservation in the same areas. Accordingly, their amino acid sequences currently represent the only source of genetic information available to pursue phylogenetic inference involving species that went extinct too long ago to be amenable for ancient DNA analysis. Here we present a complete workflow, including sample preparation, mass spectrometric data acquisition and computational analysis, to recover and interpret million-year-old dental enamel protein sequences. During sample preparation, the proteolytic digestion step, usually an integral part of conventional bottom-up proteomics, is omitted to increase the recovery of the randomly degraded peptides spontaneously generated by extensive diagenetic hydrolysis of ancient proteins over geological time. Similarly, we describe other solutions we have adopted to (1) authenticate the endogenous origin of the protein traces we identify, (2) detect and validate amino acid variation in the ancient protein sequences and (3) attempt phylogenetic inference. Sample preparation and data acquisition can be completed in 3–4 working days, while subsequent data analysis usually takes 2–5 days. The workflow described requires basic expertise in ancient biomolecules analysis, mass spectrometry-based proteomics and molecular phylogeny. Finally, we describe the limits of this approach and its potential for the reconstruction of evolutionary relationships in paleontology and paleoanthropology.
Key points
-
Paleoproteomics has shown that it is possible to obtain useful phylogenetic information from dental enamel proteins up to 2 million years old. They are heavily fragmented and chemically modified, making their recovery and analysis challenging.
-
The protocol describes how to (1) extract million-year-old dental enamel protein remains while minimizing contamination, (2) sequence them using high-resolution tandem mass spectrometry and (3) attempt otherwise so far impossible molecular-based phylogenetic inference.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$259.00 per year
only $21.58 per issue
Buy this article
- Purchase on SpringerLink
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
Data availability
All raw data points underlying Fig. 3, originally published in ref. 12, are accessible through the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD011008.
Code availability
Scripts required for the protein database reconstruction and phylogenetic analysis can be found at the following github repository: https://github.com/johnpatramanis/Nature_Prot_Enamel
References
Higuchi, R., Bowman, B., Freiberger, M., Ryder, O. A. & Wilson, A. C. DNA sequences from the quagga, an extinct member of the horse family. Nature 312, 282–284 (1984).
Pääbo, S., Gifford, J. A. & Wilson, A. C. Mitochondrial DNA sequences from a 7000-year old brain. Nucleic Acids Res. 16, 9775–9787 (1988).
Hagelberg, E. & Clegg, J. B. Isolation and characterization of DNA from archaeological bone. Proc. Biol. Sci. 244, 45–50 (1991).
Poinar, H. N. et al. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. Science 311, 392–394 (2006).
Willerslev, E. et al. Ancient biomolecules from deep ice cores reveal a forested southern Greenland. Science 317, 111–114 (2007).
Rasmussen, M. et al. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757–762 (2010).
Green, R. E. et al. A draft sequence of the Neandertal genome. Science 328, 710–722 (2010).
Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
van der Valk, T. et al. Million-year-old DNA sheds light on the genomic history of mammoths. Nature 591, 265–269 (2021).
Meyer, M. et al. A mitochondrial genome sequence of a hominin from Sima de los Huesos. Nature 505, 403–406 (2014).
Lipson, M. et al. Ancient DNA and deep population structure in sub-Saharan African foragers. Nature 603, 290–296 (2022).
Cappellini, E. et al. Early Pleistocene enamel proteome from Dmanisi resolves Stephanorhinus phylogeny. Nature 574, 103–107 (2019).
Welker, F. et al. Enamel proteome shows that Gigantopithecus was an early diverging pongine. Nature 576, 262–265 (2019).
Welker, F. et al. The dental proteome of Homo antecessor. Nature 580, 235–238 (2020).
Warinner, C., Korzow Richter, K. & Collins, M. J. Paleoproteomics. Chem. Rev. 122, 13401–13446 (2022).
Olsen, J. V., Ong, S.-E. & Mann, M. Trypsin cleaves exclusively C-terminal to arginine and lysine residues. Mol. Cell. Proteom. 3, 608–614 (2004).
Stewart, N. A., Gerlach, R. F., Gowland, R. L., Gron, K. J. & Montgomery, J. Sex determination of human remains from peptides in tooth enamel. Proc. Natl Acad. Sci. USA 114, 13649–13654 (2017).
Cappellini, E. et al. Proteomic analysis of a pleistocene mammoth femur reveals more than one hundred ancient bone proteins. J. Proteome Res. 11, 917–926 (2012).
Mackie, M. et al. Palaeoproteomic profiling of conservation layers on a 14th century Italian wall painting. Angew. Chem. 57, 7369–7374 (2018).
Rappsilber, J., Mann, M. & Ishihama, Y. Protocol for micro-purification, enrichment, pre-fractionation and storage of peptides for proteomics using StageTips. Nat. Protoc. 2, 1896–1906 (2007).
Parker, G. J. et al. Sex estimation using sexually dimorphic amelogenin protein fragments in human enamel. J. Archaeol. Sci. 101, 169–180 (2019).
Rappsilber, J., Ishihama, Y. & Mann, M. Stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and LC/MS sample pretreatment in proteomics. Anal. Chem. 75, 663–670 (2003).
Peng, W., Pronker, M. F. & Snijder, J. Mass spectrometry-based de novo sequencing of monoclonal antibodies using multiple proteases and a dual fragmentation scheme. J. Proteome Res. 20, 3559–3566 (2021).
Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).
Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011).
Zhang, J. et al. PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol. Cell. Proteom. 11, M111.010587 (2012).
Orlando, L. et al. Ancient DNA analysis. Nat. Rev. Methods Prim. 1, 1–26 (2021).
Renaud, G., Schubert, M., Sawyer, S. & Orlando, L. Authentication and assessment of contamination in ancient DNA. Methods Mol. Biol. 1963, 163–194 (2019).
Radzicka, A. & Wolfenden, R. Rates of uncatalyzed peptide bond hydrolysis in neutral solution and the transition state affinities of proteases. J. Am. Chem. Soc. 118, 6105–6109 (1996).
Iwata, T. et al. Processing of ameloblastin by MMP-20. J. Dent. Res. 86, 153–157 (2007).
Yamakoshi, Y., Hu, J. C.-C., Fukae, M., Yamakoshi, F. & Simmer, J. P. How do enamelysin and kallikrein 4 process the 32-kDa enamelin? Eur. J. Oral. Sci. 114, 45–51 (2006). 379–80.
van Doorn, N. L., Wilson, J., Hollund, H., Soressi, M. & Collins, M. J. Site-specific deamidation of glutamine: a new marker of bone collagen deterioration. Rapid Commun. Mass Spectrom. 26, 2319–2327 (2012).
Schroeter, E. R. & Cleland, T. P. Glutamine deamidation: an indicator of antiquity, or preservational quality? Rapid Commun. Mass Spectrom. 30, 251–255 (2016).
Ramsøe, A. et al. DeamiDATE 1.0: site-specific deamidation as a tool to assess authenticity of members of ancient proteomes. J. Archaeol. Sci. 115, 105080 (2020).
Tagliabracci, V. S. et al. Secreted kinase phosphorylates extracellular proteins that regulate biomineralization. Science 336, 1150–1153 (2012).
Penkman, K. E. H., Kaufman, D. S., Maddy, D. & Collins, M. J. Closed-system behaviour of the intra-crystalline fraction of amino acids in mollusc shells. Quat. Geochronol. 3, 2–25 (2008).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Katoh, K., Misawa, K., Kuma, K.-I. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
Xiao, Y., Vecchi, M. M. & Wen, D. Distinguishing between leucine and isoleucine by integrated LC–MS analysis using an orbitrap fusion mass spectrometer. Anal. Chem. 88, 10757–10766 (2016).
Gabriels, R., Martens, L. & Degroeve, S. Updated MS2PIP web server delivers fast and accurate MS2 peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Res. 47, W295–W299 (2019).
Tiwary, S. et al. High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis. Nat. Methods 16, 519–525 (2019).
Gessulat, S. et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat. Methods 16, 509–518 (2019).
Wilhelm, M. et al. Deep learning boosts sensitivity of mass spectrometry-based immunopeptidomics. Nat. Commun. 12, 3346 (2021).
Gilbert, C. et al. Species identification of ivory and bone museum objects using minimally invasive proteomics. Sci. Adv. 10, eadi9028 (2024).
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Ronquist, F. et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542 (2012).
Patramanis, I., Ramos-Madrigal, J., Cappellini, E. & Racimo, F. PaleoProPhyler: a reproducible pipeline for phylogenetic inference using ancient proteins. Peer Community J. 3, e112 (2023).
Pamilo, P. & Nei, M. Relationships between gene trees and species trees. Mol. Biol. Evol. https://doi.org/10.1093/oxfordjournals.molbev.a040517 (1988).
Takahata, N. Gene genealogy in three related populations: consistency probability between gene and population trees. Genetics 122, 957–966 (1989).
Maddison, W. P. Gene trees in species trees. Syst. Biol. 46, 523–536 (1997).
Nichols, R. Gene trees and species trees are not the same. Trends Ecol. Evol. 16, 358–364 (2001).
Hobolth, A., Dutheil, J. Y., Hawks, J., Schierup, M. H. & Mailund, T. Incomplete lineage sorting patterns among human, chimpanzee, and orangutan suggest recent orangutan speciation and widespread selection. Genome Res. 21, 349–356 (2011).
Mailund, T., Munch, K. & Schierup, M. H. Lineage sorting in apes. Annu. Rev. Genet. 48, 519–535 (2014).
Sousa, F., Bertrand, Y. J. K., Doyle, J. J., Oxelman, B. & Pfeil, B. E. Using genomic location and coalescent simulation to investigate gene tree discordance in Medicago L. Syst. Biol. 66, 934–949 (2017).
Scally, A. et al. Insights into hominid evolution from the gorilla genome sequence. Nature 483, 169–175 (2012).
Huerta-Sánchez, E. et al. Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature 512, 194–197 (2014).
Lanier, H. C., Huang, H. & Knowles, L. L. How low can you go? The effects of mutation rate on the accuracy of species-tree estimation. Mol. Phylogenet. Evol. 70, 112–119 (2014).
Madupe, P. P. et al. Enamel proteins reveal biological sex and genetic variability within southern African Paranthropus. Preprint at bioRxiv https://doi.org/10.1101/2023.07.03.547326 (2023).
Yu, Y., Yu, Y., Smith, M. & Pieper, R. A spinnable and automatable StageTip for high throughput peptide desalting and proteomics. Protoc. Exch. https://doi.org/10.1038/protex.2014.033 (2014).
Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014).
UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45, D158–D169 (2017).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Hall, T. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symp. Ser. 41, 95–98 (1999).
Sievers, F. et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539 (2011).
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).
Posada, D. & Crandall, K. A. MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 817–818 (1998).
Demarchi, B. et al. Protein sequences bound to mineral surfaces persist into deep time. eLife 5, e17092 (2016).
Lacruz, R. S., Habelitz, S., Wright, J. T. & Paine, M. L. Dental enamel formation and implications for oral health and disease. Physiol. Rev. 97, 939–993 (2017).
Blausen.com staff. Medical gallery of Blausen Medical 2014. WikiJournal Med. https://doi.org/10.15347/wjm/2014.010 (2014).
Ahmadi, S. & Winter, D. Identification of poly(ethylene glycol) and poly(ethylene glycol)-based detergents using peptide search engines. Anal. Chem. 90, 6594–6600 (2018).
Bartlett, J. D. Dental enamel development: proteinases and their enamel matrix substrates. ISRN Dent. 2013, 684607 (2013).
Lu, Y. et al. Functions of KLK4 and MMP-20 in dental enamel formation. Biol. Chem. 389, 695–700 (2008).
The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res. 51, D523–D531 (2023).
Sayers, E. W. et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 50, D20–D26 (2022).
Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Sayers, E. W. GenBank. Nucleic Acids Res. 44, D67–D72 (2016).
The NCBI C++ Toolkit. National Center for Biotechnology Information (2003).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Prüfer, K. et al. Computational challenges in the analysis of ancient DNA. Genome Biol. 11, R47 (2010).
Hendy, J. et al. A guide to ancient protein studies. Nat. Ecol. Evol. 2, 791–799 (2018).
Acknowledgements
E.C., F.W. and J.R.-M. were supported by the VILLUM FONDEN (grant no. 17649). E.C., P.L.R. and J.V.O. were supported by the European Commission through the MSC European Training Network ‘TEMPERA’ (grant agreement no. 722606). E.C., I.P., C.K., R.S.P., P.P.M., F.S.H. and J.V.O. are supported by the European Commission through the MSC European Training Network ‘PUSHH’ (grant agreement no. 861389). E.C., A.J.T., M.M. and J.V.O. were supported by the European Research Council under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 101021361). F.W. has received funding from the European Research Council under the European Union’s Horizon 2020 research and innovation program (grant agreement no. 948365). A.J.T. was supported by the Danish National Research Foundation award PROTEIOS (DNRF128). Work at The Novo Nordisk Foundation Center for Protein Research was funded in part by a generous donation from the Novo Nordisk Foundation (NNF14CC0001). We thank F. Racimo for critical reading of the manuscript and for the valuable comments and feedback.
Author information
Authors and Affiliations
Contributions
E.C. devised the sample preparation methodology. J.R.-M. and I.P. respectively devised and further developed the phyloproteomic pipeline. P.L.R. devised the quantitative analysis of low-abundance modifications. R.S.P. and P.P.M. specifically focused on the experimental design of ancient protein authentication and sequence validation. P.L.R., M.M., C.K., F.S.H. and J.V.O. specifically focused on the description of the MS analysis steps. Figures 2 and 5 were designed by C.K. Figure 3 was created by R.S.P., who also reanalyzed the data underlying it. Figure 4 was designed by R.S.P. and C.K. A.J.T. and E.C. wrote the manuscript with contributions from P.L.R., I.P., C.K., R.S.P., P.M., F.S.K., F.W., M.M. and J.R.-M. All authors read and commented on the manuscript.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Protocols thanks Katerina Douka, Glendon Parker, Brett Phinney and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Cappellini, E. et al. Nature 574, 103–107 (2019): https://doi.org/10.1038/s41586-019-1555-y
Welker, F. et al. Nature 576, 262–265 (2019): https://doi.org/10.1038/s41586-019-1728-8
Welker, F. et al. Nature 580, 235–238 (2020): https://doi.org/10.1038/s41586-020-2153-8
Supplementary information
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Taurozzi, A.J., Rüther, P.L., Patramanis, I. et al. Deep-time phylogenetic inference by paleoproteomic analysis of dental enamel. Nat Protoc 19, 2085–2116 (2024). https://doi.org/10.1038/s41596-024-00975-3
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41596-024-00975-3