  • Protocol

Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX

Abstract

Next-generation sequencing technologies have revolutionized the field of paleogenomics, allowing the reconstruction of complete ancient genomes and their comparison with modern references. However, this requires the processing of vast amounts of data and involves a large number of steps that use a variety of computational tools. Here we present PALEOMIX (http://geogenetics.ku.dk/publications/paleomix), a flexible and user-friendly pipeline applicable to both modern and ancient genomes, which largely automates the in silico analyses behind whole-genome resequencing. Starting with next-generation sequencing reads, PALEOMIX carries out adapter removal, mapping against reference genomes, PCR duplicate removal, characterization of and compensation for postmortem damage, SNP calling and maximum-likelihood phylogenomic inference, and it profiles the metagenomic contents of the samples. As such, PALEOMIX allows for a series of potential applications in paleogenomics, comparative genomics and metagenomics. Applying the PALEOMIX pipeline to the three ancient and seven modern Phytophthora infestans genomes as described here takes 5 d using a 16-core server.
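One of the steps listed above is the characterization of postmortem damage. In ancient DNA, deaminated cytosines typically appear as an excess of C→T mismatches near the 5' ends of reads, which is the signal that damage-profiling tools such as mapDamage quantify. The Python sketch below counts that signal on toy data. It is not PALEOMIX's code. The function name `c_to_t_profile` and the input format (ungapped, equal-length read/reference pairs already oriented 5'→3' on the read) are assumptions made for illustration.

```python
from collections import Counter


def c_to_t_profile(pairs, max_pos=10):
    """Return the C->T mismatch frequency at each of the first `max_pos` read positions.

    Assumption (hypothetical input format): `pairs` holds (read, reference)
    strings that are ungapped, equal-length and oriented 5'->3' on the read.
    Real pipelines must also handle indels, soft-clipping and reverse-strand reads.
    """
    c_total = Counter()  # reference C sites seen at each 5' offset
    c_to_t = Counter()   # of those, sites where the read carries a T
    for read, ref in pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference must have equal length")
        for i, (r, f) in enumerate(zip(read.upper(), ref.upper())):
            if i >= max_pos:
                break
            if f == "C":
                c_total[i] += 1
                if r == "T":
                    c_to_t[i] += 1
    # Positions with no reference C get 0.0 instead of dividing by zero.
    return [c_to_t[i] / c_total[i] if c_total[i] else 0.0 for i in range(max_pos)]


if __name__ == "__main__":
    toy = [("TACGT", "CACGT"), ("CAGGT", "CAGGT"), ("TTCGA", "CTCGA")]
    print(c_to_t_profile(toy, max_pos=3))
```

On the toy input, two of the three reference cytosines at the first read position are read as T, so the profile starts at 2/3 and is 0.0 at the next two positions. A fuller implementation would typically also reverse-complement reverse-strand reads and tabulate G→A mismatches toward the 3' end.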

Figure 1
Figure 2: The P. infestans phylogeny produced by the phylogenetic pipeline on the basis of coding sequences with at least 80% of bases covered in the multiple sequence alignment.
Figure 3: Analyses of the microbial taxonomical profiles of three Phytophthora-infected historical potato samples.
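The Figure 2 caption restricts the phylogeny to coding sequences with at least 80% of bases covered in the multiple sequence alignment. The Python sketch below shows one way such a filter could work. It assumes that coverage is pooled over every cell of a gene's alignment (all samples × all columns) and that `N`, `-` and `?` mark uncovered bases; the caption does not say whether the threshold is applied per sample or pooled, and the names `fraction_covered` and `filter_by_coverage` are hypothetical, not PALEOMIX's.

```python
def fraction_covered(alignment, missing="N-?"):
    """Fraction of cells (sequences x columns) holding a called base.

    `alignment` maps sample name -> aligned sequence; characters in
    `missing` (by default N, gap and ?) are counted as uncovered.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all sequences in an alignment must have equal length")
    total = sum(len(seq) for seq in alignment.values())
    if total == 0:
        return 0.0
    missing_set = set(missing.upper())
    covered = sum(
        1 for seq in alignment.values() for base in seq.upper() if base not in missing_set
    )
    return covered / total


def filter_by_coverage(cds_alignments, min_fraction=0.8):
    """Keep only coding-sequence alignments whose covered fraction reaches `min_fraction`."""
    return {
        name: aln
        for name, aln in cds_alignments.items()
        if fraction_covered(aln) >= min_fraction
    }


if __name__ == "__main__":
    cds = {
        "geneA": {"s1": "ATGAAA", "s2": "ATGAAN"},  # 11 of 12 cells covered
        "geneB": {"s1": "ATGNNN", "s2": "NNN-AA"},  # 5 of 12 cells covered
    }
    print(sorted(filter_by_coverage(cds)))
```

With the default 0.8 threshold only `geneA` passes. Pooling over all cells is the simplest reading of the caption; a per-sample variant would instead require each sequence to pass the threshold on its own.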

Acknowledgements

This work was supported by the Danish Council for Independent Research, Natural Sciences (FNU); the Danish National Research Foundation (DNRF94); a Marie Curie Career Integration grant (FP7 CIG-293845); and the Marie Curie FP7 Initial Training Network (EUROTAST). The work of M.S. was made possible thanks to the support of the Lundbeck Foundation (R52-A5062). A.G. and L.E. were supported by Marie Curie Intra-European Fellowships (FP7 IEF-299176 and IEF-302617, respectively). R.F. was supported by a postdoctoral grant from AXA Research Fund (32983).

Author information

Contributions

M.S., H.J., A.G. and L.O. designed the BAM pipeline, with feedback from R.S. and M.M. M.S., A.G. and L.O. designed the phylogenetic pipeline. M.S. wrote the scripts for both pipelines, based in part on code written by M.K. L.E. and C.D.S. designed the metagenomic pipeline and scripts. M.D.M. guided selection and analysis of the example data sets. H.J., R.S., A.G., M.D.M., R.F., M.K., M.M. and L.O. tested the pipelines. L.O. coordinated the work. M.S., L.E., C.D.S., R.F. and L.O. wrote the manuscript with contributions from all the authors.

Corresponding author

Correspondence to Ludovic Orlando.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1

Phytophthora infestans samples. (PDF 61 kb)

Supplementary Table 2

Libraries for Phytophthora infestans samples. (PDF 98 kb)

Supplementary Table 3

Adapter sequences for Phytophthora infestans samples. (PDF 476 kb)

About this article

Cite this article

Schubert, M., Ermini, L., Sarkissian, C. et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat Protoc 9, 1056–1082 (2014). https://doi.org/10.1038/nprot.2014.063

  • DOI: https://doi.org/10.1038/nprot.2014.063
