Abstract
Next-generation sequencing technologies have revolutionized the field of paleogenomics, allowing the reconstruction of complete ancient genomes and their comparison with modern references. However, this requires the processing of vast amounts of data and involves a large number of steps that use a variety of computational tools. Here we present PALEOMIX (http://geogenetics.ku.dk/publications/paleomix), a flexible and user-friendly pipeline applicable to both modern and ancient genomes, which largely automates the in silico analyses behind whole-genome resequencing. Starting with next-generation sequencing reads, PALEOMIX carries out adapter removal, mapping against reference genomes, PCR duplicate removal, characterization of and compensation for postmortem damage, SNP calling and maximum-likelihood phylogenomic inference, and it profiles the metagenomic contents of the samples. As such, PALEOMIX allows for a series of potential applications in paleogenomics, comparative genomics and metagenomics. Applying the PALEOMIX pipeline to the three ancient and seven modern Phytophthora infestans genomes as described here takes 5 d using a 16-core server.
Acknowledgements
This work was supported by the Danish Council for Independent Research, Natural Sciences (FNU); the Danish National Research Foundation (DNRF94); a Marie Curie Career Integration grant (FP7 CIG-293845); and the Marie Curie FP7 Initial Training Network (EUROTAST). The work of M.S. was made possible by the support of the Lundbeck Foundation (R52-A5062). A.G. and L.E. were supported by Marie Curie Intra-European Fellowships (FP7 IEF-299176 and IEF-302617, respectively). R.F. was supported by a postdoctoral grant from the AXA Research Fund (32983).
Author information
Contributions
M.S., H.J., A.G. and L.O. designed the BAM pipeline, with feedback from R.S. and M.M. M.S., A.G. and L.O. designed the phylogenetic pipeline. M.S. wrote the scripts for both pipelines, based in part on code written by M.K. L.E. and C.D.S. designed the metagenomic pipeline and scripts. M.D.M. guided selection and analysis of the example data sets. H.J., R.S., A.G., M.D.M., R.F., M.K., M.M. and L.O. tested the pipelines. L.O. coordinated the work. M.S., L.E., C.D.S., R.F. and L.O. wrote the manuscript with contributions from all the authors.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Table 1
Phytophthora infestans samples. (PDF 61 kb)
Supplementary Table 2
Libraries for Phytophthora infestans samples. (PDF 98 kb)
Supplementary Table 3
Adapter sequences for Phytophthora infestans samples. (PDF 476 kb)
About this article
Cite this article
Schubert, M., Ermini, L., Der Sarkissian, C. et al. Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat Protoc 9, 1056–1082 (2014). https://doi.org/10.1038/nprot.2014.063
DOI: https://doi.org/10.1038/nprot.2014.063