Review Article | Published: 12 September 2017

Shotgun metagenomics, from sampling to analysis

Nature Biotechnology volume 35, pages 833–844 (2017)

  • A Corrigendum to this article was published on 08 December 2017

This article has been updated

Abstract

Diverse microbial communities of bacteria, archaea, viruses and single-celled eukaryotes have crucial roles in the environment and in human health. However, microbes are frequently difficult to culture in the laboratory, which can confound cataloging of members and understanding of how communities function. High-throughput sequencing technologies and a suite of computational pipelines have been combined into shotgun metagenomics methods that have transformed microbiology. Still, computational approaches to overcome the challenges that affect both assembly-based and mapping-based metagenomic profiling, particularly of high-complexity samples or environments containing organisms with limited similarity to sequenced genomes, are needed. Understanding the functions and characterizing specific strains of these communities offers biotechnological promise in therapeutic discovery and innovative ways to synthesize products using microbial factories and can pinpoint the contributions of microorganisms to planetary, animal and human health.

Change history

  • 12 September 2017

    In the version of this article initially published, the Competing Financial Interests should have indicated the authors had competing interests, but instead indicated there were none. The detailed statement was missing from the HTML: J.T.S. receives research funding from Oxford Nanopore Technologies and has received travel and accommodations to speak at meetings hosted by Oxford Nanopore Technologies. N.J.L. has received honoraria to speak at Oxford Nanopore and Illumina meetings, and travel and accommodation to attend company-sponsored meetings. N.J.L. has ongoing research collaborations with Oxford Nanopore who have provided free-of-charge sequencing reagents as part of the MinION Access Programme and directly in support of research projects. In addition, the publication date was given as 11 September, rather than 12 September 2017. The errors have been corrected for the PDF and HTML versions of this article.

Acknowledgements

A.W.W. and the Rowett Institute receive core funding support from the Scottish Government's Rural and Environmental Science and Analysis Service (RESAS). N.S. is supported by the European Research Council (ERC-STG project MetaPG), a European Union Framework Program 7 Marie-Curie grant (PCIG13-618833), a MIUR grant (FIR RBFR13EWWI), a Fondazione Caritro grant (Rif.Int.2013.0239) and a Terme di Comano grant. C.Q. and N.J.L. are funded through an MRC bioinformatics fellowship (MR/M50161X/1) as part of the MRC Cloud Infrastructure for Microbial Bioinformatics (CLIMB) consortium (MR/L015080/1). J.T.S. is supported by the Ontario Institute for Cancer Research through funding provided by the Government of Ontario.

Author information

Author notes

    • Christopher Quince & Alan W Walker

    These authors contributed equally to this work.

Affiliations

  1. Warwick Medical School, University of Warwick, Warwick, UK.

    • Christopher Quince
  2. Microbiology Group, The Rowett Institute, University of Aberdeen, Aberdeen, UK.

    • Alan W Walker
  3. Ontario Institute for Cancer Research, Toronto, Ontario, Canada.

    • Jared T Simpson
  4. Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.

    • Jared T Simpson
  5. Institute for Microbiology and Infection, University of Birmingham, Birmingham, UK.

    • Nicholas J Loman
  6. Centre for Integrative Biology, University of Trento, Trento, Italy.

    • Nicola Segata

Authors

  1. Christopher Quince

  2. Alan W Walker

  3. Jared T Simpson

  4. Nicholas J Loman

  5. Nicola Segata

Contributions

C.Q., A.W.W., J.T.S., N.J.L. and N.S. drafted the paper, revised the text and designed figures, tables and boxes. C.Q. and N.S. performed the metagenomic analyses described in the manuscript.

Competing interests

J.T.S. receives research funding from Oxford Nanopore Technologies and has received travel and accommodations to speak at meetings hosted by Oxford Nanopore Technologies. N.J.L. has received honoraria to speak at Oxford Nanopore and Illumina meetings, and travel and accommodation to attend company-sponsored meetings. N.J.L. has ongoing research collaborations with Oxford Nanopore who have provided free-of-charge sequencing reagents as part of the MinION Access Programme and directly in support of research projects.

Corresponding author

Correspondence to Nicola Segata.

Supplementary information

PDF files

  1. Supplementary Text and Figures: Supplementary Figure 1

  2. Life Sciences Reporting Summary

  3. Supplementary Box 1: Problems and solutions for study and design.

Zip files

  1. Supplementary Code 1: Supporting scripts and pipeline description.

About this article

DOI

https://doi.org/10.1038/nbt.3935
