Review Article

A clinician's guide to microbiome analysis

Nature Reviews Gastroenterology & Hepatology volume 14, pages 585–595 (2017)

This article has been updated

Abstract

Microbiome analysis involves determining the composition and function of a community of microorganisms in a particular location. For the gastroenterologist, this technology opens up a rapidly evolving set of challenges and opportunities for generating novel insights into the health of patients on the basis of microbiota characterizations from intestinal, hepatic or extraintestinal samples. Alterations in gut microbiota composition correlate with intestinal and extraintestinal disease and, although only a few mechanisms are known, the microbiota are still an attractive target for developing biomarkers for disease detection and management as well as potential therapeutic applications. In this Review, we summarize the major decision points confronting new entrants to the field and those designing new projects in microbiome research. We provide recommendations based on current technology options and our experience of sequencing platform choices. We also offer perspectives on future applications of microbiome research, which we hope convey the promise of this technology for clinical applications.

Key points

  • Complex communities of microorganisms live on and in the human body, and variations in the composition and function of these communities are increasingly linked to various conditions and diseases

  • Although it is not known if microbiome changes are causative or consequential in most pathophysiologies, they might provide biomarkers for disease detection or management

  • Microbiome analysis is likely to become a routine component of secondary health care, and the microbiome is emerging as a modifiable environmental risk factor in multifactorial diseases that could be targeted by novel therapeutics

  • Technology advancements are leading to a range of powerful methods for microbiome analysis becoming available and affordable for clinical studies

  • Judicious choice of sample type and sequencing platform is required to maximize the clinical utility of microbiome data

Change history

  • 11 August 2017

    In the version of this Review initially published online, the article should have indicated that Marcus J. Claesson and Adam G. Clooney contributed equally to this work. The error has been corrected for the HTML, PDF and print versions of the article.

Acknowledgements

This work was supported by Science Foundation Ireland through a Centre Award to the APC Microbiome Institute (SFI/12/RC/2273).

Author information

Author notes

    • Marcus J. Claesson & Adam G. Clooney

    These authors contributed equally to this work

Affiliations

  1. School of Microbiology, University College Cork, Western Road, T12 Y337 Cork, Ireland.

    • Marcus J. Claesson, Adam G. Clooney & Paul W. O'Toole

  2. APC Microbiome Institute, University College Cork, Western Road, T12 Y337 Cork, Ireland.

    • Marcus J. Claesson, Adam G. Clooney & Paul W. O'Toole

  3. Department of Biological Sciences, Cork Institute of Technology, Rossa Avenue, Bishopstown, T12 P928 Cork, Ireland.

    • Adam G. Clooney

Contributions

All authors contributed equally to this work.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Marcus J. Claesson or Paul W. O'Toole.

Glossary

Microbiome

The collection of microbial genomes at a given site.

Biomarkers

Measurable indicators of disease, pharmacological response or normal biological function.

Bioinformatics

The use of computer science, statistics and mathematics to analyse and interpret biological processes and molecular components.

Phylogenetics

The study of evolutionary relationships between organisms, genes or proteins.

Metagenome

The collective microbial genomes and genes in an environment or sample.

Shotgun sequencing

A sequencing approach in which all extracted DNA is randomly sheared into fragments of the desired size for high-throughput sequencing, as opposed to targeting a specific marker gene.

Amplicon

A target gene or sequence that is amplified naturally or artificially.

Copy number

The number of copies of a particular section of DNA; some organisms have multiple copies of a targeted gene.

Taxa

Populations of phylogenetically related organisms.

16S ribosomal RNA gene

A gene encoding the RNA component of the 30S subunit of the prokaryotic ribosome; it contains nine variable regions that can be targeted for amplification and used for microbial taxonomic profiling of a sample.

18S ribosomal RNA gene

A gene encoding the RNA component of the 40S ribosomal subunit found in eukaryotic cells, targeted in the analysis of fungal communities.

Alpha diversity

Microbiota diversity within an individual site or sample; one value per sample.

Beta diversity

Diversity between separate samples (inter-sample variability).
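
As a minimal sketch of how these two quantities are often computed, the example below assumes the Shannon index for alpha diversity and Bray-Curtis dissimilarity for beta diversity. Both are common choices, but neither is prescribed by the definitions above, and the taxon counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Alpha diversity: Shannon index (natural log) of one sample."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(counts_a, counts_b):
    """Beta diversity: Bray-Curtis dissimilarity between two samples.

    Both lists must report the same taxa in the same order.
    """
    shared = sum(min(a, b) for a, b in zip(counts_a, counts_b))
    return 1 - 2 * shared / (sum(counts_a) + sum(counts_b))


# Hypothetical read counts for the same four taxa in two samples.
sample_1 = [25, 25, 25, 25]   # even community
sample_2 = [97, 1, 1, 1]      # community dominated by one taxon

alpha_1 = shannon_index(sample_1)       # one value per sample
alpha_2 = shannon_index(sample_2)
beta = bray_curtis(sample_1, sample_2)  # one value per pair of samples
```

The even community has the higher Shannon index, and the Bray-Curtis value ranges from 0 (identical counts) to 1 (no taxa shared).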

PCR bias

Unequal amplification across DNA sequences, which leads to a skewed distribution of PCR products.

Metatranscriptomics

The study of RNA copies of the collective microbial genes in a community or sample.

Assembly

The process in which short DNA fragments are aligned and merged to form longer DNA fragments.

Contigs

Contiguous DNA sequences assembled from shorter, overlapping sequencing reads.

Annotation

Assigning functions or functional categories to genes or proteins.

PHRED quality scores

A measure of the quality of base calling in a sequenced strand of DNA.
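
For illustration, a PHRED score Q is conventionally related to the estimated probability P that a base call is wrong by Q = -10 log10(P). The sketch below applies that relation and decodes a FASTQ quality string, assuming the Phred+33 ASCII encoding; this offset is an assumption about the input file, as some older files use an offset of 64.

```python
def phred_to_error_probability(q):
    """Convert a PHRED score to the estimated base-calling error probability."""
    return 10 ** (-q / 10)


def decode_quality_string(quality, offset=33):
    """Decode a FASTQ quality string into a list of PHRED scores.

    offset=33 assumes Phred+33 encoding.
    """
    return [ord(ch) - offset for ch in quality]


# Hypothetical quality string for a three-base read.
scores = decode_quality_string("I5!")
error_probabilities = [phred_to_error_probability(q) for q in scores]
```

Under this relation a score of 20 corresponds to an estimated 1 in 100 chance of an incorrect base call, and a score of 30 to 1 in 1,000.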

de Bruijn graphs

Graphs consisting of nodes (k-mers) and edges (overlaps between k-mers). The graph is constructed from k-mer overlaps, and following its edges yields an assembled sequence.

Scaffolds

The product of aligning and merging contigs to form longer continuous DNA sequences.

Binning

Grouping DNA sequences based on particular attributes such as GC content or similarity with other genes.
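
A minimal sketch of binning on a single attribute, GC content, is given below. The contig names, sequences and the 10% bin width are hypothetical, and using GC content alone is a simplification for illustration only.

```python
from collections import defaultdict


def gc_count(sequence):
    """Number of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    return sequence.count("G") + sequence.count("C")


def bin_by_gc(contigs, bin_width_percent=10):
    """Group contig names by GC content, in bins of bin_width_percent.

    Each key is the lower bound (in percent) of a GC-content bin.
    """
    bins = defaultdict(list)
    for name, sequence in contigs.items():
        # Integer arithmetic avoids floating-point rounding at bin edges.
        percent = 100 * gc_count(sequence) // len(sequence)
        lower = percent // bin_width_percent * bin_width_percent
        bins[lower].append(name)
    return dict(bins)


# Hypothetical contigs.
contigs = {
    "contig_1": "ATATATATGC",  # 20% GC
    "contig_2": "GCGCGCGCAT",  # 80% GC
    "contig_3": "ATTTATATGG",  # 20% GC
}
gc_bins = bin_by_gc(contigs)
```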

k-mer

Short DNA sequence with fixed length k.
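
To make the k-mer and de Bruijn graph entries concrete, the toy sketch below splits reads into k-mers, links consecutive k-mers (which overlap by k-1 bases) and walks the unbranched path to recover a sequence. It follows the node and edge convention used in the glossary; the reads are hypothetical, and sequencing errors, branching paths and reverse complements are deliberately ignored.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Return every substring of length k, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def de_bruijn_graph(reads, k):
    """Nodes are k-mers; an edge links k-mers that are adjacent in a read."""
    graph = defaultdict(set)
    for read in reads:
        words = kmers(read, k)
        for left, right in zip(words, words[1:]):
            graph[left].add(right)
    return dict(graph)


def walk(graph, start):
    """Follow edges from start while each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # stop on a cycle
            break
        seen.add(node)
        contig += node[-1]  # each step extends the contig by one base
    return contig


# Two hypothetical overlapping reads.
graph = de_bruijn_graph(["ATGGC", "GGCA"], k=3)
contig = walk(graph, "ATG")
```

Here the two reads share the k-mer GGC, so the walk joins them into the single sequence ATGGCA.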

Homology

Shared ancestry or degree of relationship between sequences or genes.

Gene calling

Identifying coding regions in a sequence of DNA.

Orthologues

Genes in different species derived from a common ancestral gene following speciation, which usually retain the same function.

Pipelines

A series of tools or scripts optimized for the analysis of a dataset, in which the outputs of one step are the inputs for the next step.

Barcode sequence

A short series of DNA bases attached to sequence reads, each unique to a sample to enable differentiation after sequencing.
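
The way barcodes are used to separate samples after sequencing (demultiplexing) can be sketched as follows. The sketch assumes that each barcode sits at the start of the read, that all barcodes have the same length and that only exact matches are accepted; the barcodes, sample names and reads are hypothetical.

```python
def demultiplex(reads, barcode_to_sample):
    """Assign reads to samples by their leading barcode.

    Returns (reads per sample with the barcode trimmed, unassigned reads).
    """
    barcode_length = len(next(iter(barcode_to_sample)))
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        sample = barcode_to_sample.get(read[:barcode_length])
        if sample is None:
            unassigned.append(read)  # barcode not recognised
        else:
            by_sample[sample].append(read[barcode_length:])
    return by_sample, unassigned


# Hypothetical four-base barcodes and reads.
barcodes = {"ACGT": "patient_A", "TGCA": "patient_B"}
reads = ["ACGTGGATTC", "TGCACCTGAA", "ACGTTTAGGC", "GGGGATATAT"]
by_sample, unassigned = demultiplex(reads, barcodes)
```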

Operational taxonomic units

Collections (clusters) of sequences that are typically at least 97% similar to each other and are used to group closely related individuals.

Reference database

A collection of known information (for example, gene sequences or functions) constructed in a format for querying or similarity-based searches.

Chimeric sequences

Artefacts from the PCR process in which an amplified sequence is composed of DNA from two or more parents.

About this article

DOI

https://doi.org/10.1038/nrgastro.2017.97

Further reading