A clinician's guide to microbiome analysis

Key Points

  • Complex communities of microorganisms live on and in the human body, and variations in the composition and function of these communities are increasingly linked to various conditions and diseases

  • Although it is not known if microbiome changes are causative or consequential in most pathophysiologies, they might provide biomarkers for disease detection or management

  • Microbiome analysis is likely to become a routine component of secondary health care, and the microbiome is emerging as a modifiable environmental risk factor in multifactorial diseases that could be targeted by novel therapeutics

  • Technological advances are making a range of powerful methods for microbiome analysis available and affordable for clinical studies

  • Judicious choice of sample type and sequencing platform is required to maximize the clinical utility of microbiome data

Abstract

Microbiome analysis involves determining the composition and function of a community of microorganisms in a particular location. For the gastroenterologist, this technology opens up a rapidly evolving set of challenges and opportunities for generating novel insights into the health of patients on the basis of microbiota characterizations from intestinal, hepatic or extraintestinal samples. Alterations in gut microbiota composition correlate with intestinal and extraintestinal disease and, although only a few mechanisms are known, the microbiota remain an attractive target for developing biomarkers for disease detection and management, as well as for potential therapeutic applications. In this Review, we summarize the major decision points confronting new entrants to the field and those designing new projects in microbiome research. We provide recommendations based on current technology options and our experience of sequencing platform choices. We also offer perspectives on future applications of microbiome research, which we hope convey the promise of this technology for clinical applications.

Figure 1: Flowchart of the major steps involved in bioinformatic analysis of the microbiome.
Figure 2: Sequence read assembly.

Change history

  • 11 August 2017

    In the version of this Review initially published online, the article should have indicated that Marcus J. Claesson and Adam G. Clooney contributed equally to this work. The error has been corrected for the HTML, PDF and print versions of the article.


    Article  CAS  Google Scholar 

  143. 143

    Schreiber, F., Gumrich, P., Daniel, R. & Meinicke, P. Treephyler: fast taxonomic profiling of metagenomes. Bioinformatics 26, 960–961 (2010).

    Article  CAS  PubMed  Google Scholar 

  144. 144

    Weber, M. et al. Practical application of self-organizing maps to interrelate biodiversity and functional data in NGS-based metagenomics. ISME J. 5, 918–928 (2011).

    Article  CAS  PubMed  Google Scholar 

  145. 145

    Pati, A., Heath, L. S., Kyrpides, N. C. & Ivanova, N. ClaMS: a classifier for metagenomic sequences. Standards Genom. Sci. 5, 248–253 (2011).

    Article  Google Scholar 

  146. 146

    Davenport, C. F. et al. Genometa—a fast and accurate classifier for short metagenomic shotgun reads. PLoS ONE 7, e41224 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  147. 147

    Sharma, A. K., Gupta, A., Kumar, S., Dhakan, D. B. & Sharma, V. K. Woods: a fast and accurate functional annotator and classifier of genomic and metagenomic sequences. Genomics 106, 1–6 (2015).

    Article  CAS  PubMed  Google Scholar 

  148. 148

    Ghosh, T. S., Monzoorul Haque, M. & Mande, S. S. DiScRIBinATE: a rapid method for accurate taxonomic classification of metagenomic sequences. BMC Bioinformat. 11 (Suppl. 7), S14 (2010).

    Article  Google Scholar 

  149. 149

    Liu, J. et al. Composition-based classification of short metagenomic sequences elucidates the landscapes of taxonomic and functional enrichment of microorganisms. Nucleic Acids Res. 41, e3 (2013).

    Article  CAS  PubMed  Google Scholar 

  150. 150

    Mohammed, M. H. et al. INDUS - a composition-based approach for rapid and accurate taxonomic classification of metagenomic sequences. BMC Genom. 12 (Suppl. 3), S4 (2011).

    Article  Google Scholar 

  151. 151

    Sharma, V. K., Kumar, N., Prakash, T. & Taylor, T. D. Fast and accurate taxonomic assignments of metagenomic sequences using MetaBin. PLoS ONE 7, e34030 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  152. 152

    Liu, B., Gibbons, T., Ghodsi, M., Treangen, T. & Pop, M. Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genom. 12 (Suppl. 2), S4 (2011).

    Article  CAS  Google Scholar 

  153. 153

    Rasheed, Z. & Rangwala, H. Metagenomic taxonomic classification using extreme learning machines. J. Bioinformat. Computat. Biol. 10, 1250015 (2012).

    Article  Google Scholar 

  154. 154

    Ander, C., Schulz-Trieglaff, O. B., Stoye, J. & Cox, A. J. metaBEETL: high-throughput analysis of heterogeneous microbial populations from shotgun DNA sequences. BMC Bioinformat. 14 (Suppl. 5), S2 (2013).

    Article  Google Scholar 

  155. 155

    Porter, M. S. & Beiko, R. G. SPANNER: taxonomic assignment of sequences using pyramid matching of similarity profiles. Bioinformatics 29, 1858–1864 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  156. 156

    Piro, V. C., Lindner, M. S. & Renard, B. Y. DUDes: a top-down taxonomic profiler for metagenomics. Bioinformatics 32, 2272–2280 (2016).

    Article  CAS  PubMed  Google Scholar 

  157. 157

    Menzel, P., Ng, K. L. & Krogh, A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nature Commun. 7, 11257 (2016).

    Article  CAS  Google Scholar 

  158. 158

    Wu, Y. W., Tang, Y. H., Tringe, S. G., Simmons, B. A. & Singer, S. W. MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome 2, 26 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  159. 159

    Petrenko, P., Lobb, B., Kurtz, D. A., Neufeld, J. D. & Doxey, A. C. MetAnnotate: function-specific taxonomic profiling and comparison of metagenomes. BMC Biol. 13, 92 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  160. 160

    Luo, C., Rodriguez, R. L. & Konstantinidis, K. T. MyTaxa: an advanced taxonomic classifier for genomic and metagenomic sequences. Nucleic Acids Res. 42, e73 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  161. 161

    Jiang, H., An, L., Lin, S. M., Feng, G. & Qiu, Y. A statistical framework for accurate taxonomic assignment of metagenomic sequencing reads. PLoS ONE 7, e46450 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  162. 162

    Klingenberg, H., Asshauer, K. P., Lingner, T. & Meinicke, P. Protein signature-based estimation of metagenomic abundances including all domains of life and viruses. Bioinformatics 29, 973–980 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  163. 163

    Reddy, R. M., Mohammed, M. H. & Mande, S. S. TWARIT: an extremely rapid and efficient approach for phylogenetic classification of metagenomic sequences. Gene 505, 259–265 (2012).

    Article  CAS  PubMed  Google Scholar 

  164. 164

    Hou, T. et al. Classification of metagenomics data at lower taxonomic level using a robust supervised classifier. Evol. Bioinformat. Online 11, 3–10 S20523 (2015).

    CAS  Google Scholar 

  165. 165

    Kristiansson, E., Hugenholtz, P. & Dalevi, D. ShotgunFunctionalizeR: an R-package for functional comparison of metagenomes. Bioinformatics 25, 2737–2738 (2009).

    Article  CAS  PubMed  Google Scholar 

  166. 166

    Li, W. Analysis and comparison of very large metagenomes with fast clustering and functional annotation. BMC Bioinformat. 10, 359 (2009).

    Article  CAS  Google Scholar 

  167. 167

    Kelley, D. R., Liu, B., Delcher, A. L., Pop, M. & Salzberg, S. L. Gene prediction with Glimmer for metagenomic sequences augmented by classification and clustering. Nucleic Acids Res. 40, e9 (2012).

    Article  CAS  PubMed  Google Scholar 

  168. 168

    Hoff, K. J., Lingner, T., Meinicke, P. & Tech, M. Orphelia: predicting genes in metagenomic sequencing reads. Nucleic Acids Res. 37, W101–W105 (2009).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  169. 169

    Liu, Y., Guo, J., Hu, G. & Zhu, H. Gene prediction in metagenomic fragments based on the SVM algorithm. BMC Bioinformat. 14 (Suppl. 5), S12 (2013).

    Article  Google Scholar 

  170. 170

    van der Veen, B. E., Harris, H. M., O'Toole, P. W. & Claesson, M. J. Metaphor: finding bi-directional best hit homology relationships in (meta)genomic datasets. Genomics 104, 459–463 (2014).

    Article  CAS  PubMed  Google Scholar 

  171. 171

    Liu, B. & Pop, M. MetaPath: identifying differentially abundant metabolic pathways in metagenomic datasets. BMC Proc. 5 (Suppl. 2), S9 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  172. 172

    Overbeek, R. et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 42, D206–D214 (2014).

    Article  CAS  PubMed  Google Scholar 

  173. 173

    Powell, S. et al. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res. 42, D231–D239 (2014).

    Article  CAS  PubMed  Google Scholar 

  174. 174

    Abubucker, S. et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Computat. Biol. 8, e1002358 (2012).

    Article  CAS  Google Scholar 

  175. 175

    Segata, N. et al. Metagenomic biomarker discovery and explanation. Genome Biol. 12, R60 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  176. 176

    Markowitz, V. M. et al. IMG/M 4 version of the integrated metagenome comparative analysis system. Nucleic Acids Res. 42, D568–573 (2014).

    Article  CAS  PubMed  Google Scholar 

  177. 177

    Wu, S., Zhu, Z., Fu, L., Niu, B. & Li, W. WebMGA: a customizable web server for fast metagenomic sequence analysis. BMC Genom. 12, 444 (2011).

    Article  Google Scholar 

  178. 178

    Goll, J. et al. METAREP: JCVI metagenomics reports—an open source tool for high-performance comparative metagenomics. Bioinformatics 26, 2631–2632 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  179. 179

    Su, X., Pan, W., Song, B., Xu, J. & Ning, K. Parallel-META 2.0: enhanced metagenomic data analysis with functional annotation, high performance computing and advanced visualization. PloS one 9, e89323 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  180. 180

    Kultima, J. R. et al. MOCAT: a metagenomics assembly and gene prediction toolkit. PLoS ONE 7, e47656 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  181. 181

    Afgan, E. et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res. 44, W3–W10 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  182. 182

    Fosso, B. et al. BioMaS: a modular pipeline for Bioinformatic analysis of Metagenomic AmpliconS. BMC Bioinformat. 16, 203 (2015).

    Article  Google Scholar 

  183. 183

    Angly, F. et al. PHACCS, an online tool for estimating the structure and diversity of uncultured viral communities using metagenomic information. BMC Bioinformat. 6, 41 (2005).

    Article  CAS  Google Scholar 

  184. 184

    Arumugam, M., Harrington, E. D., Foerstner, K. U., Raes, J. & Bork, P. SmashCommunity: a metagenomic annotation and analysis tool. Bioinformatics 26, 2977–2978 (2010).

    Article  CAS  PubMed  Google Scholar 

  185. 185

    Schloss, P. D. & Handelsman, J. Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl. Environ. Microbiol. 71, 1501–1506 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  186. 186

    Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).

    Article  CAS  PubMed  Google Scholar 

  187. 187

    Hao, X., Jiang, R. & Chen, T. Clustering 16S rRNA for OTU prediction: a method of unsupervised Bayesian clustering. Bioinformatics 27, 611–618 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  188. 188

    Cai, Y. & Sun, Y. ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA pyrosequences in quasilinear computational time. Nucleic Acids Res. 39, e95 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  189. 189

    Ghodsi, M., Liu, B. & Pop, M. DNACLUST: accurate and efficient clustering of phylogenetic marker genes. BMC Bioinformat. 12, 271 (2011).

    Article  Google Scholar 

  190. 190

    Russell, D. J., Way, S. F., Benson, A. K. & Sayood, K. A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences. BMC Bioinformat. 11, 601 (2010).

    Article  Google Scholar 

  191. 191

    Wang, X., Yao, J., Sun, Y. & Mai, V. M-Pick, a modularity-based method for OTU picking of 16S rRNA sequences. BMC Bioinformat. 14, 43 (2013).

    Article  Google Scholar 

  192. 192

    Mahe, F., Rognes, T., Quince, C., de Vargas, C. & Dunthorn, M. Swarm v2: highly-scalable and high-resolution amplicon clustering. PeerJ 3, e1420 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  193. 193

    Franzen, O. et al. Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering. Microbiome 3, 43 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  194. 194

    Wei, Z. G. & Zhang, S. W. MtHc: a motif-based hierarchical method for clustering massive 16S rRNA sequences into OTUs. Mol. bioSystems 11, 1907–1913 (2015).

    Article  CAS  Google Scholar 

  195. 195

    Mysara, M., Saeys, Y., Leys, N., Raes, J. & Monsieurs, P. CATCh, an ensemble classifier for chimera detection in 16S rRNA sequencing studies. Appl. Environ. Microbiol. 81, 1573–1584 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  196. 196

    Soergel, D. A., Dey, N., Knight, R. & Brenner, S. E. Selection of primers for optimal taxonomic classification of environmental 16S rRNA gene sequences. ISME J. 6, 1440–1444 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  197. 197

    Chaudhary, N., Sharma, A. K., Agarwal, P., Gupta, A. & Sharma, V. K. 16S classifier: a tool for fast and accurate taxonomic classification of 16S rRNA hypervariable regions in metagenomic datasets. PLoS ONE 10, e0116106 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  198. 198

    Stoddard, S. F., Smith, B. J., Hein, R., Roller, B. R. & Schmidt, T. M. rrnDB: improved tools for interpreting rRNA gene abundance in bacteria and archaea and a new foundation for future development. Nucleic Acids Res. 43, D593–D598 (2015).

    Article  CAS  PubMed  Google Scholar 

  199. 199

    Jaziri, F. et al. PhylOPDb: a 16S rRNA oligonucleotide probe database for prokaryotic identification. Database (Oxford) http://dx.doi.org/10.1093/database/bau036 (2014).

  200. 200

    Ritari, J., Salojarvi, J., Lahti, L. & de Vos, W. M. Improved taxonomic assignment of human intestinal 16S rRNA sequences by a dedicated reference database. BMC Genom. 16, 1056 (2015).

    Article  CAS  Google Scholar 

  201. 201

    Lozupone, C., Hamady, M. & Knight, R. UniFrac—an online tool for comparing microbial community diversity in a phylogenetic context. BMC Bioinformat. 7, 371 (2006).

    Article  CAS  Google Scholar 

  202. 202

    Gilmore, R. D., Cieplak, W., Policastro, P. F. & Hackstadt, T. The 120 kilodalton outer membrane (rOmpB) of Rickettsia rickettsii is encoded by an unusually long open reading frame: evidence for protein processing from a large precursor. Mol. Microbiol. 5, 2361–2370 (1991).

    Article  CAS  PubMed  Google Scholar 

  203. 203

    Paulson, J. N., Stine, O. C., Bravo, H. C. & Pop, M. Differential abundance analysis for microbial marker-gene surveys. Nature Methods 10, 1200–1202 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  204. 204

    Angly, F. E. et al. CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction. Microbiome 2, 11 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  205. 205

    Beck, D., Settles, M. & Foster, J. A. OTUbase: an R infrastructure package for operational taxonomic unit data. Bioinformatics 27, 1700–1701 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  206. 206

    Seguritan, V. & Rohwer, F. FastGroup: a program to dereplicate libraries of 16S rDNA sequences. BMC Bioinformat. 2, 9 (2001).

    Article  CAS  Google Scholar 

  207. 207

    Giongo, A. et al. PANGEA: pipeline for analysis of next generation amplicons. ISME J. 4, 852–861 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  208. 208

    Kumar, S. et al. CLOTU: an online pipeline for processing and clustering of 454 amplicon reads into OTUs followed by taxonomic annotation. BMC Bioinformat. 12, 182 (2011).

    Article  Google Scholar 

  209. 209

    Nebel, M. E. et al. JAGUC—a software package for environmental diversity analyses. J. Bioinformat. Computat. Biol. 9, 749–773 (2011).

    Article  Google Scholar 

  210. 210

    Albanese, D., Fontana, P., De Filippo, C., Cavalieri, D. & Donati, C. MICCA: a complete and accurate software for taxonomic profiling of metagenomic data. Scientif. Rep. 5, 9743 (2015).

    Article  CAS  Google Scholar 

  211. 211

    Weisman, D., Yasuda, M. & Bowen, J. L. FunFrame: functional gene ecological analysis pipeline. Bioinformatics 29, 1212–1214 (2013).

    Article  CAS  PubMed  Google Scholar 

Download references

Acknowledgements

This work was supported by Science Foundation Ireland through a Centre Award to the APC Microbiome Institute (SFI/12/RC/2273).

Author information

Contributions

All authors contributed equally to this work.

Corresponding authors

Correspondence to Marcus J. Claesson or Paul W. O'Toole.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Microbiome

The collection of microbial genomes at a given site.

Biomarkers

Measurable indicators of disease, pharmacological response or normal biological function.

Bioinformatics

The use of computer science, statistics and mathematics to analyse and interpret biological processes and molecular components.

Phylogenetics

Evolutionary relationships between organisms, genes or proteins.

Metagenome

The collective microbial genomes and genes in an environment or sample.

Shotgun sequencing

All extracted DNA is randomly sheared into fragments of the desired size for high-throughput sequencing, as opposed to targeting a specific marker gene.

Amplicon

A target gene or sequence that is amplified naturally or artificially.

Copy number

The number of copies of a particular section of DNA; some organisms have multiple copies of a targeted gene.
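
As a rough illustration of why copy number matters for community profiles, the sketch below divides per-taxon read counts by an assumed marker-gene copy number before converting to relative abundance. The taxon names, counts and copy numbers are hypothetical, and the function is not taken from any published tool.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Return copy-number-corrected relative abundances per taxon.

    read_counts: dict mapping taxon -> marker-gene read count
    copy_numbers: dict mapping taxon -> marker-gene copies per genome
    """
    # Dividing by copy number approximates the number of genomes sampled.
    corrected = {
        taxon: count / copy_numbers[taxon]
        for taxon, count in read_counts.items()
    }
    total = sum(corrected.values())
    if total == 0:
        return {taxon: 0.0 for taxon in corrected}
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical example: taxon_a carries seven copies of the gene, taxon_b one.
counts = {"taxon_a": 70, "taxon_b": 10}
copies = {"taxon_a": 7, "taxon_b": 1}
print(correct_for_copy_number(counts, copies))
```

Without the correction, taxon_a would appear seven times more abundant than taxon_b purely because of its additional gene copies.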

Taxa

Groups of phylogenetically related organisms.

16S ribosomal RNA gene

A gene encoding the RNA component of the 30S subunit of the prokaryotic ribosome; it contains nine variable regions that can be targeted for amplification and used for microbial taxonomic profiling of a sample.

18S ribosomal RNA gene

A gene encoding the RNA component of the 40S ribosomal subunit found in eukaryotic cells, targeted in the analysis of fungal communities.

Alpha diversity

Microbiota diversity within an individual site or sample; one value per sample.
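
To make the "one value per sample" idea concrete, the following sketch computes two widely used alpha diversity measures, observed richness and the Shannon index, from a vector of taxon counts. The choice of these two metrics and the example counts are assumptions made for illustration; the definition above does not prescribe a particular index.

```python
import math


def observed_richness(counts):
    """Number of taxa detected (count greater than zero) in one sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical counts for five taxa in a single sample.
sample = [40, 30, 20, 10, 0]
print(observed_richness(sample), shannon_index(sample))
```

Richness counts only presence, whereas the Shannon index also reflects how evenly the reads are spread across taxa.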

Beta diversity

Intervariability; the diversity between separate samples.
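
One common way to quantify diversity between two samples is the Bray-Curtis dissimilarity, sketched below. The metric choice and the count vectors are assumptions for illustration; other measures, such as UniFrac, also use phylogenetic information.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two count vectors of equal length.

    0 means identical composition; 1 means no shared taxa.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list counts for the same taxa")
    difference = sum(abs(x - y) for x, y in zip(sample_a, sample_b))
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        return 0.0
    return difference / total


# Hypothetical counts for the same three taxa in two samples.
print(bray_curtis([6, 4, 0], [4, 6, 0]))
```

In a study with many samples, the function would be applied to every pair of samples to build a distance matrix for ordination or clustering.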

PCR bias

Unequal amplification across DNA sequences, which leads to a skewed distribution of PCR products.

Metatranscriptomics

The study of RNA copies of the collective microbial genes in a community or sample.

Assembly

The process in which short DNA fragments are aligned and merged to form longer DNA fragments.

Contigs

Contiguous DNA sequences assembled from shorter, overlapping sequencing reads.

Annotation

Assigning functions or functional categories to genes or proteins.

PHRED quality scores

A measure of the quality of base calling in a sequenced strand of DNA.
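
A PHRED score Q relates to the estimated probability P that a base call is wrong through Q = -10 log10(P). The sketch below converts scores to error probabilities and decodes a FASTQ quality string, assuming the Phred+33 ASCII encoding; the function names and the example string are illustrative.

```python
def phred_to_error_probability(q):
    """Estimated probability that a base call with PHRED score q is wrong."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string):
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


# Hypothetical quality string for a three-base read.
scores = decode_phred33("I5!")
print(scores, [phred_to_error_probability(q) for q in scores])
```

A score of 20 therefore corresponds to an estimated 1 in 100 chance of an incorrect base, and a score of 30 to 1 in 1,000.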

de Bruijn graphs

Graphs that consist of nodes (k-mers) and edges (overlaps between k-mers); following the overlaps through the graph yields an assembled sequence.
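
The construction can be sketched in a few lines, following the definition given here: each k-mer in a read is a node, and an edge joins two k-mers that are adjacent in a read and so overlap by k-1 bases. This is a toy illustration with a hypothetical read, not the data structure of any particular assembler.

```python
from collections import defaultdict


def kmers(sequence, k):
    """All substrings of length k, in order of appearance."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def de_bruijn_graph(reads, k):
    """Map each k-mer to the set of k-mers that follow it in any read."""
    graph = defaultdict(set)
    for read in reads:
        read_kmers = kmers(read, k)
        # Consecutive k-mers in a read overlap by k-1 bases.
        for left, right in zip(read_kmers, read_kmers[1:]):
            graph[left].add(right)
    return dict(graph)


# Hypothetical read; real assemblers also handle errors and reverse complements.
print(de_bruijn_graph(["ATGGC"], 3))
```

Walking a path through such a graph and appending the last base of each successive k-mer reconstructs the underlying sequence.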

Scaffolds

The product of aligning and merging contigs to form longer continuous DNA sequences.

Binning

Grouping DNA sequences based on particular attributes such as GC content or similarity with other genes.

k-mer

Short DNA sequence with fixed length k.

Homology

Shared ancestry or degree of relationship between sequences or genes.

Gene calling

Identifying coding regions in a sequence of DNA.

Orthologues

Genes in different species derived from a common ancestral gene following speciation, which usually retain the same function.

Pipelines

A series of tools or scripts optimized for the analysis of a dataset, in which the outputs of one step are the inputs for the next step.

Barcode sequence

A short series of DNA bases attached to sequence reads, each unique to a sample to enable differentiation after sequencing.
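
The sketch below shows how barcodes let pooled reads be sorted back into their samples (demultiplexing). It assumes the barcode sits at the start of each read and must match exactly; the barcodes, sample names and reads are hypothetical.

```python
def demultiplex(reads, barcode_to_sample, barcode_length):
    """Assign reads to samples by their leading barcode.

    Returns (samples, unassigned), where samples maps each sample name to
    its reads with the barcode trimmed off.
    """
    samples = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)  # unknown or mis-read barcode
        else:
            samples[sample].append(insert)
    return samples, unassigned


# Hypothetical 4-base barcodes for two samples.
barcodes = {"ACGT": "patient_1", "TGCA": "patient_2"}
print(demultiplex(["ACGTGGGG", "TGCACCCC", "NNNNAAAA"], barcodes, 4))
```

Production tools typically tolerate one or more mismatches in the barcode, which this exact-match sketch does not.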

Operational taxonomic units

A collection (cluster) of sequences that are often at least 97% similar to each other and used to classify closely related individuals.
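
A simplified view of how sequences might be grouped at the 97% threshold is sketched below using greedy centroid clustering. It assumes pre-aligned sequences of equal length and measures identity position by position; real clustering tools align sequences and use more refined heuristics, so this is only an illustration with hypothetical input.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


def greedy_otu_clustering(sequences, threshold=0.97):
    """Assign each sequence to the first cluster whose centroid it matches."""
    centroids = []
    clusters = []
    for seq in sequences:
        for index, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[index].append(seq)
                break
        else:
            # No centroid is similar enough: this sequence seeds a new OTU.
            centroids.append(seq)
            clusters.append([seq])
    return clusters


# Hypothetical 100-base sequences: the second is 98% identical to the first.
first = "A" * 100
second = "A" * 98 + "CC"
third = "A" * 90 + "C" * 10
print(len(greedy_otu_clustering([first, second, third])))
```

Because the first sequence seen seeds each cluster, the result of a greedy approach can depend on input order.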

Reference database

A collection of known information (for example, gene sequences or functions) constructed in a format for querying or similarity-based searches.

Chimeric sequences

Artefacts from the PCR process in which an amplified sequence is composed of DNA from two or more parents.

Cite this article

Claesson, M., Clooney, A. & O'Toole, P. A clinician's guide to microbiome analysis. Nat Rev Gastroenterol Hepatol 14, 585–595 (2017). https://doi.org/10.1038/nrgastro.2017.97
