Best practices for analysing microbiomes


Complex microbial communities shape the dynamics of various environments, ranging from the mammalian gastrointestinal tract to the soil. Advances in DNA sequencing technologies and data analysis have substantially improved microbiome analyses over earlier methods, for example in taxonomic resolution and false discovery rate control. In this Review, we discuss the best practices for performing a microbiome study, including experimental design, choice of molecular analysis technology, methods for data analysis and the integration of multiple omics data sets. We focus on three areas in which advances have been particularly rapid: recent findings suggesting that operational taxonomic unit-based analyses should be replaced with new methods based on exact sequence variants; methods for integrating metagenomic and metabolomic data; and issues surrounding compositional data analysis. Although some of these approaches are new, it is important to keep sight of the classic issues that arise during experimental design and that relate to research reproducibility. We describe how keeping these issues in mind allows researchers to obtain more insight from their microbiome data sets.
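As a rough illustration of why compositional data analysis needs care, the sketch below compares relative abundances with a centred log-ratio (CLR) transform on invented counts. It is not taken from the Review: the function names, the pseudocount of 0.5 and the example counts are all assumptions made for illustration.

```python
import math


def relative_abundance(counts):
    """Convert raw counts for one sample into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of one sample.

    The pseudocount (an assumed value) avoids log(0) for unobserved taxa.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Hypothetical counts for taxa A, B and C. Only taxon A changes in
# absolute terms between the two samples.
before = [100, 100, 100]
after = [400, 100, 100]

# The proportions of B and C fall even though their counts did not change.
print("relative abundance before:", relative_abundance(before))
print("relative abundance after: ", relative_abundance(after))

# The log-ratio between B and C is the same in both samples.
clr_before = clr(before)
clr_after = clr(after)
print("CLR(B) - CLR(C) before:", clr_before[1] - clr_before[2])
print("CLR(B) - CLR(C) after: ", clr_after[1] - clr_after[2])
```

In this toy case the proportions of taxa B and C change purely because taxon A grew, whereas the log-ratio between B and C does not. This is the kind of artefact that ratio-based methods are intended to avoid.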


Fig. 1: Experimental design considerations for microbiome experiments.
Fig. 2: Best workflow for 16S ribosomal RNA, metagenomic and metatranscriptomic sequencing.
Fig. 3: Integrating omics data with microbiome data.


  1. 1.

    Meisel, J. S., Hannigan, G. D. & Tyldsley, A. S. Skin microbiome surveys are strongly influenced by experimental design. J. Invest. Dermatol. 136, 947–956 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  2. 2.

    Falony, G. et al. Population-level analysis of gut microbiome variation. Science 29, 560–564 (2016).

    Google Scholar 

  3. 3.

    Noguera-Julian, M. et al. Gut microbiota linked to sexual preference and HIV infection. EBioMedicine. 5, 135–146 (2016).

    PubMed  PubMed Central  Google Scholar 

  4. 4.

    Wu, Gary, D. et al. Linking long-term dietary patterns with gut microbial enterotypes. Science 334, 105–108 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  5. 5.

    Forslund, K. et al. Disentangling the effects of type 2 diabetes and metformin on the human gut microbiota. Nature 528, 262–266 (2015). This study is an excellent example of how study design and metadata collection can influence experimental results.

    CAS  PubMed  PubMed Central  Google Scholar 

  6. 6.

    Jackson, M. A. et al. Proton pump inhibitors alter the composition of the gut microbiota. Gut 65, 749–756 (2016).

    PubMed  Google Scholar 

  7. 7.

    Halfvarson, J. Dynamics of the human gut microbiome in inflammatory bowel disease. Nat. Microbiol. 2, 17004 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  8. 8.

    Kelly, B. J. et al. Power and sample-size estimation for microbiome studies using pairwise distances and PERMANOVA. Bioinformatics 31, 2461–2468 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  9. 9.

    Debelius, J., Song, S. J., Vazquez-Baeza, Y., Xu, Z. Z., Gonzalez, A. & Knight, R. Tiny microbes, enormous impacts: what matters in gut microbiome studies? Genome Biol. 17, 217 (2016).

    PubMed  PubMed Central  Google Scholar 

  10. 10.

    La Rosa, P. S. et al. Hypothesis testing and power calculations for taxonomic-based human microbiome data. PLoS ONE. 7, e52078 (2012).

    PubMed  PubMed Central  Google Scholar 

  11. 11.

    Knights, D., Costello, E. K. & Knight, R. Supervised classification of human microbiota. FEMS Microbiol. Rev. 35, 343–359 (2011).

    CAS  PubMed  Google Scholar 

  12. 12.

    Dethlefsen, L. & Relman, D. A. Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. Proc. Natl Acad. Sci. USA 108, 4554–4561 (2011).

    CAS  PubMed  Google Scholar 

  13. 13.

    Fierer, N., Hamady, M., Lauber, C. L. & Knight, R. The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc. Natl Acad. Sci. USA 105, 17994–17999 (2008).

    CAS  PubMed  Google Scholar 

  14. 14.

    Costello, E. K. et al. Bacterial community variation in human body habitats across space and time. Science 326, 1694–1697 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  15. 15.

    The Human Microbiome Project Consortium. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012). This study was the first large-scale effort to characterize the healthy human microbiota and commonly used reference database.

    PubMed Central  Google Scholar 

  16. 16.

    McDonald, D., Birmingham, A. & Knight, R. Context and the human microbiome. Microbiome 3, 52 (2015).

    PubMed  PubMed Central  Google Scholar 

  17. 17.

    Ramette, A. Multivariate analyses in microbial ecology. FEMS Microbiol. Ecol. 62, 142–160 (2007).

    CAS  PubMed  PubMed Central  Google Scholar 

  18. 18.

    Kostic, A. D., Howitt, M. R. & Garrett, W. S. Exploring host-microbiota interactions in animal models and humans. Genes Dev. 27, 701–718 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  19. 19.

    Ridaura, V. K. et al. Cultured gut microbiota from twins discordant for obesity modulate adiposity and metabolic phenotypes in mice. Science 341, 6150 (2013).

    Google Scholar 

  20. 20.

    Reber, S. O. et al. Immunization with a heat-killed preparation of the environmental bacterium Mycobacterium Vaccae promotes stress resilience in mice. Proc. Natl Acad. Sci. USA 113, E3130–E3139 (2016).

    CAS  PubMed  Google Scholar 

  21. 21.

    Ley, R. E. et al. Obesity alters gut microbial ecology. Proc. Natl Acad. Sci. USA 102, 11070–11075 (2005).

    CAS  PubMed  Google Scholar 

  22. 22.

    Friswell, M. K. et al. Site and strain-specific variation in gut microbiota profiles and metabolism in experimental mice. PLoS ONE. 5, e8584 (2010).

    PubMed  PubMed Central  Google Scholar 

  23. 23.

    Snijders, A. M. et al. Influence of early life exposure, host genetics and diet on the mouse gut microbiome and metabolome. Nat. Microbiol. 2, 16221 (2016).

    CAS  PubMed  Google Scholar 

  24. 24.

    Stagaman, K., Burns, A. R., Guillemin, K. & Bohannan, B. J. The role of adaptive immunity as an ecological filter on the gut microbiota in zebrafish. ISME J. 11, 1630–1639 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  25. 25.

    Sinha, R. et al. Assessment of variation in microbial community amplicon sequencing by the Microbiome Quality Control (MBQC) project consortium. Nat. Biotechnol. 35, 1077–1086 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  26. 26.

    Costea, P. I. et al. Towards standards for human fecal sample processing in metagenomic studies. Nat. Biotechnol. 35, 1069–1076 (2017).

    CAS  PubMed  Google Scholar 

  27. 27.

    Salter, S. J. et al. Reagent and laboratory contamination can. critically impact sequence-based microbiome analyses. BMC Biol. 12, 87 (2014).

    PubMed  PubMed Central  Google Scholar 

  28. 28.

    Amir, A. et al. Correcting for microbial blooms in fecal samples during room-temperature shipping. mSystems 2, e00199–00116 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  29. 29.

    Fouhy, F. et al. The effects of freezing on faecal microbiota as determined using MiSeq sequencing and culture-based investigations. PLoS ONE. 10, e0119355 (2015).

    PubMed  PubMed Central  Google Scholar 

  30. 30.

    Song, S. J. et al. Preservation methods differ in fecal microbiome stability, affecting suitability for field studies. mSystems 1, e00021–00016 (2016).

    PubMed  PubMed Central  Google Scholar 

  31. 31.

    Jumpstart Consortium Human Microbiome Project Data Generation Working Group. Evaluation of 16S rDNA-based community profiling for human microbiome research. PLoS ONE. 7, e39315 (2012).

    PubMed Central  Google Scholar 

  32. 32.

    Chase, J. et al. Geography and location are the primary drivers of office microbiome composition. mSystems 1, e00022–00016 (2016).

    PubMed  PubMed Central  Google Scholar 

  33. 33.

    Walker, A. W. et al. 16S rRNA gene-based profiling of the human infant gut microbiota is strongly influenced by sample processing and PCR primer choice. Microbiome 3, 26 (2015).

    PubMed  PubMed Central  Google Scholar 

  34. 34.

    Bonnet, R., Suau, A., Doré, J., Gibson, G. R. & Collins, M. D. Differences in rDNA libraries of faecal bacteria derived from 10- and 25-cycle PCRs. Int. J. Syst. Evol. Microbiol. 52, 757–763 (2002).

    CAS  PubMed  Google Scholar 

  35. 35.

    Salter, S. J. et al. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol. 12, 87 (2014).

    PubMed  PubMed Central  Google Scholar 

  36. 36.

    Walters, W. A. et al. PrimerProspector: de novo design and taxonomic analysis of barcoded polymerase chain reaction primers. Bioinformatics 27, 1159–1161 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  37. 37.

    Zaneveld, J. R., Lozupone, C., Gordon, J. I. & Knight, R. Ribosomal RNA diversity predicts genome diversity in gut bacteria and their relatives. Nucleic Acids Res. 38, 3869–3879 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  38. 38.

    Okuda, S., Tsuchiya, Y., Kiriyama, C., Itoh, M. & Morisaki, H. Virtual metagenome reconstruction from 16S rRNA gene sequences. Nat. Commun. 3, 1203 (2012).

    PubMed  Google Scholar 

  39. 39.

    Langille, M. G. et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat. Biotechnol. 31, 814–821 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  40. 40.

    Aßhauer, K. P., Wemheuer, B., Daniel, R. & Meinicke, P. Tax4Fun: predicting functional profiles from metagenomic 16S rRNA data. Bioinformatics 31, 2882–2884 (2015).

    PubMed  PubMed Central  Google Scholar 

  41. 41.

    Jun, S. R., Robeson, M. S., Hauser, L. J., Schadt, C. W. & Gorin, A. A. PanFP: pangenome-based functional profiles for microbial communities. BMC Res. Notes 8, 479 (2015).

    PubMed  PubMed Central  Google Scholar 

  42. 42.

    Scholz, M. et al. Strain-level microbial epidemiology and population genomics from shotgun metagenomics. Nat. Methods. 13, 435–438 (2016).

    CAS  PubMed  Google Scholar 

  43. 43.

    Mukherjee, S. et al. 1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life. Nat. Biotechnol. 35, 676–683 (2016).

    Google Scholar 

  44. 44.

    Abubucker, Sahar et al. Metabolic reconstruction for metagenomic data and its application to the human microbiome. PLoS Comput. Biol. 8, e1002358 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  45. 45.

    Quince, C., Walker, A. W. & Simpson, J. T. Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol. 35, 833–844 (2017). This is a comprehensive review on using shotgun metagenomics.

    CAS  PubMed  Google Scholar 

  46. 46.

    Carini, P. et al. Relic DNA is abundant in soil and obscures estimates of soil microbial diversity. Nat. Microbiol. 2, 16242 (2016).

    PubMed  Google Scholar 

  47. 47.

    Emerson, J. B. et al. Schrödinger’s microbes: tools for distinguishing the living from the dead in microbial ecosystems. Microbiome 5, 86 (2017).

    PubMed  PubMed Central  Google Scholar 

  48. 48.

    Giannoukos, G. et al. Efficient and robust RNA-Seq process for cultured bacteria and complex community transcriptomes. Genome Biol. 13, 3 (2012).

    Google Scholar 

  49. 49.

    Wang, Y., Hayatsu, M. & Fujii, T. Extraction of bacterial RNA from soil: challenges and solutions. Microbes Environ. 27, 111–121 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  50. 50.

    Tveit, A. T., Urich, T. & Svenning, M. M. Metatranscriptomic analysis of arctic peat soil microbiota. Appl. Environ. Microbiol. 80, 5761–5772 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  51. 51.

    Franzosa, E. A. et al. Relating the metatranscriptome and metagenome of the human gut. Proc. Natl Acad. Sci. USA 111, E2329–E2338 (2014).

    CAS  PubMed  Google Scholar 

  52. 52.

    Maurice, C. F., Haiser, H. J. & Turnbaugh, P. J. Xenobiotics shape the physiology and gene expression of the active human gut microbiome. Cell 152, 39–50 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  53. 53.

    Bashiardes, S., Zilberman-Schapira, G. & Elinav, E. Use of metatranscriptomics in microbiome research. Bioinform. Biol. Insights. 10, 19–25 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  54. 54.

    Soergel, D. A. W., Dey, N., Knight, R. & Brenner, S. E. Selection of primers for optimal taxonomic classification of environmental 16S rRNA gene sequences. ISME J. 6, 1440–1444 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  55. 55.

    Thompson, L. et al. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551, 457–453 (2017). This study develops and implements standardized protocols and new analytical methods that enabled a massive comparison of over 100 studies to characterize the microbial diversity on Earth.

    CAS  PubMed  PubMed Central  Google Scholar 

  56. 56.

    Glenn, T. C. Field guide to next-generation DNA sequencers. Mol. Ecol. Resour. 11, 759–769 (2011).

    CAS  PubMed  Google Scholar 

  57. 57.

    Kunin, V., Engelbrektson, A., Ochman, H. & Hugenholtz, P. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ. Microbiol. 12, 118–123 (2010).

    CAS  PubMed  Google Scholar 

  58. 58.

    Reeder, J. & Knight, R. The ‘rare biosphere’: a reality check. Nat. Methods. 6, 636–637 (2009).

    CAS  PubMed  Google Scholar 

  59. 59.

    Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods. 7, 335–336 (2010). This is a widely used software package for microbiome analysis.

    CAS  PubMed  PubMed Central  Google Scholar 

  60. 60.

    Schloss, P. D. et al. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541 (2009).This is a widely used software package for microbiome analysis.

    CAS  PubMed  PubMed Central  Google Scholar 

  61. 61.

    Callahan, B. J., McMurdie, P. J. & Holmes, S. P. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J. 11, 2639–2643 (2017).

    PubMed  PubMed Central  Google Scholar 

  62. 62.

    Eren, A. M. et al. Oligotyping: differentiating between closely related microbial taxa using 16S rRNA gene data. Methods Ecol. Evol. 4, 1111–1119 (2013).

    PubMed Central  Google Scholar 

  63. 63.

    Amir, A. et al. Deblur rapidly resolves single-nucleotide community sequence patterns. mSystems. 2, e00191–e00116 (2017).

    PubMed  PubMed Central  Google Scholar 

  64. 64.

    Callahan, B. J. et al. DADA2: high resolution sample inference from Illumina amplicon data. Nat. Methods. 13, 581–583 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  65. 65.

    Lozupone, C. A. et al. “Meta-analyses of studies of the human microbiota”. Genome Res. 23, 1704–1714 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  66. 66.

    Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naïve bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73, 5261–5267 (2007).

    CAS  PubMed  PubMed Central  Google Scholar 

  67. 67.

    McDonald, D. et al. An improved greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J. 6, 610–618 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  68. 68.

    Kuczynski, J. et al. Microbial community resemblance methods differ in their ability to detect biologically relevant patterns. Nat. Methods. 7, 813–819 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  69. 69.

    Olm, M. R. et al. The source and evolutionary history of a microbial contaminant identified through soil metagenomic analysis. MBio. 8, e01969–16 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  70. 70.

    Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).

    PubMed  PubMed Central  Google Scholar 

  71. 71.

    Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods. 4, 357–359 (2012).

    Google Scholar 

  72. 72.

    Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  73. 73.

    McIntyre, A. B. R. et al. Comprehensive benchmarking and ensemble approaches for metagenomic classifiers. Genome Biol. 18, 182 (2017).

    PubMed  PubMed Central  Google Scholar 

  74. 74.

    Truong, D. T. et al. MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods. 12, 902–903 (2015).

    CAS  PubMed  Google Scholar 

  75. 75.

    Nguyen, N., Mirarab, S., Liu, B., Pop, M. & Warnow, T. TIPP: taxonomic identification and phylogenetic profiling. Bioinformatics 30, 3548–3555 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  76. 76.

    Huson, D. H. et al. MEGAN community edition - interactive exploration and analysis of large-scale microbiome sequencing data. PLoS Comput. Biol 12, e1004957 (2016).

    PubMed  PubMed Central  Google Scholar 

  77. 77.

    O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

    PubMed  Google Scholar 

  78. 78.

    Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).

    CAS  PubMed  Google Scholar 

  79. 79.

    Suzek, B. E., Wang, Y., Huang, H., McGarvey, P. B. & Wu, C. H. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).

    CAS  PubMed  Google Scholar 

  80. 80.

    Markowitz, V. M. et al. IMG: The integrated microbial genomes database and comparative analysis system. Nucleic Acids Res. 40, D115–D122 (2012).

    CAS  PubMed  Google Scholar 

  81. 81.

    Arndt, D. et al. PHASTER: a better, faster version of the PHAST phage search tool. Nucleic Acids Res. 44, 1–6 (2016).

    Google Scholar 

  82. 82.

    Gibson, M. K., Forsberg, K. J. & Dantas, G. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J. 9, 1–10 (2014).

    Google Scholar 

  83. 83.

    Prestat, E. et al. FOAM (Functional Ontology Assignments for Metagenomes): a Hidden Markov Model (HMM) database with environmental focus. Nucleic Acids Res. 42, e145 (2014).

    PubMed  PubMed Central  Google Scholar 

  84. 84.

    Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348, 6237 (2015).

    Google Scholar 

  85. 85.

    Xiao, L. et al. A catalog of the mouse gut metagenome. Nat. Biotechnol. 33, 1103–1108 (2015).

    CAS  PubMed  Google Scholar 

  86. 86.

    Qin, J. et al. A human gut microbial gene catalog established by metagenomic sequencing. Nature 464, 59–65 (2010). This study is the first large-scale effort to catalogue microbial genomes in the human gut using shotgun metagenomic sequencing.

    CAS  PubMed  PubMed Central  Google Scholar 

  87. 87.

    Medema, M. H. et al. AntiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 39, W339–W346 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  88. 88.

    Howe, A. C. et al. Tackling soil diversity with the assembly of large, complex metagenomes. Proc. Natl Acad. Sci. USA 111, 4904–4909 (2014).

    CAS  PubMed  Google Scholar 

  89. 89.

    Ye, Y. & Tang, H. Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis. Bioinformatics 32, 1001–1008 (2016).

    CAS  PubMed  Google Scholar 

  90. 90.

    Narayanasamy, S. et al. IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses. Genome Biol. 17, 260 (2016).

    PubMed  PubMed Central  Google Scholar 

  91. 91.

    Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  92. 92.

    Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  93. 93.

    Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2014).

    Google Scholar 

  94. 94.

    Vollmers, J., Wiegand, S. & Kaster, A. K. Comparing and evaluating metagenome assembly tools from a microbiologist’s perspective - not only size matters! PLoS ONE 12, e0169662 (2017).

    PubMed  PubMed Central  Google Scholar 

  95. 95.

    Wu, Y. W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2015).

    PubMed  Google Scholar 

  96. 96.

    Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods. 11, 1144–1146 (2014).

    CAS  PubMed  Google Scholar 

  97. 97.

    Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  98. 98.

    Laczny, C. C. et al. VizBin - an application for reference-independent visualization and human-augmented binning of metagenomic data. Microbiome 3, 1 (2015).

    PubMed  PubMed Central  Google Scholar 

  99. 99.

    Eren, A. M. et al. Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ 3, e1319 (2015).

    PubMed  PubMed Central  Google Scholar 

  100. 100.

    White Iii, R. A. et al. ATLAS (Automatic Tool for Local Assembly Structures) -a comprehensive infrastructure for assembly, annotation, and genomic binning of metagenomic and metatranscriptomic data. PeerJ (2017).

    Article  Google Scholar 

  101. 101.

    Treangen, T. J. et al. MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome Biol. 14, R2 (2013).

    PubMed  PubMed Central  Google Scholar 

  102. 102.

    Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).

    CAS  Google Scholar 

  103. 103.

    Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  104. 104.

    Sczyrba, A. et al. Critical assessment of metagenome interpretation–a benchmark of computational metagenomics software. Nat. Methods 14, 1063–1071 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  105. 105.

    Barwell, L. J., Isaac, N. J. B. & Kunin, W. E. Measuring ß-diversity with species abundance data. J. Anim. Ecol. 84, 1112–1122 (2015).

    PubMed  PubMed Central  Google Scholar 

  106. 106.

    Hamady, M., Lozupone, C. & Knight, R. Fast UniFrac: facilitating high-throughput phylogenetic analyses of microbial communities including analysis of pyrosequencing and PhyloChip data. ISME J. 4, 17–27 (2010). This study underscores the power of incorporating phylogenetic information when comparing microbial communities.

    CAS  PubMed  Google Scholar 

  107. 107.

    Dixon, P. VEGAN, a package of R functions for community ecology. J. Veg Sci. 14, 927–930 (2003).

    Google Scholar 

  108. 108.

    Anderson, M. J. & Walsh, D. C. I. What null hypothesis are you testing? PERMANOVA, ANOSIM and the Mantel test in the face of heterogeneous dispersions. Ecol. Monogr. 83, 557–574 (2013).

    Google Scholar 

  109. 109.

    Weiss, S. et al. Normalization and microbial differential abundance strategies depend upon data characteristics. Microbiome 5, 27 (2017).

    PubMed  PubMed Central  Google Scholar 

  110. 110.

    McMurdie, P. J. & Holmes, S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput. Biol. 10, e1003531 (2014).

    PubMed  PubMed Central  Google Scholar 

  111. 111.

    Vázquez-Baeza, Y., Pirrung, M., Gonzalez, A. & Knight, R. EMPeror: a tool for visualizing high-throughput microbial community data. GigaScience 2, 16 (2013).

    PubMed  PubMed Central  Google Scholar 

  112. 112.

    Aitchison, J. The statistical analysis of compositional data. J. R. Stat. Soc. Series B. Stat. Methodol. 44, 139–177 (1987).

    Google Scholar 

  113. 113.

    Mandal, S. et al. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb. Ecol. Health Dis. 26, 27663 (2015).

    PubMed  Google Scholar 

  114. 114.

    Weiss, S. et al. Correlation detection strategies in microbial data sets vary widely in sensitivity and precision. ISME J. 10, 1–13 (2016).

    CAS  Google Scholar 

  115. 115.

    Lovell, D., Pawlowsky-Glahn, V., Egozcue, J. J., Marguerat, S. & Bähler, J. Proportionality: a valid alternative to correlation for relative data. PLoS Comput. Biol. 11, e1004075 (2015).

    PubMed  PubMed Central  Google Scholar 

  116. 116.

    Friedman, J. & Alm, E. J. Inferring correlation networks from genomic survey data. PLoS Comput. Biol. 8, e1002687 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  117. 117.

    Kurtz, Z. D. et al. Sparse and compositionally robust inference of microbial ecological networks. PLoS Comput. Biol. 11, e1004226 (2015).

    PubMed  PubMed Central  Google Scholar 

  118. 118.

    Schwager, E., Mallick, H., Ventz, S. & Huttenhower, C. A. Bayesian method for detecting pairwise associations in compositional data. PLoS Comput. Biol. 13, e1005852 (2017).

    PubMed  PubMed Central  Google Scholar 

  119. 119.

    Washburne, A. D. et al. Phylogenetic factorization of compositional data yields lineage-level associations in microbiome datasets. PeerJ 5, e2969 (2017).

    PubMed  PubMed Central  Google Scholar 

  120. 120.

    Silverman, J. D., Washburne, A. D., Mukherjee, S. & David, L. A. A phylogenetic transform enhances analysis of compositional microbiota data. eLife 6, e21887 (2017).

    PubMed  PubMed Central  Google Scholar 

  121. 121.

    Morton, J. T. et al. Balance trees reveal microbial niche differentiation. mSystems 2, e00162–00116 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  122. 122.

    Vandeputte, D. et al. Quantitative microbiome profiling links gut community variation to microbial load. Nature 551, 507–511 (2017).

    CAS  Google Scholar 

  123. 123.

    Kleyer, H., Tecon, R. & Or, D. Resolving species level changes in a representative soil bacterial community using microfluidic quantitative. Front. Microbiol. 8, 2017 (2017).

    PubMed  PubMed Central  Google Scholar 

  124. 124.

    Knights, D., Parfrey, L. W., Zaneveld, J., Lozupone, C. & Knight, R. Human-associated microbial signatures: examining their predictive value. Cell Host Microbe. 10, 292–296 (2011).

    CAS  PubMed  Google Scholar 

  125. 125.

    Yazdani, M. et al. Using machine learning to identify major shifts in human gut microbiome protein family abundance in disease. IEEE (2016).

    Article  Google Scholar 

  126. 126.

    Huang, S. et al. Predictive modeling of gingivitis severity and susceptibility via oral microbiota. ISME J. 8, 1768–1780 (2014).

    PubMed  PubMed Central  Google Scholar 

  127. 127.

    Teng, F. et al. Prediction of early childhood caries via spatial-temporal variations of oral microbiota. Cell Host Microbe. 18, 296–306 (2015).

    CAS  PubMed  Google Scholar 

  128. 128.

    Metcalf, J. L. et al. Microbial community assembly and metabolic function during mammalian corpse decomposition. Science 351, 158–162 (2016).

    CAS  PubMed  Google Scholar 

  129. 129.

    Subramanian, S. et al. Persistent gut microbiota immaturity in malnourished Bangladeshi children. Nature 510, 417–421 (2014). This study demonstrates the power of machine learning with microbiome data by developing a microbiota maturity index.

    CAS  PubMed  PubMed Central  Google Scholar 

  130. 130.

    Knights, D. et al. Bayesian community-wide culture-independent microbial source tracking. Nat. Methods. 8, 761–763 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  131. 131.

    Lax, S. et al. Longitudinal analysis of microbial interaction between humans and the indoor environment. Science 345, 1048–1052 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  132. 132.

    Roume, H. et al. A biomolecular isolation framework for eco-systems biology. ISME J. 7, 110–121 (2013).

    CAS  PubMed  Google Scholar 

  133. 133.

    Nicholson, J. K. & Lindon, J. C. Systems biology: metabonomics. Nature 455, 1054–1056 (2008).

    CAS  PubMed  Google Scholar 

  134. 134.

    Wang, R. & Seyedsayamdost, M. R. Hijacking exogenous signals to generate new secondary metabolites during symbiotic interactions. Nat. Rev. Chem. 1, 21 (2017).

    Google Scholar 

  135. 135.

    Huan, T. et al. Systems biology guided by XCMS online metabolomics addressing reproducibility in single- laboratory phenotyping experiments. Nat. Methods 14, 461–462 (2017).

    CAS  PubMed  PubMed Central  Google Scholar 

  136. 136.

    Hurley, J. R. & Cattell, R. B. The procrustes program: producing direct rotation to test a hypothesized factor structure. Behav. Sci. 7, 258–262 (1962).

    Google Scholar 

  137. 137.

    Mantel, N. The detection of disease clustering and a generalized regression approach. Cancer Res. 27, 209–220 (1967).

    CAS  PubMed  Google Scholar 

  138. 138.

    Doledec, S. & Chessel, D. Co-inertia analysis: an alternative method for studying species-environment relationships. Freshwater Biol. 31, 277–294 (1994).

    Google Scholar 

  139. 139.

    Boulesteix, A. & Strimmer, K. Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Brief. Bioinform. 8, 32–44 (2007).

    CAS  PubMed  Google Scholar 

  140. 140.

    Witten, D. M. & Tibshirani, R. J. Extensions of sparse canonical correlation analysis with applications to genomic data. Stat. Appl. Genet. Mol. Biol. 8, 1–27 (2009).

    Google Scholar 

  141. 141.

    Wilms, I. & Croux, C. Robust sparse canonical correlation analysis. BMC Syst. Biol. 10, 72 (2016).

    PubMed  PubMed Central  Google Scholar 

  142. 142.

    Wang, M. et al. Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking. Nat. Biotechnol. 34, 828–837 (2016).

  143.

    Dhanasekaran, A. R., Pearson, J. L., Ganesan, B. & Weimer, B. C. Metabolome searcher: a high throughput tool for metabolite identification and metabolic pathway mapping directly from mass spectrometry and using genome restriction. BMC Bioinformatics. 16, 62 (2015).

  144.

    Protsyuk, I. et al. 3D molecular cartography using LC-MS combined with Optimus and ‘ili software. Nat. Protoc. 13, 134–154 (2018).

  145.

    McHardy, I. H. et al. Integrative analysis of the microbiome and metabolome of the human intestinal mucosal surface reveals exquisite inter-relationships. Microbiome 1, 17 (2013).

  146.

    Whiteson, K. L. et al. Breath gas metabolites and bacterial metagenomes from cystic fibrosis airways indicate active pH neutral 2,3-butanedione fermentation. ISME J. 8, 1247–1258 (2014).

  147.

    Theriot, C. M. et al. Antibiotic-induced shifts in the mouse gut microbiome and metabolome increase susceptibility to Clostridium difficile infection. Nat. Commun. 5, 3114 (2014). A great example of omics data integration (microbiome and metabolome data).

  148.

    Erickson, A. R. et al. Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn’s disease. PLoS ONE. 7, e49138 (2012).

  149.

    Hultman, J. et al. Multi-omics of permafrost, active layer and thermokarst bog soil microbiomes. Nature 521, 208–212 (2015).

  150.

    Jagtap, P. D. et al. Metaproteomic analysis using the galaxy framework. Proteomics 15, 3553–3565 (2015).

  151.

    Cheng, K. et al. MetaLab: an automated pipeline for metaproteomic data analysis. Microbiome 5, 157 (2017).

  152.

    Yilmaz, P. et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat. Biotechnol. 29, 415–420 (2011).

  153.

    Ríos-Covián, D. et al. Intestinal short chain fatty acids and their link with diet and human health. Front. Microbiol. 7, 185 (2016).

  154.

    Balskus, E. P. Colibactin: understanding an elusive gut bacterial genotoxin. Nat. Prod. Rep. 32, 1534–1540 (2015).

  155.

    Quinn, R. A. et al. Microbial, host and xenobiotic diversity in the cystic fibrosis sputum metabolome. ISME J. 95384, 1–16 (2015).

  156.

    Fang, H., Huang, C., Zhao, H. & Deng, M. CCLasso: correlation inference for compositional data through lasso. Bioinformatics 31, 3172–3180 (2015).

  157.

    Lê Cao, K. A., González, I. & Déjean, S. IntegrOmics: an R package to unravel relationships between two omics datasets. Bioinformatics 25, 2855–2856 (2009).

  158.

    Wikoff, W. R. et al. Metabolomics analysis reveals large effects of gut microflora on mammalian blood metabolites. Proc. Natl Acad. Sci. USA 106, 3698–3703 (2009).

  159.

    Liu, Z., Lozupone, C., Hamady, M., Bushman, F. D. & Knight, R. Short pyrosequencing reads suffice for accurate microbial community analysis. Nucleic Acids Res. 35, e120 (2007).

  160.

    The Integrative HMP (iHMP) Research Network Consortium. The integrative human microbiome project: dynamic analysis of microbiome-host omics profiles during periods of human health and disease. Cell Host Microbe 16, 276–289 (2014).

  161.

    Korem, T. et al. Growth dynamics of gut microbiota in health and disease inferred from single metagenomic samples. Science 349, 1101–1106 (2015).

  162.

    Sangwan, N., Xia, F. & Gilbert, J. A. Recovering complete and draft population genomes from metagenome datasets. Microbiome 4, 8 (2016).

  163.

    Bikel, S. et al. Combining metagenomics, metatranscriptomics and viromics to explore novel microbial interactions: towards a systems-level understanding of human microbiome. Comput. Struct. Biotechnol. J. 13, 390–401 (2015).

  164.

    Sultan, M. et al. Influence of RNA extraction methods and library selection schemes on RNA-seq data. BMC Genomics. 15, 675 (2014).

  165.

    Peano, C. et al. An efficient rRNA removal method for RNA sequencing in GC-rich bacteria. Microb. Inform. Exp. 3, 1 (2013).



Acknowledgements

This review is informed by our work funded by the National Institutes of Health, National Science Foundation, Alfred P. Sloan Foundation, John Templeton Foundation and W. M. Keck Foundation, as well as that of hundreds of collaborators on the Human Microbiome Project, American Gut Project and Earth Microbiome Project.

Reviewer information

Nature Reviews Microbiology thanks J. Raes and other anonymous reviewers for their contributions to the peer review of this work.

Author information




Contributions

A.V. and B.C.T. researched the data for the article. A.G., T.K., D.M., J.N., J.G.S. and J.R.Z. substantially contributed to discussion of content. R.K., A.V., B.C.T., A.A., C.C., J.D., L.M., A.V.M., J.T.M., R.A.Q., L.R.T., A.T., Z.Z.X., Q.Z. and J.G.C. wrote the article. R.K., A.V., B.C.T., T.K., D.M., A.D.S. and P.C.D. reviewed and edited the manuscript before submission.

Corresponding author

Correspondence to Rob Knight.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links


Galaxy

GitHub

Jupyter Notebooks

Qiita

R Markdown


Glossary

Exact sequence variants

For marker gene sequencing, the exact DNA sequence for each read is used instead of operational taxonomic unit clustering.

Operational taxonomic units

(OTUs). Groups of closely related individuals or sequences (often defined by a 97% sequence similarity threshold).

Machine learning

The use of algorithms to learn from and make predictions about data.


Metadata

Information about the data. In many studies, this is structured as a matrix with samples as rows and metadata categories (age, sex, longitude, season, disease state, average monthly rainfall, and so on) as columns.

Alpha diversity

A measure of within-sample diversity.

Effect size analysis

Quantification of the magnitude of the effect of a particular metadata category (for example, treatment group, sex or sequencing plate) on the data.

Marker genes

Conserved genes (commonly 16S ribosomal RNA (rRNA), internal transcribed spacer (ITS) and 18S rRNA) that typically contain a highly variable region, which can be used for detailed identification, flanked by highly conserved regions that can serve as binding sites for PCR primers.

Nested statistical tests

Statistical tests that address variables related to the main effect. For example, soil plot would be a nested factor for testing the effects of a fertilizer on the soil microbiota.


Coprophagic

Involving the consumption of faeces. Many animal species eat faeces to more efficiently break down plant matter by digesting the material twice.


Reads

Inferred sequences of base pairs in a single DNA fragment.


Metatranscriptome

The total content of gene transcripts from a community of organisms.

Humic substances

Produced by biodegrading organic matter; humic substances are the main component of humus (soil).


Metagenome

The collection of genetic material from a community of organisms, for example, the genetic material from all microorganisms in the human gut microbiome.

Naive Bayesian classifier

A simple probabilistic classifier used in machine learning that is based on applying Bayes’ theorem assuming strong independence between the features.


k-mers

All possible sequences of length k from a read obtained through DNA sequencing.
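As a minimal sketch of this definition, the snippet below enumerates every length-k substring of a single read. The function name `kmers` and the example read are hypothetical and chosen only for illustration; real tools typically also handle reverse complements and ambiguous bases, which are ignored here.

```python
def kmers(read: str, k: int) -> list[str]:
    """Return every length-k substring of a read, ordered by start position."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A read of length n contains n - k + 1 overlapping k-mers (none if k > n).
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# Hypothetical six-base read split into 3-mers.
example = kmers("ACGTAC", 3)
print(example)  # ['ACG', 'CGT', 'GTA', 'TAC']
```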

Beta diversity

A measure of similarity or dissimilarity in community composition between samples.

Faith’s phylogenetic diversity

An alpha diversity metric that uses a phylogenetic tree to compute sample diversity.

Shannon index

A commonly used index to characterize species diversity in a community.
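For illustration, the Shannon index can be computed from a vector of per-taxon counts as H = -Σ p_i ln(p_i), where p_i is the proportion of taxon i. The sketch below assumes the natural logarithm (some tools use base 2 instead); the function name `shannon_index` and the example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Taxa with zero counts contribute nothing, so they are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Four equally abundant taxa give the maximum value for four taxa, ln(4).
even_community = shannon_index([10, 10, 10, 10])
# A community dominated by a single taxon gives a value of zero.
single_taxon = shannon_index([25, 0, 0, 0])
```

Under this definition, the index increases with both the number of taxa and the evenness of their abundances.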

False discovery rates

The expected proportion of type I errors (false positives) among the hypotheses declared significant when performing multiple comparisons.
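One widely used way to control the false discovery rate is the Benjamini-Hochberg procedure; it is named here as an assumption, because this glossary entry does not specify a method. The sketch below computes Benjamini-Hochberg adjusted P values from a list of raw P values; the function name `benjamini_hochberg` and the example values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four tests.
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
# Tests with an adjusted value at or below 0.05 would be reported at a 5% FDR.
significant = [q <= 0.05 for q in adjusted]
```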

Isometric log ratio transform

(ilr). Converts a vector of proportions into a vector of log ratios using a tree as a reference. The computed log ratios consist of the difference of mean logarithms of species proportions between adjacent clades within the tree.
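A single log ratio (balance) of this kind can be sketched as follows. The sketch assumes the standard balance form, in which the difference of mean log proportions between two adjacent clades is scaled by sqrt(rs / (r + s)), with r and s the number of taxa in each clade. The function name `ilr_balance` and the example proportions are hypothetical, and zero proportions would need to be replaced (for example, with a pseudocount) before taking logarithms.

```python
import math


def ilr_balance(numerator, denominator):
    """Balance between two adjacent clades, given their taxon proportions."""
    r, s = len(numerator), len(denominator)
    if r == 0 or s == 0:
        raise ValueError("both clades need at least one taxon")
    # Mean of the log proportions within each clade.
    mean_log_num = sum(math.log(p) for p in numerator) / r
    mean_log_den = sum(math.log(p) for p in denominator) / s
    # Scaling factor that makes the set of balances an orthonormal basis.
    scale = math.sqrt(r * s / (r + s))
    return scale * (mean_log_num - mean_log_den)


# Hypothetical four-taxon sample split into two clades of two taxa each.
balance = ilr_balance([0.2, 0.2], [0.3, 0.3])
```

A negative balance, as in this example, indicates that taxa in the first clade are on average less abundant than taxa in the second.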

Random forests regression

A machine learning technique that uses an ensemble of decision trees to predict a continuous outcome (regression); random forests can also be used to perform classification.

Family-wise error

The probability of making one or more type I errors (false discoveries) when performing multiple hypotheses tests.

About this article

Cite this article

Knight, R., Vrbanac, A., Taylor, B.C. et al. Best practices for analysing microbiomes. Nat Rev Microbiol 16, 410–422 (2018).
