Review Article

Methods for phylogenetic analysis of microbiome data


How does knowing the evolutionary history of microorganisms affect our analysis of microbiological datasets? Depending on the research question, the common ancestry of microorganisms can be either a source of confounding variation or a scaffold for inference. For example, in a regression on traits, common ancestry induces dependence among observations, whereas in a search for clades with correlated abundances, common ancestry is the scaffold for inference. The common ancestry of microorganisms and their genes is organized in trees (phylogenies), which can and should be incorporated into analyses of microbial datasets. Although phylogenetically informed analytical tools have recently proliferated, little guidance exists on which method best answers which biological question. Here, we review methods for phylogeny-aware analyses of microbiome datasets, considerations for choosing an appropriate method and challenges inherent in these methods. We organize these tools conceptually into phylogenetic comparative methods, ancestral state reconstruction, and analysis of phylogenetic variables and distances, and provide examples in Supplementary Online Tutorials. Careful consideration of the research question and of the ecological and evolutionary assumptions will help researchers choose a phylogeny and appropriate methods to produce accurate, biologically informative and previously unreported insights.
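To make the point about dependence among observations concrete, the following is a minimal sketch of one classic phylogenetic comparative method, phylogenetically independent contrasts, written in plain Python. It assumes Brownian-motion trait evolution on a fully resolved (binary) tree. The nested-tuple tree encoding, the function names and the trait values are hypothetical and chosen only for illustration; they do not come from the article or its tutorials.

```python
import math

def independent_contrasts(node, trait):
    """Return (ancestral value, extra branch length, contrasts) for a binary tree.

    A node is either a tip name (str) or a pair ((child, length), (child, length)).
    Assumes Brownian-motion trait evolution; the encoding is illustrative only.
    """
    if isinstance(node, str):
        return trait[node], 0.0, []
    (left, v1), (right, v2) = node
    x1, extra1, c1 = independent_contrasts(left, trait)
    x2, extra2, c2 = independent_contrasts(right, trait)
    # Lengthen each branch to reflect uncertainty in the estimated node values.
    v1 += extra1
    v2 += extra2
    contrast = (x1 - x2) / math.sqrt(v1 + v2)        # standardized contrast
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted ancestral estimate
    return value, v1 * v2 / (v1 + v2), c1 + c2 + [contrast]

def contrast_slope(tree, x, y):
    """Slope of a regression through the origin of y-contrasts on x-contrasts."""
    cx = independent_contrasts(tree, x)[2]
    cy = independent_contrasts(tree, y)[2]
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)

# Hypothetical four-taxon tree ((A:1,B:1):1,(C:1,D:1):1) with made-up trait values.
ab = (("A", 1.0), ("B", 1.0))
cd = (("C", 1.0), ("D", 1.0))
tree = ((ab, 1.0), (cd, 1.0))
x = {"A": 1.0, "B": 2.0, "C": 4.0, "D": 7.0}
y = {tip: 2.0 * value + 3.0 for tip, value in x.items()}  # y is exactly linear in x

slope = contrast_slope(tree, x, y)
print(round(slope, 6))
```

Because each contrast is a difference between sister lineages scaled by the branch length separating them, the contrasts are independent under the assumed model even though the tip values are not. In this toy case the trait `y` is an exact linear function of `x`, so the contrast regression recovers the slope of 2; with real data the estimate depends on how well the tree and the Brownian-motion assumption describe the traits.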






A.D.W. received support from Duke University Biology Department’s provision of start-up funds for D. Nemergut (deceased) and the Defense Advanced Research Projects Agency (DARPA) grant D16AP0013. This paper is published in the spirit of D. Nemergut’s contagious love of science.

Author information

Competing interests

The authors declare no competing interests.

Correspondence to Alex D. Washburne.


Fig. 1: PCMs control for the statistical dependence among traits resulting from evolution of traits along the phylogenetic tree.
Fig. 2: Phylogenies define the geometry of community ecological data, much like a sphere defines the geometry of GPS data.
Fig. 3: Phylogeny-aware distances.
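
As an illustration of what a phylogeny-aware distance computes, here is a minimal sketch of an unweighted UniFrac-style distance between two communities: the fraction of observed branch length that leads to members of only one of the two communities. The representation of the tree as a list of `(length, descendant_tips)` pairs, the function name and the example communities are assumptions made for this sketch, not the interface of any published UniFrac implementation.

```python
def unweighted_unifrac(branches, community_a, community_b):
    """Fraction of observed branch length unique to one of two communities.

    branches: list of (length, set of tip names descending from that branch).
    community_a, community_b: sets of tip names present in each community.
    """
    unique = 0.0
    observed = 0.0
    for length, tips in branches:
        in_a = bool(tips & community_a)
        in_b = bool(tips & community_b)
        if in_a or in_b:
            observed += length      # branch leads to at least one observed tip
            if in_a != in_b:
                unique += length    # branch leads to tips from only one community
    if observed == 0.0:
        raise ValueError("neither community contains a tip from the tree")
    return unique / observed

# Hypothetical rooted tree ((A:1,B:1):1,(C:1,D:1):1), listed branch by branch.
branches = [
    (1.0, {"A"}), (1.0, {"B"}), (1.0, {"C"}), (1.0, {"D"}),
    (1.0, {"A", "B"}), (1.0, {"C", "D"}),
]

print(unweighted_unifrac(branches, {"A", "B"}, {"B", "C"}))
```

Communities that share no branches give a distance of 1 and identical communities give 0. Abundance-weighted variants additionally weight each branch by the difference in relative abundance of its descendants, which this sketch does not attempt.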