Methods for phylogenetic analysis of microbiome data

Abstract

How does knowing the evolutionary history of microorganisms affect our analysis of microbiological datasets? Depending on the research question, the common ancestry of microorganisms can be a source of confounding variation, or a scaffolding used for inference. For example, when performing regression on traits, common ancestry is a source of dependence among observations, whereas when searching for clades with correlated abundances, common ancestry is the scaffolding for inference. The common ancestry of microorganisms and their genes is organized in trees (phylogenies), which can and should be incorporated into analyses of microbial datasets. While there has been a recent expansion of phylogenetically informed analytical tools, little guidance exists on which method best answers which biological question. Here, we review methods for phylogeny-aware analyses of microbiome datasets, considerations for choosing the appropriate method, and challenges inherent in these methods. We introduce a conceptual organization of these tools, breaking them down into phylogenetic comparative methods, ancestral state reconstruction, and analysis of phylogenetic variables and distances, and provide examples in Supplementary Online Tutorials. Careful consideration of the research question and ecological and evolutionary assumptions will help researchers choose a phylogeny and appropriate methods to produce accurate, biologically informative and previously unreported insights.
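
To make the idea of dependence among observations concrete, the sketch below computes the expected covariance between tip traits when a trait evolves along a tree. It is an illustration only, not the authors' method or tutorial code: it assumes a Brownian-motion model with unit rate, and the three-tip tree, the branch representation and the function name `bm_covariance` are hypothetical. Under that assumption, the covariance between two tips equals the total length of the branches they share on the path from the root.

```python
from itertools import product

# Hypothetical toy tree ((A:1,B:1):2,C:3), stored as one entry per branch:
# (set of tips descending from the branch, branch length).
BRANCHES = [
    ({"A"}, 1.0),
    ({"B"}, 1.0),
    ({"A", "B"}, 2.0),  # internal branch shared by A and B
    ({"C"}, 3.0),
]
TIPS = ["A", "B", "C"]


def bm_covariance(branches, tips):
    """Expected trait covariance between tips, assuming unit-rate Brownian motion.

    Each branch adds its length to the covariance of every pair of tips
    that descends from it (including each tip paired with itself).
    """
    cov = {pair: 0.0 for pair in product(tips, repeat=2)}
    for descendants, length in branches:
        for a, b in product(descendants, repeat=2):
            cov[(a, b)] += length
    return cov


COV = bm_covariance(BRANCHES, TIPS)
for row in TIPS:
    print(row, [COV[(row, col)] for col in TIPS])
```

In this toy tree, A and B share the internal branch of length 2, so their traits are expected to covary, while C is independent of both. A regression that treats the three tips as independent observations would ignore that off-diagonal structure.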

Fig. 1: PCMs control for the statistical dependence among traits resulting from evolution of traits along the phylogenetic tree.
Fig. 2: Phylogenies define the geometry of community ecological data, much like a sphere defines the geometry of GPS data.
Fig. 3: Phylogeny-aware distances.
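
As one example of a phylogeny-aware distance, the sketch below computes an unweighted UniFrac-style distance between two communities: the fraction of observed branch length that leads to members of only one community. It is a minimal illustration under stated assumptions, not a reference implementation. The toy tree, the branch representation and the function name `unweighted_unifrac` are hypothetical, and only branches with at least one observed descendant are counted.

```python
# Hypothetical toy tree ((A:1,B:1):2,C:3), stored as one entry per branch:
# (set of tips descending from the branch, branch length).
TREE = [
    ({"A"}, 1.0),
    ({"B"}, 1.0),
    ({"A", "B"}, 2.0),
    ({"C"}, 3.0),
]


def unweighted_unifrac(branches, community_1, community_2):
    """Fraction of observed branch length unique to one of two communities.

    Communities are sets of tip names that are present (presence/absence only).
    """
    unique = 0.0
    observed = 0.0
    for descendants, length in branches:
        in_1 = bool(descendants & community_1)
        in_2 = bool(descendants & community_2)
        if in_1 or in_2:
            observed += length  # branch leads to at least one observed tip
        if in_1 != in_2:
            unique += length    # branch leads to tips of one community only
    if observed == 0.0:
        raise ValueError("neither community contains a tip of the tree")
    return unique / observed


D_SAME = unweighted_unifrac(TREE, {"A", "B"}, {"A", "B"})
D_CLOSE = unweighted_unifrac(TREE, {"A"}, {"B"})
D_FAR = unweighted_unifrac(TREE, {"A"}, {"C"})
print(D_SAME, D_CLOSE, D_FAR)
```

Communities that differ only in closely related tips (A versus B) share the internal branch and so score a smaller distance than communities drawn from distant parts of the tree (A versus C), which is the behaviour a taxon-count distance cannot capture.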

Acknowledgements

A.D.W. received support from Duke University Biology Department’s provision of start-up funds for D. Nemergut (deceased) and the Defense Advanced Research Projects Agency (DARPA) grant D16AP0013. This paper is published in the spirit of D. Nemergut’s contagious love of science.

Author information

Corresponding author

Correspondence to Alex D. Washburne.

Ethics declarations

Competing interests

The authors declare no competing interests.

About this article

Cite this article

Washburne, A.D., Morton, J.T., Sanders, J. et al. Methods for phylogenetic analysis of microbiome data. Nat Microbiol 3, 652–661 (2018). https://doi.org/10.1038/s41564-018-0156-0
