Recent studies have shown that microbial communities are composed of groups of functionally cohesive taxa whose abundance is more stable and better associated with metabolic fluxes than that of any individual taxon. However, identifying these functional groups in a manner that is independent of error-prone functional gene annotations remains a major open problem. Here we tackle this structure–function problem by developing a novel unsupervised approach that coarse-grains taxa into functional groups solely on the basis of the patterns of statistical variation in species abundances and functional read-outs. We demonstrate the power of this approach on three distinct datasets. On data from replicate microcosms of heterotrophic soil bacteria, our unsupervised algorithm recovered experimentally validated functional groups that divide metabolic labour and remain stable despite large variation in species composition. When applied to ocean microbiome data, our approach discovered a functional group that combines aerobic and anaerobic ammonia oxidizers whose summed abundance tracks closely with nitrate concentrations in the water column. Finally, we show that our framework can detect species groups that are probably responsible for the production or consumption of metabolites abundant in animal gut microbiomes, serving as a hypothesis-generating tool for mechanistic studies. Overall, this work advances our understanding of structure–function relationships in complex microbiomes and provides a powerful approach to discover functional groups in an objective and systematic manner.
R scripts are available at https://github.com/Xiaoyu2425/Ensemble-Quotient-Optimization. An R package for EQO, mEQO, is available at https://github.com/Xiaoyu2425/mEQO with an accompanying vignette illustrating how to use the package.
We thank S. Louca, J. Goldford, I. Mizrahi and coworkers for sharing their datasets; B. Liu, M. Tikhonov, Q. Wang, C. Song and all Cordero lab members for helpful suggestions. X.S. thanks S. W. Chisholm, M. Follows and R. Juanes for encouraging feedback. X.S. and O.X.C. were supported by a seed grant from the Abdul Latif Jameel Water and Food Systems Lab. O.X.C. was supported by the Simons Collaboration: Principles of Microbial Ecosystems, award number 542395. A.G. was supported by the Gordon and Betty Moore Foundation as a Physics of Living Systems Fellow through grant number GBMF4513. R.G. was supported by Simons Foundation Postdoctoral Fellowship Award 653410.
The authors declare no competing interests.
Peer review information
Nature Ecology and Evolution thanks Thomas Curtis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended Data Fig. 1 Schematic illustration of different types of environmental/phenotypic variables that can be leveraged by EQO to predict functional groups.
Bar plots show the relative abundance of different taxa. Appropriate grouping of taxa leads to strong coupling with the phenotypic variable. For a continuous phenotypic variable, such as the measured concentration of a metabolite across samples, the coupling can be captured by a strong correlation. For a uniform phenotypic variable, the coupling is reflected as high stability or low variability. For a categorical phenotypic variable, the coupling can be interpreted as significant discrimination between treatments.
Extended Data Fig. 2 EQO applied to the Tara Oceans microbiome.
(a) The best group size for EQO was determined to be 11 based on an AIC minimization criterion. (b) Relative abundance distribution of the 11 taxa selected by the algorithm across all sampling sites (left y-axis). Nitrate concentration measured at each sampling site is shown as black dots (right y-axis).
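The caption does not spell out the AIC formula. A minimal sketch, assuming the phenotype is fit by least-squares linear regression on the group's summed abundance and that the group size k serves as the parameter count: for such a fit the residual sum of squares equals (1 − r²) times the total sum of squares, so, after dropping the term that is identical for every k, AIC(k) = n ln(1 − r_k²) + 2k, where r_k is the best correlation found at size k and n is the number of samples. The function names are hypothetical.

```python
import math


def aic_for_size(n_samples, r, k):
    """Least-squares AIC up to an additive constant (assumed form):
    n * ln(1 - r^2) + 2k, with the group size k as the parameter count.
    Requires |r| < 1, otherwise the logarithm is undefined."""
    return n_samples * math.log(1.0 - r ** 2) + 2 * k


def best_group_size(n_samples, best_r_by_size):
    """Return the group size whose best correlation yields the lowest AIC."""
    return min(best_r_by_size,
               key=lambda k: aic_for_size(n_samples, best_r_by_size[k], k))


# Hypothetical best correlations found for group sizes 1-4 over 100 samples.
chosen = best_group_size(100, {1: 0.5, 2: 0.7, 3: 0.71, 4: 0.712})
```

Larger groups can usually match a read-out at least as well, so the 2k term is what stops the selected size from growing indefinitely. With these hypothetical numbers the small gain from size 3 to size 4 does not cover the extra penalty, and the function returns 3.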
Extended Data Fig. 3 Statistical tests for predictability of metabolites in the gut microbiome dataset.
Metabolites with cross-validated R² lower than 0.2 as well as adjusted P-value higher than 0.01 were filtered out from further analysis (Methods).
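The caption gives the two thresholds but not the multiple-testing correction. The sketch below assumes Benjamini–Hochberg adjustment and reads the rule as "keep a metabolite only if its cross-validated R² is at least 0.2 and its adjusted P value is at most 0.01". The cross-validated R² values and raw P values are taken as given inputs, and the metabolite names and numbers are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values (assumed correction method)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_metabolites(cv_r2, pvals, r2_min=0.2, p_max=0.01):
    """Keep metabolites with cross-validated R^2 >= r2_min AND
    BH-adjusted P <= p_max (one reading of the caption's rule)."""
    names = list(cv_r2)
    adjusted = bh_adjust([pvals[name] for name in names])
    return [name for name, q in zip(names, adjusted)
            if cv_r2[name] >= r2_min and q <= p_max]


# Hypothetical inputs for four metabolites.
cv_r2 = {"met_A": 0.45, "met_B": 0.35, "met_C": 0.15, "met_D": 0.50}
pvals = {"met_A": 0.001, "met_B": 0.004, "met_C": 0.03, "met_D": 0.5}
kept = filter_metabolites(cv_r2, pvals)
```

Here met_C fails the R² threshold and met_D fails the adjusted-P threshold, so only met_A and met_B are kept.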
Extended Data Fig. 4 Distribution of the predicted putrescine functional group across samples ordered by host phylogeny.
1, hyena; 2, tiger; 3, leopard; 4, lion; 5, sand cat; 6, jungle cat; 7, wolf; 8, coati; 9, black bear; 10, brown bear; 11, donkey; 12, zebra; 13, rhino; 14, goat; 15, sheep; 16, lemur; 17, capuchin; 18, mandrill; 19, gibbon; 20, gorilla; 21, chimpanzee; 22, African elephant; 23, Asian elephant.
Extended Data Fig. 5 Functional groups identified for lactate production and butyrate production.
Species that are confirmed to be lactate producers or butyrate producers in previous experimental studies are highlighted in red. The two Streptococcus luteciae species in the left panel are well-known lactate producers in the Lactobacillales clade. Faecalibacterium prausnitzii and Blautia producta as well as Lachnospiraceae species have been reported to be butyrate producers.
Supplementary Tables 1–3.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shan, X., Goyal, A., Gregor, R. et al. Annotation-free discovery of functional groups in microbial communities. Nat Ecol Evol 7, 716–724 (2023). https://doi.org/10.1038/s41559-023-02021-z