Annotation-free discovery of functional groups in microbial communities


Recent studies have shown that microbial communities are composed of groups of functionally cohesive taxa whose abundance is more stable and better-associated with metabolic fluxes than that of any individual taxon. However, identifying these functional groups in a manner that is independent of error-prone functional gene annotations remains a major open problem. Here we tackle this structure–function problem by developing a novel unsupervised approach that coarse-grains taxa into functional groups, solely on the basis of the patterns of statistical variation in species abundances and functional read-outs. We demonstrate the power of this approach on three distinct datasets. On data of replicate microcosms with heterotrophic soil bacteria, our unsupervised algorithm recovered experimentally validated functional groups that divide metabolic labour and remain stable despite large variation in species composition. When leveraged against the ocean microbiome data, our approach discovered a functional group that combines aerobic and anaerobic ammonia oxidizers whose summed abundance tracks closely with nitrate concentrations in the water column. Finally, we show that our framework can enable the detection of species groups that are probably responsible for the production or consumption of metabolites abundant in animal gut microbiomes, serving as a hypothesis-generating tool for mechanistic studies. Overall, this work advances our understanding of structure–function relationships in complex microbiomes and provides a powerful approach to discover functional groups in an objective and systematic manner.
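
The idea of coarse-graining taxa using only abundances and a functional read-out can be illustrated with a toy search. The sketch below is not the authors' EQO implementation (their code is in R); it assumes a hypothetical table of abundances, a continuous read-out, and a fixed group size, and it simply tries every subset of that size, keeping the one whose summed abundance has the highest squared Pearson correlation with the read-out. The names `squared_correlation` and `best_group` and all data values are invented for illustration.

```python
from itertools import combinations
from statistics import fmean


def squared_correlation(x, y):
    """Squared Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def best_group(abundance, readout, size):
    """Exhaustively score every taxon subset of the given size.

    abundance: list of samples, each a list of per-taxon abundances.
    readout:   one functional measurement per sample.
    Returns (tuple of taxon indices, squared correlation).
    """
    n_taxa = len(abundance[0])
    best_idx, best_score = None, -1.0
    for idx in combinations(range(n_taxa), size):
        # Coarse-grain: sum the abundances of the candidate group per sample.
        summed = [sum(row[j] for j in idx) for row in abundance]
        score = squared_correlation(summed, readout)
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


# Hypothetical toy data: 6 samples (rows) x 4 taxa (columns).
abundance = [
    [1, 4, 3, 2],
    [3, 1, 2, 1],
    [2, 3, 1, 4],
    [4, 2, 3, 3],
    [1, 3, 2, 1],
    [3, 2, 5, 2],
]
# By construction, the read-out depends only on taxa 0 and 2.
readout = [2 * (row[0] + row[2]) + 1 for row in abundance]

group, r2 = best_group(abundance, readout, size=2)
print(group, round(r2, 3))
```

Exhaustive search is only feasible for a handful of taxa, since the number of subsets grows combinatorially; real datasets would need a heuristic or a dedicated optimizer.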

Fig. 1: Schematic illustration of Ensemble Quotient Optimization (EQO).
Fig. 2: Coarse-graining functional groups in replicate microcosms.
Fig. 3: Functional guilds of nitrogen cycling in the ocean microbiome.
Fig. 4: Predicting the level of metabolites with minimal assemblages in gut microbiome.

Data availability

Data used for EQO in this study are provided as supplementary tables. The raw source data are available from the original publications of the microbiome datasets (refs. 3, 4, 20 and 21).

Code availability

R scripts are available online. An R package for EQO, mEQO, is also available, with an accompanying vignette illustrating how to use the package.

References


  1. Louca, S. et al. Function and functional redundancy in microbial systems. Nat. Ecol. Evol. (2018).

  2. Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. (2016).

  3. Louca, S., Parfrey, L. W. & Doebeli, M. Decoupling function and taxonomy in the global ocean microbiome. Science (2016).

  4. Goldford, J. E. et al. Emergent simplicity in microbial community assembly. Science 361, 469–474 (2018).

  5. Nelson, M. B., Martiny, A. C. & Martiny, J. B. H. Global biogeography of microbial nitrogen-cycling traits in soil. Proc. Natl Acad. Sci. USA (2016).

  6. Díaz, S. & Cabido, M. Vive la différence: plant functional diversity matters to ecosystem processes. Trends Ecol. Evol. 16, 646–655 (2001).

  7. Benoit, D. M., Jackson, D. A. & Chu, C. Partitioning fish communities into guilds for ecological analyses: an overview of current approaches and future directions. Can. J. Fish. Aquat. Sci. (2021).

  8. Blaum, N., Mosner, E., Schwager, M. & Jeltsch, F. How functional is functional? Ecological groupings in terrestrial animal ecology: towards an animal functional type approach. Biodivers. Conserv. 20, 2333–2345 (2011).

  9. Hussain, F. A. et al. Rapid evolutionary turnover of mobile genetic elements drives bacterial resistance to phages. Science 374, 488–492 (2021).

  10. Goyal, A., Bittleston, L. S., Leventhal, G. E., Lu, L. & Cordero, O. X. Interactions between strains govern the eco-evolutionary dynamics of microbial communities. eLife (2022).

  11. Schindler, D. E., Armstrong, J. B. & Reed, T. E. The portfolio concept in ecology and evolution. Front. Ecol. Environ. 13, 257–263 (2015).

  12. Levin, S. A. Encyclopedia of Biodiversity 2nd edn (Elsevier, 2013).

  13. Shi, L. D. et al. Methane-dependent selenate reduction by a bacterial consortium. ISME J. 15, 3683–3692 (2021).

  14. Lobb, B., Tremblay, B. J. M., Moreno-Hagelsieb, G. & Doxey, A. C. An assessment of genome annotation coverage across the bacterial tree of life. Microb. Genom. 6, e000341 (2020).

  15. Schnoes, A. M., Brown, S. D., Dodevski, I. & Babbitt, P. C. Annotation error in public databases: misannotation of molecular function in enzyme superfamilies. PLOS Comput. Biol. 5, e1000605 (2009).

  16. Griesemer, M., Kimbrel, J. A., Zhou, C. E., Navid, A. & D’Haeseleer, P. Combining multiple functional annotation tools increases coverage of metabolic annotation. BMC Genomics 19, 948 (2018).

  17. Lee, D. J., Minchin, S. D. & Busby, S. J. W. Activating transcription in bacteria. Annu. Rev. Microbiol. 66, 125–152 (2012).

  18. Aidelberg, G. et al. Hierarchy of non-glucose sugars in Escherichia coli. BMC Syst. Biol. 8, 133 (2014).

  19. Gilbert, J. A. et al. Microbiome-wide association studies link dynamic microbial consortia to disease. Nature 535, 94–103 (2016).

  20. Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science (2015).

  21. Gregor, R. et al. Mammalian gut metabolomes mirror microbiome composition and host phylogeny. ISME J. 16, 1262–1274 (2021).

  22. Qin, W. et al. Nitrosopumilus maritimus gen. nov., sp. nov., Nitrosopumilus cobalaminigenes sp. nov., Nitrosopumilus oxyclinae sp. nov., and Nitrosopumilus ureiphilus sp. nov., four marine ammonia-oxidizing archaea of the phylum Thaumarchaeota. Int. J. Syst. Evol. Microbiol. 67, 5067–5079 (2017).

  23. Schmid, M. C. et al. Anaerobic ammonium-oxidizing bacteria in marine environments: widespread occurrence but low diversity. Environ. Microbiol. 9, 1476–1484 (2007).

  24. Mori, J. F. et al. Putative mixotrophic nitrifying-denitrifying gammaproteobacteria implicated in nitrogen cycling within the ammonia/oxygen transition zone of an oil sands pit lake. Front. Microbiol. 10, 2435 (2019).

  25. Wang, B. et al. Recovering partial nitritation in a PN/A system during mainstream wastewater treatment by reviving AOB activity after thoroughly inhibiting AOB and NOB with free nitrous acid. Environ. Int. 139, 105684 (2020).

  26. Kibe, R. et al. Upregulation of colonic luminal polyamines produced by intestinal microbiota delays senescence in mice. Sci. Rep. 4, 4548 (2014).

  27. Matsumoto, M., Kurihara, S., Kibe, R., Ashida, H. & Benno, Y. Longevity in mice is promoted by probiotic-induced suppression of colonic senescence dependent on upregulation of gut bacterial polyamine production. PLoS ONE 6, e23652 (2011).

  28. Gupta, V. K. et al. Restoring polyamines protects from age-induced memory impairment in an autophagy-dependent manner. Nat. Neurosci. 16, 1453–1460 (2013).

  29. Zhang, M. et al. Spermine inhibits proinflammatory cytokine synthesis in human mononuclear cells: a counterregulatory mechanism that restrains the immune response. J. Exp. Med. 185, 1759–1768 (1997).

  30. Gerner, E. W. & Meyskens, F. L. Polyamines and cancer: old molecules, new understanding. Nat. Rev. Cancer 4, 781–792 (2004).

  31. Noack, J., Kleessen, B., Proll, J., Dongowski, G. & Blaut, M. Dietary guar gum and pectin stimulate intestinal microbial polyamine synthesis in rats. J. Nutr. 128, 1385–1391 (1998).

  32. Noack, J., Dongowski, G., Hartmann, L. & Blaut, M. The human gut bacteria Bacteroides thetaiotaomicron and Fusobacterium varium produce putrescine and spermidine in cecum of pectin-fed gnotobiotic rats. J. Nutr. 130, 1225–1231 (2000).

  33. Roggenbuck, M. et al. The microbiome of New World vultures. Nat. Commun. 5, 5498 (2014).

  34. Nakamura, A., Ooga, T. & Matsumoto, M. Intestinal luminal putrescine is produced by collective biosynthetic pathways of the commensal microbiome. Gut Microbes 10, 159–171 (2019).

  35. Kitada, Y. et al. Bioactive polyamine production by a novel hybrid system comprising multiple indigenous gut bacterial strategies. Sci. Adv. 4, 62–89 (2018).

  36. Kettle, H., Louis, P., Holtrop, G., Duncan, S. H. & Flint, H. J. Modelling the emergent dynamics and major metabolites of the human colonic microbiota. Environ. Microbiol. 17, 1615–1630 (2015).

  37. Benítez-Páez, A. et al. Depletion of Blautia species in the microbiota of obese children relates to intestinal inflammation and metabolic phenotype worsening. mSystems 5, e00857-19 (2020).

  38. McKenzie, V. J. et al. The effects of captivity on the mammalian gut microbiome. Integr. Comp. Biol. 57, 690–704 (2017).

  39. Datta, M. S., Sliwerska, E., Gore, J., Polz, M. F. & Cordero, O. X. Microbial interactions lead to rapid micro-scale successions on model marine particles. Nat. Commun. 7, 11965 (2016).

  40. Szabo, R. E. et al. Historical contingencies and phage induction diversify bacterioplankton communities at the microscale. Proc. Natl Acad. Sci. USA 119, e2117748119 (2022).

  41. Khanijou, J. K. et al. Metabolomics and modelling approaches for systems metabolic engineering. Metab. Eng. Commun. 15, e00209 (2022).

  42. Yilmaz, P. et al. The SILVA and ‘All-species Living Tree Project (LTP)’ taxonomic frameworks. Nucleic Acids Res. 42, D643–D648 (2014).

  43. Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).

  44. Rognes, T., Flouri, T., Nichols, B., Quince, C. & Mahé, F. VSEARCH: a versatile open source tool for metagenomics. PeerJ (2016).

  45. Gaur, A. & Arora, S. R. Solving a quadratic fractional integer programming problem using linearization. Manag. Sci. Financ. Eng. 14, 25–44 (2008).

  46. Yue, D., Guillén-Gosálbez, G. & You, F. Global optimization of large-scale mixed-integer linear fractional programming problems: a reformulation-linearization method and process scheduling applications. AIChE J. 59, 4255–4272 (2013).

  47. R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2011).

  48. Scrucca, L. GA: a package for genetic algorithms in R. J. Stat. Softw. 53, 1–37 (2013).

  49. Scrucca, L. On some extensions to GA package: hybrid optimisation, parallelisation and islands evolution. R J. 9, 187–206 (2016).

Acknowledgements


We thank S. Louca, J. Goldford, I. Mizrahi and coworkers for sharing their datasets; B. Liu, M. Tikhonov, Q. Wang, C. Song and all Cordero lab members for helpful suggestions. X.S. thanks S. W. Chisholm, M. Follows and R. Juanes for encouraging feedback. X.S. and O.X.C. were supported by a seed grant from the Abdul Latif Jameel Water and Food Systems Lab. O.X.C. was supported by the Simons Collaboration: Principles of Microbial Ecosystems, award number 542395. A.G. was supported by the Gordon and Betty Moore Foundation as a Physics of Living Systems Fellow through grant number GBMF4513. R.G. was supported by Simons Foundation Postdoctoral Fellowship Award 653410.

Author information

Contributions

O.X.C. conceived the study. X.S. and O.X.C. designed the algorithm and performed data analysis. X.S., A.G., R.G. and O.X.C. wrote the manuscript.

Corresponding author

Correspondence to Otto X. Cordero.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Ecology and Evolution thanks Thomas Curtis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Schematic illustration of different types of environmental/phenotypic variables that can be leveraged by EQO to predict functional groups.

Bar plots show relative abundance of different taxa. Appropriate grouping of taxa leads to strong coupling with the phenotypic variable. For a continuous phenotypic variable such as measured concentration of a metabolite across samples, the coupling can be captured by a strong correlation. For a uniform phenotypic variable, the coupling can be marked as high stability or low variability. For a categorical phenotypic variable, the coupling can be interpreted as significant discrimination between treatments.
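
The three kinds of coupling described in this caption can each be turned into a score for a candidate group. The sketch below is one possible reading, not the paper's definition: it assumes Pearson correlation for a continuous variable, the coefficient of variation for a uniform variable, and the between-treatment share of variance (eta squared) for a categorical variable. The last two statistics and all function names are assumptions chosen for illustration.

```python
from statistics import fmean, pstdev


def coupling_continuous(group_abundance, trait):
    """Pearson correlation between summed group abundance and a continuous trait.

    Both inputs must vary across samples (non-constant).
    """
    mx, my = fmean(group_abundance), fmean(trait)
    sxy = sum((a - mx) * (b - my) for a, b in zip(group_abundance, trait))
    sxx = sum((a - mx) ** 2 for a in group_abundance)
    syy = sum((b - my) ** 2 for b in trait)
    return sxy / (sxx * syy) ** 0.5


def coupling_uniform(group_abundance):
    """Coefficient of variation (population std / mean); lower means more stable."""
    return pstdev(group_abundance) / fmean(group_abundance)


def coupling_categorical(group_abundance, labels):
    """Share of total variance explained by treatment labels (eta squared).

    Values near 1 mean the group abundance separates the treatments well.
    The abundances must not all be identical.
    """
    grand = fmean(group_abundance)
    total = sum((g - grand) ** 2 for g in group_abundance)
    between = 0.0
    for label in set(labels):
        members = [g for g, l in zip(group_abundance, labels) if l == label]
        between += len(members) * (fmean(members) - grand) ** 2
    return between / total


# Hypothetical summed abundances of one candidate group in four samples.
print(coupling_continuous([0.2, 0.4, 0.6, 0.8], [1.0, 2.0, 3.0, 4.0]))
print(coupling_uniform([0.5, 0.5, 0.5, 0.5]))
print(coupling_categorical([0.1, 0.1, 0.9, 0.9], ["a", "a", "b", "b"]))
```

Note that the continuous and categorical scores are to be maximized, whereas the uniform score is to be minimized, so a search over groups would need to flip the sign accordingly.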

Extended Data Fig. 2 EQO for Tara Ocean microbiome.

(a) The best group size for EQO was determined to be 11 based on an AIC minimization criterion. (b) Relative abundance distribution of the 11 taxa selected by the algorithm across all sampling sites (left y-axis). Nitrate concentration measured at each sampling site is shown as black dots (right y-axis).
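
Panel (a) states only that the group size was chosen by minimizing AIC. As a rough illustration of how such a criterion trades fit against group size, the sketch below uses the common least-squares form AIC = n ln(RSS/n) + 2p. That formula, the parameter count (group size plus one intercept), and the residual sums of squares are all assumptions made for this example; none of the numbers come from the paper.

```python
import math


def gaussian_aic(rss, n_samples, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant."""
    return n_samples * math.log(rss / n_samples) + 2 * n_params


def pick_group_size(rss_by_size, n_samples):
    """Return the group size with the lowest AIC, plus all AIC values.

    rss_by_size maps a candidate group size to the residual sum of squares
    of the best group found at that size.
    """
    aic = {
        size: gaussian_aic(rss, n_samples, size + 1)  # +1 for the intercept (assumed)
        for size, rss in rss_by_size.items()
    }
    return min(aic, key=aic.get), aic


# Hypothetical residuals: the fit improves quickly, then plateaus.
rss_by_size = {1: 40.0, 2: 20.0, 3: 18.0, 4: 17.8}
best_size, aic_values = pick_group_size(rss_by_size, n_samples=50)
print(best_size)
```

In this toy input the fourth taxon lowers the residual only marginally, so the penalty term outweighs the gain and the smaller group is preferred.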

Extended Data Fig. 3 Statistical tests for predictability of metabolites in the gut microbiome dataset.

Metabolites with cross-validated R² lower than 0.2 as well as adjusted P-value higher than 0.01 were filtered out from further analysis (Methods).

Extended Data Fig. 4 Distribution of predicted putrescine functional group in samples ranked according to host phylogeny.

1, hyena; 2, tiger; 3, leopard; 4, lion; 5, sand cat; 6, jungle cat; 7, wolf; 8, coati; 9, black bear; 10, brown bear; 11, donkey; 12, zebra; 13, rhino; 14, goat; 15, sheep; 16, lemur; 17, capuchin; 18, mandrill; 19, gibbon; 20, gorilla; 21, chimpanzee; 22, African elephant; 23, Asian elephant.

Extended Data Fig. 5 Functional groups identified for lactate production and butyrate production.

Species that are confirmed to be lactate producers or butyrate producers in previous experimental studies are highlighted in red. The two Streptococcus luteciae species in the left panel are well-known lactate producers in the Lactobacillales clade. Faecalibacterium prausnitzii and Blautia producta as well as Lachnospiraceae species have been reported to be butyrate producers.

Supplementary information

Supplementary Information

Supplementary Notes

Reporting Summary

Supplementary Table

Supplementary Tables 1–3.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Shan, X., Goyal, A., Gregor, R. et al. Annotation-free discovery of functional groups in microbial communities. Nat Ecol Evol 7, 716–724 (2023).
