A metabolomics pipeline for the mechanistic interrogation of the gut microbiome

Han, Shuo; Van Treuren, Will; Fischer, Curt R.; Merrill, Bryan D.; DeFelice, Brian C.; Sanchez, Juan M.; Higginbottom, Steven K.; Guthrie, Leah; Fall, Lalla A.; Dodd, Dylan; Fischbach, Michael A.; Sonnenburg, Justin L.

doi:10.1038/s41586-021-03707-9

Article
Published: 14 July 2021

Nature volume 595, pages 415–420 (2021)

Abstract

Gut microorganisms modulate host phenotypes and are associated with numerous health effects in humans, ranging from host responses to cancer immunotherapy to metabolic disease and obesity. However, difficulty in accurate and high-throughput functional analysis of human gut microorganisms has hindered efforts to define mechanistic connections between individual microbial strains and host phenotypes. One key way in which the gut microbiome influences host physiology is through the production of small molecules^1,2,3, yet progress in elucidating this chemical interplay has been hindered by limited tools calibrated to detect the products of anaerobic biochemistry in the gut. Here we construct a microbiome-focused, integrated mass-spectrometry pipeline to accelerate the identification of microbiota-dependent metabolites in diverse sample types. We report the metabolic profiles of 178 gut microorganism strains using our library of 833 metabolites. Using this metabolomics resource, we establish deviations in the relationships between phylogeny and metabolism, use machine learning to discover a previously undescribed type of metabolism in Bacteroides, and reveal candidate biochemical pathways using comparative genomics. Microbiota-dependent metabolites can be detected in diverse biological fluids from gnotobiotic and conventionally colonized mice and traced back to the corresponding metabolomic profiles of cultured bacteria. Collectively, our microbiome-focused metabolomics pipeline and interactive metabolomics profile explorer are a powerful tool for characterizing microorganisms and interactions between microorganisms and their host.

**Fig. 1: A microbiome-focused metabolomics pipeline enables the mechanistic interrogation of microbiome metabolism.**

**Fig. 2: Relationships between phylogeny, taxonomy and metabolome.**

**Fig. 3: Discovery of nitrogen-assimilation strategies in *Bacteroides* and previously undescribed gene–phenotype relationships.**

**Fig. 4: Metabolic contribution by individual gut microorganisms in a multi-species community.**

Data availability

All raw data from metabolomics are publicly available from the Metabolomics Workbench under study number ST001683 for in vivo data and study number ST001688 for in vitro data. MS/MS libraries generated using the qTOF and QE instruments are publicly accessible in the MoNA spectrum database (https://mona.fiehnlab.ucdavis.edu) and can be queried using the keywords ‘Sonnenburg Lab MS2 Library’.

Code availability

Custom Python code was written to enable the construction of the MS/MS libraries, the processing and visualization of the in vitro and in vivo LC–MS data, the optical density and growth curve data, the bioinformatics analysis of 16S and whole genomes, and the analysis of the metabolomic data. Full code for each of these steps is available at https://doi.org/10.5281/zenodo.4890994. The JavaScript code supporting the interactive, web-based Metabolomics Data Explorer is available at https://doi.org/10.5281/zenodo.4890999.

References

Koppel, N., Maini Rekdal, V. & Balskus, E. P. Chemical transformation of xenobiotics by the human gut microbiota. Science 356, eaag2770 (2017).
Koh, A., De Vadder, F., Kovatcheva-Datchary, P. & Bäckhed, F. From dietary fiber to host physiology: short-chain fatty acids as key bacterial metabolites. Cell 165, 1332–1345 (2016).
Donia, M. S. & Fischbach, M. A. Small molecules from the human microbiota. Science 349, 1254766 (2015).
Rooks, M. G. & Garrett, W. S. Gut microbiota, metabolites and host immunity. Nat. Rev. Immunol. 16, 341–352 (2016).
Cani, P. D. Microbiota and metabolites in metabolic diseases. Nat. Rev. Endocrinol. 15, 69–70 (2019).
Sonnenburg, J. L. & Bäckhed, F. Diet-microbiota interactions as moderators of human metabolism. Nature 535, 56–64 (2016).
Kasahara, K. & Rey, F. E. The emerging role of gut microbial metabolism on cardiovascular disease. Curr. Opin. Microbiol. 50, 64–70 (2019).
Lynch, J. B. & Hsiao, E. Y. Microbiomes as sources of emergent host phenotypes. Science 365, 1405–1409 (2019).
Nemet, I. et al. A cardiovascular disease-linked gut microbial metabolite acts via adrenergic receptors. Cell 180, 862–877.e22 (2020).
Koh, A. et al. Microbially produced imidazole propionate impairs insulin signaling through mTORC1. Cell 175, 947–961.e17 (2018).
Rinschen, M. M., Ivanisevic, J., Giera, M. & Siuzdak, G. Identification of bioactive metabolites using activity metabolomics. Nat. Rev. Mol. Cell Biol. 20, 353–367 (2019).
Blaženović, I., Kind, T., Ji, J. & Fiehn, O. Software tools and approaches for compound identification of LC-MS/MS data in metabolomics. Metabolites 8, 31 (2018).
Guijas, C. et al. METLIN: a technology platform for identifying knowns and unknowns. Anal. Chem. 90, 3156–3164 (2018).
Wishart, D. S. et al. HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 46 (D1), D608–D617 (2018).
Kanehisa, M., Sato, Y., Furumichi, M., Morishima, K. & Tanabe, M. New approach for understanding genome variations in KEGG. Nucleic Acids Res. 47 (D1), D590–D595 (2019).
Sampson, T. R. & Mazmanian, S. K. Control of brain development, function, and behavior by the microbiome. Cell Host Microbe 17, 565–576 (2015).
Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
Goldford, J. E. et al. Emergent simplicity in microbial community assembly. Science 361, 469–474 (2018).
Kamneva, O. K. Genome composition and phylogeny of microbes predict their co-occurrence in the environment. PLOS Comput. Biol. 13, e1005366 (2017).
Borenstein, E., Kupiec, M., Feldman, M. W. & Ruppin, E. Large-scale reconstruction and phylogenetic analysis of metabolic environments. Proc. Natl Acad. Sci. USA 105, 14482–14487 (2008).
Plata, G., Henry, C. S. & Vitkup, D. Long-term phenotypic evolution of bacteria. Nature 517, 369–372 (2015).
Tofalo, R., Cocchi, S. & Suzzi, G. Polyamines and gut microbiota. Front. Nutr. 6, 16 (2019).
Qi, H. et al. Lactobacillus maintains healthy gut mucosa by producing l-ornithine. Commun. Biol. 2, 171 (2019).
Zúñiga, M., Pérez, G. & González-Candelas, F. Evolution of arginine deiminase (ADI) pathway genes. Mol. Phylogenet. Evol. 25, 429–444 (2002).
Tabor, C. W. & Tabor, H. Polyamines in microorganisms. Microbiol. Rev. 49, 81–99 (1985).
Varel, V. H. & Bryant, M. P. Nutritional features of Bacteroides fragilis subsp. fragilis. Appl. Microbiol. 28, 251–257 (1974).
Mars, R. A. T. et al. Longitudinal multi-omics reveals subset-specific mechanisms underlying irritable bowel syndrome. Cell 182, 1460–1473.e17 (2020).
Wastyk, H. C. et al. Gut microbiota-targeted diets modulate human immune status. Preprint at https://doi.org/10.1101/2020.09.30.321448 (2020).
Kotagale, N. R., Taksande, B. G. & Inamdar, N. N. Neuroprotective offerings by agmatine. Neurotoxicology 73, 228–245 (2019).
Winter, T. N., Elmquist, W. F. & Fairbanks, C. A. OCT2 and MATE1 provide bidirectional agmatine transport. Mol. Pharm. 8, 133–142 (2011).
Chin, R. M. et al. The metabolite α-ketoglutarate extends lifespan by inhibiting ATP synthase and TOR. Nature 510, 397–401 (2014).
Zhou, W. et al. Longitudinal multi-omics of host–microbe dynamics in prediabetes. Nature 569, 663–671 (2019).
Watrous, J. D. et al. Directed non-targeted mass spectrometry and chemical networking for discovery of eicosanoids and related oxylipins. Cell Chem. Biol. 26, 433–442.e4 (2019).
Dumas, M. E. et al. Metabolic profiling reveals a contribution of gut microbiota to fatty liver phenotype in insulin-resistant mice. Proc. Natl Acad. Sci. USA 103, 12511–12516 (2006).
Langille, M. G. I. et al. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat. Biotechnol. 31, 814–821 (2013).
Sberro, H. et al. Large-scale analyses of human microbiomes reveal thousands of small, novel genes. Cell 178, 1245–1259.e14 (2019).
Quinn, R. A. et al. Global chemical effects of the microbiome include new bile-acid conjugations. Nature 579, 123–129 (2020).
Zimmermann, M., Zimmermann-Kogadeeva, M., Wegmann, R. & Goodman, A. L. Mapping human microbiome drug metabolism by gut bacteria and their genes. Nature 570, 462–467 (2019).
Wu, G. D. et al. Comparative metabolomics in vegans and omnivores reveal constraints on diet-dependent gut microbiota metabolite production. Gut 65, 63–72 (2016).
Kim, S. G. et al. Microbiota-derived lantibiotic restores resistance against vancomycin-resistant Enterococcus. Nature 572, 665–669 (2019).
Maini Rekdal, V., Bess, E. N., Bisanz, J. E., Turnbaugh, P. J. & Balskus, E. P. Discovery and inhibition of an interspecies gut bacterial pathway for Levodopa metabolism. Science 364, eaau6323 (2019).
Wikoff, W. R. et al. Metabolomics analysis reveals large effects of gut microflora on mammalian blood metabolites. Proc. Natl Acad. Sci. USA 106, 3698–3703 (2009).
Showalter, M. R. et al. Obesogenic diets alter metabolism in mice. PLoS ONE 13, e0190632 (2018).
Cajka, T., Smilowitz, J. T. & Fiehn, O. Validating quantitative untargeted lipidomics across nine liquid chromatography-high-resolution mass spectrometry platforms. Anal. Chem. 89, 12360–12368 (2017).
Tsugawa, H. et al. MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat. Methods 12, 523–526 (2015).
Kösters, M. et al. pymzML v2.0: introducing a highly compressed and seekable gzip format. Bioinformatics 34, 2513–2514 (2018).
Wu, M. et al. Genetic determinants of in vivo fitness and diet responsiveness in multiple human gut Bacteroides. Science 350, aac5992 (2015).
Shepherd, E. S., DeLoache, W. C., Pruss, K. M., Whitaker, W. R. & Sonnenburg, J. L. An exclusive metabolic niche enables strain engraftment in the gut microbiota. Nature 557, 434–438 (2018).
Tritsch, G. L. & Moore, G. E. Spontaneous decomposition of glutamine in cell culture media. Exp. Cell Res. 28, 360–364 (1962).
Dodd, D. et al. A gut bacterial pathway metabolizes aromatic amino acids into nine circulating metabolites. Nature 551, 648–652 (2017).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Bankevich, A. et al. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).
Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).
Medema, M. H., Takano, E. & Breitling, R. Detecting sequence homology at the gene cluster level with MultiGeneBlast. Mol. Biol. Evol. 30, 1218–1223 (2013).

Acknowledgements

We thank C. Khosla, Y. Dai and the Stanford ChEM-H Metabolomics Knowledge Center for use of the LC–MS-qTOF instrument; A. Shiver, K. C. Huang and A. Cheng for sharing bacterial strains; T. Meyers and T. Cowan for sharing chemical standards; H. C. Wastyk and G. K. Fragiadakis for sharing their metabolite data before publication; J. K. Yang for consultation on web design tools; and T. Le for discussion on MS methods. This work is supported by R01-DK085025, DP1-AT009892, R01-DK101674, gifts from M. and J. Pasquesi, H. Buhr and J. Feiber, a Stanford Discovery Innovation Fund Award (J.L.S.), Chan Zuckerberg Biohub (M.A.F. and J.L.S.), DP1-DK113598 and P01-HL147823 (M.A.F.), K08DK110335 (D.D.), Stanford Dean’s Postdoctoral Fellowship and NRSA F32AG062119 (S.H.), NSF-GRFP DGE-114747 (W.V.T.) and 5T32AI007328-32 (L.G.).

Author information

These authors contributed equally: Shuo Han, Will Van Treuren

Authors and Affiliations

Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
Shuo Han, Will Van Treuren, Bryan D. Merrill, Steven K. Higginbottom, Leah Guthrie, Dylan Dodd & Justin L. Sonnenburg
Microbiology and Immunology Graduate Program, Stanford University School of Medicine, Stanford, CA, USA
Will Van Treuren & Bryan D. Merrill
ChEM-H, Stanford University, Stanford, CA, USA
Curt R. Fischer & Lalla A. Fall
Chan Zuckerberg Biohub, San Francisco, CA, USA
Curt R. Fischer, Brian C. DeFelice, Juan M. Sanchez, Michael A. Fischbach & Justin L. Sonnenburg
Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA
Lalla A. Fall & Dylan Dodd
Department of Bioengineering, Stanford University, Stanford, CA, USA
Michael A. Fischbach
Center for Human Microbiome Studies, Stanford, CA, USA
Justin L. Sonnenburg

Contributions

S.H., W.V.T., D.D., M.A.F. and J.L.S. designed this study. S.H., W.V.T. and S.K.H. performed all bacterial culture and gnotobiotic mouse experiments. S.H. constructed MS qTOF and QE m/z-RT and MS2 spectrum libraries, performed MS experimental validation and MS-DIAL data analysis, conducted comparative genomics and metabolomic distance analyses, constructed the in vivo metabolomics pipeline and database, and designed the Metabolomics Data Explorer. W.V.T. constructed the bacterial strain library, performed phylogenetic analysis, built the in vitro bioinformatics pipeline and database, performed phylogenetic versus metabolomic distance comparisons, and built random forest models. C.R.F. and L.G. conducted chemoinformatics analyses of reference library compounds. B.D.M. built the strain-resolved comparative genomics database. S.H., B.C.D. and J.M.S. developed qTOF and QE MS2 methods and collected data. D.D., L.A.F. and C.R.F. set up the MS1 MS methods. S.H., D.D. and L.A.F. built the authentic compound collection and developed metabolomics sample preparation methods. All authors provided intellectual contributions. S.H., W.V.T. and J.L.S. wrote the paper, and all authors provided feedback. S.H. and W.V.T. contributed equally with author order determined by coin-flip.

Corresponding authors

Correspondence to Dylan Dodd, Michael A. Fischbach or Justin L. Sonnenburg.

Ethics declarations

Competing interests

M.A.F. is a co-founder and director of Federation Bio and Viralogic, a co-founder of Revolution Medicines, and a member of the scientific advisory boards of NGM Bio and Zymergen.

Additional information

Peer review information Nature thanks Gary Siuzdak and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Summary statistics for the MS reference library metabolites, their detection and validation.

a, Chemical similarity network of the compound library. Network nodes, library compounds coloured by their superclasses. Node size, monoisotopic mass. Edges between nodes, substructure similarity values above a z-score threshold of 1 s.d. from the mean. b, Scatter plots and histograms of chemical properties of 833 library metabolites. c, Venn diagram of library compounds that are detected by each of the three methods. d, Venn diagram of compounds (by PubChem CID) identified in the reference compound library (Supplementary Table 1), in vitro conditions (Supplementary Table 7, ‘count.ps’) and in vivo conditions (Supplementary Table 8, ‘istd_corr_ion_count_matrix’). In vitro conditions include all medium types, and in vivo conditions include all sample types: urine, serum, faeces and caecal contents, and all colonization states. e, Scatterplot of all pairwise similarity scores (biological sample versus library) of the same compound searched against the MoNA spectrum database. All library standards (median similarity score = 992) and 97.3% of the corresponding compounds from biological samples (median similarity score = 923) had similarity scores of ≥600, and 2.7% of those compounds from biological samples scored below 600. Confidence levels were determined based on both similarity scores and visual validation of the MS/MS spectra. f, Schematic of the data collection and analysis workflow of the metabolomics pipeline. Panel created with Biorender.com.

Extended Data Fig. 2 Schematic of a custom bioinformatics analysis pipeline that generates a metabolite fold-change matrix.

The pipeline integrates data across multiple experimental runs and minimizes intra-replicate, intra-experiment and inter-experiment variability. The four steps detailed here are explained in depth in the Supplementary Methods (see ‘Custom bioinformatics: in vitro pipeline’ section). Step 1, a database recording sample metadata (organism, media, growth data, and so on) and MS-DIAL output files are integrated into data matrices that are specific to each analytical method. Step 2, all data are grouped by replicate (biological sample groups (BSGs)) and analysed to remove replicates with low intra-replicate correlation. Replicates are then grouped by experiment (EXPs) to assess inter-experiment variability. Transformations reducing inter-experiment variability are identified and compared. For metabolites that are detected by multiple methods, their ion counts are compared on a per-replicate and per-experiment basis to identify one or more methods that consistently detect these metabolites. Step 3, using an internal standard-based correction, ion counts for individual samples are adjusted and transformed into different fold-change data matrices. Step 4, data matrices corresponding to each method are combined into a single data matrix representing all detected metabolites.
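
Step 3 can be illustrated with a minimal Python sketch. It is not the authors' code (that is archived at the Zenodo DOI given under Code availability). The function names, the data layout, the choice to scale each sample by the mean internal-standard signal, and the optional pseudocount are all assumptions made for illustration, and the ion counts are made up.

```python
import math
from statistics import mean


def istd_correct(samples):
    """Scale each sample's ion counts by (mean internal-standard signal) / (its own signal).

    samples: dict of sample name -> {"istd": float, "counts": {metabolite: float}}.
    The exact correction used in the paper may differ; this is an assumed form.
    """
    reference = mean(s["istd"] for s in samples.values())
    return {
        name: {m: c * reference / s["istd"] for m, c in s["counts"].items()}
        for name, s in samples.items()
    }


def log2_fold_change(corrected, culture_names, medium_names, pseudocount=1.0):
    """log2 ratio of mean culture signal to mean medium-control signal, per metabolite."""
    result = {}
    for metabolite in corrected[culture_names[0]]:
        culture = mean(corrected[n][metabolite] for n in culture_names)
        medium = mean(corrected[n][metabolite] for n in medium_names)
        result[metabolite] = math.log2((culture + pseudocount) / (medium + pseudocount))
    return result


# Hypothetical ion counts for one metabolite in one medium control and one supernatant.
example_samples = {
    "medium_1": {"istd": 1000.0, "counts": {"metabolite_x": 200.0}},
    "supernatant_1": {"istd": 500.0, "counts": {"metabolite_x": 400.0}},
}
example_corrected = istd_correct(example_samples)
example_lfc = log2_fold_change(
    example_corrected, ["supernatant_1"], ["medium_1"], pseudocount=0.0
)
print(example_lfc)  # metabolite_x is 4-fold higher after correction, so log2 = 2.0
```

In this toy case the raw counts differ only twofold, but the supernatant's internal standard was recovered at half the level of the control, so the corrected fold change is fourfold.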

Extended Data Fig. 3 High-throughput identification and analysis of diverse metabolites in complex biological matrices.

a, Number of unique compounds (by PubChem CID) within distinct chemical superclasses detected in the m/z-RT reference library (n = 815, 11 superclasses), in vitro dataset (n = 458, 9 superclasses) or in vivo dataset (n = 551, 9 superclasses), excluding internal standards. Nine of the eleven chemical superclasses in the reference library are represented in the metabolites detected in vitro and in vivo. The two remaining library superclasses (organosulfur and organometallic compounds) not represented in the experimental data contain one compound each. b, Diverse classes of metabolites identified in the conventional mouse caecum. Representative metabolites shown are significantly elevated (≥4-fold, corrected P < 0.05) in conventional mice versus germ-free controls in one experiment with n = 3 (conventional) and n = 4 (germ-free) mice. P values were calculated using two-tailed Student’s t-tests with Benjamini–Hochberg correction for multiple comparisons. c, Examples of precursors, intermediates and products from the tryptophan fermentation pathway that were identified by our methods both in vitro (C. sporogenes culture supernatant) and in vivo (C. sporogenes mono-colonization caecal contents). Extracted ion chromatogram peaks representing relative ion counts for each metabolite are shown. d, e, Histograms of changes in RT (d) and total ion count (e) for 132 spike-in metabolites in five complex biological matrices using three analytical methods. All spiked-in metabolites show minimal change in RT, falling within a conservative ±0.1-min search window from their RTs as determined in the library control condition (d). The majority of spiked-in metabolites (for example, 97% in faeces) exhibit less than fourfold change in ion counts relative to those detected in the library control condition (e). Representative examples of RT shifts (d) and changes in total ion counts (e) in individual metabolites in the mouse faecal matrix are shown. Data are mean ± s.e.m. of one experiment with n = 3 biological replicates. f, Histograms of linear ranges of 377 reference library metabolites measured in serial dilutions. A representative linear range of 5-hydroxyindole is shown. g, Violin plots (median, quartiles) of differences in RTs measured by three analytical methods between distinct MS instruments: the qTOF 6454, with which the library was built, was compared with a second instrument: a qTOF 6530 for a shared panel of 219 reference library metabolites (top) or an Orbitrap QE for a shared panel of 773 reference library metabolites (bottom). Mean RT differences (in min) between two instruments by each method (C18-positive, C18-negative, HILIC-positive, respectively) were as follows: qTOF versus qTOF, pre-correction: 0.238, 0.044, −0.110; post-correction: −0.023, −0.020, 0.015; qTOF versus QE, pre-correction: 0.151, 0.027, 0.196; post-correction: −0.040, −0.021, 0.026. Per method, RT correction was performed by polynomial transformation of the library based on inter-instrumental RT shifts of 10–20 robustly detected metabolites. Per method, using the corrected library with a RT tolerance window of 0.2 min, around 99% of the 219 metabolites tested on the second qTOF and about 94% of the 773 metabolites tested on the QE were correctly identified.
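
The RT correction described in g can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published implementation: the legend says only 'polynomial transformation', so the quadratic degree is an assumption, as are the function names, the hand-written least-squares fit (chosen so that the example needs only the standard library) and all RT values, which are synthetic.

```python
def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial fit via the normal equations (adequate for small inputs).

    Returns coefficients [c0, c1, ...] such that y is approximated by the sum of c_i * x**i.
    """
    n = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def apply_polynomial(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def match_peak(observed_rt, corrected_library, tolerance=0.2):
    """Names of library entries whose corrected RT lies within the tolerance window (min)."""
    return [
        name for name, rt in corrected_library.items()
        if abs(rt - observed_rt) <= tolerance
    ]


# Synthetic anchors: library RTs on the first instrument and the RTs of the same
# 10 robustly detected metabolites on a second instrument (values are made up).
anchor_library_rt = [1.0, 2.0, 3.0, 4.5, 6.0, 7.0, 8.5, 9.5, 11.0, 12.0]
anchor_observed_rt = [0.15 + 1.01 * x + 0.002 * x * x for x in anchor_library_rt]

rt_coeffs = fit_polynomial(anchor_library_rt, anchor_observed_rt, degree=2)

# Transform the whole library onto the second instrument's RT scale, then match.
library_rt = {"compound_x": 5.0, "compound_y": 9.0}
corrected_library = {n: apply_polynomial(rt_coeffs, rt) for n, rt in library_rt.items()}
print(match_peak(5.3, corrected_library))
```

A low polynomial degree is used here because only 10–20 anchors are available per method; a higher degree would risk overfitting the shift.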

Extended Data Fig. 4 Conserved and unique metabolomic signatures across bacterial taxa.

a, Schematic of our high-throughput bacterial culture and sample collection workflow. Panel created with Biorender.com. b, Intra-replicate Pearson correlation coefficients (triplicates and greater) stratified by fourteen independent bacterial culture experiments and three analytical methods. For each experiment, Pearson correlation r values were calculated for all supernatant and medium sample replicate groups: n = 346 (C18-positive), n = 344 (C18-negative) and n = 344 (HILIC-positive). Total ion count data were corrected by internal standards and log-transformed, standardized and scaled, before computing Pearson correlation values. Box, median, 25th and 75th percentiles; whiskers, Tukey’s method. c, Left, number of medium-specific or common metabolites detected in the same bacterial strain grown in two different media (29 strains cultured in two or more of the 12 different media). Each dot represents the total number of metabolites from a single comparison between two media in which a strain has been grown: n = 58 (co-detected in two media), n = 116 (detected in one of the two media), n = 33 (detected in the mega medium) and n = 16 (detected in polyamine-free medium). Box, median, 25th and 75th percentiles; whiskers, minimum and maximum. Right, agmatine production levels by B. eggerthii. Data are mean ± s.e.m. from 2–3 independent experiments, each with n = 3 biological replicates. P values, two-tailed t-test with Benjamini–Hochberg correction for multiple comparisons. d, Heat map of metabolomic profiles of 158 bacterial strains grown in mega medium, clustered by 16S phylogenetic distance. Individual metabolites are hierarchically clustered (Ward’s method) using Euclidean distance between the fold-change (log₂-transformed) values across all taxonomies. Metabolites shown are detected in at least 50% of the 158 taxonomies to enable Ward clustering. e, f, Production or consumption patterns of tyramine and pantothenic acid across 158 strains grown in mega medium. 
Data are mean ± s.e.m. from 1–3 independent experiments (identified by dot colour), each with n ≥ 3 biological replicates.
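
The intra-replicate correlation check in b can be sketched as below. This is a simplified Python illustration, not the authors' pipeline: it log-transforms internal-standard-corrected ion counts and computes Pearson r for every pair of replicates, but omits the standardize-and-scale step mentioned in the legend. The function names, the pseudocount and the example ion counts are assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def min_pairwise_r(replicates, pseudocount=1.0):
    """Lowest Pearson r among all replicate pairs, computed on log10-transformed counts.

    replicates: list of ion-count vectors, one per replicate, with metabolites
    in the same order in every vector.
    """
    logged = [[math.log10(v + pseudocount) for v in rep] for rep in replicates]
    return min(pearson(a, b) for a, b in combinations(logged, 2))


# Made-up ion counts for three metabolites in two replicate groups.
consistent_group = [[100, 1000, 10000], [110, 900, 12000], [95, 1100, 9000]]
inconsistent_group = [[100, 1000, 10000], [110, 900, 12000], [10000, 100, 1000]]

r_consistent = min_pairwise_r(consistent_group)
r_inconsistent = min_pairwise_r(inconsistent_group)
print(round(r_consistent, 3), round(r_inconsistent, 3))
```

A replicate group whose minimum pairwise r falls below a chosen cut-off could then be flagged for removal; the cut-off itself is not stated in this legend.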

Extended Data Fig. 5 Metabolic profile variation among related bacteria.

a, Pairwise metabolomic profile comparisons between two closely related strains grown in mega medium: C. sporogenes ATCC 15579 and C. cadaveris HM-1039 (subpanel 1), and among four strains of Bacteroides fragilis (subpanels 2–7): HM-710, HM-711, HM-714 and HM-20. Each dot represents an averaged fold-change value (log₂-transformed) from 1–3 independent experiments, each with n = 3 biological replicates. Pearson correlation r values of pairwise metabolomic profile comparisons, performed on standardized and scaled data: ATCC 15579 versus HM-1039 (r = 0.063), HM-711 versus HM-710 (r = 0.859), HM-714 versus HM-710 (r = 0.866), HM-714 versus HM-711 (r = 0.880), HM-20 versus HM-710 (r = 0.829), HM-20 versus HM-711 (r = 0.845) and HM-20 versus HM-714 (r = 0.807). b, Metabolic similarities and variations among closely related species of C. sporogenes and C. cadaveris, and among different strains of the same species of B. fragilis grown in mega medium. Taxonomies shown are clustered by 16S phylogenetic distance, and are coloured according to the distinct phyla. Data are mean ± s.e.m. from 1–3 independent experiments, each with n = 3 biological replicates.

Extended Data Fig. 6 Relationships between phylogeny, taxonomy and metabolome.

a, Metabolomic profiles of 158 bacterial strains grown in mega medium. Individual taxonomies are clustered by metabolomic profile distances (fold change, log₂-transformed) across all metabolites. Individual metabolites are hierarchically clustered (Ward’s method) using Euclidean distance between the fold-change (log₂-transformed) values across all taxonomies. Metabolites shown are detected in at least 50% of the 158 taxonomies to enable Ward clustering. b, Metabolic similarities between two phylogenetically distant species grown in mega medium. Taxonomies are clustered by metabolomic profile distances (fold change, log₂-transformed) across all metabolites. Data are mean ± s.e.m. of one experiment with n = 3 biological replicates. c, Scatter plot of pairwise metabolomic profile comparison between two phylogenetically distant species. Each dot represents an averaged fold-change value (log₂-transformed) of one experiment with n = 3 biological replicates. Pearson correlation of pairwise metabolomic profile comparison between these two species, performed on standardized and scaled fold-change data, r = 0.7090. d, Venn diagram of unique and overlapping compounds (by PubChem CID) identified in the culture supernatant of 158 mega-medium grown strains and caecal contents of conventional mice.
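
Ward clustering on Euclidean distance, as used for the metabolites in a, can be illustrated with a small pure-Python sketch. It is a naive centroid-based version written for readability and is not the authors' code; in practice a library routine such as `scipy.cluster.hierarchy.linkage(..., method="ward")` would normally be used. The function names, the metabolite labels and the log₂ fold-change values are hypothetical, and the sketch assumes every profile is complete (no missing values).

```python
from itertools import combinations


def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b were merged."""
    ca, cb = centroid(a), centroid(b)
    squared_distance = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_distance


def ward_clusters(profiles, k):
    """Agglomerate profiles (name -> vector) with Ward's criterion until k clusters remain."""
    clusters = [([name], [vector]) for name, vector in profiles.items()]
    while len(clusters) > k:
        # Merge the pair of clusters with the smallest Ward cost.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(clusters[ij[0]][1], clusters[ij[1]][1]),
        )
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [set(names) for names, _ in clusters]


# Hypothetical log2 fold changes of four metabolites across three strains.
example_profiles = {
    "met_a": [2.0, 2.1, -1.0],
    "met_b": [1.9, 2.0, -1.1],
    "met_c": [-3.0, -2.8, 0.5],
    "met_d": [-3.1, -3.0, 0.4],
}
example_clusters = ward_clusters(example_profiles, k=2)
print(example_clusters)
```

The requirement that metabolites be detected in at least 50% of strains matters here because Euclidean distances, and therefore the Ward criterion, are not well defined for profiles that are mostly missing.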

Extended Data Fig. 7 Multiple data transformations identify nonlinear relationship between phylogenetic and metabolomic distance.

a, Heat map showing the comparison of phylogenetic and metabolomic tree topologies. Cells record the number of tips for which the neighbourhoods share more overlap than expected (P < 0.05; one-sided permutation test). Data are stratified by fractional overlap of neighbourhoods and permutation probability (see Supplementary Methods, ‘Distance comparisons’). b, Histogram of chemical similarity scores (based on Tanimoto 2D structures) between each unique pair of compounds (by PubChem CID) detected in the in vitro dataset. For this pairwise comparison, 359 non-co-eluting compounds were used. c, Metabolomic distance tree with each metabolite weighted based on its chemical similarity (left) or unweighted control metabolomic distance tree (right). The weighted and unweighted matrices were calculated using uniquely detected, non-co-eluting compounds in the in vitro dataset, for which a unique PubChem CID can be assigned to each compound. Two-sided Mantel test for comparison between the weighted and unweighted distance matrices: r² = 0.863, P = 0.001. d, Left, correlation of phylogenetic and metabolomic distance across pairs of strains coloured by lowest shared taxonomic rank with a LOESS fit shown. Dashed vertical line occurs at x = 0.11 as referenced in the text. Right, metabolomic distance between pairs of strains binned by the lowest shared taxonomic rank. Species (n = 111), genus (n = 1,386), family (n = 159), order (n = 1,222), class (n = 34), phylum (n = 1,442) and kingdom (n = 8,442). Box, median, 25th and 75th percentiles; whiskers, Tukey’s method. e–i, Internal-standard-corrected fold-change data (e–g) and internal-standard-corrected total ion count data (h, i) were log-transformed and used to calculate pairwise metabolomic distances between microbial taxa. These distances were compared to the corresponding pairwise phylogenetic distances generated from a tree built with the V4 region of 16S (left) or the full-length 16S gene (right). Data are plotted with a LOESS fit. Set 1, microorganisms grown in at least one experiment simultaneously. Set 2, microorganisms grown in the same experiment only. j, Phylogenetic tree constructed using the full 16S sequences of a subset of the strains grown in mega medium. Only strains with available full 16S sequences are shown (Supplementary Table 6). k, Left, schematic of the pathway that synthesizes citrulline and ornithine, or synthesizes agmatine and/or putrescine. Right, the top six matches identified by the comparative genomics tool MultiGeneBlast within a 40-kb search window, when searched against a genomic database of our strain collection with sequenced genomes. Horizontal dashed lines between genes represent multiple other genes present within the search window.
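
For readers unfamiliar with the Mantel test reported in panel c, the following is a minimal sketch of a two-sided permutation Mantel test in plain Python. It is not the authors' implementation: the function names, the use of Pearson correlation, the default of 999 permutations and the fixed random seed are all assumptions made for illustration. The function returns r rather than the r² value quoted in the legend.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mantel(d1, d2, permutations=999, seed=0):
    """Two-sided Mantel test between two symmetric distance matrices.

    Rows and columns of d2 are permuted together, so the dependence
    between entries that share a sample is preserved under the null.
    The permutation count and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    observed = pearson(x, upper_triangle(d2))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(pearson(x, upper_triangle(permuted))) >= abs(observed):
            hits += 1
    # Count the observed statistic as one of the permutations.
    p_value = (hits + 1) / (permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Toy data: distances between six hypothetical samples placed on a line.
    points = [0, 1, 3, 6, 10, 15]
    weighted = [[abs(a - b) for b in points] for a in points]
    unweighted = [[2 * abs(a - b) for b in points] for a in points]
    r, p = mantel(weighted, unweighted)
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}, P = {p:.3f}")
```

With 999 permutations the smallest attainable P value under this scheme is 0.001, which is one possible reading of the value reported above.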

Extended Data Fig. 8 Asparagine and glutamine can be used as sole nitrogen sources by most tested Bacteroidetes.

a, Top, an example decision tree from a forest that can differentiate Bacteroidetes from bacteria in the other four represented phyla with >97% accuracy. For each decision node, phylum-level increases and decreases based on metabolite levels are shown (relative fold change compared to the bacterial medium controls, log₂-transformed). Actinobacteria (n = 20), Bacteroidetes (n = 57), Firmicutes (n = 83), Fusobacteria (n = 3) and Proteobacteria (n = 10). Dashed line, metabolite threshold. Box, median, 25th and 75th percentiles; whiskers, Tukey’s method. Bottom, the 10 most important features differentiating the five tested phyla. Data are shown as median metabolite log₂-fold-change values for each phylum; metabolites and phyla are ordered by Ward linkage distance. b, Representative growth curves from two independent experiments, each with n = 3 biological replicates for a subset of Bacteroides spp. using modified SMM with the indicated nitrogen source. Legend colours for the sole nitrogen source are the same in b–d. c, Representative growth curves of one experiment with n = 5 biological replicates for 60 Bacteroidetes using modified SMM with the indicated nitrogen sources. d, Growth curves of wild-type and mutant B. thetaiotaomicron (Bt) grown in defined minimal media with either cysteine (top) (one experiment, n = 3 biological replicates) or sodium sulfide (Na₂S, bottom) as sole reduced sulfur sources (one experiment, n = 3 biological replicates). e, Amino acid production and consumption levels in gnotobiotic mice mono-colonized with B. thetaiotaomicron (one experiment, n = 5 mice). Box, median, 25th and 75th percentiles; whiskers, Tukey’s method. Numeric labels in b and c correspond to the following: 1, B. acidifaciens DSMZ 15896; 2, B. caccae ATCC 43185; 3, B. caccae BEI HM-728; 4, B. cellulosilyticus BEI HM-726; 5, B. cellulosilyticus DSMZ 14838; 6, B. coprophilus DSMZ 18228; 7, B. dorei BEI HM-29; 8, B. dorei BEI HM-717; 9, B. dorei BEI HM-718; 10, B. dorei BEI HM-719; 11, B. dorei DSMZ 17855; 12, B. eggerthii ATCC 27754; 13, B. eggerthii DSMZ 20697; 14, B. finegoldii BEI HM-727; 15, B. finegoldii DSMZ 17565; 16, B. fragilis BEI HM-20; 17, B. fragilis BEI HM-710; 18, B. fragilis BEI HM-711; 19, B. fragilis BEI HM-714; 20, B. fragilis NCTC 9343; 21, B. intestinalis DSMZ 17393; 22, B. ovatus ATCC 8483; 23, B. ovatus BEI HM-222; 24, B. pectinophilus ATCC 43243; 25, B. plebeius DSMZ 17135; 26, B. salyersiae BEI HM-725; 27, Bacteroides sp. BEI HM-18; 28, Bacteroides sp. BEI HM-189; 29, Bacteroides sp. BEI HM-19; 30, Bacteroides sp. BEI HM-22; 31, Bacteroides sp. BEI HM-23; 32, Bacteroides sp. BEI HM-258; 33, Bacteroides sp. BEI HM-27; 34, Bacteroides sp. BEI HM-28; 35, Bacteroides sp. BEI HM-58; 36, B. stercoris ATCC 43183; 37, B. stercoris BEI HM-1036; 38, B. thetaiotaomicron 3730; 39, B. thetaiotaomicron 3731; 40, B. thetaiotaomicron 633; 41, B. thetaiotaomicron 7330; 42, B. thetaiotaomicron 7853; 43, B. thetaiotaomicron 8702; 44, B. thetaiotaomicron 8713; 45, B. thetaiotaomicron 8736; 46, B. thetaiotaomicron 940; 47, B. thetaiotaomicron VPI 5482; 48, B. thetaiotaomicron WH302; 49, B. thetaiotaomicron WH305; 50, B. uniformis ATCC 8492; 51, B. vulgatus ATCC 8482; 52, B. vulgatus BEI HM-720; 53, B. xylanisolvens DSMZ 18836; 54, P. distasonis ATCC 8503; 55, P. distasonis BEI HM-169; 56, P. johnsonii BEI HM-731; 57, P. johnsonii DSMZ 18315; 58, P. merdae ATCC 43184; 59, P. merdae BEI HM-729; 60, P. merdae BEI HM-730.

Extended Data Fig. 9 Metabolic contribution by individual gut microorganisms in a multi-species community.

a, α-Ketoglutaric acid levels in faeces of mice mono-colonized with Anaerostipes sp. BEI HM-220. Data are mean ± s.e.m. of two independent experiments, each with n = 4 mice (germ-free) or n = 5 or 7 mice (Anaerostipes mono-colonized). b, Left, MDMs were associated with specific bacterial phyla leveraging both in vivo and in vitro metabolomic data. Right, number of bacterial strains grown in mega medium by phylum that produce MDMs identified in the caecal contents of mice colonized with B. thetaiotaomicron (Bt, n = 5) or C. sporogenes (Cs, n = 3), or with a six-member community (n = 3). Numbers of strains that produce at least one of these metabolites in vitro by phylum: Bacteroidetes, n = 52; Firmicutes, n = 60; Proteobacteria, n = 8; Actinobacteria, n = 16; and Fusobacteria, n = 3. Each metabolite shown was significantly produced both in vitro and in vivo (≥4-fold, corrected P < 0.05). Uniquely detected (non-co-eluting) metabolites are shown (Supplementary Table 9). c, Spearman correlation between metabolomic profiles (standardized and scaled, log₂-transformed, fold-change data) of individual B. thetaiotaomicron- or C. sporogenes-mono-colonized host biofluids (caecal contents, faeces, serum or urine) and individual bacterial culture (158 strains grown in mega medium). Coloured dots, Spearman’s ρ values calculated by comparing metabolomic profiles of individual bacterial culture versus individual biofluid of either B. thetaiotaomicron- or C. sporogenes-mono-colonized mice. Black dots, Spearman’s ρ calculated using metabolomic profiles of B. thetaiotaomicron or C. sporogenes, the same strains used for mono-colonization in mice. d, Venn diagram of overlapping metabolites that are significantly produced (≥4-fold, corrected P < 0.05) in culture and in the caecum of colonized mice. e, Principal component analysis separates metabolomic profiles of identified metabolites by sample type in each colonization state. P values on metabolomic profile comparisons between different sample types of the same colonization state were determined using PERMANOVA: six-member community (P = 0.073) and all other colonization states (P = 0.001). f, Principal component analysis separates metabolomic profiles of identified metabolites by colonization states. P values on metabolomic profile comparisons between different colonization states of the same sample type were determined using PERMANOVA: P = 0.001 for all four sample types. g, h, Example chemical structures of significantly produced metabolites (≥4-fold, corrected P < 0.05) in serum (g) or urine (h) by each colonization state corresponding to Fig. 4b. a, b, d, g, h, P values were determined using two-tailed Student’s t-tests with Benjamini–Hochberg correction for multiple comparisons.
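
The significance criterion used throughout this legend (≥4-fold, corrected P < 0.05) can be illustrated with a short sketch of the Benjamini–Hochberg adjustment followed by a fold-change filter. This is a hedged illustration in plain Python, not the authors' pipeline: the function names and the toy inputs are hypothetical, fold changes are assumed to be on a linear scale, and only increases (production) are tested.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significantly_produced(fold_changes, p_values, min_fold=4.0, alpha=0.05):
    """Indices of metabolites passing both the fold-change and corrected-P cut-offs.

    The thresholds mirror the legend (>=4-fold, corrected P < 0.05);
    linear-scale fold changes are an assumption of this sketch.
    """
    adjusted = benjamini_hochberg(p_values)
    return [
        i
        for i, (fold, q) in enumerate(zip(fold_changes, adjusted))
        if fold >= min_fold and q < alpha
    ]


if __name__ == "__main__":
    # Hypothetical fold changes and raw t-test P values for four metabolites.
    folds = [8.0, 2.0, 5.0, 4.0]
    raw_p = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw_p))
    print(significantly_produced(folds, raw_p))
```

A filter for reduced metabolites would apply the same correction and test the reciprocal fold change instead.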

Extended Data Fig. 10 Metabolic contribution of multi-species communities in gnotobiotic mice.

a, Proposed host–microbial co-metabolism pathways that could lead to the synthesis of specific host–microbial co-metabolites in the urine and serum of mice colonized with the six-member community. b, c, Metabolite levels in urine (b) and caecal contents (c) of mice colonized with the six-member community (+Cs) or the five-member community (−Cs). Metabolites shown represent a panel of significantly elevated or reduced metabolites (≥4-fold, corrected P < 0.05) in the six-member community. Superscript ‘1’ in metabolite names, co-eluting metabolites as annotated in the MS reference library (Supplementary Table 1). Superscript ‘2’ in metabolite names, co-eluting isomeric metabolites with truncated names in the figure (2-hydroxy-3-methylpentanoic acid, 2-hydroxy-4-methylpentanoic acid; and α-galactose 1-phosphate, α-glucose 1-phosphate, glucose-6-phosphate, mannose 6-phosphate). Data are mean ± s.e.m. of one experiment with n = 6 (urine, six-member community), n = 7 (urine, five-member community) and n = 3 (caecal, both six-member and five-member communities). b, c, P values were calculated using two-tailed Student’s t-tests with Benjamini–Hochberg correction for multiple comparisons. *P < 0.05, **P < 0.01, ***P < 0.001. b, Venn diagram (right) of significantly elevated and reduced metabolites in individual host biofluids (caecal contents, serum and urine) using the same threshold in b (left).

Supplementary information

Supplementary Information

This file contains Supplementary Methods and a full guide to Supplementary Tables 1-9. The Supplementary Methods include mass spectrometry LC/MS methods, data analysis, bacterial sequencing and phylogenetics, distance comparisons and classifiers, the Mega Medium (MM) preparation protocol and Salyer’s Minimal Medium (SMM) preparation protocol.

Reporting Summary

Supplementary Table 1

Mass spectrometry compound m/z-RT reference library.

Supplementary Table 2

MS/MS validation of biological metabolites and library standards, matrix effects on ion count and RT, and linear range of individual metabolites.

Supplementary Table 3

MS/MS spectra library constructed on qTOF and QE instruments.

Supplementary Table 4

Inter-instrumental retention time shift and correction.

Supplementary Table 5

In vitro sample metadata.

Supplementary Table 6

Culture and strain information.

Supplementary Table 7

In vitro data matrices.

Supplementary Table 8

The in vivo database and in vivo pipeline output data matrices.

Supplementary Table 9

List of “phylum-associated” metabolites.

About this article

Cite this article

Han, S., Van Treuren, W., Fischer, C.R. et al. A metabolomics pipeline for the mechanistic interrogation of the gut microbiome. Nature 595, 415–420 (2021). https://doi.org/10.1038/s41586-021-03707-9

Received: 16 July 2020
Accepted: 08 June 2021
Published: 14 July 2021
Issue Date: 15 July 2021
DOI: https://doi.org/10.1038/s41586-021-03707-9
