Introduction

Metabolism plays a foundational role in essentially all aspects of life and disruptions in metabolism can affect a wide range of basic functions including nutrition, athletic performance, immune function, pain perception, and the progression of both chronic and infectious diseases1,2,3,4,5,6,7. Although mammalian metabolism has been intensively studied for over a century, it has primarily been investigated through the lens of host metabolic function. However, over the last 20 years we have become increasingly aware of the role that microbial communities play in modulating the availability of nutrients and how these microbial modulations can impact homeostasis5. Everything that mammals eat enters the gastrointestinal (GI) tract, where the metabolism of the gut microbiome can transform these molecules and directly influence the complement of nutrients that are passed along to the host3.

Disruptions in the microbiome caused by antibiotics, diet, and disease can alter these intricate host-microbiome metabolic exchanges and affect biological functions throughout the body1,2,3,5,6,8,9,10,11,12. A few examples of diseases that are modulated by microbial metabolism include colitis (e.g., irritable bowel syndrome (IBS) and Crohn’s disease)10,13,14,15,16,17, immune diseases (e.g., multiple sclerosis)1,18,19, neurodegenerative diseases (e.g., amyotrophic lateral sclerosis and Alzheimer’s disease)20,21,22, psychological conditions (e.g., depression)8,23, cystic fibrosis24,25,26,27,28,29, cancer30,31,32, and cardiovascular disease33. Microbial metabolism can also play a role in the pharmacokinetics of drugs9,34,35. These surprising associations have led researchers to investigate the microbiome as a vehicle for stimulating specific metabolic activities21,36,37,38 and as a tool to modulate inflammation39,40,41.

The links between microbial metabolism and human diseases are now well established thanks to large-scale efforts, such as the >15,000 fecal samples collected by the American Gut Project42. These efforts have helped demonstrate links between the microbiome and diverse conditions including IBS, cystic fibrosis, diabetes mellitus, and cancer31,43,44,45. While identifying the specific molecular mechanisms contributing to these diseases remains challenging, recent advances in metabolomics technologies allow us to capture a broad swath of central carbon metabolism and track the microbial metabolism of carbon chains as they are passed through networks of over 5000 reactions46,47,48,49. Metabolomics technology, when coupled to animal models, isotope tracing studies, and in vitro organ models, allows researchers to probe the complexities of host-microbiome metabolic dynamics with a greater degree of experimental control than was previously possible (Fig. 1, Table 1). The main objective of this review is to examine the unique challenges encountered in microbiome-metabolism studies and discuss how these emerging tools and techniques can provide insights into the molecular mechanisms at work behind complex host–microbiome interactions. We highlight examples of how these tools can be used to study host-microbiota and microbe-microbe interactions, as well as interactions of the microbiome with diet and pharmaceuticals (Table 1; see Supplementary Table 1 for more examples).

Fig. 1: Established model systems for studying host-microbiome-related phenomena.

Figure created with BioRender, available at Biorender.com.

Table 1 Established host-microbiome metabolic phenotypes along different axes of interaction, including host-microbe, microbe-microbe, and microbes with the external environment.

Biological complexities in microbiome metabolic studies

Complex microbial ecosystems are found throughout the integument, GI tract, airways, mucosa, and urogenital tract5. The environmental conditions that microbes encounter at these sites vary dramatically with respect to pH, oxygen, bicarbonate, and nutrient availability and these site-to-site differences can have a profound impact on which microbes inhabit the niche50. Since the metabolic capacity of microbes differs considerably species to species, these differences in microbial community composition can have a profound impact on the metabolic capacity of the overall system51,52,53,54,55. Moreover, the ensemble action of host and microbial enzymes creates a more complex metabolic network than exists in any individual species5,6.

Microbiome metabolism is further complicated by the multi-species pathways that nutrients can take through the microbial ecosystem. The metabolic waste products of some microbes are the preferred carbon sources for others. Succinate, for example, is a primary waste product of Enterobacteriales56 but is also one of the preferred carbon sources of Pseudomonas aeruginosa57,58,59. These differences in nutritional strategies enable microbial cross-feeding interactions that create multi-species metabolic networks in the microbiome50,60. Cross-feeding can significantly alter the metabolic capacity of systems24,25,27,28,52,61 and modulate microbial phenotypes (e.g., their sensitivity to antibiotics)35,53,54. Cross-feeding interactions also enable some species to survive in otherwise inhospitable environments23,35,62. For example, a previously unreported genus of bacteria (KLE1738) was recently found to depend on γ-aminobutyric acid (GABA)-producing Bacteroides fragilis for growth23. Cross-feeding interactions are also thought to play a role in a range of health issues, including periodontal health62 and the clinical progression of pulmonary infections in cystic fibrosis patients24,25,28,29, and may contribute to the overgrowth of pathobionts in response to undernutrition63.

Tracking metabolism through multi-species networks

Decoupling the complex flow of molecules from organism to organism is a non-trivial challenge. To study cross-feeding, researchers have employed a variety of methods including genome-scale metabolic reconstruction28,64, computational modeling with in vitro data28,65,66,67,68, and in vitro checkerboard assays53,54 (Table 1; Supplementary Table 1). Recently, metabolomics has played a larger role in dissecting these dynamics, largely via the use of isotope tracing experiments69,70,71. Isotope tracing approaches track the flow of stable isotope-labeled nutrients (typically 13C, 15N or 2H) through microbial communities and their exchange with the host23,24,61,72. This strategy has been used to identify microbe-specific biomarkers23,33,61, demonstrate the exchange of nutrients from microbes to the host72, and demonstrate syntrophic relationships between microbes that overcome nutrient imbalance in the diet61. Though powerful, isotope labeling approaches are a serious analytical undertaking, especially in the context of untargeted and semi-targeted metabolomics studies. The multi-species metabolic networks can scramble isotope labeling and make it difficult to predict which molecules and which isotopologues (i.e., forms of a molecule that differ in the number of isotopically labeled atoms they contain) will be produced from the microbiome-linked processing of a precursor. This uncertainty can create computational challenges because it requires all possible metabolites to be screened for all possible isotopologues. This dramatically increases the search space and thereby the likelihood of misidentifying metabolites in the context of untargeted/semi-targeted studies. Although this can be partially mitigated via high-resolution mass spectrometry73, additional care must be given to the molecular assignment process since mass and retention time alone may not be sufficient to unambiguously identify metabolites in isotopically complex microbiome extracts.
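To make the growth of the search space concrete, the following minimal sketch enumerates the 13C mass isotopologues (M+0 to M+n) for a few candidate metabolites. The candidate list, the function names, and the restriction to a single tracer element are illustrative assumptions, not part of any workflow described in this review; the masses are neutral monoisotopic masses.

```python
# Mass difference between 13C and 12C in daltons.
C13_SHIFT = 1.003355

def carbon_isotopologues(monoisotopic_mass, n_carbons):
    """Return (label, mass) pairs for the M+0 .. M+n 13C mass isotopologues."""
    return [(f"M+{k}", monoisotopic_mass + k * C13_SHIFT)
            for k in range(n_carbons + 1)]

def search_space_size(candidates):
    """Count the masses to screen when every candidate may carry 0..n labels."""
    return sum(n_carbons + 1 for _, _, n_carbons in candidates)

# Hypothetical candidate list: (name, neutral monoisotopic mass, carbon count).
candidates = [
    ("succinate", 118.02661, 4),
    ("GABA", 103.06333, 4),
    ("glucose", 180.06339, 6),
]

unlabeled_targets = len(candidates)              # 3 masses without tracing
labeled_targets = search_space_size(candidates)  # 17 masses with 13C tracing
succinate_forms = carbon_isotopologues(118.02661, 4)
```

Even for three small molecules and one tracer element, the number of masses to screen rises from 3 to 17; adding 15N or 2H labels, or positional isotopomers, would enlarge it further.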

Organ models of microbiome metabolism

In vitro bioreactors and organ models are well-established systems for reducing the complexity of microbiome analyses and provide a path for identifying specific molecular interactions between cell types26,27,74,75,76,77,78,79. One of the best-established bioreactor systems is the Simulator of the Human Intestinal Microbial Ecosystem (SHIME®), which simulates the entire human GI tract74,75. The SHIME® system simulates digestion by pumping contents into a series of chambers74. This system can be primed with fecal suspensions or engineered with specific microbes to enable specific molecular interactions to be studied under controlled conditions. This model has been used to elucidate the effects of diet and pre- or probiotics on organic acid and short-chain fatty acid (SCFA) production in different GI compartments36,38,80,81,82. Other organ models include the Winogradsky column system, which has been used to study the microbial community interactions underlying pulmonary infections25,26,27, and a range of single compartment chemostat reactors that have been used to simulate specific microbiome ecosystems (e.g., the human colon)37,76,77,79,83,84,85,86.

Although each of these established in vitro systems enables detailed molecular studies to be conducted under well-controlled conditions, they lack human cell interactions87 and thus do a poor job of simulating the significant host contributions to the environment, such as the absorption of nutrients associated with mammalian host cells27,74,83,88. To address this, a range of new in vitro organ models have been developed. Organoids, organ-on-a-chip, and related in vitro human biomimetic models allow researchers to simulate specific compartments of the human body while manipulating parameters to simulate disease pathogenesis, host-cell responses, and drug interactions89,90,91. Organoid culture is a well-established tool for studying a multitude of organs and disease models, but most current approaches use a microfluidic organ-on-a-chip approach to mimic complex interactions between the microbiome and host or other microbial cells. The human Colon Chip was used to determine the human microbiome-associated metabolites that mediated susceptibility to enterohemorrhagic E. coli infection, which is not common in mice and therefore cannot be studied in a murine model92. The human gut-on-a-chip, composed of two microfluidic channels that are separated by a flexible membrane lined with human epithelial cells, is a useful tool to manipulate different factors in the gut microbiome, such as the presence of immune cells and pathogenic microbes, to determine the factors that contribute most to intestinal inflammation and bacterial overgrowth40,93. Similarly, the simplified human microbiota (SIHUMIx) model consists of three parallel bioreactors inoculated with a standard mix of eight bacterial species that are dominant in human feces and improves the reproducibility of results from prior bioreactor systems94.

Newer models are allowing researchers to study more complex interactions including co-culture and multi-organ systems41,87,89. The microfluidics-based HuMix (human-microbial crosstalk) model consists of gaskets divided into three co-laminar microchannels (medium perfusion microchamber, human epithelial cell culture chamber, microbial culture microchamber) with cell-covered membranes used for the extraction of intracellular metabolites87. This provides a means to continually monitor the effect of co-culture on individual co-cultured cell contingents. The authors validated HuMix with human intestinal epithelial cells co-cultured with Lactobacillus rhamnosus GG grown under anaerobic conditions, which induced the intracellular accumulation of GABA in the epithelial cells. Another model connected the human gut, liver, and circulating immune cells (T regulatory and T helper 17 cells) to simulate ulcerative colitis ex vivo using an integrated co-culture system of two fluidically communicating human micro-physiological systems41. In this system, the authors tested the immune response to microbiome-derived SCFAs, which was dependent on the involvement of effector CD4 T cells.

Each of these in vitro organ models provides a mechanism for investigating the molecular underpinnings of host-microbiome interactions. However, all bioreactors and organ models are sensitive to subtle changes in pH, temperature, nutrients, and oxygen levels, which can have a dramatic impact on the metabolic phenotypes observed in these model systems27,74,83,95. Although these platforms provide a controlled environment for testing metabolic hypotheses about interactions of individual species, they generally must be combined with metabolic profiling or in vivo approaches to verify the physiological relevance of the findings (see Table 1 for examples).

Animal model strategies for investigating molecular interactions

Gnotobiotic animal models, which have been extensively reviewed elsewhere96,97,98, are one of the most effective tools for investigating the molecular underpinnings of host-microbe molecular interactions99,100,101. Germ-free mice (and other species) can be colonized with defined collections of microbes, which enables direct comparisons between germ-free (GF; free of all microorganisms), specific pathogen free (SPF; contain a microbiota that is free of specified pathogens), and monocolonized/polycolonized animals. For human studies, microbial communities linked to specific metabolic phenotypes can be mapped via metagenomic sequencing102 and the taxonomic structure of populations can be linked to disease states. These phylogenetic mapping efforts can be effective when combined with fecal microbiota transplantation studies to separate host versus microbiome contributions to complex diseases103.

These model systems present a powerful framework for integrating hypothesis testing into metabolomics studies of the microbiome and have been used to investigate diverse biological processes including aging104, reproduction105,106, and metabolism13,14,16,107,108. This strategy has been very successful in providing molecular insights into diverse diseases including colitis (both IBS and Crohn’s disease)15,17, neurodegenerative disease20,21, breast cancer32, diabetes109, the biological response to toxin exposure9,10,34,110,111,112,113, and the role that the microbiome plays in immunity30,114. Though powerful, gnotobiotic models have some limitations. Human physiology and our microbial communities differ from those found in model organisms100 and this can present challenges in translating findings back to human disease115,116. Furthermore, rodents are coprophagic, a behavior that is not common in people, and this can have a direct impact on the metabolic composition of the GI tract, including microbial catabolites of bile acids115. Despite these shortcomings, gnotobiotic models are currently the best tools available for testing specific microbiome metabolic hypotheses and, if carefully coupled with in vitro organ models or human studies, provide the most direct path for unraveling the molecular underpinnings of host-microbiome metabolic dynamics.

Sampling considerations

Metabolic associations can be established through the analysis of non-invasive samples (analyses of feces, serum, and urine), but these samples are indirect reporters of microbial metabolism, are diluted significantly after they leave their microenvironment, and can undergo significant biological or chemical degradation before they can be sampled from these distal sites44,117,118,119,120,121,122. Although the obvious solution to this problem is to collect samples directly from microbial communities, this approach is not always practical (e.g., sampling the GI tract in humans is invasive and not all sites can be reached via endoscopy123). Moreover, the metabolites produced by microbes in one site can affect a wide range of other organs11,20,21,113,122,124,125,126,127. Microbially-derived trimethylamine N-oxide and phenylacetylglutamine produced in the gut, for example, are linked to elevated risks of cardiovascular disease and pancreatic cancer31,33. In addition, a growing body of literature has shown that the microbiome influences both local and systemic immune function2,127. Microbial inosine, for example, plays a direct role in the activation of antitumor T cells30. Resolving these indirect modes of action is critical for understanding the molecular mechanisms that contribute to disease but poses significant challenges to study designs. At present, the best strategy is to sample broadly from both the local microbial communities (wherever possible) and from distal sites around the body.

The choice of metabolite extraction solvents will also directly affect the scope of metabolites that can be observed in a study46,121. The merits of diverse sample preparation methods, the timing of collection, transport and storage conditions of the samples, homogenization, and pretreatment strategies (e.g., use of preservatives) have been discussed at length elsewhere117,118,119,128,129. Briefly, some general principles that need to be considered are (1) the extraction solvent must match the downstream analysis (i.e., aqueous extractions should be matched with analyses of hydrophilic molecules and vice versa), (2) metabolism needs to be quenched to prevent samples from degrading, (3) solvent/solute ratios should be adjusted according to the target molecules to maximize extraction efficiencies, (4) freezing samples will cause some metabolites to precipitate out of solution and can affect quantification, and (5) every sample extraction method excludes certain groups of molecules and introduces biases46,121,130. Consequently, the primary objective of any extraction should be to capture the target transect of molecules with the least technical error. With this objective in mind, we have increasingly favored extractions in 4 °C 50% methanol:H2O (at a 1:50 or 1:20 volume:volume dilution or 50 mg tissue/mL) for general studies involving central carbon metabolism131,132,133, extracellular metabolome analyses134,135,136, and other projects involving the analysis of polar metabolites137,138,139 (see Section S1 in the SI file for detailed extraction protocols). We have shown that this simple extraction, when coupled with hydrophilic interaction liquid chromatography (HILIC) mass spectrometry (MS), can reproducibly capture metabolites over thousands of samples with minimal technical error (coefficient of variation <0.15 over 1000+ injections)140. 
We find that this method is a good first choice for analyzing polar compounds when the molecular targets are poorly defined; however, other methods can be better optimized for specific compound classes such as SCFAs. Samples containing SCFAs should be immediately frozen (at −20 °C or preferably −80 °C) and can then undergo extraction using solvents such as an acetonitrile:H2O blend or specialized cleanup steps like solid-phase microextraction141,142,143,144. Appropriate sample preparation steps for the sample matrix and the classes of molecules analyzed help ensure reproducible and meaningful results in these complex microbiome-metabolomics studies.
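The technical error quoted above is a coefficient of variation (CV), i.e., the standard deviation of repeated measurements divided by their mean. As a minimal sketch, assuming hypothetical peak areas for one metabolite across repeated injections of a quality-control sample (the variable names and values are illustrative only):

```python
from statistics import mean, stdev

def coefficient_of_variation(intensities):
    """Sample standard deviation divided by the mean of repeated measurements."""
    return stdev(intensities) / mean(intensities)

# Hypothetical peak areas for one metabolite in repeated QC injections.
qc_peak_areas = [1020.0, 980.0, 1005.0, 995.0, 1000.0]

cv = coefficient_of_variation(qc_peak_areas)
# Compare against the 0.15 threshold mentioned in the text.
passes = cv < 0.15
```

In practice the CV would be computed per metabolite across all quality-control injections in a study, which makes it a convenient filter for features that are not measured reproducibly.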

New frontiers in human sampling

Gnotobiotic animals are a powerful platform for studying disease but will always be imprecise re-creations of human diseases. Validating these findings requires human studies, which are generally limited to blood, urine, feces, and other non-invasive samples. Although some researchers have employed surgical biopsy145 and mucosal endoscopic lavage17,123, these medical procedures are difficult to organize for most studies. To address this, several groups are working to develop ingestible GI sample collection devices146,147. These new tools allow insights into physiology that were previously only practical in animal models.

Three examples of this emerging platform are the CapScan® (Envivo Bio)146, the Ingestible Osmotic Pill147, and the Small Intestine MicroBiome Aspiration Capsule (SIMBA™) (Small Intestine MicroBiome Aspiration Capsule (2022) at https://www.nimblesci.com/technology). The CapScan® consists of a collapsed collection bladder, capped by a one-way valve inside a dissolvable capsule with an enteric coating146. Once ingested, the device moves down the GI tract until it reaches a pre-set pH level (e.g., pH 7–8 in the ileum), where the enteric coating dissolves and the collection bladder draws in the luminal contents. The one-way valve prevents further entrance of liquid into the capsule, which is later recovered from the stool. These researchers tested the device on 240 intestinal samples from 15 healthy patients and found significant differences in microbes and metabolites present in the intestines compared to the stool and determined that bile acid profiles varied along the intestines, as found previously in animal models. Other devices follow similar principles, although the mechanism of action for the 3D-printed Osmotic Pill involves a pressure differential created across the semipermeable membrane, which induces a passive pumping action as it moves down the GI tract147. The pill is embedded with a small neodymium magnet, and thus can be held in a precise location inside the GI tract for sampling of specific locations. These ingestible sampling devices have only recently been introduced and their applications in metabolomics are just beginning to emerge. One consideration in applying this emerging technology to metabolomics is that samples collected via these capsules cannot be metabolically quenched until after the capsule has been collected (potentially a day or more after sampling the microbiome)46,121,148. Consequently, these tools will be most amenable to analyses of metabolic phenotypes that are biologically and chemically stable.

Analytical complexities in microbiome metabolic studies

Microbiome metabolomics studies present significant analytical challenges because of the exceptional breadth of chemical diversity and the high degree of metabolic complexity that can be found in metabolic extracts from microbial communities (Fig. 2)44,46,47. Complex carbohydrates, lipids, bile acids, SCFAs, peptides, amino acids, nucleotides, vitamins and cofactors, and a stunning diversity of secondary metabolites are a few examples of small molecule classes that are metabolized by the microbiome1,2,3,6,8 and can modulate host-microbiome dynamics (see Table 1 for examples). Unfortunately, no single analytical technique can capture this full spectrum of molecules in a single analysis46,48,49. Consequently, each study must designate a target range of molecules and select an extraction method and analytical framework that are compatible with the chemical properties of the target analytes.

Fig. 2: Analytical challenges associated with microbiome-metabolomics research include the diversity of chemical classes present in local and systemic regions of the gut microbiome.

Figure created with BioRender, available at Biorender.com.

As discussed in “Sampling considerations”, there are dozens of commonly used sample preparation and analytical workflows for metabolomics analyses. These have generally consolidated around methods for the three core instrumentation platforms used in metabolomics: liquid chromatography-mass spectrometry (LC-MS)10,13,21,45,142, gas chromatography-mass spectrometry (GC-MS)110,149,150, and nuclear magnetic resonance (NMR)12,15,120. Although the general merits and pitfalls of these platforms are thoroughly described elsewhere46, there are some special considerations that these platforms have in the context of microbiome research.

One of the most intensively studied classes of molecules in microbiome research is SCFAs, which are microbially-derived compounds that are implicated in a vast range of biological processes ranging from colitis and immune function to pain perception10,13,15,21,22,85,110,151,152. Although these molecules can be detected on any of the three core analytical platforms, they are surprisingly difficult to accurately quantify. NMR methods can detect them directly but cannot accurately quantify them without selective or multidimensional methods130,153. Although SCFAs are small and polar, they are difficult to resolve by liquid or gas chromatography without derivatization154. To address this, a range of specialized SCFA analysis techniques have been developed10,21,45,142. Of these, we prefer SCFA Quantification Using Aniline Derivatization, which is an isotope-based LC-MS strategy that enables robust absolute quantification of SCFAs in complex samples142.
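Isotope-based quantification generally rests on comparing the signal of an analyte with that of an isotope-labeled counterpart of known concentration. The sketch below illustrates only that general ratio calculation; it is not the published aniline-derivatization protocol. It assumes equal instrument response for the labeled and unlabeled forms, and the function name and example values are hypothetical.

```python
def isotope_dilution_conc(area_analyte, area_labeled_std, conc_labeled_std):
    """Concentration from the analyte/standard peak-area ratio.

    Assumes the labeled and unlabeled forms respond identically in the
    instrument, so the area ratio equals the concentration ratio.
    """
    if area_labeled_std <= 0:
        raise ValueError("labeled standard peak area must be positive")
    return area_analyte / area_labeled_std * conc_labeled_std

# Hypothetical acetate measurement: the labeled standard is at 50 µM.
acetate_um = isotope_dilution_conc(
    area_analyte=3.0e6, area_labeled_std=1.5e6, conc_labeled_std=50.0
)
```

Because the analyte and its labeled counterpart co-elute and ionize together, matrix effects tend to cancel in the ratio, which is why this style of quantification is robust in complex samples.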

Another important class of metabolites for microbiome research is bile acids, which encompass a rich collection of cholesterol-derived molecules and whose conjugated derivatives are secreted from the liver into the gut and converted into secondary bile acids via microbial catabolism1,11,44,113,122,155,156. A selection of conjugated and unconjugated bile acids reach the circulatory system, where they interact with bile acid receptors1,11,113,156,157. These molecules are primarily detected in the cecum or the feces and are best detected using reverse-phase LC-MS44,113,158 or via targeted analyses using chemical labeling kits that are now commercially available (e.g., Biocrates)11.

Beyond these intensively studied classes of compounds, a range of aqueous central carbon metabolites, including amino acids (e.g., neurotransmitters, choline derivatives, and tryptophan derivatives), nucleotides, and energy intermediates are emerging as important regulators of the interplay between host and microbiome1,8,20,21,111,124,159,160. These metabolites are more commonly associated with systemic effects (i.e., are found in the bloodstream and tissues such as the brain and liver)12,20,21,111,124,159, but can also serve as important immune regulators in the GI tract30,122,161,162. These hydrophilic compounds are most easily analyzed by LC-MS using HILIC140,163,164,165 but can also be derivatized and analyzed by GC-MS14,109,112. Our preferred strategy for aqueous analyses of the GI tract and feces uses a zwitterionic HILIC (Thermo Fisher Syncronis™) stationary phase combined with a short linear ammonium formate (aqueous phase)/acetonitrile with formic acid (organic phase) gradient to capture amino acids, carbohydrates, nucleotide derivatives, and other common compounds that are found outside the cell140. We have recently shown that the uptake and secretion of these compounds is a sensitive predictor of microbial taxa56.

Phosphate-containing metabolites (e.g., ATP, NADH, glucose-6P, and carbamoyl phosphate) and organic acids (e.g., citrate and α-ketoisovalerate) play a critical role in central carbon metabolism, energy transfer, and redox166. Although a range of HILIC methods have been developed to chromatographically resolve these compounds, they tend to ionize poorly by electrospray ionization. This problem can be mitigated through the use of ion pairing agents (e.g., tributylamine; TBA), which improve their chromatographic properties and stabilize their negative charges167,168. The TBA-metabolite complexes formed through this method enhance LC-MS sensitivity by orders of magnitude for these compounds. One of the most effective of these methods is the C18 reverse phase ion pairing (RPIP) method developed by the Rabinowitz group169. This method can be challenging to set up and involves spraying 15 mM TBA into an instrument, which is very difficult to clean out of the system and effectively commits the LC-MS system to negative mode analyses. For labs with the technical expertise to establish this method and resources to commit an instrument to this setup, the RPIP method offers a robust and high-sensitivity mechanism for quantifying phosphate-containing metabolites and organic acids.

In summary, the vast diversity of microbial metabolites produced by the microbiome cannot be captured using a single analytical assay and the selection of analytical method(s) will have a direct impact on the transect of metabolic pathways that will be observable for any given study. Thus, the analytical approach must be tailored to each investigation and multiple methods are generally required to capture a comprehensive picture of host-microbiome metabolic dynamics.

Data normalization in microbiome metabolic studies

The variability of microbiome samples (e.g., fecal water content and variation in urine water content) along with the frequently large scale of many microbiome studies creates significant complexities with regard to normalizing metabolomics datasets31,33,42,43,56,170. Data normalization is a complex task that is affected by both the analytical platform and sample type. Whereas NMR data can be normalized post-acquisition, mass spectrometry data are very challenging to correct post-acquisition because each molecule follows its own unique ionization properties that are nonlinearly affected by the composition of the matrix171,172,173,174. Thus, no single normalization constant can be used to correct for sample-to-sample differences in composition. To address this, a range of computational strategies have been proposed including normalization to a constant sum175, probabilistic quotient normalization176, metabolic ratio correction171, median fold change177, and normalization to MS total useful signal173. However, in our experience, none of these computational strategies completely control for variability and all of these mathematical operations contribute to undesirable propagation of error. Additionally, normalization can produce significant artefacts, especially in untargeted analyses (partially due to missing signals in analytes approaching the limit of detection). Consequently, our preferred approach, wherever possible, is to prepare homogeneous samples or otherwise correct the sample extractions prior to analysis to minimize the need for post-analysis data normalization. We employ a range of analytical strategies depending on the sample type: for tissue and fecal samples we prefer weighing, for fluids we normalize to initial sample volume, and for microbial samples and cell cultures we correct to optical density or colony counts; alternatively, we introduce isotope-labeled reference metabolites into the extraction mixture142 (see SI Section S1 for details).
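For readers unfamiliar with the computational strategies listed above, the following is a minimal sketch of the core of probabilistic quotient normalization. It assumes a samples-by-features matrix of strictly positive intensities with no missing values, uses the per-feature median across samples as the reference profile, and omits the initial total-signal scaling step that often precedes it. The function name and data are hypothetical.

```python
from statistics import median

def pqn_normalize(samples):
    """Probabilistic quotient normalization of a samples x features matrix."""
    n_features = len(samples[0])
    # Reference profile: per-feature median across all samples.
    reference = [median(row[j] for row in samples) for j in range(n_features)]
    normalized = []
    for row in samples:
        # Quotient of each feature against the reference profile.
        quotients = [row[j] / reference[j]
                     for j in range(n_features) if reference[j] > 0]
        # The median quotient is taken as the sample's dilution factor.
        dilution = median(quotients)
        normalized.append([value / dilution for value in row])
    return normalized

# Hypothetical data: sample 2 is a two-fold concentrated copy of sample 1;
# sample 3 is two-fold diluted but has a genuinely elevated fourth feature.
samples = [
    [10.0, 20.0, 30.0, 40.0],
    [20.0, 40.0, 60.0, 80.0],
    [5.0, 10.0, 15.0, 500.0],
]
normalized = pqn_normalize(samples)
```

In this toy case the dilution differences are removed while the elevated fourth feature in the third sample is preserved. Features near the limit of detection, or missing altogether, would distort the quotients, which is one source of the artefacts described above.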

In addition to sample-to-sample variability, metabolomics studies must also contend with the inherent instability of the analytical platforms. LC-MS response factors drift day-to-day and thus preclude the direct comparison of raw signal intensities from batch-to-batch. This problem can be addressed by collecting a common reference sample in every batch and expressing observed intensity relative to the common reference140. This common sample can be prepared as a mixture of all of the representative samples (i.e., a “super mix”), which is ideal in untargeted analyses that may capture unusual metabolites. Alternatively, a mixture of analytical reference standards in a representative sample matrix can be collected along with each batch of samples140,173. These calibration reference samples can then be used to compute the absolute concentration of target metabolites and thereby sidestep the data normalization pitfalls. Naturally, the standards-based approach limits analysis to compounds that can be matched to commercial standards and to signals that are present within the linear range of the calibration reference mixtures.
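The common-reference strategy can be sketched as a per-batch ratio. The example below is a simplified illustration that assumes a single reference measurement per batch and a linear detector response; the function name, metabolite names, and intensities are hypothetical.

```python
def relative_to_reference(batch_intensities, reference_intensities):
    """Express each sample's intensities as ratios to the batch reference."""
    return [
        {met: value / reference_intensities[met]
         for met, value in sample.items()
         # Skip metabolites that were not detected in the reference sample.
         if reference_intensities.get(met, 0) > 0}
        for sample in batch_intensities
    ]

# Hypothetical data: the same biological sample measured in two batches,
# where the instrument response in batch 2 is half that of batch 1.
batch1_ref = {"alanine": 2000.0, "lactate": 8000.0}
batch2_ref = {"alanine": 1000.0, "lactate": 4000.0}
batch1 = [{"alanine": 3000.0, "lactate": 4000.0}]
batch2 = [{"alanine": 1500.0, "lactate": 2000.0}]

rel1 = relative_to_reference(batch1, batch1_ref)
rel2 = relative_to_reference(batch2, batch2_ref)
```

Although the raw intensities differ two-fold between batches, the reference-relative values agree, so samples from different batches become directly comparable.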

Data analysis complexities in microbiome metabolic studies

Untargeted metabolomics studies capture thousands of individual features from each sample, but conventional experimental designs involve cohorts of hundreds or fewer subjects178. This creates an inherent mismatch between data size and biological replication, which is a recipe for driving false discovery, statistical overfitting, and a variety of computational problems. Consequently, data processing steps, including removal of noise, peak detection, identification, quantification, and missing value imputation, can play a pivotal role in the quality of the resulting dataset178,179,180. Many of these challenges are inherent to all systems-level analysis and there are well-established statistical tools such as dimensionality reduction approaches (e.g., principal component analysis)12,16,32,105,124,181,182, correction for multiple-testing (e.g., Bonferroni correction)183, the use of linear models (e.g., ridge regression)184, and data visualization strategies (e.g., volcano or Manhattan plots)185 that can identify statistically significant correlations and avoid common pitfalls related to false discovery and statistical overfitting. Recently, new computational approaches have been developed, including machine learning and mediation analysis, that provide a powerful new approach for unlocking the molecular underpinnings of host-microbiome dynamics186,187,188,189,190,191.
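As a brief illustration of the multiple-testing correction mentioned above, the Bonferroni procedure multiplies each p-value by the number of tests performed (equivalently, it divides the significance threshold by that number). The p-values below are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, significance flags) under Bonferroni."""
    m = len(p_values)
    # Each p-value is scaled by the number of tests and capped at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    return adjusted, [p_adj < alpha for p_adj in adjusted]

# Hypothetical p-values from testing four metabolite features.
p_values = [0.00001, 0.004, 0.03, 0.5]
adjusted, significant = bonferroni(p_values)
```

With only four tests, a raw p-value of 0.03 no longer passes at the 0.05 level. With thousands of features, the per-feature threshold becomes correspondingly stricter, which is why small cohorts have limited power in untargeted studies.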

Machine learning strategies for identifying molecular mechanisms

Microbiome-metabolomics computational approaches must decipher a complex mix of interactions to reveal the microbial composition, the interactions between its components, the interaction with the host, as well as time dependencies in the sample. In recent years, researchers have increasingly used machine learning (ML) approaches to resolve these complexities188,190,192,193,194. ML techniques compress high-dimensional data into models via an iterative learning (or fitting) process. Once trained, these models can then be used to predict, classify, or transform new data. A branch of ML called explainable ML focuses on the interpretability of such models and is useful for microbiome research because of its ability to model highly complex functions and to identify important combinations of features while simultaneously incorporating confounding variables189,190,195,196,197. Some examples of successful applications of ML are high-capacity models (e.g., Random Forest190 and Gradient Boosting189), deep artificial neural networks that can model arbitrarily complex functions196,197, and the SHAP algorithm that uses concepts from cooperative game theory to analyze trained models and their predictions195. Each of these tools can return a human-interpretable “importance” score for the original features, and these scores can be used to help drive research into discrete molecular mechanisms. Recently, identifying causal relationships has been taken a step forward with AutoEncoders, which are neural networks that were developed to identify causal relationships using a latent variable model187. Though effective in skilled hands, neural networks often require expert knowledge to be implemented effectively, and other approaches, including mediation analysis, may provide a means of interpreting complex relationships in metabolomics data.
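To make the idea of a feature “importance” score concrete, the following sketch computes permutation importance, a simple model-agnostic score, for a toy classifier. Neither the nearest-centroid model nor permutation importance is taken from the cited studies; they are stand-ins chosen because they fit in a few lines of standard-library Python. All names and data are hypothetical, and the score is computed on the training data for brevity, whereas held-out data would normally be preferred.

```python
import random


def centroid_fit(X, y):
    """Return per-class mean vectors (a deliberately simple stand-in model)."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def centroid_predict(centroids, x):
    """Assign x to the class with the nearest centroid (squared distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def accuracy(centroids, X, y):
    """Fraction of rows whose predicted class matches the true label."""
    hits = sum(centroid_predict(centroids, x) == lab for x, lab in zip(X, y))
    return hits / len(y)


def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data (invented): feature 0 separates the two groups, feature 1 does not.
X = [[0.0 + 0.1 * i, 0.5 + 0.01 * (i % 3)] for i in range(10)] + \
    [[5.0 + 0.1 * i, 0.5 + 0.01 * (i % 3)] for i in range(10)]
y = ["control"] * 10 + ["case"] * 10
model = centroid_fit(X, y)
importances = permutation_importance(model, X, y)
```

Shuffling the informative feature degrades accuracy substantially, whereas shuffling the uninformative feature leaves it unchanged, so the first importance is clearly positive and the second is zero. Scores of this kind rank features for follow-up; they do not by themselves establish mechanism.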

Mediation analysis in microbiome studies

Mediation analysis is a valuable tool that can estimate causality in relationships between study variables191,198. Mediation models were first employed in the study of psychosocial predictors of human health in order to identify potential causal relationships by decomposing the effect of a predictor on an outcome into a direct effect and an indirect effect that acts through a mediator199,200,201,202. A mediation model has three types of variables: the predictor (X), the outcome (Y), and the mediator (M). For example, X could be the state of a patient (age, gender, comorbidities, etc.), Y the severity of a disease, and M the clinical interventions. Tests of association often use another variable called the confounding factor or variable (C), which does not mediate the association between X and Y but is an alternate (biasing) explanation for it. The goal is to quantify the direct effect (of X on Y) and the indirect effect (of X on Y through M), along with their statistical significance. Explanations for a potentially causal association between a predictor (X) and an outcome (Y) almost always involve at least one mediator variable (M)203.
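The decomposition into direct and indirect effects can be illustrated with the classical product-of-coefficients approach for a single mediator. The sketch below is a simplified example under stated assumptions: it uses ordinary least squares, omits confounders (C) and significance testing, and runs on invented, noise-free outcome data so that the recovered coefficients are easy to check. The function name simple_mediation is hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cov(u, v):
    """Sample covariance; cov(u, u) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def simple_mediation(x, m, y):
    """Product-of-coefficients mediation from two least-squares fits.

    Mediator model: M = i1 + a*X
    Outcome model:  Y = i2 + direct*X + b*M
    """
    a = cov(x, m) / cov(x, x)
    det = cov(x, x) * cov(m, m) - cov(x, m) ** 2
    b = (cov(m, y) * cov(x, x) - cov(x, y) * cov(x, m)) / det
    direct = (cov(x, y) * cov(m, m) - cov(m, y) * cov(x, m)) / det
    return {
        "a": a,
        "b": b,
        "direct": direct,
        "indirect": a * b,                 # effect of X on Y through M
        "total": cov(x, y) / cov(x, x),    # slope of Y on X alone
    }


# Toy data (invented): M depends on X, and Y depends on both X and M.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.1, -0.2, 0.1, 0.0, -0.1, 0.1]
m = [2.0 * xi + ei for xi, ei in zip(x, noise)]
y = [1.0 + 0.5 * xi + 3.0 * mi for xi, mi in zip(x, m)]
effects = simple_mediation(x, m, y)
```

For least-squares fits of this form the total effect equals the sum of the direct and indirect effects, which provides a convenient sanity check on any implementation.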

More recently, there has been a growing interest in studying the role of human gut microbiota as a “mediating” biological pathway in the association between diet, a medical intervention or an environmental exposure, and adult health200,201,202. Several publications have also explored the role of early-life microbiota by testing whether gut microbiota during infancy mediate associations between a variety of early-life exposures, such as maternal pre-pregnancy weight, cesarean section delivery, and household cleaning product use, and future health outcomes204,205,206. Mediation analysis can also be used to identify the microbial metabolic pathways through which breastfeeding promotes gut immunity. For example, γ-Proteobacteria and its metabolite lactate have been shown experimentally to promote mucosal immunity and aid maturation of gut microbiota by stimulating Immunoglobulin A responses and enhancing intestinal cell activity of dendritic cells207,208.

In a multiple mediator pathway model (Fig. 3), we demonstrate the effects of two sequential mediators, gut γ-Proteobacteria (Mediator 1) and lactate levels (Mediator 2), in the pathway between breastfeeding (BF) status (X) and fecal secretory Immunoglobulin A (sIgA) levels, a marker of gut immunity, after 3 months of breastfeeding (Y). At this young infant age, breast milk is the sole source of sIgA and the infant gut only produces small amounts. In this example, the predictor variable (X) is divided into two categories, X1 (partially-breastfed infants) and X2 (non-breastfed infants), each of which is compared to the reference category, exclusively breastfed infants. The mediating path (or indirect effect) being tested is the γ-Proteobacteria – lactate pathway in the association between the extent of non-breastfeeding and fecal sIgA levels. This path shows statistical significance for partial breastfeeding [Path 3 × 1: −0.05, 95% CI (−0.10, −0.01)] and no breastfeeding [Path 3 × 2: −0.06, 95% CI (−0.13, −0.02)], indicating that limiting breast milk intake can lower infant sIgA levels through a pathway of reduced abundance of γ-Proteobacteria and its metabolite lactate. As expected, the model also shows a substantial direct effect for lack of any breastfeeding in lowering sIgA levels [X2, −4.34]. Importantly, the microbiota-lactate path is separate from the direct effect of breast milk in supplying sIgA to the infant, suggesting that reduced availability of fecal lactate due to limited breastfeeding may lower sIgA production in the infant gut. Experimentally, it has been shown that lactate stimulates sIgA production208. By identifying multiple metabolic paths or consequences of reduced milk intake, this model underscores the importance of breastfeeding not only in providing passive IgA immunization to the infant but also in promoting the immuno-stimulatory activity of γ-Proteobacteria and lactate during early infancy, when mucosal immunity is poorly developed209,210.
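A sequential (serial) two-mediator model of this kind can be sketched with three least-squares fits, where the indirect effect along the X to Mediator 1 to Mediator 2 to Y path is the product of the three path coefficients. The example below is a simplified illustration, not the model behind Fig. 3: it uses a single hypothetical binary predictor (1 = not exclusively breastfed) instead of two categories, invented numbers instead of study data, and no confidence intervals. The helper ols and all variable names are assumptions.

```python
def ols(columns, y):
    """Least-squares fit via the normal equations.

    `columns` is a list of predictor columns; returns [intercept, slope_1, ...].
    """
    n = len(y)
    rows = [[1.0] + [column[i] for column in columns] for i in range(n)]
    k = len(rows[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [
        [sum(r[p] * r[q] for r in rows) for q in range(k)]
        + [sum(r[p] * yi for r, yi in zip(rows, y))]
        for p in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for j in range(k):
        pivot = max(range(j, k), key=lambda r: abs(A[r][j]))
        A[j], A[pivot] = A[pivot], A[j]
        for r in range(k):
            if r != j:
                factor = A[r][j] / A[j][j]
                A[r] = [v - factor * w for v, w in zip(A[r], A[j])]
    return [A[p][k] / A[p][p] for p in range(k)]


# Toy data (invented). x = 1 marks the hypothetical non-reference group.
x = [0, 0, 0, 0, 1, 1, 1, 1]
e1 = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
e2 = [0.05, 0.05, -0.05, -0.05, 0.1, 0.1, -0.1, -0.1]
m1 = [2.0 - 1.0 * xi + e for xi, e in zip(x, e1)]        # Mediator 1
m2 = [1.0 + 0.5 * v + e for v, e in zip(m1, e2)]         # Mediator 2
y = [10.0 - 4.0 * xi + 2.0 * v for xi, v in zip(x, m2)]  # Outcome

_, a1 = ols([x], m1)                     # X -> M1
_, a2, d21 = ols([x, m1], m2)            # M1 -> M2, adjusted for X
_, direct, b1, b2 = ols([x, m1, m2], y)  # M2 -> Y, adjusted for X and M1
serial_indirect = a1 * d21 * b2          # X -> M1 -> M2 -> Y
```

With these invented inputs the serial indirect effect and the direct effect are both negative, mirroring the direction (though not the magnitude) of the estimates discussed above. In practice, interval estimates for the indirect effect would also be needed before drawing conclusions.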

Fig. 3: Mediation analysis can be used to establish causality in microbiome-metabolomics studies, with an example shown of the mediating roles of infant γ-Proteobacteria (Mediator 1) and lactate (Mediator 2) on the association between breastfeeding status (causal agent X) and fecal sIgA levels (presumed effect Y) at 3 months.

a We establish the variables in the causal diagram, showing the association between causal agent X (breastfeeding status) and presumed effect Y (infant fecal sIgA levels at 3 months). b Using a sequential mediation model, we establish the direct effects of breastfeeding status on fecal sIgA levels, where breastfeeding status is the categorical variable and exclusive-BF is the reference group. c We then calculate the indirect and total effects in the causal diagram. β-coefficients are shown with 95% Confidence Intervals (CI) and significant differences (p < 0.05) are indicated in red.

Although classical mediation approaches may not model nonlinear effects well, new approaches including parametric models and ML mediation analysis show promising results in handling high-dimensional data without pre-selection of control variables186,191,192,194,211. These exciting new methodologies will enable researchers to handle the ever-increasing amounts and complexity of future datasets generated through large-scale metabolomics studies.

Examples of host-microbiome dynamics that are modulated through metabolism

Advances in metabolomics technologies have greatly expanded our ability to dissect the molecular underpinnings of complex host-microbiome interactions. Recently, there has been a major expansion of the literature in this area. Our small selection of examples provided here (Table 1) can serve as a starting point for exploring this exciting new body of literature.

Host-microbe metabolic dynamics encompass a wide range of biological functions including chronic and infectious disease, immunity, aging, neurology, and physiology10,13,15,18,22,30,33,104,212. One classic example of these host-microbe dynamics relates to SCFAs, which are produced by the gut microflora and have protective effects against colitis and IBS10,13,15. SCFAs also modulate the maturation of immune cells, including microglia, and shape the properties of the visceral pain signaling pathways10,22. Recent studies have shown that SCFAs have radioprotective effects by stimulating repair processes in the gastrointestinal tract and by reducing proinflammatory responses in the host22,110.

Other select examples of host-microbe interactions include the production of inosine by Bifidobacterium pseudolongum, which aids in the activation of host responses to colorectal cancer in immune checkpoint blockade therapy (with anti-CTLA-4 treatment)30. In addition, phenylacetyl-glutamine and phenylacetyl-glycine, produced by Clostridium sporogenes, are associated with cardiovascular disease through modulation of G-protein coupled receptors33. Lactobacillus and Bacteroides microbiome members are correlated with enhanced neurobiological functions like improved memory or alleviated depression symptoms23,159. Bacteroides possess the ability to produce GABA, a key neurotransmitter for mood and memory regulation23, while Lactobacillus spp. enhance GABA production in a lactate-dependent manner159. Lactobacillus members also metabolize dietary tryptophan into indole compounds (e.g., indoxyl-3-sulfate, indole 3-propionic acid, indole 3-aldehyde) which have a protective effect against host inflammation via the activation of aryl hydrocarbon receptors in both experimental autoimmune encephalomyelitis (EAE; representative of multiple sclerosis)18 and colorectal cancer models212.

Microbiota also participate in microbe-to-microbe interactions that impact host homeostasis and affect the ability of specific pathogens to cause infection. In cystic fibrosis infections, a range of in vitro, computational, and human metabolic profiling approaches have collectively established that dominance by the pathogen Pseudomonas aeruginosa is driven by its ability to cross-feed on amino acids, organic acids, and alcohols produced by facultative anaerobes in the environment24,25,28,29. Phascolarctobacterium spp. have a protective effect against Clostridium difficile infections by consuming the succinate needed for Clostridium difficile growth39, while Bacteroides vulgatus produces SCFAs and trimethylamines, which have a protective effect against Vibrio cholerae infections151.

Diet plays a large role in microbiota-mediated effects in the host. Probiotic and prebiotic (e.g., inulin and stachyose) treatments can be used to stimulate specific strains in the gut microbiome to produce higher levels of SCFAs21,36,37,38. Host nutrition can also be corrected through cross-feeding; in flies, for example, Lactobacillus and Acetobacter establish a syntrophic relationship to overcome nutrient scarcity due to an imbalanced diet61. Diet can also have a negative impact on the host, such as in the case of Crohn’s disease, where catabolism of dietary serine by blooms of Escherichia coli and Citrobacter rodentium can worsen inflammatory responses39. Butyrate produced by Firmicutes members of the microbiome in response to a high carbohydrate diet is associated with the hyperproliferation of colon epithelial cells in a colorectal cancer model213. While current technologies have enabled us to unravel many of these host-microbiome interactions along multiple axes of interaction, new technologies will allow us to elucidate even more of the complex interactions underlying community metabolism, disease pathology, and dysbiosis at a molecular level.

Conclusion

Over the last two decades, metabolomics has matured from a largely descriptive activity into a tool for probing the molecular underpinnings of biology. Host-microbiome dynamics are among the most complex biological systems to which these tools have been applied, and the biological, logistical, analytical, and computational challenges inherent to this field of research make it difficult to move beyond correlational statistics. However, recent advances in model systems, experimental strategies, analytics, and computational tools have opened the door to mechanistic insights into microbiome-mediated biological phenomena. As these approaches mature, we anticipate that we will quickly gain a molecular understanding of the myriad mechanisms that the microbiome uses to modulate the host immune system and other important phenomena.