Main

The microbiome–gut–brain axis is informed by biological and epistemological knowledge from many disciplines, spanning microbiology, ecology, psychiatry, and others. Similarly, in its analysis, it is strengthened by methods from across the scientific landscape, as well as some truly interdisciplinary approaches developed specifically for the microbiome–gut–brain axis field (Fig. 1).

Fig. 1: Multidisciplinary techniques that enrich the microbiome–gut–brain axis field that are discussed in this Perspective.

a, Constructing a directed acyclic graph (DAG) facilitates statistical techniques such as causal inference, and experimental procedures such as fecal microbiota transplantation (FMT) can be used to interrogate causality and directionality. b, Techniques such as multivariate modeling and ordination can be used to analyze and interpret large ’omics datasets. c, Higher-order patterns within microbiome data such as interaction networks, functional modules, ecological guilds, and amalgamations, called mesoscale features, are used to ask and answer ecologically relevant questions. d, Microbiome time series can be analyzed using mixed-effect models. Special cases of time-series microbiome data, such as microbial volatility, as well as circadian rhythms can be used to ask and answer targeted questions.

In part 1, we introduced core concepts and foundations of compositional data analysis of the microbiome–gut–brain axis1, ranging from study design and pre-registration of analysis, to selecting the most suitable diversity metrics, and the options for functional inference. In part 2, we provide a perspective on how to leverage techniques from other disciplines, and provide future directions for the microbiome–gut–brain axis field. We hope that this mapping of the broader landscape will provide useful navigation from which the reader may explore original sources as per their needs and interests.

One aim of this piece is to provide context for the methods borrowed, adapted, and developed from both adjacent and far-flung fields and to aid the reader in appraising their respective strengths and weaknesses for microbiome analysis.

As a guiding principle, we believe that the microbiome–gut–brain axis field has an imperative to become a more reproducible science and to operate from a place of deeper statistical and biological understanding. The techniques described in the following have been carefully examined and selected to ensure that they are fit to drive the field toward this goal.

Causality, uncertainty and the microbiome

There has been a growing call for experiments that can establish causality in the microbiome–gut–brain axis field2,3. Causality is a philosophically and statistically contentious term. Granger causality can be thought of as a pragmatic approach to estimating causality between occurrences A and B. In a nutshell, if knowledge of the occurrence A helps predict the occurrence of B, A is said to ‘Granger-cause’ B. However, in the case of complex systems such as the microbiome, where nonlinear dynamics are ubiquitous, Granger causality may not be appropriate4. Historically compelling sets of criteria to establish causality between a microorganism and a disease exist, including Koch’s postulates5 and the Bradford Hill criteria6. Since these are less applicable to the ecosystem approach required for the microbiome–gut–brain axis, we will not elaborate on these criteria (but see Box 1, which leverages experimental design to interrogate causality). Rather, we provide an overview of causality concepts from epidemiology and econometrics that have been applied to the microbiome–gut–brain axis, as well as some important pitfalls.

Causal inference analysis

Causal inference is commonly the underlying motivation for microbiome–gut–brain axis studies, even when it is not being explicitly tested. As outlined by ref. 7, being explicit about the causal motivations of (even) an observational analysis “reduces ambiguity in the scientific question, errors in the data analysis, and excesses in the interpretation of the results”. Rather than avoiding causal language because, as the oft-repeated cautionary tale goes, correlation does not mean causation, Hernán7 suggests that we instead ask clearer causal questions and improve use of causal inference methods such as adjusting for confounding. There are many occasions where randomized controlled trials that would provide stronger evidence of causality are not feasible, biologically plausible, or indeed ethical. The complexity of the gastrointestinal and microbial environments is certainly difficult to replicate completely as interventions in clinical and even preclinical trials.

A directed acyclic graph (DAG8), or causal diagram, is a useful first step in making explicit the causal hypotheses and underlying assumptions about variables in a study. Creating a DAG serves as a prompt to consider, discuss with colleagues, and design analyses. It is best done at the conception phase of a study so that it may inform aspects of study design, from the timing of data collection to the list of potentially confounding variables about which to collect data. We stress here that while DAGs are a helpful tool to ask causal questions, they do not necessarily allow the user to quantify causality from cross-sectional data. Specialized mechanistic follow-up studies remain the gold standard in this regard. In brief, hypothesized relationships between variables are represented by arrows between them, pointing from cause to effect. By convention, causal diagrams point left to right, with exposure variables on the left and outcome variable on the right. Then add any variables that causally impact the main exposure of interest, or the outcome, using arrows between variables to depict the direction of causality. DAGs must be acyclic; that is, variables must not contain feedback loops; relationships between variables must be depicted as unidirectional. This differs from infographics of gut–brain interactions, which are frequently bidirectional as per biological reality. An example DAG can be found in Fig. 2a, and we expand on DAG creation in Box 2. This DAG was created using the dagitty R library and reflects variables relevant to the previously published schizophrenia dataset9 used in the accompanying Rmarkdown script.
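The acyclicity requirement can be checked mechanically. The Python sketch below is only an illustration and is not the dagitty workflow used for Fig. 2a: the edge list is hypothetical (it does not reproduce the edges of the published DAG), and the function name causal_order is our own. It stores each hypothesized relationship as a (cause, effect) pair and uses the standard-library topological sorter, which fails whenever a feedback loop is present.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical edges (cause -> effect) for illustration only;
# these are NOT the edges of the DAG shown in Fig. 2a.
EDGES = [
    ("sex", "gut_microbiota"),
    ("sex", "schizophrenia"),
    ("smoking", "gut_microbiota"),
    ("smoking", "schizophrenia"),
    ("gut_microbiota", "bmi"),
    ("bmi", "schizophrenia"),
    ("gut_microbiota", "schizophrenia"),
]


def causal_order(edges):
    """Return the variables ordered from causes to effects.

    Raises ValueError if the edges contain a feedback loop,
    in which case the diagram is not a valid DAG.
    """
    predecessors = {}
    for cause, effect in edges:
        predecessors.setdefault(cause, set())
        predecessors.setdefault(effect, set()).add(cause)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the detected cycle.
        raise ValueError(f"not a DAG, feedback loop: {err.args[1]}") from err


print(causal_order(EDGES))
```

Adding a reverse arrow such as ("schizophrenia", "gut_microbiota") makes the function raise an error, which mirrors the point that bidirectional gut–brain infographics cannot be used as causal diagrams without first committing to a direction (or to a temporal ordering) for each arrow.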

Fig. 2: An example DAG followed by graphical representations of multivariate and mediation analyses using miniature DAGs.

a, A DAG describes a hypothetical causal pathway, where an arrow from A to B (A → B) suggests a causal relationship between A and B. Ideally, the framework of constructs and variables is considered before (and thereby informing) data collection. Some examples of relevant unobserved variables are shown here in gray. Under the assumptions of this DAG and with only three available covariates (sex, smoking status, and body mass index), sex and smoking status comprise the minimal adjustment set required to estimate the total effect of gut microbiota on schizophrenia. b, Table illustrating the relationship between models where predictors and responses can be either univariate or multivariate. Predictor variables are on the left of each diagram, whereas response variables are on the right. Univariate predictor variable to multivariate response variables was included for the sake of completeness. Practically, a univariate predictor variable to multivariate response variables can be approached in the same fashion as multivariate predictor variable to a univariate response variable. c, Two DAGs illustrating the scenarios from the section on mediation analysis. Diet affects the microbiome, and diet affects the brain. Mediation analysis helps us ask and answer whether measured associations between the microbiome and the brain are spurious (scenario I) or whether diet affects the brain through the microbiome (scenario II). BMI, body mass index.

Mediation analysis

Mediation analysis is used to investigate whether a variable transmits its effect on the outcome through another mediator variable10. For example, an effect of diet on host behavior is well documented, as are effects of diet on the microbiome11. Similarly, the gut microbiome is also known to affect host behavior12. If we were to test whether diet could affect host behavior via its effects on the microbiome, that would require a mediation analysis. Where mediation explains a relationship, there are two main possibilities:

  • Partial mediation refers to the scenario where there is both a direct effect and an indirect (mediation) effect; for example, if diet were to both directly affect behavior and indirectly affect behavior by modulating the microbiome—which in turn affects behavior.

  • Complete mediation refers to the scenario where—using the preceding example again—diet affects only the microbiome, which in turn affects behavior, but diet on its own does not directly affect behavior.

One recent example of how mediation analysis can be used in the microbiome–gut–brain axis field can be found in the context of autism, diet, and the microbiome13. The authors convincingly showed that alterations in the microbiomes of autistic children can be explained by a restricted diet, a common trait in autistic children. They concluded that since diet can explain the altered microbiome, the altered microbiome does not play a causal role in the occurrence of autism. In a letter responding to Yap et al.13, Morton et al.14 argued that the original model implicitly assumed the absence of a relationship between diet and the microbiome (that is, independence), which is known to be untrue. Morton et al.14 argued that a more appropriate model would be one where diet affects (1) host phenotype directly and (2) the microbiome, which in turn affects phenotype. Essentially, they argue that the microbiome acts as a partial mediator in this autism example. See Fig. 2c for two miniature DAGs illustrating these two scenarios.

Several excellent tools exist to perform mediation analysis. The mediation package in R takes standard generalized linear model fits as input15. Also see the primer on how to perform a mediation analysis in R in the Supplementary Information.

We note that mediation is accompanied by inherently longitudinal assumptions. One presumes that due to the occurrence of some exposure at time 1, a mediating variable is affected at time 2, and the resulting shift in the outcome is observed at some later point (time 3). The use of mediation analysis in cross-sectional observational data, although common, is not considered best practice16. The reason for this is that it presumes that the causal chain being tested is correct and precludes an examination of a potential alternative temporal order of the variables. This is particularly relevant for variables that are dynamic, such as diet, the microbiome, and mental states. Mediation analysis in cross-sectional observational studies is therefore correlational and needs to be validated in targeted follow-up studies. Some alternative options with fewer data requirements have been trialed through data simulation17, demonstrating that sequential mediation (when data for the exposure, mediator, and outcome are collected only once each, but at least longitudinally and in a meaningful temporal sequence) can provide adequate sensitivity for identifying the presence of mediation. The gold standard is the resource-intensive multilevel longitudinal mediation, where variables that represent enduring exposures (such as diet) are collected repeatedly, and path coefficients between variables are allowed to vary across individuals. This may also be ideal for contexts in which a large degree of inter-individual variability might be expected (such as in host–microbiome studies).

Notably, estimating an indirect effect through mediation analysis requires substantially more power than estimating direct effects in traditional analysis. For example, one popular method to estimate the effect size of a mediation analysis is to multiply the two coefficients (exposure to mediator and mediator to outcome), which, for standardized coefficients with absolute values below 1, yields a smaller absolute coefficient than either of its two component coefficients18. Also see Box 3 on power calculations.
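The product-of-coefficients estimate can be sketched on simulated data. The Python example below is a minimal illustration under stated assumptions, not the approach of the mediation R package: the variable names, the simulated effect sizes (0.5, 0.4, and 0.3), and the helper functions are all hypothetical, the models are ordinary least-squares fits, and no confidence interval or significance test is computed (in practice these are usually obtained by bootstrapping or a comparable procedure).

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Sample covariance; cov(x, x) is the sample variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def slope(x, y):
    """OLS slope of y ~ x (with intercept)."""
    return cov(x, y) / cov(x, x)


def two_predictor_slopes(x1, x2, y):
    """OLS slopes of y ~ x1 + x2 (with intercept), via the normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


def simulate_and_estimate(n=2000, seed=1):
    """Simulate partial mediation (diet -> microbiome -> behavior) and estimate it."""
    rng = random.Random(seed)
    diet = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical effect sizes chosen purely for illustration.
    microbiome = [0.5 * d + rng.gauss(0, 1) for d in diet]
    behavior = [0.3 * d + 0.4 * m + rng.gauss(0, 1)
                for d, m in zip(diet, microbiome)]

    a = slope(diet, microbiome)                      # exposure -> mediator
    direct, b = two_predictor_slopes(diet, microbiome, behavior)
    return {
        "a": a,
        "b": b,                                      # mediator -> outcome
        "direct": direct,                            # exposure -> outcome
        "indirect": a * b,                           # product of coefficients
        "total": slope(diet, behavior),
    }


for name, value in simulate_and_estimate().items():
    print(f"{name}: {value:.3f}")
```

In linear least-squares models the total effect equals the direct effect plus the indirect effect, so a non-zero direct estimate alongside a non-zero indirect estimate corresponds to partial mediation. The indirect estimate is smaller than either of its components here, which illustrates why detecting it requires more power.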

Mendelian randomization and the microbiome

Mendelian randomization is a statistical method from the field of epidemiology, often used to estimate the causal effects of genetic factors on a phenotype in large cohorts19,20,21. In a nutshell, Mendelian randomization leverages the fact that genotype is fixed at conception and therefore takes place before the manifestation of a phenotype. This, along with other assumptions, allows the researcher to assess causality and directionality of the exposure (genotype) on the outcome (phenotype).

Recently, Mendelian randomization has been applied to microbiome data in the sense that genotype is replaced with microbiome metagenomic content. Particular care is therefore necessary. Unlike genotype, the microbiome is not fixed at conception but remains in constant flux throughout life (see the section on time-varying signals). While it makes sense to assess the causal effect of host genotype on the microbiome, for example, in the case of a host metabolic disorder altering the host gut and hence the microbiome22, it seems much less clear whether taking the microbial metagenome as a fixed exposure is appropriate.

High-dimensional data science

Microbiome–gut–brain axis experiments tend to yield complex, high-dimensional datasets. Here, we will discuss techniques to handle these types of data and strategies to integrate multiple high-dimensional data from the same experiment.

Stratifying and clustering samples

In some cases, it is necessary to stratify data into clusters, distinct subgroups based on microbiome signature. Stratification is a common method of defining enterotypes, which are large subgroups based on microbial taxonomic composition. The precise number of true enterotypes, as well as the best way to define them, is still up for debate (although 3–4 enterotypes are often cited23,24). Initial efforts involved calculating a Jensen–Shannon dissimilarity matrix and performing cluster analysis (clusters here corresponding to enterotypes) using the partitioning around medoids approach25. More recently, studies have employed the Dirichlet multinomial mixtures approach, a promising technique to estimate enterotypes from the Bayesian school26. In brief, the method involves estimating a probability vector for each sample and then estimating whether these vectors came from the same source (metacommunity, enterotype) or from separate enterotypes. Enterotypes appear to be important constructs because they are related to factors such as host health, diet, and exercise, despite some known limitations27. Notably, bacterial load is not easy to estimate using metagenomic techniques such as 16S and shotgun (although compare ref. 28) but can rather be assessed by pan-bacterial quantitative PCR (qPCR) or, most accurately, using flow cytometry. Bacterial load is associated with enterotype identity and may bias results23,24,27.
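To make the Jensen–Shannon dissimilarity concrete, the Python sketch below computes it for pairs of samples. It is an illustration only: the function names are our own, base-2 logarithms are an assumed choice (they bound the divergence between 0 and 1), and the clustering step itself (for example, partitioning around medoids) is not shown. The square root of the divergence is returned as well because it satisfies the triangle inequality and is therefore a common input for distance-based clustering.

```python
import math


def to_proportions(counts):
    """Convert raw counts to relative abundances that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetrized divergence of p and q against their midpoint distribution."""
    midpoint = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, midpoint) + 0.5 * kl_divergence(q, midpoint)


def jensen_shannon_distance(p, q):
    """Square root of the divergence, which is a proper distance metric."""
    return math.sqrt(max(0.0, jensen_shannon_divergence(p, q)))


def distance_matrix(samples):
    """Pairwise Jensen-Shannon distances for a list of count vectors."""
    props = [to_proportions(s) for s in samples]
    return [[jensen_shannon_distance(a, b) for b in props] for a in props]


# Hypothetical counts for three samples over four taxa.
example = [[40, 30, 20, 10], [35, 35, 20, 10], [5, 5, 10, 80]]
for row in distance_matrix(example):
    print([round(d, 3) for d in row])
```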

Stratifying samples on the basis of feature abundance is a defensible approach under some circumstances, for example, when pursuing functional groups of microorganisms that might exhibit competitive exclusion. However, it is rarely advisable to stratify samples into subgroups while in the middle of an analysis or when working with datasets comprising only tens to hundreds of samples because there are too few samples for validation. It is especially important to validate data-driven stratification, either in a new cohort or in a subsection of withheld data that can be used as a validation set. Spurious strata can frequently arise from technical or biological artifacts, leading enthusiastic researchers on long and fruitless tangents. Clustering algorithms, by design, will cluster and can even find seemingly impressive clusters among random noise.

Multi-omics integration

The microbiome refers to the collection of microbial genes in a sample. While the present work focuses on this type of data, other ’omics also exist29,30,31,32. Microbial genetic data provide evidence of microorganism presence as well as their functional potential33. Besides metagenomics, the three most common types of ’omics data in microbiome–gut–brain axis studies are the following:

  • Metabolomics: the metabolites and small molecules in a sample. Mass spectrometry or nuclear magnetic resonance spectroscopy are the most common techniques to measure the metabolome. Metabolomics can shed light on the functional consequences of a given microbiome.

  • Metatranscriptomics: the sequencing of RNA in a sample. In practice, metatranscriptomics can be thought of as RNAseq on a microbial community rather than a single organism. Metatranscriptomics can tell us about the transcriptional activity of a microbial community. A microorganism may be present and have a certain gene, but it may not be transcribing that gene34.

  • Metaproteomics: the proteins in a sample. Typically, metaproteomics relies on specialized mass spectrometry techniques to identify proteins and derive their sequences. Metaproteomics goes further than metatranscriptomics and tells us whether the transcribed genes are translated to proteins.

There are three broad approaches to data integration, treating datasets as either univariate or multivariate. The suffixes -variable and -variate are often used interchangeably, but they refer to subtly but meaningfully distinct concepts35. In short, -variate refers to the structural nature of the data, whereas -variable refers to the structure and number of variables in the statistical model (also Fig. 2b):

  • Univariate–univariate: with two separate multivariate datasets, one can perform an acceptable analysis using ‘simple’ univariate methods by correlating each microbiome feature (for example, taxa or gene) with individual features in the other dataset, one at a time. Metrics such as Pearson’s and Spearman’s rank correlation coefficients are commonly used for this purpose. Both of these metrics can be thought of as special cases of a linear model, which we particularly recommend as it allows for the inclusion of covariates.

  • Univariate–multivariate: treat one feature from one dataset as a dependent variable, and use all features from the other dataset as the predictors. By repeating this for each feature, all associations between the datasets are described.

  • Multivariate–multivariate: multivariate regression, such as a canonical correlation analysis or redundancy analysis36, to obtain a single model that associates all features from one dataset with all features from the other dataset. The mixOmics package provides a user-friendly implementation of multivariate methods for microbiome research37,38. Similarly, neural networks or other machine learning can be used39,40,41.

For multivariate–multivariate analysis, one compelling method, DIABLO42, extends this approach by comparing association networks between phenotypes, focusing on the interactions between two ’omics datasets rather than the values within the two individual datasets. This permits the discovery of patterns not necessarily visible in either of the individual datasets. Note also that it is possible to extend any of these approaches to incorporate external information about known relationships within the individual dataset or across the two datasets (for example, via a gene ontology database). For example, joint pathway analysis takes advantage of existing biological knowledge structures by mapping two ’omics datasets to the same metabolic pathways and then assessing the joint coverage as a readout of pathway enrichment43,44. Such knowledge structures could also be leveraged to constrain an analysis to include only feature pairs that are canonically able to interact according to the database, thus potentially preserving power by avoiding unnecessary hypothesis testing, as exemplified by the anansi framework45. Whatever approach one uses, analysts should take care to normalize or transform their data appropriately, especially since correlations can yield spurious results when measured for compositional data46,47. As with differential abundance analysis, multiple multivariate tests should always be accompanied by a false discovery rate (FDR) adjustment. When null hypothesis testing is not straightforward in multivariate methods, permutations or algorithmic validation (for example, cross-validation) may be used instead.
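As an illustration of what an FDR adjustment does to a set of P values (one per tested feature pair, for instance), the Python sketch below implements the Benjamini–Hochberg step-up procedure. Choosing Benjamini–Hochberg is our assumption, as the text does not prescribe a specific FDR method, and the function name and input values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(p_values)
    # Indices of the P values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical P values from four feature-wise association tests.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(q, 4) for q in benjamini_hochberg(raw)])
```

Features whose adjusted value falls below the chosen threshold (for example, 0.1) would then be reported, with the threshold ideally fixed before the analysis.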

Exploring the mesoscale

Mesoscale features of the microbiome contain information about patterns within parts of a microbiome that can be seen across samples, rather than about its smallest parts (the microscale) or about the whole system (the macroscale). Mesoscale analysis focuses on identifying community-level patterns that define the ecosystem(s). This is useful because phenomena in a microbiome may be more readily explained by aggregated patterns in the data rather than by any individual feature. The mesoscale is an important object of study in theoretical ecology48. The following emerging techniques derive microbiome mesoscale features. The first three make use of external knowledge. The final two are purely data driven:

  • Ecological guilds: ecological guilds are taxonomically unrelated but functionally related clusters of microorganisms that have a shared role in the microbiome (for example, occupy a common niche). For example, microbial communities across a wide span of environments, including soil, the ocean, and the human gut, could be assigned to trophic groups on the basis of how they feed on certain substrates and subsequently pass on metabolites to another trophic group49. While ecological guilds are a promising concept in microbiome science, to our knowledge there are currently no standardized pipelines or databases that can be used to detect and compare ecological guilds across cohorts and experiments (but compare ref. 50). Such tools would be welcome additions to the field51,52.

  • Functional modules: functional modules are curated lists of metabolic pathways encoding processes that are related to a specific aspect of the microbiome. We will consider two classes of functional modules. Gut–brain modules cover pathways that are related to gut–brain communication, such as serotonin degradation and histamine synthesis. The complete list of gut–brain modules can be accessed as a table in the supplementary files of ref. 53. Gut–metabolic modules, from the same group, encompass metabolic processes in the microbiome. Changes in gut–metabolic modules can indicate a shift in the microbial metabolic environment and thereby in the fitness landscape, thus allowing for microorganisms with different metabolic features to thrive. The complete list of gut–metabolic modules can be accessed as a table in the supplementary files of the paper that introduced them54. Functional modules are especially interpretable and help develop hypotheses for future experiments. We note that functional module analysis depends on the availability of a functional abundance table (see the section on functions in the companion piece of this Perspective1).

  • Enrichment analysis: differential abundance (DA) analysis is first performed on taxa or genes or other ’omics features, and then a functional database is used to summarize the DA results. In the simplest case, the DA results can be dichotomized into significant or non-significant, and functional status can be dichotomized as present or absent. For each function, one could perform a Fisher exact test (or similar) to test over-enrichment among the significant taxa, genes, or features43,55. Gene set enrichment analysis is a popular generalization of this concept and is commonplace in gene expression analysis55.

  • Network analysis: network analysis is most often applied to study or visualize associations between microbiome features such as taxa or genes. This requires some measure of association. The Pearson’s correlation is the most popular; however, correlations have been shown to yield spurious results when applied to compositional data56. For this reason, several alternatives have been designed specifically for microbiome data46,57,58 (also see the discussion on compositionality in the companion piece to this Perspective1). These metrics build on a log-ratio transformation that makes them more robust to the biases introduced by compositionality59, although they can still be prone to false positives60. A recent benchmark of 213 single-cell datasets has shown that proportionality has excellent performance for sparse high-dimensional data such as those encountered in microbiome research61.

  • Balance selection and summed log-ratios: balance selection and data-driven amalgamation are two new approaches to learning mesoscale features directly from the data. In both cases, the motivation is to find mesoscale features that serve as a biomarker to predict another variable of interest. These mesoscale features are unique in that they are defined explicitly as a ratio between groups of taxa, similar to the Firmicutes-to-Bacteroidetes ratio62. By using a ratio of taxa, any normalization factors would cancel, thus making the method normalization-free. When the groups of taxa are summarized by a geometric mean, the resultant mesoscale feature is called a balance. When they are summarized by a sum, the resultant mesoscale feature is called a summed log-ratio. Software tools such as selbal63, balance64, amalgam65, and CoDaCoRe66 enable analysts to learn mesoscale features in a few lines of code. It is customary to validate the reliability of these features by measuring predictive performance in a withheld test set67.
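The difference between a balance and a summed log-ratio, described in the final item of the list above, can be shown in a few lines of Python. This is a sketch under stated assumptions rather than the implementation of selbal, balance, amalgam, or CoDaCoRe: the taxon names and counts are hypothetical, the two groups of taxa are fixed by hand instead of being learned from the data, zeros are assumed to have been replaced beforehand, and the scaling constant that isometric log-ratio balances additionally apply is omitted.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(sample, numerator, denominator):
    """Log-ratio of the geometric means of two groups of taxa."""
    top = geometric_mean([sample[t] for t in numerator])
    bottom = geometric_mean([sample[t] for t in denominator])
    return math.log(top / bottom)


def summed_log_ratio(sample, numerator, denominator):
    """Log-ratio of the summed (amalgamated) abundances of two groups of taxa."""
    top = sum(sample[t] for t in numerator)
    bottom = sum(sample[t] for t in denominator)
    return math.log(top / bottom)


# Hypothetical counts; zeros would need replacing before taking logarithms.
sample = {"taxon_a": 8, "taxon_b": 2, "taxon_c": 4, "taxon_d": 4}
group_1, group_2 = ["taxon_a", "taxon_b"], ["taxon_c", "taxon_d"]

# Rescaling every count (for example, deeper sequencing) leaves both features unchanged.
rescaled = {taxon: 10 * count for taxon, count in sample.items()}

print(balance(sample, group_1, group_2), balance(rescaled, group_1, group_2))
print(summed_log_ratio(sample, group_1, group_2),
      summed_log_ratio(rescaled, group_1, group_2))
```

In this toy sample the two groups have equal geometric means but unequal sums, so the balance is approximately zero while the summed log-ratio is positive. The two summaries can therefore rank samples differently, and which one is preferable depends on whether the geometric mean or the total abundance of a group is the more meaningful quantity for the question at hand.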

Time-varying signals

While 16S and shotgun sequencing allow only for snapshot measurements of the microbiome, in reality, microbiomes are dynamic ecosystems in constant flux. To account for this, it has become more common for studies to include multiple (repeated) measures of the microbiome. However, time-series analysis necessitates special considerations68,69.

Statistical considerations with time-varying data analysis

Time-series data, where the same microbiomes are sampled repeatedly, intrinsically break the assumption of independence between samples that many statistical tests rely on. Mixed-effects models are well equipped to handle this type of data, using the resampled microbiome as a random effect. The well-documented and widely used lme4 package in R provides an excellent framework for this70. More-specialized microbiome tools such as MaAsLin2 (ref. 71) are also available.

A recent study on the temporal variation of the microbiome estimated that inter-individual variation is smaller than intra-individual variation72. Taking several microbiome measurements over time may therefore be necessary to increase power to detect group differences. Another approach to deal with this high intra-individual variation is to include microbial variance in the model73. This allows investigation of whether microbial variability itself is associated with the phenotype of interest. The idea that microbial variance rather than abundance can be informative for a phenotype is core to the idea of volatility.

Volatility

The microbiome is a dynamic ecosystem that undergoes constant change. The degree of change in the microbiome over time is called volatility, which is inversely related to stability. The term was first coined during the early days of the Human Microbiome Project in the context of instability74 and was soon thereafter used to describe the degree of change in the microbiome between two time points75. It can be helpful to think of volatility as a change in sample diversity (alpha or beta) over time. In a neutral setting, without intervention, a higher volatility is generally considered to be associated with negative health outcomes76. One way to calculate volatility is to measure the beta diversity between two or more time points corresponding to the same host. When measuring volatility in this fashion, it is especially useful to choose a beta diversity metric that is also a distance (that is, one that satisfies the triangle inequality, such as PhiLR or Aitchison distance) so that any comparisons are standardized for all time points. Volatility has recently been shown to differ between enterotypes, indicating that microbiome composition at least partially explains microbiome volatility72. Because sampling depth is known to affect beta diversity indices, it may be worth subsampling before volatility analysis77.
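Volatility measured as the Aitchison distance between consecutive time points can be sketched as follows. The Python example is an illustration only: the function names, the pseudocount of 0.5 used to handle zeros, and the count data are assumptions, and subsampling to an even depth is not shown. The Aitchison distance is computed as the Euclidean distance between centered log-ratio (CLR) transformed samples.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform; the pseudocount is an assumed zero-handling choice."""
    logs = [math.log(c + pseudocount) for c in counts]
    centre = sum(logs) / len(logs)
    return [value - centre for value in logs]


def aitchison_distance(sample_a, sample_b, pseudocount=0.5):
    """Euclidean distance between two CLR-transformed samples."""
    return math.dist(clr(sample_a, pseudocount), clr(sample_b, pseudocount))


def volatility(time_series, pseudocount=0.5):
    """Aitchison distance between each pair of consecutive time points of one host."""
    return [aitchison_distance(earlier, later, pseudocount)
            for earlier, later in zip(time_series, time_series[1:])]


# Hypothetical counts for one host: three time points, four taxa.
host = [[120, 30, 0, 50], [110, 35, 2, 53], [20, 10, 90, 80]]
print([round(step, 3) for step in volatility(host)])
```

Each host yields one value per pair of consecutive time points. These values (or a per-host summary of them) can then be compared between groups or related to a phenotype, keeping in mind that unequal sampling intervals would make the individual steps less comparable.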

Circadian rhythms

Circadian rhythms, or 24-hour biological cycles, are key in maintaining physical and mental health78. The microbiome is an example of a biological system that displays such a 24-hour cycle79,80. Typically, models that assess rhythmicity will make use of a sinusoidal model rather than a conventional linear model. Circadian rhythms are a special case of time-varying data as there is an implicit assumption that microbial taxa will oscillate around a set mean (mesor). Due to the 24-hour period of a circadian rhythm, time of sampling becomes an important source of variance and thus a relevant covariate even when the researcher is not interested in investigating circadian rhythms per se. We recently developed the kronos package in R to analyze circadian rhythms in the microbiome81.
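The sinusoidal model mentioned above is often written in its cosinor form, y(t) = M + β·cos(2πt/24) + γ·sin(2πt/24), where M is the mesor, the amplitude equals √(β² + γ²), and the time of peak follows from the angle atan2(γ, β). The Python sketch below fits this model by ordinary least squares. It is a generic illustration and does not reproduce the interface or internals of the kronos package; the function names and the noise-free example data are hypothetical, and no test of rhythmicity is performed.

```python
import math


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_cosinor(times, values, period=24.0):
    """Least-squares fit of y = mesor + beta*cos(wt) + gamma*sin(wt).

    Returns (mesor, amplitude, peak_time), with peak_time in the units of `times`.
    """
    omega = 2 * math.pi / period
    design = [[1.0, math.cos(omega * t), math.sin(omega * t)] for t in times]
    # Normal equations: (X'X) coefficients = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, values)) for i in range(3)]
    mesor, beta, gamma = solve_linear(xtx, xty)
    amplitude = math.hypot(beta, gamma)
    peak_time = (math.atan2(gamma, beta) / omega) % period
    return mesor, amplitude, peak_time


# Hypothetical, noise-free abundances sampled every 4 hours, peaking at hour 6.
hours = [0, 4, 8, 12, 16, 20]
abundance = [5 + 2 * math.cos(2 * math.pi * (t - 6) / 24) for t in hours]
print(fit_cosinor(hours, abundance))
```

Because the model is linear in M, β, and γ, covariates can be added in the same way as in a conventional linear model. This is also why time of sampling can be adjusted for even when rhythmicity is not the question of interest.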

Consolidating and looking forward

As the microbiome–gut–brain axis field continues its maturation, we shift our priorities away from a basic demonstration of relevance and toward formulating and addressing more mechanistic questions. In the final section of this Perspective, we briefly look forward to efforts to consolidate findings in the field.

Meta-analyses

In a nutshell, meta-analyses incorporate outcomes from numerous studies on the same subject to estimate a ‘true effect’ on the basis of a weighted summary of the component studies. When planning a meta-analysis of microbiome–gut–brain axis studies, it is particularly important to consider which features to analyze. For example, it may be preferable to investigate the role of microbial functions in a disorder rather than taxonomy-level data. In addition, due to large inter-study heterogeneity in methodology, it may not be appropriate to compare reported outcomes from studies at all, and a ‘meta-re-analysis’ from raw data may be warranted. This again underlines the importance of making microbiome data publicly available. We note and applaud the burgeoning development of meta-analysis methods for microbiome studies such as MMUPHin (ref. 82), which account for the heterogeneity in pre-processing that precludes standard meta-analysis tools and techniques83,84,85. A groundswell of attempts at reproducing previous findings and quantitative synthesis of the literature to date will improve the robustness of the field, as it has done in others.
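The ‘weighted summary’ at the heart of a classical meta-analysis can be illustrated with inverse-variance weighting. The Python sketch below is a deliberately simple fixed-effect version with hypothetical effect sizes and standard errors; it assumes that all studies estimate one common effect, which, given the methodological heterogeneity discussed above, will often not hold for microbiome studies. It is not the approach implemented in MMUPHin, and random-effects models or a re-analysis from raw data may be more appropriate in practice.

```python
import math


def fixed_effect_pool(effects, standard_errors):
    """Inverse-variance weighted pooled effect and its standard error."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical per-study effect sizes (for example, log fold changes) and standard errors.
study_effects = [0.30, 0.10, 0.45]
study_ses = [0.10, 0.20, 0.25]

estimate, se = fixed_effect_pool(study_effects, study_ses)
print(f"pooled effect: {estimate:.3f} "
      f"(95% CI {estimate - 1.96 * se:.3f} to {estimate + 1.96 * se:.3f})")
```

More precise studies (those with smaller standard errors) receive larger weights, so the pooled estimate is pulled toward them and its standard error is smaller than that of any single study.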

Toward enriching microbiome–gut–brain axis research

In this part 2 of our Bugs as features Perspective, we have taken you on a tour of both adjacent and far-flung topics to enrich contemporary microbiome–gut–brain axis research, from Mendelian randomization and mediation analysis to numerous ways to explore microbiome patterns of the mesoscale. In combination with the concepts and foundations detailed in part 1, and the corresponding supplementary code tutorial, we have described the key considerations for microbiome–gut–brain axis analysis1. In our opinion, establishing causality, integrating multi-omics data, and accounting for the dynamic nature of the microbiome are key. We hope that this Perspective has assisted with confident navigation of the microbial landscape. We trust that the increased use of biologically and statistically sound methods such as those described here will improve our understanding of the complex phenomenon known as the microbiome–gut–brain axis.