EMBED: Essential MicroBiomE Dynamics, a dimensionality reduction approach for longitudinal microbiome studies

Shahin, Mayar; Ji, Brian; Dixit, Purushottam D.

doi:10.1038/s41540-023-00285-6


Published: 20 June 2023


Mayar Shahin¹,
Brian Ji² &
Purushottam D. Dixit^1,3,4

npj Systems Biology and Applications volume 9, Article number: 26 (2023)


Abstract

Dimensionality reduction offers unique insights into high-dimensional microbiome dynamics by leveraging collective abundance fluctuations of multiple bacteria driven by similar ecological perturbations. However, methods providing lower-dimensional representations of microbiome dynamics both at the community and individual taxa levels are not currently available. To that end, we present EMBED: Essential MicroBiomE Dynamics, a probabilistic nonlinear tensor factorization approach. Like normal mode analysis in structural biophysics, EMBED infers ecological normal modes (ECNs), which represent the unique orthogonal modes capturing the collective behavior of microbial communities. Using multiple real and synthetic datasets, we show that a very small number of ECNs can accurately approximate microbiome dynamics. Inferred ECNs reflect specific ecological behaviors, providing natural templates along which the dynamics of individual bacteria may be partitioned. Moreover, the multi-subject treatment in EMBED systematically identifies subject-specific and universal abundance dynamics that are not detected by traditional approaches. Collectively, these results highlight the utility of EMBED as a versatile dimensionality reduction tool for studies of microbiome dynamics.


Introduction

Advances in sequencing have enabled the characterization of host-associated microbiomes at an unprecedented resolution^1,2. In contrast to static cross-sectional snapshots of these ecosystems, longitudinal studies offer unique insights into the biological processes structuring microbial ecosystems within individual hosts. For example, recent longitudinal studies on the gut microbiome have elucidated the determinants of microbiome colonization in early childhood^3,4, the effects of the microbiome on outcomes following bone-marrow transplant⁵, and the recolonization of microbial communities following antibiotic perturbation^{6,7,8,9,10,11}.

Yet, understanding how the microbiome changes in response to environmental perturbations such as host diet variation^12,13 and antibiotic administration^10,11 remains challenging. This is because of the enormous organizational complexity of these ecosystems, comprising thousands of individual bacterial taxa whose abundances vary substantially across space and time^{12,14,15,16,17} and across biological replicates¹⁸. In addition, technical sequencing noise can seriously confound true abundance changes^15,19,20. For example, technical noise is likely to be the most dominant factor in the observed abundance variability in more than half the bacterial taxa in longitudinal gut microbiome studies¹⁵ and likely remains a significant contributor for all measured taxa.

Despite this complexity, recent work suggests that abundances of individual bacterial species fluctuate with collective responses to perturbations^10,11,12,13. Therefore, the high-dimensional dynamics of the microbiome could potentially be understood as dynamics of a few collective variables on a manifold of a much smaller dimension²¹. Indeed, approaches such as multidimensional scaling that embed microbiome samples on a smaller dimensional manifold are popular^22,23,24. However, these methods only identify shifts at the community level¹⁸. Crucially, these methods do not account for temporal correlations in abundances of individual bacterial taxa and variability across subjects.

At the same time, there is a long history of using dimensionality reduction for multivariate time-series data²⁵. Indeed, several methods have been developed in the last decade focusing specifically on the analysis of microbiome dynamics. Methods such as ecogroup identification²⁶ use covariation in longitudinal data to infer interaction patterns between taxa. In contrast, methods such as MDSINE2²⁷ and MTV-LMM²⁸ infer interactions among species by fitting microbiome abundance dynamics to phenomenological models. Methods such as LUMINATE²⁰, TGP-CODA¹⁹, and DIVERS¹⁵ quantify the magnitude of noise in abundance time series. Finally, dimensionality reduction approaches such as CTF¹⁸ impute lower-dimensional representations for individual subjects as well as time points using sparse tensor factorization of log-transformed data with the purpose of identifying groups of subjects with unique dynamical signatures.

In this context, we present EMBED: Essential MicroBiomE Dynamics. EMBED is a probabilistic nonlinear tensor factorization-based dimensionality reduction method. EMBED infers common dynamical features in microbiome trajectories of multiple subjects that experience the same environmental perturbation (dietary shifts, antibiotic exposure, etc.). EMBED identifies a set of unique and orthogonal temporal bases, which we call Ecological Normal Modes (ECNs), and taxa- and subject-specific loadings that quantify the contribution of individual ECNs in determining the abundance dynamics of taxa in individual subjects. ECNs are the statistically independent and unique dynamical templates along which the abundance trajectories of individual bacteria are decomposed. As we will show below, ECNs can also be viewed as the latent drivers of the microbial ecosystem. In systems strongly driven by environmental perturbations, they are reflective of the environmental perturbations as well as inherent dynamics of the microbiome.

EMBED has several salient features. First, bacterial abundances are known to vary substantially even over short periods of time¹⁶. To model this variability, EMBED utilizes the exponential Gibbs–Boltzmann distribution (the same functional form as the softmax function). The Gibbs–Boltzmann distribution allows EMBED to capture very large changes in bacterial abundances with relatively small changes in the corresponding latents²⁹. Second, by restricting the number of ECNs to be low, EMBED can provide a low-dimensional description of the community by filtering out small fluctuations in the data that may be unimportant. Third, ECNs are inferred using a probabilistic model that accounts for sequencing noise inherent in all microbiome studies¹⁵. Fourth, similar to the normal modes in structural biology³⁰, ECNs represent statistically independent modes of collective abundance changes. Fifth, the explicit multi-subject treatment in EMBED systematically identifies universal and subject-specific dynamical behaviors and the bacterial taxa that exhibit those behaviors.

Using synthetic data and several publicly available longitudinal datasets^12,13,14, we show that EMBED-based low-dimensional approximation of microbial community dynamics is accurate and robust to sequencing noise, underscoring the low-dimensional nature of microbiome dynamics. Using synthetic data, we show that EMBED infers statistically independent dynamical modes. Using two datasets that encompass major ecological perturbations, namely dietary changes¹³ and antibiotic administration¹⁰, we show that the identified ECNs reflect specific ecological behaviors and serve as templates to reconstruct the dynamics of individual bacterial taxa. The loadings identify universal and subject-specific bacterial taxa dynamics. These results show that EMBED will be an important dimensionality reduction tool to decipher collective dynamical behaviors within the microbiome.

Results

EMBED identifies reduced-dimensional descriptors for longitudinal microbiome dynamics

In EMBED (Fig. 1), we model microbial abundance counts ${n}_{{os}}(t)$ (Operational taxonomic unit, OTU “o”, subject “s”, and time point “t”) as arising from a multinomial distribution. The likelihood of observing the data is given by:

$$L=\prod \limits_{s,t}\frac{{N}_{s}\left(t\right)!}{\prod\limits _{o}{n}_{{os}}\left(t\right)!}\prod\limits _{o}{q}_{{os}}{\left(t\right)}^{{n}_{{os}}\left(t\right)}$$

(1)

where ${N}_{s}\left(t\right)=\sum _{o}{n}_{{os}}(t)$ is the total read count on a given day t for subject s. The probabilities ${q}_{{os}}\left(t\right)$ are modeled as a Gibbs–Boltzmann distribution²⁹

$${q}_{{os}}\left(t\right)=\frac{1}{{\Omega }_{{st}}}{{\exp }}\left(-\mathop{\sum }\limits_{k=1}^{K}{z}_{{tk}}{\theta }_{{kos}}\right).$$

(2)

In Eq. (2), ${z}_{{tk}}$ are time-specific latents that are shared by all OTUs and subjects, ${\theta }_{{kos}}$ are OTU- and subject-specific loadings that are shared across all time points, and ${\Omega }_{{st}}$ is the normalization constant. This low-rank tensor factorization is a special case of the so-called Tucker decomposition³¹. The number of latents is chosen such that $K\ll O,T$ to obtain a reduced-dimensional description. The parameters are estimated using log-likelihood maximization. While most microbiome abundance data are compositional³², new techniques are being developed to measure absolute bacterial loads^15,33,34. In addition to modeling relative abundance data, EMBED is also equipped to model measurements of absolute abundances. To do so, we use the absolute abundance instead of the daily total read count ${N}_{s}\left(t\right)$ in Eq. (1).
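To make Eqs. (1) and (2) concrete, the following minimal sketch computes the Gibbs–Boltzmann probabilities and the multinomial log-likelihood for a single subject. The array layout (`z` as T × K, `theta` as K × O, `counts` as T × O), the function names, and the toy data are assumptions made for illustration; this is not the authors' implementation.

```python
import math
import numpy as np

def gibbs_boltzmann(z, theta):
    """q[t, o] = exp(-sum_k z[t, k] * theta[k, o]) / Omega[t]  (Eq. 2, one subject)."""
    logits = -(z @ theta)                              # shape (T, O)
    logits -= logits.max(axis=1, keepdims=True)        # stabilise the exponentials
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)            # division by Omega[t]

def multinomial_log_likelihood(counts, q):
    """log L of Eq. 1 for one subject, including the combinatorial prefactor."""
    lgamma = np.vectorize(math.lgamma)
    N = counts.sum(axis=1)                             # total reads per time point
    log_coeff = lgamma(N + 1) - lgamma(counts + 1).sum(axis=1)
    return float((log_coeff + (counts * np.log(q)).sum(axis=1)).sum())

# Toy example (hypothetical sizes): T = 6 time points, O = 4 OTUs, K = 2 latents.
rng = np.random.default_rng(0)
T, O, K = 6, 4, 2
z = rng.normal(size=(T, K))
theta = rng.normal(size=(K, O))
q = gibbs_boltzmann(z, theta)
counts = np.array([rng.multinomial(10_000, q_t) for q_t in q])
log_lik = multinomial_log_likelihood(counts, q)
```

In a fit, `z` and `theta` would be adjusted to maximize `log_lik` summed over subjects; the combinatorial prefactor does not depend on the parameters and could be dropped during optimization.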

The optimal values of the parameters depend on the initial conditions but are nonetheless related to each other via a linear transformation²⁹. We therefore identify a unique and orthonormal representation for the latents by exploiting the dynamical nature of the data. The long-term stability of the microbiome is now well-established^16,17,35. Therefore, we fit a “return to normal” linear dynamical model to inferred latents:

$${{\boldsymbol{z}}}_{t+1}={\boldsymbol{A}}{{\boldsymbol{z}}}_{t}+{\boldsymbol{u}}+{\boldsymbol{\varepsilon }}{\boldsymbol{.}}$$

(3)

In Eq. (3), the matrix A is assumed to be symmetric, u are the baseline values, and the noise ε is assumed to be Gaussian distributed and uncorrelated. After diagonalizing the inferred interaction matrix (Supplementary Information section 1), ${\boldsymbol{A}}{\boldsymbol{=}}{{\boldsymbol{v}}}^{T}{\boldsymbol{\Lambda }}{\boldsymbol{v}}$, we find that the re-oriented latents, or the ecological normal modes (ECNs), ${{\boldsymbol{y}}}_{t}={\boldsymbol{v}}{{\boldsymbol{z}}}_{t}$ fluctuate independently of each other

$${y}_{t+1,k}={\Lambda }_{k}{y}_{{tk}}+{u}_{k}^{{\prime} }+{\varepsilon }_{k}^{{\prime} }.$$

(4)

In Eq. (4), ${{\boldsymbol{u}}}^{\prime }={\boldsymbol{v}}{\boldsymbol{u}}$ and ${{\boldsymbol{\varepsilon }}}^{\prime }={\boldsymbol{v}}{\boldsymbol{\varepsilon }}$. We redefine the corresponding loadings ${\boldsymbol{\Phi }}={{\boldsymbol{v}}}^{T}{\boldsymbol{\theta }}$. Notably, since ${\boldsymbol{v}}{{\boldsymbol{v}}}^{T}={\boldsymbol{I}}$, this simultaneous transformation is a mere reorientation of the latents and the loadings and does not change model predictions²⁹. As we will show below, the orthonormal ECNs are uniquely defined for a given dataset. We note that the actual dynamics of the latents are likely to be more complex than the linear model (Eq. (3)). Yet, similar to normal mode analysis³⁰, as we will show below, ECNs represent a reorientation of the latents that uncovers the unique and orthogonal templates of microbial abundance fluctuations.
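The reorientation of Eqs. (3) and (4) can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Supplementary Information section 1: here `A` and `u` are estimated by ordinary least squares and `A` is made symmetric by averaging it with its transpose. The latents are stored as a T × K array `Z` and the loadings of one subject as a K × O array `theta`; with this layout the compensating rotation of the loadings is `V @ theta`, and how that is written as a transpose depends on how the loadings are stored.

```python
import numpy as np

def reorient(Z, theta):
    """Rotate latents Z (T x K) and loadings theta (K x O) onto the eigenbasis of A."""
    T, K = Z.shape
    # Least-squares fit of z_{t+1} = A z_t + u (Eq. 3); an assumed estimator.
    X = np.hstack([Z[:-1], np.ones((T - 1, 1))])       # regressors [z_t, 1]
    coef, *_ = np.linalg.lstsq(X, Z[1:], rcond=None)   # shape (K + 1, K)
    A = coef[:K].T
    A = 0.5 * (A + A.T)                                # impose the assumed symmetry
    lam, W = np.linalg.eigh(A)                         # A = W diag(lam) W^T
    V = W.T                                            # rows are eigenvectors: A = V^T Lam V
    Y = Z @ V.T                                        # y_t = V z_t for every t
    Phi = V @ theta                                    # rotate loadings to compensate
    return Y, Phi, lam, V

# Toy check with hypothetical sizes: 30 time points, K = 3 latents, 8 OTUs.
rng = np.random.default_rng(0)
Z = rng.normal(size=(30, 3))
theta = rng.normal(size=(3, 8))
Y, Phi, lam, V = reorient(Z, theta)
```

Because `V` is orthogonal, `Y @ Phi` equals `Z @ theta`, so the exponent in Eq. (2), and hence every predicted abundance, is unchanged by the rotation.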

EMBED accurately and robustly approximates microbiome abundance time series using dynamics on a lower-dimensional manifold

Using EMBED, we approximated microbiome abundance time series from publicly available longitudinal datasets on human beings^11,12,14 and mice^10,13 as well as synthetic data generated using a multispecies Lotka–Volterra model³⁶ (Supplementary Information section 1). When using EMBED and other reconstruction methods to model synthetic data, we sampled relative abundances from a multinomial distribution parameterized by the true underlying propensities of species, with a sequencing depth of 10⁴. The accuracy of reconstruction was evaluated against the true propensities as predicted by the model. We compared EMBED with CTF (compositional tensor factorization), a recently developed dimensionality reduction method by Martino et al.^18,37, and sparse vector autoregressive modeling (referred to as Lasso from here onwards)^38,39. Like EMBED, CTF yields both a time-series reconstruction and a lower-dimensional embedding, whereas Lasso yields only a time-series reconstruction using fewer parameters than the data. To put Lasso on an equal footing with low-rank factorization methods like EMBED and CTF, we tuned the Lagrange multiplier that dictates sparsity so that the number of Lasso parameters was approximately equal to that of EMBED and CTF (number of parameters = K × O + K × T, where O is the number of OTUs and T is the number of time points for a single-subject time series; Supplementary Information section 2).

In Fig. 2, we show that EMBED-based reconstruction was significantly more accurate than CTF and Lasso both at the level of community composition as well as the dynamical trajectories of individual OTUs. Figure 2a–c show results for the publicly available datasets and Fig. 2d–f show results for the Lotka–Volterra model. Notably, as seen in Fig. 2a–f, EMBED was better at data reconstruction than CTF and Lasso for every time series. We note that the results presented below are insensitive to the dimension of the latent space (Supplementary Figs. 1 and 2) as well as the sequencing depth (Supplementary Fig. 3) and to temporally fluctuating carrying capacities in the Lotka–Volterra model (Supplementary Fig. 4). The details of the analyses can be found in Supplementary Information section 3.

**Fig. 2: EMBED-based reconstruction of microbiome time series is accurate and precise.**

Figure 2a shows the KL divergence between the observed community composition and the reconstructions based on EMBED, CTF, and Lasso. EMBED-based reconstruction was more accurate at the community level (Wilcoxon signed rank $p=1.8\times {10}^{-5}$ for the comparison between EMBED and CTF and EMBED and Lasso). Figure 2b shows that the mean squared error in OTU-specific longitudinal trajectories (averaged over OTUs) was lower in EMBED-based reconstruction (Wilcoxon signed-rank $p=1.8\times {10}^{-5}$ for the comparison between EMBED and CTF and EMBED and Lasso). Finally, in Fig. 2c, we show the Pearson correlation coefficient between the observed longitudinal time series of individual OTUs and the corresponding reconstruction. The Pearson correlation coefficient was averaged across OTUs for each subject and one number was reported per subject. This Pearson correlation coefficient was higher for EMBED (Wilcoxon signed rank $p=1.8\times {10}^{-5}$ for the comparison between EMBED and CTF and EMBED and Lasso). Figure 2d–f shows similar plots for synthetic data (Wilcoxon signed rank $p=7.5\times {10}^{-10}$ for the comparison between EMBED and CTF and EMBED and Lasso). We note that all p-values are identical because EMBED reconstruction was always better than CTF and Lasso reconstructions for individual datasets (not shown), leading to identical p-values for the nonparametric Wilcoxon test.
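One way to see why the p-values coincide is to assume an exact two-sided signed-rank test on $n$ paired comparisons with no ties or zero differences (an assumption; the caption does not state $n$ or whether a normal approximation was used). Under the null hypothesis each of the ${2}^{n}$ sign assignments is equally likely. If every difference has the same sign, the sum of ranks carrying the opposite sign, ${W}^{-}$, takes its most extreme value of zero, so

$$p=2\Pr \left({W}^{-}=0\right)=\frac{2}{{2}^{n}}={2}^{1-n}.$$

The result depends only on the number of pairs and not on the size of the differences, so any two comparisons with the same $n$ and uniformly better performance give the same p-value.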

We next tested how the three methods perform when reconstructing OTU-specific daily abundance changes (Fig. 2g). To that end, we estimated the log ratio of daily abundance changes $\Delta ={{{\log }}}_{10}\frac{{x}_{o}(t+1)}{{x}_{o}(t)}$ across all OTUs and all days both in the publicly available time-series data and in the reconstructed time series ${\Delta }_{M}$ (M = EMBED/CTF/Lasso). We then investigated the dependence of the absolute error $\delta \Delta =|\Delta -{\Delta }_{M}|$ on the abundance ${x}_{o}(t)$. To that end, we binned the reconstruction error for every 5th percentile of OTU abundances ${x}_{o}(t)$. In Fig. 2g, we plot the average error for each of the 5-percentile intervals (error bars represent standard errors of the mean). Interestingly, we see that while CTF is more accurate than EMBED and Lasso at reconstructing low abundances, EMBED is more accurate in reconstructing abundance changes for highly abundant OTUs. Notably, our analysis suggests that abundance fluctuations of OTUs with mean abundance <0.1% (log₁₀ = −3) are dominated by technical noise¹⁵. We therefore conclude that CTF-based reconstruction is accurate in modeling abundance changes that are dominated by noise, suggesting that CTF-based reconstruction may overfit to small and noise-dominated variations in OTU abundances. In contrast, EMBED-based reconstruction is more accurate compared to both CTF and Lasso for OTUs whose abundances are measured with minimal technical noise.
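The binning step described above can be sketched as follows. The function name, the array layout (time points × OTUs), the requirement of strictly positive abundances (real data would need a pseudocount or filtering of zeros), and the toy data are all assumptions for illustration.

```python
import numpy as np

def binned_log_ratio_error(x_obs, x_rec, n_bins=20):
    """Mean |Delta - Delta_M| per 5-percentile bin of the abundance x_o(t).

    x_obs, x_rec: arrays of shape (T, O) with strictly positive abundances.
    """
    delta_obs = np.log10(x_obs[1:] / x_obs[:-1])       # observed daily log ratios
    delta_rec = np.log10(x_rec[1:] / x_rec[:-1])       # reconstructed daily log ratios
    err = np.abs(delta_obs - delta_rec).ravel()
    abundance = x_obs[:-1].ravel()                     # x_o(t) at the start of each step
    edges = np.percentile(abundance, np.linspace(0, 100, n_bins + 1))
    # Assign each point to a percentile bin; clip so the maximum lands in the last bin.
    idx = np.clip(np.searchsorted(edges, abundance, side="right") - 1, 0, n_bins - 1)
    means = np.array([err[idx == b].mean() for b in range(n_bins)])
    sems = np.array([err[idx == b].std(ddof=1) / np.sqrt((idx == b).sum())
                     for b in range(n_bins)])
    return edges, means, sems

# Toy data: 50 time points, 40 OTUs, reconstruction = observation with small noise.
rng = np.random.default_rng(1)
x_obs = rng.dirichlet(np.ones(40), size=50)
x_rec = x_obs * np.exp(rng.normal(0.0, 0.1, size=x_obs.shape))
x_rec /= x_rec.sum(axis=1, keepdims=True)
edges, means, sems = binned_log_ratio_error(x_obs, x_rec)
```

Plotting `means` with `sems` as error bars against the bin midpoints would give a curve of the kind described for Fig. 2g, one per reconstruction method.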

The reorientation z→y of latents using a dynamical model (Eqs. (3) and (4)) allows us to identify independent directions of significant collective dynamics in the microbiome without changing the accuracy of model predictions. In contrast, any other orthogonal decomposition of the microbiome time series that does not explicitly take into account dynamics is likely to result in a latent space description that involves a mixture of independent modes. To test the dynamical independence of ECNs, we used the publicly available time series as above. Each time series was approximated by EMBED with K = 5 ECNs. We correlated the inferred ECNs with time series of abundances of individual taxa. Correlations that passed a 5% FDR threshold under the Benjamini–Hochberg procedure were deemed significant. As seen in Fig. 2h, on average, 35% of OTUs correlated with only one ECN while 45% of OTUs correlated with two or more ECNs. In contrast, 28% of OTUs correlated with only one component obtained using CTF (Wilcoxon signed-rank test $p=0.033$) and 54% of OTUs correlated with two or more components (Wilcoxon signed-rank test $p=0.014$). Notably, the specificity of taxon-ECN correlations was not due to the accuracy of the EMBED-based reconstruction. To test this, we performed SVD on the zθ matrix prior to the reorientation step (Eqs. (3) and (4) above) to obtain orthonormal latents ${{\boldsymbol{y}}}_{{SVD}}$ that did not consider the longitudinal nature of the data. We found that statistics of correlations of individual bacterial taxa with ${{\boldsymbol{y}}}_{{SVD}}$ were indistinguishable from CTF and significantly different compared to ECNs (Supplementary Table 1). These analyses underscore the importance of dynamical system-based reorientation of the latents in EMBED in identifying independent modes of significant collective abundance changes.
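The significance filter used in this analysis can be sketched as a plain Benjamini–Hochberg step-up procedure followed by a count of significant ECNs per OTU. The p-value matrix below is hypothetical, and pooling all OTU-ECN pairs into a single correction is an assumption, since the text does not specify how the tests were grouped.

```python
import numpy as np

def benjamini_hochberg(pvals, fdr=0.05):
    """Boolean mask of p-values declared significant at the given FDR."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = fdr * np.arange(1, m + 1) / m         # i * fdr / m for rank i
    passed = p[order] <= thresholds
    significant = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()                # largest rank meeting its threshold
        significant[order[:k + 1]] = True              # reject everything up to that rank
    return significant

# Hypothetical p-values: rows are OTUs, columns are ECNs.
pvals = np.array([[0.001, 0.40, 0.90],
                  [0.002, 0.004, 0.70],
                  [0.30, 0.50, 0.80]])
sig = benjamini_hochberg(pvals.ravel()).reshape(pvals.shape)
n_ecns_per_otu = sig.sum(axis=1)                       # here: [1, 2, 0]
```

The fractions reported in the text correspond to the share of OTUs with `n_ecns_per_otu == 1` and with `n_ecns_per_otu >= 2`.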

The probabilistic nature of EMBED accounts for spurious abundance variability arising from sampling noise. To test the robustness of EMBED to sampling noise, we generated ground truth trajectories using the multispecies Lotka–Volterra model³⁶ with both competitive and cooperative interactions^40,41. Using different sequencing depths, two sets of read counts were sampled using the same ground truth abundances. EMBED (and CTF) was used to model the observed read counts. The more robust the inference is to sampling noise, the better will be the agreement between the two inferred models. Indeed, as seen in Fig. 2i, EMBED-based reconstruction of abundance time series was internally consistent and robust to sequencing noise. The statistical significance of these results evaluated using the Wilcoxon signed-rank test can be found in Supplementary Table 2.
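The replicate-sampling design of this test can be sketched as follows. The ground truth here is a random stand-in rather than Lotka–Volterra output, and the function name and sizes are assumptions; the point is only that both read-count tables come from the same underlying proportions and differ by sampling noise alone.

```python
import numpy as np

def sample_reads(props, depth, rng):
    """Draw multinomial read counts for each time point; props has shape (T, O)."""
    return np.array([rng.multinomial(depth, p_t) for p_t in props])

rng = np.random.default_rng(2)
truth = rng.dirichlet(np.ones(30), size=20)            # stand-in ground truth, 20 x 30
depth = 10_000                                         # one of several depths to try
reads_a = sample_reads(truth, depth, rng)              # technical replicate 1
reads_b = sample_reads(truth, depth, rng)              # technical replicate 2
# Each replicate would then be fitted separately and the two reconstructions compared.
```

Repeating this at several values of `depth` shows how the agreement between the two fitted models changes as sampling noise grows.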

Based on these analyses, we conclude that EMBED can accurately and precisely reconstruct microbiome abundance time series using a small number of latent dimensions and that the inferred ECNs correspond to orthogonal modes of fluctuations in the collective dynamics of the bacterial ecosystem.

Effect of dietary oscillations on the gut microbiome

Host diet has been shown to be a major factor influencing gut bacterial dynamics^13,42 but in a subject-specific manner⁴³. We applied EMBED to the data collected by Carmody et al.¹³ to better understand bacterial abundance changes in response to highly controlled dietary perturbations. Briefly, the diets of five individually housed mice were alternated every ∼3 days between a low-fat, plant-polysaccharide diet (LFPP) and a high-fat, high-sugar diet (HFHS). Daily fecal samples were collected for over a month (Supplementary Fig. 5).

Using K = 5 ECNs, EMBED obtained a lower-dimensional time-series approximation that reconstructed the original data with great accuracy (average taxa Pearson correlation coefficient $r=0.75\pm 0.18$, average community Pearson correlation coefficient, $r=0.98\pm 0.003$) (Supplementary Fig. 6). Notably, the inferred ECNs were unique (Supplementary Fig. 7), and robust to missing samples (Supplementary Fig. 8 and Supplementary Table 3) as well as variation in OTU inclusion criteria (Supplementary Fig. 9 and Supplementary Table 4). The first ECN ${y}_{1}(t)$ represented a relatively constant abundance throughout the entire time series (Fig. 3a and Supplementary Information section 3). Moreover, the corresponding loading vector ${{\boldsymbol{\Phi }}}_{1}$ showed a significant correlation to the average individual OTU abundance across time (average Spearman correlation coefficient across subjects, $r=-0.86\pm 0.06$, Fig. 3b), suggesting that despite large-scale, cyclic dietary changes, gut bacterial abundances in the community tended to fluctuate around a constant average abundance.

**Fig. 3: The effect of dietary oscillations on microbiome dynamics.**

In contrast, ECNs ${y}_{2}(t)$ and ${y}_{3}(t)$ collectively captured the cyclic nature of dietary oscillations, confirming that the murine diet rapidly and reproducibly alters abundance dynamics even at the individual OTU level (Supplementary Information section 3). To identify OTUs whose oscillatory dynamics were similar across subjects, we clustered the loadings ${{\boldsymbol{\Phi }}}_{2}$ and ${{\boldsymbol{\Phi }}}_{3}$ of individual OTUs on ECNs ${y}_{2}(t)$ and ${y}_{3}(t)$ using Ward’s linkage. This approach is similar in spirit to clustering the log ratio of OTU dynamical trajectories reconstructed using OTU loadings corresponding only to ECNs ${y}_{2}(t)$ and ${y}_{3}(t)$ and OTU loadings corresponding only to ECN ${y}_{1}(t)$. It ensures that our identification of OTUs with similar dynamics is not influenced by their overall abundance. In addition to removing the effect of overall OTU abundances, EMBED also allows us to cluster OTU dynamics only along user-chosen dynamical modes. We found that bacteria in the community largely clustered into three groups (Fig. 3d): those whose abundances increased with the LFPP diet (blue, group 1), and those whose abundances increased with the HFHS diet to different extents (black and magenta, groups 2 and 3). In keeping with recent studies^44,45,46, we found that the genus Saccharicrinis, a member of the Bacteroidetes phylum, was significantly enriched in group 1 (5 out of 13 compared to 7 out of 73, hypergeometric test, $p=0.0015$), consistent with the notion that bacteria belonging to this genus are able to degrade plant polysaccharides and utilize the metabolic byproducts present in the LFPP diet.
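The enrichment statistic quoted above can be reproduced with a one-sided hypergeometric tail probability. The sketch below assumes the reading that 7 of the 73 OTUs belong to Saccharicrinis, group 1 contains 13 OTUs, and 5 of those 13 are Saccharicrinis; the function and argument names are illustrative.

```python
from math import comb

def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k) when n items are drawn without replacement from N, K of which are hits."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Saccharicrinis in group 1: 5 of the 13 group members, 7 of the 73 OTUs overall.
p_sacch = hypergeom_enrichment_p(k=5, n=13, K=7, N=73)   # approximately 0.0015
```

Under this reading the tail probability is about 0.0015, in agreement with the value reported in the text.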

Unexpectedly, we found two ECNs, ${y}_{4}(t)$ and ${y}_{5}(t)$, that represented profound nonoscillatory behavior in abundance fluctuations. ${y}_{4}(t)$ represented an overall drift in abundance (see Supplementary Information section 3) over the time series and ${y}_{5}(t)$ represented a U-shaped recovery (see Supplementary Information section 3). The loadings corresponding to these two modes were significantly correlated across subjects (Spearman correlation coefficient $r=0.37\pm 0.16$, averaged across mice). The top five OTUs with most negative and positive loadings ${{\boldsymbol{\Phi }}}_{4}$ (omitting OTUs that were also in the top five negative/positive for loadings ${{\boldsymbol{\Phi }}}_{5}$) experienced a significant, irreversible increase and decrease, respectively, throughout the time course of the experiment (Fig. 3c, top). Thus, while the dynamics of most gut bacteria in this community exhibit rapid and reversible changes in response to dietary oscillations, there exist certain bacteria that exhibit irreversible changes over time. In contrast, the top five OTUs with most negative and positive loadings ${{\boldsymbol{\Phi }}}_{5}$ (omitting OTUs that were also in the top five negative/positive for loadings ${{\boldsymbol{\Phi }}}_{4}$) experienced an inverted U-shaped and a U-shaped abundance profile (Fig. 3c, bottom). Interestingly, OTUs that exhibited these nonoscillatory behaviors differed significantly from subject to subject (Supplementary Table 5).

EMBED can identify OTUs that exhibit universal dynamics and those that exhibit subject-specific behavior. Each OTU within each subject-specific ecosystem is characterized by a K-dimensional vector of loadings corresponding to the K ECNs. OTUs whose loading vectors are similar across all subjects have similar dynamics across subjects, and vice versa for OTUs with different loading vectors. To identify these universal and subject-specific OTUs, we computed the average distance across all pairs of subjects of the OTU-specific loading vectors. This average distance also correlated strongly with the average distance of the subject-specific OTU-abundance trajectories (inset of Fig. 3e). In Fig. 3e, we plot the average abundance of the ten OTUs with the most similar Φ loadings (bottom) and the ten OTUs with the most dissimilar Φ loadings (top). The black lines show the OTU-averaged abundances for individual subjects and the colored bold lines (green and orange) show the average across subjects. As seen in Fig. 3e, the top ten OTUs whose dynamics were similar across all subjects strongly preferred the HFHS diet. Notably, the genus Oscillibacter is overrepresented among these OTUs (4 out of 10 compared to 5 out of 73, hypergeometric test $p=9\times {10}^{-4}$). Interestingly, this overrepresentation was observed only at the genus and the family level and was not observed at higher taxonomic classifications (Supplementary Table 6). Moreover, no other genus or family was overrepresented. This strongly suggests a specific genus-level preference for the high-fat, high-sugar diet in the genus Oscillibacter that can override subject-specific ecosystem parameters. Notably, Oscillibacter are known to prefer high-fat⁴⁷ as well as high-sugar diets⁴⁸. Future work is needed to further establish the mechanistic connection between Oscillibacter and HFHS diets. Notably, beyond these specific associations, we found that OTU-specific dynamics across subjects were not driven by phylogeny (Supplementary Table 7 and Supplementary Information section 4).
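The ranking of OTUs from universal to subject-specific can be sketched as an average pairwise distance between loading vectors. The Euclidean metric, the array layout (subjects × OTUs × ECNs), and the random toy loadings are assumptions; the text does not state which distance was used.

```python
import numpy as np
from itertools import combinations

def mean_pairwise_loading_distance(Phi):
    """Phi has shape (S, O, K): subjects x OTUs x ECNs. Returns one distance per OTU."""
    S = Phi.shape[0]
    dists = [np.linalg.norm(Phi[a] - Phi[b], axis=1)   # per-OTU distance for one pair
             for a, b in combinations(range(S), 2)]
    return np.mean(dists, axis=0)                      # average over subject pairs

# Toy loadings: 5 subjects, 73 OTUs, K = 5 ECNs.
rng = np.random.default_rng(3)
Phi = rng.normal(size=(5, 73, 5))
Phi[:, 0, :] = Phi[0, 0, :]                            # make OTU 0 identical in all subjects
d = mean_pairwise_loading_distance(Phi)
most_universal = np.argsort(d)[:10]                    # ten most similar loading vectors
most_specific = np.argsort(d)[-10:]                    # ten most dissimilar loading vectors
```

In this toy example OTU 0 has zero distance by construction and therefore ranks first among the universal OTUs.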

ECNs identify modes of recovery of bacteria under antibiotic action

Broad-spectrum oral antibiotics have significant effects on the gut flora both during and after administration. Specifically, microbiome abundance dynamics following antibiotic administration can potentially exhibit a combination of several typical behaviors which may reflect different survival strategies^7,9,11,49. These include quick recovery following removal of antibiotic, slow but partial recovery, and one-time changes followed by resilience to repeat antibiotic treatment. The temporal variation in abundances of any bacteria could be a combination of these typical behaviors. Moreover, given that the gut ecosystems differ across different hosts, the response of specific bacteria to the same antibiotic treatment could vary from host to host. To better parse the major modes of gut bacterial dynamics associated with antibiotic administration, we analyzed the data collected by Ng et al.¹⁰. Briefly, six mice were given the antibiotic ciprofloxacin in two regimens (days 1–4 and days 14–18) and fecal microbiome samples were collected daily over a period of ∼30 days (Supplementary Fig. 10).

We found that a very small number of ECNs, K = 4, was sufficient to capture the data with significant accuracy (average taxa Pearson correlation coefficient $r=0.80\pm 0.2$, average community Pearson correlation coefficient, $r=0.98\pm 0.01$) (Supplementary Fig. 6). Similar to the diet study, the inferred ECNs were unique (Supplementary Fig. 7) and robust to missing samples (Supplementary Fig. 8 and Supplementary Table 3) as well as variation in OTU inclusion criteria (Supplementary Fig. 9 and Supplementary Table 4). As shown in Fig. 4a and consistent with the diet analysis, ECN ${y}_{1}(t)$ was relatively stable throughout the study (Supplementary Information section 3) and the corresponding loading vector ${{\boldsymbol{\Phi }}}_{1}$ was strongly correlated with the mean OTU abundance over time (Spearman correlation coefficient $r=-0.57\pm 0.07$) (Fig. 4b). We found that the remaining ECNs followed broad classes of behaviors in response to periods of stress. ECN ${y}_{2}(t)$ appeared to represent an inelastic one-time change followed by a relatively stable response (Supplementary Information section 3). ECN ${y}_{3}(t)$ represented the opposite: it responded to the second antibiotic treatment but not the first. In contrast, ECN ${y}_{4}(t)$ represented elastic changes in the microbiome, potentially representing abundances reproducibly decreasing (or increasing) with the action of the antibiotic but quickly bouncing back to pre-antibiotic levels when it was withdrawn (Supplementary Information section 3).

**Fig. 4: Effect of antibiotic treatment on the gut microbiome.**

These salient dynamical features were captured when we clustered the OTUs on the loadings ${{\boldsymbol{\Phi }}}_{2}$ through ${{\boldsymbol{\Phi }}}_{4}$ using Ward’s linkage (Fig. 4c), which identified seven major groups of OTUs with distinct dynamical behaviors (Fig. 4c, d). Interestingly, while some of the groups simply reflected behaviors of individual ECNs, others could be understood according to their relative contributions across multiple ECNs. For example, the behavior of OTUs in groups 1 and 3 aligned with ECN ${y}_{2}(t)$, albeit with opposing trends. Group 1 OTUs flourished during the first antibiotic treatment but the second treatment did not elicit a similar response. In contrast, OTUs in group 3 diminished in their abundance after the first antibiotic treatment but were resistant to subsequent antibiotic action.

OTUs in groups 2, 5, 6, and 7 displayed highly elastic dynamics in response to both periods of antibiotic administration. Group 2 OTUs, among which the genus Akkermansia was overrepresented (both of the 2 Akkermansia OTUs among the 41 OTUs are in group 2, hypergeometric test $p=0.026$), flourished during the antibiotic treatment but decreased their abundance in a reversible manner when antibiotics were withdrawn. Notably, species from this genus are known to be rare in the human gut and to colonize it only following treatment with broad-spectrum antibiotics, including ciprofloxacin⁵⁰. OTUs in groups 5, 6, and 7, in contrast, diminished their abundance in the presence of antibiotics in a reversible manner. Group 6 was overrepresented by the genus Blautia (3 out of 6 compared to 5 out of 41, hypergeometric test $p=0.017$), while group 7 was overrepresented by the genus Aestuariispira (both of the 2 Aestuariispira OTUs among the 41 OTUs are in group 7, hypergeometric test $p=0.0073$). Finally, group 4 comprised OTUs that were exquisitely sensitive to initial antibiotic administration and whose abundance did not make any meaningful recovery. The genus Coprobacter was overrepresented among these OTUs (2 out of 5 compared to 3 out of 41, hypergeometric test $p=0.035$). These specific associations need to be further investigated.

Notably, OTUs in groups 5 and 7, which represent slower and partial recovery compared to OTUs in group 6, exhibited significant subject-to-subject variability, as quantified by both the average subject-to-subject variability in OTU-specific Φ loadings (Fig. 4e) and the subject-to-subject variability in OTU-specific abundance trajectories (Supplementary Fig. 10). While these OTUs exhibited qualitative dynamics of recovery across all subjects (Supplementary Fig. 10), the time course and the extent of recovery varied from subject to subject. These findings are corroborated by recent studies that show imperfect and subject-specific recovery of bacterial abundances following antibiotic treatment¹¹,⁵¹,⁵²,⁵³. Interestingly, unlike the diet study, the OTUs in the same dynamical group shared phylogenetic similarity (Supplementary Table 7 and Supplementary Information section 3).

Discussion

Bacteria in host-associated microbiomes live in complex ecological communities governed by competitive and cooperative interactions and a constantly changing environment. Extensive spatial and temporal variability and coordinated changes in abundances in response to environmental perturbations are hallmarks of these communities. Dimensionality reduction can leverage these fluctuations, but its use towards understanding microbiome dynamics has thus far been limited.

In this work, we presented EMBED, a dimensionality reduction approach specifically tailored to identify the ecological normal modes in the dynamics of bacterial communities that are shared across subjects undergoing identical environmental perturbations. Identified ECNs shed light on the underlying structure of bacterial community dynamics. By applying EMBED to several time-series datasets representing major ecological perturbations, we identified immediate and reversible changes to the gut community in response to these stimuli. However, EMBED also identified more subtle, longer-term, and perhaps irreversible changes to specific members of the community, the mechanisms and consequences of which would be interesting to pursue further. Notably, while EMBED can learn an accurate lower-dimensional representation of any longitudinal data (Supplementary Fig. 11), the inferred ECNs are most likely to be easily interpretable when individual hosts experience the same environmental perturbations.

One of the ECNs in the studied datasets (Figs. 3 and 4) was consistently found to be constant over time. This ECN also reflected the temporal mean abundance of individual OTUs. We can potentially leverage this insight and absorb this ECN in the lower-dimensional model. Specifically, we can model the departure from the mean abundance as a Gibbs–Boltzmann distribution. That is, instead of Eq. (1), we can model OTU abundances as

$${q}_{{os}}\left(t\right)=\frac{{\mu }_{{os}}}{{\Omega }_{{st}}}{{\exp }}\left(-\mathop{\sum }\limits_{k=1}^{K}{z}_{{tk}}{\theta }_{{kos}}\right).$$

(5)

where ${\mu }_{{os}}$ is the temporal average abundance of OTU “o” in subject “s”. This way, we model only the fluctuations around the mean abundance and potentially reduce the dimensionality of our description even further. We leave this for future studies.
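As an illustration of Eq. (5), the sketch below evaluates the mean-weighted propensities with NumPy. It assumes that ${\Omega }_{{st}}$ normalizes the propensities over OTUs for every subject and time point, and that arrays are laid out as `z[t, k]`, `theta[k, o, s]`, and `mu[o, s]`; the function name and array layout are illustrative choices, not part of the published code.

```python
import numpy as np

def propensities_about_mean(z, theta, mu):
    """Eq. (5): q[t, o, s] = mu[o, s] * exp(-sum_k z[t, k] * theta[k, o, s]) / Omega[s, t]."""
    energy = np.einsum("tk,kos->tos", z, theta)
    weights = mu[None, :, :] * np.exp(-energy)
    omega = weights.sum(axis=1, keepdims=True)  # assumed: normalization over OTUs
    return weights / omega

# Synthetic inputs for illustration only
rng = np.random.default_rng(0)
T, O, S, K = 6, 5, 2, 2
mu = rng.dirichlet(np.ones(O), size=S).T        # (O, S); each subject's means sum to one
theta = rng.standard_normal((K, O, S))
q0 = propensities_about_mean(np.zeros((T, K)), theta, mu)
```

With all latents set to zero, the model returns the temporal mean abundances themselves, which is the sense in which only fluctuations around the mean are modeled.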

One key parameter in EMBED is the number of components K. A larger K will necessarily fit the data better, potentially fitting noise and unimportant idiosyncrasies in the data. How do we decide the appropriate number of components? In this work, we chose K based on the qualitative elbow method⁵⁴ (Supplementary Fig. 12). However, going forward, more rigorous approaches can be implemented. EMBED is a probabilistic model, and information-theoretic criteria⁵⁵ could be used to identify the correct number of components. These criteria seek a balance between an increase in the number of parameters and the accuracy of fit to data (likelihood). We note that the total likelihood of the data in our model is linearly proportional to the sequencing depth. However, the reported sequencing depth is typically inflated compared to the true nucleotide capture probability of the experiments, leading to an inflated estimate of the total likelihood. This issue has been well discussed in single-cell RNA sequencing (see, e.g., ref. ⁵⁶). One approach to solve this in the context of the microbiome is to obtain technical repeats¹⁵, which can in turn allow us to estimate the true technical noise.
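One way such a criterion could be applied is sketched below using the Akaike information criterion, $\mathrm{AIC}=2p-2\ln L$. The parameter count $p=K(T+OS)$ is a naive assumption that ignores redundancies in the parameterization, the function names are hypothetical, and the log-likelihood values in the example are made-up placeholders rather than results from any dataset.

```python
def n_parameters(T, O, S, K):
    """Naive count: T*K latents plus K*O*S loadings (redundancies ignored)."""
    return K * (T + O * S)

def aic(log_likelihood, n_params):
    """Akaike information criterion; lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood

def select_k(log_likelihoods, T, O, S):
    """log_likelihoods maps each candidate K to its maximized log-likelihood."""
    scores = {
        K: aic(ll, n_parameters(T, O, S, K))
        for K, ll in log_likelihoods.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Placeholder log-likelihoods for illustration only
best_k, scores = select_k({1: -5000.0, 2: -4800.0, 3: -4790.0}, T=10, O=20, S=2)
```

Because the log-likelihood scales with sequencing depth, an inflated depth would favor larger K under this criterion, which is the caveat raised in the paragraph above.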

The presented formulation of EMBED specifically focused on identifying dynamical features of the microbiome in hosts that were subjected to the same strong environmental perturbation. However, in many cases, the perturbations may be weak, for example, a gradual shift in diet⁵⁷, or completely absent, for example, when studying the maturation of gut microbiomes of infants⁵⁸. In such cases, we expect significantly higher host-to-host variability in microbiome dynamics, and EMBED can be reformulated to capture this variability. Instead of the tensor decomposition in Eq. (2), we can model the microbiome dynamics using the following tensor decomposition:

$${q}_{{os}}\left(t\right)=\frac{1}{{\Omega }_{{st}}}{{\exp }}\left(-\mathop{\sum }\limits_{k=1}^{K}{z}_{{tk}}{\theta }_{{ok}}{\Gamma }_{{sk}}\right).$$

(6)

In Eq. (6), ${z}_{{tk}}$ are time-specific embeddings, ${\theta }_{{ok}}$ are species-specific embeddings, and ${\Gamma }_{{sk}}$ couple these embeddings to specific subjects. We leave this generalization to future studies.
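The three-way decomposition in Eq. (6) can be sketched with a single `numpy.einsum` call. The array layout (`z[t, k]`, `theta[o, k]`, `gamma[s, k]`), the function name, and the normalization of ${\Omega }_{{st}}$ over OTUs are assumptions made for this illustration.

```python
import numpy as np

def propensities_subject_coupled(z, theta, gamma):
    """Eq. (6): q[t, o, s] with time, species, and subject embeddings."""
    energy = np.einsum("tk,ok,sk->tos", z, theta, gamma)
    logits = -energy
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

# Synthetic embeddings for illustration only
rng = np.random.default_rng(0)
T, O, S, K = 8, 12, 4, 3
z = rng.standard_normal((T, K))
theta = rng.standard_normal((O, K))
gamma = rng.standard_normal((S, K))
q = propensities_subject_coupled(z, theta, gamma)
```

In this form the number of parameters grows as $K(T+O+S)$ rather than $K(T+OS)$, and setting all entries of `gamma` to one makes every subject share the same dynamics.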

While EMBED was specifically developed to study microbiomes, it reflects a more generalizable framework that can easily be applied to other types of longitudinal sequencing data as well. We therefore expect that EMBED will be a significant tool in the analysis of dynamics of high-dimensional sequencing data beyond the microbiome.

Methods

Inference of ECNs from longitudinal data

We consider the abundances of O bacterial operational taxonomic units (OTUs) measured over a period of T days in S subjects. We model the read counts ${n}_{{os}}(t)$ of OTU “o” on any given day t in subject s using a multinomial distribution. The likelihood of observing the data is given by

$$L=\prod\limits _{s,t}\frac{{N}_{s}\left(t\right)!}{\prod\limits _{o}{n}_{{os}}\left(t\right)!}\prod\limits _{o}{q}_{{os}}{\left(t\right)}^{{n}_{{os}}\left(t\right)}$$

(7)

where ${N}_{s}\left(t\right)=\sum _{o}{n}_{{os}}(t)$ is the total read count on a given day and ${q}_{{os}}\left(t\right)$ are the underlying propensities for individual OTUs. We model these propensities using the exponential Gibbs–Boltzmann distribution, which allows us to capture large variations in OTU abundances²⁹:

$${q}_{{os}}\left(t\right)=\frac{1}{{\Omega }_{{st}}}{{\exp }}\left(-\mathop{\sum }\limits_{k=1}^{K}{z}_{{tk}}{\theta }_{{kos}}\right)$$

(8)

where ${z}_{{tk}}$ are time-specific latents that are shared by all OTUs and subjects, and ${\theta }_{{kos}}$ are OTU- and subject-specific loadings that are shared across all time points. The number K of latents/loadings is chosen such that $K\ll O,T$, thereby achieving a lower-dimensional description of the time-series data. We obtain the zs and the θs using the maximum likelihood approach. While most microbiome abundance data are compositional, new techniques are being developed to measure absolute bacterial loads¹⁵. EMBED is naturally equipped to model measurements of absolute abundances. To do so, we use the absolute abundance instead of the daily total read count ${N}_{s}\left(t\right)$ in Eq. (1) (Supplementary Fig. 13).
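To make Eqs. (7) and (8) concrete, the sketch below computes the log-propensities and the multinomial log-likelihood up to its parameter-independent constant. It assumes that ${\Omega }_{{st}}$ is the sum of the exponential weights over OTUs, and it uses illustrative names and array layouts (`counts[t, o, s]`, `z[t, k]`, `theta[k, o, s]`) that are not taken from the published scripts.

```python
import numpy as np

def log_propensities(z, theta):
    """log q[t, o, s] from Eq. (8), computed with a log-sum-exp over OTUs."""
    logits = -np.einsum("tk,kos->tos", z, theta)
    shift = logits.max(axis=1, keepdims=True)
    log_omega = shift + np.log(np.exp(logits - shift).sum(axis=1, keepdims=True))
    return logits - log_omega

def log_likelihood(counts, z, theta):
    """Multinomial log-likelihood of Eq. (7) without the multinomial coefficient."""
    return float(np.sum(counts * log_propensities(z, theta)))

# Synthetic read counts for illustration only
rng = np.random.default_rng(0)
T, O, S, K = 5, 4, 2, 2
counts = rng.integers(0, 50, size=(T, O, S))
ll_uniform = log_likelihood(counts, np.zeros((T, K)), np.zeros((K, O, S)))
```

When all latents and loadings are zero, every OTU has propensity $1/O$, so the log-likelihood reduces to the total read count times $\log (1/O)$.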

To that end, we write down the log-likelihood of the data:

$$\ln L={\rm{const}}.+\sum\limits _{t,o,s}{n}_{{os}}\left(t\right)\log {q}_{{os}}(t).$$

(9)

The constant term of the log-likelihood does not depend on the parameters and can thus be omitted in likelihood maximization. Simplifying using Eqs. (7) and (8), we have

$$\ln L=-\sum\limits _{t,o,s,k}{N}_{s}\left(t\right){x}_{{os}}\left(t\right){z}_{{tk}}{\theta }_{{kos}}-\sum\limits _{t,s}{N}_{s}\left(t\right)\log {\Omega }_{{st}}.$$

(10)

Here ${x}_{{os}}\left(t\right)={n}_{{os}}(t)/{N}_{s}(t)$ is the relative abundance of OTU o in subject s at time t. We obtain the gradients

$$\frac{\partial \ln L}{\partial {z}_{{tk}}}=-\sum\limits _{o,s}{N}_{s}(t)\left({x}_{{os}}\left(t\right)-{q}_{{os}}\left(t\right)\right){\theta }_{{kos}}\,{\rm{and}}$$

(11)

$$\frac{\partial \ln L}{\partial {\theta }_{{kos}}}=-\sum\limits _{t}{N}_{s}\left(t\right){z}_{{tk}}\left({x}_{{os}}\left(t\right)-{q}_{{os}}\left(t\right)\right)$$

(12)

We use a gradient ascent algorithm to find the latents and the loadings that maximize the likelihood. In the analyzed datasets, the read counts on all days were equal. Therefore, we performed gradient ascent by normalizing the log-likelihood by the total read count and using relative abundances on the right-hand side of Eqs. (11) and (12). A learning rate of $\eta \in [0.001, 0.005]$ ensured that the inference was stable. When investigating the accuracy of EMBED-based reconstruction of community composition (Fig. 2), we stopped the inference when the relative gradients of both zs and θs were less than ${10}^{-3}$ or when the number of iterations exceeded ${10}^{5}$. When analyzing the diet and the antibiotics datasets, we stopped the inference when the relative gradients of both zs and θs were less than ${10}^{-4}$ or when the number of iterations exceeded ${10}^{6}$.
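The update rule described above can be sketched as follows. This is a minimal illustration, not the published implementation: it uses the read-count-normalized gradients of Eqs. (11) and (12), a fixed number of iterations instead of the relative-gradient stopping rule, and synthetic data generated from the model itself. The function names, initialization scale, and iteration count are assumptions.

```python
import numpy as np

def propensities(z, theta):
    """Eq. (8): q[t, o, s], normalized over OTUs for every (t, s)."""
    logits = -np.einsum("tk,kos->tos", z, theta)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def fit_embed(x, K, lr=0.005, n_iter=3000, seed=0):
    """Gradient ascent on the log-likelihood normalized by the read count."""
    rng = np.random.default_rng(seed)
    T, O, S = x.shape
    z = 0.1 * rng.standard_normal((T, K))
    theta = 0.1 * rng.standard_normal((K, O, S))
    history = []
    for _ in range(n_iter):
        q = propensities(z, theta)
        history.append(float(np.sum(x * np.log(q))))
        resid = x - q
        grad_z = -np.einsum("tos,kos->tk", resid, theta)   # Eq. (11) divided by N
        grad_theta = -np.einsum("tk,tos->kos", z, resid)   # Eq. (12) divided by N
        z = z + lr * grad_z
        theta = theta + lr * grad_theta
    return z, theta, history

# Synthetic relative abundances for illustration only
rng = np.random.default_rng(1)
x = propensities(rng.standard_normal((15, 2)), rng.standard_normal((2, 10, 3)))
z_fit, theta_fit, history = fit_embed(x, K=2)
```

The `history` list records the normalized log-likelihood at each iteration, which gives a simple way to monitor whether the chosen learning rate keeps the ascent stable.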

For a given K, using the microbiome data ${x}_{{os}}\left(t\right)$ and starting from random initialization, we first simultaneously infer the latents ${z}_{{tk}}$ and the features ${\Theta }_{{kos}}$. We observe that the $T\times K$ matrix ${\boldsymbol{z}}$ of latents can be multiplied by an invertible matrix ${\boldsymbol{B}}$ $\left({\boldsymbol{z}}\to {\boldsymbol{zB}}\right)$ and the corresponding $K\times O\times S$ matrix of features can be multiplied by the inverse ${{\boldsymbol{B}}}^{-1}$ $\left({\boldsymbol{\Theta }}\to {{\boldsymbol{B}}}^{-1}{\boldsymbol{\Theta }}\right)$ without changing the abundance predictions of the model. Therefore, we use the Gram–Schmidt procedure to orthogonalize the matrix of latents such that ${\boldsymbol{z}}\to {{\boldsymbol{z}}}^{\prime}$, where ${{\boldsymbol{z}}^{\prime}}^{T}{{\boldsymbol{z}}}^{\prime}={{\boldsymbol{I}}}_{K}$ is the $K\times K$ identity matrix. For an inferred matrix of latents ${\boldsymbol{z}}$, we obtained the matrix multiplier ${\boldsymbol{B}}={{\boldsymbol{z}}}^{+}{{\boldsymbol{z}}}^{\prime}$, where ${{\boldsymbol{z}}}^{\prime}$ is the orthonormal matrix of latents obtained after the Gram–Schmidt procedure and ${{\boldsymbol{z}}}^{+}$ is the Moore–Penrose pseudoinverse of ${\boldsymbol{z}}$. Once ${\boldsymbol{B}}$ is identified, we also transform the ${\boldsymbol{\Theta }}$ matrix (${\boldsymbol{\Theta }}\to {{\boldsymbol{\Theta }}}^{\prime}={{\boldsymbol{B}}}^{-1}{\boldsymbol{\Theta }}$). At the end of this procedure, we have orthonormal latents ${{\boldsymbol{z}}}^{\prime}$ and corresponding features ${{\boldsymbol{\Theta }}}^{\prime}$ that yield the same abundances as ${\boldsymbol{z}}$ and ${\boldsymbol{\Theta }}$. For simplicity of notation, we drop the primes.
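The orthonormalization step can be sketched with a reduced QR decomposition, which yields the same orthonormal columns as the Gram–Schmidt procedure up to sign. The use of `numpy.linalg.qr` in place of an explicit Gram–Schmidt loop and the function name are choices made for this illustration.

```python
import numpy as np

def orthonormalize_latents(z, theta):
    """Return z' with orthonormal columns, Theta' = B^{-1} Theta, and B = z^+ z'."""
    z_prime, _ = np.linalg.qr(z)                 # reduced QR: z_prime is T x K
    B = np.linalg.pinv(z) @ z_prime              # K x K matrix multiplier
    theta_prime = np.einsum("kj,jos->kos", np.linalg.inv(B), theta)
    return z_prime, theta_prime, B

# Synthetic latents and features for illustration only
rng = np.random.default_rng(0)
z = rng.standard_normal((20, 3))
theta = rng.standard_normal((3, 7, 2))
z_prime, theta_prime, B = orthonormalize_latents(z, theta)
```

Because ${\boldsymbol{z}}^{\prime}{{\boldsymbol{\Theta }}}^{\prime}={\boldsymbol{zB}}{{\boldsymbol{B}}}^{-1}{\boldsymbol{\Theta }}={\boldsymbol{z\Theta }}$, the exponent in Eq. (8), and hence every predicted abundance, is unchanged by the transformation.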

Next, we model the dynamics of the orthonormal latents using a linear dynamical system:

$${z}_{t+1,k}=\sum\limits _{{k}^{{\prime} }}{A}_{k{k}^{{\prime} }}{z}_{{{tk}}^{{\prime} }}+{u}_{k}+{\eta }_{k}(t)$$

(13)

where we assume that ${A}_{k{k}^{{\prime} }}={A}_{{k}^{{\prime} }k}$ and ${\eta }_{k}(t)$ are Gaussian-distributed uncorrelated noise terms: $\langle {\eta }_{k}\left({t}_{1}\right){\eta }_{{k}^{{\prime} }}\left({t}_{2}\right)\rangle ={\delta }_{{t}_{1}{t}_{2}}{\delta }_{k{k}^{{\prime} }}$, where ${\delta }_{{ab}}$ is the Kronecker delta. Our task is to find the symmetric interaction matrix A and the vector u that best fit this model. We achieve this using squared error minimization. We write

$$E\left({\boldsymbol{A}},{\boldsymbol{u}}\right)=\sum\limits _{t,k}{\left({z}_{{tk}}-{z}_{{tk}}^{{pred}}\right)}^{2}$$

(14)

where ${z}_{{tk}}$ is the inferred latent and ${z}_{{tk}}^{{pred}}$ is the corresponding prediction using ${z}_{t-1,k}$ and Eq. (13). We restrict the summation only over time points t such that measurements are available for time points t and $t-1$. When there are no missing time points/samples, Eq. (14) can be minimized analytically. However, in real microbiome time series, samples are often missing. In that case, we propagate the latents for the missing samples using the dynamical Eq. (13). This makes the problem nonlinear as the dynamical propagation involves matrix multiplication. Therefore, to obtain a matrix A that minimizes the error in Eq. (14), we use simulated annealing. Once the matrix A is identified, we transform the orthonormal latents ${z}_{{tk}}$ into ecological normal modes ${y}_{{tk}}$ as described in the Results section.
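For the case without missing samples, the minimization of Eq. (14) is a linear least-squares problem in the independent entries of the symmetric matrix A and the vector u. The sketch below solves it with `numpy.linalg.lstsq`; it is one possible way to carry out the analytical minimization and does not cover the simulated-annealing procedure used when samples are missing. The function name and the noise-free test trajectory are illustrative assumptions.

```python
import numpy as np

def fit_symmetric_dynamics(z):
    """Least-squares fit of z[t+1] = A z[t] + u with A constrained to be symmetric."""
    T, K = z.shape
    pairs = [(i, j) for i in range(K) for j in range(i, K)]  # upper triangle of A
    n_par = len(pairs) + K
    rows, targets = [], []
    for t in range(T - 1):
        for k in range(K):
            row = np.zeros(n_par)
            for p, (i, j) in enumerate(pairs):
                if i == k:
                    row[p] += z[t, j]       # A[k, j] multiplies z[t, j]
                if j == k and i != j:
                    row[p] += z[t, i]       # A[k, i] = A[i, k] multiplies z[t, i]
            row[len(pairs) + k] = 1.0       # offset u[k]
            rows.append(row)
            targets.append(z[t + 1, k])
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    A = np.zeros((K, K))
    for p, (i, j) in enumerate(pairs):
        A[i, j] = A[j, i] = sol[p]
    return A, sol[len(pairs):]

# Noise-free trajectory from a known symmetric system (illustration only)
A_true = np.array([[0.8, 0.1], [0.1, 0.5]])
u_true = np.array([0.1, -0.2])
z = np.zeros((15, 2))
z[0] = [1.0, -1.0]
for t in range(14):
    z[t + 1] = A_true @ z[t] + u_true
A_fit, u_fit = fit_symmetric_dynamics(z)
```

Only consecutive pairs of time points enter the fit, matching the restriction of the sum in Eq. (14) to time points t for which both t and $t-1$ are available.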

The scripts for obtaining ECNs y and corresponding loadings Φ from read count data can be found at: https://github.com/mayar-shahin/EMBED.

In short, the steps involved in inferring the ECNs and the corresponding loadings are as follows.

a. Start with the $T\times O\times S$ OTU table and a chosen latent space dimension K. Randomly initialize the $T\times K$ matrix of latents ${\boldsymbol{z}}$ and the $K\times O\times S$ matrix of features ${\boldsymbol{\Theta }}$. In our implementation on GitHub, we stack multiple subjects to create a $K\times (O\times S)$ matrix.
b. Perform gradient ascent using Eqs. (11) and (12) to obtain the latents and the features.
c. Use the Gram–Schmidt procedure to obtain an orthonormal set of latents ${\boldsymbol{z}}^{\prime}$ from the original latents ${\boldsymbol{z}}$. Obtain the $K\times K$ transformation matrix ${\boldsymbol{B}}={{\boldsymbol{z}}}^{+}{\boldsymbol{z}}^{\prime}$ and transform the features ${{\boldsymbol{\Theta }}}^{\prime}={{\boldsymbol{B}}}^{-1}{\boldsymbol{\Theta }}$. The new orthonormal latents ${\boldsymbol{z}}^{\prime}$ and features ${{\boldsymbol{\Theta }}}^{\prime}$ fit the data to the same degree of accuracy as the original latents ${\boldsymbol{z}}$ and features ${\boldsymbol{\Theta }}$.
d. Find the symmetric interaction matrix A by minimizing the squared error in Eq. (14) using simulated annealing. Diagonalize the interaction matrix ${\boldsymbol{A}}={{\boldsymbol{v}}}^{T}{\boldsymbol{\Lambda }}{\boldsymbol{v}}$. Obtain the ECNs, ${{\boldsymbol{y}}}_{t}={\boldsymbol{v}}{{\boldsymbol{z}}}_{t}$, and the corresponding loadings ${\boldsymbol{\Phi }}={{\boldsymbol{v}}}^{T}{\boldsymbol{\theta }}$.
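The final rotation into normal modes can be sketched as below. In this illustration the eigenvectors are stored as the columns of `V`, following the convention of `numpy.linalg.eigh`, and both the latents and the features are rotated so that the product entering the exponent of Eq. (8) is unchanged; this orientation convention and the function name are assumptions of the sketch and may differ from the published scripts.

```python
import numpy as np

def to_normal_modes(z, theta, A):
    """Rotate orthonormal latents and features into the eigenbasis of symmetric A."""
    eigvals, V = np.linalg.eigh(A)                   # A = V diag(eigvals) V^T
    y = z @ V                                        # row t holds V^T z_t
    phi = np.einsum("jk,jos->kos", V, theta)         # V^T applied to the features
    return y, phi, eigvals, V

# Synthetic inputs for illustration only
rng = np.random.default_rng(0)
z, _ = np.linalg.qr(rng.standard_normal((20, 3)))    # orthonormal latents
theta = rng.standard_normal((3, 6, 2))
M = rng.standard_normal((3, 3))
A = 0.5 * (M + M.T)                                  # symmetric interaction matrix
y, phi, eigvals, V = to_normal_modes(z, theta, A)
```

In the rotated coordinates the interaction matrix is diagonal, so each mode evolves independently under the linear dynamics of Eq. (13), while the modes remain orthonormal because `V` is an orthogonal matrix.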

We note that in the current work, our goal was to use the dynamical model to obtain a reorientation of the latent variables, rather than to fit the latent variables to decaying first-order dynamics. An alternative approach is to simultaneously fit the dynamical model and the embedding model to the data. Specifically, we can write the total likelihood

$$L=\sum\limits _{t,o,s}{n}_{{os}}\left(t\right)\log {q}_{{os}}\left(t\right)-\frac{\beta }{2{\sigma }^{2}}\sum\limits _{t,k}{\left({z}_{k}\left(t+1\right)-\sum\limits _{{k}^{{\prime} }}{A}_{k{k}^{{\prime} }}{z}_{{k}^{{\prime} }}\left(t\right)-{u}_{k}\right)}^{2}$$

(15)

that combines both the model fit to data and the dynamics of the latent variables. In Eq. (15), we have assumed a Gaussian distribution for the noise in the linear dynamics with standard deviation $\sigma$. We denote by β the hyperparameter that dictates the relative contribution of the data likelihood and the latent dynamics to the overall likelihood. Notably, β is a hyperparameter and is not known a priori. Our calculations can therefore be thought of as the limit where β is small.
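A joint objective of this kind could be evaluated as sketched below, with the dynamical term written as a sum of squared residuals, as is standard for Gaussian noise. The function names, array layouts, and default value of `sigma` are assumptions for illustration; no joint fit of this form was performed in the present work.

```python
import numpy as np

def log_propensities(z, theta):
    """log q[t, o, s] from Eq. (8), normalized over OTUs."""
    logits = -np.einsum("tk,kos->tos", z, theta)
    shift = logits.max(axis=1, keepdims=True)
    return logits - shift - np.log(np.exp(logits - shift).sum(axis=1, keepdims=True))

def joint_objective(counts, z, theta, A, u, beta, sigma=1.0):
    """Data log-likelihood minus a weighted penalty on the dynamical residuals."""
    data_term = np.sum(counts * log_propensities(z, theta))
    resid = z[1:] - (z[:-1] @ A.T + u)        # z(t+1) - A z(t) - u for each t
    dyn_term = -beta / (2.0 * sigma**2) * np.sum(resid**2)
    return float(data_term + dyn_term)

# Synthetic inputs for illustration only
rng = np.random.default_rng(0)
T, O, S, K = 10, 5, 2, 2
counts = rng.integers(0, 30, size=(T, O, S))
theta = rng.standard_normal((K, O, S))
A = np.array([[0.7, 0.2], [0.2, 0.4]])
u = np.array([0.05, -0.1])
z_exact = np.zeros((T, K))
z_exact[0] = [1.0, 0.5]
for t in range(T - 1):
    z_exact[t + 1] = A @ z_exact[t] + u
```

For latents that follow the linear dynamics exactly, the penalty vanishes and the objective equals the data log-likelihood for any β; for other latents, increasing β lowers the objective.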

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

All data and code related to the manuscript are available at https://github.com/mayar-shahin/EMBED.

Code availability

All code related to the manuscript is available at https://github.com/mayar-shahin/EMBED.

References

Caporaso, J. G. et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc. Natl Acad. Sci. USA 108, 4516–4522 (2011).
Kozich, J. J., Westcott, S. L., Baxter, N. T., Highlander, S. K. & Schloss, P. D. Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform. Appl. Environ. Microbiol. 79, 5112–5120 (2013).
Stewart, C. J. et al. Temporal development of the gut microbiome in early childhood from the TEDDY study. Nature 562, 583–588 (2018).
Vatanen, T. et al. Genomic variation and strain-specific functional adaptation in the human gut microbiome during early life. Nat. Microbiol. 4, 470–479 (2019).
Peled, J. U. et al. Microbiota as predictor of mortality in allogeneic hematopoietic-cell transplantation. N. Engl. J. Med. 382, 822–834 (2020).
Buffie, C. G. et al. Precision microbiome reconstitution restores bile acid mediated resistance to Clostridium difficile. Nature 517, 205–208 (2015).
Suez, J. et al. Post-antibiotic gut mucosal microbiome reconstitution is impaired by probiotics and improved by autologous FMT. Cell 174, 1406–1423.e16 (2018).
Zmora, N. et al. Personalized gut mucosal colonization resistance to empiric probiotics is associated with unique host and microbiome features. Cell 174, 1388–1405.e21 (2018).
Kim, S. G. et al. Microbiota-derived lantibiotic restores resistance against vancomycin-resistant Enterococcus. Nature 572, 665–669 (2019).
Ng, K. M. et al. Recovery of the gut microbiota after antibiotics depends on host diet, community context, and environmental reservoirs. Cell Host Microbe 26, 650–665.e4 (2019).
Dethlefsen, L. & Relman, D. A. Incomplete recovery and individualized responses of the human distal gut microbiota to repeated antibiotic perturbation. Proc. Natl Acad. Sci. USA 108, 4554–4561 (2011).
David, L. A. et al. Host lifestyle affects human microbiota on daily timescales. Genome Biol. 15, R89 (2014).
Carmody, R. N. et al. Diet dominates host genotype in shaping the murine gut microbiota. Cell Host Microbe 17, 72–84 (2015).
Caporaso, J. G. et al. Moving pictures of the human microbiome. Genome Biol. 12, R50 (2011).
Ji, B. W. et al. Quantifying spatiotemporal variability and noise in absolute microbiota abundances using replicate sampling. Nat. Methods 16, 731–736 (2019).
Ji, B. W., Sheth, R. U., Dixit, P. D., Tchourine, K. & Vitkup, D. Macroecological dynamics of gut microbiota. Nat. Microbiol. 5, 768–775 (2020).
Grilli, J. Macroecological laws describe variation and diversity in microbial communities. Nat. Commun. 11, 4743 (2020).
Martino, C. et al. Context-aware dimensionality reduction deconvolutes gut microbial community dynamics. Nat. Biotechnol. 39, 165–168 (2021).
Äijö, T., Müller, C. L. & Bonneau, R. Temporal probabilistic modeling of bacterial compositions derived from 16S rRNA sequencing. Bioinformatics 34, 372–380 (2018).
Joseph, T. A., Pasarkar, A. P. & Pe’er, I. Efficient and accurate inference of mixed microbial population trajectories from longitudinal count data. Cell Syst. 10, 463–469.e6 (2020).
Moon, K. R. et al. Manifold learning-based methods for analyzing single-cell RNA-sequencing data. Curr. Opin. Syst. Biol. 7, 36–46 (2018).
Costello, E. K. et al. Bacterial community variation in human body habitats across space and time. Science 326, 1694–1697 (2009).
Huttenhower, C. et al. Structure, function and diversity of the healthy human microbiome. Nature 486, 207–214 (2012).
Lloyd-Price, J. et al. Strains, functions and dynamics in the expanded Human Microbiome Project. Nature 550, 61–66 (2017).
Peña, D. & Poncela, P. Dimension Reduction in Multivariate Time Series. In Advances in Distribution Theory, Order Statistics, and Inference (eds Balakrishnan, N. et al.) 433–458 (Birkhäuser Boston, 2006).
Raman, A. S. et al. A sparse covarying unit that describes healthy and impaired human gut microbiota development. Science 365, eaau4735 (2019).
Gibson, T. E. et al. Intrinsic instability of the dysbiotic microbiome revealed through dynamical systems inference at scale. Preprint at http://biorxiv.org/lookup/doi/10.1101/2021.12.14.469105 (2021).
Shenhav, L. et al. Modeling the temporal dynamics of the gut microbial community in adults and infants. PLoS Comput. Biol. 15, e1006960 (2019).
Dixit, P. D. Thermodynamic inference of data manifolds. Phys. Rev. Res. 2, 023201 (2020).
Cui, Q. & Bahar, I. Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems (CRC Press, 2005).
Rabanser, S., Shchur, O. & Günnemann, S. Introduction to tensor decompositions and their applications in machine learning. Preprint at https://arxiv.org/abs/1711.10781 (2017).
Gloor, G. B., Wu, J. R., Pawlowsky-Glahn, V. & Egozcue, J. J. It’s all relative: analyzing microbiome data as compositions. Ann. Epidemiol. 26, 322–329 (2016).
Stämmler, F. et al. Adjusting microbiome profiles for differences in microbial load by spike-in bacteria. Microbiome 4, 28 (2016).
IBDMDB, Investigators et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
Faith, J. J. et al. The long-term stability of the human gut microbiota. Science 341, 1237439 (2013).
Kilpatrick, A. M. & Ives, A. R. Species interactions can explain Taylor’s power law for ecological time series. Nature 422, 65–68 (2003).
Martino, C. et al. A novel sparse compositional technique reveals microbial perturbations. mSystems 4, e00016–e00019 (2019).
Davis, R. A., Zang, P. & Zheng, T. Sparse vector autoregressive modeling. J. Comput. Graph. Stat. 25, 1077–1096 (2016).
Gibbons, S. M., Kearney, S. M., Smillie, C. S. & Alm, E. J. Two dynamic regimes in the human gut microbiome. PLoS Comput. Biol. 13, e1005364 (2017).
Bucci, V. & Xavier, J. B. Towards predictive models of the human gut microbiome. J. Mol. Biol. 426, 3907–3916 (2014).
Coyte, K. Z., Schluter, J. & Foster, K. R. The ecology of the microbiome: networks, competition, and stability. Science 350, 663–666 (2015).
David, L. A. et al. Diet rapidly and reproducibly alters the human gut microbiome. Nature 505, 559–563 (2014).
Zeevi, D. et al. Personalized nutrition by prediction of glycemic responses. Cell 163, 1079–1094 (2015).
Johnson, E. L., Heaver, S. L., Walters, W. A. & Ley, R. E. Microbiome and metabolic disease: revisiting the bacterial phylum Bacteroidetes. J. Mol. Med. 95, 1–8 (2017).
Gao, J. et al. Predictive functional profiling using marker gene sequences and community diversity analyses of microbes in full-scale anaerobic sludge digesters. Bioprocess. Biosyst. Eng. 39, 1115–1127 (2016).
Leadbeater, D. R. et al. Mechanistic strategies of microbial communities regulating lignocellulose deconstruction in a UK salt marsh. Microbiome 9, 48 (2021).
Lam, Y. Y. et al. Increased gut permeability and microbiota change associate with mesenteric fat inflammation and metabolic dysfunction in diet-induced obese mice. PLoS ONE 7, e34233 (2012).
Kong, C., Gao, R., Yan, X., Huang, L. & Qin, H. Probiotics improve gut microbiota dysbiosis in obese mice fed a high-fat or high-sucrose diet. Nutrition 60, 175–184 (2019).
Balaban, N. Q., Merrin, J., Chait, R., Kowalik, L. & Leibler, S. Bacterial persistence as a phenotypic switch. Science 305, 1622–1625 (2004).
Dubourg, G. et al. High-level colonisation of the human gut by Verrucomicrobia following broad-spectrum antibiotic treatment. Int. J. Antimicrob. Agents 41, 149–155 (2013).
Isaac, S. et al. Short- and long-term effects of oral vancomycin on the human intestinal microbiota. J. Antimicrob. Chemother. 72, 128–136 (2017).
Pennycook, J. H. & Scanlan, P. D. Ecological and evolutionary responses to antibiotic treatment in the human gut microbiota. FEMS Microbiol. Rev. 45, fuab018 (2021).
Koo, H. et al. Individualized recovery of gut microbial strains post antibiotics. NPJ Biofilms Microbiomes 5, 30 (2019).
Thorndike, R. L. Who belongs in the family? Psychometrika 18, 267–276 (1953).
Akaike, H. Information theory and an extension of the maximum likelihood principle. In Selected Papers of Hirotugu Akaike (ed. Parzen, E.) 199–213 (Springer New York, 1998).
Breda, J., Zavolan, M. & van Nimwegen, E. Bayesian inference of gene expression states from single-cell RNA-seq data. Nat. Biotechnol. 39, 1008–1016 (2021).
Wastyk, H. C. et al. Gut-microbiota-targeted diets modulate human immune status. Cell 184, 4137–4153.e14 (2021).
Koenig, J. E. et al. Succession of microbial consortia in the developing infant gut microbiome. Proc. Natl Acad. Sci. USA 108, 4578–4585 (2011).


Acknowledgements

P.D. and M.S. acknowledge NIH grant R35GM142547.

Author information

Purushottam D. Dixit
Present address: Department of Biomedical Engineering, Yale University, New Haven, CT, 06511, USA

Authors and Affiliations

Department of Physics, University of Florida, Gainesville, FL, 32611, USA
Mayar Shahin & Purushottam D. Dixit
Physician-Scientist Training Pathway, Department of Medicine, UCSD, San Diego, CA, 92103, USA
Brian Ji
Genetics Institute, University of Florida, Gainesville, FL, 32611, USA
Purushottam D. Dixit
Department of Chemical Engineering, University of Florida, Gainesville, FL, 32611, USA
Purushottam D. Dixit

Authors

Mayar Shahin
Brian Ji
Purushottam D. Dixit

Contributions

M.S., B.J., and P.D. designed the research. M.S. and P.D. did the analysis. M.S., B.J., and P.D. wrote the manuscript.

Corresponding authors

Correspondence to Mayar Shahin or Purushottam D. Dixit.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary text, figures, and tables

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Shahin, M., Ji, B. & Dixit, P.D. EMBED: Essential MicroBiomE Dynamics, a dimensionality reduction approach for longitudinal microbiome studies. npj Syst Biol Appl 9, 26 (2023). https://doi.org/10.1038/s41540-023-00285-6


Received: 03 February 2023
Accepted: 23 May 2023
Published: 20 June 2023
DOI: https://doi.org/10.1038/s41540-023-00285-6