Towards a potential landscape framework of microbiome dynamics

Microbiome dynamics influence the health and functioning of human physiology and the environment. These dynamics are driven in part by interactions between large numbers of microbial taxa, making large-scale prediction and modeling a challenge. Here, we identify states and dynamical features relevant to macroscopic processes, such as infection in the human body and geochemical cycling in the oceans, by modeling the dynamics as stochastic motion on a potential energy-like landscape. We show that gut disease processes and marine geochemical events are associated with reproducible transitions between community states, defined as topological features of the landscape. We find a reproducible two-state succession during recovery from cholera in the gut microbiomes of multiple patients. Recurrence of the late disease state prolongs disease duration. We find evidence of dynamic stability in the gut microbiome of a human subject after experiencing diarrhea during travel, in contrast to residual instability in a second human subject after clinical recovery from Salmonella infection. Finally, we find the structure of marine Prochlorococcus communities in the western Atlantic and north Pacific oceans to smoothly vary with temperature and depth. However, annual water column cycling in the Atlantic drives periodic state transitions across depths. Our approach bridges the small-scale fluctuations in microbiome composition and large-scale changes in state and phenotype, improves analyses of how changes in community composition associate with phenotype without requiring experimental characterization of underlying mechanisms, and provides a novel assessment of microbiome stability and its relation to human and environmental health. Importance Time series of microbial communities are difficult to analyze due to the large number of interacting taxa. We developed a novel analysis based on topology to detect compositional states and state transitions in microbial time series. Our method generalizes across biological systems and can identify gut microbiome dynamics associated with recovery from disease in multiple patients on the order of weeks, and marine bacterial dynamics driven by geochemical cycling on the order of years. We furthermore propose a novel definition of ecological stability that distinguishes between complete and incomplete recovery from infection in human gut microbiomes. Our method requires minimal assumptions regarding biological mechanisms. Overall, our analysis complements current methods for identifying key ecological processes in microbial communities, and suggests further developments in modeling that may improve prediction of microbial dynamics.

community scale. 126 Generally, patients occupied state 7 for longer than they did state 2, suggesting that the 127 stability of the late state in a given patient influences disease duration. To quantify stability, we 128 calculated a temporal correlation function for each state-patient pair during the diarrhea phase. 129 Monotonically decreasing correlation functions indicate metastability; slopes become more negative 130 with decreasing stability. While this analysis revealed that all patients transiently occupied state 131 2, with greatest persistence in patient C, patients A, C, and E had non-monotonic correlation 132 functions for state 7, coinciding with prolonged times to recovery compared to the rest of the cohort, 133 with patients B and F exhibiting the expected monotonic decrease (Fig. 2D). This indicated that 134 patients A, C, and E repeatedly entered and exited state 7, suggesting that prolonged diarrhea in 135 these three patients may have been additionally influenced by the instability or inaccessibility of 136 alternative, healthy states, and that (re-)assembly of the healthy microbial community constitutes 137 a non-trivial step in recovery. probability before and after travel, exhibiting resilience; in contrast, subject B post-infection did 158 not restore the pre-infection probability across states, despite some samples sharing states with 159 pre-infection healthy samples (Fig. 4A). Thus, the restoration of the microbial community to a 160 'healthy' state cannot be confirmed with a single time point. 161 Temporal correlation functions further showed that subject A, as well as subject B before 162 infection, repeatedly visited the same set of states; in contrast, subject B after infection transiently 163 occupied several states without repetition (Fig. 4B). This shows that not only did the microbiome 164 of subject B enter an alternative state, or probability across states, post-infection, but that this 165 alternative state was not fully stabilized. It is possible that the pre-infection probability across 166 states was restored in subject B after the end of the observational period. Prochlorococcus communities rarely occupy the same compositional states over time (Fig. 5C upwelling occurs at HOT, and the probability of a given depth fraction occupying any state remains 193 uniform over the calendar year; the distribution is especially stationary for shallow depths (Fig. 5C).

194
This periodicity was also evident in periodic correlation functions for BATS, and non-periodic for 195 HOT (Fig. 5D).

197
Given that the data sets analyzed here are among the largest longitudinal microbiome data sets 198 currently available, we asked whether the biological hypotheses could have been obtained from 199 sparser data sets. We focused on our finding that microbiome phase spaces are structured by 200 latent variables representing host phenotypes or environmental conditions, and examined whether 201 this structuring was robust to data rarefaction. We found that the partitioning of the phase 202 space by clinical phenotype in the case of the cholera patients, by subject in the case of the two 203 healthy adult humans, and the gradation by depth in the case of Prochlorococcus communities, 204 are robust to all rarefaction tests performed. In the case of cholera patients, nodes remained 205 divided into those representing mostly samples from the diarrhea phase and those representing 206 the recovery phase, with edges being more dense between nodes of the same phenotype than 207 those of different phenotypes (Supporting Information Fig. 3). In the case of the two healthy 208 adult humans, nodes were consistently dominated by samples from one subject, with edges being 209 more dense between nodes representing the same subject than those representing different subjects 210 (Supporting Information Fig. 4). For the Prochlorococcus data set, nodes aggregating samples 211 from similar depth fractions were more densely connected than those representing disparate depths 212 (Supporting Information Fig. 5).

214
We identified unrecognized dynamics governing large-scale phenotypes in microbial time series 215 data by using TDA to infer the shape of a potential landscape from 16S and ITS ribosomal RNA the large-scale dynamics differed between the two groups. We found that multiple cholera patients 230 followed a trajectory of early-to late-stage disease states. In contrast, the two healthy subjects 231 from the year-long data set experienced apparently random jumps between states during Salmonella 232 infection and traveler's diarrhea, respectively. This discordance between the two human gut mi-233 crobiome datasets suggests that microbial infections can potentially be classified into 'ordered'   This notion of resilience as identical probability across states before and after a perturbation 256 can be generalized to a notion of dynamic stability, defined as stationary probability across states  [1]. Third, high dimensionality may also increase the number and complexity of paths 300 by which the system evolves toward lower potential. Finally, the time scales of sampling may differ 301 from those that are predictable by the potential landscape; for example, the potential landscape 302 may well predict the dynamics of gut microbiome relaxation after a meal on the time scale of 303 hours, but this may not be captured by daily sampling. Nolting [30] and Abbott [1] discuss some 304 of these factors in detail. As above, analysis of synthetic data generated by theoretical population 305 dynamics models may help elucidate the limits of potential landscape inference and prediction.

306
In addition to offering a novel quantitative description of microbiome states and dynamics, 307 we hope our analysis will, in time, facilitate predictive modeling of the dynamics and forecast-308 ing of major state transitions in the microbiome. As an example, our approach to identifying   is thus not affected. The data points were then binned by overlapping intervals of the two ranked 357 principal coordinates. For hyperparameters specifying these bins and their overlaps, see Table 1.  (Table 1).

365
The final output is produced by representing each local cluster of data points as a vertex, and 367 drawing an edge between each pair of vertices that share at least one data point. When plotting, 368 the size of each vertex represents the number of data points therein.   Potential estimation 382 We estimated the potential for each vertex by calculating the k-nearest neighbors (kNN) density [10] for each constituent data point i: where d ij is the distance between points i and j, choosing k equal to 10% of the number of samples in each data set, rounded to the nearest integer. kNN varies inversely with density, making it a proxy for the potential. For a vertex V representing n points, we define its potential as The n 2 term in the denominator compensates for the differing sizes of vertices.

383
State assignment 384 We then defined states as topological features of the landscape surrounding local minima of U . We designated each vertex with lower U than its neighbors to be a local minimum of the potential. Connected vertices tied for minimum U were each assigned to be a local minimum. To approximate a gradient, we converted the undirected Mapper graph to a directed graph, with each edge pointing from the the vertex with greater U to the one with lower U . For each non-minimum vertex, we found the graph distance d g to each local minimum constrained by edge direction. We defined the state B x of a minimum V x as the set of vertices V with uniquely shortest graph distance to V x : for all x = y and V y ∈ M , where M is the set of all local minima (Fig 1B). Given that a system occupied state B x at time t, we defined the temporal correlation to be the expectation that it will still (or again) occupy state B x at time t + τ :    Figure 1: Using Mapper to characterize the microbial phase space. A. Using the Mapper algorithm to infer the potential landscape of a toy ecosystem. The mutually antagonistic interaction between species X and Y leads to denser sampling of the phase space where either X or Y is abundant and the other is rare than in other regions; configurations in which X and Y are similar in density are unstable, as small uncertainties in numerical advantage will eventually lead to the dominance of one species over the other. This probability density is analogous to an inverse of the potential landscape. Mapper infers a 'skeleton' of density from the data represented as a point cloud. This representation preserves major features of the landscape such as the two densely-sampled clusters separated by a sparsely-sampled region. B. Identification of local minima and metastable states in the Mapper graph shown in A. Data density for each vertex is estimated by the mean kNN density (see Methods) for samples associated with that vertex. The graph is converted to a directed graph, with each edge pointing in the direction of increasing kNN density. A local minimum, highlighted in pink, is defined as a vertex that has lower kNN than all its neighbors. Finally, the state associated with a local minimum is defined as the set of vertices that have uniquely shortest directed graph distance to that minimum. Non-minima vertices with equal graph distances to multiple local minima are unassociated with any state (grey).    figure   605 showing the temperature gradients across the Prochlorococcus phase space.