Introduction

Microorganisms form complex communities that play critical roles in maintaining the well-being of their hosts or the integrity of their environment1,2,3,4. Disrupting these microbial communities can have severe consequences. In humans, for example, a disruption to the gut microbiota—the aggregate of microorganisms residing in our intestine—is associated with several disorders including irritable bowel syndrome, Clostridium difficile Infection (CDI), autism, obesity, and cavernous cerebral malformations5,6,7. For agriculture crops, a disruption of rhizosphere microbiota can reduce their disease resistance and hence decrease the overall crop yield8,9. In the oceans, a disruption to their microbiota can impact global climate by altering carbon sequestration rates3,4,10. Driving disrupted microbial communities back to their healthy states could offer novel solutions to prevent and treat complex human diseases, enhance sustainable agriculture, and regulate global warming11,12. For instance, inoculating soil microbes can restore terrestrial ecosystems13, and fecal microbiota transplantation (FMT) is so far the most successful therapy for treating recurrent CDI14. Despite the success of these empirical strategies, a broad application of microbial-manipulation strategies will be possible only if we can efficiently control large complex microbial communities15.

There are two big challenges down the road. First, an efficient control method should only manipulate a minimum set of species in the community. However, we still lack a systematic method to identify minimum sets of those “driver species” whose control can help us drive a whole community to desired states. Here, we use the term “species” without necessarily representing the lowest major taxonomic rank. One could also organize microbes by strains, genera, or operational taxonomical units. Second, even when those driver species have been identified, designing the control strategy that should be applied to them (e.g., how their abundance needs to be manipulated) for driving the community towards the desired state remains difficult. This difficulty arises because of the inherent complexity of microbial dynamics and our limited knowledge of them.

To address those two challenges, here we develop a control framework using the ecological network underlying the microbial community. First, we introduce the new notion of “structural accessibility”, which generalizes the notion of structural linear controllability16,17 to systems with nonlinear dynamics. Then, we derive a complete graph-theoretical  characterization of structural accessibility. This result enables  us to efficiently identify minimum sets of driver species of any microbial community purely from the topology of its underlying ecological network, even if some microbial interactions are missing and its population dynamics is unknown. Once the driver species are identified, we systematically design feedback control strategies to drive a microbial community towards the desired state, even if its dynamics is not precisely known. We numerically validated our control framework in large microbial communities, analyzing its performance for different parameters of the community (e.g., the connectivity of its underlying ecological network), and for errors in the ecological network used to identify the driver species. Finally, we demonstrate our framework by controlling the core microbiota of the sea sponge Ircinia oros, and restoring the gut microbiota of gnotobiotic mice infected by Clostridium difficile. Our results provide a rational and systematic framework to control microbial communities and other complex ecosystems.

Results

Modeling controlled microbial communities

Our framework focuses on the impact that manipulating a subset of species has on the abundances of other species. We thus consider a microbial community whose state at time t is determined from the abundance profile \(x(t) \in {\Bbb R}^N\) of its N species, where the i-th entry xi(t) represents the abundance of the i-th species at time t. The state evolves according to some population dynamics

$$\dot x(t) = f(x(t)),$$
(1)

where the function \(f:{\Bbb R}^N \to {\Bbb R}^N\) models the species intrinsic growth and the inter/intra-species interactions of the community (see Supplementary Note 1 for details). For most microbial communities f is unknown and difficult to infer given the many interaction mechanisms between microbes18. Thus, we assume that f(x) is some unknown meromorphic function of x (i.e., the quotient of analytic functions). This assumption is very mild as it is satisfied by most population dynamics models19.

Instead of knowing the population dynamics of the microbial community, we assume we know its underlying ecological network \({\cal G} = ({\mathrm{X}},{\mathrm{E}})\). This network is a directed graph where nodes X = {x1, …, xN} represent species, and edges (xj → xi)  E denote that the j-th species has a direct ecological impact (i.e., direct promotion or inhibition of growth) on the i-th species (Fig. 1a). Mapping ecological networks requires performing mono- and co-culture experiments20,21, using system identification techniques with time-resolved abundance data22,23, or using steady-state abundance data via a recently developed inference method24. In general, ecological networks are different from correlation networks20,25 because correlation does not imply causation26,27.

Fig. 1
figure 1

Controlling a microbial community. a Ecological network \({\cal G}\) for a toy microbial community of N = 3 species (green, yellow, blue). The controlled ecological network \({\cal G}^{\mathrm{c}}\) contains M = 1 control input actuating the third species. b Initial and desired abundance profiles (bars). Controlling the community consists in driving its state from the initial state x0 to the desired state xd, represented by two points in the state space of the community. c In the continuous control scheme, the control inputs u(t) are continuous signals modifying the growth of the actuated species. The controlled population dynamics of this community is given by \(\dot x_1 = 0.1 + x_1(1 - x_1/5)(x_1/3 - 1) - (0.1x_1x_3)/(1 + x_3)\), \(\dot x_2 = 0.1 + x_2(1 - x_2/4)(x_2 - 1) + (x_2x_3)/(1 + x_3)\), \(\dot x_3 = x_3(1 - x_3/2)(x_3 - 1) + u\). In the absence of control, this community has two equilibria \(x_0 = (3.14,4.58,1)^ \top\) and \(x_{\mathrm{d}} = (4.57,4.73,2)^ \top\), chosen as the initial and desired states, respectively. d In the impulsive control scheme, the control inputs u(t) are impulses applied at the intervention instants \({\Bbb T} = \{ t_1,t_2, \cdots \}\), instantaneously changing the abundance of the actuated species. The controlled population dynamics is the same as in panel (c), except that \(\dot x_3 = x_3(1 - x_3/2)(x_3 - 1)\) and x3(t+) = x3(t) + u(t) if \(t \in {\Bbb T} = \{ 5,10,15\}\). Under this controlled population dynamics, our mathematical formalism identifies x3 as the solo driver species needed to drive this microbial community (Example 1 in Supplementary Note 2)

Controlling the community consists in driving its state from an initial value \(x_0 = x(0) \in {\Bbb R}^N\) at t = 0 (e.g., a “diseased” state) towards the desired value \(x_{\mathrm{d}} \in {\Bbb R}^N\) (e.g., the “healthy” state, Fig. 1b). We assume that the community will not evolve by itself to xd. To drive the community, we use M control inputs \(u(t) \in {\Bbb R}^M\) directly affecting certain species that we call “actuated species” (Fig. 1a). Control inputs encode a combination of M control actions applied at time t. We consider four possible control actions. If uj(t) < 0, the j-th control action at time t can be a bacteriostatic agent or bactericide, decreasing the abundance28 of the species it actuates. If uj(t) > 0, the j-th control action at time t can be a prebiotic29 or transplantation, stimulating the growth or engrafting a consortium of the species it actuates, respectively. Probiotics administration30 and FMTs14 are examples of transplantations. To specify the species actuated by each control input we introduce the controlled ecological network \({\cal G}^{{\kern 1pt} c{\kern 1pt} } = ({\mathrm{X}} \cup {\mathrm{U}},{\mathrm{E}} \cup {\mathrm{B}})\). Here, U = {u1, …, uM} are the control input nodes and (uj → xi)  B denotes that the j-th control input actuates the i-th species (Fig. 1a).

We introduce two control schemes describing how the control inputs change the species abundance (see Supplementary Note 1 for details). The first control scheme models a combination of prebiotics (if uj(t) > 0) and bacteriostatic agents (if uj(t) < 0) as continuous control inputs modifying the growth of the actuated species (Fig. 1c):

$$\dot x(t) = f\left( {x(t)} \right) + g\left( {x(t)} \right)u(t),\quad t \in {\Bbb R}.$$
(2)

The second control scheme models a combination of transplantations (if uj(t) > 0) and bactericides (if uj(t) < 0) applied at discrete intervention instants \({\Bbb T} = \{ t_1,t_2, \cdots \}\), rendering impulsive control inputs that instantaneously modify the abundance of the actuated species (Fig. 1d):

$$\dot x(t) = f\left( {x(t)} \right)\,{\mathrm{if}}\,t\not \in {\Bbb T},\quad x(t^ + ) = x(t) + g\left( {x(t)} \right)u(t)\,{\mathrm{if}}\,t \in {\Bbb T}.$$
(3)

Above, x(t+) denotes the state “right after time t”, so x(t) “jumps” at \(t \in {\Bbb T}\) if u(t) ≠ 0. The pair {f, g} characterizes both control schemes, describing the controlled population dynamics of the microbial community. The function \(g:{\Bbb R}^N \to {\Bbb R}^{N \times M}\) models the direct susceptibility of the species to the control actions. The j-th control input actuates the i-th species if \(g_{ij}\not \equiv 0\). Because g is typically unknown, we just assume that g(x) is some unknown meromorphic function of x such that \(g_{ij}\not \equiv 0\) iff (uj → xi)  B.

Notice that when all species are directly controlled (i.e., an independent control input actuates each species), the whole microbial community can easily be driven to the desired state. Fortunately, as we show next, actuating all the species is far from being necessary. Thanks to the inter-species interactions encoded in the ecological network \({\cal G}\), we can identify minimum sets of species that we need to actuate in order to drive the whole community. We call those species “driver species”.

Identifying driver species

To understand when a set of actuated species is a set of driver species, consider the three-species community with Generalized Lotka–Volterra (GLV) population dynamics of Fig. 2a. This toy community has one control input actuating x3. Actuating only this species creates an autonomous element—namely, a constraint between some species abundances that the control input cannot break, confining the state of the community to a low-dimensional manifold (Fig. 2a, right). More precisely, our mathematical formalism reveals that ξ = x1x2 is the autonomous element (Example 2 in Supplementary Note 2). Indeed, differentiating ξ with respect to time yields \(\dot \xi = x_1x_2(1 - x_3) + x_1x_2( - 1 + x_3) \equiv 0\), confining the community to \(\{ x \in {\Bbb R}^3|x_1x_2 = x_1(0)x_2(0)\}\). Intuitively, the autonomous element exists because the control input cannot change x1 without changing x2 in a predefined way, making it impossible to drive the community in the three-dimensional state space. This observation indicates that x3 alone cannot be a driver species for this community. Introducing a second control input actuating x1 helps the community jump out of the low-dimensional manifold eliminating the autonomous element, allowing us to drive this community to any desired state with positive abundance (Fig. 2b, and Example 6 in Supplementary Note 5). Therefore, {x1, x3} is a minimum set of driver species for this community.

Fig. 2
figure 2

Autonomous elements constrain the state of microbial communities, characterizing their driver species. a A three-species community with GLV dynamics \(\dot x_1 = x_1( - 1 + x_3)\), \(\dot x_2 = x_2(1 - x_3)\), \(\dot x_3 = x_3( - 0.5 + 1.5x_3)\). For actuating x3, we consider the impulsive control scheme with x3(t+) = x3(t) + u1(t) for \(t \in {\Bbb T}\). With this controlled population dynamics, our mathematical formalism reveals the autonomous element x1x2 that constraints the state of this microbial community to the low-dimensional manifold \(\{ x \in {\Bbb R}^3|x_1x_2 = x_1(0)x_2(0)\}\) (gray) for all control inputs. Five state trajectories (in colors) with random control inputs illustrate this fact. Hence, {x3} alone cannot be a set of driver species for this controlled population dynamics. b Including a second control input u2(t) actuating x1 (i.e., x1(t+) = x1(t) + u2(t) for \(t \in {\Bbb T}\)) eliminates the autonomous element, since the state of the microbial community (colors) can explore a three-dimensional space (gray). Hence {x1, x3} is a minimum set of driver species for this community with GLV dynamics. c We proved that, generically, increasing the complexity of the controlled population dynamics cannot create autonomous elements. In this example, increasing the deformation size C from the GLV in panel (a) (with C = 0) to the controlled population dynamics in Fig. 1 (with C > 0) eliminates the autonomous element that was present by actuating x3 alone (Example 1 in Supplementary Note 2). Therefore, increasing the complexity of the population dynamics makes {x3} a solo driver species

In the general case of N species and M control inputs, we define a set of actuated species as a set of driver species if the corresponding controlled population dynamics {f, g} lacks autonomous elements. For linear dynamics {f(x), g(x)} = {Ax, B}, \(A \in {\Bbb R}^{N \times N}\), \(B \in {\Bbb R}^{N \times M}\), the absence of autonomous elements is equivalent to their controllability31—the ability to drive the system between any two states, easily verified using Kalman’s condition \({\mathrm{rank}}\,[B,AB, \cdots, A^{N - 1}B] = N\). In the case of nonlinear dynamics, the absence of autonomous elements can be characterized using a mathematical formalism based on differential one-forms (see Methods and Supplementary Note 2). For the continuous control scheme of Eq. (2), the conditions for the absence of autonomous elements are well understood as they define when a system is accessible31, a cornerstone concept in nonlinear control theory. Because it is more natural to control microbial communities with impulsive control actions, in this paper we extended the study of autonomous elements to the impulsive control systems of Eq. (3). We first introduced a definition of autonomous elements for impulsive control systems (Definition 3 in Supplementary Note 2). We then characterized necessary and sufficient conditions for the absence of autonomous elements in a controlled population dynamics (Theorem 2 in Supplementary Note 2). To our surprise, the conditions for the absence of autonomous elements for the continuous and the impulsive control schemes are identical (Remark 2 in Supplementary Note 2). This result means that transplantations and bactericides (impulsive control actions) can be as effective as prebiotics and bacteriostatic agents (continuous control actions).

Structural accessibility characterizes the generic absence of autonomous elements

In general, it remains extremely difficult finding a pair {f, g} that models the controlled population dynamics of a microbial community. This fact might suggest it is impossible to predict if the controlled community has autonomous elements or not, making it impossible to identify its driver species. We now show that this seemingly unavoidable limitation can be solved using the topology of the controlled ecological network of the community.

Define the network \({\cal G}_{f,g} = ({\mathrm{X}} \cup {\mathrm{U}},{\mathrm{E}}_{f,g} \cup {\mathrm{B}}_{f,g})\) associated with {f, g} as follows: (xj → xi)  Ef,g if xj appears in the right-hand side of \(\dot x_i\) in Eq. (2) or xi(t+) in Eq. (3). Similarly, (uj → xi)  Bf,g if \(g_{ij}\not \equiv 0\). Using this definition, we next describe the class \({\frak D}\) of all possible controlled population dynamics that a controlled microbial community can have given we know its \({\cal G}^{\mathrm{c}}\). Mathematically, \({\frak D}\) contains all base models {f*, g*} such that \({\cal G}_{f^{\ast},g^ \ast } = {\cal G}^{\mathrm{c}}\), together with all deformations {f, g} of each of those base models. The base models characterize the simplest controlled population dynamics that the community can have, leading us to choose them as controlled GLV models with constant susceptibilities:

$$f_i^ \ast (x) = r_ix_i + \mathop {\sum}\limits_{j = 1}^N a_{ij}x_ix_j,\quad g_{ij}^ \ast (x) = b_{ij},$$
(4)

for i = 1, …, N. The parameters \(A = (a_{ij}) \in {\Bbb R}^{N \times N}\), \(r = (r_i) \in {\Bbb R}^N\), and \(B = (b_{ij}) \in {\Bbb R}^{N \times M}\) represent the interaction matrix, the intrinsic growth rate vector, and the susceptibility matrix of the community, respectively. As the simplest population dynamics, the GLV model has been applied to microbial communities in lakes, soils, and human bodies14,15,20,32,33,34,35,36,37,38.

A deformation of {f*, g*} is any meromorphic pair {f, g} such that: (i) \({\cal G}_{f,g} = {\cal G}_{f^{\ast},g^{\ast}}\); (ii) there exists a finite set of parameters \(\theta \in {\Bbb R}^C\) such that \(\{ f(x),g(x)\} = \{ \tilde f(x;\theta ),\tilde g(x;\theta )\}\); and (iii) the identity \(\{ \tilde f(x;0),\tilde g(x;0)\} = \{ f^ \ast (x),g^ \ast (x)\}\) holds. The smallest integer C ≥ 0 satisfying these three conditions is called the size of the deformation. A general class of controlled population dynamics are deformations of Eq. (4), including

$${\hskip-5pt}f_i(x;\theta ) = \, \theta _{i,1} + x_i\left( { - r_i - \theta _{i,2}x_i} \right)\left( {\theta _{i,3}x_i - 1} \right) \\ \!\!\!\!\!\!\!\!\!+ \mathop {\sum}\limits_{j = 1}^N a_{ij}\frac{{x_ix_j}}{{1 + \theta _{ij,4} + \theta _{ij,5}x_i + \theta _{ij,6}x_ix_j + \theta _{ij,7}x_j}},$$
(5)

for i = 1, …, N. Above, θi,1 are migration rates from/to neighboring habitats, \(\theta _{i,2}^{ - 1}\) are the carrying capacities of the environment, \(\theta _{i,3}^{ - 1}\) are the Allee constants, and \(\{ \theta _{ij,k}\} _{k = 4}^7\) characterize the functional responses39. θi,1 > 0 also models species like C. difficile that sporulate into “inactive” forms and then recover. “Higher-order interactions” (e.g., θixixjxk) and susceptibilities mediated by species abundance (e.g., gij(x;θ) = bij + θijkxk) are deformations as well.

We call \({\frak D}\) structurally accessible if almost all of its base models and almost all of their deformations lack autonomous elements. This definition means that except for a zero-measure set of “singularities,” all the controlled population dynamics that the community may take have to lack autonomous elements. The conditions under which \({\frak D}\) is structurally accessible are fully characterized using our mathematical formalism and they depend only on \({\cal G}^{\mathrm{c}}\) (see Methods and Supplementary Note 3). Hence, if \({\frak D}\) is structurally accessible, hereafter we also call \({\cal G}^{\mathrm{c}}\) structurally accessible. We first proved that, generically, increasing the size of a deformation cannot create autonomous elements (Proposition 1 in Supplementary Note 3). See also Fig. 2c for an illustration. This result reduces the search for autonomous elements to the deformations in \({\frak D}\) with minimum size C = 0 (i.e., all base models whose graph matches \({\cal G}^{\mathrm{c}}\)). Finally, we proved that \({\frak D}\) is structurally accessible if and only if \({\cal G}^{\mathrm{c}}\) satisfies the following two graph–theoretical conditions: (i) each species is the end-node of a path that starts at a control input node; and (ii) there is a disjoint union of cycles (excluding self-loops) and paths that cover all species nodes (Theorem 3 of Supplementary Note 3). Note that the conditions for structural accessibility depend on the chosen base model.

Structural accessibility is a nonlinear generalization of “structural controllability” for linear systems16. The latter notion has received increasing attention in Network Science17. Interestingly, the two graph–theoretical conditions for structural accessibility are almost the same as those for structural linear controllability16. The key difference is that for structural linear controllability self-loops (corresponding to intrinsic nodal dynamics) can be used to satisfy condition (ii). See Remark 4 in Supplementary Note 3 for more details.

Identifying minimum sets of driver species in microbial communities

The above result provides a complete graph-characterization of driver species: a set of actuated species is a set of driver species (for all but a zero-measure set of controlled population dynamics that the community may have) if and only if its corresponding \({\cal G}^{\mathrm{c}}\) satisfies the two graph–theoretical conditions. See Fig. 3 for an illustration. With this characterization, one can apply the maximum matching algorithm directly to \({\cal G}\) to calculate the minimum number of control inputs needed to ensure the structural accessibility of \({\cal G}^{\mathrm{c}}\), as did in the structural linear controllability case17,40. However, this may not provide a minimum set of driver species because one control input may actuate multiple species. Fortunately, we can dedicate one control input to one species. Therefore, we adapted the notion of a “feasible dedicated input configuration”41 and a polynomial-time algorithm (combining maximum matching with a strongly connected component decomposition of \({\cal G}\)) to identify one minimum set of driver species (Methods and Supplementary Note 4). Note that once \({\cal G}^{\mathrm{c}}\) is structurally accessible, it cannot lose its structural accessibility when new edges are added to it. This observation implies that the driver species can be identified from an “incomplete” ecological network (e.g., containing only high-confidence interactions).

Fig. 3
figure 3

Identifying driver species. For each network, a minimum set of driver species is shown providing a disjoint union of paths (purple) and cycles (green) covering all species nodes (see Supplementary Table 1 for the species name). Thus, the resulting controlled ecological network is structurally accessible. Self-loops are omitted from these networks to improve readability. a Inferred ecological network of the gut microbiota of germ-free mice pre-colonized with a mixture of human commensal bacterial type strains and then infected with C. difficile (species 7), as in ref. 22 b Inferred ecological network of the core microbiota of the sea sponge Ircinia oros, as in ref. 23

Driving the driver species

We next calculate the control inputs to be applied to a set of driver species for driving the whole community towards the desired state xd. We will show that it is more efficient to calculate impulsive control inputs. To calculate these impulsive control inputs \(\{ u(t_k),t_k \in {\Bbb T}\}\) we adopt a model predictive control (MPC) approach42. Based on the current state of the community x(tk) at \(t_k \in {\Bbb T}\), we use knowledge of its controlled population dynamics {f, g} to predict the sequence of states \({\hat{X}}_{k,L} = \{ {\hat{x}} (t_{k + 1}), \cdots,{\hat{x}}(t_{k + L + 1})\}\) that the community will take in response to a sequence of L impulsive control inputs \(U_{k,L} = \{ u(t_k), \cdots,u(t_{k + L - 1})\}\). The prediction horizon L > 0 determines how far into the future we predict. Then, we choose \(u(t_k) = u_1^ \ast (t_k)\) where \(u_1^ \ast (t_k)\) is the first element of the optimal control sequence \(U_{k,L}^ \ast\) calculated as:

$$U_{k,L}^ \ast = {\mathrm{arg}}\mathop {\mathrm{min}}\limits_{U_{k,L} \in {\Bbb R}^{M \times L}} J_{x_{\mathrm{d}}}({\hat{X}}_{k,L},U_{k,L})\ {\mathrm{subject}}\,{\mathrm{to}}\ U_{k,L} \in {\mathrm{\Omega }}.$$
(6)

Here, \({\mathrm{\Omega }} \subseteq {\Bbb R}^{M \times L}\) specifies constraints in the control inputs, and \(J_{x_{\mathrm{d}}}\) is some cost function penalizing deviations of the predicted trajectory \({\hat{X}}_{k,L}\) from xd. For example, the cost function \(J_{x_{\mathrm{d}}}({\hat{X}}_{k,L},U_{k,L}) = \| {{\hat{x}} (t_{k + L + 1}) - x_{\mathrm{d}}} \|\) penalizes the deviations of the predicted final state. By recalculating \(U_{k,L}^ \ast\) at each tk using the actual state of the community the MPC creates a feedback loop enhancing its robustness against prediction errors42. The prediction horizon can be chosen based on the controlled population dynamics of the community (Methods). For L = 1, this methodology is similar to ref. 43. Equation (6) is a finite-dimensional optimization problem that can be solved using algorithms like DIRECT44. Solving the analogous optimization problem for continuous control inputs is more challenging because the optimization is over the infinite-dimensional space of continuous functions.

We illustrate the above MPC strategy driving the microbial community of Fig. 1 with its solo driver species. According to its dynamics, L = 3 impulsive control inputs are sufficient (see caption in Fig. 1, and Example 4 in Supplementary Note 5). We chose  \(J_{x_{\mathrm{d}}}({\hat{X}}_{k,L},U_{k,L}) = \left\| {\hat{x}}(t_{k,L}) - x_{\mathrm{d}} \right\|_{2}\). Solving Eq. (6) using DIRECT yields the nonlinear MPC strategy u*(t1) = −0.8815, u*(t2) = 2.0089 and u*(t3) = −10−4 (pink in Fig. 4a). We compared the performance of two other control strategies. The first strategy uses one transplantation to increase the abundance of the driver species to its desired value, reminiscent of one probiotic administration restoring its “healthy” abundance (purple in Fig. 4a). The second control strategy ignores the driver species, setting the abundance of the two non-driver species to their desired values (blue in Fig. 4a).

Fig. 4
figure 4

Success and failure of different control strategies. a Three control strategies for driving the microbial community of Fig. 1a toward the desired state. First, MPC applied to the identified driver species {x3} (pink dots). The second control strategy increases the abundance of the driver species to match its value at the desired state x3(t1) = x3,d (purple dots). The third control strategy does not actuate the driver species, but actuates the other two species {x1, x2} by setting their abundance to their desired values (i.e., x1(tk) = x1,d and x2(tk) = x2,d, solid and hollow blue dots, respectively). b Response of the microbial community to these three control strategies. Here and in panel (d), the “jumps” produced by the control inputs are depicted by dashed lines. The equilibria of the population dynamics are shown as gray dots. Only the MPC applied to the driver species succeeds in driving the community to xd. c Control strategies obtained by using the linear MPC with parameters \(Q = {\mathrm{diag}}(20,1,10)\) and different values for R: 10−4 (pink), 10−3 (green), 10−2 (blue). d Trajectories of the controlled community using the linear MPC strategies described in panel (c). Colors correspond to the different values of R

Among the above three control strategies, only the nonlinear MPC applied to the driver species succeeds (Fig. 4b). This strategy succeeds in a somewhat unconventional way: although the driver species is more abundant in the desired state than in the initial state, the first control action decreases its abundance further. Such control action lets the non-driver species reach their desired abundances and, once that happens, the abundance of the driver species is finally increased to its desired value (pink in Fig. 4b). Just restoring the abundance of the driver species succeeds in driving x2 and x3, but it fails to drive x1 to the desired abundance (purple in Fig. 4b). Ignoring the driver species is the worst control strategy, failing to drive any of the three species to their desired values (blue in Fig. 4b). This toy example demonstrates the advantage of identifying and actuating driver species.

Driving large communities with uncertain dynamics

Solving the non-convex optimization problem of Eq. (6) is challenging as N or L increase, and it also requires knowing {f, g}, which may be impossible for large communities. We next circumvent these two drawbacks leveraging the network underlying the controlled microbial community.

Consider we can obtain a weighted adjacency matrix \({\hat{A}} \in {\Bbb R}^{N \times N}\) from \({\cal G}\), providing a proxy for its interaction matrix. Without additional knowledge of the community,  we just assume that we can increase or decrease the abundance of each driver species. We thus use \({\hat{B}} \in \{ 0,1\} ^{N \times M}\) as a proxy for the susceptibility matrix, with bij = 1 if the j-th control input actuates the i-th driver species. By rewriting \(\{ f(x),g(x)\} = \{{\hat{A}}x + w_x,{\hat{B}} + w_u\}\), we use \(\{ {\hat{A}}x,{\hat{B}}\}\) to provide a linear prediction for the response of the community to the control inputs. Here, \(w_x = f - {\hat{A}}x\) and \(w_u = g - {\hat{B}}\) are considered as “perturbations”. Using \(\{ {\hat{A}}x,{\hat{B}}\}\), we design a linear MPC by solving Eq. (6) with the quadratic cost function

$$J_{x_{\mathrm{d}}}({\hat{X}}_{k,\infty },U_{k,\infty }) = \mathop {\sum}\limits_{i = k}^\infty \left[ {{\hat{x}}(t_i) - x_{\mathrm{d}}} \right]^ \top Q\left[ {{\hat{x}}(t_i) - x_{\mathrm{d}}} \right] + u(t_i)^ \top Ru(t_i).$$

Above, the positive definite matrices \(Q = Q^ \top \in {\Bbb R}^{N \times N}\) and \(R = R^ \top \in {\Bbb R}^{M \times M}\) are design parameters. Q penalizes the deviations of the predicted trajectory from the desired state, and R penalizes the control inputs magnitude. Under this scenario, Eq. (6) can be solved in closed form45 yielding the linear MPC u(tk) = Kx(tk), where \(K \in {\Bbb R}^{M \times N}\) is the solution of a Riccati equation (Supplementary Note 6). Since the Ricatti equation can be efficiently solved for large N, the linear MPC can be calculated for large communities. This linear MPC is robust against (wx, wu) and it allows calculating the control inputs for the continuous control scheme (Supplementary Note 6). However, its performance strongly depends on the chosen \(({\hat{A}}, {\hat{B}})\) and the distance to the desired state (Supplementary Note 6).

We applied the linear MPC for driving the toy three-species community of Fig. 1, assuming its dynamics is uncertain. Considering the ecological network of this community and its nonlinear population dynamics, we chose \({\hat{A}} = ( - 0.5,\, 0, - 0.1;0, - 5,1;0,0, - 1)\) as a proxy for its interaction matrix. Here \({\hat{A}}\) is a rough approximation of the linearization of the population dynamics at the desired state given by (−0.37, 0, −0.05; 0, −5. 31, 0.52; 0, 0, −1). Choosing \(Q = {\mathrm{diag}}(20,1,10)\), we compared the performance of three different linear MPCs obtained with R = 10−4, 10−3, 10−2 (Fig. 4c). For R = 10−4, without using knowledge of the population dynamics, the performance of the linear MPC (pink in Fig. 4d) is very similar to the performance of the nonlinear MPC that uses full knowledge of the nonlinear population dynamics (pink in Fig. 4b). This success illustrates the robustness of the linear MPC against the perturbations. As R increases, the performance of the linear MPC deteriorates (green and blue in Fig. 4d).

Numerical validation on large uncertain microbial communities

To validate our control framework for large communities, we built communities of N = 100 species having random directed Erdös–Rényi ecological networks with connectivity c [0, 1], see Fig. 5a. The network edge-weights were  chosen from a normal distribution with zero mean and standard deviation σ > 0, where σ characterizes the typical interspecies interaction strength. Negative self-loops with weights −1 were added to each species. We used this ecological network to identify the driver species of the community, and its corresponding weighted adjacency matrix as the interaction matrix to construct the linear MPC. We simulated the population dynamics of these communities using Eq. (5) ensuring all share \(x_{\mathrm{d}} \in {\Bbb R}^N\) as equilibrium. The resulting communities have nonlinear population dynamics, and their linearization at the desired state is different from the interaction matrix used for the linear MPC (Supplementary Note 8).

Fig. 5
figure 5

Numerical validation on large microbial communities. a Example of the ecological network for a random microbial community with N = 100 species (c = 0.03). A minimum set of M = 6 driver species is shown in purple. The desired state is chosen as \(x_{\mathrm{d}} = (1, \cdots ,1)^ \top\). b With a random initial abundance x0 at distance d = 0.4 from the desired state, the uncontrolled microbial community does not reach xd. c, d For the same community and initial abundance as in panel (b), we apply the control input generated by the linear MPC (panel c) to the six identified driver species. This control strategy drives the community to xd (panel d). e, f, h Mean success rate as a function of d. Error bars denote the standard error of the mean. Parameters are: c = 0.025, θmax = 0.05 for panel (e), σ = 0.8, θmax = 0.05 for panel (f), and c = 0.025, σ = 0.8, θmax = 0.05 for panel (h). g Success rate as function of the proportion M/N of driver species. Black dots show the success rate of 7700 random communities. Pink shows the mean success rate

To quantify the success of our control framework on a given community, we generated 300 initial species abundances that are uniformly distributed at a distance d > 0 from xd. The success rate at distance d is defined as the proportion of those initial conditions that are driven to xd only when the linear MPC is applied to a minimum set of driver species of the community (Fig. 5b–d). Namely, we discard all initial conditions that naturally evolve to xd. Finally, we calculated the mean success rate by averaging the success rate over 100 random communities (see Supplementary Note 8 for details).

The mean success rate is close to 1 for small d regardless of the community’s parameters (Fig. 5e, f), confirming the theoretical guarantee that the linear MPC succeeds if d is small enough. The mean success rate decreases as σ increases, especially for large distances (Fig. 5e). Since increasing σ damages the stability of the population dynamics46, this result suggests that microbial communities become “harder” to control as they lose stability. The mean success rate is higher in communities with low connectivity (Fig. 5f). In general, the size of a minimum set of driver species increases as c decreases, indicating that the success rate increases as the number of driver species increases. Indeed, regardless of d, our control framework attains a mean success rate >0.8 provided that at least 6 from 100 species are driver species (Fig. 5g). This result suggests that the success rate can be enhanced by actuating a few additional species. Finally, to investigate the robustness of our control framework to errors in the ecological network, we randomly rewired each of its edges with probability p [0,1] (e.g., p = 0.05 corresponds to a 5% error). The success rate deteriorates but remains larger than zero despite large errors (Fig. 5h), showing the robustness of our control framework. However, a 5% error decreases the mean success rate in about 30%, emphasizing the importance of accurately mapping ecological networks for controlling microbial communities.

Application

We analyzed the ecological network of the gut microbiota of germ-free mice that were  pre-colonized with a mixture of human commensal bacterial type strains and then infected with C. difficile spores22. In Fig. 3a, we identified a minimum set of five driver species in this 14-species community: Ruminococcus obeum (x1), Raphitoma mirabilis (x12), Bacteroides ovatus (x2), Clostridium ramosum (x6), and Akkermansia muciniphila (x10). We also used the ecological network underlying the core microbiota of the sea sponge I. oros23, finding ten driver species in this twenty-species community (Fig. 3b).

We studied by simulation the efficacy of the identified driver species and the linear MPC for driving these two microbial communities, assuming that their dynamics are uncertain (see Supplementary Note 7 for details of the simulation). For the mice gut microbiota, our framework succeeds in driving the community from an initial state where C. difficile is overabundant towards the desired state with a better balance of species (Fig. 6a, c). Similar results were obtained for controlling the core microbiota of I. oros (Fig. 6b, d). These results show again that the linear MPC method is robust enough to drive nonlinear microbial communities.

Fig. 6
figure 6

Controlling host-associated microbial communities. The controlled population dynamics of both microbial communities were simulated using the controlled GLV equations (see Supplementary Note 7 for details). The intrinsic growth rates were adjusted such that the community has an initial “diseased” equilibrium state x0 in which one species (C. difficile for the mice gut microbiota) is overabundant compared to the rest of species. We chose the desired state xd as another equilibrium with a more balanced abundance profile. For each microbial community, we used the minimum set of driver species identified in Fig. 3. a, b Control inputs obtained using the linear MPC for the impulsive and continuous control schemes. c, d Projection of the high-dimensional abundance profiles (states of the microbial communities) into their first three principal components (PCs). See Supplementary Figure 1 for the temporal response of each species. The calculated control strategies applied to the driver species succeed in driving the community to the desired state, using either continuous or impulsive control

Discussion

Our theoretical framework allows systematically and efficiently controlling microbial communities towards desired states by identifying their driver species. Identifying the driver species of a microbial community only requires knowledge of its underlying ecological network. Note that there could be multiple different minimum sets of driver species for the same community. If the cost of choosing any species as a driver species is known, a combinatorial optimization scheme will allow selecting the best minimum driver species set. We emphasize that the driver species discussed here may not coincide with other notions in ecology such as keystone47,48 or core49 species. For example, the selection of driver species do not directly depend on their abundances, while keystone species do47.

For large uncertain communities, the linear model predictive controller gives a robust and efficient way to calculate the control inputs. The performance of this controller could be further improved by modeling the susceptibility of species to the control actions (e.g., pharmacokinetics). In such case, different control actions could be modeled by different pairs {f, g}, making the conditions for the absence of autonomous elements different for continuous and impulsive control actions. Control algorithms based on reinforcement learning50 (RL) could provide even better performance. Our characterization of minimum sets of driver species will help to efficiently apply those control algorithms to microbial communities, as RL algorithms require specifying the “driver variables” they can actuate51. Here, controlling small synthetic communities could provide valuable insights for designing such controllers. We also note that altering the ecological network or obtaining a “simplified” network, in the spirit of refs. 52,53, could be complementary control approaches (e.g., for reducing the minimum number of driver species).

It has been suggested that the success of ecosystem management strategies could be predicted using the notion of controllability54. However, this notion is somewhat inadequate for microbial communities and other biological systems. By their nature, biological systems cannot be fully controllable because there are states they cannot reach (e.g., states with negative abundances). Furthermore, since dynamic models for microbial communities are nonlinear and uncertain, it is impossible even to test if those systems are controllable. Structural accessibility overcomes these two limitations, generalizing the notion of accessibility31 to systems with uncertain dynamics. Counterintuitively, our mathematical formalism suggests that communities with more complicated population dynamics (i.e., deformations with larger size) require fewer driver species. However, using fewer driver species could complicate the design of control strategies (Remark 9 in Supplementary Note 5). Indeed, by choosing an adequate base model55 and making mild assumptions on the dynamics (i.e., f and g are meromorphic functions), our framework can identify minimum sets of “driver variables” for general nonlinear systems when their underlying networks are known (see Supplementary Note 9 for an example of a small gene regulatory network).

There are two limitations in our current framework for controlling microbial communities. First, stochastic effects are considered negligible. Incorporating stochastic effects yields stochastic differential equations for which the notion of autonomous elements still needs to be mathematically formulated. We anticipate that this is quite challenging, but definitely merits further studies. Second, our current framework does not explicitly model the dynamics of resources provided to and/or chemicals produced by the microbial species56,57,58,59,60,61,62,63. Our characterization of driver species only applies to some instances of resource-based models, e.g., the classical MacArthur’s consumer-resource model64 when the resource dynamics is much faster than the species dynamics65. For general resource-based models, identifying their driver species requires analyzing a new kind of “output accessibility” that characterizes the absence of autonomous elements in the species abundances and ignores autonomous elements in the resource abundances. Then, the notion of “structural output accessibility” (i.e., generic output accessibility given an adequate base model) would provide a nonlinear counterpart of linear target controllability66. Structural output accessibility could allow us to identify driver species and/or “driver resources” of a community from knowing the bipartite interaction network of species and resources. This is beyond the scope of this work and deserves dedicated efforts.

To fully harvest the benefits of controlling microbial communities, a stronger synergy between microbial ecology and control theory is necessary. We hope that this work will catalyze new interdisciplinary approaches that enhance our ability to control complex microbial communities inside and around us.

Methods

Detecting autonomous elements in the continuous control scheme

For the continuous control systems of Eq. (2), the notion of autonomous elements and the conditions for their absence are well understood, since they define when a system is accessible (see Supplementary Note 2.2 and ref. 31 for details). An autonomous element for Eq. (2) is a non-constant function ξ(x) such that there exists an integer ν ≥ 0 and a meromorphic function F such that \(F(\xi,\dot \xi, \cdots,\xi ^{(\nu )}) = 0\). In words, an autonomous element ξ is an “internal variable” of the system that evolves completely unaffected by the control inputs. System (2) is said accessible if it has no autonomous element31.

The absence of autonomous elements can be characterized by using a mathematical formalism based on differential one-forms31. Consider the set of meromorphic functions \({\cal K}\) in the variables \(\{ x,u,\dot u,\ddot u, \cdots \}\), and the sets of differential symbols \({\mathrm{d}}x = ({\mathrm{d}}x_1, \cdots,{\mathrm{d}}x_N)^{\top}\) and \({\mathrm{d}}u = ({\mathrm{d}}u_1, \cdots,{\mathrm{d}}u_M)^{\top}\). Let \({\cal X} = {\mathrm{span}}_{\cal K} \{{\mathrm{d}}x \}\) be the vector space spanned over \({\cal K}\) by the elements of dx, intuitively playing the role of “all functions of state variables”. Any \(\omega \in {\cal X}\) is a “one-form”31 (see Supplementary Note 2.1 for details). In this setting, the chain rule provides a way to formally operate with one-forms, such as taking time derivatives: if \(\omega = \beta ^ \top {\mathrm{d}}x\) then \(\dot \omega : = \dot \beta ^ \top {\mathrm{d}}x + \beta ^ \top {\mathrm{d}}\dot x\). To identify the presence of autonomous elements in the dynamics with continuous control of Eq. (2), one calculates the sequence of subspaces \({\cal H}_k \subset {\cal X}\) defined recursively by

$${\cal H}_k = \{ \omega \in {\cal H}_k|\dot \omega \in {\cal H}_k\},\quad k \ge 1,$$
(7)

starting with \({\cal H}_1 = {\cal X}\). Then, one can prove that Eq. (2) lacks autonomous elements if and only if there exists an integer k* such that \({\cal H}_{k^ \ast } = \{ 0\}\), see ref. 31 (page 49, Thm.3.17).

Detecting autonomous elements in the impulsive control scheme

For the impulsive control systems of Eq. (3) the notions of autonomous elements and accessibility are rather unexplored. Recall that an autonomous element is an internal variable of the system that is completely unaffected by the control actions. To introduce a  suitable definition of autonomous element for the impulsive control systems, note that the control inputs cause “jumps” in the actuated variables (i.e., discontinuities). These jumps are propagated to other state variables by the continuous dynamics. Thus, we define an autonomous element of Eq. (3) as a non-constant function ξ(x) such that ξ(x(t)), \(t \in {\Bbb R}\), is a \({\cal C}^\infty\) function (i.e., infinitely differentiable function) under any impulsive input (see Supplementary Note 2.3 for details). By analogy to the case of continuous control, we say that system (3) is accessible if it has no autonomous element according to the above definition.

To characterize the accessibility of impulsive control systems, we built the sequence of subspaces \({\cal H}_k\) of all functions of the state variables that can be differentiated at least (k − 1) times (see details in Supplementary Note 2.3). The functions belonging to the limit \({\cal H}_\infty\) are the autonomous elements of the system, since they are completely unaffected by the control inputs. Consequently, because the limit subspace \({\cal H}_\infty\) is also “integrable” (informally, it does not contain “fictitious” autonomous elements), accessibility is equivalent to the condition \({\cal H}_\infty = \{ 0\}\) (see Theorem 2 in Supplementary Note 2). We further prove that the limit \({\cal H}_\infty\) is attained in a finite step (i.e., there exists a finite k* such that \({\cal H}_{k^ \ast } = {\cal H}_{k^ \ast + 1} = \cdots = {\cal H}_\infty\)).

We illustrate the above formalism using the three-species microbial community of Fig. 1 where x3 is the actuated species (see caption for its population dynamics). To compute the sequence \({\cal H}_k\), one starts by definition with \({\cal H}_1 = {\mathrm{span}}_{\cal K}\{ {\mathrm{d}} x_1,{\mathrm{d}}x_2,{\mathrm{d}}x_3\}\). Next, \({\cal H}_2\) are all one-forms in \({\cal H}_1\) that can be differentiated once (i.e., they are continuous, so they are not directly affected by u). Because u actuates x3, we get \({\cal H}_2 = {\mathrm{span}}_{\cal K}\{{\mathrm{d}}x_2,{\mathrm{d}}x_1\}\). Similarly, \({\cal H}_3\) are all those one-forms in \({\cal H}_2\) that can be differentiated twice (i.e., their first derivative is continuous), yielding \({\cal H}_3 = {\mathrm{span}}_{\cal K}\{ x_2{\mathrm{d}} x_1 + x_1{\mathrm{d}} x_2\}\). Finally, \({\cal H}_4 = \{ 0\}\) (see details in Example 1 in Supplementary Note 2). This implies that the controlled population dynamics is free of autonomous elements and hence it is accessible. See also Example 2 in Supplementary Note 2 for a community with autonomous elements.

Detecting autonomous elements without knowledge of the population dynamics

When the controlled population dynamics of the microbial community is unknown, we consider the class \({\frak D}\) of all controlled dynamics that the community may have given we know its controlled ecological network. Identifying the presence of autonomous elements in the full class \({\frak D}\) becomes possible thanks to so-called “generic properties” of meromorphic functions31. This is a mathematical property implying that a meromorphic function will satisfy a certain condition in almost all points of its domain—that is, everywhere except for a zero-measure set of “singularities”—provided that such condition holds at a single point. We exploited this property to prove that, generically, increasing the size C of a deformation cannot create new autonomous elements (Proposition 1 in Supplementary Note 3). See Fig. 2c for an illustration. This result allows us to only search for autonomous elements on the subset \({\frak D}_0 \subset {\frak D}\) of all \(\{ f,g\} \in {\frak D}\) with size C = 0, corresponding to all base controlled GLV models of Eq. (4). Finally, we proved that the generic absence of autonomous elements in \({\frak D}_0\) can be determined only from the topology of the controlled ecological network \({\cal G}^{\mathrm{c}}\) (Theorem 3 in Supplementary Note 3).

Identifying a minimum set of driver species

Let \(\tilde {\cal G}({\mathrm{X}})\) be the subgraph obtained by removing all self-loops from the ecological network \({\cal G}({\mathrm{X}})\) of the (uncontrolled) community. Let \(\tilde {\cal B}({\mathrm{X}}^ - \cup {\mathrm{X}}^ + )\) be the bipartite representation of \(\tilde {\cal G}({\mathrm{X}})\), built by placing the edge \(({\mathrm{x}}_j^{+},{\mathrm{x}}_{i}^{-})\) in \(\tilde {\cal B}\) if the directed edge (xj → xi) is in \(\tilde {\cal G}\). Then, to identify a minimum set of driver species, we applied the notion of a “dedicated input configuration” introduced in ref. 41 (see details in Supplementary Note 4).

A strongly connected component (SCC) is said “non-top linked” if it has no incoming edges from other SCCs. Let M* be a maximum matching in \(\tilde {\cal B}\). Then, a non-top linked SCC is said to be “top assignable” with respect to M* if it contains at least one right-unmatched node in M*. Let Z  X be the set of right-unmatched nodes of some maximum matching of \(\tilde {\cal B}\) with maximum top assignability. Let W  X be a set consisting of one state node from each non-top linked SCC of \(\tilde {\cal G}\) not already present in Z. Then, we prove that XD X is a minimum set of driver species if and only if there exist two disjoint subsets Z and W as defined above, such that XD = Z  W (Proposition 3 in Supplementary Note 4). Using this result, we applied Algorithm 1 of ref. 41 to \(\tilde {\cal G}\) to obtain a minimum set of driver species. This algorithm is implemented in Julia as the DriverSpecies function in the DriverSpeciesModule package. This algorithm is illustrated for communities of N = 100 species in Fig. 5a and Supplementary Fig. 2.

Choosing the prediction horizon

To choose the prediction horizon L for the nonlinear MPC we proved there are two possible cases (Theorem 4 in Supplementary Note 5). First, when the community can be driven to xd using L < ∞ impulsive control inputs. Second, when the community can only be asymptotically driven to xd, meaning that \(L \gg N\) should be chosen sufficiently large. This second case could be circumvented by increasing the number of actuated species (Remark 8 in Supplementary Note 5).

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Code availability

A Julia implementation of the algorithm for identifying a minimum set of driver species, as well as all other functions necessary to reproduce the results of the paper, is provided at the GitHub repository: https://github.com/mtangulo/DriverSpecies.