A theoretical framework for controlling complex microbial communities

Microbes form complex communities that perform critical roles for the integrity of their environment or the well-being of their hosts. Controlling these microbial communities can help us restore natural ecosystems and maintain healthy human microbiota. However, the lack of an efficient and systematic control framework has limited our ability to manipulate these microbial communities. Here we fill this gap by developing a control framework based on the new notion of structural accessibility. Our framework uses the ecological network of the community to identify minimum sets of its driver species, manipulation of which allows controlling the whole community. We numerically validate our control framework on large communities, and then we demonstrate its application for controlling the gut microbiota of gnotobiotic mice infected with Clostridium difficile and the core microbiota of the sea sponge Ircinia oros. Our results provide a systematic pipeline to efficiently drive complex microbial communities towards desired states.

M icroorganisms form complex communities that play critical roles in maintaining the well-being of their hosts or the integrity of their environment [1][2][3][4] . Disrupting these microbial communities can have severe consequences. In humans, for example, a disruption to the gut microbiota-the aggregate of microorganisms residing in our intestine-is associated with several disorders including irritable bowel syndrome, Clostridium difficile Infection (CDI), autism, obesity, and cavernous cerebral malformations [5][6][7] . For agriculture crops, a disruption of rhizosphere microbiota can reduce their disease resistance and hence decrease the overall crop yield 8,9 . In the oceans, a disruption to their microbiota can impact global climate by altering carbon sequestration rates 3,4,10 . Driving disrupted microbial communities back to their healthy states could offer novel solutions to prevent and treat complex human diseases, enhance sustainable agriculture, and regulate global warming 11,12 . For instance, inoculating soil microbes can restore terrestrial ecosystems 13 , and fecal microbiota transplantation (FMT) is so far the most successful therapy for treating recurrent CDI 14 . Despite the success of these empirical strategies, a broad application of microbialmanipulation strategies will be possible only if we can efficiently control large complex microbial communities 15 .
There are two big challenges down the road. First, an efficient control method should only manipulate a minimum set of species in the community. However, we still lack a systematic method to identify minimum sets of those "driver species" whose control can help us drive a whole community to desired states. Here, we use the term "species" without necessarily representing the lowest major taxonomic rank. One could also organize microbes by strains, genera, or operational taxonomical units. Second, even when those driver species have been identified, designing the control strategy that should be applied to them (e.g., how their abundance needs to be manipulated) for driving the community towards the desired state remains difficult. This difficulty arises because of the inherent complexity of microbial dynamics and our limited knowledge of them.
To address those two challenges, here we develop a control framework using the ecological network underlying the microbial community. First, we introduce the new notion of "structural accessibility", which generalizes the notion of structural linear controllability 16,17 to systems with nonlinear dynamics. Then, we derive a complete graph-theoretical characterization of structural accessibility. This result enables us to efficiently identify minimum sets of driver species of any microbial community purely from the topology of its underlying ecological network, even if some microbial interactions are missing and its population dynamics is unknown. Once the driver species are identified, we systematically design feedback control strategies to drive a microbial community towards the desired state, even if its dynamics is not precisely known. We numerically validated our control framework in large microbial communities, analyzing its performance for different parameters of the community (e.g., the connectivity of its underlying ecological network), and for errors in the ecological network used to identify the driver species. Finally, we demonstrate our framework by controlling the core microbiota of the sea sponge Ircinia oros, and restoring the gut microbiota of gnotobiotic mice infected by Clostridium difficile. Our results provide a rational and systematic framework to control microbial communities and other complex ecosystems.

Results
Modeling controlled microbial communities. Our framework focuses on the impact that manipulating a subset of species has on the abundances of other species. We thus consider a microbial community whose state at time t is determined from the abundance profile xðtÞ 2 R N of its N species, where the i-th entry x i (t) represents the abundance of the i-th species at time t. The state evolves according to some population dynamics where the function f : R N ! R N models the species intrinsic growth and the inter/intra-species interactions of the community (see Supplementary Note 1 for details). For most microbial communities f is unknown and difficult to infer given the many interaction mechanisms between microbes 18 . Thus, we assume that f(x) is some unknown meromorphic function of x (i.e., the quotient of analytic functions). This assumption is very mild as it is satisfied by most population dynamics models 19 . Instead of knowing the population dynamics of the microbial community, we assume we know its underlying ecological network G ¼ ðX; EÞ. This network is a directed graph where nodes X = {x 1 , …, x N } represent species, and edges (x j → x i ) ∈ E denote that the j-th species has a direct ecological impact (i.e., direct promotion or inhibition of growth) on the i-th species (Fig. 1a). Mapping ecological networks requires performing mono-and co-culture experiments 20,21 , using system identification techniques with time-resolved abundance data 22,23 , or using steady-state abundance data via a recently developed inference method 24 . In general, ecological networks are different from correlation networks 20,25 because correlation does not imply causation 26,27 .
Controlling the community consists in driving its state from an initial value x 0 ¼ xð0Þ 2 R N at t = 0 (e.g., a "diseased" state) towards the desired value x d 2 R N (e.g., the "healthy" state, Fig. 1b). We assume that the community will not evolve by itself to x d . To drive the community, we use M control inputs uðtÞ 2 R M directly affecting certain species that we call "actuated species" (Fig. 1a). Control inputs encode a combination of M control actions applied at time t. We consider four possible control actions. If u j (t) < 0, the j-th control action at time t can be a bacteriostatic agent or bactericide, decreasing the abundance 28 of the species it actuates. If u j (t) > 0, the j-th control action at time t can be a prebiotic 29 or transplantation, stimulating the growth or engrafting a consortium of the species it actuates, respectively. Probiotics administration 30 and FMTs 14 are examples of transplantations. To specify the species actuated by each control input we introduce the controlled ecological network G c ¼ ðX ∪ U; E ∪ BÞ. Here, U = {u 1 , …, u M } are the control input nodes and (u j → x i ) ∈ B denotes that the j-th control input actuates the i-th species (Fig. 1a).
We introduce two control schemes describing how the control inputs change the species abundance (see Supplementary Note 1 for details). The first control scheme models a combination of prebiotics (if u j (t) > 0) and bacteriostatic agents (if u j (t) < 0) as continuous control inputs modifying the growth of the actuated species (Fig. 1c): The second control scheme models a combination of transplantations (if u j (t) > 0) and bactericides (if u j (t) < 0) applied at discrete intervention instants T ¼ ft 1 ; t 2 ; Á Á Ág, rendering impulsive control inputs that instantaneously modify the abundance of the actuated species (Fig. 1d): Above, x(t + ) denotes the state "right after time t", so x(t) "jumps" at t 2 T if u(t) ≠ 0. The pair {f, g} characterizes both control schemes, describing the controlled population dynamics of the microbial community. The function g : R N ! R N M models the direct susceptibility of the species to the control actions. The j-th control input actuates the i-th species if g ij 6 0. Because g is typically unknown, we just assume that g(x) is some unknown meromorphic function of x such that g ij 6 0 iff (u j → Notice that when all species are directly controlled (i.e., an independent control input actuates each species), the whole microbial community can easily be driven to the desired state. Fortunately, as we show next, actuating all the species is far from being necessary. Thanks to the inter-species interactions encoded in the ecological network G, we can identify minimum sets of species that we need to actuate in order to drive the whole community. We call those species "driver species".
Identifying driver species. To understand when a set of actuated species is a set of driver species, consider the three-species community with Generalized Lotka-Volterra (GLV) population dynamics of Fig. 2a. This toy community has one control input actuating x 3 . Actuating only this species creates an autonomous element-namely, a constraint between some species abundances that the control input cannot break, confining the state of the community to a low-dimensional manifold (Fig. 2a, right). More precisely, our mathematical formalism reveals that ξ = x 1 x 2 is the autonomous element (Example 2 in Supplementary Note 2). Indeed, differentiating ξ with respect to time yields Intuitively, the autonomous element exists because the control input cannot change x 1 without changing x 2 in a predefined way, making it impossible to drive the community in the three-dimensional state space. This observation indicates that x 3 alone cannot be a driver species for this community. Introducing a second control input actuating x 1 helps the community jump out of the low-dimensional manifold eliminating the autonomous element, allowing us to drive this community to any desired state with positive abundance (Fig. 2b, and Example 6 in Supplementary Note 5). Therefore, {x 1 , x 3 } is a minimum set of driver species for this community.
In the general case of N species and M control inputs, we define a set of actuated species as a set of driver species if the corresponding controlled population dynamics {f, g} lacks autonomous elements.  Fig. 1 Controlling a microbial community. a Ecological network G for a toy microbial community of N = 3 species (green, yellow, blue). The controlled ecological network G c contains M = 1 control input actuating the third species. b Initial and desired abundance profiles (bars). Controlling the community consists in driving its state from the initial state x 0 to the desired state x d , represented by two points in the state space of the community. c In the continuous control scheme, the control inputs u(t) are continuous signals modifying the growth of the actuated species. The controlled population dynamics of this community is given by _ In the absence of control, this community has two equilibria x 0 ¼ ð3:14; 4:58; 1Þ > and x d ¼ ð4:57; 4:73; 2Þ > , chosen as the initial and desired states, respectively. d In the impulsive control scheme, the control inputs u(t) are impulses applied at the intervention instants T ¼ ft 1 ; t 2 ; Á Á Ág, instantaneously changing the abundance of the actuated species. The controlled population dynamics is the same as in panel (c), except that _ 15g. Under this controlled population dynamics, our mathematical formalism identifies x 3 as the solo driver species needed to drive this microbial community (Example 1 in Supplementary Note 2) the absence of autonomous elements is equivalent to their controllability 31 -the ability to drive the system between any two states, easily verified using Kalman's condition rank ½B; AB; Á Á Á ; A NÀ1 B ¼ N. In the case of nonlinear dynamics, the absence of autonomous elements can be characterized using a mathematical formalism based on differential one-forms (see Methods and Supplementary Note 2). For the continuous control scheme of Eq. (2), the conditions for the absence of autonomous elements are well understood as they define when a system is accessible 31 , a cornerstone concept in nonlinear control theory. Because it is more natural to control microbial communities with impulsive control actions, in this paper we extended the study of autonomous elements to the impulsive control systems of Eq. (3). We first introduced a definition of autonomous elements for impulsive control systems (Definition 3 in Supplementary Note 2). We then characterized necessary and sufficient conditions for the absence of autonomous elements in a controlled population dynamics (Theorem 2 in Supplementary Note 2). To our surprise, the conditions for the absence of autonomous elements for the continuous and the impulsive control schemes are identical (Remark 2 in Supplementary Note 2). This result means that transplantations and bactericides (impulsive control actions) can be as effective as prebiotics and bacteriostatic agents (continuous control actions).
Structural accessibility characterizes the generic absence of autonomous elements. In general, it remains extremely difficult finding a pair {f, g} that models the controlled population dynamics of a microbial community. This fact might suggest it is impossible to predict if the controlled community has autonomous elements or not, making it impossible to identify its driver species. We now show that this seemingly unavoidable limitation can be solved using the topology of the controlled ecological network of the community.
Define the network Using this definition, we next describe the class D of all possible controlled population dynamics that a controlled microbial community can have given we know its G c . Mathematically, D contains all base models {f * , g * } such that G f Ã ;g Ã ¼ G c , together with all deformations {f, g} of each of those base models. The base models characterize the simplest controlled population dynamics that the community can have, leading us to choose  Fig. 2 Autonomous elements constrain the state of microbial communities, characterizing their driver species. a A three-species community with GLV dynamics _ For actuating x 3 , we consider the impulsive control scheme with x 3 (t + ) = x 3 (t) + u 1 (t) for t 2 T. With this controlled population dynamics, our mathematical formalism reveals the autonomous element x 1 x 2 that constraints the state of this microbial community to the low-dimensional manifold fx 2 R 3 jx 1 x 2 ¼ x 1 ð0Þx 2 ð0Þg (gray) for all control inputs. Five state trajectories (in colors) with random control inputs illustrate this fact. Hence, {x 3 } alone cannot be a set of driver species for this controlled population dynamics. b Including a second control input u 2 (t) actuating x 1 (i.e., x 1 (t + ) = x 1 (t) + u 2 (t) for t 2 T) eliminates the autonomous element, since the state of the microbial community (colors) can explore a three-dimensional space (gray). Hence {x 1 , x 3 } is a minimum set of driver species for this community with GLV dynamics. c We proved that, generically, increasing the complexity of the controlled population dynamics cannot create autonomous elements. In this example, increasing the deformation size C from the GLV in panel (a) (with C = 0) to the controlled population dynamics in Fig. 1 (with C > 0) eliminates the autonomous element that was present by actuating x 3 alone (Example 1 in Supplementary Note 2). Therefore, increasing the complexity of the population dynamics makes {x 3 } a solo driver species them as controlled GLV models with constant susceptibilities: represent the interaction matrix, the intrinsic growth rate vector, and the susceptibility matrix of the community, respectively. As the simplest population dynamics, the GLV model has been applied to microbial communities in lakes, soils, and human bodies 14,15,20,[32][33][34][35][36][37][38] . A deformation of {f * , g * } is any meromorphic pair {f, g} such that: (i) G f ;g ¼ G f Ã ;g Ã ; (ii) there exists a finite set of parameters θ 2 R C such that ff ðxÞ; gðxÞg ¼ ff ðx; θÞ;gðx; θÞg; and (iii) the identity ff ðx; 0Þ;gðx; 0Þg ¼ ff Ã ðxÞ; g Ã ðxÞg holds. The smallest integer C ≥ 0 satisfying these three conditions is called the size of the deformation. A general class of controlled population dynamics are deformations of Eq. (4), including for i = 1, …, N. Above, θ i,1 are migration rates from/ to neighboring habitats, θ À1 i;2 are the carrying capacities of the environment, θ À1 i;3 are the Allee constants, and fθ ij;k g 7 k¼4 characterize the functional responses 39 . θ i,1 > 0 also models species like C. difficile that sporulate into "inactive" forms and then recover. "Higher-order interactions" (e.g., θ i x i x j x k ) and susceptibilities mediated by species abundance (e.g., g ij (x;θ) = b ij + θ ijk x k ) are deformations as well.
We call D structurally accessible if almost all of its base models and almost all of their deformations lack autonomous elements. This definition means that except for a zero-measure set of "singularities," all the controlled population dynamics that the community may take have to lack autonomous elements. The conditions under which D is structurally accessible are fully characterized using our mathematical formalism and they depend only on G c (see Methods and Supplementary Note 3). Hence, if D is structurally accessible, hereafter we also call G c structurally accessible. We first proved that, generically, increasing the size of a deformation cannot create autonomous elements (Proposition 1 in Supplementary Note 3). See also Fig. 2c for an illustration. This result reduces the search for autonomous elements to the deformations in D with minimum size C = 0 (i.e., all base models whose graph matches G c ). Finally, we proved that D is structurally accessible if and only if G c satisfies the following two graph-theoretical conditions: (i) each species is the end-node of a path that starts at a control input node; and (ii) there is a disjoint union of cycles (excluding self-loops) and paths that cover all species nodes (Theorem 3 of Supplementary Note 3). Note that the conditions for structural accessibility depend on the chosen base model.
Structural accessibility is a nonlinear generalization of "structural controllability" for linear systems 16 . The latter notion has received increasing attention in Network Science 17 . Interestingly, the two graph-theoretical conditions for structural accessibility are almost the same as those for structural linear controllability 16 . The key difference is that for structural linear controllability self-loops (corresponding to intrinsic nodal dynamics) can be used to satisfy condition (ii). See Remark 4 in Supplementary Note 3 for more details.
Identifying minimum sets of driver species in microbial communities. The above result provides a complete graphcharacterization of driver species: a set of actuated species is a set of driver species (for all but a zero-measure set of controlled population dynamics that the community may have) if and only if its corresponding G c satisfies the two graph-theoretical conditions. See Fig. 3 for an illustration. With this characterization, one can apply the maximum matching algorithm directly to G to calculate the minimum number of control inputs needed to ensure the structural accessibility of G c , as did in the structural linear controllability case 17,40 . However, this may not provide a minimum set of driver species because one control input may actuate multiple species. Fortunately, we can dedicate one control input to one species. Therefore, we adapted the notion of a a b  Fig. 3 Identifying driver species. For each network, a minimum set of driver species is shown providing a disjoint union of paths (purple) and cycles (green) covering all species nodes (see Supplementary Table 1 for the species name). Thus, the resulting controlled ecological network is structurally accessible. Self-loops are omitted from these networks to improve readability. a Inferred ecological network of the gut microbiota of germ-free mice pre-colonized with a mixture of human commensal bacterial type strains and then infected with C. difficile (species 7), as in ref. 22 b Inferred ecological network of the core microbiota of the sea sponge Ircinia oros, as in ref. 23 NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-019-08890-y ARTICLE NATURE COMMUNICATIONS | (2019) 10:1045 | https://doi.org/10.1038/s41467-019-08890-y | www.nature.com/naturecommunications "feasible dedicated input configuration" 41 and a polynomial-time algorithm (combining maximum matching with a strongly connected component decomposition of G) to identify one minimum set of driver species (Methods and Supplementary Note 4). Note that once G c is structurally accessible, it cannot lose its structural accessibility when new edges are added to it. This observation implies that the driver species can be identified from an "incomplete" ecological network (e.g., containing only highconfidence interactions).
Driving the driver species. We next calculate the control inputs to be applied to a set of driver species for driving the whole community towards the desired state x d . We will show that it is more efficient to calculate impulsive control inputs. To calculate these impulsive control inputs fuðt k Þ; t k 2 Tg we adopt a model predictive control (MPC) approach 42 . Based on the current state of the community x(t k ) at t k 2 T, we use knowledge of its controlled population dynamics {f, g} to predict the sequence of stateŝ X k;L ¼ fxðt kþ1 Þ; Á Á Á ;xðt kþLþ1 Þg that the community will take in response to a sequence of L impulsive control inputs U k;L ¼ fuðt k Þ; Á Á Á ; uðt kþLÀ1 Þg. The prediction horizon L > 0 determines how far into the future we predict. Then, we choose uðt k Þ ¼ u Ã 1 ðt k Þ where u Ã 1 ðt k Þ is the first element of the optimal control sequence U Ã k;L calculated as: Here, Ω R M L specifies constraints in the control inputs, and J x d is some cost function penalizing deviations of the predicted trajectoryX k;L from x d . For example, the cost function J x d ðX k;L ; U k;L Þ ¼ kxðt kþLþ1 Þ À x d k penalizes the deviations of the predicted final state. By recalculating U Ã k;L at each t k using the actual state of the community the MPC creates a feedback loop enhancing its robustness against prediction errors 42 . The prediction horizon can be chosen based on the controlled population dynamics of the community (Methods). For L = 1, this methodology is similar to ref. 43 . Equation (6) is a finite-dimensional optimization problem that can be solved using algorithms like DIRECT 44 . Solving the analogous optimization problem for continuous control inputs is more challenging because the optimization is over the infinitedimensional space of continuous functions.
We illustrate the above MPC strategy driving the microbial community of Fig. 1 with its solo driver species. According to its dynamics, L = 3 impulsive control inputs are sufficient (see caption in Fig. 1, and Example 4 in Supplementary Note 5). We chose J x d ðX k;L ; U k;L Þ ¼xðt k;L Þ À x d 2 . Solving Eq. (6) using DIRECT yields the nonlinear MPC strategy u * (t 1 ) = −0.8815, u * (t 2 ) = 2.0089 and u * (t 3 ) = −10 −4 (pink in Fig. 4a). We compared the performance of two other control strategies. The first strategy uses one transplantation to increase the abundance of the driver species to its desired value, reminiscent of one probiotic administration restoring its "healthy" abundance (purple in Fig. 4a). The second control strategy ignores the driver species, setting the abundance of the two non-driver species to their desired values (blue in Fig. 4a).
Among the above three control strategies, only the nonlinear MPC applied to the driver species succeeds (Fig. 4b). This strategy succeeds in a somewhat unconventional way: although the driver species is more abundant in the desired state than in the initial state, the first control action decreases its abundance further. Such control action lets the non-driver species reach their desired abundances and, once that happens, the abundance of the driver species is finally increased to its desired value (pink in Fig. 4b).
Just restoring the abundance of the driver species succeeds in driving x 2 and x 3 , but it fails to drive x 1 to the desired abundance (purple in Fig. 4b). Ignoring the driver species is the worst control strategy, failing to drive any of the three species to their desired values (blue in Fig. 4b). This toy example demonstrates the advantage of identifying and actuating driver species.
Driving large communities with uncertain dynamics. Solving the non-convex optimization problem of Eq. (6) is challenging as N or L increase, and it also requires knowing {f, g}, which may be impossible for large communities. We next circumvent these two drawbacks leveraging the network underlying the controlled microbial community.
Consider we can obtain a weighted adjacency matrixÂ 2 R N N from G, providing a proxy for its interaction matrix. Without additional knowledge of the community, we just assume that we can increase or decrease the abundance of each driver species. We thus useB 2 f0; 1g N M as a proxy for the susceptibility matrix, with b ij = 1 if the j-th control input actuates the i-th driver species. By rewriting ff ðxÞ; gðxÞg ¼ fÂx þ w x ;B þ w u g, we use fÂx;Bg to provide a linear prediction for the response of the community to the control inputs. Here, w x ¼ f ÀÂx and w u ¼ g ÀB are considered as "perturbations". Using fÂx;Bg, we design a linear MPC by solving Eq. (6) with the quadratic cost function Above, the positive definite matrices Q ¼ Q > 2 R N N and R ¼ R > 2 R M M are design parameters. Q penalizes the deviations of the predicted trajectory from the desired state, and R penalizes the control inputs magnitude. Under this scenario, Eq. (6) can be solved in closed form 45 yielding the linear MPC u(t k ) = Kx(t k ), where K 2 R M N is the solution of a Riccati equation (Supplementary Note 6). Since the Ricatti equation can be efficiently solved for large N, the linear MPC can be calculated for large communities. This linear MPC is robust against (w x , w u ) and it allows calculating the control inputs for the continuous control scheme (Supplementary Note 6). However, its performance strongly depends on the chosen ðÂ;BÞ and the distance to the desired state (Supplementary Note 6).
We applied the linear MPC for driving the toy three-species community of Fig. 1, assuming its dynamics is uncertain. Considering the ecological network of this community and its nonlinear population dynamics, we choseÂ ¼ ðÀ0:5; 0; À0:1; 0; À5; 1; 0; 0; À1Þ as a proxy for its interaction matrix. HereÂ is a rough approximation of the linearization of the population dynamics at the desired state given by (−0.37, 0, −0.05; 0, −5. 31, 0.52; 0, 0, −1). Choosing Q ¼ diagð20; 1; 10Þ, we compared the performance of three different linear MPCs obtained with R = 10 −4 , 10 −3 , 10 −2 (Fig. 4c). For R = 10 −4 , without using knowledge of the population dynamics, the performance of the linear MPC (pink in Fig. 4d) is very similar to the performance of the nonlinear MPC that uses full knowledge of the nonlinear population dynamics (pink in Fig. 4b). This success illustrates the robustness of the linear MPC against the perturbations. As R increases, the performance of the linear MPC deteriorates (green and blue in Fig. 4d).
Numerical validation on large uncertain microbial communities. To validate our control framework for large communities, we built communities of N = 100 species having random directed Erdös-Rényi ecological networks with connectivity c ∈ [0, 1], see Fig. 5a. The network edge-weights were chosen from a normal distribution with zero mean and standard deviation σ > 0, where σ characterizes the typical interspecies interaction strength. Negative self-loops with weights −1 were added to each species. We used this ecological network to identify the driver species of the community, and its corresponding weighted adjacency matrix as the interaction matrix to construct the linear MPC. We simulated the population dynamics of these communities using Eq. (5) ensuring all share x d 2 R N as equilibrium. The resulting communities have nonlinear population dynamics, and their linearization at the desired state is different from the interaction matrix used for the linear MPC (Supplementary Note 8).
To quantify the success of our control framework on a given community, we generated 300 initial species abundances that are uniformly distributed at a distance d > 0 from x d . The success rate at distance d is defined as the proportion of those initial conditions that are driven to x d only when the linear MPC is applied to a minimum set of driver species of the community (Fig. 5b-d). Namely, we discard all initial conditions that naturally evolve to x d . Finally, we calculated the mean success rate by averaging the success rate over 100 random communities (see Supplementary Note 8 for details).
The mean success rate is close to 1 for small d regardless of the community's parameters (Fig. 5e, f), confirming the theoretical guarantee that the linear MPC succeeds if d is small enough. The mean success rate decreases as σ increases, especially for large distances (Fig. 5e). Since increasing σ damages the stability of the population dynamics 46 , this result suggests that microbial communities become "harder" to control as they lose stability. The mean success rate is higher in communities with low connectivity (Fig. 5f). In general, the size of a minimum set of driver species increases as c decreases, indicating that the success

Fig. 4
Success and failure of different control strategies. a Three control strategies for driving the microbial community of Fig. 1a toward the desired state. First, MPC applied to the identified driver species {x 3 } (pink dots). The second control strategy increases the abundance of the driver species to match its value at the desired state x 3 (t 1 ) = x 3,d (purple dots). The third control strategy does not actuate the driver species, but actuates the other two species {x 1 , x 2 } by setting their abundance to their desired values (i.e., x 1 (t k ) = x 1,d and x 2 (t k ) = x 2,d , solid and hollow blue dots, respectively). b Response of the microbial community to these three control strategies. Here and in panel (d), the "jumps" produced by the control inputs are depicted by dashed lines. The equilibria of the population dynamics are shown as gray dots. Only the MPC applied to the driver species succeeds in driving the community to x d . c Control strategies obtained by using the linear MPC with parameters Q ¼ diagð20; 1; 10Þ and different values for R: 10 −4 (pink), 10 −3 (green), 10 −2 (blue). d Trajectories of the controlled community using the linear MPC strategies described in panel (c). Colors correspond to the different values of R rate increases as the number of driver species increases. Indeed, regardless of d, our control framework attains a mean success rate >0.8 provided that at least 6 from 100 species are driver species (Fig. 5g). This result suggests that the success rate can be enhanced by actuating a few additional species. Finally, to investigate the robustness of our control framework to errors in the ecological network, we randomly rewired each of its edges with probability p ∈ [0,1] (e.g., p = 0.05 corresponds to a 5% error). The success rate deteriorates but remains larger than zero despite large errors (Fig. 5h), showing the robustness of our control framework. However, a 5% error decreases the mean success rate in about 30%, emphasizing the importance of accurately mapping ecological networks for controlling microbial communities.
Application. We analyzed the ecological network of the gut microbiota of germ-free mice that were pre-colonized with a mixture of human commensal bacterial type strains and then Distance to the desired state infected with C. difficile spores 22 . In Fig. 3a, we identified a minimum set of five driver species in this 14-species community: Ruminococcus obeum (x 1 ), Raphitoma mirabilis (x 12 ), Bacteroides ovatus (x 2 ), Clostridium ramosum (x 6 ), and Akkermansia muciniphila (x 10 ). We also used the ecological network underlying the core microbiota of the sea sponge I. oros 23 , finding ten driver species in this twenty-species community (Fig. 3b). We studied by simulation the efficacy of the identified driver species and the linear MPC for driving these two microbial communities, assuming that their dynamics are uncertain (see Supplementary Note 7 for details of the simulation). For the mice gut microbiota, our framework succeeds in driving the community from an initial state where C. difficile is overabundant towards the desired state with a better balance of species (Fig. 6a, c). Similar results were obtained for controlling the core microbiota of I. oros (Fig. 6b, d). These results show again that the linear MPC method is robust enough to drive nonlinear microbial communities.

Discussion
Our theoretical framework allows systematically and efficiently controlling microbial communities towards desired states by identifying their driver species. Identifying the driver species of a microbial community only requires knowledge of its underlying ecological network. Note that there could be multiple different minimum sets of driver species for the same community. If the cost of choosing any species as a driver species is known, a combinatorial optimization scheme will allow selecting the best minimum driver species set. We emphasize that the driver species discussed here may not coincide with other notions in ecology such as keystone 47,48 or core 49 species. For example, the selection of driver species do not directly depend on their abundances, while keystone species do 47 .
For large uncertain communities, the linear model predictive controller gives a robust and efficient way to calculate the control inputs. The performance of this controller could be further improved by modeling the susceptibility of species to the control actions (e.g., pharmacokinetics). In such case, different control actions could be modeled by different pairs {f, g}, making the conditions for the absence of autonomous elements different for continuous and impulsive control actions. Control algorithms based on reinforcement learning 50 (RL) could provide even better performance. Our characterization of minimum sets of driver species will help to efficiently apply those control algorithms to microbial communities, as RL algorithms require specifying the "driver variables" they can actuate 51 . Here, controlling small synthetic communities could provide valuable insights for designing such controllers. We also note that altering the ecological network or obtaining a "simplified" network, in the spirit of refs. 52,53 , could be complementary control approaches (e.g., for reducing the minimum number of driver species).
It has been suggested that the success of ecosystem management strategies could be predicted using the notion of controllability 54  With impulsive control Initial state Desired state With continuous control Fig. 6 Controlling host-associated microbial communities. The controlled population dynamics of both microbial communities were simulated using the controlled GLV equations (see Supplementary Note 7 for details). The intrinsic growth rates were adjusted such that the community has an initial "diseased" equilibrium state x 0 in which one species (C. difficile for the mice gut microbiota) is overabundant compared to the rest of species. We chose the desired state x d as another equilibrium with a more balanced abundance profile. For each microbial community, we used the minimum set of driver species identified in Fig. 3. a, b Control inputs obtained using the linear MPC for the impulsive and continuous control schemes. c, d Projection of the highdimensional abundance profiles (states of the microbial communities) into their first three principal components (PCs). See Supplementary Figure 1 for the temporal response of each species. The calculated control strategies applied to the driver species succeed in driving the community to the desired state, using either continuous or impulsive control microbial communities and other biological systems. By their nature, biological systems cannot be fully controllable because there are states they cannot reach (e.g., states with negative abundances). Furthermore, since dynamic models for microbial communities are nonlinear and uncertain, it is impossible even to test if those systems are controllable. Structural accessibility overcomes these two limitations, generalizing the notion of accessibility 31 to systems with uncertain dynamics. Counterintuitively, our mathematical formalism suggests that communities with more complicated population dynamics (i.e., deformations with larger size) require fewer driver species. However, using fewer driver species could complicate the design of control strategies (Remark 9 in Supplementary Note 5). Indeed, by choosing an adequate base model 55 and making mild assumptions on the dynamics (i.e., f and g are meromorphic functions), our framework can identify minimum sets of "driver variables" for general nonlinear systems when their underlying networks are known (see Supplementary Note 9 for an example of a small gene regulatory network).
There are two limitations in our current framework for controlling microbial communities. First, stochastic effects are considered negligible. Incorporating stochastic effects yields stochastic differential equations for which the notion of autonomous elements still needs to be mathematically formulated. We anticipate that this is quite challenging, but definitely merits further studies. Second, our current framework does not explicitly model the dynamics of resources provided to and/or chemicals produced by the microbial species [56][57][58][59][60][61][62][63] . Our characterization of driver species only applies to some instances of resource-based models, e.g., the classical MacArthur's consumerresource model 64 when the resource dynamics is much faster than the species dynamics 65 . For general resource-based models, identifying their driver species requires analyzing a new kind of "output accessibility" that characterizes the absence of autonomous elements in the species abundances and ignores autonomous elements in the resource abundances. Then, the notion of "structural output accessibility" (i.e., generic output accessibility given an adequate base model) would provide a nonlinear counterpart of linear target controllability 66 . Structural output accessibility could allow us to identify driver species and/or "driver resources" of a community from knowing the bipartite interaction network of species and resources. This is beyond the scope of this work and deserves dedicated efforts.
To fully harvest the benefits of controlling microbial communities, a stronger synergy between microbial ecology and control theory is necessary. We hope that this work will catalyze new interdisciplinary approaches that enhance our ability to control complex microbial communities inside and around us.

Methods
Detecting autonomous elements in the continuous control scheme. For the continuous control systems of Eq. (2), the notion of autonomous elements and the conditions for their absence are well understood, since they define when a system is accessible (see Supplementary Note 2.2 and ref. 31 for details). An autonomous element for Eq. (2) is a non-constant function ξ(x) such that there exists an integer ν ≥ 0 and a meromorphic function F such that Fðξ; _ ξ; Á Á Á ; ξ ðνÞ Þ ¼ 0. In words, an autonomous element ξ is an "internal variable" of the system that evolves completely unaffected by the control inputs. System (2) is said accessible if it has no autonomous element 31 .
The absence of autonomous elements can be characterized by using a mathematical formalism based on differential one-forms 31 . Consider the set of meromorphic functions K in the variables fx; u; _ u; € u; Á Á Ág, and the sets of differential symbols dx ¼ ðdx 1 ; Á Á Á ; dx N Þ > and du ¼ ðdu 1 ; Á Á Á ; du M Þ > . Let X ¼ span K fdxg be the vector space spanned over K by the elements of dx, intuitively playing the role of "all functions of state variables". Any ω 2 X is a "one-form" 31 (see Supplementary Note 2.1 for details). In this setting, the chain rule provides a way to formally operate with one-forms, such as taking time derivatives: if To identify the presence of autonomous elements in the dynamics with continuous control of Eq. (2), one calculates the sequence of subspaces H k & X defined recursively by Detecting autonomous elements in the impulsive control scheme. For the impulsive control systems of Eq. (3) the notions of autonomous elements and accessibility are rather unexplored. Recall that an autonomous element is an internal variable of the system that is completely unaffected by the control actions.
To introduce a suitable definition of autonomous element for the impulsive control systems, note that the control inputs cause "jumps" in the actuated variables (i.e., discontinuities). These jumps are propagated to other state variables by the continuous dynamics. Thus, we define an autonomous element of Eq. (3) as a non-constant function ξ(x) such that ξ(x(t)), t 2 R, is a C 1 function (i.e., infinitely differentiable function) under any impulsive input (see Supplementary Note 2.3 for details). By analogy to the case of continuous control, we say that system (3) is accessible if it has no autonomous element according to the above definition.
To characterize the accessibility of impulsive control systems, we built the sequence of subspaces H k of all functions of the state variables that can be differentiated at least (k − 1) times (see details in Supplementary Note 2.3). The functions belonging to the limit H 1 are the autonomous elements of the system, since they are completely unaffected by the control inputs. Consequently, because the limit subspace H 1 is also "integrable" (informally, it does not contain "fictitious" autonomous elements), accessibility is equivalent to the condition H 1 ¼ f0g (see Theorem 2 in Supplementary Note 2). We further prove that the limit H 1 is attained in a finite step (i.e., there exists a finite k * such that We illustrate the above formalism using the three-species microbial community of Fig. 1 where x 3 is the actuated species (see caption for its population dynamics). To compute the sequence H k , one starts by definition with H 1 ¼ span K fdx 1 ; dx 2 ; dx 3 g. Next, H 2 are all one-forms in H 1 that can be differentiated once (i.e., they are continuous, so they are not directly affected by u). Because u actuates x 3 , we get H 2 ¼ span K fdx 2 ; dx 1 g. Similarly, H 3 are all those one-forms in H 2 that can be differentiated twice (i.e., their first derivative is continuous), yielding H 3 ¼ span K fx 2 dx 1 þ x 1 dx 2 g. Finally, H 4 ¼ f0g (see details in Example 1 in Supplementary Note 2). This implies that the controlled population dynamics is free of autonomous elements and hence it is accessible. See also Example 2 in Supplementary Note 2 for a community with autonomous elements.
Detecting autonomous elements without knowledge of the population dynamics. When the controlled population dynamics of the microbial community is unknown, we consider the class D of all controlled dynamics that the community may have given we know its controlled ecological network. Identifying the presence of autonomous elements in the full class D becomes possible thanks to so-called "generic properties" of meromorphic functions 31 . This is a mathematical property implying that a meromorphic function will satisfy a certain condition in almost all points of its domain-that is, everywhere except for a zero-measure set of "singularities"-provided that such condition holds at a single point. We exploited this property to prove that, generically, increasing the size C of a deformation cannot create new autonomous elements (Proposition 1 in Supplementary Note 3). See Fig. 2c for an illustration. This result allows us to only search for autonomous elements on the subset D 0 & D of all ff ; gg 2 D with size C = 0, corresponding to all base controlled GLV models of Eq. (4). Finally, we proved that the generic absence of autonomous elements in D 0 can be determined only from the topology of the controlled ecological network G c (Theorem 3 in Supplementary Note 3).
Identifying a minimum set of driver species. LetGðXÞ be the subgraph obtained by removing all self-loops from the ecological network GðXÞ of the (uncontrolled) community. LetBðX À ∪ X þ Þ be the bipartite representation ofGðXÞ, built by placing the edge ðx þ j ; x À i Þ inB if the directed edge (x j → x i ) is inG. Then, to identify a minimum set of driver species, we applied the notion of a "dedicated input configuration" introduced in ref. 41 (see details in Supplementary Note 4).
A strongly connected component (SCC) is said "non-top linked" if it has no incoming edges from other SCCs. Let M * be a maximum matching inB. Then, a non-top linked SCC is said to be "top assignable" with respect to M * if it contains at least one right-unmatched node in M * . Let Z ⊆ X be the set of rightunmatched nodes of some maximum matching ofB with maximum top assignability. Let W ⊆ X be a set consisting of one state node from each non-top linked SCC ofG not already present in Z. Then, we prove that X D ⊆ X is a minimum set of driver species if and only if there exist two disjoint subsets Z and W as defined above, such that X D = Z ∪ W (Proposition 3 in Supplementary Note 4). Using this result, we applied Algorithm 1 of ref. 41 toG to obtain a minimum set of driver species. This algorithm is implemented in Julia as the DriverSpecies function in the DriverSpeciesModule package. This algorithm is illustrated for communities of N = 100 species in Fig. 5a and Supplementary Fig. 2.
Choosing the prediction horizon. To choose the prediction horizon L for the nonlinear MPC we proved there are two possible cases (Theorem 4 in Supplementary Note 5). First, when the community can be driven to x d using L < ∞ impulsive control inputs. Second, when the community can only be asymptotically driven to x d , meaning that L ) N should be chosen sufficiently large. This second case could be circumvented by increasing the number of actuated species (Remark 8 in Supplementary Note 5).
Reporting summary. Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.
Code availability. A Julia implementation of the algorithm for identifying a minimum set of driver species, as well as all other functions necessary to reproduce the results of the paper, is provided at the GitHub repository: https://github.com/ mtangulo/DriverSpecies.