Genome-driven evolutionary game theory helps understand the rise of metabolic interdependencies in microbial communities

Metabolite exchanges in microbial communities give rise to ecological interactions that govern ecosystem diversity and stability. It is unclear, however, how the rise of these interactions varies across metabolites and organisms. Here we address this question by integrating genome-scale models of metabolism with evolutionary game theory. Specifically, we use microbial fitness values estimated by metabolic models to infer evolutionarily stable interactions in multi-species microbial “games”. We first validate our approach using a well-characterized yeast cheater-cooperator system. We next perform over 80,000 in silico experiments to infer how metabolic interdependencies mediated by amino acid leakage in Escherichia coli vary across 189 amino acid pairs. While most pairs display shared patterns of inter-species interactions, multiple deviations are caused by pleiotropy and epistasis in metabolism. Furthermore, simulated invasion experiments reveal possible paths to obligate cross-feeding. Our study provides genomically driven insight into the rise of ecological interactions, with implications for microbiome research and synthetic ecology.

. Nash equilibria and equilibrium genotype frequencies for (glycine, threonine) amino acid pairs. (A) Identified Nash equilibria for two-, three-and four-player games (see also Figure 3F of the main text). Equilibria labeled 'Other' represent the integration of all equilibria corresponding to leakiness levels that cannot be sustained by 11 genotype. (B) to (F) show the equilibrium frequencies of various genotypes (11, 01, 10 and 00) using different initial genotype frequencies each with direct biological interpretation (see the main text and Figure 6 therein).  Figure 6B therein): 11 genotype is first converted to 10 by losing its second leaky function. Upon equilibrating, both 10 and 11 may lose their first leaky function converting to 00 and 01. (B) The frequency profiles of various genotypes for two representative low (10%) and high (20%) leakiness levels for both lysine and isoleucine. (C) Equilibrium frequencies of various genotypes for leakiness levels sustainable by 11 genotype (green region in Figure 4C of the main text). A comparison with Figure 6B of the main text shows that the equilibrium frequencies of the four genotypes do not depend on which leaky function is lost first. Assessing the evolutionary dynamics at low/moderate in (B) leakiness levels shows that even though cross-feeding emerges initially, it is eventually driven to extinction by non-producers. This pattern reveals an evolutionary path toward cross-feeding: Once a crossfeeding association is established at low/moderate leakiness levels, crossfeeders may evolve to increase their level of secreted amino acids to help their partner's growth in exchange for the amino acid they need. This increase can be so severe that even prototrophs (11) are driven to extinction (i.e., moving from the green to red region in Figure 4C of the main text). 11 11 01 11 00 Figure 9. Cross-feeders are resistant to invasion by non-producers once established. First, two mutant genotypes arise from 11, where one has lost its first leaky function (01) and the second has lost its second one (10). Next, 00 originates from 01 and/or 10 by losing their remaining leaky function. (A) and (B) show the initial and equilibrium genotypes frequencies for first step, respectively, and (C) and (D) show those for the second step for (lysine, isoleucine) amino acid pair. The equilibrium frequencies are shown only for leakiness levels that are sustainable by 11 genotype (green region in Figure 4C in the main text). These figures show that cross-feeding can emerge in the first step and can resist exploitation by 00 genotypes in the second step. Results are shown for (lysine, isoleucine) as a representative amino acid pair.
(A) Initial genotype frequencies. (B) Genotype frequencies profiles for two representative low (10%) and high (20%) leakiness levels for both lysine and isoleucine. (C) Equilibrium genotype frequencies for leakiness levels sustainable by 11 genotype (green region in Figure 4C of the main text). Construct in silico strains for wild-type, partial producers and no-producer  Figure 11. A schematic representation of the metabolic network-driven evolutionary game theory approach used for E. coli leaking one amino acid. This same procedure can be applied to any system with one leaky trait such as sucrose hydrolysis by S. cerevisiae (see the main text). Construct in silico strains for wild-type, partial producers and non-producer  is a game where the optimal strategy is to take the same strategy as the opponent and corresponds to two Nash equilibria, namely, (Non-producer, Non-producer) and (Producer, Producer). Addition of glucose to the extracellular environment increases the average fitness of non-producers and makes producers more difficult to sustain regardless of how the cooperation cost is modeled. Figure 14. Sensitivity of predicted growth defect (cooperation cost) using the yeast iAZ900 model. Sensitivities are shown with respect to (A) histidine uptake rate and (B) stoichiometric coefficient of ATP in the sucrose hydrolysis reaction (SUCRe). For a fixed capture efficiency, the growth cost in (A) was computed as the difference between the predicted growth under the histidine saturation conditions and that for a given histidine uptake rate. Similarly, the growth defect for a fixed capture efficiency in (B) was computed as the difference between the predicted growth rate when the ATP cost of sucrose hydrolysis is zero and that for a given ATP cost. As shown here, the model is not very much sensitive to histidine limitations. We found a similar pattern for other models of the yeast (Supplementary References 20 and 21; results not shown). Processor time (sec) There are a number of differences between addressing community dynamics using the multi-species dynamic Flux Balance Analysis (dFBA) of metabolic models 1 (see also 2,3 for recent reviews of these methods) and that using the mechanistic evolutionary agme theory framework proposed in this study. Here, we aim to model only the 'evolutionary dynamics' of the system, that is, how the relative genotype abundances (frequencies, or community structure) change over time following the evolutionary game theory 4 , and some microbial ecology literature 5  Multi-species dFBA approaches, on the other hand, take into account the impact of variations in both genotype frequencies and genotype abundances and thus model the 'eco-evolutionary' dynamics of the system 5,7,8 . These approaches, however, require the knowledge of the uptake kinetics for the specific compounds that are taken up by each community member, which may not be always available.

Comparison with previous efforts to integrate metabolic networks and evolutionary game theory
While elementary mode analysis of metabolic networks 9 has been previously used in conjunction with evolutionary game theory to assess rate-yield tradeoffs in a single species 10 , integrating constraint-based analysis of metabolic networks (i.e., COBRA-based methods 11 ) with evolutionary game theory for the analysis of interspecies interactions has not been explored before. In this study we present (to our knowledge) the first example of such a framework.

Connections with previous experimental studies establishing synthetic cooperative exchanges
It is important to highlight a key difference between our in silico analyses and the previous experimental studies that report on establishing synthetic cross-feeding associations [12][13][14][15][16] . While these studies start with an initial population of complementary genotypes to assess whether they can grow in a co-culture, here we The identified Nash equilibria and equilibrium fraction of producers are given in Supplementary Figure 13B and 13C. Overall, the general patterns that are observed here are qualitatively in agreement with those in Figures 2B and 2C of the main text.
The only difference is that the Prisoner's Dilemma game comprises only a small portion of the cost-capture efficiency landscape in Supplementary Figure 13B compared to that in Figure 2B of the main text. This is mostly due to the fact that the genome-scale metabolic models of the yeast 20,21 cannot fully recapitulate the growth defect (i.e., cooperation cost) due to histidine limitation in the absence of hisD gene (see Supplementary Figure 14).
Assessing the impact of the addition of glucose to the growth medium (Supplementary Figures 13D-F) also reveals patterns that are qualitatively in agreement with those in Figures 2E and 2F

Supplementary Note 3. Detailed analysis of the Nash equilibria and equilibrium genotype frequencies in E. coli strains secreting individual amino acids
Here, we analyze in more details the identified Nash equilibria and equilibrium genotype frequencies given in Figures 3B and 3D of the main text.

Analysis of the Nash equilibria
As noted in the main text and shown in Figure 3B  equilibria. For (WT, WT) to be a Nash equilibrium, a WT facing a WT must be fitter than a MT facing a WT. We reasoned that this would occur if the secreted amino acid was associated with a small growth cost to WT and/or a small growth advantage to the mutant. Indeed, for the most extreme case of glutamate, we found that the growth gain per unit uptake for a glutamate auxotroph mutant is significantly lower than all other amino acid auxotroph mutants (Supplementary Figure 16).

Analysis of the equilibrium genotype frequencies
As shown in Figure 3D Figure   17A). However, we do not observe a consistent pattern in the growth benefit of the mutant strains for all amino acids (Supplementary Figure 17B) consistent with previous reports 22 . In particular, we observed that the growth rate of MTs that appear only in a small equilibrium fraction in the Snowdrift game region has already reached a saturation point at very small uptake levels. This implies that any further increase in the leakiness levels in WT will not increase their growth rate. This causes the average fitness of these mutants to be lower than WT, which in turn leads to a higher frequency of WT in the Snowdrift game region.

More details on NashEq Finder
Nash equilibrium is a central concept in game theory describing a state where no player has an incentive to change its current strategy because it cannot improve its payoff any further by doing so, if other players keep their strategies unchanged. As noted in the main text, we developed an optimization-based framework called NashEq Finder to identify the pure strategy Nash equilibria (NE) of a non-symmetric !-player game, given its payoff matrix (Sections 3 and 4 of this text provide detailed descriptions of how payoffs were computed for the case studies in this paper). In the following, we present an example of how NashEq Finder optimization formulation works using a simple two-player game, and also provide a preliminary assessment of the computational efficiency of the NashEq Finder algorithm.

An example showing how NashEq Finder optimization formulation works
Consider the following payoff matrix for a game with two players, ("1 and "2), where each player can either Cooperate (C) or Defect (D): We have: We also set 89 ' = 89 + = −1.
It is easy to verify that "1 is better off defecting if "2 cooperates. The same is true if "2 defects as well. Similarly, "2 is always better off defecting no matter if "1 cooperates or defects. This implies that DD is the Nash equilibrium of the game (i.e., this is a Prisoner's Dilemma game). 33 is the only cell of the payoff matrix that satisfies the conditions of the Nash equilibrium mathematically described with Constraints (6) and (7) in the NashEq Finder optimization formulation in the main text (note that ; &, is a binary variable and can assume only a value of zero or one): Constraint (6): For this constraint, the strategy of player "2 is fixed at 3 and we check whether In this case we observe that both ; ,, = 0 ; ,, = 1 satisfy Constraints (6) and (7), however, since the objective function of the optimization problem maximizes sum of the binary variables, ; ,, = 1 will be returned as the optimal value for ; ,, upon solving the optimization problem. Now, we verify that Constraints (6) and (7)  0 that satisfies both Constraints (S1) and (S2).

Computational efficiency of the NashEq Finder algorithm
Computing pure strategy Nash equlibria is known to be an NP-hard problem and it is PPAD-complete when considering mixed strategies as well 23,24 . As noted in the main text, we have provided, in Supplementary Software 1, a python script implementing NashEq Finder in its most general form to identify all pure strategy Nash equilibria of an !-player game. The runtime of NashEq primarily depends on the size of the payoff matrix (a function of number of players and the number of strategies each player can take) as one binary variable is assigned to each cell of the payoff matrix. Supplementary Figure 18 shows the required processor time to solve NashEq Finder for a number of sample simulations that we performed for E. coli leaking different number of amino acids. NashEq Funder complements previous mixed-integer programming approaches to identify the Nash equilibria of n-player games 25 , however, a more comprehensive study is needed to find out how the computational efficiency of NashEq Finder is compared to that of the previous algorithms.

Definition of the in silico producer and non-producer strains
Non-producer: The non-producer strain lacks suc2 gene, which is simulated by (ii) Changes in the invertase production cost is modeled by changing the histidine uptake rate: This is to specifically reproduce Gore et al 18 system, where they used a histidine auxotroph producer strain (ΔhisD), so as to experimentally increase the "cost of cooperation" by limiting the histidine concentration in the growth medium. Here, we first construct an in silico histidine auxotroph strain by setting the lower and upper bound for reaction IGPDH (imidazoleglycerol phosphate dehydratase) encoded by gene hisD to zero. We, next, manually adjusted L MND,OP&QR such that the growth rate of a producer strain is 2.5% less than that of a non-producer strain as reported in 18 . However, since the in silico non-producer strain cannot grow on sucrose (due to having SUCRe reaction removed), we instead used the original iAZ900 model where SUCRe reaction does not contain an ATP cost as a proxy for the non-producer strain. We found with this analysis that L MND = 0.115, results in a growth rate for the producer strain, which is less than that for the non-producer by 2.5%.
All simulations were performed for aerobic minimal medium with a sucrose and oxygen uptake rates of 10

Estimating additional metabolic parameters
We first need to determine a number of parameters before quantifying the payoff matrix of the game.
Death rate: A death rate was calculated requirements by using the following equation 1 for the case of an infeasible FBA problem implying that the available glucose in the growth medium cannot support maintenance ATP: Growth cost of sucrose hydrolysis: We calculated a representative growth cost of sucrose hydrolysis for a fixed L MND by solving two FBA problems one for the original iAZ900 model (where no ATP cost is incorporated into the SUCRe reaction) and the other for the in silico producer strain (where ATP is incorporated into the SUCRe reaction but with K = 1). The growth cost of sucrose hydrolysis is then calculated as follows: tTTARu[\Tu,Rv' ).
Note that Ldef_ℎijf_kfl;mℎ_elLm always takes a negative value.
Glucose and fructose secretion rates: We first solve the following FBA problem for a producer strain alone, to find out the glucose and fructose secretion rates: subject to L oÅ Y Å Å∈Ç = 0, ∀Ñ ∈ Ö, (S1) where, Ö and â denote the set of metabolites and reactions, L oÅ is the stoichiometric coefficient of metabolite Ñ in reaction à and Y Å is the flux of reaction à. Furthermore, is the upper bounds on the sucrose uptake rate. Assigning a non-zero value to kJe_LKefKmKj_éJdè and éfd_LKefKmKj_éJdè in this case the goal was to guarantee that a partner (producer or non-producer) strain can still benefit from this cooperative behavior even when the cooperation is too costly.
With having these parameters determined, we can now compute the payoffs. Note that these payoffs are computed for the encounter of two single cells. The impact of cell type frequencies is accounted for when simulating the evolutionary dynamics using the Replicator's equation (see Methods in the main text). The following FBA formulations are presented for a fixed capture efficiency, K , and a fixed ATP cost of sucrose hydrolysis, L MND for case (i) in Section 3.1. These formulations can be readily adjusted for case (ii) as well.

Computing the payoffs for a producer vs. another producer
Here, we solve one FBA problem as follows:

Computing the payoffs for a producer vs. a non-producer
The FBA problem for the producer is formulated as follows: subject to L oÅ Y Å Å∈Ç = 0, ∀Ñ ∈ Ö, 89 Å ≤ Y Å ≤ á9 Å , ∀à ∈ â, Y äã_]åtu_R ≥ −10, The difference between this FBA problem and that for producer vs. producer is that here we do not allow for the uptake of glucose and fructose as here the producer deals with a non-producer strain. As before, the optimal value of Y noTS[]] is assigned as the producer's payoff, if the FBA problem is solved to optimality, and is set to The FBA for the non-producer is as follows: subject to Here, the non-producer can benefit from the secreted glucose and fructose by its cooperative partner. The optimal value of Y noTS[]] is assigned as the non-producer's payoff, if the FBA problem is solved to optimality. Otherwise, the payoff of the nonproducers is set to Y ZR[\X (note that the non-producer does not incur the cost of sucrose hydrolysis).

Computing the payoffs for a non-producer vs. another non-producer
There is no need to solve an FBA problem in this case, because it is known a priori that an FBA problem for a non-producer vs. a non-producer will be infeasible because the ATP maintenance requirements (described by Y MNDç = 1 20 ) cannot be satisfied due to the absence of glucose/fructose. The payoff of the non-producer is set to Y ZR[\X in this case.

Computing the payoff values for amino acid-secreting E. coli strains
Here

Construction of the in silico strains
Auxotrophy for one or more amino acids is imposed by setting the lower and upper bounds for reactions corresponding to the genes responsible for the synthesis of those amino acid to zero. . it should be noted that the impact of leaking an amino acid at a particular leakiness level on the fitness (growth) of that genotype depends not only on the cost of synthesizing that amino acid but also on its transport cost, in case the transport of this amino acid is metabolically or energetically costly (all these costs are captured by the metabolic network model).

Determining the net export and uptake level of each amino acid by each community member
Here, we assume that the leakiness level of each amino acid is the same across all genotypes in the community that are leaky for that amino acid given that all mutant genotypes arise from a wild-type genotype. For example, if the leakiness level of amino acid AA is 25%, all genotypes that are not auxotroph for this AA will secrete ), ó be the total number of genotypes in a given encounter (e.g., ó = 2 for pairwise interacitons and ó = 3 for the encounter of three genotypes), and ó MM denote the total number of genotypes in the game secreting amino acid AA. Then, the net uptake rate of an amino acid AA for a genotype auxotroph for AA is determined as follows: Y òR\,åA\[ôR = ó MM 10" 100 ó − 1 (S11) Similarly, the net secretion (export) level of amino acid AA by a genotype in the game that is leaky for this amino acid is determined as follows: Y òR\,R`ATu\ = 10" 100 − ó MM − 1 10" 100 ó − 1 (S12) Note that if the net export turns out to be negative, it implies uptake. This may happen for three or more leaky traits.
The main assumption underlying these relations is that a genotype that is leaky for an amino acid does not have access to part of the amino acid that is secreted out but it can access those secreted by other genotypes in the community. This assumption is to impose the notion that the private benefit is the portion of the amino acid that is retained by producing cells and the rest will serve as the public good. In addition, we assume, for simplicity, that any secreted amino acid by a given genotype is equally shared among all other the genotypes involved in the game (community). This is a reasonable assumption in a homogeneous environment, which is the focus of this study. Note that these assumptions do not affect the generality of our framework and can be easily relaxed, if needed. Some examples of how these relations work are given in the following, where 1 and 0 represent cases in which a genotype is leaky or auxotroph for a given amino acid, respectively. This implies that the same amount that is exported by genotype 1 is taken up from its partner genotype 1.

Formulation of the FBA problems to quantify the payoffs
One FBA problem is solved for each distinct genotype in a pairwise or higher-order interaction. For example, if computing the payoffs for the encounter of three genotypes 11, 11 and 10, one needs to solve only one FBA problem for 11 genotypes and one for the 10 genotype. The list of exchanged amino acids and their leakiness levels are provided as inputs. The general form of the FBA problem that is solved for a given genotype ü in the game (community) is as follows: where, all parameters and variables defined for an FBA problem in Section 3 are extended here by addition of a superscript ü denoting that they belong to the community member ü . Here, ¢£ ô and Ö UR[ô §,ô denote the set of reactions corresponding to the knocked out genes and the set of leaky metabolites for genotype ü, respectively. Constraint (S14) sets to zero the flux of reactions corresponding to the specific gene mutations in the genotype under consideration.
Constraints (S15) and (S16) impose the bounds on the export rate of metabolites for which genotype ü is leaky and on the uptake rate of metabolites for which it is following previous studies 28 . The general FBA formulation can be customized for any desired genotypes by adjusting ¢£ ô and Ö UR[ô §,ô . If the FBA problem is solved to optimality, the obtained biomass flux is assigned as the payoff of the corresponding genotype. Otherwise, the payoff is set to jK%mℎ_f%mK (a negative value), which is computed beforehand the same way it was described in section 2 of Supplementary Methods.