A yield-cost tradeoff governs Escherichia coli’s decision between fermentation and respiration in carbon-limited growth

Living cells react to changes in growth conditions by re-shaping their proteome. This accounts for different stress-response strategies, both specific (i.e., aimed at increasing the availability of stress-mitigating proteins) and systemic (such as large-scale changes in the use of metabolic pathways aimed at a more efficient exploitation of resources). Proteome re-allocation can, however, imply significant biosynthetic costs. Whether and how such costs impact growth performance remain largely open problems. Focusing on carbon-limited E. coli growth, we integrate genome-scale modeling and proteomic data to address these questions at a quantitative level. After deriving a simple formula linking growth rate, carbon intake, and biosynthetic costs, we show that optimal growth results from the tradeoff between yield maximization and protein burden minimization. Empirical data confirm that E. coli growth is indeed close to Pareto-optimal over a broad range of growth rates. Moreover, we establish that, while most of the carbon taken up is diverted into biomass precursors, the efficiency of ATP synthesis is the key driver of the yield-cost tradeoff. These findings provide a quantitative perspective on carbon overflow, the origin of growth laws and the multidimensional optimality of E. coli metabolism.


NOTE 1. TRADE-OFFS IN THE GENERAL MODEL OF OPTIMAL PROTEIN ALLOCATION
A. Duality of maximum growth and optimal proteome allocation
Here we consider in more detail the problem of characterizing the optimal strategy by which the cell reallocates its proteome when a specific metabolic activity $L$ is limited. In the Main Text we focus on $L$ being the carbon uptake ($C$), but the same model can in principle be applied to other sources of stress, e.g. the limitation of other nutrients (nitrogen, phosphate, etc.) or antibiotics. In the following, $\mu$ stands for the growth rate, $v_L$ for the flux of the limited activity (to be identified with the carbon import flux $J_C$ under carbon limitation, see Main Text), $\phi_{NL} = 1 - \phi_L$ for the mass fraction of the rest of the proteome, and $q = v_L/\mu$ for the specific flux of the limited activity (per unit of growth rate). These quantities in turn depend on internal variables of the cell such as metabolic fluxes and metabolite or enzyme levels (see e.g. [1]). However, instead of considering the whole space of states spanned by such variables, we limit ourselves to a set of cellular states ("phenotypes"), which we label with an index $\alpha$. Each phenotype is assumed to be described by a different set of values of the limiting flux $v_L$ and of $\phi_{NL}$ for each growth rate $\mu$. We then define, for each $\alpha$, the quantity
$$\Phi^\alpha(w_L, \mu) = w_L\, v_L^\alpha(\mu) + \phi_{NL}^\alpha(\mu), \qquad (1)$$
where $w_L$ denotes the cost of the protein controlling $v_L^\alpha$ (to be identified with the level of nutritional stress $w_C$ under carbon limitation, see Main Text). Both $v_L^\alpha$ and $\phi_{NL}^\alpha$ are modulated by the dilution rate (i.e. $\Phi^\alpha \equiv \Phi^\alpha(w_L, \mu)$). In particular, we make the natural assumption that both increase with $\mu$ for each $\alpha$, namely
$$\frac{dv_L^\alpha}{d\mu} > 0, \qquad \frac{d\phi_{NL}^\alpha}{d\mu} > 0.$$
For any given $w_L$, the growth rate $\mu^\alpha(w_L)$ pertaining to phenotype $\alpha$ can in principle be obtained by inverting the condition $\Phi^\alpha(\mu^\alpha, w_L) = 1$. The problem of growth rate maximization can thus be re-cast as the constrained maximization of $\mu^\alpha$ over the index $\alpha$ for given $w_L$ and $v^\alpha(\mu)$, i.e.
$$\alpha^* = \arg\max_\alpha \mu \quad \text{subject to} \quad \Phi^\alpha(\mu, w_L) = 1, \qquad (3)$$
or, more simply,
$$\alpha^* = \arg\max_\alpha \mu^\alpha(w_L),$$
with $\mu^\alpha(w_L)$ being a solution of $\Phi^\alpha(\mu^\alpha, w_L) = 1$, with $\Phi^\alpha$ given by Eq. (1). We call this the direct proteome-constrained problem.
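We denote by $v_L^*$ and $\phi_{NL}^*$ the values of $v_L$ and $\phi_{NL}$ corresponding to $\alpha^*$. One can easily show that, for any growth rate $\mu$ for which all these quantities exist,
$$v_L^{(q)} \le v_L^* \le v_L^{(\varepsilon)}, \qquad (5)$$
$$\phi_{NL}^{(\varepsilon)} \le \phi_{NL}^* \le \phi_{NL}^{(q)}, \qquad (6)$$
where $(v_L^{(q)}, \phi_{NL}^{(q)})$ and $(v_L^{(\varepsilon)}, \phi_{NL}^{(\varepsilon)})$ are the values attained by the phenotypes that minimize, at fixed $\mu$, the limited flux $v_L^\alpha$ (the $q$-problem) and the proteome fraction $\phi_{NL}^\alpha$ (the $\varepsilon$-problem), respectively, and where the values of $\mu$ are assumed to match the optimal growth rate in the direct formulation of the problem. In order to see this, one first has to introduce the following dual problem of the direct proteome-constrained problem (3):
$$\alpha^* = \arg\min_\alpha \Phi^\alpha(w_L, \mu) \quad \text{at fixed } \mu.$$
The solution of this problem is identical to that of the direct problem, Eq. (3), provided $d\Phi^*/d\lambda > 0$ and the growth rate $\mu$ is set so as to match the optimal one obtained in the direct problem (see [1]). Then, the proof is straightforward: because of the definitions of the optimization problems (8) and (9), $\phi_{NL}^{(\varepsilon)} \le \phi_{NL}^*$ and $w_L v_L^* + \phi_{NL}^* \le w_L v_L^{(\varepsilon)} + \phi_{NL}^{(\varepsilon)}$, respectively. The first inequality directly gives us one of the two constraints; another constraint, namely $v_L^* \le v_L^{(\varepsilon)}$, is obtained by using both inequalities. The demonstration of the remaining bounds (the $q$-bounds) is analogous.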
Both the $q$- and the $\varepsilon$-problem can be shown to be equivalent to two other problems, different from the direct proteome-constrained problem introduced above. Indeed, the former describes the minimization of the specific flux $q = v_L/\mu$ (or, equivalently, the maximization of the growth yield, proportional to $1/q = \mu/v_L$) at fixed growth rate, while the latter is equivalent to the maximization of the growth rate subject to a constraint on $\phi_{NL}$.
In other terms, optimal growth and optimal proteome allocation are dual to each other, and solutions to the direct proteome-constrained problem are bound to lie within a range defined by the $q$- and $\varepsilon$-problems. The existence of these bounds allows one to study how cells may optimally handle different degrees of limitation, that is, different values of $w_L$, by switching between alternative solutions. In particular, for $w_L = 0$, the solution to the original problem coincides with the solution to the $\varepsilon$-problem. As $w_L$ increases, instead, the solution may shift towards states with larger yields (smaller specific rates) at the cost of increasing $\phi_{NL}$.

B. Transitions between optimal phenotypes
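For any given "phenotype" $\alpha$, one can obtain a growth rate $\mu^\alpha$ as a function of $w_L$ by solving the equation $\Phi^\alpha(w_L, \mu) = 1$. Optimal phenotypes are those maximizing $\mu$ for each $w_L$. Assuming for simplicity that optimal states are unique, transitions between "phenotypes" $\alpha$ and $\beta$ may occur only where the two growth rates coincide, $\mu^\alpha = \mu^\beta = \mu$, so that $\Phi^\alpha(w_L, \mu) = \Phi^\beta(w_L, \mu) = 1$ and
$$w_L \left( v_L^\alpha - v_L^\beta \right) = \phi_{NL}^\beta - \phi_{NL}^\alpha, \qquad (10)$$
i.e. any decrease in the flux $v_L$ has to be matched by an increase in the non-limited proteome $\phi_{NL}$, highlighting the tradeoff between optimal proteome allocation and efficient use of limited resources. By differentiating $\Phi^\alpha(w_L, \mu^\alpha(w_L)) = 1$ with respect to $w_L$ one gets
$$v_L^\alpha + \left( w_L \frac{dv_L^\alpha}{d\mu} + \frac{d\phi_{NL}^\alpha}{d\mu} \right) \frac{d\mu^\alpha}{dw_L} = 0,$$
and therefore, assuming $w_L\, dv_L^\alpha/d\mu + d\phi_{NL}^\alpha/d\mu > 0$,
$$\frac{d\mu^\alpha}{dw_L} = - \frac{v_L^\alpha}{w_L \dfrac{dv_L^\alpha}{d\mu} + \dfrac{d\phi_{NL}^\alpha}{d\mu}}. \qquad (12)$$
This equation involves not only the absolute magnitude of the limited flux $v_L$, but also the variation of the proteome as the growth rate changes.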

C. Consequences for Constrained Allocation FBA
The consequences of equations (10) and (12) depend on the model at hand. Let us consider, as in the Main Text, a constraint-based flux model with a proteome allocation constraint, such as CAFBA, with no maintenance ATP hydrolysis rate. In this case, as explained in the Main Text, we can introduce the metabolic states $\xi$ and express the fluxes as $v = \xi \cdot \mu$. Let us consider a set of different metabolic states $\xi^\alpha$, identified by an index $\alpha$. For each state we can compute the specific uptake $q^\alpha$ and the protein cost $\varepsilon^\alpha$, so that $v_L^\alpha = q^\alpha \mu$ and $\phi_{NL}^\alpha = \varepsilon^\alpha \mu$. The bounds Eq. (5) and (6) take the form
$$q^{(q)} \le q^* \le q^{(\varepsilon)}, \qquad \varepsilon^{(\varepsilon)} \le \varepsilon^* \le \varepsilon^{(q)},$$
where $(q)$ and $(\varepsilon)$ refer to the specific states $\xi^{(q)}$ and $\xi^{(\varepsilon)}$, while the asterisk indicates the optimal state.
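On the other hand, Equations (10) and (12) respectively become
$$w_L \left( q^\alpha - q^\beta \right) = \varepsilon^\beta - \varepsilon^\alpha, \qquad \frac{d\mu^\alpha}{dw_L} = - \frac{q^\alpha \mu^\alpha}{w_L q^\alpha + \varepsilon^\alpha} = - q^\alpha \left( \mu^\alpha \right)^2,$$
where the last equality uses $\Phi^\alpha = (w_L q^\alpha + \varepsilon^\alpha)\mu^\alpha = 1$. Since the two growth rates coincide at a transition, the state that becomes optimal as $w_L$ increases is the one whose growth rate decreases more slowly, i.e. the one with the smaller $q$; the first relation then requires its $\varepsilon$ to be larger. Together, these two constraints imply that, at any transition between optimal states, both $q$ and $\varepsilon$ vary, and they do so in opposite directions. When $w_L = 0$, $\varepsilon^* = \varepsilon^{(\varepsilon)}$, while the specific flux $q^*$ is maximum. As $w_L$ increases, at each transition $q^*$ decreases as $\varepsilon^*$ increases. These properties lie at the basis of the Pareto front analysis.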
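The sequence of transitions described above can be illustrated with a minimal numerical sketch. It assumes the normalization $\Phi = 1$ of Note 1A with $v_L = q\mu$ and $\phi_{NL} = \varepsilon\mu$, so that each state grows at $\mu = 1/(w_L q + \varepsilon)$. The three $(q, \varepsilon)$ pairs and the names `STATES`, `growth_rate`, `optimal_state` and `optimal_sequence` are invented for illustration and are not taken from the model.

```python
# Hypothetical metabolic states (invented numbers): name -> (q, eps).
STATES = {
    "state_1": (3.0, 0.5),   # cheapest proteome, largest specific flux
    "state_2": (2.0, 1.0),
    "state_3": (1.0, 2.5),   # highest yield (smallest q), costliest proteome
}

def growth_rate(q, eps, w_L):
    """Solve Phi = (w_L*q + eps)*mu = 1 for mu."""
    return 1.0 / (w_L * q + eps)

def optimal_state(w_L):
    """Name of the state with the largest growth rate at cost w_L."""
    return max(STATES, key=lambda name: growth_rate(*STATES[name], w_L))

def optimal_sequence(w_values):
    """Successive optimal states met while scanning increasing w_L."""
    sequence = []
    for w_L in w_values:
        name = optimal_state(w_L)
        if not sequence or sequence[-1] != name:
            sequence.append(name)
    return sequence

sequence = optimal_sequence([0.05 * k for k in range(61)])  # w_L from 0 to 3
qs = [STATES[name][0] for name in sequence]        # specific fluxes along the path
epsilons = [STATES[name][1] for name in sequence]  # protein costs along the path
```

With these numbers, `qs` decreases and `epsilons` increases along the sequence of optimal states, as expected from the two transition conditions.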

NOTE 2. COMPUTATION OF THE PARETO FRONT
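Let us consider a value $\bar{w}_C$ of $w_C$ at which two states $\alpha$ and $\beta$, with costs $C_\alpha = q_\alpha w_C + \varepsilon_\alpha$ and $C_\beta = q_\beta w_C + \varepsilon_\beta$, are both optimal, i.e. $C_\alpha = C_\beta$ when $w_C = \bar{w}_C = (\varepsilon_\alpha - \varepsilon_\beta)/(q_\beta - q_\alpha)$. We consider for definiteness $q_\alpha > q_\beta$, so that $\alpha$ is the optimal state for $w_C < \bar{w}_C$ and $\beta$ is the optimal state for $w_C > \bar{w}_C$. We are interested in constraining the parameters of a sub-optimal state $\gamma$ with cost $C_\gamma$, such that $C_\gamma > C_\alpha$ for $w_C < \bar{w}_C$ and $C_\gamma > C_\beta$ for $w_C > \bar{w}_C$. In what follows, we assume that $q_\beta < q_\gamma < q_\alpha$.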
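Suppose that $w_C > \bar{w}_C$. The constraint $C_\gamma > C_\beta$ reads $\varepsilon_\gamma > \varepsilon_\beta - (q_\gamma - q_\beta) w_C$ and, since it is most stringent as $w_C \to \bar{w}_C$, it can be rewritten as
$$\varepsilon_\gamma > \varepsilon_\beta - (q_\gamma - q_\beta)\, \frac{\varepsilon_\beta - \varepsilon_\alpha}{q_\alpha - q_\beta}, \qquad (17)$$
where we used $q_\gamma > q_\beta$. Analogously, for $w_C < \bar{w}_C$ the constraint $C_\gamma > C_\alpha$ together with $q_\gamma < q_\alpha$ gives
$$\varepsilon_\gamma > \varepsilon_\alpha + (q_\alpha - q_\gamma)\, \frac{\varepsilon_\beta - \varepsilon_\alpha}{q_\alpha - q_\beta}. \qquad (18)$$
Note that conditions (17) and (18) identify the same half-space in the $(q_\gamma, \varepsilon_\gamma)$ plane, defined by the line passing through the points $(q_\alpha, \varepsilon_\alpha)$ and $(q_\beta, \varepsilon_\beta)$. Therefore, given a set of optimal states $\alpha, \beta, \ldots$, the Pareto frontier is obtained by connecting neighboring points with straight lines, as illustrated with a concrete example in Fig. N1.

FIG. N1. (a): Each grey cross represents a suboptimal pathway, while the blue dots represent three Pareto-optimal metabolic states, numbered from one (minimum proteome cost $\varepsilon_1$) to three (minimum specific intake $q_3$). The Pareto frontier is shown in red and delimits the feasible region (in light red). (b): Total cost $C_\alpha = q_\alpha w_C + \varepsilon_\alpha$ for each metabolic state $\alpha$ shown in panel (a). Each total cost $C_\alpha$ is a linear function of $w_C$. The cost is minimized by the envelope (shown as a red line) of the lines corresponding to the three Pareto-optimal states (dashed lines). As $w_C$ increases, the optimal metabolic mode switches from 1 to 2 at $w_C^1 = (\varepsilon_1 - \varepsilon_2)/(q_2 - q_1)$ and from 2 to 3 at $w_C^2 = (\varepsilon_2 - \varepsilon_3)/(q_3 - q_2)$. The grey lines denote the costs $C_\alpha$ of suboptimal pathways (grey crosses in panel (a)).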
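The construction of the frontier can be sketched in a few lines of code. The snippet below is only an illustration, not the procedure used to produce Fig. N1: it walks along the lower envelope of the cost lines $C = q\, w_C + \varepsilon$ for $w_C \ge 0$, starting from the state of minimum $\varepsilon$ and moving at each step to the state whose cost line crosses the current one first. The function name `pareto_frontier` and the five $(q, \varepsilon)$ points are hypothetical.

```python
def pareto_frontier(states):
    """Walk along the lower envelope of C = q*w_C + eps for w_C >= 0.

    states: list of (q, eps) pairs.
    Returns (frontier, switches): the Pareto-optimal states ordered by
    decreasing q, and the w_C values at which the optimum switches.
    """
    current = min(states, key=lambda s: (s[1], s[0]))  # optimal at w_C = 0
    frontier, switches = [current], []
    while True:
        q_i, eps_i = current
        # Only states with a smaller q can become optimal at larger w_C.
        candidates = [s for s in states if s[0] < q_i]
        if not candidates:
            return frontier, switches

        def crossing(s):
            # w_C at which the cost line of s crosses the current one.
            return (s[1] - eps_i) / (q_i - s[0])

        # First crossing wins; ties are broken in favour of the smaller q.
        current = min(candidates, key=lambda s: (crossing(s), s[0]))
        switches.append(crossing(current))
        frontier.append(current)

# Invented example: three optimal states and two suboptimal ones.
points = [(3.0, 0.5), (2.0, 1.0), (1.0, 2.5), (2.5, 1.2), (1.5, 2.4)]
frontier, switches = pareto_frontier(points)
```

For these points the switch values coincide with $(\varepsilon_k - \varepsilon_{k+1})/(q_{k+1} - q_k)$ evaluated on consecutive frontier states, and the two remaining points lie above the straight lines joining neighboring frontier states.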

A. Definition and basic decomposition
In this section we address in greater detail the definitions of metabolic states and the flux decomposition given in the Main Text, Eq. (6). In particular, we study how the solutions of FBA-like problems depend on the parameters of the problem itself. To do so, it is useful to start from a "traditional" FBA setting in which the only constraints are (i) the stoichiometric constraints, (ii) the irreversibility constraints and (iii) a constraint on the carbon intake flux, $J_C$ (equivalently, it can also be an upper bound). For now, we set the ATP maintenance flux $\sigma_0$ to zero. Given these constraints, we focus on flux configurations maximizing some linear functional of the fluxes; to be definite, we simply pick the biomass synthesis flux $\mu$. Also, for simplicity, in this note we measure growth in terms of biomass accumulation instead of growth rate, i.e. $\mu$ has the same units as all other fluxes (mmol/g DW h), so that the stoichiometric coefficients describing the biomass composition are dimensionless.
Purely by dimensional analysis, we see that the optimal solution $v^*$ to the FBA problem has to be proportional to the only unit-bearing quantity, $J_C$:
$$v^* = \xi J_C, \qquad (19)$$
where $\xi$ is a dimensionless vector that implicitly depends on the network topology and on the active irreversibility bounds of the problem at hand, and it can be computed by dividing the optimal flux solution $v^*$ by the carbon intake flux $J_C$. Eq. (19) simply states that the optimal fluxes are all proportional to each other upon varying $J_C$, as can be easily verified numerically. By specializing Eq. (19) to the biomass synthesis flux we obtain the relationship between carbon intake and growth rate, $J_C = \mu/\xi_\mu$; using this expression we can formulate an expression analogous to Eq. (19) giving the fluxes as a function of the growth rate, $v^*(\mu) = \xi \mu$, after redefining $\xi/\xi_\mu \to \xi$.
Let us stress that these are functional relations: once both the vector $\xi$ and a single flux, e.g. the growth rate $\mu$, are known, the optimal solution is obtained as a function of $\mu$ as $v^*(\mu) = \xi \cdot \mu$. The uniqueness of the $\xi$ vector is linked to that of the FBA problem; if multiple solutions exist, then the vector $\xi$ has the same degeneracy.
In the case of CAFBA there is no upper bound or constraint on the carbon intake flux. Instead, the protein constraint (Main Text Eq. (3)) is responsible for bounding the magnitude of the optimal fluxes. By dividing both sides of Eq. (3) by $w_C$ and splitting the fluxes into forward and backward fluxes [2], we can rewrite the proteome constraint as $s \cdot v = z_C$, where the components of $s$ are dimensionless ($s_i = w_i/w_C$) and $z_C = \phi_{max}/w_C$ has units of flux. Again, because of the linearity of the problem, all optimal solutions can be written as a linear function of $z_C$ as $v^* = \xi z_C$. This time, however, $\xi$ does in principle depend on the vector $s$ and, thus, on $w_C$. Because $\xi$ is dimensionless, we must have $d\xi/dw_C = 0$, which implies that $\xi$ is, at most, a piecewise constant function of $w_C$ (or $z_C$, or $\mu$). Within each interval in which $\xi$ is constant, the optimal solution is still described by a linear function $v^* = \xi \mu$. This is in agreement with the analysis presented in the Main Text, with the optimal solution "jumping" between different metabolic states $\xi$. Note that CAFBA solutions are mostly unique [2], and hence so is $\xi$.
B. More than one constraint: the case of ATP maintenance
The main assumption behind Eq. (4) in the Main Text is that, for a given metabolic state $\xi$, $v = \xi \mu$ represents a valid set of fluxes, satisfying all stoichiometric and irreversibility constraints (not the proteome constraint, which is instead used to set the value of $\mu$ through Eq. (4)). If an ATP maintenance term is present, then this parametrization fails. Using again "standard FBA" to simplify the discussion, but this time with a finite ATP maintenance flux, we see that there are two unit-bearing constants in the optimization problem, the carbon flux $J_C$ and the ATP maintenance flux $\sigma_0$. Because of the linearity of the problem, the optimal solutions have to be first-order functions of these two quantities, $v^* = \xi^{(1)} J_C + \xi^{(2)} \sigma_0$. Note that a zero-order term is excluded by dimensional analysis. As before, the vectors $\xi^{(1)}$ and $\xi^{(2)}$ are uniquely determined as long as the solution to the linear programming problem is also unique.
Even without the proteome constraint, the two vectors $\xi^{(1)}$ and $\xi^{(2)}$ can be piecewise constant, although the fluxes $v^*$ themselves are found to be continuous in FBA solutions. These discontinuities happen when some flux "hits" an irreversibility constraint, and a new equality constraint is effectively introduced. This phenomenon has two main consequences. First, it is clear that these jumps can happen only when either $\xi^{(1)}$ or $\xi^{(2)}$ does not satisfy the irreversibility constraints, even though they still satisfy the mass-balance constraints. This means that they cannot be considered, in isolation, valid flux vectors. Second, consider $v(J_C, \sigma_0) = \xi^{(1)} J_C + \xi^{(2)} \sigma_0$ with fixed vectors $\xi^{(1)}$ and $\xi^{(2)}$. This function is not guaranteed to satisfy the irreversibility constraints for arbitrary values of $J_C$ and $\sigma_0$. On the other hand, it is possible to compute this function for any optimal solution $v^*$, and the resulting function can be used to provide flux vectors in at least a neighborhood of the optimal solution, i.e. for $J_C$ and $\sigma_0$ not far from the values used to compute $v^*$.
With the proteome constraint instead of the bound on carbon uptake, the same results apply, except one has z C instead of J C , and both "continuous" (due to the irreversibility constraints) and "discontinuous" (due to the proteome constraint) jumps are present in the two metabolic vectors.
By trading $J_C$ for the growth rate $\mu$, and $\sigma_0$ for the total energy flux $J_E = \sigma_0 + \sigma\mu$, we can express the optimal solution as a linear combination of $\mu$ and $J_E$, i.e.
$$v^* = \beta \mu + \eta J_E. \qquad (20)$$

C. Computation of energy and biomass vectors
The vectors $\beta$ and $\eta$ used to compute the ATP and biomass yields in Fig. 3 were computed as follows. We first assume that the optimal solution $v$ of a CAFBA problem (with either uniform or randomized E-sector weights) can be locally parametrized as a function of the growth rate $\mu$ and of the total energy flux $J_E$ as described in Eq. (20). In particular, this expression is valid for the optimal solution of a CAFBA problem, which we indicate with a star:
$$v^* = \beta \mu^* + \eta J_E^*. \qquad (21)$$
This expression provides a constraint between the two vectors $\beta$ and $\eta$. The other constraint is obtained by computing a perturbed solution, obtained by solving CAFBA with all the same parameters as before except for the ATP maintenance flux, which is slightly increased (e.g. by $10^{-2}$ mmol ATP/g DW h). As long as the perturbation is small enough, this solution is related to the (perturbed, indicated by the tilde) growth rate and energy flux through the same metabolic modes $\beta$ and $\eta$:
$$\tilde{v} = \beta \tilde{\mu} + \eta \tilde{J}_E. \qquad (22)$$
Equations (21)-(22) can then be easily inverted to find the vectors $\beta$ and $\eta$ by solving a two-dimensional linear system for each reaction:
$$\beta_i = \frac{v_i^* \tilde{J}_E - \tilde{v}_i J_E^*}{\mu^* \tilde{J}_E - \tilde{\mu} J_E^*}, \qquad \eta_i = \frac{\mu^* \tilde{v}_i - \tilde{\mu} v_i^*}{\mu^* \tilde{J}_E - \tilde{\mu} J_E^*}.$$
Of course, the flux components $\beta_i$ and $\eta_i$ cannot be uniquely determined if $v_i^* = \tilde{v}_i = 0$, and we thus set both of them to zero. Repeated optimization of CAFBA with randomized E-sector protein costs provides a sampling of different metabolic states, i.e. pairs $(\beta, \eta)$. We checked numerically that both $\beta$ and $\eta$ are independent of the perturbation applied. For instance, slightly perturbing $w_C$ instead of $\sigma_0$ yields the same vectors.
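The reaction-by-reaction inversion can be sketched as follows. This is a minimal illustration that applies Cramer's rule to the two linear relations $v = \beta\mu + \eta J_E$ (optimal and perturbed solution); it is not the code used for Fig. 3. The function name `decompose` and all numerical values are hypothetical, and the flux vectors are assumed to be plain Python lists obtained from two CAFBA runs.

```python
def decompose(v_star, mu_star, je_star, v_tilde, mu_tilde, je_tilde):
    """Recover (beta, eta) from an optimal and a perturbed flux vector.

    Solves, for every reaction i, the 2x2 system
        v_star[i]  = beta_i * mu_star  + eta_i * je_star
        v_tilde[i] = beta_i * mu_tilde + eta_i * je_tilde
    """
    det = mu_star * je_tilde - mu_tilde * je_star
    if det == 0:
        raise ValueError("the two solutions do not provide independent constraints")
    beta, eta = [], []
    for v, vt in zip(v_star, v_tilde):
        if v == 0 and vt == 0:
            # Inactive reaction: components are undetermined, set to zero.
            beta.append(0.0)
            eta.append(0.0)
            continue
        beta.append((v * je_tilde - vt * je_star) / det)
        eta.append((mu_star * vt - mu_tilde * v) / det)
    return beta, eta

# Invented example: build two flux vectors from known modes, then recover them.
true_beta, true_eta = [1.0, -2.0, 0.0], [0.5, 0.25, 0.0]
mu1, je1 = 0.80, 40.0     # hypothetical optimal growth rate and energy flux
mu2, je2 = 0.75, 40.5     # hypothetical perturbed values
v1 = [b * mu1 + e * je1 for b, e in zip(true_beta, true_eta)]
v2 = [b * mu2 + e * je2 for b, e in zip(true_beta, true_eta)]
beta, eta = decompose(v1, mu1, je1, v2, mu2, je2)
```

The recovered `beta` and `eta` match the modes used to generate the two flux vectors up to rounding, which is the same consistency check as comparing the vectors obtained from different perturbations.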

D. Decomposition of carbon intake into energy- and biomass-associated components
The decomposition $J_C = J_{C \to E} + J_{C \to B}$ (Main Text Eq. (6)) is obtained by specializing Eq. (20) to the case of carbon intake, $v_i = J_C$. The right side of the equation becomes:
$$\beta_C \mu + \eta_C J_E = \beta_C \mu + \sigma \eta_C \mu,$$
where we neglected for simplicity the growth-independent ATP maintenance $\sigma_0$ (the same approximation used throughout the Main Text). Therefore we obtain Eq. (7) from the Main Text:
$$J_C = J_{C \to B} + J_{C \to E} = q_B \mu + \sigma q_E \mu,$$
where we defined $q_B \equiv \beta_C$ and $q_E \equiv \eta_C$. We see that $q_E$ is the specific carbon uptake per ATP produced, i.e. $1/q_E$ is the ATP yield per lactose molecule. Instead, $1/q_B$ is the biomass yield per lactose molecule, in units of g DW/mmol lactose; it is easily converted to units of g DW/g lactose by dividing it by the molar mass of lactose (expressed in g/mmol).
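As a small worked example of the decomposition and of the unit conversion, the sketch below splits a carbon intake into its two components and converts $q_B$ into a mass yield. The function names and the values of `q_B`, `q_E`, `sigma` and `mu` are placeholders chosen for illustration, not results from the Main Text; the molar mass of lactose (C12H22O11, about 342.3 g/mol) is the only physical constant used.

```python
LACTOSE_MOLAR_MASS = 0.3423  # g/mmol, i.e. about 342.3 g/mol for C12H22O11

def carbon_partition(q_B, q_E, sigma, mu):
    """Split J_C into biomass- and energy-associated parts (sigma_0 neglected).

    q_B: mmol lactose per g DW; q_E: mmol lactose per mmol ATP;
    sigma: mmol ATP per g DW; mu: growth rate (1/h).
    """
    j_cb = q_B * mu          # J_{C->B}
    j_ce = sigma * q_E * mu  # J_{C->E}
    return j_cb, j_ce, j_cb + j_ce

def biomass_yield_per_gram(q_B):
    """Convert q_B (mmol lactose / g DW) into a yield in g DW / g lactose."""
    return 1.0 / (q_B * LACTOSE_MOLAR_MASS)

# Hypothetical values, for illustration only.
j_cb, j_ce, j_c = carbon_partition(q_B=4.0, q_E=0.05, sigma=40.0, mu=0.5)
mass_yield = biomass_yield_per_gram(4.0)
```

Here `mass_yield` equals $(1/q_B)$ divided by the molar mass, so a larger $q_B$ always corresponds to a smaller biomass yield.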

E. Growth rate associated to each metabolic state
In this section we will show how to compute a flux vector v as a function of the growth rate and the ATP maintenance flux, starting from the metabolic modes (β, η) calculated above. This can be done in full generality by including a nonzero ATP maintenance flux σ 0 .
The starting point is the parametrization of fluxes as a function of the growth rate $\mu$ and of the energy flux $J_E$ given by Eq. (20). Since the energy flux and the growth rate are related by $J_E = \sigma_0 + \sigma\mu$, it is more useful in the following to express the optimal fluxes as a function of $\mu$ and $\sigma_0$:
$$v^* = \beta \mu + \eta (\sigma_0 + \sigma \mu) = \xi \mu + \eta \sigma_0, \qquad (28)$$
where we defined $\xi = \beta + \sigma\eta$. Similarly to the case of vanishing $\sigma_0$, we define the specific intake fluxes $q = \xi_C$ and $q_0 = \eta_C$. For the E-sector protein costs, we have the additional problem of having to separate the absolute value of each flux, $|v_i|$, into the sum of two terms which might not have the same sign. While it is impossible to obtain a simple expression for arbitrary vectors $\xi$ and $\eta$, it is feasible if we restrict ourselves to small variations in $\mu$ and $\sigma_0$ from the values obtained during the sampling procedure. We define $s_i$ as the sign of the optimal flux $v_i^*$ used above to compute the two vectors $\xi$ and $\eta$. Then, we write:
$$|v_i| = s_i v_i = s_i \xi_i \mu + s_i \eta_i \sigma_0.$$
This expression is valid in a neighborhood of the optimal solution $v^*$, as long as all fluxes keep the same sign upon varying $\mu$ and $\sigma_0$. Using the proteome constraint, one obtains the following generalization of Eq. (4) from the Main Text:
$$\mu = \frac{\phi_{max} - \sigma_0 \left( w_C q_0 + \varepsilon_0 \right)}{w_C q + \varepsilon}, \qquad (29)$$
with $\varepsilon = \sum_{i \in E} w_i s_i \xi_i$ and $\varepsilon_0 = \sum_{i \in E} w_i s_i \eta_i$. If we let $\sigma_0 \to 0$, the flux decomposition Eq. (28) reduces to $v = \xi\mu$, and Equation (29) reduces to Eq. (4) from the Main Text.

The optimality of the pair $(\xi, \eta)$, or equivalently $(\beta, \eta)$, depends now on four variables, two specific fluxes ($q$ and $q_0$, or $q_B$ and $q_E$) and two protein costs ($\varepsilon$ and $\varepsilon_0$, or $\varepsilon_B$ and $\varepsilon_E$), and therefore the Pareto optimality of the solutions with finite $\sigma_0$ is harder to evaluate. As $\sigma_0 \to 0$, the Pareto optimality between $q$ and $\varepsilon$ is recovered, with $q$ monotonically decreasing as $w_C$ is increased. However, this does not constrain the specific fluxes for energy $q_E$ and biomass $q_B$ to be monotonically related to $w_C$; in fact the optimal $q_B$ observed in Main Text Fig. 3 is a nonmonotonic function of the growth rate.
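The computation of the growth rate associated with a pair of metabolic modes can be sketched as below. The snippet assumes that the proteome constraint has the form $w_C J_C + \sum_{i \in E} w_i |v_i| = \phi_{max}$ and that fluxes are parametrized as $v_i = \xi_i \mu + \eta_i \sigma_0$ with fixed signs $s_i$; both the form of the constraint and every name and number in the code (`growth_rate`, `proteome_usage`, the three-reaction example) are assumptions made for illustration.

```python
def growth_rate(xi, eta, signs, weights, w_C, phi_max, sigma0=0.0, carbon="C"):
    """Growth rate solving the (assumed) proteome constraint
    w_C*J_C + sum_i w_i*|v_i| = phi_max, with v_i = xi_i*mu + eta_i*sigma0."""
    q, q0 = xi[carbon], eta[carbon]
    eps = sum(w * signs[i] * xi[i] for i, w in weights.items())
    eps0 = sum(w * signs[i] * eta[i] for i, w in weights.items())
    return (phi_max - sigma0 * (w_C * q0 + eps0)) / (w_C * q + eps)

def proteome_usage(mu, xi, eta, weights, w_C, sigma0, carbon="C"):
    """Left-hand side of the assumed proteome constraint for a given mu."""
    flux = {i: xi[i] * mu + eta[i] * sigma0 for i in xi}
    return w_C * flux[carbon] + sum(w * abs(flux[i]) for i, w in weights.items())

# Invented three-reaction example ("C" is the carbon intake, r1 and r2 are E-sector).
xi = {"C": 10.0, "r1": 20.0, "r2": -5.0}
eta = {"C": 0.05, "r1": 0.1, "r2": -0.02}
signs = {"r1": 1, "r2": -1}            # signs of the reference optimal fluxes
weights = {"r1": 8.3e-4, "r2": 8.3e-4}
w_C, phi_max, sigma0 = 0.01, 0.5, 5.0

mu_no_maintenance = growth_rate(xi, eta, signs, weights, w_C, phi_max)
mu_with_maintenance = growth_rate(xi, eta, signs, weights, w_C, phi_max, sigma0)
```

Evaluating `proteome_usage` at `mu_with_maintenance` returns `phi_max`, which confirms that the formula saturates the constraint as long as no flux changes sign.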
Here growth is proportional to the consumption of the metabolite $e$ and, for simplicity, we use the same units for both. Under steady-state mass-balance, one has $u = g$, $au = r + v$ and $\mu = b_1 g + b_2 r$. Flux states compatible with these constraints can be expressed as functions of $\mu$ and $u$ alone:
$$g = u, \qquad r = \frac{\mu - b_1 u}{b_2}, \qquad v = \frac{(b_1 + a b_2)\, u - \mu}{b_2}.$$
From $r \ge 0$ and $v \ge 0$ one gets instead the following bounds for the growth rate,
$$b_1 u \le \mu \le (b_1 + a b_2)\, u, \qquad (31)$$
or, introducing the growth yield $Y \equiv \mu/u$ (reciprocal of the specific influx, $Y \equiv 1/q$):
$$b_1 \le Y \le b_1 + a b_2. \qquad (32)$$
Equation (32) implies that the different steady states are characterized by yields between the yield of fermentation $Y_{fer} = b_1$ and that of respiration $Y_{res} = b_1 + ab_2 > Y_{fer}$. As Eq. (31) does not by itself limit the growth rate, extra constraints have to be enforced to obtain well-defined solutions to the problem of maximizing $\mu$. The nature of optimal states therefore depends on which additional constraints are imposed. We consider three distinct scenarios.
Standard FBA scenario: In this case the additional constraint is an upper bound on the carbon intake flux $u$. By Eq. (31), the growth rate is then maximized by the state with the largest yield, $Y = Y_{res}$, i.e. by respiratory metabolism. This is a well known property of the solutions of standard FBA.
FBA with Molecular Crowding (FBAwMC) scenario [3,4]: In this case a "crowding constraint" is imposed, consisting of an overall bound on intracellular fluxes of the form $c_1 g + c_2 r = 1$. Now, for the growth rate one finds
$$\mu = \frac{b_2}{c_2} + c_1 \left( \frac{b_1}{c_1} - \frac{b_2}{c_2} \right) g.$$
If $b_1/c_1 < b_2/c_2$, the optimal solution is obtained by minimizing $g$, i.e. by setting $g = g_{min} \equiv (b_1 + ab_2)^{-1} \mu$, and presents respiratory metabolism. If instead $b_1/c_1 > b_2/c_2$, then $g$ should be maximized ($g = \mu/b_1$) and the optimal solution presents fermentative metabolism. The inclusion of explicit coefficients for the carbon uptake $u$ and the fermentation flux $v$ in the crowding constraint only leads to a re-definition of the coefficients $c_1$ and $c_2$.
Constrained Allocation FBA (CAFBA) scenario [2]: In this case, the additional constraint models proteome allocation and reads $w_C u + w_1 g + w_2 r + w_R \mu = \phi_{max}$. As in the FBAwMC case, the growth rate can be expressed as a function of $u$ alone, obtaining
$$\mu = \frac{\phi_{max} + (Q - w_C)\, u}{w_R + w_2/b_2}, \qquad Q \equiv b_1 \left( \frac{w_2}{b_2} - \frac{w_1}{b_1} \right).$$
$Q$ is proportional to the additional protein cost of respiration, $w_2/b_2$, with respect to the cost of fermentation, $w_1/b_1$, where the protein costs are weighted by the inverse yields of the pathways. Since $w_C$ increases with decreasing nutrient levels, the sign of $Q - w_C$ might change when one shifts from good carbon sources to poor ones. Specifically, for the realistic case where $Q > 0$, the optimal solution shifts from fermentation to respiration as $w_C$ increases, i.e. as carbon is limited.

Given a metabolic state $\xi$, one can compute the specific carbon uptake rate $q(\xi) = \xi_C$ and the specific protein cost $\varepsilon = \sum_{i \in E} w_i |\xi_i|$. The growth rate $\mu$ corresponding to the metabolic state $\xi$ is then computed from $q$ and $\varepsilon$ through Eq. (4) from the Main Text, and with it the corresponding flux vector $v = \mu \cdot \xi$.
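The toy model and the crowding and proteome-allocation scenarios can be explored with a short script. The sketch below only uses the mass-balance relations $u = g$, $au = r + v$, $\mu = b_1 g + b_2 r$ and the two additional constraints quoted above; since each problem is linear, it compares the two extreme states (pure fermentation, $r = 0$, and pure respiration, $v = 0$) directly. Function names and parameter values are invented for illustration.

```python
def toy_fluxes(mu, u, a, b1, b2):
    """Fluxes (g, r, v) fixed by mass balance: u = g, a*u = r + v, mu = b1*g + b2*r."""
    g = u
    r = (mu - b1 * g) / b2
    v = a * u - r
    return g, r, v

def yield_bounds(a, b1, b2):
    """Yields of pure fermentation (r = 0) and pure respiration (v = 0)."""
    return b1, b1 + a * b2

def fbawmc_optimum(a, b1, b2, c1, c2):
    """Best extreme state under the crowding constraint c1*g + c2*r = 1."""
    mu_fer = b1 / c1                        # r = 0, so c1*g = 1
    mu_res = (b1 + a * b2) / (c1 + a * c2)  # r = a*g, so (c1 + a*c2)*g = 1
    return ("fermentation", mu_fer) if mu_fer > mu_res else ("respiration", mu_res)

def cafba_optimum(a, b1, b2, w1, w2, wR, wC, phi_max):
    """Best extreme state under wC*u + w1*g + w2*r + wR*mu = phi_max."""
    mu_fer = phi_max / ((wC + w1) / b1 + wR)
    mu_res = phi_max / ((wC + w1 + a * w2) / (b1 + a * b2) + wR)
    return ("fermentation", mu_fer) if mu_fer > mu_res else ("respiration", mu_res)

# Hypothetical parameters, for illustration only.
A, B1, B2 = 2.0, 1.0, 3.0
W1, W2, WR, PHI_MAX = 0.1, 1.2, 0.5, 1.0
Q = B1 * (W2 / B2 - W1 / B1)   # threshold on w_C in the CAFBA scenario

rich_medium = cafba_optimum(A, B1, B2, W1, W2, WR, wC=0.1, phi_max=PHI_MAX)
poor_medium = cafba_optimum(A, B1, B2, W1, W2, WR, wC=0.6, phi_max=PHI_MAX)
```

With these numbers $Q \approx 0.3$, so the proteome-constrained optimum is fermentative for `wC=0.1` and respiratory for `wC=0.6`, while the crowding-constrained optimum is selected by comparing $b_1/c_1$ with $b_2/c_2$ and does not depend on the carbon cost.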
FIG. S2. Energy production partitioning. The flux decomposition into energy- and biomass-associated vectors (Fig. 3a) is only useful if $J_E$ either dominates the total energy budget of the cell (e.g. the biosynthetic cost of the biomass components is small with respect to $J_E$) or if it represents a fixed fraction of the total energy budget. We define the vector $S_{ATP}$, whose components $S_{ATP,i}$ represent the stoichiometric coefficient of ATP in reaction $i$. (We exclude the biomass reaction and the ATP maintenance reaction.) For a given flux $v_i$, $S_{ATP,i} v_i$ indicates the ATP production (if positive) or consumption (if negative) flux. When summing over all reactions, we can separate the positive and negative contributions as $(S_{ATP} \cdot v)_- = \sum_i S_{ATP,i} v_i \Theta(-S_{ATP,i} v_i)$ and $(S_{ATP} \cdot v)_+ = \sum_i S_{ATP,i} v_i \Theta(S_{ATP,i} v_i)$, with $\Theta(x)$ denoting the Heaviside theta function. Hence, $(S_{ATP} \cdot v)_+$ measures the total ATP synthesized by the flux vector $v$. Since each flux vector can be decomposed as $v = \eta \cdot J_E + \beta \cdot \mu$, the total ATP production $J_E$ can be expressed as $J_E = S_{ATP} \cdot v = (S_{ATP} \cdot \eta) J_E + (S_{ATP} \cdot \beta) \mu$. By further decomposing the scalar products into positive and negative parts we obtain four terms, namely $J_E^{\eta\pm} = (S_{ATP} \cdot \eta)_\pm \cdot J_E$ and $J_E^{\beta\pm} = (S_{ATP} \cdot \beta)_\pm \cdot \mu$, with the two constraints $J_E^{\eta+} + J_E^{\eta-} = J_E$ and $J_E^{\beta+} + J_E^{\beta-} = 0$. For all metabolic states sampled in Fig. 3 (Main Text), about 50% of the ATP generated in the cell is used to fuel the biosynthetic reactions included in the vector $\beta$, independently of the growth rate. In turn, considering the constraints mentioned above, one concludes that, in terms of absolute fluxes, about 1/3 (resp. 2/3) of the carbon taken up goes through pathways that contribute to the overall production of energy (resp. biomass).
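The separation into ATP-producing and ATP-consuming contributions amounts to summing the products $S_{ATP,i} v_i$ by sign. A minimal sketch, with a hypothetical function name and an invented four-reaction example, reads:

```python
def split_atp_flux(s_atp, v):
    """Return ((S_ATP . v)_+, (S_ATP . v)_-): total ATP produced and consumed.

    s_atp: ATP stoichiometric coefficient of each reaction
           (biomass and maintenance reactions excluded).
    v:     flux of each reaction, in the same order.
    """
    products = [s * x for s, x in zip(s_atp, v)]
    produced = sum(p for p in products if p > 0)   # Theta(+S*v) terms
    consumed = sum(p for p in products if p < 0)   # Theta(-S*v) terms, negative
    return produced, consumed

# Invented example: coefficients and fluxes of four reactions.
produced, consumed = split_atp_flux([1, -1, 2, 0], [3.0, 2.0, -1.0, 5.0])
net = produced + consumed
```

Applying the same function to the vectors $\eta$ and $\beta$ in place of $v$ gives the four terms $J_E^{\eta\pm}$ and $J_E^{\beta\pm}$ after multiplication by $J_E$ and $\mu$, respectively.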

SUPPLEMENTARY TABLES
Symbol Description
µ Growth rate (units: per hour).
J C Carbon (lactose) intake flux, in units of mmol lactose per dry weight per hour.
φ C Mass fraction of catabolic proteins, responsible for uptaking the carbon source with a flux J C .
w C Protein cost associated to carbon intake. Its value varies in growth media with different carbon source availability: a larger value indicates lower carbon concentrations and/or quality.
φ N C Fraction of non-limited catabolic proteome (φ N C = 1 − φ C ). In Eq. (2) it is assumed to depend linearly on growth rate.
v i Flux of each metabolic reaction (labeled by the index i) included in the genome-scale model of metabolism studied in this work (iJR904). This usually excludes carbon uptake (whose flux is J C ).
v Vector whose entries are all metabolic fluxes v i .
w i Protein cost for individual enzymatic reactions (the proteome E-sector, as described in the text). Under the "uniform" or "homogeneous weights" approximation, they are all taken to be the same, w i = w E = 8.3 · 10^−4 g DW h/mmol.
q Specific carbon flux, q = J C /µ. It specifies the amount of carbon (in our case lactose) taken up per unit of dry mass, and has units of mmol lactose per gram of dry weight. In other words, the production of 1 gram of dry cells requires q mmol of lactose. It is also inversely proportional to the growth yield, usually defined as grams of dry weight per gram of substrate.
ε Protein cost of the enzymatic reactions, ε = Σ i∈E w i |v i |/µ.

C Total protein cost, including the one associated to carbon uptake (qw C ) and the metabolic one (ε).
σ Growth-associated ATP hydrolysis rate (units: mmol ATP hydrolyzed per gram of dry weight). It relates the energy production flux J E to growth rate, J E = σµ.
J C→E Carbon flux associated to energy production.
J C→B Carbon flux associated to the production of biomass precursors. This includes the production of energy necessary to fuel the biosynthetic pathways.
q E Energy-associated specific carbon intake, q E = J C→E /J E = J C→E /(σµ) (mmol of lactose per mmol of ATP synthesized).
q B Biomass-associated specific carbon intake, q B = J C→B /µ (mmol lactose per gram of biomass precursors).
η, β Energy- and biomass-associated vectors (η and β, respectively) describing the metabolic state of the cell when the fluxes to energy and biomass precursors are considered separately.
TABLE S1. List of symbols and parameters used in the Main Text.