Abstract
The biological fitness of microbes is largely determined by the rate with which they replicate their biomass composition. Mathematical models that maximize this balanced growth rate while accounting for mass conservation, reaction kinetics, and limits on dry mass per volume are inevitably nonlinear. Here, we develop a general theory for such models, termed Growth Balance Analysis (GBA), which provides explicit expressions for protein concentrations, fluxes, and growth rates. These variables are functions of the concentrations of cellular components, for which we calculate marginal fitness costs and benefits that are related to metabolic control coefficients. At maximal growth rate, the net benefits of all concentrations are equal. Based solely on physicochemical constraints, GBA unveils fundamental quantitative principles of cellular resource allocation and growth; it accurately predicts the relationship between growth rates and ribosome concentrations in E. coli and yeast and between growth rate and dry mass density in E. coli.
Introduction
The defining feature of life is selfreplication. For noninteracting unicellular organisms in constant environments, the rate of this selfreplication is equivalent to their evolutionary fitness^{1}: fastgrowing cells outcompete those growing more slowly. Accordingly, we expect that natural selection favoring fast growth in specific environments has played an important role in shaping the physiology of many microbial organisms^{2,3}.
Conceptually, we can envision a bacterial cell as a volume enclosed by a membrane, filled with a solution of metabolites and of the proteins and nucleic acids that catalyze their conversion into biomass. A state of the cell is characterized by the molecular concentrations, which in turn determine the fluxes of the biochemical reactions through kinetic rate laws. The boundary conditions limiting the concentrations and fluxes are provided by the environment and by physicochemical constraints. Cellular growth has to be balanced over the cell cycle, i.e., all cellular components must be produced in proportion to their abundances^{4}. Casting these constraints into a mathematical model and characterizing states of optimal growth may provide a detailed understanding of central aspects of bacterial physiology^{3,5,6,7,8,9,10}.
Molenaar et al.^{5} proposed a small, schematic model of balanced, selfreplicating growth with explicit nonlinear reaction kinetics and at most seven reactions, including the production of catalytic proteins. Numerical growth rate optimization predicted qualitatively the growthrate dependencies of cellular ribosome content, cell size, and the emergence of overflow metabolism. We term this general modeling scheme Growth Balance Analysis (GBA). No extensions of this approach to larger models have been proposed, likely because of its inherent nonlinearity and the resulting difficulty of numerical optimizations. Instead, even simpler, linear models of 1–3 reactions were solved analytically to gain further qualitative understanding of systemslevel effects^{3,6,7,8,9}, including optimal gene regulation strategies^{3,8}.
Models for the genomescale physiology of complete cells are typically formulated as approximations to GBA^{11}. Currently, the most popular such method is flux balance analysis (FBA)^{12,13}. FBA maximizes the production rate of a constant biomass concentration vector while accounting for mass conservation by balancing the fluxes producing and consuming internal metabolites (Fig. 1). All constraints in FBA are linear. The resulting computational efficiency comes at the price of ignoring reaction kinetics and the requirement of sufficient enzyme concentrations to catalyze the predicted metabolic fluxes. FBA can be viewed as a linearization of the GBA scheme^{11}. Figure 1 shows a schematic comparison of FBA and GBA. While FBA predicts a linear dependence of maximal growth rate on nutrient uptake fluxes, GBA leads to a nonlinear (Monodtype) dependence on nutrient concentrations.
Most alternative wholecell modeling schemes^{14,15,16} are generalizations of FBA and are also based on the optimization of a cellular objective, which is typically set to the cellular growth rate or a proxy thereof. Like GBA, resource balance analysis (RBA)^{14} and genomescale models of metabolism and gene expression (ME)^{15} combine a genomescale metabolic model (as utilized in FBA, Fig. 1) with a protein translation apparatus that converts precursors into protein. While RBA models are formulated at a level of detail typical for FBA models, ME models aim to account comprehensively for all growthrelated cellular processes, including, for example, chaperoneassisted protein folding. Contrary to GBA, both methods do not account for metabolite concentrations and assume a linear relationship between fluxes and protein abundances. ME models typically assume constant effective rate constants for reactions, which are set to in vitro^{17} or in vivo^{18} estimates of turnover numbers (k_{cat}). RBA instead uses phenomenological, growthrate dependent effective kinetic rate constants. These are modeled as linear functions of the growth rate, and parameters are obtained by fitting modelpredicted fluxes to proteomics data. Constraint allocation flux balance analysis (CAFBA)^{16} is conceptually similar to RBA and ME, but describes the protein costs of biochemical reactions through previously discovered phenomenological growth laws^{19,20}.
These previous modeling schemes can be considered as approximations to GBA^{11} that go beyond FBA by including the protein cost of biochemical fluxes, but that ignore the influence of metabolite concentrations on reaction kinetics and the costs incurred through their dilution by growth. Genomescale implementations of RBA, ME, and CAFBA for model organisms^{14,15,16,21} have been shown to predict some macroscopic phenotypic behavior^{14,22}. However, the predicted investment into individual proteins appears to be highly inaccurate, possibly because enzyme kinetics are only treated approximately and metabolite concentrations are not accounted for. Moreover, these methods cannot facilitate a full understanding of phenotypic behavior from basic biochemical and biophysical constraints, and thus do not provide mechanistic insights at the level that would be possible with fully parameterized, genomescale GBA models.
Due to the central role of kinetic rate laws in GBA, GBA is also closely related to kinetic modeling approaches of cellular metabolism and growth^{23,24}. Like GBA, kinetic models implement the mass balance of biochemical reactions while accounting for the dependence of enzyme kinetics on protein and metabolite concentrations. In contrast to GBA and the alternative modeling schemes discussed so far, kinetic modeling approaches do not assume optimality, but simply describe the (steady state) distribution of fluxes and metabolite concentrations resulting from known enzyme concentrations and the kinetic rate laws. However, in vitro kinetic parameters (as reflected in databases such as BRENDA^{17}) are very incomplete, and estimates were often made in different experimental settings and are thus not always consistent with each other^{23} and with in vivo data^{18}. For this reason, enzymatic rate laws in kinetic modeling algorithms are typically parameterized by a fitting procedure that minimizes the discrepancy between model predictions and experimental data (e.g., metabolic fluxes or metabolite concentrations measured across multiple conditions or mutants)^{23,24}. Different approaches to kinetic modeling differ from each other in their representation of enzymatic rate laws and in the algorithm used to fit the corresponding parameters. While such fitted parameterizations can lead to accurate predictions of overall cellular physiology, they may show little or no correspondence to experimentally determined kinetic parameters^{25}. Moreover, kinetic models typically need to account for substratelevel regulatory interactions to result in realistic predictions^{23}.
Below, we develop the mathematical foundations for GBA of arbitrarily complex cellular systems. We first describe the constraints that characterize states of balanced growth, and we define elementary growth states (EGSs) by referring to the elementary flux modes (EFMs) of metabolic pathway analysis^{26} and FBA. We then show that the reaction fluxes, individual protein concentrations, and growth rate of any EGS are uniquely determined by the set of active reactions and the total cellular protein and individual reactant concentrations. We show how this theoretical framework can be used to understand cellular resource allocation conceptually, and we demonstrate how to analyze specific subsystems for which systemslevel effects cancel mathematically.
Results
Modeling balanced exponential growth
Our model assumes that the cell increases exponentially in size, while the concentrations of all cellular components (including the number of membrane constituents per cell volume) remain constant^{5}. We do not explicitly model cell division; thus, our model can also be interpreted as describing the growth of a population of cells^{8}. In balanced growth, the net production rate of each molecular constituent must balance its dilution by growth, \(0={\left.\frac{dx}{dt}\right}_{{\rm{production}}}\mu x\), where x denotes the concentration of a given component and μ is the cellular growth rate^{5,8}. The mass conservation in chemical reaction networks is commonly described through a stoichiometric matrix N, where rows correspond to metabolites and each column describes the mass balance of one reaction^{26}. Here, we focus on matrices A of active reactions, i.e., A is a submatrix of N that contains all columns j for reactions with flux v_{j} ≠ 0 and all rows for reactants i involved in these reactions. A also includes a “ribosome” reaction to produce catalytic proteins, encompassing enzymes, transporters, and the ribosome itself. We express concentrations as mass concentrations (mass per volume); accordingly, the entries of A are not stoichiometric coefficients but are mass fractions. The mass conservation of each component can then be stated as
where v is the flux vector (in units of [mass][volume]^{−1}[time]^{−1}), a is the vector of reactant mass concentrations a_{α}, and P is the sum of the mass concentrations p_{j} of all proteins j ∈ {1, …, n},
The first row of A describes the net production of total protein P, which is then distributed among the individual proteins j. The remaining rows describe the net production of the reactants α.
Each reaction rate v_{j} is the product of the concentration of its catalyzing protein p_{j} and a kinetic function k_{j}(a) that depends on the reactant concentrations a_{α},
We assume that the functional form and kinetic parameters of k_{j}(a) are known. k_{j}(a) may depend on the mass concentrations of substrates, products, and other molecules a_{α} acting as inhibitors or activators, and accounts for the system’s thermodynamics. The activity of all reactions j represented in A (v_{j} ≠ 0) implies p_{j} > 0 and k_{j}(a) ≠ 0.
Below, we treat the concentrations of total protein P and individual reactants a_{α} as the state variables of the system, and we show that the fluxes v_{j}, individual protein concentrations p_{j}, and growth rate μ can be cast as response variables. For a given concentration vector [P, a]^{T}, we define a balanced growth state (BGS) as a cellular state (characterized by its flux vector v) that satisfies constraints (1), (2) and (3). The set of all such states forms the solution space of balanced growth. On the following pages, we first develop a framework for GBA by characterizing BGSs at a fixed concentration vector [P, a]^{T}. These characterizations are independent of any physicochemical limits on the concentrations of the cellular components (density constraints); such constraints will, however, become crucial once we examine optimal balanced growth across all feasible concentration vectors. In the main text, we provide an overview over the mathematical structure of GBA and its implications; the formal definitions and theorems are detailed in “Methods”, while Supplementary Table 2 lists the symbols used.
Cellular state defined by the concentration variables
Let v be a BGS at concentration vector \({{\bf{y}}}_{0}={[{P}_{0},{{\bf{a}}}_{0}]}^{T}\). If we treat y_{0} as a constant, then Eq. (1) is mathematically identical to the steadystate constraint fundamental to FBA and to metabolic pathway analysis in general^{26}. We call v an EGS if v also represents an EFM^{27} of the corresponding FBAtype problem defined by the massnormalized stoichiometric matrix A together with a “biomass reaction” described by y_{0} and the flux directions enforced by the signs of the kinetic functions k_{j}(a_{0}) (i.e., v is a feasible flux vector with minimal support under the FBAtype constraints; “Methods”, Definition 3). We can express any BGS as a weighted average of EGSs at the same concentration vector [P, a]^{T} (Theorem 3). Moreover, any optimal BGS under a single cellular density constraint (see below) is also an EGS (Theorem 9 based on refs. ^{28,29} for EFMs; see also ref. ^{30}).
Thus, if we characterize the mathematical properties of EGSs, then these properties apply not only to optimal BGSs—which are the main focus of this work—but also to the individual EGS in a decomposition of any BGS. If A is the active stoichiometric matrix of an EGS, it has full column rank (Theorem 4 based on ref. ^{31}; see also ref. ^{30}). The full column rank is the only property of EGSs that we will require below. Accordingly, without much loss of generality, we focus on active matrices A that have full column rank for the remainder of this article.
The matrix A may have more rows than columns, in which case some reactant concentrations in a are linearly dependent on other concentrations^{32}. The dependent concentrations c are not free variables, and hence they can be put aside and dealt with separately. For clarity of presentation, we here present only the case without dependent reactants; the generalization to BGSs with dependent reactants can be treated similarly and is detailed in “Methods”.
Without dependent reactants, A is a square matrix with a unique inverse A^{−1}, and x ≡ [P, a]^{T} is the corresponding vector of independent concentrations. Multiplying both sides of the mass balance constraint (1) by A^{−1}, we obtain (Theorem 5)
The righthand side of the mass balance constraint (1) quantifies how much of each component x_{i} needs to be produced to offset the dilution that would otherwise occur through the exponential volume increase at rate μ. \({A}_{ji}^{1}\) quantifies the proportion of flux v_{j} invested into offsetting the dilution of component i, and we thus name A^{−1} the investment (or dilution) matrix; see Supplementary Fig. 1 for examples. In contrast to the massnormalized stoichiometric matrix A, which describes local mass balances, A^{−1} describes the structural allocation of reaction fluxes into offsetting the dilution of all downstream cellular components, carrying global, systemslevel information.
From the kinetic equation (Eq. (3)), p_{j} = v_{j}/k_{j}(a), and inserting v_{j} from the investment equation (Eq. (4)) gives
where ∑_{i} sums over the total protein and individual reactant concentrations (Theorem 6). Substituting these expressions into the total protein sum (Eq. (2)) and solving for μ results in the growth equation (Theorem 7)
As detailed in “Methods” (Theorems 5–7), a corresponding result also holds for BGSs with dependent reactants. Thus, for any active matrix A with full column rank (in particular for all active matrices of EGSs) and for any corresponding concentration vector x, there are unique and explicit mathematical solutions for the fluxes v, individual protein concentrations p, and growth rate μ. If μ (Eq. (6)) and all individual protein concentrations p_{j} (Eq. (5)) are positive, the cellular state is a BGS; otherwise, no balanced growth is possible at these concentrations.
Marginal fitness contributions of cellular concentrations
We now use these relationships to calculate the costs and benefits of concentration changes, which are naturally expressed in terms of relative fitness effects. As above, the main text considers the simpler case without dependent reactants, while the more general case is treated in “Methods”. If fitness is determined predominantly by growth rate^{1} (Supplementary Note 1), we can we define the marginal net benefit η_{i} of concentration x_{i} as the relative change in growth rate^{33} due to a small change in x_{i} (“Methods”, Definition 4),
for example, η_{P} = η_{ATP} = 0.01 l mg^{−1} would indicate that an increase of either total protein or ATP concentration by 1 mg l^{−1}—if possible—would increase the growth rate by 1%.
To aid in the interpretation of η_{i} below, we define the marginal production cost incurred by the system via protein j as a consequence of increasing concentration x_{i} at fixed growth rate μ and kinetics k_{j},
where the second equality follows from Eq. (5). \({q}_{i}^{j}\) quantifies by how much the concentration p_{j} of the upstream protein j has to rise in order to offset the increased dilution of the downstream concentration x_{i}. \({q}_{i}^{j}\) is related to the protein control coefficient of metabolic control analysis (MCA); see Supplementary Note 3 for a more detailed summary of the relationship between GBA and MCA^{34,35,36}.
Taking the partial derivatives of the growth equation (Eq. (6)) with respect to P and the concentration a_{α} of reactant α, respectively, we find that the marginal net benefits according to Eq. (7) can be expressed as (Theorem 8)
and
with
where the last equation is derived using p_{j} = v_{j}/k_{j}. \({u}_{\alpha }^{j}\) can be interpreted as the marginal kinetic benefit^{37} of reactant α to reaction j and quantifies the proportion of protein p_{j} “saved” due to the change in kinetics associated with an increase in a_{α}. The kinetic benefit \({u}_{\alpha }^{j}\) is a strictly local effect, as it is zero if a_{α} does not influence the kinetic function k_{j}(a); we expect \({u}_{\alpha }^{j}\) to be positive if α is a substrate and negative if α is a product of reaction j. \({u}_{\alpha }^{j}\) relates directly to the elasticity coefficients of MCA (Supplementary Note 3). Because fluxes are proportional to the concentrations of the catalyzing proteins, the marginal kinetic benefit of total protein is simply 1/P. Expressions that additionally account for dependent reactants are provided in “Methods”.
As seen from the derivation in “Methods”, applying the chain rule of differentiation to the growth equation (Eq. (6)) further provides a simple interpretation of the net benefit of component i via reaction j (see “Marginal fitness benefits and costs” in “Methods”; note that because here we assume that there are no dependent reactants, direct and total net benefits as defined in “Methods” are identical). The derivation shows that the marginal net benefit is identical to the reduction of the proteome fractions ϕ_{j} ≡ p_{j}/P facilitated by the increase in x_{i} at constant μ,
Thus, for a positive η_{i} and keeping the growth rate μ constant, a small increase in x_{i} by Δx_{i} results in a corresponding reduction of the total proteome fraction, ∑_{j}Δϕ_{j} = −η_{i}Δx_{i}: at least some proteins are now required at lower concentrations. This result provides a formal justification for the widely held notion that cellular costs lie predominantly in protein production^{3,5,6,7,8,9,14,15,19,20,37,38}.
Optimal growth and the balance of marginal net benefits
Up to this point, we kept x = [P, a]^{T} fixed. We will now characterize optimal growth states, i.e., BGSs with maximal growth rate across all allowed concentration vectors x. To make this problem well defined, we need to consider an additional constraint that reflects the cellular requirement for a minimal amount of free water to facilitate diffusion^{39,40}. We implement this constraint by assuming that cellular dry weight per volume is limited to a maximal density ρ, where ρ is determined by external osmolarity^{40,41} but is otherwise constant across growth conditions^{42,43,44},
A BGS is a densityconstrained BGS (dBGS) if it additionally satisfies constraint (9). At maximal growth rate, the cellular components will utilize the full cellular limit on density to saturate enzymes with their substrates, and thus the inequality in Eq. (9) becomes an equality.
The maximal balanced growth rate μ^{*} will be a function of ρ. In analogy to the marginal net benefits of cellular components, we define the marginal benefit of the cellular density as the relative fitness increase facilitated by a small increase in ρ,
Using the method of Lagrange multipliers with the growth equation (Eq. (6)) as the objective function, we derive necessary conditions at optimal growth, which we term balance equations:
(Theorem 10). Again, the presentation here assumes that there are no dependent reactants, while a corresponding result is derived for the general case with dependent reactants in “Methods” (“Optimal densityconstrained balanced growth states”). Both with and without dependent reactants, the optimal state is perfectly balanced: the marginal net benefits of all independent cellular concentrations x_{i} are identical. Thus, if the dry weight density ρ could increase by a small amount (such as 1 mg l^{−1}), then the marginal fitness gain that could be achieved by increasing protein concentration by this amount is identical to that achieved by instead increasing the concentration of any reactant α by the same amount. This should not be surprising: if the marginal net benefit of concentration x_{i} was higher than that of \({x}_{i^{\prime} }\), growth could be accelerated by increasing x_{i} at the expense of \({x}_{i^{\prime} }\).
Equation (10) together with Eq. (9) describes a system of n + 1 equations for n + 1 unknowns, the independent concentrations x_{i}. In realistic cellular systems, this set of equations has a finite number of discrete solutions. Thus, growth rate optimization can be replaced by searching for the solution of the balance equations. If the optimization problem is convex, the conditions given by Eq. (10) are necessary and sufficient, and the solution is unique.
Quantitative predictions
If a substrate \({\alpha }^{\prime}\) is consumed only by a single reaction that is the only one producing a product \({i}^{\prime}\) (with \({i}^{\prime}\in \{\rm{P},\alpha \}\)), the nonlocal dilution terms in the balance equation (\({\eta }_{{\alpha }^{\prime}}={\eta }_{{i}^{\prime}}\)) cancel, and we are left with a local problem for which only the production cost of \({x}_{i^{\prime} }\) and the kinetic benefits of \({a}_{{\alpha }^{\prime}}\) and \({x}_{{i}^{\prime}}\) must be considered. This is the case for protein production in simplified models^{38} where the ribosome (R) produces proteins from a single substrate, a generic ternary complex (T). In such models, we can calculate the optimal proteome fraction of actively translating ribosomes, ϕ_{R} ≡ p_{R}/P, from the balance equation η_{T} = η_{P} (Eq. (10) and its generalization in Theorem 10). The predictions agree quantitatively with experimental values in E. coli^{45,46} and the yeast Saccharomyces cerevisiae^{47} across a wide range of growth conditions (Fig. 2).
In contrast to previous approaches based on the analysis of schematic, linear cell models with 2–3 reactions and largely arbitrary kinetic parameters^{6,7,8,9}, our predictions of the scaling of active ribosome fractions with growth rate (Supplementary Fig. 2) are both quantitative and general, as they rely only on the known stoichiometries and kinetics of the ribosome reactions themselves and are independent of any particular network structure. An approximation that ignores the dilution of intermediates and hence the associated production costs (\({q}_{\alpha }^{j}\approx 0\)) results in less accurate predictions of E. coli ribosome concentrations especially at high growth rates (Supplementary Fig. 4). In contrast, these approximate predictions are close to observed values for growth on minimal media (μ < 1 h^{−1}), indicating that the dilution of intermediates, μa_{α}, becomes less important at lower growth rates. The latter observation may explain why the relationship between the concentrations of a substrate and its catalysts is well approximated in this regime by simply minimizing their combined mass concentration while keeping the reaction rate constant^{48}, as this is mathematically equivalent to ignoring the dilution of intermediates.
To obtain a rough quantitative estimate of the marginal net benefits η_{i}, we here consider the simplest model of a complete cell, consisting of only a transport protein and the ribosome^{3,7} (Supplementary Fig. 2). This model is structurally very similar to previously analyzed schematic wholecell models. However, contrary to previous models that assumed a fixed total protein concentration as the only density constraint^{3,5,6,7,8,9,19}, our model’s density constraint (9) limits the joint mass concentration of proteins and reactants. Based on the experimentally observed proteome fraction of total dry weight in E. coli, P/ρ = 0.55^{49}, we estimate \(\frac{\rho }{\mu }\frac{d\mu }{d\rho }=\rho {\eta }_{\rho }=0.66\) (“Methods”, Eq. (42)). Thus, a decrease in cellular dry weight density ρ of 1% would lead to a 0.66% reduction in growth rate, emphasizing the biological significance of the density constraint and potentially explaining why E. coli’s dry mass density appears to be roughly constant across conditions^{42,43,44}.
The cellular density ρ changes when external osmolarity is modified^{40}. \(\rho {\eta }_{\rho }=\frac{d\mathrm{ln}\,\mu }{d\mathrm{ln}\,\rho }\) is the slope of the loglogscale plot of μ vs. ρ across different external osmolarities. While increases in ρ may have strong effects on diffusion and thus on enzyme kinetics, reductions in ρ due to decreased external osmolarity are within the scope of our model. The very limited available experimental data (three data points from ref. ^{50}, Supplementary Fig. 3) suggest ρη_{ρ} ≈ 0.66, the same as our rough estimate from the minimal cell model. An otherwise identical model that limits total protein density^{3,5,6,7,8,9,19} P instead of dry mass density predicts a much weaker dependency of growth rate on osmolarity, with Pη_{P} = 0.36 (“Methods”).
Discussion
At the heart of our mathematical derivations is A^{−1}, the inverse of the massnormalized active stoichiometric matrix A of any given EGS (or, more generally, any given BGS with linearly independent reactions). A^{−1} provides important information on the cellular efficiency. As seen from Eq. (4), \({A}_{ji}^{1}\) quantifies which proportion of reaction flux v_{j} is required to offset the dilution of the downstream cellular component i (either total protein P or reactant α). These nonlocal, structural massbalance constraints lead to an explicit dependence of reaction fluxes on the cellular concentrations (Eq. (4), Theorem 5). Independently of this, fluxes also depend on concentrations through reaction kinetics (constraint (3)). Combining these two relationships leads to explicit expressions for the individual protein concentrations p_{j} and for the growth rate μ, casting them as functions of the concentrations x = [P, a]^{T}. Accordingly, A^{−1} accounts for all systemslevel contributions to the marginal costs and benefits of cellular concentrations x_{i}, while the kinetic functions k_{j}(a) account for local effects. The insight that optimal, densityconstrained states of balanced growth are EGS allowed us to derive the balance equations (Eq. (10)); furthermore, as any BGS can be expressed as a weighted average of EGSs (Theorem 3), our results allow a general characterization of the solution space of balanced growth.
While computational limitations restricted previous studies of balanced growth to specific models with 2–7 reactions, we here provide general results for arbitrarily complex cellular systems. Except for the maximal cellular dry weight density constraint (9), the balanced growth model proposed by Molenaar et al.^{5} and utilized subsequently for the analysis of schematic models^{3,6,7,8,9,10} is based on assumptions identical to those made for GBA, constraints (1), (2) and (3). Previous authors (with the exception of Faizi et al.^{10}) assumed a limit on total protein (“macromolecular”) concentrations, while we assume a joint limitation of all cellular solutes (Eq. (9)). The latter choice is justified by the approximate constancy of the cellular dry mass density across growth conditions^{42,43,44}, and by an observed relationship between enzyme and substrate concentrations that is consistent with natural selection on the parsimonious use of a limited dry mass density^{48}.
To make the presentation concise, our development of GBA assumes (i) that all proteins contribute to growth by acting as catalysts or transporters; (ii) that there is a 1to1 correspondence between proteins and reactions; (iii) that proteins are not used as reactants; (iv) that all catalysts are proteins; and (v) that cells are optimized for growth. Supplementary Note 2 outlines how to remove these simplifications.
Due to the explicit inclusion of the major physicochemical constraints on cellular growth, GBA models promise to provide a mechanistic understanding of microbial resource allocation and physiology at a depth not achievable with alternative optimizationbased models. In principle, exploitation of the balance equations (Eq. (10)) may allow the numerical optimization of cellular systems of realistic size, encompassing hundreds of protein and reactant species. However, several challenges must be overcome before GBA models can be used to make detailed quantitative predictions of genomescale resource investment and physiology.
The first challenge is the identification of the set of active reactions in a given cellular state, leading to the active stoichiometric matrix A. The optimal state is an EFM of the linearized problem (Theorem 9), and thus a direct way to achieve this would be to compute all EFMs of the full stoichiometric matrix compatible with balanced growth (i.e., all supportminimal subsets of reactions that are capable of producing their own reactants plus protein), to apply GBA to each of them, and to then select the EFM resulting in the highest growth rate. While this approach works well for small, schematic models as those in refs. ^{3,5,6,7,8,9} and may be feasible for coarsegrained models with a few dozen reactions, the number of biomassproducing EFMs in genomescale networks is too large for them to be calculated exhaustively on current computers^{51}. As an approximate alternative, one could restrict this analysis to a subset of candidate EFMs, e.g., based on FBA with molecular crowding^{52} and on parsimonious FBA^{53} (where fluxes could be scaled by the maximal enzyme turnover rates, k_{cat}) or chosen to represent known physiological states (e.g., yieldmaximizing vs. overflow metabolism^{54}), or one might analyze only EFMs with a prespecified maximal number of reactions^{51}.
A further obstacle to the accurate formulation of GBA models is the current incompleteness of knowledge on the kinetic rate laws and parameters needed for the functions k_{j}(a), the same problem which hampers largescale kinetic modeling applications^{23,24,55} and (with respect to the effective turnover numbers) RBA^{14} and ME^{15,22} models. Recent developments of highthroughput assays for their estimation from omics data have led to promising results^{14,18,56}, suggesting that such approaches may lead to a comprehensive kinetic characterization of model microbes in the future. In parallel, methods from artificial intelligence have been shown to predict enzyme kinetic parameters with reasonable accuracy^{57,58}, suggesting that these approaches can augment incomplete in vitro or in vivo parameter sets. Parameter balancing^{59} could aid in the completion of a given set of kinetic constants by exploiting the thermodynamic dependencies among biochemical quantities^{37}. In addition, GBA parameterizations could be completed similarly to the parameterization of kinetic models^{23,24}, by fitting model predictions to experimental data acquired across growth conditions. Experience with kinetic models indicates that high predictive power can frequently be achieved with large uncertainties in the parameter sets^{23,25}, suggesting that even approximate GBA parameterizations may already lead to valuable insights. Finally, finding the optimal state of a genomescale GBA model requires the numerical solution of a large nonlinear optimization problem. The system of n + 1 equations provided by Eqs. (9) and (10) represents the necessary conditions for optimal growth, and these are important ingredients for developing efficient algorithms to solve the problem.
Although explicit, genomescale GBA models are built on the same kinetic rate laws as kinetic modeling approaches, their optimizationbased methodology does not require enzyme concentrations as model inputs and will likely be more robust to inaccurate kinetic representations. Importantly, GBA will also be much more robust to the omission of regulatory effects of reactants, as these result in additional protein costs but will in most cases have no major influence on the predicted fluxes. On the other hand, kinetic models can be used to assess the cellular response to genetic or environmental perturbations and can utilize mutant data for their parameterization. This is not possible with optimizationbased models such as GBA, as they assume that cellular resource allocation in the modeled state is optimal with respect to a known objective function, the balanced growth rate in the case of GBA.
While several challenges have to be met before GBA can be applied to genomescale balanced growth models, the present work establishes a comprehensive formal basis for such applications. Importantly, this mathematical framework can immediately be applied to the systematic analysis of schematic models, such as those examined in earlier work using numerical methods^{5,10} or adhoc analytical optimizations^{3,6,7,8,9}. Moreover, the analytic formulations developed here facilitate the straight forward application of GBA to coarsegrained cellular models of increasing complexity, parameterized from experimental data^{19,20,60}.
Independent of model details and parameterizations, our mathematical analysis provides general quantitative insights into cellular resource allocation and physiology in states of balanced growth. For example, while previous work has emphasized the central role of proteins in the cellular economy^{3,5,6,7,8,9,14,15,19,20,37,38}, Eq. (8) provides a rigorous formal justification for this notion in the context of balanced growth. At the same time, whereas the total protein mass concentration P is much higher than the mass concentration of any other cellular constituent a_{α} in most biological systems, the balance equations show that their marginal net benefits are in fact equal at optimal growth.
The application and further development of the GBA theory may foster an enhanced theoretical understanding of how physicochemical constraints determine the fitness costs and benefits of cellular organization. Moreover, the explicit expressions for the marginal fitness costs and benefits of cellular concentrations provide a rigorous framework for a quantitative analysis of the cellular economy. We anticipate that this approach will prove fruitful not only in the interpretation of natural and laboratory evolution, but also in optimizing the design of synthetic biological systems.
Methods
Overview
In the first four sections of “Methods”, we provide a formal description of Growth Balance Analysis (GBA), detailing the formal definitions, theorems, and proofs that form the basis of the main text. For simplicity of notation, we use the following conventions: {α} is the set of all reactants in the active stoichiometric matrix A, and ∑_{α} indicates that we sum over all α ∈ {α}. We use corresponding notations for the sets of independent basis reactants {β}, with concentrations b_{β}, and dependent reactants {γ}, with concentrations c_{γ} (see below). As explained in Definition 1 below, stoichiometric matrices are always in units of mass fractions, not stoichiometric coefficients. The last two sections describe the calculations of the optimal ribosome proteome fractions and the dependence of maximal growth rate on cellular water content.
Characterization of balanced growth states
First, we introduce the fundamental definitions that characterize the solution space of balanced cellular growth. We define BGSs and generalize the concept of EFMs from linear constraintbased models to EGSs (defined as flux vectors). We then introduce several theorems on the characterization and decomposition of BGSs.
In the formulation presented here, we assume that proteins do only act as catalysts and not as substrates of reactions. Hence, neither total protein nor individual proteins are considered “reactants”.
Definition 1 (BGSs): Let \({\bf{v}}^{\prime} \in {{\mathbb{R}}}^{n^{\prime} }\) be the vector of fluxes through the biochemical reactions that occur in a cell, in units of [mass][volume]^{−1}[time]^{−1}. Let \({\bf{v}}\in {{\mathbb{R}}}_{\ne 0}^{n}\), \(n\le n^{\prime}\), be the subvector of \({\bf{v}}^{\prime}\) that contains all active fluxes of \({\bf{v}}^{\prime}\) (i.e., all entries \({v}_{k}^{\prime} \, \ne \, 0\)). Let \({\bf{y}}\equiv {[P,{\bf{a}}]}^{T}\in {{\mathbb{R}}}_{ \,{> }\,0}^{m+1}\) be a corresponding vector of total protein concentration P and individual reactant concentrations a_{α}, α ∈ {1, ..., m}, where each a_{α} is consumed or produced by at least one of the fluxes v_{i}; y is in units of [mass][volume]^{−1}. Let \(A\in {{\mathbb{R}}}^{(m+1)\times n}\) be the corresponding active stoichiometric matrix in mass fraction units, i.e., column j of A describes reaction j with flux v_{j}, row i of A corresponds to the cellular component y_{i}, and each column is mass balanced. Thus, the sum of negative entries in each column is S_{−} = −1 and the sum of positive entries of each column is S_{+} = +1; for reactions that involve an external substrate not represented by a row of A, −1 < S_{−} ≤ 0, while for reactions that involve an external product, 0 ≤ S_{+} < 1.
Let \({\bf{p}}\in {{\mathbb{R}}}_{{> }\,0}^{n}\) be the vector of individual protein concentrations (in units of [mass][volume]−1), where protein j catalyzes reaction j; for simplicity, we assume that the “ribosome” catalyzing protein production is also itself a protein (but see Supplementary Note 2 for how to remove this simplification). Let k(a) be a vector of kinetic functions, \({\bf{k}}:{{\mathbb{R}}}_{{> }\,0}^{m}\,\mapsto\, {{\mathbb{R}}}_{\ne 0}^{n}\), where k_{j}(a) is in units of [time]^{−1}.
Then v is a balanced growth state (BGS) at growth rate μ if and only if it fulfills the following three constraints:
A BGS v at growth rate μ is a densityconstrained BGS (dBGS) if it additionally fulfills the constraint on total dry mass density
Constraint (11) implements mass balance, constraint (12) implements concentrationdependent reaction kinetics, while constraint (13) implements a constraint on the total proteome concentration. The kinetic constraint (12) assumes that the flux through each reaction is linear in the concentration of the catalyzing enzyme, while the dependence on the reactant concentrations a_{α} will typically be nonlinear. For simplicity of notation, we will sometimes make the dependence of kinetics on a implicit, i.e., we will use k_{j} ≡ k_{j}(a).
In the above definitions, we define a BGSs (or dBGSs) as a function of the set of active reactions (corresponding to the columns of A) and the concentration vector y = [P, a]^{T}. For a given active stoichiometric matrix A, the set of all such states at all concentrations \({\bf{y}}\in {{\mathbb{R}}}_{{> }\,0}^{m+1}\) defines the solution space of balanced growth (or of densityconstrained balanced growth if only concentrations y that respect constraint (14) are considered).
Based on biophysical considerations, we might replace Eq. (14) with separate density constraints on the total volume concentration inside each cellular compartment^{39} and on the total area occupied by nonlipid membrane components per membrane area^{5,61}. An even simpler density constraint imposed in most previous models^{3,5,6,7,8,9,14,15} is to fix total protein concentration P to a constant value. However, it has been shown that P decreases with increasing growth rate, whereas total dry mass density is approximately constant across conditions^{42,43,44}. Thus, while a constant P allows to simplify the presentation, Eq. (9) provides a biologically more meaningful constraint; moreover, this constraint allows us to determine the costs and benefits of varying the total protein concentration.
De Groot et al. have defined BGSs for a similar problem^{30}. In their formulation, the dimensions of the concentration vector y include not only total protein P, but all individual protein concentrations p_{j}. This more general problem formulation comes at the cost of more involved decomposition rules^{30} compared with Theorem 2.
We now provide the basis for linking BGSs to EFMs, which are defined for FBAtype linear constraintbased problems^{27} and which have been extended to proteomeconstrained models^{28,29}.
Definition 2 (EFMs): Let \({\bf{v}}\in {{\mathbb{R}}}^{n}\), \({\bf{y}}={[P,{\bf{a}}]}^{T}\in {{\mathbb{R}}}_{{> }\,0}^{m+1}\), and \(A\in {{\mathbb{R}}}^{(m+1)\times n}\) be as in Definition 1. Let \({{\bf{k}}}^{{\rm{(eff)}}}\in {{\mathbb{R}}}_{\ne 0}^{n}\) be a vector of effective kinetic constants. Then we call v a feasible flux vector at biomass production rate v_{bio} if and only if it fulfills the following constraints:
A feasible flux vector v is a representative of an elementary flux mode (EFM) if and only if it is nondecomposable, i.e., it fulfills the following additional constraint^{27}: There exists no couple of feasible flux vectors \({\bf{v}}^{\prime} ,{\bf{v}}^{\prime\prime}\) such that \({\bf{v}}={\lambda }_{1}{\bf{v}}^{\prime} +{\lambda }_{2}{\bf{v}}^{\prime\prime}\) with λ_{1}, λ_{2} > 0 and where both \({\bf{v}}^{\prime}\) and v″ have at least the same number of zeroes as v, while at least one of them contains more zeroes than v.
If we consider the concentration vector y = [P, a]^{T} as a descriptor of a constant biomass composition, we see that constraint (15) is mathematically equivalent to the standard steadystate constraint of FBA and metabolic pathway analysis^{26} problems, formulated without an artificial “biomass reaction” in A (see, for example, Eq. (2) in ref. ^{62}). Note that in the definition of EFMs, both the biomass composition y = [P, a]^{T} and the effective kinetics k^{(eff)} are assumed to be constant; thus, the constraints (15)–(17) that define the space of feasible flux vectors are fully linear. In contrast, constraint (12) in Definition 1 defines reaction kinetics as a function of the reactant concentrations a.
Definition 3 (EGS): A BGS v at concentrations y = [P, a]^{T} is an elementary growth state (EGS) if and only if it is a representative of a corresponding EFM, i.e., v represents an EFM of the corresponding linear problem with constant biomass y and effective kinetic constants k^{(eff)} = k(a).
We emphasize that v is an EFM of the corresponding linearized (FBAlike) problem (see Definition 2), not of the balanced growth problem (Definition 1) from which it is derived. EFMs are defined as equivalence classes of minimal feasible steadystate flux distributions, whose members can be converted into each other by multiplication with a positive scalar^{27}. This definition cannot be generalized to balanced growth models, as multiples of a feasible flux vector generally do not satisfy constraint (11). For this reason, de Groot et al. have generalized the concept of EFMs to equivalence classes of minimal sets of active reactions in BGSs, termed elementary growth modes (EGMs)^{30}.
Theorem 1 (Existence of solutions): Let y = [P, a]^{T} be a concentration vector and μ > 0 be a growth rate. For any flux vector \({\bf{v}}^{\prime}\) that satisfies the mass balance constraint (11), there exists a unique BGS \({\bf{v}}=\lambda {\bf{v}}^{\prime}\) with λ > 0 if all fluxes run in the direction compatible with the reaction kinetics (i.e., ∀j: k_{j}v_{j} > 0), and no such BGS otherwise.
Proof: From constraint (12), it is clear that if k_{j}v_{j} ≤ 0, no BGS with p_{j} > 0 exists. For k_{j} ≠ 0, the concentration of protein j is uniquely defined by p_{j} = v_{j}/k_{j} (constraint (12)). Let \(P^{\prime} ={\sum }_{j}v^{\prime} /{k}_{j}\) be the total protein concentration associated with \({\bf{v}}^{\prime}\). Then setting \(\lambda \equiv P/P^{\prime}\) results in the only flux vector that fulfills all constraints of Definition 1. □
Next, we use this result to show that any weighted average of BGSs is itself a BGS.
Theorem 2 (A weighted average of BGSs is a BGS): Let (v^{(1)}, . . . , v^{(k)}) be an ordered set of BGSs for the concentration vector y = [P, a]^{T} with growth rates (μ^{(1)}, . . . , μ^{(k)}), but with potentially different active stoichiometric matrices A^{(l)}. Let A be the stoichiometric matrix that combines all reactions represented in (A^{(1)}, . . . , A^{(k)}), i.e., the columns of A consist of all unique columns of (A^{(1)}, . . . , A^{(k)}). Let \(({{\bf{v}}}^{^{\prime} (1)},...,{{\bf{v}}}^{^{\prime} (k)})\) be a representation of the individual BGSs v^{(l)} in the flux space defined by A, i.e., \({v}_{j}^{^{\prime} (l)}=0\) for all columns (reactions) of A not represented in A^{(l)}. Then any weighted average \({\bf{v}}={\sum }_{l}{w}_{l}{{\bf{v}}}^{^{\prime} (l)}\) of these extended flux vectors (with weights w_{l} > 0 and ∑_{l}w_{l} = 1) is itself a BGS for y, with a growth rate that is the weighted average of the individual growth rates, μ = ∑_{l}w_{l}μ^{(l)}.
Proof: The mass balance constraint (11) is linear in the fluxes and growth rates, and is hence also fulfilled for the weighted averages. The protein concentrations of each BGS \({{\bf{v}}}^{^{\prime} (l)}\) are \({p}_{j}^{^{\prime} (l)}={v}_{j}^{^{\prime} (l)}/{k}_{j}\). To satisfy the reaction kinetics constraint (12), the protein concentrations of the weighted average are \({p}_{j}={v}_{j}/{k}_{j}={\sum }_{l}{w}_{l}{v}_{j}^{^{\prime} (l)}/{k}_{j}={\sum }_{l}{w}_{l}{p}_{j}^{^{\prime} (l)}\). As each BGS (l) fulfills the proteome constraint (13), \({\sum }_{j}{p}_{j}={\sum }_{j}{\sum }_{l}{w}_{l}{p}_{j}^{^{\prime} (l)}={\sum }_{l}{w}_{l}P=P\), and thus v is a BGS. □
We can now use Theorems 1 and 2 together with results on EFMs to show that any BGS can be decomposed into a weighted average of EGSs.
Theorem 3 (BGSs are weighted averages of EGSs): Any BGS v for the concentration vector y = [P, a]^{T} can be decomposed into a weighted average of EGSs at y.
Proof: v is a feasible flux vector for the linearized problem defined by constraints (15)–(17) at constant biomass y. The direction of reaction j is fixed by the sign of \({k}_{j}^{{\rm{(eff)}}}={k}_{j}({\bf{a}})\), i.e., all reactions are irreversible. Under these conditions, it has been shown that v is a convex combination of EFMs \({{\bf{v}}}^{^{\prime} (l)}\) of the linear problem^{27}, i.e., \({\bf{v}}={\sum }_{l}{w}_{l}^{\prime}{{\bf{v}}}^{^{\prime} (l)}\) with \({w}_{l}^{\prime}\, > \,0\). From Theorem 1, we know that for each of these EFMs, there exists a unique BGS \({{\bf{v}}}^{(l)}={\lambda }_{l}{{\bf{v}}}^{^{\prime} (l)}\) with λ_{l} > 0; according to Definition 3, this is an EGS. Thus, we can write v = ∑_{l}w_{l}v^{(l)} as a linear combination of EGSs, with weights \({w}_{l}\equiv {w}_{l}^{\prime}/{\lambda }_{l}\).
To prove that v is a weighted average of the v^{(l)}, it remains to be shown that W ≡ ∑_{l}w_{l} = 1. According to Theorem 2, a weighted average \({\bf{v}}^{\prime\prime} \equiv {\sum }_{l}\frac{{w}_{l}}{W}{{\bf{v}}}^{(l)}=\frac{1}{W}{\bf{v}}\) will also be a BGS. However, Theorem 1 states that there exists only one BGS in the direction of v, and thus W = 1. □
Growth equations
In this section, we assume that the concentrations of total protein and of individual reactants, y ≡ [P, a] are known. Mass conservation (constraint (11)) and reaction kinetics (constraint (12)) relate reaction fluxes to the concentration vector in two fundamentally different ways. We will now exploit this fact to eliminate the flux variables and to derive explicit expressions for v, p, and μ.
Note that because the concentrations y are used as state variables in these analyses, no explicit consideration of constraints on cellular density, such as constraint (14), is necessary. The given concentrations y may obey constraint (11) or alternative density constraints, such as independent constraints on the density of cellular compartments, but these will not be used here. They will only become important when we vary y to find states of maximal growth rate in a later section.
An important requirement for the analyses below is that the active stoichiometric matrix A has full column rank, motivating the next theorem.
Theorem 4 (The active reactions of an EGS are linearly independent): Let \(A\in {{\mathbb{R}}}^{(m+1)\times n}\) be the active stoichiometric matrix of an EGS. Then A has full column rank n, i.e., the columns of A are linearly independent.
Proof: According to the definition of EGSs (Definition 3), A is also the active matrix of the corresponding linearized (flux balance type) problem. It has previously been shown^{31} that the active stoichiometric matrix A of an EFM of a linear fluxbalance problem has full column rank if A is formulated without an explicit “biomass” reaction (as in Definition 2). □
According to this theorem, the following theorems—which assume that A has full column rank—can in particular be applied to EGSs (and, as we will see below in Theorem 9, thus also to dBGSs with maximal growth rate).
Theorem 5 (Investment equation): Let \(A\in {{\mathbb{R}}}^{(m+1)\times n}\) be an active stoichiometric matrix of a flux vector v that fulfills the mass balance constraint (11) with concentration vector y = [P, a]^{T}, where A has full column rank n. Then we can split A into two submatrices \(B\in {{\mathbb{R}}}^{n\times n}\) and \(C\in {{\mathbb{R}}}^{(m+1n)\times n}\),
such that B is a nonsingular (invertible) square matrix and each row of C is a linear combination of rows of B.
Let B^{−1} be the inverse of B. Let b be the subvector of reactant concentrations a that correspond to the rows of B, c be the subvector of the reactant concentrations that correspond to the rows of C, and x ≡ [P, b]^{T}. Then v is given by
The dependent reactant concentrations c are linear combinations of the independent concentrations x,
with the dependence matrix D ≡ CB^{−1}.
Proof: The active stoichiometric matrix A may have more rows than columns. In this case, m + 1 > n, and the rows for exactly n metabolites are linearly independent, as row and column rank must equal. As a consequence, the remaining m + 1 − n metabolite concentrations are linearly dependent on the concentrations of the n independent metabolites. These dependent concentrations are not free variables, and hence they can be put aside and dealt with separately.
We decompose the linear system of equations represented by constraint (11) into two parts, rearranging the rows of A into matrices B, C such that B contains the rows for the independent reactants. As A has full column rank, choosing linearly independent rows results in a square matrix B of full rank (#rows(B) = rank(B) = rank(A)). Let b be the subvector of reactant concentrations a that correspond to the rows of B, and let c be the subvector of the remaining reactant concentrations corresponding to the rows of C. We can then split the mass balance constraint (11) into two separate equations:
B is a square matrix of full rank, so there is always a unique inverse B^{−1}. Multiplying both sides of the first equation by B^{−1} from the left, we obtain the desired equation for v. Inserting this result into the second equation results in the desired equation for c. □
Thus, if A has full rank, then any flux vector v respecting the flux balance constraint (11) is uniquely defined and is a linear combination of the total protein concentration P and the independent metabolite concentrations b. Each entry of the inverse matrix \({B}_{ji}^{1}\) quantifies the proportion of flux j invested into the dilution of component i, and we thus name B^{−1} the investment (or dilution) matrix (see Supplementary Fig. 1 for examples). In contrast to the stoichiometric matrix A, which describes local mass balances (constraint (11)), B^{−1} describes the structural allocation of reaction fluxes into the production of cellular components diluted by growth, and thus carries global, systemslevel information.
B corresponds to the reduced stoichiometric matrix in ref. ^{32}. D describes the linear dependence of the dependent concentrations c on P and b; it is identical to the link matrix in ref. ^{32}. The relationship between A and B, C can be understood in terms of matroid theory, where the rows of B form a basis for the matroid spanned by the rows of A, and the set of rows of C is the closure for the set of rows of B. If the choice for the partitioning of A into B and C is not unique, some partitionings may be pathological and should be avoided (Supplementary Note 4).
When A is not square, B includes a proper subset of the rows in A, and thus B on its own is not mass balanced. The “missing” mass fluxes are balancing c, and hence the flux investment into c is already accounted for by the investment equation in Theorem 5.
We are now in a position to express the individual protein concentrations and the growth rate of a BGS as explicit functions of the concentrations y = [P, a]^{T}.
Theorem 6 (Individual protein concentrations as a function of the independent concentrations): Let \(A\in {{\mathbb{R}}}^{(m+1)\times n}\) be an active stoichiometric matrix with full column rank n, and let x = [P, b]^{T} be the independent concentration vector with corresponding index i ∈ {P, β}. Let v be a corresponding BGS. Let B and D be the basis and dependency matrices, respectively, as defined in Theorem 5. Then the concentration of the protein catalyzing reaction j is
Proof: As A is an active matrix, all fluxes v_{j} = p_{j}k_{j}(a) (constraint (12)) are nonzero. We can thus express the individual protein concentrations as p_{j} = v_{j}/k_{j}(a). Inserting v_{j} from the investment equation (Theorem 5) directly leads to the above equation. □
We now insert the equations for the individual proteins into the total protein constraint (13) to obtain an explicit expression for the growth rate.
Theorem 7 (Growth equation): Let \(A\in {{\mathbb{R}}}^{(m+1)\times n}\) be an active stoichiometric matrix with full column rank n, and let y = [P, a]^{T} be a concentration vector. Let v be a corresponding BGS. Let B and D be the basis and dependency matrices, respectively, as defined in Theorem 5. Then the growth rate is
if for all reactions \(\frac{{p}_{j}}{\mu }=\frac{{\sum }_{i}{B}_{ji}^{1}{x}_{i}}{{k}_{j}({\bf{a}})}\, > \,0\), and no balanced growth is possible otherwise.
Proof: According to Theorem 6, the individual protein concentrations are \({p}_{j}=\mu \frac{{\sum }_{i}{B}_{ji}^{1}{x}_{i}}{{k}_{j}({\bf{a}})}\). The flux v_{j} catalyzed by protein j must be active, and thus p_{j} has to be positive for all j. Substituting the expressions for p_{j} into the proteome constraint (13), we obtain
The sum on the r.h.s. is positive, and dividing by it results in the growth equation. □
Thus, if the active matrix A of a BGS is full rank, there are unique and explicit mathematical solutions for p, v, and μ. In particular, this is the case for optimal growth states (Theorem 9), as well as for all other EGSs. In this section, we did not impose any density constraints (such as constraint (14)), and thus Theorems 1–7 remain valid under arbitrary density constraints as long as these are respected by the concentration vector y = [P, a]^{T}.
Marginal fitness benefits and costs
In this section, we first define marginal fitness benefits and costs of concentrations. As in the previous section, the considerations in this section make no use of the density constraint (14), and thus remain valid under alternative density constraints. After introducing the definitions, we show how to calculate and to interpret the costs and benefits.
Definition 4 (Marginal costs and benefits): Let v be a BGS with growth rate μ. Let i ∈ {P, β} be an index of the independent concentration vector x = [P, b]^{T}. Then the direct marginal net benefit of concentration x_{i} is defined as the relative change in growth rate due to a small change in x_{i}^{33},
Analogously, we define the marginal benefit of dependent reactant γ as
The (total) marginal net benefit of x_{i} is then defined as the relative change in growth rate due to a small change in x_{i}, accounting for the resulting changes in the concentration of dependent metabolites c_{γ},
where the second equality follows directly from Eq. (18).
A change δx_{i} of x_{i} (i ∈ {P, β}) causes a correlated change of each dependent concentration δc_{γ} = D_{γi}δx_{i} (Eq. (18)). Thus, a change by δx_{i} results in a total change of the utilization of cellular density by κ_{i}δx_{i}, with the density factor defined as
To help in the interpretation of the marginal net benefits, we will relate them in the next theorem to two explicit definitions of costs and benefits, respectively. The marginal production cost of the cellular concentration x_{i} is defined as
where the subscript of the parenthesis indicates which variables are kept constant in the derivative. \({q}_{i}^{j}\) can be interpreted as the additional amount of protein j required to offset the increased dilution of x_{i} ∈ {P, β} at growth rate μ and fixed kinetics k_{j}. We define the marginal kinetic benefit of the reactant concentration b_{β} as
and we make corresponding definitions \({u}_{\gamma }^{j}\) for dependent concentrations c_{γ}. The marginal kinetic benefits can be interpreted as the fraction of proteins j saved at constant flux v_{j} due to the increased saturation of reaction j with reactant β or γ, respectively.
The marginal net benefits can now be expressed as differences between benefits and costs. To calculate the direct marginal net benefits \({\eta }_{i}^{0}\), we must use the growth equation derived in Theorem 7,
where we defined proteome fractions ϕ_{j} ≡ p_{j}/P. The first form given for μ(P, a) here quantifies growth as a function of the state variables x_{i}, and it would be straight forward to calculate \({\eta }_{i}^{0}\) from this expression. However, to establish a formal link between marginal net benefits and protein investment, we will instead go via the second form, which arises from Theorem 6 and was used to derive the growth equation, and the third form, which expresses this relationship in terms of the proteome fractions ϕ_{j}. When we take the partial derivatives with respect to the state variables x_{i} in the second and third forms, we must make sure that we keep the right terms constant: when expressed in terms of the x_{i}, the expression p_{j}/μ is in fact independent of μ (Theorem 6), and we hence need to take the derivatives while keeping μ constant. We thus get for the direct marginal net benefits:
where we used the fact that the proteome fractions must add to 1, ∑_{j}ϕ_{j} = 1. Thus, the direct marginal net benefit of the cellular concentration x_{i} is identical to the total associated changes in proteome fractions caused by this change.
Again looking at Eq. (21), we can further analyze the nature of the proteome changes caused by a change in the cellular concentration x_{i}. Let us first consider a reactant concentration x_{i} = b_{β}. Applying the chain rule of differentiation to \({\phi }_{j}={p}_{j}/P=\mu {\sum }_{i}{B}_{ji}^{1}{x}_{i}/{k}_{j}({\bf{a}})\), we have to add the partial derivatives with respect to x_{i} = b_{β} in the numerator \(\mu {\sum }_{i}{B}_{ji}^{1}{x}_{i}={v}_{j}\) (keeping μ and k_{j} constant) and in the denominator k_{j}(a) (keeping the numerator v_{j} constant, which also guarantees that μ is constant). Thus, we can write the direct marginal net benefits of the independent reactant concentration b_{β} in terms of proteome changes as
where in the last line we inserted the definitions of the marginal kinetic benefits and production costs (Definition 4).
Performing an analogous calculation for the direct net benefit of the total protein concentration P (noting that we now need to take the derivative with respect to the numerator of the growth equation but not with respect to (k_{j}), we obtain
where we used ∑_{j}p_{j} = P and where we inserted the definition of the marginal production cost of P (Definition 4, with k_{j} independent of P) in the last line. The positive term 1/P in the direct net benefit of total protein quantifies the marginal benefit of increasing the total protein concentration P, which accelerates all reactions linearly.
We have thus proven the next Theorem, which elucidates how costs and benefits of cellular compounds are naturally related to protein use; this connection has been proposed before^{33,37} but is derived here rigorously from first principles.
Theorem 8 (Direct marginal net benefits): The direct marginal net benefit of any independent cellular concentration x_{i} (i ∈ {P, β}) is the negative of the total associated change in relative protein concentrations at constant growth rate μ,
The direct marginal net benefits of the total protein concentration P and of independent reactant concentrations b_{β} (β ∈ {1, …, m}), respectively, are
The marginal production cost \({q}_{i}^{j}\) is the fraction of extra protein j expended to offset the additional dilution of concentration x_{i} at rate μ and fixed saturation k_{j}; it can be calculated from the growth equation (Theorem 7) as
The marginal kinetic benefit \({u}_{\beta }^{j}\) is the fraction of protein j saved due to its increased saturation with reactant β; it is calculated from the growth equation as
The marginal kinetic benefits of dependent reactants γ are
where \({u}_{\gamma }^{j}\) is calculated analogously to the marginal kinetic benefits of independent reactants, \({u}_{\beta }^{j}\).
Optimal densityconstrained balanced growth states
So far, we have considered BGS for a given set of active reactions (corresponding to the columns of A) and given concentrations y = [P, a]^{T}, where y may or may not have respected any particular density constraint. We now examine densityconstrained BGSs (dBGSs) with maximal growth rate given the set of active reactions, optimized over all concentration vectors \({\bf{y}}={[P,{\bf{a}}]}^{T}\in {{\mathbb{R}}}_{{> }\,0}^{m+1}\) that respect the density constraint (14). As a preparation for these analyses, we first show that states of optimal growth are EGSs.
Theorem 9 (dBGSs with maximal growth rate are EGSs). Let N be a stoichiometric matrix of a general balanced growth model. Let v^{*} be a dBGS that maximizes the growth rate of the general problem. Then v^{*} is an EGS.
Proof: Without loss of generality, we restrict v^{*} to its active dimensions (\({v}_{j}^{*} \, {\ne} \, 0\)), with active stoichiometric matrix A. Then this reduced v^{*} is the optimal solution for the following nonlinear optimization problem over all concentration vectors \({\bf{y}}\equiv {[P,{\bf{a}}]}^{T}\in {{\mathbb{R}}}_{{> }\,0}^{m+1}\):
Let \({{\bf{y}}}^{* }={[{P}^{* },{{\bf{a}}}^{* }]}^{T}\) be the concentrations and μ^{*} the growth rate of the optimal solution v^{*}. Now let us consider a linearized version of this optimization problem, where me maximize the production rate v_{bio} at constant biomass composition y^{*} and effective kinetic constants \({k}_{j}^{{\rm{(eff)}}}\equiv {k}_{j}({{\bf{a}}}^{* })\) (see Definition 2):
We relaxed the constraint (13) on total protein into an inequality constraint, so that Eq. (25) describes a proteinconstrained FBA problem for the active stoichiometric matrix. This is precisely the type of constrained flux balance problem analyzed in refs. ^{28,29}, which prove that the solutions v^{opt} to the optimization problem defined by Eq. (25) are EFMs.
In the optimal solution to the problem defined by Eq. (25), the protein concentration constraint will be active, that is, P^{*} = ∑_{j}p_{j}; if not, the biomass production rate v_{bio} could be increased by multiplying the vector of protein concentrations p with a constant >1 (as \({v}_{j}={p}_{j}{k}_{j}^{* }\) for all j). Thus, the optimization problem described by Eq. (25) is the same as that described by Eq. (24), except for a reduction in the dimension of the search space due to the fixed concentrations y^{*} (Note that the cellular density constraint (14) is trivially respected in Eq. (25) and can be ignored). Accordingly, the flux distribution v^{*} that maximizes the balanced growth rate μ in Eq. (24) also maximizes the biomass production rate v_{bio} of the proteinconstrained FBA problem in Eq. (25); it is hence a representative of an EFM of the active stoichiometric matrix A with biomass y^{*}^{28,29}, and thus v^{*} is an EGS according to Definition 3. □
In parallel work, de Groot et al.^{30} have shown that optimal solutions to balanced growth problems are elementary growth modes (as defined in ref. ^{30}), and that the active stoichiometric matrix of elementary growth modes has full rank.
If instead of a single constraint on cellular density, multiple density constraints are imposed simultaneously (e.g., to describe separate constraints on different cellular compartments), then the solutions may in some cases correspond to positive linear combinations of EGSs^{30,63}, and the treatment below needs to be generalized. Multiple density constraints may play a role in the emergence of overflow metabolism in E. coli^{54,64}, although overflow metabolism can also arise in balanced growth models with a single density constraint^{5}.
In a dBGS with maximal growth rate for a given active stoichiometric matrix A, the cellular components will utilize the full limit on cellular density ρ to saturate enzymes with their substrates. Thus, the constraint (14) will be active, turning the inequality into an equality. The maximal balanced growth rate μ^{*} will thus be a function of the maximal cellular density ρ. As a reference value for the marginal net benefits of individual concentrations x_{i}, we now define the marginal benefit of the cellular density ρ.
Definition 5 (Marginal benefit of the cellular density): In analogy to the marginal net benefits of cellular components, we define the marginal benefit of the cellular density as the fitness increase facilitated by a small increase in ρ,
We can now relate η_{ρ} to the total marginal net benefits of all concentrations. To do this, we derive necessary conditions for any optimal BGS at constant cellular density ρ, using the method of Lagrange multipliers. The Lagrange multipliers quantify the importance of the density constraint, Eq. (14), and of the constraints for the dependent reactants, Eq. (18), for the maximization of the objective function. The Lagrangian \({\mathcal{L}}\) is a function of y = [P, a]^{T}, and ρ.
Theorem 10 (Balance equation): In a dBGS with maximal growth rate, the total marginal net benefit of each independent concentration x_{i} (i ∈ {P, β}) equals the marginal benefit of the cellular density ρ scaled by the density factor κ_{i},
Proof: We use the method of Lagrange multipliers to derive necessary conditions for any optimal dBGS at constant cellular density ρ. Our objective function is given by Theorem 7, which expresses the growth rate μ as an explicit function of the concentrations y = [P, a]^{T}. The density constraint (14) will be active at maximal growth rate, i.e., it becomes an equality. The density constraint can then be expressed as a function g_{ρ} that depends on ρ and on the concentrations,
Finally, the constraints on each dependent reactant γ also only depend on y = [P, a]^{T}, with the entries D_{γP} determining the composition of each γ in terms of P, and D_{γβ} determining the composition of γ in terms of b_{β},
We now define a Lagrangian as the sum of the objective function μ and the constraints g scaled by Lagrange multipliers λ_{ρ}, accounting for the density constraint (14), and λ_{γ}, accounting for the dependence of the dependent reactants γ ∈ {γ}, Eq. (18):
The firstorder necessary conditions for a constrained local maximum are that all partial derivatives of \({\mathcal{L}}\) with respect to the variables P, b_{β}, c_{γ} and to the Lagrange multipliers λ_{ρ}, λ_{γ} are zero,
For the partial derivative with respect to an independent concentration x_{i} (i ∈ {P, β}), we have
With Theorem 8, this results in
For the partial derivative with respect to a dependent reactant c_{γ}, we have
With Eq. (19), we obtain
Substituting λ_{γ} from the last equation into Eq. (27) gives (for i ∈ {P, β})
Rearranging results in
where we used η_{ρ} = −λ_{ρ}/μ, which follows directly from the envelope theorem^{65}. With μ > 0, we can divide by μ to obtain the balance equation. □
The optimal state is perfectly balanced: the total marginal net benefit of each independent cellular concentration x_{i} equals the marginal benefit of the cellular density, scaled by κ_{i} to account for its total utilization of cellular density. If i does not have any dependent reactants (∀γ: D_{γi} = 0), then the balance equation simplifies to \({\eta }_{i}={\eta }_{i}^{0}={\eta }_{\rho }\) (Eq. (10)).
Theorem 10 states that if the dry weight density ρ would be allowed to increase by a small amount, such as 1 mg l^{−1}, then the marginal fitness gain that could be achieved by increasing protein concentration (plus dependent concentrations) by this amount is identical to that achieved by increasing the concentration of any reactant β (plus its dependent concentrations) by the same amount.
Instead of using Lagrange multipliers in the proof, one could express the total protein concentration P = ρ − ∑_{α}a_{α} (constraint (14)) and the dependent reactant concentrations c_{γ} = D_{γP}P + ∑_{β}D_{γβ}b_{β} (Eq. (18)) in terms of ρ and of the independent reactant concentrations b. Substituting the resulting expressions into the growth equation (Theorem 7) would result in an objective function that depends only on ρ and b, and that is constrained only by the requirement of positive concentrations. While this would lead to the same balance equations as derived in the Lagrange multiplier framework, this formulation misses important insights that can be derived from the Lagrange multipliers themselves.
Optimal ribosome proteome fraction
Here we employ a very simple model for translation^{38}. It accounts only for the elongation phase, where one catalyst (the ribosome plus bound mRNA, with concentration R) converts one substrate (the ternary complex, with concentration a_{T}) into protein, following irreversible Michaelis–Menten kinetics:
with constant maximal ribosome activity k_{cat} (in units of [time]^{−1}) and Michaelis constant K_{m} (in units of [mass][volume]^{−1}).
We assume that the model has no dependent reactants (A = B) and that the ternary complex is not used in any other reaction. In this case, the same canceling of production costs as in the model depicted in Supplementary Fig. 1a happens, and the balance of net benefits of ternary complex and total protein, η_{T} = η_{P} (Eq. (10)), simplifies to
with the kinetic benefit of the ternary complex T for the ribosome R, \({u}_{{\rm{T}}}^{\rm{R}}\) (Definition 4). Substituting the partial derivative of irreversible Michaelis–Menten kinetics (Eq. (29)), we obtain
Rearranging Eq. (29), we also see that the kinetics determine the concentration a_{T} uniquely in terms of v_{R}, R, K_{m}, and the ribosome’s turnover number k_{cat},
Substituting this into Eq. (30) gives
From the ribosome kinetics and mass conservation of proteins, we have
Thus, substituting μ/k_{R} = R/P and v_{R} = μP in Eq. (31), we obtain
This is equivalent to a quadratic equation in R/P,
Its two solutions are
To see which of the two solutions is relevant, we rewrite this as
Because k_{cat}R > Rk_{R} = v_{R} = μP, the term in square brackets [ ⋅ ] in Eq. (33) must be >1. Only the positive root is compatible with this condition. Thus, the ratio R/P is uniquely determined by
To relate this expression to experimental data, we need to remember that ribosomes consist of protein and RNA. To estimate the ribosome proteome fraction ϕ_{R}, we thus need to scale the previous expression by the fraction r_{P} of ribosome which is protein, resulting in the final equation
The same procedure can be used to find an equation for ϕ_{R} that ignores the production costs. Starting from Eq. (31) without the production cost term μ/k_{R}, we obtain
which results in a quadratic equation similar to Eq. (32),
Solving for R/P gives
Again because Rk_{cat} > μP, the term in square brackets [ ⋅ ] in Eq. (35) must be >1, and again only the positive root is compatible with this condition. Thus, the ribosome proteome fraction is uniquely determined in this approximation by
We compared the predictions for ϕ_{R} to experimental estimates based on quantitative proteomics^{45} and on total RNA to protein ratios^{19,42,46,66}. While all estimates are very similar (Fig. 2), given that on the order of 20% of total RNA is tRNA^{42} and that this proportion is at least moderately growth rate dependent^{67}, the exact growth rate dependence of ϕ_{R} may be captured more faithfully by the proteomics data.
To calculate ϕ_{R} from the proteomics measurements, we first calculated the mean over all molar concentrations of ribosomal proteins reported by Schmidt et al.^{45}. Molar concentrations of the ribosome were converted to mass concentrations by multiplying with molar masses derived from the amino acid sequences for the protein parts and nucleotide sequences for the RNA parts. For this, we assumed that each ribosome contained one copy of each of its constituents, with the exception of four copies of RplL^{68}. We multiplied the ribosome mass concentrations with the mass fraction of ribosomes that is protein (r_{P} = 0.358^{45}), and divided the result by the total protein mass concentration P to obtain ϕ_{R}. The proteome fraction of actively translating ribosomes was determined based on total ribosome proteome fraction and the fraction of active ribosome at different growth rates. The latter was estimated by fitting a smooth saturation function s(μ) = μ/(μ + z) over the fractions of active ribosomes estimated in ref. ^{46}, with the bestfitting parameter z = 0.124 h^{−1}. Nonlinear fitting was performed using the function nls() in gnu R^{69}.
We set the Michaelis constant of the ribosome to \({K}_{{\rm{m}}}^{\prime}=3\times 1{0}^{6}{\rm{mol}}\,{\rm{l}}^{1}\), based on the diffusion limit for ternary complexes calculated in ref. ^{38}. We set the ribosome’s turnover number to k_{cat} = 22 AA s^{−1}, the highest elongation rate observed experimentally in ref. ^{42}. As we do not distinguish between different ternary complexes and the ribosome only accepts one of the 40 different ternary complex types at any given time, \({K}_{{\rm{m}}}^{\prime}\) was multiplied by 40 (see ref. ^{38}), resulting in an effective Michaelis constant of K_{m} = 1.2 × 10^{−4} mol l^{−1}. For consistency of the units with the mass concentration units used throughout our paper, the kinetic parameters had to be converted from molar to mass concentrations. The mean weight (±SD) of amino acids across all conditions assayed in ref. ^{45} was (132.60 ± 0.09) Da; the ribosome molecular weight is 2,306,967 Da; and the mean weight of ternary complexes is (69,167 ± 1351) g mol^{−1}. With these numbers, we obtain k_{cat} = 22 AA s^{−1} × (132.60 Da AA^{−1})/(2,306,967 Da) × 3600 s h^{−1} = 4.55 h^{−1}, and K_{m} = 40 × 3 × 10^{−6} mol l^{−1} × 69,167 g mol^{−1} = 8.30 g l^{−1}. For the predictions based on Eq. (34), we set the total protein concentration to P = 127.4 g l^{−1} ^{45}.
For yeast, the concentration of actively translating ribosomes was determined based on total ribosome concentration and the fraction of active ribosome at different growth rates; the data was extracted from the figures of ref. ^{47} using the GetData Graph Digitizer program (Version 2.26, obtained from http://getdatagraphdigitizer.com/). The fraction of active ribosomes was estimated by fitting a smooth saturation function s(μ) = μ/(μ + z) over the fractions of active ribosomes estimated in ref. ^{47}, again using the nls() function in R. The bestfitting parameter was z = 0.122 h^{−1}, very close to the E. coli estimate. We again set \({K}_{{\rm{m}}}^{\prime}\) to the diffusion limit^{38} K_{m} = 3 × 10^{−6} mol l^{−1}, multiplied with the number of different ternary complexes, of which there are 41 in yeast^{70}. The ribosome’s turnover number was set to k_{cat} = 10 AA s^{−1}, the highest elongation rate observed experimentally according to ref. ^{71}. To convert to mass units, we used the mean weight of amino acids (130 Da)^{72}, the ribosome molecular weight 3,620,000 Da^{73}, and the molecular weight of ternary complex (240,000 Da)^{74,75,76}. With these numbers, we obtain k_{cat} = 10 AA s^{−1} × (130 Da AA^{−1})/(3,620,000 Da) × 3600 s 1 h^{−1} = 1.29 h^{−1}, and K_{m} = 41 × 3 × 10^{−6} mol l^{−1} × 240,000 g mol^{−1} = 29.52 g l^{−1}. In yeast, the mass fraction of ribosomes that is protein is r_{P} = 0.45^{73}. For the predictions based on Eq. (34), we set the total protein concentration to the haploid cell value P = 85.7 g l^{−1} ^{77}.
To quantify the fit of our predictions for ϕ_{R} to the observed ribosomal proteome fractions, we calculated Pearson’s correlation coefficient r between observed and predicted values as well as the coefficient of determination
with the total sum of squares \(S{S}_{\text{tot}}={\sum }_{i}{({\phi }_{{\rm{R,i}}}\bar{{\phi }_{{\rm{R}}}})}^{2}\) (proportional to the variance of the data) and the residual sum of squares \(S{S}_{\text{res}}={\sum }_{i}{({\phi }_{{\rm{R,i}}}{\phi }_{{\rm{R,i}}}^{{\rm{predicted}}})}^{2}\) (proportional to the variance of the residuals).
Dependence of maximal growth rate on cellular water content
Cayley et al.^{40,50} have shown that the internal water content of E. coli cells increases when these are grown in environments with reduced osmolarity. This effect corresponds to a decrease of cellular dry weight per volume, ρ, by δρ. η_{ρ} quantifies the associated reduction in relative fitness, δf = δμ^{*}/μ^{*} = η_{ρ}δρ, with μ^{*} the maximal growth rate (Definition 5). The relative change in the maximal growth rate per relative change in ρ is then
From Eq. (26), we know that η_{P} = κ_{P}η_{ρ}; if there are no dependent reactants for P (i.e., ∀γ: D_{γP} = 0), this simplifies to
and thus
The mass fraction of total protein in cell dry weight P/ρ ≈ 0.55 has been shown to be approximately constant for E. coli across growth conditions supporting intermediate to high growth rates^{40,45,49}.
To estimate the total protein production cost \({\sum }_{j}{q}_{{\rm{P}}}^{j}\), we consider the simplest possible wholecell model, comprising only a transport reaction and the ribosome reaction (Supplementary Fig. 2). The active stoichiometric matrix A of this model and its inverse A^{−1} are (written here with row and column labels):
The density is determined only by its two components,
where
From the inverse A^{−1} and Theorem 5, we obtain
and
From the inverse A^{−1} and Eq. (23), we get
Combining this with Eqs. (40) and (41) and using ϕ_{R} = p_{R}/P and ϕ_{t} = p_{t}/P = 1 − ϕ_{R}, we obtain
Inserting this in Eq. (39) results in
The growth rate in the reference growth condition of osmolarity Osm = 0.28 in ref. ^{50} is μ = 1.0 h^{−1}. From Eq. (34), we estimate the mass fraction of ribosomal proteins in total protein ϕ_{R} at this growth rate as ϕ_{R} = 0.19. Substituting this value into Eq. (42) together with P/ρ = 0.55, we estimate the relative change in the maximal growth rate per relative change in ρ as
Note that instead of a density constraint on total dry mass ρ, previous analyses of schematic and coarsegrained models of balanced growth^{3,5,6,7,8,9,19} utilized a constraint only on the concentration of macromolecules P. Calculating Pη_{P} instead of ρη_{ρ} leads to a replacement of the factor (ρ/P − 1) with (1 − P/ρ) compared with the last line of Eq. (42), and the same parameterization then leads to a prediction of Pη_{P} = 0.36.
Cayley et al.^{50} report cell growth at reduced osmolarities, summarized in Supplementary Table 1. The cellfree water content \({\overline{V}}_{free}\) in Supplementary Table 1 is calculated from the total cell water \({\overline{V}}_{cell}\) minus the observed constant bound water \({\overline{V}}_{b}=0.40\pm 0.04\) ml gCDW^{−1}^{40}. Errors are estimated standard deviations based on error propagation among normally distributed random variables. Supplementary Fig. 3 plots the natural logarithms of μ and ρ. Linear regression over the three available data points results in an estimated slope of 0.66, indistinguishable from our estimate of \(\frac{d\mathrm{ln}\,{\mu }^{* }}{d\mathrm{ln}\,\rho }=\rho {\eta }_{\rho }=0.66\).
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
References
Fisher, R. A. & Bennett, J. H. The Genetical Theory of Natural Selection: A Complete Variorum Edition (Oxford University Press, 1999).
Ibarra, R. U., Edwards, J. S. & Palsson, B. O. Escherichia coli K12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420, 186–189 (2002).
Towbin, B. D. et al. Optimality and suboptimality in a bacterial growth law. Nat. Commun. 8, 14123 (2017).
Campbell, A. Synchronization of cell division. Bacteriol. Rev. 21, 263–272 (1957).
Molenaar, D., van Berlo, R., de Ridder, D. & Teusink, B. Shifts in growth strategies reflect tradeoffs in cellular economics. Mol. Syst. Biol. 5, 323 (2009).
Weiße, A. Y., Oyarzún, D. A., Danos, V. & Swain, P. S. Mechanistic links between cellular tradeoffs, gene expression, and growth. Proc. Nat. Acad. Sci. 112, E1038–E1047 (2015).
Maitra, A. & Dill, K. A. Bacterial growth laws reflect the evolutionary importance of energy efficiency. Proc. Nat. Acad. Sci. 112, 406–411 (2015).
Giordano, N., Mairet, F., Gouzé, J.L., Geiselmann, J. & de Jong, H. Dynamical allocation of cellular resources as an optimal control problem: novel insights into microbial growth strategies. PLOS Comput. Biol. 12, e1004802 (2016).
Kafri, M., MetzlRaz, E., Jona, G. & Barkai, N. The cost of protein production. Cell Reports 14, 22–31 (2016).
Faizi, M., Zavřel, T., Loureiro, C., Červený, J., Steuer, R. A model of optimal protein allocation during phototrophic growth. BioSystems 166, 26–36 (2018).
De Jong, H. et al. Mathematical modelling of microbes: metabolism, gene expression and growth. J. R. Soc. Interface 14, 20170502 (2017).
Watson, M. R. Metabolic maps for the Apple II. Biochem. Soc.Trans. 12, 1093–1094 (1984).
Lewis, N. E., Nagarajan, H. & Palsson, B. O. Constraining the metabolic genotypephenotype relationship using a phylogeny of in silico methods. Nat. Rev. Microbiol. 10, 291–305 (2012).
Goelzer, A. et al. Quantitative prediction of genomewide resource allocation in bacteria. Metab. Eng. 32, 232–243 (2015).
O’Brien, E. J., Lerman, J. A., Chang, R. L., Hyduke, D. R. & Palsson, B. Genomescale models of metabolism and gene expression extend and refine growth phenotype prediction. Mol. Syst. Biol. 9, 693 (2013).
Mori, M., Hwa, T., Martin, O. C., De Martino, A. & Marinari, E. Constrained allocation flux balance analysis. PLOS Comput. Biol. 12, 1–24 (2016).
Chang, A. et al. BRENDA in 2015: exciting developments in its 25th year of existence. Nucl. Acids Res. 43, D439–D446 (2014).
Davidi, D. et al. Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro k _{cat} measurements. Proc. Nat. Acad. Sci. 113, 3401–3406 (2016).
Scott, M., Gunderson, C. W., Mateescu, E. M., Zhang, Z. & Hwa, T. Interdependence of cell growth and gene expression: origins and consequences. Science 330, 1099–1102 (2010).
Hui, S. et al. Quantitative proteomic analysis reveals a simple strategy of global resource allocation in bacteria. Mol. Syst. Biol. 11, e784 (2015).
Goelzer, A. & Fromion, V. RBA for eukaryotic cells: foundations and theoretical developments. Preprint at: https://www.biorxiv.org/content/10.1101/750182v1 (2019).
O’Brien, E. J., Utrilla, J. & Palsson, B. O. Quantification and classification of E. coli proteome utilization and unused protein costs across environments. PLOS Comput. Biol. 12, e1004998 (2016).
Saa, P. A. & Nielsen, L. K. Formulation, construction and analysis of kinetic models of metabolism: a review of modelling frameworks. Biotechnol. Adv. 35, 981–1003 (2017).
Strutz, J., Martin, J., Greene, J., Broadbelt, L. & Tyo, K. Metabolic kinetic modeling provides insight into complex biological questions, but hurdles remain. Curr Opin Biotechnol. 59, 24–30 (2019).
Khodayari, A. & Maranas, C. D. A genomescale Escherichia coli kinetic metabolic model kecoli457 satisfying flux data for multiple mutant strains. Nat. Commun. 7, 13806 (2016).
Heinrich, R. & Schuster, S. The Regulation of Cellular Systems (Springer, 2011).
Schuster, S. & Hilgetag, C. On elementary flux modes in biochemical reaction systems at steady state. J. Biol. Syst. 02, 165–182 (1994).
Wortel, M. T., Peters, H., Hulshof, J., Teusink, B. & Bruggeman, F. J. Metabolic states with maximal specific rate carry flux through an elementary flux mode. FEBS J. 281, 1547–1555 (2014).
Müller, S., Regensburger, G. & Steuer, R. Enzyme allocation problems in kinetic metabolic networks: optimal solutions are elementary flux modes. J.Theor. Biol. 347, 182–190 (2014).
de Groot, D. H., Hulshof, J., Teusink, B., Bruggeman, F. J. & Planqué, R. Elementary growth modes provide a molecular description of cellular selffabrication. Plos Comput. Biol. 16, e1007559 (2020).
Gagneur, J. & Klamt, S. Computation of elementary modes: a unifying framework and the new binary approach. BMC Bioinform. 5, 175 (2004).
Reder, C. Metabolic control theory: a structural approach. J. Theor. Biol. 135, 175–201 (1988).
Dekel, E. & Alon, U. Optimality and evolutionary tuning of the expression level of a protein. Nature 436, 588–592 (2005).
Kleijn, I. T., Krah, L. H. J. & Hermsen, R. Noise propagation in an integrated model of bacterial gene expression and growth. PLOS Comput. Biol. 14, e1006386 (2018).
Liebermeister, W. Optimal metabolic states in cells. Preprint at: https://www.biorxiv.org/content/10.1101/483867v1 (2018).
Liebermeister, W. The value structure of metabolic states. Preprint at: https://www.biorxiv.org/content/10.1101/483891v1 (2018).
Noor, E. et al. The protein cost of metabolic fluxes: prediction from enzymatic rate laws and cost minimization. PLOS Comput. Biol. 12, e1005167 (2016).
Klumpp, S., Scott, M., Pedersen, S. & Hwa, T. Molecular crowding limits translation and cell growth. Proc. Nat. Acad. Sci. 110, 16754–16759 (2013).
Atkinson, D. E. Limitation of metabolite concentrations and the conservation of solvent capacity in the living cell. Current Top. Cell. Regul. 1, 29–43 (1969).
Cayley, S., Lewis, B. A., Guttman, H. J. & Record, M. Characterization of the cytoplasm of Escherichia coli K12 as a function of external osmolarity: implications for proteinDNA interactions in vivo. J. Mol. Biol. 222, 281–300 (1991).
Baldwin, W. W., Myer, R., Powell, N., Anderson, E. & Koch, A. L. Buoyant density of Escherichia coli is determined solely by the osmolarity of the culture medium. Arch. Microbiol. 164, 155–157 (1995).
Bremer, H. & Dennis P. P. Modulation of chemical composition and other parameters of the cell at different exponential growth rates. EcoSal Plus 2008, https://doi.org/10.1128/ecosal.5.2.3 (2008).
Klumpp, S., Zhang, Z. & Hwa, T. Growth ratedependent global effects on gene expression in bacteria. Cell 139, 1366–1375 (2009).
Basan, M. et al. Inflating bacterial cells by increased protein synthesis. Mol. Syst. Biol. 11, 836 (2015).
Schmidt, A. et al. The quantitative and conditiondependent Escherichia coli proteome. Nat. Biotechnol. 34, 104–110 (2015).
Dai, X. et al. Reduction of translating ribosomes enables escherichia coli to maintain elongation rates during slow growth. Nat. Microbiol. 2, 16231 (2016).
MetzlRaz, E. et al. Principles of cellular resource allocation revealed by conditiondependent proteome profiling. eLife 6, e28034 (2017).
Dourado, H., Maurino, V. G. & Lercher, M. J. Enzymes and substrates are balanced at minimal combined mass concentration in vivo. Preprint at: https://www.biorxiv.org/content/10.1101/128009v1 (2017).
Ingraham, J. L., Maaløe, O. & Neidhardt, F.C. Growth of the Bacterial Cell (Sinauer Associates Inc, 1983).
Cayley, D. S., Guttman, H. J. & Record, M. T. Biophysical characterization of changes in amounts and activity of Escherichia coli cell and compartment water and turgor pressure in response to osmotic stress. Biophys. J. 78, 1748–1764 (2000).
de Figueiredo, L. F. et al. Computing the shortest elementary flux modes in genomescale metabolic networks. Bioinformatics 25, 3158–3165 (2009).
Beg, Q. K. et al. Intracellular crowding defines the mode and sequence of substrate uptake by escherichia coli and constrains its metabolic activity. Proc. Nat. Acad. Sci. 104, 12663–12668 (2007).
Holzhütter, H. G. The principle of flux minimization and its application to estimate stationary fluxes in metabolic networks. Eur. J. Biochem. 271, 2905–2922 (2004).
Basan, M. et al. Overflow metabolism in Escherichia coli results from efficient proteome allocation. Nature 528, 99–104 (2015).
Nilsson, A., Nielsen, J. & Palsson, B. O. Metabolic models of protein allocation call for the kinetome. Cell Syst. 5, 538–541 (2017).
Hackett, S. R. et al. Systemslevel analysis of mechanisms regulating yeast metabolic flux. Science 354, aaf2786 (2016).
Borger, S., Liebermeister, W. & Klipp, E. Prediction of Enzyme Kinetic Parameters Based on Statistical Learning. Genome Inf. 17, 80–87 (2006).
Heckmann, D. et al. Machine learning applied to enzyme turnover numbers reveals protein structural correlates and improves metabolic models. Nat. Commun. 9, 5252 (2018).
Lubitz, T., Schulz, M., Klipp, E. & Liebermeister, W. Parameter balancing in kinetic models of cell metabolism. J. Phys. Chem. B 114, 16298–303 (2010).
Kaltenbach, H.M. & Stelling, J. Modular Analysis of Biological Networks. In Analysis of Biological Networks, Vol. 736 of Wiley Series on Bioinformatics: Computational Techniques and Engineering (eds Junker, B. H. & Schreiber, F.), 3–17 (John Wiley & Sons, Inc., Hoboken, NJ, USA, 2008).
Zhuang, K., Vemuri, G. N. & Mahadevan, R. Economics of membrane occupancy and respirofermentation. Mol. Syst. Biol. 7, 500 (2011).
Benyamini, T., Folger, O., Ruppin, E. & Shlomi, T. Flux balance analysis accounting for metabolite dilution. Genome Biol. 11, R43 (2010).
de Groot, D. H., van Boxtel, C., Planqué, R., Bruggeman, F. J. & Teusink, B. The number of active metabolic pathways is bounded by the number of cellular constraints at maximal metabolic rates. PLOS Comput. Biol. 15, e1006858 (2019).
de Groot, D. H., et al. The common message of constraintbased optimization approaches: overflow metabolism is caused by two growthlimiting constraints. Cell. Mol. Life Sci. 77, 441–453 (2020).
Afriat, S. Theory of maxima and the method of lagrange. SIAM J. Appl. Math. 20, 343–357 (1971).
Forchhammer, J. & Lindahl, L. Growth rate of polypeptide chains as a function of the cell growth rate in a mutant of Escherichia coli 15. J. Mol. Biol. 55, 563–568 (1971).
Dennis, P. P. Regulation of ribosomal and transfer ribonucleic acid synthesis in Escherichia coli b/r. J. Biol. Chem. 247, 2842–2845 (1972).
SantosZavaleta, A. et al. EcoCyc: fusing model organism databases with systems biology. Nucl. Acids Res. 41, D605–D612 (2012).
R Core Team R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.Rproject.org/ (2017).
Gilchrist, M. A. & Wagner, A. A model of protein translation including codon bias, nonsense errors, and ribosome recycling. J. Theor. Biol. 239, 417–434 (2006).
Karpinets, T. V., Greenwood, D. J., Sams, C. E. & Ammons, J. T. RNA:protein ratio of the unicellular organism as a characteristic of phosphorous and nitrogen stoichiometry and of the cellular requirement of ribosomes for protein synthesis. BMC Biol. 4, 30 (2006).
Lange, H. C. & Heijnen, J. J. Statistical reconciliation of the elemental and molecular biomass composition of Saccharomyces cerevisiae. Biotechnol. Bioeng. 75, 334–344 (2001).
Warner, J. R. The assembly of ribosomes in yeast. J. Biol. Chem. 246, 447–454 (1971).
Saha, S. K. & Chakraburtty, K. Protein synthesis in yeast. isolation of variant forms of elongation factor 1 from the yeast Saccharomyces cerevisiae. J. Biol. Chem. 261, 12599–12603 (1986).
Jeppesen, M. G. et al. The crystal structure of the glutathione stransferaselike domain of elongation factor 1b from Saccharomyces cerevisiae. J. Biol. Chem. 278, 47190–47198 (2003).
Algire, M. A. et al. Development and characterization of a reconstituted yeast translation initiation system. RNA 8, 382–397 (2002).
Sherman, F. Getting Started with Yeast. In Guide to Yeast Genetics and Molecular and Cell Biology  Part B, Vol. 350 of Methods in Enzymology (eds Guthrie, C. & Fink, G. R.), 3–41 (Academic Press, 2002).
Acknowledgements
We thank Johannes Berg, Oliver Ebenhöh, Daan de Groot, XiaoPan Hu, Terry Hwa, Michael Lässig, Wolfram Liebermeister, Elad Noor, and Deniz Sezer for discussions. We thank XiaoPan Hu for help with the translation model and the calculation of active ribosome fractions, and Jin Wang for help with assembling the yeast parameters. This work was funded by the German Academic Exchange Service (DAAD) through a fellowship (IRTG 1525) to H.D. and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through grants IRTG 1525, CRC 680, CRC 1310, and, under Germany’s Excellence Strategy, through grant EXC 2048/1 (Project ID: 390686111).
Author information
Authors and Affiliations
Contributions
H.D. and M.J.L. jointly conceived the study, interpreted the results, and wrote the manuscript. H.D. developed the GBA framework, performed all data analyses, and derived all formal results except Theorems 1–3, which were derived by MJL.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Dourado, H., Lercher, M.J. An analytical theory of balanced cellular growth. Nat Commun 11, 1226 (2020). https://doi.org/10.1038/s4146702014751w
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s4146702014751w
This article is cited by

Enzyme expression kinetics by Escherichia coli during transition from rich to minimal media depends on proteome reserves
Nature Microbiology (2023)

Shaping bacterial gene expression by physiological and proteome allocation constraints
Nature Reviews Microbiology (2022)

Understanding and mathematical modelling of cellular resource allocation in microorganisms: a comparative synthesis
BMC Bioinformatics (2021)

The two languages of science
Genome Biology (2020)

The protein translation machinery is expressed for maximal efficiency in Escherichia coli
Nature Communications (2020)
Comments
By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.