An analytical theory of balanced cellular growth

The biological fitness of microbes is largely determined by the rate with which they replicate their biomass composition. Mathematical models that maximize this balanced growth rate while accounting for mass conservation, reaction kinetics, and limits on dry mass per volume are inevitably non-linear. Here, we develop a general theory for such models, termed Growth Balance Analysis (GBA), which provides explicit expressions for protein concentrations, fluxes, and growth rates. These variables are functions of the concentrations of cellular components, for which we calculate marginal fitness costs and benefits that are related to metabolic control coefficients. At maximal growth rate, the net benefits of all concentrations are equal. Based solely on physicochemical constraints, GBA unveils fundamental quantitative principles of cellular resource allocation and growth; it accurately predicts the relationship between growth rates and ribosome concentrations in E. coli and yeast and between growth rate and dry mass density in E. coli.

T he defining feature of life is self-replication. For noninteracting unicellular organisms in constant environments, the rate of this self-replication is equivalent to their evolutionary fitness 1 : fast-growing cells outcompete those growing more slowly. Accordingly, we expect that natural selection favoring fast growth in specific environments has played an important role in shaping the physiology of many microbial organisms 2,3 .
Conceptually, we can envision a bacterial cell as a volume enclosed by a membrane, filled with a solution of metabolites and of the proteins and nucleic acids that catalyze their conversion into biomass. A state of the cell is characterized by the molecular concentrations, which in turn determine the fluxes of the biochemical reactions through kinetic rate laws. The boundary conditions limiting the concentrations and fluxes are provided by the environment and by physicochemical constraints. Cellular growth has to be balanced over the cell cycle, i.e., all cellular components must be produced in proportion to their abundances 4 . Casting these constraints into a mathematical model and characterizing states of optimal growth may provide a detailed understanding of central aspects of bacterial physiology 3,5-10 .
Molenaar et al. 5 proposed a small, schematic model of balanced, self-replicating growth with explicit non-linear reaction kinetics and at most seven reactions, including the production of catalytic proteins. Numerical growth rate optimization predicted qualitatively the growth-rate dependencies of cellular ribosome content, cell size, and the emergence of overflow metabolism. We term this general modeling scheme Growth Balance Analysis (GBA). No extensions of this approach to larger models have been proposed, likely because of its inherent non-linearity and the resulting difficulty of numerical optimizations. Instead, even simpler, linear models of 1-3 reactions were solved analytically to gain further qualitative understanding of systems-level effects 3,6-9 , including optimal gene regulation strategies 3,8 .
Models for the genome-scale physiology of complete cells are typically formulated as approximations to GBA 11 . Currently, the most popular such method is flux balance analysis (FBA) 12,13 . FBA maximizes the production rate of a constant biomass concentration vector while accounting for mass conservation by balancing the fluxes producing and consuming internal metabolites (Fig. 1). All constraints in FBA are linear. The resulting computational efficiency comes at the price of ignoring reaction kinetics and the requirement of sufficient enzyme concentrations to catalyze the predicted metabolic fluxes. FBA can be viewed as a linearization of the GBA scheme 11 . Figure 1 shows a schematic comparison of FBA and GBA. While FBA predicts a linear dependence of maximal growth rate on nutrient uptake fluxes, GBA leads to a non-linear (Monod-type) dependence on nutrient concentrations.
Most alternative whole-cell modeling schemes [14][15][16] are generalizations of FBA and are also based on the optimization of a cellular objective, which is typically set to the cellular growth rate or a proxy thereof. Like GBA, resource balance analysis (RBA) 14 and genome-scale models of metabolism and gene expression (ME) 15 combine a genome-scale metabolic model (as utilized in FBA, Fig. 1) with a protein translation apparatus that converts precursors into protein. While RBA models are formulated at a level of detail typical for FBA models, ME models aim to account comprehensively for all growth-related cellular processes, including, for example, chaperone-assisted protein folding. Contrary to GBA, both methods do not account for metabolite concentrations and assume a linear relationship between fluxes and protein abundances. ME models typically assume constant effective rate constants for reactions, which are set to in vitro 17 or in vivo 18 estimates of turnover numbers (k cat ). RBA instead uses phenomenological, growth-rate dependent effective kinetic rate constants. These are modeled as linear functions of the growth rate, and parameters are obtained by fitting model-predicted fluxes to proteomics data. Constraint allocation flux balance analysis (CAFBA) 16 is conceptually similar to RBA and ME, but describes the protein costs of biochemical reactions through previously discovered phenomenological growth laws 19,20 .
These previous modeling schemes can be considered as approximations to GBA 11 that go beyond FBA by including the protein cost of biochemical fluxes, but that ignore the influence of metabolite concentrations on reaction kinetics and the costs  ) for a simple schematic model. A nutrient G is taken up through a transporter t at rate v t and is then converted by an enzyme e with rate v e into a precursor for protein synthesis, AA. In FBA, AA is equated with the biomass, the production of which is maximized while enforcing the stationarity of internal concentrations (blue); this leads to a linear dependence of growth rate on uptake flux. In GBA, AA is converted further into total protein P by a ribosome R, where P represents the sum of the three proteins (t, e, R). GBA maximizes the balanced production of the cellular composition with growth (blue), offsetting the dilution of the cellular components (G, AA, P) with the growth rate μ indicated by the blue arrows. The reaction fluxes are constrained by non-linear reaction kinetics (red) and a limit on cellular density (dry mass per volume, gray); this leads to a non-linear dependence of growth rate on nutrient concentrations. incurred through their dilution by growth. Genome-scale implementations of RBA, ME, and CAFBA for model organisms [14][15][16]21 have been shown to predict some macroscopic phenotypic behavior 14,22 . However, the predicted investment into individual proteins appears to be highly inaccurate, possibly because enzyme kinetics are only treated approximately and metabolite concentrations are not accounted for. Moreover, these methods cannot facilitate a full understanding of phenotypic behavior from basic biochemical and biophysical constraints, and thus do not provide mechanistic insights at the level that would be possible with fully parameterized, genome-scale GBA models. Due to the central role of kinetic rate laws in GBA, GBA is also closely related to kinetic modeling approaches of cellular metabolism and growth 23,24 . Like GBA, kinetic models implement the mass balance of biochemical reactions while accounting for the dependence of enzyme kinetics on protein and metabolite concentrations. In contrast to GBA and the alternative modeling schemes discussed so far, kinetic modeling approaches do not assume optimality, but simply describe the (steady state) distribution of fluxes and metabolite concentrations resulting from known enzyme concentrations and the kinetic rate laws. However, in vitro kinetic parameters (as reflected in databases such as BRENDA 17 ) are very incomplete, and estimates were often made in different experimental settings and are thus not always consistent with each other 23 and with in vivo data 18 . For this reason, enzymatic rate laws in kinetic modeling algorithms are typically parameterized by a fitting procedure that minimizes the discrepancy between model predictions and experimental data (e.g., metabolic fluxes or metabolite concentrations measured across multiple conditions or mutants) 23,24 . Different approaches to kinetic modeling differ from each other in their representation of enzymatic rate laws and in the algorithm used to fit the corresponding parameters. While such fitted parameterizations can lead to accurate predictions of overall cellular physiology, they may show little or no correspondence to experimentally determined kinetic parameters 25 . Moreover, kinetic models typically need to account for substrate-level regulatory interactions to result in realistic predictions 23 .
Below, we develop the mathematical foundations for GBA of arbitrarily complex cellular systems. We first describe the constraints that characterize states of balanced growth, and we define elementary growth states (EGSs) by referring to the elementary flux modes (EFMs) of metabolic pathway analysis 26 and FBA. We then show that the reaction fluxes, individual protein concentrations, and growth rate of any EGS are uniquely determined by the set of active reactions and the total cellular protein and individual reactant concentrations. We show how this theoretical framework can be used to understand cellular resource allocation conceptually, and we demonstrate how to analyze specific subsystems for which systems-level effects cancel mathematically.

Results
Modeling balanced exponential growth. Our model assumes that the cell increases exponentially in size, while the concentrations of all cellular components (including the number of membrane constituents per cell volume) remain constant 5 . We do not explicitly model cell division; thus, our model can also be interpreted as describing the growth of a population of cells 8 . In balanced growth, the net production rate of each molecular constituent must balance its dilution by growth, 0 ¼ dx dt production À μx, where x denotes the concentration of a given component and μ is the cellular growth rate 5,8 . The mass conservation in chemical reaction networks is commonly described through a stoichiometric matrix N, where rows correspond to metabolites and each column describes the mass balance of one reaction 26 . Here, we focus on matrices A of active reactions, i.e., A is a sub-matrix of N that contains all columns j for reactions with flux v j ≠ 0 and all rows for reactants i involved in these reactions. A also includes a "ribosome" reaction to produce catalytic proteins, encompassing enzymes, transporters, and the ribosome itself. We express concentrations as mass concentrations (mass per volume); accordingly, the entries of A are not stoichiometric coefficients but are mass fractions. The mass conservation of each component can then be stated as where v is the flux vector (in units of [mass][volume] −1 [time] −1 ), a is the vector of reactant mass concentrations a α , and P is the sum of the mass concentrations p j of all proteins j ∈ {1, …, n}, The first row of A describes the net production of total protein P, which is then distributed among the individual proteins j. The remaining rows describe the net production of the reactants α.
Each reaction rate v j is the product of the concentration of its catalyzing protein p j and a kinetic function k j (a) that depends on the reactant concentrations a α , We assume that the functional form and kinetic parameters of k j (a) are known. k j (a) may depend on the mass concentrations of substrates, products, and other molecules a α acting as inhibitors or activators, and accounts for the system's thermodynamics. The activity of all reactions j represented in A (v j ≠ 0) implies p j > 0 and k j (a) ≠ 0. Below, we treat the concentrations of total protein P and individual reactants a α as the state variables of the system, and we show that the fluxes v j , individual protein concentrations p j , and growth rate μ can be cast as response variables. For a given concentration vector [P, a] T , we define a balanced growth state (BGS) as a cellular state (characterized by its flux vector v) that satisfies constraints (1), (2) and (3). The set of all such states forms the solution space of balanced growth. On the following pages, we first develop a framework for GBA by characterizing BGSs at a fixed concentration vector [P, a] T . These characterizations are independent of any physicochemical limits on the concentrations of the cellular components (density constraints); such constraints will, however, become crucial once we examine optimal balanced growth across all feasible concentration vectors. In the main text, we provide an overview over the mathematical structure of GBA and its implications; the formal definitions and theorems are detailed in "Methods", while Supplementary Table 2 lists the symbols used. Cellular state defined by the concentration variables. Let v be a BGS at concentration vector y 0 ¼ ½P 0 ; a 0 T . If we treat y 0 as a constant, then Eq. (1) is mathematically identical to the steadystate constraint fundamental to FBA and to metabolic pathway analysis in general 26 . We call v an EGS if v also represents an EFM 27 of the corresponding FBA-type problem defined by the mass-normalized stoichiometric matrix A together with a "biomass reaction" described by y 0 and the flux directions enforced by the signs of the kinetic functions k j (a 0 ) (i.e., v is a feasible flux vector with minimal support under the FBA-type constraints; "Methods", Definition 3). We can express any BGS as a weighted average of EGSs at the same concentration vector [P, a] T (Theorem 3). Moreover, any optimal BGS under a single cellular density constraint (see below) is also an EGS (Theorem 9 based on refs. 28,29 for EFMs; see also ref. 30 ).
Thus, if we characterize the mathematical properties of EGSs, then these properties apply not only to optimal BGSs-which are the main focus of this work-but also to the individual EGS in a decomposition of any BGS. If A is the active stoichiometric matrix of an EGS, it has full column rank (Theorem 4 based on ref. 31 ; see also ref. 30 ). The full column rank is the only property of EGSs that we will require below. Accordingly, without much loss of generality, we focus on active matrices A that have full column rank for the remainder of this article.
The matrix A may have more rows than columns, in which case some reactant concentrations in a are linearly dependent on other concentrations 32 . The dependent concentrations c are not free variables, and hence they can be put aside and dealt with separately. For clarity of presentation, we here present only the case without dependent reactants; the generalization to BGSs with dependent reactants can be treated similarly and is detailed in "Methods".
Without dependent reactants, A is a square matrix with a unique inverse A −1 , and x ≡ [P, a] T is the corresponding vector of independent concentrations. Multiplying both sides of the mass balance constraint (1) by A −1 , we obtain (Theorem 5) The right-hand side of the mass balance constraint (1) quantifies how much of each component x i needs to be produced to offset the dilution that would otherwise occur through the exponential volume increase at rate μ. A À1 ji quantifies the proportion of flux v j invested into offsetting the dilution of component i, and we thus name A −1 the investment (or dilution) matrix; see Supplementary  Fig. 1 for examples. In contrast to the mass-normalized stoichiometric matrix A, which describes local mass balances, A −1 describes the structural allocation of reaction fluxes into offsetting the dilution of all downstream cellular components, carrying global, systems-level information.
From the kinetic equation (Eq. (3)), p j = v j /k j (a), and inserting v j from the investment equation (Eq. (4)) gives where ∑ i sums over the total protein and individual reactant concentrations (Theorem 6). Substituting these expressions into the total protein sum (Eq. (2)) and solving for μ results in the growth equation (Theorem 7) As detailed in "Methods" (Theorems 5-7), a corresponding result also holds for BGSs with dependent reactants. Thus, for any active matrix A with full column rank (in particular for all active matrices of EGSs) and for any corresponding concentration vector x, there are unique and explicit mathematical solutions for the fluxes v, individual protein concentrations p, and growth rate μ. If μ (Eq. (6)) and all individual protein concentrations p j (Eq. (5)) are positive, the cellular state is a BGS; otherwise, no balanced growth is possible at these concentrations.
Marginal fitness contributions of cellular concentrations. We now use these relationships to calculate the costs and benefits of concentration changes, which are naturally expressed in terms of relative fitness effects. As above, the main text considers the simpler case without dependent reactants, while the more general case is treated in "Methods". If fitness is determined predominantly by growth rate 1 (Supplementary Note 1), we can we define the marginal net benefit η i of concentration x i as the relative change in growth rate 33 due to a small change in x i ("Methods", Definition 4), for example, η P = η ATP = 0.01 l mg −1 would indicate that an increase of either total protein or ATP concentration by 1 mg l −1 -if possible-would increase the growth rate by 1%.
To aid in the interpretation of η i below, we define the marginal production cost incurred by the system via protein j as a consequence of increasing concentration x i at fixed growth rate μ and kinetics k j , where the second equality follows from Eq. (5). q j i quantifies by how much the concentration p j of the upstream protein j has to rise in order to offset the increased dilution of the downstream concentration x i . q j i is related to the protein control coefficient of metabolic control analysis (MCA); see Supplementary Note 3 for a more detailed summary of the relationship between GBA and MCA [34][35][36] .
Taking the partial derivatives of the growth equation (Eq. (6)) with respect to P and the concentration a α of reactant α, respectively, we find that the marginal net benefits according to Eq. (7) can be expressed as (Theorem 8) where the last equation is derived using p j = v j /k j . u j α can be interpreted as the marginal kinetic benefit 37 of reactant α to reaction j and quantifies the proportion of protein p j "saved" due to the change in kinetics associated with an increase in a α . The kinetic benefit u j α is a strictly local effect, as it is zero if a α does not influence the kinetic function k j (a); we expect u j α to be positive if α is a substrate and negative if α is a product of reaction j. u j α relates directly to the elasticity coefficients of MCA (Supplementary Note 3). Because fluxes are proportional to the concentrations of the catalyzing proteins, the marginal kinetic benefit of total protein is simply 1/P. Expressions that additionally account for dependent reactants are provided in "Methods".
As seen from the derivation in "Methods", applying the chain rule of differentiation to the growth equation (Eq. (6)) further provides a simple interpretation of the net benefit of component i via reaction j (see "Marginal fitness benefits and costs" in "Methods"; note that because here we assume that there are no dependent reactants, direct and total net benefits as defined in "Methods" are identical). The derivation shows that the marginal net benefit is identical to the reduction of the proteome fractions ϕ j ≡ p j /P facilitated by the increase in x i at constant μ, Thus, for a positive η i and keeping the growth rate μ constant, a small increase in x i by Δx i results in a corresponding reduction of the total proteome fraction, ∑ j Δϕ j = −η i Δx i : at least some proteins are now required at lower concentrations. This result provides a formal justification for the widely held notion that cellular costs lie predominantly in protein production 3,[5][6][7][8][9]14,15,19,20,37,38 .
Optimal growth and the balance of marginal net benefits. Up to this point, we kept x = [P, a] T fixed. We will now characterize optimal growth states, i.e., BGSs with maximal growth rate across all allowed concentration vectors x. To make this problem well defined, we need to consider an additional constraint that reflects the cellular requirement for a minimal amount of free water to facilitate diffusion 39,40 . We implement this constraint by assuming that cellular dry weight per volume is limited to a maximal density ρ, where ρ is determined by external osmolarity 40,41 but is otherwise constant across growth conditions [42][43][44] , A BGS is a density-constrained BGS (dBGS) if it additionally satisfies constraint (9). At maximal growth rate, the cellular components will utilize the full cellular limit on density to saturate enzymes with their substrates, and thus the inequality in Eq. (9) becomes an equality.
The maximal balanced growth rate μ * will be a function of ρ. In analogy to the marginal net benefits of cellular components, we define the marginal benefit of the cellular density as the relative fitness increase facilitated by a small increase in ρ, Using the method of Lagrange multipliers with the growth equation (Eq. (6)) as the objective function, we derive necessary conditions at optimal growth, which we term balance equations: (Theorem 10). Again, the presentation here assumes that there are no dependent reactants, while a corresponding result is derived for the general case with dependent reactants in "Methods" ("Optimal density-constrained balanced growth states"). Both with and without dependent reactants, the optimal state is perfectly balanced: the marginal net benefits of all independent cellular concentrations x i are identical. Thus, if the dry weight density ρ could increase by a small amount (such as 1 mg l −1 ), then the marginal fitness gain that could be achieved by increasing protein concentration by this amount is identical to that achieved by instead increasing the concentration of any reactant α by the same amount. This should not be surprising: if the marginal net benefit of concentration x i was higher than that of x i 0 , growth could be accelerated by increasing x i at the expense of x i 0 . Equation (10) together with Eq. (9) describes a system of n + 1 equations for n + 1 unknowns, the independent concentrations x i . In realistic cellular systems, this set of equations has a finite number of discrete solutions. Thus, growth rate optimization can be replaced by searching for the solution of the balance equations. If the optimization problem is convex, the conditions given by Eq. (10) are necessary and sufficient, and the solution is unique.
Quantitative predictions. If a substrate α 0 is consumed only by a single reaction that is the only one producing a product i 0 (with i 0 2 fP; αg), the non-local dilution terms in the balance equation (η α 0 ¼ η i 0 ) cancel, and we are left with a local problem for which only the production cost of x i 0 and the kinetic benefits of a α 0 and x i 0 must be considered. This is the case for protein production in simplified models 38 where the ribosome (R) produces proteins from a single substrate, a generic ternary complex (T). In such models, we can calculate the optimal proteome fraction of actively translating ribosomes, ϕ R ≡ p R /P, from the balance equation η T = η P (Eq. (10) and its generalization in Theorem 10). The predictions agree quantitatively with experimental values in E. coli 45,46 and the yeast Saccharomyces cerevisiae 47 across a wide range of growth conditions (Fig. 2).
In contrast to previous approaches based on the analysis of schematic, linear cell models with 2-3 reactions and largely arbitrary kinetic parameters [6][7][8][9] , our predictions of the scaling of active ribosome fractions with growth rate ( Supplementary Fig. 2) are both quantitative and general, as they rely only on the known stoichiometries and kinetics of the ribosome reactions themselves and are independent of any particular network structure. An approximation that ignores the dilution of intermediates and hence the associated production costs (q j α % 0) results in less  Fig. 4). In contrast, these approximate predictions are close to observed values for growth on minimal media (μ < 1 h −1 ), indicating that the dilution of intermediates, μa α , becomes less important at lower growth rates. The latter observation may explain why the relationship between the concentrations of a substrate and its catalysts is well approximated in this regime by simply minimizing their combined mass concentration while keeping the reaction rate constant 48 , as this is mathematically equivalent to ignoring the dilution of intermediates.
To obtain a rough quantitative estimate of the marginal net benefits η i , we here consider the simplest model of a complete cell, consisting of only a transport protein and the ribosome 3,7 ( Supplementary Fig. 2). This model is structurally very similar to previously analyzed schematic whole-cell models. However, contrary to previous models that assumed a fixed total protein concentration as the only density constraint 3,5-9,19 , our model's density constraint (9) limits the joint mass concentration of proteins and reactants. Based on the experimentally observed proteome fraction of total dry weight in E. coli, P/ρ = 0.55 49 (42)). Thus, a decrease in cellular dry weight density ρ of 1% would lead to a 0.66% reduction in growth rate, emphasizing the biological significance of the density constraint and potentially explaining why E. coli's dry mass density appears to be roughly constant across conditions [42][43][44] .
The cellular density ρ changes when external osmolarity is modified 40 . ρη ρ ¼ dln μ dln ρ is the slope of the log-log-scale plot of μ vs. ρ across different external osmolarities. While increases in ρ may have strong effects on diffusion and thus on enzyme kinetics, reductions in ρ due to decreased external osmolarity are within the scope of our model. The very limited available experimental data (three data points from ref. 50 , Supplementary Fig. 3) suggest ρη ρ ≈ 0.66, the same as our rough estimate from the minimal cell model. An otherwise identical model that limits total protein density 3,5-9,19 P instead of dry mass density predicts a much weaker dependency of growth rate on osmolarity, with Pη P = 0.36 ("Methods").

Discussion
At the heart of our mathematical derivations is A −1 , the inverse of the mass-normalized active stoichiometric matrix A of any given EGS (or, more generally, any given BGS with linearly independent reactions). A −1 provides important information on the cellular efficiency. As seen from Eq. (4), A À1 ji quantifies which proportion of reaction flux v j is required to offset the dilution of the downstream cellular component i (either total protein P or reactant α). These non-local, structural massbalance constraints lead to an explicit dependence of reaction fluxes on the cellular concentrations (Eq. (4), Theorem 5). Independently of this, fluxes also depend on concentrations through reaction kinetics (constraint (3)). Combining these two relationships leads to explicit expressions for the individual protein concentrations p j and for the growth rate μ, casting them as functions of the concentrations x = [P, a] T . Accordingly, A −1 accounts for all systems-level contributions to the marginal costs and benefits of cellular concentrations x i , while the kinetic functions k j (a) account for local effects. The insight that optimal, density-constrained states of balanced growth are EGS allowed us to derive the balance equations (Eq. (10)); furthermore, as any BGS can be expressed as a weighted average of EGSs (Theorem 3), our results allow a general characterization of the solution space of balanced growth.
While computational limitations restricted previous studies of balanced growth to specific models with 2-7 reactions, we here provide general results for arbitrarily complex cellular systems. Except for the maximal cellular dry weight density constraint (9), the balanced growth model proposed by Molenaar et al. 5 and utilized subsequently for the analysis of schematic models 3,6-10 is based on assumptions identical to those made for GBA, constraints (1), (2) and (3). Previous authors (with the exception of Faizi et al. 10 ) assumed a limit on total protein ("macromolecular") concentrations, while we assume a joint limitation of all cellular solutes (Eq. (9)). The latter choice is justified by the approximate constancy of the cellular dry mass density across growth conditions [42][43][44] , and by an observed relationship between enzyme and substrate concentrations that is consistent with natural selection on the parsimonious use of a limited dry mass density 48 .
To make the presentation concise, our development of GBA assumes (i) that all proteins contribute to growth by acting as catalysts or transporters; (ii) that there is a 1-to-1 correspondence between proteins and reactions; (iii) that proteins are not used as reactants; (iv) that all catalysts are proteins; and (v) that cells are optimized for growth. Supplementary Note 2 outlines how to remove these simplifications.
Due to the explicit inclusion of the major physicochemical constraints on cellular growth, GBA models promise to provide a mechanistic understanding of microbial resource allocation and physiology at a depth not achievable with alternative optimization-based models. In principle, exploitation of the balance equations (Eq. (10)) may allow the numerical optimization of cellular systems of realistic size, encompassing hundreds of protein and reactant species. However, several challenges must be overcome before GBA models can be used to make detailed quantitative predictions of genome-scale resource investment and physiology.
The first challenge is the identification of the set of active reactions in a given cellular state, leading to the active stoichiometric matrix A. The optimal state is an EFM of the linearized problem (Theorem 9), and thus a direct way to achieve this would be to compute all EFMs of the full stoichiometric matrix compatible with balanced growth (i.e., all support-minimal subsets of reactions that are capable of producing their own reactants plus protein), to apply GBA to each of them, and to then select the EFM resulting in the highest growth rate. While this approach works well for small, schematic models as those in refs. 3,5-9 and may be feasible for coarse-grained models with a few dozen reactions, the number of biomass-producing EFMs in genomescale networks is too large for them to be calculated exhaustively on current computers 51 . As an approximate alternative, one could restrict this analysis to a subset of candidate EFMs, e.g., based on FBA with molecular crowding 52 and on parsimonious FBA 53 (where fluxes could be scaled by the maximal enzyme turnover rates, k cat ) or chosen to represent known physiological states (e.g., yield-maximizing vs. overflow metabolism 54 ), or one might analyze only EFMs with a pre-specified maximal number of reactions 51 .
A further obstacle to the accurate formulation of GBA models is the current incompleteness of knowledge on the kinetic rate laws and parameters needed for the functions k j (a), the same problem which hampers large-scale kinetic modeling applications 23,24,55 and (with respect to the effective turnover numbers) RBA 14 and ME 15,22 models. Recent developments of high-throughput assays for their estimation from -omics data have led to promising results 14,18,56 , suggesting that such approaches may lead to a comprehensive kinetic characterization of model microbes in the future. In parallel, methods from artificial intelligence have been shown to predict enzyme kinetic parameters with reasonable accuracy 57,58 , suggesting that these approaches can augment incomplete in vitro or in vivo parameter sets. Parameter balancing 59 could aid in the completion of a given set of kinetic constants by exploiting the thermodynamic dependencies among biochemical quantities 37 . In addition, GBA parameterizations could be completed similarly to the parameterization of kinetic models 23,24 , by fitting model predictions to experimental data acquired across growth conditions. Experience with kinetic models indicates that high predictive power can frequently be achieved with large uncertainties in the parameter sets 23,25 , suggesting that even approximate GBA parameterizations may already lead to valuable insights. Finally, finding the optimal state of a genome-scale GBA model requires the numerical solution of a large non-linear optimization problem. The system of n + 1 equations provided by Eqs. (9) and (10) represents the necessary conditions for optimal growth, and these are important ingredients for developing efficient algorithms to solve the problem.
Although explicit, genome-scale GBA models are built on the same kinetic rate laws as kinetic modeling approaches, their optimization-based methodology does not require enzyme concentrations as model inputs and will likely be more robust to inaccurate kinetic representations. Importantly, GBA will also be much more robust to the omission of regulatory effects of reactants, as these result in additional protein costs but will in most cases have no major influence on the predicted fluxes. On the other hand, kinetic models can be used to assess the cellular response to genetic or environmental perturbations and can utilize mutant data for their parameterization. This is not possible with optimization-based models such as GBA, as they assume that cellular resource allocation in the modeled state is optimal with respect to a known objective function, the balanced growth rate in the case of GBA.
While several challenges have to be met before GBA can be applied to genome-scale balanced growth models, the present work establishes a comprehensive formal basis for such applications. Importantly, this mathematical framework can immediately be applied to the systematic analysis of schematic models, such as those examined in earlier work using numerical methods 5,10 or ad-hoc analytical optimizations 3,6-9 . Moreover, the analytic formulations developed here facilitate the straight forward application of GBA to coarse-grained cellular models of increasing complexity, parameterized from experimental data 19,20,60 .
Independent of model details and parameterizations, our mathematical analysis provides general quantitative insights into cellular resource allocation and physiology in states of balanced growth. For example, while previous work has emphasized the central role of proteins in the cellular economy 3,[5][6][7][8][9]14,15,19,20,37,38 , Eq. (8) provides a rigorous formal justification for this notion in the context of balanced growth. At the same time, whereas the total protein mass concentration P is much higher than the mass concentration of any other cellular constituent a α in most biological systems, the balance equations show that their marginal net benefits are in fact equal at optimal growth.
The application and further development of the GBA theory may foster an enhanced theoretical understanding of how physicochemical constraints determine the fitness costs and benefits of cellular organization. Moreover, the explicit expressions for the marginal fitness costs and benefits of cellular concentrations provide a rigorous framework for a quantitative analysis of the cellular economy. We anticipate that this approach will prove fruitful not only in the interpretation of natural and laboratory evolution, but also in optimizing the design of synthetic biological systems.

Methods
Overview. In the first four sections of "Methods", we provide a formal description of Growth Balance Analysis (GBA), detailing the formal definitions, theorems, and proofs that form the basis of the main text. For simplicity of notation, we use the following conventions: {α} is the set of all reactants in the active stoichiometric matrix A, and ∑ α indicates that we sum over all α ∈ {α}. We use corresponding notations for the sets of independent basis reactants {β}, with concentrations b β , and dependent reactants {γ}, with concentrations c γ (see below). As explained in Definition 1 below, stoichiometric matrices are always in units of mass fractions, not stoichiometric coefficients. The last two sections describe the calculations of the optimal ribosome proteome fractions and the dependence of maximal growth rate on cellular water content.
Characterization of balanced growth states. First, we introduce the fundamental definitions that characterize the solution space of balanced cellular growth. We define BGSs and generalize the concept of EFMs from linear constraint-based models to EGSs (defined as flux vectors). We then introduce several theorems on the characterization and decomposition of BGSs.
In the formulation presented here, we assume that proteins do only act as catalysts and not as substrates of reactions. Hence, neither total protein nor individual proteins are considered "reactants". , where protein j catalyzes reaction j; for simplicity, we assume that the "ribosome" catalyzing protein production is also itself a protein (but see Supplementary Note 2 for how to remove this simplification). Let k(a) be a vector of kinetic functions, k : Then v is a balanced growth state (BGS) at growth rate μ if and only if it fulfills the following three constraints: v j ¼ p j k j ðaÞ ð12Þ A BGS v at growth rate μ is a density-constrained BGS (dBGS) if it additionally fulfills the constraint on total dry mass density Constraint (11) implements mass balance, constraint (12) implements concentration-dependent reaction kinetics, while constraint (13) implements a constraint on the total proteome concentration. The kinetic constraint (12) assumes that the flux through each reaction is linear in the concentration of the catalyzing enzyme, while the dependence on the reactant concentrations a α will typically be non-linear. For simplicity of notation, we will sometimes make the dependence of kinetics on a implicit, i.e., we will use k j ≡ k j (a).
In the above definitions, we define a BGSs (or dBGSs) as a function of the set of active reactions (corresponding to the columns of A) and the concentration vector y = [P, a] T . For a given active stoichiometric matrix A, the set of all such states at all concentrations y 2 R mþ1 > 0 defines the solution space of balanced growth (or of density-constrained balanced growth if only concentrations y that respect constraint (14) are considered).
Based on biophysical considerations, we might replace Eq. (14) with separate density constraints on the total volume concentration inside each cellular compartment 39 and on the total area occupied by non-lipid membrane components per membrane area 5,61 . An even simpler density constraint imposed in most previous models 3, [5][6][7][8][9]14,15 is to fix total protein concentration P to a constant value. However, it has been shown that P decreases with increasing growth rate, whereas total dry mass density is approximately constant across conditions [42][43][44] . Thus, while a constant P allows to simplify the presentation, Eq. (9) provides a biologically more meaningful constraint; moreover, this constraint allows us to determine the costs and benefits of varying the total protein concentration.
De Groot et al. have defined BGSs for a similar problem 30 . In their formulation, the dimensions of the concentration vector y include not only total protein P, but all individual protein concentrations p j . This more general problem formulation NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-14751-w ARTICLE comes at the cost of more involved decomposition rules 30 compared with Theorem 2.
We now provide the basis for linking BGSs to EFMs, which are defined for FBA-type linear constraint-based problems 27 and which have been extended to proteome-constrained models 28,29 .
Definition 2 (EFMs): Let v 2 R n , y ¼ ½P; a T 2 R mþ1 > 0 , and A 2 R ðmþ1Þ n be as in Definition 1. Let k ðeff Þ 2 R n ≠0 be a vector of effective kinetic constants. Then we call v a feasible flux vector at biomass production rate v bio if and only if it fulfills the following constraints: A feasible flux vector v is a representative of an elementary flux mode (EFM) if and only if it is non-decomposable, i.e., it fulfills the following additional constraint 27 : There exists no couple of feasible flux vectors v 0 ; v 00 such that v ¼ λ 1 v 0 þ λ 2 v 00 with λ 1 , λ 2 > 0 and where both v 0 and v″ have at least the same number of zeroes as v, while at least one of them contains more zeroes than v.
If we consider the concentration vector y = [P, a] T as a descriptor of a constant biomass composition, we see that constraint (15) is mathematically equivalent to the standard steady-state constraint of FBA and metabolic pathway analysis 26 problems, formulated without an artificial "biomass reaction" in A (see, for example, Eq. (2) in ref. 62 ). Note that in the definition of EFMs, both the biomass composition y = [P, a] T and the effective kinetics k (eff) are assumed to be constant; thus, the constraints (15)-(17)  We emphasize that v is an EFM of the corresponding linearized (FBA-like) problem (see Definition 2), not of the balanced growth problem (Definition 1) from which it is derived. EFMs are defined as equivalence classes of minimal feasible steady-state flux distributions, whose members can be converted into each other by multiplication with a positive scalar 27 . This definition cannot be generalized to balanced growth models, as multiples of a feasible flux vector generally do not satisfy constraint (11). For this reason, de Groot et al. have generalized the concept of EFMs to equivalence classes of minimal sets of active reactions in BGSs, termed elementary growth modes (EGMs) 30 .
Theorem 1 (Existence of solutions): Let y = [P, a] T be a concentration vector and μ > 0 be a growth rate. For any flux vector v 0 that satisfies the mass balance constraint (11), there exists a unique BGS v ¼ λv 0 with λ > 0 if all fluxes run in the direction compatible with the reaction kinetics (i.e., ∀j: k j v j > 0), and no such BGS otherwise.
Proof: From constraint (12), it is clear that if k j v j ≤ 0, no BGS with p j > 0 exists. For k j ≠ 0, the concentration of protein j is uniquely defined by p j = v j /k j (constraint (12)). Let P 0 ¼ P j v 0 =k j be the total protein concentration associated with v 0 . Then setting λ P=P 0 results in the only flux vector that fulfills all constraints of Definition 1. □ Next, we use this result to show that any weighted average of BGSs is itself a BGS.
Theorem 2 (A weighted average of BGSs is a BGS): Let (v (1) , . . . , v (k) ) be an ordered set of BGSs for the concentration vector y = [P, a] T with growth rates (μ (1) , . . . , μ (k) ), but with potentially different active stoichiometric matrices A (l) . Let A be the stoichiometric matrix that combines all reactions represented in (A (1) , . . . , A (k) ), i.e., the columns of A consist of all unique columns of (A (1) , . . . , A (k) ). Let represented in A (l) . Then any weighted average v ¼ P l w l v 0 ðlÞ of these extended flux vectors (with weights w l > 0 and ∑ l w l = 1) is itself a BGS for y, with a growth rate that is the weighted average of the individual growth rates, μ = ∑ l w l μ (l) .
Proof: The mass balance constraint (11) is linear in the fluxes and growth rates, and is hence also fulfilled for the weighted averages. The protein concentrations of each BGS v 0 ðlÞ are p  27 , i.e., v ¼ P l w 0 l v 0 ðlÞ with w 0 l > 0. From Theorem 1, we know that for each of these EFMs, there exists a unique BGS v ðlÞ ¼ λ l v 0 ðlÞ with λ l > 0; according to Definition 3, this is an EGS. Thus, we can write v = ∑ l w l v (l) as a linear combination of EGSs, with weights w l w 0 l =λ l . To prove that v is a weighted average of the v (l) , it remains to be shown that W ≡ ∑ l w l = 1. According to Theorem 2, a weighted average v 00 P l w l W v ðlÞ ¼ 1 W v will also be a BGS. However, Theorem 1 states that there exists only one BGS in the direction of v, and thus W = 1. □ Growth equations. In this section, we assume that the concentrations of total protein and of individual reactants, y ≡ [P, a] are known. Mass conservation (constraint (11)) and reaction kinetics (constraint (12)) relate reaction fluxes to the concentration vector in two fundamentally different ways. We will now exploit this fact to eliminate the flux variables and to derive explicit expressions for v, p, and μ.
Note that because the concentrations y are used as state variables in these analyses, no explicit consideration of constraints on cellular density, such as constraint (14), is necessary. The given concentrations y may obey constraint (11) or alternative density constraints, such as independent constraints on the density of cellular compartments, but these will not be used here. They will only become important when we vary y to find states of maximal growth rate in a later section.
An important requirement for the analyses below is that the active stoichiometric matrix A has full column rank, motivating the next theorem.
Theorem 4 (The active reactions of an EGS are linearly independent): Let A 2 R ðmþ1Þ n be the active stoichiometric matrix of an EGS. Then A has full column rank n, i.e., the columns of A are linearly independent.
Proof: According to the definition of EGSs (Definition 3), A is also the active matrix of the corresponding linearized (flux balance type) problem. It has previously been shown 31 that the active stoichiometric matrix A of an EFM of a linear flux-balance problem has full column rank if A is formulated without an explicit "biomass" reaction (as in Definition 2). □ According to this theorem, the following theorems-which assume that A has full column rank-can in particular be applied to EGSs (and, as we will see below in Theorem 9, thus also to dBGSs with maximal growth rate).
Theorem 5 (Investment equation): Let A 2 R ðmþ1Þ n be an active stoichiometric matrix of a flux vector v that fulfills the mass balance constraint (11) with concentration vector y = [P, a] T , where A has full column rank n. Then we can split A into two submatrices B 2 R n n and C 2 R ðmþ1ÀnÞ n , such that B is a non-singular (invertible) square matrix and each row of C is a linear combination of rows of B. Let B −1 be the inverse of B. Let b be the subvector of reactant concentrations a that correspond to the rows of B, c be the subvector of the reactant concentrations that correspond to the rows of C, and x ≡ [P, b] T . Then v is given by The dependent reactant concentrations c are linear combinations of the independent concentrations x, with the dependence matrix D ≡ CB −1 . Proof: The active stoichiometric matrix A may have more rows than columns. In this case, m + 1 > n, and the rows for exactly n metabolites are linearly independent, as row and column rank must equal. As a consequence, the remaining m + 1 − n metabolite concentrations are linearly dependent on the concentrations of the n independent metabolites. These dependent concentrations are not free variables, and hence they can be put aside and dealt with separately.
We decompose the linear system of equations represented by constraint (11) into two parts, rearranging the rows of A into matrices B, C such that B contains the rows for the independent reactants. As A has full column rank, choosing linearly independent rows results in a square matrix B of full rank (#rows(B) = rank(B) = rank(A)). Let b be the subvector of reactant concentrations a that correspond to the rows of B, and let c be the subvector of the remaining reactant concentrations corresponding to the rows of C. We can then split the mass balance constraint (11) into two separate equations:  (11) is uniquely defined and is a linear combination of the total protein concentration P and the independent metabolite concentrations b. Each entry of the inverse matrix B À1 ji quantifies the proportion of flux j invested into the dilution of component i, and we thus name B −1 the investment (or dilution) matrix (see Supplementary Fig. 1 for examples). In contrast to the stoichiometric matrix A, which describes local mass balances (constraint (11)), B −1 describes the structural allocation of reaction fluxes into the production of cellular components diluted by growth, and thus carries global, systems-level information.
B corresponds to the reduced stoichiometric matrix in ref. 32 . D describes the linear dependence of the dependent concentrations c on P and b; it is identical to the link matrix in ref. 32 . The relationship between A and B, C can be understood in terms of matroid theory, where the rows of B form a basis for the matroid spanned by the rows of A, and the set of rows of C is the closure for the set of rows of B. If the choice for the partitioning of A into B and C is not unique, some partitionings may be pathological and should be avoided (Supplementary Note 4).
When A is not square, B includes a proper subset of the rows in A, and thus B on its own is not mass balanced. The "missing" mass fluxes are balancing c, and hence the flux investment into c is already accounted for by the investment equation in Theorem 5.
We are now in a position to express the individual protein concentrations and the growth rate of a BGS as explicit functions of the concentrations y = [P, a] T . Theorem 6 (Individual protein concentrations as a function of the independent concentrations): Let A 2 R ðmþ1Þ n be an active stoichiometric matrix with full column rank n, and let x = [P, b] T be the independent concentration vector with corresponding index i ∈ {P, β}. Let v be a corresponding BGS. Let B and D be the basis and dependency matrices, respectively, as defined in Theorem 5. Then the concentration of the protein catalyzing reaction j is Proof: As A is an active matrix, all fluxes v j = p j k j (a) (constraint (12)) are nonzero. We can thus express the individual protein concentrations as p j = v j /k j (a).
Inserting v j from the investment equation (Theorem 5) directly leads to the above equation. □ We now insert the equations for the individual proteins into the total protein constraint (13) to obtain an explicit expression for the growth rate.
Theorem 7 (Growth equation): Let A 2 R ðmþ1Þ n be an active stoichiometric matrix with full column rank n, and let y = [P, a] T be a concentration vector. Let v be a corresponding BGS. Let B and D be the basis and dependency matrices, respectively, as defined in Theorem 5. Then the growth rate is ji x i k j ðaÞ > 0, and no balanced growth is possible otherwise.
Proof: According to Theorem 6, the individual protein concentrations are The flux v j catalyzed by protein j must be active, and thus p j has to be positive for all j. Substituting the expressions for p j into the proteome constraint (13), we obtain The sum on the r.h.s. is positive, and dividing by it results in the growth equation. □ Thus, if the active matrix A of a BGS is full rank, there are unique and explicit mathematical solutions for p, v, and μ. In particular, this is the case for optimal growth states (Theorem 9), as well as for all other EGSs. In this section, we did not impose any density constraints (such as constraint (14)), and thus Theorems 1-7 remain valid under arbitrary density constraints as long as these are respected by the concentration vector y = [P, a] T .
Marginal fitness benefits and costs. In this section, we first define marginal fitness benefits and costs of concentrations. As in the previous section, the considerations in this section make no use of the density constraint (14), and thus remain valid under alternative density constraints. After introducing the definitions, we show how to calculate and to interpret the costs and benefits.
Definition 4 (Marginal costs and benefits): Let v be a BGS with growth rate μ. Let i ∈ {P, β} be an index of the independent concentration vector x = [P, b] T . Then the direct marginal net benefit of concentration x i is defined as the relative change in growth rate due to a small change in x i 33 , Analogously, we define the marginal benefit of dependent reactant γ as The (total) marginal net benefit of x i is then defined as the relative change in growth rate due to a small change in x i , accounting for the resulting changes in the concentration of dependent metabolites c γ , where the second equality follows directly from Eq. (18). A change δx i of x i (i ∈ {P, β}) causes a correlated change of each dependent concentration δc γ = D γi δx i (Eq. (18)). Thus, a change by δx i results in a total change of the utilization of cellular density by κ i δx i , with the density factor defined as To help in the interpretation of the marginal net benefits, we will relate them in the next theorem to two explicit definitions of costs and benefits, respectively. The marginal production cost of the cellular concentration x i is defined as where the subscript of the parenthesis indicates which variables are kept constant in the derivative. q j i can be interpreted as the additional amount of protein j required to offset the increased dilution of x i ∈ {P, β} at growth rate μ and fixed kinetics k j . We define the marginal kinetic benefit of the reactant concentration b β as and we make corresponding definitions u j γ for dependent concentrations c γ . The marginal kinetic benefits can be interpreted as the fraction of proteins j saved at constant flux v j due to the increased saturation of reaction j with reactant β or γ, respectively.
The marginal net benefits can now be expressed as differences between benefits and costs. To calculate the direct marginal net benefits η 0 i , we must use the growth equation derived in Theorem 7, where we defined proteome fractions ϕ j ≡ p j /P. The first form given for μ(P, a) here quantifies growth as a function of the state variables x i , and it would be straight forward to calculate η 0 i from this expression. However, to establish a formal link between marginal net benefits and protein investment, we will instead go via the second form, which arises from Theorem 6 and was used to derive the growth equation, and the third form, which expresses this relationship in terms of the proteome fractions ϕ j . When we take the partial derivatives with respect to the state variables x i in the second and third forms, we must make sure that we keep the right terms constant: when expressed in terms of the x i , the expression p j /μ is in fact independent of μ (Theorem 6), and we hence need to take the derivatives while keeping μ constant. We thus get for the direct marginal net benefits: where we used the fact that the proteome fractions must add to 1, ∑ j ϕ j = 1. Thus, the direct marginal net benefit of the cellular concentration x i is identical to the total associated changes in proteome fractions caused by this change.
Again looking at Eq. (21), we can further analyze the nature of the proteome changes caused by a change in the cellular concentration x i . Let us first consider a reactant concentration x i = b β . Applying the chain rule of differentiation to ϕ j ¼ p j =P ¼ μ P i B À1 ji x i =k j ðaÞ, we have to add the partial derivatives with respect to x i = b β in the numerator μ P i B À1 ji x i ¼ v j (keeping μ and k j constant) and in the denominator k j (a) (keeping the numerator v j constant, which also guarantees that μ is constant). Thus, we can write the direct marginal net benefits of the independent reactant concentration b β in terms of proteome changes as where in the last line we inserted the definitions of the marginal kinetic benefits and production costs (Definition 4).
Performing an analogous calculation for the direct net benefit of the total protein concentration P (noting that we now need to take the derivative with respect to the numerator of the growth equation but not with respect to (k j ), we obtain where we used ∑ j p j = P and where we inserted the definition of the marginal production cost of P (Definition 4, with k j independent of P) in the last line. The positive term 1/P in the direct net benefit of total protein quantifies the marginal benefit of increasing the total protein concentration P, which accelerates all reactions linearly.
We have thus proven the next Theorem, which elucidates how costs and benefits of cellular compounds are naturally related to protein use; this connection has been proposed before 33,37 but is derived here rigorously from first principles.
Theorem 8 (Direct marginal net benefits): The direct marginal net benefit of any independent cellular concentration x i (i ∈ {P, β}) is the negative of the total associated change in relative protein concentrations at constant growth rate μ, The direct marginal net benefits of the total protein concentration P and of independent reactant concentrations b β (β ∈ {1, …, m}), respectively, are The marginal production cost q j i is the fraction of extra protein j expended to offset the additional dilution of concentration x i at rate μ and fixed saturation k j ; it can be calculated from the growth equation (Theorem 7) as The marginal kinetic benefit u j β is the fraction of protein j saved due to its increased saturation with reactant β; it is calculated from the growth equation as The marginal kinetic benefits of dependent reactants γ are where u j γ is calculated analogously to the marginal kinetic benefits of independent reactants, u j β .
Optimal density-constrained balanced growth states. So far, we have considered BGS for a given set of active reactions (corresponding to the columns of A) and given concentrations y = [P, a] T , where y may or may not have respected any particular density constraint. We now examine density-constrained BGSs (dBGSs) with maximal growth rate given the set of active reactions, optimized over all concentration vectors y ¼ ½P; a T 2 R mþ1 > 0 that respect the density constraint (14). As a preparation for these analyses, we first show that states of optimal growth are EGSs.
Theorem 9 (dBGSs with maximal growth rate are EGSs). Let N be a stoichiometric matrix of a general balanced growth model. Let v * be a dBGS that maximizes the growth rate of the general problem. Then v * is an EGS.
Proof: Without loss of generality, we restrict v * to its active dimensions (v Ã j ≠ 0), with active stoichiometric matrix A. Then this reduced v * is the optimal solution for the following non-linear optimization problem over all concentration vectors y ½P; a T 2 R mþ1 > 0 : maximize y μ subject to : Let y Ã ¼ ½P Ã ; a Ã T be the concentrations and μ * the growth rate of the optimal solution v * . Now let us consider a linearized version of this optimization problem, where me maximize the production rate v bio at constant biomass composition y * and effective kinetic constants k ðeff Þ j k j ða Ã Þ (see Definition 2): We relaxed the constraint (13) on total protein into an inequality constraint, so that Eq. (25) describes a protein-constrained FBA problem for the active stoichiometric matrix. This is precisely the type of constrained flux balance problem analyzed in refs. 28,29 , which prove that the solutions v opt to the optimization problem defined by Eq. (25) are EFMs.
In the optimal solution to the problem defined by Eq. (25), the protein concentration constraint will be active, that is, P * = ∑ j p j ; if not, the biomass production rate v bio could be increased by multiplying the vector of protein concentrations p with a constant >1 (as v j ¼ p j k Ã j for all j). Thus, the optimization problem described by Eq. (25) is the same as that described by Eq. (24), except for a reduction in the dimension of the search space due to the fixed concentrations y * (Note that the cellular density constraint (14) is trivially respected in Eq. (25) and can be ignored). Accordingly, the flux distribution v * that maximizes the balanced growth rate μ in Eq. (24) also maximizes the biomass production rate v bio of the protein-constrained FBA problem in Eq. (25); it is hence a representative of an EFM of the active stoichiometric matrix A with biomass y *28,29 , and thus v * is an EGS according to Definition 3. □ In parallel work, de Groot et al. 30 have shown that optimal solutions to balanced growth problems are elementary growth modes (as defined in ref. 30 ), and that the active stoichiometric matrix of elementary growth modes has full rank.
If instead of a single constraint on cellular density, multiple density constraints are imposed simultaneously (e.g., to describe separate constraints on different cellular compartments), then the solutions may in some cases correspond to positive linear combinations of EGSs 30,63 , and the treatment below needs to be generalized. Multiple density constraints may play a role in the emergence of overflow metabolism in E. coli 54,64 , although overflow metabolism can also arise in balanced growth models with a single density constraint 5 .
In a dBGS with maximal growth rate for a given active stoichiometric matrix A, the cellular components will utilize the full limit on cellular density ρ to saturate enzymes with their substrates. Thus, the constraint (14) will be active, turning the inequality into an equality. The maximal balanced growth rate μ * will thus be a function of the maximal cellular density ρ. As a reference value for the marginal net benefits of individual concentrations x i , we now define the marginal benefit of the cellular density ρ.
Definition 5 (Marginal benefit of the cellular density): In analogy to the marginal net benefits of cellular components, we define the marginal benefit of the cellular density as the fitness increase facilitated by a small increase in ρ, We can now relate η ρ to the total marginal net benefits of all concentrations. To do this, we derive necessary conditions for any optimal BGS at constant cellular density ρ, using the method of Lagrange multipliers. The Lagrange multipliers quantify the importance of the density constraint, Eq. (14), and of the constraints for the dependent reactants, Eq. (18), for the maximization of the objective function. The Lagrangian L is a function of y = [P, a] T , and ρ.
Theorem 10 (Balance equation): In a dBGS with maximal growth rate, the total marginal net benefit of each independent concentration x i (i ∈ {P, β}) equals the marginal benefit of the cellular density ρ scaled by the density factor κ i , Proof: We use the method of Lagrange multipliers to derive necessary conditions for any optimal dBGS at constant cellular density ρ. Our objective function is given by Theorem 7, which expresses the growth rate μ as an explicit function of the concentrations y = [P, a] T . The density constraint (14) will be active at maximal growth rate, i.e., it becomes an equality. The density constraint can then be expressed as a function g ρ that depends on ρ and on the concentrations, g ρ ðP; aÞ P þ X α a α À ρ ¼ 0: Finally, the constraints on each dependent reactant γ also only depend on y = [P, a] T , with the entries D γP determining the composition of each γ in terms of P, and D γβ determining the composition of γ in terms of b β , We now define a Lagrangian as the sum of the objective function μ and the constraints g scaled by Lagrange multipliers λ ρ , accounting for the density constraint (14), and λ γ , accounting for the dependence of the dependent reactants γ ∈ {γ}, Eq. (18): The first-order necessary conditions for a constrained local maximum are that all partial derivatives of L with respect to the variables P, b β , c γ and to the Lagrange multipliers λ ρ , λ γ are zero, For the partial derivative with respect to an independent concentration x i (i ∈ {P, β}), we have With Theorem 8, this results in For the partial derivative with respect to a dependent reactant c γ , we have ∂L ∂c γ ¼ ∂μ ∂c γ þ λ ρ À λ γ ¼ 0: With Eq. (19), we obtain λ γ ¼ μη 0 γ þ λ ρ : Substituting λ γ from the last equation into Eq. (27) gives (for i ∈ {P, β}) Rearranging results in where we used η ρ = −λ ρ /μ, which follows directly from the envelope theorem 65 .
With μ > 0, we can divide by μ to obtain the balance equation. □ The optimal state is perfectly balanced: the total marginal net benefit of each independent cellular concentration x i equals the marginal benefit of the cellular density, scaled by κ i to account for its total utilization of cellular density. If i does not have any dependent reactants (∀γ: D γi = 0), then the balance equation simplifies to η i ¼ η 0 i ¼ η ρ (Eq. (10)). Theorem 10 states that if the dry weight density ρ would be allowed to increase by a small amount, such as 1 mg l −1 , then the marginal fitness gain that could be achieved by increasing protein concentration (plus dependent concentrations) by this amount is identical to that achieved by increasing the concentration of any reactant β (plus its dependent concentrations) by the same amount.
Instead of using Lagrange multipliers in the proof, one could express the total protein concentration P = ρ − ∑ α a α (constraint (14)) and the dependent reactant concentrations c γ = D γP P + ∑ β D γβ b β (Eq. (18)) in terms of ρ and of the independent reactant concentrations b. Substituting the resulting expressions into the growth equation (Theorem 7) would result in an objective function that depends only on ρ and b, and that is constrained only by the requirement of positive concentrations. While this would lead to the same balance equations as derived in the Lagrange multiplier framework, this formulation misses important insights that can be derived from the Lagrange multipliers themselves.
Optimal ribosome proteome fraction. Here we employ a very simple model for translation 38 . It accounts only for the elongation phase, where one catalyst (the ribosome plus bound mRNA, with concentration R) converts one substrate (the ternary complex, with concentration a T ) into protein, following irreversible Michaelis-Menten kinetics: We assume that the model has no dependent reactants (A = B) and that the ternary complex is not used in any other reaction. In this case, the same canceling of production costs as in the model depicted in Supplementary Fig. 1a happens, and the balance of net benefits of ternary complex and total protein, η T = η P (Eq. (10)), simplifies to with the kinetic benefit of the ternary complex T for the ribosome R, u R T (Definition 4). Substituting the partial derivative of irreversible Michaelis-Menten kinetics (Eq. (29)), we obtain Rearranging Eq. (29), we also see that the kinetics determine the concentration a T uniquely in terms of v R , R, K m , and the ribosome's turnover number k cat , Substituting this into Eq. (30) gives From the ribosome kinetics and mass conservation of proteins, we have Thus, substituting μ/k R = R/P and v R = μP in Eq. (31), we obtain

5:
This is equivalent to a quadratic equation in R/P, Its two solutions are To see which of the two solutions is relevant, we rewrite this as Because k cat R > Rk R = v R = μP, the term in square brackets [ ⋅ ] in Eq. (33) must be >1. Only the positive root is compatible with this condition. Thus, the ratio R/P is uniquely determined by To relate this expression to experimental data, we need to remember that ribosomes consist of protein and RNA. To estimate the ribosome proteome fraction ϕ R , we thus need to scale the previous expression by the fraction r P of ribosome which is protein, resulting in the final equation The same procedure can be used to find an equation for ϕ R that ignores the production costs. Starting from Eq. (31) without the production cost term μ/k R , we obtain which results in a quadratic equation similar to Eq. (32), Solving for R/P gives Again because Rk cat > μP, the term in square brackets [ ⋅ ] in Eq. (35) must be >1, and again only the positive root is compatible with this condition. Thus, the ribosome proteome fraction is uniquely determined in this approximation by We compared the predictions for ϕ R to experimental estimates based on quantitative proteomics 45 and on total RNA to protein ratios 19,42,46,66 . While all estimates are very similar (Fig. 2), given that on the order of 20% of total RNA is tRNA 42 and that this proportion is at least moderately growth rate dependent 67 , the exact growth rate dependence of ϕ R may be captured more faithfully by the proteomics data.
To calculate ϕ R from the proteomics measurements, we first calculated the mean over all molar concentrations of ribosomal proteins reported by Schmidt et al. 45 . Molar concentrations of the ribosome were converted to mass concentrations by multiplying with molar masses derived from the amino acid sequences for the protein parts and nucleotide sequences for the RNA parts. For this, we assumed that each ribosome contained one copy of each of its constituents, with the exception of four copies of RplL 68 . We multiplied the ribosome mass concentrations with the mass fraction of ribosomes that is protein (r P = 0.358 45 ), and divided the result by the total protein mass concentration P to obtain ϕ R . The proteome fraction of actively translating ribosomes was determined based on total ribosome proteome fraction and the fraction of active ribosome at different growth rates. The latter was estimated by fitting a smooth saturation function s(μ) = μ/(μ + z) over the fractions of active ribosomes estimated in ref. 46 , with the best-fitting parameter z = 0.124 h −1 . Nonlinear fitting was performed using the function nls() in gnu R 69 .
We set the Michaelis constant of the ribosome to K 0 m ¼ 3 10 À6 mol l À1 , based on the diffusion limit for ternary complexes calculated in ref. 38 . We set the ribosome's turnover number to k cat = 22 AA s −1 , the highest elongation rate observed experimentally in ref. 42 . As we do not distinguish between different ternary complexes and the ribosome only accepts one of the 40 different ternary complex types at any given time, K 0 m was multiplied by 40 (see ref. 38 ), resulting in an effective Michaelis constant of K m = 1.2 × 10 −4 mol l −1 . For consistency of the units with the mass concentration units used throughout our paper, the kinetic parameters had to be converted from molar to mass concentrations. The mean weight (±SD) of amino acids across all conditions assayed in ref. 45 was (132.60 ± 0.09) Da; the ribosome molecular weight is 2,306,967 Da; and the mean weight of ternary complexes is (69,167 ± 1351) g mol −1 . With these numbers, we obtain k cat = 22 AA s −1 × (132.60 Da AA −1 )/(2,306,967 Da) × 3600 s h −1 = 4.55 h −1 , and K m = 40 × 3 × 10 −6 mol l −1 × 69,167 g mol −1 = 8.30 g l −1 . For the predictions based on Eq. (34), we set the total protein concentration to P = 127.4 g l −1 45 .
For yeast, the concentration of actively translating ribosomes was determined based on total ribosome concentration and the fraction of active ribosome at different growth rates; the data was extracted from the figures of ref. 47 using the GetData Graph Digitizer program (Version 2.26, obtained from http://getdatagraph-digitizer.com/). The fraction of active ribosomes was estimated by fitting a smooth saturation function s(μ) = μ/(μ + z) over the fractions of active ribosomes estimated in ref. 47 , again using the nls() function in R. The best-fitting parameter was z = 0.122 h −1 , very close to the E. coli estimate. We again set K 0 m to the diffusion limit 38 K m = 3 × 10 −6 mol l −1 , multiplied with the number of different ternary complexes, of which there are 41 in yeast 70 . The ribosome's turnover number was set to k cat = 10 AA s −1 , the highest elongation rate observed experimentally according to ref. 71 . To convert to mass units, we used the mean weight of amino acids (130 Da) 72 , the ribosome molecular weight 3,620,000 Da 73 , and the molecular weight of ternary complex (240,000 Da) [74][75][76] . With these numbers, we obtain k cat = 10 AA s −1 × (130 Da AA −1 )/(3,620,000 Da) × 3600 s 1 h −1 = 1.29 h −1 , and K m = 41 × 3 × 10 −6 mol l −1 × 240,000 g mol −1 = 29.52 g l −1 . In yeast, the mass fraction of ribosomes that is protein is r P = 0.45 73 . For the predictions based on Eq. (34), we set the total protein concentration to the haploid cell value P = 85.7 g l −1 77 .
To quantify the fit of our predictions for ϕ R to the observed ribosomal proteome fractions, we calculated Pearson's correlation coefficient r between observed and predicted values as well as the coefficient of determination R 2 1 À SS res SS tot with the total sum of squares SS tot ¼ P i ðϕ R;i À ϕ R Þ 2 (proportional to the variance of the data) and the residual sum of squares SS res ¼ P i ðϕ R;i À ϕ predicted R;i Þ 2 (proportional to the variance of the residuals).
Dependence of maximal growth rate on cellular water content. Cayley et al. 40,50 have shown that the internal water content of E. coli cells increases when these are grown in environments with reduced osmolarity. This effect corresponds to a decrease of cellular dry weight per volume, ρ, by δρ. η ρ quantifies the associated reduction in relative fitness, δf = δμ * /μ * = η ρ δρ, with μ * the maximal growth rate (Definition 5). The relative change in the maximal growth rate per relative change in ρ is then From Eq. (26), we know that η P = κ P η ρ ; if there are no dependent reactants for P (i.e., ∀γ: D γP = 0), this simplifies to and thus The mass fraction of total protein in cell dry weight P/ρ ≈ 0.55 has been shown to be approximately constant for E. coli across growth conditions supporting intermediate to high growth rates 40,45,49 .
To estimate the total protein production cost P j q j P , we consider the simplest possible whole-cell model, comprising only a transport reaction and the ribosome reaction ( Supplementary Fig. 2). The active stoichiometric matrix A of this model and its inverse A −1 are (written here with row and column labels): The density is determined only by its two components, where P ¼ p t þ p R : From the inverse A −1 and Theorem 5, we obtain v t ¼ μðP þ a 1 Þ ¼ μρ ð40Þ and v R ¼ μP: From the inverse A −1 and Eq. (23), we get X j q j P ¼ Combining this with Eqs. (40) and (41) and using ϕ R = p R /P and ϕ t = p t /P = 1 − ϕ R , we obtain P j q j P ¼ 1 Inserting this in Eq. (39) results in The growth rate in the reference growth condition of osmolarity Osm = 0.28 in ref. 50 is μ = 1.0 h −1 . From Eq. (34), we estimate the mass fraction of ribosomal proteins in total protein ϕ R at this growth rate as ϕ R = 0.19. Substituting this value into Eq. (42) together with P/ρ = 0.55, we estimate the relative change in the maximal growth rate per relative change in ρ as ρη ρ ¼ 0:66: Note that instead of a density constraint on total dry mass ρ, previous analyses of schematic and coarse-grained models of balanced growth 3,5-9,19 utilized a constraint only on the concentration of macromolecules P. Calculating Pη P instead of ρη ρ leads to a replacement of the factor (ρ/P − 1) with (1 − P/ρ) compared with the last line of Eq. (42), and the same parameterization then leads to a prediction of Pη P = 0.36.
Cayley et al. 50 report cell growth at reduced osmolarities, summarized in Supplementary Table 1. The cell-free water content V free in Supplementary Table 1 is calculated from the total cell water V cell minus the observed constant bound water V b ¼ 0:40 ± 0:04 ml gCDW −140 . Errors are estimated standard deviations based on error propagation among normally distributed random variables. Supplementary Fig. 3 plots the natural logarithms of μ and ρ. Linear regression over the three available data points results in an estimated slope of 0.66, indistinguishable from our estimate of dln μ Ã dln ρ ¼ ρη ρ ¼ 0:66.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability