Robust Analysis of Fluxes in Genome-Scale Metabolic Pathways

Constraint-based optimization, such as flux balance analysis (FBA), has become a standard systems-biology computational method to study cellular metabolisms that are assumed to be in a steady state of optimal growth. The methods are based on optimization while assuming (i) equilibrium of a linear system of ordinary differential equations, and (ii) deterministic data. However, the steady-state assumption is biologically imperfect, and several key stoichiometric coefficients are experimentally inferred from situations of inherent variation. We propose an approach that explicitly acknowledges heterogeneity and conducts a robust analysis of metabolic pathways (RAMP). The basic assumption of steady state is relaxed, and we model the innate heterogeneity of cells probabilistically. Our mathematical study of the stochastic problem shows that FBA is a limiting case of our RAMP method. Moreover, RAMP has the properties that: A) metabolic states are (Lipschitz) continuous with regards to the probabilistic modeling parameters, B) convergent metabolic states are solutions to the deterministic FBA paradigm as the stochastic elements dissipate, and C) RAMP can identify biologically tolerable diversity of a metabolic network in an optimized culture. We benchmark RAMP against traditional FBA on genome-scale metabolic reconstructed models of E. coli, calculating essential genes and comparing with experimental flux data.


Results
We begin by re-visiting the foundational assumptions of FBA, where the metabolism of an individual cell c is defined by a system of n reactions, with the j-th reaction being where S ij c is the stoichiometric coefficient for metabolite i in reaction j, the flux of reaction j is v j c , and c denotes the cell. Consequently, genetically identical individuals of a culture share the same collection of metabolic reactions. Hence, the preponderance of stoichiometric coefficients of any particular genome-scale metabolic reconstruction are static across the individuals of a culture. However, a complete metabolic model includes several inferred and less certain reactions to model processes, such as those involved in the creation of biomass. In many of the recent extensions of FBA 9 , a combination of alternative objective functions, global constraints or upper/lower bounds on sets of reaction fluxes are added to the system. Consequently, many extensions, such as FBAwMC 32 , GIMME 33 , MOMENT 34 , and GX-FBA 35 , are implemented by adding rows and/or columns to the FBA stoichiometric matrix, and these new stoichiometric coefficients do not generally consist of integer values. Instead, depending on the method of interest, these added stoichiometric coefficients may consist of a combination of data from large-scale' omics sources, biochemistry knowledge, or theoretical modelling assumptions. Thus, they are carriers of inherent uncertainty.
In particular, the experimentally determined coefficients of a biomass equation are noticeably different from their integral counterparts in the shared metabolism, see Table 1 as an example from the E. coli metabolic model iJR904 36 . These less certain coefficients would be stochastic, time varying parameters per individual cell. Thus, instead of the growth reaction in Table 1 being imposed on all cells in a population, these coefficients should be able to vary among the individual cells.
There exists also other sources of uncertainty for the performance of a genome-scale metabolic reconstruction, such as the value of the phosphate/oxygen (P/O) ratio. Changes in how the network performs the ATP production and drain has a significant potential to affect model-predicted growth performance. In the following discussion, however, we will base our presentation and discussion on uncertainty in the biomass growth coefficients.
Suppose ĉ indexes the single, average cell assumed by FBA. The limiting steady state assumption of FBA constrains the fluxes so that, which is the equilibrium condition as t → ∞. Furthermore, it has been suggested that cellular metabolic performance evolves toward an optimal growth state as a culture's environment is left unchanged and as populations are repeatedly sampled and re-grown 4,10 . FBA is predicated on the assumption that cells have been tuned through such an evolutionary processes to an "optimal" use of their resources, where "optimal" is measured by a function of the fluxes, g (v). Hence, FBA studies metabolic processes through optimization problems, and adaptations thereof, of the form where Ŝ c is the stoichiometric matrix whose components are Ŝ ij c and v c is the associated (decision) vector of fluxes. The objective function g(v) is typically chosen to estimate the rate at which biomass is created, and for convenience, we assume throughout that g(v) is the sole flux of the growth reaction, i.e. g(v) = v Growth . While several alternative objectives have been proposed and investigated in different biological settings 6,7 , the production of biomass is the most prevalent choice and is the default in the COBRA 37 Table 1. An example of the non-integer coefficients, in parentheses, of the input and output metabolites of the E. coli metabolic model iJR904 growth reaction 36 .
A probabilistic development of RAMP. We assume that a culture has gone through a long-enough time of growth and re-sampling to be largely optimized to the resources of its (invariant) environment. Heterogeneity within the culture is modeled as the random experiment that is the growth of the culture, and we consider the variational states of the genetically identical cells of the culture as independent and identically distributed outcomes from that growth experiment. Hence, the known heterogeneity within an optimized culture is the result of a large sample of possible outcomes. i  be approximately normal, and we let μ i and σ i be the unknown mean and standard deviation of this normal distribution. This approximation is expected to be trustworthy since cultures regularly contain hundreds of millions of cells.
We replace the traditional static assumption of steady state with the probability constraints Here, |M i | is a bound on the feasible fluxes that define the allowed deviation from steady state. Unlike the FBA model formulation that collapses onto an ideal cell and ignores cultural variations, these stochastic constraints account for the innate variations that are known to exist. Since where δ ε and δ 1−ε are the ε and 1 − ε percentiles respectively, i.e.
Rearranging these inequalities and using the fact that The stochastic constraints in Eq. (4) are expressed in terms of the fluxes by extending the static differential equation of an individual cell in Eq. (1) to the stochastic differential equation of a culture. The standard assumption has been that the ideal cell of an FBA model is an average representation that matches measurements of a culture. If we allow S i to be the i-th row of the random stoichiometric matrix in which the uncertain elements are random variables, e.g. for the creation of biomass, then the implicit FBA assumption of an ideal cell results in the following stochastic differential equation for each metabolite, Consequently, for any flux vector v we have An apt interpretation of FBA's projection of a culture's heterogeneity onto an average is somewhat laid bare by these equalities: FBA assumes that μ i = 0. The first and second equalities then constrain the expectation of the sample mean, d x dt [ ]/ i  to be zero. The ideal FBA cell ĉ is interpreted as a stochastic representation of the culture; a representation that inherits the expected averages of the metabolic rates of change over a culture. The FBA paradigm thus acknowledges, but ignores within its modeling framework, deviation from steady state since FBA only insists that the expectation be zero without regard to variation. The third equality is a direct consequence of Eq. (5), and the last equality follows from the linearity of the expectation. Hence FBA assumes E(S i )v = 0, which shows that the stoichiometric coefficients of an FBA model are reasonably interpreted as averages, i.e. that = S E S ( ) i c i . Hence, growth coefficients, such as those in Table 1, are expectations attributed to an ideal cell in hopes of representing an entire culture. If a stoichiometric coefficient is certain because it is part of the metabolic network that is common to the taxa, then the average is simply the known value that is shared among all cells. If the coefficient is otherwise uncertain and assumed to be random, then FBA assumes the mean value.
The equation E(S i )v = μ i shows how the vector of fluxes v defines the expected value of the rate of change of the concentration (of metabolite i), and the equation begs for detailed random models of the uncertain stoichiometric coefficients. Such detail is challenged by experimental limitations. For example, it is impossible to study individual cells or even groups of cells that are in identical metabolic or proteomic states. Hence, inferring accurate distributional characteristics on parameters such as the growth coefficients is currently impossible. Beyond averages, our ability to express Eq. (4) in terms of the fluxes further necessitates the calculation of standard deviations.
Scenario analysis in RAMP. A common statistical supposition is to impose additional requirements. For example, we could outright assume that the individual random elements of S i are independent and identically distributed. However, such a requirement would be dubious in our setting due to the certain dependence among growth and metabolic uptake and excretion. We instead promote the use of scenario analysis to query the behavior of the stochastic model. Scenarios have been used in other application domains in which similar optimization problems arise. For instance, scenarios are motivated even with an assumption of independent and identically distributed random variables to better manage computational space 38 . In addition to overcoming our inability to accurately model the separate random stoichiometric coefficients, scenarios have the practical advantage of being easily interpreted as biological possibilities. Importantly, the normality of d x dt [ ]/ i  is independent of the distributions used to model the uncertain stoichiometric coefficients. The scenarios in RAMP's development and mathematical analysis are purposely arbitrary so that RAMP remains viable no matter which (discrete) distributions might actually model a biochemical reality.
Assuming that each S i has q random scenarios, we let S ik be the (non-random) stoichiometric coefficients for the i-th metabolite in scenario k and let p ik be the probability of scenario k, for k = 1, 2, 3, …, q. Allowing p i to be the (positive) vector of probabilities indexed by k, P i to be the (positive definite) diagonal matrix formed by p i , and Ŝ i to be the matrix whose k-th row is S ik , the mean and variance of S i v are where e is a vector of ones.
, we have that the standard deviation is The expected value and standard deviation are related through constraint (4), from which we have Although Eq. (8) is more complicated than a traditional linear constraint, it is the combination of two second order cone constraints. Such constraints share several desirable properties with linear constraints, such as being convex. Replacing the traditional linear constraints = S v 0 c with Eq. (8), we have arrived at an SOCP that is our RAMP model: The RAMP model defaults to a commonality of the upper and lower flux bounds among the cells so that L c = L and U c = U for all c. This default is somewhat for notational convenience since stochastic variation in L or U could be remodeled as variation in the coefficient of the associated flux, and the resulting constraint would then be part of the stoichiometric matrix 20 .
A feasible RAMP solution is a collection of fluxes under which the mean and variance of the normal distribution of d x dt [ ]/ i  satisfies the likelihood of being in steady state with regard to the spectrum of stoichiometric possibilities. A mathematical expression of this fact is possible upon recognizing that which allows the right-hand side of the SOCP constraint to be re-written as The left-hand side SOCP constraint can be similarly re-written as, The sets from which S i are drawn are called uncertainty sets in the robust optimization literature, and these sets define the collection of stoichiometric possibility. A feasible flux must satisfy being within |M i | of steady state for all possible stoichiometric possibilities, meaning that S i can deviate from its average p S i T i by as much as u T R i as long as ≤ u 1.
Illustrative example of the RAMP method. The re-expression of the SOCP constraints in terms of their uncertainty sets provides a geometric description. As an illustrative example, consider the two variable system The expected value of the constraint is simply which is the value of the original, static constraint. Setting δ 1−ε = 3 so that ε ≈ 0.0015, we have that This infinite collection of constraints is neither more nor less restrictive than the FBA constraint. One interpretation is that RAMP first relaxes the average equilibrium constraint of

which adds an infinite number of linear constraints (per original constraint).
A depiction of the geometry for the above example is shown in Fig. 1. The original, static (FBA) constraint with the variable bounds is depicted by the bold dashed line segment through the origin. The shaded region is the feasibility set formed by the combined stochastic constraints, and the light dashed lines are samples of the infinite set of linear inequalities added by the stochastic model. The RAMP feasible region does not contain the FBA feasible region. As an example, the point (0.75, 0.75) is feasible to the original static FBA constraint but is infeasible to the stochastic RAMP constraints. Likewise, there are feasible solutions to the stochastic model that are infeasible in the static case.

Mathematical observations of RAMP properties.
In the following we develop three important mathematical qualities of the RAMP model by establishing three Theorems, proofs of which are contained in the Methods section. Our first result demonstrates that feasible RAMP flux vectors satisfy a Lipschitz continuity property in their probabilistic elements, provided that bounding constraints can be marginally relaxed. The second result demonstrates that convergent RAMP solutions are indeed FBA solutions as stochastic uncertainty diminishes. The third theorem characterizes the types of uncertainty in the stoichiometric matrix that can be tolerated by an optimal FBA solution.
Continuity property of RAMP solutions. Corollary 1 (see Methods) suggests that RAMP's SOCP constraints might imbue continuity of the feasible flux vectors with regard to the stochastic modeling elements p i and Ŝ i . However, the linear bounds L ≤ v ≤ U prevent an immediate application of Corollary 1 because realistic genome-scale metabolic models commonly enforce some fluxes with implied equalities, i.e. L i = U i for some i. An example is the often called ATP maintenance flux, which is the rate at which ATP is lost due to non-growth-associated processes. These fixed fluxes are included to ensure that FBA models account for cellular demands that are not explicitly part of the genome-scale metabolic reconstructed model 39  The equalities imposed by the bounding constraints could be re-cast probabilistically. However, unlike the steady state equations of Sv = 0, these constraints cannot be guaranteed to be of the form required by Lemma 1. For instance, if v i is the flux for the reaction that removes ATP for non-growth-associated demands, then the Since the signs of L i − M i and L i + M i would agree for small values of M i , which would be required to remain realistic, the requirement of a positive right-hand side in Lemma 1 can not be ensured after writing both inequalities in the correct form.
Nevertheless, if we adjust the variable bounds dependent on the perturbed stochastic elements instead of prescribing them independent of perturbation, then we can relax the bounds to ensure feasibility. Probabilistically this means that we can set the values of M i for the variable bounds so that we satisfy them with probability 1. ( , ) : 1, 2, , }, , )

Theorem 1. For a collection of probability vectors p i and scenario matrices Ŝ i and for the lower and upper bounds U and L, let
i i  be the nonempty set of feasible fluxes satisfying the constraints of the RAMP model in Eq. (9). Assuming that each p i + Δp i is a probability vector, we then have for each and is independent of all S max{ }, Theorem 1 highlights that any questionable discontinuities that FBA might exhibit with regard to parametric update are due to the imposed linear equalities that are the byproduct of enforcing the biologically unrealistic assumption of a uniform steady state. The RAMP approach adjusts FBA's modeling paradigm to include its inherent stochastic nature, and in the process, RAMP ensures the continuity that would be expected of the feasible flux states as long as the imposed linear bounds can also be relaxed commensurate with the magnitude of the perturbation. In conclusion, any discontinuity of FBA that could be caused by minute model adjustments are the outcome of an overly rigid (linear) model.
We note that a recent discussion in the literature, see refs 40 -42, suggests that the creation of biomass might be sensitive to small perturbations in the growth coefficients. Indeed, the authors of ref. 40 have argued that in some cases perturbations to the biomass equation are necessary to achieve growth. The continuity of Theorem 1 counters any such concern with regard to RAMP, as flux states for any growth equation would remain close to those with a perturbed equation. Specifically, a change in the creation of biomass is controlled by the amount of coefficient perturbation, and hence, biomass creation can not depend on small adjustments to the growth coefficients in RAMP.  (10) and (11).
Connection between FBA and RAMP solutions. A reasonable question is if the deterministic FBA model is a limiting case of RAMP as the stochastic elements diminish? Theorems 2 and 3 show that convergent RAMP solutions are indeed FBA solutions, and hence, they answer this question in the affirmative. They also show that interpreting FBA as a limiting RAMP model characterizes the possible random flux states among cells in an optimal growth, steady state culture. The convergence result of Theorem 2 assumes an interiority condition, which is tacit as long as the equalities imposed by the bounding constraints are relaxed to allow arbitrarily small adjustments, i.e. as long as for any arbitrarily small η > 0. The proof of Theorem 2 (see Methods) is straightforward and follows from the fact that if the primal and dual solutions converge as the stochastic elements dissipate, then the resulting necessary and sufficient conditions are that of FBA. However, the proof importantly identifies that not all random elements need to disappear, a point that prompts Theorem 3. , and M i t , and assume v t → v as t → ∞. Assume likewise that a corresponding dual sequence of optimal solutions converges. Further assume that Then v is a solution to the FBA model in Theorem 2 can be relaxed since the proof remains valid as long as the limiting (see the proof in Methods). Since R i can have low rank, e.g. the rank of R 1 in Eq. (12) is 1, we see that R i need not generally vanish. This observation suggests that not all uncertainty needs to be removed to recover an FBA solution with RAMP. It is this interesting observation that motivates our third result.
Allowed probabilistic variation for optimal flux solution. Suppose v is a FBA solution corresponding with an average stoichiometric matrix, i.e. the rows of the matrix are p S i T i . We pose the question of whether or not it is possible to identify nontrivial scenarios and probabilities so that v is a solution to both an FBA problem and to a limiting stochastic RAMP counterpart. If such probabilities and scenarios exist, then the optimal flux state v would be optimal for a range of stochastic variation within an optimal growth, steady state culture. We describe scenarios Ŝ i as biologically possible for v if there exists (positive) probability vectors p i such that v maximizes cellular growth under the conditions that Theorem 3 characterizes the biologically possible scenarios of any FBA solution. The argument arranges and where S′ is the submatrix of S whose columns correspond with ′ v .
Then the scenarios of Ŝ i are biologically possible for v if and only if for all i we have α ′ ′ = S v e i i for some scalar α i ≠ 0. In biological terms, Theorem 3 identifies the probabilistic variations that are possible in the stoichiometric matrix for cells with the same optimal, steady state fluxes. Thus, if we trust that FBA solutions do indeed identify the limiting behavior of cells as they evolve to optimize to their environmental resources, then Theorem 3 concludes that optimized cells with the same flux state can be described by different probabilistic models.
Additionally, since Eq. (17) in the proof of Theorem 3 only requires that p i be a probability vector with nonzero entries, we can use this fact to construct a test that will check if any particular collection of scenarios is possible for optimal states. For example, suppose we are interested in investigating if some of the stoichiometric coefficients can vary for the cells having a specified flux state v in an optimized culture. Any collection of scenarios for which ˆŜ v i does not have identical components is impossible, since there is no possible way to assign probabilities that make the flux state optimal once probabilistic variation is included in the model. If our mathematical models accurately assess biological reality, then we can claim that it is biologically impossible to have a collection of optimized cells with a common flux state that vary according to the suggested scenarios.
RAMP implemented as a computational model. We now compare the computational ability of several RAMP implementations with FBA. The computational results herein only initiate the broad comparisons that would benchmark RAMP against the numerous FBA adaptations. The metrics we use in our benchmarking are direct comparisons between FBA and RAMP predictions of experimentally determined fluxes, and RAMP's and FBA's ability to identify essential and viable genes. The latter since this is one of the commonly used tests to assess a metabolic model's potential for producing biologically relevant predictions. All flux tests were conducted on the iJO1366 metabolic model of Escherichia coli 25 , whereas knockouts also used the iAF1260 metabolic model 24 .
These high-quality reconstructed models have, respectively, 1366 and 1260 genes, 2583 and 2382 metabolic reactions, and 1805 and 1668 metabolites.
Gene essentiality is decided by simulating gene knockouts with a corresponding removal of the associated set of metabolic reactions 2, 3 . If the modified metabolic model affects the optimal growth rate, typically by at least a 50% reduction, then the gene is considered to be essential to the organism. When a predictive model correctly identifies a gene as essential, we label this as a true positive. Similarly, a true negative indicates a correctly identified non-essential gene by the predictive model. A false positive occurs if the analysis incorrectly labels a gene as essential, and a false negative if the analysis incorrectly labels a gene as non-essential. The predictive power of the model is the ratio of the sum of true positives and negatives to the number of genes.
The forthcoming computational experiments introduces various stochastic assumptions by querying either different percentiles δ 1−ε or different scenarios for the growth equation. At this point, however, it is necessary to emphasize that the values of M i bounds are also part of the stochastic interpretation since they define the permissible variation from steady state. Specifically, the set of feasible flux states either grows or stays the same with an increase in any M i , from which we know that the growth rate is non-decreasing as a function of the individual M i values. So if the M i values are too small, then we will likely under-approximate the growth rate, but if the M i values are too large, then we will likely over-approximate the growth rate. We solve an optimization problem (see Methods) to appropriately set the values of M i so that RAMP's growth rate exactly matches that of FBA's. The problem ensures that the sum of permissible variations from steady state is as small as possible.
RAMP assessment of uncertainty in growth reaction. The goal of our first computational experiment is to gauge the relationship between RAMP and FBA for individual uncertainties in the growth reaction. This is motivated by the observation that stoichiometric coefficients of the growth reaction are empirically determined non-integers, with values spanning many orders of magnitude (see e.g. Table 1). We address this goal by posing the question, What is the maximum level of uncertainty tolerated in a single stoichiometric coefficient in the growth reaction before RAMP's predictions of gene-knockout essentiality depart from those of FBA in a given environment?
We tackle this challenge by individually increasing the level of randomness in a single growth coefficient, multiplying it with a factor containing ±σ based on five scenarios, and we increase the multiplier σ from zero in an iterative fashion (see Methods for details). As expected, RAMP and FBA coincide for the case with no randomness in the growth stoichiometric coefficients, i.e. with σ = 0. However, as σ increases it is possible that RAMP will find a different set of essential genes than FBA. Consequently, we determine the maximal value of σ for which RAMP and FBA return identical sets of essential genes.
Assuming that the genome-scale metabolic reconstruction is a high-quality rendering of the biochemical processes associated with the individual biomass components, we make the following observations: If RAMP returns identical essential gene sets as FBA (i.e. stable predictive ability) for large percent variations of a single coefficient, then the cells of the culture could possibly be disparate in how they incorporate the associated metabolite in biomass. If the predictive ability instead degrades with only slight deviations, then the growth coefficient is more likely exhibiting little variation across the culture.  Table 2 for names) have an undetermined maximal value > 10 26 and are placed at the vertical axis (10 4 ) Figure 2 displays, on a logarithmic scale, the calculated maximal values of the multiplier σ before RAMP departs from FBA's gene-knockout predictions, for each of the 72 growth coefficients in iJO1366 25 . We find that for 18 of the coefficients (see Table 2 for their names), we were only able to determine a lower bound on the value of the multiplier of σ > 10 26 . Upon inspection of the numerical results for these 18 coefficients, we noticed the following two properties: within numerical accuracy, (i) the value of the mean (see Eq. (6)) is μ i = 0, and (ii) the standard deviation (see Eq. (7)) scales with M i . A direct consequence of these two observations is that the RAMP-specific constraints of Eq. (8) reduces to the standard FBA constraint S i v = 0. Hence, for these 18 coefficients of the biomass equation, the RAMP problem reduced to that of FBA.
The biochemical explanation for why RAMP collapses to the standard FBA problem for all values of σ is immediate for the coefficients in Table 2 with the exception of 2fe2s, 4fe4s, fe3, and mobd: Upon inspection of iJO1366, we find that the biomass components are (i) freely available in the environment, and (ii) may be imported into the cytosol at no energetic or metabolic burden. Hence, any change of their stoichiometric coefficients in the biomass equation leaves the remainder of the biochemical network flux-patterns unchanged. However, the introduction or generation of 2fe2s, 4fe4s, fe3, and mobd into the cytosolic compartment is associated with metabolic burden through more complex pathways in the biochemical network reconstruction. For these four cases we were unable to determine a clear pattern, making a detailed biochemical explanation difficult.
We find for the remaining 54 biomass coefficients that the maximally allowed multiplier spans a surprising 4 orders of magnitude. Upon investigating a possible relationship between these maximal multiplier values and the corresponding stoichiometric coefficients, we find that they are uncorrelated (ρ = −0.004). Note that, for all of the 54 coefficients, the numerically determined RAMP solutions are more permissive than the FBA solutions, meaning that at the determined maximal multiplier values, RAMP identified as essential a subset of the FBA-determined essential genes. In total, we found that 23 of the 54 coefficients supported maximal permissible multipliers σ > 1. This is quite surprising, as it suggests that both the sign and the magnitude of the corresponding stoichiometric coefficient may vary while the growth rate and RAMP's predictive power are maintained.
RAMP computational predictions of gene essentiality. The essential gene predictions of RAMP and FBA for our second computational experiment, in which all coefficients in the growth reaction are simultaneously uncertain (see Methods), are tabulated in Table 3. We assumed in all experiments an oxygen-unlimited environment with glucose as the sole, and limiting, carbon source as the default environment. We additionally conducted the same tests with glycerol as the only carbon source, finding results that mirror those of glucose (not shown).
The tallies in Table 3 show some interesting differences between the two genome-scale metabolic reconstructions iAF1260 24 and iJO1366 25 in response to the different RAMP implementations. While the majority of biochemical reactions are the same in these two models, the RAMP analysis emphasizes that small details of the reconstructed model affect the identification of essential genes. Here, RAMP may be an additional approach in vetting the quality of model reconstructions. Table 3 shows that the default RAMP model (RAMP 1 ), which only permits uncertainty past the stated accuracy of FBA's growth equation, precisely replicates the essential gene predictions of FBA for both genome-scale metabolic model reconstructions. This fact computationally substantiates Theorem 2 and numerically demonstrates further that minor probabilistic variations mimic FBA.
Model 2 (RAMP 2 ) is designed to magnify the perturbations of Model 1 by a multiplicative factor ρ and 'spread' the scenarios in the last few digits. From Table 3, we see that this approach impacts RAMP's predictive ability. For the iJO1366 model, all values of ρ decreased the true positives by 9 and increased the true negatives by 1, which means that RAMP 2 had 8 fewer correct predictions than FBA with increased scenario variability as magnified by  Scaling the scenarios proportionately (RAMP 3a , RAMP 3b , and RAMP 3c representing increasing levels of variation) has a slightly more varied effect than multiplying the default variations by ρ. Results with σ ≥ 0.4 had little value since these models relaxed the FBA constraints to the point at which few, if any, essential genes were identified, and for this reason we only report the results for σ ≤ 0.3. The trend as σ increases is that RAMP 3 reduces the number of predicted essential genes, which lowers the number of true and false positives but increases the number of true and false negatives. Five true positives move to false negatives in the iJO1366 model for RAMP 3a and RAMP 3b , which reduces RAMP's predictive ability to 90.78%. However, for larger σ, RAMP 3c increases its number of true negatives and reduces the number of false positives, which increased predictive ability to 90.92%. The trend for the iAF1260 model is similar.
Altering the parameter ε (RAMP 4 ) had no effect on RAMP's ability to predict essential genes, and for all tested values of ε, RAMP agreed with FBA. This clearly suggests that it is more important to tune RAMP with regard to the scenarios than it is with the probabilistic guarantee of satisfying the near equilibrium constraints in Eq. (3).

RAMP comparison with experimentally determined fluxes.
To investigate RAMP's consistency with experimentally determined fluxes, we formulated a non-linear (quadratic) objective function for RAMP that minimizes the distance between a set of experimentally determined flux data and the corresponding reaction fluxes in a genome-scale metabolic model. We directly compare the results from RAMP 3 with σ = 0.2 with the corresponding results from FBA. The details of the problem formulation are given in Eqs (22) and (23). We used flux data 6 measured for 28 reactions from the central carbon metabolism of E. coli growing on glucose in batch conditions 26 both aerobically (Fig. 3(a)) and anaerobically (Fig. 3(b)), as well as in aerobic chemostat conditions 27 at the two dilution rates of 0.1/h (Fig. 3(c)) and 0.4/h (Fig. 3d)).
A striking visual feature of Fig. 3 is that the RAMP-predicted fluxes consistently provide a better fit to the experimental fluxes than the FBA-predicted ones. In Table 4, we quantify the performance of RAMP and FBA in predicting fluxes by calculating the mean square error , with N = 28. The results  Supplementary Table S1.
reported in Fig. 3 and Table 4 demonstrate that RAMP significantly outperforms FBA in predicting experimental fluxes in a wide range of conditions.

Discussion
The expanding suite of systems-biology computational methods described as constraint-based analyses has significantly improved our ability to probe the properties of genome-scale metabolic reconstructions. However, the basic premises of these methods, the assumptions of steady state and no heterogeneity in a cellular population, has stayed unchanged. In this article, we have developed the conceptual framework for a new paradigm of constraint-based methods by explicitly relaxing the basic assumption of steady-state while allowing for heterogeneity. We have presented the mathematical foundation proving that, as levels of uncertainty are reduced, the RAMP method will become identical to standard FBA. Building on the RAMP foundation, it will be possible to develop robust versions of many of the contemporary constraint-based methods. For some of the new extensions to FBA, the inclusion of RAMP formalism is straightforward, e.g. FBAwMC 32 , GIMME 33 , MOMENT 34 , and GX-FBA 35 . However, also methods that add gene-regulatory effects, such as rFBA 43 , and thermodynamic considerations 44 may be directly implemented with RAMP, since many of these methods generate modified metabolic networks, or temporal sequences of such. Note that, since it is known that uncertainty in variable bounds, e.g. restrictions such as upper bounds on reaction fluxes, mathematically is not different from uncertainty in the stoichiometric coefficients 20 , the RAMP formulation is also capable of modelling uncertainty in flux constraints.
Theoretically, we are able to formulate a wide host of linear and nonlinear RAMP models as SOCPs. However, the nonlinear ones have not been uniformly translatable into computational reality. While we have been able to solve the L 2 RAMP flux-minimization problem (see Eq. (22)) and show that RAMP outperforms FBA for flux predictions, we have achieved interspersed success with native SOCP solvers for general RAMP models. Specifically, we have attempted to • solve L 2 versions of Eq. (21), • predict essential genes with a RAMP version of the minimization of metabolic adjustment (MOMA) method 45 , which is a quadratic model, and • increase variation beyond the growth coefficients to replicate stochastic environmental bounds.
While we are able to solve these problems occasionally, we were unable to identify a choice of solver settings and convergence criteria to achieve consistent and stable success for the numerical implementation with current SOCP software for any of these extensions. Establishing a stable computational platform for the general RAMP paradigm is an important goal for future work.
The SOCP constraints of RAMP appropriately inject statistical variation into the FBA paradigm, but the conclusion of whether or not RAMP's stochastic perspective is more or less restrictive than FBA's deterministic setting remains unclear. From a mathematical point of view, RAMP is neither a direct relaxation of FBA nor a direct restriction of FBA. However, our computational results demonstrate that RAMP is, in practice, a relaxation of FBA. When investigating uncertainty in individual growth coefficients, we uniformly show that discrepancies in gene essentiality are due to RAMP being less restrictive, as evidenced by RAMP having predicted at least one essential gene less than FBA, i.e. a subset of the FBA results. Similarly with all growth coefficients assumed uncertain, RAMP's stochastic adaptation results in a model whose trend is to predict fewer essential genes, especially as uncertainty increases.
These computational outcomes favorably agrees with the biological sentiment that cellular metabolic networks are more robust in their wild-type setting and are increasingly fragile as they evolve toward an optimal, and specialized, steady-state. Hence, the number of essential genes should reduce as statistical variation increases, which is predicted by RAMP. Moreover, our mathematical analysis and computational outcomes show that RAMP mimics FBA if the stochastic elements are sufficiently small. In particular, if variation is not allowed beyond the assumed accuracy of an FBA model, then RAMP behaves like FBA for predictions of essential genes.
Theorem 3 provides a new approach to discriminate FBA's optimal flux states: For example, some FBA solutions may have no biologically possible scenarios while others may have many. This suggests that the former is less stable with regards to random variation, whereas the latter is optimal under a wider range of stochastic possibilities. One may intuit that actual flux states should be robust against random variation 46 , and hence, learning to calculate FBA solutions that can witness as much variability as possible is a promising direction of future research.
Finally, our choice of letting the objective function coincide with the growth reaction was purely a choice based on making RAMP easily comparable to traditional implementations of FBA, and RAMP is valid beyond this choice.

Methods
In this section we provide the mathematical foundation and proofs of the theorems presented in the main text. We also discuss aspects of the computational stability and implementation of RAMP.

Lemma 1
The following lemma establishes a general property from which our result follows.

Lemma 1. Let A be an n-element row vector, R be a q × n matrix, and b be a positive scalar. Let
. Then, for any ∈ v F A R ( , ), there are scalars z and λ satisfying 0 < z ≤ 1 and λ ≥ 0 such that ) and λ is independent of ∆A and ∆R .
Proof. Should v satisfy and we are done with z = 1 and λ = 0. Suppose instead that . From the Intermediate Value Theorem there is a τ such that 0 < τ ≤ 1 and To ease notation, let We conclude that , and hence, Hence, and λ is independent of ∆A and ∆R . □ Lemma 1 shows that vectors satisfying SOCP constraints of the form can be scaled to remain feasible, and moreover, that the magnitude of the adjustment to remain feasible is uniformly bounded by the magnitude of the perturbations. Importantly, this result immediately extends to finite collections of SOCP constraints of the same form, a fact we formalize in Corollary 1.

Corollary 1.
For i = 1, …, m, let A i be an n-element row vector, R i be a q × n matrix, and b i be a positive scalar. Let and λ is independent of all ∆A i and ∆R i .
Proof. The proof follows directly from Lemma 1 and its proof upon letting z i and λ I be the scalars for each I and setting z = min i {z i } and λ = max i {λ i }. □ Unfortunately, systems of linear equalities like those of FBA do not satisfy similar continuity properties, and changing the stoichiometric matrix in FBA can lead to discontinuities. As a simple example, the system of homogeneous equations has substantially different solution sets around α = 0. Notice that v 1 = v 2 = t solves the system for any t if α = 0, but that the only solution for α ≠ 0 is v 1 = v 2 = 0. Hence the solution v 1 = v 2 = 1 at α = 0 is not arbitrarily close to a solution with α ≠ 0.

Proof for Theorem 1
Proof.

15)
From the Mean Value Theorem there is a μ ik between p ik and p ik + Δp ik such that Each of RAMP's SOCP constraints may be re-written as  (14) and (16) we know that there is a v′ satisfying these perturbed SOCP constraints and a λ ≥ 0 defined independent of ∆A i and ∆R i , and subsequently independent of ∆Ŝ i , such that Since λˆ, κ i A , and κ i R are all independent of ∆Ŝ i , the proof is complete upon noticing that v′ must also satisfy

Proof for Theorem 2
Proof. For each t let ρ σŷ y w w ( , , , , , ) t t t t t t be the assumed dual solution that converges to ρ σŷ y w w ( , , , , , ). Further let f be a vector so that f T v = v Growth . The dual can be assumed to satisfy Slater's interiority condition since the necessary and sufficient primal-dual conditions in this case would be satisfied by the convergent sequences: Hence the system is indeed necessary and sufficient for optimality. Allowing t → ∞, we have by assumption that Since these are the necessary and sufficient conditions for FBA, v is an optimal solution to the FBA model.

Proof for Theorem 3
Proof. Let = ′ v v ( , 0) be as stated. Then, Hence, the limiting RAMP model as M i ↓ 0 is the linear program The necessary and sufficient conditions are  (9)) as a computational model proved elusive. While we were able to determine optimal solutions in many situations for the tested genome-scale metabolic reconstructions, we were unable to find consistent achievement of optimality for the general RAMP problem with a wide variety of state-of-the-art algorithms to solve SOCPs. The computational challenge is likely caused by two characteristics specific to genome-scale metabolic reconstructed networks. First, a substantial number of variables are unsigned and, essentially, unbounded. Second, the FBA problems arising from genome-scale reconstructed metabolic networks are highly degenerate (a detailed discussion of this point is provided in ref. 47) and often have high-dimensional solution sets. These characteristics combine in a way that seems to hamper the underlying interior-point algorithms employed in native SOCP solvers. We also tried to use standard nonlinear reformulations that allowed us to experiment with nonlinear solvers, but again success was tepid at best. Motivated by biological considerations, we therefore implemented a limited version of RAMP as a computational model. If the probabilistic variability of each constraint is restricted to a single coefficient, then the associated SOCP is linear. The linear constraint is constructed by, for example, assuming that the last coefficient of the i-th constraint is the sole random variable. Then for some q-vector s we have This calculation shows that if we restrict probabilistic variability to, for example, the coefficients of the growth reaction, then each of the SOCP constraints of the form can be re-written linearly as where Ŝ i,Growth is the column of Ŝ i containing the scenarios for the growth coefficient. We choose to use the growth coefficients for two reasons. First, we have already noted that this is an empirically derived reaction that is based on the aggregate properties of a cell culture, and thus, the stoichiometric values are inherently associated with uncertainty. Furthermore, it is well known that biomass composition (and thus the values of the stoichiometric coefficients of the growth reaction) of a cellular culture is affected by nutrient conditions and the culture's growth phase 48,49 .
The linear re-formulation in Eq. (20) allows RAMP to be solved with standard linear simplex solvers, which proved to be computationally stable. All numerical work was conducted with the freeware GLPK or with the commercial solver Gurobi © , both of which worked well with the COBRA toolbox 37 .
RAMP prediction of gene essentiality. We present two computational experiments to assist in assessing how stochastic variation affects RAMP's ability to predict gene essentiality. The first experiment iteratively induces individual randomness in each of the growth coefficients, holding the others at their nominal value as stated in the model. The second experiment assumes simultaneous randomness in all growth coefficients. In all of the computational experiments, we used the E. coli genome-scale metabolic models iAF1260 24 and iJO1366 25 in a minimal nutrient environment consisting of (i) the models' presets to emulate M9 medium, (ii) a maximal glucose uptake rate of 10 mmol/gDw/h, and (iii) unlimited oxygen uptake.
RAMP variation in a single growth coefficient. Here, we increase the level of uncertainty in a single growth coefficient in an iterative fashion, and we systematically go through all of the coefficients. The goal of this experiment is to gauge predictive ability against individual uncertainties in the growth equation, and we measure the predictive ability simply by comparing the number of essential genes with that predicted by FBA. If the predictive ability is stable for large percent variations of a single coefficient, then the cells of the culture could possibly be disparate in how they use the associated metabolite. If instead the predictive ability degrades with slight deviations, then the growth coefficient is more likely conserved across the culture.
The experiment uses q = 5 scenarios for each coefficient. The two most extreme scenarios multiply the coefficient by ±σ with probability P(3/2 ≤ z) = 0.0351, where z is standard normal variable. Two other scenarios multiply the coefficient by ±σ/2 with probability P(1/2 ≤ z ≤ 3/2) = 0.2389, and the other scenario leaves the coefficient unchanged with probability P(−1/2 ≤ z ≤ 1/2) = 0.4520. The value of δ 1−ε was 3, which meant that the probabilistic constraints were guaranteed with probability 0.9997.
The largest value of σ was calculated within an accuracy of 0.0001 and with a final value that allowed RAMP's predictive accuracy to match that of FBA's. Thus, starting from an initial value of σ = 0.0001, we (i) increased σ by a factor 2 until RAMP's gene knockout predictions no longer matched that of FBA, and (ii) at this point, we initiated a binary search to identify the maximal value of the multiplier σ, which resulted in RAMP and FBA gene knockout predictions matching.
RAMP simultaneous variation in all growth coefficients. In this experiment we assumed simultaneous randomness in all growth coefficients. Four probabilistic models were used to assess RAMP's overall sensitivity to stochastic variation, and each was compared against FBA's gene knockout predictions as calculated by the COBRA toolbox (see Table 3 for results).
Model 1(default) Our default RAMP model assumes that the means of the growth coefficients are the values stated in the FBA model and that probabilistic variation is restricted to the first unspecified significant digit. Assuming − 10 d i identifies this digit for the i-th growth coefficient, we further assume that each growth coefficient has the 5 scenarios in which it is perturbed by η ± ⋅ − 10 d i , where η is one of 0, 1, or 2. The probability of the scenario with no perturbation is P(−1/2 ≤ z ≤ 1/2) = 0.4520, with perturbation ± − 10 d i is P(1/2 ≤ z ≤ 3/2) = 0.2389, and with perturbation ± ⋅ − 2 10 d i is P(3/2 ≤ z) = 0.0351, where z is a standard normal variable. Two examples of these scenarios are illustrated in Table 5. We choose to set δ 1−ε = 3, so that the probabilistic constraints have a 99.97% guarantee of satisfaction.
Model 2 The second probabilistic model multiplies η in model 1 by the scalar ρ to assess how RAMP solutions adjust as the scenarios deviate from those of FBA. Eight tests with ρ = 2, 3,…, 9 were considered. Integers beyond 9 resulted in sign changes and were not considered. The probability scenarios from Model 1 were used, as well as δ 1−ε similarly being set to 3.
Model 3 Since Model 2's scaling by ρ disproportionately effects small growth coefficients, we compensate by replacing the scenarios of Model 1 with percentages of the growth coefficient itself. We choose different ranges for σ, using the scenarios σ η ± ⋅ ⋅Ŝ (1 ) i Growth , , where η is one of 0, ±1/2, or ±1. The probabilities are unchanged from those in Model 1, and δ 1−ε is 3.
Model 4 To assess how RAMP reacts to changes in the certainty of satisfying the probabilistic constraints, model 4 changes δ 1−ε . All other model parameters are inherited from the default model.
In our testing of the predictive ability of these four computational RAMP models (for computational results, see Table 3), we used the short-hand notation given in Table 6 for our choice of combination of model and parameters.
Determination of bounds M i . The selection of M i (see Eq. (9)) is an important parameter to decide. However, instead of imposing these bounds arbitrarily, these parameters are determined so that each RAMP model accurately returns the targeted, optimal growth rate. Let γ * be the optimal growth rate as calculated by the FBA model. The M i values are calculated by solving the following optimization problem.
where M is the vector whose i-th component is M i , and ⋅ 1 is the L 1 norm. Setting M to be the calculated optimal solution of Eq. (21) tightens the SOCP constraints while ensuring that the optimal growth rate is held at its desired value. We experimented with L 2 and L ∞ counterparts; however, the L 2 norm suffered from inconsistent solves, and the L ∞ norm overly relaxed constraints, which was not surprising.
Comparison of RAMP and FBA with experimentally determined fluxes. We tested the ability of FBA and RAMP to agree with experimentally determined fluxes. For RAMP we solved, As with RAMP, θ was set to 0.9, and the solution was again as close as possible to v EXP .  Table 6. The choice of RAMP parameters for models 1-4 in our computational simulations.