Abstract
Constraint-based optimization, such as flux balance analysis (FBA), has become a standard systems-biology computational method to study cellular metabolisms that are assumed to be in a steady state of optimal growth. The methods are based on optimization while assuming (i) equilibrium of a linear system of ordinary differential equations, and (ii) deterministic data. However, the steady-state assumption is biologically imperfect, and several key stoichiometric coefficients are experimentally inferred from situations of inherent variation. We propose an approach that explicitly acknowledges heterogeneity and conducts a robust analysis of metabolic pathways (RAMP). The basic assumption of steady state is relaxed, and we model the innate heterogeneity of cells probabilistically. Our mathematical study of the stochastic problem shows that FBA is a limiting case of our RAMP method. Moreover, RAMP has the properties that: A) metabolic states are (Lipschitz) continuous with regard to the probabilistic modeling parameters, B) convergent metabolic states are solutions to the deterministic FBA paradigm as the stochastic elements dissipate, and C) RAMP can identify biologically tolerable diversity of a metabolic network in an optimized culture. We benchmark RAMP against traditional FBA on genome-scale metabolic reconstructed models of E. coli, calculating essential genes and comparing with experimental flux data.
Introduction
Constraint-based analysis is an approach to study metabolic networks that has become a primary computational tool with a well-established literature^{1,2,3}. The foundation of constraint-based analysis is a representation of a cell’s metabolism as a linear system of differential equations in the (unknown) fluxes of the metabolic reactions. The additional assumption that the metabolic network has reached steady state results in a homogeneous system. It is customary to optimize an appropriate objective over the fluxes that achieve steady state to model how a cellular metabolism reacts as it is held in a constant environment^{4, 5}. Different objectives have been proposed and studied^{6, 7}, with the most prevalent being the maximization of the rate at which biomass is created.
The most commonly applied version of constraint-based analysis is known as flux balance analysis (FBA). The original study of FBA investigated small portions of the central metabolism^{8}, but the use of constraint-based methods has expanded to encompass genome-scale metabolic models and a diverse range of applications and computational formulations^{9}. However, the basic premises of optimality and steady state have remained intact, although these assumptions are well known to be inexact from a biochemistry perspective. That said, there are two somewhat evident reasons to leave these premises unchecked. First, the resulting optimization problems are most often linear and less often quadratic (convex), and in both cases the problems are easily solved with standard methods and software. Second, traditional constraint-based models are useful even with questionable premises. For example, with a high level of accuracy, FBA models can identify essential genes^{10}, predict metabolic responses as pathways are interrupted^{11}, determine central metabolic pathways^{12,13,14}, develop synthetic biology engineering strategies^{15,16,17,18}, as well as identify genes showing promise as inhibitors of cancer migration^{19}. Hence, a justified retort by the constraint-based analysis community to those who might question the two defining premises is that the modeling approach yields meaningful and computationally tractable science even though the cornerstone premises are imperfect.
The traditional constraint-based paradigm assumes a model of an individual cell that has reached an ideal, limiting setting of steady state. Models are benchmarked with experimental outcomes under the assumption that experimental measurements are conducted on a population of identical cells that have evolved to optimal performance. However, experimental reality differs from this assumption. First, even well-optimized cellular populations of identical cells exhibit prominent heterogeneity in uptake, secretion, and growth rates^{4}. Second, experimental flux measurements constitute averages over such heterogeneous populations. Moreover, these averages provide the only possible experimental prototype of cellular flux since single-cell measurements are currently impossible. Third, heterogeneity of metabolic rates is innate within a culture because cells vary within, e.g., their cell cycles and replication states. The traditional paradigm that a constraint-based model represents a single, optimized cell is certainly challenged by such experimental realities upon which models are vetted.
Here, we challenge and relax the foundational assumption of a system being in a steady state, and we instead introduce a robust optimization^{20} counterpart to FBA, called Robust Analysis of Metabolic Pathways (RAMP). The robust optimization implementation allows us to model a system stochastically where, instead of imposing the rigid and simplistic condition of deterministic coefficients and steady state, we allow these assumptions to be relaxed. In RAMP we control departures from steady state by limiting their likelihood of deviation. Since the stochastic flexibility of RAMP permits us to postulate metabolic models for cultures that deviate from steady state, this opens the possibility of computationally studying functional states of cellular metabolisms as they transition toward a steady state. Such an investigation raises the questions of (1) whether or not RAMP solutions are continuous in their stochastic elements and (2) whether or not convergent trajectories of RAMP solutions yield deterministic FBA solutions as the probabilistic assumptions abate. We mathematically answer both questions in the affirmative, and hence, FBA is justly thought of as the limit of evolutionary adaptations of a cellular metabolic network that achieves steady state.
Additionally, the development and stochastic formulation of RAMP explicitly allows us to systematically address heterogeneity in metabolic phenotypes that exists in isogenic cellular populations^{21,22,23}. Our mathematical analysis identifies the probabilistic situations that coincide with our modeling framework once the limiting deterministic setting is reached. Consequently, if we assume that RAMP’s modeling paradigm aligns with the stochastic nature of a culture, then we can mathematically characterize the ways in which a culture may harbor variation.
Possible concerns with RAMP are that it could lose the predictive successes of FBA, and that it could be less computationally tractable. We have designed our computational experiments to compare RAMP with FBA in such a way that both models can be solved with the default simplex algorithms of several linear solvers, and our results on two E. coli models^{24, 25} demonstrate that RAMP rivals FBA in its ability to identify essential genes. We further show that RAMP’s efficacy in identifying essential genes is predominantly stable as individual coefficients of the biomass equation are assumed to be uncertain. The ranges over which individual coefficients can vary are disparate, having a surprising spread of many orders of magnitude. All coefficients could accommodate uncertainty of at least 0.42%, with most extending beyond 100% uncertainty. We also compare FBA’s and RAMP’s consistency with experimentally determined fluxes^{6, 26, 27}, finding that RAMP significantly outperforms FBA for both aerobic and anaerobic conditions.
With regard to computational tractability, RAMP is a robust linear program, which is a second-order cone program (SOCP) that is known to be solvable in polynomial time^{28, 29}. Such robust models were developed to overcome problems with over-optimizing designs, meaning that a design could be optimal with respect to the estimated data on which it was built but (much) less so as the data varied over realistic possibilities^{30}. A similar concern about over-optimization has been expressed in the FBA literature^{7}, and RAMP directly addresses this issue by expressing variability within the model itself.
We note that an alternative robust model has been suggested^{31}. This method differs significantly from RAMP since it uses a robust least squares method in a bi-objective model. Such a model can illustrate the trade-off between the optimal growth rate and the deviation from steady state, whereas RAMP uses robust linear programming. Furthermore, the numerical work in ref. 31 is conducted on a small, illustrative example, and there is no extrapolation of the bi-objective model to a realistic genome-scale metabolic reconstruction.
Our conclusion is that RAMP opens a new paradigm for constraint-based optimization and analysis of genome-scale metabolic models that is also directly applicable to a large number of the recent constraint-based extensions to FBA. Our computational implementation and benchmarking of a specific version of RAMP, using both linear and nonlinear objective functions, show that it is a computationally tractable alternative and complement to traditional FBA.
Results
We begin by revisiting the foundational assumptions of FBA, where the metabolism of an individual cell c is defined by a system of n reactions, with the jth reaction being
$$a_{1j}^{c}[x_{1}^{c}]+a_{2j}^{c}[x_{2}^{c}]+\dots+a_{mj}^{c}[x_{m}^{c}]\underset{k_{-}^{cj}}{\overset{k_{+}^{cj}}{\rightleftharpoons}}b_{1j}^{c}[x_{1}^{c}]+b_{2j}^{c}[x_{2}^{c}]+\dots+b_{mj}^{c}[x_{m}^{c}].$$
Here, $[x_{i}^{c}]$ denotes the concentration of the ith metabolite in cell c. The cellular superscript c is absent in FBA’s standard development due to the assumption that an average cell is being studied. Thus, the ordinary differential equation for the ith metabolic concentration in cell c is
$$\frac{d[x_{i}^{c}]}{dt}=\sum_{j=1}^{n}S_{ij}^{c}v_{j}^{c},\qquad (1)$$
where ${S}_{ij}^{c}$ is the stoichiometric coefficient for metabolite i in reaction j, the flux of reaction j is ${v}_{j}^{c}$, and c denotes the cell. Consequently, genetically identical individuals of a culture share the same collection of metabolic reactions. Hence, the preponderance of stoichiometric coefficients of any particular genome-scale metabolic reconstruction are static across the individuals of a culture. However, a complete metabolic model includes several inferred and less certain reactions to model processes, such as those involved in the creation of biomass. In many of the recent extensions of FBA^{9}, a combination of alternative objective functions, global constraints or upper/lower bounds on sets of reaction fluxes are added to the system. Consequently, many extensions, such as FBAwMC^{32}, GIMME^{33}, MOMENT^{34}, and GXFBA^{35}, are implemented by adding rows and/or columns to the FBA stoichiometric matrix, and these new stoichiometric coefficients do not generally consist of integer values. Instead, depending on the method of interest, these added stoichiometric coefficients may consist of a combination of data from large-scale ’omics sources, biochemistry knowledge, or theoretical modeling assumptions. Thus, they are carriers of inherent uncertainty.
In particular, the experimentally determined coefficients of a biomass equation are noticeably different from their integral counterparts in the shared metabolism, see Table 1 as an example from the E. coli metabolic model iJR904^{36}. These less certain coefficients would be stochastic, time-varying parameters per individual cell. Thus, instead of the growth reaction in Table 1 being imposed on all cells in a population, these coefficients should be able to vary among the individual cells.
There also exist other sources of uncertainty for the performance of a genome-scale metabolic reconstruction, such as the value of the phosphate/oxygen (P/O) ratio. Changes in how the network performs ATP production and drain have a significant potential to affect model-predicted growth performance. In the following discussion, however, we will base our presentation and discussion on uncertainty in the biomass growth coefficients.
Suppose $\hat{c}$ indexes the single, average cell assumed by FBA. The limiting steady state assumption of FBA constrains the fluxes so that,
$$\frac{d[x_{i}^{\hat{c}}]}{dt}=\sum_{j=1}^{n}S_{ij}^{\hat{c}}v_{j}^{\hat{c}}=0,\quad \forall i,$$
which is the equilibrium condition as t → ∞. Furthermore, it has been suggested that cellular metabolic performance evolves toward an optimal growth state as a culture’s environment is left unchanged and as populations are repeatedly sampled and regrown^{4, 10}. FBA is predicated on the assumption that cells have been tuned through such an evolutionary process to an “optimal” use of their resources, where “optimal” is measured by a function of the fluxes, g(v). Hence, FBA studies metabolic processes through optimization problems, and adaptations thereof, of the form
$$\max\{g(v^{\hat{c}}):\ S^{\hat{c}}v^{\hat{c}}=0,\ L^{\hat{c}}\le v^{\hat{c}}\le U^{\hat{c}}\},\qquad (2)$$
where $S^{\hat{c}}$ is the stoichiometric matrix whose components are $S_{ij}^{\hat{c}}$ and $v^{\hat{c}}$ is the associated (decision) vector of fluxes. The objective function g(v) is typically chosen to estimate the rate at which biomass is created, and for convenience, we assume throughout that g(v) is the sole flux of the growth reaction, i.e. g(v) = v_{Growth}. While several alternative objectives have been proposed and investigated in different biological settings^{6, 7}, the production of biomass is the most prevalent choice and is the default in the COBRA^{37} toolbox. The vectors of lower bounds, $L^{\hat{c}}$, and upper bounds, $U^{\hat{c}}$, may contain ±∞, or some suitably large value, to indicate that a flux is unbounded. If $L_{j}^{\hat{c}}<0$ and $U_{j}^{\hat{c}}>0$, then reaction j is reversible.
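As a minimal sketch of the constraints in this FBA formulation, the following checks whether a candidate flux vector satisfies steady state and the flux bounds for a toy three-reaction chain. The network, the bounds, and the function names are hypothetical and chosen only for illustration; a genome-scale model would be solved with a linear programming solver rather than checked by hand.

```python
# Toy network (hypothetical): uptake -> A, A -> B, B -> biomass ("growth").
# Rows are metabolites A and B; columns are reactions v1, v2, v_growth.
S = [
    [1.0, -1.0, 0.0],   # A: produced by uptake, consumed by conversion
    [0.0, 1.0, -1.0],   # B: produced by conversion, consumed by growth
]
L = [0.0, 0.0, 0.0]             # lower flux bounds
U = [10.0, 1000.0, 1000.0]      # upper flux bounds; uptake is the limiting step


def is_fba_feasible(S, v, L, U, tol=1e-9):
    """Return True if S v = 0 (steady state) and L <= v <= U."""
    steady = all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)
    bounded = all(lo - tol <= x <= hi + tol for lo, x, hi in zip(L, v, U))
    return steady and bounded


def growth(v):
    """Objective g(v): the flux of the growth reaction (last entry here)."""
    return v[-1]
```

In this chain, steady state forces all three fluxes to be equal, so the largest feasible growth flux equals the uptake bound of 10; for example, `is_fba_feasible(S, [10.0, 10.0, 10.0], L, U)` holds while `[10.0, 5.0, 5.0]` violates steady state.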
A probabilistic development of RAMP
We assume that a culture has gone through a long enough time of growth and resampling to be largely optimized to the resources of its (invariant) environment. Heterogeneity within the culture is modeled as the random experiment that is the growth of the culture, and we consider the variational states of the genetically identical cells of the culture as independent and identically distributed outcomes from that growth experiment. Hence, the known heterogeneity within an optimized culture is the result of a large sample of possible outcomes. Consequently, for each metabolite i the collection of $d\left[{x}_{i}^{c}\right]/dt$ across the cells c of the culture is a random sample of the stochastic rate of change of the concentration. Let $\mathcal{C}$ be the collection of cells in the culture and $d\left[{x}_{i}^{\mathcal{C}}\right]/dt$ be the average rate of change of the concentration (of metabolite i) over the sample, that is
$$\frac{d[x_{i}^{\mathcal{C}}]}{dt}=\frac{1}{|\mathcal{C}|}\sum_{c\in\mathcal{C}}\frac{d[x_{i}^{c}]}{dt}.$$
The central limit theorem implies that $d\left[{x}_{i}^{\mathcal{C}}\right]/dt$ is approximately normal, and we let μ_{ i } and σ_{ i } be the unknown mean and standard deviation of this normal distribution. This approximation is expected to be trustworthy since cultures regularly contain hundreds of millions of cells.
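The appeal to the central limit theorem can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that per-cell rates follow a skewed (shifted exponential) distribution with mean 0 and standard deviation 1; the distribution, the sample sizes, and all names are hypothetical stand-ins, since the true per-cell distribution is unknown.

```python
import random
import statistics


def culture_average(n_cells, rng):
    """Average of n_cells per-cell rates drawn from a skewed distribution.

    The shifted exponential (mean 0, standard deviation 1) is an assumed
    stand-in for the unknown per-cell distribution.
    """
    return sum(rng.expovariate(1.0) - 1.0 for _ in range(n_cells)) / n_cells


rng = random.Random(0)            # fixed seed so the run is repeatable
n_cells = 100
averages = [culture_average(n_cells, rng) for _ in range(2000)]

mean_of_averages = statistics.fmean(averages)
sd_of_averages = statistics.stdev(averages)
expected_sd = 1.0 / n_cells ** 0.5   # sigma / sqrt(n), as the CLT predicts
```

Even with only 100 simulated cells per culture, the spread of the averages should be close to `expected_sd`; with the hundreds of millions of cells in a real culture the normal approximation is correspondingly tighter.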
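We replace the traditional static assumption of steady state with the probability constraints
$$P\left(\frac{d[x_{i}^{\mathcal{C}}]}{dt}\le M_{i}\right)\ge 1-\epsilon\quad\mathrm{and}\quad P\left(\frac{d[x_{i}^{\mathcal{C}}]}{dt}\ge -M_{i}\right)\ge 1-\epsilon,\qquad (3)$$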
which combine to ensure that $P(-M_{i}\le d[x_{i}^{\mathcal{C}}]/dt\le M_{i})\ge 1-2\epsilon$. Here, M_{ i } is a bound on the feasible fluxes that defines the allowed deviation from steady state. Unlike the FBA model formulation that collapses onto an ideal cell and ignores cultural variations, these stochastic constraints account for the innate variations that are known to exist. Since $(d[x_{i}^{\mathcal{C}}]/dt-\mu_{i})/\sigma_{i}$ is a standard normal variable, we write the constraints in Eq. (3) as
$$\frac{M_{i}-\mu_{i}}{\sigma_{i}}\ge\delta_{1-\epsilon}\quad\mathrm{and}\quad\frac{-M_{i}-\mu_{i}}{\sigma_{i}}\le\delta_{\epsilon},$$
where δ_{ ε } and δ_{1−ε } are the ε and 1 − ε percentiles respectively, i.e. $P((d[x_{i}^{\mathcal{C}}]/dt-\mu_{i})/\sigma_{i}>\delta_{1-\epsilon})=\epsilon$. Rearranging these inequalities and using the fact that δ_{1−ε } = −δ_{ ε }, we find
$$\mu_{i}+\delta_{1-\epsilon}\sigma_{i}\le M_{i}\quad\mathrm{and}\quad\mu_{i}-\delta_{1-\epsilon}\sigma_{i}\ge -M_{i}.\qquad (4)$$
The stochastic constraints in Eq. (4) are expressed in terms of the fluxes by extending the static differential equation of an individual cell in Eq. (1) to the stochastic differential equation of a culture. The standard assumption has been that the ideal cell of an FBA model is an average representation that matches measurements of a culture. If we allow S_{ i } to be the ith row of the random stoichiometric matrix in which the uncertain elements are random variables, e.g. for the creation of biomass, then the implicit FBA assumption of an ideal cell results in the following stochastic differential equation for each metabolite,
Consequently, for any flux vector v we have
$$\mu_{i}=E(d[x_{i}^{\hat{c}}]/dt)=E(d[x_{i}^{\mathcal{C}}]/dt)=E(S_{i}v)=E(S_{i})v.$$
An apt interpretation of FBA’s projection of a culture’s heterogeneity onto an average is somewhat laid bare by these equalities: FBA assumes that μ_{ i } = 0. The first and second equalities then constrain the expectation of the sample mean, $d\left[{x}_{i}^{\mathcal{C}}\right]/dt$, to be zero. The ideal FBA cell $\hat{c}$ is interpreted as a stochastic representation of the culture; a representation that inherits the expected averages of the metabolic rates of change over a culture. The FBA paradigm thus acknowledges, but ignores within its modeling framework, deviation from steady state since FBA only insists that the expectation be zero without regard to variation. The third equality is a direct consequence of Eq. (5), and the last equality follows from the linearity of the expectation. Hence FBA assumes E(S_{ i })v = 0, which shows that the stoichiometric coefficients of an FBA model are reasonably interpreted as averages, i.e. that $S_{i}^{\hat{c}}=E(S_{i})$. Hence, growth coefficients, such as those in Table 1, are expectations attributed to an ideal cell in hopes of representing an entire culture. If a stoichiometric coefficient is certain because it is part of the metabolic network that is common to the taxa, then the average is simply the known value that is shared among all cells. If the coefficient is otherwise uncertain and assumed to be random, then FBA assumes the mean value.
The equation E(S_{ i })v = μ_{ i } shows how the vector of fluxes v defines the expected value of the rate of change of the concentration (of metabolite i), and the equation begs for detailed random models of the uncertain stoichiometric coefficients. Such detail is challenged by experimental limitations. For example, it is impossible to study individual cells or even groups of cells that are in identical metabolic or proteomic states. Hence, inferring accurate distributional characteristics on parameters such as the growth coefficients is currently impossible. Beyond averages, our ability to express Eq. (4) in terms of the fluxes further necessitates the calculation of standard deviations.
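The percentile δ_{1−ε } that appears in the standardized probability constraints can be computed from the standard normal distribution. The sketch below uses Python's standard library for this; the value ε = 0.025, the function names, and the numerical inputs are illustrative assumptions, and the feasibility check is one rearrangement of the standardized inequalities rather than a prescribed implementation.

```python
from statistics import NormalDist


def normal_percentile(prob):
    """Return the standard normal percentile at cumulative probability prob."""
    return NormalDist().inv_cdf(prob)


epsilon = 0.025                               # illustrative choice only
delta_upper = normal_percentile(1 - epsilon)  # the 1 - epsilon percentile
delta_lower = normal_percentile(epsilon)      # the epsilon percentile


def satisfies_probability_bounds(mu, sigma, M, delta):
    """Check mu + delta*sigma <= M and mu - delta*sigma >= -M."""
    return mu + delta * sigma <= M and mu - delta * sigma >= -M
```

By symmetry of the normal density, `delta_lower` is the negative of `delta_upper`, which is the fact used to combine the two inequalities. A mean near zero with a small standard deviation passes the check, while a mean close to the bound M fails it once the standard deviation is accounted for.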
Scenario analysis in RAMP
A common statistical supposition is to impose additional requirements. For example, we could outright assume that the individual random elements of S_{ i } are independent and identically distributed. However, such a requirement would be dubious in our setting due to the certain dependence among growth and metabolic uptake and excretion. We instead promote the use of scenario analysis to query the behavior of the stochastic model. Scenarios have been used in other application domains in which similar optimization problems arise. For instance, scenarios are motivated even with an assumption of independent and identically distributed random variables to better manage computational space^{38}. In addition to overcoming our inability to accurately model the separate random stoichiometric coefficients, scenarios have the practical advantage of being easily interpreted as biological possibilities. Importantly, the normality of $d\left[{x}_{i}^{\mathcal{C}}\right]/dt$ is independent of the distributions used to model the uncertain stoichiometric coefficients. The scenarios in RAMP’s development and mathematical analysis are purposely arbitrary so that RAMP remains viable no matter which (discrete) distributions might actually model a biochemical reality.
Assuming that each S_{ i } has q random scenarios, we let S_{ ik } be the (nonrandom) stoichiometric coefficients for the ith metabolite in scenario k and let p_{ ik } be the probability of scenario k, for k = 1, 2, 3, …, q. Allowing p_{ i } to be the (positive) vector of probabilities indexed by k, P_{ i } to be the (positive definite) diagonal matrix formed by p_{ i }, and $\hat{S}_{i}$ to be the matrix whose kth row is S_{ ik }, the mean and variance of S_{ i }v are
$$\mu_{i}=\mathrm{E}(S_{i}v)=\sum_{k=1}^{q}p_{ik}S_{ik}v=p_{i}^{T}\hat{S}_{i}v\qquad (6)$$
and
$$\sigma_{i}^{2}=\mathrm{Var}(S_{i}v)=\sum_{k=1}^{q}p_{ik}(S_{ik}v-\mathrm{E}(S_{i}v))^{2}=(\hat{S}_{i}v)^{T}(I-ep_{i}^{T})^{T}P_{i}(I-ep_{i}^{T})\hat{S}_{i}v,$$
where e is a vector of ones. Defining $R_{i}=\delta_{1-\epsilon}\sqrt{P_{i}}(I-ep_{i}^{T})\hat{S}_{i}$, we have that the standard deviation is
$$\sigma_{i}=\frac{1}{\delta_{1-\epsilon}}\Vert R_{i}v\Vert .\qquad (7)$$
The expected value and standard deviation are related through constraint (4), from which we have
$$\Vert R_{i}v\Vert\le\min\{M_{i}-p_{i}^{T}\hat{S}_{i}v,\ M_{i}+p_{i}^{T}\hat{S}_{i}v\},$$
or equivalently,
$$\Vert R_{i}v\Vert -M_{i}\le p_{i}^{T}\hat{S}_{i}v\le M_{i}-\Vert R_{i}v\Vert .\qquad (8)$$
Although Eq. (8) is more complicated than a traditional linear constraint, it is the combination of two second-order cone constraints. Such constraints share several desirable properties with linear constraints, such as being convex. Replacing the traditional linear constraints $S^{\hat{c}}v=0$ with Eq. (8), we have arrived at an SOCP that is our RAMP model:
$$\max\{g(v):\ \Vert R_{i}v\Vert -M_{i}\le p_{i}^{T}\hat{S}_{i}v\le M_{i}-\Vert R_{i}v\Vert \ \ \forall i,\ \ L\le v\le U\}.\qquad (9)$$
The RAMP model defaults to a commonality of the upper and lower flux bounds among the cells so that L^{c} = L and U^{c} = U for all c. This default is somewhat for notational convenience since stochastic variation in L or U could be remodeled as variation in the coefficient of the associated flux, and the resulting constraint would then be part of the stoichiometric matrix^{20}.
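The matrix expression for the variance, and its link to the norm of R_{ i }v, can be verified term by term. The following is a short derivation sketch; writing w for the vector of scenario values is shorthand introduced here for illustration and is not notation from the original development.

```latex
\begin{aligned}
w &= \hat{S}_i v, \qquad \mu_i = p_i^{T} w, \\
(I - e p_i^{T})\, w &= w - \mu_i e
  \quad\Longrightarrow\quad \big[(I - e p_i^{T})\, w\big]_k = S_{ik} v - \mu_i, \\
\big\| \sqrt{P_i}\,(I - e p_i^{T})\, w \big\|^{2}
  &= \sum_{k=1}^{q} p_{ik}\,(S_{ik} v - \mu_i)^{2} = \sigma_i^{2}, \\
\| R_i v \| &= \delta_{1-\epsilon}\,\big\| \sqrt{P_i}\,(I - e p_i^{T})\, \hat{S}_i v \big\|
  = \delta_{1-\epsilon}\,\sigma_i .
\end{aligned}
```

The last line is why bounding the norm of R_{ i }v is the same as bounding δ_{1−ε } standard deviations of S_{ i }v.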
A feasible RAMP solution is a collection of fluxes under which the mean and variance of the normal distribution of $d\left[{x}_{i}^{\mathcal{C}}\right]/dt$ satisfy the likelihood of being in steady state with regard to the spectrum of stoichiometric possibilities. A mathematical expression of this fact is possible upon recognizing that
$$\Vert R_{i}v\Vert =\max\{u^{T}R_{i}v:\ \Vert u\Vert \le 1\},$$
which allows the right-hand side of the SOCP constraint to be rewritten as
The left-hand side SOCP constraint can be similarly rewritten as,
The sets from which S_{ i } are drawn are called uncertainty sets in the robust optimization literature, and these sets define the collection of stoichiometric possibilities. A feasible flux must be within M_{ i } of steady state for all stoichiometric possibilities, meaning that S_{ i } can deviate from its average $p_{i}^{T}\hat{S}_{i}$ by as much as u^{T}R_{ i } as long as $\Vert u\Vert \le 1$.
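The step from a norm constraint to a family of linear constraints indexed by u rests on the Cauchy-Schwarz inequality. The derivation below is a sketch of one way to write that equivalence, using only the symbols already defined; the particular arrangement of the two families is an assumption and may differ in form from the numbered equations of the original.

```latex
\begin{aligned}
u^{T} R_i v &\le \|u\|\,\|R_i v\| \le \|R_i v\| \quad \text{for all } \|u\| \le 1,
  \quad \text{with equality at } u = R_i v / \|R_i v\| \ \ (R_i v \ne 0), \\
p_i^{T}\hat{S}_i v \le M_i - \|R_i v\|
  &\iff (p_i^{T}\hat{S}_i + u^{T} R_i)\, v \le M_i \quad \text{for all } \|u\| \le 1, \\
p_i^{T}\hat{S}_i v \ge \|R_i v\| - M_i
  &\iff (p_i^{T}\hat{S}_i + u^{T} R_i)\, v \ge -M_i \quad \text{for all } \|u\| \le 1 .
\end{aligned}
```

Each fixed u gives one ordinary linear inequality in v, so a single second-order cone constraint stands in for infinitely many linear ones.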
Illustrative example of the RAMP method
The re-expression of the SOCP constraints in terms of their uncertainty sets provides a geometric description. As an illustrative example, consider the two-variable system
$$S_{1}v=s_{1}v_{1}+s_{2}v_{2}=0,\quad -1\le v_{1}\le 1,\quad -1\le v_{2}\le 1,$$
where the average values are s_{1} = 1 and s_{2} = −1, corresponding to the standard static values of a stoichiometric matrix. Assume we have q = 3 scenarios, and
$$p_{1}=\left(\begin{array}{c}1/4\\ 1/2\\ 1/4\end{array}\right)\quad\mathrm{and}\quad\hat{S}_{1}=\left[\begin{array}{cc}0.9 & -1.2\\ 1 & -1\\ 1.1 & -0.8\end{array}\right],$$
which means that
$$P(s_{1}=0.9\ \mathrm{and}\ s_{2}=-1.2)=P(s_{1}=1.1\ \mathrm{and}\ s_{2}=-0.8)=1/4$$
and
$$P(s_{1}=1\ \mathrm{and}\ s_{2}=-1)=1/2.$$
The expected value of the constraint is simply
$$p_{1}^{T}\hat{S}_{1}v=\overline{S}_{1}v=v_{1}-v_{2},$$
which is the value of the original, static constraint. Setting δ_{1−ε } = 3 so that ε ≈ 0.0015, we have that
$$R_{1}=\left[\begin{array}{cc}-0.15 & -0.30\\ 0 & 0\\ 0.15 & 0.30\end{array}\right].$$
If M_{1} = 0.2, then Eqs (10) and (11) combine to show that v_{1} and v_{2} are feasible in the stochastic model if and only if
$$-0.2\le(1-0.15(u_{1}-u_{3}))v_{1}+(-1-0.30(u_{1}-u_{3}))v_{2}\le 0.2$$
for all vectors u such that ${u}_{1}^{2}+{u}_{2}^{2}+{u}_{3}^{2}\le 1$. Hence, v_{1} and v_{2} are feasible in the stochastic case only if they satisfy an infinite number of linear inequalities that are perturbations of a relaxed version of the original, static equality.
This infinite collection of constraints is neither more nor less restrictive than the FBA constraint. One interpretation is that RAMP first relaxes the average equilibrium constraint of $p_{i}^{T}\hat{S}_{i}v=0$ by replacing this last expression with $-M_{i}\le p_{i}^{T}\hat{S}_{i}v\le M_{i}$, but then RAMP further restricts the relaxed constraint by replacing it with $\Vert R_{i}v\Vert -M_{i}\le p_{i}^{T}\hat{S}_{i}v\le M_{i}-\Vert R_{i}v\Vert $, which adds an infinite number of linear constraints (per original constraint).
A depiction of the geometry for the above example is shown in Fig. 1. The original, static (FBA) constraint with the variable bounds is depicted by the bold dashed line segment through the origin. The shaded region is the feasibility set formed by the combined stochastic constraints, and the light dashed lines are samples of the infinite set of linear inequalities added by the stochastic model. The RAMP feasible region does not contain the FBA feasible region. As an example, the point (0.75, 0.75) is feasible to the original static FBA constraint but is infeasible to the stochastic RAMP constraints. Likewise, there are feasible solutions to the stochastic model that are infeasible in the static case.
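The feasibility claims for this example can be checked numerically. The sketch below evaluates the second-order cone form of the constraint directly, which avoids enumerating the vectors u; it reads the second column of the scenario matrix as negative so that the averages are s_{1} = 1 and s_{2} = −1, and the function names and the extra test point (0.1, 0) are hypothetical choices made for illustration.

```python
import math

DELTA = 3.0                      # delta_{1-eps} used in the example
M1 = 0.2                         # allowed deviation from steady state
p1 = [0.25, 0.5, 0.25]           # scenario probabilities
S1_hat = [[0.9, -1.2], [1.0, -1.0], [1.1, -0.8]]  # scenario coefficients


def scenario_mean(v):
    """Expected value p1^T S1_hat v of the constraint."""
    return sum(p * (s[0] * v[0] + s[1] * v[1]) for p, s in zip(p1, S1_hat))


def scaled_std(v):
    """DELTA times the standard deviation of S_1 v, i.e. ||R_1 v||."""
    mu = scenario_mean(v)
    var = sum(p * (s[0] * v[0] + s[1] * v[1] - mu) ** 2
              for p, s in zip(p1, S1_hat))
    return DELTA * math.sqrt(var)


def ramp_feasible(v):
    """Check ||R_1 v|| - M1 <= p1^T S1_hat v <= M1 - ||R_1 v|| and |v_j| <= 1."""
    in_box = all(-1.0 <= x <= 1.0 for x in v)
    mu, r = scenario_mean(v), scaled_std(v)
    return in_box and (r - M1 <= mu <= M1 - r)


def fba_feasible(v, tol=1e-12):
    """Check the static constraint v1 - v2 = 0 and |v_j| <= 1."""
    return all(-1.0 <= x <= 1.0 for x in v) and abs(v[0] - v[1]) <= tol
```

At (0.75, 0.75) the static constraint holds, but the scaled standard deviation is roughly 0.48, which exceeds M_{1} = 0.2, so the point fails the stochastic constraints. Conversely, (0.1, 0) violates the static equality yet satisfies the stochastic constraints, which illustrates that neither feasible region contains the other.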
Mathematical observations of RAMP properties
In the following we develop three important mathematical qualities of the RAMP model by establishing three theorems, proofs of which are contained in the Methods section. Our first result demonstrates that feasible RAMP flux vectors satisfy a Lipschitz continuity property in their probabilistic elements, provided that bounding constraints can be marginally relaxed. The second result demonstrates that convergent RAMP solutions are indeed FBA solutions as stochastic uncertainty diminishes. The third theorem characterizes the types of uncertainty in the stoichiometric matrix that can be tolerated by an optimal FBA solution.
Continuity property of RAMP solutions
Corollary 1 (see Methods) suggests that RAMP’s SOCP constraints might imbue continuity of the feasible flux vectors with regard to the stochastic modeling elements p_{ i } and $\hat{S}_{i}$. However, the linear bounds L ≤ v ≤ U prevent an immediate application of Corollary 1 because realistic genome-scale metabolic models commonly enforce some fluxes with implied equalities, i.e. L_{ i } = U_{ i } for some i. An example is the so-called ATP maintenance flux, which is the rate at which ATP is lost due to non-growth-associated processes. These fixed fluxes are included to ensure that FBA models account for cellular demands that are not explicitly part of the genome-scale metabolic reconstructed model^{39}. The continuity results depend on scaling, and Lemma 1 and Corollary 1 (see Methods) do not provide the Lipschitz continuity for RAMP since scaled fixed fluxes do not remain feasible under the requirement that v_{ i } = L_{ i } = U_{ i }.
The equalities imposed by the bounding constraints could be recast probabilistically. However, unlike the steady state equations of Sv = 0, these constraints cannot be guaranteed to be of the form required by Lemma 1. For instance, if v_{ i } is the flux for the reaction that removes ATP for non-growth-associated demands, then the equality v_{ i } = L_{ i } would naturally transition to P(L_{ i } − M_{ i } ≤ v_{ i } ≤ L_{ i } + M_{ i }) ≥ 1 − 2ε, which would associate with an SOCP constraint of the form $L_{i}-M_{i}+\Vert R_{i}v\Vert \le v_{i}\le L_{i}+M_{i}-\Vert R_{i}v\Vert $. Since the signs of L_{ i } − M_{ i } and L_{ i } + M_{ i } would agree for small values of M_{ i }, which would be required to remain realistic, the requirement of a positive right-hand side in Lemma 1 cannot be ensured after writing both inequalities in the correct form.
Nevertheless, if we adjust the variable bounds dependent on the perturbed stochastic elements instead of prescribing them independent of perturbation, then we can relax the bounds to ensure feasibility. Probabilistically this means that we can set the values of M_{ i } for the variable bounds so that we satisfy them with probability 1.
Theorem 1.
For a collection of probability vectors p_{ i } and scenario matrices $\hat{S}_{i}$ and for the lower and upper bounds L and U, let $\mathcal{F}(\{(p_{i},\hat{S}_{i}):i=1,2,\dots,m\},L,U)$ be the nonempty set of feasible fluxes satisfying the constraints of the RAMP model in Eq. (9). Assuming that each p_{ i } + Δp_{ i } is a probability vector, we then have for each
$$v\in\mathcal{F}(\{(p_{i},\hat{S}_{i}):i=1,2,\dots,m\},L,U)$$
that there is a λ ≥ 0 such that
$$\underset{v'}{\min}\Vert v-v'\Vert \le\lambda\Gamma,$$
where
$$\begin{array}{rcl}\Gamma & = & \underset{i}{\max}\{\Vert \Delta p_{i}\Vert +\Vert \Delta\hat{S}_{i}\Vert +\Vert \Delta p_{i}\Vert \Vert \Delta\hat{S}_{i}\Vert \},\\ v' & \in & \mathcal{F}(\{(p_{i}+\Delta p_{i},\hat{S}_{i}+\Delta\hat{S}_{i}):i=1,2,\dots,m\},L-\lambda\Gamma e,U+\lambda\Gamma e),\end{array}$$
and λ is independent of all $\Vert \Delta\hat{S}_{i}\Vert $.
Theorem 1 highlights that any questionable discontinuities that FBA might exhibit with regard to parametric update are due to the imposed linear equalities that are the by-product of enforcing the biologically unrealistic assumption of a uniform steady state. The RAMP approach adjusts FBA’s modeling paradigm to include its inherent stochastic nature, and in the process, RAMP ensures the continuity that would be expected of the feasible flux states as long as the imposed linear bounds can also be relaxed commensurate with the magnitude of the perturbation. In conclusion, any discontinuity of FBA that could be caused by minute model adjustments is the outcome of an overly rigid (linear) model.
We note that a recent discussion in the literature (see refs 40,41,42) suggests that the creation of biomass might be sensitive to small perturbations in the growth coefficients. Indeed, the authors of ref. 40 have argued that in some cases perturbations to the biomass equation are necessary to achieve growth. The continuity of Theorem 1 counters any such concern with regard to RAMP, as flux states for any growth equation would remain close to those with a perturbed equation. Specifically, a change in the creation of biomass is controlled by the amount of coefficient perturbation, and hence, biomass creation cannot depend sensitively on small adjustments to the growth coefficients in RAMP.
Connection between FBA and RAMP solutions
A reasonable question is whether the deterministic FBA model is a limiting case of RAMP as the stochastic elements diminish. Theorems 2 and 3 show that convergent RAMP solutions are indeed FBA solutions, and hence, they answer this question in the affirmative. They also show that interpreting FBA as a limiting RAMP model characterizes the possible random flux states among cells in an optimal growth, steady state culture. The convergence result of Theorem 2 assumes an interiority condition, which is tacit as long as the equalities imposed by the bounding constraints are relaxed to allow arbitrarily small adjustments, i.e. as long as L_{ i } = v_{ i } = U_{ i } is replaced with L_{ i } − η ≤ v_{ i } ≤ U_{ i } + η for any arbitrarily small η > 0. The proof of Theorem 2 (see Methods) is straightforward and follows from the fact that if the primal and dual solutions converge as the stochastic elements dissipate, then the resulting necessary and sufficient conditions are those of FBA. However, the proof importantly identifies that not all random elements need to disappear, a point that prompts Theorem 3.
Theorem 2.
Let ${p}_{i}^{t}$, ${\hat{S}}_{i}^{t}$, and ${M}_{i}^{t}$ be sequences such that for each t the corresponding RAMP model satisfies Slater’s interiority condition. Let v^{t} be an optimal solution of the RAMP model corresponding to ${p}_{i}^{t}$, ${\hat{S}}_{i}^{t}$, and ${M}_{i}^{t}$, and assume v^{t} → v as t → ∞. Assume likewise that a corresponding dual sequence of optimal solutions converges. Further assume that as t → ∞ we have ${({p}_{i}^{t})}^{T}{\hat{S}}_{i}^{t}\to {\overline{S}}_{i}$, ${R}_{i}^{t}\to 0$, and ${M}_{i}^{t}\to 0$. Then v is a solution to the FBA model
$$\mathrm{max}\{{v}_{Growth}:\overline{S}v=0,\ L\le v\le U\},$$where $\overline{S}$ is the matrix whose ith row is ${\overline{S}}_{i}$.
The assumption that ${R}_{i}^{t}\to 0$ in Theorem 2 can be relaxed since the proof remains valid as long as the limiting matrices R_{ i } satisfy R_{ i }v = 0 and ${R}_{i}^{T}({y}_{i}+{w}_{i})=0$ (see the proof in Methods). Since R_{ i } can have low rank, e.g. the rank of R_{1} in Eq. (12) is 1, we see that R_{ i } need not generally vanish. This observation suggests that not all uncertainty needs to be removed to recover an FBA solution with RAMP. It is this interesting observation that motivates our third result.
Allowed probabilistic variation for optimal flux solution
Suppose $\hat{v}$ is an FBA solution corresponding with an average stoichiometric matrix, i.e. the rows of the matrix are ${p}_{i}^{T}{\hat{S}}_{i}$. We pose the question of whether or not it is possible to identify nontrivial scenarios and probabilities so that $\hat{v}$ is a solution to both an FBA problem and to a limiting stochastic RAMP counterpart. If such probabilities and scenarios exist, then the optimal flux state $\hat{v}$ would be optimal for a range of stochastic variation within an optimal growth, steady state culture. We describe scenarios ${\hat{S}}_{i}$ as biologically possible for $\hat{v}$ if there exist (positive) probability vectors p_{ i } such that $\hat{v}$ maximizes cellular growth under the conditions that $\hat{v}$ satisfies $L\le \hat{v}\le U$ and
$$\underset{{M}_{i}\to 0}{\mathrm{lim}}P(-{M}_{i}\le {p}_{i}^{T}{\hat{S}}_{i}\hat{v}\le {M}_{i})\le 1-2\epsilon ,\quad \forall i.$$
Theorem 3 characterizes the biologically possible scenarios of any FBA solution. The argument arranges and partitions $\hat{v}$ into $(\hat{v}',\hat{v}'')$, where ${\hat{v}}_{j}^{\prime}\ne 0$ for all j and $\hat{v}''=0$. This way $S\hat{v}=S'\hat{v}'$, where S′ is the submatrix of S whose columns correspond with $\hat{v}'$.
Theorem 3.
Let $\hat{v}=(\hat{v}',0)$, with ${\hat{v}}_{j}^{\prime}\ne 0$ for all j, be a solution to the FBA problem
Then the scenarios of ${\hat{S}}_{i}$ are biologically possible for $\hat{v}$ if and only if for all i we have ${\hat{S}}_{i}^{\prime}v'={\alpha}_{i}e$ for some scalar α_{ i } ≠ 0.
In biological terms, Theorem 3 identifies the probabilistic variations that are possible in the stoichiometric matrix for cells with the same optimal, steady state fluxes. Thus, if we trust that FBA solutions do indeed identify the limiting behavior of cells as they evolve to optimize to their environmental resources, then Theorem 3 concludes that optimized cells with the same flux state can be described by different probabilistic models.
Additionally, since Eq. (17) in the proof of Theorem 3 only requires that p_{ i } be a probability vector with nonzero entries, we can use this fact to construct a test that will check if any particular collection of scenarios is possible for optimal states. For example, suppose we are interested in investigating if some of the stoichiometric coefficients can vary for the cells having a specified flux state $\hat{v}$ in an optimized culture. Any collection of scenarios for which ${\hat{S}}_{i}\hat{v}$ does not have identical components is impossible, since there is no possible way to assign probabilities that make the flux state optimal once probabilistic variation is included in the model. If our mathematical models accurately assess biological reality, then we can claim that it is biologically impossible to have a collection of optimized cells with a common flux state that vary according to the suggested scenarios.
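The identical-components test described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the tolerance `tol`, and the toy scenario matrices are assumptions introduced here, not part of the original analysis. Each scenario matrix is stored with one scenario per row, and the sketch checks only the necessary condition stated in the text (it does not check the nonzero-scalar requirement of Theorem 3).

```python
def scenario_values(S_hat, v):
    """Return the product S_hat v, one entry per scenario (row of S_hat)."""
    return [sum(s * x for s, x in zip(row, v)) for row in S_hat]


def passes_identical_component_test(scenario_matrices, v, tol=1e-9):
    """True if every S_hat_i v has numerically identical components.

    `tol` is a hypothetical numerical tolerance, not a value from the paper.
    """
    for S_hat in scenario_matrices:
        values = scenario_values(S_hat, v)
        if max(values) - min(values) > tol:
            return False  # these scenarios cannot share the flux state v
    return True


# Toy flux state with three reactions (hypothetical numbers).
v_hat = [1.0, 2.0, 0.0]

# Two scenarios that differ only in a coefficient of an inactive reaction.
consistent = [[1.0, -0.5, 3.0],
              [1.0, -0.5, 7.0]]

# The second scenario changes a coefficient of an active reaction.
inconsistent = [[1.0, -0.5, 3.0],
                [1.0, -0.6, 3.0]]

print(passes_identical_component_test([consistent], v_hat))
print(passes_identical_component_test([inconsistent], v_hat))
```

In this toy case the first collection passes because the scenarios differ only where the flux is zero, while the second is ruled out because the two scenarios give different values of the product with the flux state.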
RAMP implemented as a computational model
We now compare the computational ability of several RAMP implementations with FBA. The computational results herein only initiate the broad comparisons that would benchmark RAMP against the numerous FBA adaptations. The metrics we use in our benchmarking are direct comparisons between FBA and RAMP predictions of experimentally determined fluxes, and RAMP’s and FBA’s ability to identify essential and viable genes. We include the latter because it is one of the commonly used tests to assess a metabolic model’s potential for producing biologically relevant predictions. All flux tests were conducted on the iJO1366 metabolic model of Escherichia coli^{25}, whereas knockouts also used the iAF1260 metabolic model^{24}. These high-quality reconstructed models have, respectively, 1366 and 1260 genes, 2583 and 2382 metabolic reactions, and 1805 and 1668 metabolites.
Gene essentiality is decided by simulating gene knockouts with a corresponding removal of the associated set of metabolic reactions^{2, 3}. If the modified metabolic model reduces the optimal growth rate, typically by at least 50%, then the gene is considered to be essential to the organism. When a predictive model correctly identifies a gene as essential, we label this as a true positive. Similarly, a true negative indicates a correctly identified nonessential gene by the predictive model. A false positive occurs if the analysis incorrectly labels a gene as essential, and a false negative if the analysis incorrectly labels a gene as nonessential. The predictive power of the model is the ratio of the sum of true positives and negatives to the number of genes.
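The bookkeeping in this paragraph can be illustrated with a short Python sketch. The function names, the toy growth rates, and the toy experimental labels below are hypothetical and serve only to show how the four tallies and the predictive power are formed. The 50% threshold follows the typical value quoted in the text.

```python
def is_essential(wild_type_growth, knockout_growth, threshold=0.5):
    """Call a gene essential if the knockout growth rate is at most
    `threshold` times the wild-type optimum (a reduction of at least 50%)."""
    return knockout_growth <= threshold * wild_type_growth


def confusion_counts(predicted, observed):
    """Tally TP/TN/FP/FN, where 'positive' means 'essential'."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for gene, pred in predicted.items():
        truth = observed[gene]
        if pred and truth:
            counts["TP"] += 1
        elif not pred and not truth:
            counts["TN"] += 1
        elif pred and not truth:
            counts["FP"] += 1
        else:
            counts["FN"] += 1
    return counts


def predictive_power(counts):
    """(TP + TN) divided by the number of genes tested."""
    return (counts["TP"] + counts["TN"]) / sum(counts.values())


# Hypothetical simulated growth rates after each knockout.
wild_type = 1.0
knockout_growth = {"g1": 0.0, "g2": 0.95, "g3": 0.3, "g4": 1.0}
# Hypothetical experimental essentiality labels.
observed = {"g1": True, "g2": False, "g3": False, "g4": True}

predicted = {g: is_essential(wild_type, r) for g, r in knockout_growth.items()}
counts = confusion_counts(predicted, observed)
print(counts, predictive_power(counts))
```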
The forthcoming computational experiments introduce various stochastic assumptions by querying either different percentiles δ_{1−ε } or different scenarios for the growth equation. At this point, however, it is necessary to emphasize that the values of the M_{ i } bounds are also part of the stochastic interpretation since they define the permissible variation from steady state. Specifically, the set of feasible flux states either grows or stays the same with an increase in any M_{ i }, from which we know that the growth rate is nondecreasing as a function of the individual M_{ i } values. So if the M_{ i } values are too small, then we will likely underapproximate the growth rate, but if the M_{ i } values are too large, then we will likely overapproximate the growth rate. We solve an optimization problem (see Methods) to appropriately set the values of M_{ i } so that RAMP’s growth rate exactly matches that of FBA. The problem ensures that the sum of permissible variations from steady state is as small as possible.
RAMP assessment of uncertainty in growth reaction
The goal of our first computational experiment is to gauge the relationship between RAMP and FBA for individual uncertainties in the growth reaction. This is motivated by the observation that stoichiometric coefficients of the growth reaction are empirically determined nonintegers, with values spanning many orders of magnitude (see e.g. Table 1). We address this goal by posing the question, What is the maximum level of uncertainty tolerated in a single stoichiometric coefficient in the growth reaction before RAMP’s predictions of gene-knockout essentiality depart from those of FBA in a given environment?
We tackle this challenge by individually increasing the level of randomness in a single growth coefficient, multiplying it with a factor containing ±σ based on five scenarios, and we increase the multiplier σ from zero in an iterative fashion (see Methods for details). As expected, RAMP and FBA coincide for the case with no randomness in the growth stoichiometric coefficients, i.e. with σ = 0. However, as σ increases it is possible that RAMP will find a different set of essential genes than FBA. Consequently, we determine the maximal value of σ for which RAMP and FBA return identical sets of essential genes.
Assuming that the genome-scale metabolic reconstruction is a high-quality rendering of the biochemical processes associated with the individual biomass components, we make the following observations: If RAMP returns the same essential gene sets as FBA (i.e. stable predictive ability) for large percent variations of a single coefficient, then the cells of the culture could possibly be disparate in how they incorporate the associated metabolite in biomass. If the predictive ability instead degrades with only slight deviations, then the growth coefficient is more likely exhibiting little variation across the culture.
Figure 2 displays, on a logarithmic scale, the calculated maximal values of the multiplier σ before RAMP departs from FBA’s gene-knockout predictions, for each of the 72 growth coefficients in iJO1366^{25}. We find that for 18 of the coefficients (see Table 2 for their names), we were only able to determine a lower bound on the value of the multiplier of σ > 10^{26}. Upon inspection of the numerical results for these 18 coefficients, we noticed the following two properties: within numerical accuracy, (i) the value of the mean (see Eq. (6)) is μ_{ i } = 0, and (ii) the standard deviation (see Eq. (7)) scales with M_{ i }. A direct consequence of these two observations is that the RAMP-specific constraints of Eq. (8) reduce to the standard FBA constraint S_{ i }v = 0. Hence, for these 18 coefficients of the biomass equation, the RAMP problem reduced to that of FBA.
The biochemical explanation for why RAMP collapses to the standard FBA problem for all values of σ is immediate for the coefficients in Table 2 with the exception of 2fe2s, 4fe4s, fe3, and mobd: Upon inspection of iJO1366, we find that these biomass components (i) are freely available in the environment, and (ii) may be imported into the cytosol at no energetic or metabolic burden. Hence, any change of their stoichiometric coefficients in the biomass equation leaves the flux patterns of the remainder of the biochemical network unchanged. However, the introduction or generation of 2fe2s, 4fe4s, fe3, and mobd into the cytosolic compartment is associated with metabolic burden through more complex pathways in the biochemical network reconstruction. For these four cases we were unable to determine a clear pattern, making a detailed biochemical explanation difficult.
We find for the remaining 54 biomass coefficients that the maximally allowed multiplier spans a surprising 4 orders of magnitude. Upon investigating a possible relationship between these maximal multiplier values and the corresponding stoichiometric coefficients, we find that they are uncorrelated (ρ = −0.004). Note that, for all of the 54 coefficients, the numerically determined RAMP solutions are more permissive than the FBA solutions, meaning that at the determined maximal multiplier values, RAMP identified as essential a subset of the FBA-determined essential genes. In total, we found that 23 of the 54 coefficients supported maximal permissible multipliers σ > 1. This is quite surprising, as it suggests that both the sign and the magnitude of the corresponding stoichiometric coefficient may vary while the growth rate and RAMP’s predictive power are maintained.
RAMP computational predictions of gene essentiality
The essential gene predictions of RAMP and FBA for our second computational experiment, in which all coefficients in the growth reaction are simultaneously uncertain (see Methods), are tabulated in Table 3. We assumed in all experiments an oxygen-unlimited environment with glucose as the sole, and limiting, carbon source as the default environment. We additionally conducted the same tests with glycerol as the only carbon source, finding results that mirror those of glucose (not shown).
The tallies in Table 3 show some interesting differences between the two genome-scale metabolic reconstructions iAF1260^{24} and iJO1366^{25} in response to the different RAMP implementations. While the majority of biochemical reactions are the same in these two models, the RAMP analysis emphasizes that small details of the reconstructed model affect the identification of essential genes. Here, RAMP may be an additional approach in vetting the quality of model reconstructions.
Table 3 shows that the default RAMP model (RAMP_{1}), which only permits uncertainty past the stated accuracy of FBA’s growth equation, precisely replicates the essential gene predictions of FBA for both genome-scale metabolic model reconstructions. This fact computationally substantiates Theorem 2 and numerically demonstrates further that minor probabilistic variations mimic FBA.
Model 2 (RAMP_{2}) is designed to magnify the perturbations of Model 1 by a multiplicative factor ρ and ‘spread’ the scenarios in the last few digits. From Table 3, we see that this approach impacts RAMP’s predictive ability. For the iJO1366 model, all values of ρ decreased the true positives by 9 and increased the true negatives by 1, which means that RAMP_{2} had 8 fewer correct predictions than FBA with increased scenario variability as magnified by ρ. In contrast, the iAF1260 model agreed with FBA for all values of ρ. RAMP’s predictive power compared with FBA for RAMP_{2} decreased from 91.14% to 90.56% with the iJO1366 model and matched FBA’s 91.32% with the iAF1260 model. The similarity between FBA and RAMP for RAMP_{1} and RAMP_{2} shows that probabilistic deviations in the growth coefficients beyond what is reported by the genome-scale metabolic reconstructions only lead to minor adjustments in predictive power.
Scaling the scenarios proportionately (RAMP_{3a }, RAMP_{3b }, and RAMP_{3c } representing increasing levels of variation) has a slightly more varied effect than multiplying the default variations by ρ. Results with σ ≥ 0.4 had little value since these models relaxed the FBA constraints to the point at which few, if any, essential genes were identified, and for this reason we only report the results for σ ≤ 0.3. The trend as σ increases is that RAMP_{3} reduces the number of predicted essential genes, which lowers the number of true and false positives but increases the number of true and false negatives. Five true positives move to false negatives in the iJO1366 model for RAMP_{3a } and RAMP_{3b }, which reduces RAMP’s predictive ability to 90.78%. However, for larger σ, RAMP_{3c } increases its number of true negatives and reduces the number of false positives, which increased predictive ability to 90.92%. The trend for the iAF1260 model is similar.
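The percentages quoted for iJO1366 can be cross-checked with a few lines of arithmetic. The sketch below assumes that the denominator of the predictive power is the 1366 genes of iJO1366. The count of correct FBA predictions is not stated in the text and is inferred here from the reported 91.14%, so it should be read as an assumption rather than a reported figure.

```python
N_GENES = 1366  # genes in iJO1366, assumed to be the denominator

# Inferred (not reported) number of correct FBA predictions.
fba_correct = round(0.9114 * N_GENES)


def power(correct):
    """Predictive power in percent, rounded to two decimals."""
    return round(100 * correct / N_GENES, 2)


print(fba_correct, power(fba_correct))  # FBA baseline
print(power(fba_correct - 8))           # RAMP_2: 8 fewer correct predictions
print(power(fba_correct - 5))           # RAMP_3a, RAMP_3b: 5 TP become FN
```

Under these assumptions the three values reproduce the reported 91.14%, 90.56%, and 90.78%. The 90.92% reported for RAMP_{3c } would correspond to three fewer correct predictions than FBA, which is again an inference from the percentages and not a count given in the text.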
Altering the parameter ε (RAMP_{4}) had no effect on RAMP’s ability to predict essential genes, and for all tested values of ε, RAMP agreed with FBA. This clearly suggests that it is more important to tune RAMP with regard to the scenarios than it is with the probabilistic guarantee of satisfying the near equilibrium constraints in Eq. (3).
RAMP comparison with experimentally determined fluxes
To investigate RAMP’s consistency with experimentally determined fluxes, we formulated a nonlinear (quadratic) objective function for RAMP that minimizes the distance between a set of experimentally determined flux data and the corresponding reaction fluxes in a genome-scale metabolic model. We directly compare the results from RAMP_{3} with σ = 0.2 with the corresponding results from FBA. The details of the problem formulation are given in Eqs (22) and (23). We used flux data^{6} measured for 28 reactions from the central carbon metabolism of E. coli growing on glucose in batch conditions^{26} both aerobically (Fig. 3(a)) and anaerobically (Fig. 3(b)), as well as in aerobic chemostat conditions^{27} at the two dilution rates of 0.1/h (Fig. 3(c)) and 0.4/h (Fig. 3(d)).
A striking visual feature of Fig. 3 is that the RAMP-predicted fluxes consistently provide a better fit to the experimental fluxes than the FBA-predicted ones. In Table 4, we quantify the performance of RAMP and FBA in predicting fluxes by calculating the mean square error $\mathrm{MSE}={\Vert {v}_{predicted}-{v}_{exp}\Vert}^{2}/N$, with N = 28. The results reported in Fig. 3 and Table 4 demonstrate that RAMP significantly outperforms FBA in predicting experimental fluxes in a wide range of conditions.
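For concreteness, the mean square error defined above can be computed as in the following Python sketch. The flux values are made-up placeholders with N = 4 rather than the 28 measured reactions, and the function name is an assumption of this sketch.

```python
def mean_square_error(v_predicted, v_exp):
    """MSE = ||v_predicted - v_exp||^2 / N over the N measured reactions."""
    if len(v_predicted) != len(v_exp):
        raise ValueError("flux vectors must have the same length")
    n = len(v_exp)
    return sum((p - e) ** 2 for p, e in zip(v_predicted, v_exp)) / n


# Placeholder fluxes (not data from the paper).
v_exp = [10.0, 5.0, 2.0, 0.0]
v_model_a = [9.0, 5.0, 3.0, 0.0]
v_model_b = [10.0, 7.0, 2.0, 2.0]

print(mean_square_error(v_model_a, v_exp))
print(mean_square_error(v_model_b, v_exp))
```

A smaller value indicates that the predicted fluxes lie closer to the measured ones, which is the sense in which the two methods are compared in Table 4.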
Discussion
The expanding suite of systems-biology computational methods described as constraint-based analyses has significantly improved our ability to probe the properties of genome-scale metabolic reconstructions. However, the basic premises of these methods, the assumptions of steady state and no heterogeneity in a cellular population, have stayed unchanged. In this article, we have developed the conceptual framework for a new paradigm of constraint-based methods by explicitly relaxing the basic assumption of steady state while allowing for heterogeneity. We have presented the mathematical foundation proving that, as levels of uncertainty are reduced, the RAMP method will become identical to standard FBA.
Building on the RAMP foundation, it will be possible to develop robust versions of many of the contemporary constraint-based methods. For some of the new extensions to FBA, the inclusion of RAMP formalism is straightforward, e.g. FBAwMC^{32}, GIMME^{33}, MOMENT^{34}, and GXFBA^{35}. However, methods that add gene-regulatory effects, such as rFBA^{43}, and thermodynamic considerations^{44} may also be directly implemented with RAMP, since many of these methods generate modified metabolic networks, or temporal sequences of such. Note that, since it is known that uncertainty in variable bounds, e.g. restrictions such as upper bounds on reaction fluxes, mathematically is not different from uncertainty in the stoichiometric coefficients^{20}, the RAMP formulation is also capable of modelling uncertainty in flux constraints.
Theoretically, we are able to formulate a wide host of linear and nonlinear RAMP models as SOCPs. However, the nonlinear ones have not been uniformly translatable into computational reality. While we have been able to solve the L_{2} RAMP flux-minimization problem (see Eq. (22)) and show that RAMP outperforms FBA for flux predictions, we have achieved only intermittent success with native SOCP solvers for general RAMP models. Specifically, we have attempted to

solve L_{2} versions of Eq. (21),

predict essential genes with a RAMP version of the minimization of metabolic adjustment (MOMA) method^{45}, which is a quadratic model, and

increase variation beyond the growth coefficients to replicate stochastic environmental bounds.
While we are able to solve these problems occasionally, we were unable to identify a choice of solver settings and convergence criteria to achieve consistent and stable success for the numerical implementation with current SOCP software for any of these extensions. Establishing a stable computational platform for the general RAMP paradigm is an important goal for future work.
The SOCP constraints of RAMP appropriately inject statistical variation into the FBA paradigm, but the conclusion of whether or not RAMP’s stochastic perspective is more or less restrictive than FBA’s deterministic setting remains unclear. From a mathematical point of view, RAMP is neither a direct relaxation of FBA nor a direct restriction of FBA. However, our computational results demonstrate that RAMP is, in practice, a relaxation of FBA. When investigating uncertainty in individual growth coefficients, we uniformly show that discrepancies in gene essentiality are due to RAMP being less restrictive, as evidenced by RAMP having predicted at least one fewer essential gene than FBA, i.e. a subset of the FBA results. Similarly with all growth coefficients assumed uncertain, RAMP’s stochastic adaptation results in a model whose trend is to predict fewer essential genes, especially as uncertainty increases.
These computational outcomes agree favorably with the biological sentiment that cellular metabolic networks are more robust in their wild-type setting and are increasingly fragile as they evolve toward an optimal, and specialized, steady state. Hence, the number of essential genes should decrease as statistical variation increases, which is predicted by RAMP. Moreover, our mathematical analysis and computational outcomes show that RAMP mimics FBA if the stochastic elements are sufficiently small. In particular, if variation is not allowed beyond the assumed accuracy of an FBA model, then RAMP behaves like FBA for predictions of essential genes.
Theorem 3 provides a new approach to discriminate FBA’s optimal flux states: For example, some FBA solutions may have no biologically possible scenarios while others may have many. This suggests that the former is less stable with regards to random variation, whereas the latter is optimal under a wider range of stochastic possibilities. One may intuit that actual flux states should be robust against random variation^{46}, and hence, learning to calculate FBA solutions that can witness as much variability as possible is a promising direction of future research.
Finally, our choice of letting the objective function coincide with the growth reaction was purely a choice based on making RAMP easily comparable to traditional implementations of FBA, and RAMP is valid beyond this choice.
Methods
In this section we provide the mathematical foundation and proofs of the theorems presented in the main text. We also discuss aspects of the computational stability and implementation of RAMP.
Lemma 1
The following lemma establishes a general property from which our result follows.
Lemma 1
Let $\overline{A}$ be an n-element row vector, R be a q × n matrix, and b be a positive scalar. Let $\mathcal{F}(\overline{A},R)=\{v:\overline{A}v+\Vert Rv\Vert \le b\}$. Then, for any $v\in \mathcal{F}(\overline{A},R)$, there are scalars z and λ satisfying 0 < z ≤ 1 and λ ≥ 0 such that
$$\mathrm{min}\{\Vert v-v'\Vert :v'\in \mathcal{F}(\overline{A}+\mathrm{\Delta}\overline{A},R+\mathrm{\Delta}R)\}\le \Vert v-zv\Vert \le \lambda (\Vert \mathrm{\Delta}\overline{A}\Vert +\Vert \mathrm{\Delta}R\Vert ),$$where $zv\in \mathcal{F}(\overline{A}+\mathrm{\Delta}\overline{A},R+\mathrm{\Delta}R)$ and λ is independent of $\Vert \mathrm{\Delta}\overline{A}\Vert $ and $\Vert \mathrm{\Delta}R\Vert $.
Proof
Should v satisfy $(\overline{A}+\mathrm{\Delta}\overline{A})v+\Vert (R+\mathrm{\Delta}R)v\Vert \le b$, then we immediately have
$$\mathrm{min}\{\Vert v-v'\Vert :v'\in \mathcal{F}(\overline{A}+\mathrm{\Delta}\overline{A},R+\mathrm{\Delta}R)\}=\Vert v-v\Vert =0,$$and we are done with z = 1 and λ = 0.
Suppose instead that $(\overline{A}+\mathrm{\Delta}\overline{A})v+\Vert (R+\mathrm{\Delta}R)v\Vert >b$. From the Intermediate Value Theorem there is a τ such that 0 < τ ≤ 1 and
$$0<\left(\overline{A}+(1-\tau )\mathrm{\Delta}\overline{A}\right)v+\Vert \left(R+(1-\tau )\mathrm{\Delta}R\right)v\Vert =b.$$To ease notation, let $\overline{A}'=\overline{A}+(1-\tau )\mathrm{\Delta}\overline{A}$ and R′ = R + (1 − τ)ΔR. Also select z such that
$$z=\frac{b}{b+\tau (\Vert \mathrm{\Delta}\overline{A}\Vert +\Vert \mathrm{\Delta}R\Vert )\Vert v\Vert}.$$Then,
$$\begin{array}{lll}(\overline{A}+\mathrm{\Delta}\overline{A})(zv)+\Vert (R+\mathrm{\Delta}R)(zv)\Vert -b& =& (\overline{A}'+\tau \mathrm{\Delta}\overline{A})(zv)+\Vert (R'+\tau \mathrm{\Delta}R)(zv)\Vert -b\\ & \le & z(\overline{A}'v+\Vert R'v\Vert +\tau (\Vert \mathrm{\Delta}\overline{A}\Vert +\Vert \mathrm{\Delta}R\Vert )\Vert v\Vert )-b\\ & =& z(b+\tau (\Vert \mathrm{\Delta}\overline{A}\Vert +\Vert \mathrm{\Delta}R\Vert )\Vert v\Vert )-b=0.\end{array}$$We conclude that $\left(\overline{A}+\mathrm{\Delta}\overline{A}\right)(zv)+\Vert \left(R+\mathrm{\Delta}R\right)(zv)\Vert \le b$, and hence,
$$zv\in \mathcal{F}(\overline{A}+\mathrm{\Delta}\overline{A},R+\mathrm{\Delta}R\mathrm{)}.$$Let $\lambda ={\Vert v\Vert}^{2}/b$, from which we have that
$$\begin{array}{lll}\Vert v-zv\Vert & =& \left(1-\frac{b}{b+\tau (\Vert \mathrm{\Delta}\overline{A}\Vert +\Vert \mathrm{\Delta}R\Vert )\Vert v\Vert}\right)\Vert v\Vert \\ & =& \left(\frac{\tau {\Vert v\Vert}^{2}}{b+\tau (\Vert \mathrm{\Delta}\overline{A}\Vert +\Vert \mathrm{\Delta}R\Vert )\Vert v\Vert}\right)(\Vert \mathrm{\Delta}\overline{A}\Vert +\Vert \mathrm{\Delta}R\Vert )\\ & \le & \left(\frac{{\Vert v\Vert}^{2}}{b}\right)(\Vert \mathrm{\Delta}\overline{A}\Vert +\Vert \mathrm{\Delta}R\Vert )\\ & =& \lambda (\Vert \mathrm{\Delta}\overline{A}\Vert +\Vert \mathrm{\Delta}R\Vert ).\end{array}$$Hence,
$$\mathrm{min}\{\Vert v-v'\Vert :v'\in \mathcal{F}(\overline{A}+\mathrm{\Delta}\overline{A},R+\mathrm{\Delta}R)\}\le \Vert v-zv\Vert \le \lambda (\Vert \mathrm{\Delta}\overline{A}\Vert +\Vert \mathrm{\Delta}R\Vert ),$$and λ is independent of $\Vert \mathrm{\Delta}\overline{A}\Vert $ and $\Vert \mathrm{\Delta}R\Vert $.□
Lemma 1 shows that vectors satisfying SOCP constraints of the form $\overline{A}v+\Vert Rv\Vert \le b$, with b > 0, can be scaled to remain feasible, and moreover, that the magnitude of the adjustment to remain feasible is uniformly bounded by the magnitude of the perturbations. Importantly, this result immediately extends to finite collections of SOCP constraints of the same form, a fact we formalize in Corollary 1.
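The scaling argument of Lemma 1 can be checked numerically on random data. The sketch below is not the construction used in the proof: it fixes τ = 1 instead of the value supplied by the Intermediate Value Theorem, which gives a smaller (more conservative) z for which the same two inequalities still hold. The Frobenius norm is used for the matrix perturbation, and all names, dimensions, and random inputs are assumptions of this illustration.

```python
import math
import random


def norm(x):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(t * t for t in x))


def frob(M):
    """Frobenius norm of a matrix; satisfies ||M x|| <= frob(M) * ||x||."""
    return math.sqrt(sum(t * t for row in M for t in row))


def lhs(A, R, v):
    """Left-hand side A v + ||R v|| of the constraint A v + ||R v|| <= b."""
    Rv = [sum(r * t for r, t in zip(row, v)) for row in R]
    return sum(a * t for a, t in zip(A, v)) + norm(Rv)


def check_lemma(n=4, q=3, b=1.0, size=0.1, seed=0):
    """Return (zv is feasible, ||v - zv|| <= lambda * perturbation size)."""
    rng = random.Random(seed)
    A = [rng.uniform(-1, 1) for _ in range(n)]
    R = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(q)]
    v = [rng.uniform(-1, 1) for _ in range(n)]
    value = lhs(A, R, v)
    if value > b:  # shrink v so that it is feasible for the unperturbed data
        v = [0.9 * b / value * t for t in v]
    dA = [size * rng.uniform(-1, 1) for _ in range(n)]
    dR = [[size * rng.uniform(-1, 1) for _ in range(n)] for _ in range(q)]
    delta = norm(dA) + frob(dR)
    z = b / (b + delta * norm(v))  # scaling factor with the choice tau = 1
    zv = [z * t for t in v]
    A2 = [a + d for a, d in zip(A, dA)]
    R2 = [[r + d for r, d in zip(row, drow)] for row, drow in zip(R, dR)]
    feasible = lhs(A2, R2, zv) <= b + 1e-12
    lam = norm(v) ** 2 / b
    moved = norm([t - s for t, s in zip(v, zv)])
    return feasible, moved <= lam * delta + 1e-12


results = [check_lemma(size=s, seed=k)
           for k in range(20) for s in (1e-3, 1e-1, 10.0)]
print(all(f and m for f, m in results))
```

The check covers perturbations that are small, moderate, and large relative to the data, since the bound in the lemma places no restriction on the size of the perturbation.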
Corollary 1
For i = 1, …, m, let ${\overline{A}}_{i}$ be an n-element row vector, R_{ i } be a q × n matrix, and b_{ i } be a positive scalar. Let
$$\mathcal{F}(\{({\overline{A}}_{i},{R}_{i}):i=1,2,\dots ,m\})=\{v:{\overline{A}}_{i}v+\Vert {R}_{i}v\Vert \le {b}_{i}\ \text{for}\ i=1,2,\dots ,m\}.$$Then, for any $v\in \mathcal{F}(\{({\overline{A}}_{i},{R}_{i}):i=1,2,\dots ,m\})$, there are scalars z and λ satisfying 0 < z ≤ 1 and λ ≥ 0 such that
$$\begin{array}{lll}\mathrm{min}\{\Vert v-v'\Vert :v'& \in & \mathcal{F}(\{({\overline{A}}_{i}+\mathrm{\Delta}{\overline{A}}_{i},{R}_{i}+\mathrm{\Delta}{R}_{i}):i=1,2,\dots ,m\})\}\\ & \le & \Vert v-zv\Vert \le \lambda \,{\mathrm{max}}_{i}\{\Vert \mathrm{\Delta}{\overline{A}}_{i}\Vert +\Vert \mathrm{\Delta}{R}_{i}\Vert \},\end{array}$$where $zv\in \mathcal{F}(\{({\overline{A}}_{i}+\mathrm{\Delta}{\overline{A}}_{i},{R}_{i}+\mathrm{\Delta}{R}_{i}):i=1,2,\dots ,m\})$ and λ is independent of all $\Vert \mathrm{\Delta}{\overline{A}}_{i}\Vert $ and $\Vert \mathrm{\Delta}{R}_{i}\Vert $.
Proof
The proof follows directly from Lemma 1 and its proof upon letting z_{ i } and λ_{ i } be the scalars for each i and setting z = min_{ i }{z_{ i }} and λ = max_{ i }{λ_{ i }}.□
Unfortunately, systems of linear equalities like those of FBA do not satisfy similar continuity properties, and changing the stoichiometric matrix in FBA can lead to discontinuities. As a simple example, the system of homogeneous equations
$$\begin{array}{lll}{v}_{1}-{v}_{2}& =& 0\\ {v}_{1}-(1+\alpha ){v}_{2}& =& 0\end{array}$$has substantially different solution sets around α = 0. Notice that v_{1} = v_{2} = t solves the system for any t if α = 0, but that the only solution for α ≠ 0 is v_{1} = v_{2} = 0. Hence the solution v_{1} = v_{2} = 1 at α = 0 is not arbitrarily close to a solution with α ≠ 0.
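The jump in this two-equation example can be made explicit by measuring how far the point v_{1} = v_{2} = 1 lies from the solution set as α varies. The short Python sketch below does this directly from the closed-form solution sets given above; the function name is an assumption of the sketch.

```python
import math


def distance_to_solution_set(point, alpha):
    """Euclidean distance from `point` to the solutions of
    v1 - v2 = 0 and v1 - (1 + alpha) * v2 = 0."""
    v1, v2 = point
    if alpha == 0:
        # Both equations coincide, so the solution set is the line v1 = v2.
        return abs(v1 - v2) / math.sqrt(2)
    # For any nonzero alpha the only solution is the origin.
    return math.hypot(v1, v2)


for alpha in (0.0, 1e-3, 1e-9):
    print(alpha, distance_to_solution_set((1.0, 1.0), alpha))
```

The distance is zero at α = 0 and equals the square root of 2 for every nonzero α, however small, so it does not shrink with the size of the perturbation. This is the behavior that the bound in Lemma 1 excludes for the SOCP constraints.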
Proof for Theorem 1
Proof
Let ${\overline{A}}_{i}={p}_{i}^{T}{\hat{S}}_{i}$ and ${\overline{A}}_{i}+\mathrm{\Delta}{\overline{A}}_{i}={({p}_{i}+\mathrm{\Delta}{p}_{i})}^{T}({\hat{S}}_{i}+\mathrm{\Delta}{\hat{S}}_{i})$. Then,
where ${\kappa}_{i}^{A}=\Vert v\Vert \mathrm{max}\{\Vert {\hat{S}}_{i}\Vert ,\Vert {p}_{i}\Vert ,1\}$. Further note that
From the Mean Value Theorem there is a μ_{ ik } between p_{ ik } and p_{ ik } + Δp_{ ik } such that
$$\sqrt{{p}_{ik}+\mathrm{\Delta}{p}_{ik}}-\sqrt{{p}_{ik}}=\frac{\mathrm{\Delta}{p}_{ik}}{2\sqrt{{\mu}_{ik}}}.$$Hence,
$$\Vert \sqrt{{P}_{i}+\mathrm{\Delta}{P}_{i}}-\sqrt{{P}_{i}}\Vert =\sqrt{\sum _{k=1}^{q}{\left(\sqrt{{p}_{ik}+\mathrm{\Delta}{p}_{ik}}-\sqrt{{p}_{ik}}\right)}^{2}}=\sqrt{\sum _{k=1}^{q}{\left(\frac{\mathrm{\Delta}{p}_{ik}}{2\sqrt{{\mu}_{ik}}}\right)}^{2}}\le \frac{1}{2\sqrt{{\mu}_{i}}}\Vert \mathrm{\Delta}{p}_{i}\Vert ,$$where ${\mu}_{i}={\mathrm{min}}_{k}\left\{{\mu}_{ik}\right\}$. We now have from inequality Eq. (15) that
where
$${\kappa}_{i}^{R}={\delta}_{1-\epsilon}\,\mathrm{max}\left\{\frac{1}{2\sqrt{{\mu}_{i}}}(\Vert I-e{p}_{i}^{T}\Vert +\sqrt{q})\Vert {\hat{S}}_{i}\Vert ,\Vert I-e{p}_{i}^{T}\Vert ,\sqrt{q}\right\}.$$Each of RAMP’s SOCP constraints may be rewritten as
$$\begin{array}{lll}({\overline{A}}_{i}+\mathrm{\Delta}{\overline{A}}_{i})v+\Vert ({R}_{i}+\mathrm{\Delta}{R}_{i})v\Vert & \le & {M}_{i},\quad i=1,2,\dots ,m,\quad \text{and}\\ -({\overline{A}}_{i}+\mathrm{\Delta}{\overline{A}}_{i})v+\Vert ({R}_{i}+\mathrm{\Delta}{R}_{i})v\Vert & \le & {M}_{i},\quad i=1,2,\dots ,m.\end{array}$$From Corollary 1 and inequalities Eqs (14) and (16) we know that there is a v′ satisfying these perturbed SOCP constraints and a $\hat{\lambda}\ge 0$ defined independent of $\Vert \mathrm{\Delta}{\overline{A}}_{i}\Vert $ and $\Vert \mathrm{\Delta}{R}_{i}\Vert $, and subsequently independent of $\Vert \mathrm{\Delta}{\hat{S}}_{i}\Vert $, such that
$$\begin{array}{lll}\Vert v-v'\Vert & \le & \hat{\lambda}\underset{i}{\mathrm{max}}\{\Vert \mathrm{\Delta}{\overline{A}}_{i}\Vert +\Vert \mathrm{\Delta}{R}_{i}\Vert \}\\ & \le & \hat{\lambda}\underset{i}{\mathrm{max}}\{({\kappa}_{i}^{A}+{\kappa}_{i}^{R})(\Vert \mathrm{\Delta}{\hat{S}}_{i}\Vert +\Vert \mathrm{\Delta}{p}_{i}\Vert +\Vert \mathrm{\Delta}{\hat{S}}_{i}\Vert \Vert \mathrm{\Delta}{p}_{i}\Vert )\}\\ & \le & \hat{\lambda}\underset{i}{\mathrm{max}}\{{\kappa}_{i}^{A}+{\kappa}_{i}^{R}\}\underset{i}{\mathrm{max}}\{\Vert \mathrm{\Delta}{\hat{S}}_{i}\Vert +\Vert \mathrm{\Delta}{p}_{i}\Vert +\Vert \mathrm{\Delta}{\hat{S}}_{i}\Vert \Vert \mathrm{\Delta}{p}_{i}\Vert \}.\end{array}$$Since $\hat{\lambda}$, ${\kappa}_{i}^{A}$, and ${\kappa}_{i}^{R}$ are all independent of $\Vert \mathrm{\Delta}{\hat{S}}_{i}\Vert $, the proof is complete upon noticing that v′ must also satisfy
$$L-\lambda \mathrm{\Gamma}e\le v'\le U+\lambda \mathrm{\Gamma}e,$$where $\lambda =\hat{\lambda}{\mathrm{max}}_{i}\{{\kappa}_{i}^{A}+{\kappa}_{i}^{R}\}$ and
$$\mathrm{\Gamma}=\underset{i}{\mathrm{max}}\{\Vert \mathrm{\Delta}{p}_{i}\Vert +\Vert \mathrm{\Delta}{\hat{S}}_{i}\Vert +\Vert \mathrm{\Delta}{p}_{i}\Vert \Vert \mathrm{\Delta}{\hat{S}}_{i}\Vert \}.$$
Proof for Theorem 2
Proof
For each t let $(\hat{y}^t, y^t, \hat{w}^t, w^t, \rho^t, \sigma^t)$ be the assumed dual solution that converges to $(\hat{y}, y, \hat{w}, w, \rho, \sigma)$. Further let f be a vector so that f^{T}v = v_{Growth}. The dual can be assumed to satisfy Slater’s interiority condition since the necessary and sufficient primal-dual conditions in this case would be satisfied by the convergent sequences:
$$\begin{array}{c} -M_i^t + \Vert R_i^t v^t \Vert \le (p_i^t)^T \hat{S}_i^t v^t \le M_i^t - \Vert R_i^t v^t \Vert, \ \forall i \\ L \le v^t \le U \\ \sum_i \left( (\hat{S}_i^t)^T p_i^t (\hat{y}_i^t - \hat{w}_i^t) - (R_i^t)^T (y_i^t + w_i^t) \right) + \rho^t - \sigma^t = f \\ \Vert y_i^t \Vert \le \hat{y}_i^t, \ \forall i \\ \Vert w_i^t \Vert \le \hat{w}_i^t, \ \forall i \\ \rho^t, \sigma^t \ge 0 \\ \sum_i M_i^t (\hat{y}_i^t + \hat{w}_i^t) + U^T \rho^t - L^T \sigma^t - f^T v^t = 0. \end{array}$$
Dual feasibility is defined by the third through sixth constraints, from which we can see that the dual is always strictly feasible, and hence, the dual satisfies Slater’s interiority condition. Select any y and w variables such that $\Vert y_i^t \Vert < \hat{y}_i^t$ and $\Vert w_i^t \Vert < \hat{w}_i^t$. Since ρ^{t} − σ^{t} can attain any vector, these two variables can be chosen so that ρ^{t} > 0, σ^{t} > 0 and that
$$\sum_i \left( (\hat{S}_i^t)^T p_i^t (\hat{y}_i^t - \hat{w}_i^t) - (R_i^t)^T (y_i^t + w_i^t) \right) + \rho^t - \sigma^t = f.$$
Hence the system is indeed necessary and sufficient for optimality.
Allowing t → ∞, we have by assumption that $(p_i^t)^T \hat{S}_i^t \to \overline{S}_i$, $R_i^t \to 0$, and $M_i^t \to 0$, from which we have
$$\begin{array}{c} \overline{S} v = 0 \\ L \le v \le U \\ \overline{S}^T (\hat{y} - \hat{w}) + \rho - \sigma = f \\ \hat{y}, \hat{w}, \rho, \sigma \ge 0 \\ U^T \rho - L^T \sigma - f^T v = 0. \end{array}$$
Since these are the necessary and sufficient conditions for FBA, v is an optimal solution to the FBA model.
Proof for Theorem 3
Proof
Let $\stackrel{\u02c6}{v}=(v\prime ,0)$ be as stated. Then,
The last equality shows that $R_i\hat{v} = 0$ if and only if $\hat{S}'\hat{v}'$ is an eigenvector of $e p_i^T$ for the eigenvalue 1. Since p_{ i } is a probability vector, $e p_i^T$ has only two eigenspaces, one of dimension q − 1 for the eigenvalue 0, and one of dimension 1 for the eigenvalue 1. All eigenvectors for the eigenvalue 1 are scalar multiples of the all-ones vector e. Hence, $R_i\hat{v} = 0$ if and only if the scenarios for each i satisfy $\hat{S}_i'\hat{v}_i' = \alpha_i e$ for some α_{ i } ≠ 0.
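The eigenstructure claim is easy to check numerically. The following is a minimal sketch in plain Python, not part of the original analysis: the helper names are hypothetical, and the probability vector is simply the five scenario probabilities quoted in the experiments. It confirms that e is fixed by $e p^T$, that a vector orthogonal to p is mapped to zero, and that the trace (the sum of the eigenvalues) equals 1.

```python
def ones_times_p_transpose(p):
    """Return e p^T as a list of rows; every row is a copy of p."""
    return [list(p) for _ in p]


def matvec(matrix, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


# Illustrative probability vector (the scenario probabilities from the experiments).
p = [0.0351, 0.2389, 0.4520, 0.2389, 0.0351]
E = ones_times_p_transpose(p)

# e is an eigenvector for the eigenvalue 1: (e p^T) e = e (p^T e) = e.
e = [1.0] * len(p)
image_of_e = matvec(E, e)

# Any x with p^T x = 0 is an eigenvector for the eigenvalue 0.
x = [1.0, 0.0, 0.0, 0.0, -1.0]  # p[0] == p[4], so p^T x = 0
image_of_x = matvec(E, x)

# The trace equals the sum of the eigenvalues: (q - 1) * 0 + 1.
trace = sum(E[k][k] for k in range(len(p)))
```

Any other probability vector could be substituted for `p`; the check for the eigenvalue 0 only requires choosing `x` orthogonal to it.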
RAMP’s SOCP constraints as M_{ i } ↓ 0 are
$$\Vert R_i v \Vert \le p_i^T \hat{S}_i v \le -\Vert R_i v \Vert \iff \left\{ \begin{array}{l} p_i^T \hat{S}_i v = 0 \\ R_i v = 0. \end{array} \right.$$
Hence, the limiting RAMP model as M_{ i } ↓ 0 is the linear program
The necessary and sufficient conditions are
where f is the vector so that f^{T}v = v_{Growth}.
Suppose for each i that S′_{ i }v′ = α_{ i }e for some α_{ i } ≠ 0. Then $\hat{v}$ satisfies R_{ i }v = 0 for all i. Moreover, since $\hat{v}$ solves Eq. (13), the strong duality theorem of linear programming guarantees a solution to Eq. (19) with $v = \hat{v}$ and w_{ i } = 0. Alternatively, if for some i we have S′_{ i }v′ ≠ α_{ i }e for all nonzero α_{ i }, then $R_i\hat{v} \ne 0$ and $\hat{v}$ is infeasible in Eq. (18). Hence, in this case $\hat{S}_i$ is not biologically possible for $\hat{v}$. □
Computational stability of the RAMP framework
A straightforward implementation of RAMP (see Eq. (9)) as a computational model proved elusive. While we were able to determine optimal solutions in many situations for the tested genome-scale metabolic reconstructions, we could not consistently achieve optimality for the general RAMP problem, despite trying a wide variety of state-of-the-art algorithms for solving SOCPs. The computational challenge is likely caused by two characteristics specific to genome-scale metabolic reconstructed networks. First, a substantial number of variables are unsigned and, essentially, unbounded. Second, the FBA problems arising from genome-scale reconstructed metabolic networks are highly degenerate (a detailed discussion of this point is provided in ref. 47) and often have high-dimensional solution sets. These characteristics combine in a way that seems to hamper the underlying interior-point algorithms employed in native SOCP solvers. We also tried standard nonlinear reformulations, which allowed us to experiment with nonlinear solvers, but again with only limited success.
Motivated by biological considerations, we therefore implemented a limited version of RAMP as a computational model. If the probabilistic variability of each constraint is restricted to a single coefficient, then the associated SOCP constraint is linear. The linear constraint is constructed by, for example, assuming that the last coefficient of the ith constraint is the sole random variable. Then for some q-vector s we have
$$\begin{array}{rcl} \Vert R_i v \Vert & = & \Vert \delta_{1-\epsilon} \sqrt{P_i}\, (I - e p_i^T)\, \hat{S}_i v \Vert \\ & = & \Vert \delta_{1-\epsilon} \sqrt{P_i}\, (I - e p_i^T) [S_{i,1} e, \dots, S_{i,(n-1)} e, s]\, v \Vert \\ & = & \Vert \delta_{1-\epsilon} \sqrt{P_i}\, [S_{i,1} (e - e p_i^T e), \dots, S_{i,(n-1)} (e - e p_i^T e), (I - e p_i^T) s]\, v \Vert \\ & = & \Vert \delta_{1-\epsilon} \sqrt{P_i}\, [0, \dots, 0, (I - e p_i^T) s]\, v \Vert \\ & = & \Vert \delta_{1-\epsilon} \sqrt{P_i}\, (I - e p_i^T) s \Vert\, v_n. \end{array}$$
This calculation shows that if we restrict probabilistic variability to, for example, the coefficients of the growth reaction, then each of the SOCP constraints of the form
$$\Vert R_i v \Vert - M_i \le p_i^T \hat{S}_i v \le M_i - \Vert R_i v \Vert,$$
can be rewritten linearly as
where $\hat{S}_{i,\mathrm{Growth}}$ is the column of $\hat{S}_i$ containing the scenarios for the growth coefficient.
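Substituting the last line of the calculation into the SOCP constraint gives a pair of inequalities that are linear in v, with a constant multiplying the growth flux. The sketch below evaluates that constant in plain Python. It assumes that $P_i$ is the diagonal matrix of scenario probabilities (as the norm computation in the proof of Theorem 1 suggests), in which case the constant is $\delta_{1-\epsilon}$ times the probability-weighted standard deviation of the scenarios. The function names, the coefficient value −0.5 and the 20% scaling are hypothetical illustrations, and a nonnegative growth flux is assumed.

```python
import math


def scenario_radius(scenarios, probs, delta):
    """Return delta * ||sqrt(P) (I - e p^T) s||_2, assuming P = diag(probs)."""
    mean = sum(p * s for p, s in zip(probs, scenarios))   # p^T s
    centered = [s - mean for s in scenarios]               # (I - e p^T) s
    return delta * math.sqrt(sum(p * c * c for p, c in zip(probs, centered)))


def satisfies_linear_form(mean_row_dot_v, radius, v_growth, bound):
    """Check radius*v_growth - bound <= p^T S v <= bound - radius*v_growth."""
    slack = bound - radius * v_growth
    return -slack <= mean_row_dot_v <= slack


probs = [0.0351, 0.2389, 0.4520, 0.2389, 0.0351]
coeff, sigma = -0.5, 0.2  # hypothetical growth coefficient and 20% scaling
scenarios = [(1.0 + sigma * eta) * coeff for eta in (-1.0, -0.5, 0.0, 0.5, 1.0)]
radius = scenario_radius(scenarios, probs, delta=3.0)
```

Because the radius depends only on the scenario data, it can be computed once per constraint before the linear program is assembled.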
We choose to use the growth coefficients for two reasons. First, as already noted, the growth reaction is empirically derived from the aggregate properties of a cell culture, and thus its stoichiometric values are inherently uncertain. Second, it is well known that the biomass composition of a cellular culture (and thus the values of the stoichiometric coefficients of the growth reaction) is affected by nutrient conditions and the culture’s growth phase^{48, 49}.
The linear reformulation in Eq. (20) allows RAMP to be solved with standard linear simplex solvers, which proved to be computationally stable. All numerical work was conducted with the freeware GLPK or with the commercial solver Gurobi^{©}, both of which worked well with the COBRA toolbox^{37}.
RAMP prediction of gene essentiality
We present two computational experiments to assist in assessing how stochastic variation affects RAMP’s ability to predict gene essentiality. The first experiment iteratively induces individual randomness in each of the growth coefficients, holding the others at their nominal value as stated in the model. The second experiment assumes simultaneous randomness in all growth coefficients. In all of the computational experiments, we used the E. coli genome-scale metabolic models iAF1260^{24} and iJO1366^{25} in a minimal nutrient environment consisting of (i) the models’ presets to emulate M9 medium, (ii) a maximal glucose uptake rate of 10 mmol/gDW/h, and (iii) unlimited oxygen uptake.
RAMP variation in a single growth coefficient
Here, we increase the level of uncertainty in a single growth coefficient in an iterative fashion, and we systematically go through all of the coefficients. The goal of this experiment is to gauge predictive ability against individual uncertainties in the growth equation, and we measure the predictive ability simply by comparing the number of essential genes with that predicted by FBA. If the predictive ability is stable for large percent variations of a single coefficient, then the cells of the culture could possibly be disparate in how they use the associated metabolite. If instead the predictive ability degrades with slight deviations, then the growth coefficient is more likely conserved across the culture.
The experiment uses q = 5 scenarios for each coefficient. The two most extreme scenarios multiply the coefficient by ±σ with probability P(3/2 ≤ z) = 0.0351, where z is a standard normal variable. Two other scenarios multiply the coefficient by ±σ/2 with probability P(1/2 ≤ z ≤ 3/2) = 0.2389, and the remaining scenario leaves the coefficient unchanged with probability P(−1/2 ≤ z ≤ 1/2) = 0.4520. The value of δ_{1−ε } was 3, which meant that the probabilistic constraints were guaranteed with probability 0.9997.
The largest value of σ for which RAMP’s predictive accuracy matched that of FBA was calculated to within an accuracy of 0.0001. Starting from an initial value of σ = 0.0001, we (i) increased σ by a factor of 2 until RAMP’s gene knockout predictions no longer matched those of FBA, and (ii) at this point initiated a binary search to identify the maximal value of the multiplier σ for which the RAMP and FBA gene knockout predictions matched.
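The two-phase search can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the experiments: `predictions_match` is a hypothetical callable standing in for "RAMP's gene knockout predictions equal FBA's at this σ", it is assumed to hold at the starting value, and the search assumes that once the predictions stop matching they do not match again for larger σ. The `cap` argument is an added safeguard, not something described in the text.

```python
def largest_matching_sigma(predictions_match, start=1e-4, tol=1e-4, cap=1e6):
    """Approximate the largest sigma for which predictions_match(sigma) is True."""
    if not predictions_match(start):
        raise ValueError("predictions do not match at the starting sigma")
    lo, hi = start, 2.0 * start
    # Phase (i): double sigma until the predictions no longer match.
    while predictions_match(hi):
        lo, hi = hi, 2.0 * hi
        if hi > cap:
            return lo  # no mismatch found below the cap
    # Phase (ii): bisection on [lo, hi]; lo matches, hi does not.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if predictions_match(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Stand-in predicate for illustration: predictions match up to sigma = 0.37.
sigma_max = largest_matching_sigma(lambda sigma: sigma <= 0.37)
```

In practice each call to the predicate would require re-solving the knockout problems, so the doubling phase keeps the number of expensive evaluations logarithmic in the final σ.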
RAMP simultaneous variation in all growth coefficients
In this experiment we assumed simultaneous randomness in all growth coefficients. Four probabilistic models were used to assess RAMP’s overall sensitivity to stochastic variation, and each was compared against FBA’s gene knockout predictions as calculated by the COBRA toolbox (see Table 3 for results).
Model 1 (default) Our default RAMP model assumes that the means of the growth coefficients are the values stated in the FBA model and that probabilistic variation is restricted to the first unspecified significant digit. Assuming $10^{d_i}$ identifies this digit for the ith growth coefficient, we further assume that each growth coefficient has five scenarios in which it is perturbed by $\pm\eta\cdot 10^{d_i}$, where η is one of 0, 1, or 2. The probability of the scenario with no perturbation is P(−1/2 ≤ z ≤ 1/2) = 0.4520, with perturbation $\pm 10^{d_i}$ is P(1/2 ≤ z ≤ 3/2) = 0.2389, and with perturbation $\pm 2\cdot 10^{d_i}$ is P(3/2 ≤ z) = 0.0351, where z is a standard normal variable. Two examples of these scenarios are illustrated in Table 5. We choose to set δ_{1−ε } = 3, so that the probabilistic constraints have a 99.97% guarantee of satisfaction.
Model 2 The second probabilistic model multiplies η in Model 1 by the scalar ρ to assess how RAMP solutions adjust as the scenarios deviate from those of FBA. Eight tests with ρ = 2, 3,…, 9 were considered. Integers beyond 9 resulted in sign changes and were not considered. The probability scenarios from Model 1 were used, as well as δ_{1−ε } similarly being set to 3.
Model 3 Since Model 2’s scaling by ρ disproportionately affects small growth coefficients, we compensate by replacing the scenarios of Model 1 with percentages of the growth coefficient itself. We choose different ranges for σ, using the scenarios $(1 \pm \sigma\cdot\eta)\cdot\hat{S}_{i,\mathrm{Growth}}$, where η is one of 0, ±1/2, or ±1. The probabilities are unchanged from those in Model 1, and δ_{1−ε } is 3.
Model 4 To assess how RAMP reacts to changes in the certainty of satisfying the probabilistic constraints, Model 4 changes δ_{1−ε }. All other model parameters are inherited from the default model.
In our testing of the predictive ability of these four computational RAMP models (for computational results, see Table 3), we used the shorthand notation given in Table 6 for our choice of combination of model and parameters.
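The scenario sets of Models 1 to 3 can be generated with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, the example coefficient 0.2 with d = −2 is made up, and the probabilities are the values quoted in the text. Model 4 is not shown because it changes only δ_{1−ε } and leaves the scenarios untouched.

```python
ETAS = (-2, -1, 0, 1, 2)
PROBS = (0.0351, 0.2389, 0.4520, 0.2389, 0.0351)  # probabilities quoted in the text


def additive_scenarios(coeff, d, rho=1):
    """Model 1 (rho = 1) and Model 2 (rho = 2..9): perturb by rho * eta * 10**d."""
    return [coeff + rho * eta * 10.0 ** d for eta in ETAS]


def proportional_scenarios(coeff, sigma):
    """Model 3: scale the coefficient by (1 + sigma * eta), eta in {0, +-1/2, +-1}."""
    return [(1.0 + sigma * eta / 2.0) * coeff for eta in ETAS]


def scenario_mean(scenarios, probs=PROBS):
    """Probability-weighted mean of a scenario list."""
    return sum(p * s for p, s in zip(probs, scenarios))


model1 = additive_scenarios(0.2, d=-2)        # hypothetical coefficient 0.2
model3 = proportional_scenarios(-0.5, 0.2)    # hypothetical coefficient -0.5
```

Because the perturbations and probabilities are symmetric, the scenario mean stays at the nominal coefficient in each of these models, which is consistent with the stated assumption that the means are the FBA values.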
Determination of bounds M_{ i }
The selection of M_{ i } (see Eq. (9)) is an important modeling decision. Instead of imposing these bounds arbitrarily, we determine them so that each RAMP model accurately returns the targeted, optimal growth rate. Let γ^{*} be the optimal growth rate as calculated by the FBA model. The M_{ i } values are calculated by solving the following optimization problem.
where M is the vector whose ith component is M_{ i }, and $\Vert \cdot \Vert_1$ is the L_{1} norm. Setting M to be the calculated optimal solution of Eq. (21) tightens the SOCP constraints while ensuring that the optimal growth rate is held at its desired value. We experimented with L_{2} and L_{∞} counterparts; however, the L_{2} norm suffered from inconsistent solves, and the L_{∞} norm overly relaxed the constraints, which was not surprising.
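The optimization couples M with the flux vector v, but for any fixed flux the tightest admissible bound is explicit: the constraint ‖R_{ i }v‖ − M_{ i } ≤ p_{ i }^{T}Ŝ_{ i }v ≤ M_{ i } − ‖R_{ i }v‖ holds exactly when M_{ i } ≥ |p_{ i }^{T}Ŝ_{ i }v| + ‖R_{ i }v‖. The sketch below evaluates this lower bound under the growth-only variability of the linear reformulation, which may be useful for sanity-checking a solver's output. The function name and all numbers are hypothetical, and a nonnegative growth flux is assumed.

```python
def tightest_bound(mean_row, radius, v, growth_index):
    """Smallest M_i satisfying the constraint for a fixed flux vector v.

    mean_row     : the row p_i^T S_i of expected coefficients
    radius       : the constant multiplying the growth flux in ||R_i v||
    growth_index : position of the growth flux in v (assumed nonnegative)
    """
    mean_value = sum(a * x for a, x in zip(mean_row, v))
    return abs(mean_value) + radius * v[growth_index]


# Hypothetical metabolite row, flux vector and radius.
row = [1.0, -1.0, -0.5]
flux = [1.0, 0.6, 0.8]   # last entry plays the role of the growth flux
m_min = tightest_bound(row, radius=0.13, v=flux, growth_index=2)
```

For this balanced example the expected production is zero, so the bound is driven entirely by the variability term, 0.13 times the growth flux.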
Comparison of RAMP and FBA with experimentally determined fluxes
We tested the ability of FBA and RAMP to agree with experimentally determined fluxes. For RAMP we solved,
where v_{ EXP } is the set of experimentally determined fluxes. The M_{ i } values were calculated to achieve the FBA-optimal growth rate as noted in the previous section. The scenarios were those of Model 3 (RAMP_{3}) with a 20% scaling of the growth equation, i.e. σ = 0.2. The (FBA) optimal growth rate γ^{*} was scaled down by θ = 0.9 because the experimental fluxes were not guaranteed to coincide with optimized cultures. Note that Eq. (22) is an example of a nonlinear (quadratic) objective within the RAMP formalism. Consequently, we determine a set of RAMP flux values that are as close to the experimental flux set as possible.
Likewise, the ability of FBA to realize the experimental fluxes was tested by solving the following adaptation of Eq. (2),
As with RAMP, θ was set to 0.9, and the solution was again as close as possible to v_{ EXP }.
Additional Information
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Bordbar, A., Monk, J. M., King, Z. A. & Palsson, B. O. Constraint-based models predict metabolic and associated cellular functions. Nat Rev Genet 15, 107–120 (2014).
2. Palsson, B. O. Systems Biology: Constraint-based Reconstruction and Analysis (Cambridge Univ Press, 2015).
3. Maranas, C. D. & Zomorrodi, A. R. Optimization Methods in Metabolic Networks (Wiley, 2016).
4. Ibarra, R. U., Edwards, J. S. & Palsson, B. O. Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420, 186–189 (2002).
5. Dekel, E. & Alon, U. Optimality and evolutionary tuning of the expression level of a protein. Nature 436, 588–592 (2005).
6. Schuetz, R., Kuepfer, L. & Sauer, U. Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol Syst Biol 3, 119 (2007).
7. Schuetz, R., Zamboni, N., Zampieri, M., Heinemann, M. & Sauer, U. Multidimensional optimality of microbial metabolism. Science 336 (2012).
8. Varma, A., Boesch, B. W. & Palsson, B. O. Biochemical production capabilities of Escherichia coli. Biotechnol Bioeng 42, 59–73 (1993).
9. Lewis, N. E., Nagarajan, H. & Palsson, B. O. Constraining the metabolic genotype–phenotype relationship using a phylogeny of in silico methods. Nat Rev Microb 10, 291–305 (2012).
10. Fong, S. S. & Palsson, B. O. Metabolic gene-deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nat Genet 36, 1056–1058 (2004).
11. Burgard, A. P., Pharkya, P. & Maranas, C. D. Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng 84, 647–657 (2003).
12. Burgard, A. P., Vaidyaraman, S. & Maranas, C. D. Minimal reaction sets for Escherichia coli metabolism under different growth requirements and uptake environments. Biotechnology Progress 17, 791–797 (2001).
13. Almaas, E., Kovacs, B., Vicsek, T., Oltvai, Z. N. & Barabasi, A. L. Global organization of metabolic fluxes in the bacterium Escherichia coli. Nature 427, 839–843 (2004).
14. Almaas, E., Oltvai, Z. N. & Barabasi, A.-L. The activity reaction core and plasticity of metabolic networks. PLoS Comput Biol 1, e68 (2005).
15. Lee, K. H., Park, J. H., Kim, T. Y., Kim, H. U. & Lee, S. Y. Systems metabolic engineering of Escherichia coli for L-threonine production. Mol Syst Biol 3, 149 (2007).
16. Park, J. H., Lee, K. H., Kim, T. Y. & Lee, S. Y. Metabolic engineering of Escherichia coli for the production of L-valine based on transcriptome analysis and in silico gene knockout simulation. Proc Natl Acad Sci USA 104, 7797–7802 (2007).
17. Yim, H. et al. Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat Chem Biol 7, 445–452 (2011).
18. Khodayari, A., Chowdhury, A. & Maranas, C. D. Succinate overproduction: A case study of computational strain design using a comprehensive Escherichia coli kinetic model. Front Bioeng Biotechnol 2, 76 (2014).
19. Yizhak, K. et al. A computational study of the Warburg effect identifies metabolic targets inhibiting cancer migration. Mol Syst Biol 10, 744 (2014).
20. Ben-Tal, A., El Ghaoui, L. & Nemirovski, A. Robust Optimization (Princeton Univ. Press, 2009).
21. Ko, E. P., Yomo, T. & Urabe, I. Dynamic clustering of bacterial population. Physica D 75, 81–88 (1994).
22. Avery, S. V. Cell individuality: the bistability of competence development. Trends in Microbiology 13, 459–462 (2005).
23. Smits, W. K., Kuipers, O. P. & Veening, J.-W. Phenotypic variation in bacteria: the role of feedback regulation. Nat Rev Micro 4, 259–271 (2006).
24. Feist, A. M. et al. A genome-scale metabolic reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and thermodynamic information. Mol Syst Biol 3, 121 (2007).
25. Orth, J. D. et al. A comprehensive genome-scale reconstruction of Escherichia coli metabolism–2011. Mol Syst Biol 7, 535 (2011).
26. Perrenoud, A. & Sauer, U. Impact of global transcriptional regulation by ArcA, ArcB, Cra, Crp, Cya, Fnr, and Mlc on glucose catabolism in Escherichia coli. J Bacteriol 187, 3171–3179 (2005).
27. Nanchen, A., Schicker, A. & Sauer, U. Nonlinear dependency of intracellular fluxes on growth rate in miniaturized continuous cultures of Escherichia coli. Appl Environ Microbiol 72, 1164–1172 (2006).
28. Ben-Tal, A. & Nemirovski, A. Robust convex optimization. Mathematics of Operations Research 23, 769–805 (1998).
29. Ben-Tal, A. & Nemirovski, A. Robust solutions of uncertain linear programs. Operations Research Letters 25, 1–13 (1999).
30. Ben-Tal, A. & Nemirovski, A. Robust truss topology design via semidefinite programming. SIAM Journal on Optimization 7, 991–1016 (1997).
31. Zavlanos, M. & Julius, A. Robust flux balance analysis of metabolic networks. In American Control Conference (ACC), 2011, 2915–2920 (2011).
32. Beg, Q. K. et al. Intracellular crowding defines the mode and sequence of substrate uptake by Escherichia coli and constrains its metabolic activity. Proc Natl Acad Sci USA 104, 12663–12668 (2007).
33. Becker, S. A. & Palsson, B. O. Context-specific metabolic networks are consistent with experiments. PLoS Comput Biol 4, e1000082 (2008).
34. Adadi, R., Volkmer, B., Milo, R., Heinemann, M. & Shlomi, T. Prediction of microbial growth rate versus biomass yield by a metabolic network with kinetic parameters. PLoS Comput Biol 8, e1002575 (2012).
35. Navid, A. & Almaas, E. Genome-level transcription data of Yersinia pestis analyzed with a new metabolic constraint-based approach. BMC Systems Biology 6, 150 (2012).
36. Reed, J. L., Vo, T. D., Schilling, C. H. & Palsson, B. O. An expanded genome-scale model of Escherichia coli K-12 (iJR904 GSM/GPR). Genome Biol 4, R54 (2003).
37. Schellenberger, J. et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat. Protocols 6, 1290–1307 (2011).
38. Chu, M., Zinchenko, Y., Henderson, S. & Sharpe, M. Robust optimization for intensity modulated radiation therapy treatment planning under uncertainty. Physics in Medicine and Biology 50, 5463–5477 (2005).
39. Thiele, I. & Palsson, B. O. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat. Protocols 5, 93–121 (2010).
40. Chindelevitch, L., Trigg, J., Regev, A. & Berger, B. An exact arithmetic toolbox for a consistent and reproducible structural analysis of metabolic network models. Nat Commun 5 (2014).
41. Ebrahim, A. et al. Do genome-scale models need exact solvers or clearer standards? Molecular Systems Biology 11 (2015).
42. Chindelevitch, L., Trigg, J., Regev, A. & Berger, B. Reply to “Do genome-scale models need exact solvers or clearer standards?”. Molecular Systems Biology 11 (2015).
43. Covert, M. W., Schilling, C. H. & Palsson, B. Regulation of gene expression in flux balance models of metabolism. J Theor Biol 213, 73–88 (2001).
44. Ataman, M. & Hatzimanikatis, V. Heading in the right direction: thermodynamics-based network analysis and pathway engineering. Curr Opin Biotechnol 36, 176–182 (2015).
45. Segrè, D., Vitkup, D. & Church, G. M. Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci USA 99, 15112–15117 (2002).
46. Stelling, J., Klamt, S., Bettenbrock, K., Schuster, S. & Gilles, E. D. Metabolic network structure determines key aspects of functionality and regulation. Nature 420, 190–193 (2002).
47. Livingstone, K., Holder, A. & Almaas, E. Introduction to systems biology for mathematical programmers. In Lim, G. & Lee, E. (eds.) Optimization in Medicine and Biology, chap. 11, 311–354 (Auerbach Publications, 2008).
48. Mahadevan, R. et al. Characterization of metabolism in the Fe(III)-reducing organism Geobacter sulfurreducens by constraint-based modeling. Appl Environ Microbiol 72, 1558–1568 (2006).
49. Neidhardt, F., Ingraham, J. & Schaechter, M. Physiology of the Bacterial Cell: A Molecular Approach (Sinauer Associates, 1990).
Acknowledgements
The authors thank Dr. Eric Reyes for his careful reading of an earlier draft, as well as Jarle Pahr for insights on flux comparisons. E.A. would like to thank The Research Council of Norway (RCN) grant 245160 (ERASysAPP: WineSys) for funding.
Author information
Affiliations
Department of Mathematics, Rose-Hulman Institute of Technology, Terre Haute, IN, USA
Michael MacGillivray, Amy Ko, Emily Gruber, Miranda Sawyer & Allen Holder
Department of Biotechnology, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
Eivind Almaas
K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and General Practice, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
Eivind Almaas
Contributions
E.A. and A.H. initiated the project. Mathematical analysis by M.M., A.K., E.G., M.S., and A.H. A.H. conducted the numerical experiments. E.A. and A.H. analyzed the results and wrote the paper.
Competing Interests
The authors declare that they have no competing interests.
Corresponding authors
Correspondence to Eivind Almaas or Allen Holder.
Electronic supplementary material
This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/