An analytic approximation of the feasible space of metabolic networks

Assuming a steady-state condition within a cell, metabolic fluxes satisfy an underdetermined linear system of stoichiometric equations. Characterizing the space of fluxes that satisfy such equations along with given bounds (and possibly additional relevant constraints) is considered of utmost importance for the understanding of cellular metabolism. Extreme values for each individual flux can be computed with linear programming (as flux balance analysis), and their marginal distributions can be approximately computed with Monte Carlo sampling. Here we present an approximate analytic method for the latter task based on expectation propagation equations that does not involve sampling and can achieve much better predictions than other existing analytic methods. The method is iterative, and its computation time is dominated by one matrix inversion per iteration. With respect to sampling, we show through extensive simulation that it has some advantages including computation time, and the ability to efficiently fix empirically estimated distributions of fluxes.

(a) I think that the authors should somewhat clarify early how the bounds on the fluxes is taken into account.
I am worried that a reader, might start wondering about this from line 105 forward. Perhaps, the following rephrase on line 104 might be useful: "As standard, and ignoring for the moment the bounds in the fluxes, we can define a quadratric energy function...", or something similar. 10. In the section "First local approach" (from line 121), I have the following suggestions/comments: (a) Although it is implicit in the form of the Gaussian prior (line 123), it should be made explicit that, with this choice, the fluxes are now unbounded.
(b) In Eq. (II.8) the symbol ∀n is used, while in line 114 the symbol n ∈ {1, . . . , N } is used. Please, unify presentation of the notation. 11. In the section "Expectation Propagation" (from line 131), I have the following suggestions/comments: (c) Moreover the sign= has various meanings in Mathematics, and for some readers it may not be clear from the context.
(d) In reference [24] of line 143, it is said: "Of course, it is an extensive number of "intractable" priors that makes the computation intractable, not a single one!". Does this mean that a possible generalization of the EP method would be to include an intensive number of intractable priors, viz According to your expertise, how will this affect the viability of your method for moderate values of L? I guess that it must be important which set of fluxes (n 1 , . . . , n L ) are chosen in this generalization (ie. highly correlated fluxes) (e) The panels (a) and (b) of figure II.1 should be made bigger (space permitting) and the labels in the axes also bigger, so it improves readability. 12. In the subsection "Numerical results" (from line 163) (a) The order in the discussion between lines 168-172 is a bit confusing. Perhaps it would be more appropriate to discuss the part regarding the preprocessing the network first and then discuss how HR and EP are run in the reduced networks. (f) In line 202: "Results suggest that the more T grows the more the points, either the means or the variances, are correlated" → "Results suggest that as T grows, more points, either the means or the variances, become correlated" 13. In the section "Discussions" (from line 209) (a) In line 209" "Discussions" → "Discussion".
(b) In line 236, the power EP algorithm is mentioned for the first time. Is this a new implementation? Is this a known method? If the latter, can a reference be provided?. This paragraph does not seem to add anything substantial to the paper. If the authors agree, perhaps it should be removed altogether from the paper (and the corresponding the appendix G). Otherwise, please give a some argument about the importance of such a variation.
14. In the section "Methods" (from line 260) (a) I am not sure the concept of exclusive and inclusive KL divergence is known. Please make sure this is explained in the references provided or explain otherwise.

Response to reviewers
Dear Editors, We acknowledge both Reviewers for their critical reading of the manuscript. Thanks to their comments, we hope to have improved substantially the quality in this revised version.
In the following we list the major changes relative to the first version of the work: A non-adaptive approach • In the section "A non-adaptive approach" (called "First local approach" in the old submission) of the manuscript we have decided to study another organism, the red blood cell, and to report the plots of a subset of marginals. We believe that (a) the comparison to this model is very popular as many authors choose to report their results on this model (b) Reviewer 2 asked to compare some marginals estimated by EP against the BP algorithm in * Fernandez-de-Cossio-Diaz, Jorge, and Roberto Mulet. "Fast inference of ill-posed problems within a convex space." arXiv preprint arXiv:1602.08412 (2016).
Since authors of this work reported the entire set of marginals for the red blood cell, we have decided to analyze the same organism and to make a comparison with their published results. The plots are shown in Appendix H.

Expectation Propagation
• The old section "Numeric results", now entitled "Numerical results for large scale metabolic networks" , has been almost entirely rewritten to clarify several issues reported by the reviewers along with to correct some minor errors of the first submission.
• In the new section "Study of Escherichia Coli metabolism for a constrained growth rate" we report the results of the Expectation Propagation algorithm applied to the iJR904 when we constrain the biomass flux of Escherichia Coli to be equal to an experimental profile. In the first submission we just mentioned how to cope, in principle, this slightly different problem involving metabolic fluxes.

Methods
• The update rules for the experimentally measured flux are now reported in the new section "Update equations for a constrained posterior" and not anymore in the Appendix.
• The technical details of the computations are now listed in the section "Technical details".
• All the discussion about the link between EP and other inference techniques, along with the free energy functional, has been postponed in the Appendix.
• We have added a new section "Weighted Hit-and-Run" where we propose a simple re-weighting of the configurations sampled by HR in order to "force" a certain marginal to fit an experimental one (as we have exactly done by EP in the section "Study of Escherichia Coli metabolism for a constrained growth rate"). We have discussed in detail why this method is generally unfeasible.
• We have erased the section "Power EP" as the algorithm presented in this part has never been used for the results reported in the manuscript and thus the description of "Power EP" algorithm does not add any relevant information to the article.
Furthermore, we have applied these minor changes to the work: • Eq. II.4. We have defined the indicator function I; • In Appendix E, matrix S was erroneously indicated as K; we have replaced all the "K" with the correct notation; • Generally, we have used a more unified notation for mathematical symbols. We have used lowercase bold style for vectors, uppercase bold for matrices and the apex " T " for matrix transposes; • Line 399. We have changed the title of Appendix E "Fast computation of Σ and µ" with the more appropriate "Fast computation of Σ andν"; • Section II.B is now titled "A non-adaptive approach" instead of "First local approach"; Minor changes • We implemented several minor changes targetted at having a much more uniform notation.
• Line 196. "the name of the organism, the name" becomes "the name of the organism" RESPONSE TO REVIEWER 1

Structure and writing
• Q1: Lines 11 to 19 of the abstract are standard in the metabolic community and can be summarized in a single sentence. It is not clear what the authors mean by "reference tools" since the enumerated serve to solve different problems. Moreover, the example for coupling of fluxes is not the prime one; in fact, coupling of fluxes (in the sense that their ratio is the same in any steady state of the system) serves as a more intuitive example. What does the acronym TAP stand for (this is not clear for the general audience of Nature communications)? Due to these issues, a more informative and to-the-point abstract is needed.
A1: We shortened lines 11-19 to be hopefully more concise and to the point. We clarified that the "reference tools" depend on the type of analysis sought. We clarified a bit the example on coupling of fluxes, but kept our version as it is more generic. We were afraid that introducing the example proposed by the referee, even if much simpler to understand, would trivialize the issue, as it can be "cured" very simply by eliminating one of the two fluxes. We removed TAP from the abstract, as it is not strictly needed.
• Q2: In section I, line 54 -55, it is not clear what is meant by "if the optimum is unique"? A linear program has a unique optimum value, but the optimizer (the set of values for the variables) may not be.

A2:
We simplified the sentence, as the optimizer set was referenced before.
• Q3: It is not clear why incorporation of prior knowledge (e.g. estimated fluxes) is limiting to the HR technique (lines 73 -77).
A3: With "prior knowledge" we wanted to indicate the inclusion of experimentally measured marginal probabilities of fluxes. We have explained in Appendix G how the solution of this problem can be attempted both by importance sampling in a Boltzmann learning scheme or by a re-weighting of the configurations explored by HR. However this last possibility turns out to be unfeasible in general cases (we suspect that the first possibility would be also very time consuming, but it seems that such tools are not available for testing currently). We have changed that part of the sentence to "incorporation of other constraints".
• Q4: It would be nice to provide biochemical examples when the rows of the stoichiometric matrix would be correlated? (e.g. cofactors usage) A4: Indeed, the suggestion is correct. We added some specific examples from the stoichiometric matrix of the ecoli-core model. We have added in the introduction the following lines: "To give an example, let us analyze the rows of the stoichiometric matrix for ecoli-core model. The rows corresponding to the adenosine-diphosphate (ADP) and adenosine-triphosphate (ATP) appear strongly correlated as both metabolites commonly appear in 11 reactions; the same apply for the intracellular water and hydrogen ion that have 10 reactions in common." • Q5: Line 124 can include a reference to Eq. (II.8).
A5: We have changed the order of Eq. (II.8) and Eqs. (II.6), (II.7) in a way that now Eq. (II.8) results immediately after Line 124 where we have explained how to determine the means and variances in the nonadaptive approach (called "first local approach" in the first submission).
• Q6: Metabolites do not carry flux (line 172). I suppose metabolites which are only produced or only degraded are removed.
A6: Exactly. We have clarified the sentence as "First we preprocess the stoichiometric matrix of the model in order to remove all reactions involving metabolites that are only produced or only degraded. Consider, for instance, the i th metabolite entering in K = {k 1 , . . . , k K } reactions. If the entries of the i th rows of the stoichiometric matrix {S ik } k∈K are all positive or all negative and moreover b i = 0, the linear equation k K k=k1 S ik ν k = 0 has only one trivial solution ν k = 0 for k ∈ K. As this assignment should be compatible with the constraint on the bounds, i.e. ν inf k ≤ 0 ≤ ν sup k for k ∈ K (otherwise the model is unfeasible), these fluxes can be set to zero slightly simplifying the model." • Q7: The technical details of the implementation can be postponed from lines 180 -183 to the methods section.
A7: We have created a new subsection in the "Methods" part entitled "Technical details" containing the paragraph in objective.
• Q8: The statement on lines 190 -191 is obsolete. Although we understand that the main result can be summarized in the left panel of Fig. II.2, we believe that the scatter plots are useful to describe in detail how EP results approach the Monte Carlo ones for an increasing number of sample points. Not only the means and variances of our approximation are highly correlated with HR ones, but they approach the values computed with HR as we sample the polytope more and more fairly. Moreover, the plot shows that there are outliers in some cases, and information that is difficult to infer from the correlation alone in particular when the agreement is not so good (see e.g. the figure corresponding to Saccharomyces Cerevisiae iND750 or Mycobacterium Tuberculosis iNJ661 in the Appendix).
The part in lines [195 -207] have been rewritten.
The populated portion of the variances of fluxes for Cholinergic neuron lie on the extremes of the plot. We do not know why the middle region is not populated. The fluxes with smaller variances (< 10 −10 ) are essentially blocked as their mean is also close to zero. There is however a consistent group of fluxes with small variances (< 10 −5 ) but non-zero means; that is fluxes for which there is essentially no uncertainty on their value. Note that in any case the result is perfectly consistent with HR.
• Q10: The discussion deserves more rigor and more biological examples to demonstrate the usefulness of the approach (apart from lowering the computation time for similar approximation of marginals). What do the authors mean by sufficient statistics of a Gaussian distribution? In what sense are the N × N and M × M matrices equivalent?
A10: First, we underline that lowering of computing time for very large metabolic networks, like the RECON1 model, is an important point in itself, as the computing time needed for the uniform sampling of HR is close to being unfeasible: it required for us about 20 days to obtain the best estimates with T = 2.6 × 10 9 sample points, and the correlations seems to be still on an increasing trend.
Concerning the biological examples, we have decided to follow the referee's suggestion and develop in detail how to incorporate the information about an experimentally measure of the distribution of the biomass flux (as Reviewer 1 suggested in Q8 of the section "Comments to the method"). The update equations appearing in Appendix G are now reported in the "Methods" section where, moreover, we have better explained how to derive them. To give a biological example, we have applied this slightly modified algorithm to the iJR904 model for Escherichia Coli in some specific conditions. The complete discussion is now reported in the new Subsection "Study of Escherichia Coli metabolism for a constrained growth rate" of the "Results" section.
As the reviewer points out, the expression "sufficient statistics" is not clear and unneeded. We have replaced it by "the optimization of two parameters, the mean and the variance of a Gaussian distribution".
Regarding the "N × N and M × M matrices", we think that the expression of the original work can be misunderstood. With "equivalent" we meant that the N × N matrix Σ can be written, using Woodbury formula, in the following way: where 1 M ×M is the M × M identity matrix. Thus, instead of inverting the original βS t S + D, we can invert 1 M ×M + βSD −1 S T that has dimension M × M . Although we believe that this equality can be very helpful, the statement in the Discussion part and the relative paragraph in Appendix E have been omitted since all results presented in this paper are obtained through the inversion of the original matrix.
• Q11: Parts of section IV.B cannot be understood without reading the appendices as clarification of the notation used.
A11: We agree with the referee; the paragraph is not clear enough. We have postponed the discussion about the EP free energy functional and its link to other inference approaches in Appendix B where this functional is derived in detail.

Comments to the Method
• Q1: Why is the normalization constant from Eq. II.2 neglected if the limits of II.5 in limits of \beta to infinity?
A1: In the original submission we omitted terms that do not depend on ν (and thus were not needed in the final expression). For the sake of clarity, we corrected Eq. (II.2) to include the normalization explicitly: In addition, to underline the dependence on the unknowns variables, we have introduced the sentence "Neglecting all the terms that do not depend on ν, the posterior takes the form of P (ν|b) ∝ e − β 2 (Sν−b) T (Sν−b) N n=1 ψ n (ν n ) where we have not explicitly reported the normalization constant". Note that the normalization constant hidden in the proportionality symbol does indeed depend on β. Note also that we do not perform the β → ∞ limit explicitly, but just employ a very large value of β.
• Q2: It seems that the fluxes shown in Figure II.1 involve exchange reactions. How does the first local approach behave for internal reactions?
A2: The results remain qualitatively the same. We report here, in Figure 1, the results of EP against the nonadaptive approach (first local approach in the old version of the manuscript) for a subset of internal/exchange reactions (these have been chosen randomly, the behavior is similar in other cases. We find that the non-adaptive approach generally overestimates the variances, and is very inaccurate for the means) • Q3: What are the rules for numerical convergence used in this study (lines 151-152).
A3: To complete the discussion we have added a reference to the subsection "Numerical results" where the criterion is described. The following sentence has been added to line 152: "Further technical details about the convergence are reported in Subsection II.C 1".
• Q4: Doesn't Figure II.1 (b) report the accuracy rather than the efficiency of the approach?
• Q5: Most critically, how does one chose the value of beta a priori? Is there any guideline for this important issue? If not resolved, the approach will have the same issues as the criticized HR whose convergence depends on the scenarios investigated (as stated in the introduction).
A5: For the results reported in the work we have set β = 10 10 except for the biggest model, RECON1, where β = 10 9 . This has been done because EP did not converge for β = 10 10 on the RECON1 model. We show in Figure 2 that results do not significantly change for β ∈ 10 7 , 10 10 , so our guideline is to always use a very large value for this parameter, for instance β = 10 10 and, in case of non-convergence, decrease it. We would like to stress that, even though convergence is not guaranteed, the tuning of this parameter only requires few runs of EP. The total time spent for the setting of β is thus of the order of minutes (or seconds in most cases) that is negligible with respect to the convergence time of a typical HR run investigated in this work. We added these considerations to the manuscript in the section "Numerical results for large scale metabolic networks".
• Q6: The error in the means and the variances should be a function of a and d , the parameters to be estimated (see lines 178).
A6: In the first submission we have erroneously used b instead of d, which we have now corrected. Actually, a more consistent way of estimating the error (which we had actually used in the simulations) is to compute differences on the first and second moment of the (truncated) marginal distribution on successive time-steps, as these are the quantities we use for the comparison. We have rephrased this part.
• Q7: The discussion is convoluted when it comes to the treatment of the ranges for b allowed! Why is a range from 10^-50 to 10^50 considered and how is this precision achieved in the implementation? It is clear that a simple variant of Flux Variability Analysis can determine the range of values that a flux is allowed to take in the feasible space. Why not use these as bounds?
A7: This was an unfortunate typo. We have erroneously indicated the variances of the approximation with b instead of d as for Q6. All the discussion refers to bounds on the parameters of the approximation and does not involve the intakes/uptakes b nor the bounds on the fluxes that are provided by the model and can take any real number.
• Q8: The authors should provide one biologically motivated example where experimental evidence is encoded (e.g. confidence intervals of flux estimates from labeling studies in different model species).
A8: We have discussed in detail how to enforce an experimental profile in our algorithm in section "Study of Escherichia Coli metabolism for a constrained growth rate".
• Q9: The definition of D does not match the standard definition of a diagonal matrix! A9: D is indeed a diagonal matrix from which we had only specified the diagonal terms. We have completed the description of the matrix D by specifying that non-diagonal terms are 0. • Q10: The S used for stoichiometric matrix in the introduction is different from that in Eq. IV.3. Please, unify.
A10: To unify the notation we have decided to use the symbol S for the stoichiometric matrix.
• Q11: Are there any fluxes for which convergence is achieved easier than for others?  A12: Violation of the condition d n > 0 can indeed occur during the iteration, and it is a sign that the EP approximation cannot capture the true distribution (maybe because it is too constrained as the referee suggests). For example, one simple case in which this surely occurs is when ν inf n = ν sup n (but thi case of course can be eliminated by simplifying the system). In practice, the truncation bounds 10 −50 , 10 50 we apply on d n ensure that this singularity is never propagated numerically, but of course the truncation may indeed be source of inaccuracy. What we observe is that at convergence of the algorithm, none of the d n takes the boundary value 10 −50 : these boundaries are only reached during the iterations. • Q3: Line 27: Cavity → cavity A3: "Cavity" changed in "cavity"; • Q4: Line 47: ν, inf ν sup → ν inf , ν sup A4: Comma shifted: ν inf , ν sup ; • Q5: Line 49: After the sentence "Empirically, it turns out that N M so that the system is typically severely undetermined", it should be perhaps appropriate to give some examples of values N and M for some metabolic networks, to give an idea to the reader.
A5: Agreed. We changed to "Empirically, it turns out that N > M so that the system is typically severely under-determined. For instance, for the RECON1 model of Homo Sapiens N = 2469 and M = 1587" N and M are typically of the same order of magnitude, so we have changed in >.
• Q6: Line 52-53. I wonder whether the author agree with me in that the following reference could be useful for some readers regarding Flux Balance Analysis: (c) Moreover the sign . = has various meanings in Mathematics, and for some readers it may not be clear from the context.

(d)
In reference [24] of line 143, it is said: "Of course, it is an extensive number of "intractable" priors that makes the computation intractable, not a single one!". Does this mean that a possible generalization of the EP method would be to include an intensive number of intractable priors, viz Q (n1,...,n L ) (ν|b) . (c) We simply replaced the symbol " . =" with "=" as it is clear from the sentence that it is a definition.
(d) Of course this generalization can lead to interest results and it is something that must be taken into account for future developments. We want to underline some points: The computation of the moments of the tilted distribution with two intractable priors becomes slightly harder. Mathematically speaking, for any pair (n, m) of fluxes, we need to compute the expectations ν α n Q (nm) , ν α m Q (nm) for α = {1, 2} over the two-fluxes tilted distribution Consider, for instance, the computation of the first moment ν n Q (n,m) ν n Q (n,m) ∝ where we have used the two components vectors ν (nm) = (ν n , ν m ),ν (nm) = (ν n ,ν m ) and a 2 × 2 matrix Σ −1 (nm) = Σ nn Σ nm Σ mn Σ mm . If we perform the integration in (7) with respect to ν m we obtain something which depend on the erf νn−νn

Σnn
. The integral is now the computation of the first moment of a distribution which contains the error function. To the best our knowledge there is no closed expression to exactly perform this integration. We could numerically compute it (for instance via Monte-Carlo methods) with the drawback of severely slowing down the algorithm, or perform some expansion.
-Notice that when we perform the matching of the moments of the distribution in (6) and Q (ν|b) we have, in principle, 5 equations for 4 unknowns. The fifth missing equation regards the expectation ν n ν m Q (n,m) with which we do not have any associated parameter. To take into account this information, we could generalize even more our approximation, and couple the N fluxes using some number k ≤ N 2 of bivariate Gaussians or eventually using multivariate Gaussians. Unfortunately, it is not clear how to efficiently manage this approximation but we do not exclude to develop more this point in future works. We just mention that something has been already done in this direction in * Qi, Yuan and Minka, T. P. "Tree-structured approximations by expectation propagation". Advances in Neural Information Processing Systems (NIPS)

Numerical results
• Q12: In the subsection "Numerical results" (from line 163) (a) The order in the discussion between lines 168-172 is a bit confusing. Perhaps it would be more appropriate to discuss the part regarding the preprocessing the network first and then discuss how HR and EP are run in the reduced networks. (a) According to the suggestion, we have shifted lines 168-170 after the explanation of pre-processing techniques.
(b) Yes, we do mean EP (fixed); (d) We erased the entire phrase since the Pearson's coefficient is positive (negative) for positive (negative) correlations; it is not exactly equal to +1 (-1).
(b) In line 236, the power EP algorithm is mentioned for the first time. Is this a new implementation? Is this a known method? If the latter, can a reference be provided?. This paragraph does not seem to add anything substantial to the paper. If the authors agree, perhaps it should be removed altogether from the paper (and the corresponding the appendix G). Otherwise, please give a some argument about the importance of such a variation. Basically, instead of minimizing, at each step, the Kullback-Leibler divergence between the full approximate distribution and the tilted one, one has to minimize a generalα-divergence that is formally defined, for two distributions p (x), q (x), x ∈ X as In the limit α → 1 one recovers the "inclusive" KL distance and so EP. This method sometimes provides better results then the standard EP algorithm, but it was not the case for this model. As the reviewer suggested, it is better to omit this part and Appendix G (of the old manuscript) and so we did.

Method
• Q14: In the section "Methods" (from line 260) (a) I am not sure the concept of exclusive and inclusive KL divergence is known. Please make sure this is explained in the references provided or explain otherwise.
A14: The definition of these two quantities is already on the text. Nevertheless, we have added the following line and reference: "A complete discussion about "exclusive" and "inclusive" divergences are provided in [*]".
Possibly the confusion is due to the fact that we omitted the explicit dependence of the matrix D on the variable index n to simplify the notation. We added a sentence in the manuscript to clarify this aspect. We moved Appendices H and I (Now G and H) to an independent Supplementary Results document.