Metabolic reactions enable cellular function by converting nutrients into energy, and by assembling macromolecules that sustain the cellular machinery.1 Cellular metabolism is usually thought of as a collection of pathways comprising enzymatic reactions associated with broad functional categories. Yet metabolic reactions are highly interconnected: enzymes convert multiple reactants into products with other metabolites acting as co-factors; enzymes can catalyse several reactions, and some reactions are catalysed by multiple enzymes, and so on. This enmeshed web of reactions is thus naturally amenable to network analysis, an approach that has been successfully applied to different aspects of cellular and molecular biology, e.g., protein-protein interactions,2 transcriptional regulation,3 or protein structure.4,5

Tools from graph theory6 have previously been applied to the analysis of structural properties of metabolic networks, including their degree distribution,7,8,9,10 the presence of metabolic roles,11 and their community structure.12,13,14,15 A central challenge, however, is that there are multiple ways to construct a network from a metabolic model.16 For example, one can create a graph with metabolites as nodes and edges representing the reactions that transform one metabolite into another;7,8,17,18 a graph with reactions as nodes and edges corresponding to the metabolites shared among them;19,20,21 or even a bipartite graph with both reactions and metabolites as nodes.22 Importantly, the conclusions of graph-theoretical analyses are highly dependent on the chosen graph construction.23

A key feature of metabolic reactions is the directionality of flows: metabolic networks contain both irreversible and reversible reactions, and reversible reactions can change their direction depending on cellular and environmental contexts.1 Existing graph constructions have been useful for developing an intuitive understanding of metabolic complexity. Many of these constructions, however, lead to graphs that do not include directional information that is central to metabolic function.8,16 In addition, current graph constructions are usually derived from the whole set of metabolic reactions in an organism, and thus correspond to a generic metabolic ‘blueprint’ of the cell. Yet cells switch specific pathways ‘on’ and ‘off’ to sustain their energetic budget in different environments.24 Hence, such blueprint graphs might not capture the specific metabolic connectivity in a given environment, thus limiting their ability to provide biological insights in different growth conditions.

In this paper, we present a flow-based approach to construct metabolic graphs that encapsulate the directional flow of metabolites produced or consumed through enzymatic reactions. The proposed graphs can be tailored to incorporate flux distributions under different environmental conditions. To introduce our approach, we proceed in two steps. We first define the Normalised Flow Graph (NFG), a weighted, directed graph with reactions as nodes, edges that represent supplier-consumer relationships between reactions, and weights given by the probability that a metabolite chosen at random from all reactions is produced/consumed by the source/target reaction. This graph can be used to carry out graph-theoretical analyses of organism-wide metabolic organisation independent of cellular context or environmental conditions. We then show that this formalism can be adapted seamlessly to construct the Mass Flow Graph (MFG), a directed, environment-dependent, graph with weights computed from Flux Balance Analysis (FBA),25 the most widespread method to study genome-scale metabolic networks.

Our formulation addresses several drawbacks of current constructions of metabolic graphs. Firstly, in our flow graphs, an edge indicates that metabolites are produced by the source reaction and consumed by the target reaction, thus accounting for metabolic directionality and the natural flow of chemical mass from reactants to products. Secondly, the Normalised Flow Graph discounts naturally the over-representation of pool metabolites (e.g., adenosine triphosphate (ATP), nicotinamide adenine dinucleotide (NADH), protons, water, and other co-factors) that appear in many reactions and tend to obfuscate the graph connectivity. Our construction avoids the removal of pool metabolites from the network, which can change the graph structure drastically.26,27,28,29,30 Finally, the Mass Flow Graph incorporates additional biological information reflecting the effect of the environmental context into the graph construction. In particular, since the weights in the MFG correspond directly to fluxes (in units of mass per time), different biological scenarios can be analysed using balanced fluxes (e.g., from different FBA solutions) under different carbon sources and other environmental perturbations.16,25,31,32

After introducing the mathematical framework, we showcase our approach with two examples. Firstly, in the absence of environmental context, our analysis of the NFG of the core model of Escherichia coli metabolism33 reveals the importance of including directionality and appropriate edge weights in the graph to understand the modular organisation of metabolic sub-systems. We then use FBA solutions computed for several relevant growth conditions for E. coli, and show that the structure of the MFG changes dramatically in each case (e.g., connectivity, ranking of reactions, community structure), thus capturing the environment-dependent nature of metabolism. Secondly, we study a model of human hepatocyte metabolism evaluated under different conditions for the wild-type and in a mutation found in primary hyperoxaluria type 1, a rare metabolic disorder,34 and show how the changes in network structure of the MFGs reveal new information that is complementary to the flux analysis predicted by FBA.


Definitions and background

Consider a metabolic network composed of n metabolites Xi (i = 1,…,n) that participate in m reactions

$$R_j:\quad \mathop {\sum}\limits_{i = 1}^n \alpha _{ij}X_i \rightleftharpoons \mathop {\sum}\limits_{i = 1}^n \beta _{ij}X_i,\quad j = 1,2, \ldots ,m,$$

where αij and βij are the stoichiometric coefficients of species i in reaction j. Let us denote the concentration of metabolite Xi at time t as xi(t). We then define the n-dimensional vector of metabolite concentrations: x(t) = (x1(t),…,xn(t))T. Each reaction takes place with rate vj(x,t), measured in units of concentration per time.35 We compile the reaction rates in the m-dimensional vector: v(t) = (v1(t),…,vm(t))T.

The mass balance of the system can then be represented compactly by the system of ordinary differential equations

$${\dot{\mathbf x}} = {\mathbf{Sv}},$$

where the n × m matrix S is the stoichiometric matrix with entries Sij = βij − αij, i.e., the net number of Xi molecules produced (positive Sij) or consumed (negative Sij) by the j-th reaction. Figure 1a shows a toy example of a metabolic network including nutrient uptake, biosynthesis of metabolic intermediates, secretion of waste products, and biomass production.32

Fig. 1
figure 1

Graphs from metabolic networks. a Toy metabolic network describing nutrient uptake, biosynthesis of metabolic intermediates, secretion of waste products, and biomass production.32 The biomass reaction is R8: X3 + 2X4 + X5. b Bipartite graph associated with the boolean stoichiometric matrix \(\widehat{\mathbf{S}}\), and the Reaction Adjacency Graph (RAG)16 with adjacency matrix \({\mathbf{A}} = {\widehat{\mathbf{S}}}^T{\widehat{\mathbf{S}}}\). The undirected edges of A indicate the number of shared metabolites among reactions. c The Normalised Flow Graph (NFG) \({\mathcal{D}}\) and two Mass Flow Graphs (MFG) \(M({\mathrm{v}}^{*})\) constructed from the consumption and production stoichiometric matrices (5). Note that the reversible reaction R4 is unfolded into two nodes. The NFG in Eq. (8) is a directed graph with weights representing the probability that a metabolite chosen uniformly at random from the stoichiometric matrix is produced by source reaction is consumed by the target reaction. The MFGs in Eq. (12) are constructed from two different Flux Balance Analysis solutions (\({\mathbf{v}}_{\mathbf{1}}^ \ast\) and \({\mathbf{v}}_{\mathbf{2}}^ \ast\)) obtained by optimising a biomass objective function under different flux constraints representing different environmental or cellular contexts (see Sec. SI 2 in the Supplementary Information for details). The weighted edges of the MFGs represent mass flow from source to target reactions in units of metabolic flux. The computed FBA solutions translate into different connectivity in the resulting MFGs

There are several ways to construct a graph for a given metabolic network with stoichiometric matrix S. A common approach16 is to define a unipartite graph with reactions as nodes and m × m adjacency matrix

$${\mathbf{A}} = \widehat{\mathbf S}^T{\widehat{\mathbf S}},$$

where \({\widehat{\mathbf S}}_{}^{}\) is the boolean version of S (i.e., \(\hat{S}_{ij} = 1\) when Sij ≠ 0 and \(\hat{S}_{ij} = 0\) otherwise). This is the Reaction Adjacency Graph (RAG), in which two reactions (nodes) are connected if they share metabolites, either as reactants or products. Self-loops represent the total number of metabolites that participate in a reaction (Fig. 1b).

Though widely studied,8,16 the RAG has known limitations and overlooks key aspects of the connectivity of metabolic networks. The RAG is blind to the directionality of flows, as it does not incorporate the irreversibility of reactions (by construction A is a symmetric matrix). Furthermore, the structure of A is dominated by the large number of edges introduced by pool metabolites that appear in many reactions, such as water, ions or enzymatic cofactors. Computational schemes have been introduced to mitigate the bias caused by pool metabolites,27 but these do not follow from biophysical considerations and need manual calibration. Finally, the construction of the graph A from is not easily extended to incorporate the effect of environmental changes.

Metabolic graphs that incorporate flux directionality and biological context

To address the limitations of the reaction adjacency graph A, we propose a graph formulation that follows from a flux-based perspective. To construct our graph, we unfold each reaction into two separate directions (forward and reverse) and redefine the links between reaction nodes to reflect producer-consumer relationships. Specifically, two reactions are connected if one produces a metabolite that is consumed by the other. As shown below, this definition leads to graphs that naturally account for the reversibility of reactions, and allows for the seamless integration of biological contexts modelled through FBA. Inspired by matrix formulations of chemical reaction network kinetics,36 we rewrite the reaction rate vector v as:

$${\mathbf{v}}: = {\mathbf{v}}^ + - {\mathbf{v}}^ - = {\mathbf{v}}^ + - {\mathrm{diag}}\left( {\mathbf{r}} \right){\mathbf{v}}^ - ,$$

where v+ and v are non-negative vectors containing the forward and backward reaction rates, respectively. Here the m × m matrix diag(r) contains r in its main diagonal, and r is the m-dimensional reversibility vector with components rj = 1 if reaction Rj is reversible and rj = 0 if it is irreversible. With these definitions, we can rewrite the metabolic model in Eq. (2) as:

$${\dot{\mathbf x}} = {\mathbf{Sv}} = \underbrace {\left[ {\begin{array}{*{20}{c}} {\mathbf{S}} & { - {\mathbf{S}}} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {{\mathbf{I}}_m} & 0 \cr 0 & {{\mathrm{diag}}\left( {\mathbf{r}} \right)} \end{array}} \right]}_{{\mathbf{S}}_{2m}}\left[ {\begin{array}{*{20}{c}} {{\mathbf{v}}^ + } \cr {{\mathbf{v}}^ - } \end{array}} \right]: = {\mathbf{S}}_{2m}{\kern 1pt} {\mathbf{v}}_{2m}{\kern 1pt} ,$$

where v2m: = [v+v]T is the unfolded 2m-dimensional vector of reaction rates, Im is the m × m identity matrix, and we have defined S2m, the unfolded version of the stoichiometric matrix of the 2m forward and reverse reactions.

Normalised Flow Graph: a directional blueprint of metabolism

The unfolding into forward and backward fluxes leads us to the definition of production and consumption stoichiometric matrices:

$$\begin{array}{*{20}{l}} {{\mathrm{Production:}}\quad } \hfill & {{\mathbf{S}}_{2m}^ + = \frac{1}{2}\left( {{\mathrm{abs}}\left( {{\mathbf{S}}_{2m}} \right) + {\mathbf{S}}_{2m}} \right)} \hfill \cr {{\mathrm{Consumption:}}\quad } \hfill & {{\mathbf{S}}_{2m}^ - = \frac{1}{2}\left( {{\mathrm{abs}}\left( {{\mathbf{S}}_{2m}} \right) - {\mathbf{S}}_{2m}} \right),} \hfill \end{array}$$

where abs(S2m) is the matrix of absolute values of the corresponding entries of S2m. Note that each entry of the matrix \({\mathbf{S}}_{2m}^ +\), denoted \(s_{ij}^ +\), gives the number of molecules of metabolite Xi produced by reaction Rj. Conversely, the entries of \({\mathbf{S}}_{2m}^ -\), denoted \(s_{ij}^ -\), correspond to the number of molecules of metabolite Xi consumed by reaction Rj.

Within our directional flux framework, it is natural to consider a purely probabilistic description of producer-consumer relationships between reactions, as follows. Suppose we are given a stoichiometric matrix S without any additional biological information, such as metabolite concentrations, reaction fluxes, or kinetic rates. In the absence of such information, the probability that a metabolite molecule Xk chosen uniformly at random from S is produced by reaction Ri and consumed by reaction Rj is:

$$P\left({{{\rm{one}}\,{\rm{molecule}}}\,{\mathrm{of}}\,X_k\,{\mathrm{is}}\,{\mathrm{produced}}\,{\mathrm{by}}\,R_i\,{\mathrm{and}}\,{\mathrm{consumed}}\,{\mathrm{by}}\,R_j} \right) = \frac{{s_{ki}^ + }}{{w_k^ + }}{\kern 1pt} \frac{{s_{kj}^ - }}{{w_k^ - }},$$

where \(w_k^ + = \mathop {\sum}\nolimits_{h = 1}^{2m} s_{kh}^ +\) and \(w_k^ - = \mathop {\sum}\nolimits_{h = 1}^{2m} s_{kh}^ -\) are the total number of molecules of Xk produced and consumed by all reactions that have been accounted for in S2m. Unlike models in that rely on stochastic chemical kinetics,37 the probabilities in Eq. (6) do not contain information on kinetic rate constants, which are typically not available for genome-scale metabolic models.38 In our formulation, the relevant probabilities contain only the stoichiometric information included in the matrix S2m and should not be confused with the reaction propensity functions in Gillespie-type stochastic simulations of biochemical systems.

We thus define the weight of the edge between reaction nodes Ri and Rj as the probability that any metabolite chosen at random is produced by Ri and consumed by Rj. Summing over all metabolites and normalizing, we obtain the edge weights of the adjacency matrix of the NFG:

$${\cal D}_{ij} = \frac{1}{n}\mathop {\sum}\limits_{k = 1}^n \frac{{s_{ki}^ + }}{{w_k^ + }}{\kern 1pt} \frac{{s_{kj}^ - }}{{w_k^ - }},$$

in which \(\mathop {\sum}\nolimits_{i,j} {\cal D}_{ij} = 1\) (i.e., the probability that any metabolite is consumed/produced by any reaction is 1). Rewritten compactly in matrix form, we obtain the

$${\mathrm{Normalised}}\,{\mathrm{Flow}}\,{\mathrm{Graph}}\,({\mathrm{NFG}}):\quad {\cal D} = \frac{1}{n}\left( {{\mathbf{W}}_ + ^\dagger {\mathbf{S}}_{2m}^ + } \right)^T\left( {{\mathbf{W}}_ - ^\dagger {\mathbf{S}}_{2m}^ - } \right),$$

where \({\mathbf{W}}_ + ^\dagger = {\mathrm{diag}}\left( {{\mathbf{S}}_{2m}^ + \mathbf{1}_{2m}} \right)^\dagger\), \({\mathbf{W}}_ - ^\dagger = {\mathrm{diag}}\left( {{\mathbf{S}}_{2m}^ - \mathbf{1}_{2m}} \right)^\dagger\), 12m is a vector of ones, and † denotes the Moore-Penrose pseudoinverse. In Fig. 1c we illustrate the creation of the NFG for a toy network. The NFG is a weighted, directed graph which encodes a blueprint of the whole metabolic model, and provides a natural scaling of the contribution of pool metabolites to flux transfer. We remark that the NFG is distinct from directed analogues of the RAG constructed from boolean production and consumption stoichiometric matrices, as shown in Sec. SI 1.

We now extend the construction of the NFG to accommodate specific environmental contexts or growth conditions.

Mass flow graphs: incorporating information of the biological context

Cells adjust their metabolic fluxes to respond to the availability of nutrients and environmental requirements. Flux Balance Analysis (FBA) is a widely used method to predict environment-specific flux distributions. FBA computes a vector of metabolic fluxes \({{\mathrm{v}}^{*}}\) that maximise a cellular objective (e.g., biomass, growth or ATP production). The FBA solution is obtained assuming steady state conditions (\({\dot{\mathbf x}} = 0\) in Eq. (2)) subject to constraints that describe the availability of nutrients and other extracellular compounds.16 The core elements of FBA are briefly summarised in the Appendix A.1.

To incorporate the biological information afforded by FBA solutions into the structure of a metabolic graph, we again define the graph edges in terms of production and consumption fluxes. Similarly to Eq. (4), we unfold the FBA solution vector \({{\mathrm{v}}^{*}}\) into forward and backward components: positive entries in the FBA solution correspond to forward fluxes, negative entries in the FBA solution correspond to backward fluxes. From the unfolded fluxes

$${\mathbf{v}}_{2m}^ \ast = \left[ {\begin{array}{*{20}{c}} {{\mathbf{v}}^{ \ast + }} \cr {{\mathbf{v}}^{ \ast - }} \end{array}} \right] = \frac{1}{2}\left[ {\begin{array}{*{20}{c}} {{\mathrm{abs}}\left( {{\mathbf{v}}^ \ast } \right) + {\mathbf{v}}^ \ast } \cr {{\mathrm{abs}}\left( {{\mathbf{v}}^ \ast } \right) - {\mathbf{v}}^ \ast } \end{array}} \right],$$

we compute the 2m × 1 vector of production and consumption fluxes as

$${\mathbf{j}}({\mathbf{v}}^ \ast ) = {\mathbf{S}}_{2m}^ + {\mathbf{v}}_{2m}^ \ast = {\mathbf{S}}_{2m}^ - {\mathbf{v}}_{2m}^ \ast .$$

The k-th entry of \({\mathrm{j}}({{\mathrm{v}}^{*})}\) is the flux at which metabolite Xk is produced and consumed; the equality of the production and consumption fluxes follows from the steady state condition, \({\dot{\mathbf x}} = 0\) (i.e., the fluxes are balanced).

To construct the flow graph, we define the weight of the edge between reactions Ri and Rj as the total mass flow of metabolites produced by Ri that are consumed by Rj. Assuming that the amount of metabolite produced by one reaction is distributed among the reactions that consume it in proportion to their flux (and respecting the stoichiometry), the flux of metabolite Xk from reaction Ri to Rj is given by

$${\mathrm{Flow}}\,{\mathrm{of}}\,X_k\,{\mathrm{from}}\,R_i\,{\mathrm{to}}\,R_j = ({\mathrm{flow}}\,{\mathrm{of}}\,X_k\,{\mathrm{produced}}\,{\mathrm{by}}\,R_i) \times \left( {\frac{{{\mathrm{flow}}\,{\mathrm{of}}\,X_k{\mathrm{consumed}}\,{\mathrm{by}}\,R_j}}{{{\mathrm{total}}\,{\mathrm{consumptionf}}\,{\mathrm{low}}\,{\mathrm{of}}\,X_k}}} \right).$$

For example, if the total flux of metabolite Xk is 10 mmol/gDW/h, with reaction Ri producing Xk at a rate 1.5 mmol/gDW/h and reaction Rj consuming Xk at a rate 3.0 mmol/gDW/h, then the flow of Xk from Ri to Rj is 0.45 mmol/gDW/h.

Summing (10) over all metabolites, we obtain the edge weight relating reactions Ri and Rj:

$$M_{ij}\left( {{\mathbf{v}}^ \ast } \right) = \mathop {\sum}\limits_{k = 1}^n s_{ki}^ + v_{2m{\kern 1pt} i}^ \ast \times \left( {\frac{{s_{kj}^ - v_{2m{\kern 1pt} j}^ \ast }}{{\mathop {\sum}\nolimits_{j = 1}^{2m} {s_{kj}^ - } v_{2m{\kern 1pt} j}^ \ast }}} \right).$$

In matrix form, these edge weights are collected into the adjacency matrix of the

$${\mathrm{Mass}}\,{\mathrm{Flow}}\,{\mathrm{Graph}}\,\left( {{\mathrm{MFG}}} \right):\quad {\mathbf{M}}\left( {{\bf{v}}^ \ast } \right) = \left( {{\mathbf{S}}_{2m}^ + {\bf{V}}^ \ast } \right)^T{\mathbf{J}}_v^\dagger \left( {{\mathbf{S}}_{2m}^ - {\bf{V}}^ \ast } \right),$$

where \({\bf{V}}^ \ast = {\mathrm{diag}}\left( {{\bf{v}}_{2m}^ \ast } \right)\), \({\mathrm{J}}_{v}={\mathrm{diag}} ({\rm{j}}({{\mathrm{v}}^{*}))}\) and † denotes the matrix pseudoinverse. The MFG is a directed, weighted graph with edge weights in units of mmol/gDW/h. Self-loops describe the metabolic flux of autocatalytic reactions, i.e., those in which products are also reactants.

The MFG provides a versatile framework to create environment-specific metabolic graphs from FBA solutions. In Fig. 1c, we illustrate the creation of MFGs for a toy network under different biological scenarios. In each case, an FBA solution is computed under a fixed uptake flux with the remaining fluxes constrained to account for differences in the biological environment: in scenario 1, the fluxes are constrained to be strictly positive and no larger than the nutrient uptake flux, while in scenario 2 we impose a positive lower bound on reaction R7. Note how the MFG for scenario 2 displays an extra edge between reactions R4 and R7, as well as distinct edge weights to scenario 1 (see Sec. SI 2 for details). These differences illustrate how changes in the FBA solutions translate into different graph connectivities and edge weights.

Graphs for Escherichia coli core metabolism

To illustrate our framework, we construct and analyse the flow graphs of the well-studied core metabolic model of E. coli.33 This model (Fig. 2a) contains 72 metabolites and 95 reactions, grouped into 11 pathways, which describe the main biochemical routes in central carbon metabolism.39,40,41 We provide a Supplemental Spreadsheet with full details of the reactions and metabolites in this model, as well as all the results presented below.

Fig. 2
figure 2

Graphs for the core metabolism of Escherichia coli. a Map of the E. coli core metabolic model created with the online tool Escher.33,54 b The standard Reaction Adjacency Graph A, as given by Eq. (3). The nodes represent reactions; two reactions are linked by an undirected edge if they share reactants or products. The nodes are coloured according to their PageRank score, a measure of their centrality (or importance) in the graph. c The directed Normalised Flow Graph \({\cal D}\), as computed from Eq. (8). The reversible reactions are unfolded into two overlapping nodes (one for the forward reaction, one for the backward). The directed links indicate flow of metabolites produced by the source node and consumed by the target node. The nodes are coloured according to their PageRank score. d Comparison of PageRank percentiles of reactions in A and \({\cal D}\). Reversible reactions are represented by two triangles connected by a line; both share the same PageRank in A, but each has its own PageRank in \({\cal D}\). Reactions that appear above (below) the diagonal have increased (decreased) PageRank in \({\cal D}\) as compared to A

The Normalised Flow Graph: the impact of directionality

To examine the effect of flux directionality on the metabolic graphs, we compare the Reaction Adjacency Graph (A) and our proposed Normalised Flow Graph \(\left( {\cal D} \right)\) for the same metabolic model in Fig. 2. The A graph has 95 nodes and 1,158 undirected edges, whereas the \(\mathcal{D}\) graph has 154 nodes and 1,604 directed and weighted edges. The increase in node count is due to the unfolding of reversible reactions into forward and backward reaction nodes. Unlike the A graph, where edges represent shared metabolites between two reactions, the directed edges of the \(\mathcal{D}\) graph represent the flow of metabolites from a source to a target reaction. A salient feature of both graphs is their high connectivity, which is not apparent from the traditional pathway representation in Fig. 2a.

The effect of directionality becomes apparent when comparing the importance of reaction nodes in both graphs (Fig. 2b–d), as measured with the PageRank score for node centrality.42,43 The overall node hierarchy is maintained across both graphs: exchange reactions tend to have low PageRank centrality scores, core metabolic reactions have high scores, and the biomass reaction has the highest scores in both graphs. Yet we also observe substantial changes in several reactions. For example, the reactions for ATP maintenance (ATPM, irreversible), phosphoenolpyruvate synthase (PPS, irreversible) and ABC-mediated transport of L-glutamine (GLNabc, irreversible) drop from being among the top 10% most important reactions in the A graph to the bottom percentiles in the \({\cal D}\) graph. Conversely, reactions such as aconitase A (ACONTa, irreversible), transaldolase (TALA, reversible) and succinyl-CoA synthetase (SUCOAS, reversible), and formate transport via diffusion (FORti, irreversible) gain substantial importance in the \({\cal D}\) graph. For instance, FORti is the sole consumer of formate, which is produced by pyruvate formate lyase (PFL), a reaction that is highly connected to the rest of the network. Importantly, in most of the reversible reactions, such as ATP synthase (ATPS4r), there is a wide gap between the PageRank of the forward and backward reactions, suggesting a marked asymmetry in the importance of metabolic flows.

Community detection is frequently used in the analysis of complex graphs: nodes are clustered into tightly related communities that reveal the coarse-grained structure of the graph, potentially at different levels of resolution.44,45,46 The community structure of metabolic graphs has been the subject of multiple analyses.12,14,44 However, most community detection methods are applicable to undirected graphs only, and thus fail to capture the directionality of the metabolic graphs we propose here. To account for graph directionality, we use the Markov Stability community detection framework,46,47,48 which uses diffusion on graphs to detect groups of nodes where flows are retained persistently across time scales. Markov Stability is ideally suited to find multi-resolution community structure45 and can deal with both directed and undirected graphs46,49 (see “Menthods” section). In the case of metabolic graphs, Markov Stability can reveal groups of reactions that are closely interlinked by the flow of metabolites that they produce and consume.

Figure 3 shows the difference between the community structure of the undirected RAG and the directed NFG of the core metabolism of E. coli. For the A graph, Markov Stability reveals a partition into seven communities (Fig. 3b, see also Supplementary Sec. SI 3), which are largely dictated by the many edges created by shared pool metabolites. For example, community C 1(A) is mainly composed of reactions that consume or produce ATP and water. Yet, the biomass reaction (the largest consumer of ATP) is not a member of C 1(A) because, in the standard A graph construction, any connection involving ATP has equal weight. Other communities in A are also determined by pool metabolites, e.g. C 2(A) is dominated by H+, and C 3(A) is dominated by NAD+ and NADP+, as illustrated by word clouds of the relative frequency of metabolites in the reactions within each community. The community structure in A thus reflects the limitations of the RAG construction due to the absence of biological context and the large number of uninformative links introduced by pool metabolites.

Fig. 3
figure 3

Directionality and community structure of graphs for Escherichia coli metabolism. a Reactions of the core model of E. coli metabolism grouped into eleven biochemical pathways, b, c Graphs A and \({\cal D}\) from Fig. 2b–c partitioned into communities computed with the Markov Stability method; for clarity, the graph edges are not shown. The Sankey diagrams68,69 show the correspondence between biochemical pathways and the communities found in each graph. The word clouds contain the metabolites that participate in the reactions each community, and the word size is proportional to the number of reactions in which each metabolite participates

For the \({\cal D}\) graph, we found a robust partition into five communities (Fig. 3c, Supplementary Sec. SI 3), which comprise reactions related consistently through biochemical pathways. Community C1\(({\cal D})\) contains the reactions in the pentose phosphate pathway together with the first steps of glycolysis involving D-fructose, D-glucose, or D-ribulose. Community C2\(({\cal D})\) contains the main reactions that produce ATP from substrate level as well as oxidative phosphorylation and the biomass reaction. Community C3\(({\cal D})\) includes the core of the citric acid cycle, anaplerotic reactions related to malate syntheses, as well as the intake of cofactors such as CO2. Community C4\(({\cal D})\) contains reactions that are secondary sources of carbon (such as malate and succinate), as well as oxidative phosphorilation reactions. Finally, community C5\(({\cal D})\) contains reactions that are part of the pyruvate metabolism subsystem, as well as transport reactions for the most common secondary carbon metabolites such as lactate, formate, acetaldehyde and ethanol. Altogether, the communities of the \({\cal D}\) graph reflect metabolite flows associated with specific cellular functions, as a consequence of including flux directionality in the graph construction. As seen in Fig. 3c, the communities are no longer exclusively determined by pool metabolites (e.g., water is no longer dominant and protons are spread among all communities). For a more detailed explanation and comparison of the communities found in the A and \({\cal D}\) graphs, see Supplementary Section SI 3. Full information about PageRank scores and communities is provided in the Supplementary Spreadsheet.

Mass Flow Graphs: the impact of growth conditions and biological context

To incorporate the impact of environmental context, we construct the Mass Flow Graphs in Eq. (12) using FBA solutions of the core model of E. coli metabolism in four relevant growth conditions: aerobic growth in rich media with glucose; aerobic growth in rich media with ethanol, anaerobic growth in glucose; and aerobic growth in glucose but phosphate- and ammonium-limited. The results in Fig. 4 show how changes in metabolite fluxes under different biological contexts have a direct effect in the MFG. Note that, in all cases, the MFGs have fewer nodes than the blueprint graph \({\cal D}\) since the FBA solutions contain numerous reactions with zero flux.

Fig. 4
figure 4

Mass Flow Graphs for Escherichia coli in different growth conditions. The MFGs are computed from Eq. (12) and the FBA solutions in four different environments: a Aerobic growth in d-glucose, b aerobic growth in ethanol, c anaerobic growth in d-glucose, and d aerobic growth in d-glucose but with limited ammonium and phosphate. Each subfigure shows: (left) flux map obtained with Escher,54 where the increased red colour of the arrows indicates increased flux; (centre) Mass Flow Graph with nodes coloured according to their PageRank (zero flux reactions are in grey; thickness of connections proportional to fluxes); (right) community structure computed with the Markov Stability method together with Sankey diagrams showing the correspondence between biochemical pathways and MFG communities

Next we summarise how the changes in the community structure of the MFGs for the four conditions reflect the distinct relationships of functional pathways in response to growth requirements.

Aerobic growth in D-glucose (M glc)

We found a robust partition into three communities with an intuitive biological interpretation (Fig. 4a and Supplementary Fig. SI2A). C 1(Mglc) is the carbon-processing community, comprising reactions that process carbon from D-glucose to pyruvate including most of the glycolysis and pentose phosphate pathways, together with related transport and exchange reactions. C2(Mglc) harbours the bulk of reactions related to oxidative phosphorylation and the production of energy in the cell, including the electron transport chain of NADH dehydrogenase, cytochrome oxidase, and ATP synthase, as well as transport reactions for phosphate and oxygen intake and proton balance. C2(Mglc) also includes the growth reaction, consistent with ATP being the main substrate for both the ATP maintenance (ATPM) requirement and the biomass reaction in this growth condition. Finally, C3(Mglc) contains reactions related to the citric acid cycle (TCA) and the production of NADH and NADPH (i.e., the cell’s reductive power), together with carbon intake routes strongly linked to the TCA cycle, such as those starting from phosphoenolpyruvic acid (PEP).

Aerobic growth in ethanol (M etoh)

The robust partition into three communities that we found for this scenario resembles the structure of Mglc with subtle, yet important, differences (Fig. 4b and Supplementary Fig. SI 2B). Most salient are the differences in the carbon-processing community C1(Metoh), which reflects the switch from d-glucose to ethanol as a carbon source. C1(Metoh) contains gluconeogenic reactions (instead of glycolytic), due to the reversal of flux induced by the change of carbon source, as well as anaplerotic reactions and reactions related to glutamate metabolism. In particular, the reactions in this community are related to the production of precursors such as PEP, pyruvate, 3-phospho-D-glycerate (3PG), glyceraldehyde-3-phosphate (G3P), d-fructose-6-phosphate (F6P), and d-glucose-6-phosphate, all of which are substrates for growth. Consequently, the biomass reaction is also grouped within C1(Metoh) due to the increased metabolic flux of precursors relative to ATP production in this biological scenario. The other two reaction communities (energy-generation C2(Metoh) and citric acid cycle C3(Metoh)) display less prominent differences relative to the Mglc graph, with additional pyruvate metabolism and anaplerotic reactions as well as subtle ascriptions of reactions involved in NADH/NADPH balance and the source for acetyl-CoA.

Anaerobic growth in d-glucose (M anaero)

The profound impact of the absence of oxygen on the metabolic balance of the cell is reflected in drastic changes in the MFG (Fig. 4c and Supplementary Fig. SI 2C). Both the connectivity and reaction communities in the MFG are starkly different from the aerobic scenarios, with a much diminished presence of oxidative phosphorylation and the absence of the first two steps of the electron transport chain (CYTBD and NADH16). We found that Manaero has a robust partition into four communities. C1(Manaero) still contains carbon processing (glucose intake and glycolysis), yet now decoupled from the pentose phosphate pathway. C3(Manaero) includes the pentose phosphate pathway grouped with the citric acid cycle (incomplete) and the biomass reaction, as well as the growth precursors including alpha-D-ribose-5-phosphate (r5p), d-erythrose-4-phosphate (e4p), 2-oxalacetate and NADPH. The other two communities are specific to the anaerobic context: C2(Manaero) contains the conversion of PEP into formate (more than half of the carbon secreted by the cell becomes formate50); C4(Manaero) includes NADH production and consumption via reactions linked to glyceraldehyde-3-phosphate dehydrogenase (GAPD).

Aerobic growth in d-glucose but limited phosphate and ammonium (M lim)

Under growth-limiting conditions, we found a robust partition into three communities (Fig. 4d and Supplementary Fig. SI 2D). The community structure reflects overflow metabolism,51 which occurs when the cell takes in more carbon than it can process. As a consequence, the excess carbon is secreted from the cell, leading to a decrease in growth and a partial shutdown of the citric acid cycle. This is reflected in the reduced weight of the TCA pathway and its grouping with the secretion routes of acetate and formate within C3(Mlim). Hence, C3(Mlim) comprises reactions that are not strongly coupled in favourable growth conditions, yet are linked together by metabolic responses to limited ammonium and phosphate. Furthermore, the carbon-processing community C1(Mlim) contains the glycolytic pathway, yet detached from the pentose phosphate pathway (as in Manaero), highlighting its role in precursor formation. The bioenergetic machinery, contained in community C2(Mlim), includes the pentose phosphate pathway, with a smaller role for the electron transport chain (21.8% of the total ATP as compared to 66.5% in Mglc).

In addition to the effect on community structure, Fig. 4 also shows the changes induced by the environment on the MFG connectivity and relative importance of reactions, as measured by their PageRank score. To provide a global snapshot of the effect of growth conditions on cellular metabolism, Fig. 5 shows the cumulative PageRank of each pathway for each of the MFGs. The cumulative PageRank quantifies the relative importance of pathways, and how their importance changes upon environmental shifts.

Fig. 5
figure 5

Pathway centrality (PageRank) computed from the MFG of different growth conditions. The cumulative pathway PageRank reflects the relative importance of metabolic pathways in each MFG. Changes in pathway centrality indicate the overall rearrangement of fluxes within the pathways in response to environmental shifts: a from aerobic glucose-rich to aerobic ethanol-rich; b from aerobic glucose-rich to anerobic glucose-rich; c from aerobic glucose-rich to a similar medium with limited phosphate and ammonium. Variations in cumulative Pagerank highlight changes across most cellular pathways

In aerobic growth, a shift from glucose to ethanol (Mglc → Metoh) as carbon source decreases the importance of pyruvate metabolism and oxidative phosphorylation, while increasing the importance of the pentose phosphate pathway. A shift from aerobic to anaerobic growth in glucose (Mglc → Manaero) sees a large reduction in the importance of oxidative phosphorylation and the citric acid cycle, coupled with a large increase in the importance of gluconeogenesis, pyruvate metabolism, and transport and exchange reactions. The effect of growth-limiting conditions in aerobic growth under glucose (Mglc → Mlim) is reflected on the increased importance of pyruvate metabolism and a reduction in the importance of oxidative phosphorylation, citric acid cycle, and the pentose phosphate pathway. The importance of transport and exchange reactions is also increased under limiting conditions. Such qualitative relations between growth conditions and the importance of specific pathways highlights the utility of the MFGs to characterise systemic metabolic changes in response to environmental conditions.

A more detailed discussion of the changes in pathways and reactions can be found in Section SI 4 and Fig. SI 2 in the Supplementary Information, with full details of all the results in the Supplemental Spreadsheet.

Multiscale organisation of mass flow graphs

The definition of the MFGs as directed graphs opens up the application of network-theoretic tools for detecting modules of reaction nodes and the hierarchical relationships among them.

In contrast with methods for undirected graphs, the Markov Stability framework47,52 can be used to detect multi-resolution community structure in directed graphs (Sec. 4.2), thus allowing the exploration of the multiscale organisation of metabolic reaction networks. The modules so detected reflect subsets of reactions where metabolic fluxes tend to be contained.

Figure 6 illustrates this multiscale analysis on Mglc, the MFG of E. coli under aerobic growth in glucose. By varying the Markov time t, a parameter in the Markov Stability method, we scanned the community structures at different resolutions. Our results show that, from finer to coarser resolution, the MFG can be partitioned into 11, 7, 5, 3, and 2 communities of high persistence across Markov time (extended plateaux over t, as shown by low values of VI(t, t′)) and high robustness under optimisation (as shown by dips in VI(t)). For further details, see Section 4.2 and Refs.45,46,47,52

Fig. 6
figure 6

Community structure of flow graphs across different scales. We applied the Markov Stability method to partition the mass flow graph for E. coli aerobic growth in glucose (Mglc) across levels of resolution. The top panel shows the number of communities of the optimal partition (blue line) and two measures of its robustness (VI(t) (green line) and VI(t,t′) (colour map)) as a function of the Markov time t (see text and “Methods” section). The five Markov times selected correspond to robust partitions of the graph into 11, 7, 5, 3, and 2 communities, as signalled by extended low values of VI(t,t′) and low values (or pronounced dips) of VI(t). The Sankey diagram (middle panel) visualises the multiscale organisation of the communities of the MFG across Markov times, and the relationship of the communities with the biochemical pathways. The bottom panel shows the five partitions at the selected Markov times. The partition into 3 communities corresponds to that in Fig. 4A

The Sankey diagram in Fig. 6 visualises the pathway composition of the graph partitions and their relationships across different resolutions. As we decrease the resolution (longer Markov times), the reactions in different pathways assemble and split into different groupings, reflecting both specific relationships and general organisation principles associated with this growth condition. A general observation is that glycolysis is grouped together with oxidative phosphorylation across most scales, underlining the fact that those two pathways function as cohesive metabolic sub-units in aerobic conditions. In contrast, the exchange and transport pathways appear spread among multiple partitions across all resolutions. This is expected, as exchange/transport are enabling functional pathways, in which reactions do not interact amongst themselves but rather feed substrates to other pathways.

Other reaction groupings reflect more specific relationships. For example, the citric acid cycle (always linked to anaplerotic reactions) appears as a cohesive unit across most scales, and only splits in two in the final grouping, reflecting the global role of the TCA cycle in linking to both glycolysis and oxidative phosphorylation. The pentose phosphate pathway, on the other hand, is split into two groups (one linked to glutamate metabolism and another one linked to glycolysis) across early scales, only merging into the same community towards the final groupings. This suggests a more interconnected flux relationship of the different steps of the penthose phosphate pathway with the rest of metabolism. In Supplementary Figure SI2, we present the multiscale analyses of the reaction communities for the other three growth scenarios (Metoh, Manaero, Mlim).

Using MFGs to analyse hepatocyte metabolism in wild type and PH1 mutant human cells

To showcase the applicability of our framework to larger metabolic models, we analyse a model of human hepatocyte (liver) metabolism with 777 metabolites and 2589 reactions,34 which extends the widely used HepatoNet1 model53 with an additional 50 reactions and 8 metabolites. This extended model was used in Ref.34 to compare wild-type cells (WT) and cells affected by the rare disease Primary Hyperoxaluria Type 1 (PH1), which lack alanine:glyoxylate aminotransferase (AGT) due to a genetic mutation. AGT is an enzyme found in peroxisomes and its mutation decreases the breakdown of glyoxylate, with subsequent accumulation of calcium oxalate that leads to liver damage.

Following,34 we first obtain 442 FBA solutions for different sets of metabolic objectives for both the wild-type (WT) model and the PH1 model lacking AGT (reaction r2541). We then generate the corresponding 442 MFGs for each WT and PH1, and obtain the averages over each ensemble: \(\overline {\mathbf{M}} _{{\mathrm{WT}}}\) and \(\overline {\mathbf{M}} _{{\mathrm{PH1}}}\). Of the 2589 reactions in the model, 2448 forward and 1362 reverse reactions are present in at least one of the FBA solutions. Hence the average MFGs have 3810 nodes each (see Supplementary Spreadsheet for full details about the reactions).

Figure 7a shows the MFG for the wild-type \(\left( {\overline {\mathbf{M}} _{{\mathrm{WT}}}} \right)\) coloured according to a robust partition into 7 communities obtained with Markov Stability. The seven communities are broadly linked to amino acid metabolism (C0), energy metabolism (C1 and C5), glutathione metabolism (C2), fatty acid and bile acid metabolism (C3 and C4) and cholesterol metabolism and lipoprotein particle assembly (C6). As expected, the network community structure of the MFG is largely preserved under the AGT mutation: the Sankey diagramme in Fig. 7b shows a remarkable match between the partitions of \(\overline {\mathbf{M}} _{{\mathrm{WT}}}\) and \(\overline {\mathbf{M}} _{{\mathrm{PH1}}}\) found independently with Markov Stability. Despite this similarity, our method also identified subtle but important differences between the healthy and diseased networks. In particular, C3′ in \(\overline {\mathbf{M}} _{{\mathrm{PH1}}}\) receives 60 reactions, almost all taking place in the peroxisome and linked to mevalonate and iso-pentenyl pathways, as well as highly central transfer reactions of PPi, O2 and H2O2 between the peroxisome and the cytosol (r1152, r0857, r2577).

Fig. 7
figure 7

MFG analysis of a model of human hepatocyte metabolism and the genetic condition PH1. a Average MFG of wild-type hepatocytes cells over 442 metabolic objectives. The reaction nodes are coloured according to communities in a 7-way partition obtained with Markov Stability. The Sankey diagramme shows the consistency between the communities in the wild-type MFG and the communities independently found in the MFG of the mutated PH1 cells. Word clouds of the most frequent metabolites in the reactions of the WT communities reveal functional groupings (see text). Under the PH1 mutation, the only large change relates to metabolites that join C3’ from community C0 in WT. b Comparison of the PageRank percentiles in the WT and PH1 MFGs, with reactions whose rank changes by more than 20 percentiles labelled. c Difference in FBA flux between WT and PH1 vs difference in PageRank percentile between WT and PH1. Reactions whose flux difference is greater than 100 mmol/gDW/h (italics) or whose change in PageRank percentile is greater than 20 are labelled. The differences in centrality (PageRank) provide complementary information, revealing additional important reactions affected by the PH1 mutation that knocks out reaction r2541

Overall, the centrality (PageRank) of most reactions in the MFG is relatively unaffected by the PH1 mutation, as shown by the good correlation between the PageRank percentiles in \(\overline {\mathbf{M}} _{{\mathrm{WT}}}\) and \(\overline {\mathbf{M}} _{{\mathrm{PH1}}}\) in Fig. 7c. Yet, there are notable exceptions, and the reactions that exhibit the largest change in PageRank centrality (labelled in Fig. 7b) provide biological insights into the disease state. Specifically, the four reactions (r0916, r1088, r2384, r2374) that undergo the largest increase in centrality from \(\overline {\mathbf{M}} _{{\mathrm{WT}}}\) to \(\overline {\mathbf{M}} _{{\mathrm{PH1}}}\) are related to the transfer of citrate out of the cytosol in exchange for oxalate and PEP; whereas those with the largest decrease of PageRank from \(\overline {\mathbf{M}} _{{\mathrm{WT}}}\) to \(\overline {\mathbf{M}} _{{\mathrm{PH1}}}\) are related to VLDL-pool reactions (r1228, r1195, r1220) and to transfers of hydroxypyruvate and alanine from peroxisome to cytosol (r2581, r2543).

It is worth remarking that although oxalate and citrate reactions are directly linked to metabolic changes associated with the PH1 diseased state, none of them exhibits large changes in their flux predicted by FBA, yet they show large changes in PageRank centrality.

These observations underscore how the information provided by our network analysis provides complementary information to the results from FBA. As shown in Fig. 7d, there is a group of reactions (labelled with italics in the Figure) that exhibit large gains or decreases in their flux under the PH1 mutation, yet they only undergo relatively small changes in their PageRank scores. Closer inspection reveals that most of these reactions are close to the AGT reaction (r2541, highlighted in the Figure) in the pathway and involve the conversion of glycolate, pyruvate, glycine, alanine and serine. Hence the changes in flux follow from the local rearrangement of flows as a consequence of the deletion of reaction r2541. On the other hand, the citrate and oxalate reactions (r0916, r1088, r2384, r2374) with large changes in their centrality yet undergo small changes in flux, thus reflecting global changes in the flux structure of the network. Importantly, the transport reactions of O2, H2O2, serine and hydroxypyruvate between cytosol and peroxisome (r0857, r2577, r2583, r2543) all undergo large changes both in centrality and flux, highlighting the importance of peroxisome transfer reactions in PH1. We provide a full spreadsheet with these analyses as Supplementary Material for the interested reader.


Metabolism is commonly understood in terms of functional pathways interconnected into metabolic networks,54 i.e., metabolites linked by arrows representing enzymatic reactions between them as in Fig. 2a. However, such standard representations are not amenable to rigorous graph-theoretic analysis. Fundamentally different graphs can be constructed from such metabolic reactions depending on the chosen representation of species/interactions as nodes/edges, e.g., reactions as nodes; metabolites as nodes; or both reaction and metabolites as nodes.16 Each of those graphs can be directed or undirected and with weighted links computed according to different rules. The choices and subtleties in graph construction are crucial both to capture the relevant metabolic information and to interpret their topological properties.10,17

Here we have presented a flux-based strategy to build graphs for metabolic networks. Our graphs have reactions as nodes and directed weighted edges representing the flux of metabolites produced by a source reaction and consumed by a target reaction. This principle is applied to build both ‘blueprint’ graphs (NFG), which summarise probabilistically the fluxes of the whole metabolism of an organism, as well as context-specific graphs (MFGs), which reflect specific environmental conditions. The blueprint Normalised Flow Graph has edge weights equal to the probability that source/target reactions produce/consume a molecule of a metabolite chosen at random from the stoichiometric matrix in the absence of any other information, and can thus be used when this matrix is the only information available. The NFG construction naturally tames the over-representation of pool metabolites without the need to remove them from the graph arbitrarily, as often done in the literature.26,28,29,30 Context-specific Mass Flow Graphs (MFGs) can incorporate the effect of the environment, e.g., with edge weights corresponding to the total flux of metabolites between reactions as computed by Flux Balance Analysis (FBA). FBA solutions for different environments can then be used to build different metabolic graphs in different growth conditions.

The proposed graph constructions provide complementary tools for studying the organisation of metabolism and can be embedded into any FBA-based modelling pipeline. Specifically, the NFG relies on the availability of a well-curated stoichiometric matrix, which is produced with metabolic reconstruction techniques that typically precede the application of FBA. The MFG, on the other hand, explicitly uses the FBA solutions in its construction. Both methods provide a systematic framework to convert genome-scale metabolic models into a directed graph on which analysis tools from network theory can be applied.

To exemplify our approach, we built and analysed NFG and MFGs for the core metabolism of E. coli. Through the analysis of topological properties and community structure of these graphs, we highlighted the importance of weighted directionality in metabolic graph construction, and revealed the flow-mediated relationships between functional pathways under different environments. In particular, the MFGs capture specific metabolic adaptations such as the glycolytic-gluconeogenic switch, overflow metabolism, and the effects of anoxia. The proposed graph construction can be readily applied to large genome-scale metabolic networks.12,19,21,22,39

To illustrate the scalability of our analyses to larger metabolic models, we studied a genome-scale model of a large metabolic model of human hepatocytes with around 3000 reactions in which we compared the wild-type and a mutated state associated with the disease PH1 under more than 400 metabolic conditions.34 Our network analysis of the MFGs revealed a consistent organisation of the reaction graph, which is highly preserved under the mutation. Our analysis also identified notable changes in the network centrality score and community structure of certain reactions, which is linked to key biological processes in PH1. Importantly, network measures computed from the MFGs reveal complementary information to that provided by the sole analysis of perturbed FBA fluxes.

Our flow graphs provide a systematic connection between network theory and constraint-based methods widely employed in metabolic modelling,21,22,25,32 thus opening avenues towards environment-dependent, graph-based analyses of cell metabolism. An area of interest for future research is the use of MFGs to study how network measures of flow graphs can help characterise metabolic conditions that maximise the efficacy of drug treatments or disease-related distortions, e.g., cancer-related metabolic signatures.55,56,57,58 In particular, MFGs can quantify metabolic robustness via graph statistics upon removal of reaction nodes.22

The proposed framework for graph construction for metabolic networks can be extended in different directions. The core idea is the distinction between production and consumption fluxes (5), and how to encode the relationship between produced and consumed mass flows in the weighted links of a graph. This general principle can also be used to build other potentially useful graphs. For example, two other graphs that describe relationships between reactions are:

$${\mathrm{Competition}}\,{\mathrm{graph}}:\quad \quad {\cal D}_{\mathrm{c}} = \frac{1}{n}{\kern 1pt} {\mathbf{S}}_{2m}^{ - {\kern 1pt} T}\left( {{\mathbf{W}}_ - ^\dagger } \right)^2{\mathbf{S}}_{2m}^ -$$
$${\mathrm{Synergy}}\,{\mathrm{graph}}:\quad \quad {\cal D}_{\mathrm{s}} = \frac{1}{n}{\kern 1pt} {\mathbf{S}}_{2m}^{ + {\kern 1pt} T}\left( {{\mathbf{W}}_ + ^\dagger } \right)^2{\mathbf{S}}_{2m}^ + .$$

The competition and synergy graphs are undirected and their edge weights represent the probability that two reactions consume \(\left( {{\cal D}_{\mathrm{c}}} \right)\) or produce \(\left( {{\cal D}_{\mathrm{s}}} \right)\) metabolites chosen uniformly at random. Similarly to the MFG construction, we can also create the corresponding FBA versions of competition and synergy mass flow graphs, which follow directly from (1214). These additional graphs could help reveal further relationships between metabolic reactions in the cell, and will be the subject of future studies.

The framework can also be extended to include dynamic adaptations of metabolic activity in different ways: by using dynamic extensions of FBA;59,60 by incorporating static61 or time-varying62 enzyme concentrations; or by considering kinetic models (with kinetic constants when available) to generate probabilistic reaction fluxes in the sense of stochastic chemical reaction networks.37,63 Of particular interest to metabolic modelling, we envision that MFGs could provide a route to evaluate the robustness of FBA solutions25,64 by exploiting the non-uniqueness of the MFG from each FBA solution in the space of graphs. Such results could enhance the interface between network science and metabolic analysis, allowing for the systematic exploration of the system-level organisation of metabolism in response to environmental constraints and disease states.


Flux balance analysis

Flux Balance Analysis (FBA)25,32 is a widely-adopted approach to analyse metabolism and cellular growth. FBA calculates the reaction fluxes that optimise growth in specific biological contexts. The main hypothesis behind FBA is that cells adapt their metabolism to maximise growth in different biological conditions. The conditions are encoded as constraints on the fluxes of certain reactions; for example, constraints reactions that import nutrients and other necessary compounds from the exterior.

The mathematical formulation of the FBA is described in the following constrained optimisation problem:

$$\begin{array}{*{20}{l}} {{\mathrm{maximise:}}} \hfill & {{\mathbf{c}}^T{\mathbf{v}}} \hfill \cr {{\mathrm{subject}}\,{\mathrm{to}}{\kern 1pt} {\kern 1pt} } \hfill & {\left\{ {\begin{array}{*{20}{l}} {{\mathbf{Sv}} = 0} \hfill \cr {{\mathbf{v}}_{{\mathrm{lb}}} \le {\mathbf{v}} \le {\mathbf{v}}_{{\mathrm{ub}}},} \hfill \end{array}} \right.} \hfill \end{array}$$

where S is the stoichiometry matrix of the model, v the vector of fluxes, c is an indicator vector (i.e., c(i) = 1 when i is the biomass reaction and zero everywhere else) so that cTv is the flux of the biomass reaction. The constraint Sv = 0 enforces mass-conservation at stationarity, and vlb and vub are the lower and upper bounds of each reaction’s flux. Through these vectors, one can encode a variety of different scenarios.33 The biomass reaction represents the most widely-used flux that is optimised, although there are others can be used as well.31,65

In our simulations, we set the individual carbon intake rate to 18.5 mmol/gDW/h for every source available in each scenario. We allowed oxygen intake to reach the maximum needed in to consume all the carbon except in the anaerobic condition scenario, in which the upper bound for oxygen intake was 0 mmol/gDW/h. In the scenario with limited phosphate and ammonium intake, the levels of NH4 and phosphate intake were fixed at 4.5 mmol/gDW/h and 3.04 mmol/gDW/h respectively (a reduction of 50% compared to a glucose-fed aerobic scenario with no restrictions).

Markov Stability community detection framework

We extract the communities in each network using the Markov Stability community detection framework.47,48 This framework uses diffusion processes on the network to find groups of nodes (i.e., communities) that retain flows for longer than one would expect on a comparable random network; in addition, Markov Stability incorporates directed flows seamlessly into the analysis.46,49

The diffusion process we use is a continuous-time Markov process on the network. From the adjacency matrix G of the graph (in our case, the RAG, NFG or MFG), we construct a rate matrix for the process: \({\mathbf{M}} = {\mathbf{K}}_{{\mathrm{out}}}^{ - 1}{\mathbf{G}}\), where Kout is the diagonal matrix of out-strengths, \(k_{{\mathrm{out}},i} = \mathop {\sum}\nolimits_j g_{i,j}\). When a node has no outgoing edges then we simply let kout,i = 1. In general, a directed network will not be strongly-connected and thus a Markov process on M will not have a unique steady state. To ensure the uniqueness of the steady state we must add a teleportation component to the dynamics by which a random walker visiting a node can follow an outgoing edge with probability λ or jump (teleport) uniformly to any other node in the network with probability 1−λ.42 The rate matrix of a Markov process with teleportation is:

$${\mathbf{B}} = \lambda {\mathbf{M}} + \frac{1}{N}\left[ {\left( {1 - \lambda } \right){\mathbf{I}}_N + \lambda \,{\mathrm{diag}}({\mathbf{a}})} \right]{\mathbf{11}}^T,$$

where the N × 1 vector a is an indicator for dangling nodes: if node i has no outgoing edges then ai = 1, and ai = 0 otherwise. Here we use λ = 0.85. The Markov process is described by the ODE:

$${\dot{\mathbf x}} = - {\mathbf{L}}^T{\mathbf{x}},$$

where L = IN − B. The solution of (17) is \({\mathbf{x}}(t) = e^{ - t{\mathbf{L}}^T}{\mathbf{x}}(0)\) and its stationary state (i.e., \({\dot{\mathbf x}} = 0\)) is x = π, where π is the leading left eigenvector of B.

A hard partition of the graph into C communities can be encoded into the N × C matrix H, where hic = 1 if node i belongs to community c and zero otherwise. The C × C clustered autocovariance matrix of (17) is

$${\mathbf{R}}(t,{\mathbf{H}}) = {\mathbf{H}}^T\left( {{\mathbf{{\Pi}}}e^{ - t{\mathbf{L}}^T} - {\mathbf{\pi \pi }}^T} \right){\mathbf{H}},$$

and the entry (c,s) of R(t, H) measures how likely it is that a random walker that started the process in community c finds itself in community s after time t when at stationarity. The diagonal elements of R(t,H) thus record how good the communities in H are at retaining flows. The Markov stability of the partition is then defined as

$$r(t,{\mathbf{H}}) = {\mathrm{trace}}\,{\kern 1pt} {\mathbf{R}}(t,{\mathbf{H}}).$$

The optimised communities are obtained by maximising the cost function (19) over the space of all partitions for every time t to obtain an optimised partition \(\widehat{\cal P}(t)\). This optimisation is NP-hard; hence, with no guarantees of optimality. Here we use the Louvain greedy optimisation heuristic,66 which is known to give high quality solutions \(\widehat{\cal P}(t)\) in an efficient manner. The value of the Markov time t, i.e. the duration of the Markov process, can be understood as a resolution parameter for the partition into communities.45,47 In the limit t → 0, Markov stability will assign each node to its own community; as t grows, we obtain larger communities because the random walkers have more time to explore the network.48 We scan through a range of values of t to explore the multiscale community structure of the network. The code for Markov Stability can be found at

To identify the important partitions across time, we use two criteria of robustness.45 Firstly, we optimise (19) 100 times for each value of t and we assess the consistency of the solutions found. A relevant partition should be a robust outcome of the optimisation, i.e., the ensemble of optimised solutions should be similar as measured with the normalised variation of information:67

$$VI\left( {{\cal P},{\cal P}^\prime } \right) = \frac{{2\Omega \left( {{\cal P},{\cal P}^\prime } \right) - \Omega \left( {\cal P} \right) - \Omega \left( {{\cal P}^\prime } \right)}}{{{\mathrm{log}}(n)}},$$

where \(\Omega ({\cal P}) = - \mathop {\sum}\nolimits_{\cal C} p({\cal C}){\mathrm{log}}\,p({\cal C})\) is a Shannon entropy and \(p({\cal C})\) is the relative frequency of finding a node in community \({\cal P}\) in partition \({\cal P}\). We then compute the average variation of information of the ensemble of solutions from the \(\ell = 100\) Louvain optimisations \({\cal P}_i(t)\) at each Markov time t:

$$VI(t) = \frac{1}{{\ell \left( {\ell - 1} \right)}}\mathop {\sum}\limits_{i \ne j} VI\left( {{\cal P}_i(t),{\cal P}_j(t)} \right).$$

If all Louvain runs return similar partitions, then VI(t) is small, indicating robustness of the partition to the optimisation. Hence we select partitions with low values (or dips) of VI(t). Secondly, relevant partitions should also be optimal across Markov time, as indicated by a low values of the cross-time variation of information:

$$VI\left( {t,t^\prime } \right) = VI\left( {\hat {\cal P}\left( t \right),\hat {\cal P}\left( {t^\prime } \right)} \right).$$

Therefore, we also search for partitions with extended low value plateaux of VI(t, t′).45,46,52

Data statement

No new data were generated during the course of this research. The results of the analysis are available in the Supplementary Spreadsheet.

Code availability statement

All computations performed on MATLAB. Code for Markov Stability available at