Introduction

“Robustness” is a system’s intrinsic ability to maintain functionality under perturbation. Due to the generality of this definition, a consensus on an exact quantitative metric, especially in a biological context, has yet to emerge1,2,3,4. Aside from biology, robustness and reliability are key design goals in many fields of engineering and particularly paramount in the development of safety-critical systems5. Thus, some of the theoretical groundwork relevant in these areas can be borrowed for a systems biology approach1,2.

Common basic reliability measures consider only the interaction of binary-state components, i.e. components that are either fully functional or fully dysfunctional. In such an analysis, a network is deemed functional if there exists at least one path from the input node(s) to the output node(s). Some early efforts to quantify structural robustness in metabolic networks follow the same principle. They correlate robustness with the number of unique, minimal pathways6 that provide a given function (in the context of metabolism usually cell growth)7,8,9.

Consistent with the characterisation of “functionality” as cell growth, “dysfunctionality” (i.e. the lack of growth) is caused by lethal combinations of reaction knockouts, which could arise via loss-of-function (LOF) mutations disabling vital enzymatic activity. Similarly to how all paths leading from input nodes to output nodes can be determined computationally10, the same can be done for all possible ways of blocking these paths (i.e. lethal combinations of reaction deletions)11. Analogously, knowledge of these knockout combinations, termed minimal cut sets (MCSs)12, allows for computing the network’s PoF, which—in mathematical terms—is the complement of robustness13.

As we have shown previously, the great advantage in using the PoF as a surrogate for robustness lies in the fact that only the lowest cardinality-MCSs are sufficient to approximate it with negligible error13. As efficient algorithms for finding low-cardinality MCSs are available14,15, this allows for the examination of robustness in GSMMs.

Here, we present fundamental improvements—both in theory and software implementation—of our PoF-calculating algorithm13. Provided with a list of low-cardinality MCSs and running on a personal computer, the program is now capable of determining the PoF of a GSMM at reasonable accuracy in under a minute, whereas hours of high-performance computing time were required in the past. With the improved tool at hand we computed the PoF across a large panel of Enterobacteriaceae as well as 30 fungal species and show that their structural metabolic robustness does not correlate with their ability to thrive in nutritionally diverse environments.

Results

Terminology and definitions

The PoF, F, of a metabolic network with r reactions is defined as the probability that a given number of random LOF mutations disables growth (i.e. is lethal). It can be expressed as the probability-weighted average of the failure frequency

$${f}_{d}=\frac{{\mathrm{number}}\; {\mathrm{of}}\; {\mathrm{lethal}}\; {\mathrm{combinations}}\; {\mathrm{of}}\; {d}\; {\mathrm{deletions}}}{{\mathrm{number}}\; {\mathrm{of}}\; {\mathrm{all}}\; {\mathrm{combinations}}\; {\mathrm{of}}\; {d}\; {\mathrm{deletions}}}$$
(1)

for LOF mutations in exactly d enzymes (reactions) being lethal. This gives

$$F:= \mathop{\sum }\limits_{d = 1}^{r}{w}_{d}{f}_{d}$$
(2)

with the discrete binomial distribution

$${w}_{d}=\left({r}\atop{d}\right){p}^{d}{(1-p)}^{r-d},$$
(3)

which—for a constant reaction-level mutation rate p—describes the probability that d LOF mutations occur.

Equation (1) can be expressed in a computationally more approachable way using combinations of MCSs13. In this context, an MCS is a minimal set of reaction deletions that suppresses growth12. Essential reactions, synthetic lethal pairs or synthetic lethal triplets are examples of MCSs with cardinalities one, two, or three, respectively. In the Supplementary Notes we expand on ref. 13 and show that, if all MCSs of a metabolic network are known, Eq. (2) can be formulated as

$$F=\mathop{\sum }\limits_{{{\emptyset}}\ne {\mathcal{J}}\subseteq \{1,...,m\}}{(-1)}^{| {\mathcal{J}}| -1}\,{p}^{| {{\mathscr{M}}}_{{\mathcal{J}}}| },$$
(4)

where \({{\mathscr{M}}}_{{\mathcal{J}}}\) denotes the union \({\cup }_{j\in {\mathcal{J}}}{{\mathscr{M}}}_{j}\) of one or more MCSs with the (multi)-index \({\mathcal{J}}\) running over the power set of the indices {1, . . ., m} of all m MCSs. In other words, this means that every possible combination of one or more MCSs is a summand in Eq. (4). In Supplementary Table 1, we provide an evaluation of Eq. (4) for the toy network in Fig. 1a.

Fig. 1: PoF by example.
figure 1

Toy network (a) and failure frequency as function of the number of LOF mutations (b). Hatched bars indicate the contributions of the essential reaction (r6, column 1), the two synthetic lethal pairs ({r1, r2} and {r4, r5}, column 2), and the two synthetic lethal triplets ({r1, r3, r5} and {r2, r3, r4}, column 3). Note the increasing overlap between the bars for increasing d, indicating that large parts of fd can be estimated without knowledge of higher order combinations of mutations. For a complete list of all lethal combinations, see Supplementary Table 2.

Note that according to Eq. (4), F solely depends on the total number of MCSs, m, and the cardinalities of the members of their power set (i.e. on how the MCSs overlap with each other), but not on the number of reactions in the network. Thus, the PoF is independent of network size.

PoF by example

A toy network composed of five irreversible and one reversible reaction is depicted in Fig. 1a and produces biomass via r6. Any combination of LOF mutations encompassing the essential reaction r6, one of the synthetic lethal pairs {r1, r2}, {r4, r5}, or one of the synthetic lethal triplets {r1, r3, r5}, {r2, r3, r4} is lethal. The failure frequency, fd (i.e. the probability that a given number of d LOF mutations results in cell death), is illustrated in Fig. 1b. In general, the failure frequency for any fixed d is dominated by single and double LOF mutations (see the hatched bars in Fig. 1b), while the contribution of higher order combinations is relatively small. For instance, in Fig. 1b all fd—except for f3—can be computed exactly with the knowledge of the single and double LOF mutations alone as all fd with d > 3 are already determined to be 1 by their respective contributions.

For a constant mutation rate p = 0.1 the likelihood that one, two, or more LOF mutations occur is binomially distributed, and the resulting expected value of the probability of a lethal set of mutations (i.e. the PoF) is p + 2p2 − 7p4 + 7p5 − 2p6 = 0.11936 (see Supplementary Tables 1 and 2 for details). Note that it is independent of the network’s size.

Estimating F in large models

Equation (4) determines the PoF exactly as long as all m MCSs are known and the sum is evaluated over all their possible combinations (i.e. the full power set with 2m − 1 members). For GSMMs both requirements cannot be met as the enumeration of all high-cardinality MCSs in a large network is computationally intractable and—even if it was not—the number of possible combinations would soon exceed the number of atoms in the universe. Thus, approximations are indispensable.

It is possible to systematically construct the first \(m_i\) MCSs of lowest cardinality15, which have the greatest impact on the PoF. Hence, F can still be reliably estimated, even when only a relatively small number of MCS are known13. This addresses the first issue. However, even if the number of included MCSs is reduced from m to \(m_i\), the exponential explosion of the summands still remains critical. In the simplest, least accurate case, we consider only the essential reactions, i.e. all m1 MCSs with \(| {{\mathscr{M}}}_{j}| =1\), and Eq. (4) simplifies to

$$F\approx {\widetilde{F}}_{r}^{1}=1-{(1-p)}^{{m}_{1}}\approx {m}_{1}p$$
(5)

(for derivation, see Supplementary Notes). When all MCSs up to a certain cardinality are known, we call this cardinality dm (in this case dm = 1). The number of essential reactions, m1, times the mutation rate, p, is in fact the leading term in the sum of Eq. (4). To improve accuracy, we resort to Eq. (2) and, rather than summing over all r, truncate the sum to d → d0 so that

$$F\approx {\widetilde{F}}_{{d}_{0}}^{{d}_{m}}=\mathop{\sum }\limits_{d = 1}^{{d}_{0}}{w}_{d}{f}_{d}^{{d}_{m}}.$$
(6)

Here, the superscript dm indicates that \({\widetilde{F}}_{{d}_{0}}\) and fd are computed with all MCSs up to cardinality dm. However, we show in the Supplementary Notes that the resulting error can be further reduced.

Figure 2 serves to illustrate the maximum error \({\varepsilon }_{\max }\) for \({\widetilde{F}}_{10}^{3}\) in the E. coli GSMMi JO1366 (dm = 3; all MCSs with cardinalities <4 are known). For d ≤ dm = 3 all \({f}_{d}^{3}\) are exact. MCSs with cardinalities >3 have not been calculated and their contributions are unknown. Therefore, in the worst case all cut sets of cardinality 4 could be lethal leading to \({f}_{d}^{3}=1\) for d ≥ 4. The resulting maximum error is represented by the hatched bars in the inset of Fig. 2. It can be calculated via

$${\varepsilon }_{\max }=1-{\widetilde{F}}_{{d}_{0}}^{{d}_{m}}-\mathop{\sum }\limits_{d = 0}^{{d}_{m}}{w}_{d}\left(1-{f}_{d}^{{d}_{m}}\right).$$
(7)

For iJO1366 in Fig. 2 the maximum absolute and relative errors peak around p = 3 × 10−3 and quickly approach zero for p < 0.5 × 10−3. Note that the sum in Eq. (7) is independent of d0.

Fig. 2: Maximum error of \(\widetilde{F}\).
figure 2

Lower (\(\widetilde{F}\)) and upper (\(\widetilde{F}+\varepsilon\)) bounds (blue lines) as well as relative and absolute errors (dashed and solid grey lines) for the PoF of a GSMM of E. coli with dm = 3 and d0 = 10 vs. LOF mutation rate p. Vertical grey lines indicate the position of the respective maximum value. The inset shows fd for 1 ≤ d ≤ 12 (white bars) and the corresponding maximum error (blue bars) as well as the shape of the binomial distribution at the mutation rates with the greatest absolute and relative errors.

Implementation

In our previous work13, we approximated the (total) PoF by computing \({{f}}_{{d}}^{{d}_{m}}\) only for d ≤ d0 = dm and formulated a parallelised algorithm to carry out the truncated sum. Here, we present pof2.0 that includes further algorithmic improvements and allows us to quickly evaluate \({\widetilde{F}}_{{d}_{0}}^{{d}_{m}}\) for larger d0. This has two implications: For a given number of MCSs F can be predicted more precisely and to achieve a certain accuracy the number of required MCSs can be reduced.

At the core of our method lies the fact that any lethal cut set must be a superset of at least one MCS. In order to estimate \({\widetilde{F}}_{{d}_{0}}^{{d}_{m}}\), \(| {{\mathscr{M}}}_{{\mathcal{J}}}|\) (the number of knocked out reactions) as well as \(| {\mathcal{J}}|\) (the number of MCSs in the cut set) need to be found for every possible combination that can be formed from the network’s known MCSs (i.e. their power set). The program achieves this by a recursive procedure, the basis of which has been outlined in ref. 13. In short, we first sort the list of MCSs by cardinality. Then, for every element in the sorted list, we iterate over all MCSs that appear earlier in the list and form the respective unions with \(| {\mathcal{J}}| =2\). If a union with \(| {{\mathscr{M}}}_{{\mathcal{J}}}| \le {d}_{0}\) is not a subset of a previously encountered cut set, we again iterate over all elements in the list up to the earliest MCS already present in that union to form all cut sets with \(| {\mathcal{J}}| =3\) and so on. \(| {{\mathscr{M}}}_{{\mathcal{J}}}|\) and \(| {\mathcal{J}}|\) of every lethal combination are recorded to be evaluated once recursion has finished. The actual implementation employs a few shortcuts that considerably reduce the number of recursions required.

Moreover, in metabolic pathway analysis, it is common practice to compress networks before analysis16. This reduces the number of metabolites and reactions without losing information and is necessary for efficient utilisation of computational resources. Naturally, compressed networks also contain drastically fewer MCSs. pof2.0 is capable of handling linearly compressed networks, which entails a major leap in performance. The code is available at www.github.com/julibeg/PoF.

PoF computes within seconds

To assess the computational efficiency of pof2.0, we analysed its performance as a function of the number of processed MCSs for E. coli’s central carbon metabolism model (CCMM)17 and its GSMM iJO136618 (Supplementary Fig. 1). In both cases, we observed a considerable reduction in run-time cost per additionally processed MCS compared to the previous implementation13. For instance, after processing ~2000 MCSs of the CCMM, we reached a ~100-fold speedup, the majority of which can be attributed to network compression. Extrapolating run-times to ~796,070 MCSs (dm = 15) gives an estimate of ~280 h for the previous implementation while pof2.0 took less than a second. Due to these advances, the bulk of the computational burden in estimating structural robustness now clearly lies at MCS enumeration.

The preceding sections established the theoretical foundation and computational feasibility of our approach to quantify structural robustness. Next, we capitalise on the reduced run-time of pof2.0 by analysing hundreds of GSMMs in order to showcase the utility of our approach and examine the nature of metabolic robustness. Unless stated otherwise, all PoF approximations reported below were calculated with dm = 3, d0 = 10, and p = 10−4.

PoF negatively correlates with carbon source molar mass

To explore the influence of varying carbon sources on the robustness of growth in E. coli we computed the PoF of iJO136618 growing on minimal medium under aerobic conditions for all its 181 single, growth-supporting carbon sources. As shown in Fig. 3, the PoF decreased with increasing molar mass of the carbon source (Pearson: r = −0.30, p = 3 × 10−5; Spearman: r = −0.48, p = 8 × 10−12).

Fig. 3: PoF vs. carbon source molar mass.
figure 3

\({\widetilde{F}}_{10}^{3}\) as function of the carbon source’s molar mass for all single, growth-supporting carbon sources in E. coli iJO1366. The line represents a linear fit to the data. Ellipses single out two spikes (see main text for details). The inset shows data with spikes removed and improved fit.

Figure 3 features two pronounced spikes (marked “1” and “2”). In spike 1 the PoF increases with the molar mass of the respective carbon sources, which are all medium- and long-chain fatty acids. While in reality the same enzymes process fatty acids of different lengths during beta-oxidation, these steps are represented as distinct reactions in metabolic models. Thus, contrary to in vivo, fatty acids are catabolised by increasingly longer linear reaction chains in silico. Spike 2, on the other hand, is comprised of cases where a dramatic increase in molar mass has hardly any effect on the PoF. Responsible are oligomaltoses (i.e. starch oligomers), most of which reach central metabolism via the same number of reactions in silico. Without the spikes, correlation increases to: Pearson r = −0.57, p = 5.4 × 10−16; Spearman r = −0.58, p = 1.3 × 10−16.

We further investigated the impact of the carbon source’s molecular composition on the PoF, which revealed a hierarchy based on chemical composition. On average, “pure” carbon sources (i.e. consisting only of C, H, and—possibly—O atoms) supported less robust growth than substrates containing either nitrogen or phosphate or both; see Supplementary Fig. 2.

Growth on “elementally richer” substrates being more robust can be explained by higher redundancy. For instance, when growing on an amino acid, otherwise lethal mutations in ammonium transport and assimilation pathways would become irrelevant. As an example consider the 298 MCSs of cardinality 1 in the joint set of MCSs for aerobic growth on either glucose or glutamate, out of which 285 were identical for both. Only eight and five were specific to glucose and glutamate, respectively, which is reflected in a lower PoF for the latter. Three out of the eight essential reactions unique to growth on glucose are involved in ammonium transport. This illustrates how F is linked to topological differences in metabolic networks.

The origin of biological robustness

The evolutionary origin of genetic robustness—despite its ubiquity1,19—remains strongly debated20,21. Approaches to explaining its emergence can be roughly split into three categories: congruent, adaptive and intrinsic21. Contrary to the adaptive hypothesis, which states that genetic robustness is selected for directly22,23,24,25,26,27,28, the congruent hypothesis postulates that it rather arises as a byproduct of selection for environmental resilience29,30,31. The intrinsic hypothesis, on the other hand, negates any evolutionary impact and argues that robustness is a fundamental property of complex systems and networks32,33,34. However, due to its multi-factorial nature, it is more likely that no single mechanism is responsible for the development of biological robustness. The actual origin may rather comprise a combination of the hypotheses introduced above.

To test the impact of the congruence mechanism (“environmental robustness breeds genetic robustness”), we speculated that organisms capable of growing in many different conditions (robust towards changes in the chemical environment) would exhibit low PoFs (robust towards LOF mutations). Accordingly, structural robustness in metabolic networks should correlate with the number of substrates enabling growth. To evaluate this idea, we calculated the PoF and counted the aerobic, growth-supporting carbon sources for GSMMs of 53 E. coli and Shigella35, 406 Salmonella36, as well as 30 fungal37 strains.

Robustness correlates with the diversity of growth media

For the E. coli and Shigella models the PoF was calculated twice—once with regard to growth on glucose minimal medium and once for minimal medium with all growth-supporting carbon sources enabled. Being allowed to grow on all carbon sources the networks showed considerably higher robustness compared to growth on glucose alone (Supplementary Fig. 3). In fact, when testing iJO1366, the average of F decreased monotonically with the number of carbon sources available (see Supplementary Fig. 4). This confirms the naïve expectation that fatal reaction deletions are more strongly buffered in richer media. However, the magnitude of the two PoFs for a given model were correlated (Pearson r = 0.77, p = 1 × 10−11; Spearman r = 0.60, p = 3 × 10−6).

Similarly, we found growth on minimal glucose medium to be considerably more robust than optimal growth on the same medium (Supplementary Fig. 3a). In fact, the distribution of the PoF across the models was quite consistent for the different conditions with the exception of Shigella boydii Sb227. This strain showed the greatest robustness among all E. coli and Shigella models at glucose-optimal growth while ranking last when all carbon sources were enabled (Supplementary Fig. 3b).

PoF does not correlate with carbon utilisation capabilities

Figure 4 shows the PoF across all E. coli, Shigella, Salmonella, and fungal models versus the number of carbon sources that allow the respective networks to grow. Among the E. coli and Shigella strains, it ranged from 2.78% (E. coli IAI39) to 2.95% (S. boydii CDC 3083-94) with an average of 2.87%. For Salmonella and the fungi, it was considerably lower with averages of 2.62% and 1.04%, respectively. Within the respective groups there was no correlation between F and the number of carbon sources. On the contrary, most E. coli models sat on a straight horizontal line from 167 to 186 different carbon sources with the PoF being practically identical at 2.8685%. Similarly, out of 406 Salmonella models, 213 had essentially the same \(\widetilde{F}\) of 2.6059% and grow on 143–168 carbon sources. Finally, the fungi showed the highest amount of variability; yet again no correlation with the number of carbon sources.

Fig. 4: Genetic robustness vs. environmental robustness.
figure 4

a Estimated total PoF, \({\widetilde{F}}_{10}^{3}\), as a function of the number of growth-supporting carbon sources for all models tested. Growth on glucose minimal medium under aerobic conditions was simulated in GSMMs of various commensal and pathogenic E. coli, Shigella, and Salmonella strains as well as 30 fungal species. b \({\widetilde{F}}_{10}^{3}\) vs. m1 (i.e. number of essential reactions). The dotted line represents the PoF approximation using Eq. (5) and the number of essential reactions alone.

The observation that many strains have (almost) identical PoF values, although differing substantially in carbon utilisation capability, suggests that additional carbon sources mostly only add extra uptake reactions to the network rather than increasing connectivity. For example, all E. coli models with \({\widetilde{F}}_{10}^{3}\approx 2.8685 \%\) feature the same set of essential reactions, albeit some can grow on close to 20 more carbon sources compared to others. The linear fit in Fig. 5 corroborates this assumption as the number of reactions increases by ~3.28 for every additional carbon source. These three reactions usually consist of an exchange reaction for importing the carbon source across the model boundary and two transport reactions across the outer and inner membrane. Therefore, roughly only every third carbon source adds one non-transport reaction to the model leaving little room for increasing metabolic connectivity.

Fig. 5: Reactions vs. C-sources.
figure 5

Number of reactions vs. number of carbon sources for all native E. coli and Shigella models. The slope of the linear fit (dotted line) is ~3.28.

To further examine this hypothesis we computed the size of E. coli’s “shared reactome”, i.e. the set of reactions present in all E. coli GSMMs. Under the investigated conditions (aerobic growth on glucose minimal medium), it was 1336/1774 = 75%. Considering only the E. coli strains with virtually equal PoF, it increased to 1447/1772 = 82%, indicating that the PoF is a good predictor for network overlap. Similarly, a pairwise comparison across all E. coli and Shigella models revealed that the difference in PoF correlates with the pairwise network dissimilarity, 1 − σi,j, of any two networks Mi and Mj (see Fig. 6). Here, dissimilarity was defined as the complement of the pairwise network similarity

$${\sigma }_{i,j}=\frac{{r}_{{M}_{i}\cap {M}_{j}}^{2}}{{r}_{{M}_{i}}{r}_{{M}_{j}}},$$
(8)

where \({r}_{{M}_{i}},{r}_{{M}_{j}}\), and \({r}_{{M}_{i}\cap {M}_{j}}\) denote the number of reactions in the models Mi, Mj, and of those found in both, respectively.

Fig. 6: Pairwise difference in the PoF across all E. coli and Shigella GSMMs vs. network dissimilarity.
figure 6

Strains with different m1 cluster around \(\Delta \widetilde{F}=1{0}^{-4}\); strains with equal m1 around \(\Delta \widetilde{F}=1{0}^{-7}\); equal m1 and m2 around \(\Delta \widetilde{F}=1{0}^{-11}\); equal m1, m2, and m3 around \(\Delta \widetilde{F}=1{0}^{-16}\). Mixed pairs of strains are in blue; pairs of E. coli and Shigella strains are in grey and light blue, respectively. Note the logarithmic scale on the y-axis. The inset shows the same data in linear scale and a linear fit. Horizontal dotted lines correspond to Δm1.

In addition to greater robustness, the Salmonella models also displayed reduced nutritional flexibility; being able to grow on only 164 carbon sources on average as opposed to E. coli’s 179. Moreover, their relative shared reactome was slightly larger at 1274/1628 = 78%.

With only 1383/2153 = 64% of reactions belonging to the shared reactome, the fungal models showed larger diversity. This is not surprising, since the group of fungal models featured 30 species from different genera as opposed to multiple strains/species from the same genus for the bacterial models.

In spite of decreased nutritional flexibility, all fungal species showed considerably larger robustness as opposed to the bacterial models discussed above. However, it appears that this observation is an artefact of the different model reconstruction procedures rather than a genuine biological effect. The fungal models’ biomass consisted of 44 components, while the bacterial biomass reactions were considerably more detailed (51 and 68 components for Salmonella and E. coli/Shigella, respectively), including additional vitamins and metal ions. Consequently, the growth medium was more complex for the bacterial models (containing 18 components) as opposed to the fungal models with only five components (see Supplementary Table 3). Moreover, the more complex the biomass composition, the more likely it is affected by LOF mutations. In fact, when using the fungal biomass reaction as objective in iJO1366, the PoF decreased from 2.89 to 1.53% (Supplementary Fig. 5). Thus, as the biomass composition is different for models of the respective reconstruction groups (fungal, E. coli/Shigella, and Salmonella), we conclude that comparisons within groups are appropriate, but not across groups.

When comparing pathogenic and non-pathogenic models within the same reconstruction group, no differences in the PoF have been found (Supplementary Fig. 6). However, given the small number of purely pathogenic strains/species and the lacking genetic diversity among them, general trends cannot be inferred.

Essential reactions dominate metabolic robustness

For the models assessed here, a comparison with Eq. (5) (right panel in Fig. 4) reveals that even at the relatively high p = 10−4 the number of MCSs with cardinalities two and three was not large enough to contribute noticeably to \({\widetilde{F}}_{10}^{3}\). This observation—under the assumption that it can be generalised to most bacterial models growing on minimal medium—suggests that a reasonably accurate approximation for the structural robustness of these networks could be obtained solely from the number of essential reactions, m1, which is easy to compute.

Figure 6 further highlights the minuscule contribution of the MCSs with \(| {{\mathscr{M}}}_j| \in \{2,3\}\) as all pairwise PoF-differences between two models with the same number of essential reactions are smaller than 10−6. Similarly, the PoF differences among pairs of models with the same number of essential reactions as well as synthetic lethal pairs (m1,i = m1,j and m2,i = m2,j) lie between 10−12 and 10−10 and are mostly determined by differences in the number of synthetic lethal triplets Δm3 (Supplementary Fig. 7).

As F is almost exclusively determined by m1, we identified the essential reactions shared by all E. coli and Shigella models and the corresponding KEGG38 pathways (Fig. 7). Interestingly, (central) carbon metabolism only hosts a small number of essential reactions and thus constitutes a minor contribution to the PoF. Instead, amino acid synthesis appears to be buffered by redundancy to a lesser extent.

Fig. 7: Essential reactions grouped by pathway.
figure 7

Number of essential reactions grouped by KEGG pathways occurring in (empty bars) and shared by (solid bars) all E. coli and Shigella models.

Discussion

Robustness is a key feature of biological systems. Yet, its exact quantification remains challenging. In a previous work, we have modelled the robustness of metabolic networks by counting the frequency of the binary network response (as measured by growth or no growth under steady-state conditions) upon random reaction deletions13. Similar mathematical strategies are commonly used in engineering to evaluate the reliability of safety-critical structures, for example in aircraft39. Here, we expanded on this approach and presented a comprehensive theoretical description which we validated by confirming the buffering capabilities of diverse growth media40,41.

Capitalising on reduced run-times for GSMMs, we were able to evaluate the PoF across 459 members of the family Enterobacteriaceae (amongst them commensal and pathogenic E. coli, Shigella, as well as Salmonella strains) and 30 species of fungi growing on glucose minimal medium under aerobic conditions. Although the closely related E. coli and Salmonella strains, whose lineages separated 140 million years ago42, share a large fraction of their genetic material43,44, we found the Salmonella models to be considerably more robust. In fact, Salmonella serovars were shown to have fewer essential genes (relative to the genome size) than E. coli MG165545,46,47, which is also reflected in the corresponding metabolic models. This indicates higher redundancy and supports our in silico analysis of the respective GSMMs. However, in this work we have not undertaken any phylogenetic analyses and thus cannot discuss evolutionary trajectories that could explain these differences.

Compared to the bacterial models the fungal species analysed here showed drastically higher robustness. However, it should be stressed that, if GSMMs that have been generated via different reconstruction pipelines are compared, caution has to be exercised as differences in the reconstructions process add to the observed PoF. For instance, the biomass composition is more detailed in the Enterobacteriaceae models as they require metal ions in their minimal in silico medium to support growth, whereas the fungal models do not. This prohibits a direct comparison between the respective groups. In general, we observed an increase in the PoF with the complexity of the biomass formulation. This is consistent with earlier reports that the size of essential networks scales with the number of biomass components48. However, to turn this drawback into a benefit, the PoF could be used as a quality indicator. For instance, we were able to track the smaller PoF of E. coli IAI39 compared to the average PoF of the E. coli strains back to a lumped reconstruction of folate biosynthesis.

Within the groups of models that share the same reconstruction pipeline (E. coli/Shigella, Salmonella, and fungi), our findings do not support a connection between nutritional flexibility and cellular robustness. In line with this result, the diversity of genetic capabilities observed in Salmonella was shown to not be indicative of strain-specific lifestyle36. Here we find that this is true not only for Enterobacteriaceae but also for fungi. Although the strain-specific portions of the pan-reactome are largely associated with altered carbon metabolism35,36, our analysis reveals that this does not lead to increased connectivity (Fig. 5)—and thereby structural robustness—of the respective network. This reflects the notion that metabolism is organised in a bow tie structure with a large number of diverse nutrients being “fanned into” a tightly knotted core network from where just a few key metabolites “fan-out” again to produce all building blocks and macromolecular compounds that make up a cell49,50. The observation that on average the network size of E. coli and Shigella GSMMs increased by only three reactions for every additional carbon source (Fig. 5)—a number that is in good agreement with the previously derived two reactions per carbon source for minimal metabolic models48—also corroborates this interpretation.

The congruent hypothesis has recently also been challenged by Ho and Zhang20. Their analysis was based on the reallocation of growth-optimal flux distributions in a single GSMM of E. coli MG1655 (computed via minimisation of metabolic adjustments51) upon genetic and environmental perturbations20. Due to the use of an optimisation principle, their analysis was inherently biased, while our approach is more general in the sense that it takes the complete network structure (i.e. all possible flux re-routings) into account52. It could be argued that robustness is most relevant in the fittest states. However, our results show that structural robustness impacts all levels of growth-optimality in a similar way (Supplementary Fig. 3).

Classically, an unbiased characterisation of metabolic networks is achieved in terms of minimal functional units that are able to characterise all feasible steady-state flux distributions6,53,54,55. Alternatively, a network can be equally well described by failure modes, called MCSs11. This is mirrored by the fact that, according to Eq. (4), the PoF is solely determined by the MCSs and independent of the network’s size. Thus, it is a true feature of a network’s structural topology.

In principle, F can be computed exactly. However, in practice it suffers from the combinatorial explosion of the number of MCSs in GSMMs and the exponential explosion of the number of summands in Eq. (4) (i.e. the power set of the MCSs). Nonetheless, evaluating the PoF remains feasible as the (relatively few) low-cardinality MCSs are most dominant13. In fact, for many models growing on minimal media the number of essential reactions already suffices. This is due to the fact that it is far more likely that a randomly selected set of d deletions contains e.g. one or more essential reaction(s) than a complete MCS of cardinality d. Hence, high-cardinality MCSs are not as relevant and can be neglected—provided d0 is sufficiently large and p is sufficiently small (Fig. 2).

The probability of LOF mutations in a given number of reactions follows the binomial distribution, consequently depending on the total number of reactions in the model as well as the per-reaction mutation rate p. As we reported solely comparative analyses here, we focused on relative differences in F and chose p to be 10−4 per reaction. In order to correctly calculate absolute values, a more accurate, strain-specific estimate for p would be required. However, quantifying deleterious mutation rates proves challenging as LOF mutations with strong effects are usually cleared from populations quickly and thereby evading experimental detection. Kibota and Lynch56 give a lower bound of 2 × 10−4 deleterious mutations per genome and generation in E. coli, while estimates for the general mutation rate are in the range of 5 × 10−3 to 10−4 per genome and generation57,58,59,60. Hence, the rate for LOF mutations of a given gene (or reaction) can be confidently assumed to be very low.

This has implications for the computation of F, as fewer MCSs are required to give precise estimates in case of smaller mutation rates. For instance, in a network with 3000 reactions out of which 300 are essential the maximum relative error of the analytical PoF estimate from the essential reactions alone would be around 1% with p = 10−6 (see Supplementary Fig. 8c). Thus, in such a case performing recursion would not be necessary.

In fact, for all models analysed here the number of essential reactions dominated \(\widetilde{F}\), enabling accurate predictions with Eq. (5) even at the unrealistically high p = 10−4 (see Fig. 4b). Thus, genetic robustness could simply be estimated given the number of essential reactions or genes.

Due to the binary nature of LOF mutations underlying the concept of the PoF, it can only measure structural redundancy in metabolic networks by analysing their topology in the steady state. Alternative forms of resilience or robustness, for instance absolute concentration robustness61 or regulatory feedback mechanisms, are not addressed. Moreover, due to its focus on the reaction level, the PoF disregards some genetic information (e.g. duplicated genes coding for the same enzyme). However, we have demonstrated previously that PoF analyses at the level of genes and at the level of reactions yield qualitatively similar results13. Incidentally, recent advances in software for the enumeration of genetic MCSs62,63 allow for extending the scope of our method to the gene level. Additionally, we have shown here that the PoF is independent of the network’s size and mostly determined by the number of essential reactions (or genes, for that matter). Thus, given knowledge of the essential genes64 (and potentially the synthetic lethal pairs, which have been determined for some model organisms like yeast65), Eqs. (4) and (5) enable us to evaluate the PoF on the genome level without the need for a metabolic reconstruction altogether. When relying only on the essential genes for PoF estimation, the theory simplifies greatly. Therefore, in future work the theoretical framework presented here could be expanded such that ways for incorporating gene lengths in the analysis could easily be devised, which would further improve the accuracy and reliability of our approach for quantifying genetic robustness.

Methods

Model acquisition and preprocessing

Fifty-four GSMMs of E. coli and Shigella strains35 were downloaded from the BIGG Models database, version 1.666,67. Additionally, 408 Salmonella GSMMs36 (accession ID: MODEL1807280001) and 56 GSMMs of fungal species37 (accession IDs: MODEL16042800{00-55}) were retrieved from BioModels68. The minimal media can be found in Supplementary Table 3. Native model sizes were ~2600 reactions for the bacterial and ~6700 reactions for the fungal species. To remove any artefacts that might have emerged in the automated model-reconstruction process, all models were made consistent with respect to growth on glucose. This means that every reaction unable to carry any flux when growing on the minimal medium (as determined by flux variability analysis69) was removed from the respective model. This reduced model sizes to ~1650 reactions for E. coli/ Shigella, ~1550 reactions for Salmonella, and ~1750 reactions for the fungi. Note that in terms of the PoF analysis, “consistent” and “native” models are equivalent as F is independent of the number of reactions and the set of MCSs is the same for both. However, the models were made consistent nonetheless as it is commonly considered good practice and also reduces the computational load on MCS enumeration. Additionally, the E. coli and Shigella batch was also made consistent with regard to optimal growth on glucose, removing every reaction not participating in flux patterns providing the highest possible growth rate. Thereby, model sizes were reduced to ~500 reactions. To avoid introducing any bias in the process of making the models consistent, glucose levels in the media were adjusted on a per-model basis beforehand so that all models would achieve the same growth rate. Moreover, for reasons stated below, the default upper and lower bounds on reaction fluxes were relaxed to ±10,000. The 54 E. coli and Shigella models contained ten auxotrophs, one of which was capable of growing on its supplementary medium component as sole carbon source. It was therefore removed from the dataset. Salmonella featured 27 auxotrophs, two of which were able to grow on their supplements alone and were subsequently excluded from the analysis. The supplements required by the other auxotrophs were added to their respective media and can be found in Supplementary Tables 4 and 5. Out of the 56 fungal models, 26 were unable to grow in the specified conditions and thus dropped. The remaining 30 species (Supplementary Table 6) underwent additional preprocessing steps in order to harmonise reaction bounds prior to making the models consistent. All preprocessing was done in COBRApy v.0.17.070.

Computation of MCSs and PoF

To enumerate the models’ MCSs, the relevant information required was extracted from the respective SBML71 or JSON files and reformatted accordingly. The data was then subjected to

  • Linear compression with an in-house Perl script. The target (i.e. biomass generation) reaction was exempt from compression.

  • Transforming the MILP problem inferred from the compressed model into its dual form, again by an in-house Perl script.

  • Solving the dual problem for MCSs with cardinalities up to \(d_m\) with an in-house implementation of the algorithm described in ref. 15.

The compressed MCSs were then used for PoF evaluation. For all models, the LOF mutation rate p was set to 10−4 per reaction.

Single carbon source comparison for E. coli

Every carbon source available in iJO1366 was added separately to the minimal medium (Supplementary Table 3) except for glucose and the corresponding PoF with \(d_m =3\) and d0 = 10 was determined. To reduce the computational burden in the MCS-enumeration step, the respective model was made consistent for every substrate. Again, in order to prevent introducing bias, the carbon source availability in the medium was tuned to allow for the same growth rate as glucose. This was infeasible for some carbon sources with very low yields due to internal model constraints (i.e. flux bounds). Therefore, these constraints were relaxed to ±10,000 in all models.

Benchmarking PoF-implementations

For E. coli’s CCMM17 and GSMM18, low-cardinality MCSs were enumerated up to \(d_m =15\) and \(d_m=3\), respectively. Then, for a range of m ≤ \({m_{d_{m}}}\), the first m compressed MCSs were extracted and evaluated for all d0 ≤ 15 (CCMM) and d0 ≤ 8 (GSMM). Additionally, the MCSs were decompressed and evaluated again in the uncompressed form with both pof2.0 and the old implementation available at https://github.com/mpgerstl/networkRobustnessToolbox. Computation was performed on an Intel® Xeon® CPU E5-2650 v3 @ 2.30 GHz, and run-times were recorded using Bash’s72 built-in time command.

Computation of the PoF

pof2.0 was written in C++1173. The implementation relies on Boost74 for calculating binomial coefficients and uses Luigi Pertoldi’s progress bar75. The code is available at www.github.com/julibeg/PoF.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.