Introduction

A gene is pleiotropic if it affects more than one phenotypic trait1,2. A classic example is phenylketonuria, a human disease that is caused by a single gene defect but which affects multiple systems, with symptoms ranging from lighter skin color to mental disorders3. Pleiotropic effects can cause alleles to affect fitness differentially at different ages, a phenomenon believed to be a major cause of aging4,5,6; indeed, alleles contributing to increased longevity often show reduced fertility and stress tolerance7. Similar antagonistic epistasis may underlie other important biological phenomena such as speciation8 and cooperation9. Understanding the factors that contribute to pleiotropy is of fundamental importance in genetics10,11,12, evolution13,14,15,16, development17,18, as well as in disease19,20 and ageing4. In comparison to its fundamental importance, empirical knowledge of the prevalence and especially on the causal mechanisms of pleiotropy is scarce2,21.

Pleiotropy may be classified according to the types of traits considered22. Molecular gene pleiotropy refers to the number of functions of a gene and its products, e.g., the number of reactions catalyzed by a single enzyme. Developmental pleiotropy describes the genetic and evolutionary interdependence of phenotypic aspects. Finally, selectional pleiotropy refers to the number of separate fitness components affected by mutations to a gene. In this study, we focus on the latter type of pleiotropy.

Experimental studies generally assess pleiotropy through the analysis of gene knockouts23,24,25. The degree of pleiotropy is then defined as the number of traits affected when a gene becomes fully non-functional. Wang et al.25 analyzed phenotypes of large numbers of yeast, nematode, and mouse mutants. They found that pleiotropy is widespread: on average, yeast gene knockouts affect 8% of the examined traits; for the nematode, the corresponding number is 10%, for the mouse 3% (see also24). The distributions of the degree of pleiotropy appear rather similar across very different study systems, from the skeletal features of mice25 to metabolic systems26,27,28. Moreover, pleiotropy was found to be modular, such that sets of genes tend to affect the same sets of traits25.

In E. coli, 36% of metabolic reactions are catalyzed by enzymes also involved in other reactions; the same is true for 27% of metabolic reactions in the yeast Saccharomyces cerevisiae28. Pleiotropic effects of mutations that affect enzyme activity can be simulated from genome-scale metabolic models using constraint-based modeling techniques such as flux balance analysis (FBA)29,30. The functional pleiotropy of a metabolic gene can then be defined as the number of biomass components whose maximal production is affected by the gene’s knockout26. Previous studies using this definition found that a metabolic gene’s functional pleiotropy is related to its propensity to form negative epistatic interactions with other metabolic genes26,27.

While full gene knockouts are easily examined experimentally, they may not be representative of the effects of deleterious alleles segregating in natural populations: individual mutations may affect only a subset of all traits influenced by the gene31. Thus, it is important to distinguish between the pleiotropy of the gene and the pleiotropy of individual mutations, especially in evolutionary and clinical contexts. For example, while 4.6% of human SNPs implicated in complex non-Mendelian phenotypes show pleiotropic effects, most of these do not fully abolish protein function32. Experimental studies indicate that mutational pleiotropy tends to be smaller than gene pleiotropy31.

Genome-scale metabolic models allow us to dissect the relationship between gene and mutational pleiotropy in quantitative detail, without being hampered by the detection limits of experimental assays. Does the degree of pleiotropy depend on how severely a given allele of a metabolic gene reduces protein activity, i.e., are the same number of functions affected when protein function or expression is reduced only partially? How modular is metabolic pleiotropy? Currency metabolites, such as ATP and NADPH, are used as cofactors in many otherwise unrelated reactions; it thus appears highly likely that a substantial fraction of metabolic pleiotropy is due to effects on the production of currency metabolites. Is such an effect of currency metabolites on patterns of pleiotropy confirmed by simulated data?

Below, we address these questions by analyzing the metabolic networks of a representative bacterial model system, Escherichia coli, and a corresponding eukaryotic system, the baker’s yeast Saccharomyces cerevisiae. We find that most gene knockouts that impact fitness do so by affecting the production of multiple biomass components, and that the number of affected biomass components typically increases with increasing mutation severity. Pleiotropy is rarely a consequence of multiple molecular gene functions, but is an emergent property of the metabolic network. For many genes, pleiotropy is indeed mediated through their involvement in the generation of currency metabolites.

Results

Estimating pleiotropy from contributions to biomass components within the wildtype flux distribution

We first estimated wildtype flux distributions in the default growth condition for the genome-scale metabolic model of E. coli33 and the yeast S. cerevisiae34 (obtained from https://sourceforge.net/projects/yeast/files/). The maximal biomass production rates were estimated using flux balance analysis (FBA)29,30. For both model systems, we identified the flux distribution compatible with maximal biomass production that had the smallest sum of absolute fluxes, a strategy often termed parsimonious FBA (pFBA), which approximates optimal utilization of limited cellular protein resources35.

To simulate mutations that cause different reductions of protein function or expression and correspond to different deleterious alleles of a metabolic gene, we restricted the maximal flux through all reactions requiring this gene to a fixed percentage of the estimated wildtype flux36, starting from 100% (the wildtype) down to 0% (a full gene knockout) in steps of 0.5%. For each flux reduction, we defined the degree of pleiotropy (referred to simply as “pleiotropy” below) as the number of biomass components whose production was reduced by at least 0.01% compared to the maximal (wildtype) production. Note that with this definition, only genes with pleiotropy ≥2 are pleiotropic, while genes with pleiotropy 0 (no affected biomass component) or pleiotropy 1 (one affected biomass component) are non-pleiotropic.

Flux distributions at maximal biomass production rate are usually not unique35, and so in many cases a flux restriction through one reaction may be compensated by a rerouting of fluxes through alternative pathways. Such rerouting would require the upregulation of the corresponding genes. While it has been observed experimentally that cells can survive many gene deletions in central metabolism without drastic changes in gene expression37, the necessary upregulation of protein expression will not occur spontaneously at least for some pathways38. More importantly, if we are interested in the de facto contribution of a given gene to the production of biomass components, then it is of no consequence if alternative pathways could take over part of this functionality. Thus, when calculating the maximal (wildtype) production rate of individual biomass components as well as when simulating the effects of mutations to a given metabolic gene, we did not allow the redistribution of fluxes to alternative pathways: we allowed only decreases, not increases, of the absolute value of any flux compared to the wildtype flux distribution obtained with pFBA.

Note that experimental studies often employ a pragmatic working definition of pleiotropy that lies somewhere between the definitions of pleiotropy proposed here based on the wild type flux distribution on the one hand and a quantitative measure of essentiality based on an analogous calculation that allows the free redistribution of fluxes. In these studies, pleiotropy is typically estimated as the number of traits with observable phenotypic changes after the gene knockout, but before allowing the strain to adapt to its new genotype. In this case, some fluxes may be rerouted due to enzymes and transporters that are expressed either spuriously or because of other roles they play in wildtype physiology, while other fluxes that require the upregulation of the corresponding enzymes and transporters will not yet be active. Thus, our definition of pleiotropy describes a “worst case scenario”, providing an upper limit on experimentally measured pleiotropy.

Many genes affect the production of multiple biomass components

Pleiotropy varies widely between different genes. Mutations to the majority of genes affect no biomass components in the minimal growth medium assayed, independent of mutation severity (E. coli: 1,067 genes or 78.1%; S. cerevisiae: 687 genes or 75.6%). Among genes contributing to biomass production—and thus fitness—in the wildtype, non-pleiotropic cases are rare: in E. coli, only 54 full-gene knockouts (out of 299 knockouts with fitness contributions, 18.1%) affect exactly one biomass component, while the same is true for only 12 knockouts (out of 222 knockouts with fitness contributions, 5.4%) in S. cerevisiae. Conversely, knockouts of 32 genes in E. coli (10.7% of knockouts with fitness contributions) and 40 genes in S. cerevisiae (18.1% of knockouts with fitness contributions) affect the production of all biomass components. Many of the remaining genes show low degrees of pleiotropy, affecting the production of only a few biomass components; on average, full gene knockouts of fitness-relevant genes affect the production of 20% of biomass components in E. coli and 34% of biomass components in S. cerevisiae (Fig. 1, Table 1).

Figure 1
figure 1

Most complete gene knockouts of fitness-relevant genes have pleiotropic effects, i.e., they affect the production of multiple biomass components. For some genes, pleiotropy is reduced when NADPH is made freely available (cyan bars). For other freely available currency metabolites, see Supplementary Figure S2.

Table 1 Average number of biomass components whose production is affected by a full gene knockout.

These percentages reflect functional pleiotropy, the de facto contribution of gene products to biomass component production. If, instead, we are interested in the phenotypic effects of gene knockouts after allowing the mutant strain to adapt its physiology to its altered gene content, we must allow free redistributions of fluxes after the gene knockouts. Corresponding simulations show that after adaptation, genes with fitness contributions are, on average, essential for the production of 9.2% of E. coli biomass components and of 26.6% of S. cerevisiae biomass components (Table 1, Supplementary Figure S1). The degree of gene pleiotropy for yeast is substantially higher than previous experimental estimates, which are around 2 (corresponding to 2–11% of considered traits depending on the types of traits analyzed)25; however, experimental estimates of gene pleiotropy tend be downwardly biased due to experimental detection limits2,22.

Pleiotropy is an emergent property of the metabolic network

Pleiotropy can be classified by its origin into type I pleiotropy, caused by multiple molecular functions of a gene product, and type II pleiotropy, caused by multiple physiological consequences of a single molecular function2. Similar distinctions have been made previously using the terms “horizontal” vs. “vertical”10 and “mosaic” vs. “relational”39 pleiotropy. Our model allows us to quantify the relative contributions of these two pleiotropy types. 41.7% of E.coli genes and 40.5% of yeast genes in our metabolic models catalyze multiple reactions. To what extent does this functional diversity cause functional pleiotropy as measured in the number of biomass components affected by a gene knockout? To answer this question, we compared the gene pleiotropy (Fig. 1) to the pleiotropy of individual reactions catalyzed by the gene product. For example, fully abolishing all functions of the purB (b1131) gene, whose gene product catalyzes two distinct biochemical reactions, reduced the production of 18 biomass components. In contrast, blocking only one of the catalyzed reactions results in a pleiotropy estimate of 16, while blocking only the other reaction results in a pleiotropy of 10. Thus, the pleiotropy of the b1131 gene is largely of type II, and is only in small part due to its multiple molecular functions.

This pattern is typical: the maximal pleiotropy arising from blocking only a single out of several reactions catalyzed by the same protein accounts for over 97% of the gene pleiotropy (E. coli 97.4%, yeast 97.6%). These numbers drop only marginally when we consider only gene products that are essential for multiple reactions, to 92.2% in E. coli and to 94.5% in yeast (Supplementary Figure S3). We conclude that the vast majority of metabolic epistasis is of type II, i.e., is an emergent property of the metabolic network rather than a consequence of multiple molecular functions. This finding is consistent with the previous observation that the degree of pleiotropy in yeast is not significantly correlated with the number of molecular gene functions40.

Metabolic networks show significant but low modularity

The relationship between genes and biomass components (traits) can be represented as a bipartite graph, with links connecting genes with affected biomass components. Modules are defined as sets of genes and traits with significantly more within-module than between-module links25. A high degree of modularity thus indicates that pleiotropic genes tend to affect groups of related traits (e.g., chemically related biomass components) rather than random sets of traits. Supplementary Figure S4 shows heatmaps that illustrate the modularity of both metabolic pleiotropy networks. To quantitatively assess the modularity, we used the LP&BRIM algorithm41, resulting in raw modularities of Q = 0.235 for E. coli and Q = 0.197 for S. cerevisiae. Both networks show highly statistically significant modularity: in each case, the modularity of 10,000 randomly rewired networks was always lower than observed for the real pleiotropy network (i.e., P < 0.0001; Supplementary Figure S5).

Following ref.25, we then defined a z-score for modularity (or “scaled modularity”)42 as the difference between the observed modularity and the mean modularity of randomly rewired networks, measured in number of standard deviations. The E. coli pleiotropy network exhibits a scaled modularity of 9.1, while the S. cerevisiae network has a scaled modularity of 4.9, i.e., the modularity of metabolic pleiotropy is about 9 and 5 standard deviations higher than for corresponding random gene-trait networks. These values are surprisingly low: for five different experimental study systems and trait definitions, Wang et al. found a median scaled modularity of 37 (range 34–238). Thus, metabolic pleiotropy networks are less modular than other pleiotropy networks, suggesting that the underlying metabolic network shows more interconnections between the pathways producing different sets of biomass components than the genetic networks underlying other types of traits. Our findings on modularity may be related to the role of currency metabolites, which crosslink the diverse metabolic pathways (see below).

Pleiotropy typically increases with increasing mutation severity

We next examined the pleiotropy of alleles with small-effect mutations, i.e., mutations that reduce enzyme capacity without fully abolishing enzyme function. About 20% of E. coli genes with fitness contributions have constant pleiotropy: small-effect mutations of these genes affect the same number of biomass components as full gene knockouts. In comparison, only 7.7% of yeast genes contributing to fitness exhibit constant pleiotropy.

All other genes contributing to fitness affect an increasing number of biomass components for increasingly deleterious alleles. Figure 2 shows this stepwise increase in pleiotropy for the example of Lipoamide dehydrogenase (gene names: E. coli b0116, S. cerevisiae YFL018C; for additional examples, see Supplementary Figure S6. In both organisms, pleiotropy typically increases in about a dozen steps from weakly to strongly deleterious alleles (Fig. 3; mean number of steps: E. coli 11.6, S. cerevisiae 12.6).

Figure 2
figure 2

Pleiotropy for the Lipoamide dehydrogenase gene increases for increasingly deleterious alleles. Pleiotropy is reduced when NADPH is made freely available (cyan curves). For additional examples, see Supplementary Figure S6.

Figure 3
figure 3

For the majority of genes contributing to biomass production, pleiotropy increases for increasingly deleterious alleles in multiple steps. Histograms for the number of pleiotropy steps in E. coli and the yeast S. cerevisiae. Cyan bars reflect the reduced numbers of pleiotropy increases when making NADPH freely available.

The pleiotropy of the full gene knockout constitutes an upper limit to the number of stepwise increases in pleiotropy. If there was otherwise no systematic relationship between maximal pleiotropy and the number of steps, we would expect the numbers of steps to be uniformly distributed between zero and the pleiotropy of the full knockout. However, the correlation between the number of steps and pleiotropy at full knockout was much stronger than expected from such a relationship (Supplementary Figure S7, Spearman’s ρ = 0.926 (E. coli) and ρ = 0.986 (S. cerevisiae), P < 10−6 in each case from randomizations; see Methods). Thus, genes whose full knockout showed higher metabolic pleiotropy also showed more stepwise increases in pleiotropy for increasingly debilitating mutations.

All genes whose mutations affect the production of at least one biomass component must also affect the overall production of biomass (i.e., in the common interpretation of FBA, fitness). The reverse is not true: a mutation to a gene may affect the maximal production of biomass, but not the production of any individual biomass component. This is a consequence of the algorithm employed to estimate production capabilities for individual biomass components. If we maximize the production of a single compound, then pathways usually concerned with the production of other biomass components can be diverted to the production of this compound. While we find no such genes for S. cerevisiae, this is indeed the case for 3 essential E. coli genes, which encode transporters for acetate (b4067), magnesium/nickel/cobalt (b3816), and calcium/sodium (b3196, an antiporter).

The pleiotropy of most genes is mediated by currency metabolites

We can conceptually partition internal metabolites into currency metabolites—those involved in many reactions, e.g., to provide energy or redox equivalents43—and primary metabolites. A deleterious allele may affect the production of a given biomass component because the mutated gene catalyzes a reaction in a pathway of primary metabolites that directly leads to the component’s production. Conversely, a deleterious allele may affect not the primary metabolites, but the currency metabolites utilized in the component’s production. A list of 14 currency metabolites was obtained from ref.43. Excluding exchange reactions, 753 out of a total of 2,251 reactions (33.5%) in E. coli and 310 out of 3,324 reactions (9.3%) in S. cerevisiae involved at least one of these metabolites.

A substantial fraction of pleiotropy is indeed associated with the generation of currency metabolites: 87.4% of previously pleiotropic genes show reduced pleiotropy when we make metabolites such as ATP, UTP, or NADPH freely available in yeast (Fig. 4). The free availability of ATP alone reduces the degree of pleiotropy of over half of pleiotropic yeast genes. The influence of currency metabolite production on pleiotropy is weaker, yet still substantial in E. coli: here, 55.3% of pleiotropic genes are affected, with NADH making the biggest contribution (over 40%) (Fig. 4).

Figure 4
figure 4

Many genes show reduced pleiotropy when currency metabolites are made freely available. The bar chart shows the percentage of previously pleiotropic genes with reduced pleiotropy in response to the free availability of different currency metabolites. Abbreviations: Adenosine triphosphate (ATP); Cytidine triphosphate (CTP); Guanosine triphosphate (GTP); Uridine triphosphate (UTP); Inosine triphosphate (ITP); Nicotinamide adenine dinucleotide (NADH); Nicotinamide adenine dinucleotide phosphate (NADPH); Flavin adenine dinucleotide reduced (FADH2); Reduced flavin mononucleotide (FMNH2); Ubiquinol-8 (Q8H2); Menaquinol 8 (MQL8); 2-Demethylmenaquinol 8 (DMMQL8); Acetyl-CoA (ACCOA); L-Glutamate (GLU).

Involvement in currency metabolite production is an important determinant of the number of biomass components for which a gene knockout is essential even after allowing the mutant strain to adapt its protein expression to the altered gene content of its genome. This contribution is particularly striking in yeast: for over half of the tested currency metabolites, free availability reduces the number of biomass components for which a gene is essential for almost half of the genes (Supplementary Figure S8).

Discussion

Using constraint-based simulations of the metabolic models for E. coli and the yeast S. cerevisiae, we have characterized the distributions of pleiotropy. Consistent with earlier computational26,27,28 and experimental23,24,25 studies, we found that the knockout of a majority of genes that contribute to fitness has pleiotropic effects. The vast majority of this gene pleiotropy is not caused by multiple molecular functions of the gene product (type I), but is an emergent property of the metabolic network (type II). Pleiotropy is modular, but to a lower degree than estimated experimentally for non-metabolic systems25.

For most pleiotropic genes, pleiotropy increases strongly for alleles with increasingly debilitating effects. Thus, standard measures of pleiotropy based on gene knockout studies are more likely to reflect the maximal degree of mutational pleiotropy of a given gene2,31. Alleles that only knock down protein activity (by reducing enzyme/transporter function or expression level) often affect only a subset of phenotypic traits, with additional traits affected progressively as alleles become more deleterious. Thus, the physiological effect of the full gene knockout will in most cases not be representative for the effects of deleterious alleles that retain some level of enzyme function. This type of effect is also evident from individual medical observations of pleiotropy. For example, some small-effect mutations affecting human SOX9 expression lead to minor skeletal malformations, while the consequences of large-effect mutations can include sex reversals44.

How can we understand the dependence of pleiotropy on the degree to which an allele reduces protein activity? For increasingly deleterious alleles, more and more metabolic resources must be channeled into the compensation of the compromised pathway; as a consequence of this increasing drain of resources, more and more other pathways are affected. Not surprisingly27, we found that the pleiotropy of many genes is mediated through the generation of currency metabolites such as ATP, NADPH, or FADH2. This is true for more than half of the pleiotropic genes in E. coli, and for 87% of pleiotropic genes in yeast.

While the overall patterns of pleiotropy appear qualitatively similar between E. coli and yeast, we found a number of quantitative differences. Compared to E. coli genes, yeast genes (i) showed generally higher pleiotropy and were rarely of pleiotropy 1; (ii) were less likely to have constant pleiotropy; and (iii) were more likely to show reduced pleiotropy when supplied with currency metabolites. Moreover, (iv) the yeast pleiotropy network exhibited lower modularity. In part, these differences may be related to network size. The E. coli metabolic network encompasses substantially more genes overall than the yeast network. However, we constrained network usage to reactions active in the wildtype. In contrast to total network sizes, the active metabolic network of yeast (755 reactions and 184 metabolites) is substantially larger than the active metabolic network of E. coli (462 reactions and 103 metabolites); this difference is consistent with the notion that yeast metabolism is more complex, yet less flexible than E. coli metabolism. While the average number of reactions per metabolite is similar between E. coli (4.49) and yeast (4.10), the lower yeast modularity indicates that reactions more often connect otherwise distant network parts. A role of currency metabolites in such connecting reactions would be consistent with the larger effect of currency metabolite supply on pleiotropy in yeast. In sum, the higher interconnectedness of the yeast pleiotropic network, combined with the larger active metabolic network size, appears to provide more potential for pleiotropic effects.

Pleiotropy is a complex phenomenon: it is not constant, but varies between different alleles of the same gene, and its causes are often indirect. Thus, experimental as well as computational analyses of pleiotropy should move away from focusing on full gene knockouts, and instead consider explicitly the degree to which mutations reduce protein activity. The necessity of a corresponding nuanced view of pleiotropy may be particularly evident in studies of medically relevant mutations, where full knockouts are often lethal, while small-effect mutations may segregate at appreciable frequencies in the human population45.

Materials and Methods

Metabolic models

To simulate Escherichia coli metabolism, we used the metabolic reconstruction iJO136633, encompassing 1,366 metabolic genes associated with 2,251 reactions. For the yeast S. cerevisiae, we used the yeast7.6 model (https://sourceforge.net/projects/yeast)34, accounting for 909 metabolic genes associated with 3,324 reactions. The published models were used without any modifications. The E. coli model contains a growth-independent maintenance energy consumption term (the ATPM reaction), which enforcies a minimal ATPase activity of 3.15 mmol/gDW/h. We utilized the default biomass reactions for E. coli (Ec_biomass_iJO1366_core_53p95M), which comprises 50 essential biomass components (Supplementary Table S1), considering only “substrates” of the biomass reactions and excluding inorganic ions and H2O. For S. cerevisiae, we used the “yeast 5 biomass pseudoreaction”, which comprises 35 essential biomass components (Supplementary Table S2), again considering only “substrates” and excluding inorganic ions and H2O.

Flux distribution constraints derived from wildtype simulations

In order to approximate the de facto contribution of individual metabolic proteins to the production of individual biomass components in vivo, we should only consider flux distributions that are naturally active during growth (biomass production) in the nutritional environment studied, and fluxes should not exceed these wildtype fluxes. We thus first estimate the wildtype flux distribution vWT, by running a flux balance analysis (FBA) with the biomass reaction as the objective function, followed by a minimization of the sum of absolute fluxes at the previously determined maximal biomass production rate (parsimonious FBA35).

When simulating the production of individual biomass components, we constrained all fluxes vi to values between zero and the wildtype flux viWT for this reaction, i.e.,

$$\begin{array}{c}0\le {v}_{i}\le {{v}_{i}}^{{\rm{WT}}}\,{\rm{for}}\,{{v}_{i}}^{{\rm{WT}}}\ge 0\\ 0\ge {v}_{i}\ge {{v}_{i}}^{{\rm{WT}}}\,{\rm{for}}\,{{v}_{i}}^{{\rm{WT}}} < 0\end{array}$$
(1)

Estimating pleiotropy

For each essential biomass component (Supplementary Tables S1 and S2, respectively), we added a new exchange reaction representing its secretion46. As some biomass components may be coupled through the biomass reaction, we allowed the free excretion of all other biomass components when maximizing the production of one selected biomass component (i.e., vj ≥ 0 for all added exchange reactions j).

We then calculated the maximum production of each biomass component by maximizing its exchange reaction flux while enforcing the wildtype flux distribution constraints (Eq. 1). For each metabolic gene, we compared this unperturbed maximal production with the maximal production rate of alleles with increasingly reduced protein activity, simulated by restricting the flux through all reactions catalyzed by the gene to a fixed fraction of the wildtype flux36, which we reduced from 100% to 0% in steps of 0.5%. The flux through a specific reaction was constrained in this way only if the gene-protein-reaction (GPR) mapping contained the affected gene either alone or only in an “AND” relationship (i.e., as an essential part of a protein complex); if the GPR listed the affected gene in an “OR” relationship (i.e., as one of multiple isoenzymes or alternative transporters), the reaction was not affected.

We defined pleiotropy as the number of biomass components whose maximal production was reduced by at least 0.01% compared to the unperturbed state (WT) for the allele considered26. Thus, an allele not involved in the maximal production of any essential biomass component is considered to have pleiotropy 0; an allele that affects the production of exactly one essential biomass component has pleiotropy 1.

Our estimate of pleiotropy reflects the actual contribution of a gene product to biomass formation, based on estimated enzyme and transporter activities in the wildtype. If instead, one is interested in a quantitative measure of essentiality, defined as the number of biomass components affected by a deleterious allele after the mutant strain has been allowed to adapt its physiology to the gene deletion, a different algorithm is more appropriate. In this case, one needs to allow the free redistribution of fluxes after the simulated activity reduction of the protein encoded by the gene in question.

Statistical test for the relationship between pleiotropy at full gene knockout and number of steps

The full knockout (maximal) pleiotropy sets an upper limit to the possible number of pleiotropy steps at decreasing enzyme activity. The null hypothesis is that the number of steps is uniformly distributed between zero and maximal pleiotropy; i.e., the null hypothesis assumes that apart from the upper limit, there is no systematic relationship between maximal pleiotropy and number of steps. We tested this through a randomization protocol, where we constructed 106 datasets with the same maximal pleiotropies, but with step numbers drawn from the corresponding uniform distributions. All random datasets for both the E. coli and the yeast data had Spearman rank correlation coefficients between pleiotropy at full knockout and number of steps that were lower than the observed correlation coefficients. Thus, the empirical P-value was <10−6 for both data sets.

Currency metabolites

In additional analyses, we made several cofactors freely available to study how pleiotropy is associated with the generation of currency metabolites. We did this by adding a balanced biochemical reaction that interconverts the activated and inactivated versions of the cofactor and allowing unlimited flux of this reaction in both directions. For example, to simulate free NADPH, we added the following reversible reaction:

$${\rm{NADPH}}\rightleftharpoons {{\rm{NADP}}}^{+}+{{\rm{H}}}^{+}+2{{\rm{e}}}^{-}$$

A list of currency metabolites was obtained from ref.43. Supplementary Table S3 lists the currency metabolites and the corresponding exchange reactions as well as the number of reactions utilizing each currency metabolite.

Software used

All simulations were performed in R47 using sybil, a computer library optimized for efficient constraint-based modeling of metabolic networks48. We used IBM ILOG CPLEX as the linear solver, connected to sybil via the cplexAPI R package.

To calculate network modularities, we used the LP&BRIM algorithm (Label Propagation with Bipartite Recursively Induced Modules) implemented in Matlab49.