Dynamic optimization identifies optimal programmes for pathway regulation in prokaryotes

Bartl, Martin; Kötzing, Martin; Schuster, Stefan; Li, Pu; Kaleta, Christoph

doi:10.1038/ncomms3243

Article
Published: 27 August 2013


Nature Communications volume 4, Article number: 2243 (2013)


Abstract

To survive in fluctuating environmental conditions, microorganisms must be able to quickly react to environmental challenges by upregulating the expression of genes encoding metabolic pathways. Here we show that protein abundance and protein synthesis capacity are key factors that determine the optimal strategy for the activation of a metabolic pathway. If protein abundance relative to protein synthesis capacity increases, the strategies shift from the simultaneous activation of all enzymes to the sequential activation of groups of enzymes and finally to a sequential activation of individual enzymes along the pathway. In the case of pathways with large differences in protein abundance, even more complex pathway activation strategies with a delayed activation of low abundance enzymes and an accelerated activation of high abundance enzymes are optimal. We confirm the existence of these pathway activation strategies as well as their dependence on our proposed constraints for a large number of metabolic pathways in several hundred prokaryotes.


Introduction

The consideration of organisms based on optimality principles has provided explanations for a large number of important biological phenomena^1,2,3,4,5,6. An important component of the adaptation of organisms is the ability to respond quickly to changes in their natural environment to survive and prevail against competitors^6,7,8,9,10. With only a limited amount of resources available, the ability to adapt quickly can provide an important evolutionary advantage.

In this work, we study optimal programmes for the activation of metabolic pathways under the constraint of a limited cellular protein synthesis capacity. Being able to quickly adjust fluxes through metabolic pathways is of critical importance to reduce lag times upon depletion of essential biomass components and during major growth transitions^6,7,11,12. Previous work has established that, assuming a limited total abundance of proteins as well as the minimization of the invested protein, a sequential induction of enzymes along a pathway is optimal for a rapid activation^{13,14,15,16,17,18,19}. That metabolic pathways show a pattern of sequential activation, also known as just-in-time-activation¹⁹, has been demonstrated experimentally in Escherichia coli for a selected number of amino-acid biosynthetic pathways¹⁸ and on a global scale in Saccharomyces cerevisiae²⁰.

However, previous works trying to explain these patterns face several problems. First, they predict that a sequential activation of enzymes within a linear metabolic pathway is always preferable to other types of activation strategies. This result is problematic in that the partial operonic organization of many metabolic pathways in prokaryotes prohibits a detailed sequential activation of proteins within a pathway, as proteins encoded on the same polycistronic transcript are produced with only a small delay^21,22. To some extent, these observations can be explained by a balance between the fitness advantage obtained through a sequential activation of enzymes within a pathway and the fitness advantage of an operonic organization that minimizes biochemical noise^23,24 and reduces the length of promoter sequences¹⁸. Second, although the production of proteins represents a burden for the cell^25,26, previous approaches only incompletely took into account that the cost of proteins lies in the process of their production (cf. ref. 26) or even assumed that proteins can be produced at any rate^{13,14,15,16,19}.

Here we use dynamic optimization to investigate how limitations in protein production capacity influence the optimal timing of the production of enzymes to activate a metabolic pathway. We find that the interplay between the protein production capacity of the cell and the amount in which a particular enzyme needs to be produced (that is, its abundance) can explain the optimality of a wide variety of pathway activation strategies. In particular, we find that the previously reported sequential activation strategy of enzymes along a pathway^{13,14,15,16,17,18,19} is only optimal if large amounts of proteins need to be produced, whereas the simultaneous activation of all enzymes within a pathway is optimal in the case where only small amounts of protein need to be produced. Thus, we show that, depending on protein abundances, an operonic organization of a metabolic pathway is optimal to reduce activation time, whereas previous work postulated activation time-independent effects to explain the operonic organization of metabolic pathways^18,23. Moreover, we observe that, if there are differences in the abundance of enzymes of a pathway, it is optimal to produce enzymes with high abundance earlier and to delay the production of enzymes with low abundances.

Results

Enzyme synthesis capacities influence activation strategies

We consider the activation of a metabolic pathway that comprises four enzymatic steps e₁, …, e₄ that convert a buffered substrate S via three intermediates Y₁, …, Y₃ into a product P (Fig. 1a, Methods). In many cases, the activation of a particular pathway is required to resume growth, for instance, if an amino acid that has been depleted from the growth medium has to be synthesized. If a particular pathway product p(t) makes up a fixed proportion of cellular biomass, growth cannot resume unless the pathway product is present in sufficient quantities. An example of a mechanism implementing such a pathway-dependent growth control is the stringent response in E. coli, which arrests growth if there is a lack of amino acids²⁷. Therefore, we introduce an objective function that maximizes biomass formation limited by the synthesis of the product of the considered pathway

**Figure 1: Model pathway and optimal activation strategies.**

where μ is the growth rate. Here the time courses of the enzymes e₁(t), …, e₄(t) and growth rate profile μ(t) are determined to maximize the objective function. As we consider the activation of the pathway to an active state, which is maintained after activation, a large final time t_f is defined (assumed to be t_f=1,000 arbitrary time units). By explicitly taking into account the growth rate in the course of pathway activation, we are able to more precisely model the influence of dilution through growth on pathway activation.

The concentration of an enzyme e_j(t) is determined by two factors: the enzyme synthesis rate d_j(t) and the dilution through growth μ(t)·e_j(t). We assume that dilution through growth is the major route of protein loss, as has been reported previously for E. coli²⁸. There are two constraints on the enzyme synthesis rate: the synthesis capacity of individual enzymes and the free protein synthesis capacity of the cell. For each enzyme, there is an upper bound on the rate at which it can be synthesized, d_j,max. This upper bound is determined by several factors such as the maximal copy-number of the associated messenger RNA and its translation efficiency²⁹. Depending on the required concentration of an enzyme for an active pathway (that is, its abundance), both factors can be adjusted in the course of evolution to increase or decrease the production capacity of the enzyme. In consequence, enzyme abundance is a major determinant of the maximal production capacity of an enzyme. Thus, the time course of e_j is determined by

The free protein synthesis capacity corresponds to the maximal amount of protein that can be produced by free ribosomes of the cell within a specific time interval. We use the constraint based on free-protein synthesis capacity to account for the synthesis of other proteins by ribosomes that need to be produced to maintain cell viability³⁰. The free protein synthesis capacity is mainly determined by ribosomal concentrations, total mRNA concentrations, transfer RNA concentrations and the availability of substrates for protein synthesis. Thus, we formulate the constraint on free-protein synthesis capacity, d_max, within a specific time interval by
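Written out from the definitions in the text, the enzyme balance and the two synthesis constraints can take the following form. This is a reconstruction from the prose (synthesis rate d_j(t), dilution μ(t)·e_j(t), per-enzyme bound d_j,max, shared bound d_max), not a copy of the typeset equations; the lower bound of zero on d_j(t) is an assumption added here, and the authors' notation may differ.

```latex
\begin{align}
\frac{\mathrm{d}e_j(t)}{\mathrm{d}t} &= d_j(t) - \mu(t)\,e_j(t), \qquad j = 1,\dots,4, \\
0 \le d_j(t) &\le d_{j,\max}, \\
\sum_{j=1}^{4} d_j(t) &\le d_{\max}.
\end{align}
```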

In the first step, we analysed the influence of the interplay between individual enzyme synthesis and free protein synthesis capacity on optimal pathway activation strategies for a prototypic metabolic pathway with unit kinetic parameters (Fig. 1a).

If the sum of individual enzyme synthesis rates is equal to (or smaller than) the free protein synthesis capacity, all enzymes can be produced at the same time and, hence, they are induced simultaneously for a rapid pathway activation (Fig. 1b). To give a clear illustration, we reduced the plotted time window to the dynamics of the pathway activation (the complete profiles are shown in Supplementary Fig. S1). In Fig. 1d, we consider the case where each individual enzyme synthesis rate is equal to the free protein synthesis capacity. This corresponds to enzymes that have to be produced in large amounts. For this scenario, we observe a sequential activation of enzymes according to their order within the pathway. This type of activation strategy is similar to the so-called ‘just-in-time-activation’ strategy¹⁹, also reported in other studies^13,14,15,16. In the case in which each individual enzyme synthesis rate is smaller than the free protein synthesis capacity but their sum is larger than the free protein synthesis capacity, we observe an intermediate behaviour in which parts of the metabolic pathway are sequentially activated (Fig. 1c). The influences of the different constraints on optimal pathway activation strategies are summarized in Fig. 1e.
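The three regimes described above depend only on how the individual bounds compare with the shared capacity, which can be sketched as a small classifier. The function name `activation_regime` and the tolerance are hypothetical, and the handling of mixed cases (some bounds equal to d_max, others smaller) is an assumption, since the text only describes the three uniform cases.

```python
def activation_regime(d_j_max, d_max, tol=1e-12):
    """Classify the expected activation pattern from the synthesis bounds.

    d_j_max: per-enzyme maximal synthesis rates (list of floats)
    d_max:   free protein synthesis capacity shared by all enzymes
    """
    if not d_j_max:
        raise ValueError("need at least one enzyme")
    # All enzymes fit into the shared capacity at once (Fig. 1b).
    if sum(d_j_max) <= d_max + tol:
        return "simultaneous"
    # Every single enzyme can saturate the shared capacity (Fig. 1d).
    if all(d >= d_max - tol for d in d_j_max):
        return "sequential"
    # Remaining cases, including mixed ones (assumption): groups of
    # enzymes are activated one after another (Fig. 1c).
    return "partially sequential"


if __name__ == "__main__":
    for bounds in ([0.0025] * 4, [0.005] * 4, [0.01] * 4):
        print(bounds, activation_regime(bounds, d_max=0.01))
```

The classifier only restates the qualitative outcome of Fig. 1e; it does not replace the dynamic optimization that produces the actual time courses.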

To investigate the influence of kinetic parameters on the optimization results of the previous section, we repeated the optimization for 100 kinetic parameter sets sampled uniformly from the interval [0,2] for several values of the individual enzyme synthesis capacity.

Although a sequential or partially sequential activation of enzymes within the model pathway is often optimal, we observed many cases in which the order of activation is rearranged such that the production of later steps of the pathway is induced before the induction of earlier steps. A closer investigation showed that this deviation of the activation sequence from the order of reactions in the pathway results from differences in the abundance of individual enzymes caused by differences in their catalytic efficiency. We observed that enzymes that need to be present in higher amounts relative to the other enzymes of the pathway are induced earlier than enzymes with average abundances, as their production takes longer (Fig. 2). Moreover, we observed that enzymes whose abundance is low relative to the other enzymes of the pathway tended to be activated later than surrounding enzymes of the pathway (Fig. 2). For similar plots for all considered individual enzyme synthesis rates, see Supplementary Fig. S2.

**Figure 2: Influence of enzyme abundance on the order of activation of enzymes in a pathway.**

Influence of activation strategies on operonic organization

To validate the prediction of the optimality of different types of activation strategies depending on enzyme synthesis constraints, we used the operonic organization of enzymes belonging to the same metabolic pathway. Note that we use the term ‘operon’ synonymously with ‘transcription unit’ for the sake of clarity although, strictly speaking, an operon needs to be composed of at least two genes. An important feature of operons is that the genes contained within them are expressed almost simultaneously^21,22. Thus, in contrast to a regulon, where several genes are controlled by the same transcription factor and can be activated at the same time or in a sequential fashion, the operonic organization of genes always leads to their almost simultaneous activation. An important conclusion of our predictions is that the number of genes that are activated simultaneously should decrease with increasing enzyme synthesis rates relative to the free-protein synthesis capacity. To test this hypothesis in our optimization runs, we assumed that genes that are activated within a certain time span belong to the same operon (Methods) and determined the average number of genes per operon (GpO) for different values of the individual enzyme synthesis rates. Plotting the average number of GpO over different individual enzyme synthesis rates, we observed a decreasing size of operons with increasing individual enzyme synthesis rates (Fig. 3a). Thus, when individual enzyme synthesis rates increase relative to free protein synthesis capacity, the size of operons decreases.

**Figure 3: Signatures of pathway activation strategies from optimization runs.**

Another important aspect of our optimization is that, depending on their abundance relative to the remaining enzymes of the pathway, enzymes should be activated earlier or later. Translated into the terms of genes that are coexpressed in operons, we would expect that enzymes that have higher abundances relative to other enzymes of a pathway tend to be coexpressed with earlier enzymes of the same pathway, whereas enzymes with lower abundance should be coexpressed with later enzymes of the pathway.

To test whether we could find this relationship in our optimization runs, we determined the distribution of the positional coexpression bias values of low abundance and high abundance enzymes across our optimization experiments (Fig. 3b). This distribution indicates how often a specific set of enzymes is coexpressed (that is, within the same operon) with earlier or later enzymes of the same pathway (Methods). We compared the distribution of the positional coexpression bias with the optimized operon structure to the distribution we obtained for a random assignment of genes to operons. We found that low abundance enzymes are significantly more often coexpressed with later enzymes of a pathway (Wilcoxon test P-value=6.61 × 10⁻⁴), whereas high abundance proteins are significantly more often coexpressed with earlier enzymes of a pathway (Wilcoxon test P-value=3.38 × 10⁻⁷).

Genomic signatures of activation strategies

To test whether the interplay between enzyme synthesis rates and free-protein synthesis capacity has the predicted influence on pathway activation strategies, we analysed the operonic organization of metabolic pathways across all the pathways contained in 550 prokaryotes of the MicroCyc collection³¹, for which information on the operonic structure from MicrobesOnline is available³². We tested two types of predictions. First, we investigated the influence of enzyme synthesis rates and free protein synthesis capacity on the size of the operons of specific metabolic pathways. Second, we analysed the influence of protein abundance on the order in which proteins within a pathway are activated. To exclude the possibility that there is a general trend for operon sizes to decrease with increasing protein abundance that is unspecific for metabolic pathways, we determined the correlation between protein abundance and operon size for each of the 550 organisms from MicroCyc. We found that in the vast majority of organisms, operon sizes increase with protein abundance, which is opposite to the trend we predict for metabolic pathways (Supplementary Note 1). A decrease in the size of metabolic operons with increasing abundance therefore cannot be attributed to such a general trend.

As measurements on enzyme synthesis rates/protein abundance and free protein synthesis capacity are not available across all of the organisms that we consider, we used genomic features that have a strong influence on these variables for each organism. There are two important factors influencing the free protein synthesis capacity of a cell. The first is the concentration of ribosomes in the cell and the second is the number of protein-coding genes in the genome of an organism. As a reference for the protein synthesis capacity of an organism, we used the copy-number of the ribosomal RNAs in its genome. A correlation analysis across several species shows that there is a strong correlation between both factors (see Supplementary Note 2). Another factor that has a strong influence on the free protein synthesis capacity of an organism is the number of protein-coding genes contained within its genome. If the enzymes of a pathway need to be expressed, only a small fraction of the entire cellular protein synthesis capacity will be allocated to the production of these proteins, as proteins required for other functions of the cell need to be produced at the same time³⁰. Hence, if considering all other factors such as total ribosomal capacity as equal, an increasing number of protein-coding genes within a genome will reduce the free protein synthesis capacity that can be reallocated to the production of the enzymes of a pathway. This is confirmed by a strong positive correlation between the number of protein-coding genes and the copy-number of ribosomal RNAs across 130 species (see Supplementary Note 2).

One genomic feature that is often used as a proxy for protein abundance is the codon adaptation index, which is computed from the coding sequence of a gene³³. This measure estimates the expression strength of a protein by comparing its codon usage with the codon usage in highly expressed genes³³ such as ribosomal genes. In Supplementary Notes 3 and 4, we show that codon adaptation indices are a good proxy for protein abundance and are comparable across species.

Enzyme synthesis capacities influence operon sizes

Following the prediction of our optimization results, we expect that the number of protein-coding genes, the number of rRNA operons as well as enzyme abundance have an influence on the size of metabolic operons. First, we expect the size of the operons, across which the genes of a pathway are distributed, to decrease with an increasing number of protein-coding genes because of a concomitant decrease in the free protein synthesis capacity (hypothesis 1). Second, we expect the size of the operons of a pathway to increase with an increasing number of rRNA operons because of an increase in the free protein synthesis capacity (hypothesis 2). Third, we expect the size of the operons of a pathway to decrease with an increase in the average abundance of enzymes within this pathway (hypothesis 3).

We tested these hypotheses across the metabolic pathways of 550 prokaryotes contained in MicroCyc³¹. We analysed only metabolic pathways that are present in at least 100 organisms (99 metabolic pathways). To analyse the above hypotheses, we determined for each organism and each pathway the number of operons across which this pathway is distributed. Then we computed Spearman’s correlation between the factors outlined in the hypotheses while controlling for the other investigated factors. Moreover, to exclude effects outside metabolism that influence operon sizes, we also controlled for the average size of non-metabolic operons for each organism. For more information about how we controlled for confounding influences on operon sizes, see the Methods section and Supplementary Note 5. Subsequently, we corrected all of the resulting P-values for multiple testing using the Benjamini–Yekutieli procedure³⁴, and only accepted correlations as significant if their corrected P-values were below a false discovery rate of 5%.
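The multiple-testing step can be illustrated with a plain-Python sketch of the Benjamini–Yekutieli adjustment. The function names are hypothetical and the input P-values in the usage lines are made up for illustration; this is the standard textbook form of the procedure, not the authors' code, and it leaves out the partial Spearman correlation that produces the P-values.

```python
def benjamini_yekutieli(pvalues):
    """Return Benjamini-Yekutieli adjusted p-values in the original order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic-sum correction that makes the procedure valid under
    # arbitrary dependence between the tests.
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay
    # monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        candidate = pvalues[i] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, fdr=0.05):
    """Flag tests whose adjusted p-value does not exceed the FDR level."""
    return [q <= fdr for q in benjamini_yekutieli(pvalues)]


if __name__ == "__main__":
    raw = [0.001, 0.02, 0.5]  # illustrative values only
    print(benjamini_yekutieli(raw))
    print(significant(raw))
```

Compared with the Benjamini–Hochberg procedure, the extra factor c(m) makes the correction more conservative, which suits correlated tests such as those on overlapping pathways.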

For 69 of the 99 pathways, we found at least one significant correlation for one of the tested hypotheses (Table 1). For detailed information including P-values across all 99 tested pathways, see Supplementary Data 1–7. Overall, hypothesis 1 is confirmed for 22 of the 99 pathways and rejected (that is, significant correlation in the opposite direction) for 4 pathways. Hypothesis 2 is confirmed for 29 pathways and rejected for 9 pathways. Hypothesis 3 is confirmed for 32 pathways and rejected for 8 pathways. Of the 69 metabolic pathways with at least one significant correlation, 20 fulfill at least two of the hypotheses and 5 fulfill all of the hypotheses (Table 1). We found only two pathways for which two of the hypotheses are rejected. Among the pathways that fulfilled at least two hypotheses, we found 6 pathways associated with amino acid biosynthesis (among 15 such pathways in the 99 pathways), 3 pathways associated with nucleotide biosynthesis and several pathways associated with producing cofactors. Of the five pathways for which all three hypotheses are confirmed, three are amino-acid biosynthetic pathways (leucine, proline and tryptophan biosynthesis).

Table 1 Validation of pathway activation strategies


These results show that, although not all hypotheses are confirmed at the same time for many pathways, individual enzyme synthesis rates and free protein synthesis capacity have a strong influence on the size of metabolic operons. This is apparent from the much larger number of 83 confirmations of our hypotheses in comparison with only 21 rejections across all pathways. Moreover, our analysis focuses on pathways whose products are essential for growth. Not all biomass components can be considered equally important for growth and the evolutionary pressure to implement the activation programmes that we propose is expected to be higher for products that are of central importance during most growth transitions. This is exemplified by amino acid biosynthetic pathways, whose activation is essential for a resumption of growth under most conditions. Among the 99 pathways that we considered in our analysis, 15 belong to amino-acid biosynthesis. Considering the 20 pathways for which at least two hypotheses were fulfilled, 6 correspond to amino-acid biosynthetic pathways, which is a significant enrichment (hypergeometric test P-value=1.1 × 10⁻²).
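The enrichment test in the last sentence can be sketched with the counts given in the text (99 pathways, 15 of them amino-acid biosynthetic, 20 pathways fulfilling at least two hypotheses, 6 of those amino-acid biosynthetic). The function names are hypothetical. The text does not say whether the reported tail includes the observed count, so the sketch computes both conventions; they differ by the probability of observing exactly 6.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(X = k) when drawing `draws` items without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for the same hypergeometric distribution."""
    upper = min(successes, draws)
    return sum(hypergeom_pmf(i, population, successes, draws)
               for i in range(k, upper + 1))


if __name__ == "__main__":
    # Counts taken from the text above.
    N, K, n, observed = 99, 15, 20, 6
    print("P(X >= 6):", hypergeom_upper_tail(observed, N, K, n))
    print("P(X >  6):", hypergeom_upper_tail(observed + 1, N, K, n))
```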

Protein abundance influences the timing of activation

A second important prediction of our optimization approach is that the abundance of an enzyme relative to the remaining enzymes of a pathway should have an influence on the order of activation of these enzymes. Reanalysing the data of previous works on the timing of the activation of enzymes in the arginine biosynthetic pathway of E. coli¹⁹ supports this prediction: ArgG, which is the most abundant enzyme of arginine biosynthesis⁶, is activated much earlier than the surrounding steps of the pathway (Supplementary Note 7 and Supplementary Fig. S3). In their work, Zaslaver et al.¹⁹ argued that this discrepancy might be due to pathway topology, as ArgG condenses the products of two branches of the arginine biosynthetic pathway. However, we show that our findings also apply to linear chains of reactions that are embedded into more complex pathway topologies (Supplementary Fig. S3).

If protein abundance has the predicted influence on the order of activation of enzymes within a pathway, we would expect that abundant enzymes tend to be expressed together with earlier steps of the same pathway, whereas less abundant enzymes are more often coexpressed with later steps of the same pathway. To test this relationship, we computed the average coexpression bias of high and low abundance enzymes across all organisms of the MicroCyc collection. As described above, the average coexpression bias of an organism for high and low abundance enzymes indicates how often a specific set of enzymes is coexpressed with earlier or later steps of a pathway. To exclude effects that result from dependencies between pathway position and protein abundance, we compared, for each organism, the average coexpression bias that can be obtained for the actual operon structure with the average coexpression bias of a randomized operonic organization (see Methods). The actual operonic organization leads to a significantly later activation of low abundance enzymes compared with the randomized operon structure (Wilcoxon test P-value=1.1 × 10⁻⁸, Fig. 4). The average positional coexpression bias for each organism can be found in Supplementary Data 3. For high abundance enzymes, we find that they are activated significantly earlier through the actual operonic structure in comparison with a randomized operonic structure (Wilcoxon test P-value=2.1 × 10⁻⁴, Fig. 4). Thus, as predicted by the optimization, the operonic structure of metabolic pathways is tuned to a later activation of low abundance enzymes, whereas high abundance enzymes tend to be activated earlier.

**Figure 4: Protein abundance has an influence on the timing of pathway activation *in vivo*.**

Discussion

In this work, we used dynamic optimization for the identification and validation of optimal regulatory strategies for controlling metabolic pathways across a large number of metabolic pathways in several hundred prokaryotes. We based our investigation on the assumption of limitations in individual and free protein biosynthesis capacities²⁶. The results of the dynamic optimization and the validation show that protein abundance is an important factor influencing the type of regulatory programme that is used to control metabolic pathways. Whereas a low abundance of proteins leads to the optimality of a simultaneous activation of all enzymes of a pathway, a sequential activation of enzymes is optimal in case of high abundance proteins. Depending on the relative abundances of enzymes within a pathway, particularly abundant enzymes are activated much earlier than the preceding reaction steps, whereas enzymes with low abundance tend to be activated later than the neighbouring steps of the pathway. Thus, in contrast to the results of previous works, we show that the sequential activation of enzymes along a pathway, known as ‘just-in-time activation’, is only a special case for quick pathway activation. Another important conclusion that can be drawn from our results is that, depending on environmental conditions, there can be a shift in the optimal programme to activate a pathway. As ribosomal capacity correlates with growth rate, it is optimal to simultaneously activate the enzymes within a pathway in a condition supporting high growth rates while it could be optimal to sequentially activate enzymes in a condition only supporting low growth rates. This observation can explain why the sequential activation of the arginine biosynthetic pathway reported in E. coli¹⁹ was not observed in conditions supporting higher growth rates in a recent work³⁵.

As an important factor that is representative of the specific type of regulatory programme that is used to control a metabolic pathway, we identified the operonic organization of enzymes within a pathway. The correlations between genomic features as well as operon sizes for different metabolic pathways that we determined show that operon sizes decrease with increasing protein abundance and increase with increasing protein synthesis capacity. Thus, the interplay between protein abundance and constraints in protein synthesis capacity also represents an important driving force in the growth and decline of metabolic operons. As the optimal abundance of proteins as well as the protein synthesis capacity of an organism change in the course of its evolutionary history, the optimal operonic organization of metabolic pathways constantly changes. Thus, protein abundance as well as protein synthesis capacity are important contributors to the often observed high evolutionary plasticity of operons^36,37.

We expect that our results are of high importance also beyond the level of metabolism, for instance, for the production of complex molecular machineries such as flagella³⁸ or in stress responses³⁹ that require the production of large amounts of protein. Moreover, our results are of relevance for biotechnological applications as they provide guidelines about how a production process should be initiated on the enzymatic level to maximize yield of the product while minimizing the burden on the target organism.

Methods

Optimal regulatory strategies of metabolic pathways

In this work, we consider the activation of a metabolic pathway with a buffered substrate shown in Fig. 1a. Taking into account the dilution of intermediates by growth rate μ(t), we obtain:

and

The kinetic behaviour of metabolites is modelled by irreversible Michaelis–Menten kinetics

with the buffered substrate s=1 (arbitrary concentration unit), for example, (here i=4). The kinetic parameters are set to

or randomly chosen. The initial concentrations are x₁(0)=x₂(0)=x₃(0)=x₄(0)=0 (arbitrary concentration units), assuming a completely inactive pathway. We modelled the enzyme profiles by differential equations for each enzyme with

including dilution due to cell growth. We considered a corresponding maximum slope due to enzyme synthesis rates by

and the free protein synthesis capacity by

where d_max=0.01. Furthermore, we also integrated the influence of an optimal time varying growth by an additional differential equation for growth rate

This can be interpreted as a dynamic adaptation of the growth rate to environmental changes. We also chose a maximum adaptation rate

on the basis of a separate time domain, which is slower than the enzyme synthesis (here d_μ,max=0.001).

During pathway activation, the constraints on enzyme synthesis and growth adaptation could cause high accumulations of metabolites, which are harmful to the cell^6,17. We took this limitation into account by constraining total metabolite concentration with Ω=4 by

The continuous optimization problem was transformed to a nonlinear programming problem by the quasi-sequential approach⁴⁰. The quasi-sequential approach was extended to handle approximation errors in moving finite element strategies (called qMFE), with constraints on state and optimization variables⁴¹. In this work, we used qMFE also for problems with fixed final time to identify optimal time profiles independent of a predefined element placement. The optimal time courses of the enzymes e_j(t) and growth rate profile μ(t) were computed numerically by using d_j(t) and d_μ(t) as decision variables. As we used a gradient-based approach⁴², we solved the problem 100 times with random initializations of the decision variables to avoid local optima and chose only the result with the best objective value.
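To make the model structure concrete, the following is a minimal forward simulation of a four-step pathway under a fixed synthesis schedule. It is a sketch under several stated assumptions and not the qMFE optimization: the growth rate is held constant instead of being optimized, the rate law uses a single k_cat and K_M for all steps, the product is not consumed, requests exceeding d_max are rescaled proportionally, and integration uses a simple forward-Euler step. All names (`simulate`, `d_schedule`, `kcat`, `km`) are hypothetical.

```python
def simulate(d_schedule, d_j_max, d_max=0.01, mu=0.0, s=1.0,
             kcat=1.0, km=1.0, dt=0.01, t_end=200.0):
    """Forward-Euler simulation of a four-step pathway.

    d_schedule(t) returns the requested synthesis rate of each enzyme.
    Requests are clipped to [0, d_j_max] and rescaled proportionally if
    their sum exceeds d_max (an assumption, not the authors' method).
    Returns the final enzyme and metabolite concentrations.
    """
    n = 4
    e = [0.0] * n   # enzyme concentrations, pathway initially inactive
    x = [0.0] * n   # x[0..2] intermediates, x[3] product
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        d = [min(max(r, 0.0), cap) for r, cap in zip(d_schedule(t), d_j_max)]
        total = sum(d)
        if total > d_max:
            d = [r * d_max / total for r in d]
        # Irreversible Michaelis-Menten rates; step 0 sees the buffered
        # substrate s, step i sees the product of step i-1.
        upstream = [s] + x[:-1]
        v = [kcat * e[i] * upstream[i] / (km + upstream[i]) for i in range(n)]
        # Each metabolite is produced by step i, consumed by step i+1
        # (the product is not consumed here) and diluted by growth.
        dx = [v[i] - (v[i + 1] if i + 1 < n else 0.0) - mu * x[i]
              for i in range(n)]
        de = [d[i] - mu * e[i] for i in range(n)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        e = [ei + dt * dei for ei, dei in zip(e, de)]
        t += dt
    return e, x


if __name__ == "__main__":
    # All four enzymes requested at once; the shared capacity splits evenly.
    enzymes, metabolites = simulate(lambda t: [0.01] * 4, [0.01] * 4, t_end=50.0)
    print(enzymes, metabolites)
```

A candidate schedule from any optimizer could be passed as `d_schedule` to check the synthesis constraints and inspect metabolite accumulation against the bound Ω.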

Evaluation of optimization results

To determine the position in the activation sequence (Fig. 2), we sorted the activation times of enzymes in increasing order and determined for each enzyme its position in this ordered list. For the runs with randomized kinetic parameters, the abundances of enzymes (defined as the concentration after pathway activation) were normalized to an average of 1. We defined an enzyme as low abundant if its normalized abundance was <1/1.1 (≈0.91) and as high abundant if it was >1.1.
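The ranking and the abundance classes described above can be sketched as two small helpers. The function names are hypothetical, and breaking ties in activation time by enzyme index is an assumption, since the text does not say how ties were handled.

```python
def activation_positions(activation_times):
    """Return the 1-based position of each enzyme in the activation order."""
    order = sorted(range(len(activation_times)),
                   key=lambda j: activation_times[j])  # ties keep index order
    positions = [0] * len(activation_times)
    for pos, j in enumerate(order, start=1):
        positions[j] = pos
    return positions


def classify_abundance(final_concentrations, factor=1.1):
    """Label enzymes as 'low', 'average' or 'high' abundant.

    Concentrations are normalized to a mean of 1; the thresholds
    1/factor and factor follow the definition in the text.
    """
    mean = sum(final_concentrations) / len(final_concentrations)
    if mean <= 0:
        raise ValueError("mean concentration must be positive")
    labels = []
    for c in final_concentrations:
        rel = c / mean
        if rel > factor:
            labels.append("high")
        elif rel < 1.0 / factor:
            labels.append("low")
        else:
            labels.append("average")
    return labels


if __name__ == "__main__":
    print(activation_positions([5.0, 1.0, 3.0, 9.0]))
    print(classify_abundance([1.0, 1.0, 1.5, 0.5]))
```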

To determine the number of GpO, we grouped enzymes together in an operon if their activation times were not more than 1 time unit of the optimization apart (total simulation time was t_f=1,000 time units). The activation time t_active,j of enzyme j was determined as the earliest time at which e_j(t_active,j)≥d_j,max, that is, when the concentration of enzyme j first reaches the amount that can be synthesized at its maximal rate within one time unit.
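A minimal sketch of this grouping step follows. The function names are hypothetical. The text does not specify how chains of closely spaced activation times are resolved, so the sketch assumes that an enzyme joins the current operon whenever it is activated within the window of the previously activated enzyme (single-linkage chaining).

```python
def activation_time(times, e_j, d_j_max):
    """First sampled time at which e_j reaches d_j_max, or None."""
    for t, e in zip(times, e_j):
        if e >= d_j_max:
            return t
    return None


def group_into_operons(activation_times, window=1.0):
    """Group enzyme indices whose activation times lie close together.

    Assumption: consecutive enzymes (in order of activation) share an
    operon if their activation times differ by at most `window`.
    """
    order = sorted(range(len(activation_times)),
                   key=lambda j: activation_times[j])
    operons = []
    for j in order:
        if operons and activation_times[j] - activation_times[operons[-1][-1]] <= window:
            operons[-1].append(j)
        else:
            operons.append([j])
    return operons


def genes_per_operon(operons):
    """Average number of genes per operon (GpO)."""
    if not operons:
        raise ValueError("no operons given")
    return sum(len(o) for o in operons) / len(operons)


if __name__ == "__main__":
    ops = group_into_operons([0.0, 0.5, 3.0, 3.2])
    print(ops, genes_per_operon(ops))
```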

For a given operonic organization resulting from the optimization, we determined the distribution of the positional coexpression bias as follows. First, we grouped enzymes according to the above definition into the sets ‘low abundant’ and ‘high abundant’. Then we determined for each co-occurrence of an enzyme within the high and low abundance sets with any other enzyme of the pathway in an operon the positional coexpression bias, that is, the distance in reaction steps. For instance, if the high abundance enzyme e₃ (that catalyses reaction 3) appears together with e₁ (that catalyses reaction 1) in an operon, this distance is −2 (relative to enzyme e₃). For the low abundance enzyme e₂, which occurs in the same operon as e₄, this distance is +2. To construct the histogram in Fig. 3b, we counted the number of occurrences of each possible coexpression bias value and determined the frequency of each value for low and high abundance enzymes independently. We compared the distribution of positional coexpression values with the distribution we obtained from a purely random operonic organization. To this end, we determined the coexpression bias values for an operonic organization in which the identity of genes belonging to each operon has been randomly reassigned. To avoid bias due to a single randomization of operon structure, the distribution of coexpression bias from randomized operons summarizes the overall distribution obtained from 100 independent experiments with randomized operons.
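The bias computation and the randomized reference can be sketched as follows, with enzymes identified by their reaction position (1 to 4). The function names are hypothetical, and the randomization shown (shuffling enzyme identities while keeping every operon's size) is one reading of "randomly reassigned", not necessarily the authors' exact scheme.

```python
import random


def coexpression_biases(operons, focal):
    """Signed step distances from each focal enzyme to its operon partners.

    operons: list of lists of reaction positions
    focal:   set of positions in the abundance class of interest
    """
    biases = []
    for operon in operons:
        for a in operon:
            if a not in focal:
                continue
            for b in operon:
                if b != a:
                    biases.append(b - a)  # negative: partner is an earlier step
    return biases


def randomize_operons(operons, rng):
    """Shuffle enzyme identities while keeping every operon's size."""
    genes = [g for operon in operons for g in operon]
    rng.shuffle(genes)
    shuffled, start = [], 0
    for operon in operons:
        shuffled.append(genes[start:start + len(operon)])
        start += len(operon)
    return shuffled


def randomized_biases(operons, focal, n_rep=100, seed=0):
    """Pool the bias values from n_rep independent randomizations."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_rep):
        pooled.extend(coexpression_biases(randomize_operons(operons, rng), focal))
    return pooled


if __name__ == "__main__":
    operons = [[1, 3], [2, 4]]
    print(coexpression_biases(operons, {3}))  # example from the text: -2
    print(coexpression_biases(operons, {2}))  # example from the text: +2
```

The two pooled lists (actual and randomized) are what a rank-based test such as the Wilcoxon test would then compare.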

Genome annotation and codon adaptation indices

The operon structure for the considered organisms was downloaded from MicrobesOnline³². General information on the number of protein-coding genes, sets of metabolic and non-metabolic genes, the copy-number of the rRNA operons and detailed information on the genome annotation of the organisms within MicroCyc³¹ were obtained from the PathwayTools⁴³ files provided with the database. We could map information on operonic structure obtained from MicrobesOnline to 550 organisms from the MicroCyc collection. The codon adaptation indices contained within MicroCyc were provided by David Vallenet.

Pathway structure and operonic organization

To determine the linear chains of reactions considered in our optimization, we used information on pathway structure provided by MetaCyc⁴⁴. This database contains detailed layout information for each pathway, which also allowed us to distinguish the actual substrates and products within the pathway from cofactors. On the basis of the layout, we determined the reaction graph of each pathway (as displayed on the web interface of each pathway in the MetaCyc database) and defined the sequence of reactions within each pathway as the longest path between a metabolite that is not produced by any reaction of that pathway (pathway substrate) and a metabolite that is not consumed by any other reaction of that pathway (pathway product).
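
One way to extract such a chain is sketched below. It is an illustration under stated assumptions: reactions are given as hypothetical (substrates, products) pairs with cofactors already removed, a pathway product is interpreted as a metabolite that no reaction of the pathway consumes, and the reaction graph is assumed to be acyclic. The function name and input format are not taken from MetaCyc or the study.

```python
from functools import lru_cache


def longest_reaction_chain(reactions):
    """reactions: dict name -> (set of substrates, set of products).
    Returns the longest chain of reactions leading from a pathway
    substrate to a pathway product. Assumes an acyclic reaction graph."""
    produced = set().union(*(p for _, p in reactions.values()))
    consumed = set().union(*(s for s, _ in reactions.values()))
    # Reactions that consume a metabolite no reaction produces.
    starts = [r for r, (s, _) in reactions.items() if s - produced]
    # Reactions that produce a metabolite no reaction consumes.
    ends = {r for r, (_, p) in reactions.items() if p - consumed}
    # Edge a -> b if a product of a is a substrate of b.
    successors = {
        a: [b for b in reactions
            if b != a and reactions[a][1] & reactions[b][0]]
        for a in reactions
    }

    @lru_cache(maxsize=None)
    def best_from(r):
        candidates = [(r,)] if r in ends else []
        for nxt in successors[r]:
            tail = best_from(nxt)
            if tail:
                candidates.append((r,) + tail)
        return max(candidates, key=len) if candidates else ()

    chains = [best_from(r) for r in starts]
    return list(max(chains, key=len)) if chains else []


# Hypothetical branched pathway: A -> B -> C -> D with a side branch B -> E.
example = {
    "r1": ({"A"}, {"B"}),
    "r2": ({"B"}, {"C"}),
    "r3": ({"C"}, {"D"}),
    "r4": ({"B"}, {"E"}),
}
chain = longest_reaction_chain(example)
```

In the hypothetical example the side branch via r4 is discarded because the chain r1, r2, r3 is longer.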

We determined the enzymes associated with each pathway as those that occurred in the longest path, as defined above. The number of GpO was computed as the number of enzymes associated with the pathway divided by the number of operons across which these enzymes are distributed.
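
As a small worked example of this ratio, assuming a hypothetical mapping from gene identifiers to operon identifiers in which every gene of the pathway has an entry:

```python
def genes_per_operon(pathway_genes, operon_of):
    """GpO: number of pathway enzymes divided by the number of
    distinct operons that contain them.
    operon_of is a hypothetical gene -> operon identifier mapping."""
    operons = {operon_of[g] for g in pathway_genes}
    return len(pathway_genes) / len(operons)


operon_of = {"a": "op1", "b": "op1", "c": "op2", "d": "op3"}
gpo = genes_per_operon(["a", "b", "c", "d"], operon_of)  # 4 enzymes in 3 operons
```

A value of 1 corresponds to every enzyme residing in its own operon, and a value equal to the number of enzymes corresponds to a single operon encoding the whole pathway.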

The average positional coexpression bias of low and high abundance enzymes of an organism was determined as follows. First, an enzyme was defined as low abundant if its codon adaptation index minus the average codon adaptation index of enzymes within a pathway was below −0.1 (codon adaptation indices range from 0 to 1). Enzymes were defined as high abundant if their codon adaptation index minus the average codon adaptation index of enzymes within the pathway was above 0.1. Results did not change significantly for small changes in these threshold values. For each organism, we determined the distribution of the positional coexpression bias values for high and low abundance enzymes for the actual and the randomized operonic structure, as in the analysis of the positional coexpression bias from the optimization results. We did not consider enzymes occurring in the same operon if they are associated with the same reaction, to exclude bias due to enzymes that are coexpressed because they occur in a multi-enzyme complex. Subsequently, we determined the average positional coexpression bias for low and high abundance enzymes with actual and randomized operonic structure as the mean of the corresponding distributions.
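
The classification by codon adaptation index and the exclusion of enzymes sharing a reaction can be sketched as follows. The data structures (a gene-to-CAI dictionary and a gene-to-reaction-step dictionary) and the function names are assumptions made for illustration.

```python
from statistics import mean


def classify_by_cai(cai, threshold=0.1):
    """cai: dict gene -> codon adaptation index for one pathway.
    Returns (low, high) sets based on the deviation from the
    pathway average."""
    avg = mean(cai.values())
    low = {g for g, v in cai.items() if v - avg < -threshold}
    high = {g for g, v in cai.items() if v - avg > threshold}
    return low, high


def average_bias(operons, step_of, focal):
    """Mean positional coexpression bias of the focal genes.
    operons: lists of gene ids; step_of: gene -> reaction step.
    Pairs of genes associated with the same reaction are skipped."""
    values = [
        step_of[k] - step_of[g]
        for operon in operons
        for g in operon if g in focal
        for k in operon if step_of[k] != step_of[g]
    ]
    return mean(values) if values else None


cai = {"g1": 0.3, "g2": 0.5, "g3": 0.7}
low, high = classify_by_cai(cai)
step_of = {"g1": 1, "g2": 2, "g2b": 2, "g3": 3}
operons = [["g1", "g2", "g2b"], ["g3"]]
bias_low = average_bias(operons, step_of, low)
```

In this hypothetical example, g2 and g2b catalyse the same step, so their co-occurrence does not contribute to the bias of either gene.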

Statistical tests

In the analyses of hypotheses about genomic features influencing operon sizes across different species, we considered the influence of the factors ‘ribosomal RNA copy-number’, ‘number of protein-coding genes’, ‘average codon adaptation index of proteins in the pathway’ (as a proxy for protein abundance) and ‘average non-metabolic operon size’. In all analyses, we determined the influence of each factor independently from the other factors on the size of operons in which enzymes for each particular pathway are organized. To this end, we computed partial correlations between each factor and operon sizes while controlling for the other investigated factors. Thus, when testing hypothesis 1 for a specific pathway (relationship between operon size and number of protein-coding genes), we computed the partial Spearman’s correlation between the operon size of genes of the specific pathway and the number of protein-coding genes while controlling for the number of rRNA operons, average protein abundance within the pathway and the average size of non-metabolic operons. Statistical evaluations were performed using R⁴⁵. Partial correlations were computed using the R package ppcor.
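
To make the notion of a partial Spearman correlation concrete, the sketch below rank-transforms all variables and then applies the standard recursive formula for partial correlations. It is a plain-Python illustration and not the ppcor implementation; the function names are hypothetical, and no p-values are computed.

```python
from math import sqrt


def ranks(values):
    """Average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Correlation of x and y controlling for all variables in
    `controls`, removing one control variable per recursion step."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_spearman(x, y, controls):
    """Partial Spearman correlation: partial correlation on ranks."""
    return partial_corr(ranks(x), ranks(y), [ranks(c) for c in controls])


# Hypothetical data: operon size (x), number of protein-coding genes (y)
# and one control factor (z) for six organisms.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
z = [1, 3, 2, 5, 4, 6]
r = partial_spearman(x, y, [z])
```

The recursion grows exponentially with the number of control variables, which is acceptable for the three controls used per hypothesis here but would favour a matrix-based formulation for larger sets.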

To test whether phylogenetic dependencies between species have an influence on our results, we repeated our analyses with reduced organism sets in which organisms belonging to particular clades were randomly removed. As described in Supplementary Note 6, we could confirm that our results also apply to subsets of species from the MicroCyc collection.

Additional information

How to cite this article: Bartl, M. et al. Dynamic optimization identifies optimal programmes for pathway regulation in prokaryotes. Nat. Commun. 4:2243 doi: 10.1038/ncomms3243 (2013).

References

1. Kacser, H. & Beeby, R. Evolution of catalytic proteins. On the origin of enzyme species by means of natural selection. J. Mol. Evol. 20, 38–51 (1984).
2. Heinrich, R., Schuster, S. & Holzhütter, H. G. Mathematical analysis of enzymic reaction systems using optimization principles. Eur. J. Biochem. 201, 1–21 (1991).
3. Heinrich, R. & Schuster, S. The Regulation of Cellular Systems Chapman & Hall (1996).
4. Ebenhöh, O. & Heinrich, R. Stoichiometric design of metabolic networks: Multifunctionality, clusters, optimization, weak and strong robustness. Bull. Math. Biol. 65, 323–357 (2003).
5. Cornish-Bowden, A. The Pursuit of Perfection: Aspects of Biochemical Evolution Oxford University Press (2004).
6. Wessely, F. et al. Optimal regulatory strategies for metabolic pathways in Escherichia coli depending on protein costs. Mol. Syst. Biol. 7, 515 (2011).
7. Schuetz, R., Zamboni, N., Zampieri, M., Heinemann, M. & Sauer, U. Multidimensional optimality of microbial metabolism. Science 336, 601–604 (2012).
8. Beaumont, H. J., Gallie, J., Kost, C., Ferguson, G. C. & Rainey, P. B. Experimental evolution of bet hedging. Nature 462, 90–93 (2009).
9. Satory, D., Gordon, A. J., Halliday, J. A. & Herman, C. Epigenetic switches: can infidelity govern fate in microbes? Curr. Opin. Microbiol. 14, 212–217 (2011).
10. Alves, R. & Savageau, M. A. Evidence of selection for low cognate amino acid bias in amino acid biosynthetic enzymes. Mol. Microbiol. 56, 1017–1034 (2005).
11. Geisel, N. Constitutive versus responsive gene expression strategies for growth in changing environments. PLoS One 6, e27033 (2011).
12. Geisel, N., Vilar, J. M. & Rubi, J. M. Optimal resting-growth strategies of microbial populations in fluctuating environments. PLoS One 6, e18622 (2011).
13. Bartl, M., Li, P. & Schuster, S. Modelling the optimal timing in metabolic pathway activation-use of Pontryagin’s Maximum Principle and role of the Golden section. BioSystems 101, 67–77 (2010).
14. Klipp, E., Heinrich, R. & Holzhütter, H. G. Prediction of temporal gene expression - Metabolic optimization by re-distribution of enzyme activities. Eur. J. Biochem. 269, 5406–5413 (2002).
15. Oyarzún, D., Ingalls, B., Middleton, R. & Kalamatianos, D. Sequential activation of metabolic pathways: a dynamic optimization approach. Bull. Math. Biol. 71, 1851–1872 (2009).
16. Oyarzun, D. A. Optimal control of metabolic networks with saturable enzyme kinetics. IET Syst. Biol. 5, 110–119 (2011).
17. Schuster, S. & Heinrich, R. Time hierarchy in enzymatic reaction chains resulting from optimality principles. J. Theor. Biol. 129, 189–209 (1987).
18. Zaslaver, A., Mayo, A., Ronen, M. & Alon, U. Optimal gene partition into operons correlates with gene functional order. Phys. Biol. 3, 183–189 (2006).
19. Zaslaver, A. et al. Just-in-time transcription program in metabolic pathways. Nat. Genet. 36, 486–491 (2004).
20. Chechik, G. et al. Activity motifs reveal principles of timing in transcriptional control of the yeast metabolic network. Nat. Biotechnol. 26, 1251–1259 (2008).
21. Alpers, D. H. & Tomkins, G. M. Sequential transcription of the genes of the lactose operon and its regulation by protein synthesis. J. Biol. Chem. 241, 4434–4443 (1966).
22. Alpers, D. H. & Tomkins, G. M. The order of induction and deinduction of the enzymes of the lactose operon in E. coli. Proc. Natl Acad. Sci. USA 53, 797–802 (1965).
23. Kovacs, K., Hurst, L. D. & Papp, B. Stochasticity in protein levels drives colinearity of gene order in metabolic operons of Escherichia coli. PLoS Biol. 7, e1000115 (2009).
24. Ray, J. C. & Igoshin, O. A. Interplay of gene expression noise and ultrasensitive dynamics affects bacterial operon organization. PLoS Comput. Biol. 8, e1002672 (2012).
25. Dekel, E. & Alon, U. Optimality and evolutionary tuning of the expression level of a protein. Nature 436, 588–592 (2005).
26. Stoebel, D. M., Dean, A. M. & Dykhuizen, D. E. The cost of expression of Escherichia coli lac operon proteins is in the process, not in the products. Genetics 178, 1653–1660 (2008).
27. Ferullo, D. J. & Lovett, S. T. The stringent response and cell cycle arrest in Escherichia coli. PLoS Genet. 4, e1000300 (2008).
28. Mandelstam, J. The intracellular turnover of protein and nucleic acids and its role in biochemical differentiation. Bacteriol. Rev. 24, 289–308 (1960).
29. Tuller, T. et al. Composite effects of gene determinants on the translation speed and density of ribosomes. Genome Biol. 12, R110 (2011).
30. Scott, M., Gunderson, C. W., Mateescu, E. M., Zhang, Z. & Hwa, T. Interdependence of cell growth and gene expression: origins and consequences. Science 330, 1099–1102 (2010).
31. Vallenet, D. et al. MicroScope - an integrated microbial resource for the curation and comparative analysis of genomic and metabolic data. Nucleic Acids Res. 41, D636–D647 (2013).
32. Dehal, P. S. et al. MicrobesOnline: an integrated portal for comparative and functional genomics. Nucleic Acids Res. 38, D396–D400 (2010).
33. Sharp, P. M. & Li, W.-H. The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295 (1987).
34. Benjamini, Y. & Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29, 1165–1188 (2001).
35. Gerosa, L., Kochanowski, K., Heinemann, M. & Sauer, U. Dissecting specific and global transcriptional regulation of bacterial gene expression. Mol. Syst. Biol. 9, 658 (2013).
36. Itoh, T., Takemoto, K., Mori, H. & Gojobori, T. Evolutionary instability of operon structures disclosed by sequence comparisons of complete microbial genomes. Mol. Biol. Evol. 16, 332–346 (1999).
37. Price, M. N., Arkin, A. P. & Alm, E. J. The life-cycle of operons. PLoS Genet. 2, e96 (2006).
38. Kalir, S. et al. Ordering genes in a flagella pathway by analysis of expression kinetics from living bacteria. Science 292, 2080–2083 (2001).
39. Ronen, M., Rosenberg, R., Shraiman, B. I. & Alon, U. Assigning numbers to the arrows: parameterizing a gene regulation network by using accurate expression kinetics. Proc. Natl Acad. Sci. USA 99, 10555–10560 (2002).
40. Hong, W., Wang, S., Li, P., Wozny, G. & Biegler, L. T. A quasi-sequential approach to large-scale dynamic optimization problems. AIChE J. 52, 255–268 (2006).
41. Bartl, M., Li, P. & Biegler, L. T. Improvement of state profile accuracy in nonlinear dynamic optimization with the quasi-sequential approach. AIChE J. 57, 2185–2197 (2011).
42. Gill, P. E., Murray, W. & Saunders, M. A. SNOPT: An SQP algorithm for large-scale constrained optimization. SIAM J. Optim. 12, 979–1006 (2002).
43. Karp, P. D. et al. Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology. Brief. Bioinform. 11, 40–79 (2010).
44. Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 40, D742–D753 (2012).
45. R Core Team. R: A Language and Environment for Statistical Computing R Foundation for Statistical Computing (2013).

Acknowledgements

We thank Sascha Schäuble, Juliane Gebauer, Ines Mynttinen and Abbe Geletu for helpful discussions and comments on the manuscript. We thank David Vallenet for pointing us to the MicroCyc collection of metabolic databases and for providing the codon adaptation indices of these organisms. Financial support from the Deutsche Forschungsgemeinschaft (KA 3541/3) is gratefully acknowledged.

Author information

Authors and Affiliations

Department of Simulation and Optimal Processes, Institute for Automation and Systems Engineering, Ilmenau University of Technology, Helmholtzplatz 5, Ilmenau, 98693, Germany
Martin Bartl, Martin Kötzing & Pu Li
Research Group Theoretical Systems Biology, Friedrich Schiller University, Leutragraben 1, Jena, 07743, Germany
Martin Kötzing & Christoph Kaleta
Department of Bioinformatics, Friedrich Schiller University, Ernst-Abbe-Platz 2, Jena, 07743, Germany
Stefan Schuster

Authors

Martin Bartl
Martin Kötzing
Stefan Schuster
Pu Li
Christoph Kaleta

Contributions

M.B. and M.K. (M.K. was involved in the initial phase) prepared and conducted the dynamic optimization. They developed the dynamic optimization software. C.K. and M.B. conducted data analysis. C.K., P.L. and S.S. conceived research and commented on the manuscript. M.B. and C.K. wrote the manuscript.

Corresponding author

Correspondence to Christoph Kaleta.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures, Tables, Notes and References

Supplementary Figures S1-S4, Supplementary Table S1, Supplementary Notes 1-7 and Supplementary References (PDF 939 kb)

Supplementary Data 1

Organism statistics. The first three columns provide the MicroCyc-ID, the organism name and the NCBI ID. The fourth column indicates whether MicrobesOnline contains information on the operonic structure of the genome. Columns five through seven indicate genome statistics for the organisms derived from the annotated PathwayTools-files of the corresponding organism from MicroCyc. Columns eight and nine give, if available, the growth rate (doubling time in hours) as well as the growth temperature of the organism in °C. Information was obtained from Supplementary Reference 47. (XLSX 107 kb)

Supplementary Data 2

Information about the statistical tests of three hypotheses about the influence of enzyme synthesis rates and free protein synthesis capacity on the size of metabolic operons. Only pathways that occur in more than 100 organisms of the MicroCyc collection are shown. The number of organisms in which each pathway occurs is indicated in the third column. For each pathway, the Spearman correlation as well as the p-value of the correlation are given. Correlations represent partial correlations (see main document). p-values have been corrected for multiple testing using the Benjamini-Yekutieli procedure. (XLSX 23 kb)

Supplementary Data 3

Average positional coexpression bias for different organisms for high and low abundance enzymes. The second column in each sub-table indicates the average positional coexpression bias with the actual operon structure and the third column the average positional coexpression bias with randomized operons. (XLSX 30 kb)

Supplementary Data 4

Correlation between operon sizes and protein abundance for all organisms considered in this study. Columns two through seven contain the Spearman correlation and the p-value between operon size and average codon adaptation indices for each operon. p-values were corrected using the Benjamini-Yekutieli procedure. The different sets of genes correspond to all genes and all genes not annotated with a metabolic function. (XLSX 41 kb)

Supplementary Data 5

Comparison between the average CAIs of proteins belonging to each pathway across all organisms and the abundance of the products of the corresponding pathway in E. coli. The upper part of the sheet contains the comparison and the lower part of the sheet information on the molar abundance of each metabolite taken from the Supplementary Material of Supplementary Reference 51. The first column of the upper part gives the MetaCyc name of each pathway and the second column the abbreviated names of the corresponding metabolite in the biomass composition. The third column indicates the total molar abundance of the products of the pathway. For each organism, we computed the average of CAIs of the proteins belonging to each pathway. These values were averaged across all organisms containing the corresponding pathway and are indicated in column five. (XLSX 13 kb)

Supplementary Data 6

Information about tests for phylogenetic bias affecting the three hypotheses on genomic influences on operon size. Column two gives the number of MicroCyc organisms considered in each run and column three the number of pathways. Columns four through nine indicate for each hypothesis the number of pathways for which the hypothesis was rejected or accepted. (XLSX 54 kb)

Supplementary Data 7

Information about tests for phylogenetic bias affecting the frequency of coexpression of low and high abundance enzymes with earlier and later enzymes of the same pathway. Column two gives the number of MicroCyc organisms considered in each run. Columns three through eight give for low and high abundance enzymes the average positional coexpression bias with the actual operon structure and a randomized operon structure. The p-value of a test of both distributions with a Wilcoxon test is indicated in columns five and eight. (XLSX 98 kb)

About this article

Received: 09 April 2013
Accepted: 04 July 2013
Published: 27 August 2013
DOI: https://doi.org/10.1038/ncomms3243
