## Abstract

Which properties of metabolic networks can be derived solely from stoichiometry? Predictive results have been obtained by flux balance analysis (FBA), by postulating that cells set metabolic fluxes to maximize growth rate. Here we consider a generalization of FBA to single-cell level using maximum entropy modeling, which we extend and test experimentally. Specifically, we define for *Escherichia coli* metabolism a flux distribution that yields the experimental growth rate: the model, containing FBA as a limit, provides a better match to measured fluxes and it makes a wide range of predictions: on flux variability, regulation, and correlations; on the relative importance of stoichiometry vs. optimization; on scaling relations for growth rate distributions. We validate the latter here with single-cell data at different sub-inhibitory antibiotic concentrations. The model quantifies growth optimization as emerging from the interplay of competitive dynamics in the population and regulation of metabolism at the level of single cells.

## Introduction

After the significant developments in molecular biology and biochemistry in the last century, many aspects of cellular physiology could be understood as a result of interactions between identified molecular components. Perhaps the best-characterized example is intermediate metabolism, the set of reactions that enable cell growth by converting organic compounds and transducing free energy. Today it is possible to some extent to infer the topology of metabolic networks from data at genomic scale, but the dynamics and parameter dependence of such networks remain difficult to analyze. Alternatively, one can assume that known reactions only provide physico-chemical constraints within which some adaptive dynamics has maximized the growth rate, e.g., by adjusting enzyme levels and controlling reaction rates^{1}. An influential implementation of this idea for batch cultures under steady-state conditions has been the flux balance analysis (FBA)^{2}, which has been tested experimentally^{3,4}, also in mutant strains, strains used for industrial production^{5,6,7}, as well as phenotypes implicated in disease (e.g., Warburg effect^{8}). Using maximum entropy ideas from statistical physics, we extend the application of FBA from batch to single-cell level and show that our extension makes a wide range of predictions, some of which we test experimentally.

Recent measurements at the single-cell level demonstrated the existence of substantial cell-to-cell growth rate fluctuations even in well-controlled steady-state conditions^{9}. These fluctuations exhibit universal scaling properties^{10,11,12,13}, relate to cell size control mechanisms^{14,15}, act as a global collective mode for heterogeneity in gene expression^{16,17,18}, and are ultimately believed to affect fitness^{19}. To link these observations to metabolism, however, we need to set up a mathematical description not only of the optimal metabolic fluxes and maximal growth rate in batch culture (as in FBA, which permits no heterogeneity across cells), but for the complete joint distribution over metabolic fluxes. Metabolic phenotypes of individual cells growing in steady-state conditions can then be understood as samples from this joint distribution, which would automatically contain information about flux correlations, and, in particular, could directly predict cell-to-cell growth rate fluctuations.

The simplest construction of a joint distribution over metabolic fluxes can be derived in the maximum entropy framework^{20}. The key intuition is to look for the most unbiased (or random) distribution over fluxes through individual metabolic reactions that is consistent with the given stoichiometric constraints, while matching the experimentally measured average growth rate. The maximum entropy model that we specify below will turn out to be a one-parameter family of distributions, where the single parameter can be fit to match experimental data; all subsequent predictions follow directly, without further fitting. A similar approach has recently been used in diverse biological settings, ranging from neural networks^{21,22} and genetic regulatory networks^{23} to antibody diversity^{24} and the collective motion of starling flocks^{25}.

In addition to accounting for cell-to-cell variability, the maximum entropy construction provides a principled interpolation between two extremal regimes of metabolic network function. In the uniform (no-optimization) limit, no control is exerted over metabolic fluxes: they are selected at random as long as they are permitted by stoichiometry, resulting in broad yet non-trivial flux distributions that support a small, non-zero growth rate. In the FBA limit, fluxes are controlled precisely to maximize the growth rate, with zero fluctuations. The existence of these two limits defines a fundamental, and still unanswered, question about metabolic networks: Is there empirical evidence that real metabolic networks are located in an intermediate regime between the two limits where fluctuations are non-negligible^{26}, and if so, what are the properties of this intermediate regime (see Fig. 1)? Here, we address this question using metabolic flux and single-cell physiology data for *Escherichia coli*.

In the Methods section, we provide a review of the maximum entropy formalism for metabolic networks as it has been established in previous work^{26,27,28,29,31} and we stress its implementation in our work. In the Results section, we use this formalism to set up and test quantitative predictions for *E. coli* as well as to discuss possible theoretical extensions. We provide here compelling experimental evidence that the observed growth rate fluctuations reflect metabolic flux variability and sub-optimality of growth, both of which are captured quantitatively by the maximum entropy model of the metabolic network.

## Results

### Experimental test of flux predictions for *E. coli*

We constructed a maximum entropy model for the catabolic core of the *E. coli* metabolism (see Methods section). The model has a single parameter *β* that constrains the average growth rate in the flux space, interpolating from uniform sampling (*β* = 0) to the optimal FBA solution (*β* → ∞). In particular, we consider the specific value of this parameter, *β*^{*}, inferred by constraining the average growth rate in the model to match the population experimental growth rate (\(\overline \lambda = 0.2\,{\mathrm{h}}^{ - 1}\) for a set of 12 experiments shown in Fig. 2a, and \(\overline \lambda = 0.1\,{\mathrm{h}}^{ - 1}\) for a set of seven experiments, not shown).

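To evaluate the quality of model predictions, we compared *N*_{f} = 20 measured metabolic fluxes in *E. coli* from previously published data to our predictions, as shown in Fig. 2a. We defined the mean-squared error (MSE) as

\[
{\mathrm{MSE}} = \frac{1}{N_{\mathrm{f}}}\mathop {\sum}\limits_{i = 1}^{N_{\mathrm{f}}} \left( V_i - \langle v_i \rangle \right)^2, \tag{1}
\]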

where *V*_{i} is the measured flux (relative to glucose uptake) and 〈*v*_{i}〉 is the mean of the corresponding flux computed in the maximum entropy model of Eq. (7). Figure 2b examines the behavior of MSE as a function of the parameter *β*. First, we note that the best flux predictions occur at or close to the value \(\beta ^ \ast \lambda _{{\mathrm{max}}} \simeq 120\), identified by the maximum entropy fit, at both average growth rates. This is a non-trivial prediction, because the value of *β*^{*} was not fitted to minimize MSE, but rather, as demanded by the maxent formalism, to match the population growth rate. Second (and unsurprisingly), we find that flux predictions are better at *β*^{*} than with uniform sampling, at *β* = 0. Perhaps the most surprising is our third finding: flux predictions at the intermediate value of *β* (\(\beta ^ \ast \lambda _{{\mathrm{max}}} \simeq 120\)) significantly outperform the limit of *β* → ∞, i.e., the FBA solution.

In addition to a better quantitative match overall, the maximum entropy model correctly predicted non-zero flux through the glyoxylate shunt, i.e., for the isocitrate lyase (ICL) as well as ME1 reactions, which FBA misses qualitatively by setting them to zero. As a consequence, this also leads to a better match of our model with data for the isocitrate dehydrogenase (ICDH) and α-ketoglutarate dehydrogenase (AKGDH) reactions, which channel pyruvate through the Krebs cycle.

Lastly, we point out that Eq. (7) can also be viewed as a phenomenological equation for average fluxes with a single fitting parameter *β* that is set not to match the measured population growth rate, as in the maxent formalism, but simply to minimize some error measure (say MSE) with respect to experimentally measured fluxes. In Supplementary Note 4, we show that this also leads to predictions that outperform FBA for measured fluxes both in wild-type and mutant strains.
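The error measure used in this section, and the phenomenological fit just described, can be sketched in a few lines of Python. This is a minimal illustration, assuming the MSE is the plain average of squared differences between measured and predicted relative fluxes; the function names are hypothetical and the numbers are placeholders chosen for illustration, not experimental data.

```python
def mean_squared_error(measured, predicted):
    """Average squared difference between measured fluxes V_i and model means <v_i>."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("need two non-empty flux lists of equal length")
    n_f = len(measured)
    return sum((V - v) ** 2 for V, v in zip(measured, predicted)) / n_f


def best_beta(measured, predictions_by_beta):
    """Return the beta whose predicted mean fluxes give the smallest MSE."""
    return min(
        predictions_by_beta,
        key=lambda beta: mean_squared_error(measured, predictions_by_beta[beta]),
    )


# Placeholder numbers (hypothetical, for illustration only).
measured = [1.0, 0.5, 0.1]
predictions_by_beta = {
    0.0: [0.6, 0.2, 0.3],    # stands in for a uniform-sampling prediction
    100.0: [0.9, 0.5, 0.1],  # stands in for an intermediate optimization level
    1e6: [1.2, 0.6, 0.0],    # stands in for an FBA-like prediction
}
beta_best = best_beta(measured, predictions_by_beta)
```

In the maximum entropy fit proper, *β* is instead fixed by the population growth rate, and the MSE is only evaluated afterwards as a check.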

It is instructive to examine the evolution of the joint distribution over fluxes, *P*_{β}(**v**), as a function of the optimization parameter, *β*. Figure 3a shows how the growth rate approaches the maximal rate achievable, *λ*_{max}, with the inferred values of *β*^{*} from Fig. 2a, c suggesting an optimization level in the range of ~80% of the maximum. These levels are reached by adjusting flux values away from what they would have been under uniform sampling from the polytope of the allowed metabolic phenotypes, \({\cal P}\). Figure 3b traces the relative changes in all fluxes as a function of *β*. Interestingly, in the FBA limit, almost half of the fluxes (38 out of 86 fluxes, the upper half of the plot) are forced to zero, whereas at the inferred value (*β*^{*}*λ*_{max} ≈ 120), these fluxes only decrease by about 1/3 relative to their average value in the uniform sampling limit. Furthermore, the glyoxylate shunt remains active, in agreement with experimental observations. Surprisingly, only for a few reactions the fluxes are predicted to increase with growth rate optimization relative to the uniform sampling (lowest ~5 fluxes in Fig. 3b). These are mainly nitrogen and phosphate transport reactions, and to a lesser extent, malate dehydrogenase (MDH) and phosphoglucose isomerase (PGI) reactions. The latter two reactions are classified as reversible, whereas regulation in metabolic networks is thought to take place at irreversible reactions^{32}, so the predicted increase may be a consequence of increased substrate levels.

We separately illustrate three flux behaviors in Fig. 3c, for ICL, ICDH, and glutamate dehydrogenase (GLUDy). ICL and ICDH track the relative channeling of carbon sources in the Krebs cycle vs. glyoxylate shunt; ICL flux is switched off in the *β* → ∞ limit, whereas ICDH flux remains nearly constant with *β*. In contrast, the GLUDy reaction is reversible, switching sign at intermediate values of *β*, while at high *β* the reaction ultimately gets frozen in the backward direction, implying high levels of ammonia in the cell, given the low affinity of this enzyme for ammonia^{33}.

We also evaluated the predicted variability in metabolic fluxes from the maximum entropy model, Eq. (7), at *β*^{*}*λ*_{max} ~ 10^{2}, and found a clear division between reactions with high and low coefficients of variation (CV). Among fluxes with lower variability were all glycolytic reactions (CV < 0.3) with the exception of PGI, as well as all transport reactions related to biomass formation (i.e., for glucose, oxygen, ammonia, carbon dioxide, phosphate ions; CV < 0.11), the first part of the Krebs cycle, and the irreversible reactions of oxidative phosphorylation.

We next wondered how flux variances scale with the optimization level. In the uniform sampling limit (*β* = 0), the variances should be large, characteristic of the shape and extent of the permitted polytope, \({\cal P}\). While in the FBA limit (*β* → ∞) the flux variability should vanish, we expect a well-defined scaling regime at high *β* where the variances shrink toward the FBA solution in a manner that is independent of the global polytope properties. This regime is indeed reached for all fluxes at \(\bar \lambda /\lambda _{{\mathrm{max}}} \gtrsim 0.90\) and for some fluxes much earlier, as shown in Fig. 3d: flux variability subsequently decreases with *β* as \(\sigma _i(\beta )/\sigma _i(\beta = 0) \propto \left( {1 - \bar \lambda (\beta )/\lambda _{{\mathrm{max}}}} \right)\).

What kind of correlation structure between fluxes does the maximum entropy model predict? While the growth rate *λ* is linear in constituent fluxes in Eq. (7), suggesting that the joint distribution could factorize, correlations between fluxes develop because of the stoichiometric constraints that define the polytope \({\cal P}\). A subset of fluxes that we focus on in Fig. 3 exhibits a clear structure of strong (anti-)correlation both under uniform sampling (Fig. 3e) and in the FBA limit (Fig. 3f). The FBA pattern of correlations, in particular, can easily be partitioned into four groups using a clustering algorithm^{34} so that the groups are strongly enriched for reactions characteristic of glycolysis, glyoxylate shunt, pentose phosphate pathway, and citric acid cycle, respectively. Fluxes in the glycolysis cluster tend to correlate strongly with fluxes in the citric acid cycle cluster, but anti-correlate with the glyoxylate shunt and pentose phosphate pathway clusters. Comparison of the FBA correlations (Fig. 3f) with those under uniform sampling (Fig. 3e) reveals that stoichiometric constraints alone shape much of the correlation structure, with the exception of the anti-correlation between the glycolysis and glyoxylate shunt clusters, which is a distinct consequence of the growth rate optimization. More generally, it is intriguing to apply maximum entropy to recover the correlation structure of metabolic fluxes in the FBA limit and use that to identify, automatically via clustering, separate metabolic pathways.

### Lower limit of regulatory information needed for fast growth

As the growth rate optimization parameter *β* is increased, flux variances shrink (Fig. 3d), correlations strengthen (Fig. 3f), and the distribution over fluxes within the polytope \({\cal P}\) localizes closer to the FBA solution, **v**_{max}. Could this localization emerge due to competitive growth dynamics in batch culture^{28}? To test this hypothesis, we checked the relationship predicted by Eq. (13), which can be solved analytically (see Supplementary Methods). For the typical values of the carrying capacity vs. inoculation size ratio (\(N_{\rm C}/N_0 \simeq 10^6\)), the corresponding prediction from Eq. (13) is *β*^{*}*λ*_{max} ~ 50, considerably underestimating the value \(\beta ^ \ast \lambda _{{\mathrm{max}}} \simeq 120\) that the maximum entropy model recovers by reproducing the population growth rate and that shows a good match with measured fluxes. In other words, metabolic fluxes are more localized and growth is closer to optimal than would be expected in a scenario where metabolic reactions at the individual cell level are not actively regulated.

Motivated by these findings, we explored an alternative scenario where the localization of the distribution over fluxes around the optimal growth rate is achieved by active regulation of metabolic reactions. First, we quantified the degree of localization by the decrease in the entropy of the distribution over fluxes, Eq. (10). This is plotted for our *E. coli* network in Fig. 3g, where we show the average growth rate, \(\bar \lambda\), as a function of information, *I* (expressed in bits), parametrically in *β*. The resulting curve divides the \((I,\bar \lambda )\) plane into two halves: while it is possible to achieve metabolic phenotypes below the \(I(\bar \lambda )\) curve, the dashed region above the curve is forbidden. This is because *no* distribution exists that achieves high growth rates \(\bar \lambda\) without also deviating from the uniform distribution by at least the required number of bits.
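The information plotted against the growth rate can be written out explicitly for a Boltzmann-type distribution. The short derivation below is a sketch under stated assumptions: it takes \(P_\beta ({\mathbf{v}}) = e^{\beta \lambda ({\mathbf{v}})}/Z_\beta\) on \({\cal P}\), and it takes the information to be the entropy difference to the uniform distribution, measured in bits (Eq. (10) itself is not reproduced in this section, so its exact form is assumed here).

\[
S(\beta ) = - \int_{\cal P} P_\beta ({\mathbf{v}})\ln P_\beta ({\mathbf{v}})\,{\mathrm{d}}{\mathbf{v}} = \ln Z_\beta - \beta \bar \lambda (\beta ),\qquad Z_\beta = \int_{\cal P} e^{\beta \lambda ({\mathbf{v}})}\,{\mathrm{d}}{\mathbf{v}},
\]

\[
I(\beta ) = \frac{S(0) - S(\beta )}{\ln 2} = \frac{\beta \bar \lambda (\beta ) - \ln \left( Z_\beta /Z_0 \right)}{\ln 2},\qquad \frac{{\mathrm{d}}I}{{\mathrm{d}}\beta } = \frac{\beta \,\sigma _\lambda ^2(\beta )}{\ln 2} \ge 0.
\]

Here \(Z_0\) is the volume of \({\cal P}\) and \(\sigma _\lambda ^2\) is the variance of the growth rate under \(P_\beta\). Written this way, *I* is the Kullback-Leibler divergence of \(P_\beta\) from the uniform distribution, which is non-negative and grows monotonically with *β*. Because the maximum entropy distribution is the closest one to uniform among all distributions with a given mean growth rate, no other distribution can reach the same \(\bar \lambda\) with fewer bits.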

Figure 3g suggests that at least ~40 bits of information are required to control the fluxes and reach growth rates amounting to ~80% of the maximal rate, *λ*_{max}, as reported in data for *E. coli*; higher growth rates call for increasing amounts of information, which formally diverges in the FBA limit as *β* → ∞. Interestingly, the number of out-of-equilibrium reactions in the model, 39, is in good agreement with the inferred amount of minimal information, given a simplistic estimate of 1 bit per reaction (sufficient to distinguish, e.g., high from low expression of the metabolic enzyme). This is consistent with the hypothesis that regulatory control is exerted for enzymes that catalyze irreversible reactions^{32}.

Cells control metabolic fluxes through regulatory networks, either indirectly, by regulating the expression of metabolic enzymes, or directly, by modulating the enzymatic activity through various feedback loops; either way, metabolic resources are required to exert this control. This leads to a trade-off: flux control is necessary to support a high growth rate, but itself carries a growth rate penalty. We created a simple toy model to capture this intuition (see Supplementary Note 2). Here, *K* regulatory pathways control the fluxes and each pathway is modeled as a Gaussian information channel, so that together, these channels provide *I*(*β*) bits of necessary information as shown in Fig. 3g. The signal-to-noise of each regulatory channel is determined by the number of regulatory molecules: higher molecular counts enable precise control and thus higher information, but impose higher cost. In this model, the cost-free growth rate at given *β* is reduced by the cost to support *K* channels which control the fluxes, so that the resulting effective growth rate is:

where *α* determines the metabolic cost of regulatory molecules, and we estimated the number of regulatory pathways, *K*, to be approximately the number of degrees of freedom of the flux polytope, *K* ≈ *D*. The cost of regulation clearly limits the achievable growth rate, as shown in Fig. 3h, where the \(\bar \lambda _{{\mathrm{eff}}}(\beta )\) curves now develop a maximum rather than increasing monotonically as in the cost-free case of Fig. 3a. While our toy model is very simplistic, it does capture properly the scaling of information with the growth rate, as well as the exponential metabolic cost of achieving high information transmission in molecular networks, reported previously^{35,36}. Thus, among many possible constraints acting on a cell, the cost of regulating metabolism itself^{1} can impose non-negligible limits to growth.

### Experimental test of growth rate fluctuation scaling

Can we test the novel predictions of our theory that extend beyond the domain of validity of the FBA? While it is currently experimentally infeasible to measure metabolic fluxes and their variability at the single-cell level, one can tractably measure division times and growth rates for single *E. coli* cells growing in stable conditions for long periods of time. In our model, such growth measurements directly connect to the biomass producing reaction with its associated growth flux *λ*(**v**). Figure 3d suggests that flux variability should scale \(\propto (\lambda _{{\mathrm{max}}} - \bar \lambda )\), and since the growth flux is a linear combination of metabolic fluxes, its variability, too, should follow the same scaling. To verify this explicitly, we computed the fluctuations in growth rate, *σ*/*λ*_{max}, as a function of the optimization parameter *β*, in Fig. 4a.

In the range of \(\beta \lambda _{{\mathrm{max}}} \gtrsim 40\), characteristic of wild-type *E. coli* experiments, the predicted growth fluctuations indeed obey

\[
\sigma \propto \lambda _{{\mathrm{max}}} - \bar \lambda ; \tag{3}
\]

we refer to this range as the scaling regime. Beyond variance, the complete distribution of growth rates, *Q*(*λ*), can be sampled by marginalizing the maximum entropy model, Eq. (7).
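A heuristic argument suggests where such a scaling may come from. The sketch below assumes, beyond what is stated in the text, that near the optimal vertex of a *D*-dimensional polytope the volume of phenotypes with growth rate between *λ* and *λ* + d*λ* grows as \((\lambda _{{\mathrm{max}}} - \lambda )^{D - 1}\). The gap to the optimum is then Gamma distributed with shape *D* and rate *β*:

\[
Q_\beta (\lambda ) \propto (\lambda _{{\mathrm{max}}} - \lambda )^{D - 1}e^{\beta \lambda }\quad \Longrightarrow \quad \lambda _{{\mathrm{max}}} - \lambda \sim {\mathrm{Gamma}}(D,\beta ),
\]

\[
\lambda _{{\mathrm{max}}} - \bar \lambda = \frac{D}{\beta },\qquad \sigma = \frac{\sqrt D }{\beta } = \frac{\lambda _{{\mathrm{max}}} - \bar \lambda }{\sqrt D }.
\]

Under this assumption the fluctuations are proportional to the distance from the optimum, with a prefactor set only by the dimension *D*. The approximation can hold only at large *β*, where the distribution no longer feels the far boundaries of the polytope.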

Measurements of single-cell growth rates allow us to estimate growth rate distributions and compare them to the predicted *Q*(*λ*), as well as to empirically extracted \(\bar \lambda\), *λ*_{max}, and the fluctuations *σ*, to verify the predicted relation of Eq. (3). We used previously published data^{37} where *E. coli* cells were stably grown in a mother machine microfluidic device while multiple sub-inhibitory steps of concentration of the antibiotic tetracycline were delivered as shown in Fig. 4b. Low concentrations of antibiotic allowed us to probe different average growth rates in the same setup, and to construct empirical distributions of growth rates for every antibiotic concentration by pooling data from technical replicates of the multi-step experiments (Supplementary Note 3). We find an excellent match between measured and predicted growth rate distributions in Fig. 4c for all five concentrations of the antibiotic used. Looking at many individual lineages in separate microfluidic channels, we can also extract *λ*_{max}, \(\bar \lambda\), and *σ* per lineage empirically and confirm the predicted scaling of growth rate fluctuations, as shown in Fig. 4d.

To conclude this section on fluctuations, we briefly mention that, under mild conditions, it is possible to study the dynamical response of the network in the linear regime under small perturbations. For example, ref.^{26} introduced a simple biologically motivated dynamics of diffusion-replication inside the metabolic space, described by the one-parameter equation for the growth rate distribution *Q*(*λ*), showing that the following fluctuation-dissipation scaling laws should hold for the typical response times as well as the growth rate fluctuation autocorrelation time *τ*:^{29}

This relation, which further extends Eq. (3), is experimentally testable and predicts a divergent slowing down of the response time with growth rate maximization. As a consequence, growth rate fluctuations could take on a functional role in speeding up the response to environmental perturbations, e.g., to nutritional up-shifts or externally applied stresses. Even if the experimental test of such dynamic predictions is beyond the scope of this paper, our model connects to a wide range of currently ongoing metabolism- and growth-related investigations.

## Discussion

In this work, we considered maximum entropy distributions at fixed average growth rate in the space of metabolic phenotypes, a straightforward and statistically rigorous extension of the FBA, which is recovered in the asymptotic limit. Experimental estimates of enzymatic fluxes of the central carbon core metabolism in bulk cultures of *E. coli*, as well as empirical growth rate distributions of *E. coli* collected from single-cell measurements, are consistent with an intermediate level of growth optimization (*βλ*_{max} ~ 10^{2} and \(\bar \lambda /\lambda _{{\mathrm{max}}}\sim 0.8\)). We find that variability can be captured by a simple maximum entropy model, and that the zero-fluctuation FBA limit qualitatively misses important experimental facts, e.g., the observed non-zero fluxes through the glyoxylate shunt.

The improved ability of our model to match the flux measurements is a consequence of a single extra parameter, *β*, which can easily be determined from existing experimental data. Beyond a better fit, however, our model also makes a wide range of predictions, extending the domain of metabolic network analysis to the single-cell level. While it is difficult to measure the single-cell metabolic fluxes and their variability in isogenic populations in steady state, such measurements for the growth rate are increasingly available. This connection enables the new predictions of our theory to be tested, and opens up the theory for verifiable extensions. Validating the predicted scaling of growth rate fluctuations in Fig. 4 is only the first step, with two broad lines of investigation within reach.

First, our approach is not limited to the core catabolism analyzed here or to bacterial metabolism, but can in principle be extended to other genome-scale networks. In practice, however, we often lack suitable large-scale experimental flux measurements. It is also likely that physico-chemical constraints alone are insufficient to yield quantitatively accurate predictions. Similar issues also arise in the core catabolism for high growth rates that exceed the threshold of the acetate switch, for which a trade-off between growth yield and rate emerges and additional constraints have to be added in FBA-based approaches^{38}. Our method can be extended to accommodate such cases, or systems where strict growth maximization is likely not a suitable objective. Extra objectives or constraints in the maximum entropy model would appear as additional terms in the exponent of Eq. (7), where their corresponding parameters would control various trade-offs between the objectives. This flexibility may be required to model metabolic dependencies, cell type heterogeneity, or interactions between cells.

Such extensions of maximum entropy modeling will benefit from the recent flourishing of statistical-physics-inspired algorithms, including belief propagation^{39,40,41}, relaxational learning^{42}, and the Gaussian analytical approximation^{43}, used to solve the sampling problem, which is computationally at the heart of our approach (whereas FBA relies on simpler linear programming). While the employed hit-and-run Markov chain Monte Carlo is sufficient for the analyzed network, faster methods (in particular ref.^{43}) will pave the way to large-scale applications and inverse modeling settings.

Second, we considered two possible mechanisms for the emergence of maximum entropy distributions over metabolic fluxes. On the one hand, analysis of population dynamics of competitive growth under resource constraints reveals that maximum entropy distributions at fixed average growth rate are the steady states of logistic growth, in principle giving further testable predictions on the dependence of the growth optimization on inoculum size and medium carrying capacity. In essence, it is the exponential character of the growth laws that leads naturally to Boltzmann distributions. Despite the same functional form, this is in contrast to the standard case of statistical mechanics at equilibrium, where the link between molecular dynamics and the equilibrium distribution is non-trivial. On the other hand, localization of growth distributions toward the optimum can also arise due to the active regulation of metabolic enzymes, most likely those involved in catalyzing irreversible reactions. These two mechanisms are not mutually exclusive and can actually operate concurrently; in our estimate, the purely population-dynamics scenario substantially underpredicts growth rate optimization (i.e., *βλ*_{max}) for the bulk culture, likely because it completely disregards active regulation of metabolism, which is known to be important. Note that the two mechanisms are, at least in principle, distinguishable experimentally: in the mother-machine device, there is no competition across the independent microfluidic channels, putting us in the regime with a very small *N*_{C}/*N*_{0}, which predicts smaller *β*^{*}*λ*_{max} than in the bulk, qualitatively in line with the observations. In other words, in bulk, both single-cell regulation and competitive growth may be active simultaneously, leading to higher growth rate optimization than in single-cell microfluidics measurements, where competitive growth is nearly absent. Further investigations are required to tease apart these two contributions quantitatively, in particular allowing for growth state transitions in modeling.

The connection between maximum entropy models and fluctuation-dissipation relations, Eq. (4), requires further assumptions that need to be tested separately, but makes a very strong prediction about the relationship between the autocorrelation time of growth fluctuations and the typical response time to, e.g., nutrient shifts. This relationship appears fundamental, since the response time is a central biological quantity measurable in bulk, while the fluctuation autocorrelations are microscopic, single-cell properties, which can be measured with recent experimental setups. Interestingly, the predicted response times lengthen with the degree of growth rate optimization, suggesting a trade-off between responsiveness to changes and efficiency in steady state; as a consequence, it is unclear whether the evolutionarily optimal outcome should be equated to complete growth rate optimization with no fluctuations, i.e., the FBA limit. Quantitatively, in stable environments where *E. coli* grows well and possibly achieves a high degree of growth rate optimization, one could experimentally look for signatures of long-timescale fluctuations, either directly in the growth signal, or by proxy through constitutive gene expression. Curiously, the parameter *β* of our model has the dimension of time, and its best-fit value inferred from *E. coli* data is of the order of 1 day.

Beyond extensions to dynamics, our analysis made two further theoretical contributions. First, it clarified the relative roles of stoichiometric constraints and the growth optimization assumption in FBA. The maximum entropy model is an explicit construction of a smooth interpolation between the uniform regime (where only stoichiometric constraints are active) and the FBA (where growth is maximized in addition). The uniform limit, where no control is exerted by the cell, is a natural baseline against which to compare the observed fluxes, their fluctuations, and correlations, as we have done in Fig. 3. Without this baseline comparison, it is hard to assess how surprising the observations of metabolic optimality should really be^{3}. Our second theoretical contribution is the observation that a certain minimal information is needed to achieve a desired growth rate (Fig. 3g, h). This information is expressed in the same currency (bits) in which we measure the performance of regulatory networks, enabling us to suggest a trade-off that sets the optimal degree of metabolic control. In contrast to other cellular networks, where estimation of information has only been done for single network components or simple pathways^{36}, the metabolic network is the sole case where we could estimate the lower bound on the required number of regulatory bits. Our statistical mechanics approach thus opens a connection between metabolic networks and their regulatory counterparts, which is of theoretical interest and could also be probed in comparative genomic studies.

## Methods

### General background

We consider the set of reactions in the well-mixed, continuum limit. Let *S*_{iμ} be the stoichiometric coefficient of the metabolite *μ* (whose concentration is *c*_{μ}) in reaction *i*, whose flux is *v*_{i}. The metabolic network dynamics is then given by mass balance equations:

\[
\dot c_\mu = \mathop {\sum}\limits_i S_{i\mu }v_i. \tag{5}
\]

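Assuming steady state, \(\dot c_\mu = 0\), and including further constraints from thermodynamics, nutrient availability, and kinetic limits in the form of lower (LB) and upper (UB) bounds on fluxes, we obtain a convex polytope \({\cal P}\) of feasible steady states (metabolic phenotypes) in the space of fluxes:

\[
{\cal P} = \left\{ {\mathbf{v}}:\ \mathop {\sum}\limits_i S_{i\mu }v_i = 0\ \ \forall \mu ,\quad v_i \in \left[ {\mathrm{LB}}_i,{\mathrm{UB}}_i \right]\ \ \forall i \right\}. \tag{6}
\]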

In addition to bona fide, well-balanced chemical reactions, constraint-based models often include a phenomenological biomass reaction in the form of a linear combination of metabolite fluxes, \(\lambda ({\mathbf{v}}) = \mathop {\sum}\nolimits_i \xi _iv_i\), where the proportions *ξ*_{i} are set to mimic cell growth, i.e., the metabolite fluxes necessary to reconstitute the biomass of a new cell in a typical division time.

### The network

The network employed in the study is the catabolic core of the genome-scale reconstruction iAF1260 (see Supplementary Methods), in a glucose-limited minimal medium in aerobic conditions^{44}. The network comprises *N* = 86 reactions among *M* = 68 metabolites and includes glycolysis, pentose phosphate pathway, TCA cycle, oxidative phosphorylation, and nitrogen catabolism. The dimension of the resulting polytope \({\cal P}\) of allowed steady states is *D* = 23, from which we can efficiently draw flux configurations using hit-and-run Markov chain Monte Carlo after suitable preprocessing^{27} (see Supplementary Methods).
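The idea behind hit-and-run sampling can be illustrated on a toy polytope. The sketch below is not the implementation used in the study: it assumes the polytope has already been reduced to a full-dimensional inequality form \(A{\mathbf{x}} \le {\mathbf{b}}\) (the preprocessing of ref.^{27} may differ), uses a 3-dimensional simplex as a stand-in, and all function names are hypothetical. An optional exponential weight \(e^{\beta \,{\mathbf{o}} \cdot {\mathbf{x}}}\) along a linear objective is included; with *β* = 0 the sampling is uniform.

```python
import math
import random


def chord(A, b, x, d):
    """Return (t_lo, t_hi) such that x + t*d stays inside {y : A y <= b}."""
    t_lo, t_hi = -math.inf, math.inf
    for row, b_k in zip(A, b):
        a_d = sum(a * di for a, di in zip(row, d))
        slack = b_k - sum(a * xi for a, xi in zip(row, x))
        if a_d > 1e-12:
            t_hi = min(t_hi, slack / a_d)
        elif a_d < -1e-12:
            t_lo = max(t_lo, slack / a_d)
    return t_lo, t_hi


def hit_and_run(A, b, x0, n_samples, objective=None, beta=0.0, thin=5, seed=0):
    """Sample a density proportional to exp(beta * objective . x) on {x : A x <= b}."""
    rng = random.Random(seed)
    x = list(x0)
    samples = []
    for step in range(n_samples * thin):
        # Random direction, uniform on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(di * di for di in d))
        d = [di / norm for di in d]
        t_lo, t_hi = chord(A, b, x, d)
        length = t_hi - t_lo
        # Along the chord the density is proportional to exp(c * t).
        c = beta * sum(o * di for o, di in zip(objective, d)) if objective else 0.0
        u = 1.0 - rng.random()  # in (0, 1], keeps the logarithm finite
        if abs(c) * length < 1e-12:
            t = t_lo + u * length  # uniform on the chord
        else:
            # Truncated exponential: s is the distance from the favoured end.
            s = -math.log(u + (1.0 - u) * math.exp(-abs(c) * length)) / abs(c)
            s = min(s, length)
            t = t_hi - s if c > 0 else t_lo + s
        x = [xi + t * di for xi, di in zip(x, d)]
        if (step + 1) % thin == 0:
            samples.append(x)
    return samples


# Toy polytope (assumption): x_i >= 0 and x_1 + x_2 + x_3 <= 1.
A = [[-1, 0, 0], [0, -1, 0], [0, 0, -1], [1, 1, 1]]
b = [0, 0, 0, 1]
start = [0.25, 0.25, 0.25]
uniform = hit_and_run(A, b, start, 4000)
tilted = hit_and_run(A, b, start, 4000, objective=[1, 0, 0], beta=20.0)
mean_uniform = sum(x[0] for x in uniform) / len(uniform)
mean_tilted = sum(x[0] for x in tilted) / len(tilted)
```

On this toy example the mean of the first coordinate should sit near 1/4 under uniform sampling and move toward the vertex at 1 once the exponential weight is switched on. Sampling the chord exactly, rather than by rejection, keeps the chain efficient at large *β*.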

### Maximum entropy modeling

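FBA looks for the flux configuration **v**_{max} that maximizes the growth rate, *λ*_{max} = *λ*(**v**_{max}), subject to the constraints given by Eq. (6); this configuration can be found efficiently by linear programming. In contrast, our maximum entropy approach starts with a distribution over fluxes with a Boltzmann form, which assumes that the fluxes are as random as possible while achieving a desired average growth rate^{26}:

\[P_\beta ({\mathbf{v}}) = \frac{e^{\beta \lambda ({\mathbf{v}})}}{Z(\beta )}\ \ {\rm for}\ {\mathbf{v}} \in {\cal P}, \qquad P_\beta ({\mathbf{v}}) = 0\ \ {\rm otherwise}, \qquad Z(\beta ) = \int_{\cal P} d{\mathbf{v}}\, e^{\beta \lambda ({\mathbf{v}})}. \tag{7}\]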

The parameter, *β*, of the distribution *P* can then be set to match the predicted average growth rate to the measured growth rate, *λ*_{data}:

\[\bar \lambda (\beta ) = \int_{\cal P} d{\mathbf{v}}\, \lambda ({\mathbf{v}})\, P_\beta ({\mathbf{v}}) = \lambda _{\rm data}. \tag{8}\]

Once *β* is fixed, the joint distribution of Eq. (7) can be queried for average fluxes, flux correlations, or other quantities of interest that we discuss later.
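
As an illustration of how *β* can be fixed from a measured growth rate, the following Python sketch fits *β* by bisection on a toy model. It assumes a Boltzmann weight proportional to \(e^{\beta \lambda}\) and replaces the polytope by a hypothetical grid of equally weighted states with growth rates between 0 and 1; the names `mean_growth` and `fit_beta` and the target value 0.8 are illustrative, not taken from the study.

```python
import math

def mean_growth(beta, growth_rates):
    """Average growth rate under weights proportional to exp(beta * lambda)."""
    shift = max(growth_rates)  # subtract the maximum to avoid overflow at large beta
    weights = [math.exp(beta * (lam - shift)) for lam in growth_rates]
    z = sum(weights)
    return sum(lam * w for lam, w in zip(growth_rates, weights)) / z

def fit_beta(lambda_data, growth_rates, beta_hi=1e4, tol=1e-10):
    """Find beta >= 0 such that mean_growth(beta) equals lambda_data, by bisection."""
    lo, hi = 0.0, beta_hi
    if not mean_growth(lo, growth_rates) <= lambda_data <= mean_growth(hi, growth_rates):
        raise ValueError("target growth rate is outside the attainable range")
    # The mean growth rate increases with beta (its derivative is the growth rate variance).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_growth(mid, growth_rates) < lambda_data:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy state space (hypothetical): 101 equally weighted states with growth rates in [0, 1].
rates = [k / 100 for k in range(101)]
beta_fit = fit_beta(0.8, rates)
```

Bisection is sufficient here because the average growth rate is a monotonically increasing function of *β*. For the actual network, the average in `mean_growth` would instead be estimated from samples of the polytope.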

The maximum entropy distribution with a constrained average growth rate has two interesting limits, as illustrated in Fig. 1. The average growth rate, \(\bar \lambda\), increases with *β* (which we will refer to as an optimization parameter) until, in the limit *β* → ∞, the distribution *P*_{∞}(**v**) collapses into a delta function at **v**_{max}, lying at the boundary of the polytope \({\cal P}\): this is the FBA solution that supports the maximal growth rate *λ*_{max}. Conversely, as *β* → 0, Eq. (7) yields a uniform sampling of fluxes over the permitted polytope \({\cal P}\): this uniform solution is an interesting baseline case for comparison because it incorporates all stoichiometric constraints but postulates no growth rate optimization. In the language of statistical physics, the high-*β* regime (limiting toward the FBA solution) corresponds to the energy-dominated regime, while the low-*β* regime (limiting toward the uniform sampling) corresponds to the entropy-dominated regime; the optimization parameter *β* corresponds to the inverse temperature.

Apart from generic information-theoretic arguments put forward by Jaynes in support of the maximum entropy approach^{20}, are there further justifications for using the Boltzmann form in Eq. (7) that would be specific to the case of metabolic networks? Below, we consider two non-exclusive possibilities: active regulation at the single-cell level and competitive growth dynamics in a population.

### Information costs of regulation

The first possibility is to mechanistically interpret the deviation of flux distributions away from the uniform sampling of the polytope \({\cal P}\) and its localization around the optimal solution as a consequence of the active regulation in the metabolic network. Such regulation could be achieved by, e.g., control over gene expression of key metabolic enzymes, or by allosteric feedback regulation mediated by metabolite concentrations or fluxes. The degree of localization of the flux distribution can be quantified by its entropy:

\[S(\beta ) = - \int_{\cal P} d{\mathbf{v}}\, P_\beta ({\mathbf{v}})\log _2P_\beta ({\mathbf{v}}). \tag{9}\]

Because *P*_{β}(**v**) is, by construction, a maximum entropy distribution with average growth rate \(\bar \lambda (\beta )\), the decrease in entropy^{26},

\[I(\beta ) = S(0) - S(\beta ), \tag{10}\]

is a measure for the minimal amount of information necessary to control the fluxes and achieve a given average growth rate. Equivalently, if we were to construct a regulation system that needs to realize the Boltzmann distribution of Eq. (7), *I*(*β*) would provide a lower bound on its information demand. In the Results section, we estimate this information demand for the *E. coli* network and propose a toy regulatory model that can meet it.
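
The entropy decrease can be computed directly on a toy model. The Python sketch below assumes a hypothetical grid of equally weighted states, so that the discrete Shannon entropy difference stands in for the entropy difference over the polytope, and it takes *I*(*β*) to be the entropy at *β* = 0 minus the entropy at *β*, measured in bits. The function names and the grid are illustrative assumptions.

```python
import math

def boltzmann(beta, growth_rates):
    """Probabilities proportional to exp(beta * lambda) over a list of states."""
    shift = max(growth_rates)  # subtract the maximum to avoid overflow
    weights = [math.exp(beta * (lam - shift)) for lam in growth_rates]
    z = sum(weights)
    return [w / z for w in weights]

def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability states."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def information_cost_bits(beta, growth_rates):
    """Entropy decrease relative to the uniform (beta = 0) distribution."""
    uniform_entropy = entropy_bits(boltzmann(0.0, growth_rates))
    return uniform_entropy - entropy_bits(boltzmann(beta, growth_rates))

# Toy state space (hypothetical): 101 equally weighted states with growth rates in [0, 1].
rates = [k / 100 for k in range(101)]
betas = [0.0, 1.0, 5.0, 20.0]
costs = [information_cost_bits(beta, rates) for beta in betas]
```

In this toy model the cost is zero at *β* = 0 and grows with *β*, bounded above by the logarithm of the number of states; reaching higher average growth rates therefore requires more bits of control.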

### Competitive growth dynamics

The second possibility is that the Boltzmann distribution emerges from competitive growth dynamics. Since its historical origins in statistical physics, much research has been devoted to uncovering the dynamical roots of Boltzmann distributions, whose study highlighted important concepts and applications, ranging from ergodicity to fluctuation-response relations. The same questions naturally arise in the context of its application in metabolism. It has been shown that the maximum entropy distribution at a fixed average growth rate is recovered independently and justified dynamically as the steady state of logistic growth^{28}. Since the logistic growth is the standard model used to experimentally fit optical density curves^{45}, this link also provides a possible interpretation of the maximum entropy parameter *β*, as we discuss below.

Consider a population of initial size *N*_{0} in a medium with carrying capacity *N*_{C} and assume that the intrinsic growth rates of individuals, *λ*_{i}, are sampled independently from a distribution *q*(*λ*), defined over the feasible polytope \({\cal P}\). In the simplest setting, upon neglecting growth state transitions, the number *n*_{i} of cells with growth rate *λ*_{i} will evolve in time according to

\[\dot n_i = \lambda _in_i\left( 1 - \frac{1}{N_C}\sum_j n_j \right). \tag{11}\]

Then, \(n_i(t) = e^{\beta (t)\lambda _i}\), with

\[\beta (t) = \int_0^t dt'\left( 1 - \frac{1}{N_C}\sum_j n_j(t') \right). \tag{12}\]

Under a mean field approximation^{28}, the steady states of these dynamics are distributions with maximum entropy form at a fixed average growth rate, where the asymptotic optimization parameter, \(\beta ^ \star\), is given implicitly by the equation

\[\frac{N_C}{N_0} = \int d\lambda \, q(\lambda )\, e^{\beta ^ \star \lambda }. \tag{13}\]

Equation (13) can be viewed as a relationship between quantities that can be independently estimated for a specific experimental setup: the inoculum size (*N*_{0}) and carrying capacity (*N*_{C}) on the one hand, as well as the typical value of *β*, via Eq. (8) or direct fitting of measured metabolic fluxes, on the other.
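
To make this relationship concrete, the following Python sketch solves for \(\beta ^ \star\) numerically on a toy example. It assumes that the relation takes the form \(N_C/N_0 = \int d\lambda \, q(\lambda )e^{\beta ^ \star \lambda }\), which is what the steady-state condition \(\sum_i n_i = N_C\) gives when \(n_i = e^{\beta \lambda _i}\), and it replaces the integral by an average over a hypothetical list of growth rates. The function names and the ratio *N*_{C}/*N*_{0} = 1000 are illustrative assumptions, not values from the experiments.

```python
import math

def growth_factor(beta, growth_rates):
    """Sample estimate of the integral of q(lambda) * exp(beta * lambda)."""
    return sum(math.exp(beta * lam) for lam in growth_rates) / len(growth_rates)

def solve_beta_star(ratio, growth_rates, tol=1e-10):
    """Find beta >= 0 with growth_factor(beta) equal to ratio = N_C / N_0, by bisection."""
    if ratio < 1.0:
        raise ValueError("carrying capacity must be at least the inoculum size")
    if max(growth_rates) <= 0.0:
        raise ValueError("at least one growth rate must be positive")
    # Bracket the root: growth_factor equals 1 at beta = 0 and increases with beta.
    lo, hi = 0.0, 1.0
    while growth_factor(hi, growth_rates) < ratio:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if growth_factor(mid, growth_rates) < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Toy input (hypothetical): growth rates spread evenly over (0, 1], N_C / N_0 = 1000.
rates = [k / 100 for k in range(1, 101)]
ratio = 1000.0
beta_star = solve_beta_star(ratio, rates)
```

Here `beta_star` carries units of inverse growth rate. Comparing such a value with the *β* obtained from fitting the maximum entropy model is the kind of consistency check the relation allows.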

Taken together, the two mechanisms, active regulation and competitive growth dynamics, need not be exclusive, and can operate concurrently. A simple diagnostic that could provide insight into the relative importance of both mechanisms is to examine whether the relationship of Eq. (13) is satisfied. If it were, it would suggest that the Boltzmann distribution is dynamical in origin. If, on the other hand, the values of *β* inferred from fitting the maximum entropy model were higher than those derived from the *N*_{C}/*N*_{0} ratio and Eq. (13), additional active regulation may be at work. In the Results section and Supplementary Methods, we provide estimates of these quantities for the experiments under consideration.

### Code availability

C++ code implementing the Lovász preprocessing, the Hit-and-Run algorithm, and the polytope representation of the metabolic network used in this study is available at doi:10.15479/AT:ISTA:62. Please refer to the README.txt file for further information.

### Data availability

The metabolic network employed in this study is the catabolic core of the genome-scale reconstruction iAF1260; it is available in the Supplementary materials of the published reconstruction work^{44}. The experimental estimates of the metabolic fluxes can be retrieved from the CeCaFDB database^{46} (doi:10.1093/nar/gku1137; see also the Supplementary Methods and Supplementary Data 1). Single-cell growth rate data are available from the Supplementary materials of ref. ^{37}.

## Change history

### 04 September 2018

This Article was originally published without the accompanying Peer Review File. This file is now available in the HTML version of the Article; the PDF was correct from the time of publication.

## References

1. Kacser, H. & Burns, J. A. The control of flux. *Biochem. Soc. Trans.* **23**, 341–366 (1995).
2. Orth, J., Thiele, I. & Palsson, B. O. What is flux balance analysis? *Nat. Biotechnol.* **28**, 245–248 (2010).
3. Ibarra, R. U., Edwards, J. S. & Palsson, B. O. Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. *Nature* **420**, 186–189 (2002).
4. Edwards, J. S., Ibarra, R. U. & Palsson, B. O. In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. *Nat. Biotechnol.* **19**, 125–130 (2001).
5. Majewski, R. A. & Domach, M. M. Simple constrained-optimization view of acetate overflow in E. coli. *Biotechnol. Bioeng.* **35**, 732–738 (1990).
6. Varma, A. & Palsson, B. O. Stoichiometric flux balance models quantitatively predict growth and metabolic by-product secretion in wild-type Escherichia coli W3110. *Appl. Environ. Microbiol.* **60**, 3724–3731 (1994).
7. Edwards, J. & Palsson, B. Metabolic flux balance analysis and the in silico analysis of Escherichia coli K-12 gene deletions. *BMC Bioinformatics* **1**, 1–1 (2000).
8. Vazquez, A., Liu, J., Zhou, Y. & Oltvai, Z. N. Catabolic efficiency of aerobic glycolysis: the Warburg effect revisited. *BMC Syst. Biol.* **4**, 58 (2010).
9. Wang, P. et al. Robust growth of Escherichia coli. *Curr. Biol.* **20**, 1099–1103 (2010).
10. Iyer-Biswas, S., Crooks, G. E., Scherer, N. F. & Dinner, A. R. Universality in stochastic exponential growth. *Phys. Rev. Lett.* **113**, 028101 (2014).
11. Iyer-Biswas, S. Scaling laws governing stochastic growth and division of single bacterial cells. *Proc. Natl Acad. Sci. USA* **111**, 15912–15917 (2014).
12. Kennard, A. S. et al. Individuality and universality in the growth-division laws of single E. coli cells. *Phys. Rev. E* **93**, 012408 (2016).
13. Naama, B. et al. Universal protein distributions in a model of cell growth and division. *Phys. Rev. E* **92**, 042713 (2015).
14. Taheri-Araghi, S. et al. Cell-size control and homeostasis in bacteria. *Curr. Biol.* **25**, 385–391 (2015).
15. Amir, A. Cell size regulation in bacteria. *Phys. Rev. Lett.* **112**, 208102 (2014).
16. Kiviet, D. J. et al. Stochasticity of metabolism and growth at the single-cell level. *Nature* **514**, 376–379 (2014).
17. Shahrezaei, V. & Marguerat, S. Connecting growth with gene expression: of noise and numbers. *Curr. Opin. Microbiol.* **25**, 127–135 (2015).
18. Keren, L. et al. Noise in gene expression is coupled to growth rate. *Genome Res.* **25**, 1893–1902 (2015).
19. Cerulus, B., New, A. M., Pougach, K. & Verstrepen, K. J. Noise and epigenetic inheritance of single-cell division times influence population fitness. *Curr. Biol.* **26**, 1138–1147 (2016).
20. Jaynes, E. T. Information theory and statistical mechanics. *Phys. Rev.* **106**, 620 (1957).
21. Schneidman, E., Berry, M. J., Segev, R. & Bialek, W. Weak pairwise correlations imply strongly correlated network states in a neural population. *Nature* **440**, 1007–1012 (2006).
22. Tkačik, G. et al. Searching for collective behavior in a large network of sensory neurons. *PLoS Comput. Biol.* **10**, e1003408 (2014).
23. Lezon, T. R., Banavar, J. R., Cieplak, M., Maritan, A. & Fedoroff, N. V. Using the principle of entropy maximization to infer genetic interaction networks from gene expression patterns. *Proc. Natl Acad. Sci. USA* **103**, 19033–19038 (2006).
24. Mora, T., Walczak, A. M., Bialek, W. & Callan, C. G. Maximum entropy models for antibody diversity. *Proc. Natl Acad. Sci. USA* **107**, 5405–5410 (2010).
25. Bialek, W. et al. Statistical mechanics for natural flocks of birds. *Proc. Natl Acad. Sci. USA* **109**, 4786–4791 (2012).
26. De Martino, D., Capuani, F. & De Martino, A. Growth against entropy in bacterial metabolism: the phenotypic trade-off behind empirical growth rate distributions in E. coli. *Phys. Biol.* **13**, 036005 (2016).
27. De Martino, D., Mori, M. & Parisi, V. Uniform sampling of steady states in metabolic networks: heterogeneous scales and rounding. *PLoS ONE* **10**, e0122670 (2015).
28. De Martino, D., Capuani, F. & De Martino, A. Quantifying the entropic cost of cellular growth control. *Phys. Rev. E* **96**, 010401 (2017).
29. De Martino, D. & Masoero, D. Asymptotic analysis of noisy fitness maximization, applied to metabolism and growth. *J. Stat. Mech. Theory Exp.* **2016**, 123502 (2016).
30. De Martino, D. Maximum entropy modeling of metabolic networks by constraining growth-rate moments predicts coexistence of phenotypes. *Phys. Rev. E* **96**, 060401 (2017).
31. De Martino, D. Scales and multimodal flux distributions in stationary metabolic network models via thermodynamics. *Phys. Rev. E* **95**, 062419 (2017).
32. Kümmel, A., Panke, S. & Heinemann, M. Putative regulatory sites unraveled by network-embedded thermodynamic analysis of metabolome data. *Mol. Syst. Biol.* **2**, 1 (2006).
33. Sakamoto, N., Kotre, A. M. & Savageau, M. A. Glutamate dehydrogenase from Escherichia coli: purification and properties. *J. Bacteriol.* **124**, 775–783 (1975).
34. Slonim, N., Atwal, G. S., Tkačik, G. & Bialek, W. Information-based clustering. *Proc. Natl Acad. Sci. USA* **102**, 18297–18302 (2005).
35. Tkačik, G. & Walczak, A. M. Information transmission in genetic regulatory networks: a review. *J. Phys. Condens. Matter* **23**, 153102 (2011).
36. Tkačik, G. & Bialek, W. Information processing in living systems. *Annu. Rev. Condens. Matter Phys.* **7**, 89–117 (2016).
37. Bergmiller, T. et al. Biased partitioning of the multidrug efflux pump AcrAB-TolC underlies long-lived phenotypic heterogeneity. *Science* **356**, 311–315 (2017).
38. Mori, M., Hwa, T., Martin, O. C., De Martino, A. & Marinari, E. Constrained allocation flux balance analysis. *PLoS Comput. Biol.* **12**, e1004913 (2016).
39. Fernandez-de-Cossio-Diaz, J. & Mulet, R. Fast inference of ill-posed problems within a convex space. *J. Stat. Mech. Theory Exp.* **7**, 073207 (2016).
40. Alessandro Massucci, F., Font-Clos, F., De Martino, A. & Pérez Castillo, I. A novel methodology to estimate metabolic flux distributions in constraint-based models. *Metabolites* **3**, 838–852 (2013).
41. Font-Clos, F., Massucci, F. A. & Castillo, I. P. A weighted belief-propagation algorithm to estimate volume-related properties of random polytopes. *J. Stat. Mech. Theory Exp.* **2012**, P11003 (2012).
42. Martelli, C., De Martino, A., Marinari, E., Marsili, M. & Castillo, I. P. Identifying essential genes in Escherichia coli from a metabolic optimization principle. *Proc. Natl Acad. Sci. USA* **106**, 2607–2611 (2008).
43. Braunstein, A., Muntoni, A. P. & Pagnani, A. An analytic approximation of the feasible space of metabolic networks. *Nat. Commun.* **8**, 14915 (2017).
44. Orth, J. D. et al. A comprehensive genome-scale reconstruction of Escherichia coli metabolism—2011. *Mol. Syst. Biol.* **7**, 535 (2011).
45. Baranyi, J. & Roberts, T. A. A dynamic approach to predicting bacterial growth in food. *Int. J. Food Microbiol.* **23**, 277–294 (1994).
46. Zhang, Z. et al. CeCaFDB: a curated database for the documentation, visualization and comparative analysis of central carbon metabolic flux distributions explored by 13C-fluxomics. *Nucleic Acids Res.* **43**, D549–D557 (2014).

## Acknowledgements

We acknowledge the support of the Austrian Science Fund grant FWF P28844 (G.T.) and of the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007-2013) under REA grant agreement no. 291734 (D.D.M.).

## Author information

### Affiliations

#### Institute of Science and Technology Austria, Am Campus 1, A-3400, Klosterneuburg, Austria

- Daniele De Martino
- Anna MC Andersson
- Tobias Bergmiller
- Călin C. Guet
- Gašper Tkačik

### Contributions

D.D.M. and G.T. conceived the study and developed the theory. D.D.M. performed the simulations. A.A. and D.D.M. carried out the data analysis. C.G. and T.B. carried out the experiments. All authors contributed to writing the final manuscript.

### Competing interests

The authors declare no competing interests.

### Corresponding author

Correspondence to Daniele De Martino.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
