## Abstract

Phenotypic variation is a hallmark of cellular physiology. Metabolic heterogeneity, in particular, underpins single-cell phenomena such as microbial drug tolerance and growth variability. Much research has focussed on transcriptomic and proteomic heterogeneity, yet it remains unclear if such variation permeates to the metabolic state of a cell. Here we propose a stochastic model to show that complex forms of metabolic heterogeneity emerge from fluctuations in enzyme expression and catalysis. The analysis predicts clonal populations to split into two or more metabolically distinct subpopulations. We reveal mechanisms not seen in deterministic models, in which enzymes with unimodal expression distributions lead to metabolites with a bimodal or multimodal distribution across the population. Based on published data, the results suggest that metabolite heterogeneity may be more pervasive than previously thought. Our work casts light on links between gene expression and metabolism, and provides a theory to probe the sources of metabolite heterogeneity.

## Introduction

Cellular heterogeneity is ubiquitous across all domains of life. In microbes, clonal populations display phenotypic variability as a result of multiple factors such as fluctuations in the microenvironment, stochasticity in gene expression, or asymmetric partitioning at cell division^{1,2,3}. Variability is well recognised at the transcriptional and translational levels. Yet various single-cell phenomena result from the emergence of distinct metabolic states within a clonal population. For example, metabolic heterogeneity plays a key role in antibiotic tolerance^{4,5,6}, heterogeneous nutrient uptake^{7,8}, and variations in growth rate^{9,10}. It has also been shown that nutrient shifts can cause populations to split into two^{11,12} or more^{13} subpopulations with distinct growth abilities. The emergence of subpopulations has been theorised as a bet-hedging strategy that gives an evolutionary advantage for survival in adverse environments^{4,14}.

A central challenge to quantify metabolic variability is the lack of techniques for measuring metabolites with single-cell resolution^{15}. In contrast to single-cell measurements of protein expression, for which powerful reporter systems have been developed^{16,17}, quantification of metabolites in single-cells remains a major challenge. Some of the techniques employed so far include Förster resonance energy transfer (FRET) sensors^{18}, metabolite-responsive transcription factors^{19,20}, RNA sensors^{21}, and mass-spectrometry^{22}, yet most of these technologies are in the early stages of development^{15}. As a result, metabolic heterogeneity is typically quantified indirectly via measurements of metabolic enzymes or growth rate in single-cells^{9,12,23}.

Our objective in this paper is to characterise heterogeneity in metabolites as a result of stochastic enzyme expression and catalysis. Metabolic models traditionally assume that enzymatic reactions behave deterministically on the basis that both enzymes and metabolites appear in high molecule numbers^{24}. However, single-cell proteomics in *Escherichia coli* show that metabolic enzymes are as variable as any other member of the proteome^{17}, while metabolomics data suggest that average metabolite abundances span several orders of magnitude^{25}. The few datasets on single-cell metabolite abundance already suggest substantial variability in some metabolites in *E. coli*^{19,26}. Such evidence casts doubt on the traditional assumption of metabolism being a purely deterministic process, suggesting a link between fluctuations in enzyme expression and metabolites.

The role of stochastic gene expression in protein variability has been well studied^{2,3,27,28,29}, but the impact of such randomness on metabolic reactions remains much less understood. Various theoretical studies have analysed the impact of fluctuations in the supply and consumption of metabolites^{30,31,32}, or the propagation of enzyme noise to a metabolite^{33}. However, despite mounting experimental evidence of stochastic effects in metabolism, mathematical models still lack the sufficient detail to integrate the processes that are known to shape protein heterogeneity, such as stochastic promoter switching and transcriptional bursting.

In this paper, we propose a model for metabolite heterogeneity in single-cells. The model integrates stochasticity in enzyme catalysis^{24} and expression^{27}, two well-established processes that so far have been studied in isolation. Our approach includes a stochastic formulation of various relevant mechanisms in enzymatic reactions, including reversible catalysis, stochastic switching of promoter activity, fluctuations in mRNA transcripts, and consumption of the enzymatic product by downstream processes.

We probe the model for various sources of stochasticity using simulations and analytical solutions for the stationary distribution of the metabolite. The analysis reveals intricate patterns of heterogeneity that translate into bimodal and multimodal distributions for the number of metabolite molecules. These phenomena arise from the interplay between a lowly abundant enzyme and its catalytic parameters. Under the separation of timescales typical of metabolic reactions, we show that metabolite distributions can be accurately approximated by a Poisson mixture model (PMM) across large regions of the parameter space. The mixture model can be readily adapted to a wide class of gene expression models and provides a quantitative tool to predict metabolite variability from enzyme measurements in single-cells.

## Results

### Stochastic model of an enzymatic reaction

We consider a model that combines enzyme kinetics and enzyme expression into a single stochastic description (Fig. 1a). The model includes an enzymatic reaction with standard Michaelis–Menten kinetics, in which substrate and enzyme bind reversibly to form a complex that undergoes reversible catalysis into a metabolite. We assume that enzyme expression follows the well-established three-stage model for gene expression^{3,27}, where a single copy gene switches stochastically between an inactive state (*D*_{off}) and active state (*D*_{on}). In the active state, mRNAs are transcribed and translated into protein. The model also includes consumption of the metabolite by downstream pathways, degradation of mRNA transcripts, and dilution by the growth of all species. Since metabolic reactions operate far from thermodynamic equilibrium, we assume that the substrate pool remains constant so that the system reaches a non-zero flux, e.g. when the substrate is a highly abundant extracellular carbon source or a slowly varying intracellular metabolite. The model reactions are shown in Eqs. (R1)–(R9) in the Methods section.

To investigate the emergence of metabolic heterogeneity, we need to compute the stationary probability distribution of metabolite molecules (*n*_{p}) for relevant combinations of model parameters. Figure 1b shows a typical simulation of the model obtained with Gillespie’s algorithm^{34}. A key challenge for such simulations, however, is the multiscale nature of enzymatic reactions: not only do metabolic reactions operate in a much faster timescale (milliseconds) than enzyme expression (tens of minutes)^{30,35,36}, but also the average number of enzymes is much lower than the number of metabolites. These multiple scales result in reaction propensities that differ by several orders of magnitude, thus leading to extremely slow simulations which make the exploration of the parameter space infeasible. An alternative is to use simulation algorithms that exploit the separation of scales to increase computational speed, such as tau-leaping or slow-scale approximations^{37}. Yet in our case it is unclear how such numerical approximations impact the predictions drawn from the simulations.

To determine the impact of genetic and catalytic parameters on metabolic heterogeneity, we obtained an analytic approximation for the distribution of metabolite molecules that can be evaluated efficiently without expensive stochastic simulations. Our solution allows the exploration of parameter space to characterise the different regimes promoting metabolic heterogeneity. The approximation follows from exploiting time scale separation in the chemical master equation of the stochastic process^{38}. In physiological regimes, the model has three timescales: a fast metabolic time scale, in which substrate and enzyme bind and unbind; an intermediate time scale associated with the catalysis of the metabolite (*n*_{p}); and a slow timescale associated with the expression of the enzyme and dilution by cell growth.

The total amount of enzyme (free and substrate-bound, denoted as *n*_{e} and *n*_{c}, respectively) varies in the slowest timescale, and therefore the binding/unbinding of substrate and enzyme equilibrates quickly. As a result, in the timescale of gene expression, the metabolite can be assumed to depend directly on the total enzyme *n*_{etot} = *n*_{e} + *n*_{c} rather than on *n*_{e} and *n*_{c} individually. Under this approximation, it is convenient to use the law of total probability:

The formula in (1) decomposes the distribution of metabolite **P**(*n*_{p}) into stochasticity originating from enzyme expression, **P**(*n*_{etot}), and from fluctuations in the catalytic reaction itself, described by the conditional distribution of metabolite given the amount of total enzyme, **P**(*n*_{p}|*n*_{etot}). In the timescale of metabolite fluctuations, the total enzyme can be assumed to be in a quasi-stationary state. Further, exploiting the fast binding/unbinding between substrate and enzyme, we showed that the metabolite follows a birth–death process with effective propensities (details in the Methods section):

where \({\Bbb E}({n_{\mathrm{e}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}})\) and \({\Bbb E}({n_{\mathrm{c}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}})\) are the conditional expectations of the free enzyme (*n*_{e}) and complex (*n*_{c}) given the total enzyme and metabolite. In Eq. (2), *n*_{s} is the constant number of substrate molecules, the parameters *k*_{1}, *k*_{−1}, *k*_{cat}, and *k*_{rev} are the rate constants of the Michaelis–Menten mechanism (defined in Fig. 1a), and *k*_{c} is an effective first-order rate constant of metabolite consumption by downstream pathways. The conditional distribution needed in Eq. (1) can then be computed explicitly:

with Poisson parameter

and (*λ*_{∞}, *K*) are two effective kinetic parameters

The parameters *λ*_{∞} and *K* are in units of molecules/cell and depend on the interplay between substrate abundance, enzyme kinetics, and downstream processes.

As illustrated in Fig. 1b, the distribution in Eq. (1) is a PMM^{39,40,41} that convolves the enzyme distribution **P**(*n*_{etot}) with various Poisson modes **P**(*n*_{p}|*n*_{etot}) arising from the catalytic activity. In our model, the analytical distribution of the total enzyme abundance follows the standard solution of the three-stage model for gene expression^{27}, which can be computed explicitly in terms of model parameters. In certain limits, the three-stage model produces approximately Gamma or normal distributions depending on the mean expression level and the half-lives of mRNAs and proteins^{3,27}.

The decomposition in Eq. (1) shows that the PMM is not limited to the model for gene expression we have considered here. Other models may be used, either by using closed-form expressions for **P**(*n*_{etot}), or by inferring the enzyme distribution directly from single-cell protein expression data such as flow cytometry or single-cell microscopy^{1,17}. The PMM thus provides a versatile tool to predict metabolite heterogeneity from modelled or measured enzyme heterogeneity viewed as an upstream source of variation^{41}. Figure 1c shows that the PMM distribution provides a good approximation to Gillespie simulations computed with typical parameter values.

### Qualitative features of the Poisson Mixture Model

At the heart of the PMM is the interplay between variability from gene expression and that originating from enzyme kinetics. Specifically, the Poisson parameter *λ*(*n*_{etot}) in Eq. (4) controls the location and dispersion of the Poisson modes, which in turn shape the overall pattern of variability. As shown in Fig. 1b, there are several cases of interest. For example, for irreversible reactions (*k*_{rev} = 0), the Poisson parameter simplifies to

which scales linearly with the enzyme abundance and thus the Poisson modes have equidistant means. In reversible reactions, on the other hand, the Poisson parameter saturates and causes the Poisson modes to concentrate around *λ*_{∞}. This effect is stronger for strong reversibility (high *k*_{rev}), in which case the kinetic parameter *K* is small. Note also that in either case, as the enzyme number *n*_{etot} grows, the Poisson modes spread out since *λ*(*n*_{etot}) controls both their mean and variance.

From the construction of the PMM in Eq. (1), we observe that the enzyme distribution weighs the various Poisson modes, potentially producing metabolite distributions that are unimodal, bimodal, or even multimodal. For example, for highly expressed reversible enzymes, the distribution **P**(*n*_{etot}) is non-negligible for large *n*_{etot} only. Hence most Poisson modes do not contribute to the final metabolite distribution, except the mode centred at *λ*_{∞}, which leads to a unimodal metabolite distribution with a mean close to the deterministic average.

Conversely, for lowly expressed enzymes, there is a non-negligible probability of enzymes not being expressed, and thus the first term of the PMM, i.e. **P**(0)Poisson(*n*_{p}, 0), causes the metabolite distribution to peak at zero. However, the metabolite distribution may also display a second peak at *λ*_{∞} if, for example, the *λ*(*n*_{etot}) parameter causes many Poisson modes to concentrate around *λ*_{∞}. This results in a bimodal metabolite distribution, whereby an isogenic population splits into metabolite producers and non-producers. Similar reasoning can be used to understand the emergence of multimodal metabolite distributions, which correspond to three or more subpopulations with varying metabolic activities. This qualitative analysis suggests that metabolic subpopulations can emerge even in cases where enzymes display unimodal distributions across the population. Crucially, this also indicates that metabolic subpopulations emerge through mechanisms that do not follow trivially from transcriptional heterogeneity alone, as we explore in more detail in the next section.

### Mechanisms for metabolic bimodality

First, we explored the impact of stochastic promoter switching on the emergence of metabolite bimodality. Figure 2a shows the summary of calculations when evaluating the PMM for variations in the promoter time scale and promoter activity across several orders of magnitude for various values of the kinetic parameter *λ*_{∞}. We found three qualitatively distinct parameter regimes for the metabolite distribution that emerge from the combination of stochastic switching and catalysis: (1) a regime where both enzyme and metabolite have unimodal distributions, akin to the results shown earlier in Fig. 1c; (2) a regime where both enzyme and metabolite have bimodal distributions; and (3) a regime in which the enzyme is unimodal but the metabolite is bimodal.

It can be shown that the deterministic version of our model in Eqs. (R1)–(R9) has a single steady state. Hence regime (1) can be thought of as a stochastic correction consisting of unimodal distributions around a deterministic steady state. This is the expected behaviour under the traditional assumptions of high abundance of enzyme and metabolite molecules.

The other two regimes, however, correspond to alternative routes of noise-induced bimodality that cannot be explained using deterministic models^{42,43,44}. Regime (2) is a highly stochastic regime dominated by the slow stochastic switching of the promoter, which drives and entrains the metabolic response. Hence we term it *switching-induced bimodality*. Slowly switching promoters are known to produce bimodal gene expression^{29,41}, and thus this regime corresponds to a case in which bimodality propagates from enzymes to metabolites. Figure 2a shows that this behaviour appears robustly for slow switching and high promoter activity across values of the *λ*_{∞} parameter.

Regime (3), the second route for metabolite bimodality, originates from a unimodal but weakly expressed enzyme (low *k*_{on}/*k*_{off}) expressed from fast switching promoters. In this case, the birth of a small number of enzyme molecules is sufficient to kick-start catalysis and make it rapidly settle in a quasi-stationary regime. This distinct phenomenon is a result of the separation of time scales between enzyme expression and catalysis, and we refer to it as *catalytically-induced bimodality*. From Fig. 2a, we observe that this form of bimodality appears for a narrow range of promoter switching parameters corresponding to fast switching genes with medium to low promoter activity. This behaviour disappears altogether for a low *λ*_{∞} parameter, for example in case of strong reversibility.

To validate the predictions of the PMM approximation, we ran full Gillespie simulations over a long time horizon for different parameter sets. Figure 2b shows the simulation time courses and resulting histograms. For switching-induced bimodality, we observe how slowly switching promoters cause a single cell to lack the enzyme over several cell cycles, a period during which the metabolite is not produced. In the case of catalytic-induced bimodality, however, fast switching combined with a low average expression level causes the metabolite abundance to drop for shorter but more frequent intervals. In both cases, the PMM provides an excellent approximation to the bimodal histograms obtained from the stochastic simulations. Furthermore, we observe that the bimodal metabolite distributions both regimes are almost indistinguishable from each other, yet they are produced by enzymes with substantially different time courses and distributions. These regimes therefore correspond to distinct forms of bimodality, arising from fundamentally different mechanisms.

### Emergence of metabolic multimodality

To explore the emergence of multimodality, we examined the analytical formula of the PMM in Eq. (1) to identify kinetic regimes associated with distinct enzyme distributions. A necessary condition for the emergence of multiple modes is that the Poisson components do not overlap and are sufficiently spaced from each other. From the definition of the *λ*(*n*_{etot}) parameter in Eq. (5), this happens when the kinetic parameter *K* is large. As discussed earlier, depending on the distribution of the enzyme, the Poisson modes may appear or cancel in the final metabolite distribution. We thus swept the parameter *K* and evaluated the PMM across various enzyme expression levels, including low expression with a skewed distribution and high expression with a normally distributed enzyme.

As shown in Fig. 3, we found intricate patterns of multimodal distributions, depending on the interplay between the heterogeneity of the enzyme, **P**(*n*_{etot}), and the enzyme kinetics encapsulated by the *K* parameter. Multimodality appears when the enzyme expression levels are low as compared to the parameter *K*. For instance, the values of *K* in Fig. 3 are approximately 5-, 20-, and 100-fold those used in the bimodal examples in Fig. 2. For enzymes expressed at intermediate levels, in the order of tens of molecules/cell on average, we found metabolite distributions that are unimodal but highly skewed. In the case of highly expressed enzymes, metabolites followed approximately normal distributions for a wide range of kinetic parameters.

The predictions are confirmed by Gillespie simulations of the full stochastic model, which display a striking match with the PMM approximation, even for complex multimodal distributions. The simulation time courses (shown in the insets of Fig. 3) show that the multiple modes for weakly expressed enzymes correspond to cells remaining in a fixed metabolic state over the scale of the cell cycle but fluctuate across other states over longer time scales. For intermediate enzyme expression and large values for *K*, the metabolite does not settle in the quasi-stationary states and displays a long-tailed distribution. A decrease in *K* suppresses the tail of the distribution driving the PMM towards an approximately normal distribution. Altogether, these results indicate that the relation between enzyme expression and the kinetic parameters *λ*_{∞} and, in particular, *K* are key determinants for the emergence of multimodality. This underscores the utility of the PMM to guide the prediction of qualitative and quantitative features of metabolite distributions for a wide range of parameter combinations.

## Discussion

Metabolic reactions are the powerhouse of living systems, fuelling the activity and dynamics of most cellular functions. Yet metabolism has been traditionally considered as a static process isolated from the rest of the cellular machinery. Currently, the accepted notion is that due to the large number of molecules involved, metabolism is a deterministic process at the cellular level, modulated by potentially random extrinsic factors^{12,45}. Here, we integrated enzyme kinetics and enzyme expression to propose a theoretical model for the variability of metabolites in single cells. The model suggests that cell-to-cell metabolite variation can also arise as a result of intrinsic sources such as stochastic fluctuations in enzyme expression.

The majority of work on non-genetic heterogeneity has focused on stochastic gene expression and the resulting variability in protein levels^{1,2}. This has produced a wealth of single-cell data and models to understand the variability in transcription and translation observed in clonal populations. Metabolite heterogeneity, however, remains poorly understood theoretically and has been observed only indirectly (e.g. through measurements of metabolic enzymes^{12,23} or growth rate^{9}) due to the lack of techniques to measure metabolite abundance in single cells.

Using the separation of time scales characteristic of metabolic reactions, we found that the stationary distribution of a metabolite follows a PMM. The PMM can be efficiently evaluated across large domains of the parameter space and provides excellent approximations to the distributions computed from full stochastic simulations. Importantly, the model can be readily adapted to include different stochastic models for enzyme expression, beyond the three-stage model considered here^{3,46}, or even stochastic and time-dependent enzyme expression modelled as upstream drives^{41}. The model can also be parameterised from experimentally measured distributions for enzyme levels in single-cells^{17}. In combination with the enzyme kinetic parameters, the PMM could provide a powerful tool to predict metabolite variability from single-cell protein data obtained with flow cytometry or time-lapse microscopy.

We found complex patterns of metabolite heterogeneity depending on the interplay between the timescale of promoter activation/deactivation, the enzyme expression level, and the enzyme kinetics. The model predicts that bimodal and multimodal metabolite distributions can emerge in various parameter regimes. In such regimes, single-cells spend several cell cycles in a constant metabolic state, but in timescales as long as tens of cell cycles, they switch stochastically across different states. Such long-term fluctuations in single cells result in highly heterogeneous populations containing several subgroups of metabolically distinct cells.

Bimodal metabolic phenotypes have been observed as a result of transcriptional regulation^{14,23}, post-translational control^{11}, and stochastic effects triggered by environmental shifts^{12}. Our model reveals two distinct regimes in which metabolites display bimodality. One regime, which we call switching-induced bimodality, corresponds to the intuitive case in which a bimodal enzyme produces a bimodal metabolite. In agreement with previous studies on stochastic gene expression, this type of bimodality appears as a result of slow switching between promoter states^{29,47}. In addition, we identified a fundamentally different mechanism of *catalytically-induced bimodality*, in which a unimodal enzyme produces a bimodal distribution of metabolite. This phenomenon results from a combination of slow fluctuations of a weakly expressed enzyme and the comparatively faster timescale of enzyme catalysis. Catalytic timescales are typically in the order of seconds or faster, so that slow fluctuations in enzyme expression levels produce two quasi-stationary metabolic states in single cells. At a population level, this leads to two distinct subpopulations of metabolite producers and non-producers.

As shown in Fig. 4a, single-cell measurements in *E. coli* suggest that metabolic enzymes appear in low copy numbers across most cellular pathways^{17}. In the specific growth conditions of that experiment, the data did not reveal the bimodal expression of enzymes, which precludes the emergence of switching-induced bimodality in the metabolites they catalyse. However, as illustrated by the three representative distributions in Fig. 4a, a number of enzymes have a low mean and a long-tailed distribution, akin to those required for catalytically-induced bimodality and multimodality. This suggests that enzyme distributions found in nature have the characteristics needed for the emergence of subpopulations with two or more distinct metabolite abundances.

Further requirements for metabolite bimodality and multimodality involve conditions on the parameters *λ*_{∞} and *K* in Eq. (5). However, their computation requires rate constants (*k*_{1}, *k*_{−1}, and *k*_{rev}) that are rarely measured individually, and instead enzymology data typically provides values for *k*_{cat} and *K*_{M} = (*k*_{cat} + *k*_{−1})/*k*_{1} only^{48}. In the Methods section, we show that the ratio *λ*_{∞}/*K* can be expressed as \(\lambda _\infty {\mathrm{/}}K = \epsilon \times k_{{\mathrm{cat}}}{\mathrm{/}}k_{\mathrm{c}}\), where \(\epsilon\) is the saturation level of the enzyme and *k*_{c} is the first-order rate constant of metabolite consumption. As illustrated in Fig. 4b, the ratio *λ*_{∞}/*K* corresponds to a straight line in a (*λ*_{∞}, *K*)-space, and a specific enzyme (i.e. with specific values for *k*_{1}, *k*_{−1}, and *k*_{rev}) corresponds to a single point on the line. In Fig. 4b, we compare model predictions for a lowly abundant enzyme with different *λ*_{∞}/*K* ratios computed for *k*_{cat} constants measured in *E. coli*^{49}. Considering the large spread in measured *k*_{cat} values, of up to seven orders of magnitude, plus the multiple combinations of metabolite consumption rates and enzyme saturation, the analysis suggests that catalytically-induced bimodality and multimodality are plausible within physiological regimes. Further validations of our predictions require measuring metabolite distributions directly, but this is still subject to a number of challenges in single-cell measurement technologies^{15}.

Our analysis shows that metabolite heterogeneity depends on a delicate interplay between enzyme expression and enzyme kinetics. It is reasonable to expect that energy-critical enzymes, such as those in central carbon metabolism, filter away fluctuations through post-translational regulatory mechanisms commonly found in metabolism. However, this may not be the case in pathways that are dynamically regulated in response to changes in the environment or cellular context. For example, transcriptional regulation in response to nutrient shifts may steer enzyme levels into regimes of low copy numbers where heterogeneity may dominate the resulting phenotypes. Such a mechanism has been already shown to produce growth bimodality in the gluconeogenic switch of *E. coli*^{12}, while a similar mechanism could underpin the large variability observed in single-cell measurements of S-adenosyl methionine^{21}. Noise-induced phenomena also have implications for the design of dynamic control systems for heterologous pathways, which are focus of much research in synthetic biology and metabolic engineering^{50}.

In our efforts to build a theory that includes components shared by most enzymatic reactions, we have purposely overlooked a number of processes that can shape metabolic activity. For example, we have not addressed the impact of feedback mechanisms that control enzyme activity, including e.g. product inhibition and allostery, or transcriptional mechanisms that control enzyme expression in response to metabolites. Since post-translational regulation operates on timescales much shorter than enzyme expression, with similar timescale separation arguments it should be possible to express the metabolite distribution as a mixture model akin to ours. In such a case, the mixture components are not necessarily Poisson and their distribution will depend on the particular mechanism under study. We expect that bimodal and multimodal responses are likely to emerge in this setting, but the precise parameter conditions would have to be studied on a case-by-case basis. Transcriptional feedback can also display various mechanisms depending on the particular pathway under study. One common motif relies on transcription factors (TF) that up- or down-regulate enzyme expression upon binding to a specific metabolite^{20}. These mechanisms have been shown to play important roles on metabolic activity^{51}, but they also bring to the fore subtle questions that require detailed examination, for example, on the role of fluctuations coming from TF expression itself, or the impact of negative TF autoregulation^{52}. Our study paves the way for these and other questions to be addressed and raises exciting prospects for the future research in metabolic heterogeneity.

In this paper, we laid theoretical foundations to study metabolism in conjunction with stochastic enzyme expression. We brought together classic models for gene expression and enzyme kinetics, and discovered a rich array of distinct stochastic phenomena that underpin the emergence of metabolic subpopulations. Our theory provides a quantitative basis to draw testable hypotheses on the sources of metabolite heterogeneity, which together with the ongoing efforts in single-cell metabolite measurements, will help to re-think metabolism as an active source of phenotypic variation.

## Methods

### Stochastic modelling and simulation

We built a fully stochastic model for the reaction scheme describing a metabolic reaction coupled with gene expression (Fig. 1a):

All reactions are assumed to follow mass action kinetics. Model simulations were computed with Gillespie’s algorithm^{34} over long time horizons, in the order of hundreds of cell cycles for all simulations. Because of the complex multimodality observed, long simulations are needed to obtain accurate approximations of the stationary molecular distributions. The time courses shown in figures correspond to a small time window of the overall simulation. Unless mentioned in figure captions, all parameter values were fixed to their nominal values shown in Table 1. The parameters are selected in a physiologically realistic range respecting the scale separation of molecule numbers between mRNA (~1–5 molecules), total enzyme abundance (~100 molecules) and metabolites (~1000 molecules).

To identify bimodality in Fig. 2, we detected the existence of one or two peaks in a distribution and defined it as bimodal if the height of the smaller peak is a least 10% of the larger peak and the trough between peaks is at most 10% of the height of the smaller peak.

### Analytical expressions for the metabolite distribution

To derive an analytic approximation for the probability to observe *n*_{p} metabolites in a cell, we first use the law of total probability as shown in Eq. (1).

*Distribution of the total enzyme*: Because free enzymes and complexes degrade at the same rate and *n*_{etot} = *n*_{e} + *n*_{c} is conserved by the metabolic reaction, in the slow timescale the enzyme distribution **P**(*n*_{etot}) follows the standard solution^{27} of the three-stage model for gene expression:

where Γ is the Gamma function and _{2}*F*_{1} is the ordinary hypergeometric function. The parameters are *γ* = (*k*_{on} + *k*_{off})/*δ* and \(\alpha _ \pm = \left( {a + \gamma \pm \sqrt {(a + \gamma )^2 - 4ak_{{\mathrm{off}}}{\mathrm{/}}\delta } } \right){\mathrm{/}}2\), with *a* = *k*_{tx}/*δ* and *b* = *k*_{tl}/*k*_{deg}.

*Conditional distribution for the metabolite*: To compute the second term in Eq. (1), we observe that enzyme expression occurs on a much longer timescale than enzyme kinetics, and thus metabolites can be considered to be in a quasi-equilibrium state of the catalytic reactions (R1) and (R2) and metabolite consumption (R6).

To explicitly compute the mixture components **P**(*n*_{p}|*n*_{etot}), we assume that reversible binding between substrate and enzyme in reaction (R1) is much faster than the catalytic step and metabolite consumption^{35}. In this limit, the metabolite number evolves according to the effective reactions:

where \({\mathrm{k}}_{{\mathrm{cat}}}^{{\mathrm{eff}}}\) and \({\mathrm{k}}_{{\mathrm{rev}}}^{{\mathrm{eff}}}\) are effective propensities averaged over the fast fluctuating variables *n*_{c} and *n*_{e}:

where \({\Bbb E}\) denotes the expectation operator. The derivation of the effective propensities in Eq. (9) corresponds to a particular case of a more general methodology for timescale separation in stochastic chemical systems^{29,53,54}. Note that since the total enzyme levels are conserved in the catalytic timescale, it follows that

To derive the conditional expectations in Eq. (9), we write the first-order moment equation for the free enzyme *n*_{e}, which according to Eqs. (R1) and (R2) reads

Under the assumption that the reversible binding of substrate and enzyme is much faster than the other processes, the first two terms dominate the right-hand side of Eq. (11) and determine the enzyme–complex quasi-equilibrium. Equating these two terms and using the conservation relation in Eq. (10), we obtain

and thus both conditional expectations depend on *n*_{etot} and are independent of the metabolite abundance. Therefore, the reactions in Eq. (8) correspond to a birth–death process with a zero-th order birth propensity \({\mathrm{k}}_{{\mathrm{cat}}}^{{\mathrm{eff}}}\) and two linear death propensities. The mixture components **P**(*n*_{p}|*n*_{etot}) are thus Poissonian with parameter *λ*(*n*_{etot}) as shown in Eqs. (3) and (4).

### Comparison of PMM predictions and measured kinetic parameters

The PMM depends on the effective parameters *λ*_{∞} and *K*, which are functions of five rate constants (*k*_{cat}, *k*_{1}, *k*_{−1}, *k*_{c}, and *k*_{rev}). Most of these parameters are not available, except *k*_{cat} and *K*_{M} = (*k*_{cat} + *k*_{−1})/*k*_{1}. From Eq. (5) it follows

which allows the computation of *λ*_{∞}/*K* for measured *k*_{cat} values in different saturation conditions and consumption rate constants. The red line in Fig. 4b represents the *λ*_{∞}/*K* ratio for a saturated enzyme \(\left( {\epsilon = 1} \right)\), fast consumption (*k*_{c} = 100 × *δ*), and the median *k*_{cat} ≈ 16.5 s^{−1} in *E. coli*^{49}. The top grey line is the case without consumption, i.e. metabolites are diluted by cell growth (*k*_{c} = *δ*). Lower saturation \(\left( {\epsilon = 0.2} \right)\) moves the red line down the vertical axis (bottom grey line). The enzyme distribution in Fig. 4b was produced with promoter switching parameters {*k*_{on}, *k*_{off}} = {1.56, 3} × 10^{−4} s^{−1}, *k*_{tx} = 0.025 s^{−1}, and *k*_{tl} = 0.2 s^{−1}. The boundaries between unimodal, bimodal, and multimodal distributions were computed as follows. Unimodal distributions are those with a single maximum. Bimodal distributions were detected as in Fig. 2. Multimodal distributions are those with at least one additional peak higher than a threshold of 1 × 10^{−4} and the trough between neighbouring peaks at most 90% of its height.

### Code availability

The code for producing model simulations is available from the authors upon request.

### Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

All data are available from the authors upon request.

## Additional information

**Publisher’s note:** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

- 1.
Elowitz, M. B., Levine, A. J. & Siggia, E. D. Stochastic gene expression in a single cell.

*Science***279**, 1183–1186 (2002). - 2.
Paulsson, J. Models of stochastic gene expression.

*Phys. Life Rev.***2**, 157–175 (2005). - 3.
Raj, A. & van Oudenaarden, A. Nature, nurture, or chance: stochastic gene expression and its consequences.

*Cell***135**, 216–226 (2008). - 4.
Balaban, N. Q., Merrin, J., Chait, R., Kowalik, L. & Leibler, S. Bacterial persistence as a phenotypic switch.

*Science***305**, 1622–1625 (2004). - 5.
Lewis, K. Persister cells, dormancy and infectious disease.

*Nat. Rev. Microbiol.***5**, 48–56 (2007). - 6.
Shan, Y. et al. ATP-dependent persister formation in

*Escherichia coli*.*mBio***8**, e02267–16 (2017). - 7.
Vilhena, C. et al. A single-cell view of the BtsSR/YpdAB pyruvate sensing network in

*Escherichia coli*and Its biological relevance.*J. Bacteriol.***200**, e00536-17 (2018). - 8.
Nikolic, N. et al. Cell-to-cell variation and specialization in sugar metabolism in clonal bacterial populations.

*PLoS Genet.***13**, e1007122 (2017). - 9.
Kiviet, D. J. et al. Stochasticity of metabolism and growth at the single-cell level.

*Nature***514**, 376–379 (2014). - 10.
Thomas, P., Terradot, G., Danos, V. & Weiße, A. Y. Sources, propagation and consequences of stochasticity in cellular growth.

*Nat. Commun.***9**, 4528 (2018). - 11.
van Heerden, J. H. et al. Lost in transition: start-up of glycolysis yields subpopulations of nongrowing cells.

*Science***343**, 1245114 (2014). - 12.
Kotte, O., Volkmer, B., Radzikowski, J. L. & Heinemann, M. Phenotypic bistability in

*Escherichia coli*’s central carbon metabolism.*Mol. Syst. Biol.***10**, 736 (2014). - 13.
Şimşek, E. & Kim, M. The emergence of metabolic heterogeneity and diverse growth responses in isogenic bacterial cells.

*ISME J.***12**, 1199–1209 (2018). - 14.
Acar, M., Becskei, A. & van Oudenaarden, A. Enhancement of cellular memory by reducing stochastic transitions.

*Nature***435**, 228–232 (2005). - 15.
Takhaveev, V. & Heinemann, M. Metabolic heterogeneity in clonal microbial populations.

*Curr. Opin. Microbiol.***45**, 30–38 (2018). - 16.
Golding, I., Paulsson, J., Zawilski, S. M. & Cox, E. C. Real-time kinetics of gene activity in individual bacteria.

*Cell***123**, 1025–1036 (2005). - 17.
Taniguchi, Y. et al. Quantifying

*E. coli*proteome and transcriptome with single-molecule sensitivity in single cells.*Science***329**, 533–538 (2010). - 18.
Lemke, E. A. & Schultz, C. Principles for designing fluorescent sensors and reporters.

*Nat. Chem. Biol.***7**, 480–483 (2011). - 19.
Xiao, Y., Bowen, C. H., Liu, D. & Zhang, F. Exploiting non-genetic, cell-to-cell variation for enhanced biosynthesis.

*Nat. Chem. Biol.***12**, 339–344 (2016). - 20.
Mannan, A. A., Liu, D., Zhang, F. & Oyarzún, D. A. Fundamental design principles for transcription-factor-based metabolite biosensors.

*ACS Synth. Biol.***6**, 1851–1859 (2017). - 21.
Paige, J. S., Nguyen-Duc, T., Song, W. & Jaffrey, S. R. Fluorescence imaging of cellular metabolites with RNA.

*Science***335**, 1194 (2012). - 22.
Ibanez, A. J. et al. Mass spectrometry-based metabolomics of single yeast cells.

*Proc. Natl. Acad. Sci. U.S.A.***110**, 8790–8794 (2013). - 23.
Ozbudak, E. M., Thattai, M., Lim, H. N., Shraiman, B. I. & van Oudenaarden, A. Multistability in the lactose utilization network of

*Escherichia coli*.*Nature***427**, 737–740 (2004). - 24.
Cornish-Bowden, A.

*Fundamentals of Enzyme Kinetics*3rd edn (Weinheim, Germany: Wiley-Blackwell, 2004). - 25.
Bennett, B. et al. Absolute metabolite concentrations and implied enzyme active site occupancy in

*Escherichia coli*.*Nat. Chem. Biol.***5**, 593–599 (2009). - 26.
Yaginuma, H. et al. Diversity in ATP concentrations in a single bacterial cell population revealed by.

*Sci. Rep.***4**, 6522 (2014). - 27.
Shahrezaei, V. & Swain, P. S. Analytical distributions for stochastic gene expression.

*Proc. Natl Acad. Sci. USA***105**, 17256–17261 (2008). - 28.
Labhsetwar, P., Cole, J. A., Roberts, E., Price, N. D. & Lutheyschulten, Z. A. Heterogeneity in protein expression induces metabolic variability in a modeled

*Escherichia coli*population.*Proc. Natl Acad. Sci. USA***110**, 14006–14011 (2013). - 29.
Thomas, P., Popović, N. & Grima, R. Phenotypic switching in gene regulatory networks.

*Proc. Natl. Acad. Sci. USA.***111**, 6994–6999 (2014). - 30.
Levine, E. & Hwa, T. Stochastic fluctuations in metabolic pathways.

*Proc. Natl Acad. Sci. USA***104**, 9224–9229 (2007). - 31.
Thomas, P., Straube, A. V. & Grima, R. Communication: limitations of the stochastic quasi-steady-state approximation in open biochemical reaction networks.

*J. Chem. Phys.***135**, 181103 (2011). - 32.
Gupta, A., Milias-argeitis, A. & Khammash, M. Dynamic disorder in simple enzymatic reactions induces stochastic amplification of substrate.

*J. R. Soc.***14**, 1–29 (2017). - 33.
Oyarzún, D. A., Lugagne, J.-B. & Stan, G.-B. Noise propagation in synthetic gene circuits for metabolic control.

*ACS Synth. Biol.***4**, 116–125 (2015). - 34.
Gillespie, D. T. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions.

*J. Comput. Phys.***22**, 403–434 (1976). - 35.
Cao, Y., Gillespie, D. T. & Petzold, L. R. Accelerated stochastic simulation of the stiff enzyme–substrate reaction.

*J. Chem. Phys.***123**, 144917 (2005). - 36.
Lugagne, J.-B., Oyarzún, D. A. & Stan, G.-B. Stochastic simulation of enzymatic reactions under transcriptional feedback regulation. In

*Proc.**European Control Conference*3646–3651 (2013). - 37.
Gillespie, D. T. Stochastic simulation of chemical kinetics.

*Annu. Rev. Phys. Chem.***58**, 35–55 (2007). - 38.
van Kampen, N.

*Stochastic Processes in Physics and Chemistry*. (Elsevier, Amsterdam, 1992). - 39.
Chaturvedi, S., Gardiner, C. W., Matheson, I. S. & Walls, D. F. Stochastic analysis of a chemical reaction with spatial and temporal structures.

*J. Stat. Phys.***17**, 469–489 (1977). - 40.
Iyer-Biswas, S., Hayot, F. & Jayaprakash, C. Stochasticity of gene products from transcriptional pulsing.

*Phys. Rev. E***79**, 031911 (2009). - 41.
Dattani, J. & Barahona, M. Stochastic models of gene transcription with upstream drives: exact solution and sample path characterization.

*J. R. Soc. Interface***14**, 20160833 (2017). - 42.
Oyarzún, D. A. & Chaves, M. Design of a bistable switch to control cellular uptake.

*J. R. Soc. Interface***12**, 20150618 (2015). - 43.
Lipshtat, A., Loinger, A., Balaban, N. Q. & Biham, O. Genetic toggle switch without cooperative binding.

*Phys. Rev. Lett.***96**, 0603026 (2006). - 44.
To, T.-L. & Maheshri, N. Noise can induce bimodality in positive transcriptional feedback loops without bistability.

*Science***327**, 1142–1145 (2010). - 45.
Wehrens, M., Buke, F., Nghe, P. & Tans, S. J. Stochasticity in cellular metabolism and growth: approaches and consequences.

*Curr. Opin. Syst. Biol.***8**, 131–136 (2018). - 46.
Corrigan, A. M., Tunnacliffe, E., Cannon, D. & Chubb, J. R. A continuum model of transcriptional bursting.

*eLife***5**, e13051 (2016). - 47.
Ge, H., Wu, P., Qian, H. & Xie, X. S. Relatively slow stochastic gene-state switching in the presence of positive feedback significantly broadens the region of bimodality through stabilizing the uninduced phenotypic state.

*PLoS Comput. Biol.***14**, e1006051 (2018). - 48.
Schomburg, I. et al. BRENDA in 2013: integrated reactions, kinetic data, enzyme function data, improved disease classification: new options and contents in BRENDA.

*Nucleic Acids Res.***41**, D764–D772 (2013). - 49.
Bar-Even, A. et al. The moderately efficient enzyme: evolutionary and physicochemical trends shaping enzyme parameters.

*Biochemistry***50**, 4402–4410 (2011). - 50.
Liu, D., Mannan, A. A., Han, Y., Oyarzún, D. A. & Zhang, F. Dynamic metabolic control: towards precision engineering of metabolism.

*J. Ind. Microbiol. Biotechnol.***45**, 535–543 (2018). - 51.
Chaves, M. & Oyarzún, D. A. Dynamics of complex feedback architectures in metabolic pathways.

*Automatica***99**, 323–332 (2019). - 52.
Fang, X. et al. Global transcriptional regulatory network for

*Escherichia coli*robustly connects gene expression to transcription factor activities.*Proc. Natl Acad. Sci. USA***114**, 10286–10291 (2017). - 53.
Goutsias, J. Quasiequilibrium approximation of fast reaction kinetics in stochastic biochemical systems.

*J. Chem. Phys.***122**, 184102 (2005). - 54.
Melykuti, B., Hespanha, J. P. & Khammash, M. Equilibrium distributions of simple biochemical reaction systems for time-scale separation in stochastic reaction networks.

*J. R. Soc. Interface***11**, 20140054 (2014). - 55.
Serres, M. H. & Riley, M. MultiFun, a multifunctional classification scheme for

*Escherichia coli*K-12 gene products.*Microb. Comp. Genomics***5**, 205–222 (2000). - 56.
So, L. H. et al. General properties of transcriptional time series in

*Escherichia coli*.*Nat. Genet.***43**, 554–560 (2011).

## Acknowledgements

This work was funded by the Human Frontier Science Program through a Young Investigator Grant awarded to D.A.O. (RGY0076-2015), The Royal Commission for the Exhibition of 1851 through a Fellowship to P.T., and the EPSRC Centre for Mathematics of Precision Healthcare (EP/N014529/1).

## Author information

### Affiliations

### Contributions

M.K.T., P.T., M.B., and D.A.O. conceived the study. M.K.T. carried out the theoretical derivations, simulations, and analysed the data. M.B. and D.A.O. supervised the work. M.K.T., P.T., M.B., and D.A.O. wrote the paper.

### Competing interests

The authors declare no competing interests.

### Corresponding author

Correspondence to Diego A. Oyarzún.

## Supplementary information

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

## About this article

#### Received

#### Accepted

#### Published

#### DOI

## Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.