Introduction

Cellular heterogeneity is ubiquitous across all domains of life. In microbes, clonal populations display phenotypic variability as a result of multiple factors such as fluctuations in the microenvironment, stochasticity in gene expression, or asymmetric partitioning at cell division1,2,3. Variability is well recognised at the transcriptional and translational levels. Yet various single-cell phenomena result from the emergence of distinct metabolic states within a clonal population. For example, metabolic heterogeneity plays a key role in antibiotic tolerance4,5,6, heterogeneous nutrient uptake7,8, and variations in growth rate9,10. It has also been shown that nutrient shifts can cause populations to split into two11,12 or more13 subpopulations with distinct growth abilities. The emergence of subpopulations has been theorised as a bet-hedging strategy that gives an evolutionary advantage for survival in adverse environments4,14.

A central challenge to quantify metabolic variability is the lack of techniques for measuring metabolites with single-cell resolution15. In contrast to single-cell measurements of protein expression, for which powerful reporter systems have been developed16,17, quantification of metabolites in single-cells remains a major challenge. Some of the techniques employed so far include Förster resonance energy transfer (FRET) sensors18, metabolite-responsive transcription factors19,20, RNA sensors21, and mass-spectrometry22, yet most of these technologies are in the early stages of development15. As a result, metabolic heterogeneity is typically quantified indirectly via measurements of metabolic enzymes or growth rate in single-cells9,12,23.

Our objective in this paper is to characterise heterogeneity in metabolites as a result of stochastic enzyme expression and catalysis. Metabolic models traditionally assume that enzymatic reactions behave deterministically on the basis that both enzymes and metabolites appear in high molecule numbers24. However, single-cell proteomics in Escherichia coli show that metabolic enzymes are as variable as any other member of the proteome17, while metabolomics data suggest that average metabolite abundances span several orders of magnitude25. The few datasets on single-cell metabolite abundance already suggest substantial variability in some metabolites in E. coli19,26. Such evidence casts doubt on the traditional assumption of metabolism being a purely deterministic process, suggesting a link between fluctuations in enzyme expression and metabolites.

The role of stochastic gene expression in protein variability has been well studied2,3,27,28,29, but the impact of such randomness on metabolic reactions remains much less understood. Various theoretical studies have analysed the impact of fluctuations in the supply and consumption of metabolites30,31,32, or the propagation of enzyme noise to a metabolite33. However, despite mounting experimental evidence of stochastic effects in metabolism, mathematical models still lack the sufficient detail to integrate the processes that are known to shape protein heterogeneity, such as stochastic promoter switching and transcriptional bursting.

In this paper, we propose a model for metabolite heterogeneity in single-cells. The model integrates stochasticity in enzyme catalysis24 and expression27, two well-established processes that so far have been studied in isolation. Our approach includes a stochastic formulation of various relevant mechanisms in enzymatic reactions, including reversible catalysis, stochastic switching of promoter activity, fluctuations in mRNA transcripts, and consumption of the enzymatic product by downstream processes.

We probe the model for various sources of stochasticity using simulations and analytical solutions for the stationary distribution of the metabolite. The analysis reveals intricate patterns of heterogeneity that translate into bimodal and multimodal distributions for the number of metabolite molecules. These phenomena arise from the interplay between a lowly abundant enzyme and its catalytic parameters. Under the separation of timescales typical of metabolic reactions, we show that metabolite distributions can be accurately approximated by a Poisson mixture model (PMM) across large regions of the parameter space. The mixture model can be readily adapted to a wide class of gene expression models and provides a quantitative tool to predict metabolite variability from enzyme measurements in single-cells.

Results

Stochastic model of an enzymatic reaction

We consider a model that combines enzyme kinetics and enzyme expression into a single stochastic description (Fig. 1a). The model includes an enzymatic reaction with standard Michaelis–Menten kinetics, in which substrate and enzyme bind reversibly to form a complex that undergoes reversible catalysis into a metabolite. We assume that enzyme expression follows the well-established three-stage model for gene expression3,27, where a single copy gene switches stochastically between an inactive state (Doff) and active state (Don). In the active state, mRNAs are transcribed and translated into protein. The model also includes consumption of the metabolite by downstream pathways, degradation of mRNA transcripts, and dilution by the growth of all species. Since metabolic reactions operate far from thermodynamic equilibrium, we assume that the substrate pool remains constant so that the system reaches a non-zero flux, e.g. when the substrate is a highly abundant extracellular carbon source or a slowly varying intracellular metabolite. The model reactions are shown in Eqs. (R1)–(R9) in the Methods section.

Fig. 1
figure 1

Stochastic model for an enzymatic reaction. a The model integrates reversible Michaelis–Menten kinetics with the three-stage model for gene expression24,27. The model includes consumption of the metabolite by downstream pathways, degradation of mRNA transcripts, and dilution of all chemical species by cell growth (not shown in the diagram); rate constants are shown in the figure and model reactions are shown in Eqs. (R1)–(R9), Methods. The inset shows a typical simulation for a realistic parameter set shown in Table 1. b Construction of the Poisson Mixture Model (PMM) for the number of metabolite molecules (np). This approximation is valid under a separation of timescales between enzyme expression and enzyme catalysis. The mixture model, shown in Eq. (1), comprises Poisson distributions weighted by the distribution of enzyme expression P(netot). The Poisson parameter λ(netot) depends on enzyme kinetics via the nonlinear relation in Eq. (4). In the irreversible case (krev = 0), the λ(netot) parameter scales linearly and produces equi-spaced Poisson modes. The first mode, Poisson (np, 0), is highlighted as a bar. c The PMM provides an accurate approximation of the stationary distributions. Insets show distributions for enzyme and metabolite, computed via Gillespie simulations and the PMM approximation for fixed λ = 1080 molecules, K = 8 molecules, and three different promoter switching parameters, shown in Table 1

To investigate the emergence of metabolic heterogeneity, we need to compute the stationary probability distribution of metabolite molecules (np) for relevant combinations of model parameters. Figure 1b shows a typical simulation of the model obtained with Gillespie’s algorithm34. A key challenge for such simulations, however, is the multiscale nature of enzymatic reactions: not only do metabolic reactions operate in a much faster timescale (milliseconds) than enzyme expression (tens of minutes)30,35,36, but also the average number of enzymes is much lower than the number of metabolites. These multiple scales result in reaction propensities that differ by several orders of magnitude, thus leading to extremely slow simulations which make the exploration of the parameter space infeasible. An alternative is to use simulation algorithms that exploit the separation of scales to increase computational speed, such as tau-leaping or slow-scale approximations37. Yet in our case it is unclear how such numerical approximations impact the predictions drawn from the simulations.

To determine the impact of genetic and catalytic parameters on metabolic heterogeneity, we obtained an analytic approximation for the distribution of metabolite molecules that can be evaluated efficiently without expensive stochastic simulations. Our solution allows the exploration of parameter space to characterise the different regimes promoting metabolic heterogeneity. The approximation follows from exploiting time scale separation in the chemical master equation of the stochastic process38. In physiological regimes, the model has three timescales: a fast metabolic time scale, in which substrate and enzyme bind and unbind; an intermediate time scale associated with the catalysis of the metabolite (np); and a slow timescale associated with the expression of the enzyme and dilution by cell growth.

The total amount of enzyme (free and substrate-bound, denoted as ne and nc, respectively) varies in the slowest timescale, and therefore the binding/unbinding of substrate and enzyme equilibrates quickly. As a result, in the timescale of gene expression, the metabolite can be assumed to depend directly on the total enzyme netot = ne + nc rather than on ne and nc individually. Under this approximation, it is convenient to use the law of total probability:

$${\bf{P}}\left( {n_{\mathrm{p}}} \right) = \mathop {\sum}\limits_{n_{{\mathrm{etot}}} = 0}^\infty \underbrace {{\bf{P}}\left( {n_{{\mathrm{etot}}}} \right)}_{{\mathrm{gene}}\,{\mathrm{expression}}}\underbrace {{\bf{P}}\left( {n_{\mathrm{p}}{\mathrm{|}}n_{{\mathrm{etot}}}} \right)}_{{\mathrm{catalysis}}}.$$
(1)

The formula in (1) decomposes the distribution of metabolite P(np) into stochasticity originating from enzyme expression, P(netot), and from fluctuations in the catalytic reaction itself, described by the conditional distribution of metabolite given the amount of total enzyme, P(np|netot). In the timescale of metabolite fluctuations, the total enzyme can be assumed to be in a quasi-stationary state. Further, exploiting the fast binding/unbinding between substrate and enzyme, we showed that the metabolite follows a birth–death process with effective propensities (details in the Methods section):

$$\begin{array}{*{20}{l}} {{\mathrm{k}}_{{\mathrm{birth}}}^{{\mathrm{eff}}}} \hfill & = \hfill & {{\mathrm{k}}_{{\mathrm{cat}}}{\Bbb E}\left( {n_{\mathrm{c}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right)} \hfill \\ {} \hfill & \approx \hfill & {{\mathrm{k}}_{{\mathrm{cat}}}\frac{{k_{ - 1}}}{{\left( {k_{ - 1} + k_1n_{\mathrm{s}}} \right)}}n_{{\mathrm{etot}}},} \hfill \\ {{\mathrm{k}}_{{\mathrm{death}}}^{{\mathrm{eff}}}} \hfill & = \hfill & {k_{{\mathrm{rev}}}{\Bbb E}\left( {n_{\mathrm{e}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right) + k_{\mathrm{c}}} \hfill \\ {} \hfill & \approx \hfill & {k_{{\mathrm{rev}}}\frac{{k_1n_{\mathrm{s}}}}{{\left( {k_{ - 1} + k_1n_{\mathrm{s}}} \right)}}n_{{\mathrm{etot}}} + k_{\mathrm{c}},} \hfill \end{array}$$
(2)

where \({\Bbb E}({n_{\mathrm{e}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}})\) and \({\Bbb E}({n_{\mathrm{c}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}})\) are the conditional expectations of the free enzyme (ne) and complex (nc) given the total enzyme and metabolite. In Eq. (2), ns is the constant number of substrate molecules, the parameters k1, k−1, kcat, and krev are the rate constants of the Michaelis–Menten mechanism (defined in Fig. 1a), and kc is an effective first-order rate constant of metabolite consumption by downstream pathways. The conditional distribution needed in Eq. (1) can then be computed explicitly:

$${\bf{P}}\left( {n_{\mathrm{p}}{\mathrm{|}}n_{{\mathrm{etot}}}} \right)\sim {\mathrm{Poisson}}\left( {n_{\mathrm{p}};\lambda \left( {n_{{\mathrm{etot}}}} \right)} \right),$$
(3)

with Poisson parameter

$$\lambda \left( {n_{{\mathrm{etot}}}} \right) = \frac{{\lambda _\infty }}{{1 + K{\mathrm{/}}n_{{\mathrm{etot}}}}},$$
(4)

and (λ, K) are two effective kinetic parameters

$$\lambda _\infty = n_{\mathrm{s}}\frac{{k_{{\mathrm{cat}}}k_1}}{{k_{{\mathrm{rev}}}k_{ - 1}}},{\kern 1pt} \,{\mathrm{and}}\,K = k_{\mathrm{c}}\frac{{k_1n_{\mathrm{s}} + k_{ - 1}}}{{k_{{\mathrm{rev}}}k_{ - 1}}}.$$
(5)

The parameters λ and K are in units of molecules/cell and depend on the interplay between substrate abundance, enzyme kinetics, and downstream processes.

As illustrated in Fig. 1b, the distribution in Eq. (1) is a PMM39,40,41 that convolves the enzyme distribution P(netot) with various Poisson modes P(np|netot) arising from the catalytic activity. In our model, the analytical distribution of the total enzyme abundance follows the standard solution of the three-stage model for gene expression27, which can be computed explicitly in terms of model parameters. In certain limits, the three-stage model produces approximately Gamma or normal distributions depending on the mean expression level and the half-lives of mRNAs and proteins3,27.

The decomposition in Eq. (1) shows that the PMM is not limited to the model for gene expression we have considered here. Other models may be used, either by using closed-form expressions for P(netot), or by inferring the enzyme distribution directly from single-cell protein expression data such as flow cytometry or single-cell microscopy1,17. The PMM thus provides a versatile tool to predict metabolite heterogeneity from modelled or measured enzyme heterogeneity viewed as an upstream source of variation41. Figure 1c shows that the PMM distribution provides a good approximation to Gillespie simulations computed with typical parameter values.

Qualitative features of the Poisson Mixture Model

At the heart of the PMM is the interplay between variability from gene expression and that originating from enzyme kinetics. Specifically, the Poisson parameter λ(netot) in Eq. (4) controls the location and dispersion of the Poisson modes, which in turn shape the overall pattern of variability. As shown in Fig. 1b, there are several cases of interest. For example, for irreversible reactions (krev = 0), the Poisson parameter simplifies to

$$\lambda \left( {n_{{\mathrm{etot}}}} \right) = \frac{{k_1n_{\mathrm{s}}k_{{\mathrm{cat}}}}}{{\left( {k_1n_{\mathrm{s}} + k_{ - 1}} \right)k_{\mathrm{c}}}}n_{{\mathrm{etot}}},$$
(6)

which scales linearly with the enzyme abundance and thus the Poisson modes have equidistant means. In reversible reactions, on the other hand, the Poisson parameter saturates and causes the Poisson modes to concentrate around λ. This effect is stronger for strong reversibility (high krev), in which case the kinetic parameter K is small. Note also that in either case, as the enzyme number netot grows, the Poisson modes spread out since λ(netot) controls both their mean and variance.

From the construction of the PMM in Eq. (1), we observe that the enzyme distribution weighs the various Poisson modes, potentially producing metabolite distributions that are unimodal, bimodal, or even multimodal. For example, for highly expressed reversible enzymes, the distribution P(netot) is non-negligible for large netot only. Hence most Poisson modes do not contribute to the final metabolite distribution, except the mode centred at λ, which leads to a unimodal metabolite distribution with a mean close to the deterministic average.

Conversely, for lowly expressed enzymes, there is a non-negligible probability of enzymes not being expressed, and thus the first term of the PMM, i.e. P(0)Poisson(np, 0), causes the metabolite distribution to peak at zero. However, the metabolite distribution may also display a second peak at λ if, for example, the λ(netot) parameter causes many Poisson modes to concentrate around λ. This results in a bimodal metabolite distribution, whereby an isogenic population splits into metabolite producers and non-producers. Similar reasoning can be used to understand the emergence of multimodal metabolite distributions, which correspond to three or more subpopulations with varying metabolic activities. This qualitative analysis suggests that metabolic subpopulations can emerge even in cases where enzymes display unimodal distributions across the population. Crucially, this also indicates that metabolic subpopulations emerge through mechanisms that do not follow trivially from transcriptional heterogeneity alone, as we explore in more detail in the next section.

Mechanisms for metabolic bimodality

First, we explored the impact of stochastic promoter switching on the emergence of metabolite bimodality. Figure 2a shows the summary of calculations when evaluating the PMM for variations in the promoter time scale and promoter activity across several orders of magnitude for various values of the kinetic parameter λ. We found three qualitatively distinct parameter regimes for the metabolite distribution that emerge from the combination of stochastic switching and catalysis: (1) a regime where both enzyme and metabolite have unimodal distributions, akin to the results shown earlier in Fig. 1c; (2) a regime where both enzyme and metabolite have bimodal distributions; and (3) a regime in which the enzyme is unimodal but the metabolite is bimodal.

Fig. 2
figure 2

Mechanisms for metabolite bimodality. a We evaluated the Poisson Mixture Model across a broad range of promoter switching timescale and promoter activity. Unimodal distributions for enzyme and metabolite (similar to those shown in Fig. 1c) cover a large fraction of the parameter space. We identified two regimes in which metabolites are bimodal: in the switching-induced regime, bimodality propagates from the enzyme to the metabolite. In the catalytically-induced regime, bimodality originates from a lowly abundant enzyme and the strong separation of timescales between expression and catalysis. The small panels show model predictions for a fixed kinetic parameter K = 0.1333 molecules, and increasing λ = {300, 3000, 30000} molecules, obtained by increasing the turnover rate constant kcat. b Exact simulations for two parameter sets verify the predictions drawn from the PMM approximation. We simulated over a long time horizon to obtain accurate estimates for stationary distributions; insets show only a small portion of the time courses. The parameter values for the promoter switching rates are indicated in panel (a) and we fixed λ = 500 molecules. Both types of bimodality can be clearly distinguished in the time courses, but we note that they lead to almost identical distributions for the metabolite. In both cases, the PMM provides an accurate approximation for the stationary distributions

It can be shown that the deterministic version of our model in Eqs. (R1)–(R9) has a single steady state. Hence regime (1) can be thought of as a stochastic correction consisting of unimodal distributions around a deterministic steady state. This is the expected behaviour under the traditional assumptions of high abundance of enzyme and metabolite molecules.

The other two regimes, however, correspond to alternative routes of noise-induced bimodality that cannot be explained using deterministic models42,43,44. Regime (2) is a highly stochastic regime dominated by the slow stochastic switching of the promoter, which drives and entrains the metabolic response. Hence we term it switching-induced bimodality. Slowly switching promoters are known to produce bimodal gene expression29,41, and thus this regime corresponds to a case in which bimodality propagates from enzymes to metabolites. Figure 2a shows that this behaviour appears robustly for slow switching and high promoter activity across values of the λ parameter.

Regime (3), the second route for metabolite bimodality, originates from a unimodal but weakly expressed enzyme (low kon/koff) expressed from fast switching promoters. In this case, the birth of a small number of enzyme molecules is sufficient to kick-start catalysis and make it rapidly settle in a quasi-stationary regime. This distinct phenomenon is a result of the separation of time scales between enzyme expression and catalysis, and we refer to it as catalytically-induced bimodality. From Fig. 2a, we observe that this form of bimodality appears for a narrow range of promoter switching parameters corresponding to fast switching genes with medium to low promoter activity. This behaviour disappears altogether for a low λ parameter, for example in case of strong reversibility.

To validate the predictions of the PMM approximation, we ran full Gillespie simulations over a long time horizon for different parameter sets. Figure 2b shows the simulation time courses and resulting histograms. For switching-induced bimodality, we observe how slowly switching promoters cause a single cell to lack the enzyme over several cell cycles, a period during which the metabolite is not produced. In the case of catalytic-induced bimodality, however, fast switching combined with a low average expression level causes the metabolite abundance to drop for shorter but more frequent intervals. In both cases, the PMM provides an excellent approximation to the bimodal histograms obtained from the stochastic simulations. Furthermore, we observe that the bimodal metabolite distributions both regimes are almost indistinguishable from each other, yet they are produced by enzymes with substantially different time courses and distributions. These regimes therefore correspond to distinct forms of bimodality, arising from fundamentally different mechanisms.

Emergence of metabolic multimodality

To explore the emergence of multimodality, we examined the analytical formula of the PMM in Eq. (1) to identify kinetic regimes associated with distinct enzyme distributions. A necessary condition for the emergence of multiple modes is that the Poisson components do not overlap and are sufficiently spaced from each other. From the definition of the λ(netot) parameter in Eq. (5), this happens when the kinetic parameter K is large. As discussed earlier, depending on the distribution of the enzyme, the Poisson modes may appear or cancel in the final metabolite distribution. We thus swept the parameter K and evaluated the PMM across various enzyme expression levels, including low expression with a skewed distribution and high expression with a normally distributed enzyme.

As shown in Fig. 3, we found intricate patterns of multimodal distributions, depending on the interplay between the heterogeneity of the enzyme, P(netot), and the enzyme kinetics encapsulated by the K parameter. Multimodality appears when the enzyme expression levels are low as compared to the parameter K. For instance, the values of K in Fig. 3 are approximately 5-, 20-, and 100-fold those used in the bimodal examples in Fig. 2. For enzymes expressed at intermediate levels, in the order of tens of molecules/cell on average, we found metabolite distributions that are unimodal but highly skewed. In the case of highly expressed enzymes, metabolites followed approximately normal distributions for a wide range of kinetic parameters.

Fig. 3
figure 3

Emergence of metabolic multimodality. We used the PMM approximation to find regimes for multimodality through perturbations to the enzyme kinetics. We vary the kinetic parameter K to control the dependency of the Poisson parameter λ(netot) in Eq. (4) on the total enzyme abundance. Parameter values are λ = 750 molecules and K = {10.0400, 2.1630, 0.4660} molecules obtained by variations to the kinetic rate constants kcat and krev with a constant ratio kcat/krev. We shape the mean enzyme abundance with the promoter switching rates kon = {1.56, 5.9, 20} × 10−4 s−1 and koff = {9.8, 9.3, 8} × 10−4 s−1. From the PMM, we found intricate patterns of multimodal distributions in the metabolite, all of which show an excellent match with the corresponding Gillespie simulations. The simulated time courses show metabolite numbers traversing various quasi-stationary regimes

The predictions are confirmed by Gillespie simulations of the full stochastic model, which display a striking match with the PMM approximation, even for complex multimodal distributions. The simulation time courses (shown in the insets of Fig. 3) show that the multiple modes for weakly expressed enzymes correspond to cells remaining in a fixed metabolic state over the scale of the cell cycle but fluctuate across other states over longer time scales. For intermediate enzyme expression and large values for K, the metabolite does not settle in the quasi-stationary states and displays a long-tailed distribution. A decrease in K suppresses the tail of the distribution driving the PMM towards an approximately normal distribution. Altogether, these results indicate that the relation between enzyme expression and the kinetic parameters λ and, in particular, K are key determinants for the emergence of multimodality. This underscores the utility of the PMM to guide the prediction of qualitative and quantitative features of metabolite distributions for a wide range of parameter combinations.

Discussion

Metabolic reactions are the powerhouse of living systems, fuelling the activity and dynamics of most cellular functions. Yet metabolism has been traditionally considered as a static process isolated from the rest of the cellular machinery. Currently, the accepted notion is that due to the large number of molecules involved, metabolism is a deterministic process at the cellular level, modulated by potentially random extrinsic factors12,45. Here, we integrated enzyme kinetics and enzyme expression to propose a theoretical model for the variability of metabolites in single cells. The model suggests that cell-to-cell metabolite variation can also arise as a result of intrinsic sources such as stochastic fluctuations in enzyme expression.

The majority of work on non-genetic heterogeneity has focused on stochastic gene expression and the resulting variability in protein levels1,2. This has produced a wealth of single-cell data and models to understand the variability in transcription and translation observed in clonal populations. Metabolite heterogeneity, however, remains poorly understood theoretically and has been observed only indirectly (e.g. through measurements of metabolic enzymes12,23 or growth rate9) due to the lack of techniques to measure metabolite abundance in single cells.

Using the separation of time scales characteristic of metabolic reactions, we found that the stationary distribution of a metabolite follows a PMM. The PMM can be efficiently evaluated across large domains of the parameter space and provides excellent approximations to the distributions computed from full stochastic simulations. Importantly, the model can be readily adapted to include different stochastic models for enzyme expression, beyond the three-stage model considered here3,46, or even stochastic and time-dependent enzyme expression modelled as upstream drives41. The model can also be parameterised from experimentally measured distributions for enzyme levels in single-cells17. In combination with the enzyme kinetic parameters, the PMM could provide a powerful tool to predict metabolite variability from single-cell protein data obtained with flow cytometry or time-lapse microscopy.

We found complex patterns of metabolite heterogeneity depending on the interplay between the timescale of promoter activation/deactivation, the enzyme expression level, and the enzyme kinetics. The model predicts that bimodal and multimodal metabolite distributions can emerge in various parameter regimes. In such regimes, single-cells spend several cell cycles in a constant metabolic state, but in timescales as long as tens of cell cycles, they switch stochastically across different states. Such long-term fluctuations in single cells result in highly heterogeneous populations containing several subgroups of metabolically distinct cells.

Bimodal metabolic phenotypes have been observed as a result of transcriptional regulation14,23, post-translational control11, and stochastic effects triggered by environmental shifts12. Our model reveals two distinct regimes in which metabolites display bimodality. One regime, which we call switching-induced bimodality, corresponds to the intuitive case in which a bimodal enzyme produces a bimodal metabolite. In agreement with previous studies on stochastic gene expression, this type of bimodality appears as a result of slow switching between promoter states29,47. In addition, we identified a fundamentally different mechanism of catalytically-induced bimodality, in which a unimodal enzyme produces a bimodal distribution of metabolite. This phenomenon results from a combination of slow fluctuations of a weakly expressed enzyme and the comparatively faster timescale of enzyme catalysis. Catalytic timescales are typically in the order of seconds or faster, so that slow fluctuations in enzyme expression levels produce two quasi-stationary metabolic states in single cells. At a population level, this leads to two distinct subpopulations of metabolite producers and non-producers.

As shown in Fig. 4a, single-cell measurements in E. coli suggest that metabolic enzymes appear in low copy numbers across most cellular pathways17. In the specific growth conditions of that experiment, the data did not reveal the bimodal expression of enzymes, which precludes the emergence of switching-induced bimodality in the metabolites they catalyse. However, as illustrated by the three representative distributions in Fig. 4a, a number of enzymes have a low mean and a long-tailed distribution, akin to those required for catalytically-induced bimodality and multimodality. This suggests that enzyme distributions found in nature have the characteristics needed for the emergence of subpopulations with two or more distinct metabolite abundances.

Fig. 4
figure 4

Model predictions and experimental data. a Single-cell measurements reveal that metabolic enzymes are as variable as other members of the proteome17. Data correspond to ~80% of the E. coli proteome, including 268 enzymes involved in various metabolic functions55 (coloured circles). The coefficient of variation (CV) is defined as the standard deviation over the mean of measured distributions. Distributions with CV > 1 (dashed line) are long-tailed and peak at zero, which resemble the distributions required for catalytically-induced bimodality and multimodality (Figs. 2b and 3); shown are the distributions of three representative enzymes computed from fitted Gamma distributions17. b Predictions of the Poisson Mixture Model for combinations of parameters λ and K in Eq. (5). For a lowly abundant enzyme (distribution in inset), the model predicts unimodality, catalytically-induced bimodality and multimodality in large regions of the (λ, K) space. The red line represents a constant ratio λ/K for the median kcat ≈ 16.5 s−1 across 752 enzymes in E. coli, see kcat distribution displayed in the inset49. Shaded red corresponds to the lines obtained for kcat values within a range 0.5- and 2-fold of the median (highlighted in green in the inset). The grey lines and shaded areas correspond to perturbations to the consumption rate constant (kc) and enzyme saturation \(\left( \epsilon \right)\); more details in Methods

Further requirements for metabolite bimodality and multimodality involve conditions on the parameters λ and K in Eq. (5). However, their computation requires rate constants (k1, k−1, and krev) that are rarely measured individually, and instead enzymology data typically provides values for kcat and KM = (kcat + k−1)/k1 only48. In the Methods section, we show that the ratio λ/K can be expressed as \(\lambda _\infty {\mathrm{/}}K = \epsilon \times k_{{\mathrm{cat}}}{\mathrm{/}}k_{\mathrm{c}}\), where \(\epsilon\) is the saturation level of the enzyme and kc is the first-order rate constant of metabolite consumption. As illustrated in Fig. 4b, the ratio λ/K corresponds to a straight line in a (λ, K)-space, and a specific enzyme (i.e. with specific values for k1, k−1, and krev) corresponds to a single point on the line. In Fig. 4b, we compare model predictions for a lowly abundant enzyme with different λ/K ratios computed for kcat constants measured in E. coli49. Considering the large spread in measured kcat values, of up to seven orders of magnitude, plus the multiple combinations of metabolite consumption rates and enzyme saturation, the analysis suggests that catalytically-induced bimodality and multimodality are plausible within physiological regimes. Further validations of our predictions require measuring metabolite distributions directly, but this is still subject to a number of challenges in single-cell measurement technologies15.

Our analysis shows that metabolite heterogeneity depends on a delicate interplay between enzyme expression and enzyme kinetics. It is reasonable to expect that energy-critical enzymes, such as those in central carbon metabolism, filter away fluctuations through post-translational regulatory mechanisms commonly found in metabolism. However, this may not be the case in pathways that are dynamically regulated in response to changes in the environment or cellular context. For example, transcriptional regulation in response to nutrient shifts may steer enzyme levels into regimes of low copy numbers where heterogeneity may dominate the resulting phenotypes. Such a mechanism has been already shown to produce growth bimodality in the gluconeogenic switch of E. coli12, while a similar mechanism could underpin the large variability observed in single-cell measurements of S-adenosyl methionine21. Noise-induced phenomena also have implications for the design of dynamic control systems for heterologous pathways, which are focus of much research in synthetic biology and metabolic engineering50.

In our efforts to build a theory that includes components shared by most enzymatic reactions, we have purposely overlooked a number of processes that can shape metabolic activity. For example, we have not addressed the impact of feedback mechanisms that control enzyme activity, including e.g. product inhibition and allostery, or transcriptional mechanisms that control enzyme expression in response to metabolites. Since post-translational regulation operates on timescales much shorter than enzyme expression, with similar timescale separation arguments it should be possible to express the metabolite distribution as a mixture model akin to ours. In such a case, the mixture components are not necessarily Poisson and their distribution will depend on the particular mechanism under study. We expect that bimodal and multimodal responses are likely to emerge in this setting, but the precise parameter conditions would have to be studied on a case-by-case basis. Transcriptional feedback can also display various mechanisms depending on the particular pathway under study. One common motif relies on transcription factors (TF) that up- or down-regulate enzyme expression upon binding to a specific metabolite20. These mechanisms have been shown to play important roles on metabolic activity51, but they also bring to the fore subtle questions that require detailed examination, for example, on the role of fluctuations coming from TF expression itself, or the impact of negative TF autoregulation52. Our study paves the way for these and other questions to be addressed and raises exciting prospects for the future research in metabolic heterogeneity.

In this paper, we laid theoretical foundations to study metabolism in conjunction with stochastic enzyme expression. We brought together classic models for gene expression and enzyme kinetics, and discovered a rich array of distinct stochastic phenomena that underpin the emergence of metabolic subpopulations. Our theory provides a quantitative basis to draw testable hypotheses on the sources of metabolite heterogeneity, which together with the ongoing efforts in single-cell metabolite measurements, will help to re-think metabolism as an active source of phenotypic variation.

Methods

Stochastic modelling and simulation

We built a fully stochastic model for the reaction scheme describing a metabolic reaction coupled with gene expression (Fig. 1a):

$${\mathrm{Substrate}} + {\mathrm{Enzyme}}\begin{array}{*{20}{c}} {k_1} \\ \rightleftarrows \\ {k_{ - 1}} \end{array}{\mathrm{Complex}}$$
(R1)
$${\mathrm{Complex}}\begin{array}{*{20}{c}} {k_{{\mathrm{cat}}}}\\ \rightleftarrows \\ {k_{{\mathrm{rev}}}} \end{array}{\mathrm{Metabolite}} + {\mathrm{Enzyme}}$$
(R2)
$${\mathrm{mRNA}}\begin{array}{*{20}{c}} {k_{{\mathrm{tl}}}} \\ \to \\ {} \end{array}{\mathrm{mRNA}} + {\mathrm{Enzyme}}$$
(R3)
$${\mathrm{DNA}}_{{\mathrm{on}}}\begin{array}{*{20}{c}} {k_{{\mathrm{tx}}}} \\ \to \\ {} \end{array}{\mathrm{DNA}}_{{\mathrm{on}}} + {\mathrm{mRNA}}$$
(R4)
$${\mathrm{DNA}}_{{\mathrm{off}}}\begin{array}{*{20}{c}} {k_{{\mathrm{on}}}} \\ \rightleftarrows \\ {k_{{\mathrm{off}}}} \end{array}{\mathrm{DNA}}_{{\mathrm{on}}}$$
(R5)
$${\mathrm{Metabolite}}\begin{array}{*{20}{c}} {k_{\mathrm{c}}} \\ \to \\ {} \end{array}\emptyset$$
(R6)
$${\mathrm{mRNA}}\begin{array}{*{20}{c}} {k_{{\mathrm{deg}}}} \\ \to \\ {} \end{array}\emptyset$$
(R7)
$${\mathrm{Enzyme}}\begin{array}{*{20}{c}} {\mathrm{\delta }} \\ \to \\ {} \end{array}\emptyset$$
(R8)
$${\mathrm{Complex}}\begin{array}{*{20}{c}} {\mathrm{\delta }} \\ \to \\ {} \end{array}\emptyset$$
(R9)

All reactions are assumed to follow mass action kinetics. Model simulations were computed with Gillespie’s algorithm34 over long time horizons, in the order of hundreds of cell cycles for all simulations. Because of the complex multimodality observed, long simulations are needed to obtain accurate approximations of the stationary molecular distributions. The time courses shown in figures correspond to a small time window of the overall simulation. Unless mentioned in figure captions, all parameter values were fixed to their nominal values shown in Table 1. The parameters are selected in a physiologically realistic range respecting the scale separation of molecule numbers between mRNA (~1–5 molecules), total enzyme abundance (~100 molecules) and metabolites (~1000 molecules).

Table 1 Nominal parameters for the stochastic model

To identify bimodality in Fig. 2, we detected the existence of one or two peaks in a distribution and defined it as bimodal if the height of the smaller peak is a least 10% of the larger peak and the trough between peaks is at most 10% of the height of the smaller peak.

Analytical expressions for the metabolite distribution

To derive an analytic approximation for the probability to observe np metabolites in a cell, we first use the law of total probability as shown in Eq. (1).

Distribution of the total enzyme: Because free enzymes and complexes degrade at the same rate and netot = ne + nc is conserved by the metabolic reaction, in the slow timescale the enzyme distribution P(netot) follows the standard solution27 of the three-stage model for gene expression:

$$\begin{array}{*{20}{l}} {{\bf{P}}\left( {n_{{\mathrm{etot}}}} \right)} \hfill & = \hfill &\hskip-1pt {\frac{{{\mathrm{\Gamma }}\left( {\alpha _ + + n_{{\mathrm{etot}}}} \right){\mathrm{\Gamma }}\left( {\alpha _ - + n_{{\mathrm{etot}}}} \right){\mathrm{\Gamma }}(\gamma )}}{{{\mathrm{\Gamma }}\left( {n_{{\mathrm{etot}}} + 1} \right){\mathrm{\Gamma }}\left( {\alpha _ + } \right){\mathrm{\Gamma }}\left( {\alpha _ - } \right){\mathrm{\Gamma }}\left( {\gamma + n_{{\mathrm{etot}}}} \right)}}} \hfill \\ {} \hfill & {} \hfill & { \times \left( {\frac{b}{{1 + b}}} \right)^{n_{{\mathrm{etot}}}}\left( {1 - \frac{b}{{1 + b}}} \right)^{\alpha _ + }} \hfill \\ {} \hfill & {} \hfill & { \times _{2}F_1\left( {\alpha _ + + n_{{\mathrm{etot}}},\gamma - \alpha _ - ,\gamma + n_{{\mathrm{etot}}};\frac{b}{{1 + b}}} \right),} \hfill \end{array}$$
(7)

where Γ is the Gamma function and 2F1 is the ordinary hypergeometric function. The parameters are γ = (kon + koff)/δ and \(\alpha _ \pm = \left( {a + \gamma \pm \sqrt {(a + \gamma )^2 - 4ak_{{\mathrm{off}}}{\mathrm{/}}\delta } } \right){\mathrm{/}}2\), with a = ktx/δ and b = ktl/kdeg.

Conditional distribution for the metabolite: To compute the second term in Eq. (1), we observe that enzyme expression occurs on a much longer timescale than enzyme kinetics, and thus metabolites can be considered to be in a quasi-equilibrium state of the catalytic reactions (R1) and (R2) and metabolite consumption (R6).

To explicitly compute the mixture components P(np|netot), we assume that reversible binding between substrate and enzyme in reaction (R1) is much faster than the catalytic step and metabolite consumption35. In this limit, the metabolite number evolves according to the effective reactions:

$$\emptyset \begin{array}{*{20}{c}} {{\mathrm{k}}_{{\mathrm{cat}}}^{{\mathrm{eff}}}} \\ \rightleftarrows \\ {{\mathrm{k}}_{{\mathrm{rev}}}^{{\mathrm{eff}}}} \end{array}{\mathrm{Metabolite}}\begin{array}{*{20}{c}} {{\mathrm{k}}_{\mathrm{c}}} \\ \to \\ {} \end{array}\emptyset ,$$
(8)

where \({\mathrm{k}}_{{\mathrm{cat}}}^{{\mathrm{eff}}}\) and \({\mathrm{k}}_{{\mathrm{rev}}}^{{\mathrm{eff}}}\) are effective propensities averaged over the fast fluctuating variables nc and ne:

$$\begin{array}{l}{\mathrm{k}}_{{\mathrm{cat}}}^{{\mathrm{eff}}} = {\mathrm{k}}_{{\mathrm{cat}}}{\Bbb E}\left( {n_{\mathrm{c}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right),\\ {\mathrm{k}}_{{\mathrm{rev}}}^{{\mathrm{eff}}} = k_{{\mathrm{rev}}}{\Bbb E}\left( {n_{\mathrm{e}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right),\end{array}$$
(9)

where \({\Bbb E}\) denotes the expectation operator. The derivation of the effective propensities in Eq. (9) corresponds to a particular case of a more general methodology for timescale separation in stochastic chemical systems29,53,54. Note that since the total enzyme levels are conserved in the catalytic timescale, it follows that

$${\Bbb E}\left( {n_{\mathrm{e}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right) + {\Bbb E}\left( {n_{\mathrm{c}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right) = n_{{\mathrm{etot}}}.$$
(10)

To derive the conditional expectations in Eq. (9), we write the first-order moment equation for the free enzyme ne, which according to Eqs. (R1) and (R2) reads

$$\begin{array}{*{20}{l}} {\frac{{\mathrm{d}}}{{{\mathrm{d}}t}}{\Bbb E}\left( {n_{\mathrm{e}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right)} \hfill & = \hfill &\hskip-1.5pt { - k_1n_{\mathrm{s}}{\Bbb E}\left( {n_{\mathrm{e}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right)} \hfill \\ {} \hfill & {} \hfill & { + \,k_{ - 1}{\Bbb E}\left( {n_{\mathrm{c}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right)} \hfill \\ {} \hfill & {} \hfill & { + \,k_{{\mathrm{cat}}}{\Bbb E}\left( {n_{\mathrm{c}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right)} \hfill \\ {} \hfill & {} \hfill & { - \,k_{{\mathrm{rev}}}{\Bbb E}\left( {n_{\mathrm{e}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right)n_{\mathrm{p}}.} \hfill \end{array}$$
(11)

Under the assumption that the reversible binding of substrate and enzyme is much faster than the other processes, the first two terms dominate the right-hand side of Eq. (11) and determine the enzyme–complex quasi-equilibrium. Equating these two terms and using the conservation relation in Eq. (10), we obtain

$$\begin{array}{l}{\Bbb E}\left( {n_{\mathrm{e}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right) \approx \frac{{k_{ - 1}}}{{\left( {k_{ - 1} + k_1n_{\mathrm{s}}} \right)}}n_{{\mathrm{etot}}},\\ {\Bbb E}\left( {n_{\mathrm{c}}{\mathrm{|}}n_{{\mathrm{etot}}},n_{\mathrm{p}}} \right) \approx \frac{{k_1n_{\mathrm{s}}}}{{k_{ - 1} + k_1n_{\mathrm{s}}}}n_{{\mathrm{etot}}},\end{array}$$
(12)

and thus both conditional expectations depend on netot and are independent of the metabolite abundance. Therefore, the reactions in Eq. (8) correspond to a birth–death process with a zero-th order birth propensity \({\mathrm{k}}_{{\mathrm{cat}}}^{{\mathrm{eff}}}\) and two linear death propensities. The mixture components P(np|netot) are thus Poissonian with parameter λ(netot) as shown in Eqs. (3) and (4).

Comparison of PMM predictions and measured kinetic parameters

The PMM depends on the effective parameters λ and K, which are functions of five rate constants (kcat, k1, k−1, kc, and krev). Most of these parameters are not available, except kcat and KM = (kcat + k−1)/k1. From Eq. (5) it follows

$$\frac{{\lambda _\infty }}{K} = \frac{{k_{{\mathrm{cat}}}}}{{k_{\mathrm{c}}}} \times \underbrace {\frac{{k_1n_{\mathrm{s}}}}{{k_1n_{\mathrm{s}} + k_{ - 1}}}}_{{\mathrm{saturation}}\,\epsilon },$$
(13)

which allows the computation of λ/K for measured kcat values in different saturation conditions and consumption rate constants. The red line in Fig. 4b represents the λ/K ratio for a saturated enzyme \(\left( {\epsilon = 1} \right)\), fast consumption (kc = 100 × δ), and the median kcat ≈ 16.5 s−1 in E. coli49. The top grey line is the case without consumption, i.e. metabolites are diluted by cell growth (kc = δ). Lower saturation \(\left( {\epsilon = 0.2} \right)\) moves the red line down the vertical axis (bottom grey line). The enzyme distribution in Fig. 4b was produced with promoter switching parameters {kon, koff} = {1.56, 3} × 10−4 s−1, ktx = 0.025 s−1, and ktl = 0.2 s−1. The boundaries between unimodal, bimodal, and multimodal distributions were computed as follows. Unimodal distributions are those with a single maximum. Bimodal distributions were detected as in Fig. 2. Multimodal distributions are those with at least one additional peak higher than a threshold of 1 × 10−4 and the trough between neighbouring peaks at most 90% of its height.

Code availability

The code for producing model simulations is available from the authors upon request.

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.