Heterogeneous recruitment abilities to RNA polymerases generate nonlinear scaling of gene expression with cell volume

Wang, Qirun; Lin, Jie

doi:10.1038/s41467-021-26952-y

Download PDF

Article
Open access
Published: 25 November 2021

Heterogeneous recruitment abilities to RNA polymerases generate nonlinear scaling of gene expression with cell volume

Nature Communications volume 12, Article number: 6852 (2021) Cite this article

2487 Accesses
10 Citations
12 Altmetric
Metrics details

Subjects

Abstract

While most genes’ expression levels are proportional to cell volumes, some genes exhibit nonlinear scaling between their expression levels and cell volume. Therefore, their mRNA and protein concentrations change as the cell volume increases, which often have crucial biological functions such as cell-cycle regulation. However, the biophysical mechanism underlying the nonlinear scaling between gene expression and cell volume is still unclear. In this work, we show that the nonlinear scaling is a direct consequence of the heterogeneous recruitment abilities of promoters to RNA polymerases based on a gene expression model at the whole-cell level. Those genes with weaker (stronger) recruitment abilities than the average ability spontaneously exhibit superlinear (sublinear) scaling with cell volume. Analysis of the promoter sequences and the nonlinear scaling of Saccharomyces cerevisiae’s mRNA levels shows that motifs associated with transcription regulation are indeed enriched in genes exhibiting nonlinear scaling, in concert with our model.

Negative autoregulation controls size scaling in confined gene expression reactions

Article Open access 22 June 2022

Nuclear compartmentalization as a mechanism of quantitative control of gene expression

Article 02 August 2021

Transcription–replication interactions reveal bacterial genome regulation

Article 24 January 2024

Introduction

Homeostasis of gene expression level in a dynamically growing cell volume is widely observed across the domains of biology, namely, the concentrations of mRNAs and proteins of most genes are approximately constant^{1,2,3,4,5,6,7,8,9,10}. Since the cell volume usually grows exponentially^9,11,12, this implies exponential growth of mRNA and protein copy numbers as well, and a linear scaling between the gene expression level and cell volume exists. Theoretical models have shown that the linear scaling between gene expression level and exponentially growing cell volume is a consequence of the limiting nature of gene expression machinery: RNA polymerases (RNAPs) and ribosomes¹³, in agreement with experimental observations^10,14,15,16. However, along with the constant concentrations of mRNAs and proteins of most genes, there is a subset of genes exhibiting nonconstant concentrations as the cell volume increases. These genes are often crucial for cell-cycle and size regulation. They allow cells to sense their sizes based on the concentrations of proteins that have different scaling behaviors with cell volume, such as Whi5 and Cln3 in Saccharomyces cerevisiae^{17,18,19,20,21}. Other examples include DNA-binding proteins such as histone proteins, whose expression levels scale with the total DNA amount instead of cell volume²². Recent experiments show that proteins with changing concentration are often associated with cell senescence²³. A fundamental question then arises: if a linear scaling between gene expression level and cell volume are by default for most genes, how can cells achieve nonlinear scaling for a subset of genes with cell volume in the meantime? In this paper, we show that the superlinear and sublinear scaling of gene expression level is a direct consequence of the heterogeneous recruitment abilities of promoters to RNA polymerases. Given a unimodal distribution of recruitment abilities, those genes with their promoters’ recruitment abilities below (above) the average spontaneously exhibit superlinear (sublinear) scaling with cell volume, while genes with recruitment abilities near the average exhibit approximately linear scaling.

In the following, we first introduce a gene expression model at the whole-cell level in which the promoters of all genes have the same recruitment abilities to RNAPs, which is analytically solvable. Then we consider a scenario in which all genes except a small subset of genes have the same recruitment abilities and show that the expression levels of those special genes can exhibit superlinear or sublinear scaling with cell volume depending on their relative magnitudes of recruitment abilities compared with the majority of genes. Then we extend the simplified model to allow a continuous distribution of recruitment abilities and show that our simplified model can quantitatively capture this more realistic scenario. Genes with recruitment abilities below (above) the average value naturally exhibit superlinear (sublinear) scaling with cell volume. Finally, to verify our theoretical predictions, we analyze the nonlinear scaling of mRNA numbers vs. cell volume of S cerevisiae using the data from Ref. ²¹. Our model predicts a positive correlation between the mRNA production rates and nonlinear scaling degrees among genes, and experimental data confirm this. We further analyze the promoter sequences of all genes with measured nonlinear scaling degrees and find that special motifs for transcription factor binding are indeed enriched in the promoters of genes exhibiting nonlinear scaling, in concert with our theoretical predictions. Our results imply that the nonlinear scaling of gene expression level can be under evolutionary selection through the promoter sequences.

Results

Model of gene expression at the whole-cell level

We consider a coarse-grained model of gene expression. The mRNA production rate k_n,i of one particular gene labeled by index i is proportional to its gene copy number (g_i) and the probability for its promoters to be bound by RNAPs (P_b,i),

$${k}_{n,i}={{{\Gamma }}}_{n,i}{g}_{i}{P}_{b,i},$$

(1)

where

$${P}_{b,i}=\frac{{c}_{n,free}}{{c}_{n,free}+{K}_{n,i}}$$

(2)

Here Γ_n,i is the initiation rate of transcription of gene i: the rate that a promoter-bound RNAP starts transcribing the gene and producing mRNA. We assume the probability for one promoter to be bound by an RNAP follows the Michaelis-Menten equation where c_n,free is the concentration of free RNAPs in the nucleus²⁴ (see the schematic in Fig. 1). For simplicity, we consider all the RNAPs to be in the nucleus and neglect the small fractions of RNAP intermediates that may exist in the cytoplasm.

**Fig. 1: A coarse-grained model of gene expression at the whole-cell level.**

K_n,i is the Michaelis-Menten (MM) constant which is inversely proportional to the binding rate of RNAPs on the promoters (see detailed derivations in Supplementary Discussion A). In the following, we use 1/K_n,i as a metric of the recruitment abilities of genes to RNAPs: a larger K_n,i represents a smaller recruitment ability. We assume the mRNA lifetime as τ_m,i so that given the mRNA production rate, the mRNA copy number m_i changes according to Eq. (6) in Methods. Note that we mainly discuss eukaryotic cells in this work in which the transcription and translation processes are spatially separate, and nonspecific binding of RNAPs to DNA are mostly irrelevant^25,26. As we also mainly discuss the transcription of mRNA, RNAP here refers to RNAP II. Real transcription processes in eukaryotic cells are complex and may involve transcription factors, mediators, enhancers, and TATA-binding proteins^27,28,29. Their effects are coarse-grained into the Michaelis-Menten constant within our model, e.g., a transcription factor that increases the expression level of one gene is equivalent to reducing the Michaelis-Menten constant in the RNAP binding probability P_b,i.

Because the typical time scale for an RNAP to finish transcribing a gene is around one minute³⁰, much shorter than the typical doubling times, we assume that the dynamics of RNAPs along each gene is in the steady-state. Therefore, the outgoing flux of RNAPs that finish transcribing a gene (v_nn_i/L_i) must be equal to the initiation rate of transcription (P_b,iΓ_n,i) where n_i is the number of transcribing RNAPs on one copy of gene i. Here L_i is the length of the gene i in the number of codons and v_n is the elongation speed of RNAP, which we approximate as a constant. This leads to the expression of n_i as n_i = P_b,iΛ_n,i where Λ_n,i = Γ_n,iL_i/v_n is the maximum possible number of transcribing RNAPs on one copy of gene i.

A similar model as transcription also applies to the translational process (Fig. 1 and Methods). In the following analysis, we assume that most proteins are nondegradable, and our results regarding nonlinear scaling are equally valid for degradable proteins. The total protein mass of a cell is M = ∑_ip_iL_i, in the number of amino acids where p_i is the protein copy number of gene i. The total protein mass is known to be proportional to the cell volume^1,31,32,33. Therefore, we assume a constant ratio between the total protein mass and cell volume. We further assume that the nuclear volume is proportional to the total cell volume, supported by experimental observations³⁴.

A simplified model in which all genes share the same recruitment ability

In the following, we consider a simplified scenario in which the promoters of all genes have the same recruitment ability to RNA polymerases so that K_n,i = K_n for all i. Within this scenario, we find that the mass fractions of proteins in the entire proteome are approximately constant as the cell volume increases given fixed gene copy numbers (Supplementary Discussion B). Therefore, the protein numbers of all genes, including RNAP, are proportional to the total protein mass, and therefore also proportional to the cell volume.

We also assume that almost all RNAPs are bound to a promoter or transcribing, and we will discuss its validity later in this section. Therefore, the total mRNA production rate of all genes is proportional to the total number of RNAPs, which is also proportional to the cell volume. Finally, using the fact that all genes share the same recruitment ability to RNAPs, it follows that the total mRNA production rate should be evenly distributed among all genes. Therefore, each gene’s mRNA production rate is also proportional to cell volume, which is the main result of this section.

Here, we also derive the mathematical expression of mRNA production rates. As we show in Methods, the fraction of RNAPs that are bound to a promoter or transcribing is very close to 1 if n < n_c where n_c = ∑_ig_i(1 + Λ_n,i), which is the maximum number of RNAPs the entire genome can hold. In other words, the fraction of free RNAPs, which are neither bound to a promoter nor transcribing, is much smaller than 1 if n < n_c. Given this, we obtain the binding probability as P_b = n/n_c (Methods). Using Eq. (1), the mRNA production rate of gene i becomes

$${k}_{n,i}={{{\Gamma }}}_{n,i}\frac{{g}_{i}}{{n}_{c}}n,$$

(3)

which is proportional to the cell volume.

The above scenario of gene expression, which has been called Phase 1 of gene expression¹³, is the typical state of cells in normal conditions^6,14,35 and the main focus of this work. In Phase 1, the cell volume grows exponentially, and the homeostasis of mRNA and protein concentrations of most genes are maintained¹³ (see detailed discussions in Supplementary Discussion B). When the total number of RNAPs exceeds the threshold value n_c, the linear scaling between mRNA numbers and cell volume breakdowns and cells enter a different phase of gene expression (see detailed discussions in Methods and Supplementary Discussion B).

Finally, since the mRNA lifetimes are typically much shorter than the doubling time³⁰, mRNA productions are in quick equilibrium. Therefore according to Eq. (6) in Methods, the mRNA copy numbers can be approximated as the products of mRNA production rates and mRNA lifetimes, therefore proportional to the cell volume as well.

A more realistic model in which genes can have different recruitment abilities

We now consider a more realistic scenario in which the recruitment abilities to RNAPs of different genes can be different. We start from a simple scenario in which all genes have the same recruitment ability 1/K_n except one special gene i has a recruitment ability 1/K_n,i. We note that the only parameter affecting mRNA production rates as the volume changes is the concentration of free RNAPs, c_n,free, which enters the binding probability P_b,i (Eq. (2)). Since the contribution of the particular gene to the global allocation of RNAPs is negligible, the proportionality between the mRNA production rates of most genes and the cell volume is still valid. It then follows that c_n,free must change in a way that ensures that the binding probability of RNAP P_b,i is proportional to cell volume for all genes except the particular gene with a different MM constant. Therefore, if the particular gene has a lower MM constant K_n,i than the typical K_n, this gene is saturated earlier by the rising free RNAP concentration—it thus increases more slowly with increasing volume. A gene with a higher K_n,i, on the other hand, is not so easily saturated as the typical gene and hence increases superlinearly with volume.

More quantitatively, given the probability for the promoters of most genes to be bound is still P_b ≈ n/n_c, the free RNAP concentration can be expressed as a function of n/n_c. Using the expression of free RNAP concentration, we obtain the mRNA production rate for the particular gene i with the MM constant equal to K_n,i:

$${k}_{n,i}={{{\Gamma }}}_{n,i}{g}_{i}\frac{{K}_{n}n}{{K}_{n,i}{n}_{c}-({K}_{n,i}-{K}_{n})n}.$$

(4)

As n gets closer to n_c (with F_n ≪ 1 still satisfied), we find that if K_n,i > K_n (K_n,i < K_n) the mRNA production rate of gene i exhibits a superlinear (sublinear) dependence on the RNAP number. Because the RNAP number is proportional to cell volume, this nonlinear relation is also valid between the mRNA production rates and cell volume, which leads to the nonlinear scaling between the mRNA copy numbers and cell volume. Furthermore, if the corresponding proteins have short lifetimes, their copy numbers will be proportional to the mRNA production rates as well. More importantly, we find that the nonlinear scaling in mRNA copy numbers also propagates to nondegradable proteins, suggesting that the nonlinear scalings in protein copy numbers are insensitive to their lifetimes (Methods).

In the following, we discuss a more realistic scenario that is a continuous distribution of K_n,i to reflect the promoter structures’ heterogeneity. In this case, we propose that the nonlinear scaling, Eq. (4), is still approximately valid for any gene if K_n is replaced by 〈K_n,i〉, the average value of K_n,i among all genes with some appropriate weights. In Supplementary Discussion D, we show that the appropriate weight can be well approximated by the protein mass fractions, as we confirm numerically in the next section. Therefore, those genes with K_n,i larger (smaller) than 〈K_n,i〉 exhibit superlinear (sublinear) scaling, while genes with K_n,i near the average exhibit approximately linear scaling.

We remark that to ensure the linear scaling of the majority of genes, the RNAP number and the ribosome number should be proportional to the cell volume, which requires that the MM constants of RNAP and ribosome are close to the average value. These are the additional assumptions we make in the case of continuously distributed K_n,i. For RNAP, it is supported by the constant mRNA concentrations of RNAP related genes observed in the experimental data from Ref. ²¹ (Supplementary Fig. 10). For ribosome, we found a small deviation of ribosomal mRNA number from linear scaling (Supplementary Fig. 9). However, as we show later, our theoretical predictions on the nonlinear scaling of gene expression level still work satisfyingly well in the presence of a small deviation of the ribosome from linear scaling.

To quantify the nonlinear degree between the expression level and the cell volume of each gene, we introduce a parameter β for mRNAs (and also α for nondegradable proteins, see detailed derivations and discussions in Methods), so that its scaling with cell volume becomes

$${\widetilde{m}}_{i}(t)=\widetilde{V}(t)\frac{1+{\beta }_{i}}{1+{\beta }_{i}\widetilde{V}(t)}.$$

(5)

Here the mRNA are measured relative to their values at time t = 0, e.g., ${\widetilde{m}}_{i}(t)={m}_{i}(t)/{m}_{i}(0)$. A similar formula hold for proteins (Eq. (19) in Methods). Positive (negative) β_i represent sublinear (superlinear) scaling behaviors. They are related to the recruitment abilities of their corresponding genes to RNAPs (see the full expressions in Methods) and when K_n,i is close to the average Michaelis-Menten constant 〈K_n,i〉, we find that β_i ∝ 〈K_n,i〉 − K_n,i.

Numerical simulations

We numerically tested our theoretical predictions by simulating a cell with 2000 genes, including RNAP and ribosome, which we coarse-grained as single proteins. Simulation details are summarized in Methods, and parameter values are shown in Supplementary Table 1. To avoid the effects of the cell cycle, which is certainly important but also complicates our analysis, we mainly considered the scenario in which the cell volume grows without cell-cycle progress e.g., cells arrested in the G1 phase, which is a common experimental protocol to study the effects of cell volume on gene expression^21,35. We also simulated the case of the periodic cell cycle as we discuss later in the Discussion section.

We first simulated the simplified model with homogeneous recruitment abilities to RNAPs. We confirmed the exponential growth of cell volume and the linear scaling between mRNA copy numbers, protein copy numbers, and cell volume (Supplementary Fig. 2). We then simulated the case when all genes share the same recruitment ability K_n, but two genes have different abilities that are respectively smaller and larger than K_n. Our theoretical predictions for the expression levels of mRNAs, nondegradable proteins (Fig. 2a, b), and degradable proteins (Supplementary Fig. 3), which assume short mRNA lifetimes, match the simulation results reasonably well. We note that the lifetimes of degradable proteins can be comparable to the duration of the cell cycle, e.g., half of the cell-cycle duration³⁶, in which case deviations of numerical simulations from our theoretical prediction are expected (Supplementary Fig. 3). The nature of nonlinear scaling, whether superlinear or sublinear, is nevertheless independent of the protein’s lifetime, as we show in Methods.

**Fig. 2: We simulate the model with two special genes and their Michaelis-Menten constants are respectively K_n,i = 20K_n, K_n,i = 0.2K_n.**

Finally, we simulated the more realistic scenario in which K_n,i is continuously distributed, which we modeled as a lognormal distribution. Our conclusions are independent of the specific choice of the form of distribution. Examples of volume dependence of several mRNAs and proteins are shown in Supplementary Fig. 4. Nonlinear degrees of mRNAs and proteins are measured based on Eqs. (5, 19).

The resulting distribution of mRNAs is shown in Fig. 3a, and those of proteins are shown in Supplementary Fig. 4. To test our theoretical predictions on the relations between the nonlinear degrees and K_n,i (Eqs. (17, 20) in Methods), we also need the expression of the appropriate average 〈K_n,i〉. We found that the appropriate average Michaelis-Menten constant is the one weighted by the initial protein mass fractions, 〈K_n,i〉 = ∑_iϕ_iK_n,i, where ϕ_i is the mass fraction of proteins i in the entire proteome at time t = 0. The above expression of 〈K_n,i〉 leads to a good agreement between numerical simulations and theoretical predictions (Fig. 3b). We also used an alternative weight that is the time-averaged protein mass fractions, which works equally well (Supplementary Fig. 5). We set the Michaelis-Menten constants of RNAP and ribosome as the average value to ensure that their concentrations are approximately constant as the cell volume increases.

**Fig. 3: Simulation of the scenario in which K_n,i is continuously distributed.**

In Fig. 3a, b, we set the coefficient of variation (standard deviation/mean) of the MM constants as 0.5. To confirm the validity of the model since the recruitment abilities of different promoters can be widely different^27,28,37,38, we also simulated a larger coefficient of variation equal to 1 (Fig. 3c, d). The resulting nonlinear degrees exhibit a broader distribution and appear more similar to experiments as we show in the next section. Furthermore, the theoretical predictions of the nonlinear degrees of mRNA numbers still match reasonably well with the simulations.

We note that the recruitment ability not only determines the nonlinear degree of volume scaling but also affects the mRNA production rate since a higher recruitment ability enhances the binding probability of RNAPs to the promoter. This suggests that there should be a positive correlation between the mRNA production rate and the nonlinear degree. Meanwhile, we note that the recruitment ability also depends on the transcription initiation rate Γ_n,i (Supplementary Discussion A, Eq. (S2)): a higher initiation rate reduces the recruitment ability (increases the MM constant). For simplicity, in most of our simulations, we consider a constant Γ_n,i for genes (except for ribosome and RNAP), and in this case, we indeed found a strong positive correlation between the mRNA production rates and the nonlinear degrees β (Supplementary Fig. 4e). However, in a more general model with heterogeneity in Γ_n,i, a higher initiation rate increases the mRNA production rate but also reduces the recruitment ability so that decreases the nonlinear degree. Therefore, heterogeneity in the initiation rates reduces the correlation between the mRNA production rates and the nonlinear degrees. To confirm this prediction, we simulated the case of heterogeneous initiation rates (see numerical details in Supplementary Discussion E), and our predictions are confirmed numerically (Supplementary Fig. 6). We note that this may be a plausible mechanism of the weak but positive correlation observed in the experimental data, as we discuss in the next section.

Our results suggest that those genes with sublinear scaling and smaller K_n,i contribute more in the weighted average of K_n,i, therefore $\langle {K}_{n,i}\rangle \; < \;\overline{{K}_{n,i}}$ where $\overline{{K}_{n,i}}$ is the average over all genes with equal weights. Therefore, genes with ${K}_{n,i}\approx \overline{{K}_{n,i}}$ are expected to exhibit superlinear scaling. However, to have an estimation of the nonlinear degree of most genes, the appropriate MM constant to compare with 〈K_n,i〉 is the median value. For the lognormal distribution we used in simulations, we found that the median value of K_n,i is close and slightly larger than 〈K_n,i〉. Therefore, the nonlinear degree of the median K_n,i is slightly negative compared with the entire distribution (Fig. 3a, c), which is consistent with the experimental observations (Fig. 4a).

**Fig. 4: Analysis of experimental data.**

Analysis of experimental data and searching for motifs in the promoter sequences

We analyzed the genome-wide dataset from Ref. ²¹ where the volume dependences of mRNA levels are measured for S. cerevisiae. We calculated the nonlinear degrees of mRNA scaling with cell volume using Eq. (5) and obtained the resulting distribution (Fig. 4a). The calculated nonlinear degree β is highly correlated with the nonlinear scale calculated in Ref. ²¹ (Supplementary Fig. 7). The median value of nonlinear degrees is close to zero, suggesting that the majority of genes show approximately linear scaling, similar to our numerical simulations. We found that by choosing appropriate parameters, the numerically simulated distribution of nonlinear degrees matches the experimental measured distribution reasonably well (Supplementary Fig. 8).

We also calculated the correlation between the mRNA production rates and the nonlinear degrees β. We used the mRNA amount at the smallest cell volume divided by the mRNA lifetime as a proxy for the mRNA production rate according to Eq. (6) (Methods). A weak but significant correlation is indeed observed, as shown in Fig. 4b. As we discussed in the section of numerical simulations, the heterogeneous initiation rates can reduce the correlation between the mRNA production rates and the nonlinear degrees. To further verify our model, we also simulated a modified model in which the nonlinear scaling is independent of the Michaelis-Menten constants (Supplementary Discussion E) and found that the correlation between the mRNA production rates and the nonlinear degrees becomes negative (Supplementary Fig. 6).

We used Gene Set Enrichment Analysis (GSEA)^39,40 to find annotated functional gene sets that are enriched in the superlinear and sublinear scaling regime (Methods, Supplementary Fig. 9). Interestingly, we found that the ribosomal genes and other translation-related genes, which correspond to the coarse-grained ribosomal genes in our model, are enriched in the superlinear regime with the average β over ribosomal genes about − 0.2. Similar observations were reported in Ref. ²¹. However, the superlinear scaling of ribosomal genes was also observed in the control cases in which the nonlinearities of other known nonlinear scaling genes were suppressed. Therefore, it was argued that the superlinear scaling of ribosomal genes may be an artifact due to the drug that blocks cell-cycle progress. Interestingly, recent experiments of mammalian cells found weak sublinear scaling of ribosomal proteins²³. For RNAP related genes, we found that they indeed show linear scaling with cell volume as we assume in our coarse-grained model, which is crucial for the linear scaling between the mRNA copy numbers and cell volume (Supplementary Fig. 10). To confirm the validity of our conclusions in the presence of weakly superlinear scaling of ribosomes, we numerically simulated our gene expression model with the recruitment ability of ribosomal gene weaker than the average value and found that even with the small deviation of ribosome number from linear scaling, our theoretical predictions still agree well with the numerical simulations (Supplementary Fig. 11).

In Ref. ²¹, the expression of pre-selected activators for the cell cycle were shown to be superlinear, while pre-selected inhibitors were shown to be sublinear. So we next checked the nonlinear degrees of all cell-cycle regulators using GSEA. We found that inhibitors are indeed enriched in the sublinear regime (Fig. 4c), but activators were not enriched in the superlinear regime. We remark that the inconsistency may be due to the preselection of regulators, but the conclusion of Ref. ²¹ that the interplay of inhibitors and activators can trigger cell-cycle progress is still valid as long as they have different scaling behaviors.

To further support our theoretical predictions, we investigated the promoter sequences of all genes included in our analysis. We expect that those genes with nonlinear scaling should have some special patterns in their promoter sequences, which render them stronger or weaker recruitment abilities to RNAPs than the others. If this is the case, specific motifs should be enriched in the superlinear or sublinear regime. We detect the transcription factors binding motifs in the promoter sequences and then used GSEA to identify those motifs that are enriched in the nonlinear regime. We found 77 motifs enriched in the sublinear regime (see one example in Fig. 4d and Supplementary Table 2). To further validate our results, we computed the average β for genes containing at least n motifs that are enriched in the sublinear regime and found that the average β indeed increases as a function of n (Fig. 4e). Consistent with our theoretical predictions, 76% of the 49 corresponding transcription factors exhibit positive regulation and therefore enhance the recruitment abilities to RNAPs of their target genes (Fig. 4f). However, we did not find motifs enriched in the superlinear regime. Considering the cumulative effect of motifs on β as shown in Fig. 4e, we propose that antagonistic effect may also exist among motifs, which suggests that motifs reducing the recruitment ability to RNAPs may reside in most genes. But in genes without superlinear scaling, their effects are counteracted by other motifs that enhance the recruitment ability.

Discussion

For abnormally large cells, the number of RNA polymerases or ribosomes may exceed some threshold values so that the bottlenecks of gene expression become the templates of gene expression: gene copy numbers and mRNA copy numbers^14,35. However, in typical cellular physiological states, cells are far below the thresholds, and the limiting bottlenecks of gene expression are the copy numbers of RNAPs and ribosomes^10,15,16. In this case, if the promoters of all genes share the same recruitment ability, the expression levels of all genes should exhibit linear scaling with cell volume both at the mRNA and protein levels¹³. We extended this simple scenario to a more realistic case in which the recruitment abilities among genes are continuously distributed. We derived the dependence of the mRNA production rate on cell volume. We show that genes with recruitment abilities below (above) the average exhibit superlinear (sublinear) scaling with cell volume, a natural consequence of the heterogeneous distribution of recruitment abilities. We further show that the nonlinear scaling between the mRNA production rates and cell volume propagates to the mRNA copy numbers and proteins copy numbers. All of our theoretical predictions were confirmed by numerical simulations.

Nonlinear scaling of protein levels is crucial to cell-cycle regulation²¹. Time-dependent protein concentrations allow cells to determine the timing of various cell-cycle events, e.g., based on the ratio of two proteins with different scaling behaviors. To confirm this scenario, we also simulated the case of the periodic cell cycle and let the cell divide when the ratio of the concentrations of one superlinear protein and one sublinear protein exceeds some threshold value. We found that periodic patterns of mRNA and protein concentration emerge. For superlinear genes, their mRNA and protein concentrations decrease initially at the beginning of the cell cycle due to the halved RNAP number at cell birth, but quickly increases as the RNAP number increases (vice versa for sublinear genes). As the cell gets the periodic steady-state, all mRNAs and proteins double their numbers at cell division compared with cell birth^13,41 (Supplementary Discussion F and Supplementary Fig. 12).

Our model shares some similarities with the model introduced in the Methods section of Ref. ¹³, but also with key differences. This work focuses on the effects of heterogeneous MM constants and the resulting nonlinear scaling of gene expression levels, including both the mRNA and protein. In contrast, the model in the Methods of Ref. ¹³ only consider transcription process and homogeneous MM constants. Furthermore, the previous model mainly considers the effects of nonspecifically bound RNAPs, which is believed to be important in bacterial gene expression²⁴. The model in this work mainly considers eukaryotic cells, and the experimental data we analyze is from S cerevisiae²¹.

The recruitment abilities not only determine the nonlinear degrees of gene expression but also determine the mRNA production rates. Therefore, our model predicts that genes with higher (lower) mRNA production rates are more likely to exhibit sublinear (superlinear) scaling with cell volumes. We also note that heterogeneity in the transcription initiation rates can reduce the positive correlation between the mRNA production rates and the nonlinear scaling degrees, in concert with the small but positive correlation observed in the experimental data of S. cerevisiae²¹. Furthermore, according to our theoretical models, motifs that enhance or reduce the promoters’ recruitment abilities should exist in the superlinear or sublinear scaling genes. Indeed, we found a group of motifs enriched in the sublinear regime, and these motifs are associated with transcription factor (TF) binding sites that have positive regulation on the target genes. We note that other mechanisms of nonlinear scaling of gene expression levels are possible, such as time-dependent transcription factor concentrations²¹, or time-dependent initiation rates. A time-dependent transcription factor concentration is equivalent to a time-dependent MM constant K_n,i within our model. However, we note that our GSEA analysis showed that TF-related terms were not enriched in the nonlinear regime (Supplementary Fig. 9), which means TFs do not change their concentrations in general. Therefore, we argue that a changing TF concentration is more specific to certain genes instead of a general situation. Also, we remark that our model does not require time-dependent variables to achieve changing concentrations, and the changing concentrations of mRNAs and proteins are the result of the competition between the genes to the limiting resource of RNAPs. We note that in Ref. ²¹, the numbers of binding sites for specific cell-cycle transcription factors in the target genes were found to be positively correlated with the superlinear degrees of their mRNA levels, which may be due to other mechanisms not related to our model.

Our results can have far-reaching implications: the nonlinear scaling of gene expression level allows cells to sense their sizes based on the ratio of concentrations of different proteins, enabling cells to decide the timing of multiple cell-cycle events such as cell division. Our results suggest that sensing the concentration differences among a group of proteins as a measure of cell volume can be the most accessible option cells can take to achieve cell size regulation. The promoter sequences can also be under evolutionary selection to achieve desired nonlinear scalings of particular genes and robust cell-cycle regulation. Finally, the gene expression model proposed in this work is by construction at the whole-cell level. Therefore, it can be a valuable platform for mathematical modeling of gene expression, especially for problems in which the competition among genes for the limiting resources of RNAPs and ribosomes are crucial.

Methods

A summary of the variables used in the main text

Variables	Meaning
V	cell volume
M	totle protein mass
V_c	cytoplasmic volume
V_n	nuclear volume
ρ	protein mass per cell volume
a	ratio between cell volume and nuclear volume
k_n,i	mRNA production rate of gene i
Γ_n,i	transcriptional initiation rate of gene i
g_i	gene copy number of gene i
P_b,i	RNAPs binding probability on the promoter of gene i
c_n,free	free RNAPs concentration
F_n	free RNAPs fraction
Λ_n,i	maximum number of RNAPs one copy of gene i can hold
n_c	maximum number of RNAPs the entire genome can hold
L_i	length of gene i
v_n	RNAP elongation speed
K_n,i	transcriptional MM constant of gene i
〈K_n,i〉	weighted average of MM constant
n	RNAPs number
τ_m,i	lifetime of mRNA i
m_i	mRNA number of gene i
k_r,i	protein production rate of gene i
Γ_r,i	translational initiation rate of gene i
c_r	ribosomes concentration
F_r	free ribosomes fraction
K_r,i	translational MM constant of gene i
τ_p,i	lifetime of protein i
p_i	protein number of gene i
β_i	nonlinear degree of mRNA i
α_i	nonlinear degree of protein i
ϕ_n	RNAPs mass fraction
ϕ_r	ribosomes mass fraction

Details of the gene expression model

We explain more details of the gene expression model in this section. Given the mRNA production rate, the mRNA copy number m_i changes as

$$\frac{d{m}_{i}}{dt}={k}_{n,i}-\frac{{m}_{i}}{{\tau }_{m,i}}$$

(6)

where τ_m,i is the mRNA lifetime.

Regarding protein production, the protein production rate of one particular gene is proportional to its corresponding mRNA number (m_i), the translation initiation rate (Γ_r,i), and the probability for the ribosome binding site of mRNA to be bound by ribosome:

$${k}_{r,i}={{{\Gamma }}}_{r,i}{m}_{i}\frac{{c}_{r,free}}{{c}_{r,free}+{K}_{r,i}}.$$

(7)

Here c_r,free is the concentration of free ribosomes in the cytoplasm and K_r,i is the Michaelis-Menten constant of ribosome binding on the mRNAs (see an alternative formulation of translation model in Ref. ⁴²).

Because the total number of RNAPs can be separated to free RNAPs, initiating RNAPs, and transcribing RNAPs, we obtained a self-consistent equation to determine the fraction of free RNAPs in all RNAPs:

$${n}_{c}\frac{{c}_{n}{F}_{n}}{{c}_{n}{F}_{n}+{K}_{n}}=n-n{F}_{n}.$$

(8)

Here n is the total number of RNAPs, n_c = ∑_ig_i(1 + Λ_n,i), and F_n is the fraction of free RNAPs. c_n is the concentration of total RNAPs in the nucleus and here we consider the simplified model in which all genes have the same MM constant K_n to RNAPs. The left side represents the number of RNAPs bound to promoters or transcribing. The right side represents the difference between the total number of RNAPs and free RNAPs, which should be equal to the left side. Meanwhile, we assume that c_n ≫ K_n, namely, the total RNAP concentration is much larger than the MM constant of a typical promoter, supported by observations in bacteria⁴³. We argue that this assumption is biologically reasonable because if all RNAPs suddenly become free so that c_n,free = c_n, one would expect that these free RNAPs will have a strong tendency to rebind to the promoters; otherwise, a large fraction of RNAPs will be idle, which is clearly inefficient in normal cellular physiological states. We remark that although the assumption appears reasonable, it remains to be tested in yeast. Using the assumption that K_n/c_n ≪ 1, we find that the fraction of free RNAPs F_n that solves Eq. (8) must be much smaller than 1 if n < n_c (see the illustration in Supplementary Fig. 13). Therefore, we can take F_n = 0 in Eq. (8) and obtain the binding probability as P_b = n/n_c.

When the number of RNAPs exceeds the threshold value n_c, the linear scaling between the mRNA numbers and cell volume breakdowns for all genes, and the growth mode of cell volume also deviates from exponential growth (Phase 2). If the ribosome number exceeds some threshold value, the cell volume eventually grows linearly, which has been observed in budding yeast³⁵ (Phase 3, see Ref. ¹³ and detailed derivations in the Supplementary Discussion B). We note that the main purpose of the assumption c_n ≫ K_n is to make the condition of the negligible fraction of free RNAPs more well defined as n/n_c < 1. Our conclusions on the relation between the nonlinear scaling and the recruitment abilities do not rely on this assumption. Since we mainly focus on the scenario in which RNAP is limiting with F_n ≪ 1, the transition details from Phase 1 to Phase 2 is not important to our conclusions. We also discuss the effects of nonspecific binding of RNAPs on the transition between Phase 1 and Phase 2 in Supplementary Discussion C and show that the condition of Phase 1 becomes more stringent in the presence of nonspecific binding.

Derivation of the nonlinear scaling

We consider a simple model assuming all genes have the same MM constant K_n except one special gene i has a MM constant K_n,i. Since the contribution of the particular gene to the global allocation of RNAP is negligible, Eq. (8) is still valid. We focus on Phase 1 so that F_n ≪ 1 and express c_nF_n as a function of n/n_c. The mRNA production rate for the particular gene with the MM constant equal to K_n,i therefore becomes Eq. (4).

If K_n,i > K_n, we find that the mRNA production rate of gene i exhibits a superlinear dependence on the RNAP number, therefore also the cell volume. If K_n,i < K_n, the mRNA production rate of gene i exhibits a sublinear behavior. The general solution for Eq. (6) becomes

$${m}_{i}(t)={m}_{i}(0){e}^{-t/{\tau }_{m,i}}+\int\nolimits_{0}^{t}{e}^{-{{\Delta }}t/{\tau }_{m,i}}{k}_{n,i}(t-{{\Delta }}t)d{{\Delta }}t.$$

(9)

In the limit that the lifetime of mNRA goes to zero, the number of mRNA becomes strictly proportional to the mRNA production rate

$${m}_{i}(t)={\tau }_{m,i}{k}_{n,i}(t).$$

(10)

We now consider the dynamics of protein number which is

$$\frac{d{p}_{i}}{dt}={{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{m}_{i}-\frac{{p}_{i}}{{\tau }_{p,i}}.$$

(11)

The general solution for Eq. (11) is

$${p}_{i}(t)=\; {p}_{i}(0){e}^{-t/{\tau }_{p,i}}\\ +\int\nolimits_{0}^{t}{e}^{-{{\Delta }}t/{\tau }_{p,i}}{{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{m}_{i}(t-{{\Delta }}t)d{{\Delta }}t.$$

(12)

In the following, we assume c_rF_r to be constant, which is a good approximation in Phase 1 (Supplementary Discussion B) and consider two limiting cases, τ_p,i → 0 and τ_p,i → ∞.

When τ_p,i → 0, the protein number becomes strictly proportional to the protein production rate, which is proportional to the mRNA number,

$$\begin{array}{rc}{p}_{i}(t)&={{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{m}_{i}(t){\tau }_{p,i}\\ &={{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{\tau }_{m,i}{\tau }_{p,i}{k}_{n,i}(t).\end{array}$$

(13)

Note that the second identity is valid when τ_m,i → 0 which is a good approximation when the lifetime of mRNA is much shorter than the doubling time.

When τ_p,i → ∞ and τ_m,i → 0 the dynamics of protein number becomes

$$\begin{array}{rc}&{p}_{i}(t)-{p}_{i}(0)=\int\nolimits_{0}^{t}{{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{m}_{i}({t}^{\prime})d{t}^{\prime}\\ &={{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{{{\Gamma }}}_{n,i}{g}_{i}{\tau }_{m,i}\int\nolimits_{0}^{t}\frac{{K}_{n}n}{{K}_{n,i}{n}_{c}-({K}_{n,i}-{K}_{n})n}d{t}^{\prime}.\end{array}$$

(14)

In Phase 1, the number of RNA polymerase increases exponentially n(t) = n(0)e^μt, therefore the integral in Eq. (14) can be analytically calculated

$$ \;{p}_{i}(t)-{p}_{i}(0)\\ ={{{\Gamma }}}_{r,i}{{{\Gamma }}}_{n,i}{g}_{i}{\tau }_{m,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}\frac{{K}_{n}}{({K}_{n}-{K}_{n,i})\mu } \\ \quad\;\times{{{{\mathrm{ln}}}}}\,\left(\frac{{K}_{n,i}{n}_{c}-({K}_{n,i}-{K}_{n})n(t)}{{K}_{n,i}{n}_{c}-({K}_{n,i}-{K}_{n})n(0)}\right).$$

(15)

The nonlinear scaling of mRNA copy number also propagates to nondegradable proteins.

To quantify the nonlinear degrees of mRNA copy number, we investigate the volume dependence of mRNA copy number and normalize the volume and mRNA number by their initial values. Using Eqs. (4), (10)), we obtain

$${\widetilde{m}}_{i}(t)=\widetilde{V}(t)\frac{1-\frac{{{\Delta }}{K}_{n,i}n(0)}{{K}_{n,i}{n}_{c}}}{1-\frac{{{\Delta }}{K}_{n,i}n(0)}{{K}_{n,i}{n}_{c}}\widetilde{V}(t)}.$$

(16)

Here ${\widetilde{m}}_{i}(t)={m}_{i}(t)/{m}_{i}(0)$, $\widetilde{V}(t)=V(t)/V(0)$, ΔK_n,i = K_n,i − K_n. We have used the fact that the RNAP concentration is constant therefore n(t)/n(0) = V(t)/V(0). Note that when K_n,i is continuously distributed, we should replace K_n by 〈K_n,i〉. Comparing with Eq. (5) in the main text, we find that

$${\beta }_{i}=-\frac{{{\Delta }}{K}_{n,i}n(0)}{{K}_{n,i}{n}_{c}}.$$

(17)

When ΔK_n,i = 0, β_i = 0 as expected. For proteins with short lifetimes, the above analysis is equally valid.

We also study the nonlinear degree of nondegradable proteins, and using Eq. (15), we find that

$$\begin{array}{rc}{{\Delta }}{\widetilde{p}}_{i}(t)&={C}_{i}{{{{\mathrm{ln}}}}}\,\left(1-\frac{{{\Delta }}{K}_{n,i}(n(t)-n(0))}{{K}_{n,i}{n}_{c}-{{\Delta }}{K}_{n,i}n(0)}\right)\\ &={C}_{i}{{{{\mathrm{ln}}}}}\,\left(1-\frac{{{\Delta }}{K}_{n,i}n(0)}{{K}_{n,i}{n}_{c}-{{\Delta }}{K}_{n,i}n(0)}{{\Delta }}\widetilde{V}(t)\right).\end{array}$$

(18)

Here we combine all the constants divided by p_i(0) before the logarithmic term in Eq. (15) to C_i and ${{\Delta }}\widetilde{V}(t)=\widetilde{V}(t)-1$. Therefore, we can write Eq. (18) as

$${{\Delta }}{\widetilde{p}}_{i}(t)={C}_{i}{{{{{{\mathrm{ln}}}}}}}\,\left(1+{\alpha }_{i}{{\Delta }}\widetilde{V}(t)\right),$$

(19)

where

$${\alpha }_{i}=-\frac{{{\Delta }}{K}_{n,i}n(0)}{{K}_{n,i}{n}_{c}-{{\Delta }}{K}_{n,i}n(0)}.$$

(20)

In the limit ΔK_n,i → 0, we find that

$${{\Delta }}{p}_{i}={{{\Gamma }}}_{r,i}{{{\Gamma }}}_{n,i}{g}_{i}{\tau }_{m,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}\frac{1}{\mu }\frac{n(t)-n(0)}{{n}_{c}},$$

(21)

which is consistent with Eq. (14) in the case of K_n,i = K_n.

Details of numerical simulations

All simulations were done in MATLAB (version R2020b and R2021a). We summarize some of the parameters we used in the numerical simulations in Supplementary Table 1. The gene copy numbers are time-independent which we set as 1 for all genes except the ribosome gene which we set as 5. Given an attempted growth rate μ₀, we set the attempted mass fraction of ribosomes in the entire proteome as

$${\phi }_{r}={\mu }_{0}{L}_{r}/{v}_{r}.$$

(22)

Here, L_r is the length of the ribosome gene in the unit of codons. Note that the actual mass fraction is slightly time-dependent and deviates from the attempted value. To get Eq. (22), we assume that all ribosomes are actively translating and neglect the correction due to free ribosomes and initiating ribosomes. Given the mass fraction of ribosomes, we assume that the copy number of RNAPs is about 10% of that of ribosomes and set the attempted mass fraction of RNAP as

$${\phi }_{n}=0.1{\phi }_{r}{L}_{n}/{L}_{r}.$$

(23)

Here L_n is the length of the RNAP gene. In Fig. 2, we set K_n = 6 × 10³/μm³, and the two special genes with K_n,i = 20K_n and K_n,i = 0.2K_n. We also set the lifetimes of all mRNAs as 10 mins. In Fig. 3, we set K_n,i following a lognormal distribution so that it’s average $\overline{{K}_{n,i}}=6\times 1{0}^{3}/\mu {m}^{3}$. We also set the lifetimes of mRNA following a lognormal distribution with a mean equal to 10 min and a coefficient of variation equal to 1. In all simulations, we set the initial attempted total protein mass as M_b = 10⁹ in the unit of amino acid number and the attempted critical RNAP number as n_c = ∑_ig_i(1 + Λ_n,i) = 10⁴.

To find the appropriate value of Γ_n,i that leads to the attempted n_c, we assume c_nF_n ≪ K_n,i so that

$$\frac{{\phi }_{n}}{{\phi }_{r}}\approx \frac{{{{\Gamma }}}_{n,n}\frac{{g}_{n}}{{K}_{n,n}}{\tau }_{m,n}{L}_{n}}{{{{\Gamma }}}_{n,r}\frac{{g}_{r}}{{K}_{n,r}}{\tau }_{m,r}{L}_{r}},$$

(24)

from which we can find that Γ_n,n = yΓ_n,r where Γ_n,r and Γ_n,n are respectively the transcription initiation rates of ribosomes and RNAPs. y can be found using the above equation. We also set Γ_n,i = xΓ_n,r for all i except the genes for RNAP (i = 1) and ribosome (i = 2), so that

$${\phi }_{r}\approx \frac{{{{\Gamma }}}_{n,r}\frac{{g}_{r}}{{K}_{n,r}}{\tau }_{m,r}{L}_{r}}{{{{\Gamma }}}_{n,r}\frac{{g}_{r}}{{K}_{n,r}}{\tau }_{m,r}{L}_{r}+y{{{\Gamma }}}_{n,r}\frac{{g}_{n}}{{K}_{n,n}}{\tau }_{m,n}{L}_{n}+x{{{\Gamma }}}_{n,r}{\sum }_{i \,{ > }\,2}\frac{{g}_{i}}{{K}_{n,i}}{\tau }_{m,i}{L}_{i}}$$

(25)

from which we can find the expression of x. Finally, using ${\sum }_{i}{g}_{i}(1+{{{\Gamma }}}_{n,i}\frac{{L}_{i}}{{v}_{n}})={n}_{c}$, we find that

$${{{\Gamma }}}_{n,r}=\frac{({n}_{c}-{\sum }_{i}{g}_{i}){v}_{n}}{{g}_{r}{L}_{r}+y{g}_{n}{L}_{n}+x{\sum }_{i \,{ > }\,2}{g}_{i}{L}_{i}}$$

(26)

from which we find Γ_n,n and Γ_n,i for i > 2. Note that the approximation c_nF_n ≪ K_n,i for all i is merely to find the values of Γ_n,i, which do not affect our conclusions.

Before we run the simulations, we use the attempted ϕ_n, initial cell mass, and Γ_n,i with g_i and K_n,i to calculate the mRNA production rates for all genes using Eqs. (1, 2), from which we use m_i = k_n,iτ_m,i as the initial condition. Using the initial m_i, we compute the initial protein mass fractions as ϕ_i = m_iL_i/∑_im_iL_i and also update the mass fractions of RNAP and ribosome. The actual initial mass fraction of RNAP can slightly deviate from the attempted value. To make sure the RNAP copy number is continuous from the beginning of simulations, we also slightly shift the total protein mass from the attempted value so that ϕ_n(actual)M_b(actual) = ϕ_n(attemped)M_b(attempted).

To remove the transient effects, we take the simulation results at t = 20 as the initial values when we compare our results with theoretical predictions. To simulate degradable proteins, we choose 200 of the genes as degradable proteins with lifetimes τ_p = 10 mins and for degradable proteins, we take the simulation results at t = 100 as the initial values. The simulation is stopped when the total protein mass is larger than 9M_b.

Details of the mRNA production rates

For the mRNA production rate, we used the mRNA amount (the product of RPKM and cell volume) at the smallest cell volume divided by the mRNA lifetime from Ref. ⁴⁴ as a proxy according to Eq. (6) in Methods.

Details of motif searching in the promoters of genes

Promoters are defined as 500 bp upstream of transcription start sites in the main text. Sequences of promoters for every gene were downloaded from the Yeastract database⁴⁵. Then we used the online tool CentriMo⁴⁶ of MEME suite (version 5.3.2)⁴⁷ to detect the transcription factors binding motifs annotated in the Yeastract database in these sequences with default parameters.

Details of GSEA

We performed enrichment analysis using GSEA using package clusterProfiler (version 3.12.0)⁴⁸ in R (version 3.6.1). Genes were ordered by nonlinear degree β. The cut-off criteria were set as both the p value and the false discovery rate (FDR) q value < 0.05. The number of permutations used in the analysis is 1e5.

In the analysis of functions, the gmt file was generated from KEGG BRITE hierarchy files containing KEGG objects (KO) for budding yeast in Kyoto Encyclopedia of Genes and Genomes (KEGG) database^49,50,51. In the analysis of cell-cycle regulators, we generated a gmt file containing only two terms, negative regulation of cell cycle (GO:0045786) and positive regulation of cell cycle (GO:0045787), obtained from Gene Ontology (GO) database with AmiGO (version 2.5.15)^52,53,54.

In the analysis of motifs, the gmt file was generated based on the CentriMo results. Every motif was considered as a gene set, including multiple genes that contain this motif in their promoters.

Details of functional enrichment analysis of TFs

Transcription factors corresponding to the motifs enriched in GSEA were picked out. Functional enrichment analysis for budding yeast was done using Metascape (version 3.5)⁵⁵ with default parameters. In 641 genes whose function are regulation of transcription, DNA templated (GO:0006355), 303 genes are annotated as positive regulation of transcription by RNA polymerase II (GO:0045944). In our analysis, all the 49 TFs corresponding to the motifs enriched in GSEA (except one gene ABF2) are annotated as GO:0006355 and 34 of them are annotated as GO:0045944. Based on this situation, single-sided hypergeometric test was performed using R function phyper.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data that support this study are available from the corresponding author upon reasonable request. All data analyzed in this study are available in publicly accessible repositories. The RNA-seq data (GSE145206) has been published in²¹. Promoter sequences of the yeast genome are available from Yeastract [http://www.yeastract.com/formseqretrieval.php] database⁴⁵. The nonlinear degree data are included in the repository of GSEA codes.

Code availability

Codes for GSEA are available at https://github.com/QirunWang/R-codes-for-GSEA. Codes for mathematical simulations are available at https://github.com/QirunWang/MATLAB-code-for-simulation.

References

Crissman, H. A. & Steinkamp, J. A. Rapid, simultaneous measurement of dna, protein, and cell volume in single cells from large mammalian cell populations. J. Cell Biol. 59, 766 (1973).
Article CAS PubMed PubMed Central Google Scholar
Elliott, S. & McLaughlin, C. Rate of macromolecular synthesis through the cell cycle of the yeast saccharomyces cerevisiae. Proc. Natl Acad. Sci. 75, 4384–4388 (1978).
Article ADS CAS PubMed PubMed Central Google Scholar
Salman, H. et al. Universal protein fluctuations in populations of microorganisms. Phys. Rev. Lett. 108, 238105 (2012).
Article ADS PubMed Google Scholar
Marguerat, S. & Bähler, J. Coordinating genome expression with cell size. Trend. Genet. 28, 560–565 (2012).
Article Google Scholar
Kempe, H., Schwabe, A., Crémazy, F., Verschure, P. J. & Bruggeman, F. J. The volumes and transcript counts of single cells reveal concentration homeostasis and capture biological noise. Mol. Biol. Cell 26, 797–804 (2015).
Article PubMed PubMed Central Google Scholar
Padovan-Merhar, O. et al. Single mammalian cells compensate for differences in cellular volume and dna copy number through independent global transcriptional mechanisms. Mol. Cell 58, 339–352 (2015).
Article CAS PubMed PubMed Central Google Scholar
Ietswaart, R., Rosa, S., Wu, Z., Dean, C. & Howard, M. Cell-size-dependent transcription of flc and its antisense long non-coding rna coolair explain cell-to-cell expression variation. Cell Sys. 4, 622–635.e9 (2017).
Google Scholar
Zheng, X.-y & O’Shea, E. K. Cyanobacteria maintain constant protein concentration despite genome copy-number variation. Cell Rep. 19, 497–504 (2017).
Article CAS PubMed Google Scholar
Knapp, B. D. et al. Decoupling of rates of protein synthesis from cell expansion leads to supergrowth. Cell Sys. 9, 434–445 (2019).
Article CAS Google Scholar
Sun, X.-M. et al. Size-dependent increase in rna polymerase ii initiation rates mediates gene expression scaling with cell size. Curr. Biol. 30, 1217–1230.e7 (2020).
Article PubMed Google Scholar
Campos, M. et al. A constant size extension drives bacterial cell size homeostasis. Cell 159, 1433–1446 (2014).
Article CAS PubMed PubMed Central Google Scholar
Cermak, N. et al. High-throughput measurement of single-cell growth rates using serial microfluidic mass sensor arrays. Nat. Biotec. 34, 1052–1059 (2016).
Article CAS Google Scholar
Lin, J. & Amir, A. Homeostasis of protein and mrna concentrations in growing cells. Nat. Commun. 9, 4496 (2018).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhurinsky, J. et al. A coordinated global control over cellular transcription. Curr. Biol. 20, 2010–2015 (2010).
Scott, M., Gunderson, C. W., Mateescu, E. M., Zhang, Z. & Hwa, T. Interdependence of cell growth and gene expression: origins and consequences. Science 330, 1099 (2010).
Article ADS CAS PubMed Google Scholar
Metzl-Raz, E. et al. Principles of cellular resource allocation revealed by condition-dependent proteome profiling. Elife 6, e28034 (2017).
Article PubMed PubMed Central Google Scholar
Schmoller, K. M., Turner, J., Kõivomägi, M. & Skotheim, J. M. Dilution of the cell cycle inhibitor whi5 controls budding-yeast cell size. Nature 526, 268–272 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Heldt, F. S., Lunstone, R., Tyson, J. J. & Novák, B. Dilution and titration of cell-cycle regulators may control cell size in budding yeast. PLoS Comput. Biol.14, e1006548 (2018).
Article ADS PubMed PubMed Central Google Scholar
Litsios, A., Huberts, D. H. E. W., Terpstra, H. M., Guerra, P. & Heinemann, M. Differential scaling between g1 protein production and cell size dynamics promotes commitment to the cell division cycle in budding yeast. Nat. Cell Biol. 2, 1382–1392 (2019).
Qu, Y. et al. Cell cycle inhibitor whi5 records environmental information to coordinate growth and division in yeast. Cell Rep. 29, 987–994 (2019).
Article CAS PubMed Google Scholar
Chen, Y., Zhao, G., Zahumensky, J., Honey, S. & Futcher, B. Differential scaling of gene expression with cell size may explain size control in budding yeast. Mol. Cell 78, 359–370.e6 (2020).
Article PubMed PubMed Central Google Scholar
Claude, K.-L. et al. Transcription coordinates histone amounts and genome content. Nat. Commun. 12, 1–17 (2021).
Article Google Scholar
Lanz, M. C. et al. Increasing cell size remodels the proteome and promotes senescence. bioRxiv https://doi.org/10.1101/2021.07.29.454227 (2021).
Klumpp, S. & Hwa, T. Growth-rate-dependent partitioning of rna polymerases in bacteria. Proc. Natl. Acad. Sci. 105, 20245–20250 (2008).
Article ADS CAS PubMed PubMed Central Google Scholar
Killeen, M. T. & Greenblatt, J. F. The general transcription factor rap30 binds to rna polymerase ii and prevents it from binding nonspecifically to dna. Mol. Cell. Biol. 12, 30–37 (1992).
CAS PubMed PubMed Central Google Scholar
Roeder, R. G. The role of general initiation factors in transcription by rna polymerase ii. Trend. Biochem. Sci. 21, 327–335 (1996).
Article Google Scholar
Allen, B. L. & Taatjes, D. J. The mediator complex: a central integrator of transcription. Nat. Rev. Mol. Cell Biol. 16, 155–166 (2015).
Article CAS PubMed PubMed Central Google Scholar
Ong, C.-T. & Corces, V. G. Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat. Reviews Genet. 12, 283–293 (2011).
Article CAS Google Scholar
Basehoar, A. D., Zanton, S. J. & Pugh, B. F. Identification and distinct regulation of yeast tata box-containing genes. Cell 116, 699–709 (2004).
Article CAS PubMed Google Scholar
Milo, R. & Phillips, R. Cell biology by the numbers (Garland Science, 2015).
Kubitschek, H. E., Baldwin, W. W., Schroeter, S. J. & Graetzer, R. Independence of buoyant cell density and growth rate in escherichia coli. J. Bacteriol. 158, 296–299 (1984).
Article CAS PubMed PubMed Central Google Scholar
Bryan, A. K. et al. Measuring single cell mass, volume, and density with dual suspended microchannel resonators. Lab Chip 14, 569–576 (2014).
Article CAS PubMed PubMed Central Google Scholar
Basan, M. et al. Inflating bacterial cells by increased protein synthesis. Mol. Sys. Biol. 11, 836 (2015).
Neumann, F. R. & Nurse, P. Nuclear size control in fission yeast. J. Cell Biol. 179, 593–600 (2007).
Article CAS PubMed PubMed Central Google Scholar
Neurohr, G. E. et al. Excessive cell growth causes cytoplasm dilution and contributes to senescence. Cell 176, 1083–1097 (2019).
Article CAS PubMed PubMed Central Google Scholar
Belle, A., Tanay, A., Bitincka, L., Shamir, R. & O’Shea, E. K. Quantification of protein half-lives in the budding yeast proteome. Proc. Natl. Acad. Sci. 103, 13004–13009 (2006).
Article ADS CAS PubMed PubMed Central Google Scholar
Bintu, L. et al. Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev. 15, 116–124 (2005).
Article CAS PubMed PubMed Central Google Scholar
Brewster, R. C., Jones, D. L. & Phillips, R. Tuning promoter strength through RNA polymerase binding site design in escherichia coli. PLoS Comput. Biol. 8, e1002811 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. 102, 15545–15550 (2005).
Article ADS CAS PubMed PubMed Central Google Scholar
Mootha, V. K. et al. Pgc-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).
Article CAS PubMed Google Scholar
Bierbaum, V. & Klumpp, S. Impact of the cell division cycle on gene circuits. Phys. Biol. 12, 066003 (2015).
Article ADS PubMed Google Scholar
Li, S. H.-J. et al. Escherichia coli translation strategies differ across carbon, nitrogen and phosphorus limitation conditions. Nat. Microbiol. 3, 939 (2018).
Article CAS PubMed PubMed Central Google Scholar
Bremer, H., Dennis, P. & Ehrenberg, M. Free rna polymerase and modeling global transcription in escherichia coli. Biochimie 85, 597–609 (2003).
Article CAS PubMed Google Scholar
Neymotin, B., Athanasiadou, R. & Gresham, D. Determination of in vivo rna kinetics using rate-seq. Rna 20, 1645–1652 (2014).
Article CAS PubMed PubMed Central Google Scholar
Monteiro, P. T. et al. YEASTRACT+: a portal for cross-species comparative genomics of transcription regulation in yeasts. Nucleic Acids Res. 48, D642–D649 (2019).
Article PubMed Central Google Scholar
Bailey, T. L. & Machanick, P. Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res. 40, e128–e128 (2012).
Article CAS PubMed PubMed Central Google Scholar
Bailey, T. L. et al. MEME Suite: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).
Article CAS PubMed PubMed Central Google Scholar
Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterprofiler: an r package for comparing biological themes among gene clusters. OMICS. J. Integr. Biol. 16, 284–287 (2012).
Article CAS Google Scholar
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Article CAS PubMed PubMed Central Google Scholar
Kanehisa, M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 28, 1947–1951 (2019).
Article CAS PubMed PubMed Central Google Scholar
Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M. & Tanabe, M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 49, D545–D551 (2020).
Article PubMed Central Google Scholar
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Article CAS PubMed PubMed Central Google Scholar
Carbon, S. et al. AmiGO: online access to ontology and annotation data. Bioinformatics 25, 288–289 (2008).
Article PubMed PubMed Central Google Scholar
Consortium, T. G. O. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2020).
Article Google Scholar
Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1–10 (2019).
ADS Google Scholar

Download references

Acknowledgements

We thank Lucas Carey and Ariel Amir for useful discussions related to this work. We also thank Yuping Chen for kindly sharing experimental data. The research was supported by grants from Peking-Tsinghua Center for Life Sciences.

Author information

Authors and Affiliations

Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
Qirun Wang & Jie Lin
Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, 100871, China
Jie Lin

Authors

Qirun Wang
View author publications
You can also search for this author in PubMed Google Scholar
Jie Lin
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

J.L. conceived, designed, and carried out the theoretical and numerical part of this work. Q.W. performed the analysis of experimental data and promoter sequences. All the authors contributed to the preparation of the manuscript.

Corresponding author

Correspondence to Jie Lin.

Ethics declarations

Competing interests

The authors declare no competing interests

Additional information

Peer review information Nature Communications thanks the anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Wang, Q., Lin, J. Heterogeneous recruitment abilities to RNA polymerases generate nonlinear scaling of gene expression with cell volume. Nat Commun 12, 6852 (2021). https://doi.org/10.1038/s41467-021-26952-y

Download citation

Received: 11 April 2021
Accepted: 27 October 2021
Published: 25 November 2021
DOI: https://doi.org/10.1038/s41467-021-26952-y

This article is cited by

A Critical Appraisal of the Role of Oxygen in Phase Evolution and Mechanical Properties of Additively Manufactured Bulk Metallic Glasses
- K. S. N. Satish Idury
- R. Lakshmi Narayan
Transactions of the Indian Institute of Metals (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Negative autoregulation controls size scaling in confined gene expression reactions

Nuclear compartmentalization as a mechanism of quantitative control of gene expression

Transcription–replication interactions reveal bacterial genome regulation

Introduction

Results

Model of gene expression at the whole-cell level

A simplified model in which all genes share the same recruitment ability

A more realistic model in which genes can have different recruitment abilities

Numerical simulations

Analysis of experimental data and searching for motifs in the promoter sequences

Discussion

Methods

A summary of the variables used in the main text

Details of the gene expression model

Derivation of the nonlinear scaling

Details of numerical simulations

Details of the mRNA production rates

Details of motif searching in the promoters of genes

Details of GSEA

Details of functional enrichment analysis of TFs

Reporting Summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Supplementary information

Supplementary Information

Peer Review File

Reporting Summary

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

A Critical Appraisal of the Role of Oxygen in Phase Evolution and Mechanical Properties of Additively Manufactured Bulk Metallic Glasses

Comments

Search

Quick links