Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

# Heterogeneous recruitment abilities to RNA polymerases generate nonlinear scaling of gene expression with cell volume

## Abstract

While most genes’ expression levels are proportional to cell volumes, some genes exhibit nonlinear scaling between their expression levels and cell volume. Therefore, their mRNA and protein concentrations change as the cell volume increases, which often have crucial biological functions such as cell-cycle regulation. However, the biophysical mechanism underlying the nonlinear scaling between gene expression and cell volume is still unclear. In this work, we show that the nonlinear scaling is a direct consequence of the heterogeneous recruitment abilities of promoters to RNA polymerases based on a gene expression model at the whole-cell level. Those genes with weaker (stronger) recruitment abilities than the average ability spontaneously exhibit superlinear (sublinear) scaling with cell volume. Analysis of the promoter sequences and the nonlinear scaling of Saccharomyces cerevisiae’s mRNA levels shows that motifs associated with transcription regulation are indeed enriched in genes exhibiting nonlinear scaling, in concert with our model.

## Introduction

Homeostasis of gene expression level in a dynamically growing cell volume is widely observed across the domains of biology, namely, the concentrations of mRNAs and proteins of most genes are approximately constant1,2,3,4,5,6,7,8,9,10. Since the cell volume usually grows exponentially9,11,12, this implies exponential growth of mRNA and protein copy numbers as well, and a linear scaling between the gene expression level and cell volume exists. Theoretical models have shown that the linear scaling between gene expression level and exponentially growing cell volume is a consequence of the limiting nature of gene expression machinery: RNA polymerases (RNAPs) and ribosomes13, in agreement with experimental observations10,14,15,16. However, along with the constant concentrations of mRNAs and proteins of most genes, there is a subset of genes exhibiting nonconstant concentrations as the cell volume increases. These genes are often crucial for cell-cycle and size regulation. They allow cells to sense their sizes based on the concentrations of proteins that have different scaling behaviors with cell volume, such as Whi5 and Cln3 in Saccharomyces cerevisiae17,18,19,20,21. Other examples include DNA-binding proteins such as histone proteins, whose expression levels scale with the total DNA amount instead of cell volume22. Recent experiments show that proteins with changing concentration are often associated with cell senescence23. A fundamental question then arises: if a linear scaling between gene expression level and cell volume are by default for most genes, how can cells achieve nonlinear scaling for a subset of genes with cell volume in the meantime? In this paper, we show that the superlinear and sublinear scaling of gene expression level is a direct consequence of the heterogeneous recruitment abilities of promoters to RNA polymerases. Given a unimodal distribution of recruitment abilities, those genes with their promoters’ recruitment abilities below (above) the average spontaneously exhibit superlinear (sublinear) scaling with cell volume, while genes with recruitment abilities near the average exhibit approximately linear scaling.

In the following, we first introduce a gene expression model at the whole-cell level in which the promoters of all genes have the same recruitment abilities to RNAPs, which is analytically solvable. Then we consider a scenario in which all genes except a small subset of genes have the same recruitment abilities and show that the expression levels of those special genes can exhibit superlinear or sublinear scaling with cell volume depending on their relative magnitudes of recruitment abilities compared with the majority of genes. Then we extend the simplified model to allow a continuous distribution of recruitment abilities and show that our simplified model can quantitatively capture this more realistic scenario. Genes with recruitment abilities below (above) the average value naturally exhibit superlinear (sublinear) scaling with cell volume. Finally, to verify our theoretical predictions, we analyze the nonlinear scaling of mRNA numbers vs. cell volume of S cerevisiae using the data from Ref. 21. Our model predicts a positive correlation between the mRNA production rates and nonlinear scaling degrees among genes, and experimental data confirm this. We further analyze the promoter sequences of all genes with measured nonlinear scaling degrees and find that special motifs for transcription factor binding are indeed enriched in the promoters of genes exhibiting nonlinear scaling, in concert with our theoretical predictions. Our results imply that the nonlinear scaling of gene expression level can be under evolutionary selection through the promoter sequences.

## Results

### Model of gene expression at the whole-cell level

We consider a coarse-grained model of gene expression. The mRNA production rate kn,i of one particular gene labeled by index i is proportional to its gene copy number (gi) and the probability for its promoters to be bound by RNAPs (Pb,i),

$${k}_{n,i}={{{\Gamma }}}_{n,i}{g}_{i}{P}_{b,i},$$
(1)

where

$${P}_{b,i}=\frac{{c}_{n,free}}{{c}_{n,free}+{K}_{n,i}}$$
(2)

Here Γn,i is the initiation rate of transcription of gene i: the rate that a promoter-bound RNAP starts transcribing the gene and producing mRNA. We assume the probability for one promoter to be bound by an RNAP follows the Michaelis-Menten equation where cn,free is the concentration of free RNAPs in the nucleus24 (see the schematic in Fig. 1). For simplicity, we consider all the RNAPs to be in the nucleus and neglect the small fractions of RNAP intermediates that may exist in the cytoplasm.

Kn,i is the Michaelis-Menten (MM) constant which is inversely proportional to the binding rate of RNAPs on the promoters (see detailed derivations in Supplementary Discussion A). In the following, we use 1/Kn,i as a metric of the recruitment abilities of genes to RNAPs: a larger Kn,i represents a smaller recruitment ability. We assume the mRNA lifetime as τm,i so that given the mRNA production rate, the mRNA copy number mi changes according to Eq. (6) in Methods. Note that we mainly discuss eukaryotic cells in this work in which the transcription and translation processes are spatially separate, and nonspecific binding of RNAPs to DNA are mostly irrelevant25,26. As we also mainly discuss the transcription of mRNA, RNAP here refers to RNAP II. Real transcription processes in eukaryotic cells are complex and may involve transcription factors, mediators, enhancers, and TATA-binding proteins27,28,29. Their effects are coarse-grained into the Michaelis-Menten constant within our model, e.g., a transcription factor that increases the expression level of one gene is equivalent to reducing the Michaelis-Menten constant in the RNAP binding probability Pb,i.

Because the typical time scale for an RNAP to finish transcribing a gene is around one minute30, much shorter than the typical doubling times, we assume that the dynamics of RNAPs along each gene is in the steady-state. Therefore, the outgoing flux of RNAPs that finish transcribing a gene (vnni/Li) must be equal to the initiation rate of transcription (Pb,iΓn,i) where ni is the number of transcribing RNAPs on one copy of gene i. Here Li is the length of the gene i in the number of codons and vn is the elongation speed of RNAP, which we approximate as a constant. This leads to the expression of ni as ni = Pb,iΛn,i where Λn,i = Γn,iLi/vn is the maximum possible number of transcribing RNAPs on one copy of gene i.

A similar model as transcription also applies to the translational process (Fig. 1 and Methods). In the following analysis, we assume that most proteins are nondegradable, and our results regarding nonlinear scaling are equally valid for degradable proteins. The total protein mass of a cell is M = ∑ipiLi, in the number of amino acids where pi is the protein copy number of gene i. The total protein mass is known to be proportional to the cell volume1,31,32,33. Therefore, we assume a constant ratio between the total protein mass and cell volume. We further assume that the nuclear volume is proportional to the total cell volume, supported by experimental observations34.

### A simplified model in which all genes share the same recruitment ability

In the following, we consider a simplified scenario in which the promoters of all genes have the same recruitment ability to RNA polymerases so that Kn,i = Kn for all i. Within this scenario, we find that the mass fractions of proteins in the entire proteome are approximately constant as the cell volume increases given fixed gene copy numbers (Supplementary Discussion B). Therefore, the protein numbers of all genes, including RNAP, are proportional to the total protein mass, and therefore also proportional to the cell volume.

We also assume that almost all RNAPs are bound to a promoter or transcribing, and we will discuss its validity later in this section. Therefore, the total mRNA production rate of all genes is proportional to the total number of RNAPs, which is also proportional to the cell volume. Finally, using the fact that all genes share the same recruitment ability to RNAPs, it follows that the total mRNA production rate should be evenly distributed among all genes. Therefore, each gene’s mRNA production rate is also proportional to cell volume, which is the main result of this section.

Here, we also derive the mathematical expression of mRNA production rates. As we show in Methods, the fraction of RNAPs that are bound to a promoter or transcribing is very close to 1 if n < nc where nc = ∑igi(1 + Λn,i), which is the maximum number of RNAPs the entire genome can hold. In other words, the fraction of free RNAPs, which are neither bound to a promoter nor transcribing, is much smaller than 1 if n < nc. Given this, we obtain the binding probability as Pb = n/nc (Methods). Using Eq. (1), the mRNA production rate of gene i becomes

$${k}_{n,i}={{{\Gamma }}}_{n,i}\frac{{g}_{i}}{{n}_{c}}n,$$
(3)

which is proportional to the cell volume.

The above scenario of gene expression, which has been called Phase 1 of gene expression13, is the typical state of cells in normal conditions6,14,35 and the main focus of this work. In Phase 1, the cell volume grows exponentially, and the homeostasis of mRNA and protein concentrations of most genes are maintained13 (see detailed discussions in Supplementary Discussion B). When the total number of RNAPs exceeds the threshold value nc, the linear scaling between mRNA numbers and cell volume breakdowns and cells enter a different phase of gene expression (see detailed discussions in Methods and Supplementary Discussion B).

Finally, since the mRNA lifetimes are typically much shorter than the doubling time30, mRNA productions are in quick equilibrium. Therefore according to Eq. (6) in Methods, the mRNA copy numbers can be approximated as the products of mRNA production rates and mRNA lifetimes, therefore proportional to the cell volume as well.

### A more realistic model in which genes can have different recruitment abilities

We now consider a more realistic scenario in which the recruitment abilities to RNAPs of different genes can be different. We start from a simple scenario in which all genes have the same recruitment ability 1/Kn except one special gene i has a recruitment ability 1/Kn,i. We note that the only parameter affecting mRNA production rates as the volume changes is the concentration of free RNAPs, cn,free, which enters the binding probability Pb,i (Eq. (2)). Since the contribution of the particular gene to the global allocation of RNAPs is negligible, the proportionality between the mRNA production rates of most genes and the cell volume is still valid. It then follows that cn,free must change in a way that ensures that the binding probability of RNAP Pb,i is proportional to cell volume for all genes except the particular gene with a different MM constant. Therefore, if the particular gene has a lower MM constant Kn,i than the typical Kn, this gene is saturated earlier by the rising free RNAP concentration—it thus increases more slowly with increasing volume. A gene with a higher Kn,i, on the other hand, is not so easily saturated as the typical gene and hence increases superlinearly with volume.

More quantitatively, given the probability for the promoters of most genes to be bound is still Pb ≈ n/nc, the free RNAP concentration can be expressed as a function of n/nc. Using the expression of free RNAP concentration, we obtain the mRNA production rate for the particular gene i with the MM constant equal to Kn,i:

$${k}_{n,i}={{{\Gamma }}}_{n,i}{g}_{i}\frac{{K}_{n}n}{{K}_{n,i}{n}_{c}-({K}_{n,i}-{K}_{n})n}.$$
(4)

As n gets closer to nc (with Fn 1 still satisfied), we find that if Kn,i > Kn (Kn,i < Kn) the mRNA production rate of gene i exhibits a superlinear (sublinear) dependence on the RNAP number. Because the RNAP number is proportional to cell volume, this nonlinear relation is also valid between the mRNA production rates and cell volume, which leads to the nonlinear scaling between the mRNA copy numbers and cell volume. Furthermore, if the corresponding proteins have short lifetimes, their copy numbers will be proportional to the mRNA production rates as well. More importantly, we find that the nonlinear scaling in mRNA copy numbers also propagates to nondegradable proteins, suggesting that the nonlinear scalings in protein copy numbers are insensitive to their lifetimes (Methods).

In the following, we discuss a more realistic scenario that is a continuous distribution of Kn,i to reflect the promoter structures’ heterogeneity. In this case, we propose that the nonlinear scaling, Eq. (4), is still approximately valid for any gene if Kn is replaced by 〈Kn,i〉, the average value of Kn,i among all genes with some appropriate weights. In Supplementary Discussion D, we show that the appropriate weight can be well approximated by the protein mass fractions, as we confirm numerically in the next section. Therefore, those genes with Kn,i larger (smaller) than 〈Kn,i〉 exhibit superlinear (sublinear) scaling, while genes with Kn,i near the average exhibit approximately linear scaling.

We remark that to ensure the linear scaling of the majority of genes, the RNAP number and the ribosome number should be proportional to the cell volume, which requires that the MM constants of RNAP and ribosome are close to the average value. These are the additional assumptions we make in the case of continuously distributed Kn,i. For RNAP, it is supported by the constant mRNA concentrations of RNAP related genes observed in the experimental data from Ref. 21 (Supplementary Fig. 10). For ribosome, we found a small deviation of ribosomal mRNA number from linear scaling (Supplementary Fig. 9). However, as we show later, our theoretical predictions on the nonlinear scaling of gene expression level still work satisfyingly well in the presence of a small deviation of the ribosome from linear scaling.

To quantify the nonlinear degree between the expression level and the cell volume of each gene, we introduce a parameter β for mRNAs (and also α for nondegradable proteins, see detailed derivations and discussions in Methods), so that its scaling with cell volume becomes

$${\widetilde{m}}_{i}(t)=\widetilde{V}(t)\frac{1+{\beta }_{i}}{1+{\beta }_{i}\widetilde{V}(t)}.$$
(5)

Here the mRNA are measured relative to their values at time t = 0, e.g., $${\widetilde{m}}_{i}(t)={m}_{i}(t)/{m}_{i}(0)$$. A similar formula hold for proteins (Eq. (19) in Methods). Positive (negative) βi represent sublinear (superlinear) scaling behaviors. They are related to the recruitment abilities of their corresponding genes to RNAPs (see the full expressions in Methods) and when Kn,i is close to the average Michaelis-Menten constant 〈Kn,i〉, we find that βi 〈Kn,i〉 − Kn,i.

### Numerical simulations

We numerically tested our theoretical predictions by simulating a cell with 2000 genes, including RNAP and ribosome, which we coarse-grained as single proteins. Simulation details are summarized in Methods, and parameter values are shown in Supplementary Table 1. To avoid the effects of the cell cycle, which is certainly important but also complicates our analysis, we mainly considered the scenario in which the cell volume grows without cell-cycle progress e.g., cells arrested in the G1 phase, which is a common experimental protocol to study the effects of cell volume on gene expression21,35. We also simulated the case of the periodic cell cycle as we discuss later in the Discussion section.

We first simulated the simplified model with homogeneous recruitment abilities to RNAPs. We confirmed the exponential growth of cell volume and the linear scaling between mRNA copy numbers, protein copy numbers, and cell volume (Supplementary Fig. 2). We then simulated the case when all genes share the same recruitment ability Kn, but two genes have different abilities that are respectively smaller and larger than Kn. Our theoretical predictions for the expression levels of mRNAs, nondegradable proteins (Fig. 2a, b), and degradable proteins (Supplementary Fig. 3), which assume short mRNA lifetimes, match the simulation results reasonably well. We note that the lifetimes of degradable proteins can be comparable to the duration of the cell cycle, e.g., half of the cell-cycle duration36, in which case deviations of numerical simulations from our theoretical prediction are expected (Supplementary Fig. 3). The nature of nonlinear scaling, whether superlinear or sublinear, is nevertheless independent of the protein’s lifetime, as we show in Methods.

Finally, we simulated the more realistic scenario in which Kn,i is continuously distributed, which we modeled as a lognormal distribution. Our conclusions are independent of the specific choice of the form of distribution. Examples of volume dependence of several mRNAs and proteins are shown in Supplementary Fig. 4. Nonlinear degrees of mRNAs and proteins are measured based on Eqs. (5, 19).

The resulting distribution of mRNAs is shown in Fig. 3a, and those of proteins are shown in Supplementary Fig. 4. To test our theoretical predictions on the relations between the nonlinear degrees and Kn,i (Eqs. (17, 20) in Methods), we also need the expression of the appropriate average 〈Kn,i〉. We found that the appropriate average Michaelis-Menten constant is the one weighted by the initial protein mass fractions, 〈Kn,i〉 = ∑iϕiKn,i, where ϕi is the mass fraction of proteins i in the entire proteome at time t = 0. The above expression of 〈Kn,i〉 leads to a good agreement between numerical simulations and theoretical predictions (Fig. 3b). We also used an alternative weight that is the time-averaged protein mass fractions, which works equally well (Supplementary Fig. 5). We set the Michaelis-Menten constants of RNAP and ribosome as the average value to ensure that their concentrations are approximately constant as the cell volume increases.

In Fig. 3a, b, we set the coefficient of variation (standard deviation/mean) of the MM constants as 0.5. To confirm the validity of the model since the recruitment abilities of different promoters can be widely different27,28,37,38, we also simulated a larger coefficient of variation equal to 1 (Fig. 3c, d). The resulting nonlinear degrees exhibit a broader distribution and appear more similar to experiments as we show in the next section. Furthermore, the theoretical predictions of the nonlinear degrees of mRNA numbers still match reasonably well with the simulations.

We note that the recruitment ability not only determines the nonlinear degree of volume scaling but also affects the mRNA production rate since a higher recruitment ability enhances the binding probability of RNAPs to the promoter. This suggests that there should be a positive correlation between the mRNA production rate and the nonlinear degree. Meanwhile, we note that the recruitment ability also depends on the transcription initiation rate Γn,i (Supplementary Discussion A, Eq. (S2)): a higher initiation rate reduces the recruitment ability (increases the MM constant). For simplicity, in most of our simulations, we consider a constant Γn,i for genes (except for ribosome and RNAP), and in this case, we indeed found a strong positive correlation between the mRNA production rates and the nonlinear degrees β (Supplementary Fig. 4e). However, in a more general model with heterogeneity in Γn,i, a higher initiation rate increases the mRNA production rate but also reduces the recruitment ability so that decreases the nonlinear degree. Therefore, heterogeneity in the initiation rates reduces the correlation between the mRNA production rates and the nonlinear degrees. To confirm this prediction, we simulated the case of heterogeneous initiation rates (see numerical details in Supplementary Discussion E), and our predictions are confirmed numerically (Supplementary Fig. 6). We note that this may be a plausible mechanism of the weak but positive correlation observed in the experimental data, as we discuss in the next section.

Our results suggest that those genes with sublinear scaling and smaller Kn,i contribute more in the weighted average of Kn,i, therefore $$\langle {K}_{n,i}\rangle \; < \;\overline{{K}_{n,i}}$$ where $$\overline{{K}_{n,i}}$$ is the average over all genes with equal weights. Therefore, genes with $${K}_{n,i}\approx \overline{{K}_{n,i}}$$ are expected to exhibit superlinear scaling. However, to have an estimation of the nonlinear degree of most genes, the appropriate MM constant to compare with 〈Kn,i〉 is the median value. For the lognormal distribution we used in simulations, we found that the median value of Kn,i is close and slightly larger than 〈Kn,i〉. Therefore, the nonlinear degree of the median Kn,i is slightly negative compared with the entire distribution (Fig. 3a, c), which is consistent with the experimental observations (Fig. 4a).

### Analysis of experimental data and searching for motifs in the promoter sequences

We analyzed the genome-wide dataset from Ref. 21 where the volume dependences of mRNA levels are measured for S. cerevisiae. We calculated the nonlinear degrees of mRNA scaling with cell volume using Eq. (5) and obtained the resulting distribution (Fig. 4a). The calculated nonlinear degree β is highly correlated with the nonlinear scale calculated in Ref. 21 (Supplementary Fig. 7). The median value of nonlinear degrees is close to zero, suggesting that the majority of genes show approximately linear scaling, similar to our numerical simulations. We found that by choosing appropriate parameters, the numerically simulated distribution of nonlinear degrees matches the experimental measured distribution reasonably well (Supplementary Fig. 8).

We also calculated the correlation between the mRNA production rates and the nonlinear degrees β. We used the mRNA amount at the smallest cell volume divided by the mRNA lifetime as a proxy for the mRNA production rate according to Eq. (6) (Methods). A weak but significant correlation is indeed observed, as shown in Fig. 4b. As we discussed in the section of numerical simulations, the heterogeneous initiation rates can reduce the correlation between the mRNA production rates and the nonlinear degrees. To further verify our model, we also simulated a modified model in which the nonlinear scaling is independent of the Michaelis-Menten constants (Supplementary Discussion E) and found that the correlation between the mRNA production rates and the nonlinear degrees becomes negative (Supplementary Fig. 6).

We used Gene Set Enrichment Analysis (GSEA)39,40 to find annotated functional gene sets that are enriched in the superlinear and sublinear scaling regime (Methods, Supplementary Fig. 9). Interestingly, we found that the ribosomal genes and other translation-related genes, which correspond to the coarse-grained ribosomal genes in our model, are enriched in the superlinear regime with the average β over ribosomal genes about − 0.2. Similar observations were reported in Ref. 21. However, the superlinear scaling of ribosomal genes was also observed in the control cases in which the nonlinearities of other known nonlinear scaling genes were suppressed. Therefore, it was argued that the superlinear scaling of ribosomal genes may be an artifact due to the drug that blocks cell-cycle progress. Interestingly, recent experiments of mammalian cells found weak sublinear scaling of ribosomal proteins23. For RNAP related genes, we found that they indeed show linear scaling with cell volume as we assume in our coarse-grained model, which is crucial for the linear scaling between the mRNA copy numbers and cell volume (Supplementary Fig. 10). To confirm the validity of our conclusions in the presence of weakly superlinear scaling of ribosomes, we numerically simulated our gene expression model with the recruitment ability of ribosomal gene weaker than the average value and found that even with the small deviation of ribosome number from linear scaling, our theoretical predictions still agree well with the numerical simulations (Supplementary Fig. 11).

In Ref. 21, the expression of pre-selected activators for the cell cycle were shown to be superlinear, while pre-selected inhibitors were shown to be sublinear. So we next checked the nonlinear degrees of all cell-cycle regulators using GSEA. We found that inhibitors are indeed enriched in the sublinear regime (Fig. 4c), but activators were not enriched in the superlinear regime. We remark that the inconsistency may be due to the preselection of regulators, but the conclusion of Ref. 21 that the interplay of inhibitors and activators can trigger cell-cycle progress is still valid as long as they have different scaling behaviors.

To further support our theoretical predictions, we investigated the promoter sequences of all genes included in our analysis. We expect that those genes with nonlinear scaling should have some special patterns in their promoter sequences, which render them stronger or weaker recruitment abilities to RNAPs than the others. If this is the case, specific motifs should be enriched in the superlinear or sublinear regime. We detect the transcription factors binding motifs in the promoter sequences and then used GSEA to identify those motifs that are enriched in the nonlinear regime. We found 77 motifs enriched in the sublinear regime (see one example in Fig. 4d and Supplementary Table 2). To further validate our results, we computed the average β for genes containing at least n motifs that are enriched in the sublinear regime and found that the average β indeed increases as a function of n (Fig. 4e). Consistent with our theoretical predictions, 76% of the 49 corresponding transcription factors exhibit positive regulation and therefore enhance the recruitment abilities to RNAPs of their target genes (Fig. 4f). However, we did not find motifs enriched in the superlinear regime. Considering the cumulative effect of motifs on β as shown in Fig. 4e, we propose that antagonistic effect may also exist among motifs, which suggests that motifs reducing the recruitment ability to RNAPs may reside in most genes. But in genes without superlinear scaling, their effects are counteracted by other motifs that enhance the recruitment ability.

## Discussion

For abnormally large cells, the number of RNA polymerases or ribosomes may exceed some threshold values so that the bottlenecks of gene expression become the templates of gene expression: gene copy numbers and mRNA copy numbers14,35. However, in typical cellular physiological states, cells are far below the thresholds, and the limiting bottlenecks of gene expression are the copy numbers of RNAPs and ribosomes10,15,16. In this case, if the promoters of all genes share the same recruitment ability, the expression levels of all genes should exhibit linear scaling with cell volume both at the mRNA and protein levels13. We extended this simple scenario to a more realistic case in which the recruitment abilities among genes are continuously distributed. We derived the dependence of the mRNA production rate on cell volume. We show that genes with recruitment abilities below (above) the average exhibit superlinear (sublinear) scaling with cell volume, a natural consequence of the heterogeneous distribution of recruitment abilities. We further show that the nonlinear scaling between the mRNA production rates and cell volume propagates to the mRNA copy numbers and proteins copy numbers. All of our theoretical predictions were confirmed by numerical simulations.

Nonlinear scaling of protein levels is crucial to cell-cycle regulation21. Time-dependent protein concentrations allow cells to determine the timing of various cell-cycle events, e.g., based on the ratio of two proteins with different scaling behaviors. To confirm this scenario, we also simulated the case of the periodic cell cycle and let the cell divide when the ratio of the concentrations of one superlinear protein and one sublinear protein exceeds some threshold value. We found that periodic patterns of mRNA and protein concentration emerge. For superlinear genes, their mRNA and protein concentrations decrease initially at the beginning of the cell cycle due to the halved RNAP number at cell birth, but quickly increases as the RNAP number increases (vice versa for sublinear genes). As the cell gets the periodic steady-state, all mRNAs and proteins double their numbers at cell division compared with cell birth13,41 (Supplementary Discussion F and Supplementary Fig. 12).

Our model shares some similarities with the model introduced in the Methods section of Ref. 13, but also with key differences. This work focuses on the effects of heterogeneous MM constants and the resulting nonlinear scaling of gene expression levels, including both the mRNA and protein. In contrast, the model in the Methods of Ref. 13 only consider transcription process and homogeneous MM constants. Furthermore, the previous model mainly considers the effects of nonspecifically bound RNAPs, which is believed to be important in bacterial gene expression24. The model in this work mainly considers eukaryotic cells, and the experimental data we analyze is from S cerevisiae21.

The recruitment abilities not only determine the nonlinear degrees of gene expression but also determine the mRNA production rates. Therefore, our model predicts that genes with higher (lower) mRNA production rates are more likely to exhibit sublinear (superlinear) scaling with cell volumes. We also note that heterogeneity in the transcription initiation rates can reduce the positive correlation between the mRNA production rates and the nonlinear scaling degrees, in concert with the small but positive correlation observed in the experimental data of S. cerevisiae21. Furthermore, according to our theoretical models, motifs that enhance or reduce the promoters’ recruitment abilities should exist in the superlinear or sublinear scaling genes. Indeed, we found a group of motifs enriched in the sublinear regime, and these motifs are associated with transcription factor (TF) binding sites that have positive regulation on the target genes. We note that other mechanisms of nonlinear scaling of gene expression levels are possible, such as time-dependent transcription factor concentrations21, or time-dependent initiation rates. A time-dependent transcription factor concentration is equivalent to a time-dependent MM constant Kn,i within our model. However, we note that our GSEA analysis showed that TF-related terms were not enriched in the nonlinear regime (Supplementary Fig. 9), which means TFs do not change their concentrations in general. Therefore, we argue that a changing TF concentration is more specific to certain genes instead of a general situation. Also, we remark that our model does not require time-dependent variables to achieve changing concentrations, and the changing concentrations of mRNAs and proteins are the result of the competition between the genes to the limiting resource of RNAPs. We note that in Ref. 21, the numbers of binding sites for specific cell-cycle transcription factors in the target genes were found to be positively correlated with the superlinear degrees of their mRNA levels, which may be due to other mechanisms not related to our model.

Our results can have far-reaching implications: the nonlinear scaling of gene expression level allows cells to sense their sizes based on the ratio of concentrations of different proteins, enabling cells to decide the timing of multiple cell-cycle events such as cell division. Our results suggest that sensing the concentration differences among a group of proteins as a measure of cell volume can be the most accessible option cells can take to achieve cell size regulation. The promoter sequences can also be under evolutionary selection to achieve desired nonlinear scalings of particular genes and robust cell-cycle regulation. Finally, the gene expression model proposed in this work is by construction at the whole-cell level. Therefore, it can be a valuable platform for mathematical modeling of gene expression, especially for problems in which the competition among genes for the limiting resources of RNAPs and ribosomes are crucial.

## Methods

### A summary of the variables used in the main text

 Variables Meaning V cell volume M totle protein mass V c cytoplasmic volume V n nuclear volume ρ protein mass per cell volume a ratio between cell volume and nuclear volume k n,i mRNA production rate of gene i Γn,i transcriptional initiation rate of gene i g i gene copy number of gene i P b,i RNAPs binding probability on the promoter of gene i c n,free free RNAPs concentration F n free RNAPs fraction Λn,i maximum number of RNAPs one copy of gene i can hold n c maximum number of RNAPs the entire genome can hold L i length of gene i v n RNAP elongation speed K n,i transcriptional MM constant of gene i 〈Kn,i〉 weighted average of MM constant n RNAPs number τ m,i lifetime of mRNA i m i mRNA number of gene i k r,i protein production rate of gene i Γr,i translational initiation rate of gene i c r ribosomes concentration F r free ribosomes fraction K r,i translational MM constant of gene i τ p,i lifetime of protein i p i protein number of gene i β i nonlinear degree of mRNA i α i nonlinear degree of protein i ϕ n RNAPs mass fraction ϕ r ribosomes mass fraction

### Details of the gene expression model

We explain more details of the gene expression model in this section. Given the mRNA production rate, the mRNA copy number mi changes as

$$\frac{d{m}_{i}}{dt}={k}_{n,i}-\frac{{m}_{i}}{{\tau }_{m,i}}$$
(6)

where τm,i is the mRNA lifetime.

Regarding protein production, the protein production rate of one particular gene is proportional to its corresponding mRNA number (mi), the translation initiation rate (Γr,i), and the probability for the ribosome binding site of mRNA to be bound by ribosome:

$${k}_{r,i}={{{\Gamma }}}_{r,i}{m}_{i}\frac{{c}_{r,free}}{{c}_{r,free}+{K}_{r,i}}.$$
(7)

Here cr,free is the concentration of free ribosomes in the cytoplasm and Kr,i is the Michaelis-Menten constant of ribosome binding on the mRNAs (see an alternative formulation of translation model in Ref. 42).

Because the total number of RNAPs can be separated to free RNAPs, initiating RNAPs, and transcribing RNAPs, we obtained a self-consistent equation to determine the fraction of free RNAPs in all RNAPs:

$${n}_{c}\frac{{c}_{n}{F}_{n}}{{c}_{n}{F}_{n}+{K}_{n}}=n-n{F}_{n}.$$
(8)

Here n is the total number of RNAPs, nc = ∑igi(1 + Λn,i), and Fn is the fraction of free RNAPs. cn is the concentration of total RNAPs in the nucleus and here we consider the simplified model in which all genes have the same MM constant Kn to RNAPs. The left side represents the number of RNAPs bound to promoters or transcribing. The right side represents the difference between the total number of RNAPs and free RNAPs, which should be equal to the left side. Meanwhile, we assume that cnKn, namely, the total RNAP concentration is much larger than the MM constant of a typical promoter, supported by observations in bacteria43. We argue that this assumption is biologically reasonable because if all RNAPs suddenly become free so that cn,free = cn, one would expect that these free RNAPs will have a strong tendency to rebind to the promoters; otherwise, a large fraction of RNAPs will be idle, which is clearly inefficient in normal cellular physiological states. We remark that although the assumption appears reasonable, it remains to be tested in yeast. Using the assumption that Kn/cn 1, we find that the fraction of free RNAPs Fn that solves Eq. (8) must be much smaller than 1 if n < nc (see the illustration in Supplementary Fig. 13). Therefore, we can take Fn = 0 in Eq. (8) and obtain the binding probability as Pb = n/nc.

When the number of RNAPs exceeds the threshold value nc, the linear scaling between the mRNA numbers and cell volume breakdowns for all genes, and the growth mode of cell volume also deviates from exponential growth (Phase 2). If the ribosome number exceeds some threshold value, the cell volume eventually grows linearly, which has been observed in budding yeast35 (Phase 3, see Ref. 13 and detailed derivations in the Supplementary Discussion B). We note that the main purpose of the assumption cnKn is to make the condition of the negligible fraction of free RNAPs more well defined as n/nc < 1. Our conclusions on the relation between the nonlinear scaling and the recruitment abilities do not rely on this assumption. Since we mainly focus on the scenario in which RNAP is limiting with Fn 1, the transition details from Phase 1 to Phase 2 is not important to our conclusions. We also discuss the effects of nonspecific binding of RNAPs on the transition between Phase 1 and Phase 2 in Supplementary Discussion C and show that the condition of Phase 1 becomes more stringent in the presence of nonspecific binding.

### Derivation of the nonlinear scaling

We consider a simple model assuming all genes have the same MM constant Kn except one special gene i has a MM constant Kn,i. Since the contribution of the particular gene to the global allocation of RNAP is negligible, Eq. (8) is still valid. We focus on Phase 1 so that Fn 1 and express cnFn as a function of n/nc. The mRNA production rate for the particular gene with the MM constant equal to Kn,i therefore becomes Eq. (4).

If Kn,i > Kn, we find that the mRNA production rate of gene i exhibits a superlinear dependence on the RNAP number, therefore also the cell volume. If Kn,i < Kn, the mRNA production rate of gene i exhibits a sublinear behavior. The general solution for Eq. (6) becomes

$${m}_{i}(t)={m}_{i}(0){e}^{-t/{\tau }_{m,i}}+\int\nolimits_{0}^{t}{e}^{-{{\Delta }}t/{\tau }_{m,i}}{k}_{n,i}(t-{{\Delta }}t)d{{\Delta }}t.$$
(9)

In the limit that the lifetime of mNRA goes to zero, the number of mRNA becomes strictly proportional to the mRNA production rate

$${m}_{i}(t)={\tau }_{m,i}{k}_{n,i}(t).$$
(10)

We now consider the dynamics of protein number which is

$$\frac{d{p}_{i}}{dt}={{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{m}_{i}-\frac{{p}_{i}}{{\tau }_{p,i}}.$$
(11)

The general solution for Eq. (11) is

$${p}_{i}(t)=\; {p}_{i}(0){e}^{-t/{\tau }_{p,i}}\\ +\int\nolimits_{0}^{t}{e}^{-{{\Delta }}t/{\tau }_{p,i}}{{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{m}_{i}(t-{{\Delta }}t)d{{\Delta }}t.$$
(12)

In the following, we assume crFr to be constant, which is a good approximation in Phase 1 (Supplementary Discussion B) and consider two limiting cases, τp,i → 0 and τp,i → .

When τp,i → 0, the protein number becomes strictly proportional to the protein production rate, which is proportional to the mRNA number,

$$\begin{array}{rc}{p}_{i}(t)&={{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{m}_{i}(t){\tau }_{p,i}\\ &={{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{\tau }_{m,i}{\tau }_{p,i}{k}_{n,i}(t).\end{array}$$
(13)

Note that the second identity is valid when τm,i → 0 which is a good approximation when the lifetime of mRNA is much shorter than the doubling time.

When τp,i →  and τm,i → 0 the dynamics of protein number becomes

$$\begin{array}{rc}&{p}_{i}(t)-{p}_{i}(0)=\int\nolimits_{0}^{t}{{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{m}_{i}({t}^{\prime})d{t}^{\prime}\\ &={{{\Gamma }}}_{r,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}{{{\Gamma }}}_{n,i}{g}_{i}{\tau }_{m,i}\int\nolimits_{0}^{t}\frac{{K}_{n}n}{{K}_{n,i}{n}_{c}-({K}_{n,i}-{K}_{n})n}d{t}^{\prime}.\end{array}$$
(14)

In Phase 1, the number of RNA polymerase increases exponentially n(t) = n(0)eμt, therefore the integral in Eq. (14) can be analytically calculated

$$\;{p}_{i}(t)-{p}_{i}(0)\\ ={{{\Gamma }}}_{r,i}{{{\Gamma }}}_{n,i}{g}_{i}{\tau }_{m,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}\frac{{K}_{n}}{({K}_{n}-{K}_{n,i})\mu } \\ \quad\;\times{{{{\mathrm{ln}}}}}\,\left(\frac{{K}_{n,i}{n}_{c}-({K}_{n,i}-{K}_{n})n(t)}{{K}_{n,i}{n}_{c}-({K}_{n,i}-{K}_{n})n(0)}\right).$$
(15)

The nonlinear scaling of mRNA copy number also propagates to nondegradable proteins.

To quantify the nonlinear degrees of mRNA copy number, we investigate the volume dependence of mRNA copy number and normalize the volume and mRNA number by their initial values. Using Eqs. (4), (10)), we obtain

$${\widetilde{m}}_{i}(t)=\widetilde{V}(t)\frac{1-\frac{{{\Delta }}{K}_{n,i}n(0)}{{K}_{n,i}{n}_{c}}}{1-\frac{{{\Delta }}{K}_{n,i}n(0)}{{K}_{n,i}{n}_{c}}\widetilde{V}(t)}.$$
(16)

Here $${\widetilde{m}}_{i}(t)={m}_{i}(t)/{m}_{i}(0)$$, $$\widetilde{V}(t)=V(t)/V(0)$$, ΔKn,i = Kn,i − Kn. We have used the fact that the RNAP concentration is constant therefore n(t)/n(0) = V(t)/V(0). Note that when Kn,i is continuously distributed, we should replace Kn by 〈Kn,i〉. Comparing with Eq. (5) in the main text, we find that

$${\beta }_{i}=-\frac{{{\Delta }}{K}_{n,i}n(0)}{{K}_{n,i}{n}_{c}}.$$
(17)

When ΔKn,i = 0, βi = 0 as expected. For proteins with short lifetimes, the above analysis is equally valid.

We also study the nonlinear degree of nondegradable proteins, and using Eq. (15), we find that

$$\begin{array}{rc}{{\Delta }}{\widetilde{p}}_{i}(t)&={C}_{i}{{{{\mathrm{ln}}}}}\,\left(1-\frac{{{\Delta }}{K}_{n,i}(n(t)-n(0))}{{K}_{n,i}{n}_{c}-{{\Delta }}{K}_{n,i}n(0)}\right)\\ &={C}_{i}{{{{\mathrm{ln}}}}}\,\left(1-\frac{{{\Delta }}{K}_{n,i}n(0)}{{K}_{n,i}{n}_{c}-{{\Delta }}{K}_{n,i}n(0)}{{\Delta }}\widetilde{V}(t)\right).\end{array}$$
(18)

Here we combine all the constants divided by pi(0) before the logarithmic term in Eq. (15) to Ci and $${{\Delta }}\widetilde{V}(t)=\widetilde{V}(t)-1$$. Therefore, we can write Eq. (18) as

$${{\Delta }}{\widetilde{p}}_{i}(t)={C}_{i}{{{{{{\mathrm{ln}}}}}}}\,\left(1+{\alpha }_{i}{{\Delta }}\widetilde{V}(t)\right),$$
(19)

where

$${\alpha }_{i}=-\frac{{{\Delta }}{K}_{n,i}n(0)}{{K}_{n,i}{n}_{c}-{{\Delta }}{K}_{n,i}n(0)}.$$
(20)

In the limit ΔKn,i → 0, we find that

$${{\Delta }}{p}_{i}={{{\Gamma }}}_{r,i}{{{\Gamma }}}_{n,i}{g}_{i}{\tau }_{m,i}\frac{{c}_{r}{F}_{r}}{{c}_{r}{F}_{r}+{K}_{r,i}}\frac{1}{\mu }\frac{n(t)-n(0)}{{n}_{c}},$$
(21)

which is consistent with Eq. (14) in the case of Kn,i = Kn.

### Details of numerical simulations

All simulations were done in MATLAB (version R2020b and R2021a). We summarize some of the parameters we used in the numerical simulations in Supplementary Table 1. The gene copy numbers are time-independent which we set as 1 for all genes except the ribosome gene which we set as 5. Given an attempted growth rate μ0, we set the attempted mass fraction of ribosomes in the entire proteome as

$${\phi }_{r}={\mu }_{0}{L}_{r}/{v}_{r}.$$
(22)

Here, Lr is the length of the ribosome gene in the unit of codons. Note that the actual mass fraction is slightly time-dependent and deviates from the attempted value. To get Eq. (22), we assume that all ribosomes are actively translating and neglect the correction due to free ribosomes and initiating ribosomes. Given the mass fraction of ribosomes, we assume that the copy number of RNAPs is about 10% of that of ribosomes and set the attempted mass fraction of RNAP as

$${\phi }_{n}=0.1{\phi }_{r}{L}_{n}/{L}_{r}.$$
(23)

Here Ln is the length of the RNAP gene. In Fig. 2, we set Kn = 6 × 103/μm3, and the two special genes with Kn,i = 20Kn and Kn,i = 0.2Kn. We also set the lifetimes of all mRNAs as 10 mins. In Fig. 3, we set Kn,i following a lognormal distribution so that it’s average $$\overline{{K}_{n,i}}=6\times 1{0}^{3}/\mu {m}^{3}$$. We also set the lifetimes of mRNA following a lognormal distribution with a mean equal to 10 min and a coefficient of variation equal to 1. In all simulations, we set the initial attempted total protein mass as Mb = 109 in the unit of amino acid number and the attempted critical RNAP number as nc = ∑igi(1 + Λn,i) = 104.

To find the appropriate value of Γn,i that leads to the attempted nc, we assume cnFnKn,i so that

$$\frac{{\phi }_{n}}{{\phi }_{r}}\approx \frac{{{{\Gamma }}}_{n,n}\frac{{g}_{n}}{{K}_{n,n}}{\tau }_{m,n}{L}_{n}}{{{{\Gamma }}}_{n,r}\frac{{g}_{r}}{{K}_{n,r}}{\tau }_{m,r}{L}_{r}},$$
(24)

from which we can find that Γn,n = yΓn,r where Γn,r and Γn,n are respectively the transcription initiation rates of ribosomes and RNAPs. y can be found using the above equation. We also set Γn,i = xΓn,r for all i except the genes for RNAP (i = 1) and ribosome (i = 2), so that

$${\phi }_{r}\approx \frac{{{{\Gamma }}}_{n,r}\frac{{g}_{r}}{{K}_{n,r}}{\tau }_{m,r}{L}_{r}}{{{{\Gamma }}}_{n,r}\frac{{g}_{r}}{{K}_{n,r}}{\tau }_{m,r}{L}_{r}+y{{{\Gamma }}}_{n,r}\frac{{g}_{n}}{{K}_{n,n}}{\tau }_{m,n}{L}_{n}+x{{{\Gamma }}}_{n,r}{\sum }_{i \,{ > }\,2}\frac{{g}_{i}}{{K}_{n,i}}{\tau }_{m,i}{L}_{i}}$$
(25)

from which we can find the expression of x. Finally, using $${\sum }_{i}{g}_{i}(1+{{{\Gamma }}}_{n,i}\frac{{L}_{i}}{{v}_{n}})={n}_{c}$$, we find that

$${{{\Gamma }}}_{n,r}=\frac{({n}_{c}-{\sum }_{i}{g}_{i}){v}_{n}}{{g}_{r}{L}_{r}+y{g}_{n}{L}_{n}+x{\sum }_{i \,{ > }\,2}{g}_{i}{L}_{i}}$$
(26)

from which we find Γn,n and Γn,i for i > 2. Note that the approximation cnFnKn,i for all i is merely to find the values of Γn,i, which do not affect our conclusions.

Before we run the simulations, we use the attempted ϕn, initial cell mass, and Γn,i with gi and Kn,i to calculate the mRNA production rates for all genes using Eqs. (1, 2), from which we use mi = kn,iτm,i as the initial condition. Using the initial mi, we compute the initial protein mass fractions as ϕi = miLi/∑imiLi and also update the mass fractions of RNAP and ribosome. The actual initial mass fraction of RNAP can slightly deviate from the attempted value. To make sure the RNAP copy number is continuous from the beginning of simulations, we also slightly shift the total protein mass from the attempted value so that ϕn(actual)Mb(actual) = ϕn(attemped)Mb(attempted).

To remove the transient effects, we take the simulation results at t = 20 as the initial values when we compare our results with theoretical predictions. To simulate degradable proteins, we choose 200 of the genes as degradable proteins with lifetimes τp = 10 mins and for degradable proteins, we take the simulation results at t = 100 as the initial values. The simulation is stopped when the total protein mass is larger than 9Mb.

### Details of the mRNA production rates

For the mRNA production rate, we used the mRNA amount (the product of RPKM and cell volume) at the smallest cell volume divided by the mRNA lifetime from Ref. 44 as a proxy according to Eq. (6) in Methods.

### Details of motif searching in the promoters of genes

Promoters are defined as 500 bp upstream of transcription start sites in the main text. Sequences of promoters for every gene were downloaded from the Yeastract database45. Then we used the online tool CentriMo46 of MEME suite (version 5.3.2)47 to detect the transcription factors binding motifs annotated in the Yeastract database in these sequences with default parameters.

### Details of GSEA

We performed enrichment analysis using GSEA using package clusterProfiler (version 3.12.0)48 in R (version 3.6.1). Genes were ordered by nonlinear degree β. The cut-off criteria were set as both the p value and the false discovery rate (FDR) q value < 0.05. The number of permutations used in the analysis is 1e5.

In the analysis of functions, the gmt file was generated from KEGG BRITE hierarchy files containing KEGG objects (KO) for budding yeast in Kyoto Encyclopedia of Genes and Genomes (KEGG) database49,50,51. In the analysis of cell-cycle regulators, we generated a gmt file containing only two terms, negative regulation of cell cycle (GO:0045786) and positive regulation of cell cycle (GO:0045787), obtained from Gene Ontology (GO) database with AmiGO (version 2.5.15)52,53,54.

In the analysis of motifs, the gmt file was generated based on the CentriMo results. Every motif was considered as a gene set, including multiple genes that contain this motif in their promoters.

### Details of functional enrichment analysis of TFs

Transcription factors corresponding to the motifs enriched in GSEA were picked out. Functional enrichment analysis for budding yeast was done using Metascape (version 3.5)55 with default parameters. In 641 genes whose function are regulation of transcription, DNA templated (GO:0006355), 303 genes are annotated as positive regulation of transcription by RNA polymerase II (GO:0045944). In our analysis, all the 49 TFs corresponding to the motifs enriched in GSEA (except one gene ABF2) are annotated as GO:0006355 and 34 of them are annotated as GO:0045944. Based on this situation, single-sided hypergeometric test was performed using R function phyper.

### Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

The data that support this study are available from the corresponding author upon reasonable request. All data analyzed in this study are available in publicly accessible repositories. The RNA-seq data (GSE145206) has been published in21. Promoter sequences of the yeast genome are available from Yeastract [http://www.yeastract.com/formseqretrieval.php] database45. The nonlinear degree data are included in the repository of GSEA codes.

## Code availability

Codes for GSEA are available at https://github.com/QirunWang/R-codes-for-GSEA. Codes for mathematical simulations are available at https://github.com/QirunWang/MATLAB-code-for-simulation.

## References

1. 1.

Crissman, H. A. & Steinkamp, J. A. Rapid, simultaneous measurement of dna, protein, and cell volume in single cells from large mammalian cell populations. J. Cell Biol. 59, 766 (1973).

2. 2.

Elliott, S. & McLaughlin, C. Rate of macromolecular synthesis through the cell cycle of the yeast saccharomyces cerevisiae. Proc. Natl Acad. Sci. 75, 4384–4388 (1978).

3. 3.

Salman, H. et al. Universal protein fluctuations in populations of microorganisms. Phys. Rev. Lett. 108, 238105 (2012).

4. 4.

Marguerat, S. & Bähler, J. Coordinating genome expression with cell size. Trend. Genet. 28, 560–565 (2012).

5. 5.

Kempe, H., Schwabe, A., Crémazy, F., Verschure, P. J. & Bruggeman, F. J. The volumes and transcript counts of single cells reveal concentration homeostasis and capture biological noise. Mol. Biol. Cell 26, 797–804 (2015).

6. 6.

Padovan-Merhar, O. et al. Single mammalian cells compensate for differences in cellular volume and dna copy number through independent global transcriptional mechanisms. Mol. Cell 58, 339–352 (2015).

7. 7.

Ietswaart, R., Rosa, S., Wu, Z., Dean, C. & Howard, M. Cell-size-dependent transcription of flc and its antisense long non-coding rna coolair explain cell-to-cell expression variation. Cell Sys. 4, 622–635.e9 (2017).

8. 8.

Zheng, X.-y & O’Shea, E. K. Cyanobacteria maintain constant protein concentration despite genome copy-number variation. Cell Rep. 19, 497–504 (2017).

9. 9.

Knapp, B. D. et al. Decoupling of rates of protein synthesis from cell expansion leads to supergrowth. Cell Sys. 9, 434–445 (2019).

10. 10.

Sun, X.-M. et al. Size-dependent increase in rna polymerase ii initiation rates mediates gene expression scaling with cell size. Curr. Biol. 30, 1217–1230.e7 (2020).

11. 11.

Campos, M. et al. A constant size extension drives bacterial cell size homeostasis. Cell 159, 1433–1446 (2014).

12. 12.

Cermak, N. et al. High-throughput measurement of single-cell growth rates using serial microfluidic mass sensor arrays. Nat. Biotec. 34, 1052–1059 (2016).

13. 13.

Lin, J. & Amir, A. Homeostasis of protein and mrna concentrations in growing cells. Nat. Commun. 9, 4496 (2018).

14. 14.

Zhurinsky, J. et al. A coordinated global control over cellular transcription. Curr. Biol.20, 2010–2015 (2010).

15. 15.

Scott, M., Gunderson, C. W., Mateescu, E. M., Zhang, Z. & Hwa, T. Interdependence of cell growth and gene expression: origins and consequences. Science 330, 1099 (2010).

16. 16.

Metzl-Raz, E. et al. Principles of cellular resource allocation revealed by condition-dependent proteome profiling. Elife 6, e28034 (2017).

17. 17.

Schmoller, K. M., Turner, J., Kõivomägi, M. & Skotheim, J. M. Dilution of the cell cycle inhibitor whi5 controls budding-yeast cell size. Nature 526, 268–272 (2015).

18. 18.

Heldt, F. S., Lunstone, R., Tyson, J. J. & Novák, B. Dilution and titration of cell-cycle regulators may control cell size in budding yeast. PLoS Comput. Biol.14, e1006548 (2018).

19. 19.

Litsios, A., Huberts, D. H. E. W., Terpstra, H. M., Guerra, P. & Heinemann, M. Differential scaling between g1 protein production and cell size dynamics promotes commitment to the cell division cycle in budding yeast. Nat. Cell Biol.2, 1382–1392 (2019).

20. 20.

Qu, Y. et al. Cell cycle inhibitor whi5 records environmental information to coordinate growth and division in yeast. Cell Rep. 29, 987–994 (2019).

21. 21.

Chen, Y., Zhao, G., Zahumensky, J., Honey, S. & Futcher, B. Differential scaling of gene expression with cell size may explain size control in budding yeast. Mol. Cell 78, 359–370.e6 (2020).

22. 22.

Claude, K.-L. et al. Transcription coordinates histone amounts and genome content. Nat. Commun. 12, 1–17 (2021).

23. 23.

Lanz, M. C. et al. Increasing cell size remodels the proteome and promotes senescence. bioRxiv https://doi.org/10.1101/2021.07.29.454227 (2021).

24. 24.

Klumpp, S. & Hwa, T. Growth-rate-dependent partitioning of rna polymerases in bacteria. Proc. Natl. Acad. Sci. 105, 20245–20250 (2008).

25. 25.

Killeen, M. T. & Greenblatt, J. F. The general transcription factor rap30 binds to rna polymerase ii and prevents it from binding nonspecifically to dna. Mol. Cell. Biol. 12, 30–37 (1992).

26. 26.

Roeder, R. G. The role of general initiation factors in transcription by rna polymerase ii. Trend. Biochem. Sci. 21, 327–335 (1996).

27. 27.

Allen, B. L. & Taatjes, D. J. The mediator complex: a central integrator of transcription. Nat. Rev. Mol. Cell Biol. 16, 155–166 (2015).

28. 28.

Ong, C.-T. & Corces, V. G. Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat. Reviews Genet. 12, 283–293 (2011).

29. 29.

Basehoar, A. D., Zanton, S. J. & Pugh, B. F. Identification and distinct regulation of yeast tata box-containing genes. Cell 116, 699–709 (2004).

30. 30.

Milo, R. & Phillips, R. Cell biology by the numbers (Garland Science, 2015).

31. 31.

Kubitschek, H. E., Baldwin, W. W., Schroeter, S. J. & Graetzer, R. Independence of buoyant cell density and growth rate in escherichia coli. J. Bacteriol. 158, 296–299 (1984).

32. 32.

Bryan, A. K. et al. Measuring single cell mass, volume, and density with dual suspended microchannel resonators. Lab Chip 14, 569–576 (2014).

33. 33.

Basan, M. et al. Inflating bacterial cells by increased protein synthesis. Mol. Sys. Biol.11, 836 (2015).

34. 34.

Neumann, F. R. & Nurse, P. Nuclear size control in fission yeast. J. Cell Biol. 179, 593–600 (2007).

35. 35.

Neurohr, G. E. et al. Excessive cell growth causes cytoplasm dilution and contributes to senescence. Cell 176, 1083–1097 (2019).

36. 36.

Belle, A., Tanay, A., Bitincka, L., Shamir, R. & O’Shea, E. K. Quantification of protein half-lives in the budding yeast proteome. Proc. Natl. Acad. Sci. 103, 13004–13009 (2006).

37. 37.

Bintu, L. et al. Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev. 15, 116–124 (2005).

38. 38.

Brewster, R. C., Jones, D. L. & Phillips, R. Tuning promoter strength through RNA polymerase binding site design in escherichia coli. PLoS Comput. Biol. 8, e1002811 (2012).

39. 39.

Subramanian, A. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. 102, 15545–15550 (2005).

40. 40.

Mootha, V. K. et al. Pgc-1α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 34, 267–273 (2003).

41. 41.

Bierbaum, V. & Klumpp, S. Impact of the cell division cycle on gene circuits. Phys. Biol. 12, 066003 (2015).

42. 42.

Li, S. H.-J. et al. Escherichia coli translation strategies differ across carbon, nitrogen and phosphorus limitation conditions. Nat. Microbiol. 3, 939 (2018).

43. 43.

Bremer, H., Dennis, P. & Ehrenberg, M. Free rna polymerase and modeling global transcription in escherichia coli. Biochimie 85, 597–609 (2003).

44. 44.

Neymotin, B., Athanasiadou, R. & Gresham, D. Determination of in vivo rna kinetics using rate-seq. Rna 20, 1645–1652 (2014).

45. 45.

Monteiro, P. T. et al. YEASTRACT+: a portal for cross-species comparative genomics of transcription regulation in yeasts. Nucleic Acids Res. 48, D642–D649 (2019).

46. 46.

Bailey, T. L. & Machanick, P. Inferring direct DNA binding from ChIP-seq. Nucleic Acids Res. 40, e128–e128 (2012).

47. 47.

Bailey, T. L. et al. MEME Suite: tools for motif discovery and searching. Nucleic Acids Res. 37, W202–W208 (2009).

48. 48.

Yu, G., Wang, L.-G., Han, Y. & He, Q.-Y. clusterprofiler: an r package for comparing biological themes among gene clusters. OMICS. J. Integr. Biol. 16, 284–287 (2012).

49. 49.

Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).

50. 50.

Kanehisa, M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 28, 1947–1951 (2019).

51. 51.

Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M. & Tanabe, M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 49, D545–D551 (2020).

52. 52.

Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).

53. 53.

Carbon, S. et al. AmiGO: online access to ontology and annotation data. Bioinformatics 25, 288–289 (2008).

54. 54.

Consortium, T. G. O. The Gene Ontology resource: enriching a GOld mine. Nucleic Acids Res. 49, D325–D334 (2020).

55. 55.

Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1–10 (2019).

## Acknowledgements

We thank Lucas Carey and Ariel Amir for useful discussions related to this work. We also thank Yuping Chen for kindly sharing experimental data. The research was supported by grants from Peking-Tsinghua Center for Life Sciences.

## Author information

Authors

### Contributions

J.L. conceived, designed, and carried out the theoretical and numerical part of this work. Q.W. performed the analysis of experimental data and promoter sequences. All the authors contributed to the preparation of the manuscript.

### Corresponding author

Correspondence to Jie Lin.

## Ethics declarations

### Competing interests

The authors declare no competing interests

Peer review informationNature Communications thanks the anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

Reprints and Permissions

Wang, Q., Lin, J. Heterogeneous recruitment abilities to RNA polymerases generate nonlinear scaling of gene expression with cell volume. Nat Commun 12, 6852 (2021). https://doi.org/10.1038/s41467-021-26952-y

• Accepted:

• Published:

• DOI: https://doi.org/10.1038/s41467-021-26952-y