Contributions to the ‘noise floor’ in gene expression in a population of dividing cells

Experiments with cells reveal the existence of a lower bound for protein noise, the noise floor, in highly expressed genes. Its origins are still debated. We propose a minimal model of gene expression in a proliferating bacterial cell population. The model predicts the existence of a noise floor and it semi-quantitatively reproduces the curved shape of the experimental noise vs. mean protein concentration plots. When the cell volume increases in a different manner than does the mean protein copy number, the noise floor level is determined by the cell population’s age structure and by the dependence of the mean protein concentration on cell age. Additionally, the noise floor level may depend on a biological limit for the mean number of bursts in the cell cycle. In that case, the noise floor level depends on the burst size distribution width but it is insensitive to the mean burst size. Our model quantifies the contributions of each of these mechanisms to gene expression noise.

where $G(s,t) = \mathcal{L}[p(x,t)] \equiv \int_0^\infty e^{-sx} p(x,t)\,dx$, $\hat{w}(s,t) = \hat{\nu}(s,t) - 1$ and $\hat{\nu}(s,t) = \mathcal{L}[\nu(u,t)] \equiv \int_0^\infty e^{-su}\nu(u,t)\,du$; note that $\ln[G(-s,t)] = \sum_{r=1}^{\infty} \kappa_r(t)\,s^r/r!$ is a cumulant generating function for $p(x,t)$. Equation (2) will be used to obtain the time evolution of the cumulants $\kappa_r(t)$ of $p(x,t)$ during the cell cycle. Equation (2) can be easily solved; its solution, Eq. (3), is expressed through the initial condition $G(s,t_0) = \mathcal{L}[p(x,t_0)]$. However, in most cases it is not possible to compute $p(x,t) = \mathcal{L}^{-1}[G(s,t)]$ analytically.
Protein partitioning at cell division. At cell division (assumed to be instantaneous), the time evolution of $p(x,t)$ given by Eqs. (1) or (2) is interrupted and protein molecules are partitioned between daughter cells: $x \to \{qx, (1-q)x\}$. Here, $0 \le q \le 1$ is a random number drawn from the probability density function $\eta(q) = \eta(1-q)$. Protein partitioning implies the relations (4) and (5) [22]. Equation (5) links the Laplace transforms of the protein copy number probability density functions immediately after and before cell division, during which protein molecules are randomly partitioned between daughter cells.
Cumulants of protein copy number distribution depend on moments of burst size distribution. In this subsection, we show that the increase in the r-th cumulant $\kappa_r(\tau)$ of the protein copy number probability density function $p(x|\tau)$ during the cell cycle depends solely on the time evolution of the r-th moment $m_r(\tau)$ of the burst size probability density function $\nu(u|\tau)$. From (6) it follows that if cell age $\tau$, and not the observation time $t$, is used as the time variable, Eq. (2) takes the form of Eq. (7). Dividing Eq. (7) by $G(s|\tau)$, one gets the time-evolution equation for $\ln[G(s|\tau)]$ and hence the time-evolution equation (8) obeyed by the cumulants $\kappa_r(\tau)$ of $p(x|\tau)$ (note that we have changed the interpretation of the time variable, $t \to \tau$, and therefore the notation, according to Eq. (6)). Here, $m_r(\tau) = \int_0^\infty u^r \nu(u|\tau)\,du$ is the r-th moment of the burst size probability density function $\nu(u|\tau)$. From Eq. (8), the result (10) follows.
Moments of protein copy number distribution depend on moments of protein partitioning distribution and burst size distribution. In this subsection, we link the time evolution of $\mu_1(\tau)$ and $\mu_2(\tau)$ (the first and second moments of the protein copy number probability density function $p(x|\tau)$ for a single cell) with the moments of $\nu(u|\tau)$ and $\eta(q)$ (the burst size and protein partitioning probability density functions). First, we note the relation (11) between the r-th moments of $p(x|\tau)$ immediately after and before cell division and the r-th moment of $\eta(q)$. Here, $M_r = \int_0^1 q^r \eta(q)\,dq$ is the r-th moment of the protein partitioning probability density function $\eta(q)$ and $\mu_r(\tau) = \int_0^\infty x^r p(x|\tau)\,dx$ is the r-th moment of the protein copy number probability distribution $p(x|\tau)$. The relation (11) follows from Eq. (4) or from Eq. (5), which links the Laplace transform of $p(x|\tau)$ just after cell division with the Laplace transform of $p(x|\tau)$ just before the division. Note that $2^{-r} \le M_r \le 2^{-1}$ and $M_1 = 1/2$ due to the assumed symmetry of $\eta(q)$: $\eta(q) = \eta(1-q)$.
(On average, each daughter cell obtains half of the mother's protein molecules of a given type.) Now we want to link the moments $\mu_1(\tau)$ and $\mu_2(\tau)$ with $M_2$, $m_1(\tau)$, and $m_2(\tau)$. For this, we use the auxiliary functions $h_1(M_2)$ and $h_2(M_2)$ of Eq. (14), as well as Eq. (11), the definition of $J_r(\tau)$ given by Eq. (10), and the quantity $I_r$ defined by Eq. (15). The first two moments of $p(x|\tau)$ are then given by Eq. (16). Here, $h_1(M_2)$ and $h_2(M_2)$ depend on the second moment $M_2$ of the protein partitioning probability density function $\eta(q)$, whereas $I_1$, $J_1(\tau)$, $I_2$, $J_2(\tau)$ contain the dependence on the first and second moments $m_1(\tau)$, $m_2(\tau)$ of the burst size probability density function $\nu(u|\tau)$. Note that $\mu_1(\tau)$ and $\mu_2(\tau)$ given by Eq. (16) obey the boundary conditions imposed by Eq. (11). In particular, we have $\mu_1(T) = 2\mu_1(0)$ (Eq. (17)), as the mean protein copy number doubles during the cell cycle. The time evolution of $\mu_1(\tau)$ and $\mu_2(\tau)$ for a single cell, as given by Eq. (16), will be needed in the next subsections to obtain the quantities referring to the whole proliferating cell population.
Emulating binomial protein partitioning distribution. The protein partitioning statistics $\eta(q)$ used here do not depend on $x(T)$, the number of protein molecules present in the cell immediately before cell division [22–24]. Still, we can choose an arbitrary $\eta(q)$. We use this freedom to impose the constraint (18) on $M_2$, which holds for the binomial distribution describing protein partitioning in models that assume a discrete protein copy number [12, 13, 15]. By enforcing the constraint given by Eq. (18) we emulate the behaviour of the first and second moments of the binomial distribution, because only this information about the partitioning statistics is needed within our model to calculate the coefficients of variation of protein copy number or concentration.
From (11) and (18), yet another property of binomial partitioning follows for the coefficient of variation of the protein copy number, Eq. (19). The relation (19), which links $M_2$, $m_1(\tau)$, and $m_2(\tau)$, will be needed later for Eq. (32), which defines $M_2$ for a particular choice of the burst size probability density function $\nu(u|\tau)$ and in turn serves to derive the coefficient of variation of protein concentration for that case.
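The binomial constraint can be illustrated numerically. The sketch below is a hedged reconstruction, not a transcription of Eqs. (18) and (32): it assumes that the constraint takes the form $M_2\,\mu_2(T) = [\mu_2(T) + \mu_1(T)]/4$, which is what the second moment of a fair binomial split gives, and that burst frequency and burst size distribution are constant over the cell cycle with mean burst size `b` and second moment `C2 * b**2`. The function names and the closed form in `m2_constant_bursting` are assumptions introduced here for illustration.

```python
from math import comb


def binomial_second_moment_fraction(n: int) -> float:
    """E[k^2] / n^2 for k ~ Binomial(n, 1/2), i.e. a fair split of n molecules."""
    second_moment = sum(k * k * comb(n, k) for k in range(n + 1)) / 2**n
    return second_moment / n**2


def m2_emulated(mu1_T: float, mu2_T: float) -> float:
    """Assumed form of the constraint: M2 * mu2(T) = (mu2(T) + mu1(T)) / 4."""
    return 0.25 + mu1_T / (4.0 * mu2_T)


def m2_constant_bursting(mu1_T: float, b: float, C2: float) -> float:
    """Assumed closed form for constant bursting (derived here, not quoted)."""
    return 0.25 + 3.0 / (4.0 * (1.0 + 2.0 * C2 * b + 3.0 * mu1_T))


# For a cell with exactly n molecules, mu1 = n and mu2 = n^2.
n = 10
print(binomial_second_moment_fraction(n), m2_emulated(n, n**2))

# Exponential bursts (C2 = 2) with mean size b = 2 and mu1(T) = 20.
print(m2_constant_bursting(20.0, 2.0, 2.0))
```

Under these assumptions, `m2_constant_bursting` stays at or below 1/2 exactly when `2*C2*b + 3*mu1_T >= 2`, and it tends to 1/4 (the half-by-half value) as the mean copy number grows.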
Population averaging over the age structure. All the mathematical results obtained so far referred to a single cell or to a cell lineage. In order to obtain the protein copy number probability density function $p_a(x)$ for the proliferating cell population, we must average $p(x|\tau)$ or its moments $\mu_r(\tau)$ over the cell age probability density function (population age structure) $\varphi_a(\tau)$ [11–13, 15]. We assume here that the environmental conditions are constant, the population has reached the state of balanced growth and its age structure is stationary: $\varphi_a(t,\tau) = \varphi_a(\tau)$. (The time independence of the population age structure $\varphi_a(\tau)$ is neither guaranteed nor obvious. Also, the convergence of $\varphi_a(t,\tau)$ to the stationary age distribution $\varphi_a(\tau)$ may be nontrivial [25, 26].) For any function $f(\tau)$ we introduce the following notation for the population average over the cell age probability density function: $\langle f(\tau)\rangle_a \equiv \int_0^T f(\tau)\,\varphi_a(\tau)\,d\tau$ (21). The population average $\langle f(\tau)\rangle_a$ (21) should not be confused with averages over a sub-population of cells of the same age or with the time average of $f(\tau)$ over a single cell cycle, $\overline{f(\tau)} \equiv \frac{1}{T}\int_0^T f(\tau)\,d\tau$. Only for the homogeneous age structure, $\varphi_a(\tau) = 1/T$, do the two averages coincide. For an exponentially growing population in the state of balanced growth, $\varphi_a(\tau)$ is given by [11–13, 15, 27] $\varphi_a(\tau) = \frac{2\ln 2}{T}\,2^{-\tau/T}$ (22). Note that in order to describe the gating procedure (the selection of cells with similar cell age or size) it is sufficient within the present approach to consider an appropriately modified age structure for the whole population: the domain of $\varphi_a(\tau)$ given by Eq. (22), i.e., the interval $[0, T]$, has to be restricted to some narrower age range. The averaging procedure defined by Eq. (21) is not the most general way to obtain the quantities referring to the whole cell population from those referring to a single cell line.
A more general approach would involve other model parameters being random variables (e.g., cell volume, cell cycle length), so that the age structure φ a (τ ) or the Master equation for protein levels would be derived 'from scratch' from the time evolution of these variables [28][29][30] .
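The difference between the population average and the time average can be made concrete with a short numerical sketch. It assumes the standard age density of an exponentially growing population with a fixed generation time, $\varphi_a(\tau) = (2\ln 2/T)\,2^{-\tau/T}$; the function names and the midpoint-rule integration are illustrative choices, not part of the paper.

```python
import math


def age_density(tau: float, T: float = 1.0) -> float:
    """Assumed stationary age density for balanced exponential growth."""
    return (2.0 * math.log(2.0) / T) * 2.0 ** (-tau / T)


def population_average(f, T: float = 1.0, n: int = 20_000) -> float:
    """Midpoint-rule estimate of the integral of f(tau) * age_density(tau) over [0, T]."""
    h = T / n
    return sum(f((i + 0.5) * h) * age_density((i + 0.5) * h, T) for i in range(n)) * h


def time_average(f, T: float = 1.0, n: int = 20_000) -> float:
    """Midpoint-rule estimate of (1/T) times the integral of f(tau) over [0, T]."""
    h = T / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h / T


# Newborn cells are twice as frequent as cells about to divide.
print(age_density(0.0) / age_density(1.0))

# The mean cell age in the population is below the mid-cycle value T/2.
print(population_average(lambda tau: tau), time_average(lambda tau: tau))
```

Restricting the integration range to a narrower age window (and renormalising the density) would mimic the gating procedure mentioned above.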
Scientific Reports | (2020) 10:13533 | https://doi.org/10.1038/s41598-020-69217-2 | www.nature.com/scientificreports/
Moments of the protein copy number distribution after integration over the age structure. In order to get the total variance of the protein copy number, i.e., the variance of $p_a(x) \equiv \langle p(x|\tau)\rangle_a$ referring to the whole cell population, we need the moments of the protein copy number distribution $p(x|\tau)$ to be integrated over the age structure according to Eq. (21): $\mu_{1a} = \kappa_{1a} \equiv \langle\mu_1(\tau)\rangle_a$ and $\mu_{2a} \equiv \langle\mu_2(\tau)\rangle_a$. We obtain Eq. (23), where $h_1(M_2)$, $h_2(M_2)$ are given by (14) and $M_2$ is given by (19). The last term of (23), $\mathrm{var}_a[\mu_1(\tau)] = \langle\mu_1^2(\tau)\rangle_a - \langle\mu_1(\tau)\rangle_a^2$, is the variance of the mean protein copy number $\mu_1(\tau)$ (16) computed with respect to $\varphi_a(\tau)$ using (21). This follows from the law of total variance [15], with $\varphi_a(\tau)$ playing the role of a mixing distribution and $p(x|\tau)$ being the conditional distribution. Note that, due to the boundary conditions (17), we have $\mathrm{var}_a[\mu_1(\tau)] > 0$.
Protein concentration. Until now, we have been considering cellular protein levels in terms of the molecule copy number $x$. Here, we convert the moments of the protein copy number probability density function $p(x|\tau)$ into the moments of the probability density function $\tilde{p}(\tilde{x}|\tau)$ for the protein concentration $\tilde{x}$.
The growing and dividing cell changes its volume, which leads to the following relationship between the protein copy number $x(\tau)$ and the protein concentration $\tilde{x}(\tau)$ during the cell cycle: $\tilde{x}(\tau) = x(\tau)/V(\tau)$, where $V(\tau)$ denotes the volume of a cell of age $\tau \in (0, T)$ and we assume that $V(T) = 2V(0)$.
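We have $p(x|\tau)\,dx = \tilde{p}(\tilde{x}|\tau)\,d\tilde{x}$, hence the probability density functions for protein copy number and protein concentration scale as $\tilde{p}(\tilde{x}|\tau) = V(\tau)\,p(V(\tau)\tilde{x}|\tau)$ (25), and analogously does the concentration burst size probability density function $\tilde{\nu}(\tilde{u}|\tau)$. The relationship between the moments of $\tilde{p}(\tilde{x}|\tau)$ and those of $p(x|\tau)$ reads $\tilde{\mu}_r(\tau) = \mu_r(\tau)/V^r(\tau)$. Mean protein concentration does not change during cell division, yet individual cells of the same age $\tau$ differ with respect to the protein concentration $\tilde{x}(\tau)$. The variance of the protein concentration $\tilde{x}$ computed for the whole proliferating cell population is given by Eq. (28). There, $\tilde{\mu}_{1a} = \langle\tilde{\mu}_1(\tau)\rangle_a$ and $\tilde{\mu}_{2a} = \langle\tilde{\mu}_2(\tau)\rangle_a$ are the first and second moments of $\tilde{p}_a(\tilde{x}) \equiv \langle\tilde{p}(\tilde{x}|\tau)\rangle_a$ (29). Clearly, $\tilde{\mu}_{1a}$ and $\tilde{\mu}_{2a}$ are also moments of $\tilde{p}(\tilde{x}|\tau)$ (25) averaged with the population age structure $\varphi_a(\tau)$ using Eq. (21). Note that Eq. (28) follows from the law of total variance, as in the case of protein copy number and Eq. (23).
Note also that $\mathrm{var}_a[\tilde{\mu}_1(\tau)] = 0$ is possible when the mean protein concentration does not depend on cell age (i.e., if the mean protein copy number is proportional to the cell volume). This is in contrast to the case of the protein copy number, where $\mathrm{var}_a[\mu_1(\tau)] = 0$ is impossible due to the boundary conditions (17).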
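For reference, the decomposition behind the law of total variance can be written out in the notation of this section. This is the generic textbook form, with $\tilde{\kappa}_2(\tau) \equiv \tilde{\mu}_2(\tau) - \tilde{\mu}_1^2(\tau)$ denoting the concentration variance among cells of age $\tau$; it is a sketch of the structure of the result and does not reproduce the explicit expression of the paper in terms of $h_1$, $h_2$, $I_r$ and $J_r(\tau)$.

```latex
\begin{aligned}
\tilde{\kappa}_{2a}
  &= \tilde{\mu}_{2a} - \tilde{\mu}_{1a}^{2} \\
  &= \big\langle \tilde{\mu}_2(\tau) - \tilde{\mu}_1^{2}(\tau) \big\rangle_a
     + \big\langle \tilde{\mu}_1^{2}(\tau) \big\rangle_a
     - \big\langle \tilde{\mu}_1(\tau) \big\rangle_a^{2} \\
  &= \big\langle \tilde{\kappa}_2(\tau) \big\rangle_a
     + \mathrm{var}_a\!\left[ \tilde{\mu}_1(\tau) \right]
\end{aligned}
```

The first term collects the noise within each age class, and the second term is the contribution of the age dependence of the mean, which is the part that survives as a noise floor when the first term becomes negligible.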
Distribution of burst sizes. Translational bursting was directly observed by the Xie group in the production of reporter proteins under the control of a repressed lac promoter in E. coli. The distributions of protein burst sizes were exponential [31–33]. In this subsection, we assume a more general form of the burst size probability density function, which also includes the exponential one. From now on, we consider the $\tau$-independent burst size probability density function of the form $\nu(u) = b^{-1}\,y(u/b)$ (31), where $b = m_1$ is the mean burst size and $m_2 = C_2 b^2$. (For the exponential probability density function, $y(z) = \exp(-z)$ and $C_2 = 2$.) Then Eq. (19), which links the moments of the probability density functions for protein partitioning and for the burst sizes, takes the form of Eq. (32). Note that $M_2 \le 1/2$ as long as $2C_2 b + 3\mu_1(T) \ge 2$, which is the case for all but very small $b$ and $\mu_1(T)$, where the continuous approximation to the discrete protein copy number used here breaks down anyway. Eq. (32) will be needed for the derivation of the coefficient of variation of protein concentration, under the assumption that the burst size probability density function has the form (31), compatible with the exponential probability density function but not limited to it.
Cell volume growth. In accordance with the experimental findings [34, 35] we assume that the cell volume $V$ grows exponentially, at a constant growth rate, from the volume $V_0$ of a newborn cell (Eq. (33)). We ignore the stochastic spread of $T$, of the growth rate and of $V_0$ [25, 36], hence the cell volume exactly doubles during the cell cycle: $V(T) = 2V(0) = 2V_0$, the growth rate equals $\ln(2)/T$, and $V(\tau) = V_0\,2^{\tau/T}$. However, even if the probability distributions of $V_0$, $V(T)$ and the growth rate are not very broad for bacteria [36–38], such an assumption is reasonable only as a first approximation.
Effective protein copy number. In Ref. 1, the authors have shown how the coefficient of variation of gene expression in E. coli scales with the mean protein level. The abundance of a fluorescent protein fusion produced from a given gene in a single cell was normalized by the volume of each individual cell to get the protein concentration. However, the final results have been presented in Ref. 1 as the effective protein copy number, i.e., concentration multiplied by the average volume of cells in the population: $x^\star = \tilde{x}\,\langle V\rangle_a$ (34), where $\langle V\rangle_a = \langle V(\tau)\rangle_a$ is the population-averaged cell volume.
From (34) it follows that the probability density function for the effective protein copy number $x^\star$ in the cell population is given by $p^\star_a(x^\star) = \langle V\rangle_a^{-1}\,\tilde{p}_a(x^\star/\langle V\rangle_a)$, where $\tilde{p}_a(\tilde{x})$ is given by (29). The moments $\mu^\star_{ra}$ of $p^\star_a(x^\star)$ (for the effective protein copy number) depend on the moments $\tilde{\mu}_{ra}$ of $\tilde{p}_a(\tilde{x})$ (for the protein concentration) in the following manner: $\mu^\star_{ra} = \langle V\rangle_a^{r}\,\tilde{\mu}_{ra}$ (36). In consequence, the protein concentration noise $c_{va}^2 = \tilde{\kappa}_{2a}/\tilde{\mu}_{1a}^2$ is not affected by the change of variables $\tilde{x} \to x^\star$: $(c^\star_{va})^2 = \kappa^\star_{2a}/(\mu^\star_{1a})^2 = c_{va}^2$ (37).
Cell cycle dependent transcription rate. To show the dependence of protein noise on the timing of protein production, we consider a transcription rate which is nonzero only during a fraction of the cell cycle: $k(\tau) = k_t$ for $\alpha T \le \tau \le \beta T$ (38), and zero otherwise, with $0 \le \alpha < \beta \le 1$. Note that $k(\tau)$ can be arbitrary. In fact, the abrupt change in protein production as given by (38) is not very realistic, but it allows us to study the influence of protein production variability during the cell cycle on the protein noise in an idealized and somewhat extreme case. For $k(\tau)$ given by (38), the quantity $I_r$ defined by Eq. (15) is given by Eq. (39). The parameter $k_t T$ appearing there is the mean number of protein bursts per cell cycle if $\alpha = 0$ and $\beta = 1$; in the general case, the mean number of protein bursts per cell cycle is equal to $(\beta - \alpha)\,k_t T$, but in what follows, for simplicity, we still refer to $k_t T$ as the 'mean number of bursts per cycle'. If $k(\tau)$ is given by (38), then the mean effective protein copy number in the proliferating cell population, $\mu^\star_{1a}$ (36), is given by Eq. (40). It is also convenient to define auxiliary functions, which will be used in the next section.

Results
Using the model of stochastic gene expression in dividing cells described in the previous section, we calculate the coefficient of variation of protein concentration, $c_{va}^2 = (c^\star_{va})^2$, which measures the protein noise (Eq. 37). We want to see how it depends on the mean protein abundance $\mu^\star_{1a}$. In order to compare the model predictions with the experimental data of Ref. 1, it is convenient to consider two extreme cases: frequency modulation (FM) and amplitude modulation (AM). In FM, the mean size $b$ of translational bursts is constant in Eq. (40), so that the mean protein level $\mu^\star_{1a}$ can be changed only by changing the mean burst frequency $k_t$, or more generally by changing the mean number of bursts per cycle, $k_t T$. The corresponding expression for the protein noise, a function of $\mu^\star_{1a}$ and $b$, is given by Eq. (43) below. In contrast, for AM, the mean burst size $b$ varies and the mean number of bursts per cycle is constant. The expression for the protein noise in that case, a function of $\mu^\star_{1a}$ and $k_t T$, is given by Eq. (46) below.
Protein noise as dependent on the mean number of bursts per cell cycle and mean burst size. The coefficient of variation of the protein concentration may also be written as explicitly dependent on both the mean number of bursts per cell cycle, $k_t T$, and the mean burst size $b$, but not on the effective mean protein copy number $\mu^\star_{1a}$ (Eq. (47), Fig. 1). Black lines in Fig. 1 mark the levels of constant mean protein abundance (mean effective protein copy number) $\mu^\star_{1a}$, given by Eq. (40). $\mu^\star_{1a}$ may be varied by moving across these levels along some path, which needs to be found experimentally as a dependence between the mean burst size $b$ and the mean number of bursts per cycle $k_t T$. The two simplest paths, $b = \mathrm{const}$ (FM) and $k_t T = \mathrm{const}$ (AM), have been proposed in the subsections above.
Deterministic protein production. In order to quantify the contributions to protein noise, it is useful to compare the predictions of the present model with the predictions of a similar model in which protein production is deterministic. If the protein production is not treated as stochastic but is described by a deterministic source with intensity $\sigma(t) \ge 0$, then, instead of Eq. (1), we have $\partial_t p(x,t) = -\partial_x[\sigma(t)\,p(x,t)]$ (48). (Compare Eq. (7).) Thus, all predictions of the model with deterministic protein production can be obtained from its stochastic counterpart by putting $\sigma(\tau) = k(\tau)\,m_1(\tau)$ and $m_r = 0$ for $r > 1$, hence $J_2(\tau) = I_2 = 0$ in (28) or $C_2 = 0$ in (43) and (44). Note that now we have only a single function $\sigma(\tau)$ describing protein production instead of the two independent functions, $k(\tau)$ and $\nu(u|\tau)$, for the stochastic case. Due to the condition $p(0,t) = 0$, the model with deterministic protein production is not sufficient to describe the system of interest if the protein abundance is low.
Comparison with experimental results. In this subsection, we compare the values of the coefficient of variation of protein concentration predicted by our model with the experimental data of Ref. 1, to see under what conditions our model reproduces the measured scaling relation of protein noise vs. mean protein abundance.
Both extreme cases of the coefficient of variation in our model, the FM expression (43) and the AM expression (46), have the 'boomerang' shape, in qualitative agreement with the data of Ref. 1 (see Fig. 2). However, neither the FM expression (43) with $b = \mathrm{const}$ nor the AM expression (46) with $k_t T = \mathrm{const}$ should be used for fitting to experimental data; these two functions are just cross-sections of the general expression (47) along a fixed value of $b$ (or $k_t T$), where the value of the non-fixed parameter, $k_t T$ (or $b$), is expressed by $\mu^\star_{1a}$ according to the constraint (40). For unambiguous fitting, one would need to additionally introduce an experimentally based dependence of $b$ or $k_t T$ on $\mu^\star_{1a}$, i.e., to define the cross-section path through the surface (47) (Fig. 1) as a function of $k_t T$ and $b$. For this reason, we say that the agreement of the model with the data of Ref. 1 is semi-quantitative: one can always define a constraint between $k_t T$ and $b$ such that the resulting cross-section path will fit the data. One such constraint may be $k_t T = \mathrm{const}$, as discussed below. In general, it seems reasonable that for small protein abundances (small $\mu^\star_{1a}$), when $b$ cannot be too small, $\mu^\star_{1a}$ changes mainly due to varying $k_t T$ (FM). In Fig. 1, that would correspond to the increase in the mean protein abundance $\mu^\star_{1a}$ by moving down the coefficient of variation plot along the magenta line, $b = \mathrm{const}$. We can also expect that, for highly expressed genes (large $\mu^\star_{1a}$), the values of $k_t T$ saturate (AM) due to some physical upper bound for $k_t T$. That would correspond to the transition from the magenta line $b = \mathrm{const}$ to the green line $k_t T = \mathrm{const}$ in Fig. 1 in order to further increase the mean protein abundance $\mu^\star_{1a}$. In Ref. 1 (Fig. 5A therein), the experimental data were effectively divided into two such regimes. However, a clear distinction between the FM and AM regimes when comparing our model to the data is possible only if the cell-age dependence of the protein production is identical for all genes, which may be unrealistic: $k(\tau)$ and $\nu(u|\tau)$ should be ascribed individually to each gene.
Therefore, the apparent good agreement of the theoretical curves in Fig. 2B with the data of Ref. 1 should be treated with caution. In the naive interpretation, the curves show that translational bursts from most genes have approximately the same mean number of several bursts per cell cycle and that the gene expression levels vary only due to the variation of mean burst sizes (AM). However, as discussed above, this picture is too simple. Firstly, the most interesting curved part of the noise vs. mean plot in Fig. 2B falls, at least partly, in the range $b < 1$, and such small mean burst sizes seem to be unphysical. Secondly, $k(\tau)$ is here assumed constant for each gene, which may not be true.
In the FM regime, the noise floor level $F(\alpha, \beta)$ is given by Eq. (49) (Fig. 2A). But if there exists an upper bound for $k_t T$, in particular in the AM regime, where $k_t T = \mathrm{const}$, then there is an additional contribution to the noise floor, given by the 2nd term of Eq. (47) and coming from the 1st and 3rd terms in Eqs. (43) or (46), which may substantially enhance the noise floor level. This contribution, proportional to $C_2$, is due to the burst size distribution $\nu(u)$ (note that the squared coefficient of variation of $\nu(u)$ is equal to $C_2 - 1$, and thus $C_2$ is a measure of the burst size distribution's width).
Comparing the cases of stochastic ($C_2 > 0$) and deterministic ($C_2 = 0$) protein production, we find the contribution of the burst size distribution's width to the total protein noise. The noise floor level is an increasing function of $C_2$ in the AM regime (Eq. 46). In the FM regime, the difference between stochastic and deterministic protein production is pronounced only for low and intermediate protein abundances, and the noise floor depends neither on $C_2$ nor on $b$ (Fig. 2A, Eq. (43)). The noise floor levels in the AM regime, where they depend most strongly on $k_t T$, give interesting information about the lowest possible protein production rates $k_t$ for highly expressed genes: the noise floor level of the AM curve (46) for $k_t T = 1$ lies well above most of the data (Fig. 2B), and therefore the burst frequency of highly expressed genes should be at least several bursts per cell cycle (depending on the cell cycle length $T$). This is equivalent to the AM regime in Ref. 1 (Fig. 5A therein). However, that bound should be even higher if the transcription is cell-cycle dependent, $k(\tau) \ne \mathrm{const}$ (Fig. 2B, blue line; to avoid obscuring the plot, we plotted only an example line for $k_t T = 5$ and protein production during 0.6 of the cell cycle; the noise floor levels for $k_t T = 1$ and $k_t T = 10$ increase by a similar proportion), or if other extrinsic noise contributions (not included in our model) are present. The noise floor level may be set by a limit on the mean number of protein bursts per cell cycle, $k_t T$: at that limit, any further increase in gene expression is obtained by increasing the burst size $b$ (Fig. 2B). If, however, $k_t T$ were unlimited, then the noise floor given by (49) would fall well below the level observed in the experiment of Ref. 1, even for the extremely short period of gene expression, $\epsilon = 0.1$ of the cell cycle (green line in Fig. 2A).
Thus, in the FM regime, transcription during only a part of the cell cycle does not seem to realistically increase the noise floor up to the experimentally measured level (Fig. 2A). Note that the contribution from that effect is additive and the vertical scale for the coefficient of variation in the plots in Fig. 2 is logarithmic. The additive increase due to the limitation of gene expression to a part of the cell cycle is thus more visible for low noise but becomes small for the experimentally measured noise levels. We can see this in the AM regime, where the noise floor is defined by a constant mean number of bursts per cell cycle, so that it can match the levels observed in experiment: the additional limitation of gene expression to $\epsilon = 0.6$ of the cell cycle only slightly increases the noise floor level (Fig. 2B, blue line vs. rainbow line for $k_t T = 5$). In our model, the minimal noise floor level is $\sim 3 \times 10^{-4}$ (Fig. 2A, FM): for $\alpha = 0$, $\beta = 1$ (protein production is constant during the entire cell cycle) we obtain the minimum of $F(\alpha, \beta)$ (49). The corresponding copy number noise floor is equal to $1 - 2(\ln 2)^2 \approx 0.0391$ [15]. The fact that protein concentration noise is one or two orders of magnitude smaller than the corresponding protein copy number noise was also pointed out in Ref. 11.
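The two floor values quoted above can be checked with a few lines of numerics. The sketch below keeps only the deterministic, age-dependent part of the noise and relies on three assumptions stated here rather than quoted from the paper's equations: constant protein production, so that the mean copy number grows linearly from $\mu_1(0)$ to $2\mu_1(0)$; exponential volume growth with doubling over one cycle; and the exponential age density $2\ln 2 \cdot 2^{-u}$ with $u = \tau/T$. Function names are illustrative.

```python
import math

LN2 = math.log(2.0)


def age_density(u: float) -> float:
    """Assumed age density of a balanced, exponentially growing population (u = tau/T)."""
    return 2.0 * LN2 * 2.0 ** (-u)


def cv2_over_age(f, n: int = 20_000) -> float:
    """Squared coefficient of variation of f(u) with respect to the age density."""
    h = 1.0 / n
    w0 = m1 = m2 = 0.0
    for i in range(n):
        u = (i + 0.5) * h          # midpoint rule on [0, 1]
        w = age_density(u) * h
        w0 += w
        m1 += f(u) * w
        m2 += f(u) ** 2 * w
    mean = m1 / w0
    return (m2 / w0 - mean**2) / mean**2


def mean_copy_number(u: float) -> float:
    """Mean copy number in units of mu_1(0): linear growth, doubling at u = 1."""
    return 1.0 + u


def mean_concentration(u: float) -> float:
    """Mean copy number divided by the volume V(u)/V_0 = 2**u."""
    return (1.0 + u) * 2.0 ** (-u)


print("copy number floor  :", cv2_over_age(mean_copy_number))
print("analytic 1-2(ln2)^2:", 1.0 - 2.0 * LN2**2)
print("concentration floor:", cv2_over_age(mean_concentration))
```

Under these assumptions the copy number floor agrees with $1 - 2(\ln 2)^2$, while the concentration floor comes out roughly two orders of magnitude lower, because linear copy number growth and exponential volume growth nearly compensate each other.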
Random protein partitioning at cell division is the cause of the 'boomerang' shape of the noise vs. mean plot in the AM regime. For the deterministic 'half-by-half' partitioning with $\eta(q) = \delta(q - 1/2)$ and $M_2 = 1/4$, the plot in the AM regime is flat (Fig. 2D). In the FM regime, the plot has the 'boomerang' shape even for the half-by-half partitioning, and the contribution of random partitioning to noise is small (Fig. 2C). In both AM and FM regimes, the noise floor level is not affected by the random partitioning. This is because $M_2(\mu^\star_{1a}) \to 1/4$ for large $\mu^\star_{1a}$, i.e., $M_2$ approaches its value for the deterministic half-by-half partitioning.

Discussion
We have proposed a model of gene expression in a population of dividing cells which reproduces in a semi-quantitative manner the experimental data of Ref. 1. In particular, our model predicts the existence of the noise floor, i.e., the absolute lower bound for protein noise. Within our model, there are three factors contributing to the noise floor: (i) cell volume and mean protein number may increase asynchronously, which leads to variation of the mean protein concentration during the cell cycle, (ii) transcription may take place during a fraction of the cell cycle and (iii) a physical limitation may be imposed on the mean number of bursts per cell cycle. Although (ii) contributes to (i), we will discuss it separately. Both cell volume growth and mean protein copy number growth are purely periodic and thus deterministic in our model, and so is the mean protein concentration calculated with respect to the sub-population of cells of the same age. Consequently, the lack of synchronicity between the time evolution of cell volume and of mean protein copy number is also of purely deterministic character. This is evident if we note that an identical dependence of the mean protein concentration, and thus the same contribution to the noise floor, appears in the corresponding model with protein production being deterministic instead of stochastic. For that reason, the term 'noise' may be slightly misleading in the case of (i). This is analogous to the following situation: one can calculate the variance of a purely deterministic periodic signal, e.g., a sinusoidal one, but the non-zero variance does not mean that the signal has any random component. The degree of randomness of such a signal can be measured by calculating its time correlation, if time-dependent data are available. Without the knowledge of the time correlation, just looking at the squared coefficient of variation vs. mean plot of gene expression, one may see an apparent 'noise floor' that is the effect of an extrinsic periodic deterministic signal.
More realistically, the effect of such a signal may occur as a contribution to the actual noise floor 11 .
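The sinusoid analogy can be made explicit with a minimal sketch (an illustration added here, with hypothetical helper names): a uniformly sampled sine wave has variance 1/2, yet its autocorrelation at a lag of half a period is exactly -1, which reveals that the signal is fully predictable.

```python
import math


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def circular_autocorrelation(values, lag):
    """Normalised autocorrelation of a periodic sequence at the given lag."""
    n = len(values)
    mean = sum(values) / n
    cov = sum((values[i] - mean) * (values[(i + lag) % n] - mean) for i in range(n)) / n
    return cov / variance(values)


n = 10_000
signal = [math.sin(2.0 * math.pi * i / n) for i in range(n)]  # one full period

print("variance:", variance(signal))
print("autocorrelation at half a period:", circular_autocorrelation(signal, n // 2))
```

A snapshot of noise vs. mean corresponds to the first number only; the second number requires time-resolved data, which is why a deterministic periodic contribution can appear as part of a 'noise floor' in population snapshots.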
However, in order to obtain the protein noise floor for the whole cell population in our model, we have to calculate the variance of the mean protein concentration with respect to the population age structure (probability distribution of cell age or generation time). For that reason, the stochastic character of (i) is related to the stochastic character of the population age structure. And therefore, (i) is a consequence of both the fact that not all cells are of the same age and that the mean protein concentration μ 1 (τ ) varies with cell age τ : The oscillations of the mean protein concentration in consecutive cell cycles occur when protein production does not keep up with or exceeds the cell volume growth. This is already possible for a constant transcription rate but when transcription is limited to a part of the cell cycle (ii), the noise floor level may increase even by 2 orders of magnitude ( Fig. 2A).
Protein noise can be plotted as a function of mean protein abundance, $\mu^\star_{1a}$, after defining how the mean burst size $b$ or the mean number of bursts per cycle $(\beta - \alpha)\,k_t T$ depends on $\mu^\star_{1a}$, with the extreme cases being the frequency modulation (FM, Fig. 2A,C) and amplitude modulation (AM, Fig. 2B,D).
The curved shape of the noise vs. mean plot for low protein abundances (tending to $\sim 1/\mu^\star_{1a}$, the Poissonian limit) in the FM regime is due to the burst-like protein production, and it occurs even for deterministic and equal protein partitioning at cell division; random partitioning contributes weakly to the noise for realistic $b$ values. For AM, the Poissonian limit is due to the random protein partitioning between daughter cells at cell division and it disappears when partitioning is deterministic and equal. For AM, the noise floor level depends on $k_t T$ (the mean number of bursts per cycle for the case of $\beta - \alpha = 1$, i.e., for constant protein production taking place during the entire cell cycle) but it is also finite for $k_t T = \infty$ (deterministic case). Since the experimentally observed noise floor level is $\sim 10^{-1}$, the contribution to it coming solely from the age structure (i) and cell-cycle dependent gene expression (ii) seems to be very small compared to the contributions of the limitation on $k_t T$ (iii) (AM, Fig. 2B) or to the contributions of other possible sources of extrinsic noise not included in the model (e.g., generation time variability [28, 29, 36] or cell growth rate variability [36]).
Protein noise is often decomposed into extrinsic and intrinsic contributions [39, 40]. What is the character of each of the three noise sources (i-iii) considered here?
Within the present approach, cell volume growth and population age structure depend neither on the protein copy number nor on the kinetic parameters describing gene expression. Hence, the cell-cycle dependent variation of mean protein concentration due to asynchronous increases in cell volume and in mean protein number (i) is an extrinsic contribution to protein noise. Now consider the effect of transcription during only a part of the cell cycle in each cell generation (ii). Gene regulation, which leads to cell-cycle dependent gene expression, is extrinsic with respect to the gene of interest and deterministic in our model. But protein noise, which occurs when gene expression is enabled, is intrinsic. These notions are to be understood in analogy to those used in the classical works which disentangle extrinsic and intrinsic contributions to gene expression noise by means of the two-reporter assay [40], where the regulator noise is considered extrinsic.
However, note that there are no correlations assumed a priori between different genes within our model, although such correlations might be present in a cell when a group of cell-cycle-dependent genes is expressed during the same part of cell cycle because of a common cell-cycle-dependent regulator. Such correlations may also occur due to the competition for polymerases (ribosomes) between different genes (transcripts). But in our model we treat each gene (each data point in Fig. 2) as independent, and possibly independently regulated by cell-cycle-dependent factors: For each gene on the plot, the cell-age-dependent transcription rate and the cell-age-dependent burst size distribution may be different. Thus, if we plot a theoretical curve corresponding to gene expression during a fraction of cell cycle against the experimental data (Fig. 2), it does not mean that all data points falling on the curve are the genes expressed during that fraction of cell cycle. It is possible that many theoretical curves can be drawn across the same data point, as corresponding to gene expression during different fractions of cell cycle (or as corresponding to gene expression with different values of other parameters). This shows that our model cannot be used for fitting the data without additional information that would remove ambiguity. The necessary information includes: (a) Experimental dependence linking mean expression level with both mean size and mean frequency of protein bursts. (b) Dependence of transcription rate on cell-cycle age. (c) Dependence of the burst size distribution on cell-cycle age.
Note also that protein production limited to a fraction of the cell cycle (ii) enhances asynchrony between cell volume and mean protein copy number and therefore it contributes to (i).
Finally, consider any limitation imposed on the mean number of protein bursts per cell cycle (iii). If such limitation is due to the limitations imposed on transcription rate, this should be treated as an intrinsic contribution because it depends solely on the parameters describing the gene of interest. Obviously, the contribution (iii) does not appear in the corresponding model where gene expression is a deterministic process. In the deterministic approach, we have a single protein production rate parameter σ describing gene expression instead of two independent parameters (transcription rate k t and mean burst size b) for the stochastic approach. (For simplicity we refer to a situation when gene expression is time-independent). Without these two parameters it is impossible to fit the stochastic model to experimental data, even semi-quantitatively.
If the number of protein bursts per cycle is small, the contribution to noise (iii) is much larger than both (i) and (ii), but if bursts are small and frequent then (iii) either alone or with (i) and (ii) is too small to explain the observed noise floor level.
In summary, our model includes some of the factors contributing to protein noise in gene expression and to the noise floor in particular. Although the model is sufficient for obtaining the functional dependences between the mean protein abundance and noise which apparently fit the experimental data of Ref. 1 , it does not take into account some important contributions to protein noise like stochastic spread of cell volume at birth, cycle length or growth rate of individual cells. As these noise sources are of extrinsic character, the protein noise floor is likely to be of mostly extrinsic origin, too. Still, we show that the sources of protein noise included in our model suffice to obtain the noise floor, and we quantify their contributions to protein noise.