Central dogma rates and the trade-off between precision and economy

Steady-state protein abundance is set by four rates: transcription, translation, mRNA decay and protein decay. A given protein abundance can be obtained from infinitely many combinations of these rates. This raises the question of whether the natural rates for each gene result from historical accidents, or whether there are rules that give certain combinations a selective advantage. We address this question using high-throughput measurements in rapidly growing cells from diverse organisms, and find that about half of the rate combinations do not exist: genes that combine high transcription with low translation are strongly depleted. This depletion is due to a trade-off between precision and economy: high transcription decreases stochastic fluctuations but increases transcription costs. Our theory quantitatively explains which rate combinations are missing, and predicts the curvature of the fitness function for each gene. It may guide the design of gene circuits with desired expression levels and noise.


Introduction

Increasing transcription at constant protein abundance increases transcriptional cost while decreasing stochastic fluctuations
Because biochemical constraints do not seem to explain the lack of genes combining high transcription with low translation, we asked whether evolutionary trade-offs might explain it.
One could hypothesize that cells avoid combining high transcription with low translation in order to minimize the cost of mRNA synthesis. In S. cerevisiae growing in rich medium, the fitness cost of mRNA is $c_m \sim 10^{-9}$ per transcribed nucleotide (Methods) 31. Synthesizing a non-beneficial mRNA of length $l_m$ leads to a growth rate penalty $\Delta f_m$ that is linear in the transcription rate 31, $\Delta f_m = c_m l_m \beta_m$. For a typical mRNA of length $l_m = 1300$ nucleotides transcribed at a rate $\beta_m = 30$ mRNA/h, the fitness cost of transcription is thus $c_m \beta_m l_m \approx 4 \times 10^{-5}$/h (Fig. 3A, Methods), which is selectable 31,82. The cost of mRNA is also selectable in E. coli 24,44.
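The cost estimate above is a single product, $\Delta f_m = c_m l_m \beta_m$. As a minimal sketch, the helper below (its name, `transcription_cost`, is hypothetical) reproduces the arithmetic using only the values quoted in the text.

```python
def transcription_cost(c_m: float, l_m: float, beta_m: float) -> float:
    """Growth-rate penalty (per hour) of transcribing a non-beneficial mRNA.

    c_m: fitness cost per transcribed nucleotide
    l_m: (pre-)mRNA length in nucleotides
    beta_m: transcription rate in mRNAs per hour
    """
    return c_m * l_m * beta_m


# Values quoted in the text for a typical S. cerevisiae mRNA in rich medium.
cost = transcription_cost(c_m=1e-9, l_m=1300, beta_m=30)
print(f"{cost:.1e} per hour")  # prints 3.9e-05 per hour
```

The result, $3.9 \times 10^{-5}$/h, is the value rounded to $4 \times 10^{-5}$/h in the text.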
In addition to their cost, high transcription rates also have the benefit of reducing noise 5,45,52,54.
Increasing the transcription rate while keeping protein abundance fixed should therefore decrease stochastic fluctuations in protein abundance.
To test whether this prediction holds genome-wide across the diversity of chromosomal contexts and promoters, we use measurements of cell-to-cell variation in protein abundance in S. cerevisiae 50 and E. coli 76. Cell-to-cell variation in protein abundance can be quantified by the coefficient of variation (CV), the ratio of the standard deviation to the mean. We determine contours of the CV as a function of transcription and translation rate using Gaussian smoothing (Methods), and compare these to contours of protein abundance. Both in S. cerevisiae (Fig. 3B) and E. coli (Fig. S3A), the CV decreases with increasing transcription and decreasing translation on each equi-protein line. The CV mainly scales with transcription, as predicted by theory 54 (Methods).
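The Gaussian smoothing step can be sketched as a kernel-weighted average of per-gene CVs evaluated on a grid of $\log_{10}\beta_m$ and $\log_{10}\beta_p$. This is a minimal sketch assuming a Nadaraya-Watson estimator with an isotropic Gaussian kernel; the function name `smoothed_cv_grid`, the bandwidth of 0.3 $\log_{10}$ units and the toy data are hypothetical choices, not the settings of the Methods.

```python
import numpy as np


def smoothed_cv_grid(log_bm, log_bp, cv, grid_bm, grid_bp, bandwidth=0.3):
    """Gaussian-kernel (Nadaraya-Watson) average of per-gene CV on a grid.

    log_bm, log_bp: log10 transcription and translation rates, one per gene
    cv: coefficient of variation (std / mean), one per gene
    grid_bm, grid_bp: 1D arrays of log10 rates defining the evaluation grid
    bandwidth: kernel width in log10 units (illustrative default)
    """
    gx, gy = np.meshgrid(grid_bm, grid_bp, indexing="ij")
    # Squared distance from every grid point to every gene: (n_bm, n_bp, n_genes)
    d2 = (gx[..., None] - log_bm) ** 2 + (gy[..., None] - log_bp) ** 2
    w = np.exp(-d2 / (2 * bandwidth**2))
    return (w * cv).sum(axis=-1) / w.sum(axis=-1)


# Synthetic toy data in which CV falls with transcription (not real measurements).
rng = np.random.default_rng(0)
log_bm = rng.uniform(-1, 2, 500)
log_bp = rng.uniform(1, 4, 500)
cv = np.sqrt(1.0 / 10**log_bm + 0.1**2)
grid = smoothed_cv_grid(log_bm, log_bp, cv,
                        np.linspace(-1, 2, 7), np.linspace(1, 4, 7))
```

The smoothed grid can then be contoured alongside equi-protein lines, which in log-log coordinates are lines where $\log_{10}\beta_m + \log_{10}\beta_p$ is constant.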
Hence, transcription and translation rates impact both gene expression precision and mRNA economy.
At a given protein abundance, high translation / transcription ratios lead to economy but also higher gene expression noise, whereas low translation / transcription ratios yield high precision at the expense of higher mRNA cost (Fig 3C). The lack of genes combining high transcription and low translation could be explained by this trade-off: for genes located in the depleted region, the benefits of increased precision may be smaller than transcriptional costs.

The precision-economy trade-off and the noise floor explain the depleted region of the Crick space
To quantitatively test whether a trade-off between precision and economy can explain the depleted region, we developed a minimal mathematical model of the fitness cost and benefit of transcription and translation (Fig. 4). The model has two main predictions. First, the optimal ratio between translation and transcription rates $\beta_p/\beta_m$ is set by the ratio of the transcription cost per mRNA molecule C to the gene's sensitivity to noise Q (defined below). Second, the model provides an analytical formula, based on fundamental parameters, for the boundary of the depleted region, namely the lower bound k on the $\beta_p/\beta_m$ ratio.
To determine the optimal $\beta_p/\beta_m$ ratio under the precision-economy trade-off, we first model how the $\beta_p/\beta_m$ ratio affects mRNA economy and precision. We then model how mRNA economy and precision affect fitness. Finally, we determine an analytical expression for the optimal $\beta_p/\beta_m$ ratio.
To compute how the $\beta_p/\beta_m$ ratio affects economy, we model the fitness cost of transcription 31 by the linear function $\Delta f_m = c_m l_m \beta_m$, where $l_m$ is the (pre-)mRNA length and $c_m$ is the fitness penalty per transcribed nucleotide. $c_m$ can be estimated from the growth rate $\mu$ and the total transcriptional output $\sum \beta_m$ if we assume that non-beneficial mRNAs are transcribed at the expense of beneficial mRNAs. If a total of $\sum \beta_m l_m$ nucleotides are transcribed in a cell, the average fitness contribution of each nucleotide is $\mu / \sum \beta_m l_m$. This is also the fitness lost per nucleotide when a non-beneficial mRNA is transcribed at the expense of a beneficial mRNA. Thus, the fitness cost per transcribed nucleotide is

$$c_m = \frac{\mu}{\sum \beta_m l_m}.$$

This gives $c_m \sim 10^{-9}$ in S. cerevisiae, in good agreement with the experimental measurements mentioned above (Methods).
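The estimate of $c_m$ divides the growth rate by the total transcribed nucleotide flux. A minimal sketch, with a hypothetical helper `cost_per_nucleotide` and toy two-gene numbers chosen only to make the arithmetic easy to check (they are not measurements):

```python
import numpy as np


def cost_per_nucleotide(growth_rate, beta_m, l_m):
    """c_m = mu / sum_i(beta_m,i * l_m,i).

    growth_rate: mu, per hour
    beta_m: per-gene transcription rates (mRNAs per hour)
    l_m: per-gene (pre-)mRNA lengths (nucleotides)
    """
    flux = np.sum(np.asarray(beta_m, dtype=float) * np.asarray(l_m, dtype=float))
    return growth_rate / flux  # fitness per transcribed nucleotide


# Toy "cell" with two genes (hypothetical numbers): flux = 7e5 nt/h.
c_m = cost_per_nucleotide(0.5, beta_m=[100, 300], l_m=[1000, 2000])
```

With genome-wide rates and lengths in place of the toy arrays, the same division yields the order of magnitude quoted in the text.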
To see how $\beta_p/\beta_m$ and precision affect fitness, we consider a protein of abundance p. The protein contributes a quantity $f(p)$ to the organism's fitness (Fig. 4A). Because protein abundance fluctuates around the average expression $\langle p \rangle = p^*$, the cell does not experience the maximum fitness $f_{max}$ but rather a lower average fitness $\langle f(p) \rangle$. The fitness lost due to stochastic fluctuations in protein abundance, $\Delta f_{noise}$, is called the noise load 32,81 (Fig. 4A). By expanding the fitness function f to second order around its peak and averaging over fluctuations in protein abundance, we can compute the noise load:

$$\Delta f_{noise} = f_{max} - \langle f(p) \rangle \approx \frac{1}{2}\,|f''(p^*)|\,\sigma^2.$$

Noise load increases with the curvature of the fitness function (large $|f''(p^*)|$) and with noise (large $\sigma$).
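To see how good the second-order approximation is, one can compare it with a case that has a closed form. This sketch assumes a hypothetical Gaussian fitness peak $f(p) = f_{max} \exp(-(p - p^*)^2 / 2w^2)$ and normally distributed fluctuations of standard deviation $\sigma$; then $f''(p^*) = -f_{max}/w^2$ and the exact load is $f_{max}(1 - w/\sqrt{w^2 + \sigma^2})$. The peak width and noise level are illustrative only.

```python
import math


def noise_load_second_order(curvature: float, sigma: float) -> float:
    """Second-order noise load: 0.5 * |f''(p*)| * sigma**2."""
    return 0.5 * abs(curvature) * sigma**2


def noise_load_gaussian_exact(f_max: float, width: float, sigma: float) -> float:
    """Exact load f_max - <f(p)> for a Gaussian fitness peak of the given width
    when p is normally distributed around the peak with standard deviation sigma."""
    return f_max * (1.0 - width / math.sqrt(width**2 + sigma**2))


f_max, width, sigma = 1.0, 100.0, 10.0   # hypothetical peak and noise level
curvature = -f_max / width**2            # f''(p*) of the Gaussian peak
approx = noise_load_second_order(curvature, sigma)
exact = noise_load_gaussian_exact(f_max, width, sigma)
```

For noise that is small compared with the peak width (here $\sigma/w = 0.1$), the two values agree to within about 1%.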
To find how $\beta_m$ and $\beta_p$ affect fitness through the precision of gene expression, we note that $\beta_m$ and $\beta_p$ affect the noise level $\sigma^2$ in a well-characterized way. Theory and experiments 5,50,54,76 indicate that the variance of protein abundance is given by

$$\sigma^2 \approx p^{*2}\left(\frac{\alpha_p}{\beta_m} + c_{v0}^2\right),$$

where $\alpha_p$ is the protein decay rate and $c_{v0}$ is the noise floor (Methods). The noise floor is the minimal amount of cell-to-cell variation in protein abundance in clonal populations 13,50,71,76.
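The dependence of the variance on $\beta_m$ alone can be made plausible with a short derivation. This sketch assumes the standard two-stage (mRNA, protein) model with first-order decay, $\alpha_p \ll \alpha_m$ and many proteins per mRNA lifetime ($\beta_p/\alpha_m \gg 1$), and it adds the noise floor as an independent extrinsic term; it is an illustration under those assumptions, not the Methods derivation itself.

```latex
\begin{aligned}
\frac{\sigma_{\mathrm{int}}^2}{p^*} &= 1 + \frac{\beta_p/\alpha_m}{1 + \alpha_p/\alpha_m},
  \qquad p^* = \frac{\beta_m \beta_p}{\alpha_m \alpha_p} \\
\frac{\sigma_{\mathrm{int}}^2}{p^{*2}} &\approx \frac{\beta_p}{\alpha_m\, p^*}
  = \frac{\alpha_p}{\beta_m}
  \qquad (\alpha_p \ll \alpha_m,\ \beta_p/\alpha_m \gg 1) \\
\sigma^2 &\approx p^{*2}\left(\frac{\alpha_p}{\beta_m} + c_{v0}^2\right)
\end{aligned}
```

The middle line shows why the CV mainly scales with transcription: at fixed $p^*$, the translation rate cancels out of the intrinsic term.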
We can now determine the optimal transcription and translation rates $\beta_m$ and $\beta_p$ that minimize the combination of the transcription cost $\Delta f_m$ and of the noise load $\Delta f_{noise}$ at fixed $p^*$ (Methods):

$$\frac{\beta_p}{\beta_m} = \frac{C}{Q}, \tag{4}$$

where $C = c_m l_m \alpha_m = c_m l_m \beta_m / m$ quantifies the cost of transcription per mRNA molecule, and $Q = \frac{1}{2}|f''(p^*)|\,p^*$ is the gene's sensitivity to noise. Genes with narrower fitness functions (larger $|f''(p^*)|$) are more sensitive to noise because the noise kicks protein abundance farther from the optimum. The sensitivity of a gene to noise also depends on protein abundance $p^*$ because the noise generally scales with protein abundance, $\sigma^2 \sim p^*$.
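The optimum can be checked numerically: at fixed $p^*$, scan $\beta_m$, set $\beta_p = p^* \alpha_m \alpha_p / \beta_m$ so that abundance stays constant, and minimize transcription cost plus noise load. This is a sketch assuming the variance expression $\sigma^2 \approx p^{*2}(\alpha_p/\beta_m + c_{v0}^2)$; all parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def total_cost(beta_m, p_star, alpha_p, c_m, l_m, curvature, cv0):
    """Transcription cost plus noise load at fixed mean protein abundance."""
    variance = p_star**2 * (alpha_p / beta_m + cv0**2)
    return c_m * l_m * beta_m + 0.5 * abs(curvature) * variance


# Hypothetical gene and cell parameters (illustrative only).
p_star, alpha_m, alpha_p = 1e4, 5.0, 0.5     # proteins, 1/h, 1/h
c_m, l_m, curvature, cv0 = 1e-9, 1500.0, -1e-12, 0.1

beta_m = np.logspace(-2, 4, 100_001)         # candidate transcription rates
costs = total_cost(beta_m, p_star, alpha_p, c_m, l_m, curvature, cv0)
best_m = beta_m[np.argmin(costs)]
best_p = p_star * alpha_m * alpha_p / best_m  # translation rate keeping p* fixed

C = c_m * l_m * alpha_m
Q = 0.5 * abs(curvature) * p_star
print(best_p / best_m, C / Q)  # both close to 1500
```

The noise-floor term does not depend on $\beta_m$, so it shifts the cost but not the location of the optimum.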
Although the cost of translation typically dominates transcriptional cost 31,44,82 , the β p /β m ratio does not depend on translation cost. This is because translation costs are determined by the total amount of protein translated, regardless of whether the protein is synthesized from few or many mRNAs.
The model therefore predicts a relationship between $\beta_p/\beta_m$ ratios and the shape of fitness functions (Fig. 4B). Genes with broad fitness functions should have large $\beta_p/\beta_m$ because they are not sensitive to noise. For those, high precision provides little benefit, and it is best to maximize economy by lowering transcription. On the other hand, genes with narrow fitness functions are sensitive to noise.
For those, requirements of high precision to keep the noise load low dominate the cost of transcription.
Genes with narrow fitness functions should therefore have small $\beta_p/\beta_m$ ratios.
We compared the model prediction for the curvature of the fitness function near its peak to recent measurements of fitness functions of 21 genes in S. cerevisiae 34. We find that the curvatures predicted from $\beta_p/\beta_m$ are within an order of magnitude of the measured curvatures without any fitting parameters (Fig. S4A). Predictions and measurements correlate positively (r = 0.39). A shuffling test suggests that the agreement between the predictions and measurements is unlikely to be due to chance (p = 0.04, Methods).
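Inverting Eq. 4 with $Q = \frac{1}{2}|f''(p^*)|\,p^*$ gives a predicted curvature $|f''(p^*)| = 2C/((\beta_p/\beta_m)\,p^*)$, which can then be compared with measured curvatures. The sketch below pairs this with a generic shuffling test; the log10 scale, the Pearson correlation and the one-sided p-value are assumptions, and the function names are hypothetical rather than the procedure of the Methods.

```python
import numpy as np


def predicted_curvature(ratio, p_star, C):
    """|f''(p*)| implied by Eq. 4: beta_p/beta_m = C/Q with Q = |f''| p*/2."""
    return 2.0 * C / (np.asarray(ratio) * np.asarray(p_star))


def shuffle_test(predicted, measured, n_perm=10_000, seed=0):
    """Pearson r on log10 values and a one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    x = np.log10(predicted)
    y = np.log10(measured)
    r_obs = np.corrcoef(x, y)[0, 1]
    # Shuffling the measurements destroys any gene-to-gene pairing.
    null = np.array([np.corrcoef(x, rng.permutation(y))[0, 1]
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= r_obs)) / (1 + n_perm)
    return r_obs, p_value
```

Adding one to both numerator and denominator keeps the estimated p-value above zero when no shuffled dataset beats the observed correlation.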
However, variability in the experimental measurements precludes a conclusive comparison.
To find a lower bound k on $\beta_p/\beta_m$ and explain the boundary of the depleted region, we note that there is a limit to how small the noise in gene expression can be. This limit, called the 'noise floor' $c_{v0}$, is revealed by measurements of cell-to-cell variation of protein abundance in clonal populations 13,50,71,76 (Fig. 4C, Fig. S4B-C). The cause of the noise floor is an open research question 60; it has been proposed to stem from extrinsic noise 76 or from larger transcriptional burst sizes of high-abundance proteins 13. The noise floor puts an upper bound $Q_{max}$ on how noise-sensitive genes can be: for genes with fitness functions narrower than this limit ($Q > Q_{max}$), the noise load exceeds the benefit of expressing the gene (Fig. 4D), leading to negative fitness. Because genes with $Q > Q_{max}$ cannot be selected for, all endogenous genes must satisfy $Q < Q_{max}$. The boundary of the depleted region, $\beta_p/\beta_m = k$, hence corresponds to genes with the highest sensitivity to noise, $Q_{max}$ (Fig. 4E). The depleted region $\beta_p/\beta_m < k$ corresponds to transcription and translation rates that would be optimal only for genes whose fitness functions are too narrow given the noise floor.
To find k, we substitute $Q = Q_{max}$ in Eq. 4 for the optimal $\beta_p/\beta_m$ ratio and rewrite k in terms of the noise floor and other cell biology constants (Methods), which gives Eq. 5, where $\sum \beta_m$ is the combined transcriptional output of all genes. This expression provides an intuition for the cellular parameters that set the boundary of the depleted region of the Crick space. For example, a larger noise floor $c_{v0}$ raises the boundary because there is less benefit in having high transcription. An increased transcriptional output $\sum \beta_m$ lowers the boundary because individual mRNAs are less costly, allowing increased precision in gene expression.
This formula for k accurately predicts the boundary of the depleted region of the Crick space (Fig. 4F; red lines in Fig. 2) despite the fact that k varies by nearly two orders of magnitude between organisms (Table 1). Thus, the depleted region can be explained in terms of fundamental parameters such as the noise floor, maximal translation rate, total transcription output and mean protein decay rate. Individual cellular constants alone cannot accurately predict k (Fig. S4D). Nor can k be predicted from noise alone without considering economy, for example by hypothesizing that the depleted region consists of all $\beta_m$ and $\beta_p$ for which increasing transcription provides little extra precision relative to the noise floor (Methods).

Discussion
We find that the distribution of genes in Crick space is not random: genes combining high transcription and low translation are depleted. Such combinations of high transcription and low translation can be achieved with synthetic gene constructs 37 . Therefore, mechanistic constraints cannot explain this depletion. We explain the depletion by a trade-off between precision and economy: increasing transcription at constant protein abundance diminishes stochastic fluctuations, but at a fitness penalty due to the cost of transcription.
High transcription rates are therefore optimal for genes that are sensitive to noise whereas low transcription rates are well suited for genes that can tolerate high noise (Fig. 5A) regulatory mechanism couples GCN4 synthesis to translation stress. It also bypasses transcription, which could allow for rapid upregulation. Such considerations of regulatory couplings or speed might overshadow precision-economy-based limits for certain genes.
The distribution of genes in Crick space is bounded above by the maximal translation rate ($10^{3.6}$ to $10^{4}$ proteins h$^{-1}$), and below by the boundary of the depleted region. The position of each gene in this space is determined, in the present picture, by the curvature of its fitness function. Genes with a narrow fitness function are most sensitive to noise, and are predicted to lie near the boundary of the depleted region.
Genes with a broader fitness function are predicted to lie farther above this boundary. This prediction suggests an experimental test by measuring the curvature of the fitness function and comparing to the prediction. Accurate measurements of fitness functions can be performed by titrating protein concentration experimentally and measuring fitness. A recent experiment measured such fitness functions for 85 genes in S. cerevisiae 34 . While we found a statistical agreement between the predicted and measured curvatures of fitness functions, the measurement errors were too large to permit a meaningful comparison to the present predictions (Fig. S4A). Further experiments to measure fitness functions can test whether transcription and translation rates predict fitness curvature near the peak.
Beyond optimizing transcription and translation under a precision-economy trade-off, an additional reason why genes may lie farther above the boundary of the depleted region is the possibility that noise is beneficial for some genes. This occurs for example in cases of bet hedging where a gene product brings little or no fitness advantage at present, but expression is maintained in case conditions change so that the gene product becomes important. In such cases, theory and experiments have shown that a wide cell-cell variation in protein level can be beneficial 19,38,69,87 . Such genes expressed for possible future needs are predicted to lie far above the boundary of the depleted region. This prediction is in agreement with the finding of stress genes at relatively high positions above the boundary, and of essential genes closer to the boundary ( Fig. S5A-B, Table S2, Supplementary Item 1).
The theory has further testable experimental predictions. By expressing a protein from different synthetically produced combinations of translation and transcription rates, one should find that there is an optimal translation / transcription ratio. Upon changing growth conditions such that the protein becomes even more important for growth, the optimal transcription rate should increase whereas the optimal translation rate should show little change.
The finding that essential, high-precision genes are located close to, but not below, the boundary of the depleted region has implications for synthetic circuit design. If a protein needs to have a specific abundance for the circuit to function properly, that protein should be expressed using a promoter and ribosomal binding site (RBS) that put it close to the boundary of the depleted region. In other words, the optimal design should have $\beta_p/\beta_m = k$ with a value of k appropriate to the organism (Table 1). On the other hand, if the circuit is insensitive to the exact concentration of that protein, the protein should be expressed using a weaker promoter and a stronger RBS to save transcriptional resources. Combinations of strong transcription (promoters, enhancers) with weak translation (RBS and so on) should be avoided because they incur high transcriptional cost with no extra precision benefit.
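For a target abundance, the design rule $\beta_p/\beta_m = k$ together with the steady-state relation $p = \beta_m \beta_p / (\alpha_m \alpha_p)$ fixes both rates: $\beta_m = \sqrt{p\,\alpha_m \alpha_p / k}$ and $\beta_p = k \beta_m$. The helpers below are a hypothetical sketch of that arithmetic; the value $k = 100$ is a placeholder, not a value from Table 1, and real promoters and RBSs only approximate target rates.

```python
import math


def design_rates(p_target, alpha_m, alpha_p, k, beta_p_max=None):
    """Transcription and translation rates placing a gene on beta_p/beta_m = k.

    Uses the steady state p = beta_m * beta_p / (alpha_m * alpha_p).
    """
    beta_m = math.sqrt(p_target * alpha_m * alpha_p / k)
    beta_p = k * beta_m
    if beta_p_max is not None and beta_p > beta_p_max:
        raise ValueError("required translation rate exceeds beta_p_max")
    return beta_m, beta_p


def in_depleted_region(beta_m, beta_p, k):
    """True if the rate combination lies below the boundary beta_p/beta_m = k."""
    return beta_p / beta_m < k


# Hypothetical values: k = 100 is a placeholder, not a value from Table 1.
beta_m, beta_p = design_rates(p_target=1000, alpha_m=10, alpha_p=1, k=100)
```

A noise-insensitive protein would instead be placed farther above the boundary, at a larger $\beta_p/\beta_m$, which the same steady-state relation converts into lower transcription.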
The present approach can also help interpret the mode of regulation when the abundance of a protein needs to change. Increasing a protein level can be done by increasing transcription, translation or both.
Studies in several organisms indicate that transcription regulation is more prevalent and stronger than translation regulation for most genes 15,30,41-43. The present theory provides a possible explanation for this observation (Fig. 5B). Transcriptional up-regulation increases protein abundance and at the same time decreases noise, whereas translational up-regulation increases noise. Thus transcription control is advantageous assuming that precision is desirable. The relatively rare cases of strong translation regulation may be due to considerations of faster response time, or to cases where it is beneficial to reduce precision, such as in bet hedging 38. One interesting case is when proteins need to be up-regulated from a very low to a very high level. Geometric considerations rule out purely transcriptional regulation, because this would put the gene into the depleted region; instead, a combined transcription and translation up-regulation is predicted (Fig. 5C).

Transcription and translation rates were estimated from ribosome profiling and mRNA sequencing data. The top percentile of translation rates ($\beta_p^{max}$) is represented as a horizontal dashed line. The observed boundary of the depleted region (diagonal dashed line) has slope 1 and is such that 99% of the genes have a larger translation / transcription ratio. Excluding 1% of genes in this way makes the boundary line less sensitive to measurement errors and to outlier genes, some of which are highlighted. The predicted boundary of the depleted region (red line) is according to the theory introduced later in this article. Technical constraints explain the absence of genes at low transcription and translation rates (region marked 'not sampled'). S. cerevisiae data (A) from Weinberg et al. 84. M. musculus (B) and H. sapiens (C) data from Eichhorn et al. 16. E. coli data (D) from Li et al. 41. E. Transcription and translation rates of 3744 E. coli genes (gray dots) and of 7624 synthetic constructs (red dots) of Kosuri et al. 37. The apparent negative correlation between transcription and translation rates in this dataset is due to limits in the linear range of flow cytometry measurements, which lead to censoring of low and high abundance proteins 37. F.

The noise load $\Delta f_{noise}$ is the loss of fitness due to stochastic fluctuations in protein abundance. B. The optimal $\beta_p/\beta_m$ depends on the transcription cost per mRNA (C) and on how noise sensitive ($Q \sim |f''(p^*)|$) the gene is. Noise-sensitive genes have narrower fitness functions ($|f''(p^*)|$ large). For a given $p^*$, higher transcription $\beta_m$ decreases fluctuations in protein abundance. Genes that are sensitive to noise (Q large) should thus have a low translation / transcription ratio. On the other hand, precision is less critical for genes with flat fitness functions (Q small). These genes should have higher $\beta_p/\beta_m$ to keep transcription costs low. C. The precision of gene expression is limited by the noise floor $c_{v0}$. Protein abundance and CV data re-plotted from Taniguchi et al. 76. The noise floor is also found in the E. coli measurements of Silander et al. 71 (Fig. S4B). D. Genes with fitness functions narrower than the noise floor $c_{v0}$ have negative average fitness ($\langle f \rangle < 0$). Genes with negative fitness are not selectable. Thus, the noise floor $c_{v0}$ prevents the selection of narrow, noise-sensitive fitness functions. E. The maximal selectable noise sensitivity $Q_{max}$ determines the position k of the boundary of the depleted region. Proteins lying on the boundary have maximal noise sensitivity $Q = Q_{max}$. Feasible combinations of transcription and translation correspond to genes that are less sensitive to noise, $Q < Q_{max}$. F. The precision-economy trade-off theory predicts the position k of the depleted region in organisms from bacteria to mammals. Error bars represent 95% confidence intervals. See also Figure S4.

A. Due to the precision-economy trade-off, low $\beta_p/\beta_m$ is preferred for genes that are sensitive to noise. In log-log scale, the lowest $\beta_p/\beta_m$ ratios lie along a diagonal line. On the other hand, high $\beta_p/\beta_m$ is preferred for proteins that tolerate noise. Because translation rates have an upper limit $\beta_p^{max}$, genes with the highest $\beta_p/\beta_m$ are found on a horizontal line. B. Regulatory strategies that lead to the same protein abundance differ in how they impact precision. Up-regulating transcription simultaneously increases protein abundance and precision. On the other hand, up-regulating translation increases protein abundance while decreasing precision. Transcription control is thus advantageous assuming that precision is desirable. C. When protein abundance changes by a large amount, pure transcription regulation can put the gene in the sub-optimal, depleted region. This can be avoided by co-regulating transcription and translation. See also Figure S5.

Table 1. The intercept k of the boundary of the depleted region varies over two orders of magnitude across the studied organisms. k can be predicted from the maximal translation rate $\beta_p^{max}$ and the noise floor $c_{v0}$ using Eq. 5. The measured k is defined by having 99% of genes with $\beta_p/\beta_m > k$, with error bars from bootstrapping. Error bars indicate the standard error.

Supplementary discussion
This study is based on an optimality approach to evolutionary biology 53 whose aim is to explain adaptations found in living organisms in terms of selective forces. This approach has been used to explain, for example, bacterial growth laws 67,79 or energy landscapes in molecular recognition 63. It is a fruitful approach in the sense that it can suggest new experiments (see Discussion).
The quantitative model rests on the assumption that transcription and translation rates can be tuned independently. Although coupling is seen in bacteria 55 and eukaryotes 26 , that coupling has itself evolved rather than being an absolute constraint. Studies on synthetic promoters and ribosomal binding sites indicate that transcription and translation rates can be changed independently over a wide range 37 .
Another assumption of the theory is that cells are well adapted to the conditions in which transcription and translation were measured, namely rapid growth in rich medium. Rapid growth is thought to be a key fitness component of microorganisms like E. coli and S. cerevisiae 51,86. Rapid growth also occurs in several contexts in mammals, including immune expansion, cancer, development and stem cells in tissues with rapid turnover. The mammalian HeLa and 3T3 cell lines studied here have likely been selected for fast growth. HeLa cells were collected from cervical cancer, a condition which selects for mutations that provide a growth advantage. Following collection, HeLa cells underwent serial dilutions for 4 months 64, a procedure which selects for growth 86. In the case of 3T3 cells, embryonic fibroblasts were collected and underwent serial dilutions for 3-4 weeks 78. In the process, cell growth first collapses due to cell senescence, and then recovers to levels of freshly collected embryonic fibroblasts, presumably because immortalized mutants take over the population. Hence, both HeLa and 3T3 cells are likely well adapted to the conditions in which transcription and translation were measured, namely rapid growth in rich culture medium.

All genome mappings were performed using Bowtie2 in local alignment mode. We discarded all technical reads as well as reads that mapped to non-coding RNAs, defined as transcripts marked as 'ncRNA', 'rRNA' or 'tRNA' in the genome annotation. Remaining reads were mapped to transcripts marked as 'CDS' in the genome annotation. Ribosome profiling reads were mapped to coding transcripts after trimming the first and last 5 codons to remove the effect of translation initiation and termination. Reads that mapped equally well to multiple loci were assigned to one of the loci at random. We then computed RPKMs per gene. Because reproducibility between runs was high, we combined reads from all runs for subsequent analyses.
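For reference, RPKM normalizes the read count of each gene by its length in kilobases and by the total number of mapped reads in millions. A minimal sketch of the standard definition; the helper name `rpkm` is hypothetical, and approximating the total mapped reads by the sum of the counts passed in is an assumption (in practice the total is taken over all reads retained after the filtering described above).

```python
import numpy as np


def rpkm(read_counts, lengths_nt):
    """Reads per kilobase of transcript per million mapped reads.

    read_counts: reads assigned to each gene
    lengths_nt: gene (or trimmed CDS) lengths in nucleotides
    """
    counts = np.asarray(read_counts, dtype=float)
    lengths_kb = np.asarray(lengths_nt, dtype=float) / 1e3
    millions_mapped = counts.sum() / 1e6  # here: total of the counts provided
    return counts / lengths_kb / millions_mapped


values = rpkm([100, 300], [1000, 3000])
```

Because RPKM is length-normalized, two genes with the same number of transcripts but different lengths receive similar values, which is what allows RPKMs to be converted into copy numbers per cell.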

Data sources and estimation of cellular constants in the four model organisms
The resulting mRNA abundances and protein synthesis rates estimates were highly correlated with those

Since ribosome profiling RPKMs correlate well with protein abundance 41, we neglect protein degradation ($\alpha_{deg} = 0$, $\alpha_p = \mu$). The median mRNA half-life is 2.8 minutes 11, which corresponds to a decay rate of $\alpha_m = \ln 2 / t_{1/2} \approx 15\,h^{-1}$. Ref. 21 found a median mRNA decay rate of $\alpha_m = 0.14\,h^{-1}$ in 3T3 cells, which is the value we used here. Another study 66, also in 3T3 cells, measured a median mRNA decay rate of $\alpha_m = 0.08\,h^{-1}$, a value for which the predicted boundary is also in good agreement with rate measurements (Fig. S2I). To our knowledge, the noise floor $c_{v0}$ has not been measured in mouse. We therefore used the noise floor from the closest organism in evolutionary terms, namely H. sapiens. or doubling $N_m$ ($112500 < N_m < 450000$, see Fig. S2J-K). Gregersen et al. 25 measured mRNA half-lives in (human) HEK293 cells and found a median half-life of 11.4 h, which is the value we used here ($\alpha_m = 0.06\,h^{-1}$). Another study measured a median half-life of 5 h in human B-cells (BL41) 21, a value for which the predicted boundary of the depleted region is also in good agreement with experimental data (Fig. 2L). Dar et al. 13 found a noise floor $c_{v0} \approx 0.3$.
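Half-lives and decay rates used above are related by first-order decay, $\alpha_m = \ln 2 / t_{1/2}$. A minimal sketch (the helper name is hypothetical) that converts the half-lives quoted in this section into decay rates per hour:

```python
import math


def decay_rate_per_hour(half_life_hours: float) -> float:
    """First-order decay rate alpha = ln(2) / t_half."""
    return math.log(2) / half_life_hours


for label, t_half in [("HEK293, 11.4 h", 11.4), ("BL41, 5 h", 5.0),
                      ("2.8 min", 2.8 / 60)]:
    print(f"{label}: alpha_m = {decay_rate_per_hour(t_half):.3g} per hour")
```

This gives about $0.06\,h^{-1}$ for 11.4 h, $0.14\,h^{-1}$ for 5 h and about $15\,h^{-1}$ for 2.8 minutes.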

Estimating transcription and translation rates from high-throughput experiments
For each gene i, we estimated the number of mRNAs per cell $m_i$ from the total number of mRNAs per cell $N_m$ and the per-gene mRNAseq RPKM $r_i$ as

$$m_i = N_m \frac{r_i}{\sum_j r_j}.$$

At steady state, mRNA abundance m is the ratio of the transcription rate $\beta_m$ to the mRNA decay rate $\alpha_m$ 27.
We thus estimated the transcription rate of each gene as

$$\beta_{m,i} = m_i\, \alpha_m,$$

where $\alpha_m$ is the median mRNA decay rate (Table S1). Finally, we estimated translation rates $\beta_p$ by combining three numbers: the total number of proteins per cell $N_p$, the protein decay rate $\alpha_p$, and the ribosome profiling RPKM $s_i$ of gene i. The number of proteins synthesized per time unit is $N_p \alpha_p$.
A fraction $s_i / \sum_j s_j$ of this protein synthesis flux is translated from the mRNAs $m_i$ of gene i. We estimate the translation rate $\beta_{p,i}$ (expressed per mRNA copy per cell) by dividing the protein synthesis of each gene by the mRNA copy number $m_i$:

$$\beta_{p,i} = \frac{N_p\, \alpha_p}{m_i}\,\frac{s_i}{\sum_j s_j}.$$

Estimates of $N_p$ and $\alpha_p$ are provided in Table S1.

position of genes in 2D Crick space
To evaluate the effect of this simplification, we plot genes in 2D Crick space taking gene-specific mRNA and protein decay rates into account (Fig. S1E). We then assign the same median mRNA and protein decay rates to all genes to re-estimate transcription and translation rates (Fig. S1F). The gene positions in the two resulting 2D Crick spaces differ by less than 0.3 (root mean square deviation in log 10 rates), which is 10% of the total variation (about 3 log 10 units in transcription and translation rates). We conclude that taking into account gene-specific mRNA and protein decay rates has only a small impact on the position of genes in 2D Crick space and thus on present conclusions.
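The comparison between the two sets of gene positions reduces to a root mean square deviation in $\log_{10}$ units. This sketch assumes the deviation is the per-gene Euclidean distance in the 2D $(\log_{10}\beta_m, \log_{10}\beta_p)$ plane, averaged in quadrature over genes; averaging per coordinate instead would give a value smaller by a factor $\sqrt{2}$. The helper name is hypothetical.

```python
import numpy as np


def rmsd_log10(positions_a, positions_b):
    """Root mean square 2D distance between gene positions in log10 units.

    positions_a, positions_b: arrays of shape (genes, 2) holding
    (beta_m, beta_p) for each gene under the two estimation schemes.
    """
    a = np.log10(np.asarray(positions_a, dtype=float))
    b = np.log10(np.asarray(positions_b, dtype=float))
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


# Two toy genes: one unchanged, one shifted by one log10 unit in beta_m.
d = rmsd_log10([[10, 100], [1, 1]], [[10, 100], [10, 1]])
```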

Sequencing data processing
We consider only genes for which the measurement error was small enough to allow accurate estimation of transcription and translation rates. Accurate estimation of these rates is difficult for low-abundance mRNAs because they may only collect a handful of reads. This leads to a large uncertainty on the mRNA copy number m, and thus on the transcription rate $\beta_m = m\,\alpha_m$. How many reads per gene should we require to be confident about our estimates of transcription rates?
Estimates of mRNA abundance m scale with the number of reads n mapping to a given gene (relative to the gene length). We thus compute the minimum number of reads per gene needed to keep the sampling noise on $\log_{10}$ mRNA abundance below a certain threshold $\varepsilon$:

$$\log_{10}\frac{n + \sigma}{n} < \varepsilon, \tag{9}$$

where $\sigma$ is the standard deviation on n due to sampling error. We model sequencing as a Poisson process, and thus $\sigma = \sqrt{n}$. Substituting this expression for $\sigma$ into Eq. 9, we compute the minimal number of reads necessary to control for a given error $\varepsilon$ on $\log_{10}$ mRNA abundances:

$$n > \frac{1}{\left(10^{\varepsilon} - 1\right)^{2}}.$$

A minimum of 10 reads per mRNA is needed to keep the sampling error on $\log_{10}$ transcription rates in the ±0.1 range (Fig. S2M). We therefore discard genes with fewer than 10 reads, a procedure which keeps the sampling error low while keeping as many genes as possible in the analysis. Similarly, we require at least 20 ribosome profiling reads per gene. We applied the same criteria to all four organisms.
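The read-count criterion can be evaluated directly. A minimal sketch with hypothetical helper names: `log10_sampling_error` is the left-hand side of Eq. 9 with Poisson $\sigma = \sqrt{n}$, and `min_reads` is the smallest integer satisfying the bound for a given $\varepsilon$.

```python
import math


def log10_sampling_error(n_reads: float) -> float:
    """log10((n + sigma) / n) with Poisson sigma = sqrt(n)."""
    return math.log10(1 + 1 / math.sqrt(n_reads))


def min_reads(eps: float) -> int:
    """Smallest integer n with n >= 1 / (10**eps - 1)**2."""
    return math.ceil(1 / (10**eps - 1) ** 2)


print(round(log10_sampling_error(10), 3))   # prints 0.119
print(min_reads(0.1))                       # prints 15
```

For example, n = 10 reads gives an error of about 0.12 $\log_{10}$ units, while a strict bound of $\varepsilon = 0.1$ requires at least 15 reads.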
We repeated the analyses keeping only genes with at least 100 ribosome profiling reads and reached the same conclusions as those presented in the article.
In M. musculus and H. sapiens, we discarded canonical histone genes from the analysis because their mRNAs lack a polyA-tail. The polyA+ selection step of mRNAseq discriminates against these mRNAs.
Consequently, the abundance of canonical histone mRNAs is underestimated by mRNAseq RPKMs, leading to aberrant (high) translation rate estimates.

Data and Software Availability
We deposited the raw data (RPKMs from RNAseq and ribosome profiling) from which we estimated transcription and translation rates at this address: https://data.mendeley.com/datasets/2vbrg3w4p3/draft?a=955cbbdf-9f26-4fbb-970b-e6b4081c1f3e (estimated transcription and translation rates are also found at that address). In addition, downloadable tables contain extra fields (such as the coefficient of variation of protein abundance fluctuations) needed to reproduce Fig. 3B and 4C.

Estimating maximal translation rates
To estimate the maximal translation rate $\beta_p^{max}$, we ask how fast proteins can be translated from a single mRNA in the limit where translation initiation is no longer limiting. In this regime, ribosomes follow each other closely along the mRNA, and a given ribosome needs to move forward before the next one can advance.
The speed at which ribosomes elongate the peptide chain and the number of codons each ribosome occupies on the mRNA determine how fast proteins can be synthesized. In E. coli, translation rates can reach up to $10^4$ proteins per hour (Fig. 2). While the size of prokaryotic ribosomal footprints has not been determined, prokaryotes have smaller ribosomes (21 nm, BNID102320) than eukaryotes (26.5 nm, BNID111542), so their ribosomal footprints should be smaller. Assuming that ribosomal footprints are proportional to the size of the ribosome, we estimate that prokaryotic ribosomes cover 22 nt, or L = 7.3 codons. In favorable conditions, E. coli can elongate up to v = 21 amino acids per second (BNID100059). This leads to a maximal translation rate of $\beta_p^{max} = v/L \approx 10^4$ proteins per hour.
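When ribosomes are packed end to end, one protein is completed each time the ribosome queue advances by one footprint, so $\beta_p^{max} = v/L$. A minimal sketch of this arithmetic using the footprint (22 nt) and elongation speed (21 amino acids per second) quoted above; the helper name is hypothetical.

```python
def max_translation_rate(v_aa_per_s: float, footprint_nt: float) -> float:
    """Proteins per hour from one mRNA when ribosomes are packed end to end.

    A new ribosome can start only after the previous one has moved one
    footprint (footprint_nt / 3 codons) along the mRNA.
    """
    footprint_codons = footprint_nt / 3
    return v_aa_per_s / footprint_codons * 3600


rate = max_translation_rate(v_aa_per_s=21, footprint_nt=22)
print(f"{rate:.0f} proteins per hour")  # prints 10309 proteins per hour
```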

Assessing the statistical depletion of genes combining high transcription with low translation
To test for a statistically significant depletion of genes combining high transcription with low translation, we used a randomization strategy.
First, we defined the depleted region as the sub-region of the Crick space lying below a line of slope 1 that has 1% of the genes below it. We then asked if this figure of 1% was high or low compared to chance.
To find out, we randomized transcription and translation rates. Because the main goal of gene expression is to express proteins at the right abundance, we required that randomized datasets have the same distribution of protein abundance as the original dataset. We also enforced the observed upper bound $\beta_p^{max}$ on translation rates. To do so, we randomly sampled a protein abundance p and a translation rate $\beta_p$ for each gene. We then computed the corresponding transcription rates $\beta_m = p\,\alpha_m \alpha_p / \beta_p$, with $\alpha_m$ and $\alpha_p$ the mRNA and protein decay rates reported in Table S1. Finally, we determined what fraction of genes in the randomized dataset lay below the line of slope 1 that leaves 1% of the genes of the original dataset below it.
We repeated the procedure $10^4$ times to determine the distribution of the fraction of genes in the depleted region expected by chance. We finally estimated the p-value that genes avoid combining high transcription with low translation from the fraction of randomized datasets with more genes in the depleted region than the original dataset (the p-value being one minus this fraction).
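The randomization can be sketched as follows. The boundary k is the 1% quantile of $\beta_p/\beta_m$ in the original data; randomized datasets pair protein abundances and translation rates drawn independently from their observed distributions, so the abundance distribution is preserved and no translation rate exceeds the observed maximum. Resampling with replacement from the empirical distributions is an assumption about how the random draws were made, and the function names are hypothetical.

```python
import numpy as np


def depleted_fraction(beta_m, beta_p, k):
    """Fraction of genes with beta_p / beta_m below the boundary k."""
    return float(np.mean(np.asarray(beta_p) / np.asarray(beta_m) < k))


def randomization_test(p, beta_p, alpha_m, alpha_p, n_rand=10_000,
                       quantile=0.01, seed=0):
    """Boundary k, observed depleted fraction and an empirical p-value."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    beta_p = np.asarray(beta_p, dtype=float)
    beta_m = p * alpha_m * alpha_p / beta_p
    # Line of slope 1 in log-log space leaving `quantile` of the genes below it.
    k = np.quantile(beta_p / beta_m, quantile)
    observed = depleted_fraction(beta_m, beta_p, k)

    null = np.empty(n_rand)
    for i in range(n_rand):
        # Draw p and beta_p independently from their observed distributions;
        # drawing observed beta_p values keeps them below the observed maximum.
        p_r = rng.choice(p, size=p.size, replace=True)
        bp_r = rng.choice(beta_p, size=p.size, replace=True)
        bm_r = p_r * alpha_m * alpha_p / bp_r
        null[i] = depleted_fraction(bm_r, bp_r, k)

    # One-sided: how often does chance leave as few genes below k?
    p_value = (1 + np.sum(null <= observed)) / (1 + n_rand)
    return k, observed, p_value
```

A small p-value means that randomized datasets almost always place more genes in the depleted region than observed.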

Estimating transcription and translation rates in the synthetic gene library of Kosuri et al.
We compared the distribution of transcription and translation rates of E. coli genes to that of the synthetic constructs library of Kosuri et al. 37 . This study quantified the mRNA and protein abundance of each construct. The constructs only differed in their ribosomal binding sites and promoters. We thus assumed that they shared the same mRNA decay rates and protein decay rates. As a result, the transcription rate of each construct is proportional to mRNA abundance. Translation rates are proportional to the ratio of protein abundance to mRNA abundance.
To compare these measurements to our estimates of absolute transcription and translation rates for E. coli genes, we assumed that the strongest promoters and RBSs of Kosuri et al. 37 yielded transcription and translation rates comparable to those of E. coli's strongest promoters and RBSs. We did so by aligning the 99th percentiles of the transcription and translation rates of the synthetic constructs to those of E. coli genes.
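Because all constructs are assumed to share decay rates, the relative rates and the percentile alignment reduce to a few array operations. The sketch below is hypothetical (function names are ours) and works in $\log_{10}$ space, so the alignment is a single multiplicative factor, i.e. a shift of the whole cloud in the Crick space.

```python
import numpy as np


def kosuri_relative_rates(mrna, protein):
    """Relative rates when all constructs share mRNA and protein decay rates."""
    mrna = np.asarray(mrna, dtype=float)
    protein = np.asarray(protein, dtype=float)
    beta_m_rel = mrna            # transcription rate proportional to mRNA abundance
    beta_p_rel = protein / mrna  # translation rate proportional to protein / mRNA
    return beta_m_rel, beta_p_rel


def align_to_reference(relative_rates, reference_rates, q=99.0):
    """Rescale relative rates so that their q-th log10 percentile matches the reference."""
    relative_rates = np.asarray(relative_rates, dtype=float)
    shift = (np.percentile(np.log10(reference_rates), q)
             - np.percentile(np.log10(relative_rates), q))
    return relative_rates * 10.0 ** shift
```

Multiplying the aligned rates by 10 or 0.1 reproduces the one-unit shifts used in the robustness check.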
The conclusions of the comparison are robust to this assumption. For instance, even if we assume that the strongest Kosuri promoters and RBSs achieve transcription rates 10 times higher or lower than the strongest E. coli promoters (i.e. shifting the red cloud of Fig. 2E to the right or to the left by one unit), the high-transcription, low-translation region would still be covered by a sizable fraction of synthetic constructs.

Expression for the fitness cost of transcription
Experiments and theory suggest that the fitness cost of transcription $\Delta f_m$ scales with the transcription rate 24,31 and with (pre-)mRNA length 10,59. For a pre-mRNA of length $l_m$ and transcription rate $\beta_m$, we thus write the fitness cost of transcription as

$$\Delta f_m = c_m\, l_m\, \beta_m, \qquad (11)$$

where the constant $c_m$ rescales transcription fluxes (nucleotides per hour) into fitness units (per hour). In this section, we estimate the proportionality constant $c_m$.
To do so, we hypothesize that transcriptional resources are limiting. In this case, making one non-beneficial mRNA comes at a cost because it replaces a fitness-contributing mRNA. The average fitness contribution of a useful mRNA is $\mu/N_m$, where $\mu$ is the growth rate and $N_m = \sum \beta_m/\alpha_m$ is the total number of mRNAs per cell. Therefore, the fitness cost of making $m = \beta_m/\alpha_m$ mRNAs is

$$\Delta f_m = \mu\,\frac{m}{N_m} = \mu\,\frac{\beta_m/\alpha_m}{\sum \beta_m/\alpha_m}. \qquad (12)$$

By identifying the terms in Equations 11 and 12 (taking mRNA decay rates to be similar across genes), we see that $c_m$ can be estimated from the growth rate $\mu$, the typical pre-mRNA length $l_m$ and the total transcriptional capacity $\sum \beta_m$:

$$c_m = \frac{\mu}{l_m \sum \beta_m}. \qquad (13)$$

$c_m$ has units of nt$^{-1}$. Alternatively, $c_m$ can also be expressed per mRNA copy (which we will use in the next section):

$$c_m = \frac{\mu}{N_m}, \qquad (14)$$

where $N_m$ is the number of mRNAs per cell.
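The cost expressions can be written as small helper functions. This sketch takes the per-nucleotide constant to be $c_m = \mu/(l_m \sum\beta_m)$ and the per-copy constant to be $\mu/N_m$, as described above; the function names are illustrative, and the worked example reuses the S. cerevisiae values quoted in the main text ($c_m \sim 10^{-9}$ per nt, $l_m = 1300$ nt, $\beta_m = 30$ mRNA/h).

```python
def c_m_per_nt(growth_rate, mrna_length_nt, total_transcription):
    """Fitness cost per transcribed nucleotide: mu / (l_m * sum(beta_m))."""
    return growth_rate / (mrna_length_nt * total_transcription)


def c_m_per_copy(growth_rate, total_mrna_per_cell):
    """Fitness cost per mRNA copy: mu / N_m."""
    return growth_rate / total_mrna_per_cell


def transcription_cost(c_m, mrna_length_nt, beta_m):
    """Growth-rate penalty of a non-beneficial transcript: c_m * l_m * beta_m."""
    return c_m * mrna_length_nt * beta_m


# Worked example with the S. cerevisiae values quoted in the main text.
print(transcription_cost(1e-9, 1300, 30))  # about 3.9e-05 per hour
```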

The fitness cost of synthesizing mRNA in E. coli can be predicted from cellular constants
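In this section, we show that the expression for $c_m$ derived in the previous section predicts the fitness cost of synthesizing mRNA in E. coli. For this, we use the data of Kosuri et al. 37. Let $x_{m,p}(t)$ denote the concentration, after a growth time $t$, of clones expressing $m$ mRNAs and $p$ proteins, and let $c_p$ be the fitness cost per protein. We can normalize $x_{m,p}(t)$ to the concentration $x_{0,0}(t)$ of clones that express $m$ and $p$ at low levels and hence do not experience a growth penalty ($\mu \approx \mu_0$):

$$\log \frac{x_{m,p}(t)}{x_{0,0}(t)} = -\left(c_m m + c_p p\right) t. \qquad (17)$$

Here, we have assumed that the transformation efficiency of clones is independent of $m$ and $p$ ($x_{m,p}(0) \approx x_{0,0}(0)$).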
We estimate $x_{m,p}(t)/x_{0,0}(t)$ from the ratio between the DNA counts of each clone and the DNA counts of clones that expressed low levels of GFP (prot $< 1.5 \times 10^3$ in Table S3 of Kosuri et al. 37).
To test whether mRNA cost is selectable, we perform two linear regressions: one of $\log(x_{m,p}(t)/x_{0,0}(t))$ on $p$ alone, and one of $\log(x_{m,p}(t)/x_{0,0}(t))$ on $p$ and $m$. Using the F-test for nested linear models, we find that the squared residuals of the regression on $m$ and $p$ are significantly smaller than those of the regression on $p$ alone ($p < 10^{-15}$). Therefore, a model that accounts for both mRNA and protein cost predicts fitness significantly more accurately than a model that accounts for protein cost alone. This suggests that the cost of synthesizing mRNA is selectable in E. coli.
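The nested-model comparison can be sketched with NumPy least squares and the F distribution from SciPy (`scipy.stats.f.sf`). The sketch assumes that $\log(x_{m,p}/x_{0,0})$ is linear in $m$ and $p$ with slopes $-c_m t$ and $-c_p t$ plus an intercept; under that assumption the ratio of the two fitted slopes estimates $c_m/c_p$ without knowing $t$. The function name is hypothetical.

```python
import numpy as np
from scipy import stats


def nested_f_test(y, m, p):
    """F-test of y ~ 1 + p (reduced) against y ~ 1 + p + m (full).

    y is log(x_{m,p}(t) / x_{0,0}(t)); m and p are mRNA and protein abundances.
    Returns the F statistic, its p-value and the full-model coefficients
    ordered as (intercept, slope on p, slope on m).
    """
    y, m, p = (np.asarray(a, dtype=float) for a in (y, m, p))
    ones = np.ones_like(y)
    X_reduced = np.column_stack([ones, p])
    X_full = np.column_stack([ones, p, m])

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals = y - X @ coef
        return residuals @ residuals, coef

    rss_reduced, _ = rss(X_reduced)
    rss_full, coef_full = rss(X_full)
    df_extra = X_full.shape[1] - X_reduced.shape[1]  # one extra parameter (m)
    df_resid = len(y) - X_full.shape[1]
    f_stat = ((rss_reduced - rss_full) / df_extra) / (rss_full / df_resid)
    p_value = stats.f.sf(f_stat, df_extra, df_resid)
    return f_stat, p_value, coef_full
```

`coef[2] / coef[1]` then estimates $c_m/c_p$.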
Since the growth time $t$ is not known precisely, we cannot determine $c_m$ and $c_p$ individually, but we can determine their ratio: $c_m/c_p \approx 630$. The theory of the fitness cost of transcription introduced in the previous section (Eq. 14) predicts

$$\frac{c_m}{c_p} = \frac{N_p}{N_m} \approx 2100, \qquad (18)$$

where $N_p$ is the total number of proteins per cell and $N_m$ is the total number of mRNAs per cell (Table S1).
Given the typical uncertainty on measurements of $N_p$ and $N_m$ (2-fold), the 95% confidence interval for $c_m/c_p$ ranges from 300 to 14000. We thus find that the predicted mRNA and protein costs agree with the measurements of Kosuri et al. 37.
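One way to reproduce the 300 to 14000 range is to treat each 2-fold uncertainty as one standard deviation in $\log_{10}$ and to add the two independent uncertainties in quadrature. That reading of "2-fold" is our assumption, and the helper name is illustrative.

```python
import math


def ratio_confidence_interval(ratio, fold_error=2.0, z=1.96):
    """Interval for N_p / N_m when both counts carry a fold_error uncertainty.

    Assumes independent log10-normal errors with standard deviation
    log10(fold_error) on each count, so the log10 errors add in quadrature.
    """
    sd_log10 = math.hypot(math.log10(fold_error), math.log10(fold_error))
    factor = 10 ** (z * sd_log10)
    return ratio / factor, ratio, ratio * factor


low, mid, high = ratio_confidence_interval(2100)
print(round(low), round(high))  # roughly 300 and 14000
```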
Finally, we test whether the theoretical estimates for $c_m$ and $c_p$ can predict the abundance of clones. Although we need to know the growth time $t$ to predict clone abundance, the correlation between measured and predicted clone abundance is independent of $t$ (see Eq. 17, which relates clone abundance to the growth time and the costs of mRNA and protein).
We find a positive correlation between predictions and measurements of clone abundance (r = 0.67, p < 10 −15 ). We set t to one day (t = 24h) in Fig. S3B to illustrate the correlation.
In conclusion, the cost of mRNA synthesis in E. coli can be predicted from the growth rate and the total transcription output.

The cost of transcription in S. cerevisiae estimated from the measurements of Kafri et al. can be predicted from cellular constants
Here we estimate the growth penalty of transcription in S. cerevisiae from the measurements of Kafri et al. 31. This study introduced a fluorescent protein construct of pre-mRNA length $l_m$ at different copy numbers $n$ in the S. cerevisiae genome. The protein abundance $p$ of the fluorescent protein depended on the genomic copy number of the construct, as did the transcription rate $\beta_m$.
This study found that the growth rate $\mu$ decreases linearly with the genomic copy number $n$ of the fluorescent protein construct,

$$\mu = \mu_0 - c_m l_m \beta_m - c_p p,$$

where $\beta_m$ and $p$ increase with $n$, $\mu_0$ is the growth rate of the WT strain, $c_m$ is the growth penalty of transcription (expressed per transcribed nucleotide) and $c_p$ is the protein burden (growth penalty per protein). To compare cost across growth conditions, the study normalized the growth rate $\mu$ of strains with genomic insertions of the construct to that of WT $\mu_0$ (e.g. Fig. 4B of Kafri et al. 31):

$$\frac{\mu}{\mu_0} = 1 - \frac{c_m l_m \beta_m + c_p p}{\mu_0}.$$

Following the notation of Kafri et al. 31, we call $s_N$ the slope of the relative growth rate $\mu/\mu_0$ as a function of the genomic copy number $n$ of the construct. We can write $s_N$ in terms of the fitness cost parameters $c_m$ and $c_p$:

$$s_N = \frac{d(\mu/\mu_0)}{dn} = -\frac{1}{\mu_0}\left(c_m l_m \frac{d\beta_m}{dn} + c_p \frac{dp}{dn}\right).$$

To distinguish between the cost of protein and transcription, Kafri et al. 31
Experiments in YPD medium 31
All parameters are known up to two significant digits, except for $\varphi$ ($10 \le \varphi^{-1} \le 30$), which is known up to one significant digit. This puts the measurement uncertainty at $\pm 1 \times 10^{-9}$ nt$^{-1}$.
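Plugging the cellular constants of S. cerevisiae from Table 1
Formally, we estimated $c_v(\beta_m, \beta_p)$ as a weighted average of the coefficients of variation $c_{v,i}$ of individual genes,

$$c_v(\beta_m, \beta_p) = \frac{\sum_i w_i\, c_{v,i}}{\sum_i w_i},$$

where the Gaussian weights $w_i = \exp\left(-d_i^2/2\sigma^2\right)$ decay with the distance $d_i$ between $(\log_{10}\beta_m, \log_{10}\beta_p)$ and the position of gene $i$ in the Crick space. We set the smoothing width $\sigma$ to one fifth of the data range. We only plotted contours of $c_v(\beta_m, \beta_p)$ for densely populated regions of the Crick space ($\sum_i w_i \ge 200$).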
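The Gaussian smoothing can be sketched as follows. Assumptions (ours): coordinates are $\log_{10}$ rates, the weights use a single isotropic width equal to one fifth of the larger of the two $\log_{10}$ data ranges, and grid points with total weight below the threshold are masked with NaN. The function name `smoothed_field` is hypothetical. The implementation builds a grid-by-genes weight array, which is simple but memory-hungry for fine grids and large gene sets.

```python
import numpy as np


def smoothed_field(log_bm, log_bp, values, grid_bm, grid_bp,
                   width_fraction=0.2, min_weight=200.0):
    """Gaussian-weighted average of a per-gene quantity over the Crick space.

    log_bm, log_bp  : log10 transcription and translation rates of each gene.
    values          : per-gene quantity to smooth (e.g. the coefficient of variation).
    grid_bm, grid_bp: 1D arrays of log10 grid coordinates.
    Returns the smoothed field (NaN where total weight < min_weight) and the weights.
    """
    log_bm, log_bp, values = (np.asarray(a, dtype=float) for a in (log_bm, log_bp, values))
    # Single isotropic width: a fifth of the larger log10 data range (assumption).
    data_range = max(log_bm.max() - log_bm.min(), log_bp.max() - log_bp.min())
    sigma = width_fraction * data_range
    gm, gp = np.meshgrid(np.asarray(grid_bm, dtype=float),
                         np.asarray(grid_bp, dtype=float), indexing="ij")
    # Squared distance from every grid point (first two axes) to every gene (last axis).
    d2 = (gm[..., None] - log_bm) ** 2 + (gp[..., None] - log_bp) ** 2
    w = np.exp(-d2 / (2.0 * sigma**2))
    total = w.sum(axis=-1)
    with np.errstate(invalid="ignore", divide="ignore"):
        field = (w * values).sum(axis=-1) / total
    field[total < min_weight] = np.nan  # only keep densely populated regions
    return field, total
```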

Expression for the variance in protein abundance as a function of the Crick rates
In this section, we derive an expression for the coefficient of variation $c_v$ as a function of the transcription rate $\beta_m$, the protein abundance $p$ and the protein decay rate $\alpha_p$. Following an extensive line of theoretical and experimental research 54,72, we model gene activation and inactivation as a telegraph process (Fig. S3C).
Genes are activated at a rate $k_{on}$ and inactivated at a rate $k_{off}$. Active genes synthesize mRNAs at a rate $\delta$.
Messenger RNAs are translated into proteins at a rate $\beta_p$ and degrade at a rate $\alpha_m$. At steady state, the coefficient of variation $c_v$ of protein abundance in this stochastic process can be computed analytically (Eq. 28) 54. The first term of the equation accounts for the Poisson noise on protein abundance stemming from the protein birth-death process. The second term accounts for the noise caused by translating proteins from mRNAs of low copy number. The last term models the noise caused by gene activation and inactivation, i.e. transcriptional bursting.
At present, it is difficult to measure $k_{off}$ and $k_{on}$ experimentally genome-wide. We therefore seek a simplified, approximate expression for the coefficient of variation $c_v$ in which these parameters occur only implicitly through the transcription rate $\beta_m$. Note that $\beta_m$ is related to $k_{on}$ and $k_{off}$:

$$\beta_m = \delta P_{on} = \delta\,\frac{k_{on}}{k_{on} + k_{off}}, \qquad (29)$$

where $\delta$ is the transcription rate when the gene is in the 'on' state, and $P_{on}$ is the fraction of time when the gene is active.
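A Gillespie simulation of the telegraph model offers a quick numerical check of Eq. 29: the time-averaged mRNA level should approach $\delta P_{on}/\alpha_m$, and protein abundance $\beta_p \delta P_{on}/(\alpha_m\alpha_p)$. This is a plain stochastic simulation algorithm written for illustration; the function name and the parameter values in the usage below are hypothetical (only $\delta \approx 800$/h and $k_{on} \approx 10$/h echo the E. coli values quoted in this section).

```python
import math
import random


def simulate_telegraph(k_on, k_off, delta, alpha_m, beta_p, alpha_p, t_end, seed=0):
    """Gillespie simulation of the three-stage (telegraph) model of gene expression.

    Gene: off -> on at k_on, on -> off at k_off; an active gene makes mRNA at delta;
    mRNA decays at alpha_m and is translated at beta_p; protein decays at alpha_p.
    Returns time-averaged mRNA and protein means and the protein CV.
    """
    rng = random.Random(seed)
    g, m, p = 0, 0, 0             # gene state (0/1), mRNA and protein copy numbers
    t = 0.0
    int_m = int_p = int_p2 = 0.0  # time integrals of m, p and p^2
    while t < t_end:
        rates = (k_on * (1 - g), k_off * g, delta * g,
                 alpha_m * m, beta_p * m, alpha_p * p)
        total = sum(rates)
        dt = min(rng.expovariate(total), t_end - t)
        int_m += m * dt
        int_p += p * dt
        int_p2 += p * p * dt
        t += dt
        if t >= t_end:
            break
        # Pick the next reaction with probability proportional to its rate.
        r = rng.random() * total
        acc = 0.0
        for event, rate in enumerate(rates):
            acc += rate
            if r < acc:
                break
        if rates[event] == 0.0:   # floating-point round-off fell through the loop
            continue
        if event == 0:
            g = 1
        elif event == 1:
            g = 0
        elif event == 2:
            m += 1
        elif event == 3:
            m -= 1
        elif event == 4:
            p += 1
        else:
            p -= 1
    mean_m = int_m / t_end
    mean_p = int_p / t_end
    var_p = int_p2 / t_end - mean_p ** 2
    return {"mean_m": mean_m, "mean_p": mean_p,
            "cv_p": math.sqrt(max(var_p, 0.0)) / mean_p}
```

For example, `simulate_telegraph(k_on=10, k_off=30, delta=800, alpha_m=20, beta_p=50, alpha_p=1, t_end=300)` should give a mean mRNA number close to 10 and a mean protein number close to 500, up to sampling error.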

E. coli
With a median half-life of 2.5 min 11, mRNAs decay much faster than proteins ($\alpha_m \gg \alpha_p$). In addition, protein decay $\alpha_p$ is mainly set by the cell division time (20 min or longer) 35, which is slow compared to gene inactivation, which takes place on the time-scale of seconds 72 ($k_{off} \gg \alpha_p$). In this regime, we can simplify the analytical expression for the coefficient of variation (Eq. 28).
The gene activation rate $k_{on} \approx 10$/h is largely constant across E. coli promoters, in contrast to $k_{off}$, which determines the transcription rate $\beta_m$ 72. Using Equation 29, which relates the transcription rate $\beta_m$ to the gene (in)activation rates $k_{on}$ and $k_{off}$, we can rewrite the coefficient of variation in terms of $\beta_m$, using $\delta \approx 800$/h and $k_{on} \approx 10$/h 72.
The Poisson noise term $1/p$ is typically negligible compared to the two other terms. The small-mRNA-copy-number noise term $\alpha_p/\beta_m$ dominates at low $\beta_m$ (Fig. S3D). This term is consistent with the observations that protein noise initially decreases with protein abundance (Fig. 4C, Fig. S4B-C) and that protein noise decreases with transcription (Fig. 3B).
The third term (gene activation noise) becomes dominant for large $\beta_m$ (Fig. S3D). Because the third term is almost a constant for physiological values of $\beta_m$ (Fig. S3D)

S. cerevisiae, H. sapiens, M. musculus
In eukaryotes, an approximate expression for the coefficient of variation can also be derived, but it is slightly more complicated because the separation of time-scales is less clear than in E. coli: messenger RNAs typically decay faster than proteins, but not by a full order of magnitude. A more realistic, data-driven assumption is $\alpha_m = q\,\alpha_p$, with $q \approx 3$ (Table S1). Measurements in S. cerevisiae 88 and H. sapiens 13 suggest that gene activation dynamics are much faster than protein decay ($k_{on}, k_{off} \gg \alpha_p$). Under these assumptions, we can simplify the analytical expression for the coefficient of variation (Eq. 28). Using Eq. 29, which expresses transcription $\beta_m$ as a function of the gene (in)activation parameters $k_{on}$, $k_{off}$ and $\delta$, we can then eliminate $\beta_m$ from the last term (Eq. 36).

We also eliminate $\delta$ by introducing the transcriptional burst size $b$, which is the average number of mRNAs synthesized each time the gene is activated:

$$b = \frac{\delta}{k_{off}}. \qquad (37)$$

Plugging Eq. 37 for $b$ into Eq. 36 for $c_v^2$, we obtain an expression (Eq. 38) in which the factor $\varphi$ accounts for gene activation dynamics.
Except for highly expressed genes, the transcriptional burst size $b$ is typically small ($b \approx 1$) 60. For $q = 3$ (Table S1), varying $P_{on}$ across its full range causes $\varphi$ to vary only between 0.75 and 1.5 (Fig. S3E).
Hence, in the worst case, neglecting gene activation dynamics by setting $\varphi = 1$ in Eq. 38 would result in a 1.5-fold error on $c_v^2$. We conclude that this is a reasonable approximation of the coefficient of variation for most genes, except for highly expressed genes.
For highly expressed genes, experiments in S. cerevisiae 50 and H. sapiens 13 found a noise floor. This noise floor might occur when the transcription rate exceeds the maximal rate of gene activation 13. In this scenario, high transcription rates can only be achieved by increasing the burst size $b$ 13,60. Since $b$ is not constant in the large-$\beta_m$ regime, we rewrite $b$ in terms of the transcription rate $\beta_m$. To do so, we combine Eq. 29, which expresses $\beta_m$ in terms of the kinetic rates of gene activation ($k_{on}$, $k_{off}$, $\delta$), and Eq. 37, which defines the burst size $b$ in terms of $\delta$ and $k_{off}$, to find

$$b = \frac{\beta_m}{k_{off} P_{on}}.$$

Neglecting the $1/p$ term, which is of the order of $10^{-3}$ or smaller, we can obtain an expression for the noise floor $c_{v0}$ by taking the limit of large $\beta_m$. In this limit, the second term in Eq. 41 vanishes and $c_v^2$ approaches a constant, the noise floor $c_{v0}^2$ (Eq. 42).

Plugging in gene activation parameter values typical for H. sapiens 13 ($P_{on} = 0.18$, $k_{off} = 2\,h^{-1}$) and setting $q = \alpha_m/\alpha_p = 3$ and $\alpha_p = 0.05\,h^{-1}$ (see Table 1) puts the noise floor $c_{v0}$ at 0.27, a value comparable to experimental observations 13 ($c_{v0} \approx 0.3$). Hence, the noise floor on protein abundance could occur when transcription rates saturate gene activation kinetics. In this case, substituting Eq. 42 for $c_{v0}^2$ into Eq. 41 leads to an expression (Eq. 43) that is identical to the one we previously derived for E. coli (Eq. 32). This expression would also hold if the noise floor were caused by a mechanism other than the transcriptional saturation of gene activation dynamics, such as extrinsic noise.

Note that we neglected the $q/(1+q)$ term of Eq. 41, because the resulting approximation error on $c_v^2$ is less than 50% (Fig. S3E). Because $\beta_m$ scales inversely with $c_v^2$ (Eq. 43), neglecting gene activation kinetics implies at most a 50% error on $\beta_m$, or equivalently a 0.2 error on $\log_{10}\beta_m$. This is small compared to the dynamic range of transcription rates, which vary over 2-3 orders of magnitude.

Power-law scaling of mRNA Fano factor with mRNA abundance cannot explain the noise floor
In E. coli, So et al. 72 observed that the Fano factor of mRNA scales as a power law of mRNA abundance $m$ (Eq. 44), with $\gamma \approx 0.64$ and $s \approx 1.5$. This scaling holds for mRNAs whose abundance ranges from 0.3 to 40 per cell 72. A similar scaling was observed in a human fibroblast cell line 11.
Here we ask whether this scaling can predict the noise floor, which would remove the need for the phenomenological factor $c_{v0}^2$ in the expression for the protein noise $c_v$ derived in the previous section (Eq. 32).
In the case of the stochastic model of gene expression of the previous section (Fig. S3C), Paulsson 54 showed that the Fano factor for mRNA abundance $m$ can be written in terms of the rates of the model (Eq. 45). By equating the observed Fano factor power law (Eq. 44) and the theoretical expression for the Fano factor (Eq. 45), and using $m = \beta_m/\alpha_m$, we find how the gene activation noise term varies with the transcription rate $\beta_m$. Plugging this expression into the expression for $c_v^2$ derived by Paulsson 54 (Eq. 28), we can find how protein noise $c_v^2$ varies with the transcription rate $\beta_m$. For $\gamma < 1$ and in the typical case in which gene (in)activation is fast compared to mRNA decay ($k_{off}, k_{on} \gg \alpha_m$), $c_v^2$ goes to 0 for large $p$ and $\beta_m$. Thus, there is no noise floor in this regime.

In the opposite regime, where mRNA decay is fast compared to gene (in)activation ($\alpha_m \gg k_{off} + k_{on}$), we could have $k_{on} \sim \beta_m$ or $k_{off} \sim \beta_m^{-1}$. In the first case, $c_v^2$ goes to 0 for large $p$ and $\beta_m$, so there is no noise floor. We already studied the second case in the previous section on E. coli noise and found that the resulting gene activation noise is too small compared to the observed noise floor. Finally, plotting $c_v^2$ as a function of $\beta_m$ for typical values of $\alpha_m$, $\alpha_p$, $s$, $k_{on}$, $k_{off}$ and $\gamma$, we confirmed that the Fano factor power law cannot explain the noise floor observed in single-cell experiments (not shown).
In conclusion, incorporating the Fano factor power law (Eq. 44) into the model, we find that it results in no noise floor for γ 0.5 or in a noise floor that is too low compared to experimental observations.
Since the noise floor is a well-founded experimental observation 13,50,71,76 , the power law by itself cannot explain the full noise behavior. Additional biology, such as extrinsic noise or increased transcriptional bursting at large transcription rates, is needed to understand the noise floor.
Genes dominated by comparable requirements of precision are predicted to share the same translation / transcription ratio.
We consider a protein of abundance p. The protein contributes a quantity f (p) to the organism's fitness.
The overall fitness $F$ is thus the benefit $f(p)$ minus the costs of expressing the protein. The fitness function $f(p)$ reaches its maximum $f_{max}$ at $p = p^*$ (Fig. 4A). Expanding $f(p)$ around the optimum $p = p^*$ to second order gives $f(p) \approx f_{max} + f'(p^*)(p - p^*) + \tfrac{1}{2} f''(p^*)(p - p^*)^2$, where $f'(p^*) = 0$ since $f(p)$ reaches its optimum $f_{max}$ at $p = p^*$. $f''(p^*)$ is the curvature of the fitness function at its maximum. It is a negative number that characterizes how narrow the fitness function is.
Because protein abundance fluctuates around $p^*$, the cell does not experience the maximum fitness $f_{max}$ but rather a lower average fitness $\langle F\rangle$. Averaging $F$ over fluctuations in protein abundance, we obtain $\langle F\rangle \approx f_{max} + \tfrac{1}{2} f''(p^*)\,\sigma^2$ minus the costs of expression, where $\sigma^2$ is the variance of protein abundance fluctuations (Fig. 4A). The curvature times this variance, $-\tfrac{1}{2} f''(p^*)\sigma^2$, is the noise load $\Delta f_{noise}$: the fitness lost due to stochastic fluctuations in protein abundance 32,81 (Fig. 4A). It is a positive quantity because the curvature $f''(p^*)$ is negative at the maximum.

We seek the transcription rate $\beta_m$ that maximizes fitness. For this purpose, we note that $\beta_m$ affects the noise level $\sigma^2$ in a well-characterized way. Theory and experiments of gene expression noise 5,50,54,76 indicate that the variance of protein abundance is given by

$$\sigma^2 = p^* + p^{*2}\left(\frac{\alpha_p}{\beta_m} + c_{v0}^2\right),$$

where $\alpha_p$ is the protein decay rate and $c_{v0}$ is the noise floor due to extrinsic noise 76 or to the larger transcriptional burst size of high-abundance proteins 13 (see previous section). We can now solve for the $\beta_m$ that maximizes fitness by setting $d\langle F\rangle/d\beta_m = 0$. The optimal translation / transcription ratio $\beta_p/\beta_m$ satisfies

$$\frac{\beta_p}{\beta_m} = \frac{2\,c_m\, l_m\, \alpha_m}{-f''(p^*)\, p^*}. \qquad (52)$$

From the expression for the optimal $\beta_p/\beta_m$ (Eq. 52), we see that genes with narrow fitness functions have lower translation / transcription ratios (Fig. 4B). Higher transcription cost per mRNA molecule, due to longer mRNAs $l_m$, rapid mRNA turnover $\alpha_m$, or a scarcity of nucleotides leading to increased cost $c_m$, shifts the balance towards higher ratios. Note that the translation / transcription ratio does not depend on translation cost, although this cost is typically larger than transcriptional cost 31,44,82. This is because we assumed that, for a given protein abundance, translation costs are the same whether the proteins are synthesized from few or many mRNAs.
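The optimization can be checked numerically. The sketch below assumes, for illustration, that at fixed $p^*$ the mean fitness is $f_{max} + \tfrac{1}{2} f''(p^*)\sigma^2 - c_m l_m \beta_m$ with $\sigma^2 = p^* + p^{*2}(\alpha_p/\beta_m + c_{v0}^2)$, translation cost being independent of $\beta_m$. Only the $\alpha_p/\beta_m$ term and the transcription cost then depend on $\beta_m$, giving $\beta_m^{opt} = p^*\sqrt{-f''(p^*)\alpha_p/(2 c_m l_m)}$ and, with $p^* = \beta_p\beta_m/(\alpha_m\alpha_p)$, $\beta_p/\beta_m = 2c_m l_m\alpha_m/(-f''(p^*)p^*)$. This is our reconstruction under those assumptions; the function names and parameter values are hypothetical.

```python
import numpy as np


def mean_fitness(beta_m, f_max, curvature, p_star, alpha_p, c_m, l_m, c_v0=0.0):
    """<F>(beta_m) at fixed protein abundance p_star (illustrative form).

    curvature is f''(p*), which is negative at the optimum.
    """
    sigma2 = p_star + p_star**2 * (alpha_p / beta_m + c_v0**2)
    return f_max + 0.5 * curvature * sigma2 - c_m * l_m * beta_m


def optimal_beta_m(curvature, p_star, alpha_p, c_m, l_m):
    """Closed-form maximizer of mean_fitness with respect to beta_m."""
    return p_star * np.sqrt(-curvature * alpha_p / (2.0 * c_m * l_m))


def optimal_ratio(curvature, p_star, alpha_m, c_m, l_m):
    """beta_p / beta_m at the optimum, using p* = beta_p beta_m / (alpha_m alpha_p)."""
    return 2.0 * c_m * l_m * alpha_m / (-curvature * p_star)
```

Scanning `mean_fitness` on a fine logarithmic grid of $\beta_m$ and taking the argmax recovers `optimal_beta_m`; a narrower fitness function (more negative `curvature`) lowers `optimal_ratio`, as stated above.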
The lowest translation / transcription ratio can be predicted from measurable, fundamental parameters of each organism.
In this section, we ask what the predicted offset of the line that forms the boundary of the depleted region is, namely the constant $k$ such that $\beta_p/\beta_m > k$ for all genes. We provide estimates based on known fundamental parameters of cell biology suggested by the theory.
To estimate $k$, we note that the precision-economy theory predicts that low $\beta_p/\beta_m$ ratios occur for genes with narrow fitness functions (Eq. 52, Fig. 4B). But the noise floor $c_{v0}$ (Fig. 4C) sets a limit on how narrow fitness functions can be: for fitness functions that are too narrow given the noise floor, average fitness is negative (Fig. 4D). Such fitness functions cannot be selected for in evolution. We can therefore estimate the maximal transcription rate by determining the largest selectable fitness-function curvature, and then computing the optimal transcription rate for that function. The curvature should be small enough that the smallest, unavoidable protein fluctuations (set by the organism's noise floor $c_{v0}$) do not outweigh the fitness benefit of expressing the protein. In other words, the fitness benefit of expressing the protein $f_{max}$ should be larger than the noise load: $f_{max} > \Delta f_{noise} = -\tfrac{1}{2} f''(p^*)\sigma^2$. Here we neglected mRNA cost because for proteins with narrow fitness functions, it is small compared to
If fitness is mainly set by the growth rate $\mu$, the fitness contribution $f_{max}$ of a gene cannot be larger than the growth rate: $f_{max} < \mu$. In addition, fluctuations cannot be smaller than the noise floor, $\sigma/p^* > c_{v0}$ 13,50,71,76. From these two considerations, we can compute an upper bound on the noise sensitivity $Q$ (Eq. 56). By substituting this upper bound in Eq. 52 for the optimal $\beta_p/\beta_m$, we find an upper bound $\beta_m^{max}$ on transcription rates (Eq. 57). We now consider the (hypothetical) protein expressed at maximal transcription $\beta_m^{max}$ and maximal translation $\beta_p^{max}$. This protein has the highest protein abundance $p^*$. It also has the narrowest fitness function (narrower fitness functions are not selectable due to the noise floor), and thus the highest noise sensitivity $Q_{max}$. We can plug Eq. 57 for $\beta_m^{max}$ into Eq. 52 for the optimal $\beta_p/\beta_m$ to find $Q_{max}$. In this expression, we can see that a higher noise floor implies that genes need to be less sensitive to noise.
We now use $Q_{max}$ in Eq. 52 for the optimal $\beta_p/\beta_m$, together with Eq. 13 for $c_m$, to find $k$. Neglecting differences in mRNA length between genes, we finally obtain an expression for $k$. In this derivation, we have assumed that the gene's contribution to fitness $f_{max}$ is smaller than the growth rate, $f_{max} < \mu$ (Eq. 56). For essential genes, $f_{max} \approx \mu$. For non-essential genes, we can use a tighter upper bound on $f_{max}$: $f_{max} < \rho\mu$, with $0 < \rho < 1$. For example, if deleting a gene decreases fitness by 1% or less, we have $\rho = 0.01$. Repeating the derivation shows that, for non-essential genes ($\rho \ll 1$), the predicted boundary of the depleted region has a higher intercept.

Estimating the curvature of fitness functions from the measurements of Keren et al.
Keren et al. 34 measured the fitness of S. cerevisiae cells as a function of log expression for growth in glucose. Specifically, these measurements map $\log_{10}$ gene expression $x$ to fitness $f(x)$, with $x = \log_{10} p$. These measurements do not allow one to estimate the curvature of the fitness function $f''(p^*)$ directly. Nevertheless, they allow one to estimate a closely related quantity, namely:

$$\frac{d^2 f}{dx^2} = (\ln 10)^2 \left(p^2 f''(p) + p f'(p)\right) \approx (\ln 10)^2\, p^2 f''(p), \qquad (63)$$

where we neglected $f'(p)$ since we expand around the fitness optimum. In this section, we compare the measured $p^2 f''(p)$ to the predictions from the theory.
We focused on genes present in both the study of Keren et al. 34  least half an order of magnitude. Following Keren et al. 34 , we also discarded low-quality genes, such as genes whose fitness value at wild-type expression was significantly lower than the fitness of the wild-type.
This left 34 genes for analysis. We further excluded 9 genes with TATA promoters because these genes tend to have large transcriptional burst sizes 88: our model assumes a small transcriptional burst size and hence cannot accurately model the noise of genes with TATA promoters. Finally, we did not consider 4 genes whose curvature was too low to be estimated accurately given the accuracy of fitness measurements ($|f''(p^*)p^{*2}| < 0.006$). This leaves 21 genes for the analysis.
To estimate the local curvature at wild-type $\log_{10}$ expression $x_{wt}$, we focus on fitness measurements located within one order of magnitude of $x_{wt}$: $x_{wt} - 1 < x < x_{wt} + 1$.
To these measurements, we then fit the parameters $a$, $c$, $d$ of the polynomial

$$f(x) = a + \frac{c}{2}\,(x - x_{wt})^2 + d\,(x - x_{wt})^3.$$

There is no first-order term because fitness peaks at wild-type expression. The third-order term allows for asymmetry in the fitness function. $c$ is the curvature of $f(x)$ at $x_{wt}$. Using Eq. 63 for the curvature of fitness as a function of $\log_{10} p$, we find a relationship between the curvature and the $c$ parameter of the polynomial:

$$f''(p^*)\,p^{*2} = \frac{c}{(\ln 10)^2}.$$

We estimate the standard error on $c$ using Fisher's information matrix, assuming a 10% error on fitness measurements 34.
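The fit can be sketched as weighted least squares, which is the maximum-likelihood estimate when errors are Gaussian; the inverse of the Fisher information matrix then gives the covariance of the parameters. Assumptions (ours): the 10% error is taken as a standard deviation of $0.1\,|f|$ for each measurement, and the quadratic term is written as $(c/2)(x - x_{wt})^2$ so that $c$ is the curvature. The function name is illustrative.

```python
import numpy as np


def local_curvature(x, f, x_wt, rel_error=0.1, window=1.0):
    """Fit f(x) = a + (c/2)(x - x_wt)^2 + d (x - x_wt)^3 near wild-type expression.

    x is log10 expression and f the measured fitness. Returns c, its standard
    error from the Fisher information (Gaussian errors with sd rel_error * |f|),
    and the corresponding p*^2 f''(p*) = c / ln(10)^2.
    """
    x = np.asarray(x, dtype=float)
    f = np.asarray(f, dtype=float)
    keep = np.abs(x - x_wt) < window  # x_wt - 1 < x < x_wt + 1
    dx, y = x[keep] - x_wt, f[keep]
    X = np.column_stack([np.ones_like(dx), 0.5 * dx**2, dx**3])  # columns: a, c, d
    weights = 1.0 / (rel_error * np.abs(y)) ** 2
    fisher = X.T @ (weights[:, None] * X)              # Fisher information matrix
    coef = np.linalg.solve(fisher, X.T @ (weights * y))  # weighted least squares
    cov = np.linalg.inv(fisher)
    c, c_se = coef[1], np.sqrt(cov[1, 1])
    return c, c_se, c / np.log(10.0) ** 2
```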
By rewriting Eq. 52 for the optimal $\beta_p/\beta_m$ in terms of $f''(p^*)p^{*2}$, we can predict the local curvature of fitness functions from the precision-economy theory. We estimate the error on these predictions by considering that the sampling error on $\log_{10}\beta_m$ is about 0.1 (see section on data processing), and a 2-fold error on $c_m$ (mainly due to uncertainty in the number of mRNAs per cell). This leads to a standard error of 0.36 on the $\log_{10}$ predictions.
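The 0.36 figure follows if the predicted $f''(p^*)p^{*2}$ is proportional to $c_m\beta_m^2$: on a $\log_{10}$ scale, the error on $c_m$ then adds in quadrature to twice the error on $\log_{10}\beta_m$. That scaling is an assumption of this sketch, which simply checks the arithmetic.

```python
import math

# Assumption: the prediction scales as c_m * beta_m**2, so its log10 error combines
# the log10 error on c_m with twice the log10 error on beta_m.
sd_log10_beta_m = 0.1           # sampling error on log10 beta_m
sd_log10_c_m = math.log10(2.0)  # 2-fold error on c_m
sd_prediction = math.hypot(sd_log10_c_m, 2.0 * sd_log10_beta_m)
print(round(sd_prediction, 2))  # 0.36
```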
To determine the significance of the agreement between predictions and measurements, we compute the root-mean-square deviation (RMSD) between predictions and measurements. Upon shuffling the measurements $10^6$ times, the RMSD is smaller than that of the non-shuffled measurements in only $p = 4.1\%$ of shuffles. Hence, the agreement between predictions and measurements is unlikely to be explained by chance. Consistent with this result, a $\chi^2$ test concludes that predictions and measurements do not differ significantly given the measurement error ($p = 0.51$).
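The shuffling test can be sketched with NumPy's `Generator.permuted`, which permutes each row of an array independently. The helper name is hypothetical, and the default of $10^5$ shuffles (rather than $10^6$) only keeps memory use modest.

```python
import numpy as np


def rmsd_permutation_test(predicted, measured, n_shuffles=100_000, seed=0):
    """Fraction of shuffles of `measured` whose RMSD to `predicted` is below the observed RMSD."""
    rng = np.random.default_rng(seed)
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    observed = np.sqrt(np.mean((predicted - measured) ** 2))
    # Each row is an independent permutation of the measurements.
    shuffled = rng.permuted(np.tile(measured, (n_shuffles, 1)), axis=1)
    rmsd = np.sqrt(np.mean((shuffled - predicted) ** 2, axis=1))
    return float(np.mean(rmsd < observed)), float(observed)
```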

The region in which transcription noise is comparable to the noise floor does not correspond to the depleted region
The depleted region cannot be explained by determining the $\beta_m$ and $\beta_p$ for which increasing transcription provides little extra precision relative to the noise floor.
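To see why, consider Eq. 32, which relates transcription $\beta_m$ to the noise $\sigma/p$. Transcription noise becomes comparable to the noise floor when $\alpha_p/\beta_m \approx c_{v0}^2$, i.e. when $\beta_m \approx \alpha_p/c_{v0}^2$, a condition that does not involve $\beta_p$. Therefore, the predicted boundary is a vertical line in the Crick space, which is a poor model for the data.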
For example, in S. cerevisiae, the predicted boundary would be $\log_{10}\beta_m = 1.6$, which excludes about half of the genome (Fig. 2A).

Identifying groups of genes with significantly high or low translation / transcription ratio
Do genes with high or low translation / transcription ratio have different biological functions? To find out, one could iterate through groups of genes of similar function, as defined by Gene Ontology (GO) annotations 2, and test statistically whether genes in these groups show significantly high or low $\beta_p/\beta_m$.
However, this approach neglects a key property of the central dogma rates: high-abundance proteins cannot have high translation / transcription ratios due to the maximal translation rate $\beta_p^{max}$. Since translation rates $\beta_p$ cannot exceed $\beta_p^{max}$, achieving high protein abundance requires recruiting transcription, which decreases $\beta_p/\beta_m$.
As a result, stratifying genes by $\beta_p/\beta_m$ arbitrarily groups low-abundance proteins with low $\beta_p/\beta_m$ together with high-abundance proteins that would have high $\beta_p/\beta_m$ but cannot, because translation rates cannot exceed $\beta_p^{max}$.
To address this issue, we stratify genes by their position relative to two boundaries: the line of lowest possible translation / transcription ratios ($\beta_p = k\beta_m$) and the line of highest possible translation rates ($\beta_p = \beta_p^{max}$). We summarize the position of genes between these two boundaries by an angle $\theta$ (Fig. S5C). Genes that sit on the line of lowest possible translation / transcription ratios have $\theta = 0$, whereas $\theta = \pi/4$ corresponds to genes with maximal translation / transcription ratios.
$\theta$ can be computed from a gene's $\beta_m$ and $\beta_p$, and from the maximal transcription and translation rates $\beta_m^{max}$ and $\beta_p^{max}$, by trigonometry (Fig. S5C). Because the line of lowest possible translation / transcription ratio has slope 1 ($\beta_p = k\beta_m$), $\eta + \theta = \pi/4$, where $\eta$ is the complementary angle defined in Fig. S5C. For each group of genes from the Gene Ontology, we use the Mann-Whitney test to compute the p-value that the $\theta$s in that group are significantly larger or smaller than the distribution of $\theta$s of all genes. We estimate the false discovery rates (q-values) using the fdrtool R package 74. For all four model organisms, Supplementary Item 1 lists all GO categories with significantly high or low $\theta$ at a false discovery rate $q < 0.01$, together with (uncorrected) p-values and $\theta$ normalized to $\pi/4$, so that normalized $\theta$ ranges between 0 (for GO categories with lowest $\beta_p/\beta_m$) and 1 (for GO categories with highest $\beta_p/\beta_m$). Table S2 summarizes these results. Fig. S5D-G shows the position of genes belonging to selected significant GO categories in the Crick space.
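Our reading of the trigonometric construction in Fig. S5C is sketched below; it is an interpretation, not the authors' code. In $\log_{10}$ coordinates, the slope-1 boundary and the line $\beta_p = \beta_p^{max}$ meet at the corner $(\beta_m^{max}, \beta_p^{max})$, with $\beta_m^{max} = \beta_p^{max}/k$. Taking $\eta$ as the angle, seen from that corner, between a gene and the horizontal line gives $\theta = \pi/4 - \eta$, which is 0 on the slope-1 boundary and $\pi/4$ on the line of maximal translation. The function name is hypothetical.

```python
import math


def crick_angle(beta_m, beta_p, beta_m_max, beta_p_max):
    """Angle theta of a gene between the two boundaries of the Crick space (log10 coordinates).

    eta is the angle, seen from the corner (beta_m_max, beta_p_max), between the
    gene and the horizontal line beta_p = beta_p_max; theta = pi/4 - eta.
    """
    dx = math.log10(beta_m_max) - math.log10(beta_m)
    dy = math.log10(beta_p_max) - math.log10(beta_p)
    eta = math.atan2(dy, dx)
    return math.pi / 4 - eta
```

Dividing the result by $\pi/4$ gives the normalized $\theta$ between 0 and 1 used in Supplementary Item 1.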
Differences in the genes groups with high and low β p /β m in HeLa compared to 3T3 cells can be explained by global differences in gene expression profiles.
Different groups of genes take unusually high or low $\beta_p/\beta_m$ in HeLa cells compared to 3T3 cells (Table S2).
Here we examine whether these differences can be explained by global differences in the gene expression.
We obtained $\log_2$ RPKMs for HeLa and 3T3 cells from the RNAseq experiments of Eichhorn et al. 16.
Because 3T3 cells are of murine origin, comparing the two cell lines requires mapping mouse gene names to human names. To do so, we used ENSEMBL's orthology database, using only one-to-one orthology relations of highest confidence (confidence = 1). After mapping gene names from mouse to human, we discarded 31 non-unique genes.
To provide more context for the gene expression analysis, we merged the HeLa and 3T3 gene expression profiles with the Cancer Cell Line Encyclopedia (CCLE) gene expression data 6
Finally, we determined gene sets that were differentially regulated in HeLa compared to 3T3 by gene set enrichment analysis 75 (Fig. S5I).

16, protein decay rates from Cambridge et al. 9. Rates were estimated as described in the Methods. Bars span the range from the 0.5% quantile to the 99.5% quantile of rates (i.e. 99% of genes). E-F. Taking into account the specific mRNA and protein decay rates of each gene has a negligible effect on the distribution of genes in the Crick space. E. Transcription and translation rates as estimated from measurements of mRNA and protein abundances and decay by Schwanhäusser et al. 66. F. Same as E., except that transcription and translation rates were estimated from mRNA and protein abundance by setting mRNA and protein decay rates to their median. Gene positions in panels E and F differ by 0.3 (root-mean-square deviation in $\log_{10}$ rates), which is small (10%) compared to the dynamic range of transcription and translation rates, which vary over three orders of magnitude.

Figure S2. Related to Figure 2. A. The 'not sampled' region is explained by sequencing depth and our focus on genes with at least 10 mRNAseq reads and 20 ribosome profiling (RP) reads per mRNA (two green lines). B. In E. coli, a group of 62 genes (triangles) contributes strongly to fitness. Excluding these genes, the boundary of the depleted region (diagonal dotted black line) has intercept $\log_{10} k = 1.7 \pm 0.1$, which is higher than the $\log_{10} k = 1.1 \pm 0.1$ found using all genes. A higher intercept is expected for non-essential genes (Methods). This is illustrated by the red line, which represents the boundary predicted for genes that contribute 1% of the organism's fitness. Fitness data: Baba et al. 3. C-E. The depleted region is found in the mass spectrometry (MS) and mRNAseq data of Schwanhäusser et al. 66 (panel C) and Nagaraj et al. 49 (panel D), as well as in the flow cytometry data of Newman et al. 50 (panel E). By estimating transcription and translation rates from the mean protein abundance and coefficient of variation 22,23, we find a depleted region whose boundary has slope 1 (panel E), as in ribosome profiling datasets. The boundary of the depleted region in the two proteomics datasets has slope larger than one ($\beta_p \sim \beta_m^4$). The different slope observed with mass spectrometry compared to ribosome profiling and flow cytometry datasets could be explained by technical limitations in these pioneering mass-spectrometry datasets and by differences in the error structure of mRNAseq and mass-spectrometry datasets 42. F-H. Genes with high $\beta_m$ and low $\beta_p$ are statistically depleted. F. The boundary of the depleted region is a line of slope 1 such that 99% of genes are above the line. S. cerevisiae data from Weinberg et al. 84. G. Shuffling $\beta_m$ and $\beta_p$ while keeping the marginal distributions of $\beta_p$ and protein abundance constant increases the fraction of genes located in the depleted region. H.
All $10^4$ re-shuffled sets of $\beta_m$ and $\beta_p$ show an increased number of genes in the depleted region. Thus, the depletion of this region is statistically significant ($p < 10^{-4}$). Repeating this procedure in E. coli, H. sapiens and M. musculus leads to the same conclusion ($p < 10^{-4}$, not shown). I-L. The depleted region and the predicted boundary of this region are robust to uncertainties in mRNA half-lives and in the total number of mRNAs per cell $N_m$. M. The error $\varepsilon$ on $\log_{10}$ mRNA abundance can be controlled by discarding genes with a low number of reads $n$.

76. B. Clone abundance in growth competition experiments can be predicted from theoretical estimates of the fitness cost of mRNA $c_m$ and the fitness cost of protein $c_p$. We assumed a growth time $t = 24$ h in this figure. The correlation between predictions and measurements is independent of the growth time $t$ (Methods). The dotted line represents a perfect fit between measurements and predictions ($y = x$). C. The 3-stage telegraph process can be used to model how intrinsic noise in gene expression is affected by different reaction rates. D. In the 3-stage model of gene expression of panel C, intrinsic protein noise is mainly due to small-mRNA-copy-number noise (blue line) when $\beta_m$ is small. For large $\beta_m$, intrinsic protein noise is dominated by gene activation noise (green line), which is almost constant for $\beta_m$ in the physiological range (x-axis). Constants are from the experiments of So et al. 72 and Taniguchi et al. 76: $\alpha_p = 0.28$/h, $c_{v0}^2 = 0.07$, $\delta = 800$/h, $k_{on} = 10$/h. With these constants, the noise floor observed experimentally (orange line) is larger than the gene activation noise. E. By neglecting gene activation dynamics, one can estimate the coefficient of variation from the central dogma rates with an error $\varphi$ of less than 1.5-fold. Note, however, that this result does not hold for highly expressed genes (see Methods).

34 who measured fitness as a function of log protein abundance.
The quantity $f''(p^*)\,p^{*2}$ can be estimated directly from these data (Methods). We used the precision-economy theory to predict $f''(p^*)\,p^{*2}$ from the central dogma rates (Methods). Error bars represent standard errors. B. A noise floor $c_{v0} = 0.22 \pm 0.01$ is found in the measurements by Silander et al. 71 of the protein noise conferred by E. coli promoters. This is comparable to the noise floor $c_{v0} = 0.27 \pm 0.01$ found in the measurements of Taniguchi et al. 76 (Fig. 4C). C. A noise floor $c_{v0}$ is also found in S. cerevisiae. Coefficients of variation and protein abundance from Newman et al. 50. D. Individual constants cannot predict the position $k$ of the boundary of the depleted region. We correlated the position of the boundary of the depleted region with the predictions from the theory as well as with the 8 cell biology constants listed in Table 1 (numbers of mRNAs $N_m$ and proteins $N_p$ per cell, cell volume $V$, growth rate $\mu$, active protein decay rate $\alpha_{deg}$, effective protein decay rate $\alpha_p$, mRNA decay rate $\alpha_m$, noise floor $c_{v0}$), in addition to the maximal translation rate $\beta_p^{max}$ and the total transcription output $\sum \beta_m$ (Table 1). We tested for significant correlations between each of these quantities and the measured $k$. Only the precision-economy theory yields a significant correlation between measurements and predictions ($p < 0.01$).
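Because the data of ref. 34 report fitness against log protein abundance, the curvature term can be read off in log space. Writing $f(p) = g(x)$ with $x = \ln p$ gives $f''(p)\,p^2 = g''(x) - g'(x)$, which reduces to $g''(x^*)$ at the optimum, where $g'(x^*) = 0$. The sketch below assumes that the caption's quantity is $f''(p^*)\,p^{*2}$ and that a quadratic fit in $\ln p$ is adequate near the optimum; the function name and the synthetic data are hypothetical, and the actual estimation procedure is the one described in the Methods.

```python
import numpy as np


def curvature_from_log_fitness(protein, fitness):
    """Estimate f''(p*) * p*^2 from fitness measured across protein levels.

    Hypothetical helper. Fits g(x) = a*x**2 + b*x + c with x = ln(p).
    Since f(p) = g(ln p), f''(p) * p**2 = g''(x) - g'(x); at the fitted
    optimum g'(x*) = 0, so the estimate is g''(x*) = 2a.
    Returns (curvature, p_star).
    """
    x = np.log(np.asarray(protein, dtype=float))
    a, b, _ = np.polyfit(x, np.asarray(fitness, dtype=float), deg=2)
    x_star = -b / (2.0 * a)  # vertex of the fitted parabola
    return 2.0 * a, float(np.exp(x_star))


# Usage on synthetic data (hypothetical): optimum at p* = 100 and
# fitness = 1 - 0.3 * ln(p / 100)**2, so the expected curvature is -0.6.
p = 100.0 * np.logspace(-1, 1, 21)
fitness = 1.0 - 0.3 * np.log(p / 100.0) ** 2
curvature, p_star = curvature_from_log_fitness(p, fitness)
```

If the measurements are plotted against $\log_{10} p$ instead of $\ln p$, the second derivative fitted on that axis must be divided by $(\ln 10)^2$ to give the same quantity.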

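Panel E of the 3-stage model caption above states that, once gene activation is neglected, the coefficient of variation follows from the central dogma rates alone. A minimal sketch, assuming the standard two-stage (mRNA, protein) result for intrinsic noise, $CV^2 = 1/\langle p\rangle + (1/\langle m\rangle)\,\alpha_p/(\alpha_p+\alpha_m)$, and assuming that the noise floor $c_{v0}^2$ simply adds to it. The constants $\alpha_p = 0.28$/h and $c_{v0}^2 = 0.07$ come from the captions; the mRNA decay rate $\alpha_m = 10$/h and the target abundance are hypothetical, and `protein_cv2` is an illustrative helper, not code from this work.

```python
import numpy as np


def protein_cv2(beta_m, beta_p, alpha_m, alpha_p, cv0_sq=0.0):
    """Squared CV of protein number from central dogma rates (hypothetical helper).

    Two-stage model, gene activation neglected:
      <m> = beta_m / alpha_m          mean mRNA copies
      <p> = beta_p * <m> / alpha_p    mean protein copies (beta_p is per mRNA)
      CV^2 = 1/<p> + (1/<m>) * alpha_p / (alpha_p + alpha_m)
    The noise floor cv0_sq is added on top (an assumption of this sketch).
    """
    mean_m = beta_m / alpha_m
    mean_p = beta_p * mean_m / alpha_p
    intrinsic = 1.0 / mean_p + (1.0 / mean_m) * alpha_p / (alpha_p + alpha_m)
    return cv0_sq + intrinsic


ALPHA_P = 0.28     # effective protein decay rate, 1/h (from the caption)
CV0_SQ = 0.07      # squared noise floor (from the caption)
ALPHA_M = 10.0     # mRNA decay rate, 1/h (hypothetical)
P_TARGET = 1000.0  # fixed mean protein abundance (hypothetical)

# Raise transcription and lower translation so that <p> stays at P_TARGET.
beta_m = np.array([1.0, 10.0, 100.0])           # mRNA / h
beta_p = P_TARGET * ALPHA_P * ALPHA_M / beta_m  # proteins / mRNA / h
cv2 = protein_cv2(beta_m, beta_p, ALPHA_M, ALPHA_P, CV0_SQ)
# cv2 decreases as beta_m increases and approaches CV0_SQ from above.
```

At constant protein abundance, shifting expression from translation to transcription lowers $CV^2$ toward the floor, which is the precision side of the precision-economy trade-off.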
Figure S 5. Related to Figure 5. A-B. Genes located close to the boundary of the depleted region provide more fitness benefit than genes located far from it. Contours of fitness loss upon gene deletion in S. cerevisiae (panel A) and E. coli (panel B). We obtained fitness data from Steinmetz et al. 73 (S. cerevisiae) and Baba et al. 3 (E. coli). Fitness was Gaussian-smoothed using the same procedure as the CV data (Methods). C. To test for statistical associations between the function of genes and their position in the Crick space, we describe each gene by an angle $\theta$: $\theta = 0$ for genes with the lowest possible translation/transcription ratio, whereas genes that maximize the translation/transcription ratio have $\theta = \pi/4$. $\theta$ can be computed from a gene's $\beta_m$ and $\beta_p$ and from the maximal transcription and translation rates $\beta_m^{max}$ and $\beta_p^{max}$. D-G. Genes with different functions (GO categories) are found either close to or far from the boundary of the depleted region. In each panel, black dots represent the genes belonging to a specific group. H. Principal component analysis of gene expression in 859 cancer cell lines (gray dots) shows that HeLa cells cluster with ovarian cell lines (red dots), whereas 3T3 cells cluster with skin cell lines (blue dots). Ellipses represent the covariance of skin and ovarian cell lines at the 75% confidence level. I. 3T3 cells over-express Ras signaling genes, whereas HeLa cells over-express genes involved in nonsense-mediated decay and protein targeting to membrane (Gene Set Enrichment Analysis, see Methods).

Genes that are key to growth tend to have low $\beta_p/\beta_m$, whereas genes needed for stress response, survival and differentiation have high $\beta_p/\beta_m$. For each of the four organisms studied here, the table lists the GO categories that are significantly enriched (FDR < 0.01) at high and low $\beta_p/\beta_m$. The observation that respiration genes have high $\beta_p/\beta_m$ in S. cerevisiae could hint at a bet-hedging strategy, in which a fraction of the population prepares for growth in a resource-limited environment. Comparing 3T3 and HeLa cells, we find that genes involved in translation initiation, ATP binding, RNA binding and ubiquitin-dependent protein catabolism appear in both mouse and human cells (Supplementary Item 1). Other groups of genes are found in only one of the two cell lines. Such differences in GO terms are expected for cancer cell lines of different tissue origin: 3T3 cells cluster with skin cell lines, whereas HeLa cells cluster with ovarian cell lines (Fig. S5H, Methods). In addition, differences in the genes expressed by 3T3 and HeLa cells can explain differences in the genes with high and low $\beta_p/\beta_m$ (Methods). For example, 3T3 cells over-express Ras signaling genes (Fig. S5I). These genes have significantly low $\beta_p/\beta_m$ in 3T3 cells but not in HeLa cells (Supplementary Item 1). In contrast, HeLa cells over-express genes involved in nonsense-mediated decay and SRP-mediated protein targeting to membrane (Fig. S5I), which have low $\beta_p/\beta_m$ in HeLa but not in 3T3 cells (Supplementary Item 1). Therefore, the differences between human and mouse in GO terms with high and low $\beta_p/\beta_m$ can be explained by differences in the underlying biology of the cell lines.
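The shuffling test of Figure S2F-H can be sketched as a one-sided permutation test. This sketch assumes that all genes share the same mRNA and protein decay rates, so that protein abundance is proportional to $\beta_m \beta_p$; shuffling $\beta_p$ across genes and recomputing $\beta_m$ from each gene's abundance then preserves both marginal distributions named in the caption. As in Figure S2F, the boundary is a slope-1 line in log space with 99% of genes above it, and the depleted side is low $\beta_p/\beta_m$. The function name, its parameters, the fixed seed and the synthetic data are hypothetical; the Methods may treat gene-specific decay rates differently.

```python
import numpy as np


def depletion_test(beta_m, beta_p, n_shuffles=10_000, quantile=0.01, seed=0):
    """Permutation test for depletion of high-beta_m / low-beta_p genes (hypothetical).

    Boundary: log10(beta_p) = log10(beta_m) + b, with b chosen so that a
    fraction (1 - quantile) of genes lies above the line.
    Null model (assumes identical decay rates): permute beta_p across genes
    while keeping each gene's abundance ~ beta_m * beta_p fixed.
    Returns (observed count below boundary, shuffled counts, p-value).
    """
    rng = np.random.default_rng(seed)
    log_m, log_p = np.log10(beta_m), np.log10(beta_p)
    b = np.quantile(log_p - log_m, quantile)  # boundary intercept
    log_abundance = log_m + log_p             # up to a constant factor
    observed = np.sum(log_p - log_m < b)
    counts = np.empty(n_shuffles, dtype=int)
    for i in range(n_shuffles):
        log_p_shuf = rng.permutation(log_p)        # same marginal of beta_p
        log_m_shuf = log_abundance - log_p_shuf    # same abundance per gene
        counts[i] = np.sum(log_p_shuf - log_m_shuf < b)
    # One-sided: fraction of shuffles at least as depleted as the data.
    p_value = (1 + np.sum(counts <= observed)) / (n_shuffles + 1)
    return observed, counts, p_value


# Usage on synthetic data (hypothetical): genes with
# log10(beta_p / beta_m) < -1 are removed to mimic a depleted region.
rng = np.random.default_rng(1)
log_m = rng.uniform(0.0, 3.0, 5000)
log_p = rng.uniform(0.0, 3.0, 5000)
keep = log_p - log_m > -1.0
observed, counts, p_value = depletion_test(
    10 ** log_m[keep], 10 ** log_p[keep], n_shuffles=200
)
```

With $10^4$ shuffles, a result in which no shuffle is as depleted as the data gives $p = 1/(10^4 + 1) < 10^{-4}$, consistent with the bound quoted in the Figure S2H caption.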