Introduction

mRNA translation is, together with transcription, one of the pillars of the central dogma of molecular biology. Despite its key role in protein synthesis, a detailed understanding of its dynamics remains elusive, and the sequence determinants of mRNA translation efficiency are not fully understood1,2. Translation initiation regulates the recruitment of ribosomes and is believed to be modulated by mRNA secondary structures3,4, while elongation is mainly considered to be regulated by tRNA abundances, which set the pace of the ribosome5,6,7. The individual steps of translation are thought to be well understood, yet there is no reliable approach to quantitatively predict the overall protein synthesis rate of a given transcript.

A better understanding of the molecular mechanisms of mRNA translation will unravel the physiological determinants of translation efficiency. This knowledge will also be valuable for applications in synthetic biology, allowing tight control over both the average production of a protein and its expression noise.

The translation efficiency of a transcript is often identified with its experimentally observed polysome state (a polysome being a transcript carrying two or more ribosomes, and a monosome one carrying a single ribosome), on the assumption that transcripts with high ribosome density are translated more efficiently8. Remarkably, many experimental observations show that the ribosome density depends on the length L of the coding sequence (CDS): the longer the mRNA, the lower the ribosome density. This points to a length-dependent control of translation. As we show in Fig. 1, the strong anti-correlation between average ribosome densities ρ and CDS lengths L appears to be conserved across many organisms, ranging from unicellular systems such as L. lactis9, S. cerevisiae10,11,12 or P. falciparum13, to more complex organisms such as mouse and human cells14,15. The common traits in the density-length dependence suggest that this relationship is dictated by universal mechanisms underlying the translation process.

Figure 1

Ribosome density vs CDS length for different datasets. Blue diamonds (mice14) and fuchsia down-pointing triangles (P. falciparum13) are individual genes, while yellow circles (HEK293T15), green squares (S. cerevisiae10) and red triangles (S. cerevisiae11) are length-binned data for the entire genome, with error bars representing the standard deviation within each bin. The grey line indicates a power law with exponent −1.

However, this observation has been largely overlooked in the literature (with the exception of Guo et al.16), particularly in theoretical work aimed at modelling mRNA translation. The few hypotheses proposed to explain the length dependence of the ribosome density invoke a regulatory apparatus acting at the initiation17 or at the elongation12 stage, but they have not been examined with a mechanistic model and a mathematical approach. In contrast with previous studies, here we explain the relationship between ribosome density and CDS length both qualitatively and quantitatively, arguing that transcript length is a critical determinant of protein synthesis rates.

In Fig. 1 we show a log-log plot of measured ribosome densities as a function of CDS length for different organisms. The figure suggests a power-law behaviour (\(L^{-1}\) is drawn as a reference). However, extracting a scaling law from such data is a phenomenological (and probably too simplistic) description of this relationship, which moreover can only be measured over a few orders of magnitude in L.
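To make the phenomenological nature of such a fit concrete, the sketch below estimates a power-law exponent by ordinary least squares on log-transformed values. It is a minimal illustration only: the data are synthetic (generated with exponent −1 and log-normal scatter over two decades of L, an assumption chosen to mimic the accessible range), not the datasets of Fig. 1, and the helper name `loglog_slope` is hypothetical.

```python
import math
import random


def loglog_slope(lengths, densities):
    """Least-squares slope of log10(density) versus log10(length)."""
    xs = [math.log10(v) for v in lengths]
    ys = [math.log10(v) for v in densities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data only (NOT the datasets of Fig. 1): rho = 5 / L with
# log-normal scatter of 0.1 decades, for L spanning 10^2 to 10^4 codons.
rng = random.Random(0)
L_values = [10 ** rng.uniform(2.0, 4.0) for _ in range(500)]
rho_values = [5.0 / L * 10 ** rng.gauss(0.0, 0.1) for L in L_values]

slope = loglog_slope(L_values, rho_values)
print(f"fitted exponent: {slope:.3f}")
```

The fitted slope is close to −1 by construction here; with real data spanning only a few decades of L, such an exponent is sensitive to binning and scatter and says nothing about the underlying mechanism.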

Instead, in this paper we propose a mechanistic explanation for the length dependence of translation that is found in experimental data: we describe how the proximity of the mRNA ends increases the local concentration of ribosomes close to their binding sites via a feedback mechanism, favouring their recruitment (and recycling) in short transcripts. In the last part of this work we show how this mechanism could be exploited by the cell to adjust and balance its ribosomal resources.

Results

A stochastic model of translation

The translation of an mRNA is a three-step process, as sketched in Fig. 2A: during initiation, the ribosomal subunits are recruited and the full functional ribosome is assembled on the START codon, ready to translate the transcript; then the ribosome proceeds to elongation, assembling the protein amino acid by amino acid according to the nucleotide sequence of the mRNA; the ribosome eventually detaches when it reaches the STOP codon (termination).

Figure 2

Sketch of the translation process and models. The three steps of the translation process (A): initiation (approximated in the model by a one-step process with rate α), elongation and termination (β). In the standard exclusion process (B), particles enter the first site of the lattice with rate α, move from one site to the next with rate p (provided the next site is not occupied by another particle), and exit from the last site with rate β. In this study we consider a more refined version of the model (C) in which ribosomes cover \(\ell \) sites (codons) and advance one site at a time, and the unidimensional lattice is embedded in a three-dimensional environment. R represents the end-to-end distance between the 5′ and 3′ regions, and a is the radius of the reaction volume for initiation. The dashed grey line represents a possible diffusive trajectory of ribosomal subunits leaving the transcript and being re-absorbed (recycled) in the reaction volume around the first site of the lattice.

We model translating ribosomes as particles moving on a unidimensional discrete track of length L representing the mRNA, as depicted in Fig. 2B. Particles are injected at one end of the lattice (the 5′ end of the mRNA) with rate α, advance one site (one codon) with rate p only if the arrival site is empty, and are removed at the last site (STOP codon) with rate β. The first step mimics the recruitment of ribosomes (initiation), the dynamics of particles in the bulk represents elongation, and the exit of particles from the last site represents termination. MacDonald and coworkers introduced this class of models in the late 1960s precisely in an attempt to describe mRNA translation mathematically18. Since then, under the name of exclusion process, the model has been extended and thoroughly studied from a theoretical point of view; it has become an emblematic framework in out-of-equilibrium physics19, and an exact solution is known for its simplest formulation20.

In recent years, extensions of the exclusion process have been developed to provide more quantitative models of translation21,22,23,24, and many works are nowadays implicitly based on this framework25,26. Here we consider a variant of the exclusion process in which particles cover \(\ell =10\) sites of the track27, since a ribosome footprint spans around 28 nucleotides12, i.e. roughly 10 codons (see Fig. 2C). Details of the model and simulations can be found in the Materials and Methods section and in the Supplementary Material.

Translation efficiency corresponds to the ribosomal current

From the analytical solution of the exclusion process, or from Monte Carlo simulations, we can estimate the two quantities of interest for translation as functions of the initiation rate α, the termination rate β and the codon elongation rate p: the ribosome density ρ(α, β, p), defined as the average number of ribosomes N divided by the CDS length L, and the ribosomal current J(α, β, p), defined as the average number of ribosomes advancing one site per unit time. These quantities can be compared with experimental measurements of polysome profiles and protein production rates. The ribosomal current J in fact corresponds to the protein production rate per mRNA (proteins produced per unit time per mRNA), and we take it as a better descriptor of translation efficiency.
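The Monte Carlo route can be sketched as follows. This is a minimal illustration, not the simulation code described in Materials and Methods: it assumes a random-sequential update with L + 1 update attempts per unit time, labels each ribosome by its leftmost covered codon, lets ribosomes near the end extend virtually beyond the last codon, and uses α, β and p as acceptance probabilities (so they must not exceed 1, i.e. rates are expressed in units of a reference rate). The function name `simulate_ell_tasep` is hypothetical. It returns ρ = N/L and J measured as terminations per unit time.

```python
import random


def simulate_ell_tasep(L, alpha, beta, p=1.0, ell=10,
                       sweeps=10_000, burn_in=1_000, seed=0):
    """Monte Carlo estimate of (density, current) for an open exclusion
    process whose particles (ribosomes) each cover `ell` consecutive sites.

    Assumptions of this sketch: random-sequential update with L + 1 attempts
    per unit time; a particle is labelled by its leftmost covered site x and
    covers x .. x + ell - 1 (sites beyond L - 1 are virtual); alpha, beta and
    p are used as acceptance probabilities, so they must lie in [0, 1].
    """
    rng = random.Random(seed)
    head = [False] * L          # head[x]: some particle's leftmost site is x
    n_particles = 0
    exits = 0
    density_sum = 0.0
    for sweep in range(burn_in + sweeps):
        measuring = sweep >= burn_in
        for _ in range(L + 1):
            i = rng.randrange(-1, L)          # -1 encodes an initiation attempt
            if i == -1:
                # Initiation: the first ell sites must be free.
                if not any(head[:ell]) and rng.random() < alpha:
                    head[0] = True
                    n_particles += 1
            elif head[i]:
                if i == L - 1:
                    # Termination from the last site.
                    if rng.random() < beta:
                        head[i] = False
                        n_particles -= 1
                        if measuring:
                            exits += 1
                # Elongation: the site just ahead of the footprint must be free.
                elif (i + ell >= L or not head[i + ell]) and rng.random() < p:
                    head[i] = False
                    head[i + 1] = True
        if measuring:
            density_sum += n_particles / L
    # Density = mean particles per site; current = terminations per unit time.
    return density_sum / sweeps, exits / sweeps


rho, J = simulate_ell_tasep(L=100, alpha=0.1, beta=1.0, seed=1)
print(f"rho = {rho:.4f}, J = {J:.4f}")
```

With p = 1 and ℓ = 10 this run is expected to land close to the low-density mean-field values ᾱ/(1 + ᾱ(ℓ−1)) ≈ 0.053 and ᾱ(1−ᾱ)/(1 + ᾱ(ℓ−1)) ≈ 0.047 for ᾱ = 0.1; pure Python is slow, so long transcripts or many sweeps call for a compiled implementation.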

The same analytical solution also gives the dependence of current and density on the system’s parameters. The model’s behaviour has been extensively studied in the literature as a function of the dimensionless parameters \(\bar{\alpha }\equiv \alpha /p\) and \(\bar{\beta }\equiv \beta /p\), and different phases can be observed depending on their values (see Supplementary Material).

Since termination is not rate-limiting for translation, neither the translation efficiency nor the density depends on \(\bar{\beta }\): the system is in the so-called low-density phase, in which the density is always smaller than ~0.076, consistent with Fig. 1. From the analytical solutions one can show that in this regime the efficiency J and the density ρ depend only on \(\bar{\alpha }\equiv \alpha /p\) (see Supplementary Material). Hence \(\rho (\alpha ,\beta ,p)=\rho (\bar{\alpha })\) and \(J(\alpha ,\beta ,p)=J(\bar{\alpha })\).
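For extended particles the low-density phase is well described by refined mean-field expressions from the exclusion-process literature, \(\rho=\bar{\alpha}/[1+\bar{\alpha}(\ell-1)]\) and \(J/p=\bar{\alpha}(1-\bar{\alpha})/[1+\bar{\alpha}(\ell-1)]\), valid for \(\bar{\alpha}<\bar{\alpha}_c=1/(1+\sqrt{\ell})\). The sketch below evaluates them, assuming these are the expressions meant in the Supplementary Material; the function names are hypothetical.

```python
import math

ELL = 10  # ribosome footprint in codons, as in the text


def ld_boundary(ell=ELL):
    """Edge of the low-density phase: alpha_bar_c = 1 / (1 + sqrt(ell))."""
    return 1.0 / (1.0 + math.sqrt(ell))


def ld_density(a, ell=ELL):
    """Low-density phase: rho = a / (1 + a (ell - 1)), with a = alpha / p."""
    return a / (1.0 + a * (ell - 1))


def ld_current(a, ell=ELL):
    """Low-density phase: J / p = a (1 - a) / (1 + a (ell - 1))."""
    return a * (1.0 - a) / (1.0 + a * (ell - 1))


a_c = ld_boundary()
print(f"alpha_bar_c = {a_c:.4f}, rho_max = {ld_density(a_c):.4f}, "
      f"J_max/p = {ld_current(a_c):.4f}")
```

At the phase boundary, ℓ = 10 gives \(\bar{\alpha}_c\approx 0.240\) and ρ ≈ 0.076, which is the upper bound on the density quoted above; both ρ and J depend on α and p only through \(\bar{\alpha}\).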

To make the model more realistic and compare it with experimental datasets, we need to determine the initiation rate α. The translation initiation rate has previously been estimated for a few organisms24,28, and these studies found that it depends on transcript length: the longer the transcript, the weaker the initiation. Here we propose a model that explains this observation by coupling the translation process, and in particular initiation, to the three-dimensional conformation of the mRNA. We show how a feedback between the ribosomal current leaving the end of the mRNA and the initiation process, modulated by polysome compaction, can induce a length-dependent initiation rate and hence a length-dependent density. Before doing so, we introduce some basic properties of the transcript’s spatial conformation.

Transcript end-to-end distance depends on the polysome size

We consider the transcript as a polymer that assumes different spatial conformations and a characteristic 5′-3′ end-to-end distance R (see Fig. 2C). An undecorated mRNA (without ribosomes on it) can be considered as a polymer with a persistence length \({l}_{p}\simeq 1\) nm \(\simeq 1\) codon29, and the average end-to-end distance R can be estimated from basic principles of polymer physics. By assuming an underlying random walk one obtains that the end-to-end distance R depends on the length L of the mRNA as

$$R=\sqrt{2L{l}_{p}}\mathrm{.}$$
(1)

However, the stiffness conferred by the large size \(\ell \) of the ribosomes can drastically change the persistence length of the mRNA. We assume that the persistence length of an mRNA depends on its polysome state through an average of \(\ell \) (a typical ribosome footprint) and \(l_p\) (the persistence length of an empty mRNA), weighted by the fraction \(f=\rho \ell \) of the transcript covered by ribosomes at steady state. We thus write the effective persistence length of the mRNA as

$$\begin{array}{rcl}{l}_{{\rm{eff}}} & = & f\ell +\mathrm{(1}-f){l}_{p}\\ & = & {\ell }^{2}\rho +\mathrm{(1}-\rho \ell ){l}_{p},\end{array}$$
(2)

which equals \(l_p\) when the mRNA is empty and reaches \(\ell \) when the ribosome density attains its maximal value \(\rho =\mathrm{1/}\ell \). Substituting this effective persistence length into Eq. (1), we obtain the average end-to-end distance of a polysome as a function of the density ρ and the length L:

$$R=\sqrt{2L}{[{\ell }^{2}\rho +\mathrm{(1}-\rho \ell ){l}_{p}]}^{\mathrm{1/2}}\mathrm{.}$$
(3)

In this way, a coarse-grained model couples the state of the polysome to its spatial conformation. Intuitively, Eq. (3) means that a transcript carrying many ribosomes is more stretched (so that the distance R between its 5′ and 3′ ends is larger) than one with a low ribosome density or an empty mRNA. We have neglected the possible formation of secondary structures within the coding region. Such structures would only slightly reduce the effective length of the sequence (by a few codons), and we consider only translationally active transcripts, on which moving ribosomes (10–20 codons/s) continuously unfold them.

When initiation is limiting, the density ρ of ribosomes is fixed by \(\bar{\alpha }\), and via Eq. (3) we are hence able to determine the dependence of the end-to-end distance as a function of the initiation rate and the length of the transcript, \(R(\bar{\alpha },L)\).
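Eqs. (1)–(3) and the composite \(R(\bar{\alpha},L)\) translate directly into code. A minimal sketch, with all lengths in codon units (≈ 1 nm), ℓ = 10 and \(l_p=1\) as in the text, and assuming the low-density mean-field relation \(\rho(\bar{\alpha})=\bar{\alpha}/[1+\bar{\alpha}(\ell-1)]\), which is not written out here; the function names are hypothetical.

```python
import math

ELL = 10     # ribosome footprint (codons)
L_P = 1.0    # persistence length of a bare mRNA (~1 codon ~ 1 nm)


def effective_persistence_length(rho, ell=ELL, l_p=L_P):
    """Eq. (2): l_eff = ell^2 rho + (1 - rho ell) l_p, for 0 <= rho <= 1/ell."""
    if not 0.0 <= rho <= 1.0 / ell:
        raise ValueError("rho must lie in [0, 1/ell]")
    return ell ** 2 * rho + (1.0 - rho * ell) * l_p


def end_to_end_distance(rho, L, ell=ELL, l_p=L_P):
    """Eq. (3): R = sqrt(2 L l_eff); reduces to Eq. (1) when rho = 0."""
    return math.sqrt(2.0 * L * effective_persistence_length(rho, ell, l_p))


def end_to_end_from_initiation(a, L, ell=ELL, l_p=L_P):
    """R(alpha_bar, L), ASSUMING the low-density relation
    rho = a / (1 + a (ell - 1)) with a = alpha_bar."""
    rho = a / (1.0 + a * (ell - 1))
    return end_to_end_distance(rho, L, ell, l_p)


for a in (0.001, 0.01, 0.1):
    print(a, round(end_to_end_from_initiation(a, L=300), 1))
```

For a fixed L, raising \(\bar{\alpha}\) loads the transcript with ribosomes, stiffens it and increases R, which is the ingredient the feedback mechanism below relies on.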

Initiation can be enhanced by a length-dependent feedback mechanism

We assume that the initiation rate α is set by the concentration c of free ribosomal subunits through a first-order rate equation, \(\alpha ={\alpha }_{0}c\), where the initiation rate constant \({\alpha }_{0}\) depends, for instance, on the affinity between the ribosome and the binding site in the 5′UTR. Here c is the local concentration of subunits in the reaction volume of radius a around the ribosome binding site at the 5′ end of the transcript, whereas \({c}_{\infty }\) denotes the homogeneous concentration of free subunits far from the transcript. The local concentration c is increased by ribosomes that terminate translation at the 3′ end of the transcript and then diffuse into the reaction volume. The contribution \({\delta }_{R}\) of this feedback to c depends on the end-to-end distance R given by Eq. (3). Using typical transcript numbers and cytoplasmic volumes, one can show for several organisms that the average end-to-end distance is smaller than the average separation between transcripts. This supports the implicit assumption that the translation of distinct mRNAs proceeds without mutual interference. Each mRNA can thus be regarded as a sink-source system for ribosomes (the ribosome binding site acting as the sink and the termination site as the source) immersed in an environment with a constant background ribosomal concentration \({c}_{\infty }\). The local ribosome concentration around the ribosome binding site can then be written as \(c(R)={c}_{\infty }+{\delta }_{R}\).

Treating ribosomes as freely diffusing particles when they are not bound to the mRNA (see the Supplementary Material), we obtain a mathematical expression for the initiation rate as a function of the system’s parameters. In particular, the concentration increase due to the source at the 3′ end of the transcript scales as \({\delta }_{R}\sim J/(DR)\), where D is the diffusion coefficient of ribosomes (~0.04 μm²/s30). Taking a protein production rate per transcript J ~ 0.1–10 proteins/s24,31 and a concentration of free ribosomes in E. coli32 of \({c}_{\infty }\sim 0.5\text{–}1\cdot {10}^{3}\) ribosomes/μm³, this back-of-the-envelope calculation gives \({\delta }_{R}/{c}_{\infty }\sim 0.1\text{–}10\) for a typical mRNA of ~300 codons. The concentration increase is therefore not negligible compared with the bulk ribosomal concentration and can exceed it, so the proposed mechanism is relevant in the biological regime.
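The order-of-magnitude estimate can be reproduced in a few lines. A minimal sketch, assuming the end-to-end distance of an empty 300-codon mRNA from Eq. (1) with \(l_p\) = 1 codon ≈ 1 nm (Eq. (3) would give a somewhat larger R for a loaded transcript), and dropping geometric prefactors as the text does; the helper name `delta_R` is hypothetical.

```python
import math

D_UM2_PER_S = 0.04        # ribosome diffusion coefficient quoted in the text
NM_PER_CODON = 1.0        # rough conversion used in the text


def delta_R(J_per_s, R_um, D=D_UM2_PER_S):
    """Scaling estimate delta_R ~ J / (D R), in ribosomes per um^3.
    Geometric prefactors are dropped, as in the text; a point source in
    free space would add a factor 1 / (4 pi)."""
    return J_per_s / (D * R_um)


L = 300                                                   # codons
R_um = math.sqrt(2 * L * 1.0) * NM_PER_CODON * 1e-3       # Eq. (1), l_p = 1 codon; nm -> um
ratio_low = delta_R(0.1, R_um) / 1.0e3    # slow production, high background c_inf
ratio_high = delta_R(10.0, R_um) / 0.5e3  # fast production, low background c_inf
print(f"R = {R_um * 1e3:.1f} nm, delta_R / c_inf ~ {ratio_low:.2f} .. {ratio_high:.1f}")
```

With R ≈ 24 nm the ratio spans roughly 0.1 to 20, the same order of magnitude as the range quoted above; including the 1/(4π) prefactor would lower both ends by about an order of magnitude.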

Regarding translation as a steady state process with ribosomal density and current values given by ρ and J respectively, we find the initiation parameter \(\bar{\alpha }\) (see Supplementary Material for a complete derivation):

$$\bar{\alpha }={\bar{\alpha }}_{0}({c}_{\infty }+{\delta }_{R})={\bar{\alpha }}_{\infty }+\lambda \frac{J(\bar{\alpha })}{R(\bar{\alpha },L)},$$
(4)

where we have made explicit the dependence of the ribosomal current J on \(\bar{\alpha }\), and where comparison of the two expressions gives \({\bar{\alpha }}_{\infty }={\bar{\alpha }}_{0}{c}_{\infty }\) and \(\lambda J/R={\bar{\alpha }}_{0}{\delta }_{R}\). The end-to-end distance R likewise depends on \(\bar{\alpha }\) and L, as shown in Eq. (3). We stress that the parameter \({\bar{\alpha }}_{0}\), and thus \({\bar{\alpha }}_{\infty }\) and λ, depend on the binding between the ribosome and the mRNA, which is thought to be regulated mainly by mRNA secondary structures. We take the parameters of the model to be independent of the transcript length. This is justified by the negligible correlation (Pearson r = −0.01, p-value 0.5)33 between the free energies of secondary structures in the 5′UTRs and the transcript length L (see also Figure S6 in the Supplementary Material). The parameter \({\bar{\alpha }}_{\infty }\) is dimensionless, and \({\alpha }_{\infty }=p{\bar{\alpha }}_{\infty }\) is the initiation rate in the absence of feedback between the current and initiation, or equivalently when the end-to-end distance R is large enough for this mechanism to be negligible. The parameter λ characterises the strength of the feedback. It has the dimensions of a length and corresponds to the typical separation between the 5′ and 3′ ends below which the feedback becomes relevant. We measure λ in units of codon length, roughly 1 nm.

Consequently, the current of ribosomes leaving the end of a transcript increases the concentration of ribosomal subunits around their binding region; because the ribosome load modulates the mRNA stiffness, this feedback leads to initiation rates that are strongly length-dependent. Equation (4) is an implicit equation that can be solved numerically for the initiation rate, giving the density \(\rho (\bar{\alpha })\) and the current as functions of L. To check the validity of our analytical calculations, we also developed a simulation scheme in which the initiation rate is fixed self-consistently (see Materials and Methods).
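One way to solve Eq. (4) numerically is bisection inside the low-density phase. This is a minimal sketch, not the authors’ solver. It assumes that J in Eq. (4) is the current in units of p, that R and λ are in codon units, that the low-density mean-field expressions for extended particles hold, and it uses \({\bar{\alpha }}_{\infty }=4.7\cdot 10^{-3}\) and λ = 7 codons purely as illustrative inputs of the order of the fit reported in Fig. 5B. Function names are hypothetical.

```python
import math

ELL, L_P = 10, 1.0      # footprint and bare persistence length (codons)


def ld_density(a, ell=ELL):
    return a / (1.0 + a * (ell - 1))


def ld_current(a, ell=ELL):
    # Current in units of p (assumed to be the J entering Eq. (4)).
    return a * (1.0 - a) / (1.0 + a * (ell - 1))


def end_to_end(a, L, ell=ELL, l_p=L_P):
    # Eqs. (2)-(3) evaluated at the low-density density rho(a).
    rho = ld_density(a, ell)
    return math.sqrt(2.0 * L * (ell ** 2 * rho + (1.0 - rho * ell) * l_p))


def solve_initiation(alpha_inf, lam, L, ell=ELL, tol=1e-12):
    """Solve a = alpha_inf + lam * J(a) / R(a, L)  (Eq. (4)) by bisection
    on [alpha_inf, a_c], a_c = 1 / (1 + sqrt(ell)) being the edge of the
    low-density phase."""
    def f(a):
        return a - alpha_inf - lam * ld_current(a, ell) / end_to_end(a, L, ell)

    lo, hi = alpha_inf, 1.0 / (1.0 + math.sqrt(ell))
    if not 0.0 <= lo < hi or f(hi) < 0:
        raise ValueError("no solution inside the low-density phase")
    # Invariant: f(lo) <= 0 <= f(hi), since the feedback term is non-negative.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative parameters of the order of the Fig. 5B fit (assumption).
ALPHA_INF, LAM = 4.7e-3, 7.0
for L in (100, 1_000, 10_000):
    a = solve_initiation(ALPHA_INF, LAM, L)
    print(L, f"alpha_bar = {a:.5f}", f"rho = {ld_density(a):.5f}")
```

Short transcripts end up with a larger effective \(\bar{\alpha}\) than \({\bar{\alpha }}_{\infty }\), whereas for long ones the feedback term fades and the density approaches its feedback-free value, which is the length dependence the model is meant to capture.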

Although Eq. (4) treats the ribosome as a single diffusing particle, the diffusion of the two ribosomal subunits can also be considered explicitly. This generates a dependence on \(R^2\) instead of R in Eq. (4). However, our results do not change qualitatively, and for the sake of simplicity we present the outcomes of the theory described by Eq. (4). The analysis of this more refined model is included in the Supplementary Material (see also Figures S9 and S12).

Experimental density-length dependence emerges from enhanced initiation

We then compare the outcome of the model with measured ribosome densities. As shown in Fig. 3, the predicted ribosome density agrees quantitatively with the experimental data: our mechanistic model, based on basic physical principles, captures the length dependence of the ribosome load and thus explains the observed length dependence of the translational properties.

Figure 3

Comparison between theory and experimental ribosome densities in yeast (A,B)10,11 and human embryonic kidney cells (C)15. The symbols and datasets correspond to those of Fig. 1. The grey lines represent the best fit of the model (the parameter values are given in each panel), while the shaded areas correspond to the regions spanned within the margin of error of the estimated \({\bar{\alpha }}_{\infty }\). Orange circles are the outcome of stochastic simulations, used to test the numerical solution of the equation with the parameters obtained from the best fit.

We fit the expression of \(\rho (\bar{\alpha })\) (continuous grey lines in Fig. 3) to obtain the two parameters \({\bar{\alpha }}_{\infty }\) and λ for three available datasets (two yeast datasets10,11 and a human embryonic kidney cell dataset15), and then check the accuracy of the solution with the stochastic simulation scheme described in the Materials and Methods section. Within the standard errors of the parameter estimates there is no significant deviation between the data and the fitted model (details can be found in the Materials and Methods). However, for the HEK293T dataset the solution depends more strongly on the parameter \({\bar{\alpha }}_{\infty }\): the shaded regions in Fig. 3 show the solution within the standard error of this parameter. This could explain the slight deviation between theory and experimental data for long mRNAs. We also emphasise that \({\bar{\alpha }}_{\infty }\), which depends on the global availability of ribosomes, is expected to be the parameter most affected by experimental variations (for instance growth-rate dependence or cell-cycle stage). The estimate of \({\bar{\alpha }}_{\infty }\) is more accurate when the diffusion of the two subunits is taken into account (see Supplementary Material).

Our simulations also yield the number of ribosomes bound to each transcript, from which we can compute the monosome:polysome ratio, which is also subject to length effects (see Supplementary Material). As the CDS length increases, the monosome:polysome ratio decreases, following a power-law-like behaviour. A recent work34 has identified, by combining polysome and ribosome profiling techniques, the amount of active monosomes, i.e. mRNAs with only one ribosome elongating the protein. Consistent with our findings, the monosome:polysome ratio there also shows a marked anti-correlation with mRNA length.

Ribosome drop-off cannot alone be responsible for the density-length dependence

In this section we study the consequences of ribosome drop-off35,36, thus far neglected in our model, on the observed density-length dependence. To this end, we performed simulations including ribosome drop-off at a rate \(\delta ={10}^{-3}\) s\(^{-1}\). This value follows from the estimated drop-off probability of order \({10}^{-4}\) per codon35,36 multiplied by the codon elongation rates considered in this paper, of order 10 codons s\(^{-1}\). Figure 4 shows that simulations of the process with drop-off (continuous lines) do not differ much from the model without drop-off (dashed lines); the deviations become relevant only for long sequences (of order \({10}^{4}\) codons). For such lengths, the extended model actually reproduces the experimental behaviour even better than the model without drop-off.
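The arithmetic behind δ, and why drop-off matters only for very long transcripts, can be made explicit. A minimal sketch, assuming drop-off events are independent with a fixed probability per codon (a simplification of the simulated process); the helper name `completion_probability` is hypothetical.

```python
import math

DROP_PER_CODON = 1e-4     # order of magnitude quoted in the text
ELONGATION_RATE = 10.0    # codons per second, as in the text

# Per-ribosome drop-off rate in time units: probability per codon x codons per second.
delta = DROP_PER_CODON * ELONGATION_RATE   # ~1e-3 s^-1


def completion_probability(L, q=DROP_PER_CODON):
    """Probability that a ribosome translates L codons without dropping off,
    assuming independent drop-off with probability q at each codon."""
    return (1.0 - q) ** L


for L in (300, 1_000, 10_000):
    print(L, round(completion_probability(L), 3))
```

Under this assumption roughly 97% of ribosomes complete a 300-codon CDS, but only about 37% complete a \(10^4\)-codon one, consistent with drop-off effects appearing only at lengths of order \(10^4\) codons.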

Figure 4

Model with ribosome drop-off. Green lines correspond to the Arava dataset10 and red lines to the Mackay dataset11. Symbols of experimental points in yeast correspond to those of Fig. 1, while dashed lines represent the solutions of the model described in the previous section, with the same parameters used in panels A and B of Fig. 3. The continuous lines are the outcome of simulations of our model including ribosome drop-off at a rate of \({10}^{-3}\) s\(^{-1}\), and the grey line shows the outcome of simulations with drop-off only.

To further exclude the possibility that the length dependence arises from ribosomes prematurely leaving the transcript, we simulated ribosome drop-off during translation without the feedback mechanism we propose (essentially, a standard TASEP with particle detachment rate δ). The result is represented by the grey line in Fig. 4. We could not obtain any length dependence for biologically relevant values of the drop-off rate (see Supplementary Figure S10), and we therefore conclude that the behaviour observed in Fig. 1 cannot originate from ribosome drop-off alone.

mRNA circularisation does not change the phenomenology of the model

The 5′ and 3′ ends of eukaryotic mRNAs interact with each other via protein-protein interactions, for instance between the poly(A)-binding protein PABP and the initiation factor eIF4F bound at the 5′ cap; this coupling is believed to induce the formation of circular transcript structures37. However, depending on the energies at play, the transcript dynamically switches between an open, linear state and a circularised state (see Fig. 5A). In the circularised state, the end-to-end distance R is only a few nanometers (the size of the two molecular partners presumably involved in this interaction). We denote by d the distance between the 5′ and 3′ ends in the circularised state. The mRNA is found in its circularised state with probability \(P_c\) and in an open state with probability \(P_o=1-P_c\), with a free-energy difference between the two states \(\Delta G=G_c-G_o\). The free-energy difference depends on the physical parameters of the model, such as \(l_{\rm{eff}}\), L, ε and d (see Eq. (S19)); how ΔG depends on these parameters is briefly discussed in the Supplementary Material.

Figure 5

Model with mRNA circularisation. Sketches of the two possible mRNA conformations, open and circularised, whose transitions depend on the free-energy gap ΔG (A). In blue and yellow we represent the interacting proteins (e.g. PABP and eIF4F) bound at the 3′ and 5′ ends; the black line is the end-to-end distance, which equals d in the circularised state. We fixed d = 5 nm in our calculations. The ribosome density computed taking mRNA circularisation into account (continuous line) is compared to experimental data (triangles, cf. symbols used in Fig. 1) and to the previous model neglecting circularisation (dashed line) (B). The fitted parameters are \({\bar{\alpha }}_{\infty }\mathrm{=(4.7}\pm \mathrm{0.6)}\,{10}^{-3}\) s−1, λ = 7.0 ± 0.6 nm and ε = −8.3 ± 0.4 \(k_BT\). End-to-end distance (blue curve, right axis) and calculated probabilities \(P_c\) and \(P_o=1-P_c\) as a function of the CDS length L (C).

To find how the average end-to-end distance is affected by this interaction we weight its value in the circularised and open state with their respective probabilities:

$${R}_{{\rm{circ}}}={P}_{o}R+{P}_{c}d\,\mathrm{.}$$
(5)

If \(P_o\approx 1\), our feedback model is a good approximation of the mRNA translation process. While in prokaryotes we can likely assume \(P_o=1\), this is probably an oversimplification for eukaryotes. To account for transcript circularisation, we now need to compute how \(P_o\) depends on the CDS length L and on the interaction energy ε (in \(k_BT\) units) between the two ends. The same intuition we used to explain differences in the local concentration of ribosomes close to the 5′ end applies to \(P_c\): for a fixed ε, short mRNAs should be circularised with probability \(P_c\) close to one, whereas very long transcripts should hardly ever be found in the circularised state. This length dependence also contributes to the ribosome density behaviour observed in experiments (Fig. 1). Computing \(P_c\) as a function of L, \(l_{\rm{eff}}\) and ε, we obtain:

$${P}_{c}=\frac{1}{1+{e}^{{\rm{\Delta }}G/{k}_{B}T}}=\frac{1}{1+[{(\frac{{l}_{{\rm{e}}{\rm{f}}{\rm{f}}}L}{{d}^{2}})}^{\frac{3}{2}}\sqrt{\frac{4\pi }{3}}-1]\,{e}^{(\frac{2{\pi }^{2}{l}_{{\rm{e}}{\rm{f}}{\rm{f}}}}{L}+\varepsilon )}}\,.$$
(6)

The details of the calculation of \(P_c\) can be found in the Supplementary Material; here ε is treated as a fitting parameter. Inserting Eq. (6) into Eq. (5) and substituting the resulting end-to-end distance into Eq. (4), we obtain, now as a function of \({\bar{\alpha }}_{\infty }\), λ and ε, how the initiation rate is affected by the increased concentration of ribosomes in the 5′ reaction volume when transcript circularisation is also taken into account. We then fitted \(\rho (\bar{\alpha })\) to the same dataset used previously; the outcome is shown in Fig. 5B (fitted values and other datasets can be found in the Supplementary Material). We did not find a significant difference from the best fit of the simpler model (dashed line in Fig. 5B). In Fig. 5C we show how the probabilities of finding a circularised or open mRNA, together with the end-to-end distance R, depend on transcript length. The results agree with our intuitive explanation.
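Eqs. (5) and (6) can be evaluated directly. A minimal sketch, with lengths in codon units (≈ nm), d = 5 nm and ε = −8.3 \(k_BT\) taken from Fig. 5, and a fixed effective persistence length \(l_{\rm eff}=1.5\) codons chosen purely for illustration (in the full model \(l_{\rm eff}\) depends on ρ through Eq. (2)). Function names are hypothetical. Eq. (6) only yields \(P_c\le 1\) when its bracket is non-negative, so the sketch rejects very short transcripts.

```python
import math

D_CIRC = 5.0     # 5'-3' distance in the circularised state (nm), as in Fig. 5
EPS = -8.3       # interaction energy in k_B T, fitted value reported in Fig. 5B


def prob_circularised(L, l_eff, eps=EPS, d=D_CIRC):
    """Eq. (6): probability that the transcript is circularised.
    Lengths in codon units (~nm)."""
    bracket = (l_eff * L / d ** 2) ** 1.5 * math.sqrt(4.0 * math.pi / 3.0) - 1.0
    if bracket < 0:
        # Outside this range Eq. (6) would give P_c > 1.
        raise ValueError("Eq. (6) requires (l_eff L / d^2)^(3/2) sqrt(4 pi/3) >= 1")
    return 1.0 / (1.0 + bracket * math.exp(2.0 * math.pi ** 2 * l_eff / L + eps))


def circularised_end_to_end(L, l_eff, eps=EPS, d=D_CIRC):
    """Eq. (5): R_circ = P_o R + P_c d, with R = sqrt(2 L l_eff) as in Eq. (3)."""
    p_c = prob_circularised(L, l_eff, eps, d)
    return (1.0 - p_c) * math.sqrt(2.0 * L * l_eff) + p_c * d


L_EFF = 1.5   # hypothetical effective persistence length (codons), illustration only
for L in (100, 1_000, 10_000):
    print(L, round(prob_circularised(L, L_EFF), 3),
          round(circularised_end_to_end(L, L_EFF), 1))
```

With these inputs short transcripts are almost always circularised, while \(P_c\) drops well below one at \(10^4\) codons, reproducing the qualitative trend described for Fig. 5C.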

Length-dependent competition for resources

In this section we speculate on potential applications of our model related to bacterial growth laws38,39. Specifically, we show that cells could adjust the relative expression of their genes at different ribosome concentrations through a length-discriminating mechanism. This result leads us to propose a new mechanism for the regulation of gene expression.

We observe that changes in ribosome densities (or in translation efficiency) induced by changes in the ribosomal pool depend on mRNA length (Fig. 6A). Short transcripts are less affected by the amount of available ribosomes than long ones, suggesting a possible mechanism to regulate the relative protein production rate at different growth rates based on transcript length alone. As a proof of principle, in Fig. 6B we plot the relative expression \(\eta ({L}_{1},{L}_{2})={J}_{{L}_{1}}/{J}_{{L}_{2}}\) between the translation efficiencies of transcripts with lengths \(L_1\) and \(L_2\). When ribosomes are limiting, short transcripts are translated relatively more than long ones.
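The effect of the ribosomal pool on η can be illustrated with the same ingredients. A minimal sketch under the assumptions already used for Eq. (4): low-density mean-field current in units of p, lengths in codons, and illustrative inputs \({\bar{\alpha }}_{\infty }=4.7\cdot 10^{-3}\), λ = 7 of the order of the Fig. 5B fit. Since \({\bar{\alpha }}_{\infty }={\bar{\alpha }}_{0}{c}_{\infty }\) while λ does not involve \(c_\infty\), changing the pool is modelled by rescaling \({\bar{\alpha }}_{\infty }\) alone. Function names are hypothetical.

```python
import math

ELL, L_P = 10, 1.0


def current(a):
    """Low-density current in units of p (mean-field, extended particles)."""
    return a * (1.0 - a) / (1.0 + a * (ELL - 1))


def end_to_end(a, L):
    """Eqs. (2)-(3) evaluated at the low-density density rho(a)."""
    rho = a / (1.0 + a * (ELL - 1))
    return math.sqrt(2.0 * L * (ELL ** 2 * rho + (1.0 - rho * ELL) * L_P))


def solve_initiation(alpha_inf, lam, L):
    """Bisection for Eq. (4) inside the low-density phase."""
    lo, hi = alpha_inf, 1.0 / (1.0 + math.sqrt(ELL))

    def f(a):
        return a - alpha_inf - lam * current(a) / end_to_end(a, L)

    if not (0.0 <= lo < hi and f(hi) >= 0):
        raise ValueError("no low-density solution")
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)


def relative_expression(L1, L2, c_scale, alpha_inf=4.7e-3, lam=7.0):
    """eta(L1, L2) = J_L1 / J_L2 after multiplying c_inf by c_scale.
    alpha_bar_inf = alpha_bar_0 * c_inf scales with c_inf; lam does not."""
    a_inf = alpha_inf * c_scale
    return current(solve_initiation(a_inf, lam, L1)) / current(solve_initiation(a_inf, lam, L2))


for scale in (0.5, 1.0, 2.0):
    print(scale, round(relative_expression(100, 2500, scale), 3))
```

In this sketch η(100, 2500) stays above 1 and shrinks as the pool grows, the qualitative trend of Fig. 6B; the numbers depend on the assumed parameters and should not be read as predictions.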

Figure 6

Effects of competition for resources (ribosomes) on the protein production rate. The ribosome density depends on the overall ribosome concentration \({c}_{\infty }\). We show the ribosome density as a function of transcript length for different concentrations \({c}_{\infty }\) of available ribosomes (A). The curve labelled \({c}_{\infty }\) in the legend is built from the same parameters as Fig. 3A; \({c}_{\infty }\) is varied as described in the legend for the other curves. Short transcripts are less affected by changes in \({c}_{\infty }\), as also shown in (B), where we plot the relative expression η of transcripts (defined in the text) as a function of \({c}_{\infty }\) for three transcript lengths (here L = 100, 500 and 2500). According to these results, short ribosomal proteins should be relatively more expressed than other types of proteins under strong ribosome competition.

This behaviour has an intuitive explanation. Initiation on long mRNAs relies mainly on the concentration of free ribosomes \({c}_{\infty }\), so long mRNAs are strongly affected by changes in the ribosomal pool. Short transcripts, instead, can take greater advantage of the length-dependent contribution \({\delta }_{R}\) to initiation.

This constitutes a potential mechanism for regulating the relative expression of short and long genes. Ribosomal proteins are mostly short and, as a consequence, our theory predicts, at least qualitatively, that they should be translated proportionately more efficiently than longer proteins under strong ribosome competition, i.e. at low growth rates. This difference should decrease as ribosomal resources become less limiting, and in the limit of unlimited ribosomal resources η should tend to 1, since the length dependence becomes less and less relevant.

Our model suggests a translational mechanism for overexpressing short proteins (e.g. ribosomal proteins) at the expense of longer ones.

Discussion

Many aspects of translation still puzzle researchers. Theoretical approaches propose design principles for modulating translation efficiency at the level of initiation4 and of elongation24, mainly based on the role of RNA secondary structures, codon bias or amino acid properties. However, when tested on synthetic constructs, the hypotheses underlying these models are often contradicted8, so that the identification of transcript-dependent determinants of translation efficiency is still debated in the literature. In this work we have identified and studied another factor modulating translation efficiency: the length of the transcript.

According to our results, the proximity of the 3′ end to the ribosome recruitment site of the mRNA could induce a feedback in the translation process that favours the translation of short transcripts over long ones, as observed experimentally over the last decades. We connected the emergence of the ribosome density-mRNA length dependence shown in Fig. 1 to a mechanistic model built on basic physical principles and on the properties of the translation process. Our theory thus establishes a link between densities and mRNA lengths and reproduces experimental data with excellent agreement, without invoking an evolutionary selection of genes based on their length. Admittedly, a selection process favouring short, efficient genes to improve cellular fitness could also be conjectured40,41. Without direct experimental observation we cannot rule out this hypothesis, although it would not explain why the same behaviour is observed in different organisms (the experimental data in Fig. 1 seem to collapse onto a single universal curve). Moreover, the poor correlation between the free energy of secondary structures at the 5′ end of an mRNA and CDS length indicates that binding sites are not significantly weaker for long genes, as an evolutionary argument would require. Our theory instead predicts, as an outcome, that short genes have larger initiation rates than long ones. A recent publication by Li et al.42 confirms that transcript length is a major determinant of translation.

Experimental results in fact suggest that translation initiation also depends on mRNA length10,17. Since the ribosome recruitment rate must depend on the local concentration of ribosomal subunits around the 5′UTR, we conjecture that this local concentration is modulated by the CDS length via a feedback mechanism coupling protein synthesis and initiation rates. We loosely call this process “recycling”: subunits terminating translation increase the local concentration δ R and can thus be more easily re-used, as sketched in Fig. 2. This coarse-grained physical model allowed us to reproduce the scaling behaviour of experimental ribosome densities (Fig. 3) and, to our knowledge, constitutes the first quantitative explanation of how translational features are affected by transcript length. Previous works have studied the effect of particle recycling43,44,45,46 but, with the exception of Chou (2003)43, they do not explicitly compute how the recycling term depends on the end-to-end distance R.
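A back-of-the-envelope estimate shows why a recycling contribution should decay with the distance between the two ends of the transcript. This is a sketch under assumptions not stated in this section, and not necessarily the exact form of Eq. (4): subunits released at the stop codon at rate J (the translation current) diffuse freely in three dimensions with a diffusion coefficient D, the concentration of free subunits far from the transcript is c, and nothing else absorbs them. In steady state the concentration around such a point source obeys a Poisson equation, so at the recruitment site, a distance R away, the local concentration is

```latex
D\,\nabla^{2} c(\mathbf{r}) = -J\,\delta^{3}(\mathbf{r}),
\qquad c(r)\to c \ \text{as}\ r\to\infty
\quad\Longrightarrow\quad
c(r) = c + \frac{J}{4\pi D\,r},
\qquad
c_{\mathrm{loc}} = c(R) = c + \frac{J}{4\pi D\,R}.
```

The second term follows from \(\nabla^{2}(1/r) = -4\pi\,\delta^{3}(\mathbf{r})\). If the recruitment rate is taken to be proportional to \(c_{\mathrm{loc}}\), the length-dependent part scales as J/R; since R grows with L, short transcripts benefit more, and this part dominates when c is small.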

The compaction of the transcript (here characterised by its end-to-end distance) also depends on its polysome state. Intuitively, an mRNA with many translating ribosomes will be more stretched than an empty one. We captured this feature by making the end-to-end distance depend on the ribosome density through Eq. (3). We emphasise that this is the simplest choice for coupling elongation properties to the three-dimensional conformation of the mRNA, and more complicated relationships linking end-to-end distance, ribosome density and elongation rate could be introduced; here we wanted to show, as a proof of principle, that a feedback mechanism enhancing initiation can reproduce experimental data very well.

We have also studied how the results change when the diffusion of the two ribosomal subunits is considered explicitly, and found no significant change (see Supplementary Material). To make the model more realistic we considered two further extensions: (i) ribosome drop-off and (ii) mRNA circularisation. The former improved the agreement between data and theory for long transcripts only (see Fig. 4), while the latter did not qualitatively change the model’s outcomes (Fig. 5). This suggests that ribosome recycling, as considered in the basic model, is the fundamental element giving rise to length-dependent translation.

We have then speculated on how length could be exploited to create differences in the relative expression of genes at different growth rates, here used as a measure of the abundance of free ribosomes. When resources are constrained, i.e. when the amount of free ribosomes c is small, competition for resources might become relevant47,48,49 and our theory predicts that the length-dependent term of the initiation rate dominates. In other words, when ribosomes are strongly limiting, ribosome recruitment is mainly due to recycled ribosomes, so short transcripts can better capitalise on the available resources. This mechanism could also be a way to translationally favour the production of ribosomal proteins (which are short) when ribosomes are scarce. In formulating this hypothesis we neglect the known mechanisms of ribosome biogenesis, a complex process beyond the scope of this work. Our conjecture should therefore be interpreted for ribosomal concentrations fixed by a given level of ribosome production (established, for instance, by the richness of the growth medium): for a given concentration of ribosomes, we make strong predictions on how the ribosomal pool should be partitioned among different transcripts by a length-discrimination mechanism alone.

The model could be further extended to the translation of bacterial operons: in this case one transcript contains several sinks (ribosome binding sites) and sources (stop codons) of ribosomes, whereas here we have discussed a transcript translating a single gene (with one ribosome binding site and one stop codon). An operon translating several genes would increase the complexity of the feedback term and could, in principle, give rise to counter-intuitive phenomenology.

We compared our findings to experimental ribosomal densities, and our framework quantitatively reproduces the measurements. We have also used our model to estimate the protein production rates of synonymous genes, with success (see Figure S13). In this work we have mainly used the model to predict the ribosome density-length dependence, i.e. we have emphasised the dependence of Eq. (4) on L, but our theory also predicts a feedback between elongation (codon usage) and initiation, which we exploited in Figure S13. To further test the model and the relevance of the proposed regulatory mechanism, it will be necessary to fuse a reporter gene with peptides of variable lengths and measure ribosome density or translation efficiency, with the aim of experimentally reproducing Fig. 3 in a controlled manner. In doing so, one should pay attention to changes in mRNA degradation, translation elongation and initiation induced by the added nucleotides coding for the fused peptides. Figure 6B also points to a good way to test our hypotheses: one of our predictions is the relative change in expression of short versus long transcripts when the cellular growth rate changes. This could be tested by concurrently expressing two reporter genes of different lengths and measuring their relative expression at different growth rates, obtained by changing the growth medium or by adding different antibiotics.

Methods

The exclusion process

We base our model on the exclusion process, which is also introduced in the Results section and in Fig. 2B. More precisely, this model is known in the literature as TASEP (Totally Asymmetric Simple Exclusion Process), for which a plethora of extensions now exists, applied in many different fields from vehicular traffic to intracellular transport19. Each site of the track in Fig. 2B corresponds to a codon, and particles advance from site to site, provided that the next site is not occupied by another particle, mimicking the elongation process. We give a thorough description of the exclusion process and the known results in the Supplementary Material.

To simulate the dynamics of the exclusion process we used a kinetic Gillespie-like Monte Carlo algorithm, as in Ciandrini et al.24.
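A minimal sketch of such a simulation, using the Gillespie direct method for an open ℓ-TASEP with a constant initiation rate α (i.e. without the recycling feedback). Several simplifications are assumptions of this sketch, not the authors' implementation: all codons share one elongation rate γ (the model of Ciandrini et al.24 allows codon-specific rates), ribosomes are tracked by their A-site position, and two ribosomes must be at least ℓ codons apart. The function name `simulate_tasep` and its parameters are hypothetical.

```python
import random


def simulate_tasep(L, ell, alpha, beta=1.0, gamma=1.0,
                   t_burn=1000.0, t_max=20000.0, seed=0):
    """Gillespie (direct method) simulation of an open l-TASEP.

    Ribosomes are tracked by their A-site position p in 1..L; two ribosomes
    at p < q must satisfy q - p >= ell (footprint of ell codons).
    Returns (rho, J): time-averaged ribosomes per codon and exits per unit
    time, both measured over the window [t_burn, t_max].
    """
    rng = random.Random(seed)
    pos = []          # A-site positions, sorted from 5' (small) to 3' (large)
    t = 0.0
    occupancy = 0.0   # time integral of the number of bound ribosomes
    exits = 0
    while True:
        # Collect every currently allowed event together with its rate.
        events = []
        if not pos or pos[0] > ell:             # first ell codons are free
            events.append(("init", 0, alpha))
        for k, p in enumerate(pos):
            if p == L:                           # A-site on the last codon
                events.append(("term", k, beta))
            elif k + 1 == len(pos) or pos[k + 1] - p > ell:
                events.append(("hop", k, gamma))  # advance by one codon
        total = sum(rate for _, _, rate in events)
        t_next = t + rng.expovariate(total)
        # Time-weighted ribosome count inside the measurement window.
        overlap = min(t_next, t_max) - max(t, t_burn)
        if overlap > 0:
            occupancy += len(pos) * overlap
        if t_next > t_max:
            break
        t = t_next
        # Pick one event with probability proportional to its rate.
        r = rng.random() * total
        for kind, k, rate in events:
            r -= rate
            if r <= 0.0:
                break
        if kind == "init":
            pos.insert(0, 1)
        elif kind == "hop":
            pos[k] += 1
        else:
            pos.pop(k)
            if t >= t_burn:
                exits += 1
    window = t_max - t_burn
    return occupancy / (window * L), exits / window
```

For example, `simulate_tasep(L=100, ell=10, alpha=0.05)` should return values close to the known low-density mean-field results of the ℓ-TASEP with unit elongation rate, ρ = α/(1 + α(ℓ − 1)) and J = α(1 − α)/(1 + α(ℓ − 1)), up to statistical and finite-size deviations.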

Fitting and numerical solutions

We substitute the expression for \(\bar{\alpha }\), Eq. (4), into the equation for the density \(\rho (\bar{\alpha })\) in the low-density phase of the \(\ell \)-TASEP (see Supplementary Material). The current J is given by the \(\ell \)-TASEP expression for J(ρ), and the end-to-end distance R is given by Eq. (3). We thus obtain an implicit equation \(\rho =\rho ({\bar{\alpha }}_{\infty },\lambda ,L)\) that can be solved numerically for each set of variables \(\{{\bar{\alpha }}_{\infty },\lambda ,L\}\) and used to fit the experimental data \({\rho }_{{\rm{exp}}}(L)\), yielding the parameters \({\bar{\alpha }}_{\infty }\) and λ for each dataset together with their standard errors (see Supplementary Material).
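The authors solved the implicit equation in Mathematica; the following Python sketch only illustrates the structure of the calculation, solving it by bisection on [0, 1/ℓ]. The ℓ-TASEP relations used (unit elongation rate) are the standard ones: ρ_LD(α) = α/(1 + α(ℓ − 1)) and J(ρ) = ρ(1 − ℓρ)/(1 − ℓρ + ρ). Eq. (3) and Eq. (4) are not reproduced in this section, so `end_to_end` and `alpha_bar` below are HYPOTHETICAL placeholders (a coil that stretches with density, and a bare rate plus a recycling term proportional to J/R); substitute the paper's expressions before using this for anything real.

```python
def rho_low_density(alpha, ell):
    """Low-density-phase density of the l-TASEP (unit elongation rate)."""
    return alpha / (1.0 + alpha * (ell - 1))


def current_of_density(rho, ell):
    """Current-density relation J(rho) of the l-TASEP (unit elongation rate)."""
    return rho * (1.0 - ell * rho) / (1.0 - ell * rho + rho)


def end_to_end(rho, L, ell):
    """HYPOTHETICAL stand-in for Eq. (3), in codon units: a random coil
    (sqrt(L)) at rho = 0, fully stretched (L) when ell * rho = 1."""
    return L ** 0.5 + ell * rho * (L - L ** 0.5)


def alpha_bar(alpha_inf, lam, J, R):
    """HYPOTHETICAL stand-in for Eq. (4): bare rate plus recycling ~ J / R."""
    return alpha_inf + lam * J / R


def solve_density(alpha_inf, lam, L, ell, tol=1e-12, max_iter=200):
    """Solve rho = rho_LD(alpha_bar(J(rho), R(rho, L))) by bisection."""
    def residual(rho):
        J = current_of_density(rho, ell)
        R = end_to_end(rho, L, ell)
        return rho - rho_low_density(alpha_bar(alpha_inf, lam, J, R), ell)

    # J vanishes at both ends, so residual(0) < 0 and residual(1/ell) > 0
    # whenever 0 < alpha_inf < 1: a root lies in between.
    lo, hi = 0.0, 1.0 / ell
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

With these placeholders, `solve_density(0.01, 1.0, L=100, ell=10)` exceeds `solve_density(0.01, 1.0, L=1000, ell=10)`, i.e. shorter transcripts carry a higher density. In a fit, `solve_density` would be evaluated for every L in a dataset inside a least-squares routine over (α∞, λ).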

We have used built-in functions of Mathematica50 to obtain numerical solutions for the density and currents and to fit the three datasets used in this study.

Density and current via a self-consistent simulation scheme

Equation (4) allows us to obtain, via simulation, the values of ρ for different values of L, taking into account both the feedback mechanism coupling protein synthesis and initiation rate and finite-size effects (the latter intrinsic to the numerical simulations). We do so with the following self-consistent method:

  (i) We initialise the system with an arbitrary value \(\alpha ={\alpha }^{(0)}\), let it evolve until the steady state is reached, and then evaluate the current \({J}^{(0)}\) and the density \({\rho }^{(0)}\);

  (ii) Compute R as in Eq. (3) and update \(\alpha ={\alpha }^{(1)}\) according to Eq. (4), using \({J}^{(0)}\) and \({\rho }^{(0)}\) computed in (i);

  (iii) Repeat the previous steps until \(|{\alpha }^{(i)}-{\alpha }^{(i-1)}|/{\alpha }^{(i)} < 0.01\) (in general, fewer than 10 iterations are needed for the algorithm to converge);

  (iv) The final value of α is then used to obtain the final densities and currents.

With this iterative procedure, for a given choice of the parameters \({\bar{\alpha }}_{\infty }\) and λ, we obtain the steady-state density and current, which vary with the length L of the transcript owing to the joint contribution of recycling and finite-size effects.

Thanks to Eq. (4), this self-consistent method allows us to simulate the system without explicitly modelling the diffusion of particles when they are not bound to the lattice, or the dynamics of the mRNA.
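Steps (i) to (iv) amount to a fixed-point iteration on α. The sketch below keeps the loop generic: `steady_state(alpha, L)` returns (J, ρ) and in a real run would wrap the Monte Carlo simulation described above. To keep the example fast and self-contained, it is replaced here by the ℓ-TASEP low-density mean-field formulas, which ignore finite-size effects. As before, `placeholder_R` and `placeholder_update` are HYPOTHETICAL stand-ins for Eq. (3) and Eq. (4), and all names are illustrative.

```python
ELL = 10  # footprint in codons (order of magnitude quoted for S. cerevisiae)


def self_consistent_alpha(steady_state, end_to_end, update_alpha, L,
                          alpha0=0.5, rtol=0.01, max_iter=50):
    """Iterate steps (i)-(iv); returns (alpha, J, rho, iterations)."""
    alpha = alpha0
    for iteration in range(1, max_iter + 1):
        J, rho = steady_state(alpha, L)   # (i) steady state at current alpha
        R = end_to_end(rho, L)            # (ii) end-to-end distance, Eq. (3)
        new_alpha = update_alpha(J, R)    # (ii) updated rate, Eq. (4)
        converged = abs(new_alpha - alpha) / new_alpha < rtol  # (iii)
        alpha = new_alpha
        if converged:
            break
    else:
        raise RuntimeError("self-consistent iteration did not converge")
    J, rho = steady_state(alpha, L)       # (iv) final current and density
    return alpha, J, rho, iteration


def mean_field_steady_state(alpha, L):
    """Stand-in for the simulation: l-TASEP low-density mean field."""
    rho = alpha / (1.0 + alpha * (ELL - 1))
    return rho * (1.0 - alpha), rho       # J = alpha(1-alpha)/(1+alpha(ell-1))


def placeholder_R(rho, L):
    """HYPOTHETICAL Eq. (3): coil that stretches as density grows."""
    return L ** 0.5 + ELL * rho * (L - L ** 0.5)


def placeholder_update(J, R, alpha_inf=0.01, lam=1.0):
    """HYPOTHETICAL Eq. (4): bare rate plus a recycling term ~ J / R."""
    return alpha_inf + lam * J / R
```

Calling `self_consistent_alpha(mean_field_steady_state, placeholder_R, placeholder_update, L=100)` converges in a couple of iterations to an effective initiation rate slightly above α∞, and the converged rate is larger for L = 100 than for L = 1000.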

Choice of datasets

We restricted our analysis to measurements made with sucrose-gradient methods. A more recent technique such as ribosome profiling12 provides ribosome densities with codon resolution, but it does not provide an absolute ribosome density (see the definition below). The length correlation has also been shown to hold in ribosome profiling experiments12. Rather than assuming an arbitrary normalisation of ribosome footprints to match our theory, we analysed absolute ribosome densities available in the literature. For the Mackay et al. dataset11 we used the reliable subset of the data.

Definitions

We define the ribosome density as the number N of translating ribosomes divided by the length L (in codons) of the CDS. We adopt this definition, rather than alternatives such as ribosomes per 100 or 1000 nucleotides, for practical reasons. Thus the density \(\rho =N/L\) is expressed in ribosomes per codon, and it can be thought of as the probability that a codon is covered by the A-site of a ribosome (i.e., that the codon is being translated). For steric reasons, this density is bounded by 1/\(\ell \), where \(\ell \) is the length of the ribosome footprint in codons; for instance, \(\ell \sim 10\) in S. cerevisiae.
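The arithmetic behind this definition, as a small sketch (function names are illustrative; the default ℓ = 10 is the S. cerevisiae value quoted above). It converts a ribosome count into ρ, checks the steric bound, converts to the alternative "per 100 nucleotides" unit using 3 nucleotides per codon, and computes ℓρ, the fraction of codons covered by non-overlapping ribosome footprints, which reaches 1 exactly at the bound ρ = 1/ℓ.

```python
def density_per_codon(n_ribosomes, cds_codons, ell=10):
    """Ribosome density rho = N / L in ribosomes per codon."""
    rho = n_ribosomes / cds_codons
    if rho > 1.0 / ell:
        raise ValueError("density exceeds the steric bound 1/ell")
    return rho


def per_100_nucleotides(rho):
    """Convert ribosomes per codon to ribosomes per 100 nt (3 nt per codon)."""
    return 100.0 * rho / 3.0


def coverage(rho, ell=10):
    """Fraction of codons covered by ribosome footprints, ell * rho."""
    return ell * rho
```

For example, 6 ribosomes on a 1,200-codon CDS give ρ = 0.005 ribosomes per codon, about 0.167 ribosomes per 100 nucleotides, with 5% of the codons covered; the maximal density for ℓ = 10 would be 0.1 ribosomes per codon.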