Introduction

Measuring protein abundance provides information that is not apparent from gene expression data but is crucial for the description of the state of a biological system1. Nevertheless, measured mRNA concentrations are often used to linearly approximate the corresponding protein levels, even though such approximation can be very imprecise1. However, mRNA levels (unlike protein abundances) are relatively easy to determine due to RNA and DNA base pair complementarity, which enables precise and high-throughput measurements, such as sequencing and microarrays. Measuring protein levels remains more challenging, due to the different chemical properties of proteins and wide dynamical range of protein abundances. Studies have shown that protein levels cannot be determined from mRNA levels just by correlation1,2,3,4,5,6. For example, similar mRNA expression levels can be accompanied by a wide range (up to 20-fold difference) of protein abundances and vice versa1.

The relation between mRNA concentration, [mRNAi(t)], and protein concentration, [Pi(t)], of protein i can be described in the first approximation by a kinetic equation:

$$\frac{d[{P}_{i}(t)]}{dt}={k}_{trans,i}\cdot [mRN{A}_{i}(t)]-{k}_{d,i}[{P}_{i}(t)],$$
(1)

where \({k}_{d,i}=\frac{\mathrm{ln}(2)}{{\vartheta }_{d,i}}\), and ϑd,i, kd,i and ktrans,i are half-life, degradation rate, and translation rate, respectively. Data regarding mRNA levels, protein abundances, degradation rates, and translation rates are required to solve Eq. 1. Among these, only translation rates are not readily available for most model organisms. Eq. 1 is typically solved using the steady-state assumption, which is the easiest mathematical way to solve it, but it is also the least physiologically relevant, since the concentrations of many important proteins and their mRNAs change dynamically. Therefore, instead of using the steady-state assumption, we propose to solve Eq. 1 using alternative boundary conditions: that both mRNA and protein levels will be the same at time 0 and at the certain time T at the end of experiment. Such a condition should be fulfilled in a typical control versus treatment experiment, at the time when treatment wears off as the cells go back to their original (control) state. Here, as proof-of-concept, we discuss a specific class of such experiments, where a system undergoes periodic changes, although periodicity of the data is not necessary to use our approach.

Results

Taking advantage of an availability of genome-wide data of mRNA levels, half-lives, and average protein abundances in the model organism S. cerevisiae, we predicted dynamic protein abundances based on gene expression levels. We chose to use a simple, classical model of translation2,3, as described by Eq. 1, above. The protein concentration [Pi(t)] depends on the number of mRNAs ([mRNAi(t)]), which are translated with rate constant ktrans,i, the protein-specific translation rate. Protein degradation is characterized by the rate constant \(\,{k}_{d,i}=\frac{\mathrm{ln}(2)}{{\vartheta }_{d,i}}\), where ϑd,i is the protein half-life. The proposed model does not include variables sometimes reported as proportional to the translation rates, such as ribosome occupancy or ribosome density4. This is because the minimalistic model, based only on data that are known with certainty to be relevant, performs better, as demonstrated below. Despite the simplicity of this model, it has been shown5 to accurately capture the dynamical changes in protein abundances for a majority of human proteins. These results suggest that the model is suitable for other eukaryotic systems (like S. cerevisiae) as well.

As described in detail in Materials and Methods, protein concentration and translation rate can be calculated from a time-course of its gene-expression measurements and its average abundance. As proof-of-concept, we chose five different S. cerevisiae cell cycle synchronized gene expression data sets (Table 1): alpha (3395 proteins), brd26 (2840 proteins), brd30 (2699 proteins), brd38 (2751 proteins), cdc15 (3173 proteins) and cdc28 (3424 proteins). First, we used the periodogram to estimate the consensus period for periodically expressed cell cycle genes in each of these data sets (Materials and Methods and Table 1). Second, we mathematically pre-processed raw data on yeast protein half-lives, to remove negative values and improve overall accuracy of half-life estimates (Materials and Methods). Next, we used an existing compendium of the budding yeast mRNA and protein consensus levels to estimate these levels in our conditions (Materials and Methods). Finally, we numerically solved Eq. 1, using the Fixed Point Iteration method, for all periodically expressed proteins in these five data sets. This resulted in predicted time-courses of dynamic protein abundances, with 1-minute resolution during the whole cell cycle, for all budding yeast proteins available in each of five different data sets. All predicted dynamic protein concentrations and translation rates can be browsed, compared, and downloaded via our web server (http://dynprot.cent.uw.edu.pl/).

Table 1 Pearson and Spearman correlations between average mRNA and average protein concentrations.

Validation of predicted dynamic protein abundances

In order to verify the temporal protein levels calculated using our model, we utilized western blotting to measure the actual protein concentrations for five representative proteins in cell cycle synchronized yeast culture (Materials and Methods). Representative proteins were chosen from the three groups: (1) proteins with relatively constant mRNA levels and predicted protein levels (Fig. 1A), (2) proteins with highly variable mRNA and relatively constant predicted protein levels (Fig. 1B), (3) proteins with variable mRNA and predicted protein levels during the cell cycle (Fig. 1C). For proteins with variable mRNA levels, we also required that they were transcriptionally regulated during the yeast cell cycle to guarantee that the observed changes in their levels would be meaningful. To confirm mRNA level periodicity in the yeast cell cycle the SCEPTRANS web server was used6. The choice of individual proteins within a group was based on availability of commercial antibodies. The first group is represented by Rad50p, a protein required for DNA damage repair, genetic recombination during meiosis, and for telomere maintenance7,8. The levels of RAD50 transcript remain almost constant during the cell cycle and due to a very long half-life of Rad50p (344 minutes, calculated as described in Materials and Methods using the data of Belle et al.9), our model predicted that Rad50p levels should remain virtually constant during our experiments (Fig. 1A). Indeed, western blot analysis of the time-course Rad50p data confirmed this prediction (Fig. 2A). The second group is represented by histone Hht1 and by Rnr1, the major isoform of the large subunit of ribonucleotide-diphosphate reductase, that is required for dNTP synthesis10. As these proteins are crucial for DNA replication, their transcripts peak during S phase and decrease shortly thereafter. Despite this high variability of HHT1 and RNR1 transcripts, concentrations of their proteins during the cell cycle were predicted to be constant by our model due to the long half-lives of Hht1p and Rnr1p (349 and 77 min, respectively; based on the data of Belle et al.9 we re-analyzed, see Materials and Methods). These predictions were confirmed by western blotting data showing no significant variability in the levels of Hht1or Rnr1 proteins during cell cycle progression (Fig. 2B). The last validation group consisted of two proteins: Cdc5 and Clb2, which are directly involved in controlling cell cycle progression. Cdc5 is a polo-like kinase, necessary for meiotic progression11, while Clb2 is a B-type cyclin required for transition from G2 to M phase12. Their function is thus restricted to only specific stages of cell division. Consistent with this, both proteins are known to have transcript levels strongly regulated during the cell cycle6,13. According to our calculations based on O’Shea and colleagues’ data9,14, Cdc5 and Clb2 half-lives are 10 and 22 min, respectively. Our model predicted that Cdc5 and Clb2 concentrations would exhibit strong variability during the yeast cell cycle (Fig. 1C). Indeed, the levels of Cdc5p and Clb2p, as determined by western blotting, varied strongly, reaching peaks at 65 and 115 min (M phase), and 55 and 110 min (G2/M transition), respectively (Fig. 2C). However, assuming that Clb2 has a constant half-life of 22 min (as calculated based on the data of Belle et al.9), gives less than ideal agreement of predicted protein concentrations with western blot measurements (Fig. 2C).

Figure 1
figure 1

Comparison of mRNA vs. predicted protein concentrations for selected proteins in the alpha data set. (A) Rad50 (relatively constant mRNA levels and predicted protein levels), (B) Hht1 and Rrnr1 (highly variable mRNA and relatively constant predicted protein levels) and (C) Cdc5 and Clb2 (variable mRNA and predicted protein levels during the cell cycle).

Figure 2
figure 2

Comparison of experimental vs. predicted protein concentrations for selected proteins: (A) Rad50 (B) Hht1 and Rrnr1 and (C) Cdc5 and Clb2.

Extending the model to accommodate post-translational regulation

Discrepancies between predicted and experimental protein levels during the cell cycle may be caused by known inaccuracies of the western blot (up to 2-fold) or by post-translational regulation. To address this question, we also constructed a more complex model, allowing variable half-life throughout the cell cycle, to verify if considering dynamical half-lives would result in much better agreement between predictions and experimental data. We tested the expanded model on the case of Clb2, since it was the only protein tested showing discrepancy with the predicted model beyond that expected from western blot measurement errors. First, we calculated predicted Clb2 temporal abundance based on static experimental half-lives from two different studies9,14 (Fig. 3A,B). Next, we utilized our expanded model, which allows Clb2 to switch between longer and shorter half-lives depending on the stage of the cell cycle. We generated such models for Clb2 with half-lives ranging from 1 to 40 minutes, with 1-minute step, and changing throughout the cell cycle. We chose the model which best fit the western blot data, which turned out to be the model assuming a very short Clb2 half-life up to minute 30 and after the minute 55 after the alpha-factor release and longer during the rest of the cell cycle (Fig. 3C). Indeed, it was reported earlier that the Clb2 half-life was less than 1 min for cells arrested in G1 by α factor9,14 and in our best-fitting models the Clb2 half-life was 1 minute (shorter values were not considered) during the G1 phase (Fig. 3C). Clb2 had a longer half-life, closer to the value measured in9,14 during the Clb2 activity window, which is at the G2/M transition. These results show another important application of our method: if half-life (and/or translation rates) are unavailable, they can be estimated with good accuracy from corresponding gene expression and proteomic time-courses, even in very challenging cases in which the half-life is variable and the protein time-course is inferred from relatively inaccurate western blots.

Figure 3
figure 3

Variable half-life allows best fit of predicted (red) and experimentally measured (green) temporal protein concentration profiles (A,B). Previously reported half-lives9,14 (left) for Clb2 do not lead to good fit of predicted and measured protein concentration temporal profiles (right), especially for half-life reported in Belle et al.9. (C) Variable half-life (left), found through numerical simulations (Material and Methods) allows for best fit between dynamic Clb2 abundances predicted from mRNA time-course and measured protein abundance time-course from the same condition.

Correlation between mRNAs and protein abundances in time-course data

It is typically assumed that with an increase in the quality of both gene expression and proteomic data, the correlation between mRNA and protein abundance would grow. However, a significant correlation between mRNA and protein concentration can be expected only for some groups of proteins. Greenbaum et al.15 showed a significant increase in correlation between mRNA and protein levels for proteins localized in the same cell compartment or with the same MIPS functional category. O’Shea and colleagues9 later showed that proteins of similar function tend to have similar half-lives. So far, the highest correlations between mRNA and protein concentrations have been achieved by Futcher et al.16, who found relatively high correlations (r = 0.76) after copula-transforming the data to normal distributions. The r = 0.7–0.8 range likely represents the highest possible correlation to achieve. On the other hand, protein half-lives are known to have a dynamic range of several orders of magnitude9, and therefore even similar mRNA expression levels can be accompanied by a wide range of protein abundance levels, and vice versa1. In general, it is increasingly recognized that mRNA abundances are only a weak surrogate for the corresponding protein concentrations, mainly because of post-transcriptional control of gene expression. Our studies allow us to look deeper at this problem. We found that even though the Spearman and Pearson correlation between average protein and mRNA concentrations is highly significant (Table 2), the temporal profiles of protein and mRNA concentrations are only weakly correlated (Fig. 4), with typical correlation not higher than 0.2. As expected, the highest correlations between temporary protein and mRNA abundances were observed for proteins with short half-lives, when protein levels follow close behind mRNA concentrations (Fig. 5). These data show that even in the simplified case of not considering post-translational modification, mRNA levels are good estimates of temporal protein abundances during the whole cell cycle only for a handful of proteins, highlighting the usefulness of the modeling described above.

Table 2 Data sources.
Figure 4
figure 4

Histogram of the Spearman correlation between protein and mRNA concentrations during the cell cycle for all available proteins in the following data sets: alpha (3395 proteins), brd26 (2840 proteins), brd30 (2699 proteins), brd38 (2751 proteins), cdc15 (3173 proteins) and cdc28 (3424 proteins).

Figure 5
figure 5

Relationship between Spearman correlations of protein and mRNA levels during the cell cycle and protein half-lives.

Estimating translation rates

Translation rate (TR, denoted by ktrans,i in Eq. 1) is the output of protein production relative to the amount of mRNA. Translation rates are not easy to measure directly, and are traditionally estimated utilizing a steady-state condition (TRss, Material and Methods, Eq. 7). However, the steady-state assumption is usually not fulfilled in physiological conditions. Moreover, there is growing evidence that unlike the degradation rate, the translation rate is very plastic and is a mechanism to control protein abundances, in response to changing mRNA levels (e.g.17). Our approach provides a method for estimating condition-specific translation rates requiring neither the steady-state condition nor knowing protein abundance, but using time-series gene expression data instead (TRtc, Material and Methods, Eqs 2 and 3). To compare translation rates calculated using these two different approaches we computed the relative difference between steady-state and timecourse-derived rates TRdiff (Materials and Methods, Eq. 6), which varies from 0 to 1 depending on how different TRss and TRtc are. We found that there are relatively few proteins for which TRdiff is greater than 0.1 (65 out of 3395 in the alpha data set, 59 out of 2840 in the brd26 data set, 25 out of 2699 in the brd30, 30 out of 2751 in brd38, 254 out of 3173 in cdc15 and 64 out of 3424 in cdc28). This result shows that our method offers a useful alternative approach to estimating translation rates when protein abundances are not known, but time-course gene expression data are available. We think the three main reasons for the observed discrepancies between these two methods of computing translation rates, described in more detail below, are: (a) the effects of α-factor synchronization, (b) measurement errors of mRNA and protein concentration and (c) time-dependence of half-lives. (a) α-factor synchronization would cause mRNA levels of some genes to be changed, for example upon α-factor synchronization mRNA abundances of SST2/YLR452C (which regulates desensitization to α-factor)18 and SW11/YGL028C (which may play a role in conjugation during mating based on its regulation by Ste12p)19 are elevated. Indeed, for these two proteins we obtained TRdiff > 0.1. (b) The second likely source of differences is measurement errors of data used: here mRNA and protein concentrations and degradation rates. (c) Third, as we will discuss below, some half-lives are time-dependent and neither the steady-state nor the time-course based method used so far can fully accommodate such time dependency. Due to the very different approaches to estimating TR using either the steady-state or time-course method, it is not surprising that time-dependence of actual protein half-lives would affect these calculations in a different manner, causing the observed discrepancies. In summary, the main source of differences in translation rates we computed is related to our experimental conditions, with additional effects resulting from using time-course, not average expression values, and from measurement errors.

TR was expected to correlate with many factors known to contribute to protein production, such as protein abundance, ribosome density, ribosome occupancy, mRNA concentration, the codon adaptation index (CAI), or the tRNA adaptation index (TAI)20,21. However, the TRss we computed (Fig. 6A) does not show high correlation with features that had been expected to be correlated with TR. For example, it seems intuitive and it has been proposed in Arava et al.20 that TR would be proportional to the product of ribosomal density (i.e. number of ribosomes bounded to mRNA) and ribosomal occupancy (number of mRNA associated with ribosomes), denoted in Fig. 6A as TA1. However, we did not observe such correlation using the Spearman or Pearson coefficient (Fig. 6A). Although this could suggest that neither ribosomal density nor occupancy contribute meaningfully to translation rates, the lack of high positive correlation between TR and the proposed TR contributing factors is in fact the result of high standard deviations of \(\frac{{k}_{d,i}}{[mRN{A}_{i}(t)]}\); the proportionality factor between translation rate and average protein concentration for the protein i. Indeed, the factors mentioned earlier, that are reported as likely to correlate with TR in some publications, are highly correlated with average protein concentration (Fig. 6B). TR is associated with average protein concentration, however, this correlation is not very high (0.18 for cdc15, 0.20 for brd30 and 0.19 for others) due to the important impact of protein half-life, which can vary by at least two orders of magnitude, on protein concentration (Eq. 8). Another interesting observation is that very complex attempts at modeling translation rate, such as the Ribosomal Flow Model, do not fare better than simpler models: in our comparison the complex RFM approach of 21 is outperformed by simpler methods.

Figure 6
figure 6

The Spearman (red bars) and Pearson (blue bars) correlations between: (A) Steady-state translation rates and translation rate descriptors, (B) average protein concentrations and translation rate descriptors. Several variants of translational activities have been computed (TA1, TA2 and TA3) using the following formulae: TA1 = (ribosome density) * (ribosome occupancy) * (mRNA concentration), TA2 = (ribosome density) * (ribosome occupancy) * (mRNA concentration) * CAI, TA3 = (ribosome density) * (ribosome occupancy) * (mRNA concentration) * CAI/(0.06 + (ribosome density)) * (ribosome occupancy * mRNA concentration), where CAI is codon adaptation index; tAI is tRNA adaptation index and RFM is translation rate calculated using Ribosome Flow Model21.

To visualize which cell compartments and protein functions are associated with high or low half-life and translation rate, we analyzed different MIPS functional categories and localizations using SCEPTRANS webserver (Fig. 7A–D). Global analysis shows that half-lives and translation rates have almost the same levels in all functional categories. However, there are some interesting exceptions to this principle: in the cell wall and extracellular categories there are proteins with relatively short half-lives (that is high degradation rates) and high translation rates (Fig. 7B and D). Additionally, proteins involved in protein synthesis have much shorter half-lives than average (Fig. 7A).

Figure 7
figure 7

Half-lives (in minutes) (A,B) and translation rates (C,D) in each of the functional and localization categories as described in MIPS database, as retrieved from SCEPTRANS.

In summary, the proposed model (Eq. 1), combined with a periodic data set (other time-course data sets can be used as well) allowed us to estimate not only genome wide changes in protein abundances, but also both translation and degradation rates of proteins. The model performs especially well in the most interesting case of substantially dynamic changes in protein abundances over time. It is also capable of detecting post-translational regulation of proteins for which corresponding time-course abundance data are available. Finally, the calculated protein concentration time-courses were validated experimentally for several proteins.

Discussion

Taking advantage of the high availability of genome-wide data of mRNA levels, we propose a model which predicts dynamic levels of protein abundances based on the time-course of gene expression levels and measured or predicted half-lives. We experimentally verified the proposed computational approach in the model organism S. cerevisiae by measuring protein concentration changes for selected proteins in the α-factor synchronized cell cycle using western blotting. We also showed how our approach can be used to infer post-transcriptional or post-translational regulation, if both gene expression and proteomic time-course data are available. Additionally, we introduced a variant of the method for estimating translation rates without using the standard, but typically non-physiological, steady-state assumption. Instead, we propose to use a boundary condition of the beginning and end protein concentration equivalence, which is typically satisfied not only in periodic processes like the cell cycle, but also in common time-course experiments, when the system is allowed to return to baseline after treatment. Our approach may be useful in many experimental systems in which the steady-state condition is clearly not satisfied (e.g., differentiation), but adaptive changes in translation rates play an important regulatory role17.

The motivation for our study was deeply practical: to obtain in silico estimations of time-course abundance data for proteins for which corresponding gene expression measurements are known or to integrate genomic and proteomic data to elucidate possible post-translational regulation. Most other studies in the field were instead motivated by the desire to explain the observed degree of correlation between protein abundance and gene expression levels1,15,22 or to estimate translation rates21. Nevertheless, it seems that our estimation of translation rates – a necessary step on the path to estimate protein levels - is also quite accurate, perhaps more so than in other popular methods (Fig. 6). Of course, in the case when proteomic data are unavailable, our predictions will be of limited accuracy for proteins undergoing post-translational modifications and possibly additionally due to inaccuracies in the data measurement, especially half-lives (the half-life data we used in this study9 has a multiplicative error of up to 2). Our goal, however, is not to produce accurate predictions for all proteins, but instead provide predictions that are far better than using mRNA as a proxy for a large number of proteins that are not highly unstable, but also do not undergo substantial post-translational regulation in the conditions studied. As was shown in our verification, and as should be expected, depending on half-life, protein abundance profiles may show anywhere between no resemblance, to very high resemblance to the underlying mRNA expression profiles. Therefore, our predicted protein profiles can provide a valuable resource for scientists interested in dynamic changes of protein abundances during their process of interest, but who only have gene expression profiles available, which are much easier and less expensive to measure than protein levels. Moreover, if a protein is known or predicted to undergo a post-translational modification, such as methylation23 or phosphorylation24, it can be flagged for potential lower accuracy of our predictions. If the corresponding proteomic timecourse is available, potential temporal changes to half-life can be calculated, following the approach we used for Clb2. To allow such analysis in a variety of organisms and conditions, we are developing a webserver, based on the proof-of-concept study presented in this paper, to provide predicted protein time-course profiles based on user-provided gene expression and protein half-life data. Currently, all our predictions for proteome dynamics in the budding yeast in different conditions can be conveniently browsed and visualized at http://dynprot.cent.uw.edu.pl/.

To ensure the accuracy of numerical integration, the integration step Δt in Eq.(2) should be very small, smaller than a typical resolution of time-course gene expression experiments. Therefore, estimation of mRNA concentration is required at every step of the numerical integration. In the present case of the cell cycle data sets, we obtained it from linear interpolation. Here, such approximation is justified, since the characteristic time-scales of transcriptional regulation in the process (measured e.g. as \(|\frac{dt}{d\,\mathrm{log}([mRN{A}_{i}(t)])}|\) are much longer than the step Δt. In the case of very dynamic expression data (e.g Yeast Metabolic Cycle)25, where characteristic scale of the process is shorter, more advanced methods may need to be used to prepare input transcriptomic data for modeling proteome dynamics, such as our MaxEnt model-based approach to infer high-resolution changes in gene expression13,26,27.

In summary, we have shown that a simple model of the relationship between mRNA and protein levels usually leads to a rather accurate prediction of protein levels, if post-translational regulation is not involved. Our approach can be used to obtain an approximate view of proteome dynamics (without post-translational regulation), to integrate gene expression and proteomic time-course data if both are available, or to more specific tasks, such as estimating changing degradation rates, as in our example with Clb2. Our approach was verified experimentally to provide useful results and we believe that such an approximated simulation of proteome dynamics may become the standard final step of time-course gene expression analysis, either performed for the whole genome, or for pathways or genes of interest.

The availability of genome-wide measured protein degradation rates in various organisms9,28 is growing17,29, which makes our approach more broadly applicable. Moreover, there is also substantial progress in understanding how protein half-life is encoded in its sequence, which gives hope that these values may be predicted computationally from sequence alone in the coming years30,31. This would allow the extension of our approach to any organism for which gene expression data are available.

Methods

Definitions

Ribosome density is an average number of ribosomes bound to mRNA per unit of mRNA length (100 nt).

Ribosome occupancy is a fraction of transcripts associated with ribosomes, i.e. engaged in translation, with values in the [0,1] interval.

Quantitative model of gene expression

Using periodic gene expression data enables us to eliminate the value of translation rate, ktrans,i, from equation [Eq. 1]. In order to do that, we introduced the function [Ri(t)] defined as follows:

$$[{R}_{i}(t)]=\frac{[{P}_{i}(t)]}{{k}_{trans,i}}.$$

For a small time interval Δt:

$${\int }_{t}^{t+{\rm{\Delta }}t}f(t)dt=\frac{1}{2}(f(t+{\rm{\Delta }}t)+f(t))\cdot {\rm{\Delta }}t,$$

and the first order differential equation, [Eq. 1],

$$\frac{d[{P}_{i}(t)]}{dt}={k}_{trans,i}\cdot [mRN{A}_{i}(t)]-{k}_{d,i}[{P}_{i}(t)],$$

can be rewritten in the form:

$$[{R}_{i}(t+{\rm{\Delta }}t)]=\frac{2-{k}_{d,i}\cdot {\rm{\Delta }}t}{2+{k}_{d,i}\cdot {\rm{\Delta }}t}\cdot [{R}_{i}(t)]+\frac{{\rm{\Delta }}t}{2+{k}_{d,i}\cdot {\rm{\Delta }}t}\cdot ([mRN{A}_{i}(t+{\rm{\Delta }}t)]+[mRN{A}_{i}(t)]).$$
(2)

Detailed derivation of [Eq. 2] is provided in the Appendix. The boundary condition for the [Eq. 1]:

$$[{P}_{i}(t)]=[{P}_{i}(t+T)]$$

is equivalent to:

$$[{R}_{i}(t)]=[{R}_{i}(t+T)],$$

where T is the period of the cyclic phenomenon, e.g. the length of the cell cycle.The proportionality factor ktrans,i can be obtained from the following formula:

$${k}_{trans,i}=\frac{\langle [{P}_{i}]\rangle }{\langle [{R}_{i}]\rangle },$$
(3)

where \(\langle [{P}_{i}]\rangle \), \(\langle [{R}_{i}]\rangle \) are the mean values of [Pi] and [Ri] over time T, respectively.

Data sets used and data pre-processing

The average protein and mRNA concentrations have been taken from previous studies of Beyer et al.22. Test data sets alpha, brd26, brd30, brd38, cdc15 and cdc28 are cell-cycle synchronized gene expression data sets described in detail in Table 1. The data sets alpha and cdc15 have been published by Spellman et al.32; cdc28 by Cho et al.33 and brd26, brd30 and brd38 by Pramila et al.34. The gene expression log2 ratios, Li(t), were transformed to mRNA concentrations [molecules/cell] according to the following relation:

$$[{M}_{i}(t)]={2}^{{L}_{i}(t)}\cdot \frac{\langle [{M}_{i}(t)]\rangle }{\langle {2}^{{L}_{i}(t)}\rangle },$$

where \({2}^{{L}_{i}(t)}\) is the arithmetic average of \({2}^{{L}_{i}(t)}\) in one cell cycle period and [Mi(t)]is the cell-cycle average mRNA concentration in molecules per cell, based on literature22. Linear interpolation was used to approximate the value of mRNA concentration in every minute during cell cycle, based on computed values at points of measurements (equation above).

Estimating the consensus period for periodically expressed genes

The set of genes transcriptionally regulated during the cell cycle will be defined as the genes with a transcriptional modulation consistent with the periodicity T of the mitotic cell division. We utilized the measure of periodicity defined as the periodogram, P35,36,37, of transcript concentration:

$$P(T)=\frac{2}{(b-a){\sigma }^{2}}\cdot {[{({\int }_{a}^{b}E(x)\cos (\frac{2\pi x}{T})dx)}^{2}+{({\int }_{a}^{b}E(x)\sin (\frac{2\pi x}{T})dx)}^{2}]}^{\frac{1}{2}},$$
(4)

where a and b are the beginning and end of the time-course, respectively, E is the transcript concentration and σ is the standard deviation of gene expression E. To accommodate uneven distribution of time points, we estimate P(T) using the unbiased formula of 36. The statistical significance of a single frequency (corresponding to periodicity with period T) in the periodogram, assuming a Gaussian null hypothesis, is expressed by35,36,37,38:

$$z=exp[-{P}_{E}(T)],$$
(5)

Since no reliable value of the period T measured independently from the transcriptome profiles was available, therefore, similar as in6,25, before applying Eq. 4, we estimated the most likely period of transcriptional oscillation in the system from the expression data. We have followed the Maximum Likelihood approach, using Eqs 4 and 5 for each gene independently over a range of possible periods, computing the logarithms of likelihood of periodicity for every gene and every period. These logarithms summed over all genes yield the total likelihood of every period, and the period with the maximum total likelihood has been adopted as the consensus period of regulation in the system. Estimated cell cycle periods for different data sets are described in Table 1.

Correcting the estimated protein degradation half-lives

Belle et al.9 reported protein half-lives, as estimated from the observed degradation rate, that sometimes have very high values, and, at times, negative ones. Since such values are not realistic, we adopted the following algorithm to estimate the most likely true half-lives for these proteins. We assumed that the measured quantity (degradation rate kd,i, which is related to the half-life ϑd,i by \({k}_{d,i}=\frac{\mathrm{ln}(2)}{{\vartheta }_{d,i}}\)) may include an error that has a Gaussian distribution, with a variance corresponding to the inverse of 300 minutes (the maximum reliably measureable value according to9) divided by the scaling factor ln(2). The negative reported half-lives result from experimental error, therefore, to correct the data we used the described above error model and prior assumption that a half-life must be positive. The true degradation rate was computed by integrating the normal distribution, limited and normalized to the positive part of its domain, and the inverse of this value multiplied by ln(2) was adopted as the corrected half-life. The correction was small for half-lives significantly shorter than 300 minutes, but significant for values longer than 300 minutes or negative reported values.

Calculating protein concentrations

We used the Fixed Point Iteration numerical method to solve Eq. 2 for each protein and mRNA data set. As a starting point for iterations we used [Ri(0)] = 0 and Δt = 1 minute. We continued iterative calculations until convergence, specifically until the condition \(|[{R}_{i}(T)]-[{R}_{i}(0)]|\le 5\cdot {10}^{-10}\) had been met.

Comparison between steady-state and time-course based translation rates

To determine the differences between steady-state derived translation rate, TRss, and time-course derived translation rate, TRtc, we defined the coefficient TRdiff :

$$T{R}_{diff}=\frac{|T{R}_{ss}-T{R}_{tc}|}{{\rm{\min }}(T{R}_{ss},T{R}_{tc})},$$
(6)

where the time-course derived translation rate, TRtc, is defined by Eq. 3 and the steady-state derived translation rate, TRss, is defined by:

$$T{R}_{ss,i}={k}_{d,i}\cdot \frac{[{P}_{i}]}{[{M}_{i}]}{\rm{.}}$$
(7)

Incorporating post-translational regulation

To accommodate post-translational regulation, we expanded our approach by allowing time-dependent variation of degradation rates. We will use Clb2 as an example to illustrate detecting post-translational modifications. For Clb2, fitting constant degradation rate results in poor fit, both for half-lives based on the report of O’Shea and colleagues9 (Fig. 3B) and for the much shorter half-life reported by Amon et al.14 (Fig. 3A). Therefore, instead we propose a time-dependent half-life function that will also be periodic in the consecutive cell cycles. To describe a half-life that is modified by post-translational regulation within K minute window starting at the time t0 within the cell cycle with the period T, we propose the following step function \(\vartheta {(t)}_{d}\):

$$\vartheta {(t)}_{d}=\{\begin{array}{rr}{\vartheta }_{d}^{1} & {t}_{0}\le t\le {t}_{0}+K\\ {\vartheta }_{d}^{2}\, & 0 < t < {t}_{0}\,and\,{t}_{0}+K < t\le T\end{array}.$$
(8)

To find values of \({\vartheta }_{d}^{1}\), \({\vartheta }_{d}^{2},\,{t}_{0}\,\)and K optimally describing time dependence of Clb2 half-life we numerically optimized these parameters, considering for half-lives \({\vartheta }_{d}^{1}\) and \({\vartheta }_{d}^{2}\) all values in the range from 1 minute to 40 minutes, with 1 minute steps, and for \({t}_{0}\,\)and K all possible times from the first to the last minute of the cell cycle, again with 1 minute steps. For each set of parameters for the function \(\vartheta {(t)}_{d}\), we solved Eq. 2, as described previously (Calculating protein concentrations). The set of parameters offering the best fit with experimental data was chosen as the best estimate of true Clb2 half-life. Thus, we were also able to calculate the time-dependent degradation rate for Clb2 as \(\,k{(t)}_{d}=\frac{\mathrm{ln}(2)}{\vartheta {(t)}_{d}}\). The best fit was achieved for variable half-life, with the Clb2 protein becoming extremely unstable outside of the window of its activity during the cell cycle (Fig. 3C). This result shows that our approach allows one to re-discover, ab initio, the timing of post-translational regulation of a protein, if only gene expression and proteomic time-courses are available.

α-Factor based synchronization

Yeast strain DBY8724 (Mat a GAL2 ura3 bar1::URA3) was kindly provided by P. T. Spellman. Obtained S. cerevisiae cells were synchronized by α-factor arrest as described by Spellman et al.32 and Pramila et al.34. Cells were grown to an OD600 of 0.2 in YEP glucose pH 5.5, an asynchronous sample was taken and α-factor (Sigma Aldrich) was added to a concentration of 25 ng/ml. After 2 hours cells were released from α-factor arrest by pelleting and re-suspended in fresh medium to an OD600 of 0.2 (Fig. 8C, time 0). Every 5 min, for the next 120 min, 25 samples were taken (25 ml for western blot analysis, 1 ml for FACS analysis and 1 ml to count budding index). Cell cycle progression was monitored by bud counting and DNA content analysis (FACS) (Fig. 8A–C).

Figure 8
figure 8

α-factor cell cycle synchronization. (A) Comparison of budding indices of our α-factor synchronization with those of Pramila et al.34 and Li et al.41, both for wild type (WT) and appropriate mutants. (B) FACS results for asynchronous culture (as) and selected time points of our synchronization. (C) Yeast cells sampled from asynchronous culture and at selected time points.

Budding index calculation and FACS analysis

For budding index calculation, two hundred cells were examined at every time point. The budding percentage was calculated as the number of budded cells divided by the number of all cells. To monitor DNA synthesis, samples were prepared as described previously39 and DNA content was measured using a BD FACSCalibur Flow Cytometer.

Western blot analysis

Cell extracts were prepared by TCA precipitation40 and then subjected to western blot analysis. Protein samples were separated on Mini-PROTEAN TGX 4–20% (Bio-Rad) gels and transferred to PureNitrocellulose Paper 0.45 μm (Bio-Rad). Blots were blocked using 0.2% I-Block buffer (Applied Biosystems), cut horizontally and probed with primary antibodies followed by incubation with appropriate horseradish peroxidase-conjugated secondary antibodies. The primary antisera used to detect selected proteins were from Santa Cruz Biotechnology (Rad50, Cdc5, and Clb2), Abcam (H3), Agrisera (Rnr1) and Millipore (Act1) and the secondary antisera were from Dako. Protein bands were visualized with the Immoblilon Western (Millipore) and scanned in a G-Box imaging system (Syngene). Band intensities were quantified using Gene-Snap software (Syngene).