Replying to N. Fortelny, C. M. Overall, P. Pavlidis & G. V. Cohen Freue Nature 547, doi:10.1038/nature22293 (2017)

In the accompanying Comment, Fortelny et al.1 present a re-analysis of a particular aspect of our draft human proteome2, notably our claim that “Having learned the protein/mRNA ratio for every protein and transcript, it now becomes possible to predict protein abundance in any given tissue with good accuracy from the measured mRNA abundance.” Their key criticism is that the correlation analysis used at the time “…greatly overestimates the accuracy of per-gene predictions,” and they therefore conclude that “…the current data do not support high accuracy when using mRNA alone.” While we agree with parts of the analysis, we do not agree with all of the conclusions.

First, the controversy may have arisen from our use and interpretation of the word ‘prediction’. In statistics, prediction is always done within the experimental unit (for example, the liver) and allows statements about (relative) protein abundance variation between biological replicates of (many) liver samples. This is neither what we did nor what we meant to imply, because our data did not contain any replicates of the same tissue and the proteomic and transcriptomic data originated from different samples. Instead, our analysis was designed to estimate (perhaps the better word in this context) the absolute abundance of proteins for tissues for which no proteomic data are available. Given the ease of obtaining transcriptomic profiles of tissues, we still think this is a useful and practical approach.

Second, the accuracy with which absolute protein levels can be estimated by our approach depends on the technical variation in the data. As we analysed in extended data figure 5 of the original publication2 on the basis of stable isotope-labelled and absolutely quantified peptides, the median fold error within the assembled (heterogeneous) proteomic data is about 3. The average median absolute deviation of the protein/mRNA ratios is about 2.8. Therefore, the technical variation in our data limits the accuracy and precision with which absolute protein abundance can be estimated and may not suffice to determine variations in protein abundance within biological replicates of the same tissue.
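As an illustration only, the notion of a median fold error can be sketched in a few lines of Python. The sketch assumes that the fold error of one measurement is the larger of estimate/reference and reference/estimate; this definition, the function names and the numbers are hypothetical and are not taken from the original analysis.

```python
from statistics import median


def fold_error(estimated, reference):
    """Fold error of one estimate: always >= 1, where 1 means perfect agreement.

    Assumed definition: the larger of estimated/reference and reference/estimated.
    """
    return max(estimated / reference, reference / estimated)


def median_fold_error(pairs):
    """Median fold error over (estimated, reference) abundance pairs."""
    return median(fold_error(est, ref) for est, ref in pairs)


# Hypothetical (estimated, reference) abundances, for example copies per cell.
example_pairs = [
    (300.0, 100.0),   # 3-fold overestimate
    (50.0, 100.0),    # 2-fold underestimate
    (1000.0, 200.0),  # 5-fold overestimate
    (80.0, 240.0),    # 3-fold underestimate
    (10.0, 10.0),     # exact agreement
]

example_mfe = median_fold_error(example_pairs)
print(f"median fold error: {example_mfe:.1f}")
```

Under this definition, a median fold error of about 3 means that half of the estimates lie within a factor of 3 of the reference value, which is coarse relative to the differences expected between replicates of one tissue.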

Third, it has been shown that the number of proteins with tissue-specific expression is surprisingly low3 and that many proteins show similar absolute expression values across different tissues and cell lines. This explains why Fortelny et al.1 observe high correlation when using the median protein abundance (‘mRNA-free’) as a proxy for the expression of a protein across all analysed tissues (figure 1a of ref. 1). Despite this observation, using measured mRNA levels is obviously meaningful because some proteins show vast (absolute) expression differences between tissues and cell lines (highlighted in figure 3a of our original work2), a biological fact that would be missed if mRNA levels were not considered.
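The contrast between the two estimators can be made concrete with a small sketch. It assumes that the per-gene protein/mRNA ratio is summarized as the median ratio across the tissues with both measurements, and that the ‘mRNA-free’ proxy is the median protein abundance across those same tissues; all function names and values are hypothetical and serve only to illustrate the argument.

```python
from statistics import median


def learn_ratio(protein, mrna):
    """Median protein/mRNA ratio of one gene across tissues (assumed summary)."""
    return median(p / m for p, m in zip(protein, mrna))


def estimate_from_mrna(ratio, mrna_new):
    """Estimate protein abundance in a new tissue from its measured mRNA level."""
    return ratio * mrna_new


def estimate_mrna_free(protein):
    """'mRNA-free' proxy: median protein abundance across the known tissues."""
    return median(protein)


def fold_error(estimated, reference):
    """Larger of estimated/reference and reference/estimated (always >= 1)."""
    return max(estimated / reference, reference / estimated)


# Hypothetical gene with low expression in four tissues that have both data types.
protein_known = [1200.0, 1800.0, 1500.0, 3300.0]
mrna_known = [1.0, 2.0, 1.5, 3.0]

# Hypothetical fifth tissue in which the gene is very highly expressed.
mrna_new = 400.0
protein_new_measured = 500000.0  # used only to score the two estimates

ratio = learn_ratio(protein_known, mrna_known)
with_mrna = estimate_from_mrna(ratio, mrna_new)
without_mrna = estimate_mrna_free(protein_known)

print("fold error using mRNA:", fold_error(with_mrna, protein_new_measured))
print("fold error, mRNA-free:", fold_error(without_mrna, protein_new_measured))
```

For a protein with similar abundance in all tissues, both estimators would give nearly the same answer; they diverge only for proteins such as this hypothetical one, whose expression differs strongly between tissues.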

Fourth, for technical and biological reasons, figures 1b and 1c of Fortelny et al.1 need careful interpretation. For example, a per-gene correlation of measured versus predicted protein abundance of around zero across different tissues may simply mean that the protein is actually similarly expressed in many tissues, in which case the correlation of zero has no particular meaning. Conversely, a correlation close to 1 for a particular protein may imply a biological function requiring tissue-dependent regulation.
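The point about near-zero per-gene correlations can be illustrated with a toy calculation. The two genes below are hypothetical, the values are log10 abundances invented for the purpose, and Pearson correlation is assumed as the per-gene measure.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def max_fold_error(log_measured, log_predicted):
    """Largest fold error across tissues, computed from log10 abundances."""
    return 10 ** max(abs(a - b) for a, b in zip(log_measured, log_predicted))


# Hypothetical gene with similar abundance in six tissues: the small
# differences between tissues are noise, unrelated between the two series.
flat_measured = [5.0, 5.1, 4.9, 5.05, 4.95, 5.0]
flat_predicted = [5.1, 5.05, 5.05, 4.9, 4.9, 5.0]

# Hypothetical tissue-specific gene spanning three orders of magnitude.
specific_measured = [3.0, 4.0, 5.0, 6.0, 3.5, 5.5]
specific_predicted = [3.3, 3.8, 5.2, 5.7, 3.6, 5.6]

r_flat = pearson(flat_measured, flat_predicted)
r_specific = pearson(specific_measured, specific_predicted)

print("similarly expressed gene: r =", round(r_flat, 2),
      "max fold error =", round(max_fold_error(flat_measured, flat_predicted), 2))
print("tissue-specific gene:     r =", round(r_specific, 2),
      "max fold error =", round(max_fold_error(specific_measured, specific_predicted), 2))
```

In this toy case the similarly expressed gene has a correlation near zero although every estimate lies within a factor of 1.5 of the measurement, whereas the tissue-specific gene has a high correlation despite larger fold errors. Per-gene correlation and the accuracy of absolute estimates therefore answer different questions.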

In conclusion, we agree with Fortelny et al.1 that more accurate data are necessary to enable prediction of protein abundance variation within a particular cell type or tissue. We also fully acknowledge that further transcriptional and post-translational factors have to be considered in order to explain protein levels in cells accurately. We therefore welcome the re-analysis of our data as it stimulates discussion on an important scientific topic and further highlights the value of creating and sharing large-scale biological data resources.

Note: The author list of this Reply contains only those individuals most closely involved in the matter discussed in this BCA.