Gene co-expression profiling is a well-established method for predicting protein function. These analyses are often carried out at the transcript level, which can give inaccurate results when mRNA and protein levels do not correlate.

Juri Rappsilber and colleagues at the University of Edinburgh used quantitative proteomics to perform large-scale co-expression screens directly at the protein level and built a database of co-regulated proteins. These data reveal protein associations and functional connections independent of mRNA co-expression, physical protein interactions or colocalization.

The researchers combined their own and published isotope-labeling mass spectrometry data to quantify the cellular proteome response to 294 biological conditions. To identify proteins with similar quantitative trends across these conditions, Rappsilber and co-workers used unsupervised machine learning, which they show to be more robust and selective than Pearson correlation analysis or related metrics. They also provide evidence that machine-learning-derived protein co-regulation scores are more informative than mRNA co-expression analysis, although transcriptomics still has distinct advantages with regard to gene coverage.
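To make the comparison concrete, the following is a minimal sketch of the Pearson correlation baseline that the machine-learning approach is measured against. It is not the authors' method. The function name `pearson_coregulation`, the `min_shared` threshold and the toy matrix are all hypothetical, chosen only for illustration; the sketch assumes a proteins-by-conditions matrix of log ratios with NaN wherever a protein was not quantified.

```python
import numpy as np

def pearson_coregulation(ratios, min_shared=3):
    """Pairwise Pearson correlation between protein profiles.

    ratios: 2-D array (proteins x conditions) of log ratios, with NaN
    marking conditions in which a protein was not quantified.
    min_shared: hypothetical cutoff for the number of conditions two
    proteins must share before a score is reported.
    """
    n = ratios.shape[0]
    scores = np.full((n, n), np.nan)
    for i in range(n):
        for j in range(i + 1, n):
            # Use only conditions in which both proteins were quantified.
            shared = ~np.isnan(ratios[i]) & ~np.isnan(ratios[j])
            if shared.sum() < min_shared:
                continue
            x, y = ratios[i, shared], ratios[j, shared]
            # Correlation is undefined for a flat profile.
            if x.std() == 0 or y.std() == 0:
                continue
            scores[i, j] = scores[j, i] = np.corrcoef(x, y)[0, 1]
    return scores

# Invented toy data: 3 proteins across 5 conditions.
toy = np.array([
    [1.0, 2.0, 3.0, 4.0, np.nan],
    [2.0, 4.0, 6.0, 8.0, 1.0],
    [4.0, 3.0, 2.0, 1.0, 0.5],
])
scores = pearson_coregulation(toy)
```

In this toy matrix the first two proteins rise together across their four shared conditions, while the third falls as the first rises. A simple metric of this kind responds strongly to outliers and to the exact set of shared conditions, which is one reason a more robust scoring approach can be attractive.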

The protein co-regulation scores form the basis of the ProteomeHD resource, which complements existing protein association databases such as STRING and BioGRID. The Rappsilber team shows that ProteomeHD can reveal proteins with dual cellular functions and provide functional insights that are difficult to obtain by other proteomics approaches. For example, they find that the peroxisomal protein PEX11β is co-regulated with several mitochondrial proteins, and they confirm in follow-up experiments that PEX11β contributes to peroxisome-mitochondria contacts. ProteomeHD is available as an interactive and functionally annotated map at www.proteomeHD.net, bringing functional genomics one step closer to the protein level.