HiRIEF LC-MS enables deep proteome coverage and unbiased proteogenomics


We present a liquid chromatography–mass spectrometry (LC-MS)-based method permitting unbiased (gene prediction–independent) genome-wide discovery of protein-coding loci in higher eukaryotes. Using high-resolution isoelectric focusing (HiRIEF) at the peptide level in the 3.7–5.0 pH range and accurate peptide isoelectric point (pI) prediction, we probed the six-reading-frame translation of the human and mouse genomes and identified 98 and 52 previously undiscovered protein-coding loci, respectively. The method also enabled deep proteome coverage, identifying 13,078 human and 10,637 mouse proteins.
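As an illustration of what a six-reading-frame translation involves, the following is a minimal Python sketch that translates a DNA sequence in all three offsets on both strands using the standard genetic code. It is not the authors' pipeline: the function names (`reverse_complement`, `translate`, `six_frame_translation`) and the frame labels (`+1` to `-3`) are assumptions chosen for this example, and the subsequent steps described above (in silico digestion, pI prediction and restriction of the search space to the 3.7–5.0 pH range) are not shown.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Translate both strands at offsets 0, 1 and 2 (hypothetical frame labels)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Stop codons are kept as `*` so that open reading frames can be split on them afterwards. At genome scale this yields a search space far larger than an annotated protein database, which is why restricting candidate peptides by an experimentally constrained property such as pI is useful.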

Figure 1: HiRIEF LC-MS enables unbiased proteogenomics in higher eukaryotes.
Figure 2: Analysis of distribution of novel peptides into different categories.
Figure 3: New gene models.


Funding from the Swedish Research Council, Swedish Cancer Society, Stockholm's county council, Stockholm's cancer society and EU FP7 project GlycoHit is gratefully acknowledged. Support by BILS (Bioinformatics Infrastructure for Life Sciences) and J. Boekel in publishing the MS raw files is gratefully acknowledged. We thank the SciLifeLab genomics facility for experimental support and J. Lundeberg for the A431 sequence data. We thank E. Bereczki (Karolinska Institutet) for the kind gift of the N2A cell line. We thank K. Lindblad-Toh for critical reading of the manuscript. We acknowledge the late B. Bjellqvist for his early contribution in the development of IPG-IEF and peptide pI prediction.

J.L., R.M.M.B., L.M.O., L.K. and H.J.J. conceived of and designed the experiments. R.M.M.B. and H.J.J. performed the IEF separations and MS analysis. M.H., L.M.O. and R.M.M.B. performed the data analysis of RNA-seq experiments. R.M.M.B. performed the peptide pI calculations. L.K., J.L., R.M.M.B. and V.G. designed the database restriction workflow and performed the 6FT searches. L.K. and V.G. designed the novel-only TDA approach. Å.P.-B. performed the single-nucleotide polymorphism data analysis and calculated Ensembl annotation statistics. R.M.M.B., L.M.O., H.J.J. and J.F. performed proteomics data analysis. R.M.M.B., L.M.O. and J.L. wrote the manuscript. All authors were involved in discussion of the manuscript and approved the final manuscript.

Correspondence to Janne Lehtiö.

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15 (PDF 1952 kb)

Supplementary Table 1

Conventional proteomics performance as measured by number of PSMs, peptides, protein groups and corresponding genes. (XLSX 10 kb)

Supplementary Table 2

Novel peptides identified by proteogenomics in H. sapiens and supporting evidence. (XLSX 103 kb)

Supplementary Table 3

Novel peptides identified by proteogenomics in M. musculus and supporting evidence. (XLSX 35 kb)

Supplementary Table 4

Proteins identified by conventional proteomics in H. sapiens. (XLSX 4938 kb)

Supplementary Table 5

Proteins identified by conventional proteomics in M. musculus. (XLSX 895 kb)

Supplementary Data 1

Annotations of MS2 spectra pertaining to the novel peptides. (ZIP 6962 kb)

Supplementary Data 2

Custom tracks for data visualization in the UCSC genome browser. (ZIP 766 kb)

Supplementary Software

PredpI algorithm (ZIP 2988 kb)

Branca, R., Orre, L., Johansson, H. et al. HiRIEF LC-MS enables deep proteome coverage and unbiased proteogenomics. Nat Methods 11, 59–62 (2014).

