Proteogenomics: concepts, applications and computational strategies

Abstract

Proteogenomics is an area of research at the interface of proteomics and genomics. In this approach, customized protein sequence databases generated using genomic and transcriptomic information are used to help identify novel peptides (not present in reference protein sequence databases) from mass spectrometry–based proteomic data; in turn, the proteomic data can be used to provide protein-level evidence of gene expression and to help refine gene models. In recent years, owing to the emergence of new sequencing technologies such as RNA-seq and dramatic improvements in the depth and throughput of mass spectrometry–based proteomics, the pace of proteogenomic research has greatly accelerated. Here I review the current state of proteogenomic methods and applications, including computational strategies for building and using customized protein sequence databases. I also draw attention to the challenge of false positive identifications in proteogenomics and provide guidelines for analyzing the data and reporting the results of proteogenomic studies.
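The idea of a customized protein sequence database can be illustrated with a minimal sketch. It is not the pipeline of any tool discussed in the review: it assumes a sample-specific coding sequence is already available (for example, from RNA-seq variant calling), translates it, performs an in silico tryptic digest with no missed cleavages, and reports peptides that are absent from the digest of the reference protein. The function names (`translate`, `tryptic_peptides`, `novel_peptides`) and the two toy sequences are hypothetical and chosen only for illustration.

```python
import re

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 0, stopping at the first stop codon."""
    residues = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # X marks an unknown codon
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


def tryptic_peptides(protein: str, min_len: int = 6) -> set:
    """Cleave after K or R unless followed by P; keep peptides of at least min_len."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}


def novel_peptides(reference_proteins, sample_proteins) -> set:
    """Peptides from the sample-specific proteins that the reference digest lacks."""
    reference = set()
    for protein in reference_proteins:
        reference |= tryptic_peptides(protein)
    sample = set()
    for protein in sample_proteins:
        sample |= tryptic_peptides(protein)
    return sample - reference


# Toy example (hypothetical sequences): the sample carries a single
# nonsynonymous C>T change (GCT -> GTT, Ala -> Val) relative to the reference.
REFERENCE_CDS = (
    "ATG GCT TCT GAA CTG GTT AAA GGT GAT ACC CCG GAA GCT CTG CGT "
    "CTG CTT GAC AAC TGG CAG AAA TAA"
).replace(" ", "")
SAMPLE_CDS = (
    "ATG GCT TCT GAA CTG GTT AAA GGT GAT ACC CCG GAA GTT CTG CGT "
    "CTG CTT GAC AAC TGG CAG AAA TAA"
).replace(" ", "")

reference_protein = translate(REFERENCE_CDS)
sample_protein = translate(SAMPLE_CDS)
candidates = novel_peptides([reference_protein], [sample_protein])
print(sorted(candidates))
```

In a real analysis the candidate set would only define the search space: each novel peptide still has to be matched to spectra and pass statistical assessment, which is the source of the false positive problem raised in the abstract. The sketch also ignores missed cleavages, isoleucine/leucine ambiguity, and peptides that occur elsewhere in the reference proteome.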

Figure 1: Peptide and protein identification in shotgun proteomics.
Figure 2: The concept of proteogenomics.
Figure 3: Type of peptides identified in proteogenomics.
Figure 4: Statistical assessment of peptide identifications in proteogenomics.

Acknowledgements

This work was funded in part by US National Institutes of Health grant R01-GM-094231. I thank A. Kong, B. Veeneman, A. Shanmugam and G. Omenn for useful discussions.

Author information

Corresponding author

Correspondence to Alexey I Nesvizhskii.

Ethics declarations

Competing interests

The author declares no competing financial interests.

Supplementary information

Supplementary Table

Supplementary Table 1 (PDF 87 kb)

About this article

Cite this article

Nesvizhskii, A. Proteogenomics: concepts, applications and computational strategies. Nat Methods 11, 1114–1125 (2014). https://doi.org/10.1038/nmeth.3144

  • DOI: https://doi.org/10.1038/nmeth.3144
