Perspective

How many human proteoforms are there?

Nature Chemical Biology volume 14, pages 206–214 (2018)

Abstract

Despite decades of accumulated knowledge about proteins and their post-translational modifications (PTMs), numerous questions remain regarding their molecular composition and biological function. One of the most fundamental queries is the extent to which the combinations of DNA-, RNA- and PTM-level variations explode the complexity of the human proteome. Here, we outline what we know from current databases and measurement strategies including mass spectrometry–based proteomics. In doing so, we examine prevailing notions about the number of modifications displayed on human proteins and how they combine to generate the protein diversity underlying health and disease. We frame central issues regarding determination of protein-level variation and PTMs, including some paradoxes present in the field today. We use this framework to assess existing data and to ask the question, “How many distinct primary structures of proteins (proteoforms) are created from the 20,300 human genes?” We also explore prospects for improving measurements to better regularize protein-level biology and efficiently associate PTMs to function and phenotype.

Acknowledgements

This article was enabled through generous funding of the Paul G. Allen Frontiers Program (Award 11715 to N.L.K.), which supports the curation of a human proteoform atlas (http://allen.kelleher.northwestern.edu). N.L.K. also acknowledges the NIH (P41 GM108569) and H. Thomas, M. Mullowney and S. Bratanch for their support and assistance in constructing this collaborative manuscript.

Author information

Affiliations

  1. Department of Biology, ETH Zurich, Zürich, Switzerland.

    • Ruedi Aebersold
  2. Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, USA.

    • Jeffrey N Agar
    • Alexander R Ivanov
  3. Department of Chemistry, University of Georgia, Athens, Georgia, USA.

    • I Jonathan Amster
  4. Department of Biomedical Sciences, Macquarie University, Sydney, New South Wales, Australia.

    • Mark S Baker
  5. Department of Chemistry, Stanford University, Stanford, California, USA.

    • Carolyn R Bertozzi
    • Parag Mallick
    • Sharon J Pitteri
  6. Office of Cancer Clinical Proteomics Research, National Cancer Institute, Bethesda, Maryland, USA.

    • Emily S Boja
    • Henry Rodriguez
  7. Department of Biochemistry, Boston University School of Medicine, Boston, Massachusetts, USA.

    • Catherine E Costello
  8. Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California, USA.

    • Benjamin F Cravatt
  9. Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland, USA.

    • Catherine Fenselau
  10. Department of Biochemistry and Biophysics, University of Pennsylvania School of Medicine, and Epigenetics Institute, Philadelphia, Pennsylvania, USA.

    • Benjamin A Garcia
  11. Department of Cell and Regenerative Biology, Human Proteomics Program, University of Wisconsin–Madison, Madison, Wisconsin, USA.

    • Ying Ge
  12. Department of Chemistry, University of Wisconsin–Madison, Madison, Wisconsin, USA.

    • Ying Ge
    • Lloyd M Smith
  13. Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA.

    • Jeremy Gunawardena
    • Vamsi K Mootha
  14. Memorial Sloan Kettering Cancer Center, New York, New York, USA.

    • Ronald C Hendrickson
  15. Department of Chemistry, University of Illinois, Urbana, Illinois, USA.

    • Paul J Hergenrother
  16. Department of Biosciences and Christian Doppler Laboratory for Biosimilar Characterization, University of Salzburg, Salzburg, Austria.

    • Christian G Huber
    • Therese Wohlschlager
  17. Department of Biochemistry and Molecular Biology, University of Southern Denmark, Odense, Denmark.

    • Ole N Jensen
    • Martin R Larsen
  18. The Center for Synthetic Biology, Northwestern University, Evanston, Illinois, USA.

    • Michael C Jewett
    • Milan Mrksich
  19. Department of Chemistry, Molecular Biosciences and the Proteomics Center of Excellence, Northwestern University, Evanston, Illinois, USA.

    • Neil L Kelleher
    • Steven M Patrie
    • Paul M Thomas
  20. Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Laura L Kiessling
  21. Department of Cellular Molecular Pharmacology, University of California, San Francisco, California, USA.

    • Nevan J Krogan
  22. Department of Biological Chemistry, University of California, Los Angeles, California, USA.

    • Joseph A Loo
    • Rachel R Ogorzalek Loo
  23. Science for Life Laboratory, KTH Royal Institute of Technology, Stockholm, Sweden.

    • Emma Lundberg
  24. Department of Genetics, Stanford University, Stanford, California, USA.

    • Emma Lundberg
    • Michael P Snyder
  25. Department of Genome Sciences, University of Washington, Seattle, Washington, USA.

    • Michael J MacCoss
  26. Department of Chemistry, Princeton University, Princeton, New Jersey, USA.

    • Tom W Muir
  27. Department of Biology, Saint Mary's College of California, Moraga, California, USA.

    • James J Pesavento
  28. Salk Institute for Biological Studies, Torrey Pines, California, USA.

    • Alan Saghatelian
  29. Applied Proteomics, Genentech, Inc., San Francisco, California, USA.

    • Wendy Sandoval
  30. Department of Clinical Chemistry/Central Laboratories, University Medical Center Hamburg – Eppendorf, Hamburg, Germany.

    • Hartmut Schlüter
  31. National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, Maryland, USA.

    • Salvatore Sechi
  32. Department of Chemistry, Yale University, New Haven, Connecticut, USA.

    • Sarah A Slavoff
  33. Genome Center of Wisconsin, Madison, Wisconsin, USA.

    • Lloyd M Smith
  34. Department of Microbiology, KTH Royal Institute of Technology, Stockholm, Sweden.

    • Mathias Uhlén
  35. Cedars Sinai Medical Center, Los Angeles, California, USA.

    • Jennifer E Van Eyk
  36. Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA.

    • Marc Vidal
  37. Department of Pathology, Harvard Medical School and Wyss Institute at Harvard University, Boston, Massachusetts, USA.

    • David R Walt
  38. Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    • Forest M White
  39. Department of Chemistry, University of California, Berkeley, Berkeley, California, USA.

    • Evan R Williams
  40. Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio, USA.

    • Vicki H Wysocki
  41. Department of Cell Biology, University of Pittsburgh, Pittsburgh, Pennsylvania, USA.

    • Nathan A Yates
  42. Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA.

    • Nicolas L Young
    • Bing Zhang

Authors

  1. Ruedi Aebersold
  2. Jeffrey N Agar
  3. I Jonathan Amster
  4. Mark S Baker
  5. Carolyn R Bertozzi
  6. Emily S Boja
  7. Catherine E Costello
  8. Benjamin F Cravatt
  9. Catherine Fenselau
  10. Benjamin A Garcia
  11. Ying Ge
  12. Jeremy Gunawardena
  13. Ronald C Hendrickson
  14. Paul J Hergenrother
  15. Christian G Huber
  16. Alexander R Ivanov
  17. Ole N Jensen
  18. Michael C Jewett
  19. Neil L Kelleher
  20. Laura L Kiessling
  21. Nevan J Krogan
  22. Martin R Larsen
  23. Joseph A Loo
  24. Rachel R Ogorzalek Loo
  25. Emma Lundberg
  26. Michael J MacCoss
  27. Parag Mallick
  28. Vamsi K Mootha
  29. Milan Mrksich
  30. Tom W Muir
  31. Steven M Patrie
  32. James J Pesavento
  33. Sharon J Pitteri
  34. Henry Rodriguez
  35. Alan Saghatelian
  36. Wendy Sandoval
  37. Hartmut Schlüter
  38. Salvatore Sechi
  39. Sarah A Slavoff
  40. Lloyd M Smith
  41. Michael P Snyder
  42. Paul M Thomas
  43. Mathias Uhlén
  44. Jennifer E Van Eyk
  45. Marc Vidal
  46. David R Walt
  47. Forest M White
  48. Evan R Williams
  49. Therese Wohlschlager
  50. Vicki H Wysocki
  51. Nathan A Yates
  52. Nicolas L Young
  53. Bing Zhang

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Neil L Kelleher.

About this article

DOI

https://doi.org/10.1038/nchembio.2576