Despite decades of accumulated knowledge about proteins and their post-translational modifications (PTMs), numerous questions remain regarding their molecular composition and biological function. One of the most fundamental queries is the extent to which the combinations of DNA-, RNA- and PTM-level variations explode the complexity of the human proteome. Here, we outline what we know from current databases and measurement strategies including mass spectrometry–based proteomics. In doing so, we examine prevailing notions about the number of modifications displayed on human proteins and how they combine to generate the protein diversity underlying health and disease. We frame central issues regarding determination of protein-level variation and PTMs, including some paradoxes present in the field today. We use this framework to assess existing data and to ask the question, “How many distinct primary structures of proteins (proteoforms) are created from the 20,300 human genes?” We also explore prospects for improving measurements to better regularize protein-level biology and efficiently associate PTMs to function and phenotype.
This article was enabled through generous funding of the Paul G. Allen Frontiers Program (Award 11715 to N.L.K.), which supports the curation of a human proteoform atlas (http://allen.kelleher.northwestern.edu). N.L.K. also acknowledges the NIH (P41 GM108569) and H. Thomas, M. Mullowney and S. Bratanch for their support and assistance in constructing this collaborative manuscript.
The authors declare no competing financial interests.
Aebersold, R., Agar, J., Amster, I. et al. How many human proteoforms are there? Nat Chem Biol 14, 206–214 (2018). https://doi.org/10.1038/nchembio.2576