A draft map of the human proteome

Abstract

The availability of the human genome sequence has transformed biomedical research over the past decade. However, an equivalent map of the human proteome based on direct measurements of proteins and peptides does not yet exist. Here we present a draft map of the human proteome using high-resolution Fourier-transform mass spectrometry. In-depth proteomic profiling of 30 histologically normal human samples, including 17 adult tissues, 7 fetal tissues and 6 purified primary haematopoietic cells, resulted in identification of proteins encoded by 17,294 genes accounting for approximately 84% of the total annotated protein-coding genes in humans. A unique and comprehensive strategy for proteogenomic analysis enabled us to discover a number of novel protein-coding regions, which include translated pseudogenes, non-coding RNAs and upstream open reading frames. This large human proteome catalogue (available as an interactive web-based resource at http://www.humanproteomemap.org) will complement available human genome and transcriptome data to accelerate biomedical research in health and disease.
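The coverage figure in the abstract can be checked with simple arithmetic. The sketch below uses only the two numbers quoted above (17,294 genes and approximately 84%) to compute the total number of annotated protein-coding genes they imply, plus the range of totals that would still give a coverage between 83.5% and 84.5%. The exact denominator used in the study is not stated here, so these values are rough estimates, not the reference gene count; the names in the code are illustrative.

```python
import math

# Numbers quoted in the abstract.
IDENTIFIED_GENES = 17_294
REPORTED_PERCENT = 84  # stated as "approximately 84%"

def coverage_percent(identified: int, total: int) -> float:
    """Percentage of annotated protein-coding genes with protein-level evidence."""
    return 100.0 * identified / total

# Point estimate of the implied denominator.
implied_total = IDENTIFIED_GENES / (REPORTED_PERCENT / 100)

# Integer totals whose coverage lies between 83.5% and 84.5%.
lowest_total = math.ceil(IDENTIFIED_GENES / 0.845)
highest_total = math.floor(IDENTIFIED_GENES / 0.835)

print(f"implied total: ~{implied_total:,.0f} genes")
print(f"totals consistent with ~84%: {lowest_total:,} to {highest_total:,}")
```

With these inputs the implied total is roughly 20,600 genes, and any total from 20,467 to 20,711 is consistent with the rounded 84% figure.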

Figure 1: Overview of the workflow and comparison of data with public repositories.
Figure 2: Landscape of the normal human proteome.
Figure 3: Isoform-specific expression.
Figure 4: Proteogenomic analysis.
Figure 5: Translation of pseudogenes and identification of novel N termini.

Acknowledgements

We would like to acknowledge the National Development and Research Institutes for some of the tissues. We acknowledge the assistance of V. Sandhya, V. Puttamallesh, U. Guha and B. Cole for help with analysis of some of the samples. We thank L. Lane and B. Amos for their assistance with the list of missing genes. This work was supported by an NIH roadmap grant for Technology Centers of Networks and Pathways (U54GM103520), NCI’s Clinical Proteomic Tumor Analysis Consortium initiative (U24CA160036), a contract (HHSN268201000032C) from the National Heart, Lung and Blood Institute and the Sol Goldman Pancreatic Cancer Research Center. The authors acknowledge the joint participation by the Adrienne Helis Malvin Medical Research Foundation and the Diana Helis Henry Medical Research Foundation through its direct engagement in the continuous active conduct of medical research in conjunction with The Johns Hopkins Hospital and the Johns Hopkins University School of Medicine and the Foundation’s Parkinson’s Disease Programs. The analysis work was partially supported by the National Resource for Network Biology (P41GM103504). A.Mah., S.K.Sh., P.S. and T.S.K.P. are supported by DBT Program Support on Neuroproteomics (BT/01/COE/08/05) to IOB and NIMHANS. H.G. is a Wellcome Trust-DBT India Alliance Early Career Fellow. We thank the Council of Scientific and Industrial Research, the University Grants Commission and the Department of Science and Technology, Government of India for research fellowships for S.M.P., R.S.N., A.R., M.K., G.J.S., S.C., P.R., J.S., S.S.M., D.S.K., S.R., S.K.Sr., K.K.D., Y.S., A.S., S.D.Y., N.S., S.A. and G.D.

Author information

Contributions

A.P., H.G., R.C., M.-S.K. designed the study; A.P., H.G., M.-S.K. managed the study; D.G., C.L.K., C.A.I.-D., K.R.M. collected human cells/tissues; M.-S.K., R.C., D.G. developed the pipeline of experiment and analysis; D.G., M.-S.K., S.M.P., K.M., R.C., S.R., J.Z., X.W., P.G.S., M.S.Z., T.-C.H. prepared peptide samples for LC-MS/MS; M.-S.K., R.S.N., S.M.P., R.C., D.S.K., S.R., G.J.S. performed LC-MS/MS; M.-S.K., S.M.P., S.P., S.S.M., C.J.M., J.A. and A.K.M. processed MS data and managed data; A.K.M., S.S.M., B.G., A.H.P., Y.S., M.-S.K. performed comparison analysis with PeptideAtlas, neXtProt and GPMDB; R.I., S.Jai., G.D.B. performed interaction and complex analysis; M.-S.K., S.M.P., S.S.M., P.K., A.K.M., N.A.S., R.S.N., L.B., L.D.N.S., D.S.K., V.N., A.R., T.S., M.K., S.K.Sr., G.D., A.Mar., R.R., S.C., K.K.D., A.S., S.D.Y., S.Jay., P.R., A.H.P., B.G., J.S., N.S., R.G., G.J.S., A.A.K., S.A., D.F., T.S.K.P., H.G., A.P. performed proteogenomic analysis; A.C., H.L., R.S., J.T.S., K.K.M., S.S., A.Mah., S.K.Sh., P.S., S.D.L., C.G.D., A.Mai., M.K.H., R.H.H., C.L.K., C.A.I.-D. assisted with analysis of the data; M.-S.K., S.M.P., T.-C.H., P.L.-R. performed western blot experiments; M.-S.K., J.K.T., A.K.M., B.M., S.P., S.M.P. designed the Human Proteome Map web portal; M.-S.K., A.K.M., J.K.T. generated selected reaction monitoring (SRM) database; M.-S.K., K.M., G.D., S.M.P., S.S.M. illustrated figures with help of other authors; A.P., M.-S.K., H.G. wrote the manuscript with inputs from other authors.

Corresponding authors

Correspondence to Harsha Gowda or Akhilesh Pandey.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD000561.

Extended data figures and tables

Extended Data Figure 1 Summary of proteome analysis.

a, Mass error in parts per million (ppm) for precursor ions of all identified peptides. b, Number of peptides detected per gene, binned as shown. c, Distribution of sequence coverage of identified proteins. d–f, %FDR at a q value of <0.01 plotted against peptide length (number of amino acids), charge state of the peptide ion and number of missed enzymatic cleavage sites. P values from two-tailed t-tests are shown. Error bars indicate s.d. calculated from the FDRs of multiple fetal samples. g, h, Comparison of peptides identified in this study with PeptideAtlas and GPMDB. i, Mass error in parts per million for precursor ions identified from proteogenomic analysis.
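Panels a, d–f and i rest on two standard calculations: the precursor mass error in ppm and the q value used to filter identifications. The caption does not say how the q values were obtained, so the sketch below uses a plain target-decoy estimate (decoy hits divided by target hits at or above a score threshold). This is one common approach and is not claimed to be the scoring pipeline used in this study; the function names and toy scores are hypothetical.

```python
from typing import List, Tuple

def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Precursor mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def target_decoy_qvalues(
    psms: List[Tuple[float, bool]],
) -> Tuple[List[Tuple[float, bool]], List[float]]:
    """Estimate q values from (score, is_decoy) pairs; a higher score is better.

    FDR at each rank = decoys / targets among PSMs scoring at or above it.
    The q value of a PSM is the minimum FDR over its rank and all lower ranks.
    Tied scores are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs: List[float] = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Walk upward from the lowest score, keeping the running minimum FDR.
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return ranked, qvalues

def accepted_targets(ranked, qvalues, threshold: float = 0.01):
    """Target (non-decoy) PSMs whose q value is below the threshold."""
    return [psm for psm, q in zip(ranked, qvalues) if not psm[1] and q < threshold]

# Toy example: (score, is_decoy) pairs.
psms = [(50.0, False), (45.0, False), (40.0, True), (38.0, False), (30.0, False), (20.0, True)]
ranked, qvalues = target_decoy_qvalues(psms)
print(qvalues)  # [0.0, 0.0, 0.25, 0.25, 0.25, 0.5]
print(len(accepted_targets(ranked, qvalues)))  # 2
print(round(ppm_error(500.2505, 500.25), 4))  # 0.9995
```

The q value is taken as a running minimum so that it never decreases as the score threshold is relaxed, which is what makes a fixed cutoff such as q < 0.01 well defined.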

Extended Data Figure 2 Tissue-wise gene expression and housekeeping proteins.

a, Heat map showing a partial list of poorly characterized, hypothetical genes. b, The bulk of protein mass is contributed by a small number of genes: only 2,350 ‘housekeeping genes’ account for 75% of proteome mass. c, The number of cell/tissue types in which each gene was observed. Some genes were restricted to a few samples, whereas others were observed in the majority of samples analysed. For example, 1,537 genes were detected in only one sample, and 2,350 genes were found in all samples. These latter genes can be defined as highly abundant ‘housekeeping proteins’. d, Distribution of genes in the RefSeq database by the number of protein isoforms encoded by their annotated transcripts (left), and distribution of the transcripts with two or more annotated protein isoforms by the number of isoform-specific or shared peptides (right). e, A representative example showing the sequence coverage of the PSMB8 protein, the tissue distribution of all of its identified peptides, and the MS/MS spectrum of one of these peptides together with seven selected reaction monitoring (SRM) transitions.
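Panels b and c describe two simple summaries of a gene-by-sample detection table. The sketch below assumes hypothetical inputs: a mapping from sample name to the set of genes detected in it, and a per-gene abundance value whose exact definition the caption does not give. It counts the samples in which each gene is observed, separates genes seen in every sample from those seen in only one, and finds the smallest number of most-abundant genes that together reach a given fraction of total abundance. Gene and sample names are placeholders, not data from the study.

```python
from collections import Counter
from typing import Dict, Set, Tuple

def samples_per_gene(detections: Dict[str, Set[str]]) -> Counter:
    """Map each gene to the number of samples in which it was detected."""
    counts: Counter = Counter()
    for genes in detections.values():
        counts.update(genes)  # a set has no duplicates, so each gene counts once per sample
    return counts

def split_by_breadth(counts: Counter, n_samples: int) -> Tuple[Set[str], Set[str]]:
    """Return (genes seen in every sample, genes seen in exactly one sample)."""
    everywhere = {gene for gene, n in counts.items() if n == n_samples}
    single = {gene for gene, n in counts.items() if n == 1}
    return everywhere, single

def genes_for_mass_fraction(abundance: Dict[str, float], fraction: float = 0.75) -> int:
    """Smallest number of most-abundant genes whose summed abundance reaches `fraction` of the total."""
    total = sum(abundance.values())
    running = 0.0
    for n, value in enumerate(sorted(abundance.values(), reverse=True), start=1):
        running += value
        if running >= fraction * total:
            return n
    return len(abundance)

# Toy example with placeholder names.
detections = {
    "sample_1": {"GENE_A", "GENE_B", "GENE_C"},
    "sample_2": {"GENE_A", "GENE_B", "GENE_D"},
    "sample_3": {"GENE_A", "GENE_B", "GENE_E"},
}
counts = samples_per_gene(detections)
everywhere, single = split_by_breadth(counts, len(detections))
abundance = {"GENE_A": 50.0, "GENE_B": 30.0, "GENE_C": 10.0, "GENE_D": 5.0, "GENE_E": 5.0}
print(sorted(everywhere), sorted(single), genes_for_mass_fraction(abundance))
```

In the toy data the two genes present in all three samples also carry 80% of the summed abundance, which mirrors the kind of overlap between breadth of detection and abundance that panels b and c describe.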

Extended Data Figure 3 Western blot analysis of select tissue-restricted proteins.

a, Eight proteins showing tissue-restricted expression were tested by western blot analysis in 17 adult tissues. GAPDH was used as a loading control. b, Four proteins expressed across a broad range of tissues; bands that do not correspond to the expected molecular weight are also observed. CST, Cell Signaling Technology; SCB, Santa Cruz Biotechnology.

Extended Data Figure 4 Identification of novel genes/ORFs and translated non-coding RNAs.

a, An example of a novel ORF in an alternative reading frame located in the 3′ UTR of the CHTF8 gene. The relative abundance of peptides from the CHTF8 protein and from the protein encoded by the novel ORF is shown (bottom). b, An example of a translated non-coding RNA (NR_027693.1) identified by searching a three-frame-translated transcript database. The MS/MS spectrum of one of the five identified peptides (LEVASSPPVSEAVPR) is shown along with the similar fragmentation pattern observed for the corresponding synthetic peptide.
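Panel b refers to searching a three-frame-translated transcript database. Because transcripts have a known strand, only the three forward reading frames need to be translated. The sketch below, assuming the standard genetic code (NCBI translation table 1), shows how one transcript becomes three candidate protein sequences and how a peptide can be assigned to a frame. The helper names are hypothetical, and this is an illustration of the idea, not the database construction used in the study.

```python
from typing import List

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(seq: str) -> str:
    """Translate complete codons; '*' marks stops, 'X' marks codons with ambiguous bases."""
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))

def three_frame_translation(transcript: str) -> List[str]:
    """Translations of the three forward frames (offsets 0, 1 and 2)."""
    return [translate(transcript[offset:]) for offset in range(3)]

def frames_containing(peptide: str, transcript: str) -> List[int]:
    """Frame offsets whose translation contains the peptide.

    Peptides contain no '*', so a match can never span a stop codon.
    """
    return [frame for frame, protein in enumerate(three_frame_translation(transcript))
            if peptide in protein]

print(three_frame_translation("ATGGCCAAGTAA"))  # ['MAK*', 'WPS', 'GQV']
print(frames_containing("WPS", "ATGGCCAAGTAA"))  # [1]
```

In a real database each frame would typically be split at stop codons into separate entries before the search; here the frames are kept whole for readability.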

Extended Data Figure 5 Human genome annotation through proteogenomic analysis using GeneSpring.

a, Four genome-search-specific peptides (GSSPs; red boxes) map to an upstream ORF (black hashes) located in the 5′ UTR of the SLC35A4 gene (ORF shown as a blue rectangle). b, GSSPs mapping to the intergenic region between the two RefSeq-annotated genes NDUFV3 and PKNOX1. The ORF region of a human endogenous retroviral element (HERV) is depicted with dotted lines. c, GSSPs mapping to the annotated pseudogene MAGEB6P1; alignments of the parent gene and pseudogene are shown below the peptides.
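Genome-search-specific peptides are, in essence, peptides identified by searching genome-derived sequences that are absent from the annotated protein database. The sketch below expresses that filter as a set operation under two stated assumptions: annotated proteins are supplied as plain sequence strings, and isoleucine and leucine are treated as identical because they are isobaric and generally indistinguishable in standard tandem mass spectra. The function name and toy sequences are hypothetical.

```python
from typing import Dict, Iterable, Set

def _collapse_isobaric(seq: str) -> str:
    """Treat I and L as the same residue (identical mass)."""
    return seq.upper().replace("I", "L")

def genome_search_specific(
    genome_hits: Iterable[str], annotated_proteins: Dict[str, str]
) -> Set[str]:
    """Peptides from a genome-level search that occur in no annotated protein sequence."""
    proteome = [_collapse_isobaric(s) for s in annotated_proteins.values()]
    return {
        pep for pep in genome_hits
        if not any(_collapse_isobaric(pep) in protein for protein in proteome)
    }

# Toy example with placeholder protein identifiers.
annotated = {"PROT_1": "MKTAYIAKQR", "PROT_2": "GSHMLEDPVR"}
hits = ["TAYIAK", "LEDPVR", "VVEQLLK", "TAYLAK"]
print(genome_search_specific(hits, annotated))  # {'VVEQLLK'}
```

A linear scan is fine for illustration; a proteome-scale comparison would index the annotated sequences (for example with a k-mer or suffix-array index) instead.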

Extended Data Figure 6 Frequency of nucleotides surrounding translational start sites.

a, Frequency of nucleotides at positions ranging from −5 to +1 surrounding the AUG codon for confirmed translational start sites. b, Frequency of nucleotides at positions ranging from −5 to +1 surrounding the AUG codon for novel translational start sites identified in this study.
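Panels a and b summarize the sequence context of start codons as per-position nucleotide frequencies. The caption does not define how positions are numbered, so the sketch below assumes that −1 is the nucleotide immediately 5′ of the A of AUG and +1 is the first nucleotide 3′ of the AUG codon (the codon itself is skipped). In vertebrates the best-known feature of this context is a purine three nucleotides upstream of the A of AUG (the Kozak consensus), so a table like this is a quick way to compare novel start sites with confirmed ones. The function name and toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List

def start_context_frequencies(
    sequences: List[str], start_indices: List[int], upstream: int = 5, downstream: int = 1
) -> Dict[int, Dict[str, float]]:
    """Per-position nucleotide frequencies around AUG start codons.

    Assumed numbering: -1 is immediately 5' of the A of AUG; +1 is the first
    nucleotide after the AUG codon. `start_indices` are 0-based positions of the A.
    """
    positions = list(range(-upstream, 0)) + list(range(1, downstream + 1))
    counts = {pos: Counter() for pos in positions}
    for seq, start in zip(sequences, start_indices):
        seq = seq.upper().replace("T", "U")
        if seq[start:start + 3] != "AUG":
            continue  # skip entries whose index does not point at an AUG
        for pos in positions:
            idx = start + pos if pos < 0 else start + 2 + pos
            if 0 <= idx < len(seq):  # ignore positions falling off either end
                counts[pos][seq[idx]] += 1
    freqs: Dict[int, Dict[str, float]] = {}
    for pos, c in counts.items():
        total = sum(c.values())
        freqs[pos] = {base: (c[base] / total if total else 0.0) for base in "ACGU"}
    return freqs

seqs = ["GCCACCAUGG", "GCCGCCAUGG", "AAAAAAAUGC"]
freqs = start_context_frequencies(seqs, [6, 6, 6])
print(freqs[-3], freqs[1])
```

Changing `upstream` and `downstream` widens the window, for example to match a different numbering convention if the one assumed here is not the one used in the figure.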

Supplementary information

Supplementary Information

This file contains a Supplementary Discussion and additional references. (PDF 106 kb)

Supplementary Data

This file contains Supplementary Data. (PDF 3594 kb)

Supplementary Table 1

This file contains a summary of results from proteogenomics analysis; a list of peptides indicating novel signal peptide cleavage sites; and a draft map of the human proteome. (XLSX 1178 kb)

About this article

Cite this article

Kim, MS., Pinto, S., Getnet, D. et al. A draft map of the human proteome. Nature 509, 575–581 (2014). https://doi.org/10.1038/nature13302

  • DOI: https://doi.org/10.1038/nature13302
