The complete extent to which the human genome is translated into polypeptides is of fundamental importance. We report a peptidomic strategy to detect short open reading frame (sORF)-encoded polypeptides (SEPs) in human cells. We identify 90 SEPs, 86 of which are previously uncharacterized, which is the largest number of human SEPs ever reported. SEP abundances range from 10 to 1,000 molecules per cell, identical to abundances of known proteins. SEPs arise from sORFs in noncoding RNAs as well as multicistronic mRNAs, and many SEPs initiate with non-AUG start codons, indicating that noncanonical translation may be more widespread in mammals than previously thought. In addition, coding sORFs are present in a small fraction (8 out of 1,866) of long intergenic noncoding RNAs. Together, these results provide strong evidence that the human proteome is more complex than previously appreciated.
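To make the notion of an sORF with a non-AUG start codon concrete, the sketch below scans the three forward reading frames of a transcript for short open reading frames. It is an illustration only and not the authors' peptidomic workflow. The function name `find_sorfs`, the set of near-cognate start codons, and the 150-codon length cutoff are all assumptions introduced here rather than values taken from the abstract.

```python
# Hypothetical illustration of scanning a transcript for sORFs.
# Not the authors' pipeline; all names and parameters are assumptions.

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: AUG plus a few near-cognate start codons, written as DNA.
START_CODONS = {"ATG", "CTG", "GTG", "TTG"}
# Assumption: an ORF counts as "short" if it has at most this many codons
# (start codon included, stop codon excluded).
MAX_CODONS = 150


def find_sorfs(seq, starts=START_CODONS, max_codons=MAX_CODONS):
    """Return (start, end, n_codons) for each sORF in the 3 forward frames.

    start/end are 0-based, end-exclusive nucleotide positions; end includes
    the stop codon. For each stop codon, only the most upstream start in
    the same frame is reported.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        i = 0
        while i < len(codons):
            if codons[i] in starts:
                # Extend to the first in-frame stop codon.
                for j in range(i + 1, len(codons)):
                    if codons[j] in STOP_CODONS:
                        n_codons = j - i
                        if n_codons <= max_codons:
                            hits.append((frame + 3 * i, frame + 3 * (j + 1), n_codons))
                        i = j  # resume scanning after this stop
                        break
                else:
                    break  # no in-frame stop downstream: no complete ORF left
            i += 1
    return hits


if __name__ == "__main__":
    # A CUG-initiated ORF is found only when non-AUG starts are allowed.
    print(find_sorfs("CUGGCCUAA"))
    print(find_sorfs("CUGGCCUAA", starts={"ATG"}))
```

Restricting `starts` to `{"ATG"}` gives a rough sense of how many candidate sORFs an AUG-only search would miss, although any real analysis would also need experimental evidence of translation such as the peptide detection described above.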
Gene Expression Omnibus
We thank X. Adiconis and L. Fan for constructing the cDNA libraries used in this study. M.N.C. is supported by a Howard Hughes Medical Institute International Student Research Fellowship, and S.A.S. is supported by a National Research Service Award postdoctoral fellowship (1F32GM099408-01). J.L.R. is supported by a Damon Runyon-Rachleff Innovator Award, a Searle Scholars Award and a Richard and Susan Smith Family Foundation Fellowship. A.S. is supported by a Burroughs Wellcome Fund Career Award in Biomedical Sciences, a Searle Scholars Award and an Alfred P. Sloan Fellowship. This work was also supported by the US National Institutes of Health training grant T32GM007598 (A.J.M.), the US National Human Genome Research Institute grant 3U54HG003067 (J.Z.L.), Director's New Innovator Awards DP2OD00667 (J.L.R.) and DP2OD002374 (A.S.), National Institute of General Medical Sciences grant R01GM102491 (A.S.) and support from Harvard University (A.S.).
The authors declare no competing financial interests.
Cite this article
Slavoff, S., Mitchell, A., Schwaid, A. et al. Peptidomic discovery of short open reading frame–encoded peptides in human cells. Nat Chem Biol 9, 59–64 (2013). https://doi.org/10.1038/nchembio.1120