Abstract
The complete extent to which the human genome is translated into polypeptides is of fundamental importance. We report a peptidomic strategy to detect short open reading frame (sORF)-encoded polypeptides (SEPs) in human cells. We identify 90 SEPs, 86 of which were previously uncharacterized; this is the largest number of human SEPs reported to date. SEP abundances range from 10 to 1,000 molecules per cell, identical to the abundances of known proteins. SEPs arise from sORFs in noncoding RNAs as well as multicistronic mRNAs, and many SEPs initiate with non-AUG start codons, indicating that noncanonical translation may be more widespread in mammals than previously thought. In addition, coding sORFs are present in a small fraction (8 out of 1,866) of long intergenic noncoding RNAs. Together, these results provide strong evidence that the human proteome is more complex than previously appreciated.
Acknowledgements
We thank X. Adiconis and L. Fan for constructing the cDNA libraries used in this study. M.N.C. is supported by a Howard Hughes Medical Institute International Student Research Fellowship, and S.A.S. is supported by a National Research Service Award postdoctoral fellowship (1F32GM099408-01). J.L.R. is supported by a Damon Runyon-Rachleff Innovator Award, a Searle Scholars Award and a Richard and Susan Smith Family Foundation Fellowship. A.S. is supported by a Burroughs Wellcome Fund Career Award in Biomedical Sciences, a Searle Scholars Award and an Alfred P. Sloan Fellowship. This work was also supported by the US National Institutes of Health training grant T32GM007598 (A.J.M.), the US National Human Genome Research Institute grant 3U54HG003067 (J.Z.L.), Director's New Innovator Awards DP2OD00667 (J.L.R.) and DP2OD002374 (A.S.), National Institute of General Medical Sciences grant R01GM102491 (A.S.) and support from Harvard University (A.S.).
Author information
Authors and Affiliations
Contributions
A.J.M. and A.G.S. contributed equally to this work. A.J.M., S.A.S., A.G.S., M.N.C., J.M., J.Z.L., A.D.K., B.A.B., J.L.R. and A.S. designed the experiments. A.J.M., S.A.S., A.G.S., M.N.C., A.D.K. and B.A.B. performed the experiments. A.J.M., S.A.S., J.M., A.G.S. and B.A.B. collected the peptidomics data and, with A.D.K., searched them against the RefSeq database. J.Z.L. provided the RNA-seq data. M.N.C., A.J.M. and J.L.R. performed the lincRNA analysis. S.A.S. performed all cell imaging studies, cloning and FRAT2 experiments. A.J.M., S.A.S., A.G.S., M.N.C., J.L.R. and A.S. discussed the results and implications and wrote the manuscript together.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Results (PDF 820 kb)
Supplementary Data Set 1
Full list of identified SEPs (XLS 38 kb)
Supplementary Data Set 2
AAPGALPEAAVGPR Sf: .81 (PDF 5180 kb)
About this article
Cite this article
Slavoff, S., Mitchell, A., Schwaid, A. et al. Peptidomic discovery of short open reading frame–encoded peptides in human cells. Nat Chem Biol 9, 59–64 (2013). https://doi.org/10.1038/nchembio.1120
DOI: https://doi.org/10.1038/nchembio.1120