Peptidomic discovery of short open reading frame–encoded peptides in human cells

Abstract

The complete extent to which the human genome is translated into polypeptides is of fundamental importance. We report a peptidomic strategy to detect short open reading frame (sORF)-encoded polypeptides (SEPs) in human cells. We identify 90 SEPs, 86 of which are previously uncharacterized, which is the largest number of human SEPs ever reported. SEP abundances range from 10 to 1,000 molecules per cell, the same range as that of known proteins. SEPs arise from sORFs in noncoding RNAs as well as multicistronic mRNAs, and many SEPs initiate with non-AUG start codons, indicating that noncanonical translation may be more widespread in mammals than previously thought. In addition, coding sORFs are present in a small fraction (8 out of 1,866) of long intergenic noncoding RNAs. Together, these results provide strong evidence that the human proteome is more complex than previously appreciated.
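
To make the notion of an sORF with a non-AUG start codon concrete, the following is a minimal illustrative sketch of how candidate sORFs could be enumerated in a transcript sequence. It is not the peptidomic method reported here, which detects the peptides themselves. The set of near-cognate start codons, the length cutoffs (`min_codons`, `max_codons`) and the names `SORF` and `find_sorfs` are assumptions introduced only for this example.

```python
from typing import Iterator, NamedTuple

STOP_CODONS = {"UAA", "UAG", "UGA"}
# AUG plus a few near-cognate codons; which non-AUG codons to allow
# is an assumption made for illustration only.
START_CODONS = frozenset({"AUG", "CUG", "GUG", "UUG", "ACG"})


class SORF(NamedTuple):
    start: int        # 0-based index of the first base of the start codon
    end: int          # index one past the last base of the stop codon
    start_codon: str  # the initiating codon that was matched
    n_codons: int     # sense codons, start codon included, stop excluded


def find_sorfs(rna: str,
               min_codons: int = 2,
               max_codons: int = 150,  # hypothetical cutoff for "short"
               starts: frozenset = START_CODONS) -> Iterator[SORF]:
    """Yield every start-to-stop ORF on the given strand within the length limits.

    Nested ORFs that share a stop codon are all reported.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    for i in range(len(rna) - 2):
        codon = rna[i:i + 3]
        if codon not in starts:
            continue
        # Walk downstream in frame until the first stop codon.
        for j in range(i + 3, len(rna) - 2, 3):
            if rna[j:j + 3] in STOP_CODONS:
                n = (j - i) // 3
                if min_codons <= n <= max_codons:
                    yield SORF(i, j + 3, codon, n)
                break


if __name__ == "__main__":
    for orf in find_sorfs("GGCUGAAACCCUAGG"):
        print(orf)
```

Restricting `starts` to `{"AUG"}` in the same call returns no candidates for this toy sequence, which illustrates why a search limited to canonical start codons can miss sORFs of the kind described above.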

Figure 1: Discovering SEPs.
Figure 2: Overview of SEPs.
Figure 3: SEP quantification.
Figure 4: Expression of SEPs.
Figure 5: Characterization of the non-AUG initiation codon of the FRAT2 sORF.

Accession codes

Accessions

Gene Expression Omnibus

Acknowledgements

We thank X. Adiconis and L. Fan for constructing the cDNA libraries used in this study. M.N.C. is supported by a Howard Hughes Medical Institute International Student Research Fellowship, and S.A.S. is supported by a National Research Service Award postdoctoral fellowship (1F32GM099408-01). J.L.R. is supported by a Damon Runyon-Rachleff Innovator Award, a Searle Scholars Award and a Richard and Susan Smith Family Foundation Fellowship. A.S. is supported by a Burroughs Wellcome Fund Career Award in Biomedical Sciences, a Searle Scholars Award and an Alfred P. Sloan Fellowship. This work was also supported by the US National Institutes of Health training grant T32GM007598 (A.J.M.), the US National Human Genome Research Institute grant 3U54HG003067 (J.Z.L.), Director's New Innovator Awards DP2OD00667 (J.L.R.) and DP2OD002374 (A.S.), National Institute of General Medical Sciences grant R01GM102491 (A.S.) and support from Harvard University (A.S.).

Author information

Contributions

A.J.M. and A.G.S. contributed equally to this work. A.J.M., S.A.S., A.G.S., M.N.C., J.M., J.Z.L., A.D.K., B.A.B., J.L.R. and A.S. designed the experiments. A.J.M., S.A.S., A.G.S., M.N.C., A.D.K. and B.A.B. performed the experiments. A.J.M., S.A.S., J.M., A.G.S. and B.A.B. collected the peptidomics data and, with A.D.K., searched them against the RefSeq database. J.Z.L. provided the RNA-seq data. M.N.C., A.J.M. and J.L.R. performed the lincRNA analysis. S.A.S. performed all cell imaging studies, cloning and FRAT2 experiments. A.J.M., S.A.S., A.G.S., M.N.C., J.L.R. and A.S. discussed the results and implications and wrote the manuscript together.

Corresponding author

Correspondence to Alan Saghatelian.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Results (PDF 820 kb)

Supplementary Data Set 1

Full list of identified SEPs (XLS 38 kb)

Supplementary Data Set 2

AAPGALPEAAVGPR Sf: .81 (PDF 5180 kb)

About this article

Cite this article

Slavoff, S., Mitchell, A., Schwaid, A. et al. Peptidomic discovery of short open reading frame–encoded peptides in human cells. Nat Chem Biol 9, 59–64 (2013). https://doi.org/10.1038/nchembio.1120
