The annotation of the mammalian protein-coding genome is incomplete. Arbitrary size restriction of open reading frames (ORFs) and the absolute requirement for a methionine codon as the sole initiator of translation have constrained the identification of potentially important transcripts with non-canonical protein-coding potential1,2. Here, using unbiased transcriptomic approaches in macrophages that respond to bacterial infection, we show that ribosomes associate with a large number of RNAs that were previously annotated as ‘non-protein coding’. Although the idea that such non-canonical ORFs can encode functional proteins is controversial3,4, we identify a range of short and non-ATG-initiated ORFs that can generate stable and spatially distinct proteins. Notably, we show that the translation of a new ORF ‘hidden’ within the long non-coding RNA Aw112010 is essential for the orchestration of mucosal immunity during both bacterial infection and colitis. This work expands our interpretation of the protein-coding genome and demonstrates that proteinaceous products generated from non-canonical ORFs are crucial for the immune response in vivo. We therefore propose that the misannotation of non-canonical ORF-containing genes as non-coding RNAs may obscure the essential role of a multitude of previously undiscovered protein-coding genes in immunity and disease.


Data availability

RNA-seq, RiboTag RNA-seq, and ribosome profiling data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) repository with the accession code GSE120762. All lncRNA RNA-seq, RiboTagSeq, ribosome profiling sequencing and analysis can also be found in Supplementary Table 1.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


  1. Couso, J. P. & Patraquim, P. Classification and function of small open reading frames. Nat. Rev. Mol. Cell Biol. 18, 575–589 (2017).

  2. Kozak, M. Regulation of translation in eukaryotic systems. Annu. Rev. Cell Biol. 8, 197–225 (1992).

  3. Guttman, M., Russell, P., Ingolia, N. T., Weissman, J. S. & Lander, E. S. Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins. Cell 154, 240–251 (2013).

  4. Ingolia, N. T., Lareau, L. F. & Weissman, J. S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802 (2011).

  5. Hoagland, M. B., Stephenson, M. L., Scott, J. F., Hecht, L. I. & Zamecnik, P. C. A soluble ribonucleic acid intermediate in protein synthesis. J. Biol. Chem. 231, 241–257 (1958).

  6. Sanz, E. et al. Cell-type-specific isolation of ribosome-associated mRNA from complex tissues. Proc. Natl Acad. Sci. USA 106, 13939–13944 (2009).

  7. Clausen, B. E., Burkhardt, C., Reith, W., Renkawitz, R. & Förster, I. Conditional gene targeting in macrophages and granulocytes using LysMcre mice. Transgenic Res. 8, 265–277 (1999).

  8. Mudge, J. M. & Harrow, J. Creating reference gene annotation for the mouse C57BL6/J genome assembly. Mamm. Genome 26, 366–378 (2015).

  9. Pruitt, K. D., Tatusova, T., Brown, G. R. & Maglott, D. R. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 40, D130–D135 (2012).

  10. Kozomara, A. & Griffiths-Jones, S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42, D68–D73 (2014).

  11. Carpenter, S. et al. A long noncoding RNA mediates both activation and repression of immune response genes. Science 341, 789–792 (2013).

  12. Kotzin, J. J. et al. The long non-coding RNA Morrbid regulates Bim and short-lived myeloid cell lifespan. Nature 537, 239–243 (2016).

  13. Osuna, B. A., Howard, C. J., Kc, S., Frost, A. & Weinberg, D. E. In vitro analysis of RQC activities provides insights into the mechanism and function of CAT tailing. eLife 6, e27949 (2017).

  14. Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).

  15. Ji, Z., Song, R., Regev, A. & Struhl, K. Many lncRNAs, 5′UTRs, and pseudogenes are translated and some are likely to express functional proteins. eLife 4, e08890 (2015).

  16. Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

  17. Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

  18. Kondo, T. et al. Small peptides switch the transcriptional activity of Shavenbaby during Drosophila embryogenesis. Science 329, 336–339 (2010).

  19. Kearse, M. G. & Wilusz, J. E. Non-AUG translation: a new start for protein synthesis in eukaryotes. Genes Dev. 31, 1717–1731 (2017).

  20. Stothard, P. The sequence manipulation suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques 28, 1102, 1104 (2000).

  21. Wang, H., Wang, Y., Xie, S., Liu, Y. & Xie, Z. Global and cell-type specific properties of lincRNAs with ribosome occupancy. Nucleic Acids Res. 45, 2786–2796 (2017).

  22. Xiao, Z. et al. De novo annotation and characterization of the translatome with ribosome profiling data. Nucleic Acids Res. 46, e61 (2018).

  23. D’Lima, N. G. et al. A human microprotein that interacts with the mRNA decapping complex. Nat. Chem. Biol. 13, 174–180 (2017).

  24. de Jong, R. et al. Severe mycobacterial and Salmonella infections in interleukin-12 receptor-deficient patients. Science 280, 1435–1438 (1998).

  25. Lehmann, J. et al. IL-12p40-dependent agonistic effects on the development of protective innate and adaptive immunity against Salmonella enteritidis. J. Immunol. 167, 5304–5315 (2001).

  26. Mannon, P. J. et al. Anti-interleukin-12 antibody for active Crohn’s disease. N. Engl. J. Med. 351, 2069–2079 (2004).

  27. Neurath, M. F., Fuss, I., Kelsall, B. L., Stüber, E. & Strober, W. Antibodies to interleukin 12 abrogate established experimental colitis in mice. J. Exp. Med. 182, 1281–1290 (1995).

  28. Pulak, R. & Anderson, P. mRNA surveillance by the Caenorhabditis elegans smg genes. Genes Dev. 7, 1885–1897 (1993).

  29. Martin, L. et al. Identification and characterization of small molecules that inhibit nonsense-mediated RNA decay and suppress nonsense p53 mutations. Cancer Res. 74, 3104–3113 (2014).

  30. Nowarski, R. et al. Epithelial IL-18 equilibrium controls barrier function in colitis. Cell 163, 1444–1456 (2015).

  31. Gabanyi, I. et al. Neuro-immune interactions drive tissue programming in intestinal macrophages. Cell 164, 378–391 (2016).

  32. Obrig, T. G., Culp, W. J., McKeehan, W. L. & Hardesty, B. The mechanism by which cycloheximide and related glutarimide antibiotics inhibit peptide synthesis on reticulocyte ribosomes. J. Biol. Chem. 246, 174–181 (1971).

  33. Schneider-Poetsch, T. et al. Inhibition of eukaryotic translation elongation by cycloheximide and lactimidomycin. Nat. Chem. Biol. 6, 209–217 (2010).

  34. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).

  35. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).

  36. Zhang, H., Meltzer, P. & Davis, S. RCircos: an R package for Circos 2D track plots. BMC Bioinformatics 14, 244 (2013).

  37. Phanstiel, D. H., Boyle, A. P., Araya, C. L. & Snyder, M. P. Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures. Bioinformatics 30, 2808–2810 (2014).

  38. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  39. Frolova, L. et al. A highly conserved eukaryotic protein family possessing properties of polypeptide chain release factor. Nature 372, 701–703 (1994).

  40. Pertea, M. et al. Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise. Preprint at https://www.bioRxiv.org/content/early/2018/05/28/332825 (2018).

  41. Wessel, D. & Flügge, U. I. A method for the quantitative recovery of protein in dilute solution in the presence of detergents and lipids. Anal. Biochem. 138, 141–143 (1984).

  42. Slavoff, S. A. et al. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 9, 59–64 (2013).

  43. Gundry, R. L. et al. Preparation of proteins and peptides for mass spectrometry analysis in a bottom-up proteomics workflow. Curr. Protoc. Mol. Biol. Chapter 10, Unit10.25 (2009).

  44. Gerber, S. A., Rush, J., Stemman, O., Kirschner, M. W. & Gygi, S. P. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc. Natl Acad. Sci. USA 100, 6940–6945 (2003).

  45. Chen, L. M., Kaniga, K. & Galán, J. E. Salmonella spp. are cytotoxic for cultured macrophages. Mol. Microbiol. 21, 1101–1115 (1996).

  46. Gruber, A. R., Lorenz, R., Bernhart, S. H., Neuböck, R. & Hofacker, I. L. The Vienna RNA websuite. Nucleic Acids Res. 36, W70–W74 (2008).

  47. Xu, D. & Zhang, Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins 80, 1715–1735 (2012).



We thank J. Alderman, C. Lieber, C. Hughes, L. Evangelisti, E. Hughes-Picard, E. Ryke, L. Machado and C. Castaldi for help in facilitating this work. We thank J. Galan and H. Sun for providing the S. Typhimurium and for discussion on the infection model. We thank R. Nowarski, M. Healy, K. Baker and N. Palm for comments and discussion on the manuscript. This work was supported by the Howard Hughes Medical Institute and the Blavatnik Family Foundation (R.A.F.). This work was supported in part by the Searle Scholars Program, the Leukemia Research Foundation, an American Cancer Society Institutional Research Grant Individual Award for New Investigators (IRG-58-012-57), the NIH (R01GM122984), and Yale University West Campus start-up funds (to S.S.). A.K. was supported in part by an NIH Predoctoral Training Grant (5T32GM067543-12) (S.S.).

Reviewer information

Nature thanks M. Raffatellu and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information


  1. Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA

    Ruaidhrí Jackson, Lina Kroehling, Will Bailis, Abigail Jarret, Autumn G. York, Omair M. Khan, J. Richard Brewer, Mathias H. Skadow, Coco Duizer, Christian C. D. Harman, Lelina Chang, Piotr Bielecki, Angel G. Solis, Holly R. Steach & Richard A. Flavell

  2. Department of Chemistry, Yale University, New Haven, CT, USA

    Alexandra Khitun & Sarah Slavoff

  3. Chemical Biology Institute, Yale University, West Haven, CT, USA

    Sarah Slavoff

  4. Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, USA

    Sarah Slavoff

  5. Howard Hughes Medical Institute, Yale University, New Haven, CT, USA

    Richard A. Flavell



R.J. conceived the project, performed experiments, analysed the data and wrote the manuscript. L.K. performed all bioinformatics analysis and aided in writing of the manuscript. A.K. performed all the mass spectrometry experiments, analysed data and aided in writing of the manuscript. W.B. performed experiments and contributed major conceptual insight into the work. A.J. and A.G.Y. participated in experimental design, conducted experiments, analysed data and offered vital conceptual insight. O.M.K., J.R.B., M.H.S. and C.D. performed experiments and analysed data. C.C.D.H. helped with bioinformatics analysis and provided conceptual discussion. L.C., P.B., A.G.S. and H.R.S. helped with experiments. S.S. supervised all mass spectrometry work and contributed to the overall interpretation of this work. R.A.F. supervised the project, helped interpret the work and supervised writing of the manuscript.

Competing interests

R.A.F. is a scientific advisor to GlaxoSmithKline. All other authors declare no competing interests.

Corresponding author

Correspondence to Richard A. Flavell.

Extended data figures and tables

  1. Extended Data Fig. 1 RiboTag RNA isolation and mRNA expression.

    a, Nanodrop analysis of ribosome-associated RNA isolated from RiboTag (Cre−) and RiboTagLysM (Cre+) mice showing no detected RNA isolated from Cre− BMDMs. b, Bioanalyzer (Agilent) traces and RNA integrity number (RIN) of ribosome-associated RNA isolated from BMDMs from RiboTagLysM mice non-treated or stimulated with LPS for 6 or 24 h. c, qPCR analysis of ribosome-associated transcripts from non-treated BMDMs, or BMDMs stimulated with LPS (10 ng ml−1) or infected with S. Typhimurium at an MOI of 1 for 6 h. Data are mean ± s.e.m. from six biological replicates.

  2. Extended Data Fig. 2 Ribosome profiling, RNA-seq read tracing and RibORF analysis.

    a–d, Wild-type BMDMs were non-treated or stimulated with LPS (10 ng ml−1) for 6 h and ribosome profiling was conducted. Data are representative of two biological replicates. a, Pattern of RNA-seq transcriptional reads (red) and ribosome profiling translational reads (blue) for Il12b from non-treated (top) and LPS-stimulated (bottom) BMDMs. The gene structure of Il12b is located in the centre, with a thin blue line representing the introns and wide blue rectangles indicating exonic structure. Thinner exonic structures represent annotated 5′ and 3′ UTRs. b, Pattern of RNA-seq transcriptional reads (red) and ribosome profiling translational reads (blue) for a non-RiboTag-identified lncRNA, A130088B15rik, from non-treated (top) and LPS-stimulated (bottom) BMDMs. The gene structure of A130088B15rik is located in the centre, with a thin blue line representing the introns and wide blue rectangles indicating exonic structure. c, Pattern of RNA-seq transcriptional reads (red) and ribosome profiling translational reads (blue) for a RiboTag-identified lncRNA, Aw112010, from non-treated (top) and LPS-stimulated (bottom) BMDMs. The gene structure of Aw112010 is located in the centre, with a thin blue line representing the introns and wide blue rectangles indicating exonic structure. d, RibORF analysis of read distribution (reads per million mappable reads; RPM) around start and stop codons of known, annotated protein-coding genes in steady state and LPS-stimulated samples.

  3. Extended Data Fig. 3 Breakdown of different analytical approaches to predict protein coding lncRNAs.

    a, RiboCode analysis of ribosome-profiling data identifies 85 ORFs within lncRNAs with protein-coding potential. b, Comparison of non-canonical ORFs identified by RibORF, RRS and translation efficiency, and RiboCode analytical strategies from BMDMs expressing lncRNA using ribosome profiling. c, Wild-type BMDMs were non-treated or stimulated with LPS (10 ng ml−1) for 6 h and ribosome profiling was conducted. Data are representative of two biological replicates. Volcano plot of LPS-induced differentially regulated genes identified by RibORF, RiboCode, RRS and translation efficiency analysis.

  4. Extended Data Fig. 4 Overexpression of non-canonical ORFs reveals distinct subcellular localization.

    a, HEK293 cells were transfected with 500 ng of Flag-tagged plasmids encoding the non-canonical ORFs GM7160 and GM9895. Cells were fixed and stained with DAPI (blue, nucleus), phalloidin (red, cytoskeletal F-actin) and anti-Flag (green, ORF of interest). Original magnifications, ×60 and ×100. Data are representative of three or more independent experiments.

  5. Extended Data Fig. 5 Aw112010HA mouse characterization.

    a, Schematic representation of Aw112010HA knock-in mice. b, Genotyping for Aw112010HA mice from CRISPR–Cas9 injections. c, Sequence information for GGSG(×3)–HA-tag insertion used to generate Aw112010HA mice. d, Wild-type and Aw112010HA BMDMs were left untreated or stimulated with LPS (10 ng ml−1) for 6 h. Protein lysates were generated and incubated overnight with anti-haemagglutinin magnetic beads. Purified lysates were probed for haemagglutinin by western blot. Whole-cell lysates were used as a loading control and probed for β-tubulin. Data are representative of three independent experiments.

  6. Extended Data Fig. 6 Mass spectrometry validation of the Aw112010 ORF.

    a, Tandem mass spectrometry (MS/MS) fragmentation of an endogenous peptide from Aw112010 found in LPS-stimulated macrophages after haemagglutinin immunoprecipitation purification. Identified fragment ions (b and y ions, red) are indicated above and below the peptide sequence. b, Aw112010 predicted protein sequence. Peptides detected by mass spectrometry are highlighted in red and blue. c, Mass spectrometry information for the identified fragments shown in Fig. 2h.

  7. Extended Data Fig. 7 Characterization of Aw112010Stop mice.

    a, Schematic representation of Aw112010HA knock-in (KI) mice. b, Genotyping for Aw112010Stop mice generated using CRISPR–Cas9. Het, heterozygous. c, Sanger sequencing of the frameshifting stop codon insertion in Aw112010Stop mice and wild-type controls. d, Sequence of frameshifting stop insertion. Stop codons and frame positions are indicated below the sequence.

  8. Extended Data Fig. 8 Cytokine production in wild-type and Aw112010Stop macrophages and mice.

    a–c, Wild-type and Aw112010Stop BMDMs were stimulated with LPS for indicated times and supernatants were analysed for IL-12p40, IL-6 and IL-10 by ELISA. Data are from six biological replicates conducted over two independent experiments. d, Mice were administered PBS (n = 5) or LPS (n = 6, WT and Aw112010Stop) (10 mg kg−1) for 6 h via intraperitoneal injection. Serum was analysed for IL-6 by ELISA. Error bars denote s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001, unpaired two-tailed t-test.

  9. Extended Data Fig. 9 The Aw112010 protein is required for IL-12p40 production.

    a–c, Predicted protein structure of the Aw112010 non-canonical ORF-encoded protein generated using the Quark package. Aw112010 is predicted to contain a single transmembrane domain. d, BMDMs were subjected to electroporation with indicated plasmids. BMDMs were stimulated with LPS (10 ng ml−1) for 6 h. Supernatants were analysed for IL-12p40 production by ELISA. Data are representative of three biological replicates. ***P < 0.001, unpaired two-tailed t-test.

Supplementary information

  1. Supplementary Information

    This file contains Supplementary Figure 1: Uncropped Western Blots.

  2. Reporting Summary

  3. Supplementary Table

    This file contains Supplementary Table 1.
