The annotation of the mammalian protein-coding genome is incomplete. Arbitrary size restriction of open reading frames (ORFs) and the absolute requirement for a methionine codon as the sole initiator of translation have constrained the identification of potentially important transcripts with non-canonical protein-coding potential¹,². Here, using unbiased transcriptomic approaches in macrophages that respond to bacterial infection, we show that ribosomes associate with a large number of RNAs that were previously annotated as ‘non-protein coding’. Although the idea that such non-canonical ORFs can encode functional proteins is controversial³,⁴, we identify a range of short and non-ATG-initiated ORFs that can generate stable and spatially distinct proteins. Notably, we show that the translation of a new ORF ‘hidden’ within the long non-coding RNA Aw112010 is essential for the orchestration of mucosal immunity during both bacterial infection and colitis. This work expands our interpretation of the protein-coding genome and demonstrates that proteinaceous products generated from non-canonical ORFs are crucial for the immune response in vivo. We therefore propose that the misannotation of non-canonical ORF-containing genes as non-coding RNAs may obscure the essential role of a multitude of previously undiscovered protein-coding genes in immunity and disease.
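The two annotation constraints named in the abstract (a minimum ORF length and an ATG-only start) can be made concrete with a purely sequence-based scan. The sketch below is illustrative and is not the pipeline used in this study, which relied on ribosome profiling rather than sequence alone. The function name `find_orfs`, the near-cognate start set (CTG, GTG, TTG) and the default 100-codon cutoff are assumptions chosen to show how relaxing either constraint changes which ORFs are reported.

```python
# Sequence-only ORF scan (illustrative sketch, not the study's method).
# Assumptions: sense-strand input, standard-code stop codons, and a
# user-chosen start-codon set and minimum length.
STOP_CODONS = {"TAA", "TAG", "TGA"}
NEAR_COGNATE_STARTS = ("ATG", "CTG", "GTG", "TTG")  # assumed non-ATG starts


def find_orfs(seq, starts=("ATG",), min_codons=100):
    """Return (start, end, start_codon, n_codons) tuples.

    start/end are 0-based and end-exclusive, with end including the stop codon.
    n_codons counts sense codons (start included, stop excluded).
    Within a frame, only the most upstream start before each stop is kept.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            codon = seq[i:i + 3]
            if codon not in starts:
                i += 3
                continue
            # Walk codon by codon until the first in-frame stop codon.
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no stop downstream, so no later start in this frame can close
            n_codons = (j - i) // 3
            if n_codons >= min_codons:
                orfs.append((i, j + 3, codon, n_codons))
            i = j + 3  # resume scanning after the stop codon
    return orfs


if __name__ == "__main__":
    toy = "CTGAAATAA"  # hypothetical toy sequence
    print(find_orfs(toy, min_codons=1))  # ATG-only: nothing found
    print(find_orfs(toy, starts=NEAR_COGNATE_STARTS, min_codons=1))
```

With ATG as the only permitted start, the toy sequence yields no ORF. Allowing CTG recovers a two-codon ORF, which would still be discarded under a conventional length cutoff.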
RNA-seq, RiboTag RNA-seq, and ribosome profiling data that support the findings of this study have been deposited in the Gene Expression Omnibus (GEO) repository under accession code GSE120762. The lncRNA RNA-seq, RiboTag RNA-seq and ribosome profiling data and analyses are also provided in Supplementary Table 1.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Couso, J. P. & Patraquim, P. Classification and function of small open reading frames. Nat. Rev. Mol. Cell Biol. 18, 575–589 (2017).
Kozak, M. Regulation of translation in eukaryotic systems. Annu. Rev. Cell Biol. 8, 197–225 (1992).
Guttman, M., Russell, P., Ingolia, N. T., Weissman, J. S. & Lander, E. S. Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins. Cell 154, 240–251 (2013).
Ingolia, N. T., Lareau, L. F. & Weissman, J. S. Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147, 789–802 (2011).
Hoagland, M. B., Stephenson, M. L., Scott, J. F., Hecht, L. I. & Zamecnik, P. C. A soluble ribonucleic acid intermediate in protein synthesis. J. Biol. Chem. 231, 241–257 (1958).
Sanz, E. et al. Cell-type-specific isolation of ribosome-associated mRNA from complex tissues. Proc. Natl Acad. Sci. USA 106, 13939–13944 (2009).
Clausen, B. E., Burkhardt, C., Reith, W., Renkawitz, R. & Förster, I. Conditional gene targeting in macrophages and granulocytes using LysMcre mice. Transgenic Res. 8, 265–277 (1999).
Mudge, J. M. & Harrow, J. Creating reference gene annotation for the mouse C57BL6/J genome assembly. Mamm. Genome 26, 366–378 (2015).
Pruitt, K. D., Tatusova, T., Brown, G. R. & Maglott, D. R. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 40, D130–D135 (2012).
Kozomara, A. & Griffiths-Jones, S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucleic Acids Res. 42, D68–D73 (2014).
Carpenter, S. et al. A long noncoding RNA mediates both activation and repression of immune response genes. Science 341, 789–792 (2013).
Kotzin, J. J. et al. The long non-coding RNA Morrbid regulates Bim and short-lived myeloid cell lifespan. Nature 537, 239–243 (2016).
Osuna, B. A., Howard, C. J., Kc, S., Frost, A. & Weinberg, D. E. In vitro analysis of RQC activities provides insights into the mechanism and function of CAT tailing. eLife 6, e27949 (2017).
Ingolia, N. T., Ghaemmaghami, S., Newman, J. R. & Weissman, J. S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).
Ji, Z., Song, R., Regev, A. & Struhl, K. Many lncRNAs, 5′UTRs, and pseudogenes are translated and some are likely to express functional proteins. eLife 4, e08890 (2015).
Lander, E. S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
Kondo, T. et al. Small peptides switch the transcriptional activity of Shavenbaby during Drosophila embryogenesis. Science 329, 336–339 (2010).
Kearse, M. G. & Wilusz, J. E. Non-AUG translation: a new start for protein synthesis in eukaryotes. Genes Dev. 31, 1717–1731 (2017).
Wang, H., Wang, Y., Xie, S., Liu, Y. & Xie, Z. Global and cell-type specific properties of lincRNAs with ribosome occupancy. Nucleic Acids Res. 45, 2786–2796 (2017).
Xiao, Z. et al. De novo annotation and characterization of the translatome with ribosome profiling data. Nucleic Acids Res. 46, e61 (2018).
D’Lima, N. G. et al. A human microprotein that interacts with the mRNA decapping complex. Nat. Chem. Biol. 13, 174–180 (2017).
de Jong, R. et al. Severe mycobacterial and Salmonella infections in interleukin-12 receptor-deficient patients. Science 280, 1435–1438 (1998).
Lehmann, J. et al. IL-12p40-dependent agonistic effects on the development of protective innate and adaptive immunity against Salmonella enteritidis. J. Immunol. 167, 5304–5315 (2001).
Mannon, P. J. et al. Anti-interleukin-12 antibody for active Crohn’s disease. N. Engl. J. Med. 351, 2069–2079 (2004).
Neurath, M. F., Fuss, I., Kelsall, B. L., Stüber, E. & Strober, W. Antibodies to interleukin 12 abrogate established experimental colitis in mice. J. Exp. Med. 182, 1281–1290 (1995).
Pulak, R. & Anderson, P. mRNA surveillance by the Caenorhabditis elegans smg genes. Genes Dev. 7, 1885–1897 (1993).
Martin, L. et al. Identification and characterization of small molecules that inhibit nonsense-mediated RNA decay and suppress nonsense p53 mutations. Cancer Res. 74, 3104–3113 (2014).
Nowarski, R. et al. Epithelial IL-18 equilibrium controls barrier function in colitis. Cell 163, 1444–1456 (2015).
Gabanyi, I. et al. Neuro-immune interactions drive tissue programming in intestinal macrophages. Cell 164, 378–391 (2016).
Obrig, T. G., Culp, W. J., McKeehan, W. L. & Hardesty, B. The mechanism by which cycloheximide and related glutarimide antibiotics inhibit peptide synthesis on reticulocyte ribosomes. J. Biol. Chem. 246, 174–181 (1971).
Schneider-Poetsch, T. et al. Inhibition of eukaryotic translation elongation by cycloheximide and lactimidomycin. Nat. Chem. Biol. 6, 209–217 (2010).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Zhang, H., Meltzer, P. & Davis, S. RCircos: an R package for Circos 2D track plots. BMC Bioinformatics 14, 244 (2013).
Phanstiel, D. H., Boyle, A. P., Araya, C. L. & Snyder, M. P. Sushi.R: flexible, quantitative and integrative genomic visualizations for publication-quality multi-panel figures. Bioinformatics 30, 2808–2810 (2014).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Frolova, L. et al. A highly conserved eukaryotic protein family possessing properties of polypeptide chain release factor. Nature 372, 701–703 (1994).
Pertea, M. et al. Thousands of large-scale RNA sequencing experiments yield a comprehensive new human gene list and reveal extensive transcriptional noise. Preprint at https://www.bioRxiv.org/content/early/2018/05/28/332825 (2018).
Wessel, D. & Flügge, U. I. A method for the quantitative recovery of protein in dilute solution in the presence of detergents and lipids. Anal. Biochem. 138, 141–143 (1984).
Slavoff, S. A. et al. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 9, 59–64 (2013).
Gundry, R. L. et al. Preparation of proteins and peptides for mass spectrometry analysis in a bottom-up proteomics workflow. Curr. Protoc. Mol. Biol. Chapter 10, Unit 10.25 (2009).
Gerber, S. A., Rush, J., Stemman, O., Kirschner, M. W. & Gygi, S. P. Absolute quantification of proteins and phosphoproteins from cell lysates by tandem MS. Proc. Natl Acad. Sci. USA 100, 6940–6945 (2003).
Chen, L. M., Kaniga, K. & Galán, J. E. Salmonella spp. are cytotoxic for cultured macrophages. Mol. Microbiol. 21, 1101–1115 (1996).
Gruber, A. R., Lorenz, R., Bernhart, S. H., Neuböck, R. & Hofacker, I. L. The Vienna RNA websuite. Nucleic Acids Res. 36, W70–W74 (2008).
Xu, D. & Zhang, Y. Ab initio protein structure assembly using continuous structure fragments and optimized knowledge-based force field. Proteins 80, 1715–1735 (2012).
We thank J. Alderman, C. Lieber, C. Hughes, L. Evangelisti, E. Hughes-Picard, E. Ryke, L. Machado and C. Castaldi for help in facilitating this work. We thank J. Galan and H. Sun for providing the S. Typhimurium strain and for discussion on the infection model. We thank R. Nowarski, M. Healy, K. Baker and N. Palm for comments and discussion on the manuscript. This work was supported by the Howard Hughes Medical Institute and the Blavatnik Family Foundation (R.A.F.). This work was supported in part by the Searle Scholars Program, the Leukemia Research Foundation, an American Cancer Society Institutional Research Grant Individual Award for New Investigators (IRG-58-012-57), the NIH (R01GM122984), and Yale University West Campus start-up funds (to S.S.). A.K. was supported in part by an NIH Predoctoral Training Grant (5T32GM067543-12) (S.S.).
Nature thanks M. Raffatellu and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, Nanodrop analysis of ribosome-associated RNA isolated from RiboTag (Cre−) and RiboTagLysM (Cre+) mice showing no detected RNA isolated from Cre− BMDMs. b, Bioanalyzer (Agilent) traces and RNA integrity number (RIN) of ribosome-associated RNA isolated from BMDMs from RiboTagLysM mice non-treated or stimulated with LPS for 6 or 24 h. c, qPCR analysis of ribosome-associated transcripts from non-treated BMDMs, or BMDMs stimulated with LPS (10 ng ml−1) or infected with S. Typhimurium at an MOI of 1 for 6 h. Data are mean ± s.e.m. from six biological replicates.
a–d, Wild-type BMDMs were non-treated or stimulated with LPS (10 ng ml−1) for 6 h and ribosome profiling was conducted. Data are representative of two biological replicates. a, Pattern of RNA-seq transcriptional reads (red) and ribosome profiling translational reads (blue) for Il12b from non-treated (top) and LPS-stimulated (bottom) BMDMs. The gene structure of Il12b is located in the centre, with a thin blue line representing the introns and wide blue rectangles indicating exonic structure. Thinner exonic structures represent annotated 5′ and 3′ UTRs. b, Pattern of RNA-seq transcriptional reads (red) and ribosome profiling translational reads (blue) for a non-RiboTag-identified lncRNA, A130088B15rik, from non-treated (top) and LPS-stimulated (bottom) BMDMs. The gene structure of A130088B15rik is located in the centre, with a thin blue line representing the introns and wide blue rectangles indicating exonic structure. c, Pattern of RNA-seq transcriptional reads (red) and ribosome profiling translational reads (blue) for a RiboTag-identified lncRNA, Aw112010, from non-treated (top) and LPS-stimulated (bottom) BMDMs. The gene structure of Aw112010 is located in the centre, with a thin blue line representing the introns and wide blue rectangles indicating exonic structure. d, RibORF analysis of read distribution (reads per million mappable reads; RPM) around start and stop codons of known, annotated protein-coding genes in steady-state and LPS-stimulated samples.
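Panel d plots read density around start and stop codons in reads per million mappable reads (RPM). The sketch below shows the general idea of such a metagene profile: count reads at each offset from an anchor codon, summed over transcripts, and scale by 10⁶ divided by the library's mappable read count. It does not reproduce RibORF. The input layout (P-site positions in transcript coordinates) and the fixed 12-nt P-site offset in the hypothetical helper `psite_from_5prime` are assumptions; real pipelines calibrate the offset per read length.

```python
from collections import defaultdict


def psite_from_5prime(read_start, offset=12):
    # Hypothetical fixed P-site offset from the read 5' end (assumption).
    return read_start + offset


def metagene_rpm(psites, anchors, total_mapped_reads, window=30):
    """Reads at each offset from an anchor codon, summed over transcripts, in RPM.

    psites: {transcript_id: [P-site position, ...]} in transcript coordinates.
    anchors: {transcript_id: anchor position}, e.g. first nt of the start codon.
    total_mapped_reads: mappable reads in the library, used for RPM scaling.
    Returns {offset: RPM} for every offset in [-window, +window].
    """
    counts = defaultdict(int)
    for tx, anchor in anchors.items():
        for pos in psites.get(tx, []):
            offset = pos - anchor
            if -window <= offset <= window:
                counts[offset] += 1
    scale = 1e6 / total_mapped_reads  # reads -> reads per million
    return {off: counts[off] * scale for off in range(-window, window + 1)}


if __name__ == "__main__":
    reads_5prime = {"tx1": [88, 88, 91, 188]}  # hypothetical read starts
    psites = {tx: [psite_from_5prime(r) for r in rs] for tx, rs in reads_5prime.items()}
    profile = metagene_rpm(psites, {"tx1": 100}, total_mapped_reads=1_000_000)
    print(profile[0], profile[3])
```

For a translated ORF, such a profile typically shows a 3-nt periodicity downstream of the start codon, which is the signal that ORF-calling tools exploit.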
Extended Data Fig. 3 Breakdown of different analytical approaches to predict protein coding lncRNAs.
a, RiboCode analysis of ribosome-profiling data identifies 85 ORFs within lncRNAs with protein-coding potential. b, Comparison of the non-canonical ORFs within BMDM-expressed lncRNAs identified from ribosome-profiling data by three analytical strategies: RibORF, RRS and translation efficiency, and RiboCode. c, Wild-type BMDMs were non-treated or stimulated with LPS (10 ng ml−1) for 6 h and ribosome profiling was conducted. Data are representative of two biological replicates. Volcano plot of LPS-induced differentially regulated genes identified by RibORF, RiboCode, RRS and translation efficiency analysis.
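Two quantities in this legend can be shown in a few lines. Translation efficiency is commonly computed as ribosome-footprint abundance divided by RNA-seq abundance for the same gene. A volcano plot places each gene at (log2 fold change, −log10 P). The sketch below uses a naive counts-per-million ratio with an assumed pseudocount of 1. It is not the study's statistical model: the reference list cites featureCounts and DESeq2, which model dispersion properly, and the ribosome release score (RRS) is not covered here. All gene names and counts are hypothetical.

```python
import math


def cpm(counts):
    """Counts per million for a {gene: raw_count} library."""
    total = sum(counts.values())
    return {g: c * 1e6 / total for g, c in counts.items()}


def translation_efficiency(ribo_counts, rna_counts, pseudocount=1.0):
    """Naive TE: footprint CPM / RNA-seq CPM for genes present in both libraries."""
    ribo, rna = cpm(ribo_counts), cpm(rna_counts)
    return {g: (ribo[g] + pseudocount) / (rna[g] + pseudocount)
            for g in ribo if g in rna}


def volcano_coords(log2_fold_change, p_value):
    """x = log2 fold change, y = -log10(P), as plotted on a volcano plot."""
    return log2_fold_change, -math.log10(p_value)


if __name__ == "__main__":
    te_ctrl = translation_efficiency({"geneA": 10, "geneB": 30}, {"geneA": 20, "geneB": 20})
    te_lps = translation_efficiency({"geneA": 40, "geneB": 20}, {"geneA": 20, "geneB": 20})
    for gene in te_ctrl:
        log2fc = math.log2(te_lps[gene] / te_ctrl[gene])
        print(gene, round(log2fc, 3))
    print(volcano_coords(1.0, 0.01))  # (1.0, ~2.0)
```

A pseudocount avoids division by zero for genes with no RNA-seq reads, but it also compresses fold changes for lowly expressed genes. This is one reason count-based models are preferred for formal testing.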
Extended Data Fig. 4 Overexpression of non-canonical ORFs reveals distinct subcellular localization.
a, HEK293 cells were transfected with 500 ng of Flag-tagged plasmids encoding the non-canonical ORFs GM7160 and GM9895. Cells were fixed and stained with DAPI (blue, nucleus), phalloidin (red, cytoskeletal F-actin) and anti-Flag (green, ORF of interest). Original magnifications, ×60 and ×100. Data are representative of three or more independent experiments.
a, Schematic representation of Aw112010HA knock-in mice. b, Genotyping for Aw112010HA mice from CRISPR–Cas9 injections. c, Sequence information for the GGSG(×3)–HA-tag insertion used to generate Aw112010HA mice. d, Wild-type and Aw112010HA BMDMs were left untreated or stimulated with LPS (10 ng ml−1) for 6 h. Protein lysates were generated and incubated overnight with anti-haemagglutinin magnetic beads. Purified lysates were probed for haemagglutinin by western blot. Whole-cell lysates were used as a loading control and probed for β-tubulin. Data are representative of three independent experiments.
a, Tandem mass spectrometry (MS/MS) fragmentation of an endogenous peptide from Aw112010 found in LPS-stimulated macrophages after haemagglutinin immunoprecipitation purification. Identified fragment ions (b and y ions, red) are indicated above and below the peptide sequence. b, Aw112010 predicted protein sequence. Peptides detected by mass spectrometry are highlighted in red and blue. c, Mass spectrometry information for the identified fragments shown in Fig. 2h.
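Panel a annotates b and y fragment ions. For an unmodified peptide, a singly charged b ion is the sum of the N-terminal residue masses plus a proton. A singly charged y ion is the sum of the C-terminal residue masses plus water and a proton. The sketch below computes these monoisotopic m/z values. It assumes charge 1+ and no modifications, and it uses an arbitrary example peptide rather than the Aw112010 peptide, whose sequence is given in the figure and not reproduced here.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def by_ions(peptide):
    """Singly charged b_1..b_(n-1) and y_1..y_(n-1) m/z for an unmodified peptide.

    b_i = sum of the first i residues + proton
    y_i = sum of the last i residues + water + proton
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide.upper()]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


if __name__ == "__main__":
    b, y = by_ions("SAMPLER")  # arbitrary example peptide
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} {bi:.4f}   y{i} {yi:.4f}")
```

A useful consistency check is that complementary ions b_i and y_(n−i) sum to the peptide's neutral mass plus two protons.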
a, Schematic representation of Aw112010HA knock-in (KI) mice. b, Genotyping for Aw112010Stop mice generated using CRISPR–Cas9. Het, heterozygous. c, Sanger sequencing of the frameshifting stop codon insertion in Aw112010Stop mice and wild-type controls. d, Sequence of frameshifting stop insertion. Stop codons and frame positions are indicated below the sequence.
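Panel d marks stop codons and their frame positions in the inserted sequence. An insertion whose length is not a multiple of three shifts the downstream reading frame. A cassette with a stop codon in every frame therefore terminates translation whichever frame the ribosome is in. The sketch below lists stop-codon positions in frames 0, 1 and 2 and checks whether all three are blocked. The toy sequence is hypothetical and is not the Aw112010Stop insertion, and the function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions_by_frame(seq):
    """0-based start positions of stop codons in reading frames 0, 1 and 2."""
    seq = seq.upper().replace("U", "T")
    return {frame: [i for i in range(frame, len(seq) - 2, 3)
                    if seq[i:i + 3] in STOP_CODONS]
            for frame in range(3)}


def blocks_all_frames(seq):
    """True if every reading frame contains at least one stop codon."""
    return all(stop_positions_by_frame(seq).values())


if __name__ == "__main__":
    toy_cassette = "TGACTAACTAG"  # hypothetical sequence, not the study's insert
    print(stop_positions_by_frame(toy_cassette))
    print(blocks_all_frames(toy_cassette))
```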
a–c, Wild-type and Aw112010Stop BMDMs were stimulated with LPS for the indicated times and supernatants were analysed for IL-12p40, IL-6 and IL-10 by ELISA. Data are from six biological replicates conducted over two independent experiments. d, Mice were injected intraperitoneally with PBS (n = 5) or LPS (10 mg kg−1; n = 6, WT and Aw112010Stop), and serum collected 6 h later was analysed for IL-6 by ELISA. Error bars denote s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001, unpaired two-tailed t-test.
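The statistics in these legends (mean ± s.e.m., unpaired two-tailed t-test, asterisk thresholds) can be reproduced with the standard library alone. The sketch below assumes Student's pooled-variance form of the test, which is one common default. The legend does not say whether a Welch correction was used. The two-tailed P comes from the identity P = I_{df/(df+t²)}(df/2, 1/2), where I is the regularized incomplete beta function, evaluated here by a continued fraction. Function names and input values are illustrative.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample SD (n - 1 denominator) / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def _betacf(a, b, x, max_iter=200, eps=3e-14, tiny=1e-300):
    # Continued fraction for the incomplete beta function (modified Lentz).
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_tailed_p(t, df):
    """Two-tailed P for Student's t with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def unpaired_t_test(x, y):
    """Pooled-variance unpaired t-test; returns (t, df, two-tailed P)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    return t, df, t_two_tailed_p(t, df)


def stars(p):
    """Asterisk convention from the legend; 'ns' if P >= 0.05."""
    for cutoff, label in ((1e-4, "****"), (1e-3, "***"), (1e-2, "**"), (0.05, "*")):
        if p < cutoff:
            return label
    return "ns"


if __name__ == "__main__":
    wt, mutant = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]  # hypothetical values
    t, df, p = unpaired_t_test(wt, mutant)
    print(f"mean±sem WT {mean(wt):.2f}±{sem(wt):.2f}; t={t:.3f}, df={df}, P={p:.4f} {stars(p)}")
```

The pooled form assumes equal variances in the two groups. When variances differ markedly, Welch's test, which uses unpooled variances and adjusted degrees of freedom, is the usual alternative.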
a–c, Predicted structure of the protein encoded by the Aw112010 non-canonical ORF, generated using QUARK. Aw112010 is predicted to contain a single transmembrane domain. d, BMDMs were electroporated with the indicated plasmids and then stimulated with LPS (10 ng ml−1) for 6 h. Supernatants were analysed for IL-12p40 production by ELISA. Data are representative of three biological replicates. ***P < 0.001, unpaired two-tailed t-test.
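The transmembrane prediction above comes from ab initio structure modelling. A much simpler, commonly used first-pass check for a single membrane-spanning helix is a Kyte–Doolittle hydropathy scan. This is a different technique, shown here only to illustrate the idea. It slides a window (19 residues is the classic choice) along the protein and flags windows whose mean hydropathy reaches a threshold (about 1.6 in the original description). The window, threshold and function names are assumptions. The test sequences are synthetic, not the Aw112010 protein.

```python
# Kyte–Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full-length window (empty if protein < window)."""
    protein = protein.upper()
    return [sum(KYTE_DOOLITTLE[aa] for aa in protein[i:i + window]) / window
            for i in range(len(protein) - window + 1)]


def candidate_tm_windows(protein, window=19, threshold=1.6):
    """Start indices of windows whose mean hydropathy is >= threshold."""
    return [i for i, score in enumerate(hydropathy_profile(protein, window))
            if score >= threshold]


if __name__ == "__main__":
    synthetic = "K" * 10 + "L" * 20 + "K" * 10  # hydrophobic core, charged flanks
    hits = candidate_tm_windows(synthetic)
    print(hits[0], hits[-1], len(hits))
```

Hydropathy scans do not distinguish signal peptides from transmembrane helices and can miss amphipathic segments, so they complement rather than replace structure-based predictions.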