Translation is pervasive outside of canonical coding regions, occurring in long noncoding RNAs, canonical untranslated regions and introns1,2,3,4, especially in ageing4,5,6, neurodegeneration5,7 and cancer8,9,10. Notably, the majority of tumour-specific antigens result from noncoding translation11,12,13. Although the resulting polypeptides are often nonfunctional, translation of noncoding regions is nonetheless necessary for the birth of new coding sequences14,15. The mechanisms underlying the surveillance of translation in diverse noncoding regions and how escaped polypeptides evolve new functions remain unclear10,16,17,18,19. Functional polypeptides derived from annotated noncoding sequences often localize to membranes20,21. Here we integrate massively parallel analyses of more than 10,000 human genomic sequences and millions of random sequences with genome-wide CRISPR screens, accompanied by in-depth genetic and biochemical characterizations. Our results show that the intrinsic nucleotide bias in the noncoding genome and in the genetic code frequently results in polypeptides with a hydrophobic C-terminal tail, which is captured by the ribosome-associated BAG6 membrane protein triage complex for either proteasomal degradation or membrane targeting. By contrast, canonical proteins have evolved to deplete C-terminal hydrophobic residues. Our results reveal a fail-safe mechanism for the surveillance of unwanted translation from diverse noncoding regions and suggest a possible biochemical route for the preferential membrane localization of newly evolved proteins.
Scripts for data analysis are available at https://github.com/xuebingwu/noncoding-translation-code.
Ingolia, N. T. et al. Ribosome profiling reveals pervasive translation outside of annotated protein-coding genes. Cell Rep. 8, 1365–1379 (2014).
Ji, Z., Song, R., Regev, A. & Struhl, K. Many lncRNAs, 5′UTRs, and pseudogenes are translated and some are likely to express functional proteins. eLife 4, e08890 (2015).
Weatheritt, R. J., Sterne-Weiler, T. & Blencowe, B. J. The ribosome-engaged landscape of alternative splicing. Nat. Struct. Mol. Biol. 23, 1117–1123 (2016).
Sudmant, P. H., Lee, H., Dominguez, D., Heiman, M. & Burge, C. B. Widespread accumulation of ribosome-associated isolated 3′ UTRs in neuronal cell populations of the aging brain. Cell Rep. 25, 2447–2456.e4 (2018).
Adusumalli, S., Ngian, Z. K., Lin, W. Q., Benoukraf, T. & Ong, C. T. Increased intron retention is a post-transcriptional signature associated with progressive aging and Alzheimer’s disease. Aging Cell 18, e12928 (2019).
Mazin, P. et al. Widespread splicing changes in human brain development and aging. Mol. Syst. Biol. 9, 633 (2013).
Hsieh, Y. C. et al. Tau-mediated disruption of the spliceosome triggers cryptic RNA splicing and neurodegeneration in Alzheimer’s disease. Cell Rep. 29, 301–316.e10 (2019).
Dvinge, H. & Bradley, R. K. Widespread intron retention diversifies most cancer transcriptomes. Genome Med. 7, 45 (2015).
Lee, S. H. et al. Widespread intronic polyadenylation inactivates tumour suppressor genes in leukaemia. Nature 561, 127–131 (2018).
Dhamija, S. et al. A pan-cancer analysis reveals nonstop extension mutations causing SMAD4 tumour suppressor degradation. Nat. Cell Biol. 22, 999–1010 (2020).
Laumont, C. M. et al. Noncoding regions are the main source of targetable tumor-specific antigens. Sci. Transl. Med. 10, eaau5516 (2018).
Xiang, R. et al. Increased expression of peptides from non-coding genes in cancer proteomics datasets suggests potential tumor neoantigens. Commun. Biol. 4, 496 (2021).
Smart, A. C. et al. Intron retention is a source of neoepitopes in cancer. Nat. Biotechnol. 36, 1056–1058 (2018).
Vakirlis, N. et al. De novo emergence of adaptive membrane proteins from thymine-rich genomic sequences. Nat. Commun. 11, 781 (2020).
Carvunis, A. R. et al. Proto-genes and de novo gene birth. Nature 487, 370–374 (2012).
Yordanova, M. M. et al. AMD1 mRNA employs ribosome stalling as a mechanism for molecular memory formation. Nature 553, 356–360 (2018).
Hashimoto, S., Nobuta, R., Izawa, T. & Inada, T. Translation arrest as a protein quality control system for aberrant translation of the 3′-UTR in mammalian cells. FEBS Lett. 593, 777–787 (2019).
Arribere, J. A. et al. Translation readthrough mitigation. Nature 534, 719–723 (2016).
Kramarski, L. & Arbely, E. Translational read-through promotes aggregation and shapes stop codon identity. Nucleic Acids Res. 48, 3747–3760 (2020).
Chen, J. et al. Pervasive functional translation of noncanonical human open reading frames. Science 367, 1140–1146 (2020).
van Heesch, S. et al. The translational landscape of the human heart. Cell 178, 242–260.e29 (2019).
Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).
Bai, B. et al. U1 small nuclear ribonucleoprotein complex and RNA splicing alterations in Alzheimer’s disease. Proc. Natl Acad. Sci. USA 110, 16562–16567 (2013).
Wang, L. et al. SF3B1 and other novel cancer genes in chronic lymphocytic leukemia. N. Engl. J. Med. 365, 2497–2506 (2011).
Hsu, T. Y. et al. The spliceosome is a therapeutic vulnerability in MYC-driven cancer. Nature 525, 384–388 (2015).
Wang, D. et al. Inhibition of nonsense-mediated RNA decay by the tumor microenvironment promotes tumorigenesis. Mol. Cell. Biol. 31, 3670–3680 (2011).
Son, H. G. et al. RNA surveillance via nonsense-mediated mRNA decay is crucial for longevity in daf-2/insulin/IGF-1 mutant C. elegans. Nat. Commun. 8, 14749 (2017).
Sun, Y., Eshov, A., Zhou, J., Isiktas, A. U. & Guo, J. U. C9orf72 arginine-rich dipeptide repeats inhibit UPF1-mediated RNA decay via translational repression. Nat. Commun. 11, 3354 (2020).
Wangen, J. R. & Green, R. Stop codon context influences genome-wide stimulation of termination codon readthrough by aminoglycosides. eLife 9, e52611 (2020).
Dong, C. et al. Intron retention-induced neoantigen load correlates with unfavorable prognosis in multiple myeloma. Oncogene 40, 6130–6138 (2021).
Lin, H. C. et al. C-terminal end-directed protein elimination by CRL2 ubiquitin ligases. Mol. Cell 70, 602–613.e3 (2018).
Koren, I. et al. The eukaryotic proteome is shaped by E3 ubiquitin ligases targeting C-terminal degrons. Cell 173, 1622–1635.e14 (2018).
Dyson, H. J. & Wright, P. E. Intrinsically unstructured proteins and their functions. Nat. Rev. Mol. Cell Biol. 6, 197–208 (2005).
Zhang, Y. E., Vibranovski, M. D., Landback, P., Marais, G. A. B. & Long, M. Y. Chromosomal redistribution of male-biased genes in mammalian evolution with two bursts of gene gain on the X chromosome. PLoS Biol. 8, e1000494 (2010).
Wolfenden, R. V., Cullis, P. M. & Southgate, C. C. Water, protein folding, and the genetic code. Science 206, 575–577 (1979).
Juszkiewicz, S. & Hegde, R. S. Initiation of quality control during poly(A) translation requires site-specific ribosome ubiquitination. Mol. Cell 65, 743–750.e4 (2017).
Liu, Z. et al. Systematic comparison of 2A peptides for cloning multi-genes in a polycistronic vector. Sci. Rep. 7, 2193 (2017).
Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR–Cas9 system. Science 343, 80–84 (2014).
Wunderley, L., Leznicki, P., Payapilly, A. & High, S. SGTA regulates the cytosolic quality control of hydrophobic substrates. J. Cell Sci. 127, 4728–4739 (2014).
Shao, S., Rodrigo-Brenni, M. C., Kivlen, M. H. & Hegde, R. S. Mechanistic basis for a molecular triage reaction. Science 355, 298–302 (2017).
Hessa, T. et al. Protein targeting and degradation are coupled for elimination of mislocalized proteins. Nature 475, 394–397 (2011).
Mariappan, M. et al. A ribosome-associating factor chaperones tail-anchored membrane proteins. Nature 466, 1120–1124 (2010).
Rodrigo-Brenni, M. C., Gutierrez, E. & Hegde, R. S. Cytosolic quality control of mislocalized proteins requires RNF126 recruitment to Bag6. Mol. Cell 55, 227–237 (2014).
Hu, X. et al. RNF126-mediated reubiquitination is required for proteasomal degradation of p97-extracted membrane proteins. Mol. Cell 79, 320–331.e9 (2020).
Wang, Q. et al. A ubiquitin ligase-associated chaperone holdase maintains polypeptides in soluble states for proteasome degradation. Mol. Cell 42, 758–770 (2011).
Leznicki, P. & High, S. SGTA associates with nascent membrane protein precursors. EMBO Rep. 21, e48835 (2020).
Akahane, T., Sahara, K., Yashiroda, H., Tanaka, K. & Murata, S. Involvement of Bag6 and the TRC pathway in proteasome assembly. Nat. Commun. 4, 2234 (2013).
Yewdell, J. W. & Nicchitta, C. V. The DRiP hypothesis decennial: support, controversy, refinement and extension. Trends Immunol. 27, 368–373 (2006).
Minami, R. et al. BAG-6 is essential for selective elimination of defective proteasomal substrates. J. Cell Biol. 190, 637–650 (2010).
Huang, L., Kuhls, M. C. & Eisenlohr, L. C. Hydrophobicity as a driver of MHC class I antigen processing. EMBO J. 30, 1634–1644 (2011).
Brinkman, E. K., Chen, T., Amendola, M. & van Steensel, B. Easy quantitative assessment of genome editing by sequence trace decomposition. Nucleic Acids Res. 42, e168 (2014).
Hezroni, H. et al. Principles of long noncoding RNA evolution derived from direct comparison of transcriptomes in 17 species. Cell Rep. 11, 1110–1122 (2015).
Tareen, A. & Kinney, J. B. Logomaker: beautiful sequence logos in Python. Bioinformatics 36, 2272–2274 (2020).
Joung, J. et al. Genome-scale CRISPR–Cas9 knockout and transcriptional activation screening. Nat. Protoc. 12, 828–863 (2017).
Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
Moffat, L. & Jones, D. T. Increasing the accuracy of single sequence prediction methods using a deep semi-supervised learning framework. Bioinformatics 37, 3744–3751 (2021).
Osorio, D., Rondon-Villarreal, P. & Torres, R. Peptides: a package for data mining of antimicrobial peptides. R J. 7, 4–14 (2015).
Miyazawa, S. & Jernigan, R. L. Estimation of effective interresidue contact energies from protein crystal structures: quasi-chemical approximation. Macromolecules 18, 534–552 (1985).
Lu, S. et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 48, D265–D268 (2020).
The authors thank D. Bartel for supporting some of the early work on this project; C. Zhang, P. Sims, B. Honig and M. AlQuraishi for discussion; and S. Diederichs for sharing the SMAD4 readthrough cells. X.W. is supported by an NIH Director’s New Innovator Award (1DP2GM140977), a Pershing Square Sohn Prize for Cancer Research, a Pew-Stewart Scholar for Cancer Research Award, and the Impetus Longevity Grants. N.M. is supported by the National Institute on Aging (NIA) grants R01AG064244 and RF1AG070075. This research was funded in part through the NIH/NCI Cancer Center Support Grant P30CA013696 and used the Genomics and High Throughput Screening Shared Resource and CCTI Flow Cytometry Core. The CCTI Flow Cytometry Core is supported in part by the Office of the Director, National Institutes of Health under awards S10RR027050 and S10OD020056. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
The authors declare no competing interests.
Peer review information
Nature thanks Hiroyuki Kawahara, Tobias von der Haar and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Translation surveillance of representative noncoding sequences.
a, Noncoding sequences in the HSP90B1 3’ UTR, an ACTB intron, and a GAPDH intron were cloned into the bicistronic reporter system shown in Fig. 1b. b, Density plots of the distribution of EGFP/mCherry ratios measured by flow cytometry 24 hours after reporter transfection. The median fold loss of the EGFP/mCherry ratio relative to control is shown in the top left corner of each density plot. c, Density plot of the EGFP/mCherry ratio for cells transfected with either the control or the ACTB intron reporter, alone or with simultaneous treatment with either a proteasome inhibitor (lactacystin) or a lysosome inhibitor (chloroquine). The numbers indicate the median fold loss of EGFP/mCherry relative to control. d–f, Six noncoding sequences from the Pep30 library (KRT2 intron, APOL4 intron, LINC00222, LINC02885, ASPAY 3’ UTR, and IFT81 3’ UTR) were selected and either cloned into the original mCherry-EGFP bicistronic reporter (d; cloning failed for KRT2), fused to the C-terminus of HA-tagged PspCas13b protein (e; cloning failed for APOL4), or fused to the C-terminus of RPL3 (f; cloning failed for IFT81). d, Same as b for the indicated noncoding sequences. e, Equal amounts of HA-dPspCas13b-pep30 reporter plasmids were co-transfected with an HA-RfxCas13d plasmid, and protein abundance was assayed by western blotting with an HA antibody. HA-dCas13b fused to the human protein eIF4E was used as a control. The abundance of HA-dCas13b-pep30 was quantified by normalizing first to HA-Cas13d and then to the eIF4E fusion. f, Equal amounts of RPL3 reporter plasmids were transfected into HEK293T cells, and western blots were performed using an RPL3 antibody, which detects both endogenous RPL3 (lower bands) and the RPL3 reporter protein (upper bands). NT: no-transfection control. The level of the reporter protein was normalized first to endogenous RPL3 and then to the RPL3-3xHA sample. N = 4 biological replicates.
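The quantifications in this legend reduce to two small calculations, sketched below under stated assumptions rather than as the authors' code (their analysis scripts are in the repository linked above). `median_fold_loss` assumes the fold loss is the median per-cell EGFP/mCherry ratio of control cells divided by that of reporter cells, with a hypothetical mCherry gate `min_mcherry`; `two_step_normalize` mirrors the western blot normalization in e and f (reporter band over a loading band, then over the same quantity in a reference sample). All function names and toy values are illustrative.

```python
import statistics


def egfp_mcherry_ratios(cells, min_mcherry=1.0):
    """Per-cell EGFP/mCherry ratios from (egfp, mcherry) intensity pairs.

    Cells below the hypothetical mCherry gate `min_mcherry` are dropped,
    because the ratio is uninformative for untransfected cells.
    """
    return [egfp / mch for egfp, mch in cells if mch >= min_mcherry]


def median_fold_loss(control_cells, reporter_cells, min_mcherry=1.0):
    """Assumed definition: median control ratio / median reporter ratio."""
    ctrl = statistics.median(egfp_mcherry_ratios(control_cells, min_mcherry))
    rep = statistics.median(egfp_mcherry_ratios(reporter_cells, min_mcherry))
    return ctrl / rep


def two_step_normalize(band, loading_band, ref_band, ref_loading_band):
    """Normalize a band to its loading control, then to a reference sample."""
    return (band / loading_band) / (ref_band / ref_loading_band)


if __name__ == "__main__":
    # Toy (EGFP, mCherry) intensities, not data from the study.
    control = [(100.0, 100.0), (210.0, 200.0), (50.0, 50.0)]
    reporter = [(25.0, 100.0), (40.0, 200.0), (15.0, 50.0)]
    print(median_fold_loss(control, reporter))  # 4.0
```

Using medians of per-cell ratios keeps the summary robust to the long tails typical of transient-transfection flow data.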
Extended Data Fig. 2 Characterization of the Pep30 library.
a, Sequence diversity in the Pep30 library. The pairwise Hamming distance (the number of nucleotides that differ) between every two sequences (90 nt each) in the library was calculated. For each sequence, we then identified the shortest distance to any other sequence in the library. The vast majority (98%) of Pep30 sequences differ by at least 40 nt (out of 90 nt) from every other sequence in the library, with a median nearest-neighbour distance of 48. This is very close to the distribution obtained when the Pep30 library sequences are shuffled (median: 50), indicating that the Pep30 library is nearly as diverse as a set of entirely unrelated sequences. b–d, Effect of proteasome or lysosome inhibition on the Pep30 library. b, Pep30 cells were treated with proteasome inhibitors for 8 h and then analyzed by flow cytometry. Ctrl: Pep30 cells without treatment. c, Same as b for multiple lysosome inhibitors. d, Longer (24 h versus 6 h) proteasome inhibition, but not lysosome inhibition, resulted in greater rescue.
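The nearest-neighbour diversity analysis in a can be sketched in a few lines of Python. This is a minimal illustration on a toy random library, not the authors' implementation; in particular, the legend does not state how the shuffled control was generated, so `shuffled_control` assumes a within-sequence nucleotide shuffle that preserves base composition.

```python
import random
import statistics


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_distances(seqs):
    """Smallest Hamming distance from each sequence to any other (all pairs)."""
    return [
        min(hamming(s, t) for j, t in enumerate(seqs) if j != i)
        for i, s in enumerate(seqs)
    ]


def shuffled_control(seqs, seed=0):
    """Shuffle nucleotides within each sequence, preserving its composition."""
    rng = random.Random(seed)
    out = []
    for s in seqs:
        chars = list(s)
        rng.shuffle(chars)
        out.append("".join(chars))
    return out


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy library of random 90-nt sequences standing in for Pep30.
    library = ["".join(rng.choice("ACGT") for _ in range(90)) for _ in range(100)]
    observed = nearest_neighbour_distances(library)
    control = nearest_neighbour_distances(shuffled_control(library))
    frac_40 = sum(d >= 40 for d in observed) / len(observed)
    print(statistics.median(observed), statistics.median(control), frac_40)
```

The all-pairs loop is quadratic in library size; for a full library, a vectorized approach (for example, encoding sequences as integer arrays and comparing them in blocks) would be more practical.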
Extended Data Fig. 3 Hydrophobicity analyses in the Pep30 library and the human genome.
a, Correlation coefficients between Pep30 reporter expression and average hydrophobicity calculated using various scales. b, Spearman correlation coefficients (light bars) between various properties of the Pep30 sequences and reporter expression. Dark bars: partial correlation conditioned on average hydrophobicity. c, Same as Fig. 2f with a different hydrophobicity scale (Ponnuswamy instead of Miyazawa). d, Average hydrophobicity of the first 100 aa (N-termini) of annotated proteins (N = 38,933). e, Average hydrophobicity of the C-termini of annotated proteins without any annotated protein domains in the last 100 aa (N = 8,586). Shown are the Spearman correlation coefficient R and the P value of a two-sided Spearman’s correlation test. No adjustments were made for multiple comparisons.
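Panels a–c relate average hydrophobicity to reporter expression through Spearman and partial correlations. A minimal pure-Python sketch of those statistics follows. It swaps in the Kyte–Doolittle hydropathy scale because its values are standard; the figure itself uses other scales (such as Miyazawa and Ponnuswamy), so numbers will differ. The partial correlation uses the textbook first-order formula applied to Spearman coefficients, which is an assumption about how the conditioning was done.

```python
import math

# Kyte-Doolittle hydropathy values (a stand-in for the scales in the figure).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(peptide, scale=KYTE_DOOLITTLE):
    """Average scale value; characters not in the scale (e.g. '*') are skipped."""
    values = [scale[aa] for aa in peptide.upper() if aa in scale]
    if not values:
        raise ValueError("no scorable residues")
    return sum(values) / len(values)


def rank(values):
    """1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of ranks."""
    return pearson(rank(x), rank(y))


def partial_spearman(x, y, z):
    """Spearman correlation of x and y controlling for z (first-order formula)."""
    rxy, rxz, ryz = spearman(x, y), spearman(x, z), spearman(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


if __name__ == "__main__":
    # C-terminal window (last 10 residues) of a hypothetical peptide.
    print(mean_hydrophobicity("MKTAYIAKQRQISFVKSHFSRQLLVLLF"[-10:]))
```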
Extended Data Fig. 4 Bias in the genetic code drives hydrophobicity.
a, Same as Fig. 3b (right) for all peptide lengths. b, Codons ranked by the hydrophobicity of the corresponding amino acids. c, Nucleotide composition in different types of regions in the human genome.
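How the genetic code itself can bias translated noncoding sequence toward hydrophobicity can be made concrete by weighting each sense codon by its probability under a given base composition. The sketch below is an illustration under simple assumptions (independent, identically distributed bases; stop codons excluded; Kyte–Doolittle hydropathy in place of the scales used in the paper), not the analysis behind this figure; `expected_hydrophobicity` and the toy compositions are hypothetical.

```python
import itertools

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}

# Kyte-Doolittle hydropathy, used here as an illustrative scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def codons_by_hydrophobicity():
    """Sense codons sorted from most to least hydrophobic encoded residue."""
    sense = [c for c, aa in CODON_TABLE.items() if aa != "*"]
    return sorted(sense, key=lambda c: KYTE_DOOLITTLE[CODON_TABLE[c]], reverse=True)


def expected_hydrophobicity(composition):
    """Expected hydropathy per residue for random sequence with i.i.d. bases.

    `composition` maps each base to its frequency. Stop codons are excluded,
    so the result is conditioned on the codon being a sense codon.
    """
    weight_sum = 0.0
    score_sum = 0.0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        p = composition[codon[0]] * composition[codon[1]] * composition[codon[2]]
        weight_sum += p
        score_sum += p * KYTE_DOOLITTLE[aa]
    return score_sum / weight_sum


if __name__ == "__main__":
    uniform = {b: 0.25 for b in BASES}
    t_rich = {"T": 0.4, "C": 0.2, "A": 0.2, "G": 0.2}
    a_rich = {"T": 0.2, "C": 0.2, "A": 0.4, "G": 0.2}
    for name, comp in [("uniform", uniform), ("T-rich", t_rich), ("A-rich", a_rich)]:
        print(name, round(expected_hydrophobicity(comp), 2))
```

With these toy compositions, the T-rich case scores above the uniform one and the A-rich case below it. The reason is visible in the table: any codon with T at the second position encodes F, L, I, M or V, whereas A at the second position encodes Y, H, Q, N, K, D, E or a stop.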
Extended Data Fig. 5 AMD1 3’ UTR translation mitigation.
a, Western blot confirming the loss of the EGFP-AMD1 tail fusion protein. HEK293T cells were transfected with varying amounts (50 ng to 850 ng) of the AMD1 3’ UTR readthrough reporter plasmid (N = 2 biologically independent samples). b, The AMD1 3’ UTR translation reporter, with the hydrophobic regions in the AMD1 tail highlighted (A–E). c, Impact of deleting individual hydrophobic regions or larger regions on the EGFP/mCherry ratio. The number in each plot is the median decrease of the EGFP/mCherry ratio relative to controls. d, BAG6 co-immunoprecipitates with the EGFP:AMD1 fusion protein but not with a mutated fusion protein lacking the functional hydrophobic region C-to-E (AMD1∆H). N = 4 biologically independent samples over 2 independent experiments for the quantification. Data are presented as mean values ± s.d. P values were calculated using a two-sided Student’s t-test. No adjustments were made for multiple comparisons. ****: P < 0.0001.
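The legend does not say how regions A–E were defined. One generic way to locate hydrophobic stretches in a tail sequence is a sliding-window hydropathy scan, sketched below; the window size, threshold and Kyte–Doolittle scale are illustrative assumptions, and the example sequence is a hypothetical stand-in, not the AMD1 tail.

```python
# Kyte-Doolittle hydropathy values (illustrative choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(seq, window):
    """Mean hydropathy of every full-length window along the sequence."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def hydrophobic_segments(seq, window=7, threshold=1.6):
    """Merge windows whose mean hydropathy >= threshold into (start, end) spans.

    Coordinates are 0-based and half-open; `window` and `threshold` are
    illustrative defaults, not values from the study.
    """
    segments = []
    for start, score in enumerate(window_scores(seq.upper(), window)):
        if score < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            # Overlapping or adjacent window: extend the current segment.
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Hypothetical tail: charged flanks around a leucine stretch.
    print(hydrophobic_segments("DDDDDLLLLLLLDDDDD"))  # [(3, 14)]
```

Because every passing window is kept whole, reported segments can extend a few residues past the hydrophobic core, as in the example.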
Extended Data Fig. 6 Ribosome roadblock effect: comparing the AMD1 tail sequence, poly(A) and the XBP1 stalling sequence.
a–e, Reporter constructs shown on the left were transfected into HEK293T cells. The EGFP/mCherry ratio was quantified in individual cells by flow cytometry, with distributions shown on the right on a log10 scale. The number in each plot is the median fold decrease of the EGFP/mCherry ratio. Note that the AMD1 sequence causes a smaller decrease in EGFP than either the XBP1 or the poly(A) sequence, and even this weak effect is independent of the putative pausing sequence in AMD1.
Extended Data Fig. 7 Characterization of the BAG6 KO cells and RNF126 KO cells.
a, Genotyping of the BAG6 clonal knockout cell line. Sanger sequencing of 10 clones of PCR-amplified genomic DNA confirmed that the BAG6 KO cells carry a frameshift mutation in both alleles, one with a 5-nt deletion and the other with an 11-nt deletion around the expected Cas9 cut site. b, Re-expressing wild-type BAG6, but not an inactive mutant lacking the UBL domain that recruits RNF126 (BAG6-UBL), partially reverses the BAG6 KO phenotype, as measured by destabilization of the AMD1 readthrough product. c, Same as b but comparing wild-type RNF126 and an inactive mutant carrying a C237A mutation in the active site. d,e, Growth defects of BAG6 KO cells (d) and RNF126 KO cells (e) revealed by competitive growth assays. KO and WT cells were mixed and co-cultured for 15 days, and the relative cell numbers (KO/WT) at each time point were determined by decomposition of Sanger sequencing traces as described in Methods. N = 1 for day 0 of BAG6 and N = 3 biologically independent samples for all other time points. Data are presented as mean values ± s.d.
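Panels d and e track the KO/WT ratio over 15 days. Assuming both populations grow exponentially, ln(KO/WT) changes linearly with time and its slope estimates the growth-rate difference between KO and WT cells. The sketch below fits that slope by least squares and converts an edited-allele fraction from trace decomposition into a KO/WT ratio; both helpers, and the assumption that KO cells are fully edited and WT cells unedited, are illustrative rather than taken from Methods.

```python
import math


def ko_wt_ratio(edited_fraction):
    """KO/WT cell ratio from the edited-allele fraction of a trace decomposition.

    Assumes every KO cell is fully edited and every WT cell is unedited.
    """
    if not 0 < edited_fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    return edited_fraction / (1 - edited_fraction)


def relative_growth_rate(days, ratios):
    """Least-squares slope of ln(KO/WT) versus time, in units of 1/day.

    Under exponential growth, ln(KO/WT) = ln(KO0/WT0) + (r_KO - r_WT) * t,
    so the slope estimates the growth-rate difference r_KO - r_WT.
    """
    logs = [math.log(r) for r in ratios]
    mean_t, mean_l = sum(days) / len(days), sum(logs) / len(logs)
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(days, logs))
    den = sum((t - mean_t) ** 2 for t in days)
    return num / den


if __name__ == "__main__":
    # Toy time course: the KO/WT ratio halves every 5 days.
    slope = relative_growth_rate([0, 5, 10], [1.0, 0.5, 0.25])
    print(round(slope, 4))  # -0.1386
```

A negative slope indicates that KO cells are outgrown by WT cells; its magnitude is the per-day growth-rate deficit under the exponential model.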
Extended Data Fig. 8 BAG6 or TRC35 knockout does not affect proteasome activity or level.
a, Representative result from an in-gel proteasome activity assay showing proteasome hydrolysis activity (left) and a representative immunoblot probing for α-subunit levels of the 26S 1- and 2-cap proteasome and the 20S proteasome (middle). Cell lysates were run on 4% non-denaturing (native) gels and either incubated with the fluorogenic proteasome substrate Suc-LLVY-amc to determine relative activities or immunoblotted to determine relative levels. Samples (10.5 µg protein per well) were run separately under denaturing conditions and immunoblotted for actin as a sample processing control (right). b, Level of the 26S 1- and 2-cap proteasome detected by immunoblotting, normalized to actin in the same sample (left); densitometric quantification of 26S 1- and 2-cap proteasome in-gel activity, normalized to actin in the same sample (middle); and the activity/level ratio (right). Data are expressed as mean ± s.e.m. of three biological replicates, where each value represents the activity/level ratio calculated by averaging four technical replicates of activity and level values. One-way ANOVA was used for statistical analysis, with P < 0.05 considered significant. c, Similar results from in vivo proteasome activity reporter assays. The proteasome activity reporter UbG76V-EGFP was co-transfected with mCherry (1:1) into cells, and the EGFP/mCherry ratio measured by flow cytometry was used as an indicator of proteasome activity in cells. Shown are the distributions of the EGFP/mCherry ratio in WT, BAG6 KO and TRC35 KO cells at 250 ng, 500 ng and 1,000 ng of total plasmid.
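Panel b summarizes each biological replicate as an activity/level ratio built from four technical replicates. A minimal sketch of that arithmetic follows, assuming the ratio is taken between the means of the actin-normalized technical replicates (the legend could also be read as a mean of per-replicate ratios) and that the s.e.m. uses the sample standard deviation. All values are hypothetical.

```python
import math
import statistics


def activity_level_ratio(activity_reps, level_reps):
    """Mean in-gel activity divided by mean immunoblot level for one replicate.

    Both inputs are technical replicates already normalized to actin.
    """
    return statistics.mean(activity_reps) / statistics.mean(level_reps)


def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    # Hypothetical actin-normalized values: 3 biological x 4 technical replicates.
    bio_reps = [
        ([1.10, 1.00, 1.05, 0.95], [1.00, 1.02, 0.98, 1.00]),
        ([0.90, 0.95, 1.00, 0.95], [0.95, 1.00, 0.90, 0.95]),
        ([1.20, 1.10, 1.15, 1.15], [1.10, 1.05, 1.15, 1.10]),
    ]
    ratios = [activity_level_ratio(a, l) for a, l in bio_reps]
    print(mean_sem(ratios))
```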
Extended Data Fig. 9 Replicating the Pep30 reporter assay in BAG6 KO cells.
The sequencing-based assay shown in Fig. 5f–h was repeated starting from cell sorting. a, Same as Fig. 5g. b, Same as Fig. 5h. c, Full-length Pep30 reporter sequences with at least 3,000 reads (all four bins combined) were divided into three groups: those that are stable in wild-type cells (normalized expression > 0.8), those that are unstable in wild-type cells but stabilized (increased expression) in BAG6 KO cells, and those that are unstable in wild-type cells and not stabilized in BAG6 KO cells. Shown are density plots of the hydrophobicity of sequences in each group. d, Same as c for the replicate shown in Fig. 5. P values were calculated using a two-sided Mann–Whitney U test. No adjustments were made for multiple comparisons.
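The grouping in c and the two-sided Mann–Whitney U test can be sketched as below. The 3,000-read minimum and the 0.8 stability cutoff come from the legend; the legend gives no numeric threshold for "stabilized", so `min_gain` is a hypothetical parameter. The p-value uses the normal approximation without tie or continuity correction, which is a simplification; standard statistical packages also offer exact and corrected variants.

```python
import math


def rank(values):
    """1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """U statistic for sample x and a two-sided normal-approximation p-value."""
    n1, n2 = len(x), len(y)
    ranks = rank(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, math.erfc(abs(z) / math.sqrt(2))


def classify(wt_expr, ko_expr, stable_cutoff=0.8, min_gain=0.2):
    """Assign a reporter to one of three groups; `min_gain` is hypothetical."""
    if wt_expr > stable_cutoff:
        return "stable"
    if ko_expr - wt_expr >= min_gain:
        return "stabilized_in_KO"
    return "not_stabilized"


def group_hydrophobicity(records, min_reads=3000):
    """records: (reads, wt_expr, ko_expr, hydrophobicity) -> group -> values."""
    groups = {"stable": [], "stabilized_in_KO": [], "not_stabilized": []}
    for reads, wt, ko, hyd in records:
        if reads >= min_reads:
            groups[classify(wt, ko)].append(hyd)
    return groups


if __name__ == "__main__":
    # Toy records, not data from the study.
    toy = [
        (5000, 0.9, 0.95, -0.5), (4000, 0.3, 0.8, 1.2), (3500, 0.2, 0.7, 1.0),
        (6000, 0.4, 0.45, 0.1), (3100, 0.3, 0.3, -0.2), (800, 0.1, 0.9, 2.0),
    ]
    groups = group_hydrophobicity(toy)
    print(mann_whitney_u(groups["stabilized_in_KO"], groups["not_stabilized"]))
```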
Extended Data Fig. 10 BAG6 and RNF126 mediate the degradation of SMAD4 readthrough products.
a, A dual-color reporter fusing the SMAD4 3’ UTR-encoded peptide to the C-terminus of EGFP was tested in wild-type HEK293T cells, BAG6 KO cells and RNF126 KO cells using flow cytometry as a readout. The number in the top left corner of each density plot is the median fold loss of EGFP/mCherry in the readthrough reporter relative to control. b, No significant change in SMAD4 mRNA level with BAG6 KO. RT: readthrough. N = 4 biologically independent samples. Data are presented as mean values ± s.d. c, Efficient RNF126 knockdown and lack of impact on endogenous SMAD4 mRNA (qRT-PCR). N = 4 biologically independent samples. Data are presented as mean values ± s.d. d, Endogenous SMAD4 readthrough protein is stabilized by both BAG6 KO and RNF126 knockdown. Representative western blots (left) and quantification (right). N = 3 biologically independent samples. Data are presented as mean values ± s.d. One-way ANOVA was used for statistical analysis, with P < 0.05 considered significant. **: P < 0.01. No adjustments were made for multiple comparisons.
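The legend does not state how the qRT-PCR in b and c was quantified. One common choice is the 2^−ΔΔCt method, sketched below as an assumption: it normalizes the target Ct to a reference gene (not named in the legend) and then to the control sample, and presumes roughly 100% amplification efficiency. Function names and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change by the 2^-ddCt method (assumes ~100% amplification efficiency)."""
    delta_sample = ct_target - ct_reference
    delta_ctrl = ct_target_ctrl - ct_reference_ctrl
    return 2.0 ** -(delta_sample - delta_ctrl)


def knockdown_efficiency(fold_change):
    """Fraction of transcript removed relative to the control sample."""
    return 1.0 - fold_change


if __name__ == "__main__":
    # Hypothetical Ct values: target is 2 cycles later than in the control.
    fc = relative_expression(22.0, 18.0, 20.0, 18.0)
    print(fc, knockdown_efficiency(fc))  # 0.25 0.75
```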
This file contains the uncropped gels (Supplementary Fig. 1) and the flow cytometry gating strategy.
Supplementary Table 1
Subcellular localization of functional peptides. A list of 64 polypeptides with experimentally determined function and subcellular localization. Shown are peptide name, transcript name, localization, and publication.
Supplementary Table 2
Sequences of the Pep30 library. Tab-delimited text file with an ID in column 1 and the sequence in column 2. The first and last 15 nt of each sequence are constant.
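Reading Supplementary Table 2 and removing the constant flanks needs only the standard library. The sketch assumes the file has no header line unless `skip_header=True` is passed, and that the 15-nt constant segments described above are included in each listed sequence; both are assumptions to check against the actual file. `read_pep30` and `variable_region` are hypothetical helper names.

```python
import csv
import io

FLANK = 15  # per the legend, the first and last 15 nt are constant


def read_pep30(handle, skip_header=False):
    """Parse a tab-delimited ID/sequence table into a dict of ID -> sequence."""
    reader = csv.reader(handle, delimiter="\t")
    if skip_header:
        next(reader, None)
    records = {}
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        records[row[0]] = row[1].strip().upper()
    return records


def variable_region(seq, flank=FLANK):
    """Strip the constant flanks, leaving the library-specific insert."""
    if len(seq) <= 2 * flank:
        raise ValueError("sequence is not longer than its constant flanks")
    return seq[flank:-flank]


if __name__ == "__main__":
    demo = io.StringIO("pep_1\t" + "A" * 15 + "CGTACG" + "T" * 15 + "\n")
    table = read_pep30(demo)
    print({k: variable_region(v) for k, v in table.items()})  # {'pep_1': 'CGTACG'}
```

For the real file, open it with `open(path, newline="")` and pass the handle to `read_pep30`.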
Supplementary Table 3
The CRISPR screen analyzed by MAGeCK. Shown is the gene summary file from MAGeCK output. Adjustments were made for multiple comparisons.
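If the gene summary in Supplementary Table 3 is available as tab-delimited text, screen hits can be extracted with the standard library. The column names used here (`id`, `pos|fdr`, `pos|rank` and their `neg|` counterparts) follow the gene-summary layout written by `mageck test`, but treat them as an assumption and check the actual header; the 0.05 FDR cutoff and the helper name `hits` are likewise illustrative.

```python
import csv
import io


def hits(handle, direction="pos", fdr_cutoff=0.05):
    """Genes passing an FDR cutoff in one selection direction, ordered by rank.

    Assumes MAGeCK gene-summary columns `id`, `<direction>|fdr` and
    `<direction>|rank`; verify these against the file header.
    """
    fdr_col, rank_col = f"{direction}|fdr", f"{direction}|rank"
    rows = [
        row for row in csv.DictReader(handle, delimiter="\t")
        if float(row[fdr_col]) < fdr_cutoff
    ]
    rows.sort(key=lambda row: int(row[rank_col]))
    return [row["id"] for row in rows]


if __name__ == "__main__":
    demo = io.StringIO(
        "id\tnum\tpos|fdr\tpos|rank\n"
        "GENE_B\t4\t0.01\t2\n"
        "GENE_A\t4\t0.001\t1\n"
        "GENE_C\t4\t0.40\t3\n"
    )
    print(hits(demo))  # ['GENE_A', 'GENE_B']
```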
Supplementary Table 4
Oligonucleotide sequences used in this study. A list of all oligonucleotide sequences used in this study, including name, sequence, and a brief annotation.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kesner, J.S., Chen, Z., Shi, P. et al. Noncoding translation mitigation. Nature 617, 395–402 (2023). https://doi.org/10.1038/s41586-023-05946-4