The ability to predict the impact of cis-regulatory sequences on gene expression would facilitate discovery in fundamental and applied biology. Here we combine polysome profiling of a library of 280,000 randomized 5′ untranslated regions (UTRs) with deep learning to build a predictive model that relates human 5′ UTR sequence to translation. Together with a genetic algorithm, we use the model to engineer new 5′ UTRs that accurately direct specified levels of ribosome loading, providing the ability to tune sequences for optimal protein expression. We show that the same approach can be extended to chemically modified RNA, an important feature for applications in mRNA therapeutics and synthetic biology. We test 35,212 truncated human 5′ UTRs and 3,577 naturally occurring variants and show that the model predicts ribosome loading of these sequences. Finally, we provide evidence of 45 single-nucleotide variants (SNVs) associated with human diseases that substantially change ribosome loading and thus may represent a molecular basis for disease.
Access optionsAccess options
Subscribe to Journal
Get full journal access for 1 year
only $20.83 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
The authors declare that all data supporting the findings of this study are available from Gene Expression Omnibus under accession GSE114002.
Araujo, P. R. et al. Before it gets started: regulating translation at the 5′ UTR. Comp. Funct. Genom. 2012, 475731 (2012).
Jackson, R. J., Hellen, C. U. T. & Pestova, T. V. The mechanism of eukaryotic translation initiation and principles of its regulation. Nat. Rev. Mol. Cell Biol. 11, 113–127 (2010).
Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12, 878 (2016).
Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
Kleftogiannis, D., Kalnis, P. & Bajic, V. B. DEEP: a general computational framework for predicting enhancers. Nucleic Acids Res. 43, e6 (2015).
Liu, F., Li, H., Ren, C., Bo, X. & Shu, W. PEDLA: predicting enhancers with a deep learning-based algorithmic framework. Sci. Rep. 6, 28517 (2016).
Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).
Zhao, W. et al. Massively parallel functional annotation of 3′ untranslated regions. Nat. Biotechnol. 32, 387–391 (2014).
Noderer, W. L. et al. Quantitative analysis of mammalian translation initiation sites by FACS-seq. Mol. Syst. Biol. 10, 748 (2014).
Kosuri, S. et al. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl Acad. Sci. USA 110, 14024–14029 (2013).
Cuperus, J. T. et al. Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences. Genome Res. 27, 2015–2024 (2017).
Zuccotti, P. & Modelska, A. in Post-Transcriptional Gene Regulation (ed. Dassi, E.) 59–69 (Humana Press, 2016).
Floor, S. N. & Doudna, J. A. Tunable protein synthesis by transcript isoforms in human cells. elife 5, e10921 (2016).
Wang, X., Hou, J., Quedenau, C. & Chen, W. Pervasive isoform‐specific translational regulation via alternative transcription start sites in mammals. Mol. Syst. Biol. 12, 875 (2016).
Whiffin, N. et al. Characterising the loss-of-function impact of 5′ untranslated region variants in whole genome sequence data from 15,708 individuals. Preprint at https://www.biorxiv.org/content/10.1101/543504v1 (2019).
Hinnebusch, A. G., Ivanov, I. P. & Sonenberg, N. Translational control by 5′-untranslated regions of eukaryotic mRNAs. Science 352, 1413–1416 (2016).
Morris, D. R. & Geballe, A. P. Upstream open reading frames as regulators of mRNA translation. Mol. Cell. Biol. 20, 8635–8642 (2000).
Johnstone, T. G., Bazzini, A. A. & Giraldez, A. J. Upstream ORFs are prevalent translational repressors in vertebrates. EMBO J. 35, 706–723 (2016).
Lee, S. et al. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl Acad. Sci. USA 109, E2424–E2432 (2012).
Reuter, K., Biehl, A., Koch, L. & Helms, V. PreTIS: a tool to predict non-canonical 5′ UTR translational initiation sites in human and mouse. PLoS Comput. Biol. 12, e1005170 (2016).
Starck, S. R. et al. Translation from the 5′ untranslated region shapes the integrated stress response. Science 351, aad3867 (2016).
Hinnebusch, A. G. The scanning mechanism of eukaryotic translation initiation. Annu. Rev. Biochem. 83, 779–812 (2014).
Kozak, M. Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283–292 (1986).
Kozak, M. Influences of mRNA secondary structure on initiation by eukaryotic ribosomes. Proc. Natl Acad. Sci. USA 83, 2850–2854 (1986).
Zadeh, J. N. et al. NUPACK: analysis and design of nucleic acid systems. J. Comput. Chem. 32, 170–173 (2011).
Ferreira, J. P., Overton, K. W. & Wang, C. L. Tuning gene expression with synthetic upstream open reading frames. Proc. Natl Acad. Sci. USA 110, 11284–11289 (2013).
Bogard, N., Linder, J., Rosenberg, A. B. & Seelig, G. A deep neural network for predicting and engineering alternative polyadenylation. Cell in press https://doi.org/10.1016/j.cell.2019.04.046 (2019).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177 (2013).
Karikó, K. et al. Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol. Ther. 16, 1833–1840 (2008).
Anderson, B. R. et al. Incorporation of pseudouridine into mRNA enhances translation by diminishing PKR activation. Nucleic Acids Res. 38, 5884–5892 (2010).
Kierzek, E. et al. The contribution of pseudouridine to stabilities and structure of RNAs. Nucleic Acids Res. 42, 3492–3501 (2014).
Seo, S. W. et al. Predictive design of mRNA translation initiation region to control prokaryotic translation efficiency. Metab. Eng. 15, 67–74 (2013).
Jensen, M. K. & Keasling, J. D. Recent applications of synthetic biology tools for yeast metabolic engineering. FEMS Yeast Res. 15, 1–10 (2015).
Salis, H. M., Mirsky, E. A. & Voigt, C. A. Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotechnol. 27, 946–950 (2009).
Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2016).
Hernandez, R. D. et al. Singleton variants dominate the genetic architecture of human gene expression. Preprint https://doi.org/10.2139/ssrn.3151998 (2018).
Battle, A. et al. Impact of regulatory variation from RNA to protein. Science 347, 664–667 (2015).
Cenik, C. et al. Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans. Genome Res. 25, 1610–1621 (2015).
Wang, B. & Bissell, D. M. Hereditary Coproporphyria (University of Washington, 2012). .
Boria, I. et al. The ribosomal basis of Diamond–Blackfan anemia: mutation and database update. Hum. Mutat. 31, 1269–1279 (2010).
Qin, Y. et al. Germline mutations in TMEM127 confer susceptibility to pheochromocytoma. Nat. Genet. 42, 229–233 (2010).
Mignone, F. et al. Untranslated regions of mRNAs. Genome Biol. 3, reviews0004.1 (2002).
Leppek, K., Das, R. & Barna, M. Functional 5′ UTR mRNA structures in eukaryotic translation regulation and how to find them. Nat. Rev. Mol. Cell Biol. 19, 158–174 (2018).
Richner, J. M. et al. Vaccine mediated protection against Zika virus-induced congenital disease. Cell 170, 273–283 (2017).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to count barcode reads. Bioinformatics 34, 739–747 (2017).
Abadi, M. et al. TensorFlow: Large-scale machine laerning on heterogeneous systems. Software available from tensorflow.org (2015).
Smedley, D. et al. BioMart—biological queries made easy. BMC Genomics 10, 22 (2009).
We would like to thank A. Rosenberg and J. Linder for helpful discussions on data analysis and modeling. We would also like to thank M. Moore, A. Hsieh and Y. Lim for constructive comments on the manuscript. We are grateful to C. Wang for providing fluorescence data27. This work was supported by a sponsored research agreement by Moderna and National Institutes of Health grant R01HG009892 to G.S.
P.J.S., B.W., G.S. and DRM declare no competing interests. D.R., V.P. and I.M. are employees and shareholders of Moderna.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Figures 1–15, Supplementary Tables 4 and 5, and Supplementary Note 1