Abstract
The ability to predict the impact of cis-regulatory sequences on gene expression would facilitate discovery in fundamental and applied biology. Here we combine polysome profiling of a library of 280,000 randomized 5′ untranslated regions (UTRs) with deep learning to build a predictive model that relates human 5′ UTR sequence to translation. Together with a genetic algorithm, we use the model to engineer new 5′ UTRs that accurately direct specified levels of ribosome loading, providing the ability to tune sequences for optimal protein expression. We show that the same approach can be extended to chemically modified RNA, an important feature for applications in mRNA therapeutics and synthetic biology. We test 35,212 truncated human 5′ UTRs and 3,577 naturally occurring variants and show that the model predicts ribosome loading of these sequences. Finally, we provide evidence of 45 single-nucleotide variants (SNVs) associated with human diseases that substantially change ribosome loading and thus may represent a molecular basis for disease.
This is a preview of subscription content, access via your institution
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 / 30 days
cancel any time
Subscribe to this journal
Receive 12 print issues and online access
$209.00 per year
only $17.42 per issue
Buy this article
- Purchase on SpringerLink
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
Data availability
The authors declare that all data supporting the findings of this study are available from Gene Expression Omnibus under accession GSE114002.
Code availability
The code for the Optimus 5-Prime model is provided in the Supplementary Code file. All code is also available at https://github.com/pjsample/human_5utr_modeling.
References
Araujo, P. R. et al. Before it gets started: regulating translation at the 5′ UTR. Comp. Funct. Genom. 2012, 475731 (2012).
Jackson, R. J., Hellen, C. U. T. & Pestova, T. V. The mechanism of eukaryotic translation initiation and principles of its regulation. Nat. Rev. Mol. Cell Biol. 11, 113–127 (2010).
Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12, 878 (2016).
Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
Kleftogiannis, D., Kalnis, P. & Bajic, V. B. DEEP: a general computational framework for predicting enhancers. Nucleic Acids Res. 43, e6 (2015).
Liu, F., Li, H., Ren, C., Bo, X. & Shu, W. PEDLA: predicting enhancers with a deep learning-based algorithmic framework. Sci. Rep. 6, 28517 (2016).
Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).
Zhao, W. et al. Massively parallel functional annotation of 3′ untranslated regions. Nat. Biotechnol. 32, 387–391 (2014).
Noderer, W. L. et al. Quantitative analysis of mammalian translation initiation sites by FACS-seq. Mol. Syst. Biol. 10, 748 (2014).
Kosuri, S. et al. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl Acad. Sci. USA 110, 14024–14029 (2013).
Cuperus, J. T. et al. Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences. Genome Res. 27, 2015–2024 (2017).
Zuccotti, P. & Modelska, A. in Post-Transcriptional Gene Regulation (ed. Dassi, E.) 59–69 (Humana Press, 2016).
Floor, S. N. & Doudna, J. A. Tunable protein synthesis by transcript isoforms in human cells. elife 5, e10921 (2016).
Wang, X., Hou, J., Quedenau, C. & Chen, W. Pervasive isoform‐specific translational regulation via alternative transcription start sites in mammals. Mol. Syst. Biol. 12, 875 (2016).
Whiffin, N. et al. Characterising the loss-of-function impact of 5′ untranslated region variants in whole genome sequence data from 15,708 individuals. Preprint at https://www.biorxiv.org/content/10.1101/543504v1 (2019).
Hinnebusch, A. G., Ivanov, I. P. & Sonenberg, N. Translational control by 5′-untranslated regions of eukaryotic mRNAs. Science 352, 1413–1416 (2016).
Morris, D. R. & Geballe, A. P. Upstream open reading frames as regulators of mRNA translation. Mol. Cell. Biol. 20, 8635–8642 (2000).
Johnstone, T. G., Bazzini, A. A. & Giraldez, A. J. Upstream ORFs are prevalent translational repressors in vertebrates. EMBO J. 35, 706–723 (2016).
Lee, S. et al. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl Acad. Sci. USA 109, E2424–E2432 (2012).
Reuter, K., Biehl, A., Koch, L. & Helms, V. PreTIS: a tool to predict non-canonical 5′ UTR translational initiation sites in human and mouse. PLoS Comput. Biol. 12, e1005170 (2016).
Starck, S. R. et al. Translation from the 5′ untranslated region shapes the integrated stress response. Science 351, aad3867 (2016).
Hinnebusch, A. G. The scanning mechanism of eukaryotic translation initiation. Annu. Rev. Biochem. 83, 779–812 (2014).
Kozak, M. Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283–292 (1986).
Kozak, M. Influences of mRNA secondary structure on initiation by eukaryotic ribosomes. Proc. Natl Acad. Sci. USA 83, 2850–2854 (1986).
Zadeh, J. N. et al. NUPACK: analysis and design of nucleic acid systems. J. Comput. Chem. 32, 170–173 (2011).
Ferreira, J. P., Overton, K. W. & Wang, C. L. Tuning gene expression with synthetic upstream open reading frames. Proc. Natl Acad. Sci. USA 110, 11284–11289 (2013).
Bogard, N., Linder, J., Rosenberg, A. B. & Seelig, G. A deep neural network for predicting and engineering alternative polyadenylation. Cell in press https://doi.org/10.1016/j.cell.2019.04.046 (2019).
Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177 (2013).
Karikó, K. et al. Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol. Ther. 16, 1833–1840 (2008).
Anderson, B. R. et al. Incorporation of pseudouridine into mRNA enhances translation by diminishing PKR activation. Nucleic Acids Res. 38, 5884–5892 (2010).
Kierzek, E. et al. The contribution of pseudouridine to stabilities and structure of RNAs. Nucleic Acids Res. 42, 3492–3501 (2014).
Seo, S. W. et al. Predictive design of mRNA translation initiation region to control prokaryotic translation efficiency. Metab. Eng. 15, 67–74 (2013).
Jensen, M. K. & Keasling, J. D. Recent applications of synthetic biology tools for yeast metabolic engineering. FEMS Yeast Res. 15, 1–10 (2015).
Salis, H. M., Mirsky, E. A. & Voigt, C. A. Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotechnol. 27, 946–950 (2009).
Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2016).
Hernandez, R. D. et al. Singleton variants dominate the genetic architecture of human gene expression. Preprint https://doi.org/10.2139/ssrn.3151998 (2018).
Battle, A. et al. Impact of regulatory variation from RNA to protein. Science 347, 664–667 (2015).
Cenik, C. et al. Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans. Genome Res. 25, 1610–1621 (2015).
Wang, B. & Bissell, D. M. Hereditary Coproporphyria (University of Washington, 2012). .
Boria, I. et al. The ribosomal basis of Diamond–Blackfan anemia: mutation and database update. Hum. Mutat. 31, 1269–1279 (2010).
Qin, Y. et al. Germline mutations in TMEM127 confer susceptibility to pheochromocytoma. Nat. Genet. 42, 229–233 (2010).
Mignone, F. et al. Untranslated regions of mRNAs. Genome Biol. 3, reviews0004.1 (2002).
Leppek, K., Das, R. & Barna, M. Functional 5′ UTR mRNA structures in eukaryotic translation regulation and how to find them. Nat. Rev. Mol. Cell Biol. 19, 158–174 (2018).
Richner, J. M. et al. Vaccine mediated protection against Zika virus-induced congenital disease. Cell 170, 273–283 (2017).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to count barcode reads. Bioinformatics 34, 739–747 (2017).
Abadi, M. et al. TensorFlow: Large-scale machine laerning on heterogeneous systems. Software available from tensorflow.org (2015).
Smedley, D. et al. BioMart—biological queries made easy. BMC Genomics 10, 22 (2009).
Acknowledgements
We would like to thank A. Rosenberg and J. Linder for helpful discussions on data analysis and modeling. We would also like to thank M. Moore, A. Hsieh and Y. Lim for constructive comments on the manuscript. We are grateful to C. Wang for providing fluorescence data27. This work was supported by a sponsored research agreement by Moderna and National Institutes of Health grant R01HG009892 to G.S.
Author information
Authors and Affiliations
Contributions
P.J.S. and B.W. designed and performed experiments, performed data analysis and modeling, and wrote the manuscript. D.W.R. performed fluorescence validation experiments. V.P. and I.M. wrote the manuscript. D.R.M. helped design polysome profiling. G.S. designed experiments and wrote the manuscript.
Corresponding author
Ethics declarations
Competing interests
P.J.S., B.W., G.S. and DRM declare no competing interests. D.R., V.P. and I.M. are employees and shareholders of Moderna.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figures 1–15, Supplementary Tables 4 and 5, and Supplementary Note 1
Rights and permissions
About this article
Cite this article
Sample, P.J., Wang, B., Reid, D.W. et al. Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nat Biotechnol 37, 803–809 (2019). https://doi.org/10.1038/s41587-019-0164-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41587-019-0164-5
This article is cited by
-
Characterization and optimization of 5´ untranslated region containing poly-adenine tracts in Kluyveromyces marxianus using machine-learning model
Microbial Cell Factories (2024)
-
Current limitations in predicting mRNA translation with deep learning models
Genome Biology (2024)
-
A 5′ UTR language model for decoding untranslated regions of mRNA and function predictions
Nature Machine Intelligence (2024)
-
Integrated multiplexed assays of variant effect reveal determinants of catechol-O-methyltransferase gene expression
Molecular Systems Biology (2024)
-
Big data and deep learning for RNA biology
Experimental & Molecular Medicine (2024)