Human 5′ UTR design and variant effect prediction from a massively parallel translation assay


The ability to predict the impact of cis-regulatory sequences on gene expression would facilitate discovery in fundamental and applied biology. Here we combine polysome profiling of a library of 280,000 randomized 5′ untranslated regions (UTRs) with deep learning to build a predictive model that relates human 5′ UTR sequence to translation. Together with a genetic algorithm, we use the model to engineer new 5′ UTRs that accurately direct specified levels of ribosome loading, providing the ability to tune sequences for optimal protein expression. We show that the same approach can be extended to chemically modified RNA, an important feature for applications in mRNA therapeutics and synthetic biology. We test 35,212 truncated human 5′ UTRs and 3,577 naturally occurring variants and show that the model predicts ribosome loading of these sequences. Finally, we provide evidence of 45 single-nucleotide variants (SNVs) associated with human diseases that substantially change ribosome loading and thus may represent a molecular basis for disease.
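The design step mentioned in the abstract, a genetic algorithm that searches for sequences whose predicted ribosome loading matches a specified level, can be illustrated with a minimal sketch. Everything named below is an assumption made for illustration: `toy_score` is a hypothetical stand-in for a trained predictor (it is not the Optimus 5-Prime model), and the population size, mutation rate and selection scheme are arbitrary choices rather than the authors' settings.

```python
import random

BASES = "ACGT"
UTR_LENGTH = 50  # the abstract's library uses 50-nucleotide random 5' UTRs


def toy_score(seq):
    """Hypothetical stand-in for a trained predictor (NOT the paper's model).

    Rewards A/T content and penalizes each ATG, purely for illustration.
    """
    atg_count = sum(1 for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG")
    at_fraction = sum(base in "AT" for base in seq) / len(seq)
    return 8.0 * at_fraction - 2.0 * atg_count


def mutate(seq, rng, rate=0.04):
    """Replace each base with a random base with probability `rate`."""
    return "".join(rng.choice(BASES) if rng.random() < rate else base for base in seq)


def crossover(a, b, rng):
    """Single-point crossover between two equal-length sequences."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target, score_fn=toy_score, pop_size=100, generations=200, seed=0):
    """Search for a sequence whose score is as close as possible to `target`."""
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(BASES) for _ in range(UTR_LENGTH)) for _ in range(pop_size)
    ]

    def fitness(seq):
        # Higher is better: negative distance between score and target.
        return -abs(score_fn(seq) - target)

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 5]  # keep the top 20% unchanged (elitism)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(target=4.0)
    print(best, toy_score(best))
```

In practice the `score_fn` argument would be replaced by a call to a trained sequence-to-ribosome-loading model; the search loop itself does not depend on how the score is computed.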


Fig. 1: A library of 280,000 random 50-nucleotide oligomers as 5′ UTRs for eGFP.
Fig. 2: Modeling 5′ UTR sequences and ribosome loading.
Fig. 3: Design of new 5′ UTRs.
Fig. 4: Model performance with human 5′ UTRs and generalization to 5′ UTRs of varying length.

Data availability

The authors declare that all data supporting the findings of this study are available from Gene Expression Omnibus under accession GSE114002.

Code availability

The code for the Optimus 5-Prime model is provided in the Supplementary Code file. All code is also available at


1. Araujo, P. R. et al. Before it gets started: regulating translation at the 5′ UTR. Comp. Funct. Genom. 2012, 475731 (2012).
2. Jackson, R. J., Hellen, C. U. T. & Pestova, T. V. The mechanism of eukaryotic translation initiation and principles of its regulation. Nat. Rev. Mol. Cell Biol. 11, 113–127 (2010).
3. Angermueller, C., Pärnamaa, T., Parts, L. & Stegle, O. Deep learning for computational biology. Mol. Syst. Biol. 12, 878 (2016).
4. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
5. Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
6. Kleftogiannis, D., Kalnis, P. & Bajic, V. B. DEEP: a general computational framework for predicting enhancers. Nucleic Acids Res. 43, e6 (2015).
7. Liu, F., Li, H., Ren, C., Bo, X. & Shu, W. PEDLA: predicting enhancers with a deep learning-based algorithmic framework. Sci. Rep. 6, 28517 (2016).
8. Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).
9. Zhao, W. et al. Massively parallel functional annotation of 3′ untranslated regions. Nat. Biotechnol. 32, 387–391 (2014).
10. Noderer, W. L. et al. Quantitative analysis of mammalian translation initiation sites by FACS-seq. Mol. Syst. Biol. 10, 748 (2014).
11. Kosuri, S. et al. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl Acad. Sci. USA 110, 14024–14029 (2013).
12. Cuperus, J. T. et al. Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences. Genome Res. 27, 2015–2024 (2017).
13. Zuccotti, P. & Modelska, A. in Post-Transcriptional Gene Regulation (ed. Dassi, E.) 59–69 (Humana Press, 2016).
14. Floor, S. N. & Doudna, J. A. Tunable protein synthesis by transcript isoforms in human cells. eLife 5, e10921 (2016).
15. Wang, X., Hou, J., Quedenau, C. & Chen, W. Pervasive isoform‐specific translational regulation via alternative transcription start sites in mammals. Mol. Syst. Biol. 12, 875 (2016).
16. Whiffin, N. et al. Characterising the loss-of-function impact of 5′ untranslated region variants in whole genome sequence data from 15,708 individuals. Preprint at (2019).
17. Hinnebusch, A. G., Ivanov, I. P. & Sonenberg, N. Translational control by 5′-untranslated regions of eukaryotic mRNAs. Science 352, 1413–1416 (2016).
18. Morris, D. R. & Geballe, A. P. Upstream open reading frames as regulators of mRNA translation. Mol. Cell. Biol. 20, 8635–8642 (2000).
19. Johnstone, T. G., Bazzini, A. A. & Giraldez, A. J. Upstream ORFs are prevalent translational repressors in vertebrates. EMBO J. 35, 706–723 (2016).
20. Lee, S. et al. Global mapping of translation initiation sites in mammalian cells at single-nucleotide resolution. Proc. Natl Acad. Sci. USA 109, E2424–E2432 (2012).
21. Reuter, K., Biehl, A., Koch, L. & Helms, V. PreTIS: a tool to predict non-canonical 5′ UTR translational initiation sites in human and mouse. PLoS Comput. Biol. 12, e1005170 (2016).
22. Starck, S. R. et al. Translation from the 5′ untranslated region shapes the integrated stress response. Science 351, aad3867 (2016).
23. Hinnebusch, A. G. The scanning mechanism of eukaryotic translation initiation. Annu. Rev. Biochem. 83, 779–812 (2014).
24. Kozak, M. Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283–292 (1986).
25. Kozak, M. Influences of mRNA secondary structure on initiation by eukaryotic ribosomes. Proc. Natl Acad. Sci. USA 83, 2850–2854 (1986).
26. Zadeh, J. N. et al. NUPACK: analysis and design of nucleic acid systems. J. Comput. Chem. 32, 170–173 (2011).
27. Ferreira, J. P., Overton, K. W. & Wang, C. L. Tuning gene expression with synthetic upstream open reading frames. Proc. Natl Acad. Sci. USA 110, 11284–11289 (2013).
28. Bogard, N., Linder, J., Rosenberg, A. B. & Seelig, G. A deep neural network for predicting and engineering alternative polyadenylation. Cell in press (2019).
29. Gupta, S., Stamatoyannopoulos, J. A., Bailey, T. L. & Noble, W. S. Quantifying similarity between motifs. Genome Biol. 8, R24 (2007).
30. Ray, D. et al. A compendium of RNA-binding motifs for decoding gene regulation. Nature 499, 172–177 (2013).
31. Karikó, K. et al. Incorporation of pseudouridine into mRNA yields superior nonimmunogenic vector with increased translational capacity and biological stability. Mol. Ther. 16, 1833–1840 (2008).
32. Anderson, B. R. et al. Incorporation of pseudouridine into mRNA enhances translation by diminishing PKR activation. Nucleic Acids Res. 38, 5884–5892 (2010).
33. Kierzek, E. et al. The contribution of pseudouridine to stabilities and structure of RNAs. Nucleic Acids Res. 42, 3492–3501 (2014).
34. Seo, S. W. et al. Predictive design of mRNA translation initiation region to control prokaryotic translation efficiency. Metab. Eng. 15, 67–74 (2013).
35. Jensen, M. K. & Keasling, J. D. Recent applications of synthetic biology tools for yeast metabolic engineering. FEMS Yeast Res. 15, 1–10 (2015).
36. Salis, H. M., Mirsky, E. A. & Voigt, C. A. Automated design of synthetic ribosome binding sites to control protein expression. Nat. Biotechnol. 27, 946–950 (2009).
37. Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2016).
38. Hernandez, R. D. et al. Singleton variants dominate the genetic architecture of human gene expression. Preprint (2018).
39. Battle, A. et al. Impact of regulatory variation from RNA to protein. Science 347, 664–667 (2015).
40. Cenik, C. et al. Integrative analysis of RNA, translation, and protein levels reveals distinct regulatory variation across humans. Genome Res. 25, 1610–1621 (2015).
41. Wang, B. & Bissell, D. M. Hereditary Coproporphyria (University of Washington, 2012).
42. Boria, I. et al. The ribosomal basis of Diamond–Blackfan anemia: mutation and database update. Hum. Mutat. 31, 1269–1279 (2010).
43. Qin, Y. et al. Germline mutations in TMEM127 confer susceptibility to pheochromocytoma. Nat. Genet. 42, 229–233 (2010).
44. Mignone, F. et al. Untranslated regions of mRNAs. Genome Biol. 3, reviews0004.1 (2002).
45. Leppek, K., Das, R. & Barna, M. Functional 5′ UTR mRNA structures in eukaryotic translation regulation and how to find them. Nat. Rev. Mol. Cell Biol. 19, 158–174 (2018).
46. Richner, J. M. et al. Vaccine mediated protection against Zika virus-induced congenital disease. Cell 170, 273–283 (2017).
47. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
48. Zhao, L., Liu, Z., Levy, S. F. & Wu, S. Bartender: a fast and accurate clustering algorithm to count barcode reads. Bioinformatics 34, 739–747 (2017).

49. Abadi, M. et al. TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from (2015).
50. Smedley, D. et al. BioMart—biological queries made easy. BMC Genomics 10, 22 (2009).



We would like to thank A. Rosenberg and J. Linder for helpful discussions on data analysis and modeling. We would also like to thank M. Moore, A. Hsieh and Y. Lim for constructive comments on the manuscript. We are grateful to C. Wang for providing fluorescence data (ref. 27). This work was supported by a sponsored research agreement by Moderna and National Institutes of Health grant R01HG009892 to G.S.

Author information




P.J.S. and B.W. designed and performed experiments, performed data analysis and modeling, and wrote the manuscript. D.W.R. performed fluorescence validation experiments. V.P. and I.M. wrote the manuscript. D.R.M. helped design polysome profiling. G.S. designed experiments and wrote the manuscript.

Corresponding author

Correspondence to Georg Seelig.

Ethics declarations

Competing interests

P.J.S., B.W., G.S. and D.R.M. declare no competing interests. D.W.R., V.P. and I.M. are employees and shareholders of Moderna.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Sample, P.J., Wang, B., Reid, D.W. et al. Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nat Biotechnol 37, 803–809 (2019).
