  • Review Article

High-throughput biochemistry in RNA sequence space: predicting structure and function

Abstract

RNAs are central to fundamental biological processes in all known organisms. The set of possible intramolecular interactions of RNA nucleotides defines the range of alternative structural conformations of a specific RNA that can coexist, and these structures enable functional catalytic properties of RNAs and/or their productive intermolecular interactions with other RNAs or proteins. However, the immense combinatorial space of potential RNA sequences has precluded predictive mapping between RNA sequence and molecular structure and function. Recent advances in high-throughput approaches in vitro have enabled quantitative thermodynamic and kinetic measurements of RNA–RNA and RNA–protein interactions, across hundreds of thousands of sequence variations. In this Review, we explore these techniques, how they can be used to understand RNA function and how they might form the foundations of an accurate model to predict the structure and function of an RNA directly from its nucleotide sequence. The experimental techniques and modelling frameworks discussed here are also highly relevant for the sampling of sequence–structure–function space of DNAs and proteins.
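As a rough illustration of what a quantitative thermodynamic measurement for a single sequence variant involves, the sketch below fits a dissociation constant to a titration curve and converts it to a binding free energy. It assumes a simple two-state binding model and uses synthetic, noise-free data; the function names, concentration grid and the 10 nM example value are hypothetical and are not taken from the Review or from any particular analysis pipeline.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def fraction_bound(conc, kd):
    """Two-state binding isotherm: fraction bound at free ligand concentration conc (M)."""
    return conc / (conc + kd)

def fit_kd(concs, signals, lo=1e-12, hi=1e-3, steps=1800):
    """Least-squares estimate of Kd by scanning a log-spaced grid (illustrative helper)."""
    best_kd, best_sse = None, float("inf")
    for i in range(steps + 1):
        kd = lo * (hi / lo) ** (i / steps)
        sse = sum((fraction_bound(c, kd) - s) ** 2 for c, s in zip(concs, signals))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd

def delta_g(kd, temperature=298.15):
    """Binding free energy in kcal/mol for a 1 M standard state: dG = RT ln(Kd)."""
    return R_KCAL * temperature * math.log(kd)

# Synthetic titration for a made-up variant with Kd = 10 nM (assumed value)
concs = [1e-10, 1e-9, 3e-9, 1e-8, 3e-8, 1e-7, 1e-6]
signals = [fraction_bound(c, 1e-8) for c in concs]

kd_hat = fit_kd(concs, signals)
dg_hat = delta_g(kd_hat)
print(f"Kd ~ {kd_hat:.2e} M, dG ~ {dg_hat:.2f} kcal/mol")
```

In a high-throughput setting the same fit would be repeated independently for every variant in the library, so a grid scan is used here only to keep the example dependency-free; a real analysis would more likely use a nonlinear least-squares routine and account for measurement noise and nonspecific background.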

Fig. 1: Diversity of RNA secondary and tertiary structures.
Fig. 2: Using next-generation sequencing chips for high-throughput biochemical measurements.
Fig. 3: Predicting experimental parameters from RNA sequence information.
Fig. 4: Structural modelling to predict RNA–RNA binding energies verified with RNA array binding data.
Fig. 5: Thermodynamic and kinetic models for RNA–protein interaction and RNA-guided protein binding.
Fig. 6: Single-molecule experiments carried out across a large sequence space.
Fig. 7: Feedback between experimental data, mechanistic modelling and deep learning methods.

Acknowledgements

The authors thank E. Sharma for discussions. This work was supported in part by NIH grants R01GM111990, P50HG007735, R01HG009909, P01GM066275, UM1HG009436 and R01GM121487 to W.J.G. W.J.G. acknowledges support as a Chan Zuckerberg Investigator. E.M. was supported by the Swedish Research Council grant 2020-06459.

Author information

Contributions

All authors researched, discussed, wrote and edited the manuscript.

Corresponding author

Correspondence to William J. Greenleaf.

Ethics declarations

Competing interests

W.J.G. is a consultant and equity holder for 10x Genomics, Guardant Health, Quantapore and Ultima Genomics, and cofounder of Protillion Biosciences. The other authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Genetics thanks M. Depken and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Marklund, E., Ke, Y. & Greenleaf, W.J. High-throughput biochemistry in RNA sequence space: predicting structure and function. Nat Rev Genet 24, 401–414 (2023). https://doi.org/10.1038/s41576-022-00567-5

  • DOI: https://doi.org/10.1038/s41576-022-00567-5
