A deep learning method for recovering missing signals in transcriptome-wide RNA structure profiles from probing experiments

Abstract

Sequencing-based RNA structure probing can generate transcriptome-wide profiles of RNA secondary structures. Sufficient structural coverage is needed to obtain unbiased insights about RNA structures and functions, yet probing methods often yield uneven coverage, with missing structural scores across many transcripts. To overcome this barrier, we developed StructureImpute, a deep learning framework inspired by depth completion from computer vision that integrates an RNA sequence with available RNA structural information of neighbouring nucleotides to infer missing structure scores. We demonstrate the strong imputation performance of StructureImpute, with accuracy far superior to that of predictions based on RNA sequence alone. We also show that StructureImpute reliably reconstructs RNA structural patterns at biologically important regions of RNA regulation, including protein-binding and RNA-modification sites. Strikingly, StructureImpute can use transfer learning to apply a model trained on one dataset to accurately infer missing structural scores in other datasets, even if they were generated with different technologies (for example, icSHAPE and DMS-seq).
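As a rough illustration of the input described above (a sequence combined with a partially observed structure-score track), the following minimal sketch encodes a short RNA fragment and hides some observed scores to form a training example. It is not the authors' implementation: the placeholder value `MISSING = -1.0`, the five-channel layout, the 30% masking fraction and the helper names `encode_fragment` and `mask_random` are all assumptions made for illustration.

```python
import random

BASES = "ACGU"
MISSING = -1.0  # assumed placeholder for a nucleotide with no structure score


def encode_fragment(seq, scores):
    """Return one row per nucleotide: 4 one-hot base channels + 1 score channel."""
    if len(seq) != len(scores):
        raise ValueError("sequence and score track must have equal length")
    rows = []
    for base, score in zip(seq.upper().replace("T", "U"), scores):
        # Unknown bases (for example N) become an all-zero one-hot vector.
        one_hot = [1.0 if base == b else 0.0 for b in BASES]
        rows.append(one_hot + [MISSING if score is None else float(score)])
    return rows


def mask_random(scores, fraction, rng):
    """Hide a fraction of the observed scores; return the masked track and hidden indices."""
    observed = [i for i, s in enumerate(scores) if s is not None]
    n_hide = int(round(fraction * len(observed)))
    hidden = set(rng.sample(observed, n_hide))
    masked = [None if i in hidden else s for i, s in enumerate(scores)]
    return masked, sorted(hidden)


# Hypothetical fragment: None marks positions where probing gave no score.
seq = "GGACUUCGGUCC"
scores = [0.1, 0.0, 0.8, None, 0.9, 1.0, 0.7, None, 0.2, 0.1, 0.0, 0.3]

rng = random.Random(0)
masked, hidden = mask_random(scores, 0.3, rng)
x = encode_fragment(seq, masked)
print(len(x), len(x[0]))  # prints: 12 5
```

In a setup like this, a model would be trained to predict the original scores at the `hidden` positions from `x`, so the neighbouring observed scores and the sequence jointly inform each imputed value.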

Fig. 1: The overall architecture of StructureImpute for RNA structural score imputation.
Fig. 2: Performance evaluation of StructureImpute.
Fig. 3: Gradient analysis of the contributions of RNA sequence and structural information to the imputation performance of StructureImpute.
Fig. 4: StructureImpute accurately imputes missing structural scores within functional regions.
Fig. 5: A StructureImpute model trained on one dataset accurately imputes missing structural scores in other datasets using transfer learning.
Fig. 6: Performance of StructureImpute on DMS-seq datasets.

Data availability

The raw icSHAPE sequencing data were downloaded from the Gene Expression Omnibus (GEO). HEK293 whole-cell data, covering both in vivo and in vitro conditions, are from GSE74353 (ref. 26). HEK293 subcellular component (chromatin-associated, nucleoplasmic and cytoplasmic) data are from GSE117840. The m6A modification sites are from the RMBase database (ref. 46), and the binding regions of the FXR2 RNA-binding protein are from the CLIPdb database (ref. 44); both are provided as .bed files with genomic coordinates in the hg38 assembly. All the processed data are available from figshare at https://doi.org/10.6084/m9.figshare.16606850 (ref. 58).

Code availability

Code used for training models and performing analyses is available from GitHub (https://github.com/Tsinghua-gongjing/StructureImpute) or Zenodo (https://doi.org/10.5281/zenodo.5501018; ref. 59).

References

  1. Halvorsen, M., Martin, J. S., Broadaway, S. & Laederach, A. Disease-associated mutations that alter the RNA structural ensemble. PLoS Genet. 6, e1001074 (2010).

  2. Wapinski, O. & Chang, H. Y. Long noncoding RNAs and human disease. Trends Cell Biol. 21, 354–361 (2011).

  3. Bevilacqua, P. C., Ritchey, L. E., Su, Z. & Assmann, S. M. Genome-wide analysis of RNA secondary structure. Annu. Rev. Genet. 50, 235–266 (2016).

  4. Piao, M., Sun, L. & Zhang, Q. C. RNA regulations and functions decoded by transcriptome-wide RNA structure probing. Genomics Proteomics Bioinformatics 15, 267–278 (2017).

  5. Strobel, E. J., Yu, A. M. & Lucks, J. B. High-throughput determination of RNA structures. Nat. Rev. Genet. 19, 615–634 (2018).

  6. Rouskin, S., Zubradt, M., Washietl, S., Kellis, M. & Weissman, J. S. Genome-wide probing of RNA structure reveals active unfolding of mRNA structures in vivo. Nature 505, 701–705 (2014).

  7. Ding, Y. et al. In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505, 696–700 (2014).

  8. Weng, X. et al. Keth-seq for transcriptome-wide RNA structure mapping. Nat. Chem. Biol. 16, 489–492 (2020).

  9. Merino, E. J., Wilkinson, K. A., Coughlan, J. L. & Weeks, K. M. RNA structure analysis at single nucleotide resolution by selective 2′-hydroxyl acylation and primer extension (SHAPE). J. Am. Chem. Soc. 127, 4223–4231 (2005).

  10. Siegfried, N. A., Busan, S., Rice, G. M., Nelson, J. A. & Weeks, K. M. RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat. Methods 11, 959–965 (2014).

  11. Spitale, R. C. et al. Structural imprints in vivo decode RNA regulatory mechanisms. Nature 519, 486–490 (2015).

  12. Arisdakessian, C., Poirion, O., Yunits, B., Zhu, X. & Garmire, L. X. DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data. Genome Biol. 20, 211 (2019).

  13. Kiselev, V. Y., Andrews, T. S. & Hemberg, M. Challenges in unsupervised clustering of single-cell RNA-seq data. Nat. Rev. Genet. 20, 273–282 (2019).

  14. Seetin, M. G. & Mathews, D. H. RNA structure prediction: an overview of methods. Methods Mol. Biol. 905, 99–122 (2012).

  15. Mathews, D. H., Turner, D. H. & Watson, R. M. RNA secondary structure prediction. Curr. Protoc. Nucleic Acid Chem. 67, 11.12.11–11.12.19 (2016).

  16. Shi, B. et al. RNA structural dynamics regulate early embryogenesis through controlling transcriptome fate and function. Genome Biol. 21, 120 (2020).

  17. Sun, L. et al. RNA structure maps across mammalian cellular compartments. Nat. Struct. Mol. Biol. 26, 322–330 (2019).

  18. Li, W. V. & Li, J. J. An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nat. Commun. 9, 997 (2018).

  19. van Dijk, D. et al. Recovering gene interactions from single-cell data using data diffusion. Cell 174, 716–729 (2018).

  20. Huang, M. et al. SAVER: gene expression recovery for single-cell RNA sequencing. Nat. Methods 15, 539–542 (2018).

  21. Xiong, L. et al. SCALE method for single-cell ATAC-seq analysis via latent feature extraction. Nat. Commun. 10, 4576 (2019).

  22. Qiu, J. X. et al. DeepLiDAR: Deep surface normal guided depth prediction for outdoor scene from sparse LiDAR data and single color image. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 3308–3317 (IEEE, 2019); https://doi.org/10.1109/Cvpr.2019.00343

  23. Xu, Y. et al. Depth completion from sparse LiDAR data with depth-normal constraints. In Proc. IEEE International Conference on Computer Vision 2811–2820 (IEEE, 2019); https://doi.org/10.1109/Iccv.2019.00290

  24. Tang, J., Tian, F. P., Feng, W., Li, J. & Tan, P. Learning guided convolutional network for depth completion. IEEE Trans. Image Process. 30, 1116–1129 (2021).

  25. Li, P., Shi, R. & Zhang, Q. icSHAPE-pipe: a comprehensive toolkit for icSHAPE data analysis and evaluation. Methods 178, 96–103 (2020).

  26. Lu, Z. et al. RNA duplex map in living cells reveals higher-order transcriptome structure. Cell 165, 1267–1279 (2016).

  27. He, K. M., Zhang, X. Y., Ren, S. Q. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016); https://arxiv.org/abs/1512.03385

  28. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  29. Singh, J., Hanson, J., Paliwal, K. & Zhou, Y. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat. Commun. 10, 5407 (2019).

  30. Anger, A. M. et al. Structures of the human and Drosophila 80S ribosome. Nature 497, 80–85 (2013).

  31. Bernier, C. R. et al. RiboVision suite for visualization and analysis of ribosomes. Faraday Discuss. 169, 195–207 (2014).

  32. Bellaousov, S., Reuter, J. S., Seetin, M. G. & Mathews, D. H. RNAstructure: web servers for RNA secondary structure prediction and analysis. Nucleic Acids Res. 41, W471–W474 (2013).

  33. Mautner, S. et al. ShaKer: RNA SHAPE prediction using graph kernel. Bioinformatics 35, i354–i359 (2019).

  34. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. In Proc. IEEE International Conference on Computer Vision 618–626 (IEEE, 2017); https://doi.org/10.1109/ICCV.2017.74

  35. Zhou, B., Khosla, A., Lapedriza, A., Oliva, A. & Torralba, A. Learning deep features for discriminative localization. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2921–2929 (IEEE, 2016); https://arxiv.org/abs/1512.04150

  36. Hentze, M. W., Castello, A., Schwarzl, T. & Preiss, T. A brave new world of RNA-binding proteins. Nat. Rev. Mol. Cell Biol. 19, 327–341 (2018).

  37. Lu, Z. & Chang, H. Y. The RNA base-pairing problem and base-pairing solutions. Cold Spring Harb. Perspect. Biol. 10, a034926 (2018).

  38. Yan, Z. et al. Genome-wide colocalization of RNA-DNA interactions and fusion RNA pairs. Proc. Natl Acad. Sci. USA 116, 3328–3337 (2019).

  39. Luo, Z., Yang, Q. & Yang, L. RNA structure switches RBP binding. Mol. Cell 64, 219–220 (2016).

  40. Sanchez de Groot, N. et al. RNA structure drives interaction with proteins. Nat. Commun. 10, 3246 (2019).

  41. Lewis, C. J., Pan, T. & Kalsotra, A. RNA modifications and structures cooperate to guide RNA–protein interactions. Nat. Rev. Mol. Cell Biol. 18, 202–210 (2017).

  42. Huang, J. & Yin, P. Structural insights into N6-methyladenosine (m6A) modification in the transcriptome. Genomics Proteomics Bioinformatics 16, 85–98 (2018).

  43. Lukong, K. E., Chang, K. W., Khandjian, E. W. & Richard, S. RNA-binding proteins in human genetic disease. Trends Genet. 24, 416–425 (2008).

  44. Yang, Y. C. et al. CLIPdb: a CLIP-seq database for protein-RNA interactions. BMC Genomics 16, 51 (2015).

  45. Anderson, B. R., Chopra, P., Suhl, J. A., Warren, S. T. & Bassell, G. J. Identification of consensus binding sites clarifies FMRP binding determinants. Nucleic Acids Res. 44, 6649–6659 (2016).

  46. Xuan, J. J. et al. RMBase v2.0: deciphering the map of RNA modifications from epitranscriptome sequencing data. Nucleic Acids Res. 46, D327–D334 (2018).

  47. Zaccara, S., Ries, R. J. & Jaffrey, S. R. Reading, writing and erasing mRNA methylation. Nat. Rev. Mol. Cell Biol. 20, 608–624 (2019).

  48. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  49. Garst, A. D., Edwards, A. L. & Batey, R. T. Riboswitches: structures and mechanisms. Cold Spring Harb. Perspect. Biol. 3, a003533 (2011).

  50. Wan, Y. et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709 (2014).

  51. Lackey, L., Coria, A., Woods, C., McArthur, E. & Laederach, A. Allele-specific SHAPE-MaP assessment of the effects of somatic variation and protein binding on mRNA structure. RNA 24, 513–528 (2018).

  52. Li, P. et al. Integrative analysis of Zika virus genome RNA structure reveals critical determinants of viral infectivity. Cell Host Microbe 24, 875–886 (2018).

  53. Zhang, Z. et al. Deep-learning augmented RNA-seq analysis of transcript splicing. Nat. Methods 16, 307–310 (2019).

  54. Flynn, R. A. et al. Transcriptome-wide interrogation of RNA secondary structure in living cells with icSHAPE. Nat. Protoc. 11, 273–290 (2016).

  55. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).

  56. Andronescu, M., Bereg, V., Hoos, H. H. & Condon, A. RNA STRAND: the RNA secondary structure and statistical analysis database. BMC Bioinformatics 9, 340 (2008).

  57. Kalvari, I. et al. Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families. Nucleic Acids Res. 46, D335–D342 (2018).

  58. Jing, G., Kui, X. & Qiangfeng Cliff, Z. A deep learning method for recovering missing signals in transcriptome-wide RNA structure profiles from probing experiments. figshare https://doi.org/10.6084/m9.figshare.16606850 (2021).

  59. Jing, G. & Kui, X. Tsinghua-gongjing/StructureImpute: v1.0.0. Zenodo https://doi.org/10.5281/zenodo.5501018 (2021).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (grant numbers 91740204, 91940306 and 31761163007 to Q.C.Z.) and the Chinese Ministry of Science and Technology (grant numbers 2019YFA0110002 and 2018YFA0107603 to Q.C.Z.). We thank the Tsinghua University Branch of China National Center for Protein Sciences (Beijing) for computational facility support.

Author information

Contributions

Q.C.Z. and Z.J.L. conceived and supervised the research. J.G. and K.X. designed and implemented the StructureImpute model. J.G. designed and performed all the analyses with the help of Z.M. J.G. and Q.C.Z. wrote the manuscript with input from all authors.

Corresponding author

Correspondence to Qiangfeng Cliff Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Machine Intelligence thanks Zilu Zhou and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–5 and Tables 1 and 2.

Reporting Summary

About this article

Cite this article

Gong, J., Xu, K., Ma, Z. et al. A deep learning method for recovering missing signals in transcriptome-wide RNA structure profiles from probing experiments. Nat Mach Intell 3, 995–1006 (2021). https://doi.org/10.1038/s42256-021-00412-0
