Abstract
Many functional RNAs have an evolutionarily conserved secondary structure. Conservation of RNA base pairing induces pairwise covariations in sequence alignments. We developed a computational method, R-scape (RNA Structural Covariation Above Phylogenetic Expectation), that quantitatively tests whether covariation analysis supports the presence of a conserved RNA secondary structure. R-scape analysis finds no statistically significant support for proposed secondary structures of the long noncoding RNAs HOTAIR, SRA, and Xist.
References
1. Holley, R.W. et al. Science 147, 1462–1465 (1965).
2. Noller, H.F. et al. Nucleic Acids Res. 9, 6167–6189 (1981).
3. Pace, N.R., Smith, D.K., Olsen, G.J. & James, B.D. Gene 82, 65–75 (1989).
4. Williams, K.P. & Bartel, D.P. RNA 2, 1306–1310 (1996).
5. Michel, F., Costa, M., Massire, C. & Westhof, E. Methods Enzymol. 317, 491–510 (2000).
6. Gutell, R.R., Power, A., Hertz, G.Z., Putz, E.J. & Stormo, G.D. Nucleic Acids Res. 20, 5785–5795 (1992).
7. Davidovich, C. & Cech, T.R. RNA 21, 2007–2022 (2015).
8. Ji, Z., Song, R., Regev, A. & Struhl, K. eLife 4, e08890 (2015).
9. Akmaev, V.R., Kelley, S.T. & Stormo, G.D. Bioinformatics 16, 501–512 (2000).
10. Lindgreen, S., Gardner, P.P. & Krogh, A. Bioinformatics 22, 2988–2995 (2006).
11. Yeang, C.-H., Darot, J.F.J., Noller, H.F. & Haussler, D. Mol. Biol. Evol. 24, 2119–2131 (2007).
12. Dutheil, J.Y. Brief. Bioinform. 13, 228–243 (2012).
13. Somarowthu, S. et al. Mol. Cell 58, 353–361 (2015).
14. Weinberg, Z. & Breaker, R.R. BMC Bioinformatics 12, 3 (2011).
15. Nawrocki, E.P. et al. Nucleic Acids Res. 43, D130–D137 (2015).
16. Woolf, B. Ann. Hum. Genet. 21, 397–409 (1957).
17. Dunn, S.D., Wahl, L.M. & Gloor, G.B. Bioinformatics 24, 333–340 (2008).
18. Szymanski, M., Barciszewska, M.Z., Erdmann, V.A. & Barciszewski, J. Nucleic Acids Res. 30, 176–178 (2002).
19. Fu, Y., Deiorio-Haggar, K., Anthony, J. & Meyer, M.M. Nucleic Acids Res. 41, 3491–3503 (2013).
20. del Val, C., Rivas, E., Torres-Quesada, O., Toro, N. & Jiménez-Zurdo, J.I. Mol. Microbiol. 66, 1080–1091 (2007).
21. Novikova, I.V., Hennelly, S.P. & Sanbonmatsu, K.Y. Nucleic Acids Res. 40, 5034–5051 (2012).
22. Maenner, S. et al. PLoS Biol. 8, e1000276 (2010).
23. Fang, R., Moss, W.N., Rutenberg-Schoenberg, M. & Simon, M.D. PLoS Genet. 11, e1005668 (2015).
24. Rinn, J.L. & Chang, H.Y. Annu. Rev. Biochem. 81, 145–166 (2012).
25. Rivas, E., Lang, R. & Eddy, S.R. RNA 18, 193–212 (2012).
26. Price, M.N., Dehal, P.S. & Arkin, A.P. PLoS One 5, e9490 (2010).
27. Shannon, C.E. Bell Syst. Tech. J. 27, 379–423 (1948).
28. Gutell, R.R., Larsen, N. & Woese, C.R. Microbiol. Rev. 58, 10–26 (1994).
29. Martin, L.C., Gloor, G.B., Dunn, S.D. & Wahl, L.M. Bioinformatics 21, 4116–4124 (2005).
30. Fodor, A.A. & Aldrich, R.W. Proteins 56, 211–221 (2004).
31. Hofacker, I.L., Fekete, M. & Stadler, P.F. J. Mol. Biol. 319, 1059–1066 (2002).
32. Gerstein, M., Sonnhammer, E.L.L. & Chothia, C. J. Mol. Biol. 236, 1067–1078 (1994).
33. Gorodkin, J., Staerfeldt, H.H., Lund, O. & Brunak, S. Bioinformatics 15, 769–770 (1999).
34. Weigt, M., White, R.A., Szurmant, H., Hoch, J.A. & Hwa, T. Proc. Natl. Acad. Sci. USA 106, 67–72 (2009).
35. De Leonardis, E. et al. Nucleic Acids Res. 43, 10444–10455 (2015).
36. Weinreb, C. et al. Cell 165, 963–975 (2016).
37. Fitch, W.M. Syst. Zool. 20, 406–416 (1971).
38. Goebel, B., Dawy, Z., Hagenauer, J. & Mueller, J.C. in IEEE International Conference on Communications Vol. 2, 1102–1106 (IEEE, 2005).
39. Rivas, E. & Eddy, S.R. BMC Bioinformatics 16, 406 (2015).
40. Guindon, S. et al. Syst. Biol. 59, 307–321 (2010).
41. Jung, S. et al. Nucleic Acids Res. 39, 7529–7547 (2011).
42. del Val, C. et al. RNA Biol. 9, 119–129 (2012).
43. Wheeler, T.J. et al. Nucleic Acids Res. 41, D70–D82 (2013).
Acknowledgements
We thank S.E.R. Egnor for suggesting the name R-scape and the Centro de Ciencias de Benasque Pedro Pascual in Spain, where part of this manuscript was drafted.
Contributions
E.R. and S.R.E. designed the method and wrote the manuscript. E.R. wrote the code, and designed and carried out the experiments. J.C. wrote the R-scape web application.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Characterization of different covariation statistics on a positive testset of 104 RNAs.
(a) Plots of the F measure, the harmonic mean of sensitivity (SEN) and positive predictive value (PPV), F = 2·SEN·PPV / (SEN + PPV), for four different covariation statistics as a function of the score's E-value, over all alignments, using R-scape with default parameters. (b) Effect of alignment gaps on the different covariation statistics, seen by including all alignment columns (right) as compared to the R-scape default (left). (c) Effect of measuring covariation using a binary classification (whether a pair is canonical Watson-Crick/G:U or not) versus using the full sixteen-way classification. (d) Covariation detection as a function of the number of sequences in the alignments. (e) The F measure for each of the 104 RNA Rfam alignments in the positive testset as a function of average percentage identity, at an E-value threshold of 0.05.
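The F measure in panel (a) can be illustrated with a minimal Python sketch. It assumes that base pairs are represented as tuples of alignment column indices; the function names and the toy pair sets are hypothetical and are not taken from the R-scape code.

```python
def sen_ppv(annotated_pairs: set, significant_pairs: set) -> tuple:
    """Sensitivity and positive predictive value of a set of significant pairs.

    SEN: fraction of annotated base pairs that are called significant.
    PPV: fraction of significant pairs that are annotated base pairs.
    """
    true_positives = len(annotated_pairs & significant_pairs)
    sen = true_positives / len(annotated_pairs) if annotated_pairs else 0.0
    ppv = true_positives / len(significant_pairs) if significant_pairs else 0.0
    return sen, ppv


def f_measure(sen: float, ppv: float) -> float:
    """Harmonic mean of SEN and PPV: F = 2*SEN*PPV / (SEN + PPV)."""
    if sen + ppv == 0:
        return 0.0  # convention chosen here to avoid division by zero
    return 2 * sen * ppv / (sen + ppv)


# Toy example (made-up column indices): four annotated pairs, three
# significant pairs, two of which are annotated.
annotated = {(1, 10), (2, 9), (3, 8), (4, 7)}
significant = {(1, 10), (2, 9), (5, 6)}
sen, ppv = sen_ppv(annotated, significant)
f = f_measure(sen, ppv)
```

In this toy case SEN is 2/4 and PPV is 2/3, so F is 4/7.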
Supplementary Figure 2 Comparison of R-scape to related methods CoMap and MICA [12] on the testset of 104 RNAs.
(a) Sensitivity (percentage of significant base pairs) and positive predictive value (percentage of significant pairs that are base pairs) as a function of the score's E-value. (b) Running times for the three methods (R-scape in black, CoMap in cyan, MICA in red) on a log-log plot as a function of the number of sequences in the alignment (left) and as a function of the alignment length (right). Running times are for a single 3 GHz Intel Core i7 with 8 GB 1600 MHz DDR3 RAM. Running times for R-scape and CoMap include the cost of generating a phylogenetic tree using FastTree [26].
Supplementary Figure 3 Examples of RNAs with significant covariation support for their proposed structures.
(a) R-scape analysis of a multiple sequence alignment of αr14, a putative regulatory small RNA in α-proteobacteria [20,42]. (b) R-scape analysis of a multiple sequence alignment of Arisong RNA, a noncoding RNA identified in the ciliate Oxytricha [41]. (c) Example of detecting an underannotated structure, an S15 mRNA leader in γ-proteobacteria that autoregulates ribosomal protein synthesis [19]. Three out of the seven significantly covarying pairs are not in the proposed structure. These covarying pairs support the existence of a conserved pseudoknot, which was already known but happened not to be annotated in the provided alignment [19]. (d) Example of using R-scape to improve a structural annotation for the Rfam seed alignment for the SAM-I riboswitch. The R-scape modified structure has seven significant pairs not included in the Rfam-annotated SAM-I structure. The R-scape structure is in agreement with the secondary structure derived from the SAM-I riboswitch crystal structure (R.K. Montange & R.T. Batey, Nature 441, 1172–1175, 2006). Notation is as in Figure 2.
Supplementary Figure 4 Covariation analysis of HOTAIR putative helices H7 and H10.
The structural alignments have been extracted from the HOTAIR Domain1 alignment (with 37 sequences) provided in [13]. The H7 and H10 alignments have 28 and 27 sequences respectively, after removing species for which the region does not include any residues. For any two base paired positions, changes are annotated in color relative to the most frequent Watson-Crick or G:U pair. Green arrows indicate the base pairs (one for H7 and three for H10) proposed as covarying in [13]. For putative helix H7, the proposed covarying pair (columns 8:36 marked in green) has covariation score -0.16 (E-value 7.74). Gray arrows indicate the best scoring putative Watson-Crick pair (columns 10:30, with a consensus C:G), which was not part of the proposed structure. This best scoring alternative pair would have one U:A compensatory change and one U:G half-compensatory change, and covariation score 3.66 (E-value 5.52). For both alignments, we also provide the R-scape analysis for all pairs. For putative helix H10, the one covariation above the null hypothesis corresponds to a G:G/U:C non-Watson-Crick covariation in a pair of adjacent columns that are not in the proposed structure and are too close to be a base pair.
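The annotation of changes relative to the most frequent canonical pair can be sketched as follows. This is an illustrative reading of the caption, not the procedure used to draw the figure: it assumes that a compensatory change differs from the consensus pair at both positions and a half-compensatory change at exactly one position, with the result still Watson-Crick or G:U. The function name and the toy columns are hypothetical.

```python
from collections import Counter

# Watson-Crick and G:U pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def classify_changes(col_i: str, col_j: str):
    """Tally each sequence's pair relative to the most frequent canonical pair.

    Returns (consensus_pair, Counter of labels). Rows with gaps or
    non-canonical pairs are counted as 'non-canonical'.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have one residue per sequence")
    pairs = list(zip(col_i.upper(), col_j.upper()))
    canonical = [p for p in pairs if p in CANONICAL]
    if not canonical:
        return None, Counter()
    consensus = Counter(canonical).most_common(1)[0][0]
    labels = Counter()
    for a, b in pairs:
        if (a, b) not in CANONICAL:
            labels["non-canonical"] += 1
        elif (a, b) == consensus:
            labels["consensus"] += 1
        elif a != consensus[0] and b != consensus[1]:
            labels["compensatory"] += 1       # both positions changed
        else:
            labels["half-compensatory"] += 1  # only one position changed
    return consensus, labels


# Toy columns (not the HOTAIR data): consensus C:G, one U:A, one U:G.
consensus, labels = classify_changes("CCCCUU", "GGGGAG")
```

Note that this tallies sequences rather than independent evolutionary events, so it says nothing on its own about statistical significance, which is what the E-values in the caption address.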
Supplementary Figure 5 Covariation analysis of putative helices H3 and H4 of ncSRA.
Color annotation as in Supplementary Figure 4. Green arrows indicate the seven base pairs identified in [21] as significantly covarying. We also provide the R-scape analysis for all pairs in this partial ncSRA alignment.
Supplementary Figure 6 Covariation analysis of putative helices H19, H20, and H21 of ncSRA.
Color annotation as in Supplementary Figure 4. Green arrows indicate eight base pairs identified in [21] as significantly covarying. We also provide the R-scape analysis for all pairs in this partial ncSRA alignment.
Supplementary Figure 7 Apparent covariations in 13 aligned Xist RepA region sequences [23].
(a) An alignment column pair was counted as covarying in [23] if it is entirely consistent with Watson-Crick or G:U base pairing, and at least one substitution and no more than two gaps are observed in each column. The dot plot shows 541 column pairs that satisfy these criteria in the RepA alignment used in [23], including (in blue) three of the four cited as support for the secondary structure in [30] (the other has an A:A noncanonical pair, and thus does not strictly satisfy the rule), 454 pairs that consist of a U+C column and a G+A column (red), and 84 other pairs (black). (b) Example of how single substitutions in conserved U+C and G+A columns can create apparent covariation.
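The counting rule described in (a) can be sketched in Python to show why it is permissive. This is a hedged illustration under two assumptions not stated in the caption: "at least one substitution" is read as at least two distinct residues in the column, and rows with a gap in either column are skipped when checking pairing. The function name and the 13-sequence toy columns are hypothetical.

```python
# Watson-Crick and G:U pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
GAPS = set("-.")


def apparent_covariation(col_i: str, col_j: str, max_gaps: int = 2) -> bool:
    """Return True if a column pair satisfies the rule described in (a)."""
    col_i, col_j = col_i.upper(), col_j.upper()
    if len(col_i) != len(col_j):
        raise ValueError("columns must have one residue per sequence")
    for col in (col_i, col_j):
        # No more than max_gaps gaps in each column.
        if sum(c in GAPS for c in col) > max_gaps:
            return False
        # At least one substitution: assumed to mean >= 2 distinct residues.
        if len({c for c in col if c not in GAPS}) < 2:
            return False
    # Every ungapped row must be a Watson-Crick or G:U pair.
    return all((a, b) in CANONICAL
               for a, b in zip(col_i, col_j)
               if a not in GAPS and b not in GAPS)


# A conserved U column with one C, and a conserved G column with one A
# in a different sequence: rows are C:G, U:G (x11), and U:A.
u_c_column = "C" + "U" * 12
g_a_column = "G" * 12 + "A"
passes = apparent_covariation(u_c_column, g_a_column)
```

Here the pair passes the rule even though the two substitutions occur in different sequences and no compensatory change has taken place, which is the situation panel (b) describes.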
Supplementary Figure 8 Properties of the structural alignments used in this study.
The alignments we analyzed are derived from the original alignments such that columns with less than 50% occupied positions are not considered. Information for the original alignments is given in parentheses if different from the analyzed alignment. Alignments are available as Stockholm files in the online Supplementary Information.
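The column filter described above can be sketched as follows, assuming the alignment is held as a list of equal-length strings and that `-` and `.` denote gaps. The function name and the toy alignment are hypothetical, and this is not the R-scape implementation.

```python
def filter_columns(alignment: list, min_occupancy: float = 0.5,
                   gap_chars: str = "-."):
    """Drop alignment columns with less than min_occupancy occupied positions.

    Returns (kept_column_indices, filtered_alignment).
    """
    if not alignment:
        return [], []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    nseq = len(alignment)
    kept = []
    for j in range(length):
        occupied = sum(seq[j] not in gap_chars for seq in alignment)
        # Columns at exactly the threshold are retained.
        if occupied / nseq >= min_occupancy:
            kept.append(j)
    filtered = ["".join(seq[j] for j in kept) for seq in alignment]
    return kept, filtered


# Toy alignment: the last column has 1 of 4 positions occupied.
toy = ["ACG-", "A-G-", "-UGA", "-.G-"]
kept, filtered = filter_columns(toy)
```

Keeping the original column indices alongside the filtered sequences makes it possible to map any pair found in the analyzed alignment back to the original alignment coordinates.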
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–8. (PDF 1823 kb)
Supplementary Software
Alignment data and R-scape source code. (ZIP 11746 kb)
Supplementary Dataset 1
Alignment Data. (ZIP 4103 kb)
Cite this article
Rivas, E., Clements, J. & Eddy, S. A statistical test for conserved RNA structure shows lack of evidence for structure in lncRNAs. Nat Methods 14, 45–48 (2017). https://doi.org/10.1038/nmeth.4066
DOI: https://doi.org/10.1038/nmeth.4066