A statistical test for conserved RNA structure shows lack of evidence for structure in lncRNAs

Abstract

Many functional RNAs have an evolutionarily conserved secondary structure. Conservation of RNA base pairing induces pairwise covariations in sequence alignments. We developed a computational method, R-scape (RNA Structural Covariation Above Phylogenetic Expectation), that quantitatively tests whether covariation analysis supports the presence of a conserved RNA secondary structure. R-scape analysis finds no statistically significant support for proposed secondary structures of the long noncoding RNAs HOTAIR, SRA, and Xist.

Figure 1: Independent substitutions on a tree can create confounding covariations.
Figure 2: Covariation analysis of known or proposed RNA secondary structures.

References

  1. Holley, R.W. et al. Science 147, 1462–1465 (1965).
  2. Noller, H.F. et al. Nucleic Acids Res. 9, 6167–6189 (1981).
  3. Pace, N.R., Smith, D.K., Olsen, G.J. & James, B.D. Gene 82, 65–75 (1989).
  4. Williams, K.P. & Bartel, D.P. RNA 2, 1306–1310 (1996).
  5. Michel, F., Costa, M., Massire, C. & Westhof, E. Methods Enzymol. 317, 491–510 (2000).
  6. Gutell, R.R., Power, A., Hertz, G.Z., Putz, E.J. & Stormo, G.D. Nucleic Acids Res. 20, 5785–5795 (1992).
  7. Davidovich, C. & Cech, T.R. RNA 21, 2007–2022 (2015).
  8. Ji, Z., Song, R., Regev, A. & Struhl, K. eLife 4, e08890 (2015).
  9. Akmaev, V.R., Kelley, S.T. & Stormo, G.D. Bioinformatics 16, 501–512 (2000).
  10. Lindgreen, S., Gardner, P.P. & Krogh, A. Bioinformatics 22, 2988–2995 (2006).
  11. Yeang, C.-H., Darot, J.F.J., Noller, H.F. & Haussler, D. Mol. Biol. Evol. 24, 2119–2131 (2007).
  12. Dutheil, J.Y. Brief. Bioinform. 13, 228–243 (2012).
  13. Somarowthu, S. et al. Mol. Cell 58, 353–361 (2015).
  14. Weinberg, Z. & Breaker, R.R. BMC Bioinformatics 12, 3 (2011).
  15. Nawrocki, E.P. et al. Nucleic Acids Res. 43, D130–D137 (2015).
  16. Woolf, B. Ann. Hum. Genet. 21, 397–409 (1957).
  17. Dunn, S.D., Wahl, L.M. & Gloor, G.B. Bioinformatics 24, 333–340 (2008).
  18. Szymanski, M., Barciszewska, M.Z., Erdmann, V.A. & Barciszewski, J. Nucleic Acids Res. 30, 176–178 (2002).
  19. Fu, Y., Deiorio-Haggar, K., Anthony, J. & Meyer, M.M. Nucleic Acids Res. 41, 3491–3503 (2013).
  20. del Val, C., Rivas, E., Torres-Quesada, O., Toro, N. & Jiménez-Zurdo, J.I. Mol. Microbiol. 66, 1080–1091 (2007).
  21. Novikova, I.V., Hennelly, S.P. & Sanbonmatsu, K.Y. Nucleic Acids Res. 40, 5034–5051 (2012).
  22. Maenner, S. et al. PLoS Biol. 8, e1000276 (2010).
  23. Fang, R., Moss, W.N., Rutenberg-Schoenberg, M. & Simon, M.D. PLoS Genet. 11, e1005668 (2015).
  24. Rinn, J.L. & Chang, H.Y. Annu. Rev. Biochem. 81, 145–166 (2012).
  25. Rivas, E., Lang, R. & Eddy, S.R. RNA 18, 193–212 (2012).
  26. Price, M.N., Dehal, P.S. & Arkin, A.P. PLoS One 5, e9490 (2010).
  27. Shannon, C.E. Bell Syst. Tech. J. 27, 379–423 (1948).
  28. Gutell, R.R., Larsen, N. & Woese, C.R. Microbiol. Rev. 58, 10–26 (1994).
  29. Martin, L.C., Gloor, G.B., Dunn, S.D. & Wahl, L.M. Bioinformatics 21, 4116–4124 (2005).
  30. Fodor, A.A. & Aldrich, R.W. Proteins 56, 211–221 (2004).
  31. Hofacker, I.L., Fekete, M. & Stadler, P.F. J. Mol. Biol. 319, 1059–1066 (2002).
  32. Gerstein, M., Sonnhammer, E.L.L. & Chothia, C. J. Mol. Biol. 236, 1067–1078 (1994).
  33. Gorodkin, J., Staerfeldt, H.H., Lund, O. & Brunak, S. Bioinformatics 15, 769–770 (1999).
  34. Weigt, M., White, R.A., Szurmant, H., Hoch, J.A. & Hwa, T. Proc. Natl. Acad. Sci. USA 106, 67–72 (2009).
  35. De Leonardis, E. et al. Nucleic Acids Res. 43, 10444–10455 (2015).
  36. Weinreb, C. et al. Cell 165, 963–975 (2016).
  37. Fitch, W.M. Syst. Zool. 20, 406–416 (1971).
  38. Goebel, B., Dawy, Z., Hagenauer, J. & Mueller, J.C. in IEEE International Conference on Communications Vol. 2, 1102–1106 (IEEE, 2005).
  39. Rivas, E. & Eddy, S.R. BMC Bioinformatics 16, 406 (2015).
  40. Guindon, S. et al. Syst. Biol. 59, 307–321 (2010).
  41. Jung, S. et al. Nucleic Acids Res. 39, 7529–7547 (2011).
  42. del Val, C. et al. RNA Biol. 9, 119–129 (2012).
  43. Wheeler, T.J. et al. Nucleic Acids Res. 41, D70–D82 (2013).

Acknowledgements

We thank S.E.R. Egnor for suggesting the name R-scape and the Centro de Ciencias de Benasque Pedro Pascual in Spain, where part of this manuscript was drafted.

Author information

Contributions

E.R. and S.R.E. designed the method and wrote the manuscript. E.R. wrote the code, and designed and carried out the experiments. J.C. wrote the R-scape web application.

Corresponding author

Correspondence to Elena Rivas.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Characterization of different covariation statistics on a positive testset of 104 RNAs.

(a) Plots of the F measure, the harmonic mean of sensitivity (SEN) and positive predictive value (PPV), F = 2*SEN*PPV / (SEN + PPV), for four different covariation statistics as a function of the score's E-value, over all alignments, using R-scape with default parameters. (b) Effect of alignment gaps on the different covariation statistics, seen by including all alignment columns (right) as compared to the R-scape default (left). (c) Effect of measuring covariation using a binary classification (whether a pair is canonical Watson-Crick/G:U or not) versus using the full sixteen-way classification. (d) Covariation detection as a function of the number of sequences in the alignments. (e) The F measure for each of the 104 RNA Rfam alignments in the positive testset as a function of average percentage identity, at an E-value threshold of 0.05.
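As a small illustration of the F measure defined in panel (a), the sketch below computes SEN, PPV, and F for one alignment. It assumes that base pairs are represented as (i, j) column tuples and that "significant" pairs are those called at some E-value threshold; the function name and data layout are hypothetical and are not taken from the R-scape code.

```python
def f_measure(annotated_pairs, significant_pairs):
    """Return (SEN, PPV, F) for called pairs versus annotated base pairs.

    Both arguments are iterables of (i, j) column tuples; this layout is
    an assumption made for illustration only.
    """
    annotated = set(annotated_pairs)
    significant = set(significant_pairs)
    true_positives = len(annotated & significant)

    # Sensitivity: fraction of annotated base pairs that are called significant.
    sen = true_positives / len(annotated) if annotated else 0.0
    # Positive predictive value: fraction of significant pairs that are base pairs.
    ppv = true_positives / len(significant) if significant else 0.0
    # Harmonic mean, F = 2*SEN*PPV / (SEN + PPV); defined as 0 when both are 0.
    f = 2 * sen * ppv / (sen + ppv) if (sen + ppv) > 0 else 0.0
    return sen, ppv, f


if __name__ == "__main__":
    annotated = [(1, 10), (2, 9), (3, 8), (4, 7)]
    called = [(1, 10), (2, 9), (5, 6)]
    print(f_measure(annotated, called))
```

With four annotated pairs and three called pairs of which two are correct, SEN is 1/2, PPV is 2/3, and F is 4/7.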

Supplementary Figure 2 Comparison of R-scape to related methods CoMap and MICA [12] on the testset of 104 RNAs.

(a) Sensitivity (percentage of significant base pairs) and positive predictive value (percentage of significant pairs that are base pairs) as a function of the score's E-value. (b) Running times for the three methods (R-scape in black, CoMap in cyan, MICA in red) on a log-log plot as a function of the number of sequences in the alignment (left) and as a function of the alignment length (right). Running times are for a single 3 GHz Intel Core i7 with 8 GB of 1600 MHz DDR3 RAM. Running times for R-scape and CoMap include the cost of generating a phylogenetic tree using FastTree [26].

Supplementary Figure 3 Examples of RNAs with significant covariation support for their proposed structures.

(a) R-scape analysis of a multiple sequence alignment of αr14, a putative regulatory small RNA in α-proteobacteria [20,42]. (b) R-scape analysis of a multiple sequence alignment of Arisong RNA, a noncoding RNA identified in the ciliate Oxytricha [41]. (c) Example of detecting an underannotated structure, an S15 mRNA leader in γ-proteobacteria that autoregulates ribosomal protein synthesis [19]. Three of the seven significantly covarying pairs are not in the proposed structure. These covarying pairs support the existence of a conserved pseudoknot, which was already known but was not annotated in the provided alignment [19]. (d) Example of using R-scape to improve the structural annotation of the Rfam seed alignment for the SAM-I riboswitch. The R-scape modified structure has seven significant pairs not included in the Rfam-annotated SAM-I structure. The R-scape structure is in agreement with the secondary structure derived from the SAM-I riboswitch crystal structure (Montange, R.K. & Batey, R.T. Nature 441, 1172–1175 (2006)). Notation is as in Figure 2.

Supplementary Figure 4 Covariation analysis of HOTAIR putative helices H7 and H10.

The structural alignments were extracted from the HOTAIR Domain1 alignment (with 37 sequences) provided in [13]. The H7 and H10 alignments have 28 and 27 sequences, respectively, after removing species for which the region does not include any residues. For any two base-paired positions, changes are annotated in color relative to the most frequent Watson-Crick or G:U pair. Green arrows indicate the base pairs (one for H7 and three for H10) proposed as covarying in [13]. For putative helix H7, the proposed covarying pair (columns 8:36, marked in green) has covariation score -0.16 (E-value 7.74). Gray arrows indicate the best-scoring putative Watson-Crick pair (columns 10:30, with a consensus C:G), which was not part of the proposed structure. This best-scoring alternative pair would have one U:A compensatory change and one U:G half-compensatory change, and a covariation score of 3.66 (E-value 5.52). For both alignments, we also provide the R-scape analysis for all pairs. For putative helix H10, the one covariation above the null hypothesis corresponds to a G:G/U:C non-Watson-Crick covariation in a pair of adjacent columns that are not in the proposed structure and are too close to be a base pair.
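The annotation of changes relative to the most frequent Watson-Crick or G:U pair can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes that a "compensatory" change differs from the reference pair at both positions and a "half-compensatory" change at exactly one position, with the pair remaining canonical in both cases. The function name and labels are hypothetical.

```python
from collections import Counter

# Watson-Crick and G:U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_changes(col_i, col_j):
    """Label each row of a column pair relative to the most frequent canonical pair.

    col_i and col_j are equal-length strings, one residue per sequence.
    Rows that are not canonical pairs (including rows with gaps) are
    labeled "other". Ties for the most frequent pair are broken by first
    occurrence, which is an arbitrary choice made for this sketch.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of rows")
    pairs = list(zip(col_i.upper(), col_j.upper()))
    canonical = [p for p in pairs if p in CANONICAL]
    if not canonical:
        raise ValueError("no canonical pair observed in these columns")
    reference = Counter(canonical).most_common(1)[0][0]

    labels = []
    for a, b in pairs:
        if (a, b) not in CANONICAL:
            labels.append("other")
            continue
        # Count how many of the two positions differ from the reference pair.
        n_diff = (a != reference[0]) + (b != reference[1])
        labels.append(("same", "half-compensatory", "compensatory")[n_diff])
    return reference, labels


if __name__ == "__main__":
    # Consensus C:G with one U:A and one U:G row, as in the H7 example.
    print(classify_changes("CCCUU", "GGGAG"))
```

For a consensus C:G pair, a U:A row is labeled compensatory and a U:G row half-compensatory, matching the wording used for columns 10:30 of H7.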

Supplementary Figure 5 Covariation analysis of putative helices H3 and H4 of ncSRA.

Color annotation as in Supplementary Figure 4. Green arrows indicate the seven base pairs identified in [21] as significantly covarying. We also provide the R-scape analysis for all pairs in this partial ncSRA alignment.

Supplementary Figure 6 Covariation analysis of putative helices H19, H20, and H21 of ncSRA.

Color annotation as in Supplementary Figure 4. Green arrows indicate eight base pairs identified in [21] as significantly covarying. We also provide the R-scape analysis for all pairs in this partial ncSRA alignment.

Supplementary Figure 7 Apparent covariations in 13 aligned Xist RepA region sequences [23].

(a) An alignment column pair was counted as covarying in [23] if it is entirely consistent with Watson-Crick or G:U base pairing, and at least one substitution and no more than two gaps are observed in each column. The dot plot shows 541 column pairs that satisfy these criteria in the RepA alignment used in [23], including (in blue) three of the four cited as support for the secondary structure in [30] (the other has an A:A noncanonical pair, and thus does not strictly satisfy the rule), 454 pairs that consist of a U+C column and a G+A column (red), and 84 other pairs (black). (b) Example of how single substitutions in conserved U+C and G+A columns can create apparent covariation.
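The counting rule described in panel (a) can be sketched as a simple predicate. This is one possible reading of the rule, not the code used in [23]: it assumes that "at least one substitution" means at least two distinct residues in a column, and that rows with a gap in either column are skipped when checking pairing consistency. The function name is hypothetical.

```python
# Watson-Crick and G:U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
GAPS = set("-.")


def is_apparent_covariation(col_i, col_j, max_gaps=2):
    """Return True if a column pair satisfies the rule as read above."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of rows")
    col_i, col_j = col_i.upper(), col_j.upper()

    for col in (col_i, col_j):
        # No more than max_gaps gaps in each column.
        if sum(c in GAPS for c in col) > max_gaps:
            return False
        # At least one substitution: assumed to mean two or more distinct residues.
        if len({c for c in col if c not in GAPS}) < 2:
            return False

    for a, b in zip(col_i, col_j):
        if a in GAPS or b in GAPS:
            continue  # assumption: gapped rows are ignored for consistency
        if (a, b) not in CANONICAL:
            return False
    return True


if __name__ == "__main__":
    # A U+C column against a G+A column, each with a single substitution
    # in a different sequence: every row is still a canonical pair.
    print(is_apparent_covariation("UUUUC", "GGGAG"))
```

The usage example mirrors panel (b): independent single substitutions in a U+C column and a G+A column pass the rule because U:G, U:A, and C:G are all canonical, even though no row shows a compensatory double change.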

Supplementary Figure 8 Properties of the structural alignments used in this study.

The alignments we analyzed are derived from the original alignments by excluding columns in which fewer than 50% of positions are occupied. Information for the original alignments is given in parentheses if different from the analyzed alignment. Alignments are available as Stockholm files in the online Supplementary Information.
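The column filter described above can be sketched in a few lines. This is a minimal illustration, assuming the alignment is already loaded as a list of equal-length strings and that '-' and '.' denote gaps; the function name and the `min_occupancy` parameter are hypothetical and do not correspond to R-scape options.

```python
GAP_CHARS = set("-.")  # assumed gap symbols


def filter_columns(alignment, min_occupancy=0.5):
    """Drop columns in which fewer than min_occupancy of the rows are residues.

    Returns the filtered alignment and the 0-based indices of kept columns.
    """
    if not alignment:
        return [], []
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    kept = []
    for j in range(length):
        occupied = sum(seq[j] not in GAP_CHARS for seq in alignment)
        # Columns with less than 50% occupancy are not considered.
        if occupied / n_seqs >= min_occupancy:
            kept.append(j)

    filtered = ["".join(seq[j] for j in kept) for seq in alignment]
    return filtered, kept


if __name__ == "__main__":
    msa = ["AC-G", "A--G", "AU-G", "-C-G"]
    print(filter_columns(msa))
```

Returning the kept column indices alongside the filtered sequences makes it possible to map pair coordinates back to the original alignment.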

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8. (PDF 1823 kb)

Supplementary Software

Alignment data and R-scape source code. (ZIP 11746 kb)

Supplementary Dataset 1

Alignment Data. (ZIP 4103 kb)

Cite this article

Rivas, E., Clements, J. & Eddy, S. A statistical test for conserved RNA structure shows lack of evidence for structure in lncRNAs. Nat Methods 14, 45–48 (2017). https://doi.org/10.1038/nmeth.4066
