R2DT is a framework for predicting and visualising RNA secondary structure using templates

Non-coding RNAs (ncRNA) are essential for all life, and their functions often depend on their secondary (2D) and tertiary structure. Despite the abundance of software for the visualisation of ncRNAs, few automatically generate consistent and recognisable 2D layouts, which makes it challenging for users to construct, compare and analyse structures. Here, we present R2DT, a method for predicting and visualising a wide range of RNA structures in standardised layouts. R2DT is based on a library of 3,647 templates representing the majority of known structured RNAs. R2DT has been applied to ncRNA sequences from the RNAcentral database and produced >13 million diagrams, creating the world’s largest RNA 2D structure dataset. The software is amenable to community expansion, and is freely available at https://github.com/rnacentral/R2DT and a web server is found at https://rnacentral.org/r2dt.

The perhaps most impressive part of the work is the huge collection of templates provided with R2DT. The template library not only covers most RFam families, but also provides layouts for many subfamilies of ribosomal RNAs, including some based on measured tertiary structures complete with expansion elements. There is even the possibility of further community driven expansion of the template library, such that newly detected ncRNA families can now be published together with their standard layout.
The second important point is the template choice. For this purpose, each template is associated with a covariance model. The alignment of the query sequence to the CM is then used both to select the optimal template and to define the mapping between query and template. The procedure is state of the art and also represents an improvement on the original Traveler algorithm, which uses a tree alignment between the structures of the template and the query RNA.
The pipeline is fully automatic and thus ideal for use in web servers such as RNAcentral. A minor criticism remains, that it is unclear how the user can deviate from the automated layout. It should be easy, for example, to manually choose a template and perhaps even provide a handcrafted alignment of template and query sequence. Moreover, while the manuscript talks about "folding" the RNA using cmalign, one should be aware that this only inserts base pairs that are part of the model, and thus regions without structure conservation will remain single stranded. Again it is not clear whether and how the user can use their own structure (be it predicted or measured), apart from building a new template.
The software itself comes with a number of examples, but needs more proper documentation beyond what can be found in the Readme file.

Reviewer #2:
Remarks to the Author: In this paper Sweeney and colleagues introduced R2DT, a tool for the template-based visualization of RNA structures. The tool is definitely an important resource, and I am impressed by the amount of work the authors carried out. I have a few improvements I would recommend to the authors to make their work more readily usable in the RNA community:
-Besides the included templates, I would suggest to include some additional templates that are for sure available, but that I seem to have missed in the authors-provided list, for example, RNases P and MRP
-It is not entirely clear how new templates can be created from scratch and included in the database for automatic selection
-An important feature would be the possibility to add annotations such as boxing around significantly covarying base-pairs, or structure probing reactivities. It is not clear whether these features can be readily annotated in an automatic fashion by R2DT
-What happens in case the tool does not find an optimal template for the provided RNA? Or whether the automatically-selected template only matches part of the provided RNA? Is it possible for the program to also "attempt" to plot the structure even in the absence of a template? I am thinking of well known algorithms such as the NAView, or the loop-resolution approach used by R2R.
Reviewer #3: Remarks to the Author: The authors describe in their manuscript "R2DT: computational framework for template-based RNA secondary structure visualisation across non-coding RNA types" a framework for the generation of consistent and stable RNA secondary structures and their drawings. In order to achieve this goal, first the RNA type of the sequence is detected. Then the RNA sequence is folded and drawn according to the template for this RNA type. The drawings shown in the manuscript are visually well readable and stable against slight changes in the sequence. The framework was tested against the entire database of RNAcentral and shows satisfactory results. Nevertheless, the manuscript is not ready for publication and several important issues need to be resolved, that are detailed next.
Major Issues: 1) The authors claim in their manuscript that no RNA drawing algorithm exists so far that produces stable and consistent drawings for similar RNAs. The paper "RNApuzzler: efficient outerplanar drawing of RNA-secondary structures" by Wiegreffe et al. [1] shows that their algorithm also produces stable and consistent RNA drawings even without the usage of templates. The authors demonstrate this in their manuscript also with SSU RNAs, like the authors of this manuscript. Therefore, I see the contribution of this manuscript in the usage of the created templates, which allow different RNA drawing styles depending on the RNA type, rather than in the stability of stable drawings which is provided by the aforementioned publications as well as by others.
2) The framework described by the authors is not a pure drawing algorithm. In addition, the RNAs to be drawn are also pre-folded using a folding template. This leads to as few deviations as possible from the desired layout and the drawing is as stable as possible. Although this procedure is legitimate and seems to make sense, it seems that most of the stability is due to the constraint folding of the RNAs and not to the subsequent drawing algorithm. Moreover, other drawing algorithms produce similar drawings with similar input data. I would appreciate a more detailed and precise discussion of this point in the manuscript.
3) Due to Concern 2) the title of the manuscript is also misleading. I recommend to change it.
4) The Related Work Chapter is by far the weakest part of the manuscript. The authors describe only very rudimentary the state-of-the-art in drawing RNA secondary structures and do not discuss other approaches for the comparative comparison of RNA structures. The statement that most drawing algorithms are based on force directed layouts is not correct. The cited tools and algorithms VARNA, RNAView, 3DNA, PseudoViewer, and R2R do not use FDL-based algorithms to the best of my knowledge. Moreover, the statement "None of these methods can produce useful diagrams for large RNA structures, such as the small and large subunit ribosomal RNAs (SSU and LSU, respectively" (line 59-60) is not correct as described in Issue number 1) before. Further, there is no in-depth discussion of the newer literature [1], [2] and a standard algorithm [3], which is frequently used, in this literature review. The state-of-the-art reports of Wiese et al. [4] and Ponty et al. [5] on the subject of RNA visualization are also not cited. Finally, the comparative representations of secondary structures by using dotplots [6][7][8] and arcplots [9][10] are entirely missing. Only the cited R2R follows a similar consensus approach as R2DT, but this is not discussed further in the manuscript by the authors. I recommend a comprehensive revision of the section.
5) According to the description, the used Traveler algorithm can modify the positions of individual bases in the sequence so that they fit better to the template. This procedure is questionable from a visualization point of view, because it also changes the interpretation of the drawings by the expert. I would recommend a more detailed discussion of this procedure.
6) The authors evaluate the algorithm for different characteristics in detail, but it is not clear how much computing time the algorithm needs. Own tests (Intel i7-9750H CPU, 64GB RAM, Docker Image) showed that the computing time is quite high due to the template matching. The provided website shows similar long computing times. I recommend the authors to analyze the computation time and compare it to those of other algorithms.
7) The presented intersection detection of Traveler is simple and does not always converge. There are intersections that cannot be corrected by rotating only Hairpin Loop segments. I would like to have a more detailed discussion of the limitations of this approach as well as the effects on the template representation in the manuscript.
8) The authors describe in their validation section, that a small part of RNA structures cannot be drawn with R2DT. Only further on in the methods section the authors describe the reason for this behavior, since the template cannot be identified uniquely. I recommend that this limitation of the framework is explained in the manuscript before the validation section, otherwise this limitation seems unclear until the methods section.
Minor Issues: 1) Figure 1: The image shown in d) was not generated by FORNA. Probably the visualizations in d) and e) are swapped.
2) Figure 2 appears redundant to the text and offers little information.
3) Line 179-180 "The single page LSU layouts enable R2DT to visualise the LSU 2D structures automatically, which has not been possible until now" With the new structure prediction, however, all other algorithms can also draw this structure as a single page layout. Therefore I would recommend to weaken this statement.
4) Unfortunately, the used RNAs do not have unique IDs in the manuscript, so that the drawings and foldings are difficult to reproduce. I recommend to provide unique IDs for the RNA sequences used.
5) The layout of the bibliography is inconsistent.
6) One has to build the Docker Image by oneself, because there is no further documentation on how to start the pre-built Docker Image.
Due to all these issues, I recommend an extensive revision of the manuscript based on my recommendations in a major revision.
[6] Charif, Delphine, and Jean R. Lobry. "SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis." Structural approaches to sequence evolution. Springer, Berlin, Heidelberg, 2007. 207-232.

Response to Reviewers
Reviewer #1 (Expertise: RNA structural prediction): The R2DT framework described here presents a highly automated way to draw the secondary structures of RNAs from known families based on pre-defined templates. Currently, there are various layout algorithms for secondary structures in use, which can yield drawings of the same RNA structure that appear completely different. Even when using the same layout algorithm drawings of closely related RNAs can be visually different. Template based drawing methods offer a solution to this problem, as they allow to define a standard layout for a family of RNAs, which in turn facilitates discussing structural properties of RNA.
The perhaps most impressive part of the work is the huge collection of templates provided with R2DT. The template library not only covers most RFam families, but also provides layouts for many subfamilies of ribosomal RNAs, including some based on measured tertiary structures complete with expansion elements. There is even the possibility of further community driven expansion of the template library, such that newly detected ncRNA families can now be published together with their standard layout.
The second important point is the template choice. For this purpose, each template is associated with a covariance model. The alignment of the query sequence to the CM is then used both to select the optimal template and to define the mapping between query and template. The procedure is state of the art and also represents an improvement on the original Traveler algorithm, which uses a tree alignment between the structures of the template and the query RNA.
The pipeline is fully automatic and thus ideal for use in web servers such as RNAcentral. A minor criticism remains, that it is unclear how the user can deviate from the automated layout. It should be easy, for example, to manually choose a template and perhaps even provide a handcrafted alignment of template and query sequence.
We agree with the Reviewer that in some cases it could be useful to be able to bypass the automatic template selection, so a new advanced option was implemented in the R2DT web server allowing the user to choose a template from a searchable dropdown list (see https://rnacentral.org/r2dt). The corresponding functionality is also available in the standalone software through two new command line options for listing all templates and specifying a template to be used for visualisation. The manuscript has been updated to describe this functionality and refer to the latest version of R2DT (v1.1).
Moreover, while the manuscript talks about "folding" the RNA using cmalign, one should be aware that this only inserts base pairs that are part of the model, and thus regions without structure conservation will remain single stranded.
This is indeed an important point that is addressed in the section "Automatic pipeline for template selection and 2D structure visualisation" which reads as follows: "It is important to note that R2DT does not attempt to fold the unstructured regions found in some templates or predict the structure of the insertions relative to the template."
Again it is not clear whether and how the user can use their own structure (be it predicted or measured), apart from building a new template.
As the Reviewer suggests, the recommended way to use a structure that is not part of the R2DT template library is to build a new template. We clarified this in the manuscript in the section "Community expansion of the 2D template library". This process is documented at https://github.com/RNAcentral/R2DT#how-to-add-new-templates.
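The behaviour discussed in this exchange, where only base pairs that are part of the model are transferred to the query, can be illustrated with a minimal Python sketch. It is not R2DT or Infernal code: the function name `project_structure` and the simplified input (one dot-bracket character per alignment column, `-` for a deleted position) are assumptions made for illustration, and pseudoknots are not handled.

```python
from typing import Dict, List, Tuple


def project_structure(consensus_ss: str, aligned_query: str) -> Tuple[str, str]:
    """Project a consensus dot-bracket structure onto an aligned query row.

    consensus_ss:  one dot-bracket character per alignment column
    aligned_query: the query row of the same alignment, '-' marks a deletion
    Returns the ungapped query sequence and its projected structure.
    """
    if len(consensus_ss) != len(aligned_query):
        raise ValueError("structure and alignment row must have equal length")

    # Map every paired column to its partner column using a stack.
    partner: Dict[int, int] = {}
    stack: List[int] = []
    for col, char in enumerate(consensus_ss):
        if char == "(":
            stack.append(col)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced consensus structure")
            opening = stack.pop()
            partner[opening] = col
            partner[col] = opening
    if stack:
        raise ValueError("unbalanced consensus structure")

    sequence: List[str] = []
    structure: List[str] = []
    for col, base in enumerate(aligned_query):
        if base == "-":
            continue  # deleted in the query: nothing to draw for this column
        mate = partner.get(col)
        if mate is not None and aligned_query[mate] != "-":
            structure.append(consensus_ss[col])  # both partners present
        else:
            structure.append(".")  # unpaired in the model, or partner deleted
        sequence.append(base)
    return "".join(sequence), "".join(structure)


print(project_structure("((...))", "G-AUAGC"))
```

In this toy alignment the second column is deleted in the query, so its partner is left unpaired, while positions that are unpaired in the consensus are never folded.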
The software itself comes with a number of examples, but needs more proper documentation beyond what can be found in the Readme file.
We significantly expanded the documentation with detailed installation and usage instructions, including a new option for manually selecting a template (see https://github.com/rnacentral/r2dt/).
Reviewer #2 (Expertise: RNA biology and structural prediction): In this paper Sweeney and colleagues introduced R2DT, a tool for the template-based visualization of RNA structures. The tool is definitely an important resource, and I am impressed by the amount of work the authors carried out. I have a few improvements I would recommend to the authors to make their work more readily usable in the RNA community:
-Besides the included templates, I would suggest to include some additional templates that are for sure available, but that I seem to have missed in the authors-provided list, for example, RNases P and MRP

-It is not entirely clear how new templates can be created from scratch and included in the database for automatic selection
The creation of new templates is described in the manuscript under the section entitled "Community expansion of the 2D template library" and is also documented on GitHub (https://github.com/RNAcentral/R2DT#how-to-add-new-templates). To summarise, a bespoke version of the XRNA software (https://github.com/LDWLab/XRNA-GT) can be used to import the R2DT-generated SVG files and adjust the 2D layouts (for example, by changing the orientation of RNA helices or editing base pairs). XRNA-GT can also export the files required for the creation of the R2DT templates. This is further supported by a newly added R2DT feature allowing users to manually select a closely-related template to get a draft template before adjusting it with XRNA-GT. The XRNA-GT workflow has been successfully used internally to produce the 3D-based SSU templates as well as the RNase P templates described above. In addition, the R2DT documentation contains example files in bpseq, fasta, and xml formats that can be converted to R2DT templates using example scripts.
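For readers unfamiliar with the bpseq format mentioned above, the following Python sketch shows how such a file encodes a structure. It is not one of the R2DT example scripts: the function name `parse_bpseq` is hypothetical, the input is assumed to be headerless (only blank lines and `#` comments are skipped), and pseudoknots are not supported because plain dot-bracket notation cannot represent them.

```python
from typing import List, Tuple


def parse_bpseq(text: str) -> Tuple[str, str]:
    """Convert bpseq text into (sequence, dot-bracket structure).

    Each data line holds three fields: the 1-based position, the base,
    and the 1-based position of the pairing partner (0 if unpaired).
    """
    bases: List[str] = []
    partners: List[int] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: no other header lines are present
        index, base, partner = line.split()
        if int(index) != len(bases) + 1:
            raise ValueError(f"unexpected position {index}")
        bases.append(base)
        partners.append(int(partner))

    structure: List[str] = []
    for position, partner in enumerate(partners, start=1):
        if partner == 0:
            structure.append(".")
            continue
        # A valid pair must be listed symmetrically on both lines.
        if not 1 <= partner <= len(partners) or partners[partner - 1] != position:
            raise ValueError(f"inconsistent pair at position {position}")
        structure.append("(" if partner > position else ")")
    return "".join(bases), "".join(structure)


example = """\
1 G 7
2 C 6
3 A 0
4 U 0
5 A 0
6 G 2
7 C 1
"""
print(parse_bpseq(example))
```

The seven-nucleotide example is a toy hairpin written for this sketch, not a sequence from the template library.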
-An important feature would be the possibility to add annotations such as boxing around significantly covarying base-pairs, or structure probing reactivities. It is not clear whether these features can be readily annotated in an automatic fashion by R2DT
We agree with the Reviewer that this would be a very useful feature. We plan to implement it in future versions and have updated the Discussion section accordingly. However, this functionality will take significant time to implement. As the Reviewer acknowledged previously, our work is an important resource for the community and this functionality will be added over time.
-What happens in case the tool does not find an optimal template for the provided RNA? Or whether the automatically-selected template only matches part of the provided RNA? Is it possible for the program to also "attempt" to plot the structure even in the absence of a template? I am thinking of well known algorithms such as the NAView, or the loop-resolution approach used by R2R.
Reviewer #3 (Expertise: RNA structural prediction): The authors describe in their manuscript "R2DT: computational framework for template-based RNA secondary structure visualisation across non-coding RNA types" a framework for the generation of consistent and stable RNA secondary structures and their drawings. In order to achieve this goal, first the RNA type of the sequence is detected. Then the RNA sequence is folded and drawn according to the template for this RNA type. The drawings shown in the manuscript are visually well readable and stable against slight changes in the sequence. The framework was tested against the entire database of RNAcentral and shows satisfactory results. Nevertheless, the manuscript is not ready for publication and several important issues need to be resolved, that are detailed next.
Major Issues: 1) The authors claim in their manuscript that no RNA drawing algorithm exists so far that produces stable and consistent drawings for similar RNAs. The paper "RNApuzzler: efficient outerplanar drawing of RNA-secondary structures" by Wiegreffe et al. [1] shows that their algorithm also produces stable and consistent RNA drawings even without the usage of templates. The authors demonstrate this in their manuscript also with SSU RNAs, like the authors of this manuscript. Therefore, I see the contribution of this manuscript in the usage of the created templates, which allow different RNA drawing styles depending on the RNA type, rather than in the stability of stable drawings which is provided by the aforementioned publications as well as by others.
We thank the Reviewer for the detailed feedback. While other methods may generate stable drawings, their output diagrams are not guaranteed to be as biologically meaningful. For example, the SSU and LSU rRNA diagrams produced without templates do not reflect the 3D architecture of the ribosome that is captured in the manually curated templates. This point is further illustrated in the following response.
2) The framework described by the authors is not a pure drawing algorithm. In addition, the RNAs to be drawn are also pre-folded using a folding template. This leads to as few deviations as possible from the desired layout and the drawing is as stable as possible. Although this procedure is legitimate and seems to make sense, it seems that most of the stability is due to the constraint folding of the RNAs and not to the subsequent drawing algorithm. Moreover, other drawing algorithms produce similar drawings with similar input data. I would appreciate a more detailed and precise discussion of this point in the manuscript.
Although other software could generate similar 2D diagrams following small modifications in the input sequence, this is not guaranteed. For example, consider the following diagrams generated by VARNA (v3.9). The sequence on the right is identical to the one on the left with the exception of an additional hairpin inserted on the 5′ end of the structure, which is sufficient to cause significant changes in the layout.
However, if the left structure is used as a template and the right one as a target by the Traveler software, the resulting layout preserves the orientation of the template while accommodating the additional helix.
The entire R2DT pipeline has been designed to ensure that the template structures are reproduced as faithfully as possible.
3) Due to concern 2) the title of the manuscript is also misleading. I recommend changing it.
As far as we understand, the Reviewer is concerned that the users may not be aware that R2DT not only draws a secondary structure image but also predicts the secondary structure of the input sequence to compare it with the template. However, we do not agree with the Reviewer's assessment that the title is misleading, as both the folding and drawing aspects of the software are captured in the title that defines R2DT's objective as "template-based RNA secondary structure visualisation." We changed the wording in the Discussion section to make it clear that the pipeline involves a secondary structure prediction step.
4) The Related Work Chapter is by far the weakest part of the manuscript. The authors describe only very rudimentary the state-of-the-art in drawing RNA secondary structures and do not discuss other approaches for the comparative comparison of RNA structures. The statement that most drawing algorithms are based on force directed layouts is not correct. The cited tools and algorithms VARNA, RNAView, 3DNA, PseudoViewer, and R2R do not use FDL-based algorithms to the best of my knowledge. Moreover, the statement "None of these methods can produce useful diagrams for large RNA structures, such as the small and large subunit ribosomal RNAs (SSU and LSU, respectively" (line 59-60) is not correct as described in Issue number 1) before. Further, there is no in-depth discussion of the newer literature [1], [2] and a standard algorithm [3], which is frequently used, in this literature review.
The state-of-the-art reports of Wiese et al. [4] and Ponty et al. [5] on the subject of RNA visualization are also not cited. Finally, the comparative representations of secondary structures by using dotplots [6][7][8] and arcplots [9][10] are entirely missing.
Only the cited R2R follows a similar consensus approach as R2DT, but this is not discussed further in the manuscript by the authors. I recommend a comprehensive revision of the section.
We would like to thank the Reviewer for this constructive feedback. The introduction has been updated following the Reviewer's suggestion. The reference numbering has changed throughout the entire manuscript.

5) According to the description, the used Traveler algorithm can modify the positions of individual bases in the sequence so that they fit better to the template. This procedure is questionable from a visualization point of view, because it also changes the interpretation of the drawings by the expert. I would recommend a more detailed discussion of this procedure.
[...] Figure 1 in the manuscript). [...] figure (Figure 6) showcasing Traveler's abilities and limitations with respect to overlaps.
8) The authors describe in their validation section, that a small part of RNA structures cannot be drawn with R2DT. Only further on in the methods section the authors describe the reason for this behavior, since the template cannot be identified uniquely. I recommend that this limitation of the framework is explained in the manuscript before the validation section, otherwise this limitation seems unclear until the methods section.
To address this in the manuscript, the following sentence has been added to the "Automatic pipeline for template selection and 2D structure visualisation" section: "If the sequence does not match any templates, the following steps are skipped, and no output files are generated."
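The control flow implied by this sentence can be sketched in a few lines of Python. This is an illustration only, not the R2DT implementation: the function names, the template identifiers, the use of a single bit-score threshold, and the output file name are all assumptions.

```python
from typing import Dict, List, Optional


def select_template(bit_scores: Dict[str, float], min_score: float) -> Optional[str]:
    """Return the best-scoring template, or None if no template passes.

    bit_scores maps a (hypothetical) template name to the score of the
    query against that template's covariance model.
    """
    passing = {name: score for name, score in bit_scores.items() if score >= min_score}
    if not passing:
        return None
    return max(passing, key=passing.get)


def visualise(bit_scores: Dict[str, float], min_score: float = 20.0) -> List[str]:
    """Return the names of the output files that would be produced."""
    template = select_template(bit_scores, min_score)
    if template is None:
        return []  # no matching template: later steps are skipped, no output
    return [f"{template}.svg"]  # hypothetical output naming


print(visualise({"template_a": 12.5, "template_b": 40.1}))
print(visualise({"template_a": 5.0}))
```

The second call returns an empty list, mirroring the case in which no diagram is generated for a sequence.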
Minor Issues: 1) Figure 1: The image shown in d) was not generated by FORNA. Probably the visualizations in d) and e) are swapped.
We thank the Reviewer for pointing out this mistake. The Figure 1 legend has been corrected.
2) Figure 2 appears redundant to the text and offers little information.
We agree with the Reviewer and have removed Figure 2. The numbering of the remaining figures has been updated accordingly.
3) Line 179-180 "The single page LSU layouts enable R2DT to visualise the LSU 2D structures automatically, which has not been possible until now" With the new structure prediction, however, all other algorithms can also draw this structure as a single page layout. Therefore I would recommend weakening this statement.
The sentence has been changed to: "The single page LSU layouts enable R2DT to visualise the LSU 2D structures in standard orientations completely automatically, which has not been possible until now."
4) Unfortunately, the used RNAs do not have unique IDs in the manuscript, so that the drawings and foldings are difficult to reproduce. I recommend providing unique IDs for the RNA sequences used.
We have added the RNAcentral unique sequence accessions to figure legends (Figures 1, 3, 4, 5) to unambiguously identify each sequence.
5) The layout of the bibliography is inconsistent.
The bibliography has been checked for consistency.
6) One has to build the Docker Image by oneself, because there is no further documentation on how to start the pre-built Docker Image.
We expanded the installation and usage instructions to provide detailed examples, including running R2DT using pre-built images from Docker Hub (please see https://github.com/RNAcentral/r2dt).
Due to all these issues, I recommend an extensive revision of the manuscript based on my recommendations in a major revision.