Modelling the three-dimensional structure of the right-terminal domain of pospiviroids

Viroids, the smallest know plant pathogens, consist solely of a circular, single-stranded, non-coding RNA. Thus for all of their biological functions, like replication, processing, and transport, they have to present sequence or structural features to exploit host proteins. Viroid binding protein 1 (Virp1) is indispensable for replication of pospiviroids, the largest genus of viroids, in a host plant as well as in protoplasts. Virp1 is known to bind at two sites in the terminal right (TR) domain of pospiviroids; each site consists of a purine- (R-) and a pyrimidine- (Y-)rich motif that are partially base-paired to each other. Here we model the important structural features of the domain and show that it contains an internal loop of two Y · Y cis Watson-Crick/Watson-Crick (cWW) pairs, an asymmetric internal loop including a cWW and a trans Watson/Hoogsteen pair, and a thermodynamically quite stable hairpin loop with several stacking interactions. These features are discussed in connection to the known biological functions of the TR domain.

. Virp1 (see Supplementary Fig. S1) is a member of the bromodomain(s) and extraterminal domain (BET) protein family including a nuclear localization signal. BET proteins are known to be involved in chromatin biology and transcriptional regulation 16,17 . The bromodomain recognizes acetyl-lysine residues in histones and other proteins 18,19 . The ET domain consists of three regions; the conserved N-terminal ET (NET) region has an acidic patch that may interact with other proteins or nucleic acids 20 . The RNA binding domain of Virp1 has been localized to the C-terminal half of the protein 9,12 .
The knowledge on the three-dimensional structure of viroids is sparse. At least for PSTVd the presence of a tertiary structure (in the sense of tRNA-L-conformation) was excluded with certainty 21 . The left terminal part of the TL domain was shown by nuclear magnetic resonance (NMR) spectroscopy to be rod-like but not bifurcated 22 ; that is, this study supported secondary structure predictions but delivered no three-dimensional details. Loop 6 ( Fig. 1a) has the sequence 5′ G 36 AC 3′/5′ C 322 GA 3′ flanked by a U:G wobble pair and a cis Watson-Crick/ Watson-Crick (cWW) G:C pair; according to structural modelling, the loop motif is a set of three stacked non-WC pairs 23 . This motif mediates trafficking from palisade to spongy mesophyll in N. benthamiana. Loop 7 (Fig. 1a) is required to traffic from nonvascular into the vascular tissue phloem to initiate systemic infection. According to modelling, the loop nucleotides U 43 · C 317 are a water-inserted cWW base pair flanked by canonical cWW base pairs 24 . Loop 15 of PSTVd (Fig. 1a) is similar to loop E (sarcin/ricin motif) of eukaryotic 5 S rRNA [25][26][27] . A detailed analysis of this loop, mainly based on isostericity matrices, allowed Zhong et al. 28 to design disruptive as well as compensatory mutations that are critical in rolling-circle replication [29][30][31] .
We created three-dimensional RNA models of the terminal right domain of several circular viroids, using a variety of recently published web services, programs and databases, for example FARFAR/Rosetta 32-34 , RNA Bricks 35 , FRAbase (RNA FRAgments search engine & dataBASE) 36 , based on sequence and two-dimensional structural similarity to known three-dimensional RNA structures. These programs were selected for use mainly based on their complementarity (assembly of 3-nt fragments plus refinement versus search for similar structural elements using sequence and structure patterns) from a palette of available tools [37][38][39][40][41][42] . The predicted models encompassed three loops: a symmetrical basepaired 2×2 loop, an asymmetrical internal loop, and a hairpin loop both stabilized by hydrogen bonding and stacking interactions. On the one hand this predicted structure should provide an extraordinary platform for protein and/or nucleic acid binding in trans, on the other this model should facilitate further experimental and structural studies of the TR interaction with Virp1 and TFIIIA.

Results
Secondary structure prediction of TR domain. To get an overview on available TR sequences from members of Pospiviroid and their secondary structures, we assembled alignments of full-length sequences for each species of Pospiviroid and extracted the TR domain. For example, all unique sequences of PSTVd were aligned and a consensus structure was predicted; Fig. 1b details the TR domain of the PSTVd consensus structure. The TR stem-loop consisted of six well-defined helices but the sequences of loops more distant from the hairpin loop were difficult to align, especially if sequences of further Pospiviroid members were included. This is of concern only to the Virp1 binding site of lower affinity (nts 149-153/202-206), while the binding site of higher affinity (nts 170-174/183-187) is located in the more easily alignable region of the terminal stem-loop (nts 161-198; Fig. 1c). We will concentrate in the following on this terminal stem-loop that consists of three helices separated by two internal loops (IL1 and IL2) and the hairpin loop (HP). The final alignment of terminal stem-loop sequences (see Supplementary Fig. S2), on which the consensus structure (Fig. 1c) and the structure logo ( Fig. 2) are based, contained 95 different sequences of pospiviroids.
The nucleotides in the two helices close to the HP are well conserved in sequence (marked by b and c in Fig. 2) but show some covariation marked by green and blue background in Fig. 1c. The sequence of the terminal HP loop is 5′ UUUC in most pospiviroids, with the exception of CEVd and IrVd that have 5′ CUCGW (W is A or U). The sequence of IL1 is 5′ GA/GC in CEVd and 5′ CW/UU in PCFVd, PSTVd, and most CLVd. The sequence of IL2 is 5′ C/UCC, conserved in most pospiviroids. This degree of conservation in sequence as well as in secondary structure of the TR stem-loop gave some confidence as a basis for modelling.
Results from Rosetta modelling. We predicted models of IL1 (Fig. 3), IL2 (Fig. 4), and HP (Fig. 5) within the TR stem-loop ( Fig. 1c) with the Rosetta web server (ROSIE/FARFAR/RNA De Novo). Fragment assembly for the full TR stem-loop showed only a limited convergence; with shorter structural elements, like the single loops, 1,000 generated structures and 10,000 MC cycles were usually sufficient to generate reasonable best-scoring structures. As input to the web server, we used the respective parts of the consensus structure (in bracket-dot notation) but sequences from individual viroid variants (for examples see Fig. 3, top row).
Internal loop 1 (IL1). The IL1 loop typical for PSTVd is a symmetric internal loop of 2×2 nucleotides (10× CC/ UU; Fig. 1b; Fig. 3a, top row). Other loop nucleotides also occur frequently (Fig. 2): 15× CU/UU (Fig. 3c); 2× UU/UU; a 1×1 loop of C/U followed by an A:U pair (13×; Fig. 3b); a G:C pair followed by 1×1 loop of A/G is typical for CEVd (19×). The loop is closed by G:C cWW pairs but also other closing pairs appear quite often. The loop nucleotides form standard cWW pairs in all models of these sequence variants (Fig. 3, middle row). Thus other cWW pairs should be able to substitute: U · C, G · A, A · G, and A · A are isosterically close to C · U; G · U, A · C, and C · C are isosterically close to U · U 43,44 . These isosteric pairs, however, do not occur in viroids. Only a restricted set of pair combinations are able to retain the same backbone geometry according to RNAredesign 45,46 ; this is obvious from the logos shown in Fig. 3 bottom. For example, CC/UU might be replaced by UU/CC or CU/UU by UU/UC. Thus, the viroid-specific combinations of nucleotides might be either essential for interaction with Virp1, or the reduced stability of IL1-in comparison to an IL1 composed of purines with their increased stacking-is critical.  Fig. 1c; missing numbers denote that the respective column of the alignment contains mostly gaps. The RY motif critical for binding of Virp1 is outlined and marked by R and Y. The sequences of internal loops are marked by IL1 and IL2, respectively, and the hairpin loop sequence by HP.  The stability of such tandem "mismatches" is partially known 47,48 : the tandem U · U stabilizes a helix when flanked by G · C pairs, as in case of viroids, but is less stable than symmetric tandem G · U or A · G pairs. The pyrimidine base-pairs led to a contraction of the sugar phosphate backbone of the opposing strands in comparison to standard cWW pairs; this might be of importance for protein binding.
Internal loop 2 (IL2). Predicted IL2 loop nucleotides were highly connected by a complex network of base/ base hydrogen bonds and stacking interactions (Fig. 4a). An example of these interactions is shown in Fig. 4b: C 187 · A 172 :U 185 form a triple pair consisting of a cWW A:U and a trans Watson/Sugar edge (tWS) C · A.
According to RNAredesign, U 186 and C 187 are critical for the backbone geometry of the loop (Fig. 4c). The cWW pair C 171 · C 188 is replaced by a U · U in 22% of structures, which is the only near-isosteric pair to C · C 44 .
Hairpin loop (HP). The most common HP loop is UUUC closed by an U:A pair (Fig. 5a); the HP loop of CEVd and IrVd is CUCGW closed by a G:C pair (see Supplementary Fig. S2). The predicted HP loop structures were characterized by internal interactions of loop nucleotides. For the most common HP loop, Rosetta predicted a conformation with a trans Sugar/Hoogsteen (tSH) pair U 177 :C 180 next to the loop-closing cWW U:A pair and stacking of the two last loop nucleotides (Fig. 5a). The exceptional HP loop of a TCDVd (Fig. 5b) was very similar with a tSH pair U 177 :C 180 next to the loop-closing A:U pair and stacking of the two last loop nucleotides. For the HP loop of CEVd, Rosetta predicted a quite similar structure, despite the five loop nucleotides instead of the tetra-loops in most other cases: the fifth nucleotide of the loop was bulged out (see Supplementary Fig. S3b).
RNAredesign suggested quite different sequences for all loop models shown in Fig. 5 and S3: the optimized sequences contained more purines than the natural sequences, which might point on the one hand to increased thermodynamic stability of the optimized sequences due to increased stacking interactions, and on the other to functional constraints of the natural sequences. Such constraints might be an adaptation to stability requirements and restrictions induced by binding to other molecules. Structural similarities in RNABricks and FRAbase. The Rosetta models, as depicted above, are based on assembly of short fragments from known structures. These fragments might include nucleotides that interact with additional ligands like a protein or another RNA. That is, "unusual" conformations of nucleotides in the final models might be due to absent ligands. In contrast, if a larger fragment of the target is found in a known (template) structure including ligands, one might get insight on the basis of such "unusual" conformations and requirements for interactions. So we searched for structural elements similar to the terminal stem-loop in RNABricks 35 and FRAbase 36 .
IL1. An internal loop with a 5′ CU/5′ UU motif, frequent in pospiviroids, is described as a non-canonical tandem base-pair (i. e. two cWW pairs) in the 3′-UTR of poliovirus-like enteroviruses 49 . Remarkably, this loop is conserved but replacement of the C · U and U · U pairs by a C · G and U · A pairs, respectively, had no effect on virus viability and growth in cell culture. This is in contrast with viroids: replacement of IL1 by two A · U pairs diminished replication efficiency and abolished systemic infection 50 (see below and Supplementary Table S4).
The intron 1 of human cellular nucleic acid-binding protein (also named zinc finger protein 9) contains a (CCUG) n repeat; expansion of this repeat causes myotonic dystrophy type 2 51 . The repeat folds into a stem-loop with 5′ CU/5′ UC tandem cWW pairs flanked by GC pairs that binds and inactivates a splicing regulator 52 . This symmetric tandem mismatch, however, occurs only in CLVd JF446928.
IL2. RNA Bricks found an internal loop with some similarity to IL2 in PDB structures of 16S ribosomal RNA; for an example see Fig. 6a. The internal loop has the sequence C 934 / 1381 UCC 1383 with a 3′ closing A:U pair as in PSTVd (Fig. 4) but with a 5′ G:C pair. C 934 is bulged out from the motif and makes a tWW pair with A 938 :U 1345 (see motif IL_40845.1 in the RNA 3D Motif Atlas) 53 . C 1382 and C 1383 pair with C 936 and A 935 in cWS conformation, respectively. U 1381 pairs with G 1379 in cHS conformation as annotated by x3dna-dssr; rnaview strangely determines this interaction as a cSW pair. A protein contact seems not relevant for this motif. The sequence of this loop is highly conserved according to an alignment of bacterial 16S ribosomal RNAs 54 ; a corresponding logo is shown in Fig. 6b. The most variant loop nucleotide obviously is C 1383 (Fig. 6b and c) that is replaced by an U in ~3% of sequences; this replacement did not occur in IL2.

HP.
Hairpin loops with the four nucleotides and closing base pair common in PSTVd's HP (Fig. 5a) are present in the histone mRNA stem-loop (PDB entries 1JU7 55 and 1KKS 56 ), which binds to the stem-loop binding protein (SLBP) and the histone mRNA 3′-exonuclease 1 (Exo) for 3′ end processing. These hairpin structures-without corresponding proteins-were determined by NMR spectroscopy. An overlay of both hairpin loops is shown in Fig. 7a; for this overlay, the terminal C 1 :G 8 pairs were matched in chimera. Note the stackings of U 2 , U 3 and U 4 , and of U 5 on A 7 in 1JU7 (red), while U 3 , U 5 and A 7 stack in 1KKS (cyan). In 1JU7 the C 6 is bulged out to the major groove, whereas in 1KKS the C 6 is bulged out to the minor groove. According to FRAbase, further PDB entries (1ZBH, Cheng & Patel, unpublished; 4L8R and 4QOZ) 57 are similar to HP; each of these contains two of the histone stem-loops bound by protein (for example see Fig. 7b). These loop structures are very close, but not identical to 1JU7 and 1KKS, respectively, due to the interactions with protein, like stacking of Tyr 144 (Exo)/U 3 and His 195 (Exo)/C 6 and numerous hydrogen bondings between protein side chains and RNA backbone and bases. The sequence of the histone mRNA stem-loop is highly conserved (Fig. 7c,d), which might be forced by interactions with the proteins. The most variable position is C 6 (24% A, 68% C, 1% G, 7% U) as in HP (C 189 ; Fig. 2). The differences between the sequence logos of the histone mRNA stem-loop and HP might be due to additional requirements of HP like stability in the (−) strand, but might also point to a clear biological difference (see Discussion).
We further searched for HP-loop variants from other viroids. The PDB entry 4V7E 58 , a model of the Triticum aestivum 80S ribosome based on cryo-electron microscopy with resolution of <5 Å, contains in the 28S rRNA a hairpin loop with sequence 767 UUCU 770 closed by a C:G pair instead of the A:U pair in a TCDVd (ID GQ169709). The first and last loop nucleotide stabilize the loop by a cWW interaction, and U 769 and U 770 stack on each other (see Supplementary Fig. S5); both interactions might be independent from the loop-closing pair. The PDB entries 3J3V and 3J3W 59 , which contain structures of the Bacillus subtilis 50S ribosome subunit based on cryo-electron microscopy and usage of ribosome structures of Escherichia coli and Thermus thermophilus as templates, show in the 23S rRNA a hairpin loop, similar to that of CEVd (see Supplementary Fig. S3b), that is involved in a kissing loop interaction (Fig. 8). Most interactions in the rRNA loop are due to the kissing interactions; however, the last loop nucleotide, in the rRNA as well as in the predicted CEVd loop, makes a cSH interaction with the 3′ nucleotide of the loop-closing basepair.  mentioned TCDVd variant (Fig. 5b), and an additional U in the HP loop. The original variant R+, which was never recovered, and a further evolved variant (2 in see Supplementary Table S2) might be less viable due to a possible structural rearrangement influencing IL2 and the RY motif and thus did not point to critical loop nucleotides. The loop-closing pair and the loop size, however, seemed uncritical for replication and trafficking.

Synthetic mutants in the TR domain. A synthetic mutant of HP and its progeny.
Binding affinity and infectivity of IL2 mutants. Gozmanova et al. 12 analyzed the binding of a Virp1 fragment to mutants of stem-loops, which were similar to the hairpin depicted in Fig. 1b. That is, their TR fragment contained both RY motifs, but only the terminal motif (as in Fig. 1c) was mutated. Infectivity of longer-than-unit-length viroid transcripts and relative binding affinity of stem-loops are given in Supplementary Table S3. From these data and further biophysical experiments, the authors concluded that the binding to the terminal RY motif was about fivefold stronger than to the internal motif.
Rosetta modelling of the synthetic IL2 mutants, which include single nucleotide replacements or a reverse complement of the loop sequence, predicted that these mutants possess at least one non-WC interaction different from but also additional to WT. Both modified interaction types might lead to an increased stability, rigidity and/ or bending of the mutant loops, explaining the reduced binding of mutants to Virp1.

Inhibition of replication and trafficking by mutations in the TR domain.
Zhong et al. 50 analyzed PSTVd loop motifs with respect to replication in protoplasts and to (replication plus) trafficking in plants. A replacement of the cWW pairs of IL1 by standard A · U WC pairs reduced replication efficiency to 29% of WT and abolished systemic infection. This experimental result fits to the findings from Rosetta modelling: either the lower thermodynamic stability of the WT loop is critical for replication competence, or the cWW loop pairs-in comparison to a loop without pairs-"protect" against endoribonucleases, or the backbone contraction in the WT loop is critical for protein binding. A replacement of the IL2 loop by three standard WC pairs reduced replication efficiency to 46% of WT and abolished systemic infection. A mutation of UU 176 → AA, which opens at least the HP-closing U · A pair, reduced replication efficiency to 53% of WT and abolished systemic infection. An additional mutation, which occurred in one plant, restored the loop-closing pair by an A · U pair that allowed this PSTVd mutant for systemic infection. This analysis points to a lower importance of loops in the TR domain for replication than loops in the TL and central domain but shows that these are most critical for trafficking 50 .

Discussion
In the following we will discuss selected features from the predictions and compare several results from Rosetta modelling with findings from the structure databases. Figure 1b only shows the terminal Virp1 binding site with pairing of the R to Y motif, whereas the second, internal binding site is masked by different pairings. A structure with both Virp1 binding sites 12 , is inferior to the one shown here by about 10 kJ/mol (mean ΔΔG 37°C of 234 PSTVd variants) or a ratio of both structure ensembles (with two versus one perfect binding site) of about 50. This fits roughly to the different binding of Virp1 to both sites 12 . However, only five of the 234 PSTVd variants miss the internal RY motif (ACs HQ452399-401, HQ452411, HQ452412) 61 , which points to the in vivo importance of a twofold RY motif. IL1 is closed by two cWW pairs. The majority of IL1 is a 2 × 2 C · U forming cWW pairs; alternatives are a single cWW plus a standard WC pair (for example 5′ CA/UU 3′, 5′ GA/GC 3′). These interactions fit to low BzCN reactivity of IL1 62 . A further "loop" stabilization by two A · U pairs with standard A-RNA geometry is prohibitive for viroid replication and abolishes trafficking 50 . Base-pairing in the asymmetric IL2 loop. According to the Rosetta model, only U 186 is bulged out while the three other nucleotides are paired. In contrast, the C 171 is bulged out in the 16S rRNA loop and makes contact to a distant base pair. In both cases, the logos for IL2 (Fig. 4c) and for the rRNA loop ( Fig. 6b and c) do not point to compatible mutations of each motif. This fits to the higher degree of sequence conservation of IL2 than of IL1.

Relevance of the two binding sites in TR.
The HP structure allows for interaction despite its stability. According to DeJong et al. 55 , the "stacking of loop nucleotides [in the histone mRNA stem-loop that is quite similar to HP] has the net effect of extending the helix by an additional base pair step", which accounts for the relatively high thermodynamic stability of HP. However, the mRNA hairpin still allows for interaction with an RNA binding protein (Fig. 7) or for a kissing-loop interaction (Fig. 8).
In our opinion, however, the similarity between HP and histone mRNA stem-loop should not be taken as a hint towards biologically relevant interactions of viroid RNA with SLBP and Exo: (i) sequences similar to SLBP are not found by BLAST in higher plants; (ii) sequences similar to Exo are present in plant genomes, but these miss the SAP domain responsible for RNA binding; and (iii) histone 3′ end processing is believed to be restricted to metazoa and green algae 63,64 . Thus the detected similarity might be a pure consequence of the recurrence of evolutionary unrelated RNA motifs 65 .
This work was inspired by the numerous tools for 3D RNA structure prediction that all promise to be manageable by non-experts. Indeed, our results point to an overall TR structure with a nice combination of rigidity and flexibility allowing the TR for binding of proteins. We have, however, not found a reason for the proposed TFIIIA binding to the TR region. Taken together, the predicted structure is encouraging for further exploration by mutational analysis and experimental structure determination.

Methods
Viroid sequences and consensus structures. For each of the 10 species of genus Pospiviroid 4 a single sequence was used to search with nucleotide BLAST for somewhat similar sequences in GenBank. For sequences from each species, a preliminary alignment was produced with MAFFT/G-INS-i 66 . According to these alignments, partial sequences were removed and the remaining sequences were adjusted for same strand orientation and similar start/end points. After a second alignment, redundant sequences were removed; that is, sequences from each genus had to differ from each other by at least one mutation. The final alignments were optimized in ConStruct 67 at 37 °C, excluding lonely base pairs, and drawn with R2R 68 . An example is the consensus structure of PSTVd (Fig. 1a) which was based on 234 unique sequences. From the PSTVd alignment, the TR domain was cut out, redundant sequences were removed, aligned with MAFFT/X-INS-i 69 , and optimized in ConStruct (Fig. 1b). From all individual alignments, the range of the terminal stem-loop of three helices was cut out, pooled, redundant sequences were removed, and aligned with MAFFT/X-INS-i. This alignment is shown in Supplementary  Fig. S2, the corresponding R2R drawing in Fig. 1c. Rosetta modelling. We used the Rosetta web server (ROSIE (Rosetta Online Server that Includes Everyone)/FARFAR (Fragment Assembly of RNA with Full Atom Refinement)/RNA De Novo) to predict 3D structures 33 . FARFAR 34 assembles RNA fragments with length ≤3 nucleotides known from x-ray structures; these fragment sequences match partial sequences of the target RNA. The assembly is a Monte Carlo (MC) process guided by a knowledge-based energy function. Resulting structure models are then refined in an all-atom potential that is thought to discriminate native-like from non-native conformations. We used the following options: full run; variation of bond lengths and angles; optimization after fragment assembly; allowance for bulges; usage of updated (2012) force field; 1,000 structures; 10,000 MC cycles. From the predicted 3D structure(s), we extracted information on types of base pairs and further interactions by rnaview 70 and x3dna-dssr 71 .
RNAredesign 45 samples the possible sequence space that stabilizes a given 3D structure with a fixed structure backbone.
Database searches. The databases RNABricks 35 and FRAbase 36 store short RNA motifs extracted from experimentally determined RNA 3D structures. Both databases allow a user to search for sequences similar to a query. Additionally, RNABricks includes a 3D search engine; as input we used 3D models predicted by ROSIE.