Structural determinants of the SINE B2 element embedded in the long non-coding RNA activator of translation AS Uchl1

Pervasive transcription of mammalian genomes leads to a previously underestimated level of complexity in gene regulatory networks. Recently, we have identified a new functional class of natural and synthetic antisense long non-coding RNAs (lncRNA) that increases translation of partially overlapping sense mRNAs. These molecules were named SINEUPs, as they require an embedded inverted SINE B2 element for their UP-regulation of translation. Mouse AS Uchl1 is the representative member of natural SINEUPs. It was originally discovered for its role in increasing translation of Uchl1 mRNA, a gene associated with neurodegenerative diseases. Here we present the secondary structure of the SINE B2 Transposable Element (TE) embedded in AS Uchl1. We find that specific structural regions, containing a short hairpin, are required for the ability of AS Uchl1 RNA to increase translation of its target mRNA. We also provide a high-resolution structure of the relevant hairpin, based on NMR observables. Our results highlight the importance of structural determinants in embedded TEs for their activity as functional domains in lncRNAs.

Transposable elements (TEs) are mobile repetitive sequences that represent about 50% of the mammalian genomes. Previously referred to as genomic "junk", it is now increasingly acknowledged that TEs contribute to a wide range of biological processes, ultimately promoting genome evolution through rearrangements of genome structure and function. Specific TEs can be regulatory DNA elements acting as promoters or enhancers 1-3 as well as platforms to recruit transcription factors and chromatin remodelling complexes [4][5][6] . They can be natural sources of regulatory sequences, co-opted to rewire gene regulatory networks 7 . In physiological conditions, TE mobilization has been implicated in the evolution of embryonic stem cells 8 and placenta 9 , in pluripotency maintenance 10,11 , in the development of the mouse neocortex 12,13 and in immune responses triggered by proinflammatory interferon-γ 7 . In addition, TEs can be "exonized" in protein-coding transcripts and function as regulatory RNA domains, contributing to alternative splicing 14 or originating miRNAs 15,16 . Recently, a number of studies have highlighted a functional role of TEs when embedded in long non-coding RNAs (lncRNAs). lncRNAs are defined as transcripts >200 nt in length with no coding capability. lncRNAs have been implicated in a variety of biological functions, regulating gene expression at many levels 17,18 . Based on updated catalogues of annotated lncRNAs 19 , it is estimated that more than 80% of human and 66% of mouse lncRNAs contain at least one exonized TE. As a consequence, 40% of human and 33% of mouse lncRNA sequences derive from TEs [20][21][22] . Selective families of TEs can be found enriched or depleted in sub-classes of lncRNAs in the two species, arguing for an active role of TEs in evolution and function of lncRNAs 20,22,23 . In this context, exonized (or embedded) TEs would represent the molecular basis for domain organization in lncRNAs 24,25 . However, direct evidence for domain functionality of embedded TEs is still limited to few examples. Human Alu elements embedded in a group of lncRNAs,

Results
Secondary structure of the invSINEB2 TE embedded in AS Uchl1 RNA. A 183 nt construct (invSINEB2/183) corresponding to the invSINEB2 element of AS Uchl1 was in vitro transcribed from a plasmid and prepared for secondary structure determination using chemical footprinting. DMS (dimethyl sulfate) and CMCT (1-cyclohexyl-(2-morpholinoethyl)carbodiimide metho-p-toluene sulfonate) were used as methylating agents. DMS preferentially methylates positions N1 and N3 of adenines and cytosines, respectively, while CMCT methylates position N3 of uridines and to a lesser extent N1 of guanines. The level of methylation is directly related to the accessibility of potential modification sites to solvent. Therefore, hydrogen bonded nucleotides are not methylated, while non-hydrogen bonded are. The methylation sites were analysed by reverse transcribing RNA into cDNA starting from a fluorescently labelled DNA primer. The DNA oligos were analysed on large sequencing gels and visualized on a densitometer ( Figure S1). Data from footprinting studies has been used as an input for restrained mFOLD secondary structure prediction 32 . It is noteworthy that data from either of the methylating agents was sufficient for an unambiguous secondary structure determination.
The invSINEB2/183 RNA folds into a structure with mostly helical secondary structure elements (Fig. 1). It exhibits several bulges, asymmetric internal loops and hairpins. Consisting of nucleotides 5-7 and 168-171, the internal loop (IL1) could not be directly probed with DMS or CMCT due to hybridization of the fluorescent DNA primer to the 3′ of the RNA. This internal loop is followed by a helical region, which contains three single nucleotide bulges, with nucleotides G154, A157 and U160 showing weak reactivity with methylating agents. The asymmetric internal loop (IL2) is comprised of nucleotides 23-25 and 144-149. Data suggests that G25:U144 base pair is not formed according to chemical footprinting as U144 is reactive with CMCT, while U143 remains protected. This is to be expected, due to the relatively low stability of G:U base pairs, especially at the termini of helical regions. The invSINEB2/183 construct features two more internal loops, which branch out into short hairpins. Comprised of nucleotides 37-41 and 123-132, the internal loop (IL3) is branched into a short stem-tetraloop element (SL3). Similarly, nucleotides 53-63 and 93-111 form a larger internal loop (IL4) with a stem-octaloop motif (SL2). The terminal hairpin (SL1) includes nucleotides 64-92 and exhibits a G/C rich stem with an A/U rich loop region. All stem nucleotides up to C64 and G92 are protected from methylation, including U66:U90 mismatch nucleotides. On the other hand, loop nucleotides G77, U78 and G79 are all susceptible to methylation by CMCT. Importantly, A80 can be methylated by DMS while A81 exhibits very weak reactivity. Partial solvent access suggests that the two A:U base pairs are involved in an equilibrium between opened and closed states.
The terminal SL1 hairpin contributes to AS Uchl1 ability to increase UchL1 protein levels. We have previously shown that the embedded invSINEB2 element acts as ED in natural and synthetic SINEUPs 31 . Deletion of invSINEB2 sequence, but not of the embedded Alu repeat, abolishes UchL1 protein up-regulation mediated by AS Uchl1 in mouse neuroblastoma cell lines 28 . Here we investigated whether secondary structure components of invSINEB2 affect the function of AS Uchl1 RNA. We focused our attention on the terminal SL1 since chemical footprinting suggests it is a stable secondary structure element within invSINEB2. Furthermore, as terminal hairpin, it should be easily accessible to RNA-protein interactions. We disrupted the terminal hairpin structure by deleting nucleotides 68-77 of invSINEB2 (ΔSL1) from the full length AS Uchl1 (ΔSL1 mutant). According to mFOLD structure prediction, this deletion would not affect the stable helical regions, but rearrangements in IL4 are expected. To investigate invSINEB2-ΔSL1 activity when embedded in full length AS Uchl1, we took advantage of murine neuroblastoma Neuro2a cells, as they express Uchl1 mRNA but do not contain detectable levels of endogenous AS Uchl1. AS Uchl1 activity was defined as UchL1 protein increase in the presence of unchanged mRNA levels, as quantified by western blotting and qRT-PCR, respectively. AS Uchl1 caused an ~1.9-fold increase in UchL1 protein levels while maintaining stable Uchl1 mRNA levels, as expected for a post-transcriptional regulatory mechanism. Interestingly, the ΔSL1 deletion mutant abolished the ability of AS Uchl1 RNA to up-regulate UchL1 protein levels ( Fig. 2A). Indeed, UchL1 amounts were comparable in cells transfected with ΔSL1 mutant and in control samples, while differing in a statistically significant manner from cells with full length AS Uchl1 (Fig. 2B).
Taken together, these data indicate that the terminal SL1 hairpin of embedded invSINEB2 is a structural determinant required for AS Uchl1 ability to increase protein levels as synthesized from its target mRNA.
Solution state NMR reveals a triloop structure. To further elucidate the structure of SL1 NMR spectroscopy in solution has been utilized. The full length invSINEB2/183 element and the ΔSL1 construct did not give resolved NMR spectra (Fig. S2) due to their high molecular weights (>55 kDa). Therefore, a 38 nt RNA fragment (invSINEB2/38) corresponding to nucleotides 59-96 of the invSINEB2/183 has been synthesized for NMR studies. Analysis of 1D 1 H spectra revealed that a single species is formed in solution (Fig. 3A). Although experiments were carried out at 25 °C it is noteworthy that the structure is stable also at the physiological temperature of 37 °C. With the use of a natural abundance and a uniformly 13 C, 15 N-labeled RNA most imino, amino, aromatic and anomeric resonances could be assigned with a set of triple resonance NMR experiments. A sequential walk could be traced in NOESY spectra of the 38 nt construct ( Figure S3) although some cross-peaks could not be directly observed due to signal overlap ( Figure S4). Perusal of 15 N HSQC NMR spectra (Fig. 3B) revealed that the secondary structure of the invSINEB2/38 RNA is in agreement with the structure of this fragment within the entire invSINEB2/183 RNA. The RNA folds into a stable hairpin structure, with five and four nucleotide overhangs on 5′ and 3′ ends, respectively (Fig. 3C).
The stem region with observable imino protons extends from nucleotides C64-U75 and A81-G92 and is comprised of seven G:C, three A:U and one G:U base pairs with an additional U:U mismatch. The 15 N HSQC spectrum acquired at 25 °C contains the signals of imino protons G70-G74 comprising the core of the stem (Fig. 3B). Lowering the temperature to 0 °C revealed additional signals for base paired nucleotides, whose imino protons are to a lesser extent protected from exchange with solvent. These include the terminal U75:A81 base pair and the region from C64:G92 to U69:A87 including the G68:U88 base pair. Interestingly, no signal could be observed for U76 imino protons, which would indicate that U76:A80 base pair is dynamic or not formed at all. On the other hand, U75 exhibits a narrow imino proton resonance at 0 °C, suggesting that base pair U75:A81 is stable. Furthermore, imino proton signals of G68, U88 and G89 are broad even at 0 °C indicating that base pairs G68:U88 and C67:G89 are relatively weak, which is most likely due to the destabilizing effect of the adjacent U66:U90 mismatch for which no imino resonances can be observed.
2D COSY and TOCSY spectra exhibit no intense H1′/H2′ cross-peaks indicating that sugar puckers of all nucleotides are C3′-endo. Intensities of intra-nucleotide H6,H8/H1′ NOESY cross-peaks suggest that all nucleotides in the stem region are in anti conformation. However, due to inconclusive analysis of H6,H8/H1′ cross-peak intensities in the loop region, MD simulations did not include any χ angle restraints for nucleotides U75-A81.
A high resolution structure of the hairpin RNA without the 5′ and 3′ overhangs (29 nt) has been derived on the basis of NMR observables and restrained molecular dynamics simulations. A set of 100 structures was calculated using a simulated annealing protocol from which 10 of the lowest energy structures were selected and subjected to subsequent energy minimization. The hairpin adopts an A-type helical stem, which is terminated by a triloop (Fig. 4). The final set of 10 structures exhibits a pairwise heavy atom RMSD of 0.7 Å ( Table 1). The stem region nucleotides (C64-U75 and A81-G92) give an average RMSD value of 0.5 Å, while the loop nucleotides (G77-G79) exhibit a higher average RMSD of 0.7 Å. The stem region exhibits average rise and twist parameters of 3.0 Å and 30.3 Å, respectively. In comparison, the rise and twist parameters in standard A-type RNA are 2.8 Å and   32.8 Å, respectively. Due to the lack of observable imino proton signals for U76, hydrogen bond restraints for the U76:A80 base pair were not included in the calculations. In addition, all backbone dihedral angle restraints were omitted for the loop and two adjacent A:U base pair nucleotides (U75-A81).
The resulting structure of the simulated annealing protocol shows that the U76:A80 base pair is actually formed. However, the lack of an observable U76 imino resonance suggests increased exchange of the imino proton. Similarly, the U66:U90 mismatch in the final MD structures is nearly coplanar with two hydrogen bonds (N3-H3…O2 and O4…H3-N3), which however do not give observable imino 1 H resonances suggesting a dynamic nature of the two nucleotides. Interestingly, the arrangement of the loop nucleobases GUG clusters into one of two possible energy minima in the final ensemble of structures. In one family of structures G79 exhibits a χ angle of around 100° and is nearly coplanar with G77 (Fig. 4C), while in the second family of structures G79 is flipped and exhibits a χ angle of around 45° (Fig. 4D). In both cases G79 is in syn conformation and its N7 atom forms a hydrogen bond with G77's amino group. No stacking interactions between G79 and A80 could be observed in either of the energy minima. On the other hand, all structures exhibit efficient stacking of G77 on U76 and U78 on G77. Analysis of 100 molecular dynamics simulations reveals nearly equivalent relative populations of the two loop structures ( Fig. 4C and D).

MD simulation supports NMR results.
A 400 ns MD simulation was performed with model 1 of the refined structures as a starting conformation. It has been divided into two equal parts which were analysed separately to validate the convergence of the simulation. Since both parts provided similar qualitative data, only results from the second part (200-400 ns) are shown here. The ensemble averages NOE i of the simulation were calculated as = NOE r 1/ i i 6 . For 7 NOEs, the ensemble average was not satisfied, meaning that the average distance between the corresponding hydrogen atoms was too large. 5 out of these 7 NOEs coincide with the ones that were not satisfied in the NMR-refined structures, showing an accumulation of violations in the triloop and its direct vicinity.
The snapshots in the simulation could be reweighted such that the weighted ensemble averages satisfied all NOEs. From the 20 highest weighted structures, a set of 5 structures was chosen, in which every NOE is satisfied in at least one structure. This set was selected by i. taking the highest weighted structure and ii. including from the next highest structures the ones which reduce the number of unsatisfied NOEs. This set of structures can be found in the Supplementary Material.
The amount in which imino protons are participating in hydrogen bonds has been calculated for each donor as a fraction of the reweighted ensemble (Table 2). In agreement with the HSQC spectra acquired at 25 °C, the imino protons in the stem region from G70-G74 are entirely bound in hydrogen bonds (stability 0.98-0.99). All pairs corresponding to the spectra acquired at 0 °C had lower stabilities in the simulation. For the G68:U88 pair, which showed a weak signal in the spectrum, the hydrogen bond involving G68 as a donor had a stability of 0.90, while the one with U88 as a donor had a stability of 0.76. For the U66:U90 pair with MD values of 0.68 (U66 as donor) and 0.41 (U90 as donor) no NMR signal could be observed. The hydrogen bonds of the adjacent Watson-Crick pairs (donors U69, G89, G91 and G92) have stabilities between 0.80 and 0.95, with higher values for the pairs further away from the U:U pair, which is also in agreement with the experimental data. For the base pair adjacent to the triloop, for which no NMR signal was observed, the hydrogen bond U76:A80 was formed with a stability of 0.82. In this A:U base pair the accessibility to solvent seems the prevailing factor responsible for the absence of signals in the HSQC spectrum.
The average annotations for pairs of bases were calculated from the reweighted ensemble and are shown in Supplementary Material Figure S5. They agree qualitatively with the analysis of the imino protons. The base pairs next to the triloop, starting from pair 76U-80A, show stable interactions on their respective Watson-Crick edge. However, the two base pairs immediately next to the loop have a significant fraction of states that do not match a canonical Watson-Crick pair annotation (labelled as WW in Figure S5). A similar effect is observed for the base pairs next to the U66:U90 mismatch. The upward stackings between neighbouring bases starting from G73-G74 until G77-U78 are very stable (0.9-1.0). Stacking interactions between bases U78-G79 (downward, 0.6) and G79-A80 (outward, 0.3) are significantly less stable, indicating that the loop is highly dynamic. This is in qualitative agreement with the fact that base G79 is arranged differently in the different models obtained from annealing. The next upward stackings from A80-A81 to A87-U88 are again very stable (0.8 to 1.0).

PDB mining for similar structures.
The entire PDB was searched for fragments of NMR models in solution with some degree of structural similarity to a template built using the apical 13 nucleotides (U72-A84) of model 1. All the structures with an εRMSD <1.0 are reported in Table 3. The closest match is 2N4L, an intronic splicing silencer with a AGUGA pentaloop and 46% sequence identity with the template. One structure out of the 8 in the list is from a tRNA anticodon stem loop, although the similarity to the template should be probably ascribed to the presence of a loop with an odd number of bases. It is noteworthy that the PDB does not contain any solution structure of a tRNAHis that would have the same sequence in the triloop (GUG).

Discussion
The mouse genome contains approximately 350,000 SINE B2 sequences, dispersed throughout the genome, as independent transcriptional units or embedded in longer RNA polymerase II transcripts 33 . As independently transcribed RNAs, SINEs have been shown to possess important biological functions. SINEB2 and Alu suppress mRNA transcription upon heat shock by direct interaction with RNA polymerase II [34][35][36] . BC1 and SINE elements are involved in RNA localization and in the regulation of translation in neurons 37 . More recently, "exonized" or "embedded" SINE elements have been hypothesized to act as portable domains in lncRNAs, thus contributing to their biological functions 24,31,38 . A major issue to the "embedded domain" hypothesis is represented by the poor sequence conservation of lncRNAs during evolution. It is becoming evident that the sequences with higher inter-species homology do not necessary represent the functional domains of lncRNAs. This has been seen in several examples including Xist, a lncRNA involved in X chromosome inactivation 39,40 . In other cases, as in HOTAIR, the mouse and human genes share very little sequence conservation and display a completely different exon/intron organization. However, in the two species, the pattern of expression and the biological function are fully conserved 41 . Overall these data suggest that conserved lncRNAs' functions are based on structural determinants. This concept may stand true also for embedded TEs. To address the issue of structural/functional relationship in lncRNA domains, here we take advantage of the modular organization and the well-defined biological function of AS Uchl1. This transcript contains an embedded invSINEB2 element acting as ED and crucial for its ability to increase translation of Uchl1 mRNA 28 . We show that a large portion of the invSINEB2 secondary structure is comprised of helical structural elements. Absence of longer single stranded segments precludes any sequence-specific recognition. Therefore it is more likely that the functionality of invSINEB2 is based on its structural features. The invSINEB2 structure exhibits several internal loops and hairpins that may serve as structural motifs for specific recognition by a currently unknown partner molecule. An extended stem-loop structure, with terminal loops and internal bulges, has been previously observed for other SINE RNAs 42 , positioned either at the 5′ end, as for BC1 43 , or at the 3′ terminus, as in the case of salmon SmaI SINE 44 .
Terminal stem loop hairpin structures are often used as functional RNA motifs given their overall accessibility. This is confirmed also for the embedded invSINEB2 element of AS Uchl1. Deletion of the SL1 structural motif abolished the ability of AS Uchl1 to increase endogenous UchL1 protein levels. These results can be interpreted according to two models: 1) the SL1 terminal structural element is needed to maintain the overall structure of the invSINEB2, which in turn is necessary for AS Uchl1 activity; 2) SL1 represents per se a structural determinant of AS Uchl1 activity. We have found that the structure of the invSINEB2 RNA is mostly not affected by SL1 deletion, at least as predicted by mFOLD. The prediction is compatible with our chemical footprinting data that indicate a stable "basal" structure of the molecule with a more flexible region at its "apical" part. Altogether, our results support the model of the invSINEB2 as an independent folding unit acting as ED by the mean of a terminal stem loop structure. Recently, a similar model was reported for human lincRNA-p21 (hLinc-p21), a lncRNA containing an embedded SINE. This intergenic lncRNA was originally discovered in mouse as involved in stress  responses mediated by p53. hLinc-p21 is a single exon gene and contains an inverted repeat Alu element (IRAlus). The authors show that structural determinants in the embedded IRAlus contribute to the nuclear localization of hLinc-p21 45 .
It should be noted that the structure/function model observed for the embedded SINEs in AS Uchl1 and hLinc-p21 seems to differ from what is observed in RNA polymerase III-transcribed SINE B2 RNA. The repression of RNA polymerase II transcription by SINE B2 RNA strictly depends on an internal single stranded region, rather than on a terminal stem loop structure 46 .
Interestingly, the impact of repeats on lncRNA functions can be also extended to repeats not derived from TEs. The Firre lncRNA contains a repeat RNA domain that is necessary and sufficient for nuclear retention 47 . It will be interesting to examine whether structural determinants contribute to the biological activity of these embedded repeats. SINE B2 elements are not conserved in humans. We have recently identified human antisense lncRNAs that share the same domain anatomy of AS Uchl1 and function as natural SINEUPs in human cells 48 . In particular, a Free Right Alu Monomer (FRAM) embedded in AS PPP1R12A is essential for its ability to up-regulate translation. In the future, it will be interesting to assess whether embedded SINEs of human and mouse origin share the same structural determinants or, alternatively, whether different structural determinants are recruited for the same biological function.
Given the requirement of SL1 for AS Uchl1 function, we further provided a high-resolution description of its structural determinants using NMR. The SL1 hairpin features a stable G:C rich stem with a single U:U mismatch. On the other hand, its loop exhibits dynamic properties. The central GUG of the loop is followed by two A:U base pairs. According to chemical footprinting of the full length invSINEB2/183 both A:Us are accessible to methylating agents. NMR data on the truncated invSINEB2/38 clearly shows the formation of U75:A81 base pair. MD simulations also show that the U76:A80 base pair (adjacent to GUG) is stable. The discrepancies between NMR and MD results can be attributed to higher solvent accessibility of U76:A80. However, the susceptibility of the two A:U base pairs to methylation in invSINEB2/183 may suggest that SL1 is destabilized in the full length construct compared to the truncated construct used in NMR studies. The dynamic properties of the loop could be a significant factor for SINEUP activity.
SINEB2 elements are evolutionary linked to tRNA genes 49 . The secondary structure of the SL1 hairpin with a triloop followed by two unstable base pairs shares a certain degree of similarity with the anticodon arm of tRNA molecules. However, comparison of the SL1 3D structure with known structures from the PDB does not show a high degree of similarity on the high-resolution level and dismisses the possibility that tRNA mimicry could be the ground for recognition and interaction of AS Uchl1 mRNA. A similar terminal triloop structure is also seen in BC1 RNA although structure/function relationship studies have provided conflicting results on the exact contribution of structural determinants to BC1 function 50 . Analysis of transgenic mice expressing structural variants in vivo indicates a prominent role for the supporting basal structure in dictating BC1 biological activity 51 suggesting a much more complex scenario than expected from the evolutionary origin of TEs.
In conclusion, we have shown that SL1 is a structural determinant required for AS Uchl1 activity. Further studies will elucidate the precise mechanism to increase protein translation by the embedded invSINEB2 and whether the SL1 motif is the sole ED portion responsible for AS Uchl1 activity. This knowledge will be important to optimize synthetic SINEUPs for their use as therapeutics of human genetic diseases such as haploinsufficiencies.

Methods
Plasmids. The nucleotide sequence corresponding to the invSINEB2 repeat embedded in AS Uchl1 was PCRamplified from a pcDNA 3.1(-) plasmid expressing AS Uchl1 full length 28 . PCR fragment was digested with EcoRI and HindIII and sub-cloned into pUC19 vector for in vitro transcription (pUC-T7-invSINEB2). A T7 priming site was included in the forward primer. The following primers were used for cloning: FWD EcoRI T7 invSINEB2 GAGAGAATTCTAATACGACTCACTATAGGG-CAGTGCTAGAGGAGG REV HindIII B2 GAGAAAGCTTAAGAGACTGGAGC AS Uchl1 mutant lacking the SL1 domain within the invSINEB2 element (Δ68-77 nt) was obtained by gene synthesis and cloned between at XbaI/HindIII sites in pcDNA 3.1(-).
RNA synthesis and purification. 183 nt RNA used for chemical footprinting was in vitro transcribed using a T7 RNA polymerase (Promega), rNTPs (Jena Bioscience) and pUC-T7-invSINEB2. The overnight transcription reaction was quenched with EDTA. Phenol-chloroform extraction was used to remove the proteins and extensive ultrafiltration with H 2 O through a 1000 MWCO membrane (Millipore) removed the low molecular weight components.
Similarly, 38 nt RNA construct (GGUAACCUCGUGGUGGUUGUGAACCACCAUGUGGAUGG) for NMR studies was also prepared by in vitro transcription, however, at a considerably larger scale and with a DNA oligonucleotide (Eurogentec) containing a T7 promoter used as a template. Last two 5′ residues of the template were 2′-OMe modified in order to keep by-products to a minimum. Unlabelled rNTPs (Jena Bioscience) and 13 C, 15 N-labeled rNTPs (CIL) were used for the preparation of a natural abundance and a uniformly double labelled RNA, respectively. Phenol-chloroform extraction was followed by precipitation in ethanol, the precipitate was recovered with centrifugation and by redissolving it in water. RNA was further purified with denaturing (7 M urea) PAGE electrophoresis (15%). Only bands containing the full length 38 nt RNA were excised from the gel and recovered by electroelution (Schleicher & Schuell). Finally, the RNA solution was extensively ultrafiltrated with the NMR buffer (20 mM Tris HCl buffer, 20 mM NaCl, 2 mM EDTA). The RNA concentration in the NMR samples was 0.5 mM.
The full length 183 nt invSINEB2 element and the ΔSL1 construct used for NMR were in vitro transcribed using a T7 RNA polymerase and corresponding pUC plasmids. After overnight transcription was quenched with SCIENTIFIC REPoRtS | (2018) 8:3189 | DOI:10.1038/s41598-017-14908-6 EDTA the reaction mixture was loaded directly onto a GE HiPrep DEAE anion exchange column. The RNA fraction was collected and the product was precipitated in ethanol. After desalting and drying the RNA was dissolved in NMR buffer.
Chemical footprinting. InvSINEB2 RNA was treated either with dimethyl sulfate (DMS), which methylates As and Cs, or 1-cyclohexyl-(2-morpholinoethyl)carbodiimide metho-p-toluene sulfonate (CMCT), which methylates Us and to a lesser extent Gs. 2 μL of 20% solution of DMS in ethanol were added to 20 μL of RNA solution. Modification reactions were incubated at room temperature for 2, 5, 10 or 20 min. Reactions were quenched with 475 μL of 30% β-mercaptoethanol/0.3 M NaOAc mixture. For CMCT reactions Tris HCl was added to 25 μL of RNA to adjust the pH to 8.1. 1 μL of CMCT solution (500 g/L) was added to each reaction, which were incubated at room temperature for 2, 5, 10 or 20 min. Reactions were quenched with 100 μL of 0.3 M NaOAc. DMS and CMCT solutions were prepared fresh each time minutes before treatment. The modified RNA from quenched reactions was recovered by ethanol precipitation. Dried RNA was redissolved in 20 μL H 2 O and used for primer extension. ATTO 620 5′ fluorescently labelled 18 nt DNA oligos (Eurogentec) were used as primers. The primer is complementary to the 3′ of invSINEB2 RNA and was hybridized to it by incubating the mixture at 70 °C for 5 min followed by flash cooling on ice. 80 pmol of primer and 20 pmol of RNA resulted in efficient hybridization and were used for each reaction. 200 U of Superscript II reverse transcriptase (Invitrogen) was used for primer extension. Reactions were incubated for 45 min at 42 °C following the manufacturer's protocol and using the supplied buffer. 125 μM dNTP mixture was added to the reaction mixture. Reactions were stopped by a 20 min incubation at 70 °C. Subsequently, the modified RNA was degraded with a 10 min incubation with 2 μL (10 mg/mL) RNase A at 60 °C. The reaction mixtures were then directly loaded to large sequencing 8% 7 M urea PAGE gel.
Positive control reactions were run in a similar manner, however, with the DMS or CMCT treatment performed for half a minute at 90 °C. Negative control reactions were not treated with methylating agents. In order to position the bands within the RNA sequence classical sequencing reactions with ddNTPs were run in parallel on gels. Non-methylated RNA was used in primer extension reactions with individual ddNTPs being added to the reaction in the ratio of dNTP:ddNTP of 1:5. Bands were visualized on a Perkin Elmer ProEXPRESS imaging system.  (1:20). Oligonucleotide sequences of primers used in this study for GAPDH, Uchl1 and AS Uchl1 were previously described in Carrieri et al. 28 . The amplified transcripts were quantified using the comparative Ct method and relative gene expression was calculated with the ΔΔCt method (Schmittgen and Livak, 2008). Statistical Analysis. All data are expressed as mean ± standard deviation on n ≥ 3 replicas. Statistical analysis was performed using Excel software. Statistically significant differences were assessed by Student's t-test. Differences with p < 0.01 were considered significant. NOESY, DQF-COSY and TOCSY NMR spectra were recorded with the natural abundance sample. 2D 13 C and 15 N HSQC and HCN spectra were recorded with the isotopically enriched sample. 3D NOESY-HSQC and HCCH-TOCSY spectra were also acquired, but gave poor S/N ratio due to low RNA concentration. Cross-peak intensities were therefore evaluated from 2D NOESY spectra.

NMR.
All spectra were acquired on an Agilent VNMRS 600 MHz spectrometer equipped with a cryogenic probe. DPFGSE water suppression scheme was used for suppression of the water signal. All experiments were performed at temperatures of 0, 25 or 37 °C. NMR spectra were processed and analysed using VNMRJ (Agilent), NMRpipe and Sparky software (UCSF). Simulated annealing. NOE distance restraints for non-exchangeable protons were obtained from 2D NOESY spectra recorded at 25 °C in 100% 2 H 2 O with mixing times ranging from 80 to 250 ms. NOE distance restraints for exchangeable protons were obtained from 2D NOESY spectra recorded at 0 °C in 5% 2 H 2 O, 95% H 2 O with a mixing time of 250 ms. Cross-peaks were classified as strong (1.8-3.6 Å), medium (2.6-5.0 Å) and weak (3.5-6.5 Å).
Structure calculations were performed using AMBER 14 software with the ff14SB force field 52 . Initial fully extended structure was created using the leap module of Amber. The structure was then subjected to molecular dynamics (MD) calculations using a generalized Born implicit solvation model 53 . During MD NMR restraints were gradually introduced in order to obtain a hairpin structure. This was used as a starting structure for a series of simulated annealing (SA) calculations. For each of 100 SA calculation a random starting velocity was used. For the first 200 ps the molecules were held at a constant temperature of 1000 K. The molecules were then cooled to 300 K in the next 400 ps, after which the temperature was further decreased to 0 K in the next 400 ps. The force constants were 20 kcal mol −1 Å −2 for NOE distances and 500 kcal mol −1 rad −2 for torsion angles. The SHAKE algorithm for hydrogen atoms was used with a tolerance of 0.0005 Å.
A family of 10 structures with the lowest energy was subjected to a maximum of 10000 steps of steepest descent minimization and selected for further analysis. Helical parameters were determined with 3DNA 2.0 software.
Comparison of refined structure with structures from PDB. The 13 residue hairpin (72-84) of model 1 of the refined structures was compared to solution-state NMR structures in the PDB (as of February 10th, 2016). baRNAba 54 was used to search similar motifs, independent of sequence. Similarity was measured using the εRMSD, which reports on the relative distance and orientation of all pairs of interacting bases. Molecular dynamics simulations. Model 1 from the refined structures (29 nt) has been used as a starting structure for unrestrained molecular dynamics simulations (MD) of 400 nanoseconds length. The structure was solvated with water and neutralized with sodium ions. Water molecules were then replaced with ions to obtain a 0.1 mol/L concentration of NaCl, resulting in a solution of 18822 water molecules, 64 sodium and 36 chloride ions. After energy minimization, the system was subjected to a 400 ns unrestrained MD at constant temperature (T = 300 K) and pressure (P = 1 bar) 55,56 . All bonds were constrained, and snapshots were saved for later analysis every 4 ps 57 .
Simulations were done with GROMACS 4.6.7 using the AMBER force field including the latest corrections for RNA 58-60 and ions 61 . Reweighting. The snapshots from the simulations were reweighted according to the maximum entropy principle 62 such that the averaged reweighted ensemble satisfies all NOE distance restraints. The NOE intensities were classified as weak, medium and strong, corresponding to average distances r weak ≤ 6.5 Å, r medium ≤ 5.0 Å and r weak ≤ 3.6 Å. Since NOEs are proportional to 1/r 6 the NOEs from the simulation were calculated as NOE i (t) = 1/ r i (t) 6  . Lagrangian multipliers λ i were initialized to zero and updated according to λ λ σ 2 with the operator += indicating that λ i is incremented by the amount on the right side. K is a constant with the value 10 −5 . σ accounts for errors in the experimental values and is equal to 0.5 63  Imino hydrogen bonds. Each simulation snapshot was weighted by the final set of weights W(t) which satisfied all NOE constraints. Hydrogen bonds involving imino protons (G:H1 and U:H3) were calculated as the sum of the weights of the snapshots in which the following criteria were satisfied: the distance between donor and acceptor had to be less than 3.5 Å and the angle hydrogen-donor-acceptor less than 30°. Each hydrogen bond has a stability value between 0 and 1.

Annotations.
For the reweighted ensemble, annotations were calculated using baRNAba 54 . The stability of an annotation for a pair of bases was calculated as the sum of the weights of the snapshots in which the annotation was present, and is defined in the range from 0 to 1.