Eukaryotic Box C/D methylation machinery has two non-symmetric protein assembly sites

Box C/D ribonucleoprotein complexes are RNA-guided methyltransferases that methylate the ribose 2’-OH of RNA. The central ‘guide RNA’ has box C and D motifs at its ends, which are crucial for activity. Archaeal guide RNAs have a second box C’/D’ motif pair that is also essential for function. This second motif is poorly conserved in eukaryotes and its function is uncertain. Conflicting literature data report that eukaryotic box C’/D’ motifs do or do not bind proteins specialized to recognize box C/D-motifs and are or are not important for function. Despite this uncertainty, the architecture of eukaryotic 2’-O-methylation enzymes is thought to be similar to that of their archaeal counterpart. Here, we use biochemistry, X-ray crystallography and mutant analysis to demonstrate the absence of functional box C’/D’ motifs in more than 80% of yeast guide RNAs. We conclude that eukaryotic Box C/D RNPs have two non-symmetric protein assembly sites and that their three-dimensional architecture differs from that of archaeal 2’-O-methylation enzymes.


Results
Snu13 binds to none of the predicted non-canonical box C'/D' motifs. Of the 43 known snoRNAs in yeast, only two have box C'/D' motifs matching the canonical sequences 26 . To understand whether predicted, "non-canonical" box C'/D' sequences adopt the typical secondary structure and k-turn three-dimensional structure that justify their annotation as box C'/D' motifs, we selected three such motifs from the Saccharomyces cerevisiae (Sc) snoRNAs snR41, snR51 and snR54, and tested their ability to bind the L7Ae protein of archaea and the Snu13 protein of eukaryotes, which specifically recognize k-turn structures. We annotated the box C' and D' sequences of these three snoRNAs manually, as described in Methods. The box D' sequences corresponded to those annotated in the yeast snoRNA database UMass-Amherst 41 . The box C' sequences of snR51 and snR54 corresponded to those annotated in 26 . The box C' sequence annotated for snR41 (UAC AUGU) differed by three nucleotides from that annotated in 26 (AUGU GCA), because our choice maximized the number of base-pairs in stem II matching those of a canonical box C'/D' motif. We used the sequences from snR51, snR41 and snR54 predicted to form box C'/D' motifs and their flanking structural elements to generate three RNAs: snR51-kl1, snR41-kl1, and snR54-kl1 (Fig. 1B,C). The RNAs were designed to reproduce the native local structure around the predicted box C'/D' motifs; we avoided adding sequences that would cause the 3' and 5' ends of the RNAs to form a helical structure, as the sequences downstream of box C' and upstream of box D' in the snoRNAs do not base-pair with each other. Secondary structure predictions for the box C'/D' elements shown in Fig. 1B,C are based on the sequence annotations of box C' and D' described above and on experimentally known, protein-bound secondary structures of box C/D elements 22,42 .
Rather than study the structures of these RNAs in isolation, we tested whether they adopt the typical box C'/D' kinked structure by assaying their ability to bind L7Ae. This protein has a very strong affinity for k-turn-like RNA structures 40 and also binds box C/D sequences that deviate from the consensus 43,44 , as well as RNAs that do not have a stable fold in isolation (such as sR26-kl, which is derived from the box C'/D' motif of P. furiosus sR26, Fig. 1C). Moreover, we know that even box C/D consensus sequences do not necessarily adopt the k-turn conformation in the absence of binding proteins 45,46 . We also tested whether snR51-kl1, snR41-kl1, and snR54-kl1 bound to Snu13, the yeast orthologue of L7Ae. Protein binding to the RNAs was monitored by electrophoretic mobility shift assays (EMSA) 47 using 5'-Cy5 fluorescently labeled RNAs. Of the three putative box C'/D' motifs tested, only snR51-kl1 bound to L7Ae (Fig. 1D). Similar to the positive control sR26-kl, the snR51-kl1 RNA was completely displaced at a 1:1 ratio of the total concentrations of RNA and protein. This demonstrates that the K D of the complex is at least one order of magnitude smaller than the total RNA concentration (corresponding to a K D ≤ 100-200 nM). According to its annotation, the snR51-kl1 box C'/D' motif contains the 1n−1b A•G base-pair of the consensus sequence, as well as base-pairs −1 and −2 of stem I. By contrast, the predicted snR41-kl1 box C'/D' motif lacks the 1n−1b A•G base-pair, and snR54-kl1 lacks both base-pair −1 of stem I and the 2n−2b G•A base-pair of stem II. The positive control sR26-kl has the consensus box C/D sequence but lacks stem I.
The results of Fig. 1D suggest that either a conserved box C/D consensus sequence or an intact −1 base-pair in stem I together with a conserved 1n−1b A•G base-pair is necessary for the RNA to bind L7Ae. To verify this conclusion, we generated mutants of all three RNAs (snR51-kl1, snR41-kl1 and snR54-kl1) and tested their ability to bind L7Ae (Fig. 1D). As predicted, L7Ae interacted with snR41-kl2, which has the 1n−1b A•G and the 2n−2b G•A base-pairs, as well as an intact −1 base-pair in stem I, albeit more weakly than with snR51-kl1. Introduction of the 2n−2b G•A base-pair in snR51-kl2 did not improve its affinity for L7Ae, whereas substitution of stem I with a loop in snR51-kl3 increased the dissociation rate (k off ) of the complex, leading to smearing of the gel bands. On the other hand, snR51-kl4, with the 2n−2b G•A base-pair but without the −1 base-pair, bound to L7Ae similarly to snR51-kl1.
Unlike L7Ae, Snu13 bound none of the three native RNAs, snR51-kl1, snR41-kl1 and snR54-kl1, at the maximum ratio of the total concentrations of RNA and protein of 1:2, which suggested a K D at least one order of magnitude higher than the total RNA concentration (corresponding to a K D ≥ ~ 10 μM). Weak binding with band smearing was seen for the snR51-kl2 mutant containing the 2n−2b G•A base-pair (Fig. 1E).
Finally, we repeated a subset of EMSAs using a 50-fold higher concentration of non-fluorescently labelled RNA (Supplementary Fig. 3) and visualized the RNA by staining with ethidium bromide. In this case too, snR51-kl2 was the only RNA that showed binding to Snu13, confirming that all other RNAs either do not bind Snu13 (K D ≥ ~ 500 μM) or do so with a very weak affinity (K D ~ 100 μM). As a control, Snu13 bound canonical k-turn sequences at a ratio of the total concentrations of RNA and protein of 1:1 (Supplementary Fig. 4).
These data suggest that both 1n−1b A•G and 2n−2b G•A base-pairs and an intact −1 base-pair in stem I are required to form a k-turn structure that can be recognized by Snu13. Thus, box C'/D' sequences containing all three features form a functional box C'/D' motif in eukaryotes.
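The K D bounds quoted above follow from comparing the total RNA concentration with the fraction of RNA shifted at a given RNA:protein ratio. A minimal sketch of that reasoning is given below, assuming a simple 1:1 equilibrium binding model; the model, the function name `fraction_rna_bound` and the neglect of dilution by the added protein are illustrative assumptions and are not part of the original analysis. The RNA concentration of 2 µM corresponds to 10 pmol of RNA in 5 µl, as stated in Methods.

```python
import math

def fraction_rna_bound(rna_total, protein_total, kd):
    """Fraction of RNA in complex for a 1:1 binding equilibrium.

    All three arguments must use the same molar unit. The complex
    concentration is the smaller root of
    C^2 - (R + P + Kd) * C + R * P = 0.
    """
    b = rna_total + protein_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * rna_total * protein_total)) / 2.0
    return complex_conc / rna_total

# Assumption: 10 pmol RNA in 5 ul gives 2 uM; dilution by protein is ignored.
RNA_TOTAL = 2e-6

if __name__ == "__main__":
    # Scan K_D values around the total RNA concentration at a 1:1 ratio.
    for kd in (20e-9, 200e-9, 2e-6, 20e-6):
        bound = fraction_rna_bound(RNA_TOTAL, RNA_TOTAL, kd)
        print(f"K_D = {kd * 1e6:7.2f} uM -> fraction bound at 1:1 = {bound:.2f}")
```

In this idealized model, a K D ten-fold below the RNA concentration leaves most of the RNA bound at a 1:1 ratio, whereas a K D ten-fold above it leaves most of the RNA free even at a 1:2 ratio, which is the qualitative argument used for L7Ae and Snu13, respectively.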
Structure of archaeal L7Ae bound to a eukaryotic non-canonical box C'/D' motif. The 2n−2b G•A base-pair of stem II, whose absence abolishes binding to Snu13 but not to L7Ae, is conserved in archaeal guide RNA. Thus, to understand whether non-canonical box C'/D' motifs lacking the 2n−2b G•A base-pair adopt a k-turn-like structure in the presence of L7Ae, we set out to solve the structure of the L7Ae-snR51-kl1 complex. We obtained crystals for L7Ae in complex with a snR51-kl1 mutant, termed snR51-kl1-S, in which stem I was shortened to eight base-pairs, and determined the X-ray crystallographic structure at a resolution of 1.9 Å (Fig. 2A, Table 1, Supplementary Fig. 5). The annotated box C'/D' sequences did not base-pair as predicted. Instead, the 1n adenosine and the 4b guanosine formed the first A•G base-pair of stem II, rather than the 1n and 1b nucleotides. This shifted the putative box C' sequence by three nucleotides from 5'-UUG AUGA to 5'-AUGA CUA (Table S2). Nucleotides originally annotated as L2 and L3 (Fig. 1) were part of stem I, whereas nucleotide 3b was bulged out and adopted the position of L3 in the structures of L7Ae-k-turn RNA complexes 42 . This base-pair pattern resulted in stem I containing more canonical base-pairs than predicted, as well as in a purine, instead of a pyrimidine, at position L2 (Fig. 2A). The preference for a purine in this position had been previously established for the Snu13-U3 box C'/D complex, because of its favorable stacking on the first A•G base-pair of stem II 48 . Finally, as in many crystal structures of protein-RNA complexes or of isolated RNA, we cannot exclude that crystal packing contacts between RNA molecules influence the conformation adopted by the RNA in the crystal (Supplementary Fig. 5). In our structure the stretch 1 GUAC 4 of one RNA forms base-pairs with the same stretch of the neighboring RNA molecule.
In our structure, stem II was disrupted with the exception of the first sheared G•A base-pair. The guanosine of this base-pair engaged the backbone H N and the side chain of L7Ae residue E38 in two crucial hydrogen bonds with its O6 and imino hydrogen, respectively (Fig. 2B). These interactions are conserved in all structures of L7Ae and Snu13 in complex with k-turn RNAs 22,44,49 , explaining why this A•G base-pair is essential for protein recognition. Next to the A•G base-pair was a hydrogen bond formed between the O2 of the U(2n) and the N6 of the A originally annotated as 5b stacked on top of A(1n) (Fig. 2B). There were numerous polar interactions between the box D' sequence and the Arg and Lys residues of L7Ae helix α 2 (Fig. 2B); thus, despite remaining single-stranded, the backbone of the box D' element was oriented similarly to that of a canonical k-turn, while the backbone of the box C' element followed a different trajectory (Fig. 2C).
Figure 2. Structure of archaeal L7Ae bound to a eukaryotic non-canonical box C'/D' motif. (A) X-ray structure of Pf L7Ae (pink) bound to RNA snR51-kl1-S (gray), containing the predicted box C'/D' motif from Sc guide RNA snR51. Predicted box C' and D' are colored in green and purple, respectively. The secondary structure of snR51-kl1-S seen in the X-ray structure deviates from the one predicted in Fig. …
In the crystal structure, the side of stem I in the RNA snR51-kl1-S contacted loop 9 of L7Ae (Fig. 2D). The guanosine residue initially annotated as 1b stacked on the first G•U base-pair of stem I; its imino hydrogen was involved in a hydrogen bond with the side chain of E93, as in all other L7Ae-k-turn RNA complexes with a guanosine residue at this position. Hydrophobic contacts occurred between residues I92 and V94 and the bases of the purines initially annotated as 1b and 2b, respectively. Residues A97 of loop 9 and I62 of helix α 4 formed a cluster of hydrophobic residues around the base and backbone of the central kink nucleotide (the 3b uridine), while the backbone carbonyl of D58 formed a hydrogen bond with its imino hydrogen.
We conclude that L7Ae induces a kinked structure in the RNA even in the absence of the 2n−2b G•A base-pair and when stem II is disrupted. The eukaryotic orthologue Snu13 is unable to do the same and requires the 2n−2b G•A base-pair to bind the RNA. However, even in the presence of L7Ae, the box C'/D' motif of snR51 is not a bona fide box C'/D' motif, as the backbone of the box C' sequence adopts a different conformation from that of a k-turn (Fig. 2C).
Amino acid residues in loop 9 of L7Ae and Snu13 tune binding affinities for guide RNA. The much stricter sequence requirements needed for Snu13 binding, together with the poor conservation of box C'/D' elements in yeast, suggest that Snu13 has evolved to recognize mostly canonical box C/D motifs. One difference between Snu13 and L7Ae is the inability of the eukaryotic protein to bind k-loop RNAs, namely box C'/D' motifs lacking stem I. This difference has been attributed to residues in the protein loop 9 43 , whose sequence diverges significantly between archaea and eukaryotes but is well conserved within each kingdom of life (Fig. 3A). In structures of both L7Ae and Snu13 in complex with canonical k-turn RNAs, loop 9 contacts the major groove of stem I (Fig. 3B). To understand why L7Ae binds to k-loop RNAs while Snu13 does not, we generated seven Snu13 mutants with eukaryotic-to-archaeal mutations in loop 9 (S94E, R95V, V93IR95V, S94ER95V, R95VP96A, R95VI98A, and S94ER95VP96A, Fig. 3A,B) and tested their ability to bind the RNAs of Fig. 1C. Three of these mutants had been tested previously for the mouse analogue of Snu13, the protein SNU13 (or 15.5 K), together with the archaeal box C/D and box C'/D' motifs of sR8 from Methanococcus jannaschii 43 .
Binding of Snu13 to snR54-kl1 and snR54-kl2 was promoted by the V93IR95V and R95VP96A mutations, while the V93IR95V mutant also bound to snR51-kl3 and snR54-kl3, which have a loop instead of stem I (Supplementary Fig. 6). In general, binding to RNAs without a stable stem I was promoted by an increase in the hydrophobicity of loop 9. By contrast, introducing a negative charge, as in Snu13 S94E, lowered the binding affinity (Fig. 3C and Supplementary Fig. 6). Some of the mutations modulated the affinity of Snu13 for snR51-kl2 (Fig. 3C). The V93IR95V, R95VP96A and R95VI98A mutants bound snR51-kl2 better than wild-type: the appearance of well-defined bands for the complex species in the EMSAs suggested that the increased affinity was due to a decrease in the dissociation rate, k off . These results can be rationalized by comparing our crystal structure of the L7Ae-snR51-kl1-S RNA complex with the published structure of human Snu13 bound to the U4 RNA k-turn element (PDB ID: 1E7K) (Fig. 2D,E). In complex with the snR51-kl1-S RNA, substitution of L7Ae-V94 by Snu13-R95 would weaken the hydrophobic contacts with the guanine 1b and at the same time compensate for this loss with electrostatic contacts to the RNA backbone. Consistent with this, the Snu13 mutant R95V bound to snR51-kl2 with an affinity similar to that of wild-type Snu13 (Fig. 3C). By contrast, the hydrophobic contacts between L7Ae-I92 and the adenine 2b would be weakened if the Ile were substituted by Snu13-V93, explaining why the Snu13 mutant V93IR95V bound snR51-kl2 better than the wild type. The longer side-chain of Snu13-I98, substituting L7Ae-A97, would form more hydrophobic contacts to the bulged-out nucleotide, but would push Snu13-I63 in helix α 4 away from the RNA, explaining why the Snu13 mutant R95VI98A binds snR51-kl2 better than wild-type.
Altogether, our experiments confirm the role of loop 9 residues in determining the affinity of the protein for k-loop RNAs but also reveal how the nature of these residues fine-tunes binding affinities to k-turn RNAs.
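The mutant names used above concatenate single substitutions written as wild-type residue, position, and replacement residue (for example, V93IR95V combines V93I and R95V). As a reading aid, a minimal sketch of how such names decompose is shown below; the helper `parse_mutant` is hypothetical and only illustrates the naming convention.

```python
import re

# One substitution: wild-type residue, position, replacement residue (e.g. R95V).
SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")

def parse_mutant(name):
    """Split a concatenated mutant name into (wild_type, position, replacement) tuples."""
    substitutions = [(wt, int(pos), new) for wt, pos, new in SUBSTITUTION.findall(name)]
    # Reject names that are not fully covered by consecutive substitutions.
    rebuilt = "".join(f"{wt}{pos}{new}" for wt, pos, new in substitutions)
    if rebuilt != name:
        raise ValueError(f"cannot parse mutant name: {name!r}")
    return substitutions

if __name__ == "__main__":
    # The seven loop 9 mutants named in the text.
    for mutant in ("S94E", "R95V", "V93IR95V", "S94ER95V",
                   "R95VP96A", "R95VI98A", "S94ER95VP96A"):
        print(mutant, "->", parse_mutant(mutant))
```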

Assembly in Box C/D complexes does not rescue non-functional box C'/D' motifs. After testing the affinity of isolated Snu13 for non-canonical box C'/D' sequences, we asked whether this affinity could be modulated by the presence of the scaffolding proteins Nop56/Nop58. Because the heterocomplex of Nop56 and Nop58 cannot be reconstituted from overexpressed proteins in a homogeneous form and in sizeable quantities, we sought to answer this question using archaeal Nop5 instead. We reconstituted a chimeric RNP complex containing yeast guide RNA snR51 (Fig. 4A), Snu13 and the complex Nop5 2 -Fib 2 from the archaeal species P. furiosus. As a control, we also used the guide RNA sR26 from P. furiosus. The in vitro reconstituted complexes were purified by size-exclusion chromatography (Fig. 4B). snR51 formed RNP particles containing Snu13 and Nop5 2 -Fib 2 , demonstrating that Nop5 2 can substitute for the eukaryotic proteins Nop58 and Nop56 in binding to the Snu13-RNA complex.
We then analyzed the particles assembled with snR51 and sR26 by size-exclusion chromatography and multi-angle light scattering (MALS). The RNP assembled with snR51 (Fig. 4C) resulted in a main peak with a molecular weight (MW) of ~ 187 kDa. This MW corresponds to a monomeric RNP (Supplementary Fig. 2) containing one copy of the Nop5 2 -Fib 2 tetramer, one copy of the guide RNA and only one copy of Snu13 (theoretical MW 193.7 kDa). A second peak corresponded to a dimeric RNP (di-RNP, Supplementary Fig. 2). This result demonstrates that the presence of Nop5 2 -Fib 2 does not promote binding of a second copy of Snu13 to the non-canonical box C'/D' motif of snR51. A similar elution profile was obtained for the complex assembled with the archaeal sR26 RNA, Nop5 2 -Fib 2 and Snu13 (Fig. 4B), with the difference that the monomeric and dimeric RNP peaks partially overlapped, compromising the accurate determination of their molecular weights. Nevertheless, the MW measured by MALS for the right-most part of the peak corresponding to the monomeric RNP is ~ 200 kDa, which fits with a particle containing one copy of the Nop5 2 -Fib 2 tetramer, one copy of the guide RNA and two copies of Snu13 (theoretical MW 193.3 kDa) rather than only one copy of Snu13 (theoretical MW 179.7 kDa). These data suggest that the presence of Nop5 2 -Fib 2 promotes binding of Snu13 to canonical k-loop structures, such as that formed by sR26.
These findings may explain conflicting literature data on the ability of Snu13 to bind snoRNA box C'/D' motifs: on the one hand, Snu13 associates with the box C'/D' motif of human U24 because in this RNA the box C'/D' motif forms a canonical k-loop 29 ; on the other hand, it is unable to bind the non-canonical box C'/D' motif of Xenopus U25 snoRNA 28 .
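The stoichiometry assignment from the MALS data above can be restated as simple arithmetic on the quoted theoretical molecular weights. The sketch below is an illustration only: the mass of one Snu13 copy is inferred here from the difference between the two sR26 values (193.3 − 179.7 kDa), the two-copy value for snR51 is derived from it, and assigning the stoichiometry by the nearest theoretical mass is a simplifying assumption rather than the procedure used in the study.

```python
# Theoretical molecular weights (kDa) quoted in the text.
SR26_ONE_SNU13 = 179.7   # Nop5(2)-Fib(2) + sR26 + 1 Snu13
SR26_TWO_SNU13 = 193.3   # Nop5(2)-Fib(2) + sR26 + 2 Snu13
SNR51_ONE_SNU13 = 193.7  # Nop5(2)-Fib(2) + snR51 + 1 Snu13

# Assumption: one Snu13 copy accounts for the difference between the sR26 values.
SNU13_MASS = SR26_TWO_SNU13 - SR26_ONE_SNU13

# Derived (not quoted in the text): snR51 particle with two Snu13 copies.
SNR51_TWO_SNU13 = SNR51_ONE_SNU13 + SNU13_MASS

def closest_stoichiometry(measured_kda, candidates):
    """Return the Snu13 copy number whose theoretical mass is nearest the measured one."""
    return min(candidates, key=lambda copies: abs(candidates[copies] - measured_kda))

if __name__ == "__main__":
    snr51 = closest_stoichiometry(187.0, {1: SNR51_ONE_SNU13, 2: SNR51_TWO_SNU13})
    sr26 = closest_stoichiometry(200.0, {1: SR26_ONE_SNU13, 2: SR26_TWO_SNU13})
    print(f"Snu13 mass inferred from the sR26 values: {SNU13_MASS:.1f} kDa")
    print(f"snR51 RNP (~187 kDa): {snr51} copy of Snu13")
    print(f"sR26 RNP (~200 kDa): {sr26} copies of Snu13")
```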

Discussion
The archaeal Box C/D RNPs are often used as a proxy for eukaryotic Box C/D RNPs, but the similarities and differences between them remain unclear. One major question is whether the predicted, internal, non-canonical box C'/D' motifs in eukaryotic snoRNAs bind the box C/D motif-binding protein Snu13, leading to a similar architecture of the box C/D and box C'/D' protein-assembly sites. In this study, we revisited the question of the existence of box C'/D' motifs in yeast snoRNAs by defining a functional box C'/D' motif as two juxtaposed sequences capable of base-pairing (as in Fig. 1A) and of forming, at least when in complex with proteins, the three-dimensional k-turn-like structure typical of these secondary structure elements. We show that functional box C'/D' motifs exist in some eukaryotic snoRNAs but not in others. When they exist, they recruit a second copy of Snu13 to the RNA; when they do not exist, no second copy of Snu13 is recruited to the complex. Moreover, we determine the features that define a functional box C'/D' motif in eukaryotes and we show that most of the predicted non-canonical box C'/D' motifs are not functional.
Among the 43 yeast snoRNAs, only two (snR60 and snR70) contain box C'/D' motifs that match the consensus sequence; six others (snR58, snR65, snR66, snR69, snR71 and U24) contain both the A•G and G•A sheared base-pairs of stem II and the first base-pair of stem I 26 . Eighteen predicted box C'/D' motifs contain the tandem sheared base-pairs but lack at least the first base-pair of stem I (for example, snR50, snR52, snR63 and snR74) and the remaining 17 lack one of the two sheared base-pairs (for example, snR51, snR41, snR54, snR13 and snR38). According to our systematic mutational analysis, Snu13 binding to the box C'/D' motif requires the presence of both an intact stem I and the tandem A•G sheared base-pairs. Thus, we predict that only snR60 and snR70, and possibly snR58, snR65, snR66, snR69, snR71 and U24, recruit Snu13 to their putative box C'/D' motif. In humans, none of the 32 snoRNAs homologous to yeast snoRNAs contains a canonical box C'/D' motif, and stem I seems to be absent in most cases. Thus, functional box C'/D' motifs are by no means universal in eukaryotic snoRNAs 50 .
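The counts in the preceding paragraph can be tallied to check how they relate to the statement that more than 80% of yeast guide RNAs lack a functional box C'/D' motif. The sketch below uses only the numbers quoted above; the class labels are hypothetical shorthand, and counting the six "possible" snoRNAs as functional gives the upper bound of eight.

```python
# Counts of yeast snoRNAs per class of predicted box C'/D' motif (from the text).
MOTIF_CLASSES = {
    "consensus": 2,                      # snR60, snR70
    "tandem_sheared_plus_stem_I": 6,     # snR58, snR65, snR66, snR69, snR71, U24
    "tandem_sheared_without_stem_I": 18,
    "missing_one_sheared_pair": 17,
}

# Classes that satisfy the binding rule derived from the mutational analysis.
POTENTIALLY_FUNCTIONAL = ("consensus", "tandem_sheared_plus_stem_I")

total = sum(MOTIF_CLASSES.values())
functional_upper_bound = sum(MOTIF_CLASSES[name] for name in POTENTIALLY_FUNCTIONAL)
nonfunctional_fraction = (total - functional_upper_bound) / total

if __name__ == "__main__":
    print(f"total snoRNAs: {total}")
    print(f"at most {functional_upper_bound} with a functional box C'/D' motif")
    print(f"non-functional fraction: {nonfunctional_fraction:.1%}")
```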
In archaea, the Nop5 dimer directs the methyltransferase, fibrillarin, to the methylation sites by anchoring its C-terminal domains to the box C/D and C'/D' motifs of the guide RNA, thereby orienting the RNA guide sequences along its coiled-coil domains (Figure S2). The activity and specificity of archaeal 2'-O-methylation complexes depend on this bipartite, symmetrical architecture. We show here that a similar bipartite architecture is unlikely to exist in the 2'-O-methylation complexes of eukaryotes, which have a number of asymmetric features. It follows that archaeal Box C/D RNPs are not a satisfactory proxy for eukaryotic Box C/D RNPs in all their functional and regulatory aspects.
The prediction that no more than 8 of the 43 yeast methylation guide RNAs contain a functional box C'/D' motif calls into question the role of these motifs in binding Nop56 and indicates that an RNA three-dimensional structure different from both the k-turn and the k-loop might be recognized by this protein. Based on the observation that box C'/D' motifs are not universal in eukaryotic snoRNAs, we propose that the interaction of Nop56 with the guide RNA does not depend on a box C'/D' motif but on a yet-unknown RNA motif or structure, which has evolved exclusively in eukaryotes from the box D' sequence and may partially overlap with it. In this scenario, the box C/D sequence would have a leading role in initiating complex assembly by the recruitment of Snu13 and Nop58, followed by a chaperone-aided dimerization of Nop58 with Nop56. Only at this point would Nop56, readily recruited to the complex, be able to recognize the RNA next to the box D' site in a yet-unknown manner. It is also possible that the C-terminal domain of Nop56 is not involved in RNA binding and that the guide-substrate RNA duplex is recognized in the right register by the Nop56-N-terminal-domain-fibrillarin complex with the help of transiently associated chaperones.
The role of the box C/D element in initiating the assembly of snoRNPs is supported by functional data in vivo. In a study of the effect of mutation or depletion of various secondary structure elements in an engineered guide snoRNA on rRNA methylation in yeast 29 , mutation or depletion of the box C/D motif abolished methylation at both sites upstream of box D and box D' . By contrast, depletion of box D' (or box C') affected methylation only upstream of box D' 29 , indicating that the box C'/D' motif is required neither for the assembly of a functional complex nor for methylation specificity at the site upstream of box D. The sequence of the engineered snoRNA used in this study was derived from human U24, which has a canonical box C'/D' k-loop sequence and was shown to recruit two copies of Snu13. The fact that even in this case the conserved box C'/D' element is unable to nucleate the assembly of the methylation complex indicates that in vivo the snoRNP does not adopt a symmetric bipartite structure like the archaeal Box C/D RNPs. These findings and our own data together strongly suggest that the architectures of Box C/D snoRNPs at the box C/D and (putative) box C'/D' sites differ from each other. This conclusion is not in disagreement with the symmetric architecture of the U3 snoRNP bound to the pre-ribosomal 90S subunit 36 , as the U3 snoRNP does not function as a methyltransferase and the U3 RNA has two canonical box C/D sequences, unlike most of the methylation-competent snoRNAs. The apparent existence of different assembly modes for Snu13, Nop56 and Nop58 on Box C/D snoRNAs demonstrates the versatility of the eukaryotic snoRNP machinery: the sequence of the snoRNA determines the assembly mode at the Nop56 site to support different functions.
The idea that the symmetric bipartite architecture seen in archaeal Box C/D RNPs does not exist in methylation-competent eukaryotic snoRNPs is further supported by the biochemical and in vivo evidence of the importance of the spacer/guide sequences between box C and box D' or box C' and box D in archaea and eukaryotes. In archaea, the optimal spacer/guide length is 12 nucleotides; alteration of the length of one of the two spacer/guides impacts the methylation only of the corresponding substrate, in agreement with similar architectures at the box C/D and box C'/D' sites 51 . In eukaryotes, the picture is less clear-cut: methylation of the substrate upstream of box D is (moderately) sensitive to alterations of the spacer/guide length between box C' and box D, whereas methylation of the substrate upstream of box D' is sensitive to alterations of both spacer/guide sequence lengths 29 .
In summary, we show that a functional box C'/D' element does not exist in most yeast guide RNAs, leading us to conclude that this RNA element is not the specific recognition motif for Nop56. We propose that eukaryotic, methylation-competent RNPs have a non-symmetric architecture with different protein-RNA contacts and assembly geometries at the box C/D and putative box C'/D' sites.
The asymmetric nature of the eukaryotic complex may result in different mechanisms for the regulation of methylation levels at the substrates upstream of box D and box D' and thus in a higher flexibility for the coupling of site-specific methylation to other cellular processes.

Material and methods
Cloning and mutagenesis. Genes encoding full-length L7Ae, Nop5, and fibrillarin from P. furiosus (UniProtKB accession codes Q8U160, Q8U4M1, and Q8U4M2) were obtained by PCR from genomic P. furiosus (Pf) DNA. The genes were cloned into expression vector pET-M11 containing a TEV (tobacco etch virus) protease-cleavable N-terminal His 6 -Tag using BamHI and NcoI restriction sites 31 . The full-length SNU13 gene from S. cerevisiae (UniProtKB accession code P39990) was ordered from Invitrogen with codon-usage optimized for E. coli translation. The gene was amplified via PCR and cleaved with NcoI and XhoI restriction enzymes (New England Biolabs, NEB). Cleaved PCR products were ligated into the cleaved pET-M11 expression vector. The final Snu13 construct contained an N-terminal His 6 -Tag cleavable with TEV-protease.
Snu13 point mutations were accomplished using the Pfu Plus! DNA Polymerase (Roboklon) according to the protocol provided by the manufacturer. PCR products were cleared from the starting material by DpnI (NEB) digest; the enzyme was heat-inactivated before transformation of the cleared PCR products into E. coli OmniMax cells. Positive mutants were verified by sequencing (Eurofins).
Full-length DNA templates for S. cerevisiae (Sc) guide RNAs snR51 (Gene-ID: 9.164.983), snR41 (Gene-ID: 9.164.986), and snR54 (Gene-ID: 9.164.960) were ordered as synthetic genes in cloning vector pUC57 from GENEWIZ (Sigma-Aldrich). All templates contained a 3' PstI cleavage site for DNA linearization. For all other RNA constructs, the template DNA was ordered as single-stranded DNA with EcoRI (5' GAA TTC) and HindIII (5' AAG CTT) cleavage sites at the 5'- and 3'-end, respectively, as well as a PstI cleavage site upstream of the HindIII site. Complementary single-stranded DNA molecules were annealed, cleaved with EcoRI-HF and HindIII-HF (NEB), and purified with the QIAquick PCR purification kit (Qiagen). The inserts were ligated into a cleaved pUC19 cloning vector using T4 DNA ligase (NEB). Correct insertion was verified by sequencing (Eurofins).
coli BL21(DE3). Cells were grown in LB Medium at 37 °C until an OD 600 of 0.6-0.8, and expression was induced at 20 °C with 1 mM final concentration of IPTG (Carl Roth). Cells were harvested 18-20 h after induction by centrifugation at 4500 rpm and 4 °C.
The cell pellet was resuspended in lysis buffer A (50 mM Tris-HCl, 1 M NaCl, 10% glycerol, 10 mM imidazole, 10 mM β-mercaptoethanol, pH 7.5) complemented with one tablet of cOmplete, EDTA-free protease inhibitor cocktail (Roche). After addition of 1 mg lysozyme (Carl Roth), the resuspended cell pellet was incubated for 30 min on ice; afterwards, the cells were lysed by 30 min sonication on ice. The lysate was cleared by centrifugation at 18,500 rpm and 16 °C for 1 h. For L7Ae, the supernatant was mixed with lysis buffer containing 8 M guanidinium hydrochloride (GdnHCl) in a 1:3 ratio to a final GdnHCl concentration of 6 M. The denatured lysate was loaded on a 5 ml HisTrap FF column (Cytiva) using an Äkta Pure system with an external sample pump. After sample loading, the bound protein was refolded by reducing the GdnHCl concentration stepwise with 20 column volumes of lysis buffer. The refolded L7Ae was eluted in 0 M GdnHCl and up to 50% buffer B (50 mM Tris-HCl, 1 M NaCl, 10% glycerol, 1 M imidazole, 10 mM β-mercaptoethanol, pH 7.5). For Nop5 and fibrillarin, the supernatant was heated for 15 min at 80 °C and cleared by centrifugation at 18,500 rpm and 16 °C. The supernatant was loaded onto a 5 ml HisTrap FF column, which was washed six times with 3 column volumes of high-salt buffer C (50 mM Tris-HCl, 1 M NaCl, 10% glycerol, 10 mM imidazole, 2 M LiCl, 10 mM β-mercaptoethanol, pH 7.5). The proteins were eluted with buffer B. After elution, all proteins (L7Ae, Nop5, and fibrillarin) were buffer-exchanged into buffer A using a HiPrep 26/10 desalting column (Cytiva). The N-terminal His 6 -Tag was removed by overnight cleavage with TEV-protease (produced in-house) at room temperature. The reaction mixture was loaded on a 5 ml HisTrap FF column, which retained the TEV-protease and the cleaved His 6 -Tag, while the cleaved protein was collected with the flow-through.
… lysed by 30 min sonication on ice.
The lysate was cleared by centrifugation at 18,500 rpm and 16 °C for 1 h. The supernatant was loaded on a 5 ml HisTrap FF column, which was washed six times with three column volumes of high-salt buffer E (50 mM Tris-HCl, 1 M NaCl, 5% glycerol, 10 mM imidazole, 2 M LiCl, 10 mM β-mercaptoethanol, pH 7.5). The protein was eluted using a gradient up to 50% of buffer F (50 mM Tris-HCl, 1 M NaCl, 5% glycerol, 1 M imidazole, 10 mM β-mercaptoethanol, pH 7.5) and subsequently buffer exchanged into buffer G (50 mM Tris-HCl, 150 mM NaCl, 5% glycerol, 10 mM β-mercaptoethanol, pH 8.0) using a HiLoad desalting 26/10 column. To remove bound RNA, the eluate was loaded on a 5 ml QTrap HP column (Cytiva), from which the RNA-free protein was collected with the flow-through. The RNA and RNA-bound protein were eluted from the column with buffer H (50 mM Tris-HCl, 2 M NaCl, 5% glycerol, 10 mM β-mercaptoethanol, pH 8.0). The RNA-free protein was cleaved with TEV protease (produced in-house) to remove the N-terminal His 6 -Tag. The TEV protease and the cleaved tag were removed by affinity chromatography using a 5 ml HisTrap FF column. The purity of all proteins was confirmed by SDS gel electrophoresis and size exclusion chromatography (SEC).
snoRNA sequence annotation. Yeast snoRNA sequences were obtained from the yeast snoRNA database UMass-Amherst 41 . All snoRNAs considered here have experimentally verified methylation sites 52 . Box D and D' motifs were manually annotated using the consensus motif 5'-CUGA and had to start five nucleotides downstream of the canonical base-pair formed between the snoRNA and the target nucleotide in the rRNA. Our annotation was identical to that of the yeast snoRNA database UMass-Amherst 41 . Box C was annotated using the consensus sequence 5'-RUG AUG A and had to start within 5 nucleotides upstream of the 5' end. Box C' was annotated manually using the consensus sequence 5'-RUG AUG A and had to be positioned between the box D' motif and the guide sequence upstream of the box D motif. Our annotation corresponded to that reported in 26 for all snoRNAs but snR41. For snR41 we chose the sequence 5'-UAC AUGU instead of 5'-AUGU GCA as in 26 , because it yielded a stem II that was more similar to that of a genuine box C'/D' motif than the sequence chosen in 26 .
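The sequence part of the annotation rules above can be illustrated with a simple consensus search. The sketch below is a hedged illustration, not the manual procedure used in the study: it only locates matches to 5'-RUGAUGA (R = A or G) and 5'-CUGA, and it does not implement the positional criteria (distance from the guide-target base-pair or from the 5' end). The function `find_motif` and the sequence `TOY_RNA` are hypothetical; `TOY_RNA` is not a real snoRNA.

```python
import re

# Consensus sequences from the text; R stands for a purine (A or G).
# Lookaheads are used so that overlapping matches are also reported.
BOX_C = re.compile(r"(?=([AG]UGAUGA))")
BOX_D = re.compile(r"(?=(CUGA))")

def find_motif(pattern, rna):
    """Return (0-based start, matched sequence) for every consensus match in rna."""
    return [(match.start(), match.group(1)) for match in pattern.finditer(rna.upper())]

# Hypothetical toy sequence for illustration only.
TOY_RNA = "GGAUGAUGACUUACGCUGAAUCC"

if __name__ == "__main__":
    print("box C / C' candidates:", find_motif(BOX_C, TOY_RNA))
    print("box D / D' candidates:", find_motif(BOX_D, TOY_RNA))
```

Candidates returned by such a search would still need to be filtered by the positional rules and, as the Results show, a sequence match alone does not establish that the motif is functional.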

RNA transcription. All RNAs used for crystallization and the guide RNAs used for activity assays were produced by in vitro transcription, using T7 RNA polymerase produced in-house. Plasmids containing DNA templates were transformed into E. coli Top10; transformed cells were grown in LB medium overnight at 37 °C and harvested by centrifugation at 4500 rpm and 4 °C. Plasmids were extracted using the Qiagen Plasmid Mega Kit (Qiagen) and cleaved with PstI-HF (NEB). Linearized plasmid DNA was purified by phenol/chloroform/isoamyl alcohol and chloroform/isoamyl alcohol (Carl Roth) extraction and concentrated by precipitation with pure ethanol and NaCl. For the transcription of each RNA construct, the concentrations of DNA, nucleoside triphosphates (NTPs, Carl Roth), MgCl 2 and T7 polymerase were optimized to maximize the yield. Large-scale transcription reactions were run for five hours at 37 °C. All RNAs were purified using preparative, denaturing polyacrylamide gels containing 8 M urea. Purity was verified using analytical denaturing polyacrylamide gels.
Electrophoretic mobility shift assays. All RNAs used for the electrophoretic mobility shift assays (EMSA) were purchased from Integrated DNA Technologies (IDT) with a 5'-Cy5 label for fluorescence detection. Three nucleotides were added at the 5' end as a spacer between the Cy5 label and the desired RNA sequence (Supplementary Table 1). To ensure that the spacer nucleotides did not interfere with the RNA structure, a subset of binding assays was repeated with non-labeled RNAs lacking the spacer nucleotides and visualized by ethidium bromide staining. The results were equivalent in all cases.
In the fluorescence-detected binding assays, 10 pmol of RNA were mixed with sterile LC-MS grade water (Merck) and annealing buffer (final concentrations: 10 mM Tris-HCl, 100 mM NaCl, 1 mM EDTA, pH 7.5) in a total volume of 5 µl and annealed by heating to 80 °C and slow cooling to 4 °C in a T100 Thermocycler (Bio-Rad). After annealing, 0, 2.5, 5, 10 and 20 pmol of protein were added and incubated for up to 30 min at 4 °C. Afterwards, 5× native loading dye (50 mM Tris-HCl, 0.25% xylene cyanol, 0.25% bromophenol blue, 30% glycerol, pH 7.5) was added to each sample. All samples were analyzed on 10% native polyacrylamide gels at 4 °C. Each gel was pre-run for 0.5-1 h before sample loading. After sample loading, gels were run overnight at 4 °C and 10 mA. Gels were analyzed using a Typhoon Trio system (GE Healthcare) with a 670 nm bandpass (670 BP 30) emission filter for Cy5 detection. Intensities were extracted and analyzed using Fiji 53 .
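As a quick check of the titration arithmetic, the amounts given above can be converted into an RNA concentration and protein:RNA molar ratios. This is a minimal sketch. The variable names are illustrative, and the concentration refers to the 5 µl annealing volume only, because the volume after protein addition is not stated.

```python
# Amounts taken from the fluorescence-detected binding assay described above.
RNA_PMOL = 10.0
ANNEAL_VOLUME_UL = 5.0
PROTEIN_PMOL = [0.0, 2.5, 5.0, 10.0, 20.0]

# pmol per µl is numerically equal to µmol per l, i.e. µM.
# This is the RNA concentration in the annealing mix, before protein is
# added; the final sample volume is not given, so it is not modelled.
rna_conc_um = RNA_PMOL / ANNEAL_VOLUME_UL

# Protein:RNA molar ratio at each titration point.
molar_ratios = [protein / RNA_PMOL for protein in PROTEIN_PMOL]

for protein, ratio in zip(PROTEIN_PMOL, molar_ratios):
    print(f"{protein:5.1f} pmol protein -> {ratio:.2f} equivalents per RNA")
```

The titration therefore spans zero to two molar equivalents of protein per RNA, with the 1:1 point at 10 pmol.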
A subset of EMSAs (L7Ae or Snu13 with sR26-kl, snR51-kl1, snR51-kl2, snR41-kl1, snR41-kl2, snR54-kl1 and snR54-kl2) was repeated with non-labeled RNAs lacking the spacer nucleotides between the RNA and the dye at the 5' end. In these assays, 0.5 nmol of non-labeled RNA were mixed with pure, sterile LC-MS grade water (Merck) and annealing buffer (final concentrations: 10 mM Tris-HCl, 100 mM NaCl, 1 mM EDTA, pH 7.5) in a total volume of 5 µl and annealed by heating to 80 °C for 1 min and slow cooling to 4 °C in a T100 Thermocycler (Bio-Rad). After annealing, 0, 0.125, 0.25, 0.5 and 1 nmol of protein (L7Ae or Snu13) were added, and the mixture was incubated for 30 min at 4 °C. All samples were then analyzed on 10% native polyacrylamide gels at 4 °C to prevent diffusion and degradation of the protein and the RNA. Gels were stained with ethidium bromide, and the RNA was visualized using a Gel Doc XR+ gel documentation system (Bio-Rad).
A concentrated solution of ~10 mg/ml of L7Ae-snR51-kl1-S was used for crystallization by sitting-drop vapor diffusion. Initial crystallization screens were set up with a Crystal Phoenix crystallization robot (Art Robbins Instruments) using NeXtal DWBlock Suites (Qiagen): JCSG Core I Suite, JCSG Core II Suite, JCSG Core III Suite, JCSG Core IV Suite, Nucleix Suite, PEG Suite and PEG II Suite. The drop solution was equilibrated against 200 µl of reservoir solution at 18 °C. Crystals appeared after one week in several conditions across all initial screens. The best crystal was obtained in 0.02 M CaCl2, 0.1 M sodium acetate, 30% 2-methyl-2,4-pentanediol (MPD) (G12 from Qiagen JCSG Core I Suite). Cryo-protection was achieved by the addition of 10% (2R,3R)-2,3-butanediol before flash-freezing.
Crystallographic data collection and processing. Data were collected at beamline P11 of the PETRA III storage ring, DESY (Deutsches Elektronen-Synchrotron, Hamburg, Germany) 54 . The dataset yielding the structure was recorded at 100 K and a wavelength of 1 Å and processed using the autoPROC toolbox (Global Phasing) 55 , executing XDS 56 as well as Pointless 57 and Aimless 58 from the CCP4 program suite 59 . The high-resolution cutoff was determined using a signal-to-noise ratio (I/σ(I)) of 2.0.
Structure determination. The number of molecules in the asymmetric unit was determined using Xtriage 60 from the Phenix software package 61 . L7Ae from P. furiosus (PDB-ID: 4WB0, sequence identity: 100%) was identified as a suitable search model for molecular replacement by executing Balbes 62 from the CCP4 program suite 59 . The molecular replacement solution, which contained only the protein component, was improved by executing the AutoBuild tool 63 from the Phenix software package 61 . The missing RNA component was then built by the AutoBuild tool around the fixed model of the improved protein component. The crystal structure of the protein-RNA complex was further improved and finalized by iterative cycles of model building in Coot 64 and refinement in Phenix.refine 65 . Data collection and refinement statistics are summarized in Table 1.
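The I/σ(I) criterion used for the high-resolution cutoff during data processing can be illustrated as follows. This is a sketch of the general idea only and not the procedure implemented in the processing software. The function name `resolution_cutoff` is hypothetical, and the per-shell values are invented for illustration; they are not taken from Table 1.

```python
from typing import Optional, Sequence, Tuple


def resolution_cutoff(
    shells: Sequence[Tuple[float, float]], threshold: float = 2.0
) -> Optional[float]:
    """Return the high-resolution limit (in Å) of the last shell that
    still satisfies mean I/sigma(I) >= threshold.

    `shells` holds (d_min in Å, mean I/sigma(I)) per resolution shell.
    Shells are walked from low resolution (large d) to high resolution
    (small d), and the walk stops at the first shell below the threshold.
    """
    cutoff = None
    for d_min, i_over_sigma in sorted(shells, reverse=True):
        if i_over_sigma < threshold:
            break
        cutoff = d_min
    return cutoff


# Invented example shells: (d_min in Å, mean I/sigma(I)). NOT real data.
hypothetical_shells = [
    (4.0, 25.1), (3.2, 14.3), (2.8, 7.9), (2.5, 3.6),
    (2.3, 2.1), (2.2, 1.4), (2.1, 0.8),
]

cutoff = resolution_cutoff(hypothetical_shells, threshold=2.0)
print(f"high-resolution cutoff: {cutoff} Å")
```

Stopping at the first shell below the threshold, rather than taking the highest-resolution shell above it anywhere in the list, avoids accepting an isolated noisy shell beyond the point where the signal has already decayed.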
Complex assembly. All RNP complexes were assembled in complex buffer (20 mM sodium phosphate,