Usb1 controls U6 snRNP assembly through evolutionarily divergent cyclic phosphodiesterase activities

U6 small nuclear ribonucleoprotein (snRNP) biogenesis is essential for spliceosome assembly, but not well understood. Here, we report structures of the U6 RNA processing enzyme Usb1 from yeast and a substrate analog bound complex from humans. Unlike the human ortholog, we show that yeast Usb1 has cyclic phosphodiesterase activity that leaves a terminal 3′ phosphate which prevents overprocessing. Usb1 processing of U6 RNA dramatically alters its affinity for cognate RNA-binding proteins. We reconstitute the post-transcriptional assembly of yeast U6 snRNP in vitro, which occurs through a complex series of handoffs involving 10 proteins (Lhp1, Prp24, Usb1 and Lsm2–8) and anti-cooperative interactions between Prp24 and Lhp1. We propose a model for U6 snRNP assembly that explains how evolutionarily divergent and seemingly antagonistic proteins cooperate to protect and chaperone the nascent snRNA during its journey to the spliceosome.

S plicing of precursor messenger RNA is an essential process in all eukaryotes and is catalyzed by the spliceosome. The spliceosome is a dynamic macromolecular machine composed primarily of five ribonucleoprotein particles known as the U1, U2, U4, U5 and U6 small nuclear ribonucleoproteins (snRNPs), each containing a small nuclear RNA (snRNA) and numerous proteins. The highly conserved U6 snRNA coordinates magnesium ions in the active site that are required for splicing catalysis 1 . Unlike the other snRNAs, U6 is synthesized by RNA polymerase III (Pol III) and, like other Pol III transcripts, its transcription is terminated stochastically when the polymerase encounters a stretch of adenines in the template strand (Fig. 1a) [2][3][4] with Saccharomyces cerevisiae requiring at least six sequential adenines in the template strand to terminate efficiently 5 . Nascent U6 terminates in a 3′ polyuridine tract of heterogeneous length (4-8 uridines) with terminal 2′ and 3′ hydroxyl groups (a cis-diol) (Fig. 1a) and is bound by the La protein (Lhp1 in S. cerevisiae) 6,7 . However, the predominant form of mature U6 in vivo in most organisms characterized to date does not contain a terminal cis-diol and is not bound by La protein 8 .
Post-transcriptional exonucleolytic processing of U6 is directed by U6 biogenesis protein 1 (Usb1) 9, 10 , a 3′-5′ exonuclease belonging to the 2H phosphodiesterase superfamily of enzymes 9,11 . Usb1 is an essential protein in S. cerevisiae 9, 12 , but not in Schizosaccharomyces pombe 10 and metazoans 10,11,13 . In yeast, it is likely that immature U6 RNA is the only essential target of Usb1, as overexpression of this snRNA rescues deletion of yUsb1 11 . Human Usb1 is involved in processing precursors of One-dimensional 31 P NMR spectra of 2′,3′-cUMP shows a single peak at 20 ppm. A 3′ UMP standard has a single peak at 3.4 ppm. When 2′,3′-cUMP is incubated with AtRNL, which leaves a 2′ phosphate 8 , there is a single peak at 3.2 ppm. Incubation of 2′,3′-cUMP with yUsb1 produces a new signal at 3.4 ppm (c, bottom) Zoom of dashed region in top panel. d Time course of Usb1 processing on RNAs with different 3′ end modifications. yUsb1 is most active on RNA substrates with a cis-diol (lanes 1-4), less active on those with a 2′,3′-cyclic phosphate (>p; lanes 5-8) or 2′ phosphates (2′P; lanes 9-12), and is inactive on 3′ phosphate ends (3′P; lanes [13][14][15][16]. e Model describing the dual activities of yUsb1 both U6 and the minor spliceosomes snRNA U6atac 13 . Loss-offunction mutations in human Usb1 are associated with poikiloderma with neutropenia, a rare skin disease that is also associated with loss of white blood cells 14 .
Binding of Lsm2-8 in turn promotes formation of the U6 and U4/U6 snRNPs [16][17][18] . Thus, the identity of the 3′ end of U6 RNA is a crucial determinant in the assembly of U6-containing snRNPs. However, the end modification of the terminal nucleotide in yeast differs from that in humans, with yeast primarily containing a noncyclic phosphate group 8 . Prior to this work, it was not known if yUsb1 directly promotes formation of the terminal phosphate, or if an additional cyclic phosphodiesterase (CPDase) acts on U6 RNA after exonucleolytic processing. Furthermore, the position of the terminal phosphate on yeast U6 RNA (at either the 2′ or 3′ oxygen) was not known, and cannot be resolved in recent cryo-EM structures due to local resolutions of >7 Å for the U6 3′ tail [19][20][21][22][23][24] .
To investigate the mechanism of U6 RNA processing and snRNP assembly, we characterized the structure and activities of yeast and human Usb1. We determined the 1.8 Å crystal structure of the catalytic domain of yUsb1, and a 1.4 Å co-crystal structure of hUsb1 bound to the substrate analog uridine 5′-monophosphate (5′ UMP). We demonstrate the importance of the identity of the 3′ end of U6 to snRNP formation and show how U6 RNA is involved in a series of protein-mediated handoffs prior to formation of the mature U6 snRNP.

Results
Yeast Usb1 exhibits cyclic phosphodiesterase activity. We sought to understand the activity of Usb1 from S. cerevisiae (yUsb1). To this end, we prepared full-length yUsb1 protein and tested it for exoribonuclease activity in vitro using oligonucleotide model substrates. When incubated with RNA terminating in multiple uridines and a cis-diol, yUsb1 predominately removed only 1 nucleotide from the 3′ end of an RNA, with 80% of the substrate converted into the n-1 product (Fig. 1b). Interestingly, 15% of the n-1 product had a slightly slower mobility consistent with it containing a cyclic phosphate 3′ end 25 . Removal of one or two additional nucleotides occurs infrequently (15 and 3% of the total product, respectively) (Fig. 1b). Substrates with a deoxyuridine at the n-1 position were not processed by yUsb1 ( Supplementary Fig. 1a), indicating that yUsb1 acts exclusively as a 3′-5′ exonuclease and that additional cleavage products are due to inefficient re-processing of n-1 RNA. Additionally, we find that yUsb1 removes 1-3 nucleotides regardless of the length of the polyuridine tail ( Supplementary Fig. 1b).
We investigated the product of yUsb1 processing by exploiting the disparate activities of calf intestinal phosphatase (CIP) and T4 polynucleotide kinase (PNK) (Fig. 1b). Both CIP and T4 PNK can remove terminal 3′ phosphoryl groups from oligonucleotides to produce cis-diols, but only T4 PNK can remove both cyclic and noncyclic terminal phosphates. Treatment of yUsb1-processed RNA with CIP or T4 PNK both resulted in reduced mobility of the products, consistent with the presence of a noncyclic phosphate group on yUsb1-processed RNA (Fig. 1b). Thus, yUsb1 directly catalyzes the formation of a noncyclic phosphate, and U6 3′ end processing does not require a trans-acting 2′,3′-cyclic phosphodiesterase in yeast. Yeast Usb1 catalyzes two distinct chemical reactions: (1) exonucleolytic removal of a terminal uridine ("first step") and (2) cyclic phosphodiesterase (CPDase) ring opening to leave a noncyclic phosphate ("second step").
To unambiguously identify whether yUsb1 leaves a 2′ or 3′ phosphate, we investigated the CPDase activity of yUsb1 in isolation from its exonuclease activity using nuclear magnetic resonance (NMR) spectroscopy (Fig. 1c). The 31 P chemical shift for a 2′,3′-cyclic phosphate is~20 ppm, whereas noncyclic 2′ and 3′ phosphates of UMP have unique and well-resolved chemical shifts between 3 and 3.5 ppm. (Fig. 1c). When 2′,3′-cyclic UMP (cUMP) is incubated with yUsb1, a new peak at 3.4 ppm is formed that corresponds precisely to the chemical shift of 31 P in a 3′ UMP standard ( Fig. 1c) 26 . Additionally, the H6 resonance of uracil is well documented to be highly sensitive to the position of the phosphate at the 2′ vs. 3′ position 26,27 and further confirms that the product is a 3′ phosphate ( Supplementary Fig. 2a). Finally, two-dimensional 1 H-31 P heteronuclear multiple bond correlation (HMBC) and 1 H-1 H correlation spectroscopy (COSY) spectra unambiguously show that the product is a 3′ phosphate ( Supplementary Fig. 2b, c). These data demonstrate that yUsb1 has CPDase activity that catalyzes the formation of a terminal 3′ phosphate, unlike metazoan Usb1, which lacks CPDase activity altogether 10,11 .
To determine how yUsb1 can occasionally remove more than one nucleotide, we tested its ability to process RNA substrates with different 3′ ends. yUsb1 is less efficient on 2′ phosphateterminated substrates and is inactive on RNAs with a terminal 3′ phosphate (Fig. 1d). Substrates terminating in a 2′,3′-cyclic phosphate are processed by Usb1 with slower kinetics and to a lesser extent (Fig. 1d, lanes 6-8). These data demonstrate that yUsb1 is incapable of further processing its dominant 3′ phosphate product and that successive (n-2 and n-3) products are likely formed from a cyclic phosphate intermediate that is reprocessed prior to second step CPDase chemistry. The amount of slower mobility n-1 cyclic phosphate product is consistent with the relative kinetics of n-2 product formation (Fig. 1b). The presence of both exonuclease and CPDase activity, along with inactivity on 3′ phosphate-terminated substrates, reveals an elegant mechanism for ensuring that yUsb1 does not overprocess and degrade U6 RNA.
The conserved architecture and active site of Usb1. We determined the crystal structure of the catalytic domain of yUsb1 (amino acids (a.a.) 71-290) to 1.8 Å resolution ( Fig. 2a and Table 1). The enzyme exhibits a typical 2H phosphodiesterase fold 28 with an active site containing two H-X-S motifs (Fig. 2b). yUsb1 crystallized in the presence of 2 M ammonium sulfate, and the resulting structure contained a sulfate ion coordinated within the active site which likely mimics the binding mode of the scissile phosphate ( Fig. 2b and Supplementary Fig. 3). NE2 of H231 is 2.8 Å from the sulfate ion, while H133 is farther away (4 Å). Residues 107-115 could not be modeled in our structure and are presumed to be disordered. Despite low sequence identity (<20%), the structure of yUsb1 is strikingly homologous to that of human Usb1 11 , with nearly superimposable active sites (root mean square deviation for H/S in active site=0.3 Å).
Structure of human Usb1 with a substrate analog. To understand how Usb1 recognizes RNA and catalyzes 3′ end processing, we sought to determine the co-crystal structure of Usb1 with a substrate analog. We obtained a structure of truncated hUsb1 (a.a. 78-265) with 5′ UMP at 1.4 Å resolution (Fig. 2c), using the same crystal form as Hilcenko et al. 11 , and incorporation of ligand via soaking of 10 mM 5′ UMP at pH~6.5. In the structure of the apoenzyme at pH 5.6 (Protein Data Bank (PDB) ID 4H7W), the active site histidine H120 appears in two conformations, with one conformation tilted away from the active site and a proximal conformation stabilized by the coordination of a water molecule 11 . In the 5′ UMP-containing structure, H120 adopts the proximal conformation, and is 3.1 Å from an oxygen of the phosphate group in 5′ UMP (Fig. 2d). The 5′ UMP is held in place by hydrogen bonding interactions with active site residues H120, S122, S210 and H208, and a stacking interaction with Y202. The 5′ phosphate of 5′ UMP is positioned near the center of the active site, close to the position of the sulfate ion in the yeast active site which would correspond to the scissile phosphate that is involved in the first exonucleolytic step. We note that the hydrogen bonding interaction between H208 and the O5′ oxygen is positioned for general acid catalysis (Fig. 2d), consistent with an enzymatic mechanism proposed elsewhere 11 . We hypothesize that the binding mode of 5′ UMP in our structure of hUsb1 is analogous to how the last nucleotide of an RNA substrate would be coordinated within the active site.
Comparison of human and yeast Usb1 enzymatic activities. We further characterized the enzymatic activities of human and yeast Usb1 in order to better understand how they differ with respect to U6 processing. hUsb1 removes multiple nucleotides over the course of an hour (Fig. 2e), as observed previously 11 . Interestingly, hUsb1 is also strongly inhibited by a 3′ phosphate terminated RNA ( Fig. 2f and ref. 11 ). Thus, the mechanistic underpinning for the ability of hUsb1 to remove multiple nucleotides from U6 arises from its inability to catalyze "second step" cyclic phosphate ring opening. The structure of hUsb1 with 5′ UMP shows that RNAs with 3′ modifications would be sterically occluded by a loop spanning residues 161-167, suggesting that this region of the protein may contribute to substrate specificity. Human Usb1 was less active in our assay conditions (pH 6.5) than in previous assays conducted at pH 8.0 11 . To reconcile this difference, we monitored the pH dependence of human Usb1 and found that it had a pH optimum of 7.5, in contrast to yeast Usb1, which exhibited a significantly different activity range and a pH optimum of 6.5 ( Supplementary Fig. 4a, b). yUsb1 remains partially active even at pH 4, whereas hUsb1 is completely inactive. In contrast, at pH 10, yUsb1 is completely inactive, whereas hUsb1 retains activity. This difference in pH optimum suggests that the active site histidine residues in yeast and hUsb1 have markedly different pK a 's, likely due to different active site microenvironments. For example, hUsb1 has a histidine (H84) adjacent to the active site that can hydrogen bond to the active site serine S122 11 . In yeast, this position is a phenylalanine (F78) which cannot form an analogous hydrogen bond.
A residue adjacent to the active site influences activity. As yeast and human Usb1 exhibit remarkably similar H-X-S active sites, we inspected residues surrounding the active site for additional explanations as to how yeast and human Usb1 exhibit divergent enzymatic behaviors. We observed that Phe78 in yeast and His84 in human Usb1 are structurally homologous, suggesting a possible role for influencing water nucleophilicity during second step chemistry, or modulation of active site pK a (see above section) (Fig. 3a). With the exception of S. pombe Usb1, aromaticity (but not amino acid identity) is conserved at this position in other organisms (Fig. 3b). We therefore asked if mutations in yUsb1 at position 78 would have a phenotype in S. cerevisiae. Indeed, mutating F78 to alanine causes a slow growth phenotype in vivo (Fig. 3c) and a >300-fold reduction in the rate of processing in vitro (Fig. 3d). Surprisingly, F78H has no observable growth phenotype (Fig. 3c), yet has a 47fold reduction in in vitro processing rate (Fig. 3d). Thus, significant reductions in Usb1 activity but not complete loss of the protein (Supplementary Table 1) 11 are tolerated by yeast.
Since mutation of F78 to histidine (as in hUsb1) supported yeast growth, we next asked if hUsb1 could complement deletion of USB1 in yeast. When expressed from a high-copy plasmid using a GPD promoter, wild-type hUsb1 allows for an extreme slow growth phenotype, with small colonies visible only after >3 days (Fig. 3e). We substituted H84 in hUsb1 for the phenylalanine found in that position in yUsb1 and determined if that improved yeast proliferation. Indeed, hUsb1-H84F rescues yeast growth, whereas H84A does not, suggesting that H84F is a true gain-of-function mutant (Fig. 3e). Thus, mutations at this position greatly affect both in vitro processing rate and in vivo viability. Conversely, substitution of many perfectly conserved residues outside of the active site (i.e., yUsb1 Y80) has no effect on yeast viability (Supplementary Table 1).
N-terminus of Usb1 is essential for yeast viability. Orthologs of Usb1 contain an N-terminal region that is predicted to be disordered 29,30 . Surprisingly, the N-terminal region is more conserved across species than the catalytic domain (Fig. 4a).
Although the length of the domain is highly variable, there are two regions that are conserved in sequence: a Y(S) N (D/E) motif in the first 20 amino acids, and a proline-rich region in the center of the domain. The first serine-rich motif could be a site for posttranslational modification, while the second, proline-rich region could be important for a protein-protein interaction, stability or expression of Usb1, as proline-rich regions are important for many protein-protein interactions 31 . In this region of~20 amino acids,~20% of the residues (and up to 35% in Glycine max Usb1) are prolines (Fig. 4a). We asked what role, if any, the N-terminal domain plays in catalysis or function of yUsb1 in vivo.
We first compared the in vitro activity of the catalytic domain of yUsb1 (a.a. 71-290) to that of full-length yUsb1. The N-terminal region (residues 1-70) has no observable effect in our in vitro exonuclease assay as the rate and extent of processing for full-length and the catalytic domain of yUsb1 are indistinguishable (Fig. 4b). Hilcenko et al. 11 previously reported that an N-terminal deletion of yUsb1 (Usb1 77-290) failed to complement USB1Δ in yeast. Our crystal structure reveals that residues 73-76 form the start of a β-strand and make several intramolecular contacts that are likely important for folding. However, we find that yUsb1 71-290 (Δ70) is also insufficient to support yeast growth (Fig. 4c). This result is surprising, because the catalytic domain is fully active in vitro and because the N-terminal region (residues 1-70) is predicted to be largely disordered. By making successive truncations (Supplementary Table 1), we found that residues 43-290 are essential for viability (Fig. 4c). This essential fragment correlates with the start of the proline-rich region (Fig. 4a). The inviability of Δ70 or Δ50 alleles of USB1 could not be rescued by inclusion of an N-terminal SV40 nuclear localization signal (Supplementary Table 1). This suggests that the role of the N-terminus is more complex than controlling subcellular localization of Usb1.
When full-length and truncations of yUsb1 are expressed under control of a GPD promoter to achieve greater expression levels, we find that more extensive truncations of the N-terminus can complement USB1Δ; however, the Δ70 allele is still lethal (Supplementary Fig. 5a). Western blot analysis shows that truncation of the N-terminal region of yUsb1 results in reduced levels of Usb1 ( Supplementary Fig. 5b). Deletion of the first 42 or 44 residues results  in a significantly reduced signal from Usb1, and deletion of the first 70 amino acids results in an undetectable amount of Usb1, suggesting that the N-terminus of Usb1 plays a role in protein stability. Taking advantage of the viable hUsb1-H84F mutation, we discovered that the N-terminal region of hUsb1 (residues 1-78) is also required for growth (Fig. 4d). This result suggests that the N-terminal domain may possess a conserved function. How the N-terminal domain influences stability, whether through an as-yet unidentified interaction or through another mechanism, remains to be determined.
Usb1 processing directly controls formation of U6 RNPs. We next investigated the impact of U6 3′ end processing on the affinity of U6 snRNA 3′ end binding proteins. We tested binding of Lhp1 and Lsm2-8 using a fragment of the U6 3′ end (U6 95-112) with either a cis-diol or a 3′ phosphate. As expected, Lhp1 greatly prefers to bind a cis-diol, with a K d of 43 nM, with essentially no specific binding to a 3′ phosphate ( Fig. 5a and Table 2) 32 . In contrast, Lsm2-8 can bind both a cis-diol and a 3′ phosphate, but preferentially binds a 3′ phosphate (K d of 85 vs. 10 nM for U6 95-112 with a cis-diol or 3′ phosphate, respectively) ( Fig. 5b and Table 2). Thus, exchange of a cis-diol for a 3′ phosphate eliminates Lhp1 binding 32 and improves Lsm2-8 binding affinity for U6. In humans, Lsm2-8 has been reported to preferentially bind a cyclic phosphate over a cis-diol 15 , suggesting that the Lsm2-8 complex in different organisms has evolved to bind the product of Usb1 processing.
Next, we directly monitored the effect of Usb1 processing on Lhp1 and Lsm2-8 binding to oligonucleotides and full-length U6 RNA. Usb1 processing of oligonucleotides produces binding profiles for Lhp1 that mirror the results with chemically defined 3′ ends (Fig. 5c, d), except for the fact that additional shorter products are also formed (Figs. 1b and 5e) that reduce Lsm2-8 affinity, as Lsm2-8 does not efficiently bind an RNA with only three terminal uridines 34 . The effect of Usb1 processing on U6-binding proteins also extends to full-length U6. Using full-length U6 with an additional uridine at the 3′ end (U6 1-112 +1U), we tested the affinity of Lhp1, Lsm2-8 and Prp24 before and after Usb1 processing via native gel shift (Fig. 5f). Lsm2-8 can bind both before and after Usb1 treatment (Fig. 5f, lanes 3-5 vs. 6-8), but binding is slightly reduced due to the reduction in the U-tail length (Fig. 5b). Lhp1 binding is most sensitive to Usb1 processing, with tight binding before processing and virtually no binding after (Fig. 5f, lanes 9-11 vs. [12][13][14]. Prp24 affinity is unchanged (Fig. 5f, lanes 15-17 vs. [18][19][20], as expected from the structure of the Prp24-U6 RNA complex 35 . These data clearly demonstrate that the modification at the 3′ end is the most important determinant for Lhp1 and Lsm2-8 binding and that processing by Usb1 controls the transition from immature (Lhp1-bound) to mature (Lsm2-8 bound) U6.
Ordered binding in the U6 snRNP assembly pathway. It is well established that Lhp1 binds to newly synthesized U6 RNA 36,37 . Previous work on the subsequent steps in the U6 lifecycle has established that Prp24 and Lsm2-8 cooperatively bind U6 RNA 15 and that Prp24 and Lsm2-8 directly interact via the C-terminus of Prp24 18 . However, the effect of Prp24 on binding of Lhp1 has not been investigated. We tested co-binding of Prp24 and either Lhp1 or Lsm2-8 (Fig. 6a). Prp24 binds U6 with a K d that is 10fold lower than Lhp1 (Didychuk et al. 17 ; Fig. 6a, lanes 2-6 and 7-11). When U6 RNA is incubated with equimolar amounts of Prp24 and Lhp1, it is preferentially bound by Prp24 and a ternary complex is not visible (lanes [12][13][14][15][16]. For example, Lhp1 alone is mostly bound to U6 at 160 nM, but in the presence of Prp24 no Lhp1 binding is observed until 640 nM (Fig. 6a, compare lanes 4  and 16). Formation of a U6-Lhp1-Prp24 ternary complex occurs only at high concentrations (lane 16). These data suggest that Prp24 binding is anti-cooperative with binding of Lhp1. In contrast, Prp24 and Lsm2-8 bind cooperatively and efficiently form a ternary complex (Fig. 6a, lanes 22-26). While Lsm2-8 binds U6 with a cis-diol relatively weakly by itself (Fig. 6a,  lanes 17-21), inclusion of Prp24 strongly promotes formation of a ternary complex (Fig. 6a, lanes 22-26) even at the lowest concentration tested and despite the lack of a phosphoryl group on U6 to promote Lsm2-8 binding.
The apparent negative cooperativity of Prp24 and Lhp1 binding is intriguing, since known Prp24 and Lhp1 binding sites on U6 RNA are presumed to be non-overlapping 35,38 . We tested the affinity of Prp24 or Lhp1 for U6 pre-saturated with the opposing binding partner (Fig. 6b). When U6 is pre-bound by Lhp1, Prp24 displaces U6 from U6-Lhp1 (Fig. 6b, lanes 9-14). Consistent with these data, U6 pre-bound by Prp24 is not released from U6-Prp24 by the addition of excess Lhp1 (Fig. 6b,  lanes 23-28). In both cases, ternary complex formation is inefficient and occurs only at high concentration, suggesting that Prp24 and Lhp1 are near-mutually exclusive for binding. From these data we propose a model for U6 snRNP assembly (Fig. 6c). Synthesis by Pol III followed by binding of Lhp1 protects nascent U6 from 3′ exonucleases. Binding of U6 by Prp24 is anticooperative with binding of Lhp1, allowing the freed 3′ tail that can then be recognized by Usb1 to produce a 3′ phosphate. The presence of a 3′ phosphate group prevents re-binding of Lhp1 and allows for recruitment of Lsm2-8, whose interaction is stabilized by recognition of the 3′ phosphate (Fig. 5) and by interactions with the Prp24 C-terminal SNFFL box 15,18 . The U6 snRNP can then join with the U4 snRNP 17 for efficient incorporation into the tri-snRNP and the spliceosome.

Discussion
Members of the 2H phosphodiesterase superfamily are found in viruses, Archaea, bacteria and eukaryotes, and are involved in diverse pathways of RNA metabolism. 2H phophodiesterase enzymes are highly divergent in sequence outside of the active site H-X-S/T motif 28  Yeast Usb1 possesses two RNA processing activities (3′-5′ exonuclease activity and 2′,3′-cyclic nucleotide 2′ phosphodiesterase activity), while hUsb1 lacks CPDase activity. Our results suggest that while the active sites of the 2H phosphodiesterase family are highly similar, residues immediately outside of the active site likely play a previously unappreciated role in RNA recognition and catalysis. Indeed, three highly structurally homologous enzymes-ThpR, yUsb1 and hUsb1-possess virtually superimposable H-X-S/T active sites, but have unique 2′,3′-cyclic phosphodiesterase activities (catalyzing formation of 2′ phosphate 43 , 3′ phosphate (this work) and 2′,3′-cyclic phosphate 11 products, respectively). Enzyme active sites are dynamic and highly sensitive to sub-angstrom scale differences, making prediction of structure-activity relationships difficult. Further study of other 2H phosphodiesterases may reveal additional diversity of substrate specificities and activities for this superfamily. The 3′ phosphate terminus generated by yUsb1 inhibits subsequent exonucleolytic processing, whereas hUsb1 leaves a 2′,3′-cyclic phosphate terminus which is a substrate for an additional exonucleolytic step, resulting in successive trimming of U6 11 . Humans also possess a counteracting TUTase that extends the 3′ tail of U6 48,49 . S. cerevisiae has no identifiable TUTase and thus the autoinhibition of yUsb1 by its own product is necessary to prevent overprocessing of U6. The combined dual exonuclease and CPDase activity of Usb1 accomplishes three important functions to initiate U6 snRNP biogenesis: it prevents further 3′ end processing, improves Lsm2-8 affinity for U6 RNA and, perhaps more importantly, prevents Lhp1 from rebinding. A 3′ phosphate might encounter steric clash with nearby loop residues of Usb1.
The evolutionary divergence in Lsm2-8 binding affinity appears to have co-evolved with the activity of Usb1. Yeast Lsm2-8 tightly    7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 13 . This, along with the observation that overexpression of U6 can rescue loss of Usb1 in S. cerevisiae 11 , suggests that Usb1 is an exonuclease that is highly specific for U6 RNA. How is such stunning specificity accomplished within the cell, where many other RNA polymerase III transcripts also terminate in a polyuridine stretch with a 2′,3′cis-diol? Usb1 is similarly active on short RNA oligonucleotides and on full-length U6 snRNA, excluding specific recognition of U6 secondary structure by Usb1. We hypothesize that specificity stems from the lack of accessible substrates for Usb1 due to binding of Lhp1 on the majority of (U) n cis-diol-containing RNAs. Binding of Prp24, however, is highly specific for the asymmetric bulge in U6 snRNA 17,35 . We have demonstrated that Prp24 binding to U6 lowers the apparent affinity of Lhp1 (Fig. 6a, b); this, in turn, would allow for Usb1 to modify the free 3′ end and thereby promote binding of Lsm2-8. The mechanism for handoff of U6 from Lhp1 to Prp24 is currently unknown, but may be caused by partially overlapping binding sites of Prp24 and Lhp1, occlusion of Lhp1 via electrostatic repulsion or induction of RNA folding that is unfavorable for Lhp1 association. Future studies aimed at determining the origin of the observed anti-cooperative binding could reveal fundamental principles of ordered, multi-step pathways of RNP assembly.
This handoff of U6 RNA, from Lhp1 to Prp24 to Prp24/ Lsm2-8, ensures that full-length and properly 3′-end-modified U6 is guarded from exonucleases in the cell. Presence of a terminal 3′ phosphate may partially protect U6 RNA from degradation by the exosome, as Rrp6 is inactive on a 3′ phosphate-terminated RNA 50 , but Rrp44/Dis3 is active on such substrates 51 . This work illustrates how Usb1 processing initiates U6 snRNP formation through a series of protein-RNA interactions that have evolved to protect U6 snRNA and chaperone it into the active site of the spliceosome.

Methods
Protein expression and purification. The Usb1 coding sequence was PCR amplified from S. cerevisiae genomic DNA with primers to introduce flanking BamHI and XhoI sites and subcloned into a pET28b plasmid, which encodes an N-terminal hexahistidine tag followed by a TEV cleavage site. The resulting cloning scar was removed using the inverse PCR method with Phusion DNA polymerase (New England Biolabs). All primer sequences are listed in Supplementary Table 3. PCR products were DpnI treated, ligated using T4 PNK and T4 DNA ligase and transformed into Escherichia coli NEB 5α-competent cells (New England Biolabs). Expression plasmids contained Usb1 residues 1-290 (full-length) or 71-290 (catalytic domain). Mutants of these plasmids were obtained using inverse PCR as described above. Resulting clones were expressed in E. coli BL21 STAR (DE3) pLysS cells (Invitrogen) in LB at 37°C with late-log phase induction by addition of 1 mM IPTG and subsequent growth for 3 h at 37°C (for Usb1 71-290) or 16°C (for Usb1 1-290 and all other proteins used in this study). Cells were collected by centrifugation, resuspended in IMAC buffer (500 mM NaCl, 50 mM HEPES acid, 50 mM sodium HEPES base, 15 mM imidazole base, 10% v/v glycerol, 1 mM TCEP-HCl) supplemented with DNase I, lysozyme and protease inhibitors (EMD Millipore) and lysed via sonication. Insoluble material was removed by centrifugation. Usb1 was purified by Ni-NTA agarose chromatography by step elution with IMAC buffer containing 500 mM imidazole. The eluate was dialyzed at 4°C overnight with 1 mg TEV protease into IEX-HEPES buffer (100 mM NaCl, 10 mM HEPES acid, 10 mM sodium HEPES base, 10% v/v glycerol, 1 mM TCEP-HCl, pH~7.0) for Usb1 71-290 or IEX-bistris buffer (100 mM NaCl, 20 mM bis-tris, 10 mM HCl, 10% v/v glycerol, 1 mM TCEP-HCl, pH~6.2) for Usb1 1-290. Precipitated protein was removed, then protein was further purified via cation-exchange chromatography (HiTrap S, GE Healthcare) in IEX buffer with gradient elution against IEX buffer containing 2 M NaCl.
The coding sequence for Homo sapiens Usb1 was codon optimized for expression in E. coli. The sequence of the synthetic hUsb1 gene is shown in Supplementary Table 4. For functional assays, this sequence was cloned via NdeI and BamHI restriction cloning into a modified pET3a plasmid (Novagen) containing an N-terminal octahistidine tag, glutathione-S-transferase (GST), and a TEV (tobacco etch virus) cleavage site. Human Usb1 (residues 79-265) used in crystallography experiments was similarly cloned into a modified pET3a plasmid (Novagen) containing an N-terminal octahistidine tag, MBP, and a TEV cleavage site. Protein was expressed and purified as described above using Ni-NTA agarose chromatography and cation-exchange chromatography with IEX-HEPES buffer.
The coding sequence for S. cerevisiae Lhp1 protein (residues 1-275) was codon optimized for expression in E. coli. The sequence of the synthetic Lhp1 gene is shown in Supplementary Table 4. A pET3a plasmid (Novagen) was modified using inverse PCR as described above to encode an N-terminal octahistidine tag followed by a TEV cleavage site. The coding sequence for Lhp1 was cloned into this plasmid using the NdeI and BamHI sites. Protein was expressed and purified as described above using Ni-NTA agarose chromatography and cation-exchange chromatography with IEX-HEPES buffer.
The Prp24 coding sequence was PCR amplified from S. cerevisiae genomic DNA with primers to introduce flanking NdeI and XhoI sites and subcloned into a pET21b plasmid, which encodes a C-terminal hexahistidine tag. Prp24 was expressed as described above in Terrific Broth. Prp24 was purified by Ni-NTA agarose chromatography as described above using Ni-NTA agarose chromatography with IMAC buffer containing 50 mM imidazole, dialyzed without TEV protease and purified via heparin chromatography (HiTrap Heparin, GE Healthcare) in IEX-HEPES as described above with the addition of 1 mM sodium azide.
S. cerevisiae Lsm2-8 lacking the C-terminus of Lsm4 was expressed from pQLink-Lsm2-8 34 in E. coli BL21 STAR (DE3) pLysS cells in Terrific Broth as described above. Lsm2-8 was first purified via Ni-NTA agarose chromatography via a TEV-labile polyhistidine tag on Lsm8 in IMAC buffer containing 50 mM imidazole base, then dialyzed overnight into IEX-HEPES buffer. After removal of precipitated protein, Lsm2-8 was purified via a TEV-labile GST tag on Lsm6 and glutathione agarose chromatography with step elution in IEX-HEPES supplemented with 10 mM reduced glutathione, 50 mM HEPES acid and 50 mM sodium HEPES base. The polyhistidine and GST tags were removed by the addition of 1 mg TEV protease during overnight dialysis at room temperature into IEX-HEPES. Lsm2-8 was then purified by anion-exchange chromatography (HiTrap Q, GE Healthcare) with gradient elution in IEX-HEPES supplemented with 2 M NaCl. Lsm2-8 was then diluted fivefold against IEX-bis-tris containing 1 mM sodium azide and further purified via heparin chromatography (HiTrap Heparin, GE Healthcare) in IEX-bis-tris containing 1 mM sodium azide with gradient elution in buffer supplemented with 2 M NaCl.
A truncated variant of AtRNL containing only the kinase and 2′,3′-cyclic phosphate 3′-phosphodiesterase domains (residues 677-1104) was modified from pET28-Smt3-AtRNL (1-1104) (a kind gift from Stewart Shuman 52 ) via inverse PCR to remove the N-terminal ligase domain and install a TEV cleavage site after the octahistidine tag. Overexpression and purification was essentially as described above using Ni-NTA agarose chromatography with IMAC buffer with 50 mM imidazole and cation-exchange chromatography using IEX-HEPES. All protein samples were analyzed by sodium dodecyl sulfate-polyacrylamide gel electrophoresis to assess purity.
Crystallization and structure determination. Crystals of truncated yUsb1 (residues 71-290) were obtained by hanging drop vapor diffusion with 2 µL concentrated protein (9 mg mL −1 ) and 2 µL of crystallization solution (0.1 M sodium acetate, pH 4.5, 2.0 M ammonium sulfate) with equilibration against 500 µL of crystallization solution at 20°C. Crystals grew as needles of approximate dimensions 50 × 50 x 100 µm over 1-3 days. Crystals were cryoprotected by addition of 10 µL of a solution containing 15% v/v glycerol, 20% w/v PEG 20,000, 0.1 M sodium acetate pH 4.6 to the crystallization drop. A heavy atom derivative was produced as above using a cryoprotectant solution that was also saturated with uranyl acetate and incubation for 1 min prior to freezing. Diffraction data were collected at 100 K on beamlines 21-ID-F or 24-ID-C at the Advanced Photon Source. Data were integrated using XDS 53 . Space group determination and scaling were performed in POINTLESS 54 and AIMLESS 55 , respectively. Phenix.xtriage was used to assay potential twinning in the diffraction data 56 . Initial phases could not be determined using molecular replacement with hUsb1 78-265 (PDB 4H7W), and therefore phases were determined by the method of Single Isomorphous Replacement with Anomalous Scattering, using initial heavy atom site identification, map calculation and density modification in the SHELXC/D/E pipeline 57 as implemented in HKL2Map 58 . Automated model building was accomplished with RESOLVE 56, 59 , with subsequent refinement via iterative rounds of manual model building in Coot 60 and automated refinement in PHENIX 56,61 .
Attempts to co-crystallize or soak in nucleotides into yUsb1 were unsuccessful, likely due to the stringent requirement for ammonium sulfate in the crystallization conditions. Crystals of truncated hUsb1 (residues 79-265) were grown by hanging drop vapor diffusion in 1.4 M sodium potassium phosphate pH 5.6 at 289 K and then transferring into a solution containing 20% w/v PEG 20,000, 20% v/v glycerol, 100 mM bis-tris base, 50 mM HCl and 10 mM disodium 5′ UMP and allowing to incubate overnight. Data were collected on beamline 24-ID-C at the Advanced Photon Source and processed as above. Initial phases were determined by molecular replacement using Phaser 62 with PDB entry 4H7W 11 , and refinement accomplished by the iterative process above. Data collection and refinement statistics are given in Table 1. All figures were generated with PyMOL (http://www.pymol.org).
Statistical analysis. Unless otherwise noted, all activity and binding assays were carried out in triplicate. The resulting data points were averaged before being used to calculate rate or binding constants. Plotted data points represent these averages ± s.d. Errors in rate or binding constants represent the errors generated by the fits to these data using GraphPad Prism 4 software.
Samples were treated with CIP or T4 PNK by addition of "Cutsmart" or "PNK" buffer from New England Biolabs and 10 units of CIP or T4 PNK and incubation at 37°C for 15 min. Mock treated samples contained only Cutsmart buffer and water in lieu of CIP or T4 PNK.
For time-course experiments, the percent processed was calculated using the ratio of product(s) to total signal at 0, 5, 15, 30 and 60 min time points. Resulting data were fit to a one-phase exponential association equation , where Y is the % of the substrate processed, x is time and k is the rate constant (GraphPad Prism 4).
Fluorescence polarization was measured on a Tecan Infinite M1000Pro using an excitation wavelength of 470 nm and emission wavelength of 519 nm. Gain was optimized for each microplate. Fluorescence polarization was measured in triplicate for each condition (using 500-1000 flashes) and averaged. Data were normalized to the values for 0 nM protein (smallest value) and to the highest value, then averaged between three technical replicates. Binding curves were fit using nonlinear regression in GraphPad Prism 4 to a four-parameter logistic equation: % bound=FP min +(FP max −FP min )/(1+10^((logK d −log[protein])×H)), where FP min and FP max are the normalized minimum and maximum % bound, K d is the binding dissociation constant and H is the Hill coefficient. FP min , FP max , K d and H were restrained to be >0.
pH dependence exoribonuclease assay. We found that Usb1 activity is strongly inhibited by the presence of either sulfate or phosphate. Thus, we sought to make a mixed buffer for pH-varied activity assays containing 200 mM each of sodium acetate, bis-tris, sodium HEPES base, Tris base and CHES (pH~9). The pH of this 1 M solution was adjusted up or down with 5 M NaOH or 5 M HCl respectively. Aliquots were removed at every 0.5 pH unit step and kept as a 50× stock of buffer. Protein and RNA were diluted to 1 µM or 200 nM, respectively, in buffer containing 100 mM NaCl, 10% w/v sucrose, 0.1 mM EDTA, 0.1 mM TCEP, 0.1 mg mL −1 BSA, 0.01% v/v Triton X-100 and 20 mM of the composite buffer. Protein and RNA were mixed together in equal volumes (for final concentrations of 500 and 100 nM respectively) for the indicated times. Reactions were quenched by addition of an equal volume of 100% formamide and samples were resolved as described above.
Yeast strains and plasmids. The USB1 coding region and 500 bp up-and downstream of the coding region was amplified from genomic DNA from the BJ2168 (MATa leu2 trp1 ura3-52 prb1-1122 pep4-3 prc1-407 gal2) strain via PCR, which created an upstream BamHI site and a downstream Xho site. The PCR product was digested with BamHI and XhoI (New England Biolabs) and ligated into BamHI/XhoI-cut pRS416 and pRS414. Point mutations in pRS414-Usb1 were generated using inverse PCR as described above.
BJ2168 was transformed with pRS416-Usb1 using the lithium acetate method 63 . The Usb1 disruption strain (yTJC0700) was constructed by transformation of BJ2168 pRS416-Usb1 with linear DNA amplified from pAG32 (Euroscarf) containing the hph gene flanked by regions homologous to 500 nucleotides up-and downstream of the USB1 gene. Cells were grown on YPD for 1 day, then replica plated onto media containing 200 µg mL −1 hygromycin. Individual colonies were screened and deletion of USB1 was confirmed via PCR.
Overexpression plasmids containing Usb1 were generated by PCR amplification of ORFs using primers that added an upstream BamHI and a downstream SalI site. The coding sequence for hUsb1 was codon optimized for expression in S. cerevisiae and the nucleotide sequence is listed in Supplementary Table 4. This PCR product was digested with BamHI and SalI (New England Biolabs) and ligated into BamHI/SalI-cut p425-GPD (ATCC 87359). All mutations, including addition of a single N-terminal hemagglutinin (HA) tag, were generated using inverse PCR as described above.
Western blotting. Yeast were grown in selective media and total protein was isolated by trichloroacetic acid precipitation 64 . Protein concentration was normalized by A 280 measurement and equivalent amounts were separated on a 4-20% Criterion TGX midi protein gel (200 V for 1 h; Bio-Rad) and subsequently transferred to a nitrocellulose membrane (30 min, 100 V, 4°C). The membrane was blocked using 5% (w/v) nonfat dry milk and probed using an HA antibody conjugated to horseradish peroxidase used at 1:5,000 dilution (Sigma Aldrich 11667475001). Mouse α-actin (AMB Millipore MAB1501) was used at 1:5,000 dilution and goat α-mouse HRP antibodies were used at 1:10,000 dilution (Bio-Rad 1706515). Blots were developed using Clarity Western ECL substrate (Bio-Rad) and imaged using an ImageQuant LAS 4000 Imager (GE Healthcare Life Sciences).
Data availability. Coordinates and structure factors have been deposited in the PDB with accession codes 5UQJ and 5V1M. Other data supporting the findings of this manuscript are available from the corresponding author on reasonable request.