RNA regulatory networks diversified through curvature of the PUF protein scaffold

Proteins bind and control mRNAs, directing their localization, translation and stability. Members of the PUF family of RNA-binding proteins control multiple mRNAs in a single cell, and play key roles in development, stem cell maintenance and memory formation. Here we identified the mRNA targets of a S. cerevisiae PUF protein, Puf5p, by ultraviolet-crosslinking-affinity purification and high-throughput sequencing (HITS-CLIP). The binding sites recognized by Puf5p are diverse, with variable spacer lengths between two specific sequences. Each length of site correlates with a distinct biological function. Crystal structures of Puf5p–RNA complexes reveal that the protein scaffold presents an exceptionally flat and extended interaction surface relative to other PUF proteins. In complexes with RNAs of different lengths, the protein is unchanged. A single PUF protein repeat is sufficient to induce broadening of specificity. Changes in protein architecture, such as alterations in curvature, may lead to evolution of mRNA regulatory networks.

RNA-binding proteins control the life of a messenger RNA (mRNA), including its translation, movement and destruction. These events underlie diverse biological processes, ranging from early development to memory formation. Regulatory proteins bind simultaneously to short RNA sequences, typically in 3′-untranslated regions (3′ UTRs), and to protein effectors that determine the RNA's fate. The RNA-binding specificities of the proteins determine which mRNAs are controlled, while effectors determine the outcomes.
RNA regulatory networks, in which a single RNA-binding protein controls multiple mRNAs, are widespread 1 . For example, cytoplasmic polyadenylation element-binding protein, an RNA recognition motif-containing protein, binds to and regulates many mRNAs that participate in the regulation of embryonic cell cycles 2 , while Nova co-regulates multiple mRNAs with roles in alternative polyadenylation and splicing 3 . As a result, RNA-binding proteins integrate post-transcriptional controls, as DNA-binding proteins coordinate transcriptional regulation. To understand RNA regulatory circuits in molecular terms, we need to know which mRNAs are controlled, how they are recognized, and how the networks change during evolution.
PUF proteins are exemplary mRNA regulators 4,5 . They bind to the 3′ UTRs of many mRNAs and do so through single-stranded RNA-binding elements. For example, Puf3p of yeast binds nuclear-encoded mRNAs with roles in the mitochondria 6 . Similarly, PUF proteins in Caenorhabditis elegans, Drosophila and humans control an overlapping battery of mRNAs with established roles in stem cells 7 . The RNA-binding specificities of most PUF proteins are defined in part by three amino acids (tripartite recognition motifs, TRMs) in each of eight tandemly reiterated PUF repeats 8 . TRMs, annotated as XY-Z, recognize specific bases through edge-on (residues X and Y) and stacking interactions (residue Z) 9 . Specificity also can be achieved through the requirement for a base that does not contact the protein, but is solvent-exposed 8,9 . RNA immunoprecipitation and microarray (RIP-chip) studies suggested that Saccharomyces cerevisiae Puf5p binds ~200 mRNAs that contain 10-nt-long binding elements 6 . Genetic analysis has implicated Puf5p in multiple cellular functions, including lifespan 10 , cell wall integrity 11 , chromatin structure 10 and mating type switching 12 , consistent with the view that it participates in the control of diverse groups of mRNAs.
In this work, we used ultraviolet-crosslinking and high-throughput sequencing to define ~1,000 high-confidence RNA targets of Puf5p in vivo. These targets possess unexpected diversity of binding element lengths, with the same RNA sequence features at the two ends but varying numbers of nucleotides in between. The lengths of sites correlate with the biological functions of the targets. The crystal structures of Puf5p-RNA complexes revealed that the RNAs assume altered conformations to accommodate a fixed protein architecture. The plasticity in binding element length is driven by the flattened curvature of the PUF protein scaffold. The findings suggest ways in which alterations in protein curvature result in new specificities and enable the evolution of new RNA networks.

Results
Identification of Puf5p RNA targets. Using in vivo ultraviolet-crosslinking and high-throughput sequencing (HITS-CLIP 13,14 ), we identified >1,000 mRNAs to which Puf5p binds in S. cerevisiae, representing 16% of the yeast transcriptome (Fig. 1a). The strain analysed contained a PUF5 gene fused to a tandem affinity purification (TAP) tag. The tagged gene was integrated by homologous recombination into the PUF5 locus. Cells were irradiated during mid-log phase (Fig. 1a). After lysis and mild RNase treatment (Fig. 1a), Puf5p was stringently purified through tandem affinity steps (Fig. 1a) and SDS-polyacrylamide gel electrophoresis. The purification of crosslinked complexes was effective, as evidenced by western blotting (Fig. 1b). Complexes whose RNA components had been 32 P-end labelled exhibited heterogeneous, slower mobilities than Puf5p alone (Fig. 1c). To identify RNAs bound to Puf5p, adaptors were ligated, the protein digested, and the RNAs converted to complementary DNAs that were analysed by high-throughput sequencing (Fig. 1a). The adaptors contained random bar codes, so that PCR duplication events could be discarded.
On aligning the sequence reads to the yeast genome, we found that the majority of peaks were within 3′ UTRs of mRNAs, and that the set of target mRNAs was distinct from, but overlapped, those of other yeast PUF proteins. We obtained 16,300,145 and 11,100,468 reads from two biological replicates. Of these, 616,401 and 491,532 (6 and 7%) mapped to unique locations in the S. cerevisiae genome, after filtering by quality score and removing PCR duplicates (Fig. 1a; Supplementary Fig. 1a). The functional enrichment of targets detected was only minimally affected by changing the filtering methods we used (Supplementary Fig. 1b Fig. 3b). The overlap with Puf3p targets is 30% by CLIP analysis and 2% by RIP-chip (Supplementary Fig. 3c,d). The 1,043 Puf5p targets represent 16% of yeast mRNAs, a fivefold increase relative to the 206 targets detected in earlier RIP-chip studies 6 (Supplementary Fig. 3a; see the Discussion section). We conclude that Puf5p is a broad regulator of a distinct set of mRNAs in S. cerevisiae.
Puf5p-binding elements range in length from 8 to 12 nt. We developed stringent criteria to select a set of 1,043 high-confidence targets that we used to identify RNA sequence elements bound by Puf5p. We first defined significant peaks as an enrichment of independent reads in a specific genic region (modified false discovery rate (modFDR) <0.01) 15 (Supplementary Fig. 1a). To identify high-confidence targets, we required that a peak contain 1-nt or 2-nt deletions in multiple reads (a strong indicator that these RNAs had been crosslinked to Puf5p 16 ) and a minimum of 10 reads per peak. In addition, these criteria had to be satisfied in both biological replicates (Supplementary Fig. 1a). Normalized peak heights at specific loci were reproducible between the biological replicates (Pearson correlation coefficient 0.90; Fig. 1e). Previous studies validated several putative targets of Puf5p by showing they were regulated by that protein in vivo 17,18 . Among the best characterized are SMX2 and HO mRNAs 19,20 , which are used here as examples (Fig. 2a). With both mRNAs, peaks lay over the previously characterized binding elements in their 3′ UTRs (Fig. 2a). We identified five classes of Puf5p-binding elements ranging from 8 to 12 nt, each comprising a 5′-UGUA tetranucleotide sequence and a 3′-UA with a variable length spacer region in between (Fig. 2b). We performed an unbiased search of the complete set of high-confidence targets for over-represented sequences in peaks using multiple em for motif elicitation (MEME) 21 . The position weight matrix we obtained consists of a 5′-UGUA tetranucleotide sequence followed by a degenerate 3′ end (Fig. 2b). However, we could de-convolute the complete set of 5′-UGUA-containing sequences into five classes of binding elements, ranging in length from 8 to 12 nt beginning at the 5′-UGUA, each with a 3′-terminal UA sequence (Fig. 2b).
Seventy-one per cent of the 1,043 targets, and 66% of the total number of peaks (1,439), contained at least one Puf5p-binding element. Peaks without enriched sequences may reflect contacts that were less sequence-specific or mediated by interactions between Puf5p and RNA-bound factors. The sequences between UGUA and UA display little difference compared with the background nucleotide frequencies surrounding the sites (Supplementary Fig. 4), though adenosine was modestly enriched 1 or 2 nt upstream of the 3′-terminal UA in 9- and 10-nt elements (Supplementary Fig. 4a), and guanosines were uncommon (Supplementary Fig. 4c).
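The length classes described above (a 5′-UGUA, a variable spacer and a 3′-terminal UA, 8 to 12 nt in total) can be illustrated with a short direct search. This is a minimal sketch and not the authors' pipeline: the function name `find_puf5_elements`, its parameters and the example sequence are hypothetical, and it assumes any nucleotide is allowed in the spacer.

```python
def find_puf5_elements(seq, min_len=8, max_len=12):
    """Return (start, length, element) for every 5'-UGUA...UA-3' match.

    Each UGUA start is tested against every allowed length, so overlapping
    elements that share a start (for example the 8-nt and 10-nt readings of
    UGUANNUAUA) are all reported.
    """
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA alphabets
    hits = []
    for start in range(len(rna)):
        if not rna.startswith("UGUA", start):
            continue
        for length in range(min_len, max_len + 1):
            end = start + length
            # The element must fit in the sequence and end in UA.
            if end <= len(rna) and rna[end - 2:end] == "UA":
                hits.append((start, length, rna[start:end]))
    return hits


if __name__ == "__main__":
    # Illustrative sequence only: a 10-nt element flanked by two bases.
    for hit in find_puf5_elements("ccUGUAUUUGUAcc"):
        print(hit)
```

Testing every length at each start, rather than using one greedy pattern, is what allows a single sequence to be counted in more than one length class.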
The breadth of binding element lengths associated with Puf5p is unusual among PUF proteins (Fig. 2c). For example, three other PUF proteins, human PUM2 (ref. 14), S. cerevisiae Puf4p 6 and C. elegans FBF-2 (ref. 7), show a single dominant length of site in vivo, measured either by CLIP methods 22 or inferred from RIP-chip 23 (Fig. 2c). Essentially the same behaviour was observed for each protein in vitro 24 . To examine the sequence preferences of Puf5p in vitro, the purified protein was incubated with an RNA library in which 20 consecutive nucleotides had been randomized, generating a theoretical complexity of 4^20 RNA sequences 24 . Bound RNAs were eluted, and the process repeated five times. The RNAs were analysed by high-throughput sequencing. In this method, termed SEQRS, the number of reads obtained is a proxy for affinities measured in vitro 24 . Re-analysis of data obtained with Puf5p 24 revealed that the number of reads for each site length yielded a pattern similar to that seen in HITS-CLIP, in that 9- and 10-nt sites were the most abundant (Fig. 2c). Sites of 8, 11 and 12 nt were less prevalent in SEQRS than in vivo, but above background in vitro. Indeed, for each protein analysed, we observed greater binding for sub-optimal lengths in the cell than in vitro (Fig. 2c). Many factors affect binding in vivo, including protein-protein interactions and RNA accessibility.
Orange and green lines represent reads mapped for each replicate. SMX2 has a peak over the 9-nt-binding element. HO has a broad peak over two binding elements: a 9 nt lower affinity site and an 8/10 nt higher affinity site. (b) Binding elements identified in high-confidence Puf5p target mRNAs. The MEME-derived logo is shown on the left, which was deconvoluted into five binding elements of 8-12 nt in length. (c) Distribution of binding element lengths for four PUF proteins representing three species. Results from CLIP (red), SEQRS 24 (light blue), RIP-chip 6,7 (green) and PAR-CLIP 14 (dark blue) experiments are compared, where available, and shown as enrichment relative to the predominant length for each protein, which is set to 1. The consensus RNA sequence element for each protein is shown, where N is A, C, G or U.
Site length correlates with biological function. The majority of the sites bound by Puf5p consist of individual elements, in which only a single site of unambiguous length is present in the CLIP peak (Fig. 3a Table 2). Surprisingly, when we analysed mRNAs with different binding site lengths separately, RNAs with 8-nt binding elements were over-represented for mitochondrion organization (hypergeometric distribution test with Holm-Bonferroni correction, P value 9.5e-4); 9-nt sites for ribosome biogenesis (P value 3.6e-14); 10-nt sites for regulation of gene expression (P value 2.9e-6); and 11-nt sites for translation (P value 3.6e-3) (Fig. 3b), all by the same test. The 12-nt sites did not correlate with a specific GO term. GO analysis of the mRNAs with binding elements in ORFs or 5′ UTRs revealed that only 10-nt-binding elements in ORFs were associated with a GO term, positive regulation of pseudohyphal growth (hypergeometric distribution test with Holm-Bonferroni correction, P value 8.1e-3).
The 8-nt Puf5p elements lie in a subset of mRNAs that also were bound to Puf3p in PAR-CLIP experiments 25 . Puf3p associated with ~1,000 nuclear-encoded mRNAs with mitochondrial functions 25 . Even if the criteria selecting high-confidence Puf5p targets are relaxed (not filtering for gapped reads), a very similar enrichment emerges (Supplementary Fig. 1b). Similarly, 22% of the 9-nt Puf5p elements lie in mRNAs that bind Puf4p and are enriched for genes with ribosome assembly and nucleolar functions. Of the Puf5p targets with GO annotations mitochondrion organization or ribosome biogenesis, 46 and 31% are Puf3p or Puf4p targets, respectively 6 . We suggest that the restricted specificities of Puf3p and Puf4p for 8- and 9-nt sites, respectively, underlie the correlations between Puf5p length of binding sites and their biological functions. In particular, we suggest that the broadened specificity of Puf5p enabled its recruitment to pre-existing RNA regulatory circuits in the S. cerevisiae lineage (see the Discussion section).
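The enrichment statistics quoted above combine a hypergeometric test with Holm-Bonferroni correction. The sketch below shows that calculation in plain Python, under stated assumptions: the function names are hypothetical, the test is taken to be one-sided (over-representation), and the counts in the example are invented for illustration and are not the paper's data.

```python
from math import comb


def hypergeom_sf(k, population, annotated, drawn):
    """P(X >= k): chance of seeing at least k annotated genes in a list of
    `drawn` genes taken from `population` genes, `annotated` of which
    carry the GO term."""
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def holm_bonferroni(pvalues):
    """Return Holm-adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Smallest P value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Invented counts: 6,000 genes in the background, 300 with the GO term,
    # 100 genes in the list, 20 of which carry the term.
    p = hypergeom_sf(20, 6000, 300, 100)
    print(p, holm_bonferroni([p, 0.04, 0.03]))
```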
Despite the varying lengths and sequences of the RNA-binding sites, the overall conformation of Puf5p was unchanged in the four crystal structures (root mean squared deviation <0.7 Å over all Cα atoms or <1.1 Å over all protein atoms). The protein scaffold comprises eight α-helical repeats flanked by a short N-terminal sequence and a C-terminal helix (R8′) (Fig. 4a). The C-terminal repeats 5-8 bound the 5′-UGUA RNA sequence, while repeats 1 and 2 bound the UA-3′ element (Fig. 4a,b). Repeats 3 and 4 lie opposite the variable central regions of the RNAs (Fig. 4c-f). While the overall architecture of Puf5p resembles that of other PUF proteins, Puf5p's repeats are more irregular in length and structure than seen in human PUM1 and S. cerevisiae Puf3p and Puf4p (Supplementary Fig. 8a). For example, repeats 7 and 8 in Puf5p are unusually long (64 and 72 residues versus 36 in a typical repeat) with extended α2 and α3 helices and inter-helix loops. The positions of the α3 helices relative to the α1 and α2 helices are also more varied in Puf5p than Puf3p or Puf4p (Supplementary Fig. 8a).
Since the curvature of the Puf5p scaffold is fixed, RNAs of different lengths adopt different conformations, as described below. Recognition of the 5′-UGUA and 3′-UA elements by repeats 5-8 and 1-2, respectively, is identical in all structures. Differences in RNA conformation and recognition are found opposite the central repeats 3 and 4.
Puf5p binds to the 9-nt SMX2 RNA site by recognizing all but the central fifth base (Fig. 4c). Bases 1-4 and 6-9 are each recognized by a PUF repeat (Supplementary Fig. 5a). However, the fifth base, C5, lies in an atypical conformation, in which the plane of the base is parallel to the axis of the protein, within van der Waals bonding distance of the side chain of Cys381 in repeat 5 (Fig. 4c). The ribose rings of C5 and U6 adopt C2′-endo conformations to accommodate positioning base C5. We refer to this conformation as '5-parallel'.
Puf5p binds the 10-nt MFA2 site similarly to the 9-nt SMX2 site, but an additional base is accommodated by turning the eighth base away from the RNA-binding surface opposite repeat 3, which we refer to as '8-flipped' (Fig. 4d). The positions for all but the eighth base overlap with the 9-nt SMX2 RNA, and the protein:RNA recognition pattern is similar, though the seventh base is a uracil in MFA2 and an adenine in SMX2 ( Fig. 4b; Supplementary Fig. 5a).
Puf5p appears to recognize only the 5′-UGUA conserved element and two additional 3′ bases of the 11-nt AAT2 site (Fig. 4e; Supplementary Fig. 5b), consistent with weaker binding of Puf5p to this site than 9- or 10-nt sites (12- or 2-fold weaker binding, respectively, Supplementary Table 3). A 2.5-Å crystal structure of Puf5p:AAT2 reveals electron density for bases 1-5 and for two 3′ bases bound to Puf5p repeats 1 and 2. In contrast to the parallel orientation of base 5 in 9- and 10-nt sites, base A5 of AAT2 stacks directly with base A4 and forms a van der Waals contact with the side chain of Cys381 in Puf5p repeat 5 (Fig. 4e). We refer to this conformation as '5-stacked'. Using the consensus sequences as a guide, we modelled the 3′ bases as the conserved U10 and A11 bases and did not model bases 6-9. However, alternate conformations of the RNA are possible, including a conformation similar to that of the 12-nt AMN1 site.
Puf5p binds to the longer 12-nt AMN1 site with a distinct RNA conformation. Unlike the conformations of the shorter length binding sites, bases A4, A5 and C6 stack directly with each other opposite repeat 5. We refer to this conformation as 'Triple-stacked'. Residues in repeat 5 (Cys381 and Lys385) contact bases A5 and C6 (Fig. 4f). Puf5p repeat 4 does not interact with an RNA base using its edge-interacting residues, but base U7 is bound to repeat 3 (Fig. 4f). Electron density was observed for bases 1-7 and two 3′ bases bound to Puf5p repeats 1 and 2. We modelled the 3′ bases as the conserved U11 and A12 bases, as we did for the 11-nt AAT2 site, and bases 8-10 were not included in the model.
Curvature as a determinant of specificity. The flatter RNA-binding surface of Puf5p contributes to its specificity by creating a more extended RNA-binding surface. Puf5p possesses the least curved RNA-binding surface observed among PUF proteins to date (Supplementary Fig. 8b) and binds to the longest RNA target sequences identified thus far. Puf3p preferentially binds 8-nt sites and exhibits the greatest curvature among the yeast PUFs (Supplementary Fig. 8b); this reflects the regular spacing of RNA-binding helices, which matches the spacing of bases in an extended RNA chain 26 . Puf4p, which binds 9-nt-binding sites, is intermediate in curvature, between Puf3p and Puf5p (Supplementary Fig. 8b). Extension of the Puf5p RNA-binding surface is produced by the structural arrangements in repeats 4 and 5 and corresponds to the variability in Puf5p target sequence length relative to other PUF proteins. The largest repeat-to-repeat angle in Puf5p is centred about repeat 5 (Supplementary Fig. 8c). Repeat 5 also lacks a large side chain capable of stacking with RNA bases and lies opposite several of the atypical RNA conformations (5-parallel, 5-stacked and triple-stacked). The flatness combined with a protein surface lacking specificity allows 'extra' RNA nucleotides, needed to span the distance between repeats with base specificity, to assume different conformations. These extra nucleotides may not contact the protein, but instead stack with one another or lie parallel to the RNA-binding surface.
Each node (small circle or square) represents one HITS-CLIP peak in an mRNA. Green square nodes represent mRNAs containing an individual binding element of either 8, 9, 10, 11 or 12 nt; nodes with only one binding element (edge) are placed at the outer periphery of the diagram. Green square nodes with two lines indicate that the HITS-CLIP peak contained two non-overlapping binding elements. Most binding elements were unambiguously of a single length. In a minority of elements, two different lengths of binding elements co-reside in a single sequence. Circles represent these mRNAs with 'overlapping' binding elements: for example, '8-10' means a single site of the sequence UGUNNNUAUA, which possesses both 8- and 10-nt elements depending on the 3′-UA used, and either sequence may be used in vivo, and '8-10, 9-11' means that two distinct overlapping sites are present under the peak. The key to the right is a colour code for each combination of overlapping binding element lengths (nt). The numbers of mRNAs containing overlapping binding elements are provided in Supplementary Table 1. (b) Gene ontology term enrichment for mRNAs belonging to each length of binding element using SGD YeastMine 39 . GO terms that are significantly over-represented in the gene list are bolded for each binding element length. Numbers of genes in the most enriched GO terms are in parentheses.
NATURE COMMUNICATIONS | DOI: 10.1038/ncomms9213 ARTICLE
Evolution of binding specificity across Ascomycota. To examine the evolution of the broad specificity of Puf5p, we probed the RNA-binding preferences of Puf5p proteins from representative species across phylum Ascomycota. This group includes the budding yeasts, filamentous fungi and fission yeasts (Fig. 5a). We used the yeast three-hybrid assay to measure the affinities of Puf5p orthologues from six different species: S. cerevisiae, Saccharomyces bayanus, Eremothecium gossypii, Candida albicans, Neurospora crassa and Schizosaccharomyces pombe (Fig. 5a). These proteins were identified as orthologues using SYNERGY, which relies on the species tree, sequence similarity and synteny 27 . Their binding preferences versus length of site were evaluated using a set of RNAs 8-12 nt in length, conforming to the sequence UGUA(A)2-5UA, using the yeast three-hybrid system 28 . All the RNAs thus maintained the 5′-UGUA and 3′-UA critical for S. cerevisiae Puf5p interaction and contained a single 3′-UA element to define the target length unambiguously. In the three-hybrid assay, the level of expression of a reporter gene (LacZ) is a proxy for the affinity of the interaction 29 .
Puf5p proteins across Ascomycota exhibited broad binding specificities. Puf5 proteins bound similarly to sites of 9, 10, 11 and 12 nt (Fig. 5b). The more restricted specificities of Puf4p for 9-nt sites (Fig. 5c), and Puf3p for 8-nt sites (Fig. 5d), also were conserved across the entire phylum, with the exception of S. pombe Puf3. This protein bound a broad range of site lengths, unlike its orthologues in other species that showed preference for 8-nt sites.
The broadened specificity of S. pombe Puf3 appears to have arisen exclusively in the fission yeast lineage, which enabled us to probe how that broadening arose during evolution. We reasoned that the broadening was not due to the identity of the RNA-interacting TRMs, as the residues are identical among all the Puf3p orthologues (with the exception of repeat 3 in N. crassa, with a Gln to Arg substitution). To identify the key regions of the proteins that confer specificity, we prepared chimeras in which segments of the S. cerevisiae and S. pombe proteins were exchanged (Fig. 6a). The specificity profile-broad or narrow-was conferred by PUF repeats 6-8. A chimeric protein possessing repeats 6-8 from S. pombe exhibited broad specificity, while a chimera with repeats 6-8 of the S. cerevisiae protein had narrow specificity (Fig. 6a,b). The protein sequences in repeat 6 contain a divergent region among Puf3p orthologues ( Supplementary Fig. 9). Indeed, substitution of S. pombe repeat 6 alone into an S. cerevisiae scaffold was sufficient to confer the broad specificity profile (Fig. 6b).

Discussion
Puf5p is a broad regulator of RNAs in S. cerevisiae, binding to >1,000 RNA targets, constituting ~16% of the transcriptome. A total of 71% of these targets possess recognizable binding elements beginning with a 5′-UGU sequence, which range in length from 8 to 12 nt. The variations in length are accommodated by conformational adaptations of the RNA onto a fixed protein scaffold. The wide range of mRNA target site lengths is consistent with prior studies that linked Puf5p to a spectrum of functions, including cell wall integrity 11 and chromatin structure 10 . The biological functions of target mRNAs are correlated with the length of binding elements they possess. How does this correlation arise? We propose that the correlation is imposed by other RNA-binding proteins that recognize the same binding elements, and whose specificity is much more restricted than Puf5p (Fig. 5b-d). For example, Puf3p binds 8-nt sites that are largely in mRNAs with mitochondria-related functions, while Puf4p binds 9-nt-binding elements in mRNAs with roles in ribosomal biogenesis and assembly 6 .
Two PUF proteins that bind the same site could do so sequentially, competitively or cooperatively. Genetic studies demonstrate that Puf4p and Puf5p redundantly control the decay rate of common targets 30 . In the absence of one of the proteins, the other is sufficient. However, for other common targets, the actions of two PUF proteins may be sequential. For example, MRPL8 mRNA is a target of both Puf5p and Puf3p, possesses a single binding element, and is localized to the mitochondrial periphery in a Puf3p-dependent manner 31 . Puf5p could exchange with Puf3p, facilitating repression (Puf5p) en route to localization to the mitochondria (Puf3p).
While 71% of Puf5p targets possess discernible binding elements, 29% do not. RNAs without binding elements may associate with Puf5p indirectly, perhaps through a protein to which both the RNA and Puf5p are bound. Crosslinking to RNAs without sites could also be driven by their high concentrations in specific subcellular compartments (such as P-bodies), in which proteins and RNAs are present at high concentrations, and by the low-complexity, Q/N-rich regions present in Puf3p, Puf4p and Puf5p proteins that could facilitate aggregation 32 .
RNAs of different lengths adopt a broad range of conformations when bound to Puf5p. The flatter, extended scaffold of Puf5p, combined with its specificity for 5′ and 3′ sequences, imposes the requirement for these RNA conformational variations and permits recognition of 8-12-nt length RNAs. The elegance of this arrangement is that very similar sets of atomic contacts between amino acids and RNA bases are maintained in the different complexes, despite the range of RNA lengths they possess. For example, 18 of the 21 edge-on contacts made between Puf3p and its RNA target are also made in Puf5p bound to a 10-nt length site. In an analogous manner, β-catenin maintains a fixed scaffold to recognize peptides from different ligands (reviewed in ref. 33). Its central α-helical Armadillo (ARM) repeats interact with conserved sequence elements in an extended peptide while N- and C-terminal ARM repeats bind elements unique to that ligand. The changes in repeat-to-repeat arrangement at the junctions between the central ARM repeats and N- or C-terminal repeats seem to mark the regions with different protein-binding functions. In the same manner, changes in curvature at specific repeat junctions in PUF proteins correlate with specialization in RNA-binding specificity.
The fact that a single repeat can broaden or narrow specificity (Fig. 6) suggests that this sort of change may be common in evolution (Fig. 6b). The sixth PUF repeat of Puf3p determines whether that protein binds 8-nt sites (S. cerevisiae) or accommodates 8, 9 or 10-nt sites (S. pombe). S. cerevisiae repeat 6, which induces narrow specificity, contains additional residues relative to the same region of the S. pombe protein, which, although not near the RNA-binding residues, may alter the structure with corresponding effects on specificity (Supplementary Figs 8a and 9). From an evolutionary perspective, the broadening of Puf5p's specificity may have enabled new regulatory inputs into existing RNA circuits. Perhaps the ability of Puf5p to recognize a wide array of target lengths arose after ancestral proteins (for example, Puf3p) already regulated batteries of RNAs with related functions and conserved lengths of sites. Recruitment of Puf5p to these same targets, enabled by its flatter curvature, provided new regulatory inputs and/or redundancy into that same circuit. For example, Puf5p binds regulatory kinases 34 , whose input could be brought to bear on a pre-existing circuit. Regardless, we suggest that curvature of the scaffold is critical in defining the RNAs that are controlled. Acquisition of new RNA specificities by alterations of the protein's architecture suggests ways in which new RNA circuits are established, expanded and contracted during evolution.
The beads were resuspended in 30 µl of 1× LDS sample buffer (Life Technologies NP0008) including 1 µl sample reducing agent (Life Technologies NP0004). The reaction was incubated at 70°C for 10 min and loaded in one lane of a 4-12% NuPAGE Novex Bis-Tris (Life Technologies NP0321BOX) gel. The gel was run at 150 V until the dye front ran off the gel. The complexes were then transferred to a nitrocellulose membrane (Novex LC2001) in a XCell II Blot module (Novex EI0001) at 30 V for 1 h at 4°C. The membrane was put on a phosphor screen to visualize the Puf5p-RNA complexes. Complexes were cut out of the membrane. Nitrocellulose pieces were then incubated with 200 µl of 4 mg ml−1 Proteinase K (Thermo Scientific E00491) in 1× PK buffer at 37°C for 20 min while shaking. A measure of 200 µl of 1× PK 7 M urea was added and then incubated at 37°C for 20 min. RNA was phenol chloroform extracted and precipitated with ethanol:isopropanol (1:1), NaOAc and glycoblue (Life Technologies AM9516) overnight at −20°C. The pellets were washed and resuspended in 6.9 µl water. 5′-RNA adaptor ligation was performed in 1× T4 RNA ligase buffer, bovine serum albumin 1 µl (1 mg ml−1) T4 RNA ligase, 1 µl (10 m ml; Thermo Scientific EL0021) 5′-adaptor (5′-GUUCAGAGUUCUACAGUCCGACGAUCNNNNN-3′) and incubated at 16°C for 2 h. The reaction was quenched by phenol chloroform extraction then precipitated as above.
The pellets were washed and resuspended in 10 µl water. dNTPs (0.5 mM) and 10 µM RT primer (5′-GCCTTGGCACCCGAGAATTCCA-3′) were combined and heated to 65°C for 5 min then cooled on ice. Reverse transcriptase buffer (1×), 10 mM dithiothreitol (DTT), 20 units RNasin and 200 units SuperScript II (Life Technologies 18064-022) were combined and incubated at 50°C for 45 min, 55°C for 15 min and then 90°C for 5 min. Reverse transcriptase reactions were PCR amplified by adding 10 µl of the reaction to 15 µl GoGreen Taq (Promega M7123) and 10 mM of each primer (forward 5′-AATGATACGGCGACCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGA-3′; reverse 5′-GCCTTGGCACCCGAGAATTCCA-3′). Thirty PCR cycles (95°C for 30 s, 60°C for 30 s and 72°C for 30 s followed by 72°C for 5 min) were used to amplify the libraries. PCR reactions were purified by 1% agarose TBE gel. Smears corresponding to 150-300 bp were gel purified and ethanol precipitated as above. Libraries were bar-coded (5′-CAAGCAGAAGACGGCATACGAGATXXXXXXGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA-3′) using the same PCR as above for 10 cycles except replacing the reverse primer with bar-coded primer and submitted to the UW-Madison sequencing facility for sequencing on one highly multiplexed Illumina HiSeq 2000 lane.
Western blot. A measure of 50 µl of IgG beads were removed from CLIP samples and then incubated in 30 µl LDS sample buffer (Life Technologies NP0007). The whole reaction was run on a Novex 6% TBE gel then transferred to PVDF membrane (Millipore IPVH00010). The membrane was probed with TAP Tag Polyclonal Antibody (1:10,000; Pierce:CAB1001) as primary antibody followed by goat anti-mouse secondary antibody (1:10,000; KPL:074-1506).
Informatic pipeline. FASTQ files were uploaded to the Galaxy server 35 and groomed (FASTQ Groomer) 36 . Adaptor sequences were then trimmed using Clip, discarding sequences that contained the 5′-adaptor or were too short after 3′-adaptor clipping. The data were then filtered based on quality score using Filter FASTQ with a minimum length of 15 bases and a minimum quality score of 20. The 5′-adaptor included a 3′-random bar code that was used to remove PCR duplicates by discarding any read with a perfect duplicate.
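The filtering and duplicate-removal steps above were run with Galaxy tools; the sketch below only illustrates the logic in plain Python. It rests on two assumptions that the text does not spell out: that the quality threshold applies to every base in a read, and that reads are compared with the random bar code still attached, so that identical inserts carrying different bar codes are kept as independent events. Function names are hypothetical.

```python
def quality_filter(records, min_len=15, min_qual=20):
    """Keep (sequence, qualities) records that meet both thresholds.

    Assumes the quality cut-off applies to the lowest-scoring base.
    """
    return [
        (seq, quals)
        for seq, quals in records
        if len(seq) >= min_len and min(quals) >= min_qual
    ]


def remove_pcr_duplicates(sequences):
    """Keep the first copy of each read; later perfect duplicates are dropped.

    Reads are expected to still carry the random bar code at their 5' end,
    so two reads count as duplicates only if bar code and insert both match.
    """
    seen = set()
    unique = []
    for seq in sequences:
        if seq not in seen:
            seen.add(seq)
            unique.append(seq)
    return unique


if __name__ == "__main__":
    reads = [
        "AAAAAUGUAUUUGUACCGGA",  # bar code AAAAA + insert
        "AAAAAUGUAUUUGUACCGGA",  # perfect duplicate, removed
        "CCCCCUGUAUUUGUACCGGA",  # same insert, different bar code, kept
    ]
    print(remove_pcr_duplicates(reads))
```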
The filtered reads were mapped to the S. cerevisiae genome using Bowtie2 (ref. 37; bowtie2 -x /Scgenome -q filename.fastq -S filename.sam -5 5 -N 1 -p 8). The .sam files were used to create .bam and indexed .bam files using samtools for visualization of the data in the Artemis genome browser. Peaks were defined using Pyicoteo 15 (python pyicoclip filename.sam -f filename.pk --region Sc.bed --stranded). The .bed file required for Pyicoteo was downloaded from the Saccharomyces Genome Database (SGD) 38 . Next, duplicate peaks were removed from the .pk file. Using the Pyicoteo-defined summit, each peak was assigned to a genomic feature using the features table from the SGD. Sequences 200 bases upstream of the ORF and 300 bases downstream of the ORF were used as 5′ UTRs and 3′ UTRs, respectively, and were added to the SGD features table. The number of gapped reads for each peak was determined. Kurtosis was calculated for each peak using the peak profile defined by Pyicoteo. A total of 25 bases of genomic sequence flanking each peak summit was retrieved to define binding elements in two ways: MEME was used as an unbiased search, and direct searches were used for known binding elements.
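The assignment of peak summits to features with fixed-length UTR windows can be sketched as follows. This is an illustration under stated assumptions, not the script used here: coordinates are taken to be 1-based and inclusive, assignment is assumed to be strand-specific, and the function names and gene names are hypothetical.

```python
def utr_windows(orf_start, orf_end, strand, upstream=200, downstream=300):
    """Return 5'UTR, ORF and 3'UTR intervals for one ORF.

    orf_start <= orf_end are genomic coordinates; 'upstream' and
    'downstream' are relative to the direction of transcription.
    """
    if strand == "+":
        five = (max(1, orf_start - upstream), orf_start - 1)
        three = (orf_end + 1, orf_end + downstream)
    else:
        five = (orf_end + 1, orf_end + upstream)
        three = (max(1, orf_start - downstream), orf_start - 1)
    return {"5'UTR": five, "ORF": (orf_start, orf_end), "3'UTR": three}


def assign_summit(summit, strand, orfs, upstream=200, downstream=300):
    """List every (gene, region) whose window contains the peak summit.

    `orfs` holds (gene, start, end, strand) tuples. A summit can match
    more than one gene when neighbouring windows overlap.
    """
    hits = []
    for gene, start, end, orf_strand in orfs:
        if orf_strand != strand:
            continue
        windows = utr_windows(start, end, orf_strand, upstream, downstream)
        for region, (lo, hi) in windows.items():
            if lo <= summit <= hi:
                hits.append((gene, region))
    return hits
```

For example, with a hypothetical plus-strand ORF spanning 1000-2000, a summit at position 2150 on the plus strand falls in the 300-base 3′ UTR window.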
The biological replicates were combined into one list based on the following: (1) each peak had a summit within 10 bases in both replicates; (2) each peak contained a gapped read in both replicates; and (3) each peak had a height >10 reads (third quartile) in both replicates. Functional enrichment was performed using GO analysis. Gene lists were uploaded to YeastMine, where the P value was calculated using the hypergeometric distribution test (whole genome as background) and corrected for multiple testing using Holm-Bonferroni 39 .
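The three replicate criteria can be expressed as a short Python sketch. The peak representation (dictionaries with `chrom`, `strand`, `summit`, `height` and `gapped_reads` keys) and the function names are hypothetical, "within 10 bases" is assumed to be inclusive, and each peak in the first replicate is paired with the first qualifying peak in the second.

```python
def passes_single_replicate(peak, min_height=10):
    """Criteria (2) and (3): at least one gapped read and height > min_height."""
    return peak["gapped_reads"] >= 1 and peak["height"] > min_height


def merge_replicates(rep1, rep2, max_summit_distance=10, min_height=10):
    """Return (peak1, peak2) pairs that satisfy all three criteria."""
    merged = []
    for p1 in rep1:
        if not passes_single_replicate(p1, min_height):
            continue
        for p2 in rep2:
            same_locus = (p1["chrom"], p1["strand"]) == (p2["chrom"], p2["strand"])
            close = abs(p1["summit"] - p2["summit"]) <= max_summit_distance
            # Criterion (1): summits within the allowed distance on the same strand
            if same_locus and close and passes_single_replicate(p2, min_height):
                merged.append((p1, p2))
                break
    return merged
```

The nested loop is quadratic in the number of peaks, which is acceptable for a sketch; sorting peaks by position would be the natural refinement for larger peak lists.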
Protein purification. The RNA-binding domain of yeast Puf5p (residues 201-600) was subcloned into the pSMX vector with an N-terminal His6-SUMO tag 40 . Escherichia coli cells BL21 Star (DE3) carrying the Puf5p plasmid were grown in Terrific Broth media to OD600 = ~0.
Puf3 specificity is linked to repeat 6. RNA-binding element length preferences for the chimeric proteins were assayed using the yeast three-hybrid system with 8, 9 or 10 nt RNAs as in Fig. 5. Raw luminescence values per cell for each biological replicate (n = 3) were averaged and then normalized to an acaAAAUA mutant negative control, which depresses binding >100-fold 49 . Error bars represent s.d.
Protein-RNA crystallization. RNAs were purchased from Thermo Scientific. Puf5p (4 mg ml−1) was mixed with each of the four different RNAs at a protein:RNA molar ratio of 1:1.2 and incubated on ice for 1 h. Crystals were obtained at 20°C by hanging drop vapour diffusion, mixing 1 µl Puf5p-RNA complex with 1 µl reservoir solution of 15-20% (w/v) PEG 3350 and 0.1 M citrate Bis-Tris propane (CBTP), pH 7.6. Microseeding was performed to grow larger single crystals. Crystals were cryo-protected in crystallization solution supplemented with 15% (v/v) glycerol and flash frozen in liquid nitrogen. For phasing, a Puf5p:SMX RNA complex crystal was soaked in 17.5% (w/v) PEG 3350, 0.5 M KI, 0.1 M CBTP and 15% (v/v) glycerol for 5 min and then flash frozen.
X-ray data collection. X-ray data for structures of the 9-, 10- and 12-nt RNA complexes were collected at the SER-CAT beamline at the Advanced Photon Source, Argonne National Laboratory. Data for the 11-nt RNA complex and the iodide-soaked crystal were collected at the NIEHS in-house facility equipped with a Rigaku 007HF rotating anode generator and a Saturn 92 charge-coupled device area detector system. All data were processed using HKL2000 (ref. 41).
Structure determination. The crystal structure of a Puf5p:SMX2 RNA complex (space group P2₁2₁2) was determined by combining molecular replacement (MR) with iodide single-wavelength anomalous diffraction (SAD) phasing. The Phenix software suite was used throughout the process of structure determination 42 . The anomalous signal of the SAD data extended only to 5.0 Å, and MR or SAD alone failed to solve the structure. A truncated Puf4p structure (PDB: 3BX2) containing repeats 4-8 (residues 684-887) was used as the MR search model. Following MR, AutoSol identified eight iodide sites with a Figure of Merit (FOM) of 0.33. Running AutoBuild after MR-SAD phasing produced a model with Rfree = 44%. The model was further improved to Rfree = 38% using the EMBL Hamburg Auto-Rickshaw web server 43 . Electron density for the 9-nt RNA was clearly visible. Iterative cycles of manual model building in Coot 44 and refinement with Phenix led to the final model with Rfree = 28% (Table 1).
Crystals of the 10-, 11- and 12-nt RNA complexes and some crystals of the 9-nt SMX2 RNA complex belonged to space group P6₁22, although all crystals were grown in the same conditions as the P2₁2₁2 SMX2 crystals. These structures were determined by MR using the Puf5p coordinates from the initial Puf5p:SMX2 structure as the search model. Data and refinement statistics are shown in Table 1. All models show good geometry according to MolProbity 45 : 95-98% of the residues are in favoured regions of the Ramachandran plot, and there are no outliers.
Electrophoretic mobility shift assays. RNA oligonucleotides were radiolabelled using 32P-γ-ATP and T4 polynucleotide kinase (New England Biolabs) following the manufacturer's instructions. Serially diluted Puf5p was mixed with 100 pM labelled RNA in buffer containing 10 mM HEPES, pH 7.4; 50 mM NaCl; 1 mM EDTA; 0.1 mg ml−1 bovine serum albumin; 0.01% (v/v) Tween 20 and 0.1 mg ml−1 yeast tRNA. After overnight incubation at 4°C, 4 µl loading dye (15% v/v Ficoll 400 and 0.01% bromophenol blue) was added to each 20-µl reaction before gel loading. Novex TBE gels (10%; Invitrogen) were run at 100 V at 4°C for 30 min to resolve the samples. The gels were dried and exposed to storage phosphor screens. The screens were scanned using a Molecular Dynamics Typhoon phosphorimaging system (GE Healthcare). The band intensities were analysed using ImageQuant. Kd values were calculated using GraphPad Prism by fitting the data assuming one-site specific binding and a Hill coefficient of 1. Approximately 93% of Puf5p was active, as determined using the method described in reference 46 . The reported Kd values were not adjusted.
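A one-site specific binding model with a Hill coefficient of 1 has the form fraction bound = Bmax × [P] / (Kd + [P]). The sketch below shows how such a fit can be carried out in plain Python; it is an illustration, not the GraphPad Prism procedure. It assumes that free protein is approximately equal to total protein (reasonable only when the labelled RNA concentration is well below Kd), and the function names and concentration units are hypothetical.

```python
import math


def fraction_bound(protein, kd, bmax=1.0):
    """One-site specific binding with a Hill coefficient of 1."""
    return bmax * protein / (kd + protein)


def _best_bmax_and_sse(kd, protein_concs, bound):
    """For a fixed Kd, Bmax has a closed-form least-squares solution."""
    f = [p / (kd + p) for p in protein_concs]
    bmax = sum(y * x for y, x in zip(bound, f)) / sum(x * x for x in f)
    sse = sum((y - bmax * x) ** 2 for y, x in zip(bound, f))
    return bmax, sse


def fit_kd(protein_concs, bound, steps=200):
    """Least-squares fit of Kd and Bmax.

    Kd is located by a coarse grid on log10(Kd), spanning 100-fold beyond
    the tested concentrations, followed by a golden-section refinement.
    """
    lo = math.log10(min(p for p in protein_concs if p > 0)) - 2
    hi = math.log10(max(protein_concs)) + 2

    def sse_at(log_kd):
        return _best_bmax_and_sse(10 ** log_kd, protein_concs, bound)[1]

    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    best = min(range(len(grid)), key=lambda i: sse_at(grid[i]))
    a, b = grid[max(best - 1, 0)], grid[min(best + 1, steps)]

    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(60):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if sse_at(c) < sse_at(d):
            b = d
        else:
            a = c
    kd = 10 ** ((a + b) / 2)
    bmax, _ = _best_bmax_and_sse(kd, protein_concs, bound)
    return kd, bmax
```

The fitted Kd is returned in the same units as the protein concentrations supplied. As in the text, no correction for the active fraction of protein is applied.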