Exploring the suitability of RanBP2-type Zinc Fingers for RNA-binding protein design

Transcriptomes consist of several classes of RNA that have wide-ranging but often poorly described functions and the deregulation of which leads to numerous diseases. Engineering of functionalized RNA-binding proteins (RBPs) could therefore have many applications. Our previous studies suggested that the RanBP2-type Zinc Finger (ZF) domain is a suitable scaffold to investigate the design of single-stranded RBPs. In the present work, we have analyzed the natural sequence specificity of various members of the RanBP2-type ZF family and characterized the interaction with their target RNA. Surprisingly, our data showed that natural RanBP2-type ZFs with different RNA-binding residues exhibit a similar sequence specificity and therefore no simple recognition code can be established. Despite this finding, different discriminative abilities were observed within the family. In addition, in order to target a long RNA sequence and therefore gain in specificity, we generated a 6-ZF array by combining ZFs from the RanBP2-type family but also from different families, in an effort to achieve a wider target sequence repertoire. We showed that this chimeric protein recognizes its target sequence (20 nucleotides), both in vitro and in living cells. Altogether, our results indicate that the use of ZFs in RBP design remains attractive even though engineering of specificity changes is challenging.


Results
Sequence-specificity analysis of the RanBP2-type ZF family members. In order to obtain ZF domains that bind to different RNA sequences, we selected five RanBP2-type ZFs for which the likely RNA-binding residues differed from ZRANB2 ZF2: (i) the Homo sapiens EWS ZF, (ii) the Arabidopsis thaliana ABI3-5Sup ZF, (iii) the Danio rerio ZRANB1B ZF, (iv) the Caenorhabditis elegans T0B2.5 ZF and (v) the Mus musculus RBM10 ZF (Fig. 1B). The selected ZFs were expressed and purified as 2-ZF constructs that consisted of a tandem duplication of the ZF connected by a 5-amino acid linker (GSGSG) and fused to glutathione-S-transferase (GST). In contrast, the ZRANB2-ZFs (ZF1, ZF2) were connected by a shortened version of their natural linker (TTEAKM) (Fig. 2A and Supplementary Fig. S1A), described in our previous work 27. All proteins were highly expressed, with purification yields reaching ~50-60 mg of GST-tagged (ZF) x2 per liter of culture. For each protein, we examined the RNA-binding specificity using Systematic Evolution of Ligands by EXponential enrichment (SELEX) 29 experiments, in which high-affinity RNA sequences were selected from a random 25 nt-long ssRNA library (Supplementary Table S1) by the GST-fusion proteins pre-immobilized on glutathione-Sepharose beads (Fig. 2B). Notably, we performed the SELEX experiments using 2-ZF constructs to ensure that the overall binding affinity was sufficient to enrich preferred RNA motifs.
We used the ZRANB2-(ZF1,2) as a positive control since its specificity is known, whereas GST alone provided a negative control. After 4 selection rounds we sequenced 16 cherry-picked clones and analyzed them using the motif-based sequence analysis tool MEME (Multiple Em for Motif Elicitation) 30 (Fig. 2C). MEME analysis on GST-ZRANB2-(ZF1,2) enriched sequences confirmed the expected specificity for GGU motifs. In particular, all the selected sequences exhibited one or more GGU motifs, leading to a statistically significant consensus (E-value: 2.6 × 10⁻⁵). For GST-(EWS-ZF) x2, the identified consensus (E-value: 2.3 × 10⁻⁵) indicated specificity of the EWS-ZF for either GGU or GGG motifs. However, we observed a higher occurrence of the GGU over the GGG trinucleotide: 69% of the selected sequences exhibited at least two GGU motifs, against 6% presenting two GGG motifs. For GST-(ABI3-5Sup-ZF) x2, 81% of the enriched sequences contained either GGA or GGAGGA motifs. The enriched consensus (E-value: 1.6 × 10⁻⁶) revealed the specificity of the full (ABI3-5Sup-ZF) x2 array for (GGA) x2 sequences. No clear enrichment was observed after 4 selection rounds for GST-(ZRANB1-B) x2, GST-(T0B2.5) x2, GST-(mRBM10) x2 or for the GST control (E-values larger than 0.05). Altogether, these data suggest that the characterized ZFs fall into two categories: those that apparently do not bind to ssRNA and those that do. In the latter case, they strongly favour GGU or closely related motifs.
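A tally of the kind reported above (the share of selected clones carrying at least two copies of a given trinucleotide) can be sketched in a few lines. This is only an illustration and not the analysis used in the study, which relied on MEME; the clone sequences are invented placeholders and the function names are hypothetical.

```python
def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps (e.g. GGG in GGGG)."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == motif)


def fraction_with_motif(seqs, motif, min_count=2):
    """Fraction of sequences that contain the motif at least min_count times."""
    if not seqs:
        raise ValueError("no sequences supplied")
    hits = sum(1 for s in seqs if count_motif(s, motif) >= min_count)
    return hits / len(seqs)


# Placeholder 25-nt clones, for illustration only (NOT data from this study).
clones = [
    "AGGUAACGGUAUCCAGGUAAUCGCA",
    "CCAGGUAUUACGGGAUACCAUAGCA",
    "UUAGGGAAGGGCAUACCAUUAGCAU",
    "ACGGUAAGGUCAUAGGAUCCAUAGC",
]

for motif in ("GGU", "GGG"):
    share = fraction_with_motif(clones, motif, min_count=2)
    print(f"{motif}: {share:.0%} of clones carry at least two copies")
```

Overlapping matches are counted on purpose, so that a GGGG stretch contributes two GGG occurrences; a non-overlapping count would be an equally defensible choice and would change the percentages for G-rich clones.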
In vitro binding assays for the ZF variants against SELEX-selected RNA sequences. To corroborate our SELEX data, we measured the binding affinities of the different ZFs for their selected RNA sequences (Supplementary Table S2) using Isothermal Titration Calorimetry (ITC) and Bio-Layer Interferometry (BLI) (Table 1, Supplementary Figs S3 and S4). In addition to the ZF proteins used in the SELEX experiments, we also conducted the ITC and BLI assays on the FUS ZF (Fig. 1B). This ZF is a murine protein domain whose human homolog was reported to bind to GGU.
All ZFs were expressed and purified as GST-(ZF) x2 proteins. After cleavage of the GST tag, we isolated the (ZF) x2 arrays by size exclusion chromatography, with an average yield of 8 mg of purified and isolated (ZF) x2 per liter of culture ( Supplementary Fig. S2). Binding experiments were carried out using the purified (ZF) x2 proteins, as described in the experimental section.
It is interesting to note that all the binding reactions measured by ITC were always characterized by large favorable enthalpy changes and unfavorable entropy changes, as observed for other ssRNA-binding domains 31 . As expected, the values of binding reaction stoichiometry are close to 1:1 or a bit lower, likely due to difficulties in accurate estimation of RNA or protein concentration, as described in our previous study 32 . Overall the BLI and ITC measurements are in good agreement (Table 1, Supplementary Figs S3 and S4), with all dissociation constants in the low μM range. The binding kinetics (k off and k on ) were also closely similar for all the studied ZFs.
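The thermodynamic signature described above can be related to a dissociation constant through the standard relations ΔG = RT ln(K_D) (1 M standard state) and ΔG = ΔH − TΔS. The sketch below uses hypothetical round values chosen for illustration, not the measurements of Table 1, and the function names are not taken from the study.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1


def delta_g_from_kd(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy (J/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("K_D must be positive")
    return R * temp_k * math.log(kd_molar)


def minus_t_delta_s(delta_g: float, delta_h: float) -> float:
    """Entropic term -T*dS (J/mol), obtained from dG = dH - T*dS."""
    return delta_g - delta_h


# Hypothetical inputs: K_D = 1 uM at 25 degrees C and dH = -60 kJ/mol.
dg = delta_g_from_kd(1e-6)
entropic = minus_t_delta_s(dg, delta_h=-60_000.0)
print(f"dG    = {dg / 1000:.1f} kJ/mol")
print(f"-T*dS = {entropic / 1000:.1f} kJ/mol (positive means entropically unfavorable)")
```

With these assumed inputs the enthalpy is more negative than the free energy, so the entropic term is positive, which is the enthalpy-driven pattern the ITC data are described as showing.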
Importance of position 4 in the RNA-binding platform of RanBP2-type ZFs. Interestingly, the mouse RBM10 ZF variant (mRBM10-ZF) that we studied herein exhibited no enrichment in SELEX and presented a lower binding affinity for GGU motifs compared to the other studied ZFs that were shown to bind RNA. This observation is intriguing given that this protein only differs by 2 amino acids (at positions 4 and 26) from the human RBM10 ZF (Fig. 1), which was previously shown to bind to a GGG motif 25 with low μM binding affinity. In order to analyze the importance of these two residues in the RNA-binding affinity, we generated single mutations (N4D, M26V) in the mRBM10-ZF variant. The corresponding mutated proteins and the RBM10 ZFs were expressed and purified as isolated (ZF) x2 arrays (Supplementary Fig. S1B). Then, we measured the affinities of each protein for the target sequence GGG (Table 2 and Supplementary Fig. S5). As expected, (mRBM10-ZF) x2 exhibited a lower affinity (five-fold) compared to its human homolog (hRBM10-ZF) x2. Interestingly, the single N4D mutation was sufficient to restore the RNA binding affinity measured for (hRBM10-ZF) x2, whereas M26V had no observable effect on the RNA-binding affinity, suggesting that D4 was solely responsible for the difference in binding affinity between the RBM10 ZF homologs. Furthermore, this residue seems to be highly conserved among the putative RNA-binding RanBP2-type ZFs (Fig. 3): 80% of the sequences exhibit an Asp (D) residue in position 4, and the other 20% harbor either a Lys (K) or a Glu (E) residue. This latter observation has been further investigated and discussed in the Supplementary Data (Fig. S6).
Analysis of the discrimination abilities of the ZF variants. BLI experiments were also conducted to analyze the discrimination ability of the different studied (ZF) x2 proteins towards RNA sequences in which single mutations were introduced (Table 3) in the first, second and third positions of the recognized trinucleotide motif. We performed the G → A mutation for the first and the second positions whereas for the third position the U 3 → A/G/C mutations were performed in order to assess whether ZFs other than ABI3-5-Sup also exhibited tolerance to modifications of the third base. For all the studied (ZF) x2 , mutation of any guanine in the first and second positions of the GGX motif completely abolished the RNA-binding activity ( Table 3). In contrast, the ZFs exhibited different behaviors when the third base was mutated, depending also on the replacing nucleotide ( Table 3, Supplementary Fig. S6). For ZRANB2-(ZF1,2), (EWS-ZF) x2 , and (FUS-ZF) x2 , RNA binding was only observed in the case of the U 3 → G substitution. However, the binding affinity for a GGG motif was 2.3-fold reduced for ZRANB2-(ZF1,2) and 1.3-fold reduced for both (EWS-ZF) x2 and (FUS-ZF) x2 compared to the GGU sequence. Interestingly, (ABI3-5Sup-ZF) x2 and (hRBM10-ZF) x2 showed higher tolerance to modifications of the third base. Despite a 3-fold preference for U in the third position, (ABI3-5Sup-ZF) x2 bound to GGG as well as to GGA motifs. In contrast, protein binding to GGC RNA was too low to allow accurate fitting, therefore we could not report affinity for this sequence. Finally, although (hRBM10-ZF) x2 bound to all tested GGN sequences, it showed a preference for GGG over GGU (1.6-fold), GGA (4.6-fold) and GGC (8.6-fold) motifs.
Design and characterization of a (ZF) x6 array. In order to target a longer RNA sequence and therefore increase the specificity of our ZF arrays in cell-based applications, we generated a tandem array of 6 ZFs. In addition, since the sequence repertoire recognized by RanBP2-type ZFs is restricted to GGN motifs, we decided to combine RanBP2-type ZFs with ZFs from a different family (CCCH-type ZFs) that were shown to bind ssRNA with different sequence specificity. The designed protein sequence included from N-to C-terminal: the EWS RanBP2-type ZF, the Ts11d CCCH-type ZF2 33 , RanBP2-type ZRANB2 ZF2, a phage display mutant of ZRANB2 ZF2 28 , the RBM5 RanBP2-type ZF 27 and the Ts11d CCCH-type ZF1 33 ( Fig. 4 and Supplementary Fig. S1C).
In contrast to the GGU-specific EWS and ZRANB2 ZFs, the ZRANB2 ZF2 mutant and RBM5 ZF recognized GCC and GGG motifs, respectively 27,28 . Both Ts11d ZFs bound to a UAUU motif 34 . The (ZF) x6 array was expressed in fusion with the Maltose Binding Protein (MBP) and purified by affinity chromatography followed by size exclusion chromatography. On average, we recovered ~20 mg of pure MBP-(ZF) x6 per liter of culture ( Supplementary Fig. S9). We worked with the full MBP-fusion protein to overcome protein stability and solubility issues. Using BLI, we showed that this MBP-(ZF) x6 protein bound its 20-nt long target RNA (GGU UAUU GGU GCC GGG UAUU) with an affinity of 24 ± 0.6 nM (Table 4 and Fig. 5A) suggesting that the binding affinity of this (ZF) x6 protein is ≈26-fold tighter compared to the original (ZF) x2 constructs. Since we previously reported that the addition of a third RanBP2-type ZFs to a 2-ZF construct led to a 4-fold increase of binding affinity for the target RNA 27 , this observation suggests that the ZFs included in this (ZF) x6 chimeric protein are at least partially functional.
To assess the binding specificity of this (ZF) x6 protein, we measured its binding to a library of random 20 nt-long RNA sequences (polyN) as well as to an RNA in which the 3/4-nt subsites were permuted to give the sequence: UAUU GGU GGG UAUU GCC GGU (PT RNA). Our data indicated that control RNAs bound with only a 10-fold reduced binding affinity compared to the designed target (Table 4 and Fig. 5A). In particular, the (ZF) x6 protein associated with its target and with polyN RNAs with rate constant (k on ) values that are identical within the error limit, whereas the dissociation was 17-fold faster for random RNAs compared to the target. Notably, the binding kinetic profiles measured for the permuted target RNA (PT RNA) were more complex, with increased non-specific contributions (plateau drift, Fig. 5).
Finally, BLI experiments conducted on polyU, polyA and polyC RNA controls resulted in very low signal intensities that we attributed to nonspecific binding (Fig. 5B). Other binding experiments were performed against control molecules (ssDNA and dsDNA) and are presented and discussed in the Supplementary Data (Fig. S7).
Assessment of the binding specificity of the (ZF) x6 protein in a living cell. Based on the in vitro binding assays performed with the (ZF) x6 chimeric protein, we asked whether the 17-fold slower dissociation rate observed for the target RNA compared to random RNA sequences would be sufficient to see the specific binding of the (ZF) x6 protein to its target RNA in a living cell. For this purpose, we designed a new bacteria-based assay. In these experiments, we used genetic constructs derived from the pACYDuet vector for dual expression of two gene cassettes (Fig. 6A). We generated three different plasmid constructs with the first gene cassette encoding either: i) the (ZF) x6 array fused to a C-terminal MBP-tag; ii) the (ZF) x6 array alone or iii) the MBP alone. In all these constructs, the second gene cassette coded for a variant of the green fluorescent protein (GFP LVA) that exhibited a reduced half-life when expressed in Escherichia coli compared to normal GFP 35. Importantly, upstream of the ribosome-binding site (RBS) in the second cassette, we replaced the vector sequence corresponding to positions +33 to +53 with the sequence coding for the 20 nt-long target RNA. These three vectors were named: (ZF) x6 -MBP-target-GFP, (ZF) x6 -target-GFP and MBP-target-GFP (Fig. 6A). In addition, we generated a control set of plasmids that included the same gene cassettes with the exception that the sequence upstream of the RBS was not modified. These plasmids were termed: (ZF) x6 -MBP-GFP, (ZF) x6 -GFP and MBP-GFP. The idea behind this new bacteria-based assay is that specific interaction of the (ZF) x6 chimeric protein with its target sequence would interfere with proper assembly of the ribosome machinery and consequently reduce GFP translation levels. In other words, this experiment can be seen as a cell-based assay to study the binding specificity of the chimeric protein, in which the (ZF) x6 has to locate its target RNA sequence among the ocean of RNAs that are present in the bacteria and compete with specific binding.

Figure 3. WebLogo diagram representing the conserved amino acid positions among putative RNA-binding RanBP2-type ZFs. We selected the RanBP2-type ZFs (accession code pfam00641) exhibiting an aromatic residue at position 17 that could potentially be stacked between the RNA bases, as observed for W17 in the ZRANB2 ZF2:RNA structure. Among all the conserved residues, D4 was present in almost 80% of the sequences.
It is also important to mention that we chose bacteria and not eukaryotic cell-based assays for two reasons: i) bacteria do not have any cell compartment (no nucleus), therefore all the RNAs present in the cell are available as competing molecules, and ii) bacterial antibiotic resistance has always been the main interest of our lab and our long-term goal is to be able to target regulatory RNAs involved in antibiotic resistance 36 using our ZF technology.

Table 3. BLI results obtained for each (ZF) x2 with all tested RNAs. Reported means and associated standard errors were calculated from at least two independent experiments. "NB" stands for "No Binding"; "*" stands for weak and not fitted binding between (ABI3-5Sup-ZF) x2 and (GGC) x2 RNA.

Figure 4. Schematic representation of the (ZF) x6 array. From the left: the EWS RanBP2-type ZF, the second CCCH-type ZF of Ts11d, the second RanBP2-type ZF of ZRANB2, a phage display ZRANB2 ZF2, the RBM5 RanBP2-type ZF, the first CCCH-type ZF of Ts11d. The RanBP2-type ZFs and the CCCH-type ZFs are blue and green, respectively. Linker regions are shown in red.

Table 4. BLI-derived kinetic parameters of MBP-(ZF) x6 binding to RNA. Reported means and associated standard errors were calculated from at least two independent experiments. Kinetic parameters for PT RNA (asterisk) were only indicative because the one-site model did not provide an accurate data fit (Fig. 5A).
For each construct, we monitored the expression of GFP LVA in living E. coli krx cells over time, after IPTG and L-rhamnose induction (Fig. 6B). For plasmids carrying the target RNA sequence, we observed that the expression of the RBP (either with the MBP-tag or alone) was accompanied by a decrease in fluorescence intensity. In contrast, when the MBP was expressed alone, fluorescence production gradually increased over time. Similarly, the GFP LVA protein was expressed at high levels in all bacteria transformed with the plasmids from control set, in which the region adjacent to the RBS was not modified. Altogether these data suggest that our (ZF) x6 array is specific enough to discriminate its target RNA sequence in the context of the crowded environment of a living bacteria.
It is also important to note that all the bacteria transformed with the plasmid encoding the (ZF) x6 protein (alone or in fusion with MBP) exhibited normal growth behavior and phenotype, which suggested that non-specific binding, if present, was negligible and compatible with normal bacterial metabolism and growth.

Figure 5. (A) … showing the binding of the chimeric protein to its target RNA, to a library of random sequences (polyN), and to a permuted target (PT). The corresponding fit (simple 1:1 model (red)) presents reduced quality that is mostly attributed to the sigmoid-like early association phase. This behavior was also observed, but to a smaller extent, in the kinetic profiles of the different (ZF) x2 :RNA interactions (e.g., bottom-right panel). This phenomenon could result from a rapid adsorption of the proteins to the sensor surface at the very beginning of the association phase, which causes a nonzero initial signal. This behavior may be amplified in the case of the chimera because of the larger number of ZFs assembled and their different affinities for the target motif. (B) Comparison of BLI experimental and control data collected at a protein concentration of 5 µM.

Discussion
In this study, we have attempted to engineer functional RanBP2-type ZFs tandem arrays in an effort to generate tools for RNA study and manipulation. First, we have focused our work on extending the sequence specificity repertoire of the RanBP2-type ZF family by analyzing the substrate specificities of various ZF members of this family that presented different amino acid composition on the predicted RNA-binding surface. Our SELEX experiments and binding assays have revealed that these ZF domains exhibit an unexpected highly conserved sequence specificity for GGU/GGN motifs. Since their associated full-length proteins are mostly predicted to be splicing factors [37][38][39][40][41] , and because the GGU motif resembles the 5′ splice site consensus motif, the redundant specificity of ZFs may reflect the high degree of conservation at splice site junctions and regulatory sequences. ZRANB2 has been proposed to recognize 5′ splice sites characterized by the consensus AG│GUR (R = purine) in mammals 24 . It is therefore plausible that EWS, FUS, ABI3 and RBM10 carry out a similar function. In addition, splicing intronic/ exonic enhancers like UGGG repeats 42 , G-triplets 43,44 , GGAGA 45,46 , UAGG and GGGG 47 sequences are close to the SELEX-selected motifs and therefore could be the targets of these proteins as well.
Furthermore, a critical interaction for RNA recognition in the ZRANB2-ZF2:RNA structure is the hydrogen bond network formed between both guanines and two arginine sidechains (R19, R20; Fig. 1A) 24. Both residues are conserved in RBM10 ZF whereas EWS and FUS ZFs showed only conservation of R20. Notably, this substitution of R19 for tryptophan in EWS and FUS ZFs did not appear to affect the RNA specificity. We hypothesized that this side chain can potentially stack with the guanine base, perhaps compensating for the loss of the R19-mediated hydrogen bonds. In contrast, in the ZF from the zebrafish protein ZRANB1-B, which did not bind RNA, a lysine and a serine replace R19 and R20, respectively. These amino acid substitutions reduce the number of possible hydrogen bonds and preclude any compensatory stacking interaction. It is important to note that this ZF shares high sequence conservation with its human counterpart that is reported not to bind RNA but ubiquitin [48][49][50]. In contrast, we did not identify any specific feature that could explain the inability of the ZF from the worm protein T0B2.5 to bind to RNA. The R19K exchange found in T0B2.5 ZF is also found in its human homolog RBM5 that is able to bind to GGG RNA 27. Furthermore, T0B2.5 and RBM5 ZFs share all putative RNA-binding residues, with the only exception of position 14, a residue that contacts the third nucleotide in the ZRANB2 ZF2:RNA structure. However, these two homologs exhibit poor conservation of non-RNA-binding residues; therefore conformational changes may explain their opposite RNA-binding behaviors.
In this work, we have also analyzed the RNA-binding properties of the human and mouse variants of RBM10-ZF. Our mutagenesis results highlighted the importance of residue D4 in the binding affinity of RanBP2-type ZFs and showed that an aspartate at this position is responsible for a 5-fold contribution to RBM10 ZF RNA-binding affinity. The computational analysis of putative RNA-binding RanBP2 type ZFs sequences indicated that this residue is highly conserved, suggesting a key role in RNA binding. In the ZRANB2 ZF2:RNA structure, D4 forms a water-mediated hydrogen bond via its carboxylate group with an imino proton of the second guanine 24 . We hypothesized that replacement of D4 with the neutral isosteric asparagine could destabilize this hydrogen bond.
Our data have also shown that all the studied ZFs are highly discriminative for the dinucleotide G 1 G 2 . In contrast, modifications of the third nucleotide seemed to be better tolerated. The ZRANB2, EWS and FUS ZFs only bind to GGU or GGG motifs and appeared to be the most discriminating. We hypothesized that the conservation of the carbonyl groups (C=O) on this third base can explain this behavior. Indeed, in the ZRANB2 F2:GGU structure (Fig. 1), both carbonyl groups of U 3 are specifically recognized by two asparagines (N14, N24) that are conserved in EWS and FUS ZFs. In contrast, ABI3-5Sup-ZF and hRBM10-ZF, which tolerate any base at the third position, exhibit substitution of both residues N14 and N24, which reduces the number of possible hydrogen bonds with the third nucleotide. For ABI3-5Sup ZF, N14 is replaced by C14, which could only form a single hydrogen bond via its thiol group, whereas F24 could potentially stack with any base. In hRBM10 ZF, hydrophobic residues replaced both asparagines (N14 → V; N24 → F), which explains the reduced specificity for the third base. Interestingly, both of these ZFs are surrounded by RRM domains in the full-length protein (Supplementary Fig. S11), which could partially reduce promiscuous RNA binding. It is also possible that this less discriminative behavior could allow ZFs to bind to various sequences while searching/scanning RNAs, before specific recognition by the full-length protein.
Altogether, these in vitro binding measurements have shed new light on the molecular bases that drive ssRNA:RanBP2-type ZF interactions.
In this study, we also generated for the first time a (ZF) x6 array in order to be able to target a longer RNA sequence, and therefore increase the specificity of our ZF proteins for future applications in cells. Given that RanBP2-type ZFs present a restricted sequence repertoire of RNA targets, we decided to combine RanBP2-type ZFs with ZFs from other families that were shown to bind RNA and target different sequences. This chimeric (ZF) x6 can be seen as a proof-of-concept experiment showing that ZFs from different families can be combined in an effort to extend the sequence-specificity repertoire of the generated ZF arrays. Our in vitro binding assays indicated that the resulting 20-kDa protein exhibited a ≈25 nM affinity for its 20-nt long target RNA. These data are very encouraging since higher affinity (low nM) RNA-binding domains such as PUF and PPR repeats include eight repeats that are 35/36 residues long, and each of these repeats specifies and interacts with a single nucleotide, which means that a 70-kDa protein would be required to target a 20-nt long sequence.
The analysis of the in vitro binding specificity of our chimeric (ZF) x6 protein revealed a relatively poor discrimination ability between cognate and non-cognate RNAs (Table 4 and Fig. 5A). We therefore went on to determine whether the observed 20-fold affinity difference was sufficient to provide specific binding in a living cell. For this purpose, we designed a new cell-based assay in bacteria (E. coli) using a dual expression plasmid encoding our (ZF) x6 protein (alone or in fusion with MBP) and the GFP, with the region upstream of its RBS replaced by the sequence encoding the target RNA. Our data suggest that in a living cell, our (ZF) x6 protein is able to bind its cognate RNA sequence and thereby decrease the yield of expressed GFP, most likely by diminishing the accessibility of the RBS to the translational machinery.
Although additional in vivo experiments should be done to characterize the RNA-binding specificity of the (ZF) x6 array as well as the contribution of each zinc finger to the modular recognition of protein target, this new bacteria-based assay offers a quick and simple approach to assess any protein-RNA interaction in vivo.
It is also interesting to mention that, although the sequence specificity repertoire of the individual ZF domains that we studied herein is limited, it is possible to target numerous RNA sequences by combining these ZFs in different ways. Indeed, the number of different 20 nt-long RNA sequences that we could target by combining all the ssRNA-binding ZFs known to date (GGU: ZRANB2-ZF1, ZRANB2-ZF2, EWS-ZF, FUS-ZF, ABI3-5-Sup-ZF; GCC: ZRANB2-ZF2-HN; GGG: RBM5-ZF, RBM10-ZF; UAUU: Ts11d-ZF1, Ts11d-ZF2) is estimated to be close to 4^6 = 4096, namely 4096 possible ssRNA targets. This rough estimation makes the assumption that we can combine 4 different target subsites (3 or 4 nt-long) to create these 20 nt-long target sequences.
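The estimate can be checked by brute-force enumeration. The sketch below assumes, as in the text, four distinct subsites and six finger positions that can be filled independently; it ignores any constraint on finger order or compatibility, and the variable and function names are hypothetical. Because 3-nt and 4-nt subsites are mixed, the concatenated targets range from 18 to 24 nt, so the sketch also counts how many of them are exactly 20 nt long.

```python
from itertools import product

# Subsites recognized by the ZFs listed in the text.
SUBSITES = ("GGU", "GCC", "GGG", "UAUU")
N_FINGERS = 6


def enumerate_targets(subsites=SUBSITES, n_fingers=N_FINGERS):
    """All concatenations of n_fingers subsites, one subsite per finger position."""
    return ["".join(combo) for combo in product(subsites, repeat=n_fingers)]


targets = enumerate_targets()
n_total = len(targets)                               # 4**6 combinations
n_unique = len(set(targets))                         # distinct RNA sequences
n_20nt = sum(1 for t in targets if len(t) == 20)     # exactly two UAUU subsites

# Target of the chimeric protein described in this work, written without spaces.
designed = "GGUUAUUGGUGCCGGGUAUU"

print(f"combinations: {n_total}, distinct sequences: {n_unique}")
print(f"exactly 20 nt long: {n_20nt}")
print(f"designed target is among them: {designed in targets}")
```

Under these assumptions all 4096 combinations give distinct sequences, and 1215 of them (those with exactly two UAUU subsites) are 20 nt long, which is consistent with the figure in the text being a rough upper estimate.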
Altogether, our study supports the idea that RNA targeting by ZF arrays remains quite attractive. Indeed, our ZF-based approach is relatively simple and mainly relies on the functional assembly of ZF units (from the same or different families) that can be fused to many different effector domains (translational regulator, RS domains…) and therefore constitutes a versatile tool for RNA study and manipulation. In addition, ZF-based proteins can be easily delivered, expressed under the regulation of various tissue-specific promoters or addressed to specific cellular compartments via specific signal peptides. However, the main limitation of this technology lies in the restricted sequence repertoire targeted by RanBP2-type ZFs. The RanBP2-type ZFs only recognize GGU or GGN motifs, despite our previous engineering attempt that led to the generation of a single ZF variant displaying specificity (but reduced affinity) for a GCC sequence 28. More effort is needed to elucidate the rules that govern RNA recognition by RanBP2-type ZFs and other ZF families in order to improve specificity-engineering approaches. A universal recognition code may not be achieved, but at least a few engineered ZFs or ZFs from various families could find a direct and easy application in vivo.
Preparation of the (ZF) x6 chimeric protein. The designed 6-ZF chimeric protein (Fig. 1C) includes (from the N- to the C-terminus) the following ZFs: EWS, Ts11d F2 (P47974, residues 191-219), ZRANB2 F2, ZRANB2 F2 (HN mutant) 28, RBM5 (NP_005769, residues 181-211), Ts11d F1 (P47974, residues 153-181). All these domains were connected by a 5-amino acid linker (GSGSG), except for those followed by classical fingers. In this latter case, four additional residues (GSGSGSGSG) were introduced in order to reduce potential steric hindrance. The choice of the selected ZFs required consistency in domain polarities since our ZFs bind to ssRNA in a directional manner (in our case: N → C-ter to 5′ → 3′). The gene encoding this chimeric protein was purchased (GeneCust, Luxembourg) and cloned into the pMALC2X vector. Protein expression was carried out in E. coli (DE3) Rosetta cells at 18 °C for 4-5 hours upon induction with 0.1 mM IPTG and supplementation with 0.1 mM ZnCl 2 . Cells were lysed by sonication in 20 mM Tris-HCl pH 7.5, 150 mM NaCl, 1 mM DTT, 0.1 mM ZnCl 2 and one Complete EDTA-free protease inhibitor mixture tablet (Roche Applied Science). The MBP-fused protein from the soluble fraction was purified on Amylose resin (New England Biolabs) and eluted using a 10 mM maltose solution. An additional step of purification by gel filtration (Sephacryl HR100, XK16/70-120 mL) was carried out to ensure size homogeneity in the final sample.
Image acquisition and processing. SDS-PAGE data were acquired by digital camera or by Gel Doc ™ EZ Imaging System with Image Lab ™ Software 5.0 (Biorad). All images were edited and annotated using Adobe Illustrator CS6.

SELEX experiments. The template DNA library was prepared by Klenow reaction using an oligonucleotide harboring a 25-nt random sequence surrounded by 2 primer binding sites and a forward primer (carrying the T7 promoter region) (Supplementary Tables S1 and S2). This template DNA library was then purified (QIAquick Nucleotide Removal Kit, Qiagen), and 500 ng-1 µg was used for the in vitro transcription reaction (T7 RiboMAX ™ Express Large Scale RNA Production System, Promega). After removal of unincorporated nucleotides (mini Quick Spin RNA Columns, Roche), RNA was extracted with phenol/chloroform, ethanol-precipitated and finally resuspended in RNase-free water (36 µL). Final RNA concentration was calculated by absorbance at 260 nm. Binding reactions were carried out in 200 µL of SELEX buffer (20 mM MOPS pH 7.0, 50 mM KCl, 5 mM MgCl 2 , 1 mM DTT, 0.1% Triton, 0.1 mM PMSF). Samples containing 40 pmol of GST-fused protein immobilized on GSH beads (GE Healthcare), 1-4 µg of heparin sulfate, and 0.4-3.2 nmol of RNA were gently mixed at 4 °C for 60 min. Unbound RNA was removed by washing the beads (five times in 500 µL of SELEX buffer). RNA-protein complexes were dissociated by acidic elution in HCl-glycine 25 mM pH 2.2 (25 °C, 5 min). RNA was then purified (mini Quick Spin RNA Columns, Roche), ethanol-precipitated, and reverse-transcribed (SuperScript ® III Reverse Transcriptase, Invitrogen) using a complementary primer. The resulting cDNA was then amplified by 10 or 15 rounds of PCR and purified (Thermo Fisher Scientific, PCR clean up kit). A new RNA library was prepared using the amplified cDNA as a template for the following SELEX round.
During the different cycles, selection stringency was intensified by increasing the RNA/protein ratios (round 1: 10; round 2: 20; round 3: 30; round 4: 40) together with the heparin amount (+1 µg/round). After 4 selection rounds, the final PCR products were subcloned into pJET vector (CloneJET PCR Cloning Kit, Thermo Fisher Scientific) and individual sequences were analyzed.
Preparation of ssRNA oligonucleotides for the binding assays. ssRNA oligonucleotides (Supplementary Table S2) were purchased in dry form from Eurogentec (Belgium) and resuspended in DEPC water according to the instructions provided by the manufacturer. RNA samples were prepared by direct dilution of highly concentrated stocks in experiment-specific buffers prior to use. All ssRNA oligonucleotides included the target sequence flanked by two (A) 3 regions, to protect the target from degradation. BLI experiments on (ZF) x2 constructs were carried out using 5′-biotinylated RNAs. For the MBP-(ZF) x6 chimeric protein, 3′-biotinylated RNAs including a triethylene glycol (TEG) spacer were used (3′-biotin-TEG), to reduce steric hindrance potentially induced by the MBP tag.
Isothermal titration calorimetry (ITC). Concentrated protein and ssRNA oligonucleotide stocks were diluted into the ITC buffer (50 mM Tris pH 7.5, 150 mM NaCl, 1 mM DTT and 0.1 mM ZnCl2). Samples were kept RNase-free by including RNase inhibitor (0.4 u/µL) (Ribosafe Rnase Inhibitor, Bioline) at all times. All experiments were carried out by titrating protein (100 μM) into RNA (10 μM) on a MicroCal i200 ITC microcalorimeter (GE Healthcare) at 25 °C. For each titration, an initial injection of 0.5 μL (data discarded) and 20 injections of 2 μL of titrant were made at 120 s intervals. Experimental data were corrected by subtracting the heats of dilution measured in control experiments (protein titrated into buffer) and then analyzed (Origin7.0, MicroCal Software, Northampton, MA). The data were fitted to a 1:1 (one-site) model.
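The injection scheme and the one-site model can be illustrated with a short calculation. The sketch below is a simplified illustration, not the Origin fitting routine: it assumes a hypothetical cell volume of 200 µL, ignores the dilution and displacement of the cell content during the titration, and uses the standard quadratic solution for the concentration of a 1:1 complex. All function names and the Kd value in the usage example are assumptions.

```python
import math

# Sketch only: molar ratio reached after each injection and 1:1 complex concentration.
# The cell volume is an assumed value; dilution of the cell content is ignored.


def molar_ratios(titrant_uM=100.0, cell_uM=10.0, cell_volume_uL=200.0,
                 first_injection_uL=0.5, n_injections=20, injection_uL=2.0):
    """Protein/RNA molar ratio in the cell after each injection."""
    cell_pmol = cell_uM * cell_volume_uL  # µM x µL = pmol
    volumes = [first_injection_uL] + [injection_uL] * n_injections
    ratios, injected_uL = [], 0.0
    for v in volumes:
        injected_uL += v
        ratios.append(titrant_uM * injected_uL / cell_pmol)
    return ratios


def complex_conc(p_total, l_total, kd):
    """Concentration of a 1:1 complex (same units as the inputs)."""
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / 2.0


if __name__ == "__main__":
    ratios = molar_ratios()
    print(f"final molar ratio: {ratios[-1]:.3f}")
    # Fraction of RNA bound at a 1:1 molar ratio for a hypothetical Kd of 1 µM
    print(complex_conc(10.0, 10.0, 1.0) / 10.0)
```

Under these assumptions the titration ends at roughly a two-fold molar excess of protein over RNA, which is enough to pass the equivalence point of a 1:1 interaction.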

Biolayer interferometry (BLI).
BLI experiments were performed at 30 °C in 96-well microplates (Pall), with agitation set to 1000 rpm, on an Octet HTX instrument (FortéBio, Pall). All assays were carried out in 50 mM Tris pH 7.5, 150 mM NaCl, 1 mM DTT, supplemented with 0.1% BSA and 0.02% Tween to minimize non-specific interactions. Proteins and RNAs were diluted directly in this buffer, and RNase inhibitor (0.4 u/µL) (Ribosafe Rnase Inhibitor, Bioline) was added to ensure RNase-free conditions. For all experiments on (ZF) x2 constructs, biotinylated RNAs (0.5 µg/mL) were immobilized on streptavidin-coated biosensors (Pall) for 300 s. The assays conducted on the (ZF) x6 chimeric protein were performed with an RNA loading concentration of 0.125 µg/mL and a loading time of 500 s in order to increase the spacing between immobilized RNAs and prevent possible dimerization. Biosensor tips were then saturated with biocytin (10 µg/mL) for 500 s to block any remaining free streptavidin, and equilibrated in the experimental buffer for 180 s prior to binding assessment. We tested protein concentrations ranging from 1.5 to 0.31 µM (1.3-fold dilution series) for the (ZF) x2 proteins and from 5 to 0.44 µM (1.5-fold dilution series) for the (ZF) x6 chimeric protein. In all cases, association was monitored for 10 s and dissociation in the experimental buffer for 20 s. Recorded data were corrected for baseline drift by subtracting the signal of a reference sensor loaded with biotinylated RNA. Data were analyzed using Octet software, version 8.0 (Pall) and fitted using a 1:1 interaction model. For each experiment, a single set of kinetic parameters was obtained for all tested concentrations by global nonlinear least-squares fitting.
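The two dilution series and the 1:1 interaction model can be sketched as follows. This is an illustration under stated assumptions rather than the Octet software's implementation: the number of points (seven) is inferred from the stated end points and dilution factors, and the rate constants and maximal response in the usage example are hypothetical.

```python
import math

# Sketch only: dilution series and a simple 1:1 (Langmuir) kinetic model.
# Function names and kinetic parameter values are illustrative assumptions.


def dilution_series(top_uM, fold, n_points):
    """Concentrations (µM) of a serial dilution starting at top_uM."""
    return [top_uM / fold ** i for i in range(n_points)]


def association_response(t_s, conc_M, ka, kd, rmax):
    """Sensor response during association for a 1:1 interaction."""
    kobs = ka * conc_M + kd
    req = rmax * ka * conc_M / kobs  # response at equilibrium
    return req * (1.0 - math.exp(-kobs * t_s))


def dissociation_response(t_s, r0, kd):
    """Sensor response during dissociation, starting from response r0."""
    return r0 * math.exp(-kd * t_s)


if __name__ == "__main__":
    zf2 = dilution_series(1.5, 1.3, 7)  # (ZF) x2: 1.5 to about 0.31 µM
    zf6 = dilution_series(5.0, 1.5, 7)  # (ZF) x6: 5 to about 0.44 µM
    print([round(c, 2) for c in zf2])
    print([round(c, 2) for c in zf6])
    # Hypothetical kinetic constants, used only to show how the model is evaluated
    r_end = association_response(10.0, zf2[0] * 1e-6, ka=1e5, kd=1e-2, rmax=1.0)
    print(r_end, dissociation_response(20.0, r_end, kd=1e-2))
```

In a global fit, one pair of rate constants (ka, kd) is shared across all concentrations of a series, so every curve constrains the same parameters.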
The Microlab STAR liquid handling workstation (Hamilton) and the Octet HTX instrument (FortéBio, Pall) are available at the Robotein® high-throughput platform for protein production and analysis (www.robotein.ulg.ac.be).
Sequence conservation analysis of putative RNA-binding RanBP2-type ZFs. In order to select the putative RNA-binding RanBP2-type ZFs, we extracted from the RanBP2-type family (accession code pfam00641) all protein sequences showing an aromatic residue at position 17. This initial dataset was reduced to 1704 entries by eliminating redundancy. A further correction was made by removing sequences that did not conform to the Trp-X-Cys-X2-4-Cys-X3-Asn-X6-Cys consensus. The resulting 1526 sequences were analyzed with the WebLogo software 51 , which provides a graphical representation of the resulting sequence alignment.
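The consensus filter can be expressed as a regular expression. The sketch below is a minimal illustration, assuming that redundancy is removed by discarding exact duplicates (the criterion actually used is not stated) and omitting the position-17 aromatic filter, whose numbering depends on the family alignment. The function name and the toy sequences are hypothetical.

```python
import re

# Sketch only: keep non-redundant sequences matching
# Trp-X-Cys-X2-4-Cys-X3-Asn-X6-Cys, written in one-letter code.
CONSENSUS = re.compile(r"W.C.{2,4}C.{3}N.{6}C")


def filter_sequences(sequences):
    """Drop exact duplicates, then keep sequences containing the consensus."""
    seen = set()
    kept = []
    for seq in sequences:
        seq = seq.strip().upper()
        if seq in seen:
            continue  # redundancy removal (exact duplicates only, an assumption)
        seen.add(seq)
        if CONSENSUS.search(seq):
            kept.append(seq)
    return kept


if __name__ == "__main__":
    toy = [
        "WACAACAAANAAAAAAC",     # two residues between the first Cys pair: kept
        "wacaacaaanaaaaaac",     # duplicate of the first entry: dropped
        "WACAAAAACAAANAAAAAAC",  # five residues between the first Cys pair: rejected
    ]
    print(filter_sequences(toy))
```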

RNA-binding bacterial assay.
In these experiments, we employed the pACYDuet-1 vector for the co-expression of two genes, each under the control of an IPTG-inducible T7 promoter. We designed a sequence coding for the chimeric (ZF) x6 array and a C-terminal MBP tag, joined by a 5-amino-acid GS linker. Both PCR-amplified (ZF) x6 and MBP DNA sequences harbored the BamHI (5′) and EcoRI (3′) restriction sites at their ends. The full sequence coding for (ZF) x6 MBP was assembled between the NcoI and HindIII restriction sites (first MCS) of the pACYDuet-1 vector. Furthermore, we redesigned the second expression cassette of the vector to accommodate the target sequence of the (ZF) x6 array upstream of the RBS (+33 to +53). Downstream of the RBS, between the EcoRV and XhoI restriction sites, we assembled the sequence coding for the unstable green fluorescent protein variant GFP LVA 35 . Both the target and GFP LVA sequences were purchased as gBlocks® Gene Fragments (Integrated DNA Technologies, IDT). The pACYDuet-(ZF) x6 MBP-target-GFP vector was obtained by assembling multiple DNA fragments in a single step using the NEBuilder® High-Fidelity Master Mix (New England Biolabs), following the manufacturer's instructions. In addition, we assembled a control plasmid, pACYDuet-(ZF) x6 MBP-GFP, that carried no target sequence. From these vectors, we excised either the (ZF) x6 or the MBP coding sequence by single digestion. The digested plasmids were then gel-extracted, purified and self-ligated. Using this approach, we obtained both a negative control (pACYDuet-MBP-target-GFP) and a positive control (pACYDuet-(ZF) x6 -target-GFP) from digestion of pACYDuet-(ZF) x6 MBP-target-GFP (Supplementary Fig. S10). Additional controls (pACYDuet-MBP-GFP and pACYDuet-(ZF) x6 -GFP) were generated by digestion of pACYDuet-(ZF) x6 MBP-GFP, thereby providing a full set of control vectors with no target sequence (Supplementary Fig. S10). The integrity of each plasmid was verified by DNA sequencing.
Genetic constructs were introduced into E. coli Single Step KRX cells (Promega) by heat-shock transformation. This strain ensured tight control of GFP LVA expression, because its T7 RNA polymerase is driven by a rhamnose promoter. For each transformed construct, various dilutions of a single colony were cultured in 160 µL of LB with chloramphenicol (30 μg/mL) in a 96-well plate at 37 °C with shaking. When an OD of 0.3-0.6 was reached, bacterial dilutions exhibiting similar density were selected for culture inoculation. We prepared the assay plate (Greiner Bio One, Belgium) by diluting 2 µL of each cell suspension in 160 µL of LB supplemented with chloramphenicol (30 μg/mL). Cells were cultured at 37 °C with shaking in a TECAN infinite® 200 PRO plate reader (Tecan Group Ltd., Switzerland). When cultures reached an OD of 0.3-0.35, protein expression was induced at 37 °C by adding 0.1% L-rhamnose and 0.5 mM IPTG. As for all ZF productions, 0.1 mM ZnCl2 was added as well. The OD measurements were performed at 600 nm with a bandwidth of 9 nm. Fluorescence measurements were carried out using an excitation wavelength of 485 nm with a bandwidth of 9 nm and an emission wavelength of 525 nm with a bandwidth of 20 nm. For each measurement, 10 reads were performed every 15 min, with the gain set to 50; lag, integration and settle times were set to 0 μs, 20 μs and 0 ms, respectively.