Large-scale analysis of small molecule-RNA interactions using multiplexed RNA structure libraries

The large-scale analysis of small-molecule binding to diverse RNA structures is key to understanding the interaction properties and selectivity required for developing RNA-binding molecules toward RNA-targeted therapies. Here, we report a new system for performing the large-scale analysis of small molecule–RNA interactions using a multiplexed pull-down assay with RNA structure libraries. The system profiled the RNA-binding landscapes of G-clamp and thiazole orange derivatives (TO and TO-3), which recognize an unpaired guanine base and are good probes for fluorescent indicator displacement (FID) assays, respectively. Based on the information obtained from the binding of TO and TO-3, we selected combinations of fluorescent indicators and drug-targetable pre-miRNAs and screened for RNA-binding molecules using FID. Four hit compounds were identified, and three of them were validated. Our system provides fundamental information about small molecule–RNA interactions and facilitates the discovery of novel RNA-binding molecules.


Introduction
2][3][4][5] For example, drugs targeting specific RNA splice sites have been approved to alleviate the symptoms of spinal muscular atrophy. 6,7 Further, human precursor microRNAs (pre-miRNAs) [8][9][10][11][12][13], various repetitive RNAs, such as CUG [14][15][16][17] and UGGAA 18 repeats, and structured RNA elements of infectious pathogens [19][20][21] are considered promising drug targets. 3][24] One powerful way to profile the binding of small molecules is an analysis based on massively parallel DNA sequencing. 12]25 Their binding profiles focused on the sequence variants within internal loops and bulge structures. More recently, Sugimoto's group implemented RNA-capturing microsphere particles to establish a new sequencing-based RNA-selection method that does not require any ligand labeling for the RNA-binding fluorescent molecules. 26,27 Although these methods are valuable, they could produce inaccurate results in the profiling of specific or stable RNA structures, such as G-quadruplex (G4) structures, owing to structure-dependent amplification biases. This is because polymerases tend to pause at structured RNA sites during reverse transcription or polymerase chain reaction (PCR). 28,29 Therefore, different approaches that do not involve reverse transcription or PCR are required for profiling small-molecule binding to diverse RNA structures, particularly highly structured RNAs exhibiting naturally occurring sequences.
Recently, we developed a new method, folded RNA element profiling with structure library (FOREST) 30, for the large-scale analysis of protein-RNA interactions using a multiplexed RNA structure library. FOREST quantifies interactions using a DNA barcode microarray that captures the RNA probes of an RNA structure library (Fig. 1), which is designed by extracting structured motifs from RNA structure datasets. In this system, a stabilizing common stem, a unique RNA barcode (5′ terminus), and Cy5 or Cy3 (3′ terminus) were attached to each RNA structure (Fig. 1a). Employing this system, we revealed the interaction landscape of RNA-binding proteins (RBPs) using an RNA structure library that was extracted from human pre-miRNAs, human 5′ UTRs, and the HIV-1 RNA genome. FOREST enables amplification-free quantification, thus facilitating the bias-free detection of different RNA structures and their interactors (e.g., G4 and G4-binding RBPs). Notably, we identified cross-reactive interactions among some of the tested RBPs. For example, we observed that three G4-binding proteins exhibited different binding preferences for G4 and interacted with non-G4 RNA motifs (e.g., the r(GAA)n motif) with different selectivity. Thus, we hypothesized that our method could be used as a platform for profiling the RNA-binding landscapes of small molecules.
In this study, we introduced a new systematic and large-scale approach for investigating small molecule-RNA interaction profiles. By subjecting small molecules to FOREST, our system is advantageous for analyzing large-scale datasets of diverse RNA structures derived from naturally occurring sequences. As the detection of the binding affinities of different RNA structures is based on microarray analysis, FOREST avoids sequencing and structure-dependent amplification biases.
Additionally, the results include not only high-affinity interactions but also intermediate- and low-affinity ones. Therefore, our datasets will be invaluable resources for understanding the fine determinants of small molecule-RNA interactions.

Results and discussion
Design of the platform for the large-scale analysis of small molecule-RNA interactions
Regarding the first RNA structure library for the analysis (Library-1), we designed 1824 RNA structural motifs by extracting the terminal loops of human pre-miRNAs and adding several repetitive and control sequences. 30 Five different barcodes were allocated to each motif structure to exclude the outliers representing non-specific binding to the barcode sequences. Thereafter, the small molecule was immobilized onto beads via biotin-streptavidin interactions (Fig. 1a). We performed the pull-down process by mixing the RNA structure library with the immobilized small molecule, followed by washing and elution steps to collect the bound RNAs. The pulled-down RNAs were quantified on a DNA barcode microarray to obtain the fluorescence intensity of each RNA structure, because the fluorescence intensities correlate with binding affinities after background subtraction using no-ligand-conjugated streptavidin control samples. 30
In this study, we selected G-clamp and thiazole orange (TO) derivatives as the binding molecules (Fig. 1). 2][33] G-clamp was used to validate our system because it binds strongly to a wide range of RNAs. 0][41][42][43][44][45] For example, TO-PRO-3, a deep-red fluorescent indicator, was used in an FID assay to screen for compounds that bind to the bacterial A-site, influenza A virus RNA, and G4 DNA. 37,38,46 However, the binding information of these fluorescent indicators and their target RNA sequences is still limited. We believed that it would be beneficial to determine the RNA binding profiles of such conventionally used indicators to further expand the repertoire of target RNA sequences that can be used in FID assays. Based on the structure of TO-PRO-1, we designed the N3-modified TO-N3 and TO-N3-2, which differ in linker position (Fig. 1d).
Large-scale analysis of the interaction of G-clamp-N3 with Library-1
First, we ranked the RNA motifs from Library-1 based on their G-clamp binding (ranking list S1). To understand the binding properties of G-clamp, the numbers of bases in the single-stranded (ss) and double-stranded (ds) RNA regions were investigated using the predicted secondary structures of the pre-miRNA loops (Fig. 2). Regarding ssRNA, the G count of the high-ranking RNAs (1-360) was significantly higher than that of all the pre-miRNAs in Library-1, whereas the G count of the low-ranking RNAs (1441-1800) was significantly lower. Conversely, the C counts of the high- and low-ranking RNAs were lower and higher than those of all the pre-miRNAs in Library-1, respectively. The U count of the high-ranking RNAs was lower than that of all the pre-miRNAs, and the A count of ssRNA was not significantly different among the rank sections. Regarding dsRNA, the four bases exhibited smaller differences among the ranks compared with ssRNA. The C and U counts were inversely related to the G count, as C and U in the ssRNA region can form base pairs with the neighboring G bases. Furthermore, the percentage of RNAs at each unpaired G count highlighted an unpaired-G selectivity (Figure S3). Five or more unpaired Gs were mainly observed in high-ranking RNAs (1-180), and the percentage decreased gradually as the rank decreased. In contrast, few RNAs with no or only a single unpaired G were observed in the high-ranking group, and the percentage gradually increased as the rank decreased. These results are consistent with the fact that G-clamp mostly recognizes G bases in the ssRNA regions. 32
Next, to validate our screening platform for RNA structures, we selected 17 sequences from the high-affinity (top 100), intermediate-affinity (101-1000), and low-affinity (1001-1824) groups and measured their apparent dissociation constants (K_Dapp) by fluorescence titration (Figure S4). RNA motifs with three base pairs of the common stem (5′-AGC-motif-GCU-3′) were used to measure K_Dapp. A histogram of Z-scores and the correlation between the Z-scores and K_Dapp values are shown in Figs. 3a and 3b and Table S1. The minimum free energy structures of the selected RNAs are shown in Figs. 3c and S5. The rank 1 and 2 RNAs (Fig. 3c, top) contained unpaired guanine bases in their loop structures and exhibited strong G-clamp binding (K_Dapp = 0.024 and 0.022 µM, respectively). For the rank 1 RNA (hsa-mir-4520-1 loop), we performed a G mutation assay using two G-mutated hsa-mir-4520-1 loops (mir-4520-1-mutG2A and -mutG7A). Although mutG2A exhibited strong binding (K_Dapp = 0.011 µM) similar to the wild type, mutG7A exhibited weaker binding (K_Dapp = 15 µM). The double mutant mutG2,7A also exhibited weaker binding (K_Dapp = 3.7 µM) than the wild type, indicating that G7 contributes to the strong interaction with G-clamp. To examine the selectivity for G7, molecular modeling of the complex between mir-4520-1 and G-clamp-N3 was performed using RNAComposer 51,52 and MacroModel (Fig. 3d). When G-clamp is bound to G7 by four hydrogen bonds, it can interact with neighboring bases. We considered that these interactions, such as stacking with the CG base pair at the top of the stem, would facilitate strong binding in addition to the formation of the four hydrogen bonds, indicating that G-clamp does not recognize all Gs in the loop but only specific Gs. The high number of G bases in the ssRNA region of high-ranking RNAs probably increased the probability that a G base capable of binding G-clamp strongly is present. In the high-affinity group, two of the selected RNA motifs contained the G4 structure. The K_Dapp values of the hsa-mir-6850 loop (rank 28) and G4_(GGGU)6 (rank 38) were 0.19 and 0.15 µM, respectively. In the intermediate-affinity group, even though hsa-mir-548ba (rank 522) exhibited a loop that was similar to that in hsa-mir-4520-1, its K_Dapp value (10 µM) was much higher. Comparing the modeled structures of hsa-mir-4520-1 and hsa-mir-548ba (Figure S6) revealed that G-clamp-N3 cannot interact with adjacent bases when it forms hydrogen bonds with a G base on the loop structure of hsa-mir-548ba. In the low-affinity group, the loops without any G bases, such as hsa-mir-4773-1 (rank 1192), hsa-mir-4282 (rank 1775), and the common stem sequence with four Us in the hairpin loop, exhibited weak binding (K_Dapp > 40 µM; Figures S4 and S5). Within the group of selected RNAs, only (CUG)16 (rank 43) deviated from our expectations in the fluorescence titration experiment (Fig. 3b, green color). Overall, we observed a good correlation between the Z-scores and the observed K_Dapp values (Fig. 3b, Spearman's correlation coefficient: −0.86); excluding (CUG)16 gave an even stronger correlation (−0.95). The G4 structures, which are susceptible to bias when using sequencing-based methods, were evaluated and ranked. These results indicate that our system for the large-scale analysis of the RNA structure libraries can ensure accurate assessments of small molecule-RNA interactions.
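The rank correlation used above to compare Z-scores with K_Dapp values can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the toy numbers are hypothetical, and ties are handled by average ranks, which is one common convention.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy values (not data from this study): a higher Z-score
# should pair with a lower apparent dissociation constant.
z_scores = [3.1, 2.4, 1.0, 0.2, -0.5]
kd_app_uM = [0.02, 0.2, 10.0, 25.0, 40.0]
rho = spearman(z_scores, kd_app_uM)
```

A strongly negative rho, as reported above, means that motifs ranked higher by the microarray readout tend to have smaller K_Dapp values in the titration experiments.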
Large-scale analysis of the interaction of the thiazole orange derivatives with Library-2
Next, we investigated the binding of different RNA motifs to the TO derivatives using our second RNA structure library, Library-2 (ranking lists S2-S5). Library-2 contains 3000 RNA structural motifs that were designed by extracting the terminal loops of human pre-miRNAs, along with SARS-CoV-2 and influenza A virus RNAs and several repetitive and control sequences. Compared with the G-clamp binding profile, TO and TO-3 exhibited distinct profiles (Fig. 4a), although a significant correlation was observed between their binding profiles (Fig. 4b). These data indicate that the TO derivatives exhibited similar selectivities, which were distinct from that of G-clamp, as expected. The correlation coefficient between TO-N3 and TO-N3-2, which have different linker positions (r = 0.78), was lower than that between TO-N3 and TO-3-N3, which share the same linker position (r = 0.91), suggesting that the linker position affects the binding profile (Fig. 4b). The high-affinity group of RNAs for the TO derivatives was mainly populated with G4 RNAs. The kernel density estimation of the Z-scores of the TO derivatives indicated significant enrichment of the G4 control RNAs (Figure S7).
To understand the binding properties of the TO derivatives, the numbers of bases in the ssRNA and dsRNA regions were quantified using the predicted secondary structures of the pre-miRNA loops, as in the analysis of G-clamp (Fig. 4c). For ssRNA, the G count of the high-ranking RNAs (1-360) was significantly higher than that of all the pre-miRNAs in Library-2. In contrast, the ssRNA counts of the other bases were not significantly different among the different ranks. Regarding dsRNA, the G and C counts of the high-ranking RNAs (1-360), as well as the A and U counts of the low-ranking RNAs (1441-1800), were significantly higher than those of all the pre-miRNAs. The count tendencies of TO-3-N3 and TO-N3 were similar. Overall, these results suggest that the TO derivatives prefer G-rich ssRNA and G/C-rich rigid stem structures, such as hsa-mir-5091 and -4437 (Fig. 4d). Regarding ssRNA, we further examined the total number of nucleotides in the internal and hairpin loops (Fig. 4e). Although the high-ranking RNAs exhibited more G and A bases in their internal loops, the hairpin loops of the high-ranking RNAs exhibited a preference only for G and not for the other bases. These results suggest that the TO derivatives prefer G/A bases in internal loops and G-rich hairpin loops. A likely explanation is that the internal loops comprising G/A bases may create a binding pocket that is ideal for intercalation, whereas the G-rich hairpins may form G4-like structures. To confirm the preference of the TO derivatives for internal loops comprising G/A bases, we compared the K_Dapp values of hsa-mir-4437 and its internal loop (AGG to UCC) mutant, mir-4437-mut (Figs. 4d and S8). Although the K_Dapp values of TO-N3 and TO-3-N3 for the wild-type hsa-mir-4437 loop were relatively low (4.4 and 11 µM, respectively), the K_Dapp values for mir-4437-mut were much higher (> 40 µM), suggesting that the G/A bases in the internal loop are crucial to the strong binding of the TO derivatives, at least for the hsa-mir-4437 loop.
To further validate the binding profiles of the TO derivatives that were generated by our screening platform, the K_Dapp values of TO-N3 and TO-3-N3 interacting with 11 RNAs were measured by fluorescence titration (Figures S9 and S10 and Table S2). For the high-ranking RNAs (top 100), the K_Dapp values correlated well with the Z-scores of TO-N3, and the Spearman correlation coefficient was −0.93 (Fig. 5a). In contrast, no strong binding was observed for the low-ranking RNAs (K_Dapp > 40 µM). Similarly, the K_Dapp values of TO-3-N3 also correlated well with the Z-scores of TO-3-N3 for the high-ranking RNAs (top 100), with a coefficient of −0.96 (Fig. 5b). These results confirm that our system can provide accurate assessments of different binding modes of ligands and structured RNAs containing G4 structures.
Additionally, we extended this analysis to the commercially available indicators, TO-PRO-1 and TO-PRO-3, by measuring their K_Dapp values for 16 selected RNAs (pre-miRNAs, G4 RNAs, and virus RNAs) and calculating the correlations with the Z-scores of TO-N3 and TO-3-N3, respectively (Figures S11-S13 and Tables S3 and S4). Regarding TO-PRO-1, the K_Dapp values exhibited a weak correlation with the Z-scores of TO-N3 (r = −0.60) and a stronger correlation with those of TO-N3-2 (r = −0.71), indicating that the binding profile of TO-N3-2 may reflect TO-PRO-1 binding to various RNA motifs more accurately (Fig. 5a).
Conversely, for TO-PRO-3, there were significant correlations between the K_Dapp values and the Z-scores of TO-3-N3 (r = −0.89) and TO-3-N3-2 (r = −0.90) (Fig. 5b). Taken together, these binding profiles will benefit the selection of proper combinations of target RNAs and fluorescent indicators for FID assays.
Screening of the novel RNA-binding molecules by fluorescent indicator displacement assay using TO-PRO-1 and TO-PRO-3
4][55] As a high-rank G4 RNA control, hsa-mir-6850 was selected. Additionally, as low-rank controls, the hairpin loop motifs from SARS-CoV-2 RNA (SARS-low) and hsa-mir-374a were selected. The predicted RNA secondary structures are shown in Fig. 6b, and the K_Dapp values of TO-PRO-1 and TO-PRO-3 for these target and control RNAs are listed. The signal-to-background (S/B) ratios of TO-PRO-1 and TO-PRO-3 for these RNAs are summarized in Fig. 6c. The S/B ratios of the low-rank RNAs were significantly lower than the others. A low S/B ratio is not favorable for performing an accurate FID assay. To identify the small molecules that bind to the target human pre-miRNAs listed above, we employed FID to screen a chemical library comprising 118 oxidation-reduction compounds (Targetmol). The fluorescence emission of TOs depends on RNA binding: free TOs exhibit low fluorescence, whereas the intensity increases upon RNA binding. Thus, the fluorescence emission of TOs decreases when a test compound interacts with a target RNA via the same site as the fluorescent indicator, thereby identifying it as a hit compound (Fig. 6a). Through this screen, we identified four hit compounds that disrupted TO-RNA interactions (Figs. 6d and S14). Although three of these compounds, baicalein (Bai), myricetin (Myr), and chelerythrine chloride (Che), were hits when TO-PRO-1 was used, Bai did not meet our selection criteria when TO-PRO-3 was used as the indicator; instead, AS 602801 (AS) became a hit compound. This is probably because TO-PRO-3 differs from TO-PRO-1 in size and/or fluorescent properties, indicating that diverse fluorescent indicators should be included to avoid false negatives and positives. Regarding the hit compounds, Myr 56-58 and Che [59][60][61] have been reported as DNA or RNA binders, whereas AS has not been reported.
The RNA binding of the four hit compounds was validated by measuring their K_Dapp values by fluorescence titration. These experiments revealed that Bai exhibits weak RNA binding (K_Dapp > 40 µM), indicating that it is a false-positive compound for targeting disease-related human pre-miRNAs when using TO-PRO-1. The structurally similar flavonoid, Myr, exhibited moderate binding (K_Dapp = 16-25 µM) to the target RNAs, as the indicators revealed (Figures S15 and S16). Unexpectedly, Myr bound strongly to hsa-mir-6850, which forms a G4 structure, although it was not identified as a hit compound when TO-PRO-3 was used. This suggests that Myr and TO-PRO-3 might have different binding sites. With the low-rank RNAs, Myr exhibited weak RNA binding (K_Dapp > 40 µM) even though the indicators gave positive results. Moreover, we observed that Che bound to all the RNAs (K_Dapp = 2.6-16 µM), although the indicators gave negative results for the low-rank RNAs (Figs. 6d and S17). Overall, as predicted, unreliable results were obtained when low-rank RNAs were used. The precision of the assay data across the investigated RNAs became worse as the RNA ranking decreased (Figure S18), suggesting that our binding profiles offer insight into the selection of applicable RNA targets for indicators in FID assays.
In the fluorescence spectra of Che, two major peaks were observed at 420 and 550 nm (Figs. 7a and S17). Under aqueous conditions, Che forms an OH adduct that emits a strong fluorescence signal at 420 nm when the reaction is at equilibrium. 62,63 The intensity of this 420 nm peak increased dramatically when we shifted the experimental conditions from pH 5 to 8, indicating that the addition of OH was favored under weakly alkaline conditions (Figure S19). Although the fluorescence intensity of the OH-adduct peak at 420 nm decreased after RNA addition, the 550 nm peak increased. This is likely because RNA binding protected Che from hydrolytic attack and shifted the reaction equilibrium toward Che. Finally, we observed AS binding to hsa-mir-191, -21, and -6850 (K_Dapp = 14, 20, and 4.5 µM, respectively). Interestingly, this compound exhibited strong light-up properties (Figs. 7b and S20): although free AS exhibited almost no fluorescence (Φ_free = 0.00063), strong fluorescence was observed after RNA binding (Φ_bound = 0.054). The methine tautomer 64 likely contributes to this light-up property. TO-PRO-1 could not detect the RNA binding of this compound because its strong light-up emission, which occurs in a similar wavelength range, interfered with the detection of the fluorescence originating from TO-PRO-1. These characteristics make AS an interesting seed compound for developing novel RNA binders and fluorescence probes.

Conclusions
We developed a large-scale analytical platform for investigating small molecule-RNA interactions by subjecting small molecules to FOREST. The affinity profiles generated by FOREST include not only high-affinity interactions but also intermediate- and low-affinity ones, across a wide range of RNA structures derived from naturally occurring sequences. Additionally, compared with methods using massively parallel DNA sequencing, FOREST uses microarray analysis to determine the binding affinities of RNA structure libraries and therefore presents the affinity profiles of small molecules without any structure-dependent amplification bias. 30 First, we validated our system using the unpaired G-specific binding property of G-clamp (Figs. 2 and 3). The FOREST system ranked the G-clamp binding of high-, intermediate-, and low-affinity RNA targets. Second, we generated the binding profiles of the TO derivatives using this platform (Figs. 4 and 5). With FOREST profiling, G4 structures, which are susceptible to bias in sequencing-based methods, were evaluated and ranked as top-tier interactors of the TO derivatives.
Additionally, the analysis of the affinity profiles reveals a binding preference of the TO derivatives for RNA motifs containing G-rich hairpin loops, internal loop G/A bases, and/or G/C-rich stem structures (Figs. 4c-e).
The library-wide binding landscape and profiles were also applicable to the commercially available fluorescent indicators, TO-PRO-1 and TO-PRO-3, for FID assays (Fig. 6). Since our knowledge of fluorescent indicator-RNA combinations remains limited, the profiles generated by this system can benefit the selection of optimal combinations and further expand the repertoire of target RNA sequences for FID assays. In this study, we identified three binding molecules for disease-related human pre-miRNA loop motifs by FID assays using TO-PRO-1 and TO-PRO-3, based on the binding profiles of the TO derivatives generated by FOREST. The FID assays using these indicators and low-rank RNAs could not provide accurate hit compounds (Fig. 6), demonstrating that our binding profiles are valuable for selecting applicable combinations for the FID assay. Moreover, we demonstrated the utility of this screening approach by identifying AS 602801 as an RNA binder that binds hsa-mir-191, -21, and -6850 with remarkable light-up properties (Figs. 6d, 7b, and S18). Considering that AS 602801 was identified only by using TO-PRO-3, the use of multiple fluorescent indicators is recommended for FID assays. Our system will be valuable for obtaining further RNA-binding information for fluorescent indicators.
The FOREST system in this study provides the basis for future efforts to identify new small molecule-RNA interactions, investigate the binding profiles and selectivities of various RNA-binding molecules, and aid the design of novel RNA-binding molecules through FID assays.

Methods
In silico RNA motif extraction
All motifs, including the human pre-miRNAs in Library-1 and -2, were extracted from miRBase as detailed previously. 30 To design Library-2, the human pre-miRNA motifs were filtered based on length (< 107 nt), with 1804 species collected in total. Next, we obtained RNA secondary structure datasets determined by SHAPE-MaP or DMS-MaPseq with structural analysis. 65,66 Predicted structures and conserved elements of SARS-CoV-2 were obtained from a published study. 67 From the collected datasets, we divided long continuous RNAs into terminal motifs and defined them as structural units using FOREST.py (https://github.com/KRK13/FOREST2020). In total, 1099 motifs were collected from the transcripts of SARS-CoV-2 and influenza A viruses. As controls, selected RNA structural motifs, aptamers, and defective mutants were collected and loaded into the libraries.
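The idea of dividing a long folded RNA into terminal motifs can be illustrated with a short sketch that works on dot-bracket notation. This is not the FOREST.py implementation, whose exact rules are not described here; the function names are hypothetical, and the sketch assumes that a terminal motif is a hairpin extended outward through bulges and internal loops until a branch point or the end of the stem is reached.

```python
def pair_table(dot_bracket):
    """Map each paired position to its partner (pseudoknot-free input)."""
    stack, pairs = [], {}
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def terminal_motifs(seq, dot_bracket):
    """Return (start, end, sequence, structure) for each terminal stem-loop."""
    pairs = pair_table(dot_bracket)
    motifs = []
    for i, symbol in enumerate(dot_bracket):
        if symbol != "(":
            continue
        j = pairs[i]
        # Keep only hairpin-closing pairs: nothing paired between i and j.
        if "(" in dot_bracket[i + 1:j]:
            continue
        left, right = i, j
        while True:
            # Skip unpaired bases on both sides (bulges or internal loops).
            l, r = left - 1, right + 1
            while l >= 0 and dot_bracket[l] == ".":
                l -= 1
            while r < len(dot_bracket) and dot_bracket[r] == ".":
                r += 1
            # Extend only if the next paired bases pair with each other;
            # otherwise a branch point or the end of the stem was reached.
            if l >= 0 and r < len(dot_bracket) and pairs.get(l) == r:
                left, right = l, r
            else:
                break
        motifs.append((left, right, seq[left:right + 1],
                       dot_bracket[left:right + 1]))
    return motifs


# Hypothetical two-hairpin example enclosed by an outer stem.
example_seq = "GGAAGCAAAGCAAGCAAAGCAACC"
example_db = "((..((...))..((...))..))"
example_motifs = terminal_motifs(example_seq, example_db)
```

In this toy structure the outer stem closes a multibranch loop, so it is not part of either motif, and the two inner stem-loops are returned as separate structural units.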
Design of a template pool of RNA structure library and DNA barcode microarray
The extracted RNA motifs were attached to a T7 promoter, RNA barcodes, and stabilizing stem sequences for detection and hybridization to the DNA barcode microarray as previously described. 30 The ssDNA templates were synthesized by SurePrint oligonucleotide library synthesis (Agilent Technologies). The size of the oligo template was limited to 170 nt for RNA structure Library-1 and 190 nt for Library-2. After assigning barcodes to the RNA structures, the DNA reverse complementary strands of the RNA barcodes were used in SureDesign (Agilent Technologies), a custom CGH array design service, to synthesize the DNA barcode microarrays. The Probe Replication Factor was set to 5× and 3×.
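Deriving a DNA capture probe as the reverse complement of an RNA barcode can be sketched as below. The function name and the example barcode are hypothetical and only illustrate the base-pairing rule; actual barcode sequences and any probe design constraints applied in SureDesign are not given here.

```python
# RNA base -> complementary DNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def barcode_probe(rna_barcode):
    """Return the DNA reverse complement (5'->3') of an RNA barcode (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base]
                   for base in reversed(rna_barcode.upper()))


# Hypothetical barcode, for illustration only.
probe = barcode_probe("AUGGC")
```

Reading both strands 5′ to 3′, the probe for 5′-AUGGC-3′ is 5′-GCCAT-3′, which hybridizes antiparallel to the barcode on the array.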

3'-Terminal labeling with Cy5 or Cy3
All RNA probes in the RNA structure libraries were labelled with a fluorescent dye at the 3' end. Ten micromolar RNA structure library, 100 µM pCp-Cy5 or pCp-Cy3 (Jena Bioscience), and 0.5 U/µL T4 RNA Ligase (Thermo Fisher Scientific) were mixed in 100 µL of 1× T4 Ligase Buffer (Thermo Fisher Scientific). The mixture was incubated at 16°C for 48 h on a ThermoMixer (Eppendorf) with ThermoTop (Eppendorf). After incubation, the labelled RNA was purified using Zymo RNA Clean and Concentrator (Zymo Research) and stored at −28°C until use.

RNA pull-down
The RNA structure library was prepared in 1× Binding buffer (20 mM phosphate pH 7.0, 20 mM NaCl, 80 mM KCl). 30 For folding, the RNA was heated at 95°C and cooled to 4°C on a ProFlex Thermal Cycler (Thermo Fisher Scientific) with a ramp rate of −6°C/sec. During the folding step, 100 pmol of small molecules and 50 µL of Streptavidin Mag Sepharose (Cytiva) were mixed in 900 µL of 1× Binding buffer to prepare the small molecule-conjugated beads. The mixture was incubated on a ThermoMixer (Eppendorf) at 25°C for 60 min with vortex mixing at 1200 rpm. The tube was placed on a magnetic rack to remove the supernatant, and 1 µg of the refolded RNA structure library in 1 mL of 1× Binding buffer was added. A mixture containing only the beads was prepared as a control for background subtraction. The mixture was incubated on a ThermoMixer at 25°C for 60 min with vortex mixing at 1200 rpm. After the reaction, the mixture was washed three times with 1× Binding buffer. Two hundred microlitres of 1× Elution buffer (1% SDS, 10 mM Tris-HCl, 2 mM EDTA) was added to the magnetic beads, and the mixture was heated at 95°C for 3 min. The bound RNA structures were collected from the supernatant by removing the magnetic beads and purified by phenol-chloroform extraction and ethanol precipitation.

Hybridization and microarray scanning
Eighteen microlitres of the bound RNA structures was mixed with 4.5 µL of 10× Blocking Agent (Agilent Technologies) and 22.5 µL of Hi-RPM Hybridization Buffer (Agilent Technologies). The samples were incubated for 5 min in a heat block set at 104°C, then rapidly cooled and incubated for 5 min in ice water. The samples were applied to an 8×60K Agilent microarray gasket slide (Agilent Technologies). The prepared gasket slide and the CGH custom array 8×60K (Agilent Technologies) were assembled with SureHyb. Hybridization was performed for 20 h at 55.5°C and 20 rpm. Following hybridization, the microarray slide was washed for 5 min with Gene Expression Wash Buffer 1 (Agilent Technologies) in a glass container at room temperature. The microarray slide was then moved to a glass container containing Gene Expression Wash Buffer 2 (Agilent Technologies), which was immersed in a thermostatic bath at 37°C, and washed for 5 min. Fluorescence scanning was performed on the microarray, and fluorescence image data were acquired using SureScan (Agilent Technologies). The acquired images were converted to numeric fluorescence intensities for each spot by Feature Extraction (Agilent Technologies) and GeneSpringGX (Agilent Technologies).

Calculation of binding intensity
The binding intensity of each RNA structure was calculated by subtracting the fluorescence intensities of the no-ligand control samples. To alleviate the effect of undesired interactions with the RNA barcode, we calculated the mean fluorescence intensity of each RNA structure from the intensities of three RNA probes that had the same RNA structure but different RNA barcodes. To this end, we removed the maximum and minimum values from each set of five intensities.
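The calculation described above amounts to a background-subtracted trimmed mean over the five barcoded probes of one motif. The sketch below is illustrative only: the function name and numbers are hypothetical, and it assumes that the control is subtracted per barcode before trimming, an order of operations the text does not state explicitly.

```python
def binding_intensity(sample, control):
    """Trimmed mean of background-subtracted intensities for one RNA motif.

    sample, control: five fluorescence intensities each, one per barcode,
    listed in the same barcode order (ligand pull-down and no-ligand control).
    """
    if len(sample) != 5 or len(control) != 5:
        raise ValueError("expected five barcoded probes per motif")
    # Assumption: subtract the no-ligand control barcode by barcode.
    corrected = [s - c for s, c in zip(sample, control)]
    # Drop the single highest and lowest values, keeping three probes.
    trimmed = sorted(corrected)[1:-1]
    return sum(trimmed) / len(trimmed)


# Hypothetical intensities: one barcode (500) behaves as an outlier.
intensity = binding_intensity([100, 120, 110, 500, 90], [10, 10, 10, 10, 10])
```

Dropping the extremes keeps a single barcode that binds the ligand or the beads on its own from dominating the value assigned to the motif.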

Statistics
For testing statistical significance, the two-tailed Brunner-Munzel test with Bonferroni correction was performed using Julia 1.6. The standard error (SE) was calculated using the three probes of the RNA structure library. The binding strength was normalized as a Z-score using Eq. (1):
$$Z = \frac{x - \mu}{\sigma} \quad (1)$$
where µ is the mean value of the library population, σ is the standard deviation, and x is the binding intensity of each probe in the library.
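The Z-score normalization and the subsequent ranking can be sketched in a few lines. The function names and values are hypothetical; the sketch assumes the population standard deviation (dividing by N), since the text does not say whether the population or sample form was used.

```python
from statistics import mean, pstdev


def z_scores(intensities):
    """Normalize binding intensities (motif name -> value) to Z-scores."""
    values = list(intensities.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return {name: (x - mu) / sigma for name, x in intensities.items()}


def rank_by_z(z):
    """Motif names ordered from the highest to the lowest Z-score."""
    return sorted(z, key=z.get, reverse=True)


# Hypothetical binding intensities for three motifs.
toy = {"motif_a": 1.0, "motif_b": 2.0, "motif_c": 3.0}
toy_z = z_scores(toy)
toy_rank = rank_by_z(toy_z)
```

Because the Z-score is a linear rescaling, it does not change the ranking of motifs within a library; it only puts different ligands and experiments on a comparable scale.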

Fluorescence binding assay
A solution (100 µL) of the binder (0.01 or 0.1 µM for G-clamp, 0.1 µM for TO-N3 and TO-PRO-1, 1 µM for TO-3-N3, 0.1 or 0.5 µM for TO-PRO-3) in 1× phosphate buffer (1% DMSO, 20 mM phosphate, 20 mM NaCl, and 80 mM KCl) was transferred to a micro quartz cell with a 1-cm path length. Serial aliquots of a concentrated solution of RNA in 1× buffer were added to the binder solution and allowed to equilibrate for 2 min. The excitation wavelength was set at 360 nm for G-clamp, 501 nm for TO-N3 and TO-PRO-1, and 623 nm for TO-3-N3 and TO-PRO-3, and the emission was recorded at 20°C. Fluorescence measurements were performed with a JASCO-6500 spectrofluorometer (JASCO, Tokyo, Japan).
The data from the titrations were analyzed according to the independent-site model by non-linear fitting to Equations (2) or (3), in which F_0 is the initial fluorescence intensity in the absence of RNA, Q (= F_max/F_0) is the fluorescence enhancement upon saturation, A = K_Dapp/C_ligand, and X = nC_RNA/C_ligand (n is the putative number of binding sites on the RNA, and n = 1 was used). 68 The parameters Q and X were determined by KaleidaGraph (Synergy Software, PA). The K_Dapp values in the main text show the mean values of two or three experiments.
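A fit of this kind can be illustrated with the widely used closed form of the independent-site model, F/F_0 = 1 + (Q − 1)/2 × [A + 1 + X − √((A + 1 + X)² − 4X)], written with the symbols defined above. Whether this expression matches Equations (2) or (3) exactly is an assumption, and the grid-search fit below is a simple stand-in for the non-linear fitting performed in KaleidaGraph; all function names and numbers are hypothetical.

```python
import math


def bound_fraction(kd, c_ligand, c_rna, n=1.0):
    """Fraction of ligand bound, from the exact quadratic binding solution."""
    a = kd / c_ligand          # A = K_Dapp / C_ligand
    x = n * c_rna / c_ligand   # X = n * C_RNA / C_ligand
    s = a + 1.0 + x
    return (s - math.sqrt(s * s - 4.0 * x)) / 2.0


def relative_fluorescence(kd, q, c_ligand, c_rna, n=1.0):
    """Assumed model: F/F_0 = 1 + (Q - 1) * bound fraction."""
    return 1.0 + (q - 1.0) * bound_fraction(kd, c_ligand, c_rna, n)


def fit_kd(c_rna_series, f_rel_series, c_ligand, kd_grid=None):
    """Least-squares estimate of (K_Dapp, Q) by scanning a log-spaced grid.

    For each trial K_Dapp, Q enters linearly and is solved in closed form.
    """
    if kd_grid is None:
        kd_grid = [10 ** (-3 + 0.01 * k) for k in range(501)]  # 0.001 to 100
    best = None
    for kd in kd_grid:
        theta = [bound_fraction(kd, c_ligand, c) for c in c_rna_series]
        denom = sum(t * t for t in theta)
        if denom == 0.0:
            continue
        q_minus_1 = sum(t * (f - 1.0)
                        for t, f in zip(theta, f_rel_series)) / denom
        sse = sum((1.0 + q_minus_1 * t - f) ** 2
                  for t, f in zip(theta, f_rel_series))
        if best is None or sse < best[0]:
            best = (sse, kd, 1.0 + q_minus_1)
    return best[1], best[2]


# Synthetic, noise-free titration (concentrations in µM, hypothetical values).
rna_uM = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0]
signal = [relative_fluorescence(0.5, 10.0, 0.1, c) for c in rna_uM]
kd_fit, q_fit = fit_kd(rna_uM, signal, c_ligand=0.1)
```

The quadratic form matters when the ligand concentration is comparable to K_Dapp, because it accounts for RNA depleted by binding rather than assuming that free and total RNA concentrations are equal.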

RNA secondary structure prediction and visualization
The forna website 69 was used to generate illustrations of the RNA secondary structures predicted by RNAfold 2.4.13 in the ViennaRNA package 70 with the temperature set to 25°C.The RNA structures extracted from the long transcripts (5' UTR and HIV-1 genome) included in library-2 were taken from a previous study. 30

Structural preference analysis
Following previous studies 71, secondary structure prediction of the RNA motifs in the RNA structure library was performed by RNAsubopt 2.4.13 in the ViennaRNA package 70 with the following parameters (command: RNAsubopt --temp=25 --stochBT=30). Each nucleotide (A, G, U, C) in each base-pairing state (ssRNA or dsRNA) or each structural motif (hairpin loop, inner loop, or stem) was counted using the secondary structures generated by RNAsubopt as input.
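The per-state nucleotide counting can be sketched from dot-bracket strings such as those sampled by RNAsubopt. This is an illustration under stated assumptions rather than the authors' script: the function names are hypothetical, counts are simply averaged over the sampled structures, and the finer split into hairpin loop, inner loop, and stem is omitted.

```python
from collections import Counter


def count_bases_by_state(seq, dot_bracket):
    """Count A/G/U/C separately for unpaired (ssRNA) and paired (dsRNA) bases."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    counts = {"ssRNA": Counter(), "dsRNA": Counter()}
    for base, symbol in zip(seq.upper(), dot_bracket):
        state = "ssRNA" if symbol == "." else "dsRNA"
        counts[state][base] += 1
    return counts


def mean_counts(seq, structures):
    """Average the per-state base counts over several sampled structures."""
    totals = {"ssRNA": Counter(), "dsRNA": Counter()}
    for dot_bracket in structures:
        counts = count_bases_by_state(seq, dot_bracket)
        for state in totals:
            totals[state].update(counts[state])
    n = len(structures)
    return {state: {base: totals[state][base] / n for base in "AGUC"}
            for state in totals}


# Hypothetical motif with two sampled structures.
toy_counts = mean_counts("GGGAAACCC", ["(((...)))", ".((...))."])
```

Averaging over a stochastic sample instead of using only the minimum free energy structure lets positions that are paired in some structures and unpaired in others contribute fractionally to both states.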

FID assay
Fluorescence intensities in the FID assays were measured with an Infinite® 200 PRO microplate reader (TECAN Group Ltd., Männedorf, Switzerland) using i-control® and LBS coated OptiPlate™-96F as 96-well plates. Buffer solution (20 mM phosphate pH 7.0, 20 mM NaCl, 80 mM KCl) was added to each well (49.5 µL for the blank and negative control wells, 49 µL for the positive control and sample wells), followed by the addition of 0.25 µL of ligand solution (20 µM for TO-PRO-1 and 100 µM for TO-PRO-3) to each well except for the blank wells. RNA solution (0.5 µL of 10 µM for TO-PRO-1 and 50 µM for TO-PRO-3) in binding buffer was dispensed into the positive control and sample wells. DMSO was added to the control (negative and positive, 0.25 µL) and blank (0.5 µL) wells, while 0.25 µL of compound solution in DMSO (1 mM, Targetmol) was added to each sample well and mixed with the RNA-ligand solution. The fluorescence intensities of the mixtures were measured after incubating for 30 min. The excitation wavelength was set at 485 nm for TO-PRO-1 or 620 nm for TO-PRO-3. The normalized fluorescence intensity (F) was calculated using Eq. (4):
$$F = \frac{F_{(\mathrm{sample})} - F_{(\mathrm{buffer+indicator})}}{F_{(\mathrm{indicator+RNA})} - F_{(\mathrm{buffer+indicator})}} \quad (4)$$
where F_(sample) is the fluorescence intensity of the sample well, F_(buffer+indicator) is that of the negative control well, and F_(indicator+RNA) is that of the positive control well. Hits were selected based on a reduction of the TO-PRO-1 or TO-PRO-3 signal by less than a standard deviation (σ) from the mean. Normalized fluorescence intensities greater than 1.5 were excluded from the calculations of the mean and σ. The fluorescence quantum yields (QY) of AS 602801 in the presence of RNA were calculated using quinine sulfate in 0.1 M H2SO4 as a standard (Φ = 0.55). Absorbance and fluorescence values were recorded 3 min after mixing RNA and AS 602801. For calculating QY, the conditions for the absorbance measurement were as follows: [AS 602801] = 2.5 µM, [RNA] = 5 µM, and ε366; for the fluorescence measurement: [AS 602801] = 1 µM, [RNA] = 2 µM, and the emission spectrum area of 380-600 nm was used for integration. QY values were calculated according to Eq. (5).
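The normalization and hit selection can be sketched as follows. The sketch is hedged in two places: it reads the hit criterion as a normalized intensity more than one standard deviation below the mean, which is one interpretation of the wording above, and it uses the sample standard deviation. Function names, the `n_sigma` parameter, and all values are hypothetical.

```python
from statistics import mean, stdev


def normalized_fluorescence(f_sample, f_buffer_indicator, f_indicator_rna):
    """Scale a sample well so that indicator alone is 0 and indicator+RNA is 1."""
    return ((f_sample - f_buffer_indicator)
            / (f_indicator_rna - f_buffer_indicator))


def select_hits(norm_f, upper_cutoff=1.5, n_sigma=1.0):
    """Return compound names whose normalized intensity is below mean - n_sigma*SD.

    Values above upper_cutoff are left out of the mean and SD, as in the text.
    Assumption: the threshold is mean - n_sigma * (sample SD).
    """
    kept = [f for f in norm_f.values() if f <= upper_cutoff]
    threshold = mean(kept) - n_sigma * stdev(kept)
    return sorted(name for name, f in norm_f.items() if f < threshold)


# Hypothetical plate readout after normalization.
plate = {"cmpd_a": 1.0, "cmpd_b": 0.95, "cmpd_c": 1.05,
         "cmpd_d": 0.4, "cmpd_e": 2.0}
hits = select_hits(plate)
```

In this toy plate, `cmpd_e` exceeds the 1.5 cutoff (for example, a compound that fluoresces itself) and is excluded from the statistics, so it does not inflate σ and mask the displacement seen for `cmpd_d`.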

Figures
Figure 1 Method overview and the tested small molecules. (a) Schematic of the large-scale analysis of small molecule-RNA interactions. The designed RNA structure library was used for the multiplexed pull-down assay with a small molecule, and enriched RNA structures were analyzed to quantify the small molecule-RNA interactions on a DNA barcode microarray. (b) Structure and RNA recognition mode of G-clamp-N3. The binding moiety is shown in blue, the linker is shown in black, and azide is shown in red. (c) Structures of TO-PRO-1 and TO-PRO-3. (d) Structures of TO-N3, TO-N3-2, TO-3-N3, and TO-3-N3-2.

Figure 2 Box

Figure 3 Large
The orange dashed and blue lines indicate the hydrogen bonds and stacking interactions, respectively. The complex structure was modeled by RNAComposer and MacroModel.

Figure 4 Analysis