Dynamic ensembles hold great promise in advancing RNA-targeted drug discovery. Here we subjected the transactivation response element (TAR) RNA from human immunodeficiency virus type-1 to experimental high-throughput screening against ~100,000 drug-like small molecules. Results were augmented with 170 known TAR-binding molecules and used to generate sublibraries optimized for evaluating enrichment when virtually screening a dynamic ensemble of TAR determined by combining NMR spectroscopy data and molecular dynamics simulations. Ensemble-based virtual screening scores molecules with an area under the receiver operator characteristic curve of ~0.85–0.94 and with ~40–75% of all hits falling within the top 2% of scored molecules. The enrichment decreased significantly for ensembles generated from the same molecular dynamics simulations without input NMR data and for other control ensembles. The results demonstrate that experimentally determined RNA ensembles can significantly enrich libraries with true hits and that the degree of enrichment is dependent on the accuracy of the ensemble.
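The two enrichment measures quoted in the abstract can be illustrated with a minimal sketch. It assumes that a lower (more negative) docking score is better and that each molecule carries a hit or non-hit label; the function names and the toy data are hypothetical and are not taken from the study. The second function implements the quantity as worded in the abstract (the fraction of all hits found in the top-scoring fraction of molecules), which may differ from the precise ROC(2%) definition used in the figures.

```python
def roc_auc(scores, is_hit):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Assumption: a lower score is better. Ties count as one half.
    Cost is O(hits x non-hits), which is fine for a sketch.
    """
    hits = [s for s, h in zip(scores, is_hit) if h]
    non_hits = [s for s, h in zip(scores, is_hit) if not h]
    if not hits or not non_hits:
        raise ValueError("need at least one hit and one non-hit")
    wins = 0.0
    for h in hits:
        for n in non_hits:
            if h < n:
                wins += 1.0
            elif h == n:
                wins += 0.5
    return wins / (len(hits) * len(non_hits))


def hits_in_top_fraction(scores, is_hit, fraction=0.02):
    """Fraction of all hits found among the best-scoring `fraction` of molecules."""
    n_top = max(1, int(round(fraction * len(scores))))
    ranked = sorted(zip(scores, is_hit), key=lambda pair: pair[0])
    found = sum(1 for _, h in ranked[:n_top] if h)
    total = sum(1 for h in is_hit if h)
    if total == 0:
        raise ValueError("need at least one hit")
    return found / total


# Toy library of ten molecules (hypothetical scores), two of which are hits.
toy_scores = [-9.0, -8.5, -7.0, -6.5, -6.0, -5.0, -4.0, -3.0, -2.0, -1.0]
toy_is_hit = [True, False, True, False, False, False, False, False, False, False]

auc = roc_auc(toy_scores, toy_is_hit)                      # 0.9375
top = hits_in_top_fraction(toy_scores, toy_is_hit, 0.2)    # 0.5
```

In the toy data one hit is outranked by a single non-hit, so 15 of the 16 hit/non-hit pairs are ordered correctly, and one of the two hits falls in the top 20%.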
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We thank M. Larsen and S. Vander Roest (University of Michigan Center of Chemical Genomics) for their help and input in carrying out high-throughput screening. We also thank the Duke Magnetic Resonance Spectroscopy Center for NMR resources and assistance in carrying out experiments and thank the Duke Compute Cluster for computational resources and support. This work was supported by the US National Institutes of Health (P50 GM103297, R01 AI066975 and P01 GM0066275 to H.M.A.-H.; T32 GM08487 and F31 GM119306 to L.R.G.).
Integrated supplementary information
a. Tat-displacement assays for the hit molecules in the absence (black) and presence (gray) of 100-fold excess tRNA. Points represent the mean and error bars represent the s.d. from n = 3 independent experiments. b. Overlay of SOFAST-[1H-13C] HMQC NMR spectra of 50 μM TAR free (black) and in the presence of 3X hit molecule (red, purple or blue). Spectra are overlaid for hits that induced similar chemical shift perturbations. (*) denotes folded peaks.
a. Examples of molecules from HTS that show activity in the Tat-displacement assay, but that do not bind TAR by NMR. b. Tat-displacement assay (black) and Tat-only control assay (gray). Points represent the mean and error bars represent the s.d. from n = 3 independent experiments. c. Overlay of SOFAST-[1H-13C] HMQC NMR spectra of 50 μM TAR both free (black) and in the presence of 6X CCG-111926 (red), 4X CCG-106134 (red), or 3X CCG-160257 (red). (*) denotes folded peaks.
Supplementary Figure 3 Testing for false positives in HTS by assaying compounds that are chemically similar to hit molecules.
a. Examples of molecules with chemical similarity to hit molecules that do not bind TAR. b. Tat-displacement assays (points saturating the fluorescence reader are removed). Points represent the mean and error bars represent the s.d. from n = 3 independent experiments. c. Overlay of SOFAST-[1H-13C] HMQC NMR spectra of 50 μM TAR both free (black) and in the presence of 3X small molecule (red). (*) denotes folded peaks.
a. TAR-binding molecules identified by testing a set of ten molecules from the top 5% of docking scores. b. Tat-displacement assays in the absence (black) and presence (gray) of 100-fold excess tRNA (points saturating the fluorescence reader are removed). Points represent the mean and error bars represent the s.d. from n = 3 independent experiments. c. Overlay of SOFAST-[1H-13C] HMQC NMR spectra of 50 μM TAR both free (black) and in the presence of 3X molecule (red). (*) denotes folded peaks.
Supplementary Figure 5 Hit molecules of the Filtered and Optimized libraries identified in the literature.
Hit number corresponds to Supplementary Table 1 and (*) denotes molecules also in the Optimized library.
Supplementary Figure 6 ROC plot AUC values are robust across varied methods of defining hits and non-hits.
Variations in the similarity cutoff and number of non-hits selected per hit result in minor changes in the a. chemical property distributions and b. ROC plots for the Optimized library. c. ROC plots for all libraries (Full, Filtered and Optimized libraries) when using the Boltzmann-weighted average score, arithmetic average score, and best score for all hits before (purple) and after (blue) clustering by Bemis-Murcko atomic framework as well as cell-active hits before (red) and after (orange) clustering. ROC plots were generated from one run of docking all molecules to all receptors.
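The three ways of combining per-conformer docking scores named in panel c can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: scores are treated as energy-like quantities where lower is better, and the default kT of 0.593 (roughly room temperature in kcal/mol) is an assumed value. The function names are hypothetical.

```python
import math


def boltzmann_weighted_average(scores, kT=0.593):
    """Average of per-conformer scores weighted by exp(-score / kT).

    Assumptions: lower score = better, and kT is in the same units as
    the scores. The best score is subtracted inside the exponent for
    numerical stability; this does not change the weights.
    """
    best = min(scores)
    weights = [math.exp(-(s - best) / kT) for s in scores]
    z = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / z


def arithmetic_average(scores):
    """Unweighted mean over all conformers."""
    return sum(scores) / len(scores)


def best_score(scores):
    """Most favorable (lowest) score over all conformers."""
    return min(scores)


# One molecule docked against three conformers (hypothetical scores).
per_conformer = [-12.0, -6.0, -3.0]
bw = boltzmann_weighted_average(per_conformer)
avg = arithmetic_average(per_conformer)   # -7.0
best = best_score(per_conformer)          # -12.0
```

Because the weights favor the most favorable conformers, the Boltzmann-weighted value always lies between the best score and the arithmetic average, and approaches the best score as the gap between conformers grows relative to kT.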
Supplementary Figure 7 Enrichment scores generally decrease when EBVS is applied to less accurate TAR ensembles.
a. ROC AUC and ROC(2%) scores for docking against individual conformers of the E0,4rdc ensemble, a randomly selected MD ensemble (E0,ran), and the lowest energy NOE-based structures for apo-TAR (PDB 1ANR) and tRNA (PDB 1EHZ) for the Full and Optimized libraries. Dashed lines indicate the values for the full ensemble. b. Dependence of the ROC AUC and ROC(2%) scores on ensemble size for the Full and Optimized libraries. c. Dependence of the ROC AUC and ROC(2%) scores on the RDC RMSD for the Full and Optimized libraries. d. Dependence of the ROC AUC and ROC(2%) scores on the RDC RMSD for the other 20-member ensembles of TAR. ROC plots were generated from one run of docking all molecules to all receptors. For b, c and d, the mean and s.d. values over all possible sub-ensembles of each ensemble size are plotted.
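The phrase "all possible sub-ensembles of each ensemble size" implies an exhaustive enumeration of conformer subsets. A minimal sketch of that bookkeeping is given below; the `metric` callable is a hypothetical placeholder for whatever enrichment score is computed on a sub-ensemble, and the use of the population standard deviation is an assumption, since the captions do not specify which estimator was used.

```python
from itertools import combinations
from statistics import mean, pstdev


def subensemble_stats(n_conformers, size, metric):
    """Mean and s.d. of `metric` over every sub-ensemble of a given size.

    `metric` takes a tuple of conformer indices and returns a float.
    Returns (mean, s.d., number of sub-ensembles evaluated).
    """
    values = [metric(sub) for sub in combinations(range(n_conformers), size)]
    return mean(values), pstdev(values), len(values)


# Toy metric (hypothetical): the sum of the conformer indices.
def toy_metric(sub):
    return float(sum(sub))


avg, sd, count = subensemble_stats(4, 2, toy_metric)   # avg 3.0, count 6
```

For a 20-member ensemble the number of subsets peaks at size 10, with 184,756 combinations, so exhaustive enumeration remains tractable provided the per-subset metric is cheap to evaluate from precomputed per-conformer scores.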
Supplementary Figure 8 Each conformer of the ensemble contributes differentially to the small molecule scores and some hyper-enriching sub-ensembles outperform the N = 20 ensemble.
a. The Boltzmann-weighted average population of each conformer averaged across the Full library. b. The difference in the Boltzmann-weighted population of each conformer for each subset of molecules relative to the Full library. c. The percent of hyper-enriching sub-ensembles that contain each conformer for all libraries. (*) denotes conformers that most resemble ligand-bound TAR conformations and (2) denotes conformers with two binding sites.
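The conformer populations in panels a and b can be sketched in the same spirit. The code assumes that each molecule's per-conformer docking scores are converted to normalized Boltzmann weights, that these are averaged over the molecules in a library, and that panel b is the subset average minus the Full-library average. The kT value and all names are illustrative assumptions.

```python
import math


def conformer_populations(scores, kT=0.593):
    """Normalized Boltzmann weights for one molecule's per-conformer scores.

    Assumptions: lower score = better; kT in the same units as the scores.
    """
    best = min(scores)
    weights = [math.exp(-(s - best) / kT) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def mean_populations(score_matrix, kT=0.593):
    """Average each conformer's population over all molecules.

    `score_matrix[m][c]` is the score of molecule m against conformer c.
    """
    pops = [conformer_populations(row, kT) for row in score_matrix]
    return [sum(col) / len(pops) for col in zip(*pops)]


def population_shift(subset_matrix, full_matrix, kT=0.593):
    """Per-conformer population of a subset minus that of the full library."""
    sub = mean_populations(subset_matrix, kT)
    full = mean_populations(full_matrix, kT)
    return [s - f for s, f in zip(sub, full)]


# Hypothetical scores: three molecules docked against two conformers.
full_library = [[-8.0, -6.0], [-5.0, -5.0], [-4.0, -7.0]]
hits_only = [full_library[0]]
shift = population_shift(hits_only, full_library)
```

Since each molecule's populations sum to one, the shifts across conformers sum to zero: a conformer that is over-represented for one subset must be balanced by others that are under-represented.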
a. Dependence of the ROC AUC and ROC(2%) scores on the accuracy (RDC RMSD) of the TAR ensembles for the Full and Optimized libraries for all hits (blue) and cell-active hits (orange). All ROC values were generated from one run of docking all molecules to all receptors. b. Mean and s.d. of EBVS scores for hits (blue) and non-hits (gray) for all ensembles for the Full and Optimized libraries. Dashed lines represent the values for the E0,4rdc ensemble. c. Binding pocket volume (Å³) and buriedness (ranging from 0.5 for fully open pockets to 1.0 for fully occluded pockets) defined by ICM for each conformer of all TAR ensembles.
Supplementary Figure 10 RDC evaluation of ligand-bound NMR structures and comparison of inter-helical angles for ligand-bound NMR structures to EBVS-predicted structures for E0,ran.
a. Agreement between NOE-based NMR structures and previously published RDCs using the best-fit order tensor determined using RAMAH. Red points denote bulge residues (U23-U25). The number of RDCs used in each correlation plot is n = 46, 30 and 40 for argininamide, acetylpromazine and Neomycin B, respectively. b. Comparison of inter-helical angles for the ligand-bound NMR structures (black, mean and s.d. values over all deposited structures) with conformers in the E0,ran ensemble (open squares) and the Boltzmann-weighted EBVS-predicted structures (red, mean and s.d. values over n = 20 independent docking runs).
Supplementary Notes 1 and 2, and Supplementary Table 2.
TAR binders augmented with hits reported in the literature. Measured CD50 values represent the mean and s.d. from n = 3 independent experiments.
Table of PDB structures used in the ligand-bound RNA docking benchmark.