Structural ensemble-based docking simulation and biophysical studies discovered new inhibitors of Hsp90 N-terminal domain

Heat shock protein 90 (Hsp90) is one of the most abundant cellular proteins and plays a substantial role in the folding of client proteins. The inhibition of Hsp90 has been regarded as an attractive therapeutic strategy for treating cancer because many oncogenic kinases are Hsp90 client proteins. In this study, we report new inhibitors that directly bind to N-terminal ATP-binding pocket of Hsp90. Optimized structure-based virtual screening predicted candidate molecules, which was followed by confirmation using biophysical and cell-based assays. Among the reported crystal structures, we chose the two structures that show the most favourable early enrichments of true-positives in the receiver operating characteristic curve. Four molecules showed significant changes in the signals of 2D [1H, 15N] correlation NMR spectroscopy. Differential scanning calorimetry analysis supported the results indicating direct binding. Quantified dissociation constant values of the molecules, determined by a series of 2D NMR experiments, lie in the range of 0.1–33 μM. Growth inhibition assay with breast and lung cancer cells confirmed the cellular activities of the molecules. Cheminformatics revealed that the molecules share limited chemical similarities with known inhibitors. Molecular dynamics simulations detailed the putative binding modes of the inhibitors.


Results
Selection of algorithm for high-throughput structure-based virtual screening. The poses and energies of the docked chemicals reflect the performance of the HTSBVS. Poses predicted by SBVS should be similar to those found in crystal structures. True-positives should have lower energies than false-positives, enabling the earlier enrichments of true-positives in HTSBVS. The combined use of the best pair of software and structure can provide improvements in poses and energies. We first selected the best algorithm by testing which software could most faithfully reproduce the poses of inhibitors found in X-ray crystal structures (Fig. 1). Inhibitors with crystal structures available in complex states were docked into the Hsp90N structures using six algorithms from five software programs [AutoDock 21 , AutoDock Vina 22 , DOCK 3.6 23 , DOCK 6.7 24 (anchor-and-grow and rigid methods), and Glide-SP 25,26 ]. The docked poses in each structure were compared with those found in crystal structures. The predicted poses that could be overlapped with the experimental ones within 2 Å of the root mean square deviation (RMSD) were judged as successful. Glide showed an average success rate of 33.9%, followed by DOCK 6.7 (anchor-and-grow, 30.3%), AutoDock Vina (26.1%), DOCK 3.6 (21.4%), DOCK 6.7 (rigid, 18.7%), and AutoDock (16.6%) (Fig. 1). The highest success rates for Glide, DOCK 6.7 (anchor-and-grow and rigid), DOCK 3.6, AutoDock, and AutoDock Vina were 47.0, 42.4 and 29.4, 45.8, 34.7, and 38.8%, respectively. Therefore, we selected Glide as an engine for subsequent dockings.
Selection of template structure and high-throughput structure-based virtual screening. We then chose the best structure by performing SBVS in the crystal structures with 33 inhibitors extracted from the BindingDB 27 (Fig. S1) and 2370 of their property-matched but topologically different decoys (Fig. 1). Here, the inhibitors were filtered to be similar with a Tanimoto coefficient (Tc) value less than 0.6. The performance was judged based on logarithmically scaled area under the curve (LogAUC) based on receiver operating characteristic curve (ROC). A higher value of LogAUC reflects the early enrichment of the true-positives more faithfully than using the AUC value. Considering faithful pose reproduction and LogAUC, two structures, 2BYI-A (Chain A from Hsp90N with PDB ID of 2BYI) 28 and 2YI5-A 29 , were selected. In 2BYI-A and 2YI5-A, 38% and 39%, respectively, of the reproduced poses were judged as successes. The values of AUC, LogAUC, and enrichment at 1% (EF1) were 83.5, 33.5, and 25.9, respectively, in 2BYI-A and 83.0, 31.9, and 27.62, respectively, in 2YI5-A. Figure 1 shows the ROC curves of 2BYI-A and 2YI5-A. It is noteworthy that the averaged values of AUC, LogAUC, and EF1 for all 108 crystal structures are 75.7, 21.0, and 16.3, respectively. The backbone RMSD between 2BYI-A and 2YI5-A is 1.08 Å. Despite the close similarity in the structures, however, no top candidates in the HTSVS were overlapping between the two, again demonstrating the heavy dependence of HTSBVS on the template structure in use. We purchased the top 30 compounds from the results of both 2BYI-A and 2YI5-A, comprising 60 in total, based on the Glide-SP scoring functions (Fig. S2).

Two-dimensional NMR experiments identified direct binders.
Of the biophysical methods for studying protein-ligand interaction, 2D [ 1 H, 15 N] correlation NMR spectroscopy is the method that generates fewer false-positives 30,31 . Compared with other methods that generate an averaged, single, and overall pattern, all the peaks in 2D NMR can probe intermolecular interaction. The existence of specific changes in 2D NMR peaks upon the addition of a small molecule may be sufficient to judge the molecule as a real binder. We observed noticeable signal changes when adding compounds 1 (ZINC09350001), 2 (ZINC00302593), 3 (ZINC04643798), and 4 (ZINC04757705) ( Table 1). Interestingly, they share some apparent similarities with each other. 1-3 contain resorcinol moieties linked with pyrazole; on the contrary, 4 has triazole in the position corresponding to pyrazole in 1-3. The perturbed patterns of 2D NMR mimicked those of the reference, 17-DMAG (Figs 2 and S3), though the quantified changes differed with 1-4. The mapping of the perturbed residues onto 3D structure allowed the confirmation that 1-4 bind to ATP binding site of Hsp90N (Figs 3 and S4). Because the perturbations happened at the slow exchange, we employed line-shape analyses to calculate K d values using a series of 2D [ 1 H, 15  using differential scanning fluorimetry (DSF) to support direct binding (Fig. 2C). Apo state showed the T m value of 44.4 °C. Remarkably, the order of the elevated T m values in 1-4 and 17-DMAG of 7.3, 8.9, 3.5, 6.5, and 11.3, respectively, are entirely consistent with the 2D NMR data, suggesting that direct binding led to the stabilization of Hsp90N.

Cell-based activity test confirmed the cellular activities of the inhibitors. Whereas only 1 and 2
were significantly inhibitory to MCF7 cells (Fig. 4A), all four compounds inhibited the growth of A549 cells in a concentration-dependent manner (Fig. 4B). The decreased growths using all concentrations were comparable to or smaller than the decreased growth caused by 17-DMAG for both MCF7 and A549 cells. To know the effects of 1-4 in non-transformed cells, we tested the cell growth inhibition in MCF10A cells that originate from human breast mammary gland as well. The addition of the inhibitors resulted in the cell growth inhibition, but no explicit concentration dependency was observed for 1-4 within uncertainty (Fig. 4C). The hallmark response of Hsp90 inhibition is the induction of Hsp70. We further checked mRNA levels in MCF7 cells using RT-PCR upon the addition of 1-4 and 17-DMAG in 3 and 6 h. Meaningful time-dependent increments were detected (Fig. 4D). These data will suggest that 1-4 have activity through Hsp90 in the cellular levels.

Cheminformatics retrieved the chemical novelties and chemicals similar to the hit molecules.
We then searched for the closest known inhibitors of 1-4 in terms of chemical similarity by Tc from BindingDB, which contains Hsp90N inhibitors confirmed directly by the enzyme-based experiments. Corresponding molecules were ZINC16051694 (Tc = 0.46, K d = 280 nM) 34 , ZINC01264822 (0.36, IC 50 = 0.5 nM) 35 , ZINC18130036 (0.42, IC 50 = 2 μM), and ZINC00421600 (0.34, IC 50 = 50 μM) for 1-4, respectively (Fig. 5). The data with ZINC18130036 and ZINC00421600 came from the PubChem BioAssay database (AID: 712) 36 . In order to compare the chemical similarity in detail, we employed the similarity ensemble approach (SEA) 37 . In SEA, the pairwise Tc values between two molecules are summed to form ΣTc. By comparing the ΣTc in a test molecule and (C) Receiver operating characteristic (ROC) curves from two structures for high-throughput virtual screening in this study are drawn in red (2BYI-A 28 ) and magenta (2YI5-A 29 ). The blue line corresponds to that of 1UYG-A 51 , which was used as the DUD-E benchmark 20 . Random enrichment is shown in black for comparison. In the ROC curves, the x-axis is logarithmically scaled to emphasize the earlier enrichments of true-positives. The values of LogAUC for 2BYI-A, 2YI5-A, and 1UYG-A are 33.5, 31.9, and 12.9, respectively. the distribution of ΣTc in a set of small molecules, we can compare the chemical novelty of a test molecule. In this study, only Tc values greater than 0.3 were considered for clarity. A higher value of ΣTc indicates that a molecule shares more chemical similarity to the others. The values of ΣTc were 1.86, 1.35, 2.19, and 0.96 for 1-4, respectively, which correspond to 453rd, 515th, 425th, and 617th of 617 Hsp90N inhibitors. The mean value of ΣTc was 8.28 in the known inhibitors, with 41.57 as the largest value for a macrocyclic o-aminobenzamide, ZINC71340643 38 Figure 29,40 , supporting the reliability of the docked models (Fig. 6B). MD simulation assessed the predicted binding modes in detail (Fig. 7). The overall folds of Hsp90N in complex with inhibitors showed no significant changes compared with the starting structure. The residence time of a contact in an MD simulation can qualitatively reflect the importance of the interaction. Of the intermolecular hydrogen bonds, those through the OD2 of Asp-93 remained almost invariable. In 1-4, the occupancy ratios were more than 99% (Fig. 7A) in 1-4 (Fig. 7B). The hydrogen bonds by the O of Gly-97 existed with residential portions of 82, 66, and 78% for 1-3, respectively. On the contrary, the O of Gly-97 in 4 was not used for the intermolecular hydrogen bond, occupying 5% of the trajectories (Fig. 7C). Similarly, the hydrogen bond through the OD1 of Asp-102 in 4 was unstable appearing at about 12%. The results indicate that the detailed features of the intermolecular hydrogen bonds differ between 1-3 and 4, although the docked models share apparent similarities. The difference may be caused by the improper location of water molecules that play substantial roles in the intermolecular interactions between Hsp90N and inhibitors 6 .

Discussion
The main purpose of SBVS is to find new chemotypes that are difficult to identify by simple quantitative structure-activity relationships. Nevertheless, hits with a new chemotype having higher activity will be preferred in the optimization process for lead compounds. Because the quantified inhibitions of hits by SBVS are dependent on the physicochemical features of the target sites as well as the assays employed, it would be inconclusive to directly compare the data. Even considering these caveats, the sub μM inhibitions observed in 1 and 2 are somewhat stronger than those commonly measured for hits identified by SBVS 41 . In particular, the ligand efficiency metrics for 1 (0.40) and 2 (0.44) show promise for further optimization 42 . Several papers have reported new hits against Hsp90N using SBVS [43][44][45][46] . One of them employed 2D NMR to quantify the inhibition of three hits with K d values in the range of 2 to 20 μM 43 . Our 2D NMR screen used 1:1 ratios of protein and ligand. This condition is more suited to detect strong inhibitors because the population of the protein in complex varies roughly depending on the K d in the system. Indeed, the change of the chemical shifts in 1-4 was the largest in 2 followed by 1. On the contrary, if the ligand is more concentrated than the protein (>10-fold), the portion of the bound form relies less on the K d , and has the advantage of detecting weak binders. Two-dimensional NMR is a robust method to minimize the identification of false-positives such as Pan-Assay INterference compoundS (PAINS) 30,47 as hits.
The four hits in this study scarcely share chemical similarity to the known PAINS 48 . We additionally employed DSF as an orthogonal method to 2D NMR to confirm the direct bindings. It is noteworthy that Hsp90N alone has little ATP catalyzing activity 49 . Therefore the quantified confirmation of the direct binding between Hsp90N and  Here ΔH and ΔN mean the chemical shift difference between apo and holo forms in 1 H and 15 N dimensions. Once calculating the mean and standard deviation (SD) values of ΔCS in the cases of ΔCS > 0, the residues of Hsp90N are classified into four criteria: (i) ΔCS ≤ mean + SD, (ii) mean + SD < ΔCS ≤ mean + 2 SD, (iii) ΔCS > mean + 2 SD, and (iv) disappeared. The residues coloured in yellow, red, and orange correspond to the cases of (ii), (iii), and (iv), respectively. a small molecule is important to interpret the accompanying cellular activities. Our approach could be applied for discovering the small molecules that modulate the protein-protein interaction in a similar way. Here we would like to stress that the careful 2D NMR-based assays became practically applicable since SBVS had reduced the number of test samples.
Hsp90N is known as a problematic case for enriching true-positives using SBVS. In the DUD-E database 20 (Fig. S6). Also, the docking with 1-4 and their property-matched 200 decoys showed that 1-4 were hardly retrieved with the lower scores in the open form, whereas 1-4 were ranked with the lowest scores in the closed form (Fig. S6). Interestingly, the numbers of van der Waals contacts between Leu-107 and 1-4 were much more in the open form than the closed form. It indicates that the location of the region around Leu-107 directly influences positioning of 1-4. Please note that in this study we neither made any assumption on the conformation nor employed tailored scoring functions and instead selected the structure showing the best profile for HTSBVS. This may suggest the generality of our approach. The results are consistent with previous reports where the use of the structural ensemble increased the success rate of initial hits [52][53][54] . If the experimentally available structures are insufficient in number, an ensemble prepared by MD simulations would be an alternative 55 . In this study, we did not alter the algorithm once deciding the best software to use. The paired algorithm and structure can be searched simultaneously, although this requires substantial calculation times. This will be the subject of a future study.
The cell growth inhibitions of 1-4 and 17-DMAG seem qualitatively consistent with the in vitro binding affinities. It would imply that the inhibition of Hsp90N under the cellular environments occurs largely dependent on the strength of the direct binding. Meanwhile, one may question about the quantitatively less correlated activities in the protein-binding and cell-based assays. It is equivocal to link the binding affinities with the cellular activities, but one possible explanation would be the difference in the pharmaceutical features. Compared with 17-DMAG that has been tested in clinical stages after the extensive modifications by medicinal chemistry, 1-4 are hits in the current stage. The cell permeability and solubility under cellular environments as well as the efficacy of 1-4 are less optimized. Of the Hsp90N inhibitors in clinical development, the compounds chemically closest to 1-4 are AUY922 56 and STA9090 57 . Both are classified as the resorcinol analogues from radicicol 11 . Radicicol, also known as monorden, is a macrolactone natural product from Diheterospora chlamydosporia. Since radicicol inhibits topoisomerase 58 , bacterial sensor kinase PhoQ 59 , and pyruvate dehydrogenase kinase 60 by direct binding to the ATP binding pockets of these enzymes, there is a possibility that 1-4 inhibit the enzyme in a similar manner. These proteins are classified as the members of GHKL (Gyrase, Hsp90, Histidine Kinase, MutL) that share an evolutionarily conserved ATPase domain 61 . In fact, some resorcinol compounds inhibit both Hsp90N and pyruvate dehydrogenase kinase 62,63 . A related issue is the selective inhibition for the paralogue Hsp90N of Hsp90α, Hsp90β, Grp94, and TRAP1 64 . As reported by recent papers, it has become feasible to develop organelle-specific Hsp90N inhibitors 65 . Simple comparison indicates that the contacting residues in Hsp90N for 1-4, Asp-93 and Thr-184, are all conserved in the four isoforms, increasing the possibility that the hits are inhibitory for the paralogs of Hsp90N. However, the actual inhibition and the detailed dynamic features such as the residence times of intermolecular hydrogen bonds remain to be addressed.
In conclusion, we have applied SBVS to find new inhibitors of Hsp90N, which has been known as a case with poor enrichment of true-positives by SBVS. Use of an ensemble of experimental structures has resulted in the finding of new types of inhibitors, 1-4, coupled with 2D NMR experiments. The cellular activities were confirmed by MTT experiments using cancer cells. We employed computational studies to understand the intermolecular interactions at the atomic level. The identified chemotypes of 1-4 show limited similarities to the known inhibitors, but the core regions for hydrophilic interactions are shared. The detailed study of 1-4 will be helpful for developing new and potent Hsp90N inhibitors.   with five software programs, AutoDock 21 , AutoDock Vina 22 , DOCK 3.6 23 , DOCK 6.7 24 , and Glide 25,26 . Six algorithms, including two from DOCK 6.7 (anchor-and-grown and rigid), were applied with default parameters. To generate a dataset consisting of the known Hsp90N inhibitors and their physicochemically matched but topologically different decoys, the strongest 33 inhibitors were extracted from the BindingDB database based on IC 50 or K i values 27 . The DUD-E server then generated 2370 decoys 20 . An in-house written script, ALIS-DOCK (Automated pLatform for Integrative Structure-based DOCKing), automated SBVS by simultaneously using a Linux-cluster. The criteria used to choose the best structure for virtual screening were three metrics from the ROC curves [(area under the curve (AUC) 67 , logarithmically scaled AUC (LogAUC) 23 , and enrichment at 1% (EF1) 68 ]. Once all molecules were arranged with their Glide scoring function in ascending order, the proportion of each molecule as false-positive or true-positive was plotted as x and y variables, respectively, to generate a ROC curve. The ROC curve was integrated to yield the AUC having a range of 0-1. Logarithmical scaling of the x-axis of the AUC from 0.001-1 expanded the range as −3-0, yielding a total area of 3. Once the value of AUC in the state was normalized by dividing by 3 and subtracting 0.145, the expected enrichment by chance, this was used to define the LogAUC. Therefore, the value of LogAUC lies in the range of −0.145 to 0.855. The values for AUC and LogAUC were expressed as percentages. The EF1 was defined as the enrichment factor in 1% of the compounds. For instance, EF1 of 10 indicates that 10% of the true-positives have lower scores than the molecule that is found in 1% of the false positives.

Selection of algorithm and best
High-throughput structure-based virtual screening and cheminformatics. Small molecules from the ChemBridge Express (San Diego, CA, U.S.A.) library were virtually screened using their corresponding molecules registered in ZINC 69,70 . The database supplied by the vendor was curated to contain only compounds satisfying the Lipinski rule of five, yielding approximately 450,000 compounds. The top 60 chemicals from two cases were chosen based on the scoring functions and purchased for assays. For the study of cheminformatics, the Tanimoto coefficient (Tc) was calculated based on Morgan circular fingerprint using RDKit (http://www.rdkit.org). The value of Tc lies between 0 and 1, where values closer to 1 indicate more chemical similarities.

MD simulation.
A series of MD simulations were performed in the apo and holo states using the AMBER 16 package supported by GPU 71,72 . Whereas Hsp90N was handled with the ff14SB force field, SQM and LEaP programs were used to prepare the GAFF force field for the inhibitors. The solvents, comprising TIP3P waters and Na and Cl ions for neutralization, solvated the apo or holo structures in an octahedron box. The thickness of the solvent shell from the protein was kept at least 10 Å. The Particle mesh Ewald (PME) method was used for long-range electrostatic interactions. Non-bonded interactions were truncated with 10 Å as a cut-off. All bonds involving hydrogen were constrained by the SHAKE method with an integration time step of 2 fs. Four stages comprised a run: minimization, heating, equilibrium, and production. Heating of the system from 0 to 300 K was done at a constant volume for 50 ps followed by 1500 cycles of minimization. For the next 100 ps, the equilibrium continued at a constant pressure of 1 atm and a temperature of 300 K. MD simulation for producing trajectories were followed for 102 ns using Langevin thermostat with a collision frequency of 2 ps −1 under a constant pressure of 1 atm. Analyses employed the later 100 ns with three runs starting from a structure with different random seeds. Trajectories were analysed using CPPTRAJ 73 . Purification of the Hsp90 N-terminal domain. The plasmid construct for overexpressing Hsp90N contains DNA corresponding to residues 1-235 of human Hsp90α. For stable isotope labelling, Escherichia coli strain BL21 (DE3) harbouring the expression plasmid was grown in M9 minimal medium and induced by 1 mM IPTG at 37 °C for 6 h. Harvested E. coli cells were lysed by sonication in 25 mM Tris buffer (pH 8.0) supplemented with 5 mM dithiothreitol (DTT), a tablet of Protease Inhibitor Cocktail (Roche), and 4 mM ethylene-diaminetetraacetic acid (EDTA). The soluble fraction of the cell lysate was applied to a 60 mL Sepharose Q FF column equilibrated with 25 mM Tris (pH 7.5), 2 mM EDTA, 2 mM DTT, and proteins were eluted with a linear gradient to 1 M NaCl. The protein was then concentrated using a Centriprep10 (Amicon) and purified by gel-filtration chromatography on a 350 mL Sephacryl S100 HR (2.6 × 65 cm) column in 20 mM Tris (pH 7.2), 0.1 M NaCl, 2 mM EDTA, and 2 mM DTT. NMR spectroscopy. All NMR spectra were recorded with Bruker Ascend 600 MHz spectrometer equipped with a TCI cryoprobe at 25 °C. Data were processed using TOPSPIN and were visualized using NMRView software 74 . The NMR buffer consisted of 50 mM phosphate buffer (pH 7.0) with 20 mM NaCl and 1 mM deuterated DTT, and 2.5% deuterated DMSO. To screen for small molecules that bind to Hsp90N, 15 N-labelled protein and two small molecules were mixed with 1:1 stoichiometry at a concentration of 100 μM, leading to 30 data sets in total. In cases with significant changes in the signal of 2D [ 1 H, 15 N] HSQC spectra, additional HSQCs were recorded with an individual small molecule. For the backbone chemical shift assignments, standard triple resonance experiments were performed with 13 C/ 15 N-labelled Hsp90N. A series of 2D [ 1 H, 15 N] HSQC spectra were recorded by changing the ratios of the concentrations of the small molecule and protein to calculate dissociation constants (K d s) with the four hits using the TITAN (TITration ANalysis) package 32 . The chemical shifts of 2D [ 1 H, 15 N] HSQC in holo forms were assigned by tracking the peak changes and observing 3D HNCA of holo forms. 17-DMAG, a water-soluble geldanamycin analog, was used as a positive control. Differential scanning fluorimetry. DSF using a RT-PCR system (CFX connect, BioRad) assessed the stability of Hsp90N in the absence and presence of the hit compounds that were identified by 2D NMR 75 . Once Sypro-Orange (5×) was added into the solution of Hsp90N with each hit compound, the fluorescence from the dye was measured with excitation at 492 nm and emission at 610 nm in a temperature-dependent manner from 30 to 90 °C. The concentrations for Hsp90N and inhibitors were 2 and 200 μM, respectively. As a control, DMSO