Exploration of the structural requirements of Aurora Kinase B inhibitors by a combined QSAR, modelling and molecular simulation approach

Aurora kinase B plays an important role in the cell cycle to orchestrate the mitotic process. The amplification and overexpression of this kinase have been implicated in several human malignancies. Therefore, Aurora kinase B is a potential drug target for anticancer therapies. Here, we combine atom-based 3D-QSAR analysis and pharmacophore model generation to identify the principal structural features of acylureidoindolin derivatives that could potentially be responsible for the inhibition of Aurora kinase B. The selected CoMFA and CoMSIA model showed significant results with cross-validation values (q2) of 0.68, 0.641 and linear regression values (r2) of 0.971, 0.933 respectively. These values support the statistical reliability of our model. A pharmacophore model was also generated, incorporating features of reported crystal complex structures of Aurora kinase B. The pharmacophore model was used to screen commercial databases to retrieve potential lead candidates. The resulting hits were analyzed at each stage for diversity based on the pharmacophore model, followed by molecular docking and filtering based on their interaction with active site residues and 3D-QSAR predictions. Subsequently, MD simulations and binding free energy calculations were performed to test the predictions and to characterize interactions at the molecular level. The results suggested that the identified compounds retained the interactions with binding residues. Binding energy decomposition identified residues Glu155, Trp156 and Ala157 of site B and Leu83 and Leu207 of site C as major contributors to binding affinity, complementary to 3D-QSAR results. To best of our knowledge, this is the first comparison of WaterSwap field and 3D-QSAR maps. Overall, this integrated strategy provides a basis for the development of new and potential AK-B inhibitors and is applicable to other protein targets.

www.nature.com/scientificreports/ proteins is required for AK-B to correct chromosome location in mitosis process. Similar to AK-A, it regulates different processes in mitosis. AK-B activity is based on the phosphorylation of its residue Thr232 and Ser10 of its substrate Histone H3 15,16 . necessary for accurate completion of mitosis Inhibition of AK-B activity can lower the H3 phosphorylation, as a result of which improper condensation of chromosome and in-complete cytokinesis occurs, ultimately leading to cell apoptosis. Its over-expression is found in multiple human cancers such as mesothelioma, malignant endometrium, glioblastoma, non-small cell lung carcinoma, hepatocellular carcinoma, testicular germ cell tumours, oral cancer, thyroid, ovarian, colorectal and prostate cancer [17][18][19][20] . Aurora kinase C particularly is found in the testes and plays an important role in spermatogenesis and regulates the movement of flagella and cilia 21 .
Structurally, Aurora kinases are comprised of three domains: the N-terminal domain, followed by a conserved kinase domain, and a C-terminal domain. The N-terminal domain exhibits sequence dissimilarity thus providing selectivity for protein-protein interactions. The kinase domain constitutes a β-stranded lobe and an α-helical lobe on the N-terminal and C-terminal respectively connected by a hinge region. The autophosphorylation of Thr 288 (AurA), Thr232 (AurB) and Thr195 (AurC) in the catalytic T-loop region of the kinase domain's C-terminal lobe results in conformational changes thereby activating the kinase domain 22,23 . Aurora kinases are considered as potential drug targets for tumour remedy due to their strong association with tumorigenesis 24 .
A number of small molecules inhibitors against AK-B have been reported to date. A classification for such compounds has been reported by Yan and colleagues who utilized a machine learning algorithm (Self-Organizing Map and Support Vector Machine) to classify Aurora kinase inhibitors into three classes, namely dual Aurora-A and Aurora-B inhibitors, Aurora-A selective inhibitors and selective inhibitors of Aurora-B 25 . Aurora kinases are effectively inhibited in numerous preclinical cell line and animal models. Several small molecule inhibitors have been developed and are indifferent stages of clinical trials. Uptil now only ten clinical trials have been reported, using inhibitors targeting specifically AK-B and most of them are still in phase I stage 26 such as AK-B selective inhibitors Barasertib (AZD1152) [27][28][29][30][31] , Chiauranib 32,33 , BI847325 34 , PF-03814735 35,36 , GSK1070916 [37][38][39] , TAK-901 40 and hesperidin 41 and the pan Aurora A/B inhibitors VX-680 19 and ZM447439 [17][18][19]29 . Most of these inhibitors are ATP-competitive with a planar heterocyclic ring system that can reside in the adenine-binding region and mimic adenine-kinase interactions. These inhibitors are derivatives of indole, bis-indole, pyrrolopyrazole, pyrimidine, thiazolo-quinazoline, quinazolin, pyrazoloquinazoline, fused tricyclic structures, and some other structures 42 . In addition, the prospect of designing potent allosteric inhibitors for AK-B appears very promising from the clinical perspective, because it may generate selective inhibitors 43 . Recently, Colombo and colleagues proposed a new possibility in pursuit of designing selective chemical tools by perturbing the intermolecular interaction between chaperone and kinases thus, targeting protein folding and rendering it inactive 44 . Unfortunately, despite intensive efforts, no small molecule inhibitor of AK-B has been approved by the FDA, thus demanding the development of new inhibitors.
In the current study, a multiplex computational approach including molecular docking, 3D-QSAR, pharmacophore-based screening, MD simulation and binding free energy calculation was performed to investigate the binding mechanism of AK-B inhibitors. The 3D-QSAR model was built using CoMFA and CoMSIA approaches to explore the structural requirements of inhibitors impacting their inhibitory activities. Furthermore, to investigate the conformation and contribution of key residues to the potency of compounds, docking, MD simulations and binding free energy calculations were performed. In this study our main focus is to demonstrate a combined computational approach: the combination of QSAR with simulation, specifically in combination with the WaterSwap 45 method. This is first such combined application according to best of our knowledge. By understanding the binding mode of aurora kinase B inhibitors, we can get deeper insights into the structural requirements that may produce more selective and potential AK-B inhibitors.

Materials and methods
Data set. A structurally diverse dataset of 57 compounds was collected from literature to perform 3D-QSAR studies probing the effect of inhibition against AK-B 46,47 . On the basis of structural diversity and activity, 45 out of 57 compounds were randomly selected as the training set however, the remaining 12 compounds were used as the test set. The training set was used to build the 3D-QSAR model while the test set compounds were used for the verification of the developed models. The test compounds evenly covered the range of biological activity and structural diversity of the dataset. Observed biological activities (IC 50 ) were changed to negative logarithm (pIC 50 ) before 3D-QSAR model generation. The chemical structures and the experimental activities of compounds are presented in Table S1 (supplementary material). Structure preparation. The crystal structures of Aurora kinase B crystallized with inhibitors were taken from the protein data bank ( Table 1). The structure preparation was performed in MOE 48 and the missing loops, steric clashes and missing atom names were corrected. The hydrogen atoms were added, water molecules were removed, and the energy minimization was performed using Amber99 force field. The protonation state of every titratable residue within the complexes was assigned at physiological conditions using the Protonate-3D module of MOE 48 and their conformations were produced with default parameters by using the Conformation Search option. Finally, the protein complex was minimized (giving a RMSD of 0.3 Å by selecting heavy atoms) using the force field AMBER99 49,50 . Pharmacophore modeling. The MOE option in LigandScout 56 was used to build the pharmacophore models on the basis of available crystal structures of AK-B complex (Table 1). Hydrophobic (HY), hydrogenbond donor (HBD), hydrogen-bond acceptor (HBA) and exclusion volume features were considered for the generation of pharmacophore model. Numerous pharmacophore models were generated with significant statistical Partial least square analyses (PLS). To associate the CoMFA or CoMSIA fields of 3D-QSAR utilizing AK-B inhibitors and their biological activities, partial least square (PLS) analysis (regression analysis) was performed. The optimal number of components that produces most promising extrapolative models was recognized by using cross-validation leave-one-out (LOO) method. The column filtering value was set at 2.0 kcal/mol to make better signal to noise ratio, by removing some lattice points, with less difference in energy than threshold value. The PLS analysis was performed using no column filtering with non-cross-validation method. The mean values for cross-validation correlation coefficient q 2 and the resultant standard error of prediction (SEP) values were achieved by reiterating every random run of cross-validation.
Validation of CoMFA and CoMSIA models. The robustness and internal prediction power of the generated models was assessed by cross-validation leave one out (LOO) method. The q 2 and r 2 values showed the internal prediction power and sturdiness of the model respectively. The average values of r 2 (r boot 2 ) and SEE (SEE boot )) were calculated, a bootstrapping analysis of 60 runs was also performed to measure the biasness of the calculation. It has been reported that only q 2 value is not enough to assess the predictive quality of 3D-QSAR model thus external validation must be applied. For this purpose, a total of eleven test set compounds that were not utilized previously to build the QSAR model were used for the external validation. The external test set validation is considered as the most reliable validation method to evaluate the extrapolative capability of the developed model. The predictive ability of CoMFA and CoMSIA models was assessed externally by forecasting the biological activity of independent set of test compounds. The model prediction power was expressed by r 2 (rpred 2 > 0. 6), variance in prediction (Qext 2 > 0. 5), standard error of prediction (SEP) and external standard deviation error of prediction (SDEPext).
The given equation was used to calculate R 2 pred .
(1) www.nature.com/scientificreports/ where PRES indicates the sum of squared deviations between biological and predicted activity values while SD indicate the sum of squared deviations between average biological activity of compounds in the training set and experimental activity of the test set compounds. The cross-validated coefficient (q 2 ) was calculated by the following given formula: where y i and ỹ i are the experimental and the predicted values, respectively; and y is the averaged value for the variables of the training set.

Molecular dynamics simulations.
To test the stability of the docked complexes, and for insight into the binding interactions in the inhibitor complexes, molecular dynamic simulations were performed with the AMBER 12 simulation software package 49 using the ff14SB 60 and ff99SB force field 61 . The Acyl-54 (most active), Acyl-24 (inactive), ZINC11253730, ZINC42019540, ZINC65618522 and ZINC07046484 compounds bound in target structures were obtained from molecular docking, as described above. The atomic charges of the compounds were calculated based on the electrostatic potential from single point HF/6-31G* calculations using Gaussian 03 and fitted using RESP in the Antechamber module 62 . The GAFF force field 63 was used to parametrize the compounds. LEaP module was used to assign Hydrogen atoms by setting default protonation states of ionizable residues at a neutral pH. The systems were solvated using the TIP3P water model and a minimum of 10 Å solute-wall distance was used to cover every complex. Each complex was neutralized with chloride counter ions. The solvated systems were energy minimized by following two stages of steepest descent algorithm and conjugate gradient algorithm. The molecular dynamic simulations were performed using heating, density equilibration and production run in Berendsen isothermal isobaric ensemble MD. The systems were gradually heated up to 300 K in 500 ps at constant volume and constant pressure (1 atm) with a Langevin thermostat that was used to maintain a temperature of 300 K. While an isotropic pressure scaling algorithm was used to maintain the pressure of 1 bar, using a pressure relaxation time of 1 ps. Then every system underwent production run of 200 ns. The particle mesh Ewald method was used for calculating long-range electrostatic interactions, with a 10 Å cutoff. The SHAKE algorithm was used to constrain bonds involving hydrogen atoms to their equilibrium length. After equilibration, coordinates were saved every 50 ps. These collected trajectories were used for structural and energetic analysis of each system.
Binding free energy calculations. Binding free energy of all systems was calculated by two different methods: WaterSwap 45,64 and MM(PB/GB)SA 65 . WaterSwap is an absolute binding free energy method that avoids the cavitation and some other issues of double-decoupling methods 45 . WaterSwap calculates the free energy for exchanging a bound ligand with a flexible water cluster of similar shape and size. The free energy difference is calculated by Monte Carlo replica exchange simulations along the Water Swap Reaction Coordinate. It uses an explicit treatment of water, so includes the detail of protein-water, protein-water-ligand and ligand-water interactions that are missing in continuum solvent methods 64 . WaterSwap uses a reaction coordinate which swaps the bound ligand with an equivalent volume and shape of water, moving the ligand from the protein into bulk solvent, with simulations at points along this WaterSwap Reaction Coordinate, using a dual topology algorithm. It uses an identity constraint 66 to define the water cluster: this identity constraint places identity points in space instead of labelling water molecules to define water molecules in the binding site. The absolute binding free energy is calculated by replica exchange thermodynamic integration method 67 . The end points are a pair of simulation boxes (coupled to the same thermostat), one being a water box (box of water molecules) and the other is protein box contain protein-ligand complex solvated in periodic boundary box of explicit water molecules. With the identified cluster, the total energy of the system is assessed using the following equation, where E proteinbox is the energy of all the molecules in the protein box except the ligand, E waterbox is the energy of all the molecules except the water cluster identified in the water box, E ligand represents the intramolecular energy of the ligand, E cluster is the intermolecular energy between all the water molecules present in the water cluster, E ligand:protein box is the energy of interaction between the ligand and all atoms of the protein box, E cluter:water box is the resultant energy of interactions between all water molecules and water clusters of the water box, λ is the WaterSwap Reaction Coordinates, used to scale E cluster:water box by 1 − λ. The decoupling of the ligand from the protein box is associated with decoupling of water cluster from water box. Simultaneously, cluster of water coupled to the protein box and ligand is coupled to the water box. This energy calculation between water molecules and the ligand in the protein box is represented by E ligand:water box, and the molecules and water cluster in the box is presented as E cluster:protein box and scaling by λ. The λ is a single coordinate reaction that is changed from λ = 0 (ligand bound to the protein in protein box) to λ = 1 (unbound ligand and is in bulk water). Moreover, it corresponds to a transferred of water cluster to the protein box for filling the resulting cavity.
The absolute binding free energy is calculated by thermodynamic integration (TI) using the gradient of energy calculations with respect to λ, www.nature.com/scientificreports/ The ensemble average is calculated at different values of λ to obtain free energy gradient across λ, To get average free energy gradients, Monte Carlo (MC) sampling for each λ must be achieved. The binding free energy is finally obtained by integrating the gradient across λ, WaterSwap calculations of absolute binding free energy were performed with the Sire package using the WSRC module 45 , these calculations used the same forcefield and solvent model as in the molecular dynamics simulations 67 . For each system, five WaterSwap calculations were performed of absolute binding free energies. The starting structures for WaterSwap calculation taken from clustering of the 100 ns trajectories. Errors were calculated from standard errors on these averages. The WaterSwap binding free energies were calculated by replica-exchange thermodynamic integration over 16 λ windows (0.005, 0.071, 0.137, 0.203, 0.269, 0.335, 0.401, 0.467, 0.533, 0.599, 0.665, 0.731, 0.797, 0.863, 0.929, 0.995) across the WaterSwap Reaction Coordinate. For each window, 50 million Monte Carlo steps were performed, and absolute free energies calculated by Free Energy Perturbation (FEP), Thermodynamic Integration (TI), and Bennetts algorithm, from the free energy gradient calculated over the last 30 million steps. The "Set A" soft-core parameters are used in the simulations with 15 Å Lennard-Jones and coulomb non-bonded cutoff, the shifted force field used to account long range electrostatics and reflection sphere used to restrict sampling within 15 Å radius of compound. VMD 1.9.1 68 used to visualize the trajectories. Analysis and visualization were carried out with alignment of protein backbone using RMSD tool in VMD.
The binding free energy of the system was also calculated separately by a completely different approach, namely the MM(PB/GB)SA method implemented in AMBER 16 69 . This is an implicit solvent approach, which can give useful results in some cases. 1000 frames were extracted from the simulation trajectories and MMPBSA. py module of AMBER was used for calculation analysis. In the MM(PB/GB)SA approach, the binding free energy is calcualted as the difference between the free energy of complex, and that of the ligand and receptor.
Furthermore, the binding free energy was decomposed into residue contributions to identify amino acids that contribute to binding affinity 70,71 . The binding interaction of each inhibitor-residue pair includes four terms: where the van der Waals contribution (ΔG vdw ) and the electrostatic contribution (ΔG ele ) are calculated using the Pmemd.cuda program 72 in Amber 16. The nonpolar solvation contribution (ΔG nonpol ) is the contribution of nonpolar to the solvation free energy and calculated from the solvent accessible surface area (SASA) model by the LCPO method: (ΔG SA = 0.0072 × ΔSASA). The polar solvation contribution (ΔG pol ) was computed using the generalized Born module in Amber16.

Results and discussion
Molecular docking. Before starting the docking, all available crystal structures of AK-B protein were assessed to evaluate the degree of similarity in their binding sites. The results indicate that most of the binding site residues are conserved within the binding site ( Figure S1). After evaluating the degree of similarity, all compounds were docked to the binding site of 4AF3 protein. The docking produced 30 conformations for each compound. The clusters were examined, and the final docked conformation was selected on the basis of scores. The binding pocket of AK-B is composed of residues Leu83, Phe88, Val91, Ala157 and Leu207. The docked conformation of VX-680, which is a cognate ligand of aurora kinase B, is shown in the Fig. 1. It was observed that compound acyl-54 was well located within the binding pocket of AK-B and showed similar type of interaction with the active site residues of AK-B as presented by cognate ligand. The amino group of Ala157 forms two hydrogen bonds with pyrazolidine and linker nitrogen of VX-680 at a distance of 1.92 and 2.39 Å. Additionally Leu83, Val91 and Phe88 are involved in hydrophobic interaction while Phe88 forms pi-pi interaction with the benzene ring of VX-680. In case of acyl-54, a hydrogen bonding interaction is found between amino group of Ala157 and carbonyl group of acyl-54 at a distance of 1.85 Å. Another hydrogen bond is found between Glu155 and NH group of Acyl-54 at a distance of 3.05 Å. These hydrophilic interactions have been already reported in the literature 73 . As well as docking with the 4AF3 51 structure, the selected compounds were also docked to the active site of other AK-B crystal structures (Table 1). Further, these docked conformations were used in receptorguided 3D-QSAR studies.

Molecular alignment.
In CoMFA and CoMSIA studies, 3D structures of the molecules are required to be aligned based on a suitable conformational template and its substructure, which taken as a "bioactive" conformation. Molecular alignment based on the common scaffold has been extensively used 74,75 ; conversely it is thought to be more reasonable if the model is developed and evaluated on the active conformations. Moreover, the alignment based on docked conformations will help in the contour map analysis of the models in a structure-(4) dE/D = E cluster:proteinbox + E ligand:waterbox − E ligand:waterbox + E cluster:waterbox www.nature.com/scientificreports/ based manner. In the present study, molecular alignment was achieved via molecular docking (Fig. 2). As a result, all the compounds were aligned in the active site for building 3D-QSAR models.
CoMFA and CoMSIA statistics. The CoMFA and CoMSIA models were built using Sybyl7.3 to assess the changes in three-dimension structural features of chemical substitution affects the anticancer activity of acylureidoindolin derivatives. The statistical values of these models are summarized in Table 2   www.nature.com/scientificreports/ correlation coefficients r 2 pred values 0.733 and 0.947, respectively. Relatively, the CoMSIA model presented better competency in predicting the biological activity of external test set compounds (Fig. 3). The statistical results indicate that these developed models are robust enough to design novel and potent inhibitors. Observed and predicted pIC50 values of CoMFA and CoMSIA model was shown in Table 3.  Table 2. The generated 3D-QSAR models were found to be reliable and satisfactory if q 2 value is greater than 0.50 and r 2 value is greater than 0.90, These statistical keys, rep-  www.nature.com/scientificreports/  50 values. The resulting q 2 and r 2 values were in the range of − 0.059 to 0.085 and − 0.006 to 0.030 respectively. The predicted pIC 50 values of acyl and indoline derivatives are listed in Table 3. The correlations between calculated and predicted pIC 50 values for the training and the test sets are depicted in Fig. 3. External validation (using a test set) was also carried out to evaluate the stabilities and the predictive capability of the generated QSAR model. These statistical calculations validated the good ability of our CoMFA and CoMSIA model to predict the external dataset.

CoMFA and
CoMFA and CoMSIA contour map analysis. To visualize the effects of CoMFA and CoMSIA fields on acyl and indoline derivatives in three dimensional spaces, the contour plots for the final model of CoMFA and CoMSIA are shown in Fig. 4. The contour analysis might be effective in recognizing the significant areas where changes in all five fields around the molecule describe the differences in IC 50 values. The field type "standard deviation and coefficient" (stdev*-coeff) was used for building the contour maps. The compound acyl-54 (most active compound) was overlaid on contour plots for visualization.
In CoMFA model, the sterically favorable and unfavorable areas are highlighted by green and yellow colors respectively. While electrostatic map of CoMFA model shows blue and red color for electron donating and withdrawing group respectively. The CoMFA contour map is shown in Fig. 4(A-D).
It is shown that the sterically favorable contour is present near the linker which is adjacent to the core structure of biologically most active compound (acyl-54) indicating that the bulky substitution at that position is favorable (Fig. 4A). This is observed for two compounds, acyl-52 and acyl-53. Another green contour found near the carbonyl group of linker region also explains the sterically favorable substitution at this position. One of the green contours found near the methoxy group of flouro-benzoyl around Phe88 indicates that the methoxy group at this position is favorable for the activity of indoline derivatives, while in the case of indoline-14, 24a, 24c, 24g, 25a, 25h many fold decrease in inhibition is observed due to absence of methoxy group as compared to most active compound. A yellow contour is present near the CH2 of NH(CH2)2pyrrolidin-1-yl methyl group which indicates that the less bulky substitution is suitable for this position.
The CoMFA electrostatic maps displayed in Fig. 4(C,D), represented by red and blue color contours, indicate electron withdrawing and donating groups respectively. Two red contours found near the carbonyl oxygen of the ureido moiety and the side chain the of pyrrole ring indicate that electron withdrawing group around these areas is responsible for increasing the AK-B inhibitory activity. This is verified experimentally by the compounds indoline 24a-24i and 25h where decreased in activity is observed in the absence of these carbonyl oxygens. A blue contour is found near the NH which is adjacent to 2-floro-4-methoxybenzoyl group which indicates that electron donating group at this position is favorable for increasing inhibitory activity. Presence of another red contour indicating that 2-fluoro substituent on 4-methoxy benzoyl was required for optimum activity of aurora kinase B. Similar results can be observed in case of compound acyl-32a, 32b, 32c, 34 and 35 that do not contain fluorine atom at the aforementioned position which is might be responsible for their reduced activity in comparison to acyl-54.
The steric and electrostatic contour map analysis of CoMSIA model showed similar results to CoMFA. Therefore, the remaining three fields of CoMSIA model (hydrophobic, hydrogen bond donor and acceptor) are discussed in this section. The hydrophobic contour analysis of CoMSIA model is represented in Fig. 4(E,F), in which white contour is favorable for hydrophilic character while yellow areas were favorable for hydrophobic features. A white contour was found near the NH of pyrrole ring demonstrating that the hydrophilic substitution at this position was favorable for enhancing its inhibitory activity. At the same time, another large white region was observed near ureido moiety which is complementary to the pocket requirement. A yellow contour was observed near methyl of pyrrole ring suggesting the significance of non-hydrophobic character at pyrrole ring for AK-B inhibition activity as shown in Fig. 4(E,F).
The hydrogen bond donor and acceptor analysis represented in Fig. 4(G,H), the cyan and purple contour represent the maps where hydrogen bond donor groups favored and disfavored the activity respectively. Two medium sized cyan contours were observed near NH of pyrrole ring and indoline scaffold indicating that it fulfils the pocket requirement and responsible for inhibitory activity of indoline derivatives. A purple contour was found around the methoxy group of 2-floro-4-methoxybenzoyl indicating that the hydrogen bond donor substituent is not favorable in this zone. The structure of compound 32a (pIC 50 = 7.8) and 32b (pIC 50 = 8.2) presented the similar results, the only difference in their structure is the replacement of methoxy group by hydrogen in 32b, which shows better activity than 32a.
The magenta and red contour of Fig. 4(G,H) revealed the areas where hydrogen bond acceptor groups increase and decrease activity, respectively. A red contour near NH of indoline core structure suggested that hydrogen bond acceptor group is unfavorable at this position which is also proved experimentally. Another red contour appeared near NH of ureido moiety, suggesting that the hydrogen bond donor groups are favorable at this position. Two magenta contours were found near the carbonyl oxygen of ureido moiety indicating the importance of a hydrogen bond acceptor group at this position for better activity. Here, compounds acyl-32a-32j, 34, 38-53 with ureido moiety show better activity than those which do not contain ureido moiety such as indol-14-14b, 24a, 25h. The biological activities of these compounds are lower, consistent with the contour analysis, showing that our model is robust enough and can be used for future predictions. www.nature.com/scientificreports/ www.nature.com/scientificreports/ Identification of hits via structure-based pharmacophore. A pharmacophore can be defined as "a collection of electronic and steric features which is required for optimum molecular interaction with a specific biological target, to block its biological response 76 . " For successful structure-based pharmacophore virtual screening one should know the maximum information about target of interest. To meet this point, we developed a robust pharmacophore model by using eight crystal structures of AK-B to identify novel compounds that may inhibit AK-B activity. For this purpose, eight basic pharmacophore models were generated, then aligned to produce number of shared and merged pharmacophore models to identify the best model with true and high active hit rate. The selected model was produced by combination of 4AF3 51 , 2VGO 53 and 2VGP 55 features i.e. two hydrophobic (yellow color), one donor (green), one acceptor (red) and three exclusion volume (grey). The selected pharmacophore model with desired features is represented in Fig. 5. The generated pharmacophore model was then assessed for its capability to discriminate between true positive and true negative by actives, random and decoys dataset. The hit rate of the best model with these three datasets is 70%, 3% and 10% respectively. Compound libraries from ChemBridge 77 , MayBridge 78 , NCI 79 and ZINC 80 databases (> 30 million) were first filtered via drug-like and lead-like parameters to give 17.4 million molecules that were further decreased in number (16.9) by removing duplicates. The selected model was used to screen filtered databases and ~ 23,000 hits were obtained. The 70% of active lie in top 25% of the dataset which shows the reliability of pharmacophore model, these results were also supported by statistical techniques i.e., AUC and enrichment factor. ROC curve was calculated to measure the sensitivity and specificity between active and decoys in order to avoid false positive prediction on the basis of pharmacophore fitness score and the value of AUC was found to be 0.73. Further dataset was reduced by similarity signature technique via ROC software with 0.6 cut-off. After similarity search, > 2000 compounds were used for biological activity prediction by previous developed 3D-QSAR model. Finally, 17 compounds were considered as hits on the basis of pharmacophore fit score, docking interaction and 3D-QSAR predictions.
Binding mode prediction for active compounds by docking. Molecular docking was performed using the Dock suite in MOE 48 to predict the most plausible binding conformation of the compounds. Docking is a computational technique that aims to predict the binding position poses of small molecules in protein binding sites, with scoring functions used to evaluate which poses are best matches to the protein binding site. In this study, two aspects are analysed to measure the quality of docking method: first is docking program should be able to reproduce the experimentally determined crystal pose and second the scoring function should be able to discriminate between true binders and non-binders (active and inactive). Primarily acylureidoindolin derivatives as potential Aurora-B inhibitors are docked in the active site, to investigate the binding site of AK-B.
After prediction of hit compounds by the 3D-QSAR model, seventeen compounds were selected based on pharmacophore fit score, docking scores (London dG and GBVI/WSA dG) and prediction by 3D-QSAR models.
As the most representative sample, the binding mode of AK-B (PDB ID: 4AF3 51 ) was selected to continue with further docking analysis. This analysis included assessing the compounds position inside the binding cavity and their hydrogen bond formation pattern. It is evident that these compounds bind in the catalytic pocket of AK-B. The interactions specifically with Lys106, Glu155 and Ala157 in the hinge region and Leu83, Val91 and Leu207 in the conserved hydrophobic region are consider for the binding affinity. Only four compounds from the ZINC database showed hydrogen bond interactions with Lys106, Glu155 and Ala157 as shown in the Fig. 6. The proposed binding modes of the compounds ZINC11253730, ZINC42019540, ZINC65618522 and ZINC7046484 are shown in Fig. 6. Analysis of the binding mode of compound ZINC11253730 showed that it interacted via www.nature.com/scientificreports/ hydrogen bond between the carbonyl oxygen of the carbamoyl acetamide and the NH of Ala157 with the distance of 2.64 Å. Another hydrogen bond was observed between the side chain of Glu155 and carbonyl oxygen of ZINC11253730 at a distance of 2.36 Å. Asn206 also attached via hydrogen bond that radiated from its side chain oxygen and targeted the CN of compound with the distance of 3.5 Å. Besides hydrogen bonds, the molecule also interacted to the target protein via hydrophobic interaction through its benzonitrile ring to the phenyl ring of Lys106. Similarly, compounds with zinc identification code ZINC65618522 and ZINC7046484 interacted efficiently with Ala157 and Glu155 through hydrogen bond of ~ 3.0 and ~ 2.5 Å respectively. These compounds also bound through hydrophobic interaction via their phenyl ring with the five-membered ring of Pro158. Another compound, ZINC42019540 (which is similar to ZINC11253730) interacted through two hydrogen bonds with the backbone NH of Ala157 and carbonyl oxygen of compound mentioned in the Fig. 6, however, Leu83, Phe88, Pro158 and Leu207 residues formed CH-π and π-π interactions with all these compounds.

Stability of complexes and binding mode analysis. Initially, replicates of 200 ns MD simulations
were performed with both forcefields to explore the binding of the reported inhibitors to AK-B protein. In this regard a total of eight simulations were performed using crystal complex (4AF3) and the reported most active inhibitor (Acyl-54) of acyl ureido indoline derivative. The results indicate that ff14SB force field is more suitable for AK-B system than ff99SB due to excessive structural drift in the activation loop of AK-B. Detailed results for both forcefields are provided in the SI ( Figure S2-S3). After analyzing the results with both force fields, the ff14SB force field was finally used to run the 200 ns MD simulations of remaining complexes with Acyl-24, ZINC11253730, ZINC42019540, ZINC65618522 and ZINC07046484. Simulation convergence and system stability were monitored by computing root mean square deviation (RMSD) of backbone atoms C, Cα, N and O with respect to the starting structure (Fig. 7). The RMSD plot showed stability of all systems at around 5 ns. The RMSD values of all systems initially increased during equilibration phase and indicate convergence after 10 ns. The active site residues show stability with less deviation. The RMSF plot was also analyzed from 0 to 200 ns for the regions which show higher flexibility (Fig. 8). The largest fluctuations were observed in the loop region. The binding of compounds did not affect the proteins overall conformational diversity. The interaction analysis in terms of hydrogen bonding and hydrophobic interactions and their occupancy were calculated during simulation ( Figure S4-S5).
The reference compound VX-680 binds deeply within the active site of AK-B via a number of hydrophilic and hydrophobic interactions with the residues Lys106, Glu155 and Ala157 (hinge region) and Leu83, Val91 and Leu207 of hydrophobic region respectively. The amino group linking the pyrazole and pyrimidine ring show hydrogen bonding to the carbonyl oxygen atom of Ala157 throughout simulation with 99% occupancy. Another hydrogen bond which is not found during docking is observed between the Ala157 and N atom of pyrazole ring. The rest of the hydrophobic interactions observed in docking remain stable during the MD simulation. The benzene ring of VX-680 forms a pi-pi interaction with residue Phe88. Initially the cyclopropyl ring formed hydrophobic interaction with Leu207 but after 2 ns simulation it projected out towards the solvent. The piperazine ring is placed in a solvent-exposed region of the cavity and encircled by Arg81, Leu83, and Gly160. In case of the most active compound acyl-54, most of the hydrophobic interactions found by docking were conserved during MD simulation, while changes were observed for a number of hydrogen bonds. The NH of Ala157 formed hydrogen bond with double bonded oxygen of indoline-2-one moiety with 100% occupancy. Another new hydrogen bond was observed during MD simulation between the linker oxygen of acyl-54 and the sidechain of Arg82. shows CH-π and T-shaped π-π stacking interaction with Phe219. In the case of ZINC11253730, this CH-π interaction was found between benzonitrile ring and Lys106.    Table 4. Further, the binding free energy is decomposed into individual residue contributions (Figs. 9-10). The decomposition analysis, highlight the major contributing residues for binding of AK-B inhibitors. For visualization of this residue decomposition, the CHEWD 88 plugin for Chimera was used.
Here, the WaterSwap residue-based binding energy decompositions were investigated to highlight the residue which significantly contributed to the inhibitor binding throughout the course of MD simulations. The time-averaged values of the total (ΔG residue ) components for all crucial residues that display good binding affinity towards the short listed compounds are shown in the Fig. 10. The Residues with positive values from the WaterSwap decomposition analysis favour binding of the water cluster, while negative values indicate that the residue contributes favourably to ligand binding. This shows that in case of reference compound 4af3, acyl-54 and acyl-24 charged residues (Lys106 and Glu161) stabilise the water cluster, while these compounds are stabilised by promising hydrophobic contacts with the neighbouring residues Leu83, Phe88, Val91, Pro158, Gly160 and Leu207 of the binding site, and all these come into direct contact with the large compounds, as some of these contacts are missing in case acyl-24 which is the least compound of the series and also smaller in size. This is realized that the binding contribution is increasing with ligand size. Similarly, the short-listed compounds ZINC11253730, ZINC42019540, ZINC65618522 and ZINC7046484, are stabilized by hydrophobic contact with the surrounding residues Leu83, Phe88, Tyr156, Leu207 and Phe219. Besides hydrophobic interactions, these compounds also show negative contribution with the charged residues Glu155 and Ala157 because of their involvement in hydrogen bond interaction. The interaction between the compound and Lys165 is unfavourable in all complexes except 4af3 and acyl-54, where it is slightly favourable due to its participation in van der Waals interactions. After computing residue-based binding free energy decomposition, the results of WaterSwap and 3D-QSAR maps were correlated. For this purpose, the components of ΔG residue value were visualised by modifying them into a score which is used to colour each corresponding residue differently in molecular viewer. Snapshot from the resulting movie of acyl-54 (most active compound) are shown in Fig. 9. The trajectory displays that the residue based free energy components are fairly stable throughout the simulation. The residues are coloured by their total, van der Waals and electrostatic free energy components.
Additionally, Fig. 9 shows that the charged residues Lys106, Glu155 and Ala157 in the hinge region provides strong electrostatic stabilization (highlighted in red); this result is consistent with the 3D-QSAR maps in which two red contours found near the carbonyl oxygen of the ureido moiety of acyl-54 indicate that electron withdrawing group around these areas is responsible for increasing the AK-B inhibitory activity. Additionally, the compound attains promising van der Waals stabilisation by residues Leu83, Val91 and Leu207 (shown in blue). This decomposition reveals the experimental observation that the compound mostly binds via hydrophobic interactions. This observation is also well correlated with 3D-QSAR maps. The decomposition result also indicates that binding affinity of compound may be improved by adding a hydrogen bond donor group to get the same www.nature.com/scientificreports/ hydrogen bonding interactions that are observed to stabilise the swapped water cluster. To best of our knowledge, this study provides the first comparison of WaterSwap fields and 3D-QSAR maps.
To get further insight into inhibitor binding, MM/GBSA free energy decomposition analysis was used to decompose the total binding free energies into per residue components (Fig. 10). Table 4 lists the binding free energies for all inhibitor complexes. The residue decomposition approach suggested that major binding contribution comes from Glu155, Trp156 and Ala157 of site B and Leu83 and Leu207 of site C that play an important role in the binding. Free energy decomposition also shows favourable electrostatic contributions by Glu155 and Ala157 in both the WaterSwap and 3D-QSAR results.

Conclusions
AK-B is a promising target in the field of oncology 7 . The current study employed structure-based pharmacophore modelling and an atom-based 3D-QSAR analysis of acylureidoindolin derivatives as AK-B inhibitors, followed by MD simulation and free energy calculations. The selected CoMFA and CoMSIA model exhibited good extrapolative power and strong correlation between theory and experiment with cross-validation values (q 2 ) of 0.68, 0.641 and linear regression values (r 2 ) of 0.971, 0.933 respectively. The contour maps of 3D-QSAR indicate that electrostatic and hydrophobic fields explain the activity of acylureidoindolin derivatives. Structural requirements including the carbonyl group of ureido moiety and the methoxy of the fluorobenzoyl in the vicinity  (Table 1) of AK-B inhibitor complexes were used to generate structure based pharmacophoric hypothesis to facilitate the identification of novel compound from commercial database. All compounds that satisfied the structural pharmacophoric features were analyzed with our 3D-QSAR model. To remove false positives, and refine the hits, 7 compounds were evaluated by MD simulation and binding free energy analysis. Decomposition of overall binding energy showed that major contributions came from Glu155, Trp156 and Ala157 of site B and Leu83 and Leu207 of site C. These results could help in understanding AK-B inhibition and in designing more specific and potent inhibitors against AK-B protein to treat various malignant disorders.

Data availability
Data created during this research is provided as supplementary information accompanying this paper.