To construct a quantitative pharmacophore model of tubulin inhibitors and to discovery new leads with potent antitumor activities.
Ligand-based pharmacophore modeling was used to identify the chemical features responsible for inhibiting tubulin polymerization. A set of 26 training compounds was used to generate hypothetical pharmacophores using the HypoGen algorithm. The structures were further validated using the test set, Fischer randomization method, leave-one-out method and a decoy set, and the best model was chosen to screen the Specs database. Hit compounds were subjected to molecular docking study using a Molecular Operating Environment (MOE) software and to biological evaluation in vitro.
Hypo1 was demonstrated to be the best pharmacophore model that exhibited the highest correlation coefficient (0.9582), largest cost difference (70.905) and lowest RMSD value (0.6977). Hypo1 consisted of one hydrogen-bond acceptor, a hydrogen-bond donor, a hydrophobic feature, a ring aromatic feature and three excluded volumes. Hypo1 was validated with four different methods and had a goodness-of-hit score of 0.81. When Hypo1 was used in virtual screening of the Specs database, 952 drug-like compounds were revealed. After docking into the colchicine-binding site of tubulin, 5 drug-like compounds with the required interaction with the critical amino acid residues and the binding free energies <-4 kcal/mol were selected as representative leads. Compounds 1 and 3 exhibited inhibitory activity against MCF-7 human breast cancer cells in vitro.
Hypo1 is a quantitative pharmacophore model for tubulin inhibitors, which not only provides a better understanding of their interaction with tubulin, but also assists in discovering new potential leads with antitumor activities.
The microtubule system of eukaryotic cells has an essential role in regulating cell architectures; this system is crucial during cell division because microtubules are a key component of the mitotic spindle1. Microtubules are targeted by anticancer drugs and are involved in numerous essential cellular processes, such as cell signaling, motility regulation, maintaining cellular shape and transporting material within the cell1,2.
Antimitotic agents arrest the cell cycle at the G2/M phase, resulting in tumor regression and apoptotic cell death3,4,5. The tubulin-binding agents that are regarded as classic antimitotic agents interfere with the dynamics of microtubules by targeting tubulin; these compounds are frequently used to treat human cancers5. Recently, the clinical use of some tubulin inhibitors, such as taxanes and vinca alkaloids, has been limited by neurotoxicity and drug resistance5,6. Therefore, new small-molecule tubulin-binding inhibitors must be developed with novel modes of action5,7,8. The development of this type of drug is focused on the design of novel tubulin inhibitors.
Historically, researchers have maintained a considerable interest in the discovery and development of novel inhibitors that can interfere with tubulin polymerization9,10,11. In recent years, researchers have been actively exploring new antitubulin agents because of the toxicity and drug resistance of the antitubulin chemotherapy drugs12. Various small molecules have been reported as inhibitors of tubulin polymerization; these compounds bind to the colchicine-binding site on tubulin12,13,14,15. Although many different tubulin inhibitors had been synthesized and experimentally assessed, no information is available regarding the discovery of structurally novel leads. Chemical feature-based pharmacophores and virtual library screening may guide the design of novel lead candidates. This study aims to construct a chemical feature-based pharmacophore model and identify lead candidates with antitumor activities.
In our study, we successfully used pharmacophore modeling, database screening, and molecular docking approaches to identify potential leads with antitumor activities. A high-correlation quantitative pharmacophore model was generated using the observed structure-activity relationship of known tubulin inhibitors. After validation, this pharmacophore model was used as a 3D structural search query to find new classes of compounds from Specs database. The hit compounds were subjected to molecular docking studies for refinement. The binding free energy and molecular interactions with the active site residues were considered important components when identifying the potential leads.
Materials and methods
Pharmacophore model generation
The HypoGen module of Discovery Studio program (DS), version 2.5, from Accelrys (San Diego, CA, USA) was used to perform all of the pharmacophore modeling calculations. To represent the structural diversity and broad activity range, 66 compounds from literature resources1,2,5,9,16,17,18 were selected for use in the primary data set during the 3D QSAR pharmacophore modeling study. To ensure statistical relevance, 26 compounds with the experimental activity values (IC50) were selected from the 66 dataset compounds for use as a training set; the remaining 40 compounds (Figure S1) were used as test-set compounds during pharmacophore validation. To achieve a significant pharmacophore hypothesis, the above data set was selected based on the following criteria: (1) all 66 compounds with inhibitory activity against the CEM cancer cell line bind at the colchicine site on tubulin to inhibit tubulin assembly. (2) The data set must be widely populated, covering an activity range encompassing at least 4 orders of magnitude. The inhibitory activity values of the training-set compounds span five orders of magnitude, specifically from 0.52 nmol/L to 13 800 nmol/L, while those of test-set compounds span four orders of magnitude, specifically from 2.8 nmol/L to 14 900 nmol/L. (3) To avoid using the different standard IC50 values generated using different methods and labs, the inhibitory activity of 66 compounds used in the data set was collected from the same wet-lab assays and biological assessments; these compounds were built and subsequently geometrically optimized to the closest local minimum based on a Charmm-like force field (DS). All 26 compounds in the training set were submitted to 3D QSAR pharmacophore generation using DS. The best conformer generation option, which involved a maximum number of 250 and an energy threshold of 10 kcal/mol above the energy minimum necessary for conformation searching, was selected to generate multiple conformations. Hydrogen bond donor (HBD), hydrogen bond acceptor (HBA), hydrophobic (HY) and ring aromatic (RA) features were used to generate ten pharmacophore models. All other parameters used in the HypoGen module were kept at their default settings19,20. In this study, the top 10 hypothetical structures returned by the generation process were selected for further calculations.
The quality of a pharmacophore model is determined primarily by using two theoretical cost calculations that are represented in bit units. One is the “null cost” representing the highest cost of a pharmacophore model with no features; this value estimates every activity as the averaged activity data from the training-set molecules. The second cost is the “fixed cost,” also known as cost of an ideal model, which represents the simplest model that fits all the data perfectly. The total cost should always be far from the null cost and near the fixed cost when developing a meaningful model. The cost difference between the null and fixed cost values should be larger for a significant pharmacophore model. A value of 40–60 bits in a model implies that it has 75%–90% probability of representing a true correlation within the data19,20. The hypotheses are also evaluated based on other cost components. The cost value for every hypothesis is the summation of the weight cost (W), the configuration cost (C) and the error cost (E). The weight cost is a value that increases in a Gaussian form as the feature weights in a model deviate from the ideal value, which is two. The configuration cost measures the entropy of the hypothesis space. The error cost is the value that represents the root-mean-squared difference (RMSD) between the estimated and experimental activity value of the training-set compounds. If the input training-set compounds are too multiplex owing to too much flexibility in the training-set molecules, an effusive number of hypotheses will be generated from the subtractive phase. This configuration cost should always be less than 17. The correlation coefficient of the pharmacophore model should be close to 1.
Pharmacophore model evaluation
The best pharmacophore model was further validated by test set, Fischer randomization, decoy set and leave-one-out methods.
A total of 40 compounds with experimental activity data were selected from reported articles for the test set1,2,5,9,16,17,18. This method is used to elucidate whether the generated pharmacophore model can predict the activities of the compounds other than the training set and classify them correctly in their activity scale. The conformation generation for the test-set compounds was performed using the Diverse Conformation Generation protocol in DS. The different conformations of 40 compounds were subsequently determined for pharmacophore mapping using the Ligand Pharmacophore Mapping protocol with the Best/Flexible Search option available in DS.
Fischer randomization method
To verify whether a strong correlation exists between the biological activities and the chemical structure of the training-set compounds, a Fischer randomization test was carried out. This method generates pharmacophore hypotheses by randomizing the activity data of these compounds while using the same parameters and features used to generate the original pharmacophore hypothesis. For the Fischer's randomization test, a 95% confidence level was chosen for this validation study, and 19 random spreadsheets were constructed19,20. During the pharmacophore generation process, if the randomized data set generates similar or better cost values, RMSD and correlation, the original hypothesis were generated by chance21.
An internal database was developed using 800 compounds containing 43 active structures collected from the reported literature22,23,24,25,26,27,28,29,30,31,32. The database was used to evaluate the discriminative ability of the best pharmacophore model when distinguishing the active compounds from the inactive compounds. A database screening was performed using the Ligand Pharmacophore Mapping protocol available in DS. A set of statistical parameters were calculated including the total hits (Ht), % yield of actives, % ratio of actives, enrichment factor (E), false negatives, false positives, and goodness of hit score (GH).
The generated pharmacophore hypothesis is validated using a leave-one-out method. In this method, one compound is omitted during the generation of a new pharmacophore model, and its affinity is predicted by that new model. The model building and estimation cycle is repeated until each compound is omitted once33. This test verifies whether the correlation coefficient of the training-set compounds depends mainly on one particular compound34.
The CONCORD computer program (Tripos Associates, St Louis, MO) was used to convert the two-dimensional structures of the tested compounds from the Specs database into three-dimensional structures with the addition of charges. All compounds in the Specs database were further filtered based on Lipinski's rule of five35,36,37. A Lipinski-positive compound has the following qualities: (i) a molecular weight <500; (ii) <5 hydrogen bond donor groups; (iii) <10 hydrogen bond acceptor groups; (iv) an octanol/water partition coefficient (Log P) value <519,20. To identify any novel hit compounds, the validated pharmacophore model was used as a 3D query to screen the drug-like compounds in the Specs database. A Search 3D Database protocol with Best/Flexible search option was applied during the database screening19,20. Finally, these compounds were retrieved for further analysis and were selected based on the ligand conformations; these conformations can satisfy the binding free energy and molecular interactions with the key amino acids in the active site.
A Molecular Operating Environment (MOE) (Chemical Computing Group Inc, Montreal, Quebec, Canada) was used for molecular docking. A crystal structure of tubulin, which was obtained at 3.58 Å, was downloaded from the protein data bank (PDB ID: 1SA0). This structure was protonated in the Molecular Operating Environment (MOE)38. The active site was defined with a 6 Å radius around the bound inhibitor (colchicine) in the tubulin crystal structure. The triangle matcher algorithm of the MOE software packages was selected to dock the identified hit compounds into the protein active site. The scoring function must comply with the following parameters: (1) specifying ASE Scoring to rank the poses output by the placement stage; (2) specifying Forcefield Refinement to relax the poses; (3) specifying Affinity dG Scoring to rank the poses using the refinement stage39. The free energy of binding was calculated from the contributions of the hydrophobic, ionic, hydrogen bond, and van der Waals interactions between the protein and the ligand, intramolecular hydrogen bonds and strains of the ligand. We observed that the docking poses were ranked by the binding free energy calculation in the S field.
Cell proliferation inhibition assay
The biological assays were performed by using an MTT assay against one normal human cell line (HBL100) and one human breast cancer cell line (MCF-7) with abundant tubulin expression. The two cell lines were cultured in DMEM/1640 medium supplemented with 10% fetal bovine serum, 200 U/mL penicillin and 200 U/mL streptomycin. For in vitro treatment, the carcinoma cells were seeded in 96-well plates (6000 cells/well) and incubated at 37 °C and 5% CO2. After 24 h, the cells were treated with a known concentration of each test compound for 48 h. At the end of the drug exposure period, the cells were incubated at 37 °C for 4 h to 6 h by adding 3-(4,5-dimethylthiazol-2-yl)-2,5-diphenyltetrazolium bromide (MTT, Sigma) (20 μL/well). Next, the medium was removed, and 200 μL of DMSO was added to the insoluble fraction. The absorbance values at 490 nm were determined with a Spectramax M5 Microtiter Plate Luminometer (Molecular Devices Corporation, Sunnyvale, CA, USA). The values were calculated using the percentage of growth versus the untreated control.
Results and discussion
To correlate the chemical structure of tubulin inhibitors quantitatively to their biological activity, the HypoGen algorithm, which is available in the 3D QSAR Pharmacophore Generation protocol of DS, was carried out. During pharmacophore model generation, a training set containing 26 compounds (Figure 1) with activity values ranging from 0.52 to 13 800 nmol/L was used to generate ten top-scored hypothetical pharmacophores. The results for the top ten hypothetical pharmacophores and their statistical parameters are shown in Table 1. In this study, the first hypothetical pharmacophore (Hypo1) is the best; this structure has the lowest total cost value (114.523), the largest cost difference (70.905), the lowest root-mean-squared difference (RMSD) value (0.6977), and the highest correlation coefficient (0.9582).
A statistical data analysis was performed to assess the quality of the generated hypothetical pharmacophores. The two main values used for the cost analysis are the difference between null and fixed cost and the difference between the total cost and the null cost. The fixed cost of the run was 98.2482, which was far from the null cost of 185.428 and close to the total cost of 114.523. The large difference (87.1798) between the fixed and null cost values suggests that Hypo1 has more than 90% statistical significance as a model. All the 10 hypothetical pharmacophores were subjected to further assessment for their ability to predict the activity of the training-set molecules. A value for the configuration cost below 17 indicates that the correlation from the generated pharmacophores cannot be attributed to chance. All hypotheses have RMSD values below 2, illustrating the good predictive quality of these hypothetical structures. The rule to select a hypothetical pharmacophore with the lowest total cost, a large cost difference, a high correlation coefficient and a low RMSD value reveals that Hypo1 has the best statistical values compared to the other hypothetical structures. Therefore, Hypo1, which included one hydrogen-bond acceptor (HBA), one hydrogen-bond donor (HBD), one hydrophobic feature (HY), one ring aromatic feature (RA) and three excluded volumes (EV), was chosen as the best structure for further analysis (Figure 2A). The 3D space and distance constraints of these features are represented in Figure 2B.
Activity prediction and mapping of the training-set compounds on Hypo1
To verify the predictive ability of Hypo1 with the training-set compounds, a regression analysis was used to estimate the activity of each training-set compound. The experimental activities of the training-set compounds were classified into four groups: highly active (IC50<20 nmol/L, ++++), active (20≤ IC50<200 nmol/L, +++), moderately active (200≤IC50<2000 nmol/L, ++), and inactive (IC50≥2000 nmol/L, +)10. As shown in Table 2, three of the twenty-six training-set compounds were predicted to have different activities than their experimental values. The error value is the ratio between the estimated and experimental activities. An error value below 10 signifies that the estimated activity was below one order of magnitude. None of the 26 training-set compounds had an error value above 4. Figure 2C and 2D map the most and least active compounds of the training set on Hypo1, respectively. Clearly, compound 1 mapped well on all of the hypothetical features, while compound 26 did not map on to two of the hypothetical features, particularly HBD and RA, signifying the importance of these features. Therefore, Hypo1 is a reliable model that accurately estimates the experimental activity of the training-set compounds.
The best model pharmacophore (Hypo1) was further validated using the test-set, Fischer randomization test, leave-one-out and decoy-set methods.
The predictive ability of Hypo1 was evaluated using test-set predictions. The validation process was performed using a test set containing 40 compounds with diverse activities and different functional groups. Various conformers of these test-set compounds were built using the same method as that used for the training-set compounds while using DS. The estimated activity values were predicted for every test-set compound based on the geometric fit of these compounds over Hypo1. The simple regression between the experimental and estimated activity values for 40 test-set compounds had a correlation coefficient value of 0.9181 (Figure 3). Thirty-five of the forty test-set compounds had error values below 2 (Table 3), similar to the experimental and estimated activity values. A good correlation was observed between the experimental and estimated IC50 values, revealing the good predictive capacity of Hypo1.
Fisher randomization method
To validate the statistical confidence of Hypo1, Fischer's randomization method was performed on the training-set compounds. During the validation process, the experimental activities of the training set were randomly mixed and the resulting training set was used in a HypoGen module with the parameters chosen when generating the original pharmacophore. To show that Hypo1 was not generated by chance with a 95% confidence level, a set containing 19 random spreadsheets was generated (Figure 4). None of the randomly generated pharmacophore models obtained from this validation method was produced with statistical values better than those of Hypo1. The Fischer's randomization test confirmed that Hypo1 was statistically robust.
An internal database containing 800 compounds was used during the validation process. This database was created with 757 inactive compounds and 43 inhibitors with known experimental activity. To investigate the ability to distinguish the actives from inactive compounds, Hypo1 was used as a 3D query to screen the internal database while using the pharmacophore search module. Hypo1 retrieved 49 compounds, 39 of which were active. The enrichment factors (E), goodness of hit score (GH) and other statistical values were calculated for Hypo1 using this database (Table 4). The false positive value is 10, and the false negative value is 4. The calculated E value is 14.81, indicating that this model is highly efficient for database screening. When the GH score exceeds 0.7, the model is very good. This score was 0.81 for Hypo1, revealing that this structure could identify the active compounds.
The leave-one-out method was used to cross-validate the model. For this method, the hypothetical pharmacophores were recomputed by omitting one compound at a time from the training set. This process proves that the correlation coefficient of Hypo1 does not depend solely on one particular compound. If the corresponding one-missing hypothesis can correctly predict the activity of each excluded compound, the test is positive. The value of the correlation coefficient, the feature composition of the pharmacophore and the quality of the activity estimated for the excluded compound were used to assess the statistical test. We did not obtain any meaningful differences between Hypo1 and any of the 26 hypothetical structures resulting from the leave-one-out method, confirming that the correlation coefficient for Hypo1 did not depend solely on one particular compound in the training set at required confidence level.
The steps used during the database screening are shown in Figure 5. First, the concord software was used to convert the two-dimensional structures of the tested compounds in the Specs database into three-dimensional structures with the addition of electric charges. Second, the preliminary screening of drug-like compounds was performed based on Lipinski's rule of five. Consequently, 145 307 drug-like compounds were selected for screening with Hypo1. The 952 compounds mapped on all of the pharmacophoric features present in Hypo1 were finally used in a molecular docking study.
To further refine the retrieved hits and remove the false positives, these 952 compounds, as well as the 26 training-set compounds, were docked into the colchicine-binding site of tubulin (PDB ID: 1SA0) using the Molecular Operating Environment (MOE) software. The binding free energy that distinguishes molecules based on their interacting ability was calculated for all 978 compounds. The highly active compounds in the training set had binding free energy values above -3.931 kcal/mol. Finally, 164 compounds were selected by restricting the binding free energy to <-4 kcal/mol.
Because it predated the FDA, colchicine was sold in the United States for many years without having been reviewed by the FDA for safety and efficacy. The crystal structure of tubulin with colchicine was obtained from the protein data bank (PDB ID: 1SA0). Ligand-protein interaction diagrams for the binding site of tubulin with colchicine are shown in Figure 6. Colchicine interacted strongly with critical amino acid residues including Leu255, Leu248, Lys352, and Asn258 in the colchicine-binding site of tubulin. Therefore, these amino acid residues were very important for inhibitor binding. These 164 compounds were selected based on the ligand conformations that could satisfy the necessary interactions with the key amino acids at the active site. Finally, 5 drug-like compounds with the required interaction with critical amino acid residues and good binding free energies were selected as representative leads. Figure 7 depicts good pharmacophore mapping of five hits on Hypo1. A search for compounds using SciFinder Scholar and PubChem Search revealed that these hits belonged to the chalcone derivatives that strongly inhibited the polymerization of tubulin by binding to the colchicine-binding site of the β-tubulin subunit40. However, these hits had no reported in vitro antiproliferative activity against cancer cell lines. Therefore, the five hits were selected and purchased for biological validation.
In vitro antiproliferative activities
Compounds 1–5 were evaluated for in vitro cytotoxic activity against a human breast cancer cell line (MCF-7) and a normal human cell line (HBL100). The preliminary results from the MTT assays showed that all the selected compounds were active with various degrees of inhibition at 100 μmol/L (Figure 8). The five hits exhibited at least 50% inhibition of MCF-7 cell proliferation. After applying a cutoff at 80% inhibition, compounds 1 and 3 were subjected to IC50 studies. Table 5 shows that compound 1, which has an IC50 value of 28.5 μmol/L, exhibited stronger cytotoxicity against MCF-7 cell line than compound 3. As shown in Figure 9, compounds 1, 3 and colchicine exhibited dose-dependent anti-proliferative activity against HBL100, while compound 1 exhibited lower cytotoxicity against a normal human cell line (HBL100) when compared to colchicine and compound 3.
To confirm the correct binding mode and ensure a geometric fit, compounds 1 and 3 were docked into the colchicine-binding site of tubulin (Figure 10). Compound 1 exhibited strong hydrophobic interactions with Ala316, Lys254, and Thr179, as well as critical amino acid residues including Leu255, Leu248, Lys352, and Asn258. Moreover, the methoxy group on this compound formed a hydrogen-bonding interaction with the side chain of Asn101 when compared to colchicine. The molecular docking positions of compounds 1 and 3 in the crystal structure of tubulin are represented in Figure 11. The entire structure of compound 3 mapped very well onto the hydrophobic pocket of tubulin and formed strongly hydrophobic interactions with Leu255, Leu248, Lys352, and Asn258 in the active site. More importantly, the two methoxy groups on this compound, showed a better geometric fit over the hydrophobic pocket and hydrogen-bonding interactions with the side chain of Asn101 compared to colchicine; Asn101 played a very important role during protein-ligand binding process, possibly explaining the stronger inhibitory activity of compounds 1 and 3 against MCF-7 cancer cell line. An understanding of this interaction between tubulin and the hit compounds will aid in the development of new inhibitors with potent antitumor activities.
In the present work, 3D pharmacophore models of tubulin inhibitors were developed using the HypoRefine module in the Discovery Studio program (DS). The best quantitative pharmacophore model was Hypo1; this model was characterized by the lowest total cost value (114.523), the highest cost difference (70.905), the lowest RMSD (0.6977), and the best correlation coefficient (0.9582). Hypo1 was generated with one HBA, one HBD, one HY, one RA feature and three EV. This pharmacophore model was further validated using the test-set prediction, Fischer's randomization test, decoy-set and leave-one-out methods. The results of the test-set method showed a good correlation between the experimental and estimated values (correlation coefficient of 0.9181), revealing the good predictive ability of Hypo1. The results of the Fischer's randomization test further confirmed the statistical confidence for Hypo1. Other validation methods have provided reliable results regarding the strength of Hypo1. Hypo1 was used as a 3D query to screen the Specs database after validation. The hit compounds were subsequently subjected to molecular docking studies to refine the retrieved hits. Finally, five potential inhibitory leads with diverse structures and strong molecular interactions with the key amino acids were identified. Biological evaluation indicated that compounds 1 and 3 showed relatively good inhibitory activity against a cancer cell line (MCF-7). We believe that this study will not only assist in the development of new potent hits for antitumor inhibitors but also provide a better understanding of their interaction with tubulin. More broadly, these results will facilitate the rational design of novel potent drugs.
Protein Data Bank
The project was supported by the National Natural Science Foundation of China (No 81220108012, 61335007, 81371684, 81000666, 81171395 and 81328012).
About this article
The supplementary figure was available on the APS's website.