This study was conducted to compare the efficiencies of two virtual screening approaches, pharmacophore-based virtual screening (PBVS) and docking-based virtual screening (DBVS) methods.
All virtual screens were performed on two data sets of small molecules with both actives and decoys against eight structurally diverse protein targets, namely angiotensin converting enzyme (ACE), acetylcholinesterase (AChE), androgen receptor (AR), D-alanyl-D-alanine carboxypeptidase (DacA), dihydrofolate reductase (DHFR), estrogen receptors α (ERα), HIV-1 protease (HIV-pr), and thymidine kinase (TK). Each pharmacophore model was constructed based on several X-ray structures of protein-ligand complexes. Virtual screens were performed using four screening standards, the program Catalyst for PBVS and three docking programs (DOCK, GOLD and Glide) for DBVS.
Of the sixteen sets of virtual screens (one target versus two testing databases), the enrichment factors of fourteen cases using the PBVS method were higher than those using DBVS methods. The average hit rates over the eight targets at 2% and 5% of the highest ranks of the entire databases for PBVS are much higher than those for DBVS.
The PBVS method outperformed DBVS methods in retrieving actives from the databases in our tested targets, and is a powerful method in drug discovery.
In the past decade, virtual screening has become a promising tool for discovering active (lead) compounds, and has been integrated into the drug discovery pipeline of most pharmaceutical companies1. Researchers have demonstrated the efficiency of virtual screening, which was shown to enrich the hit rate (defined as the number of compounds that bind at a particular concentration divided by the number of compounds experimentally tested) by a hundred- to a thousand-fold over random screening (eg, high-throughput screening)2, 3. In the in silico laboratory, researchers use computational methods to evaluate virtual libraries (databases) against virtual receptors (targets), with the aim of speeding up the drug discovery process. In essence, virtual screening is designed for searching large-scale hypothetical databases of chemical structures or virtual libraries by using computational analysis and for selecting a limited number of candidate molecules that are likely to be active against a chosen biological receptor4. Therefore, virtual screening is a logical extension of three-dimensional (3D) pharmacophore-based database searching (PBDS) or molecular docking, capable of automatically evaluating large databases of compounds. From this viewpoint, virtual screening approaches can be classified into two categories, pharmacophore-based virtual screening (PBVS) and docking-based virtual screening (DBVS).
Historically, PBVS (ie, PBDS) was developed as a more advanced method than DBVS. Since the 1990s, DBVS has become a more popular tool for discovering active compounds because it directly reflects the ligand-receptor binding process. Recently, PBVS has experienced a revival in drug discovery, especially in the cases when no three-dimensional (3D) structural information on the protein target of interest is available. Even when the 3D structures of targets are known, PBVS is also used as a complementary approach to DBVS for pre-processing databases (libraries) of small molecules to remove compounds not possessing features known to be essential for binding or for post-filtering compounds selected by docking approaches5. A recent case study by Muthas et al indicated that post-filtering with pharmacophores was shown to increase enrichment rates in their investigated targets compared with docking alone6. Several studies have been performed to assess various DBVS methods and compare which docking programs are the most successful in identifying active hits7, 8, 9, 10. The conclusion is that no docking program may outperform other docking programs for all the tested targets, and the performance of each tested docking program is highly dependent on the nature of the target binding site5.
Nevertheless, few case studies for a direct comparison on the performances between PBVS and DBVS have been reported11. To gain a general view for the discrimination between these two types of approach in prioritizing actives from a database with decoys, we performed a benchmark comparison between the performances of PBVS and DBVS. Eight structurally diverse protein targets were selected in this study. The pharmacophore models were generated from the ligand co-crystallized complex structures of these targets using the LigandScout program12, and each PBVS was performed using the program Catalyst13, 14. To avoid the target dependency of docking programs, three docking programs, namely DOCK15, 16, GOLD17, 18, 19, 20, and Glide21, 22, were used in the DBVS. The results for eight tested targets indicated that the pharmacophore-based method generally outperforms all three docking methods in retrieving actives from databases. Of the sixteen sets (one target versus two testing databases) of virtual screenings, PBVS resulted in higher enrichment factors than DBVS for fourteen cases. When the top 2% and 5% of the ranked compounds are considered, the average hit rate of PBVS over the virtual screening results against the eight targets is much higher than those of the docking methods.
The main goal of this study is to make a benchmark comparison between the performances of the two types of virtual screening approach, pharmacophore-based and docking-based methods. The flowchart of the research pipeline is outlined in Figure 1. First, we selected eight pharmaceutically interesting targets representing diverse pharmacological functions and disease areas for the benchmark comparison; these targets include angiotensin converting enzyme (ACE), acetylcholinesterase (AChE), androgen receptor (AR), D-alanyl-D-alanine carboxypeptidase (DacA), dihydrofolate reductase (DHFR), estrogen receptor α (ERα), HIV-1 protease (HIV-pr), and thymidine kinase (TK) (Table 1). For each target, the pharmacophore model was constructed based on several X-ray crystal structures of the target protein in complex with ligands (mostly inhibitors), and one high-resolution crystal structure of the ligand-protein complex was used to generate the model for docking-based virtual screening (Figure 1). One active dataset containing experimentally validated active compounds was constructed for each target, and two decoy datasets composed of ∼1000 compounds each were generated (designated as Decoy I and Decoy II). By combining the eight active datasets with the two decoy datasets, sixteen small databases were built for virtual screening. Finally, each small molecular database was searched by using either the pharmacophore-based or docking-based virtual screening approach against the corresponding pharmacophore or docking virtual screening model, and the performance of each virtual screen was evaluated by measuring the virtual screening effectiveness. The detailed procedure of each step in the research pipeline is described below.
In this study, eight pharmacologically important proteins were selected as test targets, ie, ACE, AChE, AR, DacA, DHFR, ERα, HIV-pr, and TK. For each target, numerous structures in complex with different ligands have been determined using the X-ray crystallographic technology. We selected some typical crystal structures of each target for establishing the pharmacophore models, and selected one high-resolution structure for docking simulation. The coordinates of the crystal structures of these protein targets were obtained from the Protein Data Bank (PDB)23, and their PDB entries are listed in Table 1.
Here, we briefly introduce the pharmacological functions of the eight targets used in this study. ACE plays an essential role in the renin-angiotensin system (RAS), which regulates both arterial blood pressure and the salt/water balance24, 25. Since the first generation of ACE inhibitors was discovered from snake venom, ACE has been a major target for discovering anti-hypertensive drugs26, 27, 28, 29. AChE is a key enzyme that breaks down the neurotransmitter acetylcholine at the synaptic cleft. Its inhibitors are now the mainstream drugs in the treatment of Alzheimer's disease (AD)30. AR is a member of the steroid hormone receptor branch of the nuclear receptor superfamily and, as a mediator of androgen signaling, it has important roles in the coordinated gene expression in male reproductive tissues31. The inhibition of this receptor plays a physiological role in modulating AR-dependent gene regulation in reproductive tissues. DacA, also known as a penicillin target, is a penicillin-sensitive, membrane-bound enzyme required for trimming the carboxy-terminal D-alanyl residues from cell wall precursors. DacA catalyzes carboxypeptidation and transpeptidation reactions involved in bacterial cell wall metabolism, and is inactivated by β-lactam antibiotics32. Its inhibitors usually function as antibacterials. DHFR is a ubiquitous enzyme found in all organisms. The enzyme catalyzes the reduction of 7,8-dihydrofolate (DHF) to 5,6,7,8-tetrahydrofolate (THF) by stereospecific hydride transfer from the NADPH cofactor to the C6 atom of the pterin ring with concomitant protonation at N5. DHFR plays a central role in the maintenance of cellular pools of THF and its derivatives, which are essential for purine and thymidylate synthesis and hence for cell growth and proliferation. This enzyme has been the target of several important anticancer drugs and antibiotics33. ERα is a member of the nuclear receptor superfamily of ligand-regulated transcription factors. The inhibitors of this receptor can potentially be applied in the treatment of breast cancer, osteoporosis, and urogenital symptoms associated with post-menopausal atrophy of the vagina, or function as oral contraceptives to prevent pregnancy34. The HIV-1 protease (HIV-pr) is essential for maturation of the virus into infectious viral particles, and it is therefore considered a suitable target for drugs against AIDS35. TK is the key enzyme in the pyrimidine salvage pathway, catalyzing the phosphorylation of thymidine (dT) to thymidine monophosphate (dTMP) in the presence of Mg2+ and ATP36. Viral and insect TKs phosphorylate a variety of nucleosides and nucleoside analogues37; thus, TK inhibitors could be used in classical antiviral therapy.
Datasets of chemicals
Two kinds of datasets were constructed. The first one contains eight sub-datasets, each of which is composed of active compounds corresponding to a specific target (hereinafter designated as actives). All the actives were retrieved from the DrugBank database34. The chemical structures of the actives are shown in Table S1 in the Supplementary Information. To obtain more reliable results, two decoy datasets (I and II, Figure 1) were used in this study: dataset I contains 990 non-active molecules for all the eight targets, which were constructed by using the ligand preparation method provided by Bissantz et al38, and dataset II consists of 1000 non-active molecules, which were constructed by Cleves et al39. The molecules in these two datasets do not overlap. Combining the eight sub-datasets of actives and two decoy datasets produced sixteen databases for virtual screening; each target was screened on two databases using both pharmacophore-based and docking-based methods. To satisfy the requirement of pharmacophore-based virtual screening, the two-dimensional (2D) structures (in SD format) of all the small molecules in the datasets were converted into three-dimensional (3D) structures with multiple conformers in Catalyst 3D format using the catDB utility program included in Catalyst; the maximum number of conformers generated for each molecule was set to 250. For docking-based virtual screening, the small molecules in the datasets were also transformed into three-dimensional MOL2 format using the Corina program40.
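The combination of eight active sets with two decoy sets can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the compound identifiers are made-up placeholders, the number of actives per target is arbitrary (two here), and `build_databases` is a hypothetical helper, not part of any program used in the study. Only the decoy set sizes (990 and 1000) are taken from the text.

```python
from itertools import product

TARGETS = ["ACE", "AChE", "AR", "DacA", "DHFR", "ERa", "HIV-pr", "TK"]


def build_databases(actives_by_target, decoy_sets):
    """Combine each target's actives with each decoy set.

    Returns a dict keyed by (target, decoy_name); each value is a list of
    (compound_id, is_active) pairs.
    """
    databases = {}
    for target, decoy_name in product(actives_by_target, decoy_sets):
        actives = [(cid, True) for cid in actives_by_target[target]]
        decoys = [(cid, False) for cid in decoy_sets[decoy_name]]
        databases[(target, decoy_name)] = actives + decoys
    return databases


# Placeholder input: two invented actives per target (not the real counts),
# and decoy sets sized as described in the text.
actives_by_target = {t: [f"{t}_active_{i}" for i in range(2)] for t in TARGETS}
decoy_sets = {
    "I": [f"decoyI_{i}" for i in range(990)],
    "II": [f"decoyII_{i}" for i in range(1000)],
}

databases = build_databases(actives_by_target, decoy_sets)
print(len(databases))  # 16
```

Keeping the active/decoy label next to each identifier makes the later enrichment analysis a simple matter of counting labels in the ranked list.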
Pharmacophore model generation
In this study, the pharmacophore models were derived according to the ligand-protein interactions. Thus, the pharmacophore of each target was constructed based on the X-ray crystal structures of ligand-protein complexes by using the software LigandScout12. LigandScout is a program for the detection of relevant interaction points between a ligand and a protein. Its algorithms perform a stepwise interpretation of the ligand molecules: planar ring detection, assignment of functional group patterns, determination of the hybridization state, and finally the assignment of the Kekulé pattern41. For each target, we built its pharmacophore model on the basis of a series of X-ray crystal structures of the target protein in complex with different ligands (Table 1). The common features of all the generated hypotheses were then summarized and used as the query features in the virtual screening. In addition, excluded volumes were included in the pharmacophore models to improve the effectiveness of virtual screening42, 43, 44. All the pharmacophore models were exported and translated by a script into Catalyst pharmacophore files for further virtual screening.
Two types of virtual screening approach for the benchmark comparison, pharmacophore-based and docking-based methods, were used to screen the sixteen small molecule databases containing both actives and decoy molecules. The pharmacophore-based virtual screenings were carried out using the catSearch program of Catalyst13, 14. The number of hits was limited to 1000 with the “maxhits” option for the purpose of enrichment analysis. The best flexible search method was employed in virtual screenings. After the virtual screening was finished, the catEspDriver program of Catalyst13, 14 was applied to estimate the fit values for the hits by calculating the fitness of each hit to the pharmacophore model. Thereafter, the hits retrieved were ranked according to the fit values.
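The ranking step described above (hits ordered by fit value, capped at a maximum number of hits) can be sketched as follows. This does not call Catalyst; `rank_hits` and `full_ranking` are hypothetical helpers and the identifiers and fit values are invented. Placing compounds that were not retrieved after all hits, in their original order, is an assumption made here so that a complete ranked list is available for enrichment analysis.

```python
def rank_hits(fit_values, max_hits=1000):
    """Rank retrieved hits by descending fit value, keeping at most max_hits."""
    ranked = sorted(fit_values.items(), key=lambda kv: kv[1], reverse=True)
    return [cid for cid, _ in ranked[:max_hits]]


def full_ranking(fit_values, all_compounds, max_hits=1000):
    """Hits first (best fit first), then non-retrieved compounds.

    Appending non-hits in their original order is an assumption of this
    sketch, not a documented behaviour of any screening program.
    """
    hits = rank_hits(fit_values, max_hits)
    hit_set = set(hits)
    return hits + [cid for cid in all_compounds if cid not in hit_set]


# Invented example: three hits with fit values, one compound not retrieved.
fit_values = {"cpd_a": 2.60, "cpd_b": 3.74, "cpd_c": 1.97}
all_compounds = ["cpd_a", "cpd_b", "cpd_c", "cpd_d"]

print(full_ranking(fit_values, all_compounds))
# ['cpd_b', 'cpd_a', 'cpd_c', 'cpd_d']
```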
For the docking-based virtual screening, three docking programs, DOCK (UCSF)15, 16, GOLD (Cambridge Crystallographic Data Center)17, 18, 19, 20 and Glide (Schrödinger, Inc)21, 22, were used in this study for the purpose of getting unbiased results of docking. The detailed procedures for the virtual screenings using these three docking programs are described in the Supplementary Information .
Measurement of virtual screening effectiveness
To quantify the ability of the four screening methods to assign high ranks to the known actives, we report enrichment factors (EF) in graphical form and present accumulation curves that show how the fraction of actives recovered varies with the percentage of the database screened. The enrichment factor (EF) was calculated by Eq (1),
$$\mathrm{EF} = \frac{\mathrm{Hits}_s / N_s}{\mathrm{Hits}_t / N_t} \qquad (1)$$
where $\mathrm{Hits}_s$ is the number of active compounds selected by a virtual screening method in a subset consisting of an upper fraction of the ranked list; $\mathrm{Hits}_t$ is the total number of active compounds in the database; $N_s$ is the total number of compounds in the subset of the database; and $N_t$ is the total number of compounds in the database. The effectiveness of virtual screening can also be reflected by the hit rate (HR), ie, the ratio of the number of active compounds selected by the virtual screening method at a certain subset level ($\mathrm{Hits}_s$) to the total number of known active compounds in the database ($\mathrm{Hits}_t$), as expressed by Eq (2),
$$\mathrm{HR} = \frac{\mathrm{Hits}_s}{\mathrm{Hits}_t} \qquad (2)$$
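As a minimal sketch of these two measures, the functions below compute EF as the proportion of actives in the selected subset divided by the proportion of actives in the whole database, and HR as the number of actives in the subset divided by the total number of actives, following the definitions given in the text. The function names, the rounding rule used to turn a fraction into a subset size, and the toy ranked list are assumptions for illustration only.

```python
def enrichment_factor(hits_s, n_s, hits_t, n_t):
    """EF = (hits_s / n_s) / (hits_t / n_t), written as one division."""
    if n_s <= 0 or hits_t <= 0:
        raise ValueError("n_s and hits_t must be positive")
    return (hits_s * n_t) / (n_s * hits_t)


def hit_rate(hits_s, hits_t):
    """HR = hits_s / hits_t."""
    if hits_t <= 0:
        raise ValueError("hits_t must be positive")
    return hits_s / hits_t


def evaluate_at_fraction(ranked_labels, fraction):
    """Evaluate EF and HR for the top `fraction` of a ranked list.

    ranked_labels: list of booleans (True = active), best-ranked first.
    The subset size is rounded to the nearest integer (an assumption).
    """
    n_t = len(ranked_labels)
    n_s = max(1, round(fraction * n_t))
    hits_t = sum(ranked_labels)
    hits_s = sum(ranked_labels[:n_s])
    return enrichment_factor(hits_s, n_s, hits_t, n_t), hit_rate(hits_s, hits_t)


# Toy ranked list: 100 compounds, 4 actives, 2 of them within the top 5.
ranked = [True, False, True, False, False] + [False] * 93 + [True, True]
ef, hr = evaluate_at_fraction(ranked, 0.05)
print(ef, hr)  # 10.0 0.5
```

Note that the two measures are linked: EF equals HR divided by the screened fraction, so at a 5% cutoff the largest possible EF is 20.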
Results and discussion
General features for the pharmacophore models of the eight targets
The pharmacophore models corresponding to the eight targets are shown in Figure S1 in the Supplementary Information . As mentioned above, each pharmacophore model was constructed based on a series of crystal structures of ligand-protein complexes. Here, we use the pharmacophores of ACE and DacA to demonstrate the process of how to construct the pharmacophore models and how to describe the feature of the models. Three X-ray crystal structures of ACE-ligand complexes were used in the pharmacophore model generation. Their PDB entries are 1UZF, 1O86 and 1UZE, respectively (Table 1). Firstly, three primary models were constructed on the basis of three crystal structures (Figures 2A–2C). For model A, the hypothesis contains two hydrophobic regions, two hydrogen bond acceptor groups, and two negative ionizable regions (Figure 2A); the pharmacophore feature of model B is composed of three hydrophobic regions, four hydrogen bond acceptor groups, two hydrogen bond donor groups, two negative ionizable regions, and two positive ionizable regions (Figure 2B); and the hypothesis of model C consists of three hydrophobic regions, four hydrogen bond acceptor groups, one hydrogen bond donor group, and two negative ionizable regions (Figure 2C). To obtain a common feature of these three pharmacophore models, we superposed the three crystal structures. The proteins can be superposed well together with a RMSD<0.3Å, and the three ligands showed their conformational diversity (Figure 2D). Common features can be determined through the structural superposition: one hydrogen bond acceptor on the hydroxyl group of the ligands, which hydrogen bonds to Lys511 and Tyr520, one hydrogen bond acceptor that interacts with water molecules in the protein binding site, and one hydrophobic region around the alkyl group of the ligands (Figure 2E). Additionally, one hydrophobic region around the pyrrolidine group of the co-crystallized ligands appeared only in model A. 
However, this region seems important for ligand-protein interaction. Therefore, this feature was also added in the final pharmacophore model (Figure 2E).
To further elucidate the pharmacophore modeling, here we detail the construction of the pharmacophore model for the target of DacA. Fourteen crystal structures of ligand-DacA complexes are available. Thus, we constructed fourteen primary pharmacophore hypotheses for this target using the LigandScout program. Ligands in the fourteen crystal structures, except those in 1IKI and 1MPL, share a common substructure, and this part of the ligands may superpose well in the protein binding site, as indicated in Figure 3A. For the twelve crystal structures of DacA, the pharmacophore feature around this substructure includes two positive ionizable regions, one hydrophobic region, and five hydrogen bond acceptor groups (Figure 3B). According to the interaction models of the fourteen ligands with DacA, we refined this pharmacophore model further. The final pharmacophore model consists of one hydrophobic region, four hydrogen bond acceptor groups, and six excluded volumes, as shown in Figure 3C. In a similar way, we obtained the pharmacophore models for six other targets (Figure S1 in the Supplementary Information ).
Performances of virtual screening
Once the pharmacophore models of the eight targets were obtained, we performed pharmacophore-based virtual screenings. Two sets of screens were carried out against each pharmacophore model towards the two databases (databases I and II). Simultaneously, databases I and II were also screened using three docking methods, DOCK, GOLD, and Glide, against the structural models of the eight targets.
Generally, the main purpose of virtual screening is to select a subset of a library enriched in compounds having the desired activity for experimental assay. Accordingly, the success of a virtual screening performance can be quantified by the enrichment factor (EF) and hit-rate (HR) when the percentage of active compounds in the screening database is known. The enrichment factor (EF) and hit-rate (HR) results of the 64 virtual screenings are shown in Figure 4. In general, the pharmacophore-based method outperforms all three of the docking-based methods in retrieving actives from the databases, as indicated by the EF and HR values versus the upper fractions of the ranked list (Figure 4). For the screening result on database I against ACE (ACE-1), the pharmacophore-based method gave a maximum EF value of 10.25 at 1.39% of the highest ranks of the entire database, while the maximum EF values of the three docking-based methods are smaller than that of the pharmacophore-based method and also occur at higher percentages of the highest ranks of the entire database (Figure 4A1). For the screening result on database II against ACE (ACE-2), the screening effectiveness of the pharmacophore-based method was found to be better than those of DOCK and GOLD, but slightly less effective than that of Glide (Figure 4A3). Nevertheless, the curves of the HR values versus the upper fractions of the ranked list indicate that the pharmacophore-based method outperforms the three docking methods in selecting actives before 7.55% and 12.69% for databases I and II, respectively (Figures 4A2 and 4A4). For most of the other seven targets, the pharmacophore-based method was also more effective in selecting actives than the docking methods if estimated by the EF values, except for the screening results against ERα on database I (Figure 4F1), and HIV-pr (Figure 4G1 and 4G3) and TK on databases I and II (Figure 4H1 and 4H3). 
Only one docking method outperformed the pharmacophore method for virtual screenings against ERα and TK. If evaluated by the HR values, the screening effectiveness of the pharmacophore-based method was found to be higher than those of the docking-based methods for all the targets before a certain percentage of the databases (Figures 4, columns 2 and 4).
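The maximum EF and the percentage of the database at which it occurs, as well as the accumulation curve of actives recovered, can be obtained by walking down the ranked list once. The sketch below uses an invented ranked list of 200 compounds; `accumulation_curve` and `max_enrichment` are hypothetical helper names, and the numbers are not results from this study.

```python
def accumulation_curve(ranked_labels):
    """Return (percent_screened, fraction_of_actives_recovered) per rank."""
    n_t = len(ranked_labels)
    hits_t = sum(ranked_labels)
    curve, found = [], 0
    for rank, is_active in enumerate(ranked_labels, start=1):
        found += int(is_active)
        curve.append((100.0 * rank / n_t, found / hits_t))
    return curve


def max_enrichment(ranked_labels):
    """Return (maximum EF, percent of database screened where it first occurs)."""
    n_t = len(ranked_labels)
    hits_t = sum(ranked_labels)
    best_ef, best_pct, found = 0.0, 0.0, 0
    for rank, is_active in enumerate(ranked_labels, start=1):
        found += int(is_active)
        ef = (found * n_t) / (rank * hits_t)  # EF with the top `rank` compounds
        if ef > best_ef:
            best_ef, best_pct = ef, 100.0 * rank / n_t
    return best_ef, best_pct


# Invented ranked list: 200 compounds, 4 actives at ranks 2, 3, 5 and 200.
ranked_list = [False, True, True, False, True] + [False] * 194 + [True]
best_ef, best_pct = max_enrichment(ranked_list)
print(f"max EF = {best_ef:.2f} at {best_pct:.2f}% of the database")
# max EF = 33.33 at 1.50% of the database
```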
In practice, a certain percentage of molecules in the highest ranked list are selected for experimental assay. The hit-rate for these selected molecules is also an important reflection of the performance of a virtual screening method. Thus, we calculated the HR values at the top 2% and 5% of compounds in the ranked list for each virtual screening. The results are shown in Figure 5. After direct comparison, the virtual screening results indicate that the hit-rate of the pharmacophore-based method is higher than those of docking-based methods for each target on each database at a 5% subsetting, and the hit-rates of the pharmacophore-based method at a 2% subsetting against all the targets except ACE and HIV-pr are higher than those of docking-based methods. When averaged over the results of the virtual screening against the eight targets, the hit-rate of the pharmacophore-based method was found to be much higher than those of all three docking methods (Figure 5).
Pharmacophore fitting versus molecular docking
Virtual screening approaches by means of docking reflect the ligand-receptor binding process directly, whereas approaches that use a pharmacophore as a query structure map the ligand-receptor recognition indirectly45. In principle, the effectiveness of docking-based approaches in retrieving active compounds from databases should be higher than that of pharmacophore-based methods. This, however, may not be the case, at least for the virtual screening against the eight targets tested in this study. We may attribute this result to the shortcomings of the current docking methods. First, the screening results obtained with current docking methods rely on scoring functions to estimate the binding affinities of the compounds in a database for a specific target. So far, no scoring function can predict binding affinity accurately and universally for all targets. The second shortcoming is that most existing docking programs omit the flexibility of targets, so that potential actives that clash with the binding pockets of the available protein crystal structures cannot bind during docking simulations. By contrast, pharmacophore-based methods screen and rank compounds from databases by fitting the molecular structures to the pharmacophore features that are essential for ligand-target binding, without considering the real interaction between ligands and targets. Possibly, this simplification implicitly accommodates the flexibility of target proteins for ligand binding.
To test the above notion, we re-analyzed the virtual screening results by fitting the actives at 5% of the highest ranks of the entire databases to the pharmacophore models and target binding sites, respectively. The results are shown in Figures 6, 7, 8, 9 and Figures S2–S13 in the Supplementary Information . Here, we only use the result against TK and ERα as examples to illustrate why the PBVS enrichments are higher than those of DBVS.
For the target TK, there are a total of 8 actives in the testing databases (databases I and II) (Table 1). Among the top 5% of the ranked compounds of the entire databases, the pharmacophore-based method recovered 6 and 8 actives from databases I and II, respectively; DOCK, GOLD and Glide only retrieved 1, 2 and 2 actives from both of the two databases, respectively. The pharmacophore model of TK contains three features (Figure S1 in the Supplementary Information), thus the best fit value of actives with the pharmacophore should be 3. Of the 8 actives of TK, the fit values of 7 compounds were larger than 2, and only one compound had a fit value of 1.97 (Figure 6). This result suggests that the pharmacophore model of TK is reliable and that the screening result is trustworthy. However, the docking simulations indicated that only 1–2 compounds could fit well with the binding pocket of TK, as other actives could not bind complementarily with the binding pocket (Figure 6, right column). The docking programs could not even place the large actives in the binding pocket (Figure 6D), or produced binding conformations far away from their active conformation (Figures 6J, 6L and 6P). Structure superposition suggests that most of the actives clashed with the residues around the binding pocket of TK (Figure 7).
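The counts reported for TK translate directly into hit rates, and, if the subset is taken to be exactly 5% of the database, into approximate enrichment factors, since EF equals HR divided by the screened fraction. The short calculation below uses only the counts quoted in the text; treating the subset as exactly 5% and reading the docking counts as applying to each database are assumptions of this sketch.

```python
TK_ACTIVES = 8   # total number of TK actives, as stated in the text
FRACTION = 0.05  # assumed exact fraction of the database in the subset

# Actives recovered within the top 5%, as quoted in the text.
recovered_at_5pct = {
    "PBVS (database I)": 6,
    "PBVS (database II)": 8,
    "DOCK": 1,
    "GOLD": 2,
    "Glide": 2,
}

tk_results = {}
for method, hits in recovered_at_5pct.items():
    hr = hits / TK_ACTIVES  # HR = Hits_s / Hits_t
    ef = hr / FRACTION      # EF = HR / (N_s / N_t)
    tk_results[method] = (hr, ef)
    print(f"{method}: HR = {hr:.3f}, EF = {ef:.1f}")
```

Under these assumptions the pharmacophore-based screen on database II reaches the ceiling for a 5% cutoff (HR of 1, EF of 20), whereas the docking programs reach an EF of 2.5 to 5.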
For the target ERα, the pharmacophore-based method recovered 9 actives at the top 5% of the highest ranks from database I. However, DOCK and Glide recovered only 1 and 2 actives, respectively, and GOLD did not recover any molecules. Similar to the result of TK, all 9 actives fit well with the pharmacophore model of ERα, with fit values from 2.60 to 3.74 (Figure 8). However, most of the actives could not bind well with the binding pocket of ERα (Figures 8 and 9). The fitting and docking results for the other six targets are shown in Figures S2–S13 in the Supplementary Information , from which we can draw a similar conclusion.
These results indicate that the rigidity of the crystal structure of the target protein hinders the docking programs from retrieving the actives from the database. In contrast, the pharmacophore-based method is not limited by the target structures, so molecules that fit well with the pharmacophore model are selected as actives from the database. This implies that pharmacophore-based methods may implicitly consider the flexibility of the target protein during virtual screening, which is one of the reasons why PBVS methods outperform DBVS.
The flexibility of the receptor is an advantageous factor for the performance of the pharmacophore method
As described above, the pharmacophore-based methods fit the molecules to pharmacophore features that are extracted from the ligand-protein complex structures, and molecules that match these features are defined as hits. Figure 10A shows the mapping conformation of compound DB03431 with the structure of TK. Compound DB03431, which has a scaffold similar to that of the ligand in 1KIM, could match three pharmacophore features, including two hydrogen bond donors and one hydrogen bond acceptor (Figure 6I). The predicted conformation is similar to the native ligand (colored gray in Figure 10A) and can form hydrogen bonds with residue Gln125, supporting the plausibility of this conformation. However, this conformation predicted by Catalyst clashes with residues G56–G59 and K62 of 1KIM, all of which are located on a loop (colored pale green in Figure 10A). The result indicates that the pharmacophore-based method may obtain more active compounds from a larger steric space without considering the restriction imposed by the protein structure, ie, implicitly taking into account the flexibility of the receptor, while such a conformation cannot be predicted by the docking programs because of the steric effect. For example, the binding poses predicted by GOLD (green) and Glide (blue) deviated significantly from the native ligand in the crystal structure (Figure 10A). The DOCK program could not dock the compound DB03431 into the binding site because of the steric clash.
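The contrast drawn here can be illustrated with a toy steric check: a rigid-receptor method rejects any pose whose atoms come too close to protein atoms, whereas a pharmacophore fit ignores protein atoms that are not part of the model. The sketch below is not how DOCK, GOLD, Glide or Catalyst evaluate poses; the 2.0 Å cutoff, the coordinates and the helper names are all illustrative assumptions.

```python
import math


def min_distance(ligand_atoms, protein_atoms):
    """Smallest ligand-protein interatomic distance (same units as input)."""
    return min(math.dist(a, b) for a in ligand_atoms for b in protein_atoms)


def has_clash(ligand_atoms, protein_atoms, cutoff=2.0):
    """True if any ligand atom lies within `cutoff` of a protein atom.

    The 2.0 Å cutoff is an illustrative value, not one taken from the study.
    """
    return min_distance(ligand_atoms, protein_atoms) < cutoff


# Invented coordinates (Å): a two-atom "pose" and two "loop" atoms.
ligand_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
loop_atoms = [(2.5, 0.0, 0.0), (6.0, 0.0, 0.0)]

print(has_clash(ligand_pose, loop_atoms))      # True
print(has_clash(ligand_pose, loop_atoms[1:]))  # False
```

Removing the nearby loop atom, which mimics a loop that moves out of the way, turns the same pose from rejected to accepted.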
To further illustrate the advantage of the pharmacophore-based method, we took the compound DB03280 as another example (Figure 10B). The docking programs cannot handle compounds of a large size. The DOCK and Glide programs could not dock this compound into the pocket of the protein; the GOLD program produced a contracted conformation with a lower score. By contrast, the pharmacophore-based method predicts a reasonable conformation of this compound, which is similar to the native ligand in PDB 1KIM (colored gray in Figure 10B). Although the predicted conformation of this compound clashes with some residues of the protein, it was still ranked by Catalyst at the top of the hit list, since the pharmacophore-based method neglects residue information that is not critical for pharmacophore mapping.
In summary, the above evidence demonstrates the importance of considering receptor flexibility in virtual screening, which has also been shown in several studies, such as those involving the program ICM46. This program takes into account the flexibility of the receptor with a soft van der Waals potential approach, and therefore exhibited superiority compared with other docking programs that only consider the flexibility of the ligand47, 48.
The performance of docking programs employing multiple crystal structures
Since the pharmacophores were extracted from multiple protein-ligand complex structures in this study, to make a fair comparison, we used the same number of crystal structures in the docking-based virtual screening simulations for the ACE and TK target cases, and the hit rates for the top 2% and 5% are presented for each crystal structure. For database I, none of the docking programs retrieved any actives against the three crystal structures of ACE at the top 2% (Figure 11A); for database II, the hit rates obtained by DOCK for 1O86 and Glide for 1UZE were the same as and better than that of the pharmacophore-based method at the top 2% (Figure 11B). However, the average hit rates for all three docking methods against the three crystal structures were found to be lower than that of the pharmacophore-based method. At the top 5%, the Glide program yielded approximately the same hit rate as obtained by the pharmacophore-based method for database I (Figure 11D); for database II, the hit rates of all the docking-based methods were lower than that of the pharmacophore-based method. For the target ACE, the hit rates with docking-based methods were still lower than that of the pharmacophore-based method in most cases, although multiple crystal structures were used in the docking-based methods.
In the TK case, we also performed virtual screening against 21 crystal structures of TK. Glide outperformed other docking-based methods for this target (Figure 12). At the top 2%, the hit rates yielded by Glide against 1E2K, 1E2P, 1KI3, and 1VTK were the same as that of the pharmacophore-based method for database I (Figure 12A), while the hit rates against 1E2I and 1KI7 were higher than that of the pharmacophore-based method (Figure 12A). For other crystal structures, the screening accuracy of the docking-based method was not found to be better than that of the pharmacophore-based method. For database II, the hit rates of the three docking programs were also lower than that of the pharmacophore-based method (Figure 12B and 12D), both at the top 2% and 5% level.
Taken together, these results demonstrate that, although the performance of the docking programs varied with the crystal structure used, the superiority of the pharmacophore-based method is still significant. The hit rates obtained by some docking programs (such as DOCK and Glide) were the same as or better than that of the pharmacophore method for a few crystal structures, but in most cases, the hit rates obtained by the docking programs were lower than that of the pharmacophore-based method. Most importantly, it is difficult to decide which crystal structure to use in practical virtual screening, since the results will be significantly affected by a slight change in the protein, such as the conformational change of one residue in the binding site. In addition, considering the cost in time and computation, generally only the crystal structure with the highest resolution is selected for virtual screening instead of multiple structures. By comparison, it is feasible to summarize pharmacophores from multiple complex structures and combine their common features into one pharmacophore model for pharmacophore-based virtual screening.
The current study presents a direct comparison of two virtual screening approaches, pharmacophore-based and docking-based methods, using two data sets of small molecules against eight pharmacologically important and structurally diverse target proteins (Figure 1). Significant differences between the two approaches in retrieving actives from the databases were observed. In line with previous reports, the performance of docking-based methods varied with the nature of the target binding site5. For example, DOCK outperformed GOLD and Glide for DacA, whereas the Glide enrichment was higher than those of DOCK and GOLD for AR (Figures 4 and 5). In most cases (14 out of 16), the pharmacophore-based method outperformed the docking-based methods (DOCK, GOLD, and Glide), and the average PBVS enrichment over the virtual screens against the eight targets was much higher than those of DBVS (Figures 4 and 5). This study also indicates that pharmacophore-based methods, although they map ligand-receptor recognition only indirectly, implicitly account for the flexibility of target protein-ligand interactions during virtual screening, as shown by the fitting results in Figures 6–9. Although a case study against only eight targets does not allow us to conclude which method is better in general, this result suggests that pharmacophore-based methods and other ligand-based methods deserve attention. It also suggests that pharmacophore-based methods are valuable even when high-resolution structures of the targets are available.
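The case-by-case comparison summarized above can be illustrated with a short sketch. It assumes the commonly used definition of the enrichment factor, EF = (Ha/Ht)/(A/D), where Ha is the number of actives in the selected subset, Ht the size of that subset, A the number of actives in the database, and D the database size; whether this matches the exact formula used for Figures 4 and 5 is an assumption. The function names, placeholder target names, and all numbers are invented for illustration and are not results from this study.

```python
def enrichment_factor(hits_in_selection, selection_size, total_actives, database_size):
    """Enrichment factor EF = (Ha / Ht) / (A / D).

    Computed as (Ha * D) / (Ht * A), which is algebraically the same
    expression, to avoid intermediate floating-point rounding.
    """
    if min(selection_size, total_actives, database_size) <= 0:
        raise ValueError("sizes and active counts must be positive")
    return (hits_in_selection * database_size) / (selection_size * total_actives)


def count_pbvs_wins(results):
    """Count screens in which PBVS beats every docking program.

    results: {(target, database): {"PBVS": ef, "DOCK": ef, ...}}
    """
    wins = 0
    for enrichments in results.values():
        best_docking = max(
            value for method, value in enrichments.items() if method != "PBVS"
        )
        if enrichments["PBVS"] > best_docking:
            wins += 1
    return wins


# 5 actives among the 20 top-ranked compounds of a 1000-compound
# database that contains 10 actives in total (toy numbers).
print(enrichment_factor(5, 20, 10, 1000))  # 25.0

# Invented enrichment factors for three placeholder screens.
toy_results = {
    ("target_1", "db_I"): {"PBVS": 20.0, "DOCK": 5.0, "GOLD": 8.0, "Glide": 12.0},
    ("target_1", "db_II"): {"PBVS": 10.0, "DOCK": 15.0, "GOLD": 6.0, "Glide": 9.0},
    ("target_2", "db_I"): {"PBVS": 30.0, "DOCK": 12.0, "GOLD": 18.0, "Glide": 25.0},
}
print(count_pbvs_wins(toy_results))  # 2
```

An enrichment factor of 1 corresponds to random selection, so values well above 1 indicate that actives are concentrated near the top of the ranking.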
Prof Hua-liang JIANG and Hong-lin LI designed the research and revised the manuscript; Zhi CHEN, Qi-jun ZHANG, and Xiao-guang BAO conducted the research; Kun-qian YU, Xiao-min LUO, and Wei-liang ZHU helped with part of the research; Zhi CHEN analyzed the data and wrote the paper.
Supporting Methods describe the detailed procedure of virtual screening using DOCK, GOLD, and Glide; Table S1 shows the chemical structures of the actives used in this study; Figure S1 shows the refined pharmacophore models for the eight targets, represented in Catalyst form; Figures S2–S13 show the mapping of the actives onto their pharmacophore models and the fitting models of these actives in the binding pockets of the eight targets. Supplementary data associated with this article can be found in the online version.
Stahl M, Guba W, Kansy M . Integrating molecular design resources within modern drug discovery research: the Roche experience. Drug Discov Today 2006; 11: 326–33.
Shoichet BK . Virtual screening of chemical libraries. Nature 2004; 432: 862–5.
Doman TN, McGovern SL, Witherbee BJ, Kasten TP, Kurumbail R, Stallings WC, et al. Molecular docking and high-throughput screening for novel inhibitors of protein tyrosine phosphatase-1B. J Med Chem 2002; 45: 2213–21.
Walters WP, Stahl MT, Murcko MA . Virtual screening — an overview. Drug Discov Today 1998; 3: 160–78.
McInnes C . Virtual screening strategies in drug discovery. Curr Opin Chem Biol 2007; 11: 494–502.
Muthas D, Sabnis YA, Lundborg M, Karlen A . Is it possible to increase hit rates in structure-based virtual screening by pharmacophore filtering? An investigation of the advantages and pitfalls of post-filtering. J Mol Graph Model 2008; 26: 1237–51.
Cummings MD, DesJarlais RL, Gibbs AC, Mohan V, Jaeger EP . Comparison of automated docking programs as virtual screening tools. J Med Chem 2005; 48: 962–76.
Warren GL, Andrews CW, Capelli AM, Clarke B, LaLonde J, Lambert MH, et al. A critical assessment of docking programs and scoring functions. J Med Chem 2006; 49: 5912–31.
Perola E, Walters WP, Charifson PS . A detailed comparison of current docking and scoring methods on systems of pharmaceutical relevance. Proteins 2004; 56: 235–49.
Kellenberger E, Rodrigo J, Muller P, Rognan D . Comparative evaluation of eight docking tools for docking and virtual screening accuracy. Proteins 2004; 57: 225–42.
Steindl T, Langer T . Docking versus pharmacophore model generation: A comparison of high-throughput virtual screening strategies for the search of human rhinovirus coat protein inhibitors. QSAR Comb Sci 2005; 24: 470–9.
Wolber G, Langer T . LigandScout: 3-D pharmacophores derived from protein-bound ligands and their use as virtual screening filters. J Chem Inf Model 2005; 45: 160–9.
Accelrys: San Diego, CA CATALYST 4.10. 2005:http://www.accelrys.com.
Kurogi Y, Guner OF . Pharmacophore modeling and three-dimensional database searching for drug design using catalyst. Curr Med Chem 2001; 8: 1035–55.
Ewing TJA, Kuntz ID . Critical evaluation of search algorithms for automated molecular docking and database screening. J Comput Chem 1997; 18: 1175–89.
Ewing TJ, Makino S, Skillman AG, Kuntz ID . DOCK 4.0: search strategies for automated molecular docking of flexible molecule databases. J Comput Aided Mol Des 2001; 15: 411–28.
Jones G, Willett P, Glen RC . Molecular recognition of receptor sites using a genetic algorithm with a description of desolvation. J Mol Biol 1995; 245: 43–53.
Jones G, Willett P, Glen RC, Leach AR, Taylor R . Development and validation of a genetic algorithm for flexible docking. J Mol Biol 1997; 267: 727–48.
Nissink JW, Murray C, Hartshorn M, Verdonk ML, Cole JC, Taylor R . A new test set for validating predictions of protein-ligand interaction. Proteins 2002; 49: 457–71.
Verdonk ML, Cole JC, Hartshorn MJ, Murray CW, Taylor RD . Improved protein-ligand docking using GOLD. Proteins 2003; 52: 609–23.
Friesner RA, Banks JL, Murphy RB, Halgren TA, Klicic JJ, Mainz DT, et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J Med Chem 2004; 47: 1739–49.
Friesner RA, Murphy RB, Repasky MP, Frye LL, Greenwood JR, Halgren TA, et al. Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J Med Chem 2006; 49: 6177–96.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, et al. The Protein Data Bank. Nucleic Acids Res 2000; 28: 235–42.
Laragh JE, Brunner HR, Buhler FG, Ealey JE, Vaughan ED . Renin, angiotensin and aldosterone system in pathogenesis and management of hypertensive vascular disease. Am J Med 1972; 52: 633–52.
Soffer RL . Angiotensin converting enzyme and the regulation of vasoactive peptides. Annu Rev Biochem 1976; 45: 73–94.
Kato H, Suzuki T . Bradykinin-potentiating peptides from the venom of Agkistrodon halys blomhoffii. Isolation of five bradykinin potentiators and the amino acid sequences of two of them: potentiators B and C. Biochemistry 1971; 10: 972–80.
Ondetti MA, Williams NJ, Sabo EF, Pluscec J, Weaver ER, Kocy O . Angiotensin I-converting enzyme inhibitors from the venom of Bothrops jararaca. Isolation, elucidation of structure, and synthesis. Biochemistry 1971; 10: 4033–9.
Ondetti MA, Rubin B, Cushman DW . Design of specific inhibitors of angiotensin-converting enzyme: new class of orally active antihypertensive agents. Science 1977; 196: 441–4.
Laffan RJ, Goldberg ME, High JP, Schaeffer TR, Waugh MH, Rubin B . Antihypertensive activity in rats for SQ 14,225, an orally active inhibitor of angiotensin I-converting enzyme. J Pharmacol Exp Ther 1978; 204: 281–8.
Polinsky RJ . Clinical pharmacology of rivastigmine: a new-generation acetylcholinesterase inhibitor for the treatment of Alzheimer's disease. Clin Ther 1998; 20: 634–47.
Holter E, Kotaja N, Makela S, Strauss L, Kietz S, Janne OA, et al. Inhibition of androgen receptor (AR) function by the reproductive orphan nuclear receptor DAX-1. Mol Endocrinol 2002; 16: 515–28.
Dideberg O, Charlier P, Dive G, Joris B, Frere JM, Ghuysen JM . Structure of a Zn2+-containing D-alanyl-D-alanine-cleaving carboxypeptidase at 2.5 A resolution. Nature 1982; 299: 469–70.
Schnell JR, Dyson HJ, Wright PE . Structure, dynamics, and catalytic function of dihydrofolate reductase. Annu Rev Biophys Biomol Struct 2004; 33: 119–40.
Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, et al. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 2006; 34: D668–672.
Backbro K, Lowgren S, Osterlund K, Atepo J, Unge T, Hulten J, et al. Unexpected binding mode of a cyclic sulfamide HIV-1 protease inhibitor. J Med Chem 1997; 40: 898–902.
Prota A, Vogt J, Pilger B, Perozzo R, Wurth C, Marquez VE, et al. Kinetics and crystal structure of the wild-type and the engineered Y101F mutant of Herpes simplex virus type 1 thymidine kinase interacting with (North)-methanocarba-thymidine. Biochemistry 2000; 39: 9597–603.
Gammon ST, Bernstein M, Schuster DP, Piwnica-Worms D . A method for quantification of nucleotides and nucleotide analogues in thymidine kinase assays using lanthanum phosphate coprecipitation. Anal Biochem 2007; 369: 80–6.
Bissantz C, Folkers G, Rognan D . Protein-based virtual screening of chemical databases. 1. Evaluation of different docking/scoring combinations. J Med Chem 2000; 43: 4759–67.
Cleves AE, Jain AN . Robust ligand-based modeling of the biological targets of known drugs. J Med Chem 2006; 49: 2921–38.
Gasteiger J, Rudolph C, Sadowski J . Automatic Generation of 3D-atomic coordinates for organic molecules. Tetrahedron Comput Methodol 1990; 3: 537–47.
Krovat EM, Fruhwirth KH, Langer T . Pharmacophore identification, in silico screening, and virtual library design for inhibitors of the human factor Xa. J Chem Inf Model 2005; 45: 146–59.
Palomer A, Cabre F, Pascual J, Campos J, Trujillo MA, Entrena A, et al. Identification of novel cyclooxygenase-2 selective inhibitors using pharmacophore models. J Med Chem 2002; 45: 1402–11.
Norinder U . Refinement of catalyst hypotheses using simplex optimisation. J Comput Aided Mol Des 2000; 14: 545–57.
Greenidge PA, Carlsson B, Bladh LG, Gillner M . Pharmacophores incorporating numerous excluded volumes defined by X-ray crystallographic structure in three-dimensional database searching: application to the thyroid hormone receptor. J Med Chem 1998; 41: 2503–12.
Shen J, Xu X, Cheng F, Liu H, Luo X, Shen J, et al. Virtual screening on natural products for discovering active compounds and target information. Curr Med Chem 2003; 10: 2327–42.
Abagyan R, Totrov M, Kuznetsov D . ICM-A new method for protein modeling and design: Applications to docking and structure prediction from the distorted native conformation. J Comput Chem 1994; 15: 488–506.
Chen H, Lyne PD, Giordanetto F, Lovell T, Li J . On evaluating molecular-docking methods for pose prediction and enrichment factors. J Chem Inf Model 2006; 46: 401–15.
Bursulaya BD, Totrov M, Abagyan R, Brooks CL 3rd . Comparative study of several algorithms for flexible ligand docking. J Comput Aided Mol Des 2003; 17: 755–63.
We thank the Shanghai Supercomputing Center and Computer Network Information Center of Chinese Academy of Sciences for allocation of computing time. This work was supported by the National Natural Science Foundation of China (grants 20721003, 20803022, and 30672539), the International Collaboration Projects (grants 2007DFB30370 and 20720102040), the 863 Hi-Tech Program of China (grant 2007AA02Z304), and the State Key Program of Basic Research of China (grants 2009CB918502 and 2009CB918501).
Chen, Z., Li, Hl., Zhang, Qj. et al. Pharmacophore-based virtual screening versus docking-based virtual screening: a benchmark comparison against eight targets. Acta Pharmacol Sin 30, 1694–1708 (2009). https://doi.org/10.1038/aps.2009.159