Introduction

Biomolecular recognition is central to cellular processes mediated by the formation of complexes between biomolecular receptors and their ligands. Understanding of biomolecular recognition is one of the most important issues in modern molecular biology1,2 and has direct applications in drug discovery and design3,4. The fast and accurate prediction of a ligand specifically binding to a target protein is a crucial step for lead discovery. During the past two decades, considerable efforts have been devoted to the development of docking algorithms and scoring functions5. There are many docking algorithms available, however, imperfections of scoring functions continue to be a major limiting factor6,7.

Highly efficient and specific biomolecular recognition requires both affinity and specificity8,9,10,11. The stability of the complex is determined by the affinity while the specificity is controlled by either partner binding to other competitive biomolecules discriminatively. The current scoring functions of protein-ligand binding12, whether force-field based, empirical, or knowledge-based scoring functions, are mainly focused on improving the ability of predicting the known binding affinities observed in experiments as accurately as possible. The strategy of developing these scoring functions seeks to optimize the stability based on the combination of energetics and shape complementarity without explicit consideration of binding specificity.

However, high affinity does not guarantee high specificity which is critical for function, e.g. drug-target recognition. To design a drug, one has to design a lead compound binding to a specific target receptor rather than others indiscriminately for avoiding the possible side effects. According to the Boltzman distribution (P exp[−F/KT]), the equilibrium population is exponentially dependent on the binding free energy. A gap in binding free energy or affinity will lead to significant population discrimination between the specific complex and non-specific ones. Thus to develop a scoring function, the strategy should satisfy the requirement that the stability of the specific complex is maximized while the stability of competing complexes is minimized, which can guarantee both the stability and the specificity for the specific complex.

The reason that the specificity usually was not taken into account explicitly is that the description of binding specificity was challenging to quantify. The conventional definition (Fig. 1a) of specificity is the ability of a ligand to specifically bind to a protein against other proteins, namely the relative difference in affinity of one specific protein-ligand complex to others8,9,10,11. The quantification of conventional specificity is challenging since specificity requires comparison of the affinities of all the different receptors with the same ligand. The receptor universe is huge and the information is often incomplete on those proteins. We proposed an alternative way to quantitatively determine the ligand-receptor binding specificity13,14. The new view of specificity is that a specific and native binding site of a ligand to its receptor is preferred to other binding sites on the same receptor (Fig. 1b). This concept is based on the assumption that a ligand binding to many protein receptors is equivalent to its binding to them with N and C terminus of these protein receptors linked together, leading to binding to an effectively large protein. Therefore, if the target protein is large enough, one can think of it as composed of many different segments each mimicking a protein receptor. The conventional specificity as relative affinity against different receptors now becomes specificity of the native binding mode against the other binding modes for this large target protein (Fig. 1a,b).

Figure 1
figure 1

Illustration of the equivalence of conventional specificity to intrinsic specificity.

(a) Different receptors (green) binding to the same ligand (red) with the corresponding energy spectrum. (b) Different binding states (modes) of a particular ligand to its receptor. (c) Different ligands binding to the same receptor.

Also, one receptor protein with different sites binding with a ligand is similar to the case of the whole universe of different ligands binding with the same receptor (Fig. 1c). If the protein target size is large enough, then the spatial contact interactions can be sampled enough to cover all the contact interactions appeared in the ligand binding to different receptors. A recent work on molecular dynamics of a ligand searching for binding to the receptor has validated this assumption. In that work, Shaw et al.15 carried out a relatively long molecular dynamics simulation in which the inhibitor PP1 was initially placed at a random location to search the docking sites on the protein Src kinase. Persistent and noteworthy intermediate conformations with the inhibitor located diversely on the surface of the kinase were observed in the binding process, indicating multiple competing binding sites do exist on the same receptor.

Similar to protein folding, the binding process of protein-ligand can be physically quantified and visualized as a funnel-like energy landscape towards the native binding state with local roughness along the binding paths13,14,16,17,18,19,20,21,22,23. According to the theory of energy landscape (Fig. 2), the native conformation of the binding complex is the conformation with the lowest binding energy and the energies of the non-native conformations follow a statistical Gaussian distribution. A dimensionless quantity (termed as intrinsic specificity ratio (ISR), where δE is the energy gap between the native binding state and the average non-native binding states, ΔE is the energy variance of the non-native states and S is the configurational entropy) is defined to describe the magnitude of intrinsic specificity. A large ISR indicates a high level of discrimination of the native binding state from the non-native binding states, which means a high specificity. Therefore, ISR provides a quantitative measure of intrinsic specificity that can be determined without evaluating the conventional specificity by exploring the whole set of receptor or ligand universe.

Figure 2
figure 2

Distribution of energy and funneled energy landscape of the biomolecular binding.

(a) Multiple docking complex conformations of COX-2 with a selective drug (SC-558), the native pose of the drug shown in red and sticks while other decoys shown in blue and lines. (b) The binding energy spectrum for different binding decoys. (c) The corresponding distribution of binding energies where the standard deviation of energy (ΔE), average energy of decoys (< ED >) and the energy gap (δE) are represented. (d) Energy landscape of binding with a funneled shape towards the native state.

Given the inherent limitations of current scoring functions and the demands for the practical way of evaluating specificity, we developed a novel scoring function by optimizing both intrinsic specificity and affinity in this study (named as SPA, SPA stands for SPecificity and Affinity. The development flowchart of SPA is shown in Supplementary Fig. 1). The optimizing strategy for SPA is to simultaneously reach the maximization of the performances on both the specificity and the affinity predictions of the training set. SPA was validated by testing the benchmark set and the performance was compared with 16 previous scoring functions. The test results showed that SPA outperformed all these previous scoring functions. SPA was also applied for a target protein cyclooxygenase-2 (COX-2) of nonsteroidal anti-inflammatory drugs (NSAIDs) for discrimination of the drugs against the diversity set and selective drugs against non-selective drugs. The results suggested that more reliable lead compounds can be screened with SPA through two dimensional screening of intrinsic specificity ISR and affinity.

Results

Binding pose prediction

The process of a ligand binding to a protein can be thought as a conformational search guided by a scoring function and the final destination is to look for the “native” binding pose with the best score. Whether the scoring function can select out the best-scored binding pose which resembles the one in the crystal structure closely determines the performance of the scoring function for identifying the “native” binding conformation. Normally, the root mean square deviation (RMSD) is taken as the measure of the structural closeness between the scored binding poses and the “native” binding conformation which is the ligand pose in the crystal structure here. If the RMSD value of the best-scored binding pose is less than a predefined cutoff, it is considered as a successful recognition of the “native” or “near-native” binding pose by the scoring function.

To make a comparison with other scoring functions, the success rate for the benchmark set was calculated by SPA and compared to 16 scoring functions implemented in the mainstream commercial softwares or available from academic research groups24 (Table 1). The success rates of SPA as well as the other 16 scoring functions under five different cutoffs (1.0, 1.5, 2.0, 2.5 and 3.0 Å) clearly shows that SPA performs the best among all the scoring functions no matter what criterion of RMSD cutoff is used, suggesting that SPA is very successful on the ability to identify “native” or “near-native” binding poses.

Table 1 Success rates of identifying the “native” or “near-native” conformations under different RMSD cutoffs

In practice, besides the best-scored binding pose, multiple other binding poses with good scores can also be selected out as putative “native” binding poses. Namely, it is probable in molecular docking to select a few binding poses with top-ranked scores for further evaluation in hierarchical database screenings25. The success rates were compared for the binding poses with top five scores under the commonly used criterion of RMSD cutoff ( = 2.0 Å) (Supplementary Table 1). We can see that SPA yields more than 90% success rate to identify the “native” binding pose when the binding scores are considered from top two to five binding scores. It has comparable performance as other three top-ranked scoring function GOLD/ASP, DrugScorePDB/PairSurf and DS/PLP1. This result further validates the outstanding performance of SPA on the “native” or “near-native” binding pose identification.

A high success rate of identifying the “native” conformation implies that the binding poses structurally close to the “native” conformation have high binding scores in energetics. This structure-energy correlation is consistent with the concept of funnel-shaped energy landscape of protein-ligand binding13, where the “native” binding conformation with lowest energy locates at the global minimum of the energy landscape and the conformations with low energies are structurally similar to the “native” conformation. According to the energy landscape theory16,17, it is promising that SPA with the highest success rate to identify the “native” conformation can search the global minimum with a fast convergence if it is applied to guide the conformational sampling in molecular docking. The high success rate also implies the capability of SPA to discriminate the “native” binding mode against decoys. This leads to a better quantification and discrimination of intrinsic specificity and therefore the (conventional) specificity of biomolecular recognition.

Binding affinity prediction

In addition to the prediction of binding pose, the prediction of binding affinity is another important criterion to evaluate the performance of scoring function. In contrast to binding pose prediction which emphasizes on the discrimination of the “native” conformation from the “non-native” decoys for each protein-ligand complex, the prediction of binding affinity relates to the ability of reproducing the experimentally measured binding affinities for different types of protein-ligand complexes. It determines the accuracy of binding scores predicted by the scoring functions compared with the experimental measurements. Due to scaling, the scoring functions usually cannot reproduce the absolute values of experimental binding affinities, the correlation between the predicted and experimental measured binding affinities is widely used to evaluate the accuracy of binding affinity prediction. Pearson correlation coefficients (CP) and Spearman rank correlation coefficients (CS) between the binding scores and the known binding constants were computed.

It can be seen (Table 2) that SPA gives the best correlations of both CP and CS. Compared to other scoring functions, this performance of SPA is surprisingly good since no other scoring functions simultaneously rank on the tops for both the predictions of binding pose (Table 1) and binding affinity (Table 2). For example, X-Score/HMScore performs well on the prediction of binding affinity but moderately on the prediction of binding pose. GOLD/ASP is able to identify the correct binding pose with a second highest success rate whereas it is less successful to reproduce the binding affinities. It is worth noticing that only SPA and X-Score/HMScore achieve CP over 0.6 in predicting affinity which is much superior to other scoring functions (Supplementary Fig. 2).

Table 2 Correlations between the predicted binding affinity and experimentally measured binding affinity

The best performance of SPA on the prediction of both binding pose and binding affinity suggests that the optimization strategy calibrated to improve specificity and affinity for the development of SPA is the correct route to explore the protein-ligand recognition. SPA is not only capable of discriminating the specific “native” conformation out of a large number of decoys by their scores but also accurately predicting the binding affinities of different protein-ligand complexes. This result is encouraging and motivates us to apply SPA in the virtual screening to identify the lead compounds for drug discovery with both affinity and specificity.

Virtual screening test

With the excellent performance of SPA on the benchmark test, we want to evaluate the ability of SPA in real virtual screening test. The enzyme cyclooxygenase-2 (COX-2) was chosen as our target protein model which is the target of nonsteroidal anti-inflammatory drugs (NSAIDs) for reducing fever and inflammation, such as the commonly-taken drugs aspirin, motrin, telenoid and advil. The virtual screening for COX-2 is challenging. Besides the importance to discriminate the drugs from the diversity set, it is more important to distinguish selective and nonselective drugs since selective drugs is more potent to inhibit COX-2 than non-selective drugs. Whether the differences can be evaluated according to the affinities and specificities determines the performance of SPA in the applications of virtual screening.

As seen from the enrichment curves (Fig. 3a), the selective drugs are obviously separated from the diversity set, while the non-selective drugs are weakly separated from the diversity set. This indicates that SPA has the capacity to discriminate the drugs from the diversity set, especially the selective drugs from the diversity set. The weak discrimination of non-selective drugs from the diversity set may result from the fact that the non-selective drugs are not specific for COX-2 and have much lower potency than the selective drugs. Clearly, the statistics of the top-ranked compounds of all the ligands (Supplementary Table 2), often taken as interest compounds in the virtual screening, shows that compared to affinity, the specificity is a more efficient criterion to select the drugs out of the top-ranked compounds. It is worth noticing that the performance of linear combination of the affinity and specificity is better than the single parameter to discriminate the selective drugs from the diversity set.

Figure 3
figure 3

Statistics for the drugs and two dimensional contour map of drug screening.

(a) Enrichment for both selective and non-selective drugs with E, ISR and the linear combination of them (E & ISR) through logistic regression (b) KS statistic for the discrimination of selective drugs and non-selective drugs by the difference of cumulative fraction based on the parameters. (c) Two dimensional density contour map of binding affinity and ISR for 650 small molecules binding with COX-2, the selective drugs shown in red and non-selective drugs shown in green.

To further quantify the discrimination of selective drugs from non-selective drugs, the statistical discrimination KS test was calculated (Fig. 3b). The relative high values of KS statistic (higher than 40%) suggest significant differences between the selective drugs and the non-selective drugs in both affinity and specificity and more obvious in terms of the combination of them. The KS statistic results demonstrate that SPA is capable to discriminate selective drugs against non-selective drugs, which is important for selecting drug candidates with specificity against targets such as COX-2.

Based on the performance of SPA on the screening, a two-dimensional projection of specificity and affinity is plotted (Fig. 3c) for COX-2 with the diversity set of 650 selected compounds as well as its 37 selective and 20 non-selective drugs. The basin center with the highest density locates in the area with small ISR and low affinity, indicating that random compounds which have weak thermodynamic stability also generally do not have high specificity. Whereas, most of the selective drugs have large ISR and high affinity and most of the non-selective market drugs tend to have relatively smaller ISR and lower affinity. It is worth noticing that a few drugs have values near the basin center in one parameter (ISR or affinity), but have larger values in another parameter, which suggests that specificity and affinity can be complementary in searching some specific drug candidates in virtual screening. These results validate that specificity is an important property of drug-target system.

These results validate that specificity is an important property of drug-target system. Previously, both experimentally and computationally screening techniques mostly concentrated on the affinity selection for the lead compounds. The virtual screening test of SPA here demonstrates that the ISR is an appropriate indicator for the lead compounds with specificity selection. Experimentally, it is challenging to determine the binding specificity for a given target. Computationally, it is practical to employ the scoring functions to carry out two dimensional virtual screening using both affinity and ISR in drug discovery. SPA is a good choice based on its excellent performance.

Discussion

In this work, we developed a novel and quantitative descriptions of biomolecular interactions through the scoring function called SPA which takes into account of both specificity and affinity of protein-ligand binding. It represents a significant advance over the previous investigations on protein-ligand binding interactions and scoring functions that only focused on affinity for development. Two another important innovations are incorporated into SPA. Firstly, SPA provides an effective way to circumvent the calculation of the reference state which is an issue in the development of knowledge-based scoring functions26,27,28. Secondly, lacking of a large and high-quality set of protein-ligand complexes with experimentally determined binding affinities and three dimensional structures was a bottleneck for developing accurate and general scoring functions. SPA takes the largest data so far of high-quality set of protein-ligand complexes with experimentally determined binding affinities and 3D structures29. It gives SPA with the chance to be less training-set dependent and more general for applications, which is superior than previous empirical scoring functions.

The excellent performance of SPA was validated by the test on a benchmark set. Compared to the other 16 existing popular scoring functions, SPA achieves the highest success rate in identifying correct binding pose, yields the highest correlation coefficient in the prediction of the experimentally measured binding affinity. These significant improvements of performances over previous scoring functions are very encouraging and motivate us to apply SPA in identifying the lead compounds in drug discovery. The virtual screening test of SPA on a drug target COX-2 shows that it can successfully distinguish not only the drugs from the diversity set according to the binding affinity as well as the specificity, but also the selective drugs from non-selective drugs, the later discrimination is more demanding in the discovery of lead compounds for drug targets. Thus, more reliable lead compounds with both stability and specificity can be searched with SPA scoring. The success of SPA proves that the specificity is critical to the biomolecular recognition and necessary to be incorporated to the scoring function. In the computational design of the protein-protein interactions9,10, both stability and specificity can be considered to design the interactions for discriminating natural binding partners from many other possible ones with similar sequences and structures. In natural systems, both parameters may be subject to evolutionary optimization, giving the rationality of the optimization on affinity and specificity.

With the availability of rapidly increasing number of protein structures and the advent of high-performance computing system, computational virtual screening offers an effective and practical route to discovering new drug molecules, a complementary or even an alternative way of experimental high-throughput screening30. In the processes of virtual screening, the performance of the scoring function has a major impact on the quality of molecular docking predictions. The outstanding performance of SPA shown in this work makes it practical to be implemented in the docking software and widely applied in virtual screening for identifying the lead compounds.

Methods

Derivation of distance-dependent potentials

The initial atom-pair potential to be optimized is directly derived from the Boltzmann relation widely used in the knowledge-based statistical potentials27,28,31, which is

where gij(r) is the observed pair distribution function which can be calculated by

is the observed number density of atom pair ij within a spherical shell between r and r+δr and the is the expected number density within the sphere of the reference state where there were no interactions between atoms. The former can be directly extracted from the database of protein-ligand complexes, while the later is obtained based on the approximation that the atom pair ij is uniformly distributed in the sphere of the reference state32. Respectively, they are calculated as

where M is the number of protein-ligand complexes, and are the numbers of atom pair ij within the spherical shell and the reference sphere for a given protein-ligand complex m, where . and are the volumes of the spherical shell and the reference sphere, where Δr is the bin size and R is the radius of sphere. In this work, Δr and R are set as 0.3Å and 7.0Å, respectively. In total, there are 16 spherical shells with bin size 0.3 Å from the shortest radius 2.2Å which is the value to exclude the protein-ligand complexes with the steric atom clashes in PDBbind database29.

In fact, the initial potential can be extracted from any database of protein-ligand complex structures, while a good set of initial potentials can make optimizing search more efficient. The database of protein-ligand complexes used here was taken from the refined set of 2011 version in PDBbind database29,33. This database provides a comprehensive and highquality collection of the experimentally determined biomolecular complexes with measured binding affinities which were filtered from the Protein Data Bank (PDB) by applying a series of criteria. Due to infrequent occurrence of metal atoms in the protein-ligand complexes, the complexes containing metal atoms are excluded and only the potentials between nonmetal atoms are considered. 2316 protein-ligand complexes are remaining as our database to extract the initial potentials. Based on the definition of atom type by SYBYL34, 22 atom types are used to cover protein-ligand interactions (Supplementary Table. 3), these atom types were converted from PDB files by BABEL35. A cutoff ( = 600) of Nij was employed to ignore the contribution from the atom pairs with statistically insufficient occurrences. This leads to 101 effective types of atom pairs in our calculation. In addition, if the atom pair has no occurrence in a particular spherical shell, the corresponding pair potential was set as the van der Waals interaction within this shell.

Generation of docking decoys

To calculate the ISR for the optimization of SPA, enough conformations need to be sampled for each ligand docking to its specific protein receptor to explore the underlying binding energy landscape13,14. For each protein-ligand complex, except the “native” protein-ligand conformation obtained from the PDB structure, all the conformational decoys of protein-ligand complexes were generated by the molecular docking with software AutoDock4.236 in this work. Given that enough sampling of binding decoys to generate binding energy landscape is dependent on the docking space that the ligand can explore, the grid box for ligand docking should be sufficiently large to cover the active site as well as significant portions of the surrounding surface. The edges of the grid box were set as five times as the radius of gyration of the naive conformation of ligand to guarantee the enough exploration of the active site and the grid box was centered on the geometric center of the native pose with a grid spacing of 0.375. Within the grid box, Autodock4.2 stochastically generates a population of conformational, rotational and translational isomers from the starting structure of the ligand and docks them with the conformational search method of Lamarckian genetic algorithm. During the search, the ligand was considered conformationally flexible with its torsional bonds defined by AutoDock4.2 according to their chemical features. For each protein-ligand complex, 500 separate docking runs were performed which resulted in a database of 500 decoys for each complex. Other parameters were set as the default values of AutoDock4.2.

Quantitative description of specificity and affinity

To get a potential energy function which can maximize the binding specificity and the consistence between predicted and experimental affinity, we rewrote the initial energy function by introducing a set of adjustable parameters as the coefficients ck for the potentials of atom pair, that is

where E is the total intermolecular energy of a protein-ligand complex. k stands for the type of atom pair interaction, there are 1616 types by multiplying the number of effective atom pairs ( = 101) and the number of shells ( = 16). fk represents the occurrences of the interaction type k between the protein and ligand and uk is an alternative representation of uij(r) in equation 1.

The intrinsic specific ratio (ISR) for a given protein-ligand complex m is calculated as

where α is a scaling factor which accounts for the contribution of the entropy to the specificity13. Here, it approximately depends on the number of torsional bonds of the ligands . δE is the energy gap between the energy of native conformation EN and the average energy of ensemble of decoys < ED > and ΔE is the energy fluctuation or the width of the energy distribution of the decoys (Fig. 2), namely

<> means the average over the ensemble of decoys. Combined equations 58 together, the λm can be represented as

where k,l are the indices of the interaction types. Once fk is computed for each interaction type in the decoys, one can easily compute the value of λm for a given set of ck.

The λm above is defined for a single protein-ligand complex, while we seek the potential energy function that simultaneously makes λm values large enough for the whole proteinlingand complexes in the training set. Therefore we need a single objective function that reflects the λm values for all the protein-ligand complexes in the training set. We chose the Bolzmann-like weighted average of λm as the objective function which is

where βλ is a constant value for weighting which is set as −0.1. The Bolzmann-like weighted average has the advantage over the normally used algebraic average since the protein-ligand complex with the smallest absolute value of λm contribute most to the objective function λ. If we optimize the smallest value of λm from the distribution pool among different protein-ligand binding complexes, we will be sure that even the smallest λm will be large enough for discrimination of separating the “native” from non-native decoys. Therefore Bolzmann-like weighted average is an appropriate combination of individual λm to match our purpose that all the resulting protein-ligand complexes of the training set will be optimized with large λm value. This average approach has similar function as some other weighted average approaches used for optimizing energy function of protein folding37,38,39.

The quantitative measurements of the correlation between predicted and experimental affinity is depicted with Pearson's correlation coefficient by

The predicted binding affinity for the protein-ligand complex is represented by the binding scores calculated from our scoring function with a given set of ck. The experimentally measured affinity is expressed in log Kd or log Ki units, where Kd and Ki are experimentally determined dissociation constant and inhibition constant respectively for the protein-ligand complex m.

Optimization of potential energy function

After getting the initial potential energy and decoy ensembles, the energy function can be readily optimized. The aim of optimization here is to maximize the value of λ for specificity and the value of γ for affinity, a combination parameter (ρ = λγ) which couples specificity and affinity is constructed to evaluate the performance of scoring function during the optimization. The optimization is performed by Monte Carlo (MC) search with simulated annealing in the space of adjustable coefficients ck. The initial values of coefficients are set as 1.0 here, which means optimization of the scoring function starts from the energy function obtained through equation 1-4. A constraint is applied to ck by restricting it varied within [, 5] times of its initial value, otherwise some coefficients could become very large or small due to infrequent occurrences of the interactions. At each MC step one of the coefficients is chosen at random and added with 0.2 or −0.2. The resulting change in E (E is defined as E = −ρ, minimizing E is equivalent of maximizing ρ) is accepted with the probability

where is the optimization temperature for ρ. This guarantees the chosen MC steps statistically prefer the low E and high ρ. The temperature decreases exponentially during the search and the starting temperature is 0.5. The search converges well within 300,000 MC steps (Supplementary Fig. 2a), which suggests that a set of ck are found maximally optimized for ρ. The convergence of both λ and γ (Supplementary Fig. 2b) indicates the optimized scoring function reaches the maximal performance of simultaneously quantifying the specificity and affinity. For each MC search, 1500 protein-ligand complexes are randomly selected as the training set from the refined set of PDBbind database except the complexes in the test set. 5 independent MC simulations were performed and the average of correlation between the coefficients from different MC simulations is 0.80. This high correlation indicates our optimization is successful and robust on the training set.

Validation of SPA

To validate SPA, two kinds of tests were taken. First, SPA was tested on a benchmark of protein-ligand complexes which is a high-quality set of 195 protein-ligand complexes selected out from the refined set of 2007 version of the PDBbind database24. This benchmark was taken as testing set to compare the performance for a large collection of 16 scoring functions implemented in main-stream commercial softwares or available from academic research groups, which offers a reference for the performance of SPA. Each protein-ligand complex of the benchmark set was docked with the same parameter as the training set above to generate a binding energy landscape with decoy ensemble. Second, SPA was applied on a target protein cyclooxygenase-2 (COX-2) for the virtual screening test. COX-2 is the inhibition target of nonsteroidal anti-inflammatory drugs (NSAIDs) for reducing fever and inflammation. A diverse set of 650 small molecules were selected from the NCI-Diversity database40 having molecular weights similar to that of the reference compound SC-558, with which the crystal structure of the COX-2 complex is available (PDB code 1CX2)41,42. 37 COX-2 selective and 20 nonselective drugs (Supplementary Table. 4) of NSAIDs are taken for the test of discrimination of drugs from the diversity set, as well as the discrimination of selective from non-selective drugs. COX-2 selective inhibitors are specific to inhibit only COX-2, while COX-2 non-selective drugs inhibit both the COX-2 and its isoenzyme COX-1. Also, each ligand was docked to COX-2 to generate a binding energy landscape with 2000 decoy conformations and the box size of docking was set as 100 * 100 * 100 grids centered at the compound SC-558. Since the crystal structures of COX-2 with the ligands in the diversity set are not available, by clustering the decoys the same as implemented in AutoDock software, the decoy with the lowest energy in the largest population cluster was considered as the “native” pose43. The evaluation methods44 including the enrichment test and the Kolmogorov-Smirnov test (KS test) were employed to describe the performance of SPA for COX-2 system.