Scoring functions are a group of computational methods widely applied in structure-based drug design for fast evaluation of protein–ligand interactions. To date, a wide spectrum of scoring functions has been developed on the basis of different assumptions and algorithms. Therefore, it is important to both the end users and the developers of scoring functions that their performance be objectively assessed. We have developed the comparative assessment of scoring functions (CASF) benchmark as an open-access solution for scoring-function evaluation. The latest CASF-2013 benchmark enables evaluation of the so-called 'scoring power', 'ranking power', 'docking power', and 'screening power' of a given scoring function, using a high-quality test set of 195 complexes formed between diverse proteins and their small-molecule ligands. Evaluation results for the standard scoring functions implemented in several mainstream software programs (including Schrödinger, MOE, Discovery Studio, SYBYL, and GOLD) are provided for reference. This benchmark has been popular in the scoring-function community since its first release. In this protocol, we provide detailed descriptions of the data files included in the CASF-2013 package and step-by-step instructions on how to conduct the performance tests with the ready-to-use computer scripts included in the package. This protocol is expected to lower the technical hurdles for new and existing users of the CASF-2013 benchmark. On a standard desktop workstation, the whole evaluation procedure for one scoring function takes roughly half an hour, once the required inputs (i.e., the results computed by that scoring function on the test set) are ready.
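The four tests named above can be illustrated with a minimal Python sketch. It is not the evaluation scripts shipped in the CASF-2013 package; it assumes metric definitions along the lines of the CASF-2013 papers: scoring power as the Pearson correlation between predicted scores and experimental binding data; ranking power as high-level (full order correct) and low-level (best binder ranked first) success rates over clusters of complexes that share a target; docking power as the fraction of complexes whose top-ranked pose lies within an RMSD cutoff of the native pose (2.0 Å is assumed here); and screening power as an enrichment factor over the top 1% of ranked candidates. All function names are hypothetical, and higher scores are assumed to mean tighter binding, so energy-like scores should be negated first.

```python
import math
from statistics import mean

# Hypothetical helpers; assume "higher score = tighter binding / better pose".


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)  # undefined if either input is constant


def scoring_power(scores, affinities):
    """Linear correlation between predicted scores and measured affinities (e.g. pKd)."""
    return pearson_r(scores, affinities)


def ranking_power(clusters):
    """Return (high-level, low-level) success rates.

    Each cluster is a list of (score, affinity) pairs for complexes of one target.
    High level: ordering by score reproduces the ordering by affinity.
    Low level: the top-scored complex is the tightest binder.
    """
    high = low = 0
    for cluster in clusters:
        by_score = [a for _, a in sorted(cluster, key=lambda p: p[0], reverse=True)]
        by_affinity = sorted((a for _, a in cluster), reverse=True)
        if by_score == by_affinity:
            high += 1
        if by_score[0] == by_affinity[0]:
            low += 1
    return high / len(clusters), low / len(clusters)


def docking_power(complexes, rmsd_cutoff=2.0, top_n=1):
    """Fraction of complexes with a near-native pose among the top_n scored poses.

    Each complex is a list of (score, rmsd_to_native) pairs, one per pose.
    The 2.0 Å default cutoff is an assumption.
    """
    hits = 0
    for poses in complexes:
        top = sorted(poses, key=lambda p: p[0], reverse=True)[:top_n]
        if any(rmsd < rmsd_cutoff for _, rmsd in top):
            hits += 1
    return hits / len(complexes)


def enrichment_factor(scores, is_binder, fraction=0.01):
    """EF = (binder rate in the top `fraction` of ranked candidates) / (binder rate overall)."""
    n = len(scores)
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, is_binder), key=lambda p: p[0], reverse=True)
    hits_top = sum(1 for _, b in ranked[:n_top] if b)
    total_binders = sum(1 for b in is_binder if b)
    return (hits_top / n_top) / (total_binders / n)


if __name__ == "__main__":
    print(scoring_power([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```

Keeping each test as a separate function mirrors the idea that the four "powers" are assessed independently; a single scoring function can do well on one and poorly on another.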
We are grateful to the users of the CASF benchmark for their valuable feedback. This work was financially supported by the Ministry of Science and Technology of China (National Key Research Program, grant no. 2016YFA0502302), the National Natural Science Foundation of China (grant nos. 81725022, 81430083, 21472227, 21673276, and 21402230), the Chinese Academy of Sciences (Strategic Priority Research Program, grant no. XDB20000000), and the Science and Technology Development Foundation of Macao SAR (grant no. 055/2013/A2).
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Contents under the ‘decoys_docking/’ directory in the CASF-2013 package.
For demonstration, only some of the data files under this directory are shown.
Supplementary Figure 2 Contents under the ‘decoys_screening/’ directory in the CASF-2013 package.
For demonstration, only some of the data files under the ‘10gs/’ subdirectory are shown.
Supplementary Figure 3 Information on the target proteins and their known binders recorded in ‘TargetInfo.dat’.
Only some of the target proteins are shown. The first four-letter code on each line is the PDB entry from which the target protein structure was retrieved, and the remaining codes are the PDB entries containing the known binders of this target protein. The known binders of each target protein are ranked in descending order of binding affinity, i.e., the tightest binder is listed first.
Cite this article
Li, Y., Su, M., Liu, Z. et al. Assessing protein–ligand interaction scoring functions with the CASF-2013 benchmark. Nat Protoc 13, 666–680 (2018). https://doi.org/10.1038/nprot.2017.114