Assessing protein–ligand interaction scoring functions with the CASF-2013 benchmark


Scoring functions are a group of computational methods widely applied in structure-based drug design for fast evaluation of protein–ligand interactions. To date, a whole spectrum of scoring functions has been developed on the basis of different assumptions or algorithms. It is therefore important to both the end users and the developers of scoring functions that their performance be objectively assessed. We have developed the comparative assessment of scoring functions (CASF) benchmark as an open-access solution for scoring function evaluation. The latest CASF-2013 benchmark enables evaluation of the so-called 'scoring power', 'ranking power', 'docking power', and 'screening power' of a given scoring function with a high-quality test set of 195 complexes formed between diverse protein molecules and their small-molecule ligands. Evaluation results of the standard scoring functions implemented in several mainstream software programs (including Schrödinger, MOE, Discovery Studio, SYBYL, and GOLD) are provided as a reference. This benchmark has become popular among the scoring function community since its first release. In this protocol, we provide detailed descriptions of the data files included in the CASF-2013 package and step-by-step instructions on how to conduct the performance tests with the ready-to-use computer scripts included in the package. This protocol is expected to lower the technical hurdles facing new and existing users of the CASF-2013 benchmark. On a standard desktop workstation, it takes roughly half an hour to complete the whole evaluation procedure for one scoring function, once the required inputs, i.e., the results computed on the test set, are ready to use.
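As a rough illustration of what a 'scoring power' test measures, the following sketch computes a Pearson correlation coefficient between computed scores and experimental binding constants. It assumes that Pearson's R is the metric of interest. The function name `pearson_r` and all numbers are hypothetical and are not taken from the CASF-2013 package or its scripts.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values for illustration only (NOT CASF-2013 data)
experimental_log_ka = [4.2, 5.1, 6.3, 7.8, 9.0]
computed_scores = [3.9, 5.5, 6.0, 7.1, 8.8]

r = pearson_r(computed_scores, experimental_log_ka)
print(f"Pearson R = {r:.3f}")
```

A value close to 1 would indicate that the computed scores track the experimental binding data in a nearly linear fashion.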


Figure 1: Illustration of how the test set used in CASF-2013 was compiled.
Figure 2: Correlation between the experimental binding constant of each protein–ligand complex and its ΔSAS (i.e., buried solvent-accessible surface area of the ligand molecule upon binding) in the scoring power test.
Figure 3: Illustration of how the docking power is evaluated with decoy ligand-binding poses.
Figure 4: Information recorded in the 'CoreSet.dat' file.
Figure 5: An example output given by the scoring power test.
Figure 6: An example output given by the ranking power test.
Figure 7: An example output of the docking power test.
Figure 8: An example output of the screening power test measured by enrichment factors.
Figure 9: An example output of the screening power test measured by the success rate of finding the tightest binder.
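The enrichment factor mentioned in the Figure 8 caption can be sketched as follows. This is a minimal example under one common definition (true binders found among the top-ranked fraction, divided by the number expected from random selection); whether the CASF-2013 scripts use exactly this form is an assumption here. The function name `enrichment_factor` and the ligand IDs are hypothetical.

```python
def enrichment_factor(ranked_ids, true_binders, top_fraction):
    """EF = (true binders in the top fraction) / (total true binders * top_fraction).

    ranked_ids   -- ligand IDs ordered from best to worst score
    true_binders -- IDs of the known binders for this target
    top_fraction -- e.g. 0.01, 0.05, or 0.10
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    binders = set(true_binders)
    if not binders:
        raise ValueError("at least one true binder is required")
    # Number of top-ranked ligands to inspect (at least one)
    n_top = max(1, round(len(ranked_ids) * top_fraction))
    hits = sum(1 for lig in ranked_ids[:n_top] if lig in binders)
    return hits / (len(binders) * top_fraction)


# Hypothetical ranking of 20 ligands, best score first
ranked = [f"L{i:02d}" for i in range(20)]
known_binders = ["L00", "L05", "L19"]

ef_10 = enrichment_factor(ranked, known_binders, 0.10)
print(f"EF at top 10% = {ef_10:.2f}")
```

An enrichment factor above 1 means the top-ranked fraction contains more true binders than random picking would be expected to yield.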




We are grateful to the users of the CASF benchmark for their valuable feedback. This work was financially supported by the Ministry of Science and Technology of China (National Key Research Program, grant no. 2016YFA0502302), the National Natural Science Foundation of China (grant nos. 81725022, 81430083, 21472227, 21673276, and 21402230), the Chinese Academy of Sciences (Strategic Priority Research Program, grant no. XDB20000000), and the Science and Technology Development Foundation of Macao SAR (grant no. 055/2013/A2).

Author information




R.W. conceived and supervised the project. Y.L. designed the protocol, performed computations, and also drafted the manuscript. M.S., Z.L., J. Li, J. Liu, and L.H. helped with data processing and programming.

Corresponding author

Correspondence to Renxiao Wang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Contents under the ‘decoys_docking/’ directory in the CASF-2013 package.

Only some of the data files under this directory are shown in this figure for illustration.

Supplementary Figure 2 Contents under the ‘decoys_screening/’ directory in the CASF-2013 package.

Only some of the data files under the ‘10gs/’ subdirectory are shown in this figure for illustration.

Supplementary Figure 3 Information of the target proteins and their known binders recorded in ‘TargetInfo.dat’.

Only some target proteins are shown in this figure. The first four-letter code in each line refers to the PDB entry from which the target protein structure was retrieved, whereas the remaining codes indicate the PDB entries containing the known binders of this target protein. The known binders of each target protein are listed in descending order of binding affinity, i.e., the tightest binder comes first.
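The layout described for ‘TargetInfo.dat’ could be read along the following lines. This is a sketch only: it assumes whitespace-separated codes and that lines starting with `#` are comments, neither of which is confirmed by the description above. The binder codes `aaaa`, `bbbb`, and `cccc` are placeholders, and the helper names are hypothetical.

```python
def parse_target_info(text):
    """Map each target PDB code to its known binders, tightest binder first."""
    targets = {}
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and '#' comment lines carry no data
        if not line or line.startswith("#"):
            continue
        codes = line.split()
        targets[codes[0]] = codes[1:]
    return targets


def tightest_binder_in_top(ranked_ids, binders, top_fraction):
    """True if the tightest binder (first in `binders`) is in the top fraction."""
    n_top = max(1, round(len(ranked_ids) * top_fraction))
    return bool(binders) and binders[0] in ranked_ids[:n_top]


# Placeholder content; the binder codes are NOT real entries from the package
sample = """# target  binders (descending affinity)
10gs aaaa bbbb cccc
"""

targets = parse_target_info(sample)
ranked = ["bbbb", "aaaa", "x001", "x002", "x003",
          "x004", "x005", "x006", "x007", "cccc"]
print(targets["10gs"][0], tightest_binder_in_top(ranked, targets["10gs"], 0.20))
```

Averaging the boolean result over all targets would give a success rate of the kind referred to in the Figure 9 caption.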

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Tables 1–9. (PDF 1359 kb)


Cite this article

Li, Y., Su, M., Liu, Z. et al. Assessing protein–ligand interaction scoring functions with the CASF-2013 benchmark. Nat Protoc 13, 666–680 (2018).
