Assessing protein–ligand interaction scoring functions with the CASF-2013 benchmark

Li, Yan; Su, Minyi; Liu, Zhihai; Li, Jie; Liu, Jie; Han, Li; Wang, Renxiao

doi:10.1038/nprot.2017.114

Protocol
Published: 08 March 2018


Nature Protocols volume 13, pages 666–680 (2018)

Abstract

Scoring functions are a group of computational methods widely applied in structure-based drug design for fast evaluation of protein–ligand interactions. To date, a wide range of scoring functions has been developed based on different assumptions or algorithms. It is therefore important to both end users and developers of scoring functions that their performance be objectively assessed. We have developed the comparative assessment of scoring functions (CASF) benchmark as an open-access solution for scoring function evaluation. The latest CASF-2013 benchmark enables evaluation of the so-called 'scoring power', 'ranking power', 'docking power', and 'screening power' of a given scoring function with a high-quality test set of 195 complexes formed between diverse protein molecules and their small-molecule ligands. Evaluation results of the standard scoring functions implemented in several mainstream software programs (including Schrödinger, MOE, Discovery Studio, SYBYL, and GOLD) are provided as reference. This benchmark has been widely used in the scoring function community since its first release. In this protocol, we provide detailed descriptions of the data files included in the CASF-2013 package and step-by-step instructions on how to conduct the performance tests with the ready-to-use computer scripts included in the package. This protocol is expected to lower the technical hurdles for new and existing users of the CASF-2013 benchmark. On a standard desktop workstation, it takes roughly half an hour to complete the whole evaluation procedure for one scoring function, once the required inputs (i.e., the results computed on the test set) are ready.
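
The 'scoring power' test compares the scores a function assigns to the 195 test-set complexes with their experimental binding data. A minimal sketch of the two statistics commonly reported for this kind of test, the Pearson correlation coefficient R and the standard deviation (SD) of a linear regression, is shown below, assuming plain Python lists of equal length. The function names, the use of predicted scores as the regressor, and the `ddof=2` residual convention are assumptions, not the CASF-2013 package's own scripts. Scores for which lower means tighter binding (energy-like scores) should be negated first so that a good function yields a positive R.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def regression_sd(x, y, ddof=2):
    """SD of residuals from the least-squares line y = intercept + slope * x.

    ddof=2 (one degree of freedom each for slope and intercept) is an
    assumed convention; check the benchmark's scripts before comparing numbers.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(rss / (n - ddof))


# Hypothetical example: predicted scores vs. experimental log Ka values.
scores = [5.1, 6.3, 7.8, 4.2, 9.0]
log_ka = [4.9, 6.0, 8.1, 4.5, 8.7]
print(round(pearson_r(scores, log_ka), 3), round(regression_sd(scores, log_ka), 3))
```

Writing the sums out explicitly keeps the sketch dependency-free; in practice `numpy` or `scipy.stats` would give the same R.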

**Figure 1: Illustration of how the test set used in CASF-2013 was compiled.**

**Figure 2: Correlation between the experimental binding constant of each protein–ligand complex and its ΔSAS (i.e., buried solvent-accessible surface area of the ligand molecule upon binding) in the scoring power test.**

**Figure 3: Illustration of how the docking power is evaluated with decoy ligand-binding poses.**
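
In the docking power test, the native ligand pose of each complex is scored together with a set of decoy poses, and the scoring function succeeds on that complex if its best-scored pose lies close to the native one. A minimal sketch of the resulting success rate follows, assuming each complex is given as a list of (score, RMSD-to-native) pairs that already includes the native pose at RMSD 0. The 2.0 Å default cutoff, the inclusive comparison, and the dictionary layout are assumptions for illustration.

```python
def docking_success(poses, rmsd_cutoff=2.0, higher_is_better=True):
    """True if the best-scored pose is within rmsd_cutoff (Å) of the native pose.

    poses: list of (score, rmsd_to_native) tuples for one complex.
    Set higher_is_better=False for energy-like scores where lower is better.
    """
    pick = max if higher_is_better else min
    _, best_rmsd = pick(poses, key=lambda p: p[0])
    return best_rmsd <= rmsd_cutoff


def docking_success_rate(complexes, rmsd_cutoff=2.0, higher_is_better=True):
    """Fraction of complexes (dict: complex ID -> poses) on which docking succeeds."""
    hits = sum(docking_success(p, rmsd_cutoff, higher_is_better) for p in complexes.values())
    return hits / len(complexes)


# Hypothetical data: each complex has a native pose (RMSD 0.0) plus decoys.
example = {
    "cpx1": [(9.1, 0.0), (9.5, 3.4), (7.0, 1.2)],
    "cpx2": [(8.0, 0.0), (6.0, 4.0)],
}
print(docking_success_rate(example))
```

Rerunning with other cutoffs (for example 1.0 or 3.0 Å) shows how sensitive the success rate is to the RMSD threshold.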

**Figure 4: Information recorded in the 'CoreSet.dat' file.**

**Figure 5: An example output given by the scoring power test.**

**Figure 6: An example output given by the ranking power test.**
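
The ranking power test asks whether a scoring function orders complexes formed by the same target protein correctly. A minimal sketch of two success measures is given below, assuming the test set is grouped into such per-target clusters: a 'high-level' success when the predicted order of all members matches the experimental order, and a 'low-level' success when only the tightest binder is placed first. These definitions and the (score, affinity) input layout are assumptions; confirm them against the package's scripts.

```python
def ranking_success(clusters, higher_is_better=True):
    """Return (high_level_rate, low_level_rate) over a list of clusters.

    Each cluster is a list of (predicted_score, experimental_affinity) tuples,
    where a larger experimental value means tighter binding (e.g. log Ka).
    """
    high = low = 0
    for members in clusters:
        idx = range(len(members))
        exp_order = sorted(idx, key=lambda i: members[i][1], reverse=True)
        pred_order = sorted(idx, key=lambda i: members[i][0], reverse=higher_is_better)
        if pred_order == exp_order:
            high += 1  # every member of the cluster ranked correctly
        if pred_order[0] == exp_order[0]:
            low += 1   # tightest binder ranked first
    n = len(clusters)
    return high / n, low / n


# Hypothetical clusters of three complexes each.
clusters = [
    [(9.0, 8.0), (7.0, 6.5), (5.0, 4.0)],
    [(9.0, 8.0), (5.0, 6.5), (7.0, 4.0)],
    [(5.0, 8.0), (9.0, 6.5), (7.0, 4.0)],
]
print(ranking_success(clusters))
```

Sorting member indices rather than affinity values keeps the comparison well defined even when two members share the same experimental value.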

**Figure 7: An example output of the docking power test.**

**Figure 8: An example output of the screening power test measured by enrichment factors.**
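
The enrichment factor compares the rate of true binders among the top-ranked fraction α of candidates with their overall rate, i.e. EF_α = (true binders in the top α) / (total true binders × α). A minimal per-target sketch follows, assuming each target protein is scored against a list of candidate ligands of which the known binders are flagged as true binders. The rounding of the top-α count and the function signature are assumptions; a natural summary over the benchmark is the mean EF across targets.

```python
def enrichment_factor(scores, is_true_binder, alpha=0.01, higher_is_better=True):
    """Enrichment factor of true binders within the top `alpha` fraction for one target.

    scores: predicted scores, one per candidate ligand.
    is_true_binder: booleans aligned with scores; at least one must be True.
    """
    n = len(scores)
    n_top = max(1, round(alpha * n))  # rounding convention is an assumption
    order = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_better)
    hits = sum(1 for i in order[:n_top] if is_true_binder[i])
    total = sum(1 for b in is_true_binder if b)
    # Rate of true binders in the top n_top divided by their overall rate;
    # equals hits / (total * alpha) whenever alpha * n is an integer.
    return hits / (total * n_top / n)


# Hypothetical target: 10 candidates, 2 of which are true binders.
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
binders = [True, False, True] + [False] * 7
print(enrichment_factor(scores, binders, alpha=0.2))
```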

**Figure 9: An example output of the screening power test measured by the success rate of finding the tightest binder.**
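
The second screening power measure records, for each target, whether its tightest known binder appears among the top-ranked fraction α of candidates, and reports the fraction of targets for which it does. A minimal sketch follows, assuming per-target lists of candidate scores together with the index of the tightest binder in that list. The names, input layout, and rounding convention are assumptions for illustration.

```python
def tightest_binder_found(scores, tightest_index, alpha=0.01, higher_is_better=True):
    """True if the candidate at tightest_index ranks within the top alpha fraction."""
    n = len(scores)
    n_top = max(1, round(alpha * n))  # rounding convention is an assumption
    order = sorted(range(n), key=lambda i: scores[i], reverse=higher_is_better)
    return tightest_index in order[:n_top]


def screening_success_rate(targets, alpha=0.01, higher_is_better=True):
    """targets: list of (scores, tightest_index) pairs, one per target protein."""
    found = sum(tightest_binder_found(s, t, alpha, higher_is_better) for s, t in targets)
    return found / len(targets)


# Hypothetical data for two targets with three candidates each.
targets = [([1.0, 5.0, 3.0], 1), ([4.0, 2.0, 6.0], 0)]
print(screening_success_rate(targets, alpha=0.34))
```

Evaluating several α values (for example 1%, 5%, and 10%) shows how quickly the tightest binders are recovered as the selected fraction grows.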

Acknowledgements

We are grateful to the users of the CASF benchmark for their valuable feedback. This work was financially supported by the Ministry of Science and Technology of China (National Key Research Program, grant no. 2016YFA0502302), the National Natural Science Foundation of China (grant nos. 81725022, 81430083, 21472227, 21673276, and 21402230), the Chinese Academy of Sciences (Strategic Priority Research Program, grant no. XDB20000000), and the Science and Technology Development Foundation of Macao SAR (grant no. 055/2013/A2).

Author information

Authors and Affiliations

State Key Laboratory of Bioorganic and Natural Products Chemistry, Center for Excellence in Molecular Synthesis, Shanghai Institute of Organic Chemistry, Chinese Academy of Sciences, Shanghai, China
Yan Li, Minyi Su, Zhihai Liu, Jie Li, Jie Liu, Li Han & Renxiao Wang
State Key Laboratory of Quality Research in Chinese Medicine, Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macau, People's Republic of China
Renxiao Wang

Contributions

R.W. conceived and supervised the project. Y.L. designed the protocol, performed the computations, and drafted the manuscript. M.S., Z.L., J. Li, J. Liu, and L.H. helped with data processing and programming.

Corresponding author

Correspondence to Renxiao Wang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Contents under the ‘decoys_docking/’ directory in the CASF-2013 package.

Only some of the data files under this directory are shown, for illustration.

Supplementary Figure 2 Contents under the ‘decoys_screening/’ directory in the CASF-2013 package.

Only some of the data files under the ‘10gs/’ subdirectory are shown, for illustration.

Supplementary Figure 3 Information on the target proteins and their known binders recorded in ‘TargetInfo.dat’.

Only some of the target proteins are shown. The first four-character code on each line is the PDB entry from which the target protein structure was retrieved; the remaining codes are the PDB entries containing known binders of that target. The known binders are listed in descending order of binding affinity, i.e., the tightest binder comes first.
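
Because each line of ‘TargetInfo.dat’ starts with the target's PDB code followed by its known binders in descending order of affinity, the file maps naturally onto a dictionary. A minimal parsing sketch follows, assuming whitespace-separated codes and treating blank lines and lines beginning with '#' as non-data; those handling rules and the function name are assumptions, since the file's exact header conventions are not described here.

```python
def parse_target_info(lines):
    """Map each target PDB code to its known binders, tightest binder first.

    lines: any iterable of text lines, e.g. an open file handle.
    """
    targets = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed blank/comment handling
            continue
        codes = line.split()
        targets[codes[0]] = codes[1:]
    return targets


# Usage sketch (the relative path is an assumption; adjust to the unpacked package):
# with open("TargetInfo.dat") as fh:
#     targets = parse_target_info(fh)
#     tightest = {t: binders[0] for t, binders in targets.items() if binders}
```

Keeping the binder list in file order preserves the affinity ranking, so the tightest binder of each target is simply the first element.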

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3 and Supplementary Tables 1–9. (PDF 1359 kb)

About this article

Cite this article

Li, Y., Su, M., Liu, Z. et al. Assessing protein–ligand interaction scoring functions with the CASF-2013 benchmark. Nat Protoc 13, 666–680 (2018). https://doi.org/10.1038/nprot.2017.114

Issue Date: April 2018
DOI: https://doi.org/10.1038/nprot.2017.114
