Efficient and accurate large library ligand docking with KarmaDock

Zhang, Xujun; Zhang, Odin; Shen, Chao; Qu, Wanglin; Chen, Shicheng; Cao, Hanqun; Kang, Yu; Wang, Zhe; Wang, Ercheng; Zhang, Jintu; Deng, Yafeng; Liu, Furui; Wang, Tianyue; Du, Hongyan; Wang, Langcheng; Pan, Peichen; Chen, Guangyong; Hsieh, Chang-Yu; Hou, Tingjun

doi:10.1038/s43588-023-00511-5

Article
Published: 21 September 2023

Nature Computational Science volume 3, pages 789–804 (2023)

Abstract

Ligand docking is one of the core technologies in structure-based virtual screening for drug discovery. However, conventional docking tools and existing deep learning tools may suffer from limited performance in terms of speed, pose quality and binding affinity accuracy. Here we propose KarmaDock, a deep learning approach for ligand docking that integrates the functions of docking acceleration, binding pose generation and correction, and binding strength estimation. The three-stage model consists of the following components: (1) encoders for the protein and ligand to learn the representations of intramolecular interactions; (2) E(n) equivariant graph neural networks with self-attention to update the ligand pose based on both protein–ligand and intramolecular interactions, followed by post-processing to ensure chemically plausible structures; (3) a mixture density network for scoring the binding strength. KarmaDock was validated on four benchmark datasets and tested in a real-world virtual screening project that successfully identified experiment-validated active inhibitors of leukocyte tyrosine kinase (LTK).
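The pose-update stage in component (2) builds on E(n) equivariant graph neural networks (Satorras et al., cited in the references). As a minimal sketch of that idea, and not KarmaDock's actual implementation, the NumPy code below applies one equivariant coordinate update to a set of atoms. The function names `egnn_coordinate_update` and `toy_weight` are hypothetical, and `toy_weight` is an assumed stand-in for the learned edge network; the self-attention, protein–ligand edges and post-processing described above are omitted.

```python
import numpy as np


def egnn_coordinate_update(x, h, weight_fn):
    """Apply one E(n)-equivariant coordinate update to all atoms.

    x: (n, 3) array of atom coordinates, with n >= 2.
    h: (n, d) array of rotation-invariant node features.
    weight_fn: hypothetical stand-in for a learned edge network; it maps
        (h_i, h_j, squared distance) to a scalar weight.
    """
    n = x.shape[0]
    x_new = x.copy()
    for i in range(n):
        shift = np.zeros(3)
        for j in range(n):
            if i == j:
                continue
            diff = x[i] - x[j]
            d2 = float(diff @ diff)  # invariant under rotation and translation
            shift += diff * weight_fn(h[i], h[j], d2)
        # Average over neighbours so the step size does not grow with n.
        x_new[i] = x[i] + shift / (n - 1)
    return x_new


def toy_weight(h_i, h_j, d2):
    # Assumed toy function: depends only on invariant quantities.
    return float(np.tanh(h_i @ h_j) / (1.0 + d2))
```

Because each displacement is a weighted sum of difference vectors, and the weights depend only on invariant inputs, rotating or translating the input coordinates rotates or translates the output in the same way. That property is what makes this kind of update suitable for pose prediction.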

**Fig. 2: The accuracy and speed of KarmaDock.**

**Fig. 3: The impacts of heavy atoms and rotatable bond numbers on docking speed and accuracy.**

**Fig. 4: Impact of post-processing on the rationality of the binding poses.**

**Fig. 6: VS with experimental validation targeting LTK.**

Data availability

The raw datasets²⁶⁻²⁹ are available at http://pdbbind.org.cn/index.php, https://github.com/devalab/Apobind and http://www.pharmchem.uni-tuebingen.de/dekois/data/DEKOIS2.0_library/DEKOIS2.0_library.rar. The prepared datasets⁴²⁻⁴⁴ are available at https://zenodo.org/record/7788083, https://zenodo.org/record/8211452 and https://zenodo.org/record/8131256. PDB IDs 1S38, 1SQA, 4JXS, 1PS3, 3DXG, 3D4Z, 4CLI, 4JSZ and 4CTB are available in the Protein Data Bank (https://www.rcsb.org/)³⁷. Source data are available with this paper.

Code availability

The source code is available at Zenodo (https://zenodo.org/record/8211513)⁴⁵ and GitHub (https://github.com/schrojunzhang/KarmaDock).

References

1. Shen, C. et al. From machine learning to deep learning: advances in scoring functions for protein-ligand docking. WIREs Comput. Mol. Sci. 10, e1429 (2020).
2. Morris, G. M. et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785–2791 (2009).
3. Eberhardt, J., Santos-Martins, D., Tillack, A. F. & Forli, S. AutoDock Vina 1.2.0: new docking methods, expanded force field and Python bindings. J. Chem. Inf. Model. 61, 3891–3898 (2021).
4. Zhao, H. & Caflisch, A. Discovery of ZAP70 inhibitors by high-throughput docking into a conformation of its kinase domain generated by molecular dynamics. Bioorg. Med. Chem. Lett. 23, 5721–5726 (2013).
5. Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).
6. Jones, G., Willett, P., Glen, R. C., Leach, A. R. & Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 267, 727–748 (1997).
7. Irwin, J. J. et al. ZINC20—a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).
8. Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.-K. Fast, accurate and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).
9. Santos-Martins, D. et al. Accelerating AutoDock4 with GPUs and gradient-based local search. J. Chem. Theory Comput. 17, 1060–1073 (2021).
10. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
11. Zhang, H. et al. SDEGen: learning to evolve molecular conformations from thermodynamic noise for conformation generation. Chem. Sci. 14, 1557–1568 (2023).
12. Jing, B., Corso, G., Chang, J., Barzilay, R. & Jaakkola, T. Torsional diffusion for molecular conformer generation. in Advances in Neural Information Processing Systems (eds. Koyejo, S. et al.) Vol. 35, 24240–24253 (Curran Associates, Inc., 2022).
13. Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. in International Conference on Learning Representations (2022).
14. Zhang, Y., Cai, H., Shi, C. & Tang, J. E3Bind: an end-to-end equivariant network for protein-ligand docking. in International Conference on Learning Representations (2023).
15. Stärk, H., Ganea, O., Pattanaik, L., Barzilay, D. R. & Jaakkola, T. EquiBind: geometric deep learning for drug binding structure prediction. in Proceedings of the 39th International Conference on Machine Learning 20503–20521 (PMLR, 2022).
16. Qiao, Z., Nie, W., Vahdat, A., Miller III, T. F. & Anandkumar, A. State-specific protein-ligand complex structure prediction with a multi-scale deep generative model. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.15171 (2023).
17. Lu, W. et al. TANKBind: trigonometry-aware neural networKs for drug-protein binding structure prediction. in Advances in Neural Information Processing Systems Vol. 35, 7236–7249 (2022).
18. Junfeng, Z., Kelei, H., Tiejun, D. & Wu, J. Accurate protein-ligand complex structure prediction using geometric deep learning. Res. Square https://doi.org/10.21203/rs.3.rs-1454132/v1 (2022).
19. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. S. DiffDock: diffusion steps, twists, and turns for molecular docking. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.01776 (2023).
20. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. in Proceedings of the 38th International Conference on Machine Learning (eds. Meila, M. & Zhang, T.) Vol. 139, 9323–9332 (PMLR, 2021).
21. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. in Advances in Neural Information Processing Systems (eds Larochelle, H. et al.) Vol. 33, 6840–6851 (Curran Associates, 2020).
22. Hu, X. et al. Discovery of novel non-steroidal selective glucocorticoid receptor modulators by structure- and IGN-based virtual screening, structural optimization and biological evaluation. Eur. J. Med. Chem. 237, 114382 (2022).
23. Hu, X. et al. Discovery of novel GR ligands toward druggable GR antagonist conformations identified by MD simulations and Markov state model analysis. Adv. Sci. 9, 2102435 (2022).
24. Shen, C. et al. Boosting protein-ligand binding pose prediction and virtual screening based on residue-atom distance likelihood potential and graph transformer. J. Med. Chem. 65, 10691–10706 (2022).
25. Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. in International Conference on Learning Representations (2021).
26. Liu, Z. et al. Forging the basis for developing protein-ligand interaction scoring functions. Acc. Chem. Res. 50, 302–309 (2017).
27. Aggarwal, R., Gupta, A. & Priyakumar, U. D. APObind: a dataset of ligand unbound protein conformations for machine learning applications in de novo drug design. Preprint at arXiv https://doi.org/10.48550/arXiv.2108.09926 (2021).
28. Su, M. et al. Comparative assessment of scoring functions: the CASF-2016 update. J. Chem. Inf. Model. 59, 895–913 (2019).
29. Bauer, M. R., Ibrahim, T. M., Vogel, S. M. & Boeckler, F. M. Evaluation and optimization of virtual screening workflows with DEKOIS 2.0—a public library of challenging docking benchmark sets. J. Chem. Inf. Model. 53, 1447–1462 (2013).
30. Friesner, R. A. et al. Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J. Med. Chem. 49, 6177–6196 (2006).
31. Wang, Z. et al. Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power. Phys. Chem. Chem. Phys. 18, 12964–12975 (2016).
32. Jain, A. N. Surflex-Dock 2.1: robust performance from ligand energetic modeling, ring flexibility and knowledge-based search. J. Comput. Aided Mol. Des. 21, 281–306 (2007).
33. Madhavi Sastry, G., Adzhigirey, M., Day, T., Annabhimoju, R. & Sherman, W. Protein and ligand preparation: parameters, protocols and influence on virtual screening enrichments. J. Comput. Aided Mol. Des. 27, 221–234 (2013).
34. Truchon, J.-F. & Bayly, C. I. Evaluating virtual screening methods: good and bad metrics for the ‘early recognition’ problem. J. Chem. Inf. Model. 47, 488–508 (2007).
35. Izumi, H. et al. The CLIP1-LTK fusion is an oncogenic driver in non-small-cell lung cancer. Nature 600, 319–323 (2021).
36. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46, W296–W303 (2018).
37. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
38. Kaminski, G. A., Friesner, R. A., Tirado-Rives, J. & Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 105, 6474–6487 (2001).
39. Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput. 7, 525–537 (2011).
40. Shelley, J. C. et al. Epik: a software program for pK(a) prediction and protonation state generation for drug-like molecules. J. Comput. Aided Mol. Des. 21, 681–691 (2007).
41. Wójcikowski, M., Zielenkiewicz, P. & Siedlecki, P. Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. J. Cheminform. 7, 26 (2015).
42. Zhang, X. J. APObind core set for KarmaDock (229 protein-ligand complexes) Zenodo https://doi.org/10.5281/zenodo.8211452 (2023).
43. Zhang, X. J. DEKOIS2.0 for KarmaDock Zenodo https://doi.org/10.5281/zenodo.8131256 (2023).
44. Zhang, X. J. KarmaDock_PDBBind2020_coreset (1.0) Zenodo https://doi.org/10.5281/zenodo.7788083 (2023).
45. Zhang, X. J. schrojunzhang/KarmaDock: v1.0.0 Zenodo https://doi.org/10.5281/zenodo.8211513 (2023).

Acknowledgements

This work was financially supported by the National Key Research and Development Program of China (2022YFF1203000), the National Natural Science Foundation of China (22220102001, 82204279 and 22007082), the Natural Science Foundation of Zhejiang Province (LD22H300001 and LQ21B030013) and Fundamental Research Funds for the Central Universities (226-2022-00220). We also thank L. Xu at Jiangsu University of Technology for preparing all the compounds used in this study based on the Glide module in Schrödinger software, which substantially contributed to our research.

Author information

These authors contributed equally: Xujun Zhang, Odin Zhang.

Authors and Affiliations

Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
Xujun Zhang, Odin Zhang, Chao Shen, Wanglin Qu, Shicheng Chen, Yu Kang, Zhe Wang, Jintu Zhang, Tianyue Wang, Hongyan Du, Peichen Pan, Chang-Yu Hsieh & Tingjun Hou
Department of Mathematics, Chinese University of Hong Kong, Hong Kong, China
Hanqun Cao
Zhejiang Lab, Hangzhou, Zhejiang, China
Ercheng Wang, Furui Liu & Guangyong Chen
Hangzhou Carbonsilicon AI Technology Co., Ltd, Hangzhou, Zhejiang, China
Yafeng Deng
Department of Pathology, New York University Medical Center, New York, NY, USA
Langcheng Wang

Contributions

X.Z., O.Z. and C.S. developed this method, analyzed the data and wrote the manuscript. W.Q. and S.C. bought the compounds and measured their IC₅₀ values. H.C., Y.K., Z.W., E.W., J.Z., Y.D., F.L., T.W., H.D. and L.W. evaluated and interpreted the results and wrote the manuscript. P.P., G.C., C.-Y.H. and T.H. conceived and supervised the project, interpreted the results and wrote the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Peichen Pan, Guangyong Chen, Chang-Yu Hsieh or Tingjun Hou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Matthew Holcomb and Shina Kamerlin for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary sections 1–5, Figs. 1–3 and Tables 1–7.

Reporting Summary

Supplementary Data 1

The docking power and screening power of various models on CASF-2016. Tool rank denotes the ranking of the tools based on their performance. Model type signifies the type of algorithm or methodology used in the tool (for example, DL for deep learning models, DK for traditional docking programs and HB for hybrid models). Success rate, a metric for evaluating the docking power of scoring functions (SFs), is the ratio of successfully docked complexes (r.m.s.d. ≤ 2 Å) to all tested complexes. EF 1% (enrichment factor), a metric for assessing the screening power of SFs, is the enrichment of active ligands among the top 1% of scored compounds.
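The two metrics named above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' evaluation code: the function names are hypothetical, the enrichment factor uses the common definition (active fraction in the top-scored subset divided by the active fraction in the whole library), and the code assumes at least one active compound and that higher scores mean stronger predicted binding unless stated otherwise.

```python
import numpy as np


def success_rate(rmsds, threshold=2.0):
    """Fraction of complexes whose top pose has r.m.s.d. <= threshold (in Å)."""
    rmsds = np.asarray(rmsds, dtype=float)
    return float(np.mean(rmsds <= threshold))


def enrichment_factor(scores, is_active, fraction=0.01, higher_is_better=True):
    """Enrichment of actives in the top `fraction` of a scored library.

    Assumes the library contains at least one active compound.
    """
    scores = np.asarray(scores, dtype=float)
    is_active = np.asarray(is_active, dtype=bool)
    n_total = scores.size
    # Size of the top-scored subset; always keep at least one compound.
    n_top = max(1, int(round(n_total * fraction)))
    # Sort so that the best-scored compounds come first.
    order = np.argsort(-scores if higher_is_better else scores, kind="stable")
    hits_top = is_active[order[:n_top]].sum()
    return float((hits_top / n_top) / (is_active.sum() / n_total))
```

For example, in a library of 200 compounds with 10 actives, the top 1% holds 2 compounds; if both are active, the enrichment factor is (2/2) / (10/200) = 20, which is also the maximum attainable value for that library.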

Supplementary Data 2

Interaction reproduction. Each row represents a specific interaction type for a given structure, the post-processing type applied to the data, the repeat number and the observed reproduction rate of the interaction.

Supplementary Data 3

Pose refinement results for structures from PDBBind core set. The data provides r.m.s.d. values indicating the accuracy of predicted poses generated by various methods and their respective post-processing treatments.
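As a brief illustration of the r.m.s.d. values reported here, the sketch below computes the heavy-atom r.m.s.d. between a predicted pose and a reference pose. It is a simplified, assumed formulation with a hypothetical function name: both arrays must list the same atoms in the same order, no superposition is applied (poses are compared in the receptor frame), and molecular symmetry is not corrected for.

```python
import numpy as np


def pose_rmsd(pred, ref):
    """Root-mean-square deviation between two (n_atoms, 3) coordinate arrays.

    Assumes identical atom ordering; no alignment or symmetry correction.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape:
        raise ValueError("pred and ref must have the same shape")
    # Squared displacement per atom, averaged over atoms, then square-rooted.
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))
```

A pose shifted rigidly by 1 Å along one axis gives an r.m.s.d. of exactly 1 Å under this definition, so it would count as successfully docked under the 2 Å criterion.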

Source data

Source Data Fig. 2

Statistical source data for Fig. 2 showing the success rate, docking speed, heavy atoms number, rotatable bonds number for PDBBind refined set.

Source Data Fig. 3

Statistical source data for Fig. 3 showing the success rate, docking speed, heavy atoms number, rotatable bonds number for PDBBind refined set.

Source Data Fig. 4

Statistical source data for Fig. 4 showing the conformation errors.

Source Data Fig. 5

Statistical source data for Fig. 5 showing the accuracy and screening power of KarmaDock on DEKOIS 2.0.

Source Data Fig. 6

Statistical source data for Fig. 6 showing the inhibition activity of compounds on BaF3/CLIP1-LTK cells.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, X., Zhang, O., Shen, C. et al. Efficient and accurate large library ligand docking with KarmaDock. Nat Comput Sci 3, 789–804 (2023). https://doi.org/10.1038/s43588-023-00511-5

Received: 20 March 2023
Accepted: 08 August 2023
Published: 21 September 2023
Issue Date: September 2023
DOI: https://doi.org/10.1038/s43588-023-00511-5