Efficient and accurate large library ligand docking with KarmaDock

Abstract

Ligand docking is one of the core technologies in structure-based virtual screening for drug discovery. However, conventional docking tools and existing deep learning tools may suffer from limited performance in terms of speed, pose quality and binding affinity accuracy. Here we propose KarmaDock, a deep learning approach for ligand docking that integrates the functions of docking acceleration, binding pose generation and correction, and binding strength estimation. The three-stage model consists of the following components: (1) encoders for the protein and ligand to learn the representations of intramolecular interactions; (2) E(n) equivariant graph neural networks with self-attention to update the ligand pose based on both protein–ligand and intramolecular interactions, followed by post-processing to ensure chemically plausible structures; (3) a mixture density network for scoring the binding strength. KarmaDock was validated on four benchmark datasets and tested in a real-world virtual screening project that successfully identified experiment-validated active inhibitors of leukocyte tyrosine kinase (LTK).
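Component (2) of the abstract relies on E(n) equivariant graph neural networks (ref. 20). The block below is a minimal sketch of the coordinate-update rule from that family of models, not the authors' implementation: it omits node features, self-attention and protein–ligand edges, and `phi_x` is a hypothetical fixed function standing in for a learned network. It only illustrates the equivariance property: updating a pose and then moving it rigidly gives the same result as moving it first and then updating.

```python
import math


def coordinate_update(coords, edge_weight):
    """EGNN-style update: x_i' = x_i + 1/(n-1) * sum_{j != i} (x_i - x_j) * w(d_ij^2)."""
    n = len(coords)
    updated = []
    for i, xi in enumerate(coords):
        shift = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            d2 = sum(c * c for c in diff)      # squared distance is E(n)-invariant
            w = edge_weight(d2)                # scalar weight per atom pair
            shift = [s + w * c for s, c in zip(shift, diff)]
        updated.append([a + s / (n - 1) for a, s in zip(xi, shift)])
    return updated


def phi_x(d2):
    """Hypothetical stand-in for the learned edge network (assumption)."""
    return math.tanh(1.0 - d2)


def rigid_move(p):
    """Arbitrary rigid motion: 90-degree rotation about z, then a translation."""
    x, y, z = p
    return [-y + 3.0, x - 1.0, z + 0.5]


# Made-up four-atom "ligand" coordinates in Å, for illustration only.
ligand = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.4, 0.3], [0.2, 2.0, -0.5]]

updated_then_moved = [rigid_move(p) for p in coordinate_update(ligand, phi_x)]
moved_then_updated = coordinate_update([rigid_move(p) for p in ligand], phi_x)
```

Because the weight depends only on interatomic distances, the two results agree up to floating-point error, which is why such a layer can refine a pose without depending on the frame in which the complex is expressed.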

Fig. 1: Workflow of KarmaDock.
Fig. 2: The accuracy and speed of KarmaDock.
Fig. 3: The impacts of heavy atoms and rotatable bond numbers on docking speed and accuracy.
Fig. 4: Impact of post-processing on the rationality of the binding poses.
Fig. 5: Screening power on DEKOIS.
Fig. 6: VS with experimental validation targeting LTK.

Data availability

The raw datasets (refs. 26–29) are available at http://pdbbind.org.cn/index.php, https://github.com/devalab/Apobind and http://www.pharmchem.uni-tuebingen.de/dekois/data/DEKOIS2.0_library/DEKOIS2.0_library.rar. The prepared datasets (refs. 42–44) are available at https://zenodo.org/record/7788083, https://zenodo.org/record/8211452 and https://zenodo.org/record/8131256. PDB IDs 1S38, 1SQA, 4JXS, 1PS3, 3DXG, 3D4Z, 4CLI, 4JSZ and 4CTB are available in the Protein Data Bank (https://www.rcsb.org/; ref. 37). Source data are available with this paper.

Code availability

The source code is available at Zenodo (https://zenodo.org/record/8211513; ref. 45) and GitHub (https://github.com/schrojunzhang/KarmaDock).

References

  1. Shen, C. et al. From machine learning to deep learning: advances in scoring functions for protein-ligand docking. WIREs Comput. Mol. Sci. 10, e1429 (2020).

  2. Morris, G. M. et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785–2791 (2009).

  3. Eberhardt, J., Santos-Martins, D., Tillack, A. F. & Forli, S. AutoDock Vina 1.2.0: new docking methods, expanded force field and Python bindings. J. Chem. Inf. Model. 61, 3891–3898 (2021).

  4. Zhao, H. & Caflisch, A. Discovery of ZAP70 inhibitors by high-throughput docking into a conformation of its kinase domain generated by molecular dynamics. Bioorg. Med. Chem. Lett. 23, 5721–5726 (2013).

  5. Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).

  6. Jones, G., Willett, P., Glen, R. C., Leach, A. R. & Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 267, 727–748 (1997).

  7. Irwin, J. J. et al. ZINC20—a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).

  8. Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.-K. Fast, accurate and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).

  9. Santos-Martins, D. et al. Accelerating AutoDock4 with GPUs and gradient-based local search. J. Chem. Theory Comput. 17, 1060–1073 (2021).

  10. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  11. Zhang, H. et al. SDEGen: learning to evolve molecular conformations from thermodynamic noise for conformation generation. Chem. Sci. 14, 1557–1568 (2023).

  12. Jing, B., Corso, G., Chang, J., Barzilay, R. & Jaakkola, T. Torsional diffusion for molecular conformer generation. in Advances in Neural Information Processing Systems (eds. Koyejo, S. et al.) Vol. 35, 24240–24253 (Curran Associates, Inc., 2022).

  13. Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. in International Conference on Learning Representations (2022).

  14. Zhang, Y., Cai, H., Shi, C. & Tang, J. E3Bind: an end-to-end equivariant network for protein-ligand docking. in International Conference on Learning Representations (2023).

  15. Stärk, H., Ganea, O., Pattanaik, L., Barzilay, D. R. & Jaakkola, T. EquiBind: geometric deep learning for drug binding structure prediction. in Proceedings of the 39th International Conference on Machine Learning 20503–20521 (PMLR, 2022).

  16. Qiao, Z., Nie, W., Vahdat, A., Miller III, T. F. & Anandkumar, A. State-specific protein-ligand complex structure prediction with a multi-scale deep generative model. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.15171 (2023).

  17. Lu, W. et al. TANKBind: trigonometry-aware neural networKs for drug-protein binding structure prediction. in Advances in Neural Information Processing Systems Vol. 35, 7236–7249 (2022).

  18. Junfeng, Z., Kelei, H., Tiejun, D. & Wu, J. Accurate protein-ligand complex structure prediction using geometric deep learning. Res. Square https://doi.org/10.21203/rs.3.rs-1454132/v1 (2022).

  19. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. S. DiffDock: diffusion steps, twists, and turns for molecular docking. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.01776 (2023).

  20. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. in Proceedings of the 38th International Conference on Machine Learning (eds. Meila, M. & Zhang, T.) Vol. 139, 9323–9332 (PMLR, 2021).

  21. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. in Advances in Neural Information Processing Systems (eds. Larochelle, H. et al.) Vol. 33, 6840–6851 (Curran Associates, 2020).

  22. Hu, X. et al. Discovery of novel non-steroidal selective glucocorticoid receptor modulators by structure- and IGN-based virtual screening, structural optimization and biological evaluation. Eur. J. Med. Chem. 237, 114382 (2022).

  23. Hu, X. et al. Discovery of novel GR ligands toward druggable GR antagonist conformations identified by MD simulations and Markov state model analysis. Adv. Sci. 9, 2102435 (2022).

  24. Shen, C. et al. Boosting protein-ligand binding pose prediction and virtual screening based on residue-atom distance likelihood potential and graph transformer. J. Med. Chem. 65, 10691–10706 (2022).

  25. Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. in International Conference on Learning Representations (2021).

  26. Liu, Z. et al. Forging the basis for developing protein-ligand interaction scoring functions. Acc. Chem. Res. 50, 302–309 (2017).

  27. Aggarwal, R., Gupta, A. & Priyakumar, U. D. APObind: a dataset of ligand unbound protein conformations for machine learning applications in de novo drug design. Preprint at arXiv https://doi.org/10.48550/arXiv.2108.09926 (2021).

  28. Su, M. et al. Comparative assessment of scoring functions: the CASF-2016 update. J. Chem. Inf. Model. 59, 895–913 (2019).

  29. Bauer, M. R., Ibrahim, T. M., Vogel, S. M. & Boeckler, F. M. Evaluation and optimization of virtual screening workflows with DEKOIS 2.0—a public library of challenging docking benchmark sets. J. Chem. Inf. Model. 53, 1447–1462 (2013).

  30. Friesner, R. A. et al. Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J. Med. Chem. 49, 6177–6196 (2006).

  31. Wang, Z. et al. Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power. Phys. Chem. Chem. Phys. 18, 12964–12975 (2016).

  32. Jain, A. N. Surflex-Dock 2.1: robust performance from ligand energetic modeling, ring flexibility and knowledge-based search. J. Comput. Aided Mol. Des. 21, 281–306 (2007).

  33. Madhavi Sastry, G., Adzhigirey, M., Day, T., Annabhimoju, R. & Sherman, W. Protein and ligand preparation: parameters, protocols and influence on virtual screening enrichments. J. Comput. Aided Mol. Des. 27, 221–234 (2013).

  34. Truchon, J.-F. & Bayly, C. I. Evaluating virtual screening methods: good and bad metrics for the ‘early recognition’ problem. J. Chem. Inf. Model. 47, 488–508 (2007).

  35. Izumi, H. et al. The CLIP1-LTK fusion is an oncogenic driver in non-small-cell lung cancer. Nature 600, 319–323 (2021).

  36. Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46, W296–W303 (2018).

  37. Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).

  38. Kaminski, G. A., Friesner, R. A., Tirado-Rives, J. & Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 105, 6474–6487 (2001).

  39. Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput. 7, 525–537 (2011).

  40. Shelley, J. C. et al. Epik: a software program for pK(a) prediction and protonation state generation for drug-like molecules. J. Comput. Aided Mol. Des. 21, 681–691 (2007).

  41. Wójcikowski, M., Zielenkiewicz, P. & Siedlecki, P. Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. J. Cheminform. 7, 26 (2015).

  42. Zhang, X. J. APObind core set for KarmaDock (229 protein-ligand complexes) Zenodo https://doi.org/10.5281/zenodo.8211452 (2023).

  43. Zhang, X. J. DEKOIS2.0 for KarmaDock Zenodo https://doi.org/10.5281/zenodo.8131256 (2023).

  44. Zhang, X. J. KarmaDock_PDBBind2020_coreset (1.0) Zenodo https://doi.org/10.5281/zenodo.7788083 (2023).

  45. Zhang, X. J. schrojunzhang/KarmaDock: v1.0.0 Zenodo https://doi.org/10.5281/zenodo.8211513 (2023).

Acknowledgements

This work was financially supported by the National Key Research and Development Program of China (2022YFF1203000), the National Natural Science Foundation of China (22220102001, 82204279 and 22007082), the Natural Science Foundation of Zhejiang Province (LD22H300001 and LQ21B030013) and Fundamental Research Funds for the Central Universities (226-2022-00220). We also thank L. Xu at Jiangsu University of Technology for preparing all the compounds used in this study based on the Glide module in Schrödinger software, which substantially contributed to our research.

Author information

Contributions

X.Z., O.Z. and C.S. developed this method, analyzed the data and wrote the manuscript. W.Q. and S.C. bought the compounds and measured their IC50 values. H.C., Y.K., Z.W., E.W., J.Z., Y.D., F.L., T.W., H.D. and L.W. evaluated and interpreted the results and wrote the manuscript. P.P., G.C., C.-Y.H. and T.H. conceived and supervised the project, interpreted the results and wrote the manuscript. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Peichen Pan, Guangyong Chen, Chang-Yu Hsieh or Tingjun Hou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Computational Science thanks Matthew Holcomb and Shina Kamerlin for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary sections 1–5, Figs. 1–3 and Tables 1–7.

Reporting Summary

Supplementary Data 1

The docking power and screening power of various models on CASF 2016. Tool rank: the ranking of the tools by performance. Success rate: a metric of the docking power of scoring functions (SFs), defined as the ratio of successfully docked complexes (r.m.s.d. ≤ 2 Å) to all tested complexes. Model type: the type of algorithm or methodology used in the tool (DL for deep learning models, DK for traditional docking programs and HB for hybrid models). EF 1% (enrichment factor): a metric of the screening power of SFs, representing the enrichment of active ligands among the top 1% of scored compounds.
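The two metrics described above can be sketched as follows. This is an illustration under stated assumptions rather than the evaluation code used in the paper: the function names are hypothetical, a higher score is assumed to mean a stronger predicted binder, the size of the top 1% subset is rounded to the nearest integer, and the toy data are made up.

```python
def success_rate(rmsds, threshold=2.0):
    """Fraction of complexes whose predicted pose has r.m.s.d. <= threshold (Å)."""
    return sum(r <= threshold for r in rmsds) / len(rmsds)


def enrichment_factor(scores, is_active, fraction=0.01):
    """Active rate in the top-scored subset divided by the active rate overall."""
    n_total = len(scores)
    # Assumption: subset size is rounded to the nearest integer, minimum 1.
    n_top = max(1, round(n_total * fraction))
    # Assumption: a higher score means a stronger predicted binder.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    actives_in_top = sum(active for _, active in ranked[:n_top])
    return (actives_in_top / n_top) / (sum(is_active) / n_total)


# Toy data, made up for illustration (not taken from the paper).
toy_rmsds = [0.8, 1.9, 2.0, 3.5]
toy_scores = [200 - i for i in range(200)]   # compound 0 scores highest
toy_labels = [i < 10 for i in range(200)]    # the 10 best-scored are actives

sr = success_rate(toy_rmsds)                     # 3 of 4 poses within 2 Å
ef1 = enrichment_factor(toy_scores, toy_labels)  # top 2 of 200 are both active
```

In the toy library, 5% of compounds are active, so a top 1% subset made up entirely of actives corresponds to a 20-fold enrichment, the maximum attainable for that active ratio.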

Supplementary Data 2

Interaction reproduction. Each row represents a specific interaction type for a given structure, the post-processing type applied to the data, the repeat number and the observed reproduction rate of the interaction.

Supplementary Data 3

Pose refinement results for structures from the PDBBind core set. The data provide r.m.s.d. values indicating the accuracy of the poses predicted by various methods and their respective post-processing treatments.

Source data

Source Data Fig. 2

Statistical source data for Fig. 2 showing the success rate, docking speed, number of heavy atoms and number of rotatable bonds for the PDBBind refined set.

Source Data Fig. 3

Statistical source data for Fig. 3 showing the success rate, docking speed, number of heavy atoms and number of rotatable bonds for the PDBBind refined set.

Source Data Fig. 4

Statistical source data for Fig. 4 showing the conformation errors.

Source Data Fig. 5

Statistical source data for Fig. 5 showing the accuracy and screening power of KarmaDock on DEKOIS 2.0.

Source Data Fig. 6

Statistical source data for Fig. 6 showing the inhibitory activity of compounds on BaF3/CLIP1-LTK cells.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, X., Zhang, O., Shen, C. et al. Efficient and accurate large library ligand docking with KarmaDock. Nat Comput Sci 3, 789–804 (2023). https://doi.org/10.1038/s43588-023-00511-5

  • DOI: https://doi.org/10.1038/s43588-023-00511-5
