Abstract
Ligand docking is one of the core technologies in structure-based virtual screening for drug discovery. However, conventional docking tools and existing deep learning tools may suffer from limited performance in terms of speed, pose quality and binding affinity accuracy. Here we propose KarmaDock, a deep learning approach for ligand docking that integrates the functions of docking acceleration, binding pose generation and correction, and binding strength estimation. The three-stage model consists of the following components: (1) encoders for the protein and ligand to learn the representations of intramolecular interactions; (2) E(n) equivariant graph neural networks with self-attention to update the ligand pose based on both protein–ligand and intramolecular interactions, followed by post-processing to ensure chemically plausible structures; (3) a mixture density network for scoring the binding strength. KarmaDock was validated on four benchmark datasets and tested in a real-world virtual screening project that successfully identified experiment-validated active inhibitors of leukocyte tyrosine kinase (LTK).
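As a rough illustration of the third component, the sketch below scores a pose by summing the log-likelihood of protein–ligand pair distances under per-pair Gaussian mixtures, which is the general idea behind mixture density scoring. It is a minimal sketch under assumptions, not the KarmaDock implementation: the function names, the hand-written mixture parameters and the plain summation are all hypothetical, and in a real model the mixture parameters would be predicted by the network.

```python
import math


def mixture_log_likelihood(distance, weights, means, sigmas):
    """Log-density of one distance under a 1D Gaussian mixture.

    weights must be positive and sum to 1; sigmas must be positive.
    """
    terms = []
    for w, mu, s in zip(weights, means, sigmas):
        log_norm = -0.5 * math.log(2.0 * math.pi) - math.log(s)
        terms.append(math.log(w) + log_norm - 0.5 * ((distance - mu) / s) ** 2)
    top = max(terms)
    # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def pose_score(pair_distances, pair_mixtures):
    """Hypothetical pose score: sum of per-pair log-likelihoods.

    pair_mixtures holds one (weights, means, sigmas) tuple per distance.
    A higher score means the pose agrees better with the mixtures.
    """
    return sum(
        mixture_log_likelihood(d, *mix)
        for d, mix in zip(pair_distances, pair_mixtures)
    )


# Two illustrative pairs, each with a two-component mixture (made-up values).
mixtures = [
    ([0.7, 0.3], [3.5, 6.0], [0.5, 1.0]),
    ([0.5, 0.5], [4.0, 8.0], [0.6, 1.5]),
]
near_pose = pose_score([3.6, 4.1], mixtures)
far_pose = pose_score([12.0, 15.0], mixtures)
```

With these made-up parameters, a pose whose pair distances sit near the mixture means receives a higher score than one whose distances lie far from them.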
Data availability
The raw datasets (refs. 26–29) are available at http://pdbbind.org.cn/index.php, https://github.com/devalab/Apobind and http://www.pharmchem.uni-tuebingen.de/dekois/data/DEKOIS2.0_library/DEKOIS2.0_library.rar. The prepared datasets (refs. 42–44) are available at https://zenodo.org/record/7788083, https://zenodo.org/record/8211452 and https://zenodo.org/record/8131256. PDB IDs 1S38, 1SQA, 4JXS, 1PS3, 3DXG, 3D4Z, 4CLI, 4JSZ and 4CTB are available in the Protein Data Bank (https://www.rcsb.org/; ref. 37). Source data are available with this paper.
Code availability
The source code is available at Zenodo (https://zenodo.org/record/8211513; ref. 45) and GitHub (https://github.com/schrojunzhang/KarmaDock).
References
Shen, C. et al. From machine learning to deep learning: advances in scoring functions for protein-ligand docking. WIREs Comput. Mol. Sci. 10, e1429 (2020).
Morris, G. M. et al. AutoDock4 and AutoDockTools4: automated docking with selective receptor flexibility. J. Comput. Chem. 30, 2785–2791 (2009).
Eberhardt, J., Santos-Martins, D., Tillack, A. F. & Forli, S. AutoDock Vina 1.2.0: new docking methods, expanded force field and Python bindings. J. Chem. Inf. Model. 61, 3891–3898 (2021).
Zhao, H. & Caflisch, A. Discovery of ZAP70 inhibitors by high-throughput docking into a conformation of its kinase domain generated by molecular dynamics. Bioorg. Med. Chem. Lett. 23, 5721–5726 (2013).
Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).
Jones, G., Willett, P., Glen, R. C., Leach, A. R. & Taylor, R. Development and validation of a genetic algorithm for flexible docking. J. Mol. Biol. 267, 727–748 (1997).
Irwin, J. J. et al. ZINC20—a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).
Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C.-K. Fast, accurate and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).
Santos-Martins, D. et al. Accelerating AutoDock4 with GPUs and gradient-based local search. J. Chem. Theory Comput. 17, 1060–1073 (2021).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Zhang, H. et al. SDEGen: learning to evolve molecular conformations from thermodynamic noise for conformation generation. Chem. Sci. 14, 1557–1568 (2023).
Jing, B., Corso, G., Chang, J., Barzilay, R. & Jaakkola, T. Torsional diffusion for molecular conformer generation. in Advances in Neural Information Processing Systems (eds. Koyejo, S. et al.) Vol. 35, 24240–24253 (Curran Associates, Inc., 2022).
Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. in International Conference on Learning Representations (2022).
Zhang, Y., Cai, H., Shi, C. & Tang, J. E3Bind: an end-to-end equivariant network for protein-ligand docking. in International Conference on Learning Representations (2023).
Stärk, H., Ganea, O., Pattanaik, L., Barzilay, D. R. & Jaakkola, T. EquiBind: geometric deep learning for drug binding structure prediction. in Proceedings of the 39th International Conference on Machine Learning 20503–20521 (PMLR, 2022).
Qiao, Z., Nie, W., Vahdat, A., Miller III, T. F. & Anandkumar, A. State-specific protein-ligand complex structure prediction with a multi-scale deep generative model. Preprint at arXiv https://doi.org/10.48550/arXiv.2209.15171 (2023).
Lu, W. et al. TANKBind: trigonometry-aware neural networKs for drug-protein binding structure prediction. in Advances in Neural Information Processing Systems Vol. 35, 7236–7249 (2022).
Junfeng, Z., Kelei, H., Tiejun, D. & Wu, J. Accurate protein-ligand complex structure prediction using geometric deep learning. Res. Square https://doi.org/10.21203/rs.3.rs-1454132/v1 (2022).
Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. S. DiffDock: diffusion steps, twists, and turns for molecular docking. Preprint at arXiv https://doi.org/10.48550/arXiv.2210.01776 (2023).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. in Proceedings of the 38th International Conference on Machine Learning (eds. Meila, M. & Zhang, T.) Vol. 139, 9323–9332 (PMLR, 2021).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. in Advances in Neural Information Processing Systems (eds. Larochelle, H. et al.) Vol. 33, 6840–6851 (Curran Associates, Inc., 2020).
Hu, X. et al. Discovery of novel non-steroidal selective glucocorticoid receptor modulators by structure- and IGN-based virtual screening, structural optimization and biological evaluation. Eur. J. Med. Chem. 237, 114382 (2022).
Hu, X. et al. Discovery of novel GR ligands toward druggable GR antagonist conformations identified by MD simulations and Markov state model analysis. Adv. Sci. 9, 2102435 (2022).
Shen, C. et al. Boosting protein-ligand binding pose prediction and virtual screening based on residue-atom distance likelihood potential and graph transformer. J. Med. Chem. 65, 10691–10706 (2022).
Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. in International Conference on Learning Representations (2021).
Liu, Z. et al. Forging the basis for developing protein-ligand interaction scoring functions. Acc. Chem. Res. 50, 302–309 (2017).
Aggarwal, R., Gupta, A. & Priyakumar, U. D. APObind: a dataset of ligand unbound protein conformations for machine learning applications in de novo drug design. Preprint at arXiv https://doi.org/10.48550/arXiv.2108.09926 (2021).
Su, M. et al. Comparative assessment of scoring functions: the CASF-2016 update. J. Chem. Inf. Model. 59, 895–913 (2019).
Bauer, M. R., Ibrahim, T. M., Vogel, S. M. & Boeckler, F. M. Evaluation and optimization of virtual screening workflows with DEKOIS 2.0—a public library of challenging docking benchmark sets. J. Chem. Inf. Model. 53, 1447–1462 (2013).
Friesner, R. A. et al. Extra precision glide: docking and scoring incorporating a model of hydrophobic enclosure for protein-ligand complexes. J. Med. Chem. 49, 6177–6196 (2006).
Wang, Z. et al. Comprehensive evaluation of ten docking programs on a diverse set of protein-ligand complexes: the prediction accuracy of sampling power and scoring power. Phys. Chem. Chem. Phys. 18, 12964–12975 (2016).
Jain, A. N. Surflex-Dock 2.1: robust performance from ligand energetic modeling, ring flexibility and knowledge-based search. J. Comput. Aided Mol. Des. 21, 281–306 (2007).
Madhavi Sastry, G., Adzhigirey, M., Day, T., Annabhimoju, R. & Sherman, W. Protein and ligand preparation: parameters, protocols and influence on virtual screening enrichments. J. Comput. Aided Mol. Des. 27, 221–234 (2013).
Truchon, J.-F. & Bayly, C. I. Evaluating virtual screening methods: good and bad metrics for the ‘early recognition’ problem. J. Chem. Inf. Model. 47, 488–508 (2007).
Izumi, H. et al. The CLIP1-LTK fusion is an oncogenic driver in non-small-cell lung cancer. Nature 600, 319–323 (2021).
Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Res. 46, W296–W303 (2018).
Berman, H. M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
Kaminski, G. A., Friesner, R. A., Tirado-Rives, J. & Jorgensen, W. L. Evaluation and reparametrization of the OPLS-AA force field for proteins via comparison with accurate quantum chemical calculations on peptides. J. Phys. Chem. B 105, 6474–6487 (2001).
Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: consistent treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Comput. 7, 525–537 (2011).
Shelley, J. C. et al. Epik: a software program for pK(a) prediction and protonation state generation for drug-like molecules. J. Comput. Aided Mol. Des. 21, 681–691 (2007).
Wójcikowski, M., Zielenkiewicz, P. & Siedlecki, P. Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. J. Cheminform. 7, 26 (2015).
Zhang, X. J. APObind core set for KarmaDock (229 protein-ligand complexes) Zenodo https://doi.org/10.5281/zenodo.8211452 (2023).
Zhang, X. J. DEKOIS2.0 for KarmaDock Zenodo https://doi.org/10.5281/zenodo.8131256 (2023).
Zhang, X. J. KarmaDock_PDBBind2020_coreset (1.0) Zenodo https://doi.org/10.5281/zenodo.7788083 (2023).
Zhang, X. J. schrojunzhang/KarmaDock: v1.0.0 Zenodo https://doi.org/10.5281/zenodo.8211513 (2023).
Acknowledgements
This work was financially supported by the National Key Research and Development Program of China (2022YFF1203000), the National Natural Science Foundation of China (22220102001, 82204279 and 22007082), the Natural Science Foundation of Zhejiang Province (LD22H300001 and LQ21B030013) and Fundamental Research Funds for the Central Universities (226-2022-00220). We also thank L. Xu at Jiangsu University of Technology for preparing all the compounds used in this study based on the Glide module in Schrödinger software, which substantially contributed to our research.
Author information
Contributions
X.Z., O.Z. and C.S. developed this method, analyzed the data and wrote the manuscript. W.Q. and S.C. bought the compounds and measured their IC50 values. H.C., Y.K., Z.W., E.W., J.Z., Y.D., F.L., T.W., H.D. and L.W. evaluated and interpreted the results and wrote the manuscript. P.P., G.C., C.-Y.H. and T.H. conceived and supervised the project, interpreted the results and wrote the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Computational Science thanks Matthew Holcomb and Shina Kamerlin for their contribution to the peer review of this work. Primary Handling Editor: Kaitlin McCardle, in collaboration with the Nature Computational Science team.
Supplementary information
Supplementary Information
Supplementary sections 1–5, Figs. 1–3 and Tables 1–7.
Supplementary Data 1
The docking power and screening power of various models on CASF-2016. Tool rank: the ranking of the tools by performance. Success rate: a metric of the docking power of scoring functions (SFs), defined as the ratio of successfully docked complexes (r.m.s.d. ≤ 2 Å) to all tested complexes. Model type: the type of algorithm or methodology used in the tool (DL for deep learning models, DK for traditional docking programs and HB for hybrid models). EF 1% (enrichment factor at 1%): a metric of the screening power of SFs, defined as the enrichment of active ligands among the top 1% of scored compounds.
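The two metrics defined above can be computed along the following lines. This is a minimal sketch, not the evaluation code used in the paper: the function names are hypothetical, higher scores are assumed to mean stronger predicted binding, and the size of the top 1% subset is rounded up, which is only one of several conventions in use.

```python
import math


def success_rate(rmsds, threshold=2.0):
    """Fraction of complexes whose pose r.m.s.d. is <= threshold (in Å)."""
    return sum(r <= threshold for r in rmsds) / len(rmsds)


def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor at `fraction` of the ranked library.

    Ratio of the active rate in the top-scored subset to the active
    rate in the whole library. Assumes higher score = better and that
    the library contains at least one active.
    """
    n = len(scores)
    n_top = max(1, math.ceil(n * fraction))  # assumption: round up
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    top_hits = sum(active for _, active in ranked[:n_top])
    total_hits = sum(is_active)
    return (top_hits / n_top) / (total_hits / n)


# Toy library: 200 compounds, 10 actives, 2 of them ranked first and second.
toy_scores = list(range(200, 0, -1))
toy_labels = [1 if i in (0, 1) or 100 <= i < 108 else 0 for i in range(200)]
toy_ef = enrichment_factor(toy_scores, toy_labels)       # top 1% = 2 compounds
toy_sr = success_rate([0.5, 1.9, 2.0, 3.1])               # 3 of 4 within 2 Å
```

In the toy library both compounds in the top 1% are active while the overall active rate is 5%, so the enrichment is 20-fold; random ranking would give an enrichment factor of about 1.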
Supplementary Data 2
Interaction reproduction. Each row represents a specific interaction type for a given structure, the post-processing type applied to the data, the repeat number and the observed reproduction rate of the interaction.
Supplementary Data 3
Pose refinement results for structures from the PDBBind core set. The data provide r.m.s.d. values indicating the accuracy of the poses predicted by various methods and their respective post-processing treatments.
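For readers unfamiliar with the metric, the snippet below shows how a pose r.m.s.d. of this kind is typically computed. It is a minimal sketch under assumptions: the function name is hypothetical, the two poses are taken to list the same atoms in the same order in the receptor frame, no superposition is performed, and molecular symmetry is ignored (dedicated tools correct for symmetry-equivalent atoms).

```python
import math


def pose_rmsd(predicted, reference):
    """r.m.s.d. between two poses given as equal-length lists of (x, y, z).

    Atoms must be in matching order; no alignment or symmetry
    correction is applied. The result has the unit of the input (e.g. Å).
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))


# Toy example: every atom is displaced by 2 Å along z.
example_rmsd = pose_rmsd(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
)
```

A rigid shift of 2 Å applied to every atom gives an r.m.s.d. of exactly 2 Å, the threshold commonly used to call a docked pose successful.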
Source data
Source Data Fig. 2
Statistical source data for Fig. 2 showing the success rate, docking speed, number of heavy atoms and number of rotatable bonds for the PDBBind refined set.
Source Data Fig. 3
Statistical source data for Fig. 3 showing the success rate, docking speed, number of heavy atoms and number of rotatable bonds for the PDBBind refined set.
Source Data Fig. 4
Statistical source data for Fig. 4 showing the conformation errors.
Source Data Fig. 5
Statistical source data for Fig. 5 showing the accuracy and screening power of KarmaDock on DEKOIS 2.0.
Source Data Fig. 6
Statistical source data for Fig. 6 showing the inhibitory activity of compounds on BaF3/CLIP1-LTK cells.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, X., Zhang, O., Shen, C. et al. Efficient and accurate large library ligand docking with KarmaDock. Nat Comput Sci 3, 789–804 (2023). https://doi.org/10.1038/s43588-023-00511-5
DOI: https://doi.org/10.1038/s43588-023-00511-5