State-specific protein–ligand complex structure prediction with a multiscale deep generative model

A preprint version of the article is available at arXiv.

Abstract

The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life. Despite recent advancements in protein structure prediction, existing algorithms are so far unable to systematically predict the binding ligand structures along with their regulatory effects on protein folding. To address this discrepancy, we present NeuralPLexer, a computational approach that can directly predict protein–ligand complex structures solely using protein sequence and ligand molecular graph inputs. NeuralPLexer adopts a deep generative model to sample the three-dimensional structures of the binding complex and their conformational changes at an atomistic resolution. The model is based on a diffusion process that incorporates essential biophysical constraints and a multiscale geometric deep learning system to iteratively sample residue-level contact maps and all heavy-atom coordinates in a hierarchical manner. NeuralPLexer achieves state-of-the-art performance compared with all existing methods on benchmarks for both protein–ligand blind docking and flexible binding-site structure recovery. Moreover, owing to its specificity in sampling both ligand-free-state and ligand-bound-state ensembles, NeuralPLexer consistently outperforms AlphaFold2 in terms of global protein structure accuracy on both representative structure pairs with large conformational changes and recently determined ligand-binding proteins. NeuralPLexer predictions align with structure determination experiments for important targets in enzyme engineering and drug discovery, suggesting its potential for accelerating the design of functional proteins and small molecules at the proteome scale.
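
The abstract describes sampling structures with a diffusion process. As a rough illustration of that general idea only, the sketch below runs a reverse-time diffusion on a one-dimensional toy problem where the score is known in closed form. It is not NeuralPLexer's sampler: the Ornstein–Uhlenbeck forward process, the Gaussian "data" distribution, the Euler–Maruyama integrator and every name and default (`reverse_diffusion_samples`, `mu`, `sigma`, `t_max`) are assumptions chosen for the example. In a learned model the analytic score would be replaced by a neural network.

```python
import numpy as np


def reverse_diffusion_samples(mu=3.0, sigma=0.5, t_max=5.0,
                              n_steps=500, n_samples=20000, seed=0):
    """Sample a toy 1D Gaussian N(mu, sigma^2) by integrating a reverse-time SDE.

    Forward process (assumed for this toy): dx = -x dt + sqrt(2) dW,
    whose stationary distribution is N(0, 1).
    """
    rng = np.random.default_rng(seed)
    dt = t_max / n_steps
    # Start from the prior, which approximates the forward marginal at t_max.
    x = rng.standard_normal(n_samples)
    for k in range(n_steps):
        t = t_max - k * dt
        decay = np.exp(-t)
        # Closed-form forward marginal at time t: N(mean_t, var_t).
        mean_t = mu * decay
        var_t = sigma ** 2 * decay ** 2 + 1.0 - decay ** 2
        # Exact score of the marginal (a trained network would stand in here).
        score = -(x - mean_t) / var_t
        # Euler-Maruyama step of the reverse SDE, moving from t to t - dt.
        noise = rng.standard_normal(n_samples)
        x = x + (x + 2.0 * score) * dt + np.sqrt(2.0 * dt) * noise
    return x


if __name__ == "__main__":
    samples = reverse_diffusion_samples()
    print(f"sample mean: {samples.mean():.3f}, sample std: {samples.std():.3f}")
```

With the default settings the sample mean and standard deviation should land close to `mu` and `sigma`, up to discretization and Monte Carlo error.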

Fig. 1: NeuralPLexer enables accurate prediction of protein–ligand complex structure and conformational changes.
Fig. 2: Architecture details.
Fig. 3: Model performance on benchmarking problems.
Fig. 4: Model predictions for contrasting apo–holo pairs from the PocketMiner dataset.
Fig. 5: Model predictions for recently determined structures.

Data availability

All datasets and predictions used to generate the reported results are available on Code Ocean (ref. 86) and also on Zenodo at https://doi.org/10.5281/zenodo.10373581.

Code availability

The code, scripts and interactive data analysis notebooks are available on Code Ocean (ref. 86) and also on GitHub at https://github.com/zrqiao/NeuralPLexer.

References

  1. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  2. Ovchinnikov, S. et al. Protein structure determination using metagenome sequence data. Science 355, 294–298 (2017).

  3. Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).

  4. Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).

  5. Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).

  6. Chowdhury, R. et al. Single-sequence protein structure prediction using a language model and deep learning. Nat. Biotechnol. 40, 1617–1623 (2022).

  7. Wang, W., Peng, Z. & Yang, J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat. Comput. Sci. 2, 804–814 (2022).

  8. Wu, R. et al. High-resolution de novo structure prediction from primary sequence. Preprint at https://www.biorxiv.org/content/10.1101/2022.07.21.500999v1 (2022).

  9. Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).

  10. Zhang, Y. et al. Benchmarking refined and unrefined AlphaFold2 structures for hit discovery. J. Chem. Inf. Model. 63, 1656–1667 (2023).

  11. Wong, F. et al. Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery. Mol. Syst. Biol. 18, e11081 (2022).

  12. Jones, D. T. & Thornton, J. M. The impact of AlphaFold2 one year on. Nat. Methods 19, 15–20 (2022).

  13. Henzler-Wildman, K. & Kern, D. Dynamic personalities of proteins. Nature 450, 964–972 (2007).

  14. Nussinov, R. & Tsai, C.-J. Allostery in disease and in drug discovery. Cell 153, 293–305 (2013).

  15. Ayaz, P. et al. Structural mechanism of a drug-binding process involving a large conformational change of the protein target. Nat. Commun. 14, 1885 (2023).

  16. Lane, T. J. Protein structure prediction has reached the single-structure frontier. Nat. Methods 20, 170–173 (2023).

  17. Moore, A. R., Rosenberg, S. C., McCormick, F. & Malek, S. Ras-targeted therapies: is the undruggable drugged? Nat. Rev. Drug Discov. 19, 533–552 (2020).

  18. Draper-Joyce, C. J. et al. Positive allosteric mechanisms of adenosine A1 receptor-mediated analgesia. Nature 597, 571–576 (2021).

  19. Sadybekov, A. V. & Katritch, V. Computational approaches streamlining drug discovery. Nature 616, 673–685 (2023).

  20. Shaw, D. E. et al. Atomic-level characterization of the structural dynamics of proteins. Science 330, 341–346 (2010).

  21. Shan, Y. et al. How does a small molecule bind at a cryptic binding site? PLoS Comput. Biol. 18, e1009817 (2022).

  22. Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 34, 8780–8794 (2021).

  23. Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

  24. Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, Inc., 2017).

  25. Zvyagin, M. et al. GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics. Int. J. High Perform. Comput. Appl. 37, 683–705 (2023).

  26. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

  27. Bepler, T. & Berger, B. Learning the protein language: evolution, structure, and function. Cell Syst. 12, 654–669 (2021).

  28. Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).

  29. Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).

  30. Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).

  31. Wu, K. E. et al. Protein structure generation via folding diffusion. Preprint at https://arxiv.org/abs/2209.15611 (2022).

  32. Lin, Y. & AlQuraishi, M. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. Preprint at https://arxiv.org/abs/2301.12485 (2023).

  33. Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations (2022).

  34. Lu, W. et al. TankBind: trigonometry-aware neural networks for drug-protein binding structure prediction. In Advances in Neural Information Processing Systems Vol. 35 (eds Koyejo, S. et al.) 7236–7249 (Curran Associates, Inc., 2022).

  35. Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. S. DiffDock: diffusion steps, twists, and turns for molecular docking. In The Eleventh International Conference on Learning Representations (2023).

  36. Nakata, S., Mori, Y. & Tanaka, S. End-to-end protein–ligand complex structure generation with diffusion-based generative models. BMC Bioinformatics 24, 233 (2023).

  37. Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Preprint at https://arxiv.org/abs/2210.13695 (2022).

  38. Alayrac, J.-B. et al. Flamingo: a visual language model for few-shot learning. Adv. Neural Inf. Process. Syst. 35, 23716–23736 (2022).

  39. Wang, R., Fang, X., Lu, Y., Yang, C.-Y. & Wang, S. The PDBbind database: methodologies and updates. J. Med. Chem. 48, 4111–4119 (2005).

  40. Davis, I. W. & Baker, D. RosettaLigand docking with full ligand and receptor flexibility. J. Mol. Biol. 385, 381–392 (2009).

  41. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).

  42. Eliel, E. L. & Wilen, S. H. Stereochemistry of Organic Compounds (John Wiley & Sons, 1994).

  43. Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning Vol. 37 (eds Bach, F. & Blei, D.) 2256–2265 (PMLR, 2015).

  44. Song, Y. et al. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (2021).

  45. Shin, Y. et al. Discovery of N-(1-acryloylazetidin-3-yl)-2-(1H-indol-1-yl)acetamides as covalent inhibitors of KRASG12C. ACS Med. Chem. Lett. 10, 1302–1308 (2019).

  46. Polizzi, N. F. & DeGrado, W. F. A defined structural unit enables de novo design of small-molecule-binding proteins. Science 369, 1227–1233 (2020).

  47. Meller, A. et al. Predicting locations of cryptic pockets from single protein structures using the PocketMiner graph neural network. Nat. Commun. 14, 1177 (2023).

  48. Best, R. B., Hummer, G. & Eaton, W. A. Native contacts determine protein folding mechanisms in atomistic simulations. Proc. Natl Acad. Sci. USA 110, 17874–17879 (2013).

  49. Karelina, M., Noh, J. J. & Dror, R. O. How accurately can one predict drug binding modes using AlphaFold models? eLife https://doi.org/10.7554/elife.89386.1 (2023).

  50. Chen, C.-Y., Chang, Y.-C., Lin, B.-L., Huang, C.-H. & Tsai, M.-D. Temperature-resolved cryo-EM uncovers structural bases of temperature-dependent enzyme functions. J. Am. Chem. Soc. 141, 19983–19987 (2019).

  51. Lee, M.-Y. et al. Harnessing the power of an X-ray laser for serial crystallography of membrane proteins crystallized in lipidic cubic phase. IUCrJ 7, 976–984 (2020).

  52. García-Nafría, J., Lee, Y., Bai, X., Carpenter, B. & Tate, C. G. Cryo-EM structure of the adenosine A2A receptor coupled to an engineered heterotrimeric G protein. eLife 7, e35946 (2018).

  53. Bertheleme, N., Singh, S., Dowell, S. J., Hubbard, J. & Byrne, B. Loss of constitutive activity is correlated with increased thermostability of the human adenosine A2A receptor. Br. J. Pharmacol. 169, 988–998 (2013).

  54. Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).

  55. Wishart, D. S. et al. HMDB 5.0: the human metabolome database for 2022. Nucleic Acids Res. 50, D622–D631 (2022).

  56. Irwin, J. J. & Shoichet, B. K. ZINC—a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 45, 177–182 (2005).

  57. Zang, C. & Wang, F. MoFlow: an invertible flow model for generating molecular graphs. In Proc. 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 617–626 (2020).

  58. Fu, T. et al. Differentiable scaffolding tree for molecule optimization. In International Conference on Learning Representations (2022).

  59. Plested, A. J. Structural mechanisms of activation and desensitization in neurotransmitter-gated ion channels. Nat. Struct. Mol. Biol. 23, 494–502 (2016).

  60. Kondor, R. I. & Lafferty, J. Diffusion kernels on graphs and other discrete structures. In Proc. 19th International Conference on Machine Learning 315–322 (2002).

  61. Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In Proceedings of the 38th International Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 9323–9332 (PMLR, 2021).

  62. Brandstetter, J., Hesselink, R., van der Pol, E., Bekkers, E. J. & Welling, M. Geometric and physical quantities improve E(3) equivariant message passing. In International Conference on Learning Representations (2022).

  63. Li, Y., Wu, J., Tedrake, R., Tenenbaum, J. B. & Torralba, A. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. In International Conference on Learning Representations (2019).

  64. Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations (2021).

  65. Shen, T. et al. E2Efold-3D: end-to-end deep learning method for accurate de novo RNA 3D structure prediction. Preprint at https://arxiv.org/abs/2207.01586 (2022).

  66. Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at https://arxiv.org/abs/2205.15019 (2022).

  67. Meucci, A. Review of statistical arbitrage, cointegration, and multivariate Ornstein–Uhlenbeck. SSRN: https://ssrn.com/abstract=1404905 (2009).

  68. Song, Y. & Ermon, S. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, Inc., 2019).

  69. Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).

  70. Karras, T., Aittala, M., Aila, T. & Laine, S. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems (2022).

  71. Yang, J., Roy, A. & Zhang, Y. BioLiP: a semi-manually curated database for biologically relevant ligand–protein interactions. Nucleic Acids Res. 41, D1096–D1103 (2012).

  72. Pándy-Szekeres, G. et al. GPCRdb in 2023: state-specific structure models using AlphaFold2 and new ligand resources. Nucleic Acids Res. 51, D395–D402 (2023).

  73. Del Alamo, D., Sala, D., Mchaourab, H. S. & Meiler, J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. eLife 11, e75751 (2022).

  74. Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).

  75. Ahdritz, G. et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Preprint at https://www.biorxiv.org/content/10.1101/2022.11.20.517210v3 (2022).

  76. Yan, X. et al. PointSite: a point cloud segmentation tool for identification of protein ligand binding atoms. J. Chem. Inf. Model. 62, 2835–2845 (2022).

  77. Krivák, R. & Hoksza, D. P2Rank: machine learning-based tool for rapid and accurate prediction of ligand binding sites from protein structure. J. Cheminform. 10, 39 (2018).

  78. McNutt, A. T. et al. GNINA 1.0: molecular docking with deep learning. J. Cheminform. 13, 43 (2021).

  79. Yu, Y. et al. Uni-Dock: GPU-accelerated docking enables ultralarge virtual screening. J. Chem. Theory Comput. 19, 3336–3345 (2023).

  80. Yu, Y., Lu, S., Gao, Z., Zheng, H. & Ke, G. Do deep learning models really outperform traditional approaches in molecular docking? Preprint at https://arxiv.org/abs/2302.07134 (2023).

  81. Stärk, H., Ganea, O., Pattanaik, L., Barzilay, D. R. & Jaakkola, T. EquiBind: geometric deep learning for drug binding structure prediction. In Proceedings of the 39th International Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 20503–20521 (PMLR, 2022).

  82. Mariani, V., Biasini, M., Barbato, A. & Schwede, T. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 29, 2722–2728 (2013).

  83. Robin, X. et al. Continuous Automated Model EvaluatiOn (CAMEO)—perspectives on the future of fully automated evaluation of structure prediction methods. Proteins 89, 1977–1986 (2021).

  84. Biasini, M. et al. OpenStructure: an integrated software framework for computational structural biology. Acta Crystallogr. D Biol. Crystallogr. 69, 701–709 (2013).

  85. Rappé, A. K., Casewit, C. J., Colwell, K., Goddard III, W. A. & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024–10035 (1992).

  86. Qiao, Z., Nie, W., Vahdat, A., Miller III, T. F. & Anandkumar, A. State-specific protein-ligand complex structure prediction with a multi-scale deep generative model. Code Ocean https://doi.org/10.24433/CO.9870737.v1 (2023).

Acknowledgements

Z.Q. acknowledges graduate research funding from Caltech and partial support from the Amazon-Caltech AI4Science fellowship. T.F.M. acknowledges partial support from the Caltech DeLogi fund, and A.A. acknowledges support from a Caltech Bren professorship. We thank M. Welborn, F. R. Manby, C. Zhang and V. Bhethanabotla for discussions on the work and for comments on the manuscript. We thank A. Meller and J. Borowsky for sharing the PocketMiner dataset.

Author information

Contributions

Z.Q., W.N., A.V., T.F.M. and A.A. conceived and designed the experiments. Z.Q. performed the experiments. Z.Q., W.N., A.V., T.F.M. and A.A. analysed the data. Z.Q. contributed analysis tools. Z.Q. and A.A. wrote the paper.

Corresponding authors

Correspondence to Zhuoran Qiao, Thomas F. Miller III or Animashree Anandkumar.

Ethics declarations

Competing interests

Z.Q. and T.F.M. are currently employees of Iambic Therapeutics or its affiliates. A provisional patent application related to this work has been filed (US Patent App. provisional 63/496,899). The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Shigenori Tanaka, Anastassis Perrakis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Structure prediction accuracy on all targets.

Comparison of AlphaFold2 (AF2), NeuralPLexer and NeuralPLexer (no ligand) by TM-score on all structure prediction targets described in this study, including the PocketMiner and recent-structure targets. All NeuralPLexer results shown in this figure were obtained using the LSA-SDE sampler and are based on the structure with the highest average protein pLDDT among the 8 structures generated for each prediction target.
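
The selection rule in this caption (keep the sample with the highest average protein pLDDT) and the TM-score used for comparison can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names and input layout are hypothetical, and `tm_score_fixed_alignment` evaluates the standard TM-score formula for one given superposition and residue pairing, whereas tools such as TM-align also search for the superposition that maximizes the score.

```python
from statistics import mean


def select_by_mean_plddt(plddt_per_sample):
    """Return the index of the sample with the highest mean per-residue pLDDT.

    plddt_per_sample: list of lists, one list of per-residue pLDDT values
    per generated structure (hypothetical input layout).
    """
    return max(range(len(plddt_per_sample)),
               key=lambda i: mean(plddt_per_sample[i]))


def tm_score_fixed_alignment(distances, target_length):
    """TM-score for a fixed superposition and residue alignment.

    distances: distances (in angstroms) between aligned residue pairs.
    target_length: number of residues in the reference structure; the
    d0 formula below is only meaningful for chains longer than ~20 residues.
    """
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    # Three toy samples with made-up pLDDT values; the second has the highest mean.
    toy_plddt = [[50.0, 60.0], [90.0, 80.0], [70.0, 70.0]]
    print("selected sample index:", select_by_mean_plddt(toy_plddt))
    # A perfect match (all distances zero) gives the maximum score.
    print("TM-score:", tm_score_fixed_alignment([0.0] * 100, 100))
```

Ranking by a model confidence score such as pLDDT needs no reference structure, which is why it can be used to pick one of the 8 samples before the TM-score is computed against the experimental structure.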

Supplementary information

Supplementary Information

Supplementary results and discussions and Algorithms 1–12, Figs. 1–5 and Tables 1–6.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qiao, Z., Nie, W., Vahdat, A. et al. State-specific protein–ligand complex structure prediction with a multiscale deep generative model. Nat Mach Intell 6, 195–208 (2024). https://doi.org/10.1038/s42256-024-00792-z

  • DOI: https://doi.org/10.1038/s42256-024-00792-z
