State-specific protein–ligand complex structure prediction with a multiscale deep generative model

Qiao, Zhuoran; Nie, Weili; Vahdat, Arash; Miller, Thomas F.; Anandkumar, Animashree

doi:10.1038/s42256-024-00792-z

Article
Published: 12 February 2024

Nature Machine Intelligence volume 6, pages 195–208 (2024)

A preprint version of the article is available at arXiv.

Abstract

The binding complexes formed by proteins and small molecule ligands are ubiquitous and critical to life. Despite recent advancements in protein structure prediction, existing algorithms are so far unable to systematically predict the binding ligand structures along with their regulatory effects on protein folding. To address this discrepancy, we present NeuralPLexer, a computational approach that can directly predict protein–ligand complex structures solely using protein sequence and ligand molecular graph inputs. NeuralPLexer adopts a deep generative model to sample the three-dimensional structures of the binding complex and their conformational changes at an atomistic resolution. The model is based on a diffusion process that incorporates essential biophysical constraints and a multiscale geometric deep learning system to iteratively sample residue-level contact maps and all heavy-atom coordinates in a hierarchical manner. NeuralPLexer achieves state-of-the-art performance compared with all existing methods on benchmarks for both protein–ligand blind docking and flexible binding-site structure recovery. Moreover, owing to its specificity in sampling both ligand-free-state and ligand-bound-state ensembles, NeuralPLexer consistently outperforms AlphaFold2 in terms of global protein structure accuracy on both representative structure pairs with large conformational changes and recently determined ligand-binding proteins. NeuralPLexer predictions align with structure determination experiments for important targets in enzyme engineering and drug discovery, suggesting its potential for accelerating the design of functional proteins and small molecules at the proteome scale.

**Fig. 1: NeuralPLexer enables accurate prediction of protein–ligand complex structure and conformational changes.**

**Fig. 3: Model performance on benchmarking problems.**

**Fig. 4: Model predictions for contrasting apo–holo pairs from the PocketMiner dataset.**

**Fig. 5: Model predictions for recently determined structures.**

Data availability

All datasets and predictions used to generate the reported results are available on Code Ocean⁸⁶ and also on Zenodo at https://doi.org/10.5281/zenodo.10373581.

Code availability

The code, scripts and interactive data analysis notebooks are available on Code Ocean⁸⁶ and also on GitHub at https://github.com/zrqiao/NeuralPLexer.

References

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Ovchinnikov, S. et al. Protein structure determination using metagenome sequence data. Science 355, 294–298 (2017).
Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).
Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373, 871–876 (2021).
Baek, M. et al. Accurate prediction of protein–nucleic acid complexes using RoseTTAFoldNA. Nat. Methods 21, 117–121 (2024).
Chowdhury, R. et al. Single-sequence protein structure prediction using a language model and deep learning. Nat. Biotechnol. 40, 1617–1623 (2022).
Wang, W., Peng, Z. & Yang, J. Single-sequence protein structure prediction using supervised transformer protein language models. Nat. Comput. Sci. 2, 804–814 (2022).
Wu, R. et al. High-resolution de novo structure prediction from primary sequence. Preprint at https://www.biorxiv.org/content/10.1101/2022.07.21.500999v1 (2022).
Lin, Z. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130 (2023).
Zhang, Y. et al. Benchmarking refined and unrefined AlphaFold2 structures for hit discovery. J. Chem. Inf. Model. 63, 1656–1667 (2023).
Wong, F. et al. Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery. Mol. Syst. Biol. 18, e11081 (2022).
Jones, D. T. & Thornton, J. M. The impact of AlphaFold2 one year on. Nat. Methods 19, 15–20 (2022).
Henzler-Wildman, K. & Kern, D. Dynamic personalities of proteins. Nature 450, 964–972 (2007).
Nussinov, R. & Tsai, C.-J. Allostery in disease and in drug discovery. Cell 153, 293–305 (2013).
Ayaz, P. et al. Structural mechanism of a drug-binding process involving a large conformational change of the protein target. Nat. Commun. 14, 1885 (2023).
Lane, T. J. Protein structure prediction has reached the single-structure frontier. Nat. Methods 20, 170–173 (2023).
Moore, A. R., Rosenberg, S. C., McCormick, F. & Malek, S. Ras-targeted therapies: is the undruggable drugged? Nat. Rev. Drug Discov. 19, 533–552 (2020).
Draper-Joyce, C. J. et al. Positive allosteric mechanisms of adenosine A1 receptor-mediated analgesia. Nature 597, 571–576 (2021).
Sadybekov, A. V. & Katritch, V. Computational approaches streamlining drug discovery. Nature 616, 673–685 (2023).
Shaw, D. E. et al. Atomic-level characterization of the structural dynamics of proteins. Science 330, 341–346 (2010).
Shan, Y. et al. How does a small molecule bind at a cryptic binding site? PLoS Comput. Biol. 18, e1009817 (2022).
Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 34, 8780–8794 (2021).
Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).
Vaswani, A. et al. Attention is all you need. In Advances in Neural Information Processing Systems Vol. 30 (eds Guyon, I. et al.) (Curran Associates, Inc., 2017).
Zvyagin, M. et al. GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics. Int. J. High Perform. Comput. Appl. 37, 683–705 (2023).
Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
Bepler, T. & Berger, B. Learning the protein language: evolution, structure, and function. Cell Syst. 12, 654–669 (2021).
Dauparas, J. et al. Robust deep learning-based protein sequence design using ProteinMPNN. Science 378, 49–56 (2022).
Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023).
Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023).
Wu, K. E. et al. Protein structure generation via folding diffusion. Preprint at https://arxiv.org/abs/2209.15611 (2022).
Lin, Y. & AlQuraishi, M. Generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds. Preprint at https://arxiv.org/abs/2301.12485 (2023).
Xu, M. et al. GeoDiff: a geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations (2022).
Lu, W. et al. TankBind: trigonometry-aware neural networks for drug-protein binding structure prediction. In Advances in Neural Information Processing Systems, Vol. 35 (eds Koyejo, S. et al.) 7236–7249 (Curran Associates, Inc., 2022).
Corso, G., Stärk, H., Jing, B., Barzilay, R. & Jaakkola, T. S. DiffDock: diffusion steps, twists, and turns for molecular docking. In The Eleventh International Conference on Learning Representations (2023).
Nakata, S., Mori, Y. & Tanaka, S. End-to-end protein–ligand complex structure generation with diffusion-based generative models. BMC Bioinformatics 24, 233 (2023).
Schneuing, A. et al. Structure-based drug design with equivariant diffusion models. Preprint at https://arxiv.org/abs/2210.13695 (2022).
Alayrac, J.-B. et al. Flamingo: a visual language model for few-shot learning. Adv. Neural Inf. Process. Syst. 35, 23716–23736 (2022).
Wang, R., Fang, X., Lu, Y., Yang, C.-Y. & Wang, S. The PDBbind database: methodologies and updates. J. Med. Chem. 48, 4111–4119 (2005).
Davis, I. W. & Baker, D. Rosettaligand docking with full ligand and receptor flexibility. J. Mol. Biol. 385, 381–392 (2009).
Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
Eliel, E. L. & Wilen, S. H. Stereochemistry of Organic Compounds (John Wiley & Sons, 1994).
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N. & Ganguli, S. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning Vol. 37 (eds Bach, F. & Blei, D.) 2256–2265 (PMLR, 2015).
Song, Y. et al. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (2021).
Shin, Y. et al. Discovery of N-(1-acryloylazetidin-3-yl)-2-(1H-indol-1-yl)acetamides as covalent inhibitors of KRAS^G12C. ACS Med. Chem. Lett. 10, 1302–1308 (2019).
Polizzi, N. F. & DeGrado, W. F. A defined structural unit enables de novo design of small-molecule-binding proteins. Science 369, 1227–1233 (2020).
Meller, A. et al. Predicting locations of cryptic pockets from single protein structures using the PocketMiner graph neural network. Nat. Commun. 14, 1177 (2023).
Best, R. B., Hummer, G. & Eaton, W. A. Native contacts determine protein folding mechanisms in atomistic simulations. Proc. Natl Acad. Sci. USA 110, 17874–17879 (2013).
Karelina, M., Noh, J. J. & Dror, R. O. How accurately can one predict drug binding modes using AlphaFold models? eLife https://doi.org/10.7554/elife.89386.1 (2023).
Chen, C.-Y., Chang, Y.-C., Lin, B.-L., Huang, C.-H. & Tsai, M.-D. Temperature-resolved cryo-EM uncovers structural bases of temperature-dependent enzyme functions. J. Am. Chem. Soc. 141, 19983–19987 (2019).
Lee, M.-Y. et al. Harnessing the power of an X-ray laser for serial crystallography of membrane proteins crystallized in lipidic cubic phase. IUCrJ 7, 976–984 (2020).
García-Nafría, J., Lee, Y., Bai, X., Carpenter, B. & Tate, C. G. Cryo-EM structure of the adenosine A2A receptor coupled to an engineered heterotrimeric G protein. eLife 7, e35946 (2018).
Bertheleme, N., Singh, S., Dowell, S. J., Hubbard, J. & Byrne, B. Loss of constitutive activity is correlated with increased thermostability of the human adenosine A2A receptor. Br. J. Pharmacol. 169, 988–998 (2013).
Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
Wishart, D. S. et al. HMDB 5.0: the human metabolome database for 2022. Nucleic Acids Res. 50, D622–D631 (2022).
Irwin, J. J. & Shoichet, B. K. ZINC—a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 45, 177–182 (2005).
Zang, C. & Wang, F. MoFlow: an invertible flow model for generating molecular graphs. In Proc. 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 617–626 (2020).
Fu, T. et al. Differentiable scaffolding tree for molecule optimization. In International Conference on Learning Representations (2022).
Plested, A. J. Structural mechanisms of activation and desensitization in neurotransmitter-gated ion channels. Nat. Struct. Mol. Biol. 23, 494–502 (2016).
Kondor, R. I. & Lafferty, J. Diffusion kernels on graphs and other discrete structures. In Proc. 19th International Conference on Machine Learning, 315–322 (2002).
Satorras, V. G., Hoogeboom, E. & Welling, M. E(n) equivariant graph neural networks. In Proceedings of the 38th International Conference on Machine Learning Vol. 139 (eds Meila, M. & Zhang, T.) 9323–9332 (PMLR, 2021).
Brandstetter, J., Hesselink, R., van der Pol, E., Bekkers, E. J. & Welling, M. Geometric and physical quantities improve E(3) equivariant message passing. In International Conference on Learning Representations (2022).
Li, Y., Wu, J., Tedrake, R., Tenenbaum, J. B. & Torralba, A. Learning particle dynamics for manipulating rigid bodies, deformable objects, and fluids. In International Conference on Learning Representations (2019).
Jing, B., Eismann, S., Suriana, P., Townshend, R. J. L. & Dror, R. Learning from protein structure with geometric vector perceptrons. In International Conference on Learning Representations (2021).
Shen, T. et al. E2Efold-3D: end-to-end deep learning method for accurate de novo RNA 3D structure prediction. Preprint at https://arxiv.org/abs/2207.01586 (2022).
Anand, N. & Achim, T. Protein structure and sequence generation with equivariant denoising diffusion probabilistic models. Preprint at https://arxiv.org/abs/2205.15019 (2022).
Meucci, A. Review of statistical arbitrage, cointegration, and multivariate Ornstein–Uhlenbeck. Preprint at SSRN https://ssrn.com/abstract=1404905 (2009).
Song, Y. & Ermon, S. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. et al.) (Curran Associates, Inc., 2019).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Karras, T., Aittala, M., Aila, T. & Laine, S. Elucidating the design space of diffusion-based generative models. In Advances in Neural Information Processing Systems (2022).
Yang, J., Roy, A. & Zhang, Y. BioLiP: a semi-manually curated database for biologically relevant ligand–protein interactions. Nucleic Acids Res. 41, D1096–D1103 (2012).
Pándy-Szekeres, G. et al. GPCRdb in 2023: state-specific structure models using AlphaFold2 and new ligand resources. Nucleic Acids Res. 51, D395–D402 (2023).
Del Alamo, D., Sala, D., Mchaourab, H. S. & Meiler, J. Sampling alternative conformational states of transporters and receptors with AlphaFold2. eLife 11, e75751 (2022).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Ahdritz, G. et al. OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization. Preprint at https://www.biorxiv.org/content/10.1101/2022.11.20.517210v3 (2022).
Yan, X. et al. PointSite: a point cloud segmentation tool for identification of protein ligand binding atoms. J. Chem. Inf. Model. 62, 2835–2845 (2022).
Krivák, R. & Hoksza, D. P2Rank: machine learning-based tool for rapid and accurate prediction of ligand binding sites from protein structure. J. Cheminform. 10, 39 (2018).
McNutt, A. T. et al. GNINA 1.0: molecular docking with deep learning. J. Cheminform. 13, 43 (2021).
Yu, Y. et al. Uni-Dock: GPU-accelerated docking enables ultralarge virtual screening. J. Chem. Theory Comput. 19, 3336–3345 (2023).
Yu, Y., Lu, S., Gao, Z., Zheng, H. & Ke, G. Do deep learning models really outperform traditional approaches in molecular docking? Preprint at https://arxiv.org/abs/2302.07134 (2023).
Stärk, H., Ganea, O., Pattanaik, L., Barzilay, R. & Jaakkola, T. EquiBind: geometric deep learning for drug binding structure prediction. In Proceedings of the 39th International Conference on Machine Learning Vol. 162 (eds Chaudhuri, K. et al.) 20503–20521 (PMLR, 2022).
Mariani, V., Biasini, M., Barbato, A. & Schwede, T. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 29, 2722–2728 (2013).
Robin, X. et al. Continuous Automated Model EvaluatiOn (CAMEO)—perspectives on the future of fully automated evaluation of structure prediction methods. Proteins 89, 1977–1986 (2021).
Biasini, M. et al. OpenStructure: an integrated software framework for computational structural biology. Acta Crystallogr. D Biol. Crystallogr. 69, 701–709 (2013).
Rappé, A. K., Casewit, C. J., Colwell, K., Goddard III, W. A. & Skiff, W. M. UFF, a full periodic table force field for molecular mechanics and molecular dynamics simulations. J. Am. Chem. Soc. 114, 10024–10035 (1992).
Qiao, Z., Nie, W., Vahdat, A., Miller III, T. F. & Anandkumar, A. State-specific protein-ligand complex structure prediction with a multi-scale deep generative model. Code Ocean https://doi.org/10.24433/CO.9870737.v1 (2023).

Acknowledgements

Z.Q. acknowledges graduate research funding from Caltech and partial support from the Amazon-Caltech AI4Science fellowship. T.F.M. acknowledges partial support from the Caltech DeLogi fund, and A.A. acknowledges support from a Caltech Bren professorship. We thank M. Welborn, F. R. Manby, C. Zhang and V. Bhethanabotla for discussions on the work and for comments on the manuscript. We thank A. Meller and J. Borowsky for sharing the PocketMiner dataset.

Author information

Authors and Affiliations

Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
Zhuoran Qiao
Iambic Therapeutics, San Diego, CA, USA
Zhuoran Qiao & Thomas F. Miller III
Nvidia Corporation, Santa Clara, CA, USA
Weili Nie, Arash Vahdat & Animashree Anandkumar
Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA, USA
Animashree Anandkumar

Authors

Zhuoran Qiao
Weili Nie
Arash Vahdat
Thomas F. Miller III
Animashree Anandkumar

Contributions

Z.Q., W.N., A.V., T.F.M. and A.A. conceived and designed the experiments. Z.Q. performed the experiments. Z.Q., W.N., A.V., T.F.M. and A.A. analysed the data. Z.Q. contributed analysis tools. Z.Q. and A.A. wrote the paper.

Corresponding authors

Correspondence to Zhuoran Qiao, Thomas F. Miller III or Animashree Anandkumar.

Ethics declarations

Competing interests

Z.Q. and T.F.M. are currently employees of Iambic Therapeutics or its affiliates. A provisional patent application related to this work has been filed (US Patent App. provisional 63/496,899). The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Shigenori Tanaka, Anastassis Perrakis and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Structure prediction accuracy on all targets.

Comparison of AlphaFold2 (AF2), NeuralPLexer and NeuralPLexer (no ligand) in terms of TM-score across all structure prediction targets described in this study, including the PocketMiner and recent-structure targets. All NeuralPLexer results shown in this figure were obtained using the LSA-SDE sampler and, for each prediction target, are based on the structure with the highest average protein pLDDT among the 8 generated structures.
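
The selection rule in this caption (keep, for each target, the generated structure with the highest average protein pLDDT) can be illustrated with a minimal sketch. This is not the authors' code: the function name `select_top_sample` and the per-residue pLDDT values are hypothetical, and the sketch assumes each generated structure comes with a list of per-residue pLDDT scores.

```python
from statistics import mean


def select_top_sample(plddt_per_sample):
    """Return (index, mean pLDDT) of the sample with the highest mean per-residue pLDDT.

    `plddt_per_sample` is a list with one entry per generated structure;
    each entry is a list of per-residue pLDDT values (hypothetical input format).
    Ties are resolved in favour of the earliest sample.
    """
    if not plddt_per_sample:
        raise ValueError("need at least one generated structure")
    means = [mean(scores) for scores in plddt_per_sample]
    best = max(range(len(means)), key=means.__getitem__)
    return best, means[best]


# Made-up per-residue pLDDT values for three generated structures of one target
# (the caption uses 8 structures per target; three keep the example short).
samples = [
    [70.0, 80.0, 90.0],
    [85.0, 85.0, 88.0],
    [60.0, 95.0, 75.0],
]
best_index, best_score = select_top_sample(samples)
print(best_index, best_score)
```

The confidence-ranked structure would then be the one scored against the reference, for example by TM-score.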

Supplementary information

Supplementary Information

Supplementary results and discussions and Algorithms 1–12, Figs. 1–5 and Tables 1–6.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Qiao, Z., Nie, W., Vahdat, A. et al. State-specific protein–ligand complex structure prediction with a multiscale deep generative model. Nat Mach Intell 6, 195–208 (2024). https://doi.org/10.1038/s42256-024-00792-z

Received: 02 June 2023
Accepted: 09 January 2024
Published: 12 February 2024
Issue Date: February 2024
DOI: https://doi.org/10.1038/s42256-024-00792-z
