ResGen is a pocket-aware 3D molecular generation model based on parallel multiscale modelling

Abstract

Most molecular generative models based on artificial intelligence for de novo drug design are ligand-centric and do not consider the detailed three-dimensional geometries of protein binding pockets. Pocket-aware three-dimensional molecular generation is challenging due to the need to impose physical equivariance and to evaluate protein–ligand interactions when incrementally growing partially built molecules. Inspired by multiscale modelling in condensed matter and statistical physics, we present a three-dimensional molecular generative model conditioned on protein pockets, termed ResGen, for designing organic molecules inside a given target. ResGen is built on the principle of parallel multiscale modelling, which can capture higher-level interactions and achieve higher computational efficiency (about eight times faster than the previous state of the art). The generation process is formulated as a hierarchical autoregression, that is, a global autoregression for learning protein–ligand interactions and an atomic component autoregression for learning each atom’s topology and geometry distributions. We demonstrate that ResGen has a higher success rate than existing state-of-the-art approaches in generating novel molecules that can bind to unseen targets more tightly than the original ligands. Moreover, retrospective computational experiments on de novo drug design in real-world scenarios show that ResGen successfully generates drug-like molecules with lower binding energy and higher diversity than state-of-the-art approaches.
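
The hierarchical autoregression described in the abstract can be read as a two-level factorization of the probability of a ligand given a pocket. The sketch below is one plausible way to write it and is not taken from the paper: the symbols (pocket $\mathcal{P}$, ligand $\mathcal{M}$ with $N$ atoms, element type $e_i$, position $\mathbf{r}_i$, bonds $\mathbf{b}_i$) and the order in which the per-atom components are conditioned on one another are assumptions made for illustration.

```latex
% Global autoregression (assumed notation): atoms are placed one at a time,
% each conditioned on the pocket and on the atoms already placed.
\[
  p(\mathcal{M} \mid \mathcal{P})
    = \prod_{i=1}^{N} p\bigl(a_i \mid a_{<i}, \mathcal{P}\bigr),
  \qquad a_i = (e_i, \mathbf{r}_i, \mathbf{b}_i).
\]

% Atomic component autoregression (assumed ordering: type, then position,
% then bonds), with context C_{i-1} = (a_{<i}, \mathcal{P}).
\[
  p\bigl(a_i \mid C_{i-1}\bigr)
    = p\bigl(e_i \mid C_{i-1}\bigr)\,
      p\bigl(\mathbf{r}_i \mid e_i, C_{i-1}\bigr)\,
      p\bigl(\mathbf{b}_i \mid e_i, \mathbf{r}_i, C_{i-1}\bigr).
\]
```

Under this reading, the outer product is where protein–ligand interactions enter through the conditioning on $\mathcal{P}$, while the inner product covers each atom's topology ($e_i$, $\mathbf{b}_i$) and geometry ($\mathbf{r}_i$).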

Fig. 1: Workflow and architecture of ResGen.
Fig. 2: Evaluation of generated molecules.

Data availability

The training and test data for this study are available at Zenodo (https://doi.org/10.5281/zenodo.7759114).

Code availability

The source code of this study is freely available at GitHub (https://github.com/HaotianZhangAI4Science/ResGen) to allow replication of the results.

Acknowledgements

This study was supported by the National Key Research and Development Program of China (grant no. 2022YFF1203000), the National Natural Science Foundation of China (grant no. 22220102001), the Fundamental Research Funds for the Central Universities (grant no. 226-2022-00220) and the Hong Kong Innovation and Technology Fund (project no. ITS/241/21).

Author information

Contributions

O.Z. contributed the main idea and code. J.Z. contributed to manuscript writing and code reorganization. X.Z. contributed to dataset collection and the corresponding experiment. R.H. and J.J. contributed to the curation of the real-world dataset. C.S. and H.D. contributed to data analysis and drawing. H.C. and Y.K. provided guidance on physical concepts. Y.D. contributed to visualization and technical support. F.L. suggested the geometry analysis metric. G.C. and C.-Y.H. contributed to manuscript revision and experimental design. T.H. provided essential financial support, contributed to the conceptualization and was responsible for overall quality.

Corresponding authors

Correspondence to Furui Liu, Guangyong Chen, Chang-Yu Hsieh or Tingjun Hou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Arne Elofsson and Guo-Wei Wei for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4, Discussion and Tables 1–6.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, O., Zhang, J., Jin, J. et al. ResGen is a pocket-aware 3D molecular generation model based on parallel multiscale modelling. Nat Mach Intell 5, 1020–1030 (2023). https://doi.org/10.1038/s42256-023-00712-7

  • DOI: https://doi.org/10.1038/s42256-023-00712-7
