Abstract
Most artificial-intelligence-based molecular generative models for de novo drug design are ligand-centric and do not consider the detailed three-dimensional geometries of protein binding pockets. Pocket-aware three-dimensional molecular generation is challenging because it requires imposing physical equivariance and evaluating protein–ligand interactions while incrementally growing partially built molecules. Inspired by multiscale modelling in condensed matter and statistical physics, we present ResGen, a three-dimensional molecular generative model conditioned on protein pockets, for designing organic molecules inside the binding pocket of a given target. ResGen is built on the principle of parallel multiscale modelling, which captures higher-level interactions and improves computational efficiency (about eight times faster than the previous state-of-the-art method). The generation process is formulated as a hierarchical autoregression: a global autoregression for learning protein–ligand interactions and an atomic component autoregression for learning each atom’s topology and geometry distributions. We demonstrate that ResGen has a higher success rate than existing state-of-the-art approaches in generating novel molecules that can bind to unseen targets more tightly than the original ligands. Moreover, retrospective computational experiments on de novo drug design in real-world scenarios show that ResGen generates drug-like molecules with lower binding energy and higher diversity than state-of-the-art approaches.
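The hierarchical autoregression can be sketched as a chain-rule factorization with two nested loops: the outer loop is the global autoregression over atoms conditioned on the pocket P, and the inner loop is the atomic component autoregression within each atom-addition step, so that log p(M | P) is the sum over atoms t and components k of log p(c_{t,k} | c_{t,<k}, M_{<t}, P). This is an illustrative sketch only. The four components (focal atom, element, position, bonds), the names `Step`, `log_likelihood` and `toy_cond_prob`, and the fixed probabilities are hypothetical placeholders, not ResGen's actual components or networks, and the code omits the equivariant encoders and the parallel multiscale representation the model relies on.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical order of atomic components within one atom-addition step.
# This is an assumption for illustration; the abstract does not list them.
COMPONENTS = ("focal", "element", "position", "bonds")


@dataclass
class Step:
    """Choices made when adding one atom: one value per atomic component."""
    choices: Dict[str, object]


# cond_prob(component, choice, earlier_choices, partial_molecule, pocket) -> probability
# (or a density value for continuous components such as the position).
CondProb = Callable[[str, object, Dict[str, object], List[Step], object], float]


def log_likelihood(steps: Sequence[Step], pocket: object, cond_prob: CondProb) -> float:
    """Log-likelihood of a molecule under a two-level (hierarchical) autoregression."""
    total = 0.0
    for t, step in enumerate(steps):           # global autoregression over atoms
        partial = list(steps[:t])              # partially built molecule M_{<t}
        earlier: Dict[str, object] = {}        # components already fixed at step t
        for name in COMPONENTS:                # atomic component autoregression
            p = cond_prob(name, step.choices[name], earlier, partial, pocket)
            total += math.log(p)
            earlier[name] = step.choices[name]
    return total


def toy_cond_prob(name, choice, earlier, partial, pocket):
    """Hypothetical placeholder conditionals; a real model would be a learned,
    pocket-aware equivariant network."""
    if name == "focal":
        # First atom grows from a pocket atom; later atoms grow from placed ligand atoms.
        candidates = pocket if not partial else partial
        return 1.0 / len(candidates)
    if name == "element":
        return 0.25   # uniform over a hypothetical 4-element vocabulary
    if name == "position":
        return 0.1    # stand-in for a position density value
    return 0.8        # stand-in for the bond-assignment probability


pocket = ["P0", "P1"]  # two toy pocket atoms
steps = [
    Step({"focal": "P0", "element": "C", "position": (0.0, 0.0, 0.0), "bonds": ()}),
    Step({"focal": 0, "element": "O", "position": (1.2, 0.0, 0.0), "bonds": ((0, 1),)}),
]
ll = log_likelihood(steps, pocket, toy_cond_prob)
print(f"log p(M | pocket) = {ll:.4f}")
```

Keeping the component loop inside the atom loop is what makes the factorization hierarchical: each atom's element, position and bonds are conditioned on earlier components of the same step as well as on all previously placed atoms and the pocket.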
Data availability
The training and test data of this study are available at Zenodo (https://doi.org/10.5281/zenodo.7759114).
Code availability
The source code of this study is freely available at GitHub (https://github.com/HaotianZhangAI4Science/ResGen) to allow replication of the results.
Acknowledgements
This study was supported by the National Key Research and Development Program of China (grant no. 2022YFF1203000), the National Natural Science Foundation of China (grant no. 22220102001), the Fundamental Research Funds for the Central Universities (grant no. 226-2022-00220) and the Hong Kong Innovation and Technology Fund (project no. ITS/241/21).
Author information
Contributions
O.Z. contributed to the main idea and code. J.Z. contributed to the manuscript writing and code reorganization. X.Z. contributed to the collection of the dataset and the corresponding experiment. R.H. and J.J. contributed to the curation of the real-world dataset. C.S. and H.D. contributed to the data analysis and figure drawing. H.C. and Y.K. provided guidance on physical concepts. Y.D. contributed to the visualization and technical support. F.L. suggested the geometry analysis metric. G.C. and C.-Y.H. contributed to manuscript revision and experimental design. T.H. provided essential financial support, contributed to the conceptualization and was responsible for the overall quality.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Nature Machine Intelligence thanks Arne Elofsson and Guo-Wei Wei for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–4, Discussion and Tables 1–6.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, O., Zhang, J., Jin, J. et al. ResGen is a pocket-aware 3D molecular generation model based on parallel multiscale modelling. Nat Mach Intell 5, 1020–1030 (2023). https://doi.org/10.1038/s42256-023-00712-7
DOI: https://doi.org/10.1038/s42256-023-00712-7