A deep generative model for molecule optimization via one fragment modification

Chen, Ziqi; Min, Martin Renqiang; Parthasarathy, Srinivasan; Ning, Xia

doi:10.1038/s42256-021-00410-2

Article
Published: 09 December 2021

Ziqi Chen¹, Martin Renqiang Min², Srinivasan Parthasarathy¹,³ & Xia Ning¹,³,⁴ (ORCID: orcid.org/0000-0002-6842-1165)

Nature Machine Intelligence volume 3, pages 1040–1049 (2021)

A preprint version of the article is available at arXiv.

Abstract

Molecule optimization is a critical step in drug development to improve the desired properties of drug candidates through chemical modification. We have developed a novel deep generative model, Modof, over molecular graphs for molecule optimization. Modof modifies a given molecule through the prediction of a single site of disconnection at the molecule and the removal and/or addition of fragments at that site. A pipeline of multiple, identical Modof models is implemented into Modof-pipe to modify an input molecule at multiple disconnection sites. Here we show that Modof-pipe is able to retain major molecular scaffolds, allow controls over intermediate optimization steps and better constrain molecule similarities. Modof-pipe outperforms the state-of-the-art methods on benchmark datasets. Without molecular similarity constraints, Modof-pipe achieves 81.2% improvement in the octanol–water partition coefficient, penalized by synthetic accessibility and ring size, and 51.2%, 25.6% and 9.2% improvement if the optimized molecules are at least 0.2, 0.4 and 0.6 similar to those before optimization, respectively. Modof-pipe is further enhanced into Modof-pipe^m to allow modification of one molecule to multiple optimized ones. Modof-pipe^m achieves additional performance improvement, at least 17.8% better than Modof-pipe.
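The pipeline idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: molecules are stood in for by sets of fragment labels, `toy_modify` and `toy_score` are hypothetical placeholders for a trained Modof model and a property oracle, and the acceptance rule (keep a step only if the property improves and the similarity to the original input stays above a threshold) is an assumption drawn from the abstract's description of similarity-constrained optimization.

```python
from typing import Callable, FrozenSet, List

# Hypothetical stand-in: a molecule is a set of fragment labels.
Molecule = FrozenSet[str]


def tanimoto(a: Molecule, b: Molecule) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def optimize_pipeline(
    start: Molecule,
    modify: Callable[[Molecule], Molecule],
    score: Callable[[Molecule], float],
    min_similarity: float,
    max_steps: int = 5,
) -> List[Molecule]:
    """Chain one-fragment modifications, one per pipeline stage.

    A candidate is accepted only if it scores strictly higher than the
    current molecule and is at least `min_similarity` similar to the
    ORIGINAL input (assumed rule); otherwise the pipeline stops.
    Returns every accepted intermediate, starting with `start`.
    """
    trajectory = [start]
    current = start
    for _ in range(max_steps):
        candidate = modify(current)
        if score(candidate) <= score(current):
            break  # no property improvement
        if tanimoto(candidate, start) < min_similarity:
            break  # drifted too far from the input molecule
        trajectory.append(candidate)
        current = candidate
    return trajectory


# Toy property: each fragment contributes a fixed, made-up value.
FRAGMENT_VALUE = {"A": 1.0, "B": 0.5, "C": -1.0, "D": -0.5, "X": 2.0, "Y": 1.5}


def toy_score(m: Molecule) -> float:
    return sum(FRAGMENT_VALUE[f] for f in m)


def toy_modify(m: Molecule) -> Molecule:
    """Remove the worst fragment and attach the best unused one."""
    worst = min(m, key=lambda f: (FRAGMENT_VALUE[f], f))
    unused = [f for f in FRAGMENT_VALUE if f not in m]
    if not unused:
        return m
    best = max(unused, key=lambda f: (FRAGMENT_VALUE[f], f))
    return (m - {worst}) | {best}


start: Molecule = frozenset({"A", "B", "C", "D"})
for threshold in (0.2, 0.4):
    steps = optimize_pipeline(start, toy_modify, toy_score, threshold)
    print(threshold, [toy_score(m) for m in steps])
# prints: 0.2 [0.0, 3.0, 5.0]
#         0.4 [0.0, 3.0]
```

In this toy run the looser threshold admits two modifications and the tighter one admits only the first, which mirrors the trade-off the abstract reports: larger property gains are available when the similarity constraint is relaxed. Returning the whole trajectory, rather than only the final molecule, reflects the abstract's point that intermediate optimization steps remain available for inspection.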

**Fig. 2: Modof-pipe examples for plogP optimization.**

**Fig. 3: Modof-pipe examples for DRD2, QED and multi-property optimization.**

Data availability

The data used in this manuscript are available publicly from Chen et al.⁵² and https://github.com/ziqi92/Modof. Source data are provided with this paper.

Code availability

The code for Modof, Modof-pipe and Modof-pipe^m is publicly available from Chen et al.⁵² and https://github.com/ziqi92/Modof.

References

Jorgensen, W. L. Efficient drug lead discovery and optimization. Acc. Chem. Res. 42, 724–733 (2009).
Verdonk, M. L. & Hartshorn, M. J. Structure-guided fragment screening for lead discovery. Curr. Opin. Drug Discov. Dev. 7, 404–410 (2004).
de Souza Neto, L. R. et al. In silico strategies to support fragment-to-lead optimization in drug discovery. Front. Chem. 8, 93 (2020).
Hoffer, L. et al. Integrated strategy for lead optimization based on fragment growing: the diversity-oriented-target-focused-synthesis approach. J. Med. Chem. 61, 5719–5732 (2018).
Gerry, C. J. & Schreiber, S. L. Chemical probes and drug leads from advances in synthetic planning and methodology. Nat. Rev. Drug Discov. 17, 333–352 (2018).
Sattarov, B. et al. De novo molecular design by combining deep autoencoder recurrent neural networks with generative topographic mapping. J. Chem. Inf. Model. 59, 1182–1196 (2019).
Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).
Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proc. Machine Learning Research Vol. 80 (eds Dy, J. & Krause, A.), 2323–2332 (PMLR, 2018).
You, J., Liu, B., Ying, Z., Pande, V. & Leskovec, J. Graph convolutional policy network for goal-directed molecular graph generation. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) 6410–6421 (Curran Associates, 2018).
Murray, C. & Rees, D. The rise of fragment-based drug discovery. Nat. Chem. 1, 187–192 (2009).
Hajduk, P. J. & Greer, J. A decade of fragment-based drug design: strategic advances and lessons learned. Nat. Rev. Drug Discov. 6, 211–219 (2007).
Shi, C. et al. GraphAF: a flow-based autoregressive model for molecular graph generation. In Proc. 8th International Conference on Learning Representations (OpenReview.net, 2020).
Zang, C. & Wang, F. MoFlow: an invertible flow model for generating molecular graphs. In Proc. 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (eds Gupta, R. et al.) 617–626 (ACM, 2020).
Jin, W., Yang, K., Barzilay, R. & Jaakkola, T. S. Learning multimodal graph-to-graph translation for molecule optimization. In Proc. 7th International Conference on Learning Representations (2019).
Jin, W., Barzilay, R. & Jaakkola, T. S. Hierarchical generation of molecular graphs using structural motifs. In Proc. 37th International Conference on Machine Learning, Proceedings of Machine Learning Research Vol. 119 (eds Daumé, H. III & Singh, H.) 4839–4848 (PMLR, 2020).
Podda, M., Bacciu, D. & Micheli, A. A deep generative model for fragment-based molecule generation. In Proc. Twenty Third International Conference on Artificial Intelligence and Statistics, Proc. Machine Learning Research Vol. 108 (eds Chiappa, S. & Calandra, R.) 2240–2250 (PMLR, 2020).
Ji, C., Zheng, Y., Wang, R., Cai, Y. & Wu, H. Graph Polish: a novel graph generation paradigm for molecular optimization. Preprint at https://arxiv.org/abs/2008.06246 (2021).
Lim, J., Hwang, S.-Y., Moon, S., Kim, S. & Kim, W. Y. Scaffold-based molecular design with a graph generative model. Chem. Sci. 11, 1153–1164 (2020).
Ahn, S., Kim, J., Lee, H. & Shin, J. Guiding deep molecular optimization with genetic exploration. In Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle, H. et al.) (Curran Associates, 2020).
Nigam, A., Friederich, P., Krenn, M. & Aspuru-Guzik, A. Augmenting genetic algorithms with deep neural networks for exploring the chemical space. In Proc. 8th International Conference on Learning Representations (OpenReview.net, 2020).
Wildman, S. A. & Crippen, G. M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inf. Comput. Sci. 39, 868–873 (1999).
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminf. 1, 8 (2009).
Sterling, T. & Irwin, J. J. ZINC 15—ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337 (2015).
Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y. & Martineau, P. An exact graph edit distance algorithm for solving pattern recognition problems. In Proc. International Conference on Pattern Recognition Applications and Methods Vol. 1, 271–278 (SciTePress, 2015).
Sanfeliu, A. & Fu, K. A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern. SMC-13, 353–362 (1983).
Lipinski, C. A. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337–341 (2004).
Ghose, A. K., Viswanadhan, V. N. & Wendoloski, J. J. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J. Comb. Chem. 1, 55–68 (1999).
Whiteson, S., Tanner, B., Taylor, M. E. & Stone, P. Protecting against evaluation overfitting in empirical reinforcement learning. In Proc. 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (eds Sarangapani, J. et al.) 120–127 (IEEE, 2011).
Zhang, C., Vinyals, O., Munos, R. & Bengio, S. A study on overfitting in deep reinforcement learning. Preprint at https://arxiv.org/abs/1804.06893 (2018).
Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 46, 3–26 (2001).
Rokitskaya, T. I., Luzhkov, V. B., Korshunova, G. A., Tashlitsky, V. N. & Antonenko, Y. N. Effect of methyl and halogen substituents on the transmembrane movement of lipophilic ions. Phys. Chem. Chem. Phys. 21, 23355–23363 (2019).
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).
Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminf. 9, 48 (2017).
Kusner, M. J., Paige, B. & Hernández-Lobato, J. M. Grammar variational autoencoder. In Proc. 34th International Conference on Machine Learning, Proceedings of Machine Learning Research Vol. 70 (eds Precup, D. & Teh, Y. W.) 1945–1954 (PMLR, 2017).
De Cao, N. & Kipf, T. MolGAN: an implicit generative model for small molecular graphs. In ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models (2018).
Zhou, Z., Kearnes, S., Li, L., Zare, R. N. & Riley, P. Optimization of molecules via deep reinforcement learning. Sci. Rep. 9, 10752 (2019).
Wainberg, M., Merico, D., Delong, A. & Frey, B. J. Deep learning in biomedicine. Nat. Biotechnol. 36, 829–838 (2018).
Kim, S. et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 49, D1388–D1395 (2020).
Gao, W. & Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 60, 5714–5723 (2020).
Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).
Kishimoto, A., Buesser, B., Chen, B. & Botea, A. Depth-first proof-number search with heuristic edge cost and application to chemical synthesis planning. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. M. et al.) 7224–7234 (Curran Associates, 2019).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702 (2020).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Liu, J. & Ning, X. Multi-assay-based compound prioritization via assistance utilization: a machine learning framework. J. Chem. Inf. Model. 57, 484–498 (2017).
Liu, J. & Ning, X. Differential compound prioritization via bidirectional selectivity push with power. J. Chem. Inf. Model. 57, 2958–2975 (2017).
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 1263–1272 (PMLR, 2017).
Xu, K., Hu, W., Leskovec, J. & Jegelka, S. How powerful are graph neural networks? In Proc. 7th International Conference on Learning Representations (OpenReview.net, 2019).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In Proc. 2nd International Conference on Learning Representations (eds Bengio, Y. & LeCun, Y.) (OpenReview.net, 2014).
Wildman, S. A. & Crippen, G. M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inf. Comput. Sci. 39, 868–873 (1999).
Reddi, S. J., Kale, S. & Kumar, S. On the convergence of Adam and beyond. In Proc. 6th International Conference on Learning Representations (OpenReview.net, 2018).
Chen, Z. A deep generative model for molecule optimization via one fragment modification. Zenodo https://doi.org/10.5281/zenodo.4667928 (2021).

Acknowledgements

This project was made possible, in part, by support from the National Science Foundation grant nos. IIS-1855501 (X.N.), IIS-1827472 (X.N.), IIS-2133650 (X.N. and S.P.) and OAC-2018627 (S.P.), the National Library of Medicine grant nos. 1R01LM012605-01A1 (X.N.) and 1R21LM013678-01 (X.N.), an AWS Machine Learning Research Award (X.N.) and The Ohio State University President’s Research Excellence programme (X.N.). Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies. We thank X. Wang and X. Cheng for their constructive comments.

Author information

Authors and Affiliations

Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Ziqi Chen, Srinivasan Parthasarathy & Xia Ning
Machine Learning Department, NEC Labs America, Princeton, NJ, USA
Martin Renqiang Min
Translational Data Analytics Institute, The Ohio State University, Columbus, OH, USA
Srinivasan Parthasarathy & Xia Ning
Biomedical Informatics, The Ohio State University, Columbus, OH, USA
Xia Ning

Contributions

X.N. conceived the research. X.N. and S.P. obtained funding for the research and co-supervised Z.C. Z.C., M.R.M., S.P. and X.N. designed the research. Z.C. and X.N. conducted the research, including data curation, formal analysis, methodology design and implementation, result analysis and visualization. Z.C. drafted the original manuscript. M.R.M. provided comments on the original manuscript. Z.C., X.N. and S.P. conducted the manuscript editing and revision. All authors reviewed the final manuscript.

Corresponding author

Correspondence to Xia Ning.

Ethics declarations

Competing interests

M.R.M. was employed by NEC Labs America. The remaining authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Michael Withnall and Benjamin Sanchez-Lengeling for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Sections 1–14, Discussion, Tables 1–11, Figs. 1–9, Results and Algorithms 1–5.

Source data

Source Data Fig. 1

SMILES strings of molecules in Fig. 1.

Source Data Fig. 2

SMILES strings of molecules in Fig. 2.

About this article

Cite this article

Chen, Z., Min, M.R., Parthasarathy, S. et al. A deep generative model for molecule optimization via one fragment modification. Nat Mach Intell 3, 1040–1049 (2021). https://doi.org/10.1038/s42256-021-00410-2

Received: 27 December 2020
Accepted: 04 October 2021
Published: 09 December 2021
Issue Date: December 2021
DOI: https://doi.org/10.1038/s42256-021-00410-2
