
A deep generative model for molecule optimization via one fragment modification

A preprint version of the article is available at arXiv.


Molecule optimization is a critical step in drug development to improve the desired properties of drug candidates through chemical modification. We have developed a novel deep generative model, Modof, over molecular graphs for molecule optimization. Modof modifies a given molecule through the prediction of a single site of disconnection at the molecule and the removal and/or addition of fragments at that site. A pipeline of multiple, identical Modof models is implemented into Modof-pipe to modify an input molecule at multiple disconnection sites. Here we show that Modof-pipe is able to retain major molecular scaffolds, allow controls over intermediate optimization steps and better constrain molecule similarities. Modof-pipe outperforms the state-of-the-art methods on benchmark datasets. Without molecular similarity constraints, Modof-pipe achieves 81.2% improvement in the octanol–water partition coefficient, penalized by synthetic accessibility and ring size, and 51.2%, 25.6% and 9.2% improvement if the optimized molecules are at least 0.2, 0.4 and 0.6 similar to those before optimization, respectively. Modof-pipe is further enhanced into Modof-pipem to allow modification of one molecule to multiple optimized ones. Modof-pipem achieves additional performance improvement, at least 17.8% better than Modof-pipe.
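The iterative scheme the abstract describes can be sketched as a loop: a single Modof model proposes a modification at one disconnection site, and the pipeline keeps a candidate only if it improves the target property while staying at least δ similar to the input molecule. The sketch below is a hedged illustration of that control flow only, not the authors' implementation; `modof_step`, `prop` and `sim` are hypothetical stand-ins (in the paper these would be the trained Modof model, plogP and Tanimoto similarity), and the toy demo uses strings in place of molecules.

```python
# Minimal sketch of the Modof-pipe acceptance loop described in the
# abstract. `modof_step`, `prop`, and `sim` are hypothetical stand-ins,
# NOT the authors' actual model, property oracle, or similarity measure.
from typing import Callable

def modof_pipe(
    mol: str,
    modof_step: Callable[[str], str],   # one Modof pass: modify one site
    prop: Callable[[str], float],       # property to optimize (e.g. plogP)
    sim: Callable[[str, str], float],   # similarity to the input molecule
    delta: float = 0.4,                 # similarity constraint threshold
    max_iters: int = 5,
) -> str:
    best = mol
    for _ in range(max_iters):
        candidate = modof_step(best)
        # accept only candidates that improve the property while
        # remaining at least `delta` similar to the original input
        if prop(candidate) > prop(best) and sim(mol, candidate) >= delta:
            best = candidate
        else:
            break  # stop when no admissible improvement is found
    return best

# Toy demo (not chemistry): strings as "molecules", string length as the
# property, and a length ratio as the similarity measure.
result = modof_pipe("CCO", lambda m: m + "C", len,
                    lambda a, b: len(a) / len(b), delta=0.5, max_iters=3)
```

In the toy run, each pass appends one character; all three candidates improve the "property" and stay above the 0.5 similarity floor, so the loop runs to `max_iters` and returns `"CCOCCC"`. Tightening `delta` or exhausting improving candidates stops the pipeline early, which mirrors how the similarity constraint trades off against the achievable property gain in the reported results.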


Fig. 1: Modof model overview.
Fig. 2: Modof-pipe examples for plogP optimization.
Fig. 3: Modof-pipe examples for DRD2, QED and multi-property optimization.

Data availability

The data used in this manuscript are publicly available from Chen et al.52. Source data are provided with this paper.

Code availability

The code for Modof, Modof-pipe and Modof-pipem is publicly available from Chen et al.52.


References

  1. Jorgensen, W. L. Efficient drug lead discovery and optimization. Acc. Chem. Res. 42, 724–733 (2009).

  2. Verdonk, M. L. & Hartshorn, M. J. Structure-guided fragment screening for lead discovery. Curr. Opin. Drug Discov. Dev. 7, 404–410 (2004).

  3. de Souza Neto, L. R. et al. In silico strategies to support fragment-to-lead optimization in drug discovery. Front. Chem. 8, 93 (2020).

  4. Hoffer, L. et al. Integrated strategy for lead optimization based on fragment growing: the diversity-oriented-target-focused-synthesis approach. J. Med. Chem. 61, 5719–5732 (2018).

  5. Gerry, C. J. & Schreiber, S. L. Chemical probes and drug leads from advances in synthetic planning and methodology. Nat. Rev. Drug Discov. 17, 333–352 (2018).

  6. Sattarov, B. et al. De novo molecular design by combining deep autoencoder recurrent neural networks with generative topographic mapping. J. Chem. Inf. Model. 59, 1182–1196 (2019).

  7. Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).

  8. Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proc. Machine Learning Research Vol. 80 (eds Dy, J. & Krause, A.) 2323–2332 (PMLR, 2018).

  9. You, J., Liu, B., Ying, Z., Pande, V. & Leskovec, J. Graph convolutional policy network for goal-directed molecular graph generation. In Advances in Neural Information Processing Systems Vol. 31 (eds Bengio, S. et al.) 6410–6421 (Curran Associates, 2018).

  10. Murray, C. & Rees, D. The rise of fragment-based drug discovery. Nat. Chem. 1, 187–192 (2009).

  11. Hajduk, P. J. & Greer, J. A decade of fragment-based drug design: strategic advances and lessons learned. Nat. Rev. Drug Discov. 6, 211–219 (2007).

  12. Shi, C. et al. GraphAF: a flow-based autoregressive model for molecular graph generation. In Proc. 8th International Conference on Learning Representations (2020).

  13. Zang, C. & Wang, F. MoFlow: an invertible flow model for generating molecular graphs. In Proc. 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (eds Gupta, R. et al.) 617–626 (ACM, 2020).

  14. Jin, W., Yang, K., Barzilay, R. & Jaakkola, T. S. Learning multimodal graph-to-graph translation for molecule optimization. In Proc. 7th International Conference on Learning Representations (2019).

  15. Jin, W., Barzilay, R. & Jaakkola, T. S. Hierarchical generation of molecular graphs using structural motifs. In Proc. 37th International Conference on Machine Learning, Proc. Machine Learning Research Vol. 119 (eds Daumé, H. III & Singh, A.) 4839–4848 (PMLR, 2020).

  16. Podda, M., Bacciu, D. & Micheli, A. A deep generative model for fragment-based molecule generation. In Proc. 23rd International Conference on Artificial Intelligence and Statistics, Proc. Machine Learning Research Vol. 108 (eds Chiappa, S. & Calandra, R.) 2240–2250 (PMLR, 2020).

  17. Ji, C., Zheng, Y., Wang, R., Cai, Y. & Wu, H. Graph Polish: a novel graph generation paradigm for molecular optimization. Preprint (2021).

  18. Lim, J., Hwang, S.-Y., Moon, S., Kim, S. & Kim, W. Y. Scaffold-based molecular design with a graph generative model. Chem. Sci. 11, 1153–1164 (2020).

  19. Ahn, S., Kim, J., Lee, H. & Shin, J. Guiding deep molecular optimization with genetic exploration. In Advances in Neural Information Processing Systems Vol. 33 (eds Larochelle, H. et al.) (Curran Associates, 2020).

  20. Nigam, A., Friederich, P., Krenn, M. & Aspuru-Guzik, A. Augmenting genetic algorithms with deep neural networks for exploring the chemical space. In Proc. 8th International Conference on Learning Representations (2020).

  21. Wildman, S. A. & Crippen, G. M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inf. Comput. Sci. 39, 868–873 (1999).

  22. Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminf. 1, 8 (2009).

  23. Sterling, T. & Irwin, J. J. ZINC 15 – ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337 (2015).

  24. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

  25. Abu-Aisheh, Z., Raveaux, R., Ramel, J.-Y. & Martineau, P. An exact graph edit distance algorithm for solving pattern recognition problems. In Proc. International Conference on Pattern Recognition Applications and Methods Vol. 1, 271–278 (SciTePress, 2015).

  26. Sanfeliu, A. & Fu, K. A distance measure between attributed relational graphs for pattern recognition. IEEE Trans. Syst. Man Cybern. SMC-13, 353–362 (1983).

  27. Lipinski, C. A. Lead- and drug-like compounds: the rule-of-five revolution. Drug Discov. Today Technol. 1, 337–341 (2004).

  28. Ghose, A. K., Viswanadhan, V. N. & Wendoloski, J. J. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J. Comb. Chem. 1, 55–68 (1999).

  29. Whiteson, S., Tanner, B., Taylor, M. E. & Stone, P. Protecting against evaluation overfitting in empirical reinforcement learning. In Proc. 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (eds Sarangapani, J. et al.) 120–127 (IEEE, 2011).

  30. Zhang, C., Vinyals, O., Munos, R. & Bengio, S. A study on overfitting in deep reinforcement learning. Preprint (2018).

  31. Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 46, 3–26 (2001).

  32. Rokitskaya, T. I., Luzhkov, V. B., Korshunova, G. A., Tashlitsky, V. N. & Antonenko, Y. N. Effect of methyl and halogen substituents on the transmembrane movement of lipophilic ions. Phys. Chem. Chem. Phys. 21, 23355–23363 (2019).

  33. Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).

  34. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminf. 9, 48 (2017).

  35. Kusner, M. J., Paige, B. & Hernández-Lobato, J. M. Grammar variational autoencoder. In Proc. 34th International Conference on Machine Learning, Proc. Machine Learning Research Vol. 70 (eds Precup, D. & Teh, Y. W.) 1945–1954 (PMLR, 2017).

  36. De Cao, N. & Kipf, T. MolGAN: an implicit generative model for small molecular graphs. In ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models (2018).

  37. Zhou, Z., Kearnes, S., Li, L., Zare, R. N. & Riley, P. Optimization of molecules via deep reinforcement learning. Sci. Rep. 9, 10752 (2019).

  38. Wainberg, M., Merico, D., Delong, A. & Frey, B. J. Deep learning in biomedicine. Nat. Biotechnol. 36, 829–838 (2018).

  39. Kim, S. et al. PubChem in 2021: new data content and improved web interfaces. Nucleic Acids Res. 49, D1388–D1395 (2020).

  40. Gao, W. & Coley, C. W. The synthesizability of molecules proposed by generative models. J. Chem. Inf. Model. 60, 5714–5723 (2020).

  41. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

  42. Kishimoto, A., Buesser, B., Chen, B. & Botea, A. Depth-first proof-number search with heuristic edge cost and application to chemical synthesis planning. In Advances in Neural Information Processing Systems Vol. 32 (eds Wallach, H. M. et al.) 7224–7234 (Curran Associates, 2019).

  43. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702 (2020).

  44. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  45. Liu, J. & Ning, X. Multi-assay-based compound prioritization via assistance utilization: a machine learning framework. J. Chem. Inf. Model. 57, 484–498 (2017).

  46. Liu, J. & Ning, X. Differential compound prioritization via bidirectional selectivity push with power. J. Chem. Inf. Model. 57, 2958–2975 (2017).

  47. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning Vol. 70 (eds Precup, D. & Teh, Y. W.) 1263–1272 (PMLR, 2017).

  48. Xu, K., Hu, W., Leskovec, J. & Jegelka, S. How powerful are graph neural networks? In Proc. 7th International Conference on Learning Representations (2019).

  49. Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In Proc. 2nd International Conference on Learning Representations (eds Bengio, Y. & LeCun, Y.) (2014).

  50. Wildman, S. A. & Crippen, G. M. Prediction of physicochemical parameters by atomic contributions. J. Chem. Inf. Comput. Sci. 39, 868–873 (1999).

  51. Reddi, S. J., Kale, S. & Kumar, S. On the convergence of Adam and beyond. In Proc. 6th International Conference on Learning Representations (2018).

  52. Chen, Z. A deep generative model for molecule optimization via one fragment modification. Zenodo (2021).



Acknowledgements

This project was made possible, in part, by support from the National Science Foundation grant nos. IIS-1855501 (X.N.), IIS-1827472 (X.N.), IIS-2133650 (X.N. and S.P.) and OAC-2018627 (S.P.), the National Library of Medicine grant nos. 1R01LM012605-01A1 (X.N.) and 1R21LM013678-01 (X.N.), an AWS Machine Learning Research Award (X.N.) and The Ohio State University President’s Research Excellence programme (X.N.). Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies. We thank X. Wang and X. Cheng for their constructive comments.

Author information




Contributions

X.N. conceived the research. X.N. and S.P. obtained funding for the research and co-supervised Z.C. Z.C., M.R.M., S.P. and X.N. designed the research. Z.C. and X.N. conducted the research, including data curation, formal analysis, methodology design and implementation, result analysis and visualization. Z.C. drafted the original manuscript. M.R.M. provided comments on the original manuscript. Z.C., X.N. and S.P. conducted the manuscript editing and revision. All authors reviewed the final manuscript.

Corresponding author

Correspondence to Xia Ning.

Ethics declarations

Competing interests

M.R.M. was employed by NEC Labs America. The remaining authors declare no competing interests.

Additional information

Peer review information Nature Machine Intelligence thanks Michael Withnall and Benjamin Sanchez-Lengeling for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Sections 1–14, Discussion, Tables 1–11, Figs. 1–9, Results and Algorithms 1–5.

Source data

Source Data Fig. 1

SMILES strings of molecules in Fig. 1.

Source Data Fig. 2

SMILES strings of molecules in Fig. 2.


About this article


Cite this article

Chen, Z., Min, M.R., Parthasarathy, S. et al. A deep generative model for molecule optimization via one fragment modification. Nat Mach Intell 3, 1040–1049 (2021).


