Improving de novo molecular design with curriculum learning

An Author Correction to this article was published on 20 July 2022

This article has been updated

A preprint version of the article is available at ChemRxiv.


Reinforcement learning is a powerful paradigm that has gained popularity across multiple domains. However, applying reinforcement learning may come at the cost of many interactions between the agent and the environment. This cost can be especially pronounced when each feedback signal from the environment is slow or computationally expensive to obtain, causing extended periods of non-productivity. Curriculum learning provides a suitable alternative by arranging a sequence of tasks of increasing complexity, with the aim of reducing the overall cost of learning. Here we demonstrate the application of curriculum learning for drug discovery. We implement curriculum learning in the de novo design platform REINVENT, and apply it to illustrative molecular design problems of different complexities. The results show both accelerated learning and a positive impact on the quality of the output when compared with standard policy-based reinforcement learning.
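The idea of arranging tasks of increasing complexity can be illustrated with a toy sketch. The example below is not the REINVENT implementation and does not use its API. It assumes a hypothetical bit-string "agent" (one emission probability per position), hypothetical curriculum objectives that reward matching progressively longer prefixes of a target, and a crude policy-gradient-style update with a batch-mean baseline. All function names, the progression threshold and the step limit are illustrative assumptions.

```python
import random


def sample(policy, rng):
    """Sample a bit string; policy[i] is the probability of emitting 1 at position i."""
    return [1 if rng.random() < p else 0 for p in policy]


def make_objective(target, prefix_len):
    """Hypothetical objective: fraction of the first `prefix_len` target bits matched."""
    def score(bits):
        hits = sum(b == t for b, t in zip(bits[:prefix_len], target[:prefix_len]))
        return hits / prefix_len
    return score


def update(policy, batch, scores, lr=0.5):
    """Nudge the policy toward samples scoring above the batch mean (toy update)."""
    baseline = sum(scores) / len(scores)
    new_policy = list(policy)
    for bits, s in zip(batch, scores):
        advantage = s - baseline
        for i, b in enumerate(bits):
            new_policy[i] += lr * advantage * (b - policy[i]) / len(batch)
    # Keep probabilities away from 0 and 1 so the agent can still explore.
    return [min(0.99, max(0.01, p)) for p in new_policy]


def run_curriculum(policy, objectives, threshold=0.9, batch_size=64,
                   max_steps=500, seed=0):
    """Train on each objective in order; progress only once the mean score
    of a sampled batch reaches `threshold`. Returns the final policy and a
    history of (objective_index, steps_used, threshold_reached) tuples."""
    rng = random.Random(seed)
    history = []
    for idx, objective in enumerate(objectives):
        reached = False
        for step in range(1, max_steps + 1):
            batch = [sample(policy, rng) for _ in range(batch_size)]
            scores = [objective(bits) for bits in batch]
            if sum(scores) / batch_size >= threshold:
                reached = True
                break
            policy = update(policy, batch, scores)
        history.append((idx, step, reached))
        if not reached:
            break  # do not move on to a harder task before the easier one is learned
    return policy, history


if __name__ == "__main__":
    target = [1, 0, 1, 1, 0, 1, 0, 0]
    # Curriculum of increasing complexity: match 2, then 4, then all 8 bits.
    curriculum = [make_objective(target, k) for k in (2, 4, 8)]
    final_policy, history = run_curriculum([0.5] * len(target), curriculum)
    for idx, steps, reached in history:
        print(f"objective {idx}: {steps} steps, threshold reached: {reached}")
```

In this sketch the agent carries its learned policy from one objective to the next, so the harder objectives start from a policy that already satisfies the easier ones. Whether that saves environment interactions in practice depends on the task and on how the curriculum is constructed.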


Fig. 1: CL overview.
Fig. 2: CL target scaffold construction.
Fig. 3: Baseline RL versus CL to design PDK1 inhibitors.
Fig. 4: Baseline RL versus CL docking score distribution.
Fig. 5: Baseline RL versus CL unique Bemis–Murcko scaffolds.
Fig. 6: Agent knowledge retention and effects of curriculum objectives on the solution space diversity.


Data availability

The trained generative model to reproduce the experiments in this work is provided at The raw data that support the findings of this study are available from the corresponding author upon request.

Code availability

The code used in this study is available at A corresponding tutorial for the code is available at The specific frozen version of the code is available at (ref. 48). The DOI badge is provided at



  1. Jiménez-Luna, J., Grisoni, F., Weskamp, N. & Schneider, G. Artificial intelligence in drug discovery: recent advances and future perspectives. Expert Opin. Drug Discov. 16, 949–959 (2021).

  2. Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2020).


  3. Polishchuk, P. G., Madzhidov, T. I. & Varnek, A. Estimation of the size of drug-like chemical space based on GDB-17 data. J. Comput. Aided Mol. Des. 27, 675–679 (2013).


  4. Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224–229 (2019).


  5. Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 601, 452–459 (2022).


  6. Arús-Pous, J. et al. Randomized SMILES strings improve the quality of molecular generative models. J. Cheminformatics 11, 71 (2019).


  7. Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).


  8. Blaschke, T. et al. REINVENT 2.0: an AI tool for de novo drug design. J. Chem. Inf. Model. 60, 5918–5922 (2020).


  9. Thomas, M., Smith, R. T., O’Boyle, N. M., de Graaf, C. & Bender, A. Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study. J. Cheminformatics 13, 39 (2021).


  10. Goel, M., Raghunathan, S., Laghuvarapu, S. & Priyakumar, U. D. MoleGuLAR: molecule generation using reinforcement learning with alternating rewards. J. Chem. Inf. Model. 61, 5815–5826 (2021).

  11. Ståhl, N., Falkman, G., Karlsson, A., Mathiason, G. & Boström, J. Deep reinforcement learning for multiparameter optimization in de novo drug design. J. Chem. Inf. Model. 59, 3166–3176 (2019).


  12. Guimaraes, G. L., Sanchez-Lengeling, B., Outeiral, C., Farias, P. L. C. & Aspuru-Guzik, A. Objective-Reinforced Generative Adversarial Networks (ORGAN) for sequence generation models. Preprint at (2017).

  13. Sanchez-Lengeling, B., Outeiral, C. & Guimaraes, G. L. Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (ORGANIC). Preprint at (2017).

  14. Zhou, Z., Kearnes, S., Li, L., Zare, R. N. & Riley, P. Optimization of molecules via deep reinforcement learning. Sci. Rep. 9, 10752 (2019).


  15. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).


  16. Ma, B. et al. Structure-based de novo molecular generator combined with artificial intelligence and docking simulations. J. Chem. Inf. Model. 61, 3304–3313 (2021).


  17. Bai, Q. et al. MolAICal: a soft tool for 3D drug design of protein targets by artificial intelligence and classical algorithm. Brief. Bioinform. 22, bbaa161 (2021).

  18. Choi, J. & Lee, J. V-dock: fast generation of novel drug-like molecules using machine-learning-based docking score and molecular optimization. Int. J. Mol. Sci. 22, 11635 (2021).


  19. Nigam, A., Pollice, R. & Aspuru-Guzik, A. JANUS: parallel tempered genetic algorithm guided by deep neural networks for inverse molecular design. Preprint at (2021).

  20. Nicolaou, C. A., Apostolakis, J. & Pattichis, C. S. De novo drug design using multiobjective evolutionary graphs. J. Chem. Inf. Model. 49, 295–307 (2009).


  21. Bengio, Y., Louradour, J., Collobert, R. & Weston, J. Curriculum learning. In ICML’09: Proc. 26th Annual International Conference on Machine Learning 41–48 (ACM, 2009).

  22. Weinshall, D., Cohen, G. & Amir, D. Curriculum learning by transfer learning: theory and experiments with deep networks. Preprint at (2018).

  23. Hacohen, G. & Weinshall, D. On the power of curriculum learning in training deep networks. In Proc. 36th International Conference on Machine Learning 2535–2544 (PMLR, 2019).

  24. Zhao, H. Scaffold selection and scaffold hopping in lead generation: a medicinal chemistry perspective. Drug Discov. Today 12, 149–155 (2007).


  25. Angiolini, M. et al. Structure-based optimization of potent PDK1 inhibitors. Bioorg. Med. Chem. Lett. 20, 4095–4099 (2010).


  26. Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).


  27. ROCS (OpenEye Scientific Software, 2021).

  28. Hawkins, P. C. D., Skillman, A. G. & Nicholls, A. Comparison of shape-matching and docking as virtual screening tools. J. Med. Chem. 50, 74–82 (2007).

  29. Schrödinger Release 2019-4: LigPrep (Schrödinger, 2019).

  30. Schrödinger Release 2019-4: Glide (Schrödinger, 2019).

  31. Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).


  32. Halgren, T. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 47, 1750–1759 (2004).


  33. Friesner, R. A. et al. Extra Precision Glide: docking and scoring incorporating a model of hydrophobic enclosure for protein–ligand complexes. J. Med. Chem. 49, 6177–6196 (2006).


  34. Alex, A., Millan, D. S., Perez, M., Wakenhut, F. & Whitlock, G. A. Intramolecular hydrogen bonding to improve membrane permeability and absorption in beyond rule of five chemical space. MedChemComm 2, 669–674 (2011).


  35. Nettles, J. H. et al. Bridging chemical and biological space: ‘target fishing’ using 2D and 3D molecular descriptors. J. Med. Chem. 49, 6802–6810 (2006).


  36. Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).


  37. McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for dimension reduction. Preprint at (2018).

  38. Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).


  39. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminformatics 9, 48 (2017).


  40. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).


  41. Gaulton, A. et al. The ChEMBL database in 2017. Nucleic Acids Res. 45, D945–D954 (2017).


  42. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at (2017).

  43. Blaschke, T., Engkvist, O., Bajorath, J. & Chen, H. Memory-assisted reinforcement learning for diverse molecular de novo design. J. Cheminformatics 12, 68 (2020).


  44. Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T. P. & Wayne, G. Experience replay for continual learning. Preprint at (2019).

  45. Papadopoulos, K., Giblin, K. A., Janet, J. P., Patronov, A. & Engkvist, O. De novo design with deep generative models based on 3D similarity scoring. Bioorg. Med. Chem. 44, 116308 (2021).


  46. Schrödinger Release 2021-2: Maestro (Schrödinger, 2021).

  47. Guo, J. et al. DockStream: a docking wrapper to enhance de novo molecular design. J. Cheminformatics 13, 89 (2021).


  48. Patronov, A., Margreitter, C., Guo, J. & Blaschke, T. patronov/Reinvent: REINVENT 3.2 (v3.2). Zenodo (2022).



We thank K. Giblin, A. Tomberg and E. Nittinger for constructive user feedback that helped us develop the concepts presented in this work.

Author information

Authors and Affiliations



V.F., J.G., J.D.A. and A.P. developed the code. J.G., A.P., J.P.J., C.M. and K.P. designed the experiments. J.G. performed the experiments and analyses. J.G. wrote the manuscript and all other authors revised it. A.P. supervised the work. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Atanas Patronov.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Christos Nicolaou and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Supplementary Figs. 1–20, Discussion and Tables 1–4.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Guo, J., Fialková, V., Arango, J.D. et al. Improving de novo molecular design with curriculum learning. Nat Mach Intell 4, 555–563 (2022).
