Abstract
Reinforcement learning is a powerful paradigm that has gained popularity across multiple domains. However, applying reinforcement learning can require many interactions between the agent and the environment. This cost is especially pronounced when each round of feedback from the environment is slow or computationally expensive, causing long unproductive periods. Curriculum learning offers an alternative: it arranges a sequence of tasks of increasing complexity, with the aim of reducing the overall cost of learning. Here we demonstrate the application of curriculum learning to drug discovery. We implement curriculum learning in the de novo design platform REINVENT and apply it to illustrative molecular design problems of different complexities. The results show both accelerated learning and a positive impact on the quality of the output when compared with standard policy-based reinforcement learning.
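The idea of arranging tasks of increasing complexity can be illustrated with a minimal toy sketch. It is not REINVENT's implementation or API: the names `CurriculumTask`, `run_curriculum`, `shaped_score` and `hill_climb` are hypothetical, and a single number stands in for the generative model's policy. The sketch assumes each task exposes a score in [0, 1] and a threshold that must be reached before moving to the next task.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

ScoreFn = Callable[[float], float]


@dataclass
class CurriculumTask:
    """One curriculum stage (hypothetical): a score in [0, 1] and the score needed to advance."""
    name: str
    score: ScoreFn
    threshold: float


def shaped_score(target: float, width: float) -> ScoreFn:
    """Score that decays linearly to 0 at distance `width` from `target`."""
    return lambda x: max(0.0, 1.0 - abs(x - target) / width)


def hill_climb(state: float, score: ScoreFn, step: float = 0.25) -> float:
    """Toy stand-in for a policy update: keep the best of stay, step up, step down."""
    # max() returns the first maximal item, so the agent stays put on ties.
    return max([state, state + step, state - step], key=score)


def run_curriculum(
    tasks: List[CurriculumTask],
    update: Callable[[float, ScoreFn], float],
    state: float,
    max_steps_per_task: int = 100,
) -> Tuple[float, List[Tuple[str, int]]]:
    """Train on each task in order, advancing once its threshold is reached."""
    history = []
    for task in tasks:
        steps = 0
        while task.score(state) < task.threshold and steps < max_steps_per_task:
            state = update(state, task.score)
            steps += 1
        history.append((task.name, steps))
    return state, history


# The hard objective gives no signal far from the target (score is 0 for |x - 8| >= 4).
narrow = CurriculumTask("narrow", shaped_score(target=8.0, width=4.0), threshold=0.9375)
# An easier task with a broad score guides the agent towards the right region first.
broad = CurriculumTask("broad", shaped_score(target=8.0, width=16.0), threshold=0.75)

# Curriculum: broad task first, then the narrow objective.
final_state, history = run_curriculum([broad, narrow], hill_climb, state=0.0)

# Baseline: the narrow objective alone, starting from the same state.
direct_state, direct_history = run_curriculum([narrow], hill_climb, state=0.0)

print(final_state, history)
print(direct_state, direct_history)
```

In this toy setting the baseline never leaves its starting point because the narrow score is flat there, whereas the curriculum run first follows the broad score into the region where the narrow score becomes informative. This only illustrates the general mechanism and says nothing about the magnitude of the effect in molecular design.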
Data availability
The trained generative model to reproduce the experiments in this work is provided at https://github.com/MolecularAI/ReinventCommunity/blob/master/notebooks/models/random.prior.new. The raw data that support the findings of this study are available from the corresponding author upon request.
Code availability
The code used in this study is available at https://github.com/MolecularAI/Reinvent. A corresponding tutorial for the code is available at https://github.com/MolecularAI/ReinventCommunity/blob/master/notebooks/Automated_Curriculum_Learning_Demo.ipynb. The specific frozen version of the code is available at https://zenodo.org/badge/latestdoi/486692494 (ref. 48). The DOI badge is provided at https://zenodo.org/badge/486692494.svg.
Change history
20 July 2022
A Correction to this paper has been published: https://doi.org/10.1038/s42256-022-00522-3
References
Jiménez-Luna, J., Grisoni, F., Weskamp, N. & Schneider, G. Artificial intelligence in drug discovery: recent advances and future perspectives. Expert Opin. Drug Discov. 16, 949–959 (2021).
Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2020).
Polishchuk, P. G., Madzhidov, T. I. & Varnek, A. Estimation of the size of drug-like chemical space based on GDB-17 data. J. Comput. Aided Mol. Des. 27, 675–679 (2013).
Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224–229 (2019).
Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 601, 452–459 (2022).
Arús-Pous, J. et al. Randomized SMILES strings improve the quality of molecular generative models. J. Cheminformatics 11, 71 (2019).
Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).
Blaschke, T. et al. REINVENT 2.0: an AI tool for de novo drug design. J. Chem. Inf. Model. 60, 5918–5922 (2020).
Thomas, M., Smith, R. T., O’Boyle, N. M., de Graaf, C. & Bender, A. Comparison of structure- and ligand-based scoring functions for deep generative models: a GPCR case study. J. Cheminformatics 13, 39 (2021).
Goel, M., Raghunathan, S., Laghuvarapu, S. & Priyakumar, U. D. MoleGuLAR: Molecule Generation Using Reinforcement Learning with Alternating Rewards. J. Chem. Inf. Model. 61, 5815–5826 (2021).
Ståhl, N., Falkman, G., Karlsson, A., Mathiason, G. & Boström, J. Deep reinforcement learning for multiparameter optimization in de novo drug design. J. Chem. Inf. Model. 59, 3166–3176 (2019).
Guimaraes, G. L., Sanchez-Lengeling, B., Outeiral, C., Farias, P. L. C. & Aspuru-Guzik, A. Objective-Reinforced Generative Adversarial Networks (ORGAN) for sequence generation models. Preprint at https://arxiv.org/abs/1705.10843 (2017).
Sanchez-Lengeling, B., Outeiral, C. & Guimaraes, G. L. Optimizing distributions over molecular space. An objective-reinforced generative adversarial network for inverse-design chemistry (ORGANIC). Preprint at https://doi.org/10.26434/chemrxiv.5309668.v3 (2017).
Zhou, Z., Kearnes, S., Li, L., Zare, R. N. & Riley, P. Optimization of molecules via deep reinforcement learning. Sci. Rep. 9, 10752 (2019).
Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
Ma, B. et al. Structure-based de novo molecular generator combined with artificial intelligence and docking simulations. J. Chem. Inf. Model. 61, 3304–3313 (2021).
Bai, Q. et al. MolAICal: a soft tool for 3D drug design of protein targets by artificial intelligence and classical algorithm. Brief. Bioinform. 22, bbaa161 (2021).
Choi, J. & Lee, J. V-dock: fast generation of novel drug-like molecules using machine-learning-based docking score and molecular optimization. Int. J. Mol. Sci. 22, 11635 (2021).
Nigam, A., Pollice, R. & Aspuru-Guzik, A. JANUS: parallel tempered genetic algorithm guided by deep neural networks for inverse molecular design. Preprint at https://arxiv.org/abs/2106.04011 (2021).
Nicolaou, C. A., Apostolakis, J. & Pattichis, C. S. De novo drug design using multiobjective evolutionary graphs. J. Chem. Inf. Model. 49, 295–307 (2009).
Bengio, Y., Louradour, J., Collobert, R. & Weston, J. Curriculum learning. In ICML’09: Proc. 26th Annual International Conference on Machine Learning 41–48 (ACM, 2009); https://doi.org/10.1145/1553374.1553380
Weinshall, D., Cohen, G. & Amir, D. Curriculum learning by transfer learning: theory and experiments with deep networks. Preprint at https://arxiv.org/abs/1802.03796 (2018).
Hacohen, G. & Weinshall, D. On the power of curriculum learning in training deep networks. Proc. 36th International Conference on Machine Learning 2535–2544 (PMLR, 2019).
Zhao, H. Scaffold selection and scaffold hopping in lead generation: a medicinal chemistry perspective. Drug Discov. Today 12, 149–155 (2007).
Angiolini, M. et al. Structure-based optimization of potent PDK1 inhibitors. Bioorg. Med. Chem. Lett. 20, 4095–4099 (2010).
Bickerton, G. R., Paolini, G. V., Besnard, J., Muresan, S. & Hopkins, A. L. Quantifying the chemical beauty of drugs. Nat. Chem. 4, 90–98 (2012).
ROCS 3.4.2.1 (OpenEye Scientific Software, 2021).
Hawkins, P. C. D., Skillman, A. G. & Nicholls, A. Comparison of shape-matching and docking as virtual screening tools. J. Med. Chem. 50, 74–82 (2007).
Schrödinger Release 2019-4: LigPrep (Schrödinger, 2019).
Schrödinger Release 2019-4: Glide (Schrödinger, 2019).
Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47, 1739–1749 (2004).
Halgren, T. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 47, 1750–1759 (2004).
Friesner, R. A. et al. Extra Precision Glide: docking and scoring incorporating a model of hydrophobic enclosure for protein–ligand complexes. J. Med. Chem. 49, 6177–6196 (2006).
Alex, A., Millan, D. S., Perez, M., Wakenhut, F. & Whitlock, G. A. Intramolecular hydrogen bonding to improve membrane permeability and absorption in beyond rule of five chemical space. MedChemComm 2, 669–674 (2011).
Nettles, J. H. et al. Bridging chemical and biological space: ‘target fishing’ using 2D and 3D molecular descriptors. J. Med. Chem. 49, 6802–6810 (2006).
Bemis, G. W. & Murcko, M. A. The properties of known drugs. 1. Molecular frameworks. J. Med. Chem. 39, 2887–2893 (1996).
McInnes, L., Healy, J. & Melville, J. UMAP: Uniform Manifold Approximation and Projection for dimension reduction. Preprint at https://arxiv.org/abs/1802.03426 (2018).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminformatics 9, 48 (2017).
Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).
Gaulton, A. et al. The ChEMBL database in 2017. Nucleic Acids Res. 45, D945–D954 (2017).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2017).
Blaschke, T., Engkvist, O., Bajorath, J. & Chen, H. Memory-assisted reinforcement learning for diverse molecular de novo design. J. Cheminformatics 12, 68 (2020).
Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T. P. & Wayne, G. Experience replay for continual learning. Preprint at https://arxiv.org/abs/1811.11682 (2019).
Papadopoulos, K., Giblin, K. A., Janet, J. P., Patronov, A. & Engkvist, O. De novo design with deep generative models based on 3D similarity scoring. Bioorg. Med. Chem. 44, 116308 (2021).
Schrödinger Release 2021-2: Maestro (Schrödinger, 2021).
Guo, J. et al. DockStream: a docking wrapper to enhance de novo molecular design. J. Cheminformatics 13, 89 (2021).
Patronov, A., Margreitter, C., Guo, J. & Blaschke, T. patronov/Reinvent: REINVENT 3.2 (v3.2). Zenodo https://doi.org/10.5281/zenodo.6502363 (2022).
Acknowledgements
We thank K. Giblin, A. Tomberg and E. Nittinger for constructive user feedback that helped us develop the concepts presented in this work.
Author information
Contributions
V.F., J.G., J.D.A. and A.P. developed the code. J.G., A.P., J.P.J., C.M. and K.P. designed the experiments. J.G. performed the experiments and analyses. J.G. wrote the manuscript and all other authors revised it. A.P. supervised the work. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Machine Intelligence thanks Christos Nicolaou and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–20, Discussion and Tables 1–4.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Guo, J., Fialková, V., Arango, J.D. et al. Improving de novo molecular design with curriculum learning. Nat Mach Intell 4, 555–563 (2022). https://doi.org/10.1038/s42256-022-00494-4
DOI: https://doi.org/10.1038/s42256-022-00494-4