Recent advances in generative modelling allow designing novel compounds through deep neural networks. One such neural network model, JT-VAE (the Junction Tree Variational Auto-Encoder), excels at proposing chemically valid structures. Here, on the basis of JT-VAE, we built a generative modelling approach, JAEGER, for finding novel chemical matter with desired bioactivity. Using JAEGER, we designed compounds to inhibit malaria. To prioritize the compounds for synthesis, we used the in-house pQSAR (Profile-QSAR) program, a massively multitask bioactivity model based on 12,000 Novartis assays. On the basis of pQSAR activity predictions, we selected, synthesized and experimentally profiled two compounds. Both compounds exhibited low nanomolar activity in a malaria proliferation assay as well as a biochemical assay measuring activity against PI(4)K, which is an essential kinase that regulates intracellular development in malaria. The compounds also showed low activity in a cytotoxicity assay. Our findings show that JAEGER is a viable approach for finding novel active compounds for drug discovery.
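The prioritization step described above (score generated compounds with an activity model, then pick a small number for synthesis) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the JAEGER or pQSAR interface: the `Candidate` fields, the `prioritize` function, the thresholds and all scores are hypothetical, and the SMILES strings are placeholder structures rather than compounds from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A generated compound with hypothetical model predictions."""
    smiles: str
    predicted_pic50: float          # predicted antimalarial potency (higher is better)
    predicted_cytotox_pic50: float  # predicted cytotoxicity (lower is better)


def prioritize(candidates, min_potency=7.0, max_cytotox=5.0, top_n=2):
    """Keep candidates that pass both thresholds, ranked by predicted potency.

    The thresholds are illustrative assumptions, not values from the article.
    """
    passing = [
        c for c in candidates
        if c.predicted_pic50 >= min_potency
        and c.predicted_cytotox_pic50 <= max_cytotox
    ]
    passing.sort(key=lambda c: c.predicted_pic50, reverse=True)
    return passing[:top_n]


# Placeholder structures and made-up scores, for illustration only.
pool = [
    Candidate("c1ccccc1", 6.2, 4.1),  # fails the potency threshold
    Candidate("c1ccncc1", 8.4, 4.3),
    Candidate("c1ccoc1", 8.9, 6.0),   # fails the cytotoxicity threshold
    Candidate("c1ccsc1", 7.6, 4.8),
]
selected = prioritize(pool)
for c in selected:
    print(c.smiles, c.predicted_pic50)
```

In practice the predictions would come from a trained bioactivity model, and the selection would likely weigh further criteria such as synthetic accessibility; the sketch only shows the filter-and-rank shape of the step.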
The data used in this study are proprietary to Novartis. The data are not publicly available due to intellectual property restrictions. A demo dataset is available from the ChEMBL – Neglected Tropical Disease archive at https://chembl.gitbook.io/chembl-ntd/downloads/deposited-set-2-novartis-gnf-whole-cell-dataset-20th-may-2010.
Godinez, W. J. & Ma, E. J. Novartis/JAEGER: Public. Zenodo https://doi.org/10.5281/zenodo.5794429 (2021).
We thank our colleagues at Novartis who collected the data used to build the malaria model. We also thank C. Sarko and W. Cortopassi for valuable discussions.
All authors are (or were at the time of their involvement with the studies) employees of Novartis.
Peer review information
Nature Machine Intelligence thanks Milad Salem and David Winkler for their contribution to the peer review of this work.
Cite this article
Godinez, W.J., Ma, E.J., Chao, A.T. et al. Design of potent antimalarials with generative chemistry. Nat Mach Intell 4, 180–186 (2022). https://doi.org/10.1038/s42256-022-00448-w