Abstract
Recent advances in generative modelling allow novel compounds to be designed with deep neural networks. One such model, the Junction Tree Variational Auto-Encoder (JT-VAE), excels at proposing chemically valid structures. Here, building on JT-VAE, we developed JAEGER, a generative modelling approach for finding novel chemical matter with desired bioactivity. Using JAEGER, we designed antimalarial compounds. To prioritize the compounds for synthesis, we used the in-house pQSAR (Profile-QSAR) program, a massively multitask bioactivity model based on 12,000 Novartis assays. On the basis of the pQSAR activity predictions, we selected, synthesized and experimentally profiled two compounds. Both compounds exhibited low nanomolar activity in a malaria proliferation assay and in a biochemical assay measuring activity against PI(4)K, an essential kinase that regulates intracellular development in malaria. Both compounds also showed low activity in a cytotoxicity assay. Our findings show that JAEGER is a viable approach for finding novel active compounds for drug discovery.
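The prioritization step described above (rank generated compounds by predicted activity, then pick a few for synthesis) can be illustrated with a minimal sketch. This is not the pQSAR program or the authors' selection procedure: the `Candidate` fields, the `prioritize` function, the thresholds and the example values are all hypothetical, and the predicted values stand in for whatever a bioactivity model would return.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A generated compound with model-predicted activities (all hypothetical)."""
    name: str
    pred_malaria_pic50: float  # predicted potency; higher means more potent
    pred_cytotox_pic50: float  # predicted cytotoxicity; higher means more toxic


def prioritize(candidates: List[Candidate],
               min_potency: float = 7.0,
               min_window: float = 2.0,
               top_k: int = 2) -> List[Candidate]:
    """Keep potent candidates with a potency/cytotoxicity margin, then rank.

    min_potency and min_window are illustrative thresholds in pIC50 units,
    not values taken from the study.
    """
    kept = [
        c for c in candidates
        if c.pred_malaria_pic50 >= min_potency
        and (c.pred_malaria_pic50 - c.pred_cytotox_pic50) >= min_window
    ]
    # Most potent first; sorted() is stable, so ties keep their input order.
    ranked = sorted(kept, key=lambda c: c.pred_malaria_pic50, reverse=True)
    return ranked[:top_k]


# Made-up predictions for four generated candidates.
pool = [
    Candidate("cand-A", 8.2, 5.0),
    Candidate("cand-B", 7.5, 6.5),  # potent, but too close to cytotoxicity
    Candidate("cand-C", 6.1, 4.0),  # below the potency threshold
    Candidate("cand-D", 7.9, 5.2),
]
selected = prioritize(pool)
print([c.name for c in selected])
```

Filtering on the margin between predicted potency and predicted cytotoxicity mirrors the abstract's two criteria (activity in the malaria assays, low activity in the cytotoxicity assay); a real workflow would likely weigh further properties, such as synthetic accessibility.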
Data availability
The data used in this study are proprietary to Novartis. The data are not publicly available due to intellectual property restrictions. A demo dataset is available from the ChEMBL – Neglected Tropical Disease archive at https://chembl.gitbook.io/chembl-ntd/downloads/deposited-set-2-novartis-gnf-whole-cell-dataset-20th-may-2010.
Code availability
The code for JAEGER is available in Supplementary Software and at https://github.com/Novartis/JAEGER (ref. 34).
References
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Keshavarzi Arshadi, A., Salem, M., Collins, J., Yuan, J. S. & Chakrabarti, D. DeepMalaria: artificial intelligence driven discovery of potent antiplasmodials. Front. Pharmacol. 10, 1526 (2019).
Lima, M. N. N. et al. Integrative multi-kinase approach for the identification of potent antiplasmodial hits. Front. Chem. 7, 773 (2019).
Bharti, D. R. & Lynn, A. M. QSAR based predictive modeling for anti-malarial molecules. Bioinformation 13, 154–159 (2017).
Winkler, D. A. Use of artificial intelligence and machine learning for discovery of drugs for neglected tropical diseases. Front. Chem. 9, 614073 (2021).
Rotstein, S. H. & Murcko, M. A. GroupBuild: a fragment-based method for de novo drug design. J. Med. Chem. 36, 1700–1710 (1993).
Ertl, P. & Lewis, R. IADE: a system for intelligent automatic design of bioisosteric analogs. J. Comput. Aided Mol. Des. 26, 1207–1215 (2012).
Vanhaelen, Q., Lin, Y. C. & Zhavoronkov, A. The advent of generative chemistry. ACS Med. Chem. Lett. 11, 1496–1505 (2020).
Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019).
Sanchez-Lengeling, B. & Aspuru-Guzik, A. Inverse molecular design using machine learning: generative models for matter engineering. Science 361, 360–365 (2018).
Awale, M., Sirockin, F., Stiefl, N. & Reymond, J. L. Drug analogs from fragment-based long short-term memory generative neural networks. J. Chem. Inf. Model. 59, 1347–1356 (2019).
Elton, D. C., Boukouvalas, Z., Fuge, M. D. & Chung, P. W. Deep learning for molecular design—a review of the state of the art. Mol. Syst. Design Eng. 4, 828–849 (2019).
Weininger, D. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31–36 (1988).
Li, X. & Fourches, D. SMILES pair encoding: a data-driven substructure tokenization algorithm for deep learning. J. Chem. Inf. Model. 61, 1560–1569 (2021).
Liu, Q., Allamanis, M., Brockschmidt, M. & Gaunt, A. L. Constrained graph variational autoencoders for molecule design. In Conference on Neural Information Processing Systems (NeurIPS) (eds Bengio, S. et al.) 7806–7815 (2018).
Jin, W., Barzilay, R. & Jaakkola, T. Junction tree variational autoencoder for molecular graph generation. In Proc. 35th International Conference on Machine Learning Vol. 80 (eds Dy, J. & Krause, A.) 2323–2332 (PMLR, 2018).
Jin, W., Barzilay, R. & Jaakkola, T. Hierarchical generation of molecular graphs using structural motifs. In Proc. 37th International Conference on Machine Learning Vol. 119 (eds Daumé, H. III & Singh, A.) 4839–4848 (PMLR, 2020).
Bresson, X. & Laurent, T. A two-step graph convolutional decoder for molecule generation. In NeurIPS Workshop on Machine Learning and the Physical Sciences (2019).
Martin, E. J. et al. All-Assay-Max2 pQSAR: activity predictions as accurate as four-concentration IC50s for 8558 Novartis assays. J. Chem. Inf. Model. 59, 4450–4459 (2019).
Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).
Alexander, D. L. J., Tropsha, A. & Winkler, D. A. Beware of R²: simple, unambiguous assessment of the prediction accuracy of QSAR and QSPR models. J. Chem. Inf. Model. 55, 1316–1322 (2015).
Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug. Deliv. Rev. 46, 3–26 (2001).
Zhumagambetov, R. et al. cheML.io: an online database of ML-generated molecules. RSC Adv. https://doi.org/10.1039/D0RA07820D (2020).
Winter, R. et al. Efficient multi-objective molecular optimization in a continuous latent space. Chem. Sci. 10, 8016–8024 (2019).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. In International Conference on Learning Representations (ICLR) (eds Bengio, Y. & LeCun, Y.) (2014).
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. In International Conference on Learning Representations (ICLR) (eds Bengio, Y. & LeCun, Y.) (2015).
Jin, W. et al. Deep learning identifies synergistic drug combinations for treating COVID-19. Proc. Natl Acad. Sci. USA 118, e2105070118 (2021).
Shenk, J., Richter, M. L., Arpteg, A. & Huss, M. Spectral analysis of latent representations. In Proc. Computational Cognition (COMCO 2019) (2019).
Chenouard, N. et al. Objective comparison of particle tracking methods. Nat. Methods 11, 281–289 (2014).
Godinez, W. J. & Rohr, K. Tracking multiple particles in fluorescence time-lapse microscopy images via probabilistic data association. IEEE Trans. Med. Imaging 34, 415–432 (2015).
Trager, W. & Jensen, J. B. Human malaria parasites in continuous culture. Science 193, 673–675 (1976).
Johnson, J. D. et al. Assessment and continued validation of the malaria SYBR green I-based fluorescence assay for use in malaria drug screening. Antimicrob. Agents Chemother. 51, 1926–1933 (2007).
McNamara, C. W. et al. Targeting Plasmodium PI(4)K to eliminate malaria. Nature 504, 248–253 (2013).
Godinez, W. J. & Ma, E. J. Novartis/JAEGER: Public. Zenodo https://doi.org/10.5281/zenodo.5794429 (2021).
Acknowledgements
We express our gratitude to the colleagues at Novartis who collected the data used to build the malaria model. We thank C. Sarko and W. Cortopassi for valuable discussions.
Author information
Contributions
W.J.G. and W.A.G. initiated, designed and led the study. W.J.G. and E.J. Ma developed and implemented JAEGER. W.J.G. built the malaria model and sampling algorithms. W.A.G. sampled the antimalarial molecule ideas. A.T.C. and L.P. conducted the profiling experiments and collected data. P.S.-C., J.L.J. and S.M.C. provided computational and synthesis resources as well as feedback. J.M.Y. designed the seed compound and provided feedback. E.J. Martin performed cheminformatics modelling and provided feedback. W.J.G., E.J. Martin, and W.A.G. analysed and interpreted the results. W.J.G. and W.A.G. wrote the manuscript. All authors reviewed the manuscript.
Ethics declarations
Competing interests
All authors are (or were at the time of their involvement with the studies) employees of Novartis.
Peer review
Peer review information
Nature Machine Intelligence thanks Milad Salem and David Winkler for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–5 and Note.
Supplementary Software
Source code for JAEGER.
About this article
Cite this article
Godinez, W.J., Ma, E.J., Chao, A.T. et al. Design of potent antimalarials with generative chemistry. Nat Mach Intell 4, 180–186 (2022). https://doi.org/10.1038/s42256-022-00448-w
DOI: https://doi.org/10.1038/s42256-022-00448-w
This article is cited by
- Antimalarial drug discovery: progress and approaches. Nature Reviews Drug Discovery (2023)
- Discovery of senolytics using machine learning. Nature Communications (2023)
- A multilevel generative framework with hierarchical self-contrasting for bias control and transparency in structure-based ligand design. Nature Machine Intelligence (2022)
- Potent antimalarial drugs with validated activities. Nature Machine Intelligence (2022)