Design of potent antimalarials with generative chemistry


Recent advances in generative modelling allow designing novel compounds through deep neural networks. One such neural network model, JT-VAE (the Junction Tree Variational Auto-Encoder), excels at proposing chemically valid structures. Here, on the basis of JT-VAE, we built a generative modelling approach, JAEGER, for finding novel chemical matter with desired bioactivity. Using JAEGER, we designed compounds to inhibit malaria. To prioritize the compounds for synthesis, we used the in-house pQSAR (Profile-QSAR) program, a massively multitask bioactivity model based on 12,000 Novartis assays. On the basis of pQSAR activity predictions, we selected, synthesized and experimentally profiled two compounds. Both compounds exhibited low nanomolar activity in a malaria proliferation assay as well as a biochemical assay measuring activity against PI(4)K, which is an essential kinase that regulates intracellular development in malaria. The compounds also showed low activity in a cytotoxicity assay. Our findings show that JAEGER is a viable approach for finding novel active compounds for drug discovery.
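The generate-then-prioritize idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not spell out the sampling procedure, so Gaussian perturbation of a seed compound's latent vector is used as a stand-in, `toy_activity` is a hypothetical placeholder for a predicted-activity model (it is not pQSAR), and the decoding of latent vectors back into molecules, which a JT-VAE-style decoder would perform, is omitted entirely.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def sample_around_seed(
    seed: Sequence[float], n: int, sigma: float, rng: random.Random
) -> List[Vector]:
    """Draw n latent vectors by adding Gaussian noise to a seed latent vector.

    Assumption: this perturbation scheme is illustrative only and is not
    taken from the JAEGER source code.
    """
    return [[x + rng.gauss(0.0, sigma) for x in seed] for _ in range(n)]


def prioritize(
    candidates: Sequence[Vector],
    score: Callable[[Vector], float],
    top_k: int,
) -> List[Tuple[float, Vector]]:
    """Rank candidates by predicted score (higher is better) and keep top_k."""
    ranked = sorted(
        ((score(c), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    return ranked[:top_k]


def toy_activity(z: Vector) -> float:
    """Hypothetical stand-in for a predicted-activity model (not pQSAR).

    Scores a latent vector by its closeness to an arbitrary target point.
    """
    return -sum((x - 1.0) ** 2 for x in z)


rng = random.Random(0)            # fixed seed so the run is reproducible
seed_latent = [0.0, 0.0, 0.0]     # hypothetical encoding of a seed compound
candidates = sample_around_seed(seed_latent, n=200, sigma=0.5, rng=rng)
shortlist = prioritize(candidates, toy_activity, top_k=2)

for predicted, latent in shortlist:
    print(round(predicted, 3), [round(x, 3) for x in latent])
```

In a real workflow each sampled vector would first be decoded into a molecule and the scoring function would act on that molecule. The sketch keeps only the sample, score and select skeleton, with `top_k=2` mirroring the two compounds chosen for synthesis.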

Fig. 1: Overview of JAEGER.
Fig. 2: Distribution of calculated properties of training molecules and the 282 molecules generated by the model.
Fig. 3: Structures of synthesized compounds.
Fig. 4: Profiling results for compounds 1 and 2 over three assays.

Data availability

The data used in this study are proprietary to Novartis. The data are not publicly available due to intellectual property restrictions. A demo dataset is available from the ChEMBL – Neglected Tropical Disease archive at

Code availability

The code for JAEGER is available in Supplementary Software and at



We express our gratitude to the colleagues at Novartis who collected the data used to build the malaria model. We thank C. Sarko and W. Cortopassi for valuable discussions.

Author information

Authors and Affiliations



W.J.G. and W.A.G. initiated, designed and led the study. W.J.G. and E.J. Ma developed and implemented JAEGER. W.J.G. built the malaria model and sampling algorithms. W.A.G. sampled the antimalarial molecule ideas. A.T.C. and L.P. conducted the profiling experiments and collected data. P.S.-C., J.L.J. and S.M.C. provided computational and synthesis resources as well as feedback. J.M.Y. designed the seed compound and provided feedback. E.J. Martin performed cheminformatics modelling and provided feedback. W.J.G., E.J. Martin, and W.A.G. analysed and interpreted the results. W.J.G. and W.A.G. wrote the manuscript. All authors reviewed the manuscript.

Corresponding authors

Correspondence to William J. Godinez or W. Armand Guiguemde.

Ethics declarations

Competing interests

All authors are (or were at the time of their involvement with the studies) employees of Novartis.

Peer review

Peer review information

Nature Machine Intelligence thanks Milad Salem and David Winkler for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Table 1 Tanimoto similarities

Supplementary information

Supplementary Information

Supplementary Figs. 1–5 and Note.

Reporting Summary

Supplementary Software

Source code for JAEGER.

About this article

Cite this article

Godinez, W.J., Ma, E.J., Chao, A.T. et al. Design of potent antimalarials with generative chemistry. Nat Mach Intell 4, 180–186 (2022).
