Application of variational graph encoders as an effective generalist algorithm in computer-aided drug design

Lam, Hilbert Yuen In; Pincket, Robbe; Han, Hao; Ong, Xing Er; Wang, Zechen; Hinks, Jamie; Wei, Yanjie; Li, Weifeng; Zheng, Liangzhen; Mu, Yuguang

doi:10.1038/s42256-023-00683-9

Article
Published: 06 July 2023

Nature Machine Intelligence volume 5, pages 754–764 (2023)

A preprint version of the article is available at bioRxiv.

Abstract

Although there has been considerable progress in molecular property prediction in computer-aided drug design, there is a critical need for fast and accurate models. Most currently available methods specialize in predicting specific properties, so many models must be run side by side, which leads to prohibitively high computational overheads for the typical researcher. Here, the authors propose a single, generalist, unified model exploiting graph convolutional variational encoders that can simultaneously predict multiple properties, such as absorption, distribution, metabolism, excretion and toxicity, target-specific docking scores, and drug–drug interactions. Such a method allows state-of-the-art virtual screening with an acceleration of up to two orders of magnitude. Minimization in the graph variational encoder’s latent space also allows accelerated development of specific drugs for targets with Pareto optimality principles considered, and has the added advantage of explainability.
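
To make the Pareto optimality idea mentioned in the abstract concrete, the following is a minimal sketch of how non-dominated candidates could be selected when several predicted properties are traded off against each other. It is an illustration only, not the authors' implementation: the function names, the two objectives, and the numeric scores are all hypothetical, and both objectives are assumed to be "lower is better".

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is no worse than `b` in every objective and
    strictly better in at least one (lower values are better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    front = []
    for i, candidate in enumerate(scores):
        dominated = any(
            dominates(other, candidate)
            for j, other in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical (docking score, predicted toxicity) pairs for five molecules.
# These numbers are made up for illustration; lower is better for both.
candidates = [
    (-9.1, 0.40),
    (-8.5, 0.10),
    (-7.0, 0.30),
    (-9.1, 0.60),
    (-6.0, 0.05),
]

front = pareto_front(candidates)
print(front)
```

Candidates on the front represent different trade-offs between the two objectives; none of them can be improved in one property without getting worse in the other.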

**Fig. 1: Molecules are encoded into a graph format, which is then passed through an autoencoder, with intermediate mathematical latent space used for property prediction through surrogate models.**

**Fig. 2: Variational graph encoder showed high accuracy in deriving fingerprints and other molecular descriptors while maintaining a Gaussian-distributed latent space.**

**Fig. 3: Surrogate models exploiting the variational graph encoder’s latent space can accurately predict single- and multiclassification problems, and regression problems for common datasets, even if the data is skewed.**

**Fig. 4: Ligand-based drug discovery is doable with latent space-trained surrogate models, with substantial speedup.**

**Fig. 5: Desired molecular properties can be engineered with surrogate model optimization, with explainability as to how one molecule is preferred over another in property prediction.**
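
The idea in the Fig. 5 caption, searching a latent space for points that a surrogate model scores well, can be illustrated with a generic stochastic search. The sketch below uses a simple Metropolis-style annealing loop over a latent vector. It is an assumption-laden illustration rather than the method used in the article: `surrogate` is a hypothetical quadratic stand-in for a trained property predictor, `TARGET` and all hyperparameters are invented, and the step of mapping an optimized latent vector back to a molecule is omitted.

```python
import math
import random
from typing import Callable, List, Tuple

# Hypothetical optimum of the toy surrogate; a real surrogate would be a
# trained model operating on the encoder's latent vectors.
TARGET = [0.5, -1.0, 2.0]


def surrogate(z: List[float]) -> float:
    """Toy stand-in for a surrogate model: squared distance to TARGET."""
    return sum((zi - ti) ** 2 for zi, ti in zip(z, TARGET))


def anneal(
    z0: List[float],
    score: Callable[[List[float]], float],
    steps: int = 2000,
    step_size: float = 0.1,
    t0: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `score` by random perturbation with a linearly cooled
    Metropolis acceptance rule; returns the best point visited."""
    rng = random.Random(seed)
    current, current_score = list(z0), score(z0)
    best, best_score = list(current), current_score
    for k in range(steps):
        temperature = t0 * (1 - k / steps) + 1e-9  # avoid division by zero
        proposal = [zi + rng.gauss(0.0, step_size) for zi in current]
        proposal_score = score(proposal)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


z_start = [0.0, 0.0, 0.0]
best_z, best_score = anneal(z_start, surrogate)
print(best_z, best_score)
```

For several properties at once, the scalar `score` could be replaced by a weighted sum of surrogate outputs, or the visited points could be filtered for non-dominated solutions; either choice is a design decision beyond what this page specifies.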

Data availability

Data used in this study are all publicly available from various datasets cited. Molecular clusters used to train the model are available at https://doi.org/10.34740/kaggle/dsv/5657232 (~19 GB). The trained model encoder weights are also provided in the GitHub repository.

Code availability

Most of the updated code is available at https://github.com/Chokyotager/NotYetAnotherNightshade (ref. ⁶⁸).

References

Bohacek, R. S., McMartin, C. & Guida, W. C. The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev. 16, 3–50 (1996).
Hutchinson, L. & Kirk, R. High drug attrition rates–where are we going wrong? Nat. Rev. Clin. Oncol. 8, 189–190 (2011).
Wouters, O. J., McKee, M. & Luyten, J. Estimated research and development investment needed to bring a new medicine to market, 2009–2018. JAMA 323, 844–853 (2020).
Baig, M. H., Ahmad, K., Rabbani, G., Danishuddin, M. & Choi, I. Computer aided drug design and its application to the development of potential drugs for neurodegenerative disorders. Curr. Neuropharmacol. 16, 740–748 (2018).
Liu, T. et al. Applying high-performance computing in drug discovery and molecular simulation. Natl Sci. Rev. 3, 49–63 (2016).
Sun, D., Gao, W., Hu, H. & Zhou, S. Why 90% of clinical drug development fails and how to improve it? Acta Pharm. Sin. B 12, 3049–3062 (2022).
Tornio, A., Filppula, A. M., Niemi, M. & Backman, J. T. Clinical studies on drug–drug interactions involving metabolism and transport: methodology, pitfalls, and interpretation. Clin. Pharmacol. Ther. 105, 1345–1361 (2019).
Wang, J. Comprehensive assessment of ADMET risks in drug discovery. Curr. Pharm. Des. 15, 2195–2219 (2009).
Kwon, S., Bae, H., Jo, J. & Yoon, S. Comprehensive ensemble in QSAR prediction for drug discovery. BMC Bioinform. 20, 521 (2019).
Wang, J. & Skolnik, S. Recent advances in physicochemical and ADMET profiling in drug discovery. Chem. Biodivers. 6, 1887–1899 (2009).
Wu, F. et al. Computational approaches in preclinical studies on drug discovery and development. Front. Chem. 8, 726 (2020).
Kingma, D. P. & Welling, M. Auto-encoding variational Bayes. Preprint at https://arxiv.org/abs/1312.6114 (2013).
Li, Y. et al. Generative deep learning enables the discovery of a potent and selective RIPK1 inhibitor. Nat. Commun. 13, 6891 (2022).
Yang, L. et al. Transformer-based generative model accelerating the development of novel BRAF Inhibitors. ACS Omega 6, 33864–33873 (2021).
Gomez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).
Lee, M. & Min, K. MGCVAE: multi-objective inverse design via molecular graph conditional variational autoencoder. J. Chem. Inf. Model. 62, 2943–2950 (2022).
Simonovsky, M. & Komodakis, N. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Honolulu, HI, USA, 2017).
Richard, A. M. et al. The Tox21 10K compound library: collaborative chemistry advancing toxicology. Chem. Res. Toxicol. 34, 189–216 (2021).
Huang, K. et al. Artificial intelligence foundation for therapeutic science. Nat. Chem. Biol. 18, 1033–1036 (2022).
Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107 (2012).
Maia, E. H. B., Assis, L. C., de Oliveira, T. A., da Silva, A. M. & Taranto, A. G. Structure-based virtual screening: from classical to artificial intelligence. Front. Chem. 8, 00343 (2020).
International Classification of Diseases, Eleventh Revision (ICD-11) (World Health Organization, 2019).
Lagunin, A. A., Dearden, J. C., Filimonov, D. A. & Poroikov, V. V. Computer-aided rodent carcinogenicity prediction. Mutat. Res. 586, 138–146 (2005).
Hansen, P. & Bichel, J. Carcinogenic effect of sulfonamides. Acta Radiol. 37, 258–265 (1952).
Littlefield, N. A., Sheldon, W. G., Allen, R. & Gaylor, D. W. Chronic toxicity/carcinogenicity studies of sulphamethazine in Fischer 344/N rats: two-generation exposure. Food Chem. Toxicol. 28, 157–167 (1990).
Masumshah, R., Aghdam, R. & Eslahchi, C. A neural network-based method for polypharmacy side effects prediction. BMC Bioinform. 22, 385 (2021).
Wang, L. et al. Long short-term memory neural network with transfer learning and ensemble learning for remaining useful life prediction. Sensors 22, 5744 (2022).
Wallraven, K. et al. Adapting free energy perturbation simulations for large macrocyclic ligands: how to dissect contributions from direct binding and free ligand flexibility. Chem. Sci. 11, 2269–2276 (2020).
Price, W. N. Big data and black-box medical algorithms. Sci. Transl. Med. 10, aao5333 (2018).
Zeng, X. et al. Deep generative molecular design reshapes drug discovery. Cell Rep. Med. 3, 100794 (2022).
Stumpfe, D., Hu, H. & Bajorath, J. Advances in exploring activity cliffs. J. Comput. Aided Mol. Des. 34, 929–942 (2020).
Musigmann, M. et al. Testing the applicability and performance of Auto ML for potential applications in diagnostic neuroradiology. Sci. Rep. 12, 13648 (2022).
Irwin, J. J. & Shoichet, B. K. ZINC—a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 45, 177–182 (2005).
RDKit: Open-Source Cheminformatics; https://www.rdkit.org
Moriwaki, H., Tian, Y. S., Kawashita, N. & Takagi, T. Mordred: a molecular descriptor calculator. J. Cheminform. 10, 4 (2018).
Platt, J. Probabilistic Outputs For Support Vector Machines and Comparisons to Regularized Likelihood Methods (Univ. Colorado, 1999).
Wang, S. et al. ADMET evaluation in drug discovery. 16. Predicting hERG blockers by combining multiple pharmacophores and machine learning approaches. Mol. Pharm. 13, 2855–2866 (2016).
Veith, H. et al. Comprehensive characterization of cytochrome P450 isozyme selectivity across chemical libraries. Nat. Biotechnol. 27, 1050–1055 (2009).
Carbon-Mangels, M. & Hutter, M. C. Selecting relevant descriptors for classification by Bayesian estimates: a comparison with decision trees and support vector machines approaches for disparate data sets. Mol. Inform. 30, 885–895 (2011).
Cheng, F. et al. admetSAR: a comprehensive source and free tool for assessment of chemical ADMET properties. J. Chem. Inf. Model. 52, 3099–3105 (2012).
Martins, I. F., Teixeira, A. L., Pinheiro, L. & Falcao, A. O. A Bayesian approach to in silico blood–brain barrier penetration modeling. J. Chem. Inf. Model. 52, 1686–1697 (2012).
Xu, C. et al. In silico prediction of chemical Ames mutagenicity. J. Chem. Inf. Model. 52, 2840–2847 (2012).
Hou, T., Wang, J., Zhang, W. & Xu, X. ADME evaluation in drug discovery. 7. Prediction of oral absorption by correlation and classification. J. Chem. Inf. Model. 47, 208–218 (2007).
Xu, Y. et al. Deep learning for drug-induced liver injury. J. Chem. Inf. Model. 55, 2085–2093 (2015).
Alves, V. M. et al. Predicting chemically-induced skin reactions. Part I: QSAR models of skin sensitization and their application to identify potentially hazardous compounds. Toxicol. Appl. Pharmacol. 284, 262–272 (2015).
National Institute of Environmental Health Sciences (NIEHS); the murine local lymph node assay: a test method for assessing the allergic contact dermatitis potential of chemicals/compounds, report now available. Public health service. Fed. Regist. 64, 14006–14007 (1999).
Zhu, H. et al. Quantitative structure–activity relationship modeling of rat acute toxicity by oral exposure. Chem. Res. Toxicol. 22, 1913–1921 (2009).
Lombardo, F. & Jing, Y. In silico prediction of volume of distribution in humans. Extensive data set and the exploration of linear and nonlinear methods coupled with molecular interaction fields descriptors. J. Chem. Inf. Model. 56, 2042–2052 (2016).
Wenlock, M. & Tomkinson, N. Experimental In Vitro DMPK and Physicochemical Data on a Set of Publicly Disclosed Compounds (ChEMBL); https://doi.org/10.6019/CHEMBL3301361
Obach, R. S., Lombardo, F. & Waters, N. J. Trend analysis of a database of intravenous pharmacokinetic parameters in humans for 670 drug compounds. Drug Metab. Dispos. 36, 1385–1405 (2008).
Di, L. et al. Mechanistic insights from comparing intrinsic clearance values between human liver microsomes and hepatocytes to guide drug design. Eur. J. Med. Chem. 57, 441–448 (2012).
Ma, C. Y. et al. Prediction models of human plasma protein binding rate and oral bioavailability derived by using GA-CG-SVM method. J. Pharm. Biomed. Anal. 47, 677–682 (2008).
Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).
Sorkun, M. C., Khetan, A. & Er, S. AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds. Sci. Data 6, 143 (2019).
Mobley, D. L. & Guthrie, J. P. FreeSolv: a database of experimental and calculated hydration free energies, with input files. J. Comput. Aided Mol. Des. 28, 711–720 (2014).
Touret, F. et al. In vitro screening of a FDA approved chemical library reveals potential inhibitors of SARS-CoV-2 replication. Sci. Rep. 10, 13093 (2020).
Main Protease Structure and XChem Fragment Screen (Diamond, 2020).
Tatonetti, N. P., Ye, P. P., Daneshjou, R. & Altman, R. B. Data-driven prediction of drug effects and interactions. Sci. Transl. Med. 4, 125ra131 (2012).
Ryu, J. Y., Kim, H. U. & Lee, S. Y. Deep learning improves prediction of drug–drug and drug–food interactions. Proc. Natl Acad. Sci. USA 115, E4304–E4311 (2018).
Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074–D1082 (2018).
Ravindranath, P. A., Forli, S., Goodsell, D. S., Olson, A. J. & Sanner, M. F. AutoDockFR: advances in protein–ligand docking with explicitly specified binding site flexibility. PLoS Comput. Biol. 11, e1004586 (2015).
Alhossary, A., Handoko, S. D., Mu, Y. & Kwoh, C. K. Fast, accurate, and reliable molecular docking with QuickVina 2. Bioinformatics 31, 2214–2216 (2015).
McNutt, A. T. et al. GNINA 1.0: molecular docking with deep learning. J. Cheminform. 13, 43 (2021).
Zheng, L. et al. Improving protein–ligand docking and screening accuracies by incorporating a scoring function correction term. Brief. Bioinform. 23, bbac051 (2022).
Shen, C. et al. Boosting protein–ligand binding pose prediction and virtual screening based on residue–atom distance likelihood potential and graph transformer. J. Med. Chem. 65, 10691–10706 (2022).
Wang, Z. et al. A fully differentiable ligand pose optimization framework guided by deep learning and a traditional scoring function. Brief. Bioinform. 24, bbac520 (2022).
Pincus, M. Letter to the editor—a Monte Carlo method for the approximate solution of certain types of constrained optimization problems. Oper. Res. 18, 1225–1228 (1970).
Chokyotager/NotYetAnotherNightshade v.1.1 (Zenodo, 2022); https://doi.org/10.5281/zenodo.7827194

Acknowledgements

We thank T. L. Heng and S. Shikhar for their comments in the initial phase of the work, and T. L. Dawson Jr for his continued support. We would also like to dedicate this work to the memory of Jamie Hinks, a friend, colleague, and co-author of this paper, who sadly passed away in March 2023. This work is supported by the Singapore Ministry of Education (MOE), tier 1 grants RG27/21 and RG97/22 (M.Y.). H.L.Y.I. is also supported by funding from the Agency for Science, Technology and Research (A*STAR), and A*STAR BMRC EDB IAF-PP grants (H17/01/a0/004, Skin Research Institute of Singapore; H18/01a0/016 and H22J1a0040, Asian Skin Microbiome Program). Computations were mainly performed using the resources of the National Supercomputing Centre, Singapore (https://www.nscc.sg) and the HADLEY high-performance computing cluster of SCELSE. SCELSE is funded by Singapore’s National Research Foundation, the Ministry of Education, NTU, and the National University of Singapore (NUS), and is hosted by NTU in partnership with NUS.

Author information

Deceased: Jamie Hinks.

Authors and Affiliations

School of Biological Sciences (SBS), Nanyang Technological University (NTU), Singapore, Singapore, Republic of Singapore
Hilbert Yuen In Lam, Hao Han & Yuguang Mu
A*STAR Skin Research Labs (A*SRL), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore, Republic of Singapore
Hilbert Yuen In Lam
Heliovision, Kalmthout, Belgium
Robbe Pincket
MagMol Pte. Ltd., Singapore, Republic of Singapore
Xing Er Ong
School of Physics, Shandong University, Jinan, China
Zechen Wang & Weifeng Li
Singapore Centre for Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University (NTU), Singapore, Singapore, Republic of Singapore
Jamie Hinks
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Yanjie Wei & Liangzhen Zheng
Shanghai Zelixir Biotech, Shanghai, China
Liangzhen Zheng

Contributions

H.L.Y.I., R.P. and M.Y. conceptualized the work. H.L.Y.I., R.P., H.H. and M.Y. designed the methodology. H.L.Y.I. and R.P. wrote the software. H.L.Y.I., H.H. and M.Y. validated the work. H.L.Y.I. and W.Z. performed a formal analysis. H.L.Y.I., R.P., H.H., W.Z. and O.X.E. performed investigations. H.L.Y.I., O.X.E. and W.Z. curated the data. H.L.Y.I. and M.Y. wrote the original draft, whereas R.P., H.H., O.X.E., W.Z., J.H., W.Y., L.W., Z.L. and M.Y. reviewed and edited the manuscript. H.L.Y.I. and O.X.E. visualized the work. H.L.Y.I. and M.Y. supervised the work. J.H. and M.Y. obtained resources. M.Y. administered the project and acquired funding.

Corresponding authors

Correspondence to Weifeng Li, Liangzhen Zheng or Yuguang Mu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Shivam Patel and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4 and legends for Supplementary Figs. 1 and 2.

Supplementary Data 1

Scores and comparisons of all surrogate models.

Supplementary Data 2

TWOSIDES polypharmacy labels reclassified using ICD-11 as a reference.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lam, H.Y.I., Pincket, R., Han, H. et al. Application of variational graph encoders as an effective generalist algorithm in computer-aided drug design. Nat Mach Intell 5, 754–764 (2023). https://doi.org/10.1038/s42256-023-00683-9

Received: 13 January 2023
Accepted: 02 June 2023
Published: 06 July 2023
Issue Date: July 2023
DOI: https://doi.org/10.1038/s42256-023-00683-9