  • Perspective

Exploiting machine learning for end-to-end drug discovery and development

Abstract

A variety of machine learning methods, such as naive Bayesian classifiers, support vector machines and, more recently, deep neural networks, are demonstrating their utility for drug discovery and development. These methods leverage the generally larger datasets created from high-throughput screening and allow prediction of bioactivities for targets and of molecular properties with increased accuracy. We have only just begun to exploit the potential of these techniques, but they may already be fundamentally changing the research process for identifying new molecules and/or repurposing old drugs. The integrated, end-to-end (E2E) application of such machine learning models is broadly relevant and has considerable implications for developing future therapies and their targeting.

Fig. 1: Implementing end-to-end (E2E) machine learning models at all stages of drug discovery and development, illustrating some of the key areas that could be modelled.
Fig. 2: Demonstrating iterative drug discovery using machine learning.

Acknowledgements

In memory of Rebecca J. Williams. J. Freundlich, R. J. G. Arnold, P. Madrid, J. Lage de Siqueira-Neto, A. Williams, A. Tropsha, A. Gerlach, J. Gerlach, D. Chipman, A. Davidow and M. Hupcey are kindly acknowledged for discussions and some of the collaborations described herein. S.E. acknowledges funding to Collaborations Pharmaceuticals, Inc., from NIGMS R44 GM122196-02A1, NINDS 1R43NS107079-01, NINDS 3R43NS107079-01S1, NCATS 1UH2TR002084-01 and FY2018 UNC Research Opportunities Initiative (ROI) award. Research reported in this publication was supported by the National Institute of Neurological Disorders and Stroke of the National Institutes of Health under award number R43NS107079. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Corresponding author

Correspondence to Sean Ekins.

Ethics declarations

Competing interests

S.E. is founder and CEO; A.C.P., K.M.Z., T.L. and J.J.K. are employees; and D.P.R. and A.M.C. are consultants of Collaborations Pharmaceuticals, Inc. A.M.C. is also the founder and owner of Molecular Materials Informatics, Inc. A.J.H. has no conflicts of interest.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ekins, S., Puhl, A.C., Zorn, K.M. et al. Exploiting machine learning for end-to-end drug discovery and development. Nat. Mater. 18, 435–441 (2019). https://doi.org/10.1038/s41563-019-0338-z

  • DOI: https://doi.org/10.1038/s41563-019-0338-z
