  • Perspective

Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR

Abstract

Quantitative structure–activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term ‘deep QSAR’. Marking a decade since the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications, and on the need for open-source and democratized resources to support computer-aided drug design.

Fig. 1: Contrasting traditional and deep QSAR models.
Fig. 2: Generative molecular design.
Fig. 3: Workflow for deep docking.
Fig. 4: Molecular simulations enhanced by deep learning potentials in the calculation of ligand binding affinity.

References

  1. Hansch, C., Maloney, P., Fujita, T. & Muir, R. Correlation of biological activity of phenoxyacetic acids with hammett substituent constants and partition coefficients. Nature 194, 178–180 (1962).

    Article  CAS  Google Scholar 

  2. Cherkasov, A. et al. QSAR modeling: where have you been? Where are you going to? J. Med. Chem. 57, 4977–5010 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Muratov, E. N. et al. QSAR without borders. Chem. Soc. Rev. 49, 3525–3564 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Ivakhnenko, A. G. & Lapa, V. G. Cybernetics and Forecasting Techniques (American Elsevier Co, 1967).

  5. Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E. & Svetnik, V. Deep neural nets as a method for quantitative structure-activity relationships. J. Chem. Inf. Model. 55, 263–274 (2015).

    Article  CAS  PubMed  Google Scholar 

  6. Chen, H., Engkvist, O., Wang, Y., Olivecrona, M. & Blaschke, T. The rise of deep learning in drug discovery. Drug Discov. Today 23, 1241–1250 (2018).

    Article  PubMed  Google Scholar 

  7. Yang, X., Wang, Y., Byrne, R., Schneider, G. & Yang, S. Concepts of artificial intelligence for computer-assisted drug discovery. Chem. Rev. 119, 10520–10594 (2019).

    Article  CAS  PubMed  Google Scholar 

  8. Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).

    Article  Google Scholar 

  9. Pandey, M. et al. The transformational role of GPU computing and deep learning in drug discovery. Nat. Mach. Intell. 4, 211–221 (2022).

    Article  Google Scholar 

  10. Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828 (2012).

    Article  Google Scholar 

  11. Real, E., Aggarwal, A., Huang, Y. & Le, Q. V. Regularized evolution for image classifier architecture search. Preprint at:arXiv https://doi.org/10.48550/arXiv.1802.01548 (2018).

    Article  Google Scholar 

  12. Elsken, T., Metzen, J. H. & Hutter, F. Neural architecture search: a survey.J. Mach. Learn. Res. 20, 1–21 (2019).

    Google Scholar 

  13. Li, X. & Fourches, D. Inductive transfer learning for molecular activity prediction: next-gen QSAR models with MolPMoFiT. J. Cheminform. 12, 27 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Xu, Y., Ma, J., Liaw, A., Sheridan, R. P. & Svetnik, V. Demystifying multitask deep neural networks for quantitative structure–activity relationships. J. Chem. Inf. Model. 57, 2490–2504 (2017).

    Article  CAS  PubMed  Google Scholar 

  15. Moon, C. & Kim, D. Prediction of drug-target interactions through multi-task learning. Sci. Rep. 12, 18323 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Fourches, D., Muratov, E. & Tropsha, A. Trust, but verify: on the importance of chemical structure curation in cheminformatics and QSAR modeling research. J. Chem. Inf. Model. 50, 1189–1204 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Fourches, D. et al. Trust, but verify II: a practical guide to chemogenomics data curation. J. Chem. Inf. Model. 56, 1243–1252 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  18. Fourches, D., Muratov, E. & Tropsha, A. Curation of chemogenomics data. Nat. Chem. Biol. 11, 535 (2015).

    Article  CAS  PubMed  Google Scholar 

  19. Alves, V. M. et al. Curated data in — trustworthy in silico models out: the impact of data quality on the reliability of artificial intelligence models as alternatives to animal testing. Altern. Lab. Anim. 49, 73–82 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  20. Tropsha, A. Best practices for QSAR model development, validation, and exploitation. Mol. Inform. 29, 476–488 (2010).

    Article  CAS  PubMed  Google Scholar 

  21. Golbraikh, A., Muratov, E., Fourches, D. & Tropsha, A. Data set modelability by QSAR. J. Chem. Inf. Model. 54, 1–4 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Maggiora, G. M. On outliers and activity cliffs — why QSAR often disappoints. J. Chem. Inf. Model. 46, 1535 (2006).

    Article  CAS  PubMed  Google Scholar 

  23. Aldeghi, M. et al. Roughness of molecular property landscapes and its impact on modellability. J. Chem. Inf. Model. 62, 4660–4671 (2022).

    Article  CAS  PubMed  Google Scholar 

  24. Bosc, N. et al. Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J. Cheminform. 11, 4 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  25. Varnek, A. & Tropsha, A. Chemoinformatics Approaches to Virtual Screening. https://doi.org/10.1039/9781847558879 (Royal Society of Chemistry, 2008).

  26. Schneider, G. & Fechner, U. Computer-based de novo design of drug-like molecules. Nat. Rev. Drug Discov. 4, 649–663 (2005).

    Article  CAS  PubMed  Google Scholar 

  27. Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19, 353–364 (2019).

    Article  PubMed  Google Scholar 

  29. Schneider, G. Mind and machine in drug design. Nat. Mach. Intell. 1, 128–130 (2019).

    Article  Google Scholar 

  30. Schneider, G. & Clark, D. E. Automated de novo drug design: are we nearly there yet? Angew. Chem. Int. Ed. Engl. 58, 10792–10803 (2019).

    Article  CAS  PubMed  Google Scholar 

  31. Hartenfeller, M. et al. DOGS: reaction-driven de novo design of bioactive compounds. PLoS Comput. Biol. 8, e1002380 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Tong, X. et al. Generative models for de novo drug design. J. Med. Chem. 64, 14011–14027 (2021).

    Article  CAS  PubMed  Google Scholar 

  33. Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun. 14, 114 (2023).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  34. Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).

    Article  CAS  PubMed  Google Scholar 

  35. Blaschke, T., Olivecrona, M., Engkvist, O., Bajorath, J. & Chen, H. Application of generative autoencoder in de novo molecular design. Mol. Inform. 37, 1700123 (2018).

    Article  PubMed  Google Scholar 

  36. Putin, E. et al. Reinforced adversarial neural computer for de novo molecular design. J. Chem. Inf. Model. 58, 1194–1204 (2018).

    Article  CAS  PubMed  Google Scholar 

  37. Atz, K., Grisoni, F. & Schneider, G. Geometric deep learning on molecular representations. Nat. Mach. Intell. 3, 1023–1032 (2021).

    Article  Google Scholar 

  38. Button, A., Merk, D., Hiss, J. A. & Schneider, G. Automated de novo molecular design by hybrid machine intelligence and rule-driven chemical synthesis. Nat. Mach. Intell. 1, 307–315 (2019).

    Article  Google Scholar 

  39. Grisoni, F. Chemical language models for de novo drug design: challenges and opportunities. Curr. Opin. Struct. Biol. 79, 102527 (2023).

    Article  CAS  PubMed  Google Scholar 

  40. Kotsias, P. C. et al. Direct steering of de novo molecular generation with descriptor conditional recurrent neural networks. Nat. Mach. Intell. 2, 254–265 (2020).

    Article  Google Scholar 

  41. Korshunova, M. et al. Generative and reinforcement learning approaches for the automated de novo design of bioactive compounds. Commun. Chem. 5, 129 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  42. Baskin, I. I. Is one-shot learning a viable option in drug discovery? Expert Opin. Drug Discov. 14, 601–603 (2019).

    Article  PubMed  Google Scholar 

  43. Simões, R. S., Maltarollo, V. G., Oliveira, P. R. & Honorio, K. M. Transfer and multi-task learning in QSAR modeling: advances and challenges. Front. Pharmacol. 9, 74 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  44. Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. & Merk, D. Beam search for automated design and scoring of novel ROR ligands with machine intelligence. Angew. Chem. Int. Ed. Engl. 60, 19477–19482 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).

    Article  Google Scholar 

  46. Blaschke, T. et al. REINVENT 2.0: an AI tool for de novo drug design. J. Chem. Inf. Model. 60, 5918–5922 (2020).

    Article  CAS  PubMed  Google Scholar 

  47. Grisoni, F. & Schneider, G. De novo molecular design with chemical language models. Methods Mol. Biol. 2390, 207–232 (2022).

    Article  CAS  PubMed  Google Scholar 

  48. Chen, H. Can generative-model-based drug design become a new normal in drug discovery? J. Med. Chem. 65, 100–102 (2022).

    Article  CAS  PubMed  Google Scholar 

  49. Lam, L. & Suen, C. Y. Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans. Syst. Man Cybern. Part. A Syst. Hum. 27, 553–568 (1997).

    Article  Google Scholar 

  50. Nippa, D. F. et al. Enabling late-stage drug diversification by high-throughput experimentation with geometric deep learning. Preprint at: ChemRxiv https://doi.org/10.26434/CHEMRXIV-2022-GKXM6 (2022).

    Article  Google Scholar 

  51. Clark, K., Luong, M.-T., Le, Q. V. & Manning, C. D. ELECTRA: pre-training text encoders as discriminators rather than generators. Preprint at:arXiv https://doi.org/10.48550/arxiv.2003.10555 (2020).

    Article  Google Scholar 

  52. Corey, E. J. & Wipke, W. T. Computer-assisted design of complex organic syntheses. Science 166, 178–192 (1969).

    Article  CAS  PubMed  Google Scholar 

  53. Corey, E. J. General methods for the construction of complex molecules. Pure Appl. Chem. 14, 19–38 (1967).

    Article  CAS  Google Scholar 

  54. Coley, C. W., Green, W. H. & Jensen, K. F. Machine learning in computer-aided synthesis planning. Acc. Chem. Res. 51, 1281–1289 (2018).

    Article  CAS  PubMed  Google Scholar 

  55. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

    Article  CAS  PubMed  Google Scholar 

  56. Segler, M. H. S. & Waller, M. P. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chemistry 23, 5966–5971 (2017).

    Article  CAS  PubMed  Google Scholar 

  57. Szymkuć, S. et al. Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. Engl. 55, 5904–5937 (2016).

    Article  PubMed  Google Scholar 

  58. Genheden, S. et al. AiZynthFinder: a fast, robust and flexible open-source software for retrosynthetic planning. J. Cheminform. 12, 70 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  59. Jin, W., Coley, C. W., Barzilay, R. & Jaakkola, T. In: NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems 2608–2617 (Neural Information Processing Systems Foundation, 2017).

  60. Sutskever, I., Vinyals, O. & Le, Q. V. In: Proceedings of the 27th International Conference on Neural Information Processing Systems 2, 3104–3112 (Neural Information Processing Systems Foundation, 2014).

  61. Liu, B. et al. Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent. Sci. 3, 1103–1113 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Schwaller, P., Gaudin, T., Lányi, D., Bekas, C. & Laino, T. “Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models. Chem. Sci. 9, 6091–6098 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  63. Wołos, A. et al. Computer-designed repurposing of chemical wastes into drugs. Nature 604, 668–676 (2022).

    Article  PubMed  Google Scholar 

  64. Patel, H. et al. SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules. Sci. Data 7, 384 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  65. Zabolotna, Y. et al. SynthI: a new open-source tool for synthon-based library design. J. Chem. Inf. Model. 62, 2151–2163 (2022).

    Article  CAS  PubMed  Google Scholar 

  66. Bonnet, P. Is chemical synthetic accessibility computationally predictable for drug and lead-like molecules? A comparative assessment between medicinal and computational chemists. Eur. J. Med. Chem. 54, 679–689 (2012).

    Article  CAS  PubMed  Google Scholar 

  67. Boda, K., Seidel, T. & Gasteiger, J. Structure and reaction based evaluation of synthetic accessibility. J. Comput. Aided Mol. Des. 21, 311–325 (2007).

    Article  CAS  PubMed  Google Scholar 

  68. Ertl, P. & Schuffenhauer, A. Estimation of synthetic accessibility score of drug-like molecules based on molecular complexity and fragment contributions. J. Cheminform. 1, 8 (2009).

    Article  PubMed  PubMed Central  Google Scholar 

  69. Coley, C. W., Rogers, L., Green, W. H. & Jensen, K. F. SCScore: synthetic complexity learned from a reaction corpus. J. Chem. Inf. Model. 58, 252–261 (2018).

    Article  CAS  PubMed  Google Scholar 

  70. Hoonakker, F., Lachiche, N., Varnek, A. & Wagner, A. A representation to apply usual data mining techniques to chemical reactions — illustration on the rate constant of S(N)2 reactions in water. Int. J. Artif. Intell. Tools 20, 253–270 (2010).

    Article  Google Scholar 

  71. Gimadiev, T. et al. Bimolecular nucleophilic substitution reactions: predictive models for rate constants and molecular reaction pairs analysis. Mol. Inform. 38, 1800104 (2019).

    Article  Google Scholar 

  72. Baskin, I. I., Madzhidov, T. I., Antipin, I. S. & Varnek, A. A. Artificial intelligence in synthetic chemistry: achievements and prospects. Russ. Chem. Rev. 86, 1127–1156 (2017).

    Article  CAS  Google Scholar 

  73. Glavatskikh, M. et al. predictive models for kinetic parameters of cycloaddition reactions. Mol. Inform. 38, 1800077 (2019).

    Article  CAS  Google Scholar 

  74. Gimadiev, T. R. et al. Assessment of tautomer distribution using the condensed reaction graph approach. J. Comput. Aided Mol. Des. 32, 401–414 (2018).

    Article  CAS  PubMed  Google Scholar 

  75. Granda, J. M., Donina, L., Dragone, V., Long, D. L. & Cronin, L. Controlling an organic synthesis robot with machine learning to search for new reactivity. Nature 559, 377–381 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Ahneman, D. T., Estrada, J. G., Lin, S., Dreher, S. D. & Doyle, A. G. Predicting reaction performance in C–N cross-coupling using machine learning. Science 360, 186–190 (2018).

    Article  CAS  PubMed  Google Scholar 

  77. Skoraczyñski, G. et al. Predicting the outcomes of organic reactions via machine learning: are current descriptors sufficient? Sci. Rep. 7, 3582 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  78. Probst, D., Schwaller, P. & Reymond, J.-L. Reaction classification and yield prediction using the differential reaction fingerprint DRFP. Digit. Discov. 1, 91–97 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Marcou, G. et al. Expert system for predicting reaction conditions: the Michael reaction case. J. Chem. Inf. Model. 55, 239–250 (2015).

    Article  CAS  PubMed  Google Scholar 

  80. Gao, H. et al. Using machine learning to predict suitable conditions for organic reactions. ACS Cent. Sci. 4, 1465–1476 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  81. Afonina, V. A. et al. Prediction of optimal conditions of hydrogenation reaction using the likelihood ranking approach. Int. J. Mol. Sci. 23, 248 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  82. Lin, A. I. et al. Automatized assessment of protective group reactivity: a step toward big reaction data analysis. J. Chem. Inf. Model. 56, 2140–2148 (2016).

    Article  CAS  PubMed  Google Scholar 

  83. Schneider, G. Automating drug discovery. Nat. Rev. Drug Discov. 17, 97–113 (2018).

    Article  CAS  PubMed  Google Scholar 

  84. Abolhasani, M. & Kumacheva, E. The rise of self-driving labs in chemical and materials sciences. Nat. Synth. 2, 483–492 (2023).

    Article  Google Scholar 

  85. Reutlinger, M., Rodrigues, T., Schneider, P. & Schneider, G. Combining on-chip synthesis of a focused combinatorial library with computational target prediction reveals imidazopyridine GPCR ligands. Angew. Chem. Int. Ed. Engl. 53, 582–585 (2014).

    Article  CAS  PubMed  Google Scholar 

  86. Burger, B. et al. A mobile robotic chemist. Nature 583, 237–241 (2020).

    Article  CAS  PubMed  Google Scholar 

  87. Genheden, S., Norrby, P. O. & Engkvist, O. AiZynthTrain: robust, reproducible, and extensible pipelines for training synthesis prediction models. J. Chem. Inf. Model. 63, 1841–1846 (2023).

    Article  CAS  PubMed  Google Scholar 

  88. Ton, A.-T., Gentile, F., Hsing, M., Ban, F. & Cherkasov, A. Rapid identification of potential inhibitors of SARS- CoV-2 main protease by deep docking of 1.3 billion compounds. Mol. Inform. 39, e2000028 (2020).

    Article  PubMed  Google Scholar 

  89. Cherkasov, A., Ban, F., Li, Y., Fallahi, M. & Hammond, G. L. Progressive docking: a hybrid QSAR/docking approach for accelerating in silico high throughput screening. J. Med. Chem. 49, 7466–7478 (2006).

    Article  CAS  PubMed  Google Scholar 

  90. Hilpert, K., Fjell, C. D. & Cherkasov, A. Peptide-based drug design. Methods Mol. Biol. 494, 127–159 (2008).

    Article  CAS  PubMed  Google Scholar 

  91. Durrant, J. D. & McCammon, J. A. NNScore 2.0: a neural-network receptor-ligand scoring function. J. Chem. Inf. Model. 51, 2897–2903 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Svensson, F., Norinder, U. & Bender, A. Improving screening efficiency through iterative screening using docking and conformal prediction. J. Chem. Inf. Model. 57, 439–444 (2017).

    Article  CAS  PubMed  Google Scholar 

  93. Ahmed, L. et al. Efficient iterative virtual screening with Apache Spark and conformal prediction. J. Cheminform. 10, 8 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  94. Rossetti, G. G. et al. Non-covalent SARS-CoV-2 Mpro inhibitors developed from in silico screen hits. Sci. Rep. 12, 2505 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  95. Gentile, F. et al. Automated discovery of noncovalent inhibitors of SARS-CoV-2 main protease by consensus deep docking of 40 billion small molecules. Chem. Sci. 12, 15960–15974 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Garland, O. et al. Large-scale virtual screening for the discovery of SARS-CoV-2 papain-like protease (PLpro) non-covalent inhibitors. J. Chem. Inf. Model. 63, 2158–2169 (2023).

    Article  CAS  PubMed  Google Scholar 

  97. Radaeva, M. et al. Discovery of novel Lin28 Inhibitors to suppress cancer cell stemness. Cancers 14, 5687 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  98. Gentile, F. et al. Deep docking: a deep learning platform for augmentation of structure based drug discovery. ACS Cent. Sci. 6, 939–949 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  99. Gorgulla, C. et al. VirtualFlow Ants — ultra-large virtual screenings with artificial intelligence driven docking algorithm based on ant colony optimization. Int. J. Mol. Sci. 22, 5807 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Charifson, P. S., Corkery, J. J., Murcko, M. A. & Walters, W. P. Consensus scoring: a method for obtaining improved hit rates from docking databases of three-dimensional structures into proteins. J. Med. Chem. 42, 5100–5109 (1999).

    Article  CAS  PubMed  Google Scholar 

  101. Palacio-Rodríguez, K., Lans, I., Cavasotto, C. N. & Cossio, P. Exponential consensus ranking improves the outcome in docking and receptor ensemble docking. Sci. Rep. 9, 5142 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  102. Ban, F. et al. Best practices of computer-aided drug discovery: lessons learned from the development of a preclinical candidate for prostate cancer with a new mechanism of action. J. Chem. Inf. Model. 57, 1018–1028 (2017).

    Article  CAS  PubMed  Google Scholar 

  103. Liu, Z. et al. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 381–385 https://doi.org/10.1109/BIBM52615.2021.9669513 (2021).

  104. McNutt, A. T. & Koes, D. R. Improving ΔΔG predictions with a multitask convolutional siamese network. J. Chem. Inf. Model. 62, 1819–1829 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  105. Wang, J. & Dokholyan, N. V. Yuel: improving the generalizability of structure-free compound-protein interaction prediction. J. Chem. Inf. Model. 62, 463–471 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Li, X. et al. Deep learning enhancing kinome-wide polypharmacology profiling: model construction and experiment validation. J. Med. Chem. 63, 8723–8737 (2020).

    Article  CAS  PubMed  Google Scholar 

  107. Li, Z. et al. KinomeX: a web application for predicting kinome-wide polypharmacology effect of small molecules. Bioinformatics 35, 5354–5356 (2019).

    Article  CAS  PubMed  Google Scholar 

  108. Krishnan, S. R., Bung, N., Bulusu, G. & Roy, A. Accelerating de novo drug design against novel proteins using deep learning. J. Chem. Inf. Model. 61, 621–630 (2021).

    Article  CAS  PubMed  Google Scholar 

  109. Gentile, F. et al. Artificial intelligence-enabled virtual screening of ultra-large chemical libraries with deep docking. Nat. Protoc. 17, 672–697 (2022).

    Article  CAS  PubMed  Google Scholar 

  110. LeGrand, S. et al. In: BCB ‘20: Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics https://doi.org/10.1145/3388440.3412472 (Association for Computing Machinery, Inc., 2020).

  111. Gorgulla, C. et al. An open-source drug discovery platform enables ultra-large virtual screens. Nature 580, 663–668 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Venkatraman, V. et al. Drugsniffer: an open source workflow for virtually screening billions of molecules for binding affinity to protein targets. Front. Pharmacol. 13, 1389 (2022).

    Article  Google Scholar 

  113. Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566, 224–229 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  114. Häse, F., Roch, L. M. & Aspuru-Guzik, A. Next-generation experimentation with self-driving laboratories. Trends Chem. 1, 282–291 (2019).

    Article  Google Scholar 

  115. Zubatiuk, T. & Isayev, O. Development of multimodal machine learning potentials: toward a physics-aware artificial intelligence. Acc. Chem. Res. 54, 1575–1585 (2021).

    Article  CAS  PubMed  Google Scholar 

  116. Behler, J. Four generations of high-dimensional neural network potentials. Chem. Rev. 121, 10037–10072 (2021).

    Article  CAS  PubMed  Google Scholar 

  117. Smith, J. S., Nebgen, B., Lubbers, N., Isayev, O. & Roitberg, A. E. Less is more: sampling chemical space with active learning. J. Chem. Phys. 148, 241733 (2018).

    Article  PubMed  Google Scholar 

  118. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  119. Devereux, C. et al. Extending the applicability of the ANI deep learning molecular potential to sulfur and halogens. J. Chem. Theory Comput. 16, 4192–4202 (2020).

    Article  CAS  PubMed  Google Scholar 

  120. Galvelis, R., Doerr, S., Damas, J. M., Harvey, M. J. & De Fabritiis, G. A scalable molecular force field parameterization method based on density functional theory and quantum-level machine learning. J. Chem. Inf. Model. 59, 3485–3493 (2019).

    Article  CAS  PubMed  Google Scholar 

  121. Rufa, D. A. et al. Towards chemical accuracy for alchemical free energy calculations with hybrid physics-based machine learning / molecular mechanics potentials. Preprint at: bioRxiv https://doi.org/10.1101/2020.07.29.227959 (2020).

    Article  Google Scholar 

  122. Wang, L. et al. Accurate and reliable prediction of relative ligand binding potency in prospective drug discovery by way of a modern free-energy calculation protocol and force field. J. Am. Chem. Soc. 137, 2695–2703 (2015).

    Article  CAS  PubMed  Google Scholar 

  123. Zubatyuk, R., Smith, J. S., Leszczynski, J. & Isayev, O. Accurate and transferable multitask prediction of chemical properties with an atoms-in-molecules neural network. Sci. Adv. 5, eaav6490 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  124. Matta, C. F. & Boyd, R. J. An introduction to the quantum theory of atoms in molecules. The Quantum Theory of Atoms in Molecules https://doi.org/10.1002/9783527610709.ch1 (2007).

  125. Gokcan, H. & Isayev, O. Prediction of protein pKa with representation learning. Chem. Sci. 13, 2462–2474 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  126. Bas, D. C., Rogers, D. M. & Jensen, J. H. Very fast prediction and rationalization of pKa values for protein-ligand complexes. Proteins 73, 765–783 (2008).

    Article  CAS  PubMed  Google Scholar 

  127. Lam, Y. H. et al. Applications. Org. Process. Res. Dev. 24, 1496–1507 (2020).

    Article  CAS  Google Scholar 

  128. Hassanzadeh, P. Towards the quantum of quantum chemistry in pharmaceutical process development: current state and opportunities-enabled technologies for development of drugs or delivery systems. J. Control. Rel. 324, 260–279 (2020).

    Article  CAS  Google Scholar 

  129. Li, Q. et al. The role of UNC5C in Alzheimer’s disease. Ann. Transl. Med. 6, 178 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Cao, Y., Romero, J. & Aspuru-Guzik, A. Potential of quantum computing for drug discovery. IBM J. Res. Dev. 62, 10.1147/JRD.2018.2888987 (2018).

  131. Kirsopp, J. J. M. et al. Quantum computational quantification of protein-ligand interactions. Int. J. Quantum Chem. 122, e26975 (2022).

    Article  CAS  Google Scholar 

  132. Outeiral, C. et al. The prospects of quantum computing in computational molecular biology. Wiley Interdiscip. Rev. Comput. Mol. Sci. 11, e1481 (2021).

    Article  CAS  Google Scholar 

  133. Li, J. et al. Drug discovery approaches using quantum machine learning. Preprint at: arXiv https://doi.org/10.48550/arxiv.2104.00746 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  134. Romero, J., Olson, J. P. & Aspuru-Guzik, A. Quantum autoencoders for efficient compression of quantum data. Quantum Sci. Technol. 2, 045001 (2017).

    Article  Google Scholar 

  135. Cavasotto, C. N. Binding free energy calculation using quantum mechanics aimed for drug lead optimization. Methods Mol. Biol. 2114, 257–268 (2020).

    Article  CAS  PubMed  Google Scholar 

  136. Heinen, S. et al. Predicting toxicity by quantum machine learning. J. Phys. Commun. 4, 125012 (2020).

    Article  Google Scholar 

  137. Jayatunga, M. K. P., Xie, W., Ruder, L., Schulze, U. & Meier, C. AI in small-molecule drug discovery: a coming wave? Nat. Rev. Drug Discov. 21, 175–176 (2022).

    Article  CAS  PubMed  Google Scholar 

  138. Pyzer-Knapp, E. O. Using Bayesian optimization to accelerate virtual screening for the discovery of therapeutics appropriate for repurposing for COVID-19. Preprint at: arXiv https://doi.org/10.48550/arxiv.2005.07121 (2020).

    Article  Google Scholar 

  139. Jastrzębski, S. et al. Emulating docking results using a deep neural network: a new perspective for virtual screening. J. Chem. Inf. Model. 60, 4246–4262 (2020).

    Article  PubMed  Google Scholar 

  140. Graff, D. E., Shakhnovich, E. I. & Coley, C. W. Accelerating high-throughput virtual screening through molecular pool-based active learning. Chem. Sci. 12, 7866–7881 (2020).

    Article  Google Scholar 

  141. Martin, L. J. State of the art iterative docking with logistic regression and Morgan fingerprints. ChemRxiv https://doi.org/10.26434/chemrxiv.14348117.v1 (2021).

    Article  Google Scholar 

  142. Berenger, F., Kumar, A., Zhang, K. Y. J. & Yamanishi, Y. Lean-docking: exploiting ligands’ predicted docking scores to accelerate molecular docking. J. Chem. Inf. Model. 61, 2341–2352 (2021).

    Article  CAS  PubMed  Google Scholar 

  143. Kalliokoski, T. Machine learning boosted docking (HASTEN): an open-source tool to accelerate structure-based virtual screening campaigns. Mol. Inform. 40, 2100089 (2021).

    Article  CAS  Google Scholar 

  144. Mehta, S. et al. MEMES: machine learning framework for enhanced molecular screening. Chem. Sci. 12, 11710–11721 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  145. Yang, Y. et al. Efficient exploration of chemical space with docking and deep learning. J. Chem. Theory Comput. 17, 7106–7119 (2021).

    Article  CAS  PubMed  Google Scholar 

  146. Choi, J. & Lee, J. V-Dock: fast generation of novel drug-like molecules using machine-learning-based docking score and molecular optimization. Int. J. Mol. Sci. 22, 11635 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  147. Bucinsky, L. et al. Machine learning prediction of 3CLpro SARS-CoV-2 docking scores. Comput. Biol. Chem. 98, 107656 (2022).

  148. Sha, C. M., Wang, J. & Dokholyan, N. V. NeuralDock: rapid and conformation-agnostic docking of small molecules. Front. Mol. Biosci. 9, 244 (2022).

  149. Morris, C. J., Stern, J. A., Stark, B., Christopherson, M. & Della Corte, D. MILCDock: machine learning enhanced consensus docking for virtual screening in drug discovery. J. Chem. Inf. Model. 62, 5342–5350 (2022).

  150. García-Ortegón, M. et al. DOCKSTRING: easy molecular docking yields better benchmarks for ligand design. J. Chem. Inf. Model. 62, 3486–3502 (2022).

  151. Qiu, Y. et al. Development and benchmarking of open force field v1.0.0 — the parsley small-molecule force field. J. Chem. Theory Comput. 17, 6262–6280 (2021).

  152. Tingle, B. I. et al. ZINC-22 — a free multi-billion-scale database of tangible compounds for ligand discovery. J. Chem. Inf. Model. 63, 1166–1176 (2023).

  153. Babuji, Y. et al. Targeting SARS-CoV-2 with AI- and HPC-enabled lead generation: a first data release. Preprint at: arXiv https://doi.org/10.48550/arXiv.2006.02431 (2020).

  154. Warr, W. A., Nicklaus, M. C., Nicolaou, C. A. & Rarey, M. Exploration of ultralarge compound collections for drug discovery. J. Chem. Inf. Model. 62, 2021–2034 (2022).

  155. Ruddigkeit, L., van Deursen, R., Blum, L. C. & Reymond, J. L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52, 2864–2875 (2012).

  156. Oprea, T. I. & Gottfries, J. Chemography: the art of navigating in chemical space. J. Comb. Chem. 3, 157–166 (2001).

  157. Medina-Franco, J., Martinez-Mayorga, K., Giulianotti, M., Houghten, R. & Pinilla, C. Visualization of the chemical space in drug discovery. Curr. Comput. Aided Drug Des. 4, 322–333 (2008).

  158. Kireeva, N. et al. Generative topographic mapping (GTM): universal tool for data visualization, structure-activity modeling and dataset comparison. Mol. Inform. 31, 301–312 (2012).

  159. Zabolotna, Y. et al. Chemography: searching for hidden treasures. J. Chem. Inf. Model. 61, 179–188 (2021).

  160. Casciuc, I. et al. Virtual screening with generative topographic maps: how many maps are required? J. Chem. Inf. Model. 59, 564–572 (2019).

  161. Zabolotna, Y. et al. ChemSpace Atlas: multiscale chemography of ultralarge libraries for drug discovery. J. Chem. Inf. Model. 62, 4537–4548 (2022).

  162. Sattarov, B. et al. De novo molecular design by combining deep autoencoder recurrent neural networks with generative topographic mapping. J. Chem. Inf. Model. 59, 1182–1196 (2019).

  163. Bort, W. et al. Discovery of novel chemical reactions by deep generative recurrent neural network. Sci. Rep. 11, 3178 (2021).

Acknowledgements

The authors acknowledge support of their studies by the National Institutes of Health (grant R01GM140154 to A.T.) and the National Science Foundation (grant CHE-2154447 to O.I.).

Author information

Corresponding authors

Correspondence to Alexander Tropsha or Artem Cherkasov.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Drug Discovery thanks Esben Jannik Bjerrum, Eric Martin and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

BioSolveIT KnowledgeSpace: https://www.biosolveit.de/infiniSee/#knowledgespace

CAS. AI drug discovery: assessing the first AI-designed drug candidates to go into human clinical trials: https://www.cas.org/resources/cas-insights/drug-discovery/ai-designed-drug-candidates

CHEMriya: https://chemriya.com/

Enamine REAL Database: https://enamine.net/compound-collections/real-compounds/real-database

Enamine REAL Space on-demand library: https://enamine.net/compound-collections/real-compounds/real-space-navigator

eXplore from eMolecules: https://www.biosolveit.de/2022/09/12/introducing-explore-trillion-sized-chemical-space-by-emolecules/

Gartner hype cycle: https://www.gartner.com/en/articles/what-s-new-in-artificial-intelligence-from-the-2022-gartner-hype-cycle

Insilico Medicine. From start to phase 1 in 30 months: https://insilico.com/phase1

Kaggle. Merck Molecular Activity Challenge: https://www.kaggle.com/c/MerckActivity

KNIME: https://sites.google.com/site/dtclabdc/

MolDB from Deepcure: https://deepcure.ai/technology/

WuXi AppTec: https://www.wuxiapptec.com/

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tropsha, A., Isayev, O., Varnek, A. et al. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 23, 141–155 (2024). https://doi.org/10.1038/s41573-023-00832-0

  • DOI: https://doi.org/10.1038/s41573-023-00832-0
