  • Review Article

Artificial intelligence for natural product drug discovery

Abstract

Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have advanced computational drug design, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
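
Biological activity prediction in its simplest form can be illustrated with similarity-weighted k-nearest-neighbour regression on binary molecular fingerprints. The sketch below is a minimal illustration under stated assumptions, not the approach of this review or the API of any library. It assumes that each compound has already been featurized as a fingerprint, stored as the set of indices of its "on" bits (for example, a circular fingerprint computed with a cheminformatics toolkit). It uses the Tanimoto coefficient as the similarity measure. The function names and the toy fingerprints and activity values are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A binary fingerprint represented by the indices of its "on" bits (assumption:
# featurization, e.g. a circular fingerprint, has already been done elsewhere).
Fingerprint = FrozenSet[int]
Dataset = List[Tuple[Fingerprint, float]]  # (fingerprint, measured activity) pairs


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(query: Fingerprint, training: Dataset, k: int = 3) -> float:
    """Similarity-weighted mean activity of the k most similar training compounds."""
    if not training:
        raise ValueError("training set is empty")
    # Rank (similarity, activity) pairs by similarity, highest first; keep the top k.
    scored = sorted(((tanimoto(query, fp), y) for fp, y in training), reverse=True)[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0.0:
        # No structural overlap with any neighbour: fall back to an unweighted mean.
        return sum(y for _, y in scored) / len(scored)
    return sum(sim * y for sim, y in scored) / total


def max_train_similarity(query: Fingerprint, training: Dataset) -> float:
    """Similarity to the nearest training compound: a crude applicability-domain check."""
    return max((tanimoto(query, fp) for fp, _ in training), default=0.0)


if __name__ == "__main__":
    # Hypothetical toy data: fingerprints and activity values (e.g. pIC50) are invented.
    training = [
        (frozenset({1, 4, 9, 12}), 7.2),
        (frozenset({1, 4, 9, 30}), 6.8),
        (frozenset({2, 5, 40, 41}), 4.1),
        (frozenset({2, 5, 40, 77}), 4.5),
    ]
    query = frozenset({1, 4, 9, 13})
    print(round(knn_predict(query, training, k=2), 2))  # 7.0
    print(max_train_similarity(query, training))        # 0.6
```

In the toy example, the query shares three of five combined on-bits with each of the first two training compounds (Tanimoto 0.6) and none with the other two. The prediction is therefore the mean of 7.2 and 6.8. The nearest-neighbour similarity check relates to the validation concern above: predictions for compounds that lie far from the training data, which may be frequent for natural products with unusual scaffolds, deserve less trust.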

Fig. 1: Applications of artificial intelligence in natural product and drug discovery.
Fig. 2: Example compounds discovered using artificial intelligence approaches.
Fig. 3: Predicting biological activities and macromolecular targets from genomic, metabolomic and phenotypic data.
Fig. 4: Chemical featurization techniques.
Fig. 5: Depositing and sharing natural product data: infrastructure and incentives.

Similar content being viewed by others

References

  1. Dobson, P. D., Patel, Y. & Kell, D. B. ‘Metabolite-likeness’ as a criterion in the design and selection of pharmaceutical drug libraries. Drug Discov. Today 14, 31–40 (2009).

    Article  CAS  PubMed  Google Scholar 

  2. Newman, D. J. & Cragg, G. M. Natural products as sources of new drugs over the nearly four decades from 01/1981 to 09/2019. J. Nat. Prod. 83, 770–803 (2020).

    Article  CAS  PubMed  Google Scholar 

  3. Koehn, F. E. & Carter, G. T. The evolving role of natural products in drug discovery. Nat. Rev. Drug. Discov. 4, 206–220 (2005).

    Article  CAS  PubMed  Google Scholar 

  4. Terlouw, B. R. et al. MIBiG 3.0: a community-driven effort to annotate experimentally validated biosynthetic gene clusters. Nucleic Acids Res. 51, D603–D610 (2023).

    Article  CAS  PubMed  Google Scholar 

  5. Gavriilidou, A. et al. Compendium of specialized metabolite biosynthetic diversity encoded in bacterial genomes. Nat. Microbiol. 7, 726–735 (2022).

    Article  CAS  PubMed  Google Scholar 

  6. van der Hooft, J. J. J. et al. Linking genomics and metabolomics to chart specialized metabolic diversity. Chem. Soc. Rev. 49, 3297–3314 (2020).

    Article  PubMed  Google Scholar 

  7. Doerr, S. et al. TorchMD: a deep learning framework for molecular simulations. J. Chem. Theory Comput. 17, 2355–2363 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  8. Rodríguez-Espigares, I. et al. GPCRmd uncovers the dynamics of the 3D-GPCRome. Nat. Methods 17, 777–787 (2020).

    Article  PubMed  Google Scholar 

  9. Liu, X., IJzerman, A. P. & van Westen, G. J. P. Computational approaches for de novo drug design: past, present, and future. Methods Mol. Biol. 2190, 139–165 (2021).

    Article  CAS  PubMed  Google Scholar 

  10. Choudhury, C., Arul Murugan, N. & Priyakumar, U. D. Structure-based drug repurposing: traditional and advanced AI/ML-aided methods. Drug Discov. Today 27, 1847–1861 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  11. Blin, K. et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 49, W29–W35 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  12. Skinnider, M. A. et al. Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences. Nat. Commun. 11, 6058 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  13. Medema, M. H. & Fischbach, M. A. Computational approaches to natural product discovery. Nat. Chem. Biol. 11, 639–648 (2015).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  14. Medema, M. H., de Rond, T. & Moore, B. S. Mining genomes to illuminate the specialized chemistry of life. Nat. Rev. Genet. 22, 553–571 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  15. Cimermancic, P. et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158, 412–421 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  16. Hannigan, G. D. et al. A deep learning genome-mining strategy for biosynthetic gene cluster prediction. Nucleic Acids Res. 47, e110 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Carroll, L. M. et al. Accurate de novo identification of biosynthetic gene clusters with GECCO. Preprint at bioRxiv https://doi.org/10.1101/2021.05.03.442509 (2021).

  18. Sanchez, S. et al. Expansion of novel biosynthetic gene clusters from diverse environments using SanntiS. Preprint at bioRxiv https://doi.org/10.1101/2023.05.23.540769 (2023).

  19. Kloosterman, A. M. et al. Expansion of RiPP biosynthetic space through integration of pan-genomics and machine learning uncovers a novel class of lanthipeptides. PLoS Biol. 18, e3001026 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. de Los Santos, E. L. C. NeuRiPP: neural network identification of RiPP precursor peptides. Sci. Rep. 9, 13406 (2019).

    Article  Google Scholar 

  21. Merwin, N. J. et al. DeepRiPP integrates multiomics data to automate discovery of novel ribosomally synthesized natural products. Proc. Natl Acad. Sci. USA 117, 371–380 (2020).

    Article  CAS  PubMed  Google Scholar 

  22. Tietz, J. I. et al. A new genome-mining tool redefines the lasso peptide biosynthetic landscape. Nat. Chem. Biol. 13, 470–478 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  23. Louwen, J. J. R. & van der Hooft, J. J. J. Comprehensive large-scale integrative analysis of omics data to accelerate specialized metabolite discovery. mSystems 6, e0072621 (2021).

    Article  PubMed  Google Scholar 

  24. Huber, F. et al. Spec2Vec: improved mass spectral similarity scoring through learning of structural relationships. PLoS Comput. Biol. 17, e1008724 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  25. Huber, F., van der Burg, S., van der Hooft, J. J. J. & Ridder, L. MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra. J. Cheminform. 13, 84 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  26. Ludwig, M. et al. Databse-independent molecular formula annotation using Gibbs sampling through ZODIAC. Nat. Mach. Intell. 2, 629–641 (2020).

    Article  Google Scholar 

  27. Hoffmann, M. A. et al. High-confidence structural annotation of metabolites absent from spectral libraries. Nat. Biotechnol. 40, 411–421 (2022).

    Article  CAS  PubMed  Google Scholar 

  28. Dührkop, K. et al. Systematic classification of unknown metabolites using high-resolution fragmentation mass spectra. Nat. Biotechnol. 39, 462–471 (2021).

    Article  PubMed  Google Scholar 

  29. Kim, H. W. et al. NPClassifier: a deep neural network-based structural classification tool for natural products. J. Nat. Prod. 84, 2795–2807 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Aalizadeh, R., Nika, M.-C. & Thomaidis, N. S. Development and application of retention time prediction models in the suspect and non-target screening of emerging contaminants. J. Hazard. Mater. 363, 277–285 (2019).

    Article  CAS  PubMed  Google Scholar 

  31. Chen, D., Wang, Z., Guo, D., Orekhov, V. & Qu, X. Review and prospect: deep learning in nuclear magnetic resonance spectroscopy. Chemistry 26, 10391–10401 (2020).

    Article  CAS  PubMed  Google Scholar 

  32. Wu, K. et al. Improvement in signal-to-noise ratio of liquid-state NMR spectroscopy via a deep neural network DN-unet. Anal. Chem. 93, 1377–1382 (2021).

    Article  CAS  PubMed  Google Scholar 

  33. Ito, K., Xu, X. & Kikuchi, J. Improved prediction of carbonless NMR spectra by the machine learning of theoretical and fragment descriptors for environmental mixture analysis. Anal. Chem. 93, 6901–6906 (2021).

    Article  CAS  PubMed  Google Scholar 

  34. Li, D.-W., Hansen, A. L., Yuan, C., Bruschweiler-Li, L. & Brüschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12, 5229 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Zheng, S. et al. Deep learning driven biosynthetic pathways navigation for natural products with BioNavi-NP. Nat. Commun. 13, 3342 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  36. Milanowski, D. J. et al. Unequivocal determination of caulamidines A and B: application and validation of new tools in the structure elucidation tool box. Chem. Sci. 9, 307–314 (2018).

    Article  CAS  PubMed  Google Scholar 

  37. Audoin, C. et al. Metabolome consistency: additional parazoanthines from the mediterranean zoanthid parazoanthus axinellae. Metabolites 4, 421–432 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  38. Fox Ramos, A. E. et al. CANPA: computer-assisted natural products anticipation. Anal. Chem. 91, 11247–11252 (2019).

    Article  CAS  PubMed  Google Scholar 

  39. Jones, C. G. et al. The CryoEM method MicroED as a powerful tool for small molecule structure determination. ACS Cent. Sci. 4, 1587–1592 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Kim, L. J. et al. Prospecting for natural products by genome mining and microcrystal electron diffraction. Nat. Chem. Biol. 17, 872–877 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. Dührkop, K., Shen, H., Meusel, M., Rousu, J. & Böcker, S. Searching molecular structure databases with tandem mass spectra using CSI:fingerID. Proc. Natl Acad. Sci. USA 112, 12580–12585 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  42. Lindsay, R. K. Applications of Artificial Intelligence for Organic Chemistry: The DENDRAL Project (McGraw-Hill, 1980).

  43. Dührkop, K. et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat. Methods 16, 299–302 (2019).

    Article  PubMed  Google Scholar 

  44. Stravs, M. A., Dührkop, K., Böcker, S. & Zamboni, N. MSNovelist: de novo structure generation from mass spectra. Nat. Methods 19, 865–870 (2022).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Colby, S. M., Nuñez, J. R., Hodas, N. O., Corley, C. D. & Renslow, R. R. Deep learning to generate chemical property libraries and candidate molecules for small molecule identification in complex samples. Anal. Chem. 92, 1720–1729 (2020).

    Article  CAS  PubMed  Google Scholar 

  46. Burns, D. C., Mazzola, E. P. & Reynolds, W. F. The role of computer-assisted structure elucidation (CASE) programs in the structure elucidation of complex natural products. Nat. Prod. Rep. 36, 919–933 (2019).

    Article  CAS  PubMed  Google Scholar 

  47. Reher, R. et al. A convolutional neural network-based approach for the rapid annotation of molecularly diverse natural products. J. Am. Chem. Soc. 142, 4114–4120 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Kim, H. W., Zhang, C., Cottrell, G. W. & Gerwick, W. H. SMART‐Miner: a convolutional neural network‐based metabolite identification from 1H‐13C HSQC spectra. Magn. Reson. Chem. 60, 1070–1075 (2022).

    Article  PubMed  Google Scholar 

  49. Wang, C. et al. COLMAR lipids web server and ultrahigh-resolution methods for two-dimensional nuclear magnetic resonance- and mass spectrometry-based lipidomics. J. Proteome Res. 19, 1674–1683 (2020).

    Article  CAS  PubMed  Google Scholar 

  50. Smith, S. G. & Goodman, J. M. Assigning stereochemistry to single diastereoisomers by GIAO NMR calculation: the DP4 probability. J. Am. Chem. Soc. 132, 12946–12959 (2010).

    Article  CAS  PubMed  Google Scholar 

  51. Howarth, A., Ermanis, K. & Goodman, J. DP4-AI automated NMR data analysis: straight from spectrometer to structure. Chem. Sci. 11, 4351–4359 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Das, S., Edison, A. S. & Merz, K. M. Jr. Metabolite structure assignment using in silico NMR techniques. Anal. Chem. 92, 10412–10419 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  53. Rodrigues, T., Reker, D., Schneider, P. & Schneider, G. Counting on natural products for drug design. Nat. Chem. 8, 531–541 (2016).

    Article  CAS  PubMed  Google Scholar 

  54. Lanz, J. & Riedl, R. Merging allosteric and active site binding motifs: de novo generation of target selectivity and potency via natural-product-derived fragments. ChemMedChem 10, 451–454 (2015).

    Article  CAS  PubMed  Google Scholar 

  55. Reker, D. et al. Revealing the macromolecular targets of complex natural products. Nat. Chem. 6, 1072–1078 (2014).

    Article  CAS  PubMed  Google Scholar 

  56. Wassermann, A. M. et al. A screening pattern recognition method finds new and divergent targets for drugs and natural products. ACS Chem. Biol. 9, 1622–1631 (2014).

    Article  CAS  PubMed  Google Scholar 

  57. Rollinger, J. M., Hornick, A., Langer, T., Stuppner, H. & Prast, H. Acetylcholinesterase inhibitory activity of scopolin and scopoletin discovered by virtual screening of natural products. J. Med. Chem. 47, 6248–6254 (2004).

    Article  CAS  PubMed  Google Scholar 

  58. Reker, D. et al. Machine learning uncovers food- and excipient-drug interactions. Cell Rep. 30, 3710–3716.e4 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Conde, J. et al. Allosteric antagonist modulation of TRPV2 by piperlongumine impairs glioblastoma progression. ACS Cent. Sci. 7, 868–881 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  60. Lagunin, A., Filimonov, D. & Poroikov, V. Multi-targeted natural products evaluation based on biological activity prediction with PASS. Curr. Pharm. Des. 16, 1703–1717 (2010).

    Article  CAS  PubMed  Google Scholar 

  61. Sá, M. S. et al. Antimalarial activity of physalins B, D, F, and G. J. Nat. Prod. 74, 2269–2272 (2011).

    Article  PubMed  Google Scholar 

  62. Schneider, G. et al. Deorphaning the macromolecular targets of the natural anticancer compound doliculide. Angew. Chem. Int. Ed. Engl. 55, 12408–12411 (2016).

    Article  CAS  PubMed  Google Scholar 

  63. Bertoni, M. et al. Bioactivity descriptors for uncharacterized chemical compounds. Nat. Commun. 12, 3932 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  64. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 181, 475–483 (2020).

    Article  CAS  PubMed  Google Scholar 

  65. Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  66. Liu, G. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. https://doi.org/10.1038/s41589-023-01349-8 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  67. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Pandey, M. et al. The transformational role of GPU computing and deep learning in drug discovery. Nat. Mach. Intell. 4, 211–221 (2022).

    Article  Google Scholar 

  69. Schindler, C. E. M. et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J. Chem. Inf. Model. 60, 5457–5474 (2020).

    Article  CAS  PubMed  Google Scholar 

  70. Walker, A. S. & Clardy, J. A machine learning bioinformatics method to predict biological activity from biosynthetic gene clusters. J. Chem. Inf. Model. 61, 2560–2571 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  71. Yang, Z. et al. Deep-BGCpred: a unified deep learning genome-mining framework for biosynthetic gene cluster prediction. Preprint at bioRxiv https://doi.org/10.1101/2021.11.15.468547 (2021).

  72. Mikolov, T., Chen, K., Corrado, G. & Dean, J. Efficient estimation of word representations in vector space. Preprint at arXiv. https://doi.org/10.48550/ARXIV.1301.3781 (2013).

  73. Thaker, M. N. et al. Identifying producers of antibacterial compounds by screening for antibiotic resistance. Nat. Biotechnol. 31, 922–927 (2013).

    Article  CAS  PubMed  Google Scholar 

  74. Alcock, B. P. et al. CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic Acids Res. 48, D517–D525 (2020).

    CAS  PubMed  Google Scholar 

  75. Bortolaia, V. et al. ResFinder 4.0 for predictions of phenotypes from genotypes. J. Antimicrob. Chemother. 75, 3491–3500 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  76. Mungan, M. D. et al. ARTS 2.0: feature updates and expansion of the antibiotic resistant target seeker for comparative genome mining. Nucleic Acids Res. 48, W546–W552 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  77. Jia, B. et al. CARD 2017: expansion and model-centric curation of the comprehensive antibiotic resistance database. Nucleic Acids Res. 45, D566–D573 (2017).

    Article  CAS  PubMed  Google Scholar 

  78. Sélem-Mojica, N., Aguilar, C., Gutiérrez-García, K., Martínez-Guerrero, C. E. & Barona-Gómez, F. EvoMining reveals the origin and fate of natural product biosynthetic enzymes. Microb. Genom. 5, e000260 (2019).

    PubMed  PubMed Central  Google Scholar 

  79. Chevrette, M. G. et al. Evolutionary dynamics of natural product biosynthesis in bacteria. Nat. Prod. Rep. 37, 566–599 (2020).

    Article  CAS  PubMed  Google Scholar 

  80. Cereto-Massagué, A. et al. Molecular fingerprint similarity search in virtual screening. Methods 71, 58–63 (2015).

    Article  PubMed  Google Scholar 

  81. Willighagen, E. L. et al. The chemistry development kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J. Cheminform. 9, 33 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  82. Todeschini, R. & Consonni, V. Handbook of Molecular Descriptors (John Wiley & Sons, 2008).

  83. Skinnider, M. A., Dejong, C. A., Franczak, B. C., McNicholas, P. D. & Magarvey, N. A. Comparative analysis of chemical similarity methods for modular natural products with a hypothetical structure enumeration algorithm. J. Cheminform. 9, 46 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  84. Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

    Article  CAS  PubMed  Google Scholar 

  85. Riniker, S. & Landrum, G. A. Open-source platform to benchmark fingerprints for ligand-based virtual screening. J. Cheminform. 5, 26 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  86. O’Boyle, N. M. & Sayle, R. A. Comparing structural fingerprints using a literature-based similarity benchmark. J. Cheminform. 8, 36 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  87. Grisoni, F. et al. Scaffold hopping from natural products to synthetic mimetics by holistic molecular similarity. Commun. Chem. 1, 44 (2018).

    Article  Google Scholar 

  88. Capecchi, A., Probst, D. & Reymond, J.-L. One molecular fingerprint to rule them all: drugs, biomolecules, and the metabolome. J. Cheminform. 12, 43 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Capecchi, A. & Reymond, J.-L. Assigning the origin of microbial natural products by chemical space map and machine learning. Biomolecules 10, 1385 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  90. Riniker, S. Molecular dynamics fingerprints (MDFP): machine learning from MD data to predict free-energy differences. J. Chem. Inf. Model. 57, 726–741 (2017).

    Article  CAS  PubMed  Google Scholar 

  91. Esposito, C., Wang, S., Lange, U. E. W., Oellien, F. & Riniker, S. Combining machine learning and molecular dynamics to predict p-glycoprotein substrates. J. Chem. Inf. Model. 60, 4730–4749 (2020).

    Article  CAS  PubMed  Google Scholar 

  92. Bannan, C. C. et al. Blind prediction of cyclohexane–water distribution coefficients from the SAMPL5 challenge. J. Comput. Aided Mol. Des. 30, 927–944 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Wang, S. & Riniker, S. Use of molecular dynamics fingerprints (MDFPs) in SAMPL6 octanol-water log P blind challenge. J. Comput. Aided Mol. Des. 34, 393–403 (2020).

    Article  CAS  PubMed  Google Scholar 

  94. Gorostiola González, M. et al. 3DDPDs: describing protein dynamics for proteochemometric bioactivity prediction. A case for (mutant) G protein-coupled receptors. Preprint at ChemRxiv https://doi.org/10.26434/chemrxiv-2023-90082 (2023).

  95. Durairaj, J., Akdel, M., de Ridder, D. & van Dijk, A. D. J. Geometricus represents protein structures as shape-mers derived from moment invariants. Bioinformatics 36, i718–i725 (2020).

    Article  CAS  PubMed  Google Scholar 

  96. Paull, K. D. et al. Display and analysis of patterns of differential activity of drugs against human tumor cell lines: development of mean graph and COMPARE algorithm. J. Natl Cancer Inst. 81, 1088–1092 (1989).

    Article  CAS  PubMed  Google Scholar 

  97. Kauvar, L. M. et al. Predicting ligand binding to proteins by affinity fingerprinting. Chem. Biol. 2, 107–118 (1995).

    Article  CAS  PubMed  Google Scholar 

  98. Petrone, P. M. et al. Rethinking molecular similarity: comparing compounds on the basis of biological activity. ACS Chem. Biol. 7, 1399–1409 (2012).

    Article  CAS  PubMed  Google Scholar 

  99. Norinder, U., Spjuth, O. & Svensson, F. Using predicted bioactivity profiles to improve predictive modeling. J. Chem. Inf. Model. 60, 2830–2837 (2020).

    Article  CAS  PubMed  Google Scholar 

  100. Mater, A. C. & Coote, M. L. Deep learning in chemistry. J. Chem. Inf. Model. 59, 2545–2559 (2019).

    Article  CAS  PubMed  Google Scholar 

  101. Bronstein, M. M., Bruna, J., Cohen, T. & Veličković, P. Geometric deep learning: grids, groups, graphs, geodesics, and gauges. Preprint at arXiv. https://doi.org/10.48550/arXiv.2104.13478 (2021).

  102. Wu, Z. et al. MoleculeNet: a benchmark for molecular machine learning. Chem. Sci. 9, 513–530 (2018).

    Article  CAS  PubMed  Google Scholar 

  103. van Tilborg, D., Alenicheva, A. & Grisoni, F. Exposing the limitations of molecular machine learning with activity cliffs. J. Chem. Inf. Model. 62, 5938–5951 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  104. Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Springer Nature, 2019).

  105. Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).

    Article  Google Scholar 

  106. Jiménez-Luna, J., Skalic, M., Weskamp, N. & Schneider, G. Coloring molecules with explainable artificial intelligence for preclinical relevance assessment. J. Chem. Inf. Model. 61, 1083–1094 (2021).

    Article  PubMed  Google Scholar 

  107. Preuer, K., Klambauer, G., Rippmann, F., Hochreiter, S. & Unterthiner, T. in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (eds Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R.) 331–345 (Springer International Publishing, 2019).

  108. Webel, H. E. et al. Revealing cytotoxic substructures in molecules using deep learning. J. Comput. Aided Mol. Des. 34, 731–746 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Kearnes, S., McCloskey, K., Berndl, M., Pande, V. & Riley, P. Molecular graph convolutions: moving beyond fingerprints. J. Comput. Aided Mol. Des. 30, 595–608 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Coley, C. W., Barzilay, R., Green, W. H., Jaakkola, T. S. & Jensen, K. F. Convolutional embedding of attributed molecular graphs for physical property prediction. J. Chem. Inf. Model. 57, 1757–1772 (2017).

    Article  CAS  PubMed  Google Scholar 

  111. Duvenaud, D. et al. Convolutional networks on graphs for learning molecular fingerprints. in Advances in Neural Information Processing Systems 28 (NIPS 015).

  112. Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. in Proceedings of the 34th International Conference on Machine Learning 1263–1272 (2017).

  113. Nguyen, T. et al. GraphDTA: predicting drug-target binding affinity with graph neural networks. Bioinformatics 37, 1140–1147 (2021).

    Article  CAS  PubMed  Google Scholar 

  114. Yuan, W. et al. Chemical space mimicry for drug discovery. J. Chem. Inf. Model. 57, 875–882 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  115. Segler, M. H. S., Kogej, T., Tyrchan, C. & Waller, M. P. Generating focused molecule libraries for drug discovery with recurrent neural networks. ACS Cent. Sci. 4, 120–131 (2018).

    Article  CAS  PubMed  Google Scholar 

  116. Liu, X., Ye, K., van Vlijmen, H. W. T., IJzerman, A. P. & van Westen, G. J. P. DrugEx v3: scaffold-constrained drug design with graph transformer-based reinforcement learning. J. Cheminform. 15, 24 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  117. Li, X. & Fourches, D. Inductive transfer learning for molecular activity prediction: next-gen QSAR models with MolPMoFiT. J. Cheminform. 12, 27 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  118. Karpov, P., Godin, G. & Tetko, I. V. Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J. Cheminform. 12, 17 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  119. Gainza, P. et al. Deciphering interaction fingerprints from protein molecular surfaces using geometric deep learning. Nat. Methods 17, 184–192 (2020).

    Article  CAS  PubMed  Google Scholar 

  120. Winter, R., Montanari, F., Noé, F. & Clevert, D.-A. Learning continuous and data-driven molecular descriptors by translating equivalent chemical representations. Chem. Sci. 10, 1692–1701 (2019).

    Article  CAS  PubMed  Google Scholar 

  121. Bjerrum, E. J. & Sattarov, B. Improving chemical autoencoder latent space and molecular generation diversity with heteroencoders. Biomolecules 8, 131 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  122. Gómez-Bombarelli, R. et al. Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent. Sci. 4, 268–276 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  123. Atz, K., Grisoni, F. & Schneider, G. Geometric deep learning on molecular representations. Nat. Mach. Intell. 3, 1023–1032 (2021).

    Article  Google Scholar 

  124. Callaway, E. After AlphaFold: protein-folding contest seeks next big breakthrough. Nature 613, 13–14 (2023).

    Article  CAS  PubMed  Google Scholar 

  125. Wallner, B. AFsample: improving multimer prediction with alphafold using aggressive sampling. Preprint at bioRxiv https://doi.org/10.1101/2022.12.20.521205 (2022).

  126. Bender, A. & Cortés-Ciriano, I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: ways to make an impact, and why we are not there yet. Drug Discov. Today 26, 511–524 (2021).

    Article  CAS  PubMed  Google Scholar 

  127. Bender, A. & Cortés-Ciriano, I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 2: a discussion of chemical and biological data. Drug Discov. Today 26, 1040–1052 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  128. Sydow, D., Rodríguez-Guerra, J. & Volkamer, A. in Teaching Programming across the Chemistry Curriculum 135–158 ACS Symposium Series vol. 1387 (American Chemical Society, 2021).

  129. Korshunova, M., Ginsburg, B., Tropsha, A. & Isayev, O. OpenChem: a deep learning toolkit for computational chemistry and drug design. J. Chem. Inf. Model. 61, 7–13 (2021).

    Article  CAS  PubMed  Google Scholar 

  130. Sieg, J., Flachsenberg, F. & Rarey, M. In need of bias control: evaluating chemical data for machine learning in structure-based virtual screening. J. Chem. Inf. Model. 59, 947–961 (2019).

    Article  CAS  PubMed  Google Scholar 

  131. Lenselink, E. B. et al. Beyond the hype: deep neural networks outperform established methods using a ChEMBL bioactivity benchmark set. J. Cheminform. 9, 45 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  132. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  133. Topçuoğlu, B. D., Lesniak, N. A., Ruffin, M. T. 4th, Wiens, J. & Schloss, P. D. A framework for effective application of machine learning to microbiome-based classification problems. MBio 11, e00434-20 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  134. Quinn, T. P. & Erb, I. Examining microbe–metabolite correlations by linear methods. Nat. Methods 18, 37–39 (2021).

    Article  CAS  PubMed  Google Scholar 

  135. Morger, A. et al. KnowTox: pipeline and case study for confident prediction of potential toxic effects of compounds in early phases of development. J. Cheminform. 12, 24 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  136. Soleimany, A. P. et al. Evidential deep learning for guided molecular property prediction and discovery. ACS Cent. Sci. 7, 1356–1367 (2021).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  137. Manica, M. et al. Toward explainable anticancer compound sensitivity prediction via multimodal attention-based convolutional encoders. Mol. Pharm. 16, 4797–4806 (2019).

    Article  CAS  PubMed  Google Scholar 

  138. Grinsztajn, L., Oyallon, E. & Varoquaux, G. in Advances in Neural Information Processing Systems 35 (NeurIPS 2022) 507–520 (2022).

  139. Chithrananda, S., Grand, G. & Ramsundar, B. ChemBERTa: large-scale self-supervised pretraining for molecular property prediction. Preprint at https://doi.org/10.48550/arXiv.2010.09885 (2020).

  140. Irwin, R., Dimitriadis, S., He, J. & Bjerrum, E. J. Chemformer: a pre-trained transformer for computational chemistry. Mach. Learn. Sci. Technol. 3, 015022 (2022).

    Article  Google Scholar 

  141. Chapelle, O., Zien, A. & Schölkopf, B. (Eds) Semi-Supervised Learning (MIT, 2006).

  142. Zhang, Y. & Lee, A. A. Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning. Chem. Sci. 10, 8154–8163 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  143. Röttig, M. et al. NRPSpredictor2—a web server for predicting NRPS adenylation domain specificity. Nucleic Acids Res. 39, W362–W367 (2011).

    Article  PubMed  PubMed Central  Google Scholar 

  144. Torrey, L. & Shavlik, J. in Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques 242–264 (IGI Global, 2010).

  145. Cai, C. et al. Transfer learning for drug discovery. J. Med. Chem. 63, 8683–8694 (2020).

  146. Moret, M., Helmstädter, M., Grisoni, F., Schneider, G. & Merk, D. Beam search for automated design and scoring of novel ROR ligands with machine intelligence. Angew. Chem. Int. Ed. Engl. 60, 19477–19482 (2021).

  147. Moret, M., Friedrich, L., Grisoni, F., Merk, D. & Schneider, G. Generative molecular design in low data regimes. Nat. Mach. Intell. 2, 171–180 (2020).

  148. Moret, M. et al. Leveraging molecular structure and bioactivity with chemical language models for de novo drug design. Nat. Commun. 14, 114 (2023).

  149. Reker, D. Practical considerations for active machine learning in drug discovery. Drug Discov. Today Technol. 32–33, 73–79 (2019).

  150. Reker, D., Schneider, P. & Schneider, G. Multi-objective active machine learning rapidly improves structure-activity models and reveals new protein-protein interaction inhibitors. Chem. Sci. 7, 3919–3927 (2016).

  151. Djoumbou Feunang, Y. et al. ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J. Cheminform. 8, 61 (2016).

  152. Reher, R. et al. Native metabolomics identifies the rivulariapeptolide family of protease inhibitors. Nat. Commun. 13, 4619 (2022).

  153. Olivecrona, M., Blaschke, T., Engkvist, O. & Chen, H. Molecular de-novo design through deep reinforcement learning. J. Cheminform. 9, 48 (2017).

  154. Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).

  155. Liu, X., Ye, K., van Vlijmen, H. W. T., IJzerman, A. P. & van Westen, G. J. P. An exploration strategy improves the diversity of de novo ligands using deep reinforcement learning: a case for the adenosine A2A receptor. J. Cheminform. 11, 35 (2019).

  156. Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555, 604–610 (2018).

  157. Coley, C. W. et al. A robotic platform for flow synthesis of organic compounds informed by AI planning. Science 365, eaax1566 (2019).

  158. Thakkar, A., Kogej, T., Reymond, J.-L., Engkvist, O. & Bjerrum, E. J. Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain. Chem. Sci. 11, 154–168 (2020).

  159. Koch, M., Duigou, T. & Faulon, J.-L. Reinforcement learning for bioretrosynthesis. ACS Synth. Biol. 9, 157–168 (2020).

  160. Kramer, C., Kalliokoski, T., Gedeck, P. & Vulpetti, A. The experimental uncertainty of heterogeneous public Ki data. J. Med. Chem. 55, 5165–5173 (2012).

  161. Tiikkainen, P., Bellis, L., Light, Y. & Franke, L. Estimating error rates in bioactivity databases. J. Chem. Inf. Model. 53, 2499–2505 (2013).

  162. Sorokina, M. & Steinbeck, C. Review on natural products databases: where to find data in 2020. J. Cheminform. 12, 1–51 (2020).

  163. Mendez, D. et al. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 47, D930–D940 (2019).

  164. Liu, T., Lin, Y., Wen, X., Jorissen, R. N. & Gilson, M. K. BindingDB: a web-accessible database of experimentally determined protein-ligand binding affinities. Nucleic Acids Res. 35, D198–D201 (2007).

  165. Wimalaratne, S. M. et al. Uniform resolution of compact identifiers for biomedical data. Sci. Data 5, 180029 (2018).

  166. Rajan, K., Zielesny, A. & Steinbeck, C. DECIMER 1.0: deep learning for chemical image recognition using transformers. J. Cheminform. 13, 61 (2021).

  167. Rajan, K., Brinkhaus, H. O., Sorokina, M., Zielesny, A. & Steinbeck, C. DECIMER-segmentation: automated extraction of chemical structure depictions from scientific literature. J. Cheminform. 13, 20 (2021).

  168. Schymanski, E. L. & Bolton, E. E. FAIR chemical structures in the Journal of Cheminformatics. J. Cheminform. 13, 50 (2021).

  169. Kautsar, S. A. et al. MIBiG 2.0: a repository for biosynthetic gene clusters of known function. Nucleic Acids Res. 48, D454–D458 (2020).

  170. van Santen, J. A. et al. The Natural Products Atlas: an open access knowledge base for microbial natural products discovery. ACS Cent. Sci. 5, 1824–1833 (2019).

  171. van Santen, J. A. et al. The Natural Products Atlas 2.0: a database of microbially-derived natural products. Nucleic Acids Res. 50, D1317–D1323 (2021).

  172. Wang, M. et al. Sharing and community curation of mass spectrometry data with global natural products social molecular networking. Nat. Biotechnol. 34, 828–837 (2016).

  173. Wishart, D. S. et al. NP-MRD: the natural products magnetic resonance database. Nucleic Acids Res. 50, D665–D677 (2022).

  174. Flissi, A. et al. Norine: update of the nonribosomal peptide resource. Nucleic Acids Res. 48, D465–D469 (2020).

  175. Jarmusch, S. A., van der Hooft, J. J. J., Dorrestein, P. C. & Jarmusch, A. K. Advancements in capturing and mining mass spectrometry data are transforming natural products research. Nat. Prod. Rep. 38, 2066–2082 (2021).

  176. Jarmusch, A. K. et al. ReDU: a framework to find and reanalyze public mass spectrometry data. Nat. Methods 17, 901–904 (2020).

  177. Proteau, P. J. Journal of Natural Products 2022: perspectives, monthly cover art, and more. J. Nat. Prod. 85, 1–2 (2022).

  178. Clark, T. N. et al. Interlaboratory comparison of untargeted mass spectrometry data uncovers underlying causes for variability. J. Nat. Prod. 84, 824–835 (2021).

  179. Fiehn, O. et al. The metabolomics standards initiative (MSI). Metabolomics 3, 175–178 (2007).

  180. Frank, A. M. et al. Clustering millions of tandem mass spectra. J. Proteome Res. 7, 113–122 (2008).

  181. Miller, I. J. et al. Autometa: automated extraction of microbial genomes from individual shotgun metagenomes. Nucleic Acids Res. 47, e57 (2019).

  182. Schymanski, E. L. et al. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ. Sci. Technol. 48, 2097–2098 (2014).

  183. Deutsch, E. W. et al. Universal spectrum identifier for mass spectra. Nat. Methods 18, 768–770 (2021).

  184. Bittremieux, W. et al. Universal MS/MS visualization and retrieval with the metabolomics spectrum resolver web service. Preprint at bioRxiv https://doi.org/10.1101/2020.05.09.086066 (2020).

  185. Gordon, J. E. Chemical inference. 2. Formalization of the language of organic chemistry: generic systematic nomenclature. J. Chem. Inf. Comput. Sci. 24, 81–92 (1984).

  186. Wang, Y. et al. PubChem’s bioassay database. Nucleic Acids Res. 40, D400–D412 (2012).

  187. Banerjee, P. et al. Super Natural II—a database of natural products. Nucleic Acids Res. 43, D935–D939 (2015).

  188. Zeng, X. et al. NPASS: natural product activity and species source database for natural product research, discovery and tool development. Nucleic Acids Res. 46, D1217–D1222 (2018).

  189. van der Hooft, J. J. J. A community-driven paired data platform to accelerate natural product mining by combining structural information from genomes and metabolomes. Preprint at https://doi.org/10.18174/fairdata2018.16286 (2018).

  190. Eldjárn, G. H. et al. Ranking microbial metabolomic and genomic links in the NPLinker framework using complementary scoring functions. PLoS Comput. Biol. 17, e1008920 (2021).

  191. Schorn, M. A. et al. A community resource for paired genomic and metabolomic data mining. Nat. Chem. Biol. 17, 363–368 (2021).

  192. Doroghazi, J. R. et al. A roadmap for natural product discovery based on large-scale genomics and metabolomics. Nat. Chem. Biol. 10, 963–968 (2014).

  193. McClure, R. A. et al. Elucidating the rimosamide-detoxin natural product families and their biosynthesis using metabolite/gene cluster correlations. ACS Chem. Biol. 11, 3452–3460 (2016).

  194. Goering, A. W. et al. Metabologenomics: correlation of microbial gene clusters with metabolites drives discovery of a nonribosomal peptide with an unusual amino acid monomer. ACS Cent. Sci. 2, 99–108 (2016).

  195. Parkinson, E. I. et al. Discovery of the tyrobetaine natural products and their biosynthetic gene cluster via metabologenomics. ACS Chem. Biol. 13, 1029–1037 (2018).

  196. Caesar, L. K. et al. Correlative metabologenomics of 110 fungi reveals metabolite-gene cluster pairs. Nat. Chem. Biol. 19, 846–854 (2023).

  197. Soldatou, S. et al. Comparative metabologenomics analysis of polar actinomycetes. Mar. Drugs 19, 103 (2021).

  198. Sulheim, S. et al. Enzyme-constrained models and omics analysis of Streptomyces coelicolor reveal metabolic changes that enhance heterologous production. iScience 23, 101525 (2020).

  199. Amos, G. C. A. et al. Comparative transcriptomics as a guide to natural product discovery and biosynthetic gene cluster functionality. Proc. Natl Acad. Sci. USA 114, E11121–E11130 (2017).

  200. Wandy, J. & Daly, R. GraphOmics: an interactive platform to explore and integrate multi-omics data. BMC Bioinform. 22, 603 (2021).

  201. Eren, A. M. et al. Community-led, integrated, reproducible multi-omics with anvi’o. Nat. Microbiol. 6, 3–6 (2020).

  202. Sorokina, M., Merseburger, P., Rajan, K., Yirik, M. A. & Steinbeck, C. COCONUT online: collection of open natural products database. J. Cheminform. 13, 2 (2021).

  203. Rutz, A. et al. The LOTUS initiative for open knowledge management in natural products research. eLife 11, e70780 (2022).

  204. Chen, Y., Stork, C., Hirte, S. & Kirchmair, J. NP-scout: machine learning approach for the quantification and visualization of the natural product-likeness of small molecules. Biomolecules 9, 43 (2019).

  205. Cao, L. et al. MolDiscovery: learning mass spectrometry fragmentation of small molecules. Nat. Commun. 12, 3718 (2021).

  206. Visser, U. et al. BioAssay Ontology (BAO): a semantic description of bioassays and high-throughput screening results. BMC Bioinform. 12, 257 (2011).

  207. Sarntivijai, S. et al. CLO: the cell line ontology. J. Biomed. Semant. 5, 37 (2014).

  208. Shoemaker, R. H. The NCI60 human tumour cell line anticancer drug screen. Nat. Rev. Cancer 6, 813–823 (2006).

  209. Cooper, M. A. A community-based approach to new antibiotic discovery. Nat. Rev. Drug Discov. 14, 587–588 (2015).

  210. Cech, N. B., Medema, M. H. & Clardy, J. Benefiting from big data in natural products: importance of preserving foundational skills and prioritizing data quality. Nat. Prod. Rep. 38, 1947–1953 (2021).

  211. Blin, K., Shaw, S., Kautsar, S. A., Medema, M. H. & Weber, T. The antiSMASH database version 3: increased taxonomic coverage and new query features for modular enzymes. Nucleic Acids Res. 49, D639–D643 (2021).

  212. Horai, H. et al. MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass Spectrom. 45, 703–714 (2010).

  213. Haug, K. et al. MetaboLights: a resource evolving in response to the needs of its scientific community. Nucleic Acids Res. 48, D440–D444 (2020).

  214. Kuhn, S. & Schlörer, N. E. Facilitating quality control for spectra assignments of small organic molecules: nmrshiftdb2–a free in-house NMR database with integrated LIMS for academic service laboratories. Magn. Reson. Chem. 53, 582–589 (2015).

  215. Irwin, J. J. et al. ZINC20—a free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60, 6065–6073 (2020).

  216. Hastings, J. et al. ChEBI in 2016: improved services and an expanding collection of metabolites. Nucleic Acids Res. 44, D1214–D1219 (2016).

  217. Martens, M. et al. WikiPathways: connecting communities. Nucleic Acids Res. 49, D613–D621 (2021).

  218. Jassal, B. et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 48, D498–D503 (2020).

  219. Blaskovich, M. A. T., Zuegg, J., Elliott, A. G. & Cooper, M. A. Helping chemists discover new antibiotics. ACS Infect. Dis. 1, 285–287 (2015).

  220. Waagmeester, A. et al. Wikidata as a knowledge graph for the life sciences. eLife 9, e52614 (2020).

  221. Reker, D., Rodrigues, T., Schneider, P. & Schneider, G. Target prediction by cascaded self-organizing maps for ligand de-orphaning and side-effect investigation. J. Cheminform. 6, P47 (2014).

  222. Navarro-Muñoz, J. C. et al. A computational framework to explore large-scale biosynthetic diversity. Nat. Chem. Biol. 16, 60–68 (2020).

  223. van der Hooft, J. J. J., Wandy, J., Barrett, M. P., Burgess, K. E. V. & Rogers, S. Topic modeling for untargeted substructure exploration in metabolomics. Proc. Natl Acad. Sci. USA 113, 13738–13743 (2016).

  224. Reymond, J.-L. The chemical space project. Acc. Chem. Res. 48, 722–730 (2015).

  225. Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 46, 3–26 (2001).

  226. Janssen, A. P. A. et al. Drug discovery maps, a machine learning model that visualizes and predicts kinome–inhibitor interaction landscapes. J. Chem. Inf. Model. 59, 1221–1229 (2019).

  227. McInnes, L., Healy, J., Saul, N. & Großberger, L. UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3, 861 (2018).

  228. Probst, D. & Reymond, J.-L. Visualization of very large high-dimensional data sets as minimum spanning trees. J. Cheminform. 12, 12 (2020).

  229. Feher, M. & Schmidt, J. M. Property distributions: differences between drugs, natural products, and molecules from combinatorial chemistry. J. Chem. Inf. Comput. Sci. 43, 218–227 (2003).

  230. Béquignon, O. J. M. et al. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J. Cheminform. 15, 3 (2023).

Acknowledgements

All authors thank the Lorentz Center and Leiden University for funding the Lorentz Workshop ‘Artificial Intelligence for Natural Product Drug Discovery’ that laid the foundation for this Review. M.W.M. was supported by funds from the Duchossois Family Institute at the University of Chicago. K.R.D. was supported by the UK Research and Innovation Biotechnology and Biological Sciences Research Council (BB/R022054/1). N.G. was supported by an NSF CAREER award (award number 2047235). J.J.J.v.d.H. was supported by an ASDI eScience grant from the Netherlands eScience Center (award number ASDI.2017.030). N.I.M. was supported by funding from the European Research Council (ERC consolidator grant agreement no. 725523). K.B. was supported by a Novo Nordisk Foundation grant NNF20CC0035580. M.G.G. was supported by ONCODE funding. E.J.N.H. was supported by the LOEWE Center for Translational Biodiversity Genomics and the Funds of the Chemical Industry Germany. M.S. was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), Project-ID 239748522, SFB 1127 ChemBioSys. M.A.B. was supported by the French National Research Agency (ANR grants 15-CE29-0001 and 20-CE43-0010). C.M.C. was supported by a National Library of Medicine training grant to the Computation and Informatics in Biology and Medicine Training Program (NLM 5T15LM007359). S.F. was supported by MASTS/IbioIC/Xanthella. O.V.K. was funded by the Klaus Faber Foundation. H.K. was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (grants NRF 2018R1A5A2023127 and NRF 2022R1F1A107462311). J.M. was supported by a grant from the Research Foundation – Flanders (G061821N). E.R.R. was supported by the US National Science Foundation (DBI-1845890). D.R. was supported, in part, by a Flash Grant from NC Biotech (2021-FLG-3819), a UNC CGIBD Pilot Award (NIH NIDDK DK034987), a Duke Cancer Institute and Duke Microbiome Center Pilot Award (NIH NCI CA014236), the Engineering Research Center for Precision Microbiome Engineering (NSF EEC-2133504), and the Duke Science and Technology Initiative. P.S. acknowledges support from the NCCR Catalysis (grant number 180544), a National Centre of Competence in Research funded by the Swiss National Science Foundation. N.Z. was supported by Germany’s Excellence Strategy – EXC 2124-390838134. H.U.K. was supported by the KAIST Key Research Institute (Interdisciplinary Research Group) Project. R.G.L. was supported by the US NIH (U41-AT008718 and U24-AT010811). S.L.R. was supported by Eawag discretionary funding. M.H.M. was supported by the Leiden University ‘van der Klaauw’ chair for theoretical biology and an ERC Starting Grant (DECIPHER-948770). We thank A. R. Leach for discussion of the content of this manuscript. We thank all participants of the Lorentz Workshop ‘Artificial Intelligence for Natural Product Drug Discovery’ who did not participate in the review writing for providing inspiration through their talks and/or discussions.

Author information

Contributions

M.W.M., S.S.E., D.M., B.R.T., M.G.G., S.L.-M., K.R., T.d.R., J.A.v.S., M.S., S.F., A.K.H.H., R.G.L., S.L.R. and M.H.M. researched data for the article. M.W.M., K.R.D., S.S.E., N.G., J.J.J.v.d.H., N.I.M., D.M., B.R.T., F.B., J.D., E.J.N.H., F.H., T.d.R., M.S., M.J.B., D.A.v.B., L.M.C., C.M.C., C.A.D., C.D., F.G., A.H., W.J., O.V.K., H.K., T.F.L., J.M., E.R.R., R.R., D.R., P.S., M.S., M.A.S., A.S.W., N.Z., R.J.M.G., A.V., W.H.G., R.M., G.P.v.W., G.J.P.v.W., A.K.H.H., R.L., S.L.R. and M.H.M. contributed substantially to discussion of the content. M.W.M., K.R.D., N.G., J.J.J.v.d.H., B.R.T., F.B., J.D., M.G.G., M.S., M.J.B., M.A.B., L.M.C., C.M.C., C.A.D., S.F., A.H., W.J., O.V.K., S.A.K., T.F.L., J.M., D.R., M.A.S., A.S.W., B.Z., N.Z., R.J.M.G., P.G., A.V., W.H.G., G.J.P.v.W., A.K.H.H., R.G.L., S.L.R. and M.H.M. wrote the article. M.W.M., K.R.D., S.S.E., N.G., J.J.J.v.d.H., N.I.M., D.M., B.R.T., F.B., K.B., E.J.N.H., F.H., T.d.R., M.J.B., L.M.C., C.M.C., D.-A.C., C.A.D., F.G., S.A.K., H.K., E.R.R., R.R., P.S., M.A.S., E.L.W., B.Z., W.H.G., H.U.K., R.M., G.P.v.W., G.J.P.v.W., A.K.H.H., R.G.L., S.L.R. and M.H.M. reviewed and/or edited the manuscript before submission.

Corresponding authors

Correspondence to Gerard J. P. van Westen, Anna K. H. Hirsch, Roger G. Linington, Serina L. Robinson or Marnix H. Medema.

Ethics declarations

Competing interests

J.J.J.v.d.H. is a member of the scientific advisory board of NAICONS Srl., Milan, Italy. C.A.D. is a founding member of Adapsyn Bioscience. M.A.S. is a consultant to Adapsyn Bioscience. M.H.M. is on the scientific advisory board of Hexagon Bio and co-founder of Design Pharmaceuticals. The other authors declare no competing interests.

Peer review

Peer review information

Nature Reviews Drug Discovery thanks Hosein Mohimani and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

National Database of Antibiotic Resistant Organisms (NDARO): https://www.ncbi.nlm.nih.gov/pathogens/antimicrobial-resistance/

RDKit: Open-Source Cheminformatics Software: http://www.rdkit.org/

Wikidata Query Service: https://w.wiki/5bpq

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mullowney, M.W., Duncan, K.R., Elsayed, S.S. et al. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 22, 895–916 (2023). https://doi.org/10.1038/s41573-023-00774-7

  • DOI: https://doi.org/10.1038/s41573-023-00774-7