Discovery of a structural class of antibiotics with explainable deep learning

Abstract

The discovery of novel structural classes of antibiotics is urgently needed to address the ongoing antibiotic resistance crisis (refs. 1–9). Deep learning approaches have aided in exploring chemical spaces (refs. 1, 10–15); these typically use black box models and do not provide chemical insights. Here we reasoned that the chemical substructures associated with antibiotic activity learned by neural network models can be identified and used to predict structural classes of antibiotics. We tested this hypothesis by developing an explainable, substructure-based approach for the efficient, deep learning-guided exploration of chemical spaces. We determined the antibiotic activities and human cell cytotoxicity profiles of 39,312 compounds and applied ensembles of graph neural networks to predict antibiotic activity and cytotoxicity for 12,076,365 compounds. Using explainable graph algorithms, we identified substructure-based rationales for compounds with high predicted antibiotic activity and low predicted cytotoxicity. We empirically tested 283 compounds and found that compounds exhibiting antibiotic activity against Staphylococcus aureus were enriched in putative structural classes arising from rationales. Of these structural classes of compounds, one is selective against methicillin-resistant S. aureus (MRSA) and vancomycin-resistant enterococci, evades substantial resistance, and reduces bacterial titres in mouse models of MRSA skin and systemic thigh infection. Our approach enables the deep learning-guided discovery of structural classes of antibiotics and demonstrates that machine learning models in drug discovery can be explainable, providing insights into the chemical substructures that underlie selective antibiotic activity.

Fig. 1: Ensembles of deep learning models for predicting antibiotic activity and human cell cytotoxicity.
Fig. 2: Filtering and visualizing chemical space.
Fig. 3: Graph-based rationales reveal scaffolds for prospective antibiotic classes.
Fig. 4: Resistance and mechanism of action of a structural class.
Fig. 5: In vivo efficacy.

Data availability

Data generated from chemical screens, machine learning models and whole-genome sequencing experiments are available as Supplementary Data 1–4. Source Data are available for Figs. 4 and 5 and Extended Data Figs. 8 and 9. Data from whole-genome sequencing reads have been deposited on BioProject under accession number PRJNA1026995. A copy of model predictions for the Mcule purchasable database (ver. 200601) and the Broad Institute database used in this work is available at https://github.com/felixjwong/antibioticsai. Source data are provided with this paper.

Code availability

Chemprop is available at https://github.com/chemprop/chemprop. The Chemprop checkpoints for the final antibiotic activity, cytotoxicity, and proton motive force-alteration models, along with a code platform for performing and adapting the analyses developed in this work, are available at https://github.com/felixjwong/antibioticsai and https://zenodo.org/records/10095879 (ref. 57).

References

  1. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702 (2020).

  2. Imai, Y. et al. A new antibiotic selectively kills Gram-negative pathogens. Nature 576, 459–464 (2019).

  3. Ling, L. L. et al. A new antibiotic kills pathogens without detectable resistance. Nature 517, 455–459 (2015).

  4. Martin, J. K. II et al. A dual-mechanism antibiotic kills Gram-negative bacteria and avoids drug resistance. Cell 181, 1518–1532.e14 (2020).

  5. Lewis, K. Platforms for antibiotic discovery. Nat. Rev. Drug Discov. 12, 371–387 (2013).

  6. Culp, E. J. et al. Evolution-guided discovery of antibiotics that inhibit peptidoglycan remodelling. Nature 578, 582–587 (2020).

  7. Mitcheltree, M. J. et al. A synthetic antibiotic class overcoming bacterial multidrug resistance. Nature 599, 507–512 (2021).

  8. Durand-Reville, T. F. et al. Rational design of a new antibiotic class for drug-resistant infections. Nature 597, 698–702 (2021).

  9. Silver, L. L. Challenges of antibacterial discovery. Clin. Microbiol. Rev. 24, 71–109 (2011).

  10. Gilmer, J. et al. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning (2017).

  11. Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).

  12. Wong, F. et al. Leveraging artificial intelligence in the fight against infectious diseases. Science 381, 164–170 (2023).

  13. Melo, M. C. R., Maasch, J. R. M. A. & de la Fuente-Nunez, C. Accelerating antibiotic discovery through artificial intelligence. Commun. Biol. 4, 1050 (2021).

  14. Liu, G. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. 19, 1342–1350 (2023).

  15. Wong, F. et al. Discovering small-molecule senolytics with deep neural networks. Nat. Aging 3, 734–750 (2023).

  16. Antimicrobial Resistance: Tackling a Crisis for the Health and Wealth of Nations (The Review on Antimicrobial Resistance, 2014).

  17. Corsello, S. M. et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat. Med. 23, 405–408 (2017).

  18. Sterling, T. & Irwin, J. J. ZINC 15—ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337 (2015).

  19. Camacho, D. M. et al. Next-generation machine learning for biological networks. Cell 173, 1581–1592 (2018).

  20. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).

  21. Lee, A. S. et al. Methicillin-resistant Staphylococcus aureus. Nat. Rev. Dis. Primers 4, 18033 (2018).

  22. Toxicology in the 21st century. National Center for Advancing Translational Sciences. https://tripod.nih.gov/tox/ (accessed 20 October 2022).

  23. The Human Metabolome Database. https://hmdb.ca/metabolites (accessed 20 October 2022).

  24. Mcule purchasable database (in-stock), ver. 200601. https://mcule.com/database/ (accessed 27 June 2020).

  25. Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  26. Jin, W., Barzilay, R. & Jaakkola, T. Multi-objective molecule generation using interpretable substructures. In Proc. 37th International Conference on Machine Learning 450, 4849–4859 (2020).

  27. Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  28. Cao, Y., Jiang, T. & Girke, T. A maximum common substructure-based algorithm for searching and predicting drug-like compounds. Bioinformatics 24, i366–i374 (2008).

  29. Muratov, E. N. et al. QSAR without borders. Chem. Soc. Rev. 49, 3525–3564 (2020).

  30. Baell, J. B. & Holloway, G. A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J. Med. Chem. 53, 2719–2740 (2010).

  31. Brenk, R. et al. Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem 3, 435–444 (2008).

  32. Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25 (1997).

  33. Ghose, A. K., Viswanadhan, V. N. & Wendoloski, J. J. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J. Comb. Chem. 1, 55–68 (1999).

  34. O’Shea, R. & Moser, H. E. Physicochemical properties of antibacterial compounds: implications for drug discovery. J. Med. Chem. 51, 2871–2878 (2008).

  35. Wong, F. et al. Reactive metabolic byproducts contribute to antibiotic lethality under anaerobic conditions. Mol. Cell 82, 3499–3512 (2022).

  36. Wong, F. et al. Cytoplasmic condensation induced by membrane damage is associated with antibiotic lethality. Nat. Commun. 12, 2321 (2021).

  37. Wong, F. et al. Understanding beta-lactam-induced lysis at the single-cell level. Front. Microbiol. 12, 712007 (2021).

  38. Wong, F. et al. Mechanics and dynamics of bacterial cell lysis. Biophys. J. 116, 2378–2389 (2019).

  39. Zheng, E. J. et al. Discovery of antibiotics that selectively kill metabolically dormant bacteria. Cell Chem. Biol. https://doi.org/10.1016/j.chembiol.2023.10.026 (2023). 

  40. Farha, M. A., Verschoor, C. P., Bowdish, D. & Brown, E. D. Collapsing the proton motive force to identify synergistic combinations against Staphylococcus aureus. Chem. Biol. 20, 1168–1178 (2013).

  41. Hurdle, J. G. Targeting bacterial membrane function: an underexploited mechanism for treating persistent infections. Nat. Rev. Microbiol. 9, 62–75 (2011).

  42. Antibiotic Resistance Threats in the United States, 2019. Centers for Disease Control and Prevention. https://www.cdc.gov/drugresistance/pdf/threats-report/2019-ar-threats-report-508.pdf (accessed 20 September 2021).

  43. Lewis, K. The science of antibiotic discovery. Cell 181, 29–45 (2020).

  44. Walsh, C. Where will new antibiotics come from? Nat. Rev. Microbiol. 1, 65–70 (2003).

  45. Ying, R., Bourgeois, D., You, J., Zitnik, M. & Leskovec, J. GNNExplainer: Generating explanations for graph neural networks. Adv. Neural. Inf. Process. Syst. 32, 9240–9251 (2019).

  46. Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).

  47. Yuan, H., Yu, H., Gui, S. & Ji, S. Explainability in graph neural networks: a taxonomic survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 5782–5799 (2023).

  48. DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).

  49. Kazeev, N. The fast version of DeLong’s method for computing the covariance of unadjusted AUC. https://github.com/yandexdataschool/roc_comparison (accessed 21 July 2023).

  50. Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011).

  51. Wang, Y., Backman, T. W. H., Horan, K. & Girke, T. fmcsR: mismatch tolerant maximum common substructure searching in R. Bioinformatics 29, 2792–2794 (2013).

  52. Daina, A., Michielin, O. & Zoete, V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep. 7, 42717 (2017).

  53. Wong, F. et al. Benchmarking AlphaFold‐enabled molecular docking predictions for antibiotic discovery. Mol. Syst. Biol. 18, e11081 (2022).

  54. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).

  55. Greco, I. et al. Correlation between hemolytic activity, cytotoxicity and systemic in vivo toxicity of synthetic antimicrobial peptides. Sci. Rep. 10, 13206 (2020).

  56. Krol, L. R. Permutation Test. https://github.com/lrkrol/permutationTest (accessed 22 July 2023).

  57. Wong, F. et al. Supporting code for: discovery of a structural class of antibiotics with explainable deep learning. Zenodo https://doi.org/10.5281/zenodo.10095879 (2023).

Acknowledgements

The authors thank the past and present members of the Collins laboratory for helpful discussions; members of the Broad Institute Center for the Development of Therapeutics (CDoT) for helpful feedback; the Microbial Genome Sequencing Center for assistance with sequencing; the Harvard Center for Mass Spectrometry for assistance with LC–MS experiments; S. Gould and R. Singh for medicinal chemistry feedback; A. Vrcic and T. Dawson for assistance with compound management; A. Graveline for assistance with mouse experiments; and Z. Gitai for E. coli strains RFM795 and JW5503-KanS. F.W. was supported by the James S. McDonnell Foundation and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under award number K25AI168451. A.K. was supported by the Swiss National Science Foundation under grant number SNSF_203071. A.M.E. and A.L.M. were supported by federal funds from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under grant number U19AI110818 to the Broad Institute. J.M.S. was supported by the Banting Fellowships Program (393360). L.D.R. was supported by the Volkswagen Foundation. J.J.C. was supported by the Defense Threat Reduction Agency (grant number HDTRA12210032), the National Institutes of Health (grant number R01-AI146194), and the Broad Institute of MIT and Harvard. This work is part of the Antibiotics-AI Project, which is directed by J.J.C. and supported by the Audacious Project, Flu Lab, LLC, the Sea Grape Foundation, R. Zander and H. Wyss for the Wyss Foundation, and an anonymous donor.

Author information

Contributions

F.W. conceived research, designed all models and experiments, performed or directed all experiments and analysis, wrote the paper and supervised research. E.J.Z., S.O. and A.L. performed screening experiments and analysis. J.A.V. and W.J. assisted with data interpretation and analysis, and W.J. developed and implemented the MCTS rationale extraction algorithm. N.M.D., M.N.A. and A.C.-R. performed mouse experiments and analysis. M.N.A. and A.K. performed screening experiments and assisted with data interpretation. J.F. and R.H. performed cellular physiology experiments and analysis. A.L.M. and A.M.E. performed genomic analysis and assisted with data interpretation. B.H., H.H.S. and J.M.S. assisted with data interpretation. D.K.F. and F.F.W. assisted with chemical testing experiments. L.D.R. performed cellular physiology experiments and analysis and assisted with data interpretation. J.J.C. supervised research. All authors assisted with manuscript editing.

Corresponding author

Correspondence to James J. Collins.

Ethics declarations

Competing interests

J.J.C. is an academic co-founder and scientific advisory board chair of EnBiotix, an antibiotic drug discovery company, and Phare Bio, a non-profit venture focused on antibiotic drug development. J.J.C. is also an academic co-founder and board member of Cellarity and the founding scientific advisory board chair of Integrated Biosciences. J.M.S. is scientific co-founder and scientific director of Phare Bio. F.W. is a co-founder of Integrated Biosciences. S.O. and A.L. contributed to this work as employees of Integrated Biosciences, and S.O. may have an equity interest in Integrated Biosciences. F.W. and J.J.C. have filed a patent based on the results of this work. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Molecular weight distribution of the 39,312 compounds screened.

Data are from an original set of 39,312 compounds containing most known antibiotics, natural products, and structurally diverse molecules, with molecular weights between 40 Da and 4,200 Da. Frequency is shown on a log scale.

Extended Data Fig. 2 Comparison of deep learning models for predicting antibiotic activity.

a,b, Precision-recall curves for predictions of antibiotic activity, for an ensemble of 10 Chemprop models without RDKit features (a) and the best-performing random forest classifier model based on Morgan fingerprints (b), trained and tested using data from a screen of 39,312 molecules (Fig. 1 of the main text). The black dashed line represents the baseline fraction of active compounds in the training set (1.3%). Blue curves and the 95% confidence interval indicate the variation generated by bootstrapping. AUC, area under the curve.
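The quantities named in this caption (area under the precision-recall curve, the baseline fraction of actives, and a bootstrapped 95% confidence interval) can be illustrated with a short Python sketch. This is an assumption-laden illustration, not the authors' evaluation code: the function names, the step-wise average-precision estimate of the area, and the percentile bootstrap are choices made here, and the caption does not specify how the bootstrap was performed.

```python
import random


def average_precision(labels, scores):
    """Step-wise estimate of the area under the precision-recall curve."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    if n_pos == 0:
        return float("nan")
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at the rank of each recovered active
    return total / n_pos


def baseline_precision(labels):
    """Fraction of actives: the precision expected from a random ranking."""
    return sum(labels) / len(labels)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the average precision (assumed procedure)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if sum(sample_labels) == 0:
            continue  # a resample without actives has no defined precision-recall curve
        stats.append(average_precision(sample_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * len(stats))]
    upper = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lower, upper
```

A model is informative when its average precision, including the lower end of the bootstrap interval, sits well above `baseline_precision`, which corresponds to the dashed line in the panels.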

Extended Data Fig. 3 Comparison of deep learning models for predicting human cell cytotoxicity.

a,b, Precision-recall curves for predictions of HepG2 cytotoxicity, for an ensemble of 10 Chemprop models without RDKit features (a) and the best-performing random forest classifier model based on Morgan fingerprints (b), trained and tested using data from a screen of 39,312 molecules (Fig. 1 of the main text). The black dashed line represents the baseline fraction of active compounds in the training set (8.5%). Blue curves and the 95% confidence interval indicate the variation generated by bootstrapping. AUC, area under the curve. c,d, Precision-recall curves for predictions of HSkMC cytotoxicity, for an ensemble of 10 Chemprop models without RDKit features (c) and the best-performing random forest classifier model based on Morgan fingerprints (d), trained and tested using data from a screen of 39,312 molecules (Fig. 1 of the main text). The black dashed line represents the baseline fraction of active compounds in the training set (3.8%). Blue curves and the 95% confidence interval indicate the variation generated by bootstrapping. e,f, Precision-recall curves for predictions of IMR-90 cytotoxicity, for an ensemble of 10 Chemprop models without RDKit features (e) and the best-performing random forest classifier model based on Morgan fingerprints (f), trained and tested using data from a screen of 39,312 molecules (Fig. 1 of the main text). The black dashed line represents the baseline fraction of active compounds in the training set (8.8%). Blue curves and the 95% confidence interval indicate the variation generated by bootstrapping.

Extended Data Fig. 4 Visualizing chemical space across different prediction score thresholds.

a,b, t-Distributed stochastic neighbor embedding (t-SNE) plot of compounds with high and low antibiotic prediction scores, in addition to compounds in the training set, for different prediction score thresholds. The plot shows the chemical similarity or dissimilarity of various compounds, and active compounds in the training set (red dots) are seen to largely separate compounds with high prediction scores (green, black, and purple dots) from compounds with low prediction scores (brown dots).

Extended Data Fig. 5 Examples of rationale calculations using Monte-Carlo tree search.

a, Illustration of the MCTS forward pass using compound 1. The figure shows three possible search paths from the root (compound 1) by deleting peripheral bonds or rings (highlighted in red). Due to space limitations, only three steps from the root are shown. b, Illustration of a complete search path from the root (compound 1) to a leaf node (the rationale). Chemprop is used to predict the activity of each leaf node, and these predictions are used to make updates to the statistics of each intermediate node in the backward pass.
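The forward and backward passes described in this caption can be sketched on a toy graph. The Python below is a minimal illustration under loud assumptions: the molecule is a hand-written adjacency map, `toy_score` is a hypothetical stand-in for the Chemprop predictor, only single peripheral atoms are deleted (not bonds or rings), and node selection uses plain UCB1 rather than whatever selection rule the authors' implementation uses.

```python
import math
import random

# Toy molecule (hypothetical): atom index -> neighbouring atom indices.
MOLECULE = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
CORE = frozenset({1, 2, 3})  # hypothetical "active" substructure for the toy scorer


def toy_score(atoms):
    """Stand-in for a trained property predictor (NOT Chemprop)."""
    return 0.9 if CORE <= atoms else 0.1


def is_connected(atoms):
    atoms = set(atoms)
    if not atoms:
        return False
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        for nb in MOLECULE[stack.pop()]:
            if nb in atoms and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == atoms


def children(state):
    """Substructures obtained by deleting one peripheral atom (graph stays connected)."""
    return [state - {a} for a in sorted(state) if is_connected(state - {a})]


def mcts_rationale(root, leaf_size=3, n_rollouts=200, c=1.0, seed=0):
    rng = random.Random(seed)
    visits, value = {}, {}  # per-node statistics: visit count and summed reward
    best_score, best_state = 0.0, None
    for _ in range(n_rollouts):
        path, state = [root], root
        # Forward pass: delete peripheral atoms until the substructure is leaf-sized.
        while len(state) > leaf_size:
            kids = children(state)
            unvisited = [k for k in kids if k not in visits]
            if unvisited:
                state = rng.choice(unvisited)
            else:
                n_parent = visits.get(state, 0) + 1
                state = max(
                    kids,
                    key=lambda k: value[k] / visits[k]
                    + c * math.sqrt(math.log(n_parent) / visits[k]),
                )
            path.append(state)
        reward = toy_score(state)  # the leaf is scored by the predictor
        if reward > best_score:
            best_score, best_state = reward, state
        # Backward pass: update the statistics of every node on the search path.
        for node in path:
            visits[node] = visits.get(node, 0) + 1
            value[node] = value.get(node, 0.0) + reward
    return best_score, best_state
```

Calling `mcts_rationale(frozenset(MOLECULE))` searches connected three-atom substructures of the toy graph; the highest-scoring leaf plays the role of the rationale.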

Extended Data Fig. 6 Maximal common substructure identification reveals known antibiotic classes, but the resulting substructures are less predictive than Chemprop rationales across all hits.

a,b, Rank-ordered numbers of hits (a) and non-hits (b) associated with maximal common substructures (MCSs) identified by a grouping method. Here, any hit associated with any of the MCSs shown shares a minimum of 12 atoms with the MCS. Dashed lines in MCSs indicate either single or double bonds. Each green or brown bar shows the prediction score of each MCS viewed as a molecule in its own right. Where bars are thin, the corresponding MCS prediction scores are approximately zero (including all brown bars in (b)). c,d, Similar to (a), but here, any hit associated with any of the MCSs shown shares a minimum of 10 (c) or 15 (d) atoms with the MCS. e, Illustration of the rationales (red) determined using a Monte Carlo tree search for example hits (black) associated with MCSs A1-A12. No hit associated with MCS A12 possessed a rationale. f, MCS prediction scores (blue bars) and the average prediction scores of all rationales of all hits associated with MCSs A1-A12 (red bars). Where blue bars are thin, the corresponding MCS prediction scores are approximately zero. No hit associated with MCS A12 possessed a rationale.

Extended Data Fig. 7 Closest active training set compounds to, and selectivities of, four validated hits associated with rationale groups G1-G5.

a, Closest active compounds (right), as measured by Tanimoto similarity, are from the training set of 39,312 compounds. Compounds are colored according to associated rationale groups (as indicated in parentheses), and the identifier and Tanimoto similarity score of each closest active compound are displayed. b, S. aureus MIC and human cell IC50 values of the four compounds in (a), shown on a log scale. Bars show the means of two biological replicates (points) and are colored by the bacterial strain, human cell type, or media condition tested. Asterisks indicate values larger than 128 µg/mL.
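For readers unfamiliar with the similarity measure used in panel a, the following Python sketch computes Tanimoto similarity between fingerprints represented as sets of on-bit indices and returns the most similar training compound. The fingerprint type and the helper names `tanimoto` and `closest_active` are assumptions for illustration; the caption does not state which fingerprints were used.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity: shared on-bits divided by the union of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def closest_active(query_fp, active_fps):
    """Return (identifier, similarity) of the active compound most similar to the query.

    active_fps maps a compound identifier to its fingerprint (set of on-bit indices).
    """
    return max(
        ((name, tanimoto(query_fp, fp)) for name, fp in active_fps.items()),
        key=lambda pair: pair[1],
    )
```

A low similarity to the closest active training compound suggests that a validated hit is structurally distinct from what the model saw during training.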

Extended Data Fig. 8 Comparison of MICs of different compounds against methicillin-susceptible and methicillin-resistant S. aureus, and eradication of kanamycin persisters by treatment with compounds 1 and 2.

a, MICs of various antibiotics against S. aureus RN4220 (black) and S. aureus USA300 (blue) on a log scale. Bars show the mean of two biological replicates (individual points). b, Survival curves of B. subtilis 168 after combination treatment with kanamycin and compounds 1 and 2, respectively, as determined by plating and CFU counting. Initial CFU values are ~10^7. Each point is representative of the mean of two biological replicates. Cultures treated with kanamycin in addition to compounds 1 and 2 were eradicated after 24 h (CFU/mL = 0), and these values were truncated to a log survival value of −7 on this plot.

Extended Data Fig. 9 Toxicity, chemical properties, and in vivo efficacy of compounds 1 and 2.

a, Fractional hemolysis measurements of human red blood cells (RBCs) treated with compounds 1 and 2 at the indicated final concentrations. Vehicle (1% DMSO) was used as a negative control, and Triton X-100, a detergent, was used as a positive control. Black points indicate values from two biological replicates, and red bars indicate average values. b, Ferrous iron chelation measurements of compounds 1 and 2. Vehicle (1% DMSO) was used as a negative control, and ethylenediaminetetraacetic acid (EDTA), an iron chelator, was used as a positive control. Black points indicate values from two biological replicates, and gray bars indicate average values. c, Ames test mutagenesis measurements of the fractions of revertant S. typhimurium TA100 cultures treated with compounds 1 and 2 at the indicated final concentrations. Vehicle (1% DMSO) was used as a negative control, and 5 µg/mL sodium azide was used as a positive control. Black points indicate values from two biological replicates, and purple bars indicate average values. Higher fractions of revertant cultures indicate higher mutagenic potential (inset). d, Chemical stability of compound 1 in various buffers as a function of incubation time at 37 °C. Values are normalized to the mean measurement at time zero, and each point is representative of the mean of two biological replicates. Error bars indicate the full range of values arising from two biological replicates. e, Photographs of WoundSkin models 24 h after topical treatment with compound 1 (1%) or DMSO vehicle. Images are representative of six biological replicates in each treatment group. Scale bar, 2 mm. f, Illustration of the in vivo study of a neutropenic mouse wound infection model using MRSA CDC 563 shown in Fig. 5a of the main text. g, Illustration of the in vivo study of a neutropenic mouse thigh infection model using MRSA CDC 706 shown in Fig. 5b of the main text.

Extended Data Fig. 10 Exploration of a structural class through structure-activity relationships.

a, The rationale of compounds 1 and 2, overlaid with chemical modifications (R1-R8) that encompass all compounds used to test SAR (Supplementary Data 2). SAR, structure-activity relationships. b, Analogues of compounds 1 and 2 found to have varying degrees of activity against S. aureus. Corresponding MIC and IC50 values are representative of two biological replicates.

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1-4, Supplementary References, and Supplementary Tables 1-9.

Reporting Summary

Supplementary Data 1

Training set of 39,312 compounds tested for antibiotic activity and cytotoxicity, in addition to 200 RDKit features used to augment the models and cytotoxicity testing results. Antibiotic activity was defined using a 20% relative mean growth cut-off in S. aureus RN4220. Cytotoxicity was defined using a 90% relative mean cell viability cut-off in HepG2 cells, HSkMCs, and IMR-90 cells. Data are from two biological replicates.
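The labelling rule described here can be written down compactly. The Python sketch below averages two replicate measurements and applies the stated cut-offs; the direction and strictness of the comparisons (active when mean relative growth falls below 20%, cytotoxic when mean relative viability falls below 90%) and the function names are assumptions made for illustration.

```python
from statistics import mean

GROWTH_CUTOFF = 0.20     # 20% relative mean growth (S. aureus RN4220), from the text
VIABILITY_CUTOFF = 0.90  # 90% relative mean cell viability, from the text


def is_antibiotic_active(replicate_growth):
    """Assumed rule: active if mean relative growth is below the cut-off."""
    return mean(replicate_growth) < GROWTH_CUTOFF


def is_cytotoxic(replicate_viability):
    """Assumed rule: cytotoxic if mean relative viability is below the cut-off."""
    return mean(replicate_viability) < VIABILITY_CUTOFF
```

Under these assumptions a compound with replicate growth values of 0.05 and 0.15 is labelled active, whereas one with 0.10 and 0.50 (mean 0.30) is not.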

Supplementary Data 2

Model predictions, rationales, and procured compounds from the ensemble Chemprop model. Compound SMILES strings and corresponding prediction scores are shown for all 3,646 hits, out of 12,076,365 compounds whose antibiotic activity and cytotoxicity against human cells were predicted. Rationale and scaffold SMARTS strings, vendor catalogue information for all 283 procured and tested compounds shown in Fig. 3e of the main text, and vendor catalogue information for all 17 procured and tested compounds as part of the structure–activity relationship analyses shown in Extended Data Fig. 10 are also provided, in addition to the MCS SMARTS strings for the analyses described in Supplementary Note 2 and Extended Data Fig. 6.
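The hit selection summarized here, high predicted antibiotic activity together with low predicted cytotoxicity, can be sketched as a filter over ensemble outputs. In the Python below, averaging ensemble members, requiring low predicted cytotoxicity in every cell line, the record layout, and the threshold arguments are all assumptions for illustration; the thresholds actually used are not stated in this section.

```python
from statistics import mean


def select_hits(predictions, activity_min, cytotox_max):
    """Keep compounds with high mean activity and low mean cytotoxicity in all cell lines.

    predictions maps a SMILES string to a record of the (hypothetical) form
    {"activity": [...ensemble outputs...],
     "cytotoxicity": {cell_line: [...ensemble outputs...]}}.
    """
    hits = []
    for smiles, record in predictions.items():
        activity = mean(record["activity"])
        worst_cytotox = max(mean(outputs) for outputs in record["cytotoxicity"].values())
        if activity >= activity_min and worst_cytotox <= cytotox_max:
            hits.append((smiles, activity, worst_cytotox))
    # Rank hits by predicted antibiotic activity, highest first.
    return sorted(hits, key=lambda hit: -hit[1])
```

With placeholder thresholds such as `activity_min=0.5` and `cytotox_max=0.3`, a compound passes only if the ensemble-averaged activity clears the first threshold and none of the cell-line cytotoxicity averages exceeds the second.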

Supplementary Data 3

Mutations arising in cells exposed to compounds. For each compound, results are shown for at least two independently passaged or suppressor mutant populations. All mutations that passed mapping filters are listed here. Black boxes highlight mutations in similar regions across sequencing replicates either present in the same gene, or present in an adjacent gene or intergenic region.

Supplementary Data 4

Training and test data for models predicting proton motive force-altering activity. Proton motive force-altering activity was defined using a 30% relative mean fluorescence change in S. aureus RN4220 in the presence of DiSC3(5), a proton motive force-sensitive dye. 475 antibacterial compounds from Supplementary Data 1 were tested, and all inactive antibacterial compounds were assumed to not alter proton motive force. Data are from two biological replicates.

Peer Review File

Source data

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wong, F., Zheng, E.J., Valeri, J.A. et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature 626, 177–185 (2024). https://doi.org/10.1038/s41586-023-06887-8

  • DOI: https://doi.org/10.1038/s41586-023-06887-8
