Discovery of a structural class of antibiotics with explainable deep learning

Wong, Felix; Zheng, Erica J.; Valeri, Jacqueline A.; Donghia, Nina M.; Anahtar, Melis N.; Omori, Satotaka; Li, Alicia; Cubillos-Ruiz, Andres; Krishnan, Aarti; Jin, Wengong; Manson, Abigail L.; Friedrichs, Jens; Helbig, Ralf; Hajian, Behnoush; Fiejtek, Dawid K.; Wagner, Florence F.; Soutter, Holly H.; Earl, Ashlee M.; Stokes, Jonathan M.; Renner, Lars D.; Collins, James J.

doi:10.1038/s41586-023-06887-8

Article
Published: 20 December 2023

Nature volume 626, pages 177–185 (2024)

Abstract

The discovery of novel structural classes of antibiotics is urgently needed to address the ongoing antibiotic resistance crisis^{1,2,3,4,5,6,7,8,9}. Deep learning approaches have aided in exploring chemical spaces^{1,10,11,12,13,14,15}; these typically use black box models and do not provide chemical insights. Here we reasoned that the chemical substructures associated with antibiotic activity learned by neural network models can be identified and used to predict structural classes of antibiotics. We tested this hypothesis by developing an explainable, substructure-based approach for the efficient, deep learning-guided exploration of chemical spaces. We determined the antibiotic activities and human cell cytotoxicity profiles of 39,312 compounds and applied ensembles of graph neural networks to predict antibiotic activity and cytotoxicity for 12,076,365 compounds. Using explainable graph algorithms, we identified substructure-based rationales for compounds with high predicted antibiotic activity and low predicted cytotoxicity. We empirically tested 283 compounds and found that compounds exhibiting antibiotic activity against Staphylococcus aureus were enriched in putative structural classes arising from rationales. Of these structural classes of compounds, one is selective against methicillin-resistant S. aureus (MRSA) and vancomycin-resistant enterococci, evades substantial resistance, and reduces bacterial titres in mouse models of MRSA skin and systemic thigh infection. Our approach enables the deep learning-guided discovery of structural classes of antibiotics and demonstrates that machine learning models in drug discovery can be explainable, providing insights into the chemical substructures that underlie selective antibiotic activity.
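The filtering step described in the abstract (high predicted antibiotic activity, low predicted cytotoxicity, with scores coming from model ensembles) can be illustrated with a minimal sketch. The function names, the toy compound identifiers and both thresholds below are assumptions made for illustration; they are not taken from the paper, and averaging is only one plausible way to combine ensemble members.

```python
from statistics import mean


def ensemble_score(model_scores):
    """Combine per-model scores for one compound by simple averaging (assumed)."""
    return mean(model_scores)


def select_candidates(predictions, min_activity=0.4, max_cytotoxicity=0.2):
    """Keep compounds with high predicted activity and low predicted cytotoxicity.

    `predictions` maps a compound identifier to a dict holding one list of
    per-model scores for 'activity' and one for 'cytotoxicity'.
    The two thresholds are illustrative placeholders, not values from the paper.
    """
    selected = []
    for compound_id, scores in predictions.items():
        activity = ensemble_score(scores["activity"])
        cytotoxicity = ensemble_score(scores["cytotoxicity"])
        if activity >= min_activity and cytotoxicity <= max_cytotoxicity:
            selected.append((compound_id, activity, cytotoxicity))
    # Rank the surviving compounds by predicted activity, highest first.
    return sorted(selected, key=lambda row: row[1], reverse=True)


# Hypothetical toy input: three compounds, three ensemble members each.
toy_predictions = {
    "cmpd_A": {"activity": [0.9, 0.8, 0.7], "cytotoxicity": [0.10, 0.05, 0.15]},
    "cmpd_B": {"activity": [0.9, 0.9, 0.9], "cytotoxicity": [0.60, 0.70, 0.80]},
    "cmpd_C": {"activity": [0.1, 0.2, 0.0], "cytotoxicity": [0.00, 0.10, 0.00]},
}
hits = select_candidates(toy_predictions)
```

In this toy input only `cmpd_A` passes both filters: `cmpd_B` is predicted active but cytotoxic, and `cmpd_C` is predicted inactive.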

**Fig. 1: Ensembles of deep learning models for predicting antibiotic activity and human cell cytotoxicity.**

**Fig. 2: Filtering and visualizing chemical space.**

**Fig. 3: Graph-based rationales reveal scaffolds for prospective antibiotic classes.**

**Fig. 4: Resistance and mechanism of action of a structural class.**

Data availability

Data generated from chemical screens, machine learning models and whole-genome sequencing experiments are available as Supplementary Data 1–4. Source Data are available for Figs. 4 and 5 and Extended Data Figs. 8 and 9. Data from whole-genome sequencing reads have been deposited on BioProject under accession number PRJNA1026995. A copy of model predictions for the Mcule purchasable database (ver. 200601) and the Broad Institute database used in this work is available at https://github.com/felixjwong/antibioticsai. Source data are provided with this paper.

Code availability

Chemprop is available at https://github.com/chemprop/chemprop. The Chemprop checkpoints for the final antibiotic activity, cytotoxicity, and proton motive force-alteration models, along with a code platform for performing and adapting the analyses developed in this work, are available at https://github.com/felixjwong/antibioticsai and https://zenodo.org/records/10095879⁵⁷.

References

1. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702 (2020).
2. Imai, Y. et al. A new antibiotic selectively kills Gram-negative pathogens. Nature 576, 459–464 (2019).
3. Ling, L. L. et al. A new antibiotic kills pathogens without detectable resistance. Nature 517, 455–459 (2015).
4. Martin, J. K. II et al. A dual-mechanism antibiotic kills Gram-negative bacteria and avoids drug resistance. Cell 181, 1518–1532.e14 (2020).
5. Lewis, K. Platforms for antibiotic discovery. Nat. Rev. Drug Discov. 12, 371–387 (2013).
6. Culp, E. J. et al. Evolution-guided discovery of antibiotics that inhibit peptidoglycan remodelling. Nature 578, 582–587 (2020).
7. Mitcheltree, M. J. et al. A synthetic antibiotic class overcoming bacterial multidrug resistance. Nature 599, 507–512 (2021).
8. Durand-Reville, T. F. et al. Rational design of a new antibiotic class for drug-resistant infections. Nature 597, 698–702 (2021).
9. Silver, L. L. Challenges of antibacterial discovery. Clin. Microbiol. Rev. 24, 71–109 (2011).
10. Gilmer, J. et al. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning (2017).
11. Yang, K. et al. Analyzing learned molecular representations for property prediction. J. Chem. Inf. Model. 59, 3370–3388 (2019).
12. Wong, F. et al. Leveraging artificial intelligence in the fight against infectious diseases. Science 381, 164–170 (2023).
13. Melo, M. C. R., Maasch, J. R. M. A. & de la Fuente-Nunez, C. Accelerating antibiotic discovery through artificial intelligence. Commun. Biol. 4, 1050 (2021).
14. Liu, G. et al. Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii. Nat. Chem. Biol. 19, 1342–1350 (2023).
15. Wong, F. et al. Discovering small-molecule senolytics with deep neural networks. Nat. Aging 3, 734–750 (2023).
16. Antimicrobial Resistance: Tackling a Crisis for the Health and Wealth of Nations (The Review on Antimicrobial Resistance, 2014).
17. Corsello, S. M. et al. The Drug Repurposing Hub: a next-generation drug library and information resource. Nat. Med. 23, 405–408 (2017).
18. Sterling, T. & Irwin, J. J. ZINC 15—ligand discovery for everyone. J. Chem. Inf. Model. 55, 2324–2337 (2015).
19. Camacho, D. M. et al. Next-generation machine learning for biological networks. Cell 173, 1581–1592 (2018).
20. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).
21. Lee, A. S. et al. Methicillin-resistant Staphylococcus aureus. Nat. Rev. Dis. Primers 4, 18033 (2018).
22. Toxicology in the 21st century. National Center for Advancing Translational Sciences. https://tripod.nih.gov/tox/ (accessed 20 October 2022).
23. The Human Metabolome Database. https://hmdb.ca/metabolites (accessed 20 October 2022).
24. Mcule purchasable database (in-stock), ver. 200601. https://mcule.com/database/ (accessed 27 June 2020).
25. Van der Maaten, L. & Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
26. Jin, W., Barzilay, R. & Jaakkola, T. Multi-objective molecule generation using interpretable substructures. In Proc. 37th International Conference on Machine Learning 450, 4849–4859 (2020).
27. Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).
28. Cao, Y., Jiang, T. & Girke, T. A maximum common substructure-based algorithm for searching and predicting drug-like compounds. Bioinformatics 24, i366–i374 (2008).
29. Muratov, E. N. et al. QSAR without borders. Chem. Soc. Rev. 49, 3525–3564 (2020).
30. Baell, J. B. & Holloway, G. A. New substructure filters for removal of pan assay interference compounds (PAINS) from screening libraries and for their exclusion in bioassays. J. Med. Chem. 53, 2719–2740 (2010).
31. Brenk, R. et al. Lessons learnt from assembling screening libraries for drug discovery for neglected diseases. ChemMedChem 3, 435–444 (2008).
32. Lipinski, C. A., Lombardo, F., Dominy, B. W. & Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Deliv. Rev. 23, 3–25 (1997).
33. Ghose, A. K., Viswanadhan, V. N. & Wendoloski, J. J. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J. Comb. Chem. 1, 55–68 (1999).
34. O’Shea, R. & Moser, H. E. Physicochemical properties of antibacterial compounds: implications for drug discovery. J. Med. Chem. 51, 2871–2878 (2008).
35. Wong, F. et al. Reactive metabolic byproducts contribute to antibiotic lethality under anaerobic conditions. Mol. Cell 82, 3499–3512 (2022).
36. Wong, F. et al. Cytoplasmic condensation induced by membrane damage is associated with antibiotic lethality. Nat. Commun. 12, 2321 (2021).
37. Wong, F. et al. Understanding beta-lactam-induced lysis at the single-cell level. Front. Microbiol. 12, 712007 (2021).
38. Wong, F. et al. Mechanics and dynamics of bacterial cell lysis. Biophys. J. 116, 2378–2389 (2019).
39. Zheng, E. J. et al. Discovery of antibiotics that selectively kill metabolically dormant bacteria. Cell Chem. Biol. https://doi.org/10.1016/j.chembiol.2023.10.026 (2023).
40. Farha, M. A., Verschoor, C. P., Bowdish, D. & Brown, E. D. Collapsing the proton motive force to identify synergistic combinations against Staphylococcus aureus. Chem. Biol. 20, 1168–1178 (2013).
41. Hurdle, J. G. Targeting bacterial membrane function: an underexploited mechanism for treating persistent infections. Nat. Rev. Microbiol. 9, 62–75 (2011).
42. Antibiotic Resistance Threats in the United States, 2019. Centers for Disease Control and Prevention. https://www.cdc.gov/drugresistance/pdf/threats-report/2019-ar-threats-report-508.pdf (accessed 20 September 2021).
43. Lewis, K. The science of antibiotic discovery. Cell 181, 29–45 (2020).
44. Walsh, C. Where will new antibiotics come from? Nat. Rev. Microbiol. 1, 65–70 (2003).
45. Ying, R., Bourgeois, D., You, J., Zitnik, M. & Leskovec, J. GNNExplainer: Generating explanations for graph neural networks. Adv. Neural Inf. Process. Syst. 32, 9240–9251 (2019).
46. Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2, 573–584 (2020).
47. Yuan, H., Yu, H., Gui, S. & Ji, S. Explainability in graph neural networks: a taxonomic survey. IEEE Trans. Pattern Anal. Mach. Intell. 45, 5782–5799 (2023).
48. DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
49. Kazeev, N. The fast version of DeLong’s method for computing the covariance of unadjusted AUC. https://github.com/yandexdataschool/roc_comparison (accessed 21 July 2023).
50. Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203–230 (2011).
51. Wang, Y., Backman, T. W. H., Horan, K. & Girke, T. fmcsR: mismatch tolerant maximum common substructure searching in R. Bioinformatics 29, 2792–2794 (2013).
52. Daina, A., Michielin, O. & Zoete, V. SwissADME: a free web tool to evaluate pharmacokinetics, drug-likeness and medicinal chemistry friendliness of small molecules. Sci. Rep. 7, 42717 (2017).
53. Wong, F. et al. Benchmarking AlphaFold‐enabled molecular docking predictions for antibiotic discovery. Mol. Syst. Biol. 18, e11081 (2022).
54. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
55. Greco, I. et al. Correlation between hemolytic activity, cytotoxicity and systemic in vivo toxicity of synthetic antimicrobial peptides. Sci. Rep. 10, 13206 (2020).
56. Krol, L. R. Permutation Test. https://github.com/lrkrol/permutationTest (accessed 22 July 2023).
57. Wong, F. et al. Supporting code for: discovery of a structural class of antibiotics with explainable deep learning. Zenodo https://doi.org/10.5281/zenodo.10095879 (2023).

Acknowledgements

The authors thank the past and present members of the Collins laboratory for helpful discussions; members of the Broad Institute Center for the Development of Therapeutics (CDoT) for helpful feedback; the Microbial Genome Sequencing Center for assistance with sequencing; the Harvard Center for Mass Spectrometry for assistance with LC–MS experiments; S. Gould and R. Singh for medicinal chemistry feedback; A. Vrcic and T. Dawson for assistance with compound management; A. Graveline for assistance with mouse experiments; and Z. Gitai for E. coli strains RFM795 and JW5503-KanS. F.W. was supported by the James S. McDonnell Foundation and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under award number K25AI168451. A.K. was supported by the Swiss National Science Foundation under grant number SNSF_203071. A.M.E. and A.L.M. were supported by federal funds from the National Institute of Allergy and Infectious Diseases of the National Institutes of Health under grant number U19AI110818 to the Broad Institute. J.M.S. was supported by the Banting Fellowships Program (393360). L.D.R. was supported by the Volkswagen Foundation. J.J.C. was supported by the Defense Threat Reduction Agency (grant number HDTRA12210032), the National Institutes of Health (grant number R01-AI146194), and the Broad Institute of MIT and Harvard. This work is part of the Antibiotics-AI Project, which is directed by J.J.C. and supported by the Audacious Project, Flu Lab, LLC, the Sea Grape Foundation, R. Zander and H. Wyss for the Wyss Foundation, and an anonymous donor.

Author information

Present address (Jonathan M. Stokes): Department of Biochemistry and Biomedical Sciences, Michael G. DeGroote Institute for Infectious Disease Research and David Braley Centre for Antibiotic Discovery, McMaster University, Hamilton, Ontario, Canada
These authors contributed equally: Felix Wong, Erica J. Zheng

Authors and Affiliations

Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Felix Wong, Erica J. Zheng, Jacqueline A. Valeri, Melis N. Anahtar, Satotaka Omori, Andres Cubillos-Ruiz, Aarti Krishnan, Abigail L. Manson, Ashlee M. Earl, Jonathan M. Stokes & James J. Collins
Institute for Medical Engineering and Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
Felix Wong, Jacqueline A. Valeri, Andres Cubillos-Ruiz, Aarti Krishnan, Jonathan M. Stokes & James J. Collins
Integrated Biosciences, San Carlos, CA, USA
Felix Wong, Satotaka Omori & Alicia Li
Program in Chemical Biology, Harvard University, Cambridge, MA, USA
Erica J. Zheng
Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
Erica J. Zheng, Jacqueline A. Valeri, Nina M. Donghia, Andres Cubillos-Ruiz & James J. Collins
Eric and Wendy Schmidt Center, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Wengong Jin
Leibniz Institute of Polymer Research and the Max Bergmann Center of Biomaterials, Dresden, Germany
Jens Friedrichs, Ralf Helbig & Lars D. Renner
Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
Behnoush Hajian, Dawid K. Fiejtek, Florence F. Wagner & Holly H. Soutter

Contributions

F.W. conceived research, designed all models and experiments, performed or directed all experiments and analysis, wrote the paper and supervised research. E.J.Z., S.O. and A.L. performed screening experiments and analysis. J.A.V. and W.J. assisted with data interpretation and analysis, and W.J. developed and implemented the MCTS rationale extraction algorithm. N.M.D., M.N.A. and A.C.-R. performed mouse experiments and analysis. M.N.A. and A.K. performed screening experiments and assisted with data interpretation. J.F. and R.H. performed cellular physiology experiments and analysis. A.L.M. and A.M.E. performed genomic analysis and assisted with data interpretation. B.H., H.H.S. and J.M.S. assisted with data interpretation. D.K.F. and F.F.W. assisted with chemical testing experiments. L.D.R. performed cellular physiology experiments and analysis and assisted with data interpretation. J.J.C. supervised research. All authors assisted with manuscript editing.

Corresponding author

Correspondence to James J. Collins.

Ethics declarations

Competing interests

J.J.C. is an academic co-founder and scientific advisory board chair of EnBiotix, an antibiotic drug discovery company, and Phare Bio, a non-profit venture focused on antibiotic drug development. J.J.C. is also an academic co-founder and board member of Cellarity and the founding scientific advisory board chair of Integrated Biosciences. J.M.S. is scientific co-founder and scientific director of Phare Bio. F.W. is a co-founder of Integrated Biosciences. S.O. and A.L. contributed to this work as employees of Integrated Biosciences, and S.O. may have an equity interest in Integrated Biosciences. F.W. and J.J.C. have filed a patent based on the results of this work. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Molecular weight distribution of the 39,312 compounds screened.

Data are from an original set of 39,312 compounds containing most known antibiotics, natural products, and structurally diverse molecules, with molecular weights between 40 Da and 4,200 Da. Frequency is shown on a log scale.

Extended Data Fig. 2 Comparison of deep learning models for predicting antibiotic activity.

a,b, Precision-recall curves for predictions of antibiotic activity, for an ensemble of 10 Chemprop models without RDKit features (a) and the best-performing random forest classifier model based on Morgan fingerprints (b), trained and tested using data from a screen of 39,312 molecules (Fig. 1 of the main text). The black dashed line represents the baseline fraction of active compounds in the training set (1.3%). Blue curves and the 95% confidence interval indicate the variation generated by bootstrapping. AUC, area under the curve.
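As a rough illustration of the quantities in this legend (area under the precision-recall curve, the baseline fraction of actives, and a bootstrapped 95% interval), the sketch below uses only the Python standard library. The step-wise average-precision estimator and the percentile bootstrap are assumptions chosen for simplicity; the legend does not specify which estimator or resampling scheme the authors used, and all function names are hypothetical.

```python
import random


def average_precision(labels, scores):
    """Area under the precision-recall curve via the average-precision sum."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank compounds from highest to lowest prediction score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


def baseline_precision(labels):
    """Fraction of actives: the precision expected from a random ranking."""
    return sum(labels) / len(labels)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the average precision."""
    if sum(labels) == 0:
        raise ValueError("need at least one positive label")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if sum(sample_labels) == 0:
            continue  # average precision is undefined without positives
        stats.append(average_precision(sample_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical toy data: 1 = active, 0 = inactive.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1]
ap = average_precision(toy_labels, toy_scores)
ci_low, ci_high = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

A model is informative when its average precision sits well above `baseline_precision`, which is why the legends report the baseline fraction of active compounds alongside each curve.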

Extended Data Fig. 3 Comparison of deep learning models for predicting human cell cytotoxicity.

a,b, Precision-recall curves for predictions of HepG2 cytotoxicity, for an ensemble of 10 Chemprop models without RDKit features (a) and the best-performing random forest classifier model based on Morgan fingerprints (b), trained and tested using data from a screen of 39,312 molecules (Fig. 1 of the main text). The black dashed line represents the baseline fraction of active compounds in the training set (8.5%). Blue curves and the 95% confidence interval indicate the variation generated by bootstrapping. AUC, area under the curve. c,d, Precision-recall curves for predictions of HSkMC cytotoxicity, for an ensemble of 10 Chemprop models without RDKit features (c) and the best-performing random forest classifier model based on Morgan fingerprints (d), trained and tested using data from a screen of 39,312 molecules (Fig. 1 of the main text). The black dashed line represents the baseline fraction of active compounds in the training set (3.8%). Blue curves and the 95% confidence interval indicate the variation generated by bootstrapping. e,f, Precision-recall curves for predictions of IMR-90 cytotoxicity, for an ensemble of 10 Chemprop models without RDKit features (e) and the best-performing random forest classifier model based on Morgan fingerprints (f), trained and tested using data from a screen of 39,312 molecules (Fig. 1 of the main text). The black dashed line represents the baseline fraction of active compounds in the training set (8.8%). Blue curves and the 95% confidence interval indicate the variation generated by bootstrapping.

Extended Data Fig. 4 Visualizing chemical space across different prediction score thresholds.

a,b, t-Distributed stochastic neighbor embedding (t-SNE) plot of compounds with high and low antibiotic prediction scores, in addition to compounds in the training set, for different prediction score thresholds. The plot shows the chemical similarity or dissimilarity of various compounds, and active compounds in the training set (red dots) are seen to largely separate compounds with high prediction scores (green, black, and purple dots) from compounds with low prediction scores (brown dots).

Extended Data Fig. 5 Examples of rationale calculations using Monte Carlo tree search.

a, Illustration of the MCTS forward pass using compound 1. The figure shows three possible search paths from the root (compound 1) by deleting peripheral bonds or rings (highlighted in red). Due to space limitations, only three steps from the root are shown. b, Illustration of a complete search path from the root (compound 1) to a leaf node (the rationale). Chemprop is used to predict the activity of each leaf node, and these predictions are used to make updates to the statistics of each intermediate node in the backward pass.
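The forward and backward passes described in this legend can be sketched on a toy problem. Everything here is a simplification made for illustration: a molecule is represented as a set of fragment labels instead of a molecular graph, every fragment is treated as deletable (the legend restricts deletions to peripheral bonds or rings), the hypothetical `toy_score` function stands in for a Chemprop prediction, and child selection uses a plain UCB1-style bonus, which may differ from the selection rule used in the paper.

```python
import math
import random


class Node:
    """Search-tree node holding a substructure and its visit statistics."""

    def __init__(self, fragments):
        self.fragments = fragments  # frozenset of fragment labels
        self.children = {}          # deleted fragment -> child Node
        self.visits = 0
        self.total = 0.0            # sum of leaf scores backed up through here

    def mean(self):
        return self.total / self.visits if self.visits else 0.0


def rollout(root, score_fn, min_size, rng, c=1.0):
    """One forward pass (root to leaf) followed by one backward pass."""
    node, path = root, [root]
    while len(node.fragments) > min_size:
        # Expand: each child deletes exactly one fragment from the parent.
        for frag in sorted(node.fragments):
            if frag not in node.children:
                node.children[frag] = Node(node.fragments - {frag})

        def ucb(child, parent=node):
            bonus = c * math.sqrt(math.log(parent.visits + 1) / (child.visits + 1))
            return child.mean() + bonus

        kids = list(node.children.values())
        best = max(ucb(k) for k in kids)
        node = rng.choice([k for k in kids if ucb(k) == best])
        path.append(node)
    # Score the leaf, then update the statistics of every node on the path.
    leaf_score = score_fn(node.fragments)
    for visited in path:
        visited.visits += 1
        visited.total += leaf_score
    return node.fragments, leaf_score


def find_rationale(fragments, score_fn, min_size=2, threshold=0.5,
                   n_rollouts=200, seed=0):
    """Return the best-scoring leaf above `threshold`, or None if none is found."""
    rng = random.Random(seed)
    root = Node(frozenset(fragments))
    best = None
    for _ in range(n_rollouts):
        leaf, score = rollout(root, score_fn, min_size, rng)
        if score >= threshold and (best is None or score > best[1]):
            best = (leaf, score)
    return best


def toy_score(fragments):
    """Hypothetical stand-in for a model: active only if both key parts remain."""
    return 0.9 if {"core", "linker"} <= fragments else 0.1


rationale = find_rationale({"core", "linker", "R1", "R2", "R3"}, toy_score)
```

In this toy setting the only two-fragment leaf that keeps a high score is `{"core", "linker"}`, so the search returns it as the rationale; the substituents R1 to R3 can be deleted without losing predicted activity.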

Extended Data Fig. 6 Maximal common substructures reveal known antibiotic classes but are less predictive than Chemprop rationales across all hits.

a,b, Rank-ordered numbers of hits (a) and non-hits (b) associated with maximal common substructures (MCSs) identified by a grouping method. Here, any hit associated with any of the MCSs shown shares a minimum of 12 atoms with the MCS. Dashed lines in MCSs indicate either single or double bonds. Each green or brown bar shows the prediction score of each MCS viewed as a molecule in its own right. Where bars are thin, the corresponding MCS prediction scores are approximately zero (including all brown bars in (b)). c,d, Similar to (a), but here, any hit associated with any of the MCSs shown shares a minimum of 10 (c) or 15 (d) atoms with the MCS. e, Illustration of the rationales (red) determined using a Monte Carlo tree search for example hits (black) associated with MCSs A1-A12. No hit associated with MCS A12 possessed a rationale. f, MCS prediction scores (blue bars) and the average prediction scores of all rationales of all hits associated with MCSs A1-A12 (red bars). Where blue bars are thin, the corresponding MCS prediction scores are approximately zero. No hit associated with MCS A12 possessed a rationale.

Extended Data Fig. 7 Closest active training set compounds to, and selectivities of, four validated hits associated with rationale groups G1-G5.

a, Closest active compounds (right), as measured by Tanimoto similarity, are from the training set of 39,312 compounds. Compounds are colored according to associated rationale groups (as indicated in parentheses), and the identifier and Tanimoto similarity score of each closest active compound are displayed. b, S. aureus MIC and human cell IC₅₀ values of the four compounds in (a), shown on a log scale. Bars show the means of two biological replicates (points) and are colored by the bacterial strain, human cell type, or media condition tested. Asterisks indicate values larger than 128 µg/mL.
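Tanimoto similarity, used in panel a to find the closest active training compound, is the size of the intersection of two feature sets divided by the size of their union. The sketch below applies it to sets of integers standing in for fingerprint bits; the fingerprint type used in the paper is not stated in this legend, and the compound names and bit sets are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def closest_active(query_bits, actives):
    """Return (name, similarity) of the active compound most similar to the query."""
    return max(
        ((name, tanimoto(query_bits, bits)) for name, bits in actives.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical toy fingerprints, given as sets of "on" bit positions.
toy_actives = {
    "active_X": {1, 2, 3, 9},
    "active_Y": {7, 8},
}
nearest = closest_active({1, 2, 3, 4}, toy_actives)
```

Here the query shares three of five distinct bits with `active_X`, giving a similarity of 0.6, and none with `active_Y`.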

Extended Data Fig. 8 Comparison of MICs of different compounds against methicillin-susceptible and methicillin-resistant S. aureus, and eradication of kanamycin persisters by treatment with compounds 1 and 2.

a, MICs of various antibiotics against S. aureus RN4220 (black) and S. aureus USA300 (blue) on a log scale. Bars show the mean of two biological replicates (individual points). b, Survival curves of B. subtilis 168 after combination treatment with kanamycin and compounds 1 and 2, respectively, as determined by plating and CFU counting. Initial CFU values are ~10⁷. Each point is representative of the mean of two biological replicates. Cultures treated with kanamycin in addition to compounds 1 and 2 were eradicated after 24 h (CFU/mL = 0), and these values were truncated to a log survival value of −7 on this plot.
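The truncation described for panel b can be written out explicitly. This is a minimal sketch that assumes log survival is the base-10 logarithm of the ratio of surviving to initial CFU; the function name and the default floor argument are illustrative.

```python
import math


def log_survival(cfu, initial_cfu, floor=-7.0):
    """Log10 survival relative to the initial CFU, truncated at `floor`.

    A culture with no recovered colonies has an undefined logarithm, so it is
    plotted at the floor value instead.
    """
    if initial_cfu <= 0:
        raise ValueError("initial_cfu must be positive")
    if cfu <= 0:
        return floor
    return max(floor, math.log10(cfu / initial_cfu))


two_log_kill = log_survival(1e5, 1e7)  # a 100-fold reduction
eradicated = log_survival(0, 1e7)      # no colonies recovered
```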

Extended Data Fig. 9 Toxicity, chemical properties, and in vivo efficacy of compounds 1 and 2.

a, Fractional hemolysis measurements of human red blood cells (RBCs) treated with compounds 1 and 2 at the indicated final concentrations. Vehicle (1% DMSO) was used as a negative control, and Triton X-100, a detergent, was used as a positive control. Black points indicate values from two biological replicates, and red bars indicate average values. b, Ferrous iron chelation measurements of compounds 1 and 2. Vehicle (1% DMSO) was used as a negative control, and ethylenediaminetetraacetic acid (EDTA), an iron chelator, was used as a positive control. Black points indicate values from two biological replicates, and gray bars indicate average values. c, Ames test mutagenesis measurements of the fractions of revertant S. typhimurium TA100 cultures treated with compounds 1 and 2 at the indicated final concentrations. Vehicle (1% DMSO) was used as a negative control, and 5 µg/mL sodium azide was used as a positive control. Black points indicate values from two biological replicates, and purple bars indicate average values. Higher fractions of revertant cultures indicate higher mutagenic potential (inset). d, Chemical stability of compound 1 in various buffers as a function of incubation time at 37 °C. Values are normalized to the mean measurement at time zero, and each point is representative of the mean of two biological replicates. Error bars indicate the full range of values arising from two biological replicates. e, Photographs of WoundSkin models 24 h after topical treatment with compound 1 (1%) or DMSO vehicle. Images are representative of six biological replicates in each treatment group. Scale bar, 2 mm. f, Illustration of the in vivo study of a neutropenic mouse wound infection model using MRSA CDC 563 shown in Fig. 5a of the main text. g, Illustration of the in vivo study of a neutropenic mouse thigh infection model using MRSA CDC 706 shown in Fig. 5b of the main text.

Extended Data Fig. 10 Exploration of a structural class through structure-activity relationships.

a, The rationale of compounds 1 and 2, overlaid with chemical modifications (R1-R8) that encompass all compounds used to test SAR (Supplementary Data 2). SAR, structure-activity relationships. b, Analogues of compounds 1 and 2 found to have varying degrees of activity against S. aureus. Corresponding MIC and IC₅₀ values are representative of two biological replicates.

Supplementary information

Supplementary Information

This file contains Supplementary Notes 1-4, Supplementary References, and Supplementary Tables 1-9.

Reporting Summary

Supplementary Data 1

Training set of 39,312 compounds tested for antibiotic activity and cytotoxicity, in addition to 200 RDKit features used to augment the models and cytotoxicity testing results. Antibiotic activity was defined using a 20% relative mean growth cut-off in S. aureus RN4220. Cytotoxicity was defined using a 90% relative mean cell viability cut-off in HepG2 cells, HSkMCs, and IMR-90 cells. Data are from two biological replicates.
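The labelling rule in this description can be sketched as follows. The direction of each comparison is an assumption: a compound is treated as active when its mean relative growth falls below 20%, and as cytotoxic when its mean relative viability falls below 90%. The function name and the toy replicate values are hypothetical.

```python
def label_compound(growth_replicates, viability_replicates,
                   growth_cutoff=0.2, viability_cutoff=0.9):
    """Binarize screening readouts averaged over biological replicates.

    growth_replicates: relative S. aureus growth values (1.0 = untreated).
    viability_replicates: dict mapping a cell line to its relative viability values.
    Assumed convention: values below the cut-off count as active or cytotoxic.
    """
    mean_growth = sum(growth_replicates) / len(growth_replicates)
    cytotoxic = {
        cell_line: sum(values) / len(values) < viability_cutoff
        for cell_line, values in viability_replicates.items()
    }
    return {"active": mean_growth < growth_cutoff, "cytotoxic": cytotoxic}


# Hypothetical readouts from two biological replicates.
toy_labels = label_compound(
    growth_replicates=[0.05, 0.15],
    viability_replicates={
        "HepG2": [0.95, 0.99],
        "HSkMC": [0.80, 0.90],
        "IMR-90": [1.00, 0.92],
    },
)
```

With these toy values the compound is labelled active (mean growth 0.10) and cytotoxic only in HSkMCs (mean viability 0.85).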

Supplementary Data 2

Model predictions, rationales, and procured compounds from the ensemble Chemprop model. Compound SMILES strings and corresponding prediction scores are shown for all 3,646 hits, out of 12,076,365 compounds whose antibiotic activity and cytotoxicity against human cells were predicted. Rationale and scaffold SMARTS strings, vendor catalogue information for all 283 procured and tested compounds shown in Fig. 3e of the main text, and vendor catalogue information for all 17 procured and tested compounds as part of the structure–activity relationship analyses shown in Extended Data Fig. 10 are also provided, in addition to the MCS SMARTS strings for the analyses described in Supplementary Note 2 and Extended Data Fig. 6.

Supplementary Data 3

Mutations arising in cells exposed to compounds. For each compound, results are shown for at least two independently passaged or suppressor mutant populations. All mutations that passed mapping filters are listed here. Black boxes highlight mutations in similar regions across sequencing replicates either present in the same gene, or present in an adjacent gene or intergenic region.

Supplementary Data 4

Training and test data for models predicting proton motive force-altering activity. Proton motive force-altering activity was defined using a 30% relative mean fluorescence change in S. aureus RN4220 in the presence of DiSC₃(5), a proton motive force-sensitive dye. 475 antibacterial compounds from Supplementary Data 1 were tested, and all inactive antibacterial compounds were assumed to not alter proton motive force. Data are from two biological replicates.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wong, F., Zheng, E.J., Valeri, J.A. et al. Discovery of a structural class of antibiotics with explainable deep learning. Nature 626, 177–185 (2024). https://doi.org/10.1038/s41586-023-06887-8

Received: 05 January 2022
Accepted: 21 November 2023
Published: 20 December 2023
Issue Date: 01 February 2024
DOI: https://doi.org/10.1038/s41586-023-06887-8
