Artificial intelligence uncovers carcinogenic human metabolites

Mittal, Aayushi; Mohanty, Sanjay Kumar; Gautam, Vishakha; Arora, Sakshi; Saproo, Sheetanshu; Gupta, Ria; Sivakumar, Roshan; Garg, Prakriti; Aggarwal, Anmol; Raghavachary, Padmasini; Dixit, Nilesh Kumar; Singh, Vijay Pal; Mehta, Anurag; Tayal, Juhi; Naidu, Srivatsava; Sengupta, Debarka; Ahuja, Gaurav

doi:10.1038/s41589-022-01110-7

Article
Published: 11 August 2022

Aayushi Mittal¹^na1,
Sanjay Kumar Mohanty ORCID: orcid.org/0000-0002-1375-2223¹^na1,
Vishakha Gautam¹^na1,
Sakshi Arora ORCID: orcid.org/0000-0001-7535-3597¹^na1,
Sheetanshu Saproo²,
Ria Gupta¹,
Roshan Sivakumar¹,
Prakriti Garg¹,
Anmol Aggarwal¹,
Padmasini Raghavachary¹,
Nilesh Kumar Dixit¹,
Vijay Pal Singh³,
Anurag Mehta⁴,
Juhi Tayal⁴,
Srivatsava Naidu²,
Debarka Sengupta ORCID: orcid.org/0000-0002-6353-5411¹ &
…
Gaurav Ahuja ORCID: orcid.org/0000-0002-2837-9361¹

Nature Chemical Biology volume 18, pages 1204–1213 (2022)

Abstract

The genome of a eukaryotic cell is vulnerable to both intrinsic and extrinsic threats owing to its constant exposure to a myriad of heterogeneous compounds. Despite innate DNA damage responses, some genomic lesions trigger malignant transformation of cells. Accurate prediction of carcinogens remains challenging owing to the limited information about bona fide (non-)carcinogens. We developed Metabokiller, an ensemble classifier that accurately recognizes carcinogens by quantitatively assessing their electrophilicity and their potential to induce proliferation, oxidative stress, genomic instability, epigenome alterations and an anti-apoptotic response. In addition to predicting carcinogenicity, Metabokiller is fully interpretable and outperforms existing best-practice methods for carcinogenicity prediction. Metabokiller identified potentially carcinogenic human metabolites. To cross-validate Metabokiller predictions, we performed multiple functional assays using Saccharomyces cerevisiae and human cells with two Metabokiller-flagged human metabolites, namely 4-nitrocatechol and 3,4-dihydroxyphenylacetic acid, and observed high concordance between Metabokiller predictions and experimental validations.

**Fig. 1: Metabokiller is an artificial-intelligence-driven tool for carcinogen prediction.**

**Fig. 2: Metabokiller outperforms other prediction methods.**

**Fig. 3: Experimental validations support Metabokiller predictions.**

**Fig. 4: 4NC and DP trigger an anti-apoptotic response in yeast.**

**Fig. 5: 4NC and DP trigger malignant transformation of human cells.**

Data availability

The raw RNA sequencing files are available at ArrayExpress under accession E-MTAB-11179. The processed datasets, which detail the compound SMILES, compound names, PubChem IDs, InChIs, bioactivity status and source information, are accessible from GitHub at https://github.com/the-ahuja-lab/Metabokiller/tree/main/datasets and from Zenodo at https://doi.org/10.5281/zenodo.6683106. Source data are provided with this paper.

Code availability

A Python package for Metabokiller is available from PyPI at https://pypi.org/project/Metabokiller/, from the project GitHub page at https://github.com/the-ahuja-lab/Metabokiller and from Zenodo at https://doi.org/10.5281/zenodo.6683106. Code used for building the machine-learning models is provided on the project GitHub page.

Acknowledgements

The authors would like to thank the IT-HelpDesk team of IIIT-Delhi for providing assistance with the computational resources. We thank all the members of the Ahuja lab for their intellectual contributions at various stages of this project. We also thank K. Datta for providing critical insights into this study and K. Chakraborty for sharing yeast strains. The Ahuja lab is supported by the Ramalingaswami Re-entry Fellowship (BT/HRD/35/02/2006), a re-entry scheme of the Department of Biotechnology, Ministry of Science & Technology, Government of India, Start-Up Research Grant (SRG/2020/000232) from the Science and Engineering Research Board and an intramural Start-up grant from Indraprastha Institute of Information Technology-Delhi. The Sengupta lab is funded by the INSPIRE faculty grant from the Department of Science & Technology, India.

Author information

These authors contributed equally: Aayushi Mittal, Sanjay Kumar Mohanty, Vishakha Gautam, Sakshi Arora.

Authors and Affiliations

Department of Computational Biology, Indraprastha Institute of Information Technology-Delhi, Okhla, Phase III, New Delhi, Delhi, India
Aayushi Mittal, Sanjay Kumar Mohanty, Vishakha Gautam, Sakshi Arora, Ria Gupta, Roshan Sivakumar, Prakriti Garg, Anmol Aggarwal, Padmasini Raghavachary, Nilesh Kumar Dixit, Debarka Sengupta & Gaurav Ahuja
Department of Bio-Medical Engineering, Indian Institute of Technology Ropar, Rupnagar, Punjab, India
Sheetanshu Saproo & Srivatsava Naidu
CSIR-Institute of Genomics & Integrative Biology, New Delhi, Delhi, India
Vijay Pal Singh
Rajiv Gandhi Cancer Institute & Research Centre, New Delhi, Delhi, India
Anurag Mehta & Juhi Tayal

Contributions

The study was conceived by G.A. Computational analysis workflows were designed by G.A., D.S. and A.Mi. Yeast experimental workflows were designed by G.A. and A.Mi., whereas human experimental workflows were designed by S.N. Yeast-based assays were performed by A.Mi., S.A. and N.K.D. Human cell culture-based experiments were performed by S.S. Data compilation for the model building was performed by A.Mi., P.G., A.A. and P.R., and the analysis workflow was made by S.M., V.G., S.A., A.Mi., R.S., R.G. and P.G. V.P.S., A.Me. and J.T. assisted in data interpretation. The Metabokiller Python package was created by S.K.M. Illustrations were drafted by A.M. and G.A. G.A. and A.Mi. wrote the paper. All authors have read and approved the manuscript.

Corresponding authors

Correspondence to Debarka Sengupta or Gaurav Ahuja.

Ethics declarations

Competing interests

A provisional patent has been filed (reference no. 202111052929, application no. TEMP/E-1/60118/2021-DEL) describing the computational architecture of Metabokiller. Usage of the Metabokiller Python package is free for academic institutions and for any academic-related project; for commercial usage, users must contact the authors.

Peer review

Peer review information

Nature Chemical Biology thanks Michael Fasullo, Hongsheng Liu and Stefano Monti for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Workflow detailing Metabokiller functionalities.

Schematic representation depicting the step-by-step workflow used to build all the six individual biochemical models and the ensemble model (Metabokiller). Up/downsampling approach was used to counteract the class imbalance. Signaturizer library was used to generate bioactivity features. Hyperparameter tuning was performed to obtain the best-performing model parameters. The ensemble model (Metabokiller) was built using biochemical features of experimentally validated carcinogens/non-carcinogens generated using six models. The majority voting method was used to assign the final carcinogenicity status.
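
The final voting step described in this caption can be illustrated with a minimal sketch. It assumes that each of the six biochemical models returns a probability, that a probability at or above 0.5 counts as one vote, and that a 3–3 tie is treated as non-carcinogenic. The names `MODEL_NAMES` and `majority_vote`, the threshold and the tie rule are hypothetical choices made for illustration; the caption does not specify how votes are pooled, and this is not the Metabokiller package API.

```python
# Hypothetical labels for the six biochemical properties named in the abstract.
MODEL_NAMES = (
    "electrophilicity",
    "proliferation",
    "oxidative_stress",
    "genomic_instability",
    "epigenetic_alterations",
    "anti_apoptotic_response",
)


def majority_vote(probabilities, threshold=0.5):
    """Return True (carcinogen) if more than half of the models vote positive.

    `probabilities` maps each name in MODEL_NAMES to a score in [0, 1].
    The 0.5 threshold and the tie rule (3 of 6 -> False) are assumptions.
    """
    missing = [name for name in MODEL_NAMES if name not in probabilities]
    if missing:
        raise ValueError(f"missing model outputs: {missing}")
    # Each model casts one vote when its probability reaches the threshold.
    votes = sum(probabilities[name] >= threshold for name in MODEL_NAMES)
    return votes > len(MODEL_NAMES) / 2


if __name__ == "__main__":
    example = dict(zip(MODEL_NAMES, (0.9, 0.7, 0.8, 0.6, 0.2, 0.1)))
    print(majority_vote(example))  # four of six models vote positive
```

A strict majority is used so that the decision never depends on dictionary ordering; a different tie rule would only change the comparison on the last line of the function.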

Extended Data Fig. 2 Metabokiller harbors high prediction performance.

(a) Box plot depicting the AUCROC values of the bootstrapping (n = 20 repetitions) of the indicated models. (b-f) Box plots depicting the AUCROC, accuracy, F1 Score, precision, and recall of the indicated models as inferred from the 10-fold cross-validation. (g) Box plot depicting the model performance of the twenty Gradient Boosting Machine (GBM)-based models generated using bootstrapping technique (n = 20 repetitions). (h) Variables factor map (PCA) depicting the direction and contribution of all the six variables (individual models) representing the experimentally validated carcinogens (MK_ETn) in the Eigenspace. (i) Principal Component Analysis revealing the chemical heterogeneity between the carcinogens and non-carcinogens in the indicated datasets. The heatmap at the bottom depicts the relative enrichment of the indicated functional groups (RNH2: primary amine, R2NH: secondary amine, R3N: tertiary amine, ROPO3: monophosphate, ROH: alcohol, RCHO: aldehyde, RCOR: ketone, RCOOH: carboxylic acid, RCOOR: ester, ROR: ether, RCCH: terminal alkyne, RCN: nitrile) in both classes. (j) Bar graphs depicting the accuracy of Metabokiller on the indicated unseen datasets. In the box plots, center lines represent the medians; box limits indicate the 25th and 75th percentiles as determined by R software (ggplot2); whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots.
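
The box-plot convention stated at the end of this caption (median, 25th and 75th percentiles, whiskers within 1.5 times the interquartile range, outliers as dots) can be sketched in a few lines. The function name `box_plot_summary` is hypothetical, and the sketch assumes linearly interpolated quartiles (the "inclusive" method of Python's `statistics.quantiles`) with whiskers drawn to the most extreme data point inside each fence; the plotting software used for the figures may differ in detail.

```python
from statistics import median, quantiles


def box_plot_summary(values):
    """Summarise a sample the way the caption describes its box plots."""
    data = sorted(values)
    # Quartiles by linear interpolation between order statistics (assumption).
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        # Whiskers stop at the most extreme observations within the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        # Anything beyond the fences is drawn as an individual dot.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```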

Extended Data Fig. 3 Metabokiller unravels potential oncometabolites.

(a) Heatmap depicting the number of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) predictions on the Independent Dataset (I.D.) for indicated methods/tools. (b) Venn diagram depicting predicted carcinogenic human metabolites, further segregated based on prediction probability cutoffs. (c) Variables factor map depicting the contribution of all the six individual models in predicting carcinogenic metabolites from HMDB (probability cutoff ≥ 0.5). (d) Projection of the predicted carcinogens (indicated as red dots; probability cutoff ≥ 0.7) on the human metabolic space, achieved using iPath Web Server. (e) Schematic representation of the steps involved in processing pan-cancer metabolomics dataset. Of note, Pearson correlation was computed between log₂ fold change (tumor vs healthy) and biochemical/carcinogenicity probabilities. (f) Heatmap detailing the correlation values further segregated based on cancer type. (g) Volcano plots depicting the differentially enriched/de-enriched metabolites in the indicated cancer datasets. Gray dots highlight the metabolites that do not qualify for the enrichment cutoff (log₂ fold change ≥ 1 or ≤ -1, and p-value (adjusted) < 0.05), and green and red dots represent the metabolites that qualify for the enrichment cutoff and are predicted as non-carcinogenic and carcinogenic by Metabokiller respectively. The p-value was computed using two-sided Mann–Whitney U test and corrected using Benjamini-Hochberg method. (h) Structural information of some of the well-characterized oncometabolites reported in the literature and predicted by Metabokiller.
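
The enrichment cutoff in panel (g) combines a fold-change threshold with a Benjamini-Hochberg-adjusted p-value. The following is a minimal pure-Python sketch of that adjustment and of the cutoff as worded in the caption. The function names are hypothetical and the input p-values are made-up illustrative numbers, not data from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def passes_enrichment_cutoff(log2_fold_change, adjusted_p):
    """Cutoff from the caption: |log2 fold change| >= 1 and adjusted p < 0.05."""
    return abs(log2_fold_change) >= 1 and adjusted_p < 0.05


if __name__ == "__main__":
    raw_p = [0.01, 0.04, 0.03, 0.005]      # illustrative values only
    log2_fc = [1.8, -0.4, -1.2, 2.5]       # illustrative values only
    adj = benjamini_hochberg(raw_p)
    for fc, p in zip(log2_fc, adj):
        print(fc, round(p, 4), passes_enrichment_cutoff(fc, p))
```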

Extended Data Fig. 4 Experimental validations support Metabokiller predictions.

(a) Schematic representation highlighting the predicted-carcinogenic metabolic intermediates of the tyrosine metabolism pathway and aminobenzoate degradation pathway. (b) Box plots depicting the fluorescence intensity of propidium iodide staining indicating cell viability in the indicated conditions (n = 8 biological replicates) after 9 hours (left) and 12 hours (right) of treatment. Of note, heat-killed (HK) yeast cells were used as a positive control. Two-sided Mann–Whitney U test was used to compute statistical significance between the test conditions and the negative control. For the left panel, the p-values are 0.0009 (HK); for 4NC: 0.96 (0.1 µM), 0.87 (1 µM), and 0.02 (10 µM); for DP: 0.59 (0.1 µM), 0.64 (1 µM), and 0.83 (10 µM). For the right panel, the p-values are 0.0009 (HK); for 4NC: 0.63 (0.1 µM), 0.75 (1 µM), and 0.2 (10 µM); for DP: 0.42 (0.1 µM), 0.26 (1 µM), and 0.17 (10 µM). (c) Growth curve profiles of the treated and untreated wild-type yeast during transient exposure with the indicated conditions (n = 8 biological replicates with technical duplicates). Data points represent mean ± SD. Two-sided Student’s t-test was used to compute statistical significance between the positive (H₂O₂ treated yeast cells) and negative control (untreated yeast cells). The p-values are 0.9 (0 hrs), 1.5 × 10⁻⁶ (1.5 hrs), 4.85 × 10⁻⁶ (3 hrs), 4.45 × 10⁻¹⁶ (4.5 hrs), 1.62 × 10⁻¹⁰ (6 hrs), 2.27 × 10⁻¹⁸ (7.5 hrs), 6.41 × 10⁻¹³ (9 hrs), 1.04 × 10⁻²³ (10.5 hrs), 5.82 × 10⁻³⁴ (12 hrs). (d) Box plot depicting the results of reactive oxygen species (ROS) levels inferred using DCFH-DA dye-based assay in the indicated conditions (n = 8 biological replicates). Of note, ROS levels were measured 12 hours post-incubation. Notably, hydrogen peroxide (H₂O₂) treated yeast cells were used as a positive control. Two-sided Mann–Whitney U test was used to compute statistical significance between the test conditions and the negative control. The p-values are 0.003 (H₂O₂); for 4NC: 0.069 (0.1 µM), 0.1 (1 µM), and 0.001 (10 µM); for DP: 0.016 (0.1 µM), 0.087 (1 µM), and 0.07 (10 µM). The p-value cutoff for all the plots is 0.05. *, **, ***, and **** refer to p-values <0.05, <0.01, <0.001, and <0.0001, respectively. In the box plots, center lines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots.
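
The asterisk convention in this caption maps directly onto a small lookup. The sketch below encodes only the thresholds stated above; the function name `significance_stars` is hypothetical, and returning an empty string for p-values at or above the 0.05 cutoff is an assumption, since the caption does not say how non-significant comparisons are labelled.

```python
def significance_stars(p_value):
    """Map a p-value to the asterisk notation used in the caption."""
    # Thresholds are checked from the most to the least stringent.
    for cutoff, stars in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p_value < cutoff:
            return stars
    return ""  # assumption: no label at or above the 0.05 cutoff


if __name__ == "__main__":
    # A few p-values quoted in the caption, with their resulting labels.
    for p in (0.0009, 0.02, 0.003, 1.5e-6, 0.2):
        print(p, repr(significance_stars(p)))
```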

Extended Data Fig. 5 RNA-Seq reveals mode-of-action of 4NC and DP.

(a) Bar plots depicting the total read counts (in millions) of the indicated RNA sequencing samples. (b) Box plot representing the distribution of the transformed read count data in the indicated conditions (n = 3 biological replicates). (c) Correlation plot showing the relationship between the individual RNA sequencing samples. Of note, 75% of the normalized and transformed data was used for the correlation analysis. (d-e) Box plots depicting the relative log expression of the 3 biological replicates of the indicated conditions before and after upper quantile normalization. (f) Volcano plot indicating the differentially expressed genes between the treated (metabolite treatment) and untreated conditions. The p-value was computed using Wald test and corrected using Benjamini-Hochberg method. (g) Metascape-based Functional Gene Ontology analysis identified the involvement of differentially expressed genes in the indicated prominent biological processes. (h) Schematic representation depicting the genomic alterations in the CAN1 gene in the indicated replicates. In the box plots, center lines represent the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots.

Supplementary information

Supplementary Information

Supplementary Tables 1–12.

Reporting Summary

Source data

Source Data Fig. 1

Statistical source data for Fig. 1.

Source Data Fig. 2

Statistical source data for Fig. 2.

Source Data Fig. 3

Statistical source data for Fig. 3.

Source Data Fig. 4

Statistical source data for Fig. 4.

Source Data Fig. 5

Statistical source data for Fig. 5.

Source Data Extended Data Fig. 2

Statistical source data for Extended Data Fig. 2.

Source Data Extended Data Fig. 3

Statistical source data for Extended Data Fig. 3.

Source Data Extended Data Fig. 4

Statistical source data for Extended Data Fig. 4.

Source Data Extended Data Fig. 5

Statistical source data for Extended Data Fig. 5.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mittal, A., Mohanty, S.K., Gautam, V. et al. Artificial intelligence uncovers carcinogenic human metabolites. Nat Chem Biol 18, 1204–1213 (2022). https://doi.org/10.1038/s41589-022-01110-7

Received: 02 December 2021
Accepted: 07 July 2022
Published: 11 August 2022
Issue Date: November 2022
DOI: https://doi.org/10.1038/s41589-022-01110-7
