Artificial intelligence uncovers carcinogenic human metabolites

Abstract

The genome of a eukaryotic cell is often vulnerable to both intrinsic and extrinsic threats owing to its constant exposure to a myriad of heterogeneous compounds. Despite the availability of innate DNA damage responses, some genomic lesions trigger malignant transformation of cells. Accurate prediction of carcinogens is an ever-challenging task owing to the limited information about bona fide (non-)carcinogens. We developed Metabokiller, an ensemble classifier that accurately recognizes carcinogens by quantitatively assessing their electrophilicity, their potential to induce proliferation, oxidative stress, genomic instability, epigenome alterations, and anti-apoptotic response. Concomitant with the carcinogenicity prediction, Metabokiller is fully interpretable and outperforms existing best-practice methods for carcinogenicity prediction. Metabokiller unraveled potential carcinogenic human metabolites. To cross-validate Metabokiller predictions, we performed multiple functional assays using Saccharomyces cerevisiae and human cells with two Metabokiller-flagged human metabolites, namely 4-nitrocatechol and 3,4-dihydroxyphenylacetic acid, and observed high synergy between Metabokiller predictions and experimental validations.

Fig. 1: Metabokiller is an artificial-intelligence-driven tool for carcinogen prediction.
Fig. 2: Metabokiller outperforms other prediction methods.
Fig. 3: Experimental validations support Metabokiller predictions.
Fig. 4: 4NC and DP trigger an anti-apoptotic response in yeast.
Fig. 5: 4NC and DP trigger malignant transformation of human cells.

Data availability

The raw RNA sequencing files are available at ArrayExpress under accession E-MTAB-11179. The processed datasets, detailing the compound SMILES, compound names, PubChem IDs, InChIs, bioactivity status and source information, are accessible from GitHub at https://github.com/the-ahuja-lab/Metabokiller/tree/main/datasets and from Zenodo at https://doi.org/10.5281/zenodo.6683106. Source data are provided with this paper.

Code availability

A Python package for Metabokiller is available from PyPI at https://pypi.org/project/Metabokiller/, from the project GitHub page at https://github.com/the-ahuja-lab/Metabokiller and from Zenodo at https://doi.org/10.5281/zenodo.6683106. Code used for building the machine-learning models is provided on the project GitHub page.

Acknowledgements

The authors would like to thank the IT-HelpDesk team of IIIT-Delhi for providing assistance with the computational resources. We thank all the members of the Ahuja lab for their intellectual contributions at various stages of this project. We also thank K. Datta for providing critical insights into this study and K. Chakraborty for sharing yeast strains. The Ahuja lab is supported by the Ramalingaswami Re-entry Fellowship (BT/HRD/35/02/2006), a re-entry scheme of the Department of Biotechnology, Ministry of Science & Technology, Government of India, Start-Up Research Grant (SRG/2020/000232) from the Science and Engineering Research Board and an intramural Start-up grant from Indraprastha Institute of Information Technology-Delhi. The Sengupta lab is funded by the INSPIRE faculty grant from the Department of Science & Technology, India.

Author information

Contributions

The study was conceived by G.A. Computational analysis workflows were designed by G.A., D.S. and A.Mi. Yeast experimental workflows were designed by G.A. and A.Mi., whereas human experimental workflows were designed by S.N. Yeast-based assays were performed by A.Mi., S.A. and N.K.D. Human cell culture-based experiments were performed by S.S. Data compilation for model building was performed by A.Mi., P.G., A.A. and P.R., and the analysis workflow was made by S.M., V.G., S.A., A.Mi., R.S., R.G. and P.G. V.P.S., A.Me. and J.T. assisted in data interpretation. The Metabokiller Python package was created by S.K.M. Illustrations were drafted by A.M. and G.A. G.A. and A.Mi. wrote the paper. All authors have read and approved the manuscript.

Corresponding authors

Correspondence to Debarka Sengupta or Gaurav Ahuja.

Ethics declarations

Competing interests

A provisional patent has been filed (reference no. 202111052929, application no. TEMP/E-1/60118/2021-DEL) describing the computational architecture of Metabokiller. Use of the Metabokiller Python package is free for academic institutions and for any academic-related project; for commercial use, users must contact the authors.

Peer review

Peer review information

Nature Chemical Biology thanks Michael Fasullo, Hongsheng Liu and Stefano Monti for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Workflow detailing Metabokiller functionalities.

Schematic representation depicting the step-by-step workflow used to build all six individual biochemical models and the ensemble model (Metabokiller). An up/downsampling approach was used to counteract the class imbalance. The Signaturizer library was used to generate bioactivity features. Hyperparameter tuning was performed to obtain the best-performing model parameters. The ensemble model (Metabokiller) was built using biochemical features of experimentally validated carcinogens/non-carcinogens generated using the six models. The majority voting method was used to assign the final carcinogenicity status.
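
The majority-voting step named in this caption can be illustrated with a minimal sketch. The six model names follow the properties listed in the abstract; the 0.5 probability threshold, the handling of a 3-3 tie and the function name are assumptions made for illustration and are not taken from the Metabokiller package.

```python
# Illustrative sketch only: the threshold, tie rule and names are assumptions,
# not the Metabokiller implementation.
MODELS = (
    "electrophilicity",
    "proliferation",
    "oxidative_stress",
    "genomic_instability",
    "epigenetic_alterations",
    "anti_apoptotic_response",
)


def majority_vote(probabilities, threshold=0.5):
    """Return (label, positive_votes) from the six per-model probabilities."""
    missing = [name for name in MODELS if name not in probabilities]
    if missing:
        raise ValueError(f"missing model outputs: {missing}")
    # Each model casts one vote if its probability reaches the threshold.
    votes = sum(probabilities[name] >= threshold for name in MODELS)
    # Strict majority: a 3-3 tie is treated as non-carcinogen (assumption).
    label = "carcinogen" if votes > len(MODELS) / 2 else "non-carcinogen"
    return label, votes


# Hypothetical probabilities for a single compound (not data from the paper).
example = {
    "electrophilicity": 0.9,
    "proliferation": 0.8,
    "oxidative_stress": 0.7,
    "genomic_instability": 0.6,
    "epigenetic_alterations": 0.2,
    "anti_apoptotic_response": 0.1,
}
label, votes = majority_vote(example)
print(label, votes)
```

Returning the vote count alongside the label keeps the contribution of the individual models visible, which is in the spirit of the interpretability described in the abstract.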

Extended Data Fig. 2 Metabokiller harbors high prediction performance.

(a) Box plot depicting the AUCROC values of the bootstrapping (n = 20 repetitions) of the indicated models. (b-f) Box plots depicting the AUCROC, accuracy, F1 Score, precision, and recall of the indicated models as inferred from the 10-fold cross-validation. (g) Box plot depicting the model performance of the twenty Gradient Boosting Machine (GBM)-based models generated using bootstrapping technique (n = 20 repetitions). (h) Variables factor map (PCA) depicting the direction and contribution of all the six variables (individual models) representing the experimentally validated carcinogens (MKETn) in the Eigenspace. (i) Principal Component Analysis revealing the chemical heterogeneity between the carcinogens and non-carcinogens in the indicated datasets. The heatmap at the bottom depicts the relative enrichment of the indicated functional groups (RNH2: primary amine, R2NH: secondary amine, R3N: tertiary amine, ROPO3: monophosphate, ROH: alcohol, RCHO: aldehyde, RCOR: ketone, RCOOH: carboxylic acid, RCOOR: ester, ROR: ether, RCCH: terminal alkyne, RCN: nitrile) in both classes. (j) Bar graphs depicting the accuracy of Metabokiller on the indicated unseen datasets. In the box plots, center lines represent the medians; box limits indicate the 25th and 75th percentiles as determined by R software (ggplot2); whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots.
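
For readers less familiar with the metrics in panels (b-f), the sketch below shows how accuracy, precision, recall and F1 score follow from confusion-matrix counts. The counts are hypothetical and the function name is an assumption; AUCROC is omitted because it requires ranked prediction scores rather than counts.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for one cross-validation fold (not data from the paper).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```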

Extended Data Fig. 3 Metabokiller unravels potential oncometabolites.

(a) Heatmap depicting the number of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) predictions on the Independent Dataset (I.D.) for indicated methods/tools. (b) Venn diagram depicting predicted carcinogenic human metabolites, further segregated based on prediction probability cutoffs. (c) Variables factor map depicting the contribution of all the six individual models in predicting carcinogenic metabolites from HMDB (probability cutoff ≥ 0.5). (d) Projection of the predicted carcinogens (indicated as red dots; probability cutoff ≥ 0.7) on the human metabolic space, achieved using iPath Web Server. (e) Schematic representation of the steps involved in processing pan-cancer metabolomics dataset. Of note, Pearson correlation was computed between log2 fold change (tumor vs healthy) and biochemical/carcinogenicity probabilities. (f) Heatmap detailing the correlation values further segregated based on cancer type. (g) Volcano plots depicting the differentially enriched/de-enriched metabolites in the indicated cancer datasets. Gray dots highlight the metabolites that do not qualify for the enrichment cutoff (log2 fold change ≥ 1 or ≤ -1, and p-value (adjusted) < 0.05), and green and red dots represent the metabolites that qualify for the enrichment cutoff and are predicted as non-carcinogenic and carcinogenic by Metabokiller respectively. The p-value was computed using two-sided Mann–Whitney U test and corrected using Benjamini-Hochberg method. (h) Structural information of some of the well-characterized oncometabolites reported in the literature and predicted by Metabokiller.
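
The enrichment cutoff and colour coding described for panel (g) can be sketched as follows. The Benjamini-Hochberg adjustment is the standard step-up procedure; the function names and the example values are assumptions for illustration, and the Mann-Whitney U test that produces the raw p-values is not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def volcano_colour(log2_fc, adj_p, predicted_carcinogen,
                   fc_cutoff=1.0, alpha=0.05):
    """Assign the dot colour used in the caption for one metabolite."""
    # Gray: fails |log2 fold change| >= 1 or adjusted p-value < 0.05.
    if abs(log2_fc) < fc_cutoff or adj_p >= alpha:
        return "gray"
    return "red" if predicted_carcinogen else "green"


# Hypothetical raw p-values and fold changes (not data from the paper).
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
print(adj_p)
print(volcano_colour(1.5, adj_p[0], predicted_carcinogen=True))
```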

Extended Data Fig. 4 Experimental validations support Metabokiller predictions.

(a) Schematic representation highlighting the predicted-carcinogenic metabolic intermediates of the tyrosine metabolism pathway and aminobenzoate degradation pathway. (b) Box plots depicting the fluorescence intensity of propidium iodide staining indicating cell viability in the indicated conditions (n = 8 biological replicates) after 9 hours (left) and 12 hours (right) of treatment. Of note, heat-killed (HK) yeast cells were used as a positive control. Two-sided Mann–Whitney U test was used to compute statistical significance between the test conditions and the negative control. For left panel, the p-values are 0.0009 (HK); for 4NC: 0.96 (0.1 µM), 0.87 (1 µM), and 0.02 (10 µM); for DP: 0.59 (0.1 µM), 0.64 (1 µM), and 0.83 (10 µM). For right panel, the p-values are 0.0009 (HK); for 4NC: 0.63 (0.1 µM), 0.75 (1 µM), and 0.2 (10 µM); for DP: 0.42 (0.1 µM), 0.26 (1 µM), and 0.17 (10 µM). (c) Growth curve profiles of the treated and untreated wild-type yeast during transient exposure with the indicated conditions (n = 8 biological replicates with technical duplicates). Data points represent mean ± SD. Two-sided Student’s t-test was used to compute statistical significance between the positive (H2O2 treated yeast cells) and negative control (untreated yeast cells). The p-values are 0.9 (0 hrs), 1.5 × 10^−6 (1.5 hrs), 4.85 × 10^−6 (3 hrs), 4.45 × 10^−16 (4.5 hrs), 1.62 × 10^−10 (6 hrs), 2.27 × 10^−18 (7.5 hrs), 6.41 × 10^−13 (9 hrs), 1.04 × 10^−23 (10.5 hrs), 5.82 × 10^−34 (12 hrs). (d) Box plot depicting the results of reactive oxygen species (ROS) levels inferred using DCFH-DA dye-based assay in the indicated conditions (n = 8 biological replicates). Of note, ROS levels were measured 12 hours post-incubation. Notably, hydrogen peroxide (H2O2) treated yeast cells were used as a positive control. Two-sided Mann–Whitney U test was used to compute statistical significance between the test conditions and the negative control. The p-values are 0.003 (H2O2); for 4NC: 0.069 (0.1 µM), 0.1 (1 µM), and 0.001 (10 µM); for DP: 0.016 (0.1 µM), 0.087 (1 µM), and 0.07 (10 µM). The p-value cutoff for all the plots is 0.05. *, **, ***, and **** refer to p-values <0.05, <0.01, <0.001, and <0.0001, respectively. In the box plots, center lines show the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots.
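
The asterisk notation defined in this caption maps directly onto a small lookup, sketched below. The thresholds come from the caption; the function name and the "ns" label for values at or above the 0.05 cutoff are assumptions.

```python
def significance_stars(p_value):
    """Map a p-value to the asterisk notation defined in the caption."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie between 0 and 1")
    # Check the strictest cutoff first so the most stars win.
    for cutoff, stars in ((0.0001, "****"), (0.001, "***"),
                          (0.01, "**"), (0.05, "*")):
        if p_value < cutoff:
            return stars
    return "ns"  # label for non-significant values is an assumption


# p-values quoted in the caption for panels (b) and (d).
for p in (0.0009, 0.02, 0.96, 0.001):
    print(p, significance_stars(p))
```

Because every comparison is strict, a value of exactly 0.001 receives two asterisks rather than three.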

Extended Data Fig. 5 RNA-Seq reveals mode-of-action of 4NC and DP.

(a) Bar plots depicting the total read counts (in millions) of the indicated RNA sequencing samples. (b) Box plot representing the distribution of the transformed read count data in the indicated conditions (n = 3 biological replicates). (c) Correlation plot showing the relationship between the individual RNA sequencing samples. Of note, 75% of the normalized and transformed data was used for the correlation analysis. (d-e) Box plots depicting the relative log expression of the 3 biological replicates of the indicated conditions before and after upper quantile normalization. (f) Volcano plot indicating the differentially expressed genes between the treated (metabolite treatment) and untreated conditions. The p-value was computed using the Wald test and corrected using the Benjamini-Hochberg method. (g) Metascape-based Functional Gene Ontology analysis identified the involvement of differentially expressed genes in the indicated prominent biological processes. (h) Schematic representation depicting the genomic alterations in the CAN1 gene in the indicated replicates. In the box plots, center lines represent the medians; box limits indicate the 25th and 75th percentiles; whiskers extend 1.5 times the interquartile range from the 25th and 75th percentiles; outliers are represented by dots.
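
The box-plot convention stated in the Extended Data captions (median, 25th and 75th percentiles, whiskers at 1.5 times the interquartile range, outliers as dots) can be made concrete with a short sketch. It assumes linearly interpolated quartiles (Python's "inclusive" method) and whiskers drawn to the most extreme data point inside the 1.5 × IQR fences; the function name and the example values are hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Return the summary statistics drawn in a Tukey-style box plot."""
    data = sorted(values)
    # Quartiles with linear interpolation between data points (assumption).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        # Whiskers end at the most extreme observations inside the fences.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        # Points beyond the fences are the dots in the plots.
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical measurements (not data from the paper).
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```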

Supplementary information

Supplementary Information

Supplementary Tables 1–12.

Reporting Summary

Source data

Source Data Fig. 1

Statistical source data for Fig. 1.

Source Data Fig. 2

Statistical source data for Fig. 2.

Source Data Fig. 3

Statistical source data for Fig. 3.

Source Data Fig. 4

Statistical source data for Fig. 4.

Source Data Fig. 5

Statistical source data for Fig. 5.

Source Data Extended Data Fig. 2

Statistical source data for Extended Data Fig. 2.

Source Data Extended Data Fig. 3

Statistical source data for Extended Data Fig. 3.

Source Data Extended Data Fig. 4

Statistical source data for Extended Data Fig. 4.

Source Data Extended Data Fig. 5

Statistical source data for Extended Data Fig. 5.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mittal, A., Mohanty, S.K., Gautam, V. et al. Artificial intelligence uncovers carcinogenic human metabolites. Nat Chem Biol 18, 1204–1213 (2022). https://doi.org/10.1038/s41589-022-01110-7

  • DOI: https://doi.org/10.1038/s41589-022-01110-7
