Prediction of DNA methylation-based tumor types from histopathology in central nervous system tumors with deep learning

Abstract

Precision in the diagnosis of diverse central nervous system (CNS) tumor types is crucial for optimal treatment. DNA methylation profiles, which capture the methylation status of thousands of individual CpG sites, are a state-of-the-art, data-driven means to enhance diagnostic accuracy but are also time-consuming and not widely available. Here, to address these limitations, we developed Deep lEarning from histoPathoLOgy and methYlation (DEPLOY), a deep learning model that classifies CNS tumors into ten major categories from histopathology. DEPLOY integrates three distinct components: the first classifies CNS tumors directly from slide images (‘direct model’), the second initially generates predictions for DNA methylation beta values, which are subsequently used for tumor classification (‘indirect model’), and the third classifies tumor types directly from routinely available patient demographics. First, we find that DEPLOY accurately predicts beta values from histopathology images. Second, using a ten-class model trained on an internal dataset of 1,796 patients, we predict the tumor categories in three independent external test datasets including 2,156 patients, achieving an overall accuracy of 95% and balanced accuracy of 91% on samples that are predicted with high confidence. These results showcase the potential future use of DEPLOY to assist pathologists in diagnosing CNS tumors within a clinically relevant short time frame.

Fig. 1: Overview of the datasets and computational workflow.
Fig. 2: Performance of DEPLOY in predicting methylation beta values.
Fig. 3: Performance of DEPLOY in classifying brain tumor types.
Fig. 4: The diagnostic changes suggested by DEPLOY in a subcohort of diagnostically challenging cases.
Fig. 5: DEPLOY provides spatial methylation predictions, allowing assessment of tumor heterogeneity, as illustrated by analysis of a dual-genotype oligoastrocytoma.

Data availability

DBTA data, including WSIs, tumor types and demographics, are available on EBRAINS (https://ebrains.eu; https://doi.org/10.25493/WQ48-ZGX). Registration and acceptance of terms and data use agreements are required for access. CBTN WSIs and demographic data can be accessed via a request form at the Children’s Brain Tumor Network (https://cbtn.org/). Requests for data collected in-house, including the NCI and NCI–Prospective cohorts, should be directed to the corresponding authors. Each request will be assessed individually within 15 business days to ensure adherence to intellectual property rights and patient privacy standards.

Code availability

The DEPLOY code is available for academic research purposes via Zenodo at https://doi.org/10.5281/zenodo.8381774 (ref. 57).

References

  1. Louis, D. N. et al. The 2021 WHO classification of tumors of the central nervous system: a summary. Neuro. Oncol. 23, 1231–1251 (2021).

  2. Bruner, J. M., Inouye, L., Fuller, G. N. & Langford, L. A. Diagnostic discrepancies and their clinical impact in a neuropathology referral practice. Cancer 79, 796–803 (1997).

  3. Wu, Z. et al. Impact of the methylation classifier and ancillary methods on CNS tumor diagnostics. Neuro. Oncol. 24, 571–581 (2022).

  4. Capper, D. et al. DNA methylation-based classification of central nervous system tumours. Nature 555, 469–474 (2018).

  5. Pratt, D., Sahm, F. & Aldape, K. DNA methylation profiling as a model for discovery and precision diagnostics in neuro-oncology. Neuro. Oncol. 23, S16–S29 (2021).

  6. Liu, Y. et al. A deep learning system for differential diagnosis of skin diseases. Nat. Med. 26, 900–908 (2020).

  7. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).

  8. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).

  9. Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).

  10. Kalra, S. et al. Pan-cancer diagnostic consensus through searching archival histopathology images using artificial intelligence. NPJ Digit. Med. 3, 31 (2020).

  11. Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).

  12. Li, Z. et al. Vision transformer-based weakly supervised histopathological image analysis of primary brain tumors. iScience 26, 105872 (2023).

  13. Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).

  14. Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).

  15. Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1, 800–810 (2020).

  16. Qu, H. et al. Genetic mutation and biological pathway prediction based on whole slide images in breast carcinoma using deep learning. NPJ Precis. Oncol. 5, 87 (2021).

  17. Schaumberg, A. J., Rubin, M. A. & Fuchs, T. J. H&E-stained whole slide image deep learning predicts SPOP mutation state in prostate cancer. Preprint at bioRxiv https://doi.org/10.1101/064279 (2018).

  18. Tsou, P. & Wu, C.-J. Mapping driver mutations to histopathological subtypes in papillary thyroid carcinoma: applying a deep convolutional neural network. J. Clin. Med. Res. 8, 1675 (2019).

  19. Chang, P. et al. Deep-learning convolutional neural networks accurately classify genetic mutations in gliomas. Am. J. Neuroradiol. 39, 1201–1207 (2018).

  20. Kim, R. H. et al. A deep learning approach for rapid mutational screening in melanoma. Preprint at bioRxiv https://doi.org/10.1101/610311 (2020).

  21. Chen, M. et al. Classification and mutation prediction based on histopathology H&E images in liver cancer using deep learning. NPJ Precis. Oncol. 4, 1–7 (2020).

  22. Ghaffari, L. N. et al. Benchmarking weakly-supervised deep learning pipelines for whole slide classification in computational pathology. Med. Image Anal. 79, 102474 (2022).

  23. Nasrallah, M. P. et al. Machine learning for cryosection pathology predicts the 2021 WHO classification of glioma. Med 4, 526–540.e4 (2023).

  24. Wang, Y. et al. Predicting molecular phenotypes from histopathology images: a transcriptome-wide expression-morphology analysis in breast cancer. Cancer Res. 81, 5115–5126 (2021).

  25. Schmauch, B. et al. A deep learning model to predict RNA-Seq expression of tumours from whole slide images. Nat. Commun. 11, 3877 (2020).

  26. Alsaafin, A., Safarpoor, A., Sikaroudi, M., Hipp, J. D. & Tizhoosh, H. R. Learning to predict RNA sequence expressions from whole slide images with applications for search and classification. Commun. Biol. 6, 304 (2023).

  27. Hoang, D.-T. et al. Prediction of cancer treatment response from histopathology images through imputed transcriptomics. Preprint at bioRxiv https://doi.org/10.1101/2022.06.07.495219 (2023).

  28. He, B. et al. Integrating spatial gene expression and breast tumour morphology via deep learning. Nat. Biomed. Eng. 4, 827–834 (2020).

  29. Monjo, T., Koido, M., Nagasawa, S., Suzuki, Y. & Kamatani, Y. Efficient prediction of a spatial transcriptomics profile better characterizes breast cancer tissue sections without costly experimentation. Sci. Rep. 12, 4133 (2022).

  30. Levy-Jurgenson, A., Tekpli, X., Kristensen, V. N. & Yakhini, Z. Spatial transcriptomics inferred from pathology whole-slide images links tumor heterogeneity to survival in breast and lung cancer. Sci. Rep. 10, 18802 (2020).

  31. Zheng, H., Momeni, A., Cedoz, P.-L., Vogel, H. & Gevaert, O. Whole slide images reflect DNA methylation patterns of human tumors. NPJ Genom. Med. 5, 11 (2020).

  32. Roetzer-Pejrimovsky, T. et al. The Digital Brain Tumour Atlas, an open histopathology resource. Sci. Data 9, 55 (2022).

  33. Lilly, J. V. et al. The children’s brain tumor network (CBTN)—accelerating research in pediatric central nervous system tumors through collaboration and open science. Neoplasia 35, 100846 (2023).

  34. Unruh, D. et al. Methylation and transcription patterns are distinct in IDH mutant gliomas compared to other IDH mutant cancers. Sci. Rep. 9, 8946 (2019).

  35. Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).

  36. Iorio, F. et al. Pathway-based dissection of the genomic heterogeneity of cancer hallmarks’ acquisition with SLAPenrich. Sci. Rep. 8, 6713 (2018).

  37. Capper, D. et al. Practical implementation of DNA methylation and copy-number-based CNS tumor diagnostics: the Heidelberg experience. Acta Neuropathol. 136, 181–210 (2018).

  38. Nasrallah, M. L. P., Desai, A., O’Rourke, D. M., Surrey, L. F. & Stein, J. M. A dual-genotype oligoastrocytoma with histologic, molecular, radiological and time-course features. Acta Neuropathol. Commun. 8, 115 (2020).

  39. Ellison, D. W. et al. Histopathological grading of pediatric ependymoma: reproducibility and clinical relevance in European trial cohorts. J. Negat. Results Biomed. 10, 7 (2011).

  40. van den Bent, M. J. Interobserver variation of the histopathological diagnosis in clinical trials on glioma: a clinician’s perspective. Acta Neuropathol. 120, 297–304 (2010).

  41. Pickles, J. C. et al. DNA methylation-based profiling for paediatric CNS tumour diagnosis and treatment: a population-based study. Lancet Child Adolesc. Health 4, 121–130 (2020).

  42. Priesterbach-Ackley, L. P. et al. Brain tumour diagnostics using a DNA methylation-based classifier as a diagnostic support tool. Neuropathol. Appl. Neurobiol. 46, 478–492 (2020).

  43. Lujan, G. et al. Dissecting the business case for adoption and implementation of digital pathology: a white paper from the digital pathology association. J. Pathol. Inform. 12, 17 (2021).

  44. Evans, A. J., Vajpeyi, R., Henry, M. & Chetty, R. Establishment of a remote diagnostic histopathology service using whole slide imaging (digital pathology). J. Clin. Pathol. 74, 421–424 (2021).

  45. Williams, B. J., Bottoms, D., Clark, D. & Treanor, D. Future-proofing pathology part 2: building a business case for digital pathology. J. Clin. Pathol. 72, 198–205 (2019).

  46. Ho, J. et al. Can digital pathology result in cost savings? A financial projection for digital pathology implementation at a large integrated health care organization. J. Pathol. Inform. 5, 33 (2014).

  47. Hanna, M. G. et al. Implementation of digital pathology offers clinical and operational increase in efficiency and cost savings. Arch. Pathol. Lab. Med. 143, 1545–1555 (2019).

  48. Koelsche, C. et al. Sarcoma classification by DNA methylation profiling. Nat. Commun. 12, 498 (2021).

  49. Jurmeister, P. et al. DNA methylation-based classification of sinonasal tumors. Nat. Commun. 13, 7148 (2022).

  50. Hackeng, W. M. et al. Genome methylation accurately predicts neuroendocrine tumor origin: an online tool. Clin. Cancer Res. 27, 1341–1350 (2021).

  51. Leitheiser, M. et al. Machine learning models predict the primary sites of head and neck squamous cell carcinoma metastases based on DNA methylation. J. Pathol. 256, 378–387 (2022).

  52. Jurmeister, P. et al. Machine learning analysis of DNA methylation profiles distinguishes primary lung squamous cell carcinomas from head and neck metastases. Sci. Transl. Med. 11, eaaw8513 (2019).

  53. Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014).

  54. Zhou, W., Laird, P. W. & Shen, H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 45, e22 (2017).

  55. Moran, S., Arribas, C. & Esteller, M. Validation of a DNA methylation microarray for 850,000 CpG sites of the human genome enriched in enhancer sequences. Epigenomics 8, 389–399 (2016).

  56. Fang, Z., Liu, X. & Peltz, G. GSEApy: a comprehensive package for performing gene set enrichment analysis in Python. Bioinformatics 39, btac757 (2023).

  57. DEPLOY: an integrated deep learning model for predicting DNA methylation and tumor types from H&E images. Zenodo https://doi.org/10.5281/zenodo.8381774 (2024).

Acknowledgements

This work was partially supported by grant DP190103402 from the Australian Research Council (D.-T.H. and E.A.S.) and by the Intramural Research Program of the National Institutes of Health, NCI and Center for Cancer Research (E.D.S., R.T., Z.A., E.M.C., H.L., E.R. and K.A.). E.D.S. is supported by the NCI T2I Fellowship. This work utilized the supercomputing resources of the Australian National Computational Infrastructure, the Australian National University Merit Allocation Scheme and the National Institutes of Health High Performance Computing (Biowulf cluster).

Author information

Contributions

K.A. and E.R. conceived and designed the study. D.-T.H. and E.D.S. developed the DEPLOY models. D.-T.H. and E.D.S. conducted data analysis and validation. M.P.N. generated data and figures for mapping tumor diagnostic class heterogeneity. K.A. and M.P.N. provided expertise in neuropathology. R.T., Z.A. and O.S. generated and collected the NCI datasets. R.T. normalized methylation data. O.S., E.M.C. and H.L. provided assistance with computing and data visualization. E.A.S. interpreted the results and provided feedback on the study. K.A. and E.R. supervised the study. D.-T.H., E.D.S., M.P.N., E.R. and K.A. wrote the paper with assistance and feedback from all other co-authors.

Corresponding authors

Correspondence to Eytan Ruppin or Kenneth Aldape.

Ethics declarations

Competing interests

D.-T.H., E.D.S., E.R. and K.A. are listed as inventors on a provisional patent (application no. 63/626,277, United States, 2024) filed based on the methodology outlined in this study. E.R. is a co-founder of Medaware, Metabomed and Pangea Biomed (divested from the latter). E.R. serves as a non-paid scientific consultant to Pangea Biomed under a collaboration agreement between Pangea Biomed and the NCI. The other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Lorenzo Righetto and Saheli Sadanand, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Pathway enrichment analysis of predicted beta values across brain tumor types not covered by the ten predefined categories.

a, This panel broadens the scope of Fig. 2f, incorporating brain tumor types that are not included in the ten predefined types. Values inside the grid denote the mean of the scaled normalized enrichment score for the specific tumor type and pathway combination. Analyses based on actual and predicted beta values from the NCI cohort are displayed on the right and left, respectively (Methods). b, This panel follows the format of (a) but focuses on gene body methylation. It extends the analysis from Fig. 2g to include 31 additional brain tumor types and two control tissue signatures not covered by the ten predefined categories. c, The bar plots display the Pearson correlation coefficients, which quantify the agreement between DEPLOY’s predicted and actual average enrichment scores across various brain cancer types in the NCI cohort (n = 31 tumor types). The upper section of the panel presents correlations for promoter methylation, whereas the lower section addresses gene body methylation. Error bars indicate the 95% confidence intervals calculated using Fisher transformation and centers correspond to the computed correlation values. This panel focuses on brain tumor types that fall outside the scope of the ten predefined tumor types. For correlations associated with these predefined tumor types, refer to Fig. 2h.
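The Fisher-transformation interval mentioned for panel c can be illustrated with a short sketch. The Python example below is not taken from the DEPLOY code: the function names `pearson_r` and `fisher_ci`, the toy input values and the normal critical value of 1.96 are assumptions made for illustration. On the z scale, the standard error is 1/sqrt(n − 3), so with n = 31 tumor types it is 1/sqrt(28).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation r estimated from n pairs.

    r is mapped to z = atanh(r), which is roughly normal with standard
    error 1/sqrt(n - 3); the interval is built on the z scale and then
    mapped back with tanh. Requires n > 3 and |r| < 1.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical enrichment scores (illustrative values, not from the paper).
predicted = [0.10, 0.35, 0.42, 0.58, 0.77, 0.90]
actual = [0.05, 0.40, 0.38, 0.66, 0.70, 0.95]
r = pearson_r(predicted, actual)
low, high = fisher_ci(r, n=len(predicted))
print(f"r = {r:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Because the interval is symmetric on the z scale but not on the r scale, the back-transformed error bars are typically asymmetric around the plotted correlation.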

Extended Data Fig. 2 Micro-averaged precision-recall curves and the area under the precision-recall curve (AUPRC) values on each cohort.

Results are depicted for the four models: Demographic (light blue), Direct (orange), Indirect (green), and Integrated (DEPLOY, dark blue). Error bands represent the 95% confidence intervals for each model, calculated through 1,000 bootstrapping iterations and centered on each model’s computed values.
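As an illustration of how bootstrapped error bands of this kind can be obtained, the sketch below computes a micro-averaged area under the precision-recall curve (approximated here by average precision) and a percentile bootstrap interval over patients. It is a minimal example under stated assumptions, not the authors' implementation: the function names, the percentile method, the fixed seed and the toy data are all hypothetical.

```python
import random

def micro_average_precision(y_true, scores):
    """Micro-averaged average precision for multi-class scores.

    y_true: list of class indices; scores: list of per-class score lists.
    Every (sample, class) pair is pooled into one binary ranking problem.
    Tied scores keep their input order (an arbitrary choice).
    """
    pairs = []
    for label, row in zip(y_true, scores):
        for k, s in enumerate(row):
            pairs.append((s, 1 if k == label else 0))
    pairs.sort(key=lambda p: p[0], reverse=True)
    hits, ap_sum = 0, 0.0
    for rank, (_, is_pos) in enumerate(pairs, start=1):
        if is_pos:
            hits += 1
            ap_sum += hits / rank  # precision at each positive's rank
    return ap_sum / hits if hits else float("nan")

def bootstrap_ci(y_true, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(n_boot * alpha / 2)   # e.g. index 25 of 1,000
    hi_idx = n_boot - 1 - lo_idx       # e.g. index 974 of 1,000
    return stats[lo_idx], stats[hi_idx]

# Hypothetical three-class toy data (not from the paper).
labels = [0, 1, 2, 1, 0, 2]
probs = [[0.7, 0.2, 0.1], [0.3, 0.5, 0.2], [0.2, 0.2, 0.6],
         [0.5, 0.4, 0.1], [0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
point = micro_average_precision(labels, probs)
low, high = bootstrap_ci(labels, probs, micro_average_precision)
print(f"micro-AP = {point:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Resampling whole patients, rather than individual (patient, class) pairs, keeps each patient's scores together in every bootstrap replicate.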

Extended Data Fig. 3 Overall (micro-averaged) accuracy and balanced accuracy for all the external datasets together.

a,b, Results for all samples (n = 2,156 patients). c,d, Results for samples with high prediction scores (above 0.39; n = 1,407 patients). Error bars denote the 95% confidence intervals derived from bootstrapping. In every case, the top-1 and top-2 accuracies of the integrated model are significantly higher than those of each individual component (P < 0.001, one-sided bootstrap test; see Supplementary Table 3).
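To make the top-1 and top-2 accuracies and the 0.39 score cutoff concrete, the following sketch evaluates top-k accuracy with and without a confidence filter. It assumes that a sample's prediction score is its largest class probability, which this page does not confirm; the function name `top_k_accuracy` and the toy data are likewise hypothetical.

```python
def top_k_accuracy(y_true, probs, k=1, min_score=None):
    """Return (top-k accuracy, number of samples evaluated).

    y_true: list of class indices; probs: list of per-class probabilities.
    If min_score is given, only samples whose largest probability is
    strictly above min_score are evaluated (assumed definition of a
    'high prediction score').
    """
    kept = correct = 0
    for label, row in zip(y_true, probs):
        if min_score is not None and max(row) <= min_score:
            continue  # low-confidence sample: excluded from evaluation
        kept += 1
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            correct += 1
    accuracy = correct / kept if kept else float("nan")
    return accuracy, kept

# Hypothetical three-class toy data (not from the paper).
y_true = [0, 1, 2, 1]
probs = [[0.70, 0.20, 0.10],
         [0.38, 0.35, 0.27],   # wrong at top-1 and low confidence
         [0.30, 0.30, 0.40],
         [0.10, 0.80, 0.10]]

print(top_k_accuracy(y_true, probs, k=1))                  # all samples
print(top_k_accuracy(y_true, probs, k=2))                  # top-2
print(top_k_accuracy(y_true, probs, k=1, min_score=0.39))  # filtered
```

In this toy case the filter removes the one misclassified sample, which raises accuracy while reducing the number of samples evaluated; this is the same trade-off as going from 2,156 to 1,407 patients in the caption.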

Extended Data Fig. 4 Performance of the combination of the direct and demographic models, and the combination of the indirect and demographic models.

Top-1 and top-2 accuracies obtained by four models: Indirect, the combination of Direct and Demographic (Direct+Demographic), the combination of Indirect and Demographic (Indirect+Demographic), and the Integrated model. a, Results for all samples. Results for each cohort are organized separately from left to right: NCI (n = 1,796 patients), DBTA (n = 1,522 patients), CBTN (n = 348 patients), and NCI-Prospective (n = 286 patients). b, Results for samples with prediction scores above 0.39. Results for each cohort are organized separately from left to right: NCI (n = 1,193 patients), DBTA (n = 1,026 patients), CBTN (n = 212 patients), and NCI-Prospective (n = 169 patients). Error bars denote the 95% confidence intervals derived from 1,000 bootstrapping iterations and the centers correspond to the computed accuracy values for each model.

Extended Data Fig. 5 Model performance on NCI cohort for each individual cancer type.

The one-versus-rest AUPRC, Precision, Recall, and F1 score for each brain tumor type achieved by demographic (light blue), direct (orange), indirect (red), and our integrated model (dark blue). The number of patients in each cancer type is shown in parentheses. For each metric indicated on the vertical axis, the error bars represent the 95% confidence intervals, derived from 1,000 bootstrapping iterations and the centers correspond to computed values of the respective metric.
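The one-versus-rest precision, recall and F1 score reported per tumor type can be sketched as follows. This is an illustrative example rather than the authors' evaluation code: the function name is hypothetical, the labels are toy data that merely reuse three class abbreviations appearing in these captions, and AUPRC is omitted because it requires continuous scores rather than hard predictions.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Precision, recall and F1 for one class against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical labels and predictions (not from the paper).
y_true = ["GBM", "GBM", "SE", "O-IDH", "SE"]
y_pred = ["GBM", "SE", "SE", "O-IDH", "SE"]

for tumor_type in sorted(set(y_true)):
    p, r, f1 = one_vs_rest_metrics(y_true, y_pred, tumor_type)
    print(f"{tumor_type}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Returning 0.0 when a denominator is empty is a convention chosen here for simplicity; other conventions (for example, reporting the metric as undefined) are equally defensible.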

Extended Data Fig. 6 Model performance on all three external cohorts together for each individual cancer type.

The one-versus-rest AUPRC, Precision, Recall, and F1 score for each brain tumor type achieved by demographic (light blue), direct (orange), indirect (red), and our integrated model (dark blue). The number of patients in each cancer type is shown in parentheses. For each metric indicated on the vertical axis, the error bars represent the 95% confidence intervals, derived from 1,000 bootstrapping iterations and the centers correspond to computed values of the respective metric.

Extended Data Fig. 7 Model performance on DBTA cohort for each individual cancer type.

The one-versus-rest AUPRC, Precision, Recall, and F1 score for each brain tumor type achieved by demographic (light blue), direct (orange), indirect (red), and our integrated model (dark blue). The number of patients in each cancer type is shown in parentheses. For each metric indicated on the vertical axis, the error bars represent the 95% confidence intervals, derived from 1,000 bootstrapping iterations and the centers correspond to computed values of the respective metric.

Extended Data Fig. 8 Model performance on CBTN cohort for each individual cancer type.

The one-versus-rest AUPRC, Precision, Recall, and F1 score for each brain tumor type achieved by demographic (light blue), direct (orange), indirect (red), and our integrated model (dark blue). Notably, GBM, SE and O-IDH are not present in the CBTN cohort. The number of patients in each cancer type is shown in parentheses. For each metric indicated on the vertical axis, the error bars represent the 95% confidence intervals, derived from 1,000 bootstrapping iterations and the centers correspond to computed values of the respective metric.

Extended Data Fig. 9 Model performance on NCI-Prospective cohort for each individual cancer type.

The one-versus-rest AUPRC, Precision, Recall, and F1 score for each brain tumor type achieved by demographic (light blue), direct (orange), indirect (red), and our integrated model (dark blue). The number of patients in each cancer type is shown in parentheses. For each metric indicated on the vertical axis, the error bars represent the 95% confidence intervals, derived from 1,000 bootstrapping iterations and the centers correspond to computed values of the respective metric.

Extended Data Fig. 10 Balanced accuracies on each cohort for samples with high prediction scores (above 0.39).

Results for each cohort are organized separately from left to right: NCI (first column), DBTA (second column), CBTN (third column) and NCI-Prospective (fourth column). Error bars denote the 95% confidence intervals obtained via bootstrapping (1,000 iterations) and the centers correspond to the computed accuracy values for each model.
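Balanced accuracy differs from overall accuracy when class sizes are unequal. The sketch below assumes the common definition of balanced accuracy as the unweighted mean of per-class recall; the page does not spell out the definition used, and the function names and toy labels are hypothetical.

```python
from collections import defaultdict

def overall_accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over classes present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced example: 8 samples of class "A", 2 of class "B".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["B", "A"]   # one "B" sample is misclassified

print(overall_accuracy(y_true, y_pred))   # dominated by the large class
print(balanced_accuracy(y_true, y_pred))  # each class weighted equally
```

Here a single error in the small class barely moves overall accuracy but lowers balanced accuracy noticeably, which is why the two figures are reported side by side.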

About this article

Cite this article

Hoang, DT., Shulman, E.D., Turakulov, R. et al. Prediction of DNA methylation-based tumor types from histopathology in central nervous system tumors with deep learning. Nat Med (2024). https://doi.org/10.1038/s41591-024-02995-8

  • DOI: https://doi.org/10.1038/s41591-024-02995-8
