Deep learning supported discovery of biomarkers for clinical prognosis of liver cancer

A preprint version of the article is available at bioRxiv.

Abstract

Tissue biomarkers are crucial for cancer diagnosis, prognosis assessment and treatment planning. However, there are few known biomarkers that are robust enough to show true analytical and clinical value. Deep learning (DL)-based computational pathology can be used as a strategy to predict survival, but the limited interpretability and generalizability prevent acceptance in clinical practice. Here we present an interpretable human-centric DL-guided framework called PathFinder (Pathological-biomarker-finder) that can help pathologists to discover new tissue biomarkers from well-performing DL models. By combining sparse multi-class tissue spatial distribution information of whole slide images with attribution methods, PathFinder can achieve localization, characterization and verification of potential biomarkers, while guaranteeing state-of-the-art prognostic performance. Using PathFinder, we discovered that spatial distribution of necrosis in liver cancer, a long-neglected factor, has a strong relationship with patient prognosis. We therefore proposed two clinically independent indicators, necrosis area fraction and tumour necrosis distribution, for practical prognosis, and verified their potential in clinical prognosis according to criteria derived from the Reporting Recommendations for Tumor Marker Prognostic Studies. Our work demonstrates a successful example of introducing DL into clinical practice through knowledge discovery, and the approach may be adopted to identify biomarkers in various cancer types and modalities.

Fig. 1: The workflow of PathFinder.
Fig. 2: Performance of PathFinder in the discovery of new tissue biomarkers for clinical prognosis of HCC.
Fig. 3: Discovery and characterization of new tissue biomarkers.
Fig. 4: Verification of new tissue biomarkers.
Fig. 5: Exploring the robustness of macro mode indicators.

Data availability

The TCGA diagnostic whole-slide data and corresponding clinical information are available from NIH genomic data commons (https://portal.gdc.cancer.gov/projects/TCGA-LIHC). The PAIP histology data and corresponding annotations are available from the Pathology AI Platform 2019 challenge (https://paip2019.grand-challenge.org/Dataset/). Restrictions apply to the availability of the QHCG data, including WSIs and generated PaSegNet dataset, which were used with institutional permission through institutional review board approval for the current study, and are thus not publicly available. Please email all requests for academic use of raw and processed data to the corresponding author. All requests will be evaluated on the basis of institutional and departmental policies to determine whether the data requested are subject to intellectual property or patient privacy obligations. Data can only be shared for non-commercial academic purposes and will require a formal material transfer agreement. Source data are provided with this paper.

Code availability

All code was implemented in Python using PyTorch as the primary DL package. All code and scripts to reproduce the experiments of this paper are available at https://github.com/Biooptics2021/PathFinder. The code is also available at https://zenodo.org/record/7628549 (ref. 55).

References

  1. Kather, J. N. & Calderaro, J. Development of AI-based pathology biomarkers in gastrointestinal and liver cancer. Nat. Rev. Gastroenterol. Hepatol. https://doi.org/10.1038/s41575-020-0343-3 (2020).

  2. Ludwig, J. A. & Weinstein, J. N. Biomarkers in cancer staging, prognosis and treatment selection. Nat. Rev. Cancer 5, 845–856 (2005).

  3. Bosman, F. T. & True, L. D. Prognostic biomarkers: an introduction. Virchows Arch. 464, 253–256 (2014).

  4. Mandalà, M. & Massi, D. Tissue prognostic biomarkers in primary cutaneous melanoma. Virchows Arch. 464, 265–281 (2014).

  5. Hamilton, P. W. et al. Digital pathology and image analysis in tissue biomarker research. Methods 70, 59–73 (2014).

  6. Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V. & Madabhushi, A. Artificial intelligence in digital pathology—new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715 (2019).

  7. Kleppe, A. et al. Designing deep learning studies in cancer diagnostics. Nat. Rev. Cancer 21, 199–211 (2021).

  8. Courtiol, P. et al. Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nat. Med. 25, 1519–1525 (2019).

  9. Shi, J.-Y. et al. Exploring prognostic indicators in the pathological images of hepatocellular carcinoma based on deep learning. Gut https://doi.org/10.1136/gutjnl-2020-320930 (2020).

  10. Saillard, C. et al. Predicting survival after hepatocellular carcinoma resection using deep learning on histological slides. Hepatology 72, 2000–2013 (2020).

  11. Chen, R. J. et al. Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans. Med. Imaging 0062, 1–1 (2020).

  12. Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).

  13. Watson, D. S. et al. Clinical applications of machine learning algorithms: beyond the black box. Br. Med. J. 364, 10–13 (2019).

  14. Boehm, K. M., Khosravi, P., Vanguri, R., Gao, J. & Shah, S. P. Harnessing multimodal data integration to advance precision oncology. Nat. Rev. Cancer 22, 114–126 (2022).

  15. Vasey, B. et al. DECIDE-AI: new reporting guidelines to bridge the development-to-implementation gap in clinical artificial intelligence. Nat. Med. 27, 186–187 (2021).

  16. Kundu, S. AI in medicine must be explainable. Nat. Med. 27, 1328 (2021).

  17. Collins, G. S. & Moons, K. G. M. Reporting of artificial intelligence prediction models. Lancet 393, 1577–1579 (2019).

  18. Elmarakeby, H. A. et al. Biologically informed deep neural network for prostate cancer discovery. Nature 598, 348–352 (2021).

  19. Barredo Arrieta, A. et al. Explainable Artificial Intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 58, 82–115 (2020).

  20. Gunning, D. et al. XAI—Explainable artificial intelligence. Sci. Robot. 4, eaay7120 (2019).

  21. Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).

  22. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).

  23. Bulten, W. et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 21, 233–241 (2020).

  24. Skrede, O. J. et al. Deep learning for prediction of colorectal cancer outcome: a discovery and validation study. Lancet 395, 350–360 (2020).

  25. Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated residual transformations for deep neural networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 5987–5995 (2017). https://doi.org/10.1109/CVPR.2017.634

  26. Simonyan, K., Vedaldi, A. & Zisserman, A. Deep inside convolutional networks: visualising image classification models and saliency maps. 2nd Int. Conf. Learn. Represent. ICLR 2014 - Work. Track Proc. 1–8 (2014).

  27. McShane, L. M. et al. REporting recommendations for tumour MARKer prognostic studies (REMARK). Br. J. Cancer 93, 387–391 (2005).

  28. Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).

  29. Shaban, M. et al. A novel digital score for abundance of tumour infiltrating lymphocytes predicts disease free survival in oral squamous cell carcinoma. Sci. Rep. 9, 1–13 (2019).

  30. Tellez, D., Litjens, G., Van Der Laak, J. & Ciompi, F. Neural image compression for gigapixel histopathology image analysis. IEEE Trans. Pattern Anal. Mach. Intell. 43, 567–578 (2021).

  31. Davies, A. et al. Advancing mathematics by guiding human intuition with AI. Nature 600, 70–74 (2021).

  32. Bijelic, L. & Rubio, E. R. Tumor necrosis in hepatocellular carcinoma—unfairly overlooked? Ann. Surg. Oncol. 28, 600–601 (2021).

  33. Wei, T. et al. Tumor necrosis impacts prognosis of patients undergoing curative-intent hepatocellular carcinoma. Ann. Surg. Oncol. 28, 797–805 (2021).

  34. Ling, Y. H. et al. Tumor necrosis as a poor prognostic predictor on postoperative survival of patients with solitary small hepatocellular carcinoma. BMC Cancer 20, 1–9 (2020).

  35. Vakkila, J. & Lotze, M. T. Inflammation and necrosis promote tumour growth. Nat. Rev. Immunol. 4, 641–648 (2004).

  36. Minervini, A. et al. Prognostic role of histological necrosis for nonmetastatic clear cell renal cell carcinoma: correlation with pathological features and molecular markers. J. Urol. 180, 1284–1289 (2008).

  37. Trentin, L. et al. Tumour-infiltrating lymphocytes bear the 75 kDa tumour necrosis factor receptor. Br. J. Cancer 71, 240–245 (1995).

  38. Mercan, E. et al. Assessment of machine learning of breast pathology structures for automated differentiation of breast cancer and high-risk proliferative lesions. JAMA Netw. Open 2, 1–11 (2019).

  39. Javed, S., Mahmood, A., Werghi, N., Benes, K. & Rajpoot, N. Multiplex cellular communities in multi-gigapixel colorectal cancer histology images for tissue phenotyping. IEEE Trans. Image Process. 29, 1–1 (2020).

  40. Wu, R. et al. Comprehensive analysis of spatial architecture in primary liver cancer. Sci. Adv. 7, eabg3750 (2021).

  41. Liu, Y. et al. High-spatial-resolution multi-omics sequencing via deterministic barcoding in tissue. Cell 183, 1665–1681 (2020).

  42. Xie, W. et al. Prostate cancer risk stratification via non-destructive 3D pathology with deep learning-assisted gland analysis. Cancer Res. https://doi.org/10.1158/0008-5472.can-21-2843 (2021).

  43. Dehaene, O., Camara, A., Moindrot, O., de Lavergne, A. & Courtiol, P. Self-supervision closes the gap between weak and strong supervision in histology. Preprint at arXiv https://doi.org/10.48550/arXiv.2012.03583 (2020).

  44. Whang, S. E., Roh, Y., Song, H. & Lee, J.-G. Data collection and quality challenges in deep learning: a data-centric AI perspective. VLDB J. https://doi.org/10.1007/s00778-022-00775-9 (2023).

  45. Yamashita, R. et al. Deep learning model for the prediction of microsatellite instability in colorectal cancer: a diagnostic study. Lancet Oncol. 22, 132–141 (2021).

  46. Kather, J. N. et al. Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Med. 16, 1–22 (2019).

  47. Otsu, N. A threshold selection method from gray level histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979).

  48. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (2009). https://doi.org/10.1109/CVPR.2009.5206848

  49. Vahadane, A. et al. Structure-preserving color normalization and sparse stain separation for histological images. IEEE Trans. Med. Imaging 35, 1962–1971 (2016).

  50. Verweij, P. J. M. & Van Houwelingen, H. C. Penalized likelihood in Cox regression. Stat. Med. 13, 2427–2436 (1994).

  51. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. 34th Int. Conf. Mach. Learn. ICML 2017 7, 5109–5118 (2017).

  52. Xu, K. et al. Show, attend and tell: neural image caption generation with visual attention. 32nd Int. Conf. Mach. Learn. ICML 2015 37, 2048–2057 (2015).

  53. Horn, H. S. Measurement of ‘overlap’ in comparative ecological studies. Am. Nat. 100, 419–424 (1966).

  54. Kokhlikyan, N. et al. Captum: a unified and generic model interpretability library for PyTorch. Preprint at arXiv https://doi.org/10.48550/arXiv.2009.07896 (2020).

  55. Liang, J. & Kong, L. PathFinder. Zenodo https://doi.org/10.5281/zenodo.7628549 (2023).

Acknowledgements

We thank Y. Gao, S. Yang and X. Chen for helpful comments on the manuscript. The study by L.K. and J.L. was partially supported by the STI2030-Major Projects (no. 2022ZD0212000), National Natural Science Foundation of China (NSFC) (nos. 61831014 and 32021002), Tsinghua-Foshan Innovation Special Fund (TFISF) (no. 2021THFS0207) and the Guoqiang Institute, Tsinghua University (no. 2021GQG1024). Y.X. was supported by the Beijing Tsinghua Changgung Hospital Fund (no. 12021C1009).

Author information

Contributions

L.K. and J.L. conceived the idea. L.K. supervised the project. J.L. and Y.X. performed the experiments. Y.X., Y.J. and W.M. curated the QHCG dataset. J.L., Y.X. and W.Z. analysed the results. Q.D. and H.Y. provided helpful discussions on the project design. J.L. and L.K. prepared the manuscript with inputs from all co-authors.

Corresponding authors

Correspondence to Qionghai Dai, Hongfang Yin, Ying Xiao or Lingjie Kong.

Ethics declarations

Competing interests

The authors declare that they have no competing financial interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Jiguang Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 The macro mode and micro mode in our model.

Macro mode mainly focuses on global information at low WSI resolution. In this case, the spatial distribution of different tissue types is included, while high-resolution cell morphology information is discarded. In contrast, micro mode mainly focuses on region-level information at high spatial resolution. In this case, high-resolution cell morphology information is included, while tissue spatial distribution and contextual information are ignored. Scale bar: 100 μm.

Extended Data Fig. 2 The gap between pathological annotation and deep learning, and the pipeline of meta annotation.

a, The distributions of data points and decision boundaries in latent feature space under different situations. In an ideal situation, DL can learn an ideal decision boundary from sufficient, class-balanced data. However, the actual data distribution is often less clear-cut. An observed dataset usually either has noisy labels near the decision boundary, which makes the learned decision boundary jitter around the ideal boundary, or is class-imbalanced, which makes the decision boundary deviate from the ideal boundary. A meta-annotated dataset, which collects a small number of representative data points in each class, can still bring the decision boundary close to the ideal boundary. b, Conventional pathological annotation method. It usually takes a long time to complete pixel-level annotation of complex tissues. Scale bar, 500 μm. c, WSI example. Scale bar, 2000 μm. d, The borders or interiors of tumor regions annotated by conventional methods still contain other types of tissue. Scale bar, 200 μm. e, An example of a region that is very difficult to annotate, in which multiple classes of tissue are mixed together. Scale bar, 100 μm. f, The tumor morphologies at different spatial locations of the WSI are similar. Scale bar, 100 μm. g, The pipeline of our proposed meta annotation. Scale bar: 100 μm (above), 200 μm (below).

Extended Data Fig. 3 Summary of study design and data usage.

a, Information of datasets. b, Training and validation of PaSegNet. c, Acquiring macro mode and micro mode by WSI decoupling and sparsification. d, 10-fold cross-validations of prognosis networks on TCGA dataset. e, Generalization ability test. The prognosis networks were first trained on TCGA dataset and then tested on QHCG dataset. f, Discovery, characterization, and verification of new biomarkers. g, Exploration of macro mode robustness and multiple WSIs selection rule.

Extended Data Fig. 4 Neural network architectures and detailed processes of various modes.

a, The process of obtaining probability heatmaps, segmentation maps, and tumor patches based on PaSegNet. b, c, d, Neural network architectures and detailed processes of MacroNet (b), MicroNet (c) and M2MNet (d), respectively. e, The detailed architecture of each neural network module in the model.

Extended Data Fig. 5 Segmentation results.

a, Segmentation results of QHCG WSIs. b, Segmentation results of TCGA WSIs. c, Segmentation results of PAIP WSIs. d, Segmentation results of small key lesion regions. Left, necrosis regions and corresponding probability heatmap. Scale bar, 250 μm. Right, tumor regions and corresponding probability heatmap. Scale bar, 1 mm. TUM, tumor; Nor, normal; FIB, fibrosis; INF, inflammation; NEC, necrosis; REA, bile duct reaction; STE, steatosis.

Extended Data Fig. 6 Survival and recurrence analyses on TCGA and QHCG dataset, and the correlation maps of clinical parameters.

a, b, Kaplan-Meier analyses of patient stratification of low and high death risk patients via M2MNet on TCGA dataset (a) and QHCG dataset (b). c-f, Kaplan-Meier analyses of patient stratification of low and high recurrence risk patients via M2MNet (c), MacroNet (d), TND (e), and NEC (f) on QHCG dataset. g-i, Multivariable analyses of factors associated with recurrence and MacroNet (g), TND (h), and NEC (i) on QHCG dataset (n = 83 patients); the data are presented as hazard ratio estimates (squares) and the error bars show the 95%-confidence interval of the hazard ratio estimate, according to multivariable Cox proportional hazards model. The results of univariate, multivariate analyses, and the abbreviations of each variable are detailed in Supplementary Table 3. j, k, Correlation maps of clinical parameters on TCGA dataset (j) and QHCG dataset (k). P values according to two-sided log-rank test (a-f) and multivariable Cox proportional hazards model (g-i). n, sample size; HR, hazard ratio; Stage, AJCC staging; TIL, tumor infiltrating lymphocytes digital score; BDT, bile duct thrombosis; AFP, alpha-fetoprotein; MVI, microvascular invasion.
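The Kaplan-Meier curves referred to in this legend can be illustrated with a minimal sketch of the product-limit estimator. This is an illustration under stated assumptions and not the authors' analysis code: the function name `kaplan_meier` and the toy follow-up data are hypothetical, and only the Python standard library is used.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times:  follow-up duration for each patient.
    events: 1 if the event (death or recurrence) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events observed exactly at time t
        leaving = 0   # patients leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Product-limit update: multiply by the conditional survival at t.
            survival *= 1.0 - observed / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Hypothetical toy cohort: four patients, one censored at t = 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

In a stratified analysis, the estimator would be run separately on the low-risk and high-risk groups and the two curves compared, for example with a log-rank test as named in the legend.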

Extended Data Fig. 7 Quantification analysis of macro mode, and the indicator distributions among all WSIs.

a, Quantification of tissue fraction on TCGA dataset (n = 330 patients). b, Quantification of TIL on TCGA dataset (n = 330 patients). c, Quantification of tissue fraction on QHCG dataset (n = 83 patients). d, Quantification of TIL on QHCG dataset (n = 83 patients). e, Distribution of NEC score from different WSIs of the same patient. f, Distribution of TND score from different WSIs of the same patient. a-d, The median risk score value is taken as the cutoff value of high risk group and low risk group; the significance level shown is determined using a two-sided Mann-Whitney-Wilcoxon test; boxplot whiskers extend to the smallest and largest value within 1.5 times the interquartile ranges of hinges, and box centre and hinges indicate median and first and third quartiles, respectively. TIL, tumor infiltrating lymphocytes digital score; TUM, tumor; Nor, normal; FIB, fibrosis; INF, inflammation; NEC, necrosis; REA, bile duct reaction; STE, steatosis.
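The two quantities named in this legend, tissue fraction and a median cutoff on the risk score, can be sketched as follows. This is a minimal illustration, not the published pipeline: the helper names, the toy label map and the toy risk scores are hypothetical. It also assumes that the fraction is taken over all labelled tiles and that a score equal to the median falls in the low-risk group, neither of which is confirmed by the legend.

```python
from collections import Counter
from statistics import median

# Class abbreviations as listed in the figure legend.
CLASSES = ["TUM", "Nor", "FIB", "INF", "NEC", "REA", "STE"]


def tissue_fractions(label_map):
    """Fraction of labelled tiles per tissue class.

    label_map is a 2D list of class names; None marks background and is ignored.
    Assumption: the denominator is the number of labelled tiles.
    """
    counts = Counter(lab for row in label_map for lab in row if lab is not None)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in CLASSES}
    return {c: counts.get(c, 0) / total for c in CLASSES}


def median_split(risk_scores):
    """Assign 'high' or 'low' risk using the median risk score as the cutoff.

    Assumption: scores equal to the median are assigned to the low-risk group.
    """
    cutoff = median(risk_scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in risk_scores.items()}


# Hypothetical toy segmentation map and risk scores.
toy_map = [
    ["TUM", "TUM", "NEC", None],
    ["TUM", "NEC", "FIB", None],
]
fractions = tissue_fractions(toy_map)
groups = median_split({"p1": 0.2, "p2": 0.9, "p3": 0.5, "p4": 0.7})
```

The per-class fractions of the two risk groups could then be compared with a two-sided Mann-Whitney-Wilcoxon test, as the legend describes.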

Extended Data Fig. 8 The localization results of corresponding pathological features of TND and NEC.

a, TND heatmaps and pathological features of its localization. b, NEC heatmaps and pathological features of its localization. The zoom-in views of pathological slides are from the heatmaps labelled in black and red boxes. Scale bar: 500 μm.

Extended Data Fig. 9 Robustness of TND under different segmentation accuracies.

a, TND scores calculated for each patient based on segmentation results generated by 11 CNNs. The TND scores corresponding to ResNeXt50 (the CNN used in this study) are marked with an opaque blue asterisk. Patients are ranked based on TND scores corresponding to ResNeXt50. b, Classification performance, segmentation results, TND heatmaps, and prognostic performance of different CNNs. Histograms include recall, precision, and F1-score for each CNN’s ‘tumor’ category tested on QHCG test set, as well as TND prognostic performance (C-Index) based on segmentation maps generated by each CNN. c, Prognostic performance distributions of different CNNs (n = 11 networks). Boxplot whiskers extend to the smallest and largest value within 1.5 times the interquartile ranges of hinges, and box centre and hinges indicate median and first and third quartiles, respectively.
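Prognostic performance in this legend is reported as a C-Index. The sketch below shows one common definition, Harrell's concordance index, in plain Python. Whether the study used exactly this variant is an assumption, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event strictly
    before patient j's follow-up time. The pair is concordant when the patient
    with the earlier event has the higher risk score; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: risk decreases as survival time increases.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
c_perfect = concordance_index(toy_times, toy_events, [0.9, 0.7, 0.4, 0.1])
c_reversed = concordance_index(toy_times, toy_events, [0.1, 0.4, 0.7, 0.9])
```

A value of 1.0 means the risk ranking agrees with every comparable pair, 0.5 corresponds to random ranking, and 0.0 means the ranking is fully reversed.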

Extended Data Fig. 10 Robustness of NEC under different segmentation accuracies.

a, NEC scores calculated for each patient based on segmentation results generated by 11 CNNs. The NEC scores corresponding to ResNeXt50 (the CNN used in this study) are marked with an opaque blue asterisk. Patients are ranked based on NEC scores corresponding to ResNeXt50. b, Classification performance, segmentation results, NEC heatmaps, and prognostic performance of different CNNs. Histograms include recall, precision, and F1-score for each CNN’s ‘necrosis’ category tested on QHCG test set, as well as NEC prognostic performance (C-Index) based on segmentation maps generated by each CNN. c, Prognostic performance distributions of different CNNs (n = 11 networks). Boxplot whiskers extend to the smallest and largest value within 1.5 times the interquartile ranges of hinges, and box centre and hinges indicate median and first and third quartiles, respectively.
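The recall, precision and F1-score reported per tissue category can be computed one-versus-rest from true and predicted labels. The following is a minimal sketch with hypothetical names and toy labels. It assumes tile-level labels and returns 0.0 when a denominator is zero, which is a convention chosen here rather than one stated in the legend.

```python
def class_metrics(y_true, y_pred, positive):
    """One-versus-rest precision, recall and F1-score for one tissue class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels for five tiles.
toy_true = ["NEC", "NEC", "TUM", "TUM", "NEC"]
toy_pred = ["NEC", "TUM", "NEC", "TUM", "NEC"]
nec_precision, nec_recall, nec_f1 = class_metrics(toy_true, toy_pred, "NEC")
```

Calling the same function with `positive="TUM"` would give the corresponding values for the 'tumor' category.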

Supplementary information

Supplementary Information

Supplementary Figs. 1–7 and Notes 1 and 2.

Reporting Summary

Supplementary Table 1

Includes Supplementary Tables 1–9. The titles and descriptions are included in the Excel workbook.

Source data

Source Data Fig. 2

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Extended Data Fig./Table 6

Statistical source data.

Source Data Extended Data Fig./Table 7

Statistical source data.

Source Data Extended Data Fig./Table 9

Statistical source data.

Source Data Extended Data Fig./Table 10

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liang, J., Zhang, W., Yang, J. et al. Deep learning supported discovery of biomarkers for clinical prognosis of liver cancer. Nat Mach Intell 5, 408–420 (2023). https://doi.org/10.1038/s42256-023-00635-3

  • DOI: https://doi.org/10.1038/s42256-023-00635-3
