Although it has long been known that the immune cell composition has strong prognostic and predictive value in colorectal cancer (CRC), scoring systems such as the Immunoscore (IS) or the quantification of intraepithelial lymphocytes are only slowly being adopted into routine clinical use and have their limitations. To address this, we established and evaluated a multistain deep learning model (MSDLM) that uses artificial intelligence (AI) to determine the AImmunoscore (AIS) in more than 1,000 patients with CRC. Our model had high prognostic capability and outperformed other clinical, molecular and immune cell-based parameters. It could also be used to predict the response to neoadjuvant therapy in patients with rectal cancer. Using an explainable AI approach, we confirmed that the MSDLM's decisions were based on established cellular patterns of anti-tumor immunity. Hence, the AIS could provide clinicians with a valuable decision-making tool based on the tumor immune microenvironment.
No publicly available datasets were used in this study. The datasets and models were generated from institutional cohorts (the Technical University Munich cohort, the Comprehensive Cancer Centre Erlangen-EMN cohort, the Mainz cohort, and the neoadjuvant cohort consisting of FFPE material in the form of TMAs) and cannot be made publicly available owing to general data protection regulations and institutional guidelines. Example data from another use case, for trying out our approach, are available at https://zenodo.org/record/6791937.
An open-source version of the code base is available at https://github.com/AGFoersch/MultiStainDeepLearning. For more information, please contact the corresponding author at firstname.lastname@example.org.
S.F. was supported by the Federal Ministry of Education and Research (16SV8167), the Stage-I-Program of the University Medical Center Mainz, the Mainz Research School of Translational Biomedicine (TransMed) and the Manfred-Stolte-Foundation. J.N.K. is supported by the German Federal Ministry of Health (DEEP LIVER, ZMVI1-2520DAT111) and the Max-Eder-Program of the German Cancer Aid (grant 70113864). Aspects of this work are part of the medical doctoral theses of S.S., F.K. and K.T.
J.N.K. reports consulting services for Owkin (France), Panakeia (UK), and DoMore Diagnostics (Norway) and has received honoraria for lectures by MSD, Eisai, and Fresenius. The other authors declare no competing interests.
Peer review information
Nature Medicine thanks Tae Hyun Hwang, Timothy Maughan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Ulrike Harjes and Saheli Sadanand, in collaboration with the Nature Medicine team.
Extended Data Fig. 1
a: Scatter plots of CD3-, CD8-, CD4-, CD20-, and CD68-positive immune cell counts derived from whole slide images and tissue microarrays. Spearman's correlation coefficients and two-sided p-values are given. Note that the scale is logarithmic for all markers except CD3. The line shows a linear regression fit with its 95% confidence interval. b: Confusion matrices of the IS 2 and IS 3 derived from both TMA and WSI. Results of Fisher's exact test and the χ² test are provided. c: Scatter plots of MLH1, MSH2, MSH6, and PMS2 expression derived from whole slide images and tissue microarrays. Spearman's correlation coefficients and two-sided p-values are given. The line shows a linear regression fit with its 95% confidence interval. n = 51 for all stains (except PMS2, where n = 45). (WSI: whole slide image. TMA: tissue microarray.)
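Spearman's coefficient, as reported for the WSI-versus-TMA comparisons above, is Pearson's correlation computed on ranks. The following is a minimal standard-library sketch, assuming tied values receive their average rank; the example counts are invented placeholders rather than study data, and the two-sided p-value is not computed.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical paired cell counts for one marker (not data from the study).
wsi_counts = [120, 45, 300, 80, 210]
tma_counts = [100, 60, 280, 50, 190]
print(round(spearman(wsi_counts, tma_counts), 3))
```

In practice, `scipy.stats.spearmanr` returns both the coefficient and a two-sided p-value.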
Extended Data Fig. 2
a: Procedure of the TMA generation for the TUM–CCC-EMN cohort. For the Erlangen cohort, three cores from the invasive margin and three cores from the tumor center were used. For the Munich cohort, one core from the invasive margin and one core from the tumor center were used. b: Procedure of the TMA generation for the Mainz cohort. Three TMA cores were taken according to the ratio of invasive margin to tumor center. c, d: Further processing included serial sectioning, immunohistochemistry, and preprocessing of the TMA cores and tiles. Scale bars in the left subpanel represent ca. 200 µm; scale bars on the right represent ca. 100 µm. e: Examples of excluded cores. Scale bars represent ca. 200 µm. (TMA: tissue microarray. NAT: normal adjacent tissue, not used in this study.) Some illustrations were generated with BioRender.com.
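Panel b states only that the three Mainz cores were taken according to the ratio of invasive margin to tumor center. One plausible way to turn such a ratio into whole cores is largest-remainder rounding; the sketch below assumes that rule purely for illustration (it is not taken from the study), and the function name and area inputs are hypothetical.

```python
import math


def allocate_cores(im_area, tc_area, n_cores=3):
    """Hypothetical helper: split n_cores between invasive margin (IM) and
    tumor center (TC) in proportion to their areas (largest-remainder rounding)."""
    total = im_area + tc_area
    if total <= 0:
        raise ValueError("areas must sum to a positive value")
    quotas = {"IM": n_cores * im_area / total, "TC": n_cores * tc_area / total}
    alloc = {region: math.floor(q) for region, q in quotas.items()}
    leftover = n_cores - sum(alloc.values())
    # Give any remaining cores to the regions with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda r: quotas[r] - alloc[r], reverse=True)
    for region in by_remainder[:leftover]:
        alloc[region] += 1
    return alloc


print(allocate_cores(im_area=0.3, tc_area=0.7))
```

Largest-remainder rounding guarantees that the allocated cores always sum to `n_cores`, which simple per-region rounding does not.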
Extended Data Fig. 3 Overview of the SSDLM (Single-stain deep learning model).
Arrow labels indicate each component’s output dimensions. Dim, mmhid, num_classes, etc. are defined model parameters. Fc(x, y) represents a fully connected layer with the input dimension x and the output dimension y.
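To make the Fc(x, y) notation concrete, here is a minimal pure-Python sketch of a fully connected layer and a two-layer head mapping a dim-sized feature vector to num_classes probabilities. The layer order (Fc(dim, mmhid), ReLU, Fc(mmhid, num_classes), softmax), the random weight initialization, and the parameter values are assumptions for illustration, not the SSDLM's actual configuration.

```python
import math
import random


class Fc:
    """Fully connected layer Fc(x, y): maps a length-x vector to a length-y vector."""

    def __init__(self, x, y, rng):
        self.x = x
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(x)] for _ in range(y)]
        self.bias = [0.0] * y

    def __call__(self, v):
        if len(v) != self.x:
            raise ValueError(f"expected input of size {self.x}, got {len(v)}")
        return [sum(w * a for w, a in zip(row, v)) + b
                for row, b in zip(self.weights, self.bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative values only; the real dim, mmhid and num_classes are model parameters.
dim, mmhid, num_classes = 8, 4, 2
rng = random.Random(0)
fc1, fc2 = Fc(dim, mmhid, rng), Fc(mmhid, num_classes, rng)

features = [rng.random() for _ in range(dim)]  # stand-in for one stain's feature vector
probs = softmax(fc2(relu(fc1(features))))
print(len(features), len(fc1(features)), len(probs))  # input size, size after fc1, final size
```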
Extended Data Fig. 4 Overview of the MSDLM (Multistain deep learning model).
Arrow labels indicate each component’s output dimensions. Dim, mmhid, num_classes, etc. are defined model parameters. Fc(x, y) represents a fully connected layer with the input dimension x and the output dimension y. Subblocks 1, …, n − 1 are structured identically to subblock 0.
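Because subblocks 1 to n − 1 repeat the structure of subblock 0, the output dimensions on the arrows can be checked mechanically. The sketch below propagates vector sizes through lists of Fc(x, y) layers. It assumes, hypothetically, that each subblock is a single Fc(dim, mmhid) applied to one stain and that the subblock outputs are concatenated before an Fc(n * mmhid, mmhid) and Fc(mmhid, num_classes) head; the actual MSDLM layout may differ.

```python
def trace(layers, in_dim):
    """Propagate a vector size through Fc(x, y) layers given as (x, y) pairs."""
    sizes = [in_dim]
    for x, y in layers:
        if sizes[-1] != x:
            raise ValueError(f"Fc({x}, {y}) cannot take an input of size {sizes[-1]}")
        sizes.append(y)
    return sizes


def msdlm_dims(n, dim, mmhid, num_classes):
    """Hypothetical MSDLM layout: n identical subblocks, concatenated, then a fusion head."""
    subblock = [(dim, mmhid)]  # subblock 0; subblocks 1..n-1 reuse this structure
    branch_out = [trace(subblock, dim)[-1] for _ in range(n)]
    concat = sum(branch_out)   # size after concatenating the n branch outputs
    head = [(n * mmhid, mmhid), (mmhid, num_classes)]
    return branch_out, trace(head, concat)


# Illustrative parameter values, not the published configuration.
print(msdlm_dims(n=3, dim=512, mmhid=64, num_classes=2))
```

A size mismatch between consecutive Fc layers raises immediately, which mirrors reading the arrow labels of the figure from left to right.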
Extended Data Fig. 5
a–d: Accuracy (a), AUPRC (b), AUROC (c), and F1 score (d) of the MSDLM and different classical machine learning techniques after training and validation on the Mainz cohort. n = 11 models trained during 11-fold cross-validation per group. One-way ANOVA with Dunnett's test to correct for multiple comparisons was used. Statistical significance for the comparison with the MSDLM is indicated by asterisks as described in the Methods (p > 0.05: ns, p ≤ 0.05: *, p ≤ 0.01: **, p ≤ 0.001: ***, p ≤ 0.0001: ****). The 10th, 50th (median), and 90th percentiles as well as the minimum and maximum are depicted. e, f: Precision–recall and receiver operating characteristic curves of the MSDLM and the classical machine learning techniques. The mean over the 11 cross-validation folds is shown; the shaded area indicates 1 standard deviation (std. dev.). (AUPRC: area under the precision–recall curve. AUROC: area under the receiver operating characteristic curve. MSDLM: multistain deep learning model. SVM: support vector machine classifier. RF: random forest classifier. LR: logistic regression classifier. GB: gradient boosted (decision tree) classifier.)
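For reference, AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties counted as one half), and F1 is 2TP / (2TP + FP + FN). A minimal standard-library sketch of both, plus a mean and standard deviation of one metric across folds; the labels, scores, per-fold values and the 0.5 threshold are hypothetical.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(labels, preds):
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# One hypothetical validation fold.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
preds = [int(s >= 0.5) for s in scores]  # hypothetical decision threshold
print(auroc(labels, scores), round(f1(labels, preds), 3))

# Summarizing one metric across cross-validation folds (made-up per-fold values).
fold_aurocs = [0.75, 0.80, 0.70]
print(round(mean(fold_aurocs), 3), round(stdev(fold_aurocs), 3))
```

scikit-learn's `sklearn.metrics.roc_auc_score` and `f1_score` compute the same quantities for larger data.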
Extended Data Fig. 6
a: Kaplan–Meier curves of the test cohort for each UICC stage stratified by AImmunoscore (AIS). n = 61 for UICC stage I, n = 128 for UICC stage II, n = 114 for UICC stage III, and n = 36 for UICC stage IV. Censored observations are indicated with a ‘+’. The log-rank test was used.
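The Kaplan–Meier estimator behind such curves multiplies, at each distinct event time t, the conditional survival 1 − d/n, where d is the number of events at t and n the number of patients still at risk; censored patients (‘+’) leave the risk set without causing a drop. A minimal standard-library sketch follows; the follow-up times are invented, a stratified plot would apply it separately to each AIS group within a UICC stage, and the log-rank test itself is not implemented here.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time; events[i] is 1 for an event, 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Count events and censorings tied at time t; patients censored at t
        # are still counted as at risk for events at t (usual convention).
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


# Hypothetical follow-up times (e.g. months) and event indicators for one group.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]))
```

For real analyses, the `lifelines` package provides `KaplanMeierFitter` and `lifelines.statistics.logrank_test`.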
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Foersch, S., Glasner, C., Woerl, AC. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat Med 29, 430–439 (2023). https://doi.org/10.1038/s41591-022-02134-1