Abstract
Recent advances in cancer research and diagnostics largely rely on new developments in microscopic or molecular profiling techniques, offering high levels of detail with respect to either spatial or molecular features, but usually not both. Here, we present an explainable machine-learning approach for the integrated profiling of morphological, molecular and clinical features from breast cancer histology. First, our approach allows for the robust detection of cancer cells and tumour-infiltrating lymphocytes in histological images, providing precise heatmap visualizations explaining the classifier decisions. Second, molecular features, including DNA methylation, gene expression, copy number variations, somatic mutations and proteins, are predicted from histology. Molecular predictions reach balanced accuracies up to 78%, whereas accuracies of over 95% can be achieved for subgroups of patients. Finally, our explainable AI approach allows assessment of the link between morphological and molecular cancer properties. The resulting computational multiplex-histology analysis can help promote basic cancer research and precision medicine through an integrated diagnostic scoring of histological, clinical and molecular features.
Data availability
The data used for the main analyses presented here are available for non-commercial use at https://doi.org/10.6084/m9.figshare.13078835 (Binder & Bockmayr; ref. 40).
Code availability
The code used for the machine-learning analyses has been deposited at https://doi.org/10.6084/m9.figshare.13078835 (Binder & Bockmayr; ref. 40).
References
Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci. Transl. Med. 3, 108ra113 (2011).
Yuan, Y. Spatial heterogeneity in the tumor microenvironment. Cold Spring Harb. Perspect. Med. 6, a026583 (2016).
Gerner, M. Y., Kastenmuller, W., Ifrim, I., Kabat, J. & Germain, R. N. Histo-cytometry: a method for highly multiplex quantitative tissue imaging analysis applied to dendritic cell subset microanatomy in lymph nodes. Immunity 37, 364–376 (2012).
Rimm, D. L. Next-gen immunohistochemistry. Nat. Methods 11, 381–383 (2014).
Samek, W. & Müller, K.-R. in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (eds. Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R.) 5–22 (Springer, 2019); https://doi.org/10.1007/978-3-030-28954-6_1
Bach, S. et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10, e0130140 (2015).
Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).
Zhang, Z. et al. Pathologist-level interpretable whole-slide cancer diagnosis with deep learning. Nat. Mach. Intell. 1, 236–245 (2019).
Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
André, B., Vercauteren, T., Buchner, A. M., Wallace, M. B. & Ayache, N. Endomicroscopic video retrieval using mosaicing and visualwords. In 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro 1419–1422 (2010); https://doi.org/10.1109/ISBI.2010.5490265
Caicedo, J. C., Cruz, A. & Gonzalez, F. A. Histopathology image classification using bag of features and kernel functions. In Conference on Artificial Intelligence in Medicine in Europe 126–135 (Springer, 2009).
Yu, K.-H. et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 7, 12474 (2016).
Klauschen, F. et al. Scoring of tumor-infiltrating lymphocytes: From visual estimation to machine learning. Semin. Cancer Biol. 52, 151–157 (2018).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).
Sabbaghi, M. et al. Defective cyclin B1 induction in trastuzumab-emtansine (T-DM1) acquired resistance in HER2-positive breast cancer. Clin. Cancer Res. 23, 7006–7019 (2017).
Harrell, J. C., Shroka, T. M. & Jacobsen, B. M. Estrogen induces c-Kit and an aggressive phenotype in a model of invasive lobular breast cancer. Oncogenesis 6, 396 (2017).
Kuonen, F. et al. Inhibition of the Kit ligand/c-Kit axis attenuates metastasis in a mouse model mimicking local breast cancer relapse after radiotherapy. Clin. Cancer Res. 18, 4365–4374 (2012).
Jiang, Y., Zou, L., Lu, W.-Q., Zhang, Y. & Shen, A.-G. Foxo3a expression is a prognostic marker in breast cancer. PLoS ONE 8, e70746 (2013).
Loi, S. et al. Tumor-infiltrating lymphocytes and prognosis: a pooled individual patient analysis of early-stage triple-negative breast cancers. J. Clin. Oncol. 37, 559–569 (2019).
Amgad, M. et al. Report on computational assessment of tumor infiltrating lymphocytes from the International Immuno-Oncology Biomarker Working Group. NPJ Breast Cancer 6, 16 (2020).
Gonzalez-Ericsson, P. I. et al. The path to a better biomarker: application of a risk management framework for the implementation of PD-L1 and TILs as immuno-oncology biomarkers in breast cancer clinical trials and daily practice. J. Pathol. 250, 667–684 (2020).
Gatrell, A. C., Bailey, T. C., Diggle, P. J. & Rowlingson, B. S. Spatial point pattern analysis and its application in geographical epidemiology. Trans. Inst. Br. Geogr. 21, 256–274 (1996).
Budczies, J. et al. Classical pathology and mutational load of breast cancer–integration of two worlds. J. Pathol. Clin. Res. 1, 225–238 (2015).
Denkert, C. et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. Lancet Oncol. 19, 40–50 (2018).
Gurcan, M. N. et al. Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 2, 147–171 (2009).
Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Springer, 2019).
Lapuschkin, S. et al. Unmasking clever Hans predictors and assessing what machines really learn. Nat. Commun. 10, 1096 (2019).
Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J. & Müller, K.-R. Toward interpretable machine learning: transparent deep neural networks and beyond. Preprint at https://arxiv.org/abs/2003.07631 (2020).
Montavon, G., Samek, W. & Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digital Signal Process. 73, 1–15 (2018).
Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K. & Schölkopf, B. An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. 12, 181–201 (2001).
Csurka, G., Dance, C., Fan, L., Willamowski, J. & Bray, C. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision (2004).
Sonnenburg, S. et al. The SHOGUN machine learning toolbox. J. Mach. Learn. Res. 11, 1799–1802 (2010).
Lapuschkin, S., Binder, A., Montavon, G., Müller, K.-R. & Samek, W. Analyzing classifiers: Fisher vectors and deep neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2912–2920 (2016).
Binder, A., Samek, W., Müller, K.-R. & Kawanabe, M. Enhanced representation and multi-task learning for image annotation. Comput. Vision Image Understanding 117, 466–478 (2013).
Bishop, C. M. Neural Networks for Pattern Recognition (Oxford Univ. Press, 1995).
Zien, A. & Ong, C. S. Multiclass multiple kernel learning. In Proc. 24th International Conference on Machine Learning 1191–1198 (2007).
Raschka, S. Model evaluation, model selection, and algorithm selection in machine learning. Preprint at https://arxiv.org/abs/1811.12808 (2018).
Hoeffding, W. Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963).
Binder, A. & Bockmayr, M. Morphological and molecular breast cancer profiling through explainable machine learning. figshare https://doi.org/10.6084/m9.figshare.13078835 (2021).
Acknowledgements
This work was funded by the Charité Institute of Pathology, Berlin, the Technical University of Berlin, the Human Frontier Science Program (HFSP) Young Investigator Grant (M.I. and F.K.) and the Einstein Foundation Berlin (F.K.) and partly by the German Research Foundation to A.H. (DFG SFB-TR84, B6, Z1a) and the German Consortium for Translational Cancer Research (DKTK). F.K. was also supported by the German Ministry for Education and Research (BMBF) within the Berlin Institute for the Foundations of Learning and Data (BIFOLD; grant nos. 01IS18025D and 01IS18037E), the clinical mass spectrometry centre MSTARS (grant no. 031L0220A) and CompLS Patho234 (grant no. 031L0207B) and the European Research Council under Horizon 2020 of the EU Framework Programme for Research and Innovation (647257). A.B. acknowledges support by the Ministry of Education AcRF Tier 2 grant MOE2016-T2-2-154, and expresses gratitude to SUTD for the SGPAIRS1811 grant. M.B. was supported in part by the University Medical Center Hamburg-Eppendorf. K.-R.M. was supported in part by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korean government (no. 2017-0-00451, Development of BCI based Brain and Cognitive Computing Technology for Recognizing User’s Intentions using Deep Learning and no. 2019-0-00079, Artificial Intelligence Graduate School Program, Korea University), by the German Ministry for Education and Research (BMBF) under grants 01IS14013A-E, 01GQ1115, 01GQ0850, 01IS18025A, 031L0207D and 01IS18037A, and by the German Research Foundation (DFG) under grant Math+, EXC 2046/1, project ID 390685689.
Author information
Contributions
A.B., M.B., M.H., K.-R.M. and F.K. conceptualized the project. A.B., M.B., K.H., M.H., S.W., D.H., K.-R.M. and F.K. were responsible for the methodology. A.B., M.B., M.H., S.W., D.H., K.-R.M. and F.K. developed the software. A.B., M.B., M.H., K.-R.M. and F.K. were responsible for validation. A.B., M.B., M.H. and F.K. conducted the formal analysis. A.B., M.B., M.H., C.D., K.-R.M. and F.K. performed the investigation. A.B., M.I., C.D., K.-R.M. and F.K. were responsible for resources. Data curation was completed by A.B., M.B., D.H. and F.K.; A.B., M.B., K.-R.M. and F.K. wrote the first draft, which was reviewed and edited by A.B., M.B., M.H., S.W., D.H., K.H., D.T., M.I., A.S., A.H., C.D., K.-R.M. and F.K. Visualization was performed by A.B., M.B., M.H. and F.K.; K.-R.M. and F.K. supervised the project. Funding was secured by A.B., M.I., K.-R.M. and F.K.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Machine Intelligence thanks Carsten Marr and the other, anonymous, reviewers for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Methods, Figs. 1–17, Tables 1–13.
Supplementary Data 1
Balanced accuracies and significance of molecular predictions.
Supplementary Data 2
Area under the curve of molecular prediction.
Supplementary Data 3
Number of predictable cases after tail probability analysis for different tails.
About this article
Cite this article
Binder, A., Bockmayr, M., Hägele, M. et al. Morphological and molecular breast cancer profiling through explainable machine learning. Nat Mach Intell 3, 355–366 (2021). https://doi.org/10.1038/s42256-021-00303-4
DOI: https://doi.org/10.1038/s42256-021-00303-4