Morphological and molecular breast cancer profiling through explainable machine learning

Abstract

Recent advances in cancer research and diagnostics largely rely on new developments in microscopic or molecular profiling techniques, offering high levels of detail with respect to either spatial or molecular features, but usually not both. Here, we present an explainable machine-learning approach for the integrated profiling of morphological, molecular and clinical features from breast cancer histology. First, our approach allows for the robust detection of cancer cells and tumour-infiltrating lymphocytes in histological images, providing precise heatmap visualizations explaining the classifier decisions. Second, molecular features, including DNA methylation, gene expression, copy number variations, somatic mutations and proteins are predicted from histology. Molecular predictions reach balanced accuracies up to 78%, whereas accuracies of over 95% can be achieved for subgroups of patients. Finally, our explainable AI approach allows assessment of the link between morphological and molecular cancer properties. The resulting computational multiplex-histology analysis can help promote basic cancer research and precision medicine through an integrated diagnostic scoring of histological, clinical and molecular features.
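The abstract reports performance as balanced accuracy. As a minimal sketch of the standard definition (the mean of per-class recall), and not the authors' evaluation code, the following shows why it differs from plain accuracy when classes are imbalanced. The function name and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: 8 patients with a "low" and 2 with a "high" molecular status.
y_true = ["low"] * 8 + ["high"] * 2
y_pred = ["low"] * 8 + ["low", "high"]  # one of the two "high" cases is missed

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)
print(f"plain accuracy: {plain:.2f}, balanced accuracy: {balanced:.2f}")
```

In this toy case the plain accuracy is dominated by the majority class, whereas the balanced accuracy weights both classes equally, which is presumably why it is the more informative summary for molecular labels with uneven class sizes.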

A preprint version of the article is available at arXiv.

Fig. 1: Workflow of machine-learning-based morphological and molecular feature prediction and heatmapping.
Fig. 2: Spatial heatmapping results showing detection and localization of carcinoma cells and TiLs in breast cancer.
Fig. 3: Explainable machine learning avoids black-box limitation of conventional machine learning.
Fig. 4: Positive tail accuracies for molecular property prediction from histology.
Fig. 5: Computationally generated ‘fluorescence microscopy’ visualizing correlation of spatiomorphological and molecular features.
Fig. 6: Validation of computational morphomolecular predictions.

Data availability

The data used for the main analyses presented here are available for non-commercial use at https://doi.org/10.6084/m9.figshare.13078835 (ref. 40).

Code availability

The code used for the machine-learning analyses has been deposited at https://doi.org/10.6084/m9.figshare.13078835 (ref. 40).

References

  1. Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci. Transl. Med. 3, 108ra113 (2011).
  2. Yuan, Y. Spatial heterogeneity in the tumor microenvironment. Cold Spring Harb. Perspect. Med. 6, a026583 (2016).
  3. Gerner, M. Y., Kastenmuller, W., Ifrim, I., Kabat, J. & Germain, R. N. Histo-cytometry: a method for highly multiplex quantitative tissue imaging analysis applied to dendritic cell subset microanatomy in lymph nodes. Immunity 37, 364–376 (2012).
  4. Rimm, D. L. Next-gen immunohistochemistry. Nat. Methods 11, 381–383 (2014).
  5. Samek, W. & Müller, K.-R. in Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (eds. Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R.) 5–22 (Springer, 2019); https://doi.org/10.1007/978-3-030-28954-6_1
  6. Bach, S. et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10, e0130140 (2015).
  7. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
  8. Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).
  9. Zhang, Z. et al. Pathologist-level interpretable whole-slide cancer diagnosis with deep learning. Nat. Mach. Intell. 1, 236–245 (2019).
  10. Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
  11. André, B., Vercauteren, T., Buchner, A. M., Wallace, M. B. & Ayache, N. Endomicroscopic video retrieval using mosaicing and visual words. In 2010 IEEE International Symposium on Biomedical Imaging: From Nano to Macro 1419–1422 (2010); https://doi.org/10.1109/ISBI.2010.5490265
  12. Caicedo, J. C., Cruz, A. & Gonzalez, F. A. Histopathology image classification using bag of features and kernel functions. In Conference on Artificial Intelligence in Medicine in Europe 126–135 (Springer, 2009).
  13. Yu, K.-H. et al. Predicting non-small cell lung cancer prognosis by fully automated microscopic pathology image features. Nat. Commun. 7, 12474 (2016).
  14. Klauschen, F. et al. Scoring of tumor-infiltrating lymphocytes: From visual estimation to machine learning. Semin. Cancer Biol. 52, 151–157 (2018).
  15. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B 57, 289–300 (1995).
  16. Sabbaghi, M. et al. Defective cyclin B1 induction in trastuzumab-emtansine (T-DM1) acquired resistance in HER2-positive breast cancer. Clin. Cancer Res. 23, 7006–7019 (2017).
  17. Harrell, J. C., Shroka, T. M. & Jacobsen, B. M. Estrogen induces c-Kit and an aggressive phenotype in a model of invasive lobular breast cancer. Oncogenesis 6, 396 (2017).
  18. Kuonen, F. et al. Inhibition of the Kit ligand/c-Kit axis attenuates metastasis in a mouse model mimicking local breast cancer relapse after radiotherapy. Clin. Cancer Res. 18, 4365–4374 (2012).
  19. Jiang, Y., Zou, L., Lu, W.-Q., Zhang, Y. & Shen, A.-G. Foxo3a expression is a prognostic marker in breast cancer. PLoS ONE 8, e70746 (2013).
  20. Loi, S. et al. Tumor-infiltrating lymphocytes and prognosis: a pooled individual patient analysis of early-stage triple-negative breast cancers. J. Clin. Oncol. 37, 559–569 (2019).
  21. Amgad, M. et al. Report on computational assessment of tumor infiltrating lymphocytes from the International Immuno-Oncology Biomarker Working Group. NPJ Breast Cancer 6, 16 (2020).
  22. Gonzalez-Ericsson, P. I. et al. The path to a better biomarker: application of a risk management framework for the implementation of PD-L1 and TILs as immuno-oncology biomarkers in breast cancer clinical trials and daily practice. J. Pathol. 250, 667–684 (2020).
  23. Gatrell, A. C., Bailey, T. C., Diggle, P. J. & Rowlingson, B. S. Spatial point pattern analysis and its application in geographical epidemiology. Trans. Inst. Br. Geogr. 21, 256–274 (1996).
  24. Budczies, J. et al. Classical pathology and mutational load of breast cancer–integration of two worlds. J. Pathol. Clin. Res. 1, 225–238 (2015).
  25. Denkert, C. et al. Tumour-infiltrating lymphocytes and prognosis in different subtypes of breast cancer: a pooled analysis of 3771 patients treated with neoadjuvant therapy. Lancet Oncol. 19, 40–50 (2018).
  26. Gurcan, M. N. et al. Histopathological image analysis: a review. IEEE Rev. Biomed. Eng. 2, 147–171 (2009).
  27. Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K. & Müller, K.-R. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Springer, 2019).
  28. Lapuschkin, S. et al. Unmasking clever Hans predictors and assessing what machines really learn. Nat. Commun. 10, 1096 (2019).
  29. Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J. & Müller, K.-R. Toward interpretable machine learning: transparent deep neural networks and beyond. Preprint at https://arxiv.org/abs/2003.07631 (2020).
  30. Montavon, G., Samek, W. & Müller, K.-R. Methods for interpreting and understanding deep neural networks. Digital Signal Process. 73, 1–15 (2018).
  31. Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K. & Schölkopf, B. An introduction to kernel-based learning algorithms. IEEE Trans. Neural Netw. 12, 181–201 (2001).
  32. Csurka, G., Dance, C., Fan, L., Willamowski, J. & Bray, C. Visual categorization with bags of keypoints. In Workshop on Statistical Learning in Computer Vision (2004).
  33. Sonnenburg, S. et al. The SHOGUN machine learning toolbox. J. Mach. Learn. Res. 11, 1799–1802 (2010).
  34. Lapuschkin, S., Binder, A., Montavon, G., Müller, K.-R. & Samek, W. Analyzing classifiers: Fisher vectors and deep neural networks. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2912–2920 (2016).
  35. Binder, A., Samek, W., Müller, K.-R. & Kawanabe, M. Enhanced representation and multi-task learning for image annotation. Comput. Vision Image Understanding 117, 466–478 (2013).
  36. Bishop, C. M. Neural Networks for Pattern Recognition (Oxford Univ. Press, 1995).
  37. Zien, A. & Ong, C. S. Multiclass multiple kernel learning. In Proc. 24th International Conference on Machine Learning 1191–1198 (2007).
  38. Raschka, S. Model evaluation, model selection, and algorithm selection in machine learning. Preprint at https://arxiv.org/abs/1811.12808 (2018).
  39. Hoeffding, W. Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963).
  40. Binder, A. & Bockmayr, M. Morphological and molecular breast cancer profiling through explainable machine learning. figshare https://doi.org/10.6084/m9.figshare.13078835 (2021).

Acknowledgements

This work was funded by the Charité Institute of Pathology, Berlin, the Technical University of Berlin, the Human Frontier Science Program (HFSP) Young Investigator Grant (M.I. and F.K.) and the Einstein Foundation Berlin (F.K.) and partly by the German Research Foundation to A.H. (DFG SFB-TR84, B6, Z1a) and the German Consortium for Translational Cancer Research (DKTK). F.K. was also supported by the German Ministry for Education and Research (BMBF) within the Berlin Institute for the Foundations of Learning and Data (BIFOLD; grant no. 01IS18025D and 01IS18037E), the clinical mass spectrometry centre MSTARS (grant no. 031L0220A) and CompLS Patho234 (grant no. 031L0207B) and the European Research Council under Horizon 2020 of the EU Framework Programme for Research and Innovation (647257). A.B. acknowledges support by the Ministry of Education AcRF Tier 2 grant MOE2016-T2-2-154, and expresses gratitude to SUTD for the SGPAIRS1811 grant. M.B. was supported in part by the University Medical Center Hamburg-Eppendorf. K.R.M. was supported in part by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korean government (no. 2017-0-00451, Development of BCI based Brain and Cognitive Computing Technology for Recognizing User’s Intentions using Deep Learning and no. 2019-0-00079, Artificial Intelligence Graduate School Program, Korea University), and by the German Ministry for Education and Research (BMBF) under grants 01IS14013A-E, 01GQ1115, 01GQ0850, 01IS18025A, 031L0207D and 01IS18037A; the German Research Foundation (DFG) under grant Math+, EXC 2046/1, project ID 390685689.

Author information

Contributions

A.B., M.B., M.H., K.-R.M. and F.K. conceptualized the project. A.B., M.B., K.H., M.H., S.W., D.H., K.-R.M. and F.K. were responsible for the methodology. A.B., M.B., M.H., S.W., D.H., K.-R.M. and F.K. developed the software. A.B., M.B., M.H., K.-R.M. and F.K. were responsible for validation. A.B., M.B., M.H. and F.K. conducted the formal analysis. A.B., M.B., M.H., C.D., K.-R.M. and F.K. performed investigation. A.B., M.I., C.D., K.-R.M. and F.K. were responsible for resources. Data curation was completed by A.B., M.B., D.H. and F.K.; A.B., M.B., K.-R.M. and F.K. wrote the first draft, which was reviewed and edited by A.B., M.B., M.H., S.W., D.H., K.H., D.T., M.I., A.S., A.H., C.D., K.-R.M. and F.K. Visualization was performed by A.B., M.B., M.H. and F.K.; K.-R.M. and F.K. supervised the project. Funding was secured by A.B., M.I., K.-R.M. and F.K.

Corresponding authors

Correspondence to Klaus-Robert Müller or Frederick Klauschen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Machine Intelligence thanks Carsten Marr and the other, anonymous, reviewers for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Figs. 1–17, Tables 1–13.

Supplementary Data 1

Balanced accuracies and significance of molecular predictions.

Supplementary Data 2

Area under the curve of molecular prediction.

Supplementary Data 3

Number of predictable cases after tail probability analysis for different tails.

Cite this article

Binder, A., Bockmayr, M., Hägele, M. et al. Morphological and molecular breast cancer profiling through explainable machine learning. Nat Mach Intell 3, 355–366 (2021). https://doi.org/10.1038/s42256-021-00303-4
