Data-efficient and weakly supervised computational pathology on whole-slide images

Lu, Ming Y.; Williamson, Drew F. K.; Chen, Tiffany Y.; Chen, Richard J.; Barbieri, Matteo; Mahmood, Faisal

doi:10.1038/s41551-020-00682-w

Article
Published: 01 March 2021

Nature Biomedical Engineering volume 5, pages 555–570 (2021)

Abstract

Deep-learning methods for computational pathology require either manual annotation of gigapixel whole-slide images (WSIs) or large datasets of WSIs with slide-level labels, and they typically suffer from poor domain adaptation and interpretability. Here we report an interpretable, weakly supervised deep-learning method for data-efficient WSI processing and learning that requires only slide-level labels. The method, which we named clustering-constrained-attention multiple-instance learning (CLAM), uses attention-based learning to identify subregions of high diagnostic value in order to accurately classify whole slides, and applies instance-level clustering over the identified representative regions to constrain and refine the feature space. By applying CLAM to the subtyping of renal cell carcinoma and non-small-cell lung cancer as well as to the detection of lymph node metastasis, we show that it can localize well-known morphological features on WSIs without the need for spatial labels, that it outperforms standard weakly supervised classification algorithms and that it is adaptable to independent test cohorts, smartphone microscopy and varying tissue content.
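The attention-based pooling that the abstract describes can be illustrated with a minimal, dependency-free sketch. It assumes the attention mechanism of Ilse, Tomczak and Welling (2018): each patch embedding h_k receives a score w^T tanh(V h_k), the scores are normalized with a softmax over all patches of the slide, and the slide representation is the attention-weighted sum of the patch embeddings. The helper `pseudo_label_instances` illustrates one plausible reading of the instance-level clustering step, in which the k most-attended and k least-attended patches are selected as candidate positive and negative evidence for an auxiliary instance classifier. All function names, the plain-list matrices and the toy weights are hypothetical; the released CLAM code (https://github.com/mahmoodlab/CLAM) is written in PyTorch and differs in detail.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances: List[Vector], V: Matrix, w: Vector) -> Tuple[Vector, Vector]:
    """Attention-based MIL pooling (non-gated form of Ilse et al., 2018).

    Score per patch: a_k = w^T tanh(V h_k); weights = softmax over the bag;
    slide embedding = sum_k weight_k * h_k.
    """
    raw = [sum(wi * math.tanh(z) for wi, z in zip(w, matvec(V, h))) for h in instances]
    attn = softmax(raw)
    dim = len(instances[0])
    slide = [sum(a * h[d] for a, h in zip(attn, instances)) for d in range(dim)]
    return attn, slide


def pseudo_label_instances(attn: Vector, k: int) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: indices of the k highest- and k lowest-attention patches."""
    if k < 1 or 2 * k > len(attn):
        raise ValueError("k must be >= 1 and at most half the number of patches")
    order = sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)
    return order[:k], order[-k:]


# Toy bag of three 2-d patch embeddings with hand-picked (hypothetical) weights.
patches = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 0.0]
attn, slide = attention_pool(patches, V, w)
top, bottom = pseudo_label_instances(attn, k=1)
print(top, bottom)  # [0] [2]
```

Because the softmax normalizes over every patch of the slide, the attention weights sum to one and can be mapped back onto the WSI as a heatmap, which is the kind of interpretability the abstract refers to.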

**Fig. 1: Overview of the CLAM conceptual framework, architecture and interpretability.**

**Fig. 2: Performance, data efficiency and comparative analysis.**

**Fig. 3: Adaptability to independent test cohorts.**

**Fig. 4: Interpretability and visualization.**

**Fig. 5: Adaptability to smartphone microscopy images.**

**Fig. 6: Adaptability to biopsy slides.**

Data availability

The TCGA diagnostic whole-slide data (NSCLC, RCC) and corresponding labels are available from the NIH Genomic Data Commons (https://portal.gdc.cancer.gov). The CPTAC whole-slide data (NSCLC) and corresponding labels are available from the NIH Cancer Imaging Archive (https://cancerimagingarchive.net/datascope/cptac). Metastatic-lymph-node data are publicly available from the CAMELYON16 and CAMELYON17 challenge website (https://camelyon17.grand-challenge.org/Data). Links to all public data are listed in Supplementary Table 20. Reasonable requests for academic use of in-house raw and analysed data can be addressed to the corresponding author. All requests will be promptly reviewed to determine whether they are subject to any intellectual-property or patient-confidentiality obligations, will be processed in accordance with institutional and departmental guidelines and will require a material transfer agreement.

Code availability

All code was implemented in Python using PyTorch as the primary deep-learning library. The complete pipeline for processing WSIs as well as training and evaluating the deep-learning models is available at https://github.com/mahmoodlab/CLAM and can be used to reproduce the experiments of this paper. All source code has been released under the GNU GPLv3 free software license.

Acknowledgements

The authors thank A. Bruce for scanning internal cohorts of patient histology slides at BWH; J. Wang, K. Bronstein, L. Cirelli and S. Sahai for querying the BWH slide database and retrieving archival slides; M. Bragg, S. Zimmet and T. Mellen for administrative support; and Z. Noor for developing the interactive demo website. This work was supported in part by internal funds from BWH Pathology, the NIH National Institute of General Medical Sciences (NIGMS) grant no. R35GM138216A (to F.M.), a Google Cloud Research Grant and the Nvidia GPU Grant Program. R.J.C. was additionally supported by the NSF Graduate Research Fellowship and NIH National Human Genome Research Institute (NHGRI) grant no. T32HG002295. The content is solely the responsibility of the authors and does not reflect the official views of the National Institutes of Health, the National Institute of General Medical Sciences, the National Human Genome Research Institute or the National Science Foundation.

Author information

These authors contributed equally: Drew F. K. Williamson, Tiffany Y. Chen.

Authors and Affiliations

Department of Pathology, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
Ming Y. Lu, Drew F. K. Williamson, Tiffany Y. Chen, Richard J. Chen, Matteo Barbieri & Faisal Mahmood
Cancer Program, Broad Institute of Harvard and MIT, Cambridge, MA, USA
Ming Y. Lu, Matteo Barbieri & Faisal Mahmood
Cancer Data Science, Dana–Farber Cancer Institute, Boston, MA, USA
Ming Y. Lu & Faisal Mahmood
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Richard J. Chen

Contributions

M.Y.L. and F.M. conceived the study and designed the experiments. M.Y.L. performed the experimental analysis. D.F.K.W. and T.Y.C. curated the in-house datasets and collected smartphone microscopy data. M.Y.L., R.J.C. and M.B. developed and tested the CLAM Python package. M.Y.L. and F.M. prepared the manuscript. F.M. supervised the research.

Corresponding author

Correspondence to Faisal Mahmood.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Biomedical Engineering thanks Anant Madabhushi, Geert Litjens and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary figures and tables.

Reporting Summary

About this article

Cite this article

Lu, M.Y., Williamson, D.F.K., Chen, T.Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng 5, 555–570 (2021). https://doi.org/10.1038/s41551-020-00682-w

Received: 23 April 2020
Accepted: 22 December 2020
Issue Date: June 2021
DOI: https://doi.org/10.1038/s41551-020-00682-w
