Abstract
Deep-learning methods for computational pathology require either manual annotation of gigapixel whole-slide images (WSIs) or large datasets of WSIs with slide-level labels and typically suffer from poor domain adaptation and interpretability. Here we report an interpretable weakly supervised deep-learning method for data-efficient WSI processing and learning that only requires slide-level labels. The method, which we named clustering-constrained-attention multiple-instance learning (CLAM), uses attention-based learning to identify subregions of high diagnostic value to accurately classify whole slides and instance-level clustering over the identified representative regions to constrain and refine the feature space. By applying CLAM to the subtyping of renal cell carcinoma and non-small-cell lung cancer as well as the detection of lymph node metastasis, we show that it can be used to localize well-known morphological features on WSIs without the need for spatial labels, that it overperforms standard weakly supervised classification algorithms and that it is adaptable to independent test cohorts, smartphone microscopy and varying tissue content.
This is a preview of subscription content, access via your institution
Relevant articles
Open Access articles citing this article.
-
Colorectal cancer lymph node metastasis prediction with weakly supervised transformer-based multi-instance learning
Medical & Biological Engineering & Computing Open Access 21 February 2023
-
Detection of acute promyelocytic leukemia in peripheral blood and bone marrow with annotation-free deep learning
Scientific Reports Open Access 13 February 2023
-
Federated learning with hyper-network—a case study on whole slide image analysis
Scientific Reports Open Access 31 January 2023
Access options
Access Nature and 54 other Nature Portfolio journals
Get Nature+, our best-value online-access subscription
$29.99 per month
cancel any time
Subscribe to this journal
Receive 12 digital issues and online access to articles
$79.00 per year
only $6.58 per issue
Rent or buy this article
Get just this article for as long as you need it
$39.95
Prices may be subject to local taxes which are calculated during checkout






Data availability
The TCGA diagnostic whole-slide data (NSCLC, RCC) and corresponding labels are available from the NIH genomic data commons (https://portal.gdc.cancer.gov). The CPTAC whole-slide data (NSCLC) and the corresponding labels are available from the NIH cancer imaging archive (https://cancerimagingarchive.net/datascope/cptac). Metastatic-lymph-node data are publicly available from the CAMELYON16 and CAMELYON17 website (https://camelyon17.grand-challenge.org/Data). We included links to all public data in Supplementary Table 20. All reasonable requests for academic use of in-house raw and analysed data can be addressed to the corresponding author. All requests will be promptly reviewed to determine whether the request is subject to any intellectual property or patient-confidentiality obligations, will be processed in concordance with institutional and departmental guidelines and will require a material transfer agreement.
Code availability
All code was implemented in Python using PyTorch as the primary deep-learning library. The complete pipeline for processing WSIs as well as training and evaluating the deep-learning models is available at https://github.com/mahmoodlab/CLAM and can be used to reproduce the experiments of this paper. All source code has been released under the GNU GPLv3 free software license.
References
Bera, K., Schalper, K. A. & Madabhushi, A. Artificial intelligence in digital pathology-new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715 (2019).
Niazi, M. K. K., Parwani, A. V. & Gurcan, M. N. Digital pathology and artificial intelligence. Lancet Oncol. 20, e253–e261 (2019).
Hollon, T. C. et al. Near real-time intraoperative brain tumor diagnosis using stimulated raman histology and deep neural networks. Nat. Med. 26, 52–58 (2020).
Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).
Bulten, W. et al. Automated deep-learning system for gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 21, 233–241 (2020).
Ström, P. et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study. Lancet Oncol. 21, 222–232 (2020).
Schapiro, D. et al. histoCAT: analysis of cell phenotypes and interactions in multiplex image cytometry data. Nat. Methods 14, 873–876 (2017).
Moen, E. et al. Deep learning for cellular image analysis. Nat. Methods 16, 1233–1246 (2019).
Mahmood, F. et al. Deep adversarial training for multi-organ nuclei segmentation in histopathology images. IEEE Trans. Med. Imaging 39, 3257–3267 (2019).
Graham, S. et al. Hover-net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 58, 101563 (2019).
Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 23, 181–193 (2018).
Javed, S. et al. Cellular community detection for tissue phenotyping in colorectal cancer histology images. Med. Image Anal. 63, 101696 (2020).
Mobadersany, P. et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. USA 115, E2970–E2979 (2018).
Heindl, A. et al. Microenvironmental niche divergence shapes brca1-dysregulated ovarian cancer morphological plasticity. Nat. Commun. 9, 3917 (2018).
Yuan, Y. et al. Quantitative image analysis of cellular heterogeneity in breast tumors complements genomic profiling. Sci. Transl. Med. 4, 157ra143 (2012).
Lazar, A. J. et al. Comprehensive and integrated genomic characterization of adult soft tissue sarcomas. Cell 171, 950–965 (2017).
Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1, 800–810 (2020).
Kather, J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat. Cancer 1, 789–799 (2020).
Chen, R. J. et al. Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans. Med. Imaging https://doi.org/10.1109/TMI.2020.3021387 (2020).
Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci. Transl Med. 3, 108ra113 (2011).
Yamamoto, Y. et al. Automated acquisition of explainable knowledge from unannotated histopathology images. Nat. Commun. 10, 5642 (2019).
Pell, R. et al. The use of digital pathology and image analysis in clinical trials. J. Pathol. Clin. Res. 5, 81–90 (2019).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Esteva, A. et al. A guide to deep learning in healthcare. Nat. Med. 25, 24–29 (2019).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
Poplin, R. et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2, 158–164 (2018).
McKinney, S. M. et al. International evaluation of an ai system for breast cancer screening. Nature 577, 89–94 (2020).
Mitani, A. et al. Detection of anaemia from retinal fundus images via deep learning. Nat. Biomed. Eng. 4, 18–27 (2020).
Shen, L., Zhao, W. & Xing, L. Patient-specific reconstruction of volumetric computed tomography images from a single projection view via deep learning. Nat. Biomed. Eng. 3, 880–888 (2019).
Tellez, D., Litjens, G., van der Laak, J. & Ciompi, F. Neural image compression for gigapixel histopathology image analysis. IEEE Trans. Pattern Anal. Mach. Intell. 43, 567–578 (2019).
Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).
Chen, P.-H. C. et al. An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis. Nat. Med. 25, 1453–1457 (2019).
Nagpal, K. et al. Development and validation of a deep learning algorithm for improving gleason scoring of prostate cancer. npj Digit. Med. 2, 48 (2019).
Wang, S. et al. RMDL: recalibrated multi-instance deep learning for whole slide gastric image classification. Med. Image Anal. 58, 101549 (2019).
Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In International Conference on Machine Learning (eds Lawrence, M. & Reid, M.) 2132–2141 (PMLR, 2018).
Maron, O. & Lozano-Pérez, T. A framework for multiple-instance learning. In Advances in Neural Information Processing Systems (eds Jordan, M. I. et al.) 570–576 (Citeseer, 1998).
Schaumberg, A. J. et al. Interpretable multimodal deep learning for real-time pan-tissue pan-disease pathology search on social media. Mod. Pathol. 33, 2169–2185 (2020).
BenTaieb, A. & Hamarneh, G. Adversarial stain transfer for histopathology image analysis. IEEE Trans. Med. Imaging 37, 792–802 (2017).
Couture, H. D., Marron, J. S., Perou, C. M., Troester, M. A. & Niethammer, M. Multiple instance learning for heterogeneous images: training a CNN for histopathology. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Frangi, A. F. et al.) 254–262 (Springer, 2018).
Kraus, O. Z., Ba, J. L. & Frey, B. J. Classifying and segmenting microscopy images with deep multiple instance learning. Bioinformatics 32, i52–i59 (2016).
Zhang, C., Platt, J. C. & Viola, P. A. Multiple instance boosting for object detection. In Advances in Neural Information Processing Systems (eds Weiss, Y. et al.) 1417–1424 (Citeseer, 2006).
Berrada, L., Zisserman, A. & Kumar, M. P. Smooth loss functions for deep top-k classification. In International Conference on Learning Representations (2018).
Crammer, K. & Singer, Y. On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2001).
Litjens, G. et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience 7, giy065 (2018).
Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).
Acknowledgements
The authors thank A. Bruce for scanning internal cohorts of patient histology slides at BWH; J. Wang, K. Bronstein, L. Cirelli and S. Sahai for querying the BWH slide database and retrieving archival slides; M. Bragg, S. Zimmet and T. Mellen for administrative support; and Z. Noor for developing the interactive demo website. This work was supported in part by internal funds from BWH Pathology, the NIH National Institute of General Medical Sciences (NIGMS) grant no. R35GM138216A (to F.M.), a Google Cloud Research Grant and the Nvidia GPU Grant Program. R.J.C. was additionally supported by the NSF Graduate Research Fellowship and NIH National Human Genome Research Institute (NHGRI) grant no. T32HG002295. The content is solely the responsibility of the authors and does not reflect the official views of the National Institute of Health, National Institute of General Medical Sciences, National Human Genome Research Institute and the National Science Foundation.
Author information
Authors and Affiliations
Contributions
M.Y.L. and F.M. conceived the study and designed the experiments. M.Y.L. performed the experimental analysis. D.F.K.W. and T.Y.C. curated the in-house datasets and collected smartphone microscopy data. M.Y.L., R.J.C and M.B. developed and tested the CLAM Python package. M.Y.L. and F.M. prepared the manuscript. F.M. supervised the research.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Biomedical Engineering thanks Anant Madabhushi, Geert Litjens and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary figures and tables.
Rights and permissions
About this article
Cite this article
Lu, M.Y., Williamson, D.F.K., Chen, T.Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng 5, 555–570 (2021). https://doi.org/10.1038/s41551-020-00682-w
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/s41551-020-00682-w
This article is cited by
-
Detection of acute promyelocytic leukemia in peripheral blood and bone marrow with annotation-free deep learning
Scientific Reports (2023)
-
Predicting the HER2 status in oesophageal cancer from tissue microarrays using convolutional neural networks
British Journal of Cancer (2023)
-
Federated learning with hyper-network—a case study on whole slide image analysis
Scientific Reports (2023)
-
Colorectal cancer lymph node metastasis prediction with weakly supervised transformer-based multi-instance learning
Medical & Biological Engineering & Computing (2023)
-
A multi-view deep learning model for pathology image diagnosis
Applied Intelligence (2023)