Deep-learning methods for computational pathology require either manual annotation of gigapixel whole-slide images (WSIs) or large datasets of WSIs with slide-level labels and typically suffer from poor domain adaptation and interpretability. Here we report an interpretable weakly supervised deep-learning method for data-efficient WSI processing and learning that only requires slide-level labels. The method, which we named clustering-constrained-attention multiple-instance learning (CLAM), uses attention-based learning to identify subregions of high diagnostic value to accurately classify whole slides and instance-level clustering over the identified representative regions to constrain and refine the feature space. By applying CLAM to the subtyping of renal cell carcinoma and non-small-cell lung cancer as well as the detection of lymph node metastasis, we show that it can be used to localize well-known morphological features on WSIs without the need for spatial labels, that it overperforms standard weakly supervised classification algorithms and that it is adaptable to independent test cohorts, smartphone microscopy and varying tissue content.
This is a preview of subscription content
Subscribe to Nature+
Get immediate online access to the entire Nature family of 50+ journals
Subscribe to Journal
Get full journal access for 1 year
only $4.92 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Tax calculation will be finalised during checkout.
Get time limited or full article access on ReadCube.
All prices are NET prices.
The TCGA diagnostic whole-slide data (NSCLC, RCC) and corresponding labels are available from the NIH genomic data commons (https://portal.gdc.cancer.gov). The CPTAC whole-slide data (NSCLC) and the corresponding labels are available from the NIH cancer imaging archive (https://cancerimagingarchive.net/datascope/cptac). Metastatic-lymph-node data are publicly available from the CAMELYON16 and CAMELYON17 website (https://camelyon17.grand-challenge.org/Data). We included links to all public data in Supplementary Table 20. All reasonable requests for academic use of in-house raw and analysed data can be addressed to the corresponding author. All requests will be promptly reviewed to determine whether the request is subject to any intellectual property or patient-confidentiality obligations, will be processed in concordance with institutional and departmental guidelines and will require a material transfer agreement.
All code was implemented in Python using PyTorch as the primary deep-learning library. The complete pipeline for processing WSIs as well as training and evaluating the deep-learning models is available at https://github.com/mahmoodlab/CLAM and can be used to reproduce the experiments of this paper. All source code has been released under the GNU GPLv3 free software license.
Bera, K., Schalper, K. A. & Madabhushi, A. Artificial intelligence in digital pathology-new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715 (2019).
Niazi, M. K. K., Parwani, A. V. & Gurcan, M. N. Digital pathology and artificial intelligence. Lancet Oncol. 20, e253–e261 (2019).
Hollon, T. C. et al. Near real-time intraoperative brain tumor diagnosis using stimulated raman histology and deep neural networks. Nat. Med. 26, 52–58 (2020).
Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).
Bulten, W. et al. Automated deep-learning system for gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 21, 233–241 (2020).
Ström, P. et al. Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study. Lancet Oncol. 21, 222–232 (2020).
Schapiro, D. et al. histoCAT: analysis of cell phenotypes and interactions in multiplex image cytometry data. Nat. Methods 14, 873–876 (2017).
Moen, E. et al. Deep learning for cellular image analysis. Nat. Methods 16, 1233–1246 (2019).
Mahmood, F. et al. Deep adversarial training for multi-organ nuclei segmentation in histopathology images. IEEE Trans. Med. Imaging 39, 3257–3267 (2019).
Graham, S. et al. Hover-net: simultaneous segmentation and classification of nuclei in multi-tissue histology images. Med. Image Anal. 58, 101563 (2019).
Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 23, 181–193 (2018).
Javed, S. et al. Cellular community detection for tissue phenotyping in colorectal cancer histology images. Med. Image Anal. 63, 101696 (2020).
Mobadersany, P. et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. USA 115, E2970–E2979 (2018).
Heindl, A. et al. Microenvironmental niche divergence shapes brca1-dysregulated ovarian cancer morphological plasticity. Nat. Commun. 9, 3917 (2018).
Yuan, Y. et al. Quantitative image analysis of cellular heterogeneity in breast tumors complements genomic profiling. Sci. Transl. Med. 4, 157ra143 (2012).
Lazar, A. J. et al. Comprehensive and integrated genomic characterization of adult soft tissue sarcomas. Cell 171, 950–965 (2017).
Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1, 800–810 (2020).
Kather, J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat. Cancer 1, 789–799 (2020).
Chen, R. J. et al. Pathomic fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans. Med. Imaging https://doi.org/10.1109/TMI.2020.3021387 (2020).
Beck, A. H. et al. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci. Transl Med. 3, 108ra113 (2011).
Yamamoto, Y. et al. Automated acquisition of explainable knowledge from unannotated histopathology images. Nat. Commun. 10, 5642 (2019).
Pell, R. et al. The use of digital pathology and image analysis in clinical trials. J. Pathol. Clin. Res. 5, 81–90 (2019).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
Esteva, A. et al. A guide to deep learning in healthcare. Nat. Med. 25, 24–29 (2019).
Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
Poplin, R. et al. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat. Biomed. Eng. 2, 158–164 (2018).
McKinney, S. M. et al. International evaluation of an ai system for breast cancer screening. Nature 577, 89–94 (2020).
Mitani, A. et al. Detection of anaemia from retinal fundus images via deep learning. Nat. Biomed. Eng. 4, 18–27 (2020).
Shen, L., Zhao, W. & Xing, L. Patient-specific reconstruction of volumetric computed tomography images from a single projection view via deep learning. Nat. Biomed. Eng. 3, 880–888 (2019).
Tellez, D., Litjens, G., van der Laak, J. & Ciompi, F. Neural image compression for gigapixel histopathology image analysis. IEEE Trans. Pattern Anal. Mach. Intell. 43, 567–578 (2019).
Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).
Chen, P.-H. C. et al. An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis. Nat. Med. 25, 1453–1457 (2019).
Nagpal, K. et al. Development and validation of a deep learning algorithm for improving gleason scoring of prostate cancer. npj Digit. Med. 2, 48 (2019).
Wang, S. et al. RMDL: recalibrated multi-instance deep learning for whole slide gastric image classification. Med. Image Anal. 58, 101549 (2019).
Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In International Conference on Machine Learning (eds Lawrence, M. & Reid, M.) 2132–2141 (PMLR, 2018).
Maron, O. & Lozano-Pérez, T. A framework for multiple-instance learning. In Advances in Neural Information Processing Systems (eds Jordan, M. I. et al.) 570–576 (Citeseer, 1998).
Schaumberg, A. J. et al. Interpretable multimodal deep learning for real-time pan-tissue pan-disease pathology search on social media. Mod. Pathol. 33, 2169–2185 (2020).
BenTaieb, A. & Hamarneh, G. Adversarial stain transfer for histopathology image analysis. IEEE Trans. Med. Imaging 37, 792–802 (2017).
Couture, H. D., Marron, J. S., Perou, C. M., Troester, M. A. & Niethammer, M. Multiple instance learning for heterogeneous images: training a CNN for histopathology. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Frangi, A. F. et al.) 254–262 (Springer, 2018).
Kraus, O. Z., Ba, J. L. & Frey, B. J. Classifying and segmenting microscopy images with deep multiple instance learning. Bioinformatics 32, i52–i59 (2016).
Zhang, C., Platt, J. C. & Viola, P. A. Multiple instance boosting for object detection. In Advances in Neural Information Processing Systems (eds Weiss, Y. et al.) 1417–1424 (Citeseer, 2006).
Berrada, L., Zisserman, A. & Kumar, M. P. Smooth loss functions for deep top-k classification. In International Conference on Learning Representations (2018).
Crammer, K. & Singer, Y. On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. 2, 265–292 (2001).
Litjens, G. et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. GigaScience 7, giy065 (2018).
Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).
The authors thank A. Bruce for scanning internal cohorts of patient histology slides at BWH; J. Wang, K. Bronstein, L. Cirelli and S. Sahai for querying the BWH slide database and retrieving archival slides; M. Bragg, S. Zimmet and T. Mellen for administrative support; and Z. Noor for developing the interactive demo website. This work was supported in part by internal funds from BWH Pathology, the NIH National Institute of General Medical Sciences (NIGMS) grant no. R35GM138216A (to F.M.), a Google Cloud Research Grant and the Nvidia GPU Grant Program. R.J.C. was additionally supported by the NSF Graduate Research Fellowship and NIH National Human Genome Research Institute (NHGRI) grant no. T32HG002295. The content is solely the responsibility of the authors and does not reflect the official views of the National Institute of Health, National Institute of General Medical Sciences, National Human Genome Research Institute and the National Science Foundation.
The authors declare no competing interests.
Peer review information Nature Biomedical Engineering thanks Anant Madabhushi, Geert Litjens and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lu, M.Y., Williamson, D.F.K., Chen, T.Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng 5, 555–570 (2021). https://doi.org/10.1038/s41551-020-00682-w
Nature Medicine (2022)
Deep neural network trained on gigapixel images improves lymph node metastasis detection in clinical settings
Nature Communications (2022)
Virchows Archiv (2022)
Cancer Causes & Control (2022)
Development and evaluation of a deep neural network for histologic classification of renal cell carcinoma on biopsy and surgical resection slides
Scientific Reports (2021)