Data-efficient and weakly supervised computational pathology on whole-slide images


Deep-learning methods for computational pathology require either manual annotation of gigapixel whole-slide images (WSIs) or large datasets of WSIs with slide-level labels, and they typically suffer from poor domain adaptation and interpretability. Here we report an interpretable weakly supervised deep-learning method for data-efficient WSI processing and learning that requires only slide-level labels. The method, which we named clustering-constrained-attention multiple-instance learning (CLAM), uses attention-based learning to identify subregions of high diagnostic value in order to classify whole slides accurately, and uses instance-level clustering over the identified representative regions to constrain and refine the feature space. By applying CLAM to the subtyping of renal cell carcinoma and non-small-cell lung cancer as well as the detection of lymph node metastasis, we show that it can be used to localize well-known morphological features on WSIs without the need for spatial labels, that it outperforms standard weakly supervised classification algorithms and that it is adaptable to independent test cohorts, smartphone microscopy and varying tissue content.
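To make the attention-based step concrete, the following is a minimal sketch of attention pooling over the patches of one slide, assuming the common formulation in which each patch feature h_k receives a score w·tanh(V h_k), the scores are normalized with a softmax, and the slide-level representation is the attention-weighted sum of patch features. It is an illustration only and not the CLAM implementation: the function names (`attention_pool`, `top_k_patches`), the parameters `V` and `w`, the toy dimensions and the random inputs are all hypothetical, and the sketch uses the Python standard library rather than PyTorch. The highest-attention patches returned by `top_k_patches` stand in for the "representative regions" over which instance-level clustering would operate.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patch_features, V, w):
    """Attention pooling over one bag (slide) of patch feature vectors.

    patch_features: list of N feature vectors, each of length D
    V: hidden-by-D weight matrix (hypothetical, would be learned)
    w: weight vector of length hidden (hypothetical, would be learned)
    Returns (attention weights, slide-level feature vector).
    """
    scores = []
    for h in patch_features:
        # Hidden activation tanh(V h) for this patch.
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        # Unnormalized attention score w . tanh(V h).
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))
    attn = softmax(scores)
    dim = len(patch_features[0])
    # Slide-level representation: attention-weighted sum of patch features.
    slide = [sum(a * h[j] for a, h in zip(attn, patch_features)) for j in range(dim)]
    return attn, slide


def top_k_patches(attn, k):
    """Indices of the k patches with the highest attention weights."""
    return sorted(range(len(attn)), key=lambda i: attn[i], reverse=True)[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    n_patches, feat_dim, hidden_dim = 6, 4, 3  # toy sizes, chosen arbitrarily
    patches = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(n_patches)]
    V = [[rng.gauss(0, 1) for _ in range(feat_dim)] for _ in range(hidden_dim)]
    w = [rng.gauss(0, 1) for _ in range(hidden_dim)]

    attn, slide = attention_pool(patches, V, w)
    print("attention weights:", [round(a, 3) for a in attn])
    print("most attended patches:", top_k_patches(attn, 2))
    print("slide-level feature:", [round(x, 3) for x in slide])
```

In a trained model the slide-level vector would feed a classifier, and the attention weights could be mapped back to patch coordinates to produce a heatmap; both steps are omitted here.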

Fig. 1: Overview of the CLAM conceptual framework, architecture and interpretability.
Fig. 2: Performance, data efficiency and comparative analysis.
Fig. 3: Adaptability to independent test cohorts.
Fig. 4: Interpretability and visualization.
Fig. 5: Adaptability to smartphone microscopy images.
Fig. 6: Adaptability to biopsy slides.

Data availability

The TCGA diagnostic whole-slide data (NSCLC, RCC) and corresponding labels are available from the NIH Genomic Data Commons. The CPTAC whole-slide data (NSCLC) and the corresponding labels are available from the NIH Cancer Imaging Archive. Metastatic-lymph-node data are publicly available from the CAMELYON16 and CAMELYON17 website. We included links to all public data in Supplementary Table 20. All reasonable requests for academic use of in-house raw and analysed data can be addressed to the corresponding author. All requests will be promptly reviewed to determine whether the request is subject to any intellectual property or patient-confidentiality obligations, will be processed in accordance with institutional and departmental guidelines and will require a material transfer agreement.

Code availability

All code was implemented in Python using PyTorch as the primary deep-learning library. The complete pipeline for processing WSIs as well as training and evaluating the deep-learning models is publicly available and can be used to reproduce the experiments of this paper. All source code has been released under the GNU GPLv3 free software license.



Acknowledgements

The authors thank A. Bruce for scanning internal cohorts of patient histology slides at BWH; J. Wang, K. Bronstein, L. Cirelli and S. Sahai for querying the BWH slide database and retrieving archival slides; M. Bragg, S. Zimmet and T. Mellen for administrative support; and Z. Noor for developing the interactive demo website. This work was supported in part by internal funds from BWH Pathology, the NIH National Institute of General Medical Sciences (NIGMS) grant no. R35GM138216A (to F.M.), a Google Cloud Research Grant and the Nvidia GPU Grant Program. R.J.C. was additionally supported by the NSF Graduate Research Fellowship and NIH National Human Genome Research Institute (NHGRI) grant no. T32HG002295. The content is solely the responsibility of the authors and does not reflect the official views of the National Institutes of Health, the National Institute of General Medical Sciences, the National Human Genome Research Institute or the National Science Foundation.

Author information




M.Y.L. and F.M. conceived the study and designed the experiments. M.Y.L. performed the experimental analysis. D.F.K.W. and T.Y.C. curated the in-house datasets and collected smartphone microscopy data. M.Y.L., R.J.C. and M.B. developed and tested the CLAM Python package. M.Y.L. and F.M. prepared the manuscript. F.M. supervised the research.

Corresponding author

Correspondence to Faisal Mahmood.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Biomedical Engineering thanks Anant Madabhushi, Geert Litjens and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary figures and tables.

Reporting Summary

About this article

Cite this article

Lu, M.Y., Williamson, D.F.K., Chen, T.Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat Biomed Eng (2021).
