Towards a general-purpose foundation model for computational pathology

Abstract

Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks, requiring the objective characterization of histopathological entities from whole-slide images (WSIs). The high resolution of WSIs and the variability of morphological features present significant challenges, complicating the large-scale annotation of data for high-performance applications. To address this challenge, current efforts have proposed the use of pretrained image encoders through transfer learning from natural image datasets or self-supervised learning on publicly available histopathology datasets, but such encoders have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs (>77 TB of data) across 20 major tissue types. The model was evaluated on 34 representative CPath tasks of varying diagnostic difficulty. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient artificial intelligence models that can generalize and transfer to a wide range of diagnostically challenging tasks and clinical workflows in anatomic pathology.

Fig. 1: Overview of UNI.
Fig. 2: Slide-level tasks for OT-43 and OT-108, and slide-level task performance.
Fig. 3: ROI-level tasks.
Fig. 4: Few-shot ROI- and slide-level prototyping.

Data availability

TCGA and CPTAC data consisting of whole-slide images and labels can be accessed through the NIH Genomic Data Commons (https://portal.gdc.cancer.gov) and Proteomic Data Commons (https://proteomic.datacommons.cancer.gov), respectively. GTEx data added to the pretraining dataset can be accessed through the GTEx portal (https://www.gtexportal.org/home/). All other publicly available datasets analyzed in this work can be accessed through their respective data portals: CRC-100K (https://zenodo.org/record/1214456), HunCRC ROIs (10.6084/m9.figshare.c.5927795.v1), HunCRC slides (10.7937/tcia.9cjf-0127), BACH (https://iciar2018-challenge.grand-challenge.org/Dataset/), TCGA CRC-MSI (https://zenodo.org/record/3832231), CCRCC tissue classification (https://zenodo.org/record/7898308), TCGA-TILs (https://zenodo.org/record/6604094), TCGA Uniform (https://zenodo.org/record/5889558), UniToPatho (https://zenodo.org/record/4643645), ESCA (https://zenodo.org/record/7548828), CAMELYON17-WILDS (https://wilds.stanford.edu/datasets), EBRAINS (10.25493/WQ48-ZGX), DHMC (https://bmirds.github.io/KidneyCancer), BRACS (https://bracs.icar.cnr.it), PANDA (https://panda.grand-challenge.org), SegPath (https://zenodo.org/record/7412731) and AGGC (https://zenodo.org/record/6460100). TCGA, CPTAC, HunCRC and TCGA-TILs can also be accessed through The Cancer Imaging Archive (ref. 175). Links for all datasets are also listed in Supplementary Table 73. We note that data from AGGC were obtained from a public grand challenge of the same name (https://aggc22.grand-challenge.org) with a pending publication (ref. 101), with permission granted by the challenge organizers to present results from this dataset. No internal patient data were specifically collected for this study; this study relies on retrospective analysis of anonymized whole-slide images. Following institutional policies, all requests for data collected or curated in-house will be evaluated on a case-by-case basis to determine whether the data requested and the use case comply with intellectual property or patient privacy obligations.

Code availability

Code and model weights for UNI can be accessed for academic research purposes at https://github.com/mahmoodlab/UNI. We have documented all technical deep learning methods and software libraries used in the study while ensuring that the paper is accessible to the broader clinical and scientific audience.
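
As a brief orientation for readers who want to use the released model, the sketch below loads UNI as a frozen feature extractor. It assumes the timm/Hugging Face Hub loading path described in the repository README; the hub id, initialization arguments and preprocessing constants shown here should be verified against https://github.com/mahmoodlab/UNI (access to the weights is gated and must be requested first).

```python
# Minimal sketch (assumed API; check the UNI README): extract a ROI embedding
# with the released ViT-L/16 weights via timm and the Hugging Face Hub.
import timm
import torch
from PIL import Image
from torchvision import transforms

model = timm.create_model(
    "hf-hub:MahmoodLab/UNI",   # assumed hub id; confirm against the README
    pretrained=True,           # requires `huggingface-cli login` and access approval
    init_values=1e-5,          # LayerScale init used during DINOv2 pretraining
    dynamic_img_size=True,     # accept inputs larger than 224 x 224
)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

roi = preprocess(Image.open("roi.png").convert("RGB")).unsqueeze(0)  # (1, 3, 224, 224)
with torch.inference_mode():
    embedding = model(roi)  # (1, 1024) [CLS] feature for linear probes, MIL or retrieval
```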

References

  1. Song, A. H. et al. Artificial intelligence for digital and computational pathology. Nat. Rev. Bioeng. 1, 930–949 (2023).

  2. Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V. & Madabhushi, A. Artificial intelligence in digital pathology: new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715 (2019).

  3. Lipkova, J. et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell 40, 1095–1110 (2022).

  4. Heinz, C. N., Echle, A., Foersch, S., Bychkov, A. & Kather, J. N. The future of artificial intelligence in digital pathology: results of a survey across stakeholder groups. Histopathology 80, 1121–1127 (2022).

  5. Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).

  6. Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).

  7. Mobadersany, P. et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. USA 115, E2970–E2979 (2018).

  8. Amgad, M. et al. A population-level digital histologic biomarker for enhanced prognosis of invasive breast cancer. Nat. Med. 30, 85–97 (2024).

  9. Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).

  10. Vanguri, R. S. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat. Cancer 3, 1151–1164 (2022).

  11. Cooper, M., Ji, Z. & Krishnan, R. G. Machine learning in computational histopathology: challenges and opportunities. Genes Chromosomes Cancer 62, 540–556 (2023).

  12. Graham, S. et al. Screening of normal endoscopic large bowel biopsies with interpretable graph learning: a retrospective study. Gut 72, 1709–1721 (2023).

  13. Ozyoruk, K. B. et al. A deep-learning model for transforming the style of tissue images from cryosectioned to formalin-fixed and paraffin-embedded. Nat. Biomed. Eng. 6, 1407–1419 (2022).

  14. Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).

  15. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).

  16. Kather, J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat. Cancer 1, 789–799 (2020).

  17. Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1, 800–810 (2020).

  18. Bulten, W. et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nat. Med. 28, 154–163 (2022).

  19. Foersch, S. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430–439 (2023).

  20. Chen, R. J. et al. Multimodal co-attention transformer for survival prediction in gigapixel whole slide images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 4015–4025 (2021).

  21. He, K. et al. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 16000–16009 (2022).

  22. Oquab, M. et al. DINOv2: learning robust visual features without supervision. Preprint at https://doi.org/10.48550/arxiv.2304.07193 (2023).

  23. Balestriero, R. et al. A cookbook of self-supervised learning. Preprint at https://doi.org/10.48550/arxiv.2304.12210 (2023).

  24. Chen, X., Xie, S. & He, K. An empirical study of training self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2021).

  25. Caron, M. et al. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international Conference on Computer Vision, 9650–9660 (2021).

  26. Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, 1597–1607 (PMLR, 2020).

  27. Grill, J.-B. et al. Bootstrap your own latent: a new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 33, 21271–21284 (2020).

  28. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255 (IEEE, 2009).

  29. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).

  30. Sun, C., Shrivastava, A., Singh, S. & Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE International Conference on Computer Vision, 843–852 (2017).

  31. Zhai, X., Kolesnikov, A., Houlsby, N. & Beyer, L. Scaling vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12104–12113 (2022).

  32. Goyal, P., Mahajan, D., Gupta, A. & Misra, I. Scaling and benchmarking self-supervised visual representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 6391–6400 (2019).

  33. Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://doi.org/10.48550/arxiv.2108.07258 (2021).

  34. Yuan, L. et al. Florence: a new foundation model for computer vision. Preprint at https://doi.org/10.48550/arxiv.2111.11432 (2021).

  35. Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

  36. Chen, R. J. & Krishnan, R. G. Self-supervised vision transformers learn visual concepts in histopathology. In Learning Meaningful Representations of Life, NeurIPS 2021 (2022).

  37. Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).

  38. Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).

  39. Kang, M., Song, H., Park, S., Yoo, D. & Pereira, S. Benchmarking self-supervised learning on diverse pathology datasets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3344–3354 (2023).

  40. Li, B., Li, Y. & Eliceiri, K. W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14318–14328 (2021).

  41. Lazard, T., Lerousseau, M., Decencière, E. & Walter, T. Giga-SSL: self-supervised learning for gigapixel images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4304–4313 (2023).

  42. Schirris, Y., Gavves, E., Nederlof, I., Horlings, H. M. & Teuwen, J. DeepSMILE: contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer. Med. Image Anal. 79, 102464 (2022).

  43. Vu, Q. D., Rajpoot, K., Raza, S. E. A. & Rajpoot, N. Handcrafted Histological Transformer (H2T): unsupervised representation of whole slide images. Med. Image Anal. 85, 102743 (2023).

  44. Zhao, Y. et al. Predicting lymph node metastasis using histopathological images based on multiple instance learning with deep graph convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4837–4846 (2020).

  45. Wang, X. et al. RetCCL: clustering-guided contrastive learning for whole-slide image retrieval. Med. Image Anal. 83, 102645 (2023).

  46. Filiot, A. et al. Scaling self-supervised learning for histopathology with masked image modeling. Preprint at https://doi.org/10.1101/2023.07.21.23292757 (2023).

  47. Srinidhi, C. L., Kim, S. W., Chen, F.-D. & Martel, A. L. Self-supervised driven consistency training for annotation efficient histopathology image analysis. Med. Image Anal. 75, 102256 (2022).

  48. Koohbanani, N. A., Unnikrishnan, B., Khurram, S. A., Krishnaswamy, P. & Rajpoot, N. Self-Path: self-supervision for classification of pathology images with limited annotations. IEEE Trans. Med. Imaging 40, 2845–2856 (2021).

  49. Ciga, O., Xu, T. & Martel, A. L. Self supervised contrastive learning for digital histopathology. Mach. Learn. Appl. 7, 100198 (2022).

  50. Lin, T. et al. SGCL: spatial guided contrastive learning on whole-slide pathological images. Med. Image Anal. 89, 102845 (2023).

  51. Tellez, D., Litjens, G., van der Laak, J. & Ciompi, F. Neural image compression for gigapixel histopathology image analysis. IEEE Trans. Pattern Anal. Mach. Intell. 43, 567–578 (2021).

  52. Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).

  53. Jiang, C. et al. Hierarchical discriminative learning improves visual representations of biomedical microscopy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19798–19808 (2023).

  54. Saldanha, O. L. et al. Self-supervised attention-based deep learning for pan-cancer mutation prediction from histopathology. NPJ Precis. Oncol. 7, 35 (2023).

  55. Lu, M. Y. et al. Visual language pretrained multiple instance zero-shot transfer for histopathology images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 19764–19775 (2023).

  56. Mokhtari, R. et al. Interpretable histopathology-based prediction of disease relevant features in inflammatory bowel disease biopsies using weakly-supervised deep learning. In Medical Imaging with Deep Learning 479–495 (PMLR, 2023).

  57. Jaume, G. et al. Modeling dense multimodal interactions between biological pathways and histology for survival prediction. Preprint at https://doi.org/10.48550/arxiv.2304.06819 (2023).

  58. Hörst, F. et al. Histology-based prediction of therapy response to neoadjuvant chemotherapy for esophageal and esophagogastric junction adenocarcinomas using deep learning. JCO Clin. Cancer Inform. 7, e2300038 (2023).

  59. Wagner, S. J. et al. Transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study. Cancer Cell 41, 1650–1661 (2023).

  60. Hörst, F. et al. CellViT: vision transformers for precise cell segmentation and classification. Preprint at https://doi.org/10.48550/arxiv.2306.15350 (2023).

  61. Kaczmarzyk, J. R. et al. ChampKit: a framework for rapid evaluation of deep neural networks for patch-based histopathology classification. Comput. Methods Programs Biomed. 239, 107631 (2023).

  62. Zhang, J. et al. Gigapixel whole-slide images classification using locally supervised learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 192–201 (Springer, 2022).

  63. Nasrallah, M. P. et al. Machine learning for cryosection pathology predicts the 2021 WHO classification of glioma. Med. 4, 526–540 (2023).

  64. Li, H. et al. Task-specific fine-tuning via variational information bottleneck for weakly-supervised pathology whole slide image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7454–7463 (2023).

  65. Ikezogwo, W. O. et al. Quilt-1M: One million image-text pairs for histopathology. In Advances in Neural Information Processing Systems (2023).

  66. Zhang, D. et al. Inferring super-resolution tissue architecture by integrating spatial transcriptomics with histology. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02019-9 (2024).

  67. Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 23, 181–193 (2018).

  68. Komura, D. et al. Universal encoding of pan-cancer histology by deep texture representations. Cell Rep. 38, 110424 (2022).

  69. Kalra, S. et al. Yottixel: an image search engine for large archives of histopathology whole slide images. Med. Image Anal. 65, 101757 (2020).

  70. Schmauch, B. et al. A deep learning model to predict RNA-seq expression of tumours from whole slide images. Nat. Commun. 11, 3877 (2020).

  71. Graham, S. et al. One model is all you need: multi-task learning enables simultaneous histology image segmentation and classification. Med. Image Anal. 83, 102685 (2023).

  72. Diao, J. A. et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat. Commun. 12, 1613 (2021).

  73. Wulczyn, E. et al. Deep learning-based survival prediction for multiple cancer types using histopathology images. PLoS One 15, e0233678 (2020).

  74. Riasatian, A. et al. Fine-tuning and training of DenseNet for histopathology image representation using TCGA diagnostic slides. Med. Image Anal. 70, 102032 (2021).

  75. Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In International Conference on Learning Representations (2021).

  76. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).

  77. Kundra, R. et al. OncoTree: a cancer classification system for precision oncology. JCO Clin. Cancer Inform. 5, 221–230 (2021).

  78. Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).

  79. Campbell, J. D. et al. Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas. Nat. Genet. 48, 607–616 (2016).

  80. Shao, Z. et al. TransMIL: transformer based correlated multiple instance learning for whole slide image classification. In 35th Conference on Neural Information Processing Systems (2021).

  81. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).

  82. Gatta, G. et al. Burden and centralised treatment in Europe of rare tumours: results of RARECAREnet – a population-based study. Lancet Oncol. 18, 1022–1039 (2017).

  83. Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In Proceedings of the 35th International Conference on Machine Learning, 2132–2141 (2018).

  84. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).

  85. Kim, Y. J. et al. PAIP 2019: liver cancer segmentation challenge. Med. Image Anal. 67, 101854 (2021).

  86. Lipkova, J. et al. Deep learning-enabled assessment of cardiac allograft rejection from endomyocardial biopsies. Nat. Med. 28, 575–582 (2022).

  87. Brennan, C. W. et al. The somatic genomic landscape of glioblastoma. Cell 155, 462–477 (2013).

  88. Cancer Genome Atlas Research Network. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372, 2481–2498 (2015).

  89. Roetzer-Pejrimovsky, T. et al. The Digital Brain Tumour Atlas, an open histopathology resource. Sci. Data 9, 55 (2022).

  90. Pati, P. et al. Weakly supervised joint whole-slide segmentation and classification in prostate cancer. Preprint at https://doi.org/10.48550/arxiv.2301.02933 (2023).

  91. Jacovi, A., Caciularu, A., Goldman, O. & Goldberg, Y. Stop uploading test data in plain text: practical strategies for mitigating data contamination by evaluation benchmarks. Preprint at https://doi.org/10.48550/arxiv.2305.10160 (2023).

  92. Magar, I. & Schwartz, R. Data contamination: from memorization to exploitation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 157–165 (2022).

  93. Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

  94. Dodge, J. et al. Documenting large webtext corpora: a case study on the colossal clean crawled corpus. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 1286–1305 (2021).

  95. Kapoor, S. & Narayanan, A. Leakage and the reproducibility crisis in machine-learning-based science. Patterns 4, 100804 (2023).

  96. Xiang, J. & Zhang, J. Exploring low-rank property in multiple instance learning for whole slide image classification. In The Eleventh International Conference on Learning Representations (2022).

  97. Niehues, J. M. et al. Generalizable biomarker prediction from cancer pathology slides with self-supervised deep learning: a retrospective multi-centric study. Cell Rep. Med. 4, 100980 (2023).

  98. Kather, J. N. et al. Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Med. 16, e1002730 (2019).

  99. Pataki, B. Á. et al. HunCRC: annotated pathological slides to enhance deep learning applications in colorectal cancer screening. Sci. Data 9, 370 (2022).

  100. Barbano, C. A. et al. UniToPatho, a labeled histopathological dataset for colorectal polyps classification and adenoma dysplasia grading. In 2021 IEEE International Conference on Image Processing (ICIP), 76–80 (IEEE, 2021).

  101. Huo, X. et al. Comprehensive AI model development for Gleason grading: from scanning, cloud-based annotation to pathologist–AI interaction. Preprint at https://doi.org/10.2139/ssrn.4172090 (2022).

  102. Komura, D. et al. Restaining-based annotation for cancer histology segmentation to overcome annotation-related limitations among pathologists. Patterns 4, 100688 (2023).

  103. Cheng, B., Misra, I., Schwing, A. G., Kirillov, A. & Girdhar, R. Masked-attention mask transformer for universal image segmentation. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).

  104. Fang, Y. et al. EVA: exploring the limits of masked visual representation learning at scale. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19358–19369 (2023).

  105. Wang, Y., Chao, W.-L., Weinberger, K. Q. & van der Maaten, L. SimpleShot: revisiting nearest-neighbor classification for few-shot learning. Preprint at https://doi.org/10.48550/arxiv.1911.04623 (2019).

  106. Snell, J., Swersky, K. & Zemel, R. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems 30 (2017).

  107. Vorontsov, E. et al. Virchow: a million-slide digital pathology foundation model. Preprint at https://doi.org/10.48550/arxiv.2309.07778 (2023).

  108. Campanella, G. et al. Computational pathology at health system scale: self-supervised foundation models from three billion images. Preprint at https://doi.org/10.48550/arxiv.2310.07033 (2023).

  109. Lai, J. et al. Domain-specific optimization and diverse evaluation of self-supervised models for histopathology. Preprint at https://doi.org/10.48550/arxiv.2310.13259 (2023).

  110. Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).

  111. Chen, Z. et al. Vision transformer adapter for dense predictions. In The Eleventh International Conference on Learning Representations (2023).

  112. Wang, X. et al. SCL-WC: cross-slide contrastive learning for weakly-supervised whole-slide image classification. Adv. Neural Inf. Process. Syst. 35, 18009–18021 (2022).

  113. Kolesnikov, A., Zhai, X. & Beyer, L. Revisiting self-supervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1920–1929 (2019).

  114. Chen, R. J. et al. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat. Biomed. Eng. 7, 719–742 (2023).

  115. Lu, M. Y. et al. Towards a visual-language foundation model for computational pathology. Preprint at https://doi.org/10.48550/arxiv.2307.12914 (2023).

  116. Lu, M. Y. et al. A foundational multimodal vision language AI assistant for human pathology. Preprint at https://doi.org/10.48550/arxiv.2312.07814 (2023).

  117. Chen, R. J. et al. Scaling vision transformers to gigapixel images via hierarchical self-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022).

  118. Zhou, J. et al. iBOT: image BERT pre-training with online tokenizer. In International Conference on Learning Representations (2022).

  119. Zhai, X., Oliver, A., Kolesnikov, A. & Beyer, L. S4L: self-supervised semi-supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 1476–1485 (2019).

  120. Bandi, P. et al. From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge. IEEE Trans. Med. Imaging 38, 550–560 (2019).

  121. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (2018).

  122. Tian, K. et al. Designing BERT for convolutional networks: sparse and hierarchical masked modeling. In The Eleventh International Conference on Learning Representations (2023).

  123. Sablayrolles, A., Douze, M., Schmid, C. & Jégou, H. Spreading vectors for similarity search. In International Conference on Learning Representations (2019).

  124. Touvron, H., Vedaldi, A., Douze, M. & Jegou, H. Fixing the train–test resolution discrepancy. In Advances in Neural Information Processing Systems, Vol. 32 (eds Wallach, H. et al.) (Curran Associates, 2019).

  125. Dao, T., Fu, D. Y., Ermon, S., Rudra, A. & Ré, C. FlashAttention: fast and memory-efficient exact attention with IO-awareness. In Advances in Neural Information Processing Systems (2022).

  126. Liu, Z. et al. Swin transformer: hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 10012–10022 (2021).

  127. Kolesnikov, A. et al. Big Transfer (BiT): general visual representation learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, 491–507 (Springer, 2020).

  128. Lin, T., Yu, Z., Hu, H., Xu, Y. & Chen, C.-W. Interventional bag multi-instance learning on whole-slide pathological images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19830–19839 (2023).

  129. Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (2019).

  130. Bentley, J. L. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 509–517 (1975).

  131. Zhu, C., Byrd, R. H., Lu, P. & Nocedal, J. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw. 23, 550–560 (1997).

  132. Sarıyıldız, M. B., Kalantidis, Y., Alahari, K. & Larlus, D. No reason for no supervision: improved generalization in supervised models. In The Eleventh International Conference on Learning Representations (2023).

  133. Fang, Z. et al. SEED: self-supervised distillation for visual representation. In International Conference on Learning Representations (2020).

  134. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  135. Ghiasi, G. et al. Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2918–2928 (2021).

  136. El Banani, M., Desai, K. & Johnson, J. Learning visual representations via language-guided sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19208–19220 (2023).

  137. Koch, G., Zemel, R. & Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the 32nd International Conference on Machine Learning (2015).

  138. Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K. & Wierstra, D. Matching networks for one shot learning. In Advances in Neural Information Processing Systems 29 (2016).

  139. Yu, J.-G. et al. Prototypical multiple instance learning for predicting lymph node metastasis of breast cancer from whole-slide pathological images. Med. Image Anal. 85, 102748 (2023).

  140. Yu, Z., Lin, T. & Xu, Y. SLPD: slide-level prototypical distillation for WSIs. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 259–269 (Springer, 2023).

  141. Quiros, A. C. et al. Mapping the landscape of histomorphological cancer phenotypes using self-supervised learning on unlabeled, unannotated pathology slides. Preprint at https://doi.org/10.48550/arxiv.2205.01931 (2022).

  142. Yang, J., Chen, H., Yan, J., Chen, X. & Yao, J. Towards better understanding and better generalization of low-shot classification in histology images with contrastive learning. In International Conference on Learning Representations (2021).

  143. Tian, Y., Wang, Y., Krishnan, D., Tenenbaum, J. B. & Isola, P. Rethinking few-shot image classification: a good embedding is all you need? In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, 266–282 (Springer, 2020).

  144. Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982).

  145. Zhu, X., Yao, J., Zhu, F. & Huang, J. WSISA: making survival prediction from whole slide histopathological images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7234–7242 (2017).

  146. Yao, J., Zhu, X. & Huang, J. Deep multi-instance learning for survival prediction from whole slide images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 496–504 (Springer, 2019).

  147. Yao, J., Zhu, X., Jonnagaddala, J., Hawkins, N. & Huang, J. Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks. Med. Image Anal. 65, 101789 (2020).

  148. Li, R., Yao, J., Zhu, X., Li, Y. & Huang, J. Graph CNN for survival analysis on whole slide pathological images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 174–182 (Springer, 2018).

  149. Sivic, J. & Zisserman, A. Video Google: a text retrieval approach to object matching in videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision, 1470–1477 (IEEE, 2003).

  150. Fei-Fei, L. & Perona, P. A Bayesian hierarchical model for learning natural scene categories. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) Vol. 2, 524–531 (IEEE, 2005).

  151. Cruz-Roa, A., Caicedo, J. C. & González, F. A. Visual pattern mining in histology image collections using bag of features. Artif. Intell. Med. 52, 91–106 (2011).

  152. Xu, Y. et al. Weakly supervised histopathology cancer image segmentation and classification. Med. Image Anal. 18, 591–604 (2014).

  153. Chen, C. et al. Fast and scalable search of whole-slide images via self-supervised deep learning. Nat. Biomed. Eng. 6, 1420–1434 (2022).

  154. Gillette, M. A. et al. Proteogenomic characterization reveals therapeutic vulnerabilities in lung adenocarcinoma. Cell 182, 200–225 (2020).

  155. Satpathy, S. et al. A proteogenomic portrait of lung squamous cell carcinoma. Cell 184, 4348–4371 (2021).

  156. Zhu, M. et al. Development and evaluation of a deep neural network for histologic classification of renal cell carcinoma on biopsy and surgical resection slides. Sci. Rep. 11, 7080 (2021).

  157. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).

  158. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of papillary renal-cell carcinoma. N. Engl. J. Med. 374, 135–145 (2016).

  159. Davis, C. F. et al. The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 26, 319–330 (2014).

  160. Li, Y. et al. Histopathologic and proteogenomic heterogeneity reveals features of clear cell renal cell carcinoma aggressiveness. Cancer Cell 41, 139–163 (2023).

  161. Brancati, N. et al. BRACS: a dataset for breast carcinoma subtyping in H&E histology images. Database 2022, baac093 (2022).

  162. Bulten, W. et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 21, 233–241 (2020).

  163. Veeling, B. S., Linmans, J., Winkens, J., Cohen, T. & Welling, M. Rotation equivariant CNNs for digital pathology. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16–20, 2018, Proceedings, Part II 11, 210–218 (Springer, 2018).

  164. Koh, P. W. et al. WILDS: a benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, 5637–5664 (PMLR, 2021).

  165. Aresta, G. et al. BACH: grand challenge on breast cancer histology images. Med. Image Anal. 56, 122–139 (2019).

  166. Brummer, O., Pölönen, P., Mustjoki, S. & Brück, O. Computational textural mapping harmonises sampling variation and reveals multidimensional histopathological fingerprints. Br. J. Cancer 129, 683–695 (2023).

  167. Tolkach, Y. et al. Artificial intelligence for tumour tissue detection and histological regression grading in oesophageal adenocarcinomas: a retrospective algorithm development and validation study. Lancet Digit. Health 5, e265–e275 (2023).

  168. Howard, F. M. et al. The impact of site-specific digital histology signatures on deep learning model accuracy and bias. Nat. Commun. 12, 4423 (2021).

  169. Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. In 2009 IEEE international Symposium on Biomedical Imaging: From Nano to Macro, 1107–1110 (IEEE, 2009).

  170. Abousamra, S. et al. Deep learning-based mapping of tumor infiltrating lymphocytes in whole slide images of 23 types of cancer. Front. Oncol. 11, 806603 (2022).

  171. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (2019).

  172. Goldberger, A. L. et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101, e215–e220 (2000).

  173. Azizi, S. et al. Medical AI research foundations: a repository of medical foundation models (version 1.0.0). PhysioNet https://doi.org/10.13026/grp0-z205 (2023).

  174. Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y. & Girshick, R. Detectron2. GitHub https://github.com/facebookresearch/detectron2 (2019).

  175. Clark, K. et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26, 1045–1057 (2013).

Acknowledgements

We thank J. Zhou and T. Darcet for providing insights into the training dynamics for iBOT and DINOv2, respectively, and L. Beyer for providing insights and feedback on evaluating self-supervised models. This work was supported in part by the BWH president’s fund, BWH & MGH Pathology, and National Institutes of Health (NIH) NIGMS R35GM138216 (F.M.). G.G. was supported by the BWH President’s Scholar Award, NIGMS R35GM149270, NIDDK P30DK034854, and the Massachusetts Life Sciences Center. R.J.C., D.S. and S.S. were supported by the NSF Graduate Research Fellowship. T.D. was supported by the Harvard SEAS Fellowship. M.Y.L. was supported by the Siebel Scholars program. D.F.K.W. was supported by the NIH NCI Ruth L. Kirschstein National Service Award, T32CA251062. L.O. was supported by the German Academic Exchange (DAAD) Fellowship. We also thank T. Janicki, R. Kenny and the system administration staff at the MGB Enterprise Research Infrastructure & Services (ERIS) Research Computing Core for their support with computing resources, and N. Vatanian, M. Thiagarajan, B. Fevrier-Sullivan and J. Kirby at the NIH for navigating access to whole-slide imaging data in CPTAC.

Author information

Contributions

R.J.C., F.M., M.Y.L., T.D. and D.F.K.W. conceived the study and designed the experiments. R.J.C., L.P.L., D.F.K.W., J.J.W., T.D., M.Y.L., G.J., A.H.S., B.C., D.S., M.S., L.O., A.Z., A.V. and S.S. collected the data for self-supervised learning. R.J.C., T.D. and M.Y.L. performed model development for self-supervised learning. R.J.C., M.Y.L., T.D., B.C. and G.J. organized the datasets and codebases for all downstream tasks regarding ROI classification, ROI segmentation and slide classification. R.J.C., T.D., M.Y.L., A.H.S., G.J., M.S., A.Z., L.L.W. and A.V. performed quality control of the codebase and the results. R.J.C., M.Y.L., T.D. and G.J. carried out analysis of the ROI classification. T.D., M.Y.L., R.J.C., L.L.W., A.Z. and W.W. carried out analysis of the ROI segmentation. R.J.C., T.D., B.C., D.S., M.S., M.W. and L.L.W. carried out analysis of the slide classification. R.J.C., T.D., M.Y.L., D.F.K.W., G.J., A.H.S., M.S., L.P.L., G.G. and F.M. interpreted the results and provided feedback on the study. R.J.C., T.D., M.Y.L., D.F.K.W. and F.M. prepared the paper with input from all co-authors. F.M. supervised the research.

Corresponding author

Correspondence to Faisal Mahmood.

Ethics declarations

Competing interests

R.J.C., M.Y.L. and F.M. are inventors on a provisional US patent (application no. 63/611,059) covering the methodological aspects of this work. The other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Andrew Beck, Francesco Ciompi and Lee Cooper for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Few-shot slide classification.

To study the label efficiency of UNI in slide classification, we compare UNI with other pretrained encoders on: a. breast metastasis detection in CAMELYON16, b. NSCLC subtyping in CPTAC (trained on TCGA), c. RCC subtyping in CPTAC-DHMC (trained on TCGA), d. RCC subtyping in DHMC, e. BRCA coarse-grained subtyping in BRACS, f. BRCA fine-grained subtyping in BRACS, g. CRC screening in HunCRC, h. prostate ISUP grading in PANDA, i. glioma IDH1 prediction in EBRAINS (trained on TCGA), j. glioma histomolecular subtyping in EBRAINS (trained on TCGA), k. brain tumor coarse-grained subtyping in EBRAINS, l. brain tumor fine-grained subtyping in EBRAINS, and m. heart transplant assessment in BWH-EMB. Performance is measured across different few-shot settings with K ∈ {1, 2, 4, 8, 16, 32} training examples used per class. Boxes indicate quartile values of model performance (n = 5 runs) and whiskers extend to data points within 1.5 × the interquartile range. Overall, we observe that UNI consistently demonstrates superior label efficiency over the other baselines.
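
For orientation, these slide-level experiments train a weakly supervised aggregator on frozen patch embeddings; per the rest of the paper, the aggregator is attention-based multiple instance learning (ABMIL; see the caption of Extended Data Fig. 9 and ref. 83). A minimal gated-attention pooling module of that family is sketched below; the hidden width and single-layer classifier are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch of gated attention-based MIL pooling (Ilse et al., ref. 83):
# score each patch, softmax over the slide, and classify the pooled embedding.
import torch
import torch.nn as nn

class GatedABMIL(nn.Module):
    def __init__(self, in_dim: int = 1024, hid_dim: int = 256, n_classes: int = 2):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hid_dim, 1)
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, h: torch.Tensor):
        # h: (n_patches, in_dim) embeddings from a frozen encoder such as UNI
        a = self.attn_w(self.attn_v(h) * self.attn_u(h))  # (n_patches, 1) attention logits
        a = torch.softmax(a, dim=0)                       # normalize over the slide's patches
        slide = (a * h).sum(dim=0)                        # (in_dim,) attention-pooled slide embedding
        return self.classifier(slide), a                  # class logits and attention weights

logits, attn = GatedABMIL()(torch.randn(500, 1024))       # e.g. a slide with 500 patches
```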

Extended Data Fig. 2 Comparing supervised performance on PRAD tissue classification in AGGC.

Qualitative illustrations comparing UNI to CTransPath, REMEDIS and ResNet-50 (IN) via KNN probing on PRAD tissue classification in AGGC. UNI achieves better accuracy (acc.) on all three examples. The reported results are based on partial annotations (left-most panel) provided by pathologists.

Extended Data Fig. 3 ROI retrieval.

We evaluate content-based image retrieval on ROI-level tasks with at least 5 classes: a. CRC tissue classification in CRC-100K, b. CRC tissue classification in HunCRC, c. ESCA subtyping on CHA (trained on UKK, WNS and TCGA), d. PRAD tissue classification in AGGC, e. CRC polyp classification in UniToPatho, and f. pan-cancer tissue classification in TCGA, with UNI consistently outperforming all other pretrained encoders. Error bars represent 95% confidence intervals and the center is the computed value of the corresponding retrieval metric. Detailed performance metrics are provided in Supplementary Tables 63–68.
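
For concreteness, a nearest-neighbor reading of this retrieval setup is sketched below: database and query ROIs are embedded with a frozen encoder, ranked by cosine similarity, and a query counts as a hit if a correct-class ROI appears among its top K neighbors. The function name and the exact accuracy criterion are ours; the paper reports several retrieval metrics (see the cited Supplementary Tables).

```python
# Illustrative retrieval sketch: rank database ROIs by cosine similarity of
# L2-normalized embeddings and check for a same-class hit in the top K.
import numpy as np

def retrieval_acc_at_k(query_emb, query_lbl, db_emb, db_lbl, k=5):
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = db_emb / np.linalg.norm(db_emb, axis=1, keepdims=True)
    sims = q @ d.T                                   # (n_query, n_db) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]          # indices of the k nearest database ROIs
    hits = (db_lbl[topk] == query_lbl[:, None]).any(axis=1)
    return hits.mean()                               # fraction of queries with a correct top-k hit
```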

Extended Data Fig. 4 ROI classification across different image resolutions.

To assess how image resolution affects performance, we compare UNI and other baselines on various resized and center-cropped ROIs for a. BRCA subtyping and b. CRC polyp classification tasks. The original image sizes are 2048 × 1536 and 1812 × 1812 pixels, respectively. All models are evaluated on linear, SimpleShot (1-NN), and KNN (20-NN) probe settings. UNI consistently outperforms all baselines across all resolutions. The performance metrics are further provided in Supplementary Tables 45, 46, 51, 52.

Extended Data Fig. 5 Multi-head self-attention (MHSA) heatmap visualization of UNI across different image resolutions for BRCA subtyping in BACH.

Each colored square represents a 16 × 16 patch token encoded by UNI, with heatmap color corresponding to the attention weight of that patch token to the global [CLS] token of the penultimate layer in UNI. We show MHSA visualizations for resized and center-cropped ROIs at 224², 448², 896² and 1,344² resolutions for the a. normal, b. benign, c. in situ, and d. invasive classes in BACH. In each, the left-most image is the original H&E ROI and the right four images are the MHSA visualizations. For comparative purposes, we resize all images within the figure to have the same dimension, but note that at higher resolutions, each colored square has an original image resolution of 16 × 16 pixels at 0.42 mpp. As the resolution increases, the heatmaps demonstrate increasing and increasingly fine-grained attention focused on epithelial structures, with relatively lower attention on stroma and other background, neither of which contributes to the diagnoses in these ROIs.
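
Heatmaps of this kind can be reproduced approximately by reading out the [CLS]-to-patch attention of a vision transformer. The sketch below is not the authors' plotting code: it assumes a timm-style VisionTransformer with a fused qkv projection, a [CLS] token at index 0 and no extra register tokens, and it recomputes the penultimate block's attention weights from a forward hook on that block's first normalization layer.

```python
# Sketch (our own recomputation): recover head-averaged [CLS]-to-patch attention
# of the penultimate block of a timm VisionTransformer as a patch-grid heatmap.
import torch

@torch.inference_mode()
def cls_attention_map(model, x):
    # x: (1, 3, H, W) preprocessed image; model: e.g. UNI loaded via timm
    feats = {}
    block = model.blocks[-2]  # penultimate layer, as in the figure caption
    handle = block.norm1.register_forward_hook(lambda m, i, o: feats.update(t=o))
    model(x)
    handle.remove()
    t = feats["t"]            # (1, n_tokens, dim) tokens entering the attention layer
    attn = block.attn
    qkv = attn.qkv(t).reshape(1, t.shape[1], 3, attn.num_heads, -1).permute(2, 0, 3, 1, 4)
    q, k = qkv[0], qkv[1]     # each (1, heads, n_tokens, head_dim)
    w = (q @ k.transpose(-2, -1) * attn.scale).softmax(dim=-1)
    cls_to_patch = w[0, :, 0, 1:]                 # per-head [CLS] attention over patch tokens
    side = int(cls_to_patch.shape[-1] ** 0.5)     # assumes a square patch grid
    return cls_to_patch.mean(0).reshape(side, side)  # average heads -> (side, side) heatmap
```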

Extended Data Fig. 6 Multi-head self-attention (MHSA) heatmap visualization of UNI across different image resolutions for CRC polyp classification in UniToPatho.

Each colored square represents a 16 × 16 patch token encoded by UNI, with heatmap color corresponding to the attention weight of that patch token to the global [CLS] token of the penultimate layer in UNI. We show MHSA visualizations for resized and center-cropped ROIs at 224², 448², 896² and 1,792² resolutions for a. normal tissue, b. hyperplastic polyp, c. tubular adenoma with low-grade dysplasia, d. tubular adenoma with high-grade dysplasia, e. tubulo-villous adenoma with high-grade dysplasia, and f. tubulo-villous adenoma with low-grade dysplasia. In each, the left-most image is the original H&E ROI and the right four images are the MHSA visualizations. For comparative purposes, we resize all images within the figure to have the same dimension, but note that at higher resolutions, each colored square has an original image resolution of 16 × 16 pixels at 0.48 mpp. As resolution increases, the heatmaps demonstrate increasing and increasingly fine-grained attention focused on the crypts, in all cases except the hyperplastic polyp in b, focusing on areas a pathologist would use to make the diagnosis.

Extended Data Fig. 7 Visualizing segmentation results in SegPath.

Using the Mask2Former head, we visualize the tissue segmentation of each class in SegPath created by all pretrained encoders. Overall, we find that UNI is competitive with convolutional and hierarchical models like CTransPath and REMEDIS in matching the segmentation masks obtained via immunofluorescence and DAPI nuclear staining.

Extended Data Fig. 8 Few-shot ROI classification using class prototypes.

Similar to slide-level classification, we also assess the label efficiency of UNI on ROI-level tasks, and observe superior label efficiency of UNI on most tasks except CRC tissue classification in HunCRC. We evaluate all pretrained encoders using the nonparametric SimpleShot framework for a. CRC tissue classification in CRC-100K, b. breast metastasis detection in CAMELYON17-WILDS, c. RCC tissue classification on HEL (trained on TCGA), d. BRCA subtyping in BACH, e. CRC tissue classification in HunCRC, f. ESCA subtyping on CHA (UKK+WNS+TCGA), g. PRAD tissue classification in AGGC, h. CRC polyp classification in UniToPatho, i. CRC MSI screening in TCGA, j. pan-cancer tissue classification in TCGA, and k. pan-cancer TIL detection in TCGA. Performance is measured across different few-shot settings with K ∈ {1, 2, 4, 8, 16, 32, 64, 128, 256} training examples used per class (the support set size). Boxes indicate quartile values of model performance (n = 1000 runs) and whiskers extend to data points within 1.5 × the interquartile range.
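
SimpleShot (ref. 105) itself is small enough to state in a few lines: embeddings are mean-centered and L2-normalized, each class prototype is the mean of its support embeddings, and each query is assigned to the nearest prototype. A minimal sketch under those assumptions (variable names are ours):

```python
# Minimal SimpleShot-style sketch (Wang et al., ref. 105): center, normalize,
# average K support examples per class into a prototype, classify by distance.
import numpy as np

def simpleshot(support_x, support_y, query_x):
    mu = support_x.mean(axis=0)                    # feature mean estimated from the support set
    def norm(z):
        z = z - mu                                 # mean-centering
        return z / np.linalg.norm(z, axis=1, keepdims=True)  # L2 normalization
    s, q = norm(support_x), norm(query_x)
    classes = np.unique(support_y)
    protos = np.stack([s[support_y == c].mean(axis=0) for c in classes])
    dists = ((q[:, None, :] - protos[None, :, :]) ** 2).sum(-1)  # squared Euclidean distance
    return classes[dists.argmin(axis=1)]           # nearest-prototype label per query
```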

Extended Data Fig. 9 Few-shot slide classification using class prototypes.

We adapt the SimpleShot framework to slide-level classification, which we call 'MI-SimpleShot'. ROI class prototypes are constructed by averaging the pre-extracted ROI features of each class in the 'TCGA Uniform Tumor' dataset, and are used as 'prompts' for assigning the slide-level label. We assess and compare the few-shot performance of all pretrained encoders on NSCLC subtyping (a) and RCC subtyping (b), using the same runs (n = 5) as the ABMIL few-shot setting with K ∈ {1, 2, 4, 8, 16, 32} training examples used per class. We compare the performance of top-5 and top-50 pooling of the nearest patches in the test set, and show performance on both the internal test fold in TCGA and the external cohort. Boxes indicate quartile values of model performance (n = 5 runs) and whiskers extend to data points within 1.5 × the interquartile range. Overall, we observe that the prototypes formed by UNI can be used to classify slides with the MI-SimpleShot framework. a. On NSCLC subtyping, the 2-shot and 4-shot performance of UNI outperforms the 32-shot performance of all other models. b. On RCC subtyping, the 1-shot performance of UNI also outperforms the 32-shot performance of other models. MI-SimpleShot can also be combined with other pretrained encoders, which, however, generally require more annotated ROIs for creating prototypes.
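
A compact reading of MI-SimpleShot as described in this caption: each slide's patch embeddings are scored against the ROI class prototypes, the top-k similarities per class are averaged, and the slide takes the best-scoring class. The sketch below assumes cosine similarity and mean-pooled prototypes; variable names are ours, and k follows the caption (top-5 or top-50).

```python
# Sketch of the MI-SimpleShot idea: pool the top-k patch-to-prototype cosine
# similarities per class and predict the class with the highest pooled score.
import numpy as np

def mi_simpleshot(slide_patches, prototypes, k=5):
    # slide_patches: (n_patches, d) embeddings; prototypes: (n_classes, d) class means
    p = slide_patches / np.linalg.norm(slide_patches, axis=1, keepdims=True)
    c = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = p @ c.T                                 # (n_patches, n_classes) cosine similarities
    topk = np.sort(sims, axis=0)[-k:]              # k highest-scoring patches per class
    return topk.mean(axis=0).argmax()              # index of the best-scoring class prototype
```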

Extended Data Fig. 10 Comparing 1-shot similarity heatmaps of pretrained encoders with class prototype.

We compare the similarity heatmaps of all pretrained encoders using annotated ROIs from a single slide per class to form class prototypes in MI-SimpleShot (with top-5 pooling) on NSCLC subtyping (a) and RCC subtyping (b); the top rows visualize example ROIs used for each class and the bottom rows show similarity heatmaps. Outlined in blue are pathologist annotations of ROIs that match the label of the histology slide. Similarity heatmaps are created with respect to the class prototype of the correct slide label (indicated in green), with ✓ indicating a correct prediction and ✗ indicating an incorrect prediction. Note that since the visualizations are created with respect to the ground-truth label, a model may retrieve correct patches that match pathologist annotations but still misclassify the slide. a. On a LUAD slide, we observe strong agreement between the pathologist's annotations and the LUAD patches retrieved by UNI. Although the patches retrieved by REMEDIS also agree strongly with the pathologist's annotations, the slide was misclassified as LUSC, indicating that the top-5 similarity scores for the LUSC prototype were higher than those for the LUAD prototype. Conversely, ResNet-50 (IN) classifies the slide correctly but fails to retrieve patches that agree with the pathologist's annotations, indicating that non-LUAD patches in the slide were more similar to the LUAD prototype than the pathologist-annotated LUAD patches. The similarity heatmap for CTransPath both misclassifies the slide and retrieves incorrect patches. b. On a CCRCC slide, we observe strong agreement between the pathologist's annotations and the CCRCC patches retrieved by UNI. We observe a similar mismatch between predicted class label and retrieved patches, in which REMEDIS classifies the slide correctly but retrieves incorrect patches, and CTransPath misclassifies the slide but retrieves correct patches.

Supplementary information

Supplementary Information

Supplementary Tables 1–73.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, R.J., Ding, T., Lu, M.Y. et al. Towards a general-purpose foundation model for computational pathology. Nat Med 30, 850–862 (2024). https://doi.org/10.1038/s41591-024-02857-3
