Towards a general-purpose foundation model for computational pathology

Abstract

Quantitative evaluation of tissue images is crucial for computational pathology (CPath) tasks, requiring the objective characterization of histopathological entities from whole-slide images (WSIs). The high resolution of WSIs and the variability of morphological features present significant challenges, complicating the large-scale annotation of data for high-performance applications. To address this challenge, current efforts have proposed the use of pretrained image encoders through transfer learning from natural image datasets or self-supervised learning on publicly available histopathology datasets, but such encoders have not been extensively developed and evaluated across diverse tissue types at scale. We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs (>77 TB of data) across 20 major tissue types. The model was evaluated on 34 representative CPath tasks of varying diagnostic difficulty. In addition to outperforming previous state-of-the-art models, we demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types in the OncoTree classification system. UNI advances unsupervised representation learning at scale in CPath in terms of both pretraining data and downstream evaluation, enabling data-efficient artificial intelligence models that can generalize and transfer to a wide range of diagnostically challenging tasks and clinical workflows in anatomic pathology.

Fig. 1: Overview of UNI.
Fig. 2: Slide-level tasks for OT-43 and OT-108, and slide-level task performance.
Fig. 3: ROI-level tasks.
Fig. 4: Few-shot ROI- and slide-level prototyping.

Data availability

TCGA and CPTAC data consisting of whole-slide images and labels can be accessed through the NIH Genomic Data Commons (https://portal.gdc.cancer.gov) and Proteomic Data Commons (https://proteomic.datacommons.cancer.gov), respectively. GTEx data added to the pretraining dataset can be accessed through the GTEx portal (https://www.gtexportal.org/home/). All other publicly available datasets analyzed in this work can be accessed through their respective data portals: CRC-100K (https://zenodo.org/record/1214456), HunCRC ROIs (10.6084/m9.figshare.c.5927795.v1), HunCRC slides (10.7937/tcia.9cjf-0127), BACH (https://iciar2018-challenge.grand-challenge.org/Dataset/), TCGA CRC-MSI (https://zenodo.org/record/3832231), CCRCC tissue classification (https://zenodo.org/record/7898308), TCGA-TILs (https://zenodo.org/record/6604094), TCGA Uniform (https://zenodo.org/record/5889558), UniToPatho (https://zenodo.org/record/4643645), ESCA (https://zenodo.org/record/7548828), CAMELYON17-WILDS (https://wilds.stanford.edu/datasets), EBRAINS (10.25493/WQ48-ZGX), DHMC (https://bmirds.github.io/KidneyCancer), BRACS (https://bracs.icar.cnr.it), PANDA (https://panda.grand-challenge.org), SegPath (https://zenodo.org/record/7412731) and AGGC (https://zenodo.org/record/6460100). TCGA, CPTAC, HunCRC and TCGA-TILs can also be accessed through The Cancer Imaging Archive (ref. 175). Links for all datasets are also listed in Supplementary Table 73. We note that data from AGGC were obtained from a public grand challenge of the same name (https://aggc22.grand-challenge.org) with a pending publication (ref. 101), with permission granted by the challenge organizers to present results from this dataset. No internal patient data were specifically collected for this study; this study relies on retrospective analysis of anonymized whole-slide images. Following institutional policies, all requests for data collected or curated in-house will be evaluated on a case-by-case basis to determine whether the data requested and the use case comply with intellectual property or patient privacy obligations.

Code availability

Code and model weights for UNI can be accessed for academic research purposes at https://github.com/mahmoodlab/UNI. We have documented all technical deep learning methods and software libraries used in the study while ensuring that the paper is accessible to the broader clinical and scientific audience.
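
As a brief orientation for readers who want to use the released model, the sketch below loads UNI as a frozen feature extractor. It assumes the timm/Hugging Face Hub loading path described in the repository README; the hub id, initialization arguments and preprocessing constants shown here should be verified against https://github.com/mahmoodlab/UNI (access to the weights is gated and must be requested first).

```python
# Minimal sketch (assumed API; check the UNI README): extract a ROI embedding
# with the released ViT-L/16 weights via timm and the Hugging Face Hub.
import timm
import torch
from PIL import Image
from torchvision import transforms

model = timm.create_model(
    "hf-hub:MahmoodLab/UNI",   # assumed hub id; confirm against the README
    pretrained=True,           # requires `huggingface-cli login` and access approval
    init_values=1e-5,          # LayerScale init used during DINOv2 pretraining
    dynamic_img_size=True,     # accept inputs larger than 224 x 224
)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

roi = preprocess(Image.open("roi.png").convert("RGB")).unsqueeze(0)  # (1, 3, 224, 224)
with torch.inference_mode():
    embedding = model(roi)  # (1, 1024) [CLS] feature for linear probes, MIL or retrieval
```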

References

  1. Song, A. H. et al. Artificial intelligence for digital and computational pathology. Nat. Rev. Bioeng. 1, 930–949 (2023).

  2. Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V. & Madabhushi, A. Artificial intelligence in digital pathology: new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715 (2019).

  3. Lipkova, J. et al. Artificial intelligence for multimodal data integration in oncology. Cancer Cell 40, 1095–1110 (2022).

  4. Heinz, C. N., Echle, A., Foersch, S., Bychkov, A. & Kather, J. N. The future of artificial intelligence in digital pathology: results of a survey across stakeholder groups. Histopathology 80, 1121–1127 (2022).

  5. Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).

  6. Kather, J. N. et al. Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer. Nat. Med. 25, 1054–1056 (2019).

  7. Mobadersany, P. et al. Predicting cancer outcomes from histology and genomics using convolutional networks. Proc. Natl Acad. Sci. USA 115, E2970–E2979 (2018).

  8. Amgad, M. et al. A population-level digital histologic biomarker for enhanced prognosis of invasive breast cancer. Nat. Med. 30, 85–97 (2024).

  9. Chen, R. J. et al. Pan-cancer integrative histology-genomic analysis via multimodal deep learning. Cancer Cell 40, 865–878 (2022).

  10. Vanguri, R. S. et al. Multimodal integration of radiology, pathology and genomics for prediction of response to PD-(L)1 blockade in patients with non-small cell lung cancer. Nat. Cancer 3, 1151–1164 (2022).

  11. Cooper, M., Ji, Z. & Krishnan, R. G. Machine learning in computational histopathology: challenges and opportunities. Genes Chromosomes Cancer 62, 540–556 (2023).

  12. Graham, S. et al. Screening of normal endoscopic large bowel biopsies with interpretable graph learning: a retrospective study. Gut 72, 1709–1721 (2023).

  13. Ozyoruk, K. B. et al. A deep-learning model for transforming the style of tissue images from cryosectioned to formalin-fixed and paraffin-embedded. Nat. Biomed. Eng. 6, 1407–1419 (2022).

  14. Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).

  15. Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).

  16. Kather, J. N. et al. Pan-cancer image-based detection of clinically actionable genetic alterations. Nat. Cancer 1, 789–799 (2020).

  17. Fu, Y. et al. Pan-cancer computational histopathology reveals mutations, tumor composition and prognosis. Nat. Cancer 1, 800–810 (2020).

  18. Bulten, W. et al. Artificial intelligence for diagnosis and Gleason grading of prostate cancer: the PANDA challenge. Nat. Med. 28, 154–163 (2022).

  19. Foersch, S. et al. Multistain deep learning for prediction of prognosis and therapy response in colorectal cancer. Nat. Med. 29, 430–439 (2023).

  20. Chen, R. J. et al. Multimodal co-attention transformer for survival prediction in gigapixel whole slide images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 4015–4025 (2021).

  21. He, K. et al. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 16000–16009 (2022).

  22. Oquab, M. et al. DINOv2: learning robust visual features without supervision. Preprint at https://doi.org/10.48550/arxiv.2304.07193 (2023).

  23. Balestriero, R. et al. A cookbook of self-supervised learning. Preprint at https://doi.org/10.48550/arxiv.2304.12210 (2023).

  24. Chen, X., Xie, S. & He, K. An empirical study of training self-supervised vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (2021).

  25. Caron, M. et al. Emerging properties in self-supervised vision transformers. In Proceedings of the IEEE/CVF international Conference on Computer Vision, 9650–9660 (2021).

  26. Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, 1597–1607 (PMLR, 2020).

  27. Grill, J.-B. et al. Bootstrap your own latent: a new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 33, 21271–21284 (2020).

  28. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–255 (IEEE, 2009).

  29. Russakovsky, O. et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).

  30. Sun, C., Shrivastava, A., Singh, S. & Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE International Conference on Computer Vision, 843–852 (2017).

  31. Zhai, X., Kolesnikov, A., Houlsby, N. & Beyer, L. Scaling vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12104–12113 (2022).

  32. Goyal, P., Mahajan, D., Gupta, A. & Misra, I. Scaling and benchmarking self-supervised visual representation learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 6391–6400 (2019).

  33. Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://doi.org/10.48550/arxiv.2108.07258 (2021).

  34. Yuan, L. et al. Florence: a new foundation model for computer vision. Preprint at https://doi.org/10.48550/arxiv.2111.11432 (2021).

  35. Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

  36. Chen, R. J. & Krishnan, R. G. Self-supervised vision transformers learn visual concepts in histopathology. In Learning Meaningful Representations of Life, NeurIPS 2021 (2022).

  37. Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).

  38. Azizi, S. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng. 7, 756–779 (2023).

  39. Kang, M., Song, H., Park, S., Yoo, D. & Pereira, S. Benchmarking self-supervised learning on diverse pathology datasets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3344–3354 (2023).

  40. Li, B., Li, Y. & Eliceiri, K. W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14318–14328 (2021).

  41. Lazard, T., Lerousseau, M., Decencière, E. & Walter, T. Giga-SSL: self-supervised learning for gigapixel images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4304–4313 (2023).

  42. Schirris, Y., Gavves, E., Nederlof, I., Horlings, H. M. & Teuwen, J. DeepSMILE: contrastive self-supervised pre-training benefits MSI and HRD classification directly from H&E whole-slide images in colorectal and breast cancer. Med. Image Anal. 79, 102464 (2022).

  43. Vu, Q. D., Rajpoot, K., Raza, S. E. A. & Rajpoot, N. Handcrafted Histological Transformer (H2T): unsupervised representation of whole slide images. Med. Image Anal. 85, 102743 (2023).

  44. Zhao, Y. et al. Predicting lymph node metastasis using histopathological images based on multiple instance learning with deep graph convolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4837–4846 (2020).

  45. Wang, X. et al. RetCCL: clustering-guided contrastive learning for whole-slide image retrieval. Med. Image Anal. 83, 102645 (2023).

  46. Filiot, A. et al. Scaling self-supervised learning for histopathology with masked image modeling. Preprint at https://doi.org/10.1101/2023.07.21.23292757 (2023).

  47. Srinidhi, C. L., Kim, S. W., Chen, F.-D. & Martel, A. L. Self-supervised driven consistency training for annotation efficient histopathology image analysis. Med. Image Anal. 75, 102256 (2022).

  48. Koohbanani, N. A., Unnikrishnan, B., Khurram, S. A., Krishnaswamy, P. & Rajpoot, N. Self-Path: self-supervision for classification of pathology images with limited annotations. IEEE Trans. Med. Imaging 40, 2845–2856 (2021).

  49. Ciga, O., Xu, T. & Martel, A. L. Self supervised contrastive learning for digital histopathology. Mach. Learn. Appl. 7, 100198 (2022).

  50. Lin, T. et al. SGCL: spatial guided contrastive learning on whole-slide pathological images. Med. Image Anal. 89, 102845 (2023).

  51. Tellez, D., Litjens, G., van der Laak, J. & Ciompi, F. Neural image compression for gigapixel histopathology image analysis. IEEE Trans. Pattern Anal. Mach. Intell. 43, 567–578 (2021).

  52. Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).

  53. Jiang, C. et al. Hierarchical discriminative learning improves visual representations of biomedical microscopy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19798–19808 (2023).

  54. Saldanha, O. L. et al. Self-supervised attention-based deep learning for pan-cancer mutation prediction from histopathology. NPJ Precis. Oncol. 7, 35 (2023).

  55. Lu, M. Y. et al. Visual language pretrained multiple instance zero-shot transfer for histopathology images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 19764–19775 (2023).

  56. Mokhtari, R. et al. Interpretable histopathology-based prediction of disease relevant features in inflammatory bowel disease biopsies using weakly-supervised deep learning. In Medical Imaging with Deep Learning 479–495 (PMLR, 2023).

  57. Jaume, G. et al. Modeling dense multimodal interactions between biological pathways and histology for survival prediction. Preprint at https://doi.org/10.48550/arxiv.2304.06819 (2023).

  58. Hörst, F. et al. Histology-based prediction of therapy response to neoadjuvant chemotherapy for esophageal and esophagogastric junction adenocarcinomas using deep learning. JCO Clin. Cancer Inform. 7, e2300038 (2023).

  59. Wagner, S. J. et al. Transformer-based biomarker prediction from colorectal cancer histology: a large-scale multicentric study. Cancer Cell 41, 1650–1661 (2023).

  60. Hörst, F. et al. CellViT: vision transformers for precise cell segmentation and classification. Preprint at https://doi.org/10.48550/arxiv.2306.15350 (2023).

  61. Kaczmarzyk, J. R. et al. ChampKit: a framework for rapid evaluation of deep neural networks for patch-based histopathology classification. Comput. Methods Programs Biomed. 239, 107631 (2023).

  62. Zhang, J. et al. Gigapixel whole-slide images classification using locally supervised learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 192–201 (Springer, 2022).

  63. Nasrallah, M. P. et al. Machine learning for cryosection pathology predicts the 2021 WHO classification of glioma. Med. 4, 526–540 (2023).

  64. Li, H. et al. Task-specific fine-tuning via variational information bottleneck for weakly-supervised pathology whole slide image classification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7454–7463 (2023).

  65. Ikezogwo, W. O. et al. Quilt-1M: One million image-text pairs for histopathology. In Advances in Neural Information Processing Systems (2023).

  66. Zhang, D. et al. Inferring super-resolution tissue architecture by integrating spatial transcriptomics with histology. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-02019-9 (2024).

  67. Saltz, J. et al. Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images. Cell Rep. 23, 181–193 (2018).

  68. Komura, D. et al. Universal encoding of pan-cancer histology by deep texture representations. Cell Rep. 38, 110424 (2022).

  69. Kalra, S. et al. Yottixel: an image search engine for large archives of histopathology whole slide images. Med. Image Anal. 65, 101757 (2020).

  70. Schmauch, B. et al. A deep learning model to predict RNA-seq expression of tumours from whole slide images. Nat. Commun. 11, 3877 (2020).

  71. Graham, S. et al. One model is all you need: multi-task learning enables simultaneous histology image segmentation and classification. Med. Image Anal. 83, 102685 (2023).

  72. Diao, J. A. et al. Human-interpretable image features derived from densely mapped cancer pathology slides predict diverse molecular phenotypes. Nat. Commun. 12, 1613 (2021).

  73. Wulczyn, E. et al. Deep learning-based survival prediction for multiple cancer types using histopathology images. PLoS One 15, e0233678 (2020).

  74. Riasatian, A. et al. Fine-tuning and training of DenseNet for histopathology image representation using TCGA diagnostic slides. Med. Image Anal. 70, 102032 (2021).

  75. Dosovitskiy, A. et al. An image is worth 16×16 words: transformers for image recognition at scale. In International Conference on Learning Representations (2021).

  76. GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).

  77. Kundra, R. et al. OncoTree: a cancer classification system for precision oncology. JCO Clin. Cancer Inform. 5, 221–230 (2021).

  78. Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).

  79. Campbell, J. D. et al. Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas. Nat. Genet. 48, 607–616 (2016).

  80. Shao, Z. et al. TransMIL: transformer based correlated multiple instance learning for whole slide image classification. In 35th Conference on Neural Information Processing Systems (2021).

  81. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).

  82. Gatta, G. et al. Burden and centralised treatment in Europe of rare tumours: results of RARECAREnet – a population-based study. Lancet Oncol. 18, 1022–1039 (2017).

  83. Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In Proceedings of the 35th International Conference on Machine Learning, 2132–2141 (2018).

  84. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).

  85. Kim, Y. J. et al. PAIP 2019: liver cancer segmentation challenge. Med. Image Anal. 67, 101854 (2021).

  86. Lipkova, J. et al. Deep learning-enabled assessment of cardiac allograft rejection from endomyocardial biopsies. Nat. Med. 28, 575–582 (2022).

  87. Brennan, C. W. et al. The somatic genomic landscape of glioblastoma. Cell 155, 462–477 (2013).

  88. Cancer Genome Atlas Research Network. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N. Engl. J. Med. 372, 2481–2498 (2015).

  89. Roetzer-Pejrimovsky, T. et al. The Digital Brain Tumour Atlas, an open histopathology resource. Sci. Data 9, 55 (2022).

  90. Pati, P. et al. Weakly supervised joint whole-slide segmentation and classification in prostate cancer. Preprint at https://doi.org/10.48550/arxiv.2301.02933 (2023).

  91. Jacovi, A., Caciularu, A., Goldman, O. & Goldberg, Y. Stop uploading test data in plain text: practical strategies for mitigating data contamination by evaluation benchmarks. Preprint at https://doi.org/10.48550/arxiv.2305.10160 (2023).

  92. Magar, I. & Schwartz, R. Data contamination: from memorization to exploitation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 157–165 (2022).

  93. Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

  94. Dodge, J. et al. Documenting large webtext corpora: a case study on the colossal clean crawled corpus. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 1286–1305 (2021).

  95. Kapoor, S. & Narayanan, A. Leakage and the reproducibility crisis in machine-learning-based science. Patterns 4, 100804 (2023).

  96. Xiang, J. & Zhang, J. Exploring low-rank property in multiple instance learning for whole slide image classification. In The Eleventh International Conference on Learning Representations (2022).

  97. Niehues, J. M. et al. Generalizable biomarker prediction from cancer pathology slides with self-supervised deep learning: a retrospective multi-centric study. Cell Rep. Med. 4, 100980 (2023).

  98. Kather, J. N. et al. Predicting survival from colorectal cancer histology slides using deep learning: a retrospective multicenter study. PLoS Med. 16, e1002730 (2019).

  99. Pataki, B. Á. et al. HunCRC: annotated pathological slides to enhance deep learning applications in colorectal cancer screening. Sci. Data 9, 370 (2022).

  100. Barbano, C. A. et al. UniToPatho, a labeled histopathological dataset for colorectal polyps classification and adenoma dysplasia grading. In 2021 IEEE International Conference on Image Processing (ICIP), 76–80 (IEEE, 2021).

  101. Huo, X. et al. Comprehensive AI model development for Gleason grading: from scanning, cloud-based annotation to pathologist–AI interaction. Preprint at https://doi.org/10.2139/ssrn.4172090 (2022).

  102. Komura, D. et al. Restaining-based annotation for cancer histology segmentation to overcome annotation-related limitations among pathologists. Patterns 4, 100688 (2023).

  103. Cheng, B., Misra, I., Schwing, A. G., Kirillov, A. & Girdhar, R. Masked-attention mask transformer for universal image segmentation. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2022).

  104. Fang, Y. et al. EVA: exploring the limits of masked visual representation learning at scale. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19358–19369 (2023).

  105. Wang, Y., Chao, W.-L., Weinberger, K. Q. & van der Maaten, L. SimpleShot: revisiting nearest-neighbor classification for few-shot learning. Preprint at https://doi.org/10.48550/arxiv.1911.04623 (2019).

  106. Snell, J., Swersky, K. & Zemel, R. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems 30 (2017).

  107. Vorontsov, E. et al. Virchow: a million-slide digital pathology foundation model. Preprint at https://doi.org/10.48550/arxiv.2309.07778 (2023).

  108. Campanella, G. et al. Computational pathology at health system scale: self-supervised foundation models from three billion images. Preprint at https://doi.org/10.48550/arxiv.2310.07033 (2023).

  109. Lai, J. et al. Domain-specific optimization and diverse evaluation of self-supervised models for histopathology. Preprint at https://doi.org/10.48550/arxiv.2310.13259 (2023).

  110. Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).

  111. Chen, Z. et al. Vision transformer adapter for dense predictions. In The Eleventh International Conference on Learning Representations (2023).

  112. Wang, X. et al. SCL-WC: cross-slide contrastive learning for weakly-supervised whole-slide image classification. Adv. Neural Inf. Process. Syst. 35, 18009–18021 (2022).

  113. Kolesnikov, A., Zhai, X. & Beyer, L. Revisiting self-supervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1920–1929 (2019).

  114. Chen, R. J. et al. Algorithmic fairness in artificial intelligence for medicine and healthcare. Nat. Biomed. Eng. 7, 719–742 (2023).

  115. Lu, M. Y. et al. Towards a visual-language foundation model for computational pathology. Preprint at https://doi.org/10.48550/arxiv.2307.12914 (2023).

  116. Lu, M. Y. et al. A foundational multimodal vision language AI assistant for human pathology. Preprint at https://doi.org/10.48550/arxiv.2312.07814 (2023).

  117. Chen, R. J. et al. Scaling vision transformers to gigapixel images via hierarchical self-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022).

  118. Zhou, J. et al. iBOT: image BERT pre-training with online tokenizer. In International Conference on Learning Representations (2022).

  119. Zhai, X., Oliver, A., Kolesnikov, A. & Beyer, L. S4L: self-supervised semi-supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 1476–1485 (2019).

  120. Bandi, P. et al. From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge. IEEE Trans. Med. Imaging 38, 550–560 (2019).

  121. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers) (2018).

  122. Tian, K. et al. Designing BERT for convolutional networks: sparse and hierarchical masked modeling. In The Eleventh International Conference on Learning Representations (2023).

  123. Sablayrolles, A., Douze, M., Schmid, C. & Jégou, H. Spreading vectors for similarity search. In International Conference on Learning Representations (2019).

  124. Touvron, H., Vedaldi, A., Douze, M. & Jegou, H. Fixing the train–test resolution discrepancy. In Advances in Neural Information Processing Systems, Vol. 32 (eds Wallach, H. et al.) (Curran Associates, 2019).

  125. Dao, T., Fu, D. Y., Ermon, S., Rudra, A. & Ré, C. FlashAttention: fast and memory-efficient exact attention with IO-awareness. In Advances in Neural Information Processing Systems (2022).

  126. Liu, Z. et al. Swin transformer: hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 10012–10022 (2021).

  127. Kolesnikov, A. et al. Big Transfer (BiT): general visual representation learning. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part V 16, 491–507 (Springer, 2020).

  128. Lin, T., Yu, Z., Hu, H., Xu, Y. & Chen, C.-W. Interventional bag multi-instance learning on whole-slide pathological images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19830–19839 (2023).

  129. Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In International Conference on Learning Representations (2019).

  130. Bentley, J. L. Multidimensional binary search trees used for associative searching. Commun. ACM 18, 509–517 (1975).

  131. Zhu, C., Byrd, R. H., Lu, P. & Nocedal, J. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw. 23, 550–560 (1997).

  132. Sarıyıldız, M. B., Kalantidis, Y., Alahari, K. & Larlus, D. No reason for no supervision: improved generalization in supervised models. In The Eleventh International Conference on Learning Representations (2023).

  133. Fang, Z. et al. SEED: self-supervised distillation for visual representation. In International Conference on Learning Representations (2020).

  134. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  135. Ghiasi, G. et al. Simple copy-paste is a strong data augmentation method for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2918–2928 (2021).

  136. El Banani, M., Desai, K. & Johnson, J. Learning visual representations via language-guided sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 19208–19220 (2023).

  137. Koch, G., Zemel, R. & Salakhutdinov, R. Siamese neural networks for one-shot image recognition. In Proceedings of the 32nd International Conference on Machine Learning (2015).

  138. Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K. & Wierstra, D. Matching networks for one shot learning. In Advances in Neural Information Processing Systems 29 (2016).

  139. Yu, J.-G. et al. Prototypical multiple instance learning for predicting lymph node metastasis of breast cancer from whole-slide pathological images. Med. Image Anal. 85, 102748 (2023).

  140. Yu, Z., Lin, T. & Xu, Y. SLPD: slide-level prototypical distillation for WSIs. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 259–269 (Springer, 2023).

  141. Quiros, A. C. et al. Mapping the landscape of histomorphological cancer phenotypes using self-supervised learning on unlabeled, unannotated pathology slides. Preprint at https://doi.org/10.48550/arxiv.2205.01931 (2022).

  142. Yang, J., Chen, H., Yan, J., Chen, X. & Yao, J. Towards better understanding and better generalization of low-shot classification in histology images with contrastive learning. In International Conference on Learning Representations (2021).

  143. Tian, Y., Wang, Y., Krishnan, D., Tenenbaum, J. B. & Isola, P. Rethinking few-shot image classification: a good embedding is all you need? In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, 266–282 (Springer, 2020).

  144. Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 28, 129–137 (1982).

  145. Zhu, X., Yao, J., Zhu, F. & Huang, J. WSISA: making survival prediction from whole slide histopathological images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 7234–7242 (2017).

  146. Yao, J., Zhu, X. & Huang, J. Deep multi-instance learning for survival prediction from whole slide images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 496–504 (Springer, 2019).

  147. Yao, J., Zhu, X., Jonnagaddala, J., Hawkins, N. & Huang, J. Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks. Med. Image Anal. 65, 101789 (2020).

  148. Li, R., Yao, J., Zhu, X., Li, Y. & Huang, J. Graph CNN for survival analysis on whole slide pathological images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 174–182 (Springer, 2018).

  149. Sivic, J. & Zisserman, A. Video Google: a text retrieval approach to object matching in videos. In Proceedings of the Ninth IEEE International Conference on Computer Vision, 1470–1477 (IEEE, 2003).

  150. Fei-Fei, L. & Perona, P. A Bayesian hierarchical model for learning natural scene categories. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05) Vol. 2, 524–531 (IEEE, 2005).

  151. Cruz-Roa, A., Caicedo, J. C. & González, F. A. Visual pattern mining in histology image collections using bag of features. Artif. Intell. Med. 52, 91–106 (2011).

  152. Xu, Y. et al. Weakly supervised histopathology cancer image segmentation and classification. Med. Image Anal. 18, 591–604 (2014).

  153. Chen, C. et al. Fast and scalable search of whole-slide images via self-supervised deep learning. Nat. Biomed. Eng. 6, 1420–1434 (2022).

  154. Gillette, M. A. et al. Proteogenomic characterization reveals therapeutic vulnerabilities in lung adenocarcinoma. Cell 182, 200–225 (2020).

  155. Satpathy, S. et al. A proteogenomic portrait of lung squamous cell carcinoma. Cell 184, 4348–4371 (2021).

  156. Zhu, M. et al. Development and evaluation of a deep neural network for histologic classification of renal cell carcinoma on biopsy and surgical resection slides. Sci. Rep. 11, 7080 (2021).

  157. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43–49 (2013).

  158. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of papillary renal-cell carcinoma. N. Engl. J. Med. 374, 135–145 (2016).

  159. Davis, C. F. et al. The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 26, 319–330 (2014).

  160. Li, Y. et al. Histopathologic and proteogenomic heterogeneity reveals features of clear cell renal cell carcinoma aggressiveness. Cancer Cell 41, 139–163 (2023).

  161. Brancati, N. et al. BRACS: a dataset for breast carcinoma subtyping in H&E histology images. Database 2022, baac093 (2022).

  162. Bulten, W. et al. Automated deep-learning system for Gleason grading of prostate cancer using biopsies: a diagnostic study. Lancet Oncol. 21, 233–241 (2020).

  163. Veeling, B. S., Linmans, J., Winkens, J., Cohen, T. & Welling, M. Rotation equivariant CNNs for digital pathology. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16–20, 2018, Proceedings, Part II 11, 210–218 (Springer, 2018).

  164. Koh, P. W. et al. WILDS: a benchmark of in-the-wild distribution shifts. In International Conference on Machine Learning, 5637–5664 (PMLR, 2021).

  165. Aresta, G. et al. BACH: grand challenge on breast cancer histology images. Med. Image Anal. 56, 122–139 (2019).

  166. Brummer, O., Pölönen, P., Mustjoki, S. & Brück, O. Computational textural mapping harmonises sampling variation and reveals multidimensional histopathological fingerprints. Br. J. Cancer 129, 683–695 (2023).

  167. Tolkach, Y. et al. Artificial intelligence for tumour tissue detection and histological regression grading in oesophageal adenocarcinomas: a retrospective algorithm development and validation study. Lancet Digit. Health 5, e265–e275 (2023).

  168. Howard, F. M. et al. The impact of site-specific digital histology signatures on deep learning model accuracy and bias. Nat. Commun. 12, 4423 (2021).

  169. Macenko, M. et al. A method for normalizing histology slides for quantitative analysis. In 2009 IEEE international Symposium on Biomedical Imaging: From Nano to Macro, 1107–1110 (IEEE, 2009).

  170. Abousamra, S. et al. Deep learning-based mapping of tumor infiltrating lymphocytes in whole slide images of 23 types of cancer. Front. Oncol. 11, 806603 (2022).

  171. Paszke, A. et al. PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (2019).

  172. Goldberger, A. L. et al. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101, e215–e220 (2000).

  173. Azizi, S. et al. Medical AI research foundations: a repository of medical foundation models (version 1.0.0). PhysioNet https://doi.org/10.13026/grp0-z205 (2023).

  174. Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y. & Girshick, R. Detectron2. GitHub https://github.com/facebookresearch/detectron2 (2019).

  175. Clark, K. et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26, 1045–1057 (2013).

Acknowledgements

We thank J. Zhou and T. Darcet for providing insights into the training dynamics for iBOT and DINOv2, respectively, and L. Beyer for providing insights and feedback on evaluating self-supervised models. This work was supported in part by the BWH president’s fund, BWH & MGH Pathology, and National Institutes of Health (NIH) NIGMS R35GM138216 (F.M.). G.G. was supported by the BWH President’s Scholar Award, NIGMS R35GM149270, NIDDK P30DK034854, and the Massachusetts Life Sciences Center. R.J.C., D.S. and S.S. were supported by the NSF Graduate Research Fellowship. T.D. was supported by the Harvard SEAS Fellowship. M.Y.L. was supported by the Siebel Scholars program. D.F.K.W. was supported by the NIH NCI Ruth L. Kirschstein National Service Award, T32CA251062. L.O. was supported by the German Academic Exchange (DAAD) Fellowship. We also thank T. Janicki, R. Kenny and the system administration staff at the MGB Enterprise Research Infrastructure & Services (ERIS) Research Computing Core for their support with computing resources, and N. Vatanian, M. Thiagarajan, B. Fevrier-Sullivan and J. Kirby at the NIH for navigating access to whole-slide imaging data in CPTAC.

Author information

Contributions

R.J.C., F.M., M.Y.L., T.D. and D.F.K.W. conceived the study and designed the experiments. R.J.C., L.P.L., D.F.K.W., J.J.W., T.D., M.Y.L., G.J., A.H.S., B.C., D.S., M.S., L.O., A.Z., A.V. and S.S. collected the data for self-supervised learning. R.J.C., T.D. and M.Y.L. performed model development for self-supervised learning. R.J.C., M.Y.L., T.D., B.C. and G.J. organized the datasets and codebases for all downstream tasks regarding ROI classification, ROI segmentation and slide classification. R.J.C., T.D., M.Y.L., A.H.S., G.J., M.S., A.Z., L.L.W. and A.V. performed quality control of the codebase and the results. R.J.C., M.Y.L., T.D. and G.J. carried out analysis of the ROI classification. T.D., M.Y.L., R.J.C., L.L.W., A.Z. and W.W. carried out analysis of the ROI segmentation. R.J.C., T.D., B.C., D.S., M.S., M.W. and L.L.W. carried out analysis of the slide classification. R.J.C., T.D., M.Y.L., D.F.K.W., G.J., A.H.S., M.S., L.P.L., G.G. and F.M. interpreted the results and provided feedback on the study. R.J.C., T.D., M.Y.L., D.F.K.W. and F.M. prepared the paper with input from all co-authors. F.M. supervised the research.

Corresponding author

Correspondence to Faisal Mahmood.

Ethics declarations

Competing interests

R.J.C., M.Y.L. and F.M. are inventors on a provisional US patent (application no. 63/611,059) covering the methodological aspects of this work. The other authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Andrew Beck, Francesco Ciompi and Lee Cooper for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Few-shot slide classification.

To study the label efficiency of UNI in slide classification, we compare UNI with other pretrained encoders on: a. breast metastasis detection in CAMELYON16, b. NSCLC subtyping in CPTAC (trained on TCGA), c. RCC subtyping in CPTAC-DHMC (trained on TCGA), d. RCC subtyping in DHMC, e. BRCA coarse-grained subtyping in BRACS, f. BRCA fine-grained subtyping in BRACS, g. CRC screening in HunCRC, h. prostate ISUP grading in PANDA, i. glioma IDH1 prediction in EBRAINS (trained on TCGA), j. glioma histomolecular subtyping in EBRAINS (trained on TCGA), k. brain tumor coarse-grained subtyping in EBRAINS, l. brain tumor fine-grained subtyping in EBRAINS, and m. heart transplant assessment in BWH-EMB. Performance is measured across different few-shot settings with K ∈ {1, 2, 4, 8, 16, 32} training examples used per class. Boxes indicate quartile values of model performance (n = 5 runs) and whiskers extend to data points within 1.5 × the interquartile range. Overall, we observe that UNI consistently demonstrates superior label efficiency over the other baselines.
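
For orientation, these slide-level experiments train a weakly supervised aggregator on frozen patch embeddings; per the rest of the paper, the aggregator is attention-based multiple instance learning (ABMIL; see the caption of Extended Data Fig. 9 and ref. 83). A minimal gated-attention pooling module of that family is sketched below; the hidden width and single-layer classifier are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch of gated attention-based MIL pooling (Ilse et al., ref. 83):
# score each patch, softmax over the slide, and classify the pooled embedding.
import torch
import torch.nn as nn

class GatedABMIL(nn.Module):
    def __init__(self, in_dim: int = 1024, hid_dim: int = 256, n_classes: int = 2):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hid_dim, 1)
        self.classifier = nn.Linear(in_dim, n_classes)

    def forward(self, h: torch.Tensor):
        # h: (n_patches, in_dim) embeddings from a frozen encoder such as UNI
        a = self.attn_w(self.attn_v(h) * self.attn_u(h))  # (n_patches, 1) attention logits
        a = torch.softmax(a, dim=0)                       # normalize over the slide's patches
        slide = (a * h).sum(dim=0)                        # (in_dim,) attention-pooled slide embedding
        return self.classifier(slide), a                  # class logits and attention weights

logits, attn = GatedABMIL()(torch.randn(500, 1024))       # e.g. a slide with 500 patches
```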

Extended Data Fig. 2 Comparing supervised performance on PRAD tissue classification in AGGC.

Qualitative illustrations comparing UNI to CTransPath, REMEDIS and ResNet-50 (IN) via KNN probing on PRAD tissue classification in AGGC. UNI achieves better accuracy (acc.) on all three examples. The reported results are based on partial annotations (left-most panel) provided by pathologists.

Extended Data Fig. 3 ROI retrieval.

We evaluate content-based image retrieval on ROI-level tasks with at least 5 classes: a. CRC tissue classification in CRC-100K, b. CRC tissue classification in HunCRC, c. ESCA subtyping on CHA (trained on UKK, WNS and TCGA), d. PRAD tissue classification in AGGC, e. CRC polyp classification in UniToPatho, and f. pan-cancer tissue classification in TCGA, with UNI consistently outperforming all other pretrained encoders. Error bars represent 95% confidence intervals and the center is the computed value of the corresponding retrieval metric. Detailed performance metrics are provided in Supplementary Tables 63–68.
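
For concreteness, a nearest-neighbor reading of this retrieval setup is sketched below: database and query ROIs are embedded with a frozen encoder, ranked by cosine similarity, and a query counts as a hit if a correct-class ROI appears among its top K neighbors. The function name and the exact accuracy criterion are ours; the paper reports several retrieval metrics (see the cited Supplementary Tables).

```python
# Illustrative retrieval sketch: rank database ROIs by cosine similarity of
# L2-normalized embeddings and check for a same-class hit in the top K.
import numpy as np

def retrieval_acc_at_k(query_emb, query_lbl, db_emb, db_lbl, k=5):
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = db_emb / np.linalg.norm(db_emb, axis=1, keepdims=True)
    sims = q @ d.T                                   # (n_query, n_db) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]          # indices of the k nearest database ROIs
    hits = (db_lbl[topk] == query_lbl[:, None]).any(axis=1)
    return hits.mean()                               # fraction of queries with a correct top-k hit
```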

Extended Data Fig. 4 ROI classification across different image resolutions.

To assess how image resolution affects performance, we compare UNI and other baselines on various resized and center-cropped ROIs for a. BRCA subtyping and b. CRC polyp classification tasks. The original image sizes are 2048 × 1536 and 1812 × 1812 pixels, respectively. All models are evaluated on linear, SimpleShot (1-NN), and KNN (20-NN) probe settings. UNI consistently outperforms all baselines across all resolutions. The performance metrics are further provided in Supplementary Tables 45, 46, 51, 52.

Extended Data Fig. 5 Multi-head self-attention (MHSA) heatmap visualization of UNI across different image resolutions for BRCA subtyping in BACH.

Each colored square represents a 16 × 16 patch token encoded by UNI, with heatmap color corresponding to the attention weight of that patch token to the global [CLS] token of the penultimate layer in UNI. We show MHSA visualizations for resized and center-cropped ROIs at 224², 448², 896² and 1,344² resolutions for the a. normal, b. benign, c. in situ, and d. invasive classes in BACH. In each, the left-most image is the original H&E ROI and the right four images are the MHSA visualizations. For comparative purposes, we resize all images within the figure to have the same dimension, but note that at higher resolutions, each colored square has an original image resolution of 16 × 16 pixels at 0.42 mpp. As the resolution increases, the heatmaps demonstrate increasing and increasingly fine-grained attention focused on epithelial structures, with relatively lower attention on stroma and other background, neither of which contributes to the diagnoses in these ROIs.
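
Heatmaps of this kind can be reproduced approximately by reading out the [CLS]-to-patch attention of a vision transformer. The sketch below is not the authors' plotting code: it assumes a timm-style VisionTransformer with a fused qkv projection, a [CLS] token at index 0 and no extra register tokens, and it recomputes the penultimate block's attention weights from a forward hook on that block's first normalization layer.

```python
# Sketch (our own recomputation): recover head-averaged [CLS]-to-patch attention
# of the penultimate block of a timm VisionTransformer as a patch-grid heatmap.
import torch

@torch.inference_mode()
def cls_attention_map(model, x):
    # x: (1, 3, H, W) preprocessed image; model: e.g. UNI loaded via timm
    feats = {}
    block = model.blocks[-2]  # penultimate layer, as in the figure caption
    handle = block.norm1.register_forward_hook(lambda m, i, o: feats.update(t=o))
    model(x)
    handle.remove()
    t = feats["t"]            # (1, n_tokens, dim) tokens entering the attention layer
    attn = block.attn
    qkv = attn.qkv(t).reshape(1, t.shape[1], 3, attn.num_heads, -1).permute(2, 0, 3, 1, 4)
    q, k = qkv[0], qkv[1]     # each (1, heads, n_tokens, head_dim)
    w = (q @ k.transpose(-2, -1) * attn.scale).softmax(dim=-1)
    cls_to_patch = w[0, :, 0, 1:]                 # per-head [CLS] attention over patch tokens
    side = int(cls_to_patch.shape[-1] ** 0.5)     # assumes a square patch grid
    return cls_to_patch.mean(0).reshape(side, side)  # average heads -> (side, side) heatmap
```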

Extended Data Fig. 6 Multi-head self-attention (MHSA) heatmap visualization of UNI across different image resolutions for CRC polyp classification in UniToPatho.

Each colored square represents a 16 × 16 patch token encoded by UNI, with heatmap color corresponding to the attention weight of that patch token to the global [CLS] token of the penultimate layer in UNI. We show MHSA visualizations for resized and center-cropped ROIs at 224², 448², 896² and 1,792² resolutions for a. normal tissue, b. hyperplastic polyp, c. tubular adenoma with low-grade dysplasia, d. tubular adenoma with high-grade dysplasia, e. tubulo-villous adenoma with high-grade dysplasia, and f. tubulo-villous adenoma with low-grade dysplasia. In each, the left-most image is the original H&E ROI and the right four images are the MHSA visualizations. For comparative purposes, we resize all images within the figure to have the same dimension, but note that at higher resolutions, each colored square has an original image resolution of 16 × 16 pixels at 0.48 mpp. As resolution increases, the heatmaps demonstrate increasing and increasingly fine-grained attention focused on the crypts, in all cases except the hyperplastic polyp in b, focusing on areas a pathologist would use to make the diagnosis.

Extended Data Fig. 7 Visualizing segmentation results in SegPath.

Using the Mask2Former head, we visualize the tissue segmentation of each class in SegPath created by all pretrained encoders. Overall, we find that UNI is competitive with convolutional and hierarchical models like CTransPath and REMEDIS in matching the segmentation masks obtained via immunofluorescence and DAPI nuclear staining.

Extended Data Fig. 8 Few-shot ROI classification using class prototypes.

Similar to slide-level classification, we also assess the label efficiency of UNI on ROI-level tasks, and observe superior label efficiency of UNI on most tasks except CRC tissue classification in HunCRC. We evaluate all pretrained encoders using the nonparametric SimpleShot framework for a. CRC tissue classification in CRC-100K, b. breast metastasis detection in CAMELYON17-WILDS, c. RCC tissue classification on HEL (trained on TCGA), d. BRCA subtyping in BACH, e. CRC tissue classification in HunCRC, f. ESCA subtyping on CHA (UKK+WNS+TCGA), g. PRAD tissue classification in AGGC, h. CRC polyp classification in UniToPatho, i. CRC MSI screening in TCGA, j. pan-cancer tissue classification in TCGA, and k. pan-cancer TIL detection in TCGA. Performance is measured across different few-shot settings with K ∈ {1, 2, 4, 8, 16, 32, 64, 128, 256} training examples used per class (the support set size). Boxes indicate quartile values of model performance (n = 1000 runs) and whiskers extend to data points within 1.5 × the interquartile range.
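
SimpleShot (ref. 105) itself is small enough to state in a few lines: embeddings are mean-centered and L2-normalized, each class prototype is the mean of its support embeddings, and each query is assigned to the nearest prototype. A minimal sketch under those assumptions (variable names are ours):

```python
# Minimal SimpleShot-style sketch (Wang et al., ref. 105): center, normalize,
# average K support examples per class into a prototype, classify by distance.
import numpy as np

def simpleshot(support_x, support_y, query_x):
    mu = support_x.mean(axis=0)                    # feature mean estimated from the support set
    def norm(z):
        z = z - mu                                 # mean-centering
        return z / np.linalg.norm(z, axis=1, keepdims=True)  # L2 normalization
    s, q = norm(support_x), norm(query_x)
    classes = np.unique(support_y)
    protos = np.stack([s[support_y == c].mean(axis=0) for c in classes])
    dists = ((q[:, None, :] - protos[None, :, :]) ** 2).sum(-1)  # squared Euclidean distance
    return classes[dists.argmin(axis=1)]           # nearest-prototype label per query
```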

Extended Data Fig. 9 Few-shot slide classification using class prototypes.

We adapt the SimpleShot framework to slide-level classification, which we call 'MI-SimpleShot'. ROI class prototypes are constructed by averaging the pre-extracted ROI features of each class in the 'TCGA Uniform Tumor' dataset, and are used as 'prompts' for assigning the slide-level label. We assess and compare the few-shot performance of all pretrained encoders on NSCLC subtyping (a) and RCC subtyping (b), using the same runs (n = 5) as the ABMIL few-shot setting with K ∈ {1, 2, 4, 8, 16, 32} training examples used per class. We compare the performance of top-5 and top-50 pooling of the nearest patches in the test set, and show performance on both the internal test fold in TCGA and the external cohort. Boxes indicate quartile values of model performance (n = 5 runs) and whiskers extend to data points within 1.5 × the interquartile range. Overall, we observe that the prototypes formed by UNI can be used to classify slides with the MI-SimpleShot framework. a. On NSCLC subtyping, the 2-shot and 4-shot performance of UNI outperforms the 32-shot performance of all other models. b. On RCC subtyping, the 1-shot performance of UNI also outperforms the 32-shot performance of other models. MI-SimpleShot can also be combined with other pretrained encoders, which, however, generally require more annotated ROIs for creating prototypes.
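
A compact reading of MI-SimpleShot as described in this caption: each slide's patch embeddings are scored against the ROI class prototypes, the top-k similarities per class are averaged, and the slide takes the best-scoring class. The sketch below assumes cosine similarity and mean-pooled prototypes; variable names are ours, and k follows the caption (top-5 or top-50).

```python
# Sketch of the MI-SimpleShot idea: pool the top-k patch-to-prototype cosine
# similarities per class and predict the class with the highest pooled score.
import numpy as np

def mi_simpleshot(slide_patches, prototypes, k=5):
    # slide_patches: (n_patches, d) embeddings; prototypes: (n_classes, d) class means
    p = slide_patches / np.linalg.norm(slide_patches, axis=1, keepdims=True)
    c = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = p @ c.T                                 # (n_patches, n_classes) cosine similarities
    topk = np.sort(sims, axis=0)[-k:]              # k highest-scoring patches per class
    return topk.mean(axis=0).argmax()              # index of the best-scoring class prototype
```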

Extended Data Fig. 10 Comparing 1-shot similarity heatmaps of pretrained encoders with class prototype.

We compare the similarity heatmaps of all pretrained encoders using annotated ROIs from a single slide per class to form class prototypes in MI-SimpleShot (with top-5 pooling) on NSCLC subtyping (a) and RCC subtyping (b); the top rows visualize example ROIs used for each class and the bottom rows show similarity heatmaps. Outlined in blue are pathologist annotations of ROIs that match the label of the histology slide. Similarity heatmaps are created with respect to the class prototype of the correct slide label (indicated in green), with ✓ indicating a correct prediction and ✗ indicating an incorrect prediction. Note that since the visualizations are created with respect to the ground-truth label, a model may retrieve correct patches that match pathologist annotations but still misclassify the slide. a. On a LUAD slide, we observe strong agreement between the pathologist's annotations and the LUAD patches retrieved by UNI. Although the patches retrieved by REMEDIS also agree strongly with the pathologist's annotations, the slide was misclassified as LUSC, indicating that the top-5 similarity scores for the LUSC prototype were higher than those for the LUAD prototype. Conversely, ResNet-50 (IN) classifies the slide correctly but fails to retrieve patches that agree with the pathologist's annotations, indicating that non-LUAD patches in the slide were more similar to the LUAD prototype than the pathologist-annotated LUAD patches. The similarity heatmap for CTransPath both misclassifies the slide and retrieves incorrect patches. b. On a CCRCC slide, we observe strong agreement between the pathologist's annotations and the CCRCC patches retrieved by UNI. We observe a similar mismatch between predicted class label and retrieved patches, in which REMEDIS classifies the slide correctly but retrieves incorrect patches, and CTransPath misclassifies the slide but retrieves correct patches.

Supplementary information

Supplementary Information

Supplementary Tables 1–73.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chen, R.J., Ding, T., Lu, M.Y. et al. Towards a general-purpose foundation model for computational pathology. Nat Med 30, 850–862 (2024). https://doi.org/10.1038/s41591-024-02857-3
