A visual–language foundation model for pathology image analysis using medical Twitter

Abstract

The lack of annotated publicly available medical images is a major barrier to computational research and educational innovation. At the same time, clinicians share many de-identified images and much knowledge on public forums such as medical Twitter. Here we harness these crowd platforms to curate OpenPath, a large dataset of 208,414 pathology images paired with natural language descriptions. We demonstrate the value of this resource by developing pathology language–image pretraining (PLIP), a multimodal artificial intelligence with both image and text understanding, which is trained on OpenPath. PLIP achieves state-of-the-art performance for classifying new pathology images across four external datasets: for zero-shot classification, PLIP achieves F1 scores of 0.565–0.832, compared with F1 scores of 0.030–0.481 for the previous contrastive language–image pretraining (CLIP) model. Training a simple supervised classifier on top of PLIP embeddings also achieves a 2.5% improvement in F1 scores compared with using other supervised model embeddings. Moreover, PLIP enables users to retrieve similar cases by either image or natural language search, greatly facilitating knowledge sharing. Our approach demonstrates that publicly shared medical information is a tremendous resource that can be harnessed to develop medical artificial intelligence for enhancing diagnosis, knowledge sharing and education.
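
As a rough illustration of the zero-shot workflow described above, the sketch below performs CLIP-style zero-shot classification of a pathology image patch: the model embeds the image together with a set of candidate class prompts, and the most similar prompt is taken as the prediction. The Hugging Face checkpoint identifier, file name and prompt wording are illustrative assumptions, not specifications from the article.

```python
# Minimal sketch: CLIP-style zero-shot classification of a pathology patch.
# The checkpoint name ("vinid/plip"), the file name and the prompt wording are
# assumptions for illustration; consult the authors' released model for exact details.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("vinid/plip")
processor = CLIPProcessor.from_pretrained("vinid/plip")

labels = [
    "an H&E image of colorectal adenocarcinoma epithelium",
    "an H&E image of normal colon mucosa",
    "an H&E image of lymphocytes",
]
image = Image.open("patch.png")  # placeholder image path

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
# logits_per_image holds image-text similarity scores; the highest-scoring prompt is the prediction.
probs = outputs.logits_per_image.softmax(dim=-1)
print(labels[probs.argmax(dim=-1).item()])
```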

Fig. 1: Overview of the study.
Fig. 2: PLIP predicts new classes via zero-shot transfer learning.
Fig. 3: Image embedding analysis and linear probing results.
Fig. 4: Text-to-image retrieval for pathology images.
Fig. 5: Image-to-image retrieval for pathology images.

Data availability

All data in OpenPath are publicly available from Twitter and LAION-5B (https://laion.ai/blog/laion-5b/). The Twitter IDs used for training and validation can be accessed at https://tinyurl.com/openpathdata. The validation datasets are publicly available and can be accessed from the following: Kather colon dataset (https://zenodo.org/record/1214456); PanNuke (https://warwick.ac.uk/fac/cross_fac/tia/data/pannuke); DigestPath (https://digestpath2019.grand-challenge.org/); WSSS4LUAD (https://wsss4luad.grand-challenge.org/); PathPedia (https://www.pathpedia.com/Education/eAtlas/Default.aspx); PubMed and Books pathology collection (https://warwick.ac.uk/fac/cross_fac/tia/data/arch); KIMIA Path24C (https://kimialab.uwaterloo.ca/kimia/index.php/pathology-images-kimia-path24/). The ImageNet dataset (https://www.image-net.org/) was used for pretraining the ViT-B/32 model. The trained model, source code and interactive results can also be accessed at https://tinyurl.com/webplip.

Code availability

The trained model and source code can be accessed at https://tinyurl.com/webplip.

Acknowledgements

F.B. is supported by the Hoffman-Yee Research Grant Program and the Stanford Institute for Human-Centered Artificial Intelligence. J.Z. is supported by the Chan Zuckerberg Biohub.

Author information

Contributions

Z.H., F.B. and J.Z. designed the study. Z.H. and F.B. carried out the data collection, data analysis, model construction, model validation and manuscript writing. M.Y. carried out the data analysis, model construction, model validation and manuscript writing. T.J.M. provided knowledge support, interpreted the findings and helped with manuscript writing. J.Z. provided knowledge support, interpreted the findings, helped with manuscript writing and supervised the study. All authors contributed to writing the manuscript and reviewed and approved the final version.

Corresponding author

Correspondence to James Zou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Medicine thanks Geert Litjens, Lee Cooper and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Cohort inclusion and exclusion criteria and flowcharts.

a, Twitter training dataset from 2006-03-21 to 2022-11-15. b, Twitter validation dataset for image retrieval from 2022-11-16 to 2023-01-15.

Extended Data Fig. 2 Confusion matrix from zero-shot learning in the Kather colon dataset.

a, Confusion matrix of the PLIP model. b, Confusion matrix of the CLIP model. c, Confusion matrix of the results obtained by predicting the majority class (Majority for short).

Extended Data Fig. 3 Comparison of image embeddings derived from models in the Kather colon dataset.

a, Image embeddings derived from PLIP. b, Image embeddings derived from baseline CLIP. c, Image embeddings derived from MuDiPath. ADI: adipose tissue; BACK: background; DEB: debris; LYM: lymphocytes; MUC: mucus; MUS: smooth muscle; NORM: normal colon mucosa; STR: cancer-associated stroma; TUM: colorectal adenocarcinoma epithelium.
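
As a rough illustration of how embeddings of this kind can be produced and projected for visualization, the sketch below extracts frozen image embeddings from a CLIP-style checkpoint and reduces them to two dimensions. The checkpoint identifier, the directory of image patches and the choice of UMAP for dimensionality reduction are assumptions made for illustration.

```python
# Minimal sketch: extract frozen image embeddings and project them to 2D for
# visualization. The checkpoint name, the patch directory and the use of UMAP
# are assumptions, not specifications from the article.
import glob
import numpy as np
import umap
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("vinid/plip")        # assumed identifier
processor = CLIPProcessor.from_pretrained("vinid/plip")

def embed(paths):
    """Return an (n, d) array of L2-normalized image embeddings."""
    images = [Image.open(p) for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    feats = model.get_image_features(**inputs).detach().numpy()
    return feats / np.linalg.norm(feats, axis=1, keepdims=True)

paths = sorted(glob.glob("patches/*.png"))             # placeholder patch directory
embeddings = embed(paths)
coords = umap.UMAP(n_components=2, random_state=0).fit_transform(embeddings)
```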

Extended Data Fig. 4 Comparison of image embeddings between PLIP, CLIP, and MuDiPath models for the PanNuke dataset.

a, Image embeddings generated by the PLIP model, colored by benign and malignant. b, Image embeddings generated by the PLIP model, colored by 19 pathology subspecialties. Data points with black edges indicate malignant images. c, Image embeddings generated by the CLIP model, colored by benign and malignant. d, Image embeddings generated by the CLIP model, colored by 19 pathology subspecialties. Data points with black edges indicate malignant images. e, Image embeddings generated by the MuDiPath model, colored by benign and malignant. f, Image embeddings generated by the MuDiPath model, colored by 19 organs. Data points with black edges indicate malignant images.

Extended Data Fig. 5 Comparison of image embeddings derived from models in the DigestPath dataset.

a, Image embeddings derived from PLIP. b, Image embeddings derived from baseline CLIP. c, Image embeddings derived from MuDiPath.

Extended Data Fig. 6 Comparison of image embeddings derived from models in the WSSS4LUAD dataset.

a, Image embeddings derived from PLIP. b, Image embeddings derived from baseline CLIP. c, Image embeddings derived from MuDiPath.

Extended Data Fig. 7 Cluster visualization of images in DigestPath dataset.

a, Image patches in low-dimensional space colored by different downsampling rates. b, Visualization of image patches on different clusters.

Extended Data Fig. 8 Comparison to supervised deep learning models.

Fine-tuning was conducted on a, the Kather colon dataset training split; b, the PanNuke dataset; c, the DigestPath dataset; and d, the WSSS4LUAD dataset, comparing the PLIP image encoder with ViT-B/32 (pretrained on ImageNet). In the line plots, mean values and 95% confidence intervals are presented using 10 different random seeds for subsetting the data and running the models. The improvements for PLIP are particularly large for smaller datasets. For instance, when comparing the weighted F1 scores across the four datasets using only 1% of the training data: (i) for the Kather training split, the PLIP image encoder achieved F1 = 0.952, whereas ViT-B/32 achieved F1 = 0.921; (ii) for the PanNuke dataset, the PLIP image encoder achieved F1 = 0.715, whereas ViT-B/32 achieved F1 = 0.637; (iii) for the DigestPath dataset, the PLIP image encoder achieved F1 = 0.933, whereas ViT-B/32 achieved F1 = 0.872; (iv) for the WSSS4LUAD dataset, the PLIP image encoder achieved F1 = 0.816, whereas ViT-B/32 achieved F1 = 0.645. When comparing the weighted F1 scores across the four datasets using all of the training data: (i) for the Kather training split, the PLIP image encoder achieved F1 = 0.994, whereas ViT-B/32 achieved F1 = 0.991; (ii) for the PanNuke dataset, the PLIP image encoder achieved F1 = 0.962, whereas ViT-B/32 achieved F1 = 0.938; (iii) for the DigestPath dataset, the PLIP image encoder achieved F1 = 0.977, whereas ViT-B/32 achieved F1 = 0.968; (iv) for the WSSS4LUAD dataset, the PLIP image encoder achieved F1 = 0.958, whereas ViT-B/32 achieved F1 = 0.941.
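
The "simple supervised classifier on top of PLIP embeddings" referred to in the Abstract, and the linear-probing comparisons in Fig. 3, can be approximated with a probe of the following form: a classifier trained on frozen image embeddings and scored with a weighted F1, matching the metric reported here. This is a minimal sketch; the scikit-learn logistic-regression estimator, its settings and the placeholder file names are assumptions rather than the article's exact configuration.

```python
# Minimal sketch: a linear probe, i.e. a simple supervised classifier trained on
# frozen image embeddings. The logistic-regression estimator, its settings and the
# placeholder file names are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# train_emb, test_emb: (n, d) frozen image embeddings, e.g. from the embed() sketch above.
# y_train, y_test: integer tissue-class labels. File names are placeholders.
train_emb, y_train = np.load("train_emb.npy"), np.load("y_train.npy")
test_emb, y_test = np.load("test_emb.npy"), np.load("y_test.npy")

probe = LogisticRegression(max_iter=5000)
probe.fit(train_emb, y_train)
pred = probe.predict(test_emb)
print("weighted F1:", f1_score(y_test, pred, average="weighted"))
```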

Extended Data Fig. 9 Text-to-image retrieval performances for Recall@50.

a, Image retrieval performances for Recall@50 within each of the pathology subspecialty-specific hashtags. b, Two-sided Spearman correlations between the number of candidates and fold changes for Recall@50 when comparing the PLIP model with random and CLIP, respectively. In regression plots, the regression estimates are displayed with 95% confidence intervals in grey or purple colors.
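
Recall@K for text-to-image retrieval, as reported in this figure, can be computed along the following lines: each caption embedding is scored against all candidate image embeddings, and a query counts as a hit if its paired image appears among the top K most similar candidates. This is a minimal sketch assuming pre-computed, L2-normalized embeddings; the function and variable names are placeholders.

```python
# Minimal sketch: Recall@K for text-to-image retrieval, assuming pre-computed,
# L2-normalized embeddings where text_emb[i] and image_emb[i] form a matching pair.
import numpy as np

def recall_at_k(text_emb, image_emb, k=50):
    """Return the fraction of caption queries whose paired image appears among
    the K most similar candidate images by cosine similarity."""
    sims = text_emb @ image_emb.T                  # (n_queries, n_candidates) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]        # indices of the top-K images per query
    hits = (topk == np.arange(len(text_emb))[:, None]).any(axis=1)
    return hits.mean()
```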

Extended Data Table 1 List of pathology hashtags on Twitter used in this study

Supplementary information

Supplementary Information

Supplementary Discussion, Tables 1–6 and Figs. 1 and 2.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, Z., Bianchi, F., Yuksekgonul, M. et al. A visual–language foundation model for pathology image analysis using medical Twitter. Nat Med 29, 2307–2316 (2023). https://doi.org/10.1038/s41591-023-02504-3
