Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging

Abstract

Machine-learning models for medical tasks can match or surpass the performance of clinical experts. However, in settings differing from those of the training dataset, the performance of a model can deteriorate substantially. Here we report a representation-learning strategy for machine-learning models applied to medical-imaging tasks that mitigates this ‘out-of-distribution’ performance degradation and that improves model robustness and training efficiency. The strategy, which we named REMEDIS (for ‘Robust and Efficient Medical Imaging with Self-supervision’), combines large-scale supervised transfer learning on natural images with intermediate contrastive self-supervised learning on medical images, and requires minimal task-specific customization. We show the utility of REMEDIS in a range of diagnostic-imaging tasks covering six imaging domains and 15 test datasets, and by simulating three realistic out-of-distribution scenarios. REMEDIS improved in-distribution diagnostic accuracy by up to 11.5% with respect to strong supervised baseline models, and in out-of-distribution settings it required only 1–33% of the data for retraining to match the performance of supervised models retrained using all available data. REMEDIS may accelerate the development lifecycle of machine-learning models for medical imaging.
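
The intermediate contrastive stage of this strategy follows SimCLR (ref. 44). As a concrete anchor, the sketch below is a minimal NumPy rendering of the NT-Xent objective that SimCLR optimizes; it is an illustration under assumptions (the batch size, embedding width and temperature here are arbitrary), not the authors' TensorFlow implementation.

    import numpy as np

    def nt_xent_loss(z1, z2, temperature=0.1):
        """NT-Xent contrastive loss (SimCLR) for two augmented views.

        z1, z2: (N, D) embeddings of the same N images under two random
        augmentations; row i of z1 and row i of z2 form a positive pair.
        """
        z = np.concatenate([z1, z2], axis=0)              # (2N, D)
        z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize rows
        sim = z @ z.T / temperature                       # scaled cosine similarity
        np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
        n = len(z1)
        pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
        # Cross-entropy of each row against its positive partner.
        log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
        return -log_prob[np.arange(2 * n), pos].mean()

    # Toy check: near-identical views yield a lower loss than unrelated ones.
    rng = np.random.default_rng(0)
    base = rng.normal(size=(8, 32))
    print(nt_xent_loss(base, base + 0.01 * rng.normal(size=base.shape)))
    print(nt_xent_loss(base, rng.normal(size=base.shape)))

Minimizing this loss pulls the two views of each medical image together in embedding space while pushing them away from the other 2N − 2 examples in the batch, which is what yields the label-free representations that are subsequently fine-tuned per task.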

Fig. 1: Overview of the REMEDIS approach for developing robust and efficient ML for medical imaging.
Fig. 2: Overview of clinical settings for evaluating REMEDIS.
Fig. 3: Data-efficient generalization.
Fig. 4: Data-efficient generalization of REMEDIS with various self-supervised learning techniques.
Fig. 5: REMEDIS versus weakly supervised DeepMIL.
Fig. 6: Cross-domain shift.

Data availability

The datasets from Northwestern Medicine and Apollo Hospitals were used under a licence for the current study and are not publicly available. Applications for access to the Optimam database can be made using this web form. The de-identified teledermatology data used in this study are not publicly available owing to restrictions in the data-sharing agreement. The unlabelled dataset used for DME classification is de-identified data from EyePACS Inc. Interested researchers should contact jcuadros@eyepacs.com to enquire about access to EyePACS data, and should approach the Office of Research and Development to enquire about access to VA data. The rest of the annotated data for the ID and OOD DME classification tasks were collected at the Rajavithi Hospital, Thailand, and at the Lions Eye Institute, and are not publicly available owing to restrictions in the data-sharing agreement. Data used in the evaluation and pretraining of the chest-X-ray-condition classification, including MIMIC-CXR, CheXpert and ChestX-ray14, are publicly available. Data used for the ID fine-tuning and evaluation of the detection of metastases are publicly available on the CAMELYON challenge website. The TCGA data used for pretraining for both the pathology-based metastases-detection and survival-prediction tasks are available via the NIH website. The rest of the data used in the pathology tasks are not publicly available owing to restrictions in the data-sharing agreement. Moreover, ImageNet-1K (ILSVRC-2012; ref. 68), used for the pretraining of baseline supervised models, and ImageNet-21K, used for the pretraining of BiT-M models, are publicly available via the ImageNet website. BiT-L models trained on the JFT-300M dataset (ref. 54) are not publicly available owing to restrictions in the data-sharing agreement.

Code availability

Several major components of the work are available in open-source repositories, such as the TensorFlow library. The code base and pretrained weights used for self-supervised pretraining are available in the SimCLR repository on GitHub. The code base and pretrained weights for the BiT models are available in the BiT repository on GitHub. All experiments and implementation details are described in sufficient detail in Methods and in the Supplementary Information to support replication with non-proprietary libraries. The code base used for our comparison with ResNet-RS was based on the open-source ResNet-RS implementation. A number of the checkpoints and models generated through REMEDIS are readily accessible to researchers via the PhysioNet platform. Additionally, the Foundation Medical ML repositories on GitHub offer access to code that can be used to train REMEDIS-based models.

References

  1. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  2. Yala, A., Lehman, C., Schuster, T., Portnoi, T. & Barzilay, R. A deep learning mammography-based model for improved breast cancer risk prediction. Radiology 292, 60–66 (2019).

  3. Wu, N. et al. Deep neural networks improve radiologists’ performance in breast cancer screening. IEEE Trans. Med. Imaging 39, 1184–1194 (2019).

  4. McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature 577, 89–94 (2020).

  5. Rajpurkar, P. et al. Deep learning for chest radiograph diagnosis: a retrospective comparison of the CheXNeXt algorithm to practicing radiologists. PLoS Med. 15, e1002686 (2018).

  6. Esteva, A. et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).

  7. Liu, Y. et al. A deep learning system for differential diagnosis of skin diseases. Nat. Med. 26, 900–908 (2020).

  8. Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V. & Madabhushi, A. Artificial intelligence in digital pathology—new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715 (2019).

  9. Rakha, E. A. et al. Current and future applications of artificial intelligence in pathology: a clinical perspective. J. Clin. Pathol. 74, 409–414 (2021).

  10. Wulczyn, E. et al. Interpretable survival prediction for colorectal cancer using deep learning. npj Digit. Med. 4, 71 (2021).

  11. Gulshan, V. et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316, 2402–2410 (2016).

  12. De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342–1350 (2018).

  13. Zhou, S. K. et al. A review of deep learning in medical imaging: imaging traits, technology trends, case studies with progress highlights, and future promises. Proc. IEEE 109, 820–838 (2021).

  14. Condon, J. J. J. et al. Replication of an open-access deep learning system for screening mammography: reduced performance mitigated by retraining on local data. Preprint at medRxiv https://doi.org/10.1101/2021.05.28.21257892 (2021).

  15. Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 15, e1002683 (2018).

  16. Zhang, H. et al. An empirical framework for domain generalization in clinical settings. In Proc. Conference on Health, Inference, and Learning (eds Ghassemi, M. et al.) 279–290 (Association for Computing Machinery, 2021).

  17. Seyyed-Kalantari, L., Liu, G., McDermott, M., Chen, I. Y. & Ghassemi, M. CheXclusion: fairness gaps in deep chest X-ray classifiers. Pac. Symp. Biocomput. 26, 232–243 (2021).

  18. Kadambi, A. Achieving fairness in medical devices. Science 372, 30–31 (2021).

  19. Pierson, E., Cutler, D. M., Leskovec, J., Mullainathan, S. & Obermeyer, Z. An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nat. Med. 27, 136–140 (2021).

  20. Artificial Intelligence in Health Care: Benefits and Challenges of Technologies to Augment Patient Care (US Government Accountability Office, 2020).

  21. Kelly, C. J., Karthikesalingam, A., Suleyman, M., Corrado, G. & King, D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med. 17, 195 (2019).

  22. Roberts, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 3, 199–217 (2021).

  23. Van Leeuwen, K. G., Schalekamp, S., Rutten, M. J., van Ginneken, B. & de Rooij, M. Artificial intelligence in radiology: 100 commercially available products and their scientific evidence. Eur. Radiol. 31, 3797–3804 (2021).

  24. Freeman, K. et al. Use of artificial intelligence for image analysis in breast cancer screening programmes: systematic review of test accuracy. BMJ 374, n1872 (2021).

  25. D’Amour, A. et al. Underspecification presents challenges for credibility in modern machine learning. J. Mach. Learn. Res. 23, 1–61 (2020).

  26. Finlayson, S. G. et al. The clinician and dataset shift in artificial intelligence. N. Engl. J. Med. 385, 283–286 (2021).

  27. Futoma, J., Simons, M., Panch, T., Doshi-Velez, F. & Celi, L. A. The myth of generalisability in clinical research and machine learning in health care. Lancet Digit. Health 2, e489–e492 (2020).

  28. Willemink, M. J. et al. Preparing medical imaging data for machine learning. Radiology 295, 4–15 (2020).

  29. Li, F.-F., Fergus, R. & Perona, P. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell. 28, 594–611 (2006).

  30. Zhu, X., Ghahramani, Z. & Lafferty, J. D. Semi-supervised learning using Gaussian fields and harmonic functions. In Proc. 20th International Conference on Machine Learning (eds Fawcett, T. & Mishra, N.) 912–919 (AAAI Press, 2003).

  31. Cohn, D., Atlas, L. & Ladner, R. Improving generalization with active learning. Mach. Learn. 15, 201–221 (1994).

  32. Sutton, R. S. Generalization in reinforcement learning: successful examples using sparse coarse coding. Adv. Neural Inf. Process. Syst. 8, 1038–1044 (1996).

  33. Doersch, C., Gupta, A. & Efros, A. A. Unsupervised visual representation learning by context prediction. In Proc. IEEE International Conference on Computer Vision 1422–1430 (IEEE, 2015).

  34. Doersch, C. & Zisserman, A. Multi-task self-supervised visual learning. In Proc. IEEE International Conference on Computer Vision 2070–2079 (IEEE, 2017).

  35. Gidaris, S., Singh, P. & Komodakis, N. Unsupervised representation learning by predicting image rotations. Preprint at https://arxiv.org/abs/1803.07728 (2018).

  36. Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T. & Efros, A. A. Context encoders: feature learning by inpainting. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 2536–2544 (IEEE, 2016).

  37. Larsson, G., Maire, M. & Shakhnarovich, G. Colorization as a proxy task for visual understanding. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 6874–6883 (IEEE, 2017).

  38. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. Preprint at https://arxiv.org/abs/1810.04805 (2018).

  39. Brown, T. B. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020).

  40. Baevski, A., Auli, M. & Mohamed, A. Effectiveness of self-supervised pre-training for speech recognition. Preprint at https://arxiv.org/abs/1911.03912 (2019).

  41. Chen, L. et al. Self-supervised learning for medical image analysis using image context restoration. Med. Image Anal. 58, 101539 (2019).

  42. He, K., Fan, H., Wu, Y., Xie, S. & Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 9729–9738 (IEEE, 2020).

  43. Grill, J.-B. et al. Bootstrap your own latent: a new approach to self-supervised learning. Adv. Neural Inf. Process. Syst. 33, 21271–21284 (2020).

  44. Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conference on Machine Learning (eds Daumé, H. & Singh, A.) 1597–1607 (JMLR, 2020).

  45. Deng, J. et al. ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition 248–255 (IEEE, 2009).

  46. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 770–778 (IEEE, 2016).

  47. Touvron, H. et al. Training data-efficient image transformers and distillation through attention. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 10347–10357 (PMLR, 2021).

  48. Liu, H. & Abbeel, P. Hybrid discriminative-generative training via contrastive learning. Preprint at https://arxiv.org/abs/2007.09070 (2020).

  49. Winkens, J. et al. Contrastive training for improved out-of-distribution detection. Preprint at https://arxiv.org/abs/2007.05566 (2020).

  50. Shen, K. et al. Connect, not collapse: explaining contrastive learning for unsupervised domain adaptation. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 19847–19878 (PMLR, 2022).

  51. HaoChen, J. Z., Wei, C., Kumar, A. & Ma, T. Beyond separability: analyzing the linear transferability of contrastive representations to related subpopulations. Preprint at https://arxiv.org/abs/2204.02683 (2022).

  52. Kolesnikov, A. et al. Big transfer (BiT): general visual representation learning. In Proc. European Conference on Computer Vision (eds Vedaldi, A. et al.) 491–507 (Springer, 2020).

  53. Huh, M., Agrawal, P. & Efros, A. A. What makes ImageNet good for transfer learning? Preprint at https://arxiv.org/abs/1608.08614 (2016).

  54. Sun, C., Shrivastava, A., Singh, S. & Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. In Proc. IEEE International Conference on Computer Vision 843–852 (IEEE, 2017).

  55. Mahajan, D. et al. Exploring the limits of weakly supervised pretraining. In Proc. European Conference on Computer Vision (eds Ferrari, V. et al.) 185–201 (Springer, 2018).

  56. Houlsby, N. & Zhai, X. The Visual Task Adaptation Benchmark (Google Research, 2019).

  57. Mustafa, B. et al. Supervised transfer learning at scale for medical imaging. Preprint at https://arxiv.org/abs/2101.05913 (2021).

  58. Raghu, M., Zhang, C., Kleinberg, J. & Bengio, S. Transfusion: understanding transfer learning for medical imaging. Adv. Neural Inf. Process. Syst. 32, 3347–3357 (2019).

  59. Hendrycks, D., Lee, K. & Mazeika, M. Using pre-training can improve model robustness and uncertainty. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 2712–2721 (PMLR, 2019).

  60. Li, J., Lin, T. & Xu, Y. SSLP: spatial guided self-supervised learning on pathological images. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds de Bruijne, M. et al.) 3–12 (Springer, 2021).

  61. Srinidhi, C. L. & Martel, A. L. Improving self-supervised learning with hardness-aware dynamic curriculum learning: an application to digital pathology. In Proc. IEEE/CVF International Conference on Computer Vision 562–571 (IEEE, 2021).

  62. Azizi, S. et al. Big self-supervised models advance medical image classification. In IEEE/CVF International Conference on Computer Vision (ICCV) 3458–3468 (IEEE, 2021).

  63. Sowrirajan, H., Yang, J., Ng, A. Y. & Rajpurkar, P. MoCo pretraining improves representation and transferability of chest X-ray models. In Proc. Fourth Conference on Medical Imaging with Deep Learning (eds Heinrich, M. et al.) 728–744 (PMLR, 2021).

  64. Zhou, Z. et al. Models genesis: generic autodidactic models for 3D medical image analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Shen, D. et al.) 384–393 (Springer, 2019).

  65. Liu, X. et al. Self-supervised learning: generative or contrastive. IEEE Trans. Knowl. Data Eng. 35, 857–876 (2023).

  66. Wang, X. et al. ChestX-ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 3462–3471 (IEEE, 2017).

  67. Hendrycks, D. et al. Pretrained transformers improve out-of-distribution robustness. Preprint at https://arxiv.org/abs/2004.06100 (2020).

  68. Russakovsky, O. et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015).

  69. Alzubaidi, L. et al. Optimizing the performance of breast cancer classification by employing the same domain transfer learning from hybrid deep convolutional neural network model. Electronics 9, 445 (2020).

  70. Graziani, M., Andrearczyk, V. & Müller, H. Visualizing and interpreting feature reuse of pretrained CNNs for histopathology. In Proc. IMVIP 2019: Irish Machine Vision and Image Processing (Technological University Dublin, 2019).

  71. Wu, Y. & He, K. Group normalization. In Proc. European Conference on Computer Vision (ECCV) 3–19 (Springer, 2018).

  72. Chen, T., Kornblith, S., Swersky, K., Norouzi, M. & Hinton, G. Big self-supervised models are strong semi-supervised learners. Adv. Neural Inf. Process. Syst. 33, 22243–22255 (2020).

  73. Becker, S. & Hinton, G. E. Self-organizing neural network that discovers surfaces in random-dot stereograms. Nature 355, 161–163 (1992).

  74. Virgili, G. et al. Optical coherence tomography (OCT) for detection of macular oedema in patients with diabetic retinopathy. Cochrane Database Syst. Rev. 1, CD008081 (2015).

  75. Liu, X. et al. Deep learning to detect optical coherence tomography-derived diabetic macular edema from retinal photographs: a multicenter validation study. Ophthalmol. Retina 6, 398–410 (2022).

  76. Brown, J. C. et al. Detection of diabetic foveal edema: contact lens biomicroscopy compared with optical coherence tomography. Arch. Ophthalmol. 122, 330–335 (2004).

  77. Sadda, S. R. et al. Automated detection of clinically significant macular edema by grid scanning optical coherence tomography. Ophthalmology 113, 1187.e1-12 (2006).

  78. Irvin, J. et al. CheXpert: a large chest radiograph dataset with uncertainty labels and expert comparison. Proc. Conf. AAAI Artif. Intell. 33, 590–597 (2019).

  79. Johnson, A. E. et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci. Data 6, 317 (2019).

  80. Neyshabur, B., Sedghi, H. & Zhang, C. What is being transferred in transfer learning? Adv. Neural Inf. Process. Syst. 33, 512–523 (2020).

  81. Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2127–2136 (PMLR, 2018).

  82. Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).

  83. Vu, Y. N. T. et al. MedAug: contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation. In Proc. 6th Machine Learning for Healthcare Conference (eds Jung, K. et al.) 755–769 (PMLR, 2021).

  84. Chen, X., Fan, H., Girshick, R. & He, K. Improved baselines with momentum contrastive learning. Preprint at https://arxiv.org/abs/2003.04297 (2020).

  85. Mitrovic, J., McWilliams, B., Walker, J., Buesing, L. & Blundell, C. Representation learning via invariant causal mechanisms. Preprint at https://arxiv.org/abs/2010.07922 (2020).

  86. Zbontar, J., Jing, L., Misra, I., LeCun, Y. & Deny, S. Barlow twins: self-supervised learning via redundancy reduction. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 12310–12320 (PMLR, 2021).

  87. Dunnmon, J. A. et al. Cross-modal data programming enables rapid medical machine learning. Patterns 1, 100019 (2020).

  88. Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).

  89. Eyuboglu, S. et al. Multi-task weak supervision enables anatomically-resolved abnormality detection in whole-body FDG-PET/CT. Nat. Commun. 12, 1880 (2021).

  90. Bakalo, R., Ben-Ari, R. & Goldberger, J. Classification and detection in mammograms with weak supervision via dual branch deep neural net. In IEEE 16th International Symposium on Biomedical Imaging (ISBI) 1905–1909 (IEEE, 2019).

  91. Wenzel, F. et al. Assaying out-of-distribution generalization in transfer learning. Adv. Neural Inf. Process. Syst. 35, 7181–7198 (2022).

  92. Hendrycks, D. & Dietterich, T. Benchmarking neural network robustness to common corruptions and perturbations. Preprint at https://arxiv.org/abs/1903.12261 (2019).

  93. Wang, Z., Dai, Z., Póczos, B. & Carbonell, J. Characterizing and avoiding negative transfer. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 11285–11294 (IEEE, 2019).

  94. Gulrajani, I. & Lopez-Paz, D. In search of lost domain generalization. Preprint at https://arxiv.org/abs/2007.01434 (2020).

  95. Vapnik, V. N. Statistical Learning Theory (Wiley-Interscience, 1998).

  96. Zhang, H., Cisse, M., Dauphin, Y. N. & Lopez-Paz, D. mixup: beyond empirical risk minimization. Preprint at https://arxiv.org/abs/1710.09412 (2017).

  97. Goyal, P. et al. Self-supervised pretraining of visual features in the wild. Preprint at https://arxiv.org/abs/2103.01988 (2021).

  98. Bubeck, S. & Sellke, M. A universal law of robustness via isoperimetry. J. ACM 70, 1–18 (2023).

  99. Ericsson, L., Gouk, H. & Hospedales, T. M. How well do self-supervised models transfer? In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 5410–5419 (IEEE, 2021).

  100. Chen, X. & He, K. Exploring simple Siamese representation learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 15745–15753 (IEEE, 2021).

  101. Ciga, O., Martel, A. L. & Xu, T. Self-supervised contrastive learning for digital histopathology. Mach. Learn. Appl. 7, 100198 (2022).

  102. Taher, M. R. H., Haghighi, F., Gotway, M. B. & Liang, J. CAiD: context-aware instance discrimination for self-supervised learning in medical imaging. In Proc. 5th International Conference on Medical Imaging with Deep Learning (eds Konukoglu, E. et al.) 535–551 (PMLR, 2022).

  103. Taher, M. R. H., Haghighi, F., Feng, R., Gotway, M. B. & Liang, J. in Domain Adaptation and Representation Transfer, and Affordable Healthcare and AI for Resource Diverse Global Health (eds Albarqouni, S. et al.) 3–13 (Springer, 2021).

  104. Xie, Q., Luong, M.-T., Hovy, E. & Le, Q. V. Self-training with noisy student improves ImageNet classification. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 10684–10695 (IEEE, 2020).

  105. Srinidhi, C. L., Kim, S. W., Chen, F.-D. & Martel, A. L. Self-supervised driven consistency training for annotation efficient histopathology image analysis. Med. Image Anal. 75, 102256 (2022).

  106. Li, Z. et al. Domain generalization for mammography detection via multi-style and multi-view contrastive learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds de Bruijne, M. et al.) 98–108 (Springer, 2021).

  107. Sato, J. et al. Anatomy-aware self-supervised learning for anomaly detection in chest radiographs. Preprint at https://arxiv.org/abs/2205.04282 (2022).

  108. Wortsman, M. et al. Robust fine-tuning of zero-shot models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 7959–7971 (IEEE, 2022).

  109. Nguyen, T., Raghu, M. & Kornblith, S. Do wide and deep networks learn the same things? Uncovering how neural network representations vary with width and depth. Preprint at https://arxiv.org/abs/2010.15327 (2020).

  110. Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. In International Conference on Learning Representations (ICLR) (OpenReview, 2021).

  111. He, K., Zhang, X., Ren, S. & Sun, J. Identity mappings in deep residual networks. In European Conference on Computer Vision (eds Leibe, B. et al.) 630–645 (Springer, 2016).

  112. Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd International Conference on Machine Learning (eds Bach, F. & Blei, D.) 448–456 (JMLR, 2015).

  113. Qiao, S., Wang, H., Liu, C., Shen, W. & Yuille, A. Micro-batch training with batch-channel normalization and weight standardization. Preprint at https://arxiv.org/abs/1903.10520 (2019).

  114. You, Y., Gitman, I. & Ginsburg, B. Large batch training of convolutional networks. Preprint at https://arxiv.org/abs/1708.03888 (2017).

  115. Castro, E., Cardoso, J. S. & Pereira, J. C. Elastic deformations for data augmentation in breast cancer mass detection. In IEEE EMBS International Conference on Biomedical and Health Informatics (BHI) 230–234 (IEEE, 2018).

  116. Ronneberger, O., Fischer, P. & Brox, T. U-net: convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Navab, N. et al.) 234–241 (Springer, 2015).

  117. Szegedy, C. et al. Going deeper with convolutions. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 1–9 (IEEE, 2015).

  118. Tripuraneni, N., Jordan, M. I. & Jin, C. On the theory of transfer learning: the importance of task diversity. Adv. Neural Inf. Process. Syst. 33, 7852–7862 (2020).

  119. Du, S. S., Hu, W., Kakade, S. M., Lee, J. D. & Lei, Q. Few-shot learning via learning the representation, provably. Preprint at https://arxiv.org/abs/2002.09434 (2020).

  120. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

  121. Loshchilov, I. & Hutter, F. SGDR: stochastic gradient descent with warm restarts. Preprint at https://arxiv.org/abs/1608.03983 (2016).

  122. Goyal, P. et al. Accurate, large minibatch SGD: training ImageNet in 1 hour. Preprint at https://arxiv.org/abs/1706.02677 (2017).

  123. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).

  124. Wang, M. & Deng, W. Deep visual domain adaptation: a survey. Neurocomputing 312, 135–153 (2018).

  125. Bello, I. et al. Revisiting ResNets: improved training and scaling strategies. Adv. Neural Inf. Process. Syst. 34, 22614–22627 (2021).

  126. Varadarajan, A. V. et al. Predicting optical coherence tomography-derived diabetic macular edema grades from fundus photographs using deep learning. Nat. Commun. 11, 130 (2020).

  127. Winkler, J. K. et al. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol. 155, 1135–1141 (2019).

  128. Seah, J. C. et al. Effect of a comprehensive deep-learning model on the accuracy of chest X-ray interpretation by radiologists: a retrospective, multireader multicase study. Lancet Digit. Health 3, e496–e506 (2021).

  129. Haygood, T. M. et al. Timed efficiency of interpretation of digital and film-screen screening mammograms. AJR Am. J. Roentgenol. 192, 216–220 (2009).

  130. Jain, A. et al. Development and assessment of an artificial intelligence–based tool for skin condition diagnosis by primary care physicians and nurse practitioners in teledermatology practices. JAMA Netw. Open 4, e217249 (2021).

  131. Pugh, J. A. et al. Screening for diabetic retinopathy: the wide-angle retinal camera. Diabetes Care 16, 889–895 (1993).

  132. Schölkopf, B. et al. Toward causal representation learning. Proc. IEEE 109, 612–634 (2021).

  133. Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 35, 1798–1828 (2013).

  134. Liu, J., Hu, Z., Cui, P., Li, B. & Shen, Z. Heterogeneous risk minimization. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T) 6804–6814 (PMLR, 2021).

  135. Robey, A., Pappas, G. J. & Hassani, H. Model-based domain generalization. Adv. Neural Inf. Process. Syst. 34, 20210–20229 (2021).

  136. Shen, Z. et al. Towards out-of-distribution generalization: a survey. Preprint at https://arxiv.org/abs/2108.13624 (2021).

  137. Wang, J. et al. Generalizing to unseen domains: a survey on domain generalization. IEEE Trans. Knowl. Data Eng. (2022).

  138. Zhou, K., Liu, Z., Qiao, Y., Xiang, T. & Loy, C. C. Domain generalization: a survey. Preprint at https://arxiv.org/abs/2103.02503 (2021).

  139. Locatello, F. et al. Challenging common assumptions in the unsupervised learning of disentangled representations. In Proc. 36th International Conference on Machine Learning (eds Chaudhuri, K. & Salakhutdinov, R.) 4114–4124 (PMLR, 2019).

  140. Geirhos, R. et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. Preprint at https://arxiv.org/abs/1811.12231 (2018).

  141. Geirhos, R. et al. Generalisation in humans and deep neural networks. Adv. Neural Inf. Process. Syst. 31, 7538–7550 (2018).

  142. Kim, H. & Mnih, A. Disentangling by factorising. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2649–2658 (PMLR, 2018).

  143. Yang, M. et al. CausalVAE: disentangled representation learning via neural structural causal models. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 9588–9597 (IEEE, 2021).

  144. Leeb, F. et al. Structure by architecture: disentangled representations without regularization. Preprint at https://arxiv.org/abs/2006.07796 (2020).

  145. Träuble, F. et al. On disentangled representations learned from correlated data. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 10401–10412 (PMLR, 2021).

  146. Dittadi, A. et al. On the transfer of disentangled representations in realistic settings. Preprint at https://arxiv.org/abs/2010.14407 (2020).

  147. Andreassen, A., Bahri, Y., Neyshabur, B. & Roelofs, R. The evolution of out-of-distribution robustness throughout fine-tuning. Preprint at https://arxiv.org/abs/2106.15831 (2021).

  148. Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).

  149. Taori, R. et al. When robustness doesn’t promote robustness: synthetic vs. natural distribution shifts on ImageNet. In International Conference on Learning Representations (ICLR) (2019).

  150. Albuquerque, I., Monteiro, J., Darvishi, M., Falk, T. H. & Mitliagkas, I. Adversarial Target-Invariant Representation Learning for Domain Generalization (DeepAI, 2020).

  151. Li, Y. et al. Deep domain generalization via conditional invariant adversarial networks. In Proc. European Conference on Computer Vision (ECCV) (eds Ferrari, V. et al.) 624–663 (Springer, 2018).

  152. Ganin, Y. & Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Proc. 32nd International Conference on Machine Learning (eds Bach, F. & Blei, D.) 1180–1189 (JMLR, 2015).

  153. Ganin, Y. et al. Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17, 1–35 (2016).

  154. Shao, R., Lan, X., Li, J. & Yuen, P. C. Multi-adversarial discriminative deep domain generalization for face presentation attack detection. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 10015–10023 (IEEE, 2019).

  155. Motiian, S., Piccirilli, M., Adjeroh, D. A. & Doretto, G. Unified deep supervised domain adaptation and generalization. In Proc. IEEE International Conference on Computer Vision 5716–5726 (IEEE, 2017).

  156. Muandet, K., Balduzzi, D. & Schölkopf, B. Domain generalization via invariant feature representation. In Proc. 30th International Conference on Machine Learning (eds Dasgupta, S. & McAllester, D.) I-10–I-18 (2013).

  157. Menegola, A. et al. Knowledge transfer for melanoma screening with deep learning. In IEEE 14th International Symposium on Biomedical Imaging (ISBI) 297–300 (IEEE, 2017).

  158. Xie, H. et al. Dual network architecture for few-view CT-trained on ImageNet data and transferred for medical imaging. In Proc. SPIE 11113, Developments in X-Ray Tomography XII (eds Müller, B. & Wang, G.) 111130V (SPIE, 2019).

  159. Alzubaidi, L. et al. Towards a better understanding of transfer learning for medical imaging: a case study. Appl. Sci. 10, 4523 (2020).

    Article  Google Scholar 

  160. Heker, M. & Greenspan, H. Joint liver lesion segmentation and classification via transfer learning. Preprint at https://arxiv.org/abs/2004.12352 (2020).

  161. Chen, S., Ma, K. & Zheng, Y. Med3D: transfer learning for 3D medical image analysis. Preprint at https://arxiv.org/abs/1904.00625 (2019).

  162. Liang, G. & Zheng, L. A transfer learning method with deep residual network for pediatric pneumonia diagnosis. Comput. Methods Prog. Biomed. 187, 104964 (2020).

    Article  Google Scholar 

  163. Geyer, R., Corinzia, L. & Wegmayr, V. Transfer learning by adaptive merging of multiple models. In Proc. 2nd International Conference on Medical Imaging with Deep Learning (eds Cardoso, M. J. et al.) 185–196 (PMLR, 2019).

  164. Noroozi, M. & Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. In European Conference on Computer Vision (eds Leibe, B. et al.) 69–84 (Springer, 2016).

  165. Zhang, R., Isola, P. & Efros, A. A. Colorful image colorization. In European Conference on Computer Vision (eds Leibe, B. et al.) 649–666 (Springer, 2016).

  166. Wu, Z., Xiong, Y., Yu, S. X. & Lin, D. Unsupervised feature learning via non-parametric instance discrimination. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 3733–3742 (IEEE, 2018).

  167. Hénaff, O. J. et al. Data-efficient image recognition with contrastive predictive coding. In Proc. 37th International Conference on Machine Learning (eds Daumé, H. & Singh, A.) 4182–4192 (PMLR, 2020).

  168. van den Oord, A., Li, Y. & Vinyals, O. Representation learning with contrastive predictive coding. Preprint at https://arxiv.org/abs/1807.03748 (2018).

  169. Hjelm, R. D. et al. Learning deep representations by mutual information estimation and maximization. Preprint at https://arxiv.org/abs/1808.06670v5 (2019).

  170. Ye, M., Zhang, X., Yuen, P. C. & Chang, S.-F. Unsupervised embedding learning via invariant and spreading instance feature. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 6203–6212 (IEEE, 2019).

  171. Bachman, P., Hjelm, R. D. & Buchwalter, W. Learning representations by maximizing mutual information across views. Adv. Neural Inf. Process. Syst. 32, 15535–15545 (2019).

  172. Tian, Y., Krishnan, D. & Isola, P. Contrastive multiview coding. In European Conference on Computer Vision (eds Vedaldi, A. et al.) 776–794 (Springer, 2019).

  173. Misra, I. & Maaten, L. V. D. Self-supervised learning of pretext-invariant representations. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 6706–6716 (IEEE, 2020).

  174. Caron, M. et al. Unsupervised learning of visual features by contrasting cluster assignments. Adv. Neural Inf. Process. Syst. 33, 9912–9924 (2020).

  175. Bai, W. et al. Self-supervised learning for cardiac MR image segmentation by anatomical position prediction. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Shen, D. et al.) 541–549 (Springer, 2019).

  176. Spitzer, H., Kiwitz, K., Amunts, K., Harmeling, S. & Dickscheid, T. Improving cytoarchitectonic segmentation of human brain areas with self-supervised Siamese networks. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Frangi, A. F. et al.) 663–671 (Springer, 2018).

  177. Zhuang, X. et al. Self-supervised feature learning for 3D medical images by playing a Rubik’s cube. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Shen, D. et al.) 420–428 (Springer, 2019).

  178. Zhu, J. et al. Rubik’s Cube+: a self-supervised feature learning framework for 3D medical image analysis. Med. Image Anal. 64, 101746 (2020).

  179. Chaitanya, K., Erdil, E., Karani, N. & Konukoglu, E. Contrastive learning of global and local features for medical image segmentation with limited annotations. Adv. Neural Inf. Process. Syst. 33, 12546–12558 (2020).

  180. He, X. et al. Sample-efficient deep learning for COVID-19 diagnosis based on CT scans. Adv. Neural Inf. Process. Syst. 33, 12546–12558 (2020).

  181. Li, H. et al. Imbalance-aware self-supervised learning for 3D radiomic representations. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds de Bruijne, M. et al.) 36–46 (Springer, 2021).

  182. Liu, J. et al. Align, attend and locate: chest X-ray diagnosis via contrast induced attention network with limited supervision. In Proc. IEEE/CVF International Conference on Computer Vision 10632–10641 (IEEE, 2019).

  183. Zhou, H.-Y. et al. Comparing to learn: surpassing ImageNet pretraining on radiographs by comparing image representations. In International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Martel, A. L.) 398–407 (Springer, 2020).

  184. Soni, P. N., Shi, S., Sriram, P. R., Ng, A. Y. & Rajpurkar, P. Contrastive learning of heart and lung sounds for label-efficient diagnosis. Patterns 3, 100400 (2021).

  185. Liu, Q., Yu, L., Luo, L., Dou, Q. & Heng, P. A. Semi-supervised medical image classification with relation-driven self-ensembling model. IEEE Trans. Med. Imaging 39, 3429–3440 (2020).

  186. Wang, D., Zhang, Y., Zhang, K. & Wang, L. FocalMix: semi-supervised learning for 3D medical image detection. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition 3950–3959 (IEEE, 2020).

  187. Zhang, Y., Jiang, H., Miura, Y., Manning, C. D. & Langlotz, C. P. Contrastive learning of medical visual representations from paired images and text. In Proc. 7th Machine Learning for Healthcare Conference (eds Lipton, Z. et al.) 2–25 (PMLR, 2020).

  188. Truong, T., Mohammadi, S. & Lenga, M. How transferable are self-supervised features in medical image classification tasks? In Proc. Machine Learning for Health (eds Roy, S. et al.) 54–74 (PMLR, 2021).

Acknowledgements

This project was an extensive collaboration between Google Brain and the Google Health AI Team. We thank Z. Ghahramani for valuable feedback and continuous support through the course of the project; M. Raghu, J. Krause, D. Eck and M. Howell for valuable feedback in improving the quality of the work; J. Uszkoreit, J. Deaton, V. Godbole, M. Sieniek, S. Prabhakara, D. Golden, D. Steiner, X. Zhai, A. Giurgiu, T. Duerig, C. Semturs, P. Bui, J. Hartford, S. Jansen, S. Shetty, T. Spitz, D. Tran, J. Luo, O. Wichrowska and A. Ward for support throughout this project; multiple contributors to this international project: Rajavithi Hospital, Thailand, Lions Eye Institute and Derbarl Yerrigan Health Service, Western Australia, Stanford Center for Artificial Intelligence in Medicine and Imaging, MIT Laboratory for Computational Physiology and PhysioNet, and NIH Clinical Centre; our collaborators at DermPath AI, Apollo Hospitals and EyePACS for support of this work; collaborators at Northwestern Medicine and all members of the Etemadi Research Group for support of this work.

The images and data used in this publication were derived from the Optimam database, the creation of which was funded by Cancer Research UK. Part of the retinal image dataset was provided for the study by Sankara Nethralaya, Chennai, India. The results included in this paper are in whole or in part based on data generated by The Cancer Genome Atlas (TCGA) managed by the NCI and NHGRI. Information about TCGA can be found at the NIH website. This study also used archived and anonymized pathology slides, clinicopathologic variables, and outcomes from the Institute of Pathology and the Biobank at the Medical University of Graz. The study also used pathology slides from the CAMELYON challenge.

Author information

Contributions

S.A., J.F., L.C., V.N., N.H., A.K., M.N., S.K., T.C., N.T., J.M., B.M., P.S., S.S.M., F.R., E.W., P.-H.C.C. and G.H. contributed to the conception and design of the work. S.A., L.C., J.F., V.N., A.K., B.B., P.B., E.W., P.-H.C.C., Yuan Liu, Yun Liu, S.M.M., A.L., J.W., M.W., Z.B., A.G.R., D.R.W., L.P., G.S.C., U.T. and J.K. contributed to data acquisition. S.A., L.C., J.F., S.B., B.M. and V.N. were the main contributors to the evaluation of the work. S.A., L.C., J.F., V.N., N.H., A.K., M.N., S.B., S.K., T.C., B.B., D.R.W., D.F., G.S.C. and M.E. contributed to analysis and interpretation of the data. S.A., L.C., J.F., V.N., N.H., A.K., M.N., S.K., E.W., P.S., S.S.M. and M.E. contributed to drafting and revising the paper. N.H., A.K., M.N. and V.N. contributed equally as co-advisers.

Corresponding authors

Correspondence to Shekoofeh Azizi, Alan Karthikesalingam or Vivek Natarajan.

Ethics declarations

Competing interests

This study was funded by Google LLC and/or a subsidiary thereof (‘Google’). J.F., L.C., S.A., V.N., N.H., A.K., M.N., B.M., S.B., P.S., S.S.M., S.K., T.C., N.T., J.M., B.B., P.B., E.W., P.-H.C.C., Yuan Liu, Yun Liu, S.M., A.L., J.W., M.W., Z.B., A.G.R., U.T., D.R.W., D.F., L.P., G.S.C., J.K. and G.H. are employees of Google and may own stock as part of the standard compensation package. M.E. received funding from Google to support the research collaboration.

Peer review

Peer review information

Nature Biomedical Engineering thanks Pranav Rajpurkar and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 REMEDIS comparison with strong supervised JFT baseline under severe synthetic data shifts.

We observe that, under increasing severity of synthetic shifts, the performance of both REMEDIS and the supervised baseline drops; however, the drop is more gradual for REMEDIS.
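
The synthetic shifts are in the spirit of common-corruption benchmarks (ref. 92). As a hedged illustration of how such a severity sweep can be set up, the sketch below corrupts an image in [0, 1] with Gaussian noise whose strength grows with a severity level from 1 to 5; the sigma schedule is an assumption for illustration, not the exact corruption parameters used in the study.

    import numpy as np

    def gaussian_noise(image, severity):
        """Corrupt an image in [0, 1] with noise that grows with severity (1-5)."""
        sigma = (0.04, 0.06, 0.08, 0.10, 0.15)[severity - 1]  # assumed schedule
        rng = np.random.default_rng(severity)
        return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

    # Trace a degradation curve: evaluate each model once per severity level.
    image = np.full((224, 224, 3), 0.5)
    shifted = [gaussian_noise(image, s) for s in range(1, 6)]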

Extended Data Fig. 2 Overview of our experimental setup for the development of REMEDIS and of the baseline AI models across the various medical-imaging tasks.

The different stages in which unlabelled and labelled data (both ID and OOD) are used for model development and evaluation.

Extended Data Fig. 3 Visual samples of distribution shifts across the medical-imaging tasks considered in this study.

Variation between ID and OOD data can be visually subtle or pronounced. This variation includes (but is not limited to) changes in contrast, sharpness or tint, differences in the non-linear effects of X-ray-sensor construction, and differences in zoom levels. The underlying cause of the distribution shift can be associated with technology shift, demographic shift or behavioural shift (ref. 45).

Supplementary information

Main Supplementary Information

Supplementary Results, Discussion, Figures, Tables and References.

Reporting Summary

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Azizi, S., Culp, L., Freyberg, J. et al. Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging. Nat. Biomed. Eng 7, 756–779 (2023). https://doi.org/10.1038/s41551-023-01049-7
