
  • Review Article

Predicting systemic diseases in fundus images: systematic review of setting, reporting, bias, and models’ clinical availability in deep learning studies

Abstract

Background

Analyzing fundus images with deep learning techniques is promising for screening systemic diseases. However, the quality of the rapidly growing number of studies is variable and has not been systematically evaluated.

Objective

To systematically review all articles that aimed to predict systemic parameters and conditions from fundus images using deep learning, to assess their performance, and to provide suggestions that would enable translation into clinical practice.

Methods

Two major electronic databases (MEDLINE and EMBASE) were searched up to August 22, 2023, using the keywords ‘deep learning’ and ‘fundus’. Studies that used deep learning and fundus images to predict systemic parameters were included and assessed on four aspects: study characteristics, transparent reporting, risk of bias, and clinical availability. Transparent reporting was assessed with the TRIPOD statement, and risk of bias with PROBAST.

Results

The systematic search identified 4969 articles, of which 31 were included in the review. Fundus images were used to predict a variety of vascular and non-vascular diseases and parameters, including diabetes and related diseases (19%), sex (22%), and age (19%). Most studies focused on developed countries. According to TRIPOD, reporting was insufficient for sample size determination and the handling of missing data. Full access to datasets and code was also under-reported. According to PROBAST, 1/31 (3.2%) studies was classified as having a low overall risk of bias, whereas 30/31 (96.8%) were classified as having a high risk of bias. Prospective external validation cohorts were used in 5/31 (16.1%) studies. Only two studies (6.5%) described model calibration. The number of publications per year increased significantly from 2018 to 2023. However, only two models (6.5%) had been deployed on a device, and none had been applied in clinical practice.
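
The study-level proportions in the Results can be re-derived from the reported counts. The following is a minimal Python sketch for checking them, not part of the review's own analysis. The dictionary labels and the `percentage` helper are illustrative names, and using 31 as the denominator for the items reported only as "two" is an assumption.

```python
TOTAL_STUDIES = 31  # number of studies included in the review

# Counts taken from the Results paragraph; labels are illustrative.
reported_counts = {
    "low overall risk of bias (PROBAST)": 1,
    "high overall risk of bias (PROBAST)": 30,
    "prospective external validation": 5,
    "calibration described": 2,          # denominator of 31 is assumed
    "model deployed on a device": 2,     # denominator of 31 is assumed
}


def percentage(count, total=TOTAL_STUDIES):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


percentages = {label: percentage(n) for label, n in reported_counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {reported_counts[label]}/{TOTAL_STUDIES} = {pct}%")
```

Keeping the counts alongside the denominator makes it easy to see that the low-risk and high-risk categories together account for all 31 studies.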

Conclusion

Deep learning analysis of fundus images has shown great potential for predicting systemic conditions in clinical settings. Further work is needed to improve the methodology and clinical application.



Fig. 1: PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow chart of study records.
Fig. 2: Risk of bias assessed using Prediction model Risk Of Bias Assessment Tool (PROBAST) reporting standards for all included studies.
Fig. 3: Temporal distribution illustrating the number and models’ clinical availability of studies included in this review.
Fig. 4: Distribution illustrating the number and clinical availability of studies in 7 different prediction outcomes.

Data availability

Raw data are available on request from the corresponding author.

Funding

This work was supported by National Natural Science Foundation of China (82220108017, 82141128), the Capital Health Research and Development of Special (2020–1–2052), and the Science & Technology Project of Beijing Municipal Science & Technology Commission (Z201100005520045, Z181100001818003).

Author information

Contributions

YTL and DL conceived the study and designed the study protocol. YTL, RHZ and WBW executed the search and extracted data. YTL performed the initial analysis of data, with all authors contributing to interpretation of data. All authors contributed to critical revision of the manuscript for important intellectual content and approved the final version. WBW is the study guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Corresponding author

Correspondence to Wenbin Wei.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, Y., Zhang, R., Dong, L. et al. Predicting systemic diseases in fundus images: systematic review of setting, reporting, bias, and models’ clinical availability in deep learning studies. Eye (2024). https://doi.org/10.1038/s41433-023-02914-0

  • DOI: https://doi.org/10.1038/s41433-023-02914-0
