Introduction

Diabetic retinopathy (DR), a common retinal microvascular complication of diabetes mellitus (DM), is a leading cause of vision loss in the working-age population globally [1]. With the aging of the global population and the expansion of the DM epidemic, the prevalence of DR is expected to continue to rise [2, 3]. The pooled prevalence of DR in the Chinese general population was 1.14% from 1990 to 2017; among patients with DM, the pooled prevalence of DR was 18.45% [4]. It is worth noting that the onset of DR is insidious, with few early symptoms; when diabetic patients first present to the hospital with fundus morbidities, they often already have intermediate- or late-stage DR. Early detection is essential for a good DR prognosis, and DR screening would benefit all diabetic patients, especially in developing countries.

DR screening is usually performed by ophthalmologists through fundus examination or fundus photography. Although fundus cameras have become common in primary hospitals, experienced ophthalmologists are scarce and the equipment is often underutilized. To assist community doctors, the public health system has established telehealth services for DR, in which fundus photographs are uploaded to higher-level hospitals for manual interpretation [5]. However, ophthalmologists at higher-level hospitals often cannot complete these interpretations in time because of their heavy workload, and manual interpretation carries a degree of subjectivity. The development of automated techniques for DR diagnosis is therefore critical to solving these problems [6]. Deep learning-based artificial intelligence (AI) grading of DR is fast and has achieved high validation accuracies. Raju et al. [7] reported a sensitivity of 80.28% and a specificity of 92.29% for automatic diagnosis of DR on the publicly available Kaggle dataset. Despite these studies, there is a lack of data on the clinical use of this technology [8, 9].

In light of this, we applied deep learning-based AI grading of DR to community hospital clinics. The aim of this study was to assess the accuracy of AI-based techniques for DR screening and to explore the feasibility of implementing them in a community hospital clinic.

Methods

Participants

Patients with diabetes who attended PengPu Town Community Hospital of Jing’an district between May 30, 2018 and July 18, 2018 were invited to participate in this study. All participants were 18 years of age or older and provided written informed consent. The study was approved by the ethics committee of Shibei Hospital, Jing’an district, Shanghai (ChiCTR1800016785).

Retinal images

Using an automatic nonmydriatic Topcon TRC-NW400 camera (Topcon, Tokyo, Japan), 45° colour retinal photographs were taken of each eye. Two fields, macula-centred and disc-centred, were captured for each eye according to the EURODIAB protocol [10]. The AI equipment was installed and used in the community hospital. The fundus photographs of each participant were analysed by AI and simultaneously transmitted to two ophthalmologists. Participants whose fundus photographs were unclear because of small pupils, cataracts or vitreous opacity were excluded.

Human grading

All fundus photographs were graded independently by two ophthalmologists (retina specialists, Kappa (κ) = 0.89954) who were masked to each other’s grades and to the AI device outputs. When the two retina specialists disagreed, a third retina specialist adjudicated. Retinopathy was graded according to the International Clinical Diabetic Retinopathy (ICDR) severity scale [11]. The five stages of DR are classified as follows: score 0, no apparent retinopathy, no abnormalities; score 1, mild nonproliferative DR (NPDR), microaneurysms only; score 2, moderate NPDR, more than just microaneurysms but less than severe NPDR; score 3, severe NPDR, one or more of the following: (i) more than 20 intraretinal haemorrhages in each of four quadrants, (ii) definite venous beading in two or more quadrants and (iii) prominent intraretinal microvascular abnormality in one or more quadrants; score 4, proliferative DR (PDR), retinal neovascularization with or without vitreous/preretinal haemorrhage [8, 11]. Referable diabetic retinopathy (RDR) was defined as more than mild NPDR and/or macular oedema [12].
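To make these definitions concrete, the short sketch below maps an ICDR score to the “any DR” and “referable DR” labels used in the analysis. It is purely illustrative: the macular oedema flag is a hypothetical separate input, since oedema is not part of the ICDR score itself.

```python
# Illustrative mapping from an ICDR severity score (0-4) to screening labels.
def screening_labels(icdr_score: int, macular_oedema: bool = False) -> dict:
    if icdr_score not in range(5):
        raise ValueError("ICDR score must be 0-4")
    any_dr = icdr_score >= 1                      # scores 1-4: any retinopathy
    # RDR: more than mild NPDR (score >= 2) and/or macular oedema.
    referable_dr = icdr_score >= 2 or macular_oedema
    return {"any_dr": any_dr, "referable_dr": referable_dr}

# Example: moderate NPDR (score 2) without oedema is referable.
print(screening_labels(2))   # {'any_dr': True, 'referable_dr': True}
```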

Automated grading

The AI device automatically analysed the retinal photographs and identified signs of DR with AI software (Airdoc, Beijing, China). A DR screening report including referral recommendations was then generated and delivered to the participant immediately. The core of the AI software is a deep neural network, a sequence of mathematical operations applied to inputs such as images. The classification network was built on the Inception-v4 architecture with an input size of 512 × 512 [13]. The network weights were pre-trained on the ImageNet dataset (1.2 million images across 1000 categories) and fine-tuned on fundus images. The network was implemented in the TensorFlow framework, and the model was trained on two GeForce GTX Titan X graphics processing units with CUDA 9.0 and cuDNN 7.0. The network inputs are fundus images preprocessed by mirroring and rotation; the training dataset was augmented by randomly rotating each image between −15° and +15°. The network output is a vector indicating the category (one of the five DR stages) of the input image. Based on the values of the output vector, the AI software gives a DR stage prediction, accompanied by a heatmap (Fig. 1).

Fig. 1

Examples of the heatmap generated by AI on different severity levels of DR
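For a concrete picture of the training setup described above, the following is a minimal TensorFlow sketch. It is not the Airdoc implementation: Inception v4 is not bundled with tf.keras, so InceptionResNetV2 is used here as a stand-in backbone, and the optimizer, dataset pipeline and other hyperparameters are illustrative assumptions.

```python
# Minimal sketch of the fine-tuning setup described in the text (illustrative
# only; InceptionResNetV2 stands in for Inception v4, which is not bundled
# with tf.keras, and all hyperparameters are assumptions).
import tensorflow as tf

IMG_SIZE = 512      # input resolution used in the study
NUM_CLASSES = 5     # five ICDR severity stages (scores 0-4)

# Backbone pre-trained on ImageNet, with the classification head replaced.
backbone = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet",
    input_shape=(IMG_SIZE, IMG_SIZE, 3), pooling="avg")

model = tf.keras.Sequential([
    # Augmentation: mirroring and random rotation between -15° and +15°.
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(15 / 360),  # factor is a fraction of 360°
    backbone,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# train_ds / val_ds would be tf.data.Dataset objects of (image, label) pairs,
# e.g. built with tf.keras.utils.image_dataset_from_directory(...).
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```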

Statistical analysis

Statistical analysis of the data was performed using the SPSS statistical package, version 16.0. The performance of the AI algorithm was evaluated with ophthalmologist grading as the reference standard. Kappa (κ) statistics were used to quantify the degree of agreement between automated analysis and manual grading, and the sensitivity and specificity of the AI algorithm for detecting DR were calculated.
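Although the study’s metrics were computed in SPSS, the equivalent calculation of agreement, sensitivity, specificity and AUC can be illustrated with the short Python sketch below; all values shown are hypothetical and binarised for “any DR” detection.

```python
# Illustrative re-computation of the evaluation metrics (the study itself
# used SPSS 16.0). y_true: ophthalmologist grading (reference standard),
# y_pred: AI grading, y_score: AI probability for the positive class;
# all values here are hypothetical.
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred  = np.array([0, 0, 1, 0, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.6, 0.7, 0.2])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                 # true-positive rate
specificity = tn / (tn + fp)                 # true-negative rate
kappa = cohen_kappa_score(y_true, y_pred)    # chance-corrected agreement
auc = roc_auc_score(y_true, y_score)         # area under the ROC curve

print(f"sensitivity={sensitivity:.3f}  specificity={specificity:.3f}  "
      f"kappa={kappa:.3f}  AUC={auc:.3f}")
```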

Results

At the PengPu Town Community Hospital clinic, 889 diabetic patients agreed to participate in DR screening, including 418 men and 471 women. The average age of the participants was 68.46 ± 7.168 years. A total of 3556 retinal images were obtained and graded in this study. According to the ICDR classification scale, 149 participants had DR in at least one eye detected by the ophthalmologists or by AI. Ophthalmologists detected DR in 143 (16.1%) participants, while AI detected DR in 145 (16.3%) participants. The proportion of different DR grades under the two interpretation modes is shown in Fig. 2. Most of the participants’ fundus photographs revealed no DR; among participants with DR, most were scored as moderate NPDR (score 2) or severe NPDR (score 3). RDR was diagnosed in 101 (11.4%) participants based on manual grading and in 103 (11.6%) participants using the deep learning algorithm. Concordant RDR diagnoses between ophthalmologist and AI grading were observed in 91 participants (Fig. 3).

Fig. 2

Comparison of diabetic retinopathy (DR) grading between ophthalmologist and AI

Fig. 3

Venn diagram showing the overlap of RDR diagnoses between human and automated grading

With ophthalmologist grading as the reference standard, the sensitivity and specificity of AI for detecting score 0, score 1, scores 2–3, score 4, any DR and RDR are shown in Table 1. For detecting any DR, the sensitivity and specificity were 90.79% (95% CI 86.4–94.1) and 98.5% (95% CI 97.8–99.0), respectively. For detecting RDR, AI showed 91.18% (95% CI 86.4–94.7) sensitivity and 98.79% (95% CI 98.1–99.3) specificity. The area under the curve (AUC) was 0.946 (95% CI 0.935–0.956) for the detection of any DR and 0.950 (95% CI 0.939–0.960) for the detection of RDR (Table 1).

Table 1 Sensitivity, specificity and AUC of AI for the detection of varying degrees of DR with ophthalmologist grading as the reference standard

Discussion

This study assessed the accuracy of AI-based techniques for DR screening. We found that it was feasible to use an AI-based DR screening model in a Chinese community clinic.

In our study, the prevalence of DR among DM patients was 16%, similar to the report by Song et al., who conducted a meta-analysis of 31 studies and found a DR prevalence of 18.45% in Chinese DM patients [4]. In India, the prevalence of DR in patients with type 2 diabetes was 17.6% [14]. In the US diabetes population, the prevalence of any DR was 33.2% [15], and Yau et al. estimated a prevalence of any DR of 34.6% by pooling data from 35 population-based studies worldwide [1]. The differences in DR prevalence reported across regions may be related to research methods, demographic characteristics, and the methods of DR identification and classification.

Screening methods for DR are constantly improving, but they still fail to meet the demand created by the explosive growth in the number of DM patients [16]. Verma et al. [17] attempted to reduce the workload of ophthalmologists in tertiary hospitals by giving nonophthalmologists and optometrists short-term training in fundus examination with direct ophthalmoscopy. Later, a computerized “disease/no disease” grading system performed slightly better than manual grading [18]. In recent years, the emergence of deep learning algorithms has allowed quick and accurate identification of diabetic macular oedema (DMO) and grading of DR [6]. Such systems can help not only with early DR screening but also with the long-term follow-up of DM patients [9].

Our statistical analysis showed that the sensitivity of DR screening using AI was high, reaching 90.79%, and the sensitivity of RDR screening using AI was 91.18%. Abràmoff et al. [19] reported a deep learning-enhanced algorithm for the automated detection of RDR with a sensitivity of 96.8%, a specificity of 87.0% and an AUC of 0.980. Gargeya et al. [20] reported a sensitivity of 94%, a specificity of 98% and an AUC of 0.970. Our sensitivity is closest to that reported by Ting et al. [21] (sensitivity 90.5%, specificity 91.6%, AUC 0.936), whose study setting is also closest to ours, as it was conducted in community and clinical multiracial populations with diabetes. Moreover, the specificity for RDR in our study was 98.79%, slightly higher than previously reported values. The detection sensitivity for proliferative DR was 80.36%; this relatively lower sensitivity may be related to the small sample size in this group. Our system therefore performs well in distinguishing “any retinopathy” from “no retinopathy”. Positive initial screening results are further confirmed by ophthalmologists, and given the relatively low prevalence of DR, this approach can effectively reduce their workload.

A strength of this study is that the AI equipment was evaluated in a real clinical scenario. Our results and previous reports [8, 9] indicate that AI-based DR screening for outpatients is feasible. First, a community hospital clinic provides an ideal setting for identifying DR in DM patients who may not routinely undergo eye examinations. Second, the patient’s pupils do not need to be dilated for fundus photography, which patients accept more readily. Finally, the report is generated on the spot, and community hospital doctors can recommend referral to an ophthalmologist at a higher-level hospital according to the findings.

Undoubtedly, this study also has some limitations. The sample size is relatively small. For eyes with small pupils or turbid refractive media, the quality of some images was poor [16]. In addition, it is difficult for AI to accurately diagnose macular oedema based solely on fundus photographs. Classification of DMO will be an important future improvement to the AI system.

In summary, AI has a high sensitivity and specificity for identifying DR in a Chinese community clinic, and this approach appears to be feasible. Further research is needed to assess referral compliance and the effect of screening on visual outcomes.

Summary

What was known before

  • Although deep learning-based artificial intelligence (AI) grading of DR is fast and has achieved high validation accuracies, there is a lack of data on its clinical use.

What this study adds

  • This study supplements the data on the clinical use of this technology in a community clinic setting.