Introduction

Retinal exudates and/or drusen (RED) share similar appearances but are signs of different vision-threatening fundus diseases [1,2,3]. For example, retinal exudates in diabetic patients indicate a diabetic retinopathy (DR) severity level of moderate or worse [4], and these patients need to be referred to retina specialists for further clinical evaluation [5, 6]. In addition, retinal exudates can also result from other retinal vasculopathy (e.g., hypertensive retinopathy and retinal vein occlusion) and retinal inflammation (e.g., acute retinal necrosis syndrome and uveoencephalitis), and drusen can be a sign of age-related macular degeneration (AMD) [1,2,3]. These RED-related fundus diseases often lead to irreversible visual impairment, and early diagnosis and timely medical intervention help to prevent visual loss and blindness [7,8,9,10,11]. Digital imaging of the retina examined by retina specialists is sensitive in detecting RED [12]. However, manual examination of fundus images is time-consuming and labour-intensive, especially at a large scale.

To enhance the early detection of RED in screening programs, various computer-based artificial intelligence (AI) systems have been developed to automatically detect RED in fundus images [13,14,15,16,17,18]. These systems were developed on DR and AMD datasets [13,14,15,16,17,18] and can therefore detect only RED related to DR and AMD, with limited applicability to other RED-related fundus diseases, such as intermediate and posterior uveitis, Coats’ disease, and acute retinal necrosis syndrome [19,20,21,22,23].

Currently, all computer-based systems for detecting RED are based on traditional fundus images [2, 3, 13,14,15,16,17,18,19, 24]. This type of image provides only a 30- to 60-degree visible scope of the retina and is mainly used to detect lesions in the posterior pole, offering limited information about the peripheral retina [25, 26]. RED that initially appears in the peripheral retina, such as exudates caused by intermediate uveitis [27], is therefore easily missed by these systems. Ultra-widefield fundus (UWF) imaging provides a 200-degree panoramic view of the retina and can be used to identify fundus lesions across almost the entire retina, including both the posterior pole and peripheral regions [28]. Several studies have developed AI systems using UWF images for identifying lattice degeneration, retinal detachment, idiopathic macular holes, DR, pathological myopia, etc. [29,30,31,32,33,34]. To date, no automated intelligent system for detecting RED in UWF images has been reported.

In this study, we developed a system with deep learning for the automated detection of RED-related fundus diseases using UWF images. In addition, we evaluated this system in three independent datasets and compared the performance of the system to that of retina specialists.

Methods

Data acquisition

A total of 22,411 UWF images (13,258 subjects) were collected from the Chinese Medical Alliance for Artificial Intelligence (CMAAI), a union of medical organizations, computer science research groups, and related enterprises in the AI field with the purpose of improving the research and translational applications of AI in medicine. The CMAAI dataset includes subjects who underwent retinopathy examinations, ophthalmology consultations, and routine ophthalmic health evaluations. The UWF images were captured between June 2016 and June 2019 using an OPTOS nonmydriatic camera (OPTOS Daytona, Dunfermline, UK) with 200-degree fields of view. The subjects were examined without mydriasis. All of the UWF images were anonymized before being transferred to research investigators. This study was approved by the Institutional Review Board of Zhongshan Ophthalmic Center (ZOC) and performed in accordance with the tenets of the Declaration of Helsinki.

Image classification and reference standard

All UWF images were classified into two groups: RED and non-RED. The RED group included images of various types of retinal exudates (e.g., DR, intermediate and posterior uveitis, and Coats’ disease) and drusen (e.g., AMD and optic nerve drusen). The non-RED group included images of the normal retina and other retinal lesions, such as retinal detachment, glaucomatous optic neuropathy, lattice degeneration, and retinal haemorrhages. Poor-quality images were automatically excluded from the study by our previously established deep learning-based image filtering system [35].

Training a deep learning system requires a reliable reference standard [36]. The reference standard for all included UWF images was obtained by consensus annotation by three board-certified retina specialists with over five years of experience. Any disputed images were adjudicated by another senior retina specialist with over twenty years of experience. The performance of the deep learning system in detecting RED was compared to this reference standard.

Image preprocessing and augmentation

We performed image standardization before deep learning. Pixel values of the UWF images were scaled to the range 0–1, and the images were resampled to a resolution of 512 × 512 pixels. Data augmentation was conducted to increase the diversity of the training dataset and thus reduce the chance of overfitting during training. The training dataset was augmented 5-fold via a combination of random horizontal and vertical flips, random rotation of up to 90 degrees around the image centre, and random brightness shifts within the range of 0.8 to 1.6. A total of 69,925 UWF images were used as training data.
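For illustration, the following is a minimal sketch of the preprocessing and augmentation described above, using the tf.keras ImageDataGenerator. The directory layout, batch size, and variable names are assumptions for illustration rather than the authors' actual code.

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation parameters follow the text: random flips, rotation up to 90 degrees,
# brightness shift in the range 0.8-1.6, and pixel values rescaled to 0-1.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,            # scale pixel values to the range 0-1
    horizontal_flip=True,         # random horizontal flip
    vertical_flip=True,           # random vertical flip
    rotation_range=90,            # random rotation of up to 90 degrees about the image centre
    brightness_range=(0.8, 1.6),  # random brightness shift
)

# Hypothetical directory layout: train/RED/... and train/non_RED/...
train_flow = train_datagen.flow_from_directory(
    "train/",
    target_size=(512, 512),       # resample images to 512 x 512 pixels
    batch_size=32,
    class_mode="binary",
)
```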

Development and evaluation of the deep learning model

The whole pipeline of our deep learning model development is shown in Fig. 1. The UWF images from the CMAAI dataset were randomly divided into a training set, a validation set, and a test set at a ratio of 7:1.5:1.5, with no overlapping subjects between sets. The training set was used to optimize the parameters of the deep learning model, the validation set was used to guide the selection of hyperparameters, and the test set was used to evaluate the selected model. Two external datasets were used to further verify the effectiveness of the model: one was obtained from the outpatient clinics at ZOC in Guangzhou (southeast China) and consisted of 1311 UWF images from 676 subjects, and the other was obtained from the outpatient clinics and health screening centre at Xudong Ophthalmic Hospital (XOH) in Inner Mongolia (northwest China) and consisted of 2687 UWF images from 1060 subjects. The reference standard for these two datasets was the same as that for the CMAAI dataset.
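A subject-level split of this kind can be implemented, for example, with scikit-learn's GroupShuffleSplit so that no subject's images appear in more than one partition. The sketch below assumes a hypothetical index file with image_path, subject_id, and label columns; it is not the authors' code.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.read_csv("cmaai_images.csv")  # hypothetical index: image_path, subject_id, label

# 70% of subjects for training; the remaining 30% split evenly into validation and test.
gss = GroupShuffleSplit(n_splits=1, train_size=0.70, random_state=42)
train_idx, rest_idx = next(gss.split(df, groups=df["subject_id"]))
train_df, rest_df = df.iloc[train_idx], df.iloc[rest_idx]

gss_rest = GroupShuffleSplit(n_splits=1, train_size=0.50, random_state=42)
val_idx, test_idx = next(gss_rest.split(rest_df, groups=rest_df["subject_id"]))
val_df, test_df = rest_df.iloc[val_idx], rest_df.iloc[test_idx]
```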

Fig. 1: The pipeline of developing and evaluating a deep learning system for the detection of retinal exudates and/or drusen based on ultra-widefield fundus (UWF) images.

CMAAI Chinese Medical Alliance for Artificial Intelligence, ZOC Zhongshan Ophthalmic Center, XOH Xudong Ophthalmic Hospital.

Our deep learning model was trained in TensorFlow using a state-of-the-art deep convolutional neural network (CNN) architecture, InceptionResNetV2, which combines architectural characteristics of two earlier CNNs (the Residual Network and the Inception Network) [37]. The network was initialized with weights pretrained for ImageNet classification [38].
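The backbone can be instantiated with the Keras applications API as sketched below. The classification head (global average pooling followed by a sigmoid unit), optimizer, and learning rate are assumptions, as the paper specifies only the backbone and the pretrained initialization.

```python
import tensorflow as tf
from tensorflow.keras.applications import InceptionResNetV2

base = InceptionResNetV2(
    weights="imagenet",        # initialize with ImageNet-pretrained weights
    include_top=False,         # drop the 1000-class ImageNet classification head
    input_shape=(512, 512, 3),
)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # predicted probability of RED
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed learning rate
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.AUC(name="auc")],
)
```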

The model was trained for up to 180 epochs. During training, the validation loss was assessed on the validation set after each epoch and served as the reference for model selection. Early stopping was employed: if the validation loss did not improve over 60 consecutive epochs, training was stopped. The model state with the lowest validation loss was selected as the final model.
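Continuing the sketches above (model and train_flow, plus a hypothetical val_flow validation generator), this training schedule could be expressed with standard Keras callbacks; the checkpoint file name is illustrative.

```python
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=60,                # stop if validation loss does not improve for 60 consecutive epochs
        restore_best_weights=True,  # keep the model state with the lowest validation loss
    ),
    tf.keras.callbacks.ModelCheckpoint(
        "best_model.h5",            # also persist the best checkpoint to disk
        monitor="val_loss",
        save_best_only=True,
    ),
]

history = model.fit(
    train_flow,                     # augmented training data
    validation_data=val_flow,       # validation set assessed after each epoch
    epochs=180,                     # train for up to 180 epochs
    callbacks=callbacks,
)
```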

Model explanation

The saliency map visualization technique was used to understand which areas in a UWF image had the most influence on our deep learning system when detecting RED. This technique computes the gradient of the CNN output with respect to each pixel in the image, identifying the pixels with the greatest impact on the final classification. The intensity of the heatmap directly indicates each pixel's impact on the system's classification. Using this approach, the heatmap traces the classification back to specific locations in the UWF image, highlighting features that positively contributed to it. The effectiveness of the heatmap was assessed by a senior retina specialist based on whether the highlighted regions colocalized with the RED regions.
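A generic gradient-based saliency map of this kind can be computed as sketched below; this is an illustration of the technique described, not the authors' exact visualization code.

```python
import tensorflow as tf

def saliency_map(model, image):
    """image: float32 array of shape (512, 512, 3), preprocessed to the range 0-1."""
    x = tf.convert_to_tensor(image[None, ...])       # add a batch dimension
    with tf.GradientTape() as tape:
        tape.watch(x)                                # track gradients with respect to the input pixels
        score = model(x, training=False)[:, 0]       # predicted probability of RED
    grads = tape.gradient(score, x)[0]               # d(score) / d(pixel)
    sal = tf.reduce_max(tf.abs(grads), axis=-1)      # strongest channel gradient per pixel
    sal = (sal - tf.reduce_min(sal)) / (tf.reduce_max(sal) - tf.reduce_min(sal) + 1e-8)
    return sal.numpy()                               # heatmap normalized to 0-1 for display
```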

Characteristics of misclassified images

To analyse misclassified images, a senior retina specialist reviewed them and categorized false-negative images and false-positive images according to the most commonly observed characteristics.

Comparisons between the deep learning system and retina specialists

To evaluate our deep learning system in the context of RED screening, we recruited two retina specialists with 3 and 6 years of experience, respectively, in UWF image analysis, and compared the performance of the system with that of the retina specialists against the reference standard using the ZOC dataset. Notably, to reflect the retina specialists' performance in routine clinical practice and to avoid bias from competition, they were not told that their results would be compared with those of the deep learning system.

Statistical analyses

Receiver operating characteristic (ROC) curves and areas under the curve (AUCs) with 95% confidence intervals (CIs) were used to estimate the performance of the deep learning system in the different datasets. The sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) were calculated against the reference standard for each dataset. Unweighted Cohen’s kappa coefficients were used to assess agreement between the system’s results and the reference standard. All statistical analyses were conducted using Python 3.7.3 (Wilmington, Delaware, USA).
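The reported metrics can be reproduced with scikit-learn as sketched below, given arrays of reference-standard labels (1 = RED) and model probabilities for one dataset. The 0.5 decision threshold is illustrative, and confidence intervals (e.g., via bootstrapping) are omitted for brevity.

```python
from sklearn.metrics import roc_auc_score, confusion_matrix, cohen_kappa_score

def evaluate(y_true, y_prob, threshold=0.5):
    """Compute AUC, sensitivity, specificity, accuracy, PPV, NPV, and unweighted kappa."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        "auc": roc_auc_score(y_true, y_prob),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": cohen_kappa_score(y_true, y_pred),  # unweighted by default
    }
```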

Results

Baseline characteristics

In total, 26,409 UWF images from 14,994 subjects were used to develop and evaluate the deep learning system. The demographics and image characteristics of the datasets from the CMAAI, ZOC, and XOH are summarized in Table 1.

Table 1 Demographics and image characteristics of datasets.

Performance of the deep learning system

The AUCs of the system in detecting RED were 0.994 (95% CI: 0.991–0.996), 0.972 (95% CI: 0.957–0.984), and 0.988 (95% CI: 0.983–0.992) in the CMAAI test set, ZOC set, and XOH set, respectively (Fig. 2). Further information on the system’s performance, including the sensitivity, specificity, accuracy, PPV, and NPV of each dataset, is shown in Supplementary Table 1. Compared to the reference standard of the CMAAI test set, ZOC set, and XOH set, the unweighted Cohen’s kappa coefficients of the system were 0.878 (95% CI: 0.853–0.902), 0.876 (95% CI: 0.841–0.911), and 0.815 (95% CI: 0.772–0.857), respectively.

Fig. 2: Receiver operating characteristic (ROC) curves of the deep learning system in detecting retinal exudates and/or drusen from ultra-widefield fundus images obtained at multiple sites.

AUC area under the ROC curve, CI confidence interval, CMAAI Chinese Medical Alliance for Artificial Intelligence, ZOC Zhongshan Ophthalmic Center, XOH Xudong Ophthalmic Hospital.

Interpretability of the deep learning system

To investigate the interpretability of the deep learning system in detecting RED from UWF images, the network was visualized by saliency maps. We found that heatmaps effectively highlighted regions of RED, regardless of the number, location, and shape of the RED. Typical examples of heatmaps for RED images are shown in Fig. 3.

Fig. 3: Typical examples of positive images and corresponding heatmaps.

Drusen of age-related macular degeneration shown in A1 correspond to the highlighted areas displayed in heatmap A2. Retinal exudates of diabetic retinopathy shown in B1 correspond to the highlighted areas displayed in heatmap B2. Retinal exudates of Coats’ disease shown in C1 correspond to the highlighted areas displayed in heatmap C2.

False-negative and false-positive findings

In the CMAAI test set, ZOC set, and XOH set, a total of 44 RED images were misclassified into the non-RED group by the deep learning system (false-negative classifications), among which 14 images showed RED under obscured optical media, 16 images showed unclear RED due to underexposure, and 14 images showed tiny RED (Supplementary Figure 1A). In contrast, a total of 164 non-RED images were erroneously assigned to the RED group (false-positive classifications), among which 41 images had flash artifacts, 73 images had harsh reflections from the internal limiting membrane, and 50 images had several flecks of white dust (Supplementary Figure 1B).

Retina specialists vs. the deep learning system

In the ZOC dataset, for detecting RED in UWF images, the retina specialist with 6 years of experience achieved a sensitivity of 93.5% (95% CI: 90.2–96.8) and specificity of 97.4% (95% CI: 96.4–98.4), and the retina specialist with 3 years of experience achieved a sensitivity of 86.6% (95% CI: 82.1–91.1) and specificity of 94.4% (95% CI: 93.3–95.8), while the deep learning system achieved a sensitivity of 94.9% (95% CI: 92.0–97.8) and specificity of 96.5% (95% CI: 95.4–97.6) (Fig. 4).

Fig. 4: Comparisons between the deep learning system and retina specialists for the detection of RED in the dataset of the Zhongshan Ophthalmic Center.

Retina specialist A, 6 years of experience in ultra-widefield fundus (UWF) image analysis. Retina specialist B, 3 years of experience in UWF image analysis. RED, retinal exudates and drusen. AUC, area under the receiver operating characteristic curve. The panel on the right is an enlarged view of the yellow-shaded region of the panel on the left.

Discussion

In this study, we developed and evaluated a deep learning system based on 26,409 UWF images. The system exhibited robust performance in automated RED screening and showed broad generalizability, since the AUCs in all of the external validation datasets were greater than 0.97. The unweighted Cohen’s kappa coefficients also indicated high agreement between the system’s outputs and the reference standard, further demonstrating its effectiveness. The system’s performance in detecting RED was comparable to that of a retina specialist with six years of experience and better than that of a retina specialist with three years of experience.

Owing to its automation and reliable performance, this deep learning-based system can be applied in primary care centres and undeveloped areas that lack retina specialists for the early detection of RED-related fundus diseases (e.g., uveoencephalitis, referable DR, and AMD), providing timely referrals for positive cases. In addition, combining this system with our previously established UWF image-based screening system [39,40,41,42] would allow more fundus lesions to be detected; the combined system could be deployed in hospitals with large patient volumes to assist retina specialists by avoiding examinations of evidently normal eyes, saving time for patients in need.

Several automated techniques for detecting RED have been published. Sánchez et al. [16] built a computer-aided diagnosis system using active learning based on 4000 retinal images and reported an AUC of 0.823. Rocha et al. [43] developed a machine learning-based approach using 1014 retinal images, which achieved an AUC of 0.953. Sadek et al. [13] developed a deep learning method based on 1113 retinal images and reported an accuracy of 92.0%. Compared with these studies, our study has several unique features. First, all previous studies were based on traditional fundus images, which are likely to miss lesions in the peripheral retina because of their limited visible scope; our study developed the first deep learning system to detect RED using UWF images, which cover almost the entire retina. Second, we developed a lesion-based rather than a disease-based screening system to screen for RED-related fundus diseases. As a preliminary screening tool, it may be more reasonable and reliable to detect RED than to make a specific diagnosis from fundus images alone, without other clinical information (e.g., age, lifestyle, and medical history) and examinations. Third, to enhance performance, the datasets used to train and validate the system were substantially large (26,409 UWF images from 14,994 subjects). Fourth, our datasets were acquired at multiple medical centres with different UWF cameras and are thereby more representative of the real world.

Deep learning algorithms are often deemed a “black box” because they use millions of image features to make a classification [44]. To interpret the decision-making rationale of our system, heatmaps were generated to indicate the locations on which the system’s decisions were based. Encouragingly, the regions of RED in the UWF images were highlighted, further substantiating the effectiveness of our system. This interpretability could further promote the application of our system in real-world settings and could assist retina specialists in efficiently localizing lesion sites.

Although our system achieved high accuracy in detecting RED, misclassifications still occurred. Among the false-negative classifications, approximately 63.6% of the misclassified images resulted from unclear RED features caused by obscured optical media or underexposure; the remaining false-negative images were attributable to RED that was too small to be identified. The false-positive classifications were due to flash artifacts, white dust, or harsh reflections from the internal limiting membrane, all of which have an appearance similar to that of RED. To reduce the system’s false-negative and false-positive results, further studies are needed to explore why these misclassifications occurred and to develop strategies to minimize them.

There are several limitations to this study. First, the system was developed based on two-dimensional images lacking stereoscopic qualities, rendering the differentiation between dot-like RED and white dust on the camera lens challenging; it is therefore necessary to keep the lens clean before taking an image. Second, although UWF imaging captures the largest visible scope of the retina among existing technologies, it still cannot cover the whole retina. Accordingly, our system may miss RED that is not captured by UWF imaging. Third, because the images were collected retrospectively from multiple medical centres, the nature of the lesions (drusen or exudates) in most UWF images could not be determined owing to the lack of other clinical information and examinations. Therefore, this study did not establish a system to differentiate drusen from exudates.

Conclusions

We have developed and evaluated a deep learning system capable of detecting RED in UWF images obtained from multiple clinical settings. The system achieves high sensitivity and specificity, comparable to those of an experienced retina specialist, and has great potential to provide timely referrals for positive cases, promoting the early detection and treatment of RED-related fundus diseases. Prospective multicentre validation in subsequent studies is expected to provide high-level evidence for real-world application.

Summary

What was known before

  • Retinal exudates and drusen (RED) are signs of many ocular fundus diseases (e.g., uveoencephalitis, referable diabetic retinopathy [DR], and age-related macular degeneration [AMD]) that can result in irreversible vision loss. Early detection and treatment of these RED-related diseases can help reduce retinal damage and improve visual prognosis. Manual screening for RED is time-consuming and labour-intensive, and automated diagnosis systems based on traditional fundus images (30- to 60-degree visible scope of the retina) often miss RED located in the peripheral retina.

What this study adds

  • We developed a deep learning system to detect the RED of a variety of ocular fundus diseases from ultra-widefield fundus (UWF) images (200-degree), which provide information on almost the entire retina. This system achieves high sensitivity and specificity, comparable to those of an experienced retina specialist. Owing to its automation and reliable performance, our deep learning-based system has high potential to be applied to screen for RED from UWF images as part of ophthalmic health evaluations in physical examination centres, or in undeveloped areas that lack retina specialists, providing timely referrals for cases with RED-related ocular fundus diseases, such as uveoencephalitis, referable DR, and AMD.