Introduction

Diabetic retinopathy (DR) is a leading cause of vision loss worldwide, often affecting working-age individuals1. It is the most common microvascular complication of diabetes mellitus (DM), affecting nearly one-third of those with DM2. In the early non-proliferative stages of DR (NPDR), regular screening is recommended to identify individuals at risk of progression to proliferative DR (PDR)3. If PDR develops, early detection and treatment with laser and/or intravitreal anti-VEGF injections reduces the risk of permanent vision-threatening complications4,5. It is therefore imperative to accurately identify patients with PDR to guide timely treatment.

Eye exams and color fundus photographs are the standard of care to detect the presence of neovascularization (NV), the defining feature of PDR. NV must be discerned from other abnormal vascular features in NPDR, such as microaneurysms, dot-blot hemorrhages, venous beading, and intraretinal microvascular abnormalities6. Another imaging modality that complements eye exams is fluorescein angiography (FA). FA uses the fluorescent properties of fluorescein dye to highlight abnormal vascular features that may be more difficult to observe with an eye exam alone7. For example, the NV lesions that are diagnostic of PDR cause leakage on FA, which is characterized by bright hyperfluorescence that progressively enlarges over time with hazy borders. However, leakage caused by NV must still be distinguished from other angiographic findings in DR. For example, microaneurysms may appear as bright hyperfluorescent foci, retinal edema may appear as areas of late hyperfluorescence, and vascular incompetence manifests as late hyperfluorescence along retinal vessels. Thus, the interpretation of FAs may provide additional diagnostic value in DR beyond examination and color fundus photographs, but also requires a trained clinician to identify NV and PDR.

Convolutional neural networks (CNNs) have been used in the automated classification of eye diseases including age-related macular degeneration, retinopathy of prematurity, and glaucoma8,9,10. CNNs have also been used in the detection of referral-warranted diabetic retinopathy from color fundus photographs11,12. In this study, we trained a deep learning algorithm to detect leakage from retinal NV on ultra-widefield fluorescein angiography (UWF-FA), an imaging modality that provides a 200-degree field of view for FA. Retinal NV defines PDR, and accurate detection of NV facilitates early treatment before progression to vision-threatening complications. Our algorithm was able to accurately classify neovascular leakage in the presence of other hyperfluorescent and hypofluorescent lesions on UWF-FA images. With further improvements in performance, adoption of such an algorithm could allow for more consistent identification of NV, facilitating early treatment of patients with PDR and reducing the burden of blinding diabetic eye disease.

Results

Patient and imaging characteristics

The study included 678 images from 377 patients. Table 1 shows the demographic and clinical characteristics of the 377 patients.

Table 1 Patient baseline characteristics.

There was a slightly greater proportion of male patients (55%; n = 209) and most were White (66%; n = 249) with a substantial minority who were Black (24%; n = 90). Mean age was 58.6 years (SD 13.4), mean BMI was 32.6 kg/m2 (SD 8.7), and mean systolic blood pressure was 136.8 mmHg (SD 22.4). Most patients had type 2 DM (82%; n = 308), and mean hemoglobin A1c was 8.0% (SD 2.0). Eye-level characteristics of the 678 images are shown in Table 2.

Table 2 Eye characteristics.

Eyes with varying clinical DR severity ranging from no DR to PDR were included, with DR severity determined by eye exam with the treating retinal specialist. Mean logMAR visual acuity was 0.36 (SD 0.36), or Snellen equivalent 20/46. Severity of DR was determined on clinical exam and evaluation by a retinal specialist. 10% (n = 70) of eyes had no DR, 15% (n = 102) of eyes had mild NPDR, 20% (n = 135) of eyes had moderate NPDR, and 13% (n = 90) of eyes had severe NPDR. 24% (n = 163) of eyes had PDR. A substantial proportion of eyes had a history of macular edema (43%; n = 290). Graders identified the presence of angiographic neovascular leakage in 18% (n = 120) of images. Most eyes were pseudophakic (63%; n = 425).

Algorithm performance

We trained three CNNs and evaluated performance of the model-averaged ensemble classifier through five-fold cross-validation (Fig. 1).

Figure 1
figure 1

Architecture of the deep learning classifier. Five-fold cross-validation was used with an 80%/20% train/test split. Predictions from the ResNet152V2, EfficientNetB6, and InceptionResNetV2 convolutional neural networks were ensembled using model averaging.

The component CNNs were selected based on performance in a variety of prior ophthalmic applications13,14,15,16,17,18,19. Since the data set of 678 images contained only 120 images (18%) with grader-identified neovascular leakage, additional weight was placed on this classification to address the class imbalance.

Figure 2A shows the receiver operating characteristic (ROC) curves obtained from five-fold cross-validation. Area under the ROC curve (AUC) was 0.96 for the model-averaged ensemble predictor. The AUCs for each individual CNN in the ensemble were 0.90 for InceptionResNetV2, 0.92 for EfficientNetB6, and 0.94 for ResNet152V2. Figure 2B shows the precision-recall (PR) curves from five-fold cross-validation. The average precision was 0.87 for the ensemble predictor. Individual CNN average precisions were 0.76 for InceptionResNetV2, 0.79 for EfficientNetB6, and 0.83 for ResNet152V2. Table 3 lists the metrics obtained within each fold of training and testing for the ensembled predictor. At the selected operating point, sensitivity was 0.82, specificity was 0.95, and precision was 0.77.

Figure 2
figure 2

Receiver operating characteristic and precision-recall curves for detection of leakage from neovascularization by the deep learning classifier.

Table 3 Cross-validation metrics.

Figure 3 shows the confusion matrix for the ensemble classifier at the selected operating point. Most of the positive and negative images were corrected identified by the model, but a proportion of images were false-positives or false-negatives.

Figure 3
figure 3

Confusion matrix comparing grader labels and algorithm predictions. Most of the fluorescein angiogram images were negative for neovascularization.

Supplementary Figure 1 shows images of the false-positive and false-negative algorithm predictions from Fold 1 of cross-validation. Some potential reasons for false-negative predictions in these images (Supplementary Fig. 1) included NV that exhibited slow leakage of fluorescein, very small foci of NV, NV near the optic disc, and concurrent presence of other causes of leakage. Other causes of leakage observed included staining around retinal venules and retinal edema. Potential reasons for false-positive predictions in these images (Supplementary Fig. 2) included window defect from fibrosis or scar, aberrant non-neovascular vessels (such as intraretinal microvascular abnormalities or IRMA), and bright fluorescence from other causes of leakage such as clusters of microaneurysms. Forty randomly-selected images and corresponding saliency maps are included in Supplementary Fig. 3 (true-positive predictions), Supplementary Fig. 4 (true-negative predictions), Supplementary Fig. 5 (false-positive predictions), and Supplementary Fig. 6 (false-negative predictions).

While no standardized external test set exists for UWF-FA images in diabetic retinopathy, to conduct a pilot external validation study we used UWF-FA images of diabetic retinopathy found in the American Society of Retinal Specialists (ASRS) Image Bank. The algorithm correctly classified 17 out of 17 (100%) images as true-positive for NV leakage, and 4 out of 5 (80%) images as true-negative for NV leakage. 1 out of 5 (20%) images was classified as false positive, and there were no false negatives.

Discussion

Our deep learning algorithm was able to detect neovascular leakage in UWF-FA images containing other DR disease features that cause hyper- and hypo-fluorescent angiographic changes. To our knowledge, an algorithm to perform this classification task has not previously been constructed. Our image data set was obtained from a well-characterized group of patients with DM and varying stages of DR20. The algorithm may be useful in augmenting ophthalmologists’ or retinal specialists’ ability to discern neovascular leakage on fluorescein angiography in the clinical setting.

Deep learning algorithms have been created for screening, classification, and segmentation of numerous eye conditions8,9,11,21,22. In diabetic retinopathy, deep learning has been used to screen patients with diabetes mellitus (DM) for referral-warranted diabetic retinopathy (DR)11,23, and to identify the severity of DR24. For example, Dai et al. achieved an AUC of 0.97 for detection of PDR with a ResNet-based classifier24. Ting et al. achieved an AUC of 0.96 for detection of vision-threatening DR used a VGGNet-based classifier25. However, many studies including these used 50-degree color fundus photographs. Peripheral vascular lesions not visible on traditional photography are commonly found in DR and may be important prognostic indicators26,27. Our study used UWF imaging with a 200-degree field of view to include NV lesions outside of the posterior pole28. A study by Nagasawa et al. used non-FA UWF imaging to detect treatment-naïve PDR with an AUC of 0.9729. However, in their study the classifier only had to distinguish between images of normal subjects and images of subjects with PDR, whereas discriminating between NPDR and PDR may be a more challenging task. Our algorithm trained on UWF-FA images achieved an AUC of 0.96 to detect NV leakage, a finding that is diagnostic of PDR. Sickle cell retinopathy is another retinal vascular disease which may also progress into a proliferative stage characterized by neovascularization and tractional retinal detachment. Cai et al. trained an InceptionV4 network to detect seafan neovascularization from ultra-wide-field fundus photographs, achieving sensitivity and specificity of 0.9730. Early detection of proliferative vascular disease using automated methods may facilitate early treatment to reduce the risk of vision loss.

Our data included 163 eyes classified as PDR by a clinician, whereas only 120 images were labeled by graders as positive for NV leakage. This discrepancy could be explained by clinical scenarios in which a clinician would diagnose PDR despite lack of evidence of NV on UWF-FA. These could include development of vitreous hemorrhage in an eye with known pre-existing diabetic retinopathy, imaging of a patient who had intravitreal injection performed for proliferative disease preceding referral (causing regression of NV on imaging), neovascularization in the far periphery visible clinically but not visualized on UWF-FA, or iris neovascularization. False-positive predictions by the algorithm were not statistically different (Chi-Squared Test) between eyes with PDR without NV leakage on imaging, and eyes with NPDR without NV leakage on imaging.

Fluorescein angiogram results are typically recorded as a collection of a dozen or more image frames reflecting the different time points and phases of the angiogram. A potential use of the algorithm would be identifying image frames in which neovascularization is detected, indicating the most important frames to review. In this role, the algorithm would facilitate clinical diagnosis. Deep learning has been used to detect abnormalities on FA in other retinal diseases such as retinopathy of prematurity and age-related macular degeneration31,32,33. Since PDR is a cause of vision-threatening complications of diabetes mellitus, early diagnosis is key to obtaining appropriate treatment to prevent vision loss4,5. Leakage from NV is present in active, untreated proliferative diabetic retinopathy, and often resolves with treatment of PDR.

The limitations of this study include the lack of a larger, standardized external test image set, the inherent limitations of a retrospective, single-center study, inability to assess for diabetic macular edema, and that images used were only from the early venous phase of fluorescein angiography. No standardized external test image set exists for this classifier. We used a data set of 22 images from the ASRS Image Bank as a pilot data set for external validation, but a larger, standardized external test set would be needed to confirm the algorithm’s generalizability. Improved data capture and sharing standards through the National Eye Institute Bridge2AI initiative or a model-to-data approach could be used to confirm external validity34. Although diabetic macular edema does cause leakage on FA, the preferred imaging modality for detection of diabetic macular edema is optical coherence tomography and not FA. A multi-modality imaging approach with optical coherence tomography would be needed to incorporate detection of diabetic macular edema which is an additional vision-threatening complication of DM. Finally, we chose the early venous phase of FA because leakage from NV appeared to be most prominent during this phase in contrast to other angiographic findings. We reasoned that earlier phases of FA would be less likely to exhibit leakage from neovascularization, and that in the later phases of FA, hyperfluorescence from staining would be more difficult to distinguish from leakage. However, leakage from NV may also be present during other phases, and the temporal pattern of leakage could provide additional information to the model. Analysis of videos of FA could also be of benefit, but video FA is not widely recorded which limits its generalizability.

In summary, we trained a deep learning algorithm to detect the presence of neovascular leakage in UWF-FA images from patients with diabetic retinopathy. With additional testing to verify external validity, the algorithm could help guide early identification and treatment of proliferative diabetic retinopathy.

Methods

We included patients 18 years of age or older with DM who received a diagnosis of diabetic retinopathy by a retina-trained clinician and had an ultra-widefield fluorescein angiography (UWF-FA) study performed between January 2009 and May 2018 at two sites of a tertiary academic medical center (Kellogg Eye Center Ann Arbor and Grand Blanc). Images were obtained retrospectively from a previously generated data set used for quantification of retinal neovascularization20. Briefly, the images were captured with a UWF scanning laser ophthalmoscopy device (Optos 200Tx or Optos California; Optos PLC). Images from the venous phase of the fluorescein angiogram were used because leakage from NV was most prominent during this phase compared to other angiographic findings. Images were excluded if image quality was too poor to identify distinct fundus features. Images were labeled with neovascular leakage by graders who underwent training as described previously20. Graders were masked to patient data. A fellowship-trained retinal specialist (P.Y.Z.) verified grader labels. Images were cropped to the central 1792 × 1280 and downsampled to 896 × 640. This study adhered to the tenets of the Declaration of Helsinki. The study was initiated after approval by the University of Michigan Institutional Review Board (HUM00120509, PI: Y.M. Paulus), which approved an exemption for the requirement to obtain informed consent.

For the deep learning algorithm, we trained three CNNs: ResNet152V2, EfficientNetB6, and InceptionResNetV2. Each network was pre-trained on ImageNet and then trained on the UWF-FA data set. The algorithm as evaluated using five-fold cross-validation, which each fold consisting of an 80% training set and 20% test set. To generate an ensembled prediction, we averaged the predictions of each of the three CNNs. The Adam algorithm was used for optimization, and the learning rate was set to 0.0005. The batch size was set to 16. Training images were randomly augmented with horizontal and/or vertical translation of up to 10%, random horizontal and vertical flip, image rotation of up to 72 degrees, and image zoom between 90 and 110%. Training was set to a maximum of 35 epochs, with early stopping if the training loss did not decrease after 8 epochs. Computation was performed using Keras with TensorFlow version 2.7.0 as backend on the University of Michigan High Performance Computing Cluster (16 GB NVIDIA Tesla V100; NVIDIA Corporation). Statistical analyses were performed using Excel and Python version 3.9.7 with scikit-learn module version 1.0.1.