A deep learning model for identifying diabetic retinopathy using optical coherence tomography angiography

As the prevalence of diabetes increases, millions of people need to be screened for diabetic retinopathy (DR). Remarkable advances in technology have made it possible to use artificial intelligence to screen for DR from retinal images with high accuracy and reliability, reducing human labor by processing large amounts of data in a shorter time. We developed a fully automated classification algorithm to diagnose DR and identify referable status from optical coherence tomography angiography (OCTA) images using a convolutional neural network (CNN) model and verified its feasibility by comparing its performance with that of a conventional machine learning model. Ground truths for classification were established based on ultra-widefield fluorescein angiography to increase the accuracy of data annotation. The proposed CNN classifier achieved an accuracy of 91–98%, a sensitivity of 86–97%, a specificity of 94–99%, and an area under the curve of 0.919–0.976. In the external validation, overall similar performance was achieved. The results were similar regardless of the size and depth of the OCTA images, indicating that DR can be satisfactorily classified even with images comprising a narrow area of the macular region and a single image slab of the retina. CNN-based classification using OCTA is expected to create a novel diagnostic workflow for DR detection and referral.


Results
A total of 301 eyes (51 healthy normal, 51 diagnosed with diabetes mellitus (DM) without DR, 53 with mild non-proliferative DR, 49 with moderate non-proliferative DR, 48 with severe non-proliferative DR, and 49 with proliferative DR) were recruited and imaged. After excluding images with insufficient scan quality, a total of 240 datasets, consisting of 40 datasets for each stage of DR, were obtained for each of the 3 × 3 mm² and 6 × 6 mm² scan sizes. For the external validation, a further 195 eyes (33 healthy normal, 36 diagnosed with DM without DR, 34 with mild non-proliferative DR, 31 with moderate non-proliferative DR, 31 with severe non-proliferative DR, and 30 with proliferative DR) were recruited and imaged. After excluding images with insufficient scan quality, a total of 120 datasets, consisting of 20 datasets for each stage of DR, were obtained for each of the 3 × 3 mm² and 6 × 6 mm² scan sizes.
For the classification task of diagnosing referable DR, the AUCs of the CNN-based classifier using 3 × 3 mm² OCTA images were 0.919 (with 86% sensitivity and 93% specificity), 0.967 (with 91% sensitivity and 98% specificity), 0.942 (with 88% sensitivity and 98% specificity), and 0.940 (with 89% sensitivity and 97% specificity) for only SCP, only DCP, only full-thickness retina, and the combined data, respectively. The AUCs of the CNN-based classifier using 6 × 6 mm² OCTA images were 0.975 (with 95% sensitivity and 98% specificity), 0.975 (with 94% sensitivity and 99% specificity), 0.970 (with 95% sensitivity and 99% specificity), and 0.976 (with 96% sensitivity and 98% specificity) for only SCP, only DCP, only full-thickness retina, and the combined data, respectively. Conversely, the AUCs of the machine learning-based classification using local features extracted from OCTA images were 0.795 (with 66% sensitivity and 86% specificity) for 3 × 3 mm² images and 0.837 (with 76% sensitivity and 88% specificity) for 6 × 6 mm² OCTA images (Table 3, Fig. 1). The confusion matrix between the ground truth labels and the predictions of the proposed method is illustrated in Supplementary Fig. S2. For the external dataset, the performances of the CNN-based classifier were comparable: −0 to −0.05 AUC, −3% to +9% sensitivity, and −2% to +5% specificity for 3 × 3 mm² OCTA images, and −0.04 to −0.06 AUC, −1% to −7% sensitivity, and −0% to −2% specificity for 6 × 6 mm² OCTA images (Table 4). Figure 2 illustrates examples of OCTA images with the class activation maps (CAMs) obtained using the proposed model for detecting DR and referable DR cases. In the case of no DR, the whole image was weakly activated, with only the foveal avascular zone (FAZ) and the region around the large blood vessels occasionally activated.
On the other hand, in the case of referable DR, the overall area around the FAZ and the vicinity of the large blood vessels were strongly activated. The activation increased in regions where the density of blood vessels changed significantly compared with regions of even spread, which is related to non-uniformity in the blood vessel region. As the activation map visualizes the areas the network uses for decision making, rather than specific abnormal areas, the activated areas were not the same for the 3 × 3 mm² and 6 × 6 mm² images.

Discussion
In this study, we designed an end-to-end deep learning-based classification system for DR and referable DR diagnosis using OCTA, based on ground truths determined clinically using UWF FA. The proposed CNN classifier produced good results, with an accuracy of 90–95%, a sensitivity of 91–98%, a specificity of 85–93%, and an AUC of 0.93–0.97 for detecting the onset of DR; moreover, an accuracy of 91–98%, a sensitivity of 86–96%, a specificity of 93–99%, and an AUC of 0.94–0.98 were obtained for detecting referable DR. In the external validation, overall similar performance was achieved. Thus, the proposed classifier showed consistently reliable performance for all OCTA slabs (whether individually or combined) and all evaluated scan sizes.
Recently, researchers have focused on automated solutions for the diagnosis and classification of retinal diseases. However, previous studies used medical records or fundus photographs with limited fields of view to diagnose the stages of DR in the training set for automated models. This approach is relatively subjective and has limitations, as it cannot reflect diabetic changes across the entire retina. Although models based on conventional fundus photographs can be trained with large amounts of data owing to their easy accessibility, there is an inherent potential for misclassification of DR. It is therefore uncertain whether these models will function effectively with other devices in the clinical setting. In this study, we were able to reflect diabetic changes from the entire retina based on UWF FA for an accurate diagnosis of DR in all cases, which is one of the significant advantages of this study.
In this study, a deep CNN algorithm using OCTA achieved accuracy comparable to previous CNN-based DR grading algorithms using fundus photographs with fewer than 250 samples [11][12][13][14][15]. OCTA visualizes microvascular structures in different retinal layers, enabling comprehensive quantitative analysis of pathological vascular changes related to diabetes. As DR is primarily a disease of the retinal vasculature, OCTA can provide more instructive information than fundus photographs and is perhaps a more suitable imaging modality for the automated classification of DR. This result is consistent with those of previous deep learning classification studies using OCT and OCTA, which demonstrated that the detailed information of the macular region extracted through OCT/OCTA images is sufficient to diagnose DR at a level similar to fundus photographs, even though these modalities provide a more limited field of view than fundus photographs 23,24. Further studies are required to directly compare the results of deep learning algorithms using OCTA and traditional fundus photography, respectively, based on an accurate ground truth using UWF FA.
Since DR can cause extensive and full-depth damage to the retinal microvasculature, we investigated the performance of the CNN algorithm across different image sizes and retinal slabs to identify which images are most appropriate for DR classification. Several cross-sectional studies have compared the diagnostic performance of DR assessment based on OCTA scan size or slab depth using quantitative microvascular metrics, but the results remain debatable [30][31][32][33][34][35][36][37][38][39]. As those studies depended on handcrafted feature extraction for DR characterization, their results may be affected by image-acquisition parameters, including oversampling density and the filters and algorithms used for quantification of vascular metrics 40,41. Moreover, although algorithms using handcrafted features may perform well on a single OCTA dataset, they can be difficult to generalize to other datasets owing to overfitting on the original samples 42,43. In contrast, as OCTA contains more unlabeled information, a fully automated CNN algorithm can process heterogeneous images quickly, regardless of size and slab, for accurate and objective DR classification, potentially alleviating the requirement for resource-intensive manual analysis and thus guiding high-risk patients toward further treatment. Interestingly, we observed that the obtained results were similar regardless of the size and depth of the OCTA images. Our results showed that DR, including pathological changes of the entire retina, could be satisfactorily classified even with images covering an area of 3 × 3 mm² of the macular region and a single image slab of the SCP or DCP.
The highlighted regions in the CAM images accurately correspond to local features such as the foveal avascular zone (FAZ) area, blood vessel density, skeletal vessel density, and/or fractal dimension. In the case of no DR, the whole image was weakly activated, with only the FAZ and the region around the large blood vessels occasionally activated. Conversely, in the case of referable DR, an overall strongly activated region appeared, including the area around the FAZ and the vicinity of the large blood vessels. The activation of the CAM increased in regions where the density of blood vessels changed significantly compared with regions of even spread, which is related to non-uniformity in the blood vessel region. Based on this observation, we hypothesize that the FAZ and blood vessel density played an important role in the classification of DR using the CNN. In this study, we used the CAM to check how predictions were made; its application to clinical practice remains to be considered in future work.
This end-to-end CNN classifier showed better efficacy than the machine learning classifier using local features extracted after vessel and FAZ segmentation. This further indicates that the traditionally known, human-defined parameters used in this study are insufficient for DR characterization and that the missing critical features can be effectively extracted through end-to-end deep learning. Although it is difficult to compare performance owing to the different experimental settings, the machine learning model in this study showed lower performance than previous studies [16][17][18]. The performance of the machine learning model could be improved by increasing the dataset, but improvements may be limited with the aforementioned features alone. As deep learning leverages unlabeled information to achieve the best accuracy in most cases, it can produce good results even with a small training dataset.
Though we report comparable performance in this study, a notable limitation is that the number of patients is still relatively small. However, the number of patients in this study is comparable to that of other studies employing OCTA [16][17][18], considering that this technology is still not ubiquitous in ophthalmology practices. Additionally, the absence of FA in normal subjects may affect the reliability of the ground truth labels as a whole. Although FA could not be performed in the normal subjects owing to ethical issues, the presence of systemic and ophthalmic diseases was checked and excluded through detailed history taking and confirmation of medical records. Lastly, although we performed external validation using additional data, the system remains to be proven on a fully independent, larger, de novo set of images that also contains images with macular edema, artifacts, or low quality before application to real-world clinical practice. Nevertheless, this study represents an important first step toward end-to-end deep learning models for DR classification using OCTA images. Moreover, the ground truth for classification of DR stages based on UWF FA is another strength of this study, as UWF imaging captures diabetic changes from the entire retina with a wider field of view than conventional FA and fundus photography. Next, it will be necessary to elucidate the ability of the deep learning algorithm using OCTA to identify not only eyes with referable DR but also eyes with peripheral dominant lesions or eyes with PDR.
In this work, we introduced a fully automated deep CNN DR classification method using only OCTA images. Although OCTA is rapidly being adopted as a new modality in clinical routine, the interpretation of OCTA data remains limited. If OCTA can provide diagnostic value comparable to UWF FA, invasive FA can be avoided even when diagnosis or referral decisions are difficult.
In this way, the OCTA-based automated classification framework can perform more accurate DR screening than conventional fundus photography. Used on a clinical basis, this system is expected to drastically reduce the rate of vision loss attributed to DR, improve clinical management, and create a novel diagnostic workflow for disease detection and referral. For proper clinical application of our method, further testing and optimization incorporating clinical data such as genetic factors, hemoglobin A1c, and duration of diabetes may be required to ensure a minimum false-negative rate. Combining data from various imaging modalities such as fundus photography or FA can reinforce the performance, thereby further improving accuracy. Future work should include extending the algorithm to classify the detailed severity levels of DR with a larger number of patients.

Methods
Dataset. This cross-sectional study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board (IRB) of Yeungnam University Medical Center (approval number: 2020-02-003). The requirement for written consent was waived by the IRB because of the retrospective nature of the study. Data were collected between January 2018 and January 2019. Data between February 2019 and January 2020 were additionally collected to validate our method on an external dataset.
Subjects who had visited the hospital for visual floaters or ocular discomfort, who had undergone detailed examinations including OCTA (Optovue RTVue XR AVANTI, Optovue Inc., Fremont, CA, USA), and who had no systemic or ocular disease were retrospectively included. Subjects who had previously been diagnosed with diabetes mellitus (DM) and had undergone comprehensive ophthalmic examinations including UWF FA (Optos California, Optos plc, Dunfermline, UK) and OCTA were also included. UWF FA was performed only in limited cases, after explaining its possible side effects, when a patient desired a full examination despite the absence of DR. A small number of diabetic patients without DR requested FA examination, accounting for less than 10% of the diabetic patients without DR who visited our clinic during the year. Only diabetic patients underwent fluorescein angiography, not the healthy control participants. As the normal control group, patients without systemic disease who had undergone several ophthalmic examinations, including mydriatic examination and OCTA, for health-screening purposes, but had no definite ocular diseases, were included. In the case of diabetes without retinopathy, only cases in which FA was performed were included. As this is a retrospective study, no patient received FA in order to participate; the indication for FA was unrelated to the study protocol. Exclusion criteria included the presence of glaucoma or retinal disorders other than DR affecting retinal capillary changes. Eyes with macular edema were excluded because it can obscure the retinal microvasculature on OCTA. Images with low signal strength (≤ 6), excessive motion artifacts, or projection artifacts caused by media opacities were also excluded. OCTA images were obtained as volume scans of 3 × 3 mm² and 6 × 6 mm² centered on the macula, and images of the SCP, DCP, and full-thickness retina slabs were used for analysis.
The ground truth for determining the accuracy of each diagnosis and grading DR was determined by two masked expert retinal specialists (G.H. and D.P.) reviewing all phases of central/axial UWF FA images, which were recorded up to 15 min after dye injection. Grading was performed based on the International Clinical DR Severity Scale 44 , which was adapted by means of extending the grading quadrants to the periphery of the entire image while maintaining the original grading nomenclature for simplicity 29,45 . When there was a disagreement between the graders, the supervising grader (M.S.) confirmed the final decision.
Convolutional neural network-based classifier using raw images from OCTA. The overall structure of the proposed method for detecting early signs of DR and referable DR is shown in Fig. 3. SCP, DCP, and full-retina OCTA images were concatenated and used as the input of the CNN. Using the ResNet101 model 46, images were passed through residual blocks with 101 layers, which repeatedly performed a summation of the input and output feature maps from the convolution layers, each with batch normalization, rectified linear unit (ReLU) activation functions, and max pooling. After the residual blocks, each feature map was averaged in the global average pooling (GAP) layer, with the probability of each stage obtained through a fully connected layer with a softmax function. CAMs were derived from the GAP layer by summating the feature maps with the weights from the last layer in order to visualize regions that show a high correlation with the task of interest. It is worth noting that the parameters of the network were initially transferred from the pre-trained parameters of the ImageNet dataset, excluding the first and last layer parameters. Subsequently, all parameters were retrained using our OCTA dataset, optimized based on the cross-entropy loss with an Adam optimizer and a learning rate of 0.0001 47.
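The CAM derivation described above, a weighted summation of the final-layer feature maps, can be sketched as follows. This is a minimal NumPy illustration rather than the authors' implementation; the array shapes and names are hypothetical:

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Weighted sum of final-layer feature maps, as in CAM.

    feature_maps : (C, H, W) activations entering the global average pooling layer
    fc_weights   : (num_classes, C) weights of the final fully connected layer
    class_idx    : class whose activation map is visualized
    """
    # Sum the C feature maps weighted by the FC weights of the target class
    cam = np.tensordot(fc_weights[class_idx], feature_maps, axes=([0], [0]))
    # Normalize to [0, 1] for overlay visualization
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Toy example: 4 channels of 8x8 feature maps, 2 classes
rng = np.random.default_rng(0)
fmaps = rng.random((4, 8, 8))
weights = rng.random((2, 4))
cam = class_activation_map(fmaps, weights, class_idx=1)
```

In practice the resulting low-resolution map is upsampled to the input image size before being overlaid on the OCTA image.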
Machine learning-based classifier using local features extracted from OCTA images. The machine learning-based classifier consisted of three stages: segmentation, feature extraction, and classification (Fig. 4).
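As an illustration of the feature-extraction stage, the following minimal NumPy sketch computes three of the four local features from hypothetical binary segmentation masks (skeletal vessel density would additionally require a skeletonization step, e.g. from an image-processing library). The mask shapes, scan size, and box sizes are assumptions, not the study's parameters:

```python
import numpy as np

def vessel_density(vessel_mask):
    """Fraction of image pixels classified as vessel."""
    return vessel_mask.mean()

def faz_area_mm2(faz_mask, scan_size_mm=3.0):
    """FAZ area in mm^2, assuming a square scan of scan_size_mm per side."""
    pixel_area = (scan_size_mm / faz_mask.shape[0]) * (scan_size_mm / faz_mask.shape[1])
    return faz_mask.sum() * pixel_area

def fractal_dimension(mask, box_sizes=(2, 4, 8, 16, 32)):
    """Box-counting estimate of the fractal dimension of a binary mask."""
    counts = []
    for s in box_sizes:
        h, w = (mask.shape[0] // s) * s, (mask.shape[1] // s) * s
        blocks = mask[:h, :w].reshape(h // s, s, w // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())  # boxes containing any vessel pixel
    # Slope of log(count) vs. log(1/box size) estimates the dimension
    coeffs = np.polyfit(np.log(1.0 / np.array(box_sizes)), np.log(counts), 1)
    return coeffs[0]

# Toy example: a 64x64 scan with a filled square standing in for segmented vessels
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
vd = vessel_density(mask)            # 0.25 of pixels are "vessel"
faz = faz_area_mm2(mask)             # area in mm^2 under the 3 mm scan assumption
fd = fractal_dimension(mask)         # between 1 and 2 for a planar mask
```

Real OCTA masks would come from the U-Net segmentation described below; the toy square is only there to make the sketch self-contained.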
In the segmentation stage, U-Net 48 was used to segment the blood vessels and FAZ from the OCTA images. The combined data from each layer of the OCTA image were used as input, given that prior machine-learning-based studies demonstrated that the best DR classification results were obtained when local features from the combined data, i.e., both SCP and DCP, were used [16][17][18]. In the U-Net model, a contracting path extracts high-level features from the input images by repeatedly applying convolution layers, batch normalization, ReLU activation functions, and max pooling, while the expanding path generates a segmentation map of the same size as the input image by repeatedly applying upsampling, convolution layers, batch normalization, and ReLU activation functions to the extracted high-level features. In the expanding path, intermediate feature maps of the contracting path were concatenated with the feature maps of the previous expanding path and used as input for the next expanding path. The parameters in the network were optimized using the Adam optimizer 47 with a dice similarity coefficient loss and a learning rate of 0.0001. In the feature-extraction stage, four local features (blood vessel density, skeletal vessel density, fractal dimension, and size of the FAZ) were extracted from the segmented OCTA images. Finally, in the classification stage, these four extracted features were fed into a neural network classifier to classify the OCTA images into DR and normal cases as well as referable and non-referable states.
Experimental setting and statistical analysis. To obtain the final predictions for all the data samples, we divided the data into four distinct subsets with an even class distribution and performed four-fold cross-validation. Specifically, a classifier was trained using three subsets and then tested on the remaining subset.
The above operation was repeated four times with different combinations of subsets so that predictions of the DR stages for the entire dataset were obtained. The results were then compared with the ground truth determined by the retinal specialists using the UWF FA images. The accuracy, sensitivity, specificity, and AUC of the system were calculated to evaluate overall performance; these metrics were averaged over the four test runs. To validate our CNN method on the external dataset, we trained our model using all the training data used for four-fold cross-validation and tested it on the external dataset. The machine learning model was also trained with the same setup for comparison with the CNN model.
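The evaluation protocol above can be sketched as follows. This is a minimal illustration for a binary task (e.g. referable vs. non-referable), with a stand-in for the trained classifier; the stratified fold assignment and metric definitions are the standard ones, not the study's exact code:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / len(y_true)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return acc, sens, spec

def four_fold_indices(labels, n_folds=4, seed=0):
    """Class-stratified fold assignment so each fold has an even class distribution."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

# Toy example: a perfect classifier evaluated with four-fold cross-validation
labels = np.array([0, 1] * 20)
folds = four_fold_indices(labels)
results = []
for k in range(4):
    test = folds == k
    preds = labels[test]          # stand-in for the model's predictions on fold k
    results.append(binary_metrics(labels[test], preds))
mean_acc, mean_sens, mean_spec = np.mean(results, axis=0)
```

Reporting the mean over the four held-out folds, as done here, matches the averaging of the four test runs described above.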

Data availability
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.