Abstract
Diabetic retinopathy is one of the leading causes of blindness around the world. This makes early diagnosis and treatment important in preventing vision loss in a large number of patients. Microaneurysms are the key hallmark of the early stage of the disease, non-proliferative diabetic retinopathy, and can be detected using OCT angiography quickly and non-invasively. Screening tools for non-proliferative diabetic retinopathy using OCT angiography thus have the potential to lead to improved outcomes in patients. We compared different configurations of ensembled U-nets to automatically segment microaneurysms from OCT angiography fundus projections. For this purpose, we created a new database to train and evaluate the U-nets, created by two expert graders in two stages of grading. We present the first U-net neural networks using ensembling for the detection of microaneurysms from OCT angiography en face images from the superficial and deep capillary plexuses in patients with non-proliferative diabetic retinopathy trained on a database labeled by two experts with repeats.
Introduction
Diabetic retinopathy (DR) is one of the leading causes of blindness among working-age individuals worldwide. It consists of two stages: an earlier non-proliferative stage (NPDR) and a more advanced proliferative stage (PDR), which occurs when new retinal blood vessels form (‘proliferate’), often in response to retinal tissue ischemia. During the earlier NPDR stage, patients may be asymptomatic; however, microaneurysms (MAs), the hallmark of this stage, already emerge as outpouchings of retinal blood vessels weakened by chronically elevated blood glucose1,2,3,4.
NPDR can be graded as mild, moderate, or severe, and MAs are an early and important clinical sign of disease progression and a main component of classifying DR severity. Early diagnosis of DR is key for treatment and for preserving patient vision, since timely treatment can prevent blindness in more than 90% of patients1.
Fluorescein angiography (FA) is currently the gold standard for the diagnosis of DR and the most sensitive test for detecting MAs. However, it suffers from several drawbacks. During FA imaging, fluorescein, a contrast agent, is injected to highlight the patient’s retinal vasculature3. In rare cases, fluorescein can lead to anaphylactic shock in patients who are allergic to it, a reaction that can be fatal without urgent medical intervention2. The injection also makes FA invasive, costly, and time consuming. Furthermore, superposition of retinal capillary layers and leakage pose a challenge to FA, while the deep capillary plexus is barely visible in FA2,5. The combination of these factors makes FA less suitable as a screening tool for DR, pushing scientists and engineers to find complementary imaging modalities such as optical coherence tomography angiography (OCTA)3,6. OCTA allows the separation of the superficial (SCP) and deep capillary plexuses (DCP) and does not require injection of a contrast agent. OCTA, on the other hand, does not show all MAs visible with FA, as the speed of blood flow within certain MAs is below the threshold of OCTA detection2.
Machine learning and deep learning methods for biomedical segmentation tasks have made significant progress over the last decade, ranging from the segmentation of disease markers on acquired clinical images to patient referral7,8. The U-net architecture in particular has been successful in the field of biomedical segmentation and can be considered state-of-the-art9. Three-dimensional U-nets have been trained by DeepMind to enable the referral of patients based on OCT scans8.
There already exists an expansive body of work in the scientific literature on finding MAs or markers of DR in images. This includes non-deep-learning approaches such as eigenvalue analysis, Radon transform, multi-agent learning, and dictionary learning on fundus photos10,11,12,13,14. Artificial neural networks have been used to locate MAs in fundus photos15,16,17,18,19,20 as well as in FA fundus images21,22. The U-net architecture has also been used to segment MAs in fundus photos23,24,25,26. Neural networks have also been used in conjunction with OCT images for marker identification in DR7,27.
In the case of OCTA, classic machine learning algorithms such as random forests or image feature analysis have been used before28,29,30,31. A large number of different types of neural networks have been trained on OCTA images for the diagnosis of DR, ranging from the evaluation of textural features and transfer learning to ensembled approaches and the segmentation of vascular features32,33,34,35,36. Ryu et al. included a U-net in their approach for the diagnosis of DR37. To the best of our knowledge, there is no published work that uses a U-Net to segment MAs from OCTA en face projections; this extends to the ensembling of U-Nets as well as to the segmentation of MAs on two capillary layers at the same time.
Neural networks are commonly associated with the “black box” problem. If a network is trained on images with only a specific referral class as output, then that output is often not immediately traceable or explainable to the outside observer38. This is especially the case when a network generates a diagnosis or a referral without indicating which specific markers led to it. For this reason, we propose to detect commonly recognized markers that indicate disease progression. This allows us to leverage the adaptability of a neural network while generating segmentation results on markers that clinical staff can interpret and judge themselves39,40.
Because OCTA is non-invasive and can resolve different layers of the retina separately, and because of the proven capabilities of the U-net architecture for biomedical segmentation tasks, we decided to adapt nnU-Net to the task of segmenting MAs from OCTA en face projections of the SCP and DCP. Creating an annotated data set with accurate labels is a very time-consuming process. We approached that challenge by annotating MAs using bounding boxes that are converted to binary labels before training of the networks.
It is the objective of this work to detect MAs as an early marker for DR from OCTA scans, which can be acquired quickly and non-invasively, using an adapted U-net architecture. The paper is structured as follows: first, we describe the creation of the expert labeled database and the adaptation of nnU-Net. Then we describe the evaluation, followed by a results and discussion section and a conclusion.
Method
This section consists of two main parts. First, the MA labeling process and creation of the expert labeled database are described, followed by an explanation of the neural network.
This study was approved by the institutional review board (IRB) at Tufts Medical Center and conformed to the tenets of the Declaration of Helsinki and the Health Insurance Portability and Accountability Act of 1996. Informed consent obtained from patients at the New England Eye Center was considered exempt by the IRB because of the study’s retrospective design.
Expert labeled database
Training of the network requires a training data set with accurate annotations. There is currently, to the best of our knowledge, no data set of OCTA scans of patients with NPDR/PDR and annotated MAs publicly available. We created a suitable data set ourselves, which was labeled by two expert graders from the New England Eye Center at Tufts Medical Center in Boston.
119 eyes of 70 patients diagnosed with early, intermediate, or severe NPDR or PDR were included in this study. Data were collected on a Zeiss Plex Elite 9000 SS-OCT device with dual-speed 100 kHz and 200 kHz A-scan rates, a lateral resolution of \(\le 20\) micrometers, and an axial resolution of 6.3 micrometers. All OCTA images had a signal strength of 6 or greater as indicated by the system’s software and were qualitatively screened for overall quality and excessive artifacts. The field size of all scans is \(6\times 6\) mm.
The data was split into 96 eyes (from 52 patients) for training and 23 eyes (from 18 patients) for testing. The system software was used to segment the SCP and DCP of the OCTA scans and to generate en face projections. Table 1 shows the number of patients and diagnoses for the test and training data split.
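The principle behind such a split, keeping all eyes of a patient on the same side so that no patient leaks between training and test data, can be sketched in a few lines of Python. This is a minimal illustration using scikit-learn's GroupShuffleSplit with placeholder IDs; the actual split used in this study was fixed and not generated by this code.

```python
# Minimal sketch of a patient-level split; eye and patient IDs are
# placeholders, not the study data.
from sklearn.model_selection import GroupShuffleSplit

eyes = [f"eye_{i}" for i in range(119)]               # 119 eyes
patients = [f"patient_{i % 70}" for i in range(119)]  # 70 patients

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(eyes, groups=patients))
# Grouping by patient guarantees that all eyes of a patient end up
# on the same side of the split, avoiding data leakage.
```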
Our two-stage approach for creating an expert labeled database of MAs from the SCP and DCP layers is similar to the one by Bertram et al.41. In the first stage, the two expert graders labeled MAs in the en face projections independently of each other, reviewing each projection image twice, as shown in Fig. 1. The graders used the open source, web-based labeling tool EXACT42, annotating each MA with a bounding box on the en face projection of the respective layer. Even though MAs were annotated separately in the SCP and DCP, the presented method uses 2D images and 2D convolutions rather than volumetric data: each eye is represented by 2D en face images, with the SCP and the DCP in separate channels. MAs were identified by the experts solely from the available OCTA en face images.
In the next step, the experts had to agree on each MA label for it to become part of the expert labeled database; only the bounding boxes on which both experts agreed remained. To illustrate the challenge of finding and annotating all MAs, we computed the Pearson correlation coefficient between the numbers of MAs per eye labeled by each grader before adjudication; the number of MAs labeled per eye on a given eye can serve as a surrogate for reader agreement. The Pearson correlation coefficient for the number of MAs labeled per eye is \(\approx 0.21\) with a p-value of \(\approx 0.045\), where a coefficient of 1.0 would indicate perfectly correlated counts and 0.0 no correlation at all. This underlines how challenging it is for readers to find and annotate all MAs.
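For reference, this agreement statistic can be computed with a few lines of Python; the per-eye counts below are hypothetical stand-ins, not the graders' actual counts.

```python
# Sketch of the inter-grader check: Pearson correlation between the number
# of MAs annotated per eye by each grader (illustrative counts only).
from scipy.stats import pearsonr

counts_grader_a = [12, 5, 30, 8, 17, 3, 22]  # hypothetical per-eye counts
counts_grader_b = [9, 14, 21, 6, 25, 7, 11]

r, p_value = pearsonr(counts_grader_a, counts_grader_b)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
```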
The contents of the bounding boxes in the en face images were converted to training targets for the network via a thresholding step, with examples shown in Fig. 2. A threshold of 150, on the value range of 0 to 255 of the en face images, was applied to the areas enclosed by bounding boxes to generate binary labels for the MAs. The threshold was determined on a small subset of randomly chosen MAs such that the area of the MAs was preserved after thresholding. Small groups of unconnected pixels that do not directly belong to the MA may remain, as seen in Fig. 2. This can be compensated for later by suppressing connected components below a given size in the network’s output (see details further below).
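This label-generation step can be summarized in a short NumPy sketch. The bounding box format (x0, y0, x1, y1) and the function name are illustrative assumptions; only the threshold of 150 on the 0-255 range is taken from the text above.

```python
# Minimal sketch of converting expert bounding boxes into binary MA labels.
import numpy as np

def boxes_to_mask(en_face: np.ndarray, boxes, threshold: int = 150) -> np.ndarray:
    """en_face: 2D uint8 image (0-255); boxes: iterable of (x0, y0, x1, y1)."""
    mask = np.zeros_like(en_face, dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        region = en_face[y0:y1, x0:x1]
        # Pixels at or above the threshold inside the box become MA labels.
        mask[y0:y1, x0:x1] = (region >= threshold).astype(np.uint8)
    return mask
```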
The OCTA en face projections were used as training input for the network, while the bounding box annotations were converted to binary ground truth images with per-pixel annotations of MAs that served as training targets; the process is shown in Fig. 3. Both SCP and DCP en face projections were fed to the network at the same time, with channel one containing the SCP and channel two the DCP.
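The assembly of this two-channel input can be sketched as follows; the function name is hypothetical, and the zero-based array indices correspond to "channel one" and "channel two" in the text.

```python
# Sketch of stacking the SCP and DCP projections into one two-channel input.
import numpy as np

def stack_plexuses(scp: np.ndarray, dcp: np.ndarray) -> np.ndarray:
    # Returns a (2, H, W) array: index 0 holds the SCP, index 1 the DCP.
    return np.stack([scp, dcp], axis=0)
```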
The database was used for the first stage of training of the networks and to decide on the training parameters. After training this initial network on the training data set via fivefold cross-validation, the second stage of the database creation could proceed: the resulting false positives (FPs) and false negatives (FNs) were reviewed by the expert graders again. Because of the small size of potential MAs and their potentially large numbers, it is a challenge for the graders to find all MAs. Reviewing MAs that were flagged by the neural network as false positives can help to identify MAs that had been overlooked by the experts before. Even though the number of MAs in a given eye can be substantial, the overall fraction of pixels belonging to a MA is relatively small: less than 1% of all pixels were labeled as belonging to a MA. The database resulting from this two-stage process was used for training the networks whose results are reported in the results and discussion section below.
To assess the quality of the labeled data set, we show the number of labeled MAs per eye in Fig. 4. The diagram indicates the number of MAs labeled per eye for the given disease severity; for example, the blue marker near 60 indicates an eye with a diagnosis of mild NPDR containing 59 labeled MAs. The red markers indicate increasing numbers of MAs coinciding with disease progression. There is, however, a drop from severe NPDR to PDR, which is likely related to laser treatment in these patients. Furthermore, the graders annotated more MAs in the DCP than in the SCP. This is consistent with previous studies, which state that MAs occur more often in the DCP4,43.
U-Net
We decided to use a U-net, first published by Ronneberger et al., to segment MAs due to its proven effectiveness for medical segmentation tasks9. It consists of a convolutional down-sampling branch, which downsizes the image data while computing features using the filters learned during training, and an up-sampling branch, which provides per-pixel labels that match the size of the input. The number of down-sampling steps depends on the size of the input images and of the structures to be segmented. The intermediate feature maps from the down-sampling branch are also passed on to the up-sampling branch, which preserves spatial information that could otherwise be lost during subsequent down-sampling operations. The combination of these elements makes the U-net architecture a proven network design and a candidate for the segmentation of MAs9,23,24,25.
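To make the down-sampling branch, up-sampling branch, and skip connection concrete, the following is a deliberately reduced PyTorch sketch with a single resolution step; the networks actually used in this work follow nnU-Net's automatically configured architecture, not this toy model.

```python
# Toy 2D U-net: one down-sampling step, one up-sampling step, one skip
# connection. Input sizes are assumed to be even. Illustration only.
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.LeakyReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.LeakyReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, in_channels: int = 2, num_classes: int = 2):
        super().__init__()
        self.enc = conv_block(in_channels, 32)   # down-sampling branch
        self.down = nn.MaxPool2d(2)
        self.bottleneck = conv_block(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec = conv_block(64, 32)            # 64 = 32 (up) + 32 (skip)
        self.head = nn.Conv2d(32, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        skip = self.enc(x)                         # full-resolution features
        x = self.bottleneck(self.down(skip))       # features at half resolution
        x = self.up(x)                             # back to full resolution
        x = self.dec(torch.cat([x, skip], dim=1))  # skip connection
        return self.head(x)                        # per-pixel class logits
```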
We use nnU-Net as a starting point for our U-net adaptation for MA segmentation. nnU-Net is a generalized toolbox that specializes in providing support for solving segmentation problems in biomedical imaging. It provides a U-net automatically adapted to the dimensions of the images to be trained on, together with a sensible set of default settings and heuristic rules based on properties of the data set. nnU-Net differentiates between three different sets of parameters. The first set comprises parameters that remain the same across all potential segmentation tasks, e.g. the U-net architecture, but also the optimizer and its learning rate, the number of epochs, the loss function, and the augmentations. The second set is rule-based and derived from properties of the training data, e.g. intensity distribution, pixel spacing, and modality (e.g. computed tomography). The third set of parameters is empirical, meaning that nnU-Net can make certain choices, such as those concerning post-processing, based on observed results. The advantage of nnU-Net is that it provides a deep learning pipeline that should lead to usable results without additional changes. Its defaults, however, leave room for changes and additional tuning to improve the delivered results. Additionally, nnU-Net supports ensembling of trained networks: if enough data are available for a train/test split, the five networks trained on the cross-validation folds can be used as an ensemble on the test data, for which the outputs of the five nets are averaged. This can lead to an improvement in segmentation performance at the expense of increased training time. nnU-Net’s architecture uses skip-connections to avoid over-fitting, a combination of dice and cross-entropy loss, leaky ReLUs as activation function, and deep supervision, and it is trained using stochastic gradient descent with Nesterov momentum9,44.
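The ensembling step amounts to averaging the per-pixel class probabilities of the five fold-wise networks before thresholding. A minimal PyTorch sketch, in which the function name and threshold handling are our own, could look like this:

```python
# Sketch of ensembling: average the softmax outputs of the five models
# trained on the cross-validation folds, then threshold the MA channel.
import torch

@torch.no_grad()
def ensemble_predict(models, x: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    # Each model maps (N, 2, H, W) inputs to (N, num_classes, H, W) logits.
    probs = torch.stack([torch.softmax(m(x), dim=1) for m in models]).mean(dim=0)
    return (probs[:, 1] >= threshold).to(torch.uint8)  # binary MA mask
```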
Due to the imbalance of the expert labeled database (less than 1% of pixels belong to a MA), we decided to investigate focal loss and dice loss and compare them with the default nnU-Net configuration45. We also added a comparison with TransUNet and Swin-Unet, two state-of-the-art U-net implementations: TransUNet adds transformers and pre-trained weights to the U-net architecture46, while Swin-Unet implements a transformer-based U-shaped encoder-decoder architecture with skip-connections for local-global semantic feature learning47. Additionally, we suppressed connected components with a width or height of less than 11 pixels to reduce the number of false positives. All configurations were trained with a learning rate of 0.1.
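Two of these adaptations can be sketched as follows, assuming PyTorch and SciPy. The focal loss follows the binary formulation of Lin et al.45 with placeholder alpha/gamma values, and the component filter implements the 11-pixel width-or-height rule; the function names are ours.

```python
# Sketches of the adaptations: a binary focal loss and the suppression of
# connected components narrower than 11 pixels in width or height.
import numpy as np
import torch
import torch.nn.functional as F
from scipy import ndimage

def binary_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """logits, targets: (N, H, W); targets contain {0, 1} labels."""
    t = targets.float()
    bce = F.binary_cross_entropy_with_logits(logits, t, reduction="none")
    p_t = torch.exp(-bce)                        # probability of the true class
    alpha_t = alpha * t + (1 - alpha) * (1 - t)  # class-balancing weight
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()

def suppress_small_components(mask: np.ndarray, min_size: int = 11) -> np.ndarray:
    """Removes components whose width or height is below min_size pixels."""
    labeled, _ = ndimage.label(mask)
    out = mask.copy()
    for idx, sl in enumerate(ndimage.find_objects(labeled), start=1):
        h, w = sl[0].stop - sl[0].start, sl[1].stop - sl[1].start
        if h < min_size or w < min_size:
            out[labeled == idx] = 0
    return out
```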
Results and discussion
We provide both per-pixel and per-MA metrics as part of the evaluation. The per-pixel metrics show how many pixels are classified correctly as belonging to a MA or not, while the per-MA metrics indicate whether a MA was picked up by a net and whether the net detected false positive MAs. Even though the per-pixel metrics help to understand the overall results, we consider the per-MA metrics to be more clinically relevant. Furthermore, we have added comparisons with TransUNet and Swin-Unet46,47. Both network architectures serve as a point of reference for the changes we have made to nnU-Net.
Overall, we compare three U-Net configurations, TransUNet, and Swin-Unet:

- the original nnU-Net configuration,
- a new configuration using dice loss,
- a new configuration using focal loss,
- TransUNet, a state-of-the-art implementation of the U-net architecture adding transformers and pre-trained weights, and
- Swin-Unet, a state-of-the-art implementation of the U-net architecture with a transformer-based U-shaped encoder-decoder design using skip-connections for local-global semantic feature learning.
Since FA is the gold standard for the diagnosis of DR and MAs, it seems self-evident to use FA images for the evaluation of any automated detection algorithm. The challenge to this approach lies in the dynamic nature of MAs themselves: the number of MAs can vary from visit to visit3, so both OCTA scans and FA images would need to be acquired during the same visit. Due to the difficulty of obtaining OCTA scans and FA images from the same visit, we rely on a comparison to state-of-the-art networks instead.
We list precision/recall and associated metrics (number of true positives, false negatives, and false positives, and the F1-score) for each configuration. For the per-pixel results we provide area-under-curve (AUC) and precision/recall metrics.
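To illustrate how per-MA counts can be derived from binary masks, the sketch below treats each ground truth connected component as one MA and counts it as a true positive when any predicted component overlaps it; this matching rule is our assumption and may differ from the exact evaluation protocol used here.

```python
# Sketch of a per-MA evaluation based on connected-component overlap.
import numpy as np
from scipy import ndimage

def per_ma_metrics(pred_mask: np.ndarray, gt_mask: np.ndarray):
    gt_labels, n_gt = ndimage.label(gt_mask)
    pred_labels, n_pred = ndimage.label(pred_mask)
    # A ground truth MA counts as found if any predicted pixel touches it.
    tp = sum(1 for i in range(1, n_gt + 1) if pred_mask[gt_labels == i].any())
    fn = n_gt - tp
    # A predicted component with no ground truth overlap is a false positive.
    fp = sum(1 for j in range(1, n_pred + 1) if not gt_mask[pred_labels == j].any())
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-8)
    return tp, fp, fn, precision, recall, f1
```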
Figure 5 shows precision/recall curves over the decision thresholds using fivefold cross-validation on the training data. Table 2 shows results for the same data at different decision thresholds. Figure 6 and Table 3 show results on the test data using an ensemble of the five U-nets, five TransUNets, and five Swin-Unets trained on the five folds of the training data.
First, we consider the fivefold cross-validation results on the training data. For each nnU-Net configuration, including the default nnU-Net and our adaptations with dice loss and focal loss, a single network was trained on each fold. TransUNet and Swin-Unet were also trained once for each of the five folds. Figure 5 and Table 2 show these results.
First of all, it is apparent that the curves in Fig. 5 for the default nnU-Net and the dice loss version behave similarly, due to nnU-Net’s loss being a combination of dice loss and cross-entropy loss. The precision is slightly lower for dice loss, but its recall is better when compared to nnU-Net. This does not come as a surprise considering the class imbalance in the data set and the fact that cross-entropy loss does not perform well on imbalanced data sets without compensating features such as sample weights. The precision of TransUNet is also higher when compared to the dice loss configuration, but its recall is worse for lower thresholds and slightly better for higher thresholds; this extends to the F1-scores. Focal loss, on the other hand, achieves the highest precision. It also displays the highest recall at low thresholds, but this coincides with very low precision. Precision across all networks is noticeably better in the DCP when compared to the SCP. The opposite applies to recall, which is generally higher in the SCP than in the DCP. Swin-Unet, however, consistently shows worse precision and recall than the other networks.
Next, we evaluate the results of the ensembled networks on the test data in Fig. 6 and Table 3. Several of the previous observations from the fivefold cross-validation results still hold true. Precision for the dice loss configuration is slightly worse than for the nnU-Net configuration. The precision of TransUNet is higher than that of both the dice loss and default nnU-Net configurations. Again, the focal loss configuration performs best in the lower decision threshold ranges, but its F1-score is in the same range as the other configurations. The precision across all networks is slightly better in the SCP when compared to the DCP. Recall decreases in the DCP when compared to the SCP, except for the dice loss configuration. Interestingly, the dice loss network appears to benefit from the ensembling of networks, a notable exception among the networks. When considering the F1-scores on the SCP, dice loss and TransUNet show very similar performance overall, with dice loss performing slightly better. This changes in the DCP, however, where the improved recall of the dice loss configuration also improves its F1-score. Swin-Unet shows improved precision when used with ensembling on the test data; its recall, however, does not notably improve.
When comparing the results for the cross-validation evaluation on the training data in Table 2 with the ensembled results on the test data in Table 3, it becomes clear that precision improves across all tested configurations for the ensembled networks. Recall, however, increases for lower thresholds while it decreases for higher thresholds, with the exception of dice loss in the DCP. Generally, a decrease in recall is unfortunate for use cases such as screening, where high recall (i.e., finding every possible case of the condition) is preferred over high precision: missing a true case (a false negative) can have more severe consequences than incorrectly flagging a case that is not there (a false positive), which can usually be ruled out with further testing.
For both sets of results, the fivefold cross-validation on the training data and the ensembling on the test data, it is apparent that precision is higher in the DCP. We mainly attribute this to the difference in vascular morphology between the two layers: OCTA scans of the SCP show clear and continuous vessel shapes against a black background, while the DCP shows a more regular distribution of small, complex interconnections48. This can be observed in Figs. 7 and 8. Also, for both sets of results, recall in the DCP decreases when compared to the SCP. Even though fewer FPs in the DCP benefit precision, we theorize that the larger number of annotated MAs in the DCP leads to slightly fewer of them being found, thus inhibiting recall. On the training data set, 1094 MAs were annotated in the SCP, while 2028 MAs were annotated in the DCP, almost twice as many. On the test data set, 313 MAs were annotated in the SCP, while 534 MAs were annotated in the DCP. This is congruent with the clinical observations in DR, where the majority of MAs tend to occur in the DCP, not the SCP4,43. A somewhat reduced recall in the DCP can be compensated for by the larger number of MAs in that layer, as long as the recall does not sink too close to 0 (see Tables 2 and 3). For instance, in the case of the dice loss on the test data in Table 3, the recall is still 0.35 at a decision threshold of 0.45 with a precision of 0.91, with 188 MAs found out of 534.
The ensembling step works by running the predictions of the five networks, each trained on a different fold of the training data, on the test data; the five predictions for each eye are then averaged. Dice loss in the DCP benefits disproportionately from this step when compared to the other networks and losses, and when compared to the SCP. In the case of TransUNet, for instance, it is possible that each of the five instances finds a different subset of MAs in the DCP, but these fall below the size and decision thresholds when ensembled. Dice loss on the DCP, by comparison, performs better here due to a combination of its tendency to favor contained areas with clearly delineated outlines and its resilience toward class imbalance. This is illustrated in supplementary Fig. F1, which shows the network output for patients 1 and 2 from Figs. 7 and 8, respectively.
Overall, nnU-Net’s default configuration, the dice loss configuration, and TransUNet behave very similarly because both nnU-Net’s and TransUNet’s losses combine dice and cross-entropy loss, as can be seen in Figs. 5 and 6. As noted above, the better recall of the dice loss configuration is expected given the class imbalance in the data set and the weakness of cross-entropy loss on imbalanced data without compensating measures such as sample weights; the changes TransUNet makes over nnU-Net, however, are able to partially compensate for this.
Figure 7 shows en face projections of the SCP and DCP from a patient’s eye with PDR and macular edema. A true positive from the SCP using dice loss is enlarged in the lower left; this large MA was found by all five neural networks. A false negative MA from the DCP is shown in the bottom center left; even though this is an annotated MA, it was only found by the U-net ensemble using the dice loss configuration. The lower center right shows a false positive found by the default nnU-Net configuration in the SCP; the enlarged area shows a potential vascular anomaly that could be an MA but was not labeled. The lower right shows an MA from the SCP that was only found by the TransUNet ensemble. Figure 8 shows en face projections of the SCP and DCP from another patient’s eye with PDR and macular edema. A false positive from the SCP using dice loss is enlarged in the lower left. A true positive MA from the DCP is shown in the bottom center left; this MA was found by the dice and focal loss nnU-Net configurations but not by the default nnU-Net. The lower center right shows a false negative that was not found by the default nnU-Net configuration in the DCP but could be found using the dice loss configuration. The lower right shows another MA that was only found by the TransUNet ensemble.
Conclusion and outlook
In this paper we present two contributions. First, we created a data set of MAs on SCP and DCP OCTA projections from patients with DR, labeled by two expert graders in two rounds of labeling, for the training and evaluation of U-nets. Secondly, we present different U-net configurations designed to detect MAs in en face projections of the SCP and DCP from OCTA scans of patients with DR and compare them with TransUNet and Swin-Unet. Our results demonstrate that it is possible to detect MAs with high precision/specificity, albeit at the cost of recall/sensitivity. Even though higher recall would be preferable in a clinical screening scenario, recall never drops to zero for the presented dice loss configuration. The performance of the networks is generally comparable between the SCP and DCP, with the former benefiting from higher recall and the latter from slightly higher precision. The dice loss configuration is also the only network that benefited from ensembling in the DCP, due to its resilience toward class imbalance and its ability to highlight clearly delineated areas. Overall, we demonstrate the viability of the U-net architecture for the segmentation of MAs in both the SCP and DCP in patients with DR. Using recognizable markers avoids the “black box” problem commonly associated with deep learning and allows clinicians to evaluate and trace the diagnosis made by the system.
Future work will include additional recognized markers, such as measurement/segmentation of non-perfused areas and foveal avascular zone enlargement3,5, a larger data set, and will aim for making referable predictions based on specific disease markers.
Data availability
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the corresponding author upon reasonable request.
References
Sadda, S. R. et al. Quantitative assessment of the severity of diabetic retinopathy. Am. J. Ophthalmol. 218, 342–352. https://doi.org/10.1016/j.ajo.2020.05.021 (2020).
Choi, W. et al. Ultrahigh speed swept source optical coherence tomography angiography of retinal and choriocapillaris alterations in diabetic patients with and without retinopathy. Retina 37, 11–21. https://doi.org/10.1097/IAE.0000000000001250 (2017).
de Carlo, T. E. et al. Detection of microvascular changes in eyes of patients with diabetes but not clinical diabetic retinopathy using optical coherence tomography angiography. Retina 35, 2364–2370. https://doi.org/10.1097/IAE.0000000000000882 (2015).
Querques, G., Borrelli, E., Battista, M., Sacconi, R. & Bandello, F. Optical coherence tomography angiography in diabetes: Focus on microaneurysms. Eye https://doi.org/10.1038/s41433-020-01173-7 (2021).
Couturier, A. et al. Capillary plexus anomalies in diabetic retinopathy on optical coherence tomography angiography. Retina 35, 2384–2391. https://doi.org/10.1097/IAE.0000000000000859 (2015).
Husvogt, L., Ploner, S. & Maier, A. Optical coherence tomography. In Medical Imaging Systems (eds Maier, A. et al.), chap. 12, 251–261, https://doi.org/10.1007/978-3-319-96520-8_12 (Springer, 2018).
Seebock, P. et al. Unsupervised identification of disease marker candidates in retinal OCT imaging data. IEEE Trans. Med. Imaging 38, 1037–1047. https://doi.org/10.1109/TMI.2018.2877080 (2018).
De Fauw, J. et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. https://doi.org/10.1038/s41591-018-0107-6 (2018).
Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (eds Navab, N. et al.), Vol. 9351, 234–241, https://doi.org/10.1007/978-3-319-24574-4_28 (Springer, 2015).
Inoue, T., Hatanaka, Y., Okumura, S., Muramatsu, C. & Fujita, H. Automated microaneurysm detection method based on eigenvalue analysis using hessian matrix in retinal fundus images. In 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 5873–5876, https://doi.org/10.1109/EMBC.2013.6610888 (IEEE, 2013).
Pallawala, P. M., Hsu, W., Lee, M. L. & Goh, S. S. Automated microaneurysm segmentation and detection using generalized eigenvectors. In Proceedings - Seventh IEEE Workshop on Applications of Computer Vision, WACV, 322–327. https://doi.org/10.1109/ACVMOT.2005.26 (IEEE Computer Society, 2005).
Giancardo, L. et al. Microaneurysm detection with radon transform-based classification on retina images. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, 5939–5942, https://doi.org/10.1109/IEMBS.2011.6091562 (2011).
Pereira, C. et al. Using a multi-agent system approach for microaneurysm detection in fundus images. Artif. Intell. Med. 60, 179–188. https://doi.org/10.1016/j.artmed.2013.12.005 (2014).
Javidi, M., Pourreza, H. R. & Harati, A. Vessel segmentation and microaneurysm detection using discriminative dictionary learning and sparse representation. Comput. Methods Programs Biomed. 139, 93–108. https://doi.org/10.1016/j.cmpb.2016.10.015 (2017).
Dai, L. et al. A deep learning system for detecting diabetic retinopathy across the disease spectrum. Nat. Commun. 12, 1–11. https://doi.org/10.1038/s41467-021-23458-5 (2021).
Wang, Z., Chen, K.-J. & Zhang, L. A R-CNN based approach for microaneurysm detection in retinal fundus images. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 11837 LNCS, 201–212, https://doi.org/10.1007/978-3-030-32962-4_19 (2019).
Feng, Z. et al. Deep retinal image segmentation: A FCN-based architecture with short and long skip connections for retinal image segmentation. In Neural Information Processing, Vol. 10637 LNCS, 713–722, https://doi.org/10.1007/978-3-319-70093-9_76 (Springer Verlag, 2017).
Qiao, L., Zhu, Y. & Zhou, H. Diabetic retinopathy detection using prognosis of microaneurysm and early diagnosis system for non-proliferative diabetic retinopathy based on deep learning algorithms. IEEE Access 8, 104292–104302. https://doi.org/10.1109/ACCESS.2020.2993937 (2020).
Xu, Y. et al. FFU-Net: Feature fusion U-Net for lesion segmentation of diabetic retinopathy. BioMed Res. Int. https://doi.org/10.1155/2021/6644071 (2021).
Tan, J. H. et al. Automated segmentation of exudates, haemorrhages, microaneurysms using single convolutional neural network. Inf. Sci. 420, 66–76. https://doi.org/10.1016/j.ins.2017.08.050 (2017).
Spencer, T., Olson, J. A., McHardy, K. C., Sharp, P. F. & Forrester, J. V. An image-processing strategy for the segmentation and quantification of microaneurysms in fluorescein angiograms of the ocular fundus. Comput. Biomed. Res. 29, 284–302. https://doi.org/10.1006/cbmr.1996.0021 (1996).
Mendonca, A. M., Campilho, A. J. & Nunes, J. M. Automatic segmentation of microaneurysms in retinal angiograms of diabetic patients. In Proceedings - International Conference on Image Analysis and Processing, ICIAP, 728–733. https://doi.org/10.1109/ICIAP.1999.797681 (IEEE Computer Society, 1999).
Bilal, A., Sun, G., Mazhar, S., Imran, A. & Latif, J. A Transfer Learning and U-Net-based automatic detection of diabetic retinopathy from fundus images. Comput. Methods Biomech. Biomed. Eng. Imaging Vis. https://doi.org/10.1080/21681163.2021.2021111 (2022).
Kou, C., Li, W., Liang, W., Yu, Z. & Hao, J. Microaneurysms segmentation with a U-Net based on recurrent residual convolutional neural network. J. Med. Imaging 6, 1. https://doi.org/10.1117/1.JMI.6.2.025008 (2019).
Sambyal, N., Saini, P., Syal, R. & Gupta, V. Modified U-Net architecture for semantic segmentation of diabetic retinopathy images. Biocybern. Biomed. Eng. 40, 1094–1109. https://doi.org/10.1016/j.bbe.2020.05.006 (2020).
Andersen, J. K. H., Grauslund, J. & Savarimuthu, T. R. Comparing objective functions for segmentation and detection of microaneurysms in retinal images. In Proceedings of Machine Learning Research, Vol. 121, 19–32 (PMLR, 2020).
Perdomo, O. et al. Classification of diabetes-related retinal diseases using a deep learning approach in optical coherence tomography. Comput. Methods Programs Biomed. 178, 181–189. https://doi.org/10.1016/j.cmpb.2019.06.016 (2019).
Husvogt, L. et al. Automatic detection of capillary dilation and looping in patients with diabetic retinopathy from optical coherence tomography angiography data. In Investigative Ophthalmology & Visual Science, Vol. 59, 5380 (C.V. Mosby Co, 2018).
Husvogt, L. et al. First approaches towards automatic detection of microaneurysms in OCTA images. In Informatik aktuell (eds Maier, A. et al.), 11–12, https://doi.org/10.1007/978-3-662-56537-7_11 (Springer Vieweg, 2018).
Le, D., Alam, M., Miao, B. A., Lim, J. I. & Yao, X. Fully automated geometric feature analysis in optical coherence tomography angiography for objective classification of diabetic retinopathy. Biomed. Opt. Express 10, 2493. https://doi.org/10.1364/BOE.10.002493 (2019).
Takase, N. et al. Enlargement of foveal avascular zone in diabetic eyes evaluated by en face optical coherence tomography angiography. Retina 35, 2377–2383. https://doi.org/10.1097/IAE.0000000000000849 (2015).
Gao, W. et al. Detection of diabetic retinopathy in its early stages using textural features of optical coherence tomography angiography. J. Innov. Opt. Health Sci. 15, 2250006. https://doi.org/10.1142/S1793545822500067 (2022).
Le, D. et al. Transfer learning for automated octa detection of diabetic retinopathy. Transl. Vis. Sci. Technol. 9, 1–9. https://doi.org/10.1167/tvst.9.2.35 (2020).
Heisler, M. et al. Ensemble deep learning for diabetic retinopathy detection using optical coherence tomography angiography. Transl. Vis. Sci. Technol. 9, 20. https://doi.org/10.1167/tvst.9.2.20 (2020).
Eladawi, N. et al. Early diabetic retinopathy diagnosis based on local retinal blood vessel analysis in optical coherence tomography angiography (OCTA) images. Med. Phys. 45, 4582–4599. https://doi.org/10.1002/mp.13142 (2018).
Eladawi, N. et al. Early signs detection of diabetic retinopathy using optical coherence tomography angiography scans based on 3D multi-path convolutional neural network. In 2019 IEEE International Conference on Image Processing (ICIP), 1390–1394, https://doi.org/10.1109/ICIP.2019.8803031 (IEEE, 2019).
Ryu, G., Lee, K., Park, D., Park, S. H. & Sagong, M. A deep learning model for identifying diabetic retinopathy using optical coherence tomography angiography. Sci. Rep. 11, 1–9. https://doi.org/10.1038/s41598-021-02479-6 (2021).
Watson, D. S. et al. Clinical applications of machine learning algorithms: Beyond the black box. BMJ https://doi.org/10.1136/bmj.l886 (2019).
Petch, J., Di, S. & Nelson, W. Opening the black box: The promise and limitations of explainable machine learning in cardiology. Can. J. Cardiol. https://doi.org/10.1016/j.cjca.2021.09.004 (2022).
Ratti, E. & Graves, M. Explainable machine learning practices: Opening another black box for reliable medical AI. AI Ethics 2, 801–814. https://doi.org/10.1007/s43681-022-00141-z (2022).
Bertram, C. A., Aubreville, M., Marzahl, C., Maier, A. & Klopfleisch, R. A large-scale dataset for mitotic figure assessment on whole slide images of canine cutaneous mast cell tumor. Sci. Data https://doi.org/10.1038/s41597-019-0290-4 (2019).
Marzahl, C. et al. EXACT: a collaboration toolset for algorithm-aided annotation of images with annotation version control. Sci. Rep. https://doi.org/10.1038/s41598-021-83827-4 (2021).
Hasegawa, N., Nozaki, M., Takase, N., Yoshida, M. & Ogura, Y. New insights into microaneurysms in the deep capillary plexus detected by optical coherence tomography angiography in diabetic macular edema. Investig. Ophthalmol. Vis. Sci. 57, OCT348–OCT355. https://doi.org/10.1167/iovs.15-18782 (2016).
Drozdzal, M., Vorontsov, E., Chartrand, G., Kadoury, S. & Pal, C. The importance of skip connections in biomedical image segmentation. In Deep Learning and Data Labeling for Medical Applications (eds Carneiro, G. et al.), Vol. 10008, 179–187, https://doi.org/10.1007/978-3-319-46976-8_19 (Springer, 2016).
Lin, T. Y., Goyal, P., Girshick, R., He, K. & Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2017).
Chen, J. et al. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv https://doi.org/10.48550/arXiv.2102.04306 (2021).
Cao, H. et al. Swin-unet: Unet-like pure transformer for medical image segmentation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) Vol. 13803 LNCS, 205–218, https://doi.org/10.1007/978-3-031-25066-8_9 (2023).
Nam, K. Y., Lee, M. W., Lee, K. H. & Kim, J. Y. Superficial capillary plexus vessel density/deep capillary plexus vessel density ratio in healthy eyes. BMC Ophthalmol. https://doi.org/10.1186/s12886-022-02673-8 (2022).
Acknowledgements
This work was supported by the National Institutes of Health R01EY034080 and R01EY011289 (Bethesda, MD); Deutsche Forschungsgemeinschaft (DFG) MA 4898/12-2 (Bonn, Germany); Beckman-Argyros Award in Vision Research (Irvine, CA); Champalimaud Vision Award (Lisbon, Portugal); Greenberg Prize to End Blindness; Retina Research Foundation (Houston, TX); Topcon Medical Systems (Tokyo, Japan); Massachusetts Lions Eye Research Fund (Belmont, MA); Research to Prevent Blindness (New York, NY). The sponsor or funding organization had no role in the design or conduct of this research.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
LH conceived and performed the experiments, designed the figures, and wrote the manuscript. AY, AC, and KL collected the data. AY and AC labeled the data. JS and SBP contributed to the manuscript. JGF, NKW, and AM supervised this work. All authors reviewed the manuscript.
Ethics declarations
Competing interests
Stefan Ploner and Dr Fujimoto hold a patent related to VISTA-OCTA. Dr Fujimoto holds stock in Optovue and has received funding from Topcon. Dr Waheed is employed by Applied Genetic Technologies Corporation, has consulted for Complement Therapeutics Ltd, Iolyx Pharmaceuticals, Hubble, Olix Pharma, Saliogen, Syncona, and Topcon, has received funding from Topcon, Nidek, and Zeiss, and has personal financial interests in Gyroscope Therapeutics Ltd and Ocudyne. All other authors declare no competing interest.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.