Automated segmentation of macular edema for the diagnosis of ocular disease using deep learning method

Macular edema is a major cause of visual loss and blindness in patients with ocular fundus diseases. Optical coherence tomography (OCT) has been widely applied for diagnosing macular edema because it is non-invasive and offers high resolution. However, practical application remains challenging due to the distorted retinal morphology and blurred boundaries near macular edema. Herein, we developed a novel deep learning model for the segmentation of macular edema in OCT images based on the DeepLab framework (OCT-DeepLab). In this model, we used atrous spatial pyramid pooling (ASPP) to detect macular edema at multiple scales and a fully connected conditional random field (CRF) to refine the boundary of macular edema. The OCT-DeepLab model was compared against traditional hand-crafted methods (C-V and SBG) and end-to-end methods (FCN, PSPnet, and U-net) to estimate the segmentation performance. OCT-DeepLab outperformed both groups of methods, as shown by higher precision, sensitivity, specificity, and F1-score. The segmentation performance of OCT-DeepLab was comparable to that of manual labeling, with an average area under the curve (AUC) of 0.963, which was superior to that of the other end-to-end methods. Collectively, the OCT-DeepLab model is suitable for the segmentation of macular edema and can assist ophthalmologists in the management of ocular disease.

www.nature.com/scientificreports/
methods have been used in the segmentation of macular edema, including threshold-based, graph-based, active contours-based, and region-based approaches [9][10][11][12]. However, these methods rely on handcrafted features, which are designed from domain knowledge and are highly dependent on the quality of OCT images. Deep learning is a form of machine learning based on deep neural networks, which has been used for healthcare and image analysis 13. The convolutional neural network (CNN) is the class of deep neural network most commonly applied to image analysis. CNNs have been used for the segmentation of subretinal fluid and pigment epithelium detachment and for the classification of retinal vasculature [14][15][16]. Because conventional CNNs have low computational efficiency and weak multi-scale feature extraction ability, improved CNNs were proposed, including the fully convolutional network (FCN), the pyramid scene parsing network (PSP-net), and U-net. FCN converts the fully connected layers of a CNN into convolution layers and adds deconvolution layers to enhance computational efficiency 17. PSP-net uses pyramid levels to separate the feature map into different sub-regions and forms pooled representations for different locations to improve multi-scale feature extraction ability 18. U-net is modified with up-sampling operators and a large number of feature channels to improve computational efficiency and multi-scale feature extraction ability 19. Although these improved CNNs have better segmentation efficiency than conventional CNNs, some challenges remain, such as fault-segmentation, over-segmentation, and limited multi-scale feature extraction caused by the limited depth of the convolutional network.
In this study, we proposed a deep learning method based on the DeepLab model for the segmentation of macular edema in OCT images (OCT-DeepLab). Atrous spatial pyramid pooling (ASPP) was used to segment objects at multiple scales and thus enhance the multi-scale feature extraction ability 20. The fully connected conditional random field (FC-CRF) was then used to refine the boundary of macular edema and reduce fault-segmentation and over-segmentation 21. The segmentation performance of OCT-DeepLab was finally estimated by comparison against hand-crafted methods (C-V and SBG) and end-to-end methods (FCN, PSPnet, and U-net).
Experimental principle of the proposed method. The flowchart of the proposed method, OCT-DeepLab, is shown in Fig. 1. It includes the pre-processing of OCT images by wavelet transform, the coarse segmentation of macular edema by the DeepLab framework, and the boundary optimization by FC-CRF.
Pre-processing of OCT images. Speckle noise can produce a granular appearance, limit the contrast, and reduce the signal-to-noise ratio (SNR) of OCT images, which makes it difficult to identify their detailed features 22,23. In the pre-processing step, a wavelet transform is used to reduce the speckle noise of OCT images 24. OCT images are decomposed by a two-level wavelet transform (Fig. 2). At the first level, the OCT image ($LL_0$) is decomposed into a low frequency band ($LL_1$) and three high frequency bands ($HH_1$, $LH_1$, and $HL_1$). At each subsequent level, $LL_i$ is split into an approximation $LL_{i+1}$ and three detail channels $LH_{i+1}$, $HL_{i+1}$, and $HH_{i+1}$ for horizontally, vertically, and diagonally oriented details, respectively. The noise threshold (NT) of each low frequency band $LL_{i+1}$ differentiates between target signal and speckle noise. NT is calculated by Eq. (1), where $i$ and $j$ are the horizontal and vertical pixel coordinates of the OCT image, respectively; $p_{ij}$ is the pixel value; and $\alpha$ is a hyperparameter that rescales the denominator. Speckle noise is then reduced by Eq. (2), where $p'_{ij}$ is the pixel value after noise reduction. If $p_{ij} \le NT$, the pixel is treated as speckle noise and is reduced. If $p_{ij} > NT$, the pixel is treated as target signal and is retained. The two-level wavelet decomposition of an OCT image is shown in Fig. 3.
The speckle-noise threshold (NT) for each OCT image is calculated by Eq. (1), and the speckle noise is then reduced by Eq. (2). Figure 4 shows the denoising flowchart from the original image to the reconstructed image.
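The pre-processing step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a Haar wavelet (the wavelet family is not stated in the text), a hypothetical stand-in for Eq. (1) in which NT is the mean of the low frequency band divided by $\alpha$, and a reading of "reduced" as "set to zero". For brevity it thresholds only the coarsest low frequency band. The function names `haar_dwt2`, `haar_idwt2`, `noise_threshold`, and `denoise` are illustrative.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar transform: returns (LL, LH, HL, HH)."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]   # top-left, top-right of each 2x2 block
    c, d = img[1::2, 0::2], img[1::2, 1::2]   # bottom-left, bottom-right
    ll = (a + b + c + d) / 2.0                # approximation band
    lh = (a + b - c - d) / 2.0                # differences between rows
    hl = (a - b + c - d) / 2.0                # differences between columns
    hh = (a - b - c + d) / 2.0                # diagonal differences
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    out = np.empty((2 * h, 2 * w), dtype=float)
    out[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    out[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    out[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    out[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return out

def noise_threshold(band, alpha):
    """ASSUMPTION: stand-in for Eq. (1); band mean rescaled by alpha."""
    return band.sum() / (alpha * band.size)

def denoise(img, alpha=2.0, levels=2):
    """Two-level decomposition, thresholding of the coarsest LL band, reconstruction.

    Image height and width must be divisible by 2**levels.
    """
    ll = np.asarray(img, dtype=float)
    details = []
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(ll)
        details.append((lh, hl, hh))
    nt = noise_threshold(ll, alpha)
    # ASSUMPTION: pixels with p_ij <= NT are set to zero, others are retained.
    ll = np.where(ll > nt, ll, 0.0)
    for lh, hl, hh in reversed(details):
        ll = haar_idwt2(ll, lh, hl, hh)
    return ll
```

With a very large `alpha` the threshold approaches zero, so a strictly positive image is reconstructed unchanged; smaller values of `alpha` suppress more of the low frequency band.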
Coarse segmentation of macular edema in OCT images by DeepLab framework. At this step, macular edema is segmented by the DeepLab framework. DeepLab is a deep learning model for image segmentation that combines deep convolutional nets, atrous spatial pyramid pooling (ASPP), and fully connected CRFs 20. DeepLab uses ResNet-101 with atrous convolutions as the main feature extractor and uses ASPP to extract multi-scale features.
The ResNet-101 encoder addresses the degradation problem with the residual learning block 25, which is computed as shown below:
$$x_{l+1} = x_l + f(x_l, w_l) \tag{3}$$
where $f$ denotes the residual function; $x_l$ denotes the input feature to the $l$-th residual block; and $w_l$ denotes the set of weights associated with the $l$-th residual block. The operating principle of the residual learning block is shown in Fig. 5A.
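The identity shortcut of the residual learning block can be sketched in a few lines of NumPy. This is a toy illustration only: the residual function $f$ is assumed here to be two dense layers with a ReLU in between, whereas ResNet-101 uses convolutional bottleneck layers. The name `residual_block` and the weight shapes are hypothetical.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Compute x_{l+1} = x_l + f(x_l, w_l) for a toy residual function f."""
    hidden = np.maximum(0.0, x @ w1)   # first layer followed by ReLU
    residual = hidden @ w2             # second layer gives f(x_l, w_l)
    return x + residual                # identity shortcut adds the input back

x = np.arange(6.0).reshape(2, 3)
# With all-zero weights the residual vanishes and the block is the identity map.
y = residual_block(x, np.zeros((3, 4)), np.zeros((4, 3)))
```

Because the block only has to learn the residual, adding more blocks does not force the network to relearn the identity mapping, which is the motivation behind residual learning.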
ASPP extracts multi-scale features of OCT images by atrous convolution, which enlarges the field of view of the kernel without increasing the number of parameters 26. Macular edema appears at different scales in OCT images. By accounting for these scales, ASPP can improve the accuracy of segmentation.
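The following NumPy sketch shows how atrous (dilated) convolution enlarges the field of view while keeping the same number of kernel weights, and how ASPP applies several rates in parallel. It is a simplified single-channel illustration under stated assumptions: the function names `atrous_conv2d` and `aspp` are hypothetical, the rates in the usage example are arbitrary, and the rates used in OCT-DeepLab are not given in the text.

```python
import numpy as np

def atrous_conv2d(img, kernel, rate):
    """Same-size 2-D cross-correlation with dilation `rate` and zero padding.

    A k x k kernel with dilation `rate` covers rate * (k - 1) + 1 pixels per side,
    but still has only k * k weights. Odd kernel sizes are assumed.
    """
    img = np.asarray(img, dtype=float)
    kh, kw = kernel.shape
    ph, pw = rate * (kh // 2), rate * (kw // 2)
    padded = np.pad(img, ((ph, ph), (pw, pw)))      # zero padding
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(kh):
        for j in range(kw):
            # Each kernel tap reads the image shifted by a multiple of `rate`.
            out += kernel[i, j] * padded[i * rate:i * rate + h, j * rate:j * rate + w]
    return out

def aspp(img, kernels, rates):
    """Apply one atrous branch per rate and stack the responses (one map per rate)."""
    return np.stack([atrous_conv2d(img, k, r) for k, r in zip(kernels, rates)])

# Usage: three 3x3 branches with increasing field of view (3, 5, and 7 pixels).
image = np.zeros((9, 9))
image[4, 4] = 1.0
branches = aspp(image, [np.ones((3, 3))] * 3, rates=(1, 2, 3))
```

In a trained network each branch has its own learned kernel and the stacked responses are fused by further layers; the sketch reuses one kernel only to keep the example short.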
Taking the reconstructed OCT images as the input dataset, the coarse segmentation of macular edema is conducted by OCT-DeepLab. Here, the initial learning rate of the DeepLab network is set to 0.007 and is then updated dynamically by the 'poly' learning-rate policy. The momentum and weight decay are set to 0.9 and 5e−4, respectively 27. Then, the coarse boundary of macular edema in the OCT image is obtained (Fig. 5). Compared with the manual labels, the boundary of the coarse segmentation is smoother and fails to capture macular edema of smaller scale. Thus, the boundary of macular edema needs to be further optimized and refined.
The refinement is performed by FC-CRF. For an image $I$ and a label assignment $X$, the conditional probability follows a Gibbs distribution:
$$P(X \mid I) = \frac{1}{Z(I)} \exp\Big(-\sum_{c \in C_\varsigma} \phi_c(X_c \mid I)\Big) \tag{4}$$
where $Z(I)$ is a normalization constant; $\varsigma$ is a graph associated with $I$; and $C_\varsigma$ is the set of cliques $c$ in $\varsigma$, each inducing a potential $\phi_c$ 28. The conditional probability of $X$ is calculated by Eq. (4). The Gibbs energy function of $X$ is
$$E(X \mid I) = \sum_{c \in C_\varsigma} \phi_c(X_c \mid I)$$
The maximum a posteriori $X$ is obtained by minimizing the corresponding energy:
$$X^{*} = \arg\min_{X} E(X \mid I)$$
After minimizing $E(X \mid I)$, a binary segmentation of macular edema is obtained. Given a graph $\varsigma$ on $I$, its energy is obtained by summing its unary and pairwise potentials ($\psi_u$ and $\psi_p$, respectively):
$$E(X \mid I) = \sum_{i} \psi_u(X_i) + \sum_{i<j} \psi_p(X_i, X_j)$$
where $i$ and $j$ range from 0 to $N$. The unary potential $\psi_u(X_i)$ defines a log-likelihood over the label assignment $X_i$ and is computed by a classifier; here, $\psi_u(X_i)$ is the coarse segmentation result of macular edema. The pairwise potential $\psi_p(X_i, X_j)$ is built from Gaussian kernels, each of which determines the similarity between connected pixels by means of $f^{(m)}$. The vectors $f_i$ and $f_j$ are the feature vectors of pixels $i$ and $j$ in an arbitrary feature space, and $p_i$ and $p_j$ are the coordinate vectors of pixels $i$ and $j$. $\theta_\alpha$ and $\theta_\beta$ control the degrees of nearness and similarity between pixels $i$ and $j$. The proximity in distance ($\theta_\alpha$) and the similarity with the adjacent pixels ($\theta_\beta$) are the scale parameters of the Gaussian kernel, which can refine the boundary of macular edema. Taking $\theta_\alpha$ as a fixed value, the curve of the F1-score as a function of $\theta_\beta$ is shown in Fig. 6A, where $\theta_\alpha$ and $\theta_\beta$ range from 1 to 20 with a step size of 1. When $\theta_\alpha = 16$ and $\theta_\beta = 8$, the F1-score reaches its peak value and FC-CRF obtains the optimal segmentation result. The refined result is shown in Fig. 6B. Compared with the coarse segmentation result, the refined segmentation shows more detailed features of macular edema and is close to the manual labels.
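The role of the two scale parameters can be illustrated with a short NumPy sketch. The kernel form below is an assumption taken from the common dense-CRF formulation (a single Gaussian over pixel position and pixel feature), because the kernel equation itself is not reproduced in this text. The Potts-style label compatibility and the names `gaussian_kernel` and `pairwise_potential` are likewise illustrative, not the authors' code.

```python
import numpy as np

def gaussian_kernel(p_i, p_j, f_i, f_j, theta_alpha, theta_beta):
    """ASSUMED kernel: exp(-|p_i - p_j|^2 / (2 theta_alpha^2) - |f_i - f_j|^2 / (2 theta_beta^2)).

    theta_alpha scales spatial nearness, theta_beta scales feature similarity.
    """
    d_pos = np.sum((np.asarray(p_i, float) - np.asarray(p_j, float)) ** 2)
    d_feat = np.sum((np.asarray(f_i, float) - np.asarray(f_j, float)) ** 2)
    return np.exp(-d_pos / (2.0 * theta_alpha ** 2) - d_feat / (2.0 * theta_beta ** 2))

def pairwise_potential(x_i, x_j, p_i, p_j, f_i, f_j,
                       theta_alpha=16.0, theta_beta=8.0, weight=1.0):
    """ASSUMED Potts compatibility: a penalty only when the two labels differ."""
    if x_i == x_j:
        return 0.0
    return weight * gaussian_kernel(p_i, p_j, f_i, f_j, theta_alpha, theta_beta)

# Two nearby pixels with similar intensity are penalized strongly for taking
# different labels; distant or dissimilar pixels are penalized only weakly.
near = pairwise_potential(0, 1, (10, 10), (10, 12), [120.0], [122.0])
far = pairwise_potential(0, 1, (10, 10), (10, 90), [120.0], [200.0])
```

The default values `theta_alpha=16` and `theta_beta=8` are the settings at which the F1-score peaked in the text; a grid search over 1 to 20 in steps of 1 would call a CRF inference routine for each pair and keep the pair with the highest F1-score.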

Results
Comparison with DeepLab under different settings. To evaluate the effect of the wavelet transform and the fully connected conditional random field on segmentation performance, the proposed method was compared against DeepLab with different settings, including the original DeepLab, DeepLab with wavelet transform (DeepLab + WT), and DeepLab with fully connected conditional random field (DeepLab + FC-CRF). The segmentation results of macular edema are shown in Fig. 7. Table 1 shows the evaluation metrics for macular edema segmentation by DeepLab with different settings. Compared with the original DeepLab, DeepLab + WT improved the precision, specificity, and F1-score to 94.73 (2.92↑), 95.87 (1.16↑), and 92.82 (0.95↑), respectively. DeepLab + FC-CRF improved the precision, sensitivity, specificity, and F1-score to 92.52 (0.71↑), 96.56 (4.25↑), 96.31 (1.6↑), and 94.69 (2.82↑), respectively. The proposed OCT-DeepLab achieved higher precision, specificity, and F1-score than DeepLab, DeepLab + WT, and DeepLab + FC-CRF. Its sensitivity was higher than that of DeepLab and DeepLab + WT and similar to that of DeepLab + FC-CRF.
Comparison with traditional hand-crafted methods. We compared the proposed method against traditional hand-crafted methods, including C-V 29 and SBG 9, to evaluate the segmentation performance for macular edema. The segmentation results of macular edema are shown in Fig. 8.
As shown in Fig. 8, the red line is the initial contour curve of the segmentation algorithm, while the green line marks the segmented macular edema region. The C-V model identified only a small part of the macular edema region; several macular edema regions were omitted, especially those with irregular boundaries. The SBG model identified a part of the retinal tissue. OCT-DeepLab could accurately segment the region of macular edema, and its segmentation results showed great consistency with the manual labels. Table 2 shows the evaluation metrics. OCT-DeepLab achieved higher precision, sensitivity, specificity, and F1-score than the traditional hand-crafted methods C-V and SBG. Precision is the proportion of pixels labeled as macular edema that truly are macular edema, so a high precision indicates few false positives. The advantage of OCT-DeepLab can be observed in precision, where it achieved higher scores than C-V and SBG, and also in sensitivity, specificity, and F1-score. Sensitivity is the proportion of positive samples that are correctly detected, and specificity is the proportion of negative samples that are correctly detected. The sensitivity of OCT-DeepLab was greater than that of the other methods, which demonstrates that OCT-DeepLab can recognize more macular edema than the other two models. As for specificity, OCT-DeepLab substantially exceeded C-V and SBG. The F1-score, the harmonic mean of precision and sensitivity, measures the agreement between the predicted and manually labeled regions; its value ranges between zero and one. The F1-score of OCT-DeepLab was significantly greater than that of C-V and SBG, suggesting that the segmentation results of OCT-DeepLab are highly consistent with the manual labels.
Comparison with other end-to-end methods. In this section, we compared the proposed OCT-DeepLab method with end-to-end methods, including FCN, PSPNet, and U-net. The segmentation results of macular edema by the different models are shown in Fig. 9.
With the FCN and PSPNet models, part of the macular edema region was misclassified and small-scale macular edema was not correctly segmented. With the U-net model, the segmentation results were clearer than those of FCN and PSPNet, especially in small-scale macular edema regions. However, limited by the network structure of U-net, the input image size must be 32 or a multiple of 32. The segmentation result of the proposed method is in agreement with the manual labels.
To reduce the influence of test data selection on the experimental results, we used five-fold cross-validation to evaluate the performance of the different methods. Table 3 shows the values of the four metrics for the segmentation of macular edema by the different methods. OCT-DeepLab had better precision than FCN, PSPNet, and U-net: its precision was 95.79, exceeding the 84.30, 88.27, and 89.48 of FCN, PSPNet, and U-net by a large margin. The advantages of OCT-DeepLab were also observed in sensitivity, specificity, and F1-score. The sensitivity of OCT-DeepLab was greater than that of the other methods, which demonstrates that OCT-DeepLab can recognize more macular edema than the other models. As for specificity, OCT-DeepLab substantially exceeded FCN, U-net, and PSPNet. The F1-score measures the degree of agreement between two segmentations. OCT-DeepLab achieved higher F1-scores than FCN, PSPNet, and U-net, suggesting that OCT-DeepLab obtains the segmentation results closest to the manual labels.
Receiver operating characteristic (ROC) analysis was then used to evaluate the segmentation performance. The ROC curves of the different methods are shown in Fig. 10. Based on the ROC curves, we computed the area under the curve (AUC). AUC represents the degree of separability between target regions and non-target regions: the higher the AUC, the better the model distinguishes between macular edema and other retinal regions. The performance of OCT-DeepLab was comparable to that of manual labeling, with an average AUC of 0.963. Moreover, OCT-DeepLab had a greater AUC than the other methods, suggesting that it performs better at distinguishing between macular edema and other retinal regions.
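As an illustration of how an AUC value can be derived from pixel-wise predictions, the sketch below builds an ROC curve by sweeping a threshold over predicted edema probabilities and integrates it with the trapezoidal rule. It is a generic NumPy example, not the evaluation code of this study; the names `roc_points` and `auc` are hypothetical, and both classes are assumed to be present in the labels.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (fpr, tpr) arrays obtained by thresholding `scores` at each distinct value."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels).ravel().astype(bool)
    order = np.argsort(-scores, kind="stable")        # highest score first
    scores, labels = scores[order], labels[order]
    # Keep one ROC point per distinct score so that ties are handled together.
    last_of_group = np.r_[np.nonzero(np.diff(scores))[0], scores.size - 1]
    tps = np.cumsum(labels)[last_of_group]            # true positives at each threshold
    fps = np.cumsum(~labels)[last_of_group]           # false positives at each threshold
    tpr = np.r_[0.0, tps / tps[-1]]
    fpr = np.r_[0.0, fps / fps[-1]]
    return fpr, tpr

def auc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(scores, labels)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
```

An AUC of 1 means that every edema pixel receives a higher score than every non-edema pixel, while 0.5 corresponds to scores that carry no information about the label.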

Conclusion
A novel DeepLab-based deep learning method (OCT-DeepLab) was proposed to segment macular edema in OCT images. It includes pre-processing of OCT images via speckle de-noising, coarse segmentation of macular edema based on atrous spatial pyramid pooling (ASPP), and refinement of the segmentation result by FC-CRF. Compared with conventional CNNs or improved CNNs, OCT-DeepLab had better precision, sensitivity, specificity, and F1-score. The OCT-DeepLab method can enhance the multi-scale feature extraction ability and reduce fault-segmentation and over-segmentation. This method can assist ophthalmologists in detecting edema regions and enhance diagnostic efficiency.
In the OCT-DeepLab method, atrous convolution is a powerful tool for segmentation. Atrous convolution allows us to explicitly control the resolution and effectively enlarges the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Atrous spatial pyramid pooling (ASPP) can segment objects at multiple scales. ASPP probes a convolutional feature layer with filters at multiple sampling rates, thus capturing objects and image context at multiple scales. Moreover, the use of wavelet transform denoising further enhances the model's ability to segment small-scale lesions. In addition, the use of FC-CRF as a post-processing tool can refine the boundaries of macular edema and enhance the accuracy of the segmentation results.
In conclusion, we provide a deep learning method based on the DeepLab framework to segment macular edema in OCT images. Due to its precision, reliability, and objectivity, it is a promising tool for the individual and large-scale management of patients with ocular disease 30. However, this model has some limitations. Due to the limited number of training samples in the given datasets, the segmentation performance is not yet as high as that of detection. As more data are accumulated in the future, further improvements in the accuracy of macular edema segmentation in OCT images can be achieved.

Methods
Dataset. The large-scale OCT image cohort was constructed in collaboration with the Eye Hospital (Nanjing Medical University), Suzhou First People's Hospital, and Huai'an First People's Hospital. Patients with diabetic macular edema who presented to the hospitals between May 1, 2019 and June 30, 2020 were included. Exclusion criteria included recent pan-retinal photocoagulation, a history of focal or grid laser treatment, and other ophthalmologic diseases that may affect the accuracy of results. OCT images were centered on the macula, with an axial resolution of 10 μm and a 24-bit depth, and were acquired in 2 s, covering a 4 × 4-mm area, using a Cirrus HD-OCT (Carl Zeiss Meditec, Inc., Dublin, CA, USA). Three medical students manually screened the data and removed unclassifiable images (i.e., signal-shielded and off-center images). Three retinal specialists with more than 10 years of clinical experience worked independently to label the OCT images as ground truth. A senior expert was consulted in case of disagreement. The final dataset consists of 8676 volumetric OCT images from 6230 subjects. This study was approved by the Ethics Committee of the Eye Hospital (Nanjing Medical University) and followed the tenets of the Declaration of Helsinki. Written informed consent was obtained from all subjects.

Evaluation experiments.
To evaluate the segmentation performance for macular edema, three comparison experiments were conducted. In experiment 1, the proposed method was compared against DeepLab with different settings, including the original DeepLab, DeepLab with wavelet transform (DeepLab + WT), and DeepLab with fully connected conditional random field (DeepLab + FC-CRF). In experiment 2, the proposed method was compared against the traditional hand-crafted methods, including C-V and SBG. In experiment 3, the proposed method was compared against other end-to-end methods, including FCN, PSPNet, and U-net.
Evaluation metrics. Four different metrics, including precision, sensitivity, specificity, and F1-score, were calculated to estimate the performance of segmentation as shown below:
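For orientation, the four metrics can be computed from pixel-wise confusion counts as in the following NumPy sketch. It assumes the standard definitions in terms of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN), with macular edema as the positive class; the function name `segmentation_metrics` is illustrative. Values are returned as fractions and can be multiplied by 100 to match the percentage scale used in the tables.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Precision, sensitivity, specificity, and F1-score of a binary mask.

    Assumes both masks contain edema and background pixels, so that no
    denominator is zero.
    """
    pred = np.asarray(pred).astype(bool)
    truth = np.asarray(truth).astype(bool)
    tp = float(np.sum(pred & truth))      # edema predicted and labeled
    fp = float(np.sum(pred & ~truth))     # edema predicted, background labeled
    fn = float(np.sum(~pred & truth))     # background predicted, edema labeled
    tn = float(np.sum(~pred & ~truth))    # background predicted and labeled
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2.0 * precision * sensitivity / (precision + sensitivity)
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}

# Usage on a toy 6-pixel mask: TP = 2, FP = 1, FN = 1, TN = 2.
scores = segmentation_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
```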