Automatic detection of pathological myopia using machine learning

Pathological myopia is a severe form of myopia, i.e., nearsightedness. It is also known as degenerative myopia because it can ultimately lead to blindness. In pathological myopia, certain myopia-specific pathologies occur at the posterior of the eye, e.g., Foster-Fuchs's spot, cystoid degeneration, liquefaction, macular degeneration, vitreous opacities, Weiss's reflex, and posterior staphyloma. This research aims to develop a machine learning (ML) approach for the automatic detection of pathological myopia from fundus images. A deep learning technique, the convolutional neural network (CNN), is employed for this purpose. A CNN model is developed in Spyder. The fundus images are first preprocessed and then fed to the designed CNN model, which automatically extracts features from the input images and classifies each image as either normal or pathological myopia. The best performing CNN model achieved an AUC score of 0.9845, with a best validation loss of 0.1457. The results show that the model can be successfully employed to detect pathological myopia from fundus images.

www.nature.com/scientificreports/

The study by Zhang et al. focuses on identifying a compact feature set that can be used for predicting pathological myopia. It proposes a minimum Redundancy-Maximum Relevance (mRMR) based classification approach, in which the feature set is composed of information extracted from the fundus images and the screening exam. It is concluded that the accuracy obtained using mRMR-based classifiers is higher than that of simply using a support vector machine 5. Jiang Liu et al. focused on learning the most relevant visual features of pathological myopia using machine learning and computer vision techniques. Scale-invariant feature transform (SIFT) features are extracted from the green channel, and K-means clustering is employed to generate a codebook from all the SIFT features extracted from the training images 6. Xiangyu et al. developed a deep learning (DL) architecture with a convolutional neural network for the automatic detection of glaucoma, in which a region of interest (ROI) image is given as input to a deep convolutional neural network (CNN) 7. Jiang Liu et al. also worked on the detection of pathological myopia based on the presence of peripapillary atrophy (PPA); features are generated from a combination of individual and relativistic metrics, and these texture-based features are given to an SVM classifier 8.

Method and experimental setup
Deep learning emerged as a subfield of machine learning. It enables the computer to build complex concepts out of simpler concepts 10. The 'deep' in deep learning refers to the multiple layers of representations 11. A simple deep learning model is composed of one input layer, one or more hidden layers, and one output layer.
Overview of deep learning architecture. Convolutional neural networks, a deep learning model, are also known as "convnets." The name comes from the mathematical operation of "convolution" that they employ. CNNs are mostly used in the field of computer vision, particularly in image classification problems. The key terms in the architecture of the CNN designed in this research are described below: Convolutional layer. In terms of image processing, convolution is nothing more than a simple element-wise matrix multiplication and sum of a filter and an input image. It helps to learn local patterns 11. Patch size, the depth of the output feature map (the number of filters used in the convolution), and stride are the parameters of the convolutional layer. The convolutional operation is

s(t) = (x * w)(t) = Σ_a x(a) w(t − a),

where x, the first argument to the convolution, is called the input; if the input to the CNN is an image, then x constitutes the multidimensional array of pixel values. w, the second argument, is called the kernel; it is the filter with which the multiplication is done. s(t) is the output of the operation, also called a feature map.
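As an illustration (not the authors' code), the element-wise multiply-and-sum described above can be sketched in NumPy; the 4 × 4 input and 2 × 2 filter are hypothetical toy values. Note that, like most deep learning frameworks, this sketch computes cross-correlation (the kernel is not flipped):

```python
import numpy as np

def conv2d(x, w, stride=1):
    """Valid 2-D convolution of input x with kernel w:
    each output value is the element-wise product of the kernel
    and an input patch, summed up."""
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    s = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            s[i, j] = np.sum(patch * w)  # element-wise multiply and sum
    return s

x = np.arange(16.0).reshape(4, 4)   # toy 4x4 "image"
w = np.ones((2, 2))                 # toy 2x2 filter
print(conv2d(x, w).shape)           # (3, 3) feature map
```

A stride of 2 would instead produce a 2 × 2 feature map, showing how stride controls the output resolution.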
Pooling layer. The pooling layer receives the output of the convolutional layer and extracts features from each activation map. There are two main types of pooling: max pooling and average pooling. This work focuses on max pooling, because it is more informative to observe the maximal presence of features than their average presence 11. After each 2 × 2 max pooling operation, the spatial size of the feature map is halved.
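A minimal NumPy sketch of 2 × 2 max pooling (illustrative, not the paper's implementation) shows the halving of the feature map:

```python
import numpy as np

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: each output value is the maximum
    of one size x size window, so spatial dimensions shrink by `size`."""
    h, w = fmap.shape
    out = fmap[:h - h % size, :w - w % size]          # trim ragged edges
    out = out.reshape(h // size, size, w // size, size)
    return out.max(axis=(1, 3))                       # max over each window

fmap = np.array([[1., 3., 2., 0.],
                 [4., 2., 1., 5.],
                 [0., 1., 3., 2.],
                 [2., 0., 1., 4.]])
print(max_pool(fmap))  # [[4. 5.] [2. 4.]] -- 4x4 map halved to 2x2
```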
Fully connected layer. The neurons of the fully connected layer receive input from every neuron in the pooling layer. These incoming inputs are first flattened into a vector, i.e., vectorization. The process of classification starts at this layer 11. The output size of the final fully connected layer equals the number of classes in the classification task 12.
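Flattening and the fully connected computation can be sketched as follows; the 4 × 4 × 8 feature-map shape and the random weights are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
fmap = rng.random((4, 4, 8))          # toy pooled feature maps: 4x4 spatial, 8 channels
x = fmap.flatten()                    # vectorization: a (128,) vector feeds the dense layer

n_classes = 2                         # e.g., normal vs. pathological myopia
W = rng.random((n_classes, x.size))   # dense-layer weight matrix
b = np.zeros(n_classes)
logits = W @ x + b                    # one output per class
print(logits.shape)                   # (2,)
```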
Activation function. The activation function is non-linear. It regulates the firing of a neuron, i.e., whether the incoming information is enough to activate the neuron.
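The two activation functions used later in this work, ReLU and sigmoid, can be sketched in NumPy:

```python
import numpy as np

def relu(z):
    """ReLU: passes positive inputs through, zeroes out negative ones."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Sigmoid: squashes any real input into the range (0, 1),
    which suits a binary (normal vs. pathological) output neuron."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))       # [0. 0. 3.]
print(sigmoid(0.0))  # 0.5
```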
Batch normalization. Normalization makes the various samples seen by a machine-learning model less different from each other, which helps the model learn and generalize better to new data 11. It is usually employed after a fully connected or convolutional layer and its non-linearity. It should be noted that during this process the weights are not changed; only the inputs to each layer are normalized. For a mini-batch B, each input x is normalized and then rescaled as

x̂ = (x − μ_B) / √(σ_B² + ε),  y = γ x̂ + β,

where μ_B and σ_B² are the mini-batch mean and variance, ε is a small constant for numerical stability, and γ and β are learned scale and shift parameters.
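A minimal sketch of the normalization step (with fixed rather than learned γ and β, for illustration):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch feature-wise to zero mean and unit
    variance, then rescale (gamma) and shift (beta)."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.array([[1.0, 200.0],
                  [3.0, 400.0],
                  [5.0, 600.0]])  # 3 samples, 2 features on very different scales
y = batch_norm(batch)
print(y.mean(axis=0))            # ~[0 0]: both features now comparable
print(y.std(axis=0))             # ~[1 1]
```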
Optimizer. The job of an optimizer is to determine how the network gets updated based on the loss function.
A feedback signal is used to adjust the values of the weights in the direction of a lower loss score. This is implemented by the backpropagation algorithm: backpropagation starts with the final value of the loss and works backward from the top layers to the bottom layers, using the chain rule to measure the contribution of each parameter to the loss value 11.
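The update rule can be illustrated with plain gradient descent on a toy one-parameter loss (a deliberately simplified stand-in for the full backpropagation machinery):

```python
# Gradient descent on the toy loss L(w) = (w - 3)^2.
# The "feedback signal" is the gradient dL/dw = 2 * (w - 3);
# the optimizer moves w a small step in the direction of lower loss.
w = 0.0
lr = 0.1                      # learning rate
for _ in range(100):
    grad = 2.0 * (w - 3.0)    # gradient of the loss at the current w
    w -= lr * grad            # step against the gradient
print(round(w, 4))            # converges towards the minimum at w = 3
```

Optimizers such as Adam, used later in this work, refine this basic step with per-parameter adaptive learning rates and momentum.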
Loss function. It is also called the cost function. It is the quantity that the training process attempts to minimize 11. All networks focus on reducing the loss.
Dropout. It is one of the most common and effective regularization techniques. It randomly drops out (sets to zero) a number of the layer's features during training. The dropout rate determines the fraction of the features that will be zeroed out; common settings are between 0.2 and 0.5. It helps reduce overfitting 11. Overfitting refers to the situation where the network performs well on the training data but fails to generalize.
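The dropout mechanism can be sketched in NumPy (an "inverted dropout" variant, as used by common frameworks; the survivors are rescaled so the expected activation is unchanged):

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero a fraction `rate` of the activations at
    training time and scale the survivors by 1 / (1 - rate)."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

rng = np.random.default_rng(42)
acts = np.ones(10000)               # toy layer activations
out = dropout(acts, rate=0.3, rng=rng)
print((out == 0).mean())            # roughly 0.3 of the features are zeroed
```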
Weight regularization. It is employed to minimize overfitting of the model. Overfitting is reduced by putting constraints on the network's complexity, which is achieved by adding a cost associated with having large weights to the loss function. This cost comes in two types: L1 regularization and L2 regularization. The focus here is on L2 regularization, also known as weight decay, in which the added cost is proportional to the square of the value of the weight coefficients 11.
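The L2 penalty added to the loss can be sketched as follows; λ = 0.001 matches the value used later in this work, while the weight vectors are toy examples:

```python
import numpy as np

def l2_penalty(weights, lam=0.001):
    """L2 (weight decay) cost added to the loss: lam * sum(w^2).
    Large weights are penalized quadratically, pushing the network
    towards simpler weight configurations."""
    return lam * np.sum(weights ** 2)

w_small = np.array([0.1, -0.2, 0.1])
w_large = np.array([3.0, -4.0, 5.0])
print(l2_penalty(w_small))  # small weights -> small added cost
print(l2_penalty(w_large))  # 0.05: large weights -> much larger cost
```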
Mini-batch. The whole training data is divided into small, equally sized samples called mini-batches; thus a mini-batch is a subset of the training data. It is specified by the 'batch size'. Mini-batching reduces the computational cost and saves memory.
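A minimal sketch of mini-batching (the batch size of 32 and the 100-sample dataset are illustrative; the final batch may be smaller when the data does not divide evenly):

```python
import numpy as np

def mini_batches(data, batch_size):
    """Split the training data into consecutive mini-batches of
    `batch_size` samples (the last batch may be smaller)."""
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

data = np.arange(100)             # stand-in for 100 training samples
batches = mini_batches(data, 32)
print([len(b) for b in batches])  # [32, 32, 32, 4]
```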

Experimental setup
The experimental setup consists of three main steps, discussed below: Preprocessing phase. (a) Grayscale conversion. The original input images are color images, i.e., RGB format. Figure 1 shows the original image with its three channels, i.e., red, green, and blue. These input images are preprocessed and converted into grayscale. Figure 2 shows the grayscale image.
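Grayscale conversion is a weighted sum of the three channels; a NumPy sketch using the standard ITU-R BT.601 luma weights (the same weighting used by common libraries such as OpenCV; the toy image is hypothetical):

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an RGB image to one channel via the BT.601 luma
    weights: 0.299 R + 0.587 G + 0.114 B."""
    return rgb @ np.array([0.299, 0.587, 0.114])

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))   # toy 4x4 RGB image with values in [0, 1]
gray = to_grayscale(img)
print(gray.shape)             # (4, 4): three channels reduced to one
```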
(b) Resize. The original image size is very large, i.e., 1444 × 1444 × 3. Resizing is necessary to reduce the computational load, train the network faster, and shorten the training time. However, the images should not be resized to a size that loses useful information. Figure 3 shows a resized grayscale fundus image. Adaptive histogram equalization is also employed, as seen in Figures 6 and 7. The image is divided into small blocks known as 'tiles' (the tile size is 8 × 8 in OpenCV by default). To avoid noise amplification, contrast limiting is done: if any histogram bin is above the specified contrast limit (40 by default), those pixels are clipped and distributed uniformly to the other bins before histogram equalization is applied 13.
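The core remapping behind histogram equalization can be sketched in NumPy. This is the plain, global form; CLAHE (as used here, typically via OpenCV's createCLAHE) applies the same idea per tile with the clipping described above. The two-level toy image is illustrative:

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization of an 8-bit grayscale image:
    remap intensities through the normalized cumulative histogram
    so they spread over the full 0-255 range."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()                    # first occupied bin
    cdf = (cdf - cdf_min) / (cdf[-1] - cdf_min)     # normalize to [0, 1]
    return (cdf[img] * 255).astype(np.uint8)

img = np.full((8, 8), 100, dtype=np.uint8)
img[:4] = 110                 # low-contrast image: two nearby gray levels
eq = hist_equalize(img)
print(eq.min(), eq.max())     # 0 255: the levels are stretched apart
```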
(f) Red channel extraction. Visual observation of the fundus images reveals a high content of red color. Based on this observation, the red channel is extracted from the original image and preprocessed. Figure 8 shows the original image with its three channels; Figure 9 shows the extracted red channel.
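Red channel extraction is a simple array slice; a NumPy sketch with a toy image (note that OpenCV's imread returns channels in BGR order, in which case the red channel would be index 2):

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((4, 4, 3))   # toy fundus image, channels in R, G, B order
red = img[:, :, 0]            # slice out the first (red) channel
print(red.shape)              # (4, 4): a single-channel image
```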
Training phase. Figure 10 shows the architecture of the best performing CNN model. In the second step, the CNN model is trained on the preprocessed training input data and evaluated on the validation set, which helps in adjusting the hyperparameters by providing information about network performance. To find the best performing model, hyperparameter optimization is done using TensorBoard with a validation dataset.
Ethical approval. This article does not contain any studies with human participants or animals performed by any of the authors.

Results and analysis
The CNN algorithm is developed in Spyder and executed on an Intel Core i7 quad-core CPU (8 GB RAM) at 2.60 GHz with an NVIDIA GPU (4 GB). Accuracy, loss, and the AUC score are the evaluation metrics. The CNN model is tested with different optimizers, from which the best performing optimizer, Adam, is selected. Various preprocessing techniques are checked. A dropout of 0.3 and L2 regularization of 0.001 are added to the best performing network (Table 1) to reduce overfitting. TensorBoard is used to optimize the designed model with the best AUC score. This is achieved by checking different model capacities, i.e., different numbers of convolutional layers (1, 2, 3), layer sizes (32, 64, 128), and additional dense layers (0, 1, 2; these are added before the last dense layer of each model), giving a total of 27 different models. Figure 11 shows the four models, out of the 27, selected based on their better performance.
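A hedged Keras sketch of the reported best configuration (2C-128N-0D: two convolutional layers of 128 filters, no additional dense layers) with the stated dropout of 0.3, L2 regularization of 0.001, Adam optimizer, and a single sigmoid output. The 128 × 128 × 1 input size, 3 × 3 kernels, and the placement of pooling and batch normalization are illustrative assumptions, not details taken from the paper:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(input_shape=(128, 128, 1)):  # assumed input size
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(128, 3, activation="relu",
                      kernel_regularizer=regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu",
                      kernel_regularizer=regularizers.l2(0.001)),
        layers.BatchNormalization(),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dropout(0.3),                          # reduce overfitting
        layers.Dense(1, activation="sigmoid"),        # normal vs. pathological
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy", keras.metrics.AUC(name="auc")])
    return model

model = build_model()
print(model.output_shape)  # (None, 1)
```

Training would then call model.fit on the preprocessed images with a validation split, as described in the training phase.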
Our proposed method is compared with other state-of-the-art methods in Table 2.

Discussion and conclusion
The aim of this research project is to design a convolutional neural network for the automatic detection of pathological myopia. A simple CNN model is designed using the Spyder platform (Python 3.7). A series of tests is conducted on the designed CNN model, all focused on obtaining the best performing network, i.e., the CNN model that has the lowest validation loss. To this end, several preprocessing steps are added to the tests: resizing of the image, conversion to a grayscale image, extraction of the red channel image, shuffling of the images, histogram equalization, adaptive histogram equalization, and normalization. Different optimizers and activation functions are also tested. Furthermore, batch normalization is added, and different learning rates are checked (the default learning rate performed better). The results of each test are evaluated based on the AUC score, i.e., the area under the curve; Figure 11 shows the performance of the models based on these evaluation metrics. However, the best AUC score does not ensure a perfect model, i.e., one free of overfitting. To overcome the overfitting problem, dropout and L2 regularization are added to the network architecture. TensorBoard is used for network optimization, achieved by analyzing the graphs for different model capacities.
In conclusion, the preprocessing step of red channel extraction together with batch normalization gives the best AUC score. In comparison, neither histogram equalization nor adaptive histogram equalization contributes to improving the AUC score. The Adam optimizer and the ReLU and sigmoid activation functions showed the best performance for the selected CNN model. Overfitting is significantly reduced by adding dropout and L2 regularization. The best performing CNN model is 2C-128N-0D, i.e., two convolutional layers, a layer size of 128, and no additional dense layers. Figure 10 shows the architecture of this model. The proposed classification model is trained from scratch; it does not depend on transfer learning like Cefas Rodrigues Freire et al. 15 and does not use data augmentation like Jaydeep Devda et al. 16. The validation loss of this CNN model is 0.1457. The results of the proposed model are compared with other state-of-the-art techniques in Table 2; the comparison shows that our proposed CNN model performs better than the rest, based on the accuracy and AUC values.
The prime focus of this research is to devise an efficient and accurate diagnostic method for pathological myopia detection. This study could be extended to the diagnostic study of other eye-related complications, e.g., glaucoma, retinopathy, etc. Furthermore, it can be used in deep learning research to compare the efficiency of this technique against several others for this condition.