Deploying efficient net batch normalizations (BNs) for grading diabetic retinopathy severity levels from fundus images

Diabetic retinopathy (DR) is one of the leading causes of blindness worldwide. Early diagnosis and treatment of DR can be achieved by organizing large, regular screening programs. Even so, it is difficult to detect DR in time, because the disease may show no signs in its early stages. Owing to the drastic increase in the number of diabetic patients, there is an urgent need for efficient DR detection systems. Earlier deep learning (DL) approaches to DR classification relied on auto-encoders, sparse coding, and restricted Boltzmann machines. Convolutional neural networks (CNNs) have since been identified as a promising solution for detecting and classifying DR. We employ the deep learning capabilities of pre-trained EfficientNet BN models to automatically acquire discriminative features from fundus images. In this paper, we improved the accuracy and F1 score of the pre-trained EfficientNet BN models on the EYE-PACS dataset by applying a Gaussian smooth filter and data augmentation transforms. We achieved F1 scores above 80% for all EfficientNet BNs on the EYE-PACS dataset (and also calculated F1 scores for DeepDRiD, another dataset), and the results are better than those of previous studies. Using our proposed technique, we achieved F1 scores of 84% and 87% for EYE-PACS and DeepDRiD, respectively.

normalization and standardization, followed by the application of a CNN classifier to determine the severity grade of DR. Sai Venkatesh Chilukoti et al.13 deployed ResNet, VGG, and EfficientNet BN (B0-B6) models. They evaluated their results using the F1 and quadratic weighted kappa scores, which are suitable for grading the classes according to severity, and obtained their best F1 and quadratic weighted kappa scores with EfficientNet B3. In their paper, the ResNet and VGG models yielded the lowest F1 scores and quadratic weighted kappa scores of 0. Our proposed work evaluates the F1 score of EfficientNet BNs (B0-B6) on two datasets. We improved the F1 score for all EfficientNet BNs; only EfficientNet B3 retains the same score on the EYE-PACS dataset as in the previous work. We also evaluated, for the first time, the F1 score obtained by applying EfficientNet BNs to DeepDRiD. Gao Jinfeng et al.14 used an ensemble of two deep CNN models to detect all DR stages on balanced and unbalanced datasets. The outcome demonstrated that their models outperform more sophisticated techniques on the Kaggle datasets in detecting all stages of DR. Sheikh et al.15 employed a novel approach by implementing a light-weight mobile network and assessing the effectiveness of their classifier. MobileNetV2 was constructed as a light-weight, mobile-friendly architecture and trained on datasets of diabetic retinal fundus images. Yi et al.16 applied RA-EfficientNet for 2-grade and 5-grade classification on the APTOS 2019 dataset; in their model, the RA block is best at distinguishing the lesion features of DR images. A new method for binary and multiclass classification based on the APTOS (2019) blindness detection and Messidor-II datasets was announced by Nahiduzzaman et al.
17. Initially, the data were preprocessed by applying Ben Graham's technique. Contrast-limited adaptive histogram equalization (CLAHE) was used to obtain contrast-enhanced image data with the least amount of noise. After that, a new hybrid CNN model with singular value decomposition proved useful for reducing the classifier input. To reduce training time, an extreme learning machine (ELM) was employed as the classifier. Their approach focused on accuracy, precision, recall, and F1 score, showing the potential of a future DR detection strategy. Khan et al.18 applied deep-learning-based ensemble techniques for diabetic retinopathy identification. They made structural modifications to a standard CNN to increase the effectiveness and precision of grading the DR classes in color fundus images. They worked on an imbalanced Kaggle dataset to assess the performance of their deployed model; the results showed that the applied model did not achieve high accuracy scores19,20. Recently, there have been several notable research studies in the field of biomedical applications. One study by Zeng et al.22 explored the use of a metal-organic framework to enhance the signal of hyperpolarized xenon nuclear magnetic resonance. Another study by Ye et al.23 analyzed the effectiveness of vitrectomy in treating highly myopic macular holes. Ao et al.24 focused on stimulated Raman scattering microscopy, while Wang et al.25 discussed a simple low-light image enhancement technique based on the Weber-Fechner law. Wang et al.26

investigated an injectable polyzwitterionic lubricant for preventing cardiac adhesion.
There have also been several research studies in the field of medical imaging. Zhang et al.27 and Liu et al.28 developed algorithms for endoscope imaging and analyzed applications of large CT image sequences. In other recent research, Zhuang et al.29,30 improved image quality assessment for large lung CT images. Other studies, represented by references 31-33, focused on different aspects of image quality assessment.
In addition to imaging, there have been studies on biomedical signal processing and clinical evaluation34-36. Dual-microphone active noise cancellation was explored in one study37, while other studies delved into a fatigue feature detection method based on multifractal theory, 3D scanning, and 3D object detection networks38-40.

Materials and methods
Our proposed framework includes the following phases: separating images into labeled folders; loading the data and splitting it into training, validation, and test sets; preprocessing; data augmentation; and classification (as shown in Fig. 1).

Description of datasets
Our work presents a prediction technique for grading DR with the help of retinal fundus images. The DeepDRiD and Kaggle EYE-PACS datasets, both of which are freely accessible online, are used. In these two datasets, the retinal images were originally collected from subjects under the supervision of various medical institutions, and they cover both the left and right eyes of each subject (participant). The Ethics Committee at Shanghai Sixth People's Hospital approved the DeepDRiD dataset, and the study followed the Declaration of Helsinki. Participants provided informed consent. The trial was entered into the Chinese Clinical Trials Registry (ChiCTR.org.cn) under the identifier ChiCTR200003118441. The EYE-PACS images were taken at several primary care facilities in California and elsewhere, using a variety of equipment and under a variety of circumstances. It is therefore a heterogeneous dataset made up of images from various smaller datasets, taken with different cameras, settings, and sizes, and under different lighting and brightness conditions. Following the Early Treatment Diabetic Retinopathy Study (ETDRS) scale, a physician was asked to assign a score of 0-4 to each image based on the presence of diabetic retinopathy (DR)2. This dataset is openly available as a zip file on the Kaggle site42. The total numbers of labeled fundus images in the DeepDRiD and Kaggle datasets are 1600 and 35,108, respectively; compared to Kaggle EYE-PACS, DeepDRiD is a limited database. Both databases feature five DR stages, represented by the numbers 0, 1, 2, 3, and 4: normal, mild, moderate, severe, and proliferative. Figure 2 displays fundus images of these five stages. The stages are classified on the basis of lesions, as shown in Table 1. All image data are distributed into 5 folders, each holding single-label graded images. The data distribution is displayed in Table 2.

Separation into folders
DeepDRiD groups the images of all grades into two folders: a training folder with sub-folders for 300 patients and a validation folder with sub-folders for 100 patients. Two labeled CSV (comma-separated values) files accompany these folders. Each sub-folder contains four eye images (2 left and 2 right). The Kaggle dataset likewise has one folder, holding 35,108 fundus images, and a single CSV file. We used code to divide the single image folder into 5 separate label folders with the help of the given CSV file. Code can distribute all fundus images into label folders in a very short time, whereas doing so manually would take far too long, with no guarantee that the distribution into the folders is correct.
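The folder-separation step described above can be sketched as follows. The CSV column layout (image name, grade) and the `.jpeg` extension are assumptions for illustration, not the exact layout of the datasets' files.

```python
import csv
import shutil
from pathlib import Path

def rows_to_label_map(rows):
    """Map each image name to its DR grade (0-4) from (name, level) CSV rows."""
    return {name: int(level) for name, level in rows}

def split_into_label_folders(csv_path, image_dir, out_dir, ext=".jpeg"):
    """Copy each image into out_dir/<grade>/ according to the CSV labels."""
    image_dir, out_dir = Path(image_dir), Path(out_dir)
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                          # skip the header row
        labels = rows_to_label_map(reader)
    for grade in range(5):                    # one folder per DR grade 0-4
        (out_dir / str(grade)).mkdir(parents=True, exist_ok=True)
    for name, grade in labels.items():
        src = image_dir / f"{name}{ext}"
        if src.exists():
            shutil.copy(src, out_dir / str(grade) / src.name)
```

The same mapping logic serves both datasets; only the CSV path and image directory change.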

Data loading
The data are arranged in a directory, and the folder paths are specified in the code. First, the dataset files are read and some statistics displayed. To train the classification model, the dataset's five folders are tagged as normal (DR-0), mild DR (DR-1), moderate DR (DR-2), severe DR (DR-3), and PDR (DR-4). Viewable images from the dataset were randomly selected. Dataloaders are then created for the training, testing, and validation datasets, as can be seen in Fig. 3.
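The random split into training, validation, and test subsets can be sketched as follows; the split fractions and seed here are illustrative, not the exact proportions used in our experiments.

```python
import random

def split_dataset(items, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle items reproducibly and split into train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)        # fixed seed for reproducibility
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]            # remainder goes to the test set
    return train, val, test
```

Each subset is then wrapped in its own dataloader for training, validation, and testing.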

Preprocessing
Preprocessing is the procedure applied to the data images before they are used for neural network training. In preprocessing we resize all images to the same size, assess the distribution among the classes, and examine the visual quality of all classes. The preprocessing transforms are defined before being added to the data loaders. The Compose transform is used to chain a series of callables into a sequence; each transform in the series takes a single argument and returns a single value. Our proposed model uses a few simple preprocessing techniques, applying the transformations described below.

Crop foreground
We applied the CropForeground transform to manipulate the images. It isolates the foreground, treating all pixels elsewhere as black, which allows us to crop the image so that no fully black rows remain above or below the foreground and no fully black columns remain to its left or right.
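The effect of the CropForeground transform can be illustrated with a minimal pure-Python sketch that trims fully black rows and columns from a 2-D intensity image (the real pipeline operates on RGB image tensors):

```python
def crop_black_borders(image, threshold=0):
    """Remove fully black rows/columns around the foreground of a 2-D image."""
    # Indices of rows/columns containing at least one above-threshold pixel
    rows = [i for i, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [j for j in range(len(image[0]))
            if any(row[j] > threshold for row in image)]
    if not rows or not cols:
        return image                          # image is entirely background
    # Keep only the bounding box of the foreground
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]
```

Cropping the black borders removes uninformative pixels before resizing, so the retina fills more of the network's input.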

Resize
We use two different datasets, which contain different numbers of fundus images at different resolutions. In the Kaggle EYE-PACS dataset the images have a resolution of (1024, 1024); because it is a huge dataset, we resized the images to (512, 512) to speed up training. The DeepDRiD images have a resolution of (1956 × 1934) and were likewise resized to (512, 512). We did not use overly complex preprocessing methods; resizing the image data is the main transform. After resizing the images, we observed a speedy and prominent improvement in training.

Gaussian smooth
There is noise in the image data, which we suppressed by applying a filter. During preprocessing we used the Gaussian smooth filter, a low-pass filter applied to reduce noise (high-frequency components and distorted areas of the images). It is applied as an odd-sized symmetric kernel that is slid over the individual pixels of the region of interest (ROI) to produce the desired effect. Gaussian filters are commonly isotropic, that is, they have the same standard deviation along both dimensions. This filter therefore removes extra noise from the image data, and after using it we observed a prominent effect on the evaluation scores.
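A Gaussian smooth filter is built from a normalized, odd-sized symmetric kernel; a minimal sketch of the 1-D case follows (because the filter is isotropic and separable, the 2-D kernel is the outer product of this kernel with itself):

```python
import math

def gaussian_kernel_1d(sigma, radius=None):
    """Build a normalized, odd-sized, symmetric 1-D Gaussian kernel."""
    if radius is None:
        radius = max(1, int(3 * sigma))       # 3-sigma covers ~99.7% of the mass
    kernel = [math.exp(-(x * x) / (2 * sigma * sigma))
              for x in range(-radius, radius + 1)]
    total = sum(kernel)
    return [k / total for k in kernel]        # normalize so weights sum to 1
```

Convolving each row and then each column of the image with this kernel applies the full isotropic 2-D smoothing.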

Data augmentation
DL mostly faces the problem that the data images have a fixed size. To obtain better generalization, the model requires more data, with as much variation in the images as possible. Occasionally the dataset does not contain adequate variation; in such scenarios we need to create additional data from the existing dataset, and augmented data can play a significant role. This technique artificially grows the size of the training set by generating modified data from the original. Data augmentation transforms are applied to each grade to avoid class imbalance. The technique helps prevent overfitting, enhances the model's performance, and decreases the number of false positives.
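A minimal sketch of the kind of geometric transforms used for augmentation, shown on a 2-D list image for illustration; in practice such transforms are applied randomly inside the data loader:

```python
def horizontal_flip(image):
    """Mirror a 2-D image left-to-right."""
    return [row[::-1] for row in image]

def rotate_90(image):
    """Rotate a 2-D image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def augment(image):
    """Return simple geometric variants of one image (original included)."""
    return [image, horizontal_flip(image), rotate_90(image)]
```

Applying such variants per grade multiplies the number of samples for under-represented classes without collecting new images.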

Network structure
Numerous CNN models are used to classify the DR classes, and EfficientNet appears to have performed admirably. This is because, in contrast to techniques that arbitrarily scale depth, height, and width, a constructive scaling approach is used in the EfficientNet architecture to increase the resolution, depth, and width of the network. EfficientNet thus achieves high accuracy scores with fewer parameters than other models. Mingxing Tan and Quoc V. Le of the Google Research Brain team introduced the EfficientNet classification model. To create a network that jointly optimizes accuracy and efficiency measured in FLOPS (floating point operations per second), they first performed a neural architecture search; the resulting classification model's design uses the mobile inverted bottleneck convolution (MBConv). The researchers then expanded this fundamental network to create the EfficientNet family of deep learning classifiers. Since model scaling does not change the layer operators of the baseline network, having a good baseline network is similarly critical: the effectiveness of network scaling is greatly influenced by the initial network used. The baseline model, called EfficientNet-B0, follows the AutoML MNAS framework, which aims to optimize both accuracy and efficiency (FLOPS). The basic network is then scaled up using the compound scaling principle to create the family of EfficientNets, from EfficientNet-B1 through EfficientNet-B7. The EfficientNet-B0 algorithm used in this study requires a minimum input size of 224 × 224, whereas the image size in the RMFD is smaller than 200 × 200 pixels21. The EfficientNet (BN) models' architecture is shown in the flow diagram in Fig. 4.
The EfficientNet (BN) model parameters and FLOPS used to train our datasets are listed in Table 3.
Table 3 describes the parameters of the model batch normalizations B0 to B6. In the previous study, only EfficientNet B3 gave the highest score, and its architecture is described in that proposed work. According to the authors, EfficientNet B3 develops a feature map with dimensions of 7 × 7 × 1536; the resulting feature vector of dimension 1536 is passed to a fully connected layer, giving a feature vector of size 512. This passes to a dropout layer with a rate of 0.5, followed by a rectified linear unit (ReLU) activation layer, then to a fully connected layer with output size 512, which feeds a dropout layer with a rate of 0.25 and a further ReLU activation layer. Finally, the output of the ReLU activation layer is given to a fully connected layer with five units, equal to the number of output classes. We deployed the same pre-trained EfficientNet BN models as the previous work13. The only difference between their work and ours lies in the preprocessing and data augmentation techniques, through which all of our EfficientNet B0 to B6 models achieved scores above 80% in all metrics, which is state-of-the-art.
EfficientNet BN models are renowned for their compact size. With fewer parameters than popular deep learning architectures such as ResNet, Inception, and VGG, they deliver competitive performance. Because they contain fewer parameters, EfficientNet models use less memory during training and inference, and they frequently exhibit lower FLOPs while maintaining similar or greater performance than alternative deep learning architectures. This decreased computational complexity makes EfficientNet models well suited for real-time applications and for deployment on devices with limited resources, such as mobile devices. EfficientNet models offer a fair balance of accuracy and computational efficiency: they usually outperform earlier designs in accuracy while using less computational power than larger state-of-the-art models. Several model sizes (B0 to B6) are available from the EfficientNet family to match computational capabilities and task requirements, so practitioners can make informed decisions based on their particular demands.
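The compound scaling principle mentioned above can be made concrete with a small sketch using the coefficients reported in the original EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15), chosen so that total FLOPS roughly double for each unit increase of the compound coefficient φ:

```python
# Compound scaling (Tan & Le): depth = alpha**phi, width = beta**phi,
# resolution = gamma**phi, under the constraint alpha * beta**2 * gamma**2 ~= 2.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# FLOPS scale roughly as depth * width**2 * resolution**2, so this factor
# is the per-step FLOPS growth; the coefficients were searched to keep it near 2.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2
```

Larger φ values yield the larger family members (B1 through B7), trading computation for accuracy.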

Training
EfficientNet B0-B6 models were trained on our datasets with the following hyper-parameters. Training used the Adam optimizer with a learning rate of 0.0001. We divided the EYE-PACS data into three sets for training, testing, and validation: 24,576 fundus images make up the training set, while 5266 images each were included in the test and validation sets. For DeepDRiD, 1408 retinal fundus images form the training data, while the test and validation sets have 96 images each. For EYE-PACS and DeepDRiD, the network weights were randomly initialized, with batch sizes ranging from 30 down to 12. We then trained the network for 30 epochs on the EYE-PACS dataset and for 100 epochs on the DeepDRiD dataset. The objective function was categorical cross-entropy. These hyper-parameters are given in Table 4.
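The categorical cross-entropy objective can be illustrated with a minimal sketch: a numerically stable softmax over the model's raw outputs, followed by the negative log-likelihood of the true class.

```python
import math

def softmax(logits):
    """Convert raw model outputs (logits) into class probabilities."""
    m = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(probs, target_index):
    """Negative log-likelihood of the true class under predicted probabilities."""
    return -math.log(probs[target_index])
```

For five equally likely classes the loss is log 5 ≈ 1.609, the value an untrained 5-class DR model should start near.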

Results
By deploying the EfficientNet BNs, we conducted a complete survey of the performance of the proposed strategy for grading DR. The EYE-PACS and DeepDRiD datasets were divided into training and testing sets, to which the augmentation techniques were then applied.

The system configurations
The EfficientNet BN models were implemented in the MONAI open-source framework. The output channels of the models correspond to the five classes (DR_0, DR_1, DR_2, DR_3, and DR_4). We used cross-entropy loss as the loss function. A system with the parameters listed was utilized for the methodology.

The performance metrics
We employed common, well-known metrics to evaluate the performance of our suggested technique: accuracy (Acc), precision, recall, and F1 score. The following terms describe how the metrics are calculated.
True positive (TP) indicates a retinopathy class that was correctly predicted. False positive (FP) denotes a label incorrectly predicted as a retinopathy class. True negative (TN) is a predicted label that is genuinely non-retinopathy, and false negative (FN) is a label mislabeled as non-retinopathy. All metrics used in evaluating the scores are described below with formulas9,11,21.

Precision
Precision is the ratio of true positives to all elements assigned to the positive class (i.e., the sum of true positives (TP) and false positives (FP)). It is also known as the positive predictive value (PPV). The equation for precision is:

Precision = TP / (TP + FP).

Recall
Recall is the proportion of true positives among all elements actually belonging to the positive class (i.e., the sum of TP and FN):

Recall = TP / (TP + FN).

AUC
The AUC is the area under the curve, obtained by evaluating a definite integral between two points.

Accuracy
Accuracy indicates the percentage of correct predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN).

F1 score
The F1 score is the harmonic mean of precision and recall:

(1) F1 score = 2 × (precision × recall) / (precision + recall).
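The four metrics can be computed directly from the TP, FP, TN, and FN counts; a minimal sketch:

```python
def precision(tp, fp):
    """Positive predictive value: TP / (TP + FP)."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Sensitivity: TP / (TP + FN)."""
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```

For multi-class DR grading, these per-class values are averaged across the five grades to obtain the reported scores.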

Result evaluation
F1 scores were calculated for the EfficientNet BN classification models. The EfficientNet models could distinguish all classes, and the EfficientNet BNs identified all classes with high probability; the results were then evaluated on four metrics. We discuss three tables giving the precision, recall, accuracy (Acc), and F1 scores for all the EfficientNet BN models. Table 5 shows the results for these four metrics from the previous work13, on which our work builds; it also used the EYE-PACS dataset. Table 6 reports the same four metrics for our work on the EYE-PACS dataset.
We also evaluated the metrics on another dataset (DeepDRiD), downloaded from the DeepDRiD online challenge of 2020, and observed prominent improvements in the results, shown in Table 7.
We compared the F1 scores of the previous work13 and our proposed work, and added the execution time per image, in Table 8.
The previous work in Table 5 utilized the Kaggle EYE-PACS dataset and deployed the EfficientNet B3 model, which was capable of detecting the stages of DR with an F1 score of 0.8413. Table 6 displays our proposed work using the EYE-PACS dataset with all EfficientNet B0-B6 models, which detect all DR stages with F1 scores above 80%, whereas only EfficientNet B3 had achieved an 84% F1 score in the previous work on the same dataset. We also deployed these models on DeepDRiD and evaluated the F1 score for the first time, as displayed in Table 7. In the comparison in Table 8, we observe that our proposed studies show promising improvements for EfficientNet B0, B1, B2, B4, B5, and B6 on EYE-PACS, with even better metric scores on the DeepDRiD dataset, for which the F1 score was evaluated for the very first time with these models.
The highest F1 scores of all classes of DR in both datasets are shown below in Fig. 5a,b.
The proposed model displayed greater accuracy for identifying and grading diabetic retinopathy on the DeepDRiD and Kaggle EYE-PACS datasets. In Fig. 6a,b, the epoch average loss and validation AUC curves of DeepDRiD and EYE-PACS are drawn.

Discussion
Scientists are conducting broad investigations to characterize the DR classes. A significant reason for selecting this disease for research is that it is frequent in China, the USA, and India, and has recently begun to appear widely in our country. In future research, we will work on designing an effective DR classification system in which numerous models can be combined with a web-based interface for clinical use.
EfficientNet BNs were specifically created to retain high accuracy while balancing model size and computational cost; as a result, they can attain comparable performance with fewer parameters than other models. The architectures of other deep learning models, such as VGG, ResNet, and Inception, differ and may not be as well suited for efficient computing. The basic motivation for using these classification models is to detect all stages of diabetic retinopathy. Because EfficientNet BNs are built to reduce computational complexity, training and inference proceed more quickly than with other deep learning models. Although EfficientNet models are intended as flexible and effective general-purpose image classifiers and were not specifically developed to diagnose stages of diabetic retinopathy (DR), their traits may help them identify different stages of DR. Some distinctive characteristics of EfficientNet models that can aid DR diagnosis at various stages are:
(1) These models are made to capture properties at various levels of abstraction in an adaptable manner. (2) EfficientNet's compound scaling optimizes the model's depth, width, and resolution; this optimization ensures that the model can catch the small features present in the images, which are crucial for spotting early DR.
We applied a few simple preprocessing techniques, resized the datasets' high-resolution images, and used a Gaussian smooth filter, implementing EfficientNet models to enhance performance and classify every stage. While we achieved scores above 80% by applying the EfficientNet BN models, previous work deploying EfficientNet BN models on the EYE-PACS dataset only acquired a 0.84 F1 score, with EfficientNet B3. Our models raised the F1 score on the EYE-PACS Kaggle dataset employed in earlier research13, and the DeepDRiD 2020 dataset that we used yielded a higher F1 score than the Kaggle dataset. In the EYE-PACS dataset all of our classes have scores; in DeepDRiD, the majority of our classes score above 78%, and one of our disease classes was detected with a score of 100%. A few researchers had previously worked with limited data on diabetic retinopathy, but we found no work in the literature evaluating F1 scores with EfficientNet models on the small DeepDRiD dataset. We therefore examined the small DeepDRiD dataset while simultaneously using the older Kaggle dataset in our investigations, and we achieved the top score on the different, smaller DeepDRiD dataset. With F1 scores of 0.87 and 0.84, the EfficientNet B5 and B6 models detected all types of DR on the DeepDRiD and EYE-PACS datasets, respectively; on the DeepDRiD dataset we attained the greatest F1 score of 0.87. Our research thus offers scores for smaller datasets like DeepDRiD in addition to state-of-the-art (S-O-T-A) outcomes for huge datasets like EYE-PACS. One disadvantage is that for datasets larger than 6 GB, these CNN models cannot be trained on the available GPU.
Due to its extensive parameters compared to the other EfficientNet BNs used in our work, we were also unable to train the EYE-PACS data with EfficientNet B7, given the limited availability of GPU resources. The algorithms have a number of shortcomings that prevent them from classifying datasets at high image resolution on Google Colab, including extraordinarily high computational costs, the lack of complex preprocessing and data augmentation techniques, and the absence of additional features in the models. These datasets may also not fully represent the whole population because of biases against specific groups, which could limit how well the model works across different population groups. If the datasets span a sizable period of time, changes in imaging technology or clinical practices may introduce temporal bias, and it may be difficult to apply models trained on old data to cases from more recent periods.
Other drawbacks include the ethical and legal considerations raised by deploying automated classification algorithms in healthcare contexts. Clinicians may rely extensively on these models, thereby diminishing their own diagnostic and decision-making abilities, and liability problems may arise if a misclassification results in negative patient outcomes. In conclusion, while machine-learning-based classification of diabetic retinopathy offers promise, it is critical to recognize and address these possible disadvantages to ensure that the technology is used ethically and efficiently in clinical settings. The constraints can be overcome by validating the models' performance via ensemble models, k-fold cross-validation, future modifications to model topologies, and other cutting-edge structural designs such as CoAtNet. We may also share data repositories to help artificial intelligence (AI) systems in the future by enabling collaborative learning and the development of more robust models. In essence, the dynamic nature of the medical industry creates the groundwork for AI to redefine diabetic retinopathy diagnosis. These advancements provide greater accuracy, faster analysis, and personalized therapy, ensuring that diabetic patients receive earlier treatment and better outcomes, ushering in a new era of precision medicine. Researchers are encouraged to explore the challenges associated with deploying deep learning (DL) models for diabetic retinopathy (DR) classification in real-world clinical settings, encompassing critical aspects such as data privacy concerns, regulatory compliance, and seamless integration with electronic health records (EHRs). By focusing on these specific research directions, emerging scholars have the potential to make substantial contributions to the field of diabetic retinopathy classification, contributions that hold the promise of significantly enhancing the diagnosis, treatment, and overall management of this
sight-threatening medical condition.

Conclusion
To date, only a very limited amount of work has succeeded in separating all diabetic retinopathy grades with a high F1 score. We used EfficientNet BN models in our paper. It has been noted that grade 0 of DR is recognized more frequently by other CNN models. Earlier research found that the ResNet and VGG models, which were scaled only by their depth, were unable to capture the image features that distinguish the classes. The previously used EfficientNet models consistently scale the depth, width, and resolution using the compound scaling technique; this is the primary factor allowing the EfficientNet BN models to distinguish more than one class of DR13. Future improvements will be needed, and these can be made by applying advanced preprocessing and data augmentation techniques, using images with the highest resolution possible, applying an ensemble of multiple CNN classification models, or using a model similar to this one but with additional features, to achieve the best possible outcome.

Figure 1 .
Figure 1. Our framework with all phases.

Figure 4 .
Figure 4. The architecture of the EfficientNet model.

Figure 6 .
Figure 6. (a) Plot of the epoch average loss and validation AUC for the EYE-PACS dataset. (b) Plot of the epoch average loss and validation AUC for the DeepDRiD dataset.

Table 1 .
DR stages on the basis of lesions.

Table 3 .
Parameters and FLOPS of the EfficientNet BNs.

Table 5 .
Evaluated metric scores for all the EfficientNet BNs of the previous work.

Table 6 .
Evaluated metric scores for all the EfficientNet BNs on EYE-PACS.

Table 7 .
Evaluated metric scores for all the EfficientNet BNs on DeepDRiD.

Table 8 .
Comparison of F1 scores of previous and our work.