Assessment of image quality on color fundus retinal images using the automatic retinal image analysis

Image quality assessment is essential for retinopathy detection on color fundus retinal image. However, most studies focused on the classification of good and poor quality without considering the different types of poor quality. This study developed an automatic retinal image analysis (ARIA) method, incorporating transfer net ResNet50 deep network with the automatic features generation approach to automatically assess image quality, and distinguish eye-abnormality-associated-poor-quality from artefact-associated-poor-quality on color fundus retinal images. A total of 2434 retinal images, including 1439 good quality and 995 poor quality (483 eye-abnormality-associated-poor-quality and 512 artefact-associated-poor-quality), were used for training, testing, and 10-ford cross-validation. We also analyzed the external validation with the clinical diagnosis of eye abnormality as the reference standard to evaluate the performance of the method. The sensitivity, specificity, and accuracy for testing good quality against poor quality were 98.0%, 99.1%, and 98.6%, and for differentiating between eye-abnormality-associated-poor-quality and artefact-associated-poor-quality were 92.2%, 93.8%, and 93.0%, respectively. In external validation, our method achieved an area under the ROC curve of 0.997 for the overall quality classification and 0.915 for the classification of two types of poor quality. The proposed approach, ARIA, showed good performance in testing, 10-fold cross validation and external validation. This study provides a novel angle for image quality screening based on the different poor quality types and corresponding dealing methods. It suggested that the ARIA can be used as a screening tool in the preliminary stage of retinopathy grading by telemedicine or artificial intelligence analysis.

Image quality, in real-world settings, is a significant aspect for diagnosis because the proportion of poor-quality images has been reported to reach up to 19.7% in non-mydriatic retinal photography 1 . Poor images are the main reason for decreasing the accuracy of retinopathy detection 2 . It has been investigated for decades to improve the automatic mage quality assessment, such as clarity assessment techniques including spatial techniques [3][4][5][6][7][8][9][10][11] and wavelet transform (WT) techniques [12][13][14][15][16] . However, despite low computational complexity, traditional methods requiring human intervention only can identify some characteristics of image quality and have poor generalization to other datasets. Recently, some research focused on using the deep learning (DL) approach to assess image quality on color fundus retinal images. As the most popular deep learning architecture, a deep convolutional neural network (CNN) can automatically identify and extract hidden or latent features inherent in the input images with no need to define hand-crafted features 17 , and shows superior performance than traditional machine learning methods. The main studies about CNN method are summarized in the following paragraphs and Table 1.
In 2016, Mahapatra et al. 18 combined unsupervised information from saliency maps and supervised information from CNN with 5 convolution layers to assess image quality. A dataset from the diabetic retinopathy (DR) screening initiative with 9653 ungradable retinal images and 11,347 gradable images was used and graded by human graders. In 2017, Yu et al. 19 used a dataset including the training set with 3000 images and the test set with 2200 images from the Kaggle database labelled by the professionals. They extract CNN-based features as well as the unsupervised saliency map-based features and fuse them. In 2017, Saha et al. 20  www.nature.com/scientificreports/ framework using 3428 images (3179 accept, 249 reject, and 147 ambiguous), and the rest 3425 images (3302 accept and 123 reject) were used for evaluation. All the images were labelled by three retinal image analysis experts including one ophthalmologist from the EyePACS. They defined about 2% of the images of observation discrepancy between graders as 'ambiguous' category and found that they confuse the network and decrease overall performance when used for training. However, these three studies did not validate the algorithm on other datasets. In 2018, Zago et al. 21 used a pre-trained deep neural network (Inception-v3) and selected two public databases, namely, DRIMDB(216 retinal images divided into 125 good, 69 poor, and outlier classes) and ELSA-Brasil (842 retinal images divided into 817 good and 25 poor). The intra-database cross-validation and inter-database cross-validation (train from a database and test from another database) were used to evaluate the algorithm's performance.
In 2019, Chalakkala et al. 22 proposed a deep learning-based approach and tested on seven public databases. They used images from one database (DR1) as the training set and the images from all the other databases as the validation set. The training set consisted of 1500 images (800 medically suitable retinal images (MSRI) highquality and 700 low-quality), and the validation set consisted of 7007 images (5009 MSRI high-quality and 1998 low quality). The classification of image quality was separately labelled by different graders in each database.
In 2020, Shen et al. 23 proposed a novel multi-task deep learning framework, which can evaluate the overall image quality with quality factor analysis in terms of artefact, clarity, and field definition. It has three modules: (1) multi-task multi-factor image quality assessment; (2) landmarks (optic disc and fovea) detection; (3) semitied adversarial discriminative domain adaptation. Dataset in this study including 18,653 retinal images (10,000 for training and 8653 for testing) from Shanghai Diabetic Retinopathy Screening Program (SDRSP) labelled by three ophthalmologists. The training dataset and testing dataset were from different districts in Shanghai.
In 2021, Yuen et al. 24 developed two DL-based algorithms (EfficientNet-B0 and MobileNet-V2) using 21,348 retinal photographs (14,422 in the primary dataset and 6926 in the external dataset). In addition, one ophthalmologist and two medical students labelled images targeting image quality, a field of view, and the laterality of the eye. For image quality, they achieved AUC of 0.975, 0.999, and 0.987 in the internal dataset and external 1 and 2 datasets, respectively.
In these studies, images were divided into predefined categories, usually binary classification according to whether retinal images can be used for assessment. However, among images with poor quality, except for technical failures (e.g., small pupil and wrong position of patients), some eye diseases in the late stage (e.g., mature cataract) are also noticeable causes of poor image quality 25 .
Some other studies using the automatic method in detecting retinopathy also considered the influence of image quality. Most of them manually excluded images with poor quality 2,[26][27][28][29][30] or created a classification model to select images with good quality 31 . However, an eye disease that can make images ungradable is usually very severe in clinic, such as central retinal vein occlusion (CRVO), or in the late stage, such as mature cataract and proliferative diabetic retinopathy (PDR). Grouping them as poor quality images will result in misdiagnosis for retinopathy and prompting treatment. Therefore, those images need to be detected in the image quality assessment stage and then suggest patients with those images be referred to ophthalmologists. Only a few studies focused on this issue. Among them, two studies put all the ungradable images into the referral category 32,33 . Another study suggested adding the visual acuity assessment into retinal analysis in DR screening, and those with reduced vision should be given a referral 34 . Therefore, if automatic image quality assessment includes www.nature.com/scientificreports/ differentiation between eye-abnormality-associated-poor-quality and artefact-associated-poor-quality, it would neither overlook the eye diseases with the need of a referral, nor lead to the waste of medical resources for additional examinations and invalid referral.
On the other hand, in terms of technique, images with eye-abnormality-associated-poor-quality should also be detected. There are two options to make images with poor quality gradable: 1. A recapture of images 23,35 . 2. Using artificial methods to improve image quality 36 .
However, neither of them is effective for quality improvement of eye-abnormality-associate-poor-quality. For the first option, recapturing images can only fix the technical problem of photography rather than changing the ocular structure. Thus, the repeated image acquisition for eye abnormality-associate-poor-quality leads to a waste of time, manpower, and medical resource. For the second option, even though researchers have successfully investigated and developed artificial methods to eliminate artefacts, they found that images with eye-abnormality-associated-poor-quality (e.g., asteroid hyalosis) could not be improved by those methods and remained ungradable 36 .
To the best of our knowledge, no study focuses on the automatic image analysis method in identifying eye abnormalities that contribute to the poor quality of the retinal image. In previous work, we developed an automatic retinal imaging analysis (ARIA) approach to assess the risks of stroke and cardiovascular disease (CVD) and identify lesion patterns in retinopathy [37][38][39][40] . It showed great performance in detecting patterns and extracting the data mainly through fractal analysis, high order spectra analysis and statistical texture analysis. This study aims to train and validate ARIA to automatically assess image quality and distinguish eye-abnormality-associated-poor-quality from artefact-associated-poor-quality on color fundus retinal images.

Method
Primary dataset. Kaggle database, a publicly available database, was used for training and testing. The images in the dataset come from different types of cameras under various illumination. This data set contains 35,126 training images and 53,576 testing images, and many uninterpretable images with poor quality under different circumstances. Although most of the images were derived from the Kaggle database, we added a subgroup of 118 images with CRVO collected from Zhongshan hospital of Fudan University using TOPCON TRC-50DC Retinal Camera into the primary dataset because the Kaggle database lacks images with CRVO. This common retinal disease can make images ungradable. Two thousand four hundred and thirty four color fundus retinal images composed the primary dataset (Table 2). Among them, 1439 images were labelled as good quality, and 995 images were labelled as poor quality. Among images of poor quality, 483 were labelled as eye-abnormalityassociated-poor-quality, and 512 as artefact-associated-poor-quality. Eye abnormalities included 220 cataracts, 118 CRVO, 83 asteroid hyalosis, and 62 vitreous opaque (Fig. 1).
Image quality classification and labelling. Our study's definition of image quality was evaluated according to two clinical established aspects: visibility and clarity. Images were labelled as poor quality if artefacts or eye abnormalities cover more than 1/4 of images or level III vascular arches or larger vascular arches are invisible (Table 3) based on the recommendation of the threshold of image quality 23 . Among images with poor quality, they were categorized into eye-abnormality-associated-poor-quality and artefact-associated-poorquality. Three ophthalmologists rated all the images in the training and external validation datasets; all of them have more than 5 years of experience. If there was a discordance result between ophthalmologists, another ophthalmologist with experience over ten years would make a final decision.
External validation dataset. We used images from the Zhongshan Hospital of Fudan University for ARIA external validation. This study and data collection were approved by the Research Ethics Committee of Zhongshan Hospital and were performed according to the tenets of the Declaration of Helsinki. Images were captured by TOPCON TRC-NW100 Non-mydriatic Retinal Camera and TOPCON TRC-50DC Retinal Cam- www.nature.com/scientificreports/ era. It is worth mentioning that there was no overlap of selected images from the same eyes between the two cameras. Three hundred and fifty six color fundus retinal images composed the validation dataset (198 good quality, and 158 poor quality), and show on Table 4. Among images with poor quality, 104 were labelled as eye-  Training, testing, and external validation. In the primary dataset, the model(s) building with 70% of images was for training and 30% for testing. The study included deep learning models for (1) classification of all good quality and all poor quality images, (2) classification of all good quality and eye-abnormality-associatedpoor-quality, (3) classification of all good quality and artefact-associated-poor-quality and (4) classification of eye abnormality-associated poor quality and artefact-associated-poor-quality. We used the automatic retinal image analysis(ARIA) method developed using Matlab, which has been reported (US Patent 8787638 B2) 41 . It incorporates deep learning techniques to classify image quality and shows on Fig. 2. Firstly, we applied transfer net ResNet-50 deep network with retinal images (RGB) as input, and features generated at the layer of ''fc1000_softmax'' as output based on pixels associated with image quality 42 . We also used the ARIA automatic features generation approach to extract the texture/fractal/spectrum-related features written in Matlab 41 . Then we applied the Glmnet approach to select the most important subset of features based on the penalized maximum likelihood 43 . These refined features are highly associated with image quality and were used to generate models by random forest (RF) in Matlab. The rest, 30% of the primary dataset, was used to test the performance of models. After internal portion testing and the model built, we also applied a tenfold cross-validation with support vector machine (SVM) approach for the model assessment in order to avoid Applied Glmnet to extract potential significant features to identify the good images and poor images including eye abnormality associated and artefact associated Applied SVM classifier to carry out 10-fold cross validation: generate validated results for sensitivity and specificity ( Figure 3, Table 5) Confirm the classification result with external data testing and interpret related model ( Figure 4, Table 6) Carry out the analysis for sensitivity and specificity based on RF model(s) using all above extracted features with random sampling of 30% for testing ( Figure 3,  Statistical analysis. The performance of prediction models in training and testing dataset by calculating sensitivity, specificity, and accuracy from the testing dataset and 10-fold crossing validation. In external validation, the area under the curve (AUC) of receiver operator characteristic (ROC) curve of analysis, sensitivity, specificity, accuracy, and the proportion of the false-positive cases and false-negative cases of external validation were calculated. All analyses were performed using the software of SPSS 20 and Matlab 2020a.

Result
The performance of ARIA in differentiating different categories of image quality was summarized on Table 5 and Fig. 3. Using a simple random sampling method, 1007 images with good quality and 696 images with poor quality (including 338 images with eye-abnormality-associated-poor-quality and 358 images with artefact-associated-poor-quality) were assigned to the training dataset. The remaining 432 images with good quality and 299 images with poor quality (including 145 images with eye-abnormality-associated-poor-quality and 154 images with artefact-associated-poor-quality) were held out for testing. The sensitivity, specificity and accuracy of the ARIA for testing good quality against poor quality were 98.0%, 99.1%, and 98.6%, and ones in tenfold crossvalidation were 99.0%, 98.3%, and 98.6%, respectively. The sensitivity, specificity and accuracy of the ARIA for testing good quality VS eye-abnormality-associated-poor-quality were 98.6%, 99.8%, and 99.5%, and ones in tenfold cross-validation were 99.0%, 99.7%, and 99.5%, respectively. The sensitivity, specificity and accuracy of the ARIA for testing good quality against artefact-associated-poor-quality were 98.7%, 99.8%, and 99.5%, and ones in tenfold cross-validation were 99.8%, 99.6%, and 99.6%, respectively. Last but not least, the sensitivity, specificity and accuracy of the ARIA for testing eye-abnormality-associated-poor-quality against artefactassociated-poor-quality were 92.2%, 93.8%, and 93.0%, and ones in tenfold cross-validation were 92.2%, 93.8%, and 93.0%, respectively. In the external validation, ARIA also showed comparable and robust results in distinguishing between good quality and poor quality (Fig. 4a), achieving a sensitivity of 100%, the specificity of 99.4%, the accuracy of 99.7%, and the area under the ROC curve of 0.997. The performance of ARIA was also evaluated in distinguishing between artefact-associated-poor-quality and eye-abnormality-associated-poor-quality (Fig. 4b). The sensitivity, specificity, accuracy, and area under the ROC curve were 87.5%, 92.6%, 98.2%, and 0.915, respectively. Furthermore, details of misclassification of artefact-associated-poor-quality and eye-abnormality-associatedpoor-quality were shown on Table 6 and Fig. 5. Eye abnormalities of false-negative cases included CRVO (n = 7) and vitreous opaque (n = 6), which showed that these images with these eye abnormalities were misclassified as images with artefacts.

Discussion
In this study, we trained and validated an automatic method, ARIA, to automatically assess image quality. In evaluating color retinal images from a multiethnic public dataset and an external dataset from the hospital, the ARIA had excellent performance in classifying good quality and poor quality compared to the performance of other automatic methods (Table 1). Additionally, the differentiation of two types of poor quality is more challenging than the classification of good and poor quality and did not investigate before. Our results also show that the ARIA can differentiate eye-abnormality-associated-poor-quality from artefact-associated-poor-quality. Therefore, this approach has the potential to be used as a screening tool for automatic retinal image quality assessment in the preliminary stage of retinopathy detection. Furthermore, poor quality images with eye abnormality can be filtered out and referred to an ophthalmologist. In contrast, images with artefact-associated-poor-quality require image improvement by the second photography or image processing.
Our study included some of the most common eye abnormalities that may cause poor image quality. Media opacity has been reported to play a critical role in causing poor image quality [44][45][46][47][48] , and some studies focusing on automatic retinopathy grading excluded them before the training procedure 30,49 . A cataract is lens opacity, a type of the most common opacity. According to the definition of four grades of cataract 50 , cataract in the advanced stage was labelled as eye-abnormality-associated-poor-quality and successfully detected in our study. Suppose we screened out images as poor quality during the image assessment procedure. In that case, patients will lose the opportunity for prompt cataract surgery. They may lead to hyper mature senile cataract (HMSC) or even Table 5. The performance of ARIA in testing dataset. RF Random forest, SVM Support vector machine, EAAPQ Eye-abnormality-associated-poor-quality, AAPQ Artefact-associated-poor-quality, Se Sensitivity, Sp Specificity, Acc Accuracy.   (a(1), a(2)) Good quality VS Poor quality; (b(1), b(2)) Good quality VS Eye-abnormality-associated-poor-quality; (c(1), c(2)) Good quality VS Artefact-associated-poor-quality; (d(1), d(2)) Eye-abnormality-associated-poor-quality VS Artefact-associatedpoor-quality. www.nature.com/scientificreports/ adverse complications such as lens-induced uveitis phagocytic glaucoma and rarely spontaneous dislocation of the nucleus 51-53 . Besides cataracts, vitreous opaque, such as haemorrhage and proliferative membrane, and retinal vein occlusion (RVO) are also the common eye abnormalities that can make images ungradable and should be promptly referred to doctors in case of the complication of retinal detachment, retinal tears, neovascular event, and retinal capillary nonperfusion 54,55 , which may lead to permanent vision loss or blindness. In addition, in external validation, we found that CRVO and opaque vitreous consist of the entire false-positive cases. It indicates that they are more likely to be misclassified as artefact-associated-poor-quality compared to cataracts, probably due to the relatively small amount of training images and various patterns of lesions on images. This investigation will be helpful for a better understanding of ARIA grading. In the future, to optimize the model, more images with CRVO and vitreous opaque can be included in the training dataset.
Asteroid hyalosis is an eye abnormality; that vitreous body contains small yellow-white, spherical particles known as asteroid bodies (ABs) 56 . Although it barely impacts on the patient's vision, it can significantly influence the clinical examination of ophthalmologists on the fundus. Severe asteroid hyalinosis can even render fundus examination impossible 56,57 . Moreover, the estimated global prevalence of asteroid hyalinosis continues increasing, and old people make up a substantial and increasing fraction 57 . Hence, although most of those patients do not need a referral, the images should also be screened out in the quality screening stage due to the increasing prevalence and impairment of fundus assessment. It is worth mentioning that cataract, asteroid hyalinosis, and RVO are the conditions becoming increasingly prevalent with age 58,59 . Therefore, in age-related retinopathy screening programs, such as age-related macula disease (AMD), a larger proportion of eye disease is associated The performance of ARIA to differentiate artefact-associated-poor-quality from eyeabnormality-associated-poor-quality in the external validation dataset. Table 6. The misclassification of ARIA in differentiating artefact-associated-poor-quality from eyeabnormality-associated-poor-quality. CRVO central retinal vein occlusion.  Violin plot for the probability of ARIA output in differentiating artefact-associated-poor-quality from eye-abnormality-associated-poor-quality in the external validation dataset (predicted value of probability: 0: Artefact-associated-poor-quality; 1: Eye-abnormality-associated-poor-quality).