Introduction

Diabetic retinopathy (DR) is a sight-threatening microvascular complication of diabetes. It is the most common complication of diabetes [1, 2] and a leading cause of blindness among working-age adults in the developed world [3]. Patients with DR are 25 times more likely to become blind than patients without diabetes [2]. India has been estimated to have 65.1 million people with diabetes mellitus (DM) and another 21.5 million in the pre-diabetes stage (i.e., at very high risk) [4]. The number of people with DM is projected to increase to 109 million by 2035, especially involving developing countries where resources for in-person examinations are limited. Lifestyle changes, especially increasing levels of obesity, may lead to an even greater number of people with DM [5]. These data, together with the limited access to health care in much of the rural world, suggest a need to expand diabetes services to rural areas and to develop and implement appropriate prevention and control interventions [6]. Various studies indicate that 12–18% of people with diabetes develop DR [7,8,9,10].

A key challenge in addressing the problem of DR is the difficulty in identifying patients at an early stage, when treatment is highly beneficial and cost-effective. Currently, screening in India (and many other countries) is undertaken on an ad hoc basis, and no optimal strategy has been developed at the national level [11]. Different models have been developed for DR case finding, and they are implemented to varying degrees across different settings [12,13,14,15,16,17,18]. Studies have reported low levels of awareness and lack of access to a screening facility as barriers to the uptake of DR screening programmes [19,20,21]. From the care providers’ perspective, lack of skilled human resources, lack of retinal imaging infrastructure and the cost of services have been found to be the key challenges [22, 23].

Situations where images are sent from outside clinics on a regular basis demand the availability of a full-time ophthalmologist skilled at diabetic retinopathy diagnosis (often a retina specialist) to read and grade every image and give feedback accordingly [14]. A major bottleneck in making this happen in tertiary care centres is the limited availability of human graders to read and grade the fundus images sent from remote clinics. Setting aside a retina specialist at every facility is unlikely to be feasible from an economic or availability perspective. If a non-physician grader or a less specialized ophthalmologist could be effective in this role, as in population studies of diabetic retinopathy where trained non-ophthalmologist graders have already been effective in research settings [24, 25], cost savings would be substantial. In this study, we aimed to validate the results of image grading by a non-ophthalmologist (trained grader) and an ophthalmologist against those of an in-person examination by a retina specialist (taken as the gold standard), to explore whether a trained grader can reduce the dependence of a DR screening system on retina specialist grading.

Subjects and methods

This prospective cross-sectional study was carried out using non-mydriatic fundus images from 2002 eyes of 1001 patients who presented to the vitreo-retinal clinic of Aravind Eye Hospital, Madurai, India between April 2016 and July 2016. The research protocol was approved by the Institutional Review Board of Aravind Eye Hospitals (AEH). Written informed consent was obtained from all patients, and the study adhered to the tenets of the Declaration of Helsinki throughout.

Study population

Patients who were older than 40 years and had previously received a DM diagnosis were enrolled in the study. Exclusion criteria included a history of any intraocular surgery other than cataract surgery; ocular laser treatments for any retinal disease; ocular injections for diabetic macular oedema (DMO) or proliferative DR; a history of any other retinal vascular disease, glaucoma, or other diseases that may affect the appearance of the retina or optic disc; medical conditions that would be a contraindication to dilation; overt media opacity; and/or gestational diabetes.

Outcome measures

The key outcome measures were sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) of the two graders with reference to the in-person retina specialist (gold standard) for referable DR and DMO. Three retina specialists with similar years of experience performed the in-person examination. We defined referable DR as DR worse than or equal to moderate non-proliferative DR (NPDR) and referable DMO as exudates within 1 disc diameter of the macula. We also estimated the level of agreement between grader 1 and grader 2 using Cohen’s kappa statistic.
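The four outcome measures above all derive from a 2x2 table of grader result against the reference standard. The following is a minimal Python sketch of those definitions. The `Confusion` class, the `diagnostic_metrics` function and the counts are illustrative assumptions, not the study's analysis code or data (the study's analysis was run in STATA).

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 counts for one grader against the reference standard."""
    tp: int  # grader positive, reference positive
    fp: int  # grader positive, reference negative
    fn: int  # grader negative, reference positive
    tn: int  # grader negative, reference negative


def diagnostic_metrics(c: Confusion) -> dict:
    """Return sensitivity, specificity, PPV, NPV and accuracy as fractions."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # referable eyes correctly flagged
        "specificity": c.tn / (c.tn + c.fp),  # non-referable eyes correctly cleared
        "ppv": c.tp / (c.tp + c.fp),          # flagged eyes that are truly referable
        "npv": c.tn / (c.tn + c.fn),          # cleared eyes that are truly non-referable
        "accuracy": (c.tp + c.tn) / total,    # overall proportion correctly classified
    }


# Illustrative counts only; these are NOT the study's data.
example = Confusion(tp=80, fp=30, fn=20, tn=170)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity condition on the reference standard, whereas PPV and NPV condition on the grader's call, so the latter two also depend on how common the condition is in the sample.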

Study procedure

Patient eligibility was determined by reviewing medical records on presentation to the clinic. All eligible patients underwent imaging with a non-mydriatic fundus camera (3nethra; Forus Health, Bengaluru, India), with trained ophthalmic assistants capturing a macula-centred 40° to 45° fundus photograph. Following imaging, patients underwent a routine dilated fundus examination by a retinal specialist. The fundus images were graded by a trained non-ophthalmologist grader (grader 1) and an ophthalmologist grader (grader 2) for DR and referable DMO using the Aravind Diabetic Retinopathy Evaluation Software (ADRES; Aravind Eye Care System, Madurai, India). Both graders were masked to each other’s grading results as well as to the findings of the retina specialist. Patients were advised and provided treatment based on the retinal specialist’s assessments. Image grading by both graders, as well as in-person diagnosis by the retinal specialist, followed the International Clinical Diabetic Retinopathy (ICDR) severity scale [26]. The graders’ results were not available to the treating retinal specialist, to ensure that standard clinical care was not affected by the study. The trained non-ophthalmologist grader (grader 1) had one month of structured training followed by 7 months of DR grading experience. The grader’s training, supervised by a retina specialist, focused on ocular anatomy, retinal disease, and DR signs and severity, with a marked assessment at the end of the training. The ophthalmologist grader was fellowship-trained at the vitreo-retinal department and had been involved in retinal image grading for over 15 months. Intra-grader reliability against the retinal specialist’s live-evaluation grading was measured for both graders by re-grading approximately 10% of the cases, with a minimum interval of 1 week between the initial grading and the over-read.

Statistical analysis

All patient-related data were de-identified before transfer for statistical analysis. Demographic and clinical characteristics were summarized with means and percentages as appropriate for the type of data. Diagnostic accuracy was evaluated at both eye level and person level. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated with 95% exact binomial confidence intervals. For agreement between the trained non-ophthalmologist grader and the ophthalmologist grader, Cohen’s kappa statistic with 95% confidence interval was calculated and interpreted following the guidelines of Landis and Koch: k = 0.00–0.20, slight agreement; k = 0.21–0.40, fair; k = 0.41–0.60, moderate; k = 0.61–0.80, substantial; and k = 0.81–1.00, almost perfect agreement [27]. Only gradable images were included in the analysis. A P value of <0.05 was considered statistically significant. All statistical analyses were performed using STATA version 14.0 (StataCorp, College Station, Texas, USA).
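To make the two statistical tools named above concrete, the following Python sketch computes an exact (Clopper-Pearson) binomial confidence interval and Cohen's kappa for a binary rating, and maps kappa to the Landis and Koch bands quoted in the text. It is an illustration under stated assumptions, not the study's STATA code: all function names are hypothetical, the interval is found by simple bisection using only the standard library, the "poor" label for negative kappa comes from Landis and Koch rather than from the text above, and the confidence interval for kappa itself is not covered.

```python
import math


def _log_pmf(k: int, n: int, p: float) -> float:
    """Log binomial probability mass, computed in log space so large n is safe."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def _cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), with 0 < p < 1 and k >= 0."""
    return min(1.0, sum(math.exp(_log_pmf(i, n, p)) for i in range(k + 1)))


def _bisect(f, target: float) -> float:
    """Find p in (0, 1) with f(p) = target, for f increasing in p."""
    lo, hi = 1e-12, 1 - 1e-12
    for _ in range(100):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k: int, n: int, alpha: float = 0.05) -> tuple:
    """Clopper-Pearson interval for k successes out of n trials."""
    # Lower limit: p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _bisect(lambda p: 1 - _cdf(k - 1, n, p), alpha / 2)
    # Upper limit: p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _bisect(lambda p: 1 - _cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


def cohen_kappa(a: int, b: int, c: int, d: int) -> float:
    """Kappa for two raters and a binary label.

    a: both positive, b: rater 1 positive only,
    c: rater 2 positive only, d: both negative.
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Label kappa using the bands quoted in the text."""
    if kappa < 0:
        return "poor"  # band from Landis and Koch; not listed in the text above
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Illustrative counts only; these are NOT the study's data.
low, high = exact_ci(80, 100)
print(f"80/100 -> 95% exact CI: {low:.3f} to {high:.3f}")
kappa = cohen_kappa(45, 5, 10, 40)
print(f"kappa = {kappa:.2f} ({landis_koch(kappa)})")
```

The exact interval is preferred over the normal approximation when a proportion is close to 0 or 1, which is the case for several of the predictive values reported in this kind of study.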

Results

Images of 2002 eyes of 1001 participants were included in the study. The mean (SD) age of the patients was 55.8 (8.37) years and 420 (42%) of them were women (Table 1). We included the 1901 (95%) images that were classified as ‘gradable’ by both graders in the rest of the analyses of referable DR and DMO (Fig. 1). As per the evaluation of the retinal specialist (gold standard), 861 (45.3%) eyes had DR of varying stages, of which 209 (11%), 409 (21.5%) and 104 (5.5%) had mild, moderate or severe non-proliferative DR (NPDR), respectively, and 139 (7.3%) had proliferative DR (PDR). There were 118 (6.2%) eyes with referable DMO. Of all the retinal images, 101 (5.0%) were indicated as not gradable by either grader 1 or grader 2, of which 16 (15.8%), 22 (21.8%), 7 (6.9%) and 19 (18.8%) had mild, moderate or severe NPDR and PDR, respectively, and 37 (36.7%) did not have any DR, as per the assessment given by the retina specialist.

Table 1 Patient characteristics and distribution of DR cases.
Fig. 1: Flow chart describing the study procedure.

DM diabetes mellitus.

Sensitivity and specificity of detecting DR and DMO

In the eye-level analysis, compared with the reference standard clinical assessment by the retinal specialist (Table 2, which contains 95% confidence intervals), the non-ophthalmologist grader (grader 1) had a sensitivity of 66.9% and a specificity of 91.0%, and the ophthalmologist grader (grader 2) had a sensitivity and specificity of 83.6% and 80.3%, respectively, for referable DR. The PPVs for graders 1 and 2 were 79.6% and 68.9%, respectively, and the NPVs were 84.0% and 90.0%, respectively. Grader 1 and grader 2 correctly classified 82.7% and 81.4% of images, respectively.

Table 2 Sensitivity and specificity analysis for referable DR and DME, comparing each grader to the gold standard (Retina Specialist).

For referable DMO, grader 1 and grader 2 had a sensitivity of 74.6% and 85.6%, respectively, and a specificity of 83.7% and 79.8%, respectively (Table 2, which includes the 95% confidence intervals). Here, the PPVs for graders 1 and 2 were 23.2% and 21.9% and the NPVs were 98.0% and 98.8%, respectively. With respect to referable DMO, graders 1 and 2 correctly classified 83.1% and 80.1% of images, respectively.
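The low PPVs alongside high NPVs can be checked with simple arithmetic from the figures already reported: sensitivity, specificity and the proportion of gradable eyes with referable DMO (118 of 1901). The sketch below applies Bayes' rule; the function name `ppv_from_rates` is a hypothetical helper, and the calculation is offered as a consistency check rather than as part of the study's analysis.

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence               # expected true-positive fraction
    false_pos = (1 - specificity) * (1 - prevalence)  # expected false-positive fraction
    return true_pos / (true_pos + false_pos)


# Figures as reported in the text: 118 of 1901 gradable eyes had referable DMO.
prevalence = 118 / 1901
ppv_grader1 = ppv_from_rates(0.746, 0.837, prevalence)
ppv_grader2 = ppv_from_rates(0.856, 0.798, prevalence)
print(f"grader 1 PPV for referable DMO: {ppv_grader1:.1%}")
print(f"grader 2 PPV for referable DMO: {ppv_grader2:.1%}")
```

With these inputs the calculation gives approximately 23.2% for grader 1 and 21.9% for grader 2, matching the reported values. Because only about 6% of eyes had referable DMO, even a specificity near 80% yields several false positives for every true positive.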

Inter-observer reliability for DR and DMO grading

By the Landis and Koch thresholds, agreement with the retina specialist for referable DR was moderate for grader 1 (k = 0.60, P value < 0.001) and substantial for grader 2 (k = 0.61, P value < 0.001). With regard to referable DMO, the level of agreement was only fair for both grader 1 (k = 0.29, P value < 0.001) and grader 2 (k = 0.28, P value < 0.001).

For referable DR, a moderate level of agreement was found between the graders (k = 0.60, P value < 0.001), and for referable DMO, a substantial level of agreement was found between the graders (k = 0.71, P value < 0.001) (Table 3). Graders 1 and 2 classified 95 (4.8%) and 35 (1.8%) images as ‘ungradable’, respectively; of the 101 images classified as ungradable by either grader, 29 (28.7%) were classified so by both.

Table 3 Inter-rater agreement for referable DR and DME assessment for retina specialist, grader 1 and grader 2.

For the person-level analyses, we considered the right-eye diagnosis, based on the finding that 91% of the patients had similar grading in both eyes as per the gold standard retina specialist’s assessment. As in the eye-level analysis, we found high sensitivity, specificity, PPV and NPV for both graders in assessing DR, and high sensitivity, specificity and NPV but low PPV for both graders in assessing referable DMO (Table 4). We also assessed the probability of a patient not being referred because of a false negative classification by the system, which would not be safe for the patient. As a conservative analysis, we applied a 16% probability of a false negative classification for each eye, based on the results of the non-ophthalmologist grader (who had the lowest NPV, 84%). Because photos would be presented to graders in a masked fashion, a false negative in one eye was treated as independent of a false negative in the other, so the probability of both eyes being false negative equals 0.16 × 0.16 = 0.0256, or 2.56%. The actual non-referral proportion would be somewhat lower (more favourable) than this because negative predictive values were 98% or better for diabetic macular oedema, and some patients with bilateral false negative results for diabetic retinopathy would be referred on the basis of diabetic macular oedema.
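The bilateral false negative estimate above is a one-line calculation, sketched here in Python. It follows the text's own simplifications: the per-eye miss probability is taken as 1 minus the lowest NPV (84%), and the two eyes of a patient are treated as independent. The function name `both_eyes_missed` is a hypothetical helper.

```python
def both_eyes_missed(npv: float) -> float:
    """Probability that both eyes of a patient are falsely graded negative.

    Assumes, as in the text, a per-eye miss probability of 1 - NPV and
    independence between the two eyes.
    """
    per_eye_miss = 1 - npv
    return per_eye_miss ** 2


p_both = both_eyes_missed(0.84)  # lowest NPV reported, grader 1 for referable DR
print(f"probability both eyes are missed: {p_both:.2%}")
```

If grading errors in the two eyes of one patient are positively correlated, for example because both images share the same media haze, the true figure would be higher than this product, so the independence assumption matters for the result.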

Table 4 Person-level agreement for referable DR and DME, comparing each grader to the gold standard (Retina Specialist).

Discussion

We found good sensitivity and excellent specificity for retinal image grading by the non-ophthalmologist grader compared with the reference standard eye examination by the retina specialist with regard to referable DR and referable DMO, indicating proof of concept for photographic screening for diabetic retinopathy graded by trained non-ophthalmologist graders as a potentially cost-effective strategy. Although the sensitivity values for the non-ophthalmologist grader were lower than those for the ophthalmologist grader (grader 2), the specificity values were higher for the non-ophthalmologist grader with regard to both DR and DMO. The high specificity of grader 1 is in line with the accuracy of image grading by non-physicians found in previous studies conducted in Singapore [28], China [29] and the United Kingdom [30]. This indicates that when there is no pathology of DR or DMO, there is a high chance of the images being graded as ‘normal’, which suggests that a non-ophthalmologist grader could accurately identify patients who do not require retina specialist evaluation, thus saving time and financial resources for patients and the health care system.

The high PPV and NPV for grader 1 with regard to grading of referable DR support a favourable level of reliability. However, the low PPV for referable DMO for both grader 1 (23.2%) and grader 2 (21.9%) could indicate a high rate of false positives in identifying DMO by both graders, or else greater ease in detecting subtle signs of DMO on a photograph than in a live patient. Additionally, the percent agreement with the retina specialist with regard to both DR and DMO was more than 80% for both graders, with grader 1 demonstrating slightly higher rates. These levels of agreement were slightly better than those reported in a large USA-based trial comparing clinical examinations and fundus image grading by retina specialists [17]. OCT or portable OCT may be worth investigating as an image-based screening strategy with potentially better sensitivity and specificity for DMO.

We found that only 5% of the fundus images were ungradable, which is more favourable than in other studies. Previous studies have reported considerably higher proportions of non-gradable images (19.7% [18], 36% [31], 44.8% [14] and 86% [32]), mainly attributed to cataract or small pupil size. Additional studies have addressed some of the problems of poor images by using non-mydriatic ultra-wide-field imaging, which retains the advantages of non-mydriasis and patient convenience [33, 34] but is a very expensive technology. The fact that we used a comparatively inexpensive but good-quality camera [35], and that the images were taken by ophthalmic assistants who could be trained in a shorter period of time, supports the effectiveness and feasibility of this model. To ensure that referable cases are not missed because of non-gradability of images, we recommend that graders refer all patients with ungradable images to a retinal specialist.

A major strength of our study is the large image sample size, which makes the resulting estimates more precise for generalization to similar settings. The study also has a few limitations. When considering only sight-threatening DR (PDR), grader 1 and grader 2 misclassified 24 (17.3%) and 19 (13.7%) images, respectively, as non-referable DR (data not presented). However, on additional examination of these misclassified images by a retina specialist, it was found that these images were hazy, predominantly because of lenticular changes or asteroid hyalosis. Training adjustments suggesting a lower threshold for calling an image ungradable might address this problem. Since the images captured only a 45° view of the retina, it is possible that the graders missed neovascularization outside the field of view. Further, the exclusion criteria we set for the study might be a limitation, as they might not be completely applicable in a DR screening programme.

Good specificity and NPV, as found in our study, are considered key attributes of a screening programme [36] since false positives can be addressed upon referral. Our results suggest that a trained non-ophthalmologist grader can considerably enhance the efficiency of a screening system compared with having all patients screened by hospital based vitreo-retinal specialists, who are very limited in number in much of the world.

Technological advancements such as the use of artificial intelligence (AI) are being introduced in various areas of eye care, including DR screening [37]. The United States of America, for instance, has recently introduced an FDA-approved AI-based device to detect certain diabetes-related eye problems [38]. The UK National Health Service is already in the process of adopting AI-based automated retinal image analysis systems (ARIAS) for DR screening [39]. The strategy of expecting everyone with diabetes to undergo annual retinal examination is unlikely to succeed in low- and middle-income country (LMIC) settings because of the low levels of adherence due to various barriers [40]. Even if the barriers could be addressed and adherence increased, the enhanced demand and extra resources required would overwhelm the healthcare system. The eye care facilities in India are inadequate for dealing with the current volume of patients because of the limited number of trained retinal specialists; shortages of diagnostic, laser, or surgical equipment; and a lack of good follow-up systems [41]. The situation is no different in other low- and middle-income countries. According to current predictions, diabetes-associated blindness is likely to rise dramatically in the developing world [42]. Given the numbers of people with diabetes, bringing down the costs of quality eye care will become even more important. In this context, manual grading by non-ophthalmologist graders will be more cost-effective than AI-based systems in the near future.

In conclusion, our results suggest that grading by a trained non-ophthalmologist can give results similar to grading by an ophthalmologist. While there are many avenues for future work, this study provides encouraging proof-of-concept results regarding the feasibility of using this model for efficient DR screening and care delivery, especially in LMICs. Future research involving more graders would be needed before widespread adoption of the system. Adopting a tele-retinal screening model with a non-ophthalmologist grader, as simulated in this study, could potentially make the process of DR diagnosis more cost-effective, thereby enhancing the ability of health systems to scale up DR diagnostic systems into currently unserved areas.

Summary

What was known before

  • Grading of retinal images by a retina specialist has been proven to be a reliable approach for diagnosing diabetic retinopathy. The accuracy of grading by a non-ophthalmologist grader is yet to be established, especially in low- and middle-income settings.

What this study adds

  • We found a good level of accuracy for fundus image grading performed by a trained non-ophthalmologist. DR diagnosis can become more efficient by engaging trained non-ophthalmologists in resource-limited settings. Accuracy in DMO diagnosis by the non-ophthalmologist grader needs improvement.