Comparing the results of manual and automated quantitative corneal neuroanalysing modules for beginners

This study aimed to evaluate the reliability of in vivo confocal microscopic neuroanalysis by beginners using manual and automated modules. A total of 108 images of the sub-basal corneal nerve plexus (SCNP) from 18 healthy participants were analyzed by 7 beginner observers using a manual module (CCMetrics [CCM]) and an automated module (ACCMetrics [ACCM]). SCNP parameters analyzed included corneal nerve fiber density (NFD), corneal nerve branch density (NBD), corneal nerve fiber length (NFL), and tortuosity coefficient (TC). The intra-observer repeatability, inter-observer reliability, inter-module agreement, and left–right eye symmetry level of SCNP parameters were examined. All observers showed good intra-observer repeatability using CCM (intraclass correlation coefficient [ICC] > 0.60 for all), except when measuring TC. Two observers demonstrated especially excellent repeatability in analyzing NFD, NBD, and NFL using the manual module, indicating that the quality of interpretation may still be observer-dependent. Among all SCNP parameters, NFL had the best inter-observer reliability (Spearman’s rank correlation coefficient [SpCC] and ICC > 0.85 for the 3 original observers) and left–right symmetry level (SpCC and ICC > 0.60). In the additional analysis of inter-observer reliability using results from all 7 observers, only NFL showed good inter-observer reliability (ICC = 0.79). Compared with CCM measurements, ACCM measurements were significantly lower, implying poor inter-module agreement. Our results suggest that the performance of quantitative corneal neuroanalysis by beginners may be acceptable, with NFL being the most reliable parameter, and that the automated method cannot fully replace manual work.


Scientific Reports | (2021) 11:18208 | https://doi.org/10.1038/s41598-021-97567-y | www.nature.com/scientificreports/
… nerve fibers, Manchester University, Manchester, UK), was developed to improve the interpreting efficiency. Automated module software, ACCMetrics (ACCM, Automated tracing of nerve fibers, Manchester University, Manchester, UK), was later developed to perform automatic analysis of nerve parameters. Some previous studies have compared the measurements of the CCM and ACCM modules, and the results were not considered interchangeable 18,22. Because of the unsatisfactory performance of automated software, some studies recruited less experienced observers or used crowdsourcing to perform large-volume image interpretation, as formal training for neuroanalysis is time- and labor-consuming. Such methods have also been applied to grading diabetic retinopathy and showed satisfactory results 23. Until a reliable automated neuroanalytic method for corneal nerves is developed, it is important to investigate whether less experienced observers can still achieve acceptable results for corneal neuroanalysis through manual work. In this study, we evaluated the reliability of beginner observers in interpreting IVCM images for measuring SCNP parameters using both manual and automated modules. We aimed to evaluate the reliability of both neuroanalytic modules for beginner observers and to identify the most reliable parameters. Our study may provide valuable information and help in dealing with the large and complicated data sets derived from corneal nerve images.

Results
Eighteen male subjects (age: 23.0 ± 1.56 years, range 20 to 26 years) were enrolled for IVCM imaging, and a total of 108 images were used in formal evaluation (3 images per eye per subject).
Intra-observer repeatability. The CCM and ACCM measurements obtained by the 3 original observers (Original group, Observers 1-3) were summarized in Table 1. The CCM measurements by the 4 additional observers (Additional group, Observers 4-7) were summarized in Supplement Table S1. To evaluate the intra-observer repeatability of each observer, the intraclass correlation coefficient (ICC) was calculated based on data from both sessions of evaluation (Table 2). All 7 observers showed good intra-observer repeatability (ICC > 0.6) when measuring all SCNP parameters, except for corneal tortuosity coefficient (TC). Excellent intra-observer repeatability (ICC > 0.8) was found in 6, 3, 6, and 3 out of 7 observers for the analysis of corneal nerve fiber density (NFD), corneal nerve branch density (NBD), corneal nerve fiber length (NFL), and TC, respectively, with the best performance in NFL. Two observers (Observer 2 and Observer 5) achieved especially excellent intra-observer repeatability for NFD, NBD, and NFL. Bland-Altman plots, which demonstrated the intra-observer repeatability of each original observer, were shown in Fig. 1.
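As a rough illustration of how a test-retest ICC can be obtained from two evaluation sessions, the following sketch computes a two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1). The study does not state which ICC model was used, so the choice of ICC(2,1), the function name `icc_2_1`, and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1) for a table of n images (rows) by k sessions or raters (columns)."""
    n, k = len(scores), len(scores[0])
    grand = mean(v for row in scores for v in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical NFL values (mm/mm^2) for five images traced in two sessions.
nfl_sessions = [[18.2, 18.9], [22.5, 21.8], [15.1, 15.6], [25.3, 24.7], [20.0, 20.4]]
print(round(icc_2_1(nfl_sessions), 3))
```

Under the thresholds used in this section, a returned value above 0.6 would be read as good repeatability and a value above 0.8 as excellent. Because this form of ICC measures absolute agreement, a constant shift between sessions lowers it even when the ranking of images is unchanged.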
Inter-observer reliability. The results of inter-observer reliability of the original group were summarized in Table 3. The inter-observer ICC was 0.20, 0.39, 0.86, and 0.44 for NFD, NBD, NFL, and TC, respectively. In …
Table 1. The CCMetrics (CCM) values of NFD, NBD and NFL in the first and the second evaluations from the original group, the ACCMetrics (ACCM) values, and the values of both eyes from the first visit of observer 2 and ACCM. Results are expressed as mean ± SD. NFD (nerve fiber density) is measured in number of fibers/mm², NBD (nerve branch density) is measured in number of branch points on the main fibers/mm², NFL (nerve fiber length) is measured in total length of fiber (mm/mm²), TC (tortuosity coefficient) is measured in main fiber average tortuosity, OD (Oculus Dexter) represents right eye, OS (Oculus Sinister) represents left eye, NA not applicable.
… Table 3. The ICC of inter-module agreement (ACCM vs. observer 1, 2, or 3) was < 0.6 in every case for NFD, NBD, and NFL, implying poor inter-module agreement. Similar results were found when CCM data derived from the additional group were compared with the ACCM results (ICC of inter-module agreement < 0.6, Supplement Table S2). An agreement plot and linear regression were also used to visualize the relationship between NFL measurements obtained using CCM by observer 2 and those obtained by ACCM (Fig. 2). The plot demonstrated good correlation between measurements by the two modules but poor absolute agreement.
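The distinction between correlation and absolute agreement can be made concrete with a Bland-Altman calculation. The sketch below computes the mean difference (bias) and the 95% limits of agreement as bias ± 1.96 SD of the paired differences. The function name and the paired values are hypothetical and are not data from this study.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


# Hypothetical paired NFL values (mm/mm^2): manual tracing vs. automated tracing.
manual = [20.0, 22.0, 25.0, 27.0]
automated = [14.0, 15.0, 18.0, 19.0]
bias, lower, upper = bland_altman(manual, automated)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

In this made-up example the two series rise together, yet the whole interval between the limits lies above zero. That is the pattern expected when one method systematically reads lower than the other.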
Left-right eye level of symmetry. The left-right eye level of symmetry using CCM and ACCM was also assessed. The measured values were summarized in Table 1, and the results of the ICC and SpCC calculations were summarized in Table 3. The right and left eyes of each subject were expected to show similar results when a single module was used for image analysis. Since the CCM measurements from observer 2 in the original group were the most consistent among the 3 observers, CCM data derived from observer 2 and ACCM data were used to assess the level of symmetry, and satisfactory results for NFL were found (SpCC = 0.86, ICC = 0.83 for CCM; SpCC = 0.83, ICC = 0.83 for ACCM). Figure 3 depicted the correlation between measurements of the left and right eyes by observer 2, who had the highest intra-observer repeatability in the original group. Figure 4 depicted the correlation between measurements of the two eyes using ACCM.
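For readers unfamiliar with SpCC, the sketch below computes Spearman's rank correlation as the Pearson correlation of average ranks, here applied to hypothetical right-eye and left-eye values. The helper names and the numbers are illustrative assumptions, not the analysis code used in this study.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation coefficient of two paired samples."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-subject NFL means (mm/mm^2) for right (OD) and left (OS) eyes.
nfl_od = [19.5, 23.1, 16.4, 25.0, 21.2]
nfl_os = [20.1, 22.4, 17.0, 24.2, 22.9]
print(round(spearman(nfl_od, nfl_os), 2))
```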

Discussion
We examined the performance of beginner observers in quantitative corneal neuroanalysis using a manual module and an automated module. All observers showed good intra-observer repeatability using CCM when measuring all SCNP parameters, except for TC. Two observers demonstrated especially excellent repeatability, indicating that the quality of interpretation may still be observer-dependent. Compared with other parameters, NFL measurement had the best inter-observer reliability and left-right eye level of symmetry based on results from the 3 original observers. Even when data from both the original and additional groups were used, only NFL showed good inter-observer reliability. The values of all parameters measured by ACCM were significantly lower than those measured by the CCM module, and the results of CCM and ACCM were neither consistent nor comparable. Based on our results, the intra-observer repeatability of NFD and NBD measured by CCM was satisfactory, and the repeatability of NFL was excellent. As mentioned above, we provided only a brief training session of 20 images to the 7 observers before formal evaluation. Our results indicate that, with limited training, SCNP measurements obtained by beginner observers using the CCM module seem acceptable. Thus, recruitment of beginner observers could be considered when large-volume corneal image interpretation is needed, and a strict requirement for experience level may not be necessary to achieve a satisfactory result.
Previous literature did not conclude which SCNP parameter obtained by CCM has the highest repeatability 19. However, other studies suggested that only NFL had high inter-observer repeatability across healthy and diseased patients 20,21. Some studies even stated that corneal nerve evaluation by IVCM should only focus on NFL due to its high reproducibility and validity 20. In the current study, NFL was the only parameter …
Table 2. The intraclass correlation coefficient for 2 CCMetrics (CCM) measurements separated by 2 weeks in all observers. NFD (nerve fiber density) is measured in number of fibers/mm², NBD (nerve branch density) is measured in number of branch points on the main fibers/mm², NFL (nerve fiber length) is measured in total length of fiber (mm/mm²), TC (tortuosity coefficient) is measured in main fiber average tortuosity, CI confidence interval. *Represents an excellent correlation when values > 0.8.
The reason for the better intra- and inter-observer reliability of NFL measurements compared with NFD and NBD measurements remains elusive. One possible explanation is the unclear operational definition for measuring NFD and NBD 24, which may lead to subjective interpretation and thus large differences in the results obtained by different observers. In contrast, the measurement of NFL does not involve differentiating between main and branch nerve fibers and is usually not influenced by the observer's own judgement. This measurement minimizes the subjective factor in the evaluation process, which might be the reason for its higher repeatability. As for TC, its values are calculated based on the observer's depiction of how tortuous the main fibers are. Therefore, the reported values of TC may easily vary, and it was not surprising that both the intra- and inter-observer repeatability of TC were low.
The currently available tools for corneal neuroanalysis include a manual module (CCM), a semimanual module (Neuron J), and a fully automated module (ACCM). One study compared NFL values measured using all three modules 22 and implied that the values of NFL derived from CCM were greater than those from the others. … 27. In our study, the NFL values obtained using ACCM were not only significantly lower than the results obtained by CCM but also lower than those in past reports 22,28. A possible explanation is the higher requirement for image quality when using ACCM to calculate NFL, as defocused nerves in the images cannot be captured by automated software. In contrast, for a human observer, delineation of the nerve was relatively unaffected by a blurry background. Although we included images after quality selection, the selected images might not have been optimal for automatic analysis. In addition, the results obtained by ACCM can be easily affected by the optical quality of the images (brightness, contrast, sharpness, etc.) (Supplementary Fig. S1), making the reliability of this method more settings-dependent.
Figure 2. Agreement plot (a) and comparison plot (b) of nerve fiber length (NFL) between observer 2, the observer with the highest intra-observer repeatability in the original group, and ACCMetrics (ACCM). In (a), the black dotted line represents the mean difference, and the solid lines represent the 95% limits of agreement. The blue dashed line represents no difference between NFL measured by the two modules. In (b), the linear regression is depicted by the dotted line and the equivalence line by the solid line. Correlation was excellent between observer 2 and ACCM, but ACCM significantly underestimated NFL compared with manual CCMetrics (CCM).
Recently, machine learning algorithms have shown excellent performance in medical image analysis. Several artificial intelligence-based methods have been developed to improve the less-than-ideal results obtained by ACCM 29,30. In two prior studies, NFL measured using machine learning techniques was comparable to that obtained using the manual method and was significantly better than the results obtained by ACCM 29,30. Therefore, in addition to outsourcing this task to beginner observers, the use of artificial intelligence-based automated methods may be another option to reduce the labor- and time-associated cost of training professional image readers.
There were several limitations of the current study. First, as our subjects were all healthy young males, the results may not be applicable to diseased eyes 3,31-33. Second, corneal characteristics may differ based on ethnicity, so our results may not be generalizable to Western populations. For instance, Asians have smaller anterior segments and a higher prevalence of myopia 34-36, both of which may affect the SCNP parameters 37. The reported values of normal SCNP measurements also varied across different ethnicities in past reports 24,38,39. Similarly, our results cannot serve as the reference for normative values of SCNP measurements. Third, the cohort was relatively small and consisted only of healthy young males. Therefore, the generalizability of our results to healthy populations with different demographic characteristics or to diseased populations remains unknown. Lastly, our study did not include quantitative neuroanalysis by an experienced expert. Thus, the validity of the beginner observers' results cannot be confirmed due to the lack of ground truth provided by an expert observer. However, evaluation of the validity of beginner observers may be our next goal in establishing a reliable and accurate method for large-volume neuroanalysis.
In conclusion, without extensive training, beginners may still achieve acceptable performance in manual quantitative corneal neuroanalysis, and NFL had the best reliability among all SCNP parameters. The automated module, although convenient, cannot yet replace manual work, as it may lead to underestimation of the measurements. Until a more accurate and reliable automated method is established, human observers may remain the mainstay of neuroanalysis, and our results may serve as the basis for future work and studies on large-volume image interpretation for corneal neuropathies.

Materials and methods
Study subjects. The study was approved by the National Taiwan University Hospital Ethics Committee and was conducted in accordance with the Declaration of Helsinki. Healthy male volunteers without corneal diseases, peripheral neuropathy, or diabetes were recruited for IVCM imaging. Those who had a history of contact lens use or refractive surgery were excluded. Before enrollment, both eyes of each subject were examined by slit-lamp biomicroscopy and were confirmed to be clinically healthy. Written informed consent was obtained from all subjects.
In vivo corneal confocal microscopy. An IVCM scan (Heidelberg Retinal Tomograph III (HRT III), Heidelberg Engineering GmbH, Heidelberg, Germany) was performed on all subjects. This IVCM uses a 670-nm wavelength helium-neon diode laser, which has been proven to be safe for ocular use. An X63 objective lens with a numerical 0.9 um working intervals relative to the anterior surface of applanating cap (TomoCap; Heidelberg Engineering GmbH) was used. The obtained size of the 2-dimensional images was 384 × 384 µm, with a transverse optical resolution of 10 µm per pixel.
The examinations were performed by an experienced technician following published protocols 11,40-42. Briefly, topical anesthetic was applied to the corneal surface, and a viscous gel medium was then applied to the corneal surface 5 min later, which permitted a visual gel bridge between the sterile cap on the microscope objective lens and the surface of the central cornea. To ensure examination of the central cornea, the subjects were instructed to fixate on the flashing light of the instrument. We used the interface between the corneal epithelium and Bowman's layer as a reference point, such that the examiner could easily find the image with the highest contrast on the SCNP.
In this study, we used a "volume scan mode" to capture a set of 40 automatically obtained images at each examination 21. Around 6 to 8 examinations were repeated for each eye, and both eyes of each subject were examined. The overall examination took about 3-5 min. Three images were selected from each eye of each subject, and the selection was based on the depth, focus position, and contrast of the images. Details of the image selection criteria were summarized in a prior study 43. An even distribution of the SCNP across the whole image area was an additional selection criterion.
Image analysis for SCNP parameters. Seven beginner observers with similar research backgrounds were recruited to analyze IVCM images and were divided into two groups. The original group consisted of 3 observers, who performed the data analysis from December 2019 to March 2020, and the additional group consisted of 4 observers, who performed the data analysis from June 2021 to July 2021. None of the beginner observers had previous experience with corneal neuroanalysis. Both the manual module (CCMetrics, CCM, Manual tracing of nerve fibers, Manchester University, Manchester, UK) and the fully automated module (ACCMetrics, ACCM, Automated tracing of nerve fibers, Manchester University, Manchester, UK) 44 were used. Before the study started, all observers were instructed to practice corneal nerve quantification using CCM on 20 IVCM images that were not used in formal evaluation.
The quantitative SCNP parameters measured in the study were: (1) corneal nerve fiber density (NFD) (numbers per square millimeter), (2) corneal nerve branch density (NBD) (numbers per square millimeter), (3) corneal nerve fiber length (NFL) (millimeters per square millimeter), and (4) corneal tortuosity coefficient (TC) (Fig. 5) 19,21,39. NFD is the total number of main nerve fibers (NFs) per frame divided by the surface area of the frame in square millimeters (area = 0.16 mm²; Fig. 5). NBD is the total number of main nerve branches (NBs, defined as the nerve branches that stem from an NF) divided by the surface area of the image frame. NFL is the total length of NFs, NBs, and secondary NBs (branches that stem from an NB) per frame. TC is a mathematical computation of the tortuosity of NFs previously described by Kallinikos et al. 3, which is independent of the angle of the nerve in the image. A straight nerve has a TC value of 0, and the value of TC increases as the tortuosity of the NF increases. NFD, NBD, and NFL were measured using both modules, while TC was measured only by the CCM module, as ACCM did not provide analysis of TC.
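The definitions above reduce to simple normalization by frame area. The following sketch, with hypothetical function and argument names, converts per-frame counts and traced length into NFD, NBD, and NFL using the 0.16 mm² frame area stated in the text. It is not the CCMetrics or ACCMetrics implementation, and TC is omitted because its formula is not given here.

```python
FRAME_AREA_MM2 = 0.16  # frame area stated in the text


def scnp_densities(main_fibers, main_branches, traced_length_mm, area_mm2=FRAME_AREA_MM2):
    """Normalize per-frame counts and traced length by the frame area."""
    return {
        "NFD": main_fibers / area_mm2,       # fibers per mm^2
        "NBD": main_branches / area_mm2,     # branch points per mm^2
        "NFL": traced_length_mm / area_mm2,  # mm per mm^2
    }


# Hypothetical frame: 4 main fibers, 8 branches, 3.2 mm of traced nerve in total.
print(scnp_densities(4, 8, 3.2))
```

Because every parameter is divided by the same small area, one additional main fiber in a frame changes NFD by 6.25 fibers/mm², which shows how a single differing judgement by an observer can shift the reported density.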
There were two sessions of evaluation. In the first evaluation, each observer used CCM and ACCM to evaluate the randomly distributed IVCM images of the subjects. The second evaluation was performed 14 days after completion of the first evaluation, and the observers were asked to repeat the analysis for all masked images after further randomization. Only CCM was used in the second evaluation, as the ACCM module was fully automated and the result would not change.
In both (b,c) images, red lines represent the main fibers, which were used to calculate nerve fiber density (NFD); green dots represent the junctions between the main fiber and the branch fiber (the blue line), which were used to calculate nerve branch density (NBD); NFL, nerve fiber length, represents the total length of red lines and blue lines in the image. The data from CCM (b) had obviously higher NFD, NBD, and NFL compared with the data from ACCM (c). Yellow arrows in (c) indicate the nerve fibers missed by ACCM.