Estimating Visual Field Mean Deviation using Optical Coherence Tomographic Nerve Fiber Layer Measurements in Glaucoma Patients

This study aimed to construct an optical coherence tomography (OCT) nerve fiber layer (NFL) parameter that has maximal correlation and agreement with visual field (VF) mean deviation (MD). The NFL_MD parameter, on a dB scale, was calculated from the peripapillary NFL thickness profile by nonlinear transformation and VF area-weighted averaging. From the Advanced Imaging for Glaucoma study, 245 normal, 420 pre-perimetric glaucoma (PPG), and 289 perimetric glaucoma (PG) eyes were selected. NFL_MD had significantly higher correlation with VF_MD than the overall NFL thickness (Pearson R: 0.68 vs 0.55, p < 0.001). NFL_MD also had significantly higher sensitivity in detecting PPG (0.14 vs 0.08) and PG (0.60 vs 0.43) at the 99% specificity level. NFL_MD had better reproducibility than VF_MD (0.35 vs 0.69 dB, p < 0.001). The differences between NFL_MD and VF_MD were −0.34 ± 1.71 dB, −0.01 ± 2.08 dB, 3.54 ± 3.18 dB, and 7.17 ± 2.68 dB for the PPG, early PG, moderate PG, and severe PG subgroups, respectively. In summary, OCT-based NFL_MD has better correlation with VF_MD and greater diagnostic sensitivity than the average NFL thickness. It has better reproducibility than VF_MD, which may be advantageous in detecting progression. It agrees well with VF_MD in early glaucoma but underestimates damage in the moderate to advanced stages.

The correlation between NFL_MD and VF_MD (dB) was significantly (p < 0.014) higher than that between overall average NFL thickness (either in µm or dB scale) and VF_MD, for both Pearson and Spearman coefficients (Fig. 1). If the VF_MD (dB) was transformed into VF sensitivity (1/Lambert), the correlation with NFL thickness actually became worse.
Difference analysis (Table 3) and Bland-Altman analysis (Fig. 2) showed that the agreement between NFL_MD and VF_MD was good in the PPG group, fair in the early PG group, and poor in the moderate PG and advanced-to-severe PG groups. There was an average bias toward better NFL_MD than VF_MD in the moderate to severe PG groups. The standard deviation of the difference between NFL_MD and VF_MD increased with increasing glaucoma severity. There were several outliers in the PPG, early PG, and moderate PG groups that had much worse NFL_MD than VF_MD (Figs. 2,3), whereas the NFL_MD was generally better than the VF_MD in the advanced-to-severe PG group. Overall, NFL_MD agreed well with VF_MD in the PPG and early PG stages, but in the later stages of glaucoma (moderate to severe PG), NFL_MD tended to underestimate glaucoma severity in comparison to VF_MD. One possible explanation for the discrepancy between NFL_MD and VF_MD in the moderate-to-severe PG stages is cataract severity. Therefore, we examined cataract severity and BCVA in the different stages of glaucoma (Table 3). No significant difference between stages was found.
Table 1. Characteristics of the Study Population. The characteristics of the study participants were averaged over the 4 consecutive study visits except for axial length and central corneal thickness, which were only measured at baseline. IOP = intraocular pressure; VF = visual field; MD = mean deviation; PSD = pattern standard deviation; NFLT = nerve fiber layer thickness.
The agreement between NFL_MD and VF_MD staging of glaucoma severity was compared using the modified Hodapp-Parrish-Anderson classification (Table 4). The NFL_MD staging is based on the value of NFL_MD only: Stage 0-1, NFL_MD ≥ −6 dB; stage 3, NFL_MD < −6 dB. The F1 score was used to assess agreement. It was a better metric than kappa in this case because of the imbalance in the chi-square tables (Table 4). The classification agreement was excellent in the PPG group (F1 score 0.99) and good in the PG group (F1 score 0.87). In the PG group, there was a tendency for NFL_MD to underestimate glaucoma severity stage compared to VF_MD.
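The staging comparison above can be illustrated with a short sketch of the F1 score computed from paired stage labels. This is a sketch under stated assumptions rather than the analysis code used in the study: the function names, the two-level "early"/"late" labels split at −6 dB, and the choice of "early" as the positive class are all hypothetical.

```python
def stage_from_md(md_db, threshold_db=-6.0):
    """Two-level stage label (hypothetical naming): 'early' if MD >= threshold."""
    return "early" if md_db >= threshold_db else "late"


def f1_score(reference, predicted, positive="early"):
    """F1 = 2*TP / (2*TP + FP + FN), with `reference` taken as the truth labels."""
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    denom = 2 * tp + fp + fn
    # F1 is undefined when the positive class never occurs in either labeling.
    return float("nan") if denom == 0 else 2 * tp / denom
```

Unlike kappa, the F1 score does not use the true-negative cell, so it stays informative when nearly all eyes fall into one stage.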
In aggregate analysis of all groups, the intra-class correlation of NFL_MD was as excellent as that of overall NFL thickness for both within-visit repeatability (0.988 vs 0.988) and between-visit reproducibility (0.978 vs 0.968).
The reproducibilities of NFL_MD and VF_MD were also assessed by the pooled root-mean-square residual of linear regression over 4 consecutive visits in glaucoma eyes (Table 5). This could be viewed as the standard deviation between visits adjusted for the glaucoma progression trend between visits. Overall, NFL_MD had better reproducibility than VF_MD (0.35 vs 0.69 dB, p < 0.001). For both NFL_MD and VF_MD, the reproducibility was best at the earliest stage of glaucoma and worsened in the more severe stages. NFL_MD had better reproducibility than VF_MD at all stages, and the difference was significant in the PPG and early PG stages.
The diagnostic accuracy of NFL_MD was compared with VF_MD and the two best NFL diagnostic parameters on linear micron scales (Table 6). For the discrimination between PPG and normal groups, NFL_MD had significantly (p < 0.001) better diagnostic accuracy, as measured by AROC, than overall NFL thickness. NFL_MD also had higher diagnostic sensitivity at both the 95% and 99% specificity cutoffs than overall NFL thickness (p ≤ 0.001, McNemar test) and inferior NFL thickness (p ≤ 0.01). For discrimination between the PG and normal groups, NFL_MD had significantly (p < 0.001) higher AROC than overall NFL thickness, marginally higher (p = 0.09) AROC than inferior NFL thickness, and significantly (p < 0.006) higher sensitivity than both micron-scale NFL parameters at both 95% and 99% specificity. Other differences between NFL_MD and other parameters were not significant. Overall, the consistent pattern was that NFL_MD had better diagnostic accuracy than micron-scale NFL parameters.
We provided the diagnostic accuracy measures for VF_MD as background information. However, because abnormal VF is an inclusion criterion for the PG group and exclusion criterion for the PPG group, the selection bias makes it difficult to draw conclusions regarding the diagnostic accuracy of VF_MD. But it is remarkable that despite the selection bias in favor of VF_MD, NFL_MD actually achieved a higher diagnostic accuracy.
Several examples are shown to give insight on why NFL_MD might perform differently from overall NFL thickness (micrometer scale) and VF_MD (Fig. 3). The example in Fig. 3A shows that overall NFL thickness could be abnormally low in a normal eye with uniformly thin NFL, yet NFL_MD could remain within normal limits. This demonstrates how NFL_MD could have improved diagnostic specificity over NFL thickness in people with normally thin NFL. In Fig. 3B, NFL_MD was abnormal due to focal defects in the superotemporal and inferotemporal sectors while the overall NFL thickness remained within normal range because other sectors had above normal thickness (positive sector dB values). This demonstrates how NFL_MD could have improved diagnostic sensitivity because the logarithmic (dB) scale and VF area weighting emphasized focal thinning in the characteristic glaucoma pattern. Figure 3C shows an early PG eye where NFL_MD was much worse than VF_MD, probably because the eye already started with thin NFL prior to glaucoma damage; the pattern of NFL thinning was both diffuse and focal. Figure 3D shows an advanced PG eye where the NFL_MD was much better than VF_MD, probably because the eye started with thicker than average NFL; in sectors less affected by glaucoma the NFL thickness remained above average (positive dB values).

Discussion
Visual field and OCT measurements are both commonly used for the diagnosis and monitoring of glaucoma [13][14][15]. Unfortunately, VF parameters and OCT-based NFL thickness parameters do not correlate well with each other [16][17][18]. This poses challenges in the staging and monitoring of glaucoma, given the potential for discordant functional and structural results.
One reason for the low correlation between NFL and VF is the disparate scales on which they are measured. NFL thickness parameters (i.e. overall, quadrant, octant, and sector averages) are measured using a linear µm scale, while VF parameters (i.e. mean deviation, pattern standard deviation, and visual field index) are measured in dB using a logarithmic scale. To harmonize the two types of measurements, Malik et al. suggested that the correlation between VF and NFL should be either in linear-linear scale or logarithm-logarithm scale 19.
To convert OCT measurements to a scale more consistent with VF testing, investigators have used quadratic, broken stick and logarithmic transformations 16,17,20,21. Machine learning has also been used to transform OCT information into estimates of retinal sensitivity (a VF measure) 22,23. In Kihara's deep learning model, localized slices from B-scans were directly used to estimate the retinal sensitivity point-by-point using a convolutional neural network with a regression output 23.
Other investigators have converted VF results to a linear scale. Hood et al. suggested a linear model to relate NFL thickness and VF sector retinal sensitivity (linear 1/Lambert unit) 24, using a modified Garway-Heath sector scheme 25. Hood also showed that it is necessary to subtract the NFL thickness floor value in order to find the best correspondence with linearized VF measures. Wu et al. used a similar model on a different structure-function correspondence map generated by Kanamori et al. 21.
We believe that converting OCT measurements to a logarithmic scale is a superior strategy for determining the rate of disease progression, as compared to converting VF parameters to a linear scale. Caprioli et al. showed that the worsening of VF_MD, on the usual dB scale, decelerates with respect to time in the more advanced stages 26. If VF_MD is transformed from dB to linear scale, this nonlinearity would be even more exaggerated, with rapid progression in the early stages and very little change in the later stages. Indeed this is what is found when glaucoma is monitored with OCT NFL measurements on a linear micron scale: there is more rapid progression in early stages and almost no change in the advanced stage 8. It makes sense that in advanced stages of glaucoma, when few retinal nerve fibers remain, there would be very little further thinning of the NFL. Yet it is important to monitor the rate of thinning as a percentage of what remains, as even a few μm of thinning at the advanced stages could have a large impact on vision and quality of life. Thus, using a logarithmic (dB) scale to measure glaucoma may facilitate change detection across the entire spectrum of glaucomatous disease severity. Statistical considerations also favor the logarithmic scale, as we have found the log-log correlation to be better than linear-linear correlation between VF and NFL thickness (Fig. 1).
Table 3. Mean Deviations, Cataract Density, and Visual Acuity Stratified by Glaucoma Severity. Group mean ± standard deviation. The best-corrected visual acuity (BCVA) was analyzed in the form of the logarithm of minimum angle of resolution (logMAR). LogMAR values of 0, 0.1, and 0.2 are equivalent to Snellen acuity of 20/20, 20/25, and 20/32.
Figure 3. Examples showing how nerve fiber layer mean deviation (NFL_MD) could behave differently from visual field mean deviation (VF_MD) and overall nerve fiber layer thickness (NFLT) as diagnostic parameters. Visual field (VF) total deviation maps are shown in the left column. The sectoral retinal nerve fiber layer (NFL) thickness in decibel (dB) scale is shown in the middle column. (A) a normal eye with diffusely thin NFL; (B) an early perimetric glaucoma (PG) eye with focal VF and NFL defects; (C) an early PG eye with NFL_MD more than 6 dB worse than VF_MD; (D) an advanced PG eye with NFL_MD more than 11 dB better than VF_MD.
In order to improve the correlation with VF_MD, it is insufficient to simply transform the overall NFL thickness from a µm to dB scale. It is necessary to perform the logarithmic transformation on a point or sector basis, and then perform the averaging operation using weights that are proportional to VF area. We demonstrated that this NFL weighted logarithmic average, compared to a simple logarithmic transform of the NFL average thickness, was better correlated with VF_MD. This result is consistent with the finding by some investigators that the correlation between VF and NFL is higher for sector averages than for the overall average 18,27.
The NFL-weighted logarithmic average still exhibited a floor effect in eyes with moderate-to-severe glaucoma. Thus a final quadratic fit was used to obtain the NFL_MD, an OCT-based optimized estimate for VF_MD. Compared to overall NFL thickness using a linear scale, NFL_MD demonstrated much better correlation with VF_MD. The agreement between NFL_MD and VF_MD was good in the PPG and early PG stages. However, NFL_MD still significantly underestimated VF damage in the moderate PG stage and markedly underestimated VF damage in the advanced-to-severe stages. Thus the clinician needs to exercise caution in applying NFL_MD to glaucoma staging.
There are several reasons for this discrepancy. The lower limit of −12.8 dB that we placed on the sector NFL value is not nearly as low as the worst VF total deviation on a pointwise basis, which has a bottom limit of −33 dB on the Humphrey Field Analyzer 28. While we could lower the bottom limit to extend the dynamic range of NFL_MD, this would significantly worsen the repeatability because of NFL measurement noise. Since our primary goal for developing the NFL_MD was to improve glaucoma monitoring, we want to maintain the reproducibility advantage of NFL_MD over VF_MD across all stages of glaucoma. Thus some remaining discrepancy in the advanced stages of glaucoma may be unavoidable. Other reasons for discrepancy between NFL_MD and VF_MD include cataract, other media opacities and optical aberrations, dry eye, and psychophysical limitations on the subject's test taking ability. These may explain some outlier points where VF_MD was poor while NFL_MD was near normal. In these cases, NFL_MD may provide a more accurate assessment of glaucoma severity than VF_MD. On the other hand, error in NFL_MD could be introduced by image processing (i.e. segmentation) error and anatomic changes such as retinal edema and epiretinal membrane.
The largest source of discrepancy may be unavoidable variation in NFL thickness within the normal population. The standard deviation of overall NFL thickness in our sample was 8.4 µm, 8.5% of the normal average value of 99.7 µm. Thus the 95% confidence interval of NFL_MD would be −1.5 to +0.9 dB simply from normal population variation. If the eye were to have a −6 dB (75%) loss of nerve fibers from baseline, the 95% confidence interval due to the variation from their starting point would be −14.1 to −5.0 dB according to our NFL_MD formula. Thus one can see that the agreement between NFL_MD and VF_MD would deteriorate in the more advanced stages of glaucoma simply due to the variation in normal NFL thickness and its floor value. Although we have reduced this variation by adjusting for age and axial length 29, most of this variation is random and cannot be adjusted for. Thus the use of NFL_MD in the staging of glaucoma would always be hampered by the fact that each of us is born with a different NFL thickness.
Compared to conventional µm-scale NFL thickness, NFL_MD correlates better with VF_MD. But this correlation is still not good in moderate and severe glaucoma stages, and this poses a limitation for the monitoring of glaucoma progression. For the objective monitoring of glaucoma progression in the more advanced stages, structural OCT measurement of the macular ganglion cell complex 8,9,12 and optical coherence tomographic angiography (OCTA) measurements of perfusion 11,30-34 may perform better. The methods developed here to improve VF correlation and diagnostic accuracy could be applied to those other OCT and OCTA measurements as well.
We found that NFL_MD had significantly better glaucoma detection sensitivity at both 95% and 99% specificity diagnostic cut-points, compared to VF_MD and the best conventional NFL diagnostic parameters (overall average and inferior quadrant). While we did not intentionally optimize NFL_MD for glaucoma diagnosis, we believe the improved diagnostic performance is due to the weighted logarithmic averaging step. Converting the sector NFL measurements to a dB scale emphasizes focal defects, and weighting by VF area emphasizes the inferior and superior arcuate areas most often affected by glaucoma. To illustrate, a 5% uniform diffuse loss of NFL thickness in an average normal eye would yield an NFL_MD of −0.22 dB, well within the normal range. But a 55% loss in the inferior-most inferotemporal sector (16-division sectors), while still giving a 5% reduction in overall average NFL thickness (still within normal range), would yield an NFL_MD of −1.89 dB, which crosses the 99%-specificity diagnostic threshold for glaucoma. Glaucoma damage in the early stages tends to be focal and most likely in the sectors weighted most by VF area (inferotemporal and superotemporal). Thus the higher diagnostic accuracy of NFL_MD may be due to its ability to accentuate focal loss in any of the likely sectors.

Conclusion
In conclusion, we have developed a method to simulate VF_MD based on OCT NFL measurements. The resulting parameter is called NFL_MD. Compared to conventional NFL parameters, NFL_MD has improved correlation with VF_MD. NFL_MD is on a dB scale that corresponds to VF_MD, and thus the speed of glaucoma progression measured by NFL_MD is easier to interpret than conventional NFL parameters. NFL_MD has better reproducibility than VF_MD, thus it may allow earlier detection of significant glaucoma progression. We plan to study the use of NFL_MD in monitoring glaucoma progression using the AIG dataset in upcoming publications.

Method
Data. Data from the Advanced Imaging for Glaucoma (AIG) study were analyzed in this study. AIG was a bioengineering partnership (R01 EY013516) and multi-site longitudinal prospective clinical study sponsored by the National Eye Institute (ClinicalTrials.gov identifier: NCT01314326). The study design and baseline participant characteristics have been reported previously 35, and the Manual of Procedures is publicly available online (www.AIGStudy.net). The study procedures adhered to the Declaration of Helsinki, which guides studies involving human subjects. Written informed consent was obtained from all patients for participation in the study. Proper institutional review board approvals were obtained from all participating institutions. The study was in accordance with the Health Insurance Portability and Accountability Act of 1996 (HIPAA) privacy and security regulations. This study was approved by the Institutional Review Board (IRB) of Oregon Health & Science University.
In this study, data collected from the normal (N), pre-perimetric glaucoma (PPG) and perimetric glaucoma (PG) participants from the AIG study were analyzed.
Both eyes of normal participants met the following criteria: VF tests within normal limits, IOP < 21 mm Hg, and normal optic nerve on slit-lamp biomicroscopy.
Eyes enrolled in the PPG group had glaucomatous optic neuropathy as evidenced by diffuse or localized thinning of the neuroretinal rim or NFL defect on fundus examination, but normal VF with pattern standard deviation (PSD, P > 0.05) and glaucoma hemifield test (GHT) within normal limits.
Eyes enrolled in the PG group had glaucomatous optic neuropathy as evidenced by diffuse or localized thinning of the neuroretinal rim or NFL defect on fundus examination, and corresponding repeatable VF defects with PSD (P < 0.05) or GHT outside normal limits.
Exclusion criteria common to all groups included best-corrected visual acuity (BCVA) worse than 20/40, evidence of retinal pathology, or history of keratorefractive surgery. Cataract was not an exclusion criterion for AIG enrollment, but the cataract density (grade 0 to 4) was recorded. For the analysis in this article, we excluded eyes with cataract density worse than 2 or BCVA worse than 20/30 during any of the 4 visits analyzed in this article.
Normal participants were followed every 12 months and glaucoma participants were followed every 6 months. OCT and VF testing were performed at all follow-up and baseline visits for PG/PPG participants. In order to improve the repeatability of the measurements in the same eye, we averaged measurements from the 4 earliest consecutive visits that had complete OCT and VF data for glaucoma participants.
(2019) 9:18528 | https://doi.org/10.1038/s41598-019-54792-w | www.nature.com/scientificreports
Visual field testing. The visual field was assessed by standard automated perimetry on the Humphrey Field Analyzer (HFA II; Carl Zeiss Meditec, Inc, Dublin, California, USA) using the Swedish Interactive Thresholding Algorithm 24-2. The minimum requirement for reliability included less than 15% fixation losses, less than 33% false positives, and less than 33% false negatives. The VF test was done at baseline for all participants, and then every 6 months for glaucoma participants and every 4 years for normal participants.
Nerve fiber layer thickness measurement and conversion to decibel scale
Spectral-domain optical coherence tomography. Participants were scanned with spectral-domain OCT (RTVue, Optovue, Inc, Fremont, California, USA). The optic nerve head (ONH) and 3-D Disc scans were used to map the optic nerve head and nerve fiber layer. Three ONH scans were obtained in each visit for disc and NFL thickness measurements. One Disc 3D scan was obtained at the baseline visit. The OCT data were exported from the OCT machine of each clinical center and sent to the OCT reading center for grading. In the OCT reading center, OCT data were analyzed using REVue software (Version 6.12, Optovue). First, the center of the optic disc was identified on the Disc 3D scan, and was used to register the disc positions in all subsequent ONH scans. Then NFL thickness maps (1.3 to 4.9 mm) were measured from the ONH scans; an NFL thickness profile was resampled on a 3.4-mm diameter circle centered on the disc 36. The process was automated, but the grader needed to validate the data. Scans with failed segmentation, cropping, low signal strength index (SSI ≤ 37), or decentration > 0.75 mm were excluded from further analysis. Among the repeated ONH scans in the same visit, one scan was randomly picked for further analysis and comparison to the single VF test available for each visit.
Age and axial length correction. In the normal group, we found a significant association of NFL thickness with age and with axial length (p < 0.001). Thus a multiple linear regression was used to correct the NFL thickness. The regression was applied to each sector separately. Based on the regression, the sector NFL thickness was corrected to the reference age and axial length. The reference age was set to 50 years to match the VF test 37. The reference axial length was set to the average axial length (23.6 mm) of the emmetropic (spherical equivalent refraction between −1.00 and +1.00 D) eyes in the normal group.
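A minimal sketch of the correction step follows, assuming an ordinary least-squares fit with age and axial length as the only predictors, centered on the reference values given above. The function names and the plain normal-equation solver are hypothetical; the study itself used MATLAB.

```python
REF_AGE = 50.0  # reference age in years (from the text)
REF_AL = 23.6   # reference axial length in mm (from the text)


def fit_age_al(thickness, age, axial_length):
    """Fit thickness = b0 + b_age*(age - REF_AGE) + b_al*(AL - REF_AL) in normal eyes."""
    rows = [(1.0, a - REF_AGE, l - REF_AL) for a, l in zip(age, axial_length)]
    k = 3
    # Augmented normal equations [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, thickness))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                factor = m[r][c] / m[c][c]
                m[r] = [x - factor * y for x, y in zip(m[r], m[c])]
    return tuple(m[i][k] / m[i][i] for i in range(k))


def correct_thickness(thickness, age, axial_length, coef):
    """Shift one measured sector thickness to the reference age and axial length."""
    _, b_age, b_al = coef
    return thickness - b_age * (age - REF_AGE) - b_al * (axial_length - REF_AL)
```

With centered predictors, the fitted intercept is the expected normal thickness at the reference age and axial length. In this sketch the fit would be repeated for each sector.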
Floor value of nerve fiber layer thickness. The NFL floor value refers to the residual thickness of NFL in end-stage glaucoma. This thickness represents the remaining glial tissue and secondary scar tissue. In order to estimate the fraction of nerve fibers that has been lost, it is necessary to know both the reference value from a normal population and the floor value from areas of severe glaucoma damage. When 100% of the nerve fibers are present, the NFL thickness is close to the normal reference value. At the other extreme, an NFL thickness near the floor value indicates that the nerve fiber survival is near 0%. To estimate the floor value, we selected eyes with severe glaucoma according to the modified Hodapp-Parrish-Anderson criteria (VF_MD < −12 dB). In each of these eyes, the NFL sector with end-stage damage was identified as the sector with the lowest NFL thickness as a percentage of the normal reference. The residual percentages from the worst sectors of these eyes were then used to estimate the floor value.
Converting nerve fiber layer thickness to a logarithmic decibel scale. The following formula was used to transform NFL thickness on a µm scale to NFL loss on a dB scale.
$$\mathrm{NFL}_{dB} = 10 \times \log_{10}\!\left(\frac{\mathrm{NFL} - f}{N - f}\right)$$
where f was the floor; N was the normal reference (average value of healthy eyes in our normal group). This conversion formula could be applied to either overall or sector NFL thickness values. The normal reference and floor were adjusted for age and axial length in the above formula. Multiple linear regression was performed to fit axial length and age to NFL thickness for each sector, quadrant, or overall average. Then the normal references were generated from the fitting equation. The floor value for NFL thickness was adjusted for axial length, but not age 9.
We limited the minimum value of NFL_dB to −12.8 dB to avoid extremely negative dB values that could be obtained when NFL thickness is near the floor. The −12.8 dB minimum is equivalent to 5% above the floor value. This limit was based on the coefficient of variation of sector NFL thickness of 5% for repeat measurements in normal eyes.
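The conversion and its lower limit can be sketched as follows. The sketch assumes the conversion takes the form 10 × log10((NFL − f)/(N − f)), which is consistent with the definitions of f and N above and with −6 dB corresponding to a 75% loss. The function name and the example thickness values in the usage note are hypothetical.

```python
import math

MIN_DB = -12.8  # lower limit on sector NFL dB (from the text)


def nfl_to_db(thickness_um, normal_um, floor_um, min_db=MIN_DB):
    """Convert an NFL thickness to dB relative to the normal reference.

    Assumed form: 10*log10 of the fraction of thickness remaining above the floor.
    """
    remaining = (thickness_um - floor_um) / (normal_um - floor_um)
    if remaining <= 0:
        # At or below the floor the logarithm is undefined; return the limit.
        return min_db
    return max(min_db, 10.0 * math.log10(remaining))
```

For example, with a hypothetical normal reference of 100 µm and floor of 45 µm, a sector measuring 58.75 µm retains 25% of the thickness above the floor and converts to about −6.02 dB.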
Weighted logarithm average of sector NFL thickness. In order to simulate the VF_MD, we calculated a weighted average of sector NFL_dB. The weight was set to the VF area corresponding to NFL bundles passing through a particular peripapillary sector. To determine weights, we used a modified Garway-Heath scheme to estimate the VF area (Fig. 5). The 6 sectors of the original Garway-Heath scheme were divided into 8 sectors by adding superior-inferior divisions 25,38. In the VF map, the test points were divided along the horizontal center line. In the peripapillary profile, the dividing line was the maculopapillary axis temporally and the horizontal midline nasally. The Garway-Heath sectors were originally defined at the disc rim; we extended these sector divisions outward from the disc edge to the 3.4-mm diameter circle along the average trajectory of nerve fibers obtained using a published flux analysis in normal human subjects 39. The weight in each of the 8 sectors was set to the number of VF test points in the corresponding VF sector. The weights in these 8 sectors were interpolated to obtain weights for the 16 evenly divided sectors (Fig. 2C). With these weights, we calculated the NFL weighted logarithm average (NFL_WLA) using the following formula:
$$\mathrm{NFL}_{WLA} = \frac{1}{52}\sum_{i=1}^{16} w_i \,\mathrm{NFL}_{dB}(i)$$
where w_i is the weight of sector i; NFL_dB(i) is the NFL loss in dB for sector i; the number 52 is the summation of weights.
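The weighted averaging step can be sketched as below. The 16 sector weights shown are placeholders chosen only so that they sum to 52; the actual interpolated weights are not listed in the text. The function name is also hypothetical.

```python
# Placeholder weights for 16 sectors (NOT the study's values); they sum to 52.
EXAMPLE_WEIGHTS = [2, 2, 3, 4, 5, 5, 4, 3, 2, 2, 3, 4, 5, 4, 2, 2]


def weighted_log_average(sector_db, weights=EXAMPLE_WEIGHTS):
    """Weighted mean of sector NFL dB values, normalized by the sum of weights."""
    if len(sector_db) != len(weights):
        raise ValueError("one dB value is required per weighted sector")
    return sum(w * d for w, d in zip(weights, sector_db)) / sum(weights)
```

Because the averaging is done after the logarithmic transform, a deep defect confined to one heavily weighted sector lowers the result more than the same amount of tissue loss spread evenly over all sectors.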
Simulation of visual field mean deviation. In order to reduce measurement noise, we averaged NFL parameters and VF_MD from 4 consecutive visits for glaucoma eyes. The first 4 consecutive visits with spectral-domain OCT scans were selected. When VF_MD was plotted against NFL_WLA, it was clear that the relationship was still significantly nonlinear. Thus a quadratic regression was used to fit the NFL_WLA to VF_MD using all eyes from the normal, PPG and PG groups. The intercept was fixed at zero with the a priori knowledge that an average normal NFL thickness profile should correspond to an average normal VF. Five-fold cross validation was used to avoid bias due to overfitting. For each fold, NFL_MD was then estimated in the validation subset using the corresponding fitting result. The NFL_MD values obtained in the 5 folds were pooled for the statistical analysis.
Statistical analysis. To remove the between-eye correlation, the linear mixed effects model was used to compare the mean values of parameters between groups. The chi-square test was used for comparing gender between groups. The linear mixed effects model was applied to estimate the Pearson and Spearman correlation coefficients between NFL parameters and VF_MD 40,41. A percentile bootstrap method was used to compare the correlation coefficients 42.
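The zero-intercept quadratic fit with five-fold cross validation described under "Simulation of visual field mean deviation" can be sketched as below. The sketch assumes ordinary least squares and a random assignment of eyes to folds; the text does not state how folds were formed (for example, whether both eyes of a participant were kept together), and all names are hypothetical.

```python
import random


def fit_quadratic_zero_intercept(x, y):
    """Least-squares fit of y = a*x + b*x**2 (no constant term)."""
    s2 = sum(v ** 2 for v in x)
    s3 = sum(v ** 3 for v in x)
    s4 = sum(v ** 4 for v in x)
    sxy = sum(v * w for v, w in zip(x, y))
    sx2y = sum(v * v * w for v, w in zip(x, y))
    det = s2 * s4 - s3 * s3
    # Cramer's rule on the 2x2 normal equations.
    a = (sxy * s4 - s3 * sx2y) / det
    b = (s2 * sx2y - s3 * sxy) / det
    return a, b


def cross_validated_nfl_md(wla, vf_md, k=5, seed=0):
    """Estimate each eye's NFL_MD from a fit that excluded that eye's fold."""
    idx = list(range(len(wla)))
    random.Random(seed).shuffle(idx)
    nfl_md = [None] * len(wla)
    for fold in range(k):
        held_out = idx[fold::k]
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_quadratic_zero_intercept([wla[i] for i in train],
                                            [vf_md[i] for i in train])
        for i in held_out:
            nfl_md[i] = a * wla[i] + b * wla[i] ** 2
    return nfl_md
```

Omitting the constant term forces an NFL_WLA of 0 dB to map to an NFL_MD of 0 dB, matching the stated a priori constraint.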
To assess the between-visit reproducibility, the residual of linear regression over time was calculated for the 4 consecutive visits in glaucoma eyes. This was applied to the overall NFL thickness, NFL_MD, and VF_MD. The residuals were pooled by groups stratified by glaucoma severity. Glaucoma severity was staged by a modified Hodapp-Parrish-Anderson (HPA) classification system: Stage 0 - PPG, Stage 1 - early PG (MD ≥ −6 dB), Stage 2 - moderate PG (−12 dB ≤ MD < −6 dB), and Stage 3 - severe PG (MD < −12 dB) 7.
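The reproducibility measure described above can be sketched as follows: a straight line is fitted to each eye's values over its visits, and the residuals from all eyes in a group are pooled by root mean square. The sketch divides by the number of residuals without a degrees-of-freedom correction, which is an assumption; the function names and data layout are hypothetical.

```python
import math


def linear_residuals(t, y):
    """Residuals of an ordinary least-squares line fitted to (t, y)."""
    n = len(t)
    mt = sum(t) / n
    my = sum(y) / n
    sxx = sum((ti - mt) ** 2 for ti in t)
    slope = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y)) / sxx
    intercept = my - slope * mt
    return [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]


def pooled_rms_residual(eyes):
    """Pool residuals across eyes; `eyes` is a list of (times, values) pairs."""
    squares = [r * r for t, y in eyes for r in linear_residuals(t, y)]
    return math.sqrt(sum(squares) / len(squares))
```

Removing each eye's own linear trend first means that steady progression over the visits does not count as measurement variability.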
Intra-class correlation was used to compare the within-visit repeatability and the between-visit reproducibility of the overall NFL thickness average and NFL_MD 43 . The within-visit repeatability was based on scans in baseline visits. The between-visit reproducibility was based on pairwise analysis between the baseline and the first follow-up visit.
To assess agreement, the difference between NFL_MD and VF_MD was calculated in each eye from each visit. The mean difference was averaged over the 4 consecutive visits and then averaged again in each of the 4 stages. The standard deviation was calculated by pooling the difference over the 4 consecutive visits by root mean square. It was then pooled again in each of the 4 stages. The difference between NFL_MD and VF_MD was also assessed by Bland-Altman analysis. Agreement between NFL_MD and VF_MD for glaucoma staging was assessed by the F1 score.
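A Bland-Altman summary of the kind mentioned above can be sketched as below. This is the textbook form (bias and bias ± 1.96 standard deviations of the paired differences) with one value per eye; it does not reproduce the per-visit pooling described in the text, and the function name is hypothetical.

```python
import statistics


def bland_altman(nfl_md, vf_md):
    """Bias and 95% limits of agreement for paired NFL_MD and VF_MD values (dB)."""
    diffs = [a - b for a, b in zip(nfl_md, vf_md)]
    means = [(a + b) / 2 for a, b in zip(nfl_md, vf_md)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation
    return {
        "bias": bias,
        "sd": sd,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "points": list(zip(means, diffs)),  # (x, y) pairs for the plot
    }
```

A positive bias in this sketch means that NFL_MD is better (less negative) than VF_MD on average.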
The diagnostic accuracy of separating the PPG and PG groups from the normal group was evaluated by the area under the receiver operating characteristic curve (AROC), and the sensitivities at 95% and 99% specificity cutoffs with generalized estimating equations 43,44. The cutoff thresholds were based on the mean and standard deviation from normal eyes after the age and axial length adjustment, assuming normal distribution. The 95%/99% specificity cutoff was set at 1.65/2.33 standard deviations (SD) below the mean of the normal group. The overall and inferior NFL thickness values had a normal distribution in the normal group according to the Kolmogorov-Smirnov normality test. VF_MD and NFL_MD had normal distributions only after transformation from dB to linear scale; therefore their diagnostic cutoff values were calculated on the linear scale and then transformed back to the dB scale.
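The cutoff rule can be sketched as follows. The sketch assumes that the dB-to-linear transformation is 10^(dB/10) and that the sample standard deviation is used; neither detail is stated in the text, and the function names are hypothetical. It does not include the generalized estimating equations used to account for between-eye correlation.

```python
import math
import statistics

# Standard deviations below the normal mean for each specificity (from the text).
Z_BELOW_MEAN = {0.95: 1.65, 0.99: 2.33}


def cutoff(normal_values, specificity):
    """Cutoff at z standard deviations below the mean of the normal group."""
    z = Z_BELOW_MEAN[specificity]
    return statistics.mean(normal_values) - z * statistics.stdev(normal_values)


def db_cutoff_via_linear(normal_db, specificity):
    """Compute the cutoff on a linear scale, then convert back to dB."""
    linear = [10 ** (v / 10) for v in normal_db]  # assumed linearization
    c = cutoff(linear, specificity)
    if c <= 0:
        raise ValueError("linear cutoff is not positive; no dB equivalent")
    return 10 * math.log10(c)


def sensitivity(disease_values, cut):
    """Fraction of diseased eyes falling below the cutoff."""
    return sum(v < cut for v in disease_values) / len(disease_values)
```

Fixing the cutoff from the normal group alone means that sensitivities of different parameters are compared at the same nominal specificity.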
All statistical analyses were done using MATLAB with the statistical toolbox.