Introduction

Detecting the progression of glaucoma is a challenge for the clinician. Traditionally, the most commonly used quantitative techniques involved the mean deviation (MD) of the 24-2 visual field (VF), obtained with standard automated perimetry. With the advent of optical coherence tomography (OCT), the average thickness of the circumpapillary retinal nerve fibre layer (cRNFL) became a common measure of progression. This measure, called global cRNFL thickness, has been incorporated into commercial OCT reports. With the recent incorporation of OCT scanning of the macula, an average (global) measure of the retinal ganglion cell plus inner plexiform layer (RGCLP) thickness also has been employed to track progression, and a number of studies have compared these two OCT global measures [1,2,3,4,5,6,7].

However, these two measures, global cRNFL (GONH) and global RGCLP (Gmac), miss early glaucomatous damage clearly visible on probability/deviation maps, which display abnormal regions of RNFL and/or RGCLP thickness [8,9,10]. Thus, it is likely that these two measures will also miss clear progression of glaucoma, while also falsely identifying some eyes as progressors.

Our purpose here was to understand the problems involved in using global OCT measures for detecting progression in early glaucoma. First, we show, as expected, that the conventional thickness measures, GONH and Gmac, combined with a traditional event-based analysis, lead to both excessive false positives (FPs) and false negatives (FNs). Second, and most importantly, we identify the reasons for these errors via a post-hoc analysis.

Methods

Participants

There were 104 study eyes from 104 individuals; 76 were from glaucoma or glaucoma suspect patients. The remaining 28 eyes were healthy controls (HCs) with normal fundus examination, normal VFs, and IOP < 22 mmHg. All eyes had 24-2 MD better than -6 dB and at least two OCT scans: a baseline scan and a scan obtained at least 1 year after the baseline (mean: 24.9 ± 8.7 months, range 12–42 months). All individuals were enrolled in Columbia University’s prospective study, Macular Damage in Early Glaucoma and Progression (ClinicalTrials.gov: NCT02547740).

Study procedures followed the tenets of the Declaration of Helsinki and Health Insurance Portability and Accountability Act and were approved by the Institutional Review Board of Columbia University. Written informed consent was obtained from all participants.

OCT data

Widefield (12 × 9 mm) swept-source OCT volume scans (Atlantis; Topcon Inc., Tokyo, Japan) were obtained for each eye. Every scan was rotated to a common fovea-to-disc angle, which accounted for head-eye torsion and, to some extent, anatomical differences, as previously described [11] and as currently incorporated in a commercial report similar to the one in Fig. 1, which was generated by our custom programme. A derived B-scan image (Fig. 1Aa) was generated from the widefield scan for a circle 3.45 mm in diameter centred on the optic disc. The cRNFL thicknesses were measured (black-magenta-blue-black curve in Fig. 1Ab). An RNFL thickness map (Fig. 1Ad) was obtained from the widefield scan. A portion of the widefield scan, 6 × 6 mm centred on the fovea, was used to produce an RGCLP thickness map (Fig. 1Ae). Age-corrected RNFL (Fig. 1Ac) and RGCLP (Fig. 1Af) probability maps were created based on these thickness maps and normative controls [12].

Fig. 1: Local defect.

Example of an eye in the likely progression reference standard (RS-P) that showed “statistical progression” according to the global RGC (Gmac) metric, but not according to the global RNFL (GONH) metric. The arrows indicate damage in the baseline (A) and follow-up (B) reports, which was not detected by the GONH metric. The red and black arrows indicate inferior and superior damage, respectively, both of which are subtle local arcuate regions showing progression. Panels a–f show essential parts of the one-page report: derived circle B-scan (a) and its corresponding cRNFL thickness plot (b); RNFL (d) and RGC (e) thickness maps; RNFL (c) and RGC (f) probability/deviation maps.

Establishing progression with OCT summary metrics

Global cRNFL (GONH) and global RGCLP (Gmac) average thicknesses were calculated for each eye at each visit. The thresholds [95% confidence interval (CI)] to identify statistically significant event-based progression in the study group were derived from a short-term group after performing quantile regression [13], which is analogous to how event-based progression is defined with commercially available VFs and OCT. Details of this event-based methodology are provided in the Supplementary Information (Supplementary Fig. 1).

These 95% thresholds were then applied to the 104 eyes of the study group. Eyes whose change in the GONH or Gmac metric on the follow-up test was equal to or greater than the 95% CI threshold were classified as “statistical progressors”.
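The event-based rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's code: the function name `is_statistical_progressor`, the sign convention (change taken as baseline minus follow-up, so thinning is positive), and the example thicknesses are all assumptions made for the sketch. In the study, the thresholds came from quantile regression on a short-term group and took a range of values.

```python
def is_statistical_progressor(baseline_um: float,
                              followup_um: float,
                              threshold_um: float) -> bool:
    """Flag event-based progression when thinning reaches the threshold.

    Assumption: change is baseline minus follow-up, so thinning is positive.
    """
    thinning_um = baseline_um - followup_um
    return thinning_um >= threshold_um


# Hypothetical thickness values, for illustration only (not study data).
print(is_statistical_progressor(90.0, 86.0, threshold_um=3.4))  # 4.0 um of thinning
print(is_statistical_progressor(90.0, 88.0, threshold_um=3.4))  # 2.0 um of thinning
```

The comparison is inclusive, matching the "equal to or greater than" wording of the rule.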

Reference standard (RS) for progression

Our objective here was to identify factors affecting changes in GONH or Gmac by analysing B-scans (e.g., Fig. 1Aa) and probability maps (e.g., Fig. 1Ac, f) of possible FPs and possible FNs. To identify the eyes that were possible FPs and possible FNs, a reference standard (RS) was used. In particular, four of the authors independently decided on progression or no progression after evaluating all available OCT and VF tests, and all OCT reports with probability maps (Fig. 1). For the 104 study eyes, the average number of visits was 8.3 ± 2.6. Initially, the experts agreed for 98 eyes, and consensus was reached for the remaining 6 after they reviewed the cases together.

Results

Progressors according to metrics

The two global summary metrics (i.e., GONH and Gmac) identified a similar number of patient eyes as ‘statistical progressors’; 24 for GONH and 25 for Gmac (Fig. 2A). About half, 12 eyes, were ‘statistical progressors’ according to both metrics.

Fig. 2: Comparison of the performance of the global RNFL (GONH) and global RGCL (Gmac) metrics in identifying progression, against progressors in the clinical reference standard (RS-P).

The eyes are split into patient (A) and healthy (B) eyes.

The GONH and/or Gmac metric also identified 7 of the 28 (25%) HC eyes as “statistical progressors” (Fig. 2B). These seven eyes were clearly FPs, as they were HCs with no signs of glaucomatous damage. Of these seven FPs, two were FPs on GONH and six on Gmac, with one eye an FP on both.

Comparison of clinical RS and summary metrics

Based upon the RS, 31 of the 76 (40.8%) patient eyes showed signs of progression (RS-P), while none of the 28 HC eyes were identified as RS-P.

True positives based upon RS

Of the 31 RS-P eyes, 11 eyes (35.5%) were identified by both metrics as statistical progressors (Fig. 2A). All 11 showed clear signs of progressing damage on both the RNFL and RGCLP thickness and probability maps. An example of a true positive for both metrics is provided in the Supplementary Information (Supplementary Fig. 2/Supplementary Video 1).

FNs based upon RS

Only four (12.9%) RS-P eyes were missed by both metrics (Fig. 2A). All four showed clear glaucomatous damage when the entire report was evaluated.

Fig. 3: Differences in widefield centring.

Example of an eye with different disc centring between baseline (A) and follow-up (B) scans. The red arrows and dashed red lines confirm the misalignment by the position of the blood vessel shadows on the circle scan. A second example shows different foveal centring (white crossed lines) in the baseline (C) and follow-up (D) RGC+ probability plots. The black arrows indicate areas with subtle differences in the probability values due to foveal centring rather than true glaucomatous progression.

The fact that only four eyes were missed by both GONH and Gmac underestimates the extent of the problem with the clinical use of these metrics. Suppose we were to use “abnormal on GONH OR Gmac” for clinical decision making. Then, although the FN rate for RS-P would be 12.9% (4 eyes), the FP rate for the HCs would be 25% (Table 1). Thus, we need to understand the FNs for GONH and Gmac alone. A total of 20 (64.5%) of the 31 RS-P eyes were missed by one or both metrics. That is, in addition to the 4 missed by both, 16 other eyes were missed by only one of the two metrics.

Table 1 Percent/(number) of false negatives (FN) and false positives (FP) based on global retinal nerve fibre layer (GONH) and global retinal ganglion cell plus inner plexiform layer (Gmac) metrics.

Missed only by Gmac

Five (5) of the 31 eyes categorised as RS-P were identified as ‘statistical progressors’ on the GONH, but not Gmac, metric. Three of the five eyes showed clear thinning on the RGCLP thickness map, even though the Gmac metric failed to identify the eye as a progressor.

Missed only by GONH

Eleven (11) of the 31 eyes categorised as RS-P were identified as ‘statistical progressors’ on the Gmac, but not the GONH, metric. Seven of these 11 GONH FN eyes showed clear progressive thinning of the RNFL, which was not detected by the GONH metric. Figure 1 shows the reports for one of these eyes. The arrows point to corresponding regions with clear progression in the inferior retina and disc (red) and the superior retina and disc (black).

Three additional examples are provided in the Supplementary Information where the metrics failed to detect the RS-P eyes correctly (Results). One eye was missed by both metrics (Supplementary Fig. 3/Supplementary Video 2), while another only by GONH (Supplementary Fig. 4/Supplementary Video 3), and the last only by Gmac (Supplementary Fig. 5/Supplementary Video 4).

FPs based upon RS-NP

First, of the 28 HC eyes, GONH and Gmac falsely classified 2 (GONH) and 6 (Gmac) eyes as statistical progressors. Further, of the 45 patient eyes judged by the RS not to be progressing (RS-NP), 8 (GONH) and 3 (Gmac) eyes were classified as “statistical progressors”, with 1 eye judged as progressing by both metrics.
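Because one eye in each group was flagged by both metrics, the number of eyes flagged by GONH OR Gmac follows from inclusion-exclusion. The short sketch below reproduces that arithmetic from the counts given in the text; the helper name `flagged_by_either` is hypothetical.

```python
def flagged_by_either(n_gonh: int, n_gmac: int, n_both: int) -> int:
    """Eyes flagged by GONH OR Gmac, counting the overlap only once."""
    return n_gonh + n_gmac - n_both


# Counts taken from the text: (GONH, Gmac, both).
hc_fp = flagged_by_either(2, 6, 1)      # healthy control eyes
rs_np_fp = flagged_by_either(8, 3, 1)   # patient eyes judged RS-NP

print(hc_fp, rs_np_fp)
```

These totals (7 HC eyes and 10 RS-NP eyes) are the counts behind the FP rates for the OR criterion.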

Post-hoc analysis of FP and FN

A post-hoc analysis was performed to understand the possible reasons for the disagreement between the metrics and the RS. This analysis identified three possible reasons: (1) local damage; (2) disc and fovea centring; and (3) segmentation errors.

Local damage

Of the 20 FN eyes missed by one or both metrics, 6 had local defects (2 FNs on both metrics, 3 on GONH, and 1 on Gmac). The reports (panels A and B in Fig. 1) are for an eye “progressing” according to the Gmac, but not the GONH. Local defects in both the superior (black arrows) and inferior (red arrows) retina deepen over time. The GONH metric missed this local damage.

Differences in centring of derived circle or fovea

In six of the eyes where the GONH metric disagreed with the RS (four FN, two FP), the reports showed a small difference in the centring of the optic disc between scan dates. Figure 3A, B shows an example where the disc was centred differently on the two reports. This resulted in a change in the location of the derived circle scan, as can be seen by the shadows of the blood vessels (red arrows and dashed lines), and hence in an FN for GONH. For these six eyes, the change in GONH was small (average of 3.8 μm), only just outside the 95% CI. (Overall, based upon the quantile regression, the 95% CI for GONH ranged from 3.2 to 3.6 μm.) Note that in five of these six eyes, Gmac, which does not depend upon disc centring, agreed with the RS.

A similar problem can occur via small differences in the centring of the fovea for the Gmac analysis. This appeared to be the primary reason for five HC eyes that were FP only on the Gmac. For example, in Fig. 3C, D, the ring-like artefact in the RGCLP probability plot (known to be due to anatomical differences of the fovea) suggests a small difference in centring [14]. For the five eyes, the Gmac change ranged from only 1.4 to 1.5 μm, a large value relative to the 95% CI which ranged from 1.0 to 1.4 μm. Note that foveal centring should only affect the Gmac. Consistent with this, GONH agreed with the RS for all five eyes.

Segmentation errors

Segmentation errors can affect the metrics. Figure 4A shows an example where the segmentation, secondary to a scanning artefact, clearly affected the Gmac value. While large errors such as this were rare, more subtle segmentation errors undoubtedly occurred and would be harder to detect. Figure 4B shows an example where a subtle segmentation error (red arrows) resulted in a decrease in the cRNFL thickness in the follow-up scan of this HC eye. The GONH value changed by 4.6 μm, resulting in an FP, as the 95% CI was 3.1 μm. By superimposing the cRNFL plots for the two scan dates (lower right panel), we estimate that this segmentation error contributed about 4 μm to the change in GONH. Thus, small segmentation errors can lead to FP or FN errors.

Fig. 4: Segmentation errors.

Example of a scanning artefact in the macula (A), indicated by the black arrows on the baseline RGC+ scans in the bottom row. Example of a segmentation error around blood vessels (B), indicated by the red arrows.

Discussion

We evaluated the performance of two common metrics used for detecting progression of glaucoma, global cRNFL thickness (GONH) and global RGCLP thickness (Gmac). Consistent with previous studies, these metrics identified a similar number of eyes with a standard event-based technique [15,16,17]. In particular, the metrics identified 24 (GONH) and 25 (Gmac) eyes as “statistical progressors,” with 12 eyes progressing on both. Further, we demonstrated that these conventional thickness measures, combined with a traditional event-based analysis, resulted in both excessive FPs and FNs. A post-hoc analysis uncovered reasons for their poor performance, which was the main purpose of this study.

An evaluation of metrics

Based upon the RS for the patients, the metrics had relatively high FN and FP rates, as shown in Table 1. For example, for the eyes showing progression according to our RS (RS-P), the FN rates for GONH and Gmac were 48.4% (15 eyes) and 29.0% (9 eyes), respectively (columns 1 and 2, row 1). Given that only four eyes were missed by both metrics, if we classify an eye as a “progressor” based upon an abnormal GONH OR an abnormal Gmac, then the FN rate, 12.9% (column 3, row 1), is considerably lower. However, this OR criterion will increase the FP rate (i.e., decrease specificity). In particular, 10 of the 45 RS-NP eyes would be identified as statistical progressors based upon an abnormal GONH OR Gmac, for an FP rate of 22.2% and a specificity of 77.8% (column 3, row 2). Further, 7 of the 28 HC eyes would be identified as statistical progressors, for an FP rate of 25% and a specificity of 75%. Thus, GONH and Gmac metrics are a poor method for detecting progression in this population of eyes with early glaucoma.
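The rates quoted in this paragraph can be recomputed directly from the eye counts. The sketch below does so in Python; the counts are those stated in the text, while the helper `rate_percent` and the variable names are assumptions introduced for illustration.

```python
def rate_percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


N_RS_P, N_RS_NP, N_HC = 31, 45, 28  # group sizes stated in the text

# False negatives among RS-P eyes (eyes missed by each criterion).
fn_rates = {
    "GONH": rate_percent(15, N_RS_P),
    "Gmac": rate_percent(9, N_RS_P),
    "GONH OR Gmac": rate_percent(4, N_RS_P),
}

# False positives and specificity for the OR criterion.
fp_rs_np = rate_percent(10, N_RS_NP)
spec_rs_np = rate_percent(N_RS_NP - 10, N_RS_NP)
fp_hc = rate_percent(7, N_HC)
spec_hc = rate_percent(N_HC - 7, N_HC)

print(fn_rates)
print(fp_rs_np, spec_rs_np, fp_hc, spec_hc)
```

Laid out this way, the trade-off is easy to see: the OR criterion lowers the FN rate to 12.9%, but at the cost of FP rates of 22.2% (RS-NP) and 25% (HC).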

Why are metrics performing poorly?

We identified three reasons why these global metrics perform poorly. First, they can miss local damage. The fact that local damage can be missed is understandable as both metrics are based upon averages of regions larger than these local defects. Second, we found that subtle segmentation errors can produce changes in GONH and Gmac that are large relative to the criterion change used to identify progression. Finally, relatively subtle changes in centring of the fovea or disc can also produce changes in GONH and Gmac. As a test of concept, we simulated changes in the centring of the fovea and the disc. According to these simulations, small changes in the centre of the disc can produce a change in GONH equal to the average 95% CI cutoff. This is consistent with a 2009 study by Cheung et al. [18]. Based upon older time domain OCT circle scans, they estimated that offsets as small as 0.1 mm in disc centring produced on average a change in GONH of 2.3 μm. Similarly, we found changes in the centre of the fovea as small as 0.5° (about 0.14 mm) can produce a change in Gmac equal to or more than the average 95% CI cutoff.
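To illustrate why an average can hide a local defect, the following sketch builds a purely hypothetical circumpapillary thickness profile and applies a deep but narrow loss. All profile values (360 samples, a uniform 100 μm baseline, a 20 μm loss over 36 samples) are invented for illustration; only the 3.4 μm cutoff is taken from the text, where it is the average 95% CI value for GONH.

```python
N_SAMPLES = 360          # assumption: one thickness sample per degree of the circle
BASELINE_UM = 100.0      # hypothetical uniform cRNFL thickness
DEFECT_DEPTH_UM = 20.0   # hypothetical local thinning
DEFECT_WIDTH = 36        # hypothetical defect spanning 36 of the 360 samples
CUTOFF_UM = 3.4          # average GONH 95% cutoff reported in the text

baseline = [BASELINE_UM] * N_SAMPLES
followup = list(baseline)
for i in range(DEFECT_WIDTH):
    followup[i] -= DEFECT_DEPTH_UM  # thin only the defect region

# Change in the global average versus the largest local change.
global_change = sum(baseline) / N_SAMPLES - sum(followup) / N_SAMPLES
max_local_change = max(b - f for b, f in zip(baseline, followup))

print(global_change, max_local_change, global_change >= CUTOFF_UM)
```

In this toy profile a 20 μm local loss moves the global average by only 2 μm, which is below the cutoff, so an event-based rule applied to the average would not flag the eye.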

There are two important points to be made about segmentation and centring problems. First, all algorithms make segmentation errors and correcting them is difficult in general, and typically not feasible in a clinical practice [19,20,21]. Likewise, small changes in centring of disc and/or fovea are difficult to impossible to avoid [22, 23]. Segmentation will affect centring and so will head tilt into the plane of the scan. Currently, there is no way to correct the latter. Second, relatively small changes fall outside the 95% CI for these metrics. In this study, average changes of only 3.4 μm (GONH) and 1.6 μm (Gmac) are needed. Thus, although the changes in these metrics caused by segmentation and centring are small, they can still lead to both FPs and FNs [18].

Given these three problems, it is not surprising that global metrics are suboptimal for identifying progression. Further, there is no easy fix for these problems. Conventional clinical standards, such as Zeiss’ Glaucoma Progression Analysis (GPA), use longer series (usually at least four tests) in an attempt to overcome some of these issues. Trend- and event-based analysis of a series of tests can potentially reduce the ‘noise’ and exclude outliers, although it is likely that local damage will still be missed, and segmentation and centring errors will still contribute to variability. However, there is a more fundamental problem inherent in the trend-based analysis. We have argued that analyses of long series of tests do not fully answer a crucial clinical question that physicians face in a glaucoma clinic; that is, “has glaucoma progressed since the last visit?” [24].

Our 95% CI values and the literature

Previous studies using different OCT instruments arrived at a 95% CI near 5 μm for the GONH metric [25,26,27]. This led to the “Rule of 5 μm” used by some clinicians [28], who consider changes in GONH of more than 5 μm as indicating progression. In a longitudinal study, Thompson et al. concluded that a 95% CI of 5 μm resulted in too many FPs due to test-retest variability [28]. Our 95% CI value for GONH was on average 3.4 μm, smaller than 5 μm. Had we used 5 μm instead, it would have reduced the FP rate, but increased the FN rate, leaving accuracy about the same (Table 1, column 5). The accuracy of these global metrics is poor. Thus, changing cutoffs will only trade off sensitivity vs. specificity; it will not improve accuracy.
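The trade-off between cutoffs can be made concrete with a toy example. The thinning values below are invented purely for illustration and were chosen so that the progressing and stable groups overlap; they are not study data, and the function name `error_counts` is hypothetical. The sketch simply counts FNs and FPs at the two cutoffs discussed above.

```python
# Hypothetical GONH thinning values in micrometres (invented, not study data).
progressing = [1.0, 2.5, 3.6, 4.2, 5.5, 7.0]  # eyes assumed to be truly progressing
stable = [0.2, 1.1, 2.0, 3.5, 4.1, 5.2]       # eyes assumed to be truly stable


def error_counts(cutoff_um: float) -> tuple[int, int]:
    """Return (false negatives, false positives) for a given cutoff."""
    fn = sum(1 for x in progressing if x < cutoff_um)
    fp = sum(1 for x in stable if x >= cutoff_um)
    return fn, fp


results = {}
for cutoff in (3.4, 5.0):
    fn, fp = error_counts(cutoff)
    total = len(progressing) + len(stable)
    results[cutoff] = (fn, fp, (total - fn - fp) / total)
    print(cutoff, fn, fp, round(results[cutoff][2], 3))
```

With these made-up values, raising the cutoff from 3.4 to 5.0 μm swaps FPs for FNs while the total number of errors stays the same, which is the pattern described in the paragraph above.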

What is the alternative?

We have previously argued that OCT global metrics will miss damage that can be seen on reports such as those in Fig. 1 [12, 29]. As in the case of early detection, we are suggesting that trained observers will outperform GONH and Gmac metrics if they have these reports. Of course, there may be some purposes, such as clinical trials, where qualitative evaluations are not appropriate. For these purposes, we need to find alternatives to global metrics. For detection of glaucoma, we have shown success with an objective structure–function method, as well as a deep learning approach [11, 30,31,32,33]. Similar approaches can be applied to progression. For example, the clinician can topographically compare the changes in the VF to the changes in the OCT probability maps, as well as topographically compare the changes in the different OCT maps and images.

Limitations

There are three limitations to this study worth mentioning. First, the sample is relatively small, although it is hard to see how more eyes will change the fundamental findings here. Second, the design suffers from the general problem facing studies of progression. There is no “gold standard” or “litmus test for progression.” In this study we used an RS based on the consensus of four experts after evaluation of all available structural and functional information. Other progression studies have used, for example, Zeiss’ GPA to confirm the presence of deterioration [34, 35]. Thus, applying different RS will produce different estimates of FP and FN. However, our general conclusions regarding the problems with these metrics should hold. See the Supplementary Figures for proof of concept.

Finally, the eyes in this study were all “early glaucoma,” as defined by 24-2 MD better than -6 dB at baseline. The results here need to be extended to more advanced glaucoma. While it is generally held that one cannot use OCT for eyes with GONH values less than about 50 μm, we have recently shown this is not true [36].

Conclusions

Global statistics such as average cRNFL thickness (GONH) and average RGCLP thickness (Gmac) will miss or overcall progression of glaucoma. There are inherent problems with these methods that will be difficult, if not impossible, to correct. In particular, as they are averages, they can miss local defects. Further, they are prone to FP and FN mistakes due to subtle segmentation and alignment errors of the fovea and disc centres. Approaches are needed which do not rely on these metrics and instead focus on the topographical agreement among the cRNFL, RGCLP, and RNFL thickness measures.

Summary

What was known before

  • Average (global) measures of the circumpapillary retinal nerve fibre layer (cRNFL) and the retinal ganglion cell plus inner plexiform layer (RGCLP) thickness are common measures of progression. However, these two measures, global cRNFL (GONH) and global RGCLP (Gmac), miss early glaucomatous damage. Thus, it is likely that these two measures will also miss clear progression of glaucoma, while also falsely identifying some eyes as progressors.

What this study adds

  • Global metrics GONH and Gmac can lead to both false positives and false negatives because of problems inherent in OCT scanning, such as segmentation and centring. In addition, they can miss local damage (false negatives). These problems are difficult, if not impossible, to correct, and raise concerns about the advisability of using global metrics for detecting progression.