Introduction

Automated perimetry is widely used to monitor individual patients with glaucoma and as an outcome measure for glaucoma intervention studies. The lack of a gold standard against which glaucomatous change can be measured has led to the development of arbitrary or empirical criteria for change.1 All change criteria used to assess visual field change are hampered by the fact that points in automated visual fields are known to show intra- and inter-test variability.2 This variability is greater in glaucomatous fields than normal ones.3 Criteria developed for clinical use have been ‘Arbitrary’ (x points changing by y dB represents significant change) or ‘Empiric’ (derived from the study of a population followed over time).1 However, as the Normal-tension study group reported, even when empiric criteria are developed, their application to real data can lead to unexpectedly high rates of field change that require adjustment of the field criteria.1

Pointwise linear regression (PLR) was developed as a technique for dealing with the intertest variability in threshold seen with automated visual fields.4 Rather than compare baseline field(s) with an index field, it analyses the point thresholds from all the fields produced during the follow-up period.4 It has been shown to be capable of closely emulating the Statpac analysis on the HFA5 and capable of detecting progression earlier than Statpac.6 Early PLR criteria were arbitrary, and the investigators using them had no ‘gold standard’ to use, a situation that persists. The criteria chosen required one6, 7, 8, 9, 10, 11 or two points12, 13 to be changing. The minimum critical slope was specified as being <0 dB/year,8, 9, 13 <−1 dB/year,6, 10, 12 <−2.4 dB/year7 or <normal age decline.11 The level of significance or P-value used has not always been reported;11, 12 but where it has, the values chosen have been P<0.0009,13 P<0.001,8, 9, 10 and P<0.05.6, 7

Criteria from some of the studies mentioned above were used to analyse the visual fields from the UK Medical Research Council (MRC)/Moorfields 5-FU Filtration (More Flow) Surgery Study. The analysis revealed either no change (1 point changing, slope <−1 dB/year, P<0.001)8, 9, 10 or large amounts of change (1 point changing, slope <−1 dB/year, P<0.05).6, 7 With the latter criteria, the amounts of change detected suggested that 64% of the patients were showing progression and/or improvement over their first 16 months of follow-up. It was clear that using established criteria on our data set led to the detection of unrealistic amounts of change. We thus sought to explore the effect that altering PLR progression criteria has on the overall detection of change, with the ultimate aim of obtaining more realistic change criteria. In selecting a PLR change criterion, we first had to consider the underlying causes of any apparent change in retinal threshold that was detected. When change is detected in a visual field series, the following three factors are likely to be responsible:

  a) neuronal damage

  b) spurious change due to learning14 and fatigue15

  c) noise.3

Ideally, a criterion would only detect changes due to retinal ganglion cell damage. Spurious change due to learning and fatigue was minimised owing to the way that the trial protocol had been established. Patients were already experienced in automated fields prior to entering the study. Once enrolled, they underwent additional testing prior to surgery. Furthermore, the trial patients were seen in special clinics by a dedicated team, with rest periods factored in to the schedule.

Applying numerous criteria to a data set, after the data have been collected, could yield criteria that appeared to be superior through chance alone. We therefore validated what are still empirical criteria against a second independent data set. Furthermore, we have explored a method for deriving criteria for future field data sets whether obtained in clinical practice or in trials. As more data from the trial become available, criteria applicable to longer follow-up will be obtainable using these techniques.

Method

Data for this study were obtained from the ongoing UK MRC Moorfields 5-FU Filtration (More Flow) study. Low-risk patients undergoing primary trabeculectomy are randomised to receive intraoperative 5-FU or placebo. As part of the study, patients are followed with regular Humphrey Field Analyzer full threshold 24-2 fields. Patients must have a diagnosis of glaucoma with an intraocular pressure (IOP) >21 mmHg and a cupped optic disc. The study entry conditions require that only patients with a persistent visual field defect be recruited. The trial definition of field defect is a Humphrey 24-2 visual field with

  • 2 locations >5 dB less than age-corrected normal and/or

  • 1 location >10 dB less than age-corrected normal

The subject must be able to perform the field test with <20% false positives, <33% false negatives and <20% fixation losses.

Once enrolled, the patients undergo a refraction followed by at least two preoperative baseline visual field tests. In order to minimise the effect of learning, the patient undergoes further preoperative field testing if the mean deviation (MD) difference between successive baseline tests is ≥2 dB. Only one eye from any patient is entered in the trial.

Following surgery, each eye undergoes field testing at 3-monthly intervals in the first year and 4-monthly intervals thereafter. Only the final baseline field is included in any analysis. Where the same eye is tested twice on the same postoperative visit, only the first field test is used.

From the data collected, two sets of field data were defined and made available for analysis from the following patients:

  i) the initial 56 patients to complete 16 months follow-up,

  ii) the next 97 patients who completed 16 months follow-up.

In both data sets, the number of fields available for analysis was 6 including the last field obtained before surgery.

An initial analysis of the first data set was performed using PLR criteria as well as the following conventional progression criteria that were applicable to this data set:

  1) Statpac global indices: mean deviation, pattern standard deviation, and corrected pattern standard deviation.16

  2) The AGIS grading system:17 individual patients had their AGIS scores calculated at baseline and at 16 months. A change of 4 or more was deemed to be significant.17

From the various PLR criteria tested, it became clear that some were capable of detecting reasonable amounts of progression while not detecting improvement, a pattern we term ‘exclusive progression’. In selecting criteria that did not detect improvement, we felt that we were minimising the detection of noise. Patients were experienced in automated visual fields and unlikely to be showing marked learning. We felt that criteria showing exclusive progression were likely to be detecting change due to retinal ganglion cell death. We therefore tested criteria showing exclusive progression against the second data set in order to validate them.

PLR analysis

For each change criterion, PLR was performed using the following steps:

  1) calculate the slope and level of significance of the threshold in decibels at each point, over the chosen number of fields.

  2) select a change criterion, for example, 2 contiguous points in any hemifield.

  3) count the number of eyes with points with a significant slope (P<0.01) fulfilling the criterion. A significant slope was deemed to be <−1 or >1 dB/year.

  4) repeat step 3 as the level of significance was varied from 0.001 to 0.05 in 0.001 steps.

Points were labelled as progressing or improving in a manner similar to that used by Smith et al.8 Those with significant negative slopes were said to be progressing whereas those with positive slopes were said to be improving. If sufficient progressing and improving points were present in the same analysis, then an eye was labelled as ‘both’, that is, showing both progression and improvement.
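The per-point regression and labelling described above can be sketched in Python as follows. This is a minimal illustration, not the software used in the study: the function names (`two_sided_t_p`, `ols_slope_and_p`, `label_point`, `label_eye`), the choice of ordinary least squares with a two-sided t-test on the slope, and the example thresholds are all assumptions introduced here. Test times are given in years so that slopes come out in dB/year.

```python
import math

def two_sided_t_p(t_stat, df, steps=400):
    """Two-sided P-value of a t statistic, by Simpson integration of the tail.

    Uses the substitution u = sqrt(df) * tan(theta), which turns the
    Student t tail into an integral of cos(theta) ** (df - 1).
    """
    theta0 = math.atan(abs(t_stat) / math.sqrt(df))
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    h = (math.pi / 2 - theta0) / steps  # steps must be even for Simpson's rule
    f = lambda th: math.cos(th) ** (df - 1)
    total = f(theta0) + f(math.pi / 2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(theta0 + i * h)
    return 2 * const * total * h / 3

def ols_slope_and_p(times, values):
    """Least-squares slope of threshold (dB) on time (years) and its P-value."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    rss = sum((v - intercept - slope * t) ** 2 for t, v in zip(times, values))
    df = n - 2
    se = math.sqrt(rss / df / sxx)
    if se == 0:  # perfectly straight line: no residual scatter
        return slope, (1.0 if slope == 0 else 0.0)
    return slope, two_sided_t_p(slope / se, df)

def label_point(slope, p, p_cut=0.05, critical_slope=1.0):
    """Label one test location from its slope (dB/year) and P-value."""
    if p < p_cut and slope < -critical_slope:
        return "progressing"
    if p < p_cut and slope > critical_slope:
        return "improving"
    return "stable"

def label_eye(point_labels, min_points=1):
    """Label an eye from its point labels; min_points is the criterion size."""
    progressing = point_labels.count("progressing") >= min_points
    improving = point_labels.count("improving") >= min_points
    if progressing and improving:
        return "both"
    if progressing:
        return "progression"
    if improving:
        return "improvement"
    return "no change"

# Six fields: baseline, then 3, 6, 9, 12 and 16 months, expressed in years.
times = [0.0, 0.25, 0.5, 0.75, 1.0, 16 / 12]
declining = [30.0, 29.2, 28.6, 27.7, 27.1, 26.0]  # made-up thresholds in dB
slope, p = ols_slope_and_p(times, declining)
print(round(slope, 2), label_point(slope, p, p_cut=0.001))
```

With six fields the regression has only four degrees of freedom, so a slope must be large relative to the residual scatter before it reaches a strict P-value; this is one reason the choice of P-value has such a marked effect on the amount of change detected.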

The first data set of 56 patients (data set i) was analysed using the following criteria, and each required the specified number of points to have significant slopes:

  1) 1, 2, 3, 4, or 5 points anywhere in the field.

  2) 2 contiguous points in any vertical hemifield.

  3) 3 contiguous points in any vertical hemifield.

  4) 2 points in a glaucoma hemifield test (GHT) cluster.18

  5) 2 points in a perimetric nerve fibre bundle cluster (PNFB).19, 20

  6) 2 points in an optic nerve head (ONH)-derived cluster.21

To facilitate comparison with the GHT and PNFB clusters the ONH cluster arrangement was bisected along the horizontal meridian. Figure 1 shows the cluster layout for the GHT, PNFB, and ONH clusters. Two of the PNFB cluster areas shown in Figure 1 only cover a single test point in a 24-2 field. Given the expected low variability of the single cluster close to the fovea, a significantly changing point in these clusters was sufficient to label a patient as changing. Elsewhere in the field, 2 points had to be changing in the same cluster to label an eye as changing. The single clusters adjacent to the blind spot are likely to exhibit high variability being close to a physiological scotoma22 and were therefore excluded.
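The cluster rule described above can be sketched as follows. The cluster map used in the example is entirely hypothetical (the real GHT, PNFB, and ONH layouts are those of Figure 1 and are not reproduced here), and the function and parameter names are assumptions made for illustration.

```python
from collections import Counter

def eye_changes_in_cluster(point_labels, cluster_of, target="progressing",
                           min_points=2, single_point_clusters=(),
                           excluded_clusters=()):
    """Return True if enough points with the target label share one cluster.

    point_labels: dict mapping test location -> "progressing"/"improving"/"stable"
    cluster_of:   dict mapping test location -> cluster identifier
    single_point_clusters: clusters where one changing point is sufficient
    excluded_clusters:     clusters ignored altogether
    """
    counts = Counter(
        cluster_of[loc]
        for loc, label in point_labels.items()
        if label == target
        and loc in cluster_of
        and cluster_of[loc] not in excluded_clusters
    )
    for cluster, n_points in counts.items():
        needed = 1 if cluster in single_point_clusters else min_points
        if n_points >= needed:
            return True
    return False

# Hypothetical five-location map: "C" stands in for a single-point cluster
# near the fovea, "D" for an excluded cluster next to the blind spot.
cluster_of = {0: "A", 1: "A", 2: "B", 3: "C", 4: "D"}
labels = {0: "progressing", 1: "progressing", 2: "stable", 3: "stable", 4: "stable"}
print(eye_changes_in_cluster(labels, cluster_of,
                             single_point_clusters={"C"}, excluded_clusters={"D"}))
```

Calling the same function with `target="improving"` gives the corresponding test for improvement, so an eye can be labelled as progressing, improving, or both under any one cluster layout.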

Figure 1. Glaucoma hemifield test, perimetric nerve fibre bundle, and optic nerve head cluster arrangements.

From PLR analysis, criteria were derived that, at any point through the range of significance tested, were capable of detecting progression without detecting improvement—‘exclusive progression’. An arbitrary minimum of four eyes showing exclusive progression was required. This was approximately 1/10 the rate of progression detected in preliminary analysis of the data using the criterion of 1 point changing with a slope of 1 dB/year, P<0.05 (36/56 eyes), or 1 point changing on Statpac II analysis, P<0.05 (40/56 eyes). These criteria were then ‘Validated’ by applying them to data set ii, the next 97 patients to enter the trial and be followed for 16 months. We then looked to see if any criterion was capable of detecting exclusive progression in both data sets.
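The search for ‘exclusive progression’ across the range of significance levels can be sketched as below. This is an illustrative outline under stated assumptions, not the authors' code: each eye is represented as a list of hypothetical `(slope, p)` pairs, one per test location, the criterion is the simple ‘n points anywhere in the field’ form, and the function names are invented here.

```python
def classify_eye(points, p_cut, critical_slope=1.0, min_points=1):
    """Classify one eye from its per-point (slope in dB/year, P-value) pairs."""
    n_prog = sum(1 for slope, p in points if p < p_cut and slope < -critical_slope)
    n_imp = sum(1 for slope, p in points if p < p_cut and slope > critical_slope)
    if n_prog >= min_points and n_imp >= min_points:
        return "both"
    if n_prog >= min_points:
        return "progression"
    if n_imp >= min_points:
        return "improvement"
    return "no change"

def sweep_p_values(eyes, min_points=1, critical_slope=1.0):
    """Count progressing, improving and 'both' eyes for P = 0.001 ... 0.050."""
    rows = []
    for i in range(1, 51):
        p_cut = i / 1000  # integer stepping avoids accumulated rounding error
        labels = [classify_eye(eye, p_cut, critical_slope, min_points) for eye in eyes]
        rows.append((p_cut,
                     labels.count("progression"),
                     labels.count("improvement"),
                     labels.count("both")))
    return rows

def exclusive_progression_cuts(rows, min_eyes=4):
    """P-value cut-offs at which progression is detected with no improvement."""
    return [p_cut for p_cut, n_prog, n_imp, n_both in rows
            if n_prog >= min_eyes and n_imp == 0 and n_both == 0]

# Made-up data: four eyes with two declining points, one eye with two rising points.
eyes = [[(-2.0, 0.004), (-1.5, 0.009)] for _ in range(4)]
eyes.append([(2.0, 0.03), (1.8, 0.02)])
cuts = exclusive_progression_cuts(sweep_p_values(eyes, min_points=2))
print(cuts[0], cuts[-1])
```

In this toy example the four declining eyes are flagged once the cut-off exceeds 0.009, and the rising eye only once it exceeds 0.03, so there is a window of cut-offs in which progression is detected without improvement.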

Results

The baseline demographics of data sets i and ii are shown in Table 1. There were no significant differences between the baseline values for the two patient groups. Values for MD, pattern standard deviation (PSD), short-term fluctuation (SF), and corrected pattern standard deviation (CPSD) at baseline, 3, 6, 9, 12, and 16 months are shown in Table 2. None of the changes at 16 months was significant when compared with the baseline values using a t-test.

Table 1 Baseline characteristics of the initial 56 and subsequent 97 patients to complete 16 months of follow-up
Table 2 Statpac Global Indices for each postoperative visit from baseline to 16 months

Out of the initial 56 patients followed for 16 months, five patients showed progression and two patients showed improvement in AGIS score. The definition of change was an increase or decrease in AGIS score of 4 or more.

PLR analysis

Figure 2 shows the numbers of eyes showing change for 1, 2, 3, 4, or 5 points as the level of significance was varied from 0.001 to 0.05. Regardless of the number of points specified, there is an increase in the amount of change detected as one moves from a strict to a lax P-value. The amount of change detected falls as one uses a change criterion that requires more points to have a significant slope. When using a criterion of 3 points or stricter, the detection of ‘both’ progression and improvement is abolished. Thus, with a single point criterion and a P-value <0.05, large amounts of progression and improvement are seen; 64% are progressing, 58% improving, and 39% showing both. Detection of some exclusive progression occurs using 3, 4, or 5 point change criteria; requiring 5 points to change results in at least four patients showing exclusive progression.

Figure 2. Basic analysis, data set i. Number of eyes showing change as P-value is varied from 0.001 to 0.05, with critical slope held constant at 1 dB/year. N=56 eyes.

Figure 3 shows the change detected when using criteria that specify the arrangement of points in the field. Overall, the amount of change detected is lower; at the P<0.05 level, 20 eyes show change using ‘2 points anywhere in the field’, whereas with ‘2 points in a PNFB cluster’, only 11 show progression. No eyes show both progression and improvement. At least four eyes show exclusive progression when using the criteria: ‘3 contiguous points’, ‘2 points in a GHT cluster’, and ‘2 points in a PNFB cluster’. Only one eye shows exclusive progression using ‘2 points in an ONH cluster’.

Figure 3. Spatial analysis, data set i. Number of eyes showing change as P-value is varied from 0.001 to 0.05, with critical slope held constant at 1 dB/year. Spatial arrangement of points. N=56 eyes.

Validation analysis

Having examined Figures 2 and 3, we selected criteria that were capable of detecting at least 4/56 eyes as exclusively progressing. The criteria selected were:

  a) 5 points anywhere in the field,

  b) 3 contiguous points in either hemifield,

  c) 2 points in a GHT cluster, and

  d) 2 points in a PNFB cluster.

These criteria were then applied to the second data set. Figure 4 shows the change detected when the criteria were applied to an independent group of 97 patients followed over six fields. The curve patterns are quite different, especially the relationship between the progressing and improving curves. Only one criterion ‘2 points in a PNFB cluster’ continues to be able to detect exclusive progression. When this arrangement of points was used to analyse the first data set, 5/56 (8.9%) eyes showed exclusive progression at P<0.026. Maximum exclusive progression is detected at P<0.013 in the second data set with 7/97 (7.2%) eyes demonstrating this at that point.

Figure 4. Validation analysis; criteria capable of detecting exclusive progression in data set i applied to data set ii. N=97 eyes.

Discussion

This study highlights the large amount of progression and improvement that can be detected using pre-existing pointwise linear regression change criteria. By refining change criteria and, in particular, by specifying a spatial arrangement of points, it is possible to produce criteria that will exclusively detect progression. We feel that it is beneficial to detect exclusive progression only, because such a criterion is more likely to be detecting change in retinal sensitivity due to retinal ganglion cell death rather than to noise or, in this data set, to learning or fatigue. If these criteria are applied to an independent data set, it is possible to select a spatial arrangement of points that continues to be capable of detecting exclusive progression. It is the criterion involving 2 points changing in a perimetric nerve fibre bundle cluster that offers the best option for exclusively detecting progression. This criterion uses the slope criterion of 1 dB/year. This has been used not only because it represents approximately 10 × the age-related decline per year but also because a patient with a normal sensitivity of 30 dB, if declining at such a rate, would take 30 years to lose all sensitivity at that point. This represents a clinically relevant end point within the lifetime of most glaucoma patients.
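As a worked check of the arithmetic behind the 1 dB/year critical slope (a sketch only; the symbols $S_0$, $r$, $T$, and $r_{\text{age}}$ are introduced here for illustration, and the age-related decline of about 0.1 dB/year is simply what the stated factor of 10 implies):

```latex
T = \frac{S_0}{r} = \frac{30\ \text{dB}}{1\ \text{dB/year}} = 30\ \text{years},
\qquad
r_{\text{age}} \approx \frac{1\ \text{dB/year}}{10} = 0.1\ \text{dB/year}.
```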

With only 16 months of follow-up, it is difficult to assess how much true change has occurred. Analysis of global visual field parameters shows no significant mean change over the 16 months of analysis performed. Given the slow rate of change of similar parameters noted in other studies, it would be more remarkable if significant change had been noted. In the Glaucoma Laser Trial, the mean change in decibels per test location improved for the first 3 years of the trial.23 It has also been shown that global parameters improve following trabeculectomy.24 Two retrospective studies of visual field change confirm the slow rate of change in MD and CPSD. Smith et al8 examined the fields of glaucoma patients followed with automated fields for an average of 7.1 years. When linear regression of MD and CPSD was performed, only 12.6% had a significant change in MD and 14.1% had a significant change in CPSD. Those eyes with a significant slope showed a mean change in MD of −1.26 dB and in CPSD of 0.71 dB. A similar retrospective review by Katz et al9 over a median of 6.3 years found that only 17.9% of eyes showed a significant deterioration in MD and only 7.5% showed a significant change in CPSD.

Out of 56 patients followed for 16 months, five showed progression and two showed improvement where change was defined as an increase or decrease in AGIS score of 4 or more. This threshold was derived from the AGIS group, who found that retesting of fields produces a change of 4 less than 5% of the time.17 Thus, with random change, one might expect 2–3 patients to show improvement and 2–3 to show progression. On this reasoning, perhaps only two or three of the five patients are truly progressing, although it is impossible to say which. These figures for significant change in AGIS score, especially improvement, are markedly different from those seen in a previous study, where 7.5% showed progression and 11.9% showed improvement.25 It may well be that the patients' level of field experience in the MRC Moorfields 5-FU Filtration (More Flow) trial was greater and that this contributed to reduced levels of improvement. The change in mean AGIS score from 7.86 to 8.21 over 16 months was not significant. In the AGIS study, the change in score in Caucasian patients randomised to trabeculectomy–argon laser trabeculoplasty–trabeculectomy was approximately +0.15 over 18 months and was not significant.26 This probably reflects the inability of the scoring system to detect small change.
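The expected number of chance changes quoted above follows from simple arithmetic, on the assumption (made here for illustration) that the 5% retest figure is applied as an upper bound to each direction of change separately:

```latex
0.05 \times 56 = 2.8 \ \text{eyes per direction, that is, roughly 2--3 eyes.}
```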

Some of the early PLR analyses used simple criteria that required 1 point6, 7, 8, 9, 10, 11 or 2 points12, 13 to be changing. However, from the data shown here, it is clear that criteria such as 1 point changing with a slope of 1 dB/year and P<0.001 (refs 8, 9, 10) would probably not detect change over shorter periods of follow-up (Figure 2). Previous work has also noted the detection of improvement using PLR;8, 9, 10, 11 however, some early work assumed that those patients without points with significant negative slope should be labelled as not changing.7, 12 What is clear from the data in this study is the appearance of patients showing both progression and improvement when lax criteria are used. However, it is also clear that, with some additional modification, criteria that exclusively detect progression can be created. Either by requiring more points to have a significant slope or by requiring a spatial relationship between the points, a reasonable number of patients can be detected as exclusively progressing.

All three cluster arrangements are based on observations of the relationship between the nerve fibre layer and the visual field. The GHT clusters were defined by photographing the nerve fibre layer of normal subjects, making a line drawing of fibre arrangement, and then relating the fibre arrangement to the points tested in a 30-2 field.18 The PNFB cluster layout was derived by correlating high-resolution visual field defects with nerve fibre layer defects.19, 20 Subsequent work adapted the field clusters to the test points on a Humphrey Field Analyzer.27 The ONH clusters were derived by overlaying nerve fibre layer photographs with the corresponding test points from a 24-2 field. The nerve fibre layer defects were traced back to the optic nerve head, thus allowing a relationship between 24-2 test points and sectors of the optic nerve head to be derived.21

Validation of criteria on the second independent data set

Given the arbitrary nature of the criteria applied to the first data set, it is hard to make claims from this analysis alone. More interesting is what happens when these same criteria are then applied to an independent data set. Moving from the first to the second sets, one sees a general leftward shift in change curves, with a variable change in the spacing between progression and improvement curves. Three criteria perform badly on the test set, detecting no or minimal exclusive progression:

  a) 5 points anywhere in the visual field, slope=1 dB/year

  b) 3 contiguous points, slope=1 dB/year

  c) 2 points in a GHT cluster, slope=1 dB/year

Choosing an optimal progression criterion is now easier after the analysis involving the second data set. By comparing the results from the data sets, one is left with a single potential criterion: 2 points in a PNFB cluster, slope=1 dB/year. This expression of a change criterion does not stipulate a P-value. For the data here, the P-value would be P<0.026 for the first data set and P<0.013 for the second. The difference in P-values for the sets may represent random fluctuation or may be due to the larger size of the second set. The first set may have been too small to register any improvement at P-values <0.026.

With larger data sets, it might well be the case that low levels of improvement at strict P-values would make the detection of exclusive progression very difficult. One could foresee a criterion that, at worst, detected improvement in 5% of eyes when applied at a specified time point. A criterion that allowed for a specified amount of improvement would also allow for the initial improvement seen after trabeculectomy. Such a criterion could be developed without looking at the treatment code by analysing the data set as a whole. It could then be applied to the data from separate treatment arms. Ultimately, this is a technique for developing criteria for large groups of patients. Their applicability in the clinical setting to an individual patient will always be influenced by the clinician's knowledge of the extent of the pre-existing visual field defect, and other parameters, particularly IOP, optic disc, retinal changes, and media opacity.

All intraocular surgery is recognised as potentially being cataractogenic. Trabeculectomy is no exception, involving intraocular inflammation, alteration of aqueous flow, changes in the blood eye barrier, and prolonged exposure to topical steroids. Thus, although our patients were excluded from the trial if they were likely to develop cataract over 3 years, it is quite possible that some did go on to develop lenticular opacities. Some studies that have prospectively followed trabeculectomy patients have found low levels of cataract formation in the first 6 months after trabeculectomy. Clarke et al28 found no evidence of cataract causing a drop of 2 lines of vision over the first 6 months. A larger short-term study found that 42/508 (8.3%) eyes dropped 2 or more Snellen lines of acuity in the first 3 postoperative months.29 Of these, only 16 lost vision due to cataract. Other studies, especially those with longer follow-up, have suggested higher rates of cataract progression: 18.1% of patients developed cataract over 1 year in Robin's study,30 and 37.1% of Caucasian patients randomised initially to trabeculectomy in the AGIS study underwent cataract surgery over 7 years of follow-up.26 An additional factor to bear in mind is the possible impact of the use of antimetabolites. One study has shown that the use of mitomycin is associated with an increased risk of progression of cataract postoperatively.30 However, Egbert et al's31 study of intraoperative 5-FU found no differences in rates of cataract formation between treatment and control arms. None of the patients in the present study underwent cataract extraction during this phase of the study.

If one assumes that significant cataract formation was occurring, then it would be likely that a diffuse change in the visual field would result.32, 33 One can try to predict the effect of this diffuse change on the pointwise linear regression of fields: one could argue that significant change would be more likely to be detected in ‘normal’ areas where there was lower threshold variability.

Existing criteria for detecting change in automated visual fields using pointwise linear regression can thus be improved for the data sets shown here. Using simple criteria, the amount of change detected is so large as to make meaningful interpretation of a trial's results impossible. By exploring the change detected with new criteria on one data set and then validating the changes on an additional data set, it is possible to select an optimal criterion. A criterion that requires 2 points to change in a modified Perimetric Nerve Fibre Bundle Cluster19, 34 offers the best detection of progression without detecting improvement. The simple change criterion tested here requiring change over consecutive analyses was not as effective. All analyses were made with a critical slope of 1 dB/year. The selection of a P-value will probably depend on the characteristics of the data set: degree of field loss, field test reliability, frequency of testing, and duration of follow-up. However, with the data sets used here, the optimal P-value would be P<0.013.

We have presented a data set with a short follow-up, with obvious limitations. We have used fields from patients in a trial where the field used was the 24-2 rather than 30-2, reducing the amount of field data collected. Furthermore, the inclusion criteria for a field defect take no account of the change in threshold variability with eccentricity;2 this might produce a slight bias in the selection of patients, but any effect is likely to be small. However, the process of testing criteria on one data set and then validating them on an independent set could be used for any collection of fields. It may well be that the criteria derived will be unique for each data set depending on the baseline characteristics, as well as frequency and duration of follow-up. However, it avoids the problems of unrealistic amounts of change being detected when using empirically derived criteria. With extended follow-up from the UK MRC Moorfields 5-FU Filtration (More Flow) study, we plan to explore change over longer periods of time in future publications.