Introduction

Glaucoma is a progressive optic neuropathy with loss of retinal nerve fibres leading to morphological changes of the optic nerve head and visual field defects.1 Currently, the mainstays of diagnosis include ophthalmoscopic assessment of the optic disc and retinal nerve fibre layer (RNFL) and testing of the visual field. These methods, however, have their limitations. They are too laborious for screening and somewhat subjective when employed for glaucoma detection and follow-up.2

Several imaging technologies have been developed to quantify the anatomy of the optic disc and RNFL objectively, and thereby to improve the accuracy of glaucoma diagnosis and of the detection of progression.3 One of these is scanning laser ophthalmoscopy (SLO), which provides a topographical image of the optic disc and calculates parameters that quantify its three-dimensional structure. SLO is used in the Heidelberg Retina Tomograph (HRT, Heidelberg Engineering GmbH, Dossenheim, Germany) and in the Topographical Scanning System (TopSS, Laser Diagnostic Technologies, Inc., San Diego, CA, USA). So far, SLO studies have mainly evaluated the usefulness of optic disc parameters for glaucoma detection, the reproducibility of their measurements, and their correlation with visual field defects.4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 However, relatively little attention has been paid to the possible value of SLO optic disc biometry for improving glaucoma follow-up.20, 21, 22

To reliably detect the development or progression of disease with a given measurement parameter, one needs to know the absolute difference between follow-up measurements that must be exceeded for a change over time to be statistically significant. The first aim of this study was to determine, for all TopSS parameters, the 95% limits of agreement (LA95%),23 a reliable absolute measure of change that can be used directly as a clinical diagnostic tool. Our second aim was to estimate which parameters are most suitable for clinical disease detection and follow-up. To our knowledge, however, no index currently exists that allows a direct comparison between parameters for this purpose.

Here we propose such an index, based on the LA95%-values. This Discriminating Capacity Index (DCI) is obtained by dividing the measurement range of a parameter by its corresponding LA95%-value. Hence, the DCI takes into account both the dynamic range of disease progression and the extent of measurement variability that must be overcome to define a statistically significant change. A parameter can be highly reproducible but still be of little clinical use if its dynamic range is small. Conversely, a parameter with large variability may still prove clinically useful if its expected dynamic range is even larger. Moreover, because the DCI is dimensionless, it allows comparison across parameters and across technologies.

Materials and methods

Scanning laser ophthalmoscope

In the present study, all subjects were measured with the TopSS. The details of this instrument have been described elsewhere.24 In short, using a gallium-aluminum-arsenide diode laser beam (wavelength 780 nm), the instrument acquires an optic disc scan in 0.9 s. Within a predefined scan depth range, 32 x,y-frames (of 256 × 256 pixels each) are acquired confocally at regular intervals along the z-axis. The machine software then generates a light intensity curve for each set of 32 corresponding pixels along the z-axis. The peak of each curve is taken to indicate the position of the vitreo–retinal interface at that particular location. For our measurements, we adjusted the scan depth so that the entire optic disc was enclosed within the 32 frames. Scan depth ranged between 2 and 4 mm. The scan offset was set at three for all measurements and the scan angle at 15°.
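The height-reconstruction principle described above can be illustrated with a minimal Python sketch. It assumes (none of this is stated in the text) that the frames are stored as stack[z][y][x], that the 32 frames are evenly spaced from the top to the bottom of the scan depth, and that the surface is simply placed at the frame of peak intensity; the actual TopSS software may fit or interpolate the intensity curve instead. The function name height_map is hypothetical.

```python
from typing import List

# stack[z][y][x]: light intensity of pixel (y, x) in confocal frame z (assumed layout)
Stack = List[List[List[float]]]


def height_map(stack: Stack, scan_depth_mm: float) -> List[List[float]]:
    """Depth (in micrometres) of the intensity peak for every (y, x) pixel.

    Simplification: frames are assumed evenly spaced over the full scan depth,
    and the peak is the brightest frame (no curve fitting or interpolation).
    """
    n_frames = len(stack)
    if n_frames < 2:
        raise ValueError("need at least two confocal frames")
    step_um = scan_depth_mm * 1000.0 / (n_frames - 1)  # assumed frame spacing
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    heights: List[List[float]] = []
    for y in range(n_rows):
        row: List[float] = []
        for x in range(n_cols):
            # Light intensity curve along the z-axis for this pixel.
            curve = [stack[z][y][x] for z in range(n_frames)]
            peak_frame = max(range(n_frames), key=curve.__getitem__)
            row.append(peak_frame * step_um)
        heights.append(row)
    return heights


# Usage: 3 frames of 1 x 2 pixels over a 2 mm scan depth (frame spacing 1000 um).
example = [[[0.1, 0.8]], [[0.9, 0.3]], [[0.2, 0.1]]]
print(height_map(example, 2.0))  # [[1000.0, 0.0]]
```

With 32 frames over 2 mm, the assumed spacing would be about 65 μm, which is why real instruments typically refine the peak position between frames.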

Patients and volunteers

We performed measurements in 14 healthy volunteers and 14 glaucoma patients of similar age (see Table 1); the groups did not differ in total optic disc area (total contour area), focal correction, or corneal radius. As Table 1 shows, the glaucoma patients differed statistically significantly from the healthy subjects only in visual field parameters. The study followed the tenets of the Declaration of Helsinki and was approved by the institutional human experimentation committee. Informed consent was obtained from all participants. None of the healthy subjects had a remarkable ocular history. They all had an intraocular pressure (IOP) of less than 24 mmHg in both eyes, as measured with Goldmann applanation tonometry, healthy-looking optic discs on ophthalmoscopic evaluation, and normal visual fields, defined as a Glaucoma Hemifield Test within normal limits with standard automated perimetry (Humphrey Field Analyzer (HFA) 24-2 full threshold program; Carl Zeiss Meditec, Dublin, CA, USA). One eye per volunteer was randomly selected for measurement.

Table 1 Demographics of healthy volunteers (N=14) and glaucoma patients (N=14) as well as the visual field parameters mean deviation (MD) and pattern standard deviation (PSD) measured with the Humphrey Field Analyzer (HFA)

The glaucoma patients were recruited from a cohort study of patients with primary open angle glaucoma (POAG). All had a reproducible visual field defect and a glaucomatous appearance of the optic disc, as judged by a glaucoma specialist. Their IOP was less than 21 mmHg with medication. In unilateral glaucoma, the affected eye was studied. If both eyes were affected, one eye was randomly selected.

Subjects with any coexisting ocular disease (including pseudophakia and cataract) or with systemic diseases with possible ocular involvement, for example diabetes mellitus and systemic hypertension, were excluded from the study.

Measurement protocols

All subjects were measured three times, on separate days within a 6-week period. In addition, on one of these occasions, all subjects were measured three times within 1 h. Each measurement session started with positioning the subject's head as upright as possible on the head and chin rest. The pupils were left undilated. All images were acquired by the same operator (NJR). A live image of the optic nerve head was obtained on screen by adjusting depth of focus and illumination while the subject looked at a fixation LED. At each measurement session, three high-quality images were obtained. Scans were deemed to be of high quality if they met the following criteria: optic disc centered, well focussed, evenly and adequately illuminated throughout the image, and without any significant motion across the 32 consecutive images along the z-axis. These three images were then aligned and averaged by the software to form one mean image.
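The final averaging step can be sketched as a pixel-wise mean of three height maps. This minimal Python sketch assumes the maps have already been registered to one another; the alignment performed by the TopSS software is not described in the text and is not reproduced. The function name mean_image is hypothetical.

```python
from typing import List, Sequence

HeightMap = List[List[float]]


def mean_image(aligned_maps: Sequence[HeightMap]) -> HeightMap:
    """Pixel-wise mean of height maps that are assumed to be already aligned."""
    if not aligned_maps:
        raise ValueError("need at least one image")
    n = len(aligned_maps)
    n_rows, n_cols = len(aligned_maps[0]), len(aligned_maps[0][0])
    # Averaging is only meaningful when all maps cover the same pixel grid.
    if any(len(m) != n_rows or any(len(r) != n_cols for r in m) for m in aligned_maps):
        raise ValueError("all maps must have the same shape")
    return [
        [sum(m[y][x] for m in aligned_maps) / n for x in range(n_cols)]
        for y in range(n_rows)
    ]


# Usage: three aligned 1 x 2 height maps (hypothetical values in micrometres).
print(mean_image([[[100.0, 200.0]], [[110.0, 210.0]], [[120.0, 190.0]]]))
# [[110.0, 200.0]]
```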

In each mean image, the edge of the optic disc was marked by manually drawing a user-defined region (UDR) along the inner scleral ring. All UDRs were drawn by the same observer (GJMT), without the aid of stereoscopic optic disc photographs. For all measurements, the reference offset for the cup margin was set at −100 μm from the reference plane (see Figure 1), as advised by the manufacturer of the TopSS. The instrument's software then automatically calculated 18 optic disc parameters, listed in Table 2. Some of these parameters are also depicted in Figure 1. The UDR drawn in the first image was subsequently exported to all following images acquired in the same eye, to minimize variability due to manual drawing of contour lines.
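To illustrate how a user-defined region and a reference offset can turn a height map into area parameters, the following hypothetical Python sketch counts the pixels inside the UDR (disc area) and those lying more than 100 μm below the reference plane (cup, or effective, area), and takes rim area as their difference. The depth convention (larger values lie deeper), the fixed pixel area, and the cup rule are assumptions made for illustration only; the exact TopSS definitions of its 18 parameters are not given in this text.

```python
from typing import Dict, List


def area_parameters(
    depth_um: List[List[float]],
    in_udr: List[List[bool]],
    reference_plane_um: float,
    pixel_area_mm2: float,
    cup_offset_um: float = 100.0,
) -> Dict[str, float]:
    """Hypothetical disc, cup (effective) and rim areas from a height map.

    Assumed conventions: larger depth values lie deeper in the eye, and a pixel
    inside the UDR counts as cup when it lies more than cup_offset_um below the
    reference plane (the -100 um reference offset described in the text).
    """
    disc_px = cup_px = 0
    for depth_row, mask_row in zip(depth_um, in_udr):
        for depth, inside in zip(depth_row, mask_row):
            if not inside:
                continue  # outside the manually drawn disc contour
            disc_px += 1
            if depth > reference_plane_um + cup_offset_um:
                cup_px += 1
    disc_area = disc_px * pixel_area_mm2
    cup_area = cup_px * pixel_area_mm2
    return {"disc_area": disc_area, "cup_area": cup_area, "rim_area": disc_area - cup_area}


# Usage with hypothetical values: 5 pixels inside the UDR, 3 of them deeper than 250 um.
depth = [[50.0, 300.0, 400.0], [60.0, 90.0, 500.0]]
udr = [[True, True, True], [True, False, True]]
print(area_parameters(depth, udr, reference_plane_um=150.0, pixel_area_mm2=0.25))
# {'disc_area': 1.25, 'cup_area': 0.75, 'rim_area': 0.5}
```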

Figure 1 Schematic of various optic disc parameters obtained with a scanning laser ophthalmoscope, employing a manually drawn user-defined region (UDR) and a reference offset (RO) of −100 μm.

Table 2 Mean values and standard deviations (SD) of optic disc parameters obtained in healthy volunteers and glaucoma patients

Measurement precision and 95% limits of agreement

Measurement precision of the 18 TopSS optic disc parameters was expressed as repeatability for measurements acquired on the same day (ie within-visit-variability) and reproducibility for measurements acquired on separate days (ie between-visit-variability). Repeatability refers in this context to the measurement variance, which is primarily due to technical and operator variability, as well as short-term biological variability. Reproducibility also reflects variance over longer periods of time. An instrument's measurement precision, expressed as repeatability, will notably influence its diagnostic accuracy in a screening setting. Measurement reproducibility will largely affect the ability of a parameter to detect changes over time within a subject. In the present study, we also determined the repeatability of measurements with the TopSS for single images, employing three images of one single measurement session.

For each parameter, both repeatability (within-visit-variability) and reproducibility (between-visit-variability) were expressed as LA95%;23 the smaller these limits, the higher the measurement precision. LA95%-values were calculated as follows:

$$\mathrm{LA}_{95\%} = 1.96\sqrt{2}\,\bar{\sigma}_{\text{within-patient}}$$

where $\bar{\sigma}_{\text{within-patient}}$ is the average within-patient standard deviation across subjects. For each subject, the within-patient standard deviation of the $n$ repeated measurements $x_1, \dots, x_n$ (here $n = 3$) was calculated in the usual way:

$$\sigma_{\text{within-patient}} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

where $\bar{x}$ is that subject's mean. The average was only accepted after a test for heteroscedasticity (ie non-uniform within-patient variance): the typical error (ie within-patient standard deviation, σwithin-patient) was plotted against the within-patient mean and tested for correlation. If Pearson's correlation coefficient was not significant (P>0.05), uniformity of within-patient variance was considered likely, enabling determination of the limits of agreement. Otherwise the parameter was discarded.
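The calculation above, including the heteroscedasticity screen, can be sketched in plain Python. Assumptions not stated in the text: each subject contributes at least two repeated measurements; the average within-patient SD is taken as the arithmetic mean of the per-subject SDs (Bland and Altman also describe using the square root of the mean within-subject variance); and the P value of Pearson's r is obtained from Student's t distribution with N − 2 degrees of freedom (N = number of subjects), evaluated by numerical integration rather than with a statistics package. All function names are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either variable has no spread."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided P value of Student's t with df degrees of freedom.

    The substitution t = sqrt(df) * tan(theta) gives the bounded integrand
    cos(theta) ** (df - 1), integrated with Simpson's rule (steps must be even).
    """
    k = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo, hi = math.atan(abs(t) / math.sqrt(df)), math.pi / 2
    h = (hi - lo) / steps

    def f(theta: float) -> float:
        return max(math.cos(theta), 0.0) ** (df - 1)

    s = f(lo) + f(hi) + sum((4 if i % 2 else 2) * f(lo + i * h) for i in range(1, steps))
    return min(1.0, 2.0 * k * s * h / 3.0)


def is_heteroscedastic(
    subject_means: Sequence[float], subject_sds: Sequence[float], alpha: float = 0.05
) -> bool:
    """True if the within-subject SD correlates significantly with the subject mean."""
    n = len(subject_means)
    if n < 3:
        raise ValueError("need at least three subjects")
    r = pearson_r(subject_means, subject_sds)
    if abs(r) >= 1.0:
        return True
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return t_two_sided_p(t, n - 2) < alpha


def la95(subjects: List[Sequence[float]]) -> Optional[float]:
    """LA95% = 1.96 * sqrt(2) * mean within-subject SD, or None if heteroscedastic."""
    means = [mean(s) for s in subjects]
    sds = [stdev(s) for s in subjects]  # sample SD with n - 1 denominator
    if is_heteroscedastic(means, sds):
        return None  # parameter discarded (reported as N/A)
    return 1.96 * math.sqrt(2) * mean(sds)


# Usage with hypothetical data: three subjects, three measurements each, SD = 1.
print(la95([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]))  # about 2.772
# SD grows in proportion to the mean, so the parameter is discarded.
print(la95([[1, 2, 3], [10, 20, 30], [100, 200, 300], [1000, 2000, 3000]]))  # None
```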

The LA95% gives the absolute difference between measurements that must be exceeded to detect a statistically significant change over time due to the development or progression of disease. Suppose that an optic disc that was still healthy at a prior SLO evaluation develops glaucomatous optic disc atrophy: the change from earlier measurements will have to exceed the LA95% of healthy subjects to be detected as statistically significant. The LA95% for glaucoma detection by progression (ie conversion) is therefore based on measurement reproducibility (between-visit-variability) in healthy volunteers. Similarly, in outright glaucoma, a change would have to exceed the LA95% of glaucoma patients for further progression of optic disc atrophy to be detected. Note that the LA95% may vary across parameters. We used the LA95% to introduce a new method for comparing the clinical ability of parameters to monitor disease progression, designated the DCI (see below).
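The decision rule in this paragraph reduces to comparing an absolute change with the applicable LA95%. A minimal Python sketch follows; the function name and the example values are hypothetical and are not taken from Table 3.

```python
def change_is_significant(
    baseline: float,
    follow_up: float,
    *,
    glaucoma: bool,
    la95_healthy: float,
    la95_glaucoma: float,
) -> bool:
    """True when |follow_up - baseline| exceeds the applicable reproducibility LA95%.

    Conversion (eye not yet glaucomatous) is judged against the LA95% of healthy
    volunteers; progression in outright glaucoma against that of glaucoma patients.
    """
    limit = la95_glaucoma if glaucoma else la95_healthy
    return abs(follow_up - baseline) > limit


# Hypothetical values in the parameter's own unit: a change of 0.12.
print(change_is_significant(0.30, 0.42, glaucoma=False, la95_healthy=0.15, la95_glaucoma=0.10))  # False
print(change_is_significant(0.30, 0.42, glaucoma=True, la95_healthy=0.15, la95_glaucoma=0.10))  # True
```

The same absolute change can thus be significant in a glaucomatous eye yet not in a healthy one, because each group has its own LA95%.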

Discriminating capacity index (DCI)

In our opinion, a good measure for detecting change must be precise relative to the measurement range of interest, that is, it must yield many distinguishable steps along that range. The number of steps can be determined by comparing the measurement range with the measurement precision (LA95%). For each parameter, we therefore divided the measurement range by its LA95% for reproducibility and called this fraction the DCI. The diagnostic range was arbitrarily defined as the absolute difference between the mean values of a parameter in healthy volunteers and in mild glaucoma patients as observed in our population (ie an HFA MD >−6 dB). The follow-up range was arbitrarily defined as the absolute difference between the mean values of a parameter in our mild and severe glaucoma patients (ie an HFA MD <−18 dB). The means obtained for measurement reproducibility (Table 2) were used to calculate the diagnostic and follow-up ranges. Thus, the DCI for detection of glaucomatous optic nerve atrophy (DCIglaucoma detection) was calculated as follows:

$$\mathrm{DCI}_{\text{glaucoma detection}} = \frac{\left|\bar{x}_{\text{healthy}} - \bar{x}_{\text{mild glaucoma}}\right|}{\mathrm{LA}_{95\%,\,\text{healthy}}}$$

where $\mathrm{LA}_{95\%,\,\text{healthy}}$ is the LA95% for reproducibility in healthy volunteers. Similarly, the DCI for follow-up of progressive glaucomatous optic nerve atrophy (DCIglaucoma follow-up) was calculated as

$$\mathrm{DCI}_{\text{glaucoma follow-up}} = \frac{\left|\bar{x}_{\text{mild glaucoma}} - \bar{x}_{\text{severe glaucoma}}\right|}{\mathrm{LA}_{95\%,\,\text{glaucoma}}}$$

where $\mathrm{LA}_{95\%,\,\text{glaucoma}}$ is the LA95% for reproducibility in glaucoma patients.
The DCI is dimensionless, which facilitates comparison between optic disc parameters.
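Both indices can be computed directly from group means and reproducibility LA95%-values. In this minimal Python sketch, a missing LA95% (a parameter discarded because of heteroscedasticity, marked N/A in Table 3) yields no DCI. The function names and numbers are hypothetical, not results from this study.

```python
from typing import Optional


def dci(mean_a: float, mean_b: float, la95: Optional[float]) -> Optional[float]:
    """Range |mean_a - mean_b| divided by the reproducibility LA95%; None if LA95% is N/A."""
    if la95 is None:
        return None
    if la95 <= 0:
        raise ValueError("LA95% must be positive")
    return abs(mean_a - mean_b) / la95


def dci_detection(mean_healthy: float, mean_mild: float, la95_healthy: Optional[float]) -> Optional[float]:
    """Diagnostic range (healthy vs mild glaucoma) over the LA95% of healthy volunteers."""
    return dci(mean_healthy, mean_mild, la95_healthy)


def dci_follow_up(mean_mild: float, mean_severe: float, la95_glaucoma: Optional[float]) -> Optional[float]:
    """Follow-up range (mild vs severe glaucoma) over the LA95% of glaucoma patients."""
    return dci(mean_mild, mean_severe, la95_glaucoma)


# Hypothetical values in one parameter's own unit.
print(dci_detection(1.75, 1.25, 0.25))   # 2.0
print(dci_follow_up(1.25, 0.25, 0.125))  # 8.0
print(dci_detection(1.75, 1.25, None))   # None
```

Because the units cancel, the resulting numbers can be compared across parameters measured in mm², mm³ or degrees.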

Statistical analysis

For continuous data, a Wilcoxon signed-rank test was used for paired measurements and a Mann–Whitney U test for unpaired measurements. Binomial data were analysed with Pearson's χ² test. For all tests, a P-value <0.05 was considered statistically significant. All statistical analyses were performed with SPSS software version 9.0 (SPSS Inc., Chicago, IL, USA).
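For readers without SPSS, the two rank tests can be sketched with the Python standard library. This sketch uses the large-sample normal approximation, average ranks for ties, and no tie or continuity correction, so its P values may differ slightly from SPSS output (SPSS also offers exact tests). Zero differences are dropped in the signed-rank test, and the χ² test is not shown. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def _two_sided_normal_p(z: float) -> float:
    return math.erfc(abs(z) / math.sqrt(2))


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for x and two-sided P (normal approximation, no corrections)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, _two_sided_normal_p((u - mu) / sigma)


def wilcoxon_signed_rank(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """W+ for paired data and two-sided P (normal approximation); zero differences dropped."""
    diffs = [p - q for p, q in zip(a, b) if p != q]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, _two_sided_normal_p((w_plus - mu) / sigma)


# Usage with hypothetical data.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # U = 0.0, P of roughly 0.05
print(wilcoxon_signed_rank([10, 12, 14, 16, 18], [9, 10, 11, 12, 13]))  # W+ = 15.0, P of roughly 0.04
```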

Results

Table 2 presents the mean values and standard deviations of the 18 optic disc parameters calculated by the TopSS.

95% limits of agreement

Table 3 presents the LA95% of all optic disc parameters for both healthy volunteers and glaucoma patients. Values that could not be calculated because of heteroscedasticity have been omitted (marked as N/A in Table 3). Table 3 shows that repeatability (within-visit-variability) of mean images (II) generally yielded lower LA95%-values than that of single images (I); testing both columns against each other yielded P=0.008 for healthy volunteers and P=0.06 for glaucoma patients. Table 3 also shows an overall difference between mean image repeatability (II) and reproducibility, that is between-visit-variability (III), in the LA95% of healthy volunteers (P=0.01); differences were most pronounced for average slope angle (P=0.02) and neuroretinal rim area (P=0.04), and were possibly present for effective area (P=0.06), volume above (P=0.07), and cup–disc ratio (P=0.07). In the glaucoma patients, no such difference was found, either overall (P=0.86) or for individual parameters. Interestingly, mean image repeatability (II) did not differ between healthy volunteers and glaucoma patients (P=0.20).

Table 3 95% Limits of agreement (LA95%) obtained for repeatability (within-visit-variability) of both single images (I), and mean images (II), as well as for reproducibility (between-visit-variability) of mean images (III); the smaller these limits, the higher measurement precision is

The DCI for glaucoma follow-up and detection

Table 4 shows that the DCIs in glaucoma patients were higher than those in healthy volunteers. Some parameters had a DCI smaller than 1, and are therefore unlikely to be of clinical use; note that most of these parameters failed to show a statistically significant difference in mean values between groups, as shown in Table 2.

Table 4 The Discriminating Capacity Index (DCI), based on measurement reproducibility (columns III, Table 3), provides a quantitative estimate of the usefulness of a parameter for glaucoma follow-up, when determined in glaucoma patients, and for glaucoma detection, when determined in healthy volunteers

Table 5 lists the parameters with a DCI >1.5 for glaucoma detection and with a DCI >4.0 for glaucoma follow-up, together with their respective LA95%. This table illustrates the large difference in DCI between glaucoma detection and follow-up. It also shows that more parameters were possibly useful for glaucoma follow-up than for detecting conversion to glaucoma. Interestingly, volume below, which had the highest DCI for follow-up, is not useful for detecting conversion to glaucoma, because its LA95% could not be calculated owing to heteroscedasticity of this parameter in healthy volunteers. Mean contour depth, which ranked high for follow-up, may also be useful for glaucoma detection. The best parameter for glaucoma detection, neuroretinal rim area, though possibly useful for glaucoma follow-up, did not perform half as well as mean contour depth or volume below.
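The selection underlying Table 5 amounts to filtering and ranking parameters by DCI against a threshold (1.5 for detection, 4.0 for follow-up). A minimal Python sketch with hypothetical parameter names and values (not those of Table 4); a parameter whose LA95% could not be calculated, as for volume below in healthy volunteers, is represented by None and is never selected.

```python
from typing import Dict, List, Optional


def suitable_parameters(dcis: Dict[str, Optional[float]], threshold: float) -> List[str]:
    """Names of parameters whose DCI exceeds threshold, highest DCI first.

    A DCI of None (LA95% not calculable because of heteroscedasticity) is skipped.
    """
    kept = [(name, value) for name, value in dcis.items() if value is not None and value > threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical DCI values for three parameters.
follow_up_dci = {"param_a": 6.2, "param_b": 9.1, "param_c": 3.4}
detection_dci = {"param_a": 1.8, "param_b": None, "param_c": 0.7}
print(suitable_parameters(follow_up_dci, 4.0))  # ['param_b', 'param_a']
print(suitable_parameters(detection_dci, 1.5))  # ['param_a']
```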

Table 5 Optic disc parameters that may be suitable for glaucoma follow-up or detection in an individual

Discussion

The present study provides the LA95% for optic disc parameters measured with an SLO in glaucoma patients and in healthy volunteers. For each parameter, these limits represent, in their own measurement scale, the absolute difference between measurements required to detect a significant change. In addition, we derived for each parameter the dimensionless DCI, which equals the number of differential steps that can be distinguished in a specified measurement range, corrected for measurement precision. Thus, the newly proposed DCI provides a quantitative estimate of the clinical usefulness of SLO parameters for detecting both conversion to glaucoma and change in outright glaucoma.

In the TopSS, the most useful parameter for glaucoma follow-up appeared to be volume below. Suitable parameters for both follow-up and detection might be mean contour depth, cup–disc ratio, and neuroretinal rim area. We also found an attenuated biological variability over time in glaucoma patients.

Our results indicate that optic disc biometry may be of greater value for follow-up of outright glaucoma than for detecting conversion from a healthy state to glaucoma. One of the limiting factors in conversion detection may be the influence of biological variability over time, apparent in healthy volunteers as an increase in LA95% from measurement repeatability (ie within-visit-variability) to reproducibility (ie between-visit-variability); this increase was not observed in glaucoma patients, and repeatability did not differ between healthy volunteers and glaucoma patients. This effect was most prominent in parameters related to the effective area (ie cup area). These results agree with the findings of Tan et al,19 who found a significant influence of biological variability on measurement precision of rim area in healthy volunteers but not in glaucoma patients. However, they did not mention whether such influences existed for other optic disc parameters. In addition, long-term fluctuations of the normalized rim/disc area quotient in normal eyes have been reported,25 while a comparable trend was noted in super pixel height measurements in healthy volunteers (P=0.14, N=10).26

The larger biological variability in healthy subjects compared to glaucoma patients is an interesting finding. It has been found that long- and short-term fluctuations are characteristic of many physiological states, which can be explained by chaos theory principles.27 A decrease or loss of this variability that leads to regular patterns or morphological appearances is often a sign of disease. For example, heart rate variability, which reflects autonomic nerve function in healthy individuals, may decrease in postinfarction and diabetic patients and is a sign of an increased risk for sudden cardiac death.28 The larger variability in optic disc biometry of healthy persons, compared to glaucoma patients, warrants a longitudinal study of normal subjects.

To express measurement precision of an optic disc parameter, we employed Bland and Altman's limits of agreement23 because this is an absolute and uniform measure of precision that can be used directly as a clinical diagnostic tool for detecting changes in follow-up measurements due to disease. Previous reports have employed the coefficient of variation7, 12, 17, 19, 29 for expressing measurement precision. Unfortunately, it is a relative measure of precision; therefore, it cannot be directly applied as a diagnostic tool in disease detection and follow-up over time.

We acknowledge that the instrument employed in this study, the TopSS, is used far less widely than the HRT, which may limit general interest in our data. More importantly, however, we have used our data to propose a novel method for mathematically estimating the usefulness of a parameter based on its measurement precision, both for detecting disease at a very early stage (ie at the time of conversion) and for follow-up. To this end, we proposed the DCI, a mathematical estimate of the number of discriminative steps along the average theoretical curve of a disease parameter by which changes may be monitored within an individual over time. A high DCI indicates that a parameter is very sensitive in detecting change over time, for example, due to conversion or progression of disease. Conversely, a low DCI strongly suggests that a parameter is unlikely to be of clinical use because its measurement precision is relatively low. It turned out that some parameters were mathematically better suited for detecting disease and others for detecting progression. We think that the DCI might merit wider use than in optic disc biometry alone. We stress, however, that because the DCI is a mathematical estimate, it only indicates possible clinical use and therefore requires further clinical validation in longitudinal follow-up studies. Another limitation of our study is that the study groups were small, especially those used to estimate the DCIs for follow-up from mild to severe glaucoma. This will have influenced the numerator of the fraction that forms this index. However, the reliability of the DCI depends mainly on its denominator, since the latter is much smaller than the numerator. The denominators, being the LA95%-values, were based on both adequate group sizes and adequate numbers of measurements. Nevertheless, the ranking of the DCI parameters listed in Table 5 may change when a more precise estimate of the follow-up range is derived from larger groups. As a result, these DCI values may differ from those of the populations from which our samples were drawn, or of other populations, and their magnitudes may therefore not apply in other settings.

Our results indicate that most optic disc parameters calculated from a reference plane and with a manually drawn user-defined region are uniformly reproducible (ie show no heteroscedasticity) and that several parameters may prove useful for glaucoma follow-up and detection. In addition, mean images appeared to give better repeatability than single images, in agreement with an earlier report by Weinreb et al.30 So far, few studies have reported on the possible value of SLO optic disc biometry for glaucoma follow-up. One possible technique for follow-up, Topographic Change Analysis,20 was used by Chauhan et al21 in a 5-year longitudinal follow-up study. During these 5 years, more glaucoma patients showed a progressive loss of optic nerve head tissue than a progressive loss of visual field function. Their findings support the idea that the main merit of SLO technology lies in glaucoma follow-up rather than in its detection. Regarding automated perimetry, the results reported by Artes et al31 clearly show heteroscedasticity below threshold values of 30 dB. Our study underlines the importance of taking this determinant of measurement precision into account. Heteroscedasticity may otherwise cause erroneous over- or underestimation of the significance of change in a parameter. It is conceivable that the accuracy of follow-up of more severe stages of glaucoma by automated perimetry is limited. Using the data from Artes et al,31 the DCI for glaucoma follow-up of standard automated perimetry in our study group was calculated to be at best 5.6 below a 30 dB threshold, but probably less because of the heteroscedasticity. As can be seen in Table 5, at least two SLO parameters yielded clearly higher values (ie >8.0). This is an example of the comparative function of the DCI between different instruments and suggests that SLO optic disc biometry could be of added clinical usefulness in glaucoma follow-up.

We have shown that SLO optic disc biometry is uniformly reproducible in both healthy subjects and glaucoma patients. By expressing measurement precision as LA95%, SLO optic disc parameters can be used in individual patients as a clinical diagnostic tool. A new method, the DCI, has been introduced, providing a quantitative estimate of the clinical usefulness of each parameter. The DCI facilitates the selection of suitable parameters and enables comparison between different measurement techniques. Based on this new method, we suggest that measurement of volume below, mean contour depth, effective area, or average depth by means of SLO may be of added value for glaucoma follow-up. Moreover, we found an attenuated biological variability over time in glaucoma patients.