Introduction

Microperimetry has been implemented clinically to achieve a direct correlation between retinal pathologies and functional defects by providing simultaneous observation of the fundus and a correction for eye movements during the perimetric examination.

Over the past several years, the scanning laser ophthalmoscope (SLO) was the only commercially available microperimeter.1, 2, 3 Previous studies have shown the value of this method for the follow-up of patients with progressive macular diseases.4, 5, 6 An important disadvantage of the SLO was the lack of a real-time fundus-tracking function. In addition, the SLO can no longer be obtained.

Recently, a new microperimeter, the Micro Perimeter 1 (MP-1; Nidek Technologies, Italy), has been introduced to the market. This instrument includes automated full-threshold perimetry software and real-colour fundus image acquisition. Whereas the SLO covers a field of 33 × 22°, the integrated 45° fundus camera of the MP-1 allows a larger area to be tested. After the examination, an overlay of the perimetric findings onto the colour image of the central retina is provided. For a follow-up exam, the previous examination data can be loaded so that the retina is stimulated at exactly the same locations and with the same intensities as in the first exam, allowing an accurate comparison of the functional assessments of the two exams.

As with any new diagnostic instrument, the validity of the data acquired might be reduced by variability, both interdevice and interoperator.7 Interoperator variability was previously studied for other medical devices like optical coherence tomography (OCT),8 the BVI ultrasonic Pachymeter,9 the GDx VCC,7 and the prototype of the ACMaster.10 Other devices for visual field testing such as the Humphrey or Octopus perimeters were analysed in detail for test/retest variability.11, 12, 13, 14, 15 Intra- and interrater agreement of psychophysical tests was investigated regarding the Functional Field Score16 and cumulative defect curves.17 The interobserver agreement in interpreting the test results of different devices was also examined,18, 19 but we found no studies concerning the intra- and interoperator reliability of the examination itself.

The aim of the present study was to quantify the interexaminer and intraexaminer reliability and variability of the MP-1.

Methods

Participants

We assessed 35 eyes of 35 consecutive patients in our institutional study and divided them into three different groups as follows. Group 1 (young): 15 eyes from 15 healthy young volunteers with no history of ocular diseases, other than ametropia. Group 2 (old): 15 eyes of 15 healthy subjects over 60 years of age, with a history of cataract surgery at least 4 months before the study, with no optical opacities and a macula with no pathological findings. Group 3 included five patients with age-related macular degeneration (ARMD) (Age-Related Eye Disease Study classification III: Intermediate ARMD—absence of advanced ARMD in both eyes and at least one eye, with 20/32 vision or better, with at least one large drusen (125 μm), extensive intermediate drusen, geographic atrophy not involving the centre of the macula, or any combination of these).

In each group, one eye was included for examination, according to the randomization scheme.

Each patient underwent a routine ophthalmologic examination, including funduscopy and determination of best-corrected visual acuity. Distance visual acuity at 4 m was tested using the Early Treatment Diabetic Retinopathy regimen, best-corrected refraction, and retro-illuminated charts.

Optical coherence tomography was used to quantify retinal thickness. We certify that all applicable institutional and governmental regulations concerning the ethical use of human volunteers were followed during this research.

Perimetric examination

All subjects in the study were tested for the first time during the baseline exam to avoid a possible learning effect. Perimetric examination was performed after explanation of the method to all participants and an adaptation time to the darkened room of 5 min.

To assess interexaminer reliability and variability, microperimetry was performed by two examiners on the same day. Examiner 1 (E1) was a newly recruited staff member with no prior experience in operating any diagnostic or photographic equipment. Examiner 2 (E2) was a skilled ophthalmologist with experience in operating most types of ophthalmic, diagnostic, and photographic equipment, but no prior experience in operating the MP-1. To become familiar with the device and the software, the two operators read the MP-1 operation manual before commencing the study. The examinations were performed in random order with regard to the examiners, to prevent biasing of the results due to a possible learning effect. Between the two individual measurements (E1a and E2) of the same day, there was a break of at least 15 min. Retinal sensitivity and stability of fixation were studied. The interexaminer variability resulted from the fluctuations in measurements between examiners (E1a and E2).

To analyse the intraexaminer reliability and variability, E1 performed an additional measurement (E1b) a week later. The intraexaminer variability resulted from the variation in the measurements obtained by E1 (E1a and E1b).

Microperimeter MP-1

Microperimetry was performed using the MP-1 (Nidek Technologies). A test grid with 41 stimulus locations was applied, covering an area of 10° diameter. Goldmann III stimuli with a presentation time of 200 ms and a 4-2-1 staircase strategy were used. The stimuli were projected on a white background with background illumination set to 1.27 cd/m² (4 apostilbs (asb); 1 asb=0.31831 cd/m²). Stimulus intensity can be varied in 1 dB (0.1 log unit) steps from 0 to 20 decibels (dB), where 0 dB represents the brightest luminance of 400 asb (127 cd/m²). A single cross of 2° was used as fixation target in Groups 1 and 2; in Group 3, we used a single cross of 5°. The perimetric strategy of the MP-1 starts at an initially defined threshold level (12 dB) for each stimulus. The 4-2-1 staircase strategy is then applied, and the weakest recognized stimulus is documented as the threshold for retinal sensitivity at each tested site. The light threshold in dB at all test locations was analysed for the study.
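The relation between the dB scale and stimulus luminance described above can be illustrated with a short sketch. It assumes the usual perimetric convention that the dB value is an attenuation of the maximum stimulus, so that each 1 dB step reduces luminance by 0.1 log unit; the function names are hypothetical and are not part of the MP-1 software.

```python
import math

# Constants taken from the text: 1 asb = 0.31831 cd/m^2 (that is, 1/pi),
# and 0 dB corresponds to the brightest stimulus of 400 asb.
ASB_TO_CD_M2 = 1 / math.pi
MAX_STIMULUS_ASB = 400.0


def stimulus_luminance_asb(attenuation_db: float) -> float:
    """Stimulus luminance in asb for a given attenuation in dB.

    Assumption: 1 dB equals 0.1 log unit of attenuation relative to
    the 0 dB stimulus.
    """
    if not 0 <= attenuation_db <= 20:
        raise ValueError("the scale described in the text runs from 0 to 20 dB")
    return MAX_STIMULUS_ASB * 10 ** (-attenuation_db / 10)


def asb_to_cd_m2(asb: float) -> float:
    """Convert apostilbs to candela per square metre."""
    return asb * ASB_TO_CD_M2


# Brightest stimulus, the 12 dB starting level, and the dimmest stimulus.
for db in (0, 12, 20):
    asb = stimulus_luminance_asb(db)
    print(f"{db:2d} dB -> {asb:8.2f} asb = {asb_to_cd_m2(asb):7.2f} cd/m^2")
```

Under this assumption, a higher dB threshold corresponds to a dimmer stimulus that was still recognized, that is, to higher retinal sensitivity.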

The fundus movements are tracked during examination while the patient gazes at the fixation target to assess fixation.20 The autotracking system calculates horizontal and vertical shifts relative to a reference frame and draws a map of the patient's eye movements during the examination. The recorded fixation points are classified into three categories for fixation analysis (stable, relatively unstable, unstable) in the manual of the MP-1 as well as in the literature.21 Fixation is defined as ‘stable’ if more than 75% of the fixation points are inside the 2° diameter circle, as ‘relatively unstable’ if less than 75% are inside the 2° diameter circle, but more than 75% inside the 4° diameter circle, and as ‘unstable’ if less than 75% of the fixation sites are inside the 4° diameter circle.
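The three-category fixation classification can be written down as a minimal sketch. Two points are assumptions, because the text does not specify them: the circles are taken to be centred on a position passed in by the caller (by default the origin), and a share of exactly 75% is treated as not exceeding the threshold. The function name and the input format (offsets in degrees) are hypothetical.

```python
import math
from typing import Iterable, Tuple


def classify_fixation(points_deg: Iterable[Tuple[float, float]],
                      centre: Tuple[float, float] = (0.0, 0.0)) -> str:
    """Classify fixation from (x, y) fixation points given in degrees.

    Assumption: both circles are centred on `centre`; the text does not
    state how the centre is chosen.
    """
    pts = list(points_deg)
    if not pts:
        raise ValueError("at least one fixation point is required")
    cx, cy = centre
    distances = [math.hypot(x - cx, y - cy) for x, y in pts]
    # A circle of 2 deg diameter has a radius of 1 deg; 4 deg diameter, 2 deg.
    share_within_2deg = sum(d <= 1.0 for d in distances) / len(distances)
    share_within_4deg = sum(d <= 2.0 for d in distances) / len(distances)
    if share_within_2deg > 0.75:
        return "stable"
    if share_within_4deg > 0.75:
        return "relatively unstable"
    return "unstable"


# Hypothetical example: 8 of 10 points lie on the target, 2 lie 1.5 deg away.
example = [(0.0, 0.0)] * 8 + [(1.5, 0.0)] * 2
print(classify_fixation(example))
```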

Statistical methods

The statistical tests were performed using SPSS v. 14.0.1 for Windows (SPSS Inc., Chicago, IL, USA) and MedCalc v. 9.4.1.0 (MedCalc Software, Mariakerke, Belgium).

All data were analysed with repeated-measures analysis of variance (ANOVA). Means and standard deviations (SDs) of macular sensitivity were calculated for each group separately. To quantify reproducibility and assess agreement, the mean difference between the examiners and the SD of this difference were calculated. Subsequently, the 95% limits of agreement were calculated, using Bland–Altman plots.22, 23 To further assess reliability, the intraclass correlation coefficients (ICCs) were calculated from a two-way random effects model, for absolute agreement. Examination times were compared using the unpaired t-test between groups and the paired t-test between examiners.
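For readers unfamiliar with the ICC, the following sketch computes a two-way random effects, absolute agreement coefficient from the ANOVA mean squares. The text does not state whether single or average measures were reported; the sketch assumes single measures. The function name and the data are hypothetical, and this is not the SPSS implementation.

```python
from typing import Sequence


def icc_absolute_agreement(ratings: Sequence[Sequence[float]]) -> float:
    """Two-way random effects, absolute agreement, single-measures ICC.

    `ratings` holds one row per eye and one column per examination.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 rows and 2 columns of equal length")

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for subjects (rows), examiners (columns), and residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical thresholds (dB): the second examiner reads 1 dB higher throughout.
print(icc_absolute_agreement([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]))
```

Because the coefficient measures absolute agreement, a constant offset between examiners lowers it even when the two sets of measurements are perfectly correlated.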

A P-value less than 0.05 was considered statistically significant.

Results

The mean age of Group 1 (9 women, 6 men) was 29±5 years (range, 24–38 years), of Group 2 (8 women, 7 men) 75±7 years (range, 64–86 years), and of Group 3 (3 women, 2 men) 74±7 years (range, 63–81 years).

Best-corrected visual acuity in Group 1 was −0.2±0.1 logMAR (Snellen 20/12.5), in Group 2 0.11±0.1 logMAR (Snellen 20/25), and 0.2±0.2 logMAR (Snellen 20/32) in Group 3. Retinal thickness, quantified with the OCT, was 217±33 μm in Group 1, and 246±31 and 297±10 μm in Groups 2 and 3, respectively.
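The correspondence between the logMAR values and the Snellen fractions quoted above follows from the definition of logMAR as the base-10 logarithm of the minimum angle of resolution. A minimal sketch (the function name is hypothetical):

```python
def snellen_denominator(logmar: float, numerator: float = 20.0) -> float:
    """Snellen denominator for a logMAR value, for a fraction numerator/x."""
    # MAR = 10 ** logMAR, and the Snellen fraction equals 1 / MAR.
    return numerator * 10 ** logmar


for value in (-0.2, 0.11, 0.2):
    print(f"logMAR {value:+.2f} -> 20/{snellen_denominator(value):.1f}")
```

The computed denominators are close to the reported fractions, which presumably reflect rounding to the nearest standard chart line.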

The mean differential light threshold is shown in Table 1 for each examiner and each group separately. The mean differential light threshold for all groups was 15.29±3.56 dB for E1a, 15.16±3.61 dB for E1b, and 15.27±3.38 dB for E2, respectively.

Table 1 Mean differential light threshold in decibels (mean) and standard deviations of the mean differential light threshold (SD)

In Group 1, 100% of the participants showed a stable fixation. In Group 2, the fixation was stable in 55.56%, relatively unstable in 33.33%, and unstable in 11.11%. A stable fixation in Group 3 was found in 66.67%, a relatively unstable fixation in 26.67%, and an unstable fixation in 6.67% (Table 2).

Table 2 Fixation stability

In Group 1, all participants showed stable fixation independent of the examiner. In Group 2, there was also no significant intraexaminer or interexaminer difference in fixation stability (E1a vs E1b: P=0.082 and E1a vs E2: P=0.433, respectively). In Group 3, there was likewise no intra- or interexaminer difference in fixation stability (E1a vs E1b: P=1.000 and E1a vs E2: P=0.208, respectively).

The mean examination time in Group 1 was 13 : 04±3 : 58 min and the tracked time was 10 : 45±1 : 53 min. In Group 2, the mean examination time was 16 : 09±6 : 24 min and the tracked time was 10 : 59±3 : 07 min. The mean examination time in Group 3 was 19 : 55±7 : 38 min and the tracked time was 11 : 57±2 : 51 min. The examination time in Group 1 was significantly shorter than that in Group 2 (P=0.006). The tracked time also differed between Groups 1 and 2, but the difference was not statistically significant (P=0.802). Examination time and tracked time showed no statistically significant difference between Groups 2 and 3 (P=0.052 and P=0.151, respectively). Group 1 had significantly shorter examination and tracked times than Group 3 (P<0.001 and P=0.027, respectively).

Interobserver variability

The mean measurement differences between examiners E1a and E2 are shown in Table 3. The differences between the examiners were all nonsignificant using one-way ANOVA in either of the groups (Group 1: P=0.381; Group 2: P=0.979; and Group 3: P=0.276, respectively).

Table 3 Interexaminer variability

To assess agreement between both examiners, Bland–Altman plots were constructed. It could be seen that the limits of agreement were narrow with respect to the mean differential light threshold (Table 3).

In Groups 1 and 2, all but one of the data points (n=15 in each group) lay at or within 1.96 SDs of the mean. In Group 3, all data points (n=5) lay at or within 1.96 SDs of the mean.
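The check described above, whether each paired difference lies within 1.96 SDs of the mean difference, can be sketched as follows. The function name and the threshold values are hypothetical and are not study data.

```python
import statistics
from typing import List, Sequence, Tuple


def bland_altman(first: Sequence[float],
                 second: Sequence[float]) -> Tuple[float, float, float, List[int]]:
    """Return mean difference, lower and upper 95% limits of agreement,
    and the indices of pairs whose difference falls outside the limits."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long series with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    lower = mean_diff - 1.96 * sd_diff
    upper = mean_diff + 1.96 * sd_diff
    outside = [i for i, d in enumerate(diffs) if d < lower or d > upper]
    return mean_diff, lower, upper, outside


# Hypothetical mean thresholds (dB) per eye for two examiners.
e1a = [15.0, 16.0, 14.0, 17.0]
e2 = [14.0, 16.0, 15.0, 16.0]
mean_diff, lower, upper, outside = bland_altman(e1a, e2)
print(f"mean difference {mean_diff:.2f} dB, limits {lower:.2f} to {upper:.2f} dB")
print("pairs outside the limits:", outside)
```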

To further assess agreement, the ICC was calculated and found to be 0.845 (95% CI 0.538–0.948) in Group 1, 0.964 (95% CI 0.892–0.988) in Group 2, and 0.996 (95% CI 0.963–1.000) in Group 3. The analysis indicated good agreement between the two examiners in all groups.

In Group 1, examination time was longer and tracked time was shorter when E1a performed the perimetry, but the differences were not statistically significant (P=0.171 and P=0.390, respectively). In Group 2, E1a had a significantly longer tracked time (P=0.013) and a nonsignificantly longer examination time (P=0.522). In Group 3, tracked time was slightly longer (P=0.216) and examination time significantly shorter (P=0.035) for the perimetries carried out by E1a.

Intraobserver variability

The mean measurement differences between the two examinations of E1 (E1a and E1b) are shown in Table 4. The differences between the examinations were all nonsignificant using one-way ANOVA in either of the groups (Group 1: P=0.802; Group 2: P=0.135; and Group 3: P=0.577, respectively).

Table 4 Intraexaminer variability

Bland–Altman plots again showed that the limits of agreement were narrow with respect to the mean differential light threshold (Table 4).

In all the groups, all data points lay at or within 1.96 SDs of the mean.

To further assess agreement, the ICC was calculated and found to be 0.937 (95% CI 0.813–0.979) in Group 1, 0.976 (95% CI 0.929–0.992) in Group 2, and 0.997 (95% CI 0.967–1.000) in Group 3. The analysis indicated good agreement in all groups between the two examinations performed by one examiner.

Examination time and tracked time were longer (P=0.110 and P=0.615, respectively) in Group 1 in the first perimetries (E1a). In Groups 2 and 3, the first examinations (E1a) had slightly shorter tracked (P=0.981 and P=0.436, respectively) and examination times (P=0.522 and P=0.173, respectively).

Discussion

Any new diagnostic imaging device requires an initial evaluation that includes reproducibility, reliability, and variability. As shown in an earlier study, the MP-1 provides reproducible threshold values, with a systematic difference of 11.4–18.3 dB compared to standard Octopus perimetry.20 In another paper, results comparable to those of SLO perimetry were obtained.24 When perimetric findings based on the scotoma depths were compared, there was near-complete agreement between the SLO and MP-1 perimetry. Sawa et al25 found a larger scotoma size with the MP-1 than with the SLO in eight of 15 examined eyes. Using the MP-1 within a sensitive area of SLO scotometry, decreases in retinal threshold sensitivity were found in all the eyes. The location of the preferred retinal locus and fixation stability in the MP-1 fixation test correlated significantly with those in SLO scotometry.

In contrast to these comparable results regarding the newer devices, there are some less reliable results in the literature: Keltner et al26 described that 85.9% of abnormalities in visual field were not confirmed in the retests in the Ocular Hypertension Treatment Study and pointed out the importance of reproducible results for long-term follow-up. In a later study, the same group even found one or more normal tests on follow-up after confirmation of glaucoma by three consecutive, abnormal, reliable test results with the Humphrey Field Analyser in 12%.27 These results suggest that either or both perimetric testing and early glaucomatous visual field loss may be inherently variable.27 As reported in earlier investigations, the amount of variability is much higher in patients with glaucomatous visual field loss;28, 29, 30 so the comparison with microperimetry in nonglaucomatous patients seems to be difficult.

In the current study, we evaluated the interexaminer and intraexaminer reliability of the MP-1.

A mandatory condition for a reliable examination is stable reproducibility and operator-independent results. The reliability should first be tested in healthy volunteers with good compliance to prevent biasing due to possible influencing factors.

Our results suggest a good reliability, allowing examiner-independent measurements in different groups of participants.

A possible limitation of our study is the small study population, but it should be seen as a first attempt to evaluate the reliability of the MP-1. The results show good agreement in all three groups and also in the total study population.

In Group 3, the ARMD patients, there was also no significant interexaminer or intraexaminer difference. However, the larger SDs of the mean differential light threshold and the larger coefficients of variation reflect the inhomogeneity of this group. Although visual acuity did not differ significantly between Groups 2 and 3, the ARMD patients needed larger fixation targets (5° instead of 2°) to maintain stable fixation. In previous examinations of other ARMD patients, a single 2° cross as fixation target made it difficult to maintain stable fixation.

These facts, as well as the variability within the group, caused us to plan an additional study with a larger sample size of ARMD patients with an adapted ARMD test grid (adapted stimulus arrangement in closer distribution, covering a smaller area).

Not surprisingly, examination and tracked times were shortest in Group 1 and longest in Group 3. Examination times were comparable between Groups 2 and 3, both of which comprised elderly participants. Regarding the interobserver variability, E1a tended to have longer examination times in all groups. This could hint at a possible operator dependency, because E1 was the newly recruited staff member with no prior experience.

Concerning the intraobserver variability, a possible learning effect was found in Group 1, the healthy young volunteers, with shorter examination and tracked times in the second examination. A negative learning effect was found in Groups 2 and 3, with longer examination times. This could be explained by the gap of 1 week between the examinations, in which time, presumably, the older participants forgot the positive learning effect of the previous week. On the other hand, a learning effect performing perimetry even with monthly intervals is well described in the literature.31, 32

Otherwise, these results could also be attributed to possible operator dependency, because the third examination was always performed by E1, the newly recruited staff member with no further experience.

Regarding the fixation stability, the elderly patients in Groups 2 and 3 showed multiple fixation losses. As mentioned above, ARMD patients needed a 5° fixation cross instead of 2° to maintain stable fixation. Therefore, more fixation losses were found in the older normal controls in Group 2 than in the ARMD patients in Group 3. Similar findings are reported in the literature, in which age was a significant factor for fixation loss and therefore for unreliability of Humphrey visual field testing.33 Rohrschneider et al34 also found a decrease of fixation stability with increasing age, even in normal subjects, when evaluating the SLO. In contrast, Kosnik et al35 described no age-dependent difference in fixation stability measured by a Scientific Research International Mark IV dual Purkinje image eye tracker. However, older observers showed greater variability in their fixations along the horizontal meridian than along the vertical meridian;35 these more eccentric movements might be measured as fixation losses more often.

The increasing number of patients with diabetic maculopathy36 and patients with ARMD37 highlights the need for a diagnostic instrument that allows a precise analysis of the central visual field and enables an exact correlation between fundus pathology and corresponding functional defects. Microperimetry may be of value in the follow-up of diabetic macular oedema (DME), as it incorporates a functional measure that complements the prognostic value of OCT and visual acuity.36 Significant correlations between mean retinal sensitivities measured by the MP-1, the foveal thickness in the OCT, and visual acuity were found in a retrospective chart review of patients with DME.38 Vujosevic et al36 compared the changes in macular sensitivity and macular thickness in different degrees of DME. They found macular sensitivity to be a relevant explanatory variable of visual function, independent of macular thickness data.

Varano et al39 showed a nonsignificant increase in visual acuity and macular sensitivity after photodynamic therapy in 14 myopic eyes with choroidal neovascularization.

In a previous study, we evaluated the potential benefit of macular function tests in patients with macular hole and macular pucker who underwent macular surgery.40 The results highlighted a significant improvement in central visual function after surgery in both groups. We could demonstrate a significant increase in visual acuity, retinal sensitivity, and retinal fixation measured with the MP-1, whereas preferential hyperacuity perimeter measurements could not identify a significant difference between the pre- and post-operative results.

These studies point out that the MP-1 might be helpful for exact evaluation of macular function in patients with macular diseases. Our study demonstrates a good reliability, allowing examiner-independent measurements.