Introduction

Perimetric evaluation of the central 24 degrees of monocular vision is the cornerstone of glaucoma diagnosis and monitoring and is critical for evaluating numerous diseases of the eye, optic nerve and visual pathways. Conventional perimetry, either standard automated perimetry (SAP) or frequency doubling perimetry (FDP), is commonly performed using dedicated calibrated perimetric machines [1]. Traditional printouts have been replaced by electronic files that can be linked to electronic medical records, or raw data that can be compiled with imaging modalities in dedicated software, but still involve the traditional hardware at dedicated practices run by qualified staff [2].

There are several advantages for perimetric testing using a web-application on a personal computer [3]. Such an application would allow at-home monitoring, reducing a potential barrier to access glaucoma screening and monitoring. COVID-19 has taught us the importance of remote health evaluation and telemedicine [4]. Developing countries, or remote/disadvantaged communities within developed countries, may suffer from lack of access to expensive, dedicated machinery or specialised, highly trained eye health professionals [5]. In contrast, computers are versatile and ubiquitous, and an online web-application requires only a basic computer with a standard monitor, a mouse and internet access.

Even for communities with plentiful access to visual field machines and eye healthcare staff, there are advantages to a computer-based system of perimetry. Many people do not enjoy perimetry, finding the test stressful and difficult [6]. A computer-based app can be made more user-friendly and, by borrowing some features of a traditional computer game, more enjoyable. Truly digitising the service will lend itself much more easily to enhancements with artificial intelligence and machine learning, and will integrate better with a pivot to online health delivery [7, 8].

There are several potential hurdles in converting traditional perimetry to a web-based application, namely user monitoring and variations in monitor size and display; however, the design of the online test can be adapted to overcome or compensate for most of these. These hurdles need to be considered in the context of the advantages of online delivery compared with machine-based delivery. The features of the web application are discussed further below.

A pilot study has demonstrated the benefit of the personal computer-based online circular contrast perimetry (OCCP) test to distinguish glaucoma from normal eyes with sensitivity 92% and specificity 87% [9]. This pilot study also found a similar area under receiver operating characteristic curve (AUC) for the OCCP to SAP and optical coherence tomography (OCT) parameters, as well as good agreement with these parameters using Cohen’s kappa coefficient.

The OCCP test can be further refined by optimising parameters and establishing a normative database. The latter is important to allow calculation of total deviation and pattern deviation plots, mean deviation (MD), pattern standard deviation (PSD) and visual field index (VFI) [10, 11].

Increased test duration often leads to user fatigue and associated inaccuracy [12]. Conversely, perimetry performed too quickly does not take sufficiently accurate measurements. A balance must be reached to ensure the most reliable and consistent results [13, 14]. Repeat staircase estimations (RSEs) and probability density functions (PDFs) are commonly used techniques to optimise accuracy.

An RSE checks for errors: it is initiated if the difference between the measured threshold and the estimate is greater than a certain value. In the initial OCCP model, the RSE was initiated at a deviation of seven relative decibels, with the final threshold taken as the average of the two staircase estimates; this was a cautious over-estimation of the need for repetition, based on similar testing strategies [15, 16]. It is important to determine whether RSEs are useful and, if so, what the appropriate deviation for triggering an RSE should be.

PDFs allow thresholds to be determined with fewer sample targets than formal double staircases, based on prior knowledge of age-matched population normal standards. This underlies much of the efficiency of the modern Swedish Interactive Threshold Algorithm (SITA) family and requires a carefully chosen error related factor (ERF) to calculate the appropriate cutoffs for the PDF [17, 18].

We describe the collection of a normative database and process of refining test parameters to calculate MD, PSD and to optimise the OCCP by determining the appropriate use of PDFs and RSEs.

Methods

Subjects

Subject recruitment was from a Melbourne multisite Ophthalmology subspecialty practice in 2021. The study adhered to the tenets of the Declaration of Helsinki. Ethical approval was provided by the Royal Australian and New Zealand College of Ophthalmology Human Research Ethics Committee. Subjects were consecutively invited to participate in the study after providing informed consent.

Inclusion criteria were: best-corrected visual acuity ≤0.3 logarithm of the minimum angle of resolution (logMAR); open anterior chamber angle; reliable SAP and OCCP test results; normal SAP results; satisfactory optical coherence tomography (OCT) image quality with normal optic nerve head (ONH) and retinal nerve fibre layer (RNFL) appearance; willingness and ability to provide informed written consent.

Exclusion criteria were: ametropia >±6 dioptres; ocular pathology (such as glaucoma, visually significant cataract (Lens Opacities Classification System III greater than Grade 2) [19], non-glaucomatous optic neuropathy, retinal or macular pathology); angle abnormalities; optic nerve head anomalies; large peripapillary atrophy; media opacities preventing good image scans; neurological disorders; medication that could modify perimetry results (e.g., chloroquine, vigabatrin, pilocarpine); neurological morbidity that could affect perimetry; significant cognitive impairment; previous intraocular surgery (excluding cataract surgery performed at least six months prior); OCT findings consistent with glaucomatous optic neuropathy (GON); unreliable SAP and OCCP results.

OCCP and SAP tests were considered unreliable if false positives exceeded 15%, false negatives exceeded 33%, or fixation losses exceeded 20%. A normal SAP result was defined according to the criteria of Hodapp and associates [20]. Glaucoma was diagnosed based on characteristic disc appearance and visual field changes defined by Anderson’s criteria [21].

OCT scans with segmentation errors or signal strength lower than 8/10 were rejected. All examinations were reviewed by ophthalmologist SS for appropriate centration.

Assessment of clinical parameters

All participants underwent a complete ophthalmologic examination, including personal and family history, to rule out ocular or systemic pathology that could affect vision. SAP, OCCP and Cirrus OCT of the ONH and macula (Carl Zeiss Meditec Inc, Dublin, California, USA) were performed within a two-month period. SAP was performed using the Humphrey Field Analyzer SITA standard 24–2 test (Zeiss). Intraocular pressure was measured with the Goldmann applanation tonometer (GAT; Haag-Streit International, Bern, Switzerland). Central corneal thickness (CCT) was measured with the PachMate hand-held pachymeter (Exton PA, USA).

Normal ONH and RNFL appearance was clinically defined as: cup-to-disk ratio ≤0.6 and the absence of diffuse or focal rim thinning, cupping, localised pallor, optic disk haemorrhage, or RNFL defects. ONH and RNFL were classified as GON if at least one of the following was evident: notching involving ≥2 clock hours; focal or diffuse atrophy of neural rim area involving ≥2 clock hours; inter-eye vertical cup-disk asymmetry ≥0.3; ONH excavation; disc haemorrhage; or generalised or focal atrophy of the RNFL. The eyes were classified as GON based on consensus grading of the ONH by two expert graders (SS, DB).

The online circular contrast perimetry test

Online monocular 24-degree 52-loci perimetry was delivered through a web-application designed using circular flickering sinusoidal targets. These targets are similar to those in Pulsar perimetry (Haag-Streit International, Bern, Switzerland) with the same level of contrast in all radial directions, to avoid stimulation of those cells that selectively respond to a given orientation [22, 23]. They are similar in size to targets used in 24–2 FDP (Welch Allyn, Skaneateles, N.Y., and Carl Zeiss Meditec, Dublin, Calif.) at 3.5 degrees of visual angle and like FDP targets they are of consistent contrast throughout their spatial extent [24].

Users were asked to maintain gaze on a spinning golden star which moves mid-test from the upper to lower screen to maximise sampling area. Subjects were instructed to click the mouse when they saw a target in their peripheral vision and were instructed to use spectacles with near add as required, similar to their conventional computer use.

Similar to traditional FDP, each flicker lasted 60 milliseconds, making three cycles of on/off flicker lasting 300 milliseconds; likewise there was a ramp up and down of contrast at the beginning and end of the target presentation to prevent temporal transients [24, 25]. Unlike FDP, in which light and dark bands vary around a mean of background luminance, target light-band colour was fixed to background (a fixed light grey), while dark-band varied to achieve the desired contrast level, similar to a luminance pedestal flicker for stimulus decrements, as described by Anderson and Vingrys [26]. The sequence of stimulus presentation is adapted to the patient’s response time; i.e., accepted latency of response time varies according to prior user responses to ensure the test is at an appropriate tempo for each user. A random delay between stimuli is added in order to avoid rhythmic responses.

Relative luminance (as a percentage) was calculated for each of the 256 greyscale levels (8 bits per channel), as defined by the Web Content Accessibility Guidelines standard for relative luminance calculation, ranging from black (0,0,0) at 0% to pure white (255,255,255) at 100% [27]. The Michelson formula was then used to calculate the contrast of targets (in relative decibels) by comparing their peaks and troughs:

$$\mathrm{Relative\ decibel\ (rdB)} = -2\log\left(\frac{\mathrm{RL}_1 - \mathrm{RL}_2}{\mathrm{RL}_1 + \mathrm{RL}_2}\right)$$

Where RL1 is the light band maximum [28] and RL2 the dark band minimum relative luminance. A staircase system with two reversals on 37 rdB levels ranging from zero to 36 was used. Similar to conventional SITA algorithms, prior to the initial reversal (i.e., going from seeing to not seeing the target, or vice versa) the magnitude of the change in target level was four rdB; after the first reversal this reduced to two rdB [15]. Head position was assessed via the monitor’s webcam using artificial intelligence; details of the face were not stored or used for recognition purposes.
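
The contrast calculation can be sketched in a few lines of Python. This is an illustrative sketch, not the OCCP implementation: the function names are hypothetical, the sRGB linearisation constants are those of the WCAG 2.x relative-luminance definition, the logarithm is assumed to be base 10, and the scale factor of 2 is taken from the formula as printed.

```python
import math

def relative_luminance(grey):
    """WCAG 2.x relative luminance (0..1) of an sRGB grey level 0..255.

    For a pure grey (R = G = B) the weighted channel sum collapses to the
    linearised value of a single channel.
    """
    c = grey / 255
    if c <= 0.03928:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4

def michelson_contrast(rl_light, rl_dark):
    """Michelson contrast between light-band and dark-band luminance."""
    if rl_light <= rl_dark:
        raise ValueError("light band must be brighter than dark band")
    return (rl_light - rl_dark) / (rl_light + rl_dark)

def relative_decibel(rl_light, rl_dark, scale=2.0):
    """rdB = -scale * log10(contrast); base 10 and scale are assumptions."""
    return -scale * math.log10(michelson_contrast(rl_light, rl_dark))

# Hypothetical example: a light-grey background band against a darker band.
light = relative_luminance(200)
dark = relative_luminance(180)
level = relative_decibel(light, dark)
```

Because the Michelson ratio is scale-invariant, it makes no difference whether relative luminance is expressed as a fraction (as here) or as a percentage.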

Blind-spot localisation at the start of the test was used to subsequently count fixation losses. Estimation of the blind spot was at 15 degrees temporal and 0.5 degrees inferior to fixation. Using a grid spanning 4 × 10 degrees overlying the proposed blind spot area, small, close spots were tested to map out the user’s blind-spot. The blind-spot and loci for 24–2 perimetry were spaced on the monitor relative to fixation, determined trigonometrically so that patients performed the test using a computer at 40–45 cm.
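
The trigonometric spacing can be illustrated as follows. This is a minimal sketch under stated assumptions: it treats a single meridian on a flat screen perpendicular to the line of sight, the helper names are hypothetical, and the 24-inch 1920 × 1080 monitor at 40 cm is used only as an example.

```python
import math

def screen_offset_cm(eccentricity_deg, viewing_distance_cm):
    """Distance from fixation on a flat screen for a given visual angle."""
    return viewing_distance_cm * math.tan(math.radians(eccentricity_deg))

def pixels_per_cm(diagonal_inch, res_x, res_y):
    """Pixel density, assuming square pixels."""
    diagonal_px = math.hypot(res_x, res_y)
    return diagonal_px / (diagonal_inch * 2.54)

# Loci 6 degrees apart along one meridian, viewed from 40 cm.
eccentricities = [3, 9, 15, 21]
offsets_cm = [screen_offset_cm(e, 40) for e in eccentricities]
density = pixels_per_cm(24, 1920, 1080)
offsets_px = [round(o * density) for o in offsets_cm]

# Gaps between neighbouring loci grow towards the periphery even though
# the angular spacing is constant.
gaps_cm = [b - a for a, b in zip(offsets_cm, offsets_cm[1:])]
```

Equal angular steps map to progressively larger on-screen steps, which is why peripheral targets sit further apart on the monitor than central ones.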

Reliability indices were assessed in a similar manner to conventional perimetry. Fixation loss was assessed using smaller stimuli within the measured blind spot at 0 rdB. False negatives were assessed with stimuli brighter than measured sensitivity. False positives were counted as mouse clicks made outside the accepted response latency period.
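
As a small illustration, the reliability check might be expressed as below. The cut-offs are those quoted in the Subjects section (false positives >15%, false negatives >33%, fixation losses >20%); the function names and the way catch trials are counted are assumptions made for the sketch.

```python
# Cut-offs quoted in the Methods; exceeding any one marks a test unreliable.
MAX_FALSE_POSITIVE = 0.15
MAX_FALSE_NEGATIVE = 0.33
MAX_FIXATION_LOSS = 0.20

def rate(events, trials):
    """Proportion of catch trials that produced the unwanted event."""
    return events / trials if trials else 0.0

def is_reliable(false_positive, false_negative, fixation_loss):
    """True when no reliability index exceeds its cut-off."""
    return (
        false_positive <= MAX_FALSE_POSITIVE
        and false_negative <= MAX_FALSE_NEGATIVE
        and fixation_loss <= MAX_FIXATION_LOSS
    )

# Hypothetical counts: 1 of 10 false-positive catches, 2 of 10
# false-negative catches, 1 of 12 blind-spot presentations seen.
ok = is_reliable(rate(1, 10), rate(2, 10), rate(1, 12))
```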

Sound was used to guide users through the test. A positive, reassuring noise is produced every time the user clicks at a correct time, similar to the noise when finding a coin in a computer game. A negative sound, similar to the noise when an error occurs when using computers or computer games, is produced when the user clicks at an incorrect time; this corresponds to a false positive.

The OCCP was coded in JavaScript, with the server side of the web application coded using a Python microframework. Using the window.requestAnimationFrame method with a timestamp callback in the JavaScript code allows precise timing of loci presentations and flicker despite potential inconsistencies in screen refresh rate (https://developer.mozilla.org/en-US/docs/Web/API/window/requestAnimationFrame).

Test settings

All subjects underwent the OCCP in a controlled clinical environment, supervised by researchers DB or LB who ensured consistency of testing protocol. The room lighting conditions were kept dark (save for the monitor light) for 5 min prior to testing commencement. The computer was turned on for at least 30 min prior to test administration, to ensure consistency of adaptation and screen brightness. Screens were calibrated using a SpyderX screen photometer (Datacolor, Lucerne, Switzerland). Background screen luminance was set at 180 cd/m² output. Gamma was set at 2.2 and white temperature 6500 K, consistent with most modern monitor displays, and all monitors used were 24-inch diagonal screen size with resolution 1920 × 1080 pixels. Viewing distance was measured at 40 cm immediately prior to test commencement.

Main outcome measures

These were per-point mean sensitivity, test duration, number of stimuli presentations per test, number of RSEs and mean square error (MSE) for OCCP; per-point mean sensitivity, MD, PSD and test duration for SAP.

Statistical analysis

Statistical analyses were conducted using Statistical Package for Social Sciences (SPSS, Inc., Chicago, IL, USA) and Real Statistics in Excel 2016 (Microsoft 365). Significance was set at p < 0.05, with adjustment by the Bonferroni method. Data were checked for normality using the Shapiro–Wilk statistic.

Continuous variables were presented as mean (SD). Mean (SD) threshold was calculated per locus tested in the central 24 degrees. Linear regression analysis was conducted to evaluate the relationship between age and threshold, and between age and test duration. Bland–Altman analyses were used to analyse the agreement and estimate the 95% limits of agreement (LoA) between the point-wise sensitivities for OCCP and SAP [29].
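
For readers unfamiliar with the method, the Bland–Altman bias and 95% limits of agreement reduce to the mean and standard deviation of the paired differences. The sketch below uses invented illustrative values, not study data, and the function name is hypothetical.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Bias is the mean of the paired differences; the 95% limits of
    agreement are bias +/- 1.96 sample standard deviations.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Invented pointwise sensitivities for two tests at four loci.
test_a = [30.0, 29.0, 27.0, 25.0]
test_b = [27.0, 24.0, 24.0, 20.0]
bias, lower, upper = bland_altman(test_a, test_b)
```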

Modelling was used to determine optimal use of RSEs and PDFs. Accuracy was measured based on pointwise and full-field MSE between measured and simulated threshold.
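
To make the idea of comparing measured with simulated thresholds concrete, the sketch below simulates the staircase rule described for the OCCP (four rdB steps until the first reversal, two rdB steps afterwards, stopping at the second reversal, levels clamped to 0–36) against a known "true" threshold. It is not the study's modelling code: the deterministic observer, the use of the last seen level as the estimate, and all function names are simplifying assumptions.

```python
def run_staircase(true_threshold, start_level, sees=None, lo=0, hi=36):
    """Simulate one staircase; return (estimate, number of presentations).

    Higher rdB means lower contrast, so the default observer sees every
    level at or below the true threshold.
    """
    if sees is None:
        sees = lambda level: level <= true_threshold
    level = max(lo, min(hi, start_level))
    step, reversals, presentations = 4, 0, 0
    last_response, last_seen = None, None
    while True:
        seen = sees(level)
        presentations += 1
        if seen:
            last_seen = level
        if last_response is not None and seen != last_response:
            reversals += 1
            step = 2  # finer step after the first reversal
            if reversals == 2:
                break
        last_response = seen
        next_level = max(lo, min(hi, level + step if seen else level - step))
        if next_level == level:  # pinned at a boundary, nothing left to test
            break
        level = next_level
    # A locus never seen is reported as the lowest level for simplicity.
    return (last_seen if last_seen is not None else lo), presentations

def mean_square_error(estimates, truths):
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)

truths = [25, 10]
results = [run_staircase(t, start_level=20) for t in truths]
mse = mean_square_error([r[0] for r in results], truths)
```

A noisy observer can be supplied through the `sees` argument (for example, a function that draws from a frequency-of-seeing curve) to study how response variability affects the MSE.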

Repeat staircase estimation

Number of RSEs and MSE improvement was analysed over a range of potential sensitivity thresholds. RSEs were defined as useful (second measured sensitivity closer to expected than first by four or more rdB), harmful (second measured sensitivity further from expected than first by four or more rdB) or useless (neither useful nor harmful).
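
The trigger and the three-way classification can be written compactly. The sketch below is one reading of the definitions above, in which "closer by four or more rdB" is taken to mean that the absolute deviation from the expected value changed by at least four rdB; the function names are hypothetical, and the seven rdB default trigger is the value used in the initial OCCP model.

```python
def needs_rse(measured, expected, trigger=7):
    """A repeat staircase is started when the deviation exceeds the trigger."""
    return abs(measured - expected) > trigger

def final_threshold(first, second):
    """Initial OCCP rule: average the two staircase estimates."""
    return (first + second) / 2

def classify_rse(first, second, expected, margin=4):
    """Label a repeat staircase as useful, harmful or useless."""
    improvement = abs(first - expected) - abs(second - expected)
    if improvement >= margin:
        return "useful"
    if improvement <= -margin:
        return "harmful"
    return "useless"

# Hypothetical locus with an expected sensitivity of 28 rdB.
expected = 28
first, second = 18, 26
if needs_rse(first, expected):
    label = classify_rse(first, second, expected)
    threshold = final_threshold(first, second)
```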

Probability density functions and error related factor

The ERF was calculated according to the following equation: [17, 18]

$$\mathrm{ERF} = a + b_1 \times \sqrt{\mathrm{variance}} + b_2 \times \mathrm{threshold\ value}$$

To optimise the use of PDFs, modelling was performed to assess the impact of different ERF limits on test accuracy and efficiency, keeping other parameters constant.
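
The role of the ERF limit can be illustrated with a discretised PDF over the 37 rdB levels that is updated after each response. This is a sketch under stated assumptions rather than the OCCP or SITA implementation: the Gaussian prior, the logistic frequency-of-seeing curve, the coefficient values for a, b1 and b2, and the rule "stop once the ERF falls below the limit" are all placeholders chosen for illustration.

```python
import math

LEVELS = list(range(37))  # rdB levels 0..36

def normal_prior(mean, sd):
    """Discretised Gaussian prior, e.g. centred on an age-matched normal."""
    weights = [math.exp(-0.5 * ((lv - mean) / sd) ** 2) for lv in LEVELS]
    total = sum(weights)
    return [w / total for w in weights]

def p_seen(level, threshold, slope=2.0):
    """Assumed logistic frequency-of-seeing curve (higher rdB is fainter)."""
    return 1 / (1 + math.exp((level - threshold) / slope))

def update(pdf, level, seen):
    """Bayesian update of the threshold PDF after one response."""
    like = [p_seen(level, t) if seen else 1 - p_seen(level, t) for t in LEVELS]
    post = [p * l for p, l in zip(pdf, like)]
    total = sum(post)
    return [p / total for p in post]

def pdf_mean_variance(pdf):
    m = sum(lv * p for lv, p in zip(LEVELS, pdf))
    v = sum(p * (lv - m) ** 2 for lv, p in zip(LEVELS, pdf))
    return m, v

def erf_value(pdf, a=0.0, b1=1.0, b2=0.0):
    """ERF = a + b1 * sqrt(variance) + b2 * threshold (placeholder values)."""
    m, v = pdf_mean_variance(pdf)
    return a + b1 * math.sqrt(v) + b2 * m

def should_stop(pdf, erf_limit):
    return erf_value(pdf) < erf_limit

prior = normal_prior(mean=28, sd=4)
posterior = update(update(prior, 28, True), 30, False)
```

Raising the limit lets testing at a locus stop after fewer presentations, at the cost of a wider remaining PDF, which is the trade-off between stimulus count and MSE explored in the modelling.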

Determination of global indices

MD, PSD and VFI were determined for the OCCP based on previously defined formulae and the OCCP sensitivities in this cohort [10, 11].
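
For orientation, one commonly cited variance-weighted formulation of MD and PSD is sketched below. Whether the OCCP uses exactly this form is an assumption; the cited references [10, 11] are the authority. The function names and the example values are hypothetical.

```python
import math

def mean_deviation(measured, normal, variance):
    """Variance-weighted mean of the pointwise deviations from normal."""
    n = len(measured)
    weighted = sum((x - z) / s2 for x, z, s2 in zip(measured, normal, variance)) / n
    weight = sum(1 / s2 for s2 in variance) / n
    return weighted / weight

def pattern_standard_deviation(measured, normal, variance):
    """Weighted spread of the deviations around the mean deviation."""
    n = len(measured)
    md = mean_deviation(measured, normal, variance)
    mean_variance = sum(variance) / n
    spread = sum(
        (x - z - md) ** 2 / s2 for x, z, s2 in zip(measured, normal, variance)
    ) / (n - 1)
    return math.sqrt(mean_variance * spread)

# A uniformly depressed field: every locus 2 units below its normal value.
measured = [26.0, 24.0, 22.0]
normal = [28.0, 26.0, 24.0]
variance = [4.0, 4.0, 4.0]
md = mean_deviation(measured, normal, variance)
psd = pattern_standard_deviation(measured, normal, variance)
```

In this example the uniform depression appears entirely in MD, while PSD stays at zero because the shape of the field matches the normative hill of vision.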

Results

158 eyes of 101 patients with a mean age of 62.9 ± 14.3 years were included in the study. Thirteen eyes (7.6%) from 13 patients were excluded due to unreliable OCCP results. Mean testing time was 377 (±41) seconds per eye.

Table 1 presents the patient characteristics and SAP test parameters. Mean (SD) of pointwise thresholds are presented in Fig. 1a comparing OCCP with SAP for left and right eyes; Fig. 1b outlines the respective mean sensitivities plotted for each of the 52 loci. A Bland–Altman plot demonstrated good agreement between OCCP and SAP for 51 out of 52 (98%) loci, with a bias of 4.02 dB (95% confidence interval = 3.77–4.27) and 95% limits of agreement ranging from 2.24 to 5.76 dB, p < 0.001 (Fig. 2a). Similar to SAP, mean sensitivity for OCCP reduced with age, corresponding to a reduction of 1.0 relative decibel per decade (Fig. 2b). Mean test duration increased with age at 14 ± 2.5 s per decade (p < 0.001).

Table 1 Patient characteristics and perimetric test results.
Fig. 1: The physiological hill of vision.

a Pointwise mean sensitivity and standard deviation for SAP (decibels) versus OCCP (relative decibels). b Mean SAP versus OCCP sensitivity distributed over loci (vertical dotted lines divide rows in the perimetric map).

Fig. 2: Comparison between SAP and OCCP.

a Bland Altman plot comparing SAP with OCCP for 52 loci. The continuous line represents the mean differences (bias) between the two tests. The dashed and dotted lines represent the 95% limits of agreement (Bias ±1.96 SD). b Mean sensitivity of SAP and OCCP reduces with age.

Repeat staircase estimation

Sensitivity on the first staircase was determined after an average of 5.15 (±2.08) stimuli; this increased to 7.97 (±2.58) stimuli for loci evaluated with a repeat staircase estimation. MSE reduction was modest at all assessed thresholds (Fig. 3a).

Fig. 3: Optimising repeat staircase estimates (RSEs) and error related factor (ERF).

a Number of RSEs in cohort (bars) subdivided according to useful, useless or harmful; mean square error (MSE) reduction (black dots); at different RSE instigation threshold values. b Number of stimuli required per test versus average mean square error (MSE): the influence of varying ERF limit on use of probability density functions.

Probability density functions and error related factor

The utility of PDFs was measured at varying ERF limits, demonstrating increasing MSE with reduced testing stimuli (Fig. 3b). At an acceptable MSE level of 1.0 rdB², an ERF limit of 0.85 was chosen (solid black line).

Determination of global indices

Based on this normative database MD, PSD and VFI for the OCCP test were calculated for the cohort (Table 1).

Discussion

Given the potential advantages of online perimetry, it is important to refine testing by optimising parameters and establishing expected results in normal eyes. This study demonstrates that an age-sensitive normative database can be established for OCCP. This information can be used to calculate total deviation plots, pattern deviation plots, PSD and MD scores [10]. Furthermore two test parameters designed to improve efficiency and accuracy of testing, RSEs and PDFs, were critically evaluated.

The cohort’s pointwise sensitivity to circular contrast targets via an online app obeyed a traditional hill of vision pattern, with sensitivities similar in magnitude, slope and standard deviation to traditional SAP. Agreement between tests was good; a bias of 4 dB is smaller than the difference between SAP and FDP normal sensitivities [24]. Such a pattern of orderly results is promising; however, further analysis is required to assess test repeatability and the ability to differentiate normal from disease states. The pilot study (49 eyes) demonstrated high sensitivity and specificity in separating glaucomatous eyes from healthy controls [9].

Similar to traditional FDP and SAP, age had a modest influence on test sensitivity [30]. This cohort demonstrated a reduction of 1.0 rdB per decade, comparable to 0.7 dB per decade on traditional FDP technology [31].

RSEs are a component of traditional step-wise perimetry; however, their utility is increasingly being questioned in modern algorithms that prioritise speed, and they have been omitted from the current SITA Faster algorithm [18]. There are several fundamental problems with RSEs to consider: the second staircase might not necessarily be an improvement on the first; it is unclear what to do if the second staircase deviates further than the first from expected sensitivity; and the relative weighting of the first and second staircases is uncertain (including whether it should be influenced by the deviation of either result from expected). In addition, the extra time may impair overall test performance [17, 32]. In this cohort the RSEs appeared to increase the test duration with minimal benefit to error improvement. In light of these factors they will not be included in subsequent versions of the OCCP. Further studies will be performed to assess the influence of removing RSEs on OCCP sensitivity and specificity.

PDFs are an established component of modern perimetry, both through zippy estimation by sequential testing (ZEST) and modified staircase (such as SITA) algorithms [15, 16, 33]. PDFs reduce test duration without necessarily sacrificing accuracy, by predicting point threshold from user responses prior to full double staircase completion. This is achievable via an age-matched normative dataset. As in the development of SITA and SITA-like algorithms, the ERF limit can be varied to optimise the relationship between reduction of stimuli number and increase in MSE. In this cohort an ERF limit of 0.85 was chosen; this is slightly higher than, but comparable to, the 0.69 set for SITA-standard perimetry [15].

In this cohort test duration was less than in previous test versions [9]. One reason for the reduced duration is the new variable inter-spot interval, which alters based on user response time. For users with faster reflexes this can result in a quicker test; likewise, for those with slower reflexes it can allow the test to be taken at a tempo that is comfortable for them, which may minimise test-related stress. Further studies to quantitatively assess user satisfaction of machine- vs computer-based perimetry would be of interest. In addition, it is expected that with the addition of PDFs and removal of RSEs the test duration will reduce further.

As discussed above, there are several advantages to web-based perimetry: access in under-resourced settings, telemedicine if access to clinics is not possible (e.g., during a pandemic), more frequent testing, the ability to make the test more enjoyable, the use of artificial intelligence, and alignment with computer-based databases. Also, its convenient at-home use may lead to increased penetration of screening; this is significant even in developed countries, where 50% of glaucoma remains undiagnosed [34]. A barrier for current screening is that people must attend an optometrist’s or ophthalmologist’s practice.

Our findings are consistent with other studies evaluating perimetry through non-conventional means, such as tablets, personal computers, or virtual reality headsets. Tablet perimetry has good reliability and repeatability, and is useful for home-perimetry; however, one potential disadvantage is access to hardware [35, 36]. Computer-based standard white-on-white perimetry has been previously reported with good validity compared to conventional white-on-white perimetry [3]. Similarly, 24° central perimetry has been achieved using a virtual reality headpiece and smartphone system with similar results to SAP [37].

More work is required to assess the feasibility of the OCCP for at-home monitoring. It is important to provide clear instructional content immediately prior to the test to ensure correct testing procedures and environmental setup at-home.

Consistent head positioning, easily achieved on dedicated perimeters, will influence mapping of loci and test-retest consistency. On a self-administered web-based test, web-cam facial monitoring can be used to assess head position. Fixation losses detected during the test can be used to trigger a reminder to the user to focus on the fixation target.

Dealing with variations in monitors is a potential challenge. Variability of luminance between screens can be compensated by appropriate pre-test greyscale calibration and by adjusting contrast, brightness, gamma and white temperature accordingly [38]. Within-target contrast thresholds, such as those used in the OCCP and in FDP, might offer some advantages over SAP when delivered over varying monitors, given that the contrast is calculated from variation within the target and not against a diffuse background (as in SAP), which can vary across some (older) monitors [39]. Variations in screen size will occur but can be compensated by altering viewing distance to ensure a consistent viewing angle. Guidance through blind-spot determination can aid with this; the test will ask users to sit slightly closer for smaller screens and further away for larger ones. For example, should the blind spot be found to deviate too far peripherally or centrally, the user can be instructed to move closer to or further from the screen. Once screen size reaches a lower limit (such as for tablet devices) the test will instruct users to use a desktop or laptop computer. More work is required to assess generalisability of the OCCP to tablet devices.
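
The blind-spot-based distance guidance can be sketched as follows. This is an illustration only: it assumes the blind spot lies at the nominal 15 degrees temporal eccentricity used in the Methods (individual anatomy varies), takes the 40–45 cm range as the target viewing distance, and uses hypothetical function names and messages.

```python
import math

def estimated_distance_cm(blind_spot_offset_cm, blind_spot_ecc_deg=15.0):
    """Viewing distance implied by where the blind spot falls on screen."""
    return blind_spot_offset_cm / math.tan(math.radians(blind_spot_ecc_deg))

def distance_advice(blind_spot_offset_cm, nearest_cm=40.0, furthest_cm=45.0):
    """Suggest how the user should move, given the mapped blind spot."""
    distance = estimated_distance_cm(blind_spot_offset_cm)
    if distance > furthest_cm:
        return "move closer"  # blind spot mapped too far peripherally
    if distance < nearest_cm:
        return "move further away"  # blind spot mapped too centrally
    return "ok"

# Hypothetical mapped blind-spot centres, in cm from the fixation target.
advice = [distance_advice(x) for x in (13.5, 9.0, 11.0)]
```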

At a comfortable viewing distance, it is impossible to test the full 24° field on a standard computer screen without adjusting the fixation point. A moving fixation point can cause some problems. Namely, the entoptic after-image from staring at the pale screen for a prolonged period can wash out the subsequent image after the fixation has moved (and the relative position of the screen on the retina has shifted). For this reason, there is a 20 s delay after the fixation point moves from the top to the bottom of the screen, to allow re-adaptation of the user’s eye to the new screen positioning; this is sufficient to eliminate the effect of the entoptic after-image. It is uncertain what effect the mid-test repositioning of fixation has on accuracy, whether positive (giving a user a rest mid-test) or negative (the change in fixation position might make it harder to maintain fixation throughout the test). The delay, and the blind-spot localisation, add extra time to the test compared to machine-based perimetry. Future studies would be of interest to compare the duration of the OCCP (sped up by using the PDFs determined here) to SAP.

Perimetry using a dedicated perimeter including a Ganzfeld bowl has some advantages over a flat computer screen. Projection from the latter can be influenced by screen tilt and reflected oblique light sources [38]. For the OCCP, simple trigonometry was used to space the peripheral targets more widely apart than central targets to simulate the even 6 degree distribution of bowl-based perimetry [40].

Fixation is a challenge for any form of perimetry. One advantage of a web-based programme is that it allows the use of a rotating fixation point. A rotating star may be an easier fixation target than a fixed point [41]. Subsequent studies will evaluate whether users prefer kinetic to static fixation targets, and their influence on fixation loss rates.

This study has limitations. More testing is required to assess test-retest consistency and assess feasibility of testing on varying monitors, both in clinic and at home. It would be useful to gain insights into user experiences for web-based perimetry compared to machine-based perimetry. Larger validation studies comparing OCCP to current clinical investigations are required for the diagnosis and monitoring of glaucoma, as well as assessment of test reliability. As a scientific study all subjects here were supervised by a trained researcher; it would be interesting to know if the test were reliable for unsupervised self-testing at home, and appropriate messaging within the app is needed to ensure the test instructions are clear and easy to understand.

In conclusion, the OCCP appears to have a normal sensitivity that obeys a regular hill of vision, with good agreement to SAP on Bland-Altman analysis. Mean sensitivity declines with age consistent with similar perimetric tests. PDFs can be used to reduce test duration without a major impairment to accuracy, and RSEs can be removed without major impact on test accuracy. Circular contrast perimetry through a web-based application on a personal computer provides an accurate, rapid and convenient method to assess perimetry without the need of dedicated machinery.

Summary

What was known before

  • Standard automated perimetry over 24 degrees provides accurate diagnosis and monitoring for glaucoma.

  • It relies on dedicated custom built machinery in an eye clinic.

What this study adds

  • Similar perimetric services might be possible on personal computers through a web application, via the Online Circular Contrast Perimetry test.

  • This study demonstrates a normative database can be collected and some test parameters are optimised.