Diagnosis of colour vision deficits using eye movements

We set out to develop a simple objective test of functional colour vision based on eye movements made in response to moving patterns. We exploit the finding that while the motion of a colour-defined stimulus can be cancelled by adding a low-contrast luminance-defined stimulus moving in the opposite direction, the “equivalent luminance contrast” required for such cancellation is reduced when colour vision is abnormal. We used a consumer-grade infrared eye-tracker to measure eye movements made in response to coloured patterns drifting at different speeds. An automated analysis of these movements estimated individuals’ red-green equiluminant point and their equivalent luminance contrast. We tested 34 participants: 23 colour vision normal controls, 9 deuteranomalous and 2 protanomalous individuals. We obtained reliable estimates of strength of directed eye movements (i.e. combined optokinetic and voluntary tracking) for stimuli moving at 16 deg/s and could use these data to classify participants’ colour vision status with a sensitivity rate of 90.9% and a specificity rate of 91.3%. We conclude that an objective test of functional colour vision combining a motion-nulling technique with an automated analysis of eye movements can diagnose and assess the severity of protanopia and deuteranopia. The test places minimal demands on patients (who simply view a series of moving patterns for less than 90 s), requires modest operator expertise, and can be run on affordable hardware.

www.nature.com/scientificreports/

Such eye movements are mediated by partially shared brainstem and spinal pathways 22 and serve to minimise retinal slip 23 . Plotting the horizontal position of the eye (y-axis) against time (x-axis) yields a characteristic "saw-tooth" pattern, the slope of which varies with the direction of the slow-tracking movement (middle of Fig. 1). Critically, the direction of the tracking phase of a participant's OKN response closely matches their subjective report, a finding that has been confirmed in both human and non-human primates 24-26 . Because the strength of OKN is generally determined by the visibility of moving stimuli, and visibility can be manipulated along a variety of visual dimensions, OKN has proven to be a flexible technique for assessing visual functions such as acuity 27 , contrast sensitivity 28 , visual field loss 29 and refractive error 30 . Compared to perceptual report, OKN provides a more objective measure, requiring minimal compliance (typically only passive viewing of a series of movies).
Other studies have used eye movements to measure chromatic sensitivity in humans 31,32 and in non-foveate vertebrates 33 . Results indicate that purely chromatic stimuli elicit OKN 34,35 provided the grating is of sufficiently high contrast to support discrimination of its direction; the contrast required is considerably higher than for a luminance stimulus 36,37 . In the wider OKN literature, instructing participants to "attempt to fixate" (stare-OKN) generally results in low-amplitude, high-frequency nystagmus, whereas instructing participants to "follow the stimulus" (look-OKN) leads to low-frequency, large-amplitude nystagmus 38,39 . The use of look-OKN paradigms, which encourage active tracking of the stimulus, leads to higher estimates of chromatic sensitivity 40 . For example, Crognale and Schor 41 recorded voluntary pursuit and involuntary stare-OKN eye movements made by observers in response to drifting equiluminant stimuli. They noted irregular stare-OKN responses to purely chromatic stimuli, compared to reliable look-OKN responses to the same stimuli.
Other non-visual factors can influence optokinetic response. The magnitude of OKN elicited can vary between participants and even across sessions for the same participant 42 . Age also has an impact on OKN, with studies showing a decrease in OKN gain of 6 to 18% (relative to a baseline of around 80%) in individuals over 50 43,44 . In addition, fatigue can reduce saccadic velocity in the OKN response, whereas administration of a stimulant such as caffeine can increase OKN gain 45 . Finally, increased attention leads to higher gain and frequency of OKN 43,46,47 . In short, while within-subject changes of OKN response can provide an objective proxy of their perception, such between-subject and non-visual factors make it challenging to use the optokinetic response to group individuals who share a similar perceptual experience (e.g. as a result of a CVd).
With these considerations in mind, and seeking to improve the reliability of the oculomotor response across participants, we encouraged our participants to follow the stimulus using a "look-OKN" paradigm. As a result, both the optokinetic reflex and active tracking of the stimulus (a.k.a. smooth pursuit) contribute to the response we measured. Following earlier work 31 we refer to this collective pattern of eye movement response as directed eye movements (DEM). Our study builds upon the previous work of Cavanagh and Anstis 13 by developing an objective test that uses DEM (instead of subjective responses) to accurately measure the type and severity of colour vision deficiency. We use an infrared eye tracker to measure eye movements in response to bi-directional coloured stimuli, with automated analysis of those eye movement data to objectively quantify the strength of DEM. Plotting DEM-gain as a function of red-green luminance-balance allows us to make an objective estimate of both equivalent RG-brightness (red-green luminance-balance) and equivalent luminance contrast C Eq . We use these pairs of measures to train a classifier to distinguish CV-status and compare results to a gold-standard clinical colour vision assessment.

Methods
We ran three colour vision tests on observers with and without congenital CVd. The first was the gold standard Neitz Anomaloscope. The second was the clinical standard 14-plate Ishihara. The third was our updated Cavanagh/Anstis test, which estimated both the red-green equiluminant point and the equivalent luminance contrast from participants' directed eye movements to motion-nulling stimuli.
Participants. We recruited 34 participants (17 female, 17 male, 17-65 years), of whom 23 were normal trichromat controls, 9 were deuteranopes and 2 were protanopes (see Appendix A for patient details). We targeted recruitment of participants with red-green colour vision deficiencies or normal trichromatic colour vision. Due to the strobing appearance of some stimuli, participants with epilepsy were excluded. The experimental protocols and procedure were approved by the University of Auckland Human Research Ethics Committee. The protocols and procedure complied with the Declaration of Helsinki, and informed consent was obtained from all participants prior to the experiment.

Apparatus.
A Type-I Neitz anomaloscope (Model OT-II) was used to perform a standard diagnosis of red-green CVd by having participants adjust the luminance of a yellow test to match the appearance of a range of red-green mixed reference colours 7 . Plotting the results as in Fig. 2, the position and slope of the matched values indicate the type of CVd, whereas the length of the matching range quantifies the severity of CVd.
For the eye tracking tasks we presented stimuli and recorded eye movements using a Windows 10 PC laptop (ROG Zephyrus M) fitted with a Tobii 4c eye tracker 48-50 . The experiment was performed without any chin or headrest, although participants were instructed to attempt to maintain a constant head position. Note that in our pilot testing, we measured almost identical responses regardless of whether the head was fixed or not.
Stimuli. Stimuli were composed of superimposed pairs of vertical sine-wave gratings moving in opposite horizontal directions (Fig. 3 top row, and Video 1). Gratings were defined either by modulations of luminance (generated by in-phase spatial modulation of red and green channels) or of chromaticity (generated by anti-phase spatial modulation of red and green channels). All gratings had a peak spatial frequency of 0.5 cycles per degree (i.e. 1 cycle of a sine-wave grating for every two degrees of visual angle) and subtended 46.54° by 27.26°, presented fullscreen on the laptop monitor. These parameters were selected based on previous studies showing that such stimuli induce robust OKN 28,51 . In pilot testing we noted that the stimulus factor predominantly responsible for driving variability in DEM response was grating speed. We therefore varied this parameter experimentally between 4, 8 and 16 deg/s, to determine which speed led to optimal classification of test participants from controls. During testing, the contrast of the luminance-defined grating, C Fix , was maintained at 10% or 20%, while the mixture of red M Red and M Green varied from 25 to 75%, to cover the range of red-green equibrightness-points of participants.

Figure 2. Expected results from the anomaloscope for different CVd groups. The x-axis shows the ratio of red-green in the reference, ranging from 0 (pure green) to 73 (pure red). The y-axis indicates the luminance of the yellow test, scaled from 0 (black) to 40 (maximum brightness), that matched the corresponding red-green reference. Lines are labelled with the corresponding CV category. Because the red and green primaries of the anomaloscope lie along the red-green axis, the test captures the colours readily confused by dichromats. As a result, extreme dichromats can match the yellow test to red-green references spread across the entire colour mixture range (solid arrows). CVn observers, on the other hand, see contrast between colours and will match only at near-equal ratios of red and green (40-50). Mild dichromats (dashed lines), being less sensitive to either red or green, require a greater amount of that colour to make a luminance match, thus matching values on either side of the CVn range.

Figure 3. (Top row) Stimuli comprise a fixed-contrast (20%) yellow luminance grating added to an R-G coloured grating. (Middle row) When a yellow grating is superimposed on an equiluminant RG grating (R50:G50), a CVn observer will report the direction of the yellow component, because its contrast exceeds C Eq . Increasing the luminance of either the red or green component first nulls the fixed-contrast component (leading to flicker) and then exceeds it (at which point observers report the direction of the R-G grating). (Bottom row) By comparison, an observer with protanomalous colour vision, who perceives red as weaker than green, produces responses shifted right along the stimulus axis. Note how such observers require more red (here R62:G38) to achieve equi-brightness.
To generate drifting sinusoidal gratings, the luminance of the red ( L r ) and green ( L g ) channels was set using sinusoidal functions of position and time, in which f s and f t are the spatial and temporal frequency respectively, L mean is the mean luminance of the display (36.0 cd/m 2 ), and L range is the luminance range (± 35.6 cd/m 2 ). Perception of stimulus direction will vary as the ratio of red-to-green luminance changes. Figure 3 depicts the typical percept for a CVn observer (middle row) and a protanomalous trichromat (bottom row). Participants were presented with six 75 s long movies in total, each displaying a single combination of component-grating speed and fixed luminance contrast (4 deg/s + 10%, 4 deg/s + 20%, 8 deg/s + 10%, 8 deg/s + 20%, 16 deg/s + 10%, 16 deg/s + 20%). Each movie comprised forty 2.5 s trials. Direction of grating (left or right) and the proportion of red (25%, 37.5%, 50%, 62.5%, 75%) were shuffled across trials, to minimise the build-up of optokinetic aftereffects. Each combination of direction and proportion of red of the coloured grating was repeated four times. Participants were instructed to follow the stimulus if it felt natural. Eye movements were scored using the procedure described in the next section.
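The trial structure described above (2 directions × 5 red proportions × 4 repeats, shuffled) can be sketched as follows. This is a minimal illustration only: the function and variable names are hypothetical, and a plain shuffle is assumed because the exact randomisation scheme is not specified in the text.

```python
import random
from collections import Counter
from itertools import product

# Values taken from the text; all names are illustrative only.
DIRECTIONS = ("left", "right")
RED_PERCENT = (25.0, 37.5, 50.0, 62.5, 75.0)
REPEATS = 4


def build_trial_schedule(seed=None):
    """Return a shuffled list of (direction, red_percent) trials."""
    rng = random.Random(seed)
    trials = [
        (direction, red)
        for direction, red in product(DIRECTIONS, RED_PERCENT)
        for _ in range(REPEATS)
    ]
    # Plain shuffle (assumption): the authors' randomisation is not described.
    rng.shuffle(trials)
    return trials


schedule = build_trial_schedule(seed=0)
counts = Counter(schedule)  # how often each (direction, red) pair occurs
```

Each of the ten direction-by-mixture combinations appears four times, giving the forty trials per movie stated above.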

Analysis: quantifying directed eye movements (DEM). To quantify the direction and magnitude of
DEMs induced by our stimuli we adapted an approach for measuring contrast sensitivity using the optokinetic response 28 . We first pre-processed the eye tracking data, breaking the sequence up into fragments punctuated by blinks. Blinks were signalled when instantaneous pupil diameter deviated from the median pupil size by more than 3 times the mean absolute deviation of the pupil diameter. Eye position data collected during a blink, or within 33 ms of its onset or offset, were discarded. The remaining eye-position data were used to calculate an eye-velocity threshold, which was used to classify instantaneous estimates of horizontal eye velocity as either saccadic or tracking movements. The threshold was set so as to maximise the distance travelled by the eye ( D ), assuming DEMs/optokinetic nystagmus in the stimulus direction ( θ ). D θ , then, was the sum of all eye movements classed as tracking ( T θ ) in the same direction as the stimulus and of saccades ( S θ+π ) in the direction opposite to the stimulus motion. D θ quantifies the strength of DEM in degrees, and a similar calculation was performed for the opposite direction to give D θ+π . The calculated velocity threshold aimed to maximise C θ , the ratio between D θ (consistent with DEM in the direction of the coloured grating) and D θ+π .
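The blink-rejection step can be illustrated with a short sketch. It assumes that the mean absolute deviation is taken about the median and that a lost pupil signal is coded as 0.0; neither detail is stated in the text, and the function name and toy data are hypothetical.

```python
from statistics import fmean, median


def blink_discard_mask(pupil_mm, times_ms, k=3.0, pad_ms=33.0):
    """Return True for samples to discard: blink samples plus a pad_ms margin."""
    centre = median(pupil_mm)
    # Mean absolute deviation, taken here about the median (an assumption).
    mad = fmean(abs(p - centre) for p in pupil_mm)
    blink_times = [
        t for p, t in zip(pupil_mm, times_ms) if abs(p - centre) > k * mad
    ]
    # Discard any sample lying within pad_ms of a blink sample.
    return [any(abs(t - bt) <= pad_ms for bt in blink_times) for t in times_ms]


# Toy trace: 20 samples at 10 ms spacing, pupil signal lost for 3 samples.
times = [10.0 * i for i in range(20)]
pupil = [3.0] * 20
for i in (9, 10, 11):
    pupil[i] = 0.0
mask = blink_discard_mask(pupil, times)
```

In this toy trace the three blink samples and the three samples on either side of them are flagged, since 30 ms falls inside the 33 ms margin and 40 ms does not.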
The measure we use to characterise DEM strength-DEM-gain-is like D θ except it is calculated only using tracking velocity. It is the ratio of mean tracking velocity ( T θ ) to mean stimulus-velocity.
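As a rough sketch of these quantities, the function below computes D θ , D θ+π , their ratio C θ and DEM-gain for a given velocity threshold. Velocities are assumed to be signed so that positive values point in the stimulus direction, and the search for the threshold that maximises C θ is omitted; the names and the toy trace are illustrative, not the authors' implementation.

```python
def dem_metrics(velocity_deg_s, dt_s, threshold_deg_s, stimulus_speed_deg_s):
    """Return (D_theta, D_opposite, C_theta, dem_gain) for one trial."""
    d_theta = 0.0      # distance consistent with DEM in the stimulus direction
    d_opposite = 0.0   # distance consistent with DEM in the opposite direction
    tracking = []
    for v in velocity_deg_s:
        step = abs(v) * dt_s
        if abs(v) < threshold_deg_s:   # classed as tracking
            tracking.append(v)
            if v > 0:
                d_theta += step
            elif v < 0:
                d_opposite += step
        elif v < 0:                    # saccade against the stimulus direction
            d_theta += step
        else:                          # saccade along the stimulus direction
            d_opposite += step
    c_theta = d_theta / d_opposite if d_opposite > 0 else float("inf")
    mean_tracking = sum(tracking) / len(tracking) if tracking else 0.0
    return d_theta, d_opposite, c_theta, mean_tracking / stimulus_speed_deg_s


# Toy saw-tooth: nine slow-phase samples at 8 deg/s, then one fast return.
trace = [8.0] * 9 + [-100.0]
d_theta, d_opposite, c_theta, gain = dem_metrics(trace, 0.01, 30.0, 16.0)
```

Here the slow phase moves at half the stimulus speed, so the gain is 0.5, and all movement is consistent with tracking in the stimulus direction.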

Analysis: estimating equibrightness and equivalent luminance contrast. Plotting DEM-gain
against the red-green luminance-balance (Fig. 4a) yields a V-shaped function which has a minimum at the equibrightness point and (normally) crosses the zero-gain line at two points (the two red-green mixtures leading to motion nulling, as discussed in Fig. 3). We fit these data using a standard V-function with three free parameters (Eq. 5), where R is the predicted DEM-response and M red is the red component of the red-green colour mixture of the coloured grating. The three fit parameters are B Red (the red-green mix that minimises R, i.e. the equi-brightness point) and A and S (offset and scaling parameters respectively). From V-functions fit to each participant's six data-sets (2 fixed luminance contrasts × 3 stimulus speeds) we record the equi-brightness point B Red and calculate C Eq , the equivalent luminance contrast. C Eq is estimated from the zero-crossings of the V-function (the inferred M red levels that would lead to motion nulls). Setting R = 0 in Eq. 5 and rearranging gives the two zero crossings, which lie at equal distances on either side of B Red . C Eq is then defined as the difference between the fixed luminance contrast C fix and the distance of the nulls from B Red .

Figure 4b,c illustrate how the V-shaped function is shifted for a deuteranope and a protanope observer. These observers have both atypical points of equi-brightness and experience weak motion in equi-bright stimuli, resulting in shifted B Red and lower C Eq values respectively.
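To make the fitting step concrete, the sketch below assumes the V-function has the form R = A + S·|M red − B Red |, which is consistent with the parameter descriptions above but is not copied from the source equation. Under that assumption the nulls lie at a distance −A/S from B Red , so C Eq = C fix − (−A/S). The grid search over B Red with a closed-form linear regression for A and S is an illustrative fitting strategy, not necessarily the one the authors used.

```python
def fit_v_function(m_red, gain, b_grid=None):
    """Least-squares fit of R = A + S * |M_red - B_red|; returns (A, S, B_red)."""
    if b_grid is None:
        b_grid = [25 + 0.5 * i for i in range(101)]  # candidate B_red, 25..75 %
    n = len(m_red)
    mean_y = sum(gain) / n
    best = None
    for b in b_grid:
        x = [abs(m - b) for m in m_red]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0:
            continue  # degenerate: regression on |M - B| is undefined
        s = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, gain)) / sxx
        a = mean_y - s * mean_x
        sse = sum((a + s * xi - yi) ** 2 for xi, yi in zip(x, gain))
        if best is None or sse < best[0]:
            best = (sse, a, s, b)
    _, a, s, b = best
    return a, s, b


def equivalent_luminance_contrast(a, s, c_fix):
    """C_eq = C_fix minus the distance of the nulls from B_red (all in %)."""
    null_distance = -a / s
    return c_fix - null_distance


# Synthetic, noise-free data generated with A = -0.29, S = 0.02, B_red = 50.
mixes = [25.0, 37.5, 50.0, 62.5, 75.0]
gains = [-0.29 + 0.02 * abs(m - 50.0) for m in mixes]
a_hat, s_hat, b_hat = fit_v_function(mixes, gains)
c_eq = equivalent_luminance_contrast(a_hat, s_hat, c_fix=20.0)
```

With these synthetic values the nulls fall 14.5% either side of the equi-brightness point, so a 20% fixed contrast gives an equivalent luminance contrast of 5.5%.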

Results
Representative plots of DEM-strength versus red-green colour balance are shown for normal and colour deficient observers in Fig. 5. DEM-strength is signed positive or negative according to whether the tracking phase of DEM was consistent with the direction of the colour or luminance component, respectively. In Fig. 5a the participant with normal colour vision exhibits a robust DEM response in the direction of the coloured grating at both extreme red/green luminance-balances. The luminance contrast leading to nulling of colour-motion (14.5%) is lower than the fixed luminance contrast (20%), yielding an equivalent luminance contrast of C Eq = 20.0 - 14.5 = 5.5%.

For CVn observers the V-function is relatively symmetrical around the physical equiluminance point (50-50 on the x-axes). For CVd observers, the equi-brightness point, and with it the "V" function, is shifted towards the defective colour. A more severe CVd leads to responses being dominated by the luminance-defined grating, shifting the "V" downwards and reducing C Eq accordingly.

Horizontal and vertical shifting of the V-function is evident in Fig. 5b,c. The end-points of the V-function for both CVd responses are shifted towards the zero line as colour contributes less to their motion response (C Eq = − 5.1% for the deuteranope and C Eq = − 15.4% for the protanope), leading to a shallower V-function.
This shifting could lead to a negative C Eq (as seen in Fig. 5b,c), consistent with the colour grating driving the motion response in the direction opposite to its own. We consider two explanations for this finding. First, it could be noise; our paradigm lacks the resolution to determine C Eq precisely since it only presented coloured stimuli at a series of fixed red-green mixtures. A second possibility is that this outcome arises from aliasing of the stimulus in the periphery 52-54 .

Figure 6 shows the pattern of response across all participants observing stimuli moving at 16 deg/s with a fixed luminance contrast of 20%. Note the high levels of variation in DEM gain (across trials at a given stimulus level) and in the overall pattern of DEM gain (although data are generally well captured by the V-fit) between participants belonging to the same colour vision group. In particular, participants with the same CV status exhibit wide variation in the depth of the fitted 'V' function. This is not unexpected given that gain in one component of their DEM response, OKN, can vary due to a range of factors including but not limited to observer age 44 , attention 47 , fatigue 45 , and the instructions received 39 . However, because the 'V' fit depends on the relative DEM strength across R-G stimulus levels, absolute DEM strength across participants should not influence fits greatly. Rather, inconsistency in DEM gains did influence the V-function fit (Fig. 6, #9, #24, #29), generally producing "shallower" fits that differed from the characteristic 'V' fits of other participants belonging to the same colour vision group. While both protanopes (red panels) had V-functions shifted to the right, most deuteranopes (green panels) had V-functions that were symmetrical around the physical-equiluminance point, like those of observers without a CVd.
As such, while B Red best separated protanopes from CVn (red cells, Table 1), C Eq better separated deuteranopes from CVn (green cells, Table 1). Group differences, compared using a Wilcoxon rank sum test, are summarised in Table 1. Because age can also affect the OKN response 55 , Table 1 includes additional comparisons between groups split by age (50 years and under versus over 50 years, based on previous literature 44 ). For both age groups, the differences in B Red between CVn and CVd (Prot) are significant. For C Eq , by contrast, only individuals aged 50 years and under show a significant difference between CVn and CVd (Deut).
As shown in Table 1, participants belonging to the same colour vision group had substantial variation in their B Red and C Eq values, as indicated by the values' standard deviations (SD). For CVn (of all ages), the SD of B Red was 6.28%, and for CVd (of all ages), the SD of C Eq was 8.95%. To reduce variability, for each stimulus speed we selected parameters from the "deeper" V function (quantified using the magnitude of the scaling parameter) of the two C fix levels. We elected to do this because visual inspection of our data suggested that parameters from some conditions were unreliable (usually as a result of noisy DEM responses that were fit by "shallow" V functions); averaging them across C fix conditions, as Cavanagh and Anstis did, would likely compromise the categorisation accuracy of the system. Our process lowered the variability of measures within participants belonging to the same colour vision group, decreasing the SD of B Red for CVn (σ = 6.28% to 2.93%) and of C Eq for CVd (σ = 8.95% to 3.54%). These better matched the low variability reported by Cavanagh and Anstis 13 (abbreviated CA91): SDs of 3.44% (B Red for CVn) and 0.72% (C Eq for CVd) for the test condition that best separated their colour vision groups (4 deg/s).
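The selection rule described above can be sketched in a few lines. The dictionary keys and the numerical values are hypothetical; only the rule itself (keep the fit with the larger magnitude of the scaling parameter S) comes from the text.

```python
def select_deeper_fit(fits):
    """Return the fit whose scaling parameter S has the largest magnitude."""
    return max(fits, key=lambda fit: abs(fit["S"]))


# Hypothetical fits for one participant at one speed (values are made up).
fits_16_deg_s = [
    {"c_fix": 10.0, "A": -0.05, "S": 0.004, "B_red": 47.0, "C_eq": -2.5},
    {"c_fix": 20.0, "A": -0.29, "S": 0.020, "B_red": 50.0, "C_eq": 5.5},
]
chosen = select_deeper_fit(fits_16_deg_s)
```

In this made-up case the 20% condition gives the steeper V, so its B Red and C Eq are carried forward to classification.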
A scatterplot of individual estimates of equi-brightness against equivalent luminance, derived in this way is shown in Fig. 7.
We next considered whether we could use these data to reliably determine the colour vision status of our observers. An unsupervised machine learning algorithm (K-means clustering) 56 partitioned participants into 3 clusters in which each participant belonged to the cluster with the nearest mean (Fig. 7a-c). K-means is an iterative process that requires no labelled data. It initialises K centroids (where K is the number of expected clusters) at distinct given locations (x, y), assigns each data-point to its nearest centroid, and moves each centroid to the average of the data-points assigned to it. This is repeated until the cluster assignments no longer change. Based on previous findings 13 , the initial centroid positions were set to B Red = (32.5, 50, 62.5) and C Eq = (0, C fix , 0) respectively. Our results indicated that tests conducted with the 16 deg/s stimulus had the highest sensitivity (90.9%) and specificity (91.3%). For comparison, Fig. 7d-f plots the estimates of CA91 participants, partitioned using the same K-means algorithm.

We note the similarity of results from our best-performing test condition (Fig. 7c) and those of CA91 (Fig. 7d). For example, deuteranopes in both studies reached equi-brightness at a somewhat greener mixture (CA91: B Red = 38.5% vs Ours: B Red = 41.63%), whereas protanopes required considerably more red light to experience the same luminance contrast (CA91: B Red = 64.8% vs Ours: B Red = 65.78%). Likewise, colour deficient participants in both studies showed lower equivalent luminance contrast (CA91: C Eq = 0.56% vs Ours: C Eq = 0.60%) compared to CVn (CA91: C Eq = 11.75% vs Ours: C Eq = 6.73%).
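A minimal sketch of K-means with the fixed initial centroids quoted above, followed by a sensitivity/specificity calculation. The data points are invented for illustration, and the counts in the second example (10 of 11 CVd and 21 of 23 CVn correctly classified) are an assumption chosen because they reproduce the reported rates for the stated group sizes.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm in 2-D; returns (labels, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels, centroids


def sensitivity_specificity(is_cvd, predicted_cvd):
    """Return (sensitivity, specificity) in percent."""
    tp = sum(t and p for t, p in zip(is_cvd, predicted_cvd))
    tn = sum((not t) and (not p) for t, p in zip(is_cvd, predicted_cvd))
    n_pos = sum(is_cvd)
    n_neg = len(is_cvd) - n_pos
    return 100 * tp / n_pos, 100 * tn / n_neg


# Hypothetical (B_red, C_eq) estimates in percent, with C_fix = 20.
C_FIX = 20.0
points = [(41, 0.5), (42, 1), (50, 7), (51, 6), (65, 0), (66, 1)]
init = [(32.5, 0.0), (50.0, C_FIX), (62.5, 0.0)]  # deut-like, CVn, prot-like
labels, centroids = kmeans(points, init)

# Assumed counts: 11 CVd with one miss, 23 CVn with two false alarms.
truth = [True] * 11 + [False] * 23
pred = [True] * 10 + [False] + [False] * 21 + [True] * 2
sens, spec = sensitivity_specificity(truth, pred)
```

In the toy data one CVn-like point is initially assigned to the protan-like centroid and is reassigned on the next iteration, which illustrates why the procedure iterates until assignments stop changing.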
In addition, we found that the Euclidean distance between a participant's (B Red , C Eq ) estimate and the centroid of the CVn cluster (found using the K-means algorithm) acted as a measure of their CVd severity. However, only in the highest speed condition did this measure correlate significantly (Kendall rank correlation test) with the severity measure made using the anomaloscope (Fig. 8).
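The severity measure and its rank correlation with an anomaloscope-based measure can be sketched as below. The centroid, estimates and anomaloscope matching ranges are made-up numbers, and the tau-a variant (no tie correction) is an assumption, since the text does not state which form of Kendall's coefficient was used.

```python
from itertools import combinations
from math import dist


def severity_index(estimate, cvn_centroid):
    """Euclidean distance of a (B_red, C_eq) estimate from the CVn centroid."""
    return dist(estimate, cvn_centroid)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical values for four CVd participants.
cvn_centroid = (50.0, 7.0)
estimates = [(52.0, 6.0), (45.0, 2.0), (62.0, 1.0), (66.0, -5.0)]
severity = [severity_index(e, cvn_centroid) for e in estimates]
anomaloscope_range = [2, 10, 8, 30]
tau = kendall_tau_a(severity, anomaloscope_range)
```

A significance test for tau would be needed to reproduce the comparison reported above; it is left out of this sketch.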

Figure 7. Equibrightness plotted against equivalent luminance contrast for (a-c) our 30 participants and (d-f) Cavanagh and Anstis' participants, run in the three speed conditions. Boundaries of the coloured regions were derived using a K-means algorithm that sought to best separate the three groups. Error rates indicate the percentage of mis-classified participants.

Response bias. We note that CA91 13 achieved better separation between colour vision groups when participants were presented with stimuli drifting at slower speeds (4 deg/s). They report that the equivalent luminance contrast of controls was reduced as the temporal frequency was increased, leading to increased misclassification of CVn as CVd (deut), and vice versa. Despite including a small sample of only four CVn observers (three of whom showed a decrease in C Eq with speed), CA91's results are consistent with previous work showing reduced sensitivity to colour at higher temporal frequencies 36,57 . In contrast, we found that the equivalent luminance contrast of controls modestly increased with grating speed, and the higher speed conditions produced data that better separated our participants into distinct colour vision groups. This discrepancy is likely attributable to the influence of something akin to "response bias" on the reliability of DEM data as compared to subjective report. Figure 9a,b (column 1) plots "V" functions from three of the six CVn participants who were mis-classified based on data from the low-speed conditions, measured at 4 and 16 deg/s respectively. Note how noisy the DEM-gain data are in Fig. 9a, as indicated by the high mean squared error (MSE) of the fitted 'V' function, compared to fits in the higher-speed condition (Fig. 9b). Comparing the MSE for all participants across speed (Fig. 9c) showed a similar pattern, with both the 8 and 16 deg/s conditions leading to a significantly lower MSE than 4 deg/s. The 16 deg/s condition had by far the lowest mean MSE (0.11).
This finding is not attributable to low-speed conditions eliciting lower DEM-gain; analysis in fact shows that the opposite is true (see Appendix B). Rather, note that in Fig. 9a the red and blue symbols are more likely to flip sign around the mean (solid black line), at a given red-green mix, in lower compared to higher speed conditions. Recall that red/blue symbols colour-code the direction of the coloured grating (red: left, blue: right) and DEM-response data are signed for colour direction (positive: towards colour-component direction, negative: towards luminance direction). Were the participant to switch randomly between colour and luminance components (regardless of their contrast), a random distribution of red and blue data-points around the mean gain level would be noticeable. This can be quantified by taking the SD of the DEM response for all trials moving left (red points) and the SD of the DEM response for all trials moving right (blue points), and averaging the two values (Fig. 9a,b, column 3). Comparing random switching for all participants across speed (Fig. 9e) shows significantly higher switching at the lower speed. Another source of variability in the DEM-gain data could arise from a bias towards responding in one direction (left or right) more than in the other. When DEM-gain is signed for absolute direction, the anticipated mean gain at a given red-green mix should be zero (since we balanced left and right drift-directions of both colour and luminance components). Were the DEM data to be biased towards a particular absolute direction, the red and blue data points would form widely separated clusters above and below the mean gain level. We can quantify this by averaging the DEM response signed for absolute direction rather than colour-component direction, as seen in Fig. 9a,b (column 2) for all three CVn participants. Figure 9d compares the direction bias for all participants across speed.
Unlike switching, the magnitude of direction bias stays almost constant across speed, with no distinguishable pattern.
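The two variability measures discussed above can be sketched as follows for the trials at a single red-green mix. The sign convention (rightward positive when re-signing for absolute direction), the use of the sample SD, and the toy data are assumptions made for illustration.

```python
from statistics import fmean, stdev


def switching_index(trials):
    """Average of the SDs of colour-signed gain for left and right trials."""
    left = [gain for direction, gain in trials if direction == "left"]
    right = [gain for direction, gain in trials if direction == "right"]
    return (stdev(left) + stdev(right)) / 2


def direction_bias(trials):
    """Mean gain re-signed for absolute direction (rightward positive)."""
    signed = [gain if direction == "right" else -gain for direction, gain in trials]
    return fmean(signed)


# Toy trials: (direction of coloured grating, DEM gain signed for colour direction).
trials = [
    ("left", 0.2), ("left", -0.2), ("left", 0.2), ("left", -0.2),
    ("right", 0.1), ("right", 0.1), ("right", 0.1), ("right", 0.1),
]
switching = switching_index(trials)
bias = direction_bias(trials)
```

In this toy set the left trials flip sign from trial to trial (high switching) while the right trials are stable, and the small positive bias reflects a slight overall tendency to move rightward.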
Why is switching more frequent at low speeds? Recall that participants were asked to "follow the stimulus" in an effort to elicit more reliable DEM responses. Having been instructed to track the stimulus continuously, participants who did not know what to follow could voluntarily track, and by extension switch between, the individual components (colour or luminance) on an arbitrary basis in low-speed but not in high-speed conditions. For this reason, we recommend the use of higher speed stimuli for DEM testing.
Why didn't CA91 suffer from this problem? Recall that they used a method of adjustment (to null motion) and not a two-alternative forced choice, as we did. Does response bias arise from our use of eye movement measures or from our forced choice procedure? To answer this, we reran our test on four of our CVn participants (participants 15, 16, 18 and 19) but had them simultaneously make perceptual reports of stimulus direction using the computer keyboard (left and right arrow keys to represent the respective directions). Appendix C plots results from both types of measure and shows that DEM and subjective responses lead to similar estimates of response bias. Thus, it is the forced choice procedure (rather than the measure used) that determines the level of response bias.

Discussion
Adapting an approach described by CA91, we have further developed and validated an automated test for human colour vision deficiency, based entirely on eye movements made in response to dynamic coloured stimuli. However, unlike CA91, who relied on subjective inputs to measure motion nulls, our use of eye movements creates an entirely objective and involuntary test that requires minimal instructions to administer in clinic (or may even be self-administered at home). Participants' CV status was most accurately determined using a 16 deg/s stimulus drift speed, where our results closely agreed with both the categorisation and the measure of severity made using the gold standard anomaloscope. However, unlike the anomaloscope, our test is significantly shorter and simpler to administer, making it fit for use on both young and old participants who may be unable to comply with the anomaloscope (e.g., unable to sit for long durations, have difficulty understanding and following instructions, etc.). Further to this, our test may also be used on nonverbal populations such as babies.
Comparison to previous work. We report good agreement between results from our objective test of colour vision status and results from both the clinical gold standard screening procedure (Ishihara plates) and from a diagnostic procedure (the anomaloscope test, based on patients' subjective colour judgements). We also compared our results to those from another test (CA91) which uses participants' subjective judgement of motion direction (rather than the DEM/optokinetic response) for stimuli essentially identical to our own. A notable difference between the results of our study and CA91 is that we find that colour vision status is more reliably determined by data measured in higher speed than in lower speed conditions. This is likely because of the 'random switching' discussed above, where lower speed conditions allowed participants to engage in random attentional tracking of either the colour or luminance component, eliciting inconsistent DEM directions for the same stimulus level.
Another difference between findings from the two studies is that we report (for CVn observers) increasing C Eq with increasing stimulus speed (4 deg/s: Avg. C Eq = 4.31% vs 16 deg/s: Avg. C Eq = 6.72%). This is opposite to CA91's finding that increasing temporal frequency reduces C Eq . Interestingly, when Teller and Palmer 31 used a paradigm similar to our own to evaluate both CVn adults and children, they too reported higher C Eq with increasing speed. Teller and Palmer report an equivalent luminance contrast of ~ 12% at 25 deg/s, similar to CA91's results for 4 deg/s. This, along with our own results, suggests that the neural system supporting DEM has higher chromatic sensitivity at higher speeds (potentially leading to better separation between CVn and CVd).
This difference between Teller and Palmer's and CA91's studies may be attributable to the different motion processing mechanisms engaged. Recall that CA91 asked participants to fixate on a stationary circular marker in the centre of the screen while making subjective reports of direction. On the other hand, Teller and Palmer (and our own study) instructed participants to actively track the stimulus (i.e. eliciting a combination of OKN and smooth pursuit). The difference in eye movements between these two types of test (fixation vs tracking) may have affected C Eq , a measure of chromatic sensitivity. Prior work by Krauskopf and Li 58 and Cavanagh 59 suggests that while a low-level motion processing mechanism such as OKN is driven well only by a luminance-based stimulus, smooth pursuit may be driven by either a luminance- or colour-based stimulus. It would then stand to reason that our stimuli (comprising luminance and chromatic gratings) activated two distinct motion processing pathways: (a) a low-level optokinetic system driven largely by luminance, and (b) a higher-level smooth pursuit system driven by the chromatic image structure. We hypothesise that smooth pursuit dominated the DEMs made in response to higher speed stimuli, based on the similarity in findings between our study and that of Crognale and Schor 41 . Their study measured the consistency of look-OKN (i.e. smooth pursuit) vs stare-OKN for isoluminant chromatic stimuli. They noted irregular stare-OKN for isoluminant chromatic stimuli, similar to the DEM responses we report for our lower speed condition (Fig. 9c, left box and whisker). In contrast, Crognale and Schor 41 reported consistent look-OKN with the same isoluminant stimuli, similar to the DEM we measured in our higher speed condition (Fig. 9c, right box and whisker). As such, the consistency of our OKN estimates measured at higher speeds suggests strong active tracking of the stimulus (over a pure-OKN response).
We note that a stronger smooth pursuit response would drive tracking of the colour stimulus (over the luminance stimulus) and would likely increase C Eq for CVn observers.
Further work is, however, needed to test participant responses across a wider variety of speed conditions, and to see how average C Eq changes with speed for different colour vision groups.
Further development. The current test runs 40 × 2.5 s trials to derive an objective estimate of equibrightness and equivalent luminance contrast. Compared to the anomaloscope procedure, our test reduces the time required to diagnose colour vision deficiency by a factor of 10. That DEMs could be composed exclusively of involuntary eye movements also means that the test could potentially be used with nonverbal populations such as babies. That said, there is scope for further improvement.
A significant challenge for a wide-scale roll-out of our DEM-based CV test is display calibration. Display calibration is necessary to establish, for example, the physical equiluminance point for a given device, and is typically achieved by using a photometer to make a series of measurements of display luminance at different levels of display activation. The validity of this approach is predicated on testing being conducted under lighting conditions similar to those of the calibration. This may limit the test's use to clinics, where the device can typically be calibrated on-site and the lighting conditions under which it was calibrated can be maintained.
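As an illustration of the kind of photometric calibration described above, the following is a minimal sketch rather than the procedure used in this study. It assumes that each colour channel follows a simple power law, L = L_max · v^gamma, with the black level already subtracted, and it uses hypothetical photometer readings. The function names (`fit_gamma`, `matching_level`) are introduced here for illustration only.

```python
import math

def fit_gamma(levels, luminances):
    """Fit L = l_max * v**gamma by least squares in log-log space.

    levels: normalised display activations in (0, 1]
    luminances: photometer readings in cd/m^2 (black level subtracted)
    """
    xs = [math.log(v) for v in levels]
    ys = [math.log(lum) for lum in luminances]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx                      # slope of the log-log line
    l_max = math.exp(mean_y - gamma * mean_x)  # luminance at full activation
    return l_max, gamma

def luminance(v, l_max, gamma):
    """Predicted luminance at normalised activation v."""
    return l_max * v ** gamma

def matching_level(target_lum, l_max, gamma):
    """Invert the fitted curve: activation that yields target_lum."""
    return (target_lum / l_max) ** (1.0 / gamma)

# Hypothetical readings for the red and green channels (assumed values).
levels = [0.2, 0.4, 0.6, 0.8, 1.0]
red_readings = [20.0 * v ** 2.2 for v in levels]
green_readings = [70.0 * v ** 2.2 for v in levels]

red_fit = fit_gamma(levels, red_readings)
green_fit = fit_gamma(levels, green_readings)

# Green activation that is physically equiluminant with full-strength red.
target = luminance(1.0, *red_fit)
green_level = matching_level(target, *green_fit)
```

In this sketch the physical equiluminance point is expressed as the green activation whose predicted luminance equals that of the red channel at full activation. A real display may need a black-level term or a lookup table instead of a pure power law.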
At-home calibration would require either a photometer or an observer with ostensibly normal colour vision (for a subjective calibration procedure). Alternatively, at-home testing could make use of pre-calibrated displays or of displays that exhibit high levels of consistency "out of the box" (e.g., iPads). For these options to work under different home lighting conditions, we could potentially analyse ambient light (via the webcam) to shift the white point. Home-based testing would also require additional data-quality checks to ensure that head and eye positions are reasonably within the limits of the eye tracker and screen. Further research would be needed to gauge the decrease in sensitivity and specificity when this DEM colour vision test is used in a home setting, without an administrator and under varying light conditions.
In terms of additional improvements, DEM responses are variable across participants, and participants who exhibit little to no DEM are often misclassified in the high-speed condition. Implementing open-loop OKN, where the stimulus follows the eye movements, is known to significantly increase the gain 60,61 and would likely help improve the robustness of the measures.
To reduce test time, we are developing an adaptive procedure 62,63 that uses current and previous DEM measures to update the R-G mix, in order to estimate the motion null points more accurately and quickly. Reducing test time could be crucial for testing young children, who already struggle with the anomaloscope 8 and are too young to recall the numbers used in the Ishihara plates. In fact, previous work has successfully measured OKN in children between 1 and 3 months of age to estimate equibrightness 64 .
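To make the idea of an adaptive update of the R-G mix concrete, the sketch below uses plain bisection on the sign of a DEM score. This is not the adaptive procedure under development, whose details are not given here. It assumes a signed DEM score that is negative on one side of the motion null and positive on the other, and it uses a hypothetical noise-free simulated observer (`simulated_observer`, with an assumed true null of 0.37) in place of real eye-tracking data.

```python
def estimate_null(measure_dem, lo=0.0, hi=1.0, n_trials=12):
    """Bisection search for the R-G mix at which the signed DEM score is zero.

    measure_dem: callable taking an R-G mix in [lo, hi] and returning a
                 signed DEM score (assumed negative below the null point
                 and positive above it).
    The bracket [lo, hi] carries the information from previous trials.
    """
    for _ in range(n_trials):
        mix = 0.5 * (lo + hi)       # next stimulus: midpoint of the bracket
        score = measure_dem(mix)    # one trial's DEM measure
        if score > 0:
            hi = mix                # null lies below the tested mix
        else:
            lo = mix                # null lies at or above the tested mix
    return 0.5 * (lo + hi)

def simulated_observer(mix, true_null=0.37, gain=2.0):
    """Hypothetical noise-free observer: score is linear around the null."""
    return gain * (mix - true_null)

estimate = estimate_null(simulated_observer)
```

With 12 trials the bracket shrinks by a factor of 2^12, so a noise-free observer is located far more finely than a fixed grid of the same size would allow. Real DEM measures are noisy, so a practical procedure would need to tolerate occasional wrong-signed trials, for example by repeating trials or by maintaining a probability distribution over the null point.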
Our automated test is a promising first step towards more accessible, accurate and in-depth colour vision diagnosis in clinics and/or homes. Given recent advances in software-based eye tracking running on computers, tablets and phones equipped with front-facing cameras 65 , our test could become a simple, reliable, automated colour vision assessment that could be downloaded for use by clinician and patient alike.