Introduction

Stereo vision is the computation of depth based on the binocular disparity between the images of an object in left and right eyes (Figure 1). This requires matching up features in the two eyes, that is, identifying features in the left and right retinas that are both images of the same point in the visual scene. The matching process begins in primary visual cortex of the brain, but requires a series of computations across several distinct areas of visual cortex before it is fully achieved.1

Figure 1

The optic axes of each eye are shown with solid lines. The fixated object, by definition, projects to the fovea in each eye. The retinal projections of another object are shown with dashed lines. Because this object is not at the fixation distance, its images in the two eyes fall at different locations relative to the fovea, as indicated by the angles L and R. Absolute binocular disparity is defined as the difference between the angles to the fovea in each eye: Δ=R−L. Note that in this figure, angles are exaggerated for clarity. In reality, the second object would be seen double as its disparity is too large to be fused.

In principle, if the eyes were free to move independently, the match for a given feature in the right image could be located anywhere in the left retina (Figure 2). Humans, however, simplify the matching problem by imposing severe constraints on our eye movements. Most notably, our binocular vision requires that the optic axes intersect, that is, both eyes fixate the same point in space. This, together with the other laws governing eye movements,2, 3 means that matches generally lie at essentially the same vertical position in both eyes.4 The visual system simplifies further by considering only potential matches that are at very similar horizontal locations in the two eyes, within a degree or so (Figure 2). The effect of this is to limit the search to objects at or near the fixation distance.

Figure 2

Cartoon of the two retinas, imagining them tiled by 8 × 8 receptive fields. Consider searching for the location in the right retina viewing the same object as the black-shaded location in the left retina. In principle, there are 64 possible locations. However, the horizontal offset of the eyes together with the laws governing eye movements means that the geometrically possible matches are largely confined to the row at the same vertical position (shaded light grey). The visual system then further restricts its search to matches at a similar horizontal position (shaded dark grey).

Together, these simplifications greatly reduce the number of brain cells required for binocular vision. The primary visual cortex of the brain contains neurons receiving information from both eyes. Each such neuron corresponds to a pair of ‘tiles’ in Figure 2, with the size of the tile corresponding to the size of the retinal receptive field (<1° near the fovea). Experimentally, such neurons are found to view very similar visual directions in space, with a wider range of horizontal than vertical disparity.5, 6, 7 That is, the brain contains only those neurons representing the most likely disparities. This represents a great reduction in the number of neurons required (from 64 down to 3 in the cartoon shown in Figure 2).
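The reduction in the cartoon can be reproduced with a short counting sketch. The tile position and the one-tile horizontal limit below are assumptions chosen to match the 8 × 8 cartoon, not measured properties of the visual system.

```python
def candidate_matches(row, col, size=8, max_dx=None, max_dy=None):
    """List tiles in the other retina that are searched for a match to (row, col).

    max_dx / max_dy are the largest allowed horizontal / vertical offsets,
    in tiles; None means that direction is unconstrained.
    """
    matches = []
    for r in range(size):
        for c in range(size):
            if max_dy is not None and abs(r - row) > max_dy:
                continue
            if max_dx is not None and abs(c - col) > max_dx:
                continue
            matches.append((r, c))
    return matches


tile = (3, 4)  # hypothetical tile, away from the edge of the grid

unconstrained = candidate_matches(*tile)               # anywhere on the retina
same_row = candidate_matches(*tile, max_dy=0)          # same vertical position
nearby = candidate_matches(*tile, max_dx=1, max_dy=0)  # plus small horizontal offset

print(len(unconstrained), len(same_row), len(nearby))  # 64 8 3
```

A tile at the edge of the grid would have only two nearby candidates, so the count of three applies to interior tiles.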

The cost is that objects that are too far from the fixation distance, such as the second object in Figure 1, are not fused, but are seen double. This physiological diplopia can be noticed in everyday life, but is not generally problematic. Natural scenes tend to vary relatively smoothly in depth, so that points near fixation are generally at similar depth. Where this is not the case, for example where we fixate the edge of a surface, more distant objects will be blurred as we are accommodating on the fixated surface. A more minor side effect is that stereopsis fails for sufficiently extreme eye postures.3 We typically avoid such eye postures by moving our heads to point at what we want to look at. Thus, this strategy works well in practice.

However, this strategy clearly depends critically on the ability of the oculomotor system to direct both foveas at the object of interest—precisely the ability that fails in strabismus. The normal visual system contains several mechanisms to support this. Neuronal crosslinks between accommodation and vergence help the eyes to fixate at the correct distance.8, 9 Images with non-zero disparity trigger vergence reflexes, intended to null out disparity at the fovea and thus ensure that both foveas are directed at the same object in space.10 Many people show phorias, misalignments of the eyes that occur when normal visual input is removed (eg, by occluding one eye’s view), demonstrating the importance of sensory feedback in maintaining normal binocular alignment.

Strabismic patients exhibit tropias: binocular misalignments that persist even with normal viewing. Normal stereo vision is not possible when the eyes are misaligned, because an object’s images on the two retinas are too far apart; they do not fall within the range of matches that the brain can consider. Some strabismic patients show abnormal retinal correspondence; that is, their brain learns to perceive objects as lying in the same visual direction, even though they fall at what would normally be noncorresponding locations in the retina.11, 12 If a tropia is constant and present from an early age, one might have expected stereopsis to be present but similarly shifted, with the receptive fields of binocular visual neurons offset on the retina by the amount of the deviation. However, apparently the normal retinotopic projection is not plastic enough to allow this to occur, perhaps because this would involve matching up physically very different parts of the retina, for example, the fovea in one eye with peripheral retina of the other. Wong et al11 have suggested that anomalous retinal correspondence may be achieved by using chains of neurons across abnormally large areas of visual cortex. Apparently, this mechanism cannot support the binocular neurons necessary to support stereopsis. In addition, eye position in strabismus may be not only misaligned, but also more variable. As noted, receptive fields in early visual cortex are well under 1°, and respond to disparities typically over a range of under 0.5°. Thus, vergence fluctuations of ±0.25°, although not clinically noticeable, could damage the development of stereopsis.

Because stereo vision depends upon good vision in both eyes, excellent oculomotor control and cortical mechanisms for sensory fusion, it is regarded as the gold standard for binocular visual function.13 Where strabismus therapy succeeds in restoring stereo vision, this proves that the patient has excellent and stable binocular alignment, and that the necessary sensory cortical mechanisms have been preserved.

How to measure stereo vision

Several commercial stereoacuity tests are available for use in the clinic.14, 15 Some of these are listed in Table 1 along with some of their properties. At its most basic, a stereotest can aim simply to determine whether a patient has any stereo vision at all, without quantifying their stereoacuity. The ‘gross stereo’ component of the Randot test is one example. For many clinical applications it is also desirable to quantify a patient’s stereo vision. This requires a measurement of stereoacuity. Stereoacuity is defined as the reciprocal of stereo threshold, the smallest binocular disparity that can reliably be distinguished by a patient.

Table 1 Some stereotests available for use in the clinic

There are two problems with this definition. First, just as in other areas of sensation, there is no abrupt cutoff between visible and nonvisible disparities. If subjects are given some task that depends on the perception of the disparity, performance will rise smoothly from chance to perfect over some finite range. ‘Threshold’ must then be defined as a given level of performance, for example, when the subject has a 50–50 chance of detecting the disparity. On clinical tests, the reported threshold is taken to be the smallest tested disparity on which the subject was correct on at least M out of N presentations (where N and M are small integers). Second, the threshold depends on the particular task, on the viewing distance, and on the details of the stimulus. These issues will all be discussed in detail below.

Monocular cues

It is clearly essential that the patient should not be able to pass a stereotest using monocular cues. Where the test images have monocularly visible contours (see below), as in the FD2 or the graded-circles component of the Randot stereotest, large disparities may be visible monocularly as a shift in the contour. Fawcett16 states that this ‘frequently’ gives rise to false-positive results in patients. Using a test image with no monocular contours, as in a random-dot pattern, avoids this. The Frisby and Lang stereotests avoid monocular contours, but because of the way in which they introduce disparity, they still offer monocular cues if the patient does not remain perfectly stationary relative to the test image,17 and this is hard to ensure in children. Fawcett et al18 state that stereoacuity scores above 160 seconds of arc in such tests should be interpreted with caution, as they may represent a response to monocular cues. To avoid such artefacts, the test should be repeated with monocular viewing. If the same score is obtained as with binocular viewing, the binocular score is not a measure of stereoacuity.

Unfortunately, the only way of avoiding monocular cues in currently available stereotests requires the use of 3D glasses. These are an additional barrier to using a test in young children; the child may be distracted by the glasses or unwilling to wear them, or the glasses may not fit. The TNO test and the anaglyph version of the Test Chart Xpert 3Di test use red/blue glasses, in which each eye sees a different colour. This is undesirable as it tends to promote rivalry and dissociation, and may promote suppression of one eye by the other. The Randot family of tests uses polarised vectographs to present different images to each eye. As humans are not sensitive to the polarisation of light, both images appear the same apart from the disparity.

Number of alternatives

Clinical stereotests typically ask patients to use disparity cues to choose between a number of alternatives. The number of alternatives varies. For example, the FD2 test and the graded components of the Randot test all ask subjects to identify which of several shapes is closer than the others. The Randot ‘circles’ task asks subjects to choose between three circles, the FD2 test asks them to choose between four shapes, and the Randot ‘animals’ task between five pictures. At first sight, the advantage of offering more alternatives is that patients are less likely to succeed by guessing. Thus, four-alternative tests generally require fewer correct responses to pass a level: two out of three19 or two out of two20 correct responses, as compared with four out of five21 for a two-alternative test. However, the logic behind this seems debatable. If patients are willing to guess, all stereotests are in trouble. On a 2-alternative test, scoring 4 out of 5 is not significantly different from the chance performance of 50%. In fact, if subjects answered at random with both eyes closed, about 19% of them would score 4/5 or better. On a 4-alternative test, answering correctly on 2 consecutive trials does not enable us to reject the null hypothesis of stereoblindness at the conventional 5% significance level, as a sightless subject stands a 6% chance of obtaining this score by guessing. Thus, stereotests depend on patients not guessing; for this reason, test protocols often stress that patients should be instructed not to guess but only to report clear depth percepts. If we dismiss the possibility of guessing, the two-alternative tests may actually be preferable, as they reduce task complexity and are more accessible to small children.22
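The chance-level figures discussed above follow from the binomial distribution. The sketch below is a minimal check, assuming each trial is an independent guess with probability 1/(number of alternatives) of being correct; the function name is illustrative only.

```python
from math import comb


def p_pass_by_guessing(n_alternatives, n_trials, n_required):
    """Probability of at least n_required correct answers out of n_trials
    when every answer is a pure guess among n_alternatives options."""
    p = 1.0 / n_alternatives
    return sum(
        comb(n_trials, k) * p**k * (1.0 - p) ** (n_trials - k)
        for k in range(n_required, n_trials + 1)
    )


# Two-alternative test, pass mark 4 out of 5: 6/32
print(round(p_pass_by_guessing(2, 5, 4), 4))  # 0.1875

# Four-alternative test, pass mark 2 out of 2: 1/16
print(round(p_pass_by_guessing(4, 2, 2), 4))  # 0.0625

# Four-alternative test, pass mark 2 out of 3: 10/64
print(round(p_pass_by_guessing(4, 3, 2), 4))  # 0.1563
```

None of these pass marks brings the guessing probability below the conventional 5% level, which is the point made in the text.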

Precision and reliability of stereoacuity measurements

Very few clinical stereo tests are capable of measuring the stereo threshold of healthy controls. Subjects with good binocular vision can have stereoacuity thresholds as low as 2 seconds of arc, and 80% have thresholds <30 arcsec (sec 19.3.1 of Howard and Rogers23). As Table 1 shows, most commercial tests are not designed to measure such low thresholds, with a threshold of 20 arcsec often being considered ‘normal’. In principle, commercial tests can be used to present arbitrarily low disparities by increasing the viewing distance. This is not an ideal solution, as it also changes the size and spatial frequency content of the image.24 Clinical stereotests are designed to quantify the degree of impairment rather than measuring the abilities of healthy subjects.
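The effect of viewing distance on the presented disparity can be sketched as follows. This assumes a test in which disparity is produced by a fixed horizontal offset printed on the plate (as in a vectograph), so that under the small-angle approximation the angular disparity scales inversely with distance. It does not apply to real-depth tests, and the example values are hypothetical.

```python
def disparity_at_distance(nominal_arcsec, nominal_distance_cm, actual_distance_cm):
    """Angular disparity presented by a printed-offset test plate when it is
    viewed at a distance other than the one it was designed for.

    Small-angle approximation: disparity is proportional to 1 / distance.
    """
    return nominal_arcsec * nominal_distance_cm / actual_distance_cm


# Hypothetical plate labelled 40 arcsec at 40 cm, viewed from 80 cm
print(disparity_at_distance(40, 40, 80))  # 20.0
```

The same scaling shrinks the angular size of every feature in the image, which is why this approach also alters the spatial frequency content.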

Several studies have examined the test–retest variability of stereotests. These have generally quantified agreement by the 95% limits of agreement, that is, ±2σ, where σ is the SD of the difference between two measurements in the same subject using the same test.25 The ‘measurement’ here refers to log10 (threshold in arcsec), as the uncertainty on stereo threshold is roughly constant when expressed as a fractional error. In clinical practice, one is often interested in the change in the measured value: if a stereo threshold is lower now than on the last visit, does this represent a real improvement? In order to be confident that a stereo threshold has really changed, the log10-thresholds must differ by 2σ. That is, the two thresholds must differ by a factor of at least F=10^(2σ).

Fawcett and Birch26 found F=2.1 for the Randot Preschool stereotest. Adler et al,27 using the Randot graded circles test, report σ as 1.57 Randot plates rather than in log10 arcsec, but on average each Randot plate increases disparity by a factor of 1.4. Thus, this corresponds to F=2.8. Adams et al28 examined reliability for several stereotests and found F=3.9 for preschool Randot, F=1.7 for the near Frisby, F=4.8 for the FD2, and F=2.9 for the distance Randot. They concluded that changes in stereoacuity of less than a factor of four are not clinically meaningful, as they cannot be distinguished from measurement error. This is fairly poor, especially as stereotests only offer a few possible scores, often differing by a factor of two, which would tend to increase the reported reliability compared with the same test with a finer range of possible scores. The poor reliability makes it hard for clinicians to monitor the effect of therapy on stereoacuity.
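The conversion from a test–retest SD to a minimum meaningful change factor F can be written out as a small sketch. The function names and the multiplier parameter k are illustrative assumptions; the source does not state whether 2 or 1.96 was used for the Randot plate conversion.

```python
def change_factor_from_log_sd(sigma_log10, k=2.0):
    """F = 10^(k * sigma), where sigma is the SD of the difference between
    two log10(threshold) measurements and k is the limits-of-agreement
    multiplier (2 by default)."""
    return 10 ** (k * sigma_log10)


def change_factor_from_plate_sd(sigma_plates, step_factor, k=2.0):
    """Same idea when sigma is expressed in test plates and each plate
    multiplies disparity by step_factor on average."""
    return step_factor ** (k * sigma_plates)


def is_real_change(threshold_before, threshold_after, factor):
    """True if two thresholds (arcsec) differ by at least the given factor."""
    ratio = max(threshold_before, threshold_after) / min(threshold_before, threshold_after)
    return ratio >= factor


# sigma = 1.57 plates, each plate a factor of 1.4 in disparity
print(round(change_factor_from_plate_sd(1.57, 1.4), 1))          # 2.9 with k = 2
print(round(change_factor_from_plate_sd(1.57, 1.4, k=1.96), 1))  # 2.8 with k = 1.96

# With F = 2.1, a change from 100 to 40 arcsec exceeds measurement error,
# whereas a change from 100 to 60 arcsec does not
print(is_real_change(100, 40, 2.1), is_real_change(100, 60, 2.1))  # True False
```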

One reason for this poor reliability must be the low number of trials required by protocols: as we have seen, if the ‘pass’ level is set on just three trials of a four-alternative test, it is possible to pass by chance. It is difficult to obtain many trials in small children, but current protocols ‘waste’ trials, for example by conducting three repeats of the early, easy levels. It would be more efficient to use a staircase procedure, where one or two correct answers move the subject onto the next level, but a wrong answer sends them back. Staircase procedures are statistically a more efficient way of obtaining information about the patient’s abilities, but are hard for the tester to implement. In addition, current tests offer a limited range of trials; for example, the Randot graded-circles test offers only one trial at each level. Disparities can be presented only at a limited number of preset levels.
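A staircase of the kind described above can be sketched in a few lines. This is a simulation under stated assumptions, not a clinical protocol: the simulated observer's psychometric function (logistic in log disparity), the preset levels, the trial count, and the reversal-based threshold estimate are all hypothetical choices made for illustration.

```python
import math
import random


def p_correct(disparity, threshold, n_alternatives=4, slope=4.0):
    """Hypothetical observer: probability of a correct answer rises from the
    guess rate to 1 as a logistic function of log10(disparity / threshold)."""
    guess = 1.0 / n_alternatives
    x = slope * (math.log10(disparity) - math.log10(threshold))
    return guess + (1.0 - guess) / (1.0 + math.exp(-x))


def run_staircase(levels, true_threshold, n_trials=30, n_down=2, seed=0):
    """Simulate a staircase over preset disparity levels (arcsec), ordered
    from easiest (largest) to hardest (smallest).

    n_down consecutive correct answers move to the next, harder level;
    a single wrong answer moves back to the previous, easier level.
    Returns the trial list and the geometric mean of the disparities at
    which the direction of travel reversed (None if it never reversed).
    """
    rng = random.Random(seed)
    idx, streak, last_step = 0, 0, 0
    trials, reversals = [], []
    for _ in range(n_trials):
        d = levels[idx]
        correct = rng.random() < p_correct(d, true_threshold)
        trials.append((d, correct))
        step = 0
        if correct:
            streak += 1
            if streak >= n_down:
                step, streak = 1, 0
        else:
            step, streak = -1, 0
        if step != 0:
            if last_step != 0 and step != last_step:
                reversals.append(d)  # direction changed at this level
            last_step = step
            idx = min(max(idx + step, 0), len(levels) - 1)
    estimate = None
    if reversals:
        mean_log = sum(math.log10(r) for r in reversals) / len(reversals)
        estimate = 10 ** mean_log
    return trials, estimate


levels = [800, 400, 200, 100, 50, 25]  # hypothetical preset disparities
trials, estimate = run_staircase(levels, true_threshold=100)
print(len(trials), estimate)
```

Because the staircase spends most trials near the level where performance changes, few trials are spent on the easy levels that fixed protocols repeat.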

Many of these problems could be avoided by using computerised tests. These are routine in vision science laboratories, but are not usual in the clinic because of the cost and space required and the lack of suitable software. A few recent papers have explored the use of computerised stereotests in clinical populations,29, 30, 31, 32 but none of these are currently commercially available. The only commercial computerised stereotest of which I am aware is the Test Chart Xpert 3Di from Thomson Software Solutions, available with either anaglyph (red/blue) or polarising 3D display. It enables the user to present unlimited trials at a range of available disparities, but does not implement the mathematical techniques used to improve the precision of threshold estimates in the lab. As yet, no published studies have used the stereotest aspect of the Test Chart Xpert.

Effect of viewing distance

In healthy controls, stereo thresholds in arcsec are independent of viewing distances over a very wide range (30 cm–10 m).33, 34, 35, 36 That is, the threshold depends only on the retinal disparity, and not on the vergence angle. (Bradshaw and Glennerster33 found a small increase in stereo threshold when the viewing distance was halved from 60 to 30 cm, but at <2 arcsec this is not clinically measurable.) This would suggest that the viewing distance of a clinical stereotest should be immaterial. Accordingly, most stereotests are designed for arm’s length viewing. This is convenient in the clinic, as it requires less space and makes it easier to maintain the attention of young patients.

However, this independence of viewing distance is true only for the sensory component of stereopsis. The binocular neurons in visual cortex that detect disparity are sensitive almost exclusively to retinal information, regardless of how this is presented.37 In normal subjects, the oculomotor system ensures that subjects fixate correctly upon stimuli of all viewing distances, ensuring that the fixated object has zero retinal disparity regardless of its distance. In strabismus, this system is impaired. Even a misalignment of just 0.25° (900 arcsec, 0.44 prism dioptres) would have a profoundly damaging effect on stereoacuity, as it would add a disparity of 900 arcsec to all parts of the stimulus. A subject who can easily see the disparity boundary between 0 and 20 arcsec might well be completely blind to the difference between 900 and 920 arcsec.38 This effect probably explains why distance stereotests such as the FD2 or Distance Randot have been found to be more sensitive than near tests in intermittent exotropia, and to be a more valuable tool in management and predicting surgery.39, 40, 41, 42, 43 Patients with intermittent exotropia generally find it easier to maintain correct fixation at near distances. When tested at distance, even if they do not show a measurable deviation, they may fixate less accurately or precisely (meaning that there may be a mean misalignment, or simply greater fluctuations). This will reduce their measured stereoacuity. Thus, near stereotests are more practical for assessing sensory mechanisms of stereopsis, but distance stereotests are probably more useful in assessing oculomotor function.
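The unit conversions behind the 0.25° example can be checked with a short sketch. It uses the standard definition of the prism dioptre (a deviation of 1 cm at a distance of 1 m, that is, 100 × tan of the angle); the function names are illustrative.

```python
import math


def deg_to_arcsec(deg):
    """Degrees to seconds of arc (1 degree = 3600 arcsec)."""
    return deg * 3600.0


def deg_to_prism_dioptres(deg):
    """Degrees to prism dioptres: 100 * tan(angle)."""
    return 100.0 * math.tan(math.radians(deg))


misalignment_deg = 0.25
pedestal = deg_to_arcsec(misalignment_deg)

print(pedestal)                                            # 900.0
print(round(deg_to_prism_dioptres(misalignment_deg), 2))   # 0.44

# A stimulus containing disparities of 0 and 20 arcsec is shifted onto a
# 900 arcsec pedestal by the misalignment
print([pedestal + d for d in (0, 20)])                     # [900.0, 920.0]
```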

Contour vs cyclopean stereograms

Stereotest stimuli can be divided into two classes. In the first, disparity is applied to monocularly visible contours, as in a line drawing of a circle or fly, that are defined by their luminance in the usual way. In the second, the object to be detected is defined only by its disparity. Monocularly, the image appears as a pattern of dots or the ‘static’ of an untuned television, without any contours that define the edges of the object (Figure 3). The object is revealed only when the two eyes’ images are compared. These are sometimes known as ‘cyclopean’ stimuli, a term introduced by Julesz.44 In Table 1, the column headed ‘Monocular contours?’ distinguishes these two types of test image. Examples of ‘cyclopean’ stereotests include the Lang and Randot Preschool tests; examples of ‘contour’ stereotests include the Randot Circles and FD2.

Several lines of evidence suggest that neuronal processing may be different for contour and cyclopean stereograms. Disparity-tuned neurons in primary visual cortex respond to disparity in both types of image, but additional mechanisms may be available to extract disparity in contour images. Contour and cyclopean stereograms are affected differently when one eye’s image is replaced with its photographic negative, as in Figure 4. In dense random-dot patterns like those in Figure 3, this manipulation either destroys depth perception completely or leaves a weak perception of reversed depth.44, 45, 46 However, when the image is sufficiently sparse, for example, the line drawing shown in Figure 4, depth is seen in the direction consistent with the disparity of the contours.45, 47 As a second example, sparse line stimuli presented with large disparities, outside Panum’s fusional range, are seen double, but subjects nevertheless show appropriate vergence movements and can report the sign of the disparity.48, 49 For random-dot stereograms with similarly large disparities, subjects are at chance.50 Several workers have suggested that human stereo vision consists of at least two distinct components, sometimes dubbed ‘coarse’ and ‘fine’ stereopsis, supported by different neural mechanisms.51, 52, 53 Contour and cyclopean test images probably activate these components to differing amounts.

Figure 3

An example of a cyclopean stereogram. If the eyes are crossed or diverged such that each eye fixates the centre of one of the patterns, a square region will be seen standing out in depth. This square is not defined in either of the monocular images.

Figure 4

A stereogram where one eye’s image is replaced with its photographic negative, redrawn from Helmholtz47 (Plate IV, Figure Q). When these images are fused divergently, the central pentagon should appear in front, in accordance with the disparity of the lines, whereas the contrast mismatch produces the impression of a ‘crystal … of some dark shining substance like graphite’.47

Surprisingly, little seems to have been done in comparing contour vs cyclopean stereoacuity in healthy controls. Fawcett16 reports that, in 54 controls, there were no significant differences in stereoacuity measured with the Titmus circles, Randot circles, or Preschool Randot, but this may reflect the floor effect noted above, that is, all available disparities were above threshold for subjects with good stereoacuity. Wong et al36 report that, in 12 controls, stereoacuity was better with Contour Circles than with a Random Dot E stimulus; the median stereo threshold was ∼40 arcsec lower with the Circles than with the E.

Conversely, in strabismus patients, it is well established that patients will generally show better stereoacuity (ie, lower thresholds) when measured on a monocular-contour test such as the Randot graded-circles, as compared with a cyclopean pattern such as the Randot Preschool test.15, 16, 54, 55, 56, 57, 58, 59 Fu et al55 suggest that the better performance on contour stereograms may be because these ‘provide cues to fusion that allow some patients with strabismus to better control their deviations than random dot targets’. This motor explanation may well contribute,58 but sensory mechanisms probably contribute as well. Giaschi et al56 have reported that amblyopic children with poor or no stereopsis on the Randot Preschool test nevertheless perform as well as controls when the stimulus was a monocularly visible cartoon character with a large disparity. They conclude that although the ‘fine stereo’ sensory system is impaired in these children, ‘coarse stereo’ is spared.

The clinical importance of stereoacuity measurements

As we have seen, stereopsis in humans requires good vision in each eye individually, precise oculomotor control in order to direct the two eyes at a common target, and a population of binocular sensory neurons in visual cortex in order to detect the disparity between the two eyes’ images. Stereopsis with cyclopean stimuli probably requires additional neuronal mechanisms over and above contour stimuli. Thus, good stereoacuity with cyclopean stimuli is the most demanding achievement of binocular vision. For this reason, as the Cochrane review on ‘Interventions for infantile esotropia’ states,13 a measurement of stereoacuity is regarded as the gold standard for diagnosing the presence and quality of binocular vision. It is a key component of outcome measures in most studies of interventions for strabismus and amblyopia. For example, the Cochrane review on botulinum treatment for strabismus60 classifies outcomes based on angle of deviation, simultaneous perception, motor vergence, and stereoacuity.

Strabismus in early life prevents the normal development of binocular sensory neurons in visual cortex.61 Accordingly, early strabismus has a profoundly damaging effect on stereoacuity, particularly on the ‘fine’ stereoacuity that works with cyclopean images and depends on these binocular neurons. Studies in non-human primates suggest that the sensitive period for binocular vision may be substantially longer than for other aspects of vision such as spectral sensitivity.62 In healthy controls, although cyclopean stereo vision can be demonstrated in infants at ∼3 months,63, 64 stereoacuity continues to improve up to the age of ∼10 years.65, 66, 67, 68, 69, 70 This long period of plasticity implies that binocular vision remains vulnerable to disruption until later in development, for example, by the onset of accommodative esotropia in toddlerhood.71 Conversely, the window for recovery remains open for longer. Indeed, there are occasional reports of strabismus patients recovering stereopsis as adults, many years after treatment.72 Thus, although early strabismus is extremely damaging to stereo vision, it is also clear that sufficiently early intervention can go some way to restoring it.

Infantile esotropia, defined as a large-angle inwards deviation that becomes constant before 6 months of age, unsurprisingly has a particularly disruptive effect on stereo vision. Among children whose eyes are surgically aligned after the age of 24 months, only 12% achieve any stereo vision,73 although this rises to 74% among children aligned before 6 months of age.73 Birch et al73 suggest that the poorer outcome of surgery after 6 months does not reflect the closure of a sensitive period, but simply the brain’s longer exposure to misaligned visual input. However, although early surgery does restore some stereo vision, stereoacuity is by no means normal.73, 74, 75 One possibility is that infantile esotropia reflects a pre-existing sensory deficit. However, Birch et al73, 74, 75 argue that the available evidence does not support this, and argue that the sensory deficit is secondary to the motor deficit. They suggest that stereoacuity remains subnormal because surgery was not carried out soon enough, and did not align the eyes precisely enough. Normal stereoacuity may require alignment within 0.6 prism dioptres, yet ocular alignment can only be measured clinically to around ±3 prism dioptres.76 Birch et al75 suggest that very early (<2–3 months of age) treatment may be necessary, with botulinum treatment preferable to surgery.77 These conclusions are supported in other forms of strabismus such as accommodative esotropia and intermittent exotropia. In all cases, early intervention is associated with better stereo vision,13, 78 and the critical factor is the duration of the misalignment. Fawcett et al78 conclude that fine stereoacuity is likely to be permanently impaired by a constant misalignment that persists for longer than 4 months. Prompt treatment is therefore important if normal stereoacuity is to be achieved.

Achieving good stereo vision is a valuable goal. Binocular disparity cues help us guide our hand movements precisely79, 80, 81, 82 and both children and adults with impaired stereo vision perform worse on a range of visuomotor tasks than their peers with normal stereoacuity.83, 84, 85, 86, 87, 88, 89, 90 Stereoacuity has also been linked to better reading ability,91, 92 perhaps because both stereopsis and reading require precise control of eye movements.93 Untreated children with infantile esotropia lag behind on developmental milestones but catch up following early alignment surgery (in the first year of life),94 an outcome that the authors attribute to better binocular vision and stereopsis. However, it remains unclear whether this motor improvement reflects stereopsis specifically or some other aspect of binocular vision.95

As well as enabling better visual and motor performance, stereoacuity is also linked to long-term stability of alignment.96, 97 Birch et al96, 97 studied children who underwent surgery for infantile esotropia, resulting in stable alignment within 4 prism dioptres by 2 years of age. Children who had no stereo vision postoperatively were 3.6 times more likely to need repeat surgery later in childhood. Out of 60 children with accommodative esotropia who received successful optical correction to within 4 prism dioptres by age 4, those who had no stereo vision following alignment were 17 times more likely to need surgery later. These are very striking differences. Several mechanisms may contribute to these differential risk factors. For example, it may be that perfect orthotropia is more stable than less perfect alignment, and also allows better stereoacuity. The children with stereopsis may simply have been those whose misalignment was corrected most accurately. The low precision of clinical measurement of misalignment makes it hard to test this hypothesis; after treatment, all these children were equally well aligned to within the precision possible with clinical measurements.76 Alternatively, the differences may have been sensory: perhaps the children who were stereoblind after alignment had lost the sensory neurons that normally support stereopsis. As these neurons help maintain correct alignment by triggering vergence reflexes, it is not surprising that children in whom these mechanisms are spared are better able to maintain long-term alignment. What is clear is that stereoacuity is the most sensitive outcome measure currently available; stereoacuity measures immediately after treatment predict the long-term success.

Conclusion

Human stereo vision is capable of remarkably precise judgments, discriminating binocular disparities as small as 2 seconds of arc. Such performance requires good vision in both eyes, very precise oculomotor coordination and specialised sensory neurons in visual cortex. A measurement of stereoacuity is therefore a very sensitive test of binocular function at both the ocular and cortical levels. As well as enabling the clinician to assess the short-term success of surgery, it can also predict long-term outcomes. However, current measures of stereoacuity are plagued by low reliability that limits their usefulness in practice. Making the measurement of stereoacuity more precise and reliable, especially in young children, should improve its value as a tool to manage strabismus.