First- and second-order contributions to depth perception in anti-correlated random dot stereograms

Asher, Jordi M.; Hibbard, Paul B.

doi:10.1038/s41598-018-32500-4

Article
Open access
Published: 20 September 2018

Scientific Reports volume 8, Article number: 14120 (2018)

Abstract

The binocular energy model of neural responses predicts that depth from binocular disparity might be perceived in the reversed direction when the contrast of dots presented to one eye is reversed. While reversed-depth has been found using anti-correlated random-dot stereograms (ACRDS), the findings are inconsistent across studies. The mixed findings may be accounted for by the presence of a gap between the target and surround, or as a result of overlap of dots around the vertical edges of the stimuli. To test this, we assessed whether (1) the gap size (0, 19.2 or 38.4 arc min), (2) the correlation of dots or (3) the border orientation (circular target, or horizontal or vertical edge) affected the perception of depth. Reversed-depth from ACRDS (circular no-gap condition) was seen by a minority of participants, but this effect reduced as the gap size increased. Depth was mostly perceived in the correct direction for ACRDS edge stimuli, with the effect increasing with the gap size. The inconsistency across conditions can be accounted for by the relative reliability of first- and second-order depth detection mechanisms, and the coarse spatial resolution of the latter.

Introduction

Binocular depth perception depends on our ability to determine the difference in position of corresponding points between the two eyes’ images. This correspondence problem can be solved by using a matching measure to determine the image regions in each eye which are most similar in terms of the variation in local luminance intensity. This similarity matching can for example be based on the correlation of local intensity values^1,2,3,4. At the correct disparity offset between the left and right eyes, each point is expected to have similar luminance values in each eye, leading to a high interocular correlation. At incorrect disparities, non-corresponding points will be compared, which are likely to have different values of luminance, leading to low correlations.
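The correlation-based matching described above can be illustrated with a minimal sketch. It assumes a one-dimensional, circular image of ±1 dot values and an integer disparity; the helper names (`pearson`, `correlation_by_disparity`) are hypothetical and chosen for illustration only, not taken from the paper.

```python
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def correlation_by_disparity(left, right, candidates):
    """Correlate the left image with the right image read at each candidate offset."""
    n = len(left)
    scores = {}
    for d in candidates:
        shifted = [right[(i + d) % n] for i in range(n)]  # circular shift (assumption)
        scores[d] = pearson(left, shifted)
    return scores

rng = random.Random(0)
n, true_disparity = 2000, 3
left = [rng.choice([-1.0, 1.0]) for _ in range(n)]
# Right image: the left pattern displaced by the true disparity.
right = [left[(i - true_disparity) % n] for i in range(n)]

curve = correlation_by_disparity(left, right, range(-8, 9))
best = max(curve, key=curve.get)  # candidate with the highest interocular correlation
```

At the true offset the two samples are identical, so the correlation is 1; at every other offset non-corresponding dots are compared and the correlation stays close to zero.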

In addition to the standard correlation-based measures described above, other matching-based metrics have been proposed that depend on detecting similarities between two image samples, based on the presence of individual matching features, rather than the overall correlation within the region^5,6. These metrics are not necessarily mutually exclusive, and it has been suggested that both metrics are used independently and simultaneously^5,6. Under the matching metric proposed, all evidence in favour of a match, when points have the same luminance polarity, contributes positively to the matching metric. However, in contrast to a standard correlation, points that have opposite contrast polarities, and provide evidence against a match, are ignored.

These similarity calculations can be related to the responses of binocular neurons in the visual cortex^3,4,7,8. Neurons in V1 have a localised receptive field, consisting of both excitatory and inhibitory regions, and tend also to be tuned to orientation and spatial frequency^3,9,10,11,12. Binocular neurons have a receptive field in both eyes, and their responses are thus sensitive to binocular disparities. Figure 1(a) shows an idealised binocular energy neuron^3,9,10, consisting of a quadrature pair of receptive fields for each eye. The responses of the first-stage filters are the square of the sum across the two eyes’ receptive fields (Fig. 1(b)), which in turn are summed to compute the binocular energy response.

These components of the energy model have been used to characterise the responses of binocular simple and complex cells, respectively, although this simple hierarchy is best viewed as an idealisation of how these computations are performed¹³. The response of a binocular energy neuron depends on the disparity in the images, forming a characteristic disparity tuning function (Fig. 1(c)). In this example, the receptive fields of the filters are shifted horizontally in the right eye compared to those in the left eye, meaning that this neuron responds most strongly to stimuli with the same disparity. This peak response at this optimum disparity is accompanied by disparities at which the response is reduced compared to baseline.

The binocular energy response is related to the point-wise correlation between the filtered left and right images³. By summing across frequency, orientation and position^3,14,15, the correlation within a spatial neighbourhood can be calculated^3,4. Complex cells in V1, as characterised by the binocular energy model, are thus well suited to support the calculation of the cross-correlation and other matching calculations thought to contribute to the solution of the correspondence problem⁹.

The dependence of the energy response on binocular cross-correlation means that manipulation of this correlation has provided a useful way of understanding how the responses of populations of binocular neurons contribute to the perception of depth². This is often investigated psychophysically using random-dot stereograms (RDS), which create the perception of depth by presenting an image that has been shifted slightly between the left and right eyes. Correlated random-dot stereograms (CRDS) present dots of the same luminance to each eye (Fig. 2(a)). However, an interesting application of RDS is the use of anti-correlated random-dot stereogram stimuli (ACRDS), where one eye’s view is replaced with its photographic negative (Fig. 2(b))^{5,12,16,17,18,19,20,21,22,23}. This means that the high positive correlations expected at the correct disparity become negative, and the disparity tuning function is inverted. Neurons in V1 show this inversion effect, but also a reduction in magnitude of their response that is not predicted by the energy model^9,10,11. This reduction in response has been modelled using the introduction of a threshold non-linearity^24,25,26, or a squaring of the energy response²⁷. These expansive nonlinearities, by enhancing the difference between the amplitudes of the positive and negative peaks in the disparity tuning function, can be used to implement the cross-matching mechanism proposed by Doi and Fujita^5,6.

In higher visual areas in the ventral stream, the responses of neurons tend not to be modulated by the disparity in ACRDS^28,29. In contrast, neurons in dorsal stream areas show disparity tuning similar to that found in V1, but reduced in magnitude³⁰.

In some psychophysical studies, the direction of depth perceived in ACRDS has been found to be reversed in comparison with equivalent CRDS^5,17,21,23. Thus, we use the term ‘reversed-depth’ to describe depth that is perceived in the opposite direction to the correct direction, and ‘forward-depth’ to describe depth that is perceived in the correct direction. One possible explanation of this percept is that it reflects the peaks in the inverted disparity tuning function, although whether these would signal a reversed- or forward-depth direction depends on the relationship between the spatial frequency and disparity tuning of the neuron, and the stimulus disparity (Fig. 3).

Alternatively, it has been suggested that the estimation of depth might reflect opponent processing, in which the difference between the responses of neurons tuned to equal but opposite disparities is calculated^6,27. In this case, the negative correlations that exist in ACRDS would directly contribute to the reduction in the response to the correct disparity, thereby biasing the perception of depth towards the opposite direction.

Other studies have found no evidence for the perception of depth in ACRDS^18,19,21. This result is consistent with the fact that there is no coherent disparity, across different scales of analysis, in ACRDS, and also with the lack of disparity-selective responses in higher cortical areas.

The effect of decorrelation has been further assessed by Doi et al.⁵ who created stimuli containing an equal mixture of correlated and anti-correlated dot pairs. This results in an overall correlation of zero, so that a correlation-based mechanism predicts that depth would not be seen in these half-matched stimuli. In fact, depth is perceived in the correct direction. This is consistent with the responses of the cross-matching mechanism, which responds to the correct-matched dot pairs, but not to the anti-correlated pairs. Henriksen et al.²⁷ showed that a similar prediction can be made by the squared energy response, which enhances the difference between the response to the paired and unpaired dots.
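The contrast between a correlation measure and a match-only measure for half-matched stimuli can be sketched as follows. This is a deliberately simplified stand-in, not the cross-matching model of Doi et al.: it assumes a sparse one-dimensional dot pattern (values -1, 0, +1), uses a half-wave rectified product as the matching score, and all function names are hypothetical.

```python
import random

def make_half_matched(n, density, disparity, rng):
    """Sparse dot pattern in which alternate dots are contrast-reversed in the right eye."""
    left = [rng.choice([-1.0, 1.0]) if rng.random() < density else 0.0
            for _ in range(n)]
    right = [0.0] * n
    flip = True
    for i, v in enumerate(left):
        if v != 0.0:
            flip = not flip  # alternate between matched and anti-correlated pairs
            right[(i + disparity) % n] = -v if flip else v
    return left, right

def correlation_score(left, right, d):
    """Mean signed product: matched pairs add, anti-correlated pairs subtract."""
    n = len(left)
    return sum(left[i] * right[(i + d) % n] for i in range(n)) / n

def match_score(left, right, d):
    """Mean rectified product: opposite-polarity pairs are ignored."""
    n = len(left)
    return sum(max(left[i] * right[(i + d) % n], 0.0) for i in range(n)) / n

rng = random.Random(3)
true_disparity = 3
left, right = make_half_matched(5000, 0.2, true_disparity, rng)

candidates = range(-8, 9)
corr = {d: correlation_score(left, right, d) for d in candidates}
match = {d: match_score(left, right, d) for d in candidates}
best_match = max(match, key=match.get)
```

At the true disparity the signed products cancel, so the correlation score carries no signal, whereas the rectified score still peaks there because the matched half of the dots contributes and the anti-correlated half is ignored.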

Predicting the perception of depth in ACRDS is complicated by the fact that there is no coherent disparity signal across different spatial scales. In CRDS, pooling of information across orientation, scale and position allows true peaks to emerge from amongst the many large responses that will occur at incorrect disparity values^3,14,15. This process does not produce a clear estimate for ACRDS, since large peaks are predicted to occur at different locations at different spatial scales¹⁹ (Fig. 3).

The perception of depth in ACRDS is further complicated by the existence of both first- and second-order mechanisms in stereoscopic processing. The discussion above considers the disparity information present in the Fourier components of the image, and how they might be combined. However, depth can also be perceived in second-order stimuli, containing informative disparities in variations in contrast, rather than in the underlying texture³¹. Evidence for the existence of second-order channels has been provided by both psychophysical experiments, showing that participants can perceive depth in these stimuli^{31,32,33,34,35,36,37,38,39,40,41}, and physiological experiments, showing disparity-tuned responses to contrast envelopes¹⁴. Tanaka and Ohzawa¹⁴ accounted for the responses of second-order neurons using a variation of the energy model. This model takes as its input not raw image values, but the outputs of monocular energy filters (Fig. 4). This monocular energy calculation is followed by a binocular energy calculation, with filters tuned to a much lower spatial frequency.

Wilcox and Hess⁴⁰ showed that depth can be perceived from contrast envelopes even when the underlying random noise patterns are completely uncorrelated, provided that there is an informative disparity in the contrast variation. Sensitivity to depth from second-order stimuli is much poorer than that for first-order stimuli⁴¹, and allows simple depth judgements to be made, but not the perception of 3D shape⁴². Second-order mechanisms will also provide disparity-tuned responses to reversed polarity stimuli, because the rectifying nonlinearity captures the magnitude, but not the contrast polarity, of luminance variations. An important distinction between first- and second-order channels is that the latter will signal depth in the forward, rather than reversed, direction. Cogan et al.¹⁸ proposed that depth from ACRDS relies on second-order mechanisms. In their experiments, they found forward-depth perception for low density stimuli, but no reliable depth discrimination for high-density ACRDS.
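The polarity invariance of a rectifying stage can be illustrated with a short sketch. It is a crude approximation under stated assumptions: full-wave rectification followed by a box blur stands in for the monocular energy stage, the binocular stage is replaced by a plain correlation, and the signal is a one-dimensional noise carrier under a Gaussian contrast window. All names and parameters are hypothetical.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def box_blur(signal, radius):
    """Circular moving average, used here as a simple low-pass filter."""
    n = len(signal)
    width = 2 * radius + 1
    return [sum(signal[(i + k) % n] for k in range(-radius, radius + 1)) / width
            for i in range(n)]

def second_order(signal, radius=4):
    """Full-wave rectify, then low-pass: keeps contrast magnitude, discards polarity."""
    return box_blur([abs(v) for v in signal], radius)

def shifted(signal, d):
    n = len(signal)
    return [signal[(i + d) % n] for i in range(n)]

rng = random.Random(2)
n, true_disparity = 512, 5
window = [math.exp(-((i - n / 2) ** 2) / (2 * 40.0 ** 2)) for i in range(n)]
left = [window[i] * rng.uniform(-1.0, 1.0) for i in range(n)]
# Right image: shifted by the true disparity and contrast-reversed (anti-correlated).
right = [-left[(i - true_disparity) % n] for i in range(n)]

candidates = range(-10, 11)
first = {d: pearson(left, shifted(right, d)) for d in candidates}
env_left, env_right = second_order(left), second_order(right)
second = {d: pearson(env_left, shifted(env_right, d)) for d in candidates}
```

At the true disparity the first-order correlation is -1, while the correlation between the rectified, blurred signals is +1. A mechanism of this kind therefore signals the disparity of an anti-correlated stimulus in the forward direction.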

The perception of depth in both CRDS and ACRDS also depends strongly on the presence of features at different disparities in the stimulus, such that the depth of a target can be judged based on its disparity relative to that at other locations. Large changes in absolute disparity, when they are not accompanied by changes in relative disparity, can go unnoticed by participants^43,44,45. Sensitivity to depth differences also falls as the spatial separation between target and reference increases⁴⁶. Karmihirata et al.⁴⁷ argued that, due to the relatively weak disparity provided by ACRDS, depth is only perceived when a correlated reference is present, and there is no spatial gap between this and the anti-correlated target. Using stimuli in which the target was a circular region of anti-correlated dots, surrounded by an annulus of correlated dots, they showed that reversed-depth was perceived when there was no-gap, but that this deteriorated when a small-gap was presented. The lack of a gap means that, once a non-zero disparity is incorporated into the stimuli, there will be overlap between the dots in the target and surround, such that both correlated and anti-correlated dots will fall into the receptive fields of neurons aligned with the vertical edges of the stimulus. This creates regions of decorrelation at the edges of the stimuli, on a different side in each eye. Decorrelation of this type occurs naturally through half-occlusion, whereby parts of a stimulus are visible to one eye but not the other (Fig. 5)^{48,49,50,51,52}. Depth is perceived in these stimuli, consistent with this geometric interpretation. It is thus possible that the perception of depth in this case relates to the presence of decorrelation, although it should be noted that this would not explain depth discrimination in other cases²¹ in which there was a horizontal edge.

The first purpose of the current study was to understand the contribution of decorrelated edge regions, and the spatial separation between target and surround, to the perception of depth in ACRDS. This was done by using stimuli consisting of (i) an anti-correlated circular target surrounded by a correlated annulus, (ii) a vertical edge between correlated and anti-correlated regions and (iii) a horizontal edge between correlated and anti-correlated regions. In the first two stimuli, there are regions containing both correlated and anti-correlated dots, whereas in the horizontal stimuli each region contains exclusively correlated or anti-correlated dots. The second purpose was to understand the contribution of mechanisms sensitive to first- and second-order disparities, and to monocular image regions, to the perception of depth in RDS. We did this firstly by modelling the responses of first- and second-order mechanisms to our stimuli, and secondly by manipulating the presence of decorrelated regions in our psychophysical experiments.

Psychophysical Experiment

Methods

Participants

Ten participants (8 females, mean (SD) age 24.5 (9.6) years) completed the experiment. All had normal or corrected-to-normal vision, and stereoacuity of at least 50 arc sec, as measured using the Stereo Optical Butterfly Stereotest. All work was carried out in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). The study procedures were approved by the University of Essex Ethics Committee (Application No. JA1601). All participants gave informed written consent and received payment for their participation.

Apparatus

Stimuli were presented on a VIEWPIXX 3D monitor, viewed from a distance of 40 cm. The monitor screen was 52 cm wide and 29 cm tall. The screen resolution was 1920 by 1080 pixels, with a refresh rate of 120 Hz. Each pixel subtended 2.4 arc min. Stereoscopic presentation was achieved using a 3DPixx IR emitter and NVIDIA 3D Vision LCD shutter glasses. The cross-talk between the left and right images, measured using a Minolta LS-110 photometer, was 0.12%. Participants’ responses were recorded using the computer keyboard. Stimuli were generated and presented using MATLAB and the Psychophysics Toolbox extensions^53,54,55.

Stimuli

Stimuli in all conditions were random dot stereograms, consisting of 12 arc min square red (27.0 cd m⁻²) and black (0 cd m⁻²) dots against a red (13.5 cd m⁻²) background. Equal numbers of red and black dots were presented, with a total density of 1.12 dots degree⁻². In all cases, stimuli consisted of a correlated reference region, presented with 0 disparity, and a test region which was either correlated or anti-correlated. The dots were in all cases randomly repositioned on each frame. The test region was presented with disparities of ±5.5, ±11 and ±22 arc min. Stimuli were presented in three conditions: circular, horizontal and vertical (Fig. 6(a–i)).

Circular

The target was a circular region with a diameter of 4.4 degrees. The reference was a surrounding annulus, with no-gap, a small-gap (19.2 arc min) or a large-gap (38.4 arc min) between target and the inside of the reference region (Fig. 6(a–c)). The inner diameter of the reference annulus depended on the size of the gap between the test and reference, and the outer diameter was 1.83 degrees larger than the inner radius. The circular stimuli were presented with the centre of the test region 5.5 degrees below the fixation cross.

Horizontal Edge

The reference and test regions were both a rectangle with a width of 5.5 degrees and a height of 2.75 degrees. Both were centred 5.5 degrees to the right of fixation, with the test presented above and the reference presented below the fixation cross. There were three levels of vertical gap: no-gap, a small-gap (19.2 arc min) or a big-gap (38.4 arc min) between the reference and test regions (Fig. 6(d–f)).

Vertical Edge

The reference and test regions were both a rectangle with a width of 2.75 degrees and a height of 5.5 degrees. Both were centred 5.5 degrees below the fixation cross, with the test presented to the right and the reference presented to the left. As with the previous conditions, there was a horizontal separation of either no-gap, a small-gap (19.2 arc min) or a big-gap (38.4 arc min) between the reference and test regions (Fig. 6(g–i)).

Procedure

Each trial began with the presentation of a central fixation cross for 200 ms, followed by the presentation of a stimulus for 80 ms. The fixation cross remained visible throughout each trial block.

After stimulus presentation, the participant was required to make two responses. The first was to indicate whether the target appeared closer (down arrow) or further away (up arrow) than the reference. The second was to indicate whether they felt confident in their response (right arrow) or that they were guessing (left arrow). The next trial began after the two responses had been made (Fig. 6(j)).

Each participant completed 18 blocks of trials, for all combinations of the three configurations, three separations, and two correlation conditions. In each block, each of the 6 disparities was presented 20 times. Blocks were presented in a randomised order, and separated over two or more sessions.
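The block and trial counts implied by this design can be checked with a short enumeration. The condition labels below are taken from the text; the variable names and the tuple layout are illustrative assumptions.

```python
from itertools import product

shapes = ["circular", "horizontal", "vertical"]
gaps_arcmin = [0.0, 19.2, 38.4]
correlations = ["correlated", "anti-correlated"]
disparities_arcmin = [-22.0, -11.0, -5.5, 5.5, 11.0, 22.0]
repeats_per_disparity = 20

# One block per combination of configuration, separation and correlation.
blocks = list(product(shapes, gaps_arcmin, correlations))

# Each block presents every disparity the same number of times.
trials_per_block = len(disparities_arcmin) * repeats_per_disparity
trials_per_participant = len(blocks) * trials_per_block
```

This gives 18 blocks of 120 trials, or 2160 trials per participant.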

Results

Depth Perception

To determine whether participants’ judgements shifted from ‘far’ to ‘near’ (or vice versa) as disparity shifted from uncrossed to crossed, results were analysed with a generalised linear mixed effects model. In this model, disparity was used as a predictor, with a probit linking function, and random slopes and intercepts across participants were included. A separate model was fit for each combination of shape, correlation and stimulus separation. This model allows us to determine whether, at the population level, forward- or reversed-depth was perceived, while also taking account of variation across participants.

Previous studies have found significant individual differences in depth perception in ACRDS, for example with some participants perceiving reversed-depth but others perceiving no depth²¹. We therefore fit the same generalised linear model, with a probit linking function, to the data for each participant separately. In each case, we then determined whether that participant perceived forward- or reversed-depth for each stimulus based on whether the slope parameter of the model had a significant positive or negative value. The results are shown in Fig. 8, which plots the number of participants perceiving forward-depth, reversed-depth, or no significant depth, for each stimulus type.
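The per-participant classification can be sketched as a probit regression of near responses on disparity, followed by a test on the slope. This is a minimal sketch, not the analysis code used in the study: it fits the model by Fisher scoring in plain Python, and it assumes that positive disparities are crossed (near), that significance is assessed with a two-sided Wald test at z = 1.96, and that each disparity was shown 20 times. The response counts are invented for illustration, and the function names are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def fit_probit(x, near, n_trials, n_iter=50):
    """Fit P(near) = Phi(b0 + b1 * x) by Fisher scoring.

    Returns (b0, b1, se_b1), with the standard error taken from the
    Fisher information evaluated on the final iteration.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, ki in zip(x, near):
            eta = b0 + b1 * xi
            p = min(max(norm_cdf(eta), 1e-9), 1.0 - 1e-9)  # guard against 0 and 1
            pdf = norm_pdf(eta)
            r = (ki - n_trials * p) * pdf / (p * (1.0 - p))  # score contribution
            w = n_trials * pdf * pdf / (p * (1.0 - p))       # information weight
            u0 += r
            u1 += r * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1, math.sqrt(i00 / det)

def classify(x, near, n_trials, z_crit=1.96):
    """Label a data set by the sign of a significant slope (threshold is an assumption)."""
    _, slope, se = fit_probit(x, near, n_trials)
    z = slope / se
    if z > z_crit:
        return "forward"
    if z < -z_crit:
        return "reversed"
    return "none"

disparity = [-22.0, -11.0, -5.5, 5.5, 11.0, 22.0]  # arc min
example_forward = [3, 5, 8, 12, 16, 18]            # invented near counts out of 20
example_reversed = list(reversed(example_forward))
example_flat = [10, 11, 9, 10, 10, 11]
```

Under these assumptions, `classify(disparity, example_forward, 20)` labels the first data set as forward-depth, the mirrored counts as reversed-depth, and the flat counts as showing no significant depth. Counts that are perfectly separated (for example all 0s followed by all 20s) have no finite maximum-likelihood slope and would need separate handling.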

Circular Stimuli

The proportion of near judgements is plotted as a function of disparity. Results are plotted separately for CRDS and ACRDS and for the three separation distances. Figure 7(a–c) shows the mean results across participants. For CRDS, there was a positive slope for all separations indicating forward-depth. In contrast there was no significant slope in either direction for ACRDS (Table 1).

Table 1 Estimates of effects from the generalised linear mixed effects regression for the proportion of near versus far responses in CRDS and ACRDS stimuli.

Full size table

Individual analyses (Fig. 8) showed that the majority of participants perceived forward-depth for CRDS, while some were not able to reliably report the direction of depth for the larger stimulus separations. For ACRDS, in most cases there was no reliable perception of depth. With no separation, 4/10 participants reported reversed-depth, but no reversed-depth was perceived for larger separations, consistent with recent findings⁴⁷. However, for the largest separation, there was evidence of reliable perceived depth in the forward direction for two participants.

Horizontal Stimuli

Data are presented in Fig. 7(d–f) in the same format as for circular stimuli. For CRDS, there was a positive slope for all separations. For ACRDS, there was a significant positive slope for the small separation, consistent with forward, rather than reversed, depth (Table 1).

Individual analyses (Fig. 8) showed that the majority of participants perceived forward-depth for CRDS, although this proportion decreased with increasing separation. For ACRDS, one participant perceived depth in the reverse direction when there was no-gap, and a minority of participants perceived depth in the forward direction.

Vertical Stimuli

Data for the vertical stimuli are presented in Fig. 7(g–i). For CRDS, there was a positive slope for all separations. For ACRDS, there was a significant positive slope for the two non-zero gaps, consistent with forward-depth (Table 1).

Individual analyses (Fig. 8) showed very similar results to those found with a horizontal edge. The majority of participants perceived forward-depth for CRDS, and this proportion decreased with increasing separation. For ACRDS, the majority of participants did not reliably perceive depth, but a minority did consistently perceive forward-depth, and there was only one example of reliable reversed-depth.

Confidence

Mean confidence ratings are plotted as a function of disparity for each combination of shape, correlation and distance in Fig. 9. Results are averaged across signs of disparity since, unlike depth judgements, we do not expect opposite results for near and far stimuli.

These results were analysed using a linear mixed effects model, with the shape (circular, horizontal or vertical) and correlation (correlated or anti-correlated) as categorical factors, and separation and disparity as linear covariates. A correlation-by-distance interaction term was also included to determine whether separation had a greater effect on confidence for ACRDS than for CRDS. Participant was included as a random factor, with random intercepts and slopes against disparity. The results are summarised in Table 2. Confidence ratings were significantly lower for ACRDS than for CRDS. They were also significantly lower for the horizontal and vertical stimuli than for the circular stimuli. Ratings tended to increase with increasing gap size, but were not affected by disparity. There was no significant distance-by-correlation interaction, meaning that the separation did not affect confidence differently for CRDS and ACRDS. The effects of correlation, shape and distance are summarised along with the mean ratings in Fig. 9.

Table 2 Estimates of effects from the linear mixed effects regression for confidence judgements in CRDS and ACRDS stimuli.

Full size table

Relationship between Confidence and Performance

For each stimulus, we recorded a near/far depth judgement and an indication of whether or not the participant felt confident in this judgement. Figure 10 plots heatmaps of the relationship between confidence and performance, for CRDS and ACRDS, summed over all participants, disparities, separations and stimulus types. Performance was summarised as the number of consistent responses such that, for example, 20 near responses out of 20, and 20 far responses, were both coded as 100% consistent, while a stimulus for which there were 10 near and 10 far responses was coded as 50% consistent. This consistency measure does not depend on whether responses are in the forward or reversed direction, relative to the stimulus disparity. For CRDS, results are clustered in the top-right corner, indicating generally high levels of both consistency and confidence. For ACRDS, confidence was generally low, consistent with a low level of consistency in responses.
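The consistency measure can be written as a one-line function. The function name and argument layout are hypothetical; the definition follows the examples given in the text.

```python
def consistency(near_count, n_trials):
    """Proportion of responses in the more frequent category, whichever direction it is."""
    return max(near_count, n_trials - near_count) / n_trials

all_near = consistency(20, 20)  # 20 near responses out of 20
all_far = consistency(0, 20)    # 20 far responses out of 20
split = consistency(10, 20)     # 10 near and 10 far responses
```

Because the larger of the two counts is used, the measure ranges from 0.5 (responses evenly split) to 1.0 (all responses the same) and is blind to whether the responses were forward or reversed.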

Discussion

The aim of our psychophysical experiment was to understand how the spatial separation (the size of the gap) and overlapping correlated and anti-correlated elements influence our perception of depth in ACRDS.

Predictably, our results show forward-depth perception for all CRDS stimuli. Depth perception for horizontal and vertical stimuli was not influenced by the size of the gap or type of stimulus. In addition, for CRDS, confidence increased with gap size. Confidence was highest for circular stimuli and lower for horizontal and vertical stimuli.

In contrast, for ACRDS confidence in judgements was at or below 50% for all conditions. These are significantly lower than the respective confidence ratings for CRDS. Furthermore, depth perception for ACRDS was variable depending on the type of stimulus.

For circular ACRDS there was, on average, no depth perception for any gap size. However, reversed-depth as predicted by Aoki et al.¹⁷ was seen by some participants in the no-gap condition. For the small-gap, there was no depth discrimination shown by any participants, and for the big-gap only two participants reported forward-depth.

For horizontal stimuli, on average there was a tendency towards forward-depth for ACRDS with a small-gap, but no forward or reverse perception of depth for stimuli with no-gap or a big-gap. While the majority of participants reported no depth, a minority reported forward-depth, and one participant reported reversed-depth.

Finally, for vertical stimuli, on average depth was perceived in the forward direction for the small- and large-gap ACRDS, while there was no perception of depth with the no-gap stimuli. This trend reflected the forward-depth perception in a minority of participants. In both the horizontal and vertical edge conditions, perception of forward-depth in CRDS conditions was lower than in the circular conditions. This may be a result of the surround annulus in the circular condition providing more points of reference (zero disparity), allowing for better judgement of depth for the central target. While it may be of interest to test this edge perception with an inner and outer rectangle, this was not the purpose of our investigation. However, we would predict that the complexity provided by such a target would provide the same results (for CRDS and ACRDS) as the circular conditions. We will discuss this prediction in more detail in the general discussion.

Aoki et al.¹⁷ found that removing the gap between the target and the surround for ACRDS increased the perception of reversed-depth. However, the removal of the gap results in the potential for overlap between the anti-correlated dots in the target and the correlated dots in the surround. This results in decorrelation (an average correlation of zero) at the edges on opposite sides of the target in each eye. This occlusion occurs naturally in the phenomenon of da Vinci stereopsis when the edges of vertically oriented stimuli occlude the surface from the opposite eye⁵⁰ as illustrated in Fig. 5(a–c). If these uncorrelated regions were taken to indicate the presence of half-occlusions, we would predict that depth would be perceived in the reverse direction.

Our results indicate that there was little evidence of reversed-depth for the vertical edges, and one condition (small-gap) showed a significant tendency for depth to be perceived in the forward direction. There was only one participant who reported reversed-depth for the horizontal no-gap condition.

Our results are consistent with the findings of Aoki et al.¹⁷, where removing the gap did result in reversed-depth for some observers. However, this only occurred for the circular condition, and no reliable perception of reversed-depth was reported for the horizontal or vertical no-gap conditions. This suggests that the complexity of the circular stimulus (particularly when there is no separation between the target and its zero-disparity reference) is more likely responsible for the perception of reversed-depth than is the removal of the gap.

This study found no robust evidence for reliable reversed-depth perception in any condition, supporting the findings by Hibbard et al.¹⁹. The anti-correlated circular condition with no-gap produced the highest proportion of reversed-depth perception, consistent with previous results^17,47. Furthermore, the circular no-gap and small-gap conditions were the only conditions not reporting any forward-depth perception. The trend towards depth perception in the forward direction, while not consistent with the information provided by first-order disparity channels, is predicted by the depth signalled by second-order mechanisms. The conflicting depth signalled by first- and second-order mechanisms for ACRDS is explored in the following section.

First- and Second-Order Responses to Correlated and Anti-Correlated Random Dot Stereograms

The perception of depth depends on multiple mechanisms, including first- and second-order channels, tuned to a variety of scales and orientations. How these contribute to the perception of depth depends on how the information they provide is processed in higher cortical areas. In order to understand the contribution of first- and second-order mechanisms to perceived depth in CRDS and ACRDS, we modelled the responses of these mechanisms to our stimuli.