No single, stable 3D representation can explain pointing biases in a spatial updating task

People are able to keep track of objects as they navigate through space, even when the objects are out of sight. This requires some kind of representation of the scene and of the observer's location, but the form this representation might take is debated. We tested the accuracy and reliability of observers' estimates of the visual direction of previously viewed targets. Participants viewed four objects from one location, with binocular vision and small head movements; then, without any further sight of the targets, they walked to another location and pointed towards them. All conditions were tested in an immersive virtual environment, and some were also carried out in a real scene. Participants made large, consistent pointing errors that are poorly explained by any stable 3D representation. Any explanation based on a 3D representation would have to posit a different layout of the remembered scene depending on the orientation of the obscuring wall at the moment the participant points. Our data show that the mechanisms for updating the visual direction of unseen targets are not based on a stable 3D model of the scene, even a distorted one.

Black solid lines indicate walls that were always present. In (c-j), the gray dotted line shows the wall that appeared when the participant pressed a button at the start zone (white diamond). The black dotted line shows the wall that appeared after the participant left the start zone. In (g) and (h) the wall remained in the same place. (e) and (f) show data from an additional condition (participants initially faced 'West') compared to the 'North' and 'South' conditions. (g) In Experiment 4, the same conditions were repeated twice; pointing errors for the two runs are plotted against each other (slope 0.61, indicating a smaller range of errors on the second run). (f) The very first time that participants viewed the stimulus in the real world, they were given no instructions at the start zone. Then, at the pointing zone, they were asked to point to the four boxes in a random order, eight times each (i.e. 32 shots per participant).

It is debatable whether a post-hoc power analysis is of value in this instance but, for the record, it shows that the power achieved to rule out the correlation shown in Fig. 2d/Fig. S2a occurring by chance is, to a very close approximation, 100%. More relevantly, the same pattern of biases is found throughout the remaining experiments in the paper.

Figure S3. Evidence supporting the choice of 'shooting' angle (φ) rather than visual direction (θ) as the most appropriate definition of pointing direction. In a control condition, participants pointed at target boxes from the start zone, so the target was visible (18 participants tested on 2 box layouts with 4 target boxes in each layout, 144 shots in total). Their instruction was to 'shoot at the box', just as it was in the spatial updating experiments. (a) Distribution of pointing errors when target direction is defined relative to the pointing device and shooting direction is defined by the orientation of the device (see φ in Fig. 1d).
(b) As for (a), but with pointing errors defined as the difference in visual direction between the target and the pointing device, as measured from the cyclopean point (see θ in Fig. 1e). The mean of the distribution is significantly biased for θ (t-test, p < 0.001) but not for φ (p = 0.605), suggesting that φ reflects participants' intentions when pointing. (c) There is a significant correlation between the φ and θ measures in the spatial updating experiments (correlation coefficient 0.91). These data are from Experiment 1. When φ = 0, θ is about −20°. A negative bias is the direction of bias that would be created if a right-handed participant held the pointing device slightly out to their right and pointed to a target directly ahead of them.
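The two error definitions can be sketched in a few lines. This is a minimal 2D illustration; the function names and coordinate conventions are ours, not taken from the paper's analysis code.

```python
import numpy as np

def angle(v):
    """Direction of a 2D vector, in radians."""
    return np.arctan2(v[1], v[0])

def wrap(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def phi_error(device_pos, device_dir, target_pos):
    """Shooting error phi: device orientation minus the direction
    from the device to the target (Fig. 1d definition)."""
    to_target = np.asarray(target_pos, float) - np.asarray(device_pos, float)
    return wrap(device_dir - angle(to_target))

def theta_error(cyclopean_pos, device_pos, target_pos):
    """Visual-direction error theta: difference between the visual
    directions of the target and of the pointing device, both seen
    from the cyclopean point (Fig. 1e definition)."""
    c = np.asarray(cyclopean_pos, float)
    return wrap(angle(np.asarray(target_pos, float) - c)
                - angle(np.asarray(device_pos, float) - c))
```

With the device held at the cyclopean point and aimed straight at the target, φ is zero, whereas holding the device out to one side produces a nonzero θ even for a perfect shot, which is the asymmetry described in the caption.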

Noisy-path-integration Model
The noisy-path-integration model simulates a moving observer storing an egocentric map of the box positions by constantly updating the heading direction with respect to 'North' (α) and estimating the distance traveled on each step (d). Here, we assume that the observer misestimates α and d with a constant error (multiplicative calibration errors ω_α and ω_d respectively). This leads to a cumulative error in the estimate of the participant's location. The box locations are assumed to be known correctly. Initially, the position of box b is given, by definition, by its starting distance S_0^b and the angle η_0^b, its visual direction with respect to 'North' (π/2) as viewed from the start position (the boxes are still visible at this point), with boxpos_x^b and boxpos_y^b the x- and y-coordinates of the b-th box, and pos_{0,x} and pos_{0,y} the coordinates of the observer's cyclopean point at the start position. At subsequent steps, when the boxes become obscured by the wall, the polar coordinates are updated on each step using the miscalibrated estimates of α and d. We fitted this model to all the data from all participants in Experiment 1 by varying the two free parameters, ω_α and ω_d, to give the minimum root-mean-square error between the actual and predicted pointing directions (see Fig. S4a).

As in Fig. 2d, each symbol is based on the mean data for 20 participants. The data for zone C (triangles) are most informative, as the length difference between the direct and indirect paths is most extreme in this case. Here, the model's predicted errors for direct walking are significantly more positive than those for indirect walking (direct walking M = 6.06, SD = 3.19; indirect walking M = 0.764, SD = 2.76; t(35) = 18.5, p < 0.001), whereas the experimental data for these two conditions, reproduced here in (c) from Fig. 2d, were not significantly different.
(d) Histogram of prediction errors calculated 100 times for each box in each layout, using the walking trajectory of every participant tested in the indirect walking condition of Experiment 1 at one of the pointing zones (zone C), with normally distributed random noise on the estimate of η_n^b. The mean of the pointing errors is not significantly different from zero. The means of the distributions for zone A and zone B were also not significantly different from zero.
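A minimal sketch of the noisy-path-integration model, assuming 2D waypoint coordinates and the convention that the multiplicative heading miscalibration scales the heading angle directly; the default parameter values and all names are illustrative, not the fitted values from Experiment 1.

```python
import numpy as np

def simulate_pointing(path, boxes, omega_alpha=1.05, omega_d=0.95):
    """Dead-reckon along `path` (a list of 2D waypoints) with constant
    multiplicative miscalibrations of heading (omega_alpha) and step
    length (omega_d). Box positions are assumed to be known correctly,
    so all pointing error comes from the misestimated final position.
    Returns the predicted pointing direction (radians) to each box."""
    est = np.array(path[0], dtype=float)      # position estimate starts correct
    for a, b in zip(path[:-1], path[1:]):
        step = np.asarray(b, float) - np.asarray(a, float)
        d = np.linalg.norm(step)              # true step length
        alpha = np.arctan2(step[1], step[0])  # true heading angle
        # constant calibration errors applied to heading and distance
        est += omega_d * d * np.array([np.cos(omega_alpha * alpha),
                                       np.sin(omega_alpha * alpha)])
    boxes = np.asarray(boxes, dtype=float)
    return np.arctan2(boxes[:, 1] - est[1], boxes[:, 0] - est[0])
```

Because the same ω_α and ω_d apply on every step, longer (indirect) walks accumulate larger position errors than direct walks, which is why this model predicts a direct/indirect difference that the data do not show.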

Zero-mean Noise
If, instead, we assume that the path-integration noise has zero mean, then there is no systematic effect on pointing, which we demonstrate as follows for an estimate of orientation. We added a normally distributed random error to the estimate of visual direction with respect to 'North' on each step, η_n^b, with the function randn(µ, σ) returning a random number drawn from a distribution with mean µ and standard deviation σ. E is a random error added to the estimate of η_n^b, drawn from a distribution with a mean of µ = 0 and a standard deviation of σ = π/360 radians. Using Eq. (1), predictions of pointing directions were calculated with random additive noise on the direction of 'North'. Calculating the directions 100 times for each box in each layout, using the walking trajectory of every participant tested in the indirect walking condition of Experiment 1 at pointing zone C, gives the histogram of errors plotted in Fig. S4d (and the same result applies for zones A and B).
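The zero-mean case can be checked with a short Monte Carlo simulation: summing independent zero-mean errors over the steps of a walk spreads the final direction estimate out but does not bias it. The step count, number of runs, and random seed below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_direction_errors(n_steps=20, n_runs=10_000, sigma=np.pi / 360):
    """Add an independent zero-mean error (SD = sigma, as in the text)
    to the direction estimate on each of n_steps steps; return the
    accumulated direction error at the end of each of n_runs walks."""
    noise = rng.normal(0.0, sigma, size=(n_runs, n_steps))
    return noise.sum(axis=1)   # cumulative error after each walk

errs = noisy_direction_errors()
print(errs.mean(), errs.std())   # mean near zero; spread grows with n_steps
```

The spread of the final error grows as sigma·sqrt(n_steps), but its expectation stays at zero, matching the flat-mean histograms in Fig. S4d.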

Abathic Model
Johnston¹ shows psychophysical data described by a linear relationship between estimated (or 'scaling') distance and physical viewing distance (e.g. their Fig. 7). In general, we can fit the two parameters (intercept and slope) to our pointing data (Fig. 7a). In our case, the best-fitting values are a slope of 1.03 and an intercept of 0.17, which correspond to an abathic distance of −5.66 (the distance at which estimated and physical distance coincide, intercept/(1 − slope)). Specifically, the misestimated egocentric distance of the boxes, d_est^b, is:

d_est^b = 1.03 · d_true^b + 0.17,

where d_true^b is the true egocentric distance and b = [1, . . . , 4] is the index for each box.
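The linear mapping and the implied abathic distance follow directly from the fitted slope and intercept; this is a sketch of the arithmetic, not the authors' code.

```python
def estimated_distance(d_true, slope=1.03, intercept=0.17):
    """Linear mapping from physical to estimated egocentric distance,
    using the best-fitting slope and intercept reported in the text."""
    return slope * d_true + intercept

# The abathic distance is the fixed point where d_est == d_true:
# d = slope * d + intercept  =>  d = intercept / (1 - slope)
abathic = 0.17 / (1 - 1.03)
print(abathic)   # approximately -5.67, matching the reported value
```

With a slope greater than 1 and a positive intercept, the fixed point is negative, so within the range of physically realizable (positive) distances all targets are overestimated.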

Retrofit Model
We can allow box position to vary and calculate the maximum-likelihood configuration of boxes that could account for the participant data (separately for each Experiment). We considered a 200 × 200 grid of possible box positions centered on the true box locations. For each possible box position, the likelihood of the participants representing the box as being at that location (given their pointing responses from 3 different zones) can be defined as:

L^{b,l,k} = L_{1,1}^{b,l,k} · L_{1,2}^{b,l,k} · L_{1,3}^{b,l,k} · L_{2,1}^{b,l,k} · … · L_{P,M}^{b,l,k},

where P = 20 is the total number of participants and M = 3 is the total number of shooting zones.
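The grid search can be sketched as follows, assuming Gaussian noise on pointing direction and working with log-likelihoods (so the product over participants and zones becomes a sum); the noise width `sigma` and all names are our assumptions, not details from the paper.

```python
import numpy as np

def grid_log_likelihood(grid_xy, pointing, zones, sigma=np.radians(5)):
    """Log-likelihood that a box sits at each candidate grid location.
    grid_xy: (G, 2) candidate box positions; pointing: (P, M) observed
    pointing directions (radians); zones: (M, 2) shooting-zone positions.
    Gaussian pointing noise of width sigma is assumed for this sketch."""
    ll = np.zeros(len(grid_xy))
    for m, z in enumerate(zones):
        # direction from zone m to every candidate location
        pred = np.arctan2(grid_xy[:, 1] - z[1], grid_xy[:, 0] - z[0])
        for p in range(pointing.shape[0]):
            # wrapped angular difference between shot and prediction
            diff = np.angle(np.exp(1j * (pointing[p, m] - pred)))
            ll += -0.5 * (diff / sigma) ** 2   # sum of per-shot log terms
    return ll
```

The maximum-likelihood ("retrofitted") box location is then `grid_xy[np.argmax(ll)]`; repeating this per box and layout yields the configuration that best explains the pointing responses.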