Lighting-from-above prior in biological motion perception

The visual system is able to recognize body motion from impoverished stimuli. This requires combining stimulus information with visual priors. We present a new visual illusion showing that one of these priors is the assumption that bodies are typically illuminated from above. A change of illumination direction from above to below flips the perceived locomotion direction of a biological motion stimulus. Control experiments show that the underlying mechanism is different from shape-from-shading and directly combines information about body motion with a lighting-from-above prior. We further show that the illusion is critically dependent on the intrinsic luminance gradients of the most mobile parts of the moving body. We present a neural model with physiologically plausible mechanisms that accounts for the illusion and shows how the illumination prior might be encoded within the visual pathway. Our experiments demonstrate, for the first time, a direct influence of illumination priors in high-level motion vision.


Neural model
Our hierarchical neural model consists of four layers that implement a position-invariant recognition of walking direction based on the intrinsic gradual shading variations of the individual stimulus elements.
The model qualitatively reproduces the illusion, as well as the dependence of the illusion size on the available shading information.
The first layer of the model is composed of uneven (odd-symmetric) Gabor filters Gu that are arranged on a rectangular spatial grid. Such filters are sensitive to oriented local luminance gradients. We assume that the receptive field center of a filter is specified by the vector (xc, yc), and that its preferred gradient direction is given by the angle φ. We assume further that the receptive field size is specified by the parameter σ, and the preferred spatial frequency by the constant k0. With these parameters, the filter functions are defined as:

Gu(x, y) = exp(−((x − xc)² + (y − yc)²) / (2σ²)) · sin(k0 [(x − xc) cos φ + (y − yc) sin φ])

We used 810,000 spatial grid points for the receptive field centers and eight different angles φ (cf. Tab. 1). Assuming I(x, y) as the gray-level input pixel image, the output signal of the filter with these parameters is given by the sum:

r(xc, yc, φ) = Σ(x,y) Gu(x, y) I(x, y)

In our experiments the walking direction could not be reliably extracted from the stimuli with 'flat' shading. These stimuli specify strong luminance gradients on the boundaries of the stimulus elements, but no shading gradients inside them. The strong gradients on the element boundaries dominate the responses of the filters on the first hierarchy layer. In a set of additional simulations, we found that a more robust processing of the weak luminance gradients inside the elements can be accomplished by suppressing the gradient responses on the silhouette boundaries. This suppression is accomplished by the second layer of our model, which exploits a multiplicative gating mechanism. The underlying operation can be implemented by a simple feed-forward network that multiplies a gating signal with the filter responses from the previous layer. The gating signal is computed from all filter responses of the first layer with the same receptive field center, taking the maximum over the responses with selectivity for three different direction angles φ1, φ2, φ3:

g(xc, yc) = max{ r(xc, yc, φ1), r(xc, yc, φ2), r(xc, yc, φ3) }
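The first-layer filtering described above can be sketched in a few lines of numpy. This is a minimal illustration under our own assumptions, not the published implementation: the function names, the kernel-size heuristic, and the default values of σ and k0 are placeholders invented for this example.

```python
import numpy as np

def odd_gabor(size, sigma, k0, phi):
    """Odd (sine-phase) Gabor kernel with preferred gradient direction phi."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    carrier = np.sin(k0 * (x * np.cos(phi) + y * np.sin(phi)))
    return envelope * carrier          # odd symmetry: kernel entries sum to zero

def layer1_responses(image, sigma=2.0, k0=1.0, n_dirs=8):
    """First-layer responses: one map of gradient responses per preferred direction."""
    size = int(6 * sigma) | 1          # odd kernel width covering ~3 sigma (our heuristic)
    maps = []
    for phi in np.arange(n_dirs) * (2.0 * np.pi / n_dirs):
        kern = odd_gabor(size, sigma, k0, phi)
        # plain 2-D correlation over the valid region of the image
        h = image.shape[0] - size + 1
        w = image.shape[1] - size + 1
        out = np.empty((h, w))
        for i in range(h):
            for j in range(w):
                out[i, j] = np.sum(image[i:i + size, j:j + size] * kern)
        maps.append(out)
    return np.stack(maps)              # shape: (n_dirs, h, w)
```

A uniform image yields zero responses (the odd kernels integrate to zero), whereas a luminance ramp selectively drives the detectors tuned to the corresponding gradient direction.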
The output signals of the second layer are then given by thresholded products of the form:

s(xc, yc, φ) = [ r(xc, yc, φ) · Θ(θg − g(xc, yc)) ]+

where g(xc, yc) denotes the gating signal computed from the maxima of the first-layer responses, Θ the Heaviside step function, θg a fixed suppression threshold, and [·]+ a linear rectification. The third layer of the model pools the responses with the same direction selectivity from the second level over limited spatial regions U(x, y) using a maximum operation, providing a (spatially subsampled) set of detector responses with partial position invariance. Mathematically, these responses were given by:

m(x, y, φ) = max{(x′, y′) ∈ U(x, y)} s(x′, y′, φ)

The responses of the 648 neural detectors on this level can be interpreted as 'mid-level features' within the visual hierarchy. The recognition of walking direction was based on the classification of the activation patterns, exploiting a subset of these mid-level feature-detector responses that includes only those detectors whose response varied significantly over the training set. In order to determine this set of detectors automatically, we computed, separately for each position (x, y), a population vector from the detector responses with different direction specificity, which was given by the complex number:

p(x, y) = Σφ m(x, y, φ) · exp(iφ)

Exploiting circular statistics, we computed the circular variance of the directions of these population vectors (see (29) for details) and retained only the responses of those detectors whose circular variance over the whole training set exceeded a fixed threshold. This operation corresponds to a feature selection stage that identifies features with robust variation within the set of training patterns. Interestingly, the selected feature detectors correspond nicely to the body parts whose shading, according to Experiment 2, is most critical for the size of the illusion.
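The population-vector computation and the circular-variance feature selection can be sketched as follows. This is a toy illustration: the array shapes, the threshold value, and the function names are our own assumptions, not the paper's implementation.

```python
import numpy as np

def population_vector(responses, directions):
    """Complex population vector: sum of direction-tuned responses times e^{i phi}."""
    return np.sum(responses * np.exp(1j * directions), axis=-1)

def select_features(resp_train, directions, threshold=0.1):
    """Keep detectors whose population-vector direction varies enough over training.

    resp_train: array of shape (n_stimuli, n_positions, n_dirs) of detector responses.
    Returns a boolean mask over positions."""
    pv = population_vector(resp_train, directions)        # (n_stimuli, n_positions)
    angles = np.angle(pv)
    # circular variance: 1 - length of the mean resultant vector of the angles
    mean_resultant = np.abs(np.mean(np.exp(1j * angles), axis=0))
    circ_var = 1.0 - mean_resultant
    return circ_var > threshold
```

A detector whose preferred direction changes from stimulus to stimulus has circular variance near one and is retained; a detector that always signals the same direction has circular variance near zero and is discarded.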
The highest level of the model (layer 4) is given by a classifier stage that is formed by two radial basis function (RBF) units that have been trained with example stimuli walking TOWARDS and AWAY, which were illuminated from above. The underlying training set contained 200 movies of walkers that were lit with an elevation angle of 78.75 deg, where the veridical motion direction of 100 of these stimuli was AWAY and that of the other 100 movies was TOWARDS. The different movies were rendered using 4 different actors (2 male, 2 female) and randomly varying sizes of the body elements. This training encodes a perceptual prior that reflects statistical properties of stimulus patterns illuminated from above, and thus forms the implementation of the 'lighting-from-above prior' in our model.
The EM procedure fits the mixture weights πn (both very close to 0.5), the means μn, and the covariance parameters Σn of the Gaussian distributions, where we assume Gaussian density functions of the form:

pn(v) = (2π)^(−d/2) |Σn|^(−1/2) exp( −(1/2) (v − μn)ᵀ Σn⁻¹ (v − μn) )

Interpreting the parameters πn as the prior probabilities of the two classes 'walking TOWARDS' and 'walking AWAY', and given that approximately π1 = π2, the posterior class probability can thus be computed by a simple normalization operation from the RBF neuron activities, and the most likely class can be found by a simple winner-takes-all competition.
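For a two-class Gaussian mixture with roughly equal priors, this posterior computation reduces to normalizing the two unit activities. A minimal numpy sketch (the parameter values in the example are made up):

```python
import numpy as np

def gaussian_density(v, mean, cov):
    """Multivariate Gaussian density p_n(v) with mean and covariance cov."""
    d = mean.size
    diff = v - mean
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def class_posteriors(v, means, covs, priors):
    """Posterior P(class | v) for a two-class Gaussian mixture.

    With equal priors this is just a normalization of the two RBF-unit
    activities; the winner-takes-all class is the argmax of the result."""
    acts = np.array([p * gaussian_density(v, m, c)
                     for p, m, c in zip(priors, means, covs)])
    return acts / acts.sum()
```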
In order to compare the simulation results for Experiment 1 with the experimental data, we fitted the classification responses of the model with a logistic regression, and the experimental data with a logistic mixed-effects model (see section on statistical analysis of data from Experiment 1), using only the elevation angle as fixed and random effect predictor (smooth curves in Fig. 4A). Data were collapsed across the veridical walking directions, since the model treats AWAY and TOWARDS walking equally and does not contain a special mechanism that could account for the walking-towards bias that has been observed in biological motion vision (18). For Experiment 2, we computed, for each test stimulus, the difference in accuracy between the two light source positions (above and below) in order to quantify the size of the illusion. The similarity of the illusion sizes derived from the model to those in the real data for the different shading conditions was quantified by a linear regression analysis, predicting the illusion sizes in the experimental data from the ones obtained from the model (Fig. 4B).
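The logistic regression of the model's classification responses on the elevation angle can be illustrated with a plain maximum-likelihood fit. This is a sketch on a tiny synthetic data set (the experimental data were instead fitted with a mixed-effects model in R, which this does not reproduce):

```python
import numpy as np

def fit_logistic(angles, responses, n_iter=25):
    """Maximum-likelihood fit of P(response = 1) = sigmoid(b0 + b1 * angle)
    via iteratively reweighted least squares (Newton's method)."""
    X = np.column_stack([np.ones_like(angles, dtype=float), angles])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        W = p * (1.0 - p)                     # Bernoulli variance weights
        hessian = X.T @ (X * W[:, None])
        gradient = X.T @ (responses - p)
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta                               # (intercept, slope)
```

The fitted slope quantifies how strongly the reported direction depends on the light elevation angle; the point of subjective equality is the angle where the fitted probability crosses 0.5.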

Statistical analysis of data from Experiment 1
Statistical analysis was realized using R version 3.4, RStudio version 1.1.4, and lme4 version 1.1-12. We used ggplot2 version 2.2 to generate the statistical plots and multcomp version 1.4-6 for the multiple comparisons.
For the analysis of the conditions from Experiment 1 with graded shading of all elements we combined the responses of all 13 observers and fitted them with a logistic mixed-effects model (function glmer from package lme4) with cosine and sine of the light elevation angle and veridical walking direction both as fixed and random effects. In total, we fitted 13 (observers) * 2 (veridical walking directions) * 17 (light angle directions) * 15 (repetitions) = 6630 data points. We did not include the interaction between the light elevation angle and veridical walking direction in the regression because this interaction was not significant. Fig. S1 shows the probability of perceiving the veridical walking direction as a function of light elevation angle in a separate panel for each observer (the panels are ordered by increasing difference between the participants' points of subjective equality). The data points in each panel are jittered to minimize overlap (hence some probabilities appear smaller than zero or larger than one). Different colors indicate the two veridical walking directions, and the curves from the random-effects fit are shown in the same color. The illusion is present for all observers and both veridical locomotion directions, while the fitted curves show different offsets (thresholds) that vary between participants, and partly between the two veridical locomotion directions.
A particularly interesting question is whether the condition with frontal illumination (α = 0), which is the condition with the smallest luminance gradients within the different stimulus elements, conveys information about the veridical walking direction. The accuracy for this class of stimuli for the different observers is shown in Fig. S2. All participants show accuracies above 0.5, indicating that there is some remaining information about walking direction even for the stimuli with frontal illumination, potentially mediated by the subtle time-varying shading gradients in the stimulus elements that are present for this stimulus class. This stands in contrast with the stimuli with flat shading, which showed accuracies even below 0.5, indicating on average a perception of the wrong walking direction.
A further set of statistical analyses investigated potential biases in the perception of walking direction in favor of walking TOWARDS or AWAY. Such biases were studied for the class of stimuli of all shaded walker elements as well as for the stimuli with flat shading.
To quantify potential biases for the stimuli with gradual shading of all elements we analyzed the points of subjective equality (PSEs) of the random effects from the model fitted above (as plotted in Fig. S1).
As shown in Fig. S3A, only observer 10 has a bias to perceive walking 'away', while eight participants have a bias to perceive walking 'towards' them. Four further observers (observers 9, 5, 13 and 12) show little to no bias. Thus, the bias in perceived walking direction differs between observers, but most have a 'towards' bias, consistent with results on biological motion stimuli in the literature (18).
We also investigated potential biases in the perception of walking direction for the stimuli with flat shading. Response accuracies are shown in Fig. S3B, separately for the AWAY and TOWARDS conditions. Observers are ordered by the size of the difference between their PSEs for AWAY and TOWARDS walking. Observers 10, 9, 5, 13 and 12 have similar accuracy for the AWAY and TOWARDS walkers, whereas all others are more accurate for the TOWARDS condition.
There is a striking analogy between the two panels, in that observers with a larger difference in PSEs while viewing the fully shaded stimuli also tend to have a larger difference in accuracy between AWAY and TOWARDS walkers while viewing stimuli with flat shading. To make this relationship explicit, we plotted the difference in accuracy for the stimuli with flat shading against the difference in PSE derived from the fully shaded stimuli in Fig. S3C. Testing the correlation between both variables with an ordinary linear regression analysis, we found the slope of the regression line to be significantly different from zero (p < 0.01). This shows that the observer-specific bias towards perceiving an ambiguous walker as walking towards the observer is similar for the two stimulus classes: the fully shaded stimuli and the ones with flat shading.

Statistical analysis of data from Experiment 2
We used the same software for statistical analysis as for Experiment 1. The combined responses of all 16 observers were fitted with a linear mixed-effects model (function lmer from package lme4 (19)) with body part and veridical walking direction both as fixed and random effects. In total, we fitted 16 (observers) * 2 (veridical walking directions) * 9 (body part conditions) = 288 data points, each of which was defined by the difference in accuracy between 15 repetitions with lighting from above and 15 repetitions with lighting from below. The data was collected testing 9 levels of the factor body part, while only 8 levels were used in the main analysis, since the results for the levels "none" (flat shading) and "body" (gradual shading of head, torso and upper arms only) were not statistically different and were hence combined. Fig. S4 shows all of these 288 means, with the panels ordered by increasing average accuracy difference. The trend of increasing mean accuracy with body part condition holds for most observers, although there are also clear inter-individual differences. Interestingly, there are only minor differences between the stimuli with different veridical walking directions.
In a linear mixed-effects regression analysis we found a significant effect of the factor body part (p < 10⁻¹⁵) and a non-significant effect of veridical walking direction (p > 0.05). Subsequently, we also carried out selected pairwise comparisons with corrections for multiple testing (R package multcomp (20)). First, we asked whether adding the forearms, the thighs or the (lower) legs increased the mean difference in accuracy. We found that adding each of these elements significantly increased the illusion relative to the baseline condition with gradual shading of the head, torso and upper arms (all p < 0.001). Second, we compared which of the three additions, forearms, thighs or (lower) legs, led to a larger illusion and found no significant differences between these three conditions (all p > 0.5). Third, we asked whether the addition of a second moving limb region to the first increased the size of the illusion. In detail, we asked whether adding the (lower) legs or the thighs to the forearms increased the illusion and found that it did (all p < 0.01). Adding the forearms to the (lower) legs or thighs did not increase the illusion (all p > 0.1). Adding the (lower) legs to the thighs or the thighs to the (lower) legs did increase the illusion by a modest amount (p < 0.05). Fourth, we asked whether adding a third moving limb region to any other two moving limb regions resulted in a significant increase in performance. Only adding the thighs to the forearms and (lower) legs resulted in a significant difference (p < 0.05), while this was not the case for the other two additions (p > 0.2).

Rendering of stimuli for Control Experiment
In a control experiment we used a stimulus that does not provide classical shape-from-shading cues that would allow the reconstruction of the three-dimensional orientation of the body segments. However, the elements of this control stimulus approximate the internal luminance profiles of the elements of the original stimuli used in Experiment 1. The purpose of this experiment was to rule out the possibility that the observed illusion is just a consequence of a classical shape-from-shading mechanism that estimates the stimulus element orientations in space.
The walker for this control experiment was composed of elements with fixed circular shape in order to eliminate silhouette-based orientation cues. The internal luminance profiles of these circular elements were derived from the luminance profiles of the original conic elements. To each boundary point on the conic elements, specified in polar coordinates (R(θ), θ) relative to the element center, we assigned a corresponding boundary point (R*, θ) on the circle, where the radius R* was given by the maximal boundary radius R(θ) over all angles θ. Exploiting these corresponding coordinate systems, we warped the luminance profiles of the original conic elements onto the ones with circular shape using nearest-neighbor interpolation. Fig. S5A illustrates example frames from the generated control stimuli. The circular elements look like deforming rubber sheets and do not provide a clear impression of orientation in space.
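The nearest-neighbor warping onto a circular element might look roughly like this. This is a sketch under simplifying assumptions: `radius_fn`, the square patch layout, and the sampling rule are our own illustrative choices, not the stimulus-generation code of the paper.

```python
import numpy as np

def warp_to_circle(src, radius_fn, R_star):
    """Warp the luminance profile of an element onto a circular element.

    src: square grayscale patch centered on the element.
    radius_fn(theta): boundary radius of the original element at angle theta.
    R_star: radius of the target circle (the maximal boundary radius).
    Each target pixel at polar coordinates (r, theta) samples the source
    at (r * radius_fn(theta) / R_star, theta), nearest-neighbor."""
    h, w = src.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = np.zeros_like(src)
    for i in range(h):
        for j in range(w):
            dy, dx = i - cy, j - cx
            r = np.hypot(dx, dy)
            if r > R_star:
                continue                      # outside the circular element
            theta = np.arctan2(dy, dx)
            r_src = r * radius_fn(theta) / R_star
            si = int(round(cy + r_src * np.sin(theta)))
            sj = int(round(cx + r_src * np.cos(theta)))
            if 0 <= si < h and 0 <= sj < w:
                out[i, j] = src[si, sj]
    return out
```

When the original boundary already is a circle of radius R*, the mapping reduces to the identity inside the circle, which is a convenient sanity check.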

Control Experiment
In order to assess the illusion size for the control stimulus, which prevents the use of shape-from-shading for the estimation of element orientations in space, we used only three light source positions with illumination from ABOVE and three from BELOW. As in Experiments 1 and 2, we asked the participants to report the perceived walking direction. A new group of 12 observers participated in this control experiment. The experimental protocol was identical to Experiments 1 and 2, except that the walkers performed 8 instead of 4 steps (4 gait cycles instead of 2), with 20 repetitions per condition.
The results were analyzed using the same procedure as in Experiments 1 and 2. We combined the responses of all 12 observers and fitted them with a logistic mixed-effects model with light elevation angle and veridical walking direction both as fixed and random effects. In total, we fitted 12 (observers) * 2 (veridical walking directions) * 6 (light angle directions) * 20 (repetitions) = 2880 data points.

The data (Fig. S5B) show a weaker but significant effect of the light source position on the accuracy of perceived walking direction (p = 0.02). This shows that the illusion was present even for stimuli that make a reconstruction of the three-dimensional orientation of individual limb segments impossible.
However, taking the holistic body configuration and the structure of the inner luminance gradients into account, the visual system was able to extract the walking direction from such stimuli and shows the same illusion as for the stimuli with conic elements in Experiment 1. The observed illusion is thus not just a consequence of a classical shape-from-shading process that estimates the orientation of individual stimulus elements. Instead, the observed phenomenon must be based on a mechanism that integrates body shape and information about internal shading gradients, such as the mechanism realized by the proposed model.

Fig. S1. Per-participant data from Experiment 1. Accuracy of reporting the veridical walking direction as a function of the light source elevation angle α. The participants are ordered by increasing difference between their AWAY and TOWARDS points of subjective equality.

Fig. S2. Accuracy of reporting the veridical walking direction in the frontal lighting condition (α = 0) for the fully shaded walkers. The participants are ordered by increasing difference between their AWAY and TOWARDS points of subjective equality (same order as Fig. S1).