Motion direction is represented as a bimodal probability distribution in the human visual cortex

Humans infer motion direction from noisy sensory signals. We hypothesize that to make these inferences more precise, the visual system computes motion direction not only from velocity but also spatial orientation signals – a ‘streak’ created by moving objects. We implement this hypothesis in a Bayesian model, which quantifies knowledge with probability distributions, and test its predictions using psychophysics and fMRI. Using a probabilistic pattern-based analysis, we decode probability distributions of motion direction from trial-by-trial activity in the human visual cortex. Corroborating the predictions, the decoded distributions have a bimodal shape, with peaks that predict the direction and magnitude of behavioral errors. Interestingly, we observe similar bimodality in the distribution of the observers’ behavioral responses across trials. Together, these results suggest that observers use spatial orientation signals when estimating motion direction. More broadly, our findings indicate that the cortical representation of low-level visual features, such as motion direction, can reflect a combination of several qualitatively distinct signals.


Supplementary figures
Supplementary Figure 1. Non-independent neural noise. The panels show the relationship between the behavioral response and the peak locations for a model assuming that the neural signals for motion and orientation are not independent. Different levels of correlation r between the two signals are shown.
For the Bayesian observer model that uses both velocity and spatial orientation signals to infer motion direction ('MAP readout'), the location of either peak in the decoded posterior is positively correlated with the direction and magnitude of the behavioral errors, regardless of correlation strength. In contrast, for an observer who has a bimodal probabilistic representation of motion direction but uses only velocity (and not spatial orientation) signals ('velocity-only readout'), the correlation between the second peak location and behavioral errors is negative for weak correlation, absent for moderate correlation, and is positive only in case of strong correlation. Note that the 'velocity-only readout' model is inconsistent with the results of the follow-up behavioral study. Note that, as expected, the first column in this figure matches the results in Fig. 2c, because the correlation parameter r equals 0 here, so the measurements are independent. Please see Supplementary Methods, section 3, for further detail on the simulations. In all panels, circles show the mean simulated error, and the lines show the fit of the linear regression model with shaded regions showing 95% confidence intervals.

[Axis labels: velocity precision ($\kappa_V$); orientation precision ($\kappa_O$)]
Supplementary Figure 2. The shape of the posterior distribution as a function of the variance in the velocity and orientation measurements for the Bayesian observer model. Shown are the likelihood given the velocity measurement (blue, dashed line) and the likelihood given the spatial orientation measurements (green, dashed line) for different levels of variance (precision parameter $\kappa$) in the two measurements. Because these two likelihoods are multiplied to compute the posterior distribution (orange, solid line), the shape of the posterior depends on the shape of the two likelihoods. When the variance of the velocity measurements is relatively low ($\kappa_V$ is high, right column), resulting in a narrow likelihood, the posterior becomes a unimodal von Mises distribution. In contrast, if the variance of the velocity measurements is high ($\kappa_V$ is close to zero) and the variance of the orientation measurements is low ($\kappa_O$ is high), the posterior becomes bimodal with two identical peaks at the true and the opposite directions of motion (lower-left plot). Finally, when both the velocity and orientation measurements are highly variable, resulting in very broad likelihoods, the posterior distribution becomes close to uniform.

Supplementary Figure 3.
Concurrent BOLD bimodality and neural unimodality. Bimodality in the posterior distribution decoded from fMRI data can arise even if the neural posterior distribution that is read out for behavior is unimodal. This happens because of the non-neural sources of noise that are included in the fMRI signal. Here, the top row shows the simulated 'neural' likelihood and the corresponding posterior (magenta) computed from these likelihoods (blue: velocity, and green: orientation) for four example trials (each column shows one trial). Because there is comparatively little noise at the neural level (as we assume is the case for the fMRI experiment presented in the paper), the neural posterior that is read out for behavior is unimodal. However, measuring cortical activity with fMRI introduces additional noise. This additional noise is combined with the neural measurements (Eq. 30), and results in wider likelihoods, which in turn, correspond to wider and sometimes bimodal (third and fourth columns) posterior distributions measured with fMRI (bottom row).

Supplementary Figure 4. The predicted relationship between the decoded posterior distribution and the observer's behavioral responses for low coherence stimuli. To assess whether it would be helpful to collect additional fMRI data for low coherence stimuli, we simulated decoded posterior distributions and behavioral responses for this scenario. We were specifically interested in a potential relationship between the location of the decoded posterior's largest peak and opposite behavioral responses when motion coherence is low and the neural posterior is bimodal. Accordingly, for both the velocity-only model and the Bayesian model, we divided the simulated trials over two bins: one for which the largest peak of the decoded posterior fell 'opposite' to the presented direction of motion (i.e., 170-190 degrees away from the presented direction of motion, green), and another bin (blue) for all the other trials. The inset shows the across-trial
distribution of the largest peak location, with the green region depicting the 'opposite' bin. Within each bin, we then calculated the fraction of trials on which the model observer gave the 'opposite' behavioral response. The bar graphs show the obtained results. While the Bayesian model predicts a larger overall fraction of opposite behavioral responses than the velocity-only model, both models predict that the fraction of opposite behavioral responses is much larger when the larger peak of the decoded posterior distribution is located close to the opposite direction of motion than when it is located elsewhere. Interestingly, the increase in the fraction of opposite behavioral responses (compare blue with green) is even larger for the velocity-only than the Bayesian observer. Because both models predict a link with behavior, we conclude that analyzing fMRI data for low-coherence stimuli would not enable us to further adjudicate between the two models. Please note that merely analyzing the shape of the posterior in cortex for low coherence stimuli (i.e., without linking this to behavior), would similarly not allow for further adjudication between the two models. The details related to these simulations are provided in Supplementary Methods, section 3. This section also provides an intuition for why we observe these results.

Supplementary Figure 5. Control analyses. The effect of trial-by-trial uncertainty on behavioral variability was estimated after controlling for mean BOLD amplitude in the ROI, gaze fixation position, variability of gaze fixation positions, and head motion. The hierarchical regression model was identical to the main analysis (where decoded uncertainty and the absolute distance to the nearest cardinal direction were included as independent variables), except that the corresponding control variable was included as an additional predictor both as a fixed and as a random effect. We aimed to assess whether the addition of the control variable would eliminate the
effect of uncertainty on behavioral variability. This was not the case: decoded uncertainty reliably predicts behavioral variability, regardless of the control variable included. This rules out simple explanations in terms of the amount of attentional effort put in the task (as mean BOLD amplitude tends to increase with attentional effort, see, e.g., ref. 1) or the amount of eye movements. Circles indicate the estimated effect (i.e., the regression coefficient) with bars showing 95% HPD credible intervals (none of the intervals includes zero). The associated distribution of the samples from the regression model posterior is shown in gray.

Supplementary Figure 6. Results separated by ROI. a Circular correlation between the decoded and presented direction is above chance levels for all ROIs (all BF > 340). Circles show the mean and bars show 95% confidence intervals. b Posterior distributions decoded from individual ROIs (averaged across trials and all subjects) show the predicted bimodal pattern with peaks at the presented and opposite direction of motion. For all ROIs, the posterior distribution was bimodal with peaks close to 0° and 180° for the majority of observers (bimodal for 12 observers in V1, 15 observers in V2 and hV4, 16 in V3 and hMT+, and 18 observers in V3AB). c The effect of uncertainty (b, the estimated regression coefficient) on behavioral variability in trial-by-trial analyses estimated with a hierarchical Bayesian regression (not significant for any of the ROIs taken individually) following the same procedure as described for the analyses of all ROIs combined (cf. Figure 3c, which shows the significant effect for all ROIs combined). d The effect of the peak locations (b, the estimated regression coefficient) on behavioral errors in trial-by-trial analyses estimated with a hierarchical Bayesian regression following the same procedure as described for the analyses of all ROIs combined (cf.
Figure 4c). The peak closer to 0° (corresponding to the larger peak in the averaged posterior distribution in b) is correlated with the direction and magnitude of behavioral errors in all ROIs (all BF ≥ 10.62) except for hMT+.

To plot the joint probability distribution of peak location (shown), the locations of the two peaks in the posterior (estimated using a mixture of von Mises model as described in the Methods) were tabulated for each trial, and the joint histogram of possible location pairs was obtained across trials for each posterior type. Three clusters of trials can be observed for the joint probability distribution of the two peak locations for the Combined condition, in which brighter colors correspond to higher probabilities of observing a given combination of peak locations on a given trial. For a considerable portion of trials, the larger peak of the posterior distribution is located around the presented direction of motion, while the smaller peak is located around the opposite direction (i.e., a bimodal decoded posterior).
In contrast, for the Velocity-only condition only a single cluster is observed with both the larger and the smaller peak location located around the presented direction of motion. Finally, for the Orientation-only condition, two clusters are observed with the larger and the smaller peak separated by 180 degrees. Please see Supplementary Methods, section 1, for further detail on the simulations.

Supplementary Figure 8. Control analyses involving saccade direction. To ensure that the relationship between posterior peak location and the direction and magnitude of behavioral errors (Fig. 4c) is not due to the presence of eye movements, we ran additional control analyses. Specifically, we included additional variables in our analysis to see if the relationship between the peak locations and the behavioral errors can be explained by these variables. The variables were computed over a narrow time window from 0 to 4.5 seconds after the start of the trial (note this corresponds to the same interval that we used for decoding, after accounting for hemodynamic lag). We used the mean saccade direction without any transformations ('avg. direction') and the mean sine-transformed saccade direction to probe for a potential circular relationship ('avg. direction (sine-transf.)'). Additionally, to test for a potential effect of back-and-forth saccades (i.e., if the dots are moving to the right, the observers might saccade to the right and then to the left, back to the fixation point), we included the average orientation of saccades ('avg. orientation') as well as the linearized version of saccade orientation ('avg. orientation (sine-transf.)'). For all control variables, the regression coefficient of peak location was reliably above zero and virtually the same as without the addition of the control variable (blue, top), indicating that saccade direction does not play a significant role in the observed relationship between the peaks of the decoded posterior and the observer's behavioral errors. Dots
denote the regression coefficient, and bars show 95% highest posterior density (HPD) credible intervals.

Note that for the follow-up study, the plots match Figure 4 (reproduced here for ease of comparison).

Supplementary Figure 10. Neuroimaging noise that is non-independent. The panels show the relationship between the behavioral response and the peak locations for a model assuming that the BOLD signal for motion and orientation is not independent due to non-neural sources of noise, such as fMRI scanner noise and physiological sources of noise, that affect both signals. Different levels of correlation r between the signals are shown. Please see the Supplementary Methods, section 3, for detail on the simulations. For the Bayesian observer model that uses both velocity and spatial orientation signals to infer motion direction ('MAP readout'), the location of either peak in the decoded posterior is positively correlated with the direction and magnitude of the behavioral errors, regardless of correlation strength.
In contrast, for an observer who has a bimodal probabilistic representation of motion direction but uses only velocity (and not spatial orientation) signals ('velocity-only readout'), the relationship between the second peak location and behavioral errors is negative, except for extremely strong correlation levels r that seem unrealistic for fMRI voxels. The circles show the mean simulated error, and the lines show the fit of the linear regression model with shaded regions showing 95% confidence intervals.

Accordingly, the response distribution is shifted relative to the true stimulus direction at 0°, but never becomes bimodal. The 'random orientation weight' model (pink) combines spatial orientation and velocity signals while ignoring uncertainty; its response is a weighted average of the velocity-based estimate and the closest orientation-based estimate with randomly-assigned weights. This produces a bimodal response distribution when noise in the orientation signals is high. However, this model also predicts a very wide behavioral response distribution when uncertainty is low for velocity signals and high for orientation signals (upper-right corner). This is inconsistent with previous studies showing that observers perform well at slow motion speeds when orientation information is presumably very noisy or even absent (ref. 2). A 'switching observer' (red) randomly switches between the velocity and orientation likelihoods on a proportion of trials (here, 20%). It exhibits bimodality in its response distribution, but the share of opposite behavioral responses is the same regardless of sensory noise. This is inconsistent with our results demonstrating opposite responses for low coherence (follow-up experiment) but not for high coherence stimuli (main experiment). The 'uncertainty-guided switching observer' (orange) selects the orientation or the velocity-based estimate based on their relative uncertainty (ref. 3). This observer exhibits a higher probability of opposite responses than
what is observed in our main experiment (high coherence stimuli), except when the orientation likelihood is flat (upper right plot). This is inconsistent with our fMRI results (Figure 4) which indicate that orientation signals are not only represented in cortex but also used by the participants in their behavioral estimates. Moreover, an already flat orientation likelihood cannot lead to the bimodal behavioral response distribution with decreased coherence observed in the follow-up experiment, making this model also inconsistent with our data.

Neurophysiological studies show that the tuning curves of direction selective neurons in V1 often have a smaller second peak around the direction of motion opposite to the preferred direction (refs. 4, 5). Here, we show that it is unlikely that these neurons alone could give rise to a bimodal posterior distribution at the level of voxels (the conclusions are similar for the behavioral response distribution). Please see Supplementary Methods, section 3, for detail on the simulations. As is evident from the figure, the posterior distributions decoded from the voxel responses (shown) were always unimodal and never bimodal, regardless of the level of tuning-dependent noise ($\sigma^2$, illustrated by different colors). Similar results were found for voxel-specific noise. This further strengthens the hypothesis that the empirically observed posterior reflects the responses of cells whose spatial orientation receptive field runs parallel to the presented motion direction.

Supplementary Figure 14. A unimodal decoded posterior does not necessarily imply a unimodal neural posterior. Because the BOLD signal contains non-neural sources of noise, the fMRI posterior can be bimodal when the neural posterior that is read out for behavior is unimodal (Supplementary Fig.
3). Importantly, this also holds the other way around: the fMRI posterior can become unimodal when the posterior at the neural level is, in fact, bimodal. As shown here for four example trials (each column shows one trial), at higher noise levels, the posterior (magenta) computed from the neural signals becomes bimodal (top panels). However, the addition of non-neural sources of noise shifts the likelihoods for velocity (blue) and spatial orientation (green) and makes them wider. As a result, the fMRI posterior that is decoded from all of the signals combined can be unimodal (lower panels) even though the neural posterior is bimodal.

The behavioral data showed direction-dependent biases, which were removed prior to the main analyses, as follows. For each individual observer, we first determined the direction of their behavioral bias by fitting two models, which described either attraction or repulsion from cardinal directions. For the model that describes an attraction bias, the behavioral errors are expected to be close to zero at the cardinal directions. Accordingly, trials were split into four 90-degree bins (indicated by colors) centered at cardinal ({0, 90, 180, 270} degrees) directions. For the repulsion biases, on the other hand, the errors are expected to be close to zero at oblique directions. Hence, for this model, the trials were divided into four 90-degree bins centered at oblique ({45, 135, 225, 315} degrees) directions. For each bin separately, and for each direction of motion within the bin, bias magnitude was determined by fitting a 4th-degree polynomial with motion direction (computed relative to the bin center) as the independent variable and behavioral error as the dependent variable. To account for the heterogeneity of responses across motion direction (e.g., the oblique effect), the standard deviation of the behavioral errors was allowed to vary with distance to the polynomial's center (illustrated by dashed lines here, showing mean ± 3 SD boundary). The best model
(solid lines) of bias direction (as determined by their likelihood) was selected and subsequently used to remove the bias. All subsequent analyses were performed on the residuals of its fit (i.e., the difference between the predicted mean response for a given stimulus and the actual response). For subject A, the model with polynomials centered at oblique directions provides the better fit, while for subject B, the model with polynomials centered at cardinals is the better one. Errors that were larger than ± 3 times the predicted SD (shown with crosses) were considered outliers and not included in subsequent analyses. b The bias-corrected errors that were fed into subsequent analyses.

The bimodality in the averaged decoded posterior distribution is observed with 100+ voxels. c The correlation between decoded uncertainty and behavioral variability in trial-by-trial analyses (after accounting for the effect of stimulus orientation) is positive for 500+ voxels (BF ≥ 3.01). d The correlation between either peak location and behavioral error is observed for 500+ voxels (BF ≥ 8.90 for the peak closer to 0°; BF ≥ 5.03 for the peak closer to 180°, with the exception of N = 1000 voxels where BF = 1.65). Circles and bars in c and d show the estimated effect (i.e., the regression coefficient) and 95% highest posterior density (HPD) credible intervals, respectively.

Using the generative BOLD model (Supplementary Methods, section 1), we simulated BOLD responses (Supplementary Eq. 16) for varying levels of noise (or uncertainty) in neuronal populations tuned to orientation (panels) and velocity (colors). As with the Bayesian observer model (Supplementary Fig.
2), the posterior (shown) becomes more bimodal when the uncertainty of velocity signals increases and the precision of orientation signals increases (uncertainty decreases). A separate set of simulations demonstrated that tuning-independent noise does not have a significant effect on the shape of the decoded posterior. Please see Supplementary Methods, section 1, for further detail on the simulations.
First, we confirmed that if the population response reflects both spatial orientation and velocity cues, and the velocity response is relatively noisy, the decoded probability distribution is bimodal with two unequal peaks corresponding to the true and opposite direction (Supplementary Figure 7a). In contrast, a unimodal distribution is observed when decoding from a population that only represents velocity cues, and a bimodal distribution with two equally high peaks is obtained when decoding from a population response that reflects spatial orientation cues alone. As for the Bayesian observer model described in the main text, the shape of the decoded posterior depends (on average) on the relative amount of noise in the two signals (Supplementary Figure 20).
Second, the bimodal shape of the mean distribution across trials (decoded from responses that reflect both cues) is not an artefact of the aggregation. The locations of the peaks in the decoded posterior can be quantified on a trial-by-trial basis using a mixture of two von Mises basis functions. Supplementary Figure 7b shows the joint distribution of the locations of the larger and smaller peak in this mixture. Using this quantification, three clusters of trials are observed, with the two components either co-located at the true motion direction (corresponding to a unimodal average posterior) or one of them located at the true and the other at the opposite motion direction (corresponding to a bimodal posterior). This is in sharp contrast to what is observed when decoding from a population response that represents only velocity cues (only one cluster of trials is observed, corresponding to a unimodal average posterior) or only orientation cues (two clusters of trials with one of the two components located at the true motion direction and the other at the opposite one).

Simulated BOLD responses
To obtain predictions for the shape of the decoded posterior distribution, we simulated the responses from a population of velocity- and orientation-tuned neurons. We combined the responses in idealized voxels, the response of which is a simple sum of the signals from the underlying neuronal populations, combined with additional noise due to the fMRI measurements. Specifically, we assumed that the response of each of N voxels reflects a mixture of signals from two independent neural populations, a bimodally-tuned one and a unimodally-tuned one (corresponding to orientation-tuned and velocity-tuned neural populations, respectively), and noise. Accordingly, the BOLD response vector of size N, $\mathbf{b}$, was modeled as the sum of three components:

$$\mathbf{b} = \mathbf{b}_V + \mathbf{b}_O + \mathbf{b}_\epsilon$$

where $\mathbf{b}_V$ and $\mathbf{b}_O$ refer to the vectors of velocity-tuned and orientation-tuned components, respectively, and $\mathbf{b}_\epsilon$ refers to the noise in BOLD responses due to other factors, such as non-neural fMRI-specific noise or neural responses uncorrelated with the orientation or velocity components. The velocity-tuned and orientation-tuned components are normally distributed with stimulus-dependent means $\mathbf{f}_V(s)$ and $\mathbf{f}_O(s)$, and covariances $\Sigma_V$ and $\Sigma_O$, respectively:

$$\mathbf{b}_V \sim \mathcal{N}(\mathbf{f}_V(s), \Sigma_V), \qquad \mathbf{b}_O \sim \mathcal{N}(\mathbf{f}_O(s), \Sigma_O)$$

The noise component $\mathbf{b}_\epsilon$ is normally distributed with zero mean and covariance $\Sigma_\epsilon$:

$$\mathbf{b}_\epsilon \sim \mathcal{N}(\mathbf{0}, \Sigma_\epsilon)$$

where $\Sigma_\epsilon$ is a diagonal covariance matrix determining the noise specific to individual voxels with variance $\tau_i^2$:

$$\Sigma_\epsilon = \mathrm{diag}(\tau_1^2, \ldots, \tau_N^2)$$

The mean responses $\mathbf{f}_V(s)$ and $\mathbf{f}_O(s)$ of the velocity-tuned and orientation-tuned components are given by:

$$\mathbf{f}_V(s) = \sum_{k=1}^{K} \mathbf{W}_{V,k}\, g_k(s), \qquad \mathbf{f}_O(s) = \sum_{k=1}^{K} \mathbf{W}_{O,k}\, g_k(s) \tag{7}$$

Here, $\mathbf{W}_V$ and $\mathbf{W}_O$ are weights, K = 8, and $g_k$ is a basis function:

$$g_k(s) = \max(0, \cos(s - \varphi_k))^5 \tag{8}$$

where $\varphi_k$ is the preferred motion direction of basis function k. The mean response (as a function of s) can be thought of as the voxel's tuning function for the velocity and spatial orientation signals. The weights $\mathbf{W}_V$ are described by an N by K matrix of 1 by K unit vectors $\mathbf{W}_{V,i*}$ with K uniformly-distributed values so that for the i-th voxel:

$$\mathbf{W}_{V,i*} = n(\mathbf{u}_i), \qquad u_{ik} \sim \mathcal{U}(0, 1)$$

where $n(\cdot)$ is the function that normalizes a vector by its Euclidean norm, so that for any vector $\mathbf{x}$:

$$n(\mathbf{x}) = \mathbf{x} / \lVert \mathbf{x} \rVert$$

For the spatial orientation-tuned component, the weights $\mathbf{W}_O$ are distributed in the same way as for the velocity-tuned component for half of the basis functions (indicated by vector $\mathbf{a}$) and repeated for the other half, after which they are normalized to unit length:

$$\mathbf{W}_{O,i*} = n([\mathbf{a}_i, \mathbf{a}_i]), \qquad a_{ik} \sim \mathcal{U}(0, 1), \quad k = 1, \ldots, K/2$$

The covariance matrices $\Sigma_V$ and $\Sigma_O$ define the degree of (shared) noise in the voxels. The noise variance parameter $\tau^2$ was normally distributed across voxels with a variance of 0.035. We used different levels of its mean, varying from 0.3 to 1.1 in 0.2 steps. To obtain posterior distributions, we inverted the generative model of voxel responses (Eqs. S1-S4, and a uniform prior) using Bayes rule. We obtained posterior distributions given the combined responses ($p(s \mid \mathbf{b})$), velocity responses alone ($p(s \mid \mathbf{b}_V)$), and spatial orientation responses alone ($p(s \mid \mathbf{b}_O)$). The results are presented in Supplementary Figure 7 and Supplementary Figure 20.
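The tuning part of this generative model can be illustrated with a short sketch. The code below is not the authors' implementation: the function names are hypothetical, and the preferred directions of the K = 8 basis functions are assumed to be evenly spaced around the circle, which the text does not state. It shows how repeating the weights for the second half of the basis functions yields a tuning function with a 180-degree period, i.e., a bimodally-tuned (orientation-like) voxel.

```python
import math
import random

K = 8  # number of basis functions (value given in the text)
# Assumption: preferred directions are evenly spaced over the full circle.
PREFS = [2 * math.pi * k / K for k in range(K)]


def basis(s, phi):
    """Half-wave rectified cosine raised to the 5th power (cf. Eq. 8)."""
    return max(0.0, math.cos(s - phi)) ** 5


def unit(v):
    """Normalize a vector by its Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def velocity_weights(rng):
    """K uniform(0, 1) weights, normalized to unit length."""
    return unit([rng.random() for _ in range(K)])


def orientation_weights(rng):
    """Uniform weights for half of the basis functions, repeated for the
    other half, then normalized to unit length."""
    half = [rng.random() for _ in range(K // 2)]
    return unit(half + half)


def mean_response(weights, s):
    """Noise-free tuning function of one voxel for stimulus direction s."""
    return sum(w * basis(s, phi) for w, phi in zip(weights, PREFS))


rng = random.Random(1)
w_o = orientation_weights(rng)
s = math.radians(30)
# The orientation-tuned response is the same for s and s + 180 degrees.
print(mean_response(w_o, s), mean_response(w_o, s + math.pi))
```

Under the even-spacing assumption, basis function k + K/2 prefers the direction opposite to basis function k, so equal weights on the two make the voxel respond identically to a direction and its opposite. Velocity weights drawn without this constraint do not have that symmetry.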

Deriving the posterior distribution for the Bayesian observer model.
Here, we provide the derivation for the likelihood function of the Bayesian observer (Eq. 21 in the main text), reproduced here for convenience:

$$\ell(s) = \mathrm{VM}(s; m_V, \kappa_V)\,\bigl(\mathrm{VM}(s; m_O, \kappa_O) + \mathrm{VM}(s; m_O + \pi, \kappa_O)\bigr)$$

where $s$ refers to the motion direction of the stimulus, $m_V$ and $m_O$ are the velocity and orientation measurements, and $\kappa_V$ and $\kappa_O$ are the associated precision parameters. This equation shows the sum of two products, each of which involves two von Mises distributions. In other words, it is a mixture distribution. Recall that a product of two VM distributions, 1 and 2, is an unnormalized von Mises distribution:

$$\mathrm{VM}(s; \mu_1, \kappa_1)\,\mathrm{VM}(s; \mu_2, \kappa_2) = \frac{\exp(\kappa_c \cos(s - \mu_c))}{4\pi^2 I_0(\kappa_1) I_0(\kappa_2)} \tag{19}$$

The precision of a product of two VM distributions ($\kappa_c$) depends on their individual precisions ($\kappa_1$ and $\kappa_2$) and the distance between their locations ($\mu_1 - \mu_2$):

$$\kappa_c = \sqrt{\kappa_1^2 + \kappa_2^2 + 2\kappa_1\kappa_2\cos(\mu_1 - \mu_2)} \tag{20}$$

For the likelihood function described here, there are two such products in a mixture distribution. Thus, applying Supplementary Eq. 19, the likelihood (Eq. 21 in the main text) is described as a mixture of two von Mises distributions with locations $\mu_A$ and $\mu_B$ and precisions $\kappa_A$ and $\kappa_B$ computed per Supplementary Eq. 20 and 21. The first part (A) of the mixture is a product of the velocity likelihood and the first peak of the orientation likelihood:

$$\ell_A(s) = \mathrm{VM}(s; m_V, \kappa_V)\,\mathrm{VM}(s; m_O, \kappa_O) = \frac{\exp(\kappa_A \cos(s - \mu_A))}{4\pi^2 I_0(\kappa_V) I_0(\kappa_O)} \tag{23}$$

Supplementary Equation 30 summarizes the posterior as a mixture of two von Mises distributions with weights defined by their relative precision. Each of the two peaks is a product of the velocity and orientation likelihoods. The precision and the location of the two peaks in the mixture depend on the precision and the location of the velocity and orientation components.
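The product rule used in this derivation can be checked numerically. The sketch below is an illustration rather than code from the study, and the function name `vm_product_params` is hypothetical. It treats each von Mises component as a vector with length equal to its precision and angle equal to its location; adding the two vectors gives the precision and location of the product.

```python
import math


def vm_product_params(mu1, kappa1, mu2, kappa2):
    """Location and precision of the unnormalized von Mises distribution
    obtained by multiplying VM(s; mu1, kappa1) and VM(s; mu2, kappa2)."""
    # Add the two "precision vectors" kappa * (cos mu, sin mu).
    x = kappa1 * math.cos(mu1) + kappa2 * math.cos(mu2)
    y = kappa1 * math.sin(mu1) + kappa2 * math.sin(mu2)
    # Length: sqrt(k1^2 + k2^2 + 2 k1 k2 cos(mu1 - mu2)).
    kappa_c = math.hypot(x, y)
    # Angle of the summed vector; lies between mu1 and mu2.
    mu_c = math.atan2(y, x)
    return mu_c, kappa_c


# Illustrative measurements and precisions (arbitrary values, in radians).
m_v, kappa_v = 0.3, 1.0
m_o, kappa_o = 0.1, 2.0

# Component A: velocity likelihood times the first orientation peak.
mu_a, kappa_a = vm_product_params(m_v, kappa_v, m_o, kappa_o)
# Component B: velocity likelihood times the opposite orientation peak.
mu_b, kappa_b = vm_product_params(m_v, kappa_v, m_o + math.pi, kappa_o)

# The exponents agree for any stimulus direction s.
s = 1.234
lhs = kappa_v * math.cos(s - m_v) + kappa_o * math.cos(s - m_o)
rhs = kappa_a * math.cos(s - mu_a)
print(abs(lhs - rhs) < 1e-12, kappa_a > kappa_b)
```

When the velocity measurement lies close to one of the two orientation peaks, the component built from that peak has the higher precision, which is why it carries more weight in the mixture.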
distribution centered on 0 and with correlation coefficient $r$. We varied the amount of correlation between the two noise variables in our simulations, $r$ = {0.0, 0.3, 0.5, 0.7, 1.0}. The variance parameters $\sigma_V'^2$ and $\sigma_O'^2$ of the bivariate normal were obtained by converting two precision parameters, $\kappa_V'$ and $\kappa_O'$, which were themselves drawn randomly from a log-normal distribution with parameters 0.9 and 1.1 for velocity, and 1.4 and 0.7 for orientation. These latter parameter values are the same as for the simulations reported in the main text. The posterior distribution given voxel activity was subsequently obtained by inverting this generative model, where the likelihood was computed numerically (similar to above). The peak locations were determined in the same way as for Supplementary Figure 1.
… 0.9 and 1.1 for velocity, and 1.4 and 0.7 for orientation. These parameter values are the same as for the simulations reported in the main text. The fMRI likelihood was obtained numerically by computing the joint probability of observing the fMRI measurements $m_V + m_V'$ and $m_O + m_O'$ for each stimulus s on a 720-step (0.5°) grid. The decoded posterior distribution was calculated using Bayes rule and a flat prior. The peak locations of the simulated decoded posterior were estimated in the same way as those that were decoded from fMRI data (see Methods, Analyses of the shape of the decoded distribution for details). That is, a two-component mixture model was fitted to estimate the peak locations. The correlation of each peak location with the behavioral response was then estimated for trials where one of the peaks was closer to the true (-90° to 90°) and the other was closer to the opposite (90° to 270°) direction of motion.
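The grid-based step can be sketched as follows. This is a simplified illustration under stated assumptions, not the simulation code: it evaluates the Bayesian observer likelihood directly for a single pair of measurements (leaving out the additional fMRI noise), and it locates peaks as local maxima on the grid rather than by fitting the two-component mixture model used in the paper. The function names are hypothetical.

```python
import math


def decoded_posterior(m_v, m_o, kappa_v, kappa_o, n_steps=720):
    """Posterior over motion direction on an n_steps grid (flat prior).

    m_v, m_o: velocity and orientation measurements (radians).
    kappa_v, kappa_o: the associated precision parameters.
    """
    step = 2 * math.pi / n_steps  # 0.5 degrees for 720 steps
    grid = [i * step for i in range(n_steps)]
    unnorm = []
    for s in grid:
        # Velocity likelihood: a single von Mises bump at m_v.
        lik_v = math.exp(kappa_v * math.cos(s - m_v))
        # Orientation likelihood: two bumps, at m_o and at m_o + pi.
        lik_o = (math.exp(kappa_o * math.cos(s - m_o))
                 + math.exp(kappa_o * math.cos(s - m_o - math.pi)))
        unnorm.append(lik_v * lik_o)
    total = sum(unnorm)
    # With a flat prior, the normalized likelihood is the posterior.
    return grid, [u / total for u in unnorm]


def local_peaks(posterior):
    """Indices of strict local maxima on the circular grid."""
    n = len(posterior)
    return [i for i in range(n)
            if posterior[i] > posterior[i - 1]
            and posterior[i] > posterior[(i + 1) % n]]


# Noisy velocity signal, precise orientation signal: a bimodal posterior.
grid, post = decoded_posterior(m_v=0.0, m_o=0.0, kappa_v=0.5, kappa_o=3.0)
peaks_deg = [round(math.degrees(grid[i]), 1) for i in local_peaks(post)]
print(peaks_deg)
```

With a low velocity precision and a high orientation precision, the posterior has a larger peak at the measured direction and a smaller one at the opposite direction. Raising `kappa_v` relative to `kappa_o` removes the second peak.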

The peak closer to 180° (corresponding to the smaller peak in the averaged posterior distribution in b) is reliably correlated with errors only in V1 and V3 (BF = 239.19 and BF = 51.94, respectively). Circles and bars in c and d show the estimated regression coefficient and 95% highest posterior density (HPD) credible intervals, respectively.

Supplementary Figure 7. Simulated results for the cortical representation of motion direction. a Decoded posterior distributions, averaged across trials. When decoding from a mixture of velocity- and orientation-tuned signals ('Combined'), the resulting posterior distribution is bimodal in shape, with a larger peak at the presented and a smaller peak at the opposite motion direction. In contrast, when information is extracted from velocity signals only ('Velocity-only'), the posterior distribution is unimodal. Using spatial orientation signals alone ('Orientation-only') produces a bimodal posterior with two equal-sized peaks. b

Supplementary Figure 9. Behavioral performance in the main fMRI study and the follow-up behavioral experiment. a Joint probability distribution of the presented motion direction and the observer's judgment, aggregated across observers. b Estimates of motion direction for an example observer (each dot represents a single trial).

[Figure panels: r = 0.0, 0.3, 0.5, 0.7, 1.0; rows: MAP readout, velocity-only readout; y-axis: behavioral error (deg.); legend: 1st peak (around 0°), 2nd peak (around 180°)]

Supplementary Figure 11. The variance parameters of the neural noise distributions are correlated. The degree of uncertainty in the observer's motion and orientation estimates can covary due to, for example, fluctuations in attention. Here, we ask whether such covariation in uncertainty magnitude can explain the observed correlation between the peak locations and behavioral errors. Different amounts of correlation r between the motion and orientation uncertainty values are shown. Please see Supplementary Methods, section 3, for further detail on the simulations. For the Bayesian observer model that uses both velocity and spatial orientation signals to infer motion direction ('MAP readout'), the location of either peak in the decoded posterior is positively correlated with the direction and magnitude of the behavioral errors, regardless of correlation strength. In contrast, for an observer who has a bimodal probabilistic representation of motion direction but uses only velocity (and not spatial orientation) signals ('velocity-only readout'), the correlation between the second peak location and behavioral errors is negative. The simulation results show that the velocity-only readout scenario is inconsistent with the empirical data. The circles show the mean simulated error, and the lines show the fit of the linear regression model with shaded regions showing 95% confidence intervals.

[Axis label: orientation noise (SD)]

Supplementary Figure 12. Simulated predictions for behavioral response distributions of alternative observer models. Shown are the predicted behavioral response distributions for different amounts of sensory noise in the orientation and velocity measurements. The 'Bayesian MAP' model (blue, discussed in the paper) is shown for comparison. The 'constant response bias' model (green) assumes that behavioral responses are based on the velocity-only estimate alone with an arbitrary constant shift (here, +15°).

Supplementary Figure 13. Posterior distribution of motion direction decoded from simulated responses of bimodally-tuned direction-selective neural populations.

Supplementary Figure 16. Removal of direction-dependent behavioral biases in two example observers. a The behavioral responses (dots) of two example participants A and B.

Supplementary Figure 17. hMT+ localizer maps for an example participant (LH: left hemisphere; RH: right hemisphere). hMT+ location (bright green) was determined based on a comparison of responses to coherent and random motion stimuli ('coherence') and a comparison between stimuli composed of moving or static dots ('motion'). See Methods for further detail. Colors indicate t-values for a given localizer contrast (threshold set here at 2.0 for consistency between subplots; hMT+ location was delineated with a p < .05 threshold, FDR corrected). Other boundaries shown are those of V1 (yellow), V2 (light green), V3 (blue) and hV4 (pink).
[Figure panel d. Axis label: Effect of the peak location on error. Legend: 1st peak (around 0°), 2nd peak (around 180°).]

Supplementary Figure 19. Control analyses using different numbers of voxels in the analyses. Voxels were selected based on their response to the visual localizer stimulus, with the most active voxels selected first. a Decoding accuracy improved with increasing numbers of voxels, and remained relatively stable with 1000+ voxels. Circles show the mean correlation and bars show 95% confidence intervals. b

Supplementary Figure 20. The shape of the posterior distribution decoded from simulated BOLD signals depends on uncertainty in the underlying orientation and velocity signals.
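See van Bergen et al.⁶ and van Bergen and Jehee⁷ for additional derivations of, and rationale for, this particular covariance structure.

Following Supplementary Eqs. 1-4, the total voxel response $\mathbf{b}$ is a multivariate normally distributed variable:

$\mathbf{b} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ (16)

with mean response $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$ defined as: [definitions not recoverable from the source]

We simulated voxel responses for different levels of velocity and orientation noise, with both noise parameters varying from 0.1 to 1.3 in steps of 0.2. In each simulation, we generated the BOLD response for 500 voxels and 100 trials with randomly chosen stimuli for 10 observers. Voxel tuning preferences were randomly and independently drawn for each voxel according to Eqs. S6-S13. The noise variance parameter [...]

The first part (A) of the mixture is a product of the velocity likelihood and the first peak from the orientation likelihood:

$p_A(s) = \mathrm{VM}(s; m_V, \kappa_V)\,\mathrm{VM}(s; m_O, \kappa_O) = \dfrac{\exp(\kappa_A \cos(s - \mu_A))}{4\pi^2 I_0(\kappa_V) I_0(\kappa_O)}$ (23)

The second part (B) of the mixture is a product of the velocity likelihood and the second peak from the orientation likelihood:

$p_B(s) = \mathrm{VM}(s; m_V, \kappa_V)\,\mathrm{VM}(s; m_O + \pi, \kappa_O) = \dfrac{\exp(\kappa_B \cos(s - \mu_B))}{4\pi^2 I_0(\kappa_V) I_0(\kappa_O)}$ (26)

Using trigonometric identities, it can be demonstrated that, for a product of two von Mises distributions:

$\mu = \mu_1 + \operatorname{atan2}\left(\sin(\mu_2 - \mu_1),\ \kappa_1/\kappa_2 + \cos(\mu_2 - \mu_1)\right)$

where $\mu_1$ and $\mu_2$ are the peak locations of the two distributions and $\kappa_1$ and $\kappa_2$ refer to their precision. The location $\mu$ of the resulting distribution is between the locations of the original distributions $\mu_1$ and $\mu_2$, with the relative position dependent on the ratio of the two precisions ($\kappa_1/\kappa_2$).

Given that the denominator in Eqs. 23 and 26 does not depend on the stimulus, the posterior distribution is proportional to a sum of two exponents:

$p(s \mid m_V, m_O) \propto \exp(\kappa_A \cos(s - \mu_A)) + \exp(\kappa_B \cos(s - \mu_B))$ (29)

After including the normalization constant, it can be shown that:

$p(s \mid m_V, m_O) = w_A\,\mathrm{VM}(s; \mu_A, \kappa_A) + w_B\,\mathrm{VM}(s; \mu_B, \kappa_B)$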
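
The mixture form of the posterior can be checked numerically. The following is a minimal sketch, not the authors' simulation code: it assumes that the orientation likelihood weights its two peaks (180° apart) equally, and that the mixture weights are proportional to the Bessel terms $I_0(\kappa_A)$ and $I_0(\kappa_B)$, which follows from normalizing a sum of two exponents. All function and parameter names (`vm_product`, `posterior_mixture`, `m_v`, `kappa_v`, `m_o`, `kappa_o`) are hypothetical and chosen for illustration only.

```python
import math


def bessel_i0(x, terms=60):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def vm_pdf(s, mu, kappa):
    """Von Mises density on the circle (angles in radians)."""
    return math.exp(kappa * math.cos(s - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def vm_product(mu1, kappa1, mu2, kappa2):
    """Location and precision of the product of two von Mises densities.

    The returned location is not wrapped to (-pi, pi]; this is harmless
    because it is only ever used inside a cosine.
    """
    delta = mu2 - mu1
    mu = mu1 + math.atan2(kappa2 * math.sin(delta), kappa1 + kappa2 * math.cos(delta))
    kappa = math.sqrt(kappa1 ** 2 + kappa2 ** 2 + 2.0 * kappa1 * kappa2 * math.cos(delta))
    return mu, kappa


def posterior_mixture(s, m_v, kappa_v, m_o, kappa_o):
    """Closed-form posterior as a mixture of two von Mises components (A and B)."""
    mu_a, kappa_a = vm_product(m_v, kappa_v, m_o, kappa_o)
    mu_b, kappa_b = vm_product(m_v, kappa_v, m_o + math.pi, kappa_o)
    # Assumption: weights proportional to I0(kappa) of each component.
    i0_a, i0_b = bessel_i0(kappa_a), bessel_i0(kappa_b)
    w_a = i0_a / (i0_a + i0_b)
    return w_a * vm_pdf(s, mu_a, kappa_a) + (1.0 - w_a) * vm_pdf(s, mu_b, kappa_b)


def posterior_direct(s, m_v, kappa_v, m_o, kappa_o, n_grid=2000):
    """Brute-force posterior: multiply the likelihoods, normalize on a grid."""
    def unnorm(x):
        # Assumption: equal-weight bimodal orientation likelihood.
        orient = 0.5 * (vm_pdf(x, m_o, kappa_o) + vm_pdf(x, m_o + math.pi, kappa_o))
        return vm_pdf(x, m_v, kappa_v) * orient

    step = 2.0 * math.pi / n_grid
    norm = sum(unnorm(-math.pi + i * step) for i in range(n_grid)) * step
    return unnorm(s) / norm
```

Under these assumptions, `posterior_mixture` and `posterior_direct` should agree at any direction `s` up to numerical precision, which gives a quick consistency check on the closed-form expressions.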