A Bayesian observer model constrained by efficient coding can explain 'anti-Bayesian' percepts

Journal name:
Nature Neuroscience
Year published:
Published online


Bayesian observer models provide a principled account of the fact that our perception of the world rarely matches physical reality. The standard explanation is that our percepts are biased toward our prior beliefs. However, reported psychophysical data suggest that this view may be simplistic. We propose a new model formulation based on efficient coding that is fully specified for any given natural stimulus distribution. The model makes two new and seemingly anti-Bayesian predictions. First, it predicts that perception is often biased away from an observer's prior beliefs. Second, it predicts that stimulus uncertainty differentially affects perceptual bias depending on whether the uncertainty is induced by internal or external noise. We found that both model predictions match reported perceptual biases in perceived visual orientation and spatial frequency, and were able to explain data that have not been explained before. The model is general and should prove applicable to other perceptual variables and tasks.

At a glance


  1. Figure 1: Bayesian observer model constrained by efficient coding.

    (a) We model perception as an encoding-decoding process. We assume encoding is governed by efficient coding and is characterized by the corresponding conditional probability distribution p(m|θ) of the sensory measurement m given a stimulus value θ. We also assume that decoding is Bayesian based on an accurate generative model of the sensory process. The percept is then specified based on the posterior distribution p(θ|m) and the loss function. (b) Our assumptions imply that the Bayesian observer is constrained by the natural stimulus distribution: the prior belief is assumed to directly match the stimulus distribution (for example, through learning), whereas the likelihood function is constrained by the stimulus distribution via efficient coding. (c) Example for an arbitrary stimulus distribution. An efficient coding principle that maximizes mutual information implies that the encoding accuracy (measured as the square root of the Fisher information J(θ)) matches the stimulus distribution. With some assumptions about the sensory noise characteristics, the likelihood function is fully constrained by the Fisher information. Likelihood functions for different sensory measurements are shown to illustrate their heterogeneity across the stimulus space. Technically, the likelihood functions can be computed by assuming a symmetric noise structure (that is, symmetric likelihood functions) in a space in which the Fisher information is uniform (sensory space, characterized by the mapping F(θ)) and then transforming those symmetric likelihood functions back to the stimulus space.
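
The construction in panel c can be sketched numerically. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical Gaussian-shaped prior on a bounded grid, takes the mapping F(θ) to be the cumulative of the prior (so that the square root of the Fisher information is proportional to the prior), and assumes Gaussian noise with standard deviation `sigma` in sensory space. The function name and all parameter values are illustrative.

```python
import numpy as np

def efficient_likelihood(theta, prior, m, sigma):
    """Likelihood over theta for a measurement m expressed in stimulus units."""
    d = theta[1] - theta[0]
    prior = prior / (prior.sum() * d)                # normalize the prior to a density
    F = np.cumsum(prior) * d                         # assumed mapping F(theta): cumulative of the prior
    F_m = np.interp(m, theta, F)                     # measurement mapped into sensory space
    like = np.exp(-0.5 * ((F - F_m) / sigma) ** 2)   # symmetric Gaussian in sensory space
    return like / (like.sum() * d)                   # normalized only to make moments comparable

theta = np.linspace(-4.0, 4.0, 2001)
prior = np.exp(-0.5 * theta ** 2)                    # hypothetical prior with its peak at 0
like = efficient_likelihood(theta, prior, m=1.0, sigma=0.03)

d = theta[1] - theta[0]
like_peak = theta[np.argmax(like)]                   # location of the likelihood maximum
like_mean = (theta * like).sum() * d                 # mean of the normalized likelihood
print(f"peak = {like_peak:.3f}, mean = {like_mean:.3f}")
```

Under these assumptions the likelihood peaks at the measurement, but its mean lies farther from the prior peak, which is the asymmetry described in the caption.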

  2. Figure 2: Prediction 1: Bayesian perception can be biased away from the prior peak.

    (a) A standard Bayesian observer model that a priori assumes a symmetric likelihood function typically predicts perceptual biases toward the peak of the prior. This bias towards the prior has been considered to be a fundamental characteristic of a Bayesian model. (b) In our new Bayesian observer model, efficient encoding promotes a nonlinear mapping between stimulus and sensory representation. Assuming that the sensory representation is affected by internal noise, the resulting likelihood function is asymmetric for any non-uniform prior distribution, with a long tail pointing away from the prior peak. As a result, the estimate can be biased away from the prior peak. Here, this is illustrated assuming a squared-error loss function. As a result of its asymmetry, the mean of the likelihood function is away from the peak of the prior relative to the true stimulus value θ0 (likelihood repulsion). Although the prior still leads to an attractive shift of the posterior (prior attraction), the net bias can be repulsive. Note that the degree of asymmetry of the likelihood function, and thus the magnitude of the repulsive bias, depends directly on the steepness of the prior. Both examples are illustrated for the case of the median likelihood function (that is, the measurement m equals the stimulus value θ0). The repulsive effect is further amplified because the distribution of the measurement also follows the same asymmetry.
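
The contrast between panels a and b can be reproduced in a few lines. The sketch below is an assumption-laden illustration: the prior is a hypothetical Gaussian shape with its peak at 0, F(θ) is taken as the cumulative of that prior, sensory noise is Gaussian, the loss is squared error (posterior mean), and only the single measurement m = θ0 is considered, as in the figure. The symmetric comparison likelihood is given the same local width in stimulus space.

```python
import numpy as np

theta = np.linspace(-4.0, 4.0, 4001)
d = theta[1] - theta[0]
prior = np.exp(-0.5 * theta ** 2)                 # hypothetical prior, peak at 0
prior /= prior.sum() * d
F = np.cumsum(prior) * d                          # assumed efficient mapping F(theta)

def posterior_mean(like):
    """Squared-error (L2) estimate: mean of the posterior on the grid."""
    post = prior * like
    return (theta * post).sum() / post.sum()

theta0 = 1.0                                      # true stimulus; measurement m = theta0
sigma = 0.03                                      # sensory noise s.d. (illustrative value)

# Efficient-coding observer: likelihood symmetric in sensory space
F0 = np.interp(theta0, theta, F)
like_eff = np.exp(-0.5 * ((F - F0) / sigma) ** 2)

# Standard observer: likelihood symmetric in stimulus space, matched local width
width = sigma / np.interp(theta0, theta, prior)
like_sym = np.exp(-0.5 * ((theta - theta0) / width) ** 2)

bias_eff = posterior_mean(like_eff) - theta0      # expected to point away from the prior peak
bias_sym = posterior_mean(like_sym) - theta0      # expected to point toward the prior peak
print(f"efficient-coding bias = {bias_eff:+.4f}, symmetric-likelihood bias = {bias_sym:+.4f}")
```

With the prior peak at 0 and θ0 > 0, a positive bias is repulsive and a negative bias is attractive.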

  3. Figure 3: Prediction 2: stimulus (external) and sensory (internal) noise differentially affect perceptual bias.

    (a) Stimulus noise directly affects stimulus uncertainty and thus the likelihood function (formulated in stimulus space). The uncertainty introduced by sensory noise, however, is transformed back through the inverse of the mapping function F (equation (8), Online Methods) between sensory and stimulus space, which is the very reason the likelihood function is asymmetric in the first place. (b) Increasing the (symmetric) noise at the level of the sensory representation leads to a more asymmetric likelihood function (formulated in the stimulus space) and thus increases likelihood repulsion (dashed lines). As a result, the increase in prior attraction resulting from the increase in likelihood width is smaller than the increase in likelihood repulsion, leading to an overall net increase in repulsive bias. (c) In contrast, adding (symmetric) stimulus noise does not affect the asymmetry of the likelihood function (dashed lines) because the added noise essentially convolves the likelihood function with the noise kernel. The likelihood repulsion remains the same while the prior attraction grows because the overall width of the likelihood increases. As a result, the perceptual bias becomes more attractive. (d) Summary plot illustrating how perceptual biases depend on stimulus and sensory noise. We assumed additive Gaussian noise and a squared-error loss function. Dots correspond to the conditions shown in b. In general, the perceptual bias is repulsive and grows with increasing sensory noise. However, increasing stimulus noise reduces the repulsive bias, eventually leading to attractive biases for large noise levels. Note that this differential dependency on the different noise sources is a direct consequence of the inhomogeneous sensory representation imposed by efficient coding. For comparison, the black curve illustrates the expected biases for a Bayesian observer model that a priori assumes a symmetric likelihood function.
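
The qualitative pattern in panel d can be explored with a small grid-based simulation. This is a sketch under stated assumptions rather than the model code: a hypothetical Gaussian-shaped prior, F(θ) taken as the cumulative of the prior, additive Gaussian stimulus noise in stimulus space, additive Gaussian sensory noise in sensory space, and a squared-error loss. The function `expected_bias` and the noise values are illustrative choices.

```python
import numpy as np

theta = np.linspace(-4.0, 4.0, 401)       # stimulus grid
x = np.linspace(-0.3, 1.3, 801)           # grid of sensory-space measurements
prior = np.exp(-0.5 * theta ** 2)         # hypothetical prior, peak at 0
prior /= prior.sum()
F = np.cumsum(prior)                      # assumed efficient mapping F(theta)

def expected_bias(theta0, sigma_int, sigma_ext):
    """Expected posterior-mean bias at theta0, averaged over measurements."""
    # G[k, j] = p(m_k | theta'_j): Gaussian sensory noise around F(theta'_j)
    G = np.exp(-0.5 * ((x[:, None] - F[None, :]) / sigma_int) ** 2)
    if sigma_ext > 0:
        # K[i, j] = p(theta'_j | theta_i): Gaussian stimulus noise around theta_i
        K = np.exp(-0.5 * ((theta[None, :] - theta[:, None]) / sigma_ext) ** 2)
        K /= K.sum(axis=1, keepdims=True)
        L = G @ K.T                       # L[k, i] = p(m_k | theta_i), marginalized over theta'
    else:
        L = G
    post = L * prior[None, :]             # unnormalized posterior for every measurement
    est = (post @ theta) / post.sum(axis=1)   # posterior mean for every measurement
    i0 = np.argmin(np.abs(theta - theta0))
    p_m = L[:, i0] / L[:, i0].sum()       # measurement distribution given theta0
    return float(p_m @ est) - theta[i0]

b_int_low = expected_bias(1.0, sigma_int=0.02, sigma_ext=0.0)
b_int_high = expected_bias(1.0, sigma_int=0.04, sigma_ext=0.0)
b_ext_high = expected_bias(1.0, sigma_int=0.02, sigma_ext=0.3)
print(b_int_low, b_int_high, b_ext_high)
```

With the prior peak at 0 and θ0 = 1, positive values are repulsive. Very small values of `sigma_int` relative to the grid spacing would need a finer grid, so the values above should be treated as a starting point only.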

  4. Figure 4: Biases in perceived orientation.

    (a) Measured distribution of local visual orientation in natural images (gray line, replotted from ref. 16), superimposed with the parametric description we used for the model predictions (black line: p(θ) = c0(2 − |sin θ|) where c0 is a normalization constant). (b) Predicted mean biases as a function of stimulus orientation θ and different levels of sensory noise; biases are generally repulsive, that is, away from the nearest cardinal orientation, with larger biases for larger noise magnitudes. (c) Predictions are presented as in b but for different levels of stimulus noise; here the repulsive biases are smaller for larger noise magnitudes, eventually becoming attractive for very large levels. Curves in b and c represent the expected bias values over the full measurement distributions. (d) Measured biases at 15 degrees oblique orientation (refs. 15,32), averaged over all four orientations indicated by dashed lines in a. The biases match the predicted behavior shown in Figure 3d well. (e) Measured biases as a function of sensory noise (±1 s.e.m.). Sensory noise was modulated by different stimulus presentation times (low to high: 1,000 ms, 160 ms, 80 ms, 40 ms). Data from ref. 32 were reanalyzed. (f) Measured biases for two levels of additive Gaussian stimulus noise (N = 5 subjects, mean ± 2 s.e.m.). Arrows indicate the mean bias over all orientations in each of two corresponding quadrants (for example, top dark blue arrow: mean bias for high stimulus noise computed over the range (0,45) ∪ (90,135) degrees). The overall biases were clearly repulsive and were reduced for larger stimulus noise. Data are replotted from ref. 15.

  5. Figure 5: Relative biases in perceived orientation.

    Relative bias is the difference in perceived orientation between a high-noise and a low-noise stimulus (reference). (a) Two orientation stimuli with different levels of stimulus noise. Each stimulus consists of an array of Gabor elements, and the width of the distribution from which the orientations of the elements were sampled controls the noise level. (b) Measured relative biases as a function of stimulus orientation using the stimuli shown in a; data are replotted from ref. 15 (blue; N = 5 subjects, mean ± 2 s.e.m.) and ref. 16 (green; N = 5 subjects, mean and 95% quantile). The relative bias is attractive because the repulsive bias is smaller for the high-noise stimulus (see Fig. 4f). (c) Two orientation stimuli associated with different levels of sensory noise. Sensory noise can be modulated by stimulus contrast (this example) or presentation time (ref. 32), with lower contrast or shorter presentation time corresponding to higher sensory noise. (d) Measured relative bias (±1 s.e.m.) between the percepts of two stimuli with different sensory noise as a function of stimulus orientation (ref. 32). Relative bias is repulsive because the repulsive bias is larger for larger sensory noise. Unlike previous models (refs. 15,16), the new model accounts for both relative bias patterns.

  6. Figure 6: Biases in perceived spatial frequency.

    (a) The amplitude spectrum of spatial frequency in natural images approximately follows a power-law function of the form p(ξ) ∝ 1/ξ^α, with reported values for α around 1 (refs. 33,34). We assumed the spectrum to be a good proxy for the prior distribution over spatial frequency (we set α = 1). (b) The predicted biases as a function of spatial frequency for different levels of sensory (internal) noise. (c) Predicted biases for different levels of stimulus (external) noise. (d) Biases in perceived spatial frequency measured for different levels of sensory noise (N = 3; mean). Data are replotted from ref. 36. The experiments used different levels of stimulus contrast (1, 2, 4, 8, 16 and 32%) to modulate sensory noise. Stimuli consisted of a Gabor patch with different spatial frequency. The predicted biases for stimulus noise in c have not been validated yet. Note that, at very low and very high spatial frequencies, the amplitude spectrum is no longer well described by a single power-law function (ref. 33). As a result, our predictions here are limited to the intermediate frequency range.
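
For the choice α = 1, the mapping into sensory space can be worked out in closed form. The short derivation below is a sketch that assumes the prior is restricted to a finite range with hypothetical bounds ξ_min and ξ_max (needed for normalization and not specified here), that F is the cumulative of the prior, and that sensory noise is Gaussian with standard deviation σ in sensory space.

```latex
p(\xi) = \frac{1}{\xi \,\ln(\xi_{\max}/\xi_{\min})}, \qquad \xi \in [\xi_{\min}, \xi_{\max}]

F(\xi) = \int_{\xi_{\min}}^{\xi} p(\xi')\,\mathrm{d}\xi'
       = \frac{\ln(\xi/\xi_{\min})}{\ln(\xi_{\max}/\xi_{\min})}

p(m \mid \xi) \propto \exp\!\left(-\frac{\bigl(F(\xi) - F(m)\bigr)^{2}}{2\sigma^{2}}\right)
             = \exp\!\left(-\frac{(\ln \xi - \ln m)^{2}}{2\sigma^{2}\,\ln^{2}(\xi_{\max}/\xi_{\min})}\right)
```

Under these assumptions the sensory space is logarithmic in spatial frequency, and the likelihood is symmetric in ln ξ. In linear frequency it therefore has a long tail toward high frequencies, away from the peak of the prior at low frequencies.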

  7. Figure 7: Predicted biases for different loss functions.

    (a) Predicted biases in perceived orientation for the observer model with L0 norm (posterior mode, black), L1 norm (posterior median, blue) or L2 norm (posterior mean, green) loss function. Both the L1 and L2 norm predict repulsive biases, whereas the L0 norm always leads to attractive biases. Sensory noise is fixed and identical for all three models. (b) Adding stimulus noise reduces the likelihood asymmetry and thus increases the attractive influence of the prior. The influence of the likelihood asymmetry is weaker with the L1 loss than the L2 loss, explaining the transition to an attractive bias curve. (c,d) A similar pattern is predicted for the perceptual biases in spatial frequency.
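
The three loss functions correspond to three point estimates of the same posterior, which can be compared directly on a grid. The sketch below is illustrative only: it assumes a hypothetical Gaussian-shaped prior with its peak at 0 (not the orientation prior of the figure), F(θ) as the cumulative of the prior, Gaussian sensory noise, and no stimulus noise. Biases are averaged over the measurement distribution for a stimulus at θ0 = 1, so positive values are repulsive.

```python
import numpy as np

theta = np.linspace(-4.0, 4.0, 1601)      # stimulus grid
x = np.linspace(-0.2, 1.2, 701)           # grid of sensory-space measurements
prior = np.exp(-0.5 * theta ** 2)         # hypothetical prior, peak at 0
prior /= prior.sum()
F = np.cumsum(prior)                      # assumed efficient mapping F(theta)

def expected_biases(theta0, sigma):
    """Expected bias of mode, median and mean estimates at theta0."""
    like = np.exp(-0.5 * ((x[:, None] - F[None, :]) / sigma) ** 2)
    post = like * prior[None, :]
    post /= post.sum(axis=1, keepdims=True)            # one posterior per measurement
    estimates = {
        "L0_mode": theta[np.argmax(post, axis=1)],     # posterior mode
        "L1_median": theta[np.argmax(np.cumsum(post, axis=1) >= 0.5, axis=1)],  # posterior median
        "L2_mean": post @ theta,                       # posterior mean
    }
    i0 = np.argmin(np.abs(theta - theta0))
    p_m = like[:, i0] / like[:, i0].sum()              # measurement distribution given theta0
    return {name: float(p_m @ est) - theta[i0] for name, est in estimates.items()}

biases = expected_biases(theta0=1.0, sigma=0.04)
for name, value in biases.items():
    print(f"{name}: {value:+.4f}")
```

The mode and median are read off the grid, so their values are quantized to the grid spacing; a finer grid reduces that error at the cost of memory.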

  8. Figure 8: Equivalent efficient neural representations for the same stimulus distribution.

    (a) A circular stimulus variable θ with stimulus distribution p(θ). Three different neural populations that efficiently encode θ according to equation (1) are shown. (b) The tuning curves of the first population were constrained to be wide and identical. Efficient coding promotes a distribution of the neurons such that the flanks of the individual tuning curves most overlap at the peaks of the prior distribution. The neural density (green dots) is such that it is lowest at the prior peaks. (c) The tuning curves and the neural density of a second population were allowed to vary (refs. 24,40). As a result, the density followed the prior distribution (ref. 49; blue dots) and tuning curves were narrowest at the prior peaks. (d) The tuning curve shapes as well as the neural density (gray dots) of the third population were constrained to be identical/homogeneous. Only the gain was allowed to vary. As a result, neurons at the peak of the prior had highest gain. (e–g) Population likelihoods for all three populations (averaged over 400 presentations of the same stimulus value θ0, assuming independent Poisson spike count variability) were similar (up to a scale factor) and showed the predicted asymmetry with the heavy tail pointing away from the nearest peak of the prior. (h–j) As a result, Bayesian decoding with prior p(θ) of all three populations resulted in similar repulsive biases. Biases were computed over 10,000 samples of the neural population response.


  1. Attneave, F. Some informational aspects of visual perception. Psychol. Rev. 61, 183 (1954).
  2. Barlow, H.B. Possible principles underlying the transformation of sensory messages. in Sensory Communication (ed. Rosenblith, W.A.) 217–234 (MIT Press, 1961).
  3. Olshausen, B.A. & Field, D.J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
  4. Dan, Y., Atick, J.J. & Reid, R.C. Efficient coding of natural scenes in the lateral geniculate nucleus: Experimental test of a computational theory. J. Neurosci. 16, 3351–3362 (1996).
  5. Lewicki, M.S. Efficient coding of natural sounds. Nat. Neurosci. 5, 356–363 (2002).
  6. Helmholtz, H. Treatise on Physiological Optics (transl.) (Thoemmes Press, Bristol, UK, 2000).
  7. Curry, R.E. A Bayesian model for visual space perception. in Seventh Annual Conference on Manual Control NASA SP-281, 187ff (NASA, 1972).
  8. Knill, D.C. & Richards, W. Perception as Bayesian Inference (Cambridge University Press, 1996).
  9. Körding, K.P. & Wolpert, D. Bayesian integration in sensorimotor learning. Nature 427, 244–247 (2004).
  10. Stocker, A.A. & Simoncelli, E.P. Noise characteristics and prior expectations in human visual speed perception. Nat. Neurosci. 9, 578–585 (2006).
  11. van den Berg, R., Vogel, M., Josic, K. & Ma, W.J. Optimal inference of sameness. Proc. Natl. Acad. Sci. USA 109, 3178–3183 (2012).
  12. Jazayeri, M. & Shadlen, M.N. Temporal context calibrates interval timing. Nat. Neurosci. 13, 1020–1026 (2010).
  13. Jones, M. & Love, B.C. Bayesian fundamentalism or enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition. Behav. Brain Sci. 34, 169–188 (2011).
  14. Bowers, J.S. & Davis, C.J. Bayesian just-so stories in psychology and neuroscience. Psychol. Bull. 138, 389 (2012).
  15. Tomassini, A., Morgan, M.J. & Solomon, J.A. Orientation uncertainty reduces perceived obliquity. Vision Res. 50, 541–547 (2010).
  16. Girshick, A.R., Landy, M.S. & Simoncelli, E.P. Cardinal rules: visual orientation perception reflects knowledge of environmental statistics. Nat. Neurosci. 14, 926–932 (2011).
  17. Geisler, W.S., Najemnik, J. & Ing, A.D. Optimal stimulus encoders for natural tasks. J. Vis. 9, 17.1–17.16 (2009).
  18. Burge, J. & Geisler, W.S. Optimal defocus estimation in individual natural images. Proc. Natl. Acad. Sci. USA 108, 16849–16854 (2011).
  19. Brayanov, J.B. & Smith, M.A. Bayesian and “Anti-Bayesian” biases in sensory integration for action and perception in the size-weight illusion. J. Neurophysiol. 103, 1518–1531 (2010).
  20. Wei, X.-X. & Stocker, A.A. Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference. Adv. Neural Inf. Process. Syst. 25, 1313–1321 (2012).
  21. Linsker, R. Self-organization in a perceptual network. Computer 21, 105–117 (1988).
  22. Brunel, N. & Nadal, J.-P. Mutual information, Fisher information, and population coding. Neural Comput. 10, 1731–1757 (1998).
  23. McDonnell, M.D. & Stocks, N.G. Maximally informative stimuli and tuning curves for sigmoidal rate-coding neurons and populations. Phys. Rev. Lett. 101, 058103 (2008).
  24. Ganguli, D. & Simoncelli, E.P. Implicit encoding of prior probabilities in optimal neural populations. Adv. Neural Inf. Process. Syst. 23, 658–666 (2010).
  25. Fechner, G.T. Elemente der Psychophysik (Breitkopf und Haertel, Leipzig, 1860).
  26. Stocker, A.A. & Simoncelli, E.P. Sensory adaptation within a Bayesian framework for perception. Adv. Neural Inf. Process. Syst. 18, 1289 (2006).
  27. Webb, B.S., Ledgeway, T. & McGraw, P.V. Relating spatial and temporal orientation pooling to population decoding solutions in human vision. Vision Res. 50, 2274–2283 (2010).
  28. Putzeys, T., Bethge, M., Wichmann, F., Wagemans, J. & Goris, R. A new perceptual bias reveals suboptimal population decoding of sensory responses. PLoS Comput. Biol. 8, e1002453 (2012).
  29. Switkes, E., Mayer, M.J. & Sloan, J.A. Spatial frequency analysis of the visual environment: anisotropy and the carpentered environment hypothesis. Vision Res. 18, 1393–1399 (1978).
  30. Coppola, D.M., Purves, H.R., McCoy, A.N. & Purves, D. The distribution of oriented contours in the real world. Proc. Natl. Acad. Sci. USA 95, 4002–4006 (1998).
  31. Jastrow, J. Studies from the University of Wisconsin: on the judgment of angles and positions of lines. Am. J. Psychol. 5, 214–248 (1892).
  32. de Gardelle, V., Kouider, S. & Sackur, J. An oblique illusion modulated by visibility: non-monotonic sensory integration in orientation processing. J. Vis. 10, 6 (2010).
  33. Ruderman, D.L. The statistics of natural images. Network 5, 517–548 (1994).
  34. Dong, D.W. & Atick, J.J. Statistics of natural time-varying images. Network 6, 345–358 (1995).
  35. Campbell, F.W. & Robson, J.G. Application of Fourier analysis to the visibility of gratings. J. Physiol. (Lond.) 197, 551 (1968).
  36. Georgeson, M.A. & Ruddock, K.H. Spatial frequency analysis in early visual processing. Phil. Trans. R. Soc. Lond. B [and discussion] 290, 11–22 (1980).
  37. Körding, K.P. & Wolpert, D. The loss function of sensorimotor learning. Proc. Natl. Acad. Sci. USA 101, 9839–9842 (2004).
  38. Wang, Z., Stocker, A.A. & Lee, D.D. Optimal neural tuning curves for arbitrary stimulus distributions: Discrimax, Infomax and minimum Lp loss. Adv. Neural Inf. Process. Syst. 25, 2177–2185 (2012).
  39. Salinas, E. How behavioral constraints may determine optimal sensory representations. PLoS Biol. 4, e387 (2006).
  40. Ganguli, D. & Simoncelli, E.P. Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput. 26, 2103–2134 (2014).
  41. Simoncelli, E.P. & Olshausen, B.A. Natural image statistics and neural representation. Annu. Rev. Neurosci. 24, 1193–1216 (2001).
  42. Laughlin, S.B. A simple coding procedure enhances a neuron's information capacity. Z. Naturforsch. C. 36, 910–912 (1981).
  43. Stilp, C.E. & Kluender, R.K. Efficient coding and statistically optimal weighting of covariance among acoustic attributes in novel sounds. PLoS ONE 7, e30845 (2012).
  44. Chalk, M., Seitz, A.R. & Series, P. Rapidly learned stimulus expectations alter perception of motion. J. Vis. 10, 2 (2010).
  45. Crane, B.T. Direction specific biases in human visual and vestibular heading perception. PLoS ONE 7, e51383 (2012).
  46. Cuturi, L.F. & MacNeilage, P.R. Systematic biases in human heading estimation. PLoS ONE 8, e56862 (2013).
  47. Rose, D. & Blakemore, C. An analysis of orientation selectivity in the cat's visual cortex. Exp. Brain Res. 20, 1–17 (1974).
  48. Gu, Y., Fetsch, C.R., Adeyemo, B., DeAngelis, G.C. & Angelaki, D.E. Decoding of MSTd population activity accounts for variations in the precision of heading perception. Neuron 66, 596–609 (2010).
  49. Fischer, B.J. Bayesian estimates from heterogeneous population codes. Proc. Int. Jt. Conf. Neural Netw., 1–7 (IEEE, 2010).
  50. Wei, X.-X. & Stocker, A.A. Bayesian inference with efficient neural population codes. in Artificial Neural Networks and Machine Learning–ICANN 2012 (eds. Villa, A., Duch, W., Erdi, P., Masulli, F. & Palm, G.) 523–530 (Springer, 2012).
  51. Kullback, S. & Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 22, 79–86 (1951).
  52. Mr. Bayes & Mr. Price. An essay towards solving a problem in the doctrine of chances, by the late Rev. Mr. Bayes, F. R. S. communicated by Mr. Price, in a letter to John Canton, A. M. F. R. S. Philos. Trans. 53, 370–418 (1763).

Author information


  1. Department of Psychology, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

    • Xue-Xin Wei &
    • Alan A Stocker
  2. Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, Pennsylvania, USA.

    • Alan A Stocker


Both authors jointly designed and performed the research, and wrote the paper.

Competing financial interests

The authors declare no competing financial interests.
