Abstract
Bayesian models of behavior suggest that organisms represent uncertainty associated with sensory variables. However, the neural code of uncertainty remains elusive. A central hypothesis is that uncertainty is encoded in the population activity of cortical neurons in the form of likelihood functions. We tested this hypothesis by simultaneously recording population activity from primate visual cortex during a visual categorization task in which trial-to-trial uncertainty about stimulus orientation was relevant for the decision. We decoded the likelihood function from the trial-to-trial population activity and found that it predicted decisions better than a point estimate of orientation. This remained true when we conditioned on the true orientation, suggesting that internal fluctuations in neural activity drive behaviorally meaningful variations in the likelihood function. Our results establish the role of population-encoded likelihood functions in mediating behavior and provide a neural underpinning for Bayesian models of perception.
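To make the idea of reading out a likelihood function from population activity concrete, here is a minimal sketch of a decoder that assumes independent Poisson variability (one of the baseline decoder variants referenced in the Extended Data). The Gaussian tuning curves, gain, width and orientation grid below are toy assumptions for illustration, not the fitted decoders used in the study.

```python
import math

def tuning(pref, s, gain=10.0, width=20.0, baseline=0.5):
    """Toy Gaussian tuning curve: mean spike count of a neuron
    preferring orientation `pref` (deg) for a stimulus at s (deg)."""
    return gain * math.exp(-0.5 * ((s - pref) / width) ** 2) + baseline

def log_likelihood(r, prefs, s):
    """Log-likelihood of orientation s given spike counts r, assuming
    independent Poisson variability (constant log r_i! terms dropped)."""
    return sum(ri * math.log(tuning(p, s)) - tuning(p, s)
               for ri, p in zip(r, prefs))

prefs = list(range(-90, 91, 10))               # preferred orientations (deg)
r = [round(tuning(p, 12.0)) for p in prefs]    # idealized counts for s = 12 deg
grid = [s / 2 for s in range(-180, 181)]       # orientation grid, 0.5 deg steps
ll = {s: log_likelihood(r, prefs, s) for s in grid}
s_hat = max(ll, key=ll.get)                    # maximum-likelihood orientation
```

Evaluating `log_likelihood` over the whole grid (rather than keeping only `s_hat`) yields the full likelihood function, whose width carries the trial's uncertainty.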
Code availability
Code used for modeling and training the DNNs, as well as for figure generation, can be viewed and downloaded from https://github.com/eywalker/v1_likelihood. All other code used for analysis, including data selection and decision model fitting, can be found at https://github.com/eywalker/v1_project. Finally, code used for electrophysiology data processing can be found in the Tolias lab GitHub organization (https://github.com/atlab).
Acknowledgements
The research was supported by a National Science Foundation grant (no. IIS-1132009 to W.J.M. and A.S.T.), a DP1 EY023176 Pioneer grant (to A.S.T.) and grants from the US Department of Health & Human Services, National Institutes of Health, National Eye Institute (nos. F30 EY025510 to E.Y.W., R01 EY026927 to A.S.T. and W.J.M., and T32 EY00252037 and T32 EY07001 to A.S.T.) and National Institute of Mental Health (no. F30 MH088228 to R.J.C.). We thank F. Sinz for helpful discussions and suggestions on fitting the DNNs to likelihood functions. We also thank T. Shinn for assistance with the behavioral training of the monkeys and experimental data collection.
Author information
Contributions
All authors designed the experiments and developed the theoretical framework. R.J.C. programmed the experiment. R.J.C. trained the first monkey, and R.J.C. and E.Y.W. recorded data from this monkey. E.Y.W. trained and recorded from the second monkey. E.Y.W. performed all data analyses. E.Y.W. wrote the manuscript, with contributions from all authors. W.J.M. and A.S.T. supervised all stages of the project.
Ethics declarations
Competing interests
E.Y.W. and A.S.T. hold equity ownership in Vathes LLC, which provides development and consulting for the open source software (DataJoint) used to develop and operate a data analysis pipeline for this publication.
Additional information
Peer review information Nature Neuroscience thanks Jan Drugowitsch and Robbe Goris for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Number of trials per contrast-session.
Each point corresponds to a single contrast-session, depicting the number of trials performed at that contrast.
Extended Data Fig. 2 Example decoded likelihood functions.
Example decoded likelihood functions under the Full-Likelihood, Poisson-like and Independent-Poisson decoders are shown for randomly selected trials from three distinct contrast-sessions from Monkey T.
Extended Data Fig. 3 Performance of the likelihood functions decoded by DNN-based decoders.
a, b, Results on independent Poisson population responses. a, KL divergence between the ground-truth likelihood function and the likelihood function decoded with a trained DNN (\(D_{{\mathrm{DNN}}}\)) vs. under the independent-Poisson assumption (\(D_{{\mathrm{Poiss}}}\)). Each point is a single trial in the test set. The distributions of \(D_{{\mathrm{DNN}}}\) and \(D_{{\mathrm{Poiss}}}\) are shown at the top and right margins, respectively. The distribution of the pairwise difference between \(D_{{\mathrm{DNN}}}\) and \(D_{{\mathrm{Poiss}}}\) is shown on the diagonal. b, Example likelihood functions. The ground-truth (solid blue), independent-Poisson-based (dotted orange) and DNN-based (dashed green) likelihood functions are shown for selected trials from the test set. Four random samples (columns) were drawn from the top, middle and bottom thirds of trials sorted by \(D_{{\mathrm{DNN}}}\) (rows). c, d, Same as in a, b but for simulated population responses with a correlated Gaussian distribution in which the variance is scaled by the mean.
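The KL divergence between a ground-truth and a decoded likelihood function, as used in this figure, can be computed on a discretized orientation grid. The sketch below is a minimal illustration with hypothetical Gaussian-shaped likelihoods; the grid, widths and the small `eps` regularizer are assumptions, not the paper's exact procedure.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discretized likelihood functions,
    each normalized to sum to 1 over the grid."""
    zp, zq = sum(p), sum(q)
    return sum((pi / zp) * math.log((pi / zp + eps) / (qi / zq + eps))
               for pi, qi in zip(p, q))

# Hypothetical ground-truth vs. decoded likelihoods on a 1-deg grid
truth   = [math.exp(-0.5 * ((x - 5) / 3) ** 2) for x in range(-20, 21)]
decoded = [math.exp(-0.5 * ((x - 6) / 4) ** 2) for x in range(-20, 21)]
d = kl_divergence(truth, decoded)  # larger means worse decoding
```

KL divergence is zero only when the two (normalized) functions coincide, which is why it serves as a per-trial decoding error in panels a and c.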
Extended Data Fig. 4 Alternative relationships between the likelihood function and the decision.
Possible relationships between variables in the model are indicated by black arrows. We consider two scenarios: a, c, the likelihood function \({\cal{L}}\) mediates the decision \(\hat C\); b, d, the likelihood function does not mediate the decision. The gray arrow represents the trial-by-trial fluctuations in the subject’s decisions \(\hat C\) as predicted by the variable. a, b, When not conditioning on the stimulus \(s\), the stimulus can drive correlations among all variables, making it difficult to distinguish the two scenarios. c, d, When conditioning on the stimulus (red push pins), we expect a correlation between \(\hat C\) and \({\cal{L}}\) only when \({\cal{L}}\) mediates the decision, allowing us to distinguish the two scenarios. The variable \(\mathbf{r}\) represents the responses of the recorded cortical population and \(\mathbf{r}_{{\mathrm{all}}}\) the responses of all neurons, recorded and unrecorded.
Extended Data Fig. 5 Fixed-Uncertainty decoder.
a, A schematic of a DNN for the Fixed-Uncertainty decoder mapping \(\mathbf{r}\) to the decoded likelihood function \(\mathbf{L}\). For each contrast-session, the Fixed-Uncertainty decoder learns a single fixed-shape likelihood function \({\mathbf{L}}_0\) and a network that shifts \({\mathbf{L}}_0\) based on the population response. Therefore, all resulting likelihood functions share the same shape (uncertainty) but differ in center location from trial to trial. b, Example decoded likelihood functions from randomly selected trials of a single contrast-session for both the Fixed-Uncertainty decoder and the Full-Likelihood decoder.
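The core operation of the Fixed-Uncertainty decoder described in panel a, shifting a single learned template without changing its shape, can be sketched as a circular shift on a discretized likelihood. The template values and bin shift below are hypothetical; in the actual decoder the shift is produced by a trained network from the population response.

```python
def shift_template(template, shift):
    """Circularly shift a fixed-shape likelihood template by `shift` bins.
    The shape (uncertainty) stays constant; only the center moves."""
    n = len(template)
    return [template[(i - shift) % n] for i in range(n)]

L0 = [0.05, 0.1, 0.5, 0.25, 0.1]   # hypothetical learned template L_0
L_trial = shift_template(L0, 2)    # trial-specific likelihood, same shape
```

Because every trial's likelihood is a shifted copy of \({\mathbf{L}}_0\), this decoder cannot express trial-to-trial changes in uncertainty, which is exactly what makes it a useful control against the Full-Likelihood decoder.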
Extended Data Fig. 6 Fitted Bayesian decision maker parameters.
Each point corresponds to a single contrast-session, depicting the average fitted parameter value across 10 cross-validation training sets plotted against the contrast of that contrast-session. The solid line and error bars/shaded area depict the mean and the standard error of the mean of the parameter value for binned contrast values, respectively.
Extended Data Fig. 7 Model performance on decision predictions.
a, b, Model performance measured as the proportion of trials correctly predicted by the model as a function of contrast for four decision models based on different likelihood decoders (n=110,695 and n=192,630 total trials across all contrasts for Monkey L and T, respectively). On each trial, the class decision that maximizes the posterior \(P(\hat C \mid \mathbf{r})\) was chosen to yield a concrete classification prediction. c, d, Same as in a, b but with performance measured as the trial-averaged log likelihood of the model. For a, b and c, d, black dashed lines indicate chance performance (50% and \(\ln \left( {0.5} \right)\), respectively). e, f, The average trial-by-trial performance of the Full-Likelihood, Poisson-like and Independent-Poisson Models is shown relative to the Fixed-Uncertainty Model across contrasts, measured as the average trial difference in the log likelihood (n=110,695 and n=192,630 total trials for Monkey L and T, respectively). Results are shown for the cross-validated datasets. All data points are means, and error bars/shaded areas indicate the standard error of the mean.
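The two performance measures in this figure, proportion of decisions correctly predicted and trial-averaged log likelihood, can be computed from a model's per-trial posterior over the subject's choice. The sketch below uses hypothetical predictions; note that a chance-level model that always outputs 0.5 recovers exactly the dashed baselines in the figure (50% and \(\ln(0.5)\)).

```python
import math

def performance(prob_c1, decisions):
    """prob_c1[i]: model posterior P(C_hat = 1 | r) on trial i;
    decisions[i]: the subject's actual choice (1 or 0).
    Returns (proportion correctly predicted, trial-averaged log likelihood)."""
    correct = sum((p > 0.5) == bool(c) for p, c in zip(prob_c1, decisions))
    logls = [math.log(p if c else 1.0 - p) for p, c in zip(prob_c1, decisions)]
    return correct / len(decisions), sum(logls) / len(logls)

# A chance-level model (hypothetical): always predicts P = 0.5
acc, mean_ll = performance([0.5] * 4, [1, 0, 1, 0])
```

The log-likelihood measure is the stricter of the two, since it rewards well-calibrated confidence rather than just the sign of the prediction.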
Extended Data Fig. 8 Model performance based on population responses to different stimulus windows.
a, c, Average trial-by-trial performance of the Full-Likelihood Model relative to the Fixed-Uncertainty Model across contrasts, measured as the average trial difference in the log likelihood. The models were trained and evaluated on the population response to (a) the first half (0–250 ms, ‘fh’) (n=110,816 and n=192,962 total trials for Monkey L and T) or (c) the second half (250–500 ms, ‘sh’) (n=110,887 and n=192,980 total trials for Monkey L and T) of the stimulus presentation. The results for the original (unshuffled) and the shuffled data are shown as solid and dashed lines, respectively. The squares and triangles mark Monkey L and T, respectively. b, d, Relative model performance summarized across all contrasts based on models trained as described in (a, c). Performance on the original and the shuffled data is shown individually for both monkeys. The trial log likelihood difference between the two models was statistically significant for both stimulus windows, on both the original and the shuffled data, for both monkeys (two-tailed paired t-tests; Monkey L: \(t_{{\mathrm{fh}},{\mathrm{original}}}\left( {110815} \right) = 31.29\), \(t_{{\mathrm{sh}},{\mathrm{original}}}\left( {110886} \right) = 25.86\), \(t_{{\mathrm{sh}},{\mathrm{shuffled}}}\left( {110886} \right) = -6.98\); Monkey T: \(t_{{\mathrm{fh}},{\mathrm{original}}}\left( {192961} \right) = 18.48\), \(t_{{\mathrm{fh}},{\mathrm{shuffled}}}\left( {192961} \right) = -19.31\), \(t_{{\mathrm{sh}},{\mathrm{original}}}\left( {192979} \right) = 19.01\), \(t_{{\mathrm{sh}},{\mathrm{shuffled}}}\left( {192979} \right) = -20.17\); all with \(P < 10^{-9}\)), except for the shuffled data in the 0–250 ms window for Monkey L (\(t_{{\mathrm{fh}},{\mathrm{shuffled}}}\left( {110815} \right) = 1.89\), \(P = 0.17\)).
The difference between the Full-Likelihood Model on the original and the shuffled data was significant for both monkeys and both stimulus windows (two-tailed paired t-tests; Monkey L: \(t_{{\mathrm{fh}}}\left( {110815} \right) = 32.73\), \(t_{{\mathrm{sh}}}\left( {110886} \right) = 37.10\); Monkey T: \(t_{{\mathrm{fh}}}\left( {192961} \right) = 40.69\), \(t_{{\mathrm{sh}}}\left( {192979} \right) = 42.78\); all with \(P < 10^{-9}\)). All P values are Bonferroni corrected for the three comparisons. All data points are means, and error bars/shaded areas indicate the standard error of the mean.
Extended Data Fig. 9 Expected model performance on simulated data and observed effect of shuffling.
a, b, Using the trained Full-Likelihood Model as the ground truth to simulate behavior, the expected performance of the model on the simulated data was assessed. a, Average trial-by-trial performance of the Full-Likelihood Model relative to the Fixed-Uncertainty Model across contrasts on the simulated data, measured as the trial-averaged difference in the log likelihood. The results for the unshuffled and the shuffled simulated data are shown as solid and dashed lines, respectively. The squares and triangles mark Monkey L and T, respectively. b, Relative model performance summarized across all contrasts. Results are shown for each monkey and for the unshuffled and shuffled simulated data. For a and b, all data points are means, and error bars/shaded areas indicate the standard deviation across the 5 simulation repetitions. For b, data points for individual simulation repetitions are depicted by gray icons next to the error bars. c, The dependence of the width of the likelihood function \(\sigma _L\) on the stimulus orientation is depicted for an example contrast-session (Monkey T, 8% contrast, n=1,126 trials) on the original and the shuffled data. The shuffling procedure preserves the relationship between the average likelihood width and the stimulus orientation, as desired. All data points are means, and error bars indicate the standard deviation across trials falling in each bin.
Extended Data Fig. 10 Contributions of multiunits to the total attribution.
a, For each contrast-session, the multiunits were ordered from the largest to the smallest attribution to the likelihood mean \(A_\mu\), and the cumulative attribution over the total of 96 multiunits was plotted (thin gray lines, n=545 total contrast-sessions from Monkey L and T). The average cumulative attribution over all contrast-sessions is depicted by the thick black lines. The results are shown for each attribution method separately. b, Same as in a, but for attribution to the likelihood standard deviation \(A_\sigma\).
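The cumulative-attribution curve described in panel a amounts to sorting per-unit attributions in descending order and accumulating their normalized sum. A minimal sketch with five hypothetical multiunit attributions (the figure uses 96 per contrast-session):

```python
def cumulative_attribution(attributions):
    """Sort per-unit attributions from largest to smallest and return the
    running fraction of the total attribution (a non-decreasing curve)."""
    ordered = sorted(attributions, reverse=True)
    total = sum(ordered)
    curve, running = [], 0.0
    for a in ordered:
        running += a
        curve.append(running / total)
    return curve

# Hypothetical attributions A_mu for 5 multiunits
curve = cumulative_attribution([0.1, 0.4, 0.2, 0.25, 0.05])
```

A curve that rises steeply and saturates early would indicate that a few multiunits dominate the decoded likelihood; a near-diagonal curve would indicate that attribution is spread across the population.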
Supplementary information
Supplementary Information
Supplementary Table 1.
Cite this article
Walker, E.Y., Cotton, R.J., Ma, W.J. et al. A neural basis of probabilistic computation in visual cortex. Nat Neurosci 23, 122–129 (2020). https://doi.org/10.1038/s41593-019-0554-5