A neural basis of probabilistic computation in visual cortex

Abstract

Bayesian models of behavior suggest that organisms represent uncertainty associated with sensory variables. However, the neural code of uncertainty remains elusive. A central hypothesis is that uncertainty is encoded in the population activity of cortical neurons in the form of likelihood functions. We tested this hypothesis by simultaneously recording population activity from primate visual cortex during a visual categorization task in which trial-to-trial uncertainty about stimulus orientation was relevant for the decision. We decoded the likelihood function from the trial-to-trial population activity and found that it predicted decisions better than a point estimate of orientation. This remained true when we conditioned on the true orientation, suggesting that internal fluctuations in neural activity drive behaviorally meaningful variations in the likelihood function. Our results establish the role of population-encoded likelihood functions in mediating behavior and provide a neural underpinning for Bayesian models of perception.
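The abstract's central claim is that the full likelihood function, not just its peak, matters for the decision. This is easiest to see in a task where stimulus uncertainty is relevant: two trials with the same point estimate can call for different decisions if their likelihood functions differ in width. The sketch below is illustrative only. Its assumptions are not taken from the study: two categories centred on the same orientation with hypothetical standard deviations of 3° and 15°, equal category priors, and Gaussian-shaped likelihood functions on a discretized orientation grid. It computes the Bayesian posterior $$P(C=1|\mathbf{r}) \propto p(C=1)\int L(s)\,p(s|C=1)\,ds$$ by numerical integration.

```python
import math

def gaussian(x, mu, sigma):
    """Normal density, used both for category distributions and likelihood shapes."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical task: two categories centred on 0 deg that differ only in spread.
SIGMA_C1, SIGMA_C2 = 3.0, 15.0                     # assumed category SDs (deg), illustration only
DS = 0.05                                          # grid spacing (deg)
GRID = [i * DS for i in range(-1600, 1601)]        # orientation grid, -80..80 deg

def posterior_c1(likelihood):
    """P(C=1 | r) from a likelihood function sampled on GRID, with equal category priors."""
    ev1 = sum(l * gaussian(s, 0.0, SIGMA_C1) for l, s in zip(likelihood, GRID)) * DS
    ev2 = sum(l * gaussian(s, 0.0, SIGMA_C2) for l, s in zip(likelihood, GRID)) * DS
    return ev1 / (ev1 + ev2)

# Two likelihood functions with the same peak (point estimate 6 deg) but different widths.
narrow = [gaussian(s, 6.0, 1.0) for s in GRID]
broad = [gaussian(s, 6.0, 8.0) for s in GRID]
p_narrow, p_broad = posterior_c1(narrow), posterior_c1(broad)
print(round(p_narrow, 2), round(p_broad, 2))       # prints 0.46 0.62
```

Both likelihood functions peak at 6°, so an observer that used only that point estimate would respond identically on both trials. The Bayesian observer instead switches from category 2 to category 1 as the likelihood widens.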

Data availability

All figures except for Fig. 1 and Extended Data Fig. 4 were generated from raw data or processed data. The data generated and/or analyzed during this study are available from the corresponding author upon reasonable request. No publicly available data were used in this study.

Code availability

Code used for modeling and training the DNNs, and for figure generation, is available at https://github.com/eywalker/v1_likelihood. All other analysis code, including data selection and decision-model fitting, is available at https://github.com/eywalker/v1_project. Code used for electrophysiology data processing is available from the Tolias lab GitHub organization (https://github.com/atlab).

Acknowledgements

The research was supported by a National Science Foundation grant (no. IIS-1132009 to W.J.M. and A.S.T.), a DP1 EY023176 Pioneer Grant (to A.S.T.) and grants from the US Department of Health & Human Services, National Institutes of Health, National Eye Institute (nos. F30 EY025510 to E.Y.W., R01 EY026927 to A.S.T. and W.J.M., and T32 EY00252037 and T32 EY07001 to A.S.T.) and National Institute of Mental Health (no. F30 MH088228 to R.J.C.). We thank F. Sinz for helpful discussion and suggestions on fitting DNNs to likelihood functions. We also thank T. Shinn for assistance with the behavioral training of the monkeys and with experimental data collection.

Author information

Contributions

All authors designed the experiments and developed the theoretical framework. R.J.C. programmed the experiment. R.J.C. trained the first monkey, and R.J.C. and E.Y.W. recorded data from this monkey. E.Y.W. trained and recorded from the second monkey. E.Y.W. performed all data analyses. E.Y.W. wrote the manuscript, with contributions from all authors. W.J.M. and A.S.T. supervised all stages of the project.

Corresponding authors

Correspondence to Edgar Y. Walker, Wei Ji Ma or Andreas S. Tolias.

Ethics declarations

Competing interests

E.Y.W. and A.S.T. hold equity ownership in Vathes LLC, which provides development and consulting for the open-source software (DataJoint) used to develop and operate a data analysis pipeline for this publication.

Peer review information Nature Neuroscience thanks Jan Drugowitsch and Robbe Goris for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Number of trials per contrast-session.

Each point corresponds to a single contrast-session, depicting the number of trials performed at the particular contrast.

Extended Data Fig. 2 Example decoded likelihood functions.

Example decoded likelihood functions under Full-Likelihood, Poisson-like and Independent-Poisson based decoders are shown for randomly selected trials from three distinct contrast-sessions from Monkey T.
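For reference, the independent-Poisson assumption gives the likelihood in closed form: if the spike counts $$r_i$$ are independent Poisson variables with tuning curves $$f_i(s)$$, then $$\log L(s) = \sum_i [r_i \log f_i(s) - f_i(s)]$$ up to an $$s$$-independent constant. The sketch below evaluates this on a 1° orientation grid. The von Mises-shaped tuning curves, the 16 evenly spaced preferred orientations and the noise-free "responses" are hypothetical choices made for illustration, not the tuning curves or data used in the study.

```python
import math

def tuning_curve(s_deg, pref_deg, gain=20.0, baseline=2.0, kappa=2.0):
    """Hypothetical orientation tuning curve: von Mises in the doubled angle (period 180 deg)."""
    d = math.radians(2.0 * (s_deg - pref_deg))
    return baseline + gain * math.exp(kappa * (math.cos(d) - 1.0))

def independent_poisson_likelihood(r, prefs, grid):
    """Normalized likelihood L(s) over `grid`, assuming independent Poisson spike counts.

    log L(s) = sum_i [r_i * log f_i(s) - f_i(s)] + const; the log(r_i!) term does not depend on s.
    """
    log_l = []
    for s in grid:
        total = 0.0
        for r_i, pref in zip(r, prefs):
            f = tuning_curve(s, pref)
            total += r_i * math.log(f) - f
        log_l.append(total)
    m = max(log_l)                                   # subtract the max for numerical stability
    w = [math.exp(v - m) for v in log_l]
    z = sum(w)
    return [v / z for v in w]

prefs = [i * 180.0 / 16 for i in range(16)]         # 16 hypothetical units
grid = [float(s) for s in range(0, 180)]            # 1-degree orientation grid
s_true = 45.0
r = [tuning_curve(s_true, p) for p in prefs]        # noise-free mean responses (hypothetical)
L = independent_poisson_likelihood(r, prefs, grid)
print(grid[L.index(max(L))])                        # prints 45.0
```

Each term $$r_i \log f - f$$ is maximized when $$f_i(s) = r_i$$, so noise-free responses place the likelihood peak exactly at the true orientation. With Poisson-sampled counts, both the peak and the width would vary from trial to trial.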

Extended Data Fig. 3 Performance of the likelihood functions decoded by DNN-based decoders.

a, b, Results on independent Poisson population responses. a, KL divergence between the ground-truth likelihood function and the likelihood function decoded with a trained DNN ($$D_{\mathrm{DNN}}$$) vs. with the independent Poisson distribution assumption ($$D_{\mathrm{Poiss}}$$). Each point is a single trial in the test set. The distributions of $$D_{\mathrm{DNN}}$$ and $$D_{\mathrm{Poiss}}$$ are shown on the top and right margins, respectively, and the distribution of the pairwise difference between $$D_{\mathrm{DNN}}$$ and $$D_{\mathrm{Poiss}}$$ is shown on the diagonal. b, Example likelihood functions. The ground-truth (solid blue), independent-Poisson-based (dotted orange) and DNN-based (dashed green) likelihood functions are shown for selected trials from the test set. Four random samples (columns) were drawn from each of the top, middle and bottom thirds of trials sorted by $$D_{\mathrm{DNN}}$$ (rows). c, d, Same as in a, b, but for simulated population responses drawn from a correlated Gaussian distribution in which the variance is scaled by the mean.
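The comparison in a and c rests on the KL divergence between two likelihood functions sampled on a common orientation grid. A minimal sketch, assuming both functions are normalized to sum to 1 and that the divergence is taken from the ground truth to the decoded function, $$D_{\mathrm{KL}}(L_{\mathrm{true}}\,\|\,L_{\mathrm{decoded}}) = \sum_s L_{\mathrm{true}}(s)\log\frac{L_{\mathrm{true}}(s)}{L_{\mathrm{decoded}}(s)}$$. The direction is our assumption, since the caption does not state it.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in nats for two likelihood functions on a shared discrete grid.

    Both inputs are normalized to sum to 1 first; eps guards against log(0) when q is zero.
    """
    zp, zq = sum(p), sum(q)
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi / zp, qi / zq
        if pi > 0.0:                       # terms with p = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total

truth = [0.5, 0.5]
decoded = [0.25, 0.75]
print(round(kl_divergence(truth, decoded), 4))   # prints 0.1438
```

Under this reading, the per-trial pairwise difference plotted on the diagonal is simply $$D_{\mathrm{DNN}} - D_{\mathrm{Poiss}}$$ computed for each test trial.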

Extended Data Fig. 4 Alternative relationships between the likelihood function and the decision.

Possible relationships between variables in the model are indicated by black arrows. We consider two scenarios: (a, c) the likelihood function $${\cal{L}}$$ mediates the decision $$\hat C$$; (b, d) the likelihood function does not mediate the decision. The gray arrow represents the trial-by-trial fluctuations in the subject's decisions $$\hat C$$ as predicted by the variable. a, b, When not conditioning on the stimulus $$s$$, the stimulus can drive correlations among all variables, making it difficult to distinguish the two scenarios. c, d, When conditioning on the stimulus (red push pins), we expect a correlation between $$\hat C$$ and $${\cal{L}}$$ only when $${\cal{L}}$$ mediates the decision, allowing us to distinguish the two scenarios. The variable $$\mathbf{r}$$ represents the responses of the recorded cortical population, and $$\mathbf{r}_{\mathrm{all}}$$ represents the responses of all recorded and unrecorded neurons.
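The logic of conditioning on the stimulus can be checked with a toy simulation. This sketch is a simplification, not the study's analysis. A binary stimulus $$s$$ drives a scalar summary $$u$$ of the likelihood function, and a continuous decision variable stands in for $$\hat C$$. In one scenario the decision depends on $$u$$ (mediation); in the other it depends on $$s$$ directly. All noise levels are hypothetical.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate(mediates, n=20000, seed=0):
    """Simulate stimulus s, likelihood summary u and a continuous decision variable c."""
    rng = random.Random(seed)
    s, u, c = [], [], []
    for _ in range(n):
        s_t = rng.choice([0.0, 1.0])               # stimulus
        u_t = s_t + rng.gauss(0.0, 0.5)            # summary of the likelihood function
        if mediates:
            c_t = u_t + rng.gauss(0.0, 0.5)        # decision driven by u (u mediates)
        else:
            c_t = s_t + rng.gauss(0.0, 0.5)        # decision driven by s, bypassing u
        s.append(s_t)
        u.append(u_t)
        c.append(c_t)
    return s, u, c

def correlations(mediates):
    """Return (unconditioned correlation, [correlation within each stimulus value])."""
    s, u, c = simulate(mediates)
    overall = pearson(u, c)
    within = []
    for level in (0.0, 1.0):                       # condition on the stimulus
        idx = [i for i, v in enumerate(s) if v == level]
        within.append(pearson([u[i] for i in idx], [c[i] for i in idx]))
    return overall, within

print("mediates:", correlations(True))
print("does not mediate:", correlations(False))
```

Without conditioning, both scenarios show a clear correlation between $$u$$ and the decision (about 0.8 and 0.5 in expectation). Within a fixed stimulus, the correlation stays near 0.7 only when $$u$$ mediates the decision and falls to about zero otherwise.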

Extended Data Fig. 5 Fixed-Uncertainty decoder.

a, Schematic of a DNN for the Fixed-Uncertainty decoder mapping $$\mathbf{r}$$ to the decoded likelihood function $$\mathbf{L}$$. For each contrast-session, the Fixed-Uncertainty decoder learns a single fixed-shape likelihood function $${\mathbf{L}}_0$$ and a network that shifts $${\mathbf{L}}_0$$ based on the population response. Therefore, all resulting likelihood functions share the same shape (uncertainty) but differ in their center location from trial to trial. b, Example decoded likelihood functions from randomly selected trials of a single contrast-session, for both the Fixed-Uncertainty decoder and the Full-Likelihood decoder.
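The defining constraint of the Fixed-Uncertainty decoder is that every trial's likelihood is a translated copy of one learned template $$\mathbf{L}_0$$. A minimal sketch of that constraint, assuming a 180-bin (1°) periodic orientation grid, a Gaussian-shaped template and integer shifts. In the actual decoder both the template and the shift are learned by the network, which this sketch does not model; `fixed_shape_likelihood` and `shift_likelihood` are hypothetical helpers.

```python
import math

def fixed_shape_likelihood(n_bins=180, center_bin=90, width_bins=10.0):
    """Hypothetical fixed-shape template L0 on a discretized orientation grid (Gaussian bump)."""
    l0 = [math.exp(-0.5 * ((i - center_bin) / width_bins) ** 2) for i in range(n_bins)]
    z = sum(l0)
    return [v / z for v in l0]

def shift_likelihood(l0, shift_bins):
    """Circularly shift L0 by an integer number of bins; the shape (uncertainty) is unchanged."""
    k = shift_bins % len(l0)
    return l0[-k:] + l0[:-k] if k else list(l0)

l0 = fixed_shape_likelihood()
# In the decoder a network would predict the shift from the population response r;
# here the per-trial shifts are simply given (hypothetical values).
for shift in (-30, 0, 25):
    l_trial = shift_likelihood(l0, shift)
    print(shift, l_trial.index(max(l_trial)))      # peaks at bins 60, 90 and 115
```

Because only the position changes, any width-based measure of uncertainty is identical across trials, which is what makes this decoder a useful control for the Full-Likelihood decoder.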

Extended Data Fig. 6 Fitted Bayesian decision maker parameters.

Each point corresponds to a single contrast-session and depicts the fitted parameter value, averaged across the 10 cross-validation training sets, plotted against the contrast of that contrast-session. The solid line and error bars/shaded area depict the mean and the standard error of the mean, respectively, of the parameter value within bins of contrast.
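The summary in this figure (per-session averages over cross-validation folds, then mean and SEM within contrast bins) can be sketched as follows. The bin edges and example numbers are hypothetical, and bins with a single session are reported without an SEM.

```python
import statistics

def session_value(cv_fits):
    """Average a parameter over the cross-validation training sets of one contrast-session."""
    return statistics.mean(cv_fits)

def binned_mean_sem(contrasts, values, edges):
    """Mean and SEM of `values` within contrast bins [edges[k], edges[k+1]); empty bins are skipped."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        vals = [v for c, v in zip(contrasts, values) if lo <= c < hi]
        if len(vals) >= 2:
            sem = statistics.stdev(vals) / len(vals) ** 0.5
            out.append((lo, hi, statistics.mean(vals), sem))
        elif len(vals) == 1:
            out.append((lo, hi, vals[0], float("nan")))   # SEM undefined for a single session
    return out

# Hypothetical sessions: contrast (%) and per-session averaged parameter values.
contrasts = [1.0, 2.0, 3.0, 10.0]
values = [session_value([1.0, 1.0]), 2.0, 3.0, 5.0]
print(binned_mean_sem(contrasts, values, edges=[0.0, 5.0, 20.0]))
```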

Extended Data Fig. 7 Model performance on decision predictions.

a, b, Model performance, measured as the proportion of trials correctly predicted by the model, as a function of contrast for four decision models based on different likelihood decoders (n=110,695 and n=192,630 total trials across all contrasts for Monkey L and T, respectively). On each trial, the class decision that maximizes the posterior $$P( {\hat C{\mathrm{|}}{\mathbf{r}}} )$$ was chosen to yield a concrete classification prediction. c, d, Same as in a, b, but with performance measured as the trial-averaged log likelihood of the model. For a, b and c, d, black dashed lines indicate chance performance (50% and $$\ln \left( {0.5} \right)$$, respectively). e, f, The average trial-by-trial performance of the Full-Likelihood, Poisson-like and Independent-Poisson Models relative to the Fixed-Uncertainty Model across contrasts, measured as the average per-trial difference in log likelihood (n=110,695 and n=192,630 total trials for Monkey L and T, respectively). Results are shown for the cross-validated datasets. All data points are means, and error bars/shaded areas indicate the standard error of the mean.
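Both performance measures can be computed from a model's per-trial posterior $$P(\hat C = 1|\mathbf{r})$$. A minimal sketch, assuming choices are coded as 1 and 2 and that ties at 0.5 are assigned to class 2 (the caption does not specify tie handling):

```python
import math

def trial_ll(p_c1, choice):
    """Log probability the model assigns to the observed choice (1 or 2) on one trial."""
    return math.log(p_c1 if choice == 1 else 1.0 - p_c1)

def decision_metrics(p_c1, choices):
    """Accuracy (MAP prediction) and trial-averaged log likelihood of the observed choices."""
    n = len(choices)
    correct = sum((1 if p > 0.5 else 2) == c for p, c in zip(p_c1, choices))
    loglik = sum(trial_ll(p, c) for p, c in zip(p_c1, choices))
    return correct / n, loglik / n

def per_trial_ll_difference(p_model, p_reference, choices):
    """Per-trial log-likelihood difference of a model relative to a reference model."""
    return [trial_ll(a, c) - trial_ll(b, c) for a, b, c in zip(p_model, p_reference, choices)]

# Hypothetical predictions on four trials.
print(decision_metrics([0.9, 0.2, 0.6, 0.3], [1, 2, 2, 2]))
print(per_trial_ll_difference([0.9, 0.2], [0.5, 0.5], [1, 2]))
```

A model that always outputs 0.5 yields $$\ln(0.5) \approx -0.693$$ on the log-likelihood measure, which is the chance line in c, d.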

Extended Data Fig. 8 Model performance based on population responses to different stimulus windows.

a, c, Average trial-by-trial performance of the Full-Likelihood Model relative to the Fixed-Uncertainty Model across contrasts, measured as the average per-trial difference in log likelihood. The models were trained and evaluated on the population response to (a) the first half (0–250 ms, ‘fh’) (n=110,816 and n=192,962 total trials for Monkey L and T) or (c) the second half (250–500 ms, ‘sh’) (n=110,887 and n=192,980 total trials for Monkey L and T) of the stimulus presentation. The results for the original (unshuffled) and the shuffled data are shown in solid and dashed lines, respectively. The squares and triangles mark Monkey L and T, respectively. b, d, Relative model performance summarized across all contrasts for models trained as described in a, c. Performance on the original and the shuffled data is shown separately for both monkeys. The per-trial log likelihood difference between the two models was statistically significant for both stimulus windows, on both the original and the shuffled data, for both monkeys (two-tailed paired t-tests; Monkey L: $$t_{\mathrm{fh},\mathrm{original}}(110815) = 31.29$$, $$t_{\mathrm{sh},\mathrm{original}}(110886) = 25.86$$, $$t_{\mathrm{sh},\mathrm{shuffled}}(110886) = -6.98$$; Monkey T: $$t_{\mathrm{fh},\mathrm{original}}(192961) = 18.48$$, $$t_{\mathrm{fh},\mathrm{shuffled}}(192961) = -19.31$$, $$t_{\mathrm{sh},\mathrm{original}}(192979) = 19.01$$, $$t_{\mathrm{sh},\mathrm{shuffled}}(192979) = -20.17$$; all with $$P < 10^{-9}$$), except on the shuffled data in the 0–250 ms window for Monkey L ($$t_{\mathrm{fh},\mathrm{shuffled}}(110815) = 1.89$$, $$P = 0.17$$).
The difference between the Full-Likelihood Model on the original and on the shuffled data was significant for both monkeys and both stimulus windows (two-tailed paired t-tests; Monkey L: $$t_{\mathrm{fh}}(110815) = 32.73$$, $$t_{\mathrm{sh}}(110886) = 37.10$$; Monkey T: $$t_{\mathrm{fh}}(192961) = 40.69$$, $$t_{\mathrm{sh}}(192979) = 42.78$$; all with $$P < 10^{-9}$$). All P values are Bonferroni-corrected for the three comparisons. All data points are means, and error bars/shaded areas indicate the standard error of the mean.
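The statistics reported above are two-tailed paired t-tests on per-trial log-likelihood differences, Bonferroni-corrected for three comparisons. A minimal standard-library sketch. It assumes the normal distribution as an approximation to the t distribution for the P value, which is adequate only at very large degrees of freedom such as the ~10^5 trials here; an exact P value would need a t-distribution CDF (for example from SciPy).

```python
import math
import statistics

def paired_t_test(ll_a, ll_b, n_comparisons=3):
    """Two-tailed paired t-test on per-trial log-likelihood differences, Bonferroni-corrected.

    Returns (t, degrees of freedom, corrected P). The P value uses a normal approximation
    to the t distribution, reasonable only for very large degrees of freedom.
    """
    d = [a - b for a, b in zip(ll_a, ll_b)]
    n = len(d)
    t = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n))
    p_two_tailed = math.erfc(abs(t) / math.sqrt(2.0))      # equals 2 * (1 - Phi(|t|))
    p_corrected = min(1.0, p_two_tailed * n_comparisons)   # Bonferroni correction
    return t, n - 1, p_corrected

# Hypothetical per-trial log likelihoods for two models on the same five trials.
print(paired_t_test([1.0, 2.0, 3.0, 4.0, 5.0], [0.0] * 5))
```

The degrees of freedom returned are n − 1, matching the reported form t(110815) for n = 110,816 trials.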

Extended Data Fig. 9 Expected model performance on simulated data and observed effect of shuffling.

a, b, Using the trained Full-Likelihood Model as the ground truth to simulate behavior, we assessed the expected performance of the models on the simulated data. a, Average trial-by-trial performance of the Full-Likelihood Model relative to the Fixed-Uncertainty Model across contrasts on the simulated data, measured as the trial-averaged difference in log likelihood. The results for the unshuffled and the shuffled simulated data are shown in solid and dashed lines, respectively. The squares and triangles mark Monkey L and T, respectively. b, Relative model performance summarized across all contrasts. Results are shown for each monkey, for both the unshuffled and the shuffled simulated data. For a and b, all data points are means, and error bars/shaded areas indicate the standard deviation across the 5 simulation repetitions. For b, data points for individual simulation repetitions are shown as gray icons next to the error bars. c, The dependence of the likelihood-function width $$\sigma _L$$ on the stimulus orientation, shown for an example contrast-session (Monkey T, 8% contrast, n=1,126 trials) on the original and the shuffled data. As intended, the shuffling procedure preserves the relationship between the average likelihood width and the stimulus orientation. All data points are means, and error bars indicate the standard deviation across trials falling in each bin.
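The simulation in a, b treats the trained Full-Likelihood Model as the generator of behavior and asks how well the models would score if that were exactly true. A minimal sketch, assuming the per-trial probabilities $$P(\hat C = 1|\mathbf{r})$$ from the ground-truth and reference models are already available (the example values are hypothetical) and leaving out the shuffling step:

```python
import math
import random
import statistics

def simulate_choices(p_c1, rng):
    """Sample synthetic decisions (1 or 2) from the ground-truth model's P(C_hat = 1 | r)."""
    return [1 if rng.random() < p else 2 for p in p_c1]

def mean_ll_difference(p_model, p_reference, choices):
    """Trial-averaged log-likelihood difference of p_model relative to p_reference."""
    def ll(p, c):
        return math.log(p if c == 1 else 1.0 - p)
    return statistics.mean([ll(a, c) - ll(b, c) for a, b, c in zip(p_model, p_reference, choices)])

def expected_performance(p_truth, p_reference, n_repeats=5, seed=0):
    """Mean and SD across simulation repetitions of the relative performance on simulated behavior."""
    rng = random.Random(seed)
    diffs = [mean_ll_difference(p_truth, p_reference, simulate_choices(p_truth, rng))
             for _ in range(n_repeats)]
    return statistics.mean(diffs), statistics.stdev(diffs), diffs

# Hypothetical: a confident ground-truth model vs. a reference model at chance.
p_truth = [0.9, 0.1] * 1000
p_reference = [0.5] * 2000
mean_diff, sd_diff, reps = expected_performance(p_truth, p_reference)
print(round(mean_diff, 3), round(sd_diff, 3))
```

With a reference model at chance, the expected per-trial gain is ln 2 minus the binary entropy of the ground-truth probability, about 0.37 nats for p = 0.9.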

Extended Data Fig. 10 Contributions of multi-units to the total attribution.

a, For each contrast-session, the multi-units were ordered from the largest to the smallest attribution to the likelihood mean $$A_\mu$$, and the cumulative attribution over all 96 multi-units was plotted (thin gray lines; n=545 total contrast-sessions from Monkey L and T). The average cumulative attribution over all contrast-sessions is shown by the thick black lines. Results are shown separately for each attribution method. b, Same as in a, but for the attribution to the likelihood standard deviation $$A_\sigma$$.
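Each thin line in a, b is a cumulative-attribution curve for one contrast-session, and the thick line is their average. A minimal sketch, assuming that units are ranked by attribution magnitude and that each curve is normalized by its total so it ends at 1 (neither detail is stated in the caption):

```python
def cumulative_attribution(attributions):
    """Fraction of the total attribution captured by the top-k units, for k = 1..N.

    Units are ranked by attribution magnitude, largest first (an assumption).
    """
    mags = sorted((abs(a) for a in attributions), reverse=True)
    total = sum(mags)
    out, running = [], 0.0
    for m in mags:
        running += m
        out.append(running / total)
    return out

def average_curve(curves):
    """Pointwise mean of equal-length curves, e.g. one curve per contrast-session."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]

print(cumulative_attribution([1.0, -3.0, 2.0, 4.0]))   # prints [0.4, 0.7, 0.9, 1.0]
```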

Supplementary information

Supplementary Table 1.

Walker, E.Y., Cotton, R.J., Ma, W.J. et al. A neural basis of probabilistic computation in visual cortex. Nat Neurosci 23, 122–129 (2020). https://doi.org/10.1038/s41593-019-0554-5
