Reward prediction error neurons implement an efficient code for reward

Abstract

We use efficient coding principles borrowed from sensory neuroscience to derive the optimal neural population to encode a reward distribution. We show that the responses of dopaminergic reward prediction error neurons in mouse and macaque are similar to those of the efficient code in the following ways: the neurons have a broad distribution of midpoints covering the reward distribution; neurons with higher thresholds have higher gains, more convex tuning functions and lower slopes; and their slope is higher when the reward distribution is narrower. Furthermore, we derive learning rules that converge to the efficient code. The learning rule for the position of the neuron on the reward axis closely resembles distributional reinforcement learning. Thus, reward prediction error neuron responses may be optimized to broadcast an efficient reward signal, forming a connection between efficient coding and reinforcement learning, two of the most successful theories in computational neuroscience.
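
As a concrete illustration of the link to distributional reinforcement learning mentioned above, the sketch below implements a quantile-style update in which each unit's position on the reward axis moves up or down by asymmetric fixed steps. The population size, learning rate and log-normal reward distribution are illustrative assumptions, not the article's fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: 20 units, each targeting a different quantile level.
n_units = 20
taus = (np.arange(n_units) + 0.5) / n_units  # target quantile levels
lr = 0.01
midpoints = np.zeros(n_units)  # positions on the reward axis

# Assumed log-normal reward distribution.
for _ in range(20_000):
    r = rng.lognormal(mean=0.0, sigma=0.5)
    # Step up by lr * tau when the reward exceeds the midpoint,
    # down by lr * (1 - tau) otherwise.
    midpoints += lr * np.where(r > midpoints, taus, taus - 1.0)

# Each midpoint now approximates the tau-th quantile of the reward density.
print(np.round(np.sort(midpoints), 2))
```

The asymmetry between up- and down-steps is what pins each unit to a different quantile, spreading the midpoints across the reward distribution as the abstract describes.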

Fig. 1: Experimental design and basic results of Eshel et al.
Fig. 2: Comparisons between the measured neurons of Eshel et al. and the efficient code.
Fig. 3: Efficient coding accounts for response slope characteristics in the variable-distribution task.
Fig. 4: Combination of learning rules to learn the efficient code.

Data availability

No new data were measured for this project. The data collected by Eshel et al.15 that we analyze here were kindly made available by Dabney et al.16 at https://doi.org/10.17605/OSF.IO/UX5RG.

Code availability

The code to recreate our analyses42 is available at https://github.com/dongjae-kim/efficient-coding-dist-rl.

References

  1. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).

  2. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).

  3. Balleine, B. W., Daw, N. D. & O’Doherty, J. P. in Neuroeconomics (eds Glimcher, P. W. et al.) 367–387 (Academic Press, 2009).

  4. Attneave, F. Some informational aspects of visual perception. Psychol. Rev. 61, 183–193 (1954).

  5. Barlow, H. B. in Sensory Communication (ed Rosenblith, W. A.) 216–234 (MIT Press, 1961).

  6. Laughlin, S. A simple coding procedure enhances a neuron’s information capacity. Z. Naturforsch. C Biosci. 36, 910–912 (1981).

  7. Schwartz, O. & Simoncelli, E. P. Natural signal statistics and sensory gain control. Nat. Neurosci. 4, 819–825 (2001).

  8. Wei, X.-X. & Stocker, A. A. Lawful relation between perceptual bias and discriminability. Proc. Natl Acad. Sci. USA 114, 10244–10249 (2017).

  9. Louie, K., Glimcher, P. W. & Webb, R. Adaptive neural coding: from biological to behavioral decision-making. Curr. Opin. Behav. Sci. 5, 91–99 (2015).

  10. Polanía, R., Woodford, M. & Ruff, C. C. Efficient coding of subjective value. Nat. Neurosci. 22, 134–142 (2019).

  11. Bhui, R., Lai, L. & Gershman, S. J. Resource-rational decision making. Curr. Opin. Behav. Sci. 41, 15–21 (2021).

  12. Louie, K. & Glimcher, P. W. Efficient coding and the neural representation of value. Ann. N Y Acad. Sci. 1251, 13–32 (2012).

  13. Motiwala, A., Soares, S., Atallah, B. V., Paton, J. J. & Machens, C. K. Efficient coding of cognitive variables underlies dopamine response and choice behavior. Nat. Neurosci. 25, 738–748 (2022).

  14. Eshel, N. et al. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246 (2015).

  15. Eshel, N., Tian, J., Bukwich, M. & Uchida, N. Dopamine neurons share common response function for reward prediction error. Nat. Neurosci. 19, 479–486 (2016).

  16. Dabney, W. et al. A distributional code for value in dopamine-based reinforcement learning. Nature 577, 671–675 (2020).

  17. Rothenhoefer, K. M., Hong, T., Alikaya, A. & Stauffer, W. R. Rare rewards amplify dopamine responses. Nat. Neurosci. 24, 465–469 (2021).

  18. Ganguli, D. & Simoncelli, E. P. Efficient sensory encoding and Bayesian inference with heterogeneous neural populations. Neural Comput. 26, 2103–2134 (2014).

  19. Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).

  20. Cohen, J. D. & Servan-Schreiber, D. A theory of dopamine function and its role in cognitive deficits in schizophrenia. Schizophr. Bull. 19, 85–104 (1993).

  21. Wei, X.-X. & Stocker, A. A. Bayesian inference with efficient neural population codes. In Artificial Neural Networks and Machine Learning—ICANN 2012, Vol. 7552 (eds Hutchison, D. et al.) 523–530 (Springer, 2012).

  22. Frank, M. J., Seeberger, L. C. & O’Reilly, R. C. By carrot or by stick: cognitive reinforcement learning in Parkinsonism. Science 306, 1940–1943 (2004).

  23. Mikhael, J. G. & Bogacz, R. Learning reward uncertainty in the basal ganglia. PLoS Comput. Biol. 12, e1005062 (2016).

  24. Kobayashi, S. & Schultz, W. Influence of reward delays on responses of dopamine neurons. J. Neurosci. 28, 7837–7846 (2008).

  25. Roesch, M. R., Calu, D. J. & Schoenbaum, G. Dopamine neurons encode the better option in rats deciding between differently delayed or sized rewards. Nat. Neurosci. 10, 1615–1624 (2007).

  26. Kim, H. R. et al. A unified framework for dopamine signals across timescales. Cell 183, 1600–1616 (2020).

  27. Starkweather, C. K. & Uchida, N. Dopamine signals as temporal difference errors: recent advances. Curr. Opin. Neurobiol. 67, 95–105 (2021).

  28. Starkweather, C. K., Babayan, B. M., Uchida, N. & Gershman, S. J. Dopamine reward prediction errors reflect hidden-state inference across time. Nat. Neurosci. 20, 581–589 (2017).

  29. Soares, S., Atallah, B. V. & Paton, J. J. Midbrain dopamine neurons control judgment of time. Science 354, 1273–1277 (2016).

  30. Tano, P., Dayan, P. & Pouget, A. A local temporal difference code for distributional reinforcement learning. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 13662–13673 (Neural Information Processing Systems Foundation, 2020).

  31. Louie, K. Asymmetric and adaptive reward coding via normalized reinforcement learning. PLoS Comput. Biol. 18, e1010350 (2022).

  32. Naka, K. I. & Rushton, W. A. H. An attempt to analyse colour reception by electrophysiology. J. Physiol. 185, 556–586 (1966).

  33. Bredenberg, C., Simoncelli, E. P. & Savin, C. Learning efficient task-dependent representations with synaptic plasticity. In Advances in Neural Information Processing Systems 33 (eds Larochelle, H. et al.) 15714–15724 (Neural Information Processing Systems Foundation, 2020).

  34. Savin, C. & Triesch, J. Emergence of task-dependent representations in working memory circuits. Front. Comput. Neurosci. 8, 57 (2014).

  35. Gerstner, W., Lehmann, M., Liakoni, V., Corneil, D. & Brea, J. Eligibility traces and plasticity on behavioral time scales: experimental support of neoHebbian three-factor learning rules. Front. Neural Circuits 12, 53 (2018).

  36. Frémaux, N. & Gerstner, W. Neuromodulated spike-timing-dependent plasticity, and theory of three-factor learning rules. Front. Neural Circuits 9, 85 (2016).

  37. Wei, X.-X. & Stocker, A. A. A Bayesian observer model constrained by efficient coding can explain ‘anti-Bayesian’ percepts. Nat. Neurosci. 18, 1509–1517 (2015).

  38. Brunel, N. & Nadal, J.-P. Mutual information, Fisher information, and population coding. Neural Comput. 10, 1731–1757 (1998).

  39. Cover, T. M. & Thomas, J. A. Elements of Information Theory (Wiley, 1991).

  40. Wei, X.-X. & Stocker, A. A. Mutual information, Fisher information, and efficient coding. Neural Comput. 28, 305–326 (2016).

  41. Bethge, M., Rotermund, D. & Pawelzik, K. Optimal short-term population coding: when Fisher information fails. Neural Comput. 14, 2317–2351 (2002).

  42. Schütt, H., Kim, D. & Ma, W. J. Code for efficient coding and distributional reinforcement learning. Zenodo https://doi.org/10.5281/zenodo.10669061 (2024).

Acknowledgements

We thank H.-H. Li for valuable discussions. We received no specific funding for this work.

Author information

Contributions

H.H.S. derived the efficient code. H.H.S. and D.K. analyzed the neural data. W.J.M. supervised the project. All authors wrote the manuscript.

Corresponding author

Correspondence to Heiko H. Schütt.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Comparing encoding populations for reward with 10 neurons and the same expected number of spikes.

A: Compared neuronal populations. Single neuron: all neurons share the same response curve, optimized to maximize transferred information. Equal spacing: neurons tile the space, not optimized. No gain: positions and slopes are optimized, but all neurons have equal gain. Optimal α = 1: fully optimized population as derived previously18, with density proportional to the distribution. Optimal α = 0.673: an equally optimal population, but with α fit to match the midpoint distribution of the optimal code to the experimental data. B: Fisher information as a function of reward for each population. C: Expected logarithm of Fisher information under the reward distribution, relative to the single-neuron case.
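
To make the comparison in panels B and C concrete, the sketch below computes the Poisson-noise Fisher information J(r) = Σᵢ fᵢ′(r)²/fᵢ(r) for a population of sigmoidal tuning curves, together with its expected logarithm under the reward distribution. The tuning parameters and the log-normal reward distribution are assumptions for illustration, not the optimized populations analyzed here.

```python
import numpy as np
from scipy.stats import lognorm

def rates(r, gains, mids, slopes):
    # Sigmoidal tuning curves f_i(r) = g_i / (1 + exp(-k_i (r - m_i))).
    return gains[:, None] / (1.0 + np.exp(-slopes[:, None] * (r[None, :] - mids[:, None])))

def fisher_info(r, gains, mids, slopes):
    # Under Poisson spiking, J(r) = sum_i f_i'(r)^2 / f_i(r).
    f = rates(r, gains, mids, slopes)
    df = slopes[:, None] * f * (1.0 - f / gains[:, None])  # sigmoid derivative
    return (df ** 2 / f).sum(axis=0)

# Illustrative population: 10 neurons with midpoints at the quantiles of a
# log-normal reward distribution (density proportional to the distribution,
# loosely corresponding to the optimal alpha = 1 variant in panel A).
p = lognorm(s=0.5)
r = np.linspace(0.05, 6.0, 2000)
mids = p.ppf((np.arange(10) + 0.5) / 10)
gains = np.full(10, 5.0)
slopes = np.full(10, 2.0)

J = fisher_info(r, gains, mids, slopes)
dr = r[1] - r[0]
expected_log_J = (np.log(J) * p.pdf(r)).sum() * dr  # E_p[log J(r)], panel C quantity
print(expected_log_J)
```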

Extended Data Fig. 2 Illustration of the solution of the efficient coding problem, varying α (rows) and the reward distribution (columns).

The reward distributions are all log-normal distributions with their pdfs and parameters plotted at the top.

Extended Data Fig. 3 Efficient code for the variable-reward task14.

A: Tuning curves. For clarity, only 20 of 39 neurons are shown. B: Density of neurons as a function of midpoint. C: Gain as a function of midpoint.

Extended Data Fig. 4 Log-normal kernel density estimation of midpoints and thresholds.

A: Midpoints. B: Thresholds. Measured neurons (black) and efficient code (cyan) are overlaid on the reward density (gray).

Extended Data Fig. 5 Efficient code for the variable-magnitude task17.

A-C: Efficient code for the uniform distribution. D-F: Efficient code for the normal distribution. A,D: Tuning curves. For clarity, only 13 of 40 neurons are shown. B,E: Density. C,F: Gain.

Extended Data Fig. 6 Evaluation of learning rules placing neurons’ midpoints at expectiles instead of quantiles.

Plotting conventions as in Fig. 4. Each panel shows the converged population of 20 neurons after learning from 20,000 reward presentations. The inset illustrates the learning rule. A: Learning the position on the reward axis so that the neurons converge to the quantiles of the distribution; this is the distributional RL learning rule. B: Additionally learning the slopes of the neurons to be proportional to the local density, by increasing a neuron's slope when the reward falls within its dynamic range and decreasing it otherwise. C: First method to set the gain: iterative adjustment to converge to a fixed average firing rate. D: Second method to set the gain: a fixed gain per neuron based on the quantile it will eventually converge to. E: The efficient tuning curve for a single neuron. F: The analytically derived optimal solution. G: Comparison of information transfer across the different populations with the same number of neurons and expected firing rate.
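
A minimal sketch of how two of these rules combine, assuming sigmoidal tuning curves and a log-normal reward distribution: the midpoint update is the expectile variant of rule A, and the gain update is the first gain-setting method (panel C), iterating toward a fixed average firing rate. Slopes are held fixed for brevity, and all constants are illustrative rather than fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)

n, lr = 20, 0.02
taus = (np.arange(n) + 0.5) / n   # target expectile levels
mids = np.ones(n)                 # positions on the reward axis
slopes = np.full(n, 2.0)          # held fixed here; rule B would adapt them
gains = np.ones(n)
target_rate = 1.0                 # fixed average firing rate for rule C

for _ in range(20_000):
    r = rng.lognormal(mean=0.0, sigma=0.5)
    # Rule A (expectile variant): the step is proportional to the prediction
    # error, scaled asymmetrically so that neuron i converges to the
    # tau_i-th expectile of the reward distribution.
    mids += lr * np.abs(taus - (r <= mids)) * (r - mids)
    # Rule C (first gain-setting method): nudge each gain so that the
    # neuron's average firing rate converges to the fixed target.
    f = gains / (1.0 + np.exp(-slopes * (r - mids)))
    gains += lr * (target_rate - f)

print(np.round(np.sort(mids), 2))
```

Because the expectile update scales with the size of the prediction error, it corresponds to the asymmetric-learning-rate form of distributional RL proposed for dopamine neurons16.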

Supplementary information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Schütt, H.H., Kim, D. & Ma, W.J. Reward prediction error neurons implement an efficient code for reward. Nat Neurosci 27, 1333–1339 (2024). https://doi.org/10.1038/s41593-024-01671-x
