Rare rewards amplify dopamine responses

Abstract

Dopamine prediction error responses are essential components of universal learning mechanisms. However, it is unknown whether individual dopamine neurons reflect the shape of reward distributions. Here, we used symmetrical distributions with differently weighted tails to investigate how the frequency of rewards and reward prediction errors influence dopamine signals. Rare rewards amplified dopamine responses, even when conventional prediction errors were identical, indicating a mechanism for learning the complexities of real-world incentives.
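As a rough illustration of how two symmetrical distributions can yield identical conventional prediction errors while differing in reward frequency, the sketch below uses the three reward volumes named in Extended Data Fig. 2 (0.2, 0.4 and 0.6 ml). The probabilities assigned to each volume are hypothetical placeholders, not values taken from the study, and the prediction error is assumed here to be the reward minus the expected value.

```python
from fractions import Fraction

# Reward volumes in ml (0.2, 0.4, 0.6), kept as exact fractions.
REWARDS_ML = [Fraction(1, 5), Fraction(2, 5), Fraction(3, 5)]

# ASSUMPTION: hypothetical probabilities for illustration only.
# Both are symmetrical around 0.4 ml; the "normal-like" one has lighter tails.
UNIFORM = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
NORMAL_LIKE = [Fraction(2, 15), Fraction(11, 15), Fraction(2, 15)]


def expected_value(probs):
    """Probability-weighted mean reward volume."""
    return sum(p * r for p, r in zip(probs, REWARDS_ML))


def prediction_errors(probs):
    """Conventional prediction error for each volume: reward minus expected value."""
    ev = expected_value(probs)
    return [r - ev for r in REWARDS_ML]


if __name__ == "__main__":
    for name, probs in (("uniform", UNIFORM), ("normal-like", NORMAL_LIKE)):
        print(
            name,
            "EV =", float(expected_value(probs)),
            "PEs =", [float(pe) for pe in prediction_errors(probs)],
            "P(0.6 ml) =", float(probs[-1]),
        )
```

Under these assumed probabilities, a 0.6 ml reward carries the same prediction error in both distributions but is rarer in the normal-like one, which is the kind of comparison the abstract describes.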

Fig. 1: Behavior.
Fig. 2: Rare rewards amplified dopamine reward prediction error responses.
Fig. 3: Dopamine pseudo-populations and single neurons simultaneously reflect predicted probability distributions.

Data availability

The data that support the findings of this study are available from the corresponding author upon request.

Code availability

The code used to analyze these data is available from the corresponding author upon request.

Acknowledgements

We thank A. Bostan for comments and discussion, and J. Breter for animal care and enrichment. This work was supported by the National Institutes of Health (NIMH) grant DP2MH113095 to W.R.S.

Author information

Contributions

K.M.R. and W.R.S. designed the experiment. K.M.R., A.A. and W.R.S. collected data. K.M.R., T.H. and W.R.S. analyzed the data and wrote the paper.

Corresponding author

Correspondence to William R. Stauffer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Dopamine neurons and recording sites.

a, Example dopamine waveform from one of the neurons in our population. b, The population of 40 neurons used for our analyses in the Pavlovian and choice task had significant activations following unpredicted rewards, a characteristic feature of dopamine neurons. The gray bar along the x axis indicates the response window used for analysis. c, Recording locations for the left hemisphere of monkey S. The x axis indicates the lateral-to-medial location in the grid in millimeters, relative to midline (0). The right y axis indicates the posterior-to-anterior location in the grid in millimeters, relative to the interaural line (IAL). Each location’s color indicates the number of neurons recorded at that location. Black circles surrounding individual locations indicate that neurons recorded there were part of the population of 29 neurons that had steeper response slopes in the normal compared to the uniform condition. Bar graphs on the left and top axes indicate the proportion of cells in that AP (left) or ML (top) location that were effect positive. The yellow dot corresponds to the location indicated in the MRI scans shown in e and f. d, Recording locations for the left hemisphere of monkey B. Same as panel c. e, Sagittal view MRI of the recording chamber of monkey S. The purple arrow indicates the AP location in the grid (+12 mm from IAL). f, Coronal view MRI of the recording chamber of monkey S. The purple arrow indicates the ML location in the grid (1 mm from midline). The yellow dot in e and f corresponds to the approximate recording grid location in c.

Extended Data Fig. 2 Normal and Uniform reward size distributions have equivalent subjective values.

a, Schematic of the distribution-predicting fractal cues used to represent Normal (N) and Uniform (U) distributions, and safe values for the choice task in b. Three unique cues were used to predict a Normal distribution of rewards, and three unique cues were used to predict a Uniform distribution of rewards. All the distribution-predicting cues comprised the same three reward volumes (0.2, 0.4 and 0.6 ml), and thus had the same expected value (EV) of 0.4 ml. Additionally, one fractal cue predicted a sure reward of 0.2 ml, and another fractal cue predicted a sure reward of 0.6 ml. b, Monkeys made saccade-guided choices between Normal distribution-predicting cues, Uniform distribution-predicting cues and safe rewards. c, Bar graphs show the probability of choosing the alternate cue over a Uniform distribution-predicting cue with an EV of 0.4 ml. The alternates from left to right on the x axis are a safe cue predicting 0.2 ml, a Normal distribution-predicting cue with a mean of 0.4 ml, and a safe cue predicting 0.6 ml. Data points are from individual blocks, and error bars represent ±SEM across blocks (between 6 and 18 blocks per condition). d, Same as in c, but showing the probability of choosing an alternate cue over a Normal distribution-predicting cue with an EV of 0.4 ml; the middle alternate option represents Uniform distribution-predicting cues with an EV of 0.4 ml. e, The choice task used to measure subjective value. Animals made saccade-directed choices between a distribution-predicting cue and a safe alternative option. The safe alternative option was a value bar with a minimum and maximum of 0 and 0.8 ml at the bottom and top, respectively. The intersection between the horizontal bar and the scale indicated the volume of juice that would be received if monkeys selected the safe cue. f, Probability of choosing the safe cue as a function of the value of the safe option, when the distribution-predicting cue had an EV of 0.4 ml. Dots show average choice probability for 9 safe value options for monkey B. Solid lines are a logistic fit to the data. Red indicates data from Normal distribution blocks, and gray indicates data from Uniform distribution blocks. The dashed horizontal lines indicate subjective equivalence, and the certainty equivalent (CE) for each distribution type is indicated by the dashed vertical lines. g, Same as in f, for monkey S.
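The relationship between a logistic choice curve and the CE can be sketched as follows. This is a minimal illustration and not the authors' fitting procedure: it assumes a two-parameter logistic, estimates it by least-squares regression on the logit of the choice probabilities (rather than, say, maximum likelihood), and uses synthetic data. The function names, the nine evenly spaced safe values and the generating parameters are all hypothetical.

```python
import math


def logistic(x, ce, scale):
    """P(choose safe) for safe value x; equals 0.5 when x == ce."""
    return 1.0 / (1.0 + math.exp(-(x - ce) / scale))


def fit_certainty_equivalent(safe_values, p_safe, eps=1e-6):
    """Estimate (ce, scale) by linear regression on logit-transformed probabilities.

    Since logit(p) = (x - ce) / scale is linear in x, the CE is the x value
    at which the fitted line crosses zero (the point of subjective equivalence).
    """
    logits = []
    for p in p_safe:
        p = min(max(p, eps), 1.0 - eps)  # clip to keep the logit finite
        logits.append(math.log(p / (1.0 - p)))
    n = len(safe_values)
    mean_x = sum(safe_values) / n
    mean_y = sum(logits) / n
    sxx = sum((x - mean_x) ** 2 for x in safe_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(safe_values, logits))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope, 1.0 / slope


if __name__ == "__main__":
    # ASSUMPTION: nine hypothetical safe values spanning 0 to 0.8 ml,
    # with noise-free synthetic choice probabilities generated at CE = 0.4 ml.
    safe_values = [i * 0.1 for i in range(9)]
    p_safe = [logistic(x, ce=0.4, scale=0.08) for x in safe_values]
    ce, scale = fit_certainty_equivalent(safe_values, p_safe)
    print("estimated CE (ml):", round(ce, 3), "scale:", round(scale, 3))
```

If the CEs estimated for the Normal and Uniform cues are similar, the two distributions have equivalent subjective values in the sense of the figure title.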

Extended Data Fig. 3 Reward randomization schemes used to determine trial types.

Top, “CS matched” randomization with equal frequencies of Normal and Uniform trials. Bottom, “PE matched” randomization with equal frequencies of 0.2 ml and 0.6 ml reward trials in each distribution. In both graphs, the y axis represents the probability of drawing the trial type (trial types were drawn with replacement). The six trial types are divided according to distribution type (N and U) and reward size (0.2, 0.4 and 0.6 ml). The number of instances in each trial type “stack” indicates the probability of drawing the trial type.
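Drawing trial types with replacement from a weighted "stack" can be sketched as below. The stack counts are hypothetical, since the caption does not state them, and the reading of "PE matched" used here (0.2 ml and 0.6 ml trials occurring equally often in the Normal and Uniform conditions) is an assumption about the scheme rather than a confirmed detail.

```python
import random

# ASSUMPTION: hypothetical instance counts per trial type, for illustration only.
# Keys are (distribution, reward volume in ml).
CS_MATCHED = {
    ("N", 0.2): 2, ("N", 0.4): 11, ("N", 0.6): 2,   # 15 Normal trials
    ("U", 0.2): 5, ("U", 0.4): 5, ("U", 0.6): 5,    # 15 Uniform trials
}
PE_MATCHED = {
    ("N", 0.2): 2, ("N", 0.4): 11, ("N", 0.6): 2,
    ("U", 0.2): 2, ("U", 0.4): 2, ("U", 0.6): 2,    # tails as frequent as in N
}


def trial_probabilities(stack):
    """Probability of drawing each trial type: its count over the stack total."""
    total = sum(stack.values())
    return {trial: count / total for trial, count in stack.items()}


def draw_trials(stack, n_trials, seed=None):
    """Draw n_trials trial types with replacement, weighted by stack counts."""
    rng = random.Random(seed)
    trial_types = list(stack)
    weights = [stack[t] for t in trial_types]
    return rng.choices(trial_types, weights=weights, k=n_trials)


if __name__ == "__main__":
    for name, stack in (("CS matched", CS_MATCHED), ("PE matched", PE_MATCHED)):
        print(name, trial_probabilities(stack))
        print("  first draws:", draw_trials(stack, 5, seed=0))
```

Because draws are made with replacement, the realized frequencies in any one session only approximate the stack probabilities.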

Extended Data Fig. 4 Amplification effect was robust.

Box and whisker plots show the baseline-subtracted responses to 0.2 and 0.6 ml of juice, as in Fig. 3b, but applied to all 34 neurons that were significantly modulated by value. * indicates p < 0.05, ** indicates p < 0.01, N = 34 neurons, Wilcoxon signed-rank test, Bonferroni corrected for multiple comparisons. Box and whisker plots show the median (line), quartiles (boxes), range (whiskers) and outliers (+).
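The statistical procedure named in this caption can be sketched in pure Python as follows. This is not the authors' analysis code: it is a minimal exact two-sided Wilcoxon signed-rank test that assumes no zero differences and no tied absolute differences, followed by a Bonferroni adjustment. The response values and the number of comparisons (2) are hypothetical.

```python
def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Returns (w_plus, p_value). Assumes no zero differences and no ties
    among the absolute differences; raises ValueError otherwise.
    """
    diffs = [a - b for a, b in zip(x, y)]
    abs_sorted = sorted(abs(d) for d in diffs)
    if 0 in abs_sorted or len(set(abs_sorted)) != len(abs_sorted):
        raise ValueError("this sketch does not handle zeros or ties")
    rank = {value: i + 1 for i, value in enumerate(abs_sorted)}
    n = len(diffs)
    w_plus = sum(rank[abs(d)] for d in diffs if d > 0)

    # Null distribution: counts[s] = number of sign assignments whose
    # positive ranks sum to s (subset-sum dynamic programming over ranks 1..n).
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    w_min = min(w_plus, max_sum - w_plus)
    p_one_sided = sum(counts[: w_min + 1]) / 2 ** n
    return w_plus, min(1.0, 2 * p_one_sided)


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


if __name__ == "__main__":
    # ASSUMPTION: hypothetical paired responses (arbitrary units) from 8 neurons.
    rare = [12, 15, 9, 14, 11, 16, 13, 10]
    common = [11, 13, 6, 10, 6, 10, 6, 2]
    w_plus, p = wilcoxon_signed_rank_exact(rare, common)
    print("W+ =", w_plus, "p =", p, "Bonferroni p =", bonferroni(p, 2))
```

For real data with ties or zero differences, a library routine such as `scipy.stats.wilcoxon` would be a more appropriate choice than this sketch.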

Cite this article

Rothenhoefer, K.M., Hong, T., Alikaya, A. et al. Rare rewards amplify dopamine responses. Nat Neurosci 24, 465–469 (2021). https://doi.org/10.1038/s41593-021-00807-7
