


Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model

Abstract

Dopamine neuron activity is tied to the prediction error in temporal difference reinforcement learning models. These models make significant simplifying assumptions, particularly with regard to the structure of the predictions fed into the dopamine neurons, which consist of a single chain of timepoint states. Although this predictive structure can explain error signals observed in many studies, it cannot cope with settings where subjects might infer multiple independent events and outcomes. In the present study, we recorded dopamine neurons in the ventral tegmental area in such a setting to test the validity of the single-stream assumption. Rats were trained in an odor-based choice task, in which the timing and identity of one of several rewards delivered in each trial changed across trial blocks. This design revealed an error signaling pattern that requires the dopamine neurons to access and update multiple independent predictive streams reflecting the subject’s belief about timing and potentially unique identities of expected rewards.
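To make the single-stream assumption concrete, here is a minimal sketch (in Python, with hypothetical parameter values; it is not the authors' code from the repositories listed below) of a classic TDRL learner in which a single chain of timepoint states carries the reward prediction and the TD error stands in for the dopamine signal:

```python
import numpy as np

# Minimal single-thread TD(0) sketch: one chain of timepoint states carries the
# reward prediction, and the TD error stands in for the dopamine signal.
# All parameter values are hypothetical and chosen only for illustration.

n_timepoints = 20            # states tiling the interval from cue to reward
alpha, gamma = 0.1, 0.98     # learning rate and temporal discount (assumed)
V = np.zeros(n_timepoints)   # learned value of each timepoint state

def run_trial(reward_time, reward_size):
    """Step through one trial; return the TD error at each timepoint."""
    deltas = np.zeros(n_timepoints)
    for t in range(n_timepoints):
        r = reward_size if t == reward_time else 0.0
        v_next = V[t + 1] if t + 1 < n_timepoints else 0.0
        delta = r + gamma * v_next - V[t]   # prediction error
        V[t] += alpha * delta               # value update
        deltas[t] = delta
    return deltas

# After training, the positive error migrates back toward the cue; omitting the
# reward then produces a negative error at the previously learned reward time.
for _ in range(200):
    run_trial(reward_time=15, reward_size=1.0)
print(run_trial(reward_time=15, reward_size=0.0))   # omission probe
```

Because this learner maintains only one predictive stream, it has no way to represent several independently timed or flavored rewards, which is the limitation the task described below is designed to probe.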


Fig. 1: Task design, behavior and identification of putative dopamine neurons.
Fig. 2: Changes in activity of reward-responsive dopamine neurons to shifts in reward timing and flavor.
Fig. 3: Correlations between activity on unexpected reward delivery and omission.
Fig. 4: Changes in activity of reward-responsive dopamine neurons in response to omission of long reward.
Fig. 5: Putative reset mechanisms for the classic TDRL model.
Fig. 6: Comparison of changes in activity of reward-responsive dopamine neurons to terminal reward and short reward in delay-only blocks.
Fig. 7: Comparison of changes in activity of reward-responsive dopamine neurons to terminal reward and short reward in delay-only blocks.
Fig. 8: Prediction error responses in a TDRL model with multithreaded predictions are sensitive to reward identity.


Data availability

The dataset and all scripts used in the present study are available at https://github.com/YouGTakahashi/ultra_delay_analysis for the unit analyses and at https://github.com/ajlangdon/multithreadTD for the modeling.

Code availability

The dataset and all scripts used in the present study are available at https://github.com/YouGTakahashi/ultra_delay_analysis for the unit analyses and at https://github.com/ajlangdon/multithreadTD for the modeling.


Acknowledgements

This work was supported by grant no. ZIA-DA000587 (to G.S.) at the Intramural Research Program of the National Institute on Drug Abuse, the Intramural Research Program of the National Institute of Mental Health (to A.J.L.) and by grant no. U01DA050647 (to A.J.L.) from the National Institute on Drug Abuse. The opinions expressed in this article are the authors’ own and do not reflect the view of the National Institutes of Health/Department of Health and Human Services.

Author information


Contributions

Y.K.T., T.A.S. and G.S. designed the experiments. Y.K.T. and L.E.M. conducted the behavioral training and single-unit recording. S.K.H. and A.J.L. conducted the modeling. Y.K.T., A.J.L. and G.S. interpreted the data and wrote the manuscript with input from the other authors.

Corresponding authors

Correspondence to Yuji K. Takahashi, Angela J. Langdon or Geoffrey Schoenbaum.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Neuroscience thanks Okihide Hikosaka, Kevin Miller and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Activity to the odor cues changes with value and free choice behavior across trial blocks and does so similarly for the two block types.

(a) Average firing in the reward-responsive dopamine neurons during presentation of the high- and low-value cues. Average firing is plotted for the first and last 5 trials of the Delay-Only and Delay-Flavor blocks. Activity increased to the high-value cue, paired with the early reward, across each block (three-way ANOVA, trial × value, F(9,1071) = 9.46, p < 0.01), and there was no difference in this pattern across switch types (Fs < 1.86, ps > 0.06). Error bars represent SEM. n = 120 cells collected from 8 independent rats. (b) Relationship between the change in firing to the high- and low-value cues and the change in free-choice behavior. The difference in firing to the high- and low-value cues in the first (blue) and last (red) 5 trials of all blocks is plotted against the percentage of choices of the short reward during these same trials. The two measures were strongly correlated (scatter plot), reflecting the shift in both measures from early to late (blue to red in the distribution plots) within each block (two-sided Wilcoxon rank-sum test, cue, p < 0.01; choice, p < 0.01). n = 120 cells collected from 8 independent rats. Note that all models implemented in the main text also produced changes in the cue signal that differed by value (see Extended Data Fig. 5).

Extended Data Fig. 2 Changes in activity of reward-responsive dopamine neurons to omission of a delayed reward in delay-only and delay-flavor blocks do not depend on the order of switches.

Displays as in Fig. 5a of the main text except that (a) shows data from blocks 2 and 3, in which the delay-flavor block preceded the delay-only block, and (b) shows data from blocks 4 and 5, in which the delay-only block preceded the delay-flavor block. Statistics in each panel indicate the results of the Wilcoxon signed-rank test (p) and the average difference score (u). Comparisons of the distributions in panels a and b showed that they were not different (two-sided Wilcoxon rank-sum test) within either delay-only (p = 0.48) or delay-flavor switches (p = 0.60). n = 120 cells collected from 8 independent rats.

Extended Data Fig. 3 Changes in activity of reward-responsive dopamine neurons to omission of a delayed reward in delay-only and delay-flavor blocks do not depend on non-significant numerical differences in subjects’ consumption of the two flavors (Fig. 1c).

Displays as in Fig. 5a of the main text except that (a) shows data involving omission of the numerically higher reward and (b) shows data involving omission of the numerically lower reward. Statistics in each panel indicate the results of the Wilcoxon signed-rank test (p) and the average difference score (u). Comparisons of the distributions in panels a and b showed that they were not different (two-sided Wilcoxon rank-sum test) within either delay-only (p = 0.29) or delay-flavor switches (p = 0.88). n = 120 cells collected from 8 independent rats.

Extended Data Fig. 4 Licking behavior during omission of long reward.

(a) Average lick rate in the 2 s after omission of the delayed reward, on the first trial and averaged over the last 5 trials, in the Delay-Only (dark blue) and Delay-Flavor (light blue) switches. A two-way ANOVA (early/late × switch type) revealed a significant main effect of early/late (F(1,51) = 9.47, p < 0.01) and a significant interaction between early/late and switch type (F(1,51) = 5.18, p < 0.05). A step-down comparison revealed that lick rates on the first trial were significantly higher than those on the last trials after a Delay-Flavor switch (F(1,51) = 9.90, p < 0.01, light blue), but not after a Delay-Only switch (F(1,51) = 0.54, p > 0.10, dark blue). Lick rates on the first trial after a Delay-Flavor switch were significantly higher than those on the first trial after a Delay-Only switch (two-way ANOVA, F(1,51) = 4.11, p = 0.04), but not during the last 5 trials (F(1,51) = 0.85, p > 0.10). Error bars represent SEM. n = 53 sessions collected from 8 independent rats. (b) Distributions of difference scores comparing lick rates on the first and last trials after Delay-Only (left) and Delay-Flavor (right) switches. The numbers in each panel indicate the results of the two-sided Wilcoxon signed-rank test (p) and the average difference score (u). n = 53 sessions collected from 8 independent rats. A schematic of this style of analysis is sketched below.
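As a schematic of the difference-score comparison described in this caption (a hedged sketch; the values, the form of the normalization and the variable names are hypothetical and are not taken from the authors' analysis scripts), a two-sided Wilcoxon signed-rank test on per-session lick rates could be run as follows:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-session lick rates (Hz) in the 2 s after reward omission;
# 53 sessions, mirroring the caption. Values are simulated, not real data.
rng = np.random.default_rng(0)
first_trial = rng.gamma(shape=2.0, scale=1.5, size=53)   # first trial after the switch
last_trials = rng.gamma(shape=2.0, scale=1.0, size=53)   # mean of the last 5 trials

# One common normalized difference score (an assumed form, not necessarily the
# authors'): positive values indicate more licking on the first trial.
diff = (first_trial - last_trials) / (first_trial + last_trials)

stat, p = wilcoxon(first_trial, last_trials, alternative="two-sided")
print(f"average difference score u = {diff.mean():.3f}, signed-rank p = {p:.3f}")
```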

Extended Data Fig. 5 Simulated prediction error response to the cue in high and low value blocks for each TDRL model.

Simulated reward prediction error responses to the cue in the (a) single-thread TDRL model without reset, (b) single-thread TDRL model with reset, (c) single-thread TDRL model with sequential reset, (d) single-thread TDRL model with delay-specific sequential reset and (e) multithread TDRL model for each of the delay-only and delay-flavor block switches. All models predicted qualitatively similar changes in activity to the high-value (blue) and low-value (red) cues across the first and last 5 trials of delay-only and delay-flavor block switches. A toy sketch of the multithread variant follows.
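For orientation, the sketch below illustrates one way the multithread idea could be implemented (an assumption-laden toy in Python, not the model code in the multithreadTD repository): each predicted reward is tracked by its own chain of timepoint states, the dopamine-like error combines the per-thread TD errors, and shifting or omitting one reward therefore perturbs only its own thread.

```python
import numpy as np

# Toy multithread TD sketch (an illustrative assumption, not the published model):
# each predicted reward -- here a "short" and a delayed "long" reward -- has its
# own value chain, and the dopamine-like error sums the per-thread TD errors.

n_t, alpha, gamma = 30, 0.1, 0.98
threads = {"short": np.zeros(n_t), "long": np.zeros(n_t)}   # one value chain per reward

def trial(rewards):
    """rewards maps each thread name to (time, size); use size 0.0 for omission."""
    combined_delta = np.zeros(n_t)
    for name, V in threads.items():
        t_r, size = rewards[name]
        for t in range(n_t):
            r = size if t == t_r else 0.0
            v_next = V[t + 1] if t + 1 < n_t else 0.0
            delta = r + gamma * v_next - V[t]
            V[t] += alpha * delta
            combined_delta[t] += delta      # errors combined across threads
    return combined_delta

# Train with both rewards, then omit only the long reward: the negative error
# appears at the long reward's time while the short-reward prediction is untouched.
for _ in range(200):
    trial({"short": (5, 1.0), "long": (25, 1.0)})
print(trial({"short": (5, 1.0), "long": (25, 0.0)}))
```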



About this article


Cite this article

Takahashi, Y.K., Stalnaker, T.A., Mueller, L.E. et al. Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model. Nat Neurosci 26, 830–839 (2023). https://doi.org/10.1038/s41593-023-01310-x

