Abstract
Dopamine neuron activity is tied to the prediction error in temporal difference reinforcement learning models. These models make significant simplifying assumptions, particularly with regard to the structure of the predictions fed into the dopamine neurons, which consist of a single chain of timepoint states. Although this predictive structure can explain error signals observed in many studies, it cannot cope with settings where subjects might infer multiple independent events and outcomes. In the present study, we recorded dopamine neurons in the ventral tegmental area in such a setting to test the validity of the single-stream assumption. Rats were trained in an odor-based choice task, in which the timing and identity of one of several rewards delivered in each trial changed across trial blocks. This design revealed an error signaling pattern that requires the dopamine neurons to access and update multiple independent predictive streams reflecting the subject’s belief about timing and potentially unique identities of expected rewards.
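For readers unfamiliar with the single-stream assumption the abstract refers to, the following is a minimal, hypothetical sketch (not the authors' implementation) of temporal difference learning over a single chain of timepoint states, in which the per-timepoint TD error stands in for the dopamine-like signal:

```python
# Illustrative sketch only: the classic TDRL assumption represents the
# cue-reward interval as one chain of timepoint states, each with its
# own learned value, updated by the TD prediction error at that step.
def learn_chain(n_steps=10, reward_step=6, n_trials=200,
                alpha=0.1, gamma=0.98):
    """Run TD(0) over a single chain of timepoint states and return
    the learned per-timepoint values (all parameters are arbitrary
    illustration choices, not values from the paper)."""
    V = [0.0] * (n_steps + 1)          # value of each timepoint state
    for _ in range(n_trials):
        for t in range(n_steps):
            r = 1.0 if t == reward_step else 0.0
            delta = r + gamma * V[t + 1] - V[t]   # TD prediction error
            V[t] += alpha * delta
    return V

V = learn_chain()
```

Once the reward becomes predicted, the error at the reward timepoint shrinks toward zero while value backs up toward the cue; the question posed here is what such a single chain can and cannot represent when multiple independently timed rewards are expected.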
Data availability
The dataset and all scripts used in the present study are available at https://github.com/YouGTakahashi/ultra_delay_analysis for the unit analyses and at https://github.com/ajlangdon/multithreadTD for the modeling.
Code availability
All analysis scripts used in the present study are available at https://github.com/YouGTakahashi/ultra_delay_analysis for the unit analyses and at https://github.com/ajlangdon/multithreadTD for the modeling.
Acknowledgements
This work was supported by grant no. Z1A-DA000587 (to G.S.) at the Intramural Research Program of the National Institute on Drug Abuse, the Intramural Research Program of the National Institute of Mental Health (to A.J.L.) and by grant no. U01DA050647 (to A.J.L.) from the National Institute on Drug Abuse. The opinions expressed in this article are the authors’ own and do not reflect the view of the National Institutes of Health/Department of Health and Human Services.
Author information
Contributions
Y.K.T., T.A.S. and G.S. designed the experiments. Y.K.T. and L.E.M. conducted the behavioral training and single-unit recording. S.K.H. and A.J.L. conducted the modeling. Y.K.T., A.J.L. and G.S. interpreted the data and wrote the manuscript with input from the other authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Neuroscience thanks Okihide Hikosaka, Kevin Miller and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Extended data
Extended Data Fig. 1 Activity evoked by the odor cues changes with value and free-choice behavior across trial blocks, and does so similarly for the two block types.
(a) Average firing in the reward-responsive dopamine neurons during presentation of the high and low value cues. Average firing is plotted for the first and last 5 trials of the Delay-Only and Delay-Flavor blocks. Activity increased to the high value cue, paired with the early reward, across each block (three-way ANOVA, trial × value, F9,1071 = 9.46, p < 0.01), and there was no difference in the pattern across switch types (F’s < 1.86, p’s > 0.06). Error bars represent SEM. n = 120 cells collected from 8 independent rats. (b) Relationship between the change in firing to the high and low value cues and the change in free-choice behavior. The difference in firing to the high and low value cues in the first (blue) and last (red) 5 trials of all the blocks is plotted against the percentage of choices of the short reward during these same trials. The two measures were strongly correlated (scatter plot), reflecting the shift in both measures from early to late (blue to red in the distribution plots) within each block (two-sided Wilcoxon rank-sum test, cue, p < 0.01; choice, p < 0.01). n = 120 cells collected from 8 independent rats. Note that all models implemented in the main text also produced changes in signal to the cues that differed by value (see Extended Data Fig. 5).
Extended Data Fig. 2 Changes in the activity of reward-responsive dopamine neurons to omission of a delayed reward in delay-only and delay-flavor blocks do not depend on the order of switches.
Displays as in Fig. 5a of the main text, except that (a) shows data from blocks 2 and 3, in which the delay-flavor block preceded the delay-only block, and (b) shows data from blocks 4 and 5, in which the delay-only block preceded the delay-flavor block. Statistics in each panel indicate the results of the Wilcoxon signed-rank test (p) and the average difference score (u). Comparisons of the distributions in panels a and b showed that they were not different (two-sided Wilcoxon rank-sum test) within either delay-only (p = 0.48) or delay-flavor switches (p = 0.60). n = 120 cells collected from 8 independent rats.
Extended Data Fig. 3 Changes in the activity of reward-responsive dopamine neurons to omission of a delayed reward in delay-only and delay-flavor blocks do not depend on non-significant numerical differences in subjects’ consumption of the two flavors (Fig. 1c).
Displays as in Fig. 5a of the main text, except that (a) shows data involving omission of the numerically higher reward and (b) shows data involving omission of the numerically lower reward. Statistics in each panel indicate the results of the Wilcoxon signed-rank test (p) and the average difference score (u). Comparisons of the distributions in panels a and b showed that they were not different (two-sided Wilcoxon rank-sum test) within either delay-only (p = 0.29) or delay-flavor switches (p = 0.88). n = 120 cells collected from 8 independent rats.
Extended Data Fig. 4 Licking behavior during omission of the long reward.
(a) Average lick rate in the 2 s after omission of the delayed reward, on the first trial and averaged over the last 5 trials, in the Delay-Only (dark blue) and Delay-Flavor (light blue) switches. A two-way ANOVA (early/late × switch type) revealed a significant main effect of early/late (F1,51 = 9.47, p < 0.01) and a significant interaction between early/late and switch type (F1,51 = 5.18, p < 0.05). A step-down comparison revealed that lick rates on the first trial were significantly higher than those on the last trials after a Delay-Flavor switch (F1,51 = 9.90, p < 0.01, light blue), but not after a Delay-Only switch (F1,51 = 0.54, p > 0.10, dark blue). Lick rates on the first trial after a Delay-Flavor switch were significantly higher than those on the first trial after a Delay-Only switch (two-way ANOVA, F1,51 = 4.11, p = 0.04), but not during the last 5 trials (F1,51 = 0.85, p > 0.10). Error bars represent SEM. n = 53 sessions collected from 8 independent rats. (b) Distributions of difference scores comparing lick rates on the first and last trials after Delay-Only (left) and Delay-Flavor (right) switches. The numbers in each panel indicate the results of the two-sided Wilcoxon signed-rank test (p) and the average difference score (u). n = 53 sessions collected from 8 independent rats.
Extended Data Fig. 5 Simulated prediction error response to the cue in high and low value blocks for each TDRL model.
Simulated reward prediction error responses to the cue in the (a) single-thread TDRL model without reset, (b) single-thread TDRL model with reset, (c) single-thread TDRL model with sequential reset, (d) single-thread TDRL model with delay-specific sequential reset and (e) multithread TDRL model, for each of the delay and delay-flavor block switches. All models predicted qualitatively similar changes in activity to the high value (blue) and low value (red) cues across the first and last 5 trials of delay-only and delay-flavor block switches.
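To make the "reset" distinction among these model variants concrete, here is a toy sketch (an illustrative assumption, not the authors' code; it omits the sequential and delay-specific variants) of how clearing the learned values of a single-thread chain at a block switch changes the first post-switch prediction errors:

```python
def td_trial(V, reward_step, alpha=0.1, gamma=0.98):
    """One TD(0) sweep over a single chain of timepoint states.
    Returns the prediction error at each timepoint (parameters are
    arbitrary illustration choices, not values from the paper)."""
    deltas = []
    for t in range(len(V) - 1):
        r = 1.0 if t == reward_step else 0.0
        d = r + gamma * V[t + 1] - V[t]   # TD error at timepoint t
        deltas.append(d)
        V[t] += alpha * d
    return deltas

# Train to asymptote with reward delivered at timepoint 6.
V = [0.0] * 11
for _ in range(300):
    td_trial(V, reward_step=6)

# Block switch: the reward now arrives at timepoint 3.
no_reset = td_trial(list(V), reward_step=3)       # values carried over
with_reset = td_trial([0.0] * 11, reward_step=3)  # values cleared
```

Without a reset, the first post-switch trial produces a positive error at the new reward time and a large negative (omission) error at the old one; with a reset, the stale prediction is discarded, so the omission error at the old reward time is absent.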
About this article
Cite this article
Takahashi, Y.K., Stalnaker, T.A., Mueller, L.E. et al. Dopaminergic prediction errors in the ventral tegmental area reflect a multithreaded predictive model. Nat Neurosci 26, 830–839 (2023). https://doi.org/10.1038/s41593-023-01310-x