Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors

Maes, Etienne J. P.; Sharpe, Melissa J.; Usypchuk, Alexandra A.; Lozzi, Megan; Chang, Chun Yun; Gardner, Matthew P. H.; Schoenbaum, Geoffrey; Iordanova, Mihaela D.

doi:10.1038/s41593-019-0574-1

Brief Communication
Published: 20 January 2020

Nature Neuroscience volume 23, pages 176–178 (2020)

Abstract

Reward-evoked dopamine transients are well established as prediction errors. However, the central tenet of temporal difference accounts—that similar transients evoked by reward-predictive cues also function as errors—remains untested. In the present communication we addressed this by showing that optogenetically shunting dopamine activity at the start of a reward-predicting cue prevents second-order conditioning without affecting blocking. These results indicate that cue-evoked transients function as temporal-difference prediction errors rather than reward predictions.
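For orientation, the temporal difference prediction error referred to above is commonly written in the following textbook form. The notation is the standard one from the reinforcement learning literature and is an assumption here; it is not taken from the paper's own model specification.

$$\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$$

Here $r_{t+1}$ is the reward received on the transition, $V(s)$ is the learned value of state $s$, and $\gamma \in [0, 1]$ is a discount factor. When a reward-predicting cue appears unexpectedly, no reward has yet been delivered ($r_{t+1} = 0$) and the preceding state carries little value ($V(s_t) \approx 0$), so the error reduces to $\delta_t \approx \gamma V(s_{\text{cue}}) > 0$. On this account the cue-evoked transient is itself an error signal, which is why it should be able to reinforce an earlier cue, as in second-order conditioning.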

**Fig. 2: The cue-evoked dopamine transient is necessary for SOC.**

**Fig. 3: The cue-evoked dopamine transient is not necessary for blocking.**

Data availability

Behavioral data will be made available upon reasonable request.

Code availability

Simulations were performed using custom-written functions in MATLAB (Mathworks), which are posted on Github (https://github.com/mphgardner/Basic_Pavlovian_TDRL/tree/Maes_2018).

Acknowledgements

This work was supported by the Intramural Research Program at the NIDA; the Canada Research Chairs program (to M.D.I.); a Natural Sciences and Engineering Research Council of Canada Discovery Grant (to M.D.I.); a Natural Sciences and Engineering Research Council of Canada Undergraduate Student Research Award (to E.J.P.M.); and a Concordia University Undergraduate Research Award (to A.A.U.). The opinions expressed in this article are our own and do not reflect the view of the NIH/DHHS.

Author information

Authors and Affiliations

Department of Psychology/Centre for Studies in Behavioural Neurobiology, Concordia University, Montreal, Quebec, Canada
Etienne J. P. Maes, Alexandra A. Usypchuk, Megan Lozzi & Mihaela D. Iordanova
Department of Psychology, University of California, Los Angeles, CA, USA
Melissa J. Sharpe
Intramural Research Program, National Institute on Drug Abuse, Baltimore, MD, USA
Chun Yun Chang, Matthew P. H. Gardner & Geoffrey Schoenbaum
Departments of Anatomy & Neurobiology and Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA
Geoffrey Schoenbaum
Solomon H. Snyder Department of Neuroscience, The Johns Hopkins University, Baltimore, MD, USA
Geoffrey Schoenbaum

Contributions

E.J.P.M., M.J.S., G.S. and M.D.I. conceived and designed the experiments. E.J.P.M., A.A.U. and M.L. carried out the surgical procedures and collected the behavioral data. C.Y.C., E.J.P.M., A.A.U. and M.L. supervised the immunohistologic verification of virus expression and fiber placement. M.P.H.G. conducted the computational modeling. M.J.S. and M.D.I. analyzed the data. G.S. and M.D.I. interpreted the data and wrote the manuscript with input from the other authors.

Corresponding authors

Correspondence to Geoffrey Schoenbaum or Mihaela D. Iordanova.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Neuroscience thanks S. Floresco and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Complete Designs and Modeling Data.

Experimental design for within-subjects blocking and second-order conditioning as used in our study, along with graphs modeling the predicted results of shunting of the dopamine transient at the start of the reward-predictive cue, A, in each procedure. In Model 1 the VTA DA signal encodes a prediction error and in Model 2 it encodes a reward prediction. Bar graphs are reproduced from Fig. 1 in the main text; other panels model results of training in the other phases. Note the output of the classic TDRL model was converted from V to conditioned responding (CR) to better reflect the behavioral output actually measured in our experiments. The major impact of the neural manipulation was on responding to X in Model 2. Elimination of the prediction on AX trials in this model causes a positive prediction error on reward delivery in the blocking phase. This results in unblocking of X.
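The unblocking argument in this legend can be illustrated with a deliberately minimal Python sketch. It is not the authors' MATLAB TDRL code: the trial structure is collapsed to a single AX → reward step, and the function name, learning rate, trial count and the way inhibition scales the signal (`eta`) are all illustrative assumptions.

```python
def simulate_blocking(model, eta, n_trials=8, alpha=0.1, w_a=1.0):
    """Return the associative weight X acquires over AX -> reward trials.

    model: "error"      -> dopamine carries the prediction error (Model 1)
           "prediction" -> dopamine carries the reward prediction (Model 2)
    eta:   assumed inhibition strength at cue onset, 0 (none) to 1 (complete)
    """
    if model not in ("error", "prediction"):
        raise ValueError("model must be 'error' or 'prediction'")
    w = {"A": w_a, "X": 0.0}  # A is pretrained to predict reward; X is novel
    reward = 1.0
    for _ in range(n_trials):
        v_compound = w["A"] + w["X"]
        if model == "prediction":
            # Shunting removes part of the prediction carried by the compound,
            # so the reward is no longer fully predicted when it arrives.
            v_compound *= 1.0 - eta
        # Under the "error" model the shunted signal is the error at cue onset.
        # That error credits whatever preceded the cue, not A or X, so the
        # error at reward delivery below is left unchanged.
        delta = reward - v_compound
        for cue in w:
            w[cue] += alpha * delta
    return w["X"]


for model in ("error", "prediction"):
    for eta in (0.0, 0.5, 1.0):
        print(model, eta, round(simulate_blocking(model, eta), 3))
```

Under these assumptions X stays at zero in the error model at every inhibition strength (blocking is intact), whereas in the prediction model X gains strength as `eta` increases (unblocking). The sketch is kept to a handful of trials because, with `eta = 1`, this simplified prediction model never cancels the reward and its weights would otherwise keep growing.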

Extended Data Fig. 2 Experimental designs for within-subjects blocking and second-order conditioning as used in our study with NpHR and eYFP rats.

NpHR: During Conditioning, responding to A but not B increased across days, and responding was higher to A than to B. During Blocking, responding to the control compound (DZ) was lower than responding to the blocking compounds (AX, AY) at the start of training but equivalent by the end, with no difference between the blocking compounds. Responding during the first trial of the Probe Test showed evidence of blocking (X and Y vs. Z) and no difference between the blocking cues (X vs. Y; see Fig. 3 legend for statistics). These differences disappeared on subsequent trials. Responding to the retrained cue A increased across reminder (Rmdr) trials, while responding to C (i.e., C→A trials) and D (i.e., D→A trials) did not differ across second-order training. On Probe Test, responding to C was lower than responding to D (see Fig. 2 legend for statistics) on the first trial as well as across the entire test. eYFP: The pattern of data for the eYFP rats was similar to that for the NpHR rats, with one critical exception: there was no difference between C and D on Probe Test in the eYFP rats. Some data are reproduced from Figs. 2 and 3 in the main text. CR, or conditioned responding, is the percent time spent in the magazine during the last 5 s of the cue. Drawings to the left illustrate the extent of expression of NpHR and eYFP and the location of fiber tips within VTA.

Extended Data Fig. 3 The cue-evoked dopamine transient is necessary for second-order conditioning in naïve rats.

Drawings to the left illustrate the extent of expression of NpHR and location of fiber tips within VTA. The three panels of behavioral responding show behavioral data across the three phases of the second-order conditioning experiment represented using three different CRs (top—percent time spent in the magazine; middle—cumulative head entries during the CS across a single day of training; bottom—percent trials containing a head entry). Behavioral responding during A increased during Conditioning (see methods for statistics). Responding to C (i.e., C→A trials) and D (i.e., D→A trials) did not differ (see methods) during second-order training when shunting of VTA transients took place at the start of the reward-predictive cue, A. On Test, responding to C was lower compared to D (see methods for statistics for each of the CRs), showing that inhibition of the VTA DA signal at the start of A prevented A from supporting second-order conditioning to C whereas identical inhibition during the ITI left learning to D intact.
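The three response measures named in this legend could be computed from timestamped magazine visits roughly as sketched below. The data layout (a list of `(entry, exit)` times per trial, in seconds from cue onset), the cue duration, the 5 s scoring window carried over from the Extended Data Fig. 2 legend, and the averaging across trials are all assumptions made for illustration; the source does not specify how the raw data were stored or aggregated.

```python
def overlap(start, end, win_start, win_end):
    """Seconds of the interval [start, end] that fall inside the window."""
    return max(0.0, min(end, win_end) - max(start, win_start))


def cr_measures(trials, cue_duration, window=5.0):
    """Summarize conditioned responding for one cue on one day.

    trials: list of trials; each trial is a list of (entry, exit) times in
            seconds relative to cue onset (hypothetical data layout).
    """
    win_start = cue_duration - window  # score only the end of the cue
    pct_time_per_trial = []
    total_entries = 0
    trials_with_entry = 0
    for visits in trials:
        in_window = sum(overlap(s, e, win_start, cue_duration) for s, e in visits)
        pct_time_per_trial.append(100.0 * in_window / window)
        # Count head entries that begin while the cue is on.
        entries = sum(1 for s, _ in visits if 0.0 <= s < cue_duration)
        total_entries += entries
        trials_with_entry += 1 if entries > 0 else 0
    return {
        "percent_time_in_magazine": sum(pct_time_per_trial) / len(trials),
        "cumulative_head_entries": total_entries,
        "percent_trials_with_entry": 100.0 * trials_with_entry / len(trials),
    }


# Three made-up trials with a hypothetical 10 s cue.
example = [[(6.0, 8.5)], [], [(1.0, 2.0), (7.0, 12.0)]]
summary = cr_measures(example, cue_duration=10.0)
print(summary)
```

In the made-up example, the second trial contains no visit and the third contains a visit that outlasts the cue, so only the portion inside the scoring window counts toward percent time.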

Extended Data Fig. 4 Modeling data for the Blocking and Second-order experiments with different strengths of neuronal inhibition.

The modeling data show how different inhibition strengths (i.e., η = 0, 0.5, 1 as used in the models, see also Figure S1) affect the predicted conditioned responding on Probe Test across the different models. Model Control represents eYFP controls in which inhibition is not effective. Model Error represents the dopamine signal acting as a prediction error, in which increases in inhibition strength do not affect conditioned responding to X in blocking but lead to reduced conditioned responding to C in second-order conditioning. Model V represents the dopamine signal as a prediction, in which increases in inhibition strength lead to greater conditioned responding to X in blocking (i.e., unblocking) and reduced conditioned responding to C in second-order conditioning.
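The second-order conditioning half of this comparison can be sketched in Python as a sweep over the same three inhibition strengths. This is a simplified stand-in rather than the authors' simulation code: each trial is reduced to a C → A transition followed by A ending without reward, and the function name, the parameters and the way `eta` enters each model are illustrative assumptions.

```python
def simulate_soc(model, eta, n_trials=8, alpha=0.1, gamma=0.95, w_a=1.0):
    """Return the weight C acquires over unrewarded C -> A pairings.

    model: "error"      -> inhibition scales the TD error at A onset
           "prediction" -> inhibition scales the value predicted by A
    eta:   assumed inhibition strength, 0 (none) to 1 (complete)
    """
    if model not in ("error", "prediction"):
        raise ValueError("model must be 'error' or 'prediction'")
    w_c = 0.0
    for _ in range(n_trials):
        # Transition C -> A: the value of A is what reinforces C.
        if model == "error":
            delta = (1.0 - eta) * (gamma * w_a - w_c)
        else:
            delta = gamma * (1.0 - eta) * w_a - w_c
        w_c += alpha * delta
        # Transition A -> nothing: A is unrewarded in this phase, so it
        # extinguishes a little on every trial.
        w_a += alpha * (0.0 - w_a)
    return w_c


results = {
    model: [simulate_soc(model, eta) for eta in (0.0, 0.5, 1.0)]
    for model in ("error", "prediction")
}
for model, values in results.items():
    print(model, [round(v, 3) for v in values])
```

In this sketch both models predict weaker responding to C as `eta` grows, so second-order conditioning alone does not separate them; it is the blocking prediction for X that distinguishes the two accounts.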

Supplementary information

Reporting Summary

Supplementary Software

Code used for the simulations presented in the article.

About this article

Cite this article

Maes, E.J.P., Sharpe, M.J., Usypchuk, A.A. et al. Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nat Neurosci 23, 176–178 (2020). https://doi.org/10.1038/s41593-019-0574-1

Received: 09 January 2019
Accepted: 09 December 2019
Published: 20 January 2020
Issue Date: February 2020
DOI: https://doi.org/10.1038/s41593-019-0574-1
