Dopamine neurons report an error in the temporal prediction of reward during learning

Hollerman, Jeffrey R.; Schultz, Wolfram

doi:10.1038/1124

Article
Published: August 1998

Jeffrey R. Hollerman¹,² & Wolfram Schultz¹

Nature Neuroscience volume 1, pages 304–309 (1998)

Abstract

Many behaviors are affected by rewards, undergoing long-term changes when rewards are different than predicted but remaining unchanged when rewards occur exactly as predicted. The discrepancy between reward occurrence and reward prediction is termed an 'error in reward prediction'. Dopamine neurons in the substantia nigra and the ventral tegmental area are believed to be involved in reward-dependent behaviors. Consistent with this role, they are activated by rewards, and because they are activated more strongly by unpredicted than by predicted rewards they may play a role in learning. The present study investigated whether monkey dopamine neurons code an error in reward prediction during the course of learning. Dopamine neuron responses reflected the changes in reward prediction during individual learning episodes; dopamine neurons were activated by rewards during early trials, when errors were frequent and rewards unpredictable, but activation was progressively reduced as performance was consolidated and rewards became more predictable. These neurons were also activated when rewards occurred at unpredicted times and were depressed when rewards were omitted at the predicted times. Thus, dopamine neurons code errors in the prediction of both the occurrence and the time of rewards. In this respect, their responses resemble the teaching signals that have been employed in particularly efficient computational learning models.
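
The three response patterns described in the abstract (a reward response that shrinks as learning proceeds, a depression when a predicted reward is omitted, and an activation when reward arrives at an unpredicted time) can be illustrated with a temporal-difference (TD) prediction error, one example of the kind of teaching signal used in computational learning models. The following is a minimal sketch under stated assumptions, not the analysis or model used in the study: it assumes one learned value weight per time step after the cue, and every name and parameter in it (`run_trial`, `alpha`, `gamma`, the step counts and reward times) is hypothetical and chosen only for illustration.

```python
import numpy as np


def run_trial(w, cue_t, reward_t, n_steps, alpha=0.3, gamma=1.0, learn=True):
    """Run one trial and return the TD prediction error at every time step.

    w[t] holds the reward prediction for step t, used only at or after the cue.
    Steps before the cue carry no prediction (value 0).
    reward_t is the step at which reward is delivered, or None for omission.
    All parameters are illustrative assumptions, not values from the study.
    """
    def value(t):
        # Prediction is zero before the cue and after the end of the trial.
        return w[t] if cue_t <= t < n_steps else 0.0

    delta = np.zeros(n_steps)
    for t in range(n_steps):
        r = 1.0 if t == reward_t else 0.0
        # TD error: received reward plus next prediction minus current prediction.
        delta[t] = r + gamma * value(t + 1) - value(t)
        if learn and t >= cue_t:
            w[t] += alpha * delta[t]
    return delta


N_STEPS, CUE_T, REWARD_T, LATE_T = 20, 5, 12, 16

# Learning: reward always arrives at REWARD_T; track the error at that step.
w = np.zeros(N_STEPS)
reward_errors = [
    run_trial(w, CUE_T, REWARD_T, N_STEPS)[REWARD_T] for _ in range(300)
]

# Probe trials after learning (weights frozen).
omit = run_trial(w, CUE_T, None, N_STEPS, learn=False)    # reward omitted
late = run_trial(w, CUE_T, LATE_T, N_STEPS, learn=False)  # reward delayed

print("error at reward, first five trials:", np.round(reward_errors[:5], 3))
print("error at reward, last trial:", round(reward_errors[-1], 6))
print("omission, error at usual reward time:", round(omit[REWARD_T], 3))
print("delayed reward, error at usual time:", round(late[REWARD_T], 3))
print("delayed reward, error at new time:", round(late[LATE_T], 3))
```

In this sketch the error at the time of reward starts large and decays toward zero as the prediction is learned. On the probe trials, the error is negative at the usual reward time when reward is omitted or delayed, and positive at the new time when a delayed reward arrives. A delayed reward is used for the timing probe because it is the case in which the model's output is easiest to compare with the abstract's description.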

**Figure 1: The discrimination learning task.**

**Figure 3: Reward responses of three dopamine neurons (a–c) during learning of pairs of novel pictures.**

**Figure 4: Changes of average population response (54 neurons tested) to reward during learning.**

**Figure 5: Comparison between progress of learning and neuronal responses to reward.**

**Figure 6: Responses of dopamine neurons related to errors in the temporal prediction of reward.**

Acknowledgements

We thank Anthony Dickinson, David Gaffan and P. Read Montague for helpful discussions and advice, and B. Aebischer, J. Corpataux, A. Gaillard, A. Pisani, A. Schwarz and F. Tinguely for expert technical assistance. Supported by the Swiss NSF, the Roche Research Foundation and an NIMH postdoctoral fellowship to J.R.H.

Author information

Authors and Affiliations

Institute of Physiology, University of Fribourg, CH-1700, Fribourg, Switzerland
Jeffrey R. Hollerman & Wolfram Schultz
Department of Psychology, Allegheny College, Meadville, 16335, Pennsylvania, USA
Jeffrey R. Hollerman

Corresponding author

Correspondence to Wolfram Schultz.

About this article

Cite this article

Hollerman, J., Schultz, W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1, 304–309 (1998). https://doi.org/10.1038/1124

Received: 01 April 1998
Accepted: 16 June 1998
Issue Date: August 1998
DOI: https://doi.org/10.1038/1124
