Dopamine neurons report an error in the temporal prediction of reward during learning

Abstract

Many behaviors are affected by rewards, undergoing long-term changes when rewards are different than predicted but remaining unchanged when rewards occur exactly as predicted. The discrepancy between reward occurrence and reward prediction is termed an 'error in reward prediction'. Dopamine neurons in the substantia nigra and the ventral tegmental area are believed to be involved in reward-dependent behaviors. Consistent with this role, they are activated by rewards, and because they are activated more strongly by unpredicted than by predicted rewards they may play a role in learning. The present study investigated whether monkey dopamine neurons code an error in reward prediction during the course of learning. Dopamine neuron responses reflected the changes in reward prediction during individual learning episodes; dopamine neurons were activated by rewards during early trials, when errors were frequent and rewards unpredictable, but activation was progressively reduced as performance was consolidated and rewards became more predictable. These neurons were also activated when rewards occurred at unpredicted times and were depressed when rewards were omitted at the predicted times. Thus, dopamine neurons code errors in the prediction of both the occurrence and the time of rewards. In this respect, their responses resemble the teaching signals that have been employed in particularly efficient computational learning models.
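The pattern described in the abstract can be illustrated with a minimal sketch. It assumes a simple delta rule applied separately at each time step of a trial, in the spirit of error-driven learning rules, and is not the model or analysis used in the study. The function names, the number of time steps, the reward step, and the learning rate are all hypothetical choices made for illustration.

```python
def prediction_errors(predictions, rewards):
    """Per-time-step error: delivered reward minus predicted reward."""
    return [r - v for r, v in zip(rewards, predictions)]


def train(n_trials, n_steps=10, reward_step=5, alpha=0.3):
    """Learn a time-specific reward prediction over repeated trials.

    Assumed setup (hypothetical): one reward of size 1.0 at `reward_step`,
    learning rate `alpha`. Returns the learned predictions and the error
    observed at the reward step on every trial.
    """
    predictions = [0.0] * n_steps
    rewards = [0.0] * n_steps
    rewards[reward_step] = 1.0
    errors_at_reward = []
    for _ in range(n_trials):
        errors = prediction_errors(predictions, rewards)
        errors_at_reward.append(errors[reward_step])
        # Delta rule: move each prediction toward the reward actually received.
        predictions = [v + alpha * d for v, d in zip(predictions, errors)]
    return predictions, errors_at_reward


predictions, learning_errors = train(n_trials=30)

# Probe trial after learning: the reward is moved from step 5 to step 7.
shifted = [0.0] * 10
shifted[7] = 1.0
probe = prediction_errors(predictions, shifted)

print("error at reward, first trials:", [round(e, 3) for e in learning_errors[:4]])
print("probe error at the predicted time (step 5):", round(probe[5], 3))
print("probe error at the new time (step 7):", round(probe[7], 3))
```

In this sketch the error at the time of reward is large on early trials and shrinks as the prediction is learned. On the probe trial the error is negative at the time the reward was predicted but omitted, and positive at the unpredicted time when it arrives, which mirrors the depression and activation described above.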

Figure 1: The discrimination learning task.
Figure 2: Learning curves.
Figure 3: Reward responses of three dopamine neurons (a–c) during learning of pairs of novel pictures.
Figure 4: Changes of average population response (54 neurons tested) to reward during learning.
Figure 5: Comparison between progress of learning and neuronal responses to reward.
Figure 6: Responses of dopamine neurons related to errors in the temporal prediction of reward.

Acknowledgements

We thank Anthony Dickinson, David Gaffan and P. Read Montague for helpful discussions and advice, and B. Aebischer, J. Corpataux, A. Gaillard, A. Pisani, A. Schwarz and F. Tinguely for expert technical assistance. Supported by the Swiss NSF, the Roche Research Foundation and an NIMH postdoctoral fellowship to J.R.H.

Author information

Corresponding author

Correspondence to Wolfram Schultz.

About this article

Cite this article

Hollerman, J., Schultz, W. Dopamine neurons report an error in the temporal prediction of reward during learning. Nat Neurosci 1, 304–309 (1998). https://doi.org/10.1038/1124

  • DOI: https://doi.org/10.1038/1124
