Dopamine neurons report an error in the temporal prediction of reward during learning

Abstract

Many behaviors are affected by rewards, undergoing long-term changes when rewards differ from those predicted but remaining unchanged when rewards occur exactly as predicted. The discrepancy between reward occurrence and reward prediction is termed an 'error in reward prediction'. Dopamine neurons in the substantia nigra and the ventral tegmental area are believed to be involved in reward-dependent behaviors. Consistent with this role, they are activated by rewards, and because they are activated more strongly by unpredicted than by predicted rewards, they may play a role in learning. The present study investigated whether monkey dopamine neurons code an error in reward prediction during the course of learning. Dopamine neuron responses reflected the changes in reward prediction during individual learning episodes; dopamine neurons were activated by rewards during early trials, when errors were frequent and rewards unpredictable, but activation was progressively reduced as performance was consolidated and rewards became more predictable. These neurons were also activated when rewards occurred at unpredicted times and were depressed when rewards were omitted at the predicted times. Thus, dopamine neurons code errors in the prediction of both the occurrence and the time of rewards. In this respect, their responses resemble the teaching signals that have been employed in particularly efficient computational learning models.
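
The kind of teaching signal the abstract alludes to can be illustrated with a temporal-difference (TD) prediction error. The sketch below is an illustration only, not the model or analysis used in the study. It assumes a tapped-delay-line representation (one learned prediction per time step after cue onset), a unit reward, and arbitrary values for the learning rate `alpha`, the discount `gamma`, the trial length and the usual reward time; all names are hypothetical.

```python
def td_trial(values, reward_step, alpha=0.2, gamma=1.0, learn=True):
    """Run one trial and return the TD error at every time step.

    values[t] is the learned reward prediction t steps after cue onset
    (a tapped-delay-line representation, an assumption of this sketch).
    reward_step is the step at which reward arrives, or None if omitted.
    """
    n = len(values)
    errors = []
    for t in range(n):
        reward = 1.0 if t == reward_step else 0.0
        next_value = values[t + 1] if t + 1 < n else 0.0
        # Prediction error: what happened minus what was predicted.
        delta = reward + gamma * next_value - values[t]
        if learn:
            values[t] += alpha * delta
        errors.append(delta)
    return errors


N_STEPS, USUAL = 10, 5          # hypothetical trial length and reward time
values = [0.0] * N_STEPS

first = td_trial(values, USUAL)             # first learning trial
for _ in range(499):                        # further learning trials
    last = td_trial(values, USUAL)

# Probe trials with learning switched off.
omitted = td_trial(values, None, learn=False)
delayed = td_trial(values, USUAL + 2, learn=False)

print("error at usual reward time, first trial:", first[USUAL])
print("error at usual reward time, after learning:", round(last[USUAL], 6))
print("error at usual reward time, reward omitted:", round(omitted[USUAL], 6))
print("errors on delayed-reward trial:", [round(d, 3) for d in delayed])
```

Under these assumptions the error at the usual reward time starts at 1 on the first trial and shrinks toward 0 as the reward becomes predicted. Omitting the reward gives a negative error at the usual time, and delaying the reward gives a negative error at the usual time followed by a positive error at the new time. This is qualitatively the pattern of activation and depression described above.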

Figure 1: The discrimination learning task.
Figure 2: Learning curves.
Figure 3: Reward responses of three dopamine neurons (a–c) during learning of pairs of novel pictures.
Figure 4: Changes of average population response (54 neurons tested) to reward during learning.
Figure 5: Comparison between progress of learning and neuronal responses to reward.
Figure 6: Responses of dopamine neurons related to errors in the temporal prediction of reward.

Acknowledgements

We thank Anthony Dickinson, David Gaffan and P. Read Montague for helpful discussions and advice, and B. Aebischer, J. Corpataux, A. Gaillard, A. Pisani, A. Schwarz and F. Tinguely for expert technical assistance. Supported by the Swiss NSF, the Roche Research Foundation and an NIMH postdoctoral fellowship to J.R.H.

Author information

Correspondence to Wolfram Schultz.
