According to contemporary learning theories, the discrepancy, or error, between the actual and predicted reward determines whether learning occurs when a stimulus is paired with a reward. The role of prediction errors is directly demonstrated by the observation that learning is blocked when the stimulus is paired with a fully predicted reward. By using this blocking procedure, we show that the responses of dopamine neurons to conditioned stimuli were governed differentially by the occurrence of reward prediction errors rather than by stimulus–reward associations alone, as was the learning of behavioural reactions. Both behavioural and neuronal learning occurred predominantly when dopamine neurons registered a reward prediction error at the time of the reward. Our data indicate that analytical tests derived from formal behavioural learning theory provide a powerful approach for studying the role of single neurons in learning.
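The blocking logic described above follows directly from error-driven learning rules such as the Rescorla–Wagner model: when a pretrained stimulus already predicts the reward fully, the prediction error on compound trials is near zero, so an added stimulus acquires almost no associative strength. A minimal sketch of this, assuming the standard Rescorla–Wagner update (stimulus names A, X, B, Y and the learning-rate value are illustrative, not taken from the experiment):

```python
def rescorla_wagner(trials, v, alpha=0.2, lam=1.0):
    """Update associative strengths by the Rescorla-Wagner rule:
    dV = alpha * (lam - sum of V over all stimuli present)."""
    for stimuli in trials:
        error = lam - sum(v[s] for s in stimuli)  # reward prediction error
        for s in stimuli:
            v[s] += alpha * error
    return v

v = {"A": 0.0, "X": 0.0, "B": 0.0, "Y": 0.0}
rescorla_wagner([("A",)] * 100, v)      # Phase 1: A alone -> reward; V_A approaches 1
rescorla_wagner([("A", "X")] * 100, v)  # Phase 2: A+X -> reward; error ~ 0, X is blocked
rescorla_wagner([("B", "Y")] * 100, v)  # Control: novel compound B+Y -> reward
print(round(v["X"], 3), round(v["Y"], 3))
```

The blocked stimulus X ends with essentially zero associative strength, while the control stimulus Y, trained in a compound with no pretrained partner, acquires roughly half the available strength (shared with B). This is the formal prediction that the blocking procedure tests against both behaviour and dopamine responses.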
We thank B. Aebischer, J. Corpataux, A. Gaillard, B. Morandi, A. Pisani and F. Tinguely for expert technical assistance. The study was supported by the Swiss NSF, the European Union (Human Capital and Mobility, and Biomed 2 programmes), the James S. McDonnell Foundation and the British Council.
Cite this article
Waelti, P., Dickinson, A. & Schultz, W. Dopamine responses comply with basic assumptions of formal learning theory. Nature 412, 43–48 (2001). https://doi.org/10.1038/35083500