The finding that dopamine neurons of the ventral midbrain are activated by reward uncertainty challenges past models of dopamine's role in reinforcement, while at the same time suggesting that dopamine may act to reinforce gambling and risk-taking behavior.
To understand the function of dopamine in the brain, it is important to know the natural conditions under which the activity of dopamine-containing neurons is modulated. To this end, Wolfram Schultz et al. have made single-unit electrophysiological recordings from dopamine neurons of the substantia nigra and ventral tegmental area of behaving primates. As virtually no qualitative electrophysiological differences between dopamine neurons of these two regions of the ventral midbrain have been reported, at any level of investigation, I will refer to them collectively as dopamine neurons. Dopamine neurons respond with a phasic burst of activity (∼100 ms in duration, with an onset latency of ∼80 ms) following the onset of any arbitrary stimulus that predicts reward (food or liquid), provided that the stimulus itself was unpredicted (reviewed by Schultz).1 The basic observation, confirmed and extended by recent work,2,3 is that the phasic response of dopamine neurons appears to signal a reward prediction error. Thus, at each moment in time, the activity of dopamine neurons increases if reward prediction is better than expected, decreases if reward prediction is worse than expected, and is unchanged if reward prediction occurs as expected.
The pattern of activity observed in dopamine neurons displays a striking resemblance to the prediction error signal that drives reinforcement in the Rescorla–Wagner theory of associative learning and in temporal difference (TD) learning algorithms.4,5 This suggests a mechanism by which dopamine could mediate its long-lasting, conditioned effects by ‘teaching’ target neurons the motivational value of various stimuli and actions. This may occur gradually through synaptic plasticity. However, the concept of prediction error has not shed much light on the unconditioned effects of dopamine. These are the immediate effects of dopamine on behavior, such as its effects on locomotor activity, attention and cognition, that presumably involve changes in information processing.
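The prediction error at the heart of the Rescorla–Wagner rule can be made concrete with a small sketch. The Python below is illustrative only: the function name `rescorla_wagner`, the learning rate `alpha`, and the coding of reward as 1.0 and omission as 0.0 are assumptions chosen for the example, not values taken from the recordings or from the models cited above.

```python
def rescorla_wagner(outcomes, alpha=0.1, v0=0.0):
    """Track the predicted value of one stimulus across trials.

    outcomes: sequence of reward magnitudes, one per trial
              (assumed coding: 1.0 = reward, 0.0 = omission).
    alpha:    learning rate (hypothetical value, for illustration).
    Returns (values, errors): the prediction after each trial and the
    prediction error experienced on each trial.
    """
    v = v0
    values, errors = [], []
    for outcome in outcomes:
        delta = outcome - v   # prediction error: outcome minus prediction
        v += alpha * delta    # move the prediction a fraction of the way
        values.append(v)
        errors.append(delta)
    return values, errors
```

With a stimulus that is always followed by reward, the error starts large and positive and shrinks toward zero as the prediction improves. If a reward is then omitted, the error becomes negative. These three cases correspond to outcomes that are better than, equal to, or worse than expected.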
If dopamine neurons code reward prediction error, their phasic activation following reward should decline monotonically as the animal's subjective probability of reward increases. Prior work had only examined situations in which reward was predicted by a stimulus or action, with a probability approaching zero or one. We have now studied the responses of dopamine neurons across the full range of probabilities.3 However, this was not merely a quantitative variation on past work, as uncertainty was examined for the first time. Uncertainty is maximal at a probability of 0.5, but absent at the extremes of zero and one.
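The contrast between prediction error and uncertainty can be illustrated numerically. The sketch below assumes a binary reward outcome coded as 0 or 1 and uses two standard measures of uncertainty, the variance of the outcome and its Shannon entropy; the function names are hypothetical, and nothing here specifies which measure, if either, the neurons track.

```python
import math

def reward_prediction_error(p):
    """Error at the time of reward when reward was expected with probability p."""
    return 1.0 - p

def bernoulli_variance(p):
    """Variance of a 0/1 reward delivered with probability p."""
    return p * (1.0 - p)

def bernoulli_entropy(p):
    """Shannon entropy (in bits) of a 0/1 reward delivered with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # no uncertainty when the outcome is certain
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def uncertainty_table(probabilities=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Rows of (p, prediction error at reward, variance, entropy)."""
    return [
        (p, reward_prediction_error(p), bernoulli_variance(p), bernoulli_entropy(p))
        for p in probabilities
    ]
```

The prediction error at the time of reward falls steadily as the probability rises, whereas both uncertainty measures are zero at the extremes, symmetric about 0.5, and largest at 0.5.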
Monkeys were conditioned in a Pavlovian design in which distinct visual stimuli indicated the probability of a liquid reward occurring 2 s after stimulus onset. As hypothesized, the phasic activation following reward-predicting stimuli varied monotonically with reward probability. More unexpectedly, we also found that the neural activity gradually increased during the interval between onset of the conditioned stimulus and potential reward. This expectation-related activity was only present when there was uncertainty about the reward outcome. It was maximal at a probability of 0.5, less pronounced at probabilities of 0.25 and 0.75, and absent at 0.0 or 1.0. In additional experiments, we found that this ‘sustained’ activation increased with the discrepancy between the magnitude of potential reward outcomes. Thus, dopamine neurons appear to code the uncertainty in reward magnitude.
The sustained activation may shed light on unconditioned aspects of dopamine transmission that are not readily explained by phasic activation. For example, the coding of reward uncertainty by dopamine neurons suggests their potential role in a nonselective form of attention.3 This suggestion is related to the proposal of Pearce and Hall6 that uncertainty facilitates learning through an attentional mechanism. In addition, uncertainty is the quantitative definition of information7 and specific dopamine-dependent disturbances in information processing are thought to be present in both Parkinson's disease8 and schizophrenia.9 The nature of dopamine's involvement in these unconditioned effects remains to be seen.
The discovery of activity related to reward uncertainty poses a challenge to models of dopamine's action in reinforcement. This finding is inconsistent with the TD model of dopamine neural activity,4,5 which does not predict any unique effect of uncertainty, and which implies a stable basal level of neural activity upon which phasic modulations are superimposed. Although present forms of the TD model may not provide valid descriptions of the activity of dopamine neurons, the basic principles of the model still appear very useful, particularly the ideas that prediction error is the form of a reinforcement signal and that prediction error should transfer back in time to the first predictive stimulus. The major reason for adhering to this basic idea, in my opinion, is the well-established fact that dopamine has positive reinforcing properties,10 and that a (broadly defined) prediction error appears to be the only practical form for a reinforcement term to take. Even theories of learning focused on the importance of attention rely on prediction error.6,11 In addition, dopamine neurons have the anatomical properties that might be expected for such a function, broadcasting their signal to brain regions known to be critical for motivation and the selection of behaviors.
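The idea that prediction error transfers back in time to the first predictive stimulus can be illustrated with a minimal tabular TD(0) sketch. This is not the specific model of the cited work: the chain of `n_steps` states following stimulus onset, the baseline state whose value is held at zero (because the stimulus itself is unpredicted), the parameter values, and the function name `td_trial_errors` are all assumptions made for the example.

```python
def td_trial_errors(n_steps=5, n_trials=500, alpha=0.2, gamma=1.0, reward=1.0):
    """Simulate repeated stimulus-reward trials with tabular TD(0).

    Each trial passes through n_steps states after stimulus onset, and the
    reward arrives on leaving the last state. Returns one list per trial:
    index 0 is the error at stimulus onset, the final index is the error
    at the time of reward.
    """
    v = [0.0] * n_steps  # learned values of the post-stimulus states
    history = []
    for _ in range(n_trials):
        # Stimulus onset: transition from the baseline state (value fixed
        # at 0, an assumption) into the first post-stimulus state.
        deltas = [gamma * v[0] - 0.0]
        for k in range(n_steps):
            is_last = k == n_steps - 1
            r = reward if is_last else 0.0
            v_next = 0.0 if is_last else v[k + 1]
            delta = r + gamma * v_next - v[k]  # TD prediction error
            v[k] += alpha * delta
            deltas.append(delta)
        history.append(deltas)
    return history
```

On the first trial the only nonzero error occurs at the time of reward. After training, the error at reward has vanished and an error of similar size appears at stimulus onset. With reward delivered on every trial, the sketch produces no sustained change between stimulus and reward.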
Whether the sustained and phasic components of activity are discriminated by postsynaptic neurons is a critical question. However, the fact that the activity of dopamine neurons changes slowly in the absence of any change in the stimulus (reward prediction), and rapidly after onset of a reward-predicting stimulus, does not necessarily imply the coding of two parameters. Both observations could potentially correspond to the coding of (subjective) uncertainty in the prediction of reward, as uncertainty would presumably increase sharply at the moment of stimulus onset. It would therefore be premature to ascribe discrete functions to the two patterns of activity.
If dopamine is increased by reward uncertainty, and it is reinforcing, then it seems likely that it would contribute to the reinforcing and potentially addictive properties of gambling, which is defined by reward uncertainty. Past work has shown that ‘reward’ regions of the brain are activated during a gambling-like task12 and that extracellular dopamine rises during the playing of a video game13 (which may increase reward uncertainty). But why would a brain system promote gambling? A theory of gambling must explain, in terms consistent with evolution, why such a large number of apparently healthy people persist in an activity that is so obviously maladaptive.
Natural environments are highly structured, so that sequences of events tend to occur in stereotyped patterns. Learning these patterns allows an animal to predict and thereby to maximize reward. Risk-taking promotes learning, and therefore would be expected to have reinforcing properties. For example, pursuing large but uncertain reward over the smaller ‘sure-thing’ provides the promise of information and thus the possibility of identifying stimuli or actions that are accurate predictors of the larger reward. Thus, a limited amount of risk-taking behavior is likely to be beneficial in the long run, though it may prove costly in the short term. By contrast, risk-taking in a casino is futile insofar as the odds are fixed; there are no accurate predictors and hence nothing to learn. A casino thus represents a very unnatural environment for which the brain may be ill suited. Gambling behavior might therefore be understood if one expands the concept of reward to include not only concrete objects, but also the more abstract and comprehensive notion of reward information. It is proposed that the increased activity of dopamine neurons due to reward uncertainty provides a reinforcement signal (consistent with the spirit but not the detail of present TD models) that with repeated experience causes stimuli and actions predictive of reward uncertainty to gain in motivational value. Thus, the reinforcing properties of gambling might share common mechanisms with those of addictive drugs at the neurochemical level, both being mediated by dopamine. At a theoretical level, the proposal that the ‘unnatural’ conditions of a casino distort the utility of a system designed to promote the value of risk-taking under natural conditions resembles the suggestion that addictive drugs derive their power by targeting a system designed to promote the value of natural rewards.10