Dopamine cell firing can encode errors in reward prediction, providing a learning signal to guide future behavior. Yet dopamine is also a key modulator of motivation, invigorating current behavior. Existing theories propose that fast (phasic) dopamine fluctuations support learning, whereas much slower (tonic) dopamine changes are involved in motivation. We examined dopamine release in the nucleus accumbens across multiple time scales, using complementary microdialysis and voltammetric methods during adaptive decision-making. We found that minute-by-minute dopamine levels covaried with reward rate and motivational vigor. Second-by-second dopamine release encoded an estimate of temporally discounted future reward (a value function). Changing dopamine immediately altered willingness to work and reinforced preceding action choices by encoding temporal-difference reward prediction errors. Our results indicate that dopamine conveys a single, rapidly evolving decision variable, the available reward for investment of effort, which is employed for both learning and motivational functions.
- Supplementary Figure 1: Reward rate affects the decision to begin work.
(a) Latency distributions are bimodal and depend on reward rate. Very short latencies (early peak) preferentially occur when a greater proportion of recent trials have been rewarded (same data set as Fig. 1d–i). (b) (top) Schematic of video analysis. Each trial was categorized as “engaged” (already waiting for Light-On) or non-engaged based upon distance (s) and orientation (θ) immediately before Light-On (see Methods). (bottom) Arrows indicate rat head position and orientation for engaged (pink) and non-engaged (green) trials (one example session shown). (c) Categorization into engaged and non-engaged trials accounts for the bimodal latency distribution (data shown are all non-laser trials across 12 ChR2 sessions in TH-Cre+ rats). (d) The proportion of engaged trials increases when more recent trials have been rewarded (3336 trials from 4 rats, r = 0.82, p = 0.003). (e) Especially for non-engaged trials, latencies are shorter when reward rate is higher (r = −0.11, p = 0.004 for 1570 engaged trials; r = −0.18, p = 5.2 × 10⁻¹⁹ for 1766 non-engaged trials).
- Supplementary Figure 2: Individual microdialysis sessions.
Each row shows data for a different session, with the rat ID (e.g. IM463) and recording side (LH = left, RH = right) indicated. From left: dialysis probe location, behavioral and [DA] time courses, and individual session correlations to behavioral variables. Reward rate is in units of rewards per min. Numbers of microdialysis samples for the seven sessions: 86, 72, 39, 39, 68, 73 and 67, respectively. The overall relationship between dopamine and reward rate remained highly significant even when periods of inactivity were excluded (defined as no trials initiated for >2 minutes, shaded in green; regression R² = 0.12, p = 1.4 × 10⁻¹³).
- Supplementary Figure 3: Cross-correlograms for behavioral variables and neurochemicals.
Each plot shows cross-correlograms averaged across all microdialysis sessions, all using the same axes (−20 min to +20 min lags, −0.5 to +1 correlation). Colored lines indicate statistical thresholds corrected for multiple comparisons (see Methods). Many neurochemical pairs show no evidence of covariation, but others display strong relationships, including a cluster of glutamate, serine, aspartate and glycine.
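A cross-correlogram of this kind can be sketched as a Pearson correlation recomputed at each lag. The following is only a minimal illustration, assuming two equally sampled series; the function names and the toy data are hypothetical, and the multiple-comparison thresholds described in the Methods are not reproduced.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cross_correlogram(x, y, max_lag):
    """Return {lag: r}. A positive lag pairs x[i] with the later sample y[i + lag]."""
    out = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: len(y) + lag]
        out[lag] = pearson(xs, ys)
    return out


# Toy data (made up): y is x delayed by two samples.
x = [0, 1, 0, 2, 0, 3, 1, 0, 2, 1, 0, 4]
y = [0, 0] + x[:-2]
cc = cross_correlogram(x, y, max_lag=3)
best_lag = max(cc, key=cc.get)  # lag with the strongest positive correlation
```

If samples were taken once per minute (an assumption here), `max_lag=20` would cover the ±20 min window shown on the plot axes.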
- Supplementary Figure 4: Individual voltammetry sessions.
Each row shows data for a different rat (e.g. IM355, which was also used as the example in Figs. 3 and 4). At left, recording site within nucleus accumbens. Middle panels show behavioral data for the FSCV session (same format as Fig. 1). Right panels show individual FSCV data (same format as Fig. 3, but with additional event alignments).
- Supplementary Figure 5: SMDP model.
(a) Task performance was modeled as a sequence of transitions between states of the agent (rat). Each state had a single associated cached value V(s) (rather than, for example, separate state-action (Q) values for leftward and rightward trials). Most state transitions occur at variable times (hence “semi-Markov”) marked by observed external events (Center-In, Go-Cue, etc.). In contrast, the state sequence between Side-Out and Reward Port In is arbitrarily defined (“Approaching Reward Port” begins 1 s before Reward Port In; “Arriving At Reward Port” begins 0.5 s before Reward Port In). Changing the number or specific timing of these intermediate states does not materially affect the rising shape of the value function. (b) Average correlation (color scale = Spearman’s r) between SMDP model state value at Center-In (Vci) and latency across all six FSCV rats, for a range of learning rates α and exponential discounting time constants γ. Note that the color scale is inverted (red indicates the strongest negative relationship, with higher value corresponding to shorter latency). White dot marks the point of strongest relationship (α = 0.40, γ = 0.95). (c) Correlation between [DA] and state value V is stronger than the correlation between [DA] and reward prediction error δ, across the same range of parameters. Color scale at right is the same for both matrices (Spearman’s r).
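The single-cached-value scheme in (a) can be illustrated with a minimal temporal-difference sketch. This is not the authors' fitted model. The reduced state list, the dwell times, the reward size, and the convention of discounting by γ raised to the elapsed time in seconds are all assumptions made for illustration; only the symbols V, δ, α and γ and the example values α = 0.40 and γ = 0.95 come from the caption.

```python
# Minimal semi-Markov TD(0) sketch (illustrative assumptions throughout).
STATES = ["Center-In", "Go-Cue", "Side-In", "Reward Port In"]  # hypothetical subset of task events


def run_trial(values, dwell_times, reward, alpha=0.40, gamma=0.95):
    """Update cached state values in place for one trial and return the TD errors.

    dwell_times[i] is the time in seconds spent in STATES[i] before the next
    transition. Discounting by gamma ** dwell is an assumed convention.
    """
    deltas = []
    for i, state in enumerate(STATES):
        last = i == len(STATES) - 1
        next_value = 0.0 if last else values[STATES[i + 1]]
        r = reward if last else 0.0  # reward arrives only on leaving the final state
        delta = r + (gamma ** dwell_times[i]) * next_value - values[state]
        values[state] += alpha * delta
        deltas.append(delta)
    return deltas


values = {s: 0.0 for s in STATES}
for _ in range(200):  # repeated rewarded trials with fixed, made-up dwell times
    last_deltas = run_trial(values, dwell_times=[1.0, 0.5, 1.0, 0.0], reward=1.0)
```

With every trial rewarded, the cached values settle into a profile that rises toward the reward port while the prediction errors shrink toward zero, which is the qualitative distinction between V and δ that panel (c) compares against [DA].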
- Supplementary Figure 6: Dopamine relationships to temporally-stretched model variables.
(a) Kernel consisted of an exponential rise (to 50% of asymptote) and an exponential fall, with separate time constants. (b) Within-trial correlation coefficients between [DA] and kernel-convolved model variables V and δ, for a range of rise and fall time constants (0–1.5 s each, in 50 ms steps, using data from all 6 rats). Regardless of parameter values, [DA] correlations to V were always higher than to δ. (c) Same example data as Fig. 4e, but also showing convolved V and δ (using time constants that maximized correlation to [DA] in each case). (d) Trial-by-trial (top) and average (bottom) [DA], convolved V, and convolved δ, for the same session as Fig. 4d,e.
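The convolution step in (a) and (b) can be sketched as follows. The kernel shape is one possible reading of the caption (rise as 1 − exp(−t/τ_rise) until 50% of asymptote, then an exponential decay with τ_fall); the sampling step `dt`, the kernel duration, the unit-sum normalisation, and the function names are assumptions, not details taken from the source.

```python
import math


def rise_fall_kernel(tau_rise, tau_fall, dt=0.1, duration=5.0):
    """Build a rise-then-fall kernel, normalised to unit sum.

    tau_rise may be 0 (instant rise); tau_fall must be > 0.
    """
    t_peak = tau_rise * math.log(2.0)  # time at which 1 - exp(-t/tau_rise) reaches 0.5
    kernel = []
    for i in range(int(round(duration / dt))):
        t = i * dt
        if t < t_peak:
            kernel.append(1.0 - math.exp(-t / tau_rise))
        else:
            kernel.append(0.5 * math.exp(-(t - t_peak) / tau_fall))
    total = sum(kernel)
    return [k / total for k in kernel]


def causal_convolve(signal, kernel):
    """Convolve so that output[t] depends only on signal samples at or before t."""
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k in range(min(t + 1, len(kernel))):
            acc += kernel[k] * signal[t - k]
        out.append(acc)
    return out


# Made-up example: a brief prediction-error-like pulse at sample 5.
kernel = rise_fall_kernel(tau_rise=0.3, tau_fall=0.8)
pulse = [0.0] * 20
pulse[5] = 1.0
smoothed = causal_convolve(pulse, kernel)
```

Scanning `tau_rise` and `tau_fall` over a grid and correlating each convolved variable with [DA] would give a matrix of the kind summarised in (b).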
- Supplementary Figure 7: Histology for behavioral optogenetic experiments.
Identifier (e.g. “IM389”) for each rat is given at bottom right corner. Coronal sections shown are within 180 µm (anterior-posterior) of the observed fiber tip location. Green indicates expression of eYFP, blue is DAPI counterstain. In two cases (IM423, IM441) autofluorescence of damaged brain tissue is visible along the optic fiber tracts; this was not specific to the green channel.
- Supplementary Figure 8: Further analysis of persistence of optogenetic effects.
(a) Regression analysis showing substantial effects of recent rewards (black) on latency, but no comparable effect of recent Side-In laser stimulations on latency. (b) Effects of Light-On [DA] manipulation on same-trial latency distributions (top), and of Side-In [DA] manipulation on next-trial latency distributions (bottom). Dataset shown is the same as Fig. 6c, i.e. all completed trials in TH-Cre+ rats with ChR2 (left), TH-Cre− rats with ChR2 (middle) and TH-Cre+ rats with halorhodopsin (right). (c) Regression analysis of laser stimulation on subsequent left/right choices. Recent food rewards for a given left/right action increase the probability that it will be repeated. Extra [DA] at Light-On has little or no effect on subsequent choices, but extra [DA] at Side-In is persistently reinforcing. For the Side-In data, note especially the positive coefficients for otherwise unrewarded laser trials.
- Supplementary Figure 9: Video analysis of optogenetic effects on latency.
(a) Extra [DA] at Light-On causes shorter latencies for non-engaged trials, but longer latencies for a subset of engaged trials. Top plot shows all trials (for the n=4 TH-Cre+ rats with ChR2 stimulation at Light-On for which video was recorded; 3 sessions/rat; 3336 no-laser trials in grey; 1335 laser trials in blue). Bottom plots show the breakdown into engaged (n=1975) and non-engaged (n=2696) trials. (b) We examined whether laser-slowed trials might be those in which the rat was waiting at the wrong port (if, for example, DA were to increase the salience of currently attended stimuli). Engaged trials were further broken down into “lucky guesses” (those trials for which the rat was immediately adjacent to the start port as it was illuminated) and “unlucky guesses” (immediately adjacent to one of the other two possible start ports). Blue dashed ellipses indicate zones used to classify trials by guessed port (8.5 cm long diameter, 3.4 cm short diameter). (c) Laser-slowing was observed for both lucky (n=603) and unlucky (n=1007) guesses. Note that the blue distribution is bimodal in both cases, indicating that only a subset of trials was affected. Video observations suggested that on some trials extra [DA] evokes a small extra head/neck movement that makes the trajectory to the illuminated port longer and therefore slower. (d) Quantification of trajectories, by scoring rat location on each video frame from 1 s before Light-On to 1 s after Center-In. Colored lines show all individual trajectories for one example session. Panels at right show the same trajectories plotted as distance remaining from Center-In port, by time elapsed from either Light-On or Center-In. Note that for non-engaged trials (green), the approach to the Center-In port consistently takes ~1–2 s. Therefore, the epoch considered as “baseline” in the FSCV analyses (−3 to −1 s relative to Center-In) is around the time that rats decide to initiate approach behaviors.
(e) Extra [DA] causes longer average trajectories for engaged trials. Cumulative distributions of path-lengths between Light-On and Center-In, for (top-to-bottom) engaged/lucky, engaged/unlucky and non-engaged trials, respectively. Blue lines indicate laser trials, and p-values are from Kolmogorov-Smirnov tests comparing laser to no-laser distributions (no-laser/laser trial numbers: top, 292/75; middle, 424/99; bottom, 1897/792). On engaged trials rats often reoriented between the three potential start ports, perhaps checking if they were illuminated; one possibility is that the extra laser-evoked movement on engaged trials reflects dopaminergic facilitation of these orienting movements. If such a movement is already close to execution before Light-On, it may be evoked before the correct start port can be appropriately targeted. (f) Additional trajectory analysis, plotting time courses of rat distance from the illuminated start port. On non-engaged trials extra [DA] tends to make the approach to the illuminated start port occur earlier (note progressive separation of green and blue lines when aligned on Light-On). However, the approach time course is extremely similar (note overlapping lines in the final ~1–2 s before Center-In), indicating that extra [DA] did not affect the speed of approach.
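The comparison of cumulative path-length distributions in (e) rests on the two-sample Kolmogorov-Smirnov statistic, the largest vertical gap between two empirical cumulative distributions. A minimal sketch of that statistic is given here; the function name and the path lengths are invented for illustration, and the p-value calculation is omitted.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical path lengths (cm) for no-laser and laser trials.
no_laser = [10.0, 12.0, 14.0, 16.0]
laser = [14.0, 16.0, 18.0, 20.0]
d_stat = ks_statistic(no_laser, laser)
```

In practice a statistics library would normally be used to obtain the p-value alongside the statistic.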
- Supplementary Figure 10: Optogenetic effects on hazard rates for individual video-scored rats.
Latency survivor plots (top) and corresponding hazard rates (bottom) for each of the four TH-Cre+ rats with ChR2 stimulation at Light-On for which video was recorded (each rat had 3 video sessions that were concatenated for analysis). Only non-engaged trials are included (numbers of no-laser/laser trials: IM-389, 522/215; IM-391, 294/125; IM-392, 481/191; IM-394, 462/189). For each rat, laser stimulation caused an increase in the hazard rate of the Center-In event ~1–2 s later (the duration of an approach).
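The relationship between a survivor plot and its hazard rate can be sketched with a simple binned estimate. This is only an illustration: the bin width, the time range, the function name, and the latencies are assumptions, and the source does not state how its hazard rates were binned or smoothed.

```python
def survivor_and_hazard(latencies, bin_width=1.0, t_max=4.0):
    """Return (bin_starts, survivor, hazard) for latencies given in seconds.

    survivor[i]: fraction of trials whose latency is at least bin_starts[i].
    hazard[i]: among trials still waiting at bin_starts[i], the fraction whose
    event (e.g. Center-In) falls inside that bin.
    """
    n_bins = int(round(t_max / bin_width))
    starts = [i * bin_width for i in range(n_bins)]
    survivor, hazard = [], []
    for start in starts:
        at_risk = sum(1 for lat in latencies if lat >= start)
        events = sum(1 for lat in latencies if start <= lat < start + bin_width)
        survivor.append(at_risk / len(latencies))
        hazard.append(events / at_risk if at_risk else float("nan"))
    return starts, survivor, hazard


# Made-up latencies (s) for eight trials.
latencies = [0.3, 0.4, 0.7, 1.2, 1.3, 1.4, 2.6, 3.1]
starts, survivor, hazard = survivor_and_hazard(latencies)
```

Because the hazard is conditioned on trials that have not yet produced the event, a transient rise in it ~1–2 s after stimulation can be read as a change in the moment-to-moment probability of completing an approach, independent of how many trials have already ended.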