Mesolimbic dopamine signals the value of work

Nature Neuroscience 19, 117–126
DOI: 10.1038/nn.4173

Abstract

Dopamine cell firing can encode errors in reward prediction, providing a learning signal to guide future behavior. Yet dopamine is also a key modulator of motivation, invigorating current behavior. Existing theories propose that fast (phasic) dopamine fluctuations support learning, whereas much slower (tonic) dopamine changes are involved in motivation. We examined dopamine release in the nucleus accumbens across multiple time scales, using complementary microdialysis and voltammetric methods during adaptive decision-making. We found that minute-by-minute dopamine levels covaried with reward rate and motivational vigor. Second-by-second dopamine release encoded an estimate of temporally discounted future reward (a value function). Changing dopamine immediately altered willingness to work and reinforced preceding action choices by encoding temporal-difference reward prediction errors. Our results indicate that dopamine conveys a single, rapidly evolving decision variable, the available reward for investment of effort, which is employed for both learning and motivational functions.

Figures

    Figure 1: Adaptive choice and motivation in the trial-and-error task.

    (a) Sequence of behavioral events (in rewarded trials). (b) Choice behavior in a representative session. Numbers at top denote nominal block-by-block reward probabilities for left (purple) and right (green) choices. Tick marks indicate actual choices and outcomes on each trial (tall ticks indicate rewarded trials, short ticks unrewarded). The same choice data are shown below in smoothed form (thick lines, seven-trial smoothing). (c) Relationship between reward rate and latency for the same session. Tick marks indicate only whether trials were rewarded or not, regardless of choice. Solid black line shows reward rate and cyan line shows latency (on inverted log scale), both smoothed in the same way as in b. (d) Choices progressively adapted toward the block reward probabilities (data set for d–i: n = 14 rats, 125 sessions, 2,738 ± 284 trials per rat). (e) Reward rate broken down by block reward probabilities. (f) Latencies by block reward probabilities. Latencies rapidly became shorter when reward rate was higher. (g) Latencies by proportion of recent trials rewarded. Error bars represent s.e.m. (h) Latency distributions presented as survivor curves (the average fraction of trials for which the Center-In event has not yet happened, by time elapsed from Light-On), broken down by proportion of recent trials rewarded. (i) Same latency distributions as in h, but presented as hazard rates (the instantaneous probability that the Center-In event will happen, if it has not happened yet). The initial bump in the first second after Light-On reflects engaged trials (Supplementary Fig. 1); after that, hazard rates are relatively stable and continue to scale with reward history.
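The survivor-curve and hazard-rate presentation used in panels h and i can be sketched in a few lines. This is a minimal illustration, not the paper's analysis code: the function name, bin width, and simulated latencies are all assumptions.

```python
import numpy as np

def survivor_and_hazard(latencies, bin_edges):
    """Survivor curve S(t) and discrete hazard rate h(t) from trial latencies.

    S(t): fraction of trials for which the Center-In event has not yet
    happened at the start of each bin. h(t): probability the event happens
    within a bin, given that it has not happened yet.
    """
    counts, _ = np.histogram(latencies, bins=bin_edges)
    n = len(latencies)
    # Trials still "at risk" (event not yet happened) at the start of each bin
    at_risk = n - np.concatenate(([0], np.cumsum(counts)[:-1]))
    survivor = at_risk / n
    hazard = np.where(at_risk > 0, counts / np.maximum(at_risk, 1), 0.0)
    return survivor, hazard

# Example: simulated latencies (s) for a block of recently rewarded trials
rng = np.random.default_rng(0)
latencies = rng.exponential(1.5, 200)
edges = np.arange(0.0, 10.5, 0.5)
S, h = survivor_and_hazard(latencies, edges)
```

The survivor curve is non-increasing by construction, while the hazard rate can rise and fall independently, which is why the caption uses it to expose the early "engaged" bump.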

    Figure 2: Minute-by-minute dopamine levels track reward rate.

    (a) Total ion chromatogram of a single representative microdialysis sample, illustrating the set of detected analytes in this experiment. x axis indicates chromatography retention times, y axis indicates intensity of ion detection for each analyte (normalized to peak values). Inset, locations of each microdialysis probe in the nucleus accumbens (all data shown in the same Paxinos atlas section; six were on the left side and one on the right). DA, dopamine; 3-MT, 3-methoxytyramine; NE, norepinephrine; NM, normetanephrine; 5-HT, serotonin; DOPAC, 3,4-dihydroxyphenylacetic acid; HVA, homovanillic acid; 5HIAA, 5-hydroxyindole-3-acetic acid; ACh, acetylcholine. (b) Regression analysis results indicating the strength of linear relationships between each analyte and each of four behavioral measures (reward rate, number of attempts, exploitation index and cumulative rewards). Data are from six rats (seven sessions, total of 444 1-min samples). Color scale shows P values, Bonferroni-corrected for multiple comparisons (4 behavioral measures × 19 analytes), with red bars indicating a positive relationship and blue bars a negative relationship. Given that both reward rate and attempts showed significant correlations with [DA], we constructed a regression model that included these predictors and an interaction term. In this model, R2 remained at 0.15 and only reward rate showed a significant partial effect (P < 2.38 × 10−12). (c) An alternative assessment of the relationship between minute-long [DA] samples and behavioral variables. In each of the seven sessions, [DA] levels were divided into three equal-sized bins (low, medium and high); different colors indicate different sessions. For each behavioral variable, means were compared across [DA] levels using one-way ANOVA. There was a significant main effect of reward rate (F(2,18) = 10.02, P = 0.0012), but no effect of attempts (F(2,18) = 1.21, P = 0.32), exploitation index (F(2,18) = 0.081, P = 0.92) or cumulative rewards (F(2,18) = 0.181, P = 0.84). Post hoc comparisons using the Tukey test revealed that the mean reward rates of low and high [DA] bins differed significantly (P = 0.00082). See also Supplementary Figures 2 and 3.
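The tertile analysis in panel c can be sketched as follows. This is a toy single-session example: the simulated positive [DA]–reward-rate relationship, sample count, and variable names are assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical 1-min microdialysis samples from one session
da = rng.normal(10.0, 2.0, 60)                      # [DA], arbitrary units
reward_rate = 0.3 * da + rng.normal(0.0, 1.0, 60)   # assumed positive relation

# Divide [DA] samples into three equal-sized bins (low / medium / high)
order = np.argsort(da)
low, med, high = np.array_split(order, 3)
groups = [reward_rate[idx] for idx in (low, med, high)]

# Compare mean reward rate across [DA] levels with a one-way ANOVA
F, p = stats.f_oneway(*groups)
```

In the paper the ANOVA was run on per-session group means across seven sessions; here a single simulated session stands in for the whole data set.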

    Figure 3: A succession of within-trial dopamine increases.

    (a) Examples of FSCV data from a single session. Color plots display consecutive voltammograms (every 0.1 s) as a vertical colored strip; examples of individual voltammograms are shown at top (taken from marked time points). Dashed vertical lines indicate Side-In events for rewarded (red) and unrewarded (blue) trials. Black traces below indicate raw current values at the applied voltage corresponding to the dopamine peak. (b) [DA] fluctuations for each of the 312 completed trials of the same session, aligned to key behavioral events. For Light-On and Center-In alignments, trials are sorted by latency (pink dots mark Light-On times; white dots mark Center-In times). For the other alignments, rewarded (top) and unrewarded (bottom) trials are shown separately, but otherwise in the order in which they occurred. [DA] changes aligned to Light-On were assessed relative to a 2-s baseline period ending 1 s before Light-On. For the other alignments, [DA] is shown relative to a 2-s baseline ending 1 s before Center-In. (c) Average [DA] changes during a single session (same data as in b; shaded area represents s.e.m.). (d) Average event-aligned [DA] change across all six animals, for rewarded and unrewarded trials (see Supplementary Fig. 4 for each individual session). Data are normalized by the peak average rewarded [DA] in each session and are shown relative to the same baseline epochs as in b. Black arrows indicate increasing levels of event-related [DA] during the progression through rewarded trials. Colored bars at top indicate time periods with statistically significant differences (red, rewarded trials greater than baseline, one-tailed t tests for each 100-ms time point individually; blue, same for unrewarded trials; black, rewarded trials different from unrewarded trials, two-tailed t tests; all statistical thresholds set to P = 0.05, uncorrected).
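The event alignment and per-trial baseline subtraction used throughout these panels can be sketched generically. The sampling rate (10 Hz, one voltammogram per 0.1 s) and baseline window follow the caption; the function itself and the simulated signal are illustrative assumptions, not the paper's code.

```python
import numpy as np

def event_aligned(signal, event_idx, fs, pre, post, base_win):
    """Align a continuous signal to events and subtract a per-trial baseline.

    signal: 1-D trace sampled at fs Hz; event_idx: sample indices of events;
    pre/post: seconds to keep around each event; base_win: (start, end) in
    seconds relative to the event for the baseline mean, e.g. (-3, -1) gives
    a 2-s baseline ending 1 s before the event.
    """
    n_pre, n_post = int(pre * fs), int(post * fs)
    b0, b1 = (int(t * fs) for t in base_win)
    trials = []
    for i in event_idx:
        if i - n_pre < 0 or i + n_post > len(signal):
            continue  # skip events too close to the recording edges
        snippet = signal[i - n_pre:i + n_post].astype(float)
        baseline = signal[i + b0:i + b1].mean()
        trials.append(snippet - baseline)
    return np.array(trials)

# Example at 10 Hz: a 1-s bump in the signal after each of three events
fs = 10
sig = np.zeros(1000)
events = np.array([200, 500, 800])
for e in events:
    sig[e:e + 10] += 1.0
aligned = event_aligned(sig, events, fs, pre=3, post=5, base_win=(-3, -1))
mean_trace = aligned.mean(axis=0)
```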

    Figure 4: Within-trial dopamine fluctuations reflect state value dynamics.

    (a) Top, temporal discounting: the motivational value of rewards is lower when they are distant in time. With the exponential discounting commonly used in RL models, value is lower by a constant factor γ for each time step of separation from reward. People and other animals may actually use hyperbolic discounting, which can optimize reward rate (as rewards/time is inherently hyperbolic). Time parameters were chosen here simply to illustrate the distinct curve shapes. Bottom, effect of reward cue or omission on state value. At trial start, the discounted value of a future reward will be less if that reward is less likely. Lower value provides less motivational drive to start work, producing, for example, longer latencies. If a cue signals that upcoming reward is certain, the value function jumps up to the (discounted) value of that reward. For simplicity, the value of subsequent rewards is not included. (b) The reward prediction error δ reflects abrupt changes in state value. If the discounted value of work reflects an unlikely reward (for example, probability = 0.25), a reward cue prompts a larger δ than if the reward was likely (for example, probability = 0.75). Note that in this idealized example, δ would be zero at all other times. (c) Task events signal times to reward. Data are from the example session shown in Figure 3c. Bright red indicates actual times to the very next reward, dark red indicates subsequent rewards. Green arrowheads indicate average times to next reward (harmonic mean, only including rewards in the next 60 s). As the trial progresses, average times to reward get shorter. If the reward cue is received, rewards are reliably obtained ~2 s later. Task events are considered to prompt transitions between different internal states (Supplementary Fig. 5) whose learned values reflect these different experienced times to reward. (d) Average state value of the RL model for rewarded (red) and unrewarded (blue) trials, aligned on the Side-In event. The exponentially discounting model received the same sequence of events as in Figure 3c, and model parameters (α = 0.68, γ = 0.98) were chosen for the strongest correlation to behavior (comparing state values at Center-In to latencies in this session, Spearman r = −0.34). Model values were binned at 100 ms, and only bins with at least three events (state transitions) were plotted. (e) Example of the [DA] signal during a subset of trials from the same session, compared with model variables. Black arrows indicate Center-In events, red arrows indicate Side-In with reward cue, and blue arrows indicate Side-In alone (omission). Scale bars represent 20 nM ([DA]), 0.2 (V) and 0.2 (δ). Dashed gray lines mark the passage of time in 10-s intervals. (f) Within-trial [DA] fluctuations were more strongly correlated with model state value (V) than with RPE (δ). For every rat, the [DA]:V correlation was significant (number of trials for each rat: 312, 229, 345, 252, 200, 204; P < 10−14 in each case; Wilcoxon signed-rank test of the null hypothesis that the median correlation within trials is zero) and significantly greater than the [DA]:δ correlation (P < 10−24 in each case, Wilcoxon signed-rank test). Group-wise, both [DA]:V and [DA]:δ correlations were significantly nonzero, and the difference between them was also significant (n = 6 sessions, all comparisons P = 0.031, Wilcoxon signed-rank test). Model parameters (α = 0.4, γ = 0.95) were chosen to maximize the average behavioral correlation across all six rats (Spearman r = −0.28), but the stronger [DA] correlation to V than to δ was seen for all parameter combinations (Supplementary Fig. 5). (g) Model variables were maximally correlated with [DA] signals ~0.5 s later, consistent with a slight delay due to the time taken by the brain to process cues and by the FSCV measurement itself.
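The kind of state-value computation described here can be sketched with a minimal tabular TD(0) learner over a sequence of timed state transitions. The state names, transition times, and the update convention δ = γ^τ·(r + V(s′)) − V(s) (with τ the seconds between transitions) are illustrative assumptions; for determinism, a reward of 0.5 on every trial stands in for the expected value of a 50%-rewarded block.

```python
# Minimal semi-Markov TD(0) sketch of an exponentially discounting
# state-value model (all specifics below are illustrative assumptions).
alpha, gamma = 0.1, 0.95        # learning rate, per-second discount factor

states = ['iti', 'center_in', 'go_cue', 'side_in', 'reward']
V = {s: 0.0 for s in states}    # one cached value per state

# (state, seconds until the next transition, reward delivered at that transition)
trial = [('iti', 3.0, 0.0), ('center_in', 1.0, 0.0),
         ('go_cue', 1.0, 0.0), ('side_in', 2.0, 0.5)]

for _ in range(500):
    transitions = zip(trial, trial[1:] + [('reward', 0.0, 0.0)])
    for (s, tau, r), (s_next, _, _) in transitions:
        # Semi-Markov RPE: future value and reward discounted by elapsed time tau
        delta = gamma**tau * (r + V[s_next]) - V[s]
        V[s] += alpha * delta

# After learning, value ramps upward through the trial as reward draws nearer:
# V['iti'] < V['center_in'] < V['go_cue'] < V['side_in']
```

With these settings V('side_in') converges to γ²·0.5 (the reward discounted by the ~2-s delay from reward cue to reward), and each earlier state's value is further discounted by its distance from reward, reproducing the rising value function of panel d.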

    Figure 5: Between-trial dopamine shifts reflect updated state values.

    (a) Less-expected outcomes provoke larger changes in [DA]. [DA] data from all FSCV sessions together (as in Fig. 3d), broken down by recent reward history and shown relative to pre-trial baseline (−3 to −1 s relative to Center-In). Note that the [DA] changes after reward omission last at least several seconds (a shift in level), rather than showing a highly transient dip followed by a return to baseline, as might be expected for encoding RPEs alone. (b) Quantification of [DA] changes between baseline and reward feedback (0.5–1.0 s after Side-In for rewarded trials, 1–3 s after Side-In for unrewarded trials). Error bars show s.e.m. (c) Data are presented as in a, but plotted relative to [DA] levels after reward feedback. These [DA] observations are consistent with a variable baseline whose level depends on recent reward history (as in the Fig. 4b model). (d) Alternative accounts of [DA] make different predictions for between-trial [DA] changes. When reward expectation is low, rewarded trials provoke large RPEs, but RPEs should decline across repeated consecutive rewards. Thus, if absolute [DA] levels encode RPE, the peak [DA] evoked by the reward cue should decline between consecutive rewarded trials (and baseline levels should not change). For simplicity, this cartoon omits detailed within-trial dynamics. (e) Predicted pattern of [DA] change under this account, which also does not predict any baseline shift after reward omissions (right). (f) If instead [DA] encodes state values, then peak [DA] should not decline from one reward to the next, but the baseline level should increase (and decrease following unrewarded trials). (g) Predicted pattern of [DA] change for this alternative account. (h) Unexpected rewards cause a shift in baseline, not in peak [DA]. Average FSCV data from consecutive pairs of rewarded trials (all FSCV sessions combined, as in a), shown relative to the pre-trial baseline of the first trial in each pair. Data were grouped into lower reward expectation (left pair of plots, 165 total trials; average time between Side-In events = 11.35 ± 0.22 s, s.e.m.) and higher reward expectation (right pair of plots, 152 total trials; time between Side-In events = 11.65 ± 0.23 s) by a median split of each individual session (using the number of rewards in the last ten trials). Dashed lines indicate that reward cues evoked a similar absolute level of [DA] in the second rewarded trial compared with the first. Black arrow indicates the elevated pre-trial [DA] level for the second trial in the pair (mean change in baseline [DA] = 0.108, P = 0.013, one-tailed Wilcoxon signed-rank test). No comparable change was observed if the first reward was more expected (right pair of plots; mean change in baseline [DA] = 0.0013, P = 0.108, one-tailed Wilcoxon signed-rank test). (i) [DA] changes between consecutive trials follow the pattern expected for value coding, rather than RPE coding alone. Error bars represent ±s.e.m.

    Figure 6: Phasic dopamine manipulations affect both learning and motivation.

    (a) FSCV measurement of optogenetically evoked [DA] increases. Optic fibers were placed above VTA and [DA] change was examined in the nucleus accumbens core. Example shows dopamine release evoked by a 0.5-s stimulation train (average of six stimulation events; shaded area indicates ±s.e.m.). (b) Effect of varying the number of laser pulses on evoked dopamine release, at the same 30-Hz stimulation frequency. (c) Dopaminergic stimulation at Side-In reinforces the chosen left or right action. Left, in Th-Cre+ rats, ChR2 stimulation increased the probability that the same action would be repeated on the next trial. Circles indicate average data for each of six rats (three sessions each, 384 ± 9.5 trials per session, s.e.m.). Middle, this effect did not occur in Th-Cre− littermate controls (six rats, three sessions each, 342 ± 7 trials per session). Right, in Th-Cre+ rats expressing halorhodopsin, orange laser stimulation at Side-In reduced the chance that the chosen action was repeated on the next trial (five rats, three sessions each, 336 ± 10 trials per session). See Supplementary Fig. 8 for additional analyses. (d) Laser stimulation at Light-On causes a shift toward sooner engagement, if the rats were not already engaged. Latency distribution (on log scale, 10 bins per log unit) for non-engaged, completed trials in Th-Cre+ rats with ChR2 (n = 4 rats with video analysis; see Supplementary Fig. 9 for additional analyses). (e) Same latency data as in d, but presented as hazard rates. Laser stimulation (blue ticks at top left) increased the chance that rats would decide to initiate an approach, resulting in more Center-In events 1–2 s later (for these n = 4 rats, one-way ANOVA on hazard rate F(1,3) = 18.1, P = 0.024). See Supplementary Fig. 10 for hazard rate time courses from the individual rats.

    Supplementary Fig. 1: Reward rate affects the decision to begin work.

    (a) Latency distributions are bimodal and depend on reward rate. Very short latencies (early peak) preferentially occur when a greater proportion of recent trials have been rewarded (same data set as Fig. 1d–i). (b) (top) Schematic of video analysis. Each trial was categorized as “engaged” (already waiting for Light-On) or non-engaged based upon distance (s) and orientation (θ) immediately before Light-On (see Methods). (bottom) Arrows indicate rat head position and orientation for engaged (pink) and non-engaged (green) trials (one example session shown). (c) Categorization into engaged and non-engaged trials accounts for the bimodal latency distribution (data shown are all non-laser trials across 12 ChR2 sessions in TH-Cre+ rats). (d) The proportion of engaged trials increases when more recent trials have been rewarded (3,336 trials from four rats, r = 0.82, P = 0.003). (e) Especially for non-engaged trials, latencies are lower when reward rate is higher (r = −0.11, P = 0.004 for 1,570 engaged trials; r = −0.18, P = 5.2 × 10−19 for 1,766 non-engaged trials).
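The engagement categorization in panel b amounts to a threshold rule on pre-trial position and heading. The sketch below illustrates the idea; the distance and angle thresholds are hypothetical placeholders, not the criteria given in Methods.

```python
import numpy as np

def classify_engaged(dist_cm, angle_deg, d_max=10.0, theta_max=45.0):
    """Label trials 'engaged' if the rat is near the start port and facing it.

    dist_cm (s) and angle_deg (theta) are measured immediately before
    Light-On; d_max and theta_max are hypothetical thresholds for this sketch.
    """
    dist = np.asarray(dist_cm, dtype=float)
    angle = np.asarray(angle_deg, dtype=float)
    return (dist <= d_max) & (np.abs(angle) <= theta_max)

# Three example trials: close and facing, far away, close but facing away
engaged = classify_engaged([4.0, 25.0, 6.0], [10.0, 5.0, 90.0])
```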

    Supplementary Fig. 2: Individual microdialysis sessions.

    Each row shows data for a different session, with the rat ID (e.g. IM463) and recording side indicated (LH = left, RH = right). From left: dialysis probe location, behavioral and [DA] time courses, and individual session correlations to behavioral variables. Reward rate is in units of rewards per min. Numbers of microdialysis samples for the seven sessions: 86, 72, 39, 39, 68, 73 and 67, respectively. The overall relationship between dopamine and reward rate remained highly significant even when excluding periods of inactivity (defined as no trials initiated for >2 min, shaded in green; regression R2 = 0.12, P = 1.4 × 10−13).

    Supplementary Fig. 3: Cross-correlograms for behavioral variables and neurochemicals.

    Each plot shows cross-correlograms averaged across all microdialysis sessions, all using the same axes (−20 min to +20 min lags, −0.5 to +1 correlation). Colored lines indicate statistical thresholds corrected for multiple comparisons (see Methods). Many neurochemical pairs show no evidence of covariation, but others display strong relationships, including a cluster of glutamate, serine, aspartate and glycine.

    Supplementary Fig. 4: Individual voltammetry sessions.

    Each row shows data for a different rat (e.g. IM355, which was also used as the example in Figs. 3 and 4). At left, recording site within the nucleus accumbens. Middle panels show behavioral data for the FSCV session (same format as Fig. 1). Right panels show individual FSCV data (same format as Fig. 3, but with additional event alignments).

    Supplementary Fig. 5: SMDP model.

    (a) Task performance was modeled as a sequence of transitions between states of the agent (rat). Each state had a single associated cached value V(s) (rather than, for example, separate state-action (Q) values for leftward and rightward trials). Most state transitions occur at variable times (hence “semi-Markov”), marked by observed external events (Center-In, Go-Cue, etc.). In contrast, the state sequence between Side-Out and Reward Port In is arbitrarily defined (“Approaching Reward Port” begins 1 s before Reward Port In; “Arriving At Reward Port” begins 0.5 s before Reward Port In). Changing the number or specific timing of these intermediate states does not materially affect the rising shape of the value function. (b) Average correlation (color scale = Spearman’s r) between SMDP model state value at Center-In (Vci) and latency across all six FSCV rats, for a range of learning rates α and exponential discounting time constants γ. Note that the color scale is inverted (red indicates the strongest negative relationship, with higher value corresponding to shorter latency). White dot marks the point of strongest relationship (α = 0.40, γ = 0.95). (c) Correlation between [DA] and state value V is stronger than the correlation between [DA] and reward prediction error δ, across the same range of parameters. Color scale at right is the same for both matrices (Spearman’s r).

    Supplementary Fig. 6: Dopamine relationships to temporally-stretched model variables.

    (a) Kernel consisted of an exponential rise (to 50% of asymptote) and an exponential fall, with separate time constants. (b) Within-trial correlation coefficients between [DA] and kernel-convolved model variables V and δ, for a range of rise and fall time constants (0–1.5 s each, in 50-ms timesteps, using data from all six rats). Regardless of parameter values, [DA] correlations to V were always higher than to δ. (c) Same example data as Fig. 4e, but also showing convolved V and δ (using the time constants that maximized correlation to [DA] in each case). (d) Trial-by-trial (top) and average (bottom) [DA], convolved V, and convolved δ, for the same session as Fig. 4d,e.
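The kernel convolution described here can be sketched as follows. The specific functional form below (a saturating exponential rise multiplied by an exponential decay, normalized to unit area) is one plausible construction assumed for illustration; the paper's exact kernel parameterization may differ.

```python
import numpy as np

def rise_fall_kernel(tau_rise, tau_fall, dt=0.05, t_max=3.0):
    """Causal impulse kernel with an exponential rise and an exponential fall.

    Rising phase (1 - exp(-t/tau_rise)) times decaying phase exp(-t/tau_fall),
    normalized to unit area so that convolution preserves signal scale.
    """
    t = np.arange(0.0, t_max, dt)
    k = (1.0 - np.exp(-t / tau_rise)) * np.exp(-t / tau_fall)
    return k / k.sum()

# Convolve a model variable (a step in state value V, e.g. at a reward cue)
dt = 0.05                                  # 50-ms timesteps, as above
kernel = rise_fall_kernel(tau_rise=0.3, tau_fall=0.8, dt=dt)
v = np.zeros(200)
v[100:] = 1.0                              # value steps up at t = 5 s
v_conv = np.convolve(v, kernel)[:len(v)]   # causal, temporally smoothed V
```

Convolving the instantaneous model variable with such a kernel mimics the sluggish rise and fall of the measured [DA] signal before correlating it with V or δ.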

    Supplementary Fig. 7: Histology for behavioral optogenetic experiments.

    The identifier (e.g. “IM389”) for each rat is given at the bottom right corner. Coronal sections shown are within 180 µm (anterior–posterior) of the observed fiber tip location. Green indicates expression of eYFP; blue is DAPI counterstain. In two cases (IM423, IM441), autofluorescence of damaged brain tissue is visible along the optic fiber tracts; this was not specific to the green channel.

    Supplementary Fig. 8: Further analysis of persistence of optogenetic effects.

    (a) Regression analysis showing substantial effects of recent rewards (black) on latency, but no comparable effect of recent Side-In laser stimulations on latency. (b) Effects of the Light-On [DA] manipulation on same-trial latency distributions (top), and of the Side-In [DA] manipulation on next-trial latency distributions (bottom). The data set shown is the same as in Fig. 6c, i.e. all completed trials in TH-Cre+ rats with ChR2 (left), TH-Cre− rats with ChR2 (middle) and TH-Cre+ rats with halorhodopsin (right). (c) Regression analysis of laser stimulation on subsequent left/right choices. Recent food rewards for a given left/right action increase the probability that it will be repeated. Extra [DA] at Light-On has little or no effect on subsequent choices, but extra [DA] at Side-In is persistently reinforcing. For the Side-In data, note especially the positive coefficients for otherwise unrewarded laser trials.

    Supplementary Fig. 9: Video analysis of optogenetic effects on latency.

    (a) Extra [DA] at Light-On causes shorter latencies for non-engaged trials, but longer latencies for a subset of engaged trials. Top plot shows all trials (for the n = 4 TH-Cre+ rats with ChR2 stimulation at Light-On for which video was recorded; three sessions per rat; 3,336 no-laser trials in grey; 1,335 laser trials in blue). Bottom plots show the breakdown into engaged (n = 1,975) and non-engaged (n = 2,696) trials. (b) We examined whether laser-slowed trials might be those in which the rat was waiting at the wrong port (if, for example, DA were to increase the salience of currently attended stimuli). Engaged trials were further broken down into “lucky guesses” (those trials for which the rat was immediately adjacent to the start port as it was illuminated) and “unlucky guesses” (immediately adjacent to one of the other two possible start ports). Blue dashed ellipses indicate the zones used to classify trials by guessed port (8.5 cm long diameter, 3.4 cm short diameter). (c) Laser-slowing was observed for both lucky (n = 603) and unlucky (n = 1,007) guesses. Note that the blue distribution is bimodal in both cases, indicating that only a subset of trials were affected. Video observations suggested that on some trials extra [DA] evokes a small extra head/neck movement that makes the trajectory to the illuminated port longer and therefore slower. (d) Quantification of trajectories, by scoring rat location on each video frame from 1 s before Light-On to 1 s after Center-In. Colored lines show all individual trajectories for one example session. Panels at right show the same trajectories plotted as distance remaining from the Center-In port, by time elapsed from either Light-On or Center-In. Note that for non-engaged trials (green), the approach to the Center-In port consistently takes ~1–2 s. Therefore, the epoch considered as “baseline” in the FSCV analyses (−3 to −1 s relative to Center-In) is around the time that rats decide to initiate approach behaviors. (e) Extra [DA] causes longer average trajectories for engaged trials. Cumulative distributions of path lengths between Light-On and Center-In, for (top to bottom) engaged/lucky, engaged/unlucky and non-engaged trials, respectively. Blue lines indicate laser trials, and P values are from Kolmogorov–Smirnov tests comparing laser to no-laser distributions (no-laser/laser trial numbers: top, 292/75; middle, 424/99; bottom, 1,897/792). On engaged trials rats often reoriented between the three potential start ports, perhaps checking whether they were illuminated; one possibility is that the extra laser-evoked movement on engaged trials reflects dopaminergic facilitation of these orienting movements. If such a movement is already close to execution before Light-On, it may be evoked before the correct start port can be appropriately targeted. (f) Additional trajectory analysis, plotting time courses of rat distance from the illuminated start port. On non-engaged trials extra [DA] tends to make the approach to the illuminated start port occur earlier (note the progressive separation of the green and blue lines when aligned on Light-On). However, the approach time course is extremely similar (note the overlapping lines in the final ~1–2 s before Center-In), indicating that extra [DA] did not affect the speed of approach.
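The distribution comparison in panel e can be sketched with SciPy's two-sample Kolmogorov–Smirnov test. The simulated path lengths, and the assumption that a fixed extra detour is added to a subset of laser trials, are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical path lengths (cm) between Light-On and Center-In: laser trials
# are assumed to gain an extra evoked movement on a subset of trials
no_laser = rng.normal(20.0, 2.0, 300)
extra = rng.random(150) < 0.4                    # subset with extra movement
laser = rng.normal(20.0, 2.0, 150) + np.where(extra, 8.0, 0.0)

# Two-sample Kolmogorov-Smirnov test on the cumulative distributions
D, p = stats.ks_2samp(no_laser, laser)
```

Because only a subset of laser trials shifts, the laser distribution becomes bimodal, which the KS statistic (maximum gap between the two cumulative distributions) detects even though the means differ only modestly.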

    Supplementary Fig. 10: Optogenetic effects on hazard rates for individual video-scored rats.

    Latency survivor plots (top) and corresponding hazard rates (bottom) for each of the four TH-Cre+ rats with ChR2 stimulation at Light-On for which video was recorded (each rat had three video sessions, which were concatenated for analysis). Only non-engaged trials are included (numbers of no-laser/laser trials: IM-389, 522/215; IM-391, 294/125; IM-392, 481/191; IM-394, 462/189). For each rat, laser stimulation caused an increase in the hazard rate of the Center-In event ~1–2 s later (the duration of an approach).

  44. McClure, S.M., Daw, N.D. & Montague, P.R. A computational substrate for incentive salience. Trends Neurosci. 26, 423428 (2003).
  45. Tachibana, Y. & Hikosaka, O. The primate ventral pallidum encodes expected reward value and regulates motor action. Neuron 76, 826837 (2012).
  46. van der Meer, M.A. & Redish, A.D. Ventral striatum: a critical look at models of learning and evaluation. Curr. Opin. Neurobiol. 21, 387392 (2011).
  47. Fiorillo, C.D., Tobler, P.N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898902 (2003).
  48. Gershman, S.J. Dopamine ramps are a consequence of reward prediction errors. Neural Comput. 26, 467471 (2014).
  49. Morita, K. & Kato, A. Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Front Neural Circuits 8, 36 (2014).
  50. Threlfell, S. et al. Striatal dopamine release is triggered by synchronized activity in cholinergic interneurons. Neuron 75, 5864 (2012).
  51. Witten, I.B. et al. Recombinase-driver rat lines: tools, techniques, and optogenetic application to dopamine-mediated reinforcement. Neuron 72, 721733 (2011).
  52. Gage, G.J., Stoetzner, C.R., Wiltschko, A.B. & Berke, J.D. Selective activation of striatal fast-spiking interneurons during choice execution. Neuron 67, 46679 (2010).
  53. Leventhal, D.K. et al. Basal ganglia beta oscillations accompany cue utilization. Neuron 73, 523536 (2012).
  54. Schmidt, R., Leventhal, D.K., Mallet, N., Chen, F. & Berke, J.D. Canceling actions involves a race between basal ganglia pathways. Nat. Neurosci. 16, 11181124 (2013).
  55. Lau, B. & Glimcher, P.W. Dynamic response-by-response models of matching behavior in rhesus monkeys. J. Exp. Anal. Behav. 84, 555579 (2005).
  56. Simen, P., Cohen, J.D. & Holmes, P. Rapid decision threshold modulation by reward rate in a neural network. Neural Netw. 19, 10131026 (2006).
  57. Daw, N.D., Kakade, S. & Dayan, P. Opponent interactions between serotonin and dopamine. Neural Netw. 15, 603616 (2002).
  58. Sugrue, L.P., Corrado, G.S. & Newsome, W.T. Matching behavior and the representation of value in the parietal cortex. Science 304, 17821787 (2004).
  59. Humphries, M.D., Khamassi, M. & Gurney, K. Dopaminergic control of the exploration-exploitation trade-off via the basal ganglia. Front Neurosci 6, 9 (2012).
  60. Beeler, J.A., Frazier, C.R. & Zhuang, X. Putting desire on a budget: dopamine and energy expenditure, reconciling reward and resources. Front Integr Neurosci 6, 49 (2012).
  61. Song, P., Mabrouk, O.S., Hershey, N.D. & Kennedy, R.T. In vivo neurochemical monitoring using benzoyl chloride derivatization and liquid chromatography–mass spectrometry. Anal. Chem. 84, 412419 (2012).
  62. Song, P., Hershey, N.D., Mabrouk, O.S., Slaney, T.R. & Kennedy, R.T. Mass spectrometry “sensor” for in vivo acetylcholine monitoring. Anal. Chem. 84, 46594664 (2012).
  63. Aragona, B.J. et al. Regional specificity in the real-time development of phasic dopamine transmission patterns during acquisition of a cue–cocaine association in rats. Eur. J. Neurosci. 30, 18891899 (2009).
  64. Daw, N.D., Courville, A.C. & Touretzky, D.S. Representation and timing in theories of the dopamine system. Neural Comput. 18, 16371677 (2006).
  65. Kile, B.M. et al. Optimizing the temporal resolution of fast-scan cyclic voltammetry. ACS Chem. Neurosci. 3, 285292 (2012).
  66. Mazur, J.E. Tests of an equivalence rule for fixed and variable reinforcer delays. J. Exp. Psychol. Anim. Behav. Process. 10, 426 (1984).
  67. Ainslie, G. Précis of breakdown of will. Behav. Brain Sci. 28, 635650 (2005).
  68. Kobayashi, S. & Schultz, W. Influence of reward delays on responses of dopamine neurons. J. Neurosci. 28, 78377846 (2008).
  69. Kacelnik, A. Normative and descriptive models of decision making: time discounting and risk sensitivity. Ciba Found. Symp. 208, 5167 (1997).


Author information

  1. Present addresses: BrainLinks-BrainTools Cluster of Excellence and Bernstein Center, University of Freiburg, Germany (R.S.); Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA (C.M.V.W.).

    • Robert Schmidt &
    • Caitlin M Vander Weele
  2. These authors contributed equally to this work.

    • Arif A Hamid &
    • Jeffrey R Pettibone

Affiliations

  1. Department of Psychology, University of Michigan, Ann Arbor, Michigan, USA.

    • Arif A Hamid,
    • Jeffrey R Pettibone,
    • Vaughn L Hetrick,
    • Robert Schmidt,
    • Caitlin M Vander Weele,
    • Brandon J Aragona &
    • Joshua D Berke
  2. Neuroscience Graduate Program, University of Michigan, Ann Arbor, Michigan, USA.

    • Arif A Hamid,
    • Brandon J Aragona &
    • Joshua D Berke
  3. Department of Chemistry, University of Michigan, Ann Arbor, Michigan, USA.

    • Omar S Mabrouk &
    • Robert T Kennedy
  4. Department of Pharmacology, University of Michigan, Ann Arbor, Michigan, USA.

    • Omar S Mabrouk &
    • Robert T Kennedy
  5. Department of Biomedical Engineering, University of Michigan, Ann Arbor, Michigan, USA.

    • Joshua D Berke

Contributions

A.A.H. performed and analyzed both FSCV and optogenetic experiments, and J.R.P. performed and analyzed the microdialysis experiments. O.S.M. assisted with microdialysis, C.M.V.W. assisted with FSCV, V.L.H. assisted with optogenetics and R.S. assisted with reinforcement learning models. B.J.A. helped supervise the FSCV experiments and data analysis, and R.T.K. helped supervise microdialysis experiments. J.D.B. designed and supervised the study, performed the computational modeling, developed the theoretical interpretation, and wrote the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:


Supplementary information

Supplementary Figures

  1. Supplementary Figure 1: Reward rate affects the decision to begin work. (305 KB)

    (a) Latency distributions are bimodal and depend on reward rate. Very short latencies (early peak) preferentially occur when a greater proportion of recent trials have been rewarded (same data set as Fig. 1d–i). (b) (top) Schematic of video analysis. Each trial was categorized as “engaged” (already waiting for Light-On) or non-engaged based upon distance (s) and orientation (θ) immediately before Light-On (see Methods). (bottom) Arrows indicate rat head position and orientation for engaged (pink) and non-engaged (green) trials (one example session shown). (c) Categorization into engaged and non-engaged trials accounts for the bimodal latency distribution (data shown are all non-laser trials across 12 ChR2 sessions in TH-Cre+ rats). (d) Proportion of engaged trials increases when more recent trials have been rewarded (3,336 trials from 4 rats, r = 0.82, p = 0.003). (e) Especially for non-engaged trials, latencies are lower when reward rate is higher (r = −0.11, p = 0.004 for 1,570 engaged trials; r = −0.18, p = 5.2 × 10−19 for 1,766 non-engaged trials).

  2. Supplementary Figure 2: Individual microdialysis sessions. (802 KB)

    Each row shows data for a different session, with the rat ID (e.g. IM463) and recording side (LH = left, RH = right) indicated. From left: dialysis probe location, behavioral and [DA] time courses, and individual session correlations to behavioral variables. Reward rate is in units of rewards per min. Numbers of microdialysis samples for the seven sessions: 86, 72, 39, 39, 68, 73, 67, respectively. The overall relationship between dopamine and reward rate remained highly significant even when excluding periods of inactivity (defined as no trials initiated for >2 min, shaded in green; regression R2 = 0.12, p = 1.4 × 10−13).
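The regression reported in this legend (e.g. R2 = 0.12) is an ordinary least-squares fit of [DA] against reward rate; a minimal Python sketch of the R2 computation (function and variable names are ours, not from the paper's analysis code):

```python
import numpy as np

def regression_r2(x, y):
    """OLS slope, intercept, and coefficient of determination R^2 for y ~ x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # least-squares line fit
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)             # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)      # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot
```

Here x would be the per-sample reward rate and y the corresponding dialysate [DA]; excluding inactive periods simply means dropping those samples before fitting.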

  3. Supplementary Figure 3: Cross-correlograms for behavioral variables and neurochemicals. (592 KB)

    Each plot shows cross-correlograms averaged across all microdialysis sessions, all using the same axes (−20 min to +20 min lags, −0.5 to +1 correlation). Colored lines indicate statistical thresholds corrected for multiple comparisons (see Methods). Many neurochemical pairs show no evidence of covariation, but others display strong relationships, including a cluster of glutamate, serine, aspartate and glycine.
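A cross-correlogram of this kind amounts to computing the Pearson correlation between one time series and lagged copies of another. A minimal sketch (function name and details are ours, not the paper's analysis code; two equal-length, evenly sampled series are assumed, so lag units are sampling intervals):

```python
import numpy as np

def cross_correlogram(x, y, max_lag):
    """Pearson correlation between x and y at integer lags.

    A positive lag k correlates x[t] with y[t + k], i.e. y shifted
    later in time relative to x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(lags.size)
    for i, k in enumerate(lags):
        if k >= 0:
            a, b = x[: x.size - k], y[k:]
        else:
            a, b = x[-k:], y[: y.size + k]
        r[i] = np.corrcoef(a, b)[0, 1]   # Pearson r on the overlapping span
    return lags, r
```

Averaging such curves across sessions, and comparing against a shuffle-based threshold, would give plots of the kind shown here.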

  4. Supplementary Figure 4: Individual voltammetry sessions. (1,786 KB)

    Each row shows data for a different rat (e.g. IM355, which was also used as the example in Figs. 3 and 4). At left, recording site within nucleus accumbens. Middle panels show behavioral data for the FSCV session (same format as Fig. 1). Right panels show individual FSCV data (same format as Fig. 3, but with additional event alignments).

  5. Supplementary Figure 5: SMDP model. (376 KB)

    (a) Task performance was modeled as a sequence of transitions between states of the agent (rat). Each state had a single associated cached value V(s) (rather than, for example, separate state–action (Q) values for leftward and rightward trials). Most state transitions occur at variable times (hence “semi-Markov”), marked by observed external events (Center-In, Go-Cue, etc.). In contrast, the state sequence between Side-Out and Reward Port In is arbitrarily defined (“Approaching Reward Port” begins 1 s before Reward Port In; “Arriving At Reward Port” begins 0.5 s before Reward Port In). Changing the number or specific timing of these intermediate states does not materially affect the rising shape of the value function. (b) Average correlation (color scale = Spearman’s r) between SMDP model state value at Center-In (Vci) and latency across all six FSCV rats, for a range of learning rates α and exponential discounting time constants γ. Note that the color scale is inverted (red indicates the strongest negative relationship, with higher value corresponding to shorter latency). White dot marks the point of the strongest relationship (α = 0.40, γ = 0.95). (c) Correlation between [DA] and state value V is stronger than the correlation between [DA] and reward prediction error δ, across the same range of parameters. Color scale at right is the same for both matrices (Spearman’s r).
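The value-learning scheme this legend describes — cached state values updated by temporal-difference errors, with discounting that depends on the variable time spent between state transitions — can be sketched as follows (a minimal illustration under our own simplifying assumptions: function and state names are ours, and γ is treated as a per-second exponential discount factor):

```python
def smdp_td_update(V, s, s_next, reward, dwell, alpha=0.40, gamma=0.95):
    """One temporal-difference update for a semi-Markov state transition.

    V      : dict mapping state name -> cached value V(s)
    dwell  : time (s) spent in state s before the transition; gamma is
             raised to this power, so longer delays discount the
             successor state's value more heavily.
    Returns the reward prediction error delta.
    """
    discount = gamma ** dwell
    delta = reward + discount * V[s_next] - V[s]   # TD error
    V[s] = V[s] + alpha * delta                    # update cached value
    return delta
```

With α = 0.40 and γ = 0.95 (the best-fitting parameters in panel b), a reward delayed by 2 s is worth 0.95² ≈ 0.90 of an immediate one, producing the rising within-trial value function described above.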

  6. Supplementary Figure 6: Dopamine relationships to temporally-stretched model variables. (510 KB)

    (a) The kernel consisted of an exponential rise (to 50% of asymptote) and an exponential fall, with separate time constants. (b) Within-trial correlation coefficients between [DA] and kernel-convolved model variables V and δ, for a range of rise and fall time constants (0–1.5 s each, in 50-ms timesteps, using data from all 6 rats). Regardless of parameter values, [DA] correlations to V were always higher than to δ. (c) Same example data as Fig. 4e, but also showing convolved V and δ (using time constants that maximized correlation to [DA] in each case). (d) Trial-by-trial (top) and average (bottom) [DA], convolved V, and convolved δ, for the same session as Fig. 4d,e.
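One way to implement a kernel convolution of this kind is sketched below, under our own parameterization assumptions (the exact construction of the rise phase, e.g. the 50%-of-asymptote criterion, is simplified here; function names are ours):

```python
import numpy as np

def double_exp_kernel(tau_rise, tau_fall, dt=0.05, t_max=3.0):
    """Causal kernel: exponential rise multiplied by exponential fall.

    Assumed form (one common parameterization; the paper's exact
    construction may differ):
        k(t) = (1 - exp(-t / tau_rise)) * exp(-t / tau_fall)
    normalized to unit area, so convolution preserves signal scale.
    """
    t = np.arange(0.0, t_max, dt)
    rise = 1.0 - np.exp(-t / tau_rise) if tau_rise > 0 else np.ones_like(t)
    k = rise * np.exp(-t / tau_fall)
    return k / (k.sum() * dt)

def convolve_causal(signal, kernel, dt=0.05):
    """Convolve a signal with a causal kernel, keeping input alignment."""
    return np.convolve(signal, kernel, mode="full")[: len(signal)] * dt
```

Applying `convolve_causal` to the model's V and δ time series, over a grid of (tau_rise, tau_fall) pairs, would yield correlation maps of the kind shown in panel b.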

  7. Supplementary Figure 7: Histology for behavioral optogenetic experiments. (511 KB)

    The identifier (e.g. “IM389”) for each rat is given at the bottom right corner. Coronal sections shown are within 180 µm (anterior–posterior) of the observed fiber tip location. Green indicates expression of eYFP; blue is DAPI counterstain. In two cases (IM423, IM441) autofluorescence of damaged brain tissue is visible along the optic fiber tracts; this was not specific to the green channel.

  8. Supplementary Figure 8: Further analysis of persistence of optogenetic effects. (589 KB)

    (a) Regression analysis showing substantial effects of recent rewards (black) on latency, but no comparable effect of recent Side-In laser stimulations on latency. (b) Effects of Light-On [DA] manipulation on same-trial latency distributions (top), and of Side-In [DA] manipulation on next-trial latency distributions (bottom). Dataset shown is the same as Fig. 6c, i.e. all completed trials in TH-Cre+ rats with ChR2 (left), TH-Cre− rats with ChR2 (middle) and TH-Cre+ rats with halorhodopsin (right). (c) Regression analysis of laser stimulation on subsequent left/right choices. Recent food rewards for a given left/right action increase the probability that it will be repeated. Extra [DA] at Light-On has little or no effect on subsequent choices, but extra [DA] at Side-In is persistently reinforcing. For the Side-In data, note especially the positive coefficients for otherwise unrewarded laser trials.

  9. Supplementary Figure 9: Video analysis of optogenetic effects on latency. (379 KB)

    (a) Extra [DA] at Light-On causes shorter latencies for non-engaged trials, but longer latencies for a subset of engaged trials. Top plot shows all trials (for the n = 4 TH-Cre+ rats with ChR2 stimulation at Light-On for which video was recorded; 3 sessions/rat; 3,336 no-laser trials in grey; 1,335 laser trials in blue). Bottom plots show the breakdown into engaged (n = 1,975) and non-engaged (n = 2,696) trials. (b) We examined whether laser-slowed trials might be those in which the rat was waiting at the wrong port (if, for example, DA were to increase the salience of currently attended stimuli). Engaged trials were further broken down into “lucky guesses” (those trials for which the rat was immediately adjacent to the start port as it was illuminated) and “unlucky guesses” (immediately adjacent to one of the other two possible start ports). Blue dashed ellipses indicate zones used to classify trials by guessed port (8.5 cm long diameter, 3.4 cm short diameter). (c) Laser-slowing was observed for both lucky (n = 603) and unlucky (n = 1,007) guesses. Note that the blue distribution is bimodal in both cases, indicating that only a subset of trials were affected. Video observations suggested that on some trials extra [DA] evokes a small extra head/neck movement that makes the trajectory to the illuminated port longer and therefore slower. (d) Quantification of trajectories, by scoring rat location on each video frame from 1 s before Light-On to 1 s after Center-In. Colored lines show all individual trajectories for one example session. Panels at right show the same trajectories plotted as distance remaining from the Center-In port, by time elapsed from either Light-On or Center-In. Note that for non-engaged trials (green), the approach to the Center-In port consistently takes ~1–2 s. Therefore, the epoch considered as “baseline” in the FSCV analyses (−3 to −1 s relative to Center-In) is around the time that rats decide to initiate approach behaviors.
    (e) Extra [DA] causes longer average trajectories for engaged trials. Cumulative distributions of path-lengths between Light-On and Center-In, for (top-to-bottom) engaged/lucky, engaged/unlucky and non-engaged trials respectively. Blue lines indicate laser trials, and p-values are from Kolmogorov–Smirnov tests comparing laser to no-laser distributions (no-laser/laser trial numbers: top, 292/75; middle, 424/99; bottom, 1,897/792). On engaged trials rats often reoriented between the three potential start ports, perhaps checking whether they were illuminated; one possibility is that the extra laser-evoked movement on engaged trials reflects dopaminergic facilitation of these orienting movements. If such a movement is already close to execution before Light-On, it may be evoked before the correct start port can be appropriately targeted. (f) Additional trajectory analysis, plotting time courses of rat distance from the illuminated start port. On non-engaged trials extra [DA] tends to make the approach to the illuminated start port occur earlier (note the progressive separation of the green and blue lines when aligned on Light-On). However, the approach time course is extremely similar (note the overlapping lines in the final ~1–2 s before Center-In), indicating that extra [DA] did not affect the speed of approach.

  10. Supplementary Figure 10: Optogenetic effects on hazard rates for individual video-scored rats. (152 KB)

    Latency survivor plots (top) and corresponding hazard rates (bottom) for each of the four TH-Cre+ rats with ChR2 stimulation at Light-On for which video was recorded (each rat had 3 video sessions that were concatenated for analysis). Only non-engaged trials are included (numbers of no-laser/laser trials: IM-389, 522/215; IM-391, 294/125; IM-392, 481/191; IM-394, 462/189). For each rat, laser stimulation caused an increase in the hazard rate of the Center-In event ~1–2 s later (the duration of an approach).
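Survivor and hazard curves of the kind shown here (and in Fig. 1h,i) can be computed from raw latencies as follows (a minimal sketch; the function name and binning choices are ours, not the paper's):

```python
import numpy as np

def survivor_and_hazard(latencies, bin_width=0.25, t_max=10.0):
    """Empirical survivor curve and hazard rate from trial latencies.

    survivor[i]: fraction of trials in which the event has not yet
    occurred by the start of bin i.
    hazard[i]: conditional probability that the event occurs in bin i,
    given that it has not occurred before it.
    """
    edges = np.arange(0.0, t_max + bin_width, bin_width)
    counts, _ = np.histogram(latencies, bins=edges)
    n = len(latencies)
    # Trials still "at risk" (event not yet occurred) at each bin start
    at_risk = n - np.concatenate(([0], np.cumsum(counts)[:-1]))
    survivor = at_risk / n
    hazard = np.where(at_risk > 0, counts / np.maximum(at_risk, 1), 0.0)
    return edges[:-1], survivor, hazard
```

Computing these curves separately for laser and no-laser trials, per rat, would reproduce the comparison described in this legend.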

PDF files

  1. Supplementary Text and Figures (9,040 KB)

    Supplementary Figures 1–10

  2. Supplementary Methods Checklist (432 KB)
