The dopamine projection from ventral tegmental area (VTA) to nucleus accumbens (NAc) is critical for motivation to work for rewards and reward-driven learning. How dopamine supports both functions is unclear. Dopamine cell spiking can encode prediction errors, which are vital learning signals in computational theories of adaptive behaviour. By contrast, dopamine release ramps up as animals approach rewards, mirroring reward expectation. This mismatch might reflect differences in behavioural tasks, slower changes in dopamine cell spiking or spike-independent modulation of dopamine release. Here we compare spiking of identified VTA dopamine cells with NAc dopamine release in the same decision-making task. Cues that indicate an upcoming reward increased both spiking and release. However, NAc core dopamine release also covaried with dynamically evolving reward expectations, without corresponding changes in VTA dopamine cell spiking. Our results suggest a fundamental difference in how dopamine release is regulated to achieve distinct functions: broadcast burst signals promote learning, whereas local control drives motivation.
Custom MATLAB code is available on request from J.D.B.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
Pan, W. X., Schmidt, R., Wickens, J. R. & Hyland, B. I. Dopamine cells respond to predicted events during classical conditioning: evidence for eligibility traces in the reward-learning network. J. Neurosci. 25, 6235–6242 (2005).
Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012).
Steinberg, E. E. et al. A causal link between prediction errors, dopamine neurons and learning. Nat. Neurosci. 16, 966–973 (2013).
Hamid, A. A. et al. Mesolimbic dopamine signals the value of work. Nat. Neurosci. 19, 117–126 (2016).
Saunders, B. T., Richard, J. M., Margolis, E. B. & Janak, P. H. Dopamine neurons create Pavlovian conditioned stimuli with circuit-defined motivational properties. Nat. Neurosci. 21, 1072–1083 (2018).
Phillips, P. E., Stuber, G. D., Heien, M. L., Wightman, R. M. & Carelli, R. M. Subsecond dopamine release promotes cocaine seeking. Nature 422, 614–618 (2003).
Roitman, M. F., Stuber, G. D., Phillips, P. E., Wightman, R. M. & Carelli, R. M. Dopamine operates as a subsecond modulator of food seeking. J. Neurosci. 24, 1265–1271 (2004).
Wassum, K. M., Ostlund, S. B. & Maidment, N. T. Phasic mesolimbic dopamine signaling precedes and predicts performance of a self-initiated action sequence task. Biol. Psychiatry 71, 846–854 (2012).
Howe, M. W., Tierney, P. L., Sandberg, S. G., Phillips, P. E. & Graybiel, A. M. Prolonged dopamine signalling in striatum signals proximity and value of distant rewards. Nature 500, 575–579 (2013).
Syed, E. C. et al. Action initiation shapes mesolimbic dopamine encoding of future rewards. Nat. Neurosci. 19, 34–36 (2016).
Morris, G., Nevet, A., Arkadir, D., Vaadia, E. & Bergman, H. Midbrain dopamine neurons encode decisions for future action. Nat. Neurosci. 9, 1057–1063 (2006).
da Silva, J. A., Tecuapetla, F., Paixão, V. & Costa, R. M. Dopamine neuron activity before action initiation gates and invigorates future movements. Nature 554, 244–248 (2018).
Fiorillo, C. D., Tobler, P. N. & Schultz, W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299, 1898–1902 (2003).
Patriarchi, T. et al. Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors. Science 360, eaat4422 (2018).
Salamone, J. D. & Correa, M. The mysterious motivational functions of mesolimbic dopamine. Neuron 76, 470–485 (2012).
Schultz, W. Predictive reward signal of dopamine neurons. J. Neurophysiol. 80, 1–27 (1998).
Garris, P. A. & Wightman, R. M. Different kinetics govern dopaminergic transmission in the amygdala, prefrontal cortex, and striatum: an in vivo voltammetric study. J. Neurosci. 14, 442–450 (1994).
Frank, M. J., Doll, B. B., Oas-Terpstra, J. & Moreno, F. Prefrontal and striatal dopaminergic genes predict individual differences in exploration and exploitation. Nat. Neurosci. 12, 1062–1068 (2009).
St Onge, J. R., Ahn, S., Phillips, A. G. & Floresco, S. B. Dynamic fluctuations in dopamine efflux in the prefrontal cortex and nucleus accumbens during risk-based decision making. J. Neurosci. 32, 16880–16891 (2012).
Bartra, O., McGuire, J. T. & Kable, J. W. The valuation system: a coordinate-based meta-analysis of BOLD fMRI experiments examining neural correlates of subjective value. Neuroimage 76, 412–427 (2013).
Ikemoto, S. Dopamine reward circuitry: two projection systems from the ventral midbrain to the nucleus accumbens-olfactory tubercle complex. Brain Res. Brain Res. Rev. 56, 27–78 (2007).
Breton, J. M. et al. Relative contributions and mapping of ventral tegmental area dopamine and GABA neurons by projection target in the rat. J. Comp. Neurol. (2018).
Ungless, M. A., Magill, P. J. & Bolam, J. P. Uniform inhibition of dopamine neurons in the ventral tegmental area by aversive stimuli. Science 303, 2040–2042 (2004).
Morales, M. & Margolis, E. B. Ventral tegmental area: cellular heterogeneity, connectivity and behaviour. Nat. Rev. Neurosci. 18, 73–85 (2017).
Morris, G., Arkadir, D., Nevet, A., Vaadia, E. & Bergman, H. Coincident but distinct messages of midbrain dopamine and striatal tonically active neurons. Neuron 43, 133–143 (2004).
Floresco, S. B., West, A. R., Ash, B., Moore, H. & Grace, A. A. Afferent modulation of dopamine neuron firing differentially regulates tonic and phasic dopamine transmission. Nat. Neurosci. 6, 968–973 (2003).
Grace, A. A. Dysregulation of the dopamine system in the pathophysiology of schizophrenia and depression. Nat. Rev. Neurosci. 17, 524–532 (2016).
Cohen, J. Y., Amoroso, M. W. & Uchida, N. Serotonergic neurons signal reward and punishment on multiple timescales. eLife 4, e06346 (2015).
Niv, Y., Daw, N. & Dayan, P. How fast to work: response vigor, motivation and tonic dopamine. Adv. Neural Inf. Process. Syst. 18, 1019 (2006).
Bayer, H. M., Lau, B. & Glimcher, P. W. Statistics of midbrain dopamine neuron spike trains in the awake primate. J. Neurophysiol. 98, 1428–1439 (2007).
Chergui, K., Suaud-Chagny, M. F. & Gonon, F. Nonlinear relationship between impulse flow, dopamine release and dopamine elimination in the rat brain in vivo. Neuroscience 62, 641–645 (1994).
Parker, N. F. et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat. Neurosci. 19, 845–854 (2016).
Menegas, W., Babayan, B. M., Uchida, N. & Watabe-Uchida, M. Opposite initialization to novel cues in dopamine signaling in ventral and posterior striatum in mice. eLife 6, e21886 (2017).
Trulson, M. E. Simultaneous recording of substantia nigra neurons and voltammetric release of dopamine in the caudate of behaving cats. Brain Res. Bull. 15, 221–223 (1985).
Glowinski, J., Chéramy, A., Romo, R. & Barbeito, L. Presynaptic regulation of dopaminergic transmission in the striatum. Cell. Mol. Neurobiol. 8, 7–17 (1988).
Zhou, F. M., Liang, Y. & Dani, J. A. Endogenous nicotinic cholinergic activity regulates dopamine release in the striatum. Nat. Neurosci. 4, 1224–1229 (2001).
Threlfell, S. et al. Striatal dopamine release is triggered by synchronized activity in cholinergic interneurons. Neuron 75, 58–64 (2012).
Cachope, R. et al. Selective activation of cholinergic interneurons enhances accumbal phasic dopamine release: setting the tone for reward processing. Cell Reports 2, 33–41 (2012).
Sulzer, D., Cragg, S. J. & Rice, M. E. Striatal dopamine neurotransmission: regulation of release and uptake. Basal Ganglia 6, 123–148 (2016).
Floresco, S. B., Yang, C. R., Phillips, A. G. & Blaha, C. D. Basolateral amygdala stimulation evokes glutamate receptor-dependent dopamine efflux in the nucleus accumbens of the anaesthetized rat. Eur. J. Neurosci. 10, 1241–1251 (1998).
Jones, J. L. et al. Basolateral amygdala modulates terminal dopamine release in the nucleus accumbens and conditioned responding. Biol. Psychiatry 67, 737–744 (2010).
Schultz, W. Responses of midbrain dopamine neurons to behavioral trigger stimuli in the monkey. J. Neurophysiol. 56, 1439–1461 (1986).
Berke, J. D. What does dopamine mean? Nat. Neurosci. 21, 787–793 (2018).
Bromberg-Martin, E. S., Matsumoto, M. & Hikosaka, O. Distinct tonic and phasic anticipatory activity in lateral habenula and dopamine neurons. Neuron 67, 144–155 (2010).
Pasquereau, B. & Turner, R. S. Dopamine neurons encode errors in predicting movement trigger occurrence. J. Neurophysiol. 113, 1110–1123 (2015).
Fiorillo, C. D., Newsome, W. T. & Schultz, W. The temporal precision of reward prediction in dopamine neurons. Nat. Neurosci. 11, 966–973 (2008).
Morita, K. & Kato, A. Striatal dopamine ramping may indicate flexible reinforcement learning with forgetting in the cortico-basal ganglia circuits. Front. Neural Circuits 8, 36 (2014).
Gershman, S. J. Dopamine ramps are a consequence of reward prediction errors. Neural Comput. 26, 467–471 (2014).
Nicola, S. M. The flexible approach hypothesis: unification of effort and cue-responding hypotheses for the role of nucleus accumbens dopamine in the activation of reward-seeking behavior. J. Neurosci. 30, 16585–16600 (2010).
Paxinos, G. & Watson, C. The Rat Brain in Stereotaxic Coordinates 5th edn (Elsevier Academic, 2005).
Witten, I. B. et al. Recombinase-driver rat lines: tools, techniques, and optogenetic application to dopamine-mediated reinforcement. Neuron 72, 721–733 (2011).
Sugrue, L. P., Corrado, G. S. & Newsome, W. T. Matching behavior and the representation of value in the parietal cortex. Science 304, 1782–1787 (2004).
Wong, J. M. et al. Benzoyl chloride derivatization with liquid chromatography-mass spectrometry for targeted metabolomics of neurochemicals in biological samples. J. Chromatogr. A 1446, 78–90 (2016).
Chung, J. E. et al. A fully automated approach to spike sorting. Neuron 95, 1381–1394 (2017).
Kvitsiani, D. et al. Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature 498, 363–366 (2013).
Grace, A. A. & Bunney, B. S. The control of firing pattern in nigral dopamine neurons: burst firing. J. Neurosci. 4, 2877–2890 (1984).
Lerner, T. N. et al. Intact-brain analyses reveal distinct information carried by SNc dopamine subcircuits. Cell 162, 635–647 (2015).
We thank P. Dayan, H. Fields, L. Frank, C. Donaghue and T. Faust for their comments on an early version of the manuscript, and V. Hetrick, R. Hashim and T. Davidson for technical assistance and advice. This work was supported by the National Institute on Drug Abuse, the National Institute of Mental Health, the National Institute on Neurological Disorders and Stroke, the University of Michigan, Ann Arbor, and the University of California, San Francisco.
Nature thanks Margaret Rice and the other anonymous reviewer(s) for their contribution to the peer review of this work.
Extended data figures and tables
a, Top left, anatomical definitions of the subregions examined with microdialysis. Brain atlas outlines in this figure were reproduced with permission from Paxinos and Watson (2005). Other panels map the correlation between dopamine release and reward rate at individual probe placements in coronal (mm from bregma, B) and sagittal (mm from midline) planes. Colour bar shows strength of correlation. b, Top left, regression analysis showing the dependency of (log) latency on the outcomes of recent trials during microdialysis sessions (n = 26 sessions, 7,113 trials, from 12 rats; error bars show s.e.m.). *Average regression weights significantly different from zero (t-test, P < 0.05). Top right, illustration of how the reward rate definition depends on the time constant (τ) of the leaky integrator. Top middle, dopamine–reward rate correlations as a function of τ. In the main figures, τ was chosen (from a range of 1–1,200 s) to maximize the (negative) correlation between reward rate and (log) latency in each session. Thin lines represent individual sessions, with the best-fit τ used in regression analyses indicated by a dot. Thick lines indicate the average of all dopamine–reward rate correlations for a given τ within each subregion. Overall behavioural metrics were similar between sessions sampling from each of the seven subregions (mean rewards per min: range 1.42–1.77, ANOVA F(6,44) = 0.58, P = 0.746; mean attempts per min: range 3.32–3.97, F(6,44) = 0.40, P = 0.872; mean latency: range 5.99–8.02, F(6,44) = 0.27, P = 0.948).
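The leaky-integrator reward rate and the per-session choice of τ can be sketched as follows. This is a minimal Python illustration of the procedure described above, not the study's own analysis code (which was written in MATLAB and is available on request); all function and variable names are illustrative.

```python
import numpy as np

def leaky_integrator_reward_rate(reward_times, tau, t_grid):
    """Leaky integration of past rewards: each reward at time t_r
    contributes exp(-(t - t_r) / tau) at times t >= t_r, so larger
    tau gives a slower-decaying, smoother reward-rate estimate."""
    rate = np.zeros_like(t_grid, dtype=float)
    for t_r in reward_times:
        past = t_grid >= t_r
        rate[past] += np.exp(-(t_grid[past] - t_r) / tau)
    return rate / tau  # scale so units are roughly rewards per unit time

def best_tau(reward_times, latencies, t_grid, taus):
    """Pick the tau (e.g. from a 1-1,200 s range) that maximizes the
    negative correlation between reward rate and log latency."""
    best, best_r = None, np.inf
    for tau in taus:
        rr = leaky_integrator_reward_rate(reward_times, tau, t_grid)
        r = np.corrcoef(rr, np.log(latencies))[0, 1]
        if r < best_r:  # most negative correlation wins
            best, best_r = tau, r
    return best
```

For regression analyses, the resulting trace would then be averaged within each 1-min dialysis bin, as described above.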
Bars represent R² values for linear regressions between each analyte (rows) and behavioural covariates (columns). In models with more than one covariate, bar length indicates the R² for the full model. Negative relationships are shown in blue and positive relationships in red. P values are reported at three alpha levels (0.05, 0.0005 and 0.000005) after Bonferroni correction for multiple comparisons (7 subregions × 21 analytes × 12 measures). To calculate reward rate, we averaged the leaky-integrator-estimated reward rate in 1-min bins defined by the start and end of each dialysis sample. ‘Attempts’ is the number of initiated trials (including trials that resulted in an error) in each dialysis minute. Attempts, reward rate and an interaction term were combined in a single model (column 2) to examine whether adding attempts could explain additional variance in the analyte signal that could not be explained by reward rate alone. ‘Latency’ is the average of the (log) latency in each minute. ‘Exploit’ is the proportion of choices of the higher reward probability option, in the last half of blocks for which the two ports had different probabilities. ‘Rewards’ and ‘omissions’ were defined as the number of rewarded and unrewarded trials in each minute, respectively. ‘Cumulative rewards’ and ‘time’ were included in the same regression model to estimate progressive factors such as satiety, and possible slow-timescale increases or decreases in analyte concentration across the session. Cumulative rewards represents the total number of rewards received by the end of the current dialysis minute, and time was simply the number of minutes elapsed since the session began. Bars in this column are coloured only when the coefficient for the cumulative reward variable was significant. %Ipsi and %Contra represent the fraction of choices to ipsi- or contraversive ports (relative to probe location in the brain) in each minute, independent of block probability.
P(win-stay) is the probability of repeating the previous choice, given the previous choice was rewarded.
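The per-analyte fits and the Bonferroni threshold above can be sketched in a few lines. This is a hedged Python illustration of the statistical logic, not the paper's MATLAB analysis code; `regression_r2` is an illustrative name.

```python
import numpy as np

def regression_r2(X, y):
    """R^2 of an ordinary least-squares fit of y on X, with an
    intercept column added. X may be a single covariate (1-D) or a
    matrix of covariates (full model, as in column 2 above)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

# Bonferroni correction: 7 subregions x 21 analytes x 12 measures,
# so each reported alpha level corresponds to an uncorrected
# per-test threshold of alpha / n_tests.
n_tests = 7 * 21 * 12
alpha_levels = [0.05, 0.0005, 0.000005]
corrected = [a / n_tests for a in alpha_levels]
```

For a multi-covariate model (for example attempts, reward rate and their interaction), stacking the covariates as columns of `X` yields the full-model R² that sets the bar length.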
Left, atlas locations and histology photomicrographs for each rat (IM-657, IM-1002, IM-1003, IM-1037 and IM-1078) from which opto-tagged dopamine cells were obtained. Red, TH staining; green, ChR2–eYFP; blue, DAPI. Scale bars, 1 mm. IM-1037 and IM-1078 brains were sliced horizontally, so fibre tracks appear as a circle. Font colours for rat ID numbers correspond to colours of tick marks in coronal atlas sections, indicating estimated recording locations for opto-tagged dopamine cells. For IM-1078, virus was injected into NAc core, and retrogradely infected dopamine neurons were recorded in VTA. Right, retrograde tracing of CTb from NAc core (top) to VTA-l (bottom). Top panel shows approximate extent of NAc labelling in each of the three rats (each rat indicated by a different colour). Bottom left panels show close-ups of TH labelling (blue), CTb (green) and merged image. Bottom right panels show reconstructed locations of TH+ and double-labelled TH+CTb+ midbrain neurons, on horizontal atlas sections. Estimated optrode locations are shown by red circles (or an orange circle, in the case of the retrograde tagging rat IM-1078). Labelled neurons were counted within the red rectangles that span the AP and ML extent of estimated recording locations. Percentages shown are the fraction of TH+ neurons that are also CTb+. Brain atlas outlines in this figure were reproduced with permission from Paxinos and Watson (2005).
a, Average waveforms of optogenetically identified dopamine neurons (negative voltage upwards). Average light-evoked waveforms are shown in blue and session-wide average waveforms are in black. All spikes within 10 ms of laser onset were used to construct light-evoked waveform average. Averaged waveforms are normalized to have similar total peak-valley voltages (see Supplementary Fig. 1 for individual voltage ranges). b, Session-wide average waveform for non-dopamine cells. c, Opto-tagging P value for all units plotted in log-scale, showing a strong bimodal distribution. To classify cells as light-responsive we used a threshold of P < 0.001. d, Times to first spike after laser onset, showing mean for each identified dopamine neuron, and standard deviation (jitter).
a, Tone pips were followed by reward delivery (‘click’) with different probabilities (zero, medium or high) depending on the tone pitch. During prior training (average 15.6 sessions, range 2–26) rats had learned these different probabilities, as indicated by their likelihood of entering the food port during cue presentation scaling with reward probability. ‘Head entry %’ indicates the proportion of trials for which the rat was at the food port at each moment in time, for one example session. Red and blue indicate rewarded and unrewarded trials, respectively. This rat was more likely to go to the food port during the cue that was highly (75%) predictive of reward than during the other cues (25% and 0%; one-way ANOVA, F = 11.1, P < 1.2 × 10−6). Unpredictable reward delivery (right) prompts rapid approach. Bottom, raster plots and peri-event time histograms from an identified dopamine neuron during that same session. b, Averaged firing for identified dopamine cells (n = 27) in this task. High/medium tones were either 75%/25% predictive of reward (n = 9 cells) or 100%/50% (n = 18), respectively. Data on each individual dopamine neuron are presented in Supplementary Fig. 1. c, Behaviour (top), cue response (middle) and click response (bottom) for all Pavlovian sessions with opto-tagged dopamine cells. Statistical comparisons were all one-way ANOVA, using food port head entry during the 0.3–3-s epoch relative to cue onset, and peak firing rate during 0.5-s epochs after cue onset or food-hopper clicks. d–f, Same as above except for dLight measurements (n = 10 sessions total). All dLight sessions used tones with 75%, 25% and 0% reward probability, and ANOVA tests examined peak signal within 1 s of cue onset or food-hopper clicks.
Each row shows a distinct optic fibre placement, and the corresponding recording session that was included in data analyses. For two rats (IM-1066 and IM-1088) we obtained bilateral NAc dLight recordings. From left to right, panels show histologically determined NAc location of fibre tip (within horizontal brain atlas section, including atlas coordinates); long-timescale cross-correlation with reward rate (as in Fig. 3c); short-timescale cross-correlation with reward rate (black), SMDP state value (green) and RPE (magenta; as in Fig. 3f); and event-aligned averages (as in Fig. 4b, but including more events). For Light-on and Centre-in alignments, data are split by latencies <1 s (light green) or >2 s (dark green; as in Fig. 4d); for other alignments, data are split by rewarded (red) and unrewarded (blue) trials. Brain atlas outlines in this figure were reproduced with permission from Paxinos and Watson (2005).
Format is as in Fig. 4. dLight fluorescence is shown here separately for 470-nm and 405-nm (control) excitation. Of note: (1) rapid, behaviour-linked dLight fluorescence changes occur in the 470-nm band, as expected, and not in the control 405-nm band; (2) spiking, dLight and voltammetry responses to cue onset show distinct timing; and (3) non-dopamine cell firing is much more variable (wider error bands) but on average shows activity during movements: starting just before Centre-in (irrespective of latency), just before Side-in, and just before Food-port-in.
Left column, average firing rate of dopamine cells around Side-in, broken down by terciles of reward expectation, based either on recent reward rate (top; same as Fig. 5a), the number of rewards in the previous ten trials, the state value (V) of an actor-critic model or the state value (Qleft + Qright) of a Q-learning model. The actor-critic and Q-learning models were both trial-based, rather than evolving continuously in time. The actor-critic model estimated the overall probability of receiving a reward on each trial, V, using the update rule V′ = V + α × RPE, in which RPE = actual reward [1 or 0] − V. The Q-learning model kept separate estimates of the probabilities of receiving rewards for left and right choices (Qleft and Qright) and updated Q for the chosen action only, using Q′ = Q + α × RPE, in which RPE = actual reward [1 or 0] − Q. The learning rate α was determined for each session by the best fit of V or (Qleft + Qright), respectively, to latencies. The subsequent columns show correlations between reward expectation and dopamine cell firing after Side-in, measured as peak firing rate (top; within 250 ms after rewarded Side-in), minimum firing rate (middle; within 2 s after unrewarded Side-in) or pause duration (bottom; maximum inter-spike interval within 2 s after unrewarded Side-in). For all histograms, light blue indicates cells with significant correlations (P < 0.01) before multiple-comparisons correction, and dark blue indicates cells that remained significant after correction. Positive RPE coding is strong and consistent; negative RPE coding is less so.
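The two trial-based update rules described above can be written out directly. This is a minimal Python sketch of those equations (the study's own analysis code was MATLAB, available on request); function names and the dictionary representation of Q are illustrative choices, not from the paper.

```python
def actor_critic_update(V, reward, alpha):
    """Critic update: V' = V + alpha * RPE, with RPE = reward - V,
    where reward is 1 (rewarded trial) or 0 (omission)."""
    rpe = reward - V
    return V + alpha * rpe, rpe

def q_learning_update(Q, choice, reward, alpha):
    """Q-learning update applied to the chosen action only:
    Q' = Q + alpha * RPE, with RPE = reward - Q[choice]."""
    Q = dict(Q)  # copy so the caller's estimates are not mutated
    rpe = reward - Q[choice]
    Q[choice] += alpha * rpe
    return Q, rpe

# Example: repeated rewarded left choices drive Qleft towards 1,
# while Qright is left untouched.
Q = {"left": 0.0, "right": 0.0}
for _ in range(20):
    Q, _ = q_learning_update(Q, "left", 1.0, alpha=0.2)
```

In the fitting procedure described above, α would be tuned per session so that V (or Qleft + Qright) best predicts trial-by-trial latencies.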