Dopamine neurons are thought to facilitate learning by comparing actual and expected reward (refs 1, 2). Despite two decades of investigation, little is known about how this comparison is made. To determine how dopamine neurons calculate prediction error, we combined optogenetic manipulations with extracellular recordings in the ventral tegmental area while mice engaged in classical conditioning. Here we demonstrate, by manipulating the temporal expectation of reward, that dopamine neurons perform subtraction, a computation that is ideal for reinforcement learning but rarely observed in the brain. Furthermore, selectively exciting and inhibiting neighbouring GABA (γ-aminobutyric acid) neurons in the ventral tegmental area reveals that these neurons are a source of subtraction: they inhibit dopamine neurons when reward is expected, causally contributing to prediction-error calculations. Finally, bilaterally stimulating ventral tegmental area GABA neurons dramatically reduces anticipatory licking to conditioned odours, consistent with an important role for these neurons in reinforcement learning. Together, our results uncover the arithmetic and local circuitry underlying dopamine prediction errors.
Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997)
Bayer, H. M. & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129–141 (2005)
Bush, R. R. & Mosteller, F. A mathematical model for simple learning. Psychol. Rev. 58, 313–323 (1951)
Rescorla, R. A. & Wagner, A. R. in Classical Conditioning II: Current Research and Theory (eds Black, A. & Prokasy, W.) 64–99 (Appleton-Century-Crofts, 1972)
Carandini, M. & Heeger, D. J. Normalization as a canonical neural computation. Nature Rev. Neurosci. 13, 51–62 (2012)
Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012)
Tobler, P. N., Fiorillo, C. D. & Schultz, W. Adaptive coding of reward value by dopamine neurons. Science 307, 1642–1645 (2005)
Silver, R. A. Neuronal arithmetic. Nature Rev. Neurosci. 11, 474–489 (2010)
Houk, J. C., Adams, J. L. & Barto, A. G. in Models of Information Processing in the Basal Ganglia (eds Houk, J. C., Davis, J. L. & Beiser, D. G.) 249–270 (MIT Press, 1995)
Kawato, M. & Samejima, K. Efficient reinforcement learning: computational theories, neuroscience and robotics. Curr. Opin. Neurobiol. 17, 205–212 (2007)
Matsumoto, M. & Hikosaka, O. Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447, 1111–1115 (2007)
Hong, S., Jhou, T. C., Smith, M., Saleem, K. S. & Hikosaka, O. Negative reward signals from the lateral habenula to dopamine neurons are mediated by rostromedial tegmental nucleus in primates. J. Neurosci. 31, 11457–11471 (2011)
Omelchenko, N. & Sesack, S. R. Ultrastructural analysis of local collaterals of rat ventral tegmental area neurons: GABA phenotype and synapses onto dopamine and GABA cells. Synapse 63, 895–906 (2009)
Van Zessen, R., Phillips, J. L., Budygin, E. A. & Stuber, G. D. Activation of VTA GABA neurons disrupts reward consumption. Neuron 73, 1184–1194 (2012)
Tan, K. R. et al. GABA neurons of the VTA drive conditioned place aversion. Neuron 73, 1173–1183 (2012)
Hazy, T. E., Frank, M. J. & O’Reilly, R. C. Neural mechanisms of acquired phasic dopamine responses in learning. Neurosci. Biobehav. Rev. 34, 701–720 (2010)
Rivest, F., Kalaska, J. F. & Bengio, Y. Conditioning and time representation in long short-term memory networks. Biol. Cybern. 108, 23–48 (2014)
Vitay, J. & Hamker, F. H. Timing and expectation of reward: a neuro-computational model of the afferents to the ventral tegmental area. Front. Neurorobot. 8, 4 (2014)
Ludvig, E. A., Sutton, R. S. & Kehoe, E. J. Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Comput. 20, 3034–3054 (2008)
Tan, C. O. & Bullock, D. A local circuit model of learned striatal and dopamine cell responses under probabilistic schedules of reward. J. Neurosci. 28, 10062–10074 (2008)
Han, X. et al. A high-light sensitivity optical neural silencer: development and application to optogenetic control of non-human primate cortex. Front. Syst. Neurosci. 5, 18 (2011)
Fiorillo, C. D., Song, M. R. & Yun, S. R. Multiphasic temporal dynamics in responses of midbrain dopamine neurons to appetitive and aversive stimuli. J. Neurosci. 33, 4710–4725 (2013)
Pi, H.-J. et al. Cortical interneurons that specialize in disinhibitory control. Nature 503, 521–524 (2013)
Wilson, N. R., Runyan, C. A., Wang, F. L. & Sur, M. Division and subtraction by distinct cortical inhibitory networks in vivo. Nature 488, 343–348 (2012)
Atallah, B. V., Bruns, W., Carandini, M. & Scanziani, M. Parvalbumin-expressing interneurons linearly transform cortical responses to visual stimuli. Neuron 73, 159–170 (2012)
Murphy, B. K. & Miller, K. D. Multiplicative gain changes are induced by excitation or inhibition alone. J. Neurosci. 23, 10040–10051 (2003)
Ayaz, A. & Chance, F. S. Gain modulation of neuronal responses by subtractive and divisive mechanisms of inhibition. J. Neurophysiol. 101, 958–968 (2009)
Holt, G. R. & Koch, C. Shunting inhibition does not have a divisive effect on firing rates. Neural Comput. 9, 1001–1013 (1997)
Roy, J. E. & Cullen, K. E. Dissociating self-generated from passively applied head motion: neural mechanisms in the vestibular nuclei. J. Neurosci. 24, 2102–2111 (2004)
Rust, N. C., Schwartz, O., Movshon, J. A. & Simoncelli, E. P. Spatiotemporal elements of macaque V1 receptive fields. Neuron 46, 945–956 (2005)
Bäckman, C. M. et al. Characterization of a mouse strain expressing Cre recombinase from the 3′ untranslated region of the dopamine transporter locus. Genesis 44, 383–390 (2006)
Vong, L. et al. Leptin action on GABAergic neurons prevents obesity and reduces inhibitory tone to POMC neurons. Neuron 71, 142–154 (2011)
Boyden, E. S., Zhang, F., Bamberg, E., Nagel, G. & Deisseroth, K. Millisecond-timescale, genetically targeted optical control of neural activity. Nature Neurosci. 8, 1263–1268 (2005)
Atasoy, D., Aponte, Y., Su, H. H. & Sternson, S. M. A FLEX switch targets Channelrhodopsin-2 to multiple cell types for imaging and long-range circuit mapping. J. Neurosci. 28, 7025–7030 (2008)
Uchida, N. & Mainen, Z. F. Speed and accuracy of olfactory discrimination in the rat. Nature Neurosci. 6, 1224–1229 (2003)
Schmitzer-Torbert, N. & Redish, A. D. Neuronal activity in the rodent dorsal striatum in sequential navigation: separation of spatial and reward responses on the multiple T task. J. Neurophysiol. 91, 2259–2272 (2004)
Lima, S. Q., Hromádka, T., Znamenskiy, P. & Zador, A. M. PINP: a new method of tagging neuronal populations for identification during in vivo electrophysiological recording. PLoS One 4, e6099 (2009)
Kvitsiani, D. et al. Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature 498, 363–366 (2013)
Olsen, S. R., Bhandawat, V. & Wilson, R. I. Divisive normalization in olfactory population codes. Neuron 66, 287–299 (2010)
We thank M. Andermann, J. Assad, R. Born, J. Buckholtz, P. Glimcher, J. Maunsell, B. Sabatini, W. Schultz, R. Wilson, and members of the Uchida laboratory for comments on the manuscript; S. Haesler for technical expertise and discussions on experimental design; E. Molnar for histology assistance; C. Dulac for sharing resources; K. Deisseroth for the AAV–FLEX–ChR2 construct; and E. Boyden for the AAV–FLEX–ArchT construct. This work was supported by a Sackler Fellowship in Psychobiology (N.E.) and National Institutes of Health grants T32GM007753 (to N.E.), F30MH100729 (to N.E.), R01MH095953 (to N.U.), and R01MH101207 (to N.U.).
The authors declare no competing financial interests.
Extended data figures and tables
a–d, Schematic of recording locations for mice used in the dopamine identification task (a, n = 5), the GABA stimulation task (b, n = 7), the GABA inhibition task (c, n = 9), and the behavioural task (d, n = 12). b, Red, experimental mice expressing ChR2 in VTA GABA neurons (n = 5). Blue, control mice expressing GFP in VTA GABA neurons (n = 2). c, Red, mice in which laser was delivered at continuous intensity (n = 7). Blue, mice in which laser was delivered with ramping intensity (n = 2). d, Red, experimental mice expressing ChR2 in VTA GABA neurons (n = 6). Blue, control mice expressing GFP in VTA GABA neurons (n = 6). e–g, Selectivity and efficiency of ArchT expression. e, Representative merged image (one of 30 z-stacks). Magenta, Vgat-tdTomato; green, ArchT–GFP. Open arrow, neuron expressing Vgat-tdTomato but not ArchT–GFP. Closed arrow, neuron expressing both Vgat-tdTomato and ArchT–GFP. Scale bar, 10 μm. f, Selectivity of infection to GABA neurons: percentage of ArchT–GFP-expressing neurons (n = 131 neurons for AAV1 and 165 neurons for AAV8) that were positive for Vgat-tdTomato. Filled bars, Vgat-tdTomato mouse injected with AAV1–FLEX–ArchT–GFP. Empty bars, Vgat-tdTomato mouse injected with AAV8–FLEX–ArchT–GFP. g, Efficiency of infection: percentage of Vgat-tdTomato-expressing neurons (n = 278 neurons for AAV1 and 283 neurons for AAV8) that were positive for ArchT–GFP.
Extended Data Figure 2 Neuron classification for dopamine identification and GABA stimulation experiments.
a–c, Dopamine identification experiment. d–f, ChR2-expressing animals in GABA stimulation experiment. g–i, GFP-expressing control animals in GABA stimulation experiment. a, d, g, Responses of all VTA neurons recorded in the tasks. Each row reflects the area under the ROC values for a single neuron in the second before and after delivery of expected reward. Baseline is taken as 1 s before odour onset. Yellow, increase from baseline; cyan, decrease from baseline. Light-identified neurons are denoted by an asterisk to the left of each column. b, e, h, The first three principal components of the area under the ROC curves. These values were used for unsupervised hierarchical clustering, as shown in the dendrogram on the right. c, f, i, Average firing rates for the three clusters of neurons in each task. Odour was delivered for 1 s, followed by a 0.5 s delay and then reward delivery.
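The area under the ROC curve used in panels a, d and g can be illustrated with a minimal sketch. It assumes each neuron's activity is summarized as per-trial spike counts in a baseline window and in a response window. The function name `auroc` and the pairwise formulation are illustrative assumptions, not the authors' analysis code, and the binning used in the paper is not reproduced here.

```python
import numpy as np


def auroc(baseline, response):
    """Area under the ROC curve comparing response trials with baseline trials.

    Hypothetical helper: computed as the probability that a randomly chosen
    response trial has a higher spike count than a randomly chosen baseline
    trial, with ties counted as one half. Values above 0.5 indicate an
    increase from baseline; values below 0.5 indicate a decrease.
    """
    baseline = np.asarray(baseline, dtype=float)
    response = np.asarray(response, dtype=float)
    # All pairwise differences between response and baseline trials.
    diff = response[:, None] - baseline[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))
```

Applying such a function to successive time bins around reward delivery would give one row of the heat maps, which could then be passed to a principal component analysis and hierarchical clustering step.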
a, Raw signal from one example light-identified dopamine neuron. Blue bars, light pulses. b, For the same neuron, mean waveforms for spontaneous (black) and light-evoked (blue) action potentials. c, For the same neuron, raster plots for 20 Hz (left) and 50 Hz (right) laser stimulation. Each row is one trial of laser stimulation. d, Histogram of log P values for each neuron recorded in the dopamine identification experiment (n = 170). The P values were derived from the stimulus-associated spike latency test (see Methods). Neurons with P < 0.001 and waveform correlations > 0.9 were considered identified (filled bars). e, f, For light-identified neurons, probability of spiking (e) and latency to first spike (f) after laser pulses at different frequencies. Orange circles, mean across neurons. g, Histogram of mean latencies (left) and latency standard deviations (right) in response to laser stimulation for all light-identified dopamine neurons in the variable-reward task. h–n, Same conventions as a–g, but for neurons recorded in the GABA stimulation task (n = 102).
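The two identification criteria in panel d (P < 0.001 and waveform correlation > 0.9) can be combined as in the following sketch. It assumes the P value from the stimulus-associated spike latency test has already been computed elsewhere and is passed in as `salt_p`; the test itself is not implemented here. The function name and arguments are hypothetical, and the Pearson correlation between mean waveforms is an assumed reading of "waveform correlation".

```python
import numpy as np


def is_light_identified(spont_waveform, evoked_waveform, salt_p,
                        p_thresh=0.001, corr_thresh=0.9):
    """Apply the two criteria from the legend to a single neuron.

    spont_waveform, evoked_waveform: mean spontaneous and light-evoked
    spike waveforms sampled at the same time points (hypothetical inputs).
    salt_p: precomputed P value from the latency test (assumed given).
    Returns (identified, waveform_correlation).
    """
    spont = np.asarray(spont_waveform, dtype=float)
    evoked = np.asarray(evoked_waveform, dtype=float)
    # Pearson correlation between the two mean waveforms.
    r = float(np.corrcoef(spont, evoked)[0, 1])
    return bool(salt_p < p_thresh and r > corr_thresh), r
```

Requiring both criteria guards against neurons that respond to light at short latency but whose evoked spikes do not match their spontaneous waveform.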
a–c, Results from dopamine identification experiment (Fig. 1). d, e, Results from GABA stimulation experiment (Fig. 2). f–i, Results from GABA inhibition experiment (Fig. 3). a, Raster plots (top and middle) and firing rate (bottom) of representative dopamine neuron in response to unexpected (orange) or temporally expected (black) reward. ***P < 0.001, t-test. b, For the same neuron, responses (mean ± s.e.m. across trials) to each reward size. Orange line, fit for unexpected reward. Dotted black line, divisive transformation. Solid black line, subtractive transformation. c, Individual neuron regression slopes for the analysis in Fig. 1d. Empty bars, slope not different from zero (P > 0.05). Filled bars, P < 0.05. Triangle, mean slope. d, e, Firing rate of representative VTA GABA (d) and putative dopamine (e) neuron with (blue) and without (black) ChR2 stimulation. Light blue box, laser delivery. f, g, Firing rate of representative VTA GABA (f) and putative dopamine (g) neuron during odour B trials with (green) or without (black) laser delivery. h, i, Histogram of putative GABA (h) and dopamine (i) neuron responses to laser delivery. Filled bars, significant effect of laser (P < 0.05, Wilcoxon rank-sum); empty bars, P > 0.05. Triangle, mean.
a–c, Putative GABA neurons in the dopamine identification experiment (Fig. 1). d–f, Putative GABA neurons in the GABA stimulation experiment (Fig. 2). a, b, Average firing rate of putative GABA neurons to unexpected (a) or temporally expected (b) rewards of various sizes. c, Population responses (mean ± s.e.m. across putative GABA neurons) for different reward sizes. Orange, unexpected reward. Black, temporally expected reward. Responses were averaged over a 600 ms window after reward delivery. d, e, Average firing rate of putative GABA neurons to rewards of various sizes, delivered with (e) or without (d) optogenetic GABA stimulation. f, Population responses (mean ± s.e.m. across putative GABA neurons) for different reward sizes. Blue, reward with laser stimulation. Black, reward without laser stimulation. Responses were averaged over a 600 ms window after reward delivery.
a, To understand how dopamine neurons compute reward prediction error, we first determined how dopamine neurons respond to various sizes of unexpected reward (schematized as orange curves). We then taught the mice to expect reward and observed how expectation shifted this dose–response (black curves). We modelled four types of shift: output subtraction (top left), input subtraction (bottom left), output division (top right), and input division (bottom right). Output subtraction was consistently the best fit. For equations, see Methods. Analysis adapted from a previous study (ref. 39). b–e, Results from dopamine identification experiment. f–i, Results from GABA stimulation experiment. b, c, Results from all putative dopamine neurons (n = 84). ***P < 0.001, bootstrap. d, e, Results from light-identified dopamine neurons (n = 40). ***P < 0.001, bootstrap. f, g, Results from putative dopamine neurons in the GABA stimulation experiment (n = 45). *P < 0.05, bootstrap. h, i, Results from putative dopamine neurons in the GABA stimulation experiment, subtracting the 500 ms period immediately before reward delivery. This takes into account the laser-induced baseline shift in dopamine responses. *P < 0.05, bootstrap. b, d, f, h, Average responses (mean ± s.e.m. across neurons) to different sizes of reward, with fits for output subtraction (solid line) and output division (dotted line). c, e, g, i, Results of bootstrapping analysis. For each resample, we compared the mean squared error for the subtractive fit with the mean squared error for the divisive fit. Negative numbers favour subtraction. P values were calculated as the proportion of resamples in which division was a better fit than subtraction.
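The bootstrapping comparison in panels c, e, g and i can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: the output-subtraction and output-division models are applied directly to the mean measured responses rather than to the fitted dose–response curve described in Methods, neurons are resampled with replacement using the same indices for both conditions, and all function and variable names are hypothetical.

```python
import numpy as np


def fit_mse(unexpected, expected):
    """Mean squared error of output-subtraction and output-division fits.

    unexpected, expected: mean responses per reward size (1-D arrays).
    """
    # Subtractive model: expected = unexpected - b.
    # The least-squares offset b is the mean difference.
    b = np.mean(unexpected - expected)
    mse_sub = np.mean((unexpected - b - expected) ** 2)
    # Divisive model: expected = k * unexpected, with gain k = 1 / divisor.
    k = np.sum(unexpected * expected) / np.sum(unexpected ** 2)
    mse_div = np.mean((k * unexpected - expected) ** 2)
    return mse_sub, mse_div


def bootstrap_sub_vs_div(unexpected, expected, n_resamples=1000, seed=0):
    """Resample neurons and compare subtractive with divisive fits.

    unexpected, expected: arrays of shape (n_neurons, n_reward_sizes).
    Returns the per-resample MSE differences (subtractive minus divisive,
    so negative values favour subtraction) and the proportion of resamples
    in which division fits better than subtraction.
    """
    rng = np.random.default_rng(seed)
    unexpected = np.asarray(unexpected, dtype=float)
    expected = np.asarray(expected, dtype=float)
    n_neurons = unexpected.shape[0]
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # Draw neurons with replacement, keeping conditions paired.
        idx = rng.integers(0, n_neurons, size=n_neurons)
        mse_sub, mse_div = fit_mse(unexpected[idx].mean(axis=0),
                                   expected[idx].mean(axis=0))
        diffs[i] = mse_sub - mse_div
    p_value = float(np.mean(diffs > 0))
    return diffs, p_value
```

With this sign convention, a histogram of `diffs` concentrated below zero corresponds to the legend's statement that negative numbers favour subtraction.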
a–d, Results from GABA stimulation experiment. e–h, Results from GABA inhibition experiment. a, Firing rate (mean ± s.e.m.) of putative dopamine neurons that did not show a significant baseline shift. ***P < 0.001, t-test. b, To visualize whether GABA stimulation preferentially affected phasic dopamine responses in addition to baseline firing rates, we took the activity in Fig. 2c and subtracted the trials when laser was delivered alone. Any remaining change at the time of reward could not be due to a baseline shift. **P = 0.01, t-test. c, Firing rate (mean ± s.e.m.) of putative dopamine (left) and GABA (right) neurons in trials where laser was delivered in the absence of reward. This dopamine response was subtracted to calculate the firing rates in b. d, Histogram of the phasic effect of GABA stimulation. The values were calculated by subtracting the black line from the blue line in b. Empty bars, slope not different from zero (P > 0.05, Wilcoxon rank-sum test). Filled bars, slope different from zero (P < 0.05). Triangle, mean (P < 0.001, t-test). e–h, Same conventions as a–d, but for the GABA inhibition experiment. ***P < 0.001, t-test.
a, In the dopamine identification task (Fig. 1), lick rates (mean ± s.e.m. across sessions) for odours predicting reward (black) or nothing (grey). b, In the GABA stimulation task (Fig. 2), lick rates (mean ± s.e.m. across sessions) for reward alone (black), reward + GABA stimulation (blue), or GABA stimulation alone (orange). c, In the GABA inhibition task (Fig. 3), lick rates (mean ± s.e.m. across sessions) for the odours predicting reward with 90% probability (black) and 10% probability (grey). Green laser was delivered to inhibit VTA GABA neurons in 25% of reward (green) and nothing (orange) trials. d, e, In the bilateral stimulation experiment (Fig. 4), anticipatory licks (mean ± s.e.m. across mice) for mice injected with ChR2 (d) and GFP (e). Grey bars, odour B; blue or orange bars, odour D. Left, last three training sessions before odour D was paired with laser; middle, last three sessions with laser delivery (excluding probe trials); right, last three sessions after laser was turned off. **P < 0.01; ***P < 0.001; paired t-test.
a–c, Mice in which laser was delivered with continuous intensity. d–f, Mice in which laser was delivered with ramping intensity. a, d, Responses of all VTA neurons recorded in the tasks. Each row reflects the area under the ROC values for a single neuron in the second before and after delivery of expected reward. Baseline is taken as one second before odour onset. Yellow, increase from baseline; cyan, decrease from baseline. b, e, The first three principal components of the area under the ROC curves. These values were used for unsupervised hierarchical clustering, as shown in the dendrogram on the right. c, f, Average firing rates for the three clusters of neurons in each task. Odour was delivered for 1 s, followed by a 0.5 s delay and then reward delivery.
a, Firing rate (mean ± s.e.m.) of putative VTA GABA neurons during odour B trials with (green) or without (black) ramping laser delivery. ***P < 0.001, t-test. b, Histogram of putative GABA neuron responses to laser delivery. Responses were averaged over the entire duration of the laser. Filled bars, significant effect of laser (P < 0.05, Wilcoxon rank-sum test); empty bars, P > 0.05. Triangle, mean (P < 0.001, t-test). c, Firing rate (mean ± s.e.m.) of putative dopamine neurons with (green) or without (black) ramping GABA inhibition. ***P < 0.001, t-test. d, Histogram of putative dopamine neuron responses to laser delivery. Responses were averaged over the 0.5 s window after reward delivery. Filled bars, significant effect of laser (P < 0.05, Wilcoxon rank-sum test); empty bars, P > 0.05. Triangle, mean (P < 0.001, t-test).
Cite this article
Eshel, N., Bukwich, M., Rao, V. et al. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246 (2015). https://doi.org/10.1038/nature14855