Arithmetic and local circuitry underlying dopamine prediction errors

An Erratum to this article was published on 07 October 2015

Abstract

Dopamine neurons are thought to facilitate learning by comparing actual and expected reward (refs 1, 2). Despite two decades of investigation, little is known about how this comparison is made. To determine how dopamine neurons calculate prediction error, we combined optogenetic manipulations with extracellular recordings in the ventral tegmental area while mice engaged in classical conditioning. Here we demonstrate, by manipulating the temporal expectation of reward, that dopamine neurons perform subtraction, a computation that is ideal for reinforcement learning but rarely observed in the brain. Furthermore, selectively exciting and inhibiting neighbouring GABA (γ-aminobutyric acid) neurons in the ventral tegmental area reveals that these neurons are a source of subtraction: they inhibit dopamine neurons when reward is expected, causally contributing to prediction-error calculations. Finally, bilaterally stimulating ventral tegmental area GABA neurons dramatically reduces anticipatory licking to conditioned odours, consistent with an important role for these neurons in reinforcement learning. Together, our results uncover the arithmetic and local circuitry underlying dopamine prediction errors.

Figure 1: Expectation triggers subtraction of dopamine neuron responses.
Figure 2: Selective excitation of VTA GABA neurons mimics the effect of expectation.
Figure 3: Selective inhibition of VTA GABA neurons modulates prediction errors.
Figure 4: Bilateral excitation of VTA GABA neurons disrupts learned association.

References

  1. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997)

  2. Bayer, H. M. & Glimcher, P. W. Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron 47, 129–141 (2005)

  3. Bush, R. R. & Mosteller, F. A mathematical model for simple learning. Psychol. Rev. 58, 313–323 (1951)

  4. Rescorla, R. A. & Wagner, A. R. in Classical Conditioning II: Current Research and Theory (eds Black, A. & Prokasy, W.) 64–99 (Appleton-Century-Crofts, 1972)

  5. Carandini, M. & Heeger, D. J. Normalization as a canonical neural computation. Nature Rev. Neurosci. 13, 51–62 (2012)

  6. Cohen, J. Y., Haesler, S., Vong, L., Lowell, B. B. & Uchida, N. Neuron-type-specific signals for reward and punishment in the ventral tegmental area. Nature 482, 85–88 (2012)

  7. Tobler, P. N., Fiorillo, C. D. & Schultz, W. Adaptive coding of reward value by dopamine neurons. Science 307, 1642–1645 (2005)

  8. Silver, R. A. Neuronal arithmetic. Nature Rev. Neurosci. 11, 474–489 (2010)

  9. Houk, J. C., Adams, J. L. & Barto, A. G. in Models of Information Processing in the Basal Ganglia (eds Houk, J. C., Davis, J. L. & Beiser, D. G.) 249–270 (MIT Press, 1995)

  10. Kawato, M. & Samejima, K. Efficient reinforcement learning: computational theories, neuroscience and robotics. Curr. Opin. Neurobiol. 17, 205–212 (2007)

  11. Matsumoto, M. & Hikosaka, O. Lateral habenula as a source of negative reward signals in dopamine neurons. Nature 447, 1111–1115 (2007)

  12. Hong, S., Jhou, T. C., Smith, M., Saleem, K. S. & Hikosaka, O. Negative reward signals from the lateral habenula to dopamine neurons are mediated by rostromedial tegmental nucleus in primates. J. Neurosci. 31, 11457–11471 (2011)

  13. Omelchenko, N. & Sesack, S. R. Ultrastructural analysis of local collaterals of rat ventral tegmental area neurons: GABA phenotype and synapses onto dopamine and GABA cells. Synapse 63, 895–906 (2009)

  14. Van Zessen, R., Phillips, J. L., Budygin, E. A. & Stuber, G. D. Activation of VTA GABA neurons disrupts reward consumption. Neuron 73, 1184–1194 (2012)

  15. Tan, K. R. et al. GABA neurons of the VTA drive conditioned place aversion. Neuron 73, 1173–1183 (2012)

  16. Hazy, T. E., Frank, M. J. & O’Reilly, R. C. Neural mechanisms of acquired phasic dopamine responses in learning. Neurosci. Biobehav. Rev. 34, 701–720 (2010)

  17. Rivest, F., Kalaska, J. F. & Bengio, Y. Conditioning and time representation in long short-term memory networks. Biol. Cybern. 108, 23–48 (2014)

  18. Vitay, J. & Hamker, F. H. Timing and expectation of reward: a neuro-computational model of the afferents to the ventral tegmental area. Front. Neurorobot. 8, 4 (2014)

  19. Ludvig, E. A., Sutton, R. S. & Kehoe, E. J. Stimulus representation and the timing of reward-prediction errors in models of the dopamine system. Neural Comput. 20, 3034–3054 (2008)

  20. Tan, C. O. & Bullock, D. A local circuit model of learned striatal and dopamine cell responses under probabilistic schedules of reward. J. Neurosci. 28, 10062–10074 (2008)

  21. Han, X. et al. A high-light sensitivity optical neural silencer: development and application to optogenetic control of non-human primate cortex. Front. Syst. Neurosci. 5, 18 (2011)

  22. Fiorillo, C. D., Song, M. R. & Yun, S. R. Multiphasic temporal dynamics in responses of midbrain dopamine neurons to appetitive and aversive stimuli. J. Neurosci. 33, 4710–4725 (2013)

  23. Pi, H.-J. et al. Cortical interneurons that specialize in disinhibitory control. Nature 503, 521–524 (2013)

  24. Wilson, N. R., Runyan, C. A., Wang, F. L. & Sur, M. Division and subtraction by distinct cortical inhibitory networks in vivo. Nature 488, 343–348 (2012)

  25. Atallah, B. V., Bruns, W., Carandini, M. & Scanziani, M. Parvalbumin-expressing interneurons linearly transform cortical responses to visual stimuli. Neuron 73, 159–170 (2012)

  26. Murphy, B. K. & Miller, K. D. Multiplicative gain changes are induced by excitation or inhibition alone. J. Neurosci. 23, 10040–10051 (2003)

  27. Ayaz, A. & Chance, F. S. Gain modulation of neuronal responses by subtractive and divisive mechanisms of inhibition. J. Neurophysiol. 101, 958–968 (2009)

  28. Holt, G. R. & Koch, C. Shunting inhibition does not have a divisive effect on firing rates. Neural Comput. 9, 1001–1013 (1997)

  29. Roy, J. E. & Cullen, K. E. Dissociating self-generated from passively applied head motion: neural mechanisms in the vestibular nuclei. J. Neurosci. 24, 2102–2111 (2004)

  30. Rust, N. C., Schwartz, O., Movshon, J. A. & Simoncelli, E. P. Spatiotemporal elements of macaque V1 receptive fields. Neuron 46, 945–956 (2005)

  31. Bäckman, C. M. et al. Characterization of a mouse strain expressing Cre recombinase from the 3′ untranslated region of the dopamine transporter locus. Genesis 44, 383–390 (2006)

  32. Vong, L. et al. Leptin action on GABAergic neurons prevents obesity and reduces inhibitory tone to POMC neurons. Neuron 71, 142–154 (2011)

  33. Boyden, E. S., Zhang, F., Bamberg, E., Nagel, G. & Deisseroth, K. Millisecond-timescale, genetically targeted optical control of neural activity. Nature Neurosci. 8, 1263–1268 (2005)

  34. Atasoy, D., Aponte, Y., Su, H. H. & Sternson, S. M. A FLEX switch targets Channelrhodopsin-2 to multiple cell types for imaging and long-range circuit mapping. J. Neurosci. 28, 7025–7030 (2008)

  35. Uchida, N. & Mainen, Z. F. Speed and accuracy of olfactory discrimination in the rat. Nature Neurosci. 6, 1224–1229 (2003)

  36. Schmitzer-Torbert, N. & Redish, A. D. Neuronal activity in the rodent dorsal striatum in sequential navigation: separation of spatial and reward responses on the multiple T task. J. Neurophysiol. 91, 2259–2272 (2004)

  37. Lima, S. Q., Hromádka, T., Znamenskiy, P. & Zador, A. M. PINP: a new method of tagging neuronal populations for identification during in vivo electrophysiological recording. PLoS One 4, e6099 (2009)

  38. Kvitsiani, D. et al. Distinct behavioural and network correlates of two interneuron types in prefrontal cortex. Nature 498, 363–366 (2013)

  39. Olsen, S. R., Bhandawat, V. & Wilson, R. I. Divisive normalization in olfactory population codes. Neuron 66, 287–299 (2010)

Acknowledgements

We thank M. Andermann, J. Assad, R. Born, J. Buckholtz, P. Glimcher, J. Maunsell, B. Sabatini, W. Schultz, R. Wilson, and members of the Uchida laboratory for comments on the manuscript; S. Haesler for technical expertise and discussions on experimental design; E. Molnar for histology assistance; C. Dulac for sharing resources; K. Deisseroth for the AAV–FLEX–ChR2 construct; and E. Boyden for the AAV–FLEX–ArchT construct. This work was supported by a Sackler Fellowship in Psychobiology (N.E.) and National Institutes of Health grants T32GM007753 (to N.E.), F30MH100729 (to N.E.), R01MH095953 (to N.U.), and R01MH101207 (to N.U.).

Author information

Contributions

N.E. and N.U. designed the recording experiments. N.E., V.R., and N.U. designed the behaviour experiment. N.E., M.B., V.R., V.H., and J.T. collected data. N.E., M.B., and V.R. analysed data. N.E. wrote the manuscript with comments from N.U.

Corresponding author

Correspondence to Naoshige Uchida.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 Recording sites and ArchT expression.

a–d, Schematic of recording locations for mice used in the dopamine identification task (a, n = 5), the GABA stimulation task (b, n = 7), the GABA inhibition task (c, n = 9), and the behavioural task (d, n = 12). b, Red, experimental mice expressing ChR2 in VTA GABA neurons (n = 5). Blue, control mice expressing GFP in VTA GABA neurons (n = 2). c, Red, mice in which laser was delivered at continuous intensity (n = 7). Blue, mice in which laser was delivered with ramping intensity (n = 2). d, Red, experimental mice expressing ChR2 in VTA GABA neurons (n = 6). Blue, control mice expressing GFP in VTA GABA neurons (n = 6). e–g, Selectivity and efficiency of ArchT expression. e, Representative merged image (one of 30 z-stacks). Magenta, Vgat-tdTomato; green, ArchT–GFP. Open arrow, neuron expressing Vgat-tdTomato but not ArchT–GFP. Closed arrow, neuron expressing both Vgat-tdTomato and ArchT–GFP. Scale bar, 10 μm. f, Selectivity of infection to GABA neurons: percentage of ArchT–GFP-expressing neurons (n = 131 neurons for AAV1 and 165 neurons for AAV8) that were positive for Vgat-tdTomato. Filled bars, Vgat-tdTomato mouse injected with AAV1–FLEX–ArchT–GFP. Empty bars, Vgat-tdTomato mouse injected with AAV8–FLEX–ArchT–GFP. g, Efficiency of infection: percentage of Vgat-tdTomato-expressing neurons (n = 278 neurons for AAV1 and 283 neurons for AAV8) that were positive for ArchT–GFP.

Extended Data Figure 2 Neuron classification for dopamine identification and GABA stimulation experiments.

a–c, Dopamine identification experiment. d–f, ChR2-expressing animals in GABA stimulation experiment. g–i, GFP-expressing control animals in GABA stimulation experiment. a, d, g, Responses of all VTA neurons recorded in the tasks. Each row reflects the area under the ROC values for a single neuron in the second before and after delivery of expected reward. Baseline is taken as 1 s before odour onset. Yellow, increase from baseline; cyan, decrease from baseline. Light-identified neurons are denoted by an asterisk to the left of each column. b, e, h, The first three principal components of the area under the ROC curves. These values were used for unsupervised hierarchical clustering, as shown in the dendrogram on the right. c, f, i, Average firing rates for the three clusters of neurons in each task. Odour was delivered for 1 s, followed by a 0.5 s delay and then reward delivery.
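The area-under-ROC values described in this legend can be illustrated with a minimal Python sketch. It assumes spike counts have already been binned per trial; the function names, the bin layout, and the choice to compare each time bin against pooled baseline counts are illustrative assumptions, not the authors' code. The principal component and hierarchical clustering steps are not shown.

```python
def auroc(test_counts, baseline_counts):
    """Area under the ROC curve for test vs baseline spike counts.

    Computed as P(test > baseline) + 0.5 * P(test == baseline), which is
    equivalent to the normalised Mann-Whitney U statistic.
    Values above 0.5 indicate an increase from baseline, below 0.5 a decrease.
    """
    wins = 0.0
    for x in test_counts:
        for y in baseline_counts:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(test_counts) * len(baseline_counts))


def auroc_row(binned_trials, baseline_counts):
    """One auROC value per time bin for a single neuron.

    binned_trials: list of trials, each a list of spike counts per time bin
    (hypothetical layout). baseline_counts: spike counts from the baseline
    window, one per trial.
    """
    return [auroc(list(bin_counts), baseline_counts)
            for bin_counts in zip(*binned_trials)]


if __name__ == "__main__":
    baseline = [1, 2, 3]
    trials = [[1, 5], [2, 6], [3, 7]]  # 3 trials x 2 time bins
    print(auroc_row(trials, baseline))
```

Each row of such a heat map would then be one neuron's list of auROC values over time, with 0.5 meaning no change from baseline.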

Extended Data Figure 3 Light identification of dopamine and GABA neurons.

a, Raw signal from one example light-identified dopamine neuron. Blue bars, light pulses. b, For the same neuron, mean waveforms for spontaneous (black) and light-evoked (blue) action potentials. c, For the same neuron, raster plots for 20 Hz (left) and 50 Hz (right) laser stimulation. Each row is one trial of laser stimulation. d, Histogram of log P values for each neuron recorded in the dopamine identification experiment (n = 170). The P values were derived from the stimulus-associated spike latency test (see Methods). Neurons with P < 0.001 and waveform correlations > 0.9 were considered identified (filled bars). e, f, For light-identified neurons, probability of spiking (e) and latency to first spike (f) after laser pulses at different frequencies. Orange circles, mean across neurons. g, Histogram of mean latencies (left) and latency standard deviations (right) in response to laser stimulation for all light-identified dopamine neurons in the variable-reward task. h–n, Same conventions as a–g, but for neurons recorded in the GABA stimulation task (n = 102).
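The identification criterion stated in panel d (P < 0.001 and waveform correlation > 0.9) can be sketched as follows. This is a minimal illustration: the P value is assumed to come from the latency test described in Methods, which is not reproduced here, and the use of a Pearson correlation between mean waveforms, like the function names, is an assumption made for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length waveforms."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = sqrt(var_a * var_b)
    return cov / denom if denom else 0.0  # flat waveform: treat as uncorrelated


def is_light_identified(p_value, spontaneous_waveform, evoked_waveform,
                        p_thresh=0.001, corr_thresh=0.9):
    """Apply both thresholds from the legend; the P value is supplied externally."""
    corr = pearson(spontaneous_waveform, evoked_waveform)
    return p_value < p_thresh and corr > corr_thresh


if __name__ == "__main__":
    spont = [0.0, -1.0, -4.0, 2.0, 1.0, 0.0]   # hypothetical mean waveform
    evoked = [0.9 * v for v in spont]          # same shape, smaller amplitude
    print(is_light_identified(0.0001, spont, evoked))
```

Requiring both criteria guards against two different failure modes: a neuron that does not respond reliably to light, and a light-evoked spike that belongs to a different unit than the spontaneous one.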

Extended Data Figure 4 Individual neuron analysis from all recording experiments.

a–c, Results from dopamine identification experiment (Fig. 1). d, e, Results from GABA stimulation experiment (Fig. 2). f–i, Results from GABA inhibition experiment (Fig. 3). a, Raster plots (top and middle) and firing rate (bottom) of representative dopamine neuron in response to unexpected (orange) or temporally expected (black) reward. ***P < 0.001, t-test. b, For the same neuron, responses (mean ± s.e.m. across trials) to each reward size. Orange line, fit for unexpected reward. Dotted black line, divisive transformation. Solid black line, subtractive transformation. c, Individual neuron regression slopes for the analysis in Fig. 1d. Empty bars, slope not different from zero (P > 0.05). Filled bars, P < 0.05. Triangle, mean slope. d, e, Firing rate of representative VTA GABA (d) and putative dopamine (e) neuron with (blue) and without (black) ChR2 stimulation. Light blue box, laser delivery. f, g, Firing rate of representative VTA GABA (f) and putative dopamine (g) neuron during odour B trials with (green) or without (black) laser delivery. h, i, Histogram of putative GABA (h) and dopamine (i) neuron responses to laser delivery. Filled bars, significant effect of laser (P < 0.05, Wilcoxon rank-sum); empty bars, P > 0.05. Triangle, mean.

Extended Data Figure 5 VTA GABA activity does not vary consistently with reward size.

a–c, Putative GABA neurons in the dopamine identification experiment (Fig. 1). d–f, Putative GABA neurons in the GABA stimulation experiment (Fig. 2). a, b, Average firing rate of putative GABA neurons to unexpected (a) or temporally expected (b) rewards of various sizes. c, Population responses (mean ± s.e.m. across putative GABA neurons) for different reward sizes. Orange, unexpected reward. Black, temporally expected reward. Responses were averaged over a 600 ms window after reward delivery. d, e, Average firing rate of putative GABA neurons to rewards of various sizes, delivered with (e) or without (d) optogenetic GABA stimulation. f, Population responses (mean ± s.e.m. across putative GABA neurons) for different reward sizes. Blue, reward with laser stimulation. Black, reward without laser stimulation. Responses were averaged over a 600 ms window after reward delivery.

Extended Data Figure 6 Statistical test for subtraction versus division.

a, To understand how dopamine neurons compute reward prediction error, we first determined how dopamine neurons respond to various sizes of unexpected reward (schematized as orange curves). We then taught the mice to expect reward and observed how expectation shifted this dose–response (black curves). We modelled four types of shift: output subtraction (top left), input subtraction (bottom left), output division (top right), and input division (bottom right). Output subtraction was consistently the best fit. For equations, see Methods. Analysis adapted from a previous study (ref. 39). b–e, Results from dopamine identification experiment. f–i, Results from GABA stimulation experiment. b, c, Results from all putative dopamine neurons (n = 84). ***P < 0.001, bootstrap. d, e, Results from light-identified dopamine neurons (n = 40). ***P < 0.001, bootstrap. f, g, Results from putative dopamine neurons in the GABA stimulation experiment (n = 45). *P < 0.05, bootstrap. h, i, Results from putative dopamine neurons in the GABA stimulation experiment, subtracting the 500 ms period immediately before reward delivery. This takes into account the laser-induced baseline shift in dopamine responses. *P < 0.05, bootstrap. b, d, f, h, Average responses (mean ± s.e.m. across neurons) to different sizes of reward, with fits for output subtraction (solid line) and output division (dotted line). c, e, g, i, Results of bootstrapping analysis. For each resample, we compared the mean squared error for the subtractive fit with the mean squared error for the divisive fit. Negative numbers favour subtraction. P values were calculated as the proportion of resamples in which division was a better fit than subtraction.
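The bootstrap comparison described for panels c, e, g, i can be sketched in Python as below. This is a simplified illustration rather than the published analysis: the Methods equations are not reproduced here, so the sketch assumes the subtractive model is a single least-squares offset and the divisive model a single least-squares gain applied directly to the measured responses to unexpected reward, and that resampling is done over neurons. All function names and the synthetic data are hypothetical.

```python
import random


def fit_mse_subtraction(unexpected, expected):
    """MSE of the best single offset c in: expected ~ unexpected - c."""
    n = len(unexpected)
    c = sum(u - e for u, e in zip(unexpected, expected)) / n  # least-squares offset
    return sum((u - c - e) ** 2 for u, e in zip(unexpected, expected)) / n


def fit_mse_division(unexpected, expected):
    """MSE of the best single gain g (= 1/divisor) in: expected ~ g * unexpected."""
    n = len(unexpected)
    denom = sum(u * u for u in unexpected)
    g = sum(u * e for u, e in zip(unexpected, expected)) / denom if denom else 0.0
    return sum((g * u - e) ** 2 for u, e in zip(unexpected, expected)) / n


def column_means(rows):
    """Mean response per reward size across the given neurons."""
    return [sum(col) / len(col) for col in zip(*rows)]


def bootstrap_sub_vs_div(unexp_by_neuron, exp_by_neuron, n_resamples=1000, seed=0):
    """Resample neurons with replacement; return MSE differences and P value.

    Each difference is MSE(subtraction) - MSE(division), so negative values
    favour subtraction. P is the proportion of resamples in which division
    fits better than subtraction.
    """
    rng = random.Random(seed)
    n = len(unexp_by_neuron)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        u = column_means([unexp_by_neuron[i] for i in idx])
        e = column_means([exp_by_neuron[i] for i in idx])
        diffs.append(fit_mse_subtraction(u, e) - fit_mse_division(u, e))
    p = sum(d > 0 for d in diffs) / n_resamples
    return diffs, p


def make_synthetic(n_neurons=30, shift=3.0, seed=1):
    """Synthetic responses (made-up numbers) with a purely subtractive shift."""
    rng = random.Random(seed)
    base = [4.0, 6.0, 8.0, 10.0, 12.0]  # response per reward size, arbitrary units
    unexp = [[b + rng.gauss(0, 0.5) for b in base] for _ in range(n_neurons)]
    exp = [[u - shift + rng.gauss(0, 0.5) for u in row] for row in unexp]
    return unexp, exp


if __name__ == "__main__":
    unexp, exp = make_synthetic()
    diffs, p = bootstrap_sub_vs_div(unexp, exp, n_resamples=200)
    print("mean MSE difference:", sum(diffs) / len(diffs), "P:", p)
```

Because the synthetic data are built with a constant offset, the subtractive fit should win in essentially every resample; real data would of course need the actual dose–response model from Methods.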

Extended Data Figure 7 Laser effect is more than a baseline shift.

a–d, Results from GABA stimulation experiment. e–h, Results from GABA inhibition experiment. a, Firing rate (mean ± s.e.m.) of putative dopamine neurons that did not show a significant baseline shift. ***P < 0.001, t-test. b, To visualize whether GABA stimulation preferentially affected phasic dopamine responses in addition to baseline firing rates, we took the activity in Fig. 2c and subtracted the trials when laser was delivered alone. Any remaining change at the time of reward could not be due to a baseline shift. **P = 0.01, t-test. c, Firing rate (mean ± s.e.m.) of putative dopamine (left) and GABA (right) neurons in trials where laser was delivered in the absence of reward. This dopamine response was subtracted to calculate the firing rates in b. d, Histogram of the phasic effect of GABA stimulation. The values were calculated by subtracting the black line from the blue line in b. Empty bars, slope not different from zero (P > 0.05, Wilcoxon rank-sum test). Filled bars, slope different from zero (P < 0.05). Triangle, mean (P < 0.001, t-test). e–h, Same conventions as a–d, but for the GABA inhibition experiment. ***P < 0.001, t-test.
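The correction described in panel b, removing the laser-alone response from the laser-plus-reward response, amounts to a bin-by-bin subtraction of trial-averaged firing rates. A minimal sketch, assuming firing rates are already binned and aligned across trials (the binning, alignment, and function names are assumptions, not taken from the source):

```python
def mean_trace(trials):
    """Average firing rate per time bin across trials."""
    return [sum(bin_vals) / len(bin_vals) for bin_vals in zip(*trials)]


def subtract_laser_alone(laser_reward_trials, laser_alone_trials):
    """Mean laser+reward trace minus mean laser-alone trace, bin by bin.

    Whatever remains at the time of reward cannot be explained by the
    laser-induced baseline shift alone.
    """
    combined = mean_trace(laser_reward_trials)
    laser_only = mean_trace(laser_alone_trials)
    return [c - l for c, l in zip(combined, laser_only)]


if __name__ == "__main__":
    # Made-up firing rates: 2 trials x 4 time bins, reward in the third bin.
    laser_reward = [[2, 2, 6, 2], [2, 2, 8, 2]]
    laser_alone = [[2, 2, 2, 2], [2, 2, 2, 2]]
    print(subtract_laser_alone(laser_reward, laser_alone))
```

The corrected trace can then be compared with the reward-only trace to isolate the phasic effect of stimulation.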

Extended Data Figure 8 Behavioural performance on all four experiments.

a, In the dopamine identification task (Fig. 1), lick rates (mean ± s.e.m. across sessions) for odours predicting reward (black) or nothing (grey). b, In the GABA stimulation task (Fig. 2), lick rates (mean ± s.e.m. across sessions) for reward alone (black), reward + GABA stimulation (blue), or GABA stimulation alone (orange). c, In the GABA inhibition task (Fig. 3), lick rates (mean ± s.e.m. across sessions) for the odours predicting reward with 90% probability (black) and 10% probability (grey). Green laser was delivered to inhibit VTA GABA neurons in 25% of reward (green) and nothing (orange) trials. d, e, In the bilateral stimulation experiment (Fig. 4), anticipatory licks (mean ± s.e.m. across mice) for mice injected with ChR2 (d) and GFP (e). Grey bars, odour B; blue or orange bars, odour D. Left, last three training sessions before odour D was paired with laser; middle, last three sessions with laser delivery (excluding probe trials); right, last three sessions after laser was turned off. **P < 0.01; ***P < 0.001; paired t-test.

Extended Data Figure 9 Neuron classification for GABA inhibition experiment.

a–c, Mice in which laser was delivered with continuous intensity. d–f, Mice in which laser was delivered with ramping intensity. a, d, Responses of all VTA neurons recorded in the tasks. Each row reflects the area under the ROC values for a single neuron in the second before and after delivery of expected reward. Baseline is taken as 1 s before odour onset. Yellow, increase from baseline; cyan, decrease from baseline. b, e, The first three principal components of the area under the ROC curves. These values were used for unsupervised hierarchical clustering, as shown in the dendrogram on the right. c, f, Average firing rates for the three clusters of neurons in each task. Odour was delivered for 1 s, followed by a 0.5 s delay and then reward delivery.

Extended Data Figure 10 Ramping laser stimulation eliminates baseline shift.

a, Firing rate (mean ± s.e.m.) of putative VTA GABA neurons during odour B trials with (green) or without (black) ramping laser delivery. ***P < 0.001, t-test. b, Histogram of putative GABA neuron responses to laser delivery. Responses were averaged over the entire duration of the laser. Filled bars, significant effect of laser (P < 0.05, Wilcoxon rank-sum test); empty bars, P > 0.05. Triangle, mean (P < 0.001, t-test). c, Firing rate (mean ± s.e.m.) of putative dopamine neurons with (green) or without (black) ramping GABA inhibition. ***P < 0.001, t-test. d, Histogram of putative dopamine neuron responses to laser delivery. Responses were averaged over the 0.5 s window after reward delivery. Filled bars, significant effect of laser (P < 0.05, Wilcoxon rank-sum test); empty bars, P > 0.05. Triangle, mean (P < 0.001, t-test).

Supplementary information

Supplementary Information

This file contains Supplementary Text. (PDF 85 kb)

Supplementary Information

This file contains Supplementary Table 1. (XLSX 12 kb)

About this article

Cite this article

Eshel, N., Bukwich, M., Rao, V. et al. Arithmetic and local circuitry underlying dopamine prediction errors. Nature 525, 243–246 (2015). https://doi.org/10.1038/nature14855

  • DOI: https://doi.org/10.1038/nature14855
