Article

The timing of action determines reward prediction signals in identified midbrain dopamine neurons

Nature Neuroscience volume 21, pages 1563–1573 (2018)

Abstract

Animals adapt their behavior in response to informative sensory cues using multiple brain circuits. The activity of midbrain dopaminergic neurons is thought to convey a critical teaching signal: reward-prediction error. Although reward-prediction error signals are thought to be essential to learning, little is known about the dynamic changes in the activity of midbrain dopaminergic neurons as animals learn about novel sensory cues and appetitive rewards. Here we describe a large dataset of cell-attached recordings of identified dopaminergic neurons as naive mice learned a novel cue–reward association. During learning midbrain dopaminergic neuron activity results from the summation of sensory cue-related and movement initiation-related response components. These components are both a function of reward expectation yet they are dissociable. Learning produces an increasingly precise coordination of action initiation following sensory cues that results in apparent reward-prediction error correlates. Our data thus provide new insights into the circuit mechanisms that underlie a critical computation in a highly conserved learning circuit.
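As a purely illustrative toy (not the authors' model or analysis), the sketch below shows how summing a cue-locked component and a movement-initiation-locked component can yield a larger cue-aligned response once movement latency becomes precise, even when the amplitude of each component is held fixed. The Gaussian kernel shape, the amplitudes, the latency distributions, and all function names are assumptions made for illustration. The abstract also states that both components depend on reward expectation, which this sketch deliberately leaves out.

```python
import math
import random

def bump(t, center, width=0.05):
    """Gaussian-shaped transient (arbitrary units) centered at `center` seconds."""
    return math.exp(-0.5 * ((t - center) / width) ** 2)

def cue_aligned_peak(latencies, dt=0.01, duration=2.0):
    """Peak of the trial-averaged, cue-aligned activity.

    Each trial is the sum of a cue component (fixed at 0.1 s after cue onset)
    and a movement component centered on that trial's movement latency.
    """
    times = [i * dt for i in range(int(duration / dt) + 1)]
    mean_trace = [
        sum(bump(t, 0.1) + bump(t, lat) for lat in latencies) / len(latencies)
        for t in times
    ]
    return max(mean_trace)

rng = random.Random(0)
# Hypothetical latency distributions: variable early in learning, precise late.
early_latencies = [rng.uniform(0.3, 2.0) for _ in range(200)]
late_latencies = [rng.gauss(0.15, 0.03) for _ in range(200)]

early_peak = cue_aligned_peak(early_latencies)
late_peak = cue_aligned_peak(late_latencies)
print(f"early peak: {early_peak:.2f}, late peak: {late_peak:.2f}")
```

In this toy the late-learning peak exceeds the early-learning peak only because the movement component now overlaps the cue component in time. That is one way a growing cue response could arise from timing alone.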

Data availability

The data used to generate the results that support the findings of this study are available from the corresponding authors upon reasonable request.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

We thank members of the J.T.D. laboratory, K. Bittner, C. Grienberger, D. Hunt, J. Macklin, J. Cohen, and R. Egnor for technical guidance; members of the J.T.D. laboratory and members of the V. Jayaraman laboratory, B. Mensh, A. Lee, G. Rubin, and J. Day for project feedback; R. Rogers, J. Arnold, and C. Loper for assistance with behavioral rig design and implementation; and S. Lindo for assistance with surgeries. This work was supported by the Howard Hughes Medical Institute. J.T.D. is supported by Janelia.

Author information

Affiliations

  1. Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA

    • Luke T. Coddington
    • Joshua T. Dudman

Contributions

Data collection and analysis were performed by L.T.C. with input from J.T.D. Simulations were implemented by J.T.D. with input from L.T.C. All other aspects of the work were the product of both authors.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Luke T. Coddington or Joshua T. Dudman.

Integrated supplementary information

  1. Supplementary Figure 1 Properties and characterization of recordings.

    a, Section from DAT-Cre::ai32 brain expressing eYFP under the control of the DAT promoter, additionally stained for tyrosine hydroxylase (TH). White scale bar, 1 mm. Representative of results from three independent experiments. b, Mean firing rates (top) and biphasic action potential duration (see Brown & Magill J. Neuro. 2009) for dopaminergic neurons recorded during (n = 96) and before (n = 29) animals experienced reward training. Bars indicate mean ± s.e.m. c, A small but significant difference existed between mDA neurons excited at movement initiation versus those inhibited or not modulated (one-way ANOVA, F = 10.5, P < 0.0001; Tukey’s post hoc, excited versus inhibited P < 0.0001, excited versus non-modulated P = 0.001, inhibited versus non-modulated P = 1). Bars indicate mean ± s.e.m. At right, mean ± s.e.m. spiking PETH aligned to movement initiation for neurons excited (red) versus inhibited (blue) around movement initiation. di, Left, comparing mean firing rates for mDA modulated (n = 75) versus not modulated (n = 21) at water delivery (t test, P = 0.75, bars indicate mean ± s.e.m.), and (right) mean firing rate PETH and lick rate PETH during cued reward trials for the mDA neurons not modulated at water delivery (n = 21). The shaded area indicates s.e.m. dii, Left, mean ± s.e.m. firing rate for 21 non-reward modulated mDA neurons aligned to movement initiation. Right, mean effect of movement on firing rate, n = 21; bars indicate mean ± s.e.m.

  2. Supplementary Figure 2 ‘Covert’ excitation in the form of spike phase advances in the absence of significant modulation of mean firing rates.

    a, Comparison of spike rasters either aligned to random spikes drawn from periods of stillness (top, gray) or aligned to the last spike before the movement-initiation-related pause (middle, green), with the cumulative density function at bottom, from the cell shown in a. b, Interspike intervals from the 1-s baseline prior to movement onset were significantly longer than for the last spikes before movement onset (two-tailed t test, n = 28, P = 0.009). c, The same analysis of spike phase advance for neurons recorded during reward training that lacked significant movement-aligned excitation (as determined by one-tailed sign-rank test P > 0.05). Two-tailed t test, n = 73, P < 0.0001. In b and c, colors reflect neurons recorded in the SNc (dark blue) or VTA (cyan).

  3. Supplementary Figure 3 Putatively ‘silent’ solenoid is undetectable.

    a, We tested for solenoid-related sounds at the point of mouse head fixation within the behavior rig by recording at 400 kHz with a Bruel & Kjaer 1/4" ultrasonic microphone (4939) with preamplifier (2670), amplified to 1 V/Pa with a Bruel & Kjaer Nexus microphone amplifier (2690-A). Data were filtered with a band pass of 1 to 200 kHz. b, Mean sound intensity was calculated for the ultrasonic range and averaged across solenoid valve openings (indicated in a). The shaded area reflects s.e.m. c, Data summarized from n = 4 mice trained with an audible solenoid (>8 sessions) in which a ‘silent’ solenoid was triggered during ITI on ~20% of trials during a session. No modulation of behavior, either body movement (top) or licking (bottom), was apparent. The shaded area reflects s.e.m.

  4. Supplementary Figure 4 Cue and reward activity simulation incorporating independently learned responses scaled by a common factor replicates observed relationships in the data.

    Left, example results plotting the phasic modulation of activity (arbitrary units) as a function of trials for a simulation of the equations governing the change in dopamine neuron activity (ΔDA) at the time of the predictive tone (ΔDAtone) and the reward (ΔDAreward). The equation is explicitly shown at right. Random numbers were drawn from a normal distribution (Nscaling). Right, Pearson’s correlation coefficients were calculated for all trials in the simulation (n = 1,000) for comparison with ‘observed’ Pearson’s correlations taken from the main text. Throughout the figure, red corresponds to the tone responses and black corresponds to reward responses.

  5. Supplementary Figure 5 mDA neurons do not encode within-bout movement.

    a, Mean firing rate (top), movement (middle), and lick rate (bottom) for 23 mDA cell recordings where cells were significantly excited at movement initiation (as determined by one-tailed sign-rank test P < 0.05). The shaded area reflects s.e.m. b, Same as in a, but for the point of maximum basket displacement within each movement bout excluding the first 500 ms surrounding movement initiation. c, No significant difference was observed between baseline rates and rates during the within-bout movements shown in b (two-tailed t test, n = 23, P = 0.6).

  6. Supplementary Figure 6 Locations of mDA axon fiber photometry recordings and reward correlates.

    a, Left, locations of recording fibers in ventral striatum in mice bilaterally injected with jRCaMP1a in the VTA as verified by histology. Right, example mean ventral striatal dopamine axon responses in the mouse indicated by the shaded fiber at left, aligned to cued reward delivery after three sessions of training. Mean ± s.e.m. from 56 water deliveries. b, Same as in a, but for mice injected with jRCaMP1a in the SNc and fibers implanted in the dorsal striatum. The trace at right represents the mean ± s.e.m. from 59 water deliveries.

  7. Supplementary Figure 7 Omission signals are independent from the positive RPE computation.

    a, PETHs aligned to reward delivery in predicted trials (red) and to the moment of omitted reward delivery in trials with no reward delivery following the predictive tone (blue), shown for middle (left) and late (right) training epochs. b, PETHs aligned to reward delivery for actual omission trials (blue) compared with the inferred, putative subtractive prediction effect (pred. PETH – unpred. PETH, purple). c, Inferred effect of prediction on mDA modulation by reward delivery (pred. – unpred.) plotted as a function of the mean modulation at movement initiation (F). The inset P value indicates the result of Pearson’s correlation (n = 10, r = –0.07, P = 0.8). d, Omission response versus movement response for 17 mDA neurons recorded when animals received omission trials in late training. Pearson’s correlation results in inset. The dotted line represents the best-fit trend. e, Effect of prediction (predicted – unpredicted reward responses) versus movement responses in 65 mDA neurons recorded when animals received unpredicted reward trials. Pearson’s correlation results in inset. The dotted line represents the best-fit trend.
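The kind of simulation described in the caption of Supplementary Figure 4 (independently learned tone and reward responses scaled by a common, normally distributed factor, then compared by Pearson's correlation over 1,000 trials) might be sketched as follows. The governing equation appears only in the figure and is not reproduced in this text, so the exponential learning curves, the scaling distribution N(1, 0.3), the additive noise term, and every name below are hypothetical stand-ins rather than the authors' implementation. The caption reports correlations over all trials; restricting the correlation to the last 200 trials is a choice made here so that the learning trend does not dominate the estimate.

```python
import math
import random

def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n_trials=1000, shared_scaling=True, seed=0):
    """Return per-trial tone and reward responses (arbitrary units)."""
    rng = random.Random(seed)
    tone, reward = [], []
    for trial in range(n_trials):
        # Assumed independent learning curves: tone grows, reward declines.
        tone_learned = 1.0 - math.exp(-trial / 200.0)
        reward_learned = 0.5 + 0.5 * math.exp(-trial / 200.0)
        # Per-trial scaling factor drawn from a normal distribution.
        scale_tone = rng.gauss(1.0, 0.3)
        scale_reward = scale_tone if shared_scaling else rng.gauss(1.0, 0.3)
        # Small independent noise so the two responses are not identical.
        tone.append(tone_learned * scale_tone + rng.gauss(0.0, 0.1))
        reward.append(reward_learned * scale_reward + rng.gauss(0.0, 0.1))
    return tone, reward

def late_correlation(shared_scaling):
    """Tone-reward correlation over the last 200 trials of one simulation."""
    tone, reward = simulate(shared_scaling=shared_scaling)
    return pearson(tone[-200:], reward[-200:])

r_shared = late_correlation(True)
r_independent = late_correlation(False)
print(f"shared scaling r = {r_shared:.2f}, independent scaling r = {r_independent:.2f}")
```

Under these assumptions, a shared scaling factor produces a clearly positive trial-by-trial correlation between tone and reward responses, whereas independently drawn factors leave the two responses essentially uncorrelated.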

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–7 and Supplementary Table 1

  2. Reporting Summary

About this article

DOI

https://doi.org/10.1038/s41593-018-0245-7