The timing of action determines reward prediction signals in identified midbrain dopamine neurons

Abstract

Animals adapt their behavior in response to informative sensory cues using multiple brain circuits. The activity of midbrain dopaminergic neurons is thought to convey a critical teaching signal: reward-prediction error. Although reward-prediction error signals are thought to be essential to learning, little is known about the dynamic changes in the activity of midbrain dopaminergic neurons as animals learn about novel sensory cues and appetitive rewards. Here we describe a large dataset of cell-attached recordings of identified dopaminergic neurons as naive mice learned a novel cue–reward association. During learning, midbrain dopaminergic neuron activity results from the summation of sensory cue-related and movement initiation-related response components. These components are both a function of reward expectation, yet they are dissociable. Learning produces an increasingly precise coordination of action initiation following sensory cues that results in apparent reward-prediction error correlates. Our data thus provide new insights into the circuit mechanisms that underlie a critical computation in a highly conserved learning circuit.

Fig. 1: Juxtacellular recording from identified mDA neurons in awake, behaving mice.
Fig. 2: Peri-movement excitation of mDA neurons, but not inhibition, depends on reward context.
Fig. 3: Initial mDA reward responses encode reward-related movements.
Fig. 4: mDA neuron responses to predictive cue and reward stimuli evolve independently during acquisition learning.
Fig. 5: Peri-movement activity reflects reward expectation and sums with cue responses.
Fig. 6: Physiological mDA stimulation supports learning but is insufficient to provoke movement initiation.
Fig. 7: Time course of RPE correlates in mDA neurons is determined by the timing of action initiation.
Fig. 8: mDA neuron responses are consistent with temporal summation of sensory cue and action initiation components.
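
The temporal summation named in Figs. 7 and 8 can be illustrated with a toy calculation. The sketch below is not the authors' simulation. It assumes two hypothetical Gaussian-shaped response components, one locked to cue onset and one locked to movement initiation, with made-up amplitudes, widths, and latency distributions. Under those assumptions it shows how the trial-averaged response shortly after the cue grows when movement initiation follows the cue quickly and with little jitter.

```python
import math
import random

def gaussian_bump(t, center, width, amplitude):
    """Hypothetical response kernel: a Gaussian bump in firing rate (arbitrary units)."""
    return amplitude * math.exp(-0.5 * ((t - center) / width) ** 2)

def mean_summed_response(latency_mean, latency_sd, n_trials=500, seed=0):
    """Trial-averaged sum of a cue-locked and a movement-locked component.

    Time runs from -0.5 s to 2.0 s around cue onset (t = 0) in 10-ms steps.
    All kernel parameters are illustrative assumptions, not fitted values.
    """
    rng = random.Random(seed)
    times = [-0.5 + 0.01 * i for i in range(251)]
    total = [0.0] * len(times)
    for _ in range(n_trials):
        # Movement initiation latency after the cue on this trial (clipped at 0).
        latency = max(0.0, rng.gauss(latency_mean, latency_sd))
        for i, t in enumerate(times):
            cue_part = gaussian_bump(t, 0.1, 0.05, 1.0)       # locked to the cue
            move_part = gaussian_bump(t, latency, 0.05, 1.0)  # locked to movement
            total[i] += cue_part + move_part
    return times, [x / n_trials for x in total]

def peak_in_window(times, response, start, stop):
    """Largest value of the averaged response between start and stop (seconds)."""
    return max(r for t, r in zip(times, response) if start <= t <= stop)

# Assumed "early learning": movement starts late and at variable times.
t, early = mean_summed_response(latency_mean=1.0, latency_sd=0.4)
# Assumed "late learning": movement starts soon after the cue with little jitter.
_, late = mean_summed_response(latency_mean=0.15, latency_sd=0.03)

early_peak = peak_in_window(t, early, 0.0, 0.3)
late_peak = peak_in_window(t, late, 0.0, 0.3)
print(f"peak 0-300 ms after cue: early={early_peak:.2f}, late={late_peak:.2f}")
```

In this toy setting neither component changes in amplitude; only the timing of the movement-locked component relative to the cue changes, which is enough to enlarge the apparent cue response in the average.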

Data availability

The data used to generate the results that support the findings of this study are available from the corresponding authors upon reasonable request.

Acknowledgements

We thank members of the J.T.D. laboratory, K. Bittner, C. Grienberger, D. Hunt, J. Macklin, J. Cohen, and R. Egnor for technical guidance; members of the J.T.D. laboratory and members of the V. Jayaraman laboratory, B. Mensh, A. Lee, G. Rubin, and J. Day for project feedback; R. Rogers, J. Arnold, and C. Loper for assistance with behavioral rig design and implementation; and S. Lindo for assistance with surgeries. This work was supported by the Howard Hughes Medical Institute. J.T.D. is supported by Janelia.

Author information

Contributions

Data collection and analysis were performed by L.T.C. with input from J.T.D. Simulations were implemented by J.T.D. with input from L.T.C. All other aspects of the work were the product of both authors.

Corresponding authors

Correspondence to Luke T. Coddington or Joshua T. Dudman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Properties and characterization of recordings.

a, Section from DAT-Cre::ai32 brain expressing eYFP under the control of the DAT promoter, additionally stained for tyrosine hydroxylase (TH). White scale bar, 1 mm. Representative of results from three independent experiments. b, Mean firing rates (top) and biphasic action potential duration (see Brown & Magill J. Neuro. 2009) for dopaminergic neurons recorded during (n = 96) and before (n = 29) animals experienced reward training. Bars indicate mean ± s.e.m. c, A small but significant difference existed between mDA neurons excited at movement initiation versus those inhibited or not modulated (one-way ANOVA, F = 10.5, P < 0.0001; Tukey’s post hoc, excited versus inhibited P < 0.0001, excited versus non-modulated P = 0.001, inhibited versus non-modulated P = 1). Bars indicate mean ± s.e.m. At right, mean ± s.e.m. spiking PETH aligned to movement initiation for neurons excited (red) versus inhibited (blue) around movement initiation. di, Left, comparison of mean firing rates for mDA neurons modulated (n = 75) versus not modulated (n = 21) at water delivery (t test, P = 0.75; bars indicate mean ± s.e.m.). Right, mean firing rate PETH and lick rate PETH during cued reward trials for the mDA neurons not modulated at water delivery (n = 21). The shaded area indicates s.e.m. dii, Left, mean ± s.e.m. firing rate for 21 non-reward-modulated mDA neurons aligned to movement initiation. Right, mean effect of movement on firing rate, n = 21; bars indicate mean ± s.e.m.
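
Several legends in this section refer to a PETH (peri-event time histogram) with an s.e.m. band. As a rough illustration of that kind of analysis, and not a reconstruction of the authors' pipeline, the sketch below bins spike times around each event, converts counts to rates, and averages across events. The function name `peth`, the window, and the bin width are all assumptions chosen for the example.

```python
import math

def peth(spike_times, event_times, t_start=-1.0, t_stop=2.0, bin_width=0.1):
    """Mean firing rate (Hz) and s.e.m. across events, in bins around each event.

    spike_times and event_times are in seconds. Returns (bin_centers, mean, sem).
    """
    n_bins = int(round((t_stop - t_start) / bin_width))
    per_trial = []
    for ev in event_times:
        counts = [0] * n_bins
        for s in spike_times:
            # Bin index of this spike relative to the start of the window.
            idx = math.floor((s - ev - t_start) / bin_width)
            if 0 <= idx < n_bins:
                counts[idx] += 1
        per_trial.append([c / bin_width for c in counts])  # counts -> rate in Hz

    n = len(per_trial)
    mean = [sum(trial[i] for trial in per_trial) / n for i in range(n_bins)]
    if n > 1:
        sem = [
            math.sqrt(sum((trial[i] - mean[i]) ** 2 for trial in per_trial) / (n - 1))
            / math.sqrt(n)
            for i in range(n_bins)
        ]
    else:
        sem = [0.0] * n_bins  # s.e.m. is undefined for a single event
    centers = [t_start + (i + 0.5) * bin_width for i in range(n_bins)]
    return centers, mean, sem

# Tiny made-up example: two events, two 0.5-s bins around each event.
centers, mean_rate, sem_rate = peth(
    spike_times=[0.95, 1.05, 1.15, 4.05],
    event_times=[1.0, 4.0],
    t_start=-0.5, t_stop=0.5, bin_width=0.5,
)
print(centers, mean_rate, sem_rate)
```

The same function can be aligned to movement initiation, cue onset, or reward delivery simply by changing which timestamps are passed as `event_times`.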

Supplementary Figure 2 ‘Covert’ excitation in the form of spike phase advances in the absence of significant modulation of mean firing rates.

a, Comparison of spike rasters either aligned to random spikes drawn from periods of stillness (top, gray) or aligned to the last spike before the movement-initiation-related pause (middle, green), with the cumulative density function at bottom, from the cell shown in a. b, Interspike intervals from the 1-s baseline prior to movement onset were significantly longer than those for the last spikes before movement onset (two-tailed t test, n = 28, P = 0.009). c, The same analysis of spike phase advance for neurons recorded during reward training that lacked significant movement-aligned excitation (as determined by one-tailed sign-rank test, P > 0.05). Two-tailed t test, n = 73, P < 0.0001. *In b and c, colors reflect neurons recorded in the SNc (dark blue) or VTA (cyan).
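
The interspike-interval comparison in b and c can be sketched as follows. This is only one plausible reading of the legend: it assumes that the "last" interval is the final interspike interval before movement onset and that the baseline is the mean of the earlier intervals within the preceding 1 s. The function name and the example spike times are hypothetical.

```python
def baseline_and_last_isi(spike_times, movement_onset, window=1.0):
    """Return (mean baseline ISI, last ISI) from spikes in the window before onset.

    The last ISI is the interval ending at the final spike before movement onset;
    the baseline is the mean of all earlier ISIs inside the window.
    Returns None when fewer than three spikes fall inside the window.
    """
    pre = sorted(
        t for t in spike_times if movement_onset - window <= t < movement_onset
    )
    if len(pre) < 3:
        return None  # need at least one baseline ISI plus the final ISI
    isis = [later - earlier for earlier, later in zip(pre, pre[1:])]
    baseline = isis[:-1]
    return sum(baseline) / len(baseline), isis[-1]

# Made-up spike train: regular 250-ms intervals, then one early spike before onset.
spikes = [0.10, 0.35, 0.60, 0.85, 0.95]
baseline_isi, last_isi = baseline_and_last_isi(spikes, movement_onset=1.0)

# A positive value means the final spike arrived earlier than the baseline rhythm
# would predict, i.e. a phase advance without any change in spike count.
phase_advance = baseline_isi - last_isi
print(f"baseline ISI={baseline_isi:.3f} s, last ISI={last_isi:.3f} s, "
      f"advance={phase_advance:.3f} s")
```

Collecting one such pair per neuron gives the two samples that a t test like the one reported in the legend could compare.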

Supplementary Figure 3 Putatively ‘silent’ solenoid is undetectable.

a, We tested for solenoid-related sounds at the point of mouse head fixation within the behavior rig by recording at 400 kHz with a Bruel & Kjaer 1/4" ultrasonic microphone (4939) with preamplifier (2670), amplified to 1 V/Pa with a Bruel & Kjaer Nexus microphone amplifier (2690-A). Data were filtered with a band pass of 1 to 200 kHz. b, Mean sound intensity in the ultrasonic range, averaged across solenoid valve openings (indicated in a). The shaded area reflects s.e.m. c, Data summarized from n = 4 mice trained with an audible solenoid (>8 sessions) in which a ‘silent’ solenoid was triggered during the ITI on ~20% of trials during a session. No modulation of behavior, either body movement (top) or licking (bottom), was apparent. The shaded area reflects s.e.m.

Supplementary Figure 4 Cue and reward activity simulation incorporating independently learned responses scaled by a common factor replicates observed relationships in the data.

Left, example results plotting the phasic modulation of activity (arbitrary units) as a function of trials for a simulation of the equations governing the change in dopamine neuron activity (ΔDA) at the time of the predictive tone (ΔDAtone) and the reward (ΔDAreward). The equation is explicitly shown at right. Random numbers were drawn from a normal distribution (Nscaling). Right, Pearson’s correlation coefficients were calculated for all trials in the simulation (n = 1,000) for comparison with ‘observed’ Pearson’s correlations taken from the main text. Throughout the figure, red corresponds to the tone responses and black corresponds to reward responses.
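
The equation itself is shown only in the figure and is not reproduced in this legend, so the sketch below is a stand-in rather than the authors' model. It assumes two hypothetical saturating learning curves with different time constants (placeholders for the independently learned tone and reward responses), independent per-trial noise, and a per-trial scaling factor drawn from a normal distribution that multiplies both responses. It then computes Pearson's correlation across 1,000 trials with and without the common factor.

```python
import math
import random

def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n_trials=1000, scaling_sd=0.3, noise_sd=0.1, seed=1):
    """Simulate tone and reward responses (arbitrary units) across trials.

    The learning curves, time constants, and noise levels are assumptions made
    for illustration only; they are not taken from the paper.
    """
    rng = random.Random(seed)
    tone, reward = [], []
    for k in range(n_trials):
        tone_learned = 1.0 - math.exp(-k / 300.0)    # assumed slow acquisition
        reward_learned = 1.0 - math.exp(-k / 100.0)  # assumed faster acquisition
        scale = rng.gauss(1.0, scaling_sd)           # common factor for this trial
        tone.append(scale * (tone_learned + rng.gauss(0.0, noise_sd)))
        reward.append(scale * (reward_learned + rng.gauss(0.0, noise_sd)))
    return tone, reward

# Same seed in both runs, so only the spread of the common factor differs.
tone_c, reward_c = simulate(scaling_sd=0.3)
tone_i, reward_i = simulate(scaling_sd=0.0)

r_common = pearson(tone_c, reward_c)
r_independent = pearson(tone_i, reward_i)
print(f"r with common scaling:    {r_common:.2f}")
print(f"r without common scaling: {r_independent:.2f}")
```

Under these assumptions the shared scaling factor raises the tone-reward correlation relative to the run without it, even though the two learning curves evolve independently.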

Supplementary Figure 5 mDA neurons do not encode within-bout movement.

a, Mean firing rate (top), movement (middle), and lick rate (bottom) for 23 mDA cell recordings where cells were significantly excited at movement initiation (as determined by one-tailed sign-rank test P < 0.05). The shaded area reflects s.e.m. b, Same as in a, but for the point of maximum basket displacement within each movement bout excluding the first 500 ms surrounding movement initiation. c, No significant difference was observed between baseline rates and rates during the within-bout movements shown in b (two-tailed t test, n = 23, P = 0.6).

Supplementary Figure 6 Locations of mDA axon fiber photometry recordings and reward correlates.

a, Left, locations of recording fibers in ventral striatum in mice bilaterally injected with jRCaMP1a in the VTA as verified by histology. Right, example mean ventral striatal dopamine axon responses in the mouse indicated by the shaded fiber at left, aligned to cued reward delivery after three sessions of training. Mean ± s.e.m. from 56 water deliveries. b, Same as in a, but for mice injected with jRCaMP1a in the SNc and fibers implanted in the dorsal striatum. The trace at right represents the mean ± s.e.m. from 59 water deliveries.

Supplementary Figure 7 Omission signals are independent from the positive RPE computation.

a, PETHs aligned to reward delivery in predicted trials (red) and to the moment of omitted reward delivery in trials with no reward delivery following the predictive tone (blue) are shown for middle (left) and late (right) training epochs. b, PETHs aligned to reward delivery for actual omission trials (blue) compared with the inferred, putative subtractive prediction effect (pred. PETH – unpred. PETH, purple). c, Inferred effect of prediction on mDA modulation by reward delivery (pred. – unpred.) plotted as a function of the mean modulation at movement initiation (F). The inset P value indicates the result of Pearson’s correlation (n = 10, r = –0.07, P = 0.8). d, Omission response versus movement response for 17 mDA neurons recorded when animals received omission trials in late training. Pearson’s correlation results in inset. The dotted line represents the best-fit trend. e, Effect of prediction (predicted – unpredicted reward responses) versus movement responses in 65 mDA neurons recorded when animals received unpredicted reward trials. Pearson’s correlation results in inset. The dotted line represents the best-fit trend.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7 and Supplementary Table 1

Reporting Summary

About this article

Cite this article

Coddington, L.T., Dudman, J.T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat Neurosci 21, 1563–1573 (2018). https://doi.org/10.1038/s41593-018-0245-7

  • DOI: https://doi.org/10.1038/s41593-018-0245-7
