The timing of action determines reward prediction signals in identified midbrain dopamine neurons

Coddington, Luke T.; Dudman, Joshua T.

doi:10.1038/s41593-018-0245-7

Article
Published: 15 October 2018

Nature Neuroscience volume 21, pages 1563–1573 (2018)

Abstract

Animals adapt their behavior in response to informative sensory cues using multiple brain circuits. The activity of midbrain dopaminergic neurons is thought to convey a critical teaching signal: reward-prediction error. Although reward-prediction error signals are thought to be essential to learning, little is known about the dynamic changes in the activity of midbrain dopaminergic neurons as animals learn about novel sensory cues and appetitive rewards. Here we describe a large dataset of cell-attached recordings of identified dopaminergic neurons as naive mice learned a novel cue–reward association. During learning midbrain dopaminergic neuron activity results from the summation of sensory cue-related and movement initiation-related response components. These components are both a function of reward expectation yet they are dissociable. Learning produces an increasingly precise coordination of action initiation following sensory cues that results in apparent reward-prediction error correlates. Our data thus provide new insights into the circuit mechanisms that underlie a critical computation in a highly conserved learning circuit.

**Fig. 1: Juxtacellular recording from identified mDA neurons in awake, behaving mice.**

**Fig. 2: Peri-movement excitation of mDA neurons, but not inhibition, depends on reward context.**

**Fig. 3: Initial mDA reward responses encode reward-related movements.**

**Fig. 4: mDA neuron responses to predictive cue and reward stimuli evolve independently during acquisition learning.**

**Fig. 5: Peri-movement activity reflects reward expectation and sums with cue responses.**

**Fig. 6: Physiological mDA stimulation supports learning but is insufficient to provoke movement initiation.**

**Fig. 7: Time course of RPE correlates in mDA neurons is determined by the timing of action initiation.**

**Fig. 8: mDA neuron responses are consistent with temporal summation of sensory cue and action initiation components.**
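
The temporal summation named in the Fig. 8 title, together with the abstract's point that learning tightens the timing of action initiation after the cue, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' simulation: the alpha-function kernel, the fixed unit amplitudes, and the two latency distributions are all hypothetical choices made for illustration, and the sketch ignores the dependence of both components on reward expectation.

```python
import math
import random

def kernel(t, onset, amplitude, tau=0.1):
    """Hypothetical transient: zero up to onset, peaks at `amplitude` at onset + tau."""
    x = (t - onset) / tau
    return amplitude * x * math.exp(1.0 - x) if x > 0 else 0.0

def summed_response(times, move_latency, cue_amp=1.0, move_amp=1.0):
    """Single-trial response: cue component at t = 0 plus a movement-initiation component."""
    return [kernel(t, 0.0, cue_amp) + kernel(t, move_latency, move_amp) for t in times]

def mean_response(times, latencies):
    """Average the summed response across trials, one movement latency per trial."""
    trials = [summed_response(times, lat) for lat in latencies]
    return [sum(col) / len(trials) for col in zip(*trials)]

rng = random.Random(0)
times = [i * 0.01 for i in range(200)]  # 0 to 2 s after cue onset

# Assumed latency distributions (seconds after cue); not taken from the paper.
variable_latencies = [rng.uniform(0.1, 1.5) for _ in range(200)]   # loosely timed movement
precise_latencies = [rng.uniform(0.05, 0.15) for _ in range(200)]  # tightly timed movement

peak_early = max(mean_response(times, variable_latencies))
peak_late = max(mean_response(times, precise_latencies))
print(f"peak with variable latencies: {peak_early:.2f}")
print(f"peak with precise latencies:  {peak_late:.2f}")
```

In this sketch neither component changes amplitude between the two conditions. The trial-averaged peak after the cue is larger with precise latencies only because the movement component overlaps the cue component more consistently.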

Data availability

The data used to generate the results that support the findings of this study are available from the corresponding authors upon reasonable request.

Acknowledgements

We thank members of the J.T.D. laboratory, K. Bittner, C. Grienberger, D. Hunt, J. Macklin, J. Cohen, and R. Egnor for technical guidance; members of the J.T.D. laboratory and members of the V. Jayaraman laboratory, B. Mensh, A. Lee, G. Rubin, and J. Day for project feedback; R. Rogers, J. Arnold, and C. Loper for assistance with behavioral rig design and implementation; and S. Lindo for assistance with surgeries. This work was supported by the Howard Hughes Medical Institute. J.T.D. is supported by Janelia.

Author information

Authors and Affiliations

Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA
Luke T. Coddington & Joshua T. Dudman

Authors

Luke T. Coddington
Joshua T. Dudman

Contributions

Data collection and analysis were performed by L.T.C. with input from J.T.D. Simulations were implemented by J.T.D. with input from L.T.C. All other aspects of the work were the product of both authors.

Corresponding authors

Correspondence to Luke T. Coddington or Joshua T. Dudman.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Properties and characterization of recordings.

a, Section from DAT-Cre::ai32 brain expressing eYFP under the control of the DAT promoter, additionally stained for tyrosine hydroxylase (TH). White scale bar, 1 mm. Representative of results from three independent experiments. b, Mean firing rates (top) and biphasic action potential duration (see Brown & Magill J. Neuro. 2009) for dopaminergic neurons recorded during (n = 96) and before (n = 29) animals experienced reward training. Bars indicate mean ± s.e.m. c, A small but significant difference existed between mDA neurons excited at movement initiation versus those inhibited or not modulated (one-way ANOVA, F = 10.5, P < 0.0001; Tukey’s post hoc, excited versus inhibited P < 0.0001, excited versus non-modulated P = 0.001, inhibited versus non-modulated P = 1). Bars indicate mean ± s.e.m. At right, mean ± s.e.m. spiking PETH aligned to movement initiation for neurons excited (red) versus inhibited (blue) around movement initiation. di, Left, comparison of mean firing rates for mDA neurons modulated (n = 75) versus not modulated (n = 21) at water delivery (t test, P = 0.75; bars indicate mean ± s.e.m.), and (right) mean firing rate PETH and lick rate PETH during cued reward trials for the mDA neurons not modulated at water delivery (n = 21). The shaded area indicates s.e.m. dii, Left, mean ± s.e.m. firing rate for 21 non-reward-modulated mDA neurons aligned to movement initiation. Right, mean effect of movement on firing rate, n = 21; bars indicate mean ± s.e.m.
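
The legends here and below repeatedly refer to PETHs (peri-event time histograms) aligned to events such as movement initiation or reward delivery. As a reminder of what that computation involves, the following is a minimal sketch of a generic PETH, not the authors' analysis code: the function name `peth`, the window, and the bin width are hypothetical, and no smoothing or s.e.m. calculation is included.

```python
from bisect import bisect_left

def peth(spike_times, event_times, window=(-1.0, 1.0), bin_width=0.1):
    """Return (left bin edges, mean firing rate in Hz per bin) across events.

    Times are in seconds; `window` is relative to each event time.
    """
    start, stop = window
    n_bins = int(round((stop - start) / bin_width))
    counts = [0] * n_bins
    spikes = sorted(spike_times)
    for t0 in event_times:
        # Restrict to spikes falling in [t0 + start, t0 + stop).
        lo = bisect_left(spikes, t0 + start)
        hi = bisect_left(spikes, t0 + stop)
        for s in spikes[lo:hi]:
            idx = int((s - t0 - start) / bin_width)
            counts[min(idx, n_bins - 1)] += 1  # guard against float rounding at the edge
    n_events = len(event_times)
    # Spikes per bin -> spikes per second, averaged over events.
    rates = [c / (n_events * bin_width) for c in counts]
    edges = [start + i * bin_width for i in range(n_bins)]
    return edges, rates

# Toy data: three spikes and two events (hypothetical values).
edges, rates = peth([0.25, 0.75, 10.25], [0.0, 10.0], window=(-1.0, 1.0), bin_width=0.5)
print(edges, rates)
```

Aligning the same spike train to a different event list (for example, movement initiation instead of water delivery) only changes the `event_times` argument.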

Supplementary Figure 2 ‘Covert’ excitation in the form of spike phase advances in the absence of significant modulation of mean firing rates.

a, Comparison of spike rasters either aligned to random spikes drawn from periods of stillness (top, gray) or aligned to the last spike before the movement-initiation-related pause (middle, green), with the cumulative density function at bottom, from the cell shown in a. b, Interspike intervals from the 1-s baseline prior to movement onset were significantly slower than for the last spikes before movement onset (two-tailed t test, n = 28, P = 0.009). c, The same analysis of spike phase advance for neurons recorded during reward training that lacked significant movement-aligned excitation (as determined by one-tailed sign-rank test P > 0.05). Two-tailed t test, n = 73, P < 0.0001. *In b and c, colors reflect neurons recorded in the SNc (dark blue) or VTA (cyan).

Supplementary Figure 3 Putatively ‘silent’ solenoid is undetectable.

a, We tested for solenoid-related sounds at the point of mouse head fixation within the behavior rig by recording at 400 kHz with a Bruel & Kjaer 1/4" ultrasonic microphone (4939) with preamplifier (2670), amplified to 1 V/Pa with a Bruel & Kjaer Nexus microphone amplifier (2690-A). Data were filtered with a band pass of 1 to 200 kHz. b, Mean sound intensity across solenoid valve openings was calculated for the ultrasonic range and averaged across solenoid valve openings (indicated in a). The shaded area reflects s.e.m. c, Data summarized from n = 4 mice trained with an audible solenoid (>8 sessions) in which a ‘silent’ solenoid was triggered during ITI on ~20% of trials during a session. No modulation of behavior, either body movement (top) or licking (bottom), was apparent. The shaded area reflects s.e.m.

Supplementary Figure 4 Cue and reward activity simulation incorporating independently learned responses scaled by a common factor replicates observed relationships in the data.

Left, example results plotting the phasic modulation of activity (arbitrary units) as a function of trials for a simulation of the equations governing the change in dopamine neuron activity (ΔDA) at the time of the predictive tone (ΔDA_tone) and the reward (ΔDA_reward). The equation is explicitly shown at right. Random numbers were drawn from a normal distribution (N_scaling). Right, Pearson’s correlation coefficients were calculated for all trials in the simulation (n = 1,000) for comparison with ‘observed’ Pearson’s correlations taken from the main text. Throughout the figure, red corresponds to the tone responses and black corresponds to reward responses.
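
The equation referred to in this legend is not reproduced in the text, so the following is only a sketch of the general idea in the figure title: two independently learned responses multiplied on each trial by a common, normally distributed factor, followed by a Pearson correlation across trials. The exponential learning curves, their start and end values, the time constants, and the standard deviation of the scaling factor are all hypothetical assumptions, not the authors' parameters.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def learned_value(trial, start, end, tau):
    """Assumed learning curve: exponential approach from start to end (tau in trials)."""
    return end + (start - end) * math.exp(-trial / tau)

def simulate(n_trials=1000, sigma=0.2, seed=0):
    """Tone and reward responses with independent learning curves and a shared scaling."""
    rng = random.Random(seed)
    tone, reward = [], []
    for trial in range(n_trials):
        scale = rng.gauss(1.0, sigma)  # common factor, drawn once per trial
        tone.append(scale * learned_value(trial, 0.0, 1.0, tau=100.0))    # hypothetical curve
        reward.append(scale * learned_value(trial, 1.0, 0.2, tau=150.0))  # hypothetical curve
    return tone, reward

tone, reward = simulate()
r_all = pearson(tone, reward)                # mixes learning trends and shared scaling
r_late = pearson(tone[800:], reward[800:])   # plateau trials: shared scaling dominates
print(f"r over all trials: {r_all:.2f}; r over late trials: {r_late:.2f}")
```

Because the shared factor is the only source of trial-to-trial variability in this sketch, the late-trial correlation is close to 1. Adding independent noise to each response would lower it.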

Supplementary Figure 5 mDA neurons do not encode within-bout movement.

a, Mean firing rate (top), movement (middle), and lick rate (bottom) for 23 mDA cell recordings where cells were significantly excited at movement initiation (as determined by one-tailed sign-rank test P < 0.05). The shaded area reflects s.e.m. b, Same as in a, but for the point of maximum basket displacement within each movement bout excluding the first 500 ms surrounding movement initiation. c, No significant difference was observed between baseline rates and rates during the within-bout movements shown in b (two-tailed t test, n = 23, P = 0.6).

Supplementary Figure 6 Locations of mDA axon fiber photometry recordings and reward correlates.

a, Left, locations of recording fibers in ventral striatum in mice bilaterally injected with jRCaMP1a in the VTA as verified by histology. Right, example mean ventral striatal dopamine axon responses in the mouse indicated by the shaded fiber at left, aligned to cued reward delivery after three sessions of training. Mean ± s.e.m. from 56 water deliveries. b, Same as in a, but for mice injected with jRCaMP1a in the SNc and fibers implanted in the dorsal striatum. The trace at right represents the mean ± s.e.m. from 59 water deliveries.

Supplementary Figure 7 Omission signals are independent from the positive RPE computation.

a, Comparison of PETHs aligned to reward delivery in predicted trials (red) and to the moment of omitted reward delivery in trials with no reward delivery following the predictive tone (blue) are shown for middle (left) and late (right) training epochs. b, PETHs aligned to reward delivery for actual omission trials (blue) compared with the inferred, putative subtractive prediction effect (pred. PETH – unpred. PETH, purple). c, Inferred effect of prediction on mDA modulation by reward delivery (pred. – unpred.) plotted as a function of the mean modulation at movement initiation (F). The inset P value indicates the result of Pearson’s correlation (n = 10, r = –0.07, P = 0.8). d, Omission response versus movement response for 17 mDA neurons recorded when animals received omission trials in late training. Pearson’s correlation results in inset. The dotted line represents the best-fit trend. e, Effect of prediction (predicted – unpredicted reward responses) versus movement responses in 65 mDA neurons recorded when animals received unpredicted reward trials. Pearson’s correlation results in inset. The dotted line represents the best-fit trend.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7 and Supplementary Table 1

Reporting Summary

Cite this article

Coddington, L.T., Dudman, J.T. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nat Neurosci 21, 1563–1573 (2018). https://doi.org/10.1038/s41593-018-0245-7

Received: 26 February 2018
Accepted: 20 August 2018
Published: 15 October 2018
Issue Date: November 2018
DOI: https://doi.org/10.1038/s41593-018-0245-7
