Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target

Abstract

Dopaminergic (DA) neurons in the midbrain provide rich topographic innervation of the striatum and are central to learning and to generating actions. Despite the importance of this DA innervation, it remains unclear whether and how DA neurons are specialized on the basis of the location of their striatal target. Thus, we sought to compare the function of subpopulations of DA neurons that target distinct striatal subregions in the context of an instrumental reversal learning task. We identified key differences in the encoding of reward and choice in dopamine terminals in dorsal versus ventral striatum: DA terminals in ventral striatum responded more strongly to reward consumption and reward-predicting cues, whereas DA terminals in dorsomedial striatum responded more strongly to contralateral choices. In both cases the terminals encoded a reward prediction error. Our results suggest that the DA modulation of the striatum is spatially organized to support the specialized function of the targeted subregion.

Figure 1: Mice continually learn which choice to make on the basis of recent experience.
Figure 2: Inhibiting DA neurons in the VTA–SNc alters an animal's choice on future trials.
Figure 3: Calcium recordings in terminals of striatally projecting DA neurons.
Figure 4: Response to reward consumption and to reward-predictive cues dominates in VTA–SN::NAc relative to VTA–SN::DMS terminals.
Figure 5: Evidence of reward prediction error encoding in both VTA–SN::NAc and VTA–SN::DMS terminals.
Figure 6: Contralateral response preference in VTA–SN::DMS terminals.
Figure 7: Contralateral response preference in VTA–SN::DMS cell bodies.

Acknowledgements

We thank C. Gregory, M. Applegate and J. Finkelstein for assistance in data collection; J. Pillow and A. Conway for advice with data analysis; M. Murugan and B. Engelhard for comments on the manuscript; and D. Tindall and P. Wallace for administrative support. I.B.W. was supported by the Pew, McKnight, NARSAD and Sloan Foundations, an NIH DP2 New Innovator Award and an R01 MH106689-02; N.F.P. was supported by an NSF Graduate Research Fellowship; and J.P.T. and I.B.W. were supported by the Essig and Enright '82 Fund.

Author information

Contributions

N.F.P., C.M.C., J.P.T. and J.L. performed the experiments; N.F.P., C.M.C., J.P.T., J.L. and J.Y.C. analyzed the data; T.J.D. provided advice on rig design; N.D.D. and I.B.W. provided advice on statistical analysis; N.F.P., N.D.D. and I.B.W. designed experiments and interpreted the results; and N.F.P. and I.B.W. wrote the manuscript.

Corresponding author

Correspondence to Ilana B Witten.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Confirmation that optogenetic inhibition of dopaminergic neurons affects choice

(a) Sample behavioral trace as in Figure 1d, but with NpHR stimulation trials depicted as green blocks (stimulation on a randomly selected 10% of all trials; 200 example trials from an NpHR-YFP animal). (b) Whole-cell voltage-clamp recording of the photocurrent in an example NpHR-YFP-expressing neuron from a TH::Cre mouse injected with Cre-dependent NpHR virus (green bar, 560 nm light). Inset shows the average and SEM across the population (n=14 neurons; peak current: 689±19 pA, steady-state current: 455±13 pA). (c) Light-induced inhibition of spikes generated by current injections (150 pA). During 10 s of photostimulation (green bar), action potentials are suppressed. The trace is a single trial of a single neuron. (d) Population summary for c. Normalized spike rate before (left bar), during (middle bar) and after (right bar) 10 s of photostimulation, averaged across the population (n=14 neurons, paired two-tailed t-test, p=1.8e-8, t(11)=8.59, comparison of baseline and stim period firing rate; p=1.6e-10, t(11)=11.15, comparison of stim and recovery). (e) Surgical schematic. Cre-dependent NpHR is injected into the VTA/SN of DAT::Cre mice and the optical fiber is implanted above the structure. (f) Coefficients from a logistic regression model demonstrating the influence of VTA/SN cell body inhibition on lever choice in subsequent trials in NpHR-YFP and YFP-control DAT::Cre mice. A negative coefficient indicates a reduced probability of returning to the lever chosen on the previous trial. Conversely, a positive coefficient indicates that the animal is more likely to return to the previously chosen lever. Rewarded choices with stimulation decreased the probability of returning to the chosen lever in comparison to rewarded choices without stimulation in NpHR-YFP mice (left panel; p=0.007, t(5)=4.46 for 1 trial back; two-tailed t-test comparing coefficients of "rewarded choice" in blue with "rewarded choice + rewarded choice × stim" in purple; n=6 mice). Likewise, unrewarded choice with stimulation significantly decreased the probability of returning to the chosen lever compared to unrewarded choice alone (left panel; p=0.03, t(5)=3.04 for 1 trial back, two-tailed t-test comparing "unrewarded choice" in red with "unrewarded choice + unrewarded choice × stim" in orange). (g) Same as f, but stimulation was limited to only part of the trial. Left: inhibition from the time of the initial nose poke to the time of the lever press. Right: inhibition from the time of the lever press until the end of reward consumption. In neither case was there a significant effect of light stimulation on either rewarded or unrewarded choice (p>0.1).
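To make the sign convention of the coefficients in panel f concrete, the following is a minimal sketch in Python, not the authors' analysis. It assumes a simplified one-trial-back model in which the outcome is whether the animal returns to the previously chosen lever, and the predictors are the previous outcome (rewarded or unrewarded) and its interaction with stimulation. Because this simplified design is saturated (four cells, four coefficients), the maximum-likelihood logistic coefficients reduce to log-odds and log-odds differences. The function names and all trial counts are hypothetical.

```python
import math


def log_odds(n_stay, n_trials):
    """Log-odds of returning to the previous lever, computed from counts."""
    p = n_stay / n_trials
    return math.log(p / (1.0 - p))


def stay_coefficients(counts):
    """Coefficients of a saturated one-trial-back logistic model.

    counts maps (previous_outcome, stimulated) -> (n_stay, n_trials).
    The interaction terms are the change in log-odds caused by stimulation.
    """
    rewarded = log_odds(*counts[("rewarded", False)])
    unrewarded = log_odds(*counts[("unrewarded", False)])
    return {
        "rewarded": rewarded,
        "unrewarded": unrewarded,
        "rewarded_x_stim": log_odds(*counts[("rewarded", True)]) - rewarded,
        "unrewarded_x_stim": log_odds(*counts[("unrewarded", True)]) - unrewarded,
    }


# Hypothetical counts (not data from the paper); stimulation on 10% of trials.
example_counts = {
    ("rewarded", False): (720, 900),
    ("rewarded", True): (60, 100),
    ("unrewarded", False): (450, 900),
    ("unrewarded", True): (40, 100),
}

coefs = stay_coefficients(example_counts)
for name, value in coefs.items():
    print(f"{name}: {value:+.3f}")
```

In this toy example both interaction coefficients are negative, which corresponds to stimulation reducing the probability of returning to the previously chosen lever. A model with several trials back, as the legend's "1 trial back" wording implies, is no longer saturated and would need an iterative fit.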

Supplementary Figure 2 Largely non-overlapping populations of TH+ neurons project to DMS and NAc

(a) The retrograde tracer CTB 488 was injected into the NAc and the retrograde tracer CTB 555 was injected in the DMS of an example mouse. Scale bar: 1 mm. (b) Retrogradely labeled neurons were evident in the lateral VTA / medial SNc. Scale bar: 100 µm. (c) Of TH+ neurons labeled with either CTB, 52±7% were labeled with the CTB injected in the NAc (green bar), while 56±5% were labeled with the CTB injected in the DMS (red bar). Notably, only 8±4% of TH+ neurons that were labeled with either CTB were labeled with both CTBs (gray bar), signifying that DAergic inputs to NAc and DMS are largely independent. For these data, 3 coronal midbrain slices were considered from each of n=6 mice (total of 487 neurons counted). Error bars are SEM.
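The three percentages in panel c are linked by inclusion-exclusion: among neurons labeled with either tracer, the NAc-labeled fraction plus the DMS-labeled fraction minus the double-labeled fraction must equal 100%, and the reported means satisfy this (52 + 56 - 8 = 100). The sketch below illustrates the bookkeeping in Python. The counts are hypothetical and chosen only to reproduce the reported means; the paper's values are averages across mice, not pooled counts.

```python
def label_fractions(n_nac_only, n_dms_only, n_both):
    """Fractions of labeled TH+ neurons carrying each retrograde tracer."""
    total = n_nac_only + n_dms_only + n_both
    return {
        "nac": (n_nac_only + n_both) / total,  # labeled from NAc (incl. double)
        "dms": (n_dms_only + n_both) / total,  # labeled from DMS (incl. double)
        "both": n_both / total,                # labeled from both sites
    }


# Hypothetical counts per 100 labeled neurons (illustration only).
fractions = label_fractions(n_nac_only=44, n_dms_only=48, n_both=8)

# Inclusion-exclusion: NAc + DMS - both covers every labeled neuron once.
coverage = fractions["nac"] + fractions["dms"] - fractions["both"]
print(fractions, coverage)
```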

Supplementary Figure 3 Individual sites show similar responses to reward consumption and reward-predictive cues in VTA/SN::DMS and VTA/SN::NAc terminal recordings

(a) gCaMP6f responses time-locked to either CS+ (left panel), CS- (middle panel) or Reward Consumption (right panel) in VTA/SN::DMS (blue) and VTA/SN::NAc terminals (n=11). Each line is the average response of between 500 and 1500 trials of an individual animal. The average of these traces is represented in Figure 4d. (b) Same as a except each trace is the response kernel derived from the regression model outlined in Figure 4a instead of the gCaMP6f response.

Supplementary Figure 4 Difference in ipsilateral and contralateral responses in individual recording sites in VTA/SN::DMS and VTA/SN::NAc terminals

(a) The difference in gCaMP6f time-locked responses between contralateral and ipsilateral choice trials in VTA/SN::DMS terminals. (b) Same as a, but for VTA/SN::NAc terminals. The time-locked response to the nose poke is shown in the left panel, lever presentation on the right. A positive number indicates a larger contralateral response while a negative number indicates a larger ipsilateral response. Each trace is the average of an individual animal. (c,d) Same as in a,b except traces are the difference between the contralateral and ipsilateral response kernels derived from the regression model outlined in Figure 4a. (e) VTA/SN::NAc terminal time-locked gCaMP6f responses from individual animals separated into ipsilateral (grey) and contralateral (orange) trials for nose poke and lever presentation (left and right panels, respectively). (f) Same as in e except for VTA/SN::DMS terminal recordings (ipsilateral trials in grey, contralateral in blue). (g,h) Same as in e,f except traces are response kernels derived from the regression model outlined in Figure 4a.
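The contralateral-minus-ipsilateral traces in panels a and b can be pictured as a difference of two event-triggered averages. The following Python sketch shows that computation on a toy signal. The window length, event positions and signal values are all hypothetical, and the function name is illustrative rather than taken from the authors' code.

```python
def event_triggered_average(signal, event_indices, pre, post):
    """Average the signal from `pre` samples before to `post` samples after
    each event, skipping events whose window runs off either end."""
    windows = [
        signal[i - pre:i + post + 1]
        for i in event_indices
        if i - pre >= 0 and i + post < len(signal)
    ]
    if not windows:
        raise ValueError("no event has a complete window")
    return [sum(w[k] for w in windows) / len(windows)
            for k in range(pre + post + 1)]


# Toy fluorescence trace (hypothetical): a response one sample after each
# event, larger on contralateral-choice trials than on ipsilateral ones.
signal = [0.0] * 80
contra_events, ipsi_events = [10, 50], [30, 70]
signal[11], signal[51] = 2.0, 3.0
signal[31], signal[71] = 1.0, 1.0

contra = event_triggered_average(signal, contra_events, pre=2, post=3)
ipsi = event_triggered_average(signal, ipsi_events, pre=2, post=3)

# Positive values mean a larger contralateral response.
difference = [c - i for c, i in zip(contra, ipsi)]
print(difference)
```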

Supplementary Figure 5 Control striatal recordings in mice expressing GFP (not gCaMP6f) in VTA DA terminals do not reveal contralateral preference

(a) Control recordings in DMS and NAc of TH::Cre mice injected with GFP virus in the VTA (n=4 recording sites). Each row represents GFP Z-score data from a different trial, time-locked to either the nose poke (left) or the lever presentation (right). No obvious modulation in the GFP fluorescence signal at the time of these events is evident. (b) Kernels were calculated exactly as in Fig. 6b,d, and no significant ipsilateral/contralateral modulation is evident in the GFP signal, indicating that the modulation with upcoming movement is not a movement artifact.

Supplementary Figure 6 Upcoming lever choice is a better predictor of VTA/SN::DMS terminal responses than previous lever choice

A linear regression model of the VTA/SN::DMS terminal responses was fit that includes all behavioral events (trial start, nose poke, lever presentation, lever press, CS+, CS- and reward consumption), interaction kernels between the choice on the upcoming trial and the event kernels, and interaction kernels between the choice on the previous trial and the event kernels. This reveals non-zero interaction kernels for the upcoming but not the previous trial. In other words, upcoming choice is a better predictor of neural activity than previous choice. Error bars are SEM across recording sites.
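As a rough illustration of how event kernels and choice-interaction kernels can be estimated together, the Python sketch below builds a lagged-indicator design matrix and solves ordinary least squares. It is a simplified sketch under stated assumptions, not the authors' model: it uses one event type and one interaction, a three-sample kernel and a noise-free toy signal, and every name and number in it is hypothetical.

```python
def design_matrix(n_samples, event_times, n_lags):
    """One indicator column per (event type, lag after the event)."""
    X = [[0.0] * (len(event_times) * n_lags) for _ in range(n_samples)]
    for k, name in enumerate(event_times):
        for t0 in event_times[name]:
            for lag in range(n_lags):
                if t0 + lag < n_samples:
                    X[t0 + lag][k * n_lags + lag] = 1.0
    return X


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_kernels(y, event_times, n_lags):
    """Least-squares kernels via the normal equations (X'X) beta = X'y."""
    X = design_matrix(len(y), event_times, n_lags)
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yt for row, yt in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    return {name: beta[k * n_lags:(k + 1) * n_lags]
            for k, name in enumerate(event_times)}


# Hypothetical toy data: four lever presentations, two followed by a
# contralateral choice. The interaction kernel is the extra response on
# those trials, on top of the base (ipsilateral) kernel.
events = {
    "lever_presentation": [5, 15, 25, 32],
    "lever_presentation_x_contra": [5, 25],
}
true_kernels = {
    "lever_presentation": [1.0, 2.0, 1.0],
    "lever_presentation_x_contra": [0.5, 1.0, 0.5],
}
flat_true = [b for name in events for b in true_kernels[name]]
y = [sum(x * b for x, b in zip(row, flat_true))
     for row in design_matrix(40, events, 3)]

kernels = fit_kernels(y, events, 3)
print(kernels)
```

A non-zero interaction kernel here plays the role of the upcoming-choice interaction described in the legend. Adding a second interaction keyed to the previous trial's choice would follow the same pattern, with one more entry in `events`.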

Supplementary Figure 7 Difference in contralateral and ipsilateral responses in individual recording sites in VTA/SN::DMS cell bodies

(a) The difference in gCaMP6f time-locked responses between contralateral and ipsilateral choice trials in VTA/SN::DMS cell body recordings (n=7, surgery schematic outlined in Fig 7a). The time locked response to the nose poke is shown in the left panel, lever presentation on the right. A positive number indicates a larger contralateral response while a negative number indicates a larger ipsilateral response. Each trace is the average of an individual animal. (b) Same as in a except traces are the difference between the contralateral and ipsilateral response kernels derived from the regression model outlined in Figure 4a. (c) VTA/SN::DMS cell body time-locked gCaMP6f responses from individual animals separated into ipsilateral (grey) and contralateral (blue) trials for nose poke and lever presentation (left and right panels, respectively). (d) Same as in c only traces are again response kernels derived from the regression model outlined in Figure 4a.

Cite this article

Parker, N., Cameron, C., Taliaferro, J. et al. Reward and choice encoding in terminals of midbrain dopamine neurons depends on striatal target. Nat Neurosci 19, 845–854 (2016). https://doi.org/10.1038/nn.4287
