

Dissociable roles of ventral pallidum neurons in the basal ganglia reinforcement learning network

Abstract

Reinforcement learning models treat the basal ganglia (BG) as an actor–critic network. The ventral pallidum (VP) is a major component of the BG limbic system. However, its precise functional roles within the BG circuitry, particularly in comparison to the adjacent external segment of the globus pallidus (GPe), remain unexplored. We recorded the spiking activity of VP neurons, GPe cells (actor) and striatal cholinergic interneurons (critic) while monkeys performed a classical conditioning task. Here, we report that VP neurons can be classified into two distinct populations. The persistent population displayed sustained activation following visual cue presentation, was correlated with monkeys’ behavior and showed uncorrelated spiking activity. The transient population displayed phasic synchronized responses that were correlated with the rate of learning and the reinforcement learning model’s prediction error. Our results suggest that the VP is physiologically different from the GPe and identify the transient VP neurons as a BG critic.


Fig. 1: Task design and recording.
Fig. 2: GPe persistent average responses versus VP and TAN transient average responses.
Fig. 3: Clustering analysis reveals two distinct transient and persistent neuronal populations in the VP, unlike the GPe and TANs.
Fig. 4: Correlative spiking activity of transient VP neurons and TANs versus noncorrelative spiking activity of persistent VP and GPe cells.
Fig. 5: Persistent VP and GPe spiking activity in the overtrained blocks positively correlates with licking and blinking behavior, in contrast to transient VP and TAN spiking activity.
Fig. 6: Transient VP spiking activity positively correlates with the learning slope and with the prediction error of a TD model, in contrast to persistent VP spiking activity.
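Fig. 6 refers to the prediction error of a temporal difference (TD) model. As a rough illustration of what such a prediction error looks like, the following is a minimal sketch of a tabular TD(0) update for a single cue followed by an outcome. It is not the authors' model, whose code is available only on request; the function name, learning rate, discount factor and reward value are illustrative assumptions.

```python
def td_cue_outcome(n_trials, reward=1.0, alpha=0.2, gamma=1.0):
    """Tabular TD(0) for a two-state trial: cue -> outcome -> end.

    Returns the prediction error at outcome delivery on each trial.
    All parameter values are illustrative assumptions.
    """
    v_cue = 0.0  # learned value of the cue state
    errors = []
    for _ in range(n_trials):
        # Prediction error at outcome: reward received plus the (zero)
        # value of the terminal state, minus the value predicted by the cue.
        delta = reward + gamma * 0.0 - v_cue
        v_cue += alpha * delta  # move the cue value toward the outcome
        errors.append(delta)
    return errors


errors = td_cue_outcome(n_trials=10)
```

In this sketch the outcome prediction error shrinks from trial to trial as the cue value is learned, which is the qualitative pattern one would expect a critic-like signal to follow during learning.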

Data availability

Data from this study are available from the corresponding author upon reasonable request.

Code availability

The code related to this study is available from the corresponding author upon reasonable request.

Acknowledgements

We thank Y. Dagan and T. Ravins-Yaish for assistance with animal care; A. Bick and A. Payis for assistance with the MRI scan; and H. Gabbay, S. Freeman, U. Werner-Reiss and E. Singer for general assistance. We thank A. Shapochnikov for help in preparing the experimental setup. We also thank M. Deffains for helpful comments and discussion. This work was supported by grants from the European Research Council (grant 322495) and the Rosetrees Trust (grant M93-F1) to H.B.

Author information

Contributions

A.K., A.M.K. and H.B. designed the research. Z.I. performed surgery. A.K. and A.M.K. collected the data. A.K. analyzed the data and implemented the reinforcement learning model. A.A. collected and analyzed the ventral striatum data. A.K. and H.B. wrote the manuscript. All authors read and commented on the final version of the manuscript.

Corresponding author

Correspondence to Alexander Kaplan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Neuroscience thanks Atsushi Nambu, Yoshihisa Tachibana and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 GPe and VP neurons display different discharge properties and spatial arrangements.

a, Distributions of the mean discharge rates of GPe (top) and VP (bottom) neurons, computed during the last 3 s of the inter-trial interval (ITI) period. Red vertical lines represent the average discharge rate of each structure. N = 97 GPe and 43 VP neurons. ***P = 3 × 10⁻¹⁹, two-sided Mann-Whitney U test. b, Distributions of the mean coefficient of variation (CV) of the inter-spike intervals (ISIs). Same conventions as in (a). N = 97 GPe and 43 VP neurons. **P = 0.0073, two-sided Mann-Whitney U test. c, Spike waveform durations, computed as the time difference between the first negative peak and the next positive peak. N = 97 GPe and 43 VP neurons. ***P = 4 × 10⁻¹¹, two-sided Mann-Whitney U test. d, Spatial distribution of GPe and VP neurons. Each dot represents a single cell. Brown, GPe cells; gray, VP cells. Abscissa: coordinates in a parasagittal plane (in mm); A, anterior; P, posterior; zero is coronal section AC0 (AC, anterior commissure); ordinate: coordinates in the horizontal plane (in mm); M, medial; L, lateral; zero is the central coordinate of both structures in our recordings; Z-axis: depth from entry to the cortex (in mm). N = 97 GPe and 43 VP neurons.
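The two discharge properties in panels a and b can be computed from a list of spike times. The following is a minimal sketch, not the authors' analysis code: the function name and the example spike times are hypothetical, and the use of the population standard deviation for the CV is an assumption.

```python
from statistics import mean, pstdev


def rate_and_cv(spike_times, duration_s):
    """Mean discharge rate (Hz) and CV of the inter-spike intervals (ISIs).

    spike_times: sorted spike times in seconds within the analysis window.
    duration_s:  length of the window in seconds (for example, 3 s of ITI).
    """
    # ISIs are the differences between consecutive spike times.
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    rate = len(spike_times) / duration_s
    # CV = standard deviation of the ISIs divided by their mean.
    cv = pstdev(isis) / mean(isis)
    return rate, cv


# Hypothetical, perfectly regular spike train in a 3 s window.
rate, cv = rate_and_cv([0.1, 0.6, 1.1, 1.6, 2.1, 2.6], duration_s=3.0)
```

A perfectly regular train such as this one gives a CV close to zero, whereas a Poisson-like train would give a CV near one.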

Extended Data Fig. 2 Absolute population responses of GPe and VP neurons.

a, Absolute population responses of GPe cells for the reward, neutral and aversive trials, N = 97 neurons. Abscissa, time (−0.5 – 2 s). Vertical dashed lines indicate the times of cue onset; ordinate, firing rate in Hz, normalized by the mean discharge rate during the last 3 s of the ITI period. Shaded regions represent SEMs. b, Absolute population responses of VP cells, N = 43 neurons. Same conventions as in (a).

Extended Data Fig. 3 VP neurons exhibit similar average responses across the different blocks.

a, Mean population responses of all VP neurons in the OTR-D block. Abscissa, time (−2 – 4 s). Vertical dashed lines at 0 and 2 s indicate the times of cue onset and outcome delivery, respectively; ordinate, firing rate in Hz, normalized by the mean discharge rate during the last 3 s of the ITI period. Shaded regions represent SEMs. b, Mean population responses of all VP neurons in the OTR-P block. Same conventions as in (a). c, Mean population responses of all VP neurons in the NOV-D block. Same conventions as in (a). d, Mean population responses of all VP neurons in the NOV-P block. Same conventions as in (a). Data for this figure were taken from all 43 VP neurons that met the inclusion criteria.

Extended Data Fig. 4 VP cells and TANs display higher TP indices than GPe neurons.

a, Transient-Persistent (TP) indices of GPe neurons, VP cells and TANs in the reward trials of all four blocks. The mean value of the absolute deviation from baseline on the interval [1-2 s] was subtracted from the maximum of the absolute deviation from baseline on the interval [0-1 s]. The result was divided by the sum of these two real numbers to obtain the TP index. Each dot represents the TP index of a single cell (N = 97, 43 and 24 for the GPe, VP and TAN, respectively). Bars indicate the mean TP indices. Error bars represent SEMs. **P = 0.0036 (VP and GPe), ***P = 6 × 10⁻⁴ (TAN and GPe), two-sided Mann-Whitney U test. b, TP indices of GPe neurons, VP cells and TANs in the neutral trials. Same conventions as in (a). **P = 0.0059 (VP and GPe), two-sided Mann-Whitney U test. c, TP indices of GPe neurons, VP cells and TANs in the aversive trials. Same conventions as in (a). *P = 0.0287 (TAN and GPe), two-sided Mann-Whitney U test.
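The TP index described in panel a is (peak − sustained) / (peak + sustained), where peak is the maximum absolute deviation from baseline in the first second after cue onset and sustained is the mean absolute deviation in the following second. The sketch below follows that description; the function name, the binned example responses and the assignment of the 1 s boundary to the late interval are assumptions.

```python
def tp_index(rates, times, baseline):
    """Transient-Persistent index of a cue-aligned response.

    rates:    firing rate per bin (Hz); times: bin centers (s, cue onset = 0).
    baseline: baseline firing rate (Hz).
    """
    # Absolute deviation from baseline in the early (0-1 s) and late (1-2 s) intervals.
    early = [abs(r - baseline) for r, t in zip(rates, times) if 0.0 <= t < 1.0]
    late = [abs(r - baseline) for r, t in zip(rates, times) if 1.0 <= t <= 2.0]
    peak = max(early)                  # maximum early deviation
    sustained = sum(late) / len(late)  # mean late deviation
    # Undefined (division by zero) if the response never leaves baseline.
    return (peak - sustained) / (peak + sustained)


times = [0.25, 0.75, 1.25, 1.75]  # hypothetical bin centers
transient = tp_index([18.0, 10.0, 10.0, 10.0], times, baseline=10.0)
persistent = tp_index([14.0, 14.0, 14.0, 14.0], times, baseline=10.0)
```

With these hypothetical responses, a purely phasic response yields an index of 1 and a fully sustained response yields an index of 0.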

Extended Data Fig. 5 Transient VP average neuronal responses are not strongly affected by the number of clusters in the k-means algorithm.

Grouping of VP neuronal responses using different numbers of clusters in the k-means algorithm. The data points on the left depict the output of the PCA for the representative case of the reward trials. PC1 and PC2 are the first and second principal components, respectively. Clusters 1-4 represent the average VP responses in the different groups. Abscissa, time (-0.5 – 2 s). Vertical dashed lines indicate the times of cue onset; ordinate, firing rate in Hz, normalized by the mean discharge rate during the last 3 s of the ITI period. Shaded regions represent SEMs. N represents the number of VP cells in each cluster.
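To make the clustering step concrete, here is a minimal sketch of Lloyd's k-means algorithm applied to two-dimensional points that stand in for PC1/PC2 scores. It is not the authors' implementation: the function name, the example scores and the initialization from the first k points are assumptions chosen to keep the example deterministic.

```python
def kmeans(points, k, n_iter=20):
    """Lloyd's k-means; centroids are initialized from the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical (PC1, PC2) scores forming two well-separated groups.
scores = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(scores, k=2)
```

Rerunning the same function with k set to 3 or 4 mirrors the robustness check in this figure, in which the number of clusters is varied.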

Extended Data Fig. 6 Monkeys exhibit similar learning kinetics throughout the NOV-D block.

a, Learning curves for monkey D, for the reward, neutral and aversive trials. Abscissa, trial number; ordinate, mean number of licks minus the mean number of blinks (computed between 0-1 s; 0 is the time of cue onset), normalized between zero and one. Error bars represent SEMs across trials (N = 10 trials for each cue type, 117 repetitions per trial). All curves were smoothed with a 5-point moving average. b, Learning curves for monkey N. Same conventions as in (a). N = 10 trials for each cue type, 74 repetitions per trial.
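The learning curve described above (licks minus blinks, normalized between zero and one, smoothed with a 5-point moving average) can be sketched as follows. The function name and the example counts are hypothetical, and two details are assumptions because the caption does not specify them: min-max normalization, and a moving-average window that shrinks at the edges of the curve.

```python
def learning_curve(licks, blinks, window=5):
    """Normalized, smoothed licks-minus-blinks curve across trials."""
    diff = [l - b for l, b in zip(licks, blinks)]
    lo, hi = min(diff), max(diff)
    # Min-max normalization to [0, 1]; undefined if the curve is flat.
    norm = [(d - lo) / (hi - lo) for d in diff]
    half = window // 2
    smoothed = []
    for i in range(len(norm)):
        # Centered moving average; the window shrinks near the edges.
        seg = norm[max(0, i - half): i + half + 1]
        smoothed.append(sum(seg) / len(seg))
    return smoothed


# Hypothetical mean lick and blink counts over 10 trials of one cue type.
licks = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
blinks = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
curve = learning_curve(licks, blinks)
```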

Extended Data Fig. 7 Transient versus persistent VP neuronal responses do not reflect different discharge properties or spatial layout.

a, Distributions of the mean discharge rates of transient (top) and persistent (bottom) VP neurons (N = 29 and 14 neurons, respectively), computed during the last 3 s of the ITI period. Red vertical lines represent the average discharge rates. b, Distributions of the mean CV of the ISIs. Same conventions as in (a). c, Spike waveform durations, measured as the time difference from the first negative peak to the next positive peak. d, Spatial distribution of VP neurons. Each dot represents a single VP cell. Purple, transient VP cells; olive, persistent VP cells. Abscissa: coordinates in a parasagittal plane (in mm); A, anterior; P, posterior; zero is coronal section AC0 (AC, anterior commissure). Ordinate: coordinates in the horizontal plane (in mm); M, medial; L, lateral; zero is the center of the VP in our recordings. Z-axis: depth from entry to the cortex (in mm). e, Fraction of pairs of VP neurons belonging to the same transient/persistent cluster. “Same session”, cell pairs recorded from different electrodes in the same recording sessions; “different session”, cell pairs recorded in different sessions.

Extended Data Fig. 8 VS TANs display homogeneous responses to cue presentations, as well as correlational spiking activity.

a, TAN responses to cue events. Each row is the Z-score transformed PSTH of a single neuron to the presentation of cues that signal rewarding, neutral or aversive outcomes. Z-scores are color-coded. Abscissa, time (0 – 2 s), zero is the time of cue onset; ordinate, unit number. Cells are randomly ordered. N = 43 neurons. b, TAN average population responses to cue presentations. Abscissa, time (-0.5 – 2 s). The vertical dashed line at t = 0 indicates the time of cue onset; ordinate, normalized firing rate in Hz. Blue, reward trials; green, neutral trials; red, aversive trials. Shaded regions represent SEMs. N = 43 neurons for each trial type. c, Left: mean cross-correlation histogram of simultaneously recorded pairs of TANs (N = 8 pairs). Abscissa: time, ±1 s around the trigger spike (time = 0); ordinate: normalized firing rate in Hz. Shaded region represents SEM. Right: distribution of TAN signal correlations (N = 453 pairs).
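A Z-score transformed PSTH, as shown in panel a, can be built by binning cue-aligned spikes across trials and expressing each bin relative to a baseline rate distribution. The sketch below is one possible construction and not the authors' code: the function name, the bin width, the example spike times and the choice of baseline samples used for the Z-score are all assumptions.

```python
from statistics import mean, pstdev


def zscore_psth(trial_spike_times, t_start, t_stop, bin_s, baseline_rates):
    """Z-scored peri-stimulus time histogram (cue onset = 0 s).

    trial_spike_times: one list of spike times (s) per trial.
    baseline_rates:    baseline firing rates (Hz) that define the Z-score.
    """
    n_bins = round((t_stop - t_start) / bin_s)
    counts = [0] * n_bins
    for spikes in trial_spike_times:
        for t in spikes:
            if t_start <= t < t_stop:
                # Guard against floating-point overflow into a nonexistent bin.
                idx = min(int((t - t_start) / bin_s), n_bins - 1)
                counts[idx] += 1
    # Convert summed counts to a trial-averaged rate in Hz.
    rates = [c / (len(trial_spike_times) * bin_s) for c in counts]
    mu, sd = mean(baseline_rates), pstdev(baseline_rates)
    return [(r - mu) / sd for r in rates]


# Two hypothetical trials, 1 s bins from 0 to 2 s after cue onset.
trials = [[0.1, 0.2, 1.5], [0.3, 0.4, 1.2]]
z = zscore_psth(trials, t_start=0.0, t_stop=2.0, bin_s=1.0, baseline_rates=[1.0, 3.0])
```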

Extended Data Fig. 9 Clustering analysis reveals three distinct populations of VS MSNs, which differ in their response profiles and correlation patterns.

a, MSN responses to cue events. Each row is the Z-score transformed PSTH of a single neuron to the presentation of cues that signal rewarding, neutral or aversive outcomes. Z-scores are color-coded. Abscissa, time (0–2 s), zero is the time of cue onset; ordinate, unit number. Cells are ordered by clusters. Within each cluster, cells are randomly ordered. N = 394 neurons. b, Top: MSN population responses to cue presentations. Abscissa, time (-0.5 – 2 s). The vertical dashed lines at t = 0 indicate the time of cue onset; ordinate, normalized firing rate in Hz. Blue, reward trials; green, neutral trials; red, aversive trials. Shaded regions represent SEMs. Bottom: MSN absolute population responses to cue presentations. N = 163, 85 and 146 neurons, from left to right. c, Top: mean cross-correlation histograms. N = 99, 34 and 70 pairs, from left to right. Abscissa: time, ±1 s around the trigger spike (time = 0); ordinate: normalized firing rate in Hz. Shaded regions represent SEMs. Bottom: distributions of signal correlations. N = 6771, 1776 and 5257 pairs, from left to right.

Extended Data Fig. 10 Transient VP neurons show evidence of reward prediction error encoding during learning, in contrast to persistent VP and GPe neurons.

a, Transient VP average population responses in the OTR-D and OTR-P blocks. In the OTR-P block, only trials in which the outcome was given are presented. Abscissa, time (-2 – 4 s). Vertical dashed lines at t = 0 and t = 2 s indicate the times of cue onset and outcome delivery, respectively; ordinate, firing rate in Hz, normalized by the mean discharge rate during the last 3 s of the ITI period. N = 29 neurons. Shaded regions represent SEMs. b, Transient VP, persistent VP and GPe average population responses in the first and last two trials of the NOV-D block. Top: reward trials; middle: neutral trials; bottom: aversive trials. N = 27, 12 and 94 neurons for the transient VP, persistent VP and GPe, respectively. Same conventions as in (a). c, Transient VP, persistent VP and GPe average population responses in the first and last trials of the NOV-P block, for trials in which the reward outcome was given (upper row) and in which it was omitted (lower row). N = 26, 12 and 93 neurons for the transient VP, persistent VP and GPe, respectively. Same conventions as in (a).

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2.

Reporting Summary

Cite this article

Kaplan, A., Mizrahi-Kliger, A.D., Israel, Z. et al. Dissociable roles of ventral pallidum neurons in the basal ganglia reinforcement learning network. Nat Neurosci 23, 556–564 (2020). https://doi.org/10.1038/s41593-020-0605-y

  • DOI: https://doi.org/10.1038/s41593-020-0605-y
