The nervous system is hypothesized to compute reward prediction errors (RPEs) to promote adaptive behavior. Correlates of RPEs have been observed in the midbrain dopamine system, but the extent to which RPE signals exist in other reward-processing regions is less well understood. In the present study, we quantified outcome history-based RPE signals in the ventral pallidum (VP), a basal ganglia region functionally linked to reward-seeking behavior. We trained rats to respond to reward-predicting cues, and we fit computational models to predict the firing rates of individual neurons at the time of reward delivery. We found that a subset of VP neurons encoded RPEs and did so more robustly than the nucleus accumbens, an input to the VP. VP RPEs predicted changes in task engagement, and optogenetic manipulation of the VP during reward delivery bidirectionally altered rats’ subsequent reward-seeking behavior. Our data suggest a pivotal role for the VP in computing teaching signals that influence adaptive reward seeking.
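As a rough illustration of what an outcome history-based RPE looks like, the sketch below applies a Rescorla-Wagner-style delta rule to a sequence of reward outcomes. The function name, the outcome coding (1 for a preferred reward, 0 for a less-preferred one), the initial value, and the learning rate are all assumptions made for illustration; they are not the study's fitted model or parameters.

```python
def outcome_history_rpes(outcomes, alpha=0.5, v0=0.5):
    """Return per-trial (expected value, RPE) pairs from a simple delta rule.

    outcomes: numeric reward outcomes (hypothetical coding, e.g.
              1.0 = preferred reward, 0.0 = less-preferred reward).
    alpha:    learning rate in [0, 1] (illustrative default).
    v0:       expected value before the first trial (illustrative default).
    """
    value = v0
    trace = []
    for r in outcomes:
        rpe = r - value          # prediction error at reward delivery
        trace.append((value, rpe))
        value += alpha * rpe     # expectation moves toward recent outcomes
    return trace
```

Calling `outcome_history_rpes([1.0, 1.0, 0.0, 1.0])` shows the basic pattern: repeated preferred rewards shrink the positive RPE, and a less-preferred reward after a run of preferred ones yields a large negative RPE.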
This work was supported by the National Institutes of Health (grant nos. 5T32NS91018-17 (to D.J.O.), F30MH110084 (to B.A.B.), K99AA025384 (to J.M.R.), R01DA042038 and R01NS104834 (to J.Y.C.), and R01DA035943 (to P.H.J.)), by Klingenstein-Simons, MQ, NARSAD, and Whitehall (to J.Y.C.), by a NARSAD Young Investigator Award (to J.M.R.) and by the National Science Foundation Graduate Research Fellowship (grant no. DGE1746891 to D.J.O.). We thank K. Wang and X. Tong for technical assistance.
The authors declare no competing interests.
Extended Data Fig. 1 Placements for random sucrose/maltodextrin, random sucrose/maltodextrin/water, and blocked sucrose/maltodextrin rats.
Recording locations for nucleus accumbens (left) and ventral pallidum (right) rats.
(a) Distribution of the learning rate, α, for RPE neurons in VP (green) and NAc (orange). (b) Likelihood (LH) per trial for RPE and Current outcome neurons (n = 72 RPE and 126 Current outcome neurons from 5 rats) for RPE and Current outcome models, relative to the LH per trial of the Unmodulated model. Lower (more negative) indicates a better fit. Line represents median, box represents 25th and 75th percentiles, and whiskers extend to 1.5 times the interquartile range. Red highlights the AIC-selected model. Median [25th to 75th percentile; min to max] ∆LH/trial are: RPE neurons, RPE model −0.21 [−0.39 to −0.14; −3.16 to −0.05]; RPE neurons, Current outcome model −0.15 [−0.32 to −0.09; −3.03 to −0.02]; Current outcome neurons, RPE model −0.12 [−0.23 to −0.07; −0.174 to −0.03]; Current outcome neurons, Current outcome model −0.12 [−0.22 to −0.07; −1.73 to −0.03]. Median [25th to 75th percentile] LH per trial for RPE neurons was 2.29 [2.04 to 2.49] and for Current outcome neurons was 2.15 [1.92 to 2.37]. (c) Model recovery, plotted as the fraction of neurons simulated with each model recovered as that model. (d) Distribution of difference between the true value of the parameters used to simulate the neurons in (c) and the values recovered by MLE.
(a) Expression of ArchT3.0:YFP and fiber tip placement for the rats included in the ArchT3.0 group for the optogenetic experiment in Fig. 3. (b) Expression of ChR2:GFP and fiber tip placement for the rats included in the ChR2 group. Pattern of results remained unchanged with or without inclusion of the rat with the most caudal placement.
(a) Mean(+/−SEM) port occupancy in time surrounding reward delivery on laser and no laser trials for YFP (left, n = 7 rats) and ArchT (right, n = 7 rats) groups. (b) Mean(+/−SEM) port occupancy in time surrounding reward delivery on laser and no laser trials for GFP (left, n = 7 rats) and ChR2 (right, n = 11 rats) groups. To account for the disruption of port occupancy by laser stimulation, we ran our distance from port analysis on the time beyond 15 s past reward delivery and found the same pattern of results. (c) Additional optogenetic experiment in ChR2 rats and controls where the 2 s of laser stimulation was at the onset of the cue. (d) Mean(+/−SEM) distance from port in the ITI following laser stimulation did not differ from no laser trials for GFP (p = 0.94, Wilcoxon signed-rank test, two-sided, n = 7 rats) or ChR2 (p = 0.11, Wilcoxon signed-rank test, two-sided, n = 10 rats) groups. (e) The effect of laser was similar across both groups (median: 0.06 GFP, n = 7 rats; −0.09 ChR2, n = 10 rats; p = 0.36, Wilcoxon rank-sum test, two-sided).
Extended Data Fig. 5 Value encoding in VP at the time of cue onset in the random sucrose/maltodextrin task.
(a) Schematic of model-fitting and neuron classification process. For each neuron, the reward outcome and spike count following reward delivery on each trial were used to fit two models: Value and Unmodulated. Akaike information criterion (AIC) was used to select the best model (right). (b) Mean(+/−SEM) activity of neurons best fit by each of the models, plotted according to previous outcome (n = 39 Value and 397 Unmodulated neurons from 5 rats). (c) Coefficients(+/−SE) for outcome history linear regression for each class of neurons (n = 39 Value and 397 Unmodulated neurons). (d) Mean(+/−SEM) activity of all Value neurons with trials binned by model-derived Value. (e) Mean(+/−SEM) population activity of simulated and actual Value neurons according to each trial’s Value (V). (f) Model recovery, plotted as the fraction of neurons simulated with each model recovered as that model.
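The fit-then-select procedure in panel (a) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes Poisson-distributed spike counts, a rate that is linear in a delta-rule value estimate (floored just above zero), and a coarse grid search in place of a proper optimizer. All function names, the grid ranges, and the parameter counts (1 for Unmodulated, 3 for Value) are hypothetical.

```python
import math


def poisson_nll(counts, rates):
    """Negative log-likelihood of spike counts under per-trial Poisson rates."""
    return -sum(k * math.log(lam) - lam - math.lgamma(k + 1)
                for k, lam in zip(counts, rates))


def values_from_history(outcomes, alpha, v0=0.5):
    """Expected value before each trial, updated by a delta rule (assumed form)."""
    v, vals = v0, []
    for r in outcomes:
        vals.append(v)
        v += alpha * (r - v)
    return vals


def fit_unmodulated(counts):
    """One free parameter: a constant rate, whose MLE is the mean count."""
    rate = max(sum(counts) / len(counts), 1e-9)
    return poisson_nll(counts, [rate] * len(counts)), 1


def fit_value(counts, outcomes, steps=21):
    """Three free parameters (alpha, slope, baseline), coarse grid search."""
    mean = sum(counts) / len(counts)
    best_nll = math.inf
    for a in range(11):                       # alpha = 0.0, 0.1, ..., 1.0
        vals = values_from_history(outcomes, a / 10)
        for s in range(steps):                # slope in [-2*mean, 2*mean]
            slope = (-2 + 4 * s / (steps - 1)) * mean
            for b in range(steps):            # baseline in [0, 2*mean]
                base = 2 * mean * b / (steps - 1)
                rates = [max(base + slope * v, 1e-9) for v in vals]
                best_nll = min(best_nll, poisson_nll(counts, rates))
    return best_nll, 3


def aic(nll, n_params):
    """Akaike information criterion; lower is better."""
    return 2 * n_params + 2 * nll


def select_model(counts, outcomes):
    """Return the name of the lowest-AIC model and all AIC scores."""
    fits = {"Unmodulated": fit_unmodulated(counts),
            "Value": fit_value(counts, outcomes)}
    scores = {name: aic(nll, k) for name, (nll, k) in fits.items()}
    return min(scores, key=scores.get), scores
```

Here `select_model(counts, outcomes)` takes one neuron's per-trial spike counts and the matching outcome sequence. A neuron whose counts track the value estimate should come out as `"Value"`, while a neuron with flat counts should come out as `"Unmodulated"` because the extra parameters are penalized without improving the likelihood.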
Extended Data Fig. 6 Value encoding at the time of cue onset in the random sucrose/maltodextrin/water task.
(a) Fraction of VP neurons best fit by the Value and Unmodulated models in the random sucrose/maltodextrin/water task. (b) Mean(+/−SEM) activity of neurons best fit by each of the models, plotted according to previous outcome (n = 38 Value and 216 Unmodulated neurons from 3 rats). (c) Coefficients(+/−SE) for outcome history linear regression for each class of neurons (n = 38 Value and 216 Unmodulated neurons). (d) Mean(+/−SEM) population activity of simulated and actual Value neurons according to each trial’s Value (V). (e) Mean(+/−SEM) activity of all Value neurons with trials binned by model-derived Value. (f) Distribution of correlations between individual VP neurons’ firing rates at cue onset on each trial and the distance from the port during the previous ITI. * = p = 0.00001 for negative shift in mean correlation coefficient (vertical line) compared to 1000 shuffles of data for Value neurons, Wilcoxon signed-rank test, two-sided, as well as p = 0.0000002 for more negative coefficients for Value neurons compared to Unmodulated neurons, Wilcoxon rank-sum test, two-sided. See also Fig. 4c,d.
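The shuffle comparison in panel (f) rests on a null distribution of correlations obtained by breaking the trial-by-trial pairing. The sketch below builds such a null for a single neuron. It is a simplified illustration only: the caption reports a Wilcoxon signed-rank test on the population of coefficients, which is not reproduced here, and the function names, the use of Pearson correlation, and the seed are assumptions.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def shuffled_correlations(firing, distance, n_shuffles=1000, seed=0):
    """Correlations after repeatedly shuffling which distance goes with which trial.

    firing:   hypothetical per-trial firing rates at cue onset.
    distance: hypothetical per-trial distance from the port in the previous ITI.
    """
    rng = random.Random(seed)
    shuffled = list(distance)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)          # destroys the trial-by-trial pairing
        null.append(pearson_r(firing, shuffled))
    return null
```

The observed coefficient from `pearson_r(firing, distance)` can then be compared with the spread of `shuffled_correlations(firing, distance)`, which should be centered near zero.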
Recording locations for rats from predictable and random sucrose/maltodextrin experiment in Extended Data Fig. 8.
(a) Task schematic: three auditory cues indicated three trial types. (b) Median latency to enter reward port following onset of cue for each trial type, plotted as the mean(+/−SEM) across all sessions for each rat (gray lines, n = 8, 9, 10, and 10 sessions for the 4 rats) and the overall mean(+/−SEM) (n = 37 sessions). (c) Percentage sucrose of total solution consumption in a two-bottle choice, before (‘Initial’) and after (‘Final’) recording (n = 4 rats). (d) Mean(+/−SEM) lick rate relative to reward delivery for each trial type (n = 37 sessions from 4 rats). (e) Mean(+/−SEM) activity of all neurons recorded in the predictable and random sucrose/maltodextrin task, aligned to reward delivery (n = 487 neurons from 4 rats). (f) Schematic of cue model-fitting. The best model (of 6 total) was selected with Akaike information criterion. (g) Fraction of the population best fit by each model. (h) Coefficients(+/−SE) for outcome history regression for each class of neurons with no cue effect (n = 38 RPE, 135 Current outcome, and 204 Unmodulated neurons). (i) Mean(+/−SEM) activity of all RPE neurons with no cue effect (n = 38 neurons). The trials for each neuron are binned according to their model-derived RPE. (j) Population activity of simulated and actual VP RPE neurons with no cue effect according to each trial’s RPE value. (k) Scatterplot of each cue effect neuron’s weight for specific sucrose and maltodextrin cues (n = 7 RPE, 33 Current outcome, and 70 Unmodulated cells with cue effects). The percentage of neurons falling in each quadrant is indicated. The percentage in our quadrant of interest (positive value for sucrose and negative value for maltodextrin) did not differ from chance (p = 0.1 for exact binomial test compared to null of 25%). (l) Mean(+/−SEM) activity of neurons with sucrose values > 0 and maltodextrin values < 0, consistent with a value-based cued expectation modulation. (m) Neurons with cue effects for cue-evoked signaling, rather than reward-evoked signaling, as in (g). (n) As in (k), for activity at the time of the cue rather than time of reward (n = 143 neurons with cue effects). * = p = 0.00001 for exact binomial test compared to null of 25%. (o) As in (l), for activity at the time of the cue rather than time of reward.
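The exact binomial tests in panels (k) and (n) ask whether the share of neurons in one quadrant departs from the 25% expected if quadrants were equally likely. A minimal sketch of such a test follows. The two-sided definition used here (summing all outcomes no more probable than the observed one) is one common convention and is an assumption, since the caption does not state sidedness; the function names are hypothetical.

```python
import math


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def exact_binomial_test(k, n, p=0.25):
    """Two-sided exact p-value for k of n neurons falling in the quadrant.

    Sums the probabilities of every outcome that is no more likely than
    the observed one under the null rate p (assumed convention).
    """
    observed = binom_pmf(k, n, p)
    total = sum(pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
                if pmf <= observed * (1 + 1e-7))   # tolerance for float ties
    return min(1.0, total)
```

With this helper, `exact_binomial_test(k, n)` would be called with the number of neurons in the quadrant of interest as `k` and the number of neurons with cue effects as `n`.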
(a) Fraction of neurons classified as RPE, Current outcome, and Unmodulated in VP and NAc in the random sucrose/maltodextrin task using Bayesian information criterion (BIC) as the selection criterion. (b) Coefficients(+/−SE) for outcome history regression for VP neurons of each BIC subset (n = 37 RPE, 110 Current outcome, and 289 Unmodulated cells from 5 rats). (c) Population mean(+/−SEM) of all VP BIC RPE neurons, binned according to the model-derived RPE. (d) Mean(+/−SEM) population activity of simulated and actual BIC RPE neurons according to each trial’s RPE value for VP (left) and NAc (right). (e) Distribution of correlations between model-predicted and actual spiking for all RPE neurons from each region. (f) Distribution of α for RPE neurons in VP (green) and NAc (orange). (g) Mean(+/−SEM) activity of VP neurons classified as RPE by AIC but not BIC according to current and previous outcome (n = 35 neurons). (h) Coefficients(+/−SE) for outcome history regression for these neurons. (i) Mean(+/−SEM) activity of these neurons binned according to model-derived RPE on each trial.
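For reference, the two selection criteria compared in this figure have the standard forms below, where $k$ is the number of free parameters, $\hat{L}$ is the maximized likelihood, and $n$ is the number of observations (assumed here to be the number of trials per neuron, which the caption does not state).

```latex
\begin{aligned}
\mathrm{AIC} &= 2k - 2\ln\hat{L}, \\
\mathrm{BIC} &= k\ln n - 2\ln\hat{L}.
\end{aligned}
```

Because $\ln n > 2$ whenever $n \geq 8$, BIC penalizes each extra parameter more heavily than AIC for any realistic trial count. That is consistent with fewer VP neurons being classified as RPE under BIC (37) than under AIC (72), with the remaining 35 examined in panels (g) to (i).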
Ottenheimer, D.J., Bari, B.A., Sutlief, E. et al. A quantitative reward prediction error signal in the ventral pallidum. Nat Neurosci 23, 1267–1276 (2020). https://doi.org/10.1038/s41593-020-0688-5