The lateral habenula (LHb) is believed to convey an aversive or 'anti-reward' signal, but its contribution to reward-related action selection is unknown. We found that LHb inactivation abolished choice biases, making rats indifferent when choosing between rewards associated with different subjective costs and magnitudes, but not larger or smaller rewards of equal cost. Thus, instead of serving as an aversion center, the evolutionarily conserved LHb acts as a preference center that is integral for expressing subjective decision biases.
When choosing between rewards that differ in terms of their relative value, subjective impressions of which option may be 'better' can be colored by certain costs (for example, effort, delays and uncertainty) that diminish the subjective value of objectively larger rewards. Decisions of this kind are facilitated by different nodes in the mesocorticolimbic dopamine system1. Recent studies have highlighted the LHb as a critical nucleus in this circuitry that acts as a brake on dopamine activity via disynaptic pathways through the rostromedial tegmental nucleus (RMTg)2,3,4. LHb neurons encode negative reward prediction errors opposite of dopamine neurons, exhibiting increased phasic firing in expectation of, or after, aversive events (for example, punishments, omission of expected rewards) and reduced firing after positive outcomes. LHb stimulation promotes conditioned avoidance and reduces reward-related responding, suggesting that this nucleus conveys an anti-reward or aversive signal5,6,7,8,9. Yet LHb neurons also encode rewards of dissimilar magnitude, displaying phasic increases and decreases in firing in anticipation, or after receipt, of smaller and larger rewards9. This differential reward encoding may aid in biasing decisions toward or away from subjectively superior or inferior rewards. However, the manner in which LHb signals may influence decision biases and volitional choice behavior is unknown.
We investigated the contribution of the LHb to different forms of cost and benefit decision-making mediated by dopamine circuitry10,11. We initially employed a probabilistic discounting task (a measure of risk-based decision-making), requiring rats to choose between a small, certain reward (1 pellet) and a large, risky reward. During daily training sessions, the probability of obtaining a larger four-pellet reward changed in a systematic manner over blocks of discrete-choice trials (100–12.5% or 12.5–100%). After ∼25 training days, rats (n = 16) displayed appropriate shifts in their decision biases, playing riskier during the higher probability blocks (100–50%) and safer when the odds were poorer (25–12.5%). This was apparent after control treatments in the LHb (Fig. 1a).
On a separate day, rats received infusions of GABAA/B agonists to inactivate the LHb. Given that phasic LHb firing encodes an anti-reward or disappointment signal after reward omissions6,7, LHb stimulation reduces responding for reward8 and LHb inactivation increases mesolimbic dopamine release12, a parsimonious expectation would be that this manipulation should increase responding for the larger reward. In fact, we observed an effect that was much more substantial. LHb inactivation completely abolished any discernible choice bias, inducing random patterns of responding that, when averaged across subjects, yielded a choice profile reflective of rats selecting both options with equal frequency, with choice behavior not differing from chance (50%) (F3,45 = 6.69, P = 0.001; Fig. 1a and Supplementary Fig. 1). This effect was apparent irrespective of whether reward probabilities decreased (n = 9) or increased (n = 7) over time (Fig. 1b,c). LHb inactivation also increased hesitation to make a choice (control latency = 0.7 ± 0.1 s, inactivation = 1.3 ± 0.2 s; t15 = 2.53, P = 0.02) and the number of trials in which no choices were made (control = 0.9 ± 0.5, inactivation = 6.45 ± 1.5; t15 = 3.22, P = 0.006). Moreover, this shift to indifference was apparent during periods in which subjects showed a prominent bias for either the large and risky or the small and certain option (Supplementary Fig. 2). We also observed an identical effect in a separate group (n = 7) trained on a simpler task in which the odds of obtaining a larger reward remained constant over trials (40%; Fig. 1d), indicating that promotion of choice biases by the LHb is not restricted to situations where reward probabilities are volatile, unlike other nodes in dopamine decision circuitry13.
Notably, the effects of LHb inactivation were neuroanatomically specific, as infusion 1 mm dorsal (hippocampus, n = 8) or ventral (thalamus, n = 5) to the LHb or near the ventricle (n = 8) had no effect on decision-making compared with control conditions (Fig. 1e and Supplementary Fig. 3). Thus, disruption of decision biases induced by inactivation treatments was attributable to suppression of neural activity circumscribed to the LHb, but not adjacent regions.
In addition to sending direct projections to midbrain dopamine neurons that promote aversive behaviors5, the LHb projects to the RMTg (which in turn regulates dopamine neuron activity) and to dorsal raphe serotonergic neurons3. To clarify which of these projection targets may interact with the LHb to promote probabilistic choice biases, we trained separate groups of rats on the fixed probabilistic choice task. Inactivation of the RMTg induced a choice profile resembling indifference, similar to LHb inactivation (t5 = 3.17, P = 0.02; Fig. 1f and Supplementary Fig. 4). In contrast, dorsal raphe inactivation had no effect on choice (t3 = 0.01, P = 1.0; Fig. 1g). Thus, modification of probabilistic choice biases by the LHb is mediated primarily via projections to the RMTg that in turn control dopamine neural activity, but not via serotonergic pathways.
We next investigated whether the LHb is specifically involved in cost/benefit judgments entailing reward uncertainty or if it has a broader role in promoting biases during other decisions about rewards of different subjective value. To this end, we used a delay-discounting task that required rats (n = 6) to choose either a small reward that was delivered immediately or a large reward that was delayed. Here, the subject is always guaranteed the larger reward, but delaying reward delivery after choice (0–45 s) diminishes its subjective value and shifts bias toward the small, immediate reward (Fig. 1h). In parallel with probabilistic discounting, LHb inactivation abolished delay discounting (F3,15 = 3.99, P = 0.03; Fig. 1h), as choice shifted to a point of indifference (t5 versus 50% = 0.37, P = 0.72). Thus, the LHb appears to have a fundamental role in promoting biases in situations requiring choice between rewards that differ in their subjective value.
A separate group (n = 5) was trained on a reward magnitude discrimination, choosing between four and one reward pellet, both delivered with 100% certainty. LHb inactivation did not alter preference for the larger reward, which, in this instance, clearly had greater objective value (F1,4 = 2.98, P = 0.16; Fig. 1i). Choice latencies (t4 = 0.35, P = 0.74) and trial omissions (t4 = 0.76, P = 0.49) were also unaffected (Supplementary Table 1). Thus, it is unlikely that the disruptions induced by LHb inactivation on cost/benefit decision-making can be attributed to motivational or discrimination deficits. Instead, the LHb contributes selectively to evaluation of rewards that differ in terms of their relative costs and subjective values, but not to simpler preferences for larger and smaller rewards of equal cost, similar to other nodes in dopamine decision circuitry11.
The fact that LHb inactivation reduced preference for the larger reward during the 100% or 0-s delay blocks of the discounting tasks, but not on the reward magnitude discrimination, likely reflects differences in the relative value representation of the larger versus smaller rewards that emerge after experience with these two types of tasks. This notion was supported by an analysis of forced-choice response latencies on the larger and smaller reward levers (Supplementary Fig. 5). Rats tested on the magnitude task showed greater response latencies when forced to select the smaller versus larger reward after both treatments. In contrast, for the discounting tasks, this difference during the forced-choice trials in the 100% or 0-s blocks was substantially muted or nonexistent. Thus, rats trained on the magnitude discrimination viewed the one-pellet option as being substantially inferior to the four-pellet option, whereas this discrepancy was not as apparent for those in the discounting tasks, similar to previous findings11. This may explain why preference for the larger reward during the no-cost blocks of the discounting task may have been more susceptible to disruption following LHb inactivation.
It could be argued that the lack of effect of LHb inactivation on reward magnitude discrimination is a result of rats responding in a habitual manner on this task relative to the discounting tasks, which were clearly goal-directed. To address this, we conducted a subsequent behavioral experiment to determine whether choice behavior under these conditions is sensitive to reinforcer or response devaluation. A separate group of rats that were well-trained on the reward magnitude discrimination was given a reinforcer devaluation test (Online Methods) that caused a marked reduction in preference for the larger reward (Supplementary Fig. 6a–e). An additional response devaluation test was conducted 2 d later, during which responses on the large reward lever did not provide reward. This manipulation also decreased preference for the lever formerly associated with the larger reward (Supplementary Fig. 6f). Thus, because choice on the reward magnitude task was altered following devaluation of either the reinforcer or the response contingency, this suggests that the rats maintained a representation of the relative value of the two options and were responding in a goal-directed, as opposed to a habitual, manner. As such, the lack of effect of LHb inactivation on this task, combined with the marked disruption in decision-making on the discounting tasks, renders it unlikely that these differential effects are attributable to differences in the contribution of the LHb to goal-directed versus habitual behavior. Instead, these data add further support to the notion that the LHb has a selective role in promoting choice biases in situations in which larger rewards are tainted by some form of cost, but not in expressing a more general preference for larger or smaller rewards.
These findings reveal a previously uncharacterized role for the LHb in reward-related processing, suggesting that it is critical for promoting choice biases during evaluation of the subjective costs and relative benefits associated with different actions. Disruption of LHb signal outflow rendered rats unable to display any sort of preference toward larger, costly rewards or smaller, cheaper ones. Instead, the rats behaved as if they had no idea which option might be better for them, defaulting to an inherently unbiased and random pattern of choice, but only when the relative value of the larger reward was tainted by some sort of cost (uncertainty or delays).
LHb stimulation induces avoidance behaviors and suppresses reward-related responding, and phasic increases in LHb neural firing encode aversive or disappointing events6,7. As such, an emerging consensus is that the LHb conveys some form of aversive or anti-reward signal5,8. Our findings call for a refinement of this view. Indeed, suppression of LHb activity did not enhance responding for larger rewards, but instead disrupted expression of a subjective preference for rewards of different value. In this regard, it is important to note that LHb neurons encode both aversive and rewarding situations via dynamic and opposing changes in activity. Thus, although phasic increases in firing encode aversive and non-rewarded expectations or events, or smaller rewards, these fast-firing LHb neurons also show reduced activity in response to rewarding stimuli6,9,14. Our findings indicate that suppressing these differential signals, encoding expectation or occurrence of negative or positive events, renders a decision-maker incapable of determining which option may be better. As such, it is apparent that the LHb does not merely serve as a disappointment or anti-reward center; rather, more properly, this nucleus may be viewed as a 'preference' center, whereby integration of differential LHb reward and aversion signals sets a tone that is crucial for expression of preferences for one course of action over another. Expression of these subjective preferences is likely achieved through subsequent integration of these dynamic signals by regions downstream of LHb, including the RMTg and midbrain dopamine neurons15,16. Indeed, the LHb exerts robust control over the firing of dopamine neurons17, and, similar to the LHb, mesolimbic dopamine circuitry has a preferential role in biasing choice toward larger, costly rewards rather than larger or smaller rewards of equal cost11,18. 
Collectively, these findings suggest that the LHb, working in collaboration with other nodes of dopamine decision circuitry, has a fundamental role in helping an organism make up its mind when faced with ambiguous decisions regarding the cost and benefits of different actions. Activity in this evolutionarily conserved nucleus aids in biasing behavior from a point of indifference toward committing to choices that may yield outcomes perceived as more beneficial. Further exploration of how the LHb facilitates these functions may provide insight to the pathophysiology underlying psychiatric disorders associated with aberrant reward processing and LHb dysfunction, such as depression19,20.
Experimental subjects and apparatus.
Experimentally naive, male Long Evans rats (Charles River Laboratories) weighing 250–300 g (60–70 d old) at the start of the experiment were single-housed and given access to food and water ad libitum. The colony was maintained on a 12-h light/dark cycle, with lights turned on at 7:00 a.m. Rats were food restricted to no more than 85–90% of free-feeding weight beginning 1 week before training. Feeding occurred in the rats' home cages at the end of the experimental day and body weights were monitored daily. Rats were trained and tested between 9:00 a.m. and 5:00 p.m. Individual rats were trained and tested at a consistent time each day. All testing was in accordance with the Canadian Council on Animal Care. Testing occurred in operant chambers (Med Associates) that were fitted with two retractable levers on either side of central food receptacles where reinforcement (45-mg pellets, Bioserv) was delivered by a dispenser, as described previously13. No statistical methods were used to predetermine sample sizes.
Rats were initially trained to press retractable levers within 10 s of their insertion into the chamber over a period of 5–7 d (refs. 10,11), after which they were trained on one of four decision-making tasks.
Risk-based decision making was assessed with a probabilistic discounting task described previously10,11. Rats received daily training sessions 6–7 d per week, consisting of 72 trials, separated into 4 blocks of 18 trials. Each 48-min session began in darkness with both levers retracted (the intertrial state). Trials began every 40 s with house light illumination and, 3 s later, insertion of one or both levers. One lever was designated the large, risky lever and the other the small, certain lever, which remained consistent throughout training (counterbalanced left/right). No response within 10 s of lever insertion reset the chamber to the intertrial state until the next trial (omission). Any choice retracted both levers. Choice of the small, certain lever always delivered one pellet with 100% probability; choice of the large, risky lever delivered four pellets with a probability that changed across the four trial blocks. Blocks were comprised of eight forced-choice trials (four trials for each lever, randomized in pairs), followed by ten free-choice trials, where both levers were presented. The probability of obtaining four pellets after selecting the large, risky option varied across blocks. Separate groups of rats were trained on variants where reward probabilities systematically decreased (100, 50, 25, 12.5%) or increased (12.5, 25, 50, 100%) across blocks. For each forced and free-choice trial in a particular block, the probability of receiving the large reward was drawn from a random number–generating function (Med-PC) with a set probability distribution (100, 50, 25 or 12.5%). Thus, on any given session, the probabilities in each block may have varied, but, on average across training days, the actual probability experienced by the rat approximated the set value in a block. Latencies to choose were also recorded. 
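The block structure described above can be sketched in code. This is a minimal illustration under stated assumptions, not the Med-PC program used in the study: all function and variable names are hypothetical, and "randomized in pairs" is interpreted here as each consecutive pair of forced-choice trials containing one trial per lever in random order.

```python
import random

# Values taken from the task description; all names are illustrative.
DESCENDING_PROBS = (1.0, 0.5, 0.25, 0.125)  # large-reward probability per block
TRIAL_INTERVAL_S = 40                        # a trial begins every 40 s
FORCED_PER_LEVER = 4                         # forced-choice trials per lever per block
FREE_PER_BLOCK = 10                          # free-choice trials per block
LEVERS = ("large_risky", "small_certain")


def build_session(block_probs, rng):
    """Return the ordered list of trials for one session."""
    trials = []
    for block, p_large in enumerate(block_probs):
        # Assumption: each forced-choice pair holds one trial per lever,
        # presented in random order.
        for _ in range(FORCED_PER_LEVER):
            pair = list(LEVERS)
            rng.shuffle(pair)
            for lever in pair:
                trials.append({"block": block, "kind": "forced",
                               "levers": (lever,), "p_large": p_large})
        # Free-choice trials present both levers.
        for _ in range(FREE_PER_BLOCK):
            trials.append({"block": block, "kind": "free",
                           "levers": LEVERS, "p_large": p_large})
    return trials


def pellets_for_choice(lever, p_large, rng):
    """Pellets delivered: always 1 for the certain lever, 4 or 0 for the risky one."""
    if lever == "small_certain":
        return 1
    # Independent draw on every trial, so the experienced probability in a
    # given session can differ from the nominal block value.
    return 4 if rng.random() < p_large else 0


session = build_session(DESCENDING_PROBS, random.Random(0))
session_minutes = len(session) * TRIAL_INTERVAL_S / 60
```

With these values the sketch yields 72 trials and a 48-min session, consistent with the description above. Passing the probabilities in reverse order gives the ascending variant.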
Rats were trained until, as a group, they chose the large, risky lever during the 100% probability block on ∼90% of trials and demonstrated stable baseline levels of choice, assessed using an ANOVA as described previously10,11. Data from three consecutive sessions were analyzed with a two-way repeated-measures ANOVA with day and trial block as factors. If there was no main effect of day or day × trial block interaction (at the P > 0.1 level), performance of the group was deemed stable.
Probabilistic choice with fixed reward probabilities.
Training on this task was very similar to the probabilistic discounting task, except that the probability of obtaining the larger four-pellet reward was set at 40%, and remained constant over one block of 20 free-choice trials that were preceded by 20 forced-choice trials. Data from rats that displayed a preference for the large, risky reward were used in the analysis.
The delay-discounting task resembled the probabilistic discounting tasks in a number of respects, but with some key differences. Daily sessions consisted of 48 trials, separated into 4 blocks of 12 trials (2 forced- followed by 10 free-choice trials per block, 56-min session). Trials began every 70 s with house light illumination and insertion of one or both levers. One lever was designated the small, immediate lever, which, when pressed, always delivered one pellet immediately. Selection of the other, large, delayed lever delivered four pellets after a delay that increased systematically over the four blocks: it was initially 0 s, then 15, 30 and 45 s. No explicit cues were presented during the delay period; the house light was extinguished, and then re-illuminated following reward delivery.
Reward magnitude discrimination.
This task was used to determine whether the reduced preference for larger, costly rewards was due to a general reduction in preference for larger rewards or to some other non-specific motivational or discrimination deficit. Rats were trained and tested on a task consisting of 48 trials divided into 4 blocks, each consisting of 2 forced- and 10 free-choice trials. As with the discounting tasks, choices were between a large, 4-pellet reward and a smaller, 1-pellet reward, both of which were delivered immediately with 100% certainty after a choice.
A separate behavioral experiment was conducted in intact animals to assess whether performance during the reward magnitude discrimination was under habitual or goal-directed control. A separate group of rats was trained for 9 d on a reward magnitude discrimination in an identical manner to those that received LHb inactivation. On day 10 of training, rats received a reinforcer devaluation test. 1 h before the test session, rats received ad libitum access to the sweetened reward pellets in their home cages. If responding on this task had become habitual, the prediction would be that reinforcer devaluation by pre-feeding should not influence performance during the test. Conversely, if choice was goal-directed, the bias toward the large reward should be diminished during this test.
Following the sucrose devaluation test, rats were retrained for two additional days on the task under standard food restriction, after which they again were selecting the large reward on nearly every free-choice trial. On the following day, rats received a response devaluation test during which responding on the large reward lever no longer delivered reward (although selecting the other lever still yielded one reward pellet).
Surgery and microinfusion protocol.
Rats were trained on the discounting tasks until they displayed stable levels of choice (20–25 d), after which they were fed ad libitum for 1–3 d and subjected to surgery. Those trained on the other tasks were implanted before training. Rats were anesthetized with 100 mg/kg ketamine and 7 mg/kg xylazine and implanted with bilateral 23-gauge stainless-steel cannulae aimed at the LHb (flat skull: anteroposterior = −3.8 mm; mediolateral = ±0.8 mm; dorsoventral = −4.5 mm from dura). Separate anatomical control groups were implanted with cannulae at sites either 0.5–1.0 mm dorsal or 1 mm ventral to the LHb site. Separate groups of rats to be trained on the fixed probabilistic choice task were implanted with bilateral cannulae in the RMTg (flat skull at 10° laterally: anteroposterior = −6.8 mm; mediolateral = ±0.7 mm; dorsoventral = −7.4 mm) or a unilateral cannula in the dorsal raphe (flat skull with cannula at 20° laterally: anteroposterior = −7.6 mm; mediolateral = 0.0; dorsoventral = −5.2 mm). Cannulae were held in place with stainless steel screws and dental acrylic and plugged with obturators that remained in place until the infusions were made. Rats were given ∼7 d to recover from surgery before testing, during which they were again food restricted.
Training was re-initiated on the respective task for at least 5 d until the group displayed stable levels of choice behavior for three consecutive days. One to two days before the first microinfusion test day, obturators were removed, and a mock infusion procedure was conducted. The day after displaying stable discounting, the group received its first microinfusion test day. A within-subjects design was used for all experiments. Reversible inactivation of the LHb was achieved by infusion of a combination of the GABA agonists baclofen and muscimol using procedures described previously11 (50 ng each in 0.2 μl, delivered over 45 s). Injection cannulae were left in place for 1 min to allow diffusion. Rats remained in their cage for an additional 10-min period before behavioral testing.
On the first infusion test day, half of the rats in each group received control treatments (saline); the remaining received baclofen/muscimol. The next day, rats received a baseline training day (no infusion). If, for any individual rat, choice of the large, risky lever deviated by >15% from its pre-infusion baseline, the rat received an additional day of training before the next test. On the following day, the second counterbalanced infusion was given.
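The counterbalancing and retraining rule above can be expressed as a short sketch. The helper names are hypothetical, and two points are assumptions rather than details given in the text: rats are split into halves by list position (the method of allocation is not stated), and the ">15%" deviation is read as percentage points of large, risky choice.

```python
def counterbalanced_orders(rat_ids):
    """Assign each rat a treatment order so half receive saline first."""
    half = len(rat_ids) // 2
    orders = {}
    for index, rat in enumerate(rat_ids):
        if index < half:
            orders[rat] = ("saline", "baclofen/muscimol")
        else:
            orders[rat] = ("baclofen/muscimol", "saline")
    return orders


def needs_extra_training(baseline_pct, retraining_pct, threshold_pct=15.0):
    """True if choice on the retraining day deviates by more than the threshold.

    Assumption: the deviation is measured in percentage points of choices
    of the large, risky lever relative to the pre-infusion baseline.
    """
    return abs(retraining_pct - baseline_pct) > threshold_pct


orders = counterbalanced_orders(["rat1", "rat2", "rat3", "rat4"])
```

Under this reading, a rat with an 80% baseline that chose the risky lever on 60% of trials during the intervening training day would receive one more training day before its second infusion.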
Rats were killed, and their brains were removed and fixed in 4% formalin (vol/vol) for ≥24 h, frozen, sliced in 50-μm sections and stained with cresyl violet. Placements were verified with reference to an atlas21. Based on previous autoradiographical, metabolic, neurophysiological and behavioral measures22,23,24,25, the effective functional spread of inactivation induced by 0.2-μl infusions of 50 ng of GABA agonists would be expected to be between 0.5 and 1 mm in radius from the center of the infusion. Placements were deemed to be in the LHb only if the majority of the gliosis from the infusions resided in the clearly defined anatomical boundaries of this nucleus. Alternatively, rats whose placements resided outside this region, either because of direct targeting or from missed placements, were allocated to separate dorsal (hippocampus), medial (third ventricle) or ventral (thalamic) neuroanatomical control groups, which resided beyond the estimated effective functional spread of our inactivation treatments. Data from these groups were analyzed separately.
The primary dependent measure of interest was the proportion of choices directed toward the large reward lever (that is, large and risky or large and delayed) for each block of free-choice trials, factoring in trial omissions. For each block, this was calculated by dividing the number of choices of the large reward lever by the total number of successful trials. For the probabilistic discounting experiment, choice data were analyzed using three-way, between/within-subjects ANOVAs, with treatment and probability block as two within-subjects factors and task variant (that is, reward probabilities decreasing or increasing over blocks) as a between-subjects factor. Thus, in this analysis, the proportion of choices of the large, risky option across the four levels of trial block was analyzed irrespective of the order in which they were presented. For the delay discounting and reward magnitude experiments, choice data were analyzed with a two-way repeated-measures ANOVA, with treatment and trial block as factors. Choice data from fixed probability experiments were analyzed with paired-sample two-tailed t tests. Response latencies (the time elapsed between lever insertion and subsequent choice) and the number of trial omissions (that is, trials where rats did not respond within 10 s) were likewise analyzed with paired-sample two-tailed t tests. Data distribution was assumed to be normal, but this was not formally tested. The use of automated operant procedures eliminated the need for experimenters to be blind to treatment.
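The dependent measure and the paired comparison described above can be sketched as follows. This is an illustration only, not the analysis code used in the study: the function names are hypothetical, and "successful trials" is assumed to mean free-choice trials on which a lever was pressed (that is, omissions are excluded from the denominator).

```python
import math


def proportion_large(large_choices, small_choices):
    """Proportion of completed free-choice trials on the large reward lever.

    Assumption: omitted trials are excluded, so the denominator is the
    number of trials on which either lever was chosen.
    """
    completed = large_choices + small_choices
    if completed == 0:
        return None  # every trial in the block was omitted
    return large_choices / completed


def paired_t(control, treatment):
    """Paired-sample t statistic and degrees of freedom.

    Each position in the two lists must refer to the same subject.
    Raises ZeroDivisionError if all paired differences are identical.
    """
    if len(control) != len(treatment) or len(control) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [t - c for c, t in zip(control, treatment)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    t_stat = mean_diff / math.sqrt(variance / n)
    return t_stat, n - 1


# Example: 6 large and 2 small choices with 2 omissions in a 10-trial block.
example_proportion = proportion_large(6, 2)
```

A two-tailed P value would then be obtained from the t distribution with the returned degrees of freedom, for example using a statistics package.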
Additional analyses were conducted on the latencies to make a response during forced choice trials of the different tasks to explore why LHb inactivation affected choice during the no-cost blocks of the discounting task but not the reward magnitude task. The rationale was that animals trained on the reward magnitude discrimination learn that the relative value of the larger reward is always higher than the smaller reward, while those trained on discounting tasks consistently experienced changes in relative value of the large reward option over a session and learn that the large reward lever is not always the best option available. To provide support for this hypothesis, we analyzed response latencies to select the large and small reward during all of the forced-choice trials for rats trained on the reward magnitude discrimination, and compared them to large and small reward forced-choice latencies displayed by rats performing the discounting tasks during the 100% or 0-s delay (that is, no cost) blocks. If well-trained animals perceived the larger reward as considerably 'better' than the smaller one, they should display faster response latencies when forced to choose the larger versus smaller reward. On the other hand, if the relative value of the two rewards is perceived as more comparable (even in the 100% or 0-s delay blocks), the difference in response latencies when forced to select one option or the other should be diminished.
We are indebted to D. Montes, C. Wiedman, M. Tse, G. Dalton and P. Piantadosi for their outstanding technical support. This work was supported by a grant from the Canadian Institutes of Health Research (MOP 89861) to S.B.F. S.B.F. receives funding from the Michael Smith Foundation for Health Research.
Integrated supplementary information
Supplementary Figures 1–6, Supplementary Table 1