INTRODUCTION

Impairments in cost–benefit decision making requiring evaluations of potential risk and rewards have been associated with an array of psychiatric disorders characterized by dysfunction of mesocorticolimbic dopamine (DA) circuitry. As such, there has been growing interest in clarifying the relationship of the DA system to risk-based decision making in both healthy individuals and clinical populations. Decreases in DA and its metabolites have been observed in cerebrospinal fluid of pathological gamblers, indicative of increased DA transmission (Bergh et al, 1997). In addition, dopaminergic drugs administered to humans alter risk-based decision making. Acute treatment with amphetamine enhances gambling urges in pathological gamblers (Zack and Poulos, 2004), and treatment with DA agonists has been reported to produce pathological gambling tendencies in patients with Parkinson’s disease and Restless Legs Syndrome (Gallagher et al, 2007; Quickfall and Suchowersky, 2007; Dang et al, 2011). Studies using animal models of decision making have further advanced our understanding of how DA and its different receptor subtypes modulate risk-based decision making. For example, administration of amphetamine or of D1 or D2 receptor agonists increases risky choice in rats performing a probabilistic discounting task. Conversely, D1 or D2 antagonists reduce risky choice and block the effects of amphetamine (St Onge and Floresco, 2009). In contrast, D3 receptor stimulation reduced preference for larger, risky rewards, whereas blockade of these receptors alone was without effect, as were manipulations of D4 receptors.

Recent work in our laboratory has begun to investigate the specific terminal regions through which DA may exert its effects on decision making. In a recent study, St Onge et al (2011) reported differential effects of D1 and D2 receptor manipulations in the medial prefrontal cortex on probabilistic choice. D1 receptor blockade induced risk aversion by increasing negative-feedback sensitivity, whereas D2 blockade increased risky choice. The nucleus accumbens (NAc) is another critical efferent target of midbrain DA neurons that has been implicated in reward and reinforcement learning processes. Neuroimaging data indicate that in the absence of choice, the NAc is preferentially activated by cues predicting financial gains compared with losses (Knutson et al, 2001a, 2001b). On risk-taking tasks, NAc activation precedes risky choice (Kuhnen and Knutson, 2005; Matthews et al, 2004) or anticipation of larger or more preferred rewards (Knutson et al, 2007), independent of cost. Cools et al (2007) examined NAc activation during probabilistic learning in Parkinson’s patients on or off L-DOPA medication. They reported that the NAc was activated during reversal learning in patients off medication, but L-DOPA treatment attenuated this effect. Animal studies complement these findings and have further clarified the specific contribution of the NAc to these types of processes. Lesion or inactivation of the NAc in rats disrupts decision making, reducing reward sensitivity and preference for riskier options, particularly when these options have greater long-term utility (Cardinal and Howes, 2005; Stopper and Floresco, 2011).

Although a broader understanding of the functional role of the NAc in decision making is emerging, it is unclear how mesoaccumbens DA may modulate these functions, and which specific receptor mechanisms underlie these actions. NAc D1 and D2 receptors have been shown to differentially contribute to other executive functions regulated by the prefrontal cortex. NAc D1 receptors facilitate complex strategy shifting but not simple reversal learning, whereas blockade of D2 receptors increases response times without affecting performance on these tasks. In contrast, stimulation of NAc D2 receptors induced more global deficits in behavioral flexibility (Haluk and Floresco, 2009). Blockade of D1 or D2 receptors in the NAc has also been reported to impair attentional accuracy (Pezze et al, 2007). In addition, we have recently shown that dynamic fluctuations in NAc DA release during decision making appear to encode integrated signals about reward rates, uncertainty, and choice, reflecting implementation of decision policies (St Onge et al, 2012a). Yet, the manner in which the activity of different DA receptors in the NAc may modify cost–benefit assessments about potential risks and rewards remains to be addressed experimentally. Accordingly, this study was conducted to explore how DA receptors in the NAc modulate risk-based decision making, assessed with a probabilistic discounting procedure. In doing so, we used local administration of different DA receptor-selective agonists and antagonists into the NAc, using compounds that have been shown to alter this aspect of decision making when administered systemically (St Onge and Floresco, 2009).

MATERIALS AND METHODS

Animals

Male Long Evans rats (Charles River Laboratories, Montreal, Canada) weighing 250–300 g at the beginning of training were used. On arrival, rats were given 1 week to acclimatize to the colony and food restricted to 85–90% of their free-feeding weight for 1 week before behavioral training and given ad libitum access to water for the duration of the experiment. Feeding occurred in the rats’ home cages at the end of the experimental day and body weights were monitored daily. All testing was in accordance with the Canadian Council on Animal Care and the Animal Care Committee of the University of British Columbia.

Apparatus

Behavioral testing was conducted in twenty operant chambers (30.5 × 24 × 21 cm; Med Associates, St Albans, VT, USA) enclosed in sound attenuating boxes. The boxes were equipped with a fan that provided ventilation and masked extraneous noise. Each chamber was fitted with two retractable levers, one located on each side of a central food receptacle where food reinforcement (45 mg; Bioserv, Frenchtown, NJ, USA) was delivered by a pellet dispenser. The chambers were illuminated by a single 100-mA house light located in the top center of the wall opposite the levers. Four infrared photobeams were mounted on the side of each chamber, and another photobeam was located in the food receptacle. Locomotor activity was indexed by the number of photobeam breaks that occurred during a session. All experimental data were recorded by personal computers connected to the chambers through an interface.

Lever Pressing Training/Side Bias Testing

Before training on the full task, rats received 5–7 days of lever press training, in a manner identical to that used by St Onge and Floresco (2009) (as adapted from Cardinal et al, 2000). Briefly, rats were initially trained to press each of the two levers on a FR-1 schedule, and then received retractable lever training (90 trials per session), requiring them to press one of the two levers within 10 s of its insertion for reinforcement delivered with a 50% probability. This procedure familiarized them with the association of lever pressing with food reward delivery as well as the probabilistic nature of the subsequent discounting task.

Immediately after the last day of retractable lever training, rats that were to be trained on the discounting task were tested for their side bias, using procedures we have described elsewhere (Floresco et al, 2008; Haluk and Floresco, 2009). This procedure was instituted because pilot studies in our laboratory revealed that accounting for rats’ innate side bias when designating the lever to be associated with a larger reward considerably reduced the number of training sessions required to observe prominent discounting by groups of rats. This session resembled pretraining, except that both levers were inserted into the chamber simultaneously. On the first trial, a food pellet was delivered after responding on either lever. Upon subsequent lever insertion, food was delivered only if the rat responded on the lever opposite to the one chosen initially. If the rat chose the same lever as the initial choice, no food was delivered, and the house light was extinguished. This continued until the rat chose the lever opposite to the one chosen initially. After choosing both levers, a new trial commenced. Thus, a single trial of the side bias procedure consisted of at least one response on each lever. Rats received seven such trials, and typically required 13–15 responses to complete side bias testing. The lever (right or left) that a rat responded on first during the initial choice of a trial was recorded and counted toward its side bias. If the total number of responses on the left and right lever were comparable, the lever that a rat chose initially four or more times over seven total trials was considered its side bias. However, if a rat made a disproportionate number of responses on one lever over the entire session (ie, >2 : 1 ratio for the total number of presses), that lever was considered its side bias. On the following day, rats commenced training on the decision-making task.
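
The side bias rule described above can be illustrated with a minimal sketch. The function name, argument names, and the "left"/"right" labels are hypothetical and chosen only for illustration; the >2 : 1 press ratio and the majority-of-initial-choices criterion are taken from the text, and giving the ratio criterion precedence follows the wording "However, if a rat made a disproportionate number of responses".

```python
from collections import Counter


def side_bias(first_choices, total_left, total_right, ratio=2.0):
    """Return 'left' or 'right' as a rat's side bias (illustrative sketch).

    first_choices: lever chosen first on each side-bias trial, eg seven
                   entries of 'left' or 'right' (hypothetical encoding).
    total_left / total_right: total presses on each lever over the session.
    """
    # A disproportionate number of total presses (>2:1) decides the bias.
    if total_left > ratio * total_right:
        return "left"
    if total_right > ratio * total_left:
        return "right"
    # Otherwise the lever chosen first on the majority of trials
    # (four or more of seven) is taken as the bias.
    counts = Counter(first_choices)
    if counts["left"] == counts["right"]:
        return None  # cannot occur with seven trials; kept for safety
    return "left" if counts["left"] > counts["right"] else "right"


# Comparable totals: the majority of initial choices decides.
print(side_bias(["right"] * 4 + ["left"] * 3, total_left=7, total_right=7))
# Disproportionate totals: the press ratio decides.
print(side_bias(["right"] * 4 + ["left"] * 3, total_left=11, total_right=4))
```

With comparable press totals the first call falls through to the initial-choice count, whereas in the second call the 11 : 4 press ratio overrides it.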

Decision-Making Tasks

Probabilistic discounting

The primary task used in these studies was the probabilistic discounting procedure that has also been described previously (St Onge and Floresco, 2009), which was originally modified from that described by Cardinal and Howes (2005) (Figure 1). Rats received daily sessions consisting of 72 trials, separated into four blocks of 18 trials. The entire session took 48 min to complete, and the animals were trained 5–7 days per week. A session began in darkness with both levers retracted (the intertrial state). A trial began every 40 s with the illumination of the house light and the insertion of one or both levers into the chamber. One lever was designated the large/risky lever, the other the small/certain lever, which remained consistent throughout training. For each rat, the large/risky lever was set to be opposite of its side bias. If the rat did not respond within 10 s of lever presentation, the chamber was reset to the intertrial state until the next trial (omission). When a lever was chosen, both levers retracted. Choice of the small/certain lever always delivered one pellet with 100% probability; choice of the large/risky lever delivered four pellets but with a particular probability. After a response was made and food delivered, the house light remained on for another 4 s, after which the chamber reverted back to the intertrial state until the next trial. Multiple pellets were delivered 0.5 s apart. The four blocks consisted of eight forced choice trials where only one lever was presented (four trials for each lever, randomized in pairs) permitting animals to learn the amount of food associated with each lever press and the respective probability of receiving reinforcement over each block. This was followed by 10 free-choice trials, where both levers were presented and the animal had to decide whether to choose the small/certain or the large/risky lever. 
The probability of obtaining four pellets after pressing the large/risky lever was varied systematically across the four blocks: it was initially 100%, then 50%, 25%, and 12.5%, respectively. Thus, when the probability of obtaining the four-pellet reward was 100% or 50%, this option would be more advantageous. In the 25% block, both options had equal long-term utility, whereas at 12.5%, the small/certain lever would be the more advantageous option in the long term.
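
The long-term utility of each option follows from simple expected-value arithmetic, sketched below. The constant and function names are illustrative assumptions; the pellet counts and block probabilities are those given in the task description.

```python
LARGE_PELLETS = 4          # large/risky reward
SMALL_PELLETS = 1          # small/certain reward, always delivered
BLOCK_PROBABILITIES = [1.0, 0.5, 0.25, 0.125]


def expected_pellets(probability, pellets=LARGE_PELLETS):
    """Expected pellets per choice of the large/risky lever."""
    return probability * pellets


def advantageous_option(probability):
    """Name the option with the greater expected payoff in a block."""
    ev_risky = expected_pellets(probability)
    if ev_risky > SMALL_PELLETS:
        return "large/risky"
    if ev_risky < SMALL_PELLETS:
        return "small/certain"
    return "equal"


for p in BLOCK_PROBABILITIES:
    print(f"{p:.1%}: {expected_pellets(p)} pellets -> {advantageous_option(p)}")
```

The risky lever yields an expected 4, 2, 1, and 0.5 pellets per choice across the four blocks, compared with a constant 1 pellet for the certain lever.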

Figure 1. Probabilistic discounting task design. Format of a single free-choice trial.

Rats were trained on the task until, as a group, they (1) chose the large/risky lever during the first trial block (100% probability) on at least 80% of successive trials, (2) chose the large/risky lever during the final trial block (12.5% probability) on fewer than 60% of successive trials, and (3) demonstrated stable baseline levels of choice. Infusions were administered after a group of rats displayed stable patterns of choice for 3 consecutive days, assessed using a procedure similar to that described by St Onge and Floresco (2010). In brief, data from three consecutive sessions were analyzed with a repeated-measures ANOVA with two within-subjects factors (day and trial block). If the effect of block was significant at the p<0.05 level but there was no main effect of day or day × block interaction (at the p>0.1 level), animals were judged to have achieved stable baseline levels of choice behavior.

Reward magnitude discrimination

As we have done in other studies (Ghods-Sharifi et al, 2009; St Onge et al, 2012b), we determined a priori that if any drug treatment reduced preference for the large/risky option, we would assess how the most effective dose of that compound altered reward magnitude discrimination. This was done to confirm whether or not the reduced preference for the risky option was due to a general reduction in preference for larger rewards. Separate groups of animals were trained and tested on an abbreviated task that consisted of 48 trials divided into 4 blocks, each consisting of 2 forced- and 10 free-choice trials. As with the discounting task, choices were between a large (four pellets) option and a small (one pellet) option. However, the probability of reinforcement for both options was held constant at 100% across blocks.

Surgery

Rats were trained on the task until they displayed stable levels of choice, after which they were provided food ad libitum for 1–3 days and were then subjected to surgery. Rats were anesthetized with 100 mg/kg ketamine hydrochloride and 7 mg/kg xylazine and implanted with bilateral 23 gauge stainless steel guide cannulae aimed at the NAc. Rats received implants aimed at the central portion of the NAc along the core/shell border, to inactivate both subregions (flat skull: anteroposterior=+1.5 mm; mediolateral=±1.4 mm; dorsoventral=−5.9 mm from dura). Our previous work has shown that microinfusions aimed at the core/shell border produce the combined behavioral effects observed following inactivation of either subregion individually, with inactivation of the NAc shell specifically reducing risky choice (Stopper and Floresco, 2011). Guide cannulae were held in place with stainless steel screws and dental acrylic. Thirty gauge obturators flush with the end of guide cannulae remained in place until the infusions were made. Rats were given at least 7 days to recover from surgery before testing. During this period, they were handled at least 5 min each day and were food restricted to 85% of their free-feeding weight.

Drugs and Microinfusion Protocol

Following recovery from surgery, rats were retrained on the task for at least 5 days until the group displayed stable levels of choice behavior for 3 consecutive days. Two to three days before their first microinfusion test day, obturators were removed, and a mock infusion procedure was conducted. Stainless steel injectors were placed in the guide cannulae for 2 min, but no infusion was administered. The day after displaying stable discounting, the group received its first microinfusion test day.

A within-subjects design was used for all experiments. Drugs or vehicle were infused at a volume of 0.5 μl per hemisphere. This volume has been used for numerous studies that have assessed the effects of infusion of DA agonists or antagonists into the NAc on a variety of cognitive functions and reward-related behaviors (eg, Nowend et al, 2001; Bari and Pierce, 2005; Pattij et al, 2007; Pezze et al, 2007; Haluk and Floresco, 2009; Besson et al, 2010). Furthermore, 0.5 μl infusions of D1, D2, or D3 antagonists at doses similar to the ones used in this study have been reported to induce dissociable effects on behavior when infused into the shell vs core region of the NAc, which are separated by 1.5 mm (Bari and Pierce, 2005; Pattij et al, 2007; Besson et al, 2010). As our infusions were targeted in the central NAc, it is likely that the effects reported here are due primarily to actions on DA receptors residing within the NAc.

The following dopaminergic agents were selected because they have been shown to interfere with decision making either when administered systemically (St Onge and Floresco, 2009) or to disrupt other executive functions when infused into the NAc (ie, quinpirole; Haluk and Floresco, 2009). The following DA antagonists and doses (per hemisphere) were used: D1 antagonist R-(+)-SCH 23 390 hydrochloride (0.1 and 1 μg; Sigma-Aldrich) and D2 antagonist eticlopride hydrochloride (0.1 and 1 μg; Sigma-Aldrich). The D1 agonist used was SKF 81 297 (0.2 and 2 μg; Tocris Bioscience). To stimulate the D2-like family of receptors, our initial studies used quinpirole (1 and 10 μg; Sigma-Aldrich), which stimulates both D2 and D3 receptors. Functional assays have shown that quinpirole is approximately three times more selective for D3 vs D2 receptors (Sautel et al, 1995). Subsequent experiments used agonists that display more preferential affinities to a specific receptor subtype. Bromocriptine (1 and 10 μg; Sigma-Aldrich) was used as a D2-preferring agonist, as it is 10 times as active at the D2 receptor compared with D3 and D4 receptors (Sautel et al, 1995). PD 128 907 (1.5 and 3 μg; Tocris Bioscience) was used as a D3-preferring agonist. In comparison with quinpirole, PD 128 907 has a substantially higher selectivity for D3 relative to D2 receptors (>50 times; Bristow et al, 1996; Sautel et al, 1995). We did not test the effects of a D3 antagonist because previous work has shown that systemic blockade of these receptors alone does not reliably affect decision making (St Onge and Floresco, 2009). All drugs were dissolved in physiological saline, sonicated until dissolved, and protected from light, with the exception of bromocriptine, which was first dissolved in dimethyl sulfoxide (DMSO) and then diluted with saline in a 50 : 50 ratio; this DMSO/saline solution was also used as the vehicle treatment for the bromocriptine experiment. 
Infusions were administered bilaterally via 30 gauge injection cannulae that protruded 0.8 mm past the end of the guide cannulae, at a rate of 0.4 μl/min by a microsyringe pump. Injection cannulae were left in place for an additional 1 min to allow for diffusion. Each rat remained in its home cage for an additional 10-min period (or 20 min for bromocriptine infusions) before behavioral testing.
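
As a rough illustration of the timing implied by these parameters, the sketch below computes the infusion duration and the total interval between the start of infusion and the start of testing. All names are hypothetical; the volume, rate, diffusion period, and home-cage waiting periods are the values stated in this section.

```python
VOLUME_UL = 0.5            # infusion volume per hemisphere (μl)
RATE_UL_PER_MIN = 0.4      # pump rate (μl/min)
DIFFUSION_MIN = 1.0        # injectors left in place after infusion (min)
WAIT_MIN = {"default": 10.0, "bromocriptine": 20.0}  # home-cage period (min)


def infusion_duration_min(volume_ul=VOLUME_UL, rate=RATE_UL_PER_MIN):
    """Time needed to deliver the infusion volume at the pump rate."""
    return volume_ul / rate


def minutes_until_testing(drug="default"):
    """Minutes from the start of infusion to the start of behavioral testing."""
    wait = WAIT_MIN.get(drug, WAIT_MIN["default"])
    return infusion_duration_min() + DIFFUSION_MIN + wait


print(round(infusion_duration_min() * 60))            # infusion time in seconds
print(round(minutes_until_testing(), 2))              # most compounds
print(round(minutes_until_testing("bromocriptine"), 2))
```

Under these values the infusion itself lasts 75 s, and testing begins roughly 12 min (or 22 min for bromocriptine) after the infusion starts.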

On separate test days, rats trained on the discounting task received infusions of each of the two doses of a given drug and of vehicle. Only three infusions were administered to minimize mechanical damage that can occur with repeated infusions. As such, doses for each compound were carefully selected from previous studies that have shown them to be effective at altering behavior when infused into the NAc. Whenever possible, these doses were taken from studies that focused on the effects of these drugs on prefrontal cortex-mediated cognitive functions or reward-related behavior. For example, intra-NAc infusions of SCH 23 390 (1 μg) or quinpirole (1 or 10 μg) impair strategy set shifting (Haluk and Floresco, 2009). Infusions of 0.1 μg eticlopride disrupt social partner preference (Gingrich et al, 2000), whereas a 1 μg dose increased response latencies and trial omissions, and blocked amphetamine-induced increases in premature responses on a five-choice serial reaction time task (Pattij et al, 2007). A 0.2 μg dose of SKF 81 297 caused slight, nonsignificant improvements in strategy shifting (Haluk and Floresco, 2009), whereas infusions of 1–3 μg promoted reinstatement of cocaine seeking (Schmidt et al, 2006; Bachtell et al, 2005). There have been no studies assessing the effects of intra-NAc infusions of PD 128 907 on cognition. However, infusions of 1.5–3 μg of this drug into the NAc reduce spontaneous and DA-induced locomotor activity (Ouagazzal and Creese, 2000). Similarly, infusions of 10 μg bromocriptine into the NAc have been reported to potentiate DA-induced locomotor activity (Jenkins and Jackson, 1986). In addition, most of the above-mentioned studies have shown these drugs to be behaviorally active for 30–60 min, which is within the time frame of the behavioral tests used here.

A separate group was allocated for testing each of the drugs. Drug doses were administered in a counterbalanced order across rats, using a within-subjects design; each rat received vehicle and both doses of the drug on separate test days. For rats that received saline infusions on their first test, efforts were made to match control performance across different drug groups. Test days were separated by a baseline training day where no infusion was administered. If, for any individual rat, choice of the large/risky lever deviated by >15% from its preinfusion baseline during this first baseline day, it received an additional day of training before the next infusion test.

Histology

After completion of behavioral testing, rats were euthanized in a carbon dioxide chamber. Brains were removed and fixed in a 4% formalin solution. The brains were frozen and sliced in 50 μm sections before being mounted and stained with Cresyl Violet. Placements were verified with reference to the neuroanatomical atlas of Paxinos and Watson (2005) (see Figure 2). All of the placements resided within the main boundaries of the NAc, clustering around the border of the core and shell subregions. None of the placements encroached on the ventral portion of the islands of Calleja; this is particularly relevant for studies that involved the D3 agonist, as labeling for these receptors is considerably higher in this region vs the NAc (Bouthenet et al, 1991; Levant, 1998).

Figure 2. Schematic of sections of the rat brain showing location of acceptable infusions in the NAc. Numbers correspond to mm from bregma.

Data Analysis

The primary dependent measure of interest was the proportion of choices directed toward the large reward lever for each block of free-choice trials, factoring out trial omissions. For each block, this was calculated by dividing the number of choices of the large reward lever by the total number of successful trials (ie, those where the rat made a choice). Choice data were analyzed using two-way within-subjects ANOVAs, with treatment and trial block as two within-subjects factors. The effect of trial block was always significant (p<0.001) for the probabilistic discounting task and will not be reported further. Response latencies, locomotor activity (ie, photobeam breaks) and the number of trial omissions were analyzed with one-way repeated-measures ANOVAs.
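
The primary measure can be expressed as a short calculation, sketched here under an assumed data encoding: each free-choice trial in a block is recorded as "risky", "certain", or None for an omission. The function name and encoding are illustrative, not part of the original analysis software.

```python
def proportion_risky(block_choices):
    """Proportion of large/risky choices in a block, excluding omissions.

    block_choices: list with one entry per free-choice trial, each
    'risky', 'certain', or None when the rat omitted the trial.
    Returns None if the rat made no choice at all in the block.
    """
    completed = [choice for choice in block_choices if choice is not None]
    if not completed:
        return None
    risky = sum(1 for choice in completed if choice == "risky")
    return risky / len(completed)


# Ten free-choice trials: six risky, two certain, two omissions.
block = ["risky"] * 6 + ["certain"] * 2 + [None] * 2
print(proportion_risky(block))  # 6 risky choices out of 8 completed trials
```

Because omissions are removed from the denominator, a rat that omits trials is scored only on the trials where it actually made a choice.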

Win-stay/lose-shift analysis

Whenever we observed a significant main effect of a drug treatment on probabilistic discounting, we conducted a supplementary analysis to further clarify whether changes in choice biases were due to alterations in sensitivity to reward (win-stay performance) or negative feedback (lose-shift performance) (Bari et al, 2009; Stopper and Floresco, 2011; St Onge et al, 2011, 2012b). Animals’ choices during the task were analyzed according to the outcome of each preceding trial (reward or non-reward) and expressed as a ratio. The proportion of win-stay trials was calculated from the number of times a rat chose the large/risky lever after choosing the risky option on the preceding trial and obtaining the large reward (a win), divided by the total number of free-choice trials where the rat obtained the larger reward. Conversely, lose-shift performance was calculated from the number of times a rat shifted choice to the small/certain lever after choosing the risky option on the preceding trial and not being rewarded (a loss), divided by the total number of free-choice trials resulting in a loss. This analysis was conducted for all trials across the four blocks. We could not conduct a block-by-block analysis of these data because there were many instances where rats either did not select the large/risky lever or did not obtain the large reward at all during the latter blocks. Changes in win-stay performance were used as an index of reward sensitivity, whereas changes in lose-shift performance served as an index of negative-feedback sensitivity.
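
The win-stay and lose-shift ratios can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: trials are encoded as (choice, rewarded) pairs, omitted trials are assumed to have been removed beforehand, and only wins or losses that are followed by another free-choice trial in the same block are counted, because the text does not specify how the final trial of a block was handled.

```python
def win_stay_lose_shift(blocks):
    """Return (win_stay, lose_shift) ratios pooled across blocks.

    blocks: list of blocks, each a list of (choice, rewarded) tuples for
    consecutive free-choice trials, with choice 'risky' or 'certain'.
    Assumption: omissions were removed and the last trial of each block
    has no following trial, so it contributes no win or loss.
    """
    wins = stays = losses = shifts = 0
    for trials in blocks:
        for (prev_choice, prev_rewarded), (next_choice, _) in zip(trials, trials[1:]):
            if prev_choice != "risky":
                continue  # only risky choices can yield a win or a loss
            if prev_rewarded:
                wins += 1
                stays += next_choice == "risky"
            else:
                losses += 1
                shifts += next_choice == "certain"
    win_stay = stays / wins if wins else None
    lose_shift = shifts / losses if losses else None
    return win_stay, lose_shift


example = [[("risky", True), ("risky", False), ("certain", True),
            ("risky", True), ("risky", True), ("risky", False),
            ("risky", False)]]
print(win_stay_lose_shift(example))
```

In this example the rat stays with the risky lever after all three counted wins and shifts to the certain lever after one of the two counted losses.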

The win-stay/lose-shift supplementary analyses were conducted to obtain more detailed information regarding the specific processes affected by DA receptor manipulations that may have caused an overall change in choice bias. For example, reduced preference for the large/risky option induced by a particular dose of a drug may have been associated with either a reduced tendency to select the risky option after obtaining the large reward on the previous trial (ie, reduced win-stay behavior), or an increased tendency to select the certain option after selecting risky on the preceding trial and not receiving a reward (ie, increased lose-shift behavior). Thus, an overall decrease in risky choice could only result from a unidirectional change in one of these measures (eg, an overall decrease in risky choice could not be caused by decreased lose-shift behavior). Furthermore, we were only interested in conducting these analyses for treatments that caused an overall change in risky choice in the primary analysis. Thus, when these analyses were conducted, we compared win-stay and lose-shift ratios observed after vehicle treatment with those observed following treatment with a specific drug dose that caused an overall change in choice preference. To this end, we used separate one-tailed dependent-samples t-tests when only one dose caused an overall change in choice behavior, or one-way ANOVAs when both doses were effective at altering choice. The raw values from which win-stay/lose-shift ratios were calculated are presented in Table 1.

Table 1 Mean (±SEM) Number of Risky Choices for Each Probability Block Separated by ‘Wins’ and ‘Losses’

RESULTS

Blockade of NAc D1 and D2 Receptors

D1 receptor blockade

Rats in this group were trained on the probabilistic discounting task for an average of 21 days before being implanted with guide cannulae in the NAc, retrained on the task, and receiving counterbalanced microinfusions. A total of 13 rats with acceptable placements were included in the data analysis. Analysis of the choice data revealed a significant main effect of treatment (F(2, 24)=4.37, p<0.05) but no treatment × block interaction (F(6, 72)=0.33, NS). Multiple comparisons parsing out the main effect of treatment confirmed that across all blocks, the high dose of SCH 23 390 (1 μg) significantly decreased preference for the large/risky lever compared with both saline (Tukey’s test, p<0.05) and the low dose (0.1 μg; p<0.05), whereas the low dose produced no reliable change in choice behavior (Figure 3a). D1 blockade significantly increased response latencies (F(2, 24)=3.16, p=0.05), and decreased locomotor counts (F(2, 24)=8.35, p<0.005; Table 2). The high dose of SCH 23 390 also caused a slight increase in trial omissions but this effect only approached statistical significance (F(2, 24)=3.29, p=0.06; Table 2).

Figure 3. Blockade of D1, but not D2, receptors in the NAc reduces risky choice. (a) Percentage choice of the large/risky option following infusions of two doses of SCH 23 390 or saline into the NAc across four blocks of free-choice trials. Choice data are plotted as a function of probability block. Symbols represent mean+SEM. Black star denotes p<0.05 of average choice across blocks for the 1.0 μg dose condition vs saline. (b) Win-stay/lose-shift ratios observed after treatment with the 1 μg dose of SCH 23 390 and vehicle saline treatments. Win-stay values are displayed as the proportion of choices on the large/risky lever following a rewarded risky choice on the preceding trial. Lose-shift values are displayed as the proportion of choices on the small/certain lever following unrewarded risky choice on the preceding trial. SCH 23 390 selectively augmented loss sensitivity, increasing the tendency to select the small/certain option after a non-rewarded risky choice. (c) Choice data for animals receiving infusions of two doses of the D2 antagonist eticlopride or saline.

Table 2 Mean (±SEM) Locomotor Activity, Response Latencies, and Omissions During Probabilistic Discounting or Reward Magnitude Discrimination

We further analyzed the proportion of ‘win-stay’ and ‘lose-shift’ trials to determine whether the decrease in risky choice induced by the 1 μg dose of SCH 23 390 could be attributed to altered reward or negative-feedback sensitivity, respectively. This analysis revealed that risk aversion induced by the 1 μg dose of SCH 23 390 was not due to decreased reward sensitivity, as win-stay tendencies were unaltered (t(12)=0.77, NS; Figure 3b, left). In contrast, analysis of lose-shift tendencies revealed that this dose increased negative-feedback sensitivity (t(12)=1.95, p<0.05, one-tailed; Figure 3b, right). Thus, following D1 receptor blockade, rats were more likely to shift their response selection toward the safe option following an unrewarded risky choice.

D2 receptor blockade

A total of eight rats with acceptable placements were included in the data analysis. This group was trained for 25 days, after which they displayed stable discounting behavior. In stark contrast to the effects of D1 receptor antagonism, blockade of D2 receptors in the NAc did not affect risky choice. Analysis of the choice data revealed no significant main effect of treatment (F(2, 14)=0.03, NS; Figure 3c) and no treatment × block interaction (F(6, 42)=0.27, NS). Response latencies tended to be longer following D2 blockade, but this effect was not statistically significant (F(2, 14)=1.99, NS; Table 2). Trial omissions did not differ across treatments (F(2, 14)=1.16, NS; Table 2). However, these doses were behaviorally active, as they did significantly decrease locomotor counts (F(2, 14)=5.41, p<0.05; Table 2). Collectively, these data show that blockade of D1, but not D2, receptors in the NAc altered probabilistic discounting, reducing preference for larger, uncertain rewards.

Stimulation of NAc D1, D2, and D3 Receptors

D1 receptor stimulation

A total of 11 rats with acceptable placements were included in the data analysis. These rats required an average of 23 days of training before displaying stable discounting. As displayed in Figure 4a, infusions of the 2 μg dose of SKF 81 297 induced a particularly interesting profile of choice. Specifically, D1 receptor stimulation optimized the discounting curve, so that animals tended to choose the risky option more often when it was of greater utility, and less often when this option was of lesser long-term relative value. Analysis of the choice data revealed no significant main effect of treatment with SKF 81 297 (F(2, 20)=0.07, NS), but did show a significant treatment × block interaction (F(6, 60)=3.43, p<0.01; Figure 4a). Subsequent simple main effect analyses of this interaction, which examined differences in choice behavior during each probability block, revealed that the 2 μg dose increased choice of the large/risky lever compared with saline on the 50% block and decreased risky choice on the 12.5% block (p<0.05).

Figure 4. Stimulation of D1, but not D2, receptors in the NAc modifies risky choice. All conventions are the same as in Figure 3. (a) Infusions of the D1 agonist SKF 81 297 optimized decision making. The 2 μg dose of SKF 81 297 increased choice of the risky lever on blocks when the probability of obtaining the large/risky reward was high (50%) and decreased risky choice when this option was disadvantageous (12.5%). Black star denotes p<0.05 for the treatment × trial block interaction. In contrast, neither infusions of the D2/D3 agonist quinpirole (b) nor the D2-selective agonist bromocriptine (c) affected risky choice.

Infusions of the 2 μg dose of SKF 81 297 increased win-stay tendencies, although this difference was not statistically significant (saline=0.86±0.06; SKF=0.93±0.02; t(10)=1.11, NS). Similarly, lose-shift ratios were decreased by SKF 81 297, but again this was not a statistically reliable effect (saline=0.40±0.09; SKF=0.31±0.05; t(10)=0.66, NS). Response latencies were unaffected (F(2, 20)=0.38, NS), as were trial omissions (F(2, 20)=0.95, NS), and locomotor counts (F(2, 20)=0.45, NS; Table 2). Thus, stimulation of D1 receptors in the NAc ‘improved’ decision making, and optimized choice behavior, wherein choice biases toward the large/risky or small/certain reward were enhanced during periods when these options had greater long-term utility.

D2/D3 stimulation (quinpirole)

Our initial studies investigating the effects of NAc D2 receptor stimulation on decision making used the mixed D2/D3 receptor agonist quinpirole, at doses we have shown previously to markedly disrupt behavioral flexibility when infused into the NAc (Haluk and Floresco, 2009). A total of 12 rats with acceptable placements were included in the data analysis. These rats displayed stable discounting following an average of 28 days of training. Somewhat surprisingly, infusions of quinpirole did not alter risky choice (main effect of treatment, F(2, 22)=0.04, NS; treatment × block interaction, F(6, 66)=0.35, NS; Figure 4b). This treatment had no effect on response latencies (F(2, 22)=0.24, NS), trial omissions (F(2, 22)=1.61, NS), or locomotor counts (F(2, 22)=1.92, NS; Table 2).

Preferential D2 receptor stimulation (bromocriptine)

As mentioned above, quinpirole has comparable affinity for both D2 and D3 receptors, and previous studies have shown that more preferential stimulation of either of these receptors can induce opposing effects on risky choice using this assay (St Onge and Floresco, 2009). As such, we conducted additional experiments whereby we used agonists that had higher relative affinities for either D2 or D3 receptors that have been shown to modify decision making when administered systemically. For D2 receptors, we used bromocriptine, at doses of 1 and 10 μg. A total of 11 rats with acceptable placements were included in the data analysis. Rats displayed stable discounting following an average of 22 days of training. Similar to what was observed in the quinpirole experiment, infusions of bromocriptine into the NAc did not modify choice behavior at either dose tested (main effect and interaction F-values<1.0, NS; Figure 4c). Similarly, D2 stimulation had no effect on response latencies, trial omissions, or locomotor counts (all F-values<2.6, NS; Table 2). Thus, stimulation of NAc D2 receptors does not seem to interfere with risk-based decision making assessed in this manner.

Preferential D3 stimulation (PD 128 907)

To test the effect of NAc D3 receptor stimulation, we used the agonist PD 128 907, which has been reported to reduce risky choice when administered systemically (St Onge and Floresco, 2009). A total of 11 rats with acceptable placements were included in the data analysis. Rats displayed stable discounting following 22 days of training. In contrast to the effects of quinpirole and bromocriptine, preferential stimulation of D3 receptors reduced choice of the large/risky option. Analysis of the choice data revealed a significant main effect of treatment (F(2, 20)=3.42, p<0.05; Figure 5a) but no treatment × block interaction (F(6, 60)=0.36, NS), indicating that these treatments caused a reduced preference for the large reward option that was comparable across all blocks. Multiple comparisons further revealed that both doses of the drug reduced risky choice (p<0.05), although the lower, 1.5 μg dose produced an effect that was numerically greater than that of the 3 μg dose. D3 stimulation had a marginal effect on locomotion (F(2, 20)=3.33, p=0.056), which was due to a significant decrease in locomotor counts following administration of the low (1.5 μg) dose compared with saline (Table 2). PD 128 907 had no effect on response latencies (F(2, 20)=0.67, NS) or trial omissions (F(2, 20)=0.54, NS; Table 2).

Figure 5

Stimulation of D3 receptors in the NAc reduces preference for larger, uncertain rewards. All conventions are the same as Figure 3. (a) Infusion of PD 128 907 significantly decreased overall choice of the large/risky lever. (b) Analysis of win-stay and lose-shift behavior demonstrates that PD 128 907 selectively decreases reward sensitivity, with this effect being more prominent at the lower, 1.5 μg dose. This decrease in win-stay tendencies indicates that PD 128 907 reduced the tendency to maintain a preference for the large/risky lever after obtaining the larger reward on preceding trials. Stars denote p<0.05 vs saline.

The reduced preference for the large/risky option induced by PD 128 907 was attributable primarily to a reduction in reward sensitivity. Analysis of win-stay tendencies revealed a significant effect of treatment (F(2, 20)=4.05, p<0.05). Thus, after receiving a larger reward following selection of the risky option, rats were less likely to select that option on a subsequent trial after treatment with PD 128 907. Multiple comparisons further showed that the effect of this D3 agonist displayed a biphasic dose-response function, wherein win-stay tendencies were significantly (p<0.05) reduced following treatment with the 1.5 μg dose, but not the 3 μg dose (Figure 5b, left). In contrast, lose-shift tendencies were unaffected by these treatments (F(2, 20)=0.03, NS; Figure 5b, right). Thus, stimulation of D3 receptors in the NAc reduced the impact that larger, uncertain rewards exert on subsequent choice.

Reward Magnitude Discrimination

Both the D1 antagonist (SCH 23 390) and D3 agonist (PD 128 907) shifted preference away from the option associated with the larger, but uncertain reward. To determine whether this effect was attributable to a reduced preference for larger rewards or an inability to discriminate between differing amounts of reward, separate groups of animals, independent from those trained on the discounting task, were trained on a reward magnitude discrimination task. After 9 days of training on the task, rats received infusions of saline and either SCH 23 390 (1 μg, n=7) or PD 128 907 (1.5 μg, n=6) on separate test days. As displayed in Figure 6, neither D1 receptor blockade (F(1, 6)=0.13, NS; Figure 6a) nor D3 receptor stimulation (F(1, 5)=0.46, NS; Figure 6b) affected preference for the certain four-pellet option. As such, these data indicate that the effect of these treatments on risk-based decision making cannot be attributed to a reduced preference for larger rewards.

Figure 6

Dopaminergic manipulations that decrease risky choice do not impair reward magnitude discrimination. Choice data on the reward magnitude discrimination task for SCH 23 390 (a) and PD 128 907 (b) compared against saline. Animals chose between two certain rewards of differing magnitude (4 pellets vs 1 pellet). Data are divided into four blocks of 10 trials. Neither drug treatment decreased preference for a larger, certain reward.

DISCUSSION

The present data provide novel insight into the contribution of DA receptors in the NAc to risk-based decision making, demonstrating that D1, but not D2, receptor activity exerts important modulatory control over choice between small, certain and larger, uncertain rewards. Blockade of D1 receptors induced risk aversion and enhanced negative-feedback sensitivity, increasing the tendency to shift to the small/certain option following non-rewarded risky choices. Conversely, stimulation of D1 receptors optimized decision-making biases, reflected by a sharpening of the discounting curve. The D1 agonist enhanced biases for the option that provided greater long-term utility as the likelihood of delivering reward changed across a session. On the other hand, neither antagonism nor stimulation of NAc D2 receptors altered choice behavior. However, stimulation of D3 receptors reduced preference for large/risky rewards, decreasing the likelihood of choosing the large/risky option following a risky win. These results show that DA receptor subtypes within the NAc make distinct contributions to risky choice via differential effects on reward and negative-feedback sensitivity.

The probabilistic discounting task used here has been used by our laboratory to dissect the relative contribution of different regions of the prefrontal cortex, amygdala, and NAc to certain aspects of risk-based decision making (Ghods-Sharifi et al, 2009; St Onge and Floresco, 2010; Stopper and Floresco, 2011). Thus, this study used the same assay so that we could directly compare the effects of NAc DA receptor manipulations with our previous findings. In this task, rats learn over training to keep track of changes in the probability of obtaining the larger reward in order to facilitate modifications in choice biases when the large/risky reward is of greater, equal or lesser long-term utility relative to the small/certain option. Previous work by our group has shown that rats display similar patterns of discounting on this task irrespective of whether the odds of obtaining the larger reward decrease or increase systematically over a session (St Onge et al, 2010). Moreover, lesions of the NAc, or systemic DA receptor blockade, reduce preference for the large/risky option under each of these task conditions (Cardinal and Howes, 2005; St Onge et al, 2010), suggesting that the effects reported here are unlikely to be dependent on the manner in which reward probabilities change. Interestingly, rats trained on a variant where large/risky reward probabilities change in a more randomized manner show considerably less discounting, even with extended training, presumably because they find this task more difficult compared with odds shifts that occur in a more systematic manner (St Onge et al, 2010). As such, we chose to use a more standard version of the task to maximize the possibility of observing significant shifts in choice biases. Note that despite the 20–25 days of training required by rats to display prominent and stable discounting behavior, it is unlikely that their choice patterns reflect habitual-like patterns of choice. 
On each training day, rats routinely sample both levers during each block of a session, but shift their bias gradually as reward probabilities change. In addition, choice behavior can be influenced by satiety manipulations (St Onge and Floresco, 2009), further arguing against the idea that performance on this task reflects habitual modes of responding. Rather, choice behavior appears to be guided primarily by changes in action/outcome contingencies that signal variations in the likelihood of obtaining the larger reward.
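The notion of the large/risky option being of greater, equal or lesser long-term utility can be illustrated with a simple expected-value calculation. The sketch below is only illustrative: it assumes a one-pellet small/certain reward and a 25% block (neither is stated in this section), alongside the four-pellet large reward and the 100%, 50%, and 12.5% blocks that are mentioned in the text. The function names are hypothetical.

```python
from fractions import Fraction

LARGE_PELLETS = 4   # large/risky reward (the four-pellet option)
SMALL_PELLETS = 1   # ASSUMPTION: size of the small/certain reward

# 100%, 50% and 12.5% blocks are named in the text; 25% is an ASSUMPTION.
BLOCK_PROBABILITIES = [Fraction(1), Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]


def expected_pellets(probability, pellets=LARGE_PELLETS):
    """Average pellets per choice of the risky lever at a given reward probability."""
    return probability * pellets


def better_option(probability):
    """Return which option yields more pellets on average in a block."""
    risky_value = expected_pellets(probability)
    if risky_value > SMALL_PELLETS:
        return "risky"
    if risky_value < SMALL_PELLETS:
        return "certain"
    return "equal"


if __name__ == "__main__":
    for p in BLOCK_PROBABILITIES:
        print(f"{float(p) * 100:g}% block: "
              f"risky option averages {float(expected_pellets(p)):g} pellets "
              f"-> {better_option(p)}")
```

Under these assumptions the risky lever is advantageous in the 100% and 50% blocks, equivalent at 25%, and disadvantageous at 12.5%, which is the pattern that a sharpened discounting curve would track more closely.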

NAc D1 Receptors and Risk-Based Decision Making

Blockade of D1 receptors in the NAc reduced preference for larger, uncertain rewards. These treatments also increased response latencies and decreased locomotor activity. However, similar treatments did not alter choice behavior on a simpler reward magnitude discrimination task, where rats chose between larger and smaller rewards, both delivered with 100% probability. This latter finding indicates that alterations in decision making induced by D1 receptor antagonism in the NAc cannot be easily attributed to disruptions in preference for larger vs smaller rewards, or nonspecific impairments in motivational or motoric processes.

A detailed analysis of choice behavior on trials following those where animals received or did not receive the large/risky reward provides important insight into the underlying processes that were disrupted by D1 receptor blockade. Under control conditions, rats chose the risky option on 85% of trials after obtaining the larger reward on the preceding trial. Conversely, on trials following a risky choice and loss, animals shifted to the small/certain option on 35% of subsequent trials. SCH 23 390 administered into the NAc did not alter win-stay tendencies, suggesting that the decrease in risky choice was not attributable to a reduction in reward sensitivity. Instead, these manipulations selectively enhanced lose-shift tendencies, increasing the likelihood that rats would shift choice after a risky ‘loss’, and select the small/certain option on the subsequent choice. It is interesting to note that NAc D1 receptor antagonism altered risky choice in a manner very similar to that induced by infusions of SCH 23 390 into the prefrontal cortex (St Onge et al, 2011). Prefrontal D1 receptor blockade also increased risk aversion, specifically via an enhanced sensitivity to reward omissions. These findings suggest that under conditions involving reward uncertainty, D1 receptors in the NAc and prefrontal cortex share a similar function, mitigating the impact that reward omissions exert on subsequent choice preferences, facilitating biases toward potentially more profitable options despite their uncertainty. In essence, D1 receptors aid in overcoming uncertainty costs and keeping the ‘eye on the prize’, by maintaining choice biases even when a risky choice leads to reward omission.
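The win-stay and lose-shift measures described above can be expressed as simple conditional proportions. The following is a minimal sketch rather than the analysis code used in this study: it assumes each session is available as an ordered list of completed choice trials, each recorded as a (choice, rewarded) pair, with omissions and any forced-choice trials already excluded. The function name and data layout are hypothetical.

```python
def win_stay_lose_shift(trials):
    """Compute win-stay and lose-shift ratios from an ordered trial list.

    trials: list of (choice, rewarded) tuples, where choice is "risky" or
    "certain" and rewarded is a bool (ASSUMED layout, for illustration only).
    Returns (win_stay, lose_shift); a ratio is None if it has no qualifying trials.
    """
    wins = stays = losses = shifts = 0
    # Pair every trial with the trial that immediately follows it.
    for (choice, rewarded), (next_choice, _) in zip(trials, trials[1:]):
        if choice != "risky":
            continue  # only risky choices can be classified as wins or losses
        if rewarded:
            wins += 1
            stays += next_choice == "risky"     # stayed with the risky lever
        else:
            losses += 1
            shifts += next_choice == "certain"  # shifted to the small/certain lever
    win_stay = stays / wins if wins else None
    lose_shift = shifts / losses if losses else None
    return win_stay, lose_shift


if __name__ == "__main__":
    example_session = [
        ("risky", True), ("risky", False), ("certain", True),
        ("risky", True), ("risky", True), ("risky", False), ("risky", False),
    ]
    print(win_stay_lose_shift(example_session))
```

In this toy session every risky win is followed by another risky choice, and one of the two classifiable risky losses is followed by a shift to the certain lever. A selective rise in the second ratio with no change in the first is the pattern attributed here to D1 receptor blockade.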

In comparison with the effects of D1 receptor blockade, stimulation of these receptors in the NAc yielded a true ‘improvement’ in decision making. Infusion of SKF 81 297 significantly sharpened the discounting curve; when the four-pellet option was more advantageous (50% block), animals selected it with greater frequency, whereas rats chose the risky option less during the 12.5% block when the small/certain option would have greater long-term utility. Infusions of D1 agonists into the NAc have been reported to improve other aspects of cognition, such as attentional accuracy (Pezze et al, 2007). With respect to the effects on probabilistic discounting, the observation that choice was shifted sometimes toward or away from the risky option suggests that NAc D1 stimulation may have augmented attention for both delivery of uncertain rewards and reward omissions during different phases of the task. A more detailed analysis of choice behavior showed that win-stay and lose-shift tendencies across the entire session did not differ following intra-NAc administration of SKF 81 297 vs control treatments. The fact that choice biases shifted in opposing directions on different trial blocks following D1 receptor stimulation may be one explanation for the lack of overall effect of SKF 81 297 on these measures. Thus, D1 receptor stimulation may have reduced negative-feedback sensitivity in the earlier portion of the task, when the large/risky option was more advantageous, but had the opposite effect in the last block. The nature of the probabilistic discounting task used here did not permit an examination of these measures within individual trial blocks, as few losses occur in high probability blocks and few wins occur in low probability blocks. However, these treatments did tend to increase win-stay tendencies while at the same time causing a slight reduction in negative-feedback sensitivity. 
As such, the improvements in decision making induced by NAc D1 receptor stimulation may be attributable to a relatively nonspecific refinement of the impact that both rewarded and non-rewarded outcomes exert on future decisions.

D1 receptor activity exerts important neuromodulatory control over excitatory inputs to the NAc. For example, D1 receptors can facilitate firing of NAc neurons driven by inputs from the basolateral amygdala (Floresco et al, 2001). In this regard, functional interactions between the amygdala and the NAc are critically important in driving choice toward larger, uncertain rewards (St Onge et al, 2012). Moreover, inactivation of the basolateral amygdala reduces preference for larger, probabilistic rewards, by enhancing lose-shift tendencies (Ghods-Sharifi et al, 2009), in a manner similar to the effects of D1 receptor antagonism reported here. Therefore, activation of NAc D1 receptors (potentially via phasic increases in DA; Sugam et al, 2012), may promote choice of larger, riskier rewards by enhancing task-related activity driven by the basolateral amygdala. Short-term potentiation of amygdala inputs by NAc D1 receptors may attenuate the salience of losses by augmenting representations of recently rewarded choices, bridging the gap between rewarded and non-rewarded actions, thereby increasing the likelihood of selecting potentially more profitable options at subsequent opportunities.

NAc D2 Receptors and Risk-Based Decision Making

Blockade of D2 receptors with eticlopride did not alter probabilistic discounting. This result was surprising, given that systemic treatment with this compound induced a marked decrease in preference for a large/risky option using a similar procedure (St Onge and Floresco, 2009). Furthermore, intra-mPFC infusions of this D2 antagonist actually increased risky choice (St Onge et al, 2011). It is unlikely that this lack of effect was due to insufficient dosing with this compound, as this drug displays a high potency at D2 receptor sites, and infusions were behaviorally active, in that they reduced overall locomotor activity. It is possible that a higher dose of eticlopride may have altered choice behavior. However, intra-NAc infusions of a 10 μg dose of this compound have been reported to suppress instrumental responding for food, which could potentially confound interpretation of its effects on decision making (Bari and Pierce, 2005). The finding that NAc D2 receptor blockade did not affect task performance, in combination with the pronounced effects of D1 receptor manipulations on decision making, is in keeping with other studies reporting that NAc D1 receptors mediate response accuracy whereas D2 receptors play a greater role in motivational aspects of performance (Floresco and Phillips, 1999; Floresco, 2007; Pattij et al, 2007; Haluk and Floresco, 2009). The present data add to these findings, indicating that NAc D2 receptor activity does not appear to make a discernible contribution to probabilistic discounting. However, D2 receptors in the NAc have been shown to facilitate other forms of cost–benefit decision making, specifically those related to evaluation of effort costs (Cousins et al, 1994; Aberman et al, 1998; Nowend et al, 2001; Salamone et al, 2002). 
This finding, in combination with the reduction in locomotion reported here, suggests that NAc D2 receptors may be more important in overcoming physical effort costs to obtain larger rewards, as opposed to costs related to reward uncertainty.

Infusions of D2 receptor agonists also failed to alter risky choice. An initial experiment used quinpirole, which displays comparable affinity among the D2, D3, and D4 receptors. Previous studies have shown that intracranial infusion of quinpirole within the dose ranges used here impairs probabilistic choice when administered into the prefrontal cortex (St Onge et al, 2011), and disrupts behavioral flexibility, as evidenced by impairments in set-shifting and reversal learning when infused into the NAc (Haluk and Floresco, 2009). A subsequent experiment used the agonist bromocriptine, which has a greater affinity for the D2 receptor over the D3 and D4 receptors, and increases risky choice on this task when administered systemically (St Onge and Floresco, 2009). Yet, neither of these treatments interfered with the ability to modify choice biases in response to changes in reward probability. This lack of effect contrasts with the above-mentioned observations that stimulation of NAc D2 receptors markedly impairs flexible responding in situations requiring shifts between or within different discrimination strategies (Goto and Grace, 2005; Haluk and Floresco, 2009). When comparing these disparate findings, an important consideration is that classical tests of behavioral flexibility require shifting behavior between responses that either result in reward delivery or do not. On the other hand, the probabilistic discounting task used here required animals to choose between smaller/certain and larger/probabilistic rewards. Therefore, it appears that excessive activation of NAc D2 receptors impedes modifications in behavior most prominently when shifting between actions that are reinforced in a deterministic manner (all or none), as opposed to those associated with delivery of uncertain or probabilistic rewards.

With regard to the anatomical distribution of different DA receptors within the ventral striatum, there is evidence to suggest that NAc outputs may be organized in a manner similar to the well-documented direct and indirect output pathways of the dorsal striatum. Thus, D1-expressing neurons in the NAc send a direct output to the substantia nigra pars reticulata, whereas neurons inhibited by D2 receptor activity preferentially project to the ventral pallidum and subthalamic nucleus (Nicola, 2007). The present data suggest that mesoaccumbens DA may modulate risk-based decision making primarily by acting on D1 receptors residing on neurons in the direct output pathway.

Despite the lack of effect of D2 receptor manipulations in the NAc on risky choice, the fact remains that systemic treatments with D2 agonists or antagonists can increase or decrease preferences for larger, probabilistic rewards (St Onge and Floresco, 2009). Given that infusions of D2 drugs into the prefrontal cortex alter decision making in a manner qualitatively different from that observed following systemic treatment, it is likely that D2 receptors in other brain regions are critical for modulating probabilistic decisions. One obvious candidate is the basolateral amygdala. Electrophysiological studies have shown that stimulation of D2 receptors in the basolateral amygdala potentiates sensory (non-limbic) cortical inputs to the basolateral amygdala, overshadowing D1-driven prefrontal cortical inputs to this nucleus (Grace and Rosenkranz, 2002). These findings would suggest that D2 stimulation within the basolateral amygdala may suppress prefrontal cortical inputs to this nucleus, which would be expected to increase risky choice (St Onge et al, 2012b). In addition, blockade of D2 receptors in the basolateral amygdala attenuates cue-induced reinstatement of cocaine seeking, suggesting that these receptors facilitate reward-directed behavior (Berglind et al, 2006). As the contribution of DA transmission within the basolateral amygdala to decision making has been virtually unexplored, future studies on this topic should provide additional insight into these issues.

As opposed to the lack of effect of D2 receptor ligands, intra-NAc infusion of the D3-preferring agonist PD 128 907 decreased preference for the large/risky option in a manner similar to that induced by systemic treatment with this drug (St Onge and Floresco, 2009). The reduced preference for the larger reward induced by the lower dose of the D3 agonist was apparent during the first, 100% block, when there was no risk associated with this option. Thus, it may be argued that excessive D3 receptor stimulation may not have affected probabilistic discounting per se, but instead may have disrupted self-control or attentional processes that would cause the rats to not choose the large reward lever. Note, however, that infusions of PD 128 907 into the NAc in a separate group of rats did not affect performance on a reward magnitude discrimination, demonstrating that these treatments do not always reduce preference for larger rewards. Interestingly, infusions of quinpirole, which stimulates both D2 and D3 receptors with relatively little selectivity (Sautel et al, 1995) did not affect risky choice. Thus, it appears that increased D3 activity within the NAc may dampen the impact that larger, uncertain rewards exert over subsequent choice behavior, whereas concurrent activation of D2 receptors may counter this effect. This notion is consistent with our observation that the lower dose of PD 128 907 caused a significant decrease in win-stay performance whereas the higher dose did not, possibly due to a loss of selectivity.

Systemic or local administration of D3 agonists decreases NAc DA efflux (Pugsley et al, 1995; Roberts et al, 2006), which has led to the suggestion that these receptors may serve as autoreceptors on DA terminals. Therefore, a parsimonious explanation for the D3 receptor-mediated decrease in risky choice observed in this experiment may be that their activation reduced NAc DA transmission, which in turn may have diminished dopaminergic tone on D1 receptors. However, it is important to highlight that the reduced preference for larger, uncertain rewards induced by D3 receptor stimulation was qualitatively different from that induced by D1 receptor blockade. PD 128 907 decreased win-stay tendencies, whereas SCH 23 390 increased negative-feedback sensitivity. Furthermore, each of these receptors modulates electrophysiological properties of NAc neurons via dissociable mechanisms. For instance, postsynaptic D1 receptor stimulation potentiates synaptic NMDA-mediated responses (Harvey and Lacey, 1997; Nicola et al, 2000), whereas presynaptic D1 receptors may depress excitatory and inhibitory synaptic transmission (Pennartz et al, 1994; Nicola and Malenka, 1997; Nicola et al, 2000). In comparison, D3 receptor activity suppresses inhibitory synaptic transmission by decreasing the availability of GABA receptors in NAc (Chen et al, 2006). In light of these findings, it is plausible that reductions in risky choice caused by excessive activation of NAc D3 receptors were the result of a combination of neurophysiological alterations that may include, but are not limited to, reductions in DA transmission.

DA-agonist therapies prescribed for the treatment of Parkinson’s disease have been documented to induce a variety of impulse control disorders, including pathological gambling (Lader, 2008; Ahlskog, 2011). Some of these drugs, such as pramipexole, have a higher affinity for D3 vs D2 receptors, which has led to the conjecture that these drugs may impair decision making and increase gambling behavior via actions on D3 receptors. However, the fact that stimulation of D3 receptors systemically or locally within the NAc actually decreases risky choice would suggest that this is not the primary mechanism through which dopaminergic therapies may promote the emergence of impulse control disorders. Rather, as systemic treatment with D2-preferring agonists increases risky choice (St Onge and Floresco, 2009), it would appear that these side effects of DA-agonist therapies may occur through actions on D2 receptors residing in cortical and/or limbic brain regions beyond the NAc.

SUMMARY AND CONCLUSIONS

The findings of this study provide novel insight into the mechanisms through which DA transmission in the ventral striatum can refine cost–benefit evaluations requiring risk–reward judgments. By revealing risk aversion caused by D1 receptor blockade and optimization of decision making following D1 stimulation, these results suggest that normal levels of NAc D1 activity serve to modify decision-making biases toward or away from larger uncertain rewards to maximize the amount of reward that can be obtained in the long term. These effects may be mediated in part by fluctuations in tonic DA in the NAc, which appears to represent an integration of multiple types of information relevant to decision making, including reward uncertainty, opportunities to select preferred rewards, overt choice behavior, and changes in reward availability (St Onge et al, 2012a). On the other hand, NAc D2 receptor activity does not appear to make a discernible contribution to these functions, whereas excessive D3 receptor activity blunts the impact that larger rewards exert over decision biases. Additional studies of the mechanisms through which NAc DA may regulate these functions will expand our understanding of how DA transmission within this nucleus relates to both normal neuroeconomic processing and aberrant decision making associated with a variety of disorders linked to dysfunction within this system.