Converging evidence from studies with animals and humans has implicated separate regions of the medial prefrontal cortex (mPFC), corresponding to the anterior cingulate cortex (ACC), in mediating different aspects of reward-related decisions involving uncertainty or risk. However, the dissociable contributions of subregions of the ACC remain unclear, as discrepancies exist between human neuroimaging findings and preclinical rodent studies. To clarify how ventral vs. dorsal regions of the mPFC contribute to risk/reward decision making, the present study assessed the effects of inactivation of different subregions on performance of a “Blackjack task” that measured cue-guided decision making and shares similarities with paradigms used with humans. Male Long-Evans rats were well-trained to choose between a Small/Certain reward vs a Large/Risky reward delivered with variable probabilities (i.e., good vs. poor odds, 50% vs. 12.5%). The odds of obtaining the larger reward were signaled by auditory cues at the start of each trial. Inactivation of the ventral, infralimbic region of the mPFC increased risky choice selectively when the odds of winning were poor. By contrast, inactivation of the prelimbic and anterior cingulate regions of the dorsal mPFC led to suboptimal reductions in risky choice on good-odds trials. The effects of prelimbic vs anterior cingulate inactivations were associated with context-dependent alterations in sensitivity to reward vs negative feedback, respectively. These results further clarify the distinct yet complementary manners in which separate ACC regions promote optimal risk/reward decision making and complement neuroimaging findings that activity in the human ventral vs dorsal ACC promotes risk aversion or risky choices.
Converging evidence from studies with animals and humans has implicated different parts of the medial prefrontal cortex (mPFC) as key nodes within a broader neural circuitry that mediates different aspects of reward-related decisions involving uncertainty or risk. In humans, these types of decisions have often been assessed using paradigms that inform subjects of the relative likelihood of obtaining rewards should they choose (relatively) safe but smaller rewards vs riskier but potentially more prosperous ones [1, 2]. Neuroimaging studies have revealed that decision making under these conditions is associated with activation of different regions of the anterior cingulate cortex (ACC) [1,2,3,4]. However, the ACC is functionally heterogeneous, and the specific contributions of ACC subregions to guiding choice remain unclear. For example, some neuroimaging studies suggest the dorsal part of the ACC facilitates risk taking, whereas other human imaging and lesion studies suggest that the infragenual region suppresses risky choices [1, 3,4,5,6].
Additional insight into how different mPFC regions regulate risk/reward decision making comes from studies with rats. Along the ventral-dorsal axis, the rat mPFC can be divided into the infralimbic cortex (IL), prelimbic cortex (PL), and dorsal anterior cingulate (dAC), which are anatomically homologous to Areas 25, 32, and 24 of the human ACC, respectively. Preclinical studies of risk/reward decision making have typically used procedures that require rats to rely on internally-generated value representations, wherein they learn about reward probabilities linked to different actions or keep track of choice-outcome histories to estimate which options may be more profitable. Using these assays, we and others identified a role for the mPFC in selecting the most advantageous action among options with various contingencies [9,10,11,12,13]. Specifically, the IL and PL, but not the dAC, were found to play a role in facilitating optimal and flexible decision making [10, 11]. However, this is somewhat at odds with the general finding in the human literature, where ventral and dorsal regions of the mPFC appear to make dissociable contributions to risk/reward decision making. The possibility remains that the manner in which different regions of the mPFC influence decisions guided by cues informing about the reward probabilities associated with different options may differ from the manner in which they influence decisions guided by internal value representations. In this regard, the role of the mPFC in cue-guided decision making has not been assessed in rodents.
To address this issue, we used a novel “Blackjack” task, which requires animals to choose between an option that delivers a Small/Certain reward and a larger reward delivered with varying probabilities (i.e., good vs. poor odds). Critically, auditory cues signaled these probabilities (50% vs. 12.5%) at the start of each trial. Using this assay, we previously reported that the nucleus accumbens (NAc) shell—a main target of IL projections—suppresses risky choices when odds are disadvantageous, consistent with previous work suggesting that the NAc shell and IL are crucial for suppressing inappropriate responses [15,16,17,18,19]. In comparison, inactivation of the NAc core, which receives projections from the dorsal mPFC [20,21,22], induced random patterns of responding. These findings, combined with those from human neuroimaging studies, led us to hypothesize that the ventral (IL) vs. dorsal (dAC and PL) mPFC make differential contributions to biasing risk/reward decisions guided by external cues. Accordingly, we examined how transient inactivation of the IL, PL, and dAC altered performance on the Blackjack task.
Materials and methods
Male Long-Evans rats (225–275 g at time of arrival) were used and were food restricted for the experiment. Details on housing conditions are described in the supplementary materials and methods. All experiments were conducted in accordance with the Canadian Council for Animal Care and were approved by the Animal Care Committee of the University of British Columbia.
Training was conducted in operant chambers. Prior to training on the main task, rats received magazine training, basic lever press training, retractable lever press training, reward magnitude training and a reward magnitude training with a probabilistic component. The timeline for these training procedures is delineated in Fig. 1b-bottom, and details on the chambers and each of the procedures are described in the supplementary materials and methods.
The main task used in this study has been described previously (see Fig. 1a, b). Training consisted of two phases. During the initial phase, sessions consisted of 52 trials, of which the first 32 were forced-choice, followed by 20 free-choice trials. Each trial gave rats a choice between two options: a) choice of the Small/Certain lever, which always delivered one reward pellet; or b) choice of the Large/Risky lever, which delivered four pellets, but with varying probabilities (i.e., 50% or 12.5%). Every 40 s, a trial started with the illumination of the house light and presentation of one of two distinct auditory stimuli (a 3 kHz pure tone or white noise, 80 dB), followed 3 s later by the insertion of one (forced-choice) or two (free-choice) levers into the chamber. The auditory stimuli were presented pseudorandomly an equal number of times over a session. These cues indicated the probability that selecting the risky lever would deliver the Large/Risky four-pellet reward. One cue was associated with “good-odds”, where a risky choice delivered four pellets with 50% probability (e.g., the tone in Fig. 1b). The other cue signaled “poor-odds”, where a risky choice was rewarded with 12.5% probability (e.g., the white noise in Fig. 1b). The probabilities associated with each cue were counterbalanced across rats and remained constant over the duration of the experiment. A response on the certain lever always delivered one pellet, irrespective of which cue was presented. As such, on good-odds trials, the Large/Risky option had a greater expected value compared to the Small/Certain option, while the opposite was true for poor-odds trials.
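The expected-value asymmetry described above follows directly from the stated reward magnitudes and probabilities. As a minimal illustrative sketch (the function and variable names are ours, not part of the task software), the arithmetic is:

```python
# Expected value (in pellets) of each option under the Blackjack task's
# stated contingencies. Names are illustrative only.
def expected_value(pellets, probability):
    return pellets * probability

small_certain = expected_value(1, 1.0)    # certain lever: always one pellet
risky_good = expected_value(4, 0.5)       # good-odds cue: four pellets at 50%
risky_poor = expected_value(4, 0.125)     # poor-odds cue: four pellets at 12.5%

assert risky_good > small_certain   # 2.0 > 1.0: risky is better on good-odds trials
assert risky_poor < small_certain   # 0.5 < 1.0: certain is better on poor-odds trials
```

Thus, the optimal policy is to choose the Large/Risky lever on good-odds trials and the Small/Certain lever on poor-odds trials.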
Once animals exhibited stable choice behavior (~12–19 days; supplementary materials and methods), they started the final phase of training, where sessions consisted of 40 free-choice trials (4–5 days—all free choice). They were then subjected to surgery and retrained on the task until performance was once again stable (5–10 days), after which they received their first microinfusion test day.
Auditory conditional discrimination
Separate groups were trained on an Auditory Conditional Discrimination task [14, 23], which followed the structure of the Blackjack task, except that the auditory cues indicated which lever yielded a food reward (each correct press delivered a two-pellet reward with 100% probability; Fig. 1c). Initial training (magazine training, basic lever press training and retractable lever pressing) was similar to the procedures used for the Blackjack task, except that during retractable lever press training, a press delivered reward with 100% probability (Fig. 1c). Rats were then trained on the discrimination task, in which they learned to press a lever associated with an auditory cue (e.g., 3 kHz tone = right lever; white noise = left lever) in order to receive a two-pellet reward. The auditory cue associated with the correct lever was counterbalanced across rats and remained constant during the experiment.
Rats were trained initially on a 52-trial version (32 forced then 20 free-choice trials). During this phase of training, each auditory cue was presented pseudorandomly and an equal number of times. On forced-choice trials, only the correct lever was inserted. Training continued until an individual rat achieved criterion performance of at least 70% correct over two consecutive sessions (9–10 days). Subsequently they received an additional 3–10 days of training on the all free-choice version of the task (40 trials).
Using standard stereotaxic procedures, rats were implanted with 23-gauge guide cannulae into either the IL (anterior-posterior (AP) = +2.8 mm; medial-lateral (ML) = +/− 0.7 mm from bregma; dorso-ventral (DV) = −4.1 mm from dura), the PL (AP = +3.4 mm; ML = +/− 0.7 mm; DV = −2.8 mm), or the dAC (AP = +2 mm; ML = +/− 0.7 mm; DV = −1.2 mm). Animals were given at least one week to recover before being (re)trained on the task. Additional details are described in the supplementary materials.
Drugs and microinfusion tests
Prior to the first microinfusion test day, animals received a mock infusion (see supplementary materials and methods), and 1–4 days later, they received their first microinfusion test day. A within-subjects design was used. Inactivations were induced using a solution containing the GABAA/B agonists muscimol and baclofen (0.1 μg each; Sigma-Aldrich) dissolved in 0.9% saline. Drug or saline was infused at a volume of 0.4 μl over 89 s. Behavioral testing commenced 10 min after the procedure (see supplementary materials). The day after their first test day, rats were retrained until their performance returned to their pre-test baseline (1–4 days). On the following day, they received a second, counterbalanced microinfusion test day. Previous studies have shown that infusion of these GABA agonists at a similar concentration and volume can induce dissociable effects on risk/reward decision-making tasks when infused into regions separated by ≥1 mm, including the PL, IL or dAC regions of the mPFC [e.g., 9, 24] or the adjacent core and shell regions of the NAc [14, 24, 25]. This is in keeping with electrophysiological studies estimating the functional spread of GABA agonist-induced neural inactivations to be ~1 mm. Given these considerations, it is likely that the dissociable effects reported here are due to suppression of activity within the targeted mPFC region, and not to spread to adjacent regions.
Standard histological procedures were used to visualize placements of the microinjectors within targeted brain regions (see supplementary materials and Figs. 2c, 3e, 4d). Data from rats with asymmetrical placements or those residing outside the borders of the mPFC regions of interest were removed from the analysis (Blackjack task: n = 4; Auditory Conditional Discrimination task: n = 3), which resulted in the inclusion of 14 animals in the IL group, 18 in the PL group and 13 in the dAC group for the Blackjack task. For the Auditory Conditional Discrimination task, the numbers of rats in the IL, PL, and dAC groups with acceptable placements were 6, 9, and 7, respectively.
Experimental design and statistical analyses
For the Blackjack task, the primary dependent variable of interest was the percentage of choices of the risky lever on good vs. poor-odds trials, factoring out trial omissions. These data were analyzed using a two-way repeated measures ANOVA with Treatment (control vs. inactivation) and Odds (good vs. poor) as factors. In these analyses, a main effect of Treatment would indicate a general increase/decrease in the total number of risky choices across both types of trials, whereas a Treatment × Odds interaction indicates that inactivation differentially altered choice on good vs poor-odds trials. Pairwise comparisons used to decompose Treatment × Odds interactions were evaluated against a significance threshold corrected for multiple comparisons (p < 0.025). Response latencies were analyzed using paired t-tests. Across all experiments and treatments, response omissions were rare and did not differ across conditions (see supplementary materials).
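The primary measure—percent risky choice with omissions factored out—can be sketched as follows. This is purely illustrative; the trial-record format and names are our assumptions, not the authors' analysis code:

```python
# Illustrative computation of percent choice of the risky lever on a given
# trial type, with omitted trials excluded from the denominator.
# trials: list of (odds, choice) tuples, odds in {'good', 'poor'},
# choice in {'risky', 'certain', 'omit'}. Hypothetical data format.
def percent_risky(trials, odds_type):
    completed = [choice for odds, choice in trials
                 if odds == odds_type and choice != 'omit']
    if not completed:
        return float('nan')  # no completed trials of this type
    return 100.0 * completed.count('risky') / len(completed)

session = [('good', 'risky'), ('good', 'omit'),
           ('good', 'certain'), ('good', 'risky')]
# Omitted trial excluded: 2 risky out of 3 completed good-odds trials.
print(round(percent_risky(session, 'good'), 1))
```

One value per rat, trial type, and treatment would then enter the Treatment × Odds repeated measures ANOVA.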
Complementary analyses assessed whether changes in choice behavior were accompanied by alterations in sensitivity to reward or negative feedback, in terms of how the outcomes of preceding choices influenced subsequent choice. Specifically, we analyzed the proportion of trials on which rats either chose the risky option again after receiving a Large/Risky reward (win-stay behavior) or switched to the Small/Certain option after a non-rewarded risky choice (lose-shift behavior). Two distinct types of analyses were conducted, the first focusing on how a particular choice was influenced by the outcome of the preceding risky choice in the sequence (win or loss), as we have done previously [14, 27] (Fig. S1–top). Win-stay ratios were calculated by dividing the number of instances where a rat followed a rewarded risky choice with another risky choice by the number of trials on which a large reward was obtained. Lose-shift ratios were calculated by dividing the number of Small/Certain choices that followed a non-rewarded risky choice by the total number of non-rewarded risky choices. We partitioned these trial types by the odds associated with the choice trial, irrespective of the odds associated with the previous trial on which the win or loss occurred. Additional details of these analyses are described in the supplementary materials.
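These ratio definitions can be sketched in a few lines. This is a hypothetical illustration (the data structure and function names are ours), assuming the denominators are rewarded and non-rewarded risky choices, respectively:

```python
# Illustrative win-stay / lose-shift ratios over consecutive trials.
# trials: list of (choice, rewarded) tuples in session order,
# choice in {'risky', 'certain'}, rewarded a bool. Hypothetical format.
def win_stay_lose_shift(trials):
    wins = stays = losses = shifts = 0
    for (prev_choice, prev_rewarded), (curr_choice, _) in zip(trials, trials[1:]):
        if prev_choice != 'risky':
            continue  # both ratios are conditioned on a preceding risky choice
        if prev_rewarded:
            wins += 1
            stays += curr_choice == 'risky'    # stayed with the risky option
        else:
            losses += 1
            shifts += curr_choice == 'certain'  # shifted to the certain option
    win_stay = stays / wins if wins else float('nan')
    lose_shift = shifts / losses if losses else float('nan')
    return win_stay, lose_shift
```

For example, a rewarded risky choice followed by another risky choice, then an unrewarded risky choice followed by a certain choice, yields win-stay and lose-shift ratios of 1.0 each.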
However, as odds vary on a trial-by-trial basis in this task, those on the current trial (i.e., the one on which the stay or shift is assessed) are not necessarily the same as those on the previous trial (i.e., the trial on which the win or loss occurred). We therefore conducted a second type of win-stay/lose-shift analysis based on the outcomes of previous trials of the same type (i.e., good or poor-odds trials) (Fig. S1–bottom). Specifically, in this novel analysis, we examined how the outcome of a risky choice on a good or poor-odds trial influenced choice on the next trial of the same type. For example, if a rat chose risky on a good-odds trial and was not rewarded, we assessed lose-shift behavior on the next good-odds trial, which—based on the task structure—could occur 1–3 trials after that loss. This resulted in four conditions (win | good-odds, loss | good-odds, win | poor-odds, and loss | poor-odds). Under these task conditions, there were very few “win | poor-odds” trials. Furthermore, across the different groups, treatment conditions and trial types, some rats responded only 0–2 times on a particular trial type, which we deemed an insufficient number of observations to include in a particular analysis. To overcome these complications, we opted to analyse data from only three conditions (win-stay | good-odds, lose-shift | good-odds and lose-shift | poor-odds), and did so for the subset of rats that experienced a win or loss three or more times during a session, using three separate one-way ANOVAs (see supplementary materials for additional rationale and description). Whenever one of these supplementary analyses revealed a significant effect, we also analyzed the overall choice data from that subgroup of rats to confirm that the overall effect on risky choice was still apparent.
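The same-trial-type variant amounts to filtering the session down to one trial type before pairing outcomes with subsequent choices. A hypothetical sketch of the lose-shift case (names and data format are ours, not the authors' code):

```python
# Illustrative same-trial-type lose-shift ratio: the outcome of a risky
# choice on a good- or poor-odds trial is paired with the choice on the
# NEXT trial of that same odds type (which may be 1-3 trials later).
# trials: list of (odds, choice, rewarded) in session order; hypothetical format.
def same_type_lose_shift(trials, odds_type):
    same_type = [t for t in trials if t[0] == odds_type]  # keep only this trial type
    losses = shifts = 0
    for prev, curr in zip(same_type, same_type[1:]):
        if prev[1] == 'risky' and not prev[2]:    # non-rewarded risky choice
            losses += 1
            shifts += curr[1] == 'certain'        # shifted on next same-type trial
    return shifts / losses if losses else float('nan')
```

Note that intervening trials of the other odds type (e.g., the poor-odds trial between two good-odds trials) are simply skipped, mirroring the 1–3 trial gaps described above.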
For the Auditory Conditional Discrimination task, the primary dependent variable of interest was the percentage of correct responses, factoring out trial omissions. Trial omissions and response latencies were analyzed using paired t-tests. For both tasks, we also compared the number of pellets earned.
Ventral mPFC: infralimbic cortex
Data from 14 animals with accurate placements were included in the analysis. As can be seen in Fig. 2c, three rats had cannula tips crossing the border between the IL and the dorsal peduncular cortex; removing the data from these rats did not change the results in any way. Under control conditions, rats chose the risky lever more often on trials where cues signaled that the odds of obtaining the larger reward were good compared to those when the odds were poor. Inactivation of the IL selectively increased risky choice on poor-odds trials (Fig. 2a and see Fig. S2 for individual data). This pattern of results was confirmed by a significant Treatment × Odds interaction (F(1,13) = 6.21, p = 0.027), but no main effect of Treatment (F(1,13) = 1.50, p = 0.243). Simple effects analyses confirmed that IL inactivation increased risky choice on poor-odds trials (a ~75% increase above control levels; F(1,13) = 8.80, p = 0.011), although choice biases on these trials remained significantly different from chance levels (one-sample t-test vs 50%, t(13) = 2.17, p < 0.05). In contrast, IL inactivation did not alter choice on good-odds trials (F(1,13) = 0.35, p = 0.564) and did not affect response latencies (t(13) = −0.654, p = 0.525; Table 1), response omissions or the number of earned pellets (see supplementary materials).
We next conducted win-stay/lose-shift analyses assessing how the outcome of a risky choice influenced action selection on the next trial in the sequence. We did not observe significant main effects of Treatment or interactions with other factors (Treatment × Outcome × Odds, Treatment × Outcome, Treatment × Odds; all Fs < 1.58, ps > 0.23; Table S1). However, as odds vary on a trial-by-trial basis in this task, those on the current trial (i.e., the one on which the stay or shift is assessed) are not necessarily the same as those on the previous trial (i.e., the trial on which the win or loss occurred). We therefore conducted a second series of analyses assessing how the outcome of a risky choice influenced choice on the next trial of the same type (i.e., good or poor-odds). Somewhat surprisingly, analysis of these data from subsets of rats that experienced a sufficient number of trials revealed that the increase in risky choice induced by IL inactivation was not associated with changes in reward or negative feedback sensitivity. This was the case on both good-odds [win-stay | good-odds (n = 12): F(1,11) < 1, ns—Fig. 2b-left; lose-shift | good-odds (n = 12): F(1,11) < 1, ns—Fig. 2b-middle] and poor-odds trials [lose-shift | poor-odds (n = 7): F(1,6) < 1, ns—Fig. 2b-right; Table S2]. Thus, suppression of neural activity within the IL increased the tendency to pursue larger, risky rewards despite the presence of cues signaling that the probability of obtaining these rewards was low. However, this increase in suboptimal risky choice was not associated with any discernible change in sensitivity to recently rewarded or non-rewarded choices. The locations of acceptable placements in the IL are displayed in Fig. 2c.
Dorsal mPFC: prelimbic cortex
Data from 18 rats with acceptable placements in the PL were included in the analysis. In stark contrast to the effects observed in the ventral mPFC, inactivation of the PL selectively reduced risky choice on good-odds trials (Treatment × Odds interaction: F(1,17) = 7.59, p = 0.014; main effect of Treatment: F(1,17) = 2.50, p = 0.132; Fig. 3a and see Fig. S2 for individual data). Simple effects analyses confirmed that PL inactivation significantly reduced risky choice on good-odds (a ~20% decrease from control levels; F(1,17) = 16.33, p < 0.001) but not poor-odds trials (F(1,17) < 1, ns). Moreover, choice behavior on good-odds trials was not significantly different from chance after PL inactivation (one-sample t-test vs 50%, t(14) = 1.47, p = 0.16). Inactivation of the PL also slowed decision latencies (t(17) = −2.81, p = 0.012; Table 1) and decreased the number of earned pellets (t(17) = −2.153, p = 0.046), but did not affect response omissions (see supplementary materials).
As in the IL group, analyses of how the outcome of a risky choice influenced choice on the subsequent trial did not reveal any effects of treatment (all Fs < 2.23, ps > 0.15; Table S1). We then assessed win-stay/lose-shift behavior across trials of the same type. Here we observed that the reduction in risky choice on good-odds trials was accompanied by a decrease in win-stay behavior from one good-odds trial to the next. Thus, when rats played risky on a good-odds trial and won, they were less likely to select the risky option on the next good-odds trial in the choice sequence (F(1,15) = 5.77, p = 0.03; Fig. 3b-left). We further confirmed that in this subset of rats (n = 16), PL inactivation reduced risky choice on good-odds trials in a manner similar to the entire group (Treatment × Odds interaction: F(1,15) = 5.61, p = 0.032; good-odds simple effects: F(1,15) = 11.33, p = 0.004; poor-odds simple effects: F(1,15) < 1, ns; Fig. 3b-inset). In comparison, PL inactivation did not affect sensitivity to losses across good-odds (n = 14; F(1,13) < 1, ns—Fig. 3b-middle) or poor-odds (n = 10; F(1,9) < 1, ns—Fig. 3b-right) trials (Table S2). Thus, inactivation of the PL induced a suboptimal pattern of decision making, reducing the tendency to select the Large/Risky option when cues signaled that the probability of obtaining a larger reward was relatively high. This was accompanied by an attenuation of the influence that rewarded risky choices had on action selection from one good-odds trial to the next.
Dorsal mPFC: anterior cingulate cortex
Data from 13 rats with acceptable placements in the dAC were included in the analysis. On the surface, the effects of dAC inactivation were similar to those induced by PL inactivation, in that they selectively decreased risky choice on good-odds trials (Treatment × Odds interaction: F(1,12) = 8.829, p = 0.012; main effect of Treatment: F(1,12) = 0.66, p = 0.434; Fig. 3c and see Fig. S2 for individual data). Simple effects analyses confirmed that this interaction was driven by a decrease in choice of the risky option during good-odds trials (a ~18% decrease relative to control treatments; F(1,12) = 7.67, p = 0.017) and that choice behavior on these trials was not significantly different from chance levels (one-sample t-test vs 50%, t(13) = 1.05, p = 0.31). Furthermore, dAC inactivation did not alter choice on poor-odds trials (F(1,12) = 1.83, p = 0.201). dAC inactivation tended to increase response latencies, although this effect was not statistically significant (t(12) = −1.861, p = 0.088; Table 1), and it did not affect response omissions or the number of earned pellets (see supplementary materials).
Win-stay/lose-shift behavior based on the outcome of the previous trial was not affected by dAC inactivation (all Fs < 1, ns; Table S1). However, dAC inactivation did alter win-stay/lose-shift behavior across trials of the same type. Notably, these effects were distinct from those induced by PL inactivation, even though both treatments altered choice in a qualitatively similar manner. In this instance, rather than affecting reward sensitivity, dAC inactivation increased sensitivity to losses. Thus, dAC inactivation had no effect on win-stay behavior on good-odds trials (n = 11; F(1,10) < 1, ns—Fig. 3d-left). In contrast, in a subset of rats that experienced a sufficient number of trials, these treatments increased the tendency to shift choice towards the Small/Certain option on good-odds trials after a non-rewarded risky choice on a preceding good-odds trial (n = 9; F(1,8) = 49.65, p < 0.001—Fig. 3d-middle; Table S2). We then confirmed that dAC inactivation in the subgroup of rats included in this lose-shift analysis induced changes in risky choice comparable to those of the entire group (Treatment × Odds interaction: F(1,8) = 7.31, p = 0.027; good-odds simple effects: F(1,8) = 8.71, p = 0.018; poor-odds simple effects: F(1,8) = 1.16, p = 0.764; Fig. 3d-inset). dAC inactivation also tended to increase lose-shift behavior on “lose-shift | poor-odds” trials, although in this instance we could only include data from four rats and thus the effect only approached statistical significance (F(1,3) = 6.93, p = 0.078—Fig. 3d-right). Collectively, these analyses reveal that inactivation of the dAC induced a suboptimal pattern of decision making, reducing the tendency to select the Large/Risky option when cues signaled that the probability of obtaining a larger reward was relatively high, similar to inactivation of the PL. However, these treatments also increased the tendency to shift away from the risky option after a non-rewarded choice.
The locations of acceptable placements in the PL and dAC are presented in Fig. 3e.
Dorsal/ventral mPFC comparisons
Given the distinct effects of inactivation of the ventral vs dorsal mPFC on decision making, we conducted a formal analysis to directly compare effects between groups, using a three-way mixed ANOVA with Treatment and Odds as within-subjects factors. As inactivation of the PL and dAC induced qualitatively similar effects on choice, these data were combined in the analysis, yielding two levels of mPFC Region (ventral vs dorsal) as a third, between-subjects factor. Confirming the general impression from the separate comparisons of each region, this analysis revealed a significant Region × Treatment interaction (F(1,43) = 4.19, p = 0.049), which reflected the increase in risky choice induced by IL inactivation contrasted with the decrease in risky choice following PL/dAC inactivation. This provides additional support for the idea that dorsal vs ventral mPFC regions play differential roles in refining action selection during cue-guided risk/reward decision making. Another analysis explored whether the dorso-ventral coordinate of the cannula placement was predictive of the behavioral effects (supplementary materials, Fig. S3). These analyses revealed no linear relationship between the depth of infusions within different mPFC regions and the effects on choice, suggesting that the differential effects are attributable to inactivation of discrete mPFC subregions.
Auditory conditional discrimination
The Blackjack task may be viewed as a complex form of conditional discrimination wherein otherwise arbitrary stimuli direct action selection, with the additional requirement of integrating information about reward magnitudes and probabilities. We thus felt it imperative to assess whether these treatments may also affect the implementation of conditional rules. Separate groups of rats were trained on an Auditory Conditional Discrimination task that had a similar structure but did not require integration of information about variations in reward magnitude and probability [14, 23]. These experiments used fewer rats than the Blackjack studies, as we have been able to detect significant differences between treatment conditions using ns of 7–8 with this task [14, 23], and power analyses revealed that ns in this range provide statistical power of at least 0.80.
Inclusion of animals with acceptable placements in the IL, PL, and dAC resulted in six, nine and seven animals in each respective group (Fig. 4d). In contrast to the differential effects induced by inactivation of the mPFC regions on the Blackjack task, similar treatments had no effect on performance of this simpler auditory conditional discrimination [Treatment effects: F(1,5) < 1, ns; F(1,8) = 2.40, p = 0.160; F(1,6) = 1.92, p = 0.215; for IL, PL and dAC, Fig. 4a–c, respectively]. Furthermore, response latencies, response omissions and the number of earned pellets were not altered by inactivation of any of the mPFC regions (all ps > 0.1; Table 1, supplementary materials). This lack of effect on choice or response latencies suggests that the alterations in cue-guided risk/reward decision making induced by inactivation of the IL, PL or dAC are unlikely to be attributable to a disruption in the implementation of conditional rules, or to attentional, motivational or other non-specific deficits.
The main findings of the present study were that the ventral (IL) and dorsal (PL, dAC) regions of the mPFC play dissociable yet complementary roles in promoting optimal risk/reward decisions when they are guided by external cues informing about the probabilities of obtaining larger rewards.
The ventral mPFC suppresses inappropriate risky choices
IL inactivation increased risky choice selectively on poor-odds trials, suggesting this region aids in suppressing the urge to pursue larger rewards when discriminative stimuli indicate that they are unlikely to be received. This result complements other findings demonstrating that the IL serves to inhibit inappropriate or non-rewarded actions in situations where cues signal that those actions may not be rewarded [28,29,30]. The present results extend these findings to show that activity in the IL regulates action selection in more complex contexts requiring the use of discriminative external cues to choose between options associated with different reward magnitudes and probabilities. In this regard, these effects on choice were not associated with changes in reward or negative feedback sensitivity. However, it should be emphasized that with the Blackjack task used here, reliance on a win-stay/lose-shift strategy based on the outcome of the previous choice is not advantageous, as reward probabilities on a particular trial are independent of previous ones. In addition, the use of informative cues and extensive training would be expected to diminish the tendency to use information about previous outcomes to guide choice, even on trials of the same type. Indeed, under control conditions, win-stay and lose-stay (the inverse of lose-shift) values on good-odds trials were ~70% in the IL group. Thus, once rats were well-trained, they were not strongly influenced by recent feedback and were just as likely to choose risky on the next good-odds trial after a win or loss on a previous good-odds trial, and this remained unaffected by inactivation. Notably, this lack of effect on feedback sensitivity following IL inactivation is in keeping with the finding that similar treatments do not affect win-stay/lose-shift behavior on a probabilistic reversal task.
Thus, it appears that the IL promotes a general suppression of inappropriate risky choices, tempering the allure of larger yet unlikely rewards in a manner uninfluenced by the outcome of recent actions.
It is notable that IL inactivation induced effects similar to those of inactivation of the more caudal region of the medial NAc shell, a major target of IL projections. In this regard, IL-NAc shell circuitry has been described extensively for its role in inhibiting inappropriate behaviors [16, 17, 33, 34]. Together, these results imply that when discriminative stimuli inform a decision maker that certain actions are unlikely to be rewarded, IL activity serves to bias behavior away from these options via interactions with the NAc shell.
The dorsal mPFC promotes more profitable risky choices
Inactivation of dorsal mPFC regions altered choice in a manner opposite to that observed after similar treatments of the ventral mPFC. Suppressing either PL or dAC activity reduced risky choice on good-odds trials, suggesting both of these regions bias choice towards riskier options when discriminative cues indicate they may be more profitable. Yet, there were important differences in the manner in which inactivation of these two regions altered how context-relevant wins or losses biased subsequent choices. PL inactivation decreased risky choices following previously rewarded risky choices on the same type of trial, and also slowed decision latencies. In comparison, the reduction in risky choice on good-odds trials induced by dAC inactivation was driven by increased sensitivity to losses.
The effects of PL inactivation reported here are in line with previous observations that this region promotes choice of the best option during different forms of risk/reward decision making, which depending on the task structure may manifest as either decreases or increases in risky choice [9, 10, 12, 35]. Note however that even though PL inactivation decreased win-stay tendencies in the present study, it is unlikely that this region uniformly facilitates win-stay behavior. Indeed, we previously observed that PL inactivation actually increased win-stay tendencies in rats performing a probabilistic reversal learning task, wherein reward seeking is guided by internal representations of reward history. These discrepant findings suggest that the specific involvement of the PL in action selection during reward seeking can vary depending on the task at hand. Thus, in situations where decisions are guided by internal representations of reward contingencies, the PL promotes flexible patterns of choice when animals need to explore different options to ascertain which ones may be more profitable in changing environments [9, 31, 36, 37]. Under these conditions, optimal reward seeking may rely on adherence to win-stay or lose-shift strategies based on the outcome of recent actions [27, 38]. Note that such a strategy would be relatively ineffective in the Blackjack task, where cues, rather than recent outcomes, indicate which options may be more profitable. With this in mind, the PL has also been implicated in producing and maintaining situation-appropriate responding in the context of rewards [39,40,41]. With respect to the present findings, activity in the PL appears to promote the tendency to play risky when external stimuli indicate that risky choices are likely to be rewarded, and may reinforce previously rewarded risky choices in a context-appropriate manner.
The finding that dAC inactivation induced sub-optimal decision making is particularly novel, as it contrasts with others that have failed to implicate this region in modulating risk preferences [9, 10, 12]. However, these previous studies used tasks that required rats to rely on internally-generated value representations, as opposed to the task used here that probed how external cues are used to guide choice. In this regard, several studies have implicated the dAC in the use of explicit cues to select appropriate actions in dynamic environments [42,43,44]. As such, it appears the dAC plays a more prominent role in guiding risk/reward decision making when discriminative cues inform a decision maker about which action will yield the best outcome on each trial.
The finding that dAC inactivation increased risk aversion following reward omissions comports with contemporary theory that this region mediates error and conflict processing [43, 45,46,47,48]. In this regard, negative feedback in the Blackjack task may act as a form of interference. As cued reward probabilities are independent from trial to trial, excessive reliance on negative feedback after a choice may ineffectively bias decisions away from better options. Taken together, these results suggest the dAC may serve to reduce the aversive impact that previously non-rewarded choices exert over subsequent action selection. More generally, the observation that the feedback effects of both PL and dAC inactivation were specific to the outcomes of trials of the same type, rather than the most recent choice, suggests that these regions may monitor outcome history over a series of choices in a context-dependent manner.
Inactivation of either mPFC region did not alter accuracy on a simpler conditional discrimination task, where probabilities and reward magnitude were kept constant between options. This is in keeping with previous reports that lesions of the dAC, PL, IL, or the entire mPFC [42, 49], or inactivation of the PL or IL, did not alter simpler conditional discriminations. Performance of this simpler task is mediated in part by subcortical regions including the NAc and potentially the amygdala. The finding that these prefrontal regions make differential contributions to decision making on the Blackjack task but not to more basic conditional discriminations suggests that they are preferentially recruited as part of a broader cortical-subcortical network to aid cue-guided reward seeking in situations requiring integration of information about the reward probabilities and magnitudes associated with different actions.
The win-stay/lose-shift analysis detected differences in response to feedback across trials of the same type, which could span up to three trials after the trial on which the outcome (win or loss) occurred. Thus, it is possible that changes in feedback sensitivity observed after inactivation of the dorsal mPFC may be related to perturbations in certain aspects of working memory, which have been observed after manipulations of the mPFC [e.g., 52,53,54,55]. However, a general disruption in working memory would be expected to affect feedback on both types of trials non-specifically, as opposed to the more selective effects observed here. In addition, lesions that predominantly affect the IL or PL affect working memory, whereas lesions of the dAC region do not impair delayed visual object recognition or spatial working memory [54, 55]. In this regard, IL inactivation-induced changes in choice were independent of “delayed” feedback sensitivity, whereas dAC inactivation caused selective alterations in lose-shift behavior. These observations render it less likely that the effects reported here are due primarily to alterations in working memory functions.
Comparison with human studies of mPFC decision making
Fundamental differences in the paradigms used to assess risk/reward decision making across species have made it challenging to reconcile human neuroimaging findings with those probing how the rodent mPFC is involved in these processes. Specifically, human studies often use tasks that provide explicit information about reward probabilities associated with upcoming choices, whereas rodent assays often require subjects to infer the most advantageous action from trial-by-trial feedback. By using an assay that measured cue-guided risk/reward decision making, we revealed previously uncharacterized roles of the ventral vs. dorsal regions of the rat mPFC in risk/reward decision making that complement findings in the human literature. The increase in risky choice induced by IL inactivation is strikingly similar to that observed in humans with lesions of the ventromedial PFC, including Area 25, and to negative correlations observed between ventral ACC activity and risk taking in healthy subjects. Moreover, neuroimaging studies have shown that human cingulate Areas 32/24 (homologous to rat PL/dAC) display increased activity when subjects make risky choices after being informed explicitly about the odds associated with different options [1, 3, 4, 6]. This meshes well with our observation that the PL/dAC promotes profitable risky choices. These results highlight the importance of careful translation of assays between species, and the caution required when extrapolating findings across paradigms.
One interesting point arising from our findings relates to subtle differences in the contribution of the PL vs. dAC. Specifically, human neuroimaging studies often report risk-related activity in the dorsal ACC as a whole, with loci of activation in Area 32, Area 24, or both. In comparison, our studies were able to parse differential contributions of these regions. It is important to emphasize that our ability to tease apart these contributions came from an in-depth analysis of animals’ responses to feedback, an approach that is not always implemented in human neuroimaging studies. Nevertheless, Areas 32 and 24 of the human ACC have been implicated widely in sensitivity to feedback, specifically errors and reward omissions [56, 57] and rewards. It therefore may be that Areas 32 and 24 are dissociable in terms of feedback sensitivity, although this hypothesis remains to be formally tested.
Collectively, the present results further refine our understanding of how different regions of the mPFC contribute to risk/reward decision making. Our IL findings are congruent with the idea that activity in this region suppresses inappropriate risky choices. In comparison, dorsal mPFC regions do not uniformly drive risk taking or risk aversion per se. Instead, their contribution depends greatly on the task context. The PL may aid in determining whether an exploration vs exploitation strategy is more beneficial, which may facilitate win-stay behavior during cue-guided decisions and suppress it in other situations. Similarly, the dAC may evaluate and integrate response conflict and risk-related information provided by external stimuli, as well as information about errors or non-rewarded actions, to allocate appropriate attentional resources, which can serve to suppress ineffective responses to losses. As such, these dorsal mPFC regions work cooperatively, likely in combination with other striatal and amygdalar circuits, to promote optimal cue-guided risk/reward decisions through distinct yet complementary mechanisms.
Rogers RD, Owen AM, Middleton HC, Williams EJ, Pickard JD, Sahakian BJ, et al. Choosing between small, likely rewards and large, unlikely rewards activates inferior and orbital prefrontal cortex. J Neurosci. 1999;19:9029–38.
Kuhnen CM, Knutson B. The neural basis of financial risk taking. Neuron. 2005;47:763–70.
Cohen MX, Heller AS, Ranganath C. Functional connectivity with anterior cingulate and orbitofrontal cortices during decision-making. Cogn Brain Res. 2005;23:61–70.
Christopoulos GI, Tobler PN, Bossaerts P, Dolan RJ, Schultz W. Neural correlates of value, risk, and risk aversion contributing to decision making under risk. J Neurosci. 2009;29:12574–83.
Clark L, Bechara A, Damasio H, Aitken MRF, Sahakian BJ, Robbins TW. Differential effects of insular and ventromedial prefrontal cortex lesions on risky decision-making. Brain J Neurol. 2008;131:1311–22.
Fishbein DH, Eldreth DL, Hyde C, Matochik JA, London ED, Contoreggi C, et al. Risky decision making and the anterior cingulate cortex in abstinent drug abusers and nonusers. Brain Res Cogn Brain Res. 2005;23:119–36.
Heilbronner SR, Rodriguez-Romaguera J, Quirk GJ, Groenewegen HJ, Haber SN. Circuit-based corticostriatal homologies between rat and primate. Biol Psychiatry. 2016;80:509–21.
Zeeb FD, Robbins TW, Winstanley CA. Serotonergic and dopaminergic modulation of gambling behavior as assessed using a novel rat gambling task. Neuropsychopharmacology. 2009;34:2329–43.
St Onge JR, Floresco SB. Prefrontal cortical contribution to risk-based decision making. Cereb Cortex. 2010;20:1816–28.
Rivalan M, Coutureau E, Fitoussi A, Dellu-Hagedorn F. Inter-individual decision-making differences in the effects of cingulate, orbitofrontal, and prelimbic cortex lesions in a rat gambling task. Front Behav Neurosci. 2011;5:22.
Paine TA, Asinof SK, Diehl GW, Frackman A, Leffler J. Medial prefrontal cortex lesions impair decision-making on a rodent gambling task: reversal by D1 receptor antagonist administration. Behav Brain Res. 2013;243:247–54.
Zeeb FD, Baarendse PJJ, Vanderschuren LJMJ, Winstanley CA. Inactivation of the prelimbic or infralimbic cortex impairs decision-making in the rat gambling task. Psychopharmacology. 2015;232:4481–91.
Orsini CA, Heshmati SC, Garman TS, Wall SC, Bizon JL, Setlow B. Contributions of medial prefrontal cortex to decision making involving risk of punishment. Neuropharmacology. 2018;139:205–16.
Floresco SB, Montes DR, Tse MMT, van Holstein M. Differential contributions of nucleus accumbens subregions to cue-guided risk/reward decision making and implementation of conditional rules. J Neurosci. 2018;38:1901–14.
Peters J, LaLumiere RT, Kalivas PW. Infralimbic prefrontal cortex is responsible for inhibiting cocaine seeking in extinguished rats. J Neurosci. 2008;28:6046–53.
Blaiss CA, Janak PH. The nucleus accumbens core and shell are critical for the expression, but not the consolidation, of Pavlovian conditioned approach. Behav Brain Res. 2009;200:22–32.
Keistler C, Barker JM, Taylor JR. Infralimbic prefrontal cortex interacts with nucleus accumbens shell to unmask expression of outcome-selective Pavlovian-to-instrumental transfer. Learn Mem. 2015;22:509–13.
Piantadosi PT, Yeates DCM, Wilkins M, Floresco SB. Contributions of basolateral amygdala and nucleus accumbens subregions to mediating motivational conflict during punished reward-seeking. Neurobiol Learn Mem. 2017;140:92–105.
Piantadosi PT, Yeates DCM, Floresco SB. Cooperative and dissociable involvement of the nucleus accumbens core and shell in the promotion and inhibition of actions during active and inhibitory avoidance. Neuropharmacology. 2018;138:57–71.
Sesack SR, Deutch AY, Roth RH, Bunney BS. Topographical organization of the efferent projections of the medial prefrontal cortex in the rat: an anterograde tract-tracing study with Phaseolus vulgaris leucoagglutinin. J Comp Neurol. 1989;290:213–42.
Vertes RP. Differential projections of the infralimbic and prelimbic cortex in the rat. Synapse. 2004;51:32–58.
Mailly P, Aliane V, Groenewegen HJ, Haber SN, Deniau J-M. The rat prefrontostriatal system analyzed in 3D: evidence for multiple interacting functional units. J Neurosci. 2013;33:5718–27.
Auger ML, Meccia J, Floresco SB. Regulation of sustained attention, false alarm responding and implementation of conditional rules by prefrontal GABAA transmission: comparison with NMDA transmission. Psychopharmacology. 2017;234:2777–92.
Dalton GL, Phillips AG, Floresco SB. Preferential involvement by nucleus accumbens shell in mediating probabilistic learning and reversal shifts. J Neurosci. 2014;34:4618–26.
Stopper CM, Floresco SB. Contributions of the nucleus accumbens and its subregions to different aspects of risk-based decision making. Cogn Affect Behav Neurosci. 2011;11:97–112.
Martin JH, Ghez C. Pharmacological inactivation in the analysis of the central control of movement. J Neurosci Methods. 1999;86:145–59.
St Onge JR, Stopper CM, Zahm DS, Floresco SB. Separate prefrontal-subcortical circuits mediate different components of risk-based decision making. J Neurosci. 2012;32:2886–99.
Ghazizadeh A, Ambroggi F, Odean N, Fields HL. Prefrontal cortex mediates extinction of responding by two distinct neural mechanisms in accumbens shell. J Neurosci. 2012;32:726–37.
Pfarr S, Meinhardt MW, Klee ML, Hansson AC, Vengeliene V, Schönig K, et al. Losing control: excessive alcohol seeking after selective inactivation of cue-responsive neurons in the infralimbic cortex. J Neurosci. 2015;35:10750–61.
Gutman AL, Ewald VA, Cosme CV, Worth WR, LaLumiere RT. The infralimbic and prelimbic cortices contribute to the inhibitory control of cocaine-seeking behavior during a discriminative stimulus task in rats. Addict Biol. 2017;22:1719–30.
Dalton GL, Wang NY, Phillips AG, Floresco SB. Multifaceted contributions by different regions of the orbitofrontal and medial prefrontal cortex to probabilistic reversal learning. J Neurosci. 2016;36:1996–2006.
Hurley SW, West EA, Carelli RM. Opposing roles of rapid dopamine signaling across the rostral-caudal axis of the nucleus accumbens shell in drug-induced negative affect. Biol Psychiatry. 2017;82:839–46.
Floresco SB, McLaughlin RJ, Haluk DM. Opposing roles for the nucleus accumbens core and shell in cue-induced reinstatement of food-seeking behavior. Neuroscience. 2008;154:877–84.
Ambroggi F, Ghazizadeh A, Nicola SM, Fields HL. Roles of nucleus accumbens core and shell in incentive-cue responding and behavioral inhibition. J Neurosci. 2011;31:6820–30.
de Visser L, Homberg JR, Mitsogiannis M, Zeeb FD, Rivalan M, Fitoussi A, et al. Rodent versions of the Iowa gambling task: opportunities and challenges for the understanding of decision-making. Front Neurosci. 2011;5:109.
Ragozzino ME. The contribution of the medial prefrontal cortex, orbitofrontal cortex, and dorsomedial striatum to behavioral flexibility. Ann N Y Acad Sci. 2007;1121:355–75.
Floresco SB, Block AE, Tse MTL. Inactivation of the medial prefrontal cortex of the rat impairs strategy set-shifting, but not reversal learning, using a novel, automated procedure. Behav Brain Res. 2008;190:85–96.
Jenni NL, Larkin JD, Floresco SB. Prefrontal dopamine D1 and D2 receptors regulate dissociable aspects of decision making via distinct ventral striatal and amygdalar circuits. J Neurosci. 2017;37:6200–13.
Killcross S, Coutureau E. Coordination of actions and habits in the medial prefrontal cortex of rats. Cereb Cortex. 2003;13:400–8.
Durstewitz D, Seamans JK. The dual-state theory of prefrontal cortex dopamine function with relevance to catechol-O-methyltransferase genotypes and schizophrenia. Biol Psychiatry. 2008;64:739–49.
Moorman DE, Aston-Jones G. Prefrontal neurons encode context-based response execution and inhibition in reward seeking and extinction. Proc Natl Acad Sci USA. 2015;112:9472–7.
Haddon JE, Killcross S. Prefrontal cortex lesions disrupt the contextual control of response conflict. J Neurosci. 2006;26:2933–40.
Kennerley SW, Walton ME, Behrens TEJ, Buckley MJ, Rushworth MFS. Optimal decision making and the anterior cingulate cortex. Nat Neurosci. 2006;9:940–7.
Torregrossa MM, Gordon J, Taylor JR. Double dissociation between the anterior cingulate cortex and nucleus accumbens core in encoding the context versus the content of pavlovian cocaine cue extinction. J Neurosci. 2013;33:8370–7.
Shima K, Tanji J. Role for cingulate motor area cells in voluntary movement selection based on reward. Science. 1998;282:1335–8.
Kennerley SW, Behrens TEJ, Wallis JD. Double dissociation of value computations in orbitofrontal and anterior cingulate neurons. Nat Neurosci. 2011;14:1581–9.
Ragozzino ME, Rozman S. The effect of rat anterior cingulate inactivation on cognitive flexibility. Behav Neurosci. 2007;121:698–706.
Beckmann M, Johansen-Berg H, Rushworth MFS. Connectivity-based parcellation of human cingulate cortex and its relation to functional specialization. J Neurosci. 2009;29:1175–90.
Delatour B, Gisquet-Verrier P. Lesions of the prelimbic-infralimbic cortices in rats do not disrupt response selection processes but induce delay-dependent deficits: evidence for a role in working memory? Behav Neurosci. 1999;113:941–55.
Marquis J-P, Killcross S, Haddon JE. Inactivation of the prelimbic, but not infralimbic, prefrontal cortex impairs the contextual control of response conflict in rats. Eur J Neurosci. 2007;25:559–66.
Aizenberg M, Rolón-Martínez S, Pham T, Rao W, Haas JS, Geffen MN. Projection from the amygdala to the thalamic reticular nucleus amplifies cortical sound responses. Cell Rep. 2019;28:e4.
Gisquet-Verrier P, Delatour B. The role of the rat prelimbic/infralimbic cortex in working memory: not involved in the short-term maintenance but in monitoring and processing functions. Neuroscience. 2006;141:585–96.
Yoon T, Okada J, Jung MW, Kim JJ. Prefrontal cortex and hippocampus subserve different components of working memory in rats. Learn Mem. 2008;15:97–105.
Kesner RP, Hunt ME, Williams JM, Long JM. Prefrontal cortex and working memory for spatial response, spatial location, and visual object information in the rat. Cereb Cortex. 1996;6:311–8.
Ragozzino ME, Adams S, Kesner RP. Differential involvement of the dorsal anterior cingulate and prelimbic-infralimbic areas of the rodent prefrontal cortex in spatial working memory. Behav Neurosci. 1998;112:293–303.
Brown JW, Braver TS. A computational model of risk, conflict, and individual difference effects in the anterior cingulate cortex. Brain Res. 2008;1202:99–108.
Fukunaga R, Purcell JR, Brown JW. Discriminating formal representations of risk in anterior cingulate cortex and inferior frontal gyrus. Front Neurosci. 2018;12:553.
Bissonette GB, Powell EM, Roesch MR. Neural structures underlying set-shifting: roles of medial prefrontal cortex and anterior cingulate cortex. Behav Brain Res. 2013;250:91–101.
Paxinos G, Watson C. The rat brain in stereotaxic coordinates. Compact 6th edn. New York: Academic Press; 2005.
The authors would like to thank Paula E. Macleod for assistance with data collection.
Funding and disclosure
This work was supported by a Michael Smith Foundation for Health Research/BC Schizophrenia Society Foundation Post-Doctoral Fellowship Award (Grant ID #16489) to MvH and a Project Grant from the Canadian Institutes of Health Research (PJT-162444) to S.B.F. The authors declare no competing interests.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
van Holstein, M., Floresco, S.B. Dissociable roles for the ventral and dorsal medial prefrontal cortex in cue-guided risk/reward decision making. Neuropsychopharmacol. 45, 683–693 (2020). https://doi.org/10.1038/s41386-019-0557-7