Introduction

Many decisions, such as choosing a restaurant to dine in or foraging for food in the wild, require the organism to accumulate and integrate information across multiple domains, including reward value (e.g., food), effort (e.g., distance), and potential punishment (e.g., gastric malaise, predation). Much progress has been made in quantifying, and delineating the neural substrates of, cost-benefit decision-making in tasks that manipulate the magnitude, probability, effort, and timing of obtaining reward [1, 2]. However, less research has been directed at quantifying decisions that are made under an approach-avoidance conflict as a result of exposure to competing/mixed outcomes, despite historical interest in the subject [3,4,5,6,7]. Furthermore, the underlying neural circuitry of approach-avoidance decision-making remains to be elucidated, an important undertaking given that approach-avoidance decisions are likely dysregulated in psychiatric diseases [8,9,10].

The ventral hippocampus (VH) has emerged more recently as a key area involved in emotional regulation, distinct from the role of its dorsal counterpart in spatial learning and memory [11]. The VH has been implicated in the control of anxiety [11,12,13], extinction of fear conditioning [14], extinction of active avoidance [15], in addition to its role in regulating maze-based cued approach-avoidance decisions [16,17,18], and foraging-related and operant-based approach-avoidance decisions in humans [19, 20]. However, its role in operant decision-making in preclinical studies, in which animals have to make discrete choices under approach-avoidance conflict has not been well studied, with some studies reporting an absence of effect with anterior hippocampus inactivation (in marmosets) [21]. One important consideration is that the majority of studies revealing significant contributions of VH in emotional regulation have utilised tests that are conducted in extinction-like scenarios, involving the presentation of conditioned or innate stimuli in the absence of outcomes, or exposure to situations of potential, but no actual threat. It is therefore unclear if the VH is critically involved in decision-making tasks that involve reward and punishment outcomes, alongside the orbitofrontal cortex (OFC), insula, and subgenual ACC [22, 23], or whether the VH is selectively involved in cue-driven decision-making.

We therefore developed a novel operant cued conflict-based decision-making task that enabled the investigation of the role of the VH in decision-making that involves choosing between two cued options under conditions in which the outcomes are delivered, and not delivered (in extinction). Given the evidence discussed above, we predicted that chemogenetic inhibition of excitatory neurons of VH would selectively disrupt cued approach-avoidance choice decisions made under extinction conditions. Animals were first trained in multiple blocks of increasing approach-avoidance conflict, to choose between a high reward option with varying shock intensities (conflict) on one lever, and a low reward option (reward only) on another lever. Varying the shock intensity between blocks allowed us to observe the point of subjective equality (PSE), the conflict level at which animals exhibited equal preference between the conflict and low reward options, and discrimination sensitivity (DS), the animals’ ability to differentiate responding between blocks. In addition, unlike most other decision-making tasks, our task was based on a differential reinforcement of low rates of responding (DRL) schedule of reinforcement, in which animals were required to wait for at least 6 s before responding to obtain the outcome. This ensured that animals were given sufficient time to appraise the cues associated with the conflict and reward options, and to minimize the incidence of impulsive choices, since HC lesions have previously been shown to induce impulsive choice in a delayed discounting task [24]. Furthermore, the DRL schedule enabled us to calculate deviations from optimality [25].

Materials and methods

Subjects

Subjects were 22 male Long Evans rats (Charles River, QC, Canada), weighing ~400–600 g at the time of surgery. They were housed in groups of two in a 21 °C room, under a 12 h light/dark cycle (lights on at 7:00 A.M.). Water was available ad libitum throughout the experiment. Before commencing behavioral testing, food was restricted to maintain animals’ weights at 90% of their free-feeding weight to promote motivation to engage in behavioral tasks. All behavioral testing took place in accordance with the ethical and legal requirements under Ontario’s Animals for Research Act, the Canadian Council of Animal Care, and approval of the Local Animal Care Committee.

Surgery

Rats were divided into two groups (n = 11 hM4Di-injected group; n = 11 GFP-injected controls) and underwent bilateral adeno-associated virus (AAV) injection surgeries. Rats were anaesthetized with isoflurane gaseous anesthetic (5% for induction, 2–3% for maintenance, Benson Medical, ON, Canada) and placed in a stereotaxic frame (Stoelting Co, IL). Briefly, an incision was made at the skull midline and small holes were created via a dental drill to target the VH bilaterally (AP −5.8 mm; ML ± 5.4 mm; DV −6.5 mm). A 10 μl Hamilton syringe was then lowered into the VH for bilateral infusion of 0.5 μl of AAV containing Designer Receptors Exclusively Activated by Designer Drugs hM4Di (AAV8-CAMKII-hM4Di-mCherry, Addgene, MA) or a control virus (AAV8-CAMKII-GFP, Addgene, MA). The AAV delivery occurred over a period of 5 min with a controlled electronic microinjector (Stoelting Co, IL). Once the injection was completed, the syringe remained at the delivery site for an additional 5 min to allow AAV particles to diffuse away from the injector tip before the scalp was stitched back together. After surgery, the animals were allowed a recovery period of a minimum of 7 days with postsurgical care and food available ad libitum before behavioral testing.

Drug injections

All rats received an intraperitoneal (IP) injection of either 0.9% saline or clozapine-N-oxide dihydrochloride (CNO, R&D Systems, MN, dissolved in sterile saline at a volume of 1 ml/kg) 45 min prior to outcome and extinction test sessions. For the elevated plus maze (EPM) sessions, all rats received an IP injection of CNO (1 mg/kg) 45 min prior to the test session.

Behavioral procedures

Apparatus

Testing took place in eight operant chambers (30.5 cm L × 24.1 cm W × 29.2 cm H, Med Associates, VT) contained within a sound-attenuated box with a ventilation fan. Each chamber had two 4 cm wide retractable levers positioned on the sidewall, located 2 cm either side of a central food well (magazine) and 5 cm above the floor composed of parallel shock bars. A light panel was positioned 3 cm above each lever. The shock bars were connected to shock generators, which produced mild foot shocks (0.5 s, 0.05–0.65 mA). A pellet dispenser connected to the central food well delivered 45 mg chocolate sucrose pellets (TestDiet, MO) as reward stimuli, and an infrared detector monitored every nose poke entry into the food well. The chamber was illuminated by a 1.8 W, 17 V house light located at the top right corner of the chamber.

Pre-training

All rats received two magazine training sessions in which they were placed in the operant box for 1 h and received sucrose pellets on a variable interval 20 s schedule of reinforcement. Rats were then trained to lever press for sucrose pellets under a Fixed Ratio 1 schedule. To avoid side bias, the lever to be inserted (left vs. right) was randomly chosen until 50 rewards were obtained from each side. A session ended after 30 min, or after animals pressed both levers 50 times. Rats were moved to the next phase of training after two consecutive sessions with at least 40 presses on each side.

Differential reinforcement of low rates of responding (DRL) schedule

All rats were trained to respond for a reward pellet under a delayed DRL schedule of reinforcement following lever press training (Fig. 1A). Each trial started with the insertion of the left or right lever and rats were required to wait a fixed minimum time (6 s) before responding to obtain a single sucrose pellet. If the subject responded prematurely (incorrect response), then the lever was retracted without reward and the trial ended 0.5 s after retraction. When a correct response was executed, the lever retracted immediately, and a reward pellet was delivered to the magazine after a brief (0.5 s) delay. The 0.5 s delay was implemented in order to accommodate the introduction of shocks (0.5 s duration) in the next phase of training, such that the shock delivery co-terminated with the reward delivery. After 30 sessions, training was halted for the rats to undergo surgical infusions of the hM4Di/GFP construct.
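The response rule described above can be sketched in a few lines of Python. This is only an illustration and not the software that controlled the operant chambers: the function names are hypothetical, and treating a press at exactly 6 s as correct is an assumption based on the "at least 6 s" wording in the Introduction.

```python
DRL_WAIT_S = 6.0  # minimum wait after lever insertion (from the text)


def classify_response(latency_s: float) -> str:
    """Label a lever press by its latency from lever insertion.

    Assumption: a press at exactly 6 s counts as correct.
    """
    return "correct" if latency_s >= DRL_WAIT_S else "premature"


def percent_correct(latencies_s) -> float:
    """Percentage of responses that met the DRL wait requirement."""
    if not latencies_s:
        raise ValueError("no responses to score")
    n_correct = sum(classify_response(t) == "correct" for t in latencies_s)
    return 100.0 * n_correct / len(latencies_s)


# Hypothetical latencies (s) for four trials: two premature, two correct.
example_latencies = [2.0, 6.5, 7.0, 5.0]
example_score = percent_correct(example_latencies)
```

Under this rule, only the "correct" presses would be followed by a pellet after the 0.5 s delay; premature presses end the trial without reward.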

Fig. 1: Two alternative forced choice DRL training and tests.

Timeline of training (top): A Rats were first trained on the differential reinforcement of low rates of responding (DRL6) schedule, in which all responses made after a fixed schedule (6 s) were rewarded with a sucrose pellet 0.5 s after the lever press. Premature responses (shorter than the schedule) were not rewarded following the 0.5 s timeout. B Following successful acquisition of DRL responding for reward, animals were trained on a cued two alternative forced choice task (CH-3B). In this phase of training, rats underwent 3 blocks of trials in each session. In each block, rats were first given 10 forced trials in which they were exposed to the outcomes associated with the two levers and cues (1 reward pellet only vs. 2 reward pellets and shock (conflict) option). They were then given a minimum of 20 ‘correct’ free choice trials in which both levers and cues were presented, and rats responded to one of two options on a DRL schedule to receive the associated outcomes. The conflict option was signalled by a flickering light (of different frequency according to the shock intensity) and the reward-only option was signalled by a constant light. The blocks differed in the magnitude of shock associated with the conflict option (0.25, 0.30, 0.35 mA). C Rats received further training, until they reached a version of the DRL choice task (CH-7B) in which rats were administered 7 blocks of a minimum of 30 trials (10 forced, >20 correct free choice trials), with the following shock levels associated with the conflict option: 0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65 mA. Timeline of testing (bottom), with accompanying treatments (CNO vs. Saline). D Following the training phase, animals began their testing period with an EPM test on day 85, followed by 3 test sessions of the DRL choice task (CH-7B) under the influence of CNO (1 mg/kg, IP) or Saline injections.
After 3 days of retraining, animals underwent 3 further test sessions of CH-7B, this time with a reversal of the drug treatment for a within-subject comparison of CH-7B performance with and without CNO injections. E Animals were then administered a shorter version of the CH-7B task (CHS-7B), in preparation for upcoming extinction test sessions. F After 9 training sessions, animals underwent two rounds of one CHS-7B session under extinction conditions, with CNO and Saline injections administered in counterbalanced order. During the ‘extinction session’, no outcomes were delivered upon the pressing of either of the levers, with the animals’ choice decisions guided primarily on the basis of visual cues associated with both options. On the final day, animals received one last session of CHS-7B, before sacrifice.

DRL two alternative forced choice training

Following the DRL training, all rats underwent the two alternative forced choice training in which responding on one of two extended levers (side of lever counterbalanced) delivered one sucrose pellet under the same DRL schedule (reward only option), while responding on the other lever led to a mixed outcome (two pellets and shock − conflict option). Training took place in a number of steps, with the initial step involving the administration of three blocks of a minimum of 30 trials per session, with each block representing increasing shock intensities associated with the conflict option (0.25, 0.30, 0.35 mA, Fig. 1B). Each block began with 10 forced choice trials in which one of the two levers was presented in random order (5 trials each), and animals were required to respond under DRL 6 s to receive a single reward pellet or a mixed outcome (two reward pellets and shock). Responding on the lever led to the retraction of the lever for 0.5 s, and the delivery of associated outcomes, followed by a re-extension of the lever to start the next trial. If a response was not emitted within 12 s, the lever was retracted, and the intended outcome for that trial was delivered. Importantly, the extension of the levers was accompanied by the presentation of a light stimulus positioned over each lever, signalling the outcome associated with the lever. A constant light was paired with the reward only option and a flickering light was paired with the mixed outcome (conflict) option. The light associated with the conflict option flickered at 1.33 Hz at the lowest shock intensity (0.25 mA), with a decrease in the rate of flickering with increasing shock intensity/block (0.30 mA: 1 Hz, 0.35 mA: 0.8 Hz). Following the forced choice trials, animals were administered free choice trials in which two levers were presented, and animals were required to respond on one of the two levers under DRL 6 s to receive the associated outcome, with the 12 s response deadline removed.
In this phase, a minimum of 20 correct responses was required (waiting >6 s to respond on either lever) to advance to the next block. Animals underwent further training steps (Fig. 1B, C, Supplementary Methods) until stable choice performance was established in training sessions consisting of seven blocks of a minimum of 30 trials (10 forced, >20 free choice) marked by increasing shock intensity (0.05–0.65 mA, in increments of 0.1 mA), and a light cue of decreasing frequencies (4, 2, 1.33, 1, 0.8, 0.66, 0.57 Hz) accompanying the conflict option.
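The seven-block session layout can be written out as a small Python sketch. The shock intensities and trial minimums come from the text; the rule that the flicker period grows by 0.25 s per block is an assumption, introduced here only because its reciprocals reproduce the reported frequencies to within rounding. All variable and function names are hypothetical.

```python
N_BLOCKS = 7

# Shock intensities: 0.05 to 0.65 mA in 0.1 mA steps (from the text).
shock_ma = [round(0.05 + 0.1 * i, 2) for i in range(N_BLOCKS)]

# Assumption: flicker period of 0.25 s in block 1, lengthening by 0.25 s per block.
flicker_period_s = [0.25 * (i + 1) for i in range(N_BLOCKS)]
flicker_hz = [1.0 / p for p in flicker_period_s]

# Frequencies as reported in the text, for comparison.
reported_hz = [4, 2, 1.33, 1, 0.8, 0.66, 0.57]
max_mismatch_hz = max(abs(a - b) for a, b in zip(flicker_hz, reported_hz))


def min_free_trials(correct_per_block: int, n_blocks: int = N_BLOCKS) -> int:
    """Minimum number of free choice trials in a session."""
    return correct_per_block * n_blocks


# Pair each block's shock intensity with its cue frequency.
blocks = list(zip(shock_ma, flicker_hz))
```

With 20 correct free choice responses required per block, `min_free_trials(20)` gives the session minimum for the full task, and `min_free_trials(10)` gives the minimum for a shortened variant with 10 correct responses per block.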

DRL choice outcome test

For the DRL choice outcome test sessions (Fig. 1D), half of the rats underwent three consecutive sessions of DRL choice test under the effect of 1 mg/kg CNO, then three consecutive sessions without injections (for drug washout), followed by three consecutive sessions with saline injections, while the other half received the drug and saline injections in reverse order. The test sessions were identical in structure to the training sessions, with the exception that the first 10 forced choice trials were not administered. Therefore, in each session, animals were required to complete a minimum of 20 free choice trials per block (seven blocks, total of 140 trials minimum).

DRL choice extinction test

Following the DRL choice outcome tests, animals were retrained on a shorter version of the task in preparation for the extinction test (Fig. 1E). The free choice phase was now reduced to achieving ten correct responses before proceeding to the next block (i.e., a minimum of 70 trials/session). After ten sessions, animals were tested in the DRL choice task for one session under extinction conditions (Fig. 1F), in which the session proceeded as during training (same temporal structure, and same minimum number of correct trials), with the exception that no outcomes were delivered. Half of the animals received saline while the other half received 1 mg/kg CNO prior to the extinction test session. Animals were then retrained in the shorter DRL choice task for three further sessions, and received one further test in extinction with the drug/saline injections reversed.

Elevated plus maze (EPM)

All rats underwent a standard ethological test that measures innate expressions of approach-avoidance conflict in the EPM, which consists of two open arms (40 cm L × 10 cm W × 2 cm H) and two closed arms (40 cm L × 10 cm W × 30 cm H) with a central platform (10 cm L × 10 cm W). Each rat was placed individually in the maze for 5 min, beginning at the central hub of the maze facing an open arm. The arm entries and time spent in each arm were measured and compared. This task was performed prior to the delayed DRL choice test with CNO injections.

Cfos immunohistochemistry

Following the final extinction test under the influence of either saline or CNO (1 mg/kg), all hM4Di rats were anaesthetized with a lethal dose of pentobarbital and perfused intracardially with 0.9% saline and then a 4% paraformaldehyde solution. The brains were then extracted and kept in 4% paraformaldehyde and 20% sucrose solution for 24 h, and were cut into 50-μm thick coronal sections using a vibratome (Leica VT1200S). All sections then underwent immunostaining for cfos, according to established protocols in our laboratory (see Supplementary Methods), and were mounted on gelatin-coated slides and air dried before being coverslipped with Fluoroshield Mounting medium with DAPI for nuclear staining (Abcam, UK).

Cell imaging and counting

hM4Di and GFP expression, and c-Fos immunoreactivity were visualized at ×4, ×10, ×20 and ×40 magnification using the NIKON Ni-U upright fluorescent microscope (NIKON, NY). GFP expressing cells and c-Fos positive (+) cells conjugated with TSA-fluorescein were visualized using the FITC filter, while hM4Di-expressing cells were visualized using the TexRed filter. Qualitative analysis of the extent of the hM4Di expression was achieved by superimposing ×4 images from each animal and generating an intensity mapping of the areas most consistently expressing hM4Di. Quantification of c-fos+ cells, and cells double labeled with cfos and hM4Di/mCherry was achieved using two images of coronal sections of the VH taken at ×4 and ×20 magnification from each animal. VH subregions were demarcated into six regions of interest [vCA1 (vCA1v, vCA1d), vCA2, vCA3 (vCA3v, vCA3d), vDG], and the number of cfos+, and cfos + mCherry-positive cells within those boundaries were counted using ImageJ [26] and Fiji [27] software.

Statistical analysis

Data were analysed using statistical software R [28] and lme4 [29]. DRL responses from every three sessions of acquisition training and choice tests (except the extinction test) were pooled to obtain more representative probability distributions. These distributions were fitted with an exponential inverse-Gaussian (exponential Wald) distribution using a custom MATLAB script that uses maximum likelihood estimation [30], and the fit parameters were used to calculate a number of DRL parameters including the wait times/inter-response times (IRT), timing uncertainty (coefficient of variation, CV) and optimality related measures (Expected Reward Rate (ERR), Percentage of Maximum Expected Reward Rate (PMERR)). For acquisition data, DRL parameters for the first and last three sessions were compared to assess learning. For DRL choice test data, DRL parameters for each lever option were calculated separately (conflict and reward only) for each block. However, since the number of reward only responses at lower shock intensities and number of conflict responses at higher shock intensities were few in number, we analysed the % correct responses (responses with >6 s wait time) and DRL responses from the three highest intensity blocks (0.45–0.65 mA) for the reward only option and the three lowest intensity blocks (0.05–0.25 mA) for the conflict option. Additionally, preference for the reward only option was calculated by dividing the number of reward only responses by the total number of responses in each block. PSE and DS were calculated by fitting psychometric curves using a custom MATLAB script [31] to the preference for the reward only option for each subject. Linear mixed modelling was then employed to analyse all parameters for both conflict and reward only choices in the outcome and extinction test sessions, using the intensity of the shock (×7), experiment (outcome, extinction sessions), drug (saline, CNO), and virus (hM4Di-mCherry or GFP) as fixed effects, and intercepts for subjects as random effects.
Post-hoc analysis was conducted using the emmeans package with Tukey familywise correction. Degrees of freedom were calculated using the Satterthwaite method.
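To illustrate how a preference score and a PSE can be derived from block-wise choice counts, the following Python sketch fits a two-parameter logistic curve by ordinary least squares on logit-transformed preferences. It is not the custom MATLAB script cited as [31]: the logistic form, the clipping constant `eps`, the use of the logistic slope as a stand-in for DS, and all function names are assumptions made for this example.

```python
import math


def preference_reward_only(n_reward_only: int, n_conflict: int) -> float:
    """Reward only responses divided by all responses in a block (from the text)."""
    total = n_reward_only + n_conflict
    if total == 0:
        raise ValueError("no responses in this block")
    return n_reward_only / total


def _logit(p: float, eps: float) -> float:
    # Clip proportions away from 0 and 1 so the logit stays finite.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fit_logistic(shock_ma, prefs, eps: float = 0.01):
    """Least-squares line through logit(preference) against shock intensity.

    Returns (pse, slope): pse is the intensity at which the fitted
    preference equals 0.5; slope (per mA) is used here as an assumed
    proxy for discrimination sensitivity.
    """
    n = len(shock_ma)
    if n != len(prefs) or n < 2:
        raise ValueError("need at least two matched (intensity, preference) pairs")
    y = [_logit(p, eps) for p in prefs]
    mean_x = sum(shock_ma) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in shock_ma)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(shock_ma, y))
    if sxx == 0 or sxy == 0:
        raise ValueError("flat data: PSE is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope, slope


# Hypothetical choice counts (reward only, conflict) for the seven blocks.
intensities = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65]
counts = [(2, 18), (4, 16), (7, 13), (10, 10), (14, 6), (17, 3), (19, 1)]
prefs = [preference_reward_only(r, c) for r, c in counts]
pse, slope = fit_logistic(intensities, prefs)
```

In this parameterization, a leftward shift of the curve (a lower PSE) means that the preference for the reward only option reaches 50% at a weaker shock, while an unchanged slope would mean that responding still differentiates between blocks to the same degree.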

Finally, random data sampling was carried out to establish that significant effects found between outcome and extinction conditions were not driven by differences in trial numbers between the two conditions (Supplemental Information, Table S1).

Results

CNO-induced reduction in Cfos+ cells in the vCA3v and vCA1d

All animals showed robust expression of hM4Di (mCherry+ cells) or GFP within the confines of VH (Fig. 2A–D), with consistent transfection of the vCA3, vCA2, and vCA1. Analysis of the number of cfos positive (+) cells in the VH of saline-injected hM4Di-expressing animals revealed the greatest and least levels of cfos activation to be in the vCA1 and vCA3d, respectively. To characterize the inhibition pattern more fully (Fig. 2E), we analysed the number of cfos+ cells following the final DRL choice test, which revealed that the group injected with CNO exhibited a reduced number of cfos+ cells in the vCA1d (t(5) = 3.12, p = 0.026) and vCA3v (t(5) = 3.309, p = 0.021), but not in vCA1v (t(5) = 0.82, p = 0.45), vCA3d (t(5) = 0.97, p = 0.37), vCA2 (t(5) = 1.44, p = 0.20), or vDG (t(5) = 1.29, p = 0.129) compared to those injected with saline. Analysis of the number of double labeled (cfos+/hM4Di (mCherry)+) cells also confirmed the same pattern of results (Fig. 2F), with a significant reduction in the number of double labeled cells in the vCA1d (t(5) = 3.19, p = 0.021) and vCA3v (t(5) = 3.05, p = 0.028) areas, but no other areas (all p > 0.14). No significant differences were observed in the number of non-hM4Di cfos+ cells between the CNO- and saline-injected animals (all p > 0.08, CNO: 4.19 ± 1.0, SAL: 3.75 ± 0.97 across all subareas).

Fig. 2: c-Fos immunohistochemistry analysis of hM4Di-mediated VH inhibition.

A, B Targeted locations and expression of the control (AAV-CAMKII-GFP) and hM4Di (AAV-CAMKII-hM4Di-mCherry) virus in the ventral hippocampus. C Representative ×4 images showing mCherry/hM4Di expression, Cfos expression, DAPI stain, and ×40 images showing double labeled (mCherry and cfos) cells in Saline- vs. CNO-administered rats. D Heatmap showing subareas of the VH consistently expressing mCherry/hM4Di across animals (AP −5.6 to −6.5 from Bregma as per [67]). E, F The number of cfos+ cells and cfos+/mCherry+ cells in hM4Di-expressing rats that had received saline or Clozapine-N-Oxide (CNO) were counted from ×4 images in six regions of interest: vCA1d, vCA1v, vCA2, vCA3v, vCA3d, and vDG. There was a significant reduction in cfos+ and cfos+/mCherry+ cell counts in the vCA1d and vCA3v regions. *p < 0.05.

Acquisition of DRL schedule of responding for reward

Comparisons of performance measures in the first and last three sessions of responding under a 6 s DRL schedule revealed that animals improved their performance as training progressed (Fig. 3), with an increase in average wait time (F(1, 21.244) = 54.350, p < 0.0001, Fig. 3B) and a decrease in timing uncertainty (CV, F(1, 17.349) = 263.812, p < 0.0001, Fig. 3C), as well as increases in expected reward rate (ERR, F(1, 19.988) = 515.867, p < 0.0001, Fig. 3D) and proximity to optimal duration (PMERR, F(1, 42) = 15.611, p < 0.001, Fig. 3E).

Fig. 3: DRL responding for reward.

A Normalized probability distribution histograms for the first three and last three sessions of training. An exponential Wald distribution was fitted to the response histograms to obtain a measure of timing uncertainty (CV) and average wait time (IRT, μ). There was a significant increase in the average IRT (B), decrease in timing uncertainty, CV (C), increase in expected reward rate, ERR (D), and increase in proximity to optimality, PMERR (E) with extensive training. All data shown as mean values ± SEM.

Acquisition of DRL choice task

In this phase of training, rats were trained to choose between lever pressing on one lever for the reward only option (one pellet) and on another for a mixed outcome conflict option (two pellets and shock). The shock intensity in the conflict option was increased across seven blocks (0.05 mA → 0.65 mA) to examine the change in preference for the reward only option. After extensive training, animals exhibited discriminative responding between blocks (Fig. 4A), as evidenced by a significant reduction in the preference for the reward-only option at lower shock intensities (0.05 mA (p < 0.0001), 0.15 mA (p < 0.0001), 0.25 mA (p < 0.0001), 0.35 mA (p < 0.0001), and 0.45 mA (p < 0.001)) when performance averages of the first three and last three sessions of training were compared (F shock×session (6, 286) = 5.17, p < 0.0001). Crucially, the DRL choice acquisition pattern of animals belonging to the hM4Di and GFP-control groups was not significantly different (all virus group interactions p > 0.29).

Fig. 4: Cued operant conflict decision-making with VH inhibition.

A Preference for one pellet option for the first three and last three sessions of choice training prior to the start of testing sessions with drug manipulations. Animals were biased towards the reward only option in the first three sessions, but this bias was abolished with training. B, C Preference for the reward only option during testing and extinction for hM4Di and GFP virus groups. CNO injections in the hM4Di group induced a significant increase in preference for the reward only option compared to saline injections during extinction. D This was accompanied by a significant reduction in the point of subjective equality (PSE) in the CNO-injected hM4Di group during the extinction session, but not the discrimination sensitivity (DS). There were no changes to the number of trials completed overall in extinction or outcome sessions with CNO injections (dotted lines show the minimum number of trials possible per block across three sessions for the outcome condition, and one session for the extinction condition). Finally, the % correct responses (responses >6 s) for the conflict option in the CNO-injected hM4Di group were significantly increased. E In contrast, no significant change was observed in the GFP group with CNO manipulations. All data shown as mean values ± SEM.

VH inhibition increases the preference of reward only option, but only under extinction conditions

We then assessed the effect of chemogenetically inhibiting the VH on DRL choice performance with outcomes, and under extinction in which only the cues associated with the outcomes were presented. All animals received injections of saline or CNO (1 mg/kg) in a within-subject design, prior to each of six sessions of 7-block DRL choice testing with outcomes, and each of two sessions under extinction conditions (Fig. 1 Testing). Chemogenetic VH inhibition significantly increased the preference for the reward-only option across the seven blocks for the extinction test only, when compared to performance after saline injections (Fig. 4B, F drug×virus×experiment (1, 553.13) = 5.24, p < 0.03, post-hoc: p < 0.01). In contrast, VH inhibition did not alter choice performance when animals experienced the delivery of the outcomes (p = 0.47). Similarly, CNO injections did not induce significant changes in DRL choice performance in the control-GFP group in either of the experimental conditions (both p > 0.32, Fig. 4C). When we compared choice decisions made under extinction and with outcomes separately for the hM4Di and GFP groups, we found that saline-injected hM4Di animals decreased their preference for the reward only option in extinction, compared to outcome conditions (F drug×experiment (1, 270) = 5.33, p < 0.03, post-hoc: p < 0.05), but this change was absent in CNO-injected animals (p = 0.19). In the GFP group, however, animals decreased their preference for the reward only option specifically at the two highest shock intensities under extinction conditions (F experiment (1, 287.45) = 6.27, p < 0.02, F experiment×shock (6, 283.13) = 3.97, p < 0.001, post-hoc: 0.55 mA, p < 0.001, 0.65 mA, p < 0.01), irrespective of drug injection (all drug effects p > 0.22). Thus, VH inhibition led to a significant change in choice preference under extinction conditions.

Chemogenetic VH inhibition in the hM4Di group also led to a decrease (leftward shift) in PSE during extinction (Fig. 4D, F drug×virus×experiment (1, 61.32) = 4.13, p < 0.05, post-hoc: p < 0.005), but not during the outcome test (p = 0.56) when compared with performance after saline injections. Furthermore, when the PSE from testing under outcome and extinction conditions was compared, saline-injected hM4Di animals had lower PSE values under extinction (p < 0.02). In contrast, no change in PSE was observed across drug conditions (CNO or saline) or experimental conditions (extinction vs. outcome) in the control-GFP group (all p > 0.28, Fig. 4E). The DS was not altered under any of the conditions in the hM4Di and GFP groups (all p > 0.15). These results indicate that VH inhibition altered the parity between reward- and shock-associated cues (increased avoidance of the conflict cues) but not the rats’ ability to discriminate between cues signaling shock intensities under extinction conditions.

VH inhibition did not impact other performance measures, such as the overall percentage of completed trials over the minimum number of trials required to progress to the next block (Fig. 4D, E, no interactions between drug, virus and shock, all p > 0.11) or the percentage of correct trials for the reward only option (no significant interactions between drug, virus and shock, all p > 0.82), during outcomes-present or extinction conditions. However, VH inhibition increased the percentage of correct responses for the conflict option, but only in the extinction test (Fig. 4B, F drug×virus×experiment (1, 231) = 8.884, p < 0.005, post-hoc: p < 0.001).

VH inhibition increases the average wait time for conflict option, but only in extinction

To further elucidate the response patterns for the conflict and reward only options, the IRT and other DRL parameters were analyzed separately. Analysis of the conflict option response data revealed that CNO-induced VH inhibition in the hM4Di group increased the average IRTs during the extinction test, as compared to when the same animals were injected with saline (Fig. 5A, C, F drug×virus×experiment (1, 231) = 10.95, p < 0.003, post-hoc: p < 0.0001), and when compared to the IRTs during the outcome test (p = 0.0002). VH inhibition did not affect average IRTs in the outcome test (p = 0.84), when compared to saline injection conditions. CNO injections in the control-GFP group failed to induce any changes in the average IRTs during tests with outcomes (p = 0.35) or in extinction (p = 0.41). Furthermore, control-GFP group IRTs in the extinction test did not differ from those in the outcome test (p = 0.29). Unlike responding for the conflict option, there was no effect of VH inhibition or CNO injections on average IRTs (p > 0.09 for all analyses) in either the hM4Di or GFP group for the reward only option (Fig. 5B, D). Thus, animals waited longer to respond to the conflict cue but not reward cue when the glutamatergic cells in their VH were inhibited.

Fig. 5: Temporal parameters for conflict and reward only options.

CNO injections induced a significant increase in average wait time during extinction but not during testing selectively in the hM4Di group. The average wait time (IRT), coefficient of variation (CV), expected reward rate (ERR), and percentage of maximum expected reward rate (PMERR) for the conflict option for hM4Di group (A), for the reward only option for hM4Di (B), conflict option for GFP (C), and reward option for GFP virus groups (D) are shown. All data shown as mean values ± SEM.

Generally, across both hM4Di and GFP groups, the CV (F(1, 231) = 8.2160, p < 0.01) and PMERR (F(1, 231) = 4.129, p < 0.05) for the conflict option were higher during the extinction test compared to the outcome test. Similarly, the CV (F(1, 230.877) = 19.087, p < 0.001) was increased for the reward only option, while PMERR (F(1, 218.73) = 5.617, p < 0.03) and ERR were decreased (F(1, 218.892) = 25.0612, p < 0.0001). Notably, there was no effect of group or treatment, or interactions between these for CV, ERR, or PMERR (p > 0.082).

VH inhibition increases the time spent in the central compartment of the EPM

We also assessed the effect of VH inhibition on the EPM, an ethological test of anxiety. Independent sample t-test comparisons between the virus groups under the effect of CNO injection revealed that the time spent in the open and closed arms for the hM4Di group was not significantly different from that of the control-GFP group (open: t(20) = 0.001, p = 0.99; closed: t(20) = 1.15, p = 0.26; Fig. 6B). On the other hand, time spent in the central compartment was significantly higher for the hM4Di group compared to the control-GFP group (t(20) = 3.24, p = 0.004). Analyses of the number of entries into the open and closed arms revealed no significant differences between the hM4Di and GFP groups (Fig. 6C; open: t(20) = 0.46, p = 0.65; closed: t(20) = 0.48, p = 0.63).

Fig. 6: Elevated plus maze (EPM) test.

Time spent, and number of entries made, in EPM compartments (center, open and closed arms) under the effects of CNO injections. The hM4Di group spent significantly more time in the central compartment compared to the GFP group. All data shown as mean values ± SEM.

Discussion

In the present study, we trained rats in a novel cued operant approach-avoidance conflict choice task in which rats chose between a high reward option with varying intensities of shock and a low reward option that was never paired with shock. Following steady state performance, the excitatory projections from the VH were chemogenetically inhibited, and its effect on choice performance with the outcomes present, and in the presence of cues only, was examined. VH inhibition led to an overall increase in preference for the low reward option across different shock intensities, but only during cued decision-making when outcomes were not presented (i.e. extinction). VH inhibition also selectively increased the average duration animals waited before pressing the conflict option, but not the reward-only option, indicating that the VH-mediated effect was specific to the presence of an approach-avoidance conflict. The lack of an effect during testing with outcomes suggests that, unlike manipulations of basolateral amygdala [32, 33], OFC [32], or rostromedial tegmental nucleus [34], the VH is not engaged in the online calculation of outcome values in guiding decision-making; rather, it is critical during cued, but non-reinforced decision-making under motivational conflict.

One of the major goals of this study was to investigate approach-avoidance conflict in a goal-directed two-alternative forced-choice task in which parity between reward and shock can be calculated by presenting animals with two options: the “conflict option” associated with the delivery of high reward with varying levels of shock to vary the degree of approach-avoidance conflict, and the “reward only option” associated with low reward. Many established rat decision-making tasks such as the rat gambling task [35], Iowa gambling task [36], delay discounting task [37], probabilistic discounting task [38] (for review [39]), risky decision task [40] and two-lever operant conflict task [41] also allow animals to make a choice between multiple options associated with differing reward and punishment (timeout, delay, effort, shock) contingencies. However, most are designed to capture risk-taking behavior that hinges on the presence of uncertainty, rather than an approach-avoidance conflict, with the use of probabilistic delivery of aversive outcomes (e.g., shock). We opted to manipulate shock intensity rather than probability of shock to ensure that the conflict option produces choice conflict in every trial. Additionally, outcomes were delivered following a forced wait time (DRL schedule) in order to minimize the occurrence of impulsive and/or perseverative responding, and to ensure sufficient exposure to the cues signaling each option. Controlling for impulsivity was particularly important in the present study, as an increase in impulsive responding has been previously documented following HC lesions [24, 42], and would potentially have generated chance level responding in choice decision-making across all levels of shock due to attenuated deliberation of the cued options, and DS in VH-inhibited animals. In addition, the DRL schedule has a well-defined optimal wait time that is independent of the value of the outcome [30, 43]. Maximizing the reward rate under this schedule depends on the trade-off between reward probability based on timing uncertainty and the average wait time. This allowed us to quantify the optimal wait time for each option based on timing uncertainty.
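
The trade-off described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that wait times are normally distributed around a target mean with a standard deviation proportional to that mean (a fixed CV), that a response is rewarded only if the wait meets the schedule criterion, and that the expected reward rate is the reward probability divided by the mean wait. The function names, the 6 s criterion, and the CV values are hypothetical and are not taken from the present study or from [30, 43].

```python
import math


def reward_probability(mean_wait, cv, criterion):
    """P(wait >= criterion), assuming wait ~ Normal(mean_wait, cv * mean_wait)."""
    sd = cv * mean_wait
    z = (criterion - mean_wait) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_reward_rate(mean_wait, cv, criterion):
    """Assumed definition: rewarded responses per second of waiting."""
    return reward_probability(mean_wait, cv, criterion) / mean_wait


def optimal_wait(cv, criterion):
    """Grid search (0.5x to 3x the criterion) for the rate-maximizing mean wait."""
    candidates = [criterion * k / 1000.0 for k in range(500, 3001)]
    return max(candidates, key=lambda w: expected_reward_rate(w, cv, criterion))


def pct_of_max_rate(mean_wait, cv, criterion):
    """Rate at a given mean wait as a percentage of the rate at the optimum."""
    best = expected_reward_rate(optimal_wait(cv, criterion), cv, criterion)
    return 100.0 * expected_reward_rate(mean_wait, cv, criterion) / best


if __name__ == "__main__":
    criterion = 6.0  # hypothetical DRL criterion in seconds
    for cv in (0.1, 0.3):
        best = optimal_wait(cv, criterion)
        pct = pct_of_max_rate(criterion, cv, criterion)
        print(f"CV={cv}: optimal mean wait ~ {best:.2f} s, "
              f"percent of maximum rate when aiming at the criterion = {pct:.1f}%")
```

Under these assumptions, noisier timing (a larger CV) pushes the optimal mean wait further beyond the criterion. Multiplying the reward by a constant magnitude would rescale the rate without moving its maximum, which is one way to see why the optimal wait time does not depend on outcome value.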

Our data implicate the VH selectively in guiding approach-avoidance choice decisions based on cue information alone (in the absence of outcomes), which is in accord with accumulating evidence that the VH may be critical in the processing of cues that signal potential threat, or conflicting outcomes, rather than actual threat/outcomes. Existing ethological tests of anxiety (e.g., EPM, open field) and cue-based AA decision-making tasks that the VH is strongly linked to [16,17,18, 20, 42, 44] are all administered in the presence of innate or learned environmental cues (open and closed arms, cues in the maze/screen) that predict or threaten the delivery of future appetitive or aversive outcomes to guide decision-making, without the actual subjective experience of the outcomes. Together with more established evidence of the role of the VH in the contextual control of appetitive responses [45,46,47,48], and extinction of cued fear or active avoidance responses [14, 15], these findings lend substantial support to the view that the VH is preferentially engaged in mediating cued control over appetitively and aversively motivated behaviors and decisions, but not in goal-directed behavior driven by outcome value. This view is further corroborated by a recent marmoset study in which it was shown that GABAR-mediated inactivation of the anterior hippocampus failed to induce any changes in responding for the preferred but punished (conflict) option in a two-choice decision-making task in which responding was guided predominantly by a probabilistic delivery of aversive outcomes during cued reward-seeking [21]. In humans, approach-avoidance tasks involving virtual foraging (foraging for reward tokens in the presence of a sleeping predator) [19] and cue-based operant-like responding (making approach/avoid button presses to visual stimuli associated with reward or punishment) [20] have been associated with anterior HPC involvement. A common feature of these tasks is that participants do not know the ultimate outcome on each trial while foraging (whether the predator will wake up) or deciding which button to press (whether approaching will lead to reward/punishment). There are perhaps some similarities between this task feature and an extinction condition in which a rat maintains responding in the absence of an outcome.

In seeking to further understand the contrasting effects of VH inhibition on reinforced and non-reinforced (extinction) decision-making, the ‘task state’ theory [49, 50] is worth considering. According to this account, the OFC plays a role in determining the ‘current state’ or location in a cognitive map of the task, which encapsulates the task-specific structure and the associations between cues, outcomes and actions, and, importantly, in distinguishing states that are perceptually indistinguishable, yet different, as in a reinforced vs. extinction session. It has been suggested that the VH works together with the OFC in the representation of this map [51] and that VH output is required for the OFC to encode state representation [52]. In the present study, VH inhibition caused animals to generate similar choice preference in the extinction session to that in the reinforced (outcome) session, while saline injections increased preference for the conflict option during the extinction session, potentially reflecting a failure of the VH-inhibited animals to discern the subtle change in task state with the extinction session, and an inability of these rats to update the change in outcome contingencies, which is strongly reminiscent of the previously reported effects of whole HPC or OFC lesions on attenuating extinction [47, 53, 54]. However, while compelling, this account cannot fully explain the observed increase in the animals’ wait times selectively for the conflict option in the extinction session, but not the outcome test sessions. Further research is warranted to probe the exact role of the VH in task state representation.

The specificity of the increase in the wait times of the VH-inhibited animals for the conflict option in the extinction session indicates that the observed effect is related to decision-making in the presence of a cued approach-avoidance conflict, rather than a corollary of a motor deficit, timing deficit, or general behavioral inhibition. The time spent before responding can be conceived as a measure of confidence in the choice [55] and its increase highlights a possible disruption in decision-making [32]. Similarly, we observed an increase in the time spent in the central compartment of the EPM with VH inhibition, which, unlike the times spent in the open or closed arms, has been previously correlated with measures of decision-making processes rather than anxiety [56]. Together, these findings implicate the VH, and its excitatory glutamatergic neurons, in decision-making under cued approach-avoidance conflict in the absence of outcomes.

The observed avoidance of the cued conflict option and leftward shift of the PSE in the absence of any reinforcement indicates that in VH-inhibited animals, the perceived parity between the reward- and shock-associated cues had become dysregulated. We believe these data to further indicate that the perceived negative valence of the conflict cue may have increased under extinction, implicating the VH in facilitating cued approach decisions in the face of motivational conflict under normal circumstances, consistent with the role that we had previously attributed to the vCA1, and not vCA3/DG subregions of the VH in a maze-based cued approach-avoidance conflict task that was also administered under extinction conditions [16, 57]. An alternative account of an increase in the perceived positive valence of the low reward option is incompatible with the current results as the conflict cue signals a higher value of reward. Of note is the observation that under extinction conditions, the control groups exhibited the opposite pattern of preferring the conflict option across all shock intensities in the hM4Di saline condition. This shift in preference for the conflict option was accompanied by an increase in PSE only in the hM4Di saline control condition, which may be reflective of the value of the shock-associated cue extinguishing faster than the reward-associated cue. The absence of a change in PSE in the GFP-control groups is most likely due to the change in preference for the conflict option occurring only at the highest shock levels, as a change in PSE represents altered preference across all shock intensities.
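
To make the notion of a PSE shift concrete, the following Python sketch estimates the PSE as the shock intensity at which the conflict option is chosen on 50% of trials, using linear interpolation between adjacent shock levels. This is an illustrative assumption rather than the fitting procedure used in the present study, and the function name, shock levels and choice percentages are hypothetical.

```python
def point_of_subjective_equality(shock_levels, pct_conflict):
    """Interpolate the shock level at which conflict choice crosses 50%.

    Returns None if the choice curve never crosses 50%.
    """
    points = list(zip(shock_levels, pct_conflict))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        crosses = (y0 - 50.0) * (y1 - 50.0) <= 0.0
        if crosses and y0 != y1:
            return x0 + (50.0 - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical data: shock intensity (mA) and % choice of the conflict option.
shocks = [0.0, 0.1, 0.2, 0.3]
baseline = [90.0, 70.0, 30.0, 10.0]
shifted = [80.0, 40.0, 20.0, 5.0]

pse_baseline = point_of_subjective_equality(shocks, baseline)
pse_shifted = point_of_subjective_equality(shocks, shifted)
# A smaller PSE means indifference is reached at a weaker shock (leftward shift).
leftward_shift = pse_shifted < pse_baseline
```

In this toy example the second curve reaches indifference at a weaker shock than the first, which is what a leftward shift of the PSE denotes.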

Finally, while we observed a VH inhibition-induced alteration in a decision-making parameter (central compartment time) in the EPM, we failed to observe the widely reported anxiolytic effect of VH lesions/inactivation [11, 58,59,60,61,62] in the present study. It is possible that the more targeted inhibition of glutamatergic cells residing in a small band of vCA3 and vCA1 (vCA1d) in the present study, as opposed to larger, non-cell type specific inactivation of VH cells employed in previous studies, led to a more focal effect of disrupting decision-making parameters of approach-avoidance conflict processing. Existing research remains inconclusive on the effect of subfield-specific manipulations on ethological tests of anxiety, particularly with reports of varying effectiveness of vDG and vCA1 disruption on anxiety [16, 57, 63, 64]. The existence of functional/anatomical/genetic heterogeneity in the vCA1 itself is well documented, with subpopulations of neurons that are differentially responsive to anxiety, spatial, and goal-directed tasks (e.g., [65, 66]), thus raising the possibility that behavioral expressions of anxiety, and decision-making parameters of approach-avoidance conflict, may be subserved by distinct neuronal subdomains within the VH.

In conclusion, using a novel cued approach-avoidance conflict decision-making task combined with a chemogenetic approach that allowed selective targeting of excitatory neurons in the VH, the present study extends, and offers novel insight into, our understanding of VH function in approach-avoidance conflict processing in a number of important ways. First, we have demonstrated that the excitatory neurons within a band of the vCA3 and vCA1 are preferentially involved in choice decisions that are made on the basis of conflicting cues in the absence of outcomes. Second, by allowing the independent measurement of two decision-making parameters, wait time and choice, the present task enabled us to identify a subarea in the VH that is critical in the deliberation and initiation of cue-elicited decision-making under two conflict situations: choosing a potential small reward over a large reward paired with punishment, or choosing safety (closed arms of the EPM) over potential reward and threat (open arms). These findings have implications for neuropsychiatric diseases in which approach-avoidance conflict decision-making is likely aberrant [8, 9].

Funding and disclosure

This study was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC, 402642) and the Canadian Institutes of Health Research (390877). We declare that there are no competing financial interests with the work described.