Introduction

Navigating a changing environment requires organisms to make predictions about the future. These processes may involve the orbitofrontal cortex (OFC), which is conceptualized as building ‘task spaces’ that allow organisms to link behaviors and stimuli with anticipated outcomes, even when these associations are not readily observable (Wilson et al, 2014; Stalnaker et al, 2015). In this capacity, the OFC must retain information about past events to organize future behavior, but how this occurs is not fully understood. Neural activity in the OFC can correlate with the anticipation of a pending outcome (Murray et al, 2007), and OFC inactivation in some reversal tasks does not immediately impact response flexibility but instead affects subsequent response choice (Keiflin et al, 2013). Further, OFC inactivation during a task involving punishment modifies subsequent reactivity to punishment-associated conditioned stimuli (CS) (Clarke et al, 2015). These patterns suggest that OFC plasticity during, or soon after, new learning is important for later engaging optimal response strategies.

Substantial efforts have focused on understanding OFC involvement in forming and modifying stimulus-outcome associations. Until recently (Ostlund and Balleine, 2007; Gremel and Costa, 2013; Gourley et al, 2013b; Rhodes and Murray, 2013; Fuizat et al, 2017), investigations regarding response-outcome conditioning have instead largely focused on the medial prefrontal cortex. Nevertheless, selective knockdown of the synaptic plasticity regulator, brain-derived neurotrophic factor (BDNF), in the OFC interferes with the ability of mice to modify behavior based on both stimulus-outcome and response-outcome associations (Gourley et al, 2013b). Furthermore, a BDNF receptor agonist corrects these failures. In these investigations, the agonist was given immediately following task training, and corrective effects were detected later, when it was not present (Zimmermann et al, 2017). This pattern suggests that task-related plasticity in the OFC is involved in response-outcome memory formation or retention, allowing organisms to later update response strategies.

The inability to modify or override previously learned associations when they are no longer valid is associated with a broad range of psychopathologies, eg, unremitting drug seeking in addiction, repetitive compulsive acts, and extinction-resistant fear in post-traumatic stress disorder (PTSD). Abnormal OFC structure and function are associated with obsessive-compulsive disorder (Rauch et al, 1994) and phobias that fail to extinguish (Tillfors et al, 2001), as well as addiction and PTSD (Volkow et al, 2011; Jackowski et al, 2012). Notably, OFC dysfunction is in some cases responsive to cognitive behavioral therapy (Kennedy et al, 2007) and antidepressants (Kennedy et al, 2007; Fani et al, 2011), making the OFC a promising target for understanding and improving treatments for a broad spectrum of disorders. Further, within the prefrontal cortex, OFC structure and function are considered relatively highly conserved between rodents and humans (Preuss, 1995; Murray et al, 2007; Wallis, 2012), and the OFC shares rich reciprocal connections with the basolateral amygdala (BLA) (McDonald, 1991; McDonald et al, 1996), which may contribute to its function.

Here we used CaMKII-driven Gi-coupled designer receptors exclusively activated by designer drugs (DREADDs) to inducibly inactivate the ventrolateral OFC (VLO) during two events relevant to human psychopathology: (1) fear extinction training, in which a shock-associated stimulus is presented in the absence of the shock, and (2) instrumental contingency degradation, in which a familiar instrumental behavior is no longer likely to be rewarded. Our findings suggest that the VLO is necessary for stable memory retention rather than for acquiring the initial associations, which instead required the BLA.

Materials and methods

Subjects

Mice were 6-week-old C57BL/6 males (Jackson Labs) maintained on a 12-h light cycle (lights on at 0800) with ad libitum access to food and water, except during instrumental conditioning, when body weights were maintained at ~90% of baseline. Procedures were approved by the Emory University IACUC. Experimental groups are outlined in Supplementary Table 1.

Surgery

Viral vectors expressing CaMKII-driven Gi-DREADDs or control fluorophores were infused intracranially (Supplementary Materials).

Fear Conditioning and Extinction

For two days prior to conditioning, mice were habituated to the conditioning chambers (Med-Associates or Coulbourn) for 15 min per day. Auditory fear conditioning was then conducted; this 8-min session consisted of 5 presentations of a 30-s, 6 kHz tone CS co-terminating with a 1-s, 0.6 mA footshock, the unconditioned stimulus (US). Extinction training was conducted in a novel context. It consisted of 15 CS presentations without the US over 17 min, starting with a 3-min habituation period. Clozapine-N-oxide (CNO) was administered 30 min prior to extinction training. Extinction retention tests were conducted daily and without CNO. The percentage of time mice spent freezing in the presence of the CS was determined using FreezeFrame software (Coulbourn).
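
FreezeFrame scores freezing with its own motion-threshold algorithm, which is not reproduced here. As a minimal sketch of the final step only, assuming each video frame has already been scored as freezing or not, the percentage of CS time spent freezing can be computed per tone as follows. The function name and the frame-based input are illustrative assumptions.

```python
from typing import Sequence


def percent_freezing(frozen: Sequence[bool], fps: float,
                     cs_onsets_s: Sequence[float],
                     cs_duration_s: float = 30.0) -> list[float]:
    """Return the % of frames scored as freezing within each 30-s CS window.

    `frozen` holds one boolean per video frame (hypothetical input; FreezeFrame
    applies its own motion threshold to produce such scores).
    """
    scores = []
    for onset in cs_onsets_s:
        start = int(round(onset * fps))
        stop = int(round((onset + cs_duration_s) * fps))
        window = frozen[start:stop]
        if stop > len(frozen) or len(window) == 0:
            raise ValueError("CS window extends beyond the recording")
        scores.append(100.0 * sum(window) / len(window))
    return scores


# Illustrative only: 70 s recorded at 1 frame/s, freezing from 10 to 24 s.
frames = [False] * 10 + [True] * 15 + [False] * 45
print(percent_freezing(frames, fps=1.0, cs_onsets_s=[10, 40]))  # [50.0, 0.0]
```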

Instrumental Conditioning

Mice were trained to nose poke for food reinforcement (20 mg grain-based pellets; Bioserv) using Med-Associates conditioning chambers equipped with two nose poke recesses and a separate food magazine. Training was initiated using a fixed ratio (FR) 1 schedule of reinforcement; mice could earn up to 30 pellets for responding on each of 2 active apertures, and the sessions ended when all 60 pellets had been delivered, or at 135 min. Following 5 days of FR1 training, mice were shifted to a random interval 30-s (RI30) schedule of reinforcement for 2 or 6 sessions as indicated graphically. Two sessions would be expected to bias responding towards a goal-directed strategy sensitive to response-outcome contingencies, while extended training (6 sessions) would be expected to promote habit-based responding that is insensitive to response-outcome contingency.
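
The difference between the two reinforcement schedules can be sketched as below. This is an illustrative simulation, not the Med-Associates program: the fixed-ratio function applies the 30-pellet-per-aperture cap described above, while the random-interval function assumes exponentially distributed intervals (mean 30 s) that begin timing after each reinforcer, with the first response after an interval elapses being reinforced. Real RI programs often draw from a fixed list of intervals or use a per-second probability instead; the function names are hypothetical.

```python
import random


def fr_pellets(n_responses: int, ratio: int = 1, max_pellets: int = 30) -> int:
    """Pellets earned on one aperture under a fixed-ratio schedule with a cap."""
    return min(n_responses // ratio, max_pellets)


def ri_reinforced(response_times_s, mean_interval_s: float = 30.0, seed: int = 0):
    """Times of responses reinforced under a simulated random-interval schedule.

    Assumption: intervals are exponential with the given mean; the interval
    restarts after each reinforcer, and the first response after it elapses
    is reinforced.
    """
    rng = random.Random(seed)
    available_at = rng.expovariate(1.0 / mean_interval_s)
    reinforced = []
    for t in sorted(response_times_s):
        if t >= available_at:
            reinforced.append(t)
            # Start timing the next interval from this reinforcer.
            available_at = t + rng.expovariate(1.0 / mean_interval_s)
    return reinforced


# FR1: every response is reinforced until the 30-pellet cap is reached.
print(fr_pellets(45))  # 30
# RI30: one response per second for 10 min earns far fewer pellets.
print(len(ri_reinforced(range(600))))
```

In this sketch, responding faster on the RI schedule earns few additional pellets, whereas under FR1 each extra response earns a pellet until the cap. This weak coupling between response rate and reinforcement rate is one commonly cited reason that extended RI training is expected to favor habits.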

Next, the response-outcome contingency associated with one response was ‘degraded.’ In this session, one of the nose poke apertures was occluded, and responding on the remaining aperture was not reinforced. Instead, pellets were delivered independently of responding, at a rate matched to each subject’s reinforcement rate on the previous day (eg, Gourley et al, 2013a, b; Swanson et al, 2015; Zimmermann et al, 2017). During a separate ‘non-degraded’ session, the opposite aperture was available, and responding was reinforced according to a variable ratio 2 schedule. Thus, both response-outcome contingencies changed, but one response became much less predictive of reinforcement than the other. Mice were administered CNO either 30 min before or immediately after the ‘degraded’ session. Sessions were 25 min long.
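
The two post-training sessions can be sketched as follows. The 25-min session length and the yoking of non-contingent pellets to each mouse's previous-day reinforcement rate come from the text; spacing those pellets evenly across the session and implementing the variable ratio 2 schedule by drawing each ratio from (1, 2, 3) are hypothetical choices made only for illustration, as are the function names.

```python
import random

SESSION_MIN = 25.0  # session length stated in the Methods


def yoked_pellet_times(prev_pellets: int, prev_minutes: float,
                       session_min: float = SESSION_MIN) -> list[float]:
    """Response-independent pellet times (min) for the 'degraded' session.

    The rate matches the previous day's reinforcement rate; even spacing is a
    hypothetical simplification.
    """
    rate = prev_pellets / prev_minutes  # pellets per minute
    n = round(rate * session_min)
    if n == 0:
        return []
    step = session_min / n
    return [step * (i + 0.5) for i in range(n)]


def vr_reinforced(n_responses: int, ratios=(1, 2, 3), seed: int = 0) -> list[int]:
    """Indices of reinforced responses under a variable-ratio schedule.

    Drawing each ratio from (1, 2, 3) gives a mean of 2 (VR2); the list used
    by the actual program is not specified here.
    """
    rng = random.Random(seed)
    required = rng.choice(ratios)
    count = 0
    reinforced = []
    for i in range(n_responses):
        count += 1
        if count >= required:
            reinforced.append(i)
            count = 0
            required = rng.choice(ratios)
    return reinforced


# Hypothetical mouse that earned 30 pellets in 50 min the previous day.
print(len(yoked_pellet_times(30, 50)))  # 15
```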

The next day, mice were returned to the chambers and allowed access to both apertures in a 10 min choice test conducted in extinction. Preferential engagement of the response most likely to result in reinforcement is considered ‘goal-directed,’ whereas equivalent engagement of both familiar responses is considered a failure in response-outcome conditioning (Yin et al, 2008; Balleine and O'Doherty, 2010). Response rates are shown. Response preference scores are also reported in the Supplementary Materials, and patterns concur with raw response rates.
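
Response rates and preference scores from the 10-min choice test can be computed as sketched below. The preference score here is assumed to be the proportion of all responses directed at the non-degraded aperture, consistent with the measure used in the correlation analyses in the Results; the exact formula in the Supplementary Materials may differ. Under this assumed definition, a score of 0.5 indicates non-selective responding.

```python
def response_rate(responses: int, test_min: float = 10.0) -> float:
    """Responses per minute in the 10-min choice test."""
    return responses / test_min


def preference_score(non_degraded: int, degraded: int) -> float:
    """Proportion of all responses made on the non-degraded aperture.

    Assumed definition: 0.5 = non-selective; > 0.5 = preference for the
    response that remained predictive of reinforcement.
    """
    total = non_degraded + degraded
    if total == 0:
        raise ValueError("no responses were made during the choice test")
    return non_degraded / total


print(response_rate(50), preference_score(30, 10))  # 5.0 0.75
```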

In one experiment, we subjected mice to 2 additional training sessions. Then, the instrumental contingency degradation procedure and choice tests were repeated. Both tests are represented.

Reversal Test

Mice were tested as above, then subjected to another cycle of instrumental contingency degradation, except the response-outcome contingency that had been intact was ‘degraded,’ and the previously degraded contingency remained intact. Instead of CNO, all mice received an i.p. injection of saline immediately following the contingency degradation procedure to control for any effects of injection stress. A choice test was conducted the following day.

Following instrumental conditioning experiments, mice were returned to ad libitum feeding for 1 week. Fear conditioning experiments followed.

CNO

CNO (Sigma-Aldrich) was dissolved in 2% dimethyl sulfoxide (DMSO) and saline and prepared within 48 h of use. Mice with VLO- or dorsolateral striatal (DLS)-targeted infusions received 1 mg/kg, i.p. Mice with BLA-targeted infusions received 1 mg/kg in the initial instrumental contingency degradation test; we then repeated the procedure with a 10 mg/kg dose. Doses of 1–10 mg/kg span a range commonly reported in the literature (Farrell and Roth, 2013). The subsequent fear extinction experiments in these mice also used the 10 mg/kg dose. All mice, regardless of viral vector, received CNO.
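
The per-mouse dose follows directly from body weight. As a worked example, assuming a hypothetical injection volume of 10 ml/kg (the volume is not stated here), a 25-g mouse receives 0.025 mg CNO at 1 mg/kg or 0.25 mg at 10 mg/kg, and the corresponding solutions would be 0.1 and 1 mg/ml. The function names are illustrative.

```python
def cno_mg_per_mouse(body_weight_g: float, dose_mg_per_kg: float) -> float:
    """Milligrams of CNO for one mouse at a given mg/kg dose."""
    return dose_mg_per_kg * body_weight_g / 1000.0


def cno_mg_per_ml(dose_mg_per_kg: float, volume_ml_per_kg: float = 10.0) -> float:
    """Solution concentration, assuming a hypothetical 10 ml/kg injection volume."""
    return dose_mg_per_kg / volume_ml_per_kg


for dose in (1.0, 10.0):
    print(dose, cno_mg_per_mouse(25.0, dose), cno_mg_per_ml(dose))
```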

Histological and Electrophysiological Confirmation of DREADDs Expression

See Supplementary Materials.

Statistical Analysis

Broadly, two-way ANOVAs (α=0.05), with repeated measures when appropriate, were performed using SigmaStat v.3.1 or SPSS. Tukey’s post hoc comparisons were applied following significant interactions; their results are indicated graphically. In one experiment (main text Figure 5d), planned comparisons were made by t-test. One-sample t-tests were also applied to the preference scores reported in the Supplementary Materials. Correlations reflect linear regression analyses. Values lying >2 standard deviations from the mean were considered outliers and excluded (total n=8).
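
The outlier rule and the one-sample tests on preference scores can be sketched with Python's standard library. Several details are assumptions: the sample (n-1) standard deviation is used, the exclusion is applied once to a single list of values (eg, within one group), and preference scores are tested against 0.5 (chance). Only the t statistic and its degrees of freedom are returned; a p-value would come from a t distribution, for example via `scipy.stats.ttest_1samp`, which is omitted to keep the sketch dependency-free.

```python
import math
import statistics


def exclude_outliers(values, n_sd: float = 2.0):
    """Drop values lying more than n_sd sample SDs from the mean."""
    m = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (assumption)
    return [v for v in values if abs(v - m) <= n_sd * sd]


def one_sample_t(values, mu0: float = 0.5):
    """t statistic and df for a one-sample t-test against mu0 (eg, chance = 0.5)."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / se, n - 1


print(exclude_outliers([1, 1, 1, 1, 1, 1, 1, 1, 1, 10]))  # the 10 is dropped
```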

Results

Here we test the role of the VLO in two forms of outcome-based conditioning: Pavlovian fear extinction, in which a previously threatening stimulus no longer predicts an aversive outcome, and response-outcome contingency degradation, in which a familiar behavior no longer produces an associated reinforcer. Despite clear differences between these forms of stimulus-outcome and response-outcome conditioning, both require the initial formation of an outcome-based association and the subsequent modification of that association. We hypothesized that the VLO influences outcome-based learning and memory by consolidating or retaining new information about previously acquired associations. Furthermore, we envisioned a valence-free model, in which this role would not depend on whether new learning was driven by appetitive or aversive motivations.

The VLO is Necessary for Fear Extinction: Comparison to the BLA

First, we infused CaMKII-driven viruses expressing either GFP (controls) or Gi-DREADDs into the VLO, allowing for selective and controlled suppression of neural activity in local excitatory neurons by systemic administration of CNO (Farrell and Roth, 2013; Supplementary Figure 1). Some viral vector spread into the DLO/AI was noted (Figure 1a), but these mice did not obviously differ from those with infection restricted to the VLO. GFP-expressing terminals were detected in the downstream BLA, consistent with VLO infection (Figure 1b). Additional mice were infused directly into the BLA for comparison (Figure 1c).

Figure 1

Viral vector infection of the VLO and BLA. (a) Spread of viral vectors following infusion into the VLO is represented in the right hemispheres. Black indicates the largest spread and white the smallest, drawn on images from the Mouse Brain Library (Rosen et al, 2000). The boundaries of the OFC subdivisions are imposed on the left hemisphere. While viral vector expression was noted in the lateral OFC in some mice, ~70% of infusions were contained within the VLO. (b) VLO infusions resulted in fluorescing terminals in the BLA, as expected. (c) Viral vectors were also infused directly into the BLA, with black+red indicating the largest spread and white the smallest. ‘MO’ refers to medial OFC, VLO to the ventrolateral subregion, and ‘DLO/AI’ refers to the dorsolateral OFC/agranular insula. A full color version of this figure is available at the Neuropsychopharmacology journal online.

Mice were trained in an auditory fear conditioning paradigm in which a US (mild footshock) was paired with a CS (auditory tone). CNO was then administered 30 min before extinction training (Figure 2a), during which the CS was repeatedly presented in the absence of shock. Extinction retention tests were conducted drug-free over the 3 subsequent days. All mice developed conditioned freezing, evidenced by increased CS-elicited freezing over the 5 CS-US presentations, with no differences between groups (interaction F(4,40)=1.8, p=0.15; virus F=1) (Figure 2b). VLO inactivation also did not affect the acquisition of extinction (interaction F<1; virus F(1,10)=1.2, p=0.3) (Figure 2c). However, when mice were tested on the following days, the previously inactivated group showed deficits in extinction memory retention, freezing more than control mice (main effects: Retention 1, F(1,10)=8.8, p=0.01, Figure 2d; Retention 2, F(1,10)=19.2, p=0.001, Figure 2e; Retention 3, F(1,10)=6.6, p=0.03, Figure 2f). Thus, the VLO appears necessary for fear extinction memory, though not for the acquisition of extinction conditioning per se (session × group interaction F(4,40)=3.6, p=0.01) (Figure 2g).

Figure 2

The VLO is necessary for fear extinction memory. (a) An experimental timeline is shown. (b) Mice with VLO infusions acquired conditioned freezing, demonstrated by increased immobility across 5 CS-US pairings. (c) Freezing during extinction training was not affected by VLO inactivation. Inset: When CS-elicited freezing during the extinction training session was calculated as a fold-change from freezing generated during CS-US pairings in the previous session, the groups again did not differ. (d) During extinction retention testing, however, mice with a history of VLO inactivation froze more than control mice. (e, f) Elevated freezing in this group persisted for two more sessions. (g) Average freezing scores across sessions revealed a long-lasting deficit in the extinction of conditioned freezing in VLO Gi-DREADD-expressing mice. n=6/group following exclusions due to viral vector misplacement. (h) Mice with BLA infusions acquired conditioned freezing, demonstrated by increased immobility across 5 CS-US pairings. (i) Gi-DREADD-mediated BLA inactivation impaired extinction conditioning. (j) During the initial extinction retention test, mice with a history of BLA inactivation again generated higher freezing levels. (k, l) These levels normalized with additional training. (m) All sessions are represented on a single graph, highlighting elevated freezing during the initial extinction training period. n=7–8/group following exclusions due to viral vector misplacement. Symbols represent means+SEMs. #p<0.05, main effect of group; *p<0.05 following interactions; **p<0.007 following interactions.

Freezing during the initial CS-US conditioning was qualitatively lower in the DREADD group. Although not significant, this trend could potentially mask a deficit in extinction acquisition. To rule out this possibility, we calculated the fold-change in CS-elicited freezing during extinction relative to the initial CS-US pairings. No group differences were detected (t(10)=0.79, p=0.4) (Figure 2c, inset), further evidence that VLO inactivation did not affect the acquisition of extinction.
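
The fold-change control analysis can be sketched as below, assuming that each mouse's fold-change is its mean CS-elicited freezing during extinction training divided by its mean CS-elicited freezing during the 5 CS-US pairings, and that the two groups were compared with a Student's (pooled-variance) t-test. Neither detail is specified in the text; the function names are hypothetical.

```python
import math
import statistics


def fold_change(extinction_freezing, conditioning_freezing) -> float:
    """Mean CS freezing during extinction / mean CS freezing during CS-US pairings."""
    baseline = statistics.mean(conditioning_freezing)
    if baseline == 0:
        raise ValueError("no freezing during conditioning; fold-change undefined")
    return statistics.mean(extinction_freezing) / baseline


def student_t(a, b):
    """Pooled-variance two-sample t statistic and degrees of freedom."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

Applied to per-mouse fold-changes from two groups of 6 mice, `student_t` returns 10 degrees of freedom, consistent with the t(10) reported above.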

Next, mice with Gi-DREADDs or a control viral vector in the BLA acquired conditioned freezing, evidenced by increased CS-elicited freezing over 5 CS-US pairings, with no differences between groups (main effect and interactions F<1) (Figure 2h). Mice were then administered CNO prior to extinction training. BLA inactivation increased freezing (main effect F(1,12)=14.6, p=0.002) (Figure 2i), an extinction deficit. The following day, the DREADD group again froze more, presumably due to extinction failures during the previous session (main effect F(1,13)=8.2, p=0.01) (Figure 2j). CS-elicited freezing did not, however, differ in subsequent tests (F<1) (Figure 2k and l). Overall, a session × group interaction (F(4,52)=5.8, p=0.001) (Figure 2m) and post hoc comparisons indicated that BLA inactivation caused a failure in the acquisition of fear extinction conditioning.

The VLO Also Retains Reward-Related Information

We also trained mice to generate two instrumental responses for food pellets. Then, the response-outcome contingency associated with one of the responses was ‘degraded’ by providing reinforcers associated with this response non-contingently. Meanwhile, responding on the other aperture remained reinforced (Figure 3a). In Experiment 1, VLO-infused mice acquired the instrumental responses, with no group differences (interaction F(6,132)=1.5, p=0.2; virus F(1,22)=1.8, p=0.2) (Figure 3b). CNO was then administered prior to response-outcome contingency degradation, inactivating the VLO in DREADD-expressing mice. Nevertheless, a main effect of response indicated that mice, regardless of viral vector group, preferred the response most likely to be reinforced in 2 subsequent choice tests (test 1 interaction F(1,22)=1.7, p=0.2; response F(1,22)=14, p=0.001; test 2 interaction F<1; response F(1,22)=4.9, p=0.04) (Figure 3c; see Supplementary Figure 2a for preference scores).

Figure 3

The VLO retains response-outcome memory. (a) A task schematic is shown. Mice are trained to generate two distinct nose poke responses that are equally likely to be reinforced. Then, the likelihood that one response will be reinforced is decreased (‘contingency degradation’). Preferential engagement of the remaining response during subsequent choice tests is thought to reflect knowledge of the response-outcome relationships. (b) Mice acquired the responses. (c) When CNO was delivered prior to instrumental contingency degradation to inactivate the VLO, no effects were detected. In other words, both groups subsequently engaged the response that was likely to be reinforced (non-degraded vs degraded). n=11–13/group. (d) A separate cohort of mice also acquired the responses. (e) In this case, the VLO was inactivated immediately following instrumental contingency degradation. Initially, all mice preferred the behavior that was most likely to be reinforced; however this preference decayed in the previously-inactivated mice, and responding became non-selective. n=7/group. (f) To further test whether response-outcome memory had decayed in mice with a history of VLO inactivation, we reversed the location of the active vs inactive nose poke recesses. (g) Mice with a history of VLO inactivation rapidly updated response strategies, generating the response most likely to be reinforced, apparently unencumbered by stable memory retention. Meanwhile, control mice generated both responses equally, presumably because they retained the memory of both responses being reinforced (and not reinforced) in recent history. Symbols and bars represent means+SEMs. *p<0.05; **p<0.001.

In Experiment 2, another group of mice acquired the instrumental responses, again with no group differences (interaction and virus F<1) (Figure 3d). This time, the VLO was inactivated immediately following instrumental contingency degradation. Both groups subsequently preferred the response that was most likely to be reinforced (interaction F<1; response F(1,12)=21.4, p=0.001) (Figure 3e; Supplementary Figure 2b). This preference decayed in the previously-inactivated group, however, such that these mice responded in a non-selective fashion the following day (interaction F(1,12)=6.4, p=0.03) (Figure 3e; see Supplementary Figure 2b for preference scores). Thus, VLO plasticity following response-outcome conditioning appears necessary for retaining information that sustains optimal response preferences.

On the basis of this conclusion, one might predict that, in the absence of further inactivation, previously inactivated mice would be better able to modify future behaviors, being unencumbered by reward-related memory. We tested this hypothesis by reversing the response-outcome contingencies, such that the response that had previously been unlikely to be reinforced was now reinforced, while the response that had previously been likely to be reinforced was not (Figure 3f). As hypothesized, previously inactivated mice readily updated their response strategies, preferring the newly reinforced response. By contrast, control mice engaged both responses equivalently, presumably because they remembered that each response had recently been both reinforced and not reinforced (interaction F(1,11)=8.6, p=0.01) (Figure 3g).

Instrumental Response Strategies Correlate with Fear Extinction Retention

Mice from Experiments 1 and 2 above were subsequently subjected to conditioned fear extinction testing. Using data from control AAV-CaMKII-GFP-infected mice, we compared the percentage of total responses directed toward the non-degraded contingency during the second choice test against freezing during the first extinction retention test. Preference for the response that was predictive of reinforcement was associated with lower freezing during extinction retention testing (r²=0.4, p=0.01) (Figure 4a). This correlation was maintained with the addition of a third independent cohort of mice that also received intracranial infusions and both appetitive and aversive conditioning (r²=0.3, p=0.007) (Figure 4b). (Gi-DREADD mice were not included because the effects of Gi-DREADD-mediated inactivation in choice test 2 depended on the timing of CNO administration.) Thus, the stable expression of response-outcome conditioning was associated with successful conditioned fear extinction.

Figure 4

Reward-related response strategies correlate with the retention of fear extinction conditioning. (a) Control GFP-infected mice that demonstrated stable response-outcome memory retention (see Figure 3) were more likely to show evidence of successful fear extinction, characterized by lower levels of freezing during extinction retention testing. (b) This correlation was maintained with the addition of an independent third cohort of mice. The dashed lines indicate equal engagement of both responses (ie, non-selective responding). Each symbol represents a single mouse.

How Do the Effects of VLO Inactivation Compare to Inactivating Other Structures Involved in Reward-Related Decision Making?

We next trained mice expressing Gi-DREADDs or a control viral vector in the BLA, as well as mice bearing mistargeted Gi-DREADD infusions affecting the central nucleus of the amygdala (CeA) and the posterior pole of the dorsal striatum (Figure 5a), to nose poke for food reinforcement (Figure 5b). No group differences were identified during training (Fs<1) (Figure 5c). Next, CNO was delivered prior to a session during which one familiar response-outcome contingency was violated. Subsequently, control and mistargeted mice preferred the response most likely to be reinforced, evidence of knowledge of the response-outcome contingency. Inactivation of the BLA, however, blocked new response-outcome conditioning, and these mice generated both responses equally (interaction F(2,18)=9.6, p=0.001) (Figure 5d; see Supplementary Figure 2c for preference scores).

Figure 5

The BLA promotes, while the DLS occludes, response-outcome conditioning. (a) Viral vector infusions into the BLA are represented on images from the Mouse Brain Library (Rosen et al, 2000), with black+red in the left hemispheres representing the largest, and white the smallest, viral vector spread. In some mice, infusions were mis-targeted, instead infecting the CeA, the posterior pole of the caudate putamen, and aspects of the hippocampus. These infusions are represented in the right hemispheres. (b) CNO was delivered prior to a training session during which one familiar nose poke response was no longer likely to be reinforced, and mice received a choice test the next day. (c) Mice acquired the nose poke responses. (d) CNO (1 or 10 mg/kg) pretreatment disrupted response choice in the BLA, but not control or mis-targeted, groups. n=6–8/group. (e) We also expressed Gi-DREADDs in the DLS, with viral vector expression represented on images from the Mouse Brain Library (Rosen et al, 2000). (f) Mice acquired food-reinforced instrumental responses and were subjected to multiple training sessions using an RI schedule of reinforcement that promotes habit-based behavior, which is insensitive to response-outcome contingency. (g) DLS inactivation enhanced response-outcome conditioning, resulting in robust response preference in the DLS group. n=5/group following exclusions due to viral vector misplacement. Symbols and bars represent means+SEMs. α p=0.065; *p<0.05; **p<0.001 following interactions. A full color version of this figure is available at the Neuropsychopharmacology journal online.

In mistargeted mice, CaMKII-driven Gi-DREADDs were detected in the CeA and posterior pole of the dorsal striatum. This is significant because the dorsomedial striatum (DMS), particularly the posterior region, is involved in goal-directed action selection (Yin et al, 2008), and 70% of striatal neurons express CaMKII (Goto et al, 1994), the promoter driving expression of the DREADD. With the caveat that prior studies focused on a region of the DMS not nearly as posterior (Yin et al, 2008), one might nonetheless conceivably anticipate that inactivation of this region could disrupt response-outcome conditioning as well. To confirm our null finding, we repeated the contingency degradation procedure using a higher, 10 mg/kg, dose of CNO. Again, control mice and mistargeted mice preferentially engaged the behavior most likely to be reinforced, while BLA inactivation again impaired response-outcome conditioning, resulting in non-selective responding (Figure 5d; Supplementary Figure 2c).

In contrast to the DMS, the DLS is part of a ‘habit circuit’ that is insensitive to response-outcome contingency (Yin et al, 2008). Unlike the BLA and DMS, it is also spared by OFC projections (Schilman et al, 2008; Hoover and Vertes, 2011; Mailly et al, 2013). In a final experiment, we expressed Gi-DREADDs in the DLS (Figure 5e) and trained mice using a fixed ratio schedule and then an RI schedule, which promotes habit-based behavior. Mice acquired the nose poke responses without group differences (Fs<1) (Figure 5f). Following instrumental contingency degradation, control mice generated both responses equally, as expected of habit-based responding, but DLS inactivation enhanced response-outcome conditioning, blocking habit-based behavior (interaction F(1,8)=13.5, p=0.006) (Figure 5g). Notably, we have observed the same pattern following more prolonged RI training, which would be expected to strengthen habit-based responding (DePoy and Gourley, unpublished), suggesting that this effect is robust.

Discussion

We report that a subregion of the OFC, the VLO, is necessary for conditioned fear extinction memory, such that VLO inactivation during extinction training causes a durable maintenance of CS-elicited freezing. Likewise, VLO silencing weakens the ability of mice to sustainably select actions based on their consequences. Lesions of the OFC in marmosets similarly impair response-outcome decision making (Jackson et al, 2016). We suggest that conditioning-induced plasticity aids in retaining information essential to engaging optimal response strategies, rather than encoding or consolidating this information, because VLO-inactivated mice were initially able to select actions based on the likelihood of reinforcement (evidence of memory formation), but response preferences decayed. Later, mice with a history of VLO inactivation were better able than control mice to adjust response strategies when reinforcement likelihood again changed, apparently unencumbered by stable memory retention.

The VLO Supports Fear Extinction Memory: Comparison to the BLA

Here we used Gi-coupled DREADDs to inducibly impair VLO function during fear extinction training, when a previously shock-associated CS was presented in the absence of shock. VLO inactivation impaired extinction, causing sustained conditioned freezing. Specifically, conditioned freezing during extinction training was unaffected, but mice failed to further inhibit freezing when later exposed to the CS. Abundant evidence indicates that extinction-induced neuronal plasticity in another region of the prefrontal cortex, the infralimbic cortex (IL), is necessary for extinction memory retention (Milad and Quirk, 2012; Maren and Holmes, 2016). Similar principles could apply to the VLO, given that the VLO was inactivated during extinction training but apparent memory failures were detectable only later, during the retention tests. Memory consolidation could alternatively (or additionally) have been affected. Another interpretation is that VLO-inactivated mice retained their initial extinction memory, since their freezing levels on the extinction training and retention days did not differ, but were unable to further inhibit freezing. We nevertheless favor the interpretation that extinction memory retention was impaired, given that control mice froze less during the retention tests than during extinction training, evidence of new learning due to extinction training.

The OFC is thought to generate outcome expectancies and to recognize the expectancy violation that occurs when a CS no longer predicts the US. When the OFC is compromised, expectations are not updated, hence failures in extinction following VLO inactivation in rats (Panayi and Killcross, 2014; Zelinski et al, 2010) and mice (Figure 2). During the retention tests here, groups did not differ during the initial CS presentations, diverging only later in each session. This pattern hints at failures in within-session, in addition to between-session, adaptation to new outcome expectancies. Interestingly, in an earlier report, infusion of the protein synthesis inhibitor anisomycin into the OFC had no effect on extinction conditioning or its retention (Santini et al, 2004). Those infusions, however, were considerably more lateral than ours, hinting at neuroanatomical differences within the OFC in the regulation of fear-based behaviors.

In apparent contrast to the VLO, the BLA is involved in the acquisition of extinction conditioning. For instance, blockade of NMDA receptors or of extracellular signal-regulated kinase 1/2 (ERK1/2) in the BLA impairs within-session extinction (Herry et al, 2006; Sotres-Bayon et al, 2007; Zimmerman and Maren, 2010). We recapitulated this phenomenon using Gi-DREADD-mediated inactivation, despite differences in baseline freezing rates relative to prior experiments.

The VLO, BLA, and DLS Differentially Influence Response-Outcome Decision Making

In parallel studies, we found that VLO inactivation disrupted subsequent response selection when it was induced immediately after a training session in which one of two familiar food-reinforced behaviors was no longer likely to be reinforced. While Gi-DREADD-expressing mice could initially select the action most closely associated with reinforcement, this preference decayed. In conceptually similar investigations, VLO inactivation did not impair the within-session extinction of conditioned approach or response reversal; rather, deficits became apparent when rats were tested subsequently (Keiflin et al, 2013; Panayi and Killcross, 2014). Thus, conditioning-related plasticity in the VLO appears important for retaining information that sustains optimal response choice.

In contrast, pretraining BLA inactivation induced response failures that were rapidly detectable, as similarly reported in rats with BLA lesions (Balleine et al, 2003), a finding interpreted as evidence that the BLA is necessary for response-outcome memory encoding. Posttraining BLA lesions also interfere with the expression of response-outcome memory (Ostlund and Balleine, 2008). This is notable because the VLO and BLA share reciprocal connections (Figure 1), albeit to a lesser degree than more lateral regions of the OFC (McDonald et al, 1996; Zimmermann et al, 2017). We recently reported that unilateral reduction of the pro-plasticity neurotrophin Bdnf in the VLO, paired with contralateral amygdala lesions (ie, VLO-BLA disconnection), delays response-outcome conditioning (Zimmermann et al, 2017). Although we initially interpreted this impairment as a memory consolidation failure, it now seems likely that it instead reflects weakened retention, and thus weakened subsequent expression, of new memory otherwise supported by BDNF-dependent VLO-BLA interactions.

Importantly, our data do not preclude a role for the VLO in rapid 'on-line' behavioral control. VLO-dorsal striatal interactions appear to enable organisms to rapidly adjust behavioral response strategies when faced with changes in response-outcome contingency or outcome value (Gremel and Costa, 2013; Gourley et al, 2013b). Moreover, BDNF (Gourley et al, 2013b), GABAAα1 (Swanson et al, 2015), and endocannabinoid signaling (Gremel et al, 2016) in the VLO support response-outcome-based strategy shifting, likely via interactions with the dorsal striatum. Further, some striatum-targeted projections from the VLO terminate in the ventrolateral striatum (Schilman et al, 2008), where neuronal activity is associated with both action initiation and sustained, motivated behavior (Natsubori et al, 2017).

OFC projections to the striatum fan out in a topographically organized fashion, from dorsomedial to ventrolateral regions (Schilman et al, 2008; Hoover and Vertes, 2011; Mailly et al, 2013). They spare the DLS, however, which is part of a stimulus-response 'habit' circuit that is by definition insensitive to response-outcome contingency (Yin et al, 2008; Smith and Graybiel, 2016). To validate our approach, we inactivated the DLS and found that, as with other forms of temporary inactivation (Pauli et al, 2012), Gi-DREADD-mediated inactivation induced a shift from non-selective (habitual) to preferential (goal-directed) responding. That is, DLS inactivation facilitated response-outcome conditioning, whereas VLO and BLA inactivation impaired it. These effects were detected a full day following inactivation (see also Yin et al, 2006), supporting the conceptualization of habit-based behaviors as active, rather than reflexive, processes that involve new learning (see Pauli et al, 2012) and potentially even structural plasticity (Gourley et al, 2013a) within the DLS.

Gi-DREADD Activation Inhibits Long-Term Potentiation (LTP) in the VLO

Finally, single-cell patch clamp recordings revealed an increased threshold for LTP induction in Gi-DREADD-expressing VLO neurons (Supplementary Figure 1). This pattern may point to a Gαi-mediated mechanism that inhibits adenylyl cyclase and, downstream, cyclic adenosine monophosphate (cAMP) and protein kinase A (PKA) signaling. Indeed, a postsynaptic PKA-dependent pathway is necessary for early LTP in the hippocampus following a stimulation protocol very similar to that used here (Blitzer et al, 1995). In addition, mobilization of the Gαi subunit could disrupt the later phases of LTP by inhibiting PKA-mediated gene transcription and protein synthesis (Mayr and Montminy, 2001; Sindreu et al, 2007). These later phases may be critical for VLO function: CNO delivery after, but not before, appetitive training impacted subsequent response choice, suggesting that some degree of VLO plasticity occurring after conditioning supports memory retention.

Conclusions

Unifying models of OFC function argue that the OFC coordinates decision making that requires inference or prediction of future outcomes (Wilson et al, 2014; Stalnaker et al, 2015). We suggest that learning-related plasticity in the VLO subregion enables the retention of information essential to coordinating future behavior in both appetitive and aversive (extinction) contexts. This function differs from that of the BLA and likely from those of other OFC subregions. For example, the agranular insular subregion appears to be involved in the expression of contextually conditioned fear, but not in its extinction (Morgan and LeDoux, 1999; Santini et al, 2004). Further characterizing OFC subregion function will be an important step in comprehensively understanding this large, heterogeneous brain region.

Funding and disclosure

This work was supported by P51OD011132 and an NIMH BRAINS award to SLG (MH101477). The authors declare no conflict of interest.