Introduction

Obsessive compulsive disorder (OCD) is a debilitating neuropsychiatric disorder that affects 1–3% of the population, and is characterized by persistent intrusive thoughts (obsessions) and uncontrollable repetitive rituals (compulsions) [1, 2]. There is a remarkable convergence of abnormal functional neuroimaging findings in prefrontal cortex (PFC)–striatal circuits in OCD patients [3]; however, there are discrepancies in the directionality of these results [4]. Typically, hyperactivity has been reported at baseline and during symptom provocation in OCD patients in areas including orbitofrontal cortex (OFC) [5,6,7], anterior-cingulate cortex (ACC) [6, 7], ventromedial PFC (vmPFC) [8], and caudate [5, 7]; this activity is normalized following successful treatment [9, 10]. In contrast, prefrontal regions including OFC [11, 12], vmPFC [13,14,15,16], and dorsolateral PFC (DLPFC) [11, 17] display impaired recruitment by cognitive demands. Alterations in resting-state functional connectivity in PFC–striatal circuits have also been described, but both increases [18] and decreases [19] have been reported. Together, these findings suggest complex changes in PFC–striatal functioning in OCD patients, which may represent a valuable target for future therapeutic interventions.

Precise neural circuit mechanisms underlying these disturbances are difficult to uncover in clinical studies, and preclinical models have become an increasingly valuable complementary tool. A growing number of reports describe OCD-relevant transgenic mouse models, with a particular focus on the phenotype of compulsive grooming [20,21,22,23,24,25]. This is an ethologically relevant compulsive behavior in mice [26], which shows predictive validity using clinically effective serotonin-reuptake inhibitors in several different models [20, 22, 24, 27]. While this research has yielded important insights regarding striatal alterations that contribute to compulsive grooming [20, 21, 25, 28], there has been little exploration of translational OCD-relevant cognitive paradigms in these experimental mouse systems. To our knowledge, the present study is among the first to describe cognitive impairments in an OCD-relevant mouse model [29], and identify potential underlying neural substrates in the PFC and striatum.

To probe the neural circuits underlying cognitive flexibility, we examined Sapap3 knockout mice (KOs), the most widely used preclinical model in OCD research [20, 21, 28]. Sapap genes encode a family of four postsynaptic density-scaffolding proteins, and Sapap1 and Sapap3 are both candidates for OCD risk [30, 31]. Sapap3-KOs show compulsive grooming that is reversed by the first-line OCD treatment fluoxetine, and several studies have demonstrated dorsal striatal hyperactivity in this model [20, 21, 28]. The goal of this study was to determine whether Sapap3-KOs show OCD-relevant cognitive impairment, and assess functioning of associated PFC and striatal areas. OCD patients reliably show performance deficits and/or altered blood oxygen level-dependent (BOLD) responses during reversal-learning paradigms [11, 12, 32, 33], and impaired reversal learning may reflect circuit dysfunction that could contribute to perseverative thoughts and actions [33]. We therefore used a reversal-learning paradigm to examine cognitive flexibility in Sapap3-KOs and wild-type (WT) littermates, and quantitative cFos analysis to determine if activity in PFC and striatal regions was differentially associated with reversal performance in Sapap3-KOs.

Materials and methods

Animals

Sapap3-KOs and WT littermates were maintained on a C57BL/6 background, and were derived from a colony initially established at MIT by Dr. Guoping Feng [20]. Mice were group-housed with 2–5 same-sex mice per cage and ad libitum access to food and water until operant training commenced between 5 and 7 months of age (further details about cohorts available in figure legends and Supplementary Materials). At least 12 days prior to commencement of operant training, mice were transferred to a reverse light-cycle room (12:12, lights on at 7:00 pm). All experiments were approved by the Institutional Animal Care and Use Committee at the University of Pittsburgh in compliance with National Institutes of Health guidelines for the care and use of laboratory animals.

Reversal learning: behavioral characterization

Mice were tested in a reversal-learning paradigm similar to that previously described [34]. Briefly, mice were tested in operant chambers (Med Associates, Fairfax, VT) with two levers positioned on either side of a food magazine. Mice were trained to acquire lever pressing for food rewards (20-mg chocolate-flavored grain-based pellets; BioServ, Flemington, NJ) on a fixed-ratio 1 (FR1) schedule during hour-long sessions until they reached a criterion of 20 correct responses (see Fig. 1a for training timeline). Next, they were trained for 6 days with the same lever designated correct on a variable ratio 2 (VR2) schedule (30-min sessions, “discrimination training”). Lever contingencies were then reversed for 5 days (“reversal”), before changing back to the original contingency for 5 more days (“2nd reversal”, 30-min sessions, VR2 schedule). The criterion for successful reversal was pre-specified at > 20 correct responses on at least 1 day of training. Supplementary Materials include further details.
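The reversal criterion above can be written as a simple rule. The following is a minimal Python sketch for illustration only, not the analysis code used in the study; the function names and the daily counts are hypothetical.

```python
from typing import Dict, List

# Criterion as stated in the text: more than 20 correct responses
# on at least 1 day of reversal training.
CRITERION = 20


def reached_reversal(daily_correct: List[int], criterion: int = CRITERION) -> bool:
    """Return True if any reversal day exceeds the criterion."""
    return any(count > criterion for count in daily_correct)


def classify(mice: Dict[str, List[int]]) -> Dict[str, str]:
    """Label each mouse as 'reversed' or 'failed' from its daily correct counts."""
    return {
        mouse: ("reversed" if reached_reversal(days) else "failed")
        for mouse, days in mice.items()
    }


# Made-up correct-response counts for two hypothetical mice (5 reversal days each).
example = {
    "mouse_A": [3, 12, 25, 40, 52],
    "mouse_B": [0, 2, 5, 1, 2],
}
print(classify(example))
```

Because the criterion is a strict inequality, a mouse that makes exactly 20 correct responses on every day would still be labeled as failing under this sketch.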

Fig. 1

Sapap3-KOs show impaired reversal learning. a Timeline of operant training. b Following acquisition of lever training criteria, Sapap3-KOs showed normal levels of correct responding, although incorrect responses were lower than WT (c). d Following reversal, Sapap3-KOs showed impaired correct-response acquisition. e Perseverative responding on the previously correct lever was also lower in KOs, but only on the first day of training. ^ denotes P-value of the main effect, whereas * denotes P-value of post hoc tests comparing genotypes on each training day surviving the Bonferroni correction. n = 23 WT, 28 KO. **P < 0.01, ***/^^^P < 0.001, ****/^^^^P < 0.0001. FR1 fixed-ratio one schedule of reinforcement, VR2 variable ratio 2 schedule, WT wild-type controls, KO Sapap3 knockout mice

Reversal learning: cFos analysis

Operant training for cohort 4 was similar to cohorts 1–3. However, in order to end the experiment on the same day for all animals for the purpose of consistent tissue collection, we paused training for mice once they reached the lever training criterion, allowing slower mice to reach the same level of performance (all pauses in training were < 7 days; see Fig. 3a for training timeline). Once all mice reached criterion, they had 1 day of “catch-up” training on FR1 (20 correct responses maximum) before initiation of VR2 discrimination training the next day. Following a total of 6 days of discrimination, only 1 day of reversal training was performed prior to sacrifice and tissue collection (30-min session; early termination if mice earned 25 pellets). Mice that earned fewer than 25 pellets were given free access to the remaining pellets immediately after the session. All mice therefore consumed 25 pellets in total, minimizing the influence of total pellet consumption on cFos activation. More details are available in Supplementary Materials.

Tissue collection, immunohistochemistry, and analysis

Mice were deeply anesthetized with ketamine 2 h after the beginning of reversal testing, and transcardially perfused with 4% paraformaldehyde (PFA; Sigma Aldrich, St Louis, MO) in 0.1 M phosphate-buffered saline (PBS). cFos was detected using Millipore ABE-457 primary antibody (1:5000; Burlington, MA) and diaminobenzidine (DAB) kit (Vector Laboratories, Burlingame, CA) using standard protocols; see Supplementary Materials for further details.

Stained sections were imaged using an Olympus inverted slide scanning microscope (Olympus, Tokyo, Japan; 20x magnification, 4.6-ms exposure time), and automated cell counting was performed using CellSense software (Olympus). Ten cortical and striatal regions of interest (ROIs) were analyzed: medial and lateral OFC (mOFC and lOFC); prelimbic (PrL) and infralimbic (IL) regions of medial PFC (mPFC); dorsomedial, dorsolateral, centromedial, and ventromedial striatum (DMS, DLS, CMS, and VMS); and nucleus accumbens core (NAcC) and shell (NAcS). Three sections per animal spaced ~210 μm apart were analyzed bilaterally for each ROI for cFos counts (except for VMS, for which two sections were analyzed). Cells were automatically detected in a 300 × 300 μm area in each ROI, using pixel intensity threshold (red: 111–256, green: 0–175, and blue: 0–120) and minimum area of super-threshold filter (9 μm) to detect cFos+ cells.
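To illustrate the general idea of threshold-based detection described above, the following is a minimal Python sketch. It is not the CellSense algorithm, whose internals are not described here. The channel ranges are those quoted in the text. The 4-connected flood fill, the `min_area_px` parameter (a pixel-count stand-in for the minimum-area filter), and the toy image are assumptions made for illustration.

```python
from collections import deque

# Inclusive RGB ranges quoted in the text.
RANGES = {"r": (111, 256), "g": (0, 175), "b": (0, 120)}


def in_range(pixel):
    """True if an (r, g, b) pixel falls inside all three channel ranges."""
    r, g, b = pixel
    return (RANGES["r"][0] <= r <= RANGES["r"][1]
            and RANGES["g"][0] <= g <= RANGES["g"][1]
            and RANGES["b"][0] <= b <= RANGES["b"][1])


def count_cells(image, min_area_px=4):
    """Count 4-connected blobs of in-range pixels with at least min_area_px pixels."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for y in range(rows):
        for x in range(cols):
            if seen[y][x] or not in_range(image[y][x]):
                continue
            # Breadth-first flood fill to measure the area of this blob.
            queue = deque([(y, x)])
            seen[y][x] = True
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and in_range(image[ny][nx])):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area_px:
                count += 1
    return count


# Toy 5 x 6 image: light background, one 2 x 2 stained blob, one isolated stained pixel.
BG, STAIN = (230, 230, 230), (150, 90, 60)
toy = [[BG] * 6 for _ in range(5)]
for y, x in [(0, 0), (0, 1), (1, 0), (1, 1), (3, 4)]:
    toy[y][x] = STAIN

print(count_cells(toy))                 # isolated pixel is filtered out
print(count_cells(toy, min_area_px=1))  # isolated pixel is counted
```

The minimum-area filter is what removes single-pixel noise: with the default of 4 pixels only the 2 × 2 blob is counted, whereas lowering the filter to 1 pixel counts both objects.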

Statistical analysis

Repeated measures ANOVAs were used to test effects of genotype and training day during each phase of operant training, with Bonferroni-corrected post hoc tests if significant interactions were detected. Unpaired nonparametric Mann–Whitney tests were used to assess genotype differences on non-repeated measures that failed normality testing (acquisition of lever pressing; compulsive grooming severity; total number of inactive lever presses). For cFos analysis, repeated measures ANOVA was used to compare the effect of genotype across the brain regions, and Spearman's rho (ρ) was used to correlate cFos levels between ROIs. To compare cFos measurements associated with reversal performance, a generalized linear mixed-effects (GLME) model was developed using experimental data (cFos data and time stamps of operant responses from cohort 4). Using this model, we were able to estimate whether response rate was influenced by interactions between cFos expression and genotype in either a response type (correct/incorrect) or time (during testing)-dependent manner (response rate ~ ROI cFos × genotype × response type [correct/incorrect] or time); details of analysis can be found in Supplementary Material. Unlike standard correlations which are often used to compare cFos expression to behavior, GLME has key advantages including: (1) direct modeling of the effect of genotype on the cFos-behavior relationship; (2) modeling within-subject variance (e.g., mice with high vs. low variability in response rate), which improves power by increasing the degrees of freedom; and (3) modeling the effects of time and response type (correct/incorrect), which are both critical factors in reversal learning.
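As an illustration of the Mann–Whitney comparison mentioned above, the following Python sketch computes the U statistic from pooled ranks, using midranks for ties. It is a teaching sketch rather than the GraphPad Prism procedure used in the study, it does not compute a P-value, and the days-to-criterion values are made up.

```python
def mann_whitney_u(a, b):
    """Return the smaller of the two U statistics, using midranks for ties."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values at sorted positions i..j-1 share the average of ranks i+1..j.
        midrank = (i + 1 + j) / 2
        for k in range(i, j):
            ranks[k] = midrank
        i = j
    rank_sum_a = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


# Hypothetical days to reach the lever-pressing criterion in two small groups.
group_1 = [2, 3, 3, 4]
group_2 = [4, 5, 5, 6, 7]
print(mann_whitney_u(group_1, group_2))
```

A small U relative to the product of the two sample sizes indicates little overlap between the groups; in practice the accompanying P-value would come from a statistics package.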

Statistical analyses were completed using GraphPad Prism 6 software, except for GLME analyses, which were completed in R (version 3.3.3; package: stats:glm; link to code in Supplementary Materials). While cFos+ cell density was intercorrelated across regions (Figure S1), we did not perform dimension reduction in order to avoid assumptions about the determinants of these relationships. Instead, we adopted the conservative family-wise type I error control strategy for GLME, applying the Bonferroni correction for 10 regions (P was considered significant when < 0.005). Graphs show mean ± standard error of the mean (SEM) unless otherwise stated. For graphs of GLME analysis, post hoc contrasts used Tukey adjustments to determine the genotype and response type accounting for observed three-way statistical interactions; see figure legend for more details.
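The Bonferroni threshold described above follows from dividing the family-wise error rate by the number of regions. The short Python sketch below assumes a family-wise alpha of 0.05, which is implied by the stated threshold of 0.005 for 10 regions, and applies it to three interaction P-values reported in the Results for the genotype × response type × ROI term.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


# Assumed family-wise alpha of 0.05 across the 10 ROIs.
threshold = bonferroni_threshold(0.05, 10)

# Interaction P-values reported in the Results (genotype x response type x ROI).
reported = {"PrL": 0.0008, "IL": 1.9e-7, "mOFC": 0.0007}
significant = {roi: p < threshold for roi, p in reported.items()}
print(threshold, significant)
```

All three reported values fall below the corrected threshold, whereas a P-value between 0.005 and 0.05 would not be treated as significant under this rule.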

Results

Impaired reversal learning in Sapap3 knockout mice

Cognitive flexibility was examined in male Sapap3-KOs and WT controls, using a reversal-learning paradigm (Fig. 1a). Mice were first trained to acquire lever pressing; Sapap3-KOs showed a significant delay in acquisition (Figure S2A, median days: WT = 3, KO = 5; U = 156; P = 0.0011). However, once mice attained criterion, WTs and KOs showed similar levels of correct responding throughout 6 days of discrimination training (Fig. 1b, training day main effect: P < 0.0001, F(5,245) = 31.3; genotype main effect: P = 0.95; no interaction). Interestingly, incorrect responses were lower in Sapap3-KOs compared with WT (Fig. 1c; genotype main effect: P < 0.0001, F(1,49) = 19.5; training day main effect: P = 0.001, F(5,245) = 4.1; no interaction). This decrease in incorrect responses is consistent with reports of general locomotor hypoactivity in Sapap3-KOs (Figure S3 and Ref. [29]). In Sapap3-KOs, a similar rate of correct responses combined with a reduced rate of incorrect responses is reflected by a higher percentage of correct responses during discrimination training relative to WTs (Figure S4A).

Upon rule reversal, Sapap3-KOs showed significant impairment in acquisition of the new contingency relative to WTs (Fig. 1d; correct responses, genotype main effect: P = 0.0002, F(1,49) = 15.8; training day main effect: P < 0.0001, F(4,196) = 24.8; interaction: P = 0.002, F(4,196) = 4.3). While post hoc tests indicate that the genotypes did not differ on the first day of reversal, Sapap3-KOs showed significantly impaired performance across days 2–5. Sapap3-KOs also showed fewer perseverative responses on the previously correct lever on the first day of reversal, consistent with discrimination training (Fig. 1e; genotype main effect: P = 0.006, F(1,49) = 8.2; training day main effect: P < 0.0001, F(4,196) = 194.6; interaction: P = 0.0002, F(4,196) = 5.8).

Sapap3-KOs show heterogeneous reversal performance

Careful inspection of the data for individual differences suggested the presence of two populations within the KOs: one that acquired rule reversal, and one that failed to do so (Fig. 2a; also see Figure S5 for a histogram of the distribution of correct responses by genotype). Using the prespecified criterion of 20 correct responses during at least one session across 5 days of reversal training (dotted line: Fig. 2a), 13/28 KOs failed reversal (KOFail). In contrast, no WT mice failed reversal after 5 days of training, with 16 reaching criterion on day 1, 5 on day 2, and 2 on day 5. A descriptive follow-up analysis demonstrated that KOs that successfully reversed (KORev) and WT have indistinguishable correct responses following reversal (Fig. 2b; group × training day interaction: P < 0.0001, F(2,47) = 43.0). However, KORev and KOFail did not differ on incorrect responses, with both groups showing lower perseverative pressing on the previously correct lever than WT on the first day of reversal (Fig. 2c; group × training day interaction: P = 0.005, F(8,192) = 2.9; see legend for post hoc tests). WT and KORev made a similar number of responses before they reached reversal criterion (Figure S6). Comparing the total number of responses made by KOFail across 5 days of training to responses to criterion in the other groups suggests that KOFail performed a sufficient number of responses to acquire the new rule. Reversal learning was similarly impaired in female Sapap3-KOs (Figure S7).

Fig. 2

Almost half of Sapap3-KOs show complete failure of reversal. a Sapap3-KOs can be segregated into two populations based on reversal performance. Sapap3-KOs that did not achieve more than 20 correct responses on any day of training (pink dotted line in a) were classified as failing reversal based on prespecified criteria (KOFail). b Sapap3-KOs that reached this criterion (KORev) did not differ from WTs in rate of acquisition of the new correct response. c All Sapap3-KOs showed lower perseverative (incorrect) responses on day 1 (Bonferroni-corrected post hoc test day 1: WT vs. KORev: t(240) = 4.6; WT vs. KOFail: t(240) = 4.5; KORev vs. KOFail: t(240) = 0.1; all other days not significant). d, e Following 5 days of reversal 1, the lever contingency was returned to that used during discrimination training (2nd reversal). d KOFail did not differ from KORev in correct responding during the 2nd reversal, implying similar levels of response vigor between these two subgroups. e KOFail showed minimal incorrect responding throughout the 2nd reversal, and KORev showed fewer incorrect responses than WT on day 3 only (Bonferroni post hoc test day 3: WT vs. KORev t(235) = 2.5; WT vs. KOFail t(235) = 3.2; KORev vs. KOFail t(235) = 0.8). Although Sapap3-KOs showed increased compulsive grooming (f), delayed acquisition of lever pressing (g), and reduced inactive lever pressing during discrimination (h) relative to WT controls, this did not differ between KOFail vs. KORev. i Severity of compulsive grooming and reversal performance (total # correct responses across 5 days of reversal training) were unrelated in Sapap3-KOs. Unfilled circles in panels f and i reflect mice that were identified as having severe lesions; filled circles denote mice that did not have lesions. ^ denotes P-value of the main effect, whereas * denotes P-value of post hoc tests comparing groups on each training day surviving the Bonferroni correction. Colors of symbols denote P-values for post hoc tests for WT vs. KOFail (pink); WT vs. KORev (gray); or KORev vs. KOFail (black) surviving the Bonferroni correction. For panels a–g, n = 23 WT, 28 KO (13 KOFail); for panel h, n = 14 WT, 17 KO (9 KOFail); for panel i, n = 17 KO. *P < 0.05, #P < 0.10, **P < 0.01, ***P < 0.001, ****/^^^^P < 0.0001. WT wild-type controls, KO Sapap3 knockout mice, KOFail Sapap3 knockout mice that fail to reach reversal criteria, KORev Sapap3 knockouts that reach reversal criteria

After 5 days of reversal training, total lever pressing was very low in KOFail (5th day: correct response = 2.2 ± 1.3; incorrect response = 16.3 ± 4.5) compared with an average of > 100 total responses in WT and KORev, suggesting that the failure to reverse could be a consequence of reduced motivation or vigor. To test this possibility, a second reversal was performed with the levers returned to their original contingencies. KOFail and KORev were indistinguishable in their reacquisition of the original discrimination rule, suggesting similar levels of response vigor (Fig. 2d; group main effect: P = 0.02, F(2,47) = 4.2; training day main effect: P = 0.0007, F(4,188) = 5.0; no interaction). Incorrect responses on the original contingency were negligible in KOFail, and KORev showed less incorrect responding than WT on day 3 of the second reversal (Fig. 2e; group main effect: P < 0.0001, F(2,47) = 25.9; training day main effect: P < 0.0001, F(4,188) = 72.8; interaction: P < 0.0001, F(8,188) = 16.5; see legend for post hoc tests).

Other behavioral measures do not account for success or failure of Sapap3-KOs in reversal learning

Compared with WT, Sapap3-KOs showed delayed acquisition of initial lever pressing, less incorrect responding during discrimination training (Figure S2), and increased grooming. To determine whether these factors could help distinguish KORev and KOFail, data were reanalyzed based on reversal outcomes; however, KO reversal performance was not correlated with differences in any of these behaviors (Fig. 2f–i; correlation between grooming and reversal performance: R = 0.09, P = 0.73; correlation between lever press acquisition and reversal performance: R = −0.17, P = 0.38; correlation between incorrect responses during discrimination and reversal performance: R = 0.20, P = 0.30).

To determine whether genetic or environmental litter effects were contributing to reversal failure, concordance of reversal phenotype between mice with the same parents (either littermates or different litters from the same breeding pair) was analyzed in 17/28 male KOs (note: the remaining 11 KOs had no comparators since they were the only ones from a breeding pair tested in reversal). No evidence for litter/breeding pair effects on reversal performance was observed, with 10/17 mice showing discordance for reversal performance.

Altered relationship between neural activity and reversal performance in Sapap3-KOs

To examine relationships between neural activity and reversal learning in Sapap3-KOs and WTs, brains were collected 2 h after commencement of training on day 1 of reversal to assess cFos expression related to reversal learning (Fig. 3a). Four cortical and six striatal subregions of interest were analyzed (Fig. 3b). Repeated measures ANOVA comparing regional activity between genotypes revealed no significant differences between KOs and WTs (P = 0.88, F(1,21) = 0.023), and no region × genotype interactions (P = 0.32, F(9,189) = 1.17, Fig. 3c). Correlated cFos density was calculated for all pairs of ROIs in each genotype (Fig. 3d). This analysis revealed significantly increased correlations between ROIs in KOs relative to WTs (paired t test of ρ values for Spearman's correlations: t(44) = 7.3, P = 4.1 × 10^-9, mean difference in ρ = 0.23), which may reflect stronger functional connectivity [35]. Total number of correct responses did not differ between genotypes on the first day of training; however, the temporal profile of correct and incorrect responses was altered in KOs (Figure S8).
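The pairwise correlation analysis above can be sketched in Python as follows. This is an illustrative sketch, not the study's analysis code: the rank-based Spearman function is a generic implementation, and the toy vectors are made up. Ten ROIs give 45 unique pairs, which is consistent with the 44 degrees of freedom of the reported paired t test on ρ values.

```python
from itertools import combinations

# The ten ROIs named in the Methods.
ROIS = ["mOFC", "lOFC", "PrL", "IL", "DMS", "DLS", "CMS", "VMS", "NAcC", "NAcS"]


def midranks(values):
    """Rank values from 1..n, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in range(i, j):
            ranks[order[k]] = (i + 1 + j) / 2
        i = j
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of midranks (inputs must vary)."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Every unordered pair of ROIs; one rho per pair would be computed per genotype.
pairs = list(combinations(ROIS, 2))
print(len(pairs))

# Toy check with made-up values: a monotonic relationship gives rho of 1.
print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))
```

In the analysis described in the text, each ROI pair contributes one ρ value per genotype, and the paired t test then compares KO and WT values across the 45 pairs.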

Fig. 3

Increased correlation of regional cFos expression in Sapap3-KOs following reversal. a Mice were trained in reversal learning according to an experimental timeline similar to the previous experiment, and brains were collected 120 min after commencement of training on Day 1 of reversal to assess expression of the immediate early gene cFos. b cFos was quantified in 10 cortical and striatal regions of interest (ROIs); left of each panel shows representative staining and right of each panel shows schematic brain atlas image with ROIs highlighted. c The density of cFos-positive cells did not differ between genotypes in any of the regions assessed. d Rho (ρ) values of pairwise Spearman's correlations between ROI cFos measurements were elevated in KO mice, suggesting strengthened connectivity (paired t test of ρ values for correlations: t(44) = 7.3, P = 4.1 × 10^-9, mean difference in ρ = 0.23). n = 11 WT, 12 KO. FR1 fixed-ratio one schedule of reinforcement, VR2 variable ratio 2 schedule, mOFC medial orbitofrontal cortex, lOFC lateral orbitofrontal cortex, PrL prelimbic prefrontal cortex, IL infralimbic prefrontal cortex, DLS dorsolateral striatum, DMS dorsomedial striatum, CMS centromedial striatum, VMS ventromedial striatum, NAcS nucleus accumbens shell, NAcC nucleus accumbens core, WT wild-type controls, KO Sapap3 knockout mice

GLME was used to investigate how genotype moderated the relationship between neural activation in the ROIs (cFos density) and reversal performance; all ROI effects from GLME are described in Table S1. In two prefrontal regions (PrL and IL), increased cFos was associated with poor reversal learning in Sapap3-KOs (Fig. 4a, b; response rate ~ genotype × response type × ROI: PrL: P = 0.0008; IL: P = 1.9 × 10^-7). Specifically, increased cFos in PrL and IL was associated with fewer correct responses in Sapap3-KOs, whereas low cFos densities were associated with comparable response rates in KOs and WTs. Furthermore, within Sapap3-KOs, increased cFos in PrL and IL was associated with increased perseverative incorrect responses, whereas at low cFos density Sapap3-KOs perseverated less than WTs. Differential effects of genotype were also observed in the influence of mOFC cFos between response types (Fig. 4c; response rate ~ genotype × response type × ROI: mOFC: P = 0.0007). Both WT and KO mice showed reduced correct responding with increasing cFos, although high cFos was associated with significantly lower correct responding in Sapap3-KOs relative to WTs. In WTs only, high cFos in mOFC was also associated with reduced incorrect responses.

Fig. 4

Differential associations between regional cFos expression and reversal performance in Sapap3-KOs and WTs. The relationship between regional cFos expression, genotype, and response rate on each response type was assessed using GLME analysis, which revealed altered associations between regional cFos and reversal performance in Sapap3-KOs. Data points indicate the least-squares (LS) mean of response rate ± 95% confidence interval (Sidak method for 12 estimates) at three cFos densities inputted to the GLME model (approximately the mean cFos density ± 1 standard deviation, rounded to whole numbers as appropriate). This gives three points along the x axis representing the influence of regional cFos density on predicted response rate. a, b In PrL and IL, increased cFos was associated with poorer reversal acquisition in Sapap3-KOs as indicated by both reduced correct responding and increased incorrect responding. c In mOFC, increased cFos was associated with lower correct responding in both WT and Sapap3-KOs; however, this effect was larger in Sapap3-KOs, resulting in significantly lower correct responding relative to WT at medium and high cFos densities (denoted by *). In contrast, increased mOFC cFos was associated with reduced perseverative incorrect responses in WT only. n = 11 WT, 12 KO. * indicates difference between genotypes; # indicates difference across cFos levels for Sapap3-KOs; ^ indicates difference across cFos levels for WTs (Tukey-adjusted comparisons for 12 estimates). mOFC medial orbitofrontal cortex, PrL prelimbic prefrontal cortex, IL infralimbic prefrontal cortex, WT wild-type controls, KO Sapap3 knockout mice, GLME generalized linear mixed effects

Changes in the combined rate of correct and incorrect responses across the first day of reversal training primarily reflect the initial burst of high-rate perseverative responding (i.e., the “extinction burst”) [36], which typically decays over 30 min after initiation of extinction (Fig. 5a, also Figure S8C, D). Thus, successful extinction manifests in lower response rates late in the testing session. Genotype-dependent associations between response rate changes across the session and ROI cFos were observed in mOFC, PrL, IL, and NAc (Fig. 5b–f; response rate ~ genotype × time × ROI: mOFC: P = 3.7 × 10^-6; PrL: P = 0.001; IL: P = 0.001; NAcC: P = 1.4 × 10^-5; NAcS: P = 8.3 × 10^-5). The significant interaction term derived from the GLME is graphically represented by plotting the predicted output (response rate) of the model across a spread of typical cFos densities and timepoints that are inputted into the GLME model (Fig. 5). For example, in Fig. 5c, for two WT mice with high and low PrL cFos, respectively, the GLME model predicts that their response rate is comparable early in the session, but deviates in the middle and late timepoints of testing, with high PrL cFos associated with better extinction. This pattern is not seen in Sapap3-KOs, in which different PrL cFos levels do not predict different response rates at any timepoint during testing. Across the significant ROIs, a similar pattern emerged. In WT, higher levels of cFos were associated with reduced responding late in the testing session, consistent with activity associated with successful extinction. In Sapap3-KOs, the temporal profile of recruitment of these areas was reversed, with increased cFos in IL and NAcC/S associated with reduced perseveration early in testing.

Fig. 5

Differential associations between regional cFos expression and response rate across reversal session between Sapap3-KOs and WTs. Genotype-dependent associations between regional cFos expression and response rate changes across the reversal session (for correct and incorrect levers combined) were assessed using GLME. Data points indicate the least-squares (LS) mean of response rate ± 95% confidence interval (Sidak method for 18 estimates) at three cFos densities (approximately the mean cFos density ± 1 standard deviation, rounded to whole numbers as appropriate). This generates three points along the x axis representing the influence of regional cFos density on response rate. a A LOESS model was used to generate smoothed response rate and error bands for behavior throughout the testing session for WT and KO mice, which are shown for illustrative purposes. This demonstrates elevated responding early after rule reversal, corresponding to the extinction burst which eventually decays as extinction initiates later in the session. The profile of responding across the session differed between the genotypes; WTs perseverated until the middle of the testing session, whereas extinction commenced earlier in KOs. The significant interaction term derived from the GLME is graphically represented in panels b–f by plotting the predicted output (response rate) of the model across a range of typical cFos densities (~mean, mean + 1 standard deviation, mean − 1 standard deviation) and timepoints (early, middle, and late time bin; dotted lines on a) inputted into the model. This analysis revealed genotype differences in the association between regional cFos expression and response rate changes during the session. Five ROIs showed similar patterns of changes: b mOFC, c PrL, d IL, e NAcC, and f NAcS. Specifically, late in the reversal session, WT mice showed higher response rates than Sapap3-KOs at the lowest cFos density. Furthermore, in WTs, increased cFos expression was associated with reduced response rate, reflecting successful extinction (significant for all ROIs except IL). Similar patterns of cFos-dependent modulation of response rate were observed in the middle of testing in WT mice. At this timepoint, Sapap3-KOs also showed reduced response rate with increasing cFos in NAcC and mOFC. In contrast, early in the session, WTs showed no cFos-dependent modulation of response rate, whereas Sapap3-KOs showed an association between increased cFos in NAcS, NAcC, and IL and reduced response rate reflecting reduced perseveration. n = 11 WT, 12 KO. * indicates difference between genotypes; # indicates difference across cFos levels for Sapap3-KOs; ^ indicates difference across cFos levels for WTs (Tukey-adjusted comparisons for 18 estimates). mOFC medial orbitofrontal cortex, PrL prelimbic prefrontal cortex, IL infralimbic prefrontal cortex, NAcS nucleus accumbens shell, NAcC nucleus accumbens core, WT wild-type controls, KO Sapap3 knockout mice, GLME generalized linear mixed effects

Discussion

These studies demonstrate that Sapap3-KO mice, the most widely used and well-validated transgenic model in preclinical OCD research, show impairments in OCD-relevant cognitive flexibility using an instrumental reversal-learning paradigm. There was striking heterogeneity in this effect. Almost 50% of Sapap3-KOs completely failed to acquire a reversed contingency, while the other half were indistinguishable from controls in reversal acquisition. To interrogate neural correlates of this heterogeneity, we used quantitative cFos analysis in PFC–striatal circuits to assess regional activation and associations with reversal performance. In Sapap3-KOs, increased neural activity in PrL and IL was associated with behavioral impairments following rule reversal (Fig. 4a, b; both higher rate of incorrect responses and lower rate of correct responses). In contrast, this association was not seen in WTs. Instead, increased neural activity in PrL, IL, mOFC, and NAc was associated with reduced response rate late in the reversal testing session in WT (Fig. 5b–f), consistent with effective engagement of networks that support extinction of the previous correct response.

Although Sapap3-KOs are genetically identical, we found that only a subset showed dramatic impairment in reversal learning. This model system is therefore ideal to explore correlates of OCD-relevant behavioral heterogeneity in a uniform genetic background. First, we found that KORev and KOFail had similar levels of grooming behavior, indicating that reversal impairment was unrelated to compulsive grooming severity, and suggesting that different neural circuits may underlie these two phenotypes. Note, in contrast to the original paper describing Sapap3-KOs [20], we did not observe 100% penetrance of compulsive grooming at 4–6 months of age. However, the original study was conducted in mice on a mixed 129/Sv//C57BL/6 background strain [Dr. Guoping Feng, personal communication; also see Fig. 1a of Ref. [20]]. Our findings are consistent with another recent report using Sapap3-KOs on a C57BL/6 background [37]. Second, KORev and KOFail did not differ in their acquisition of lever pressing at the beginning of training, suggesting that reversal failure does not reflect a general learning impairment. Third, pedigree analysis of Sapap3-KOs demonstrated that most littermates were discordant for reversal learning impairment, suggesting that genetic drift within the colony or litter-related factors (e.g., parental behavior or epigenetic factors) are unlikely to contribute to the observed heterogeneity. In the absence of clear causal factors, we next turned to cFos analysis to determine whether neural activity patterns in cortico-striatal circuits could help explain this behavioral heterogeneity.

Interestingly, impaired reversal-learning performance in Sapap3-KOs was associated with increased cFos expression in the mPFC (PrL/IL; Fig. 4a, b). In contrast, neither correct nor incorrect responses were associated with mPFC cFos density in WT mice. Correlations between ROI cFos measurements were also stronger in Sapap3-KOs, suggesting that altered patterns of mPFC activity in KOs may have more pronounced influence over brain-wide activity patterns via increased functional connectivity. Our findings suggest that heterogeneous reversal-learning performance in Sapap3-KOs may result from heterogeneity in aberrant mPFC activity, which is consistent with studies in OCD patients examining neural activity in vmPFC, the human homolog of the rodent mPFC [38]. Similar to our findings in Sapap3-KOs, vmPFC hyperactivity was recently associated with disrupted fear reversal learning in OCD patients. Specifically, vmPFC activity was elevated in OCD subjects relative to healthy controls throughout training, and did not show the appropriate modulation required to update the reversed contingency [13]. Impaired recruitment and modulation of vmPFC activity in OCD patients have also been reported during fear extinction recall [14], task switching [15], and a flanker task (during errors) [16], suggesting that vmPFC dysfunction in OCD patients may disrupt flexible learning of both rewarding and aversive contingencies. vmPFC hyperactivity is also associated with symptom-provoked anxiety in OCD patients with hoarding [8], and mPFC has recently been implicated in compulsive checking in a rodent model [39]. Together with our data, these findings suggest that more work is warranted investigating vmPFC contributions to OCD symptoms in the context of flexible learning.
Interestingly, reversal-learning performance deficits have been inconsistent in studies of OCD patients [11, 12, 32, 33]; our findings suggest that these inconsistencies may reflect heterogeneity in neuropathology between study subject populations. OFC dysfunction has been demonstrated in OCD patients during reversal learning independent of reversal performance [11, 12]; however, well-powered clinical studies with baseline and reversal-learning-related neuroimaging data have not been performed to address whether heterogeneous task performance is associated with distinct neural correlates. Based on our current findings, we would predict that elevated vmPFC activity is associated with severity of reversal-learning impairments in OCD patients.

Our findings in WT mice are consistent with studies demonstrating that mPFC lesion and pharmacological inactivation have no significant effect on reversal learning in normal rodents [34, 40, 41]. However, these same interventions disrupt cognitive flexibility in more complex set-shifting tasks [40, 41], suggesting a role for mPFC during greater cognitive load [42]. Consistent with this idea, rule-responsive neurons have been identified in the mPFC using in vivo electrophysiology not only during set shifting [43,44,45], but also during reversal learning and stable performance of rule-guided decision-making [43]. Importantly, the proportion of responsive cells is increased during set shifting relative to reversal learning and stable performance, which likely mediates the critical role of the mPFC in this behavior [43]. Although the potential function of rule-responsive mPFC neurons during reversal learning is currently unknown, findings from a recent study suggest that they may impact reversal performance. Using a spatially restricted pharmacogenetic approach, the authors demonstrated that acute suppression of PrL activity significantly improves reversal learning [46]. This supports our finding that increased mPFC cFos is associated with impaired reversal learning in Sapap3-KOs, and suggests that aberrant activity in a subpopulation of PrL neurons could interfere with reversal learning, potentially by encoding the old rule. It is also currently unclear whether reversal-learning-associated recruitment of mPFC activity (as quantified using cFos) is state-dependent (i.e., dependent on performance in both KORev and KOFail) or a stable trait (i.e., inherently and stably different in KORev and KOFail). In the data presented in Fig. 2a, 8/15 mice in the KORev group did not reach reversal criteria on the first day of training.
If mPFC activity during reversal is state-dependent, we would predict that these mice would show mPFC hyperactivity on the first day of training because their reversal performance is poor; however, on subsequent days, we would expect a decline in mPFC activity, permitting acquisition of the new rule. Further investigation of mPFC involvement in reversal learning is warranted, particularly in the context of impaired performance in Sapap3-KOs, where longitudinal in vivo measurement of neural activity during reversal learning will be important to test these predictions.

The combined rate of correct and incorrect responses typically decays across the first day of reversal (Fig. 5a), as the mouse begins to undergo extinction of the previously correct response (Figure S8D). Although cFos measurements reflect the sum of activity across the entire session, we were able to use GLME analysis to identify genotype-dependent associations between cFos densities entered into the model and predicted response rates at different timepoints during the session. These changes in associations across the session may reflect differential engagement of extinction, reward seeking, and value comparison processes subserved by these areas at different points during reversal learning. Typically, the IL–NAcS circuit is most strongly implicated in the acquisition and expression of extinction [47, 48], whereas the PrL–NAcC circuit is more implicated in the expression of reward-seeking behavior [49, 50]. However, in the context of reversal learning, PrL–NAcC may help to invigorate exploratory reward-seeking behavior that supports acquisition of the new correct response and decreases perseverative responding. In addition, the mOFC plays a critical role in value comparisons [51] and helps guide flexible decision-making based on changing outcome values (e.g., following outcome devaluation) [34, 52]. Therefore, increased mOFC activity may support extinction of the previously correct response through improved representation of the new value associated with that response following reversal. Findings from the GLME analysis were consistent with these ideas, by demonstrating that elevated neural activity in PFC (PrL, IL, and mOFC; Fig. 5b–d) and NAc (both core and shell; Fig. 5e, f) supports extinction late on the first day of reversal in WTs. These associations were absent late in training in Sapap3-KOs, although some overlapping ROIs were associated with reduced response rate early and mid-way through training in KOs.
These changes are consistent with an altered response profile in KOs, including an attenuated peak of the “extinction burst”, suggesting that earlier recruitment of extinction-associated networks may contribute to these changes in the profile of responding in Sapap3-KOs [47,48,49].

It is important to recognize the limitations of static cFos measurements, which lack temporal resolution to precisely link neural activity to specific behaviors, or to describe dynamic changes in activity that may underlie behavior. Based on previous studies demonstrating that cFos expression peaks 60–120 min following neural activity, and is relatively stable across this period [53], our assumption is that neural activity at the start (120 min prior to tissue collection) and end (up to 90 min prior to tissue collection) of testing contributed equally to cFos-positive cell counts. However, if neural activity led to greater cFos expression at particular times during the session, this could complicate our interpretation of genotype differences, due to the differential time course of learning in WTs and KOs leading to increased frequency of early session termination in WTs. Furthermore, correct responding in this paradigm is proportional to correct-response contingent rewards received in the session (~2:1 on VR2 schedule), and it is therefore possible that significant associations between ROI cFos and correct responding are influenced by receipt of contingent rewards rather than performance of correct responses. Nonetheless, by surveying 10 ROIs simultaneously and leveraging GLME analysis, these quantitative cFos studies have generated novel hypotheses regarding neural mechanisms regulating OCD-associated cognitive flexibility. Future studies using more temporally precise measurements of neural activity in vivo will be important to address the limitations of these experiments and test these new hypotheses. It is also important to note that Sapap3-KOs are less active than WTs, which could confound interpretation of reversal-learning findings.
However, it is unlikely that our observed impairments in reversal learning are a consequence of reduced activity, because KOs show similar levels of correct responding during discrimination training (Fig. 1b), suggesting similar ability to engage in the task. Nevertheless, this is an important consideration for researchers using this transgenic model, as cognitive tasks with higher motor demands may be susceptible to this confound.

In conclusion, these studies are among the first to describe impairments in OCD-relevant cognitive flexibility in Sapap3-KOs [29]. Despite the limitations of cFos as an indirect measure of neural activation, our findings suggest that severity of reversal-learning deficits is related to hyperactivity in the mPFC. Future studies using this model may also help to identify specific neural mechanisms underlying mPFC hyperactivity and disruption of cognitive flexibility. This could ultimately guide the development of novel neuromodulatory treatments that could potentially both decrease OCD symptoms and facilitate vocational and social functioning.

Funding and Disclosure

These studies were supported by BRAINS R01 MH104255, McKnight Neuroscience Scholar Award, MQ Fellows Award, Burroughs Wellcome Fund CAMS Award, Klingenstein Fellowship in the Neurosciences (SEA), and American Australian Association Sir Keith Murdoch Fellowship (EEM).