Introduction

Cognitive flexibility is the ability to shift perspectives and strategies in the face of changing environmental contingencies [1]. Deficits in cognitive flexibility are among the most pervasive symptoms across psychiatric and neurological disorders, including schizophrenia, addiction, obsessive-compulsive disorder, bipolar disorder, frontal lobe damage, and anxiety disorders, among others [2,3,4,5,6,7,8]. As such, understanding the circuits that regulate this cognitive construct, as a first step towards developing effective therapies to treat cognitive flexibility deficits, is a top priority.

Much elegant work has elucidated regions and neurotransmitter systems [9, 10] in the frontal cortex [11, 12] and striatum [13,14,15] that play key roles in cognitive flexibility [16]; however, the complete circuitry underlying this construct, specifically the mechanism by which flexible decision making is enacted downstream, remains unclear. One brain region that has emerged as a candidate is a subregion of the basal forebrain called the medial septum (MS). The MS has a well-established role in learning and memory processes [17,18,19,20], specifically in behaviors that require inhibition of prior learned information in order to learn a new rule [17, 18, 21] or prevention of attention to irrelevant stimuli [22]. Furthermore, we recently showed that MS activation potently regulates dopamine (DA) neuron population activity in both the ventral tegmental area (VTA) and substantia nigra pars compacta (SNc) [23, 24], via a previously described circuit [25,26,27,28] (Fig. 1a) that includes the ventral subiculum of the hippocampus (vSub) and ventral pallidum [23]. This is important because DA neuron population activity and release have been shown previously to be necessary for reward learning, both in general and in the context of cognitive flexibility tasks [29,30,31,32,33]. Therefore, these data highlight the possibility that the MS could play a key role in cognitive flexibility, and that its effects on DA neuron population activity could be a downstream mechanism by which flexible decision-making is enacted.

Fig. 1

Proposed pathway from MS to midbrain and MS DREADD virus spread and expression. a Medial septum (MS) activation was shown previously to increase DA neuron population activity in the ventral tegmental area (VTA) and decrease it in the substantia nigra pars compacta (SNc) [23, 24]. Both effects required activation of the ventral subiculum (vSub), as intra-vSub infusions of TTX prevented both the VTA increase and the SNc decrease in DA neuron population activity [23]. The vSub activates the nucleus accumbens (NAc [26, 30]), which inhibits the ventral pallidum (VP [27]). VP regulation of midbrain DA neuron population activity was shown to be functionally divided along the rostral/caudal axis, with the rostral VP (rVP) selectively modulating DA neuron population activity changes in the VTA and the caudal VP (cVP) selectively modulating DA neuron population activity changes in the SNc [23]. VP inhibition disinhibits the VTA, leading to an increase in DA neuron population activity. In contrast, VP inhibition more potently disinhibits the substantia nigra pars reticulata (SNr) due to the greater sensitivity of GABAA receptors on reticulata GABAergic interneurons compared to SNc DA neurons [58,59,60]. Thus, DA neurons in the SNc are inhibited by SNr disinhibition, leading to a decrease in DA neuron population activity. b Diagram and corresponding representative photomicrograph showing the extent of the viral spread at the point of infusion, i.e., the area with the greatest expression and medial–lateral spread. This was well-contained within the MS and occurred at AP 0.48 for the majority of DREADD (red; right) and control (green; left) rats. Arrows denote the actual placement of the representative photomicrograph compared to the diagram to provide further context for overall viral placement in brain. Representative images at 30x magnification to show DREADD expression around cell bodies in MS. 
The c red staining is the endogenous reporter molecule, mCherry, and the d green is the mouse mCherry primary and goat anti-mouse (Alexa 488) secondary stain. e The overlay shows co-staining of the endogenous reporter and the primary/secondary antibody stain, suggesting good expression of the DREADD receptor in the MS and good selectivity of the antibody for mCherry

To test this hypothesis, we activated the MS of male Sprague–Dawley rats with designer receptors exclusively activated by designer drugs (DREADDs) and measured their performance on a T-maze spatial reversal-learning task. Next, we determined if both the effects of MS activation on reversal learning and the previously described effects on VTA and SNc DA neuron population activity (measured using in vivo electrophysiology) were mediated via the same pathway. Finally, we determined whether DA transmission at D1 receptors was necessary to produce the MS’s effects on reversal learning.

Materials and methods

Animals

Experiments were performed using adult male Sprague–Dawley rats (400–500 g, Envigo, Frederick, MD). Rats were housed in pairs with ad libitum access to food and water in a temperature- and humidity-controlled room before and after DREADD infusion surgeries. Experimental procedures were approved by the Institutional Animal Care and Use Committee of the University of Pittsburgh in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals.

Viral construct and survival surgeries

Rats were secured to a stereotaxic frame under general anesthesia (isoflurane in oxygen, 5% induction and 2% maintenance). A 5.0 µL Hamilton (Reno, NV) syringe with a 30 gauge needle connected to a Micro4 microsyringe pump controller (World Precision Instruments, Sarasota, FL) was lowered into the MS through a burr hole in the skull at a 5° angle to avoid the sinus (AP: 0.5, ML: −0.54, DV: −6.22 mm from bregma). A virus with the activating DREADD attached to the human synapsin promoter and the mCherry reporter (AAV2–hSyn–hM3Dq–mCherry; Addgene, Watertown, MA) or an empty vector control (AAV2–hSyn–EFP) was infused into the MS (0.1 µL of air + 0.8 µL virus, 0.1 µL/min) over 9 min, with the needle left in place for an additional 9 min to allow for adequate viral diffusion. The burr hole was then sealed with bone wax and the incision closed with EZ clips. Rats were re-paired after a 2-week recovery and they remained for 8 or 12 total weeks to allow sufficient cell body and terminal viral expression, respectively.

Cannula implantation

Ten weeks after virus infusion surgeries, a subset of rats received a second surgery to implant a 26 gauge guide cannula (Plastics One) into their ventral subiculum (vSub; AP: −6.0, ML: 4.5 mm from bregma; DV: −7.5 mm from skull). Guide cannulae were implanted 1 mm above the target DV (−8.5 mm) to allow for a 1 mm protrusion of the infusion cannula. The sterile surgery procedure was followed as above, and cannulae were secured with bone screws and dental cement. Rats were re-paired after a 1-week recovery.

T-maze reversal learning

Rats were split into two groups. The first group began training 8 weeks after virus infusion surgery (viral transfection of MS cell bodies) and received systemic (intraperitoneal, IP) injections of CNO or vehicle (1st experiment) or of CNO or vehicle together with SCH23390 (3rd experiment). The second group began training 12 weeks after virus infusion surgery (viral transfection of MS terminals in vSub) and received CNO or vehicle infusions via an implanted cannula (see above) directly into the vSub (2nd experiment; intra-vSub).

Training

Briefly (see supplementary methods and Fig. S.1 for further detail), rats were habituated to the maze on the first day of training, which consisted of free exploration of the maze, with random treat placement, for 15–20 min. On the 2nd day, rats were required to learn that 1 arm of the maze was baited and the other was not (egocentric discrimination, left or right arm). Once an arm was entered, a sliding door was lowered, which prevented entrance into the other arm. If the rat chose correctly, it could consume the food reward; if not, no reward was given. Trials continued until rats “learned” which side was correct. Learning was defined as reaching a criterion of 6 correct trials in a row.

Test day

On the 3rd day, rats were given systemic clozapine-N-oxide hydrochloride (1st experiment; IP CNO, 3 mg/kg converted to 3.639 mg/kg to account for the molecular weight increase of the salt form; Tocris) mixed in Dulbecco’s phosphate buffered saline (dPBS; Sigma Aldrich) or dPBS (vehicle), intra-vSub CNO (2nd experiment; 1 mM, 0.5 µL) or vehicle, or systemic CNO (3rd experiment, IP; 3 mg/kg) or vehicle and SCH23390 (3 µg/kg; Tocris; dose similar to those shown not to affect controls in previous studies [34,35,36]) and then placed into the start arm 10 (intra-vSub infusion) or 30 (IP injection) minutes later (see Fig. S.1 for timeline). See supplemental materials and methods for a discussion on the use of CNO. The task began with the same side as day 2 baited and the trials were carried out in the same manner as day 2. Upon re-reaching criterion, the opposite side was baited. Following the reversal, the session continued until criterion was reached on the newly baited side (6 correct in a row). Total trials to reach criterion before and after the reversal, number of entries into the previously baited side post-reversal, consecutive trials before first entrance into the newly baited side, and latency to reach the reward cup before and after the reversal were quantified. Each trial was also classified into one of 4 categories (win-stay, win-shift, lose-stay, lose-shift) according to the outcome of the previous trial: a correct choice following a correct (win-stay) or incorrect (lose-shift) choice, or an incorrect choice following a correct (win-shift) or incorrect (lose-stay) choice.
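The behavioral scoring described above can be illustrated with a minimal Python sketch. It is not the authors' scoring code: the function names, the boolean per-trial encoding, and the choice to count the six criterion trials toward "trials to criterion" are assumptions made for illustration, and the example sequence is invented.

```python
from typing import Dict, List

CRITERION = 6  # consecutive correct trials, as defined in the text


def trials_to_criterion(correct: List[bool], criterion: int = CRITERION) -> int:
    """Return the 1-based trial on which the criterion run is completed.

    Assumption: the criterion trials themselves are counted.
    Returns -1 if the criterion is never reached.
    """
    run = 0
    for trial, ok in enumerate(correct, start=1):
        run = run + 1 if ok else 0  # an error resets the run
        if run == criterion:
            return trial
    return -1


def classify_trials(correct: List[bool]) -> Dict[str, int]:
    """Count trial types from the second trial on, based on the previous outcome."""
    counts = {"win-stay": 0, "win-shift": 0, "lose-stay": 0, "lose-shift": 0}
    for prev, cur in zip(correct, correct[1:]):
        if prev and cur:
            counts["win-stay"] += 1    # correct after correct
        elif prev and not cur:
            counts["win-shift"] += 1   # incorrect after correct
        elif not prev and not cur:
            counts["lose-stay"] += 1   # incorrect after incorrect
        else:
            counts["lose-shift"] += 1  # correct after incorrect
    return counts


# Invented post-reversal session: True = entry into the newly baited arm.
post_reversal = [False, False, True, False, True, True, True, True, True, True]
print(trials_to_criterion(post_reversal))  # 10
print(classify_trials(post_reversal))
print(post_reversal.count(False))          # entries into the previously baited arm: 3
```

For this invented sequence the classification yields 5 win-stay, 1 win-shift, 1 lose-stay, and 2 lose-shift trials; the first trial after the reversal has no predecessor in the list and is therefore left unclassified in this sketch.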

Electrophysiological recordings

Recordings were performed as previously reported [23, 24] (see supplemental materials and methods). CNO (1 mM, 0.5 µL) or vehicle was infused into the vSub, as above (0.5 µL/min), and 10 min later electrophysiological recording of the VTA or SNc began. A glass recording electrode was lowered into the brain in nine sequential vertical “tracks” (see Fig. 4), moving lateral or posterior 0.2 mm for each new track. This recording pattern allowed assessment of the total number of active DA neurons recorded within an animal, defined as population activity, which was then averaged across the total number of tracks recorded (neurons/track). DA neuron population activity was normalized within each animal and then analyzed across animals. Coordinates were determined using an atlas [37] and follow a previously described pattern [38,39,40]. DA neurons were identified using well-established criteria [41,42,43]. Once identified, neurons were recorded for 3 min and assessed for firing rate and burst firing properties. At the conclusion of the ninth track, electrophoretic ejection of Chicago sky blue dye marked the recording location for histological confirmation of electrode site.
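As a small worked illustration of the population activity measure (neurons/track) and the per-neuron firing rate, the following Python sketch may help. The function names and the example counts are hypothetical and are not taken from the study; burst analysis is omitted because the burst criteria are given only by reference in the text.

```python
from typing import List


def population_activity(active_neurons_per_track: List[int]) -> float:
    """Mean number of spontaneously active DA neurons per electrode track."""
    if not active_neurons_per_track:
        raise ValueError("at least one track is required")
    return sum(active_neurons_per_track) / len(active_neurons_per_track)


def firing_rate_hz(n_spikes: int, duration_s: float = 180.0) -> float:
    """Average firing rate over a recording (default: the 3 min used in the text)."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_spikes / duration_s


# Hypothetical counts of active DA neurons in nine sequential tracks of one rat.
tracks = [2, 1, 0, 1, 2, 1, 1, 0, 1]
print(population_activity(tracks))  # 1.0 neurons/track
print(firing_rate_hz(720))          # 4.0 Hz
```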

Experimental design

To minimize animal number, multiple endpoints were gathered from each rat. Each rat performed two runs in the T-maze reversal learning task, counter-balanced across groups and separated by at least 2 weeks to minimize carry-over effects, and was then used for non-survival electrophysiology experiments when possible.

Histology

Rats were sacrificed, perfused transcardially with an aldehyde-based fixative for histological analysis of brain sections, and then decapitated. Brains were removed, fixed in 4% paraformaldehyde solution, cryoprotected in a 25% sucrose solution, sectioned (coronal sections; 60 µm for placement histology, 40 µm for fluorescent histology), and mounted on gelatin-coated slides. Slides were either stained with cresyl violet for histological verification of cannula placement [37] or stained with a mouse mCherry primary (1:8000, Abcam-ab125096) and goat anti-mouse secondary stain (Alexa 488, 1:500; Abcam-ab150113) for verification of DREADD virus transfection and spread.

Statistical analysis

DA neuron population activity was analyzed using LabChart and NeuroExplorer. Electrophysiological measures included DA neuron population activity (neurons/track), neuron firing rate (Hz), and percentage of spikes occurring in bursts (%), within each rat. Behavioral dependent measures, error type, and trial progression (as mentioned above) were also quantified. All measures (reported as mean ± SEM) were analyzed by one-way ANOVA. Post-hoc analyses were performed using Tukey’s test. Significance was defined as P < 0.05 (GraphPad Prism 7).
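For readers unfamiliar with the test, the F statistic of a one-way ANOVA and its degrees of freedom can be computed as in the minimal pure-Python sketch below. This is an illustration only, not the GraphPad Prism procedure used in the study; the function name and the toy data are assumptions, and neither the p-value nor Tukey’s post-hoc test is implemented here.

```python
from typing import List, Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    means: List[float] = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (invented): three groups of three observations each.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)
```

Because the degrees of freedom are k − 1 and N − k, a reported F3,43 is what four groups and 47 observations in total would give under this formula.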

Results

DREADD expression

Viral vectors were highly expressed by cell bodies and terminals, but expression at the infusion site was well-contained within the MS (Fig. 1b, see Fig. S.2 for viral spread diagram for every rat used in this study). DREADD rats that received CNO (DR/CNO) in which viral expression was found to be substantially in regions lateral or ventral to the MS were analyzed separately (see Figs. S.3 and S.5). In terms of anterior to posterior viral expression, the mean spread range across all rats was from AP: 1.40 ± 0.04 to 0.05 ± 0.02; thus, encompassing the vast majority of the anterior-posterior range of the MS.

DREADD activation of the MS enhances spatial reversal learning

To determine the effect of MS activation on spatial reversal learning performance, the activating DREADD (DR) or empty vector control (Con) was infused into the MS of Sprague–Dawley rats (N = 11–13 rats per group). After 8 weeks, CNO (3 mg/kg) or vehicle (Veh) was injected systemically and the reversal learning task began 30 min later (Fig. S.1). On the first half of the test day, DREADD rats that received systemic CNO (DR/CNO) reached criterion on the initial egocentric discrimination slightly, but not significantly, faster than control rats (Fig. 2a; mean ± SEM, trials to criterion before the reversal = Con/Veh: 12.5 ± 1.5, Con/CNO: 16.9 ± 1.5, DR/Veh: 14.9 ± 2.7, DR/CNO: 10.8 ± 1.4; F3,43 = 2.211, p = 0.101). Following the reversal, however, DR/CNO rats reached criterion in the reversal in significantly fewer trials (Fig. 2b; Con/Veh: 25.4 ± 1.9, Con/CNO: 24.5 ± 2.1, DR/Veh: 22.6 ± 1.9, DR/CNO: 15.9 ± 1.5; F3,43 = 5.28, p = 0.0034) and entered the previously baited arm significantly fewer times (Con/Veh: 9.2 ± 0.8, Con/CNO: 9.6 ± 1.0, DR/Veh: 10.4 ± 1.6, DR/CNO: 5.6 ± 0.6; F3,42 = 4.24, p = 0.0104) compared to the control groups (Fig. 2c). This effect was specific to rats with DREADD infusion in the MS, as viral infusion in the lateral septum or horizontal limb of the diagonal band (N = 3 rats) produced behavioral effects not different from control rats (See Fig. S.3; trials to criterion after reversal = 24.7 ± 1.5, entries into the previously baited arm = 10.0 ± 2.1). The enhanced reversal learning effect was not driven by a change in motivation [44] or a poorly formed set in the DR/CNO rats as all rats had a similar latency to reach the reward cup at the end of the T-maze arms, and had similar numbers of consecutive trials before their first entry into the newly baited side (Fig. S.4). 
DR/CNO rats did, however, make significantly fewer win-shift (Con/Veh: 4.7 ± 0.5, Con/CNO: 4.3 ± 0.6, DR/Veh: 3.6 ± 0.8, DR/CNO: 2.1 ± 0.5; F3,42 = 3.80, p = 0.0168), but not lose-stay errors, compared to controls (p > 0.1; Fig. 2d, e).

Fig. 2

DREADD activation of the MS enhances reversal learning. Data are reported as mean ± SEM and black dots indicate values for each individual rat. a DREADD rats that received systemic CNO (DR/CNO; 3 mg/kg) showed a small, but nonsignificant reduction in the number of trials required to perform the initial egocentric discrimination (trials to criterion of 6-in-a-row correct; p = 0.101) compared to control rats that received vehicle (Con/Veh), control rats that received CNO (Con/CNO), and DREADD rats that received vehicle (DR/Veh). b Following the reversal, DR/CNO rats reached criterion in significantly fewer trials compared to controls (F3,43 = 5.28, p = 0.0034). *Post-hoc Tukey’s tests revealed a significant reduction compared to Con/Veh (p = 0.0053) and Con/CNO (p = 0.0093) rats, with a nonsignificant trend compared to DR/Veh rats (P = 0.0778). c DR/CNO rats also chose the previously baited arm significantly fewer times compared to controls (F3,42 = 4.24, p = 0.0104). *Post-hoc Tukey’s tests revealed a significant reduction compared to Con/CNO (p = 0.0337) and DR/Veh (p = 0.0124) rats, with a nonsignificant trend compared to Con/Veh rats (p = 0.0869). d DR/CNO rats made significantly fewer win-shift (F3,42 = 3.80, p = 0.0168), e but not lose-stay (Con/Veh: 4.1 ± 0.8, Con/CNO: 4.5 ± 0.7, DR/Veh: 6.1 ± 1.6, DR/CNO: 3.0 ± 0.7; F3,42 = 1.643, p = 0.194) errors. *DR/CNO rats had fewer win-shift errors than Con/Veh (P = 0.0176) and Con/CNO (P = 0.0455), but not DR/Veh rats (P = 0.280). f DREADD rats that received intra-vSub CNO (DR/CNO; 1 mM) had no reduction in the number of trials required to perform the initial egocentric discrimination (trials to criterion of 6-in-a-row correct; P = 0.633) compared to control rats. g However, following the reversal, DR/CNO rats reached criterion in significantly fewer trials compared to controls (F3,47 = 6.01, p = 0.0015). 
*Post-hoc Tukey’s tests revealed a significant reduction compared to Con/CNO (p = 0.0257) and DR/Veh (p = 0.0009) rats, but not Con/Veh rats (p = 0.230). h DR/CNO rats also chose the previously baited arm significantly fewer times compared to controls (F3,47 = 5.59, p = 0.0023). *Post-hoc Tukey’s tests again revealed a significant reduction compared to Con/CNO (p = 0.0068) and DR/Veh (p = 0.0058) rats, but not Con/Veh rats (p = 0.420). i DR/CNO rats made significantly fewer win-shift (F3,47 = 5.62, p = 0.0022) and j lose-stay errors (F3,47 = 3.016, p = 0.0391). However, only win-shift errors showed significant post-hoc differences. *DR/CNO rats had fewer win-shift errors than DR/Veh (P = 0.0012) and Con/Veh (P = 0.038) rats, while the difference from Con/CNO rats was only a trend (P = 0.0832)

MS activation-induced effects on reversal learning are mediated via MS to vSub pathway

To demonstrate that the behavioral effects from the first experiment are produced by specific activation of the MS to vSub pathway (Fig. 1a), DREADD (DR) or control (Con) virus was infused into the MS and a guide cannula was implanted into the vSub. Twelve weeks later (to allow for terminal expression), CNO (1 mM) or vehicle (Veh) was infused directly onto MS terminals in the vSub (Fig. 3) and the T-maze reversal learning paradigm was performed 10 min later (N = 11–15 rats per group). All rats reached criterion on the initial egocentric discrimination at a similar rate (Fig. 2f; mean ± SEM, trials to criterion before the reversal = Con/Veh: 16.9 ± 2.4, Con/CNO: 15.9 ± 2.2, DR/Veh: 18.7 ± 1.9, DR/CNO: 14.8 ± 1.4; F3,47 = 0.576, p = 0.633). Following the reversal, however, DR/CNO rats again reached criterion in significantly fewer trials (Fig. 2g; Con/Veh: 22.3 ± 1.5, Con/CNO: 25.0 ± 2.0, DR/Veh: 29.3 ± 3.4, DR/CNO: 16.6 ± 1.2; F3,47 = 6.01, p = 0.0015) and entered the previously baited arm significantly fewer times (Fig. 2h; Con/Veh: 8.3 ± 0.7, Con/CNO: 10.7 ± 1.0, DR/Veh: 11.1 ± 1.4, DR/CNO: 6.3 ± 0.6; F3,47 = 5.59, p = 0.0023), similar to the effects observed with systemic CNO. This enhancement in reversal learning was specific to rats that had both DREADD-containing viral expression in the MS and cannula placements within the vSub, as animals with DREADD expression in the MS, but cannula placements dorsal or medial to the vSub (in the dentate gyrus, N = 5 rats; Fig. S.5; trials to criterion after reversal = 28.8 ± 1.6; entries into previously baited arm = 11.8 ± 1.5) or rats that had cannula placements in the vSub, but DREADD expression in the lateral septum or horizontal limb of the diagonal band (N = 3 rats; Fig. S.5; trials to criterion after reversal = 25.7 ± 3.8, entries into previously baited arm = 10.7 ± 3.2) did not show the reversal learning enhancement. 
As in the first experiment, latency to reach the reward cup and number of consecutive trials before first entry into the newly baited side (Fig. S.4) were not different across groups. However, in contrast to the first experiment, DR/CNO animals showed a significant reduction in both number of win-shift (Fig. 2i; Con/Veh: 4.6 ± 0.5, Con/CNO: 4.3 ± 0.6, DR/Veh: 5.8 ± 0.8, DR/CNO: 2.3 ± 0.4; F3,47 = 5.62, p = 0.0022) and lose-stay errors (Fig. 2j; Con/Veh: 2.8 ± 0.4, Con/CNO: 5.5 ± 1.0, DR/Veh: 4.5 ± 1.0, DR/CNO: 3.0 ± 0.4; F3,47 = 3.016, p = 0.0391). However, only win-shift errors showed significant post-hoc differences (Tukey’s test).

Fig. 3

DREADD terminal expression and cannulae placements in vSub. Representative photomicrograph showing extensive DREADD terminal expression in the vSub at a 10x and b 20x magnification. Terminals (green) are stained with the mouse mCherry primary and goat anti-mouse secondary stain (Alexa 488) and are shown near DAPI-stained cell bodies in vSub (blue). c Placement map showing the termination of the cannula location in vSub (blue dot) for each rat included in the reversal learning analysis. All rats included in the primary behavioral and electrophysiological analyses had placements within the vSub, while rats with cannula placements outside the vSub were analyzed separately (Fig. S.5). d Representative cannula placement showing termination of the cannula in the vSub. The top arrow marks the bottom of the guide cannula, and the bottom arrow marks the bottom of the infusion cannula

MS activation-induced effects on midbrain DA neuron population activity are also mediated via the MS to vSub pathway

To confirm that the effects of MS activation on DA neuron population activity, as previously reported [23, 24], are also due to the specific pathway from MS to vSub (Fig. 1a), the same manipulations as above were done (Fig. 3) and in vivo DA neuron recordings (similar to those previously published [23, 24]) of the VTA or SNc began 10 min later (N = 6–7 rats per group, per region). DR/CNO rats showed a significant increase in the number of active DA neurons in the VTA (1.5 ± 0.1 active DA neurons per track) compared to all three control groups (Fig. 4e; Con/Veh: 1.0 ± 0.1, Con/CNO: 1.0 ± 0.1, DR/Veh: 0.9 ± 0.1; F3,21 = 8.83, p = 0.0006). When DA neuron population activity was measured in the SNc, DR/CNO rats showed a significant decrease in the number of active DA neurons (1.0 ± 0.1 active DA neurons per track) compared to control groups (Fig. 4g; Con/Veh: 1.7 ± 0.1, Con/CNO: 1.7 ± 0.1, DR/Veh: 1.6 ± 0.04, F3,21 = 11.18, p = 0.0001). The changes to midbrain DA neuron population activity were specific to rats that had both DREADD-containing viral expression in the MS and cannula placements within the vSub, as cannula or virus misses did not show the above-mentioned midbrain DA neuron population activity effects (N = 3 rats; Fig. S.5). DA neuron firing frequency and bursting activity were not affected by DR/CNO treatment in either the VTA or SNc (Fig. 4; p’s > 0.1).

Fig. 4

DREADD activation of the MS to vSub pathway increases DA neuron population activity in VTA and decreases it in SNc. Population activity in the a VTA and c SNc was measured in 9 sequential electrode tracks in the anatomical range depicted (VTA = AP: 5.3–5.7 mm posterior from bregma, ML: 0.6–1.0 mm lateral from sinus, and DV: 6.5–9.0 mm from the brain surface; SNc = AP: 4.9–5.3 mm posterior from bregma, ML: 2.0–2.4 mm lateral from sinus, and DV: 6.5–9.0 mm from the brain surface). Corresponding representative histological slices showing tracks in the b VTA and d SNc (black arrows). d Following the conclusion of the 9th track, Chicago Sky Blue is ejected from the electrode leaving a blue dot for histological verification (white arrow). e DREADD activation of MS terminals in vSub (CNO; 1 mM) significantly increased DA neuron population activity in the VTA, but did not affect firing frequency or the percentage of spikes in bursts (Firing frequency in Hz = Con/Veh: 4.5 ± 0.2, Con/CNO: 4.1 ± 0.4, DR/Veh: 3.5 ± 0.5, DR/CNO: 4.0 ± 0.3, F3,21 = 1.39, p = 0.273; percent of DA neurons firing in bursts = Con/Veh: 36.3 ± 5.8, Con/CNO: 27.2 ± 5.0, DR/Veh: 28.2 ± 6.5, DR/CNO: 32.1 ± 3.0, F3,21 = 0.633, p = 0.602). ***significant increase in the number of active DA neurons per electrode recording track in DR/CNO rats as compared to Con/Veh (P = 0.0036), Con/CNO (P = 0.0116), and DR/Veh (P = 0.0008) rats (Tukey’s test). f Representative DA neuron recording and individual spike. The time bar for the recording equals 1 s, demonstrating a frequency of ~7 spikes per second during that particular interval. Arrows indicate sets of spikes that are occurring in bursts. 
g MS activation decreased DA neuron population activity in the SNc, but did not affect firing frequency or the percentage of spikes in bursts (Firing frequency in Hz = Con/Veh: 3.3 ± 0.4, Con/CNO: 3.0 ± 0.2, DR/Veh: 3.5 ± 0.3, DR/CNO: 3.7 ± 0.4, F3,21 = 0.803, p = 0.506; percent of DA neurons firing in bursts = Con/Veh: 19.4 ± 5.8, Con/CNO: 15.7 ± 3.3, DR/Veh: 8.1 ± 1.0, DR/CNO: 19.3 ± 5.0, F3,21 = 1.506, p = 0.242). ***significant decrease in the number of active DA neurons per electrode recording track in DR/CNO rats as compared to Con/Veh (p = 0.0004), Con/CNO (p = 0.0006), and DR/Veh (p = 0.0024) rats (Tukey’s test). Data are reported as mean ± SEM and black dots indicate respective DA activity for each individual rat

MS activation-induced enhancement of reversal learning was prevented by D1 antagonist administration

To test whether DA transmission is necessary for the MS activation-induced enhancement of reversal learning, DREADD (DR) or control (Con) viruses were infused into the MS of Sprague–Dawley rats (N = 13–16 rats per group). Eight weeks later, CNO (3 mg/kg) or vehicle (Veh) and the D1 antagonist SCH23390 (SCH, 3 µg/kg) were injected systemically and the reversal learning task began 30 min later. Systemic injection of SCH23390 was performed, instead of local infusion, to generate D1 antagonism in both the striatum and PFC, which are both involved in reversal learning [11,12,13,14,15] and targets of reward-related midbrain DA transmission [29, 30]. SCH co-treatment had no effect on the rats’ ability to reach criterion on the initial egocentric discrimination (Fig. 5a; mean ± SEM, trials to criterion before the reversal = Con/Veh/SCH: 14.3 ± 1.7, Con/CNO/SCH: 12.7 ± 1.6, DR/Veh/SCH: 16.3 ± 1.3, DR/CNO/SCH: 14.5 ± 1.9; F3,54 = 0.834, p = 0.481). Following the reversal, however, SCH co-treatment prevented the reversal learning enhancement observed previously in DR/CNO rats both in terms of trials to reach criterion (Fig. 5b; Con/Veh/SCH: 19.3 ± 1.3, Con/CNO/SCH: 20.6 ± 2.0, DR/Veh/SCH: 19.1 ± 1.4, DR/CNO/SCH: 20.5 ± 2.1; F3,54 = 0.204, p = 0.893) and entries into the previously baited arm (Fig. 5c; Con/Veh/SCH: 6.8 ± 0.8, Con/CNO/SCH: 8.0 ± 1.2, DR/Veh/SCH: 7.1 ± 0.7, DR/CNO/SCH: 6.5 ± 0.8; F3,54 = 0.554, p = 0.648) when compared to control groups. Importantly, control groups from this experiment performed comparably to control groups from the previous two experiments both in terms of overall trial number (see above) and post-reversal latency to reach the reward cup (Fig. S.4). This suggests that SCH co-treatment prevented the reversal learning enhancement in the DR/CNO rats without affecting either control performance or post-reversal motivation. 
Further analysis into error type revealed that the significant reduction in win-shift errors seen in the DR/CNO rats in both of the previous experiments was eliminated by SCH treatment (Fig. 5d; Con/Veh/SCH: 2.6 ± 0.3, Con/CNO/SCH: 3.1 ± 0.6, DR/Veh/SCH: 2.9 ± 0.5, DR/CNO/SCH: 3.5 ± 0.6; F3,54 = 0.557, p = 0.646).

Fig. 5

Co-injection of SCH prevents reversal learning enhancement. Data are reported as mean ± SEM and black dots indicate values for each individual rat. a Systemic injection of the D1 antagonist, SCH23390 (3 µg/kg) with CNO, did not affect the rats’ memory of the initial egocentric discrimination. However, SCH treatment completely prevented the reversal learning enhancement seen in previous experiments both in terms of b trials to reach criterion, post-reversal and c number of entries into the previously baited side, without affecting control performance. SCH23390 co-treatment (DR/CNO/SCH) also eliminated the previously seen decreases in d win-shift (F3,54 = 0.557, p = 0.646) and e lose-stay errors (Con/Veh/SCH: 4.0 ± 1.0, Con/CNO/SCH: 3.9 ± 1.0, DR/Veh/SCH: 3.8 ± 0.7, DR/CNO/SCH: 2.5 ± 0.6; F3,54 = 0.557, p = 0.646)

Discussion

These experiments show that DREADD activation of the MS enhanced reversal learning by significantly reducing both the number of trials required to reach post-reversal criterion and the number of entries into the previously baited side. This effect seemed to be driven by an enhanced ability to learn the new rule, as DR/CNO rats made fewer win-shift but not lose-stay errors. Moreover, specific activation of the MS to vSub pathway recapitulated the reversal learning enhancement from the first experiment, as well as the previously demonstrated MS activation-induced increase in VTA and decrease in SNc DA neuron population activity [23, 24], demonstrating that both effects were mediated via the same pathway. Finally, systemic co-injection of a DA D1 antagonist with CNO completely prevented the reversal learning enhancement seen in the previous experiments, without affecting control performance. Taken together, these data suggest a key role for the MS in the circuitry that has been previously implicated in reversal learning [12, 14,15,16], and suggest that one potential mechanism by which flexible decision-making is enacted downstream may be through MS-mediated changes in midbrain DA neuron population activity.

Proposed pathway and mechanism by which MS activation enhances reversal learning

Our data demonstrated that activation of MS terminals in vSub precipitated both the DA neuron population activity changes in the midbrain and the reversal learning enhancement. This suggests that these effects were not precipitated via the MS’s direct projections to the midbrain [45, 46], but rather through the indirect pathway via the hippocampus [23, 26] and ventral pallidum [23, 27] (Fig. 1a). This is consistent with our prior data demonstrating that TTX infusion into the vSub or bicuculline infusion into the ventral pallidum was sufficient to eliminate MS activation-induced effects on DA neuron population activity in both the VTA and SNc [23]. The MS makes widespread connections within the ventral hippocampus, however, including regions near the vSub such as dentate gyrus and CA1 [47, 48]. Therefore, even direct infusion of CNO into the vSub could reach DREADD-expressing terminals in these regions. Nonetheless, rats with good MS DREADD expression but cannula implantations into the dentate gyrus performed as controls (Fig. S.5). This does not rule out the CA1, but does indicate that CNO infusions just medial and dorsal to the vSub (in the dentate gyrus) were not able to diffuse to the vSub in a high enough concentration to produce the reversal learning enhancement. This suggests that mediation of the reversal learning enhancement by the MS’s projection to vSub may be a more parsimonious explanation than diffusion to CA1.

The VTA projects primarily to PFC and ventral/associative striatum and plays a role in goal-directed behavior [29, 30], while the SNc primarily projects to dorsolateral striatum and plays a role in habit-related responding [30, 49]. Therefore, our overall hypothesis is that the MS enhances reversal learning by weakening responding to the previous rule (SNc decrease), while re-opening goal-directed, active search for the new rule (VTA increase). We propose that this could occur via the following process. First, a rule switch would occur leading to a reduction in reward acquisition and subsequent activation of the MS. MS activation would precipitate an increase in VTA DA neuron population activity and a decrease in SNc DA neuron population activity via the previously described pathway [25,26,27,28] (Fig. 1a). Decreases in SNc activity via dopamine depletion [49] or transient inactivation [50] have been shown to reduce habit-formation through retained sensitivity to outcome devaluation. Therefore, one possibility is that the reduction in SNc DA neuron population activity seen following MS activation, which would reduce the number of neurons available to respond to phasic input [28], could act similarly, producing a state where devaluation of the previously baited arm could happen more quickly once it was found to no longer contain a reward. Concurrently, increases in VTA DA neuron population activity would increase the number of active DA neurons available to respond to an incoming phasic signal, as DA neurons must be spontaneously active to burst fire [28, 51]. Interestingly, phasic DA neuron activity in the VTA is thought to code for reward prediction error (RPE), with prior studies showing that unexpected reward leads to a large increase in VTA DA neuron burst firing that dissipates once a reward is expected [33]. 
Therefore, we propose that an MS activation-induced increase in the number of spontaneously active VTA DA neurons would not affect behavior in isolation, but would significantly enhance the magnitude of the incoming RPE-like phasic signal that would likely occur upon finding a rewarding treat in a place where it previously had not been (the newly baited arm). This enhanced RPE-like positive feedback would likely increase the salience of the newly baited side and speed the learning process [52], leading to a quicker incorporation of the new rule into continued behavior.

A second possibility is that the enhanced reversal learning effect was due to augmented tonic DA transmission, possibly leading to an increase in motivation [53]. This possibility exists because our data demonstrated that DREADD activation of the MS led to a stable increase in DA neuron population activity in the VTA that lasted 1–1.5 h. This is less likely, however, for several reasons. First, our data demonstrated that MS activation reduced the number of win-shift errors, suggesting a decreased likelihood of making subsequent entries into the previously baited arm (errors) once the newly baited arm was discovered. Additionally, we saw no effect on the latency to reach the reward cup, which is similar to measures of motivation used by others [44]. This suggests that an enhancement in the learning of the newly baited side, a function commonly attributed to phasic DA transmission [32, 33, 54], is more likely to have been the cause of the reversal learning enhancement. Second, the enhancement in reversal learning was blocked by an antagonist of the D1 receptor, a receptor thought to require phasic DA transmission to be activated based on its lower affinity for DA [55]. Third, more recent data have demonstrated a dissociation between DA population activity changes and tonic DA levels, suggesting that tonic DA levels may be regulated more by local mechanisms [54], such as cholinergic interneurons [56]. Thus, we propose that the increase in VTA DA neuron population activity affected reversal learning via an enhancement of phasic RPE signals once the rat discovered the newly baited arm, as opposed to via effects on tonic DA levels throughout the behavioral session.

The role of the MS in learning and cognitive flexibility

Our results revealed that reversal learning, as one facet of cognitive flexibility, is likely regulated via the MS. Interestingly, we showed that MS activation did not significantly improve the rats’ ability to remember the initial egocentric discrimination (first half of the test day), a measure of spatial memory, which is a function that has been previously associated with the MS [17, 18]. This finding, however, is not likely to be in opposition to prior work for two reasons. First, prior studies associating the MS with spatial memory have primarily shown a deficit in memory following an MS lesion [17, 18], which is functionally distinct from not showing an enhancement following MS activation. Second, it is possible that the pathway involved in the reversal learning enhancement is different from that involved in spatial memory. For example, the MS projects throughout the hippocampus, including dorsal hippocampus [47, 48], as well as to the PFC [57]. Interestingly, systemic activation of the MS resulted in a minor, but not significant, enhancement of the initial egocentric discrimination, i.e., spatial memory, whereas direct activation of MS terminals in vSub had no effect on this initial discrimination (Fig. 2). Therefore, one possibility is that the MS’s role in reversal learning is mediated, specifically, via the MS to vSub pathway’s regulation of DA neuron population activity [23, 24], while other MS-mediated forms of learning and memory occur via MS projections to regions such as the dorsal hippocampus [47, 48] or PFC [57].

Summary

In conclusion, these data add substantially to the known circuitry involved in cognitive flexibility, demonstrating a key role for the MS to vSub pathway in reversal learning. They also provide evidence to suggest that this pathway’s regulation of VTA and SNc DA neuron population activity could be one mechanism by which strategy-switch signals, precipitated during cognitive flexibility tasks, could be enacted downstream. Taken together, these data provide a first step towards developing novel, more effective therapies to treat cognitive flexibility deficits.

Funding and disclosure

This work was funded by NIMH grants F32MH115550 (DMB) and MH057440-11 (AAG). AAG received funds from the following organizations: Lundbeck, Pfizer, Otsuka, Lilly, Roche, Asubio, Abbott, Autofony, Janssen, Alkermes, Newron.