

Dorsal and ventral striatal dopamine D1 and D2 receptors differentially modulate distinct phases of serial visual reversal learning


Impaired cognitive flexibility in visual reversal-learning tasks has been observed in a wide range of neurological and neuropsychiatric disorders. Although both human and animal studies have implicated striatal D2-like and D1-like receptors (D2R; D1R) in this form of flexibility, less is known about the contribution they make within distinct sub-regions of the striatum and the different phases of visual reversal learning. The present study investigated the involvement of D2R and D1R during the early (perseverative) phase of reversal learning as well as in the intermediate and late stages (new learning) after microinfusions of D2R and D1R antagonists into the nucleus accumbens core and shell (NAcC; NAcS), the anterior and posterior dorsomedial striatum (DMS) and the dorsolateral striatum (DLS) on a touchscreen visual serial reversal-learning task. Reversal learning was improved after dopamine receptor blockade in the nucleus accumbens; the D1R antagonist, SCH23390, in the NAcS and the D2R antagonist, raclopride, in the NAcC selectively reduced early, perseverative errors. In contrast, reversal learning was impaired by D2R antagonism, but not D1R antagonism, in the dorsal striatum: raclopride increased errors in the intermediate phase after DMS infusions, and increased errors across phases after DLS infusions. These findings indicate that D1R and D2R modulate different stages of reversal learning through effects localised to different sub-regions of the striatum. Thus, deficits in behavioral flexibility observed in disorders linked to dopamine perturbations may be attributable to specific D1R and D2R dysfunction in distinct striatal sub-regions.


Cognitive flexibility, the ability to adapt behavior to changes in the environment, is impaired in a wide range of neurological and neuropsychiatric disorders, including schizophrenia [1], obsessive-compulsive disorder (OCD) [2], Parkinson’s disease (PD) [3] and substance use disorder [4]. Such cognitive dysfunction can be evaluated in reversal-learning tasks. Converging evidence from such tests implicates dopamine (DA) as an important modulator of reversal learning. For instance, systemic blockade or agonism of D2-like receptors (D2R) impairs reversal learning in vervet monkeys and rats [5, 6], while D2R knockout mice show deficiencies in initial visual discrimination and in reversal learning [7]. In contrast, pharmacological activation of D1-like receptors (D1R) impaired early phases of reversal learning [8], whereas D1R antagonism did not alter reversal learning performance [5]. In healthy humans, repeat variations in the dopamine transporter gene, DAT1, have been linked to performance during the early, perseverative phase of reversal learning, when prior beliefs about the stimulus-reward outcomes still guide behavior, whereas accuracy during later phases, when new learning takes place, showed no such link [9].

The main sub-regions of the dorsal striatum, namely the caudate nucleus and the putamen in primates and the dorsomedial and dorsolateral striatum in rodents (DMS; DLS), have also been differentially linked to reversal learning. Recent evidence suggests that pharmacological inactivation of the putamen and caudate nucleus differentially affects serial visual reversal learning in marmoset monkeys [10]. Furthermore, D2R availability in these sub-regions of vervet monkeys is associated with reversal learning performance [11]. Importantly, the DMS appears strongly linked to the early, perseverative phase of reversal, whereas the DLS becomes engaged during later stages [12]. This is perhaps in line with the view that the DLS mediates stimulus-response habits whereas the DMS, especially the anterior over the posterior DMS (aDMS; pDMS; [13], but see [14]), is more strongly associated with goal-directed actions [15]. Both forms of control over instrumental behavior are likely necessary for implementing a new strategy following contingency reversal, specifically the ability to suppress prepotent, perhaps habitual, responding to the previously rewarded (and now unrewarded) stimulus, and flexibly learn to select, via goal-directed behavior, the previously unrewarded (now rewarded) option [16].

In the ventral striatum, previous studies have shown that increased dopaminergic tone in the nucleus accumbens (NAc), or infusions of a D2R agonist (quinpirole) into this area, impaired reversal learning in rats [17], whereas infusions of a D1R agonist (SKF81297) disrupted set-shifting by increasing perseverative behavior [17, 18]. Lesions of the NAc disrupted initial stimulus discrimination and reversal learning [19, 20], including spatial, but not visual, reversal learning in monkeys [21], and pharmacological inactivation impaired probabilistic learning in rats [22]. However, other studies report no effect of NAc interventions on such flexibility [23, 24]. This discrepancy may be explained by the heterogeneity of the NAc, with the core and shell sub-regions (NAcC; NAcS) contributing differentially to attention [25, 26] and impulsivity-related behaviors [27,28,29], and often being suggested to play opposite roles in modulating behavior. For instance, inactivation of the NAcS impaired probabilistic reversal performance in rats, identifying a key role for this nucleus in using probabilistic reward feedback to facilitate discriminative learning and flexibility, whereas inactivation of the NAcC, while not affecting performance accuracy, did cause a general slowing of approach toward the response levers [22].

Taken together, this evidence suggests a general pattern of impaired reversal learning when DA activity is low in the dorsal striatum and when the dopaminergic tone is elevated in the ventral striatum. However, there is no clear evidence of the role of D1R and D2R in different sub-regions of the striatum in visual reversal learning or of their involvement in its different learning phases.

We therefore sought to investigate whether D1R and D2R differentially modulate reversal learning across striatal sub-regions (DLS, aDMS, pDMS, NAcC and NAcS) and across the different phases of reversal learning, by examining the behavioral effects of local administration of a D2R antagonist and a D1R antagonist in a recently established touchscreen task for rats [30].

Materials and methods


The subjects were 82 male Lister-Hooded rats (Charles River, UK) initially housed in groups of up to 4 under humidity- and temperature-controlled conditions and a 12:12-h light-dark cycle (lights off at 0700 h). Following implantation of guide cannulae, animals were singly housed. Rats were ≈300 g at the beginning of training and were maintained at >85% of their free-feeding weight by food restriction (19 g/day of Purina chow). Water was provided ad libitum. The number of animals used for each experiment is shown in Table 1. The work was carried out under a UK Home Office Project license (PPL 70/7548) in accordance with the UK Animals (Scientific Procedures) Act 1986 and local ethical review at Cambridge University.

Table 1 Coordinates and group size for the different striatal sub-regions and DA receptor antagonists, raclopride (D2R) and SCH23390 (D1R).

Experimental procedures

Surgeries and microinfusion procedures are described in the Supplementary Materials and Methods.

Behavioral pre-training

All software was written by Dr. A. C. Mar [30]. Rats were initially trained to touch the screens in daily sessions of 60 min or 100 trials. Pre-training consisted of five stages of gradually increasing difficulty (Fig. 1b). Briefly, in stage 1, a large white horizontal square ‘start-box’ (15 × 9 cm) was presented in the bottom center of the screen, and touching it was associated with reward (45 mg sucrose pellet; TestDiet 5UTL; Sandown Scientific, Middlesex, UK). The size of the ‘start-box’ decreased across stages until it measured 3 × 4 cm in stage 3. Animals were moved to the next stage upon reaching 100 responses/rewards per session. In stage 4, touching the white box was not reinforced but led to the presentation of a visual stimulus (vertical or horizontal bars) with a pseudo-random spatial placement, left or right. The same stimulus was not displayed on the same side for more than three consecutive trials, to avoid side bias. Responses to the stimulus were reinforced, whereas touching the blank side led to illumination of the house-light for a 5 s time-out (TO) period. After the reward was collected, there was an inter-trial interval (ITI) of 5 s. In stage 5, the stimuli were presented slightly higher on the screen to avoid accidental touches (e.g. with the tail). The criterion for advancing from stages 4 and 5 was ≥80% correct responses per session.
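The stage 4 placement rule (the same stimulus never on the same side for more than three consecutive trials) can be enforced in several ways. The minimal Python sketch below is one illustration under our own assumptions, not the task software written by Dr. Mar: the helper name side_sequence and the strategy of forcing a switch after three repeats are hypothetical.

```python
import random
from typing import List, Optional


def side_sequence(n_trials: int, max_run: int = 3, seed: Optional[int] = None) -> List[str]:
    """Pseudo-random left/right placements in which no side is used on more
    than `max_run` consecutive trials (hypothetical helper, not the task code)."""
    rng = random.Random(seed)
    sides: List[str] = []
    for _ in range(n_trials):
        last = sides[-max_run:]
        if len(last) == max_run and len(set(last)) == 1:
            # The previous `max_run` trials were all on one side: force a switch.
            side = "right" if last[0] == "left" else "left"
        else:
            # Otherwise draw the side at random with equal probability.
            side = rng.choice(["left", "right"])
        sides.append(side)
    return sides
```

Forcing a switch is the simplest way to guarantee the rule; it does, however, make alternations slightly more frequent than an unconstrained coin flip would.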

Fig. 1: Schematic representation of the task.

a Behavioral training and testing protocol. The rewarded stimulus is represented as a + and the unrewarded stimulus as a −. Stimuli were vertical or horizontal bars and were counterbalanced as CS+ or CS− across rats. b Diagram of pre-training stages, from 1 to 5. Stimulus presentation in stages 4 and 5 was preceded by the same starting box from stage 3. Only one of the two stimuli appeared at any one time. Position (i.e. left/right) was pseudo-randomized. c Representation of the stimuli during visual discrimination (VD) and reversal learning. Criterion was reached at a performance of ≥24/30 correct responses, which represents a performance at or above 80%. After criterion was met during both reversal learning and in two retention sessions, conditions changed again. d Flowchart of the testing procedure and phases of reversal learning. Phases depended on performance within sessions. After reversal, performance during the early phase was lower than 11 correct trials out of a set of 30 trials, as animals tended to perseverate on the former CS+ (now CS−). Performance then improved, and animals reached the so-called mid, intermediate or random phase, before reaching the late or learning phase, in which they had learnt to approach the new CS+ (>19/30 correct responses) [30, 31].

Visual discrimination training

After the initial training stages, subjects were trained on a visual two-choice discrimination task (Fig. 1). Touching the square ‘start-box’ triggered the simultaneous presentation of two stimuli (vertical and horizontal bars), with their left/right positions determined pseudo-randomly [30]. The start-box procedure ensured that the animal was centrally positioned before the choice phase. Responses to one stimulus (CS+) were rewarded, and collecting the reward initiated the next ITI. In contrast, responses to the other stimulus (CS−) were not rewarded and led to a house-light-signaled TO. The response window after stimulus presentation was 10 s; if no response was made within this time, the trial was scored as an omission and a new ITI began. The session ended after 250 trials, 150 rewards or 1 h, whichever came first. The criterion for discrimination learning was 24 correct responses out of 30 consecutive trials. Once this was achieved within any session, rats were given a retention session with the same reward contingencies to ensure that they had reliably acquired the visual discrimination.
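The session-termination rule and the learning criterion described above can be expressed compactly. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the running window counts only trials on which the rat responded, since the text does not state how omissions entered the 30-trial window.

```python
from typing import List, Optional

# Session limits and learning criterion as stated in the Methods
MAX_TRIALS, MAX_REWARDS, MAX_MINUTES = 250, 150, 60.0
WINDOW, CRITERION = 30, 24


def session_over(n_trials: int, n_rewards: int, minutes: float) -> bool:
    """True once any of the three limits is reached ("whichever came first")."""
    return n_trials >= MAX_TRIALS or n_rewards >= MAX_REWARDS or minutes >= MAX_MINUTES


def criterion_trial(outcomes: List[bool]) -> Optional[int]:
    """Index of the trial completing the first 30-trial window with >= 24
    correct responses, or None if the criterion is never met.

    Assumption: `outcomes` holds only trials with a response (True = CS+ chosen).
    """
    for end in range(WINDOW, len(outcomes) + 1):
        if sum(outcomes[end - WINDOW:end]) >= CRITERION:
            return end - 1
    return None
```

For example, a rat that chooses incorrectly on its first 6 trials and correctly on the next 24 meets criterion on the 30th trial (index 29).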

Serial visual reversal learning

Following acquisition of visual discrimination, animals were trained in serial visual reversal learning (Fig. 1c). After the discrimination and retention sessions, contingencies reversed so the previous CS+ was then CS− and vice versa. Rats were required to respond to the new CS+ until reaching the discrimination criterion (≥24/30 correct responses). After reaching criterion, an extra retention session was run. Additional reversals were performed until the rats were able to attain the criterion within 3 daily sessions. When this was met, rats underwent surgery prior to testing. A retention session was run before each reversal and after reaching the criterion (Fig. 1d), both in training and testing.

Data analysis

The main dependent variables were the number of errors and trials to criterion (≥24/30 correct responses). Omissions, latencies to respond and latencies to collect the reward were additionally analyzed. Data from each reversal were collapsed over days. Trial outcomes were classified into three phases (early, mid or late) depending on the performance over a running window of 30 consecutive trials [30, 31]. If animals had a significant bias (binomial distribution probabilities) towards the previously positive stimulus (<11/30 correct responses), performance was considered to belong to the early phase, in which animals exhibited mainly perseverative responses. If their performance instead showed a significant preference for the currently rewarded stimulus (>19/30 correct responses), it was considered to belong to the late phase, in which animals moved closer to the criterion for learning the reversed contingency. Performance between these thresholds was classified as the intermediate or mid phase, prior to acquisition of the new association. Data from all trials after the rats had reached the final learning criterion (≥24/30 correct responses) were excluded from the analysis.
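The phase cut-offs follow from the binomial distribution: with 30 trials and chance performance (p = 0.5), 10 or fewer correct choices has a probability of about 0.049, whereas 11 or fewer has a probability of about 0.100, so <11/30 (and, by symmetry, >19/30) is consistent with a one-tailed test at α = 0.05 (this reading of "significant bias" is our inference). The Python sketch below computes these tail probabilities and labels sliding windows; the function names are hypothetical, and attributing each window's label to its final trial is an assumption, as the text does not specify how trials were assigned to windows.

```python
from math import comb
from typing import List

WINDOW = 30  # running window of consecutive trials


def binom_cdf(k: int, n: int = WINDOW, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def classify_window(n_correct: int) -> str:
    """Phase of one 30-trial window, using the cut-offs given in the text."""
    if n_correct < 11:   # significant bias toward the former CS+ (perseveration)
        return "early"
    if n_correct > 19:   # significant bias toward the new CS+
        return "late"
    return "mid"         # no significant bias in either direction


def phase_labels(outcomes: List[bool], criterion: int = 24) -> List[str]:
    """One label per complete sliding window (True = correct choice), stopping
    at the first window that meets the >= 24/30 learning criterion."""
    labels = []
    for end in range(WINDOW, len(outcomes) + 1):
        n_correct = sum(outcomes[end - WINDOW:end])
        labels.append(classify_window(n_correct))
        if n_correct >= criterion:
            break  # later trials were excluded from the analysis
    return labels
```

For a hypothetical rat that makes 30 errors and then 30 correct choices, the windows pass through 11 early, 9 mid and 5 late labels before criterion is reached.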

Statistical tests were performed with RStudio, version 1.2.1335 (RStudio, Inc). Errors were square-root transformed and latencies log transformed to ensure normality. Data were then subjected to linear mixed-effects model analysis with the lmer function of the lme4 package in R. The model contained three fixed factors (dose, phase, region) and one random factor (subject), modeled with a random slope across phases to account for individual differences between rats (i.e. individual learning curves). Significance was set at α = 0.05. The normality of residuals was confirmed with a quantile-quantile plot (QQ plot), and model fit was tested with a Chi-squared test. When significant three-way interactions were found, further analysis was performed by conducting separate multilevel models on “dose” and “phase” for each region. In the absence of significant three-way interactions, two-way Dose × Region interactions were explored further. Analysis was followed by post hoc Tukey-corrected pairwise comparisons.
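As we read the description above, the model for the square-root-transformed error count of subject s at dose d, phase p and region r can be written as follows. This is a sketch under stated assumptions: the symbols are ours, and we assume that the random slope corresponds to correlated subject-specific deviations for each phase (in lme4 formula syntax, something like sqrt(errors) ~ dose * phase * region + (phase | subject)).

```latex
\sqrt{E_{sdpr}} = \mu + \alpha_d + \beta_p + \gamma_r
  + (\alpha\beta)_{dp} + (\alpha\gamma)_{dr} + (\beta\gamma)_{pr} + (\alpha\beta\gamma)_{dpr}
  + b_{sp} + \varepsilon_{sdpr},
\qquad
\mathbf{b}_s = (b_{s,\mathrm{early}},\, b_{s,\mathrm{mid}},\, b_{s,\mathrm{late}})^{\top} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad
\varepsilon_{sdpr} \sim \mathcal{N}(0, \sigma^{2}).
```

Under this reading, the Dose × Phase × Region interaction tested in the Results corresponds to the (αβγ) terms, and latencies would be modeled in the same way on the log scale.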



The ventral-most locations of injectors are included in each of the data figures. Rats were excluded from the study if the injector cannulas were positioned outside the target areas (n = 3 pDMS, n = 5 DLS and n = 1 NAcC). Final group sizes with verified injector positions for each of the drug groups and targeted coordinates are shown in Table 1.

Effects of intra-striatal infusions of the D2R antagonist raclopride and the D1R antagonist SCH23390

Across all behavioral variables we found no significant differences between the aDMS and pDMS. We therefore combined these two regions as ‘DMS’ for subsequent analysis. Separate data for each of these regions are given in the Supplementary Material online.

Figures 2 and 3 indicate that whereas local infusions of the D2R antagonist raclopride improved early stages of reversal learning when administered into the NAcC, they impaired reversal when given in the dorsal striatum, both in the DMS (mid-phase) and DLS (across phases). In contrast, D1R antagonism in the NAcS improved the early phase of reversal learning but did not affect the number of errors when administered into the NAcC.

Fig. 2: In the ventral striatum, reversal learning was modulated via D2R in the NAcC and D1R in the NAcS during early stages of reversal learning.

a, d Injector tip placements. Closed circles represent rats that received both raclopride and SCH23390; open circles represent rats that received only raclopride. b, e Errors to criterion by phase (early, mid and late) after the D2R antagonist, raclopride, in the NAcC and NAcS, respectively. c, f Errors to criterion by phase (early, mid and late) after the D1R antagonist, SCH23390, in the NAcC and NAcS, respectively. Errors until reaching the high-performance criterion (≥24/30 correct responses) are collapsed over reversals. Data shown as mean ± SEM. *p < 0.05. ***p < 0.001.

Fig. 3: In the dorsal striatum, reversal learning was modulated via D2R in the DMS during the intermediate phase, and in the DLS across all phases of reversal learning.

a, d Injector tip placements. Closed circles represent rats that received both raclopride and SCH23390; open circles represent rats that received only raclopride. b, e Errors to criterion by phase (early, mid and late) after the D2R antagonist, raclopride, in the DMS and DLS, respectively. c, f Errors to criterion by phase (early, mid and late) after the D1R antagonist, SCH23390, in the DMS and DLS, respectively. Errors until reaching the high-performance criterion (≥24/30 correct responses) are collapsed over reversals. Data shown as mean ± SEM. #p = 0.05. *p < 0.05.

Analysis of both raclopride and SCH23390 treatments confirmed that the effects of the drugs varied across regions and phases of the reversal task. For the number of errors committed, we found a significant Dose × Phase × Region interaction after both raclopride (F(12, 479.990) = 4.109, p = 0.005) and SCH23390 (F(6, 191.999) = 4.109, p < 0.001) treatment. This was matched by significant Dose × Phase × Region interactions in the number of trials per phase after antagonist administration (raclopride: F(12, 407.990) = 5.300, p < 0.001; SCH23390: F(6, 192.010) = 3.280, p = 0.004). In addition, there was a significant Dose × Phase × Region interaction on omissions after SCH23390 microinfusions (F(6, 232.089) = 11.512, p < 0.001), whereas no such effect was detected for raclopride (ns). For latencies, we observed no three-way interactions but several Dose × Region interactions: a significant Dose × Region interaction in latencies to collect after raclopride infusions (F(6, 469.120) = 3.511, p = 0.002), and in both latencies to collect and latencies to respond after SCH23390 administration (F(3, 221.033) = 19.275, p < 0.001; F(3, 220.847) = 24.379, p < 0.001, respectively).

Effects of D1R and D2R antagonism in the ventral striatum

Since the three-way interactions were significant, separate multilevel models were fitted for each region to ascertain the phase dependency of the drug effects. In the NAcC, there was a Dose × Phase interaction on the number of errors after raclopride infusions (F(4, 126.01) = 3.905, p = 0.005). Post hoc analysis revealed that raclopride selectively improved performance during the early phase of reversal learning when infused into the NAcC at 0.1 μg/μl and 1 μg/μl, compared with vehicle (p < 0.001 and p = 0.028, respectively; Fig. 2b). In the NAcS, there was also a Dose × Phase interaction for errors (F(4, 63.005) = 3.813, p = 0.008), but pairwise comparisons revealed that no dose differed from vehicle (ns). Thus, raclopride had no clear effect when infused into the NAcS (Fig. 2e).

In contrast, analysis of the number of errors committed after SCH23390 infusions identified a significant Dose × Phase interaction after NAcS infusions (F(2, 31.997) = 25.616, p < 0.001). Post hoc analyses showed that D1R antagonism in the NAcS selectively decreased perseveration in the early phase compared with the vehicle condition (p < 0.001; Fig. 2f). Neither a main effect of Dose nor a Dose × Phase interaction was observed after SCH23390 infusions into the NAcC (ns; Fig. 2c).

The above results on the number of errors committed after infusions into the NAcC and NAcS were similar when trials were analyzed instead. Specifically, the Dose × Phase interactions were significant for raclopride in the NAcC (F(4, 126) = 3.402, p = 0.011) and for SCH23390 in the NAcS (F(2, 32) = 20.328, p < 0.001), but not in the NAcC (ns).

Table 2B shows that in the NAcC, SCH23390 strongly affected the number of omissions (Dose × Phase: F(2, 58.492) = 11.838, p < 0.001). Post hoc analysis showed that SCH23390 selectively increased the number of omissions in the early phase (p < 0.001), with no significant effect during the mid or late phases (ns). No such effect was detected after NAcS infusions of SCH23390, or after raclopride infusions into either the NAcC or the NAcS (Table 2). SCH23390 infusions also prolonged the latencies to collect the reward and to respond to the stimuli in both sub-regions regardless of phase (Dose: latency to collect, NAcC: F(1, 57.096) = 85.205, p < 0.001, and NAcS: F(1, 31.062) = 99.382, p < 0.001; latency to respond, NAcC: F(1, 57.181) = 64.593, p < 0.001, and NAcS: F(1, 31.082) = 7.838, p = 0.009). Raclopride had no effect on these variables in either the NAcC or the NAcS (Table 2A).

Table 2 D1R antagonism increased omissions when infused in the NAcC.

Effects of D1R and D2R antagonism in the dorsal striatum

The potential effects of drug infusions into the dorsal striatum were analysed next. There was a phase-dependent effect of raclopride in the DMS (Dose × Phase: F(4, 196.002) = 3.574, p = 0.008). As can be seen in Fig. 3b, post hoc analysis showed that, in this region, the high dose (1.0 µg/µl) of raclopride induced a marginally significant impairment in the mid phase versus saline (p = 0.050). There was no significant Dose × Phase interaction after raclopride infusions into the DLS (ns), although main effects of Dose and Phase were observed (Phase: F(2, 12.057) = 17.472, p < 0.001; Dose: F(2, 70.008) = 3.764, p = 0.028). Further exploration showed that the main effect of Dose was driven by the low dose of raclopride across all phases of reversal learning (Fig. 3e). D1R antagonism with SCH23390 in the dorsal striatum did not alter performance in either the DMS or the DLS (Fig. 3). In all cases, the effects were similar for trials to criterion.

Both SCH23390 and raclopride infusions increased latencies to collect the reward across all phases when infused into the DMS (Dose: SCH23390, F(1, 113.493) = 33.828, p < 0.001; raclopride, F(2, 192.771) = 14.706, p < 0.001), but not the DLS (ns). Further analysis showed that raclopride caused this effect at both the low and high doses (p = 0.002 and p < 0.001, respectively). Neither omissions nor latencies to respond to the stimuli were affected by raclopride or SCH23390 infusions into any region of the dorsal striatum (Table 2).


This study demonstrates dissociable effects on visual serial reversal learning of D2R and D1R antagonists locally infused into the striatum, and shows that the effects of each drug differ fundamentally based on the striatal sub-region targeted and the different learning phases of the task (i.e. the early, perseverative phase versus new learning phases). An important overall finding was that whereas DA receptor antagonism improved reversal-learning performance in the ventral striatum, learning was impaired after drug infusions into the dorsal striatum, clearly showing the different roles of DA signaling within these structures when stimulus-reward contingencies change. This finding is in general consistent with previous data on humans with PD [32, 33] indicating that excess DA activity may often be detrimental for reversal performance in the NAc, whereas intact DA function in the dorsal striatum is necessary for efficient reversal learning, as supported by data from non-human primates [11, 34].

The effects of DA receptor blockade were highly dependent on the phase of reversal learning, as defined by binomial distribution probabilities (cf. [31]) to indicate whether the rats were still guided by the previous, obsolete stimulus-reward contingencies (significant bias toward the previously correct stimulus; early phase; perseveration), were performing at chance (no bias; mid phase), or had learned to respond in accordance with the new contingencies (significant bias toward the new correct stimulus; late phase). These phases have previously been linked to defined brain circuits; e.g., inactivation of the lateral orbitofrontal cortex (OFC) increases perseveration in the early phase of visual reversal learning in both marmoset monkeys [35] and rats [36, 37], whereas inactivation of the medial OFC decreases perseveration in visual reversal learning without affecting the later phases of reversal ([37]; but see [38]). In contrast, disrupted function in the medial prefrontal cortex of mice improves the later phases of reversal learning [16], and excitotoxic lesions of the infralimbic cortex impair late learning in rats [36]. Since these prefrontal cortical regions form distinct circuits and innervate dissociable terminal fields in the striatum [39], it is not unexpected that striatal sub-regions also mediate specific phases of visual reversal learning, as indicated both by the present work and by previous reports [12, 40].

The improvements in reversal learning after NAc infusions depended on both the accumbal sub-region and the DA receptor sub-type, and they were selective for the early phase of reversal learning: D1R antagonism decreased perseverative errors only in the NAcS, whereas D2R antagonism did so only in the NAcC. Such a double dissociation refines previous reports showing, e.g., that elevated dopaminergic states in the NAc are detrimental for reversal learning [18], and that D2R agonism in the NAc impairs behavioral flexibility [17, 41]. This could be relevant for the DA overdose hypothesis of iatrogenic cognitive impairments associated with dopaminergic drug treatment in PD [42], as our data suggest that such effects are driven by D1R in the NAcS and D2R in the NAcC. However, since the antagonists given here only block the actions of the endogenous ligand (i.e. DA), our data also suggest that DA signalling at D1R in the NAcS and at D2R in the NAcC contributes to perseverative responding in visual reversal learning, perhaps by inappropriately maintaining the previous stimulus-reward association [43] or Pavlovian conditioned approach [44]. Inactivation of the NAcS can also improve various forms of behavioral flexibility, including latent inhibition [45], attentional set-shifting [26] and spatial reversal learning [22, 23, 46]; our results suggest that such effects could be mediated by D1R-expressing neurons.

Additionally, blocking D1R in the NAcC disrupted performance overall by increasing omissions. This effect is similar to that previously reported after NAcC infusions of higher doses of both raclopride and SCH23390 in rats trained on a visual reversal task [47]. However, it is noteworthy that rats treated with intra-NAcC SCH23390 in our task consistently initiated trials but then failed to respond to either stimulus, again an effect only noticeable in the early phase. Although D1R antagonism may interfere with the processing of visual cues, this pattern suggests an alternative interpretation: such receptor blockade may selectively impair learning from positive feedback by blunting the impact of positive prediction errors, as theorised by Frank and colleagues [48]. Hence, rats in our task could rapidly learn (from negative feedback) that the previously positive stimulus is now incorrect but, owing to the NAcC D1R blockade, be unable to update the value they associate with the previously incorrect, now rewarded stimulus. We recently found some evidence for such an effect of systemic D1R antagonism in visual reversal learning [49].
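The logic of this account can be illustrated with a toy two-choice learner that uses separate learning rates for positive and negative prediction errors. The Python sketch below is purely illustrative: it is not the analysis used in this study and not a reimplementation of Frank and colleagues' basal ganglia model, and the learning rates, softmax parameter, 0.8 choice-probability threshold and the function name trials_to_reversal are all arbitrary, hypothetical choices. It uses deterministic expected ("mean-field") value updates rather than sampled choices.

```python
import math
from typing import Optional


def trials_to_reversal(alpha_pos: float, alpha_neg: float = 0.3, beta: float = 5.0,
                       threshold: float = 0.8, max_trials: int = 1000) -> Optional[int]:
    """Toy learner after a reversal: returns the first trial on which
    P(choose new CS+) >= threshold, or None if never reached."""
    v_old, v_new = 1.0, 0.0  # former CS+ fully valued, former CS- unvalued
    for trial in range(1, max_trials + 1):
        # Softmax choice probability for the newly rewarded stimulus
        p_new = 1.0 / (1.0 + math.exp(-beta * (v_new - v_old)))
        if p_new >= threshold:
            return trial
        # Choosing the former CS+ now yields no reward: negative prediction error
        v_old += alpha_neg * (1.0 - p_new) * (0.0 - v_old)
        # Choosing the new CS+ yields reward: positive prediction error
        v_new += alpha_pos * p_new * (1.0 - v_new)
    return None
```

In this toy, lowering alpha_pos (a stand-in for blunted positive prediction errors) leaves devaluation of the former CS+ intact but delays the point at which the new CS+ is preferred; with alpha_pos = 0 the threshold is never reached.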

In the dorsal striatum, D2R antagonism in the DMS delayed the re-learning of the new stimulus-reward contingencies (mid phase) but did not affect either the early or late phases; in the DLS, D2R antagonism impaired reversal learning overall, including the initial (perseverative) phase and subsequent learning. D1R antagonism had no effect in either the DMS or the DLS at doses and infusion parameters routinely used in the literature [50]. Hence, D2R antagonism in the DMS and DLS had almost complementary effects with regard to the phase of reversal that was affected. This dissociation can plausibly be reconciled with evidence that the DMS and DLS mediate different aspects of instrumental learning in both rodents and humans [15]. Whereas the DMS is generally associated with goal-directed behavior, the DLS is thought to mediate habitual, stimulus-response behavior [13]. In this context, it is noteworthy that well-trained visual discrimination may exhibit rule-like or habitual tendencies [51], which need to be surmounted for reversal learning to proceed. Such top-down executive control over habitual tendencies may implicate cortico-striatal projections. The present data suggest that striatal D2R might play an important modulatory role in controlling habits. These findings for the rat DLS are consistent with recent evidence that the putamen in primates also plays a key role in reversal learning [10, 11]. By contrast, the DMS is implicated in DA-dependent goal-directed behavior, so the modulation of the mid phase, characterised by new learning, by intra-DMS raclopride was predictable. Our data on dorsal-striatal D2R and reversal learning are in accordance with the positive relationship between behavioral flexibility and D2R availability in both the caudate and putamen, but not the ventral striatum, of vervet monkeys trained on a visual reversal task [11].
This could be relevant also for human conditions such as OCD and substance-use disorder, where reduced D2R binding has been reported [52, 53]. For example, the mixed full/partial D2R agonist pramipexole ameliorated deficits in reversal performance in chronic stimulant abusers with a concomitant normalisation of on-task activation of the caudate nucleus [4].

These findings add to considerable data implicating DA receptors in reversal learning across species by showing that D1R and D2R antagonism can either impair or improve reversal, depending on the striatal region targeted and the stage of learning. Of particular interest are two recent studies: Horst and colleagues found that a D2R agonist infused into the caudate nucleus improved serial visual reversal learning at intermediate doses in marmoset monkeys [54], whereas Verharen et al. reported that D1R and D2R agonists impaired probabilistic spatial reversal learning in rats, both after systemic treatment and after local infusions into the ventral striatum [41].


A number of limitations should be borne in mind when interpreting the results from this set of experiments. Firstly, all rats first completed the Latin Square-design experiment investigating the impact of raclopride on reversal learning, and then received the SCH23390 infusions in a cross-over experiment. It is possible that the additional training (three reversals minimum), number of prior infusion events (average 12 infusions during the raclopride experiment) or plastic changes in, e.g., membrane presentation of receptors after exposure to a D2R antagonist altered the impact of subsequent SCH23390 infusions. Next, all rats in this study were male, and it is conceivable that future studies will reveal sex differences in the impact of D2R or D1R antagonism on reversal learning. In addition, it must be noted that SCH23390, although frequently used for experiments targeting the D1R, also shows affinity (as an agonist) at the serotonin 5-HT2C receptor [55], which could in theory contribute to the effects observed after NAcC and NAcS infusions. However, previous reports have suggested no impact on reversal learning after 5-HT2C receptor manipulation in the NAcC [56].

Perhaps more importantly, the D2R antagonist employed, raclopride, also has strong antagonist properties at dopamine D3 receptors (D3R), so, like many studies employing such drugs, we are unable to distinguish clearly between D2R and D3R actions. Furthermore, dissecting the role of DA signalling is challenging because D2R are expressed on both pre- and post-synaptic striatal neurons, as well as on striatal GABAergic and cholinergic interneurons [57, 58].

In addition, although the present findings imply that visual reversal learning involves sequential processing in ventral striatal and dorsal striatal domains, more direct evidence would come from monitoring the involvement of all of these regions simultaneously during the course of reversal learning [12].


The current study clarifies the role of DA in reversal learning and suggests that striatal sub-regions differentially modulate this form of behavioral flexibility. Using a serial visual reversal-learning task in touchscreen operant chambers, we show that infusions of D1R and D2R antagonists into four striatal sub-regions (NAcC, NAcS, DMS, and DLS) differentially affect distinct phases of reversal learning. These results enhance our understanding of the neural circuits underlying visual reversal learning. They may also be relevant to the cognitive inflexibility seen in DA-related disorders, such as PD [32], OCD [52] or drug addiction [53].

Funding and disclosure

This research was funded by a Wellcome Trust Senior Investigator award to TWR (104631/Z/14/Z) and an award from Boehringer Ingelheim to JWD. All experiments were conducted at the Behavioral and Clinical Neuroscience Institute, which was jointly funded by the Medical Research Council and the Wellcome Trust. JSB was supported by a PhD scholarship from the La Caixa Foundation, Spain, and a studentship from Boehringer Ingelheim Pharma GmbH, Germany. LF was funded by a Biotechnology and Biological Sciences Research Council Doctoral Training Partnership. JRN is a full-time employee of Boehringer Ingelheim Pharma GmbH, Germany. JWD has received funding from GlaxoSmithKline. TWR is a consultant for, and receives royalties from, Cambridge Cognition. He is also a consultant for Unilever and Greenfield Bioventures, has had recent research grants with Shionogi, Small Pharma and GlaxoSmithKline, and receives editorial honoraria from Springer Nature and Elsevier. The rest of the authors declare no conflict of interest.


  1. Leeson VC, Robbins TW, Matheson E, Hutton SB, Ron MA, Barnes TRE, et al. Discrimination learning, reversal, and set-shifting in first-episode schizophrenia: stability over six years and specific associations with medication type and disorganization syndrome. Biol Psychiatry. 2009;66:586–93.

  2. Remijnse PL, van den Heuvel OA, Nielen MMA, Vriend C, Hendriks GJ, Hoogendijk WJG, et al. Cognitive inflexibility in obsessive-compulsive disorder and major depression is associated with distinct neural correlates. PLoS ONE. 2013;8:e359600, 1–8.

  3. Cools R, Barker RA, Sahakian BJ, Robbins TW. Enhanced or impaired cognitive function in Parkinson’s disease as a function of dopaminergic medication and task demands. Cereb Cortex. 2001;11:1136–43.

  4. Ersche KD, Roiser JP, Abbott S, Craig KJ, Müller U, Suckling J, et al. Response perseveration in stimulant dependence is associated with striatal dysfunction and can be ameliorated by a D2/3 receptor agonist. Biol Psychiatry. 2011;70:754–62.

  5. Lee B, Groman S, London ED, Jentsch JD. Dopamine D2/D3 receptors play a specific role in the reversal of a learned visual discrimination in monkeys. Neuropsychopharmacology. 2007;32:2125–34.

  6. Boulougouris V, Castañé A, Robbins TW. Dopamine D2/D3 receptor agonist quinpirole impairs spatial reversal learning in rats: Investigation of D3 receptor involvement in persistent behavior. Psychopharmacology. 2009;202:611–20.

  7. Kruzich PJ, Grandy DK. Dopamine D2 receptors mediate two-odor discrimination and reversal learning in C57BL/6 mice. BMC Neurosci. 2004;5:1–10.

  8. Izquierdo A, Wiedholz LM, Millstein RA, Yang RJ, Bussey TJ, Saksida LM, et al. Genetic and dopaminergic modulation of reversal learning in a touchscreen-based operant procedure for mice. Behav Brain Res. 2006;171:181–8.

  9. den Ouden HEM, Daw ND, Fernandez G, Elshout JA, Rijpkema M, Hoogman M, et al. Dissociable effects of dopamine and serotonin on reversal learning. Neuron. 2013;80:1090–1100.

  10. Jackson SAW, Horst NK, Axelsson SFA, Horiguchi N, Cockcroft GJ, Robbins TW, et al. Selective role of the putamen in serial reversal learning in the marmoset. Cereb Cortex. 2019;29:447–60.

  11. Groman SM, Lee B, London ED, Mandelkern MA, James AS, Feiler K, et al. Dorsal striatal D2-like receptor availability covaries with sensitivity to positive reinforcement during discrimination learning. J Neurosci. 2011;31:7291–9.

  12. Brigman JL, Daut RA, Wright T, Gunduz-Cinar O, Graybeal C, Davis MI, et al. GluN2B in corticostriatal circuits governs choice learning and choice shifting. Nat Neurosci. 2013;16:1101–10.

  13. Yin HH, Ostlund SB, Knowlton BJ, Balleine BW. The role of the dorsomedial striatum in instrumental conditioning. Eur J Neurosci. 2005;22:513–23.

  14. Corbit LH, Janak PH. Posterior dorsomedial striatum is critical for both selective instrumental and Pavlovian reward learning. Eur J Neurosci. 2010;31:1312–21.

  15. Balleine BW, O’Doherty JP. Human and rodent homologies in action control: corticostriatal determinants of goal-directed and habitual action. Neuropsychopharmacology. 2010;35:48–69.

  16. Graybeal C, Feyder M, Schulman E, Saksida LM, Bussey TJ, Brigman JL, et al. Paradoxical reversal learning enhancement by stress or prefrontal cortical damage: rescue with BDNF. Nat Neurosci. 2011;14:1507–9.

  17. Haluk DM, Floresco SB. Ventral striatal dopamine modulation of different forms of behavioral flexibility. Neuropsychopharmacology. 2009;34:2041–52.

  18. Verharen JPH, De Jong JW, Roelofs TJM, Huffels CFM, Van Zessen R, Luijendijk MCM, et al. A neuronal mechanism underlying decision-making deficits during hyperdopaminergic states. Nat Commun. 2018;9:1–15.

  19. Annett LE, McGregor A, Robbins TW. The effects of ibotenic acid lesions of the nucleus accumbens on spatial learning and extinction in the rat. Behav Brain Res. 1989;31:231–42.

  20. Taghzouti K, Le Moal M, Simon H. Enhanced frustrative nonreward effect following 6-hydroxydopamine lesions of the lateral septum in the rat. Behav Neurosci. 1985;99:1066–73.

  21. Stern CE, Passingham RE. The nucleus accumbens in monkeys (Macaca fascicularis). III. Reversal learning. Exp Brain Res. 1995;106:239–47.

  22. Dalton GL, Phillips AG, Floresco SB. Preferential involvement by nucleus accumbens shell in mediating probabilistic learning and reversal shifts. J Neurosci. 2014;34:4618–26.

  23. Castañé A, Theobald DEH, Robbins TW. Selective lesions of the dorsomedial striatum impair serial spatial reversal learning in rats. Behav Brain Res. 2010;210:74–83.

  24. Schoenbaum G, Setlow B. Lesions of nucleus accumbens disrupt learning about aversive outcomes. J Neurosci. 2003;23:9833–41.

  25. Corbit LH, Muir JL, Balleine BW. The role of the nucleus accumbens in instrumental conditioning: evidence of a functional dissociation between accumbens core and shell. J Neurosci. 2001;21:3251–60.

  26. Floresco SB, Ghods-Sharifi S, Vexelman C, Magyar O. Dissociable roles for the nucleus accumbens core and shell in regulating set shifting. J Neurosci. 2006;26:2449–57.

  27. Besson M, Belin D, McNamara R, Theobald DEH, Castel A, Beckett VL, et al. Dissociable control of impulsivity in rats by dopamine D2/3 receptors in the core and shell subregions of the nucleus accumbens. Neuropsychopharmacology. 2010;35:560–9.

  28. Economidou D, Theobald DEH, Robbins TW, Everitt BJ, Dalley JW. Norepinephrine and dopamine modulate impulsivity on the five-choice serial reaction time task through opponent actions in the shell and core sub-regions of the nucleus accumbens. Neuropsychopharmacology. 2012;37:2057–66.

  29. Sesia T, Temel Y, Lim LW, Blokland A, Steinbusch HWM, Visser-Vandewalle V. Deep brain stimulation of the nucleus accumbens core and shell: opposite effects on impulsive action. Exp Neurol. 2008;214:135–9.

  30. Alsiö J, Nilsson SRO, Gastambide F, Wang RAH, Dam SA, Mar AC, et al. The role of 5-HT2C receptors in touchscreen visual reversal learning in the rat: a cross-site study. Psychopharmacology. 2015;232:4017–31.

  31. Jones B, Mishkin M. Limbic lesions and the problem of stimulus-reinforcement associations. Exp Neurol. 1972;36:362–77.

  32. Cools R, Lewis SJG, Clark L, Barker RA, Robbins TW. L-DOPA disrupts activity in the nucleus accumbens during reversal learning in Parkinson’s disease. Neuropsychopharmacology. 2007;32:180–9.

  33. Dagher A, Robbins TW. Personality, addiction, dopamine: insights from Parkinson’s Disease. Neuron. 2009;61:502–10.

  34. Clarke HF, Hill GJ, Robbins TW, Roberts AC. Dopamine, but not serotonin, regulates reversal learning in the marmoset caudate nucleus. J Neurosci. 2011;31:4290–7.

  35. Dias R, Robbins TW, Roberts AC. Dissociation in prefrontal cortex of affective and attentional shifts. Nature. 1996;380:69–72.

  36. Chudasama Y, Robbins TW. Dissociable contributions of the orbitofrontal and infralimbic cortex to pavlovian autoshaping and discrimination reversal learning: further evidence for the functional heterogeneity of the rodent frontal cortex. J Neurosci. 2003;23:8771–80.

  37. Hervig ME, Fiddian L, Piilgaard L, Božič T, Blanco-Pozo M, Knudsen C, et al. Dissociable and paradoxical roles of rat medial and lateral orbitofrontal cortex in visual serial reversal learning. Cereb Cortex. 2019;00:1–14.

  38. Dalton GL, Wang NY, Phillips AG, Floresco SB. Multifaceted contributions by different regions of the orbitofrontal and medial prefrontal cortex to probabilistic reversal learning. J Neurosci. 2016;36:1996–2006.

  39. Heilbronner SR, Rodriguez-Romaguera J, Quirk GJ, Groenewegen HJ, Haber SN. Circuit-based corticostriatal homologies between rat and primate. Biol Psychiatry. 2016;80:509–21.

  40. Clarke HF, Robbins TW, Roberts AC. Lesions of the medial striatum in monkeys produce perseverative impairments during reversal learning similar to those produced by lesions of the orbitofrontal cortex. J Neurosci. 2008;28:10972–82.

  41. Verharen JPH, Adan RAH, Vanderschuren LJMJ. Differential contributions of striatal dopamine D1 and D2 receptors to component processes of value-based decision making. Neuropsychopharmacology. 2019;0:1–10.

  42. Swainson R, Rogers RD, Sahakian BJ, Summers BA, Polkey CE, Robbins TW. Probabilistic learning and reversal deficits in patients with Parkinson’s disease or frontal or temporal lobe lesions: Possible adverse effects of dopaminergic medication. Neuropsychologia. 2000;38:596–612.

  43. Flagel SB, Clark JJ, Robinson TE, Mayo L, Czuj A, Willuhn I, et al. A selective role for dopamine in stimulus-reward learning. Nature. 2011;469:53–59.

  44. Fraser KM, Haight JL, Gardner EL, Flagel SB. Examining the role of dopamine D2 and D3 receptors in Pavlovian conditioned approach behaviors. Behav Brain Res. 2016;305:87–99.

  45. Weiner I, Gal G, Rawlins JNP, Feldon J. Differential involvement of the shell and core subterritories of the nucleus accumbens in latent inhibition and amphetamine-induced activity. Behav Brain Res. 1996;81:123–33.

  46. Aquili L, Liu AW, Shindou M, Shindou T, Wickens JR. Behavioral flexibility is increased by optogenetic inhibition of neurons in the nucleus accumbens shell during specific time segments. Learn Mem. 2014;21:223–31.

  47. Calaminus C, Hauber W. Intact discrimination reversal learning but slowed responding to reward-predictive cues after dopamine D1 and D2 receptor blockade in the nucleus accumbens of rats. Psychopharmacology. 2007;191:551–66.

  48. Frank MJ, Seeberger LC, O’Reilly RC. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science. 2004;306:1940–3.

  49. Alsiö J, Phillips BU, Sala-Bayo J, Nilsson SRO, Calafat-Pla TC, Rizwand A, et al. Dopamine D2-like receptor stimulation blocks negative feedback in visual and spatial reversal learning in the rat: behavioural and computational evidence. Psychopharmacology. 2019;236:2307–23.

  50. Eagle DM, Wong JCK, Allan ME, Mar AC, Theobald DE, Robbins TW. Contrasting roles for dopamine D1 and D2 receptor subtypes in the dorsomedial striatum but not the nucleus accumbens core during behavioral inhibition in the stop-signal task in rats. J Neurosci. 2011;31:7349–56.

  51. Bachevalier J, Brickson M, Hagger C, Mishkin M. Age and sex differences in the effects of selective temporal lobe lesion on the formation of visual discrimination habits in rhesus monkeys (Macaca mulatta). Behav Neurosci. 1990;104:885–99.

  52. Denys D, van der Wee N, Janssen J, De Geus F, Westenberg HGM. Low level of dopaminergic D2 receptor binding in obsessive-compulsive disorder. Biol Psychiatry. 2004;55:1041–5.

  53. Volkow ND, Fowler JS, Wang GJ, Baler R, Telang F. Imaging dopamine’s role in drug abuse and addiction. Neuropharmacology. 2009;56:3–8.

  54. Horst NK, Jupp B, Roberts AC, Robbins TW. D2 receptors and cognitive flexibility in marmosets: tri-phasic dose–response effects of intra-striatal quinpirole on serial reversal performance. Neuropsychopharmacology. 2019;44:564–71.

  55. Millan MJ, Newman-Tancredi A, Quentric Y, Cussac D. The ‘selective’ dopamine D1 receptor antagonist, SCH23390, is a potent and high efficacy agonist at cloned human serotonin2C receptors. Psychopharmacology. 2001;156:58–62.

  56. Boulougouris V, Robbins TW. Enhancement of spatial reversal learning by 5-HT2C receptor antagonism is neuroanatomically specific. J Neurosci. 2010;30:930–8.

  57. Delle Donne KT, Sesack SR, Pickel VM. Ultrastructural immunocytochemical localization of neurotensin and the dopamine D2 receptor in the rat nucleus accumbens. J Comp Neurol. 1996;371:552–66.

  58. De Mei C, Ramos M, Iitaka C, Borrelli E. Getting specialized: presynaptic and postsynaptic dopamine D2 receptors. Curr Opin Pharmacol. 2009;9:53–58.


We thank Ms. T. Lapanja for skilled technical assistance and Dr. B. U. Phillips for helpful discussion. The experimental work was carried out under a Home Office Project Licence held by Dr. A. L. Milton.

Author information


Corresponding author

Correspondence to Trevor W. Robbins.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information


Supplementary Material. Dorsal and ventral striatal dopamine D1 and D2 receptors differentially modulate distinct phases of serial visual reversal learning

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

About this article

Cite this article

Sala-Bayo, J., Fiddian, L., Nilsson, S.R.O. et al. Dorsal and ventral striatal dopamine D1 and D2 receptors differentially modulate distinct phases of serial visual reversal learning. Neuropsychopharmacol. 45, 736–744 (2020).
