Distinct roles for dopamine clearance mechanisms in regulating behavioral flexibility

Dopamine plays a crucial role in adaptive behavior, and dysfunctional dopamine is implicated in multiple psychiatric conditions characterized by inflexible or inconsistent choices. However, the precise relationship between dopamine and flexible decision making remains unclear. One reason is that, while many studies have focused on the activity of dopamine neurons, efficient dopamine signaling also relies on clearance mechanisms, notably the dopamine transporter (DAT), which predominates in striatum, and catechol-O-methyltransferase (COMT), which predominates in cortex. The exact locus, extent, and timescale of the effects of DAT and COMT are uncertain. Moreover, there is limited data on how acute disruption of either mechanism affects flexible decision making strategies mediated by cortico-striatal networks. To address these issues, we combined pharmacological modulation of DAT and COMT with electrochemistry and behavior in mice. DAT blockade, but not COMT inhibition, regulated sub-second dopamine release in the nucleus accumbens core, but surprisingly neither clearance mechanism affected evoked release in prelimbic cortex. This was not due to a lack of sensitivity, as both amphetamine and atomoxetine changed the kinetics of sub-second release. In a multi-step decision making task where mice had to respond to reversals in either reward probabilities or the choice sequence to reach the goal, DAT blockade selectively impaired, and COMT inhibition improved, performance after reward reversals, but neither manipulation affected the adaptation of choices after action-state transition reversals. Together, our data suggest that DAT and COMT shape specific aspects of behavioral flexibility by regulating different aspects of the kinetics of striatal and cortical dopamine, respectively.


INTRODUCTION
Changes in behavioral flexibility are a central component of multiple psychiatric and neurological conditions, such as schizophrenia and substance use disorders, that involve dysfunctional dopamine transmission [1,2]. However, the precise relationship between flexible behavior and patterns of dopamine signaling in different brain regions remains a matter of contention. Understanding how dopamine signaling regulates behavioral flexibility at different timescales and in different brain regions has important implications for the development of the more selective dopaminergic agents that are required. For example, in the case of schizophrenia, although antipsychotic drugs successfully antagonize the elevated striatal presynaptic dopamine function observed in patients [3], and thereby successfully treat psychosis, they have little or no effect on the cognitive dysfunction that patients experience, which is thought to result at least in part from insufficient dopaminergic transmission in the prefrontal cortex [4]. Similarly, although L-DOPA successfully remediates the motor symptoms of Parkinson's disease, it has limited influence on-or may even impair-behavioral flexibility in this patient group [5].
Rapid and precisely-timed dopamine activity plays a crucial role in adaptive behavior, signaling how much better or worse an event is than predicted (a "reward prediction error") [6][7][8]. Boosting or blocking these phasic signals can change the likelihood of repeating or switching away from a recent choice [9,10]. However, effective dopamine signaling depends not just on dopamine neuron activity but also on clearance mechanisms, which are critical for the temporal and spatial regulation of dopamine's actions. Crucially, clearance mechanisms vary across cortico-striatal networks implicated in flexible decision making [2]. Recycling via the dopamine transporter (DAT) predominates in nucleus accumbens (NAc) and other striatal regions [11], where it regulates the size and kinetics of dopamine transients as well as background dopamine tone [11][12][13], while enzymatic degradation, particularly by catechol-O-methyltransferase (COMT), is more prominent in prefrontal cortex (PFC), where it has been shown to modulate evoked dopamine release measured over minutes [14,15]. These clearance mechanisms are ideally placed to shape mesolimbic and mesocortical dopamine control over flexible decision making. Indeed, recent evidence suggests that manipulations of either the mesolimbic pathway to NAc or mesocortical pathway to medial PFC can have distinct effects on behavioral flexibility [16][17][18]. Moreover, several lines of evidence indicate that cortical and striatal dopamine play synergistic and even opponent roles in shaping appropriate levels of flexibility and focus [19][20][21].
While there have been a number of studies investigating how DAT and COMT shape reward pursuit and cognitive functions like working memory, respectively [22][23][24][25][26], their role in regulating behavioral flexibility has been comparatively neglected. Individual variation in DAT or COMT function in humans has been shown to correlate with differences in reinforcement learning and flexible decision making [27][28][29][30][31][32][33]. However, it is not clear precisely how markers of individual variation in transporter or enzyme function in humans relate to dopamine signaling in vivo. Notably, when the effects of targeted pharmacological or genetic manipulations of DAT have been investigated in animal models, deficits in reward learning or behavioral flexibility have been notably absent despite changes in motivation [22,23,34,35], while evidence from animal models of COMT's involvement in behavioral flexibility is limited [36]. Moreover, there is still much uncertainty over the locus, extent, and timescale of the effects of DAT and COMT, particularly concerning their regulation of phasic fluctuations in dopamine that produce reward prediction error-like signals in both striatum and cortex [17,37]. Understanding these relationships is particularly critical given that novel COMT inhibitors are currently under development as putative novel agents for treating cognitive impairments [38].
To address these issues, we combined pharmacological modulation of DAT and COMT function with electrochemistry and behavior (see Fig. 1 for a schematic of the experiments). We first determined the effects of DAT blockade and COMT inhibition on fast fluctuations in dopamine in NAc using fast-scan cyclic voltammetry (FCV) and extended this approach for the first time to understand the effects of DAT and COMT on stimulated release in PFC. We then studied the effects of each clearance mechanism on motivation-to-work and behavioral flexibility using, first, an effort-based operant task and, second, a multi-step decision-making task that dissociated changes in the optimal response strategy to reach a goal from changes in reward probabilities at the goal location. We predicted that changes in motivation-towork would reflect DAT's and COMT's capacity to modulate NAc dopamine. However, given the interplay of cortico-striatal systems and dopamine in behavioral flexibility, we anticipated that both DAT blockade and COMT inhibition would influence how quickly animals responded to changes on the multi-step decision making task, potentially either by changing the strategies the animals use or by modulating how they update their behavior in response based on their recent choices and outcomes. Moreover, based on the evidence that dopamine is predominantly activated by reward prediction errors over sensorimotor surprise [39], we anticipated that any influence of DAT or COMT might be selective to situations where there is a need to update choice strategies following a change in reward value, but not, for example, when needing to update information about the appropriate action sequence to reach the high value reward option.

Animals
For details of animal numbers, see figures and Supplementary Information. Adult male C57BL/6 mice were used because COMT exhibits sexually dimorphic effects [40]. Mice were maintained at >85% free feeding weight for all behavioral experiments, except for the sucrose preference test where they were water deprived for 3 h prior to test sessions. Food and water were otherwise provided ad libitum. All experiments were conducted in accordance with the UK Home Office Animals (Scientific Procedures) Act 1986 and the local ethical review board at the University of Oxford. Fig. 1 Schematic of the study design and details of the primary hypotheses to be tested.

Drugs
In all experiments mice were administered the selective COMT inhibitor tolcapone (30 mg/kg; TRC Inc) [25,41] and/or the selective DAT blocker GBR-12909 dihydrochloride (6 mg/kg, Tocris) [42]. These doses are effective at altering behavior without causing motor stereotypies (see Supplementary Information). D-amphetamine (in sulfate formulation; Tocris) was used at 4 mg/kg [43], and the NET blocker atomoxetine (in hydrochloride formulation; Tocris) at 1 mg/kg [44] in the voltammetry experiment only. All drugs were dissolved in 20% hydroxypropyl-beta-cyclodextrin (Acros Organics) in 0.9% saline (AquPharm), which served as a vehicle control in all experiments. All drugs were delivered by intraperitoneal injection.

Fast-scan cyclic voltammetry (FCV)
Carbon fiber electrodes targeting NAc core or prelimbic PFC were implanted and voltammetric recordings were made as previously described [45]. Catecholamine release was induced by electrical stimulation of the ventral tegmental area (VTA) (60 biphasic 2 ms pulses, 50 Hz, 300 μA), based on literature [46,47] and on pilot experiments. Following a 30 min pre-drug baseline period, tolcapone or vehicle was administered. GBR-12909 or vehicle was then given after a further 90 min of recording. To determine whether it was possible to detect pharmacologically-induced changes in VTA-evoked transients in medial PFC using FCV, amphetamine was tested either after the last injection or in naïve animals, and atomoxetine was  Table 1 for numbers in each treatment group).

Behavioral tasks
Progressive ratio (PR) task. To investigate the influence of DAT blockade or COMT inhibition on motivation-to-work and response vigor, mice were trained on a PR task. During PR sessions, the number of active lever presses required to obtain each subsequent reward (60 µL of 10% sucrose solution) was increased according to the following equation: required lever presses = 5*e i*0.16 -5 ("i": trial number). Drug effects were assessed by giving mice two systemic injections prior to PR test sessions: tolcapone or vehicle (either 105 or 120 min before testing), followed by GBR-12909 or vehicle (15 or 60 min before testing). Each mouse received all possible drug combinations, including combined DAT blockade and COMT inhibition, over four PR test sessions according to a counterbalanced within-subjects design.
Multi-step decision making task. To investigate the influence of DAT blockade or COMT inhibition on reinforcement learning and behavioral flexibility, mice were trained on a multi-step decision making task adapted from the two-step task developed by Daw et al. [48] for dissociating modelbased and model-free reinforcement learning in humans, as reported in [49,50] (see Supplementary Table 2 for training steps). On each trial, mice chose between a top and a bottom nose poke, causing either the left or right reward port to light up, selection of which caused reward (20% sucrose) to be delivered on a probabilistic schedule. At any point in time, a particular first-step action (top or bottom) usually led to a particular secondstep state (left or right port active) ("common" transitions, 80% of trials), though sometimes led to the opposite state ("rare" transitions, 20% of trials). Similarly, one reward port had a high probability of giving reward (0.8) and the other a low reward probability (0.2). Unlike other recent rodent adaptations of multi-step decision tasks [51][52][53], both the reward probabilities in the second-step states and the transition probabilities linking the first-step actions to the second-step states reversed in blocks (20 trials after an exponential moving average of choices [tau = 8 trials] crossed a 75% correct threshold, leading to either a reward or transition reversal).
Once animals had acquired the task, pharmacological manipulations were performed. Tolcapone or its vehicle were administered 90 min prior to sessions; GBR-12909 or its vehicle were administered 15 min prior to sessions. Each subject received a total of eight separate sessions of each drug and five sessions of each vehicle injection, order counterbalanced across animals.

Data analysis
Detailed analysis methodologies are reported in Supplementary Information. Full statistical results are presented in figure legends and Supplementary Information. The datasets are available from the corresponding author on reasonable request.
In brief, for the FCV experiments, we investigated whether DAT blockade or COMT inhibition affected dopamine release and kinetics of uptake in NAc or PFC by examining: (i) peak height; (ii) latency to peak; and (iii) rate of decay to half peak height (T50) separately in each region. For the primary experiments, the significance of drug effects on each parameter was determined using repeated-measures ANOVAs, with time as the within-subjects factor and drug 1 (tolcapone or vehicle) and drug 2 (GBR-12909 or vehicle) as between-subjects factors. In addition, Bayesian independent samples t tests were performed to compare evidence that the vehicle and drug groups were different post-administration. For the positive control experiments, the drug group (amphetamine or atomoxetine) was directly compared to vehicle using independent samples t tests. Full statistics are presented in Supplementary Tables 3-5. For the PR task, the main outcome measure was cumulative active lever presses over the session as a measure of motivation, though we also examined inactive lever presses and average reward collection and reengagement latencies. The significance of drug effects on each of these measures was assessed using a repeated-measures ANOVA, with drug 1 (tolcapone or vehicle) and drug 2 (GBR-12909 or vehicle) as within-subjects factors.
For the multi-step task, we focused on the effect of DAT blockade or COMT inhibition on motivational engagement and flexible updating. We examined drug effects on (i) trial rate, (ii) second-step reaction times (divided by common/rare), and (iii) choices pre-/post-reversal (separated by reward-or transition-reversal) using repeated-measures ANOVAs with within-subjects' factors of vehicle/drug and GBR-12909/tolcapone. To get a more fine-grained picture of how adaptation to reversals was affected by the drugs, we also calculated post-reversal choice probability trajectories Fig. 2 Effects of COMT inhibition and DAT blockade on evoked dopamine release in the NAc. A Schematic of recording and stimulating electrodes in NAc core and VTA, respectively (see Figure S1 for precise locations). B Illustration of features of dopamine transients quantified during analysis. C Schematic of experiment structure. Drug injections are indicated below the timeline and time points of interest above it. D An example recording made in the NAc core. The pseudocolour plot shows the recorded current as a function of the applied potential of the triangular waveform (y axis) over an 11 s period (x axis); timing and duration of stimulation indicated by thick black bar. The trace above the pseudocolour plot shows the extracted current attributable to dopamine release as a function of time. The cyclic voltammogram (current versus applied voltage; boxed inset) during the peak of the signal, 1.5 s after the time of stimulation, is consistent with dopamine release. E Evoked dopamine release in NAc in animals given the COMT inhibitor as drug 1 (red) compared to release in those given vehicle as drug 1 (black) 85 min after the first injection. Release is normalized to the average pre-drug baseline peak height (equivalent to 100% on the y axis), binned over 15 min centered on the time point of interest, and presented as mean ± SEM across animals within each drug group. Timing and duration of stimulation indicated by thick black bar. F As in E, but comparing release in animals given the DAT blocker as drug 2 (blue) with release in those given vehicle as drug 2 (black) 30 min after the second injection. G-J Influence of COMT inhibition or DAT blockade on dopamine release kinetics, measured as a change from pre-drug baseline. G Quantification of the peak height of evoked dopamine release in NAc following administration of the COMT inhibitor. Left: peak height at the same time point shown in E. Each animal's data is shown individually and is normalized to its average pre-drug baseline peak height (equivalent to 100% on the y axis). Box plots show median and 25th and 75th percentiles; whiskers extend from the minimum to maximum value. Right: normalized peak height (mean ± SEM for each drug group) of dopamine transients induced by VTA stimulation, recorded every 5 min over the 90 min following the first injection in animals that received the COMT inhibitor compared with those that received vehicle as drug 1. H As in G, but comparing data from animals that received the DAT blocker with data from those that received vehicle as drug 2; right-hand plot shows peak height over the 90 min following the second injection. I As in the left-hand plot of G, but showing the quantification of the latency from stimulation to peak (left) and of the decay from the peak to T50 (right). Decay constant data are shown on a log 10 scale for clarity. J As in I, but for the DAT blocker. Statistical analysis: COMT inhibition did not alter the size or kinetics of evoked release (all F < 2.4, p > 0.14; 95% CI for difference between tolcapone and vehicle: release: Post-hoc tests confirmed significantly greater evoked release following administration of the DAT blocker than vehicle (p = 0.002, 95% CI for difference: 26.32 to 104.66). DAT blockade also slowed the latency to peak compared to vehicle (GBR-12909 × time interaction: F 1,19 = 12.6, p = 0.002; main effect of GBR-12909: F 1,19 = 7.8, p = 0.011; main effect of time: F 1,19 = 6.3, p = 0.021; 95% CI for difference between GBR-12909 and vehicle: 12.59 to 48.58). In addition, there was an interactive effect of the DAT blocker and time on the rate of decay from peak (F 1,19 = 6.5, p = 0.020). Post-hoc tests confirmed the selective effect of the DAT blocker on the latency to peak (p = 0.002) and showed a trend level effect of DAT blockade on post-peak kinetics compared to vehicle (p = 0.078, 95% CI for difference −82.15 to 4.78), with a difference in the rate of decay in the DAT blockade group before and after receiving the drug (p = 0.007) that was not seen in the vehicle group (p = 0.565).
(see Supplementary Materials for details) and used permutation testing to evaluate differences between each drug and its corresponding vehicle. Finally, the statistical significance of trial-to-trial learning, and its modulation by drug treatment, was assessed using a logistic regression model and a hierarchical regression was fit using a mixed-effects package (lme4).

DAT, but not COMT, influences fast dopamine fluctuations in NAc
We first wanted to extend our understanding of how DAT and COMT shape dynamics of catecholamine transmission in striatum and cortex by using FCV to assess the size and timing of release events at sub-second resolution. This is important because the precision of dopamine transients is thought to be critical for appropriate reward learning and behavioral flexibility and because most previous studies have assessed the roles of DAT and COMT on dopamine transmission at relatively slow (minute-level) timescales [54][55][56][57][58].
We initially focused on VTA-evoked release in NAc core (Fig. 2, Figure S1), both to validate the effects of our stimulation parameters and chosen dose of GBR-12909 on striatal dopamine and to examine the influence of COMT on these signals. As expected, DAT blockade increased the magnitude and duration of dopamine transients: GBR-12909 increased the peak size of evoked release and slowed both the latency to peak and the rate of decay from peak (GBR-12909 × time interaction: all F > 6.5, p < 0.021, Fig. 2F, H, J). In contrast, COMT inhibition had no effect (all F < 2.4, p > 0.14) (Fig. 2E, G, I). When considered with previous microdialysis studies [54,56,58], this implies that only DAT recycling, and not COMT breakdown, plays a direct role in shaping the dynamics of fast dopamine transmission in NAc.
Neither DAT nor COMT affect fast catecholamine fluctuations in PFC We then examined VTA-evoked release in prelimbic PFC (Fig. 3,  Fig. S2). Transients here were in general considerably smaller than in NAc (pre-drug: NAc 692.2 ± 234.87 nM; PFC 112.05 ± 13.71 nM, mean ± SEM: Fig. 3B). Unexpectedly, we found little reliable influence of either DAT blockade (Fig. 3D, F, H) or COMT inhibition (Fig. 3C * main effects or interaction with drug: F < 2.67, p > 0.12, except GBR-12909 × time interaction F 1,21 = 6.0, p = 0.023). The lack of observed effect was not because our approach was insufficiently sensitive to detect drug-induced changes in PFC: both amphetamine, an indirect sympathomimetic known to potentiate dopamine (Fig. 3I), and atomoxetine, a norepinephrine transporter (NET) blocker (Fig. S3), changed VTA-evoked release kinetics (all p < 0.031) (see Fig. S4 for comparison of the effect of each pharmacological treatment on stimulation-evoked dopamine levels). Therefore, although COMT is implicated in shaping minute-by-minute dopamine tone in frontal cortex [36,57,59], our data indicate that neither it nor DAT regulates the release and kinetics of fast catecholamine transmission, induced here by periodic brief VTA stimulation over a 90 min period, in this region.
DAT blockade, but not COMT inhibition, increases motivation to work for reward We investigated how differential regulation of dopamine clearance by DAT and COMT influences motivation to work for reward. While striatal dopamine and DAT have been linked to motivational drive by numerous previous studies [22][23][24]60], much less is known about COMT's influence [54,58].
We therefore first examined the separate and interactive effects of DAT and COMT on motivation to work for reward on a progressive ratio (PR) task (Fig. 4). DAT blockade increased motivational drive indexed by willingness to work for reward and sustained engagement. It selectively increased active lever presses (main effect of GBR-12909: F 1,19 = 26.4, p < 0.001, Fig. 4C, D) and speeded re-engagement latencies (main effect of GBR-12909: F 1,19 = 15.4, p = 0.001, Fig. 4E, F). In contrast, there were no reliable effects of COMT inhibition, either in isolation or in interaction with DAT blockade, on any measures in the PR task (all F < 3.1, p > 0.09, Fig. 4C-F).

DAT and COMT have distinct effects on value updating during multi-step decision making
The results of our voltammetry and PR experiments largely reinforce traditional views of DAT's and COMT's roles in dopamine transmission and behavior, emphasizing DAT's role in shaping striatal dopamine transmission and aspects of motivational drive. However, given the importance of precisely-regulated dopamine transmission for appropriate reward-guided learning and flexible decision making, we hypothesized that DAT and COMT might affect these processes in the context of a more complex decisionmaking task likely to recruit both striatal and cortical circuitry (Fig. 5). Unlike other similar rodent multi-step paradigms [51][52][53], our task included reversals in both reward and action-state transition probabilities (Fig. 5C). Crucially, this allowed us to examine the influence of DAT and COMT over different aspects of behavioral flexibility: how efficiently mice respond either when the location of high probability reward changes (reward reversals) or when the initial choice required to reach the high probability reward changes (transition reversals).
Mice became proficient at the task, triggering reversals on average in <30 trials prior to drug administration, and performing on average 434-799 trials during each of 26 experimental sessions. We first examined the influence of DAT or COMT on willingness-towork given that the actions incurred negligible effort-related cost. Consistent with our PR experiment, DAT blockade but not COMT inhibition increased the rate at which subjects performed trials (veh/drug × DAT/COMT interaction: F 1,7 = 6.52, p = 0.038, Fig. 5D, F). There was also a selective speeding of second-step reaction times following DAT blockade, particularly following rare transitions (veh/drug × DAT/COMT interaction: F 1,7 = 7.76, p = 0.027, veh/drug × common/rare interaction: F 1,7 = 9.26, p = 0.019, Fig. 5E, G). These findings further support a selective role for DAT, but not COMT, in invigorating reward-guided responding. Fig. 3 Effects of COMT inhibition and DAT blockade on evoked dopamine release in the PFC. A Schematic of recording and stimulating electrodes in prelimbic PFC and the VTA, respectively (see Fig. S2 for precise histology). B An example recording made in the prelimbic PFC. The pseudocolour plot shows the recorded current as a function of the applied potential of the triangular waveform (y axis) over a 6 s period (x axis); timing and duration of stimulation indicated by thick black bar. The trace above the pseudocolour plot shows the extracted current attributable to dopamine release as a function of time. The cyclic voltammogram (current versus applied voltage; boxed inset) during the peak of the signal, 1.4 s after the time of stimulation, is consistent with catecholamine release. C Evoked dopamine release in the PFC in animals given the COMT inhibitor as drug 1 (red) compared to release in those given vehicle as drug 1 (black) 85 min after the first injection. Release is normalized to the average pre-drug baseline peak height (equivalent to 100% on the y axis), binned over 15 min centered on the time point of interest, and presented as mean ± SEM across animals within each drug group. Timing and duration of stimulation indicated by thick black bar. D As in C, but comparing release in animals given the DAT blocker as drug 2 (blue) with release in those given vehicle as drug 2 (black) 30 min after the second injection. E-H Influence of COMT inhibition or DAT blockade on dopamine release kinetics, measured as a change from pre-drug baseline. E Quantification of the peak height of evoked dopamine release in PFC following administration of the COMT inhibitor. Left: peak height at the same time point shown in C. Each animal's data is shown individually and is normalized to its average pre-drug baseline peak height (equivalent to 100% on the y axis). Box plots show median and 25th and 75th percentiles; whiskers extend from the minimum to maximum value. Right: normalized peak height (mean ± SEM for each drug group) of dopamine transients induced by VTA stimulation, recorded every 5 min over the 90 min following the first injection in animals that received the COMT inhibitor compared with those that received vehicle as drug 1. F As in E, but comparing data from animals that received the DAT blocker with data from those that received vehicle as drug 2; right-hand plot shows peak height over the 90 min following the second injection. G As in the left-hand plot of E, but showing the quantification of the latency from stimulation to peak (left) and of the decay from the peak to T50 (right). H As in G, but for the DAT blocker. l The effect of amphetamine (magenta) on evoked dopamine release in PFC. Left: schematic of experiment structure. Middle: evoked dopamine release normalized to the average pre-drug baseline peak height, binned over 15 min centered on the time point of interest (30 min after drug injection), and presented as mean ± SEM across animals within each group. Right: quantification of the signal decay, with each animal's data shown individually and normalized to its average pre-drug baseline decay constant. Statistical analysis: Neither COMT inhibition nor DAT blockade altered any measure of evoked release in PFC (all main effects or interactions with drug: F < 2.67, p > 0.12, with the exception of a GBR-12909 × time interaction on post-peak signal decay (F 1,21 = 6.0, p = 0.023), caused by small changes in the slope pre-and post-drug in the two groups, which likely reflects the difficulty of performing an exponential fit on small cortical transients rather than a drug effect, as there was no difference between vehicle and GBR-12909 after drug administration (p = 0. We next assessed how DAT blockade and COMT inhibition affected choice performance, focusing on reward and transition reversals. Neither drug affected pre-reversal choice rates at the end of blocks (all F 1,7 < 1.59, p > 0.24). Following reward reversals, DAT blockade significantly slowed (p = 0.0012) and COMT inhibition significant improved (p = 0.0496) adaptation, as measured by the choice probability trajectories on and off drug (Fig. 5H, J). Importantly, however, neither drug affected the speed of adaptation following reversals in transition probabilities (all p > 0.22) (Fig. 5I, J; Supplementary Tables 6, 7). To further examine the specificity of this reversal effect, we directly compared whether choices immediately after each type of reversal were selectively influenced by drug administration. This again highlighted that the effect of DAT blockade and COMT inhibition depended on whether the reward or transition probabilities reversed (veh/drug × DAT/COMT × reward/transition-reversal interaction: . Taken together, these data demonstrate that dopamine clearance mechanisms shape adaptive behaviors only when there is a change in reward location and not when the action sequence required to reach that reward reverses. To gain insight into what might be causing these changes in behavioral flexibility, we investigated the specific influence of DAT and COMT on decision-making strategies and trial-to-trial learning. Consistent with previous work using the same task [50], mice were sensitive to the transition structure of the task, not simply being more likely to repeat any choice that was rewarded, as would be expected if they were exclusively using a "model-free" reinforcement learning strategy, but instead being more likely to repeat a rewarded choice after a common than a rare transition (Fig. 5L, M). A comparison of reinforcement learning models showed that mice's choices were best explained by a model that employed both model-free and model-based control (see Supplementary Information for discussion of the mice's strategies, Fig. S5, Supplementary Tables 8, 9). Moreover, a logistic regression analysis of the factors influencing choice repetition demonstrated a significant effect of both outcome and transition (p < 0.001 mixed effects logistic regression).
Importantly, and more unexpectedly, neither DAT nor COMT manipulations reliably modified these measures, as might be expected if either mechanism were simply disrupting trial-by-trial reward learning (p > 0.086 for all interactions of outcome and transition with drug condition). There was also no effect on choice or motor biases (p > 0.21). Instead, the drug manipulations-and particularly DAT blockade-affected the regression model's "correct" predictor, which models a tendency to repeat choices to the first step option that has a higher probability of leading to reward at the end of the trial (correct × veh/drug x DAT/COMT interaction: p = 0.016). This predictor captures the influence of the extended history prior to the previous trial [49]. Together, these results imply that DAT blockade impairs reward-guided behavioral flexibility not by biasing decision making strategies or through a direct effect on reward learning, but instead by disrupting the cumulative effect of past choices and outcomes on reward-driven choices.

DISCUSSION
Here we demonstrate distinct roles for DAT and COMT in flexible reward-guided decision making. FCV recordings confirmed a role for DAT recycling, but not COMT degradation, in regulating fast fluctuations in NAc dopamine transmission but showed that neither affected evoked transients in PFC, indicating that clearance mechanisms other than DAT and COMT-potentially including NET-contribute to regulation of cortical dopamine at sub-second timescales. Behaviorally, we confirmed that DAT blockade, but not COMT inhibition, increased motivational drive and promoted task engagement, as previously observed [22,23]. Importantly, in line with evidence indicating distinct but interconnected relationships between striatal and cortical dopamine to enable appropriate levels of flexibility and focus [19], we additionally demonstrated specific and bidirectional roles for dopamine clearance mechanisms in behavioral flexibility, with DAT impairing and COMT causing a relative improvement in performance selectively following reversals in reward probabilities.
Previous studies have indicated that DAT can affect motivation but has limited, if any, influence over reward learning [22-24, 29, 34, 60]. Here, we demonstrated that DAT not only modulates motivational performance and task engagement but, notably, can also influence behavioral flexibility. Specifically, DAT blockade selectively disrupted the ability to adapt following reversals in reward probabilities. Strikingly, the effect of DAT blockade was not observed following reversals in the transition probabilities linking first-step choices to second-step reward states, even though these also required animals to adapt their behavior (in this case, at the first-step choice) in order to maximize rewards obtained. Therefore, the effect cannot be attributed to a general behavioral inflexibility, a finding also supported by the lack of influence on any motor level biases in the logistic regression model. Instead, our data are consistent with DAT influencing reward-driven alternations in behavioral policies in addition to shaping motivational components of reward-guided behavior.
In agreement with previous studies [54,56,58], we found that DAT blockade both increased and extended evoked NAc dopamine transients. The effects were similar to those seen on spontaneous transients in freely-moving animals following administration of a catecholamine transporter blocker or stimulant drugs [13,43]. In addition, we found no effect of DAT blockade on evoked release in prelimbic PFC, consistent with the sparse DAT expression in this region [61]. Therefore, DAT blockade predominantly exerts its effects through regulation of striatal dopamine.
Given that fast fluctuations in striatal dopamine correlate with reward prediction error signals, which are strongly linked to animals' ability to form certain reward-related associations [39,62], it may seem surprising that there is such limited evidence in animal models linking DAT to regulation of adaptive behavior. However, although the reward reversal impairment might initially seem to imply a change in reward learning, there was in fact no reliable evidence that DAT blockade caused alterations to specific trial-by-trial reinforcement learning or a change in reinforcement learning strategies. Instead, it appears that DAT more prominently helps to regulate how the history of past choices and outcomes influences behavioral policies, which can shape how quickly animals detect a change in reward value. Such a conclusion aligns with studies showing effects of dopamine on flexible responding that go beyond influences on trial-by-trial reinforcement learning. For instance, DAT knockdown mice, which have increased tonic dopamine (though, unlike in this study, reduced phasic transients), Fig. 4 Effects of DAT blockade and COMT inhibition on the progressive ratio (PR) task. A within-subjects design was used to test drug effects; n = 21. A Depiction of the progression across trials of lever press ratios required to obtain reward during PR sessions. B Schematic of experiment structure. Drug injections are indicated below the timeline. Timing of drug administration shown is for cohort 2 (see Supplementary Methods) Each data point shows the lever press session total for one animal. Boxplots show median and 25th and 75th percentiles; whiskers extend from the minimum to maximum value. Lever press data are shown on a log 10 scale for clarity. D As in C, but for responses on the inactive lever. Data points again show lever press session totals for individual animals and are displayed on a log 10 scale. E As in C, but for the latency to collect reward following its delivery. Each data point shows the cross-trial average latency for one animal. F As in C, but for the latency to reengage with the task by recommencing lever pressing following the consumption of reward. Data points show cross-trial average latencies for individual animals. Statistical analysis: DAT blockade increased active lever presses (main effect of GBR-12909: F 1,19 = 26.4, p = 6 × 10 −5 ) and speeded task performance (main effect of GBR-12909 on re-engagement latency: F 1,19 = 15.4, p = 0.001; main effect of GBR-12909 on reward collection latency: F 1,19 = 4.1, p = 0.056) but did not affect inactive lever presses (F 1,19 = 2.7, p = 0.118). COMT inhibition did not alter PR task performance, either on its own or when combined with DAT blockade (all F < 3.1, p > 0.09).
have reduced flexibility in response to changes in effort cost that has been linked to a reduced sensitivity to recent outcome history [63]. Similarly, a human DAT polymorphism specifically influenced perseverative errors during a probabilistic reward reversal task [30]. These findings imply that the behavioral consequences of DAT blockade on reward learning and behavioral flexibility will depend strongly on how changeable and uncertain rewards are in any context.
Research on COMT's role in behavior has largely focused on cognitive functions; its significance for reward-guided behavior remains relatively unstudied [33]. We found that COMT inhibition had no effect on motivational aspects of behavior, consistent with its lack of effect on NAc dopamine [54,64,65]. More surprisingly, given robust associations between COMT activity and dopamine levels recorded over longer (min) timescales by microdialysis, there was no evidence that COMT inhibition had an effect on the size or kinetics of VTA-evoked transients in prelimbic PFC, at least within the sensitivity limits afforded by FCV for dopamine (~10 nM level of detection: [66]) and sample sizes used for these comparisons (n = 11-14) (although it is important to note that our data also only provide anecdotal evidence in favor of the null hypothesis that vehicle and tolcapone had equal effects on release, see Supplementary  Table 4). This was not due to a lack of sensitivity to sub-second changes in cortex, as amphetamine increased VTA-evoked transient size and slowed the decay of cortical signals, while the NET blocker atomoxetine slowed signal decay in PFC. Taken together with previous studies, this points towards COMT having a selective influence in regulating cortical dopamine transmission over longer timescales than are required for precise reinforcement learning [36,57,59]. These results, however, contrast with a report of differences in dopamine overflow in COMT knockout mice [46]. This discrepancy may reflect different methodological and analytical approaches between this previous study and the one reported here (medial forebrain bundle stimulation combined with constant potential amperometry interspersed with brief switches to FCV in [34] compared with VTA stimulation combined with FCV and chemometric analysis in the current study). However, it may also speak to differences between the acute changes in COMT in adult animals induced by acute and reversible pharmacological inhibition, compared to the constitutive changes present in the knockout mouse, in which compensatory changes are possible [25]. Consistent with this proposal, human COMT genotype is associated with changes in midbrain dopamine function [67,68], suggesting that the prolonged changes in dopaminergic transmission that presumably result from genetically-encoded differences in COMT activity may induce subtle functional changes elsewhere in the dopamine system. This highlights the need for a greater understanding of the timescales over which compensatory changes across cortical and subcortical dopamine systems occur. Although COMT does not appear to regulate the kinetics of evoked dopamine transients, COMT inhibition nonetheless selectively speeded adaptation to reward reversals relative to vehicle injections. The precise mechanism by which COMT regulates reward-driven behavioral flexibility remains unclear, as there were no consistent changes in behavioral strategy or reinforcement learning that emerged from the analyses. Although acute pharmacological inhibition is different than the chronic changes in enzymatic activity arising from the human COMT Val/Met polymorphism [69], our findings are concordant with reports of increased sensitivity to rewards and faster reinforcement learning in Met allele carriers, who have lower COMT activity than Val allele homozygotes [70,71]. Moreover, the absence of any detectable influence of COMT inhibition on cortical or striatal dopamine transients, similar to those that would be generated by reward prediction error-driven dopamine activity, implies that its role is not specifically to modulate how individual experiences shape learning but instead to regulate dopamine-driven cortical states over minute-by-minute timescales. Given that our data indicate that NET may be a key regulator of cortical dopamine signaling at fast (second-by-second) timescales, it will be interesting in future studies to investigate how NET blockers influence rapid reinforcement learning.
Together, our data suggest that DAT and COMT differentially regulate specific aspects of reward-based behavioral flexibility. Surprisingly, however, we found no evidence that either do this by influencing the balance of reinforcement learning strategies or shaping trial-by-trial learning. These findings demonstrate the importance of different regulators of dopamine signaling, operating over distinct timescales and in different brain regions, in shaping multiple aspects of adaptive reward-guided behavior beyond trial-by-trial reinforcement learning. In addition, they reinforce the idea that appropriate levels of behavioral flexibility, motivation, and engagement rely, at least in part, on the balance of striatal and cortical dopamine. Such data, therefore, highlight both the promise and the complexities associated with the development of novel pharmacological agents to remedy dopamine-dependent behavioral changes seen in patients with psychiatric disorders. Fig. 5 Effects of DAT blockade and COMT inhibition on performance during a multi-step decision making task. A within-subjects design was used to test drug effects; n = 8, 26 vehicle/drug sessions per mouse (8 × GBR-12909, 8 × tolcapone, 5 × each drug's vehicle), average range = 434-799 trials per session per mouse. A Diagram of two-step task apparatus and trial events. B Diagram of the task state space; reward and transition probabilities are shown for one block type. C Example session. Top panel: black line shows exponential moving average of subjects' choices (tau = 8 trials). The correct choice (high or low) for each block is indicated by the gray bars. Middle panel: reward probabilities for each block for the left (red) and right (blue) sides. Bottom panel: transition probabilities linking first step actions to second step states; yellow line shows probability of top→left and bottom→right transitions, purple line shows probability of top→right and bottom→left transitions. Colors in middle and bottom panel of C match those in B. D, F Trial rate as a function of session time (D DAT blockade; F COMT inhibition). Trial rates were smoothed with a Gaussian of 5 min standard deviation. Shaded areas show cross-subject SEM. E, G Median second-step reaction times following common and rare transitions (E DAT blockade; G COMT inhibition). The second-step reaction time is the time from choosing high or low to entering the active side port at the second step. Error bars show cross subject SEM. H-K Choice probability trajectories around reversals in reward probabilities (H DAT blockade, J COMT inhibition) and transition probabilities (I DAT blockade, K COMT inhibition). Pale lines show cross-subject mean choice probability trajectory and dark lines show double exponential fit. L, M Probability of repeating the first step choice on the next trial as a function of the trial outcome (rewarded or not) and experience state transition (common or rare) (L DAT blockade, M COMT inhibition). Statistical analysis: DAT blockade, but not COMT inhibition, increased trial rate (main effect of veh/drug: F 1,7 = 9.79, p = 0.016; veh/drug × DAT/COMT interaction: F 1,7 = 6.52, p = 0.038; post-hoc paired t tests, DAT: p = 0.012, COMT: p = 0.98). DAT blockade, but not COMT inhibition, also speeded second-step reaction times (main effect of veh/drug: F 1,7 = 8.20, p = 0.024; veh/drug × DAT/COMT interaction: F 1,7 = 7.76, p = 0.027), particularly following rare transitions (veh/drug × common/rare interaction: F 1,7 = 9.26, p = 0.019; 3-way interaction did not quite reach significance: veh/drug × common/rare × DAT/COMT interaction: F 1,7 = 4.35, p = 0.076). Neither DAT blockade nor COMT inhibition changed the choice rates at the end of each block prior to a reversal (all F < 1.59, p > 0.24). DAT blockade and COMT inhibition both modulated the speed of reward reversals, assessed by fitting a double exponential to the choice probability trajectory and using permutation testing to evaluate whether differences between drug and corresponding vehicle condition were significant (reward reversals: GBR-12909: p = 0.0012; tolcapone: p = 0.0496; transition reversals: all p > 0.23). In addition, direct comparison of post-reversal choices highlighted that the effect depended on whether the reward or transition probabilities reversed (veh/ drug × DAT/COMT × reward/transition-reversal interaction: F 1,7 = 6.88, p = 0.034; reversals in the reward probabilities: veh/drug × DAT/COMT interaction: F 1,7 = 17.85, p = 0.004; reversals in transition probabilities: all F < 0.36, p > 0.56). A mixed effects logistic regression showed that the mice's choices were sensitive to both the outcome (reward/no reward) and the transition structure (common/rare) (both p < 0.001). However, neither of these were reliably affected by drug condition (p > 0.086 for all interactions of outcome and transition with drug condition). Instead, the drug manipulations-and particularly DAT blockade-affected the regression model's "correct" predictor, which captures a general tendency to repeat choices commonly associated with the high reward probability option (correct × veh/drug × DAT/COMT interaction: p = 0.016; DAT blockade, p = 0.056; COMT inhibition: p = 0.28). *, p < 0.05; **, p < 0.01.