Behavioral characteristics of dopamine D5 receptor knockout mice

Major psychiatric disorders such as attention-deficit/hyperactivity disorder and schizophrenia are often accompanied by elevated impulsivity. However, anti-impulsive drug treatments are still limited. To explore a novel molecular target, we examined the role of dopamine D5 receptors in impulse control using mice that completely lack D5 receptors (D5KO mice). We also measured spontaneous activity and learning/memory ability because these deficits could confound the assessment of impulsivity. We found small but significant effects of D5 receptor knockout on home cage activity only at specific times of the day. In addition, an analysis using the q-learning model revealed that D5KO mice displayed lower behavioral adjustment after impulsive actions. However, our results also showed that baseline impulsive actions and the effects of an anti-impulsive drug in D5KO mice were comparable to those in wild-type littermates. Moreover, unlike previous studies that used other D5 receptor-deficient mouse lines, we did not observe reductions in locomotor activity, working memory deficits, or severe learning deficits in our line of D5KO mice. These findings demonstrate that D5 receptors are dispensable for impulse control. Our results also indicate that time series analysis and detailed analysis of the learning process are necessary to clarify the behavioral functions of D5 receptors.

www.nature.com/scientificreports/ than dopamine D 1 receptors 12 . Thus, we hypothesized that the anti-impulsive effects of the above drugs might be exerted via stimulation of dopamine D 5 receptors. The development of a selective dopamine D 5 receptor agonist might resolve problems encountered with the current stable of anti-impulsive drugs by enabling us to selectively manipulate the medial prefrontal cortex without affecting the nucleus accumbens. To our knowledge, however, there are so far no drugs that clearly distinguish between dopamine D 1 and D 5 receptors. Given that psychostimulants primarily facilitate addiction through the modulation of the nucleus accumbens 13 , a selective dopamine D 5 receptor agonist might not induce this process, unlike psychostimulant-based anti-impulsive drugs. Furthermore, a selective agonist for dopamine D 5 receptors would not likely exacerbate hypertension since dopamine D 5 receptor knockout (D5KO) mice are hypertensive 14 . Spontaneous motor activity in D5KO mice is generally normal or reduced, implying that a selective dopamine D 5 receptor agonist will not induce sedation. However, in the absence of selective D5 receptor agonists, examining dopamine D5KO mice is a reasonable way to determine whether D 5 receptors could be a promising target for anti-impulsive drugs.
In the present study, we used an alternative line of D5KO mice 15 instead of the traditional D5KO mice 14 for three reasons. First, the traditional D5KO mice could express truncated transcripts that might alter the expression of related genes 16,17 , whereas the alternative line would not express them because the entire dopamine D5 receptor gene region, including the promoter region, has been removed. Second, previous studies have shown that the line or background strain of transgenic mice can alter the baseline behavioral phenotype 18,19 . To clarify the role of a molecule in brain function, it is therefore preferable to test more than one line or background strain. Third, some studies have reported lower spontaneous motor activity and deficits in learning and working memory in traditional D5KO mice 20-22 . These phenotypes make it difficult to assess impulsivity because most tasks evaluating impulsivity assume a certain level of spontaneous activity and learning/memory ability. We speculated that these phenotypes arise from the factors described above rather than from the lack of D5 receptors itself.
In this study, using an alternative line of D5KO mice, we (1) conducted quantitative PCR to confirm that dopamine D5 receptors were not transcribed and that compensatory changes in dopamine D1 receptor expression did not occur, (2) measured locomotor activity in two different environments, a novel environment and a familiar environment, (3) conducted a Y-maze test to assess working memory, and (4) employed the 3-choice serial reaction time task (3-CSRTT) 11,23 to assess learning ability and impulsivity. To evaluate possible learning deficits or bias, we modeled the learning process within the 3-CSRTT using a q-learning model.

RNA analysis.
To confirm that the dopamine D 5 receptor gene is not transcribed and that a compensatory overexpression of D 1 receptors does not occur, we conducted quantitative PCR tests. As expected, the Drd5 gene expression levels were below the detection limit in the D5KO mice (Fig. 1a). Moreover, the Drd1 gene expression levels were not increased in the D5KO mice compared to wildtype littermates in the hippocampus (t 14 = 0.96, p = 0.35), medial prefrontal cortex (mPFC) (t 14 = 0.61, p = 0.55), and striatum (t 14 = -1.34, p = 0.20) (Fig. 1b).
Home cage activity. To measure locomotor activity in a familiar environment, we recorded home cage activity for 24 h. We performed a three-factor ANOVA on locomotor activity in the home cage, binned every two hours (Fig. 2a). There was a main effect of time (F 5.39, 285.43 = 115.28, p < 0.001, with Greenhouse-Geisser correction) and a significant interaction between time and genotype (F 1, 53 = 3.34, p < 0.001, with Greenhouse-Geisser correction). Other main effects and interactions were not detected (Table S1). Simple main effects analyses for each time point revealed that D5KO mice were significantly more active than wildtype littermates between 7:00 and 9:00 (F 1, 53 = 4.63, p = 0.036) and significantly less active between 15:00 and 17:00 (F 1, 53 = 6.88, p = 0.011) (Fig. 2a).
In the open field test, a three-factor ANOVA for the percentage of time spent in the central area, a measure of reduced anxiety-like behavior, revealed a significant main effect of time (F 2.99, 146.5 = 6.86, p = 0.0002, with Greenhouse-Geisser correction). However, we did not find other main effects or any interactions (Fig. 2d, Table S2).

Y maze test.
To assess working memory in mice, we conducted the Y maze test. A two-factor ANOVA for the percentage of spontaneous alternation, a measure of working memory, did not reveal any main effects or an interaction (Fig. 3a, Table S3). Likewise, a two-factor ANOVA for the total number of arm entries, a measure of locomotor activity, did not reveal any main effects or an interaction (Fig. 3b, Table S3).
Assessment of learning ability with a q-learning model. To determine whether D5KO mice display learning deficits or bias, we modeled the learning process in the 3-CSRTT. We assumed two distributions, an experience distribution and a non-reward distribution, to represent the premature behavior of mice. The experience distribution represents a memory state of all trials and is updated by the q-learning process regardless of the previous result, whereas the non-reward distribution represents a memory state of premature responses. Table 1 shows the estimated parameters of the q-learning model, and Table S4 describes each parameter. The key parameters in this model are the learning rates, which represent learning ability, and the inverse temperature, which represents confidence in the animal's own choice. The baseline learning rate for the experience distribution, αX,0, was 0.04296, while that for the non-reward distribution, αY,0, was 0.08555. For the experience distribution, only the male effect was significant according to the 95% highest density interval (HDI). For the non-reward distribution, both the D5KO and male effects were negative and significant based on the 95% HDI. This result indicates that D5KO mice have a learning deficit for premature results but not for non-premature results. The baseline inverse temperature, β0, was 130.603, and neither the D5KO nor the male effect was significant.
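The HDI criterion used above can be illustrated with a short sketch. It assumes that posterior draws for a parameter are available as a plain list and that "significant" means the 95% HDI excludes zero; the function names `hdi` and `excludes_zero` are hypothetical, and the actual estimation procedure is described only in the Supplementary Information.

```python
import math


def hdi(samples, mass=0.95):
    """Return the narrowest interval that contains `mass` of the samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


def excludes_zero(interval):
    """Assumed reading of 'significant': the interval does not contain zero."""
    lower, upper = interval
    return lower > 0 or upper < 0
```

An effect such as αY,D5KO would then be called negative and significant when both bounds of its interval are below zero.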
Although the 95% HDIs of αX,male, αY,D5KO, and αY,male excluded zero, the size of their effects on behavior remained unclear. To quantify the contributions of these parameters, we simulated the q-learning process with the estimated parameters. Table S5 shows the proportions of premature responses in each session for the trial and simulation data. Overall, the starting values of the proportions in the simulation data were consistently lower than those in the trial data. The proportions of premature responses fluctuated across sessions in both the real and simulated results. At session 10, the minimum values were observed in D5KO male mice, whereas the maximum value was observed in wildtype female mice. Simulations over trials with individually estimated parameters indicated that the q-learning model could potentially capture the behavior observed in the 3-CSRTT (see Supplementary Fig. 1).
Averaged values of the functions used in the q-learning model at the end of the simulation with the estimated parameters are shown in Fig. 4. The values at each elapsed time were averaged over 100 simulation results. Overall, there were no major differences in the shape of the functions. For the probability of confidence, the rise of the distribution around 5 s was steeper in D5KO mice than in wild type mice, reflecting the higher value of the inverse temperature, although the D5KO effect on the inverse temperature was not significant.
Table 1. Estimated parameters of the q-learning model. αX,D5KO, αX,male, αY,D5KO, and αY,male were sampled from real numbers, while the other parameters were sampled from positive real numbers. HDI, highest density interval.
Furthermore, other parameters were also affected by the administration of duloxetine. A three-factor repeated measures ANOVA revealed a significant dose effect on accuracy (Fig. 5c,d) (F 2.56, 143.28 = 3.93, p = 0.014, with Greenhouse-Geisser correction), correct response latency (Fig. 5g,h) (F 2.40, 134.19 = 10.66, p < 0.001, with Greenhouse-Geisser correction), and reward latency (Fig. 5i,j) (F 3, 168 = 6.74, p < 0.001), but not on omission (Fig. 5e,f). Multiple comparisons with Bonferroni's correction revealed that the 3.0 mg/kg dose of duloxetine decreased accuracy compared to the 0.3 mg/kg dose (Fig. 5c,d). The 1.0 mg/kg dose prolonged correct response latency compared to vehicle, and the 3.0 mg/kg dose prolonged it compared to vehicle and the 0.3 mg/kg dose (Fig. 5g,h). The 3.0 mg/kg dose significantly prolonged reward latency compared to vehicle and the 0.3 mg/kg dose, and the 1.0 mg/kg dose significantly prolonged reward latency compared to the 0.3 mg/kg dose (Fig. 5i,j). Moreover, a main effect of sex was detected on mean reward response latency (F 1, 56 = 5.210, p = 0.026). However, we did not find any other main effects or interactions (Table S6).

Discussion
Although minor differences were found, no major differences were observed in any behavioral parameters between wildtype and D5KO mice. Small but significant effects of D5KO were observed in home cage activity only at specific times of day. In addition, D5KO mice displayed lower behavioral adjustments after premature responses in the 3-CSRTT. We did not observe a reduction in locomotor activity in a novel environment or working memory deficits in D5KO mice, inconsistent with some previous studies. No significant effects of D5KO on impulsive action were observed, suggesting that our hypothesis that D 5 receptors play an essential role in impulse control is incorrect. We discuss possible interpretations for each result below.
We replicated a previous KO study that indicated no mRNA expression of D5 receptors using Northern blotting 15 . Using a different method, quantitative RT-PCR (Fig. 1a), we reached the same conclusion: our D5 receptor KO mice do not express the D5 receptor at all. Another concern with D5 receptor KO mice is compensatory effects. For decades, studies of transgenic and gene knockout mice have contributed to the delineation of the functional roles of many kinds of proteins. However, recent evidence has demonstrated that the interpretation of these studies may be complicated by compensatory changes because gene mutations that truncate the encoded protein could affect the expression of related genes 16,17 . Our RT-qPCR results (Fig. 1b) exclude the possibility of a compensatory increase in dopamine D1 receptors, which are involved in impulse control. However, we cannot rule out the many other possibilities that the expression of other genes 24 or functional pathways 25,26 was altered in D5 receptor KO mice. Because testing all of these possibilities is impractical, future studies using AAV-mediated knockdown or knockout of the D5 receptor in the PFC will be required to ensure that no compensatory changes occur.
We found that dopamine D 5 receptor KO mice exhibited higher locomotor activity at the beginning of the dark period (7:00-9:00) but lower locomotor activity at the end of the dark period (15:00-17:00) in their home cage, a familiar environment (Fig. 2a). Previous studies have shown that dopamine D 5 receptor KO mice display lower locomotor activity than wildtype mice 21,22 , while others did not detect any difference in locomotor activity 15,20 . The present study might explain the inconsistent results from previous studies on locomotor activity in dopamine D 5 receptor KO mice. In the previous studies, an open field test lasting 60 to 150 min has been used to measure locomotor activity. However, since the locomotor activity of dopamine D 5 receptor KO mice changed significantly between the first and second halves of the dark period, the results may vary depending on the time of day the test was conducted. Although speculative, previous studies that showed lower locomotor activity in dopamine D 5 receptor KO mice might have been conducted in the latter half of the dark period.
In the open field test in the present study, we did not detect any difference in locomotor activity in the novel environment between D5KO mice and their wild type littermates (Fig. 2b,c). This might be because the time of measurement was not kept constant. Alternatively, the results in the home cage described above might be limited to a familiar environment. Because further studies examining locomotor activity in the open field at specific times of day will be required to address this issue, we withhold a conclusion here. In addition, there was no difference in the percentage of time spent in the central compartment, a measure of reduced anxiety-like behavior, between dopamine D5 receptor KO mice and their wildtype littermates (Fig. 2d). Therefore, our findings indicate that dopamine D5 receptors may not be related to anxiety-like behavior, consistent with previous studies 21,22 .
In the Y maze test, there was no difference in working memory between dopamine D5 receptor KO mice and their wildtype littermates (Fig. 3). However, in previous studies, dopamine D5 receptor KO mice tended to exhibit poorer working memory 20-22 . There are at least two possible explanations for this discrepancy. First, we used an alternative line of D5 receptor KO mice in this study 15 , while the previous studies that detected working memory deficits had used traditional dopamine D5 receptor KO mice 14 . As discussed earlier, the mutation in the traditional mice could alter the expression of related genes 16,17 . Thus, the working memory deficits observed in these studies might be due to secondary effects. The second possibility is the difference in the working memory task employed. A previous study demonstrating a working memory deficit in D5 receptor KO mice used a baited T-maze test 20 . In the present study, we used the Y maze test as a simple test that does not require training. In this test, behavioral variability would be relatively large because we do not provide a clear motivation such as a food reward. Therefore, the Y maze test might not be able to detect minute differences, although the Y maze test in our laboratory can detect working memory deficits induced by pharmacological manipulation 27 . Taken together, we conclude that the role of dopamine D5 receptors in working memory is limited.
Because our D5KO mice showed almost normal motor functions and working memory, we conducted the 3-CSRTT to assess impulsive actions. The q-learning analysis revealed small learning deficits in D5KO mice (Table 1). In other words, D5KO mice, especially male mice, exhibit an inferior ability to learn from their mistakes and fine-tune their behavior. However, these small differences did not significantly affect behavioral parameters in the 3-CSRTT (Fig. 4). It should also be noted that the variability in the results across individual mice and sessions is quite high. At a minimum, we raise the possibility that the q-learning model is useful for the analysis of learning processes in the 3-CSRTT, and that detailed time series analysis could provide a clue to clarify the function of D5 receptors.
We replicated, using male and female mice, the dose-dependent anti-impulsive effects of duloxetine previously found in male rats 5 . However, the anti-impulsive effects of duloxetine were detected not only in the wild type littermates but also in dopamine D5 receptor KO mice. That is, D5 receptor KO failed to prevent the anti-impulsive effects of duloxetine, indicating that our original hypothesis was incorrect. Moreover, the baseline of impulsive action (following 0 mg/kg of duloxetine) was almost the same between D5 receptor KO mice and wild type littermates. Based on these results, we suggest that dopamine D5 receptors do not play an important role in impulsivity. It should be noted that other parameters were also affected by duloxetine. Accuracy, a measure of attentional function, was decreased when 3 mg/kg duloxetine was injected in both genotypes and both sexes. In addition, 1 mg/kg and 3 mg/kg duloxetine administration prolonged the mean correct response latency and reward latency in both genotypes and both sexes. These measures represent motivation and motor function. The percentage of omissions, which represents attentional function and motivation, was not affected by duloxetine. Therefore, the prolonged latencies would reflect a decrease in motor function, indicating that higher doses of duloxetine would be inappropriate in the evaluation of anti-impulsive effects. However, we still conclude that dopamine D5 receptors have a negligible role in impulse control because these side effects were observed equally in both genotypes and the anti-impulsive effects of a low or moderate dose of duloxetine did not disappear in D5KO mice.
In light of these results, how should we interpret previous studies 5,6 indicating that drugs suppress impulsivity by stimulating dopamine D 1 -like receptors in the mPFC? There are at least two possibilities. First, dopamine D 1 receptors may be more involved in impulsivity suppression, since the involvement of D 5 receptors has been ruled out. However, since dopamine D 1 receptors are also densely expressed in the nucleus accumbens, where impulsivity is enhanced by their stimulation 7,8 , they will not be an appropriate molecular target for anti-impulsive drugs. The second possibility is that previous studies have largely examined nonselective effects of dopamine D 1 -like receptor antagonists, where SCH23390 is frequently used, although its selectivity for D 1 -like receptors is not high enough to completely exclude effects on other receptors and channels 28 . In either case, the development of a selective dopamine D 5 receptor agonist would not resolve the current problems encountered in current anti-impulsive drugs. Interestingly, a recent study showed that striatal dopamine D 5 receptors are involved in the pathophysiology of levodopa-induced dyskinesia 29 . Therefore, dopamine D 5 receptors might play a role in pathological but not physiological situations.

Materials and methods
Animals. Adult male and female D5KO mice 15 or wildtype littermates (8-28 weeks old) were used. The B6.129-Drd5<tm1Mok> mouse strain (RBRC01084) was provided by RIKEN BRC through the National Bio-Resource Project of the MEXT, Japan. In the D5KO mice used in this study, the entire dopamine D5 receptor gene region was removed and replaced with a neomycin resistance gene. These mice were backcrossed to the C57BL/6N strain for more than 13 generations. C57BL/6N mice were supplied by Nippon SLC Co. Ltd (Hamamatsu, Japan). Before the behavioral experiments started, animals were group-housed at 25 °C ± 2 °C and a relative humidity of 40%-50%. Food and water were provided ad libitum except for the mice undergoing the 3-choice serial reaction time task. The lights of the animal rooms were turned on from 19:00 to 07:00. All tests were performed during the dark period except for the home cage activity test. All procedures followed the guidelines for the Care and Use of Laboratory Animals from the Animal Research Committee of Hokkaido University and were approved by the Animal Research Committee of Hokkaido University (approval no. 18-0070). We conducted all experiments in compliance with the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines. Mice received one or several behavioral tests as summarized in Table 2. A few mice that underwent the 3-choice serial reaction time task were excluded from the assessment of learning ability because a programming error affected the premature response latency data.
Drugs. Duloxetine hydrochloride (Tokyo Chemical Industry Co., Ltd., Tokyo, Japan) was dissolved in saline and administered intraperitoneally at a volume of 10 mL/kg. Doses reported here are based on the molecular weight of the salt.

RNA analysis.
Mice were deeply anesthetized with urethane (2 g/kg, intraperitoneally) and sacrificed by decapitation. Brain tissues, including the hippocampus, medial prefrontal cortex (mPFC), and striatum (Str), were dissected on ice. Each sample was weighed, placed in a tube, immediately frozen in liquid nitrogen, and kept at −80 °C until analysis. Total RNA was extracted from the tissue using NucleoSpin RNA reagent (Takara Bio, Shiga, Japan). The mRNA expression levels of Drd1 and Drd5 were quantified by reverse-transcription quantitative PCR (RT-qPCR) using the respective cDNA fragment as a standard and were normalized to mouse Gapdh mRNA levels. Briefly, 5 μg of total RNA was reverse transcribed using ReverTra Ace® qPCR RT Master Mix with gDNA Remover (Toyobo, Osaka, Japan). Real-time quantitative PCR was performed on a fluorescence thermal cycler, the StepOne™ Real-Time PCR System (Thermo Fisher Scientific, Waltham, MA, USA), using TaqMan® Fast Advanced Master Mix and probe sets (Thermo Fisher Scientific). The PCR conditions were 50 °C for 2 min and 95 °C for 20 s, followed by 40 cycles of 95 °C for 1 s and 60 °C for 20 s. The primer and probe sets for Drd1 (Thermo Fisher Scientific, Mm01353211_m1) and Drd5 (Thermo Fisher Scientific, Mm00658653_s1) were chosen based on a previous study 30 . Gapdh was used as a control (Thermo Fisher Scientific, Mm99999915_g1). The results were analyzed using StepOne Software ver. 2.3 (Thermo Fisher Scientific).
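The standard-curve quantification and Gapdh normalization described above can be sketched as follows. This is a minimal illustration that assumes a log-linear standard curve of the form Ct = slope × log10(quantity) + intercept fitted by least squares; the function names and all numeric values are hypothetical, and the StepOne software performs the actual analysis.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the starting quantity."""
    return 10 ** ((ct - intercept) / slope)


def normalized_expression(target_ct, ref_ct, target_curve, ref_curve):
    """Target quantity divided by the reference (e.g. Gapdh) quantity."""
    target = quantity_from_ct(target_ct, *target_curve)
    reference = quantity_from_ct(ref_ct, *ref_curve)
    return target / reference


# Hypothetical dilution series of a cDNA standard (copies) and its Ct values.
curve = fit_standard_curve([1e2, 1e3, 1e4, 1e5], [30.0, 27.0, 24.0, 21.0])
```

A sample whose target amplification never crosses the threshold has no Ct value and would be reported as below the detection limit rather than passed to these functions.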
Home cage activity. Animals were individually housed in a Plexiglas cage (18 cm × 26 cm × 12 cm) for at least 1 week before this test. Spontaneous movements were measured by a passive infrared sensor (The Chronobiology Kit, Stanford Software Systems, Stanford, CA) that detected changes in the intensity of infrared energy radiated from the animal due to movement 31 . The amount of movement was recorded every minute with the computer software Analysis98 (Stanford Software Systems, Santa Cruz, CA).
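Because movement was recorded every minute while the ANOVA used two-hour intervals, the per-minute counts have to be aggregated before analysis. The sketch below assumes that counts are summed within each two-hour bin; the function name `bin_activity` is hypothetical and the aggregation rule (sum rather than mean) is an assumption.

```python
def bin_activity(counts_per_minute, bin_minutes=120):
    """Sum per-minute activity counts into consecutive fixed-width bins."""
    if len(counts_per_minute) % bin_minutes != 0:
        raise ValueError("recording length must be a multiple of the bin width")
    return [
        sum(counts_per_minute[start:start + bin_minutes])
        for start in range(0, len(counts_per_minute), bin_minutes)
    ]


# A 24-h recording (1440 min) yields 12 two-hour bins per animal.
bins = bin_activity([1] * 1440)
```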

Open field test.
A mouse was placed in an acrylic box (45 × 45 × 45 cm) for 70 min. The inside of the box was covered by rough-surfaced polypropylene sheets. The light intensity in the box was adjusted to 20 lx. The movement of each mouse was monitored through a CCD camera and was tracked using a software package (LimeLight, Actimetrics, USA). We considered the total distance traveled and the number of total crossings (defined by crossings of the lines made by the division of the chamber into 7.5 cm × 7.5 cm squares) as measures of locomotor activity. Moreover, we considered the percentage of time spent in the central area (15 cm × 15 cm square) as a measure of anxiety-like behavior.
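The three open field measures can be derived from a tracked path as in the sketch below. It assumes the track is a list of (x, y) positions in cm sampled at a fixed rate, that the central area is centered in the box, and that line crossings are counted as changes in grid cell between consecutive samples; these choices and the function name are assumptions, not the behavior of the LimeLight software.

```python
import math

BOX_CM = 45.0      # side length of the open field
CELL_CM = 7.5      # side length of one grid square
CENTER_CM = 15.0   # side length of the central area


def open_field_metrics(track):
    """Return total distance (cm), grid line crossings, and % time in center."""
    lower = (BOX_CM - CENTER_CM) / 2.0   # assumed: central square is centered
    upper = lower + CENTER_CM
    distance = 0.0
    crossings = 0
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        distance += math.hypot(x1 - x0, y1 - y0)
        # Each change of grid column or row counts as one line crossed.
        crossings += abs(int(x1 // CELL_CM) - int(x0 // CELL_CM))
        crossings += abs(int(y1 // CELL_CM) - int(y0 // CELL_CM))
    in_center = sum(
        1 for x, y in track if lower <= x <= upper and lower <= y <= upper
    )
    percent_center = 100.0 * in_center / len(track) if track else 0.0
    return {
        "distance_cm": distance,
        "crossings": crossings,
        "percent_center": percent_center,
    }
```

With equally spaced samples, the fraction of samples inside the central square equals the fraction of time spent there.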
Y maze test. The details of the Y maze test have been described in our previous studies 27,32 . Briefly, a mouse was placed in an apparatus consisting of three arms (10 cm wide and 45 cm long, with 35-cm-high walls) for 8 min. The light intensity in the apparatus was adjusted to 20 lx. The number of entries into the arms was used as a measure of locomotor activity. The percentage of spontaneous alternation was used as a measure of working memory.
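The percentage of spontaneous alternation is not defined explicitly here. A minimal sketch under the conventional definition (the number of overlapping triplets of consecutive entries that visit three different arms, divided by the total number of entries minus two) is given below; the function name and the arm labels are hypothetical.

```python
def spontaneous_alternation(entries):
    """Percentage of overlapping entry triplets that cover all three arms."""
    n = len(entries)
    if n < 3:
        return None  # too few entries to form a triplet
    alternations = sum(
        1 for i in range(n - 2) if len(set(entries[i:i + 3])) == 3
    )
    return 100.0 * alternations / (n - 2)


print(spontaneous_alternation(list("ABCBCA")))  # 50.0
print(len("ABCBCA"))  # total arm entries, the locomotor measure: 6
```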

3-choice serial reaction time task (3-CSRTT).
Mice were trained to perform the 3-CSRTT as described previously 33 . We purchased aluminum operant chambers from Med Associates Inc. (St. Albans, VT, USA). The main sequence of the 3-CSRTT is briefly described below. When a mouse entered the food magazine, a 5-s inter-trial interval (ITI) began. After the ITI, one of the three hole lights was turned on in a pseudo-random order (stimulus duration (SD) in experimental sessions: 1 s (SD1)). Nose poking before a hole light was turned on was recorded as a "premature response," which is a measure of impulsive action. Nose poking into the lit hole was recorded as a correct response and resulted in delivery of a palatable food pellet (20 mg, dustless precision pellets, Bio-Serv, Frenchtown, NJ, USA). Nose poking into an unlit hole was recorded as an incorrect response. When the animal did not nose poke into any hole, we recorded it as an omission. A 5-s time-out period started after premature responses, incorrect responses, and omissions. We also recorded the premature response latency (the time between the ITI onset and a nose poke into an unlit hole), the correct response latency (the time between stimulus onset and a nose poke into the lit hole), and the reward latency (the time between reward delivery and a nose poke into the food magazine). Session data in the 3-CSRTT were used for two purposes (Fig. 6a). Training sessions after a pre-training period were used for q-learning analysis to assess learning ability. The pre-training sessions included several types of training, and mice usually experienced each step for only a few sessions. After five SD1-ITI9 sessions were completed, duloxetine administration was started as described later.
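The trial outcomes defined above can be expressed as a small classifier, shown here as a sketch. The argument names are hypothetical, a missing poke is represented by `None`, and the denominators used for the percentage measures follow common 3-CSRTT conventions rather than definitions stated in this paper.

```python
from collections import Counter


def classify_trial(poke_time, poke_hole, lit_hole, iti=5.0):
    """Classify one trial; poke_time is seconds from ITI onset, or None."""
    if poke_time is None:
        return "omission"      # no nose poke into any hole
    if poke_time < iti:
        return "premature"     # poke before the hole light was turned on
    if poke_hole == lit_hole:
        return "correct"
    return "incorrect"


def summarize(outcomes):
    """Session-level measures (assumed conventional denominators)."""
    counts = Counter(outcomes)
    total = len(outcomes)
    responded = counts["correct"] + counts["incorrect"]
    return {
        "premature_pct": 100.0 * counts["premature"] / total if total else None,
        "omission_pct": 100.0 * counts["omission"] / total if total else None,
        "accuracy_pct": 100.0 * counts["correct"] / responded if responded else None,
    }
```

For the ITI9 test sessions the same function would be called with `iti=9.0`.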

Assessment of learning ability with a q-learning model with 3-CSRTT training sessions.
We focused on data from training sessions with a 5-s ITI (ITI5), from the first session after the pre-training process to the tenth session (Fig. 6a). In these sessions, our preliminary analysis showed no clear difference in the proportions of result categories (correct, premature, incorrect, or omission) between genotypes (see Supplementary Fig. 2). However, an impulsive action could be related to behavior in previous trials, and a detailed analysis with a q-learning model could reveal differences in impulsivity between wild type and D5KO mice. Premature response latency was recorded for these sessions and combined with the correct and incorrect response latencies, so that we could reconstruct the time between stimulus onset and a nose poke into the hole regardless of trial results.
The q-learning model in this study attempts to capture the mechanisms of premature behavior, which represents an impulsive action. To model the memory state in the mouse brain, we assumed two types of time-dependent probability distribution functions (p.d.f.) to represent "experience" and "non-reward" memory states. The experience distribution represents the memory of all trials that mice have completed, and the non-reward distribution represents the memory of trials with premature results. Combining these two mechanisms, mice decide when to nose poke into a hole. The parameters of both distributions are updated based on the q-learning process. Since incorrect results were thought to be caused by different mechanisms and the number of incorrect results was small, incorrect results were treated the same as correct ones in this analysis.
We first defined random variables for the "experience" and "non-reward" distributions, which follow a normal distribution. Following q-learning theory, these two random variables were updated based on the result type and the elapsed time from the start of the previous trial. The rates at which the parameters are updated are controlled by the learning rates of the experience distribution, α X , and the non-reward distribution, α Y . If these learning rates are lower for D5KO mice than for wild-type mice, we interpret this as a deficit in learning ability in D5KO mice.
The experience and non-reward distributions are used to calculate the probability of confidence with a softmax function. An inverse temperature, β , in the softmax function controls the degree of confidence by weighting the experience and non-reward distributions. If the inverse temperature is higher for D5KO mice than for wild-type mice, D5KO mice have stronger confidence in their own choices. This probability of confidence is multiplied by the experience distribution and scaled to one, yielding the probability density function of choice, which represents the time of the decision to nose poke into a hole. This function can be converted to a survival function, which is used for simulation purposes. These probabilities and distributions are illustrated in Fig. 6b. Detailed explanations of model derivation, parameter specification, estimation procedures, and simulation procedures are given in the Supplementary Information.
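The sequence of steps described above (learning-rate update, softmax confidence, scaled choice density, survival function) can be sketched generically as follows. This is not the authors' model, whose derivation is given only in the Supplementary Information: the delta-rule update, the two-option softmax form, the use of normal densities on a time grid, and every name in the code are assumptions made for illustration.

```python
import math


def normal_pdf(t, mean, sd):
    """Density of a normal distribution at time t."""
    z = (t - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def delta_update(value, observed, learning_rate):
    """Move a stored value toward an observation at the given learning rate."""
    return value + learning_rate * (observed - value)


def confidence(experience, non_reward, beta):
    """Two-option softmax weight on experience, in a numerically stable form."""
    z = beta * (experience - non_reward)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def choice_density(times, exp_params, non_reward_params, beta):
    """Confidence times experience density, scaled to integrate to one."""
    dt = times[1] - times[0]
    raw = []
    for t in times:
        x = normal_pdf(t, *exp_params)
        y = normal_pdf(t, *non_reward_params)
        raw.append(confidence(x, y, beta) * x)
    total = sum(raw) * dt
    return [r / total for r in raw]


def survival(density, dt):
    """Probability that no nose poke has occurred by each grid time."""
    remaining, cumulative = [], 0.0
    for d in density:
        cumulative += d * dt
        remaining.append(max(0.0, 1.0 - cumulative))
    return remaining
```

In this sketch a larger `beta` sharpens the transition of the confidence curve, which is the sense in which the inverse temperature is read as confidence in the text.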

The effects of acute duloxetine injection on impulsive action in mice.
To determine whether dopamine D 5 receptors play an important role in the enhancement of impulse control, we administered duloxetine (0, 0.3, 1.0, and 3.0 mg/kg) intraperitoneally to D5KO mice and their wildtype littermates 30 min before the 3-choice serial reaction time task session. We did not use higher doses of duloxetine (> 3.0 mg/kg) because higher doses induced sedation in our preliminary study. Drug treatments were carried out using a Latin square design and were administered with at least a 2-day interval between injections. During the testing phase of this study, the duration of the ITI was prolonged to 9 s (ITI9) because the mice made only a few (< 10) premature responses during the task using a 5-s interval (ITI5). Each testing session with ITI9 was conducted for 70 min or until 100 trials were completed, whichever came first, while sessions with ITI5 were conducted for 60 min or until 100 trials were completed, whichever came first. When the mice experienced 10 ITI5 sessions, they were habituated to ITI9 sessions 6 times with 2-day intervals.
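A Latin square assignment of the four doses can be sketched as below. The specific square used in the study is not stated, so the cyclic construction and the rule that maps a mouse to a row are assumptions, and the function names are hypothetical.

```python
def latin_square(treatments):
    """Cyclic Latin square: each treatment once per row and once per column."""
    n = len(treatments)
    return [
        [treatments[(row + col) % n] for col in range(n)]
        for row in range(n)
    ]


def dose_order(mouse_index, doses):
    """Assumed rule: mice cycle through the rows of the square in turn."""
    square = latin_square(doses)
    return square[mouse_index % len(doses)]


DOSES_MG_PER_KG = [0.0, 0.3, 1.0, 3.0]  # doses listed in the text
for mouse in range(4):
    print(mouse, dose_order(mouse, DOSES_MG_PER_KG))
```

Each mouse receives every dose once, and each dose appears once at every position in the injection sequence across a block of four mice.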
We used a three-factor mixed analysis of variance (ANOVA) with time as a within-subjects factor and genotype and sex as between-subjects factors. For the effect of genotype in the Y maze test, we used a two-factor ANOVA with genotype and sex as between-subjects factors. For the 3-CSRTT, each measure was analyzed separately by a three-factor mixed ANOVA with drug as a within-subjects factor and genotype and sex as between-subjects factors, except for the assessment of learning ability. If Mauchly's sphericity test was significant, a Greenhouse-Geisser correction was used. Multiple comparisons with Bonferroni's correction were also conducted in cases where the ANOVA revealed a significant main effect. All results except for the assessment of learning ability are presented as mean ± standard error of the mean (S.E.M.). The results were considered statistically significant when p < 0.05. SPSS (version 23.0) and GraphPad Prism (version 8.4.2) were used for statistical analyses.

Data availability
The datasets of this study are available from the corresponding author on reasonable request.

Code availability
The code used for model fitting and plotting of the q-learning assessment of the 3-choice serial reaction time task is available in a GitHub repository at https://github.com/Neuropharmacol/3csrtt_q_learning.