Prolonged abstinence from cocaine or morphine disrupts separable valuations during decision conflict

Neuroeconomic theories propose changes in decision making drive relapse in recovering drug addicts, resulting in continued drug use despite stated wishes not to. Such conflict is thought to arise from multiple valuation systems dependent on separable neural components, yet many neurobiology of addiction studies employ only simple tests of value. Here, we tested in mice how prolonged abstinence from different drugs affects behavior in a neuroeconomic foraging task that reveals multiple tests of value. Abstinence from repeated cocaine and morphine disrupts separable decision-making processes. Cocaine alters deliberation-like behavior prior to choosing a preferred though economically unfavorable offer, while morphine disrupts re-evaluations after rapid initial decisions. These findings suggest that different drugs have long-lasting effects precipitating distinct decision-making vulnerabilities. Our approach can guide future refinement of decision-making behavioral paradigms and highlights how grossly similar behavioral maladaptations may mask multiple underlying, parallel, and dissociable processes that treatments for addiction could potentially target.

corridor heading to the next restaurant). From this, we could capture the degree to which animals interrupted smooth offer zone passes with pause-and-look re-orientation behaviors, known as vicarious trial and error (VTE). 1 The physical hemming-and-hawing characteristic of VTE is best measured by calculating changes in velocity vectors of discrete body x and y positions over time as dx and dy. From this, we can calculate the momentary change in angle, Phi, as dPhi. When this metric is integrated over the duration of the pass through the offer zone, VTE is measured in the offer zone as the absolute integrated angular velocity, or IdPhi, until either a skip or enter decision was made. Reaction time in the offer-zone was also measured in this period.
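The IdPhi computation described above can be sketched as follows. This is a minimal illustration, not the authors' tracking code: it assumes uniformly sampled positions, uses plain finite differences without any smoothing, and the function name `idphi` is hypothetical.

```python
import math

def idphi(xs, ys):
    """Absolute integrated angular velocity (IdPhi) for one offer-zone pass.

    xs, ys: body positions sampled at a uniform rate (an assumption of this sketch).
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three matched x/y samples")
    # Heading (Phi) of each velocity vector (dx, dy); samples where the animal
    # did not move are skipped because their heading is undefined.
    phis = []
    for i in range(len(xs) - 1):
        dx = xs[i + 1] - xs[i]
        dy = ys[i + 1] - ys[i]
        if dx != 0 or dy != 0:
            phis.append(math.atan2(dy, dx))
    total = 0.0
    for p0, p1 in zip(phis, phis[1:]):
        dphi = p1 - p0
        # Wrap the change in angle into [-pi, pi) so a turn across the
        # +/-pi boundary is not counted as a near-full rotation.
        dphi = (dphi + math.pi) % (2 * math.pi) - math.pi
        total += abs(dphi)
    return total
```

A straight pass yields an IdPhi of zero, while a pass with pause-and-look reorientations accumulates a larger value.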
Reaction time to quit was also measured in the wait-zone, from countdown-tone onset until the mouse exited the wait-zone prematurely, before a pellet was earned. Post-earn consumption-and-lingering time was measured from pellet-delivery onset until the first exit from the wait-zone. In an earlier pilot study, cameras were placed in the wait-zone to observe lingering behaviors. After immediate pellet consumption, mice exhibited no unusual behaviors other than occasional grooming and checking the empty pellet receptacle for varying lengths of time before exiting and proceeding to the next restaurant.
Offer- and wait-zone thresholds were measured for each session by fitting sigmoid functions to zone choice outcomes as a function of offer delay, restaurant by restaurant. The inflection point and slope of each sigmoid fit were calculated. To calculate the value of the offer on any given trial, thresholds were re-calculated in a leave-one-out analysis excluding the current trial. Value was then defined as the wait-zone threshold minus the offer.
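A rough sketch of the threshold and value calculation follows. The fitting routine used in the study is not specified here, so this example substitutes a simple least-squares grid search over the sigmoid's inflection point and slope; the function names, grid step, and candidate slopes are all assumptions made for illustration.

```python
import math

def sigmoid(delay, threshold, slope):
    """Probability of entering/earning; falls from 1 toward 0 as delay exceeds threshold."""
    z = max(-60.0, min(60.0, slope * (delay - threshold)))  # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(z))

def fit_threshold(delays, choices, step=0.5, slopes=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Grid-search the inflection point (threshold) and slope by least squares.

    choices: 1 for enter/earn, 0 for skip/quit, for one restaurant in one session.
    """
    lo, hi = min(delays), max(delays)
    best = None
    for k in range(int(round((hi - lo) / step)) + 1):
        thr = lo + k * step
        for s in slopes:
            err = sum((c - sigmoid(d, thr, s)) ** 2 for d, c in zip(delays, choices))
            if best is None or err < best[0]:
                best = (err, thr, s)
    return best[1], best[2]

def leave_one_out_values(delays, choices):
    """Value of each trial's offer: threshold fitted without that trial, minus the offer."""
    delays, choices = list(delays), list(choices)
    values = []
    for i in range(len(delays)):
        thr, _ = fit_threshold(delays[:i] + delays[i + 1:], choices[:i] + choices[i + 1:])
        values.append(thr - delays[i])
    return values
```

Under this definition, positive values mark offers cheaper than the animal's threshold (economically favorable) and negative values mark offers above it.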
Economic conflict inefficiency (Fig. 2F,2N) was measured for both the offer-zone (Fig. 2F) and the wait-zone (Fig. 2N). This metric characterized how mice responded to an economically unfavorable offer (an offer where the delay was greater than the wait-zone threshold). For the offer-zone, the ratio of the probability of entering the wait-zone for offers above the wait-zone threshold relative to the probability of skipping them was calculated in each restaurant as a function of rank. Similarly, in the wait-zone, after mice had already accepted offers greater than the wait-zone threshold, we characterized how long it took an animal to quit such an offer. If mice took so long that the amount of time remaining when quitting was less than the wait-zone threshold, the quit was characterized as economically inefficient. The ratio of the probability of quitting these offers after the countdown had passed the wait-zone threshold relative to quitting before the countdown passed the wait-zone threshold was calculated in each restaurant as a function of rank.
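The two inefficiency ratios can be illustrated with a short sketch. The function names and the choice to return `None` when the denominator is empty are assumptions of this example; how the original analysis handled such cases is not stated.

```python
def offer_zone_inefficiency(offers, entered, threshold):
    """Ratio of enters to skips among offers above the wait-zone threshold."""
    unfavorable = [e for o, e in zip(offers, entered) if o > threshold]
    n_enter = sum(1 for e in unfavorable if e)
    n_skip = len(unfavorable) - n_enter
    # P(enter)/P(skip) shares a denominator, so it reduces to a ratio of counts.
    return n_enter / n_skip if n_skip else None

def wait_zone_inefficiency(quit_offers, time_remaining, threshold):
    """Ratio of late (inefficient) to early quits for accepted offers above threshold.

    quit_offers: offer delay on each quit trial; time_remaining: countdown left at quit.
    """
    late = early = 0
    for offer, remaining in zip(quit_offers, time_remaining):
        if offer <= threshold:
            continue  # only economically unfavorable acceptances are considered
        if remaining < threshold:
            late += 1   # countdown had already passed the threshold: inefficient quit
        else:
            early += 1
    return late / early if early else None
```

In both cases a larger ratio indicates more economically inefficient behavior in that restaurant.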
The analysis of changes in offer-zone VTE during economically unfavorable acceptances (taking offer-zone deals that are above the wait-zone threshold) could have been affected by unequal distributions of offers across trial types (e.g., skipping offers, entering offers above threshold, or entering offers below threshold). To control for this possibility, we generated simulated shuffled data sets of reaction time and VTE, for both skipping and entering offers below threshold, that matched the trial-by-trial distributions of offer lengths in the subset of trials where mice entered offers above threshold. In Fig. 2J-K and Fig. 3C-D, this ensures that any changes seen in offer-zone behaviors, particularly when entering economically favorable vs. unfavorable offers, are not skewed by differences in the distribution of trials of different offer lengths (Supplementary Fig. 11, Supplemental Discussion).
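One way to picture the offer-matched control is the resampling sketch below. It is an interpretation of the description, not the authors' procedure: the function name, the number of shuffles, the fixed seed, and the decision to drop target offers that have no matching control trial are all assumptions.

```python
import random
from collections import defaultdict

def offer_matched_mean(target_offers, control_offers, control_values,
                       n_shuffles=1000, seed=0):
    """Mean of a control measure (e.g., reaction time or IdPhi), resampled so the
    control trials follow the same offer-length distribution as the target trials."""
    rng = random.Random(seed)
    by_offer = defaultdict(list)
    for offer, value in zip(control_offers, control_values):
        by_offer[offer].append(value)
    # Keep only target offers that have at least one control trial of equal length.
    usable = [o for o in target_offers if o in by_offer]
    if not usable:
        raise ValueError("no control trials match the target offer lengths")
    means = []
    for _ in range(n_shuffles):
        draw = [rng.choice(by_offer[o]) for o in usable]
        means.append(sum(draw) / len(draw))
    return sum(means) / len(means)
```

Here `target_offers` would be the offer lengths of trials where mice entered offers above threshold, and the control trials would be skips or below-threshold enters.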

Drug exposure regimen and locomotor sensitization
This drug treatment regimen is a simple, straightforward, yet powerful means of producing robust and long-lasting behavioral and neurobiological changes linked to aspects of addiction such as incentive sensitization and neural plasticity in the mesocorticolimbic dopamine system. [2][3][4] By looking at a time point during prolonged abstinence, we intended to characterize changes that may reflect the life-long decision-making problems seen in recovering addicts. Long-lasting forms of neurobiological plasticity are observed at these prolonged abstinence time points, coinciding with and causally linked to escalation of craving. Such plasticity measurements predict relapse susceptibility in human addicts. 5 Injections took place in the evening, 4 hours post-Restaurant Row testing. Our goal was to expose animals to drugs of abuse outside of testing hours, to be especially sure that drug had cleared the animals' system before the next day's behavior. Furthermore, we wanted to avoid the effects of acute withdrawal on each day of Restaurant Row testing during the drug exposure phase. Repeated Restaurant Row testing during the drug exposure phase was not intended to capture instances when drug is on board, nor was it intended to compare changes between first and subsequent drug exposures, nor was it intended to analyze the effects of immediate cessation of repeated drug administration on decision-making. Instead, the goal was to interrogate decision-making after prolonged abstinence. Repeated Restaurant Row testing during the drug exposure phase and early abstinence was mainly intended to (1) ensure the animals did not unlearn the task day to day, and (2) maintain regular self-earned food-intake amounts contingent upon task performance rather than giving the animals non-contingent food or days off.
In the evening at the time of each drug injection, mice were placed in large locomotion monitoring boxes with tracking cameras fixed above automatically measuring distance traveled using AnyMaze software (Stoelting). Mice were placed in the boxes for 20min before being injected intraperitoneally with saline and then monitored for 90min post-injections. Then mice were divided into three groups: saline (n=10), cocaine (n=10), and morphine (n=10). One mouse out of the original 31 was excluded because it never learned the task. Mice were then injected with their respective treatment for 12 consecutive nights while being tested in Restaurant Row regularly. For the drug groups, mice were given lower doses (15mg/kg cocaine, 10mg/kg morphine) on the first and last nights and received repeated higher doses (30mg/kg cocaine, 20mg/kg morphine) on the intermediate 10 nights. Three mice were lost during the drug phase in the cocaine group and were excluded from analyses. Mice were then put through a forced abstinence period for 2 weeks while regularly being tested in Restaurant Row.
In addition to the prolonged abstinence timepoint that is the main focus of the drug paradigm, we also gave animals an acute drug challenge at the end of the ~2 weeks of abstinence. This was intended to probe responsivity to a drug prime and to assess the degree of locomotor sensitization that typically incubates over prolonged abstinence and can be expressed upon drug re-exposure. Locomotor sensitization was measured as the psychomotor response immediately following drug injection at this timepoint compared to the psychomotor response immediately following drug injection on the 12th evening of the repeated drug exposure sequence. On 3 randomly chosen evenings before this acute drug challenge, we again injected mice with saline to acclimate the animals to the stress of injections in preparation for the forthcoming drug re-exposure challenge.
Mice were challenged in the evening with a single low dose of drug (the same dose as on the 1st and 12th nights of the repeated drug exposure sequence), being re-exposed to the same drug administered previously. Saline mice were divided into two groups of n=5 to receive a low dose of either cocaine or morphine for the first time, acutely. Despite the small sample size, this split was done to verify that sensitized locomotion in response to a single dose was present only in animals with a history of repeated drug exposure. This comparison was statistically significant even with samples of n=5. This replicates work from our lab and numerous others. [2][3][4] Regardless, the primary analyses (comparing baseline to prolonged abstinence) occurred before the saline group was split, and statistics were done with the complete saline group as control.
Following the acute drug-re-exposure challenge, Restaurant Row was tested regularly during the day for an additional 2-3 weeks. Because there were no lasting drug effects on any animal behavior in the formerly saline animals after the acute drug-re-exposure challenge session which took place ~20 days before the pre-feeding probe sessions (described below), this group served as control conditions for the pre-feeding probe sessions.

Devaluation/Invigoration Pre-feeding Probe Sessions
The pre-feeding probe sessions were performed at the end of the experiment and were intended to elucidate whether rapid decisions or snap-judgments were flexible or inflexible processes. Devaluation probes are often used to differentiate goal-oriented (flexible and thus sensitive to devaluation) and habitual (inflexible and thus insensitive to devaluation) decision processes. [6][7][8][9][10] The devaluation probe in our task allowed us to rule out habitual processes. There was no further testing after the pre-feeding probes, as the experiment ended and all mice were retired.
Mice were pre-fed 30-60min before testing in an amount equivalent to what they typically earned in their most-preferred restaurant. Since each animal showed individual revealed preferences (i.e. different animals like different flavors best), we fed each animal its most-preferred flavor on one day and its least-preferred on the next. Since some animals received their most-preferred flavor on the first-day of pre-feeding while others received their least-preferred flavor on the first-day of pre-feeding (randomly selected and counter-balanced), day two of pre-feeding flipped this assignment. There were no order effects and no lasting body weight changes on day one versus day two of pre-feeding, so we pooled together the first and second day of pre-feeding to look at group differences between being fed one's most-preferred flavor versus least-preferred flavor.
Because all groups still showed sensitivity to the pre-feeding probe (although with intricate fine-grained differences between groups described in the Supplementary Discussion), we determined that the decision processes in Restaurant Row remained flexible and had not transitioned to habit-like processes.

Vicarious trial and error
A key to interpreting parallel competing valuations in our task during decision conflict between forward-looking planning and immediate desire-driven responding is the presence or absence of a critical behavioral metric, vicarious trial and error (VTE), which has been studied extensively in a series of proof-of-principle publications. 1 We know that VTE is a sign of deliberation, but VTE has not yet been measured in an addiction model.
In 2007, Johnson and Redish discovered that during VTE, hippocampal representations swept forward along the path of the animal, alternating between potential goals. 11 This key result has been replicated several times. We know that these sequences align to hippocampal theta cycles. 12 That is, they are theta sequences. However, the sequences during VTE sweep farther than during normal navigation. 13 The sequences proceed all the way to the goal. 12 If an animal is going to run past one goal to another one, the sequences run farther to the second goal. 14 They reflect indecision in the animal. An animal that knows where to go does not show VTE and the sequences only sweep forward to the goal the animal is actually going to go to. 11,[15][16] Furthermore, neurophysiologically, during VTE, reward-related representations appear in the nucleus accumbens (ventral striatum) [17][18] and in the orbitofrontal cortex. 19 Both of these results have been replicated. 20 These data suggest that there is an evaluation going along with the prediction in hippocampus. Neurophysiologically, we know that there is a triple dissociation between hippocampus (sweeps during VTE), ventral striatum (reward representations during VTE), and dorsal striatum (no extra activity during VTE, but slowly learned situation-action pairs). 21 As animals develop regular paths and VTE goes away, the dorsal striatum develops task-bracketing wherein activity appears at the start of the ballistic journey. 22 This result has been replicated. 23 In both of these papers, VTE is negatively correlated with the striatal task-bracketing.
Behaviorally, VTE occurs during times when the animal knows the structure of the world, but does not know what to do in it. VTE occurs when the animal is indecisive about goals and when contingencies change. 19,[23][24][25] Manipulations that force flexibility in tasks lead to an increase in VTE, while manipulations that force regularity in paths lead to a decrease in VTE. 26 Finally, on tasks able to differentiate decisions that require planning (sometimes called model-based) from decisions that reflect cached values (sometimes called model-free), VTE occurs when the decisions show planning (model-based) and disappears when the decisions reflect cached values (model-free). 24,[26][27] In this task, we can take VTE as a sign of indecision and deliberation, and a lack of VTE as a sign of quick, decisive decisions (snap-judgments). In this task, we can reliably detect the difference between VTE and rapid (snap) judgments. Furthermore, we found that when VTE events took place, they did so with delayed onset, overriding initial snap judgments in the offer-zone that would otherwise have violated normative economic behavior. This form of delayed, deliberative, VTE-containing override decision prevented economic violations from occurring, importantly only when skipping, and could serve as a behavioral operationalization of "knowing better" or "should not" judgments. Sometimes, when this slower deliberative VTE process failed to come online, mice accepted expensive offers only to later reverse that initial rapid commitment by quitting in the wait-zone. This indicated that a re-evaluation process can also occur in the wait-zone. Override processes in both the offer-zone and the wait-zone took longer in more-preferred restaurants, capturing an increasingly stronger desire component of these parallel computational processes.

Sub-optimality
Theories of foraging behavior are rooted in hypotheses of optimizing time allocation in order to maximize reward rate. 28 In Restaurant Row, all flavored pellets are of equal caloric value, and thus any differences in reinforcement rate as a function of cost between flavors must be taken as reflecting an underlying subjective valuation. Mice demonstrated a large variability in subjective flavor preferences from which we found interesting asymmetries and interactions with multiple valuation processes measurable on this task.
If we take into account individual differences in subjective preferences of willingness to wait for rewards (wait-zone thresholds), we can still determine a measure of sub-optimality, normalized to each animal's idiosyncratic preference for each flavor. In order to calculate maximum number of rewards a mouse could earn in each restaurant taking into account subjective flavor preferences, we simulated Restaurant Row sessions yet eliminated wasteful behaviors. To this end, in this model, we forced offer-zone thresholds to match wait-zone thresholds, thus eliminating all quit events. Furthermore, we eliminated differences in offer-zone deliberation time and post-earn lingering time between flavors (by using minimum deliberation time and minimum consumption time collapsed across all restaurants based on each animal's performance). We also used minimum transit time between restaurants based on each animal. These are the times the animal could have used if the only difference between decisions was the underlying willingness-to-wait thresholds between the flavors.
We found that mice overall were sub-optimal on this metric, even after taking into account individual differences in subjective flavor preferences and that prolonged abstinence from repeated drug exposure did not influence this metric ( Supplementary Fig. 8D).
We also found that the degree of sub-optimality interacted with flavor ranking. That is, mice were more sub-optimal in less-preferred restaurants. This is likely due to the disproportionate excess amount of time spent in the offer-zone, wait-zone, and lingering in more-preferred restaurants. This disproportionate excess time, which was removed in our optimal-performance model, would, when re-allocated optimally, lead the model to predict disproportionately higher earnings than actual in less-preferred restaurants. This is due to the combination of excess time available, lower thresholds in those restaurants, and the greater likelihood of our model encountering low-cost offers in those restaurants that can be earned and that would not otherwise have been encountered. Thus, this yielded higher predicted than actual reinforcement rates in less-preferred restaurants.

Drug-related effects
Importantly, our decision-making tests are made during times when cocaine and morphine are not on board, and we show that drug exposure after the drug has cleared the animal's system does not have any persistent effects on locomotor activity or appetite that could confound our interpretations of our decision-making tests ( Supplementary Fig. 7).
Acute locomotor and appetite changes are typical effects when drug is on board and could confound behavioral performance on many tasks. The half-life of cocaine is ~1hr and morphine is ~2hr. 29 We tested mice on our task 10 hours after each drug injection (which took place 4 hours post-testing on our task) and well into prolonged abstinence for 2 weeks where we observed our decision-making conflict changes.
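As a back-of-the-envelope check on clearance, the half-lives quoted above can be plugged into a simple first-order elimination model. This is only an approximation assumed for illustration (single compartment, no active metabolites), and the function name is hypothetical.

```python
def fraction_remaining(hours_elapsed, half_life_hours):
    """Fraction of the initial drug level left, assuming first-order elimination."""
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (hours_elapsed / half_life_hours)

# Using the approximate half-lives and the 10-hour interval stated in the text.
cocaine_left = fraction_remaining(10, 1.0)   # ten half-lives: 1/1024, about 0.1%
morphine_left = fraction_remaining(10, 2.0)  # five half-lives: 1/32, about 3.1%
```

Under these assumptions, only a small fraction of either drug would remain at the time of testing.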
We used the following metrics to test for off-target effects of chronic drug: speed of locomotion on the task, number of completed laps, total amount of food earned, and total weight gained. We found no differences in any of these metrics between controls and drug-treated mice (or within individuals) across the entire experiment. This lack of change rules out off-target effects on locomotion or appetite as possible confounding factors for our observed changes in decision-making metrics, including VTE (Supplementary Fig. 7).
Furthermore, our effects of drug on decision-making persist 2 weeks after chronic drug exposure, at a time point when long-lasting circuit changes in decision-making-related brain areas including the prefrontal cortex, nucleus accumbens, and hippocampus are known to develop and when psychomotor sensitization is expressed, a hallmark and behavioral correlate of repeated drug-induced incubation of plasticity changes that has been replicated numerous times. [2][3][4][30][31][32][33][34][35] Our repeated drug exposure regimen did induce psychomotor sensitization, measured in the 90-minute window following drug administration and expressed after prolonged abstinence during a drug challenge (Supplementary Fig. 7).
Long-lasting changes in decision-making conflict were observed only after repeated drug exposure, not after acute one-time drug exposure (Supplementary Fig. 9). We examined behavior during the drug-exposure phase (Fig 1A, cyan timepoint 1), during early abstinence (Fig 1A, cyan timepoint 2), and following the acute drug-re-exposure challenge after prolonged abstinence (Fig 1A, cyan timepoint 3). The main timepoint of interest was after prolonged abstinence from repeated drug use, a timepoint at which psychomotor sensitization is typically expressed, at which neural plasticity in defined circuits develops, and at which recovering addicts struggle to make good decisions before relapsing. [2][3][4][30][31][32][33][34][35] Psychomotor sensitization seen after repeated drug exposure has been shown to be a behavioral correlate of drug-induced neural plasticity in specific mesolimbic and striatal circuits. That is, animals that show heightened locomotor responses to drug injections following repeated administration and incubated over prolonged abstinence show drug-induced circuit plasticity, while animals that do not show heightened locomotor responses do not exhibit neural plasticity. 36 Nonetheless, we present additional data during the drug exposure phase, early abstinence, and following the drug-re-exposure challenge, primarily intended to express the degree of psychomotor sensitization incubated throughout prolonged abstinence (Supplementary Fig. 9). We found no decision-making changes in Restaurant Row during the drug-exposure phase in offer-zone deliberation behaviors (Supplementary Fig. 9A-B, enters comparison, non-significant, Kolmogorov-Smirnov tests, P>0.05), nor between the first and last (12th) injection during the drug exposure phase in thresholds (Supplementary Fig. 9C, wait-zone across time, non-significant, Friedman, P>0.05), nor in post-earn lingering time (Supplementary Fig. 9D, lingering time across time, non-significant, Friedman, P>0.05).
Looking at the early abstinence time point, we found no changes in offer-zone deliberation behaviors ( Supplementary Fig. 9E-F, enters comparison, non-significant, Kolmogorov-Smirnov tests, P>0.05), nor between baseline and early abstinence in thresholds ( Supplementary Fig. 9G, wait-zone across time, non-significant, Friedman, P>0.05), nor in post-earn lingering time ( Supplementary Fig. 9H, lingering time across time, non-significant, Friedman, P>0.05).
Looking immediately following the drug-re-exposure challenge after prolonged abstinence, we only saw the persisting difference in the cocaine group (Supplementary Fig. 9I-J, enters comparison, cocaine group only, significant, Kolmogorov-Smirnov tests, *P<0.05, see Fig. 3C-D for comparison). Interestingly, only in mice with a history of repeated drug exposure, and not in formerly saline-treated mice experiencing drug for the first time at the time of the drug challenge, we saw an increase in wait-zone thresholds immediately before and after the drug-re-exposure challenge (Supplementary Fig. 9K, wait-zone across time, cocaine and morphine, Friedman, *P<0.05). Interestingly, in all mice following the drug challenge, we found an increase in post-earn lingering time (Supplementary Fig. 9L, lingering time across time, all mice, Friedman, *P<0.05).
Taken together, this suggests that the decision-making changes reported in the main text seen in mice with a history of repeated cocaine and morphine exposure were apparent only after prolonged abstinence and not after a single drug exposure. Interestingly, all mice appeared to increase lingering time regardless of history of drug use following an acute exposure to drug (Supplementary Fig. 9L). This suggests that hedonic valuations of non-drug rewards can be enhanced during acute withdrawal from drug. An acute drug-re-exposure challenge has been shown in the literature to precipitate reinstatement of drug-seeking behavior as a model of provoking relapse, as well as to induce neural plasticity changes distinct from prolonged-abstinence-induced plasticity. [2][3][4][30][31][32][33][34][35] While the main focus of this manuscript was not to actually induce relapse, but rather to model decision-making changes just before relapse after prolonged abstinence, it is interesting that drug re-exposure after prolonged abstinence caused changes in wait-zone thresholds only in mice with a history of repeated drug exposure and not in first-time users (saline-pre-treated mice). This sets the stage for future studies to examine more closely decision-making changes at a secondary timepoint after relapse.

Devaluation
Referring to cyan timepoint 4 in Fig. 1A and Supplementary Fig. 10, pre-feeding has been shown to change reward seeking behaviors depending on factors including amount pre-fed, instrumental action being assessed, and reward-selective versus reward-nonselective modulation. 6-10 Pre-feeding-induced devaluation of reward-seeking behaviors has been widely used as a way to probe if behaviors are inflexible, stimulus-response-driven, and thus habit-like versus flexible, response-outcome-driven, and thus goal-directed. 6-10 These two potential responses to a devaluation manipulation such as pre-feeding have been shown to separate behaviors that are differentially driven by separable neural circuits.
We pre-fed mice either their least- or most-preferred flavors in an amount that did not disrupt the typical number of laps run or pellets earned (Supplementary Fig. 10A-C, C: Friedman, non-significant, P>0.05). Body weight did significantly increase following pre-feeding but before testing, yet was normalized by the next day (Supplementary Fig. 10D, before and after feedings, Friedman, significant, *P<0.05, before feeding across days, Friedman, non-significant, P>0.05).
Wait-zone thresholds were devalued (decreased) in saline and cocaine mice, while the thresholds of morphine mice did not change (Supplementary Fig. 10F, Sign test, *P<0.05). Only when pre-fed their most-preferred flavor were saline mice devalued in the offer-zone as well (Supplementary Fig. 10E, Sign test, *P<0.05). Offer-zone thresholds of cocaine mice interestingly increased, suggesting pre-feeding for these animals carried an invigorating-like food-prime component on this aspect of behavior (Supplementary Fig. 10E, Sign test, *P<0.05).
In the offer-zone, deliberation time and VTE when skipping or accepting offers below threshold (economically favorable) were unaltered; however, saline mice accepted offers above threshold (economically unfavorable) more slowly when pre-fed their most-preferred flavor (Supplementary Fig. 10G-H, Sign test, *P<0.05), suggesting a shift in the balance of valuation functions. Entering offers above threshold, however, just as before, took place after little VTE with no further pre-feeding-induced changes, indicating these events were still snap-judgments (did not involve deliberating about correct alternatives, Supplementary Fig. 10H, Sign test, P>0.05). Morphine mice responded just as saline mice did, while cocaine mice displayed no changes on this metric (Supplementary Fig. 10G-H, Sign test, *P<0.05).
Finally, although lingering remained unchanged in saline-treated mice, morphine-abstinent mice showed invigorated (increased) lingering while cocaine-abstinent mice showed the opposite ( Supplementary Fig. 10J, Sign test, *P<0.05). Additionally, cocaine-abstinent mice displayed less time spent waiting before quitting ( Supplementary Fig. 10I, Sign test, *P<0.05). Taken together, pre-feeding revealed changes in dissociable valuation algorithms that were blunted or enhanced based on drug history.
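For readers unfamiliar with the sign test cited throughout this section, a minimal two-sided exact version for paired measurements (for example, a threshold on a regular day versus a pre-fed day) might look like the sketch below. The pairing, the dropping of ties, and the function name are assumptions of this example; the exact test configuration used in the analyses is not described here.

```python
from math import comb

def sign_test(before, after):
    """Two-sided exact sign test p-value for paired samples; ties are dropped."""
    pos = sum(1 for b, a in zip(before, after) if a > b)
    neg = sum(1 for b, a in zip(before, after) if a < b)
    n = pos + neg
    if n == 0:
        return 1.0  # every pair tied: no evidence of a directional change
    k = min(pos, neg)
    # Binomial(n, 0.5) probability of a split at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With eight animals that all change in the same direction, this gives p = 2/256, below the 0.05 criterion used in the text.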
Devaluation experiments can modify the incentive value of instrumental actions and reveal specific encoding of emotional states or craving underlying goal-oriented behavior. [28][29][30][31] In appetitive tasks, pre-feeding is one way to accomplish this. Taking advantage of the subjective value properties of rewards and different zones, we found that pre-feeding decreased wait-zone thresholds (indicating devaluation), consistent with satiety effects on incentive processes. 29 These effects were not flavor-specific and seemed to affect appetitive reward-taking valuation processes in general. However, only when pre-feeding most-preferred flavors did offer-zone thresholds also decrease. This highlights not only a flavor-specific satiety effect consistent with past reports 6-10 but also a subjective value-specific capacity to modify motivational states unique to choose-between decisions involving highly wanted rewards. Pre-feeding seemed to induce invigoration-like effects in drug-treated mice that were absent in saline-treated mice. In morphine-abstinent mice, we found increased conditioned-place-preference (CPP)-like lingering, which may reflect enhanced craving and explain why their wait-zone thresholds, which were generally insensitive to change, paradoxically opposed satiety-induced devaluation. In contrast, cocaine-abstinent mice, while sensitive to wait-zone threshold devaluation, paradoxically displayed increased offer-zone thresholds. That is, cocaine-abstinent mice were food-primed to over-value offers in the offer-zone that were exaggeratedly under-valued in the wait-zone. Thus, the hypothesis that cocaine-abstinent mice may be transitioning into a lower value state once in the wait-zone may explain why they were more likely to quit, quit faster, and spent less time lingering, suggesting that the experienced value of accepted rewards was less than expected.
Pre-feeding was not intended to assess drug effects but rather to assess decision flexibility and rule out habitual processes. Because there were no lasting drug effects on any behavior in the formerly saline animals after the acute drug-re-exposure challenge session, which took place 20 days before the pre-feeding probe sessions, this group served as the control condition for the pre-feeding probe. Again, because all groups still showed sensitivity to the pre-feeding probe (although with intricate fine-grained differences between groups), we determined that the decision processes in Restaurant Row remained flexible and had not transitioned to habit-like processes.
Fig. 1D,1F for agreement that the most-preferred restaurants yielded the highest offer-zone thresholds, wait-zone thresholds, and lingering time). Shaded error region displays 95% CI. N=31.
Fig. 2L), indicating that the majority of quits occurred after mice had taken offers greater than their typical threshold (i.e. economically unfavorable), and the quit was a form of self-correction.
We ran single trial simulation analyses during baseline (C) or prolonged abstinence (D) to control for unequal distributions of offers based on trial type (skipping a bad deal, entering a bad deal then quitting, or entering a good deal) that could confound interpretations of offer-zone behaviors when making initial enter or skip decisions. We generated simulated shuffled data sets of both skipping a bad deal and entering a good deal then earning, matching the same trial-by-trial distributions of offer lengths as those subsets of trials where mice entered a bad deal then quit. That is, simulations were performed by using the offer length distributions that belong to the enter-bad-deal scenario and then averaging only those offer-zone reaction times that matched this offer distribution where the outcomes were instead skips (for the skip simulation) or enter-good-deals (for the enter simulation).
After running these analyses on baseline days 66-70, we did not see any significant differences in any treatment group in our comparison of interest (how mice deliberated before accepting bad deals in the offer-zone): entering bad offers that led to quits versus the shuffled control of entering offers that led to earns, simulated to match the offer distribution of entering bad deals before quitting (P>0.05). This comparison of interest did differ, even when matched against simulated shuffled data sets, only in the cocaine group after prolonged abstinence (*P<0.05). Thus, offer-zone behavior when entering bad deals looks like entering good deals (both are rapid snap judgments even if the former is a mistake) for all mice at both time points except the cocaine-treated animals at the prolonged abstinence time point. Error bars: ± 1 SEM. N per group listed on respective plots.