Abstract
Trial history biases and lapses are two of the most common suboptimalities observed during perceptual decision-making. These suboptimalities are routinely assumed to arise from distinct processes. However, previous work has suggested that they covary in their prevalence and that their proposed neural substrates overlap. Here we demonstrate that during decision-making, history biases and apparent lapses can both arise from a common cognitive process that is optimal under the mistaken belief that the world is changing, i.e., nonstationary. This corresponds to an accumulation-to-bound model with history-dependent updates to the initial state of the accumulator. We test our model’s predictions about the relative prevalence of history biases and lapses, and show that they are robustly borne out in two distinct decision-making datasets of male rats, including data from a novel reaction time task. Our model improves the ability to precisely predict decision-making dynamics within and across trials, by positing a process through which agents can generate quasi-stochastic choices.
Introduction
It has long been known that experienced perceptual decision makers deviate from the predictions of optimal decision theory, displaying several suboptimalities in their decision-making. Among the most pervasive of these is the dependence of behavior on the recent history of observed stimuli, performed actions, or experienced outcomes, despite it being disadvantageous and leading to worse performance^{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18} (schematized in Fig. 1a top). History biases may arise from a strategy that is optimized for naturalistic settings, where continual learning of priors, action values, or other decision variables helps agents adapt to changing environments, but that is maladaptive in experimental settings where the statistics of the environment are stationary^{19,20}. To date, decision-theoretic models have accommodated history biases by modeling them as a biasing factor on the perceptual evidence that drives choices^{3,12,13,21,22,23,24,25,26}. In the predominant conceptualization of these models, history biases can be overcome with sufficient perceptual evidence.
A second widely recognized but less studied suboptimality is the tendency to “lapse”, i.e., to make (asymptotic) errors that are immune to strong evidence^{3,4,11,27,28,29,30,31,32,33} (schematized in Fig. 1a bottom). Because lapses appear to be evidence-independent, they are assumed to arise from nuisance mechanisms that are separate from the perceptual decision-making process, and are often imputed to ad hoc noise sources such as inattention or motor error.
However, several recent results suggest that these two suboptimalities may be linked in their origin. In primates, learning reduces dependence on recent trial history^{2} as well as lapse probabilities^{28}. Intriguingly, mice trained on a visual detection task showed higher levels of history dependence on sessions with higher lapse probabilities^{3}. Moreover, lapses occur in runs (i.e., display Markov dependencies), rather than occurring with the traditionally assumed independent probabilities across trials^{34}. Furthermore, lapses have been proposed to reflect forms of exploration^{32} that are sensitive to trial-by-trial updates of variables such as action value. Likewise, neural perturbations of secondary motor cortex and striatum in rodents have been shown to substantially impact both lapses^{32,35,36,37,38,39} and trial-history influences on decisions^{39,40}. Together, these observations challenge the assumption that history biases and lapses have independent causes and raise the possibility that some of the variance ascribed to lapses emerges from history dependence.
In this work, we explore the idea that history biases reflect a misbelief about nonstationarity in the world, and demonstrate that normative decision-making under such beliefs gives rise to choices that are both history-dependent and appear to be evidence-independent (i.e., akin to lapses). This corresponds to an accumulation-to-bound process with a history-dependent initial state. We fit this model to a large dataset of choices made by 152 rats trained on an auditory decision-making task. Despite heterogeneity in history biases and lapse rates in this population, we show that a substantial fraction of lapses can be explained by the presence of history dependence during evidence accumulation. Further, our model predicts the time it takes to make decisions. We test these predictions in a novel rat task with reaction time reports, and show that the model captures patterns of choices, reaction times, and their history dependence. This model significantly improves our ability to predict the temporal dynamics of decision variables within and across trials in perceptual decision-making tasks, rendering predictable choices that were previously thought to be stochastic.
Results
A common mechanism produces history biases and apparent lapses
It is often assumed that well-trained subjects in two-alternative forced choice (2AFC) tasks have faithfully learnt the likelihood function and priors that determine the structure of the task^{23,41}. Under this assumption, the optimal decision-making strategy entails combining any knowledge about the prior prevalence of available options with the stream of incoming evidence until a desired threshold of confidence is reached in favor of one of the options^{41,42,43} (Fig. 1b top). This strategy converges to a drift-diffusion model (DDM) when evidence is sampled continuously^{23}. In a DDM, one’s belief about the correct option maps onto a diffusing particle that drifts between two boundaries, where the first boundary the particle crosses determines the decision (Fig. 1b). Correspondingly, the initial state of this particle encodes the prior belief, and the drift rate is set by the likelihood of incoming evidence (Fig. 1b). We refer to the evolving state of the particle in this model as ‘accumulated evidence’.
However, in general, subjects may not know that the task structure is stationary, and might incorrectly assume that it is constantly changing^{19}. In this case, even experienced subjects would not converge to a static estimate of prior probabilities and likelihood functions, but would instead continually update them from trial to trial. Here we consider choice behavior that results from nonstationary beliefs about priors, which result in trial-to-trial updates to the initial accumulator states. Although initial state updating is common to nonstationary beliefs in priors, likelihoods and reward functions, updates to the latter two additionally require drift rate updates (for a treatment of nonstationary likelihood functions, which yield variability in drift rate, see^{14,44}).
We assume that the initial state of the accumulator (I) is set based on the exponentially filtered history of choices and outcomes on past trials. Each unique choice-outcome pair (denoted by h; Fig. 1c) is tracked by its own exponential filter (i^{h}). On each trial n, each filter i^{h} decays by a factor of β^{h} and is incremented by a factor of η^{h} depending on the choice-outcome pair on the previous trial:
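In the notation just defined (a reconstruction written out from the decay and increment factors described in the text), this per-filter update is:

```latex
i^{h}_{n} \;=\; \beta^{h}\, i^{h}_{n-1} \;+\; \eta^{h}\, \mathbb{1}^{h}(o_{n-1}), \qquad h \in \{R_w, L_w, R_l, L_l\}
```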
{Rw, Lw, Rl, Ll} represent the possible choice-outcome pairs: right-win, left-win, right-loss, and left-loss respectively. o_{n−1} is the choice-outcome pair observed on trial (n−1) and 1^{h}(o_{n−1}) is an indicator function that is 1 when o_{n−1} = h and 0 otherwise. The initial state of accumulation, I, on trial n is given by the sum of these individual exponential filters:
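In the same notation, this sum over the four filters reads:

```latex
I_{n} \;=\; \sum_{h \,\in\, \{R_w, L_w, R_l, L_l\}} i^{h}_{n}
```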
Such a filter can approximate optimal updating strategies under a variety of nonstationary beliefs. As an example, we show that this exponential filter can successfully approximate initial state updates during Bayesian learning of priors under the belief that the prior probabilities of the two hypotheses can undergo unsignaled jumps^{5,19} (Supplementary Fig. 1). Nevertheless, we use this more flexible parameterization to allow for asymmetric learning from different choices and outcomes, which could be beneficial under generative models where one believes that one category persists for longer than another (requiring different decay rates), or that correct and incorrect outcomes are not equally informative (requiring different update magnitudes). For instance, in a prior-tracking experiment where previous correct choices had a cumulative effect but errors had a resetting effect^{13}, this could be captured in the exponential filter by faster decay rates for errors.
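The filter described above is straightforward to implement. The following sketch uses illustrative parameter values and an invented trial sequence (none of these numbers come from the fits reported here); it shows a win-stay-like setting with faster-decaying, switch-like updates after losses:

```python
# Sketch of the exponential history filter described in the text; parameter
# values and the trial sequence are illustrative assumptions, not fits.
PAIRS = ["Rw", "Lw", "Rl", "Ll"]  # right-win, left-win, right-loss, left-loss

def update_filters(i, outcome, beta, eta):
    """Decay every filter by beta[h]; increment only the filter matching
    the previous trial's choice-outcome pair by eta[h]."""
    return {h: beta[h] * i[h] + (eta[h] if h == outcome else 0.0) for h in PAIRS}

def initial_state(i):
    """The initial accumulator state I_n is the sum of the four filters."""
    return sum(i.values())

# Illustrative parameters: rightward push after right wins, leftward after
# left wins (win-stay), with smaller, faster-decaying updates after losses.
beta = {"Rw": 0.7, "Lw": 0.7, "Rl": 0.3, "Ll": 0.3}
eta = {"Rw": 0.5, "Lw": -0.5, "Rl": -0.2, "Ll": 0.2}

i = {h: 0.0 for h in PAIRS}
for outcome in ["Rw", "Rw", "Lw", "Rl"]:  # hypothetical run of trials
    i = update_filters(i, outcome, beta, eta)
print(round(initial_state(i), 4))  # → -0.1335
```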
What are the consequences of such trial-by-trial updating of initial accumulator states for choice behavior? In a DDM, for a given initial state I and drift rate μ, the probability of choosing the option corresponding to bound B+ is given by:
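The expression referenced here can be reconstructed from the standard first-passage result for a diffusion with drift μ, diffusion coefficient σ², absorbing bounds at ±B, and starting point I; up to the exact scaling convention used, it is consistent with the surrounding text (it reduces to a logistic in μB/σ² when I = 0):

```latex
P(\text{choose } B^{+}) \;=\; \frac{1 - e^{-2\mu (I + B)/\sigma^{2}}}{1 - e^{-4\mu B/\sigma^{2}}}
```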
where B is the magnitude of the bound and σ^{2} is the squared diffusion coefficient (derived from Palmer et al.^{45}). The resultant psychometric curves for different values of the initial accumulator state are plotted in Fig. 1d. This expression reduces to a logistic function of μB/σ^{2} only when I = 0. Small deviations in the initial state largely resemble additive biases to the total evidence, shifting psychometric curves horizontally towards the option favored by the initial state. This corresponds to a change in the psychometric threshold, i.e., the x-axis value at its inflection point (Fig. 1d, lighter colors). Note that our use of the word “threshold” follows Wichmann & Hill^{27}, referring to the x-axis value at the inflection point, whereas we refer to the slope at this inflection point as “sensitivity”. Interestingly, large deviations in the initial state produce qualitatively different effects on choices (Fig. 1d, darker colors). They not only bias choices towards the option consistent with the initial state but additionally reduce the effective sensitivity to evidence. This can be seen as a reduction in slope at the inflection point of the psychometric curve (Fig. 1d, dashed lines) in addition to a change in threshold. Therefore, trial-to-trial deviations in the initial state produce history-biased choices with varying degrees of diminished dependence on the evidence.
The average choice behavior obtained by pooling choices with different history-biased initial states is a mixture of psychometric curves with varying thresholds and sensitivities to perceptual evidence. Such a psychometric curve is heavy-tailed^{46,47} and appears to have asymptotic errors or “lapse rates” (Fig. 1e, black curve). These asymptotic errors are not truly evidence-independent, random decisions or true lapses; rather, they are “apparent lapses” arising from evidence accumulation with deterministic history-based updates to the initial accumulator state. Importantly, these apparent lapses contribute to lapse rates when heavy-tailed psychometric curves are approximated by a logistic function. However, this approximation is bound to be inadequate if measurements were made at even higher stimulus strengths, making the heaviness of the tails even more evident. In such a setting, the psychometric curves obtained by conditioning on past trials’ choices and outcomes, or history-conditioned psychometric curves, are both horizontally and vertically shifted, i.e., they show history-dependent modulations in both threshold and lapse rate parameters (Fig. 1e, Supplementary Fig. 2b). Furthermore, trial-history-modulated lapse rates are uniquely produced by history-biased initial accumulator states (and therefore reflect apparent lapses), in contrast to lapse rates observed in the unconditioned psychometric curve, which might have additional extraneous causes^{27,32,34} and therefore reflect both apparent and true lapses.
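The mixture argument above can be illustrated numerically. The sketch below uses the first-passage choice probability for a DDM with bounds at ±B (with illustrative, not fitted, parameter values): averaging over history-driven initial states depresses the asymptote of the pooled curve, producing an apparent lapse even though every individual trial remains evidence-dependent.

```python
import numpy as np

# Illustrative sketch: averaging history-biased psychometric curves yields
# apparent lapses. Bound, noise, and the set of initial states are assumptions.
B, sigma2 = 1.0, 1.0

def p_right(mu, I):
    """P(hit +B) for a DDM with drift mu, bounds +/-B, initial state I
    (standard first-passage formula; driftless limit handled separately)."""
    if abs(mu) < 1e-12:
        return (I + B) / (2 * B)
    return (1 - np.exp(-2 * mu * (I + B) / sigma2)) / (1 - np.exp(-4 * mu * B / sigma2))

strong_evidence = 3.0                     # large rightward drift
initial_states = [-0.9, -0.3, 0.3, 0.9]   # history-driven fluctuations across trials

# A single unbiased trial is near-perfect at strong evidence ...
single = p_right(strong_evidence, 0.0)
# ... but the trial-averaged curve asymptotes well below that: an apparent lapse.
averaged = np.mean([p_right(strong_evidence, s) for s in initial_states])
print(single > 0.99, averaged < 0.9)
```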
In this model, because history modulations of psychometric thresholds and lapse rates arise from one unified process, they are not allowed to vary independently of the decision-making process, or of each other. Rather, their relative magnitudes are intimately coupled with and constrained by accumulation variables. For instance, increased magnitudes or timescales of initial state updating produce large fluctuations in the initial accumulator state across trials. This in turn reduces the effective sensitivity of the accumulation process to evidence, giving rise to more apparent lapses and history biases (Supplementary Fig. 2a). Similarly, changes in within-trial parameters of accumulation can dramatically influence these history modulations (Supplementary Fig. 2c). Decisions made with smaller accumulator bounds are more sensitive to initial state modulations, and therefore give rise to more apparent lapses and higher modulations of lapse rates and thresholds. Higher levels of sensory noise have a similar effect, yielding more apparent lapses, consistent with recent reports of lapse rates being modulated by sensory uncertainty^{32}. Finally, impulsive integration strategies that overweight early evidence rather than accumulating uniformly^{23} exaggerate the influence of initial states, producing more apparent lapses and history biases.
Some definitions:
Lapse rate: Lapse rates capture the difference between perfect performance and observed performance at the asymptotes, measured through sigmoidal fits to the psychometric curves.
True lapse: A true lapse is a stochastic, evidenceindependent choice that arises from cognitive processes entirely separate from the decision process, such as inattention or motor error.
Apparent lapses: Apparent lapses are deterministic, evidence-dependent choices that nonetheless contribute to lapse rates when performance is averaged across trials.
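As a concrete illustration of the lapse-rate definition, asymptotic performance is typically read off a four-parameter sigmoidal fit. The parameterization below is a common convention, not necessarily the exact one used in this study:

```python
import numpy as np

# Four-parameter psychometric: left/right lapse rates (gamma_L, gamma_R),
# threshold (x0, the x-value of the inflection point) and sensitivity (k).
# Parameter names are ours, for illustration.
def psychometric(x, gamma_L, gamma_R, x0, k):
    core = 1.0 / (1.0 + np.exp(-k * (x - x0)))       # logistic core
    return gamma_L + (1.0 - gamma_L - gamma_R) * core

# At extreme stimulus strengths the curve asymptotes at gamma_L on the left
# and 1 - gamma_R on the right, so the lapse rates are exactly the gap
# between perfect and observed asymptotic performance.
print(round(psychometric(100.0, 0.05, 0.1, 0.0, 1.0), 6))  # → 0.9
```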
Rats display varying degrees of history-dependent threshold and lapse rate modulation
We sought to test whether the comodulations posited by our model are present in rat decision-making datasets, in order to ascertain whether a unified explanation could underlie the links between history biases and lapses.
We first examined whether and how rat decision-making strategies were affected by trial history. We analyzed choice data from 152 rats (37522 ± 22090 trials per rat, mean ± SD; Supplementary Fig. 3a) trained on a previously developed task that requires accumulation of pulsatile auditory evidence over time (the ‘Poisson Clicks’ task^{30}). In this task, the subject is presented with two simultaneous streams of randomly timed discrete pulses of evidence, one from a speaker to their left and the other to their right (Fig. 2a). The subject must maintain fixation throughout the stimulus, and subsequently orient towards the side that played the greater number of clicks to receive a water reward. The trial difficulty, stimulus duration, and correct answer were set independently on each trial. Because this task delivers sensory evidence through randomly but precisely timed pulses, it provides high statistical power to characterize the decision variables that give rise to choice behavior.
Rats performed this task accurately (0.79 ± 0.04, mean accuracy ± SD, Supplementary Fig. 3b). Performance was stable, with little to no change in accuracy across trials (mean slope ± SD across rats of a linear fit to hit rate over trials: 1.13 × 10^{−7} ± 8.90 × 10^{−7}; Supplementary Fig. 3c), reflecting asymptotic behavior rather than task acquisition. Rats showed history dependence in their choices, largely tending towards a “win-stay, lose-switch” dependence (Supplementary Fig. 3e). We found substantial individual variability in the dependence of rats’ choices on history in the dataset. Some rats were weakly influenced by history (Fig. 2b left), while others showed a history-dependent modulation of the psychometric threshold parameter (Fig. 2b middle) or a history-dependent modulation of both threshold and lapse rate parameters (Fig. 2b right). The population as a whole most closely resembles example rat 3, with both threshold and lapse rate parameters being significantly different following left and right wins while sensitivity is not affected (p = 0.8 for sensitivity, 3 × 10^{−17} for bias, 8 × 10^{−8} for left lapse, 6 × 10^{−7} for right lapse, two-sided Mann-Whitney U-test, n = 152; Fig. 2c). Using simulations, we confirmed that logistic fits to psychometric curves can reliably recover performance asymptotes, i.e., lapse rates, particularly in the parameter regimes of this dataset (Supplementary Fig. 4). As predicted by our model (Fig. 1e), trial history biased both threshold and lapse rate parameters in the same direction (e.g., both biased toward rightward choices following right rewards). Moreover, the vast majority of rats show comodulations of both parameters by history (Pearson’s correlation coefficient: r = −0.35, p = 7.28 × 10^{−6}; Fig. 2d). Across rats, on average 17 ± 12% of lapses are modulated by trial history and therefore could potentially reflect apparent rather than true lapses (Supplementary Fig. 3d).
These findings support the conclusion that rat decision-making strategies, while idiosyncratic, largely show history-dependent effects consistent with our model. Next, we tested the model more directly using trial-by-trial model fitting.
History-dependent initial states capture comodulations in thresholds and lapse rates in the data
To test whether the observed history modulations in thresholds and lapse rates arise from trial-by-trial updates to the initial accumulator state, we extended an accumulator model previously adapted to this pulsatile task^{30} to incorporate History-dependent Initial States (abbreviated as HISt, Fig. 3a). As before, we model this history dependence using an exponential filter over past trials’ choices and outcomes (Fig. 1c). Hence, across trials the accumulator model with HISt produces apparent lapses, as well as coupled history modulations in psychometric threshold and lapse rate parameters.
Within a trial, our accumulator model leverages knowledge of the timing of each evidence pulse to model the sensory adaptation process, as well as to estimate the noise and drift of the accumulator variable (Fig. 3a top bubble, Methods). The model includes a feedback parameter that controls whether integration is leaky, perfect, or impulsive. Following Brunton et al.^{30}, this model also includes (biased) random choices independent of the accumulator value on a small fraction of trials (κ). We consider decisions arising from this process to be “true lapses” because they are evidence-independent, unlike apparent lapses, which still retain some evidence dependence (Fig. 3a bottom bubble).
We performed trial-by-trial fitting of the accumulator model with and without History-dependent Initial States (HISt) to choices from each rat using maximum likelihood estimation (Methods). We find that the accumulator model with HISt captures both psychometric threshold and lapse rate modulations well across different regimes of rat behavior, as evident from fits to example rats (Fig. 3b). Moreover, conditioning rats’ psychometric curves on model-inferred initial state values reveals that the initial state captures a large amount of variance in choice probabilities (Fig. 3c), resembling theoretical predictions (Fig. 1c). This shows that the initial state is a key explanatory variable underlying choice variability both across and within individuals, one that jointly modulates multiple features of the empirical psychometric curves in a parametric fashion. We used the Bayesian Information Criterion (BIC) to determine whether adding HISt to the accumulator model was warranted (Fig. 3d, e). Individual BIC scores indicated that adding HISt was warranted in 147/152 rats (Fig. 3d). This model also best captured choices across the population as a whole, with significantly lower mean BIC scores across rats (mean per-trial BIC score for HISt: 0.91 ± 0.01 vs. no HISt: 0.93 ± 0.01, p = 9.85 × 10^{−18}, paired t-test; Fig. 3e). Next, we compared the psychometric threshold and lapse rate modulations produced by this model to the modulations in the data, as determined by conditioning the psychometric functions on trial history (Fig. 3b). As predicted, the model successfully accounted for modulations in both these distinct psychometric features via the singular process of trial-by-trial history-dependent updates to the initial accumulator state. We then examined the extent to which these modulations were captured across individual rats (Fig. 3f, g).
We quantified these history modulations as follows: “threshold modulation” is defined as the horizontal distance between the midpoints of psychometric curves conditioned on previous wins and losses, and “lapse rate modulation” as the vertical distance between the asymptotes of these curves (Methods: History modulation of psychometric parameters; see also Supplementary Fig. 2b). This was done separately for model-predicted and rat choices, which were then compared. Across individuals, the model with HISt captured a substantial amount of variance [R^{2} = 0.72 (threshold parameter), R^{2} = 0.69 (lapse rate parameter)] and showed good correspondence to the empirical modulations in the data [slope = 1.02 (threshold parameter), slope = 0.70 (lapse rate parameter)].
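Given four-parameter logistic fits to the post-win and post-loss conditioned curves, these two modulation metrics reduce to simple parameter differences. The sketch below uses invented fit values, and the parameterization (lapse rates, threshold, slope) is our assumption rather than the exact one used here:

```python
# Sketch of the modulation metrics described in the text, computed on
# four-parameter logistic fits to history-conditioned psychometric curves.
# The fit values below are invented for illustration.

def modulations(fit_win, fit_loss):
    """Each fit is (gamma_L, gamma_R, x0, k) for a curve
    y = gamma_L + (1 - gamma_L - gamma_R) / (1 + exp(-k * (x - x0))).
    Threshold modulation: horizontal distance between the curves' midpoints
    (the midpoint of this parameterization sits at x = x0).
    Lapse-rate modulation: vertical distance between the right asymptotes."""
    gL_w, gR_w, x0_w, _ = fit_win
    gL_l, gR_l, x0_l, _ = fit_loss
    threshold_mod = x0_w - x0_l
    lapse_mod = (1 - gR_w) - (1 - gR_l)   # = gR_l - gR_w
    return threshold_mod, lapse_mod

# Example: after a right win the curve shifts to favor rightward choices
# (lower rightward threshold, higher right asymptote) than after a loss.
print(modulations((0.02, 0.03, -0.4, 1.2), (0.05, 0.10, 0.4, 1.2)))
```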
In our model, apparent lapses show history modulations since they are produced by history-dependent initial accumulator states, while true lapses do not, since they result from an occasional flip in the final choice and are independent of the accumulator value (following Brunton et al.^{30}). Such true lapses could reflect errors in motor execution or random exploratory choices made despite successful accumulation (Supplementary Fig. 5b). However, true lapses could also occur due to inattention, i.e., an occasional failure to attend to the stimulus. In such cases, the optimal strategy devoid of sensory evidence is to deterministically choose the side favored by the initial accumulator state (Supplementary Fig. 5c). Therefore, inattentional true lapses, while remaining evidence-independent, may nevertheless be modulated by history due to their initial state dependence. In order to account for this possibility, we fit an additional “inattentional” variant of the accumulator model with HISt (Supplementary Fig. 5a, c), and found that it was closely matched on BIC scores with the previous model, which we label the “motor error” variant (Supplementary Fig. 5e, f). Moreover, the inattentional variant, which additionally allows true lapses to depend on history, only captured slightly more variance in history modulations of lapse rates, at the expense of history modulations of thresholds (Supplementary Fig. 5d), while a variant of the model with inattentional true lapses but without HISt failed completely to capture the comodulation and performed much worse overall (Supplementary Fig. 6). Together these two findings support the hypothesis that apparent lapses produced by history-dependent initial states (rather than true lapses due to motor error or inattention) are the major driver of history-dependent comodulations in psychometric thresholds and lapse rates in the dataset.
To gain further insight into the initial state updating dynamics, we examined the fit parameters controlling the magnitude and timescale of updates (Supplementary Fig. 7). We found that across the population of rats, updates following wins and losses had similar magnitudes but opposite signs, suggesting a tendency to repeat after wins and switch after losses. We compared these fits to those from a restricted version of the model whose initial state dynamics correspond to optimal updates in a Dynamic Belief Model^{48} (Supplementary Fig. 1) and found that about a third of the population (47/152 rats) was consistent with this form of statistical inference (Supplementary Fig. 7b). The remainder of the population did not show a significant correlation between post-win and post-loss parameters, consistent with a statistical model that treats wins and losses differentially^{13,49} (Supplementary Fig. 7c).
To summarize, our model predicted that the initial accumulator state should be the underlying variable that jointly drives history dependence in thresholds and lapse rates, implying that our accumulator model with HISt should be able to simultaneously capture variability in both these parameters across rats. Our rat dataset strongly supports this prediction, lending evidence to the hypothesis that history-dependent initial states give rise to apparent lapses and constitute the common cognitive process underlying the links between these two suboptimalities, which were previously thought to be distinct from each other.
Reaction times support history-dependent initial state updating
In our model with history-dependent initial accumulator states, the time it takes for the accumulation variable to hit the bound determines how long the subject deliberates before committing to a choice. Therefore, in addition to choices, the model makes clear predictions about subjects’ reaction times (RTs). We sought to test whether these predictions are borne out in subjects’ RTs.
To this end, we trained rats (n = 6) on a new variant of the auditory evidence accumulation task, with two key modifications that allowed us to collect reaction time reports (Fig. 4a). First, in this new task the stimulus plays for as long as the rat maintains its nose in the center port (or “fixates”) and stops immediately when fixation is broken. Second, in this task the rat has to correctly report which speaker’s auditory click train was sampled from the higher Poisson rate to receive a water reward (unlike the non-reaction-time task, where the subject has to report the side that played the greater number of clicks). Rats perform this task with high accuracy (Fig. 4b left panel; average accuracy: 0.75 ± 0.02, number of trials: 37205 ± 14247, mean ± SD). Similar to the previously analyzed data, their choices are impacted by recent trial history (Fig. 4b right panel). Moreover, trial-history-dependent modulation of psychometric function parameters (Fig. 4c) resembles that of the non-reaction-time task (Fig. 2c; p = 0.69 for sensitivity, 0.004 for threshold, 0.02 for left lapse rate, 0.02 for right lapse rate, Mann-Whitney U-test). Once again, this history modulation of both psychometric threshold and lapse rate parameters in tandem is consistent with our singular accumulator model with history-dependent initial states.
Moreover, the RTs of these rats display several signatures predicted by our model (Fig. 4d–f). First, trial-to-trial variability in the initial state of the accumulator is expected to give rise to shorter RTs on error trials compared to correct trials^{22} (Fig. 4e, left). This is because trials in which the initial state is closer to the incorrect bound are more likely to be errors, but because of the closer bound they are also likely to hit it faster. This is unlike a standard DDM with no trial-to-trial variability in parameters, where RTs for correct and error trials are of similar magnitudes (Fig. 4d, left). Indeed, in the rat dataset, error RTs are consistently shorter than correct RTs across rats (Fig. 4f, left). Second, initial state updates towards previously rewarded choices (such as in a win-stay agent) are expected to produce shorter RTs when the current stimulus favors the previously rewarded choice^{19,24} (Fig. 4e, middle). We find that this signature is also present in the dataset across rats (Fig. 4f, middle). Finally, variability in the initial state is most influential early in the decision process, predicting that the majority of history dependence in choices occurs on trials with fast RTs^{12} (Fig. 4e, right). Indeed, the data display this pattern as well, with repetition bias being most prominent for short RTs, disappearing and turning into a weak alternation bias for long RTs (Fig. 4f, right). Taken together, these three signatures offer strong, complementary evidence from RTs for the prevalence of history-dependent initial states in rats performing this evidence accumulation task.
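The first of these signatures is easy to reproduce in simulation. The sketch below (all parameters illustrative, not fits to the rat data) simulates a DDM with uniformly distributed initial states: trials starting nearer the wrong bound are both more likely to end in error and faster to terminate, so mean error RT falls below mean correct RT.

```python
import numpy as np

# Monte-Carlo sketch of the first RT signature: with trial-to-trial
# variability in the initial accumulator state, errors hit the (closer)
# wrong bound and are faster than correct choices. Parameters are
# illustrative assumptions, not fits to the rat data.
rng = np.random.default_rng(0)
B, mu, sigma, dt = 1.0, 1.0, 1.0, 0.01
sd_step = sigma * np.sqrt(dt)
n_trials = 4000

correct_rts, error_rts = [], []
for _ in range(n_trials):
    x = rng.uniform(-0.6, 0.6)       # history-driven initial state
    t = 0.0
    while abs(x) < B:                # Euler simulation until a bound is hit
        x += mu * dt + sd_step * rng.standard_normal()
        t += dt
    (correct_rts if x >= B else error_rts).append(t)

# With positive drift, +B is the correct bound; errors should be faster.
print(np.mean(error_rts) < np.mean(correct_rts))
```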
We directly tested whether our model can simultaneously capture reaction time patterns and history modulation of psychometric threshold and lapse parameters by jointly fitting choices and RTs of individual subjects in a trial-by-trial fashion (see Methods). We find that the history-dependent initial state model jointly captures patterns of choices, reaction times, and their history modulations in the data (Fig. 4g: fits from an example rat; Supplementary Fig. 8: fits from all rats). This model accounts for substantial variance in history-dependent threshold and lapse rate modulations (Fig. 4h). We also fit a hybrid variant of the accumulator model with HISt that flexibly allows true lapses to be motor-error-like and unaffected by history, or inattention-like and additionally modulated by history (Supplementary Fig. 9a, b). While this model has a better BIC and leads to a slight improvement in correspondence to the history modulation of psychometric lapse rates, it does so at the cost of correspondence to modulations in psychometric thresholds (Supplementary Fig. 9c–e). This equivocal improvement over the HISt model in capturing threshold and lapse rate modulations supports the conclusion that HISt and its resultant apparent lapses (rather than true lapses) are a major contributor to the observed comodulation of both parameters.
Overall, these results show that the history-dependent initial state updates that we invoked to explain apparent lapses in rodent data are corroborated by their reaction times, and accounting for them can help render a sizable fraction of decisions — that would have otherwise been attributed to noise — more predictable both within and across trials.
Discussion
History biases and lapses have both long been known to impact perceptual decision-making across species. However, they have largely been assumed to be distinct from each other, despite their frequent co-occurrence and comodulation. Here, we propose that normative accumulation under misbeliefs of nonstationarity can produce both history biases and apparent lapses, offering an explanatory link between the two suboptimalities. This corresponds to history-dependent trial-to-trial updates to the initial state of an evidence accumulator. We show that such updates produce choices with varying biases in psychometric thresholds as well as varying sensitivities to evidence, yielding apparent, history-modulated lapse rates when choices are averaged across trials (Fig. 1). Our model postulates that the initial state of the accumulator is a key underlying variable that jointly modulates psychometric threshold and lapse rate parameters, with the exact nature of this comodulation determined by the within- and across-trial parameters governing evidence accumulation. We tested this model in a large rat dataset consisting of choices from 152 rats (Fig. 2) and confirmed its predictions using detailed model fitting. We found that the singular process of history-dependent initial states successfully captured a substantial amount of variance in history modulations of both thresholds and lapse rates in the dataset (Fig. 3). Finally, we tested the reaction time predictions of the model in a novel rat task, and confirmed that the data showed signatures of initial state updating. The model could successfully capture choices, reaction times, and history modulations in psychometric thresholds and lapse rates (Fig. 4). Altogether, our results suggest that history biases and a substantial amount of variance attributed to lapses may reflect a common mechanistic process, whose evolution can be precisely tracked both within and across trials.
History biases in perceptual decision-making tasks have been modeled using initial state updates to DDMs in humans and non-human primates^{2,5,24}. These studies tended to involve relatively small magnitudes of history bias and minuscule lapse rates, and hence were well captured by small deviations in the initial state of a DDM, which largely yield horizontal shifts in the psychometric function. This regime of initial state updates is well approximated by a logistic function with additive biases, which is the dominant descriptive model used to characterize history-dependent psychometric curves^{3,4,6,8,9,11,12,13,17,26,34,50}. However, as we demonstrate, when deviations in the initial state are large, this logistic approximation breaks down. This fact has been overlooked in much of the literature. Consequently, even in datasets with large history biases and lapses, the logistic formulation continues to be favored^{9,17,18,34}, albeit requiring additional components. Such effects tend to be prevalent in rodent but not human or non-human primate behavior. Our demonstration predicts that the full range of initial state effects should resemble concurrent, trial-by-trial changes in both threshold and sensitivity parameters of the logistic function. Indeed, Ashwood et al.^{34} found that apparent lapses in several rodent datasets can be better captured by runs of trials with such concurrent modulations, yielding biased “disengaged” states. Our model captures both these behavioral regimes simply using different magnitudes of initial state updates, rendering it capable of accounting for individual differences across animals, and potentially even across species with very different behavioral signatures, as long as the constraints between initial state updating, history biases and lapses are obeyed.
A number of previous studies have hinted at the performance-limiting effect of sequential biases and of variability in initial points and/or sensitivity across trials^{23,46,47}. Nguyen et al.^{47} examined the optimal decision-making strategy under a non-stationary generative model, and arrived at psychometric curves similar to the heavy-tailed curves produced by our model. Similarly, Shen et al.^{46} examined decision-making under variable “precision” across trials, which also yields heavy-tailed psychometric curves, trading off against lapse parameters. However, to our knowledge, ours is the first study to directly examine the effect of sequential biases on lapse rates, linking these two relatively separate literatures. Our model formulation shares some features with previous work on sequential biases, with some distinct features: our model is a drift-diffusion model with history-dependent initial states (similar to Nguyen et al.^{47}, but unlike Kim et al.^{25}, who use an adaptive LATER model), adapted to discrete stimuli for the purpose of trial-by-trial modeling. Our model’s initial states are a continuous variable, unlike those of Urai et al.^{12}, which take on one of two possible discrete values. Also, our model’s initial states are set by a flexible exponential filter on several past choices and outcomes, unlike Nguyen et al.^{47}, Kim et al.^{25}, Yu et al.^{48} and other variants of the Dynamic Belief Model, although it reduces to them in certain restricted parameter regimes.
In our treatment, we only considered history-dependent updates to the initial state of a DDM. Such a mechanism is normative under non-stationary beliefs about the prior (note that this is the case if the agent assumes that a shift in the prior over stimulus categories maps onto an overall shift in the prior over stimulus difficulties; see Drugowitsch et al.^{44} for a detailed treatment), which is our favored interpretation as it aligns with other studies of history biases^{2,8,19,20,24,25,51,52}. Nevertheless, these updates may also reflect other heuristic strategies^{53}, which we accommodate using our flexible parameterization of initial state updates. Animals may entertain non-stationary beliefs about other elements of the decision process, such as the rewards or likelihoods^{14,15,32,42}. Normative updating in such situations still reduces to initial state updates in simple settings (e.g., non-stationary rewards at a single difficulty^{54,55}), but in more complex ones it affects drift rates or bounds in addition to initial states^{12,14,44,45,56,57,58}. The commonality of initial state updating across many different non-stationary beliefs motivated us to probe its role in producing apparent lapses, and indeed this mechanism was able to explain an impressive amount of variance in our dataset, leading us to conclude that initial state updating is at least a major factor driving animal behavior. Another crucial possibility is trial-to-trial variability in drift rates, which is known to give rise to longer error RTs than correct RTs^{43,59,60,61}, a signature often reported in monkeys and humans^{62,63}. We did not observe the reaction time signatures of drift rate variability in our dataset; instead, we identified signatures of initial state variability, with error RTs shorter than correct RTs. However, drift rate updates may represent an alternative mechanism through which history-modulated apparent lapses could occur in other datasets.
It is worth noting that certain task designs include efforts to actively measure and counter trial history biases. In such cases, lapses may still occur, likely due to exploration or inattention. In this manuscript, we refer to lapses caused by these factors as “true lapses”, since they cannot be explained by fluctuations in DDM-related parameters.
Lapse rates are often considered to be a mixed bag comprising several different noise processes separate from the decision process, yet most studies so far have focused on one or more of these component processes in isolation^{32,34}. In this work, we have attempted a more expansive approach of considering multiple processes at once, in an attempt to partition lapse rate variance into mixtures of deterministic and stochastic components. We distinguished apparent lapses, which interact with sensory evidence, from two models of “true” lapses that are both evidence-independent: motor error or exploration, which does not interact with the accumulator, and inattention, which may still depend on its initial state. While we find that the behavior of our rats is best described by a mixture of apparent lapses and the two true lapse variants, it is primarily the apparent lapses (rather than either true lapse variant) that capture the links between the suboptimalities, i.e., the history-dependent co-modulations in psychometric thresholds and lapse rates. A previous study proposed an evidence-dependent model of true lapses, uncertainty-guided exploration^{32}, to account for the scaling of lapse rates with sensory noise. Although we do not explicitly consider this model, our model of apparent lapses already displays this property, with higher levels of sensory noise leading to more frequent apparent lapses.
Our model predicts that an increased reliance on history (i.e., larger shifts of the initial states) should produce more apparent lapses. Indeed, this could provide an explanation that links disparate sets of observations from previous studies: while some studies have reported that perturbations of secondary motor cortex and striatum give rise to higher lapse rates^{32,36,37,38,39}, others have shown that the effects of perturbing these regions seem to resemble an increased history dependence^{39,64}. Interpreting these results through the lens of our model, we would conclude that these regions play a crucial role in the interaction of history-dependent initial states with sensory evidence, making them a potential common neural substrate that could contribute to both kinds of suboptimalities. Indeed, increased history dependence upon M2 perturbation has been shown to be mediated by an increased bias in the initial value of the neurally derived accumulator variable^{64}. Similarly, DMS perturbations had large effects on lapse rates in moderately engaged behavioral states that were influenced by both sensory evidence and history^{50}. Our model could also help explain why Busse et al.^{3} found that mice with higher lapse probabilities showed higher history dependence, or why IBL^{18} observed a modulation in lapse rates in addition to horizontal biases upon explicit manipulation of category priors. Nonetheless, these observations do not preclude the possibility that there are indeed independent neural mechanisms and/or areas through which trial-history effects and lapses (particularly true lapses) arise. Consistent with this, studies have implicated different brain areas in producing deterministic vs. stochastic biases in action timing^{65}, and even different subcircuits within the same area in giving rise to distinct behavioral strategies^{66}.
Detailed manipulations of brain regions in tasks with explicit prior information, such as the study by IBL (2023)^{67}, could help pinpoint the neural mechanisms through which these suboptimalities arise.
One interesting future line of investigation is to probe the precise nature of the model of non-stationarity over priors assumed by animals in such tasks. The range of parameter values inferred using our flexible formulation could offer a useful starting point for this line of investigation. For instance, Dynamic Belief Models^{19,68}, a popular class of generative models over priors, correspond to a narrowly constrained set of parameter values in our model. Such an understanding would not only afford more reliable control of behavior and more accurate interpretation of neural correlates in stationary tasks, but could also yield insight into the inductive biases that allow animals to learn quickly and efficiently in non-stationary, naturalistic settings.
Methods
Subjects
Animal use procedures were approved by the Princeton University Institutional Animal Care and Use Committee (IACUC #1853). All subjects (n = 152) were adult male Long Evans rats, typically housed in pairs. Housing both male and female rats in our rodent system resulted in a significant rise in aggression, especially in certain transgenic rat lines, to the point of making these rats unsafe to handle. This prevented us from studying both sexes and including sex as a factor in our study design. Rats that trained during the day were housed in a reverse light cycle room. Rats were typically aged between 6 and 24 months. Rats had free access to food, but in order to motivate them to work for water reward, they were placed on a controlled water schedule: 2–4 hours per day during task training, usually 7 days a week, and between 0 and 1 hour ad lib following training.
Drift diffusion model of decisionmaking
We use a standard formulation of sequential decision-making^{23,43}, in which an agent is faced with a stream of noisy sensory evidence ϵ_{1:t} coming from one of two hypotheses H_{1} and H_{2}. The agent has to decide between sampling for longer or choosing one of two actions L, R (reaction time regime), or has to choose one of two actions after a fixed amount of evidence (fixed duration regime). Such a problem can be formulated as one of finding an optimal policy π_{t} in a partially observable Markov decision process^{43,69}, whose solution can be written as a pair of thresholds on the log-posterior ratio \(\log \left(\frac{g(t)}{1-g(t)}\right)\), where g(t) = p(H_{1}∣ϵ_{1:t}):
The log-posterior ratio can be further broken down into a sum of the log-prior ratio and log-likelihood ratios, using Bayes’ rule:
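Written out for concreteness, this is the standard Bayes decomposition implied by the definitions above:

```latex
\log \left(\frac{g(t)}{1-g(t)}\right)
  = \log \left(\frac{p(H_{1})}{p(H_{2})}\right)
  + \sum_{t^{\prime}\le t} \log \left(\frac{p(\epsilon_{t^{\prime}}\mid H_{1})}{p(\epsilon_{t^{\prime}}\mid H_{2})}\right)
```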
The optimal policy can equivalently be expressed in terms of the prior and the sum of momentary sensory evidence x(t) = ∑_{t} ϵ_{t}, which are sufficient statistics of the posterior^{43,70}. In the continuous-time limit, when the average rate of evidence increments, or drift rate, is μ, and the standard deviation of sensory noise is σ, this corresponds to a drift-diffusion model that terminates when it reaches one of two bounds^{23} and whose initial state I is proportional to the log-prior ratio:
In this case, the probability of choosing rightward actions, i.e., hitting the upper bound, can be written analytically as follows (derived from ref. ^{45}):
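This quantity can be sketched in a few lines of code. The expression below is the standard first-passage probability for a drift-diffusion process with drift μ, noise σ, absorbing bounds at ±B, and initial state I; the function name is ours, for illustration:

```python
import math

def p_right(mu, sigma, B, I):
    """Probability of hitting the upper bound +B for a drift-diffusion
    process with drift mu, noise sd sigma, absorbing bounds at -B and
    +B, and initial state I (standard first-passage result)."""
    if mu == 0:
        # Zero-drift limit: the probability is linear in the initial state.
        return (I + B) / (2 * B)
    k = 2 * mu / sigma**2
    return (1 - math.exp(-k * (I + B))) / (1 - math.exp(-k * 2 * B))
```

At μ = 0 the choice probability is set entirely by the initial state; at nonzero μ, a sufficiently large initial state offset still shifts the asymptotic choice probabilities, which is how history-dependent initial states can produce apparent lapses in the averaged psychometric curve.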
In cases where trial difficulties (and hence drift rates) vary from trial to trial, the optimal policy includes time-dependent, collapsing bounds on the posterior. However, under certain circumstances, constant bounds on X_{t} = ∑_{t} ϵ_{t} implement close-to-optimal collapsing bounds on the posterior^{43,71}, which is the regime we assume for our analysis.
Models of initial state updating
We model initial state updating as a sum of exponential filters over past choice-outcome pairs (Rw: right-win, Lw: left-win, Rl: right-loss, Ll: left-loss). The initial state I at trial n + 1 is thus given by:
where each filter i^{h} decays by a factor of β^{h} and is incremented by η^{h} following the observation of that particular choice-outcome pair, i.e., \({i}_{n+1}^{h}={\beta }^{h}{i}_{n}^{h}+{\eta }^{h}{1}^{h}({o}_{n})\).
o_{n} is the choiceoutcome pair observed on trial n and 1^{h}(o_{n}) is an indicator function that is 1 when o_{n} = h and is 0 otherwise.
For non-reaction-time datasets, in order to ensure good identifiability, we constrained the update parameters to be the same following both left and right losses, i.e., β^{h} and η^{h} to be the same for h = {Rl, Ll}. Additionally, following correct trials, we enforced the timescale of the update, i.e., β^{h}, to be the same for left and right trials h = {Lw, Rw}, while allowing the increment parameters η^{h} to differ. When β^{h} and η^{h} are the same ∀ h, this rule reduces to an approximation of the Bayesian update for the Dynamic Belief Model^{19}, which tracks a prior that undergoes discrete unsignaled switches at a fixed rate. We compared this reduced (DBM) model to the exponential filter described above (Supplementary Fig. 6a, b). While model comparison revealed that not every rat required all parameters to be different, the unconstrained model is the most general form that best captures behavior across rats.
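A minimal sketch of this update rule, with illustrative (not fitted) parameter values; the function name and dict layout are ours:

```python
def update_initial_state(i, outcome, beta, eta):
    """One trial of the exponential-filter update.
    i: dict mapping each choice-outcome pair h in {'Rw','Lw','Rl','Ll'}
       to its current filter value i^h.
    outcome: the pair observed on this trial.
    beta, eta: dicts of per-pair decay and increment parameters."""
    new_i = {}
    for h in i:
        # Every filter decays by beta^h; only the observed pair's
        # filter is additionally incremented by eta^h.
        new_i[h] = beta[h] * i[h] + (eta[h] if h == outcome else 0.0)
    # The initial state on the next trial is the sum over filters.
    I = sum(new_i.values())
    return new_i, I

# Illustrative parameters: wins attract toward the previous choice,
# and loss updates are shared across sides, as in the constrained fit.
beta = {'Rw': 0.5, 'Lw': 0.5, 'Rl': 0.3, 'Ll': 0.3}
eta = {'Rw': 0.4, 'Lw': -0.4, 'Rl': 0.0, 'Ll': 0.0}
i = {h: 0.0 for h in beta}
for outcome in ['Rw', 'Rw', 'Lw']:
    i, I = update_initial_state(i, outcome, beta, eta)
```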
Psychometric curves
Psychometric curves model the probability of a subject choosing one of the options (e.g., right) as a function of stimulus strength. We parametrize the psychometric curve as a 4-parameter logistic function:
where x_{0} is the threshold parameter that additively biases the stimulus x, b measures sensitivity to the stimulus, κ_{0} is the left asymptote or left lapse rate, and κ_{1} scales the logistic function. Therefore, the right asymptote is given by κ_{0} + κ_{1}, and the right lapse rate itself is given by 1 − (κ_{0} + κ_{1}). We fit all four of these parameters {κ_{0}, κ_{1}, x_{0}, b} to choices generated by either the DDM (Fig. 1), rats (Figs. 2–4), or accumulator models adapted to the tasks (Figs. 3, 4), using an interior-point algorithm to maximize the (Binomial) log likelihood of choices with MATLAB’s constrained optimization function fmincon. κ_{0} and κ_{1} were both constrained to lie within the interval [0, 1]. 95% confidence intervals on these parameters were generated using bootstrapping. Throughout this manuscript, we follow the convention of Wichmann and Hill^{27} and use “threshold” to denote the x-axis value at the inflection point of the psychometric curve, and “slope” to denote the sensitivity or slope of the curve at this inflection point. All lapse rates reported were measured through the fits of such 4-parameter logistic functions to animals’ choices, following previous definitions of lapse rates (Brunton et al.^{30}, Prins^{72}), and never through the error rates at extreme stimulus strengths.
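The fit described above can be sketched as follows, with scipy's minimize standing in for MATLAB's fmincon; the starting values and bounds are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def psychometric(x, kappa0, kappa1, x0, b):
    """4-parameter logistic: left asymptote kappa0, right asymptote
    kappa0 + kappa1, threshold x0, sensitivity b."""
    return kappa0 + kappa1 / (1.0 + np.exp(-b * (x - x0)))

def fit_psychometric(x, n_right, n_total):
    """Fit all four parameters by maximizing the binomial log
    likelihood of rightward-choice counts."""
    def nll(theta):
        # Clip keeps the likelihood finite at the parameter bounds.
        p = np.clip(psychometric(x, *theta), 1e-9, 1.0 - 1e-9)
        return -np.sum(n_right * np.log(p) +
                       (n_total - n_right) * np.log(1.0 - p))
    res = minimize(nll, x0=[0.05, 0.9, 0.0, 1.0], method='L-BFGS-B',
                   bounds=[(0, 1), (0, 1), (-10, 10), (0.01, 20)])
    return res.x
```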
History modulation of psychometric parameters
To summarize the effects of trial history on psychometric parameters, we fit independent psychometric curves to choices conditioned on the 1-trial-back choice-outcome history, i.e., following rightward wins (Rw) and leftward wins (Lw). Modulation of the threshold parameter by history was then computed as \({x}_{0}^{Rw}-{x}_{0}^{Lw}\). To quantify the modulation of lapse rate parameters by history, we first computed the differences in the left and right asymptotes following rightward and leftward wins: \({\kappa }_{0}^{Rw}-{\kappa }_{0}^{Lw}\) and \(({\kappa }_{0}^{Rw}+{\kappa }_{1}^{Rw})-({\kappa }_{0}^{Lw}+{\kappa }_{1}^{Lw})\), respectively. The net modulation of lapse rates with trial history is given by the sum of these differences: \(2({\kappa }_{0}^{Rw}-{\kappa }_{0}^{Lw})+({\kappa }_{1}^{Rw}-{\kappa }_{1}^{Lw})\).
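Given fits conditioned on the previous trial's outcome, these summaries reduce to simple arithmetic (the function name and tuple layout are ours):

```python
def history_modulations(fit_Rw, fit_Lw):
    """Each fit is (kappa0, kappa1, x0, b) from a psychometric curve
    conditioned on the previous trial's choice-outcome pair."""
    k0_R, k1_R, x0_R, _ = fit_Rw
    k0_L, k1_L, x0_L, _ = fit_Lw
    threshold_mod = x0_R - x0_L
    # Differences of the left asymptotes and right asymptotes, summed:
    left_diff = k0_R - k0_L
    right_diff = (k0_R + k1_R) - (k0_L + k1_L)
    lapse_mod = left_diff + right_diff  # = 2(k0_R - k0_L) + (k1_R - k1_L)
    return threshold_mod, lapse_mod
```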
Behavioral tasks
Auditory evidence accumulation task
Rats were trained with a previously established protocol^{30,36,37,73} using the BControl system. Briefly, rats were put in an operant chamber with three nose ports. They were trained to begin a trial by poking their nose into the middle port. After a variable delay (0.5–1.3 s), this initiated two simultaneous streams of randomly timed discrete auditory clicks for a predetermined duration, one from a speaker to their left and the other to their right. Rats were required to maintain “fixation” throughout the entire stimulus (1.5 s); failure to do so led to a violation trial. At the end of the stimulus, rats had to poke towards the side that played the greater number of clicks to obtain a water reward. Stimulus difficulty was varied from trial to trial by changing the ratio of the generative Poisson rates of the two click streams. Trial difficulty and rewarded side were independently sampled on each trial.
We analyzed rats that performed greater than 30,000 trials at 70% or more accuracy. Sessions with fewer than 300 trials, or with less than 60% accuracy for either of the choices, were excluded. Since rats typically perform this task for many months after having passed the final training stage, to minimize non-stationarities in the data (due to breaks in training because of holiday closures, etc.) and to ensure that we analyzed asymptotic performance, we identified temporally contiguous sessions with stable accuracy by performing changepoint detection on the smoothed trial hit rate using MATLAB’s findchangepts function. The partition with the most trials was included in the analysis. Since the animals neither made a choice nor received an outcome on violation trials, we ignored them while computing trial-history effects. In addition, data from 19 rats analyzed in Brunton et al.^{30} were also included in this analysis.
Auditory evidence accumulation task with reaction time reports
To measure rats’ reaction times in addition to choices, we modified the auditory evidence accumulation task in two ways. First, we relaxed the “fixation” requirement and instead allowed rats to sample the stimulus for as long as they wanted. As soon as a rat broke fixation by removing its nose from the center port, the stimulus stopped and the rat was required to report its decision by poking into one of the side ports. For any given trial, the time that the rat spent sampling the stimulus was its reaction time. Second, we rewarded rats if they correctly reported the side with the greater underlying Poisson rate, rather than the side that played the greater number of clicks. This helped eliminate the trivial strategy of committing to a decision after the first click and achieving perfect accuracy by simply reporting the side of that click, without any need for evidence accumulation.
In practice, we followed the same training protocol as the interrogation task^{30}, but with the modified reward rule. Once the rats were fully trained on the interrogation protocol, we gradually reduced the duration of the delay between trial initiation and stimulus onset, as well as the fixation period. Most rats maintained high accuracy (>70%) upon this manipulation; rats whose performance did not meet this criterion even after a week of training were excluded. Rats tended to have worse accuracy early in the session, so we omitted the first 50 trials from our analysis. After the first 50 trials, we confirmed that the accuracy in the first and second halves of the session was comparable.
Data modeling methods
Accumulator model
To model subjects’ choices and RTs, we used an accumulation-to-bound model modified to take into account the discrete nature of evidence in our behavioral tasks^{30}. In the model, the evolution of accumulated evidence x(t) in response to the left (ϵ_{L}) and right (ϵ_{R}) click trains on trial n is given by:
where λ is the inverse time constant of the consistent drift in the memory of x(t). C_{R}(t) and C_{L}(t) are the magnitudes of each right and left click, respectively, after undergoing sensory adaptation (with adaptation strength ϕ and adaptation time constant τ_{ϕ}). The sensory noise that accompanies each click is represented by ξ_{R}, ξ_{L}, which are Gaussian random variables with mean 1 and variance \({\sigma }_{s}^{2}\). The accumulation variable x also undergoes Brownian diffusion through the addition of a Wiener process (W) with variance \({\sigma }_{x}^{2}\). B represents the absorbing decision bound that prevents x(t) from evolving further, if crossed. The initial value of the accumulator variable varies from trial to trial and is set based on the exponentially filtered history of previous choices and outcomes (see Methods section on Models of initial state updating). A choice is made by comparing the final value of the accumulator x(T) to a side bias: a rightward choice is made if x(T) > bias.
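A forward-simulation sketch of these dynamics (Euler-Maruyama discretization). The adaptation scheme here is simplified to a single shared state, the function name and parameter layout are ours, and the fitting in the paper propagates distributions via Fokker-Planck rather than simulating:

```python
import numpy as np

def simulate_trial(right_clicks, left_clicks, I, p, dt=1e-3, T=1.0, rng=None):
    """Simulate one trial of the clicks accumulator.
    right_clicks, left_clicks: click times in seconds; I: initial state
    from the history filter; p: dict with keys lam, sigma_s, sigma_x,
    B, phi, tau_phi, bias. Returns (choice in {+1,-1}, bound-hit time
    or None if the bound was never reached)."""
    rng = rng if rng is not None else np.random.default_rng()
    # Simplified sensory adaptation: a single adaptation state C is
    # multiplied by phi at every click and recovers to 1 with time
    # constant tau_phi.
    clicks = sorted([(t, +1) for t in right_clicks] +
                    [(t, -1) for t in left_clicks])
    x, C, t_prev, ci = I, 1.0, 0.0, 0
    for step in range(int(round(T / dt))):
        t = step * dt
        # Leak/instability term lam*x plus Brownian diffusion.
        x += p['lam'] * x * dt + rng.normal(0.0, p['sigma_x'] * np.sqrt(dt))
        while ci < len(clicks) and clicks[ci][0] <= t:
            tc, side = clicks[ci]
            C = 1.0 + (C - 1.0) * np.exp(-(tc - t_prev) / p['tau_phi'])
            C *= p['phi']
            t_prev = tc
            # Adapted click magnitude with multiplicative sensory noise
            # of mean 1 and sd sigma_s.
            x += side * C * rng.normal(1.0, p['sigma_s'])
            ci += 1
        if abs(x) >= p['B']:  # absorbing bound: decision is made here
            return (1 if x > 0 else -1), t
    # No bound crossing: compare the final value to the side bias.
    return (1 if x > p['bias'] else -1), None
```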
Since the model quantifies noise sources on each trial, it requires estimating the evolution of a noise-induced probability distribution P(x(t)). We compute P(x(t)) by solving the Fokker-Planck equations that correspond to the model dynamics (see refs. ^{30,74} for numerical methods). The probability of making a rightward choice at the end timepoint T of a trial, given accumulation model parameters θ^{acc}, is:
Models of true lapses
We assume that some fraction of choices κ arises from processes extraneous to evidence accumulation, such as motor error/exploration or inattention. We parameterize these processes with θ^{lapse} and refer to them as “true lapses”:

In the motor error/exploration variant, the probability of making a choice towards the right, when lapsing, is given by ρ.

In the inattention variant (Supplementary Fig. 5c), the subject lapses towards the side favored by the initial state relative to a bias ρ, so the probability of a rightward choice due to inattention on trial n is:

In the hybrid variant (with motor error and inattention; Supplementary Fig. 9), the probability of lapsing towards the right depends on the initial state through a sigmoidal function whose slope m (or matching constant), as well as bias ρ, are free parameters:
Hence the total probability of making a rightward choice due to accumulation and true lapses is:
where Θ = {θ^{acc}, θ^{lapse}, κ}.
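The mixture and the three true-lapse variants, in code (function names are ours; p_acc stands for the accumulation choice probability defined above, and the hard threshold in the inattention variant is one simple reading of "lapses towards the side favored by the initial state"):

```python
import math

def p_right_total(p_acc, p_lapse, kappa):
    """Mixture of accumulation and true-lapse processes: with
    probability kappa the choice comes from the lapse process."""
    return (1.0 - kappa) * p_acc + kappa * p_lapse

# Motor error/exploration: lapses go right with fixed probability rho,
# independent of the accumulator.
def p_lapse_motor(rho):
    return rho

# Inattention: lapse toward the side favored by the initial state I
# relative to the bias rho.
def p_lapse_inattention(I, rho):
    return 1.0 if I > rho else 0.0

# Hybrid: sigmoidal dependence on the initial state (slope m, bias rho).
def p_lapse_hybrid(I, rho, m):
    return 1.0 / (1.0 + math.exp(-m * (I - rho)))
```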
Model fitting
The model parameters were fit to individual rats by maximizing the log likelihood of the observed choices of the rat c_{obs}, i.e. by maximizing
where n indexes trials. Throughout this manuscript, we assumed that for each rat, the parameters remain fixed across all sessions, so one set of parameters was fit to each rat for each model variant. Constrained optimization was performed in Julia using the Optim package. We computed gradients for parameter optimization using a forward-mode automatic differentiation package. The reported maximum likelihood parameters and likelihood values (used for model comparison) are from model fits to the entire dataset. We fit a random subset of 10 rats using 5-fold cross-validation (85% training dataset, 15% test dataset), but this yielded very similar maximum likelihood parameters and virtually identical test and training log-likelihoods. Hence, to save on computing time, we fit the different model variants to each rat’s entire dataset. This agreement between test and training likelihoods is likely due to the large number of trials in our dataset and the modest number of parameters in our model.
Simultaneous modeling of choices and RTs
In decision-making tasks, observed reaction times (RTs) are often thought of as comprising stimulus sampling or decision times (DTs, the time it takes for the subject’s accumulated evidence to hit the bound) and non-decision-related processing times (NDTs). In our datasets, we observed that reaction times tended to be slower following incorrect trials and that they grew longer over the course of a session. These effects were isolated to RTs and were not observed in choice behavior. To model these trends, we conceptualize non-decision times as arising from a separate drift-diffusion process whose drift ν is additionally modulated by the current trial number n and the previous trial’s outcome. These non-decision-time drift-diffusion processes terminate when the bound ω is hit. We assume that the non-decision times for each choice k ∈ {L, R} have independent bounds (ω_{k}) and drifts (ν_{k}). The non-decision times for a trial n are therefore samples from the following Wald or Inverse Gaussian (IG) distribution:
where k ∈ {L, R} and \({{\bf{1}}}_{(n-1)}\) is an indicator function that is 1 if the previous trial was incorrect and 0 otherwise. α parameterizes the impact of trial number on NDTs, and γ_{o} parameterizes the impact of the previous trial’s outcome on the current trial’s NDT.
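A sampling sketch of this NDT process. The multiplicative form of the drift modulation is an assumption for illustration (the fitted model specifies its own parameterization), and the function name is ours:

```python
import numpy as np

def sample_ndt(omega, nu, alpha, gamma_o, n, prev_error, rng):
    """Sample a non-decision time from a Wald (inverse Gaussian)
    distribution: a drift-to-bound process with bound omega and an
    effective drift modulated by trial number n and the previous
    trial's outcome. The linear modulation below is an assumed form."""
    nu_eff = nu * (1.0 - alpha * n - gamma_o * float(prev_error))
    mean = omega / nu_eff   # IG mean: bound over effective drift
    shape = omega ** 2      # IG shape parameter
    return rng.wald(mean, shape)
```

Slowing after errors (γ_o > 0) or late in the session (α > 0) then falls out of a reduced effective drift, which lengthens the sampled NDTs.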
We fit the model by maximizing the joint log likelihood of the observed choices and RTs. For any given trial, we can compute the likelihood of observing a particular reaction time RT_{obs} and choice c_{obs} due to accumulation by marginalizing over possible decision or bound hitting times \({\tau }_{{c}_{obs}}\) for the observed choice:
On true lapse trials, RTs were assumed to arise from NDTs alone and therefore the joint likelihood due to accumulation and true lapses is given by:
where Θ = {θ^{acc}, θ^{NDT}, θ^{lapse}, κ}.
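The marginalization over bound-hitting times amounts to a convolution of the decision-time distribution with the NDT density. A discretized sketch for a single trial and choice (array layout and function name are ours; the paper obtains the decision-time distribution from Fokker-Planck propagation):

```python
import numpy as np

def joint_likelihood_rt(rt_obs, p_dt, ndt_pdf, dt):
    """Likelihood of an observed RT for one choice, marginalizing over
    decision (bound-hitting) times tau: sum_tau P(DT=tau) * p_NDT(rt-tau).
    p_dt: array of decision-time probabilities on a grid with step dt,
    for the observed choice; ndt_pdf: callable density of NDTs."""
    taus = np.arange(len(p_dt)) * dt
    valid = taus <= rt_obs  # the NDT must be non-negative
    return float(np.sum(p_dt[valid] * ndt_pdf(rt_obs - taus[valid])))
```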
We followed previously established methods to compute the probability distribution of x(t) for computing the likelihood^{30,74}. This involves expressing the temporal dynamics of the probability distribution as a Fokker-Planck equation and then computing the solution numerically, by dividing P(x(t)) into a set of n discrete spatial bins and determining how probability mass moves after a discrete temporal interval Δt. The transition matrix for the discrete-time dynamics and a full description of the methods can be found in these studies.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
The rodent behavioral data generated in this study from the Poisson Clicks task and the reaction time task have been deposited in the figshare database under a CC BY 4.0 license at https://doi.org/10.6084/m9.figshare.24113793. Source data are provided with this paper.
Code availability
Analysis code is available at https://github.com/BrodyLab/trialhistory_lapses_EA.git with https://doi.org/10.5281/zenodo.10161051.
References
Cho, R. et al. Mechanisms underlying dependencies of performance on stimulus history in a twoalternative forcedchoice task. Cogn. Affect. Behav. Neurosci. 2, 283–299 (2002).
Gold, J., Law, C., Connolly, P. & Bennur, S. The relative influences of priors and sensory evidence on an oculomotor decision variable during perceptual learning. J. Neurophysiol. 100, 2653–2668 (2008).
Busse, L. et al. The detection of visual contrast in the behaving mouse. J. Neurosci. 31, 11351–11361 (2011).
Carandini, M. & Churchland, A. Probing perceptual decisions in rodents. Nat. Neurosci. 16, 824–831 (2013).
Zhang, S., Huang, H. & Yu, A. Sequential effects: A Bayesian analysis of prior bias on reaction time and behavioral choice. Proc. Ann. Meet. Cogn. Sci. Soc. 36 (2014).
Fründ, I., Wichmann, F. & Macke, J. Quantifying the effect of intertrial dependence on perceptual decisions. J. Vis. 14, 9–9 (2014).
Scott, B., Constantinople, C., Erlich, J., Tank, D. & Brody, C. Sources of noise during accumulation of evidence in unrestrained and voluntarily headrestrained rats. Elife 4, e11308 (2015).
Abrahamyan, A., Silva, L., Dakin, S., Carandini, M. & Gardner, J. Adaptable history biases in human perceptual decisions. Proc. Natl Acad. Sci. 113, E3548–E3557 (2016).
Odoemene, O., Pisupati, S., Nguyen, H. & Churchland, A. Visual evidence accumulation guides decisionmaking in unrestrained mice. J. Neurosci. 38, 10143–10155 (2018).
Akrami, A., Kopec, C., Diamond, M. & Brody, C. Posterior parietal cortex represents sensory history and mediates its effects on behaviour. Nature 554, 368–372 (2018).
Pinto, L. et al. An accumulationofevidence task using visual pulses for mice navigating in virtual reality. Front. Behav. Neurosci. 12, 36 (2018).
Urai, A., De Gee, J., Tsetsos, K. & Donner, T. Choice history biases subsequent evidence accumulation. Elife 8, e46331 (2019).
HermosoMendizabal, A. et al. Response outcomes gate the impact of expectations on perceptual decisions. Nat. Commun. 11, 1057 (2020).
Mendonça, A. et al. The impact of learning on perceptual decisions and its implication for speedaccuracy tradeoffs. Nat. Commun. 11, 2757 (2020).
Lak, A. et al. Reinforcement biases subsequent perceptual decisions when confidence is low, a widespread behavioral phenomenon. Elife 9, e49834 (2020).
Mochol, G., Kiani, R. & MorenoBote, R. Prefrontal cortex represents heuristics that shape choice bias and its integration into future behavior. Curr. Biol. 31, 1234–1244 (2021).
Roy, N., Bak, J., Akrami, A., Brody, C. & Pillow, J. Extracting the dynamics of behavior in sensory decisionmaking experiments. Neuron 109, 597–610 (2021).
International Brain Laboratory et al. Standardized and reproducible measurement of decision-making in mice. Elife 10, e63711 (2021).
Yu, A. & Cohen, J. Sequential effects: Superstition or rational behavior? Adv. Neural Inf. Process. Syst. 21 (2008).
MolanoMazón, M. et al. Recurrent networks endowed with structural priors explain suboptimal animal behavior. Current Biology 33, 622–638 (2023).
Laming, D. Information Theory of Choice-Reaction Times (Academic Press, 1968).
Ratcliff, R. & Rouder, J. Modeling response times for twochoice decisions. Psychol. Sci. 9, 347–356 (1998).
Bogacz, R., Brown, E., Moehlis, J., Holmes, P. & Cohen, J. The physics of optimal decision making: a formal analysis of models of performance in twoalternative forcedchoice tasks. Psychol. Rev. 113, 700 (2006).
Goldfarb, S., WongLin, K., Schwemmer, M., Leonard, N. & Holmes, P. Can posterror dynamics explain sequential reaction time patterns? Front. Psychol. 3, 213 (2012).
Kim, T., Kabir, M. & Gold, J. Coupled decision processes update and maintain saccadic priors in a dynamic environment. J. Neurosci. 37, 3632–3645 (2017).
Gardner, J. Optimality and heuristics in perceptual neuroscience. Nat. Neurosci. 22, 514–523 (2019).
Wichmann, F. & Hill, N. The psychometric function: I. Fitting, sampling, and goodness of fit. Percept. Psychophys. 63, 1293–1313 (2001).
Law, C. & Gold, J. Reinforcement learning can account for associative and perceptual learning on a visualdecision task. Nat. Neurosci. 12, 655–663 (2009).
Gold, J. & Ding, L. How mechanisms of perceptual decisionmaking affect the psychometric function. Prog. Neurobiol. 103, 98–114 (2013).
Brunton, B., Botvinick, M. & Brody, C. Rats and humans can optimally accumulate evidence for decisionmaking. Science 340, 95–98 (2013).
Wang, H. et al. Finding the needle in highdimensional haystack: A tutorial on canonical correlation analysis. ArXiv Preprint ArXiv:1812.02598. (2018).
Pisupati, S., ChartarifskyLynn, L., Khanal, A. & Churchland, A. Lapses in perceptual decisions reflect exploration. Elife 10, e55490 (2021).
Shushruth, S., Zylberberg, A. & Shadlen, M. Sequential sampling from memory underlies action selection during abstract decisionmaking. Curr. Biol. 32, 1949–1960 (2022).
Ashwood, Z. et al. Mice alternate between discrete strategies during perceptual decisionmaking. Nat. Neurosci. 25, 201–212 (2022).
Erlich, J., Bialek, M. & Brody, C. A cortical substrate for memoryguided orienting in the rat. Neuron 72, 330–343 (2011).
Erlich, J., Brunton, B., Duan, C., Hanks, T. & Brody, C. Distinct effects of prefrontal and parietal cortex inactivations on an accumulation of evidence task in the rat. Elife 4, e05457 (2015).
Yartsev, M., Hanks, T., Yoon, A. & Brody, C. Causal contribution and dynamical encoding in the striatum during evidence accumulation. Elife 7, e34929 (2018).
Guo, L., Weems, J., Walker, W., Levichev, A. & Jaramillo, S. Choiceselective neurons in the auditory cortex and in its striatal target encode reward expectation. J. Neurosci. 39, 3687–3697 (2019).
Sindreu, C. et al. The causal role of the striatum in the encoding of taskadaptive expectationbased choice biases. Comput. Syst. Neurosci. 2021. 117 (2021).
Siniscalchi, M., Wang, H. & Kwan, A. Enhanced population coding for rewarded choices in the medial frontal cortex of the mouse. Cerebr. Cortex 29, 4090–4106 (2019).
Gold, J. & Shadlen, M. The neural basis of decision making. Annu. Rev. Neurosci. 30, 535–574 (2007).
Dayan, P. & Daw, N. Decision theory, reinforcement learning, and the brain. Cogn. Affect. Behav. Neurosci. 8, 429–453 (2008).
Drugowitsch, J., MorenoBote, R., Churchland, A., Shadlen, M. & Pouget, A. The cost of accumulating evidence in perceptual decision making. J. Neurosci. 32, 3612–3628 (2012).
Drugowitsch, J., Mainen, Z. & Pouget, A. Learning optimal decisions with confidence. Proc. Natl Acad. Sci. 116, 24872–24880 (2019).
Palmer, J., Huk, A. & Shadlen, M. The effect of stimulus strength on the speed and accuracy of a perceptual decision. J. Vis. 5, 1–1 (2005).
Shen, S. & Ma, W. Variable precision in visual perception. Psychol. Rev. 126, 89 (2019).
Nguyen, K., Josić, K. & Kilpatrick, Z. Optimizing sequential decisions in the driftdiffusion model. J. Math. Psychol. 88, 32–47 (2019).
Yu, A., Dayan, P. & Cohen, J. Dynamics of attentional selection under conflict: toward a rational Bayesian account. J. Exp. Psychol. Hum. Percept. Perform. 35, 700 (2009).
Karlsson, M., Tervo, D. & Karpova, A. Network resets in medial prefrontal cortex mark the onset of behavioral uncertainty. Science 338, 135–139 (2012).
Bolkan, S. et al. Opponent control of behavior by dorsomedial striatal pathways depends on task demands and internal state. Nat. Neurosci. 25, 345–357 (2022).
Summerfield, C. & Koechlin, E. Economic value biases uncertain perceptual choices in the parietal and prefrontal cortices. Front. Hum. Neurosci. 4, 208 (2010).
Mulder, M., Wagenmakers, E., Ratcliff, R., Boekel, W. & Forstmann, B. Bias in the brain: a diffusion model analysis of prior probability and potential payoff. J. Neurosci. 32, 2335–2343 (2012).
Gigerenzer, G. & Gaissmaier, W. Heuristic decision making. Ann. Rev. Psychol. 62, 451–482 (2011).
Simen, P. et al. Reward rate optimization in two-alternative decision making: empirical tests of theoretical predictions. J. Exp. Psychol. Hum. Percept. Perform. 35, 1865 (2009).
Rorie, A., Gao, J., McClelland, J. & Newsome, W. Integration of sensory and reward information during perceptual decision-making in lateral intraparietal cortex (LIP) of the macaque monkey. PLoS ONE 5, e9308 (2010).
Eckhoff, P., Holmes, P., Law, C., Connolly, P. & Gold, J. On diffusion processes with variable drift rates as models for decision making during learning. New J. Phys. 10, 015006 (2008).
Hanks, T., Mazurek, M., Kiani, R., Hopp, E. & Shadlen, M. Elapsed decision time affects the weighting of prior probability in a perceptual decision task. J. Neurosci. 31, 6339–6352 (2011).
Fan, Y., Gold, J. & Ding, L. Ongoing, rational calibration of reward-driven perceptual biases. eLife 7, e36018 (2018).
Ditterich, J. Evidence for time-variant decision making. Eur. J. Neurosci. 24, 3628–3641 (2006).
Ditterich, J. Stochastic models of decisions about motion direction: behavior and physiology. Neural Netw. 19, 981–1012 (2006).
Nguyen, Q. & Reinagel, P. A qualitative difference in decision-making of rats vs. humans explained by quantitative differences in behavioral variability. bioRxiv (2020).
Roitman, J. & Shadlen, M. Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J. Neurosci. 22, 9475–9489 (2002).
Shevinsky, C. & Reinagel, P. The interaction between elapsed time and decision accuracy differs between humans and rats. Front. Neurosci. 13, 1211 (2019).
Luo, T., Hanks, T., Gupta, D., Bondy, A. & Brody, C. Dorsomedial frontal cortex participates in both evidence accumulation and history-based updating. Comput. Syst. Neurosci. (2021).
Murakami, M., Shteingart, H., Loewenstein, Y. & Mainen, Z. Distinct sources of deterministic and stochastic components of action timing decisions in rodent frontal cortex. Neuron 94, 908–919 (2017).
Cazettes, F. et al. A reservoir of foraging decision variables in the mouse brain. Nat. Neurosci. 1–10 (2023).
Findling, C. et al. Brain-wide representations of prior information in mouse decision-making. bioRxiv (2023).
Ryali, C., Reddy, G. & Yu, A. Demystifying excessively volatile human learning: A Bayesian persistent prior and a neural approximation. Adv. Neural Inf. Process. Syst. 31 (2018).
Rao, R. Decision making under uncertainty: a neural model based on partially observable Markov decision processes. Front. Comput. Neurosci. 4, 146 (2010).
Piet, A., El Hady, A. & Brody, C. Rats adopt the optimal timescale for evidence integration in a dynamic environment. Nat. Commun. 9, 4265 (2018).
Deneve, S. Making decisions with unknown sensory reliability. Front. Neurosci. 6, 75 (2012).
Prins, N. The psychometric function: the lapse rate revisited. J. Vis. 12, 25 (2012).
Hanks, T. et al. Distinct relationships of parietal and prefrontal cortices to evidence accumulation. Nature 520, 220–223 (2015).
DePasquale, B., Brody, C. & Pillow, J. Neural population dynamics underlying evidence accumulation in multiple rat brain regions. bioRxiv (2021).
Acknowledgements
We thank members of the Brody lab for experimental support and helpful feedback throughout the project, especially Adrian Bondy, Thomas Luo, Emily Dennis, Tyler Boyd-Meredith, and Ahmed El-Hady. We also thank Jovanna Teran and Brody lab technicians for assistance with rat training. We are grateful to Sashank Pisupati, Jonathan Cohen, Sebastian Musslick, Jonathan Pillow, and Ilana Witten for helpful discussions at various points during the project. This work was supported by NIH grant R01MH108358 awarded to C.D.B., as well as a grant from the Simons Foundation (Grant number: NCGBCULM0000311803) awarded to C.D.B.
Author information
Contributions
D.G. organized and analyzed the data and wrote the initial draft of the manuscript. C.D.K. assisted with data collection. B.D. assisted with analysis. All authors provided feedback on the manuscript. C.D.B. oversaw all aspects of the project.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Joshua Gold and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gupta, D., DePasquale, B., Kopec, C.D. et al. Trial-history biases in evidence accumulation can give rise to apparent lapses in decision-making. Nat Commun 15, 662 (2024). https://doi.org/10.1038/s41467-024-44880-5
DOI: https://doi.org/10.1038/s41467-024-44880-5