Even in contexts where visual input varies randomly from trial to trial, human observers tend to blend stimuli from previous trials into their representation of the current one, leading to a bias in behavioral reports1,2,3,4,5,6,7,8,9,10,11,12,13,14. This smoothing of representations – termed “serial dependence” – is a function of how close successive stimuli are in space1,15 and time1,2,3,4,5,6,7,12,14,15. It is also sensitive to their featural similarity1,2,3,7,8,9,10,12,14,16,17. Serial dependence has been observed in judgments of orientation1,8,9, and location16,17, as well as more complex attributes like the identity2 and attractiveness3,5,11 of human faces. That the bias is observed for such disparate features suggests it may be a universal principle of visual processing, and recent work has sought to demonstrate its adaptiveness8: In natural environments – where the input to our eyes is generally very similar from moment to moment18 – temporal smoothing would be expected to stabilize perception in the face of noise and occlusion1,8.

While the benefits of perceptual stability seem obvious, it is important to note that serial dependence impedes another goal of perception, which is to be sensitive to change. A classic example of how visual perception prioritizes change detection is the tilt after-effect19. This illusion (which is a type of adaptation20) is the quantitative opposite of serial dependence: Perception of the current moment is repelled away from, rather than merged with, recently processed stimuli – exaggerating differences. Like serial dependence, adaptation spans different types of stimulus features20,21,22,23,24. However, unlike adaptation, the attractive bias depends on attention: the observer must attend to each stimulus for serial dependence to occur1. Attention is thought to rely on the same neural and psychological mechanisms as working memory25,26,27,28,29,30,31. Hence, it is possible that whereas adaptation is a phenomenon of visual perception20,21,22,23,24, serial dependence arises instead from post-perceptual visual working memory9,32. If this were true, stability would operate in parallel with (rather than compete against) change detection, as these functions would be relegated to distinct cognitive systems9.

Preliminary efforts have been made to resolve whether serial dependence is perceptual or mnemonic in nature, with mixed results1,9. Using a comparison task that minimizes memory demands, one group identified positive serial dependence in a small number of individuals1 – in favor of the perceptual account. However, an attempt to replicate this effect with a larger sample size only revealed repulsive adaptation9. That is, no attractive serial dependence was observed when memory demands were removed using the same comparison task in the follow-up study, in support of the idea that serial dependence requires working memory9. A complementary strategy for clarifying this issue has been to boost memory demands – by increasing the delay between stimulus and response in delayed-estimation tasks – to determine whether this potentiates the attractive bias9,32. Traditionally, errors that scale with delay length are interpreted as mnemonic in origin, whereas those that are constant over time are assumed to be tied to the perceptual or motor demands that are also fixed33. Over a limited range, the magnitude of the serial dependence effect increases the longer that working memory is active9,16,17. Despite this potential connection to working memory, serial dependence has yet to be incorporated into the many mathematical models that have been developed in recent years to fit the dispersion of errors in human memory-guided behavior34,35,36,37,38,39,40,41,42,43,44.

In the present study, we investigate temporal smoothing in visual cognition over a wider range of memory delays than has been used in the past. We use a spatial delayed response task, which has been shown to produce serial dependence in non-human primates16,17. Previous experiments using delayed response tasks to measure serial dependence have included a visual mask after the stimulus presentation period1,2,8,9, as well as a delay period of at least several hundred milliseconds before a response is permitted1,2,8,9,16,17, which encourages encoding into working memory and cannot cleanly measure more fragile perceptual representations45,46. In our shortest delay condition, we allow participants to respond immediately after stimulus offset, with no mask. From this 0-s baseline, we parametrically increase the delay length up to 10 s. In a separate experiment, we parametrically manipulate the length of the inter-trial interval (ITI). This permits us to assess the decay rate of the trial-history effect in the absence of intervening trials – clarifying its potential functional and biological implementation. Finally, we pursue a novel formal unification of the serial dependence phenomenon with mathematical models of working memory34,35,37,38,39,40,41. This sets the stage for future experiments to dissect the neural mechanisms of serial dependence in the context of ongoing research into the organization of the working memory system32.


Experiment 1: Manipulation of visual working memory delay

Participants completed a spatial delayed response task, depicted in Fig. 1. For Experiment 1, the length of the working memory delay period in this task was varied randomly from trial to trial (0, 1, 3, 6, or 10 s). Collapsing across these five delay conditions, we identified serial dependence in the group dataset significantly greater than zero (\(p < {10}^{-4}\), group permutation test; peak-to-peak = 1.67°; bootstrapped 95% confidence interval = [1.48°, 1.85°]). To do this analysis, we measured the magnitude of serial dependence as the peak-to-peak of the curve fit to the pattern of errors across all possible differences between current and previous stimulus location (see Methods). The peak-to-peak is a measure of the maximal pull of responses away from the correct stimulus feature value as a result of this trial-history bias. Previous studies have used similar measures of amplitude to quantify serial dependence1,2,8,9,16. No bias was present in the data in the direction of the stimulus on the upcoming trial (\(n.s.\), group permutation test; peak-to-peak \(=-0.14^\circ \); bootstrapped 95% confidence interval = [−0.59°, 0.15°]), which supports the conclusion that the dependence of behavior on the previous trial is not due to spurious correlations in the particular randomized sequences of stimuli generated for the subjects10,14.
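The quantities entering this analysis can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names are hypothetical, and the sign convention (previous minus current stimulus, with positive errors in the same rotational direction) is an assumption. Under that convention, an error with the same sign as the relative location of the previous stimulus indicates attraction, and the opposite sign indicates repulsion.

```python
def wrap_deg(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def trial_history_table(stimuli, responses):
    """Pair each trial's response error with the relative location of the
    previous trial's stimulus (both in degrees, wrapped to [-180, 180)).

    The first trial is skipped because it has no predecessor.
    """
    rows = []
    for i in range(1, len(stimuli)):
        rel_prev = wrap_deg(stimuli[i - 1] - stimuli[i])  # assumed sign convention
        error = wrap_deg(responses[i] - stimuli[i])
        rows.append((rel_prev, error))
    return rows


# Made-up angular locations (degrees) for three consecutive trials.
stimuli = [10.0, 350.0, 40.0]
responses = [12.0, 351.5, 38.0]
table = trial_history_table(stimuli, responses)
# Both rows show errors with the same sign as rel_prev, i.e. attraction.
```

The same routine applied to the upcoming rather than the previous stimulus (index `i + 1` in place of `i - 1`) would give the control analysis described above.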

Figure 1

The events in each trial of the generic spatial judgment task used for Experiments 1 and 2 (not to scale, see Methods for exact dimensions). Stimuli were presented in black against a gray background. Participants maintained fixation at the central square whenever it was on the screen (all task stages aside from the response period). Each trial started with the presentation of the cue whose location needed to be remembered for either a variable (Experiment 1) or fixed (Experiment 2) delay. Upon the disappearance of the central square at the end of the delay, the mouse cursor appeared at the exact center of the screen (not shown), and subjects used the mouse to make their response. Responses were not timed. Immediately after the response was made, the fixation square returned for a fixed (Experiment 1) or variable (Experiment 2) inter-trial interval.

Next we examined each of the delays individually. The magnitude of serial dependence across memory delays from 0–10 s is plotted in Fig. 2a. When participants reported the location of the stimulus immediately after viewing it, presumably relying at least in part on residual neural activity associated with perception, their responses showed signs of sensory adaptation, an effect that is opposite in direction from serial dependence (\(p < 10^{-4}\), group permutation test; peak-to-peak \(= -1.72^\circ\); bootstrapped 95% confidence interval \(= [-2.30^\circ, -1.09^\circ]\); Fig. 2b). In contrast, for every other delay tested, serial dependence was significantly greater than zero (all \(p < 0.01\), group permutation tests). Moreover, the magnitude of serial dependence increased from 0–1 s (\(p < 10^{-4}\), group permutation test; peak-to-peak at 1 s \(= 0.85^\circ\); bootstrapped 95% confidence interval \(= [0.48^\circ, 1.20^\circ]\)) and again from 3–6 s (\(p < 10^{-3}\); peak-to-peak at 6 s \(= 3.37^\circ\); bootstrapped 95% confidence interval \(= [2.88^\circ, 3.84^\circ]\)) before asymptoting between 6 and 10 s (\(n.s.\); peak-to-peak at 10 s \(= 2.86^\circ\); bootstrapped 95% confidence interval \(= [2.28^\circ, 3.41^\circ]\)). Serial dependence was numerically strongest in the 6-s condition, shown in Fig. 2c. Here, the peak-to-peak is visible as the distance along the y-axis between the maximal and minimal values of the model fit to the data. We note that the asymptote in serial dependence between 6 and 10 s does not correspond to an asymptote in the accumulation of noise in working memory. Consistent with a recent theoretical study and reanalysis of empirical data47, we observed a sublinear increase in the variance of responses, which in the case of our data continued up to the 10 s delay (bootstrapped 95% confidence interval at 6 s \(= [41.28^{\circ 2}, 44.92^{\circ 2}]\), at 10 s \(= [50.22^{\circ 2}, 54.40^{\circ 2}]\); Fig. 3).

Figure 2

(a) Magnitude of serial dependence in the group data for each delay period tested in Experiment 1. Serial dependence is measured as the peak-to-peak of a least squares fit of the derivative of Gaussian (DoG) tuning function to the data. Error bars represent bootstrapped 95% confidence intervals. The magnitude of the serial dependence increases during the first 6 s of the delay period, and goes from significantly negative (indicative of sensory adaptation) to significantly positive between 0 and 1 s. (b) Tuning of serial dependence across all possible angular differences between the current and previous stimulus, for the 0-s delay condition. The thin black line represents the group moving average of response errors, with the standard error in gray shading. The thick black line is the best-fitting DoG curve, and the orange line depicts the best fit of an alternative model – the Clifford model (see Methods) – which cannot capture the pattern of sensory adaptation in this condition. Although its positive and negative peaks are asymmetrical, adaptation is significantly stronger than chance at 0 s, with a peak-to-peak of \(-1.72^\circ \). (c) Tuning of serial dependence for the 6-s delay condition. Here, serial dependence is significantly more positive than chance, and the peak-to-peak of the DoG fit – indicated by the blue double-headed arrow in the figure – is 3.37°. Note that in this condition both the DoG and Clifford models capture the amplitude of the effect equivalently well.
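To make the peak-to-peak measure concrete, the sketch below fits a DoG by least squares and reads off its signed peak-to-peak. The exact parameterization used in the study is given in its Methods and is not reproduced here; the form below, \(y = a\,w\,c\,x\,e^{-(wx)^2}\) with \(c = \sqrt{2}/e^{-0.5}\), is a commonly used one and is assumed for illustration. With this normalization the curve peaks at \(\pm a\), so the peak-to-peak is \(2a\), and a negative value corresponds to repulsion. The grid search over the width `w` and all function names are likewise illustrative choices.

```python
import math

# Normalizing constant chosen so that the DoG's maximum equals the amplitude `a`.
C = math.sqrt(2.0) / math.exp(-0.5)


def dog(x, a, w):
    """Derivative-of-Gaussian tuning curve (assumed parameterization)."""
    return a * w * C * x * math.exp(-(w * x) ** 2)


def fit_dog(xs, ys, widths):
    """Least-squares DoG fit by grid search over the width parameter.

    For a fixed width the model is linear in `a`, so the best amplitude
    has a closed form. Returns (a, w, sum_of_squared_errors).
    """
    best = None
    for w in widths:
        basis = [dog(x, 1.0, w) for x in xs]
        denom = sum(b * b for b in basis)
        if denom == 0.0:
            continue
        a = sum(b * y for b, y in zip(basis, ys)) / denom
        sse = sum((y - a * b) ** 2 for b, y in zip(basis, ys))
        if best is None or sse < best[2]:
            best = (a, w, sse)
    return best


# Synthetic, noise-free errors generated from a known DoG (amplitude 1.5 deg).
xs = [float(x) for x in range(-180, 181, 5)]
ys = [dog(x, 1.5, 0.02) for x in xs]
widths = [0.005 * k for k in range(1, 21)]

a_hat, w_hat, _ = fit_dog(xs, ys, widths)
peak_to_peak = 2.0 * a_hat  # signed: positive = attraction, negative = repulsion
```

In practice a continuous optimizer would replace the grid search, and the fit would be applied to noisy response errors rather than to synthetic values.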

Figure 3

Variance of response errors as a function of the current trial’s delay in Experiment 1. The thin black line depicts the group mean, with bootstrapped 95% confidence intervals in gray shading. The thick black line is the linear best fit, which is a mismatch to the sublinear increase of variance with delay. A better fit is achieved with a power law (modified to allow for non-zero variance in the 0-s condition), depicted in orange. The power law is \(y\sim {(x+t)}^{\beta }\), with \(\beta =0.47\).
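As a brief illustration of what "sublinear" means for this power law, the sketch below evaluates \(y = s\,(x+t)^{\beta}\) at the five delays and compares the rate of increase between successive delays. Only \(\beta = 0.47\) is taken from the fit reported here; the scale `SCALE` and offset `T` are hypothetical values chosen for the example.

```python
def power_law_variance(delay_s, scale, t, beta=0.47):
    """Variance predicted by scale * (delay + t) ** beta.

    The offset t allows non-zero variance at a delay of 0 s.
    """
    return scale * (delay_s + t) ** beta


delays = [0.0, 1.0, 3.0, 6.0, 10.0]  # delay conditions in seconds
SCALE, T = 20.0, 1.0                 # hypothetical values, not fitted estimates

variances = [power_law_variance(d, SCALE, T) for d in delays]

# Increase in variance per second between successive delay conditions.
slopes = [
    (variances[i + 1] - variances[i]) / (delays[i + 1] - delays[i])
    for i in range(len(delays) - 1)
]
```

Because \(0 < \beta < 1\), each successive slope is smaller than the last: variance keeps growing up to 10 s, but more slowly than a straight line would predict.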

The large sample of participants from whom we collected data enabled us to assess the nature and range of individual differences in the pattern of trial-history effects that we observed at the group level. Adaptation and serial dependence are subtle effects – just a few degrees in magnitude at their peaks – whose tuning can be measured accurately only with many trials. Hence, we were statistically underpowered to detect differences between delay conditions for each individual subject (though we provide results divided by delay for a few sample subjects as Supp. Fig. 1). Instead, we collapsed over delay conditions for the purposes of evaluating which (if any) trial-history effect dominated throughout all time points in perception and working memory for each participant. The results are displayed in Fig. 4. Participants fell into three categories. Seven subjects showed evidence of strong repulsive adaptation that dominated across time points (all \(p < 0.05\), permutation tests; all bootstrapped 95% confidence intervals \( < \,0\)). One of these, whose repulsive bias was strongest (peak-to-peak \(=-5.08^\circ \)), is presented in Fig. 4b (and Supp. Fig. 1A). Another 11 subjects had data that showed weak and/or noisy variation as a function of the previous stimulus’ location (all \(n.s.\)). The remaining majority of subjects (\(n=20\)) displayed visibly apparent and statistically significant attractive serial dependence (all \(p < 0.05\); all bootstrapped 95% confidence intervals \( > \,0\)). However, among these, there was noticeable diversity in the tuning of the effect. Figure 4c shows the tuning over stimulus differences for one subject with a low-amplitude (peak-to-peak \(=\,1.57^\circ \)) and narrow attractive serial dependence surrounded by negative “peripheral bumps”9 (where the bias changes direction to repulsion when consecutive stimuli are far apart). In contrast, the participant highlighted in Fig. 4d (and Supp. Fig. 1B) evinced the canonical shape of the effect – a broad spread of the attractive effect (peak-to-peak \(=\,5.11^\circ \)) and less prominent peripheral bumps.

Figure 4

(a) Magnitudes of serial dependence observed for the individual participants tested in Experiment 1. For all but three individuals, serial dependence was measured as the peak-to-peak of the DoG fit to the data. The DoG was a qualitatively poor fit to the remaining participants (e.g., Fig. 4c), due to the prominence of peripheral bumps in their serial dependence tuning functions, which the DoG cannot capture. (The term “peripheral bumps” refers to repulsion at large differences between consecutive stimuli, in the same condition in which attraction occurs at small differences.) These participants are colored orange in this plot. Letters in this plot refer to the subfigures that follow. Significant negative adaptation and positive serial dependence (\(p < 0.05\), permutation tests) are labeled with asterisks. Error bars are bootstrapped 95% confidence intervals. (b) Tuning of sensory adaptation across all possible angular differences between the current and previous stimulus, for the subject whose adaptation was strongest (peak-to-peak \(=-5.08^\circ \)). The thin black line represents the group moving average of response errors, with the standard error in gray shading, and the thick black line is the best-fitting DoG curve, which fits the data as well as the Clifford model (in orange). (c) Tuning of serial dependence for a subject with a non-canonical pattern of the effect (peak-to-peak \(=\,1.57^\circ \)). The peripheral bumps are high in amplitude and wide relative to the narrow central attractive bias. The Clifford model in orange captures the positive and negative peaks of the effect well (even while the widths are misestimated), whereas the DoG mischaracterizes the bias as adaptation (negative peak-to-peak). (d) Tuning of serial dependence for a subject with strong, canonical serial dependence (peak-to-peak \(=\,5.11^\circ \)). Here, the central peaks of serial dependence are wider and higher-amplitude than the peripheral bumps, and both the DoG (in black) and Clifford model (in orange) capture the magnitude of the effect well, though the DoG misses the peripheral repulsion.

The time course of serial dependence we observed at the group level over the current trial’s delay period was not reproduced when trials were sorted based on the previous trial’s delay period. For each of the possible preceding delay period lengths, serial dependence in the current trial’s response was significantly greater than zero (all \(p < 0.01\), group permutation tests; Fig. 5a). Between 0 and 6 s (of delay on the previous trial), serial dependence varied little (all comparisons \(n.s.\), group permutation tests; minimum peak-to-peak at 6 s \(=\,1.78^\circ \); maximum peak-to-peak at 3 s \(=\,2.21^\circ \); all bootstrapped 95% confidence intervals overlapping). However, when the previous delay was as long as 10 s, serial dependence was significantly reduced relative to each of the other delay lengths (all \(p < 0.005\); peak-to-peak at 10 s \(=\,0.91^\circ \); bootstrapped 95% confidence interval \(=\,[0.50^\circ \mathrm{,1.20}^\circ ]\)). Tuning for the conditions with the strongest (3 s) and weakest (10 s) serial dependence are displayed in Fig. 5b and c, respectively. This set of findings is partially consistent with results from another study that tested a narrower range of delays (0.8 – 3.2 s) in non-human primates16. In this earlier study, it was found that the amplitude of serial dependence remained constant over this range (of previous delay length). Here, we extend this result to show that the previous trial’s influence does eventually decay when the previous trial’s delay is especially long.

Figure 5

(a) Peak-to-peak of serial dependence in the group data for each length of the previous trial’s delay tested in Experiment 1. The peak-to-peak was calculated using a least squares fit of the Clifford tuning function to the data. Statistics could not be computed reliably using the DoG function due to its inability to capture the “peripheral bumps” of serial dependence, which were prominent when the data were sorted by the length of the previous delay period (see Supp. Fig. 2). Error bars represent bootstrapped 95% confidence intervals. Serial dependence is constant between 0 and 6 s, but then drops in magnitude between 6 and 10 s. (b) Tuning of serial dependence across all possible angular differences between the current and previous stimulus, for responses that followed trials with a delay length of 3 s. The thin black line represents the group moving average of response errors, with the standard error in gray shading, and the thick orange line is the best-fitting Clifford curve (2.21° peak-to-peak). The DoG fit (thick black line) misses the large-amplitude peripheral bumps at the extremes of the x-axis in this plot. (c) Tuning of serial dependence when the preceding trial’s delay period was 10 s. Here, the peak-to-peak of the Clifford fit is 0.91°. The attractive peaks of serial dependence in this condition are clearly reduced relative to Fig. 5b, but the peripheral bumps are just as prominent. The best DoG fit reasonably approximates the magnitude of the attractive effect, but its failure to account for the peripheral bumps causes resampling and permutation statistics to be unstable (Supp. Fig. 2).

In line with past work on the sources of error in working memory9,33, our findings suggest that the increasing serial dependence over longer delays (in the current trial) reflects its association with mnemonic processes rather than perceptual and motor processes that were held constant in our experiment. Research over the last decade has yielded several mathematical models designed to isolate distinct sources of error in working memory34,35,37,38,39,40,41. None of these include parameters for the proactive interference that serial dependence represents32. Also, the only substantive difference between the models is their characterization of noise in the distribution of behavioral responses. As a form of systematic error, serial dependence is separable from noise, and so can be incorporated into any of these models without changing their definitions or differences. The simplest model (sometimes called the “equal precision” model37,39) fits random error with a single von Mises distribution34. In contrast, the “variable precision” model assumes the standard deviation parameter of the von Mises varies from trial to trial according to a gamma distribution37,38. A third model explicitly regards the precision of working memory as arising from noise in Poisson-distributed spike trains of individual neurons40,41. Errors in this model are distributed according to a von Mises random walk41. We will refer to these three working memory models as EP (equal precision), VP (variable precision), and VMRW (von Mises random walk).

As a first pass, we fit each of these models to the behavioral data from Experiment 1. Model comparison on the basis of the corrected Akaike Information Criterion (AICc) revealed that the VMRW model fit the data about as well as the VP model (Δ AICc \(=\,3.9\pm 5.3\) in favor of VMRW). Both of these models fit the data better than the EP model (\({\rm{\Delta }}\) AICc \(=\,40.2\pm 9.6\) in favor of VMRW; \({\rm{\Delta }}\) AICc \(=\,36.2\pm 8.1\) in favor of VP). This relative performance is consistent with published comparisons of the three models using behavioral data from other working memory tasks37,38,39,40.
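For readers unfamiliar with the criterion, the standard small-sample correction is \(\mathrm{AICc} = 2k - 2\ln L + \frac{2k(k+1)}{n-k-1}\), where \(k\) is the number of free parameters, \(n\) the number of observations, and \(L\) the maximized likelihood. The sketch below computes it and a pairwise difference; the log-likelihoods, parameter counts, and trial count are invented for illustration and do not correspond to the fits reported here.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Corrected Akaike Information Criterion (lower is better)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined unless n_obs > n_params + 1")
    aic = 2.0 * n_params - 2.0 * log_likelihood
    correction = (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)
    return aic + correction


# Hypothetical maximized log-likelihoods for two models fit to the same 500 trials.
simple_model = aicc(log_likelihood=-1250.0, n_params=1, n_obs=500)
richer_model = aicc(log_likelihood=-1228.0, n_params=2, n_obs=500)

# Positive values favor the richer model despite its extra parameter.
delta_aicc = simple_model - richer_model
```

The penalty terms are what allow a model with more parameters to lose the comparison when those parameters capture little variance.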

Next, we created a hybrid model that incorporates serial dependence into the mean of the VMRW distribution – sliding the mean clockwise or counterclockwise on each trial by the magnitude dictated by the tuning of the history effect (see Methods). This hybrid model significantly outperformed the base VMRW model (\({\rm{\Delta }}\) AICc \(=\,29.2\pm 7.9\)). However, this result on its own falls short of confirming that the serial dependence tuning function is needed to quantify the influence of the history effect on each trial. To verify that inclusion of the tuning curve visible in Figs 2 and 46 is needed for the improvement in fit, we developed an alternative hybrid “memory confusion”2 model that takes trial history into account in a different way. In this model, it is assumed that on a subset of trials, subjects simply mix up which stimulus belongs to the current trial and report the previous trial’s location when probed (analogous to a “swap”36,43 over time rather than space). This “memory confusion” model provided no benefit over the base VMRW model and in fact worsened the fit, because the added parameters captured little variance (\({\rm{\Delta }}\) AICc \(=-7.7\pm 2.7\)).
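The difference between the two ways of accounting for trial history can be sketched in code. For brevity, this illustration substitutes a single von Mises noise distribution (as in the EP model) for the VMRW distribution, and it assumes a DoG-shaped bias with the parameterization \(a\,w\,c\,x\,e^{-(wx)^2}\); neither choice is taken from the Methods, and all function names are hypothetical. Angles are in radians. In the first likelihood the distribution's mean is shifted on every trial by the tuned bias; in the second, responses are a mixture of reports centered on the current target and reports centered on the previous stimulus.

```python
import math

C = math.sqrt(2.0) / math.exp(-0.5)  # makes the DoG peak equal to its amplitude


def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0 (power series)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-16 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total


def von_mises_logpdf(theta, mu, kappa):
    """Log density of a von Mises distribution with mean mu, concentration kappa."""
    return kappa * math.cos(theta - mu) - math.log(2.0 * math.pi * bessel_i0(kappa))


def dog_bias(rel_prev, a, w):
    """Assumed DoG-shaped bias as a function of the previous stimulus' relative location."""
    return a * w * C * rel_prev * math.exp(-(w * rel_prev) ** 2)


def loglik_shifted(errors, rel_prevs, kappa, a, w):
    """Tuning-function hybrid: the noise distribution's mean slides by the bias."""
    return sum(
        von_mises_logpdf(e, dog_bias(d, a, w), kappa)
        for e, d in zip(errors, rel_prevs)
    )


def loglik_confusion(errors, rel_prevs, kappa, p_swap):
    """Memory-confusion hybrid: with probability p_swap the previous location is reported."""
    total = 0.0
    for e, d in zip(errors, rel_prevs):
        on_target = math.exp(von_mises_logpdf(e, 0.0, kappa))
        on_previous = math.exp(von_mises_logpdf(e, d, kappa))
        total += math.log((1.0 - p_swap) * on_target + p_swap * on_previous)
    return total


# Synthetic errors lying exactly on a DoG bias curve (amplitude 0.05 rad).
rel_prevs = [-1.0, -0.5, 0.5, 1.0]
errors = [dog_bias(d, 0.05, 1.0) for d in rel_prevs]

ll_with_bias = loglik_shifted(errors, rel_prevs, kappa=20.0, a=0.05, w=1.0)
ll_no_bias = loglik_shifted(errors, rel_prevs, kappa=20.0, a=0.0, w=1.0)
```

Setting `a = 0` or `p_swap = 0` recovers the base model in each case, which is why the comparison against the base model isolates the contribution of the history terms.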

Figure 6

(a) Peak-to-peak of serial dependence in the group data for each ITI tested in Experiment 2. The peak-to-peak was calculated using a least squares fit of the DoG tuning function to the data. Error bars represent bootstrapped 95% confidence intervals. Serial dependence decreases in magnitude as the ITI lengthens, and then flips to significant repulsive sensory adaptation at an ITI of 10 s. (b) Tuning of serial dependence across all possible angular differences between the current and previous stimulus, for the 1-s ITI condition. The thin black line represents the group moving average of response errors, with the standard error in gray shading, and the thick black line is the best-fitting DoG curve (2.59° peak-to-peak). In orange is the best fit of the Clifford model, which captures the amplitude of the effect as well as the DoG. (c) Tuning of sensory adaptation for the 10-s ITI condition. Here, the peak-to-peak of the DoG fit is −1.38°. The Clifford model cannot conform to the narrow width of the effect in this condition and underestimates its magnitude.

Finally, we tested whether the addition of the serial dependence tuning function to all three of the base models would change the order of performance among them. Whereas the VMRW and VP models performed equivalently without taking account of serial dependence, we found that the extended VMRW model outperformed the extended VP model when fit to our data (\({\rm{\Delta }}\) AICc \(=\,12.2\pm 4.1\)). The fit of the extended EP model was still worse than that of the VP model with serial dependence terms (\({\rm{\Delta }}\) AICc \(=\,14.7\pm 5.2\)). Hence, the overall best-fitting model to our data was the VMRW model with added terms for the DoG-shaped serial dependence effect.

Experiment 2: Manipulation of baseline interval between trials

It is possible that the delay manipulation in Experiment 1 confounded two variables: (1) the time for which subjects must hold the current item in memory and (2) the time that has elapsed since the behavioral response on the previous trial, before the current trial’s response. To assess whether the time course of serial dependence we observed (Fig. 2a) corresponds to mnemonic processes and not the simple passage of time, we conducted a second experiment in which the inter-trial interval (ITI) varied randomly among 1, 3, 6, and 10 s. The delay in this new task was held constant at 3 s. In all other respects, the tasks for the two experiments were identical.

Collapsing across ITIs, we identified serial dependence in the group dataset significantly greater than zero (\(p < {10}^{-4}\), group permutation test; peak-to-peak \(=\,1.62^\circ \); bootstrapped 95% confidence interval \(\,=\,\mathrm{[1.38}^\circ ,\,1.86^\circ ]\)). As for Experiment 1, there was no bias in the data in the direction of the stimulus on the upcoming trial (\(n.s.\), group permutation test; peak-to-peak = 0.23°; bootstrapped 95% confidence interval = \([-0.15^\circ ,0.80^\circ ]\)), an important control10,14. This pair of results replicates our finding from Experiment 1 of serial dependence in this spatial delayed response task, using an independent dataset.

Next we examined each of the ITIs individually. The magnitude of serial dependence across ITIs from 1–10 s is plotted in Fig. 6a. The magnitude of serial dependence decreases gradually during the interval between trials, marginally from 3–6 s (\(p=0.01\), group permutation test, Bonferroni-corrected \(\alpha =0.008\); lower bound of bootstrapped 95% confidence interval at 3 s \(=\,2.48^\circ \); upper bound of confidence interval at 6 s \(=\,1.81^\circ \)) and significantly from 6–10 s (\(p < {10}^{-4}\); lower bound of bootstrapped 95% confidence interval at 6 s \(=\,1.00^\circ \); upper bound of confidence interval at 10 s \(\,=\,0.48^\circ \)). The difference in serial dependence between the 1-s (Fig. 6b) and 3-s ITIs was statistically non-significant. The slope of this time course is opposite that obtained in Experiment 1, strengthening our conclusion that increased serial dependence with increased delay length is due to the prolongation of memory demands rather than the mere passage of time. For the largest ITI (10 s), participants’ responses on the trial after the ITI were repelled away from the preceding trial’s stimulus, an effect consistent with sensory adaptation (\(p=0.006\); peak-to-peak = −1.38°; bootstrapped 95% confidence interval \(=\,[-2.15^\circ ,-0.48^\circ ]\); Fig. 6c). In contrast, for every other ITI tested, serial dependence was significantly greater than zero (all \(p\le {10}^{-3}\), group permutation tests).


In everyday visual experience, humans rely not just on moment-to-moment perception but also on continued maintenance of information in working memory to navigate their environments and accomplish tasks. While there is much evidence to suggest that working memory recruits the same cortical areas active during sensory perception48,49,50,51,52,53,54,55,56, remembered visual content differs in quality33,57,58 – and potentially representational format59,60 – from feedforward signals driven by the presence of an external stimulus. Both behavioral data33,57,58 and computational theory61 have implied that passage of visual percepts into memory makes them less precise. This past work has also claimed that mnemonic processes do not attach to percepts any accumulating systematic bias – just random noise due to drift and/or decay33,57,58,61. With the experiments reported here, we provide new evidence to disconfirm this view. Serial dependence – a systematic bias in the direction of the preceding trial’s stimulus – is absent from percepts until the working memory system is engaged. Our demonstration of repulsive adaptation – with no attractive serial dependence – in the perception condition extends previous work9 by showing that this oppositely valenced effect that precedes working memory does not require that subjects make a comparison between two simultaneously presented stimuli9; adaptation occurs in the context of the same delayed response task that yields serial dependence when memory demands are increased.

By testing a wider range of delays between stimulus and response than used in previous studies9,16,17, we were able to chart the time course of serial dependence in visual working memory. This technique – of probing participants to report the contents of memory at variable points in time after stimulus offset – is common in visual psychophysics62. It has revealed how information passing through the visual system progresses from a rich perceptual code to a more impoverished mnemonic one. For a few hundred milliseconds after visual input ceases, a great deal of perceptual detail is still accessible to the observer in iconic memory – a form of storage intermediate between perception and working memory63. After that, within one second of delay, capacity-limited, distraction-resistant working memory comes online in parallel with a larger-capacity system that is vulnerable to distraction – fragile memory45,46,64,65,66,67. Our experiments demonstrate that the residual sensory trace associated with iconic memory is free of serial dependence – though it does carry the opposite, repulsive bias associated with sensory adaptation. The attractive bias arises slowly in the later short-term memory systems, but asymptotes before long-term storage processes are engaged (at approximately 20 seconds of delay58). Future research may resolve with finer resolution the exact moment at which serial dependence appears and whether it is most strongly associated with fragile or distraction-resistant stages of working memory. (Consistent with most work in this area68,69, we have tended to use the term “working memory” as a shorthand for both of these systems.)

Our results indicate that not only do the relative strengths of serial dependence and adaptation differ over time (between perception and working memory) – they also differ across individuals (Fig. 4). This suggests that serial dependence may appear in behavior sooner – perhaps as early as the perceptual period – in individuals for whom adaptation processes are especially weak (Supp. Fig. 1). By the same reasoning, experimental manipulations employed to diminish the strength of adaptation might help reveal an earlier onset of weak, underlying positive serial dependence in most subjects. However, as it can take as little as 50 ms for a viewed stimulus to be consolidated into working memory70, it seems unlikely that future investigations conducted at finer temporal resolution will identify robust serial dependence at time points that definitively exclude the involvement of working memory – especially given that positive serial dependence is weaker the sooner the response is made to a stimulus. (We note that subjects are likely to have engaged working memory encoding processes even in our 0-s delay condition, where serial dependence was not consistently observed. Our argument is that consolidation into working memory is likely necessary, but not always sufficient – if maintenance times are short – for the occurrence of the effect.) The measurement of neural signals associated with serial dependence may be needed to definitively disambiguate whether the effect originates in low-level sensory cortex immediately upon sensory perception or hundreds of milliseconds later, after sensory input has propagated to higher-level areas involved in the maintenance of short-term perceptual memories32.

Beyond demonstrating that serial dependence accumulates for longer in working memory than previous studies have indicated9,16,17, we have taken strides to integrate this phenomenon into the study of working memory in ways it has not been before32. Specifically, we have made concrete, formal improvements to prominent mathematical models designed to characterize the psychological architecture of working memory. The provision of terms for serial dependence to these models allows them to capture more variance in behavioral data and ensures that the variance associated with the temporal smoothing operation of serial dependence does not distort estimates of the models’ other parameters. Furthermore, our results demonstrate that the inclusion of these terms can reveal performance differences between seemingly equivalent models – we found that the VMRW model performs better than the VP model only after the tuning of the trial-history effect is taken into account. Claims that have been made about the nature of decay rates in working memory without consideration of trial-history biases must now be reëvaluated. For example, one study that modeled behavioral responses following different delay period lengths concluded that maintained representations are susceptible to spontaneous complete erasure from working memory as the delay length increases (measured as an increase in guess rates), but not to subtle degradations in precision (measured with the \(\kappa \) parameter of a variation on the EP model)42. However, because this study ignored potential serial dependencies in the data, as well as alternative models of noise (e.g., VP and VMRW), the validity of this conclusion is unclear. It is impossible to address claims about total loss of information from memory with our data, because guess rates in our simple one-item spatial task were near zero. 
In the future, however, the hybrid models we have developed that incorporate serial dependence may help elucidate the nature of working memory storage in more difficult multi-item tasks44. To what extent serial dependencies occur when multiple items are held in mind at once is an open question that the hybrid models we have validated can help answer.

Our experiments have filled other gaps in the field’s understanding of the temporal properties of serial dependence. We have determined the approximate duration for which this effect persists between trials. At least in spatial working memory, the attractive bias disappears within ten seconds after the end of each trial, and is replaced by (or exposes a persistent) low-amplitude adaptation. This constrains possible neural theories of serial dependence – viable mechanisms must have time constants on the order of 10 s, which rules out especially short-term (e.g., synaptic facilitation) and long-term (e.g., long-term potentiation) forms of plasticity. Previous attempts to measure the washout period of serial dependence in humans have used a short, fixed ITI, preventing the measurement of pure time in the absence of intervening trials1,14. One experiment using non-human primate subjects did report a decrease in serial dependence between 2 and 7 s of ITI16. Over this range, the effect remained above baseline for two of three subjects, and no crossover to adaptation was observed. We have also shown that serial dependence weakens when the stimulus on the previous trial is maintained for as long as 10 s. It is possible that the neural code changes abruptly around this time point: for example, elevated neural firing keeping the representation active may begin to fail spontaneously (as happens in some neural-network models71), leaving the representation in an “activity silent” state supported by short-term plasticity72,73. Exponential decay of this synaptic trace (without continued active firing to keep it in place) may explain the reduced influence on responses on the subsequent trial.

Reframing serial dependence as a phenomenon of working memory rather than perception does not change the theories that have been put forth about its functional importance32. Thus, it remains an important mechanism for stabilizing representations against interruptions in visibility1,8. The contents of working memory track the focus of attention25,26,27,28,29,30,31, which, during the execution of a single goal, can remain the same for several seconds, even as the raw visual input that impinges on the retina fluctuates due to saccades, occlusion, and changes in lighting. Hence, temporal autocorrelation in visual working memory is potentially even higher than it is in visual scenes (and perception). If true, this would explain why serial dependence may have evolved in working memory as opposed to perceptual circuits – more autocorrelation enhances the ability of temporal smoothing to limit the influence of noise and boost signal. Moreover, the offloading of attractive serial dependence to memory systems may accord perceptual systems enhanced capacity to specialize in novelty detection, in part via adaptation. More research is needed to elucidate the ways in which serial dependence and adaptation interact, and to reveal the ecologically valid situations in which one or the other (perhaps both at the same time) enhance visual performance32. Such continued study should aim to clarify the mechanisms and functional consequences of the striking diversity we observed in the strength and tuning shape of these effects across individuals.



Fifty-five adults (34 female) from the UC Berkeley community were recruited to participate in this study. Thirty-five of these individuals completed Experiment 1 only, fourteen completed Experiment 2 only and six completed both experiments. All aspects of data collection and analysis were conducted in accordance with guidelines approved by the Committee for the Protection of Human Subjects at UC Berkeley. Informed consent was obtained from all subjects, and they were compensated monetarily for their time.

Experimental Procedures

Participants completed the protocol in a soundproof, dimly lit testing room. For both experiments, they completed a spatial delayed response task, depicted in Fig. 1 (adapted from44). The task was programmed in MATLAB using the Psychophysics Toolbox74 (version 3) and run on a Mac mini (OSX El Capitan 10.11). For eight subjects in Experiment 1, a 17-in monitor was used with a screen resolution of 1280 × 1024 pixels. The remaining sessions were run with a 23-in monitor, 1920 × 1080 pixels. Results with regard to serial dependence were not appreciably different between the two groups that used different monitors (Supp. Fig. 3). All participants were seated such that their eyes were approximately 60 cm from the center of the testing display.

The stages of the generic task used for both experiments are as follows (with angle measurements reported in degrees of visual angle). Each trial began with the presentation of a black circle for 1 s at a random polar angle from fixation, with eccentricity fixed at \(12^\circ \). The circle’s diameter was \(1^\circ \). All stimuli were displayed against a gray background. Participants were instructed to fixate a central black square – which spanned \(0.5^\circ \) × \(0.5^\circ \) – whenever it was on the screen (all stages of the task aside from the response period). In Experiment 1, participants remembered the location of the presented circle for a delay that varied randomly from trial to trial (0, 1, 3, 6, or 10 s). The delay was always 3 s in Experiment 2. At the end of the delay, the fixation square was replaced with the mouse cursor (at the exact center of the screen), and participants indicated the location in mind by moving the cursor to that location and clicking once. No feedback was given. Errors were measured in degrees of polar angle. In Experiment 1, a 1-s ITI followed the response period, before the start of the next trial. The ITI varied randomly from trial to trial in Experiment 2 (1, 3, 6, or 10 s). Each participant completed 1,000 trials (200 per delay) in Experiment 1, divided into 40 blocks over the course of one or two experimental sessions. All but two participants completed 1,008 trials (252 per ITI) in Experiment 2. The remaining two participants completed 999 and 1,017 trials, respectively.

Data Analysis

The data were analyzed using Python, MATLAB, and shell scripts. All code written for this study is available in a public Git repository (

Before model fitting for trial-history effects, the data were submitted to preprocessing. First, trials with responses that were within 5° of visual angle of the origin were dropped, as were trials with responses further than three standard deviations from the participant’s mean error (<0.7% of all trials, across subjects). Next, we computed systematic directional error as the mean response for each stimulus location. This mean was then subtracted from the response on each individual trial (ignoring the location of the previous trial) to obtain the residual error that was used to characterize serial dependence. Replicating the procedure in16, we computed the systematic error by spatially low-pass filtering the responses as a function of stimulus location using the MATLAB function loess. Finally, to ensure that our analyses were restricted to data from participants who performed the task correctly, we removed those with noticeably poor performance. Specifically, we removed subjects with an overall mean absolute error greater than \(10^\circ \) of polar angle. This criterion, though arbitrary, removed only subjects with qualitatively noisy error histograms while retaining those whose errors were roughly normally distributed around the correct value (the expected pattern). Only three subjects in Experiment 1 failed to pass this criterion (mean absolute error \(37.3\pm 19.5\) for these three compared to \(4.7\pm 0.2\) for the others). Data from two subjects in Experiment 2 were excluded (\(53.9\pm 1.8\) for these compared to \(4.7\pm 0.4\) for the others).
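The residual-error step can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, and a circular boxcar average over stimulus location (with an arbitrary `half_width`) stands in for the loess smoother named above.

```python
import numpy as np

def wrap_deg(angle):
    """Wrap angles in degrees to the interval [-180, 180)."""
    return (np.asarray(angle, dtype=float) + 180.0) % 360.0 - 180.0

def residual_errors(stim_deg, resp_deg, half_width=15.0):
    """Remove location-dependent systematic error from each trial's error.

    Assumption: a circular boxcar average replaces loess as the smoother.
    """
    stim = np.asarray(stim_deg, dtype=float)
    err = wrap_deg(np.asarray(resp_deg, dtype=float) - stim)
    systematic = np.empty_like(err)
    for i, s in enumerate(stim):
        # Trials whose stimulus lies within half_width of this one on the circle.
        near = np.abs(wrap_deg(stim - s)) <= half_width
        systematic[i] = err[near].mean()
    return err - systematic

# A constant 5-degree bias at every location leaves no residual error.
stim = np.arange(0.0, 360.0, 10.0)
residual = residual_errors(stim, stim + 5.0)
```

The residuals, rather than the raw errors, are what the trial-history models are then fit to.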

Studies that have modeled the tuning of serial dependence to featural differences between past and current visual stimuli have used the derivative of Gaussian (DoG)1,2,8,9 (or the very similar Gabor function16,17). There is another function in the perception literature, developed by Clifford and colleagues20, that has been used to model sensory adaptation – and that therefore fits serial dependence readily (when multiplied by −1). Overall, these functions fit the data from our Experiment 1 equivalently well (collapsing over delays, \({\rm{\Delta }}\) AICc = \(0.8\pm 1.0\), favoring DoG over Clifford), and this equivalence also holds for Experiment 2 (collapsing over ITIs, \({\rm{\Delta }}\) AICc = \(1.1\pm 1.0\), favoring DoG over Clifford). However, we noted significant differences between the Clifford and DoG models in certain conditions. Occasionally, the attractive bias of serial dependence is accompanied by a repulsion effect (“peripheral bumps”9) when previous visual input is close to maximally different from the input on the current trial (the extremes of the x-axis in Fig. 2c). The DoG cannot account for this reversal of the response bias, so when it is prominent in the data, the best fit of the DoG tends to mischaracterize the true effect size (e.g., Fig. 4c). The Clifford model is a combination of sinusoids of different frequencies designed to capture the peripheral bumps when they appear20. However, when the trial-history effect is narrow over stimulus differences and there are no peripheral bumps, the Clifford model tends to fail (e.g., Fig. 2a). This is because the Clifford model – unlike the DoG – does not have an independent width parameter; shrinking the central width of the Clifford fit requires that the peripheral bumps be increased. To be consistent with previous literature on serial dependence, we use the DoG for all analyses, except in cases where it provides a poor fit to the data, in which case we use the Clifford model (as noted below). The mathematical definitions of both models are reported next.

In this study, differences between past and current visual input ranged between −180 and \(180^\circ \) of polar angle (a complete circle). The DoG is defined as

$$y=xawc{e}^{-{(wx)}^{2}},$$

where \(y\) is the signed error, \(x\) is the relative angle of the previous trial, \(a\) is the amplitude of the curve peaks, \(w\) is the width of the curve, and \(c\) is the constant \(\sqrt{2}/{e}^{-0.5}\).
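As a quick check on how these parameters behave, the sketch below assumes the DoG takes the form \(y=xawc{e}^{-{(wx)}^{2}}\), which is consistent with the variable definitions above. Under that assumption the constant \(c\) normalizes the curve so that its peak height equals \(a\), reached at \(x=1/(\sqrt{2}w)\). The parameter values are illustrative only.

```python
import numpy as np

C = np.sqrt(2.0) / np.exp(-0.5)  # the constant c from the text

def dog(x, a, w):
    """Derivative-of-Gaussian tuning curve: y = x * a * w * c * exp(-(w * x)^2)."""
    x = np.asarray(x, dtype=float)
    return x * a * w * C * np.exp(-(w * x) ** 2)

# Illustrative values: amplitude in degrees, width in 1/degrees.
a, w = 2.0, 0.03
x_peak = 1.0 / (np.sqrt(2.0) * w)   # location of the positive peak
y_peak = float(dog(x_peak, a, w))   # equals a under the assumed form
```

Because the curve is odd in \(x\), the negative peak mirrors the positive one, so the peak-to-peak of the curve is \(2a\) whenever both peaks fall inside the ±180° range.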

The Clifford model is stated as follows:

$$\sin (y+x)=\frac{\sin (x)}{\sqrt{{(s\cos (x)-c)}^{2}+{\sin }^{2}(x)}},$$

where \(s\) is a scaling parameter and \(c\) is a centering parameter.

We used the scipy75 function least_squares (in the optimize module) to find the values of \(a\) and \(w\), in the case of the DoG, or \(c\) and \(s\), in the case of the Clifford model, that minimized the difference, for each \(x\), between the estimated \(y\) and the subject’s actual error. Across all values of \(x\), we take the magnitude of serial dependence (or adaptation) to be the peak-to-peak of \(y(x)\), with the sign adjusted to match the direction of the effect (see Fig. 2c).
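A minimal sketch of this fitting step, run on synthetic data, is given below. The DoG form \(y=xawc{e}^{-{(wx)}^{2}}\), the starting values, and the parameter bounds are assumptions made for illustration and are not taken from the study's repository.

```python
import numpy as np
from scipy.optimize import least_squares

C = np.sqrt(2.0) / np.exp(-0.5)  # constant c from the DoG definition

def dog(x, a, w):
    """DoG tuning curve, y = x * a * w * c * exp(-(w * x)^2), x in degrees."""
    return x * a * w * C * np.exp(-(w * x) ** 2)

def fit_dog(x, err, start=(1.0, 0.02)):
    """Least-squares estimates of (a, w); start and bounds are illustrative."""
    def residuals(params):
        return dog(x, params[0], params[1]) - err
    result = least_squares(residuals, start, bounds=([-20.0, 0.005], [20.0, 0.2]))
    return result.x

def signed_peak_to_peak(a, w):
    """Peak-to-peak of the fitted curve over the full circle, signed like a."""
    grid = np.linspace(-180.0, 180.0, 3601)
    curve = dog(grid, a, w)
    return np.sign(a) * (curve.max() - curve.min())

# Synthetic example: an attractive bias (a = 2 deg) plus response noise.
rng = np.random.default_rng(1)
x = rng.uniform(-180.0, 180.0, size=2000)
err = dog(x, 2.0, 0.03) + rng.normal(0.0, 0.5, size=x.size)
a_hat, w_hat = fit_dog(x, err)
effect = signed_peak_to_peak(a_hat, w_hat)
```

A positive `effect` corresponds to attraction toward the previous stimulus (serial dependence) and a negative one to repulsion (adaptation).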

To determine whether the magnitude of serial dependence was significantly greater than zero, or greater in one condition than in another, we submitted the data to permutation testing at the group level2,9. Specifically, we shuffled the values of \(x\) (current trial’s location relative to the previous trial’s) while leaving in place the corresponding errors. We then fit the DoG to the shuffled dataset. This process was repeated 10,000 times. The \(p\)-values we report are the proportion of permutations that led to equal or higher values for the peak-to-peak of the function fit than the one estimated for the unshuffled data. In the case of a comparison between conditions, we subtracted the null peak-to-peaks for one condition from those for the other, and report the proportion of these differences that had equal or higher values than the empirical difference. The criterion for significance was Bonferroni-corrected for each family of tests.
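The shuffling procedure can be sketched as below. To keep the example fast, the DoG width is held fixed so that the amplitude has a closed-form least-squares solution; this is a simplification, since the study refit both parameters on every permutation. The function names, the assumed DoG form \(y=xawc{e}^{-{(wx)}^{2}}\), and the reduced permutation count are illustrative.

```python
import numpy as np

C = np.sqrt(2.0) / np.exp(-0.5)

def signed_peak_to_peak(x, err, w=0.03):
    """Signed peak-to-peak of a DoG with fixed width w (a simplification).

    With w fixed the amplitude a is a linear regression coefficient, and the
    signed peak-to-peak of the curve is 2 * a.
    """
    basis = x * w * C * np.exp(-(w * x) ** 2)
    return 2.0 * np.dot(basis, err) / np.dot(basis, basis)

def permutation_p(x, err, statistic, n_perm=1000, seed=0):
    """Shuffle x relative to err and count null statistics >= the observed one."""
    rng = np.random.default_rng(seed)
    observed = statistic(x, err)
    null = np.array([statistic(rng.permutation(x), err) for _ in range(n_perm)])
    return np.mean(null >= observed), observed, null

# Synthetic data with a true attractive effect (a = 2 deg).
rng = np.random.default_rng(4)
x = rng.uniform(-180.0, 180.0, size=1000)
basis = x * 0.03 * C * np.exp(-(0.03 * x) ** 2)
err = 2.0 * basis + rng.normal(0.0, 3.0, size=x.size)
p_value, observed, null = permutation_p(x, err, signed_peak_to_peak)
```

For a comparison between two conditions, the same null arrays can be subtracted element-wise and compared against the empirical difference in the same way.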

We computed bootstrapped confidence intervals as follows:2,9 We resampled the data with replacement 10,000 times. We then fit the DoG to each resampled dataset. This yielded a distribution of peak-to-peak values from which we selected the boundaries of the 95% confidence interval – separately for each delay and ITI condition.
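A percentile bootstrap of this kind might look like the sketch below. As a simplification it uses a fixed-width DoG statistic rather than a full two-parameter refit on each resample, and it assumes that trials are the resampling unit; the names and the reduced resample count are hypothetical.

```python
import numpy as np

C = np.sqrt(2.0) / np.exp(-0.5)

def signed_peak_to_peak(x, err, w=0.03):
    """Signed peak-to-peak (2 * a) of a DoG with fixed width w (a simplification)."""
    basis = x * w * C * np.exp(-(w * x) ** 2)
    return 2.0 * np.dot(basis, err) / np.dot(basis, basis)

def bootstrap_ci(x, err, statistic, n_boot=2000, level=0.95, seed=0):
    """Percentile confidence interval from resampling trials with replacement."""
    rng = np.random.default_rng(seed)
    n = x.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices drawn with replacement
        stats[b] = statistic(x[idx], err[idx])
    tail = 100.0 * (1.0 - level) / 2.0
    return np.percentile(stats, [tail, 100.0 - tail])

# Synthetic data with a true attractive effect (a = 2 deg).
rng = np.random.default_rng(5)
x = rng.uniform(-180.0, 180.0, size=1000)
err = 2.0 * x * 0.03 * C * np.exp(-(0.03 * x) ** 2) + rng.normal(0.0, 3.0, size=x.size)
observed = signed_peak_to_peak(x, err)
ci_low, ci_high = bootstrap_ci(x, err, signed_peak_to_peak)
```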

In our statistical analyses of group data, only one condition – the 10-s condition in the analysis of the previous trial’s delay length for Experiment 1 – could not be fit reliably using the DoG, due to large peripheral bumps (Fig. 5c; Supp. Fig. 2). Hence, for this analysis we used the Clifford model, which estimated the peak-to-peak reliably. In our plot of individual subjects’ data (Fig. 4a), the DoG fit (with bootstrapped confidence intervals) is reported for all but three subjects, for whom the Clifford model was a qualitatively superior fit. These three subjects are labeled in the figure, and one is highlighted in Fig. 4c.

Three base mathematical models of working memory – EP34, VP37,38, and VMRW40,41 – were fit to our behavioral data, as described in the Results. Model fitting was done using the MATLAB function fminsearch, separately for each delay condition. EP is defined as

$$p(\hat{s}|s;\kappa )=\frac{{e}^{\kappa \cos (\hat{s}-s)}}{2\pi {I}_{0}(\kappa )},$$

the von Mises probability density function. Here, \(\hat{s}\) is each trial’s response, \(s\) the corresponding stimulus, and \({I}_{0}\) the modified Bessel function of the first kind, order zero. The concentration parameter, \(\kappa \), is a measure of response precision, spanning all trials, and is the model’s one free parameter for fitting.
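The sketch below evaluates this density in log form and recovers \(\kappa \) by maximum likelihood from simulated errors. It assumes errors are expressed in radians and uses scipy's bounded scalar minimizer as a stand-in for the MATLAB fminsearch routine used in the study; the search bounds are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import i0e

def ep_logpdf(err, kappa):
    """Log von Mises density of errors (radians) with concentration kappa."""
    # log I0(kappa) = log(i0e(kappa)) + kappa avoids overflow for large kappa.
    log_i0 = np.log(i0e(kappa)) + kappa
    return kappa * np.cos(err) - np.log(2.0 * np.pi) - log_i0

def fit_ep(err):
    """Maximum-likelihood kappa, searched on a log scale."""
    def nll(log_kappa):
        return -np.sum(ep_logpdf(err, np.exp(log_kappa)))
    best = minimize_scalar(nll, bounds=(-5.0, 10.0), method="bounded")
    return float(np.exp(best.x))

# Simulated errors with a known concentration.
rng = np.random.default_rng(3)
errors = rng.vonmises(0.0, 50.0, size=5000)
kappa_hat = fit_ep(errors)
```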

In VP, precision is drawn anew for each trial from a gamma (\(\gamma \)) distribution with mean \(\bar{J}\) and scale parameter \(\tau \) (the model’s free parameters). Built from EP, this gives

$$p(\hat{s}|s;\bar{J},\tau )=\int EP(\hat{s};s,{\rm{\Phi }}(J))\gamma (J;\bar{J},\tau )dJ,$$

where EP’s concentration parameter \(\kappa \) is a function of \(J\) – here expressed as \({\rm{\Phi }}(J)\) – and \(J\) is formally defined as Fisher information. The analytical relation between \(\kappa \) and \(J\) is \(J=\kappa \frac{{I}_{1}(\kappa )}{{I}_{0}(\kappa )}\), and \({\rm{\Phi }}(J)\) is approximated numerically using this equation. The integral in Equation (4) has no analytical expression and so is also approximated using Monte Carlo simulations37,39.
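One way to carry out both numerical approximations is sketched below: a lookup table inverts \(J=\kappa {I}_{1}(\kappa )/{I}_{0}(\kappa )\), and gamma-distributed precision samples approximate the integral. The grid range, sample count, and function names are assumptions for illustration, and errors are taken to be in radians.

```python
import numpy as np
from scipy.special import i0e, i1e

# Tabulate J(kappa) on a dense grid; the exponential scaling of i0e and i1e
# cancels in the ratio, so this equals kappa * I1(kappa) / I0(kappa).
KAPPA_GRID = np.concatenate([[0.0], np.logspace(-3, 4, 2000)])
J_GRID = KAPPA_GRID * i1e(KAPPA_GRID) / i0e(KAPPA_GRID)

def kappa_from_j(j):
    """Numerically invert J(kappa) by linear interpolation in the table."""
    return np.interp(j, J_GRID, KAPPA_GRID)

def vp_pdf(err, j_bar, tau, n_samples=5000, seed=0):
    """Monte Carlo estimate of the VP density at each error (radians)."""
    rng = np.random.default_rng(seed)
    # Gamma with mean j_bar and scale tau has shape j_bar / tau.
    j = rng.gamma(shape=j_bar / tau, scale=tau, size=n_samples)
    kappa = kappa_from_j(j)[:, None]
    err = np.atleast_1d(err)[None, :]
    log_pdf = (kappa * np.cos(err) - np.log(2.0 * np.pi)
               - (np.log(i0e(kappa)) + kappa))
    return np.exp(log_pdf).mean(axis=0)  # average over sampled precisions

grid = np.linspace(-np.pi, np.pi, 720, endpoint=False)
density = vp_pdf(grid, j_bar=20.0, tau=5.0)
```

Using a fixed seed keeps the likelihood surface deterministic across optimizer iterations, which is often helpful when the objective itself is a Monte Carlo estimate.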

Finally, in VMRW, noise in working memory is distributed according to a von Mises random walk, as derived from a population coding model of cortex41. Specifically, behavioral errors for a random walk of length \(r\) are von Mises distributed:

$$p(\hat{s}|s;r,\kappa )=\frac{{e}^{\kappa r\cos (\hat{s}-s)}}{2\pi {I}_{0}(\kappa r)},$$

where the distribution of \(r\) for \(m\) walk steps is

$$p(r|m,\kappa )=\frac{{I}_{0}(\kappa r)}{{I}_{0}{(\kappa )}^{m}}r{\psi }_{m}(r).$$

Here, \(r{\psi }_{m}(r)\) is the probability density function for a uniform random walk of length \(r\) and number of steps \(m\). The variable \(m\) is itself Poisson-distributed, with expected value \(\xi \). For additional equations and a full derivation, including the neural interpretation of these variables, see41. In order to fit this model to data, we approximated the density \({\psi }_{m}(r)\) via Monte Carlo simulation. The free variables for fitting are \(\kappa \) (the concentration parameter) and \(\xi \), which corresponds to gain.
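The Monte Carlo step can be illustrated by simulating uniform random walks directly. The sketch below, with hypothetical function names and an arbitrary number of simulated walks, draws resultant lengths \(r\) for \(m\) unit steps and computes the factor \({I}_{0}(\kappa r)/{I}_{0}{(\kappa )}^{m}\) that reweights them toward the von Mises walk. It covers only this one ingredient, not the full VMRW likelihood.

```python
import numpy as np
from scipy.special import i0

def sample_resultant_lengths(m, n_walks=200_000, seed=0):
    """Resultant lengths of uniform random walks made of m unit steps."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-np.pi, np.pi, size=(n_walks, m))
    return np.hypot(np.cos(theta).sum(axis=1), np.sin(theta).sum(axis=1))

def vmrw_length_weights(r, m, kappa):
    """Reweighting factor I0(kappa * r) / I0(kappa)^m for each sampled length."""
    return i0(kappa * r) / i0(kappa) ** m

m, kappa = 3, 1.0
r = sample_resultant_lengths(m)
weights = vmrw_length_weights(r, m, kappa)
```

Two sanity checks follow from the definitions: the mean of \({r}^{2}\) for a uniform walk is \(m\), and the weights average to one because \(p(r|m,\kappa )\) is a normalized density.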

We added terms to these base models to capture temporal smoothing in the data in the form of serial dependence (or adaptation). In particular, we allowed the mean of each model’s probability density function to vary on a trial-by-trial basis, as a function of the location of the previous trial’s stimulus. Given a particular difference in location between the current and previous trial’s stimuli, the mean shift was set to be the value of the DoG model fit to the data at that point. (That is, in visual terms, the input to the model was a point on the x-axis in Fig. 2c, for example, and the output mean shift was the DoG function’s value on the y-axis.) This procedure added two additional variables to each of the base models – \(a\) and \(w\).
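For the EP base model, the expansion amounts to re-centering each trial's error on the DoG prediction before evaluating the likelihood. The sketch below assumes the DoG form \(y=xawc{e}^{-{(wx)}^{2}}\), with the shift computed in degrees and the von Mises density evaluated in radians; the function names and simulated parameter values are illustrative.

```python
import numpy as np
from scipy.special import i0e

C = np.sqrt(2.0) / np.exp(-0.5)

def dog(x, a, w):
    """DoG mean shift (degrees) as a function of previous-trial offset (degrees)."""
    return x * a * w * C * np.exp(-(w * x) ** 2)

def ep_sd_loglik(err_deg, prev_diff_deg, kappa, a, w):
    """Log-likelihood of EP with a trial-by-trial DoG shift of the mean."""
    shift_deg = dog(prev_diff_deg, a, w)
    centred = np.deg2rad(err_deg - shift_deg)  # error relative to shifted mean
    log_norm = np.log(2.0 * np.pi) + np.log(i0e(kappa)) + kappa
    return np.sum(kappa * np.cos(centred) - log_norm)

# Simulated data containing an attractive bias toward the previous stimulus.
rng = np.random.default_rng(2)
prev_diff = rng.uniform(-180.0, 180.0, size=3000)
noise_deg = np.rad2deg(rng.vonmises(0.0, 400.0, size=prev_diff.size))
err = dog(prev_diff, 2.0, 0.03) + noise_deg
ll_with = ep_sd_loglik(err, prev_diff, 400.0, 2.0, 0.03)
ll_without = ep_sd_loglik(err, prev_diff, 400.0, 0.0, 0.03)
```

Setting \(a=0\) recovers the base model, so the two log-likelihoods can be compared directly once the extra parameters are penalized.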

As an alternative to the base models with the serial dependence expansion, we constructed a second set of models that account for trial history by assuming that participants, on a subset of trials, confuse which stimulus was presented most recently and report the wrong item when probed. Like the serial dependence terms, this alternative allowed the mean of the base probability density functions to shift, depending on the difference between the previous trial’s location and the current one, without altering their shape or width. This “swap over time” model is defined as36

$$p(\hat{s}|s)=(1-\alpha )BM(\hat{s}-s)+\alpha BM(\hat{s}-{s}^{\ast }),$$

where \(BM\) is a base model, \(\alpha \) (an additional free parameter) sets the frequency of swaps, and \({s}^{\ast }\) is the stimulus location for the previous trial.
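With a von Mises density standing in for the base model, the mixture can be written as in the sketch below. Angles are assumed to be in radians, and the function names and parameter values are hypothetical.

```python
import numpy as np
from scipy.special import i0e

def von_mises_pdf(delta, kappa):
    """Von Mises density; exp(kappa * (cos - 1)) / i0e is overflow-safe."""
    return np.exp(kappa * (np.cos(delta) - 1.0)) / (2.0 * np.pi * i0e(kappa))

def swap_pdf(resp, stim, prev_stim, kappa, alpha):
    """Mixture of responses centred on the current and the previous stimulus."""
    on_target = von_mises_pdf(resp - stim, kappa)
    on_previous = von_mises_pdf(resp - prev_stim, kappa)
    return (1.0 - alpha) * on_target + alpha * on_previous

# Density over all possible responses for one trial (illustrative values).
grid = np.linspace(-np.pi, np.pi, 720, endpoint=False)
density = swap_pdf(grid, stim=0.0, prev_stim=1.0, kappa=20.0, alpha=0.1)
```

Unlike the DoG expansion, which moves the mean of every response by a graded amount, this mixture places a small second mode at the previous location.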

Within each model, we used a separate set of parameters for each memory delay length, and formally compared the fits of different models using the Akaike Information Criterion (as recommended in39), with the standard correction for finite sample sizes (AICc). AICc values were averaged across subjects for these comparisons.
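For reference, the corrected criterion is \({\rm{AICc}}=2k-2\,\mathrm{ln}\,L+2k(k+1)/(n-k-1)\), where \(k\) is the number of free parameters and \(n\) the number of observations. The sketch below applies it to made-up log-likelihoods for a single subject and delay; the numbers are purely illustrative and are not results from the study.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Akaike Information Criterion with the finite-sample correction."""
    aic = 2.0 * n_params - 2.0 * log_likelihood
    correction = 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction

# Hypothetical log-likelihoods for one subject and one delay (200 trials).
base = aicc(log_likelihood=-520.0, n_params=1, n_obs=200)    # EP alone
hybrid = aicc(log_likelihood=-510.0, n_params=3, n_obs=200)  # EP plus a and w
delta = base - hybrid  # positive values favour the hybrid model
```

Lower AICc indicates the preferred model, so the two extra parameters must raise the log-likelihood by more than their penalty to be justified.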