Individual beliefs about temporal continuity explain variation of perceptual biases

Perception of magnitudes such as duration or distance is often found to be systematically biased. These biases, which result from incorporating prior knowledge into the perceptual process, can vary considerably between individuals. The variations are commonly attributed to differences in sensory precision and in reliance on priors. However, another factor not considered so far is the implicit belief about how successive sensory stimuli are generated: independently of each other or with a certain temporal continuity. The main types of explanatory models proposed so far (static or iterative) mirror this distinction but cannot adequately explain individual biases. Here we propose a new unifying model that explains individual variation as a combination of sensory precision and beliefs about temporal continuity, and predicts the experimentally found changes in biases when temporal continuity is altered. Thus, according to the model, individual differences in perception depend on beliefs about how stimuli are generated in the world.

Appendix A. The relative and absolute sequential dependence and their relations to the Bayesian models.
In the literature, two main methods to quantify sequential (or serial) dependence can be found. Older studies such as Holland and Lockhead (1968) quantified the sequential effect as the dependence of the current error on the stimulus magnitude in the previous trial (we refer to it as the absolute sequential dependence, ASD). More recent studies (e.g., Fischer & Whitney 2014, Kiyonaga et al. 2017, Clifford et al. 2018, Cicchini et al. 2018) reported the dependence of the current error on the difference between the stimuli in the previous and current trials (the relative sequential dependence, RSD).
Even though it is rarely mentioned, the RSD is appropriate only when stimuli 1) come from a circular scale, such as angular orientation, and 2) are uniformly distributed over the whole scale. This is the case for most of the studies mentioned above, which investigated serial dependence for perception of visual orientation. For other cases, such as when stimuli are drawn from an open scale or from only part of a circular scale, the RSD is problematic and potentially misleading, because it inflates the true sequential dependence effect (see details in Appendix A2 below). Moreover, the RSD can then falsely show a dependence, based on mathematical coupling (Archie 1981, Curran-Everett 2010), even if the result of the current trial is completely independent of the previous stimulus.

A1. Relative and absolute serial dependence for a special case
To show that relative sequential dependence (RSD) is not appropriate for stimuli on an open scale, such as distance or duration, let us consider a typical perceptual experiment (Fig. S1A): stimuli are presented successively in randomized order, and for each stimulus one response is measured that should ideally reflect the stimulus magnitude (typical responses shown in Fig. S1B). Now consider a 'dead salmon case' (named after a famous fMRI study showing BOLD responses in a dead salmon, Bennett et al. 2009): if the response is constant, say zero, independently of the given stimuli (Fig. S1C), we still get a strong 'serial dependence' in terms of the RSD measure (Fig. S1D). The RSD also remains strong when the response is random with no relation to the stimulus (Fig. S1E).

Figure S1: Simulations of a perceptual experiment. A: Schematic timeline of a typical duration reproduction experiment. Stimulus duration given by a visual cue is actively reproduced as response. B: Exemplary responses plotted over stimulus for 400 randomly presented stimuli with an arbitrary stimulus range of 2-3. C: "Dead salmon" case (blue, zero response) and independent random responses with a central tendency of 1 (red dots) to the same stimuli as in B. D: The RSD measure for the zero response in C shows a highly significant relation with a correlation coefficient of r = 0.71 and a regression slope of 0.5, erroneously suggesting a serial dependence. E: Again, the corresponding RSD for the random responses shows a significant relation. Note, however, that an appropriate statistical permutation test reveals for both cases that slope and correlation coefficient are not statistically significant (here: both p > 0.3).
Why is that the case? Because the current stimulus magnitude enters twice: in the response error (on the y-axis) and in the stimulus difference (on the x-axis). Thus, the two variables are algebraically coupled, which produces a strong spurious correlation between them. For the 'dead salmon' case in Fig. S1C, it can be shown analytically that the slope of the regression line of the RSD becomes 0.5.
Let us denote $r_i$ as the response and $s_i$ as the stimulus of the current trial $i$. For the "dead salmon" case, the response is "dead" ($r_i = 0$) and the error plotted on the y-axis is $e_i = r_i - s_i = -s_i$. The inter-trial difference plotted on the x-axis is $d_i = s_{i-1} - s_i$ (as defined in Fischer & Whitney 2014). The regression slope of the two variables is calculated by
$$b = \frac{\sum_i (d_i - \bar{d})(e_i - \bar{e})}{\sum_i (d_i - \bar{d})^2}$$
Since the stimuli $s_i$ are presented in random order, the mean difference $\bar{d}$ approaches zero for large $n$. Consequently, with $x_i = s_i - \bar{s}$,
$$b \approx \frac{\sum_i (x_{i-1} - x_i)(-x_i)}{\sum_i (x_{i-1} - x_i)^2}$$
The denominator can be rewritten as $\sum_i (x_{i-1}^2 - 2 x_{i-1} x_i + x_i^2) \approx 2 \sum_i x_i^2$, since the lag-1 autocorrelation of randomly ordered stimuli is close to zero, and the numerator becomes $\sum_i (x_i^2 - x_{i-1} x_i) \approx \sum_i x_i^2$. The slope therefore approaches $b \approx 1/2$.
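This derivation can be checked numerically. The following is a minimal sketch with illustrative variable names, simulating both the zero ("dead salmon") response and a random response to randomly ordered stimuli on an open scale:

```python
import numpy as np

# Numerical check of the derivation above (illustrative variable names).
rng = np.random.default_rng(0)
n = 20000
s = rng.uniform(2.0, 3.0, n)             # randomly ordered stimuli on an open scale
r_dead = np.zeros(n)                     # "dead salmon": zero response throughout
r_rand = 1.0 + rng.normal(0.0, 0.2, n)   # random response, unrelated to the stimuli

d = s[:-1] - s[1:]                       # inter-trial difference (x-axis of the RSD)
slope_dead = np.polyfit(d, r_dead[1:] - s[1:], 1)[0]
slope_rand = np.polyfit(d, r_rand[1:] - s[1:], 1)[0]
# both slopes approach 0.5, driven purely by the shared term s_i
```

Both slopes settle near 0.5 regardless of the response, because the coupling arises entirely from the stimulus terms shared between the two axes.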

Note that this holds for random stimuli on an open scale, but not on a circular scale as used in Fischer & Whitney (2014) and many other recent studies on serial dependence. For the circular scale, if stimuli are drawn from the whole circular range, the slope becomes zero even for the "dead salmon" case.
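The circular case can be illustrated with the same "dead salmon" setup. In this sketch (illustrative names; angles wrapped to the interval (-pi, pi]), the wrapped inter-trial difference is uniform and independent of the current stimulus, so the coupling vanishes:

```python
import numpy as np

# Circular case: stimuli uniform over the whole circle, differences wrapped.
def wrap(a):
    """Wrap angles to (-pi, pi]."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

rng = np.random.default_rng(1)
n = 20000
s = rng.uniform(0.0, 2.0 * np.pi, n)     # orientations covering the full circle
e = wrap(0.0 - s[1:])                    # circular error of the zero response
d = wrap(s[:-1] - s[1:])                 # circular inter-trial difference

slope_circular = np.polyfit(d, e, 1)[0]  # no coupling survives the wrapping
```

The slope stays near zero, in line with the statement above that the spurious slope disappears when stimuli are drawn from the whole circular range.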

A2. Relative and absolute serial dependence for a Bayesian model with fixed prior
Let us consider a Bayesian model with a fixed static prior for the central tendency. Such a model should not show a serial dependence, because the estimate in the current trial is, to a first approximation, generated as a weighted average of the measurement $z_i$ of the current stimulus and a fixed value (usually the mean $\bar{s}$ of the stimuli, assuming a normal or log-normal distribution of the stimuli). The current response can then be computed by $r_i = w z_i + (1 - w)\bar{s}$, with $w$ being a weight between 0 and 1. The model predicts a central tendency $c = 1 - w$, with large magnitudes being underestimated and small ones overestimated, no matter what the previous stimulus is. The limited range of the stimuli, however, causes a sequential structure of the trials, with a high probability of a large magnitude being followed by a small magnitude, and vice versa. Thus, the RSD, measured via the inter-trial difference, falsely takes the central tendency effect as sequential dependence. Fig. S2A shows a simulation of the static model with a noisy response that yields a more realistic dependence of the response on the stimulus. Again, the simulation of the RSD shows a significant relationship (Fig. S2B), despite the error being independent of the previous stimulus for the static model (Fig. S2C). Evidently, the RSD measure used in the recent literature can be dangerously misleading for such a case on an open scale. In the presence of a central tendency (or other systematic errors), the RSD measure without appropriate permutation testing cannot determine whether the response in the current trial really depends on the stimulus in the previous trial or not.
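The static model and the permutation test mentioned in the text can be sketched as follows (model parameters and variable names are illustrative). Shuffling the trial order destroys any true dependence on the previous stimulus but keeps the within-trial coupling, so the permuted slopes scatter around the observed one:

```python
import numpy as np

# Sketch of the static-prior model with response noise plus a permutation test.
rng = np.random.default_rng(2)
n, w = 20000, 0.5                        # w: weight on the current measurement
s = rng.uniform(2.0, 3.0, n)
r = w * s + (1.0 - w) * s.mean() + rng.normal(0.0, 0.05, n)

def rsd_slope(stim, resp):
    """Regression slope of the current error on the inter-trial difference."""
    return np.polyfit(stim[:-1] - stim[1:], resp[1:] - stim[1:], 1)[0]

observed = rsd_slope(s, r)               # approx. (1 - w) / 2, half the central tendency

# Permutation null: shuffle trial order, recompute the RSD slope each time.
perm_slopes = np.array([rsd_slope(s[p], r[p])
                        for p in (rng.permutation(n) for _ in range(200))])
```

Because the observed slope is indistinguishable from the permutation null, the apparent RSD is not significant even though standard regression statistics would report it as highly significant.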
We show in the following that the RSD in a model with fixed prior results in a dependence of error on stimulus difference that could be mistaken for serial dependence. Given a set of stimuli $s_i$ drawn from a random distribution on an open scale with mean $\bar{s}$, a simple model for the perceptual response $r_i$ would be $r_i = w s_i + (1 - w)\bar{s}$, with the weight $w$ being determined, for example, by the variance of the stimuli and the variance of the measurement noise. Note that the model assumes that $r_i$ only depends on the current stimulus $s_i$, but not on the previous one. In this model, the central tendency is given as $c = 1 - w$. The RSD (serial dependence according to Fischer & Whitney 2014) is a systematic dependence of the error $r_i - s_i$ on the stimulus difference $s_{i-1} - s_i$. Note that in Fischer & Whitney (2014) a derivative-of-Gaussian was fitted to the orientation judgements, as the orientation difference has lower and upper bounds. We show here for unbounded continuous magnitudes, using a linear fit, that with this formulation a systematic dependence arises even for the trial-independent model above.

Figure S2: (A-C) Simulation of the static model. A: Reproduced magnitude plotted over stimulus; the central tendency is given as 1 minus the slope of the regression line (blue). B: Relative serial dependence (RSD): error (response minus stimulus) in the current trial plotted over the difference between stimuli in the previous and current trials to quantify serial dependence (the misleading p-value is calculated from standard regression statistics). C: Absolute serial dependence (ASD): error plotted over stimulus in the previous trial. The serial dependence, given as slope of the regression, now correctly reports a value that is not significantly different from zero, as expected for a model with a fixed prior. Simulated data points with large inter-trial stimulus differences (red, on right and left sides in B) or previous stimuli with extreme magnitudes (yellow, on the right and left sides in C) are marked in colour to emphasize that large errors correlate with large stimulus differences. Even though no true serial dependence exists in this simulation, the RSD slope differs significantly from zero and amounts to approximately 50% of the central tendency. (D-F) Simulation of an iterative Bayesian model estimating the same random sequence of 400 stimuli as in Fig. S2A-C and yielding the same central tendency. D: Reproduced magnitude plotted over stimulus; the central tendency is given as 1 minus the slope of the regression line (blue). E: Relative serial dependence (RSD): error in the current trial plotted over the difference between stimuli in the previous and current trials. Note that the slope is larger than for the static model (Fig. S2B). F: Absolute serial dependence (ASD): error plotted over stimulus in the previous trial. The serial dependence, given as slope of the regression, differs significantly from zero, as expected for the iterative model.

The linear relation is given by $e_i = b\, d_i + a$, with the error $e_i = r_i - s_i$ and the stimulus difference $d_i = s_{i-1} - s_i$. Thus, the slope of the best fit using least squares is
$$b = \frac{\sum_i (d_i - \bar{d})(e_i - \bar{e})}{\sum_i (d_i - \bar{d})^2}$$
With the equation of the static model, the error becomes $e_i = (1 - w)(\bar{s} - s_i) = -(1 - w)\, x_i$ with $x_i = s_i - \bar{s}$, so the slope can be rewritten as
$$b \approx (1 - w)\, \frac{\sum_i (x_i^2 - x_{i-1} x_i)}{\sum_i (x_{i-1} - x_i)^2}$$
Note that the denominator can be rewritten as $\sum_i (x_{i-1}^2 - 2 x_{i-1} x_i + x_i^2) \approx 2 \sum_i x_i^2$ for randomly ordered stimuli, while the numerator approaches $(1 - w)\sum_i x_i^2$. Therefore, we have the following relation:
$$b \approx \frac{1 - w}{2} = \frac{c}{2}$$
That is, half of the central tendency effect is falsely attributed to the RSD, even though the model assumes no sequential dependence. If, for the static model above, the current error $e_i$ is plotted over the last stimulus $s_{i-1}$, then the slope of the best-fit regression line becomes
$$b' = \frac{\sum_i x_{i-1}\,(e_i - \bar{e})}{\sum_i x_{i-1}^2}$$
While the denominator is the autocorrelation of $(s - \bar{s})$ with lag zero, the numerator can be evaluated as $\sum_i x_{i-1}(-(1 - w)\, x_i) = -(1 - w)\sum_i x_{i-1} x_i$. Therefore, the numerator is the autocorrelation of $(s - \bar{s})$ with lag 1, weighted by the central tendency $(1 - w)$. The slope $b'$ therefore depends on the model weight (or the central tendency) but can maximally assume a value that is the quotient between the stimulus autocorrelations with lag 1 and lag 0. Since the stimulus autocorrelation for lag 1 is close to zero for random stimuli, the serial dependence is also close to zero, as expected for a model in which the current response is independent of previous stimuli.
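Both results, the RSD slope of half the central tendency and the vanishing ASD slope, can be verified numerically across several weights. A minimal sketch with illustrative names:

```python
import numpy as np

# Numerical check: RSD slope approx. c/2 and ASD slope approx. 0 for the
# noise-free static model, across several weights.
rng = np.random.default_rng(3)
n = 50000
s = rng.uniform(2.0, 3.0, n)

results = {}
for w in (0.3, 0.5, 0.8):
    r = w * s + (1.0 - w) * s.mean()            # static model, no serial dependence
    e = r[1:] - s[1:]
    rsd = np.polyfit(s[:-1] - s[1:], e, 1)[0]   # should approach (1 - w) / 2
    asd = np.polyfit(s[:-1], e, 1)[0]           # should stay near zero
    results[w] = (rsd, asd)
```

For every weight, the RSD slope tracks half the central tendency while the ASD slope stays at zero, as derived above.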

A3. Serial dependence for an iterative Bayesian model
The simplest probabilistic model for serial dependence is a Kalman filter model with the assumption that the stimulus at the current trial is equal to that of the last trial plus some random fluctuation (e.g., Glasauer 2019). Since the Kalman gain $k$ reaches a steady-state value relatively quickly, the basic iterative equation can be simplified to $r_i = k z_i + (1 - k)\, r_{i-1}$. Fig. S2D shows the simulated response of the iterative model to the same stimuli as in Fig. S2A. For better comparison, the parameter $k$ was chosen to approximate the central tendency of the static model. The RSD again shows a significant positive slope (Fig. S2E), which is larger than in Fig. S2B. But now the ASD also shows a significant relationship and confirms that the current response indeed depends on the previous stimulus, signifying a true serial dependence.
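The steady-state iterative model can be sketched as follows (gain and noise values are illustrative). Unlike the static model, its error genuinely depends on the previous stimulus, so the ASD slope is clearly positive:

```python
import numpy as np

# Sketch of the steady-state iterative (Kalman filter) model.
rng = np.random.default_rng(4)
n, k = 50000, 0.6                        # k: steady-state Kalman gain
s = rng.uniform(2.0, 3.0, n)
z = s + rng.normal(0.0, 0.05, n)         # noisy measurement of the stimulus

r = np.empty(n)
r[0] = z[0]
for i in range(1, n):
    r[i] = k * z[i] + (1.0 - k) * r[i - 1]   # mix measurement and last response

# ASD: the error now genuinely depends on the previous stimulus
asd_slope = np.polyfit(s[:-1], r[1:] - s[1:], 1)[0]   # approx. k * (1 - k)
```

The positive ASD slope distinguishes this model from the static one, for which the same regression yields zero.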
The central tendency is given by the linear least-squares fit $r = b s + a$, for which the slope is
$$b = \frac{\sum_i (s_i - \bar{s})(r_i - \bar{r})}{\sum_i (s_i - \bar{s})^2}$$
Iterating the model equation yields $r_i = k \sum_{j \geq 0} (1 - k)^j z_{i-j}$; since the weights $k (1 - k)^j$ sum to one, it directly follows that $\bar{r} \approx \bar{s}$. The denominator can be rewritten as the autocorrelation of $(s - \bar{s})$ with lag zero. The numerator can be written as $k \sum_{j \geq 0} (1 - k)^j \sum_i x_i x_{i-j}$ with $x_i = s_i - \bar{s}$; for randomly ordered stimuli only the lag-zero term survives, so that $b \approx k$ and the central tendency is $c = 1 - k$.
For the ASD, the error $e_i = r_i - s_i$ is regressed on the previous stimulus $s_{i-1}$. The numerator of the corresponding slope can be rewritten as $\sum_i x_{i-1}(e_i - \bar{e}) \approx k(1 - k) \sum_i x_i^2$, so that the ASD slope becomes approximately $k(1 - k) = c\,(1 - c)$. This relation between serial dependence and central tendency is shown as red curve in Figs. 2 and 3.
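The relation between serial dependence and central tendency can be checked numerically by sweeping the gain. A minimal sketch (illustrative names), in which the ASD slope should follow $c(1-c)$ with $c = 1 - k$:

```python
import numpy as np

# Check of the ASD / central-tendency relation across Kalman gains.
rng = np.random.default_rng(5)
n = 50000
s = rng.uniform(2.0, 3.0, n)

def iterate(k, stim):
    """Steady-state iterative model (noise-free measurement for simplicity)."""
    r = np.empty(stim.size)
    r[0] = stim[0]
    for i in range(1, stim.size):
        r[i] = k * stim[i] + (1.0 - k) * r[i - 1]
    return r

pairs = {}
for k in (0.4, 0.6, 0.8):
    r = iterate(k, s)
    c = 1.0 - np.polyfit(s, r, 1)[0]                 # central tendency
    asd = np.polyfit(s[:-1], r[1:] - s[1:], 1)[0]    # serial dependence
    pairs[k] = (c, asd)
```

Plotting ASD over central tendency for such sweeps reproduces the parabolic red curve referred to in the main text.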

Appendix B. Generative models
The assumptions behind the three estimation models can be illustrated by considering the generative mechanisms they presuppose. These mechanisms are also called generative models (e.g., Griffiths et al. 2008) and are illustrated in Fig. S3. Each generative model yields different predictions concerning the sequential dependence. The two-state model assumes two hidden states (the stimulus and the mean of the stimulus distribution, Fig. S3B), which generalizes the two-stage model proposed previously (Petzschner & Glasauer 2011). The original two-stage model, which is not identical to the simple iterative model mentioned in the main text and above, assumes that the variance of the stimulus distribution is directly related to the variance of the known random fluctuations of the mean and that it can be estimated from these fluctuations. When this restriction is relaxed, the resulting two-state model encompasses the two-stage model as a special case, and also both the static model and the simple iterative model as two boundary cases. The static model is a boundary case of the two-state model if the mean $m$ in the two-state model is constant instead of fluctuating (i.e., $q = 0$). To be exact, the two-state model with $q = 0$ and the static model are equivalent only in the steady state, because the two-state model initially estimates the unknown mean $m$ of the stimulus distribution, while the static model assumes that the mean $m$ is already known at trial 1. When the assumed stimulus distribution has negligible variance, the two-state model reduces to the simple iterative model.

Figure S3: Generative models of magnitude reproduction. A: The static model (left column) assumes that stimuli (red lines) are randomly drawn from a fixed stimulus distribution (blue).
In the simple iterative model (middle column), the stimulus at the current trial i is assumed to be the same as in the previous trial i-1 except for some random fluctuation that is conceptualized as additive random shift (black distribution). The two-state iterative model (right column) combines these two models: the stimulus is drawn from a stimulus distribution (blue), but the mean of the stimulus distribution fluctuates from trial to trial by a random amount (additive random shift, black distribution). In the illustration the stimuli are the same in all three cases, but the assumed sequential structure differs in each model. The measurement noise distribution around the stimuli has been omitted for clarity. B: Hidden Markov Model diagram visualizing the three generative models. Note that the sensation z, the observable, is also shown. It is derived from the stimulus via the conditional probability P(z|x), which describes the measurement noise. For the static model, the state m, which represents the mean of the stimulus distribution, is assumed not to change from trial to trial. Stimuli are drawn from the stimulus distribution with P(x|m). In the simple iterative model, stimuli change from one trial to the next with a stationary transition probability P(xi|xi-1).
In the two-state model, the mean of the stimulus distribution changes from trial to trial with a stationary transition probability P(mi|mi-1), and the stimuli are drawn from the updated stimulus distribution with P(xi|mi).
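The three generative models can be written as a single sampler in which the boundary cases fall out of the parameters. This is a sketch; all variances (q: random walk of the mean, sig_s: stimulus distribution, sig_z: measurement noise) and the initial mean m0 are illustrative assumptions:

```python
import numpy as np

# Sampler for the two-state generative model of Fig. S3; q = 0 recovers the
# static model, sig_s = 0 recovers the simple iterative model.
rng = np.random.default_rng(6)

def generate(n, q=0.05, sig_s=0.3, sig_z=0.1, m0=2.5):
    m = np.empty(n)
    x = np.empty(n)
    m[0] = m0
    for i in range(n):
        if i > 0:
            m[i] = m[i - 1] + rng.normal(0.0, q)    # P(m_i | m_i-1): drifting mean
        x[i] = m[i] + rng.normal(0.0, sig_s)        # P(x_i | m_i): draw stimulus
    z = x + rng.normal(0.0, sig_z, n)               # P(z_i | x_i): noisy sensation
    return m, x, z

# boundary cases: fixed mean (static) and stimulus = drifting mean (iterative)
m_static, x_static, _ = generate(1000, q=0.0)
m_iter, x_iter, _ = generate(1000, sig_s=0.0)
```

With q = 0 the hidden mean never moves, and with sig_s = 0 the stimulus coincides with the drifting mean, matching the two boundary cases described in the text.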

C1. Duration Reproduction
The average data for the randomized condition of the duration reproduction experiment (14 subjects, data available as Glasauer & Shi 2021) are presented in Fig. S4. To obtain average data, we grouped individual data in bins of 0.1 s and then averaged over subjects. The model not only exhibits an excellent fit to the averaged data for reproduced duration plotted over stimulus duration (Fig. S4 A1, coefficient of determination R² = .986), but also captures the average absolute serial dependence of the error on the previous stimulus (Fig. S4 A2, R² = .827), even though this relation was not included in the fitting procedure. Note that for all models average mean data and simulated responses match well when plotted over the current target time (A1-C1), but match only in A2, not in B2 and C2, when plotted over the previous target time, showing that the serial dependence is not appropriately captured by the static or simple iterative models.
For comparison, the best-fit simple iterative model yields R² = .985 for average reproduced duration over current stimulus (Fig. S4 C1), but R² = −.039 for average error over previous stimulus, indicating that the model is not appropriate for explaining this relation (Fig. S4 C2).
Similarly, the static model yields R² = .987 for average reproduced duration over current stimulus (Fig. S4 B1), but R² = −.240 for error over previous stimulus (Fig. S4 B2), showing that this model is also inappropriate for explaining the serial dependence.

The average data and model predictions for the random walk condition are shown in Fig. S6. From the predicted individual response time courses, the average prediction was computed in the same way as for the real response time courses. Note that the data shown here were not used for this prediction; thus, the good match between model prediction and data supports the validity of the model and shows that model parameters are preserved despite the change in stimulus sequence.
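The negative R² values reported above arise because the coefficient of determination is computed against fixed model predictions rather than a fitted regression; it then becomes negative whenever the model predicts worse than the mean of the data. A minimal sketch (illustrative data):

```python
import numpy as np

# Coefficient of determination against a fixed model prediction (no fitting).
def r_squared(data, prediction):
    ss_res = np.sum((data - prediction) ** 2)       # residual sum of squares
    ss_tot = np.sum((data - np.mean(data)) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot

data = np.array([1.0, 2.0, 3.0, 4.0])
r2_perfect = r_squared(data, data)                        # perfect prediction
r2_bad = r_squared(data, np.array([4.0, 3.0, 2.0, 1.0]))  # negative: worse than the mean
```

A perfect prediction yields R² = 1, while a systematically wrong one can fall well below zero, as for the simple iterative and static models above.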

C2. Distance Reproduction
The data were published in Petzschner & Glasauer (2011) and are available as Petzschner & Glasauer (2020). Figs. S8 and S9 show the average simulation of the distance and angular reproduction from the two-state model. As for the duration reproduction study, the two-state model fits very well both for the central tendency and for the sequential dependence.

The full model of Cicchini et al. (2018) additionally assumes that the weight itself depends on the distance between stimuli (their Eqn. 3.7), so that $w_i = 1 - \frac{1}{2 + \alpha (s_{i-1} - s_i)^2}$, with $\alpha$ being a constant related to the stimulus reliability. This modification of the model, however, does not change the basic relation between central tendency and ASD slope, as exemplified in Fig. S11. In other words, in this model, the serial dependence is equal to the central tendency. Note, however, that according to the weight dependence in the model, the weight $w_i$ can only assume values between 0.5 and 1. Thus, the central tendency according to this model cannot become larger than 0.5. The model therefore cannot capture cases such as the "dead salmon" or the noise response shown in Fig. S1, where the response becomes largely independent of the stimulus.

Figure S11: Simulation of the full serial dependence model (Cicchini et al. 2018) for the same stimulus sequence as in Figs. S2 and S10. RSD and ASD slopes are again both equal to the central tendency. Due to the dependence of the weight on the stimulus difference, the RSD (shown in B) would be better fitted with a polynomial or a difference of Gaussians (as used in Fischer & Whitney 2014). However, a linear fit is still appropriate for the absolute serial dependence (shown in C).
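The bound on the weight, and hence on the central tendency, can be illustrated with a sketch of a distance-dependent weight of the form w = 1 − 1/(2 + α·d²), which reproduces the bounds stated in the text (our reading of Eqn. 3.7 in Cicchini et al. 2018; alpha is an illustrative constant):

```python
import numpy as np

# Distance-dependent weight: 0.5 at zero difference, approaching 1 for
# large differences, so the central tendency 1 - w cannot exceed 0.5.
def weight(d, alpha=5.0):
    return 1.0 - 1.0 / (2.0 + alpha * d ** 2)

d = np.linspace(-1.0, 1.0, 201)   # range of inter-trial stimulus differences
w = weight(d)
```

Since w never drops below 0.5, the model cannot produce responses that are largely independent of the stimulus, which is why it fails for the "dead salmon" and pure-noise cases.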