Individual beliefs about temporal continuity explain variation in perceptual biases

Perception of magnitudes such as duration or distance is often found to be systematically biased. These biases, which result from incorporating prior knowledge into the perceptual process, can vary considerably between individuals. The variations are commonly attributed to differences in sensory precision and in reliance on priors. However, another factor not considered so far is the implicit belief about how successive sensory stimuli are generated: independently of each other or with a certain temporal continuity. The main types of explanatory models proposed so far – static or iterative – mirror this distinction but cannot adequately explain individual biases. Here we propose a new unifying model that explains individual variation as a combination of sensory precision and beliefs about temporal continuity, and that predicts the experimentally found changes in biases when temporal continuity is altered. Thus, according to the model, individual differences in perception depend on beliefs about how stimuli are generated in the world.

how sensory stimuli are caused, or generated, in the external world. These assumptions, or beliefs, are essential for predictions, which serve as prior knowledge: for example, if asked what tomorrow's temperature will be, we might answer something like 'a little warmer/colder than today'. That is, we assume that daily temperature changes by a random amount each day, but that it will be similar on successive days. In contrast, in a standard psychophysical experiment the stimuli presented to our participants are often drawn randomly from a fixed distribution, just like numbers in a lottery, and they are thus independent from trial to trial. Thus, a good guess (in the sense of small error) for the value presented on the next trial would be the mean of the stimulus distribution. In other words, when it comes to daily temperature, a good estimator would use today's temperature as prior knowledge for tomorrow's value. In contrast, when the stimulus value presented in a psychophysical experiment is concerned, useful prior knowledge would be the distribution of possible values.

Thus, when a sensory input is combined with prior knowledge under these two generative assumptions, the final estimate might differ substantially (Fig. 1).

Fig. 1 (caption, beginning truncated): … visual reference and is aligned with the mean of the initial prior. In trial 2 (middle row), the stimulus and likelihood are again the same for both models, but the prior differs: the static model takes the same prior as in trial 1, because the underlying assumption is that all stimuli come from that distribution. In contrast, the iterative model uses the estimate from the previous trial to predict the next stimulus distribution, which is used as new prior knowledge, because it assumes that the new stimulus is similar to the old except for some random change. The two resulting posterior distributions (yellow), and thus the perceptual estimates, differ considerably, with the one for the static model showing much smaller underestimation than that of the iterative estimate, even though the stimuli and sensory accuracy are the same in both cases. The third row shows trial 3, in which the estimates of both models come much closer again.

Footnote 1: In the Bayesian estimation process, the estimate of the current stimulus is based on a posterior distribution, which is the product of the likelihood distribution (describing the sensory uncertainty inherited from the current sensory input) and the prior distribution (describing prior knowledge). The estimate can be the most likely value of the posterior distribution (its maximum), or other measures such as the mean, and its uncertainty can also be derived.
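The product of Gaussian likelihood and prior described in footnote 1 has a closed form: the posterior mean is a precision-weighted average. A minimal numerical sketch of one trial under the two generative assumptions (all numbers here are illustrative, not taken from the experiment):

```python
def posterior(m, sig_m, prior_mu, sig_prior):
    """Product of two Gaussians: returns posterior mean and variance."""
    w = sig_prior**2 / (sig_prior**2 + sig_m**2)       # weight on the measurement
    mu = w * m + (1 - w) * prior_mu                    # precision-weighted average
    var = (sig_prior**2 * sig_m**2) / (sig_prior**2 + sig_m**2)
    return mu, var

# Illustrative trial 2: same measurement and likelihood for both models,
# but different priors (cf. Fig. 1, middle row).
m = 0.6                                      # noisy measurement of the stimulus
est_static, _ = posterior(m, 0.2, 1.0, 0.3)  # static prior: mean of stimulus distribution
est_iter, _ = posterior(m, 0.2, 0.7, 0.3)    # iterative prior: previous estimate
```

Because the static prior sits at the stimulus-distribution mean while the iterative prior follows the previous estimate, the two posterior means, and hence the percepts, differ even for identical sensory input.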

Both cases, however, are mirrored by explanatory perceptual estimation models found in the literature. For example, Jazayeri and Shadlen (2010) proposed a model to account for the central tendency in duration reproduction in which prior knowledge was formalized as a fixed stimulus distribution (depicted in Fig. 1 as the static model). By contrast, Petzschner & Glasauer (2011), in a distance reproduction study, proposed that the central tendency is a consequence of an iterative Bayesian process, in which prior knowledge is iteratively based on the perceptual estimate of the previous trial (like the iterative model in Fig. 1; see Materials and Methods for the detailed method). Fig. 2 shows an example of individual raw results plotted for evaluation of central tendency (Fig. 2A) and sequential dependence (Fig. 2B). The relation of sequential dependence and central tendency for all individual participants is depicted in Fig. 2C. Note that, as mentioned in the introduction, perceptual estimation with a fixed prior (see the static model in Fig. 1) would predict that sequential dependence is zero independently of central tendency.

However, individual responses show a large scatter for both central tendency and sequential dependence. The mean sequential dependence was 0.108 ± 0.056 (mean ± SD), which is significantly different from zero (p < .0001; t-test, n = 14) and thus ruled out the static model as a valid explanation for the results. In fact, all data points show higher sequential dependence than predicted by the static model (zero). We also conducted a partial correlation analysis.

As mentioned before, a sequential dependence larger than zero rules out the static prior as an explanation for the central tendency. However, in the iterative model (Fig. 1), sequential dependence and central tendency are linked: for random stimuli, the model predicts a fixed relation between the two.

We also analysed a publicly available data set on distance reproduction (Petzschner & Glasauer 2020) published previously (Petzschner & Glasauer 2011), using the same method. The data come from two separate experiments on visual path integration, one on linear distance reproduction and one on reproduction of angular distance (see Materials and Methods). While Petzschner and Glasauer (2011) showed that their iterative model could well capture the central tendency, they did not analyse sequential dependence. Fig. 3 shows the equivalent analysis as above for the two path-integration experiments (Petzschner & Glasauer 2011).

The same analysis was applied as in Fig. 2C.

The present data also show that the individual variation in biases cannot be explained by individually different levels of sensory uncertainty: under the assumption of a static model, changing sensory uncertainty would not lead to different levels of sequential dependence; if the iterative model underlay perceptual estimation, variation of sensory uncertainty would still confine the individual biases to the orange line in Figs. 2C, 3A, and 3B.

From Figs. 2 and 3 we can see that for all individual data points the predictions of the two models considered so far seem to be boundary conditions: none of the individual bias values is located below zero for sequential dependence, and none is above the orange parabola denoting the iterative model. The obvious conclusion is that individual participants seem to follow beliefs that lie between the two extremes expressed by the static and simple iterative models, which assume that 1) the sampled stimuli are random and independent, or 2) the current stimulus equals the previous one plus some random change. An intermediate belief about stimulus generation can be described as follows: assume that the stimulus on the current trial has been drawn from a stimulus distribution, but that the mean of that distribution is allowed to change randomly from trial to trial. Regarding our first example of daily temperature changes, this assumption is also reasonably applicable.

From this assumption we can now construct a new estimation model, which requires two variables to be estimated: in addition to estimating the current stimulus, the perceptual process also needs to estimate the mean of the current stimulus distribution. Therefore, the model requires two internal states. The new model is an extended iterative model, more flexible than the simple iterative model depicted in Fig. 1.

260
The static model and the simple iterative model are special cases of the new two-state model: both are nested within the two-state model. Therefore, one can determine whether the parameters that are set to zero for the simpler models differ significantly from zero in the full model. On average, both parameters (Parameter 1: the relative variability of the stimulus distribution; Parameter 2: the relative variability of the additive change of the mean) of the full model were significantly different from zero (Parameter 1: 1.03 ± 0.28, mean ± SEM, t-test p < .01; Parameter 2: 0.14 ± 0.05, mean ± SEM, t-test p = 0.025; both n = 14). In individual participants, the relative variability of the stimulus distribution differed significantly from zero (assessed via confidence intervals of the parameters) for all subjects (range 0.20 to 4.12), while the variability of the additive change differed from zero only for 6 of 14 subjects (range 0 to 0.66). To determine which model was more appropriate for fitting the data, we used an out-of-sample cross-validation procedure specifically suited for model selection in time series (Arlot & Celisse 2010). According to this cross-validation procedure (see Materials and Methods), the two-state model is the preferable model for 8 of 14 participants, while for the remaining 6 participants the static model is sufficient (Fig. 4B,C). The measured serial dependence is too small for the static model but too large for the simple iterative model (Fig. 4C), while the two-state model matches the data reasonably well.

While the results so far confirm that the two-state model provides a quantitative explanation for central tendency and sequential dependence at lag one (i.e., dependence on the previous stimulus), due to its iterative nature the two-state model also predicts dependence of the current error on stimuli further in the past.
That this is indeed the case experimentally can be shown by cross-correlation analysis: for duration reproduction, the cross-correlation between stimulus and reproduction is, on average, significantly different from zero up to lag 3 (t-test). Not only do the experimental and model results match, but the size of the error bars is also captured quite well by the model.
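A simple way to probe such lagged dependence is to correlate the reproduction error on trial i with the stimulus presented k trials earlier. The sketch below is our own illustration; the paper's exact cross-correlation normalization may differ:

```python
import numpy as np

def lagged_correlation(stimuli, reproductions, max_lag=5):
    """Correlation between the error in trial i and the stimulus in trial i-k."""
    s = np.asarray(stimuli)
    err = np.asarray(reproductions) - s
    return {k: np.corrcoef(err[k:], s[:-k])[0, 1] for k in range(1, max_lag + 1)}
```

Applied to synthetic data in which each reproduction mixes in a fraction of the previous stimulus, the lag-1 correlation comes out clearly positive while higher lags vanish.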

The averaged experimental results for duration reproduction, together with the averaged model results for the dependence of the error on current and previous stimuli, are shown in Fig. 5 (see also SI Fig. S4). Note that the model was fitted to each individual trial-by-trial reproduction time course separately by minimizing the trial-wise least-squares distance between experimental reproduction and model simulation. Thus, the good match shown in Fig. 5B and 5C, quantified by a high coefficient of determination R², is caused by the model mimicking the experimental sequential dependence without explicitly including it in the fitting procedure. This is not a trivial consequence of the model fit, as shown by the fact that both static and simple iterative models can fit the central tendency equally well (i.e., the dependence shown in Fig. 5A), but fail to correctly exhibit the sequential dependence shown in Fig. 5B and 5C (see SI Appendix C1).
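The coefficient of determination used above to quantify fit quality can be computed directly from trial-wise reproductions and model simulations (a generic sketch, not the authors' code):

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination R^2 between data and model simulation."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    ss_res = np.sum((observed - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((observed - observed.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```

R² equals 1 for a perfect trial-by-trial match and 0 for a model that only reproduces the grand mean of the data.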

As explained above, the models (static, iterative, two-state) make different assumptions about how stimuli are generated; e.g., the static model assumes independent and identically distributed (i.i.d.) random variables. Fig. 6 shows histograms, autocorrelation, and time course of exemplary stimulus sequences generated according to the assumptions of the three models. The stimulus sequences of the iterative models (iterative, two-state) have been generated so that their histograms are similar to the histogram of the static model's sequence (quantified by minimizing the Kullback-Leibler divergence). While the histograms are reasonably similar (Fig. 6A, KL divergence < 0.01), the autocorrelation differs considerably (Fig. 6B). As expected, the sequence for the static model, which is a Gaussian noise sequence, shows no sequential dependence between current and previous values, while the two iterative models generate sequences with autocorrelation at higher lags. The sequence generated by the simple iterative model is a Wiener process or random walk, while the sequence of the two-state model is a superposition of a random walk and Gaussian noise. The corresponding exemplary time courses are shown in Fig. 6C: the blue time course (i.i.d. stimuli) would be optimal for the static model, the red random walk time course is optimal for the simple iterative model, and the yellow trace, a compromise between randomness and slow drift, would be optimal for the two-state model.

This implies that the result of the estimation process depends on how well the stimulus sequence is matched to the model assumptions about stimulus generation. Using an iterative model is suboptimal for an i.i.d. sequence and, vice versa, using the static model is not the best solution for estimating a random-walk sequence.
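The three generative assumptions can be illustrated by simulating one sequence each, in the spirit of Fig. 6 (parameter values here are arbitrary illustrative choices; the histogram matching via KL divergence is omitted):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

iid = rng.normal(1.0, 0.3, n)                           # static model: i.i.d. stimuli
walk = 1.0 + np.cumsum(rng.normal(0.0, 0.05, n))        # simple iterative: random walk
two_state = (1.0 + np.cumsum(rng.normal(0.0, 0.05, n))  # two-state: random walk...
             + rng.normal(0.0, 0.2, n))                 # ...plus independent noise

def autocorr(x, lag):
    """Sample autocorrelation of a sequence at a given lag."""
    x = x - x.mean()
    return float((x[lag:] * x[:-lag]).sum() / (x * x).sum())
```

With these settings, the i.i.d. sequence has near-zero lag-1 autocorrelation, the random walk is close to one, and the two-state sequence lies in between, mirroring Fig. 6B.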

We tested this implication by analysing the second condition of the duration reproduction experiment, in which the same stimuli as analysed above were presented in a random walk order to the same participants (see also Materials and Methods). Since we suspected that the remaining central tendency in the random walk condition and the change in sequential dependence could be explained by the new two-state model introduced above, we used the individually fitted model parameters obtained from the randomized condition to predict the individual time courses of the random walk condition. In this condition, subsequent stimuli are similar to each other (example in Fig. 7C), just as supposed by the generative model of the simple iterative Bayes (see Fig. 6C, red time course, for an example of such a random walk). As explained in our previous paper (Glasauer & Shi 2021), this condition tests the prediction of the simple iterative and the Petzschner & Glasauer (2011) explanatory models, which both predict that the central tendency vanishes in the random walk condition. An example of the effect of changed stimulus order on central tendency is shown in Fig. 7A (randomized condition, replotted from Fig. 2A) and Fig. 7B (random walk condition) for one participant. In this participant, the central tendency seen for the randomized stimulus order (Fig. 7A) almost vanishes for the random walk stimulus order (Fig. 7B).

On average, the central tendency was significantly smaller during random walk (t(13) = 7.32, p < .0001; see Fig. 8A). However, it did not completely vanish and was still larger than predicted by these previous models (Glasauer & Shi 2021). For some subjects, the central tendency was no longer different from zero (see example data in Fig. 7), while for others it clearly was still visible. Sequential dependence also changed and became on average negative, with a significant difference between conditions (Fig. 8B).

Figure 9 shows the averaged experimental results together with the averaged model prediction. Both central tendency (Fig. 9A) and sequential dependence (Fig. 9B and 9C) are well predicted by the model, showing that the central tendency remaining in the random walk condition is explained by the generative assumption of the two-state model (see also red model results in Fig. 8). Note that the similarity of the error dependence on current and previous stimuli in the random walk condition shown in Fig. 9 is expected, since stimuli in this condition are highly autocorrelated, i.e., the current stimulus is indeed similar to the stimuli preceding it (and thus the reproduction error is similar when plotted over current or previous stimuli).

Fig. 9 (caption, beginning truncated): … two-state models used in Fig. 5 (open dots). Stimuli are the same as in Fig. 5 except for the order of presentation.

Model parameters were determined from fitting the 'randomized' condition and are the same as used in Fig. 5.

Like the static prior model, the simple iterative model proposed previously (e.g., Dyjas et al. 2012; Glasauer 2019) predicts the central tendency effect very well but falls short in accounting for the experimentally observed sequential dependence. The simple iterative model assumes that stimuli remain similar from trial to trial up to a random fluctuation. This assumption corresponds to stimuli being generated by a random walk or discrete Wiener process. According to this assumption, the overall variance of the stimuli builds up over the trials. By contrast, the static model assumes that the stimulus distribution has a fixed variance and a fixed mean. The generative assumption for the iterative model also implies a stimulus sequence that differs considerably from that of the static model: it resembles Brownian motion or a diffusion process in one dimension rather than a random sequence (see Fig. 6C for examples). Both the static and the simple iterative models provide predictions concerning the sequential dependence: the static model predicts zero sequential dependence; the iterative model predicts that, in case of random stimuli, sequential dependence depends on central tendency in a predictable way (see red curve in Figs. 2C and 3). The empirical data, however, showed that neither of these two models captures the experimental relation between central tendency and sequential dependence.

Therefore, we proposed the two-state model, which combines the static and the simple iterative models and assumes that the stimulus at each trial comes from a distribution with fixed variance, but that the mean of that distribution changes from trial to trial.
By merging the assumptions of the static and simple iterative models about stimulus generation, both the central tendency effect and the absolute sequential dependence can be well explained. According to the two-state model, the considerable variations between participants are caused not only by a different impact of noise on sensory measurement, but also by different beliefs concerning the sequential structure of the stimuli. As an example, in Fig. 2C, of two participants with approximately the same central tendency of 0.42, one had a sequential dependence of 0.03, the other of 0.17. This difference reflects the observers' own supposition about the sequential structure: the participant with the low sequential dependence assumed the world is volatile and trusted only the current stimulus together with a hypothesis about the limited range of stimuli for perceptual estimates. By contrast, the participant with the large sequential dependence agreed about the randomness of the world but further assumed that things change over time with some continuity. For a perceptual decision-making task, it has recently been suggested that individual differences are due to different implicit assumptions about the complexity of a sequence (Glaze et al. 2018). In their study, participants had to infer from which of two possible Gaussian sources the current visual stimulus was drawn. The true source was randomly switched with a hazard rate that could change. The authors proposed that a bias-variance trade-off was the underlying reason for differences in choice variability. While this study is very different from ours, both have in common that the implicit beliefs of participants about the temporal volatility of stimulus generation are the reason for individual differences.
The present investigation also suggests that an observer's belief about the world's sequential structure is carried over from one experimental condition to another instead of being adapted to the individual condition: the model parameters derived from the randomized condition of duration reproduction provided an excellent prediction of the experimental results of the random walk condition, even though the two conditions differed exactly (and only) in their sequential structure. Thus, participants in these experiments apparently did not adapt their beliefs to the actual temporal structure of the stimuli but relied on their individual hypothesis. However, whether these beliefs can be altered, for example by feedback, or reflect intrinsic personality traits warrants further investigation. A recent study on the perception of probability emphasized that average results do not provide the full picture and that individuals deviated substantially from optimal performance, with these idiosyncratic deviations persisting over a long time (Khaw et al. 2021).

One might wonder about the purpose of integrating immediate prior information into a current decision, given that it may cause an estimation bias. One common explanation is that the regularity of our environment is relatively stable, so that integrating prior knowledge will boost the reliability of the estimation and facilitate performance (Petzschner et al. 2015, Shi et al. 2013). For a visual orientation reproduction task (Cicchini et al. 2018), the authors argued that sequential dependence provides a behavioural advantage manifesting in low reaction times and high accuracy. When the stimuli are similar between trials, it is useful to use the last perceived stimulus as prior.
This assumption about the sequential structure is included in the generative assumption of the two-state model: the stimulus of the current trial is assumed to be similar to that of the last trial, since it comes from a distribution with a similar mean. However, the mean of the sampled stimuli also fluctuates over time, which makes the two-state model more flexible than a static model. That is, observers do not assume that the randomness of the external environment is strictly stable, but rather expect variations and changes.

Next, the question arises whether the proposed two-state model is optimal for the usual experimental situations with standard randomization, that is, when stimuli are randomly generated as an i.i.d. process from a fixed, pre-defined distribution, which has become a 'standard' experimental procedure since Vierordt's work in 1868. The answer is obvious: the two-state model is not optimal, given that the stimuli are randomly drawn from a fixed distribution. Using the last trial to estimate the current one would deteriorate rather than improve the quality of the estimate. However, as evidenced by the significant sequential dependence, instead of believing that the stimuli are randomly generated, most of our participants assumed that there is at least some temporal continuity in the stimulus sequence. According to both the simple iterative model and the two-state model, for these participants the overall central tendency bias should be smaller if the stimulus sequence is changed so that stimuli are indeed similar from trial to trial. This was validated in our previous study (Glasauer & Shi 2021), which showed that the central tendency in sequences with completely random stimulus order was larger than in sequences with random-walk fluctuation.
Here we showed that this decrease in central tendency and, more importantly, the remaining central tendency, is well predicted by the two-state model on an individual basis. The model also predicts the experimentally found reversal of sequential dependence (compare the positive dependence in Fig. 5B with the negative dependence in Fig. 9B).

Duration reproduction

Stimuli were presented using Psychtoolbox (http://psychtoolbox.org). Each trial started after 500 ms presentation of a fixation cross followed by the stimulus, which appeared for a pre-defined duration. After a short break of 500 ms, participants were prompted to reproduce the duration of the stimulus by pressing and holding a key. The visual stimulus was shown again during the key press. At the end of the trial, coarse visual feedback was given for 500 ms (5 categories from < -30% to > 30% error). Each participant performed two blocked sessions in balanced order. In the random walk condition, participants received 400 stimuli generated by cumulative summation (integration) of randomly distributed values drawn from a normal distribution with zero mean and an SD chosen to yield stimuli between 400 ms and 1900 ms. In the randomized condition, the same 400 stimuli were used in scrambled order. Each participant received a different sequence (see Fig. 4A for an example). The data have been used previously (Glasauer & Shi 2021) and are publicly available (Glasauer & Shi 2021b).

Distance reproduction

The experimental procedure has been published previously (Petzschner & Glasauer 2011) and the data are publicly accessible (Petzschner & Glasauer 2020). Participants reproduced magnitudes (distances or angles, see Fig. S5 and S6, 200 trials per condition) in a production-reproduction task.
For distance estimation, participants were instructed to move forward on a linear path until movement was stopped when reaching the randomly selected production distance (same sequence for all subjects), and then had to reproduce the perceived distance in the same direction using the joystick, indicating their final position via button press. Velocity was kept constant during movement but randomized by up to 60% to exclude time-estimation strategies. No feedback was given. For angular turning estimation, the procedure was the same except that subjects had to turn.

Data analysis: central tendency and sequential dependence

To quantify central tendency, a linear least-squares regression was fitted to stimulus reproduction plotted over stimulus duration for each participant individually using Matlab (The Mathworks, Natick MA, USA). Central tendency was defined as 1 minus the slope of the regression line. Sequential dependence was assessed by fitting a linear least-squares regression to the error in trial k plotted over the stimulus in trial k-1 (Holland and Lockhead 1968).

Modelling: Static Bayesian model

In the static model, the perceptual estimate is a weighted combination of the current measurement and a fixed prior, with the weight being determined by the variance of the stimulus distribution and the variance of the measurement noise. Note that the model assumes that the estimate only depends on the current stimulus, but not on the previous one. The fixed prior of the model could be the mean of the stimulus distribution s̄. In this model, the central tendency is given as c = 1 − w, with w being the weight of the measurement. Since in this model the current response does not depend on the previous stimulus, the sequential dependence is zero regardless of the central tendency (see SI Appendix A2).

Modelling: Iterative Bayesian model

For an iterative or dynamic model, the quantification of sequential dependence should yield an effect, given that in such a model the actual response is defined to depend on both the current and the previous magnitudes.
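The two regression-based measures described under 'Data analysis' can be sketched as follows (our own minimal implementation of the definitions given above):

```python
import numpy as np

def central_tendency(stimuli, reproductions):
    """Central tendency = 1 minus the slope of reproduction regressed on stimulus."""
    slope = np.polyfit(stimuli, reproductions, 1)[0]
    return 1.0 - slope

def sequential_dependence(stimuli, reproductions):
    """Slope of the error in trial k regressed on the stimulus in trial k-1
    (Holland and Lockhead 1968)."""
    stimuli = np.asarray(stimuli)
    err = np.asarray(reproductions) - stimuli
    return np.polyfit(stimuli[:-1], err[1:], 1)[0]
```

For a response rule r = 0.7·s + 0.3·mean(s), the central tendency is exactly 0.3; mixing 10% of the previous stimulus into the response yields a sequential dependence near 0.1.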
The simplest iterative Bayesian model (Fig. 1) can be derived from two assumptions about the underlying generative process (Glasauer 2019).

- The static model assumes that the stimulus s_i in trial i comes from a fixed distribution with mean μ. In other words, the generative model is s_i = μ + ε_p, with ε_p being a random number drawn from a distribution N(0, σ_p²).

- The iterative model assumes that the stimulus s_i in trial i is the same as in trial i−1 except for some random change with variance σ_q². In other words, the generative model is s_i = s_{i−1} + ε_q, with ε_q coming from a distribution N(0, σ_q²).

From these assumptions we can construct a third generative model, the two-state model, that combines advantages of both models:

- The two-state model assumes that the stimulus s_i in trial i comes from a random distribution N(μ_{i−1}, σ_p²) with mean μ_{i−1} and variance σ_p². The mean of this distribution in trial i is the same as in trial i−1 except for some random change with variance σ_q². In other words, the stimulus distribution in the current trial depends on that in the previous trial. The generative model now has two states: the randomly changing mean of the stimulus distribution, μ_i = μ_{i−1} + ε_q, and the actual stimulus, s_i = μ_{i−1} + ε_p, drawn from this distribution.

For an illustration of the generative models see SI Appendix C.

Modelling: The two-state model

Thus, the generative equations for the two-state model are given as follows:

μ_i = μ_{i−1} + ε_q
s_i = μ_{i−1} + ε_p
m_i = s_i + ε_m

with s_i being the stimulus at trial i, drawn from a distribution with mean μ_{i−1} and variance σ_p² (here expressed by the random number ε_p, which is normally distributed as N(0, σ_p²)). The mean μ_i of this stimulus distribution at trial i is the same as in the trial before except for the random fluctuation ε_q (ε_q is normally distributed as N(0, σ_q²)). The actual sensory measurement (or sensation) m_i is the stimulus corrupted by the sensory noise ε_m, which is normally distributed as N(0, σ_m²).

We can rewrite these equations in matrix notation with the state vector x_i = (μ_i, s_i)ᵀ.
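Because the two-state generative model is linear and Gaussian, the optimal estimator for it is a Kalman filter over the state vector x_i = (μ_i, s_i)ᵀ. The following is a minimal sketch; initial values and noise parameters are illustrative, and the fitted model in the paper may be parameterized differently:

```python
import numpy as np

def two_state_kalman(measurements, sig_q, sig_p, sig_m, mu0=0.0):
    """Kalman filter for mu_i = mu_{i-1} + eps_q, s_i = mu_{i-1} + eps_p,
    m_i = s_i + eps_m; returns the posterior estimate of s_i per trial."""
    A = np.array([[1.0, 0.0],           # state transition: both mu_i and s_i
                  [1.0, 0.0]])          # depend only on the previous mean
    Q = np.diag([sig_q**2, sig_p**2])   # process noise covariance
    H = np.array([[0.0, 1.0]])          # only the stimulus state is measured
    R = np.array([[sig_m**2]])          # sensory noise variance
    x = np.array([mu0, mu0])            # initial state estimate
    P = np.eye(2)                       # initial state covariance
    estimates = []
    for m in measurements:
        x = A @ x                       # predict
        P = A @ P @ A.T + Q
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
        x = x + K @ (m - H @ x)         # update with the current measurement
        P = (np.eye(2) - K @ H) @ P
        estimates.append(float(x[1]))   # perceptual estimate of the stimulus
    return np.array(estimates)
```

The estimate of s_i is pulled toward the running estimate of the mean μ, which produces central tendency, while the trial-to-trial update of μ carries information from previous stimuli and produces sequential dependence.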