Modeling the effects of perisaccadic attention on gaze statistics during scene viewing

How we perceive a visual scene depends critically on the selection of gaze positions. For this selection process, visual attention is known to play a key role in two ways. First, image features attract visual attention, a fact that is captured well by time-independent fixation models. Second, millisecond-level attentional dynamics around the time of a saccade drive our gaze from one position to the next. These two related research areas on attention are typically perceived as separate, both theoretically and experimentally. Here we link the two research areas by demonstrating that perisaccadic attentional dynamics improve predictions on scan path statistics. In a mathematical model, we integrated perisaccadic covert attention with dynamic scan path generation. Our model reproduces saccade amplitude distributions, angular statistics, intersaccadic turning angles, and their impact on fixation durations as well as inter-individual differences using Bayesian inference. Therefore, our results lend support to the relevance of perisaccadic attention to gaze statistics.
Lisa Schwetlick et al. present a computational model linking visual scan path generation in scene viewing to physiological and experimental work on perisaccadic covert attention, the act of attending to an object visually without obviously moving the eyes toward it. They find that integrating covert attention into predictive models of visual scan paths greatly improves the model’s agreement with experimental data.

Visual perception in humans is the result of complex signal processing of visual input in the brain. Information enters the eyes at a rate of about 10^8−10^9 bit/s 1 . In order to handle this enormous amount of input, the visual system relies on foveation and selective attention 2 . These two mechanisms reduce the information available at any given point in time to enable the brain to efficiently process the relevant aspects of visual information. Foveation refers to the decrease of visual acuity from the region extending about 2° around the point of fixation (the fovea) to the periphery of the visual field. During natural viewing, regions of interest are sequentially moved into the high-resolution foveal area by saccadic eye movements 3,4 . Natural vision is therefore an active process, determined by sequential choices of fixation locations. The resulting scan path 5 is characterized by pronounced spatial correlations 6 . Selective attention is the second key bottleneck of visual processing with a rate of about 100 bit/s 7 , prioritizing selected image regions at the cost of others. Under natural viewing conditions, fixation position and visual attention are closely linked and coincide at the same location most of the time during viewing 3 .
Experimentally, however, the locus of visual attention and fixation position can diverge, a condition referred to as covert attention 8,9 . Research on saccade dynamics in highly controlled experimental setups indicates that attention, as measured by processing benefits, precedes the fixation to the next saccade target [10][11][12] . Current models of eye movements and visual attention are typically based on the plausible simplification of directly equating location of attention and fixation position [13][14][15][16] . Here we propose that perisaccadic covert attention shifts are an important factor in eye-movement guidance. The field of modeling eye-movement behavior has primarily focused on predicting where fixations are placed in an image [17][18][19] . The most advanced models are able to predict fixation density maps that closely resemble the empirical fixation densities they are based on 20 . The step from modeling static fixation densities to predicting scan paths reveals that bottom-up image information, while important, cannot comprehensively explain the fixation selection process. This is illustrated by the fact that even a model that comprises no image information at all outperforms some static saliency models 21,22 . Thus, scan path dynamics also play an important role. The ability of a model to predict human-like behavior can be much improved 16 by adding basic dynamic mechanisms to the static image-based predictions 15,21,23,24 .
Theoretical 13,25 and experimental work 24 agree that two essential components in explaining dynamic scan paths are attentional selection and inhibitory tagging of previously fixated locations. The former refers to the combination of foveation and the attentional field, which defines a limited area from which information can be extracted. The attentional field is often represented as a Gaussian distribution, with its peak representing the fovea. Thus, as a first-order approximation, visual input is given by a Gaussian blob defined by the fixation position in a given scene. The second component keeps track of fixation history in order to drive exploration in scan paths and prevent continuous return to the same high-saliency regions 26 . In behavioral experiments, inhibition of return has been widely found as a component of human visual behavior 27 , electrophysiology 28 , and, more recently, as a neural process in the frontal eye field 29 .
Attentional selection and inhibitory tagging have been previously implemented in a dynamical model for scan path generation 14,16 . The SceneWalk model 14 serves as a platform for the current work on the analysis of the role of attention around the time of saccade. Conceptually the model comprises two independent streams, activation and inhibition, which are computed on discrete 128 × 128 grids mapped to the image dimensions. The activation stream is implemented as a Gaussian aperture around the current fixation location (see Eq. (1)) convolved with a saliency map. This local saliency then evolves over time using a differential equation (see "Methods" for mathematical details), meaning that past fixations can influence the current activation stream. The inhibition stream implements fixation tagging by Gaussian maps centered around the fixation location and similarly evolving over time using a differential equation such that past fixations retain some influence over the current inhibition stream. The size of the Gaussian window $\sigma_{A/F}$, as well as the decay parameters $\omega_{A/F}$ and other free model parameters are jointly obtained from the parameter inference (see "Methods"). As illustrated in Fig. 1, activation and inhibition maps are subtractively combined to yield a priority map 30 , i.e., the 2D fixation probability map for the selection of the upcoming saccade target.
In the current context of perisaccadic processes, it is important to note that the strongest impact on mean fixation duration is generated by the variation in saccadic turning angles 15 . Continuing to move along the previous saccade's vector is associated with much shorter fixation durations than when the saccade direction changes by 90° or more (see the 80 ms effect in Fig. 6a). Therefore, we primarily seek to explain this coupling between fixation duration and saccade angle. Thus, we simplify our analysis by assuming random timing of fixation durations (assuming a gamma distribution) and investigate the coupling with target selection under different turning angles. In future work the temporal control in the model could be extended to include other metrics (e.g., local saliency) for predicting fixation durations.
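The random-timing assumption can be illustrated with a short sketch. The function name and the shape and scale values below are hypothetical placeholders rather than estimates from this study; they are chosen only so that the mean duration is a few hundred milliseconds.

```python
import random

def sample_fixation_durations(n, shape=4.0, scale=0.07, seed=0):
    """Draw n fixation durations (in seconds) from a gamma distribution.

    shape and scale are hypothetical illustration values, not fitted
    parameters; the mean of the distribution is shape * scale.
    """
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]

durations = sample_fixation_durations(1000)
mean_duration = sum(durations) / len(durations)
```

Because the durations are drawn independently of image content, any dependence of turning angle on fixation duration in such simulations has to arise from the target selection dynamics.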
In this article we investigate a neurophysiologically plausible implementation of attentional dynamics and inhibitory principles. We extend the SceneWalk model 14 of eye-movement control by adding the concept of attentional shifts around the time of a saccade. Large-scale numerical simulations are carried out to estimate model parameters from experimental data using Bayesian data assimilation 16 . These covert perisaccadic attentional shifts turn out to improve model performance on a variety of eye-movement statistics.
Fig. 1 Attentional processing streams in a conceptual scan path model. Visual attention and inhibitory tagging are largely independent processing streams which evolve neural activations via time-dependent input and decay. Constraining a saliency map (black and white color map) by a Gaussian aperture can approximate the extent of visual attention (orange color maps), as shown on the left. Inhibitory tagging, shown in blue color maps, keeps track of previously visited locations, as shown on the right. The '×' marks the current fixation position. Combining the activation and inhibition streams yields a priority map from which fixation positions can be selected.

Results
The current work investigated the potential role of perisaccadic attention on human saccade statistics. In the next paragraph, we explain our theoretical model, before we describe experimental paradigm and experimental data.
Integrating perisaccadic attention with gaze control. Before the saccade is executed toward a target, performance benefits in accuracy and speed can be measured at the target location. This has frequently been interpreted as attention being allocated to the part of the image that is about to be fixated as part of saccadic planning. In Fig. 2a (leftmost), we see that during a fixation, the fixation location and the center of attention are coaligned. Once the upcoming target location is selected from the priority map $u_{ij}(t)$ but before the saccade occurs (Fig. 2a, second from left), attention already moves to the upcoming saccade target, decoupling fixation (red three-pointed star) and attention (green five-pointed star). The concept that covert attention shifts precede saccadic eye movements is well-established in the literature 10,31 , with clear evidence for this predictive attentional targeting as early as 150 ms before saccade onset 32 .
Furthermore, attention has been shown to move retinotopically with the saccade 33 . Thus, just after a saccade similar processing benefits can be found in a location along the saccade vector, which aligns with the retinotopic position of the target before the saccade 34 , a phenomenon called retinotopic attentional trace (RAT). The pre-allocated attention peak moves with the saccade such that it lands shifted along the saccade vector away from the saccade target. Figure 2a (third from left) shows that immediately after a saccade, attention is shifted to the same retinotopic position as the previous pre-saccadic shift and thus spatiotopically shifted in the same direction as the saccadic movement. Experimentally, the influence of the shift lasts about 100−200 ms 34 . After this interval the locus of activation moves to coincide with the fixation position again (Fig. 2a, rightmost panel). An alternative representation of the temporal progression of perisaccadic processes in the model is available in Supplementary Fig. S2.
If we consider the added activation along the saccade vector as a component in saccade selection, this is in good agreement with the experimental finding of shorter fixation durations before forward saccades. The post-saccadic RAT is therefore the second part of the attentional decoupling that begins before saccade onset. Behavioral evidence for attentional shifts during a saccade 10 as well as neurophysiological correlates for postsaccadic retinotopic enhancements have been found 35 . Below we suggest that attentional shifts are a likely explanation for a systematic effect on saccade statistics observed during scan path formation. Figure 2b illustrates the influence of perisaccadic attentional shifts on the activation maps. The streams evolve over time (Eqs. (4) and (5)). Each successive map consists of the previous map and the current new information in a ratio determined by the decay function. The model, thus, has infinite memory, although depending on the strength of the decay parameters, the influence of previous fixations may decrease rapidly. Fixation targets are selected from the priority map (Eq. (6)) at time $t_{fix} - \tau_{pre}$, where $t_{fix}$ is the duration of the fixation and $\tau_{pre}$ is the duration of the pre-saccadic shift. Once the upcoming target is selected, attention moves to its location; after saccade execution, the post-saccadic attentional shift occurs; lastly, attention and fixation position are realigned when entering the main fixation phase (for details of the implementation, see Supplementary Information).
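The ordering of the phases within a single fixation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the phase labels, and the numerical durations are hypothetical, and the clipping rule for fixations shorter than the combined shift durations is an assumption that the text does not specify.

```python
def fixation_phases(t_fix, tau_post, tau_pre):
    """Split one fixation of duration t_fix into (label, start, end) phases.

    Assumption: if the fixation is too short, the pre-saccadic phase is
    kept and the post-saccadic and main phases are shortened.
    """
    pre = min(tau_pre, t_fix)
    post = min(tau_post, t_fix - pre)
    t_select = t_fix - pre  # time at which the next target is selected
    return [
        ("post-saccadic shift", 0.0, post),
        ("main", post, t_select),
        ("pre-saccadic shift", t_select, t_fix),
    ]

# Hypothetical durations in seconds.
phases = fixation_phases(t_fix=0.30, tau_post=0.10, tau_pre=0.05)
```

In this picture a short fixation consists almost entirely of shift phases, which is why the post-saccadic shift can dominate target selection after short fixations.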
In the experiments, 35 human observers viewed 30 natural color images (see Supplementary Information). We will compare simulations for the baseline model 14,16 , which includes only local saliency and inhibition evolving over time, with the extended model that includes perisaccadic attention mechanisms. Model parameters for both models were estimated independently for each participant. For model fitting, fixation sequences of 2/3 of the images were used as training data, while all subsequent analyses were carried out on the remaining test images for each participant. The following section details some characteristic eye-movement statistics found in experimental data.
Saccade amplitude distribution. The distribution of saccade amplitudes generated during a scene viewing experiment varies across participants and images. Overall, both the baseline and the extended models reproduce the qualitative shape of the saccade amplitude distribution (Fig. 3a) [36][37][38][39] . The experimentally observed saccade amplitude distribution is right-skewed, reflecting that amplitudes tend to be smaller than computer-generated saccades obtained by random sampling from the static 2D fixation density 6,14 . Previously, we suggested this drop in saccade amplitudes is caused by the foveated visual system, which preferentially selects saccade targets from within attentional span. Therefore, inter-individual differences in mean saccade amplitudes should correlate with the size of the attentional span $\sigma_A$, which is defined as the standard deviation parameter of the Gaussian-shaped attentional blob (see Eq. (1)). In Fig. 3b, we show the expected correlation between $\sigma_A$ and mean of saccade amplitude across participants, indicating that a larger area does indeed lead to longer saccades.
This statistic is perhaps the most prominent and intuitive. Previous modeling studies, like our baseline model, have been able to capture it as well as the extended model. The result we show here confirms that our addition of more complex mechanisms has not come at the cost of the more basic effects.
Additionally, the improved fitting procedure allows both models to be fit separately for each subject. With model parameters estimated for each participant using the training images, the predicted mean saccade amplitudes for test images were compared to experimentally observed mean saccade amplitudes. We found good agreement between predicted and experimentally observed mean saccade amplitudes ( Fig. 3c) indicated by a high correlation (r = 0.91). Our model is able to explain the inter-individual differences in the data via parameter variation.
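The agreement between predicted and observed per-participant means is summarized by a Pearson correlation. A minimal sketch of that computation is given below; the function name is hypothetical and the example values are invented for illustration only, not data from this study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Invented mean saccade amplitudes (degrees), one value per participant.
observed = [5.1, 6.3, 7.0, 5.8, 6.6]
predicted = [5.4, 6.1, 7.2, 5.5, 6.9]
r = pearson_r(observed, predicted)
```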
Absolute and relative saccade angle distributions. Saccade angles are another important characteristic of human eye-movement behavior. The absolute angle distribution reports the directions of saccades relative to the image frame. Interestingly, there is a strongly image-dependent tendency, which varies mostly with the distribution of image features. On average the distribution shows characteristic peaks in the four cardinal directions 40,41 . Figure 4a shows that the baseline model does not show the pronounced pattern found in experimental data. Comparatively the extended model shows a clear improvement with distinct peaks at 0°, 90°, 180°, 270°, and 360°. The extended model implements a mechanism for an oculomotor potential (see Eqs. (14) and (15)), which preferentially weights the activation in the cardinal directions 42 before it is combined with the inhibition stream.
The saccade turning angle distribution characterizes the relationship of consecutive saccades. In the experimental data there is a clear bias towards forward saccades, which follow the same vector of motion, and a secondary preference for return saccades, which reverse the saccade vector. Therefore, we should expect clear peaks at 0° and 180° in the corresponding turning angle distribution 21,24,43,44 . Figure 4b shows the results of the baseline and extended models in comparison to experimental data. The baseline model produces a U-shaped distribution without any indication of a forward bias. There is an increased probability of turning by about 180°, since the edges of the image represent hard constraints. This effect is large enough to overshadow the effects of return saccades that directly return to the previous fixation location (of which there are comparatively few). The extended model does develop a peak for forward saccades, showing better qualitative agreement with the experimental data, although the bias towards forward saccades is clearly weaker than in the experiment. The model's slightly muted responses could be caused by a number of factors, not least of which is the fact that the chosen general purpose likelihood procedure does not specifically target this metric. The indirect fitting of parameters supports the existence of the directional biases but may capture them only partially in the presence of other variance in the data.
The statistical preference of observers to maintain current saccade direction has been referred to as saccadic momentum [43][44][45][46] . Here we propose that the experimental effect is at least partially due to attentional enhancement in the current saccade direction, which generates a peak in the attention map that produces the forward bias.
Joint probability of intersaccadic angle and amplitude. More generally, we can identify potential dependencies of saccade turning angle and saccade amplitude by visualizing the corresponding joint probability (Fig. 5). As discussed above, compared to all other directions, there is a pronounced tendency for saccades to either maintain or completely reverse the direction of the previous saccade. This effect is well documented in the literature 21,24,38,43,44 and is independent of a variety of other factors such as image content. The values on the axes in Fig. 5 are relative to direction and amplitude of the previous saccade. In this normalized coordinate space, the previous saccade moved from position (−1, 0) to position (0, 0). The plotted density indicates the probability of the following saccade to be executed in a direction and with an amplitude relative to the previous saccade. Figure 5a reveals that there are two clear peaks in the experimental data, i.e., the return peak to the normalized launch site (−1, 0) of the previous saccade and the forward peak that is related to the saccadic momentum effect discussed above. It is important to note that the experimental return peak is not particularly high, but it is distinct since surrounding 2D regions do not exhibit a high fixation density. In our extended model (Fig. 5b), the mechanism responsible for the forward saccades is the attentional shift before and after a saccade (Eqs. (9)−(11)). The distinctive shape of the return saccade peak, we suggest, is the result of the combination of a slow, global inhibition of return and a directed smaller facilitation of return (Eq. (12)) (see Supplementary Information). The former is implemented as the model's inhibition stream, while the latter is implemented as reduction in decay speed in the attention map, localized at the previous fixation location. 
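The turning angle and the normalized coordinate space used for the joint probability can be computed from three consecutive fixations. The following sketch assumes mathematical axis orientation (y pointing up; with image coordinates the sign of the angle flips), and both function names are hypothetical.

```python
import math

def turning_angle_deg(p0, p1, p2):
    """Angle between saccade p0->p1 and saccade p1->p2, in [-180, 180] degrees.

    0 corresponds to a forward saccade, +/-180 to a complete reversal.
    """
    v1 = (p1[0] - p0[0], p1[1] - p0[1])
    v2 = (p2[0] - p1[0], p2[1] - p1[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.atan2(cross, dot))

def normalized_landing(p0, p1, p2):
    """Landing point of saccade p1->p2 in a frame where p0->p1 runs from (-1, 0) to (0, 0)."""
    v1 = (p1[0] - p0[0], p1[1] - p0[1])
    v2 = (p2[0] - p1[0], p2[1] - p1[1])
    norm_sq = v1[0] ** 2 + v1[1] ** 2
    # Project onto the previous saccade direction and its perpendicular,
    # then divide by the previous amplitude (both steps combined in norm_sq).
    along = (v2[0] * v1[0] + v2[1] * v1[1]) / norm_sq
    across = (v1[0] * v2[1] - v1[1] * v2[0]) / norm_sq
    return (along, across)

# A return saccade to the launch site lands at (-1, 0) in the normalized frame.
return_point = normalized_landing((0.0, 0.0), (2.0, 0.0), (0.0, 0.0))
```

A forward saccade of the same amplitude as its predecessor lands at (1, 0) in this frame, so the two peaks described in the text lie on the horizontal axis on either side of the origin.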
The baseline model cannot produce the return and forward peak, since it lacks the mechanistic principles for coupling subsequent saccades.
Intersaccadic angle and fixation duration and saccadic amplitude. The next two analyses address how fixation duration and saccade amplitude each depend on the saccadic turning angle. Both have a distinctive shape in the data, showing that forward saccades tend to be shorter and preceded by shorter fixations, while changing direction takes longer and evokes longer saccades. Pilot simulations indicated that the effects reported in this section are not due to the addition of the oculomotor potential.
The new model notably improves the fit of the dependence of fixation duration on the turning angles (see Fig. 6). While previously there was no temporal component in the model, the added phases of shifted activation enable the model to dynamically respond to the duration of a fixation. In the model, each fixation begins with the post-saccadic shift phase. In terms of the attention activation map, this means that there is more activation along the previous saccade vector. After this phase the influence of the shift diminishes. Thus, when the fixation is short, there is still a lot of influence from the shift, increasing the chance of producing a forward saccade. When the fixation is long, the influence of the post-saccadic shift has subsided, allowing for activation from other salient locations to guide the saccade.
Likelihood-based comparison. Since our approach includes the likelihood computation of the baseline and extended models, we can make use of the models' likelihood functions for model comparison 16 . This approach entails evaluating the model likelihood given the empirical test data and computing the average log-likelihood per fixation of all scan paths. We then compare this metric to previous models 47 .
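The average log-likelihood per fixation can be sketched as follows. The sketch assumes that, for every observed fixation, the model's probability for the fixated grid cell is already available, and it uses base-2 logarithms (bits per fixation), which is an assumption about units. The function name is hypothetical.

```python
import math

def mean_log_likelihood(scanpath_probs):
    """Average log2-likelihood per fixation over all scan paths.

    scanpath_probs: list of scan paths, each a list with the model
    probability assigned to each observed fixation location.
    """
    total, count = 0.0, 0
    for probs in scanpath_probs:
        for p in probs:
            total += math.log2(p)
            count += 1
    return total / count

# Reference point: a model that is uniform over a 128 x 128 grid.
uniform_p = 1.0 / (128 * 128)
uniform_ll = mean_log_likelihood([[uniform_p] * 3, [uniform_p] * 2])
```

Normalizing by the number of fixations makes scan paths of different lengths comparable, and a uniform model provides a natural lower reference for the resulting values.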
The overall likelihood of the model given the data is larger for the extended model than for the original model (Fig. 7). In general, improved likelihood indicates improved predictive power of a model. The additions to the baseline model discussed in the current study, though theoretically well-founded, were extensive and considerably increased the model complexity. Conceivably adding these mechanisms could have led to improved scan path dynamics but worsened overall likelihood predictions, or else made the model volatile or unstable. In general, the likelihood is an objective measure of overall model performance 16 . As we have seen, the extended model performs much better than the baseline model at a number of qualitative eye-movement effects, while the improvement in general model likelihood is relatively small. Effects such as the impact of saccade turning angles on saccade amplitude are strong and important for biological plausibility of the model. At the same time, however, the impact on the overall likelihood is limited, since their contribution to 2D fixation density is small. In combination, the large improvements in eye-movement statistics and relative improvements in likelihood across model variants allow a strong conclusion in favor of the proposed model extension.

Discussion
Moving from models of static fixation probabilities to the generation of scan paths has recently begun to attract interest in the field of attention modeling [14][15][16]23,48 . The success of saliency-based visual attention modeling 13,19,47 over the last 30 years makes a strong case for the use of priority maps 30 as a core component in scan path generation. In addition to image and task influences biologically represented in priority maps, scan paths on scenes are also characterized by a number of statistical characteristics, e.g., saccade angles and modulations of fixation duration or saccade amplitude by saccadic turning angles. Our modeling study lends support to the view that attentional dynamics around the time of saccade exert a fundamental influence on the behavioral statistics of scan paths.
Previous research on visual attention shows that processing resources are covertly allocated away from the current fixation location just before 10,31,32 and just after 32,34,35 a saccade is produced. In this study, we added shifts of covert attention to a dynamical model of scan path generation 14,16 and find improved agreement with gaze statistics observed in experimental data. Most importantly, the characteristic distribution of saccadic turning angles with a clear bias towards forward and return saccades and the influences of saccadic turning angle on fixation durations and saccade amplitudes can be explained by covert attention shifts around the time of a saccade. The importance of covert attention and perisaccadic mechanisms is apparent throughout the visual system, both at the macroscopic and at the microsaccade levels [49][50][51] .
The first generation of computational models in scene viewing were static models that predicted fixation locations on any given image based on statistical image features. The strength of these static models lies in producing densities that resemble empirical fixation density maps. Recently, the predictive power of some models has become close to perfect and approached the gold standard 19,47 . However, by design these models do not take temporal dynamics within a scan path and the inhomogeneity of the retinal acuity into account. From this perspective, it is not surprising that static models predict fixation density, but not sequences of fixations 16,24,52 . This simple fact points to the interesting observation that eye movements in scene viewing are guided in large part, but not exclusively, by observer- and image-specific factors. Human eye movements are influenced by oculomotor and attention systems, producing pervasive systematic statistical tendencies in experimental data.
Previously published dynamic models outperform static models substantially 16,23 . The most evident feature of the human visual system that indisputably influences scan path dynamics is foveation. Accordingly, even a minimal model like weighting a saliency map by the distance to a current fixation location significantly improves model performance 53 . The SceneWalk model 14 , which served as a baseline for our study, incorporates foveated saliency in its activation stream. A further advance in the modeling of scan paths has been the addition of inhibitory fixation tagging 26,54,55 . The baseline model implements such an inhibition stream as a second component shaping the priority map 30 by difference of activation.
The fact that long fixations often occur in frequently fixated areas 56 implies that fixation duration and target selection are related. The LATEST model 15 combines the prediction of scan paths and fixation durations by interpreting scan paths as a continuous series of stay (maintain fixation) or go (saccade) decisions [57][58][59] . Each individual location on a weighted saliency map influences two LATER units 60 , i.e., one for normal and long latencies and one for short latencies. These units accumulate evidence from each location in the image until one reaches a threshold depending on the current location, triggering a saccade. The accumulation rate of each location in the image is controlled by image-content factors like image features and semantic interest, as well as by oculomotor factors like the change in saccade direction and target eccentricity. Coupling of experimental data and model is achieved by statistical linear mixed-effects modeling. Thus, the LATEST model makes little attempt at explaining the origin of the factors that influence the rate of evidence accumulation, instead focusing on the specific selection mechanism and its relationship with fixation duration. By contrast, the extended SceneWalk model is based on mechanistic assumptions derived from neural and cognitive knowledge about the contributing factors to fixation selection. Parameters are based on a statistically rigorous likelihood approach that evaluates the model assumptions given the data.
Generally, the value of a model must be quantified in terms of predictive power and explanatory value. For the models discussed here, we carried out comparisons of simulated scan paths and human eye-movement data. A number of metrics have been proposed for such a comparison [61][62][63] . Critically, however, the choice of individual statistics has a crucial influence on the outcome and there is, in most cases, no rigorous justification for the used metric. A solution to this is to evaluate dynamical scan path models using a likelihood approach 16 , which provides a statistically well-founded and reliable measure for the predictive quality of a dynamical model. In this article we relied on Bayesian data assimilation 64 as a statistically rigorous framework for testing whether the model architecture accurately represents the data generation process. This approach turned out to be particularly fruitful for strongly theory-guided models. Using general likelihood to estimate parameters of the model lends credibility to the theoretical foundations from eye-movement literature implemented by the model.
In addition to better predicting human scan paths during scene viewing, the integration of biologically inspired attentional dynamics into models of eye guidance unifies two very disparate fields of eye-movement research. The research into covert attention shifts and perisaccadic effects is typically concerned with processes that occur on a highly detailed level in very controlled experimental setups. By contrast, scene viewing literature usually operates at a higher level, on which the minutiae of saccade programming or covert attention are typically passed over. Thus, influences arising from the microscopic level of eye-movement control can explain effects we observe at the macroscopic level.

Methods
Experiment. Experimental data for this study were collected in a larger corpus study on scene viewing which is described in detail elsewhere 65,66 . Images and fixation data from this corpus experiment can be downloaded from an Open Science Framework repository (see below 66 ). The corpus consists of eye-movement data from 105 participants viewing 90 images of natural or urban landscapes from six different categories for a fixed duration (10 s). Each category contained 15 images. Images were chosen such that the most interesting image parts fell on the left, right, upper, lower, or central image side (Supplementary Fig. S1 provides some examples). The last category consisted of images with natural patterns, minimizing the influence of particularly salient objects. During viewing, subjects were given no task except to freely view the images.
In this study we used Experiment 3 from the corpus study 65,66 , in which participants viewed color images. This subset of data contains the eye movements of 35 participants, who viewed 30 images from each category without a task. We further split the data set into test and training data by randomly choosing 1/3 of the images (ten from each category) for each participant.
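A per-participant, per-category random split of this kind can be sketched as follows. The function name, the image identifiers, and the toy catalog sizes are hypothetical; only the principle (hold out one third of the images in each category, with an independent draw for each participant) follows the text.

```python
import random

def split_train_test(images_by_category, test_fraction=1 / 3, seed=0):
    """Hold out a fraction of the images in every category as test data."""
    rng = random.Random(seed)  # e.g., one seed per participant
    train, test = [], []
    for category in sorted(images_by_category):
        images = list(images_by_category[category])
        n_test = round(len(images) * test_fraction)
        held_out = set(rng.sample(images, n_test))
        test.extend(img for img in images if img in held_out)
        train.extend(img for img in images if img not in held_out)
    return train, test

# Hypothetical catalog: two categories with six images each.
catalog = {
    "left": [f"left_{i}" for i in range(6)],
    "pattern": [f"pattern_{i}" for i in range(6)],
}
train_images, test_images = split_train_test(catalog, seed=42)
```

Drawing the held-out set within each category keeps the category proportions identical in training and test data.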
For saccade detection we applied a velocity-based algorithm 67,68 . Saccades had a minimum amplitude of 0.5° and exceeded an average velocity during a trial by six (median-based) standard deviations for at least six data samples (12 ms). The epoch between two subsequent saccades was defined as a fixation. After preparation, 312,267 fixations and saccades were detected for further analysis.
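The detection criteria can be illustrated with a simplified sketch. It is not the cited algorithm itself: the two-point central-difference velocity, the elliptic combination of the horizontal and vertical thresholds, and all function names are assumptions made for illustration, while the threshold of six median-based standard deviations, the minimum of six samples, and the minimum amplitude of 0.5° follow the text.

```python
import math
import statistics

def median_sd(values):
    """Median-based estimate of the standard deviation of a velocity series."""
    variance = statistics.median([v * v for v in values]) - statistics.median(values) ** 2
    return math.sqrt(max(variance, 1e-12))  # guard against zero or negative values

def detect_saccades(x, y, dt=0.002, lam=6.0, min_samples=6, min_amplitude=0.5):
    """Return (first, last) sample indices of detected saccades.

    x, y: gaze position in degrees; dt: sampling interval in seconds.
    """
    # Central-difference velocity for samples 1 .. n-2 (simplifying assumption).
    vx = [(x[i + 1] - x[i - 1]) / (2 * dt) for i in range(1, len(x) - 1)]
    vy = [(y[i + 1] - y[i - 1]) / (2 * dt) for i in range(1, len(y) - 1)]
    tx, ty = lam * median_sd(vx), lam * median_sd(vy)
    # A sample counts as fast if its velocity lies outside the threshold ellipse.
    fast = [(a / tx) ** 2 + (b / ty) ** 2 > 1.0 for a, b in zip(vx, vy)]
    saccades, start = [], None
    for k, flag in enumerate(fast + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = k
        elif not flag and start is not None:
            first, last = start + 1, k  # velocity index k-1 is sample index k
            amplitude = math.hypot(x[last] - x[first], y[last] - y[first])
            if k - start >= min_samples and amplitude >= min_amplitude:
                saccades.append((first, last))
            start = None
    return saccades

# Synthetic 500 Hz trace: small jitter plus one 2 degree rightward movement.
n = 220
ramp = [min(max((i - 100) / 10.0, 0.0), 1.0) * 2.0 for i in range(n)]
x = [0.01 * math.sin(i) + r for i, r in zip(range(n), ramp)]
y = [0.01 * math.cos(1.3 * i) for i in range(n)]
saccades = detect_saccades(x, y)
```

Using the median rather than the mean for the velocity spread keeps the threshold from being inflated by the saccades themselves.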
Baseline model. The original SceneWalk model 25 was implemented on a 128 × 128 grid, where (x, y) give the physical coordinates in degrees. For each fixation in the scan path we start by computing simple 2D Gaussians centered at the current fixation position $(x_f, y_f)$ for both the inhibition and the attention pathway, each with an appropriate standard deviation $\sigma_{A/F}$ (A denotes the attention stream, F denotes the fixation stream to generate inhibitory tagging).
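Eq. (1), which is referenced in the text, defines these Gaussians. A sketch of its likely form, assuming a standard isotropic 2D Gaussian, is given below; the normalization prefactor is an assumption and plays no role wherever the maps are renormalized.

```latex
G_{A/F}(x, y; x_f, y_f) = \frac{1}{2\pi\sigma_{A/F}^{2}}
  \exp\!\left(-\frac{(x - x_f)^{2} + (y - y_f)^{2}}{2\sigma_{A/F}^{2}}\right)
```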
Both the inhibition $F_{ij}(t)$ and the activation $A_{ij}(t)$ streams evolve over time under current visual input and decay (due to limited visual memory), as described by Eqs. (2) and (3), where the input to the activation map is the Gaussian-weighted local saliency $S_{kl}\,G_A(x_k, y_l; x_f, y_f)$ and the input to the inhibition map is a Gaussian blob at the current fixation position. The differential equations that determine the temporal evolution of the activation maps, Eq. (2) for the attention map and Eq. (3) for the fixation/inhibition map, can be integrated analytically to provide a closed solution for the activation changes during fixation (Eqs. (4) and (5)), where we dropped the indices i, j for simplicity. In these equations, the term $e^{-\omega_{A/F}(t - t_0)}$ determines the speed of decay of the past states of the map. Next, both activation maps are combined to compute the priority map $u_{ij}(t)$ (Eq. (6)). Mathematically, the two maps are shaped by the exponent γ before subtraction, and a weight parameter $C_F$ for inhibition is introduced. We expect γ ≈ 1, equivalent to Luce's choice rule 69 .
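The closed-form temporal evolution can be illustrated with a minimal sketch. It assumes that each map relaxes exponentially from its previous state toward a normalized input, which is one form consistent with the decay term described above but is not taken verbatim from the equations of the model. The grid is a toy 3 × 3 example, and the saliency values, parameter values, and function names are hypothetical.

```python
import math

def normalize(values):
    """Scale a flattened map so that its entries sum to one."""
    total = sum(values)
    return [v / total for v in values]

def gaussian(x, y, xf, yf, sigma):
    """Unnormalized isotropic 2D Gaussian centered at (xf, yf)."""
    return math.exp(-((x - xf) ** 2 + (y - yf) ** 2) / (2 * sigma ** 2))

def evolve_map(previous, target, omega, dt):
    """Relax a map from its state `previous` toward `target` for dt seconds.

    Assumed closed form: target + exp(-omega * dt) * (previous - target).
    """
    decay = math.exp(-omega * dt)
    return [tgt + decay * (prev - tgt) for prev, tgt in zip(previous, target)]

# Toy 3 x 3 grid, flattened row by row, with a hypothetical saliency map.
cells = [(x, y) for y in range(3) for x in range(3)]
saliency = [1.0, 2.0, 1.0, 2.0, 4.0, 2.0, 1.0, 2.0, 1.0]
fixation = (1, 1)

attention_input = normalize(
    [gaussian(x, y, *fixation, sigma=1.0) * s for (x, y), s in zip(cells, saliency)]
)
inhibition_input = normalize([gaussian(x, y, *fixation, sigma=0.5) for x, y in cells])

uniform = [1.0 / 9] * 9  # assumed initial state of both maps
A = evolve_map(uniform, attention_input, omega=10.0, dt=0.25)
F = evolve_map(uniform, inhibition_input, omega=1.0, dt=0.25)
```

With a large decay rate the map forgets its previous state quickly and approaches the current input, whereas a small rate preserves the influence of earlier fixations.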
As subtraction can cause negative activation, in the next step we take only the positive component of the map, and, finally, add noise ζ to obtain the probability map π(i, j) for the selection of saccade targets. This process is repeated for each fixation in a sequence, where the current state information is combined with the past activation maps to produce a continuously evolving prediction of the next fixation.
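The combination of the two streams into a probability map can be sketched as follows. The per-map normalization and the way the noise $\zeta$ is mixed in (a convex combination with a uniform map) are assumptions made for illustration, and the function name `selection_probabilities` is hypothetical.

```python
import numpy as np

def selection_probabilities(attention, inhibition, gamma=1.0, c_f=0.3, zeta=0.01):
    """Combine attention and inhibition maps into saccade-target probabilities."""
    a = attention ** gamma
    f = inhibition ** gamma
    # Priority map: shaped, normalized attention minus weighted inhibition.
    u = a / a.sum() - c_f * f / f.sum()
    # Keep only the positive component.
    u_pos = np.clip(u, 0.0, None)
    total = u_pos.sum()
    if total > 0.0:
        normalized = u_pos / total
    else:
        normalized = np.full(u.shape, 1.0 / u.size)
    # Assumed noise model: mix with a uniform background of weight zeta.
    return (1.0 - zeta) * normalized + zeta / u.size

# Toy example: flat attention, inhibition concentrated on one cell.
attention = np.full((4, 4), 1.0 / 16)
inhibition = np.zeros((4, 4))
inhibition[0, 0] = 1.0
pi = selection_probabilities(attention, inhibition)
```

In this toy case the inhibited cell retains only the background probability contributed by the noise term, while the remaining cells share the rest equally.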
The model structure reveals the following parameters: (1, 2) $\sigma_A$ and $\sigma_F$, the standard deviations of the current fixation's attention and inhibition Gaussians, respectively; (3, 4) $\omega_A$ and $\omega_F$, the rates at which past states of the model lose influence over the current state; (5) $\gamma$, the shaping parameter for the Gaussians; (6) the coupling factor $C_F$, which is the weight of the inhibition pathway; and (7) the noise parameter $\zeta$, which determines the background noise for the probability map $\pi(i, j)$.
Pre-saccadic attentional shifts. Once a new fixation location is chosen, the center of attention moves to the upcoming fixation location, while the center of the inhibition map remains at the current fixation location (see Table 1). In the model, the pre-saccadic shift is implemented by centering the attentional Gaussian on the next fixation location for a time $\tau_{pre}$, while the inhibition remains in the same position. The inhibition stream is calculated for the entire fixation duration using Eq. (5). For the attention stream, we define a Gaussian $G_A^{pre}$ centered at the upcoming fixation location and continue computations using Eqs. (4) and (5) with $G_A^{pre}$ instead of $G_A$ for the duration of $\tau_{pre}$. When the pre-saccadic phase terminates, the saccade is executed.
Post-saccadic attentional shifts. The center of the post-saccadic attention peak is determined by extending the vector of the preceding saccade by a shift amplitude $\eta$, i.e.,
$$(x_s, y_s) = (x_n, y_n) + \eta\,\frac{(x_\delta, y_\delta)}{\sqrt{x_\delta^2 + y_\delta^2}},$$
where the saccade direction is given by the vector $(x_\delta, y_\delta)$ with $x_\delta = x_n - x_{n-1}$ and $y_\delta = y_n - y_{n-1}$. Thus, the attentional Gaussian is centered at the shifted location $(x_s, y_s)$. After the post-saccadic shift phase, the cycle is completed and another main phase follows. The attention center moves to each of the three locations in turn via discrete steps, as shown in Table 1. We have chosen this discrete approximation with constant durations of pre- and post-saccadic shifts to compute activation changes in all fixation phases efficiently. Neurophysiological support for our discrete approximation has been found 35 , indicating that attention does not move smoothly over space from location $n$ to location $n + 1$ but instead selectively starts building up at the target location $n + 1$.
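The shifted attention center can be computed in a few lines. This sketch assumes that the preceding saccade vector is normalized to unit length before being scaled by $\eta$, so that $\eta$ is a distance in degrees; the function name `post_saccadic_center` and the handling of a zero-length saccade are hypothetical choices.

```python
import math

def post_saccadic_center(previous_fixation, current_fixation, eta):
    """Extend the preceding saccade vector beyond the current fixation by eta degrees."""
    dx = current_fixation[0] - previous_fixation[0]
    dy = current_fixation[1] - previous_fixation[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        # No preceding saccade direction: leave the center at the fixation (assumption).
        return (float(current_fixation[0]), float(current_fixation[1]))
    return (current_fixation[0] + eta * dx / length,
            current_fixation[1] + eta * dy / length)

# A saccade from (0, 0) to (3, 4) has length 5; shifting by eta = 2.5
# moves the attention center a further 2.5 degrees along the same direction.
center = post_saccadic_center((0.0, 0.0), (3.0, 4.0), eta=2.5)
```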

Facilitation of Return. To account for Facilitation of Return (FoR), we implement a selectively slower decay of the attention map in a spatial window centered at the previous fixation location. In contrast to the overall decay rate $\omega_A$, we define a reduced decay rate $\omega_{FoR}$ for a window $x - \nu < x_{f-1} < x + \nu$ and $y - \nu < y_{f-1} < y + \nu$ around the previous fixation location $(x_{f-1}, y_{f-1})$, where $\nu$ is the size of the window. Therefore, within the spatial window defined above, the decay of activation in the attention map, Eq. (4), uses the reduced rate $\omega_{FoR}$ in place of $\omega_A$. In addition to the strongly attention-related mechanisms above, we added the following two less dynamic and more general biases.
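The window-dependent decay can be expressed as a per-cell rate map. The sketch below is illustrative only: the function name `decay_rate_map`, the toy 10 × 10 grid with 1° spacing, and the numerical rates are hypothetical.

```python
import numpy as np

def decay_rate_map(grid_x, grid_y, previous_fixation, omega_a, omega_for, nu):
    """Per-cell attention decay rate: omega_for inside the FoR window, omega_a elsewhere."""
    px, py = previous_fixation
    # Square window of half-width nu around the previous fixation location.
    in_window = (np.abs(grid_x - px) < nu) & (np.abs(grid_y - py) < nu)
    return np.where(in_window, omega_for, omega_a)

# Toy grid with 1 degree spacing; previous fixation at (5, 5), window size 2 degrees.
grid_x, grid_y = np.meshgrid(np.arange(10.0), np.arange(10.0))
rates = decay_rate_map(grid_x, grid_y, (5.0, 5.0), omega_a=10.0, omega_for=1.0, nu=2.0)
# Per-cell decay factor over a 0.3 s fixation.
decay_factor = np.exp(-rates * 0.3)
```

Cells inside the window keep more of their past activation because their decay factor is closer to one.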
Parameter $t_{fix}$ indicates the fixation's duration, parameters $\tau_{pre}$ and $\tau_{post}$ are the phase durations, and parameters $\mathrm{fix}_{n+1}$, $\mathrm{fix}_n$, and remap are the locations.
Center bias. The original SceneWalk model initiates its activation maps with uniform distributions. While it is difficult to know the initial state of the visual system accurately when viewing images, previous work has shown that the central fixation bias has a strong influence on the first fixation. Starting the model with a central activation improves the predictions of the model 70 . In line with this finding, we also initiated the model with central activation. The evolution equation for the first fixation therefore starts from this central activation rather than from a uniform map.
Oculomotor potential. Research into the oculomotor system has revealed a marked preference for saccades in the cardinal directions. To implement this tendency in the model, we introduced an additive oculomotor component: a plus-shaped oculomotor map centered on the current fixation position, in which the factor $\chi$ determines the steepness of the slopes. The oculomotor map is added to the combined map $u_{ij}$ before the normalization and the addition of noise (Eqs. (7) and (8)).
Additional model parameters. The implementation of the extended SceneWalk model gives rise to several new parameters. To the seven parameters of the original model, we add (a) $\omega_{CB}$, the decay speed of the center bias; (b, c) $\sigma_{CB_x}$ and $\sigma_{CB_y}$, the size of the center bias; (d, e) $\tau_{pre}$ and $\tau_{post}$, the durations of the attention shift phases; (f) $\eta$, the distance of the post-saccadic shift; (g) $\sigma_{post}$, the size of the shifted Gaussian; (h, i) $\omega_{FoR}$, the attention decay at the previous fixation position, and $\nu$, the size of the facilitation window; and (j, k) the steepness $\chi$ and factor $\psi$ of the oculomotor potential.
Estimated and fixed model parameters. We implemented a fully Bayesian approach to parameter inference 16 . In Table 2 we report point estimates for all parameters as averages over participants. The full estimates for each participant can be found in the Supplementary Information (Tables S2 and S3). These point estimates were computed from the posterior densities by determining the highest posterior density region for an alpha of 0.5 (i.e., the highest 50% of the density lies in this region), assuming a unimodal distribution. The reported credibility intervals are the lower and upper bounds of the highest density interval. The point estimate for a parameter is the center of the highest posterior density interval. Some of the model parameters could be constrained by the physiological literature, and some had to be fixed in order to improve convergence of the parameter estimation. The latter case was checked by large-scale pilot simulations with different model versions using a separate data set. In Table S1 we list all fixed model parameters.
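For a unimodal posterior, the highest density interval can be approximated from MCMC samples as the shortest interval that contains the requested share of the samples. The sketch below illustrates this idea; the function name `highest_density_interval` and the sample values are hypothetical, and the authors' exact procedure may differ.

```python
import numpy as np

def highest_density_interval(samples, mass=0.5):
    """Shortest interval containing `mass` of the samples, plus its center."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    k = int(np.ceil(mass * n))  # number of samples inside the interval
    if k < 2:
        raise ValueError("need at least two samples inside the interval")
    # Width of every candidate interval spanning k consecutive sorted samples.
    widths = s[k - 1:] - s[:n - k + 1]
    i = int(np.argmin(widths))
    lower, upper = float(s[i]), float(s[i + k - 1])
    return lower, upper, 0.5 * (lower + upper)

# Toy posterior samples with two outliers.
samples = [0.0, 4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 20.0]
lower, upper, point_estimate = highest_density_interval(samples, mass=0.5)
```

Here `lower` and `upper` play the role of the reported credibility bounds and `point_estimate` the role of the reported parameter value.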
First, we separated the time scales of the attention and inhibition streams by one order of magnitude, i.e., $\omega_F = \omega_A/10$. We assume that the inhibition map decays an order of magnitude more slowly than the attention map, to enable long-term inhibition of return and fast build-up of activation for attentional capture. Second, we set $C_F = 0.3$, where the numerical value was obtained from pilot simulations indicating that the relative influence of the inhibition stream must be smaller (but not negligible) compared to the corresponding influence of the attention stream.
In the extended model, some of the additional parameters need further discussion. First, we set $\sigma_{CB} = 4.3$ and $\omega_{CB} = 1.5$ as described in ref. 70 , for a typically sized center bias and an attention decay that is slower than the regular $\omega$. The center bias parameters are difficult to estimate, since their influence is mainly limited to the first fixation. Second, we fixed $\omega_{FoR} = \omega_A/10$, an approximate value representing a decay that is slower by an order of magnitude, and set the size of the facilitation of return window to approximately the size of the fovea, i.e., 2° of visual angle. As before, only a relatively small number of fixations are influenced by this mechanism, making it difficult to identify the numerical value reliably. Third, we set the durations of the pre- and post-saccadic attentional shifts to $\tau_{pre} = 0.1$ s and $\tau_{post} = 0.05$ s, where the numerical values were determined by pilot simulations. Due to their small magnitude, values for $\zeta$ and $\psi$ were estimated on a log scale.
Bayesian parameter inference. Parameter inference of the dynamical models discussed here was implemented in the general framework of data assimilation 64 using a fully Bayesian estimation procedure 16,71,72 . This statistical inference is based on the computation of the models' likelihood functions. Given a fixation sequence $f_1, \dots, f_N$, where each fixation $f_i$ is determined by its coordinates $f_i = (x_i, y_i)$, the likelihood of the model specified by a set of parameters $\theta$ can be computed as a product of probabilities, i.e.,
$$L_M(\theta \mid \mathrm{data}) = P_M(f_1) \prod_{i=2}^{N} P_M(f_i \mid f_1, \dots, f_{i-1}, \theta),$$
where $P_M(f_1)$ is the probability of the initial fixation starting at time $t = 0$ and the conditional probabilities $P_M(f_i \mid f_1, \dots, f_{i-1}, \theta)$ can be read off from the model's probability map $\pi(i, j)$. For scaling and numerical reasons the log-likelihood is usually used. Thus, the sum of the scan path's log-likelihood per fixation for the entire data set gives one value that characterizes model performance. As suggested by ref. 16 , taking the $\log_2$ of the likelihood enables the use of the unit bit. A null model, in which the probability of choosing each point of a 128 × 128 pixel image is constant, would have a log-likelihood of $\log_2(1/128^2) = -14$ bit per fixation. A hypothetical model which, unrealistically, perfectly predicts the data would have a log-likelihood of 0. It is important to note that for model comparison we can take the mean log-likelihood per fixation, while for the parameter estimation the non-normalized sum log-likelihood of a scan path is the appropriate measure.
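The log-likelihood computation and the null-model reference value can be checked with a short sketch. It assumes that one probability map is available per fixation and that fixations are given as grid indices; the function name `scanpath_log_likelihood_bits` and the toy fixation indices are hypothetical.

```python
import numpy as np

def scanpath_log_likelihood_bits(probability_maps, fixation_indices):
    """Sum of log2 probabilities assigned to the observed fixations (in bit)."""
    total = 0.0
    for pi, (i, j) in zip(probability_maps, fixation_indices):
        total += float(np.log2(pi[i, j]))
    return total

# Null model: every cell of a 128 x 128 grid is equally likely.
null_map = np.full((128, 128), 1.0 / 128 ** 2)
fixations = [(10, 20), (64, 64), (100, 5)]  # hypothetical grid indices
sum_ll = scanpath_log_likelihood_bits([null_map] * len(fixations), fixations)
mean_ll = sum_ll / len(fixations)
```

The sum `sum_ll` corresponds to the quantity used for parameter estimation, and the per-fixation mean `mean_ll` to the quantity used for model comparison.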
Based on the likelihood $L_M(\theta \mid \mathrm{data})$ and a prior distribution $P(\theta)$, the posterior distribution is computed via Bayes' rule as
$$P(\theta \mid \mathrm{data}) = \frac{L_M(\theta \mid \mathrm{data})\,P(\theta)}{\int L_M(\theta \mid \mathrm{data})\,P(\theta)\,d\theta},$$
where typically a Markov Chain Monte Carlo (MCMC) approach is needed to compute the posterior density numerically. For our parameter estimations we used the implementation of the DREAM algorithm that is published as PyDream 73 . Each estimation ran three chains of 20,000 iterations. Since the DREAM estimation procedure requires a large number of model evaluations, the computing time of the likelihood function is critical for the baseline SceneWalk model and, in particular, for the extended SceneWalk model. We therefore implemented parallel computations of the likelihood for fixation sequences. The priors, loosely based on pilot estimations on a separate data set, were chosen to be broad and relatively uninformative.
Inter-individual differences in behavior are a main source of variance in eye-movement data. Here we took advantage of these differences to test model generalizability. We fitted the model independently for each participant by running a separate DREAM parameter estimation per participant. The advantage of this method is that, when simulating data, we obtain an upper limit for the variance of parameters between individual participants.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The experimental data used in this study represent a subset of the Potsdam Corpus on Spatial Frequency Search in Natural Scenes 66 , which is publicly available via the Open Science Framework (osf.io/caqt2).