The movement of the eyes has been the subject of intensive research as a way to elucidate the inner mechanisms of cognitive processes. A cognitive task that is rather frequent in our daily life is the visual search for hidden objects. Here we investigate through eye-tracking experiments the statistical properties associated with the search for target images embedded in a landscape of distractors. Specifically, our results show that the twofold process of eye movement, composed of sequences of fixations (small steps) interspersed with saccades (longer jumps), displays characteristic statistical signatures. While the saccadic jumps follow a log-normal distribution of distances, which is typical of multiplicative processes, the lengths of the smaller steps in the fixation trajectories are consistent with a power-law distribution. Moreover, the present analysis reveals a clear transition from a directional serial search to an isotropic random movement as the difficulty level of the search task is increased.
It is a common misconception to believe that memories are stored in the brain the same way a movie is stored on a hard drive. Remembering, just like seeing and listening, is in fact an act of construction much more complex than usually thought, in which vast amounts of information are processed and interpreted by the brain in order to create what we call memories, and pretty much everything else we call reality1. The field dedicated to the study of these types of processes is called Cognitive Science, which took its current form in the first half of the 20th century out of a mishmash of sciences including, among others, Psychology, Linguistics and Computer Science. More precisely, the main challenge of Cognitive Science is to answer questions related to the way in which the brain processes available information and how this shapes behaviour2.
As theoretical entities, cognitive processes cannot be directly observed and measured3. Thus, in order to study them, we need to rely on observations of the behaviour of individuals. A widely used approach is to track eye movements during cognitive tasks. Until the end of the 19th century, it was still thought that the eyes scanned a line of text smoothly during reading. Louis Émile Javal4, in his pioneering study of 1879, observed that the eyes actually move in a succession of steps, called fixations, separated by jerk-like movements, called saccades, which are too fast for new visual information to be captured5. The method of eye-tracking as a fundamental source of information about cognition was finally established through the seminal work of Yarbus6, which provided an unambiguous demonstration that the movement of the eyes is strongly correlated with the cognitive objectives of the individual.
A cognitive process that benefits the most from the study of eye movements is the visual search for hidden objects7, as when trying to find a person in a crowded place, or a 2-inch nail inside a box of nails of various sizes. An early theory of this process, due to Treisman and Gelade, is the Feature Integration Theory (FIT)8. This theory deals with attention, a kind of mental focus that can be directed to a desired region of the visual scene, thereby enhancing perceptual sensitivity in that region. The FIT proposes that visual search tasks are divided into two stages. The first is a detection stage, in which a small set of simple, separable features, such as color, size and orientation, are identified in the elements of the optical array. This stage is preattentive: attention need not be directed at each element of the image in order to perform detection, since all feature registration takes place in parallel across the whole visual scene. In the second stage, called integration, the features identified in the previous stage are combined into more complex characteristics. This is an attentive stage and is therefore much slower, requiring the observer to scan each element of the image serially.
It is interesting to note that the FIT resembles a broader category of paradigms, namely the dual process models9. Under this conceptual framework, complex cognitive tasks usually involve two systems, which essentially correspond to what Kahneman10 referred to as (1) effortless intuition and (2) deliberate reasoning. System (1) comprises processes that are fast and intuitive and can be performed automatically and in parallel, as when trying to identify the mood of a person from her/his facial expression. These processes are acquired through habit and are usually inflexible and hard to control or modify. System (2), on the other hand, is characterized by slow, serial but highly controlled processes, as is the case, for example, when one tries to solve a mathematical equation.
The FIT was thoroughly studied and expanded in subsequent years11. Although regarded, in its initial form, as an oversimplification12, it certainly represents a formidable conceptual starting point for research on the subject. Two particular assumptions of the FIT deserve closer examination13: that the whole optical array is homogeneously analyzed, and that attention can be displaced independently of eye movements (hence called covert attention). It is widely known that visual acuity falls rapidly with distance from the point of fixation14, high acuity being confined to a small central region of the retina called the fovea. While there is little doubt about the existence of covert attention15, it has been argued that situations in which covert attention performs better than overt eye movements are unusual and restricted to laboratory tests13. These considerations led to further investigation of the function of eye movements in visual search16,17,18.
Eye movements are composed of fixations and saccades, but even during fixations the eyes are not completely still. In fact, fixational eye movements (FEyeM) include drift, tremor and microsaccades. Drift corresponds to the erratic, low-velocity component of FEyeM. Tremor is an irregular, noise-like motion of very high frequency, while microsaccades are small, rapid shifts in eye position akin to saccades, but taking place preferentially in the horizontal and vertical directions19. Whether or not each of these movements plays an effective role in visual cognition is still a rather controversial issue20,21,22, but it is well established that, if FEyeM are halted, visual perception fades away completely. Previous attempts to model eye movements have mainly been devoted to describing the sequence of fixations and saccades in terms of stochastic processes23, such as regular random walks24. Very often, the gaze is modeled as a random walker subjected to a potential extracted from a saliency map, namely a field that depends on particular features of the image under inspection, such as color, intensity and orientation25,26,27,28.
Recent research on visual cognition has been directed to the development of experimental and analytical methods that can potentially elucidate the interplay between different components of cognitive activity, and how their interactions give rise to cognitive performance29. While the detection and integration processes mentioned before are basic components of visual cognition that can be investigated separately, the way they interact should be relevant for understanding more intricate visual tasks. Therefore, it is of paramount interest to determine whether cognitive dynamics are dominated by components or by interactions. Here we show through eye-tracking experiments that the cognitive task of visual search for hidden objects displays typical statistical signatures of interaction-dominated processes. Interestingly, as the difficulty level of the visual task increases, our results also indicate that the eye movement changes from a serial, reading-like (systematic) to an isotropic (random) search strategy.
Visual search experiments were performed with targets hidden in two different types of disordered substrate images (see Methods for details). In the first, as depicted in Fig. 1, the subjects were asked to search for a target (the number 5) in an image with distractors (the number 2) placed on a regular array. Figure 2 shows an example of the second type of test, where we used images from the book series “Where's Wally?”30. The latter can be considered very complex images, since the distractors are irregularly placed in an off-lattice configuration and specially drawn to closely resemble the target, which makes the search considerably more difficult. The analysis of the results from the two tests enabled us to identify general statistical patterns as well as particular features of the eye movement that are related to the irregularity and complexity of the images adopted in the eye-tracking experiments.
In the case of the 5-2 lattice tests, the typical trajectories shown in Fig. 1 indicate that, when the number of distractors is small, most subjects performed systematic searches, that is, the task is accomplished in a manner that resembles a person reading a text, for example, from left to right and/or from top to bottom. As the number of distractors increases, a transition can be observed from this directional (systematic) trajectory to an isotropic random search strategy, which becomes dominant at the highest difficulty level. Specifically, systematic patterns were observed in two thirds of the eye-tracking recordings (42 out of 63) for difficulty 0 (see Fig. 1a), in half of the recordings (16 out of 32) for difficulty 1 (see Fig. 1b), and in only one fourth of the recordings (6 out of 24) for difficulty 2 (see Fig. 1c). No discernible systematic searches were observed in the “Where's Wally?” tests.
Next, we analyze the size distributions of gaze jumps calculated from the raw data obtained in the eye-tracking experiments. By definition, the size of a jump in this case corresponds to the distance, measured in pixels, covered by the eye gaze during each recording step of the eye-tracker, which here lasts approximately 17 ms (a sampling rate of 60 Hz). Strikingly, as depicted in Fig. 3, all tests produced similar distributions of gaze jumps, regardless of the subject, the complexity of the test, or the search strategy (systematic or random). This universal shape reflects the fixation-saccade duality of the eye movement and clearly points to a superposition of behaviours rather than a description in terms of pure unimodal distributions31,32.
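A minimal sketch of how jump sizes could be computed from raw gaze samples and turned into a distribution. The helper names `jump_sizes` and `log_binned_density` are hypothetical, and the logarithmic binning (five bins per decade, steps below 1 px discarded) is our assumption, since the text does not state how the histograms of Fig. 3 were built. Logarithmic bins are a common choice for distributions spanning several decades, as the fixation and saccade modes do here.

```python
import bisect
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def jump_sizes(gaze: Sequence[Point]) -> List[float]:
    """Distance in pixels covered by the gaze between consecutive samples
    (one recording step, about 1/60 s = 16.7 ms at 60 Hz)."""
    return [math.dist(p, q) for p, q in zip(gaze, gaze[1:])]


def log_binned_density(values: Sequence[float], bins_per_decade: int = 5,
                       v_min: float = 1.0) -> List[Tuple[float, float, float]]:
    """Histogram on logarithmically spaced bins, normalised as a probability
    density over the values >= v_min. Returns (lower_edge, upper_edge, density)."""
    kept = [v for v in values if v >= v_min]
    if not kept:
        return []
    n_bins = max(1, math.ceil(bins_per_decade * math.log10(max(kept) / v_min)))
    edges = [v_min * 10 ** (k / bins_per_decade) for k in range(n_bins + 1)]
    counts = [0] * n_bins
    for v in kept:
        # clamp so that a value lying on (or rounding past) the last edge
        # is counted in the last bin
        idx = min(bisect.bisect_right(edges, v) - 1, n_bins - 1)
        counts[idx] += 1
    return [(lo, hi, c / (len(kept) * (hi - lo)))
            for lo, hi, c in zip(edges, edges[1:], counts)]
```

Applied to one recording, `log_binned_density(jump_sizes(samples))` returns triples that can be plotted on log-log axes; for the raw data described here one would expect two humps, one from fixational steps and one from saccades.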
The presence of two modes separated by a slight depression that marks the overlap region can be observed in practically all jump size distributions of the raw data. Such behaviour strongly suggests the need for a filtering process through which fixations and saccades can be adequately identified and their statistical properties studied independently. With this purpose, here we apply a modified version of the fixation filter developed by Olsson33, as described in the Methods section. As shown in Figs. 4 and 5, the resulting distributions of jump sizes for fixational movements obtained for the 5-2 and “Where's Wally?” tests, respectively, also display the same statistical signature. Precisely, for gaze steps larger than 10 px, the distances Δr follow typical power-law distributions,

$$P(\Delta r) \propto \Delta r^{-\alpha}, \tag{1}$$

with statistically indistinguishable exponents, α ≈ 2.9, for all tests (see Table 1). For gaze steps smaller than 10 px, the distributions display approximately uniform behaviour, possibly because, at this scale, eye tremor is of the same order as drift, although this hypothesis cannot be tested with the time resolution of our measurements34. Once saccades are identified through the filtering process, the analysis of the saccadic movements in all tests reveals that the distributions of sizes for this type of eye jump can be well described by a log-normal distribution,

$$P(\Delta r) = \frac{1}{\Delta r\,\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln\Delta r - \mu)^2}{2\sigma^2}\right], \tag{2}$$

where the parameters µ and σ correspond to the mean and standard deviation of the logarithm of the saccade length, respectively. Once more, the fact that a single distribution function can properly describe the general statistical features of different search tests suggests that the same underlying mechanisms control the cognitive task under investigation. It is interesting to note, however, that the numerical values of the estimated parameters of the distributions depend on the details of the test.
For instance, in the case of 5-2 tests, the mode of the distribution (the most probable length) decreases systematically with the difficulty of the searching task, indicating that saccadic movements somehow adapt to the complexity of the image.
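The text does not state how the exponent α or the log-normal parameters were estimated. One standard option, sketched below under that assumption, is maximum likelihood: for a continuous power-law tail $P(\Delta r) \propto \Delta r^{-\alpha}$ above $\Delta r_{\min} = 10$ px, $\hat{\alpha} = 1 + n \big/ \sum_i \ln(\Delta r_i/\Delta r_{\min})$, where $n$ is the number of steps in the tail; for a log-normal, $\hat{\mu}$ and $\hat{\sigma}$ are the mean and standard deviation of $\ln \Delta r$. The most probable saccade length (the mode) then follows as $\exp(\mu - \sigma^2)$. All function names are hypothetical, and `sample_power_law` exists only to check the estimator on synthetic data.

```python
import math
import random
import statistics
from typing import Iterable, List, Tuple


def fit_power_law_tail(steps: Iterable[float], x_min: float = 10.0) -> Tuple[float, float, int]:
    """Continuous maximum-likelihood estimate of alpha for P(x) ~ x**-alpha,
    using only x >= x_min. Returns (alpha_hat, standard_error, n_tail)."""
    tail = [x for x in steps if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n < 2 or log_sum == 0.0:
        raise ValueError("not enough distinct steps above x_min")
    alpha = 1.0 + n / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(n), n


def fit_lognormal(lengths: Iterable[float]) -> Tuple[float, float]:
    """Maximum-likelihood (mu, sigma): mean and standard deviation of ln(length)."""
    logs = [math.log(x) for x in lengths if x > 0]
    return statistics.fmean(logs), statistics.pstdev(logs)


def lognormal_mode(mu: float, sigma: float) -> float:
    """Most probable value of a log-normal distribution."""
    return math.exp(mu - sigma ** 2)


def sample_power_law(n: int, alpha: float, x_min: float, rng: random.Random) -> List[float]:
    """Inverse-transform sampler for the continuous power law (testing only)."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

For real data, `fit_power_law_tail` would be applied to the fixational steps and `fit_lognormal` to the saccade lengths separated by the fixation filter; a goodness-of-fit test (e.g. Kolmogorov-Smirnov) would still be needed before preferring either functional form.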
In summary, our results from eye-tracking tests in which subjects are asked to find a specific target hidden among a set of distractors (see Methods) reveal a gradual change in the search strategy, from a directional reading-like (systematic) to an isotropic (random) movement, as the number of distractors increases. However, regardless of the differences in image complexity, search tasks and individual skills of the subjects, we observe universal statistical features in the distributions of gaze jump sizes. These distributions generally show a characteristic bimodal behaviour, a consequence of the intrinsic dual nature of eye movement32, which alternates between saccades and fixations.
The application of a fixation filter to the raw data enables us to study separately the distributions of jump sizes for fixational and saccadic gaze steps. We find that the distribution of fixational movements shows long tails that obey power laws35, while saccades follow a log-normal type of behaviour. The fact that both log-normal and power-law distributions can arise from multiplicative processes36 provides strong support for the hypothesis that interactions between components dominate the cognitive task of visual search31. In a dynamics governed by interactions, the organization of the components and the way they process information are context dependent, with no particular function being encapsulated in any of the components themselves. This nonlinear response to the influx of information would give rise to multiplicative distributions like the ones we report here.
These observations are in evident contrast with a component-based scenario, where the final performance of a given cognitive task results from the simple addition of sub-tasks that usually process information in a specialized manner. Instead of log-normal or power-law distributions, such a process would give rise to Gaussian or other additive distributions (e.g., exponential or gamma distributions)37. It is worth noting that our results are conceptually consistent with previous studies describing complex behaviour in visual cognition32,38,39,40. As a perspective for future work, it would be interesting to relate our findings to other potential approaches based on non-cognitive random strategies, where the search task can be the result of an optimization process41,42,43,44.
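The additive versus multiplicative argument can be made concrete with a toy simulation. It is not a model of eye movements: the factor distribution, Uniform(0.5, 1.5), and the number of factors are arbitrary choices made only for illustration. The same random positive factors are combined once by summation (a component-based, additive picture) and once by multiplication (an interaction-based picture), and the asymmetry of each result is measured by its skewness.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_skewness(xs: Sequence[float]) -> float:
    """Population skewness m3 / m2**1.5 (zero for a symmetric distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def additive_vs_multiplicative(n_samples: int = 20_000, n_factors: int = 30,
                               seed: int = 0) -> Tuple[List[float], List[float]]:
    """Combine the same i.i.d. positive factors by summing and by multiplying."""
    rng = random.Random(seed)
    sums, products = [], []
    for _ in range(n_samples):
        factors = [rng.uniform(0.5, 1.5) for _ in range(n_factors)]
        sums.append(sum(factors))            # additive combination
        products.append(math.prod(factors))  # multiplicative combination
    return sums, products
```

With the defaults, the sums come out nearly symmetric (close to Gaussian, by the central limit theorem), the products are strongly right-skewed, and the logarithms of the products are nearly symmetric again, which is the signature of an approximately log-normal distribution.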
Eye movements were recorded with a Tobii T120 eye-tracking system (Tobii Technology). In this study we only consider data obtained after a valid calibration protocol was applied to both eyes of the subject. The stimuli were presented on a 17-inch TFT-LCD monitor with a resolution of 1280 × 1024 pixels, and gaze positions were sampled at 60 Hz.
Two types of tests, consisting of visual search for a hidden target randomly placed among a set of distractors, were performed by 11 healthy subjects with an average age of 23 years. The stimuli of the first test consist of a square lattice composed of a single target (the number 5) and several 2's serving as distractors. All numbers (target and distractors) are randomly colored red or green, which hinders detection of the target through the identification of patterns in peripheral vision. These images were organized in three difficulty levels according to the number of distractors, labeled 0, 1 and 2 for 207, 587 and 1399 distractors, respectively.
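A sketch of how such a lattice stimulus could be generated, with one target at a random site and independent random red/green coloring of every symbol. The generator, its names, and the lattice shapes are hypothetical: the text gives only the distractor counts. For instance, 13 × 16 = 208 sites would hold one target plus the 207 distractors of difficulty 0, while 28 × 21 = 588 and 40 × 35 = 1400 sites would match difficulties 1 and 2.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    symbol: str  # "5" for the target, "2" for a distractor
    color: str   # "red" or "green", drawn independently for each cell


def make_5_2_lattice(rows: int, cols: int, rng: random.Random) -> List[List[Cell]]:
    """Hypothetical stimulus generator: one target at a random lattice site,
    distractors everywhere else, and random red/green coloring of all symbols."""
    target = rng.randrange(rows * cols)
    return [[Cell("5" if r * cols + c == target else "2", rng.choice(("red", "green")))
             for c in range(cols)]
            for r in range(rows)]
```

Because colors are drawn independently of the symbol, color carries no information about the target's location, which is the property the coloring is meant to provide.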
The stimuli of the second test are scanned images from the “Where’s Wally?” series of books30. The complexity of these images, in which a large number of distractors (background characters) are irregularly placed together with Wally, the hidden target character, explains the high difficulty of this visual search task. Not all images contained the target, since we did not intend to measure the time taken to find it. Instead, our objective was to induce the subjects to perform the search task as naturally as possible.
In order to stimulate subjects to search efficiently, in all tests they were told that they had a limited time to find the target, but were not informed exactly how much time would be available. In the 5-2 lattice tests, 1, 1.5 and 2 minutes were given to search for the target at difficulties 0, 1 and 2, respectively. In the “Where’s Wally?” tests, the subjects had 2 minutes. A summary of the parameters can be found in Table 2.
We adopted a modified version of the fixation filter developed by Olsson33 in order to identify which gaze points belong to fixations and which belong to saccades. The basic idea is to distinguish segments of the signal that move slowly due to drift, and are thus identified as part of a fixational sequence, from those that move faster, constituting the saccades. This is achieved here by taking the raw signal output, $\mathbf{s}_i$, namely the position of the gaze captured at each timestamp $i$, and calculating for each point the mean position of two sliding windows of size $r$, one retarded and the other advanced,

$$\mathbf{m}^{-}_i = \frac{1}{r}\sum_{j=1}^{r}\mathbf{s}_{i-j}, \qquad \mathbf{m}^{+}_i = \frac{1}{r}\sum_{j=0}^{r-1}\mathbf{s}_{i+j}. \tag{3}$$

The distance between them is calculated as

$$d_i = \left\lVert \mathbf{m}^{+}_i - \mathbf{m}^{-}_i \right\rVert. \tag{4}$$
Since each timestamp has the same duration, the displacement $d_i$ given by Eq. 4 is proportional to an average velocity. Thus, if $d_i$ is larger than both its neighbors, $d_{i-1}$ and $d_{i+1}$, and also larger than a given velocity threshold, it is considered a peak. If two peaks are found within the interval of a single window, only the larger one is considered.
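A minimal sketch of the peak-detection step, assuming that the retarded window averages samples $i-r$ to $i-1$ and the advanced window samples $i$ to $i+r-1$; the exact window alignment and the threshold value are not given in the text, so both are parameters here. The names `window_displacements` and `find_peaks` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def _mean(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def window_displacements(s: Sequence[Point], r: int) -> Dict[int, float]:
    """d_i = |m+_i - m-_i|, with m-_i the mean of s[i-r .. i-1] (retarded window)
    and m+_i the mean of s[i .. i+r-1] (advanced window). Only indices with two
    complete windows are returned."""
    d: Dict[int, float] = {}
    for i in range(r, len(s) - r + 1):
        d[i] = math.dist(_mean(s[i:i + r]), _mean(s[i - r:i]))
    return d


def find_peaks(d: Dict[int, float], threshold: float, r: int) -> List[int]:
    """Local maxima of d above `threshold`; among peaks closer than one window
    length r, only the largest is kept."""
    candidates = [i for i in sorted(d)
                  if (i - 1) in d and (i + 1) in d
                  and d[i] > d[i - 1] and d[i] > d[i + 1] and d[i] > threshold]
    peaks: List[int] = []
    for i in candidates:
        if peaks and i - peaks[-1] < r:
            if d[i] > d[peaks[-1]]:
                peaks[-1] = i  # keep only the larger peak within one window
        else:
            peaks.append(i)
    return peaks
```

For a gaze trace that rests at one location and then jumps to another, $d_i$ rises to a single maximum at the first sample after the jump, and that index is returned as the peak separating the two clusters.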
At this stage, the gaze points are divided into clusters separated by the peaks. In the original filter33, the median position of each cluster is used to locate the corresponding fixation. Since we are instead interested in separating the gaze points that correspond to fixations from those that belong to saccades, the radius of gyration of each cluster C is calculated as

$$R_g^{C} = \sqrt{\frac{1}{N_C}\sum_{i \in C}\left\lVert \mathbf{s}_i - \bar{\mathbf{s}}_C \right\rVert^2}, \tag{5}$$

where $\bar{\mathbf{s}}_C$ is the mean position of the $N_C$ gaze points that belong to C. Steps that fall inside the circle of radius $R_g^{C}$ centered at $\bar{\mathbf{s}}_C$ are considered fixational. The same applies to steps that leave this area but return to it without passing through another fixation cluster. All other steps are considered saccadic jumps.
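A sketch of the clustering and step-classification stage, under our reading of the description above: clusters are the runs of samples between consecutive peaks; a step is fixational when both endpoints lie inside the radius-of-gyration circle of their own cluster; and the steps of an excursion that leaves a cluster's circle are also fixational if the next sample found inside a circle belongs to the same cluster. Checking only each sample's own cluster circle is a simplification. The function names are hypothetical, and `peaks` can come from any peak detector, such as one implementing the rule described earlier.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def split_clusters(n_points: int, peaks: Sequence[int]) -> List[Tuple[int, int]]:
    """Half-open index ranges [a, b) of the clusters separated by the peaks."""
    bounds = [0, *sorted(peaks), n_points]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def radius_of_gyration(points: Sequence[Point]) -> Tuple[Point, float]:
    """Center of mass of the cluster and its radius of gyration."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    rg = math.sqrt(sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points) / n)
    return (cx, cy), rg


def classify_steps(s: Sequence[Point], peaks: Sequence[int]) -> List[str]:
    """Label each step s[i] -> s[i+1] as 'fixation' or 'saccade'."""
    # anchor[i] = cluster id if sample i lies inside its own cluster circle, else None
    anchor: List[Optional[int]] = [None] * len(s)
    for c, (a, b) in enumerate(split_clusters(len(s), peaks)):
        center, rg = radius_of_gyration(s[a:b])
        for i in range(a, b):
            if math.dist(s[i], center) <= rg:
                anchor[i] = c
    labels = ["saccade"] * (len(s) - 1)
    last: Optional[int] = None  # most recent sample found inside a circle
    for i, c in enumerate(anchor):
        if c is None:
            continue
        if last is not None and anchor[last] == c:
            # consecutive inside samples, or an excursion returning to the same circle
            for k in range(last, i):
                labels[k] = "fixation"
        last = i
    return labels
```

Because the radius of gyration is a root-mean-square distance, some genuinely fixational samples fall outside the circle; the excursion rule is what keeps such steps from being counted as saccades.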
We thank the Brazilian Agencies CNPq, CAPES, FUNCAP and FINEP, the FUNCAP/CNPq Pronex grant, and the National Institute of Science and Technology for Complex Systems in Brazil for financial support.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/