Introduction

Background

Humans live in a highly complex visual world, in which visual scenes are composed of numerous objects, seen at different scales and angles, one often occluding another. Recognising objects under these conditions is a difficult task, and yet every day we perform it seamlessly2. This process is not static: it relies on active sampling3,4, involving sequences of eye movements that are not random. During the exploration of a scene, people preferentially fixate informative locations, such as people and objects5. Humans’ visual behaviour thus adapts not only to the varying physical properties of visual stimuli, like local contrast or spatial frequency6,7, but also to their individual circumstances, such as a goal set by experimental instructions5,8,9 or relevant prior knowledge10.

Prior knowledge, or priors for short11, is of particular interest because its influence on visual recognition suggests that people might not see the world in the same way, depending on their past experience. In the context of visual search, priors have been proposed to fall into two main categories4: (1) specific priors, which relate to knowledge of the objects themselves and can be either short-term, acquired during the task, or long-term, acquired during the observer’s daily life; and (2) generic priors, which relate to semantic and spatial knowledge providing general rules for how visual input is organised, e.g., where to find a roof in a scene depicting a house. Such priors affect object recognition not only by influencing the way observers sample visual information but also by influencing how they process this information, i.e., their perception. Priors can either help or hinder sampling and perception. For example, when a new category is learned, visually searching for an object of this category among other objects is slower than for well-known categories, but it also results in fewer false alarms12,13. Moreover, priors can bias the perception of a stimulus: they can favour perceiving the same stimulus as previously seen, in a hysteresis or attractive effect14,15,16, or push perception away from the previous stimulus, in a repulsive or contrastive effect17.

Thus, priors and eye movements are central to the visual recognition process and can be studied in concert to characterise how one influences the other. Most work on priors and eye movements in object recognition has focused on visual search paradigms (see Ref.4 for a review), which are particularly well suited because they typically prompt extensive sequences of eye movements18. Little is known about how priors influence eye movements in simpler tasks, such as single object recognition, because humans can usually recognise single objects at a glance19. Nonetheless, single objects have been shown to be explored extensively when they are difficult to perceive1,20, which opens new avenues for examining the role of eye movements, “a window into the operation of the attentional system”4, in elementary perceptual functions such as object recognition.

The Dots method

One method that can prompt intensified exploration of single objects is the Dots method1,21. In this paradigm, stimuli are composed of lattices of dots that are displaced from their initial positions towards contour-dense regions of an object. The deformation of the lattice is calculated precisely for each dot from the contour density of a source image combined with a global visibility variable, g. By varying g, one can vary the visibility of different stimuli in a highly controlled and comparable manner. Because all stimuli are made of the same local elements, close comparisons are possible both between objects and between visibility levels. The controlled generation of Dots stimuli also makes it possible to quantify precisely the information content at any location in a stimulus. Two types of information can be extracted: (1) the Local Dots Displacement (LDD), which represents the physical information present in the stimulus and varies with the visibility level, g; and (2) the Local Contour Density (LCD), which represents the hidden information from the source image and is independent of g (see Ref.1 for more details). The LCD is used to generate the stimuli, but only the LDD is directly accessible to participants; the two types of information are thus related but do not fully map onto each other. Reconstructing these values at the locations fixated by participants allows us to investigate how observers extract information through visual sampling. To make sense of Dots stimuli, participants have to rely on Gestalt principles, mainly proximity, grouping, and good continuation22, and potentially also on priors23,24,25.
Following Henderson’s typology4, participants performing a Dots visual task can be expected to use: (1) specific priors related to the particular objects of the stimulus set, which they can both build as they recognise the objects and draw from their daily-life experience; and (2) generic priors that provide contextual rules about how the stimuli are presented within the task, e.g., their location, order, and composition, acquired throughout the experiment.
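As an illustration of how LDD or LCD values can be read out at fixated locations, the following Python sketch assumes that each stimulus comes with a precomputed 2D information map in screen-pixel coordinates and that fixations are given as (x, y) pixel positions. The function name and the nearest-pixel lookup are assumptions made for illustration, not the procedure of Ref.1.

```python
import numpy as np

def sample_at_fixations(info_map, fixations):
    """Return the map value at each fixation using a nearest-pixel lookup.

    info_map: 2D array indexed [row, col] = [y, x], e.g. a hypothetical LDD or LCD map.
    fixations: sequence of (x, y) fixation coordinates in pixels.
    """
    info_map = np.asarray(info_map, dtype=float)
    xy = np.asarray(fixations, dtype=float).reshape(-1, 2)
    # Round to the nearest pixel, then clip so fixations at or beyond the border stay in range.
    cols = np.clip(np.rint(xy[:, 0]).astype(int), 0, info_map.shape[1] - 1)
    rows = np.clip(np.rint(xy[:, 1]).astype(int), 0, info_map.shape[0] - 1)
    return info_map[rows, cols]
```

A nearest-pixel lookup is the simplest choice; averaging the map over a small window around each fixation, to approximate the extent of the fovea, would be an equally plausible alternative.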

The Dots method was introduced by Moca et al.1, who presented participants with 50 objects at 7 visibility levels, g, in 7 blocks of incremental g (0.00–0.30). This manipulated participants’ access to information, and thereby their ability to build priors, from the very start of the experiment. Participants in an Ascending group experienced a block-by-block increase in visibility, g, while participants in a Descending group saw blocks of decreasing visibility. Moca et al. showed that prior access to information influenced not only participants’ ability to recognise the objects accurately but also their visual exploration of the stimuli, i.e., their ability to sample informative points. The Descending group, who could build the strongest priors from the start, recognised stimuli more accurately and sampled more informative locations than the Ascending participants, especially at lower visibility levels. This phenomenon was called “visual hysteresis”1.

Current study

The current study extends the work of Moca et al.1, whose data were revisited and entirely reanalysed alongside data from an additional group of Random participants (see “Materials and methods”). This additional group viewed the same stimuli as the Ascending and Descending participants, with each of the 50 objects likewise presented once per block, but this time g-levels were randomised across the seven blocks. The Random group was thereby given intermediate access to information compared with the Ascending and Descending participants, and was expected to build priors, explore, and perform at a level intermediate between the two other groups. Additionally, we introduce here a graphical description of the expected recognition process and of the study’s predictions (see Fig. 1). The figure sets out to account for changes in performance at three main levels of visibility (left to right: low, medium, and high g) in terms of (1) the available information (top row); (2–3) the task-general (generic) and object-specific (specific) priors (second and third rows, respectively); and (4) the expected performance (bottom row), for each of the three groups. An additional, purely theoretical naïve group, which builds no priors but experiences varying levels of information availability across g-levels, was also included as a reference point for purely stimulus-related (g-related) changes in performance.
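The three presentation schedules can be sketched in Python as follows. The sketch assumes seven equally spaced g-levels in steps of 0.05 (consistent with the levels reported in this study: 0.00, 0.05, 0.15, and 0.30) and, for the Random group, assumes that each object receives its own random permutation of the seven levels, so that it is still shown once per block and once at every g-level. Trial order within a block is not modelled, and `make_schedule` is a hypothetical helper.

```python
import random

# Assumed: seven equally spaced levels from 0.00 to 0.30 in steps of 0.05.
G_LEVELS = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]

def make_schedule(group, n_objects=50, seed=None):
    """Return schedule[block][obj]: the g-level of object `obj` in block `block`."""
    n_blocks = len(G_LEVELS)
    if group == "Ascending":
        # One g-level per block, increasing block by block.
        return [[G_LEVELS[b]] * n_objects for b in range(n_blocks)]
    if group == "Descending":
        # One g-level per block, decreasing block by block.
        return [[G_LEVELS[n_blocks - 1 - b]] * n_objects for b in range(n_blocks)]
    if group == "Random":
        # Assumption: each object gets its own random order of the seven levels,
        # so it still appears once per block and once at every g-level.
        rng = random.Random(seed)
        orders = [rng.sample(G_LEVELS, n_blocks) for _ in range(n_objects)]
        return [[orders[obj][b] for obj in range(n_objects)] for b in range(n_blocks)]
    raise ValueError(f"unknown group: {group!r}")
```

Under this assumption, `make_schedule("Random", seed=1)[0]` lists the g-levels of all 50 objects in the first block, which typically mixes levels, whereas every entry of `make_schedule("Ascending")[0]` is 0.00.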

Figure 1

Predictions regarding participants’ access to information (top row), task-general (middle-top), and object-specific (middle-low) priors, as well as expected performance (bottom) along 3 generic g-levels: low (left column), medium (middle column), and high (right column). Descending participants are depicted in red, Ascending ones in blue, and Random ones in black. Grey bars correspond to a hypothetical group of naïve participants building no priors. Higher bars signify higher values.

We predict that all three groups will perform better with increasing g, and that each group’s performance relative to the others will depend on the strength of its priors at each g. We expect the three groups to build task-general priors equally well as a function of the number of trials they have completed, as reflected by symmetrical changes along g-levels for the Ascending and Descending groups and by constant, intermediate task-general priors at all g-levels for the Random group (second row). By contrast, object-specific priors are thought to be built according to present and past information availability: Descending participants should build strong priors quickly, while Ascending ones should only build limited priors from medium g onward. Random participants’ priors should be moderately strong throughout (third row). Thus, we predict that Descending participants will perform best at low and medium g, Ascending participants will dominate at high g, and the Random group will remain intermediate throughout. Conversely, we expect the Ascending and Descending groups to match the theoretical naïve group when they start the experiment, i.e., at high g for the Descending group and at low g for the Ascending group, while the Random group is expected to perform better than the naïve group at all g-levels. We expect all these effects to be reflected in participants’ performance (bottom row), both in recognition accuracy and in the informativeness of their visual exploration (LCD and LDD).
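The prediction for task-general priors amounts to simple arithmetic on block position, taking the number of blocks already completed as a proxy for the strength of these priors, as assumed above. Under the per-object randomisation assumed for the Random group, each g-level is equally likely to fall in any of the seven blocks. The hypothetical helper below returns the block (or, for the Random group, the expected block) in which each g-level is seen.

```python
G_LEVELS = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
N_BLOCKS = len(G_LEVELS)

def expected_block(group, g_index):
    """1-based block in which G_LEVELS[g_index] is seen (expected value for Random)."""
    if group == "Ascending":
        return g_index + 1
    if group == "Descending":
        return N_BLOCKS - g_index
    if group == "Random":
        # Each g-level is equally likely to fall in any block: the mean of 1..7.
        return sum(range(1, N_BLOCKS + 1)) / N_BLOCKS
    raise ValueError(f"unknown group: {group!r}")

for i, g in enumerate(G_LEVELS):
    row = [expected_block(grp, i) for grp in ("Ascending", "Descending", "Random")]
    print(f"g = {g:.2f}: {row}")
```

For every g-level, the Ascending and Descending positions sum to 8, so the two groups are mirror images around block 4, which is exactly the constant expected position of the Random group.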

Results

Recognition accuracy

We first looked at participants’ recognition accuracy (trials answered “Seen” and correctly recognised; Fig. 2). There was a strong effect of g on recognition accuracy (F(2.759,41.384) = 389.526, p < 0.001, η2 = 0.887): all participants recognised the objects increasingly well at higher visibility levels. There was no evidence that any group recognised objects better or worse overall than the others, as reflected by a non-significant group effect (F(2,15) = 2.617, p = 0.106, η2 = 0.014). However, the way participants’ accuracy changed with visibility, g, depended on their group, as reflected by a significant g*group interaction (F(5.518, 41.384) = 5.230, p < 0.001, η2 = 0.024). The Ascending participants dominated at high g, while the Descending group was more accurate at medium and low g; in both cases, these were the levels at which the group held the strongest priors. By contrast, the Random group did not appear to follow the predictions from Fig. 1: their recognition accuracy remained comparable to that of the group with weak priors and lower than that of the group with strong priors throughout, never reaching the predicted intermediate level (contrasts at medium g: t(1,21.01) = 1.398 and p = 0.177 for Random vs. Ascending, t(1,21.012) = 4.127 and p < 0.001 for Random vs. Descending; at high g: t(1,26.81) = 2.216 and p = 0.035 for Random vs. Ascending, t(1,26.81) = 0.468 and p = 0.644 for Random vs. Descending). This suggests that this group built priors less well than expected and behaved like the theoretical naïve group from Fig. 1.
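The non-integer degrees of freedom reported above reflect a sphericity correction; this section does not state which correction was applied (the lateralisation analyses below report Huynh–Feldt). The implied correction factor, epsilon, can be recovered by dividing each corrected df by its nominal value. The sketch below assumes 18 participants in 3 groups, as implied by the group effect F(2,15), and 7 g-levels.

```python
def implied_epsilon(df_corrected, df_nominal):
    """Sphericity correction factor implied by a corrected df."""
    return df_corrected / df_nominal

# Design implied by the reported tests: 7 g-levels, 3 groups, 18 participants
# (group effect F(2,15): 3 - 1 = 2 and 18 - 3 = 15).
n_levels, n_groups, n_participants = 7, 3, 18
df_g = n_levels - 1                                      # 6
df_gxgroup = (n_levels - 1) * (n_groups - 1)             # 12
df_error = (n_levels - 1) * (n_participants - n_groups)  # 90

eps_g = implied_epsilon(2.759, df_g)              # F(2.759, 41.384) for g
eps_error = implied_epsilon(41.384, df_error)
eps_gxgroup = implied_epsilon(5.518, df_gxgroup)  # F(5.518, 41.384) for g*group
print(round(eps_g, 3), round(eps_error, 3), round(eps_gxgroup, 3))  # 0.46 0.46 0.46
```

All three ratios agree (about 0.46), as expected when a single epsilon scales the effect and error df of both the g and the g*group terms.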

Figure 2

Recognition accuracy as a function of g and for each group. Error bars represent s.d.

Recognition accuracy as a function of blocks

To test this idea, post-hoc analyses were conducted on the Random group’s response accuracy by block rather than by g, which allowed us to look for a performance improvement over time (Fig. 3). If these participants were behaving naïvely, as we just proposed, we expected no effect of block; an effect of block would instead suggest that they were building and using priors as they accumulated information over time. This analysis was not performed on the other groups because their block order was identical to (or the reverse of) the g order, which did not allow us to decouple visibility and learning effects. We found decisive evidence (BF10 > 100) for the alternative hypothesis of an effect of block: BF10 = 256.522. However, this effect appeared to be driven by the change between blocks 1 and 2: when block 1 was excluded from the analysis, the Bayes factor dropped dramatically to BF10 = 0.995, i.e., anecdotal evidence (1/3 < BF10 < 1) in favour of the null hypothesis. This implies that the Random participants did improve their performance and build priors, but only during the first block, and not throughout the whole experiment as initially expected. We took this as reflecting the accumulation of task-general priors only. Indeed, within only a few trials, participants could quickly acquire good knowledge of the general properties of the stimuli, especially as they first performed 7 practice trials (see “Methods” section), so that subsequent trials added little to this type of general prior. By contrast, accumulating object-specific priors required several blocks, since many of the stimuli seen in the first block were below recognition threshold and did not yet enable participants to build priors.
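The verbal labels used here (“decisive”, “anecdotal”) correspond to a Jeffreys-style evidence scale for Bayes factors. The thresholds below are the conventional ones and are an assumption about the exact scale used; evidence for the null hypothesis is read from 1/BF10. The helper name is hypothetical.

```python
def bf_category(bf10):
    """Verbal label for a Bayes factor BF10 on a Jeffreys-style scale (assumed thresholds)."""
    if bf10 <= 0:
        raise ValueError("BF10 must be positive")
    if bf10 == 1:
        return "no evidence"
    favoured = "H1" if bf10 > 1 else "H0"
    # Evidence for H0 is graded on the same scale using 1 / BF10.
    strength = bf10 if bf10 > 1 else 1 / bf10
    for threshold, label in ((100, "decisive"), (30, "very strong"),
                             (10, "strong"), (3, "substantial")):
        if strength > threshold:
            return f"{label} evidence for {favoured}"
    return f"anecdotal evidence for {favoured}"
```

With these thresholds, `bf_category(256.522)` returns "decisive evidence for H1" and `bf_category(0.995)` returns "anecdotal evidence for H0", matching the two results reported above.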

Figure 3

Recognition accuracy as a function of block for the Random group (black). The values for the Ascending and Descending groups are depicted for reference in thin-line blue and red, respectively. Error bars represent s.d.

Hence, the Random group’s performance was recomputed excluding block 1 (Fig. 4), to test whether separating the Random participants’ learning effect at the start of the experiment from their performance across g-levels later in the task would improve their standing relative to the other groups. Their performance curve shifted upwards towards more accurate recognition, but the change was minimal, and the group remained at a level comparable to or lower than that of the weak-prior groups, never reaching the expected intermediate level (contrasts for Random vs. Ascending at medium g, not corrected: t(1,46.511) = 0.653, p = 0.517; for Random vs. Descending at high g, not corrected: t(1,46.511) = −0.026, p = 0.979).
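Recomputing accuracy by g-level while excluding block 1 is a simple filter-and-average step. The sketch below assumes one participant’s trials stored as dictionaries with hypothetical keys `block`, `g`, and `correct`; in a group analysis, per-participant accuracies would then be averaged.

```python
from collections import defaultdict

def accuracy_by_g(trials, exclude_blocks=()):
    """Proportion correct per g-level, ignoring trials from `exclude_blocks`.

    trials: iterable of dicts with hypothetical keys 'block' (1-based),
    'g' (visibility level) and 'correct' (bool).
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for trial in trials:
        if trial["block"] in exclude_blocks:
            continue  # e.g. drop block 1 to remove the early learning phase
        counts[trial["g"]] += 1
        hits[trial["g"]] += int(trial["correct"])
    return {g: hits[g] / counts[g] for g in sorted(counts)}
```

Comparing `accuracy_by_g(trials)` with `accuracy_by_g(trials, exclude_blocks={1})` gives the two curves contrasted in Fig. 4 for a single participant.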

Figure 4

Recognition accuracy as a function of g and for each group, including the Random group without block 1 (grey line). Error bars represent s.d.

Visual behaviour

In a second set of analyses, we investigated whether the effects of visibility (g-level) and priors (group: Ascending, Descending, or Random) on recognition performance could be linked to differences in participants’ visual information sampling. To this end, participants’ eye movements were recorded with an eye tracker, and the information content at the locations they fixated was analysed. Using the Dots method1, we extracted measures of available information, or Local Dots Displacement (LDD), and of hidden information, or Local Contour Density (LCD), at each fixated point. For each type of information, two measures were extracted on a trial-by-trial basis: the average LDD or LCD value across all fixations made within a trial, and the total information sampled within a trial (the sum across the trial’s fixations). The former measure is considered to capture participants’ strategy with regard to sampling informative locations, while the latter reflects the amount of information that participants needed to reach a decision about the identity of the object in a given trial.
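The two trial-level measures can be sketched as follows, assuming that the LDD or LCD value at each fixation has already been looked up (the function name is hypothetical). The example shows why the two measures dissociate: two trials can have the same average information per fixation but different totals if one of them contains more fixations.

```python
import numpy as np

def trial_measures(fixation_values):
    """Return (average, total) of the LDD or LCD values of one trial's fixations."""
    values = np.asarray(fixation_values, dtype=float)
    if values.size == 0:
        # Assumption: a trial without fixations has no defined average and a zero total.
        return float("nan"), 0.0
    return float(values.mean()), float(values.sum())

# Same average per fixation, but the second trial accumulates twice the total information.
print(trial_measures([0.4, 0.6]))
print(trial_measures([0.4, 0.6, 0.5, 0.5]))
```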

Local dots displacement (LDD)

In terms of available information (LDD, Fig. 5), we found a strong, positive effect of g on both the average and the total LDD accessed by participants (respectively: F(2.753,41.297) = 1088.508, p < 0.001, η2 = 0.981; and F(2.159,32.379) = 37.768, p < 0.001, η2 = 0.449). For the average LDD, this was not accompanied by any effect of group (F(2,15) = 0.834, p = 0.454, η2 = 0.0004) or g*group (F(5.506,41.297) = 0.866, p = 0.520, η2 = 0.002), indicating a relatively unbiased, prior-independent sampling of the stimulus space in terms of available (physical) information. For the total LDD, however, we found a significant g*group interaction (F(4.317,32.379) = 5.494, p < 0.001, η2 = 0.131) but no group effect (F(2,15) = 0.251, p = 0.781, η2 = 0.008). This suggests that while all participants similarly sampled more information per fixation, on average, as information availability (g) increased, the amount of information they needed to reach a decision at a given g-level depended on their group. Here, Fig. 1’s predictions help explain the seemingly complex patterns documented by Moca et al.1. Indeed, the Ascending group needed the least information of all groups at high g, when Fig. 1 predicts that their priors were strongest (contrast with both other groups at g = 0.30: t(1,42.495) = 7.644, p < 0.001). The Descending group needed the most information at high g, when they were naïve, although this contrast was not significant (contrast with both other groups at g = 0.30: t(1,42.495) = 0.799, p = 0.429), and the least at medium g, when they had the strongest priors (contrast with both other groups at g = 0.15: t(1,42.495) = 4.352, p < 0.001).
The Random group appeared intermediate at both medium and high g, which was compatible with our predictions, but needed the least information at low g, which we had predicted would instead be the case for the Descending group (contrast with both other groups at g = 0.05: t(1,4) = 2.341, p = 0.024). Interestingly, the Random group’s need for physical information before reaching a decision scaled linearly with g and did not match their relative recognition performance, which was shown earlier to be lower than or similar to that of both other groups throughout, and never intermediate or higher. This may reflect greater overall fatigue in this group.
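The contrasts “with both other groups” compare one group with the remaining two. A minimal sketch of the point estimate, assuming weights of +1 for the target group and −1/2 for each of the other two (the exact weights are not reported here), is given below. The t statistics additionally require the standard error from the fitted model, which this sketch does not compute; the function name and example means are hypothetical.

```python
import numpy as np

GROUPS = ("Ascending", "Descending", "Random")

def one_vs_others(group_means, target):
    """Contrast estimate: target group minus the mean of the other two groups."""
    if target not in GROUPS:
        raise ValueError(f"unknown group: {target!r}")
    # Assumed weights: +1 for the target, -1/2 for each other group (they sum to zero).
    weights = np.array([1.0 if g == target else -0.5 for g in GROUPS])
    means = np.array([group_means[g] for g in GROUPS], dtype=float)
    return float(weights @ means)

# Hypothetical group means at one g-level.
means = {"Ascending": 2.0, "Descending": 4.0, "Random": 3.0}
print(one_vs_others(means, "Ascending"))
```

Because the weights sum to zero, a group that sits exactly midway between the other two (here the Random group) yields a contrast estimate of zero.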

Figure 5

Average (left) and total (right) Local Dots Displacement (LDD) as a function of g and for each group. Error bars represent s.d.

As in the previous analyses, we looked for potential learning effects across blocks in the Random group (Fig. 6). We found no effect of block on the average LDD (F(6,96) = 1.623, p = 0.149, η2 = 0.135), consistent with the earlier finding that the average physical LDD sampled by participants depended more on visibility (g) than on the strength of their priors. On the other hand, we found a significant block effect on the total LDD (F(6,96) = 2.488, p = 0.028, η2 = 0.135), which appeared to decrease across blocks. This suggests that as they progressed through the trials, Random participants either needed less information to reach a decision or became less motivated to explore.

Figure 6

Average (left) and total (right) Local Dots Displacement (LDD) as a function of block for the Random group (in black). The values for the Ascending and Descending groups are depicted for reference in thin-line blue and red, respectively. Error bars represent s.d.

Local contour density (LCD)

In terms of hidden information, or LCD (Fig. 7), we found a significant effect of g on the average LCD (F(4.747, 71.204) = 51.184, p < 0.001, η2 = 0.585), coupled with a significant g*group interaction (F(9.494, 71.204) = 2.236, p = 0.027, η2 = 0.051) and no significant effect of group (F(2,15) = 0.789, p = 0.472, η2 = 0.018). This suggests that participants became more efficient at sampling hidden information (local contours from the original image) as visibility increased, in a way that depended on their priors. Indeed, as predicted in Fig. 1, and similarly to the total LDD described earlier, the Ascending group accessed locations with the highest average LCD at high g (contrast with both other groups at g = 0.30: t(1,50.940) = 10.815, p < 0.001), while the Descending group sampled locations with the highest LCD at medium g (contrast with both other groups at g = 0.15: t(1,50.940) = 10.442, p < 0.001). The Random group generally remained below the other groups throughout (contrast with other groups pooling all g-levels: t(1,15.000) = 15.510, p < 0.001), similarly to their recognition accuracy and resembling the behaviour of the naïve group. The exception was the lowest g-level, where the average LCD sampled by the Random group surged, reaching a level comparable to that of the Descending group (contrast at g = 0.00: t(1, 50.940) = −0.799, p = 0.856, BF corrected for two comparisons) and higher than that of the Ascending group (contrast at g = 0.00: t(1, 50.940) = −2.626, p = 0.033, BF corrected for two comparisons), at odds with the idea that this group did not build priors.
Interestingly, this g-level did not contain any physical LDD information, so any hidden LCD information sampled at this level could not have been guided by physically available information or by object-specific priors, but only by a combination of chance and task-general priors (knowledge about the location and extent of objects in the stimulus images). This result at the lowest g-level again suggests that the Random group’s impaired ability to guide their behaviour with priors may have been limited to object-specific priors, while their ability to build and use task-general priors was preserved.

Figure 7

Average (left) and total (right) Local Contour Density (LCD) as a function of g and for each group. Error bars represent s.d.

The effect of g on the total LCD did not reach significance, although it came close to the threshold (F(2.001, 30.009) = 2.174, p = 0.053, η2 = 0.038); it was accompanied by a significant g*group interaction (F(4.001,30.009) = 4.922, p = 0.004, η2 = 0.173) and no significant group effect (F(2,15) = 2.542, p = 0.112, η2 = 0.133). This suggests that the amount of hidden LCD information that participants needed to integrate in order to reach a decision changed with g in a group-dependent manner. Qualitatively, the changes in total LCD with g appeared to follow non-linear, group-specific shapes. The Ascending group sampled decreasing amounts of total LCD with g (simple main effect of g: F(1,6) = 3.814, p = 0.006), as both their priors and their access to information increased block by block, resulting in them sampling the least total LCD at the highest g-level (contrast with both other groups at g = 0.30: t(1,38.945) = 3.268, p = 0.002). The Descending group followed a more complex pattern: their total LCD also decreased across blocks at the start of the experiment, as reflected by a decrease from high to medium g while the group built priors, so that this group accessed the smallest amount of total LCD at medium g (contrast with both other groups at g = 0.15: t(1,38.945) = 3.682, p < 0.001). As information became scarcer in the following blocks, the Descending group sampled more total LCD as g continued to decrease, possibly to compensate for the gradual disappearance of information from the stimuli. The Random group’s total LCD was stable at medium and high g but lower at low g-levels, where they explored the least of all groups (contrast with both other groups at g = 0.00: t(1,38.945) = 5.645, p < 0.001). This indicates a premature cessation of exploration in the Random group, compatible with the idea of a drop in motivation, especially when compared with the Descending participants’ increased effort as g decreased.

Once more, we looked at these measures as a function of block for the Random group (Fig. 8). We found no effect of block on either the average LCD (F(6,96) = 1.741, p = 0.120, η2 = 0.098) or the total LCD (F(6,96) = 1.766, p = 0.114, η2 = 0.099), suggesting that Random participants did not learn to sample more hidden information over the course of the experiment. The fact that the total physical (LDD) information accessed by this group decreased across blocks while the hidden (LCD) information did not suggests that their building of task-general priors remained functional while their building of object-specific priors was impaired.

Figure 8

Average (left) and total (right) Local Contour Density (LCD) as a function of block for the Random group (in black). The values for the Ascending and Descending groups are depicted for reference in thin-line blue and red, respectively. Error bars represent s.d.

Lateralisation of fixations

Because Dots stimuli require participants to rely on the Gestalt principles of proximity and good continuation, fixating exactly on the informative points is not necessarily the most efficient strategy for exploring them. Indeed, grouping dots into a meaningful contour can be easier when the dots lie beside the point of fixation, where the lower resolution of peripheral vision blurs the dots together so that they may appear linked in a contour. Fixating to the right of the location of interest has been shown to help with stimuli made of dots, because the location of interest then falls in the left visual field and is processed by the right cerebral hemisphere, which is more efficient at identifying meaningful patterns26,27. Thus, given the evidence that the Random group, contrary to our predictions, did not sample as much information as their prior access to information would have allowed, we performed a final post-hoc analysis of the lateralisation of participants’ fixations. We hypothesised that participants might rely more on lateralisation when either their priors or their access to information was limited: lateralising could be a compensation strategy, making participants appear to sample less information while they were in fact able to find the information and gaze precisely next to it.
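One possible way to quantify lateralisation, given the sign convention of Figs. 9 and 10 (values above zero indicate a right lateralisation), is the mean signed horizontal offset between each fixation and a reference location. Which reference location to use (for example, the nearest informative point) is a hypothetical choice made for this sketch; the section does not define the measure.

```python
import numpy as np

def lateralisation_index(fixation_x, reference_x):
    """Mean signed horizontal offset (fixation minus reference), in pixels.

    Positive = fixations to the right of the reference, negative = to the left,
    matching the sign convention of Figs. 9-10. The choice of reference point
    (e.g. the nearest informative location) is hypothetical.
    """
    fixation_x = np.asarray(fixation_x, dtype=float)
    reference_x = np.asarray(reference_x, dtype=float)
    if fixation_x.shape != reference_x.shape or fixation_x.size == 0:
        raise ValueError("need exactly one reference position per fixation")
    return float(np.mean(fixation_x - reference_x))
```

For example, two fixations 5 and 10 pixels to the right of their reference points give an index of 7.5 (a right lateralisation), while a single fixation 10 pixels to the left gives −10.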

Lateralisation shifted significantly from right to left both with increasing g (Fig. 9; F(2.699,40.484) = 2.954, p = 0.036, η2 = 0.055, Huynh–Feldt corrected) and across blocks (Fig. 10; F(2.740,38.356) = 4.118, p = 0.008, η2 = 0.093, Huynh–Feldt corrected). This was not associated with a g*group (F(5.398, 40.484) = 0.232, p = 0.403, η2 = 0.040, Huynh–Feldt corrected) or a block*group (F(5.479, 38.356) = 0.678, p = 0.689, η2 = 0.031, Huynh–Feldt corrected) interaction, suggesting that both the visibility of the stimuli (g) and participants’ experience with the stimulus set (block) affected lateralisation similarly across groups. However, there was a significant effect of group (F(2,15) = 5.143, p = 0.020, η2 = 0.254), suggesting that although all participants were similarly affected by g and block, their overall degree of lateralisation depended on the order in which they saw the stimuli (group). Indeed, Random participants appeared to lateralise more overall than the other groups, although this contrast only approached significance (contrast between the Random group and both other groups: t(1,15.000) = −2.107, p = 0.052).

Figure 9

Fixation lateralisation as a function of g and for each group. Error bars represent s.d. Values above zero represent a right lateralisation and values below zero, a left lateralisation.

Figure 10

Fixation lateralisation as a function of block for the Random group. The values for the Ascending and Descending groups are depicted for reference in thin-line blue and red, respectively. Error bars represent s.d. Values above zero represent a right lateralisation and values below zero, a left lateralisation.

General discussion

We showed that participants’ access to information, both in the moment (visibility, g) and prior to the stimulus at hand (priors), influenced their prior building, visual exploration, and perception (recognition accuracy). By presenting stimuli of objects in blocks of Ascending, Descending, or Random visibility order, we controlled participants’ ability to build priors, which allowed us to compare recognition performance and exploration across different information-access scenarios.

In line with Moca et al.1, all participants became better at recognising the stimuli and at exploring their informative locations as visibility (g) increased, and each group’s performance relative to the others changed along g. While the original study focused on the methodological novelty of the paradigm, here we add a theoretical grounding by showing that, for the original Ascending and Descending groups, these results are well explained by simple predictions (Fig. 1) combining the in-the-moment availability of information (g) with previous access to information (priors). Indeed, the Ascending participants performed best at the highest g-levels, when they had already gone through all the lower g-levels and held the strongest priors of all groups. They both recognised objects more accurately and explored more informative locations, while needing the smallest amount of information to reach a decision about the objects’ identity. At medium g-levels, the same held for the Descending group, who had just gone through the blocks with the most visible levels and held the strongest priors of all groups at this point.

The Random group, which is new to this study, did not perform according to our predictions (Fig. 1). Random participants both explored and recognised objects the least well of all three groups at all g-levels, despite their access to information being intermediate between that of the two other groups, which we had predicted would lead to intermediate performance. It appears that increased access to information, if not systematic, is not accompanied by an increase in recognition accuracy. Randomness impaired the Random participants’ ability to build and use priors to guide both their recognition and their exploration of the stimuli. Interestingly, while the Random participants’ object-specific priors appear to have been hindered, several pieces of evidence suggest that this was not the case for their task-general priors. Indeed, Random participants’ performance improved over the first block only. This is not compatible with the accumulation of object-specific information, which would continue over several blocks as more visible presentations of each object accumulate, but is compatible with learning task-general statistics, which do not change from one block or stimulus to the next and can be learned quickly from the start.

Surprisingly, Random participants performed as well as the high-prior Descending group, and better than the naïve Ascending group, at the lowest level, g = 0.00. At this level, there was no physically available information, so participants could only be guided by their task-general priors to access locations that were expected to be statistically informative. Accordingly, we propose to update our predictions to account for the Random group’s limited performance and lack of object-specific prior building (Fig. 11).

Figure 11

Updated predictions regarding participants’ access to information (top row), task-general (middle-top), and object-specific (middle-low) priors, as well as expected performance (bottom) along 3 generic g-levels: low (left column), medium (middle column), and high (right column). Descending participants are depicted in red, Ascending ones in blue, and Random ones in black. Grey bars correspond to a hypothetical group of naïve participants building no priors. Higher bars signify higher values.

Follow-up analyses showed that the difference between the behaviour of the original Ascending and Descending groups, well accounted for by our initial predictions, and the unexpected behaviour of the Random group could be related to a strategic difference in the lateralisation of participants’ gaze. Lateralisation can help the recognition of Dots stimuli by allowing parafoveal integration of the visual input, where the lower spatial resolution of peripheral vision blurs the contours, making them easier to integrate. Lateralisation has been studied intensively in the context of face perception28 but, to our knowledge, less so in object recognition. Here, we found that all groups compensated for the lack of information at low g with stronger right lateralisation than at high g, and that the Random group lateralised more overall than the other two. This could partly explain why this group sampled less informative points than our initial predictions implied: Random participants gazed beside the informative locations more than the other participants did, while potentially still being able to find the informative points beside which to gaze. However, this still translated into lower recognition performance for the Random group than for the other groups, suggesting that this strategy was not particularly efficient for object recognition: random access to information remained worse than little access and undermined participants’ ability to build and/or use meaningful priors.

This is not the first time that randomness has been shown to impair performance, even when the randomised dimension is orthogonal to the task’s goal, e.g., Refs.29,30, as is the case here: predicting the g-level of the next Random trial does not help identify the upcoming object. The detection and processing of predictable events have been shown to be both faster and more accurate than those of unpredictable events, e.g., Refs.31,32,33,34. Still, randomness does not always impair performance: in machine learning, for example, neural networks generalise best when training samples are presented in a random order35. This does not seem to be the case for humans.

In the present study, the results for the Random group, although diverging from our initial predictions (Fig. 1), are compatible with some current views on the effect of randomness on human learning. However, in the visual modality, randomness has mostly been studied in the context of visual search and had never been shown to impair a process as basic as single object recognition. Furthermore, we show here that randomness affects not only the participants’ performance in terms of decision making (recognition) but also their information seeking, which to our knowledge had not previously been shown for object recognition.

It has been suggested that the hindering effects of randomness are due to its low saliency compared to predictable events, with predictable events capturing attention and facilitating the processing of related information over unpredictable ones30,36. This notion was questioned by experiments showing that randomness could also impair the processing of information emanating from predictable sources. When Southwell et al.29 asked participants to track two concurrent streams of sounds presented binaurally, each of which could be randomised or not, they found that whether the target stream was randomised did not influence performance per se; rather, the mere tracking of random information appeared to impair participants’ ability to detect targets in either stream. Tracking one stream of each type was easier than tracking two random streams and harder than tracking two regular ones, but when both types were presented, finding a target in the regular stream was just as difficult as finding one in the random stream. They concluded that neither predictable nor random information captured attention more, and that the difference between them lay instead in a disparity in cognitive load or computational demand. Processing irregularities is particularly demanding because it constantly generates prediction errors relative to the observers’ expectations, requiring a constant update of their model, whereas regular stimuli can easily be explained away by a predictive rule that does not need updating with each stimulus, and thus require fewer resources.

In our case, only object-specific priors were impaired, while task-general priors appeared to remain functional: the task’s randomness did not impact all types of information equally, which questions the idea of a general higher attentional capture for regular compared to random stimuli30,36. However, it remains compatible with the hypothesis of an extra cognitive load associated with random events29. We propose that participants remained able to encode information relevant to building a general model of their environment (the task) but were less able to store detailed information about specific items of this environment (the objects). This appears to be an evolutionarily effective behaviour: in a cognitively taxing context, when observers are engaged in the demanding task of trying to predict items that escape their expectations, encoding the specificities of items in the absence of a general rule for how these items are organised appears secondary and inefficient. Orhan and Jacobs37 proposed that unpredictable stimuli, such as shapes that do not predict colour, provoke a “model mismatch” between participants’ general model of the world, derived from their long-term priors (e.g., bananas are usually yellow), and the information they are currently experiencing (e.g., a blue banana). We add that the cognitively demanding resolution of this mismatch by updating the model appears to take precedence over other types of computation, resulting in participants’ poorer performance in random contexts, linked to reduced specific but not general priors. These results provide important and novel information for understanding how randomness affects perception, both in terms of the sampling and the interpretation of visual information.

Conclusion

Our results indicate that, even for a process as fundamental as single object recognition, priors guide not only decision making (recognition) but also visual information sampling: participants sampled different pieces of information depending on the strength of their priors, which they then interpreted differently, accurately recognising objects or not. Importantly, priors do not only influence decision making and recognition itself: we show that the first step of gathering information in the service of a decision is already biased by priors. Furthermore, we show that the general structure through which information is presented influences this fundamental process of guiding exploration with priors: randomness, even when introduced in a dimension orthogonal to the task’s goal, destroys participants’ ability to guide their exploration and performance through their priors. Random access to information appears worse than little access. Participants in the randomised paradigm behaved seemingly naïvely and appeared unable to use the information previously encountered about the specific objects viewed, while remaining able to use information related to the general structure of the stimuli in this task. This was the case in spite of their intermediate (better than naïve) access to information. Such findings stress the importance of catering for different stages of learning and of presenting information in a structured manner, even for very basic tasks such as object recognition.

Materials and methods

Participants

Eighteen participants (10 females, mean age 28.3 years, S.D. 4.4 years) took part in this experiment. They either joined as volunteers or received course credits for their participation as part of their undergraduate Psychology curriculum. All had normal or corrected-to-normal vision. Each participant was assigned to one of three experimental conditions (described below and referred to as Ascending, Descending, or Random), resulting in three groups of N = 6 participants each. All participants were informed about the experimental procedures and gave their written informed consent before starting the experiment. All procedures were carried out in accordance with relevant guidelines and regulations, and were approved by the ethics committee of the University of Medicine and Pharmacy “Iuliu Hatieganu” of Cluj-Napoca, Romania, under the approval No. 150/10.12.2009.

Stimuli

The stimuli of Moca et al.1 were used: images were generated with the Dots method, in which contour information extracted from source images of objects is used to deform a 2D lattice of dots, displacing the dots so as to reveal the objects’ outlines. The visibility of the stimuli and their information content were precisely manipulated through 7 deformation levels controlled by a “gravitational” constant g, ranging from g = 0.00 at the lowest visibility level (no deformation) to g = 0.30 at the highest visibility level, in steps of 0.05. For each of the 50 objects, one stimulus was generated at each of the 7 g-levels (7 stimuli per object), resulting in a pool of 350 stimuli. Example stimuli for one object are shown in Fig. 12.
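The size of the stimulus pool follows directly from crossing the seven g-levels with the 50 objects, one stimulus per combination. The Python sketch below makes that arithmetic explicit; the integer object identifiers are hypothetical placeholders, since the section does not list the objects by name.

```python
from itertools import product

# Seven deformation levels: g = 0.00, 0.05, ..., 0.30.
# Rounding removes floating-point artefacts such as 0.15000000000000002.
G_LEVELS = [round(0.05 * step, 2) for step in range(7)]

# Hypothetical object identifiers 0..49 (the real objects are named images).
OBJECT_IDS = range(50)

# One stimulus per (object, g-level) combination.
STIMULUS_POOL = list(product(OBJECT_IDS, G_LEVELS))

if __name__ == "__main__":
    print(G_LEVELS)
    print(len(STIMULUS_POOL))
```

Running it lists the seven levels and a pool of 350 (object, g) pairs, i.e. 50 × 7.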

Figure 12

Example of Dots stimuli for one object at each level of visibility, g, shown in order of increasing g.

Procedure

Participants were tested in a detection-recognition task. Stimuli were presented on a 22″ Samsung SyncMaster 226BW monitor (2 ms grey-to-grey response time) at a resolution of 1680 × 1050 pixels, placed 115 cm from the participant. The images were displayed in the central part of the screen at a size of 600 × 400 pixels. Each trial started with a red fixation cross for a random duration of 1000 to 1500 ms. The stimulus then appeared on the screen for an unlimited duration. Participants were instructed to explore each stimulus visually for as long as they wanted, and to decide whether the dot pattern represented anything meaningful. Whenever they were ready, they pressed one of three keys to indicate whether they had “seen” the object (“L” key: they perceived something meaningful in the stimulus and knew what it was), were “uncertain” what the object was (“S” key: they thought they saw something but were not sure what it was), or saw “nothing” in the dot grid (“A” key: they did not think anything meaningful was depicted). This was followed by a green fixation cross for 500 ms, after which, if they had answered “seen” or “uncertain” (guess), a message asked them to name the object that they (thought they) had seen. Their oral answers were manually recorded by an experimenter present in the room throughout the experiment (only “seen” answers were considered for the calculation of the accuracy measure). Participants finally pressed SPACE to move on to the next trial, which began after a 200 ms delay. The session started with a 14-trial training block using a separate set of objects, followed by the experimental blocks. The general trial design is summarised in Fig. 13.
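The display geometry determines how large the stimuli were in visual angle, which matters for the 0.5-degree analysis windows used later. The Python sketch below derives these quantities under assumptions that the section does not state: the 22″ figure is taken as the viewable diagonal of a flat 16:10 panel with square pixels. All constant and function names are illustrative.

```python
import math

# Assumptions (not stated in the text): 22 inches is the viewable diagonal,
# the panel is 16:10 (matching 1680 x 1050), and pixels are square.
DIAGONAL_IN = 22.0
H_RES_PX = 1680
VIEW_DISTANCE_CM = 115.0


def screen_width_cm(diagonal_in: float, aspect_w: int = 16, aspect_h: int = 10) -> float:
    """Physical width of a flat panel from its diagonal and aspect ratio."""
    return diagonal_in * 2.54 * aspect_w / math.hypot(aspect_w, aspect_h)


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Full visual angle subtended by an extent centred on the line of sight."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * distance_cm)))


def px_for_angle(angle_deg: float, distance_cm: float, cm_per_px: float) -> float:
    """Number of pixels spanned by a given full visual angle."""
    size_cm = 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)
    return size_cm / cm_per_px


CM_PER_PX = screen_width_cm(DIAGONAL_IN) / H_RES_PX
IMAGE_W_DEG = visual_angle_deg(600 * CM_PER_PX, VIEW_DISTANCE_CM)
IMAGE_H_DEG = visual_angle_deg(400 * CM_PER_PX, VIEW_DISTANCE_CM)

if __name__ == "__main__":
    print(f"{CM_PER_PX * 10:.3f} mm/px")
    print(f"image: {IMAGE_W_DEG:.1f} x {IMAGE_H_DEG:.1f} deg")
    print(f"0.5 deg = {px_for_angle(0.5, VIEW_DISTANCE_CM, CM_PER_PX):.1f} px")
```

Under these assumptions the pixel pitch comes out near 0.28 mm, the 600 × 400 pixel image subtends roughly 8.4° × 5.6°, and 0.5° corresponds to about 36 pixels; these values are derived here rather than reported in the study.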

Figure 13

Structure of a trial. Each trial comprised a fixation cross of randomised duration in the 1000–1500 ms interval, followed by a free-viewing exploration phase that lasted until the participant pressed one of three buttons to signify that they had viewed “Nothing”, were “Uncertain”, or had “Seen” an object, after which a sentence on the screen prompted them to verbalise the object they thought they saw if they had pressed “Uncertain” or “Seen”. There were seven blocks of 50 trials; each object was shown once per block and at each of the seven g-levels across blocks.

Presentation order: between-subjects design

Stimuli were presented in 7 blocks, each containing all 50 objects, with each object shown at one of the 7 visibility levels. The order of the objects within blocks was randomised for each participant. The order of the objects’ visibility levels across blocks varied between groups to manipulate the participants’ access to information. One group of participants saw the stimuli in an Ascending fashion: all stimuli within a block were presented at the same visibility level, starting with a block at the lowest visibility level (no deformation) and increasing in visibility with each block. A second group viewed the stimuli in the reverse, Descending order, first seeing the objects at the highest visibility level and decreasing in visibility with each block. Both these groups correspond to those described in Moca et al.1. Finally, a last group viewed the stimuli in blocks of mixed visibility levels, in which all the objects were presented once per block, but each at a Random level of visibility out of the seven. This latter presentation order resulted in access to visual information that was intermediate between the Ascending presentation (lowest, uninformative content first) and the Descending presentation (highest, most informative content first). This group is a new addition to the data presented by Moca et al.1.
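The three presentation orders can be summarised as block schedules. The Python sketch below is one way to generate them under the constraints stated above (each object once per block, every object seen at all seven g-levels across blocks, object order shuffled within blocks). For the Random group it assumes that each object’s seven levels are independently permuted across blocks; the section does not say whether the levels were additionally balanced within each block, so that detail is an assumption. The function name `make_schedule` and the integer object indices are hypothetical.

```python
import random
from typing import List, Optional, Tuple

G_LEVELS = [round(0.05 * step, 2) for step in range(7)]  # 0.00 ... 0.30
Trial = Tuple[int, float]  # (hypothetical object index, g-level)


def make_schedule(group: str, n_objects: int = 50,
                  seed: Optional[int] = None) -> List[List[Trial]]:
    """Return one list of trials per block for the given presentation group."""
    rng = random.Random(seed)
    # per_object[obj][b] is the g-level of object obj in block b.
    if group == "Ascending":
        per_object = [list(G_LEVELS) for _ in range(n_objects)]
    elif group == "Descending":
        per_object = [G_LEVELS[::-1] for _ in range(n_objects)]
    elif group == "Random":
        # Assumption: an independent random permutation of the levels per object.
        per_object = [rng.sample(G_LEVELS, k=len(G_LEVELS)) for _ in range(n_objects)]
    else:
        raise ValueError(f"unknown group: {group!r}")

    blocks: List[List[Trial]] = []
    for b in range(len(G_LEVELS)):
        trials = [(obj, per_object[obj][b]) for obj in range(n_objects)]
        rng.shuffle(trials)  # object order randomised within each block
        blocks.append(trials)
    return blocks
```

With this construction, Ascending and Descending blocks are homogeneous in g, whereas Random blocks mix levels while still exposing each object to every level exactly once over the session.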

Recordings

Participants’ button presses (Seen/Uncertain/Nothing) were recorded with precise timings, together with their verbal responses (object names). Eye-tracking was used to monitor their gaze throughout the experiment. The eye-tracking recordings were made monocularly with an ASL EyeStart 6000 system at a sampling rate of 50 Hz. Participants’ heads rested on a cheek-rest to prevent changes in head position during tracking while still allowing them to speak after each stimulus. A nine-point calibration was conducted at the start of each block, and each trial’s fixation cross was used for post-hoc calibration to correct for potential shifts in eye position within blocks.
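The text does not specify how the fixation cross was used for post-hoc correction. A common and simple approach, sketched below in Python as an assumption rather than the authors’ procedure, treats drift as a constant translation: the mean gaze position recorded while the cross was on screen is compared with the cross’s known position, and that offset is subtracted from the trial’s gaze samples. Missing samples are represented as `None`; the function names and the example cross position are hypothetical.

```python
from statistics import fmean
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def drift_offset(cross_samples: Sequence[Optional[Point]], cross_xy: Point) -> Point:
    """Mean gaze during the fixation cross minus the cross's true position."""
    valid = [p for p in cross_samples if p is not None]
    if not valid:
        raise ValueError("no valid gaze samples during the fixation cross")
    mean_x = fmean(p[0] for p in valid)
    mean_y = fmean(p[1] for p in valid)
    return mean_x - cross_xy[0], mean_y - cross_xy[1]


def correct_drift(samples: Sequence[Optional[Point]],
                  offset: Point) -> List[Optional[Point]]:
    """Subtract a constant (dx, dy) offset from every valid gaze sample."""
    dx, dy = offset
    return [None if p is None else (p[0] - dx, p[1] - dy) for p in samples]
```

For example, if the cross sits at the assumed screen centre (840, 525) and gaze averages (850, 520) during the cross, the offset is (10, −5) and every stimulus-phase sample is shifted back by that amount. A translation-only model ignores scaling or rotation errors, which the block-wise nine-point calibration would be expected to handle.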

Data processing

Manual inspection of the data vs. Moca et al.’s automatic pipeline

Although two of the groups were already presented in Moca et al.1, all datasets from all three groups were processed anew from raw data for the present study. The same pre-processing pipeline was used, which included automatic identification of saccades and fixations, but all fixations were then manually checked for missing, additional, or misidentified saccades. This was found to increase data quality, as shown by a comparison of fixation durations between the automatic process in Moca et al.’s data and the current manually checked data. Histograms of fixation durations, pooling all trials of all Ascending and Descending participants, for the fully automatic process (Fig. 14A) and for the manually checked data (Fig. 14B) showed that manual checking resulted in an overall more normal distribution, with fewer very short and very long fixations, and mean and median values closer to each other (the time bins for the mean and median values in the automatic pipeline were, respectively, 480–500 ms and 900–910 ms, which shifted to 540–560 ms and 800–820 ms after manual inspection). There were no substantial changes in the results between manual and automatic processing, so the difference between the pre-processing pipelines is not discussed further in the paper.
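Locating the histogram bins that contain the mean and the median, as done for Fig. 14, amounts to flooring each value to a multiple of the bin width. The Python sketch below assumes half-open 20-ms bins [low, low + 20), which is a convention choice not stated in the text; with 50 Hz sampling, durations fall on multiples of 20 ms, so edge values go to the upper bin. Function names are illustrative.

```python
import math
import statistics
from collections import Counter
from typing import Dict, Sequence, Tuple

Bin = Tuple[int, int]  # half-open [low, high) in ms


def bin_containing(value_ms: float, width_ms: int = 20) -> Bin:
    """Half-open bin [low, low + width) that contains value_ms."""
    low = int(math.floor(value_ms / width_ms)) * width_ms
    return low, low + width_ms


def histogram(durations_ms: Sequence[float], width_ms: int = 20) -> Dict[int, int]:
    """Counts per bin, keyed by each bin's lower edge, in ascending order."""
    counts = Counter(bin_containing(d, width_ms)[0] for d in durations_ms)
    return dict(sorted(counts.items()))


def mean_and_median_bins(durations_ms: Sequence[float],
                         width_ms: int = 20) -> Tuple[Bin, Bin]:
    """Bins containing the mean and the median of the fixation durations."""
    return (bin_containing(statistics.fmean(durations_ms), width_ms),
            bin_containing(statistics.median(durations_ms), width_ms))
```

For the toy durations 100, 200, 300, 400, and 2000 ms, the long last value pulls the mean (600 ms) well above the median (300 ms), illustrating why a heavy right tail separates the two bins.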

Figure 14

Histogram, using 20-ms time bins, of the fixation durations of all Ascending and Descending participants across all trials for the fully automatic process (A) and the manually checked data (B). The blue bin contains the median, and the red bin the mean. The tail of the histogram was too long for good visualisation and was not plotted in full.

Pre-processing pipeline

Trials were automatically screened, and any trial with over 50% data loss was discarded. Horizontal and vertical gaze location data were smoothed, and fixations were automatically detected using a simplified version of the algorithm of Nyström and Holmqvist38 with two adaptive velocity thresholds: the horizontal and vertical velocities of the eye’s position were computed and summed to obtain a composite horizontal-vertical velocity variable. Each time the velocity crossed a high threshold, a saccade was detected, whose start and end were identified using a second, lower threshold. The thresholds were computed on a trial-by-trial basis from the mean and standard deviation of the trial’s composite velocity, set at 4 (high) and 1.5 (low) standard deviations above the mean. Fixations were defined as the periods between two saccades, with a minimum duration of 50 ms. This automatic algorithm was used to guide the parsing of the data but, as described above, all saccades for all trials and all participants were checked manually to avoid misdetections due to local noise. The first and last fixations of each trial were discarded, as they presented significantly different profiles in standard eye-movement statistics (duration, spread) and corresponded, respectively, to baseline fixations on the central cross and to fixations made after the button press while waiting to verbalise the answer39.
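The two-threshold detection described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors’ code: the smoothing step is omitted, "summed" velocity is read as |vx| + |vy|, and the thresholds are computed from all samples of the trial (saccades included). A saccade is seeded wherever velocity exceeds mean + 4 SD and extended in both directions while it stays above mean + 1.5 SD; fixations are the gaps between saccades lasting at least 50 ms, and the first and last are then dropped. All function names are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) sample indices


def too_much_loss(x: Sequence[Optional[float]], max_loss: float = 0.5) -> bool:
    """True if more than max_loss of the samples are missing (None)."""
    return sum(1 for v in x if v is None) / len(x) > max_loss


def composite_velocity(x: Sequence[float], y: Sequence[float], fs: float) -> List[float]:
    """|vx| + |vy| per sample (position units per second); first sample is 0."""
    v = [0.0]
    for i in range(1, len(x)):
        v.append((abs(x[i] - x[i - 1]) + abs(y[i] - y[i - 1])) * fs)
    return v


def detect_saccades(v: Sequence[float], high_sd: float = 4.0,
                    low_sd: float = 1.5) -> List[Interval]:
    """Seed at the high threshold, extend while above the low threshold."""
    mu, sd = statistics.fmean(v), statistics.pstdev(v)
    high, low = mu + high_sd * sd, mu + low_sd * sd
    saccades: List[Interval] = []
    i, n = 0, len(v)
    while i < n:
        if v[i] > high:
            start, end = i, i
            while start > 0 and v[start - 1] > low:
                start -= 1
            while end < n - 1 and v[end + 1] > low:
                end += 1
            saccades.append((start, end))
            i = end + 1
        else:
            i += 1
    return saccades


def fixations_from_saccades(saccades: Sequence[Interval], n: int, fs: float,
                            min_ms: float = 50.0) -> List[Interval]:
    """Inter-saccade periods lasting at least min_ms."""
    fixations: List[Interval] = []
    prev_end = -1
    for s_start, s_end in list(saccades) + [(n, n)]:
        f_start, f_end = prev_end + 1, s_start - 1
        if f_end >= f_start and (f_end - f_start + 1) * 1000.0 / fs >= min_ms:
            fixations.append((f_start, f_end))
        prev_end = s_end
    return fixations


def analysable_fixations(fixations: Sequence[Interval]) -> List[Interval]:
    """Drop the first (baseline) and last (post-response) fixations."""
    return list(fixations[1:-1])
```

At 50 Hz each sample lasts 20 ms, so the 50-ms minimum in practice requires at least three consecutive samples.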

At each identified fixation, the stimulus’ local information content around this point was reconstructed in two different ways. On the one hand, the information physically conveyed by the image was reconstructed from the stimulus image itself: taking an area of 0.5 visual degrees around the fixation, we calculated the amount of local dot displacement (LDD), which directly relates to the information present at this location. On the other hand, we also calculated a more semantic, hidden form of local information content, this time using the object’s source image: in the same area of 0.5 visual degrees around the fixation, we calculated the amount of local contour density (LCD) in the source image. Since the squared contour density was used to generate dot displacement in the stimulus, these two measures are highly related; however, they do not map fully onto each other.
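Both LDD and LCD reduce to aggregating a per-pixel map inside a small window centred on the fixation. The Python sketch below assumes a circular window and a map already computed on the image’s pixel grid: a dot-displacement magnitude map for LDD, or a contour-density map from the source image for LCD. Neither representation is specified in the text. Whether "an area of 0.5 visual degrees" denotes the window’s radius or its diameter is also not stated, so the radius is left as a parameter; assuming a 47.4 cm wide panel at 115 cm, 0.5° spans roughly 36 pixels. The function name and arguments are hypothetical.

```python
import math
from typing import Sequence


def local_content(value_map: Sequence[Sequence[float]], fx: float, fy: float,
                  radius_px: float, mean: bool = False) -> float:
    """Sum (or mean) of value_map within a disc of radius_px around (fx, fy).

    value_map is indexed as value_map[y][x]; pixels outside the image are ignored.
    """
    height, width = len(value_map), len(value_map[0])
    y0, y1 = max(0, math.floor(fy - radius_px)), min(height - 1, math.ceil(fy + radius_px))
    x0, x1 = max(0, math.floor(fx - radius_px)), min(width - 1, math.ceil(fx + radius_px))
    total, count = 0.0, 0
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            if (x - fx) ** 2 + (y - fy) ** 2 <= radius_px ** 2:
                total += value_map[y][x]
                count += 1
    if count == 0:
        return 0.0
    return total / count if mean else total
```

Using the sum corresponds to an "amount" of displacement or contour within the window, while the mean normalises for windows clipped at the image border; which of the two the study used is not stated.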

Data analysis

The variables described above were analysed for the three groups (Ascending, Descending, and Random; between-subjects factor) and the seven g-levels (within-subjects factor) in 3 × 7 repeated measures analyses of variance (RM-ANOVAs), using a Huynh–Feldt correction for non-sphericity. Note that unequal variances were also found for most g-levels, suggesting that the homogeneity of the participants’ responses varied as a function of g. This was expected, as the lowest g-levels corresponded to no or very little visibility, causing a floor effect, the highest g-levels created a ceiling effect, and the intermediate levels were expected to be particularly affected by the groups’ differences in prior strength. No correction could be applied for this violation of homogeneity in addition to the violation of sphericity; however, since the groups had the same sample size, it was considered appropriate to still conduct ANOVAs40. Indeed, violating the equal-variance assumption increases the likelihood of type II errors (false negatives) while leaving the rate of type I errors (false positives) comparable, making the tests more stringent, such that positive results remain highly reliable. For the block analysis of the behavioural data, Bayesian RM-ANOVAs were used so that evidence for the null hypothesis could be examined. These tests were helpful for studying the overall direction of effects across all g-levels, but, additionally, contrasts at low (0.05), medium (0.15), and high (0.30) g were computed to study local differences between groups, as initially predicted (Fig. 1). These contrasts were planned together with the predictions and were thus not corrected for multiple comparisons.
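For reference, the Huynh–Feldt correction rescales the degrees of freedom of the within-subject tests by an estimate of sphericity, ε. The block below gives the standard textbook definitions for a split-plot design with G groups, N participants in total, and k within-subject levels, where S is the pooled within-group covariance matrix and C an orthonormal set of k − 1 contrasts. The notation is ours, not the paper’s. Some statistical packages instead implement Lecoutre’s correction, which replaces N with N − G + 1 in the numerator, and the section does not state which software was used, so exact ε values may differ.

```latex
% Greenhouse--Geisser estimate from the pooled within-group covariance S (k x k)
% and an orthonormal (k-1) x k contrast matrix C:
\hat{\varepsilon} = \frac{\left[\operatorname{tr}\left(C S C^{\top}\right)\right]^{2}}
                         {(k-1)\,\operatorname{tr}\left[\left(C S C^{\top}\right)^{2}\right]}

% Huynh--Feldt estimate for N participants in G groups, capped at 1:
\tilde{\varepsilon} = \min\left\{1,\;
  \frac{N(k-1)\,\hat{\varepsilon} - 2}{(k-1)\left[N - G - (k-1)\,\hat{\varepsilon}\right]}\right\}

% Corrected degrees of freedom (g-level main effect; group x g interaction):
df_{g} = \left(\tilde{\varepsilon}(k-1),\; \tilde{\varepsilon}(N-G)(k-1)\right), \qquad
df_{G \times g} = \left(\tilde{\varepsilon}(G-1)(k-1),\; \tilde{\varepsilon}(N-G)(k-1)\right)

% Here N = 18, G = 3, k = 7: uncorrected df are (6, 90) for g and (12, 90) for the interaction.
```

With 18 participants in three groups and seven g-levels, the uncorrected tests therefore have 6 and 90 degrees of freedom for the g-level effect and 12 and 90 for the group × g interaction, both scaled down by ε̃ when sphericity is violated.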