How prior knowledge prepares perception: Prestimulus oscillations carry perceptual expectations and influence early visual responses

Perceptual experience results from a complex interplay of bottom-up input and prior knowledge about the world, yet the extent to which knowledge affects perception, the neural mechanisms underlying these effects, and the stages of processing at which these two sources of information converge are still unclear. In a series of experiments we show that language, in the form of verbal cues, both aids recognition of ambiguous “Mooney” images and improves objective visual discrimination performance. We then used electroencephalography (EEG) to better understand the mechanisms of this effect. The improved discrimination of previously labeled images was accompanied by a larger occipital-parietal P1 evoked response to meaningful versus meaningless target stimuli. Time-frequency analysis of the interval between the two stimuli (just prior to the target stimulus) revealed increases in the power of posterior alpha-band (8-14 Hz) oscillations when the meaning of the stimuli to be compared had been trained. The magnitudes of the prestimulus alpha difference and the P1 amplitude difference were positively correlated across individuals. These results suggest that prior knowledge prepares the brain for upcoming perception via the modulation of prestimulus alpha-band oscillations, and that this preparatory state influences early (~120 ms) stages of visual processing.


Introduction
A chief function of visual perception is to "provide a description that is useful to the viewer" 1 , that is, to construct meaning 2,3 . Canonical models of visual perception explain this ability as a feed-forward process, whereby low-level sensory signals are progressively combined into more complex descriptions that form the basis for recognition and categorization 4,5 . There is now considerable evidence, however, that prior knowledge impacts relatively early stages of perception [6][7][8][9][10][11][12][13][14][15] . A dramatic demonstration of how prior knowledge can create meaning from apparently meaningless inputs occurs with two-tone "Mooney" images 16 , which can become recognizable following the presentation of perceptual hints 17,18 .
Although there is general acceptance that knowledge can shape perception, there are fundamental unanswered questions concerning the type of knowledge that can exert such effects.
Previous demonstrations of knowledge-driven recognition of Mooney images have used perceptual hints, such as pointing out where the meaningful image is located or showing people the completed version of the image. Our first question is whether category information cued linguistically, in the absence of any perceptual hints (cf. 17,19 ), can have similar effects. Second, it remains unclear whether such effects of knowledge reflect modulation of low-level perception and, if so, when during visual processing such modulation occurs. Some have argued that benefits of knowledge on perception reflect late, post-perceptual processes, occurring only after processes that could reasonably be called perceptual 20 . In contrast, recent fMRI experiments have observed knowledge-based modulation of stimulus-evoked activity in sensory regions, suggesting an early locus of top-down effects [21][22][23][24] . However, the sluggish nature of the BOLD signal makes it difficult to distinguish knowledge affecting bottom-up processing from later feedback signals to the same regions.
One way that prior knowledge may influence perception is by biasing baseline activity in perceptual circuits, pushing the interpretation of sensory evidence towards that which is expected 25 . Biasing of prestimulus activity according to expectations has been observed both in decision- and motor-related prefrontal and parietal regions [26][27][28] as well as in sensory regions 21,29,30 . In visual regions, alpha-band oscillations are thought to play an important role in modulating prestimulus activity according to expectations. For example, prior knowledge of the location of an upcoming stimulus changes preparatory alpha activity in retinotopic cortex [31][32][33][34] . Likewise, expectations about when a visual stimulus will appear are reflected in prestimulus alpha dynamics [35][36][37] . Recently, Mayer and colleagues demonstrated that when the identity of a target letter could be predicted, prestimulus alpha power increased over left-lateralized posterior sensors 38 . These findings suggest that alpha-band dynamics are involved in establishing perceptual predictions in anticipation of sensory input.
Here, we examined whether verbal cues that offered no direct perceptual hints can improve visual recognition of indeterminate two-tone "Mooney" images (Experiment 1). We then measured whether such verbally ascribed meaning affected an objective visual discrimination task (Experiments 2-3). Finally, we recorded electroencephalography (EEG) during the visual discrimination task (Experiment 4) to better understand the locus at which knowledge influenced perception. Our findings suggest that using language to ascribe meaning to ambiguous images impacts early visual processing by biasing prestimulus neural activity in the alpha-band.

Experiment 1
Materials. We constructed 71 Mooney images by superimposing familiar images of easily nameable, common artefacts and animals onto patterned backgrounds. These composite images were then blurred (Gaussian blur) and thresholded to a black-and-white bitmap.
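To make the pipeline concrete, a minimal MATLAB sketch of the blur-then-threshold construction is given below. The file names, blending weights, kernel size, and sigma are illustrative assumptions; only the superimpose, Gaussian-blur, and threshold steps come from the description above.

```matlab
% Minimal sketch of the Mooney-image construction (Image Processing Toolbox).
% File names, blend weights, and blur parameters are assumed, not the paper's.
obj  = im2double(rgb2gray(imread('trumpet.jpg')));     % familiar object image
bg   = im2double(rgb2gray(imread('pattern_bg.jpg')));  % patterned background
bg   = imresize(bg, size(obj));                        % match image sizes
comp = 0.6*obj + 0.4*bg;                               % superimpose (assumed weights)

k       = fspecial('gaussian', [21 21], 5);            % Gaussian kernel (assumed size/sigma)
blurred = imfilter(comp, k, 'replicate');              % apply Gaussian blur

mooney = im2bw(blurred, graythresh(blurred));          % threshold to black-and-white bitmap
imwrite(mooney, 'mooney_trumpet.bmp');
```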
Procedure. Experiment 1A. Free Naming. We recruited 94 participants from Amazon Mechanical Turk.
Each participant was randomly assigned to view one of 4 subsets of the 71 Mooney images and to name, at the basic level, what they saw in each image. Each image was seen by approximately 24 people. Naming accuracies for the 71 images (see below for details on how these were computed) ranged from 0% to 95%. Experiment 1B. Basic Level Cues. From the 71 images used in Exp. 1A we selected those with accuracy at or below 33% (29 images). We then presented these images to an additional 42 participants recruited from Amazon Mechanical Turk. Each participant was shown one of two subsets of the 29 images and asked to choose, from among 29 basic-level names (e.g., "trumpet", "leopard", "table"), the object they thought was present in the image (i.e., a 29-alternative forced choice). Each image received approximately 21 responses. Experiment 1C. Superordinate Cues. Of the 29 images used in Exp. 1B we selected the 15 that had a clear superordinate label (see Fig. 1). Twenty additional participants recruited from Amazon Mechanical Turk were presented with each image along with its corresponding superordinate label and were asked to name, at the basic level, the object they saw in the picture by typing their response. For example, given the superordinate cue "musical instrument", participants were expected to respond with "trumpet" to a Mooney image of a trumpet.
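For reference, naming accuracy per image can be scored as the proportion of typed responses matching the intended basic-level name. The sketch below is only an illustration of that idea (the actual scoring, detailed later in the paper, presumably also credited synonyms and misspellings); responses and targetNames are hypothetical variables.

```matlab
% Illustrative naming-accuracy scoring: proportion of responses for image i
% that exactly match its intended basic-level name (case-insensitive).
acc = zeros(1, numel(targetNames));
for i = 1:numel(targetNames)
    resp   = lower(strtrim(responses{i}));   % cell array of typed responses for image i
    acc(i) = mean(strcmp(resp, lower(targetNames{i})));
end
```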

Experiment 2
Materials. From the set of 15 categories used in Exp. 1C, we chose the 10 that had the highest accuracy in the basic-level cue condition (Exp. 1B) and benefited most from the cues (boot, cake, cheese, desk, guitar, leopard, socks, train, trumpet, turtle). The images subtended approximately 7°×7° of visual angle. Each category (e.g., guitar) was instantiated by four variants: two different image backgrounds and two different positions of the object. These additional variants were introduced to help rule out the possibility that any detection effects were driven by low-level processing alone.
Participants. We recruited 35 college undergraduates to participate in exchange for course credit.
Two were eliminated for low accuracy (less than 77%), resulting in 14 participants in the meaning trained condition (8 female) and 19 in the meaning untrained condition (11 female).
Familiarization Procedure and Task. In the meaning trained condition, participants were first shown each image together with its name, and were then shown the images again and asked to type in what they saw in each image, guessing in the case that they could not see anything (Trials 21-30). Finally, participants were shown each image again, asked to type in the label once more, and asked to rate on a 1-5 scale how certain they were that the image portrayed the object they typed. In the meaning untrained condition, participants were familiarized with the images while performing a one-back task, being asked to press the spacebar anytime an image was repeated back-to-back. Repetitions occurred on 20-25% of the trials. In total, participants in the meaning trained and untrained conditions saw each image 4 and 5 times, respectively.
Same/Different Task. Following familiarization, participants were tested on their ability to visually discriminate pairs of Mooney images. Their task was to indicate whether the two images were physically identical or different in any way (Fig. 2A). Each trial began with a central fixation cross (500 ms), followed by the presentation of one of the Mooney images (the "cue") approximately 8° of visual angle above, below, to the left, or to the right of fixation. After 1500 ms the second image (the "target") appeared in one of the remaining cardinal positions. The two images remained visible until the participant responded "same" or "different" using the keyboard (hand-response mapping was counterbalanced between participants). Accuracy feedback (a buzz or bleep) sounded following the response, followed by a randomly determined inter-trial interval (blank screen) between 250 and 450 ms. Image pairs were equally divided into three trial types (Fig. 2C): (1) two identical images (same trials), (2) same object but different location, and (3) different objects at different locations. The backgrounds of the two images on a given trial were always the same, and on a given trial both cue and target objects were either trained or untrained. Participants completed 6 practice trials followed by 360 testing trials.
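For concreteness, the balanced structure of the test list can be sketched as follows; the variable names and randomization scheme are illustrative assumptions (hand-response counterbalancing and image assignment are omitted).

```matlab
% Illustrative construction of the 360-trial same/different test list:
% three equally frequent trial types and a 250-450 ms jittered ITI.
nTrials   = 360;
typeNames = {'identical', 'sameObjectDifferentLocation', 'differentObjectsDifferentLocations'};
trialType = repmat(1:3, 1, nTrials/3);        % 120 trials of each type
trialType = trialType(randperm(nTrials));     % randomize trial order
itiSec    = 0.250 + 0.200*rand(1, nTrials);   % inter-trial interval, 250-450 ms
```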
Behavioral Data Analysis. Accuracy was modeled using logistic mixed effects regression with experiment block and trial-type random slopes and subject and item-category random intercepts.
RTs were modeled in the same way, but using linear mixed effects regression. RT analyses excluded responses longer than 5 s and those exceeding 3 SDs of the subject's mean.
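A sketch of these models in MATLAB follows; the paper specifies only the model structure, so the table layout, column names, and the assignment of the random slopes to subjects are assumptions.

```matlab
% Assumes a long-format table T with columns: accuracy (0/1), rt (seconds),
% condition (meaning trained vs. untrained), block, trialType, subject,
% and itemCategory (grouping variables coded as categorical).

% Logistic mixed-effects model of accuracy: block and trial-type random
% slopes (by subject) plus subject and item-category random intercepts.
mAcc = fitglme(T, ...
    'accuracy ~ condition + (block + trialType | subject) + (1 | itemCategory)', ...
    'Distribution', 'Binomial');

% RT exclusions: responses longer than 5 s, then responses beyond 3 SDs
% of each subject's mean.
T    = T(T.rt <= 5, :);
keep = true(height(T), 1);
subs = unique(T.subject);
for i = 1:numel(subs)
    idx = T.subject == subs(i);
    keep(idx & abs(T.rt - mean(T.rt(idx))) > 3*std(T.rt(idx))) = false;
end

% Linear mixed-effects model of RT with the same random-effects structure.
mRT = fitlme(T(keep, :), ...
    'rt ~ condition + (block + trialType | subject) + (1 | itemCategory)');
```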

Experiment 3
Participants. We recruited 32 college undergraduates to participate in exchange for course credit. Sixteen were assigned to the meaning trained condition (13 female) and the other 16 to the meaning untrained condition (12 female).
Familiarization Procedure and Task. The familiarization procedure, task, and materials were identical to Experiment 2 except that the first and second images (approximately 6°×6° of visual angle) were presented briefly and sequentially at the point of fixation, in order to increase difficulty and better test for effects of meaning on task accuracy (see Fig. 2B). On each trial, the initial cue image was presented for 300 ms on the initial 6 practice trials and for 150 ms on the 360 subsequent trials. The image was then replaced by a pattern mask for 167 ms, followed by a 700 ms blank screen, followed by the second (target) image. Participants' task, as before, was to indicate whether the cue and target images were identical. The pattern masks were black-and-white bitmaps consisting of randomly intermixed ovals and rectangles (https://osf.io/stvgy/).
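The trial timeline maps onto Psychtoolbox calls roughly as shown below. This is a sketch only: window setup, texture creation, and response collection are omitted, and win, cueTex, maskTex, and targetTex are assumed variables.

```matlab
% Illustrative Experiment 3 trial timeline (150 ms cue after practice,
% 167 ms pattern mask, 700 ms blank, then the target until response).
cueDur = 0.150;  maskDur = 0.167;  blankDur = 0.700;   % durations in seconds

Screen('DrawTexture', win, cueTex);
tCue   = Screen('Flip', win);                          % cue onset
Screen('DrawTexture', win, maskTex);
tMask  = Screen('Flip', win, tCue + cueDur);           % mask replaces cue
tBlank = Screen('Flip', win, tMask + maskDur);         % blank screen
Screen('DrawTexture', win, targetTex);
Screen('Flip', win, tBlank + blankDur);                % target onset; await same/different response
```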

Behavioral Data Analysis.
Exclusion criteria and analysis were the same as in Experiment 2.

Experiment 4
Participants. Nineteen college undergraduates were recruited to participate in exchange for monetary compensation. Three were excluded from analysis due to poor EEG recording quality, resulting in 16 participants (9 female) with usable data. All participants reported normal or corrected-to-normal visual acuity and color vision and no history of neurological disorders.
Familiarization Procedure and Task. The familiarization procedure, task, and materials were nearly identical to that used for Experiment 3, but modified to accommodate a within-subject design. For each participant, 5 of the 10 images were assigned to the meaning trained condition and the remaining to the meaning untrained condition, counterbalanced between subjects.
Participants first viewed the 5 Mooney images assigned to the meaning trained condition together with their names (trials 1-10), with each image seen twice. Participants then viewed the same images again and were asked to type in what they saw in each image (trials 11-15). For trials 16-20, participants were again asked to enter labels for the images and were prompted after each trial to indicate on a 1-5 scale how certain they were that the image portrayed the object they named. During trials 21-43, participants completed a 1-back task identical to that used in Experiments 2-3 as a way of becoming familiarized with the images assigned to the meaning untrained condition. Participants then completed 360 trials of the same/different task described in Experiment 3.
EEG Recording and Analysis. EEG was recorded with a sampling rate of 1450 Hz. Preprocessing and analysis were conducted in MATLAB (R2014b, Natick, MA) using custom scripts and the EEGLAB toolbox 39 . Data were downsampled to 500 Hz offline and divided into epochs spanning −1500 ms prior to cue onset to +1500 ms after target onset. Epochs with activity exceeding ±75 µV at any electrode site were automatically discarded. Independent components responsible for vertical and horizontal eye artifacts were identified from an independent component analysis (using the runica algorithm implemented in EEGLAB) and subsequently removed. Visually identified channels with poor contact were spherically interpolated. After these preprocessing steps, we applied a Laplacian transform to the data using spherical splines 40 . The Laplacian is a spatial filter (also known as current source density) that aids topographical localization and converts the data into a reference-independent scheme, allowing researchers to more easily compare results across labs; the resulting units are µV/cm 2 . For recent discussion of the benefits of the surface Laplacian for scalp EEG, see 41,42 . P1 amplitudes were quantified within a window centered on the P1 peak, as identified from the grand average ERP (see Fig. 4A).
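These preprocessing steps map onto standard EEGLAB calls roughly as follows. This is a sketch under assumptions: the event code 'cue', the epoch end time (which depends on the cue-target SOA), and the placeholders eyeComps and badChans are not specified by the paper, and the spherical-spline Laplacian is shown via the CSD toolbox, one common implementation.

```matlab
% Illustrative EEGLAB preprocessing pipeline for Experiment 4.
EEG = pop_resample(EEG, 500);                        % downsample 1450 Hz -> 500 Hz
EEG = pop_epoch(EEG, {'cue'}, [-1.5 2.5]);           % -1500 ms pre-cue to ~+1500 ms
                                                     % post-target (end time assumed)
EEG = pop_eegthresh(EEG, 1, 1:EEG.nbchan, -75, 75, ...
                    EEG.xmin, EEG.xmax, 0, 1);       % discard epochs exceeding +/-75 uV
EEG = pop_runica(EEG, 'icatype', 'runica');          % ICA decomposition
EEG = pop_subcomp(EEG, eyeComps, 0);                 % remove ocular components (indices
                                                     % identified by visual inspection)
EEG = pop_interp(EEG, badChans, 'spherical');        % interpolate bad channels

% Surface Laplacian via spherical splines, e.g., with the CSD toolbox:
% M      = ExtractMontage('10-5-System_Mastoids_EGI129.csd', {EEG.chanlocs.labels}');
% [G, H] = GetGH(M);
% for t = 1:EEG.trials, EEG.data(:,:,t) = CSD(EEG.data(:,:,t), G, H); end
```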
Lastly, in order to relate P1 amplitude and latency to behavior, we used a single-trial analysis. As in prior work 43 , P1 amplitudes were estimated on individual trials. Where correlations are reported, we used Spearman rank coefficients to test for monotonic relationships while mitigating the influence of potential outliers.
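The across-participant brain-behavior relationship (Fig. 6) then reduces to a rank correlation between two per-subject difference scores. A sketch, with alphaDiff and p1Diff as assumed variable names for the trained-minus-untrained differences in pre-target alpha power and target P1 amplitude:

```matlab
% Spearman rank correlation across the 16 participants (Statistics Toolbox).
[rho, p] = corr(alphaDiff(:), p1Diff(:), 'type', 'Spearman');
fprintf('Spearman rho = %.2f, p = %.3f\n', rho, p);
```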

Experiment 1
Mean accuracy for the 15 images used in all versions of Experiment 1 is displayed in Fig. 1A. The benefit conferred by the different cue types relative to the free-naming baseline is shown in Fig. 1B. For example, being told that a piece of furniture was present in the image produced a 16-fold increase in accuracy in recognizing it as a desk (an impressive result even allowing for guessing). The recognition advantage that verbal cues provide is especially striking given that they do not provide any spatial or other perceptual information about the identity of the image.

Experiment 2
Results are shown in Fig. 3. Overall accuracy was high (93.1%; 93.5% on different trials and 92.2% on same trials) and was not significantly affected by meaning training (z < 1).
This is not surprising given that participants had unlimited time to inspect the two images.
Participants exposed to the meaning of the images, however, had significantly shorter RTs than those who were not exposed to image meanings: RT meaning = 824 ms vs. RT no-meaning = 1018 ms (b = 192). This suggests that spatial attention is unlikely to be the source of the effects of meaning training.

Discussion
To better understand when and how prior knowledge influences perception, we first examined how non-perceptual cues influence recognition of initially meaningless Mooney images. These verbal cues resulted in substantial recognition improvements. For example, being told that an image contained a piece of furniture produced a 16-fold increase in recognizing a desk. We next examined whether ascribing meaning to the ambiguous images improved not just people's ability to recognize the denoted object, but to perform a basic perceptual task: distinguishing whether two images were physically identical. Indeed, ascribing meaning to the images through verbal cues improved people's ability to determine whether two simultaneously or sequentially presented images were the same or not (Figs. 3 and 4). The behavioral advantage might still be thought to reflect an effect of meaningfulness on some relatively late process were it not for the electrophysiological results showing that ascribing meaning led to an increase in the amplitude of P1 responses to the target (Fig. 4B; cf. 50 ). The P1 enhancement was preceded by an increase in alpha amplitude during the cue-target interval when the cue was meaningful (Fig. 5). The effects of meaning training on pre-target alpha power and target-evoked P1 amplitude were positively correlated across participants, such that individuals who showed larger increases in pre-target alpha power as a result of meaning training also showed larger increases in P1 amplitude (Fig. 6).
Combined, our results contradict claims that knowledge affects perception only at a very late stage 51,20,52 and provide general support for predictive processing accounts of perception, which posit that knowledge may feed back to modulate lower levels of perceptual processing 3,25,53 .
Our results are also the first to show that making ambiguous images meaningful via non-perceptual linguistic cues enhances not only the ability to recognize the images, but also a putatively lower-level process subserving visual discrimination.
The P1 ERP component is associated with relatively early regions in the visual hierarchy (most likely ventral peristriate regions within Brodmann's Area 18 [54][55][56][57] ), but it has been shown to be sensitive to top-down manipulations such as spatial cueing 58,59 , object-based attention 60 , object recognition 61,62 , and, recently, trial-by-trial linguistic cuing 43 . Our finding that averaged P1 amplitudes were increased following meaning training is thus most parsimoniously explained as prior knowledge having an early locus in its effects on visual discrimination (although the failure to find this effect in the single-trial EEG suggests some caution in its interpretation). This result is consistent with prior fMRI findings implicating sectors of early visual cortex in the recognition of Mooney images 17,63 , but extends these results by demonstrating that the timing of Mooney recognition is consistent with the modulation of early, feedforward visual processing.
Interestingly, the effect of meaning on P1 amplitude was present only in response to the target stimulus, and not the cue. This suggests that, in our task, prior knowledge impacted early visual responses in a dynamic manner, such that experience with the verbal cues facilitated the ability to form expectations for the subsequent "target" image. We speculate that this early target-related enhancement may be accomplished by the temporary activation of the cued perceptual features (reflected in sustained alpha power) rather than by an immediate interaction with long-term memory representations of the meaning-trained features, which would be expected to lead to enhancements of both cue and target P1. Another possibility is that long-term memory representations are brought to bear on the meaning-trained "cue" images, but that these affect later perceptual and post-perceptual processes.
Our findings are also in line with two recent magnetoencephalography (MEG) studies reporting early effects of prior experience on subjective visibility ratings 38,64 . In those studies, however, prior experience is difficult to disentangle from perceptual repetition. For example, Aru et al. (2016) compared MEG responses to images that had previously been studied against images that were completely novel, leaving open mere exposure as a potential source of differences. In our task, by contrast, participants were familiarized with both meaning trained and meaning untrained images, but only the identity of the Mooney image was revealed in the meaning training condition, thereby isolating effects of recognition. Our design further rules out the possibility that stimulus factors (e.g., salience) could explain our effects, since the choice of which stimuli were trained was randomized across subjects. One alternative route by which meaning training may have had its effect is through spatial attention. For example, it is conceivable that on learning that a given image has a boot on the left side, participants subsequently were more effective in attending to the more informative side of the image. If true, such an explanation would not detract from the behavioral benefit we observed, but it would mean that the effects of knowledge were limited to spatial attentional gain. Subsequent analyses suggest this is not the case (see Control Analyses).
It is noteworthy that, as in the present results, the two abovementioned MEG studies, as well as related work from our lab employing linguistic cues 43 , have all found early effects over left-lateralized occipito-parietal sensors, suggesting that the effects of linguistically aided perception may be more pronounced in the left hemisphere, perhaps owing to the predominantly left-lateralized nature of lexical processing 46 .
Mounting neurophysiological evidence has linked low-frequency oscillations in the alpha and beta bands to top-down processing [65][66][67][68] . Recent work has demonstrated that perceptual expectations modulate alpha-band activity prior to the onset of a target stimulus, biasing baseline activity towards the interpretation of the expected stimulus 28,38 . We provide further support for this hypothesis by showing that posterior alpha power increases when participants have prior knowledge of the meaning of the cue image, which was to be used as a comparison template for the subsequent target. Further, pre-target alpha modulation was found to predict the effect of prior knowledge on target-evoked P1 responses, suggesting that representations from prior knowledge activated by the cue interacted with target processing. Notably, the positive direction of this effect (increased prestimulus alpha power predicted larger P1 amplitudes; Fig. 6) directly contrasts with previous findings of a negative relationship between these variables [69][70][71] , which is typically interpreted as reflecting the inhibitory nature of alpha rhythms 72,73 . Our observation thus challenges the notion of alpha as a purely inhibitory or "idling" rhythm. We suggest that, in our task, increased prestimulus alpha-band power may reflect the pre-activation of neurons representing prior knowledge about object identity, thereby facilitating subsequent perceptual same/different judgments. This is consistent with the finding that evoked gamma and multiunit responses in macaque inferotemporal cortex are positively correlated with prestimulus alpha power 74 , suggesting that the alpha modulation we observed may have its origin in regions where alpha does not play an inhibitory role.
Although our results support a general tenet of predictive processing accounts 8,11,25 , namely that predictions formed through prior knowledge can influence sensory representations, they also depart in an important way from certain proposals made by predictive coding theorists 8,75,76 . With respect to the neural implementation of predictive coding, it has been suggested that feedforward responses reflect the difference between the predicted information and the actual input. Predicted inputs should therefore result in a reduced feedforward response.
Experimental evidence for this proposal, however, is controversial. Several fMRI experiments have observed reduced visual cortical responses to expected stimuli [77][78][79] , whereas visual neurophysiology studies describe most feedback connections as excitatory input onto excitatory neurons in lower-level regions [80][81][82] , which may underlie reports of enhanced fMRI and electrophysiological responses to expected stimuli 22,38,83 . A recent behavioral experiment designed to tease apart these alternatives found that predictive feedback increased perceived contrast, which is known to be monotonically related to activity in primary visual cortex, suggesting that prediction enhances sensory responses 84 . Our finding that prior knowledge increased P1 amplitude also supports the notion that feedback processes enhance early evoked responses, although teasing apart the scenarios under which responses are enhanced or reduced by predictions remains an important challenge for future research.