Effects of meaningfulness on perception: Alpha-band oscillations carry perceptual expectations and influence early visual responses

Samaha, Jason; Boutonnet, Bastien; Postle, Bradley R.; Lupyan, Gary

doi:10.1038/s41598-018-25093-5

Article
Open access
Published: 26 April 2018

Scientific Reports volume 8, Article number: 6606 (2018)


Abstract

Perceptual experience results from a complex interplay of bottom-up input and prior knowledge about the world, yet the extent to which knowledge affects perception, the neural mechanisms underlying these effects, and the stages of processing at which these two sources of information converge, are still unclear. In several experiments we show that language, in the form of verbal labels, both aids recognition of ambiguous “Mooney” images and improves objective visual discrimination performance in a match/non-match task. We then used electroencephalography (EEG) to better understand the mechanisms of this effect. The improved discrimination of images previously labeled was accompanied by a larger occipital-parietal P1 evoked response to the meaningful versus meaningless target stimuli. Time-frequency analysis of the interval between the cue and the target stimulus revealed increases in the power of posterior alpha-band (8–14 Hz) oscillations when the meaning of the stimuli to be compared was trained. The magnitude of the pre-target alpha difference and the P1 amplitude difference were positively correlated across individuals. These results suggest that prior knowledge prepares the brain for upcoming perception via the modulation of alpha-band oscillations, and that this preparatory state influences early (~120 ms) stages of visual processing.


Introduction

A chief function of visual perception is to “provide a description that is useful to the viewer”¹, that is, to construct meaning^2,3. Canonical models of visual perception explain this ability as a feed-forward process, whereby low-level sensory signals are progressively combined into more complex descriptions that are the basis for recognition and categorization^4,5. There is now considerable evidence, however, suggesting that prior knowledge impacts relatively early stages of perception^{6,7,8,9,10,11,12,13,14,15}. A dramatic demonstration of how prior knowledge can create meaning from apparently meaningless inputs occurs with two-tone “Mooney” images¹⁶, which can become recognizable following the presentation of perceptual hints^17,18.

Although there is general acceptance that knowledge can shape perception, there are fundamental unanswered questions concerning the type of knowledge that can exert such effects. Previous demonstrations of Mooney recognition aided by prior knowledge have used perceptual hints, such as pointing out where the meaningful image is located or showing people the completed version of the image^17,19. Our first question is whether category information cued linguistically, in the absence of any perceptual hints, can have similar effects. Second, it remains unclear whether such effects of knowledge reflect modulation of low-level perception and, if so, when during visual processing such modulation occurs. Some have argued that benefits of knowledge on perception reflect late, post-perceptual processes occurring only after processes that could be reasonably called perceptual²⁰. In contrast, recent fMRI experiments have observed knowledge-based modulation of stimulus-evoked activity in sensory regions, suggesting an early locus of top-down effects^21,22,23,24. However, the sluggish nature of the BOLD signal makes it difficult to distinguish knowledge affecting bottom-up processing from later feedback signals to the same regions.

One way that prior knowledge may influence perception is by biasing baseline activity in perceptual circuits, pushing the interpretation of sensory evidence towards that which is expected²⁵. Biasing of prestimulus activity according to expectations has been observed both in decision- and motor-related prefrontal and parietal regions^26,27,28 as well as in sensory regions^21,29,30. In visual regions, alpha-band oscillations are thought to play an important role in modulating prestimulus activity according to expectations. For example, prior knowledge of the location of an upcoming stimulus changes preparatory alpha activity in visual cortex^{31,32,33,34,35}. Likewise, expectations about when a visual stimulus will appear are reflected in alpha dynamics^36,37,38. Recently, Mayer and colleagues demonstrated that when the identity of a target letter could be predicted, pre-target alpha power increased over left-lateralized posterior sensors³⁹. These findings suggest that alpha-band dynamics are involved in establishing perceptual predictions in anticipation of perception.

Here, we examined whether verbal cues that offered no direct perceptual hints can improve visual recognition of indeterminate two-tone Mooney images (Experiment 1). We then measured whether such verbally ascribed meaning affected an objective visual discrimination task (Experiments 2–3). Finally, we recorded electroencephalography (EEG) during the visual discrimination task (Experiment 4) to better understand the locus at which knowledge influenced perception. Our findings suggest that using language to ascribe meaning to ambiguous images impacts early visual processing by biasing pre-target neural activity in the alpha-band.

Materials and Method

Experiment 1

Materials

We constructed 71 Mooney images by superimposing familiar images of easily nameable and common artefacts and animals onto patterned backgrounds. These superimposed images were then blurred (Gaussian blur) and thresholded to a black-and-white bitmap. Materials are available at https://osf.io/stvgy/.
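The blur-then-threshold step can be illustrated with a minimal sketch. The kernel width (`sigma`), the edge handling, and the use of the mean blurred intensity as the threshold are assumptions made for illustration; the text does not report the values actually used. The function names are hypothetical.

```python
import math

def gaussian_kernel(sigma):
    """Return a normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the edges."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + j - radius, 0), width - 1)]
             for j, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]

def transpose(image):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*image)]

def mooney(image, sigma=1.0, threshold=None):
    """Blur a grayscale image (list of rows) and binarize it to 0/1.

    Assumption: when no threshold is given, the mean blurred intensity is used.
    """
    kernel = gaussian_kernel(sigma)
    # A 2-D Gaussian blur is separable: blur rows, then columns.
    blurred = transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))
    if threshold is None:
        pixels = [p for row in blurred for p in row]
        threshold = sum(pixels) / len(pixels)
    return [[1 if p > threshold else 0 for p in row] for row in blurred]
```

Larger values of `sigma` remove more fine detail before thresholding, which is what makes the resulting two-tone image harder to recognize.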

Participants

All participants for Experiments 1A-1C were recruited from Amazon Mechanical Turk and were paid $1 (Experiments 1A and 1B), or $0.50 (Experiment 1C) for participating. Demographic information was not collected. All studies were approved by the University of Wisconsin-Madison Institutional Review Board and were conducted in accordance with their policies.

Procedure

Experiment 1A. Free Naming. We recruited 94 participants (four excluded for non-compliance). Each participant was randomly assigned to view one of 4 subsets of the 71 Mooney images, and to name at the basic-level what they saw in each image. Each image was seen by approximately 24 people. Average accuracies for the 71 images ranged from 0% to 95%.

Experiment 1B. Basic Level Cues. From the 71 images used in Experiment 1A we selected the images with accuracy at or below 33% (30 images). We then presented these images to an additional 42 participants (2 excluded for non-compliance). Each participant was shown one of two subsets of the 30 images (15 trials) and asked to choose, among 15 basic-level names (e.g., “trumpet”, “leopard”, “table”), which object they thought was present in the image (i.e., a 15-alternative forced choice). Each image received approximately 21 responses.

Experiment 1C. Superordinate Cues. Out of the 30 images used in Experiment 1B we selected 15 that had a clear superordinate label (see Fig. 1). Twenty additional participants were presented with each image along with its corresponding superordinate label and were asked to name, at the basic level, the object they saw in their picture by typing their response. For example, given the superordinate cue “musical instrument”, participants were expected to respond with “trumpet” given a Mooney image of a trumpet.

Data Coding and Analysis

For free responses (Experiments 1A, 1C), we considered a response to be correct if it (1) matched the designated image name (e.g., “lantern” for the lantern image), (2) was misspelled but identifiable (e.g., “lanturn”), (3) was synonymous (e.g., “camping light”), or (4) contained the target word inside a carrier phrase (e.g., both “socks” and “a pair of socks” were coded as correct). We also coded as correct any “errors” in plurality (e.g., lantern/lanterns), though these were very rare. The responses were first independently coded by three research assistants, and any disagreements were discussed until consensus was reached. The effect of condition on accuracy was modeled using logistic regression with subject and item (Mooney-image category) as random intercepts. The model also included an item-by-condition random slope.

Experiment 2

Materials

From the set of 15 categories used in Experiment 1C, we chose the 10 that had the highest accuracy in the basic-level cue condition (Experiment 1B) and were most benefited by the cues (boot, cake, cheese, desk, guitar, leopard, socks, train, trumpet, turtle). The images subtended approximately 7° × 7° of visual angle. Each category (e.g., guitar) was instantiated by four variants: two different image backgrounds and two different positions of the images. These additional images were introduced to determine whether potential detection effects could be driven by low-level processing alone.

Participants

We recruited 35 college undergraduates to participate in exchange for course credit. Two were eliminated for low accuracy (less than 77%), resulting in 14 participants in the meaning trained condition (8 female), and 19 in the meaning untrained condition (11 female). All participants provided written informed consent.

Familiarization Procedure

Participants were randomly assigned to a meaning trained or meaning untrained condition. The two conditions differed only in how participants were familiarized with the images. In the meaning trained condition, participants first viewed each Mooney image accompanied by an instruction, e.g., “Please look for CAKE”, twice for each Mooney image (Trials 1–20). Participants then saw all the images again and were asked to type in what they saw in each image, guessing in the case that they could not see anything (Trials 21–30). Finally, participants were shown each image again, asked to type in the label once more, and asked to rate on a 1–5 scale how certain they were that the image portrayed the object they typed. In the meaning untrained condition, participants were familiarized with the images while performing a one-back task, being asked to press the spacebar any time an image was repeated back-to-back. Repetitions occurred on 20–25% of the trials. In total, participants in the meaning trained and untrained conditions saw each image 4 and 5 times, respectively.

Same/Different Task

Following familiarization, participants were tested on their ability to visually discriminate pairs of Mooney images. Their task was to indicate whether the two images were physically identical or different in any way (Fig. 2A). Each trial began with a central fixation cross (500 ms), followed by the presentation of one of the Mooney images (the “cue”) approximately 8° of visual angle above, below, to the left or to the right of fixation. After 1500 ms the second image (the “target”) appeared in one of the remaining cardinal positions. The two images remained visible until the participant responded “same” or “different” using the keyboard (hand-response mapping was counterbalanced across participants). Accuracy feedback (a buzz or bleep) sounded following the response, followed by a randomly determined inter-trial interval (blank screen) between 250 and 450 ms. Image pairs were equally divided into three trial-types (Fig. 2C): (1) a pair of identical images, (2) a pair of images containing the same object, but in different locations, (3) a pair of images containing different objects at different locations. The backgrounds of the two images on a given trial were always the same. On a given trial, both cue and target objects were either trained or untrained. Participants completed 6 practice trials followed by 360 testing trials and were asked to respond as quickly as possible without compromising accuracy.

Behavioral Data Analysis

Accuracy was modeled using logistic mixed effects regression with trial-type and meaning-training as fixed effects, and with subject and item-category random effects and trial-type random slopes. RTs were modeled in the same way, but using linear mixed effects regression (see Fig. 3). RT analyses excluded responses longer than 5 s and those exceeding 3 SDs of the subject’s mean.
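The RT exclusion rule can be sketched as follows. Two details are assumptions, since the text does not specify them: the 5 s cutoff is applied before the subject's mean and SD are computed, and the 3 SD criterion is applied in both directions. The function name `trim_rts` is hypothetical.

```python
from statistics import mean, stdev

def trim_rts(rts_by_subject, max_rt=5.0, n_sd=3.0):
    """Apply the two RT exclusion rules separately for each subject.

    rts_by_subject maps a subject ID to a list of RTs in seconds.
    Assumptions: the absolute cutoff is applied first, and the SD rule is
    two-sided around the mean of the remaining RTs.
    """
    kept = {}
    for subject, rts in rts_by_subject.items():
        plausible = [rt for rt in rts if rt <= max_rt]
        if len(plausible) < 2:
            # Too few trials to estimate an SD; keep whatever is left.
            kept[subject] = plausible
            continue
        m, sd = mean(plausible), stdev(plausible)
        kept[subject] = [rt for rt in plausible if abs(rt - m) <= n_sd * sd]
    return kept
```

Computing the cutoff per subject rather than across the whole sample keeps generally slow responders from losing a disproportionate share of their trials.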

Experiment 3

Participants

We recruited 32 college undergraduates to participate in exchange for course credit. Sixteen were assigned to the meaning trained condition (13 female), and the other 16 to the meaning untrained condition (12 female). All participants provided written informed consent.

Familiarization Procedure and Task

The familiarization procedure, task, and materials were identical to Experiment 2 except that the first and second images (approximately 6° × 6° of visual angle) were presented briefly and sequentially at the point of fixation, in order to increase difficulty and better test for effects of meaning on task accuracy (see Fig. 2B). On each trial, the initial cue image was presented for 300 ms for the initial 6 practice trials and 150 ms for the 360 subsequent trials. The image was then replaced by a pattern mask for 167 ms followed by a 700 ms blank screen, followed by the target image. Participants’ task, as before, was to indicate whether the cue and target images were identical. The pattern masks were black-and-white bitmaps consisting of randomly intermixed ovals and rectangles (https://osf.io/stvgy/).

Behavioral Data Analysis

Exclusion criteria and analysis were the same as in Experiment 2.

Experiment 4

Participants

Nineteen college undergraduates were recruited to participate in exchange for monetary compensation. Three were excluded from all analyses due to poor EEG recording quality, resulting in 16 participants (9 female) with usable data. All participants reported normal or corrected visual acuity and color vision and no history of neurological disorders, and provided written informed consent.

Familiarization Procedure and Task

The familiarization procedure, task, and materials were nearly identical to those used for Experiment 3, but modified to accommodate a within-subject design. For each participant, 5 of the 10 images were assigned to the meaning trained condition and the remaining 5 to the meaning untrained condition, counterbalanced between subjects. Participants first viewed the 5 Mooney images in the meaning condition together with their names (trials 1–10), with each image seen twice. Participants then viewed the same images again and were asked to type in what they saw in each image (trials 11–15). For trials 16–20 participants were again asked to enter labels for the images and were prompted after each trial to indicate on a 1–5 scale how certain they were that the image portrayed the object they named. During trials 21–43 participants completed a 1-back task identical to that used in Experiments 2–3 as a way of becoming familiarized with the images assigned to the meaning untrained condition. Participants then completed 360 trials of the same/different task described in Experiment 3.

EEG Recording and Preprocessing

EEG was recorded from 60 Ag/AgCl electrodes with electrode positions conforming to the extended 10–20 system. Recordings were made using a forehead reference electrode and an Eximia 60-channel amplifier (Nexstim; Helsinki, Finland) with a sampling rate of 1450 Hz. Preprocessing and analysis were conducted in MATLAB (R2014b, The Mathworks, Natick, MA) using custom scripts and the EEGLAB toolbox⁴⁰. Data were downsampled to 500 Hz offline and were divided into epochs spanning 1500 ms before cue onset to 1500 ms after target onset. Epochs with activity exceeding ±75 μV at any electrode were automatically discarded, resulting in an average of 352 (range: 331–360) usable trials per subject. Independent components responsible for vertical and horizontal eye artifacts were identified from an independent component analysis (using the infomax algorithm with 3-second epochs of 1500 samples each, implemented in the EEGLAB function runica.m) and subsequently removed. Visually identified channels with poor contact were spherically interpolated (range across subjects: 1–7). After these preprocessing steps, we applied a Laplacian transform to the data using spherical splines⁴¹. The Laplacian is a spatial filter (also known as scalp current density) that aids in topographical localization and converts the data into a reference-independent scheme, allowing researchers to more easily compare results across labs; the resulting units are in μV/cm². For recent discussion on the benefits of the surface Laplacian for scalp EEG see^42,43.
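The amplitude-based epoch rejection can be sketched in a few lines. This is an illustration in plain Python rather than the MATLAB/EEGLAB code used for the study; the nested-list data layout and the function name `reject_epochs` are assumptions.

```python
def reject_epochs(epochs, threshold_uv=75.0):
    """Return the indices of epochs that stay within +/- threshold_uv.

    epochs is a list of epochs; each epoch is a list of channels; each
    channel is a list of voltage samples in microvolts (assumed layout).
    An epoch is discarded if any sample at any electrode exceeds the limit.
    """
    kept = []
    for index, epoch in enumerate(epochs):
        peak = max(abs(sample) for channel in epoch for sample in channel)
        if peak <= threshold_uv:
            kept.append(index)
    return kept
```

For example, `reject_epochs([[[10, -20]], [[10, 80]]])` keeps only the first epoch, because the second contains an 80 μV sample.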

Event-related Potential Analysis

Cleaned epochs were filtered between 0.05 and 25 Hz using a first-order Butterworth filter (MATLAB function butter.m). Data were time-locked to target onset, baselined using a subtraction of a 200 ms pre-target window, and sorted according to target meaning condition (trained or untrained). To quantify the effect of meaning on early visual responses, we focused on the amplitude of the visual P1 component. Following prior work in our lab that found larger left-lateralized P1 amplitudes to images preceded by linguistic cues⁴⁴, we derived separate left and right regions of interest by averaging the signal from occipito-parietal electrodes PO3/4, P3/4, P7/8, P9/10, and O1/2. P1 amplitude was defined as the average of a 30 ms window, centered on the P1 peak as identified from the grand average ERP (see Fig. 4A). This same procedure was used to analyze P1 amplitudes in response to the cue stimulus, with the exception that baseline subtraction was performed using the 200 ms prior to cue onset. Lastly, in order to relate P1 amplitudes and latencies to behavior, we used a single-trial analysis. As in prior work⁴⁴, single-trial peaks were determined from each electrode cluster (left and right regions of interest) by extracting the largest local voltage maximum between 70 and 150 ms post-stimulus (using the MATLAB function findpeaks). Any trial without a detectable local maximum (on average ~1%) was excluded from analysis.
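The three ERP measures described above (baseline subtraction, mean amplitude in a 30 ms window, and single-trial peak picking) can be sketched as below. This is a plain-Python illustration, not the authors' MATLAB code. In particular, `single_trial_peak` uses a strict "higher than both neighbours" rule, which is a simplification of MATLAB's findpeaks (it does not handle flat-topped peaks). All function names are hypothetical.

```python
def baseline_correct(erp, times_ms, start_ms=-200, end_ms=0):
    """Subtract the mean of the pre-stimulus window from every sample."""
    base = [v for v, t in zip(erp, times_ms) if start_ms <= t < end_ms]
    offset = sum(base) / len(base)
    return [v - offset for v in erp]

def p1_mean_amplitude(erp, times_ms, peak_ms, width_ms=30):
    """Mean voltage in a window of width_ms centered on peak_ms."""
    half = width_ms / 2
    window = [v for v, t in zip(erp, times_ms)
              if peak_ms - half <= t <= peak_ms + half]
    return sum(window) / len(window)

def single_trial_peak(trial, times_ms, start_ms=70, end_ms=150):
    """Return (latency_ms, amplitude) of the largest local maximum in the
    search window, or None if the window contains no local maximum."""
    best = None
    for i in range(1, len(trial) - 1):
        in_window = start_ms <= times_ms[i] <= end_ms
        is_local_max = trial[i - 1] < trial[i] > trial[i + 1]
        if in_window and is_local_max:
            if best is None or trial[i] > best[1]:
                best = (times_ms[i], trial[i])
    return best
```

Returning `None` when no local maximum exists mirrors the exclusion of trials without a detectable peak.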

Time-Frequency Analysis

Time-frequency decomposition was performed by convolving single-trial unfiltered data with a family of Morlet wavelets, spanning 3–50 Hz, in 1.6-Hz steps, with wavelet cycles increasing linearly between 3 and 10 cycles as a function of frequency. Power was extracted from the resulting complex time series by squaring the absolute value of the time series. To adjust for power-law scaling, time-frequency power was converted into percent signal change relative to a common condition pre-cue baseline of −400 to −100 ms. To identify time-frequency-electrode features of interest for later analysis in a data-driven way while avoiding circular inference, we first averaged together all data from all conditions and all electrodes. This revealed a prominent (~65% signal change from baseline) task-related increase in alpha-band power (8–14 Hz) during the 500 ms preceding target onset, with a clear posterior scalp distribution (see Fig. 5A), in line with the topography of alpha observed in many other experiments^45,46. Based on this, we focused subsequent analysis on 8–14 Hz power across the pre-target window −500 to 0 ms using the same left/right posterior electrode clusters as in the ERP analysis.
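A minimal sketch of the wavelet decomposition for a single frequency, followed by the percent-signal-change conversion, is given below. Several details are assumptions not stated in the text: the Gaussian width is set as n_cycles / (2πf), the wavelet is truncated at 3 standard deviations, and it is normalized by the sum of its envelope. The function names are hypothetical, and the code is written for clarity rather than speed.

```python
import cmath
import math

def cycles_for(freq, f_min=3.0, f_max=50.0, c_min=3.0, c_max=10.0):
    """Number of wavelet cycles, rising linearly from c_min at f_min
    to c_max at f_max."""
    return c_min + (c_max - c_min) * (freq - f_min) / (f_max - f_min)

def morlet_power(signal, srate, freq, n_cycles):
    """Power time series: squared magnitude of the convolution of the
    signal with a complex Morlet wavelet at one frequency."""
    sigma_t = n_cycles / (2 * math.pi * freq)  # assumed width convention
    half = int(round(3 * sigma_t * srate))     # truncate at 3 SDs
    wavelet = []
    for k in range(-half, half + 1):
        t = k / srate
        envelope = math.exp(-t * t / (2 * sigma_t ** 2))
        wavelet.append(cmath.exp(2j * math.pi * freq * t) * envelope)
    norm = sum(abs(w) for w in wavelet)
    n = len(signal)
    power = []
    for i in range(n):
        acc = 0j
        for k in range(-half, half + 1):
            j = i - k
            if 0 <= j < n:  # samples beyond the edges are treated as zero
                acc += signal[j] * wavelet[k + half]
        power.append(abs(acc / norm) ** 2)
    return power

def percent_change(power, times_ms, base_start=-400, base_end=-100):
    """Convert power to percent signal change from the baseline mean."""
    base = [p for p, t in zip(power, times_ms) if base_start <= t <= base_end]
    b = sum(base) / len(base)
    return [100.0 * (p - b) / b for p in power]
```

Because power falls off steeply with frequency, expressing each frequency relative to its own baseline puts all frequencies on a comparable scale. Running `morlet_power` once per frequency from 3 to 50 Hz, with `cycles_for` supplying the cycle count, yields the full time-frequency map.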

Statistical Analysis

We conducted two analyses of pre-target alpha power. To examine the effect of meaning training on the time course of pre-target alpha power (see Fig. 5B), we analyzed left and right electrode groups separately with a non-parametric permutation test and cluster correction to deal with multiple comparisons across time points⁴⁷. This was accomplished by randomly shuffling the association between condition labels (meaning trained or untrained) and alpha power 10,000 times. On every iteration, a t-statistic comparing alpha power between meaning trained and meaning untrained conditions was computed for each time sample. The largest number of contiguous significant samples was saved, forming a distribution of t-statistics under the null hypothesis that meaning training had no effect, as well as a distribution of cluster sizes expected under the null. The t-statistic associated with the true data mapping was compared, at each time point, against this null distribution, and only clusters whose size exceeded the 95th percentile of the null cluster distribution were considered statistically different. α was set at 0.05 for all comparisons. In the second analysis, which additionally tested for an interaction between hemispheres, we averaged alpha power across the pre-target window −500 to 0 ms and fit a linear mixed-effects model using meaning condition (trained vs. untrained), electrode cluster (left vs. right hemisphere), and their interaction to predict alpha power, with random slopes for meaning condition and hemisphere by subject (this model is equivalent to a 2-by-2 repeated-measures ANOVA).
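The cluster-size permutation logic can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: because the design is within-subject, label shuffling is implemented here as random sign-flipping of each subject's trained-minus-untrained difference wave; significance at each sample uses a fixed critical t value (2.131, the two-tailed 0.05 value for 15 degrees of freedom, matching 16 subjects); and a cluster is any run of contiguous supra-threshold samples regardless of sign. Function names are hypothetical.

```python
import math
import random

def paired_t(diffs):
    """One-sample t statistic on per-subject differences at one time point."""
    n = len(diffs)
    m = sum(diffs) / n
    var = sum((d - m) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / math.sqrt(var / n)

def find_clusters(tvals, t_crit):
    """Return (start, end) index pairs of contiguous runs with |t| > t_crit."""
    runs, start = [], None
    for i, t in enumerate(tvals):
        if abs(t) > t_crit:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(tvals) - 1))
    return runs

def cluster_permutation_test(diffs_by_subject, t_crit=2.131,
                             n_perm=1000, seed=0):
    """Return (significant_clusters, cutoff).

    diffs_by_subject: one list per subject holding the trained-minus-
    untrained alpha power at each time sample (assumed layout).
    """
    rng = random.Random(seed)
    n_time = len(diffs_by_subject[0])

    def t_series(data):
        return [paired_t([subj[i] for subj in data]) for i in range(n_time)]

    observed = find_clusters(t_series(diffs_by_subject), t_crit)
    null_max = []
    for _ in range(n_perm):
        # Swapping condition labels within a subject flips the sign
        # of that subject's difference wave.
        flipped = []
        for subj in diffs_by_subject:
            sign = rng.choice((-1, 1))
            flipped.append([sign * v for v in subj])
        runs = find_clusters(t_series(flipped), t_crit)
        null_max.append(max((e - s + 1 for s, e in runs), default=0))
    null_max.sort()
    cutoff = null_max[min(int(0.95 * n_perm), n_perm - 1)]  # 95th percentile
    significant = [(s, e) for s, e in observed if (e - s + 1) > cutoff]
    return significant, cutoff
```

Saving only the largest cluster from each permutation is what controls the family-wise error rate across time points: an observed cluster is judged against the largest clusters that chance alone produces anywhere in the window.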

To predict trial-averaged P1 amplitudes we used a linear mixed-effects model predicting P1 amplitude from meaning (trained vs. untrained), electrode cluster (left vs. right hemisphere), and their interaction, with random slopes for meaning condition and hemisphere by subject. Simple effects were then tested using paired t-tests to compare P1 amplitudes and pre-target alpha power between meaning conditions separately for each electrode group. We examined simple effects on the basis of two recent reports examining the influence of linguistic⁴⁴ and perceptual cues³⁹ on P1 amplitudes. Both of these experiments found left-lateralized P1 enhancements to cued images. We therefore anticipated significant differences over left, but not right sensors, and report simple effects in addition to main effects and interactions. Regarding the single-trial P1 analysis (see Event-related Potential Analysis above), we used linear mixed-effects models with subject and item random effects to relate single-trial P1 peak amplitudes and latencies to the accuracy and latency of behavioral responses. See https://osf.io/stvgy/ for full model syntax. Where correlations are reported, we used Spearman rank coefficients to test for monotonic relationships while mitigating the influence of potential outliers. We additionally conducted a non-parametric bootstrap analysis (20,000 bootstrap samples) to form 95% confidence intervals around across-subject correlation coefficients and to verify the significance of any correlation using an additional non-parametric statistic.
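The correlation analysis can be illustrated with a short sketch of a Spearman coefficient and a bootstrap confidence interval. The percentile method used for the interval is an assumption, as the text does not state which bootstrap interval was computed; subjects are resampled with replacement as pairs. Function names are hypothetical.

```python
import random

def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation; NaN if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def bootstrap_ci(x, y, n_boot=20000, seed=0):
    """95% percentile bootstrap interval for Spearman's rho (assumed method),
    resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([x[i] for i in idx], [y[i] for i in idx])
        if rho == rho:  # skip NaN from degenerate resamples
            stats.append(rho)
    stats.sort()
    lower = stats[int(0.025 * len(stats))]
    upper = stats[min(int(0.975 * len(stats)), len(stats) - 1)]
    return lower, upper
```

Here `x` and `y` would hold one value per subject, for example the alpha power difference and the P1 amplitude difference. An interval that excludes zero gives a second, non-parametric indication that the correlation is reliable.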