Abstract
Imagining natural scenes enables us to engage with a myriad of simulated environments. How do our brains generate such complex mental images? Recent research suggests that cortical alpha activity carries information about individual objects during visual imagery. However, it remains unclear whether more complex imagined contents such as natural scenes are similarly represented in alpha activity. Here, we answer this question by decoding the contents of imagined scenes from rhythmic cortical activity patterns. In an EEG experiment, participants imagined natural scenes based on detailed written descriptions, which conveyed four complementary scene properties: openness, naturalness, clutter level and brightness. By conducting classification analyses on EEG power patterns across neural frequencies, we were able to decode both individual imagined scenes and their properties from the alpha band, showing that the contents of complex visual images are also represented in alpha rhythms. A cross-classification analysis between alpha power patterns during the imagery task and during a perception task, in which participants were presented with images of the described scenes, showed that scene representations in the alpha band are partly shared between imagery and late stages of perception. This suggests that alpha activity mediates the top-down re-activation of scene-related visual contents during imagery.
Introduction
Our ability to evoke mental images of natural scenes enriches our lives by giving shape to the worlds in our favorite novels or by enabling us to navigate the environment. Visual imagery is thought of as a top-down recall of sensory-memory information initiated in frontal cortex that reactivates cortical areas that are typically involved in visual perception1. For example, when imagining natural scenes, the scene network, a network of cortical areas that is typically active during scene perception, is re-engaged2,3,4. Imagery recruits shared representations with perception across the entire visual hierarchy, although more extensively in high-level visual areas1,5,6. What neural mechanisms underlie this top-down re-instantiation of visual contents?
One possibility is that imagery-related information is encoded in neural rhythms that are involved in top-down processing. Previous research has provided evidence that alpha and beta rhythms carry top-down information in the visual cortex7,8,9,10, making them likely candidates to play a role in imagery processing. The relation between alpha activity and visual imagery has been studied extensively over the past decades. While many studies have found a correlation between changes in alpha rhythms and visual imagery11,12,13,14,15, exactly what information is encoded in these alpha rhythms long remained elusive. Recently popularized multivariate pattern analysis (MVPA) techniques16,17, however, have enabled us to probe the representational content of alpha activity during visual imagery.
A recent EEG MVPA study18 found that alpha oscillations contained representations of the visual contents of imagined objects. Interestingly, these alpha band representations were also shared with late time windows during perception. While this provides evidence that alpha oscillations carry out the top-down re-instantiation of perceptual contents, the imagery task employed in this study (imagining isolated objects) is not very representative of the imagery we typically perform in daily life. Many everyday tasks, such as reading, spatial navigation or mental simulation of future events, require us to imagine not just single objects but complex natural scenes19,20,21. Compared to isolated objects, perceptual processing of natural scenes requires several scene-specific steps, such as the analysis of scene-diagnostic low-level image statistics22, (global) scene properties23 and object arrangements24,25. It is thus critical to understand whether the top-down re-instantiation of scene information during imagery is also mediated by cortical alpha activity.
In the present study, we thus aimed to determine whether natural scenes and their properties are represented in cortical alpha activity during visual imagery and whether these representations are shared with perceptual processing. To that end, we conducted an EEG experiment in which participants imagined natural scenes based on a written description and viewed images of the same scenes in a separate task. The scenes varied in four properties which have previously been investigated in the scene literature: openness, naturalness, clutter level and brightness26,27. We employed frequency-resolved multivariate pattern classification to track the representations of scenes across neural rhythms, and imagery-perception cross-classification to investigate whether scene representations are similarly coded in rhythmic cortical activity during imagery and perception.
We show that imagined natural scenes and their properties are represented in cortical alpha activity and that scene representations in the alpha frequency band are partly shared between scene imagery and late stages of scene perception. Our results indicate that cortical alpha activity mediates the top-down re-instantiation of complex natural scenes during visual imagery.
Results
In order to investigate the neural representations of imagined and perceived scenes and their properties in neural rhythms, we conducted two experimental tasks while participants’ neural activity was recorded via EEG (see Fig. 1c for a schematic of the tasks). In the imagery task, participants imagined natural scenes for 4000 ms based on a detailed, three-sentence description of each scene (see Fig. 1a for an example). The 16 described scenes varied independently in four properties: openness, naturalness, clutter level and brightness. After completing the imagery task, participants were asked to rate the imagined scenes in regard to the four properties on a scale from 1 to 7. These ratings confirmed that, at least on average, the properties of the imagined scenes aligned with those conveyed by the descriptions (see Fig. 1b). In the subsequent perception task, participants viewed images that matched the scene descriptions (3 images per scene; see Fig. 1a) with each image being presented for 1000 ms.
Mean pairwise scene decoding
To identify at which neural frequencies information related to individual imagined scenes can be found, we conducted a mean pairwise frequency searchlight decoding analysis. We transformed the EEG signals across the entire imagery period into the frequency domain and trained classifiers to distinguish between each possible pair of imagined scenes based on power patterns across channels at each frequency from 4 to 30 Hz (see Fig. 2a for a schematic). Averaging the pairwise decoding accuracies across pairs yielded a measure of information content regarding the individual imagined scenes at each frequency. We found that the individual imagined scenes could be discriminated best in the alpha frequency range (see Fig. 3a), with significant mean pairwise scene decoding from 8 to 13 Hz, peaking at 11 Hz (p < 0.001). There was also weaker, sporadically significant mean pairwise scene decoding in the beta band (significant at 18 and 21 Hz), indicating that some information might also be contained there. These results suggest that individual imagined scenes are represented most prominently in the alpha frequency band.
Scene property decoding
We further examined how the four investigated properties (openness, naturalness, clutter level and brightness) of the imagined scenes are represented across the frequency domain. Using the same frequency searchlight approach as above, we had classifiers predict for each property which property category the imagined scene belonged to (e.g., for naturalness, whether the scene was natural or man-made). This analysis revealed that all four properties could be decoded exclusively from the alpha band (see Fig. 3b). The decoding accuracy for each property peaked at around 10 Hz (openness: 9 Hz, p = 0.009; naturalness: 11 Hz, p < 0.001; clutter level: 10 Hz, p = 0.038; brightness: 10 Hz, p = 0.031). Aligning with our decoding analysis of the individual scenes, these findings suggest that the properties of imagined scenes are also represented in cortical alpha activity.
Imagery-perception cross-decoding in the alpha frequency band
Next, we investigated whether any of the representations of the individual scenes or scene properties we found in the alpha band are shared between imagery and different stages of the perceptual processing hierarchy. To that end, we conducted the same mean pairwise and property decoding analyses, but trained the classifiers on alpha power patterns in the frequency-resolved imagery data and tested them on alpha power patterns at each time point in the time–frequency-resolved perception data and vice versa (see Fig. 2b for a schematic). We chose to assess these shared alpha representations across the entire imagery period while maintaining temporal resolution for the perceptual data in order to increase the power of the analysis, since imagery representations were found to be relatively invariable across time18,28,29, whereas perceptual representations are thought to be more temporally variable29,30,31. This yielded a time-resolved measure of shared representations in the alpha band between scene imagery and each stage in the processing hierarchy of scene perception, in which temporal representational variations index the progression of perceptual processing32. In the mean pairwise scene cross-decoding analysis, we identified an increase in cross-decoding accuracy starting at around 600 ms during perceptual processing that was marginally significant (p = 0.068 at peak) at 750–800 ms (see Fig. 3c). This trend suggests that there are representations of individual scenes in the alpha band that are shared between scene imagery and late stages of scene perception. In the property cross-decoding analysis, we found relatively low but significant cross-classification performance for openness at 750 ms (p = 0.012) and for clutter level at 800–850 ms (p = 0.002 at peak) as well as 950–1000 ms (p = 0.008 at peak) in the perceptual processing hierarchy (see Fig. 3d). There was also marginally significant (p = 0.057) cross-decoding performance for brightness at 400 ms, but no significant cross-decoding performance for naturalness. The cross-decoding peak for openness and the first peak for clutter level were almost perfectly aligned in time, and both overlapped temporally with the marginally significant peak in the mean pairwise cross-decoding for the individual scenes. The results of our property cross-decoding analysis indicate that scene imagery shares representations with late scene perception in the alpha band, at least for some properties. We exploratively conducted all imagery-perception cross-decoding analyses in the theta and beta bands as well, but found no solid evidence of shared scene representations in those frequency bands (see supplementary Fig. S3).
Shuffled property decoding
In a control analysis, we assessed to what extent the representations the classifiers utilized during scene property decoding encode differences between individual scene features or property category information. We performed a shuffled property decoding analysis for both the imagery property decoding and the imagery-perception property cross-decoding, in which all scenes were randomly assigned to two mock property categories for all possible permutations and classifiers were trained to distinguish between these categories. Since property category information is randomized in this decoding scheme, the classifiers are limited to differentiating between the mock categories based on the features of the individual scenes they happen to encompass. If the property decoding used not only differences between individual scenes but also more abstract information about property categories, shuffled property decoding performance should be reduced in comparison to the original property decoding performance, since the classifiers had no access to this additional source of information in the shuffled analysis. To test this, we compared the peak (cross-)decoding accuracy for each property that was discriminable in the original analyses to the shuffled property (cross-)decoding accuracy at the respective frequency or time point.
For the imagery property decoding analysis, we found higher peak decoding accuracies compared to the shuffled property decoding accuracies for all properties (see Fig. 4a). This difference was significant for openness (p = 0.015) and naturalness (p = 0.031), marginally significant for clutter level (p = 0.072) and not significant for brightness (p = 0.102). This implies that for openness, naturalness and potentially clutter level, there are property category representations in the alpha band during scene imagery. We also conducted the shuffled property decoding at each frequency during imagery (see supplementary Fig. S4). While slightly lower in overall accuracy, the decoding performance profile looks strikingly similar to that in our mean pairwise scene decoding, further corroborating that the shuffled property decoding mainly reflects neural discriminability based on individual scene information.
For the imagery-perception property cross-decoding analysis, we conducted the comparison between property cross-decoding and shuffled property cross-decoding for all properties except naturalness, since we did not find any interpretable cross-decoding accuracy peak for this property. We found higher cross-decoding accuracies compared to the shuffled property cross-decoding accuracies for all three investigated properties (see Fig. 4b). This was significant for all properties (openness: p = 0.008, clutter level: p = 0.004, brightness: p = 0.006), suggesting that there are category representations of these properties in the alpha band that are shared between imagery and late stages of perceptual processing. When conducting the shuffled property cross-decoding at each time point during perception (see supplementary Fig. S4), we found significant above-chance cross-decoding performance starting at about 750 ms, providing further evidence of shared scene representations between imagery and late perception in the alpha band.
Relationship between imagery vividness and neural representations in the alpha band
In an exploratory analysis, we investigated whether there is a relationship between the participants’ ability to evoke vivid mental images and the neural representations of such mental images in the alpha band. We correlated their scores in the Vividness of Visual Imagery Questionnaire (VVIQ)33, which they provided online during recruitment, with their peak in mean pairwise scene decoding and cross-decoding accuracy in the alpha band. While we did not find a meaningful correlation between the VVIQ and the peak in the mean pairwise decoding (Pearson’s r = 0.074, p = 0.307), we did find a marginally significant positive correlation with the peak in the mean pairwise cross-decoding (Pearson’s r = 0.2, p = 0.084). This marginal trend suggests that increased representational overlap between imagery and perception in the alpha band is associated with a more vivid imagery experience, which aligns with previous research reporting the same finding in fMRI34. To test whether this relationship is robust, future studies could use trial-wise ratings of imagery vividness, which tend to be a more reliable measure34,35.
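For illustration, such an exploratory correlation can be computed as in the following Python sketch; the data arrays are random placeholders rather than our measurements, and a one-tailed test for a positive correlation is assumed.

```python
# Minimal sketch of the exploratory VVIQ correlation across n = 49 participants
# (placeholder data; requires SciPy >= 1.9 for the `alternative` argument).
import numpy as np
from scipy.stats import pearsonr

vviq_scores = np.random.uniform(24, 80, size=49)          # placeholder VVIQ scores (one per participant)
peak_alpha_acc = np.random.uniform(0.49, 0.56, size=49)   # placeholder peak alpha cross-decoding accuracies

r, p = pearsonr(vviq_scores, peak_alpha_acc, alternative='greater')  # one-tailed Pearson correlation
```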
Discussion
In the present study, we investigated the representations of imagined and perceived natural scenes in rhythmic cortical activity. We found, as hypothesized, that both individual scenes as well as scene properties are represented in cortical alpha activity during visual imagery. We also found evidence that scene representations in the alpha frequency band are partly shared between imagery and late stages of perceptual processing.
These results indicate that the top-down reactivation of scene representations during visual imagery is enabled by cortical alpha activity. This aligns well with studies showing that alpha rhythms play a role in visual imagery (e.g.11,12), and specifically with the notion that imagery-related alpha oscillations are a top-down signal that represents the imagined visual contents18. In more general terms, our results also support theories that postulate that top-down information flows are mediated by alpha dynamics in visual cortex (e.g.9).
Our decoding analyses revealed that all four investigated scene properties were discriminable from cortical alpha activity during visual imagery and that for all scene properties except naturalness (with brightness only marginally significant) these alpha representations were shared with late stages of scene perception. Comparing the initial imagery property decoding to a decoding scheme in which property categories were randomized showed that there were genuine, abstract representations of scene properties in the alpha band, which were observed for all properties except brightness during imagery (with clutter level only marginally significant). Extending this comparison to the imagery-perception cross-decoding, we found evidence of shared alpha representations of abstract property information for openness, clutter level and brightness. These results suggest that, during imagery, alpha activity enables the top-down reactivation of scene property representations, some of which are shared with late stages of scene perception. This further implies that the representational division into (global) scene properties found during perception36 also holds during imagery. One exception was, however, that we did not find any shared alpha representations for naturalness. A plausible explanation might be that, while participants rated the naturalness of the imagined scenes as expected, the natural and man-made contents they imagined might have differed from the natural and man-made contents in the images they viewed (e.g. different types of natural or man-made objects), resulting in different neural representations being recruited during imagery and perception. The properties for which we did find evidence of shared representations (openness, clutter level and brightness) are much less dependent on the specific types of imagined objects.
We found evidence of shared scene representations in the alpha band between imagery and perception from around 400 ms (for brightness) until 1000 ms (for clutter level) after stimulus onset. This is in alignment with previous studies employing cross-decoding techniques, which have also reported late shared representations during perception. Xie et al.18, who originally found shared alpha representations between object imagery and perception, also reported late timings during perceptual processing, with the strongest correspondence between perception and imagery emerging after 400 ms. Dijkstra et al.29 reported shared representations between imagery and perception up until 1000 ms during perception. Why would imagery reactivate representations that occur so late during perceptual processing? One potential explanation is that imagery and perception share fewer representations in low-level and more in high-level visual areas5, making it more likely that shared representations occur during later perceptual processing. This can be explained by the prominent conceptualization of imagery as a reverse reactivation of the perceptual hierarchy starting from high-level visual cortex37,38. Following this notion, the representational format of cortical brain areas in late stages of perceptual processing (i.e. high-level visual cortex) is thought to be more similar to that in imagery, since these areas are closer to the trigger source of the imagery signal1. In alignment with this, Xie et al.18 found that the late shared alpha representations were best explained by complex visual features analyzed in high-level visual cortex. Thus, the shared alpha representations in our results might also reflect late processing in high-level visual areas. This is supported by our shuffled property control analysis, which showed that some of the shared property representations in the alpha band encode category information, which is typically represented in high-level visual cortex39.
However, even if late shared representations between imagery and perception are not unexpected, the timings of our results are still quite late, given that processing of scenes and (global) scene properties (specifically openness, naturalness and clutter level) has been shown to be rapid, occurring within the first 250 ms after stimulus onset26,27,40,41,42. Since most research on the temporal dynamics of scene processing has focused on comparatively early neural signatures43, what happens during such very late stages of scene processing is still largely unknown. Given that the perceived scene images in our study were presented throughout the entire analysis time window, a possible explanation is that the alpha representations that scene perception shares with scene imagery in our data reflect recurrent processing of the scenes and their properties after the first feed-forward sweep37. During recurrent processing, the perceptual representational format might be altered in a way that makes it more similar to imagery representations. Future studies could clarify to what extent recurrent processes shape the late representations in perception that generalize to imagery.
A final caveat of our results is the low decoding accuracies. Imagery-related brain signals tend to have a low signal-to-noise ratio (e.g.44), which results in lower decoding accuracies in imagery studies that employ MVPA18,29. Furthermore, our imagery task was designed to ensure that the imagined scenes sufficiently differed from the perceived scenes in terms of their low-level features. We had participants imagine the scenes based on descriptions that allow for variability in the generated mental images and only presented them the actual images afterwards, so that any shared representations between imagery and perception we did find would not be based on similarities in low-level features. However, a side effect of this might have been reduced cross-decoding performance, since the classifiers could not exploit such low-level features to a great extent. The cross-decoding performance might also have been impacted by the limited range of images in the perception task, which might not fully cover the variability in the imagined visual contents. This could be remedied in future studies by generating a large image set based on the imagery prompts using text-to-image algorithms45. In addition, due to the relatively long trial duration, we only had 192 imagery trials of training data per participant, which further limited classifier performance. As a result, in particular the very low cross-decoding accuracies (less than 1% above chance) need to be interpreted with caution. Nevertheless, multiple factors point towards the cross-decoding results being a true effect. First, due to the low temporal resolution typically employed in time–frequency decomposition, there were only 20 post-stimulus time points in our perception data, considerably fewer than the hundreds of time points that typically require multiple comparison correction in temporally resolved decoding analyses16, and we did appropriately correct for multiple comparisons. Second, we found the same latency of roughly 750–800 ms across four different analyses that investigated shared scene representations using three different decoding schemes: the mean pairwise cross-decoding (Fig. 3c), the openness and clutter level property cross-decoding (Fig. 3d) and the shuffled property cross-decoding (Fig. S4). Third, the late timings in the cross-decoding analyses roughly align with late timings reported in previous imagery-perception cross-decoding studies, as discussed above18,29. Fourth, the results of the shuffled property cross-decoding control analysis suggested that there are genuine property representations at the peak cross-decoding time points for openness, clutter level and brightness. Finally, decoding accuracies are considered a poor measure of effect size, and low decoding accuracies can still constitute meaningful effects, indicating that information is represented consistently in neural response patterns across participants46,47.
Overall, our results suggest that the top-down reactivation of scene representations during visual imagery is mediated by cortical alpha activity and that the re-instantiated alpha representations are partly shared with late stages of scene perception. They show that alpha dynamics are not only critical for generating mental images of individual objects, but also mediate the creation of complex natural environments in our mind’s eye.
Methods
Participants
50 participants (25 male; mean age = 25.74 years, SD = 6.31) with normal or corrected-to-normal eyesight took part in the experiment. One participant was excluded from all analyses because they did not complete the imagery task due to a technical error during the EEG recording. During recruitment, participants filled in a German translation of the Vividness of Visual Imagery Questionnaire (VVIQ)33, a common measure of a person’s aptitude at evoking mental images, on Limesurvey (https://www.limesurvey.org/en/). The scale of the VVIQ was reversed so that higher scores indicate better imagery performance, and participants were only allowed to take part if they had a VVIQ score of at least 24/80, since scoring lower would constitute moderate to severe aphantasia48. For a plot of the distribution of the VVIQ scores see supplementary Fig. S1. Participants provided written informed consent and received monetary compensation. The study was approved by the ethics committee of the Justus-Liebig-University Gießen and was conducted in accordance with the Declaration of Helsinki (6th revision).
Stimuli
Participants imagined and viewed 16 naturalistic scenes which independently varied in four properties: openness (8 open and 8 closed scenes), naturalness (8 natural and 8 man-made scenes), clutter level (8 cluttered and 8 sparse scenes) and brightness (8 bright and 8 dark scenes) (see Fig. 1a). During the imagery task, participants visualized each scene based on a three-sentence description in German (see Fig. 1a for an example and supplementary Table S1 for a list of all descriptions with English translations). The descriptions were detailed (mean word count = 50.94, SD = 5.23) in order to facilitate evoking rich mental images of the scenes as well as to ensure that the visualized scenes were as similar in properties to the described scenes (and thus the scenes used in the perception task, see below) as possible. In the perception task, participants viewed color images of scenes that matched the descriptions in the imagery task. Each participant was presented with three different images per scene description, in order to account for the variability in imagined scenes when assessing shared neural representations between imagery and perception later on. The total of 48 images (16 scenes × 3 images; see supplementary Fig. S2 for all images) was taken from Google Images, cropped and resized to a resolution of 800 × 600 pixels (21° horizontal visual angle). Brightness and contrast were adjusted where necessary to accentuate the desired lighting conditions (bright vs. dark) or to enhance visibility.
Experiment design and procedure
The experiment consisted of two tasks (see Fig. 1c). Participants first performed the imagery task, in which they had to imagine scenes according to three-sentence descriptions (see Fig. 1a) while their EEG was recorded. At the beginning of each trial, participants were presented with a scene description at the center of the screen enclosed by a black frame (21° horizontal visual angle). The frame was visible throughout the entire trial to avoid evoking neural responses related to its onset. Participants were instructed to attentively read the description and try to identify all important aspects of the scene. They had an unlimited amount of time, but were asked to only take as long as needed, especially once they were more familiar with each description after multiple trials. Once they had familiarized themselves with the description, they pressed the spacebar and a black fixation dot appeared at the center of the screen. After a randomly jittered interval of 1000–2000 ms, the fixation dot turned red. This served as the cue to imagine the scene within the black frame that surrounded the red fixation dot. Participants were instructed to maintain the mental image of the scene while fixating the red dot until it turned black again after 4000 ms. Finally, a randomly jittered inter-trial interval (ITI) of 800–1200 ms followed, throughout which the black frame and fixation dot remained visible on screen. Before performing the imagery task, participants completed 16 practice trials (containing each scene once) while the EEG cap was being prepared. During the subsequent experiment, they were asked to imagine each of the 16 scenes 12 times, resulting in 192 trials (96 per property category). Trials were separated into 12 blocks; in each block, participants imagined all scenes once in random order. After performing the imagery task, participants filled in a questionnaire on Limesurvey in which they had to rate the mental image they had of each of the 16 scenes with respect to the four scene properties (openness, naturalness, clutter level, brightness) on a Likert scale from 1 to 7. We explicitly told the participants to make these ratings purely based on their mental images, even if they differed from the scenes in the descriptions. These ratings suggest that, at least on average, the properties of the scenes imagined by the participants aligned with the properties the scene descriptions intended to convey (see Fig. 1b for a plot of the ratings). After finishing the questionnaire, the participants took part in the perception task. Here, they viewed images of scenes that matched the scene descriptions in the imagery task while their EEG was again recorded. In each trial, a single image was shown at the center of the screen for 1000 ms. To make both tasks visually as similar as possible, we presented each stimulus in the perception task within the same black frame as in the imagery task, and a red fixation dot was visible at the center of the image which the participants were instructed to attend continuously. Between stimulus presentations, there was a jittered ITI of 300–700 ms in which again the black fixation dot surrounded by the black frame was visible. Each of the 48 images (see Stimuli) was presented 20 times for a total of 960 trials (480 per property category), except for one participant who only completed 416 trials (after balancing across categories) due to a technical error during the EEG recording. The trial order was fully randomized.
Both imagery and perception tasks were interspersed by numerous self-paced breaks. The entire experiment, including EEG preparation, took between 2.5 and 3.5 h. Stimulus presentation was controlled using Psychtoolbox49.
EEG data acquisition and preprocessing
EEG data was acquired using an Easycap system with 64 channels and a Brain Products amplifier. The data was recorded at a sample rate of 500 Hz with Fz as the reference. The electrode arrangement followed the standard 10–10 system. All preprocessing was conducted using FieldTrip50. EEG was high- and low-pass filtered between 1 and 90 Hz, band-stop filtered to remove 50 Hz line noise, epoched between − 1000 ms and 5000 ms for the imagery data and − 1000 ms and 2000 ms for the perception data and baseline-corrected with a baseline window of 500 ms for imagery and 200 ms for perception. The EEG signal was then downsampled to 200 Hz. Noisy channels were removed by calculating the variance of each channel and rejecting outlier channels on this metric through visual inspection. Finally, independent component analysis (ICA) was applied to the EEG data and eye artifact components were removed through visual inspection.
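For readers who want to set up a comparable pipeline outside of FieldTrip, the following sketch outlines the same preprocessing steps in MNE-Python; the file name, trigger code and excluded ICA components are hypothetical placeholders, and the actual analyses were carried out in FieldTrip.

```python
# Illustrative MNE-Python analogue of the FieldTrip preprocessing described above.
import mne

raw = mne.io.read_raw_brainvision("sub-01_imagery.vhdr", preload=True)  # Brain Products recording
raw.filter(l_freq=1.0, h_freq=90.0)           # high- and low-pass filtering between 1 and 90 Hz
raw.notch_filter(freqs=50.0)                  # remove 50 Hz line noise

events, event_id = mne.events_from_annotations(raw)
epochs = mne.Epochs(raw, events, event_id={"imagery_cue": 1},   # hypothetical trigger code
                    tmin=-1.0, tmax=5.0,                         # -1000 to 5000 ms for imagery
                    baseline=(-0.5, 0.0), preload=True)          # 500 ms baseline window
epochs.resample(200)                          # downsample to 200 Hz

# ICA-based removal of eye artifacts (components selected by visual inspection)
ica = mne.preprocessing.ICA(n_components=30, random_state=0)
ica.fit(epochs)
ica.exclude = [0, 1]                          # hypothetical eye-artifact components
epochs_clean = ica.apply(epochs.copy())
```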
Frequency decomposition
All of our frequency decompositions were conducted separately for each trial and each channel. We transformed the EEG signals within the entire 4000 ms imagery period into the frequency domain. For the perception data, the period from − 200 to 1000 ms was transformed into the time–frequency domain using a fixed-size 500 ms sliding window with 50 ms steps. For both decompositions, we utilized multitapers (15 DPSS tapers for imagery and 3 for perception) with constant 2 Hz frequency smoothing as implemented in FieldTrip. We chose multitapers in particular to increase power in our frequency-based analyses. Imagery data tends to be noisy (e.g.18,44) and the multitaper approach typically increases the signal-to-noise ratio in frequency-resolved data at the expense of increased temporal and frequency smoothing51. In addition, we decided to omit temporal resolution of the frequency decomposition of the imagery data while maintaining it for the perception data in order to further boost power, since imagery representations have been shown to be relatively invariable across time18,28,29 while perceptual representations have been shown to be temporally variable as a function of the different processing stages in the visual hierarchy29,30,31,32. Extracted frequencies ranged from 4 to 30 Hz, thus covering the theta (4–7 Hz), alpha (8–13 Hz) and beta (14–30 Hz) frequency bands.
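As an illustration of these two decompositions, the sketch below uses MNE-Python multitaper routines; the study itself used FieldTrip, exact smoothing and taper conventions differ between the toolboxes, and the data arrays here are random placeholders with the dimensions described above.

```python
# Sketch of the imagery (frequency-resolved) and perception (time-frequency-resolved) decompositions.
import numpy as np
from mne.time_frequency import psd_array_multitaper, tfr_array_multitaper

sfreq = 200.0                                   # sampling rate after downsampling
freqs = np.arange(4, 31)                        # 4-30 Hz, covering theta, alpha and beta

imagery_data = np.random.randn(192, 64, 800)    # 192 trials, 64 channels, 4 s at 200 Hz (placeholder)
perception_data = np.random.randn(960, 64, 240) # 960 trials, 64 channels, -200 to 1000 ms (placeholder)

# Imagery: one multitaper spectrum per trial and channel over the whole 4000 ms imagery period
psd, psd_freqs = psd_array_multitaper(imagery_data, sfreq=sfreq,
                                      fmin=4, fmax=30, bandwidth=4.0)  # DPSS tapers, spectral smoothing

# Perception: time-frequency power with a fixed-length window;
# n_cycles = freqs * 0.5 yields a constant 500 ms window at every frequency
tfr = tfr_array_multitaper(perception_data, sfreq=sfreq, freqs=freqs,
                           n_cycles=freqs * 0.5, time_bandwidth=2.0,
                           output='power', decim=10)   # decim=10 -> 50 ms steps at 200 Hz
```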
Decoding analyses
All decoding analyses were conducted using CoSMoMVPA52. We employed linear discriminant analysis (LDA) classifiers which were trained within-subject on power patterns across all channels (see Fig. 2).
In order to assess how the individual imagined scenes and their properties are represented in neural activity patterns, we employed two decoding approaches and solely altered the features on which we conducted these analyses to answer different questions. The first approach was a mean pairwise scene decoding analysis in which classifiers were trained to discriminate between each possible pair of scenes and the resulting pairwise decoding accuracies were averaged across pairs, yielding a measure of discriminability among individual scenes from neural responses. The second approach was a scene property decoding analysis in which classifiers had to distinguish for each property in which of two property categories a scene belonged (e.g. for naturalness, if the imagined scene was a natural or a man-made scene).
First, we assessed at which neural frequencies scene information is represented by running the mean pairwise scene decoding and the scene property decoding on power patterns at each individual frequency from 4 to 30 Hz in the frequency-resolved imagery EEG data (see Fig. 2a). Classifiers were trained using a leave-one-trial-out cross-validation scheme in which one trial per stimulus was left out to avoid imbalance between conditions. Decoding accuracies were calculated as the mean of all cross-validation fold accuracies. For the mean pairwise scene decoding, this resulted in one mean pairwise scene decoding accuracy at each frequency for each participant. For the scene property decoding, this yielded one decoding accuracy at each frequency for each property and each participant.
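The following Python sketch illustrates this mean pairwise decoding scheme with scikit-learn LDA classifiers; the original analyses used CoSMoMVPA in MATLAB, and the repetition-based fold assignment and variable names below are simplified placeholders for the leave-one-trial-out scheme described above.

```python
# Sketch of mean pairwise scene decoding at a single frequency.
from itertools import combinations
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def mean_pairwise_decoding(power_f, scene_labels, fold_ids):
    """Average LDA accuracy over all pairs of scenes at one frequency.

    power_f:      (n_trials, n_channels) power patterns at one frequency
    scene_labels: (n_trials,) scene identity (0-15)
    fold_ids:     (n_trials,) repetition index used as cross-validation fold
    """
    accuracies = []
    for a, b in combinations(np.unique(scene_labels), 2):    # all 120 scene pairs
        mask = np.isin(scene_labels, [a, b])
        X, y, folds = power_f[mask], scene_labels[mask], fold_ids[mask]
        fold_acc = []
        for f in np.unique(folds):                            # leave one repetition out
            clf = LinearDiscriminantAnalysis()
            clf.fit(X[folds != f], y[folds != f])
            fold_acc.append(clf.score(X[folds == f], y[folds == f]))
        accuracies.append(np.mean(fold_acc))
    return np.mean(accuracies)

# Usage (power has shape (n_trials, n_channels, n_freqs); labels and folds are per-trial vectors):
# acc_per_freq = [mean_pairwise_decoding(power[:, :, i], labels, folds) for i in range(power.shape[2])]
```

The property decoding follows the same structure, except that the two classes are the 8-vs-8 property categories rather than scene pairs.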
Second, we examined if there are scene representations in the alpha frequency band that are shared between imagery and different stages of perceptual processing across time. We again conducted the mean pairwise and property decoding analyses, but trained the classifiers on power patterns across the entire alpha frequency range (8–13 Hz) in the frequency-resolved imagery data and tested them on the alpha power patterns at each time point in the time–frequency resolved perception data and vice versa (see Fig. 2b). The decoding accuracies of both train-test directions were averaged, which resulted in one mean pairwise cross-decoding accuracy time course as well as four property category cross-decoding accuracy time courses for each participant. We also exploratively conducted all imagery-perception cross-decoding analyses in the theta and beta bands (see supplementary Fig. S3).
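Sketched below is the corresponding cross-decoding logic for a single binary property, again using scikit-learn LDA; array names and shapes are illustrative, and the actual analysis additionally ran this scheme pairwise for the individual scenes.

```python
# Sketch of imagery-perception cross-decoding in the alpha band, averaged over both directions.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def alpha_cross_decoding(imagery_alpha, imagery_labels, perception_alpha, perception_labels):
    """Train on imagery alpha power, test on perception alpha power at each time point,
    then repeat in the reverse direction and average.

    imagery_alpha:    (n_img_trials, n_channels * n_alpha_freqs) one pattern per imagery trial
    perception_alpha: (n_perc_trials, n_channels * n_alpha_freqs, n_times)
    """
    n_times = perception_alpha.shape[2]
    acc = np.zeros(n_times)
    for t in range(n_times):
        # imagery -> perception
        clf = LinearDiscriminantAnalysis().fit(imagery_alpha, imagery_labels)
        acc_fwd = clf.score(perception_alpha[:, :, t], perception_labels)
        # perception -> imagery
        clf = LinearDiscriminantAnalysis().fit(perception_alpha[:, :, t], perception_labels)
        acc_bwd = clf.score(imagery_alpha, imagery_labels)
        acc[t] = (acc_fwd + acc_bwd) / 2.0
    return acc
```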
Finally, it is possible that during the property decoding analyses the classifiers did not utilize property category information, but merely exploited differences in features of individual imagined and perceived scenes. To investigate this, we conducted a shuffled property decoding analysis in which we estimated how well the classifiers perform if they are constrained to individual scene feature information, without access to property category information, and compared this performance to the original property decoding. Within each participant, the 16 scenes were randomly assigned to two mock property categories for all possible \(\binom{16}{8}\) = 12,870 permutations and at each permutation, classifiers were trained to distinguish between the property categories. Decoding accuracies were then averaged across permutations. Since in this decoding scheme the property category information was randomized, classifiers were limited to discriminating based on differences in the individual scene features in each category. If the property decoding only exploited individual scene features, the shuffled property decoding performance should be identical or highly similar to it. If, however, property category information was also used for property discrimination in the original analysis, the shuffled property decoding performance should be reduced in comparison, since the classifiers in the shuffled analysis had no access to this additional source of information. We applied the shuffled property decoding scheme to both the frequency-resolved imagery property decoding and our imagery-perception property cross-decoding. We tested the difference between property decoding and shuffled property decoding by assessing if the peak decoding accuracy of each property in the imagery property decoding and imagery-perception property cross-decoding is greater than the decoding accuracy at the respective frequency or time point in the shuffled property decoding. For the imagery-perception cross-decoding, this comparison was omitted for naturalness since we found no interpretable above-chance cross-decoding performance for this property in the original analysis. We also conducted the shuffled property decoding across all frequencies in the imagery property decoding and all time points in the imagery-perception property cross-decoding (see supplementary Fig. S4) in order to investigate if the shuffled property (cross-)decoding was equal to or exceeded the peak property (cross-)decoding at the frequencies or time points at which we did not conduct the comparison. These analyses showed that this was not the case.
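A compact sketch of this shuffling scheme is given below; the helper decode_two_categories stands in for the same classification pipeline used in the original property decoding and is a hypothetical placeholder.

```python
# Sketch of the shuffled property decoding: evaluate every 8-vs-8 assignment of scenes to mock categories.
from itertools import combinations
import numpy as np

scenes = np.arange(16)
splits = list(combinations(scenes, 8))      # 12,870 ways to pick the 8 scenes forming mock category A

def shuffled_property_decoding(power, scene_labels, decode_two_categories):
    accs = []
    for category_a in splits:
        mock_labels = np.isin(scene_labels, category_a).astype(int)   # 1 = mock category A, 0 = B
        accs.append(decode_two_categories(power, mock_labels))        # same classifier pipeline as above
    return np.mean(accs)                                              # average across all mock assignments
```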
We also investigated the temporal dynamics of imagined scene representations. However, consistent with previous imagery studies18,44, when conducting the aforementioned mean pairwise and property decoding analyses on broadband EEG responses at each time point, we did not find robust above-chance decoding performance (see supplementary Fig. S5).
Statistical testing
Decoding accuracies in all frequency-resolved and time-resolved analyses were tested against chance level (50%) using threshold-free cluster enhancement (TFCE)53 as implemented in CoSMoMVPA. Multiple comparison correction was conducted by comparing actual TFCE statistics to a null distribution of maximum TFCE statistics, estimated using a permutation test with 10,000 sign permutations. The resulting z-scores were converted to p-values and thresholded at p < 0.05 (one-tailed). In our time-resolved analyses, only the post-stimulus time points were tested for significance. We compared the peak property (cross-)decoding accuracies to the shuffled property (cross-)decoding accuracies at the respective frequency or time point using paired, one-tailed Wilcoxon signed rank tests. All statistical tests were conducted on the full sample of n = 49.
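To illustrate the logic of this correction, the sketch below implements a sign-permutation test with max-statistic correction over frequencies or time points using a simple group-mean statistic; note that the actual analysis used TFCE statistics as implemented in CoSMoMVPA, which this simplified version only approximates.

```python
# Simplified sketch of sign-permutation testing with max-statistic multiple comparison correction.
import numpy as np

def sign_permutation_test(acc, chance=0.5, n_perm=10000, seed=0):
    """acc: (n_subjects, n_points) decoding accuracies; returns corrected one-tailed p-values per point."""
    rng = np.random.default_rng(seed)
    diff = acc - chance
    observed = diff.mean(axis=0)                               # group-level statistic at each point
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        signs = rng.choice([-1, 1], size=(diff.shape[0], 1))   # randomly flip each subject's sign
        max_null[i] = (diff * signs).mean(axis=0).max()        # max statistic across points
    # corrected p-value: how often the null maximum reaches the observed statistic
    return np.array([(1 + (max_null >= obs).sum()) / (n_perm + 1) for obs in observed])
```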
Data availability
Data and analysis code of the main analyses are openly available from our OSF repository: https://osf.io/vxhtw/.
References
Pearson, J. The human imagination: The cognitive neuroscience of visual mental imagery. Nat. Rev. Neurosci. 20(10), 10. https://doi.org/10.1038/s41583-019-0202-9 (2019).
Boccia, M. et al. I can see where you would be: Patterns of fMRI activity reveal imagined landmarks. Neuroimage 144, 174–182. https://doi.org/10.1016/j.neuroimage.2016.08.034 (2017).
Johnson, M. R. & Johnson, M. K. Decoding individual natural scene representations during perception and imagery. Front. Hum. Neurosci. https://doi.org/10.3389/fnhum.2014.00059 (2014).
O’Craven, K. M. & Kanwisher, N. Mental imagery of faces and places activates corresponding stimulus-specific brain regions. J. Cogn. Neurosci. 12(6), 1013–1023. https://doi.org/10.1162/08989290051137549 (2000).
Dijkstra, N., Bosch, S. E. & van Gerven, M. A. J. Shared neural mechanisms of visual perception and imagery. Trends Cogn. Sci. 23(5), 423–434. https://doi.org/10.1016/j.tics.2019.02.004 (2019).
Pearson, J., Naselaris, T., Holmes, E. A. & Kosslyn, S. M. Mental imagery: Functional mechanisms and clinical applications. Trends Cogn. Sci. 19(10), 590–602. https://doi.org/10.1016/j.tics.2015.08.003 (2015).
Bastos, A. M. et al. Visual areas exert feedforward and feedback influences through distinct frequency channels. Neuron 85(2), 390–401. https://doi.org/10.1016/j.neuron.2014.12.018 (2015).
Chen, L., Cichy, R. M. & Kaiser, D. Alpha-frequency feedback to early visual cortex orchestrates coherent naturalistic vision. Sci. Adv. https://doi.org/10.1126/sciadv.adi2321 (2023).
Fries, P. Rhythms for cognition: Communication through coherence. Neuron 88(1), 220–235. https://doi.org/10.1016/j.neuron.2015.09.034 (2015).
van Kerkoerle, T. et al. Alpha and gamma oscillations characterize feedback and feedforward processing in monkey visual cortex. Proc. Natl. Acad. Sci. 111(40), 14332–14341. https://doi.org/10.1073/pnas.1402773111 (2014).
Bartsch, F., Hamuni, G., Miskovic, V., Lang, P. J. & Keil, A. Oscillatory brain activity in the alpha range is modulated by the content of word-prompted mental imagery. Psychophysiology 52(6), 727–735. https://doi.org/10.1111/psyp.12405 (2015).
Michel, C. M., Kaufman, L. & Williamson, S. J. Duration of EEG and MEG α suppression increases with angle in a mental rotation task. J. Cognit. Neurosci. 6(2), 139–150. https://doi.org/10.1162/jocn.1994.6.2.139 (1994).
Salenius, S., Kajola, M., Thompson, W. L., Kosslyn, S. & Hari, R. Reactivity of magnetic parieto-occipital alpha rhythm during visual imagery. Electroencephalogr. Clin. Neurophysiol. 95(6), 453–462. https://doi.org/10.1016/0013-4694(95)00155-7 (1995).
Short, P. L. The objective study of mental imagery. Br. J. Psychol. 44(1), 38 (1953).
Slatter, K. H. Alpha rhythms and mental imagery. Electroencephalogr. Clin. Neurophysiol. 12(4), 851–859. https://doi.org/10.1016/0013-4694(60)90133-4 (1960).
Grootswagers, T., Wardle, S. G. & Carlson, T. A. Decoding dynamic brain patterns from evoked responses: A tutorial on multivariate pattern analysis applied to time series neuroimaging data. J. Cognit. Neurosci. 29(4), 677–697. https://doi.org/10.1162/jocn_a_01068 (2017).
Haynes, J.-D. A primer on pattern-based approaches to fMRI: Principles, pitfalls, and perspectives. Neuron 87(2), 257–270. https://doi.org/10.1016/j.neuron.2015.05.025 (2015).
Xie, S., Kaiser, D. & Cichy, R. M. Visual imagery and perception share neural representations in the alpha frequency band. Curr. Biol. 30(13), 2621-2627.e5. https://doi.org/10.1016/j.cub.2020.04.074 (2020).
Epstein, R. A. Parahippocampal and retrosplenial contributions to human spatial navigation. Trends Cognit. Sci. 12(10), 388–396. https://doi.org/10.1016/j.tics.2008.07.004 (2008).
Mak, M., de Vries, C. & Willems, R. M. The influence of mental imagery instructions and personality characteristics on reading experiences. Collabra Psychol. 6(1), 43. https://doi.org/10.1525/collabra.281 (2020).
Schacter, D. L., Benoit, R. G. & Szpunar, K. K. Episodic future thinking: Mechanisms and functions. Curr. Opin. Behav. Sci. 17, 41–50. https://doi.org/10.1016/j.cobeha.2017.06.002 (2017).
Groen, I. I., Silson, E. H. & Baker, C. I. Contributions of low- and high-level properties to neural processing of visual scenes in the human brain. Philos. Trans. R. Soc. B Biol. Sci. 372(1714), 20160102. https://doi.org/10.1098/rstb.2016.0102 (2017).
Park, S., Konkle, T. & Oliva, A. Parametric coding of the size and clutter of natural scenes in the human brain. Cerebral Cortex 25(7), 1792–1805. https://doi.org/10.1093/cercor/bht418 (2015).
Kaiser, D., Quek, G. L., Cichy, R. M. & Peelen, M. V. Object vision in a structured world. Trends Cognit. Sci. 23(8), 672–685. https://doi.org/10.1016/j.tics.2019.04.013 (2019).
Võ, M.L.-H. The meaning and structure of scenes. Vis. Res. 181, 10–20. https://doi.org/10.1016/j.visres.2020.11.003 (2021).
Cichy, R. M., Khosla, A., Pantazis, D. & Oliva, A. Dynamics of scene representations in the human brain revealed by magnetoencephalography and deep neural networks. NeuroImage 153, 346–358. https://doi.org/10.1016/j.neuroimage.2016.03.063 (2017).
Harel, A., Groen, I. I., Kravitz, D. J., Deouell, L. Y. & Baker, C. I. The temporal dynamics of scene processing: A multifaceted EEG investigation. eNeuro 3(5), 1. https://doi.org/10.1523/ENEURO.0139-16.2016 (2016).
Corriveau, A., Kidder, A., Teichmann, L., Wardle, S. G. & Baker, C. I. Sustained neural representations of personally familiar people and places during cued recall. Cortex 158, 71–82. https://doi.org/10.1016/j.cortex.2022.08.014 (2023).
Dijkstra, N., Mostert, P., de Lange, F. P., Bosch, S. & van Gerven, M. A. Differential temporal dynamics during visual imagery and perception. Elife 7, e33904. https://doi.org/10.7554/eLife.33904 (2018).
Carlson, T. A., Hogendoorn, H., Kanai, R., Mesik, J. & Turret, J. High temporal resolution decoding of object position and category. J. Vis. 11(10), 9. https://doi.org/10.1167/11.10.9 (2011).
Singer, J. J. D., Cichy, R. M. & Hebart, M. N. The spatiotemporal neural dynamics of object recognition for natural images and line drawings. J. Neurosci. 43(3), 484–500. https://doi.org/10.1523/JNEUROSCI.1546-22.2022 (2023).
King, J.-R. & Dehaene, S. Characterizing the dynamics of mental representations: The temporal generalization method. Trends Cognit. Sci. 18(4), 203–210. https://doi.org/10.1016/j.tics.2014.01.002 (2014).
Marks, D. F. Visual imagery differences in the recall of pictures. Br. J. Psychol. 64(1), 17–24. https://doi.org/10.1111/j.2044-8295.1973.tb01322.x (1973).
Dijkstra, N., Bosch, S. E. & van Gerven, M. A. J. Vividness of visual imagery depends on the neural overlap with perception in visual areas. J. Neurosci. 37(5), 1367–1373. https://doi.org/10.1523/JNEUROSCI.3022-16.2016 (2017).
Runge, M. S., Cheung, M. W. L. & D’Angiulli, A. Meta-analytic comparison of trial- versus questionnaire-based vividness reportability across behavioral, cognitive and neural measurements of imagery. Neurosci. Conscious. https://doi.org/10.1093/nc/nix006 (2017).
Greene, M. R. & Oliva, A. Recognition of natural scenes from global properties: Seeing the forest without representing the trees. Cognit. Psychol. 58(2), 137–176. https://doi.org/10.1016/j.cogpsych.2008.06.001 (2009).
Dijkstra, N., Ambrogioni, L., Vidaurre, D. & van Gerven, M. Neural dynamics of perceptual inference and its reversal during imagery. eLife 9, e53588. https://doi.org/10.7554/eLife.53588 (2020).
Linde-Domingo, J., Treder, M. S., Kerrén, C. & Wimber, M. Evidence that neural information flow is reversed between object perception and object reconstruction from memory. Nat. Commun. 10(1), 1. https://doi.org/10.1038/s41467-018-08080-2 (2019).
Grill-Spector, K. & Weiner, K. S. The functional architecture of the ventral temporal cortex and its role in categorization. Nat. Rev. Neurosci. 15(8), 536–548. https://doi.org/10.1038/nrn3747 (2014).
Groen, I. I. A., Ghebreab, S., Prins, H., Lamme, V. A. F. & Scholte, H. S. From image statistics to scene gist: Evoked neural activity reveals transition from low-level natural image structure to scene category. J. Neurosci. 33(48), 18814–18824. https://doi.org/10.1523/JNEUROSCI.3128-13.2013 (2013).
Hansen, N. E., Noesen, B. T., Nador, J. D. & Harel, A. The influence of behavioral relevance on the processing of global scene properties: An ERP study. Neuropsychologia 114, 168–180. https://doi.org/10.1016/j.neuropsychologia.2018.04.040 (2018).
Lowe, M. X., Rajsic, J., Ferber, S. & Walther, D. B. Discriminating scene categories from brain activity within 100 milliseconds. Cortex 106, 275–287. https://doi.org/10.1016/j.cortex.2018.06.006 (2018).
Harel, A., Nador, J. D., Bonner, M. F. & Epstein, R. A. Early electrophysiological markers of navigational affordances in scenes. J. Cognit. Neurosci. 34(3), 397–410. https://doi.org/10.1162/jocn_a_01810 (2022).
Shatek, S. M., Grootswagers, T., Robinson, A. K. & Carlson, T. A. Decoding images in the mind’s eye: The temporal dynamics of visual imagery. Vision 3(4), 4. https://doi.org/10.3390/vision3040053 (2019).
Becker, C. & Laycock, R. Embracing deepfakes and AI-generated images in neuroscience research. Eur. J. Neurosci. 58(3), 2657–2661. https://doi.org/10.1111/ejn.16052 (2023).
Hebart, M. N. & Baker, C. I. Deconstructing multivariate decoding for the study of brain function. NeuroImage 180, 4–18. https://doi.org/10.1016/j.neuroimage.2017.08.005 (2018).
Robinson, A. K., Quek, G. L. & Carlson, T. A. Visual representations: Insights from neural decoding. Ann. Rev. Vis. Sci. 9(1), 313–335. https://doi.org/10.1146/annurev-vision-100120-025301 (2023).
Zeman, A. et al. Phantasia-the psychological significance of lifelong visual imagery vividness extremes. Cortex 130, 426–440. https://doi.org/10.1016/j.cortex.2020.04.003 (2020).
Brainard, D. H. The psychophysics toolbox. Spatial Vis. 10(4), 433–436. https://doi.org/10.1163/156856897X00357 (1997).
Oostenveld, R., Fries, P., Maris, E. & Schoffelen, J.-M. FieldTrip: Open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Comput. Intell. Neurosci. 2011, 156869. https://doi.org/10.1155/2011/156869 (2011).
Cohen, M. X. Analyzing Neural Time Series Data: Theory and Practice. MIT Press (2014).
Oosterhof, N. N., Connolly, A. C. & Haxby, J. V. CoSMoMVPA: Multi-Modal multivariate pattern analysis of neuroimaging data in matlab/GNU octave. Front. Neuroinform. https://doi.org/10.3389/fninf.2016.00027 (2016).
Smith, S. & Nichols, T. Threshold-free cluster enhancement: Addressing problems of smoothing, threshold dependence and localisation in cluster inference. NeuroImage 44(1), 83–98. https://doi.org/10.1016/j.neuroimage.2008.03.061 (2009).
Acknowledgements
We would like to thank Marius Geiss for his assistance in procuring the stimuli and in conducting some of the measurements. D.K. is supported by the Deutsche Forschungsgemeinschaft (DFG; SFB/TRR 135, project number 222641018) and an ERC Starting Grant (ERC-2022-STG 101076057). This research was further supported by “The Adaptive Mind”, funded by the Excellence Program of the Hessian Ministry of Higher Education, Science, Research and Art.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Contributions
R.S.: Conceptualization, Methodology, Data curation, Investigation, Formal analysis, Visualization, Writing—original draft, Project administration, Writing—review and editing.
D.K.: Conceptualization, Methodology, Supervision, Project administration, Funding acquisition, Writing—review and editing.
Ethics declarations
Competing interests
The authors declare no competing interests.