Electrophysiological correlates of the interplay between low-level visual features and emotional content during word reading

Processing affectively charged visual stimuli typically results in increased amplitude of specific event-related potential (ERP) components. Low-level features similarly modulate electrophysiological responses, with amplitude changes proportional to variations in stimulus size and contrast. However, it remains unclear whether emotion-related amplifications during visual word processing are necessarily intertwined with changes in specific low-level features or, instead, may act independently. In this pre-registered electrophysiological study, we varied font size and contrast of neutral and negative words while participants were monitoring their semantic content. We examined ERP responses associated with early sensory and attentional processes as well as later stages of stimulus processing. Results showed amplitude modulations by low-level visual features early after stimulus onset (i.e., at the P1 and N1 components), while the LPP was additively modulated by these visual features. Independent effects of size and emotion were observed only at the level of the EPN. Here, larger EPN amplitudes for negative words were observed only for small high-contrast and large low-contrast words. These results suggest that the early increase in sensory processing at the EPN level for negative words is not automatic, but bound to specific combinations of low-level features, presumably occurring via attentional control processes.

the second-best model assuming additive effects of size, contrast, and emotion (Table 2). However, follow-up contrasts showed no reliable amplitude differences as a function of emotional content. Specifically, evidence leaned in favor of the null model when assessing emotion differences for words presented in small size and low contrast (BF10 = e^−1.72 = 0.18), large size and high contrast (BF10 = 0.21), large size and low contrast (BF10 = 0.35), or small size and high contrast (BF10 = 0.73; inconclusive) (Table 3).
In a subsequent step, we sought to assess the separate contribution of size, contrast, and emotion by including all possible models, i.e., not only the theoretically relevant ones that always included emotion. We started with the full model and progressively tested all models that could be created by removing one interaction or main effect at a time ("top-down analysis"; see http://bayesfactorpcl.r-forge.r-project.org/#fixed). This procedure revealed that omitting the factor emotion from the full model improved the fit by a factor of 82.27. Removing the size × contrast × emotion, contrast × emotion, and size × emotion interactions also improved the fit, by factors of 35.52, 24.29, and 21.54, respectively. Thus, emotion did not seem to have any explanatory power; instead, it penalized the models in which it was included. Conversely, omitting the factor contrast or the contrast × size interaction lowered the explanatory value of the resulting model by factors of 1/e^−3.75 = 42.51 and 3.57 × 10^14, respectively. Finally, removing the factor size was maximally detrimental, as it lowered the explanatory value of the resulting model by a factor of 9.79 × 10^15 (see Table 4).
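The "times better" and "times worse" figures above are ratios of Bayes factors. The sketch below illustrates the arithmetic, assuming each model's BF against a common null is stored on the natural-log scale (as the e^x notation in the text suggests). Because both BFs share the same null as denominator, the null cancels out. The function names are hypothetical helpers, not part of any package.

```python
import math


def bf_between(log_bf_a, log_bf_b):
    """BF of model A over model B, given each model's natural-log BF vs. the same null.

    BF_AB = BF_A0 / BF_B0 = exp(log BF_A0 - log BF_B0); the shared null cancels.
    """
    return math.exp(log_bf_a - log_bf_b)


def omission_effect(bf_reduced_vs_full):
    """Describe one top-down step: BF of (full model minus one term) over the full model."""
    if bf_reduced_vs_full >= 1:
        # The simpler model wins, so the omitted term only penalized the fit.
        return f"omitting the term improves fit by {bf_reduced_vs_full:.2f} times"
    # The simpler model loses; report the inverse so the factor is greater than 1.
    return f"omitting the term lowers fit by {1 / bf_reduced_vs_full:.2f} times"


# A log BF10 of -1.72 corresponds to BF10 = e^-1.72, about 0.18.
print(round(bf_between(-1.72, 0.0), 2))
# A reduced-vs-full BF of e^-3.75 is reported as a 1/e^-3.75-fold decrease.
print(omission_effect(math.exp(-3.75)))
```

The second call prints a factor of 42.52. The reported 42.51 presumably reflects rounding of the log BF to two decimals.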
To summarize, the amplitude of the P1 seemed to be mostly influenced by font size, contrast, and their interaction (with the lowest values in response to words presented in small font and low contrast), whereas emotion did not seem to play a role.

N1.
Mean amplitude values of the N1 component were best explained by the size × contrast × emotion interaction model (including all three factors and their interactions) relative to the null (BF10 > 1.80 × 10^308). The full model was also 9.96 × 10^16 times better than the second-best model (additive effects of size + contrast + emotion). Nonetheless, as for the P1, follow-up contrasts showed that emotion did not influence N1 amplitude, with evidence favoring the null model in all tested comparisons (see Table 3 for details).
Additional top-down model comparisons showed that omitting contrast × emotion from the full model improved the fit by a factor of 67.36. Similarly, removing emotion × size, size × contrast × emotion, and emotion also improved the fit, by factors of 62.80, 54.05, and 45.15, respectively. On the other hand, omitting size or contrast × size lowered the explanatory value of the resulting model by 4.69 × 10^−17 and 1.21 × 10^22 times, respectively. Finally, omitting the factor contrast was maximally detrimental, as it lowered the explanatory value of the resulting model by a factor of 1.24 × 10^55.
To summarize, the mean amplitude of the N1 component was reliably modulated by contrast as well as its interaction with size, with lower (i.e., less negative) values following words presented in small font and low contrast. As with the preceding P1 component, emotional valence did not seem to modulate N1 amplitude.
EPN. Mean amplitude of the EPN was best explained by the size + emotion model, not only relative to the null model (BF10 > 1.80 × 10^308) but also relative to the second-best model, size × emotion (by a factor of 32.14). Follow-up paired comparisons investigating emotion-dependent amplitude modulations of this component showed evidence in favor of the null model when words were presented in small size and low contrast (BF10 = 0.19). However, the alternative model was preferred over the null when words were presented in small size and high contrast (BF10 = 10.07) as well as in large size and low contrast (BF10 = 5.42). When words were presented in large size and high contrast, evidence remained inconclusive (BF10 = 0.53).
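The paired-comparison BF10 values above come from Bayesian t-tests with a JZS prior. The sketch below shows the standard one-sample (paired) JZS Bayes factor as we understand the Rouder and colleagues formulation. The effect size δ receives a Cauchy(0, r) prior, written as a normal prior with variance g·r² mixed over an inverse-gamma(1/2, 1/2) distribution of g, and the marginal likelihood under H1 is integrated numerically. The function name, integration range, and grid are our own illustrative choices. The values reported here were not computed with this code, and dedicated software may differ slightly in numerical detail.

```python
import numpy as np


def jzs_bf10(t, n, r=0.707):
    """Two-sided JZS Bayes factor (BF10) for a one-sample or paired t-test.

    t: observed t-statistic (e.g. on negative minus neutral differences)
    n: number of participants; r: scale of the Cauchy prior on delta
    """
    nu = n - 1
    # Integrate over u = log(g) so that the heavy right tail of g is covered evenly.
    u = np.linspace(-10.0, 20.0, 30001)
    g = np.exp(u)
    a = 1.0 + n * g * r**2
    # Inverse-gamma(1/2, 1/2) density of g; mixing N(0, g r^2) over it gives Cauchy(0, r).
    prior_g = (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1.0 / (2 * g))
    likelihood_h1 = a ** -0.5 * (1.0 + t**2 / (a * nu)) ** (-(nu + 1) / 2)
    integrand = likelihood_h1 * prior_g * g  # extra g is the Jacobian of dg = g du
    du = u[1] - u[0]
    marginal_h1 = np.sum((integrand[1:] + integrand[:-1]) / 2) * du  # trapezoid rule
    marginal_h0 = (1.0 + t**2 / nu) ** (-(nu + 1) / 2)  # point null: delta = 0
    return marginal_h1 / marginal_h0


# Robustness check across the three prior widths used in the study.
for r in (1.0, 0.707, 0.5):
    print(r, jzs_bf10(t=3.2, n=40, r=r))
```

Shrinking r toward zero makes H1 indistinguishable from the point null, so BF10 tends to 1. This provides a quick sanity check that the prior on g integrates to one over the chosen grid.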
Model fit improved when contrast, contrast × size, size × emotion, and contrast × emotion were removed from the full model, whereas removing the size × contrast × emotion interaction improved the fit only marginally. Interestingly, omitting the factor emotion decreased the explanatory value of the resulting model by a factor of 9.87, whereas removing size was much more deleterious (by a factor of 1.48 × 10^39).
Thus, EPN amplitude was not reliably modulated by contrast but mostly by font size, with larger (i.e., more negative) values in response to words presented in large compared to small font. In addition, emotion played a small but non-negligible additive role, as evidenced by a slight increase in EPN amplitude for unpleasant compared to neutral words when presented in small font and high contrast as well as in large font and low contrast.
Finally, the mean amplitude of the LPP was reliably modulated by additive effects of size and contrast, with overall larger amplitude following words presented in small font and low contrast. Emotional valence did not seem to play a role.
Exploratory analyses. Visual inspection of the ERP waveforms (left panels of Fig. 1) revealed that the latency of the P1 and N1 peaks changed as a function of experimental condition. This latency shift was not predicted in the pre-registered protocol and could, in principle, bias the analysis of mean amplitude values: for instance, a pre-selected time window might encompass the whole ERP component in one condition but only half of it in another. To overcome this problem, we performed additional exploratory analyses using peak amplitude as the dependent variable, with the important caveat that this measure is highly susceptible to noise 53,54 and the results should therefore be interpreted with caution. We also analyzed peak latency, because this measure could still provide valuable insights into the speed at which size and contrast influence event-related electrophysiological signals during emotional word reading. The results of these exploratory analyses, which can be found in the Supplementary Materials, did not challenge the main interpretation drawn from the confirmatory results.
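The latency-shift concern can be illustrated with a toy example. In the sketch below, the same component peaks either in the middle of a fixed window or near its edge. Peak amplitude is nearly unchanged, whereas mean amplitude drops because part of the component falls outside the window. The waveform shape, window limits, and function name are hypothetical. Only the 256 Hz sampling rate and the −200 to 1,000 ms epoch follow the methods described here.

```python
import numpy as np


def window_measures(erp, times, window, polarity=+1):
    """Mean amplitude, peak amplitude, and peak latency of one ERP waveform in a window.

    erp: 1-D waveform (e.g. averaged over a component's electrode cluster)
    times: sample times in seconds; window: (start, end) in seconds
    polarity: +1 for positive components (P1), -1 for negative ones (N1)
    """
    mask = (times >= window[0]) & (times <= window[1])
    seg, seg_times = erp[mask], times[mask]
    # Peak = most positive (or, with polarity -1, most negative) sample in the window.
    idx = np.argmax(polarity * seg)
    return seg.mean(), seg[idx], seg_times[idx]


# Toy example: the same 5-unit component, peaking mid-window or near its late edge.
times = np.arange(-0.2, 1.0, 1 / 256)  # 256 Hz, -200 to 1,000 ms
early = 5 * np.exp(-0.5 * ((times - 0.110) / 0.015) ** 2)
late = 5 * np.exp(-0.5 * ((times - 0.135) / 0.015) ** 2)
for name, wave in (("early", early), ("late", late)):
    mean_amp, peak_amp, peak_lat = window_measures(wave, times, (0.08, 0.14))
    print(name, round(mean_amp, 2), round(peak_amp, 2), round(peak_lat * 1000))
```

On real data the peak is taken from a single, noisy sample, which is why the text treats peak measures as exploratory.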
Source estimations were based on significant effects at the scalp level. Source reconstructions of the generators of significant ERP differences were computed and statistically assessed with SPM12 55. Group inversions 56 were computed using the multiple sparse priors algorithm implemented in SPM12. Inversion results showed strong early visual responses to both the size and the contrast manipulations. Broad inferior and middle occipital as well as fusiform responses were found for large words in the P1 and N1 time windows. Later, within the EPN time window, additional significant changes in cortical generators were localized in parietal areas. For high contrast, similarly broad enhanced visual responses were found in the N1 and EPN time windows, as well as enhanced motor-related and posterior cingulate cortex activations. Later, in the LPP time window, this effect reversed, and low contrast led to stronger visual activations. Details can be found in the Supplementary Materials.

Discussion
In this study, we orthogonally varied font size, contrast, and emotional content while examining ERP responses associated with sensory and attentional mechanisms. The aim was to better understand whether (i) ERP modulations due to changes in low-level visual features are limited to font size or generalize to other features (here, contrast), and (ii) emotional information and low-level features modulate amplitude additively or interactively. More generally, we sought to clarify whether sensory gating mechanisms, typically proposed to explain attentional modulations of electrophysiological signals in response to biologically salient pictures, could similarly underlie the enhanced processing of abstract word stimuli carrying a negative emotional meaning. By pre-registering the study and analysis protocol, we minimized biases that might emerge after observing the study outcome 52,57.

Low-level visual features dominate early perceptual processing stages. Font size and contrast were found to explain the observed amplitude changes of the P1 and N1 ERP components, which reflect early stages of stimulus detection and discrimination taking place in the extrastriate visual cortex 3-5. These results are in line with previous work reporting larger P1 and N1 amplitudes for stimuli with higher contrast and larger size 7-9. Our experimental design additionally revealed interactive effects of contrast and font size on P1 and N1 amplitudes, with the lowest amplitudes in response to words presented in small font and low contrast. Moreover, our model comparison approach allowed us to pinpoint the relative contribution of each low-level feature to ERP amplitude modulations. Specifically, for P1 amplitudes, size had the largest explanatory value, followed by its interaction with contrast. Conversely, changes in N1 amplitudes were mostly due to contrast, followed by its interaction with size. These results point to a possible "hierarchy" among low-level features during word reading, with size being more salient during initial stimulus detection (P1) and contrast more relevant during discrimination processes (N1).
The current results did not reveal early effects of emotional content, in contrast with some studies 49,58,59 but in accordance with others 29,60,61. Future work is needed to directly evaluate whether early emotion effects reported in the literature might be contingent upon specific experimental conditions (e.g., lexical vs. semantic vs. evaluative tasks). Also, emotion did not interact with either font size or contrast, at variance with similar studies using pictorial stimuli 33,35,36, indirectly suggesting that biologically relevant pictures may be more salient than words during early stages of stimulus identification and discrimination.
(difference between emotion conditions δ = 0). For details, see Section 4.6 and Table 3. Abbreviations: large low: large size, low contrast; large high: large size, high contrast; small low: small size, low contrast; small high: small size, high contrast; neg: negative; neut: neutral; localizer: average of all conditions; neg minus neut: difference between negative and neutral conditions (averaged across font size and contrast).
SCIENTIFIC REPORTS | (2018) 8:12228 | DOI:10.1038/s41598-018-30701-5
Independent effects of size and emotion during early attentional selection. Emotional words typically elicit a larger EPN than neutral words, indicating preferential lexical access due to early attentional selection 26,47,49. In addition, recent work showed that font size may affect electrophysiological responses to emotional material, as evidenced by more negative EPN amplitudes for large pictures and words 36,50. Our results contribute to this debate in several ways. First, contrast alone does not seem to reliably explain amplitude variations of the EPN during word reading, similar to recent work using emotional and neutral pictures 35. Second, we partially replicated the findings of Bayer and colleagues 50 by showing slightly more negative EPN amplitudes in response to emotional words presented in large font, albeit only when contrast was low (right panel of Fig. 1C). These results were obtained using Dutch (instead of German) words, which speaks in favor of the generalizability of these modulatory effects.
We also found a larger EPN for emotional vs. neutral words when font size was small and contrast was high. However, in contrast to previous studies using pictures or words 36,50, no increased EPN amplitude for negative words was observed in response to large, high-contrast stimuli. Thus, processing emotional valence while manipulating more than one low-level visual feature gives rise to more complex modulatory effects than previously reported when only a single low-level feature was changed across conditions. We speculate that, for degraded visual stimuli (e.g., small, low-contrast words), there might have been little room for an attentional enhancement of the EPN by negative emotion. Conversely, since large high-contrast words were easy to detect, no sensory gain by attentional processes was necessary in this condition. Interestingly, emotional valence seemed to boost brain activity in response to small high-contrast as well as large low-contrast words, i.e., two conditions in which one low-level feature facilitates recognition while the other hinders it. This complex pattern challenges to some degree the idea of automatic emotion processing at the EPN level, and suggests that enhanced attention to negative emotional words, as captured by the EPN, might depend on the processing efficiency afforded by these low-level features.
Sustained processing of emotional content may be contingent upon task demands. Previous work has consistently shown larger LPP amplitudes for emotional compared to neutral words 39,40, likely reflecting sustained cognitive processes 28,29. In our study, font size and contrast modulated LPP amplitude additively, whereas emotion did not seem to play a role.
Several post-hoc explanations can be put forward to account for this result. First, the experimental task may contribute to the systematic modulation of this ERP component. For instance, explicitly asking participants to pay attention to the semantic content of the words may be more effective in revealing emotion-dependent amplitude differences than a simple detection task. However, this explanation seems unlikely, since a larger late positivity for emotional as opposed to neutral words has been observed in passive viewing designs 62, color-naming 63, lexical decision 48,49,64, and word identification tasks 65.
Another source of variation could stem from the task relevance of the emotional content itself. Some authors argued that emotion captures attention only if it is (explicitly or implicitly) advantageous for participants to track this feature 66-70. Indeed, a task that requires evaluating stimulus valence typically elicits stronger emotional modulation of the LPP (e.g., top-down attention to emotion or self-relevance evaluation 19,71). In addition, Bayer et al. 50 used a 1-back task to increase compliance. The authors interspersed special trials (identified by a green frame) requiring a button press if the current stimulus was identical to the immediately preceding one. This task requires online maintenance of the preceding word in working memory as well as updating, discrimination, recognition, and comparison with the newly presented word. Thus, constant rehearsal of the previous stimulus is a reasonable and efficient strategy to comply with task demands. In contrast, participants in our study were only required to identify whether the displayed word referred to a color, thereby limiting the processing time needed to complete the task. No updating in working memory was necessary. Therefore, the ERP signal we recorded reflects cognitive processes more closely related to word reading and not contaminated by working memory components. These arguments notwithstanding, this project was based on Bayer et al. 50 but was not meant as its direct replication. Instead, we wished to assess the generalizability of the reported effect using an even simpler experimental paradigm, especially considering that results reported in the literature are not consistent (see Section 3.1).
From a different angle, it is also possible that participants' attention was captured by the high variability in font size and contrast, whose saliency is arguably more powerful than emotional content per se. Affective differences might play a negligible role in visual word processing when there is a concurrent, massive variation of these sensory features. When competition occurs between different features, the ones that are the most salient (in this case, size and contrast) would bias attention the most and overshadow any potential effects of weaker ones, here emotion 72 .
It is also possible that we were unable to detect emotion-dependent modulations of electrophysiological activity because they were too small or short-lived, or occurred in only partly overlapping time windows or electrode clusters (or even within non-selected clusters). Recent MEG studies reported emotion-related activity originating from prefrontal generators, not linked to specific components 58,73,74. However, similar caveats would also apply to earlier studies investigating modulations of early sensory processing at the scalp level.
Conclusions. The present findings suggest a hierarchical, serial interplay between the processing of low-level visual features and emotional content during word reading. Early perceptual processing was mostly influenced by the interaction between font size and contrast (i.e., smaller P1 and N1 for stimuli that were harder to discriminate), whereas emotional content did not seem to be relevant. On the other hand, selective attention allocation was independently affected by font size and emotion: in particular, negative word meaning elicited a larger EPN when stimuli were presented in small font and high contrast, as well as in large font and low contrast. Thus, enhanced attention to negative emotion during word reading at the EPN level is not unconditional, but likely depends on the processing efficiency defined by the combination of low-level features, here size and contrast. Later, sustained cognitive processes were sensitive to font size and contrast, which were presumably more salient than semantic information not only perceptually but also in terms of task relevance.

Participants.
A total of 42 participants were recruited from the student population of Ghent University. They were right-handed, native Dutch-speaking, healthy students with normal or corrected-to-normal vision. The study protocol was approved by the ethics committee at Ghent University (Faculteit Psychologie en Pedagogische Wetenschappen, Kenmerk 2017/07/Gilles Pourtois), and the experiment was performed in accordance with relevant guidelines and regulations. Participants signed an informed consent form before the beginning of the experiment, were debriefed at the end of it, and were paid € 10 per hour for their participation. Each dataset was considered eligible for further analyses if the EEG signal after pre-processing was judged "clean" based on criteria selected a priori (see Section 4.4) and behavioral performance demonstrated adequate task engagement (see Section 4.3). Two datasets were discarded: one due to performance below the behavioral threshold, the other because the participant aborted testing. Thus, the final sample consisted of 40 volunteers (all right-handed, median age 23.5, range 19-34, 26 females).
From the 20th participant onward, we monitored Bayes factors (BFs) 75,76 every 3 participants (because 3 volunteers were tested per day). The a priori stopping rules were the following: (i) statistical rule: one of the models of interest (see Section 4.6) explained amplitude modulations of the components of interest 10 times better than the null model (or vice versa) and 10 times better than the second-best model; (ii) pragmatic rule: due to budgetary constraints, we had to stop after a maximum of 40 participants with acceptable behavioral performance and clean EEG data. A third rule, not explicitly mentioned in the pre-registration protocol but logically following from the pre-registered analysis plan, was that the ERP components of interest had to be reliably different from noise (as confirmed by the procedure described in Section 4.6). P1 and N1 were clear even after a few participants, whereas the signal-to-noise ratio of the EPN was generally lower. To ensure a robust identification of this component (as a difference between neutral and negative words, irrespective of size and contrast), we decided to complete data collection using the maximum possible number of participants.
Stimuli. Emotional and neutral words were selected from a database derived from a large multi-center study 77. Two hundred forty negative and 240 neutral nouns were selected and matched with respect to word length, frequency, power/dominance (i.e., participants judged whether words referred to something weak/submissive or strong/dominant), and age of acquisition (see Supplementary Table S1). The whole stimulus set was also rated during pilot testing to further validate the stimulus selection (see Supplementary Table S2).
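The sequential stopping rules described above for data collection can be sketched as follows. This is our own reading of the rules, not the authors' script. Each model's BF against the null is a hypothetical input, the check would be applied separately to each component of interest, and the third (post hoc) rule about components differing from noise is not modeled. Treating the null as one more candidate with a BF of 1 reduces "best model beats both the null and the runner-up by 10" and "the null beats every model by 10" to a single comparison of the top two candidates.

```python
def stopping_rule_met(bf_vs_null, n_valid, n_max=40, threshold=10.0):
    """Return (stop, reason) under our reading of the pre-registered stopping rules.

    bf_vs_null: dict mapping each model of interest to its BF against the null
                (hypothetical input, e.g. exported from a Bayesian ANOVA)
    n_valid: number of datasets with acceptable behavior and clean EEG so far
    """
    # Pragmatic rule: the budget allows at most n_max usable datasets.
    if n_valid >= n_max:
        return True, "pragmatic"
    if not bf_vs_null:
        return False, "continue"
    # The null enters as a candidate with BF = 1 against itself.
    candidates = sorted(list(bf_vs_null.values()) + [1.0], reverse=True)
    # Statistical rule: the winning candidate beats the runner-up by the threshold.
    if candidates[0] / candidates[1] >= threshold:
        return True, "statistical"
    return False, "continue"


def checkpoints(first=20, step=3, n_max=40):
    """Sample sizes at which BFs are inspected: 20, 23, 26, ..., capped at n_max."""
    points = list(range(first, n_max + 1, step))
    if points[-1] != n_max:
        points.append(n_max)
    return points


print(checkpoints())
print(stopping_rule_met({"size + emotion": 500.0, "size x emotion": 15.0}, n_valid=26))
```

In this reading, a model that beats the null by a factor of 500 but the runner-up by only 2.5 would not trigger a stop, because both conditions of the statistical rule must hold.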
Procedure. Participants were seated in a dimly lit, electrically shielded experimental room, with their head on a chin rest placed approximately 60 cm from a 19″ CRT screen (resolution 1,280 × 1,024 pixels). After participants had filled out the informed consent form and a short demographic questionnaire, the experiment began. In each trial, a single Dutch word (conveying either unpleasant or neutral content, based on the normative ratings in ref. 77) was presented on a gray background (RGB values [201, 201, 201]), either in a small or a large font (35 vs. 140 pixels; visual angle 3° × 1.1° and 11.8° × 3.6°, respectively) and in high or low contrast (RGB values [0, 0, 0] vs. [191, 191, 191]). This 2 (emotion) × 2 (size) × 2 (contrast) factorial design resulted in the presentation of 480 experimental words (60 stimuli per condition). For each participant, negative and neutral words were randomly assigned to each of these size and contrast variations. Additionally, 20 words describing colors (e.g., groen, i.e., green in Dutch) were presented in all size and contrast conditions, resulting in 80 additional probes. To ensure that participants would pay attention to the semantic content of each word, they were required to press the spacebar as soon as they detected a word referring to a color. Accuracy and response times were recorded to verify task compliance (see Supplementary Materials). We decided a priori to exclude all participants with accuracy below 80% in any of the four size and contrast conditions, as this would indicate insufficient attention to the words. In total, the 560 words were split into 8 runs of 70 words each, and participants could take a short break between runs. Each word was presented for 1,000 ms, followed by a variable inter-trial interval between 1,000 and 1,500 ms displaying a fixation cross (  For the values assigned to each parameter, see our commented script at https://osf.io/c7g9y.
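The sketch below shows how one participant's trial list could be assembled under this design, together with the visual-angle arithmetic for a 60 cm viewing distance. The dictionary fields, the uniform inter-trial-interval distribution, and the fully random ordering of all 560 trials across runs are our assumptions. The actual parameters are in the authors' commented script at the OSF link above.

```python
import math
import random

CELLS = [(size, contrast) for size in ("small", "large") for contrast in ("high", "low")]
FONT_PX = {"small": 35, "large": 140}
TEXT_RGB = {"high": (0, 0, 0), "low": (191, 191, 191)}  # on a (201, 201, 201) background


def build_runs(negative, neutral, colors, n_runs=8, seed=None):
    """Randomly assign words to size x contrast cells and split all trials into runs."""
    rng = random.Random(seed)
    trials = []
    for valence, words in (("negative", negative), ("neutral", neutral)):
        pool = list(words)
        rng.shuffle(pool)
        per_cell = len(pool) // len(CELLS)  # 240 words / 4 cells = 60 per condition
        for i, (size, contrast) in enumerate(CELLS):
            for word in pool[i * per_cell:(i + 1) * per_cell]:
                trials.append({"word": word, "valence": valence, "size": size,
                               "contrast": contrast, "is_color": False})
    # Each color word appears once in every size x contrast cell (20 x 4 = 80 probes).
    for word in colors:
        for size, contrast in CELLS:
            trials.append({"word": word, "valence": "color", "size": size,
                           "contrast": contrast, "is_color": True})
    rng.shuffle(trials)
    run_len = len(trials) // n_runs  # 560 trials / 8 runs = 70
    runs = [trials[k * run_len:(k + 1) * run_len] for k in range(n_runs)]
    for run in runs:
        for trial in run:
            trial["font_px"] = FONT_PX[trial["size"]]
            trial["rgb"] = TEXT_RGB[trial["contrast"]]
            trial["iti_ms"] = rng.uniform(1000, 1500)  # assumed uniform jitter
    return runs


def visual_angle_deg(extent_cm, distance_cm=60.0):
    """Visual angle subtended by an on-screen extent at a given viewing distance."""
    return math.degrees(2 * math.atan(extent_cm / (2 * distance_cm)))


def extent_cm(angle_deg, distance_cm=60.0):
    """Inverse: on-screen extent needed to subtend angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)
```

At 60 cm, the reported 3° and 11.8° word widths correspond to roughly 3.1 cm and 12.4 cm on screen. This ratio is close to the 4:1 ratio of the two font sizes (140/35 pixels).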
We decided a priori to discard any dataset in which the artifact detection procedure identified more than 10 noisy scalp channels. No dataset met this criterion (median number of interpolated channels: 4, range 1-10). Noisy channels were interpolated via a spherical spline procedure 82. Note that the interpolated channels were mostly located outside the clusters selected for the definition of the ERP components (maximum number of interpolated channels within clusters: 2); therefore, any potential distortion of the EEG signal due to interpolation was negligible. Ocular channels were discarded, and the scalp data were re-referenced to the average signal. Epochs extending from −200 ms to +1,000 ms time-locked to word onset were created, and baseline correction was applied using the pre-stimulus interval. Finally, 8 grand averages were computed, one for each combination of our 2 × 2 × 2 factorial design: (1) negative words, large font, high contrast; (2) negative words, small font, high contrast; (3) negative words, large font, low contrast; (4) negative words, small font, low contrast; (5) neutral words, large font, high contrast; (6) neutral words, small font, high contrast; (7) neutral words, large font, low contrast; (8) neutral words, small font, low contrast.
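The sketch below illustrates the last pre-processing steps described above: average re-referencing, epoching from −200 to 1,000 ms, pre-stimulus baseline correction, and per-condition averaging. It uses NumPy and assumes the 256 Hz sampling rate mentioned for the permutation analysis. Spherical-spline interpolation and artifact detection are not shown, and the function names are hypothetical. Grand averages would then be obtained by averaging each participant's condition averages.

```python
import numpy as np


def epoch_average_ref(eeg, onsets, sfreq=256.0, tmin=-0.2, tmax=1.0):
    """Average-reference continuous scalp EEG, cut stimulus-locked epochs, baseline-correct.

    eeg: (n_channels, n_samples) array of scalp channels only (ocular channels removed)
    onsets: word-onset sample indices
    Returns an (n_trials, n_channels, n_times) array.
    """
    # Average reference: subtract the instantaneous mean across scalp channels.
    eeg = eeg - eeg.mean(axis=0, keepdims=True)
    first = int(round(tmin * sfreq))  # -51 samples at 256 Hz (about -200 ms)
    last = int(round(tmax * sfreq))   # 256 samples (1,000 ms)
    epochs = np.stack([eeg[:, onset + first:onset + last] for onset in onsets])
    # Baseline correction: subtract each channel's mean over the pre-stimulus samples.
    baseline = epochs[:, :, :-first].mean(axis=2, keepdims=True)
    return epochs - baseline


def condition_averages(epochs, labels):
    """Average epochs per condition label, e.g. ("negative", "large", "high")."""
    averages = {}
    for label in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        averages[label] = epochs[idx].mean(axis=0)
    return averages
```

Both steps are linear, so applying the average reference before or after epoching gives the same result. Baseline correction leaves the average-reference property (zero mean across channels) intact.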
Identification of ERP components. The standard approach of selecting electrodes and time windows of the ERP components of interest by visually inspecting the grand-average waveforms can lead to a severe inflation of false positives 53,83. Furthermore, this approach typically assumes that the ERP components observed in the grand-averaged data are reliably different from noise, but this assumption is seldom verified. To avoid these issues, we computed the grand-average ERP signal across all participants and conditions and conducted repeated-measures, two-tailed permutation tests based on the tmax statistic 84
3. randomly permute condition labels (i.e., each observation is either assigned its actual value or zero), calculate the t-test, and store its corresponding t-value;
4. repeat step 3 5,000 times to create a distribution of the possible t-values for these data under the null hypothesis;
5. the relative location of t_observed in this empirically generated null distribution provides the p-value for the observed data, i.e., how probable the actual difference wave at this specific time point would be if the null hypothesis were true;
6. at each time point, repeat this procedure for each electrode and retain only the highest t-value (i.e., tmax).
The p-values for the original observations are derived from the tmax scores.
All time points between 0 and 1,000 ms (i.e., 256 time points at a 256 Hz sampling rate) at all 64 scalp channels were included in the analysis, resulting in 16,384 comparisons in total. Differences were considered statistically significant (i.e., with the desired family-wise error rate kept at ~5%) if they exceeded the critical tmax value of each set of tests, i.e., the value cutting off the upper 5% of the permutation tmax distribution.
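The following NumPy sketch implements a tmax permutation test of this kind for a one-sample comparison against zero, such as the localizer ERP or a negative minus neutral difference wave. Swapping a participant's value with zero is implemented as flipping the sign of that participant's data. The function name, the seed, and the p-value convention (the observed labeling is not counted as one of the permutations) are our choices. The authors' complete MATLAB procedure is linked in the text.

```python
import numpy as np


def tmax_permutation_test(data, n_perm=5000, alpha=0.05, seed=0):
    """Two-tailed, one-sample tmax permutation test against zero.

    data: (n_subjects, n_channels, n_times), e.g. per-participant localizer ERPs
    Returns observed t-values, FWER-corrected p-values, and the critical |t|.
    """
    rng = np.random.default_rng(seed)
    n_sub = data.shape[0]

    def t_values(x):
        # One-sample t-statistic at every channel x time point.
        return x.mean(axis=0) / (x.std(axis=0, ddof=1) / np.sqrt(n_sub))

    t_obs = t_values(data)
    tmax_null = np.empty(n_perm)
    for i in range(n_perm):
        # Swapping a subject's condition label with the zero baseline = flipping its sign.
        signs = rng.choice([-1.0, 1.0], size=n_sub)[:, None, None]
        # Keep only the largest |t| across all channels and time points.
        tmax_null[i] = np.abs(t_values(data * signs)).max()
    tmax_sorted = np.sort(tmax_null)
    # Corrected p-value: share of permutations whose tmax reaches the observed |t|.
    n_below = np.searchsorted(tmax_sorted, np.abs(t_obs), side="left")
    p_corr = (n_perm - n_below) / n_perm
    # Critical |t| for the desired family-wise error rate.
    t_crit = np.quantile(tmax_null, 1 - alpha)
    return t_obs, p_corr, t_crit
```

Because every point is compared against the distribution of the maximum |t| over all 64 × 256 points, a single threshold controls the family-wise error rate across all 16,384 comparisons. This also explains why the procedure tends to be conservative.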
As already stated in the pre-registration protocol, visual inspection of the results of the mass univariate procedure was carried out to minimize Type-II errors (because this approach tends to be overly conservative) and to ensure that the results were consistent with well-known characteristics of the ERP components of interest (i.e., polarity, latency, and topography) that have been observed and replicated in the literature. Visual inspection of the localizer data revealed a topography and time window of the EPN that were slightly inconsistent with those reported in the existing literature, i.e., a centroparietal electrode cluster (with positive amplitude values) instead of the typical occipital cluster (with negative amplitude values). Based on this observation, we refined the choice of electrodes and time windows by computing the tmax procedure on negative minus neutral difference waves (see https://osf.io/aev6j/ for the complete procedure in MATLAB). Note that this approach also minimizes experimenter bias, because it is performed on data averaged across font size and contrast.
This procedure allowed us to successfully identify the components of interest in the following electrode clus- (1) size + emotion; (2) contrast + emotion; (3) size + contrast + emotion. The interactive models were: (4) size × emotion; (5) contrast × emotion; (6) size × contrast × emotion. Participants were included as a random factor, and their variance was considered nuisance. Note that, due to poor model convergence, we could not include stimuli (i.e., words) as a random effect, contrary to our pre-registered plan.
To further characterize the direction of the effects, two-tailed Bayesian t-tests were calculated to estimate the degree of evidence in favor of a model assuming differences between two specified conditions relative to a model assuming no differences 91,92. The null hypothesis was specified as a point-null prior (i.e., standardized effect size δ = 0), whereas the alternative hypothesis was defined as a Jeffreys-Zellner-Siow (JZS) prior, i.e., a Cauchy distribution centered on δ = 0, with scaling factors of r = 1, r = 0.707, and r = 0.5 to verify the robustness of the results 93. The most conservative BF was used as reference to decide whether to continue with data collection (see Section 4.1).
Visualization