Suppression durations for facial expressions under breaking continuous flash suppression: effects of faces’ low-level image properties

Perceptual biases for fearful facial expressions are observed across many studies. According to the low-level, visual-based account of these biases, fear expressions are advantaged in some way due to their image properties, such as low spatial frequency content. However, there is a degree of empirical disagreement regarding the range of spatial frequency information responsible for perceptual biases. Studies using breaking continuous flash suppression (b. CFS) have explored these effects, showing similar biases for detecting fearful facial expressions. Recent findings from a b. CFS study highlight the role of high, rather than low, spatial frequency content in determining faces’ visibility. The present study contributes to ongoing discussions regarding the efficacy of b. CFS, and shows that the visibility of facial expressions varies according to how they are normalised for contrast and spatially filtered. Findings show that normalising faces for physical contrast facilitates fear’s detectability under b. CFS more than normalising them for apparent contrast, and that this effect is most pronounced when faces are high frequency filtered. Moreover, normalising faces’ perceived contrast does not guarantee equality between expressions’ visibility under b. CFS. Findings have important implications for the use of contrast normalisation, particularly regarding the extent to which contrast normalisation facilitates fear bias effects.


Scientific Reports | (2020) 10:17427 | https://doi.org/10.1038/s41598-020-74369-2 | www.nature.com/scientificreports/

The purpose of the present experiment is to replicate and extend the study design employed by Stein et al. 25, in order to understand the spatial information that underpins the threat bias in b. CFS. This extension contributes to our understanding of how low-level image properties influence perceptual biases for face expressions, how these effects manifest under b. CFS conditions, and what this means for the value of this technique as a measure of conscious perceptual biases. We used the same experimental parameters as those employed by Stein et al. 25, but extended this to include (1) a broader range of facial expressions, including happy, angry and disgust stimuli, (2) a mid-range spatial frequency condition as an intermediate between the low and high frequency conditions, to better understand the frequency tuning of suppression, and (3) faces matched for both physical and perceived contrast.

Methods
Participants. Twenty-nine participants took part in the first study (broadband stimuli). Seventeen additional participants took part in the remaining conditions (low-, mid- and high-frequency stimuli). All participated in the experiment as part of a credited research module assessment. All participants had normal or corrected-to-normal vision. The University of Essex Ethics Committee approved the study on the grounds that the study design was in accordance with university ethical guidelines and regulations. All participants were told that the study was concerned with face perception, and all gave written, informed consent.
Stimuli and apparatus. Stimuli were presented using a VIEWPIXX 3D monitor, viewed from a distance of 80 cm. A chin rest was used to maintain viewing distance and eye level. The monitor screen was 52 cm wide by 29 cm tall. The screen resolution was 1920 × 1080 pixels, with a refresh rate of 120 Hz and an average luminance of 50 cd/m². All stimuli, including masks, were generated and presented using MATLAB and Psychophysics Toolbox extensions, and were delivered via NVIDIA 3D Vision liquid-crystal shutter goggles [39][40][41]. Note that the use of shutter goggles in the present study differs from the mirror stereoscope used by Stein et al. 25.
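The display geometry above fixes how many pixels correspond to one degree of visual angle. The following is a minimal sketch of that conversion, assuming a stimulus centred on the line of sight; the function and constant names are illustrative and are not taken from the authors' MATLAB code. The 4.5° value is the face width reported for the face stimuli.

```python
import math

# Display geometry as reported in the text.
SCREEN_WIDTH_CM = 52.0
SCREEN_WIDTH_PX = 1920
VIEWING_DISTANCE_CM = 80.0


def degrees_to_pixels(angle_deg: float) -> float:
    """Width in pixels of a centred stimulus subtending angle_deg."""
    half_angle = math.radians(angle_deg / 2.0)
    width_cm = 2.0 * VIEWING_DISTANCE_CM * math.tan(half_angle)
    return width_cm * SCREEN_WIDTH_PX / SCREEN_WIDTH_CM


pixels_per_degree = degrees_to_pixels(1.0)
face_width_px = degrees_to_pixels(4.5)  # face width reported in the text

print(f"{pixels_per_degree:.1f} pixels per degree")
print(f"{face_width_px:.0f} pixels for a 4.5 degree face")
```

Under these assumptions one degree spans a little over 50 pixels, so a 4.5° face occupies roughly 230 pixels horizontally.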
Face stimuli. Stimuli were grayscale front-view face photographs of 16 actors (eight women, eight men) taken from the Karolinska Directed Emotional Faces (KDEF) set 42. Faces were cropped to include only internal features. Each actor portrayed a neutral, angry, fearful, happy or disgusted expression. The width of each face image was 4.5°. In MATLAB, a second-order Butterworth filter was used to create spatially filtered versions of the original, broadband images. The cut-off frequencies were f < 1 cpd for low spatial frequency (LSF) faces, 1 < f < 6 cpd for midrange spatial frequency (MSF) faces, and f > 6 cpd for high spatial frequency (HSF) faces. For a 4.5° face, the cut-offs of 1 and 6 cpd correspond to 4.5 and 27 cycles per face-width, and the bandpass cut-offs were comparable to those used by Stein et al. 25 and Vlamings et al. 20. Faces were presented in two contrast formats: one in which they were normalised for root mean squared (RMS) contrast, and one in which they were psychophysically matched for perceived contrast. In the latter condition, each face was presented at the Michelson contrast required for it to appear equal in contrast to the other faces. To create these faces, we utilised data from a separate study in which a sample of participants (not associated with the present study) adjusted the physical contrast of the same 16 grayscale KDEF facial stimuli until they appeared perceptually the same 37. This provided the 16 KDEF faces used in the present study with an assigned Michelson contrast value, corresponding to the degree of physical contrast necessary for them to perceptually match a reference face of 10% Michelson contrast. Therefore, faces matched for apparent contrast in the present study contained the degree of physical contrast required for them to appear subjectively as though they were composed of 10% Michelson contrast.
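The three frequency bands can be illustrated with the textbook Butterworth gain function. This is a sketch only: the text states the filter order and the cut-offs, but not how the filters were implemented, so the frequency-domain gain formula and the construction of the band-pass as a cascade of a high-pass and a low-pass stage are assumptions, as are all function names.

```python
import math

FACE_WIDTH_DEG = 4.5   # face width reported in the text
ORDER = 2              # second-order filter, as in the text
LOW_CUT_CPD = 1.0      # boundary between the LSF and MSF bands
HIGH_CUT_CPD = 6.0     # boundary between the MSF and HSF bands


def lowpass_gain(f_cpd, cutoff_cpd, order=ORDER):
    """Butterworth low-pass gain at spatial frequency f_cpd (cycles/degree)."""
    return 1.0 / math.sqrt(1.0 + (f_cpd / cutoff_cpd) ** (2 * order))


def highpass_gain(f_cpd, cutoff_cpd, order=ORDER):
    """Butterworth high-pass gain; the DC component is removed entirely."""
    if f_cpd == 0:
        return 0.0
    return 1.0 / math.sqrt(1.0 + (cutoff_cpd / f_cpd) ** (2 * order))


def bandpass_gain(f_cpd):
    """Assumed MSF filter: high-pass at 1 cpd cascaded with low-pass at 6 cpd."""
    return highpass_gain(f_cpd, LOW_CUT_CPD) * lowpass_gain(f_cpd, HIGH_CUT_CPD)


def cycles_per_face(f_cpd):
    """Convert cycles per degree to cycles per face-width."""
    return f_cpd * FACE_WIDTH_DEG


for f in (0.25, 1.0, 2.5, 6.0, 24.0):
    print(
        f"{f:5.2f} cpd = {cycles_per_face(f):6.2f} c/face  "
        f"LSF {lowpass_gain(f, LOW_CUT_CPD):.2f}  "
        f"MSF {bandpass_gain(f):.2f}  "
        f"HSF {highpass_gain(f, HIGH_CUT_CPD):.2f}"
    )
```

The gain falls off smoothly rather than abruptly around each cut-off, so neighbouring bands overlap slightly; at the cut-off itself a single Butterworth stage passes about 71% of the amplitude.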
Faces were presented in a normal, upright format or as control versions. To create these control stimuli, images were spatially inverted (rotated by 180°) and their luminance polarity was reversed. Combining inversion and luminance polarity reversal reduces emotion recognition beyond that associated with inversion alone 17. Doing so is therefore a useful tool for disrupting configural, face-specific processing, while preserving low-level image properties including contrast and spatial frequency content 17,43.
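The claim that the control manipulation preserves contrast can be checked on a toy image. The sketch below assumes that RMS contrast is defined as the standard deviation of luminance divided by its mean, and that polarity is reversed about the mean luminance; neither detail is stated in the text, and the 3 × 3 "face" is made-up data.

```python
import math


def rms_contrast(image):
    """RMS contrast, taken here as standard deviation / mean luminance."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return math.sqrt(variance) / mean


def make_control(image):
    """Rotate by 180 degrees, then reverse polarity about the mean luminance."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    rotated = [row[::-1] for row in image[::-1]]
    return [[2.0 * mean - p for p in row] for row in rotated]


# Hypothetical luminance values in the range 0..1 (mean luminance 0.5).
face = [
    [0.40, 0.55, 0.50],
    [0.45, 0.60, 0.35],
    [0.50, 0.65, 0.50],
]
control = make_control(face)

print(f"RMS contrast, upright: {rms_contrast(face):.4f}")
print(f"RMS contrast, control: {rms_contrast(control):.4f}")
```

Both steps leave the mean and the spread of luminance values unchanged, which is why the two printed contrasts agree. Rotation by 180° and sign reversal also leave the Fourier amplitude spectrum unchanged, altering only the phase.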
Mask stimuli. The same second-order Butterworth filters used to create the face stimuli were also used to create the b. CFS masks. Masks were composed of randomly positioned rectangles, with minimum and maximum widths and heights of 5.2 and 25 arcmin (respectively), with new samples presented at a rate of 10 Hz. On each trial, the spatial frequency content of the masks was matched to that of the facial stimuli. An example is shown in Fig. 1.
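As a rough illustration of the mask parameters, the sketch below derives how many monitor frames each mask sample lasts and draws one unfiltered set of rectangles. Only the size limits, the 10 Hz rate and the 120 Hz refresh rate come from the text; the uniform size distribution, the number of rectangles, the field size and the random grey levels are placeholders chosen for the example.

```python
import random

REFRESH_HZ = 120          # monitor refresh rate reported in the text
MASK_RATE_HZ = 10         # rate at which new mask samples were presented
MIN_SIZE_ARCMIN = 5.2     # minimum rectangle width and height
MAX_SIZE_ARCMIN = 25.0    # maximum rectangle width and height

# Each mask sample stays on screen for this many monitor frames.
frames_per_mask = REFRESH_HZ // MASK_RATE_HZ


def sample_rectangles(n, field_arcmin, rng):
    """Draw n rectangles as (x, y, width, height, grey) tuples.

    n, field_arcmin and the uniform distributions are hypothetical.
    """
    rects = []
    for _ in range(n):
        w = rng.uniform(MIN_SIZE_ARCMIN, MAX_SIZE_ARCMIN)
        h = rng.uniform(MIN_SIZE_ARCMIN, MAX_SIZE_ARCMIN)
        x = rng.uniform(0.0, field_arcmin - w)
        y = rng.uniform(0.0, field_arcmin - h)
        rects.append((x, y, w, h, rng.random()))
    return rects


rng = random.Random(0)
mask = sample_rectangles(n=200, field_arcmin=600.0, rng=rng)
print(f"{frames_per_mask} frames per mask sample, {len(mask)} rectangles drawn")
```

In the study the resulting mask images were additionally passed through the same spatial filters as the faces; that step is omitted here.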
Procedure. Participants were tested individually in a quiet room. NVIDIA 3D goggles were used to present separate images to the two eyes. Note that Stein et al. 25 used a mirror stereoscope. Masks were presented at full contrast for the duration of all trials, and face stimuli were presented individually at one of four quadrant locations. Faces reached full Michelson contrast 1 s after stimulus onset. Using a four-alternative forced-choice (4AFC) task, participants were instructed to indicate, as quickly as possible, in which of the four quadrants each face was located. Manual responses were recorded using the RESPONSEPixx response box. The next trial was triggered by the observer's response; if no response was made within 7 s of trial onset, the next trial began automatically. Overall, the study was separated into two parts. The first part presented 29 observers with broadband facial stimuli: observers completed 320 trials (16 actors × 5 expressions × 2 contrast conditions × 2 orientations). Trials were randomised and separated into eight blocks. The second part presented 17 observers with low-, mid-, and high-frequency facial stimuli: observers completed 320 trials (16 actors × 5 expressions × 2 contrast conditions × 2 orientations). Trials were randomised and separated into eight blocks. Stimulus and procedural details were the same across each of the four studies, except for the spatial frequency content of faces.
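The trial structure described above (a fully crossed design of 320 trials, randomised and split into eight blocks) can be sketched as follows. The dictionary layout, the label strings and the random choice of quadrant on each trial are assumptions made for illustration; the text does not say how quadrants were assigned.

```python
import itertools
import random

ACTORS = range(1, 17)  # 16 actors
EXPRESSIONS = ["neutral", "angry", "fearful", "happy", "disgusted"]
CONTRASTS = ["rms", "apparent"]
ORIENTATIONS = ["upright", "control"]
N_BLOCKS = 8


def build_blocks(seed=None):
    """Cross all factors, shuffle the trials and split them into equal blocks."""
    rng = random.Random(seed)
    trials = [
        {
            "actor": actor,
            "expression": expression,
            "contrast": contrast,
            "orientation": orientation,
            "quadrant": rng.randrange(4),  # assumed: random quadrant per trial
        }
        for actor, expression, contrast, orientation in itertools.product(
            ACTORS, EXPRESSIONS, CONTRASTS, ORIENTATIONS
        )
    ]
    rng.shuffle(trials)
    block_size = len(trials) // N_BLOCKS
    return [trials[i:i + block_size] for i in range(0, len(trials), block_size)]


blocks = build_blocks(seed=1)
n_trials = sum(len(block) for block in blocks)
print(f"{n_trials} trials in {len(blocks)} blocks of {len(blocks[0])}")
```

With 16 × 5 × 2 × 2 = 320 trials, each of the eight blocks holds 40 trials.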

Results
Response times (RTs) reflect the point at which the face stimulus broke suppression from the b. CFS masks. Response times for each spatial frequency study (broadband, LSF, MSF, and HSF) were analysed separately. Each analysis comprised a 5 (Expression) × 2 (Contrast condition) × 2 (Orientation) repeated measures analysis of variance (ANOVA), followed by eight Šidák-corrected paired comparisons where appropriate (α = 0.0063, according to eight comparisons). Šidák corrections were selected over Bonferroni corrections for […] Comparisons explored differences in response times between fear and each counterpart expression, and were performed separately for each contrast condition.

Response times for broadband faces.
Expression-related differences in RTs were explored separately for faces normalised for RMS and apparent contrast. Eight Šidák-corrected tests (α = 0.0063) compared RTs for fear to each other expression, both when faces were presented at normal, upright orientation and when they were in control format. When normalised for RMS contrast, normal (non-control) fearful expressions were detected faster than angry faces (p < 0.001), an effect that remained true for control faces (p 0.005). When normalised for apparent contrast, normal fearful expressions were detected faster than angry expressions (p 0.003), but this effect was not found for control versions of faces. No other significant differences were observed. All comparisons are summarised in Table 1, and illustrated in Fig. 2a,e.

In summary, for broadband stimuli, fearful faces were detected more quickly than angry faces only. This was true for both normal and control stimuli when stimuli were matched for RMS contrast, and for normal faces only when matched for apparent contrast.

Response times for LSF faces.
Eight Šidák-corrected tests (α = 0.0063) compared RTs for fear to each other expression, both for upright and control faces. Overall, RTs did not significantly differ between fear and any other expression, regardless of how they were normalised for contrast. No further analyses were conducted. All comparisons are summarised in Table 2, and illustrated in Fig. 2b,f.

In summary, for LSF stimuli, fearful faces were not detected more quickly than any other expression, in any condition.

Response times for MSF faces.
Overall, faces normalised for RMS contrast were more visible compared to those normalised for apparent contrast. Normal, upright faces were also more visible than […] Table 3, and illustrated in Fig. 2c,g.

In summary, for MSF stimuli, fearful faces were detected more quickly than angry faces, but only when matched for RMS contrast. When matched for apparent contrast, response times were slower for detecting fear than happy control faces.

Response times for HSF faces.
Eight Šidák-corrected tests (α = 0.0063) compared RTs for fear to each other expression, both for normal upright and control faces. When normalised for RMS contrast, RTs for upright fear expressions were faster compared to both angry and happy faces (both p < 0.001). Only the effect between fear and anger remained true for control faces (p ≤ 0.001). When normalised for apparent contrast, upright fear expressions were detected faster compared to angry faces (p = 0.0060), but this effect diminished for control faces. Notably, control fear expressions were detected more slowly compared to neutral controls (p 0.001). All comparisons are summarised in Table 4, and illustrated in Fig. 2d,h.

In summary, for HSF stimuli matched for RMS contrast, fearful faces were detected more quickly than angry and happy faces in the normal format, and more quickly than angry faces only in the control format. When matched for apparent contrast, they were detected more quickly than normal, angry faces only.
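The per-comparison threshold used for the pairwise tests in this section follows from the standard Šidák formula. The sketch below assumes a familywise error rate of 0.05, which the text does not state explicitly, and shows the Bonferroni threshold for comparison.

```python
def sidak_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold under the Sidak correction."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / n_comparisons)


def bonferroni_alpha(family_alpha: float, n_comparisons: int) -> float:
    """Per-comparison threshold under the Bonferroni correction."""
    return family_alpha / n_comparisons


FAMILY_ALPHA = 0.05   # assumed familywise error rate (not stated in the text)
N_COMPARISONS = 8     # eight comparisons per contrast condition

sidak = sidak_alpha(FAMILY_ALPHA, N_COMPARISONS)
bonferroni = bonferroni_alpha(FAMILY_ALPHA, N_COMPARISONS)

print(f"Sidak:      {sidak:.5f}")
print(f"Bonferroni: {bonferroni:.5f}")
```

Under this assumption the Šidák threshold comes out at about 0.0064, in line with the α = 0.0063 reported above, and is marginally less strict than the Bonferroni value of 0.00625.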

Discussion
The objective of the present study was to perform a replication and extension of the experimental design employed by Stein et al. 25. This extension compared fearful faces to a broader range of spatially filtered facial expressions, included midrange, bandpass stimuli as well as lowpass and highpass images, and repeated the experiments for stimuli matched both for their physical RMS contrast, and for their apparent, perceived contrast.

Table 1. Visibility differences between broadband expressions normalised for contrast. Pairwise comparisons conducted separately for faces normalised for RMS contrast and those normalised for apparent, perceived contrast. In each contrast condition, eight comparisons compared response times between upright fear and counterpart expressions (4) and again for control versions of faces (4). All comparisons were Šidák-corrected according to eight comparisons: α = 0.0063.

Table 3. Visibility differences between midrange frequency expressions normalised for contrast. Pairwise comparisons conducted separately for faces normalised for RMS contrast and those normalised for apparent, perceived contrast. In each contrast condition, eight comparisons compared response times between upright fear and counterpart expressions (4) and again for control versions of faces (4). All comparisons were Šidák-corrected according to eight comparisons: α = 0.0063.

Expression comparisons (RMS): Fear-neutral −0.003 16 −198.17, 197 […]

[…] expressions was found in high frequency conditions. Notably, the study by Stein et al. 25 compared only spatially filtered fear and neutral expressions. Though we did not observe a significant difference between upright fear and neutral expressions in any frequency condition, at high spatial frequencies fear expressions were detected faster than both happy and angry expressions, and these effects were most pronounced when stimuli were normalised for RMS contrast. In this sense, our findings both support and extend those of Stein et al. 25. To our knowledge, the present study and that of Stein et al. 25 are the only ones to explore biases for spatially filtered expressions using b. CFS. Perceptual biases for fearful expressions are often found to rely on low frequency tuning, and so we propose that the effects observed in the present study, and in that of Stein et al. 25, are a facet of expression perception specific to the b. CFS paradigm. In the broadband and low frequency conditions, we found rather limited evidence to suggest that fearful faces are perceived more rapidly than other expressions. Many studies report an initial fear bias for intact broadband stimuli 8,[16][17][18]25,44, though in the present broadband condition, detection advantages for fearful expressions were found only relative to anger, and not relative to neutral, happy, or disgusted expressions. This may be due to the number of expressions included in the present study, and to the stringent thresholds incurred by statistically corrected comparisons for both upright and control faces. For more information regarding the statistical power of our study, please see Supplementary Tables 1 and 2. We found that evidence for the threat bias was much weaker when stimuli were matched for apparent contrast than when they were matched for RMS contrast.
Normalisation for luminance and contrast is a routine procedure in studies of biases in the perception of facial expressions since, all other things being equal, brighter, higher contrast stimuli will be easier to see. Normalisation is therefore performed on the assumption that any such low-level differences between stimuli would artefactually influence the results. However, analyses of photographs have found naturally occurring differences in contrast across emotional expressions 36,37. If the threat bias were to provide a behavioural advantage in everyday life, then it should be evident without prior contrast normalisation, particularly given these reliable differences between expressions. This analysis also showed a difference in the Fourier amplitude slope, with fearful faces having a steeper slope than other expressions 37. Fearful expressions therefore have relatively low contrast at high spatial frequencies. Consequently, when normalising for RMS contrast in broadband stimuli, the amplitude of all frequency bands will be increased. This is important because the root cause of the reduced contrast in fearful faces is found primarily in a frequency band that contributes little to apparent contrast, but the normalisation will increase the contrast at low to midrange frequencies, known to be important in both apparent contrast 35 and in the threat bias 8.
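The argument that RMS normalisation raises the amplitude of every frequency band by the same factor can be demonstrated on a toy one-dimensional luminance profile. The sketch below is illustrative only: the profile, its two sinusoidal components, the amplitudes and the target contrast are invented, and RMS contrast is assumed to be the standard deviation of luminance divided by its mean.

```python
import math

N = 256                      # samples in the toy luminance profile
LOW_CYCLES, HIGH_CYCLES = 2, 32


def profile(mean, amp_low, amp_high):
    """Mean luminance plus one low- and one high-frequency sinusoid."""
    return [
        mean
        + amp_low * math.sin(2 * math.pi * LOW_CYCLES * i / N)
        + amp_high * math.sin(2 * math.pi * HIGH_CYCLES * i / N)
        for i in range(N)
    ]


def rms_contrast(signal):
    mean = sum(signal) / len(signal)
    variance = sum((s - mean) ** 2 for s in signal) / len(signal)
    return math.sqrt(variance) / mean


def normalise_rms(signal, target):
    """Rescale deviations from the mean so that RMS contrast equals target."""
    mean = sum(signal) / len(signal)
    gain = target / rms_contrast(signal)
    return [mean + gain * (s - mean) for s in signal]


def amplitude(signal, cycles):
    """Amplitude of the component with the given number of cycles."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * cycles * i / n) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * cycles * i / n) for i, s in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


# Hypothetical profile with weak high-frequency content (a steep amplitude slope).
original = profile(mean=0.5, amp_low=0.10, amp_high=0.01)
normalised = normalise_rms(original, target=0.2)

gain_low = amplitude(normalised, LOW_CYCLES) / amplitude(original, LOW_CYCLES)
gain_high = amplitude(normalised, HIGH_CYCLES) / amplitude(original, HIGH_CYCLES)
print(f"gain at low frequency:  {gain_low:.3f}")
print(f"gain at high frequency: {gain_high:.3f}")
```

Because the normalisation is a single multiplicative gain on deviations from the mean, the low-frequency component is boosted by exactly the same factor as the weak high-frequency component, even though the contrast deficit originated only in the latter.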
Table 4. Visibility differences between high frequency expressions normalised for contrast. Pairwise comparisons conducted separately for faces normalised for RMS contrast and those normalised for apparent, perceived contrast. In each contrast condition, eight comparisons compared response times between upright fear and counterpart expressions (4) and again for control versions of faces (4). All comparisons were Šidák-corrected according to eight comparisons: α = 0.0063.

Our finding that the threat bias is most evident at high spatial frequencies is consistent with the findings from Stein et al. 25, but is at odds with those derived from non-b. CFS studies that show a low frequency role for fear biases. Across studies using different behavioural tasks, there is wide variation in the spatial scale of information driving the effects, and thus in the corresponding neural mechanisms that have been implicated. The threat bias found in b. CFS appears to be driven by high spatial frequencies. This information is processed by the parvocellular layers of the LGN, and Stein et al. 25 outline how this information would then be processed by cortical mechanisms. This would most likely be via the distributed cortical network of brain areas involved in the processing of faces, in the higher-level, ventral regions of the visual cortex 45,46. Conversely, results from other tasks, including saccadic latency, show that orienting towards images of faces 8, and also the orienting of spatial attention and spatial sensitivity 13, have been associated with the subcortical processing of low spatial frequencies in areas including the amygdala 21. Finally, it has been suggested that the preferential processing of fearful faces reflects the fact that their spectral content is especially well matched to the contrast sensitivity function of the human visual system 18.
This matching relies on the fact that fearful faces have increased energy at midrange spatial frequencies, once matched for RMS contrast. The contrast sensitivity function is determined by properties of visual processing at a range of levels, including the centre-surround properties of retinal and geniculate cells 47, and the sampling of spatial frequency in the primary visual cortex 48. The visual processing associated with increased salience of fearful faces thus runs from the retina through subcortical areas, primary visual cortex and higher visual areas in the ventral stream, and draws on visual information spanning low, midrange and high spatial frequencies. Rather than reflecting a single stage of visual processing, responses to emotional expressions occur at multiple levels of processing, involving a complex network of forward, lateral and backward connections across levels 49. There appears to be no single, well-defined adaptation that might be expected to provide a broad behavioural advantage in our responses to fearful faces. Together, the present findings highlight the combined effects of spatial frequency and contrast on face visibility under b. CFS. They show, along with other recent findings, that routine experimental procedures such as contrast normalisation can have facilitatory effects on stimulus salience 37,50. Moreover, they contribute to current and developing discussions regarding the mechanisms of b. CFS and its reliability as a measure of conscious visual processing 23,24,[26][27][28][29]38,51, by showing that stimulus visibility under b. CFS also varies according to the contrast content of face stimuli, and by drawing out the implications this has for our understanding of perceptual biases for emotional expressions.