Introduction

In order to deal with the surfeit of information entering our senses, we can engage spatial selective attention voluntarily based on current goals or orient attention reflexively to a salient event in the environment, both of which result in improved perception at the attended location1. Foundational theories of attention propose that two mechanisms are involved during spatial selection: enhancement of sensory processing at the attended location and suppression of sensory processing at unattended locations2,3. This is based on the idea that visual processing is a strictly limited resource, and that if visual processing is increased at one location then it should be paralleled by a decrease in visual processing elsewhere. Studies of endogenous (i.e., voluntary) spatial attention have used electroencephalography (EEG) to examine neural activity following spatial attention cues, and found that when a location is attended, neural activity in parts of the visual cortex that represent that location is increased relative to other unattended locations even before a target stimulus is presented (e.g.,4). Several studies have argued that these preparatory biasing signals not only reflect neural enhancement of attended regions, but also suppression of unattended areas prior to target presentation5,6,7,8,9. More recently, a similar effect on visual-cortical processing was observed during the exogenous orienting of attention. Following peripheral salient sounds that reflexively oriented attention to the left or right side of space, neural activity was increased over the occipital cortex contralateral to the sound’s location relative to ipsilateral, even prior to or in the absence of visual targets10. 
Given that these peripheral sounds were completely random—predicting neither the occurrence, timing, nor location of a target10—this lateralized neural activity has been interpreted as representing the reflexive enhancement of visual processing following the exogenous orienting of attention at the location of the cue11. However, based on this relative difference in neural activity between the hemispheres, it cannot be distinguished whether this modulation reflects signal enhancement at the attended location, suppression of the unattended location, or a combination of the two. Thus, while many studies have attempted to tease apart mechanisms of enhancement and suppression during endogenous attentional orienting, research on how exogenous attention modulates neural activity following a salient cue is lacking.

We addressed this issue by adapting a cross-modal cueing paradigm to include a baseline ‘no-shift’ cue that allowed us to isolate neural activity related to the attended and unattended locations. Prior behavioral studies have used such baseline cues to separate behavioral benefits and costs of target processing, finding evidence for both attentional benefits (i.e., increases in performance for the cued location relative to baseline; e.g.,12,13) as well as attentional costs (i.e., decreases in performance at the uncued location relative to baseline;14,15,16). However, based on behavioral responses alone it is difficult to infer how orienting spatial attention is implemented in sensory cortex, given that responses are the result of multiple processing stages including changes in sensory activity prior to the onset of a target, perceptual processing of the target itself, and subsequent cognitive stages that involve decision-making, response preparation and response execution. Thus, in addition to behavior, we here directly assessed visual-cortical processing elicited by an auditory cue during exogenous attentional orienting using EEG.

Participants performed a visual discrimination task, and shortly before a visual target appeared, an auditory cue was presented either at the left or right target location to orient spatial attention, or at a central location, in which case it acted as an alerting signal without eliciting lateralized shifts of attention (no-shift cue). Note that we used sounds as attention cues to eliminate any early visual responses elicited by the cues themselves. Our EEG analysis focused on two event-related potentials (ERPs). First, we examined the Shift-Related Positivity (SRP) over frontal electrode sites to confirm that our no-shift cue differed from the shift cues in terms of engaging attentional control areas17. Second, we examined the Auditory-Evoked Contralateral Occipital Positivity (ACOP), a positive deflection over contralateral relative to ipsilateral occipital cortex with respect to the cue location10,18. The main question was how these lateralized changes over occipital cortex would compare to activity elicited by the no-shift baseline cues. Specifically, if exogenous attention is primarily supported by neural enhancement, we would expect increased neural activity over contralateral cortex relative to the no-shift cue; alternatively, if exogenous attention is primarily supported by neural suppression, we would expect decreased activity over ipsilateral cortex relative to the central no-shift cue; finally, an intermediate level of activity elicited by the no-shift cue would suggest the involvement of both enhancement and suppression. In addition to these neural measures, we also analyzed behavioral performance, examining costs and benefits at the cued and uncued locations relative to the central cue. Together, these measures would provide converging evidence of how spatial exogenous attention improves perceptual processing of a target at attended relative to unattended locations.

Method

Participants

Nineteen participants were included in the final sample of the experiment (14 female; mean age of 19.9 years). Data from five participants were excluded due to excessive artifacts in the EEG (affecting > 30% of trials). Data from two additional participants were excluded due to issues with the EEG system that resulted in significant lost data: an HEOG electrode came loose for one subject and a battery died for the other. Two subjects lost a small number of trials (11 and 14 trials) due to sampling errors of the EEG system but are included here, as each subject had greater than 70% of the full sample trial number remaining following artifact rejection.

All participants gave informed written consent in accordance with the IRB guidelines of the Human Research Protections Program of the University of California, San Diego and were paid for their time ($10/hour) or received course credit. All participants reported having normal or corrected-to-normal vision and normal hearing. The sample size was chosen a priori based upon a number of other studies utilizing similar cross-modal attentional cueing paradigms that effectively measured the ACOP and/or related behavioral effects10,17,18,19,20,21. We preregistered our predictions and analysis on AsPredicted (https://aspredicted.org/rw2it.pdf) and planned to collect data from 20 participants after exclusion. Our sample has one fewer subject than planned due to data collection being disrupted by COVID-19.

Stimuli and Apparatus

Participants were seated approximately 45 cm in front of a 27″ monitor in a sound-attenuated, electrically shielded booth. Stimuli were presented on the screen via the Psychophysics Toolbox in MATLAB22,23. A small black fixation dot (0.2° × 0.2° of visual angle) was always present in the center of the screen, which was otherwise uniformly gray (RGB: 127, 127, 127). A black circle (0.4° × 0.4°) appeared around the fixation dot at the start of each trial to indicate to the participant that the trial had begun. Peripheral auditory cues were ~ 83 ms pink noise bursts (0.5–15 kHz, 78 dB SPL) played from external speakers mounted on either side of the computer monitor. Consistent with previous cross-modal cueing work from our lab and others10,17,18,19,20,24, these cues were played in stereo and their amplitude was adjusted to give the impression that the sounds were emanating from the possible target locations on the screen. For example, for a left cue, the amplitude of the sound coming from the left speaker was adjusted to be louder than the amplitude of the sound coming from the right speaker, so that the sound appeared to come from the visual target location. The central auditory cue was the same pink noise burst played from a speaker mounted on the top of the monitor that was slightly tilted downwards (hereafter referred to as central cue) and adjusted to be equal in intensity to the peripheral stimuli. This central cue appeared to emanate from the center of the screen, and thus would not elicit any lateralized shifts of spatial attention. Indeed, as the main goal of this cue was to not elicit any spatial attention shifts but maintain attention focused in the central area of the screen, we opted to present a sound from a speaker located centrally as well. 
We believe this central sound—rather than, for example, bilateral sounds from both peripheral speakers simultaneously—eliminated any possibility of this sound directing attention to each of the possible target locations simultaneously25. Additionally, because all cue conditions should be intermixed throughout the entirety of the study26, and shift cues were themselves already played in stereo from both speakers, we opted for a no-shift cue played from a separate speaker in order to eliminate the possibility of an interaction between the shift and no-shift cues that may have influenced the perception of each. The target was a Gabor patch with a spatial frequency of 1.3 cycles/degree, turned either -45° or 45° from vertical. The contrast of the Gabor patch was determined for each participant in a calibration task prior to the main experiment (see below). The target was presented in one of two peripheral locations indicated by a black circle with a diameter of ~ 9° visual angle, centered ~ 28° of visual angle to the left and right of fixation. Each target was immediately followed by a visual noise mask of the same size.
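The stimulus sizes above are given in degrees of visual angle at a viewing distance of about 45 cm. As a rough illustration of what those angles mean on the screen, the following sketch converts them to physical distances. The function names are hypothetical, and the flat-screen geometry (sizes centered on the line of sight, eccentricity measured from fixation) is a simplifying assumption, not a description of the calibration used in the study.

```python
import math

VIEWING_DISTANCE_CM = 45.0  # approximate viewing distance reported in the text


def visual_angle_to_cm(angle_deg: float, distance_cm: float) -> float:
    """Extent (cm) on a flat screen that subtends angle_deg when centered on the line of sight."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


def eccentricity_to_cm(angle_deg: float, distance_cm: float) -> float:
    """Horizontal offset (cm) from fixation for a point at angle_deg eccentricity."""
    return distance_cm * math.tan(math.radians(angle_deg))


# Placeholder circle diameter (~9 deg) and its center offset (~28 deg) from the text.
target_diameter_cm = visual_angle_to_cm(9.0, VIEWING_DISTANCE_CM)
target_offset_cm = eccentricity_to_cm(28.0, VIEWING_DISTANCE_CM)

print(f"diameter: {target_diameter_cm:.2f} cm, offset: {target_offset_cm:.2f} cm")
```

Under these assumptions the ~9° circle is roughly 7 cm across and its center sits roughly 24 cm from fixation. The diameter figure is only approximate for a stimulus this far in the periphery, because the centered-on-line-of-sight formula ignores the oblique viewing angle.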

Procedures

All experimental procedures were approved by and conducted in accordance with the guidelines and regulations of the University of California, San Diego Institutional Review Board. An example of a full trial presentation is outlined in Fig. 1A. Participants were asked to keep their eyes on the central fixation dot throughout each experimental block. Each trial began with the presentation of a black circle that appeared around the central fixation dot, indicating to the participants that the trial had started. Following the onset of this circle at a variable stimulus onset asynchrony (SOA) of 1000–1300 ms, an 83-ms auditory attention cue was presented randomly at either the left, right, or center and was not predictive of the spatial location of the visual target. Consequently, participants were instructed to ignore the sounds because they would not be informative to the task. On 1/3 of trials, a target was presented following the cue at an SOA of 130 ms, and on another 1/3 of trials the target was presented following the cue after 630 ms. The target Gabor patch was presented at one of the two peripheral locations for 53 ms and was followed immediately by a visual noise mask for 100 ms. The noise mask always appeared at the location of the target to eliminate uncertainty about the location at which the target appeared. Following the offset of the noise mask at an interstimulus interval (ISI) of 300 ms, the black circle surrounding the central fixation dot turned white, prompting a response from the participant as to which direction the target was oriented. Participants made this report using the “m” (clockwise) and “n” (counterclockwise) keys. On the remaining 1/3 of trials, neither a target nor a mask appeared following the cue. These trials ended 900 ms after cue onset, and participants pressed the spacebar to continue to the next trial. 
The target display was omitted or presented after a longer cue-target SOA (630 ms) to allow recording of the event-related potential (ERP) to the cue separately from the otherwise overlapping ERP to the visual target.

Figure 1

Example trial and performance. (A) Participants discriminated the direction of rotation (clockwise or counterclockwise) of a masked Gabor patch target. Prior to the appearance of the target, participants were presented with an auditory cue that was played randomly either 130 ms or 630 ms prior to the target. This sound was a pink noise burst played either in stereo from peripheral speakers such that it appeared to emanate from either the left or right of the screen, or from a speaker at the top center of the screen such that it appeared to emanate from the center of the screen. On one third of the trials, no visual target was presented. (B) Target discrimination accuracy, plotted as a function of cue condition for each of the cue-target SOAs. Error bars represent ± 1 standard error of the mean. Asterisks indicate a significant (p < .05) difference between conditions.

We included the 1/3 of trials in which the target appeared at a short SOA from the cue in order to investigate whether the salient auditory cue influenced behavior, as exogenous attentional benefits in perception typically last only a few hundred milliseconds27,28. We included the 1/3 of trials in which the target appeared at a long SOA from the cue in order to eliminate any clear temporal relationship between the cue and the target, and we included the 1/3 of trials in which no visual information was presented in order to make the cue nonpredictive of whether a target would appear at all. This ensured that neural responses to the cue were representative of purely exogenous activity and not any expectations related to the appearance of the target, as the cue was not predictive of where, when, or even if a target would appear. Additionally, because neither the long SOA nor the no-target trials presented any visual information during and beyond our a priori time window of interest, we were able to include each of these trial types (2/3 of all trials) in the EEG analysis. All trial types were randomly intermixed within a block. Subjects performed 12 blocks of 72 trials each. Prior to the experimental tasks, task difficulty was adjusted for each participant using a thresholding procedure that varied the contrast of the Gabor patch target to achieve about 75% accuracy (i.e., QUEST;29). In this thresholding task, participants discriminated the direction of the 45°-oriented Gabor patch in the absence of any sounds. Each participant performed 72 trials of the thresholding task and the individual contrast thresholds were used for the main experiment. Prior to performing the thresholding task, participants performed 36 practice trials without a cue.
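The trial budget implied by this design can be worked out directly. The sketch below assumes that the three trial types occur in exact thirds over the whole session (the text states the proportions but not how they were balanced) and counts trials before any artifact rejection. The variable names are illustrative only.

```python
N_BLOCKS = 12
TRIALS_PER_BLOCK = 72
TRIAL_TYPES = ("short_soa_130ms", "long_soa_630ms", "no_target")

# Total trials per participant before artifact rejection.
total_trials = N_BLOCKS * TRIALS_PER_BLOCK

# Assumption: exact thirds across the session.
trials_per_type = {name: total_trials // len(TRIAL_TYPES) for name in TRIAL_TYPES}

# Cue-elicited ERPs use the trials with no visual stimulation in the analysis window.
eeg_trials = trials_per_type["long_soa_630ms"] + trials_per_type["no_target"]

# Behavioral accuracy is only defined on trials that contained a target.
behavior_trials = trials_per_type["short_soa_130ms"] + trials_per_type["long_soa_630ms"]

print(total_trials, trials_per_type, eeg_trials, behavior_trials)
```

With these numbers each participant contributes 864 trials in total, 288 per trial type, and up to 576 trials to both the cue-elicited ERP analysis and the behavioral analysis.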

EEG recording and analysis

Electroencephalogram (EEG) was recorded continuously from 32 Ag/AgCl electrodes mounted in an elastic cap and amplified by an ActiCHamp amplifier (BrainProducts, GmbH). Electrodes were arranged according to the 10–20 system. The horizontal electrooculogram (HEOG) was recorded from two additional electrodes placed on the external ocular canthi which were grounded with an electrode placed on the neck of the participant. The vertical electrooculogram was measured at electrodes FP1 or FP2, located above the left and right eye, respectively. All scalp electrodes were referenced to the right mastoid online and were digitized at 500 Hz.

Data processing was carried out using EEGLAB30 and ERPLAB31 toolboxes and custom-written scripts in MATLAB (The MathWorks, Natick, MA). Continuous EEG data were bandpass filtered offline (Butterworth filter, 0.01–112.5 Hz). Data were epoched from −1000 ms to +1200 ms with respect to the onset of the auditory cue. Artifacts were detected in the time window −800 to 800 ms, and trials contaminated with blinks, eye movements, or muscle movements were removed from the analysis. First, we used automated procedures implemented in ERPLAB31 (a peak-to-peak measure to detect blinks and a step function to detect horizontal eye movements at the HEOG channel). Second, for each participant, each epoch was visually inspected to check the automated procedure and the trials chosen for rejection were updated (cf.,32). Artifact-free data were digitally re-referenced to the average of the left and right mastoids. In order to avoid overlap of the target-elicited neural activity with the cue-elicited neural activity, only trials without a target and trials with a 630 ms cue-target SOA were included in the cue-elicited ERP analysis.
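To make the two automated checks concrete, here is a minimal NumPy sketch of a moving-window peak-to-peak test and a step-function test applied to a single-channel epoch. It is not the ERPLAB implementation. The window length, step size, and thresholds are hypothetical values chosen for illustration, since the text does not report the settings that were used.

```python
import numpy as np


def peak_to_peak_flag(epoch: np.ndarray, win: int, step: int, threshold_uv: float) -> bool:
    """Flag the epoch if max - min exceeds threshold_uv within any moving window."""
    for start in range(0, len(epoch) - win + 1, step):
        segment = epoch[start:start + win]
        if segment.max() - segment.min() > threshold_uv:
            return True
    return False


def step_flag(epoch: np.ndarray, win: int, step: int, threshold_uv: float) -> bool:
    """Flag the epoch if the mean of the second half of any window differs
    from the mean of its first half by more than threshold_uv."""
    half = win // 2
    for start in range(0, len(epoch) - win + 1, step):
        first = epoch[start:start + half].mean()
        second = epoch[start + half:start + win].mean()
        if abs(second - first) > threshold_uv:
            return True
    return False


# Synthetic 1600 ms epochs at 500 Hz (800 samples); all values are made up.
n_samples = 800
clean = np.zeros(n_samples)
blink = clean.copy()
blink[400:450] = 150.0      # brief large deflection, like a blink at a frontal site
saccade = clean.copy()
saccade[400:] = 40.0        # sustained voltage shift, like a horizontal eye movement

WIN, STEP = 100, 25         # hypothetical: 200 ms window moved in 50 ms steps
print(peak_to_peak_flag(blink, WIN, STEP, threshold_uv=100.0))
print(step_flag(saccade, WIN, STEP, threshold_uv=25.0))
print(peak_to_peak_flag(clean, WIN, STEP, 100.0), step_flag(clean, WIN, STEP, 25.0))
```

The peak-to-peak test is sensitive to brief, large deflections, whereas the step test responds to sustained shifts in voltage, which is why the two are paired with blinks and horizontal eye movements respectively.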

ERPs elicited by the left and right noise bursts were averaged separately and were then collapsed across cue position (left, right) and hemisphere of recording (left, right) to obtain waveforms recorded ipsilaterally and contralaterally relative to the sound. The ERPs elicited by the central cues were obtained by averaging across the same lateral electrode positions across both hemispheres (bilateral) that were included in the peripheral cue analysis. ERPs were low-pass filtered (half-amplitude cutoff at 25 Hz; slope of 12 dB/octave) to remove high-frequency noise. Mean amplitudes for each participant and condition were measured with respect to a 200 ms prestimulus period (−200 to 0 ms from cue onset), and mean amplitudes were statistically compared using both repeated-measures Analyses of Variance (ANOVAs) and planned follow-up paired t-tests. Our main analysis was focused on the ERP activity during the Auditory-Evoked Contralateral Occipital Positivity (ACOP) time window. The ACOP—usually measured as a contralateral-vs.-ipsilateral ERP component—has been proposed as an index of exogenous attention10. Thus, based on previous studies on the ACOP10,18, the ERP amplitude was measured between 260 and 360 ms at four parietal-occipital electrode sites (PO7/PO8/P7/P8) separately for the hemisphere contralateral to the cued location, ipsilateral to the cued location, and over bilateral electrode sites following the central cue. A separate and more exploratory (though preregistered) analysis focused on frontal activity related to shifting vs. not shifting attention to peripheral and central cues respectively. Based upon prior research demonstrating shift-related activity at frontal sites in endogenous attentional cueing paradigms17, activity was measured between 300–500 ms at four frontal electrode sites (F3/F4/FC1/FC2).
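The collapsing step described above can be sketched as follows. This is an illustrative NumPy version, not the authors' MATLAB code. The data layout (a dictionary of averaged waveforms per cue and channel) and the function names are assumptions, and the waveforms are synthetic.

```python
import numpy as np

LEFT_SITES = ["PO7", "P7"]
RIGHT_SITES = ["PO8", "P8"]


def window_mean(wave: np.ndarray, times: np.ndarray, start: float, stop: float) -> float:
    """Mean voltage of a 1-D waveform between start and stop (seconds, inclusive)."""
    mask = (times >= start) & (times <= stop)
    return float(wave[mask].mean())


def contra_ipsi_central(erps, times, window, baseline=(-0.2, 0.0)):
    """Return baseline-corrected mean amplitudes (contra, ipsi, central).

    erps maps cue ('left', 'right', 'central') -> channel name -> averaged waveform.
    """
    def amplitude(cue, sites):
        wave = np.mean([erps[cue][ch] for ch in sites], axis=0)
        wave = wave - window_mean(wave, times, *baseline)
        return window_mean(wave, times, *window)

    # Contralateral: right sites for left cues and left sites for right cues.
    contra = np.mean([amplitude("left", RIGHT_SITES), amplitude("right", LEFT_SITES)])
    # Ipsilateral: sites on the same side as the cue.
    ipsi = np.mean([amplitude("left", LEFT_SITES), amplitude("right", RIGHT_SITES)])
    # Central cue: the same electrodes averaged bilaterally.
    central = amplitude("central", LEFT_SITES + RIGHT_SITES)
    return float(contra), float(ipsi), central


# Synthetic example: a 1 microvolt contralateral bump in the 260-360 ms window.
times = np.arange(-1000, 1200, 2) / 1000.0  # 500 Hz sampling, in seconds
bump = np.where((times >= 0.26) & (times <= 0.36), 1.0, 0.0)
flat = np.zeros_like(times)
erps = {
    "left": {"PO7": flat, "P7": flat, "PO8": bump, "P8": bump},
    "right": {"PO7": bump, "P7": bump, "PO8": flat, "P8": flat},
    "central": {ch: flat for ch in LEFT_SITES + RIGHT_SITES},
}
contra, ipsi, central = contra_ipsi_central(erps, times, window=(0.26, 0.36))
print(contra, ipsi, central)
```

In this toy case the contralateral amplitude exceeds both the ipsilateral and the central amplitudes, which stay equal. That is the pattern an enhancement-only account would predict.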

Topographical maps

To illustrate the scalp distribution of the different ERP measures, we created topographical maps using spline interpolation of the voltage differences between the cue conditions. To isolate the activity related to the three different cues, we created maps for the contralateral-minus-ipsilateral activity (the ACOP), as well as for the voltage differences between each lateralized activity and the non-lateralized activity elicited by the central cue (i.e., contralateral-minus-central; ipsilateral-minus-central). For contralateral-minus-ipsilateral topographies, values at midline electrode sites (e.g., POz) were set to zero17,19, and these difference voltage topographies were projected to the right side of the head. The contralateral/ipsilateral-minus-central topographies were plotted together, with differences in the ipsilateral hemisphere projected to the left side of head and differences in the contralateral hemisphere projected to the right side.

Statistical analyses

Behavior was analyzed by comparing accuracy (% correct) in the Gabor discrimination task separately for when a cue was presented at the same location as the visual target (valid trials) vs. at the opposite location (invalid trials) vs. at the center (central trials). Though the behavioral measure of interest was accuracy, we also analyzed reaction time (i.e., RT) in order to rule out any speed-accuracy trade-offs. Behavioral and EEG data were statistically analyzed using repeated-measures ANOVAs and paired t-tests (alpha = 0.05) using MATLAB (The MathWorks, Natick, MA). To compare accuracy and RT in each task following the different cue conditions, we performed 3 × 2 repeated-measures ANOVAs with factors of cue type (valid, invalid, or central) and cue-target SOA (130 ms or 630 ms). Note that the inclusion of the cue-target SOA factor is a departure from our pre-registered analysis but was necessary in order to compare the strength of the alerting response elicited by each cue. To compare ERP activity following each cue, we performed repeated-measures ANOVAs with a factor of hemisphere relative to cue type (contralateral to peripheral cue, ipsilateral to peripheral cue, bilateral for central cue) on the data separately for each of our a priori chosen electrode clusters and time window pairs. Both pre-registered and post-hoc t-tests were performed on the data and are appropriately noted in the Results section. Post-hoc t-tests were corrected for multiple comparisons using a Holm-Bonferroni correction33 and reported in corrected form.
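For readers unfamiliar with the Holm-Bonferroni procedure, the following is a minimal sketch of how corrected p-values can be computed. The example p-values are invented for illustration and are not taken from the study. The function is a generic implementation of the step-down method, not the authors' MATLAB code.

```python
def holm_bonferroni(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on.
    A running maximum keeps the adjusted values monotone, and they are capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical uncorrected p-values for three post-hoc comparisons.
raw = [0.01, 0.04, 0.03]
corrected = holm_bonferroni(raw)
print([round(p, 3) for p in corrected])
```

Here the adjusted values are approximately 0.03, 0.06, and 0.06. The largest raw p-value inherits 0.06 from the running maximum rather than keeping its own 0.04, so only the first comparison would remain significant at alpha = 0.05.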

An additional time–frequency analysis of lateralized and nonlateralized oscillatory activity in the alpha-band (8–13 Hz) was pre-registered. These data are not reported here because the critical a priori ANOVA comparing alpha-frequency activity across the three cue conditions (contralateral vs. ipsilateral vs. central) failed to reach statistical significance. However, the method and results of this analysis are outlined in the Supplementary Alpha Analysis Method and Supplementary Alpha Analysis Results.

Results

Behavior

Accuracy following each cue and cue-target SOA in the target discrimination task is plotted in Fig. 1B. In order to test for the presence of a behavioral cueing benefit at each SOA, a two-way repeated-measures ANOVA with factors of cue condition (valid, invalid, central) and SOA (short, long) was performed. This analysis revealed a significant main effect of SOA, F(1, 18) = 36.00, p < 0.001, ηp2 = 0.59, indicating that overall accuracy was higher in short SOA trials than long SOA trials. Additionally, there was a main effect of cue condition, F(2, 36) = 7.79, p = 0.002, ηp2 = 0.33, indicating that accuracy varied by cue condition. Finally, there was a trend towards a significant interaction between cue condition and SOA, F(2, 36) = 3.08, p = 0.06, ηp2 = 0.15. Preregistered follow-up t-tests were performed for the short SOA condition, revealing that accuracy was significantly higher following valid cues compared to invalid cues, t(18) = 4.39, p < 0.001, d = 1.01, and central cues, t(18) = 3.67, p = 0.002, d = 0.84. Critically, there was no significant difference between performance following invalid and central cues, t(18) = 1.69, p = 0.11, d = 0.39. This pattern of findings is generally in line with the predictions of a facilitation-only account of exogenous attention: there were behavioral benefits at the cued location relative to baseline, but no costs for the uncued location relative to baseline. Post-hoc t-tests were also performed on the long SOA data in order to test for the presence of cueing effects, which were not predicted given that exogenous attention effects are typically largest at short SOAs28. These t-tests, which were corrected for multiple comparisons because we did not pre-register them, demonstrated that accuracy was comparable across all cue conditions. Accuracy following valid cues did not differ significantly from accuracy following invalid cues, t(18) = 1.53, p = 0.52, d = 0.35, or central cues, t(18) = 0.25, p = 0.81, d = 0.06. 
Additionally, performance did not significantly differ following central and invalid cues, t(18) = 1.20, p = 0.50, d = 0.28.
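The effect sizes reported for the short-SOA comparisons are numerically consistent with computing Cohen's d for paired data as t divided by the square root of the sample size. The text does not state which formula was used, so the sketch below treats that formula as an assumption and simply checks it against the t and d values reported above for N = 19.

```python
import math

N_PARTICIPANTS = 19  # final sample size reported in the Method section


def cohens_d_from_paired_t(t_value: float, n: int) -> float:
    """Cohen's d for a paired design, assuming d = t / sqrt(n)."""
    return t_value / math.sqrt(n)


# (t, reported d) pairs for the short-SOA comparisons in the text.
short_soa_tests = {
    "valid vs. invalid": (4.39, 1.01),
    "valid vs. central": (3.67, 0.84),
    "invalid vs. central": (1.69, 0.39),
}

recomputed = {
    label: round(cohens_d_from_paired_t(t, N_PARTICIPANTS), 2)
    for label, (t, _) in short_soa_tests.items()
}
for label, (t, reported_d) in short_soa_tests.items():
    print(f"{label}: t = {t}, reported d = {reported_d}, recomputed d = {recomputed[label]}")
```

All three recomputed values match the reported ones to two decimal places, which supports but does not confirm the assumed formula.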

In order to confirm that any differences in accuracy were not the result of a speed-accuracy trade-off, we analyzed reaction times (i.e. RTs) to the target, plotted in Supplementary Fig. 1. We tested whether there were any differences in RT between our conditions of interest by performing a two-way repeated-measures ANOVA with factors of cue validity (valid, invalid, central cue) and SOA (short vs. long) on the RT data. Neither the main effect of SOA, F(1, 18) = 3.20, p = 0.09, ηp2 = 0.08, nor the main effect of cue condition, F(2, 36) = 1.23, p = 0.30, ηp2 = 0.08, reached significance. Additionally, there was no significant interaction between cue condition and SOA, F(2, 36) = 1.40, p = 0.26, ηp2 = 0.07. These findings demonstrate that higher accuracy following the valid vs. invalid and central cues at the short SOA cannot be explained by a trade-off between speed and accuracy.

Frontal ERPs

Previous research has demonstrated that slow positive deflections in the ERP emerge bilaterally over frontal areas following endogenous symbolic cues that prompt a shift of attention to peripheral locations vs. symbolic cues that do not prompt a shift of attention, termed the Shift-Related Positivity (SRP;17,19). Based upon this finding, we investigated whether a similar ERP signature associated with the spatial shifting of attention emerged following the random peripheral (shift) vs. central (no-shift) cues over the same time interval in the present study. This would provide support for the idea that our central and peripheral cues oriented attention differently and would also demonstrate the presence of a novel frontal ERP signature of exogenous attentional deployment to uninformative, peripheral auditory cues.

As can be seen in Fig. 2, we found a sustained bilateral positivity at frontal sites following the shift vs. no-shift cues. A one-way repeated-measures ANOVA with a factor of cue condition (contralateral to cued location, ipsilateral to cued location, bilateral for central cue) was performed on the ERP waveforms during the predefined SRP time window (300–500 ms post-cue). This analysis revealed a main effect of cue condition, F(2, 36) = 14.67, p < 0.001, ηp2 = 0.45, indicating a significant difference between the amplitudes of the waveforms. Planned follow-up t-tests indicated that both the ipsilateral waveform, t(18) = 4.03, p < 0.001, d = 0.93, and contralateral waveform, t(18) = 3.95, p < 0.001, d = 0.91, were significantly more positive than the waveform elicited by the central cue. Critically, there was no significant difference between the amplitude of the ipsilateral and contralateral waveforms, t(18) = 0.90, p = 0.38, d = 0.21, indicating that this bilateral frontal component was not sensitive to the specific peripheral location attention was shifted to, but instead indexes control processes related to the spatial orienting response more generally.

Figure 2

Grand-average ERP waveforms, mean amplitudes, and ERP topography during the Shift-Related Positivity (SRP) time window (300–500 ms). ERPs at frontal scalp sites (F3/F4/FC1/FC2) were collapsed over left- and right-cue conditions and left and right hemispheres to obtain waveforms recorded ipsilaterally and contralaterally to the cued location. For the central-cue condition, ERPs were computed by averaging bilaterally across the same set of electrodes. (A) Plot of cue-elicited ERP. There was a significant positivity in ipsilateral and contralateral hemispheres in comparison to the no-shift cue (central) during the a priori defined SRP time window (highlighted in gray). (B) Plot of average ERP magnitude during the a priori SRP time window (300–500 ms). There was a positivity in ipsilateral (ipsi) and contralateral (contra) cortex relative to the central cue. Error bars represent ± 1 standard error of the mean. Asterisks indicate a significant (p < .05) difference between conditions. (C) Topographical voltage map of the contralateral-minus-central and ipsilateral-minus-central ERP difference amplitudes during the SRP time window, with contralateral and ipsilateral differences projected to the right and left sides, respectively. The map demonstrates that the SRP was broadly distributed over bilateral frontal areas, concurrent with the posterior lateralized positivity.

Occipital ERPs

In order to investigate whether cross-modal exogenous attention improved performance on the visual task by facilitating visual-cortical processing at the cued location, suppressing visual-cortical processing at the uncued location, or both, we examined the ERPs elicited by the peripheral cues relative to the central cues at parietal-occipital electrode sites. In particular, we focused on the time window of the ACOP—an ERP component that has previously been associated with the exogenous orienting of attention10,18. The ACOP is based on a relative difference between neural activity across the hemispheres, so it is unclear whether it reflects a contralateral positivity, which would be consistent with visual-cortical enhancement of the attended location; or an ipsilateral negativity, which would be consistent with visual-cortical suppression of the unattended location; or both. If the ACOP reflects facilitation of information at the cued location, then we would expect the contralateral waveform (which reflects activity at the cued location) to be significantly greater in amplitude than the central-cue waveform. In this case, activity ipsilateral to the cue (which reflects activity at the uncued location) should be roughly equivalent to the central-cue baseline. Conversely, if the ACOP reflects suppression of unattended information, then we would expect the ipsilateral waveform to be significantly lower in amplitude than the central-cue waveform. In this case, the contralateral waveform should be comparable to the central-cue baseline. Finally, if the ACOP reflects both facilitation and suppression, then we would expect to observe both a contralateral increase and ipsilateral decrease relative to the central-cue baseline.

As shown in Fig. 3, the contralateral ERP waveform was more positive than both the waveform elicited over the ipsilateral hemisphere in response to the peripheral cue as well as the waveform elicited by the central cue over bilateral sites. Critically, there was no difference between the ipsilateral and central-cue waveforms. A one-way repeated-measures ANOVA with a factor of cue condition (contralateral to cued location, ipsilateral to cued location, bilateral for central cue) was performed on the ERP waveforms during the ACOP time window (260–360 ms post-cue). This analysis revealed a main effect of cue condition, F(2, 36) = 11.91, p < 0.001, ηp2 = 0.40, indicating a significant difference between the amplitudes of the waveforms. Planned follow-up t-tests indicated that the contralateral waveform was significantly more positive than both the ipsilateral, t(18) = 4.39, p < 0.001, d = 1.01, and central-cue waveform, t(18) = 4.57, p < 0.001, d = 1.05. There was no significant difference between the amplitude of the ipsilateral and central-cue waveforms, t(18) = 0.76, p = 0.46, d = 0.18.

Figure 3

Grand-average ERP waveforms, mean amplitudes, and ERP topography during the ACOP time window. For peripheral cues, ERPs were collapsed over left- and right-cue conditions and left and right hemispheres to obtain waveforms recorded ipsilaterally and contralaterally to the cued location. For the central-cue condition, ERPs were computed by averaging bilaterally across the same set of electrodes. (A) Plot of cue-elicited ERP. There was a significant positivity over the contralateral hemisphere in comparison to the no-shift cue baseline during the a priori defined ACOP time window (highlighted in gray), whereas the ipsilateral waveform was roughly equivalent to the baseline. (B) Plot of average waveform magnitude during the a priori ACOP time window (260–360 ms). There was a positivity over contralateral (contra) cortex relative to ipsilateral (ipsi) cortex and relative to the central cue. Error bars represent ± 1 standard error of the mean. Asterisks indicate a significant (p < .05) difference between conditions. (C) Topographical voltage map of the contralateral-minus-ipsilateral ERP difference amplitudes, projected to the right side of the scalp during the ACOP time window. This map demonstrates that the topography of the ACOP was distributed over parietal and occipital areas, with no evidence of lateralized frontal activity.

Discussion

How does exogenous attention improve visual perception? While several studies have looked at how endogenous attentional orienting affects visual-cortical processing prior to the onset of a target, it is largely unknown how an exogenous attention cue modulates visual-cortical responses. To fill this gap, we had subjects perform a cross-modal, exogenous cueing task in which either peripheral left and right sounds or a central, no-shift sound preceded a masked visual target while we recorded EEG. This central cue did not evoke lateral shifts of attention, which allowed us to use this condition as a baseline to differentiate between spatially specific increases and decreases in neural activity and behavioral performance triggered by the peripheral cues. In accordance with a classic cost–benefit analysis, any increases above this baseline performance or neural activity were interpreted as evidence of facilitation whereas any decreases below this baseline were interpreted as evidence of suppression. Therefore, if exogenous attention improves perception by facilitating visual processing at attended locations, we expected to see an increase in activity at the cued location; and, conversely, if exogenous attention improves perception by suppressing visual processing at unattended locations, we expected to see a decrease in activity at the uncued location. We found that accuracy on the target discrimination task was higher following valid vs. no-shift and invalid cues. Critically, this effect of cue validity was accompanied by an increase in neural activity over parieto-occipital cortex in the hemisphere contralateral to the cue location, responsible for processing the cued location, relative to the ipsilateral hemisphere as well as the no-shift cue condition baseline. 
These results indicate that exogenous orienting of spatial attention results in visual-cortical facilitation in the hemisphere contralateral to the attended location, with no signs of suppression in the opposite hemisphere.
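The cost–benefit logic described above can be expressed as two difference scores relative to the no-shift baseline. The following Python sketch uses hypothetical per-subject mean amplitudes; the function name and the numbers are illustrative and are not taken from the study's data.

```python
from statistics import mean

def cost_benefit(contra, ipsi, baseline):
    """Return (facilitation, suppression) indices relative to the no-shift baseline.

    facilitation > 0 : activity at the cued location exceeds baseline (benefit)
    suppression  < 0 : activity at the uncued location falls below baseline (cost)
    Each argument is a list of per-subject mean amplitudes in the same subject order.
    """
    facilitation = mean(c - b for c, b in zip(contra, baseline))
    suppression = mean(i - b for i, b in zip(ipsi, baseline))
    return facilitation, suppression

# Hypothetical per-subject amplitudes (microvolts) for four subjects.
contra = [1.2, 0.9, 1.5, 1.1]     # hemisphere contralateral to the peripheral cue
ipsi = [0.3, 0.1, 0.4, 0.2]       # hemisphere ipsilateral to the peripheral cue
baseline = [0.2, 0.2, 0.3, 0.3]   # bilateral average following the central cue

facilitation, suppression = cost_benefit(contra, ipsi, baseline)
```

In this toy example the facilitation index is clearly positive while the suppression index is near zero, which is the qualitative pattern a facilitation-only account would predict. In practice each index would be tested against zero across subjects, for example with a paired t-test.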

Several previous studies have used no-shift cues in behavioral paradigms with the goal of isolating costs and benefits during attentional orienting, with some studies finding costs, others benefits, and some both (e.g.,13,14). We believe that the discrepancies between studies are likely due to differences in the exact stimuli and response measures used. For example, other studies have used dependent measures such as RT16, gap size threshold in a Landolt square spatial acuity task14, and d’34 or contrast threshold15 in a variable-contrast orientation discrimination task. As behavioral responses are the accumulation of multiple sensory and cognitive processing steps, and different measures may vary in their sensitivity to these processes, it is difficult to pinpoint at what stage costs and benefits arise in these behavioral studies. Using neural measures, as done here, allows for direct assessment of costs and benefits prior to a behavioral response, demonstrating that early visual-cortical processing is enhanced at the attended location. Furthermore, the type of cue used as a neutral baseline condition may also play a role. The central sound used in our study was matched to the peripheral cues with regard to its low-level features (i.e., amplitude, frequency spectrum) and provided the same temporal, but not spatial, information about the visual targets. This means that the central and peripheral cues presumably had equivalent processing demands, served as equivalent alerting signals, and required roughly equivalent encoding times26. This is supported by our behavioral data, in which we see a decrement of similar magnitude in each cue condition at the long vs. short SOA, indicating that there are no large differences in the (non-spatial) alerting signal triggered by the central and peripheral cues. Moreover, the frontal ERP index of attentional shifting showed significantly different neural activity for peripheral relative to central cues. 
Thus, we think the central cue in our study acts as a valid baseline relative to the peripheral cues, as it differs only in the main dimension of interest: shifting attention to a new location vs. not.

In the present study we used auditory cues to orient spatial attention, which allowed us to examine cue-related activity over occipital cortex without the contamination of visual-evoked responses that are necessarily triggered when using visual peripheral cues. Previous work has shown that salient but uninformative peripheral sounds result in enhancements in the detection21, discrimination18,20, apparent contrast17,19 and even perceived latency35 of visual stimuli at the location of the sound in a diverse range of cross-modal cueing paradigms. Electrophysiological studies on the topic suggested that these changes in behavior are the result of enhanced early responses to the visual targets themselves when presented at the same location as the auditory cues, such as increases in the visually-evoked P1 amplitude17,19,35. However, later work demonstrated that visual-cortical processing is also altered following peripheral auditory sounds even in the absence of visual targets, as indexed by both a lateralized positive ERP (i.e., ACOP10,18,20), and lateralized decreases in the power of alpha-frequency oscillations18,36,37. The present study extends these findings by demonstrating that the lateralized changes in visual-cortical processing that follow the salient cue are representative of facilitated processing at the location of the cue rather than suppression of the uncued area, using a novel auditory no-shift cue that served as a baseline and a cost–benefit analysis approach.

Though our attention is often captured by auditory and not only visual stimuli in the real world, one question that may arise is whether our cross-modal findings can be generalized to intra-modal studies of attention. Critically, several cross-modal attention paradigms have found effects similar to those in visual-only attention paradigms (for a review, see11), and a recent experiment directly comparing ERP activity triggered by peripheral visual and auditory cues found lateralized biasing signals over parietal-occipital cortex similar to those observed here (Störmer, McDonald, & Hillyard, 2019). This suggests that the present results would likely hold for visual cues as well. Thus, the present paradigm extends previous work on visual and audiovisual attentional cueing, demonstrating that exogenous attention effectively operates across modalities, in support of a supra-modal account of spatial attention38. Furthermore, in our study none of the sounds were predictive of target location, target timing, or even the occurrence of a target, eliminating the possibility that the observed effects are due to endogenous components of attention or expectations (e.g., signal probability) about the audio-visual stimuli, which are also known to interact across sensory modalities (for recent reviews, see39,40).

While we interpret this occipital activity as being the result of a supra-modal attentional system, it should be noted that non-human primate work indicates that there are direct connections between auditory and visual sensory cortices that may allow for bottom-up stimulation of visual cortex by auditory stimuli41. However, the time course of the positivity that we observe over occipital cortex argues against this hypothesis, as one would expect this activity to emerge much more quickly if it were the result of direct connections between the sensory cortices. Indeed, human imaging work has demonstrated that audiovisual interactions, in which the neural response to the simultaneous presentation of an auditory and visual stimulus is greater than the sum of responses to each stimulus individually, emerge in visual cortex as early as 45–73 ms after the onset of the stimuli42,43,44,45; for a review, see46. The much later onset of the positivity in the present study (~ 250 ms) therefore suggests that more elaborate processing occurs between the sensory processing of the auditory cue and the activation of visual cortex, perhaps in higher-level multisensory or attentional areas10. Supporting this argument, prior human ECoG work has found that peripheral auditory tones result in separate early (< 100 ms post-tone) and later (> 200 ms post-tone) activation of visual cortex47. This suggests that an earlier cross-modal activation of visual cortex, which may represent activity resulting from direct connections between the areas, is separable from the later and presumably attention-related activity that we observe in the present study.

The present findings imply a difference in the neural mechanisms engaged by exogenous and endogenous attention, as prior research has shown that endogenous attention results in both costs and benefits in behavior48,49 and facilitation and suppression of neural activity50,51. That is, the present results indicate that exogenous attention only facilitates processing of information at the location of a salient cue, whereas endogenous attention seems to additionally involve suppressive attentional mechanisms that possibly emerge later in time. This conclusion contrasts with recent work proposing that the lateralized positivity observed over parietal-occipital cortex following peripheral auditory stimuli (i.e., ACOP) is instead representative of the suppression of unattended locations52. That study demonstrated that the same peripheral sounds that elicit an ACOP during a spatial localization task may not elicit an ACOP during non-spatial tasks, such as pitch discrimination. Additionally, the study compared visual-cortical activity in each hemisphere in the localization task that elicited an ACOP to activity in tasks that did not elicit a reliable ACOP. When making this comparison, the authors found an ipsilateral decrease in the localization task relative to both ipsilateral and contralateral activity across all other tasks. This was interpreted as evidence that the ACOP reflects suppression of the unattended location. However, it is unclear whether the ACOP-absent trials in the aforementioned study served as an effective baseline in the same way that a central no-shift cue does. Given that processing demands varied between the tasks that did and did not elicit an ACOP, and subjects performed these tasks in separate blocks, it is difficult to directly and clearly interpret differences in neural data elicited by the sounds in the separate tasks. 
Additionally, the sounds were still presented peripherally in the tasks that did not elicit an ACOP, and thus may have engaged spatial exogenous attention to some extent. Further supporting the present study’s conclusion, the facilitation-only account of exogenous attention comports well with other recent evidence demonstrating that lateralized occipital activity predicts behavioral performance on validly, but not invalidly, cued trials20,36. This suggests that only neural activity at the location of a cue (presumably reflecting the facilitative effects of attention) is related to behavioral performance in exogenous cueing tasks. However, it is important to note that these studies, including ours, only investigated the first ~ 600 ms of cue-elicited neural activity, in line with the rapid time course of exogenous attention. Accordingly, it is possible that there is a later, suppressive component of attention that only emerges at longer timescales, and thus would likely only be engaged during endogenous attention due to its relatively sluggish response. Indeed, prior research measuring alpha frequency oscillatory activity following informative central cues suggests that suppression may emerge on a later timescale than facilitation6. It might also be the case that separate neural measures (e.g., ERPs and oscillatory activity) reveal different patterns of hemisphere-specific neural processing in response to peripheral cues. This is an important question for future research, as in the present study we did not observe reliable lateralized alpha activity in response to the cues. The present data also demonstrate a novel similarity between the exogenous and endogenous orienting of attention, as we find that exogenous cues elicit a Shift-Related Positivity over frontal areas that has previously only been demonstrated in response to endogenous cues17,19. 
Taken together, these findings suggest that the shifting of exogenous and endogenous attention may be mediated by similar control processes in frontal areas, but that these shifts result in different effects in posterior cortex.

Overall, our data demonstrate that the exogenous orienting of spatial attention results in visual-cortical enhancement at the location of a salient cue but does not result in spatially specific suppression of visual processing at uncued locations (i.e., opposite hemisphere). Broadly, these findings suggest that exogenous and endogenous spatial attention differ in how they bias visual-cortical processing to support effective stimulus selection.