Robust spatial ventriloquism effect and trial-by-trial aftereffect under memory interference

Our brain adapts to discrepancies in the sensory inputs. One example is provided by the ventriloquism effect, experienced when the sight and sound of an object are displaced. Here the discrepant multisensory stimuli not only result in a biased localization of the sound, but also recalibrate the perception of subsequent unisensory acoustic information in the so-called ventriloquism aftereffect. This aftereffect has been linked to memory-related processes based on its parallels to general sequential effects in perceptual decision-making experiments and insights obtained in neuroimaging studies. For example, we have recently implicated memory-related medial parietal regions in the trial-by-trial ventriloquism aftereffect. Here, we tested the hypothesis that the trial-by-trial (or immediate) ventriloquism aftereffect is indeed susceptible to manipulations interfering with working memory. Across three experiments we systematically manipulated the temporal delays between stimuli and response for either the ventriloquism or the aftereffect trials, or added a sensory-motor masking trial in between. Our data reveal no significant impact of any of these manipulations on the aftereffect, suggesting that the recalibration reflected by the trial-by-trial ventriloquism aftereffect is surprisingly resilient to manipulations interfering with memory-related processes.

www.nature.com/scientificreports/ orientations 22 , the accumulation of the ventriloquism aftereffect 9 , and longer reaction times reduce perceptual biases in visual discrimination 23 . Collectively, the functional analogy of the trial-wise ventriloquism aftereffect with serial dependencies in perceptual decision making and the neuroimaging studies implying medial parietal regions in the aftereffect, make a strong case for a memory-related component in the trial-by-trial ventriloquism aftereffect. However, a number of studies suggested that spatial recalibration may not be easily affected by higher cognitive processes 24,25 and in particular studies on the long-term ventriloquism aftereffect have also suggested an independence of memory processes 26 . Overall the literature seems divergent, and most studies focused on the long-term ventriloquism aftereffect. Hence, the role of memory-related processes specifically in the trial-wise ventriloquism aftereffect remains unclear.
We set out to test the hypothesis that the trial-wise ventriloquism aftereffect is related to memory processes, and hence susceptible to manipulations known to interfere with working memory. In three experiments we (i) manipulated the delay between the inducing audio-visual (ventriloquism) stimulus and the associated response, (ii) manipulated the delay between stimulus and response in the auditory trial, or (iii) used a masker trial in between the audio-visual and the auditory trial to interfere with mnemonic processes. We found that none of the manipulations led to a consistent and robust change in the aftereffect bias, suggesting that the ventriloquism aftereffect is more robust to memory manipulations than expected from similar studies on serial dependencies in perception.

Multisensory response biases.
In three experiments we probed participants' judgments of sound location in audio-visual (AV) trials and subsequent auditory (A) trials (Fig. 1). In the AV trials, spatially localized (5 locations: − 16°, − 8°, 0°, + 8°, +16°) sounds were accompanied with spatially localized random-dot patterns presented at either the same location or a range of spatial discrepancies (ΔVA). This allowed us to quantify the ventriloquism effect, reflecting the bias induced by the visual stimulus on the perceived location of the simultaneous sound. The responses in the subsequent A trials allowed us to probe the trial-wise ventriloquism aftereffect, reflecting the persistent influence of the multisensory discrepancy experienced in the AV trial on the judgement of a subsequent unisensory sound 4,7 . Each experiment manipulated the sequence of AV-A trials in a different manner: experiment 1 induced a variable delay before the response in the AV trial, experiment 2 induced a variable delay before the response in the A trial, and experiment 3 introduced a sensory-motor masker stimulus in between AV and A trials (Fig. 1).
Manipulating the delay within audio-visual trials. In the first experiment (n = 20) we manipulated the temporal delay between the audio-visual stimulus and participant's response in the AV trial, which could take one of the five average values (0.5 s, 1 s, 2 s, 4 s, and 8 s; ± 200 ms uniform random jitter in each trial). This manipulation could in principle affect both the ventriloquism bias and the aftereffect bias. Figure 2A shows the resulting biases (as participant-averaged data) for the two extreme values of the delay (0.5 s and 8 s).
We implemented two separate analyses to probe whether the biases differed as a function of delay. In a first approach, we fit a GLMM across all single-trial biases, conditions and participants. Extending model 3 by the delay as an additional factor provided "very strong" evidence in favor of no effect of delay (BIC 55940 without and 55963 with the delay; ΔBIC = 23). The model parameters for the full model including the delay and its interactions revealed no significant effect of delay (Table 1).
In a second approach, we fit the participant and condition-wise trial-averaged biases using individual regression models and investigated whether the two slopes (linear, nonlinear) differed as a function of delay using a non-parametric test (Fig. 2B): neither slope revealed an effect of delay (Friedman's nonparametric ANOVA, reporting FDR corrected p values; linear term: χ(4,99) = 4.5, p fdr = 0.95, quadratic term: χ(4,99) = 3.1, p fdr = 1.2). Hence, our data offer no evidence that manipulating the delay between the AV stimulus and the associated response affects the strength of the ventriloquism bias.
The same manipulation also did not affect the ventriloquism aftereffect (Fig. 2A). The addition of the delay in model 1 resulted in a reduced fit (BIC without 52322 and with delay 52339; ΔBIC = 17 providing "very strong" evidence in favor of no effect) and in the combined model neither the effect of delay nor its interaction with ΔVA were significant (Table 1). The analysis of participant- and condition-wise biases led to the same conclusion (linear term: χ(4,99) = 6.8, p fdr = 0.6; Fig. 2B).
Manipulating the delay within auditory trials. In a second experiment (n = 21) we tested whether adding a similar temporal delay between the auditory stimulus and the response in the A trial would affect the two biases (Fig. 3A). First, and as expected given that the manipulation was specific to the A trial, we found that the ventriloquism bias was not affected: adding the delay as factor did not improve model fit (BIC without 57840 and with delay as factor 57865; ΔBIC = 25 providing "very strong" evidence in favor of no effect) and the factor delay and its interactions were not significant (Table 2). The condition- and participant-wise biases were also  Fig. 3B). Interestingly, the aftereffect also did not change with the delay in this experiment: the addition of the delay did not improve the model fit (BIC 54617 vs. 54633; ΔBIC = 15 providing "very strong" evidence in favor of no effect) and the interaction terms were not significant (Table 2). The same conclusion was supported by the participant- and condition-wise biases (linear term: χ(4,104) = 8.1, p fdr = 0.4; Fig. 3B).
Masking audio-visual trials. In a third experiment (n = 22) we tested whether the ventriloquism aftereffect could be manipulated by adding a sensory-motor masker in between the AV and A trials. The masker comprised a sensory component both in the visual (full-screen random dot pattern) and auditory modalities (a spatially diffuse sound) and required the participants to make a motor response to also mask potential memory traces of the preceding motor response in the AV trial. For comparison, participants performed blocks with the interleaved masking trial and without, with the order of masking and non-masking blocks randomized across participants. We ensured that the overall temporal delay between the AV and A trials was comparable across these two conditions.

Figure 1. The experimental manipulation of each experiment. The top sequence shows the AV trial with a varying delay between stimulus and response (Experiment 1). The middle sequence shows Experiment 2, in which the delay between stimulus and response in the A trial was manipulated. In both Experiments 1 and 2 the inter-trial intervals had a default delay of 800-1200 ms (uniform). The bottom sequence shows the masking trials inserted in between the AV and A trials in Experiment 3. The masking trial was present in half the AV-A sequences; in the other half there was no masking trial (control) but the inter-trial interval between AV-A trials was extended (1800-2000 ms) to obtain an overall similar delay between AV and A trials in the masking and control conditions. Masking and control trials were blocked. T stands for 'Tone', displayed to indicate to the participants which stimulus to localize. Yellow speakers are placeholders for the speaker located behind the screen (invisible to the participant). The red square in the 'Mask response' is the target, and the white vertical line is the cursor.
As in the two preceding experiments, we observed robust ventriloquism and aftereffect biases in the AV and A trials (Fig. 4A). As expected given the experimental design, the ventriloquism effect did not differ significantly between conditions. The addition of the masking condition as factor did not improve model fit (BIC without 60978 and with masker 60998; ΔBIC = 20 providing "very strong" evidence in favor of no effect) and the interaction terms were not significant (Table 3). The analysis of participant-wise biases confirmed this (linear slope: χ(1,43) = 0.7, p fdr = 0.8, quadratic slope: χ(1,43) = 0.7, p fdr = 0.8; Fig. 4B).
Interestingly, the masking manipulation did not affect the ventriloquism aftereffect. The addition of the masking condition did not improve the model fit (BIC without 58439 and with masker 58455; ΔBIC = 15 providing "very strong" evidence in favor of no effect) and the model parameters revealed no significant contribution of condition (Table 3). Finally, the analysis of individual participant data revealed no significant difference in slope (χ(1,43) = 1.6, p fdr = 0.8; Fig. 4B).

Discussion
We tested the hypothesis that the trial-wise ventriloquism aftereffect is susceptible to manipulations known to interfere with working memory. Across three variations of an established ventriloquism paradigm we found no evidence that prolonged temporal delays (up to 8 s) or sensory-motor masking trials reduce the strength of the ventriloquism aftereffect bias.

The ventriloquism aftereffect and working memory. The motivation to probe the ventriloquism aftereffect against memory manipulations came from two observations. First, previous neuroimaging studies on the cerebral origin of the ventriloquism aftereffect have suggested a role of medial parietal regions such as the precuneus 7,15 . Studies on working memory or spatial navigation tasks have implicated these regions in maintaining a persistent representation of multisensory spatial information 10,11,29,30 . We have previously shown that parietal representations of multisensory spatial information are maintained between trials in the ventriloquism paradigm, and are predictive of the aftereffect bias 7 . Hence, a role of short-term memory in the aftereffect is directly suggested by neuroimaging results. Second, previous work on serial dependencies in unisensory perceptual tasks has suggested that these dependencies do not arise from sensory-level effects but rather reflect higher cognitive processes such as memory or the use of remembered information for subsequent decisions 16,18,22 . For example, a study on serial dependencies in visual judgements has used very similar mnemonic manipulations of temporal delays to show that the trial-wise biases are affected by the delay manipulation 22 . Hence, the observation that sensory and meta-cognitive variables carry over between trials even in simple laboratory paradigms also suggests a role of memory-related processes in the ventriloquism aftereffect.
While we did not find a dissipating effect of temporal delays or sensory maskers on the trial-wise ventriloquism aftereffect, a previous study suggested that intervening audio-visual trials before the auditory trial lead to a reduction of the trial-wise aftereffect 4 . This suggests that multisensory information that bears the very same task-relevance can reduce the aftereffect, while a masking stimulus that comprises distinct audio-visual features and pertains to a different task does not, as seen in the present study. In addition, one study used repetitive AV trials to induce the ventriloquism aftereffect and found that this accumulates over repetitions but also dissipates over delays of 5 s and 20 s when no sensory interference is present 9 . Hence, the combined evidence suggests that the trial-wise aftereffect and that induced by prolonged and repetitive exposure to a consistent audio-visual discrepancy differ in their sensitivity to memory interference. Still, future work is needed to directly test this hypothesis within the same participants and experimental design.
Does the lack of evidence speak for the absence of an effect? The absence of a significant result can naturally arise from a number of reasons. First, the sample size may have been too small. We based the sample size on general recommendations for behavioral tests 31 and our previous studies 7, 14 . Across several studies we found that a sample size of about 20 participants is sufficient to reliably detect both the ventriloquism effect and its aftereffect and the present data confirm this. Furthermore, the obtained effect sizes for the absence of an effect of delay or masking conditions (BIC differences) clearly speak against an effect rather than being inconclusive. Hence, the collective evidence obtained across the three experiments provides converging evidence that the ventriloquism aftereffect is robust against the tested manipulations.
It could also be that the tested memory manipulations did not interfere sufficiently with the relevant neural processes maintaining the sensory information. Longer delays may have a stronger influence 9 , but may come at the cost of overall reduced attention to the task, making it difficult to disentangle attention and memory effects. Here we restricted the maximal delay to 8 s to facilitate the collection of sufficiently many trials for all conditions within a single experimental session. Also, the masking stimuli may not have been sufficiently salient or comprehensive to fully mask all relevant memory traces. For example, the acoustic masker had the same spectral composition as the task-relevant sound, and although presented diffusely from all speakers, may have been effectively perceived as a sound with a centrally located center of gravity. Future studies could test alternative manipulations such as more extensive masking stimuli that may provide a more comprehensive sensory-motor interference, or could consider the use of a dual-task paradigm enhancing the simultaneous memory load. In fact, two previous studies considered either a dual-task paradigm 32 or diverted attention 25 and found that these did affect, but did not abolish, the aftereffect. All in all, more systematic work using experimental manipulations that prove to affect memory in additional control paradigms (known to depend on short-term memory) is required to confirm the observed robustness of the trial-by-trial spatial ventriloquism aftereffect.
Implications for understanding the neural underpinnings of the ventriloquism aftereffect. Previous work has implicated parietal regions and also early sensory regions in the ventriloquism aftereffect. For example, work on the long-term aftereffect suggested that the underlying processes involve the recalibration of early sensory representations, more so than relying only on high-level processes in fronto-parietal regions 15,33 . In contrast, in a recent study we found a primary role of parietal regions in mediating the trial-by-trial effect and contributing to long-term recalibration as well 7,14 . Combined with the behavioral results obtained here, these neuroimaging studies suggest that the trial-wise aftereffect is not completely mediated by parietal regions involved in short-term memory, but rather originates from more distributed processes comprising regions that are insensitive to the present memory manipulations. One possibility for future work could be to directly quantify the maintenance of the audio-visual information received in the ventriloquist trial based on single-trial classification 14 in order to probe the efficacy of the memory manipulations and to determine whether and where in the brain either the trial-wise or the cumulative aftereffects are established despite memory interference. Such a comparative approach seems particularly necessary given the partly divergent results pertaining to the robustness of the ventriloquism aftereffects emerging on a trial-by-trial or much longer timescale.

Methods
We report data from three experiments, in which a sample of 20, 21 (delay paradigm) and 22 (masking paradigm) right-handed healthy young adults participated (age range: Exp 1: 19-30, mean ± SD: 23.1 ± 2.87; Exp 2: 18-30, mean ± SD: 23.4 ± 3.17; Exp 3: 20-30, mean ± SD: 25.3 ± 2.57). As the data were collected anonymously it is possible that several participants participated in more than one experiment. All had tested normal vision and reported normal hearing and no history of neurological or psychiatric disorders. Each participant provided written informed consent and was compensated monetarily. The study was conducted in accordance with the Declaration of Helsinki and was approved by the local ethics committee of Bielefeld University.

General experimental setup and task.
The design of the experiments followed previous studies on the ventriloquism aftereffect 4,7 . Each of the three experiments was based on the same single-trial localization task designed to probe both the audio-visual spatial ventriloquism effect and its aftereffect 4,7 . Participants sat 135 cm in front of an acoustically transparent screen (Screen International Modigliani, 2 × 1 m) with their head on a chin rest. Sounds were presented using a multi-channel soundcard (Creative Sound Blaster Z), amplified via an audio amplifier (t.amp E4-130, Thomann Germany) and played from one of five speakers located at − 16°, − 8°, 0°, + 8°, + 16° (0° = center) (Monacor MKS-26/SW, MONACOR International GmbH, Germany) behind the screen. The acoustic stimulus was a 1300 Hz sine wave tone (50 ms duration) sampled at 48 kHz and presented at 64 dB r.m.s. Visual stimuli were projected (Acer Predator Z650, Acer Inc., Taiwan) onto the screen. The visual stimulus was a cloud of white dots dispersed following a two dimensional Gaussian distribution (N = 200 dots, SD of vertical and horizontal spread 1.6°, width of a single dot = 0.12°, duration = 50 ms). Stimulus presentation was controlled using the Psychophysics toolbox 34 for MATLAB (The MathWorks Inc., Natick, MA) with ensured temporal synchronization of auditory and visual stimuli. Participants' task was to localize a sound during either Audio-Visual (AV: sound and visual stimulus presented simultaneously) or Auditory (A: only sound) trials, or to localize a visual stimulus during Visual trials (V: only visual stimulus). Participants responded with a mouse cursor. Each trial started with a fixation period (Exp1,2: uniform 1100 ms-1500 ms; Exp3: 1000 ms-1200 ms) followed by the stimulus (50 ms). After a random poststimulus period (see below) the response cue emerged, which was a horizontal bar along which participants could move a cursor. 
A letter 'T' was displayed on the cursor for 'tone' in the AV or A trials, and 'V' for the V trials to indicate which stimulus participants had to localize. There were no constraints on response times; however, the participants were instructed to respond intuitively, and to not dwell too much on their response. Inter-trial intervals varied randomly (see below). A typical sequence of trials is depicted in Fig. 1.

Specific experimental designs. Experiments 1 and 2 manipulated the delay between the sensory stimulus and its respective response by inducing a variable delay (5 levels with mean delays of 0.5 s, 1 s, 2 s, 4 s, and 8 s) between stimulus and the response cue. The precise delays were randomly jittered (uniform ± 200 ms) around these mean values to avoid participants forming specific expectations. Experiment 1 manipulated this delay for the AV trial, experiment 2 for the A trial. Each experiment consisted of 5 blocks, with each block comprising a sequence of 75 AV-A trials, and 15 interleaved V trials. For the AV trials, the locations of auditory and visual stimuli were drawn semi-independently from the 5 locations to yield a range of different audio-visual discrepancies (abbreviated ΔVA in the following; see below). For the A or V trials, stimulus locations were drawn from the 5 locations randomly. The audio-visual discrepancies in the AV trials took one of the following 5 values (ΔVA: − 24°, − 8°, 0°, + 8°, + 24°), with the combinations of discrepancies and temporal delays changing pseudo-randomly across trials. Each combination of ΔVA and delay was repeated 15 times. Experiment 3 separated the AV and A trials by a sensory-motor masking trial. Masked and non-masked trials were blocked. The block (masked vs. non-masked) order was randomized across participants. The masking trial consisted of an audio-visual display (duration for both audio and visual stimulus: 100 ms) and required participants to make a motor response. 
The visual mask was a more dispersed version of the standard visual stimulus with a SD of 5° (instead of 1.6°), centered at a location sampled randomly from [− 10°, 10°]. The auditory mask was the same standard sound stimulus but played from all 5 speakers and hence devoid of spatial information (Fig. 1B, bottom). The motor masking task was to bring the cursor appearing randomly along the horizontal line to the middle red target box (Fig. 1B, bottom). Experiment 3 comprised equal numbers of masking trials and no-masking (control) trials per level of ΔVA. Given that the stimulus and response in the masking trials required additional time, we extended the inter-trial interval between AV and A trials in the no-masking condition so that the average duration between the response in the preceding AV trial and the subsequent stimulus of the A trial was comparable between AV-A sequences with and without the masking trial (Fig. 1B, bottom).
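The factorial structure of the delay experiments can be checked with a short sketch: 5 levels of ΔVA × 5 mean delays × 15 repetitions gives 375 AV-A sequences, which matches 5 blocks of 75. The Python below is only an illustration, not the original MATLAB presentation code. It assumes that ΔVA is defined as visual minus auditory location and that, for a given ΔVA, the stimulus pair is drawn uniformly from all admissible location pairs; the function names `valid_pairs` and `build_trials` are hypothetical.

```python
import itertools
import random

LOCATIONS = [-16, -8, 0, 8, 16]       # stimulus positions in degrees
DISCREPANCIES = [-24, -8, 0, 8, 24]   # ΔVA levels in degrees
MEAN_DELAYS_S = [0.5, 1, 2, 4, 8]     # mean stimulus-to-response-cue delays
JITTER_S = 0.2                        # uniform jitter of ± 200 ms
REPETITIONS = 15                      # repetitions per ΔVA × delay combination


def valid_pairs(delta_va):
    """Return all (auditory, visual) pairs with visual - auditory == delta_va.

    The sign convention (visual minus auditory) is an assumption.
    """
    return [(a, v) for a in LOCATIONS for v in LOCATIONS if v - a == delta_va]


def build_trials(seed=0):
    """Build a shuffled list of AV trials covering the full factorial design."""
    rng = random.Random(seed)
    trials = []
    for delta_va, mean_delay in itertools.product(DISCREPANCIES, MEAN_DELAYS_S):
        for _ in range(REPETITIONS):
            # Assumption: uniform choice among the admissible location pairs.
            a_loc, v_loc = rng.choice(valid_pairs(delta_va))
            delay = mean_delay + rng.uniform(-JITTER_S, JITTER_S)
            trials.append({
                "delta_va": delta_va,
                "a_loc": a_loc,
                "v_loc": v_loc,
                "mean_delay": mean_delay,
                "delay": delay,
            })
    rng.shuffle(trials)
    return trials


trials = build_trials()
```

With five equally spaced locations, the extreme discrepancies of ± 24° can each be realized by only two location pairs, which may be one reason the locations are described as drawn "semi-independently".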

Data analysis.
The behavioral responses obtained in each trial were converted into response biases following previous studies 4 . The single-trial ventriloquism effect (ve) in the AV trials was defined as the difference between the reported sound location (R_AV) and the actual sound location (A_AV): ve = R_AV − A_AV. The single-trial ventriloquism aftereffect (vae) in the A trials was defined as the difference between the reported sound location (R_A) and the mean reported location for all A trials of the same stimulus position (μR_A), i.e., vae = R_A − μR_A. We adapted this procedure following previous work 4,7 to ensure that any overall bias in sound localization (e.g. a tendency to perceive sounds as closer to the midline than they actually are) would not influence this bias measure. Both biases are systematically related to the audio-visual discrepancy (ΔVA) in a linear, but possibly also in a nonlinear, manner 14,27,28 . For the ventriloquism bias the linear dependency describes the fusion of both stimuli for the response 27,35 , while the nonlinear dependency describes the reduced tendency to bind multisensory stimuli when these are seemingly discrepant and not judged as arising from a common cause 27,35 . This nonlinear dependency of the ventriloquism bias on ΔVA follows Bayesian models of sensory causal inference and has been shown to better capture the behavioral bias in many ventriloquism-like paradigms compared to a pure linear model 27,36-38 . Given that the ventriloquism aftereffect is directly related to the sensory information received during the AV trial 6,8,39,40 , a similar linear and possibly nonlinear dependency is expected between the aftereffect bias and ΔVA. To determine the best dependency of each bias on ΔVA for the present dataset, we first compared three candidate models describing each bias. 
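The two bias definitions above can be illustrated with a minimal Python sketch. It is not the analysis code used in the study; the function names and the toy numbers are hypothetical, and all locations are in degrees.

```python
from collections import defaultdict
from statistics import mean


def ventriloquism_effect(reported_av, actual_a):
    """ve = R_AV - A_AV, computed per AV trial."""
    return [r - a for r, a in zip(reported_av, actual_a)]


def ventriloquism_aftereffect(reported_a, actual_a):
    """vae = R_A - mean(R_A) over all A trials sharing the same sound position."""
    responses_by_position = defaultdict(list)
    for r, a in zip(reported_a, actual_a):
        responses_by_position[a].append(r)
    mean_by_position = {a: mean(rs) for a, rs in responses_by_position.items()}
    return [r - mean_by_position[a] for r, a in zip(reported_a, actual_a)]


# Toy data (hypothetical): two AV trials and four A trials.
ve = ventriloquism_effect(reported_av=[2.0, -5.0], actual_a=[0, -8])
vae = ventriloquism_aftereffect(
    reported_a=[-6.0, -4.0, 9.0, 7.0], actual_a=[-8, -8, 8, 8]
)
```

Subtracting the position-wise mean response, rather than the true position, removes any constant localization bias at each position, so the vae values average to zero within each sound position by construction.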
The respective GLMMs were fit across all single-trial biases (ve or vae) from all participants across all three experiments, regardless of the specific memory or masking manipulation. Here, and in the following, (ΔVA)^½ stands for the signed square-root of the magnitude of ΔVA (i.e. sign(ΔVA) * sqrt(abs(ΔVA))), bias stands for the single-trial bias (ve or vae), and subj stands for the participant ID. The specific form of nonlinear dependency was chosen based on previous work 14,27,28 . Models were fit using a maximum likelihood procedure using the Laplace method in Matlab R2017a (fitglme.m). These models were compared based on their respective BIC. Interpretations of differences in BICs were based on established criteria, with values larger than 6 corresponding to "strong" and those larger than 10 to "very strong" evidence 41 . This revealed (see Results) that model 3 provided the best fit for the ventriloquism bias and model 1 for the aftereffect.
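The equations of the three candidate models are not reproduced in the text above. One plausible reconstruction is given below, as a sketch only. It assumes a participant-specific random intercept and assumes that model 2 contains only the nonlinear term; the text itself indicates only that model 1 is the linear model (used for the vae) and that model 3 combines the linear and nonlinear terms (used for the ve).

```latex
\begin{align}
\text{Model 1:}\quad \mathrm{bias} &\sim \beta_0 + \beta_1\,\Delta VA + (1 \mid \mathrm{subj}) \\
\text{Model 2:}\quad \mathrm{bias} &\sim \beta_0 + \beta_2\,(\Delta VA)^{1/2} + (1 \mid \mathrm{subj}) \\
\text{Model 3:}\quad \mathrm{bias} &\sim \beta_0 + \beta_1\,\Delta VA + \beta_2\,(\Delta VA)^{1/2} + (1 \mid \mathrm{subj}) \\
(\Delta VA)^{1/2} &:= \operatorname{sign}(\Delta VA)\,\sqrt{\lvert \Delta VA \rvert}
\end{align}
```

Under this definition the nonlinear regressor is, for example, about 4.90 for ΔVA = + 24° and about − 2.83 for ΔVA = − 8°, so it compresses large discrepancies relative to the linear term while preserving their sign.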
We then used two approaches to probe whether the ventriloquism bias or the aftereffect are significantly affected by the manipulations of the delays or the masking condition. Each experiment was analyzed separately in the following two ways. In a parametric approach, we extended the above models (model 1 for the vae; model 3 for the ve) by the trial-specific delay (in milliseconds) or the masking condition as additional (continuous or categorical) factors, including their interaction with the linear (and possibly also nonlinear) ΔVA-dependencies. Again we compared BIC values between the respective model without delay (masking condition) and the model including these. In addition, we investigated the respective model parameters and their confidence intervals (Tables 1, 2 and 3).
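The BIC comparison in the parametric approach can be sketched as follows. The thresholds of 6 and 10 are the ones given above; the function name, the return format and the label used for differences of 6 or less are illustrative assumptions, since the text does not name that range.

```python
def bic_evidence(bic_without, bic_with):
    """Compare a model without and with an added factor (delay or masking).

    Lower BIC indicates the preferred model. Returns the BIC difference
    (with minus without), a verbal evidence label, and the preferred model.
    """
    delta = bic_with - bic_without
    magnitude = abs(delta)
    if magnitude > 10:
        strength = "very strong"
    elif magnitude > 6:
        strength = "strong"
    else:
        # Assumption: the source gives no label for differences of 6 or less.
        strength = "below strong"
    favoured = "without factor" if delta > 0 else "with factor"
    return delta, strength, favoured


# Values reported for the ventriloquism bias in Experiment 1.
result = bic_evidence(bic_without=55940, bic_with=55963)
```

For the Experiment 1 values this yields a difference of 23 in favor of the model without delay, which corresponds to the "very strong" evidence for no effect reported in the Results.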
In the second approach we asked whether the distribution of the condition-wise and participant-wise (trial-averaged) biases shows a significant pattern indicative of an effect of the delay (masking) manipulation. For this we modelled the trial-averaged participant-wise biases against a linear (vae) or combined linear and nonlinear (ve) dependency on ΔVA. We then used Friedman's non-parametric ANOVA to quantify whether the regression betas for the linear or nonlinear terms differed as a function of delay (masking condition). Here we corrected across multiple tests using the Benjamini & Yekutieli method for the false discovery rate (FDR) 42 .
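The FDR step can be illustrated with a small, dependency-free Python sketch of the Benjamini & Yekutieli adjustment. This is a generic textbook implementation, not the code used in the study; the function name and the `cap` flag are illustrative assumptions, and the source does not state which set of tests was pooled for the correction.

```python
def benjamini_yekutieli(p_values, cap=True):
    """Return Benjamini-Yekutieli adjusted p values in the original order.

    Each sorted p value p_(i) is scaled by m * c(m) / i, where m is the
    number of tests and c(m) = sum_{k=1..m} 1/k, and the result is made
    monotone by a running minimum taken from the largest p value downwards.
    """
    m = len(p_values)
    c_m = sum(1.0 / k for k in range(1, m + 1))
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = float("inf")
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m * c_m / rank)
        # Optional cap at 1 (illustrative choice, see lead-in).
        adjusted[idx] = min(running_min, 1.0) if cap else running_min
    return adjusted


adjusted = benjamini_yekutieli([0.01, 0.04, 0.03])
uncapped = benjamini_yekutieli([0.9, 0.95], cap=False)
```

Because the scaling factor m * c(m) / i can exceed 1 / p, adjusted values larger than 1 arise whenever the result is not capped.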