Voice-selective prediction alterations in nonclinical voice hearers

Auditory verbal hallucinations (AVH) are a cardinal symptom of psychosis but also occur in 6–13% of the general population. Voice perception is thought to engage an internal forward model that generates predictions, preparing the auditory cortex for upcoming sensory feedback. Impaired processing of sensory feedback in vocalization seems to underlie the experience of AVH in psychosis, but whether this is the case in nonclinical voice hearers remains unclear. The current study used electroencephalography (EEG) to investigate whether and how hallucination predisposition (HP) modulates the internal forward model in response to self-initiated tones and self-voices. Participants varying in HP (based on the Launay-Slade Hallucination Scale) listened to self-generated and externally generated tones or self-voices. HP did not affect responses to self vs. externally generated tones. However, HP altered the processing of the self-generated voice: increased HP was associated with increased pre-stimulus alpha power and increased N1 response to the self-generated voice. HP did not affect the P2 response to voices. These findings confirm that both prediction and comparison of predicted and perceived feedback to a self-generated voice are altered in individuals with AVH predisposition. Specific alterations in the processing of self-generated vocalizations may establish a core feature of the psychosis continuum.

presented tone 12–15,18,19 . As the auditory N1 is primarily generated in the primary and secondary auditory cortices 20–22 , these findings suggest that the suppression effect associated with the efference copy reflects reduced activity in these brain regions. When auditory feedback does not match the predicted sensation, a prediction error signal is generated, resulting in an increased N1 response 3,15 . Larger N1 responses may signal increased attention to an unexpected sensory event (prediction error) 23–25 .
Sensory-related suppression effects have also been observed in the P2 component of the ERP 3,26 . Whereas N1 suppression is typically directly associated with the expected sensory feedback, the P2 may indicate a more conscious distinction between self-generated and externally-generated sensory events 15,18,19 . However, P2 suppression is not consistently reported. Whereas P2 suppression in response to self-generated tones was observed in button-press tasks (for example 15,18,19 ), lack of P2 suppression to self-generated speech sounds in talking tasks has been proposed as a mechanism that allows preservation of the sensory experience of voice feedback during speech generation 27 . These inconsistent findings indicate that the type of task may account for differences in P2 amplitude modulation.
Of note, these ERP studies focus on neural responses obtained after the onset and during the processing of auditory feedback. However, patterns of neural activity preceding sensory feedback can shed light on the critical stage of sensory prediction formation per se. Specifically, pre-stimulus alpha activity is suggested to reflect the prediction of expected sensory consequences of an action 28,29 . Pre-stimulus alpha power is enhanced in sensory cortices prior to self-induced speech 28 , pure tones 29 , or visual stimuli 30 . Further, increased pre-stimulus alpha power for self-generated sounds is associated with larger N1 suppression 29 . Hence, N1 suppression may index the transfer of an efference copy of a motor command to the auditory cortex, with the size of the N1 modulation reflecting how well the sensory consequences of an action match the prediction (i.e., the magnitude of the prediction error), whereas pre-stimulus oscillatory power may reflect prediction formation per se 29 .

Internal Forward Model and Auditory Verbal Hallucinations
AVH are one of the cardinal symptoms of schizophrenia and are experienced by up to 70% of schizophrenia patients 31 . However, they are also present in 6–13% of the general population 32,33 . The experience of AVH in psychotic and nonpsychotic individuals seems to engage similar cognitive mechanisms and brain areas 34,35 . This suggests a neural substrate specific to AVH rather than schizophrenia. Nonetheless, despite numerous attempts to explain the neurofunctional mechanisms of AVH, they remain one of the notoriously unexplained symptoms in psychosis.
A substantial body of evidence shows that a failure to distinguish between internally and externally generated sensory signals (e.g., one's own voice vs. somebody else's voice) may underlie the experience of AVH 34,36,37 . This is reflected in a reduced N1 suppression effect when listening to real-time feedback of one's own voice 6,11,38 or in button-press tasks (contrasting self-initiation of a sound with passive exposure to the same sound) 13 . However, the later P2 ERP component as well as pre-stimulus oscillatory EEG activity have not been systematically analyzed in these studies. Further, as most of the existing studies involved chronic schizophrenia patients, it is possible that confounding effects associated with medication, hospitalization, and the presence of negative symptoms may have mediated the reported N1 results.
The study of nonpsychotic individuals who hear voices thus provides a necessary next step to understand the role of altered voice processing in AVH. These individuals are characterized by an increased tendency to falsely report the presence of a voice in bursts of noise 39 or in conditions of stimulus ambiguity in signal detection tasks 35 . Moreover, they recognize words in degraded speech earlier than controls and before being explicitly informed of its intelligibility 40 . Nonetheless, it remains unclear how these individuals process self-generated vs. externally-generated voice and sound feedback. If altered sensory feedback is observed as a function of increased hallucinatory predisposition, this would support the psychosis continuum hypothesis 31 , revealing common physiological brain processes underlying psychotic-like symptoms in nonpsychotic participants.

The Current Study and Hypotheses
The current study probed whether and how hallucination predisposition modulates the processing of fully predictable (i.e., self-generated) and less predictable (i.e., externally generated) auditory stimuli. By presenting both tones and voices after or in the absence of an action (button press), we examined whether potential problems with prediction formation (pre-stimulus alpha power) and sensory feedback (N1 and P2) are a characteristic in this sample of the general population, and whether the processing of simple (tones) and more complex (voice) auditory stimuli differ in this regard.
A well-established button-press paradigm 15,18,19 was used. In addition to investigating classical ERP components (N1 and P2), we performed a time-frequency analysis of pre-stimulus EEG activity in the alpha range (8–12 Hz). In line with the notion that the efficacy of an internal model is a crucial determinant of the experimental performance, these measures were taken as indices for three different processes: the formation of a prediction or efference copy (pre-stimulus alpha power); the comparison between predicted and perceived sensory feedback (N1); the conscious detection of a self-initiated sound (P2).
Our central hypothesis was that the experience of hallucinated voices involves alterations in the internal forward model. If psychotic-like experiences, such as AVH, are elicited by the same underlying neurocognitive mechanisms as in schizophrenia, nonpsychotic individuals with high hallucination predisposition (HP) should exhibit a similar pattern of EEG activity to that observed in schizophrenia. By forming an efference copy based on finger tapping, the auditory cortex will be prepared for incoming sensation and its activity will be attenuated in participants with lower HP (N1/P2 suppression) but not in those with higher HP 13 . Further, we hypothesized that pre-stimulus alpha power would be reduced for self-generated stimuli as a function of increased HP, reflecting altered prediction formation. We specifically hypothesized that altered prediction and sensory feedback in high HP would be enhanced for voice stimuli 41 .

Methods
Participants. The study involved two stages. In stage 1, a large sample of college students from different universities in Portugal (N = 354) was recruited to complete an online version of the 16-item Launay-Slade Hallucination Scale (LSHS) 42 . The total score ranges between 0 and 64, with higher scores indicating higher hallucination predisposition. Responses are provided on a 5-point scale (0 = "certainly does not apply to me"; 1 = "possibly does not apply to me"; 2 = "unsure"; 3 = "possibly applies to me"; 4 = "certainly applies to me"). In stage 2, we recruited 49 participants from stage 1, who consented to be contacted for further research on voice processing. Over a 12-month recruitment period, these 49 individuals were interviewed in more detail about their experiences and clinical history. All participants completed a thorough clinical assessment, which established that, for those who reported hearing 'voices' (and thus scored higher on the scale), voices were distinct from thoughts, were experienced at least once a month, and were unrelated to drug or alcohol abuse, and that participants neither had a psychiatric diagnosis nor had received one in relation to voice-hearing.
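The scoring rule described above (16 items, each coded 0 to 4, summed to a total between 0 and 64) can be sketched as follows. This is a minimal illustration only: the function name `lshs_total` and the input checks are assumptions for this example and are not taken from the study materials.

```python
def lshs_total(responses):
    """Sum 16 item responses (each coded 0-4) into a total score (0-64).

    Hypothetical helper: the name and the validation are illustrative.
    """
    if len(responses) != 16:
        raise ValueError("expected 16 item responses")
    if any(r not in (0, 1, 2, 3, 4) for r in responses):
        raise ValueError("each response must be an integer from 0 to 4")
    return sum(responses)


# A respondent answering "unsure" (2) on every item.
print(lshs_total([2] * 16))
```

Higher totals indicate higher hallucination predisposition, as stated in the text.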
From the 49 participants recruited, 32 participants varying in their LSHS scores (total score: M = 22.72, SD = 14.03, range = 0-51; auditory score: M = 3.19, SD = 2.95, range = 0-9) agreed to take part in an EEG experiment (M age = 22.77, SD = 4.06, age range = 18-32 years; 18 females). Note that recruitment of nonpsychotic voice hearers is challenging because of concerns about stigma, as noted in previous studies 40 . A priori power calculations using G*Power-3 statistical software 43 indicated that, with α = 0.05 and power (1 − β error probability) = 0.90, a minimum sample of 24 participants (based on a medium effect size of 0.25) would be required to detect such effects.
Participants were all right-handed 44 , reported normal or corrected-to-normal visual acuity, and normal hearing. All participants provided informed consent and were reimbursed for their time, either by course credits or a voucher. The study was conducted in accordance with the Declaration of Helsinki and was approved by the local Ethics Committee of the University of Minho, Braga (Portugal).

Stimuli. A 680 Hz tone (50 ms duration; 70 dB sound pressure level [SPL]) and a pre-recorded self-voice speech sound (vowel /a/) were presented in Experiments 1 and 2, respectively. Before Experiment 2, a voice recording session took place: participants were instructed to repeatedly vocalize the syllable "ah". Recordings were made with an Edirol R-09 recorder and CS-15 cardioid-type stereo microphone 37,45 . After the recording, the best voice sample of the vowel /a/ from each participant (i.e., constant prosody; maximum duration of 300 ms) was selected. The voice sample was edited to eliminate background noise (using Audacity software), and a Praat script was applied to normalize intensity to 70 dB. The stimulus for each participant was saved as a .WAV file. Hence, all voice stimuli across participants had the same duration (300 ms) and intensity (70 dB SPL).
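For illustration, a pure tone with the frequency and duration given above can be synthesized as below. This is a sketch under stated assumptions, not the procedure used in the study: the sampling rate (44.1 kHz) and the digital amplitude (0.5) are assumed values, no onset or offset ramp is applied because none is described, and the 70 dB SPL presentation level depends on hardware calibration that cannot be reproduced in code.

```python
import math


def sine_tone(freq_hz=680.0, duration_s=0.050, sample_rate=44100, amplitude=0.5):
    """Return the samples of a pure sine tone as a list of floats.

    sample_rate and amplitude are assumptions for this sketch; the text
    specifies only the frequency (680 Hz) and the duration (50 ms).
    """
    n_samples = round(duration_s * sample_rate)
    return [
        amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


tone = sine_tone()
print(len(tone))  # number of samples in the 50 ms tone
```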
Procedure. In the EEG experiments, participants sat comfortably at a distance of 100 cm from a desktop computer monitor in a sound-attenuated and electrically shielded room. Each experiment included three conditions (see Fig. 1): auditory-motor (AMC), auditory-only (AOC), and motor-only (MOC) 15,18,19 . In the AMC, a button press instantaneously elicited a tone (Experiment 1) or the prerecorded voice of the participant (Experiment 2). Participants pressed a button approximately every 2.4 seconds. In the AOC, participants were instructed to passively listen and attend to the tones (Experiment 1) or to their pre-recorded self-voice (Experiment 2). The acoustic stimulation from the AMC was recorded on-line and used as the auditory sequence that was passively presented to participants in the AOC. The MOC represents a control condition that allowed controlling for motor-related artifacts via a button press (AMC-MOC): participants performed self-paced button presses approximately every 2.4 seconds but no tone (Experiment 1) or voice (Experiment 2) was elicited by the button press. No measurable sound was emitted by the presses. The AMC always preceded the AOC but the MOC was randomized across participants.
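The motor correction mentioned above (AMC minus MOC) amounts to a sample-by-sample subtraction of the motor-only waveform from the auditory-motor waveform, so that the remainder can be compared with the auditory-only response. A minimal sketch follows; the function name and the list-based representation of an averaged waveform are assumptions made for this example.

```python
def motor_corrected(amc, moc):
    """Subtract the motor-only (MOC) waveform from the auditory-motor (AMC) one.

    Both inputs are averaged waveforms sampled at the same time points.
    Hypothetical helper for illustration only.
    """
    if len(amc) != len(moc):
        raise ValueError("waveforms must have the same number of samples")
    return [a - m for a, m in zip(amc, moc)]


# Toy amplitudes in microvolts (not data from the study).
print(motor_corrected([-3.0, -5.0, 1.0], [-1.0, -0.5, 0.5]))
```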
The experimental blocks were preceded by two training blocks 15,18,19 . The training block contained 200 trials. Participants performed correct taps in 75% of trials. No feedback was provided during the experimental blocks. In each of the AMC and AOC, 100 trials were recorded. The MOC in each experiment consisted of 100 trials. Both experiments took place in the same EEG session, but their order was counterbalanced across participants.
The presentation and timing of the stimuli were controlled by Presentation software (version 16.3; Neurobehavioral Systems, Inc.). Auditory stimuli were presented via Sennheiser CX 300-II headphones. A BioSemi tapping device was used to record the finger taps.
EEG Data Acquisition and Analysis. EEG data were recorded using a 64-channel BioSemi Active Two system in a continuous mode at a digitization rate of 512 Hz and stored on hard disk for later analysis.
EEG data preprocessing details are presented as Supplementary Material. The ERP analysis followed Knolle et al. 15,18,19 . The waveforms revealed two components: a negative component peaking at approximately 100 ms and a positive one peaking at approximately 200 ms (see Fig. 2). Peak amplitudes were calculated in the time windows of 70-110 ms for the N1 and 170-210 ms for the P2 15,18,19 .
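The peak measure described above can be sketched as the most negative value in the 70-110 ms window for the N1 and the most positive value in the 170-210 ms window for the P2. The sampling rate (512 Hz) comes from the acquisition settings; the epoch start (−200 ms relative to sound onset), the function name, and the inclusive window bounds are assumptions for this illustration, since preprocessing details are given only in the Supplementary Material.

```python
def peak_in_window(erp, t_start_ms, t_end_ms, polarity,
                   sfreq=512.0, epoch_start_ms=-200.0):
    """Return the peak amplitude of an averaged waveform within a time window.

    polarity: "negative" picks the minimum (e.g., N1),
              "positive" picks the maximum (e.g., P2).
    epoch_start_ms is an assumed value, not taken from the paper.
    """
    if polarity not in ("negative", "positive"):
        raise ValueError("polarity must be 'negative' or 'positive'")
    in_window = []
    for i, value in enumerate(erp):
        t_ms = epoch_start_ms + 1000.0 * i / sfreq  # latency of sample i
        if t_start_ms <= t_ms <= t_end_ms:
            in_window.append(value)
    if not in_window:
        raise ValueError("no samples fall inside the requested window")
    return min(in_window) if polarity == "negative" else max(in_window)


# Toy waveform: flat except for two deflections near 100 ms and 200 ms.
erp = [0.0] * 256
erp[154] = -4.0   # about 100.8 ms after onset
erp[205] = 3.0    # about 200.4 ms after onset
n1 = peak_in_window(erp, 70, 110, "negative")
p2 = peak_in_window(erp, 170, 210, "positive")
print(n1, p2)
```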
In EEG time-frequency analyses, power analysis was performed based on time-frequency magnitude values. The mean power in the alpha band (8-12 Hz) was calculated in a pre-stimulus interval (−250 to 0 ms) 29 .
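Given a time-frequency power matrix, the pre-stimulus alpha measure described above reduces to an average over the cells that fall within 8-12 Hz and −250 to 0 ms. The sketch below assumes that the decomposition has already been computed (the method is not specified in this section) and that power is stored as `tfr[frequency_index][time_index]`; the function name and the inclusive bounds are likewise assumptions.

```python
def mean_band_power(tfr, freqs_hz, times_ms,
                    f_lo=8.0, f_hi=12.0, t_lo=-250.0, t_hi=0.0):
    """Average power over a frequency band and a time interval.

    tfr[fi][ti] holds power at frequency freqs_hz[fi] and latency times_ms[ti].
    The data layout and the inclusive bounds are assumptions of this sketch.
    """
    values = [
        tfr[fi][ti]
        for fi, f in enumerate(freqs_hz) if f_lo <= f <= f_hi
        for ti, t in enumerate(times_ms) if t_lo <= t <= t_hi
    ]
    if not values:
        raise ValueError("no cells fall inside the requested band and interval")
    return sum(values) / len(values)


# Toy matrix in which power simply equals the frequency of each row.
freqs = [6.0, 8.0, 10.0, 12.0, 14.0]
times = [-300.0, -250.0, -125.0, 0.0, 50.0]
tfr = [[f] * len(times) for f in freqs]
print(mean_band_power(tfr, freqs, times))
```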
Amplitude Statistical Data Analyses. Amplitude and power data were analyzed with linear mixed models using the lme4 46 and lmerTest 47 packages in the R environment (R 3.4.3, GUI 1.70), which were used to estimate fixed and random coefficients. In contrast to the more traditional repeated-measures ANOVA analysis, LMER allows controlling for the variance associated with random factors such as random effects for participants in ERP amplitude and EEG power measures 48 . The default variance-covariance structure, i.e. the unstructured matrix, was used 49 .

Results
Event-related potentials. N1. Intra-class correlation coefficients indicated that 41% of the total variance in the N1 response was accounted for by differences between participants. A Gaussian distribution of residuals was selected to run the mixed model and probability plots (quantile-quantile plots) confirmed its adequacy. Amplitude was included as outcome, participants were included as random effects, whereas stimulus type (tone, voice), condition (self, external), ROI, and LSHS Total were included as fixed effects.
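In a random-intercept model, an intra-class correlation of this kind is the share of total variance attributable to the between-participant component. The sketch below shows the arithmetic only; the variance components passed in are made-up values chosen to give a ratio of 0.41 and are not estimates from the study.

```python
def intraclass_correlation(var_between, var_residual):
    """Proportion of total variance due to between-participant differences."""
    total = var_between + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_between / total


# Hypothetical variance components (illustrative, not from the paper).
print(round(intraclass_correlation(0.41, 0.59), 2))
```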
The effects of stimulus type, condition, and ROI were significant. Specifically, we replicated the sensory N1 suppression effect: N1 was more negative for externally-generated compared to self-generated auditory stimuli. Based on our hypotheses, we tested whether LSHS Total scores had a specific impact on the N1 amplitude as a function of stimulus type and condition, probing the interaction between the three predictors (see Table 1). For the sake of simplicity and ease of interpretation, we defined the variable "interaction" (B) with 4 levels (2 stimulus types × 2 conditions) and tested the following model in R: m.N1 <- lmer(N1 ~ LSHS_Total * B + ROI + (1|Subject), data = N1, REML = FALSE); summary(m.N1), in which N1$B <- interaction(N1$StimulusType, N1$Condition). An increase in LSHS Total scores was associated with a less negative N1 response for self-generated tones compared to self-generated voices (β = 0.0423, SE = 0.0120, t = 3.535, p < 0.001) as well as with a less negative N1 for externally-generated voices compared to self-generated voices (β = 0.0301, SE = 0.0120, t = 2.517, p = 0.012; see Fig. 3). Specifically, the N1 amplitude in response to the self-generated voice was expected to be more negative than the N1 amplitude for the externally-generated voice if LSHS Total > 18.94, and the N1 amplitude for the self-generated voice was expected to be more negative than the amplitude for the self-generated tone if LSHS Total > 15.83. A separate analysis of midline electrodes (considering anterior [Fz, Cz] vs. posterior [CPz, Pz] electrode sites) using the same statistical model described before revealed similar effects.
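Two steps in this analysis can be illustrated with a short sketch. First, the 4-level factor B combines stimulus type and condition into one label (R's interaction() joins level names with a period). Second, a threshold such as those reported above is one way to express where a predicted difference between a condition and the reference condition changes sign: if that difference is modeled as an intercept difference plus a slope difference times LSHS Total, it crosses zero at minus the intercept difference divided by the slope difference. The function names are hypothetical, and the coefficients in the example are invented for illustration; they are not the fitted values from Table 1, and this is not claimed to be the authors' exact computation.

```python
def interaction_label(stimulus_type, condition):
    """Combine two factor levels into one label, e.g. 'voice.self'."""
    return f"{stimulus_type}.{condition}"


def crossover_score(intercept_diff, slope_diff):
    """Score at which intercept_diff + slope_diff * score equals zero."""
    if slope_diff == 0:
        raise ValueError("the two lines are parallel; no crossover exists")
    return -intercept_diff / slope_diff


levels = [interaction_label(s, c)
          for s in ("tone", "voice") for c in ("self", "external")]
print(levels)

# Invented coefficients: a difference of -0.6 at score 0 that grows by 0.03
# per scale point changes sign at a score of about 20.
print(round(crossover_score(-0.6, 0.03), 2))
```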
To probe the specific effect of AVH, we ran the same model replacing LSHS Total with LSHS Auditory (sum of items 4 -"In the past, I have had the experience of hearing a person's voice and then found no one was there"
P2. The analysis of P2 amplitude followed the same procedure as described above. Removing LSHS Total increased the goodness of fit of the model (χ²(10) = 2.859, p < 0.001), even though the Akaike's Information Criterion (AIC 52 ) for the complete model (AIC = 1849.9) was slightly lower than the AIC for the model without LSHS Total (AIC = 1850.8). Hence, LSHS Total was not included as a predictor of the P2 amplitude modulation.
HP specifically influenced alpha activity preceding the self-generated voice (see Table 2). An increase in LSHS Total resulted in a decrease in alpha power preceding self-generated tones compared to self-generated voices (β = −0.0015, SE = 0.0003, t = −3.870, p < 0.001) and preceding externally triggered voices compared to self-triggered voices (β = −0.0017, SE = 0.0004, t = −4.756, p < 0.001). As for the N1, we probed the specific effect of AVH by replacing LSHS Total with LSHS Auditory as a fixed factor. The effect of LSHS Auditory on pre-stimulus alpha power was not significant (β = 0.0015, SE = 0.0023, t = 0.623, p = 0.538).

Discussion
Why some people hear voices in the absence of external stimulation remains to be clarified. The current study probed how HP modulates the processing of auditory stimuli (tones vs. voices) that are anticipated as a consequence of a motor act (a button press) engaging an internal forward model. The study of nonpsychotic individuals with high HP represents an important step forward in probing the continuum hypothesis of psychosis. Here, we disentangled whether increased HP affects the formation of a prediction or efference copy (pre-stimulus alpha power), the comparison between predicted and perceived sensory feedback (N1), and/or the conscious detection of a self-initiated sound (P2). Our results suggest that an increase in HP is specifically related to alterations both in the generation of an efference copy and in the comparison between predicted and perceived sensory feedback. Further, they show that these alterations are more pronounced during the perception of one's own voice compared to simple tones.
The N1 suppression effect is reversed in high HP. Our findings replicate the classical sensory suppression effect (for example 5–12,15,18,19,53 ) both for tones and voices: self-generated sounds elicited a smaller N1 compared to externally generated sounds irrespective of stimulus complexity, an effect that was more prominent at anterior electrode sites 50,51 . As the sensory consequences of self-initiated sounds are precisely predicted such that the auditory cortex is prepared to receive sensory feedback, the N1 amplitude is suppressed. We also observed that self-voices elicited larger N1 amplitudes than tones, an effect that might be accounted for by differences in the duration of the two sound categories (50 ms for tones; 300 ms for voices): N1 amplitude was found to decrease with a shorter stimulus duration 54 , and to increase linearly with a longer tone duration 55 . However, given the complexity of factors known to modulate the N1 amplitude, other factors may have accounted for this effect (e.g., attention 54 ). Whereas the N1 attenuation effect to self-initiated tones was not affected by HP, the response to self-generated voices was. Higher HP resulted in an N1 enhancement rather than suppression: self-generated voices elicited a larger N1 amplitude modulation than externally-generated voices. Typically, an increased N1 response to self-generated auditory stimuli has been related to an increase in prediction error. When sensory feedback mismatches a prediction, the N1 suppression effect is reduced 3,56,57 . Further, when predictions are less specific, the N1 suppression effect is smaller 58 . Based on this evidence, one could also argue that when voices are less clearly predicted, sensory feedback is not attenuated. Hence, the increased N1 response to self-triggered voices may indicate that the self-voice is less accurately predicted in participants with higher HP, even though they trigger the voice feedback themselves.
Specific predictions are even more relevant in the case of more complex auditory stimuli, for which features such as stimulus frequency, onset, and intensity need to be incorporated into the prediction of the self-voice. Thus, a prediction error may arise from the comparison of a less specific/accurate prediction with the available sensory feedback. A less specific prediction may imply altered self-monitoring of speech, which has been consistently associated with AVH 34,36,37,59–62 .
We cannot rule out the contribution of attentional processes when looking at the enhanced N1 response as a function of increased HP. Increased attention to the to-be-presented voice may have prevented the motor-induced attenuation effect. Cognitive theories of AVH contemplate the role of biased attentional processes in some forms of AVH 63 . Attention and prediction processes may have opposite effects with regard to the N1 attenuation: whereas prediction reduces the N1, attention results in increased N1 amplitude 64,65 . An increased N1 response to self-triggered voices due to increased HP may indicate that participants with high HP focus their attention more on self-voice stimuli 64 .
The conscious detection of a self-initiated voice is not affected by HP. The P2 response was also suppressed to both self-initiated tones and voices compared to their passive presentation. The existing evidence is less consistent regarding the effects of predictable sensory feedback on the P2. The pattern we found here is compatible with our previous results on tones 15,18,19 . The P2 suppression effect may reflect the conscious detection of sensory feedback as self-generated, playing a role in the distinction of self and externally-produced sounds 15,18,19 . Both groups showed a similar P2 suppression effect for tones and voices. This similarity suggests that the conscious detection of a sound that follows a button-press as self-produced is not affected by HP. This may underlie the preserved sensory experience of a self-generated sound despite reduced sensory suppression. Similar findings (i.e., lack of differences in P2 amplitude relative to controls) were observed in talking or button-press paradigms in schizophrenia 13 or cerebellar patients 18,19 .
Increased expectancy for voices in hallucination predisposition. We also observed that HP modulated EEG activity before sound onset. Specifically, HP affected pre-stimulus alpha activity for self-generated voices only: increased HP resulted in increased alpha power. Activity preceding an action (e.g., pressing a button that elicits a sound) has been proposed to reflect the formation of an efference copy 27 . Increased pre-stimulus alpha power has been related to prediction effects (occurring in the medial prefrontal cortex 28 ) that allow preparing the auditory cortex for the processing of self-generated sounds 28,29 . Specifically, alpha power is sensitive to the precision of a prediction regarding an upcoming stimulus. As HP participants did not show the expected sensory attenuation effect to self-triggered voices, it is possible that enhanced predictability to relevant features of the self-voice results in increased synaptic gain that reverses the suppressive effects of the prediction (i.e., the N1 to self-triggered voices fails to be suppressed).

Implications for models of AVH and of a psychosis continuum. Although high HP participants showed a sensory attenuation effect that differs from that previously reported in schizophrenia (lack of a suppression effect) 11 , the increased N1 response to self-voices is partially consistent with an altered internal forward model in AVH. Whereas HP did not alter the processing of externally generated sounds (no difference between external tone and external voice), it did affect self-voice processing. Further, in both ERP and EEG oscillatory activity, the effects of HP were selective for voices, but not tones. A voice-specific, rather than generalized, alteration in sensory prediction may explain why hallucinated voices are the most common abnormal perceptual experience. It is worth noting that even though this was not explicitly reported in talking paradigms (i.e., in which voice stimuli were tested) in schizophrenia patients with AVH, the inspection of grand average waveforms suggests that HP affected the processing of self-generated vocal sounds more (N1 amplitude was more negative in the talking condition in patients with hallucinations compared to controls) than the processing of externally generated voices (no group differences) 11 . It is possible that, along the psychosis continuum, alterations in sensory feedback that can be predicted as a function of one's own action occur first for auditory stimuli with increased social relevance (e.g., voices) and, in clinical stages of the continuum, generalize to other types of sounds (e.g., simple tones). This hypothesis is admittedly speculative and needs to be tested in future studies with larger samples.
Together, the current findings suggest that sensory feedback to self-voice is altered in people with an AVH predisposition. Specific alterations in the processing of self-generated vocal sounds may thus establish a core feature of the psychosis continuum.

Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.