Introduction

The ability to make sense of the noisy information present in the world around us is crucial for our survival. Yet, what we perceive is not a veridical reproduction of the signals reaching our sensory apparatus, but it is instead an interplay between bottom-up processes and top-down predictions about the upcoming events1. Attempts to assess how expectations influence our perception show that we are more likely to report perceiving an expected than an unexpected stimulus2,3,4,5,6. However, although the facilitatory effects of expectation on perceptual processing have been found in the wider sensory literature, they usually conflict with work from the action domain7.

Being able to predict the sensory consequences of our own action constitutes a specific instance of predictive processing that is highly critical in perceiving behaviourally relevant events in our environment. Several lines of research have shown that actions suppress the processing of the self-generated reafferent input (e.g., action-induced blindness8, saccadic suppression9, self-generation of stimuli10). The attenuated physiological responses to self- compared to externally-generated inputs appear to be widespread throughout the animal kingdom and modality independent, being reported in a wide range of species11,12,13,14,15,16 and in several sensory modalities, including the auditory17,18,19,20,21,22,23,24,25,26,27,28, visual29,30,31,32, and tactile33,34,35. An influential proposal referred to as the ‘cancellation account’ attributes sensory attenuation to an efference copy of the motor command generated before or during an action that is sent from the motor to the corresponding sensory cortices36,37. This efference copy allows one to accurately predict the imminent stimulation resulting from the individual’s own action via internal forward modelling38. The resulting motor-driven predictions of sensory reafference (i.e., the “corollary discharge”) are then compared to the actual sensory consequences of one’s actions, and subsequently, only the difference between the two (i.e., prediction error) is sent to higher stages of the neuronal hierarchy for further processing1, effectively cancelling out responses to predictable input. The cancelling role of the motor-driven predictions in sensory cortices has been suggested to be of great ecological importance, as it contributes in prioritizing the newsworthy unpredictable information39,40,41, and shapes our perception of sense of agency42.

However, in the animal kingdom corollary discharge has been found to influence sensory processing in myriad ways besides cancellation of reafference43. Contrary to cancellation theories, recent sharpening models propose that perception is biased towards the expected input44,45 in line with evidence showing enhanced BOLD responses to self-generated stimuli46,47 and increased discharges in some neurons during self-initiated vocalizations48. The discrepancy between cancellation and sharpening accounts is also reflected in human studies attempting to assess the behavioural correlates of the neurophysiological effects of self-generation on stimulus processing. While self-initiated action effects have been typically found to be perceived as less ticklish33,49,50, less forceful35,51, or less loud52,53,54 than equivalent stimuli initiated by another person or by a computer, recent findings show enhanced perception for action-expected outcomes46,55. Collectively, the discrepancy in the results reported so far points to factors other than self-generation that may interactively modulate sensory processing during motor actions.

In a closer look, the mixed findings reported so far as concerns the neurophysiological and behavioural effects of motor predictions on sensory processing may be due to critical differences in the experimental paradigm, stimulus features, and obtained measures (see Table 1). On the one hand, animal studies with perceptual measures have reported both attenuation56,57 and enhancement58, but assess perceptual processing during locomotion compared to quiescence56,57,59 or in Go-No/Go tasks58. However, sensory processing during action may differ from processing of stimuli resulting from action as assessed in contingent paradigms with humans that typically compare action-predicted vs. unpredictable stimuli (i.e., self- vs. externally-generated35,52,53,54) or predicted vs. mispredicted stimuli (action-congruent vs. action-incongruent44), thus rendering it difficult to disentangle whether the observed effects are driven by specific motor-driven predictions or by unspecific arousal mechanisms56. Additionally, studies also differ in the task and stimulus intensities that they employ. Human studies reporting suppression typically use supra-threshold stimuli in discrimination paradigms and show modulations in perceptual bias (Point of Subjective Equality; PSE) rather than sensitivity measures (Just Noticeable Difference; JND)35,52,53,54. In contrast, evidence supporting sharpening accounts has been reported mostly in detection paradigms that obligatorily need to use near-threshold stimuli44,46,55,60. This line of work has reported changes in sensitivity in both directions46,60,61 (but see62,63 for no effects), but also in decision processes55. Collectively, these findings raise the possibility that the conflicting findings on the nature of the effects of action on the perceptual processing of self-initiated stimuli may depend on a handful of specific factors (i.e., action/no action comparisons vs. action-predicted/action-unpredicted comparisons; stimulus intensity) that may selectively affect certain aspects of perception (i.e., detection or discrimination ability; sensitivity or bias).

Table 1 Human studies assessing the behavioural effects of self-generation on auditory processing.

Recent work has indeed provided some evidence showing that sensory attenuation may be dependent on the stimulus intensity64,65 (but see66). Reznik and colleagues65 had participants judge the perceived intensity of self- and externally-generated sounds presented at a supra- or a near-threshold intensity. Unbeknownst to the participants, the two sounds were always presented at the exact same intensity, but they were asked to report which one of them was louder. Their results showed a significant interaction between intensity and sound source. While the supra-threshold self-generated sounds were perceived as less loud than the passive comparisons, the opposite effect was obtained for near-threshold intensities. That is, when the sensory consequences of participants’ movements were of low intensity, a significant sensory enhancement was observed, with the self-generated tones being judged as louder than the comparison passive tones. However, due to the experimental design of this study (i.e., no varying comparison intensities), no psychophysical measures (e.g., PSE, JND) could be obtained to further examine whether the modulatory effects of intensity on perceptual processing for self-initiated sounds are driven by changes in bias or sensitivity, respectively.

Taken together, the evidence reported so far suggests that the direction of self-generation effects may be dependent on the intensity and therefore the amount of sensory noise in the signal. Indeed, recent work has highlighted the role of sensory noise in driving perceptual processing, suggesting that enhanced sensory processing for unexpected events is dependent on the ‘newsworthiness’ of the signal, such that the less the sensory noise (i.e., high intensities), the higher the sensory precision of the signal, and thus the more informative the unexpected stimulus7,41. Yet, we reason that the findings obtained from the previous self-generation studies cannot provide solid conclusions on this matter, due to the use of a small range of intensities (either supra-threshold only52,53,54, near-threshold only46, or only one of each65). More importantly, the inconsistency between the studies raises the possibility of differential effects of self-generation on different aspects of perceptual processing. Although expectations have been found to yield differential effects on perceptual bias and sensitivity measures in the literature outside the action domain6,67, no systematic attempts have been made to date to assess whether motor actions alter our sensitivity or whether they bias the estimate of stimulus’ perceived loudness as a function of sound intensity.

The aim of the present study is twofold: We sought to elucidate the modulatory effects of intensity on the perceptual processing of self-generated sounds across the auditory intensity range, while systematically assessing whether the expected effects drive changes in sensitivity and/or perceptual bias. To this end, we employed a sound detection and a loudness discrimination task and compared the detection and discrimination sensitivity, as well as the possible bias in perceived loudness for self- vs. externally-generated sounds at both supra- and near-threshold intensities.

Based on previous studies with self-initiated sounds of high and low intensities, we expected to observe (i) sensory attenuation for self- compared to externally-generated sounds at supra-threshold intensities and (ii) sensory enhancement for self- compared to externally-generated sounds at near-threshold intensities. This interaction would be evident by better detection performance for the self- as compared to the externally-generated sounds46. Similarly, in the discrimination task, this interaction would be reflected in (i) lower PSE for self- compared to externally-generated sounds at supra-threshold intensities52,53,54,65 and (ii) higher PSE for self- compared to externally-generated sounds at near-threshold intensities65. Finally, based on previous studies52,53,54, we did not expect any significant differences in the JND, at least for the supra-threshold conditions.

The hypotheses and planned analyses for this study were preregistered on the Open Science Framework (https://osf.io/ypajr/). The Method and Results sections follow the preregistered plan.

Methods

The present study consisted of two two-alternative forced-choice (2AFC) tasks: a detection and a discrimination task. In the detection task, participants were presented with one sound at varying intensities and had to indicate whether it was presented in a first or a second interval of time, while in the discrimination task two sounds were presented in two different consecutive intervals of time and participants had to indicate whether the first sound (standard) or the second sound (comparison) was louder. The order of tasks was counterbalanced across participants.

Participants

Thirty-one healthy, normal-hearing subjects, participated in the present study. Participants were typically undergraduate university students at the University of Barcelona. Participants with hearing thresholds above 20 dB, psychiatric or neurological illness, aged below 18 or above 50 years old and who consumed drugs or pharmaceuticals acting on the central nervous system were excluded. Data from three participants (i.e., participants 2, 19, 25) had to be excluded due to technical problems or inability to comply with the task instructions, leaving data from twenty-eight participants (6 men, 22 women, Mage = 23, age range: 18–33 years). The sample size was defined based on the preregistered a priori power analysis. All participants gave written informed consent for their participation after the nature of the study was explained to them and they were monetarily compensated (10 euros per hour). Additional materials included a personal data questionnaire and a data protection document. The study was approved by the Bioethics Committee of the University of Barcelona and all provisions of the Declaration of Helsinki were followed.

Apparatus

The visual stimuli were presented on an ATI Radeon HD 2400 monitor. The auditory stimuli were presented via the Sennheiser KD 380 PRO noise cancelling headphones. To record participants’ button presses and behavioural responses, we used the Korg nanoPAD2. The buttons of this device do not produce any mechanical noise when pressed, and, thus, do not interfere with our auditory stimuli. The presentation of the stimuli and recording of participants’ button presses and responses were controlled using MATLAB R2007a (The Mathworks Inc., 2017), and the Psychophysics Toolbox extension68,69.

Stimuli

In the detection task we used pure tones presented binaurally with durations of 300 ms at a frequency of 1000 Hz (created using MATLAB R2007a; The Mathworks Inc., 2017). The sampling frequency was 44,100 Hz, the ramp duration (duration of the onset and offset ramps) was 25 ms and a number of 16 bits per sample46,65. The tone intensity ranged from 0 to 28 dB in steps of 4 dB for passive and active conditions.

For the discrimination task, we created pure tones with the same characteristics as those used in the detection task, except for the intensities. The intensities for the standard and comparison tones were partly based on those used in previous studies52,53,54,65. The standard tone was always presented at a fixed intensity, while the comparison intensities varied. Specifically, the standard tones had a fixed intensity of 74 dB for supra-threshold conditions, while for the near-threshold conditions we used a fixed intensity of 5 dB above the threshold as obtained from the audiometry for the 1000 Hz sounds65. The comparison supra-threshold stimuli varied randomly between 71 and 77 dB in steps of 1 dB, thereby resulting in seven possible comparison intensities: 71, 72, 73, 74, 75, 76, 7752,53,54. For near-threshold conditions, the comparison intensities were presented at intensities starting from 3 dB below to 3 dB above the standard intensity in steps of 1 dB, so as to match the comparison intensities of the supra-threshold conditions.

Procedure

Participants were seated in a soundproof chamber and auditory stimuli were presented to both ears via headphones. Visual stimuli were presented by a computer screen located in front of the participants. Prior to each task, hearing thresholds were assessed with a standard pure-tone audiometry. Additionally, practice blocks were used so that participants could familiarize themselves with each task, which also allowed us to obtain the stimulus-onset-asynchrony (SOA) between interval-cue presentation and button press in order to introduce the same visual-to-sound delay in the first passive trials.

Detection task

Participants performed a 2-Alternative Forced Choice auditory detection task, where they had to report whether a sound of varying intensities was presented in interval one or two (Fig. 1a). The sounds were either self-generated (active trials) or passively presented by the computer (passive trials).

Figure 1
figure 1

Schematic illustration of the experimental design. (a) Detection task: Each trial started with a fixation cross, followed by two intervals. In active trials, participants were instructed to press a button in each interval (“Press” cue) and a sound was triggered either in 1st or in the 2nd one (in the example shown here, the sound is presented in the 1st interval). In passive trials, the sound was passively presented (“Listen” cue). Participants had to respond whether they heard the sound in the 1st or in the 2nd interval. (b) Discrimination task: Each trial started with a fixation cross, followed by two sounds. The first sound was either self- (active trials; “Press” cue) or externally-generated (passive trials; “Listen” cue) and was presented at an intensity of either 74 dB (supra-threshold intensity) or 5 dBs above each participant’s audiometric threshold (near-threshold intensity). The second sound was always externally-generated (“Listen” cue) and ranged ± 3 dB in steps of 1 dB relative to the first one. Participants had to respond which one was louder.

Every trial started with a fixation cross with a duration of 500 ms followed by two consecutive intervals with a duration of 800 ms each. In the active trials, the sound presentation was contingent on participants’ button press. That is, participants had to press a button with their right hand once the visual cues “PRESS 1” and “PRESS 2” appeared in order to generate a sound that was triggered by the button press in either the 1st or the 2nd interval. For the intervals containing the sound (either 1st or 2nd), the participants’ button press triggered the sound only if he/she pressed the button up to 300 ms prior to the interval offset. This allowed us to control that the sound had always a 300-ms duration in case a participant delayed the button press. In the passive trials, participants were passively presented with a sound in one of the two intervals indicated by the visual cues “LISTEN 1” and “LISTEN 2”. To match the timing of the sound in the active conditions, the sound was presented after an interval that was randomly selected from the participants’ distribution of press times in the active trials performed until the current trial. Thus, the timing of the stimulus presentation was equal for the two types of trials, thereby minimizing any effects of differences in sound timing on the ability to detect self- and externally-generated sounds70,71. After the offset of the second interval, the question “Did you hear the sound in the 1st or 2nd interval?” appeared on the screen for 1500 ms and participants had to press a button with their left hand within this time window to respond. For both trials, once a response was provided the question displayed on the screen disappeared immediately. The next trial started always after the 1500 ms response window was over.

The whole task was divided into 25 blocks consisting of 40 trials, resulting in 1000 trials in total (500 active and 500 passive trials). Active and passive conditions were presented randomly intermixed within each block (20 active and 20 passive trials). The intensities were presented using the method of constant stimuli. Intensities from 0 to 24 dB were presented a total of 70 times each for each condition, while we only presented the sound at 28 dB 10 times for each condition to save experimental time, given that pilot data showed ceiling performance at this intensity level. The interval containing the sound (interval 1 or 2) was random.

Discrimination task

In the discrimination task two sounds were presented in two different consecutive intervals and participants had to indicate whether the first (standard) or the second sound (comparison) was louder (Fig. 1b). Similarly to the detection task, there were two types of trials, passive and active. However, there were two additional intensity conditions, supra- and near-threshold, thereby resulting in 4 possible types of trials in total: Active and Supra-threshold (AS), Passive and Supra-threshold (PS), Active and Near-threshold (AN) and Passive and Near-threshold (PN).

Each trial started with a fixation cross with a duration of 500 ms followed by two consecutive intervals with a duration of 800 ms each. In the active trials, participants had to press a button with their right hand in the first interval, instructed by the cue “PRESS: sound 1”, in order to generate the standard tone. The comparison sound was passively presented in the second interval of time following the visual cue “LISTEN: sound 2”. The interval between visual cue and comparison sound onset was randomly selected from the participants’ distribution of press times in the first interval. For the standard self-generated sound, the participants’ button press triggered the sound only if he/she pressed the button up to 300 ms prior to the interval offset. This allowed us to control that the sound had always a 300-ms duration in case a participant delayed the button press. In the passive trials, participants were passively presented with two sounds in the 1st and the 2nd interval, respectively, indicated by the visual cues “LISTEN: sound 1” and “LISTEN: sound 2”. The sounds were presented after an interval that was randomly selected from the participants’ distribution of press times in the active trials. The interval between the two sounds was therefore random depending on the timing of the button press (active conditions) or the random delay drawn by the distribution of press times (passive conditions). Unbeknownst to the subjects, the standard tone was always presented at the same intensity within each intensity condition: 74 dB for supra-threshold conditions and 5 dB above the threshold obtained from the audiometry for near-threshold conditions. In contrast, the comparison sound ranged from 71 to 77 dB in steps of 1 dB for supra-threshold conditions and ± 3 dB in steps of 1 dB relative to the standard tone for near-threshold conditions. After the offset of the second comparison interval, the question “Which sound was louder: Sound 1 or Sound 2?” appeared on the screen for 1500 ms and participants had to press a button with their left hand to indicate whether the first (left button) or the second (right button) sound was louder. To control for the possibility that participants did not hear the near-threshold sounds, a third control button was used, and participants were instructed to press it only if they did not hear the two sounds. After participants’ response, the question disappeared immediately. The next trial started always after the 1500 ms response window was over.

The task was divided in 25 blocks, each one consisting of 28 trials. Each of the seven possible comparison tone intensities was presented 25 times per condition using the method of constant stimuli, as it yields a better estimation of the Point of Subjective Equality (PSE) and Just Noticeable Difference (JND) values compared to other methods72. This resulted in 175 trials per experimental condition (active/passive and supra-/near-threshold) and 700 trials in total for each participant. The conditions (i.e., sound-source: active vs. passive, and intensity: supra- vs. near-threshold) were intermixed within each block and the order of presentation was randomized for each participant.

Modifications from the preregistered plan

This experiment was preregistered on the Open Science Framework (https://osf.io/ypajr/). Relative to our preregistered plan, we made one modification: Instead of fitting the psychometric function with the Palamedes Toolbox73 as reported in the preregistration of this study, we decided to use the quickpsy package in R74 for better visualization of the data and in order to directly introduce the values obtained from the fitting procedure to statistical analysis in R. The change in the toolbox used is not expected to have affected the results, as we kept all the parameters as predefined in the preregistration.

Data analysis

Data analysis follows the preregistered plan. All analysis code will be publicly released with the data upon publication (https://osf.io/ypajr/).

Detection task

For each participant, the percentage of correct answers were calculated for each intensity and condition—active and passive—. Subsequently, for each condition, the percentage of correct responses was fitted with a normal cumulative function (see average psychometric functions in Fig. 2 and the individual psychometric functions in Supplementary Fig. S1) according to the maximum likelihood procedure, using the quickpsy package in R74. For each participant and condition, two parameters were extracted from the model: alpha (i.e., values for thresholds in the range of the intensity levels we used) and beta (i.e., values for slope in the range of 0 to 10 in steps of 0.1). The lower asymptote of the psychometric function (i.e., gamma) was set to 0.5 as in previous 2-AFC detection tasks, while the upper asymptote (i.e., lambda), which corresponds to the lapse rate, was set to 0.00173. For each participant and condition, goodness-of-fit and the 95% confidence intervals for thresholds were calculated by a parametric bootstrap procedure (n = 1000)75, using the quickpsy package in R74.

Figure 2
figure 2

Group psychometric functions per source condition (active and passive) in the detection task. Vertical lines represent the mean thresholds per source condition (i.e., defined as the intensity accurately detected at 75% of the trials). Detection thresholds did not differ between active and passive trials (p > .050).

The second part of the analysis consisted in calculating the dʹ sensitivity index and criterion in order to directly compare our results with previous studies using this measure46. This analysis was performed using the Palamedes toolbox73 (version 1.10.3). Given that here we employed a 2-AFC task, we first calculated the hit and false alarm rate for one of the two intervals (interval 1 as target). As hit for interval 1 were defined the trials, where the sound was in interval 1 and the participant responded that the sound was indeed presented in this interval. As false alarm for interval 1 were defined the trials, where the participant incorrectly detected the sound in interval 1, while the stimulus was actually presented in interval 2. Subsequently, we calculated the hit rate (= number of hits divided by the number of signal trials, i.e., trials where the sound was presented in the 1st interval) and the false alarm rate (= number of false alarms divided by the number of noise trials, i.e., trials where the sound was presented in the 2nd interval). After z-transforming the hit and false alarm rates, we calculated the dʹ (i.e., z(Hit) – z(False Alarm)) and criterion (i.e., − 0.5 * (z(Hit) + z(False Alarm))) for active and passive trials. Finally, we calculated the mean interval between the cue presentation and participants’ button press (henceforth SOAs) in the active trials.

Discrimination task

For each participant, the proportion of “second sound louder” responses was calculated for each condition (active/passive, supra-/near-threshold) and for the seven comparison intensities. Data from the trials where participants did not hear the near-threshold sounds (as indicated by the third control button; see Procedure) were excluded from the analysis. In order to directly compare performance across supra- and near-threshold conditions, we defined the comparison intensities as the difference in dB from the standard stimulus: − 3, − 2, − 1, 0, 1, 2, 3. The “second sound louder” responses for each condition were, then, fitted with a normal cumulative function (see average psychometric functions in Fig. 3 and the individual psychometric functions in Supplementary Fig. S2) according to the maximum likelihood procedure, using the quickpsy package in R74. For each participant and condition, two parameters were extracted from the model: alpha (i.e., values in the range of the comparison intensity levels we used) and beta (i.e., values for slope in the range of 0 to 10 in steps of 0.1). The lower asymptote of the psychometric function (i.e., gamma) was set to 0 as in previous 2-AFC discrimination tasks, while the upper asymptote (i.e., lambda), which corresponds to the lapse rate, was set to 0.00173. Thus, for each participant and condition, two measures were obtained. First, the Point of Subjective Equality (PSE), which corresponds to the alpha values of the model, and is defined as the intensity, where the comparison stimulus was reported as louder than the standard one on 50% of the trials. This value is used to estimate the comparison tone intensity that would make the standard and comparison tones perceptually equal and is considered an index of perceptual bias76. Higher PSE values would indicate that the standard first tone is perceived as louder, while lower PSE values would reflect an attenuated perceived loudness for this sound. Thus, shifts of the PSE values from the Point of Objective Equality (i.e., the point indexing the physical equality of the two sounds, which is 0 dBs here) would reflect a biased estimate of perceived loudness. Second, we extracted the just noticeable difference (JND), which corresponds to the beta values of the model (i.e., the standard deviation extrapolated from the fit) and is considered a measure of precision associated with the estimate. Higher JND values would reflect lower precision in discriminating the loudness of the two sounds (i.e., lower differential sensitivity77). For each participant and condition, goodness-of-fit and the 95% confidence intervals for PSE were calculated by a parametric bootstrap procedure (n = 1000)75, using the quickpsy package in R74. Finally, we calculated the mean interval between the cue presentation and participants’ button press (henceforth SOAs) in the active trials.

Figure 3
figure 3

Group psychometric functions per source and intensity conditions in the discrimination task. Vertical lines represent the PSE values per source and intensity (i.e., defined as the intensity, where the second comparison stimulus was reported as louder than the first standard one on 50% of the trials). Supra-threshold active sounds had lower PSE values than the supra-threshold passive ones, suggesting attenuated perceived loudness for the former. In contrast, at near-threshold intensities, active sounds had higher PSE values than the passive ones, pointing to enhanced perceived loudness for the former.

Results

All statistical analyses were performed using R (version 3.6.0). For all the significant results in the ANOVA, we report the ηG2 effect size and the ηp2, since the ηG2 is less biased than ηp278,79, but we also wanted to compare our findings with other studies that usually report the ηp2 effect size. Participants’ audiometric thresholds did not differ across the two audiometric sessions (see Supplementary Fig. S3). Significant subject-wise correlations between the measures across the two tasks are reported in Supplementary Table S1. Error bars in Figs. 4 and 5 depict the within-subjects confidence intervals80,81,82, calculated using the summarySEwithin82 function in R.

Figure 4
figure 4

Summary of the results from the detection task. Mean threshold, beta value for slope, dʹ score, and criterion. Error bars depict within-subjects confidence intervals80,81,82. There were no significant differences between active and passive in the threshold (one-tailed paired samples t-test, p > .050), slope (i.e., beta values from the psychometric fitting procedure; nonparametric Wilcoxon test due to violation of normality assumption, p > .050), dʹ score (one-tailed paired samples t-test, p > .050), or criterion (two-tailed paired samples t-test, p > .050).

Figure 5
figure 5

Summary of the results from the discrimination task. Mean PSE, JND, and percent of “1st sound louder responses”65. Error bars depict within-subjects confidence intervals80,81,82. From left to right: Significant interaction between65 source and intensity on PSE (p < .010), with the post-hoc comparisons showing lower PSE for the active supra-threshold compared to the passive supra-threshold condition (one-tailed paired samples post-hoc t-test; p < .050), significantly higher PSE for the active near-threshold compared to the passive near-threshold condition (one-tailed paired samples post-hoc t-test; p < .050), and significantly higher PSE for the active near-threshold compared to active supra-threshold (two-tailed paired samples post-hoc t-test; p < .050). Significant main effect of intensity on JND, with lower JND (i.e., better discrimination sensitivity) for the supra- compared to the near-threshold condition (p < .001). For the “1st sound louder responses”, we only included trials where the standard and the comparison sounds were presented at the same intensity (i.e., 74 dB as a supra-threshold intensity and 5 dB above each participant’s threshold as a near-threshold intensity). There was a significant interaction between source and intensity (p < .010), with the post-hoc comparisons showing less “1st sound louder” responses for active compared to passive trials when the sound was presented at 74 dB (one-tailed paired samples post-hoc t-test; p < .001), less “1st sound louder” responses for active trials when presented at 74 dB compared to when presented at 5 dB above each participant’s threshold (p < .050), and no differences between active and passive trials when the sounds were presented at 5 dB above each participant’s threshold (one-tailed paired samples post-hoc t-test; p > .050).

Modifications from the preregistered plan

Relative to our preregistered analyses (https://osf.io/ypajr/), we made one modification: For the detection task, we initially planned to perform a paired-samples t-test to test for differences in the slope of the psychometric function. However, considering that the normality test was violated, we performed a non-parametric Wilcoxon test.

Detection task

The thresholds, slopes, dʹ, and criterion values, were analyzed using paired samples t-tests with the factor sound source—active (A) or passive (P). Trials with erroneous presses (i.e., late onset time of button press and no presses) were excluded from all analyses (MA = 28.26%, SDA = 20.37 MP = 2.35%, SDP = 3.3). For the active trials, the mean interval between cue onset and button press was 0.39 s (SD = 0.07) for Interval 1 and 0.16 s (SD = 0.14) for Interval 2.

First, we performed statistical analyses for the measures obtained from the psychometric fitting procedure (Fig. 4). To test for differences between the thresholds in the active and passive conditions, we used a paired samples one-tailed t-test with the hypothesis of expecting lower detection thresholds in the active compared to passive trials46 (Shapiro–Wilk normality test p > 0.050). The analysis did not show any significant differences between active and passive conditions (t(27) =  − 1.09, p > 0.050, MA = 7.46, MP = 7.85, SDA = 3.7, SDP = 3.66). Subsequently, we tested for possible differences in the slope of the psychometric function. Considering that the assumption of normality was violated (Shapiro–Wilk normality test p = 0.020), we performed a nonparametric Wilcoxon’s signed rank test for paired data on the beta values obtained from the psychometric functions. The analysis did not show any significant difference between active and passive slopes (W = 146, p > 0.050, MA = 4.65, MP = 5.05, SDA = 2.48, SDP = 3.11). Finally, to further test for possible effects of self-generation and intensity level on detection performance, we also analyzed the percent of correct responses for both the active and passive trials for each one of the intensity levels, but did not find any significant interactions (see Supplementary Fig. S4).

To analyze the differences in the thresholds between the two conditions, we also calculated a 95% confidence interval for the difference in thresholds based on the simulations from the bootstrapping procedure (n = 1000). For 23 out of the 28 subjects no significant differences were observed between the active and the passive trials. For one of them, the comparison between observed and simulated thresholds showed a significantly higher threshold for the active compared to the passive trials, while for the other four, a significantly lower threshold was obtained for the active trials. The goodness-of-fit routine showed that for the active trials, 26 out of the 28 psychometric curves resulted in acceptable goodness-of-fit statistics, while the fitting procedure for the passive trials showed acceptable goodness-of-fit statistics for 25 out of the 28 psychometric curves. Despite the non-acceptable goodness-of-fit for some subjects and conditions, we kept these subjects in the analyses, after confirming that results would remain the same when excluding them.

Subsequently, we performed a signal detection analysis for the dʹ and criterion values (Fig. 4; Shapiro–Wilk normality test, p > 0.050). The dʹ values were analyzed using a paired samples one-tailed t-test with the hypothesis of expecting higher dʹ in active compared to passive trials46. Contrary to previous work46, the analysis did not show any significant differences between the active vs. passive dʹ values (MA = 1.2, SDA = 0.3, MP = 1.24, SDP = 0.32, p > 0.050). Similarly, the criterion values were analyzed using a paired-samples two-tailed t-test46. Similar to previous work46, we did not observe any significant difference in the criterion values between active and passive trials (MA = 0.83, SDA = 0.12, MP = 0.86, SDP = 0.13, p > 0.050). Collectively, although these findings suggest that self-generation does not affect detection sensitivity or response bias in a 2-AFC detection task, the lack of a contingent press-sound relationship (i.e., participants pressed twice in every active trial but only one button press generated the sound) may have also minimized any possible effects of motor-related predictions on detection performance.

Discrimination task

The PSE and JND values were analyzed using a repeated measures ANOVA with two factors: sound source—active (A) or passive (P)—and sound intensity—supra- (S) or near-threshold (N)—. Trials with erroneous presses (i.e., late onset time of button press and no presses) were excluded from all analyses (MAS = 22.9%, SDAS = 19.1, MPS = 0.96%, SDPS = 2.74, MAN = 23.29%, SDAN = 18.88, MPN = 1.51%, SDPN = 3.09). For the active trials, the mean interval between cue onset and button press was 0.37 s (SD = 0.06).

The analysis for the PSE values revealed that there was not a main effect of source (F(1,27) = 0.8, p > 0.050; MA = 0.39, MP = 0.25, SDA = 1.65, SDP = 1.65) or a main effect of intensity (F(1,27) = 2.62, p > 0.050; MN = 0.65; MS = -0.008, SDN = 2.12, SDS = 0.86). However, there was a significant interaction between source and intensity (F(1,27) = 12.10, p = 0.002, ηp2 = 0.31 and ηG2 = 0.15; Fig. 5). The Bonferroni corrected post-hoc tests revealed a higher PSE for the AN condition compared with the AS condition (MAN = 0.92, MAS = -0.13, SDAN = 2.04, SDAS = 0.9, t(27) =  − 2.48, p = 0.020, d = 0.47; two-tailed post-hoc t-test), a lower PSE for the AS compared to the PS condition (MAS = -0.13, MPS = 0.12, SDAS = 0.9, SDPS = 0.83, t(27) =  − 2.41, p = 0.012, d = 0.45; one-tailed post-hoc t-test), and a higher PSE for the AN compared to the PN condition (MAN = 0.92, MPN = 0.39, SDAN = 2.04, SDPN = 2.19, t(27) = 2.09, p = 0.020, d = 0.39; one-tailed post-hoc t-test). The post-hoc analysis did not show differences between the PS and the PN condition (MPS = 0.12, MPN = 0.39, SDPS = 0.83, SDPN = 2.19, t(27) = -0.64, p > 0.050; two-tailed post-hoc t-test). Thus, we replicate the findings obtained by previous discrimination studies with supra-threshold sounds52,53,54, but more importantly we extend previous work by showing that self-generated near-threshold sounds are perceived louder compared to the passively presented ones.

The analysis for the JND values revealed that there was a significant main effect of intensity (F(1,27) = 119.45, p < 0.001, ηp2 = 0.82 and ηG2 = 0.49), with a significantly higher JND (i.e., lower discrimination sensitivity) for the near- compared to the supra-threshold conditions (MS = 1.93, MN = 5.79, SDS = 1.5, SDN = 2.39; Fig. 5). The analysis did not show a significant main effect of source (F(1,27) = 2.75, p > 0.050; MA = 3.68, MP = 4.03, SDA = 2.7, SDP = 2.9) or a significant interaction between source and intensity (F(1,27) = 0.77, p > 0.050). Collectively, the results obtained by these analyses are consistent with previous work with both auditory52,53,54 and tactile self-generated stimuli35 and further show that the interactive effects of intensity and self-generation are not dependent on participants’ differential sensitivity in discriminating the loudness of two sounds (as indexed by the JND values).

For analyzing differences in the PSE between the four conditions, the 95% confidence intervals were calculated for each condition based on the simulations from the bootstrapping procedure (n = 1000). For 9 subjects we found significant differences between the active and passive supra-threshold conditions (for 8 subjects, lower PSE in the AS compared to the PS), while for the near-threshold intensities only 3 subjects had significantly higher PSE values in the active compared to the passive condition. Within the active condition, significant differences were obtained between the supra- and near-threshold intensities for 16 subjects (for 13 subjects, lower PSE in the AS compared to the AN), while for 18 subjects we found significant differences between the passive supra- and passive near-threshold conditions (for 12 subjects, lower PSE in PS compared to PN). The goodness-of-fit routine showed that for 26, 27, 26, and 26 psychometric curves out of the 28 total curves fitted per condition, the fitting procedure resulted in acceptable goodness of-fit statistics (for the AN, AS, PN, and PS, respectively). Despite the non-acceptable goodness-of-fit for some subjects and conditions, we kept these subjects in the analyses, after confirming that results would remain the same when excluding them.

Finally, we aimed to directly compare our results with the findings obtained by Reznik et al.65, where they employed a similar discrimination task where the standard and comparison tone were always presented at the same intensity (either supra- or near-threshold). Thus, in this analysis we only included the trials where the comparison sound was presented at the same intensity as the standard one (i.e., 74 dB for the supra-threshold and 5 dBs above each participant’s audiometric threshold for near-threshold conditions). In order to directly compare with Reznik et al.’s study, we calculated the “1st sound louder” responses and performed a repeated measures ANOVA with the factors sound source (active/passive) and sound intensity (supra-/near-threshold). The results showed a significant main effect of source (F(1,27) = 13.54, p < 0.001, ηp2 = 0.33 and ηG2 = 0.04), with less “1st sound louder” responses for active compared to passive trials (MA = 46.1, SDA = 20.62, MP = 53.63, SDP = 19.86). The main effect of intensity did not reach significance (F(1,27) = 3.26, p > 0.050, MN = 53.77, SDN = 21.57, MS = 45.98, SDS = 18.76). However, consistent with Reznik et al.65, we obtained a significant interaction between source and intensity (F(1,27) = 8.94, p < 0.010, ηp2 = 0.25 and ηG2 = 0.04; Fig. 5). The post-hoc t-tests showed that while there were significantly less “1st sound louder” responses for AS compared to PS trials (MAS = 38.12, MPS = 53.82, SDAS = 16.56, SDPS = 17.75, t(27) = -5.19, p < 0.001, d = 0.98; one-tailed paired samples t-test), no differences were observed between active and passive trials at near-threshold intensities (MAN = 54.09, MPN = 53.45, SDAN = 21.43, SDPN = 22.10, t(27) = 0.17, p = 0.570; one-tailed paired samples t-test). We also observed significantly more “1st sound louder responses” for the AN compared to the AS condition (MAN = 54.09, SDAN = 21.43, MAS = 38.12, SDAS = 16.56, t(27) =  − 3.03, p = 0.010, d = 0.01; two-tailed paired samples t-test), while no differences were obtained between the PS and PN conditions (MPS = 53.82, SDPS = 17.75, MPN = 53.45, SDPN = 22.10, t(27) = 0.08, p = 0.840; two-tailed paired samples t-test). Collectively, the comparison analysis we performed replicates the significant interaction reported by Reznik et al.65 with an even larger effect size (ηp2 = 0.25 here compared to ηp2 = 0.21 in their study), but the follow-up analyses demonstrate that when the standard and comparison tones are presented at the same intensity, the differences between self- and externally-generated sounds are limited to supra-threshold intensities.

Discussion

To-date, many different models have attempted to elucidate the effects of motor acts on perceptual processing. Yet, empirical evidence as to the exact direction and nature of these effects remain mixed. We hypothesized that the mixed findings may be related to the modulatory effects of stimulus intensity and to differences regarding the exact aspect of perceptual processing that is being tested. Here, we present a preregistered study with a priori power estimations (https://osf.io/ypajr/), where we utilized a wide range of intensities to test for possible differences between self- and externally-generated sounds in detection and discrimination ability. Contrary to previous work46, we did not observe enhancements in the detection sensitivity for near-threshold self-generated sounds. However, in the discrimination task we found a significant interaction between self-generation and intensity on perceptual bias (i.e., PSE) that replicates and extends previous work52,53,54,65 by showing that perceived intensity is reduced for self-generated sounds when they are presented at supra-threshold intensities, but enhanced when presented at near-threshold intensities.

Extant models disagree about how motor predictions affect the perceptual processing for expected action consequences. On one hand, consistent with ideomotor theories proposing that we internally activate the sensory outcome of our own action83, dominant cancellation models in the action literature have suggested that behavioural and neurophysiological responses to expected action effects are suppressed35,37,39. Such attenuation is also predicted by preactivation accounts postulating that expectations preactivate representations of the predicted effects, increasing their baseline activity, thereby rendering the actual input less discriminable from baseline and reducing detection sensitivity31. In contrast, according to sharpening models, the motor-driven suppression proposed by cancellation theories is limited to units tuned away from the expected input, resulting in a sharpened population response and higher signal-to-noise ratio (SNR) that ultimately boosts detection sensitivity for what we expect44. However, none of these models can account for our findings: The cancellation account would predict lower perceived intensity irrespective of signal strength, while according to the preactivation and sharpening models we should have found significant differences in detection sensitivity (lower or higher for self-generated sounds, respectively). Critically, these models cannot explain the enhanced perceived intensity for expected near-threshold sounds. Although this enhancement may be partly driven by multisensory integration processes that are known to boost processing when the unimodal signal is of low strength like the near-threshold self-generated sounds (e.g., inverse effectiveness84), two recent models have raised the possibility that the signal strength interacts with motor predictions in determining whether the processing of the expected events will be enhanced or cancelled out7,85.

Reznik and Mukamel85 recently proposed that the inhibitory influence exerted by the motor cortex on auditory areas during motor acts86 may either dampen or enhance perceptual processing of self-generated sounds depending on the environmental context. According to their model, the motor-driven suppression of the auditory cortex58,87 leads to reduced activity at the population level, but also to more selective responses and thus higher SNR. Crucially, while net activity should be always reduced during motor engagement irrespective of intensity, the resulting SNR is proposed to be higher in faint compared to salient contexts. Faint stimulation is known to elicit responses only on “best-frequency” neurons, while louder stimuli also stimulate the neurons tuned to nearby frequencies85. Thus, Reznik and Mukamel propose that in faint contexts, the global inhibition during motor engagement may result in “best-frequency” responses only, with almost complete silence of the activity in nearby frequencies thanks to the inhibition of the spontaneous activity, relatively enhancing the sound-evoked activity compared to the background noise58,87.

This proposal has two important implications as concerns the consequences of motor engagement in perceptual processing: First, salient environments would be characterized by reductions in the loudness perception that are proposed to be driven by reduced population activity. Yet, no predictions are made as to whether perceived intensity for near-threshold sounds would be also attenuated or not, thus leaving unexplained our finding that perceived intensity is enhanced for self-generated near-threshold sounds. Second, the increased SNR would boost the detectability of near-threshold sounds only, since in salient contexts sensitivity is already at ceiling. The authors found support for this claim in a study46 where self-generation significantly enhanced sound detectability. However, this finding was not replicated in the present study.

A caveat to the model proposed by Reznik and Mukamel is that it is largely based on animal studies that compared auditory responses in active vs. passive states (i.e., locomotion vs. quiescence or Go/No-Go tasks)56,58,87, rather than comparing self- vs. externally-generated sounds. It is very likely that active states and contingent action-stimulus relationships do not have the same underlying mechanisms, and that they in turn do not modulate perception in the same way. The modulations found in active states may be mostly driven by unspecific neuromodulatory processes (i.e., arousal56), while in the presence of a contingent action-stimulus relationship specific prediction mechanisms may dominate (i.e., corollary discharge). This critical difference may explain why we did not find any significant effects in the detection task that lacked a contingent press-sound relationship (only 50% of the presses generate a sound). However, previous detection paradigms have also reported no such enhancement56,57,88, thus raising the possibility that the low power of the only human study reporting lower detection thresholds for self-generated sounds (n = 10)46 may have reduced the likelihood of their statistically significant result reflecting a true effect89. Collectively, although Reznik and Mukamel were the first to attempt to explain how sound intensity may modulate neural and behavioural responses during motor engagement, their model cannot fully explain our findings, and in particular it also cannot explain why the interactive effects we observed here are limited to perceptual bias, rather than perceptual sensitivity.

We believe that our findings can be best explained by the opposing process theory which highlights the role of signal strength in enhancing or suppressing the processing of predictable stimuli7. According to this theory, perception is in principle biased towards expected stimuli, such as self-generated and thus more predictable stimuli. However, if the presentation of an unpredicted stimulus generates a high level of surprise after the initial stages of sensory processing, then the perceptual processing of this surprising stimulus is boosted. In terms of self-generation effects, this would imply enhanced processing of externally-generated, and thus unpredictable (surprising) stimuli. Critically, however, the level of surprise is closely related to signal strength, as surprise reflects both the distance between the prior and posterior distributions, as well as their precision (Kullback–Leibler divergence)90,91, and weaker signals are inherently less precise. For example, the sound of a horn in the middle of the night would elicit surprise but only if it is loud, and thus clearly audible. In sum, according to this view, supra-threshold externally-generated stimuli are inherently more surprising than the self-generated ones, shifting perception toward the unexpected (i.e., enhanced perceived loudness for the externally-generated sounds at supra-threshold intensities). Conversely, when sounds are presented at a near-threshold intensity, the increased uncertainty and higher level of noise in the signal renders externally-generated sounds unsurprising and perception is shifted towards the expected (i.e., enhanced perceived intensity for the self- compared to the externally-generated sounds at near-threshold intensities). Thus, the surprise-driven mechanism operates only for highly precise and therefore task-relevant unexpected signals, triggering a process that boosts their perception by driving attention away from the consequences of self-made acts as proposed by the active inference framework92. Therefore, the shifts in perceived intensity in either direction may be related to surprise-induced attentional mechanisms that have been suggested to modulate the precision of the prediction error, rather than the prediction error itself41,92. Nevertheless, one would expect that this mechanism would also operate in detection paradigms, contrary to the null findings obtained in the detection task. While these findings may be due to the lack of a contingent action-sound relationship as mentioned before, an alternative explanation is that the attentional nature of these effects results in affecting certain aspects of perceptual processing.

The studies conducted so far have not systematically assessed the effects of self-generation (and their interaction with stimulus intensity) on the different perceptual measures. Discrimination studies have reported shifts in the PSE, a measure of perceptual bias, while JND—a measure of perceptual sensitivity—remains unaffected by self-generation35,52,53,54,55. Conversely, detection studies have typically measured perceptual thresholds or the dʹ score (perceptual sensitivity) and criterion46,61 (response bias). Here, we provide a more complete picture of how motor actions may affect perception by having two tasks that allowed us to obtain all these measures within subjects and show that the effects of self-generation and their interaction with stimulus intensity are driven by shifts in perceptual bias, rather than sensitivity measures.

In sum, the present study showed that the intensity of the sensory feedback biases perception for self-initiated stimuli in a differential manner, with attenuated perceived loudness at loud intensities, but perceptual enhancement for near-threshold ones. These findings provide empirical support to the opposing process theory7 by showing that the behavioural difference between self- and externally-generated sounds interacts with the noise of the sensory outcome in driving perceptual processing. The strength of this study is that it extends previous work by demonstrating that self-generation and its interaction with intensity only affects perceptual bias, rather than perceptual sensitivity52,53,54,88 or response bias46. Although the opposing process theory does not clarify whether expectation effects are driven by perceptual or later decisional processes7, we argue that the proposed bias in perception as a function of signal strength implies a competition between two percepts, which was only the case in the discrimination task and may point to attentional processes that are known to reverse the effects of prediction on sensory processing93. We believe that further work is required to replicate these findings, assess the neurophysiological correlates of these effects, as well as the influence of other factors such as arousal, that are also known to affect behavioural performance56,94, and ultimately provide a comprehensive account of how motor predictions and signal strength shape the perception of our environment.