Illusory sound texture reveals multi-second statistical completion in auditory scene analysis

Sound sources in the world are experienced as stable even when intermittently obscured, implying perceptual completion mechanisms that “fill in” missing sensory information. We demonstrate a filling-in phenomenon in which the brain extrapolates the statistics of background sounds (textures) over periods of several seconds when they are interrupted by another sound, producing vivid percepts of illusory texture. The effect differs from previously described completion effects in that 1) the extrapolated sound must be defined statistically given the stochastic nature of texture, and 2) the effect lasts much longer, enabling introspection and facilitating assessment of the underlying representation. Illusory texture biases subsequent texture statistic estimates indistinguishably from actual texture, suggesting that it is represented similarly to actual texture. The illusion appears to represent an inference about whether the background is likely to continue during concurrent sounds, providing a stable statistical representation of the ongoing environment despite unstable sensory evidence.

-11.5 Water movement -21

* A negative SNR value corresponds to the noise (masker) being higher in level than the signal (inducer).
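To make the sign convention in the footnote concrete, the following minimal sketch rescales a noise masker to a requested SNR relative to an inducer. It assumes SNR is defined as the ratio of RMS levels in dB, which the source does not state explicitly. The function names and the synthetic stand-in signals are hypothetical and are not the stimuli used in the experiments.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def measured_snr_db(inducer, noise):
    """SNR in dB, assuming SNR = 20*log10(rms(inducer) / rms(noise))."""
    return 20 * math.log10(rms(inducer) / rms(noise))


def scale_noise_to_snr(inducer, noise, snr_db):
    """Return a rescaled copy of `noise` whose SNR relative to `inducer` is snr_db."""
    target_noise_rms = rms(inducer) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [gain * v for v in noise]


# Synthetic stand-ins (assumption): Gaussian samples instead of real recordings.
rng = random.Random(0)
inducer = [rng.gauss(0.0, 0.1) for _ in range(2000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]

# A negative SNR makes the masker more intense than the inducer.
masker = scale_noise_to_snr(inducer, noise, -21.0)
```

With a negative target SNR, `rms(masker)` exceeds `rms(inducer)`, matching the footnote.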

Supplementary Table 1
Sounds and masking noise levels used in Experiment 1a. Masking noise SNR (dB) indicates the relative difference in amplitude between the inducer sound and the Gaussian masking noise, selected by the first author to produce masking of the texture by the noise when they were superimposed.

Supplementary Table 3
Sounds and masking noise levels used in Experiment 3 (extent estimation task).

Supplementary Table 4
Sounds and masking noise levels used in Experiment 5 (effect of gaps). These sounds were also used in Experiments 4a and 4b. Masking noise SNR was that used in Experiment 5. The values in parentheses are the masking thresholds measured in Experiment 4a. The noise levels used in Experiment 5 were higher than these thresholds (lower SNR), which is conservative with respect to ensuring that the noise could have masked the inducer.

Inducer                          Masking noise SNR (dB)
Rattlesnake                      -16.5 (-6.5)
Pneumatic drill                  -14 (-2.5)
Heavy rain on a hard surface     -17 (-4.5)
Pneumatic drills (road works)    -14 (-4.5)
Wind whistling                   -17.5 (-10.5)
Pouring coffee beans             -17.5 (-16.5)
Rain                             -14.5 (-6.5)
Ship anchor                      -13.5 (-3.5)
Blender                          -16.5 (-9.5)
Radio static                     -12.5 (-4)
Car interior                     -17.5 (-11.5)
Babble (large hall)              -18 (-8.5)
Crunching cellophane             -14 (-7)
Applause in large hall           -18 (-5.5)
Lawn edger                       -15.5 (-9.5)
Applause                         -15 (-6.5)
Electric drill                   -16.5 (-8)
Fast running river               -20 (-3.5)
Frying bacon                     -14.5 (-8.5)
River running over shallows      -18 (-7.5)

Supplementary Fig. 1
Results of Experiment 1b. a Schematic of stimulus and task. b Cochleagrams of excerpts of white noise masker (left) and mean noise masker (right). c Correlation between experimental results (proportion of trials on which the inducer was judged to be present during the noise) for white noise masker and mean noise masker in Experiment 1b. Each data point represents an inducer sound used in Experiment 1b. Here and elsewhere, dot color corresponds to the statistical stationarity measure from Figure 2f. Here and elsewhere, r values give Pearson correlation. Continuity judgments were correlated across masker types, but higher overall for the mean masker. d Correlation between perceptual continuity judgments for white noise masker in Experiment 1a and Experiment 1b (which differed in the participants, as well as in the presence of the mean noise masker condition). Results were similar across experiments. e Correlation of statistical distance to the masker and sound stationarity, for white noise masker (left) and mean noise masker (right) used in Experiment 1b. f Correlation between perceptual continuity judgments and sound stationarity, for white noise masker (left) and mean noise masker (right) conditions. g Correlation between perceptual continuity judgments and statistical distance to the masker, for white noise masker (left) and mean noise masker (right) conditions.

Supplementary Fig. 2
Temporal and spectral density analyses of Experiment 1b. a Correlation between temporal density and sound stationarity measures. Here and elsewhere, r values give Pearson correlation. Data point color code corresponds to sound stationarity measure from Figure 2f. b Correlation between spectral density and sound stationarity measure. c Correlation between temporal density of inducer sound and perceptual continuity, for white noise masker (left) and mean noise masker (right) conditions of Experiment 1b. d Correlation between spectral density of inducer sound and perceptual continuity, for white noise masker (left) and mean noise masker (right) conditions of Experiment 1b.

Supplementary Fig. 3
Results of Experiment 1a ordered by sound periodicity. a Schematic of stimulus and task for Experiment 1. b Results of Experiment 1a. Analogous to Figure 2f except that sounds are sorted by their periodicity. Periodicity was computed from the autocorrelation of the Hilbert envelope of the sound waveform, downsampled to 400 Hz. The periodicity measure was the height of the largest autocorrelation peak for lags between 0.125 s and 0.5 s, normalized by the autocorrelation at lag 0. Error bars show SEM. Color corresponds to the stationarity measure shown in Figure 2f. c Mean proportion of continuous responses plotted as a function of sound periodicity. r value is Pearson correlation.
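The periodicity measure described for Supplementary Fig. 3 can be sketched roughly as follows. This is an illustrative approximation rather than the authors' code, and it makes several assumptions the caption does not specify: the envelope is downsampled by block averaging (which requires the sampling rate to be an integer multiple of 400 Hz), the envelope mean is removed before autocorrelation, and the "largest peak" is taken as the maximum autocorrelation value within the lag range. All function names are hypothetical.

```python
import numpy as np


def hilbert_envelope(x):
    """Magnitude of the analytic signal, computed with the FFT."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


def periodicity(x, fs, env_fs=400, min_lag_s=0.125, max_lag_s=0.5):
    """Largest normalized envelope autocorrelation for lags in [min_lag_s, max_lag_s]."""
    env = hilbert_envelope(np.asarray(x, dtype=float))
    # Assumption: downsample by averaging non-overlapping blocks.
    step = int(round(fs / env_fs))
    n_blocks = len(env) // step
    env = env[:n_blocks * step].reshape(n_blocks, step).mean(axis=1)
    # Assumption: remove the mean so a constant envelope does not dominate.
    env = env - env.mean()
    # One-sided autocorrelation, normalized by its value at lag 0.
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    ac = ac / ac[0]
    lo = int(round(min_lag_s * env_fs))
    hi = int(round(max_lag_s * env_fs))
    return float(ac[lo:hi + 1].max())


# Example: noise amplitude-modulated at 4 Hz (period 0.25 s) versus plain noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(2 * fs) / fs
carrier = rng.standard_normal(t.size)
modulated = (1.0 + np.sin(2 * np.pi * 4 * t)) * carrier
periodic_score = periodicity(modulated, fs)
noise_score = periodicity(carrier, fs)
```

Under these assumptions the modulated noise should score well above the unmodulated noise, since its envelope repeats within the 0.125 s to 0.5 s lag window.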
Supplementary Fig. 4
Results for the masking and continuity experiments (4a/4b) plotted separately for individual sounds. To increase power, this analysis combined the data from Experiments 4a/4b with that of a pilot experiment (n = 10) that was identical except that the stimuli for a given reference sound were generated from a single synthetic texture exemplar (yielding n = 20 in total). a Masking and continuity curves for individual sounds. The data points and light/thin lines show the mean response across SNR values. The dark/thick lines show logistic function fits. The horizontal dashed line shows the threshold value (masking 0.833, continuity 0.666) used to relate masking and continuity in b. b Reliability of masking and continuity threshold measurements. Subpanels show test-retest reliabilities for the continuity experiment (top) and masking experiment (bottom). Reported Pearson correlation is the average of 10,000 test-retest splits (each participant performed two trials per condition, which were randomly assigned to a split); graphs show results from one example split. c Continuity thresholds plotted against masking thresholds, for individual sounds. The r value is the Pearson correlation between masking and continuity thresholds, corrected for attenuation. Index refers to sounds in subplot a. Masking and continuity thresholds were generally similar, and co-varied across sounds to some extent.

Supplementary Fig. 5
b Listeners chose one of six response contours to describe their perceptual experience during the interrupting noise segment. The contour response code is indicated as the bolded letter for each response (e.g. "C" for "Continuous"). c To confirm task comprehension/compliance, the experiment included control trials where the texture was physically present during the intermediate noise segment and amplitude modulated according to one of the response contours. The stimulus for each condition is schematized above each of the six subplots. Graphs plot the proportion of trials on which each response was chosen. Data for individual participants are plotted as dots for the response choices selected above chance levels for each condition. d Results of main experimental conditions. Each subplot corresponds to a condition (shown schematically above). Data for individual participants are plotted as dots for the response choices selected above chance levels for each condition. Error bars plot SEM.
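The caption of Supplementary Fig. 4 reports a Pearson correlation between masking and continuity thresholds that is corrected for attenuation using test-retest reliabilities. The source does not give the formula. A minimal sketch, assuming the standard Spearman correction (observed correlation divided by the geometric mean of the two reliabilities), is shown below. The function names and numeric values are illustrative only and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Spearman correction: r_xy / sqrt(reliability_x * reliability_y)."""
    return r_xy / math.sqrt(reliability_x * reliability_y)


# Illustrative numbers (assumption): an observed correlation of 0.4 between
# two measures whose test-retest reliabilities are 0.8 and 0.5.
corrected = correct_for_attenuation(0.4, 0.8, 0.5)
```

The correction raises the correlation estimate when either measure is noisy, which is why the reliabilities in panel b are needed to interpret panel c.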
Supplementary Fig. 6
Auditory texture model. The model was adapted from that of McDermott and Simoncelli (2011). Statistics are measured from an auditory model capturing the tuning properties of three stages of the peripheral and subcortical auditory system. The cochlear envelope marginal statistics (M) comprise the mean, coefficient of variation, and skewness. Pair-wise envelope correlations (C) were computed between neighboring cochlear envelope bands. The modulation subband statistics comprise the modulation power (MP; the variance of each modulation subband normalized by the total variance of the corresponding cochlear envelope) and modulation correlations (MC) between modulation subbands.
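As a rough illustration of the statistic classes named in this caption, the sketch below computes envelope marginals, correlations between neighboring bands, and normalized modulation power from arrays that are assumed to already hold cochlear envelopes and modulation subbands. It uses textbook definitions of each moment. The exact definitions, filters, and weighting in the McDermott and Simoncelli (2011) model may differ, and the function names here are hypothetical.

```python
import numpy as np


def envelope_marginals(env):
    """Mean, coefficient of variation (std / mean), and skewness of one envelope."""
    env = np.asarray(env, dtype=float)
    mean = env.mean()
    std = env.std()
    cv = std / mean
    skew = np.mean((env - mean) ** 3) / std ** 3
    return mean, cv, skew


def neighbor_correlations(envs):
    """Pearson correlation between each pair of adjacent bands.

    `envs` has shape (n_bands, n_samples).
    """
    envs = np.asarray(envs, dtype=float)
    return np.array([
        np.corrcoef(envs[i], envs[i + 1])[0, 1]
        for i in range(len(envs) - 1)
    ])


def modulation_power(mod_band, env):
    """Variance of a modulation subband divided by the total envelope variance."""
    return float(np.var(mod_band) / np.var(env))


# Toy inputs (assumption): small hand-written arrays, not real cochlear envelopes.
mean, cv, skew = envelope_marginals([1.0, 2.0, 3.0, 4.0, 5.0])
corrs = neighbor_correlations([[1, 2, 3], [2, 4, 6], [3, 2, 1]])
mp = modulation_power([0.0, 2.0], [0.0, 4.0])
```

Normalizing modulation power by the envelope variance makes the statistic describe how envelope fluctuation is distributed across modulation rates, independent of the overall fluctuation size.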

Supplementary Fig. 8
Control conditions for Experiment 7. a The stimuli varied in the sounds that were physically present during the noise. In five of the conditions the cued sound was physically present and in the other five the cued sound was physically absent during the noise segment. b Results for control conditions shown in a. Proportion of trials on which participants judged the cued stimulus to continue during the noise, averaged within the two groups of conditions (present or absent). The results indicate that participants were performing the task as intended, in that they reported perceptual continuity when the cued sound was physically continuous, but not when it was unambiguously absent. P value is from a two-tailed paired t test comparing continuous responses between cue present and absent conditions. Error bars plot SEM.