‘Normal’ hearing thresholds and fundamental auditory grouping processes predict difficulties with speech-in-noise perception

Understanding speech when background noise is present is a critical everyday task that varies widely among people. A key challenge is to understand why some people struggle with speech-in-noise perception, despite having clinically normal hearing. Here, we developed new figure-ground tests that require participants to extract a coherent tone pattern from a stochastic background of tones. These tests dissociated variability in speech-in-noise perception related to mechanisms for detecting static (same-frequency) patterns and those for tracking patterns that change frequency over time. In addition, elevated hearing thresholds that are widely considered to be ‘normal’ explained significant variance in speech-in-noise perception, independent of figure-ground perception. Overall, our results demonstrate that successful speech-in-noise perception is related to audiometric thresholds, fundamental grouping of static acoustic patterns, and tracking of acoustic sources that change in frequency. Crucially, speech-in-noise deficits are better assessed by measuring central (grouping) processes alongside audiometric thresholds.


Results
Figure 3A illustrates the correlations between performance on the speech-in-babble task and audiometric thresholds, and between performance on the speech-in-babble and figure-ground tasks. The shaded region at the top of the graph displays the noise ceiling, defined as the correlation between thresholds measured in two separate blocks of the speech-in-babble task (r = 0.69, p < 0.001; 95% CI = 0.56-0.78). For all subsequent analyses, we averaged thresholds across these two blocks.
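The noise ceiling bounds how much variance any predictor set can explain: with a between-block correlation of r = 0.69, at most r² ≈ 0.48 of the variance in speech-in-babble thresholds is explainable. A minimal sketch of this calculation (illustrative only; the function and variable names are our own):

```python
import numpy as np

def noise_ceiling(block1, block2):
    """Split-half reliability: correlate thresholds from two blocks of
    the same task across participants. r bounds any predictor's
    correlation with the task; r**2 bounds the explainable variance."""
    r = float(np.corrcoef(block1, block2)[0, 1])
    return r, r ** 2

# With the reported between-block correlation of r = 0.69, the maximum
# explainable variance is 0.69 ** 2 = 0.4761, i.e. ~47%.
```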
'Normal' hearing thresholds relate to speech-in-noise. Audiometric thresholds (2-frequency average at 4-8 kHz across the left and right ears) accounted for 15% of the variance in speech-in-babble thresholds (r = 0.39, p < 0.001; 95% CI = 0.21-0.55). The correlation remained significant after excluding 8 participants who had audiometric thresholds worse than 20 dB at 4 or 8 kHz (r = 0.25, p = 0.017; 95% CI = 0.05-0.44); excluding these participants did not significantly change the magnitude of the correlation coefficient (z = 1.05, p = 0.29), confirming that the correlation we found is driven by sub-clinical variability in audiometric thresholds. A post-hoc analysis using the average thresholds at frequencies between 0.25 and 8 kHz showed a numerically […].
Most participants were able to perform the same-frequency task when the figure components were less intense than the background components. In contrast, for the two roving figures, most participants (coherent roving: N = 89; complex roving: N = 94) could only perform the task when the figure components were more intense than the background components. Thresholds did not differ significantly between the two roving figure-ground tasks [t(96) = 1.21, p = 0.23, dz = 0.12].
To investigate whether performance on the different figure-ground tasks explains similar (overlapping) or different variance in speech-in-babble performance, we conducted a hierarchical stepwise regression with the figure-ground tasks as predictor variables. Thresholds on the same-frequency figure-ground discrimination task accounted for 10% of the variance in speech-in-babble thresholds (r = 0.32, p = 0.001). Model fit improved significantly when thresholds on the coherent roving figure-ground discrimination task were added (r = 0.39, r² change = 0.05, p = 0.02); together, the two tasks explained 15% of the variance in speech-in-babble thresholds. These results demonstrate that the two figure-ground tasks explain partially independent portions of the variance: thresholds for the coherent roving figure-ground discrimination task explain an additional 5% of the variance that is not explained by thresholds for the same-frequency figure-ground discrimination task.
One possibility is that the two tasks assess the same construct, and the better fit with both tasks together simply reflects repeated sampling. To investigate this, we separated the thresholds for the two runs within each task (which were averaged in the previous analyses). We constructed models in which two runs were included from the same task (which differ in the stimuli that were presented, but should assess the same construct) and models in which one run was included from each task. These constructions are equivalent in the amount of data entered into each model (always 2 runs). If the two tasks assess different constructs, the models including runs from different tasks should perform better than the models including runs from the same task. The results of these analyses are displayed in Table 1. Indeed, when one run from the same-frequency figure-ground task is entered into the model, adding the second run does not significantly improve the amount of speech-in-babble variance explained (regardless of which run is entered first; upper two rows of Table 1). In contrast, adding one run of the coherent roving figure-ground task significantly improves the model fit (regardless of which run of the same-frequency task is entered first; lower two rows of Table 1). This result provides evidence that the same-frequency and coherent roving tasks assess different constructs that contribute to speech-in-noise perception, rather than improving model fit by sampling the same construct.
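The nested-model logic above can be sketched as an F-test on the change in R² between a reduced and a full regression model. This is an illustrative reimplementation rather than the authors' analysis code; the simulated data and names are our own:

```python
import numpy as np
from scipy import stats

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit with an intercept term."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

def r2_change_test(X_reduced, X_full, y):
    """F-test for the R^2 improvement when predictors are added to a
    nested (reduced) model."""
    r2_r, r2_f = r_squared(X_reduced, y), r_squared(X_full, y)
    df1 = X_full.shape[1] - X_reduced.shape[1]   # predictors added
    df2 = len(y) - X_full.shape[1] - 1           # residual df, full model
    F = ((r2_f - r2_r) / df1) / ((1.0 - r2_f) / df2)
    return r2_f - r2_r, F, stats.f.sf(F, df1, df2)

# Toy data: two partially independent predictors of a criterion,
# mimicking the same-frequency and coherent roving tasks (n = 97).
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal(97), rng.standard_normal(97)
y = 0.6 * x1 + 0.6 * x2 + rng.standard_normal(97)
dr2, F, p = r2_change_test(x1[:, None], np.column_stack([x1, x2]), y)
```

Here the second predictor carries independent information, so the R² change is reliably greater than zero, mirroring the run-splitting comparison in Table 1.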

Best predictions by combining peripheral and central measures.
To examine whether the figure-ground tasks explained similar variance as audiometric thresholds, we tested models that included both audiometric thresholds and figure-ground performance. A model including the same-frequency figure-ground discrimination task and audiometric thresholds explained significantly more variance in speech-in-babble performance than audiometric thresholds alone (r = 0.45, r² change = 0.05, p change = 0.016). Similarly, a model including the coherent roving figure discrimination task and audiometric thresholds explained significantly more variance than audiometric thresholds alone (r = 0.44, r² change = 0.04, p change = 0.032).
When the three variables (audiometric thresholds, same-frequency figure discrimination task, and coherent roving figure discrimination task) were entered into a model together, the model explained 23% of the variance (r = 0.48). Based on our estimate of the noise in the data (defined as the correlation between thresholds measured in two separate blocks of the speech-in-babble task, reported above), the best possible variance we could hope to account for is 47%. Thus, when included together, the three tasks account for approximately half of the explainable variance in speech-in-babble performance. However, the model including all three variables just missed the significance threshold when compared to the model including only audiometric thresholds and the same-frequency figure discrimination task (r² change = 0.03, p change = 0.06). The other variables (same-frequency figure detection task and complex roving figure discrimination task) did not approach the significance threshold (t ≤ 0.36, p ≥ 0.68).

[Figure 3. (A) Error bars display 95% between-subjects confidence intervals for the correlation coefficients. The grey shaded box illustrates the noise ceiling, calculated as the (95% between-subjects confidence interval associated with the) correlation between two different blocks of the speech-in-noise task. Asterisks indicate the significance level of the correlation coefficient (*p < 0.050; **p < 0.010; ***p < 0.001). (B) Scatter plots associated with each of the correlations displayed in Panel A. Each dot displays the results of an individual participant. Solid grey lines indicate the least-squares lines of best fit (note that the error bars in Panel A display the normalised confidence intervals for these regressions). dB HL: decibels hearing level; TMR: target-to-masker ratio. See also Supplemental Figure.]

Figure-ground tests index age-independent deficits.
Given the broad age range of participants, we examined whether task performance related to age. As expected, older age was associated with worse speech-in-noise performance (r = 0.43, p < 0.001; 95% CI = 0.26-0.58). Therefore, we next considered whether relationships between our tasks and speech-in-noise could be explained by age-related declines in those tasks.

Discussion
Our results demonstrate that people with normal hearing vary by as much as 7 dB target-to-masker ratio (TMR) (Fig. 3B) in their thresholds for understanding speech when background noise is present, and both peripheral and central processes contribute to this variability. Despite recruiting participants with audiometric thresholds that are widely considered to be 'normal', variability in these thresholds significantly predicted speech-in-noise performance, explaining 15% of the variance. In addition, fundamental auditory grouping processes, as assessed by our new figure-ground tasks, explained significant variance in speech-in-noise performance that was not explained by audiometric thresholds. Together, audiometric thresholds and figure-ground perception accounted for approximately half of the explainable variance in speech-in-noise performance. These results demonstrate that different people find speech-in-noise difficult for different reasons, and suggest that better predictions of real-world listening can be achieved by considering both peripheral processes and central auditory grouping processes.
Central contributions to speech-in-noise performance. Two of the new figure-ground tests (same-frequency and coherent roving) correlated with speech-in-noise performance, and explained significant independent portions of the variance. This result suggests that (at least partially) separate processes contribute to the ability to perform the same-frequency and coherent roving tasks, and that both of these processes contribute to speech-in-noise perception. That these tasks explain different portions of the variance demonstrates that they could help to tease apart the different reasons that different people find it difficult to understand speech-in-noise: some people might struggle to understand speech-in-noise due to impaired mechanisms for detecting static (same-frequency) patterns, whereas others may struggle due to impaired processes for tracking patterns that change frequency over time.
Given that neuroimaging studies show cortical contributions to both figure-ground perception 18,21,27 and to speech-in-noise perception 28, it is highly plausible that the shared variance arises at a cortical level. That the two figure-ground tasks explain partially independent portions of the variance suggests that their explanatory power is not simply attributable to generic attention or working memory processes. Although attention and working memory may contribute to speech-in-noise perception 16,29, we expect both of our figure-ground tasks to engage these processes to a similar extent. Consistent with this idea, same-frequency 27,30 and roving-frequency 20 figure-ground stimuli both show neural signatures associated with figure detection during passive listening. Instead, we assume that the shared variance observed here is due to fundamental auditory grouping processes, which (at least partially) differ depending on whether a target object has a static frequency or changes frequency over time. Teki et al. 19 provide evidence that the ability to detect same-frequency figures likely relies on a temporal coherence mechanism for perceptual streaming proposed by Shamma, Elhilali, and Micheyl 31, drawing on spectro-temporal analyses that are proposed to take place in auditory cortex 32. In this model, an 'object' or 'stream' is formed by grouping elements that are highly correlated in time across frequency channels, and these elements are separated perceptually from incoherent elements. By simulating this temporal coherence mechanism, Teki et al. 19 showed that it can successfully distinguish figure-present from figure-absent trials, and that it provides a good fit to participants' behavioural responses. A temporal coherence mechanism can also explain how people detect figures that change frequency over time 20, like the coherent roving stimuli used here.
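To make the temporal-coherence idea concrete, the toy sketch below builds a binary channel-by-time activation matrix and scores coherence as the largest pairwise correlation between channel time courses: figure channels, which are co-active across consecutive time windows, covary more strongly than the stochastic background. This is our own illustrative simplification, not the model of Teki et al.; the parameters (including a figure longer than in the real stimuli, for a clearer demonstration) and the scoring rule are assumptions:

```python
import numpy as np

def make_trial(figure_present, n_chan=40, n_time=40, fig_chans=3,
               fig_len=20, rng=None):
    """Toy figure-ground trial as a channels x time activation matrix.
    Background tones occupy random channels at each time window; the
    'figure' is a set of channels co-active for fig_len consecutive
    windows."""
    rng = rng if rng is not None else np.random.default_rng()
    m = np.zeros((n_chan, n_time))
    for t in range(n_time):                       # stochastic background
        m[rng.choice(n_chan, size=8, replace=False), t] = 1.0
    if figure_present:
        chans = rng.choice(n_chan, size=fig_chans, replace=False)
        start = int(rng.integers(0, n_time - fig_len + 1))
        m[np.ix_(chans, np.arange(start, start + fig_len))] = 1.0
    return m

def coherence_score(m):
    """Largest pairwise correlation between channel time courses:
    temporally coherent (figure) channels covary, whereas background
    channels mostly do not."""
    c = np.corrcoef(m)
    np.fill_diagonal(c, -1.0)      # ignore self-correlations
    return float(np.nanmax(c))     # nan-safe: silent channels give nan rows
```

Averaged over many simulated trials, figure-present trials obtain reliably higher coherence scores than figure-absent trials, which is the basis on which such a mechanism could separate the two.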
The three elements of the coherent roving figures change frequency in the same direction at the same rate, producing temporal coherence across frequency channels; these elements would be segregated from the incoherent background of tones, which have random frequencies at each time window. A previous study 33 showed that people with hearing loss were worse than people with normal hearing at detecting spectrotemporal modulations, and that performance related to speech intelligibility in people with hearing loss. However, that test was a dynamic test for detecting spectral ripples and thus differed from the figure-ground tasks we used here, and may be considered less similar to speech. Moreover, that study did not investigate the relationship between spectrotemporal modulation detection and speech intelligibility in people with normal hearing thresholds.
Interestingly, participants obtained worse thresholds (on average, by about 20 dB TMR) for the roving frequency figure-ground task than for the same-frequency figure-ground task: that is, the figure needed to be more intense for participants to perform the roving frequency task successfully than the same-frequency task. One plausible explanation for this finding is that both tasks require the detection of static patterns, and the roving frequency task requires additional processes for tracking frequencies over time. Yet, that the two tasks explain partially independent portions of the variance demonstrates that the processes required to perform one task are not a simple subset of the processes required to perform the other. If the same-frequency task involved only a subset of the processes required for the roving frequency task, then the roving frequency task should explain more variance, and the same-frequency task should account for no additional variance beyond that explained by the roving frequency task; however, we did not find this result. Therefore, although a temporal coherence mechanism could be used to perform both tasks, the same-frequency and coherent roving tasks must rely on (at least partially) separate processes. For example, perhaps within-channel processes are particularly important for detecting static patterns, because the frequencies of each component would fall within the same channel; whereas processes that integrate across frequency channels might be more important for detecting roving patterns, which changed by 59-172% (interquartile range = 27%) of the median frequency of the component in this experiment.
Relating these processes to speech-in-noise perception, the same-frequency figure-ground stimulus somewhat resembles vowels, which often have frequencies that remain relatively stable over time; the roving stimulus might approximate the requirement to track speech as it changes in frequency at transitions between different consonants and vowels.
The classic same-frequency figure detection task that has been used in previous studies 18,19,27,30 did not correlate significantly with speech-in-noise perception (p = 0.067), but only narrowly missed the significance threshold. Although the stimuli used in the same-frequency detection task and the same-frequency discrimination task were similar, the discrimination task correlated reliably and the detection task did not. Given that the detection task used fixed stimulus parameters for every participant, whereas the discrimination task was adaptive, this result may have arisen because the adaptive (discrimination) task was marginally more sensitive to individual variability than the (detection) task with fixed parameters.
Peripheral contributions to speech-in-noise performance. 'Normal' variability in audiometric thresholds at 4-8 kHz explained 15% of the variance in speech-in-noise performance, suggesting that the audiogram contains useful information for predicting speech understanding in real-world listening, even when participants have no clinically detectable hearing loss. That sub-clinical variability in audiometric thresholds at these frequencies might predict speech-in-noise perception has often been overlooked: several previous studies have assumed that sub-clinical variability in audiometric thresholds does not contribute to speech-in-noise perception and have not explored this relationship 34,35. Of those that have tested the relationship, two found a correlation 29,36, although one of these 36 included participants with mild hearing loss, and it is possible that these participants were responsible for the observed relationship. Two others found no correlation, but restricted the variability in their sample by either imposing a stringent criterion for 'normal' hearing of less than 15 dB HL 37 or considering only young participants aged 18-30 years 38. Here, we included participants who had average pure-tone thresholds up to 20 dB HL, as defined by established clinical criteria 24, and found that the relationship with speech-in-noise perception was significant even after excluding participants with thresholds greater than 20 dB HL at the individual frequencies we tested (0.25-8 kHz).
We infer from these results that speech-in-noise difficulties may arise from changes to the auditory periphery that are related to, but which precede, clinically relevant changes in thresholds. Although audiometric thresholds at frequencies higher than 8 kHz have been proposed to contribute to speech perception in challenging listening environments 25 , these frequencies are not routinely measured in clinical practice. That we found correlations with 4-8 kHz audiometric thresholds suggests that speech-in-noise difficulties could be predicted based on audiometric thresholds that are already part of routine clinical assessment.
This relationship might arise because people with worse-than-average 4-8 kHz thresholds already experience some hearing loss that causes difficulties listening in challenging acoustic environments, such as when other conversations are present. Elevated high-frequency audiometric thresholds may be related directly to hair cell loss, or to cochlear synaptopathy. Regarding cell loss, even modest losses of outer hair cells could cause difficulties listening in challenging environments, for example, by degrading frequency resolution. Even people who have average thresholds between 10 and 20 dB HL have unusually low amplitude distortion product otoacoustic emissions (DPOAEs) 39, which are widely considered to be related to outer hair cell dysfunction 40. This finding demonstrates that even a modest loss of outer hair cells that is insufficient to produce clinically relevant shifts in audiometric thresholds has the potential to alter sound perception. On the other hand, changes in thresholds have also been suggested to accompany cochlear synaptopathy, which is an alternative mechanism that might impair speech perception. For example, Liberman et al. 41 found that humans with a higher risk of cochlear synaptopathy, based on self-reported noise exposure and use of hearing protection, had higher audiometric thresholds at 10-16 kHz.
Age-related changes in the auditory system. We replicated the common finding that speech-in-noise performance is worse with older age 42,43 . As expected, audiometric thresholds were also worse with older age, and these age-related declines in audiometric thresholds seemed to underlie their relationship with speech-in-noise performance. This is consistent with both of the possible mechanisms described above, because outer hair cells degrade with older age and post-mortem studies in humans 44,45 and animal models 46 show age-related cochlear synaptopathy.
Although performance on the figure-ground tasks became worse with older age, the relationship between figure-ground and speech-in-noise performance remained significant after accounting for age. This finding suggests that age-independent variability in figure-ground perception contributes to speech-in-noise performance. That is, even among people of the same age, figure-ground perception would be expected to predict speech-in-noise performance. An interesting question for future research is whether the shared age-independent variance arises because some people are inherently worse at figure-ground and speech-in-noise perception than others, or because these same people begin with average abilities but experience early-onset age-related declines in performance.

Clinical applications. A sizeable proportion of patients who visit audiology clinics report difficulties hearing in noisy places, despite having normal audiometric thresholds and no apparent cognitive disorders 2-4, and currently there is no satisfactory explanation for these deficits. Although the current experiment sampled from the normal population, our figure-ground tests might be useful for assessing possible central grouping deficits in these patients. These patients are sometimes diagnosed with 'auditory processing disorder' (APD), despite little understanding of the cause of their difficulties or of ways in which we can help them. Nevertheless, children with APD have speech-in-noise perception that appears to be at the lower end of the normal range 47. If these patients also perform poorly on figure-ground tests, then future research might focus on testing strategies to improve fundamental grouping processes.
The figure-ground tasks we developed were quick to run (~10 minutes each), making them feasible to add to standard clinical procedures alongside the pure-tone audiogram. These tests may help clinicians gain a better understanding of the types of deficits these patients face, as well as helping to predict real-world listening beyond the audiogram. Furthermore, performance on these tasks is independent of linguistic ability, unlike standard speech-in-noise tests. They would therefore be appropriate for patients who are non-native speakers of the country's language (which is important given that speech-in-noise perception is worse in a listener's second language than in their native language 48), and for children who do not yet have adult-level language skills. Given that clinical interventions, such as hearing aids and cochlear implants, typically require a period of acclimatisation before patients are able to successfully recognise speech, these tests, which use simple pure tones, may be useful for predicting real-world listening in the early stages following clinical intervention. However, based on the magnitude of the correlations we observed, the figure-ground tests are not yet ready to act as a substitute for speech-in-noise tests in individuals who are capable of performing speech-in-noise tests.
One step towards clinical application would be to improve the reliability of these measures (see Supplemental Figure), which might allow them to explain even greater variability in speech-in-noise performance than reported here. Between-run variability in figure-ground thresholds could be due to stimulus-specific factors, such as frequency separation. We suspect that the complex roving figure-ground task did not correlate with speech-in-noise performance because it was unreliable within participants (between-run correlation: r = -0.02); a possible alternative explanation, that it was more difficult than the other tasks, was not supported by the data. The complex roving task may vary more than the other figure-ground tasks because it contains additional variability related to the second and third formants of a spoken sentence, and these components (including the frequency changes within each component and their relationship to the frequency changes in the first formant) might affect the extent to which the figure can be extracted; in contrast, variability in the extent to which the first formant changes frequency is present in both the coherent and complex roving figures. Happily, the finding that some of our tasks did not correlate with speech-in-noise rules out alternative explanations of the results, for example, that significant correlations were due to between-subject differences in motivation or arousal, or because some participants are simply better at performing these types of (lab) tasks. Under these explanations, we should have found significant correlations with all four figure-ground tasks.
Conclusions. Overall, our results are consistent with the notion that speech-in-noise difficulties can occur for a variety of reasons, which are attributable to impairments at different stages of the auditory pathway. We show that successful speech-in-noise perception relies on audiometric thresholds at the better end of the normally hearing range, which likely reflect differences at the auditory periphery. Our results also reveal that fundamental grouping processes, occurring centrally, are associated with successful speech-in-noise perception. We introduce new figure-ground tasks that help to assess the grouping of static acoustic patterns and the ability to track acoustic sources that change in frequency over time; interestingly, both of these processes appear to be important for speech-in-noise perception. These findings highlight that speech-in-noise difficulties are not a unitary phenomenon, and suggest that different tests are required to explain why different people struggle to understand speech when other sounds are present. Assessing both peripheral (audiometric thresholds) and central (grouping) processes can help to characterise speech-in-noise deficits.

Methods
Subjects. 103 participants completed the experiment. We measured their pure-tone audiometric thresholds at octave frequencies between 0.25 and 8 kHz in accordance with BS EN ISO 8253-1 24. We excluded 6 participants who had pure-tone thresholds that would be classified as mild hearing loss (6-frequency average ≥ 20 dB HL in either ear). We analysed the data from the remaining 97 participants, a sample size we determined would be sufficient to detect significant correlations of r² ≥ 0.12 with 0.8 power 49. The 97 participants (40 male) were 18-60 years old (median = 24 years; interquartile range = 11). The study was approved by the University College London Research Ethics Committee and was performed in accordance with relevant guidelines and regulations. Informed consent was obtained from all participants.

Experimental procedures. The experiment was conducted in a sound-attenuating booth. Participants sat in a comfortable chair facing an LCD visual display unit (Dell Inc.). Acoustic stimuli were presented through an external sound card (ESI Maya 22 USB; ESI Audiotechnik GmbH, Leonberg) connected to circumaural headphones (Sennheiser HD 380 Pro; Sennheiser electronic GmbH & Co. KG) at 75 dB A.
Participants first performed a short (<5 minute) block to familiarise them with the figure-ground stimuli. During the familiarisation block, they heard the figure and ground parts individually and together, with and without a gap in the figure.
Next, participants completed 5 tasks (Fig. 1A): four figure-ground tasks and two blocks of a speech-in-babble task. All tasks were presented in separate blocks and their order was counterbalanced across participants. Immediately before each task began, participants completed 5 practice trials with feedback. No feedback was provided during the main part of each task.
One of the figure-ground tasks was based on a detection task developed by Teki et al. 18, in which the stimuli consisted of 40 50-ms chords with a 0-ms inter-chord interval. Each chord contained multiple pure tones, each gated by a 10-ms raised-cosine ramp. The background (Fig. 1B) comprised 5-15 pure tones at each time window, whose frequencies were selected randomly from a logarithmic scale between 179 and 7246 Hz (1/24th-octave separation). The background lasted 40 chords (2000 ms). For the figure, we used a coherence level of 3 (components) and a duration of 6 (chords). The frequencies of the 3 figure components were also selected randomly, but with the additional requirement that they were separated by more than one equivalent rectangular bandwidth (ERB). The frequencies of the figure were the same at adjacent chords. The figure lasted 6 chords (300 ms) and started on chord 15-20 of the stimulus. For half of the stimuli, there was no figure; to ensure that figure-present and figure-absent stimuli had the same number of elements (and therefore the same amplitude), figure-absent stimuli contained an additional 3 components of random frequencies, which had the same onset and duration as the figures in figure-present stimuli. Participants' task was to decide whether the figure was present or absent on each trial. Each participant completed 50 trials, with an inter-trial interval between 0.8 and 1.2 seconds.
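The stimulus construction above can be sketched in a few lines. This toy version returns sets of tone frequencies per chord rather than audio, and omits the >1 ERB spacing constraint on figure components; everything else follows the parameters in the text:

```python
import numpy as np

# 1/24th-octave grid from 179 Hz up to (at most) 7246 Hz.
N_STEPS = int(np.log2(7246 / 179) * 24) + 1
GRID = 179.0 * 2.0 ** (np.arange(N_STEPS) / 24.0)

def make_sfg_trial(figure_present, rng):
    """Toy stochastic figure-ground trial: 40 chords, each a set of tone
    frequencies (Hz). The figure repeats 3 fixed frequencies over 6
    consecutive chords; figure-absent trials instead add 3 random
    frequencies per chord, matching the number of elements."""
    chords = [set(rng.choice(GRID, size=rng.integers(5, 16), replace=False))
              for _ in range(40)]                 # background: 5-15 tones
    start = int(rng.integers(14, 20))             # figure on chords 15-20
    if figure_present:
        fig = rng.choice(GRID, size=3, replace=False)
    for t in range(start, start + 6):
        extra = fig if figure_present else rng.choice(GRID, size=3,
                                                      replace=False)
        chords[t].update(extra)
    return chords, start
```

A detection mechanism then only has to notice that, on figure-present trials, the same three frequencies recur over six consecutive chords, whereas on figure-absent trials the added components change from chord to chord.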
In three of the figure-ground tasks, participants completed a two-interval two-alternative forced choice discrimination task. On each trial, participants heard two figure-ground stimuli sequentially, with an inter-stimulus interval of 400 ms. Both stimuli contained a figure that lasted on average 42 chords (2100 ms) and a background that lasted exactly 3500 ms (70 chords). For one stimulus, 6 chords (lasting 300 ms) were omitted from the figure. For the other stimulus, the same number of components (3) was omitted from the background for the same duration (6 chords; 300 ms). Participants' task was to decide which of the two stimuli (first or second interval) had a 'gap' in the figure.
In the "same-frequency" task, the figure lasted exactly 42 chords (2100 ms) and the 3 figure components had the same frequencies at adjacent chords, similar to the figure-ground detection task. In the "complex roving" task, the 3 figure components were based on the first three formants of the sentences used in the speech-in-noise tasks. We extracted the formants using Praat (http://www.fon.hum.uva.nl/praat/) and averaged the frequencies of the formants in 50-ms time bins; we then generated 50-ms pure tones at those frequencies. In this task, the figure lasted for the same duration as the extracted formants (34-50 chords; median = 42 chords; interquartile range = 4). In the "coherent roving" task, the 3 figure components were multiples of the first formant frequencies: the first component was equal to the first formant frequency, the second component was the first component multiplied by the average difference between the first and second formants in the sentence, and the third component was the second component multiplied by the average difference between the second and third formants. In all three tasks, we varied the TMR between the figure and ground in a 1-up 1-down adaptive procedure 50 to estimate the 50% threshold. Each run started at a TMR of 6 dB. The step size started at 2 dB and decreased to 0.5 dB after 3 reversals. For each task, we adapted the TMR in two separate but interleaved runs, which were identical except that different stimuli were presented. Each run terminated after 10 reversals.
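The adaptive procedure can be sketched as follows. This is an illustrative reimplementation: the starting level, step sizes, and termination rule follow the text above, and the median-of-last-six-reversals threshold rule is taken from the Analyses section; `respond` is a hypothetical stand-in for a participant's trial response:

```python
import numpy as np

def staircase(respond, start_tmr=6.0, big_step=2.0, small_step=0.5):
    """1-up 1-down adaptive track converging on the 50% point: TMR falls
    after a correct response and rises after an error. The step drops
    from 2 dB to 0.5 dB after 3 reversals; the run ends after 10
    reversals. `respond(tmr)` returns True for a correct trial."""
    tmr, step = start_tmr, big_step
    reversals, last_correct = [], None
    while len(reversals) < 10:
        correct = respond(tmr)
        if last_correct is not None and correct != last_correct:
            reversals.append(tmr)          # direction of travel reversed
            if len(reversals) == 3:
                step = small_step
        last_correct = correct
        tmr += -step if correct else step
    return float(np.median(reversals[-6:]))  # threshold estimate
```

For example, a deterministic observer who is correct whenever the TMR exceeds 8 dB yields a threshold estimate close to 8 dB.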
Participants completed two blocks of the speech-in-noise task, each of which contained two interleaved runs; these were identical, except that different sentences were presented as targets. Sentences were from the English version of the Oldenburg matrix set (HörTech, 2014) and were recorded by a male native-English speaker with a British accent. The sentences are of the form "<Name> <verb> <number> <adjective> <noun>" and contain 10 options for each word (see Fig. 1A). An example is "Cathy brought four large chairs". The sentences were presented simultaneously with 16-talker babble, which began 500 ms before the sentence began, ended 500 ms after the sentence ended, and was gated by a 10-ms raised-cosine ramp. A different segment of the babble was presented on each trial. Participants' task was to report the 5 words from the sentence (in any order) by clicking words from a list on the screen. The sentence was classified as correct if all 5 words were reported correctly. We adapted the TMR between the sentence and babble in a 1-up 1-down adaptive procedure, similar to the figure-ground discrimination tasks. The TMR began at 0 dB and the step size started at 2 dB, decreasing to 0.5 dB after 3 reversals. Each run terminated after 10 reversals.

Analyses. For the figure-ground detection task, we calculated sensitivity (d′) 51 across all 50 trials.
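The sensitivity calculation can be sketched as below. The log-linear correction for extreme rates is our own assumption (the paper cites a standard signal detection reference rather than a specific correction):

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' for a yes/no detection task: z(hit rate) - z(false-alarm rate).
    A log-linear correction keeps rates away from 0 and 1 so that the
    z-transform (inverse normal CDF) stays finite."""
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)
```

For example, 20 hits and 5 misses on figure-present trials with 5 false alarms and 20 correct rejections on figure-absent trials gives d′ ≈ 1.6, whereas chance-level responding gives d′ ≈ 0.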
For the adaptive tasks (speech-in-babble and figure-ground discrimination tasks), we calculated thresholds as the median of the last 6 reversals in each run. For the main analyses, the thresholds from the two interleaved runs within each block were averaged.
To isolate the contributions of different tasks to speech-in-noise, we used a hierarchical linear regression with the stepwise method. When estimating the full model, all tasks (audiogram and four figure-ground tasks) were entered into the analysis, in case any of the tasks showed a significant relationship with speech-in-noise only after accounting for the variance explained by another task.
All correlations are Pearson's correlation coefficients, reported without correction, given that the conclusions of the paper are based on the results of the regression analyses rather than the p-values associated with the correlations.
To compare average thresholds between different figure-ground discrimination tasks, we used paired-samples t-tests.

Data availability
The data and analysis scripts are available upon reasonable request from the corresponding author (E.H.).