Evaluation of Cochlear Implant Candidates using a Non-linguistic Spectrotemporal Modulation Detection Test

Adults who score 50% correct or less on an open-set sentence recognition test under the best aided listening condition may be considered candidates for a cochlear implant (CI). However, ensuring 'the best aided listening condition' requires significant time and clinical resources. Because speech signals are composed of dynamic spectral and temporal modulations, psychoacoustic sensitivity to combined spectral and temporal modulation cues may be a strong predictor of aided speech recognition. In this study, we tested 27 adults with moderately severe to profound hearing loss to explore whether a non-linguistic, unaided spectrotemporal modulation (STM) detection test could serve as a surrogate measure for evaluating CI candidacy. STM detection thresholds were significantly correlated with aided sentence recognition scores for the 27 hearing impaired listeners. Receiver operator characteristic (ROC) curve analysis showed that CI candidacy evaluations based on the unaided STM detection test and on the traditional best-aided sentence recognition test were fairly consistent. More specifically, our results indicated that an STM detection test using low spectral and temporal modulation rates might provide an efficient process for CI candidacy evaluation.

Scientific Reports | 6:35235 | DOI: 10.1038/srep35235

… be needed for adults before speech testing. Limited benefit from hearing aids is defined by speech perception test scores under the best aided listening condition.
However, the current guidelines require significant time and effort to ensure the best aided listening condition. Shim et al. (2014) explored the possibility of using surrogate non-speech psychoacoustic measures without HAs to address this limitation 1 . They assessed unaided spectral ripple discrimination (SRD) and temporal modulation detection in fifteen hearing impaired listeners and compared performance on these psychoacoustic tasks with aided speech perception 1 . They reported significant correlations of aided speech perception outcomes with both unaided spectral modulation sensitivity and temporal modulation detection performance. Most importantly, Shim et al. demonstrated that the unaided SRD test could be a promising tool for evaluating CI candidacy 1 .
Although speech is composed of dynamic spectral and temporal modulations that vary across utterances, previous studies have often measured spectral and temporal modulation sensitivity separately. Spectral and temporal modulation cues can, however, be combined in a single test signal 2,3 . Such stimuli, often called "spectrotemporal modulation (STM)" stimuli, have spectral patterns that change over time or temporal modulation patterns that differ across frequency channels. Bernstein et al. reported that STM sensitivity, especially for stimuli with low temporal rates and high spectral densities, is strongly correlated with speech intelligibility in hearing impaired listeners 4 . However, no studies have evaluated unaided STM sensitivity as a predictor of speech perception in hearing impaired listeners with severe to profound sensorineural hearing loss who might be potential CI candidates 4,5 . Won et al. recently demonstrated that STM detection thresholds at low spectral densities are significantly correlated with sentence identification scores in CI users. Based on these previous results, we hypothesized that the unaided STM detection test might serve as a viable surrogate measure for CI candidacy evaluation. To test this hypothesis, performance on the unaided STM test was measured and compared with speech perception performance under the best aided listening condition.

Results
Speech perception and psychoacoustic performance. Speech perception performance for the twenty-seven hearing impaired participants is shown in Fig. 1. Unaided monosyllabic word recognition scores ranged from 0% to 80% (mean: 23%), and aided monosyllabic word recognition scores ranged from 0% to 95% (mean: 42%) (Fig. 1A). Aided sentence recognition scores ranged from 0% to 99% (mean: 46%) (Fig. 1B). Speech recognition ability thus varied substantially across subjects. STM detection performance without HAs is shown in Fig. 2; lower STM detection thresholds indicate better detection performance. When thresholds for spectral densities of 0.5, 1.0, and 2.0 c/o were compared, subjects performed better at lower spectral densities. STM thresholds also varied substantially across subjects in every stimulus condition.
Correlation analyses. Correlations between unaided STM detection thresholds and speech recognition scores are summarized in Table 1. All STM detection thresholds correlated significantly with all three speech recognition scores for the 27 participants. STM detection thresholds at 2.0 c/o & 10 Hz showed the strongest correlation with unaided monosyllabic consonant-vowel-consonant (CVC) word scores (r = −0.742, p < 0.0001). For the aided conditions, STM detection thresholds at 0.5 c/o & 5 Hz correlated significantly with monosyllabic CVC word scores (r = −0.821, p < 0.0001) and K-CID sentence recognition scores (r = −0.762, p < 0.0001). A scattergram of unaided STM thresholds against aided speech perception scores is shown in Fig. 3.

Figure 1. (A) Monosyllabic word recognition tests in a quiet background were performed for each subject without HAs and with either their own HAs or loaner HAs. Unaided monosyllabic word recognition scores ranged from 0% to 80%; aided scores ranged from 0% to 95%. (B) Sentence recognition scores (average of two lists) in the "best-fit" condition ranged from 0% to 99%. The dotted line indicates 50% correct. HAs: hearing aids, CVC: consonant-vowel-consonant, K-CID: Korean Central Institute for the Deaf.
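Because lower STM thresholds mean better detection while higher speech scores mean better recognition, a strong relationship between the two appears as a negative r, as in Table 1. The sketch below computes a Pearson correlation coefficient with NumPy. It assumes Pearson's r was the statistic used (the text reports r values without naming the method), the function name is hypothetical, and the paired values are made up for illustration, not data from this study.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()          # center both samples
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


# Hypothetical values for illustration only (not data from this study):
# lower (more negative) STM thresholds paired with higher % correct scores.
stm_threshold_db = [-16.0, -12.0, -8.0, -4.0]
sentence_pct = [80.0, 60.0, 40.0, 20.0]
r = pearson_r(stm_threshold_db, sentence_pct)
print(r)  # -1.0: perfectly linear with a negative slope
```

The sign convention is the point of the example: a better listener has both a lower threshold and a higher score, so r is negative.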

ROC Curve Analysis.
To further investigate the clinical potential of the unaided STM detection test as a surrogate measure for CI candidacy evaluation, a receiver operator characteristic (ROC) curve analysis was used to assess how accurately the STM tests identified CI candidates. For each psychoacoustic measure, the optimal cutoff was defined as the value that best predicted a K-CID sentence recognition score below 50%.
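A minimal sketch of this kind of ROC analysis in Python with NumPy. A subject is labeled a CI candidate when the aided K-CID score is below 50%, and candidacy is predicted when the unaided STM threshold is at or above a cutoff (higher thresholds mean poorer detection). The text does not say how the optimal cutoff was chosen; Youden's J (sensitivity + specificity − 1) is used here as an assumed criterion, and the function names and toy data are hypothetical.

```python
import numpy as np


def roc_curve(stm_thresholds_db, is_candidate):
    """ROC for the rule 'predict CI candidate if STM threshold >= cutoff'."""
    scores = np.asarray(stm_thresholds_db, dtype=float)
    labels = np.asarray(is_candidate, dtype=bool)
    cutoffs = np.unique(scores)[::-1]                # strictest cutoff first
    n_pos, n_neg = labels.sum(), (~labels).sum()
    fpr, tpr = [0.0], [0.0]
    for c in cutoffs:
        pred = scores >= c
        tpr.append((pred & labels).sum() / n_pos)    # sensitivity
        fpr.append((pred & ~labels).sum() / n_neg)   # 100% - specificity
    return np.array(fpr), np.array(tpr), cutoffs


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def youden_cutoff(fpr, tpr, cutoffs):
    """Cutoff maximizing sensitivity + specificity - 1 (Youden's J)."""
    j = tpr[1:] - fpr[1:]                            # skip the (0, 0) start
    return float(cutoffs[np.argmax(j)])


# Hypothetical thresholds (dB) and candidacy labels, not data from this study.
thr = [-20, -18, -15, -10, -8, -5]
cand = [0, 0, 1, 1, 0, 1]
fpr, tpr, cuts = roc_curve(thr, cand)
print(area_under_curve(fpr, tpr))    # about 0.778 (= 7/9)
print(youden_cutoff(fpr, tpr, cuts)) # -15.0
```

Sweeping only the observed threshold values is enough, because the predicted labels can change only at those values; tied scores move together, so the trapezoidal area equals the Mann-Whitney estimate of the AUC.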
The ROC curves for the 27 ears in the STM detection tests at the six stimulus conditions are shown in Fig. 4A to F, plotting sensitivity (%) as a function of 100% − specificity. The areas under the ROC curve in Fig … For STM detection at 0.5 c/o & 5 Hz, which showed the strongest correlation with K-CID sentence scores, the sensitivity, specificity, positive predictive value, negative predictive value, and accuracy at the optimal cutoff were 78.6%, 76.9%, 78.6%, 76.9%, and 77.8%, respectively (Fig. 4A). For STM detection at 2.0 c/o & 5 Hz, which showed the second strongest correlation with K-CID sentence scores and the highest accuracy at the optimal cutoff, these values were 71.4%, 92.3%, 90.9%, 75%, and 81.5%, respectively (Fig. 4E). For STM detection at 2.0 c/o & 10 Hz, which also correlated strongly with K-CID sentence scores and shared the highest accuracy at the optimal cutoff, they were 78.6%, 84.6%, 84.6%, 78.6%, and 81.5%, respectively (Fig. 4F).
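All five statistics reported for each condition follow from a 2 × 2 table of predicted versus actual candidacy. The counts below are back-calculated from the reported percentages (they imply 14 subjects labeled as candidates and 13 as non-candidates); they are our inference, not counts stated in the text. Recomputing the statistics from them reproduces the reported values, which serves as a quick consistency check. The function name is hypothetical.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, NPV and accuracy from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# (TP, FN, TN, FP) inferred from the reported percentages, not given in the text.
conditions = {
    "0.5 c/o & 5 Hz": (11, 3, 10, 3),
    "2.0 c/o & 5 Hz": (10, 4, 12, 1),
    "2.0 c/o & 10 Hz": (11, 3, 11, 2),
}
for name, counts in conditions.items():
    pct = {k: round(100 * v, 1) for k, v in diagnostic_metrics(*counts).items()}
    print(name, pct)
```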

Discussion
The current study evaluated the potential of unaided non-linguistic STM tests as surrogate measures for CI candidacy evaluation. Shim et al. demonstrated that an unaided spectral-ripple discrimination test could be a promising tool for evaluating CI candidacy 1 . Although the data are not reported in this paper, we replicated the finding of Shim et al. 1 in 25 of the 27 patients who participated in the current study, supporting the use of a non-linguistic psychoacoustic test as an efficient surrogate measure regardless of the listener's language. Whereas previous studies have often measured spectral or temporal modulation sensitivity separately, the STM detection test in this study used test signals that combine spectral and temporal modulation cues to evaluate potential CI candidacy. Unaided STM thresholds correlated significantly with all three speech recognition tests. Among the six stimulus conditions, STM thresholds at 0.5 c/o & 5 Hz showed the strongest correlation with K-CID sentence scores (r = −0.762, p < 0.0001). Anderson et al. and Saoji et al. have demonstrated that spectral modulation detection thresholds at lower spectral densities (0.25 to 0.5 c/o) correlate more strongly with speech perception abilities in CI users 6,7 , consistent with the finding of the current study. STM thresholds at 0.5 c/o & 5 Hz also correlated significantly with aided CVC word scores (r = −0.821, p < 0.0001) and aided K-CID sentence scores (r = −0.876, p < 0.0001) in this study. In general, unaided STM detection thresholds correlated more strongly with aided speech performance than with unaided monosyllabic CVC word recognition scores (Table 1).
Because cochlear implantation is typically approved for patients who receive limited benefit from appropriately fitted HAs (as defined by sentence recognition scores), the ROC curve analysis used STM detection thresholds to predict K-CID sentence scores. Results of the ROC curve analysis demonstrated that the STM detection test using low spectral and temporal modulation rates (0.5 c/o & 5 Hz) may serve as a good complementary measure for evaluating CI candidacy, with an area under the ROC curve of 0.8269. Based on speech perception performance and the CI candidacy predictions made by the STM tests (Table 2), none of the subjects who were predicted to meet CI candidacy by all STM detection tests had a sentence score above 50% under the best aided condition. When CI candidacy predicted from the cutoff values of the six STM detection thresholds was compared with that based on K-CID sentence scores, only three subjects (S11, S13, and S14) showed inconsistent predictions between the sentence recognition test and the STM detection tests. Although S11, S13, and S14 had sentence scores below 50% under the best aided condition, all three had relatively good hearing in at least one ear (see Tables 2 and 3). The aided speech perception test requires a significant investment of time and effort to ensure the best aided listening condition. In addition, fitting HAs to patients who eventually receive cochlear implants can incur unnecessary costs. Unaided psychoacoustic measures could therefore be cost-effective because they can be administered without fitting HAs. In its current form, the STM test took about 60 minutes in total for the six stimulus conditions. However, testing time could be reduced by restricting testing to the stimulus conditions with the best predictive power, such as a spectral density of 0.5 c/o with a temporal rate of 5 Hz or 10 Hz.
Such efforts have already been reported in the development of clinical assessments of psychoacoustic performance 8,9 . Furthermore, STM tests can be readily performed in any language because they use non-speech stimuli. They are also less likely to be influenced by non-auditory factors that might affect speech recognition, such as the patient's cognitive processing ability, educational background, and age 1,10 . The STM test might therefore be used to establish an international or cross-language standard for CI candidacy based on psychoacoustic measures. In addition, no learning effect was observed for the psychoacoustic test, whereas speech perception tests are prone to learning effects when listeners are retested with the same material, because each test list contains a limited number of sentences 10 . Although it may be premature to use the STM test as the gold-standard measure of CI candidacy, the current study demonstrates that the STM detection test may be a useful complementary measure for determining CI candidacy. Further studies are needed to optimize the STM stimulus conditions and testing paradigm to improve its power for CI candidacy evaluation.

Subjects.
Twenty-seven hearing impaired listeners participated in this study. All subjects were native Korean-speaking adults with an average hearing loss greater than 56 dB HL in both ears at four frequencies (0.5, 1, 2, and 4 kHz). Their mean hearing thresholds at each frequency are shown in Fig. 5. Demographic characteristics of these subjects are shown in Table 4. All participants provided written informed consent. The study protocol was approved by the Samsung Medical Center Institutional Review Board (2013-06-031), and the study was carried out in accordance with the approved guidelines.
Test battery administration. All subjects completed all STM detection and speech recognition tests. A custom-made MATLAB® (The MathWorks, Natick, MA) graphical user interface was used to present acoustic stimuli for the psychoacoustic tests. Stimuli were generally presented at the most comfortable level (MCL), which was determined after frequency-dependent amplification using the half-gain rule. Briefly, the stimuli were first set at 65 dB SPL. A frequency-independent gain equal to half of the pure-tone average was then applied to the stimuli. Finally, the presentation level was adjusted within a range of ±10 dB to estimate the MCL for each participant. Pure-tone thresholds and MCLs for all tested ears are summarized in Table 3. The amplified stimuli were presented binaurally through insert earphones. Speech recognition tests in quiet were conducted using monosyllabic words and sentences, with or without HAs. Aided speech recognition tests were conducted under the best-fit listening condition using participants' own HAs, or loaner HAs for those who did not have their own. The order of test administration varied within and across subjects.
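The starting presentation level described above can be worked through for one ear as follows. This sketch assumes the pure-tone average uses the same four frequencies as the inclusion criterion (0.5, 1, 2, and 4 kHz) and treats the ±10 dB adjustment as the window within which each participant's MCL was chosen; the function names and the audiogram are hypothetical.

```python
def pure_tone_average(thresholds_db_hl, freqs_hz=(500, 1000, 2000, 4000)):
    """Mean hearing threshold (dB HL) over the given audiometric frequencies."""
    return sum(thresholds_db_hl[f] for f in freqs_hz) / len(freqs_hz)


def mcl_search_range(thresholds_db_hl, base_db_spl=65.0, adjust_db=10.0):
    """Starting level and the +/- window searched for the MCL.

    A single gain of PTA / 2 (half-gain rule) is added to the base level.
    """
    gain = pure_tone_average(thresholds_db_hl) / 2.0
    start = base_db_spl + gain
    return start, (start - adjust_db, start + adjust_db)


# Hypothetical audiogram for one ear (dB HL), not data from this study.
ear = {500: 60, 1000: 70, 2000: 80, 4000: 90}
start, window = mcl_search_range(ear)
print(start, window)  # 102.5 (92.5, 112.5)
```

The final MCL within that window depends on the listener's judgment, so the sketch only fixes where the search starts and how far it may move.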

Spectrotemporal Modulation (STM) Detection Test.
To create STM stimuli with a bandwidth of four octaves (i.e., 354-5664 Hz), the following equation was used, based on a previously established technique 2 :

$$S(x,t) = 1 + A\,\sin\left[2\pi(\omega t + \Omega x) + \Phi\right] \qquad (1)$$

In Eq. (1), S was the amplitude of each carrier tone as a function of time (t) and logarithmic frequency (x) (i.e., x = log2(f/354), where f was the frequency). Four thousand carrier tones were spaced equally on a logarithmic frequency scale over the 354-5664 Hz band. The stimuli had a total duration of 1 s. The spectral envelope of the complex tone was modulated as a single sinusoid along the logarithmic frequency axis on a linear amplitude scale. The amplitude (A) of the spectral ripple was determined by adaptively adjusting the modulation depth of all carrier tones simultaneously; values of A between 0 and 1 corresponded to 0% to 100% modulation of the otherwise flat spectral envelope. Ω was the spectral density in cycles per octave (c/o), and Φ was the starting phase of the spectral modulation, randomized between 0 and 2π radians. The STM stimuli were also modulated in time by sweeping the modulated spectral envelope across frequency at a constant velocity. In Eq. (1), ω was this velocity, expressed as the number of sweeps per second (Hz), and is referred to as the temporal rate in the current study. Positive and negative velocities produce STM stimuli whose spectral modulations move upward or downward in frequency, repeating over time. Because a previous study showed that the direction of spectral modulation has no effect on STM detection thresholds for normal hearing or hearing impaired listeners 4 , the current study tested only the falling direction. Figure 6 shows example spectrograms of STM stimuli with different combinations of spectral density and temporal rate.
The upper and lower rows show spectrograms of STM stimuli with spectral densities of 0.5 and 1.0 cycles per octave (c/o), respectively. As shown in Fig. 6, the spectral modulation pattern along the frequency axis is broader in the upper row than in the lower row. The left and right columns show spectrograms of STM stimuli with temporal rates of 5 and 10 Hz, respectively. The temporal rate determines how quickly the ripple sweeps from high to low frequency.
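The stimulus construction can be sketched in Python with NumPy as below, assuming the envelope form S(x, t) = 1 + A·sin[2π(ωt + Ωx) + Φ]. The sampling rate (44.1 kHz), random carrier phases, peak normalization, and the sign convention that makes a positive ω a falling sweep are assumptions not stated in the text, and the function names are hypothetical. Absolute presentation level would be set separately by calibration.

```python
import numpy as np


def ripple_envelope(x, t, depth, density, rate, phase):
    """Linear-amplitude envelope 1 + A*sin(2*pi*(w*t + Omega*x) + Phi).

    x: octaves above the lowest carrier, log2(f / 354); t: time in seconds.
    depth is A (0..1), density is Omega (c/o), rate is w (Hz), phase is Phi.
    With this sign convention a positive rate moves the ripple peaks toward
    lower x as t increases, i.e. a falling sweep.
    """
    return 1.0 + depth * np.sin(2 * np.pi * (rate * t + density * x) + phase)


def stm_stimulus(depth_db, density, rate, fs=44100, dur=1.0, f_lo=354.0,
                 n_octaves=4.0, n_carriers=4000, rng=None):
    """Sum of log-spaced tones whose amplitudes follow ripple_envelope.

    depth_db is 20*log10(A), so 0 dB is 100% modulation; pass None for the
    unmodulated (flat-spectrum) reference interval.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(int(round(fs * dur))) / fs
    depth = 0.0 if depth_db is None else 10.0 ** (depth_db / 20.0)
    x = np.linspace(0.0, n_octaves, n_carriers)      # equal log spacing
    freqs = f_lo * 2.0 ** x                           # 354 ... 5664 Hz
    ripple_phase = rng.uniform(0.0, 2 * np.pi)        # Phi, randomized
    carrier_phases = rng.uniform(0.0, 2 * np.pi, n_carriers)  # assumed random
    signal = np.zeros_like(t)
    for xi, fi, ph in zip(x, freqs, carrier_phases):  # one carrier at a time
        env = ripple_envelope(xi, t, depth, density, rate, ripple_phase)
        signal += env * np.sin(2 * np.pi * fi * t + ph)
    return signal / np.max(np.abs(signal))            # peak-normalized
```

For example, `stm_stimulus(-10.0, 0.5, 5.0)` and `stm_stimulus(None, 0.5, 5.0)` would give the test and reference intervals for the 0.5 c/o, 5 Hz condition at a modulation depth of −10 dB. Summing carriers in a loop is slower than a fully vectorized version but keeps memory use to a single 1-s buffer.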
To measure STM detection thresholds, a two-interval, two-alternative forced-choice (2I, 2-AFC) adaptive paradigm was used, with a 500-ms silent gap between the two intervals. One interval contained modulated noise (i.e., the test signal) and the other contained steady noise (i.e., the reference signal). Subjects were instructed to choose the interval containing a sound that seemed to chirp like a bird, vibrate, or move over time and frequency; their task was to identify the interval that contained the STM stimulus. A 1-down, 1-up adaptive procedure was used to measure the STM detection threshold, starting at a modulation depth of 0 dB. The modulation depth was adjusted in 4-dB steps up to the fourth reversal and in 2-dB steps for the next 10 reversals, and the final 10 reversals of each run were averaged to obtain the STM detection threshold. To evaluate STM detection performance under different modulation conditions, three spectral densities (Ω = 0.5, 1, and 2 c/o) and two temporal rates (ω = 5 and 10 Hz) were tested, giving a total of six STM stimulus conditions. Subjects completed the six conditions in a random order and then repeated them in a newly created random order, so the sequence of stimulus conditions was randomized within and across subjects. If the first two tracks for a given stimulus condition differed by more than 3 dB, a third adaptive track was obtained. The final threshold for each condition was the mean of the two (or three) adaptive tracks. Before testing, example stimuli were played until subjects were familiar with the STM stimuli and the task. The entire procedure (including practice and actual testing for all stimulus conditions) took about 60 minutes.

Speech recognition tests in quiet background.
Two types of speech recognition tests, using either monosyllabic words or sentences, were administered in the current study. For the monosyllabic word recognition test, twenty monosyllabic consonant-vowel-consonant (CVC) words were presented in a live male voice at each subject's most comfortable level (MCL). For the unaided monosyllabic word test, stimuli were presented at the MCL determined after frequency-dependent amplification using the half-gain rule (Table 3). For the aided monosyllabic word test, stimuli were presented at a calibrated level of 65 dB HL. The percent correct score was calculated from the words repeated correctly. For the sentence recognition test, two lists of Korean Central Institute for the Deaf (K-CID) sentences were administered at an average level of 65 dBA 11 . Each list contained ten sentences with four keywords each, so a total of 80 keywords were scored for each subject. All participants were instructed to repeat aloud the sentence that they heard, and the percent correct score was calculated as the percentage of keywords correctly recognized.
The monosyllabic word recognition test was performed both with and without HAs. For aided testing, subjects used either their own HAs or laboratory-owned HAs (Audeo S Smart V, Phonak, Switzerland). The sentence recognition test was always administered with HAs. For the unaided monosyllabic word recognition test, stimuli were presented through headphones (TDH39, Telephonics, USA). For the aided monosyllabic word and sentence recognition tests, speech stimuli were presented in the sound field through a loudspeaker (Control 1 Xtream, JBL, USA, for the monosyllabic word test; HS-50M, Yamaha, Japan, for the sentence recognition test). Subjects sat 1 m from the loudspeaker and were asked to face it throughout the experiment.
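The adaptive procedure described under Spectrotemporal Modulation (STM) Detection Test (1-down, 1-up; 4-dB steps until the fourth reversal, then 2-dB steps for 10 more reversals; the mean of the final 10 reversals; a third track when the first two differ by more than 3 dB) can be sketched as follows. How reversals are counted, the 0-dB ceiling on modulation depth, the generic `n_down` parameter, and the simulated listener in the usage line are assumptions, and the function names are hypothetical.

```python
import numpy as np


def run_staircase(respond, start_db=0.0, n_down=1, big_step=4.0,
                  small_step=2.0, n_big_reversals=4, n_small_reversals=10,
                  n_average=10, max_db=0.0, max_trials=1000):
    """Adaptive track on modulation depth (dB re 100% modulation).

    respond(level_db) -> True if the listener picked the STM interval.
    After n_down consecutive correct responses the depth is lowered (harder);
    after any error it is raised (easier). Returns the mean depth at the
    last n_average reversals.
    """
    level, run, direction, reversals = start_db, 0, 0, []
    for _ in range(max_trials):
        if len(reversals) >= n_big_reversals + n_small_reversals:
            break
        if respond(level):
            run += 1
            if run < n_down:
                continue                  # not enough consecutive correct yet
            run, new_direction = 0, -1
        else:
            run, new_direction = 0, +1
        if direction != 0 and new_direction != direction:
            reversals.append(level)       # level at which direction changed
        direction = new_direction
        step = big_step if len(reversals) < n_big_reversals else small_step
        level = min(level + direction * step, max_db)  # cannot exceed 0 dB
    return float(np.mean(reversals[-n_average:]))


def final_threshold(tracks, run_third=None, tolerance_db=3.0):
    """Mean of two tracks, or of three if the first two differ by > tolerance."""
    tracks = list(tracks)
    if abs(tracks[0] - tracks[1]) > tolerance_db and run_third is not None:
        tracks.append(run_third())
    return float(np.mean(tracks))
```

With a deterministic simulated listener that detects any depth above −12.5 dB, `run_staircase(lambda level: level > -12.5)` alternates between −12 and −14 dB once the step size drops to 2 dB and returns −13.0.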