Hyperactive sensorimotor cortex during voice perception in spasmodic dysphonia

Spasmodic dysphonia (SD) is characterized by an involuntary laryngeal muscle spasm during vocalization. Previous studies measured brain activation during voice production and suggested that SD arises from abnormal sensorimotor integration involving the sensorimotor cortex. However, it remains unclear whether this abnormal sensorimotor activation merely reflects neural activation produced by abnormal vocalization. To identify the specific neural correlates of SD, we used a sound discrimination task without overt vocalization to compare neural activation between 11 patients with SD and healthy participants. Participants underwent functional MRI during a two-alternative judgment task for auditory stimuli, which could be modal or falsetto voice. Since vocalization in falsetto is intact in SD, we predicted that neural activation during speech perception would differ between the two groups only for modal voice and not for falsetto voice. Group-by-stimulus interaction was observed in the left sensorimotor cortex and thalamus, suggesting that voice perception activates different neural systems between the two groups. Moreover, the sensorimotor signals positively correlated with disease severity of SD, and classified the two groups with 73% accuracy in linear discriminant analysis. Thus, the sensorimotor cortex and thalamus play a central role in SD pathophysiology and sensorimotor signals can be a new biomarker for SD diagnosis.

Brain imaging data. The sound discrimination task strongly activated the superior temporal gyrus and the transverse temporal gyrus in both hemispheres relative to the baseline, consistent with previous studies showing bilateral superior temporal gyrus activation during voice perception as compared with non-voice perception in healthy humans [30][31][32]  To identify specific neuroanatomical correlates of SD, we then examined the interaction between the effects of stimulus type (modal voice vs. falsetto voice) and group (SD vs. control). This stimulus × group interaction Scientific Reports | (2020) 10:17298 | https://doi.org/10.1038/s41598-020-73450-0 www.nature.com/scientificreports/ was significant only in the left ventral sensorimotor cortex (− 54, − 8, 18, Z = 3.90, Fig. 2A), which showed a greater effect of group (SD > control) for modal voice than for falsetto voice. When the analysis was restricted to the modal voice condition, this left sensorimotor region showed greater activation for SD patients than for controls (Z = 3.82). In contrast, this between-group difference was non-significant for the falsetto voice condition (Z < 1.70). Therefore, these findings indicate that (1) this part of the left sensorimotor cortex was more active in SD than in control and that (2) this effect of group was significant only for modal voice and not for falsetto voice.
We then looked at activation patterns in four other regions previously associated with SD, i.e., the left sensorimotor cortex (− 41, − 12, 31), SMA (− 5, 1, 63), thalamus (− 12, − 18, 0), cerebellum (− 26, − 60, − 28), putamen (− 24, − 3, 3), and the right pallidum (27, − 6, 4) (Fig. 3). Although these cortical and subcortical structures did not survive the whole-brain SPM analysis as described above, we ran this ROI analysis to assess the stimulus x group interaction more closely for these regions by using a priori known coordinates of ROIs (see "Methods" section). As for the magnitude of neural activation, the ROI analysis revealed a significant stimulus x group interaction in the left sensorimotor cortex (Z = 3.36, p = 0.03) and in the left thalamus (Z = 3.09, p = 0.02). The same interaction was not significant in the cerebellum (Z = 2.54, p = 0.08), SMA (Z = 1.92, p = 0.2), putamen (Z = 1.65, p = 0.3) and pallidum (Z = 1, p = 0.5). Consistent with the whole-brain SPM, the left SMA showed greater activation to both modal voice and falsetto voice for SD patients relative to controls. This pattern of neural response is thus non-specific to the nature of speech stimuli and is unlikely to reflect the primary pathophysiological focus of the disease, because, as described above, SD patients show impaired vowel sound production but relatively unimpaired falsetto production 5 . Rather, this finding may be attributed to some secondary neural changes of the disease 33 , since the SMA is known to play a role in motor planning during speech production.
Given the observed stimulus × group interaction in the sensorimotor cortex and the thalamus, we additionally examined possible changes in functional connectivity in SD patients. Indeed, this PPI analysis revealed that the functional coupling between the thalamus (− 12, − 18, 0) and the left sensorimotor cortex (− 41, − 12, 31) showed a significant decrease in SD compared with controls (Z = 3.34, p = 0.01). Coupled with whole-brain SPM analysis, these findings thus suggest that the left sensorimotor area corresponds to the "human voice area" 34 and the thalamus plays a specific role in SD pathophysiology.
For SD patients, we further looked at possible correlations between fMRI signals and disease severity by plotting the activation level of the left ventral sensorimotor cortex (− 54, − 8, 18) identified in the SPM analysis against individual VHI-10 scales (Fig. 2B). We observed non-significant correlation between fMRI signals and the VHI-10 scores (r = 0.52, p = 0.1). In Fig. 4, we further plotted the magnitude of activation against the individual VHI-10 score for each of the six spherical ROIs created at a priori known coordinates. This supplemental analysis revealed significant positive correlations between neural signals in the left sensorimotor cortex and VHI-10 scores (r = 0.67, p = 0.02). For other regions, however, the correlations between neural signals and disease severity were non-significant (r = − 0.12, p = 0.7 for the SMA; r = 0.52, p = 0.1 for the thalamus; r = − 0.16, p = 0.6 for the cerebellum; r = − 0.11, p = 0.8 for the putamen; and r = − 0.1, p = 0.8 for the pallidum). While several cortical and subcortical regions have been associated with SD, these findings therefore suggest that only the neural signals from the sensorimotor cortex specifically reflect the clinical severity of the disease.
Lastly, we assessed the diagnostic power of fMRI to separate SD patients from healthy controls. For each ROI, individual activation signals were used to train and test three different machine learning algorithms (see "Methods" section and Fig. 5). The classification performance measures for each method are summarized in Table 1. Classification accuracy was consistently higher for the sensorimotor cortex (> 59%) compared to other three www.nature.com/scientificreports/ regions (~ 59% or chance-level). In particular, linear discriminant analysis for this region achieved 73% accuracy and yielded the highest performance in terms of precision, specificity, recall, and F-measures. These performance metrics seem to exceed those reported in a previous fMRI study that classified SD patients and healthy controls with 71% accuracy by using functional connectivity measures 14 . The present findings suggest that sensorimotor activation signals can help to distinguish SD patients from healthy controls accordingly.

Discussion
The present study used fMRI to isolate specific neural correlates of SD during speech perception. While most previous brain imaging studies of SD employed overt vocalization tasks, a critical problem inherent in such production tasks is that normal and dystonic vocalizations each should differently engage the speech control system at multiple neural levels (i.e., cognitive, motor, somatosensory, and auditory), which could confound statistical comparisons between the SD and control groups. This may explain the fact that previous neuroimaging studies have reported variously different patterns of sensorimotor activation in SD [9][10][11] . In the present study, we used a sound discrimination task to overcome the potential drawback associated with speech production and obtained similar level of behavior performance between SD patients and healthy controls. In SPM analysis, however, we observed different patterns of neural activation in the left ventral sensorimotor cortex between the two groups, i.e., the nature of speech stimuli (modal voice vs. falsetto voice) elicited a much weaker impact on SD patients than on healthy controls ( Fig. 2A). This same stimulus-by-group interaction was also obtained in the left sensorimotor ROI previously associated with laryngeal adduction in SD (Fig. 3). These results are in good accord with a well-known and time-honored clinical feature of adductor-type SD, i.e., its "taskspecificity, " where falsetto phonation is free of voice breakage 5 , and may concur with some neuroimaging studies of SD and other dystonic disorders 9,35,36 . These findings seem consistent with previous TMS studies showing that the hyperactive somatosensory cortex increases the neuromuscular excitability of the speech production system in SD 12,13 . Together, the present findings converge to suggest that reduced sensorimotor reactivity is a primary neural source responsible for the voice disorder in SD. Only the left ventral sensorimotor cortex showed significant interaction between stimulus and groups in whole-brain SPM. Note that this region showed greater activation to modal voice relative to falsetto voice, whereas this effect of stimulus was greater for SD than for controls. (B) fMRI signals in the same ventral sensorimotor cortex showed no significant correlation with symptom severity.

Scientific Reports
| (2020) 10:17298 | https://doi.org/10.1038/s41598-020-73450-0 www.nature.com/scientificreports/ Furthermore, we found that the activation level of the sensorimotor area positively correlated with symptom severity as measured with VHI-10 scores, providing additional support for the notion that sensorimotor overactivity is a key component underlying voice disorder in SD. This may also explain the finding that sensorimotor signals discriminated SD patients from controls with high accuracy (73%), which exceeded the classification performance reported by Battistella 14 . These findings suggest that the fMRI signal of the left sensorimotor cortex can serve as a good index of disease severity and a diagnostic clue for SD.
On the other hand, our ROI analysis revealed that the left thalamus exhibited a significant but different pattern of stimulus-by-group interaction, with the effect of stimulus being weaker from the baseline (Fig. 3). The observed difference in activation profile suggests that the sensorimotor cortex and the thalamus each play a different role in the pathophysiology of SD. This seems to be consistent with the proposal that subcortical structures, including the basal ganglia and thalamus, play a key part in abnormal sensorimotor integration in movement disorders 20 . Using PPI analysis, we indeed observed decreased functional connectivity between the thalamus and . Stimulus-by-group interaction in priori ROIs associated with SD. The left sensorimotor cortex showed significant interaction between stimulus and group. The same effect was also significant in the left thalamus. The SMA showed greater activation in SD relative to controls irrespective of stimulus type and thus showed no significant interaction (see "Results" section for detail).

Scientific Reports
| (2020) 10:17298 | https://doi.org/10.1038/s41598-020-73450-0 www.nature.com/scientificreports/ the sensorimotor cortex. This latter finding seems to be consistent with previous diffusion-tensor MRI studies showing impaired functional connectivity between the thalamus and precentral gyrus in idiopathic dystonia 37,38 . This disrupted thalamo-cortical connectivity is thought to reflect intrinsic abnormal neuronal firing within the thalamus and the precentral gyrus 37 . Coupled with the present finding, reduced thalamo-sensorimotor coupling may play a role in the generation of abnormal vocalization in SD. However, unlike the sensorimotor cortex, the left thalamus ROI showed no significant correlation between fMRI signals and disease severity. A plausible account for this finding is provided by recent neuroimaging and neuronal recording studies suggesting that ventrolateral thalamic activity reflects the strength of sensory afferents from the proprioceptive receptors in muscles, tendons, and joints 39,40 , rather than the neuromuscular excitability of laryngeal muscles. That is, thalamic activations are primarily driven by peripheral inputs from deep sensory cells and thus only weakly correlated with increased muscle activity in SD. Compared to subcortical signals, sensorimotor activation levels may be directly correlated with symptom severity, since phonation itself strongly depends on fine orchestration of multiple cortical networks 14 .
In conclusion, the present study identified the sensorimotor cortex and the thalamus as the primary neural correlates of SD by using the sound discrimination task without overt vocalization. Specifically, the neural activation level of the sensorimotor cortex was elevated in SD relative to controls and correlated with disease severity. Our findings suggest that fMRI signals from these structures may serve as a novel biomarker for SD, but leave at least two major questions to be addressed in future research. First, it is still open whether fMRI signals have the potential to differentiate SD from other functional voice disorders, such as psychogenic voice disorder and muscle tension dysphonia, which mimic the strained and effortful voice characteristics of adductor-type SD and thus lead to diagnostic confusion and treatment delay 3,4,6 . It is therefore an important clinical problem to improve the diagnostic precision for SD, because therapeutic strategies are radically different between adductor-type SD and other disorders (i.e., surgery or botulinum toxin injections for SD and psychologic or voice therapy for other disorders). While several perceptual scales, such as VHI-10, GRBAS scale, CAPE-V, and number of voice break, are now available for assessing symptom severity [41][42][43] , "gold standard" criteria for differential diagnosis has yet to be established. In future research, it is thus important to determine the extent of the diagnostic power of fMRI   www.nature.com/scientificreports/ signals in SD and other voice disorders. Second, it also remains unclear whether the observed hyperactivity in the sensorimotor cortex plays a causal role in the pathophysiology of SD or rather it reflects some plastic change in the brain after abnormal vocalization. As for other forms of dystonia, previous TMS studies show a causal link between the somatosensory cortex and writer's cramp 44,45 . Arguably, abnormal sensorimotor activation may be also causally linked with SD, but the causal relationship between SD and motor cortical excitability has not yet been established because the existing TMS data show a shorter cortical silent period in patients than healthy controls, which only suggests some impairment in inhibitory control in the corticobulbar and cortical spinal tracts 12,13,46 . Further studies are needed to investigate whether TMS stimulation to the left sensorimotor cortex improves the dystonic voice and clarify the potential and limits of fMRI signals in the clinical diagnosis of SD.

Methods
Participants. Eleven adductor-type SD patients (two males; age range, 21-68 years; average, 36.7 years) and 11 age-and gender-matched healthy participants (two males; age range, 21-65 years; average, 30.9 years) participated in the present study ( Table 2). All of them were right-handed native Japanese speakers with normal hearing and normal or corrected-to-normal vision. None of the patients had a known family history of dystonia and neurological or psychiatric disorders other than SD. The mean age at onset and duration of SD was 26.1 years (range, 9-58 years) and 8.5 years (range, 3-20 years), respectively. No patients had received botulinum toxin injections into their laryngeal muscles for the control of voice breakage. The clinical diagnosis of SD was made by certified otolaryngologists (Otorhinolaryngological Society of Japan). The patients fulfilled all the following criteria: (1) a strained/strangled voice with intermittent disruption 47 , (2) hyperadduction of the vocal folds during voice breakage, (3) no anatomic abnormality of the larynx observed on fiberoptic laryngoscopy, and (4) poor improvement in spite of voice therapy. All patients reported that they could produce falsetto voice without difficulty. We further confirmed that all patients could vocalize falsetto /u:/ without voice breakage. Clinical evaluation of severity was performed using the voice handicap index-10 (VHI-10) 48,49 . In brief, the VHI-10 is a patient-based self-assessment tool widely used to quantify the severity of voice problems in physical, mental and social aspects for a variety of voice disorders 50,51 . The mean VHI-10 score was 28.7 (range, 20-40) for SD patients and 4.9 (range, 0-13) for healthy controls, respectively. We also used the VHI-10 scores to evaluate their correlation with fMRI signals (see "Statistical analyses" section). While symptom severity of voice disorders can also be assessed with auditory-perceptual testing (e.g., voice break counts, GRBAS scale, CAPE-V), VHI-10 scores were used because it considers symptom variability. We assumed that this patient-reported outcome measure is a more stable estimate of overall severity than other physiological or behavioral measurements at a single test point. This is important given the fact that clinical symptoms in SD can easily fluctuate with test settings 5,52,53 . All participants provided written informed consent prior to the experiment. The protocol of this study was approved by the ethics committee of Kyoto University Hospital (C 1041). The study protocol was conducted in accordance with the Declaration of Helsinki.
Behavioral tasks. We used a two-alternative sound discrimination task to measure event-related fMRI activations in the neural systems involved in speech processing. We recorded two monosyllabic sounds, /a:/ and /i:/, each pronounced in two different voices (modal and falsetto) by a female native Japanese speaker (see "Supplementary Materials"). An additional pair of physical control stimuli was created by using "white noise" containing all frequencies from 20 to 11,000 Hz and "band noise" with a center frequency of 1000 Hz. All of the six sound stimuli were sampled at 44 kHz, and their overall sound pressure was adjusted to be equal to each other. Auditory stimuli were presented with the E-prime 2.0 software (Psychology Software Tools, Pittsburgh 2000). Each trial started with a variable jitter interval (3-7 s, in steps of 1 s) and included a sound stimulus (modal voice, falsetto voice, or noise) presented for 2 s. The three types of trials (modal voice, falsetto voice, and noise) were presented to participants in a pseudo-random order. Participants responded with their right index finger by pressing either (1) a right button for the vowel /a:/ (modal voice or falsetto voice) or the white noise or (2) a left button for the vowel /i:/ (modal voice or falsetto voice) or the band noise. Each participant received two scanning sessions, each consisting of 60 trials (thus 20 trials for each modal voice, falsetto voice, and noise conditions) and lasting 432 s. Before MRI scanning (see below), each participant performed practice trials with the same trial structure outside and inside the scanner.
Imaging procedure. Imaging data were acquired using a Siemens Trio 3 T head scanner with a 32-channel phased-array head coil. The echo planer imaging (EPI) data were acquired using a recently developed multiband Table 2. Participant characteristics. VHI-10 voice handicap index-10, n/a not applicable.

SD patients Healthy controls
Number of participants 11 11 Gender (women/men) 9/2 9/2 Statistical analyses. Imaging data were preprocessed and analyzed using the Statistical Parametric Mapping (SPM8; Wellcome Department of Cognitive Neurology, London, UK) package software run on Matlab R2014a. Functional images were corrected for head movement, normalized to the standard brain space by the Montreal Neurological Institute (MNI) space using T1 image unified segmentation (the resampling voxel size was 2 × 2 × 2 mm), and smoothed with a 4-mm Gaussian kernel. For the first-level analysis, we assessed the functional images by constructing a general linear model with a factor of three levels: modal voice, falsetto voice, and noise. High-pass temporal filtering (128 Hz) was applied to the fMRI time-series data. For each participant, three contrast images (modal voice, falsetto voice, and noise) were calculated relative to the baseline condition by convolving known time-series of trials with a canonical hemodynamic response function and its time derivative.
In the second-level analysis, we submitted the three contrast images per participant to a 2 × 2 analysis of variance (ANOVA) to examine the effects of stimulus type (modal voice, falsetto voice) and group (SD and control) on brain activation. We first delineated brain areas involved in voice processing by collapsing the contrast images of "modal voice > noise" and "falsetto voice > noise" from all participants across the two groups. We then searched for the effects of stimulus and group and their interactions within this set of voxels involved in voice processing. Unless stated otherwise, statistical significance was examined with the voxel-level threshold p < 0.001 uncorrected (extent threshold = 10 voxels). Activated brain regions were identified according to a probabilistic atlas 55 . Talairach coordinates referred from previous studies were converted into MNI spaces using the Yale Nonlinear Talairach to MNI Conversion Algorithm 56 .
We performed psychophysiological interaction (PPI) analysis to explore the possible changes in corticosubcortical connectivity in SD, because previous studies have demonstrated that this connectivity is impaired in dystonic diseases including SD 14 . In brief, PPI has the potential to evaluate neural coupling of one area to another affected by an experimental or psychological context 60 . Among the cortical areas, we chose the left sensorimotor cortex (− 41, − 12, 31), which was identified as an SD-specific region from the group analysis, and extracted regional responses per participant by calculating the principal eigenvariate across all voxels within a 4-mm sphere centered at the sensorimotor area. We then performed regression analysis for time-series data of the sensorimotor activity by computing the PPI regressor and a vector coding for the differential effect across stimulus types (1 for modal voice, − 1 for falsetto voice) per session per participant. The contrast image for stimulus type (i.e., "modal voice > falsetto voice") created per participant was submitted to second-level analysis (two-sample t test) to examine the group effect (SD vs. control). We created a spherical ROI with a 4-mm radius at the known coordinates of the left thalamus (− 12, − 18, 0) identified as the SD-specific cortical region from group analysis and examined the group effect at the voxel level with the threshold p < 0.05 corrected for multiple comparisons across search volume.
We further performed two supplemental analyses to assess whether these fMRI signals during speech perception can serve as a clinically informative biomarker for SD. First, we examined whether clinical disease severity measures are correlated with neural activation levels in these cortical and subcortical regions associated with SD. Using fMRI signals extracted from the spherical ROIs described above, we calculated Pearson's correlation strength to examine whether the effect-size of stimulus type (modal voice > falsetto voice) correlated with disease severity as measured with VHI-10. Second, we used machine learning algorithms to examine whether the fMRI data can distinguish SD patients from healthy controls. Three linear classification methods, i.e., linear discriminant analysis, linear support vector machine, and logistic regression, were performed using the Caret package in R (https ://www.r-proje ct.org/). In each classification analysis, we extracted individual-level activation signals derived from modal voice versus noise and falsetto voice versus noise comparisons by using the same spherical ROIs described above. We then performed a 26-fold leave-one-out cross-validation procedure that used all individual fMRI data except one to train the classifier and the remaining data to evaluate the prediction accuracy, respectively. Other metrics, i.e., precision, recall, specificity, and F-measure 61 were also calculated to compare classification performance across the three models.

Data availability
The data that support the findings of this study are available from the corresponding author, upon reasonable request.