Introduction

Parkinson’s disease (PD), a chronic, progressive neurodegenerative disorder with an unknown etiology, is associated with a significant burden with regard to cost and use of societal resources1,2. More than 90% of patients with PD suffer from hypokinetic dysarthria3. As early as 1969, Darley et al. defined dysarthria as a collective term for related speech disorders. The classification of dysarthria includes flaccid dysarthria, spastic dysarthria, ataxic dysarthria, hypokinetic dysarthria, hyperkinetic dysarthria, unilateral upper motor neuron dysarthria and mixed dysarthria4. The speech abnormalities of patients with PD are collectively termed hypokinetic dysarthria (HKD). These speech flaws are typically characterized by increased acoustic noise, a reduced intensity of voice, harsh and breathy voice quality, increased voice nasality, monopitch, monoloudness, speech rate disturbances, the imprecise articulation of consonants, the involuntary introduction of pauses, the rapid repetition of words and syllables and sudden deceleration or acceleration in speech.

Speech impairments are caused by impaired speech mechanisms during any of the basic motor processes involved in speech performance5. The neuromotor speech sequence activates the muscles of the pharynx, tongue, larynx, chest and diaphragm through subthalamic secondary pathways. The anatomical substrate that could result in the abnormalities of PD phonetics may be reduced by the poor coordination of the sound-making muscles6. Usually, the stiffness of the laryngeal muscle tissue, which results in an increased hardness of the vocal cords, affects the closure of the vocal cords and increases the muscle tone7. Moreover, due to the decreased controllability of the diaphragm, the pneumatic input of the lungs to the larynx and the lung capacity decrease significantly8. Fortunately, dysphonia in PD has recently received abundant attention9.

There are three dominant factors that affect dysphonia: aerodynamic deficits, inefficient vibratory function, and weak muscle activity. First, the high phonatory resistance in PD patients may be due to a significant increase in their estimated subglottic pressure and a reduction in their phonatory airflow10. Previously tested patients had insufficient exhalation volume per syllable, and reductions in lung volume are associated with increased sound severity11. Second, the abnormal vibration of the vocal cords also affects dysarthria. Laryngopharyngeal involvement manifests as vocal fatigue, vocal breaks, tremor and the inability to make sound12,13. These manifestations may lead to the inadequate or excessive closing of the vocal cords and irregular or asymmetrical vocal fold motion during phonation14,15. Finally, dysarthria is the result of involuntary movements that are variable and irregular in nature5.

The primary manifestations of PD are tremor16, rigidity17, bradykinesia18, postural instability19, slowness of movement, hyposmia20, sleep disorders21 and changes in sound during speech22. A noticeable tonal change can occur during PD progression; however, it is often ignored. PD patients may present characteristics such as sound quality change23, poor articulation24, trembling or hoarseness25,26, increased frequency and jitters7,27, lower tone28, decreased rhythm29, lack of emotional expression25, and tonal changes29. However, the quantification of the tonal change remains ambiguous.

For this reason, the early diagnosis of PD with accurate, reliable and unbiased predictive models is crucial for PD patients30,31. Previous studies revealed that vocal changes, including poor articulation, trembling or hoarseness, frequency changes, degraded sound quality, lower tone, decreased rhythm, a lack of emotional expression and tonal changes, are important manifestations showing the early development of PD. Muscular hypertension reduces the controllability of vocal cord vibrations and allows insufficient airflow into the lungs for vocal sounds to proceed smoothly32. However, our understanding of the link between physical features and clinical significance remains unclear. Consequently, we aimed to clarify the current state of the voice features during the early stage of developing dysarthria in PD. Moreover, protective factors and risk factors were distinguished from these acoustic parameters.

Methods

Participants

Two groups of participants (PD patients and controls) were recruited for this study from January to August 2019. For the idiopathic PD group, 35 participants (21 male and 14 female patients) were recruited. They had been diagnosed with idiopathic PD in our neurology department, according to the UK Parkinson’s Disease Society Brain Bank Criteria, prior to the recruitment interval. Furthermore, the dysarthria level in the PD patients was assessed using the Voice Handicap Index (VHI-30)33. An investigator asked all the participants about their living habits in terms of alcohol consumption and smoking. Each PD patient was assessed using the Hoehn–Yahr scale (H&Y) and the Unified Parkinson’s Disease Rating Scale Motor Score (UPDRS III). A trained neurologist conducted the entire assessment process. Twenty-six age- and sex-matched healthy participants were recruited for the control group. During the first three months of the study, the participants took part in no other clinical trials. The exclusion criteria for all the participants were as follows: (1) a history of a communication or neurological disorder; (2) throat disease, such as pharyngitis, laryngitis, or laryngeal tumors; (3) severe mental disorder or cognitive impairment, which may hinder speech; (4) psychotic or systemic major illness; (5) clinical problems such as aphasia and severe dysarthria; (6) history of acute stroke, sports injury, or mental disorder; (7) long-term use of systemic treatment methods that affect sound detection; (8) inability to complete the study tasks accurately; and (9) participation in other rehabilitation projects. All the patients stopped taking levodopa on the morning of the sound test, but they continued to take other anti-Parkinson's drugs and were still in the “ON” phase. Therefore, the sound test was not performed during the “OFF” phase, when the patients had severe motor symptoms.

Voice recordings and data preparation

The clinician guided and oversaw the quality control of the voice test. Prior to this test, every participant was thoroughly introduced to the general voice test workflow by the clinician. Therefore, all the participants could cooperate to complete the entire task smoothly. There was no time limit for any item until the inspector was satisfied with the subject’s performance.

Vowels, including /a/, /o/, and /e/, are often used in phonetic tests, which involve the movement of the articulatory organs in various positions34. Here, every syllable consisted of a vowel, and the number of syllables was determined by the number of Chinese characters: a single-syllable sample had one Chinese character, a double-syllable sample had two Chinese characters, and a multiple-syllable sample had no more than 5 Chinese characters. There were 12 single syllable samples, 8 double syllable samples and 6 multiple syllable samples for every participant. Sound recordings were performed in a low-noise (< 50 dB) room. An acoustic recording pen (Sony, Japan) was held 30 cm from the subject's mouth to record their sound. It was important to ensure that the subject maintained their normal tone and loudness in a relaxed state while recording. All the participants followed the guidance of the clinician. If the subject felt tired, the test was paused until he/she felt comfortable completing the remainder.

Acoustic parameter extraction

A total of 1,400 sound clips were collected. The duration of one clip (single syllable, double syllable, multiple syllables) was approximately 4–7 s. The entire recording process lasted an average of 12 min per candidate for one test. Subsequently, 12 acoustic parameters were extracted using a customized MATLAB script. The acoustic parameters included (a) start f0 (Hz), (b) mean f0 (Hz), (c) mid f0 (Hz), (d) minimum f0 (Hz), (e) maximum f0 (Hz), (f) end f0 (Hz), (g) slope from the maximum f0 to the end of the call (slope M-E; Hz/s), (h) slope from the start f0 of the call to the maximum f0 (slope S-M; Hz/s), (i) median intensity (dB), (j) duration of speaking (seconds), (k) harmonics-to-noise ratio (HNR, dB) and (l) jitter (the absolute f0 difference between consecutive f0 measurements/the average period).
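The f0-based parameters listed above can be illustrated with a minimal Python sketch. It is not the customized MATLAB script used in the study. The function name, the choice of the middle sample as mid f0, and the period-based reading of jitter (mean absolute difference between consecutive periods divided by the mean period) are all assumptions made for illustration, and the contour values are hypothetical. The sketch expects voiced frames only (f0 > 0).

```python
from statistics import mean


def f0_contour_features(times, f0):
    """Summarize one clip's f0 contour (Hz) sampled at `times` (seconds)."""
    if len(times) != len(f0) or len(f0) < 2:
        raise ValueError("need at least two paired (time, f0) samples")

    i_max = max(range(len(f0)), key=lambda i: f0[i])  # index of the maximum f0

    def slope(i, j):
        # Rate of f0 change between samples i and j, in Hz/s.
        dt = times[j] - times[i]
        return (f0[j] - f0[i]) / dt if dt > 0 else 0.0

    # Assumed jitter convention: mean absolute difference between
    # consecutive periods (1/f0), divided by the mean period.
    periods = [1.0 / v for v in f0]
    period_steps = [abs(b - a) for a, b in zip(periods, periods[1:])]

    return {
        "start_f0": f0[0],
        "mean_f0": mean(f0),
        "mid_f0": f0[len(f0) // 2],  # assumed: middle sample of the contour
        "min_f0": min(f0),
        "max_f0": f0[i_max],
        "end_f0": f0[-1],
        "slope_S_M": slope(0, i_max),                # start to maximum
        "slope_M_E": slope(i_max, len(f0) - 1),      # maximum to end
        "jitter": mean(period_steps) / mean(periods),
    }


# Hypothetical contour that rises to 150 Hz and then falls.
features = f0_contour_features(
    [0.0, 0.1, 0.2, 0.3, 0.4],
    [100.0, 120.0, 150.0, 130.0, 110.0],
)
```

In this hypothetical contour, slope S-M is positive and slope M-E is negative, which matches the rise-then-fall shape the two slopes are meant to describe.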

Statistical analysis

All the data were stored in Excel files and are presented as the mean ± standard deviation. All the analyses were performed with STATA 15.0 (StataCorp. 2017. Stata Statistical Software: Release 15. College Station, TX: StataCorp LLC). In brief, the chi-square test was used to analyze the differences in the demographic distribution by sex, profession, alcohol consumption, smoking habit and educational level. A Student’s t-test was conducted to assess whether there was an age difference between the two groups. The acoustic parameters of the two groups were compared using the Student's t-test if the data were normally distributed and the Mann–Whitney test otherwise. A logistic regression analysis was performed to differentiate the protective factors and risk factors from among the acoustic parameters. To test the collinearity of the regression model, the Pearson correlation coefficients were calculated among the seven parameters in the model. The absolute values of these coefficients were very small: the largest was 0.242, and most were less than 0.1. Additionally, the parameters have completely different meanings, and thus it is not appropriate to use a dimensionality reduction method to solve the collinearity problem. In addition, the prediction performance of the model is reasonable and meaningful from a professional standpoint. Therefore, this weak collinearity can be ignored. Spearman's rank correlation coefficient was used to determine the correlation of the acoustic parameters with the H&Y, UPDRS III and VHI-30. A P value of < 0.05 was considered significant.
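The collinearity screening described above amounts to computing the Pearson coefficient for every pair of model parameters and inspecting the largest absolute value. The following Python sketch illustrates that idea only; the analyses in this study were run in STATA. The function names and the toy columns are hypothetical, and the sketch assumes that no column is constant (a constant column would cause a division by zero).

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def largest_abs_correlation(columns):
    """Return (|r|, name_a, name_b) for the most strongly correlated pair."""
    return max(
        (abs(pearson_r(columns[a], columns[b])), a, b)
        for a, b in combinations(columns, 2)
    )


# Hypothetical toy columns, NOT data from this study.
toy = {
    "start_f0": [1.0, 2.0, 3.0, 4.0],
    "duration": [1.0, -1.0, -1.0, 1.0],
    "jitter": [2.0, 1.0, 4.0, 3.0],
}
worst_r, name_a, name_b = largest_abs_correlation(toy)
```

A small largest value, such as the 0.242 reported above, would suggest that no pair of parameters is close to redundant.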

Results

There was no significant difference in the age distribution between the two groups (t = 0.5305, df = 59, P = 0.7011). No statistically significant sex difference was found between the two groups (χ2 = 1.874, df = 1, P = 0.171). In addition, no significant differences were found in the patient alcohol consumption or smoking habits. However, a difference in the distribution of professions was found between the two groups (Fisher’s exact test, χ2 = 6.2674, df = 2, P = 0.044). Similarly, there was a significant difference in educational levels between the two groups (χ2 = 8.8961, df = 3, P = 0.03). The average H&Y and UPDRS III scores of the PD group were 2.60 ± 0.81 and 35.60 ± 20.39, respectively (Table 1).

Table 1 Baseline characteristics of the participants.

To reduce the effect of sex, Table 2 compares the average acoustic parameters between the PD group and the control group separately for each sex. Among males, the mean f0 in all three syllable tests was lower in the PD group than in the control group, and the difference was significant in the single-syllable test (P < 0.05). The min f0 was significantly different between the two groups in both the double syllable test and the multiple syllable test in males (P < 0.05). Regarding the female patients, the single and double syllable tests presented significant differences between the two groups (P < 0.05). Moreover, the max f0 was significantly lower in the PD group than in the control group among the males. For female patients, a significant difference between the two groups was found only in the single syllable test. Interestingly, female PD patients presented significantly higher results than the controls. This finding differs from that of the male patients, and we believe that the reasons are the differences in sex hormones and anatomical structure between the sexes. Similarly, the end f0 in the single syllable and double syllable tests, the slope S-M in the single syllable test and the median intensity in the multiple syllable test of the PD female patients were significantly higher than those of the controls. In the male double syllable and multiple syllable tests, the duration of the PD group was significantly shorter than that of the control group. The jitter results showed statistically significant differences between groups in the male double syllable test only.

Table 2 Comparison of mean values in sound parameters between PD and control patients.

Subsequently, a logistic regression analysis was performed to determine the protective and risk factors among the acoustic parameters (Table 3). Specifically, Z values of less than 0 signify protective factors, which included sex, alcohol consumption, start f0, min f0, max f0, slope S-M, jitter and duration. The remaining variables were risk factors, including age, smoking habit, education, end f0, HNR and median intensity.
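The sign rule used above can be made concrete with a short Python sketch. In a logistic regression, the Wald z statistic is the coefficient divided by its standard error, so z has the same sign as the coefficient, and the odds ratio exp(coefficient) is below 1 exactly when z is negative. The sketch assumes the outcome is coded 1 for PD and 0 for control; the function name and the numbers are hypothetical and are not taken from Table 3.

```python
from math import exp


def classify_predictors(estimates):
    """Map {name: (coefficient, standard_error)} to z, odds ratio and role."""
    table = {}
    for name, (beta, se) in estimates.items():
        z = beta / se  # Wald z statistic; shares its sign with beta
        if z < 0:
            role = "protective"
        elif z > 0:
            role = "risk"
        else:
            role = "neither"
        table[name] = {
            "z": z,
            "odds_ratio": exp(beta),  # < 1 lowers, > 1 raises the odds of PD
            "role": role,
        }
    return table


# Hypothetical coefficients and standard errors, NOT values from Table 3.
demo = classify_predictors({
    "duration": (-0.50, 0.25),
    "end_f0": (0.02, 0.01),
})
```

The sign of z only gives the direction of the association; whether a factor is statistically significant still depends on the magnitude of z and its P value.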

Table 3 Regression analysis result of acoustic parameters among all participants.

Table 4 reports the Spearman's rank correlation coefficients between the acoustic parameters and the H&Y, UPDRS III and VHI-30. The single most striking observation to emerge from the data comparison was that the mean f0 was negatively correlated with the VHI-30 in the single and double syllable tests (P < 0.05, R < 0). Similarly, the mid f0 was inversely related to the VHI-30 in the double syllable tests. Additionally, the max f0 was negatively correlated with the VHI-30 in the single and multiple syllable tests in PD patients. The correlation between the end f0 and the H&Y was statistically significant in the double and multiple syllable tests (P < 0.05), and the correlation was negative (R < 0). Moreover, a positive correlation was found between the slope M-E and UPDRS III in the single syllable test (P < 0.05, R > 0). The slope M-E of the single and double syllable tests was negatively correlated with the VHI-30. Interestingly, the median intensity and duration were significantly positively correlated with the H&Y and UPDRS III in the single syllable results (P < 0.05, R > 0). However, the median intensity was related to the VHI-30 in all the syllable tests. Furthermore, the HNR was negatively correlated with the UPDRS III in the multiple syllable findings (P < 0.05, R < 0). Likewise, the jitter showed a significant negative correlation with the H&Y in the single syllable test (P < 0.05, R < 0). The correlation between the jitter and the VHI-30 was present in the single and multiple syllable results.
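For readers unfamiliar with the statistic behind Table 4, Spearman's coefficient is the Pearson correlation of the ranks of the two variables, with tied values sharing the mean of their ranks. The Python sketch below illustrates this definition; it is not the STATA routine used in the study, the function names are hypothetical, and the paired values are invented for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical pairs: mean f0 (Hz) falls as the VHI-30 score rises.
mean_f0 = [180.0, 165.0, 150.0, 140.0, 120.0]
vhi_30 = [5, 12, 20, 31, 44]
rho = spearman_rho(mean_f0, vhi_30)
```

Because the statistic depends only on ranks, it suits ordinal scales such as the H&Y stage, where the distance between adjacent stages is not assumed to be equal.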

Table 4 Correlation between clinical severity of PD and acoustic parameters.

Discussion

Among the participants, the results showed a significant difference in the distribution of professions and education across both groups; a large proportion of participants were farmers, and a low educational level was noted for a larger proportion of patients than other education levels (see Table 1).

Our data revealed significant differences in the mean f0, min f0, max f0, duration and jitter in male participants between the two groups and significant changes in the min f0, max f0, end f0, slope S-M and median intensity in female participants between the two groups (see Table 2).

The vocal fold vibration frequency is known as the fundamental frequency. The PD group presented a lower start f0, lower max f0, higher mid f0 and higher end f0 than the control group. The start f0, min f0, max f0 and end f0 denote the start, minimum, maximum and end values of produced f0 movement in the syllables, respectively. These variables can be used to describe the entire vocalization process. However, we found a significant difference from the healthy group only in the mean f0 in male participants speaking a single syllable, min f0 in both male and female participants and end f0 in female participants speaking single and double syllables. Holmes et al. examined the correlates of PD voice disorders in the fundamental frequency (f0) of the pronounced vowels, and they found that the speaking f0 of the PD patients was reduced35. The vocal cord vibration frequency, vocal cord status information related to voice quality and energy, muscle contraction, joint movement and sound intensity are provided to the CNS by somatosensors located in and near the larynx36. The stimulation of the superior laryngeal nerve, laryngeal mucosa or cartilage results in the reflexive contraction of laryngeal muscles. These reflexes may be important for controlling the voice f037.

Moreover, our data showed that slope S-M was significantly larger than the control group in female PD polysyllables. Slope S-M is the slope from the start to the maximum of the pronunciation, and slope M-E is the slope from the maximum to the ending. This factor measures the rate of airflow caused by the coordinated movement of lung muscles. In addition, our data revealed that the median intensity of male PD patients was higher than that of the control group in the overall syllable test. Similarly, the female PD patients also presented a higher median intensity than the controls. However, only during the multiple syllable test did female PD patients present significantly higher intensity than the control group. The median intensity represents the intensity of vocalization. Having a low voice is also one of the symptoms of PD patients. This characteristic is due to the stiffness of the larynx muscles in PD patients, which makes pronunciation difficult. In 2018, Abur et al. studied the loudness of pure tone between PD patients and a control group. The results showed that the average loudness growth slopes of the control and PD groups were not significantly different, while the tone perception and loudness of the PD patients decreased38. Unfortunately, these results are contrary to our findings. As PD patients worsen, they usually develop a low voice, especially during the middle and late stages of the disease39. In our study, most patients with PD had a low level of dysarthria and were in the early and middle stages of the disease. Therefore, the difference in sound intensity may be due to the different severity levels of the disease among PD patients.

Notably, we found that PD patients read the same sentence for a shorter duration than the controls. We found a significant difference in the duration in male participants speaking double and multiple syllables. The duration refers to the time the subject takes to read the same sentence. This measurement is related to speech rhythm and time organization29. The dopamine in the basal ganglia of PD patients is gradually depleted, which is the primary cause of muscle stiffness, and it changes the controllability of the larynx muscles40. Subsequently, muscle stiffness in the throat and pharynx can significantly affect the pronunciation speed and number of pauses41, indicating impaired speech rhythm and timing29. Furthermore, the muscle contraction intensity in the chest cavity and diaphragm is significantly reduced, which leads to a reduction in the airflow from the lungs through the vocal cords. Ultimately, the reduced airflow affects the vibration of the vocal cords, and the shape of the vocal cords affects the sound pressure threshold8. These interactions may reduce the time for vocalization. Similarly, Hammer et al. compared the air flow and acoustic parameters between PD patients and controls. Their results showed that PD patients presented a shorter syllable-speaking duration than the controls11. PD patients present several significant speech characteristics, including an increasing trend in the speech rate and a reduction in the total number of pauses. Our findings are consistent with these previously reported results.

In addition, the HNR is expressed as the degree of acoustic periodicity and is used to estimate the signal-to-noise ratio by calculating the autocorrelation of each cycle. The HNR can be used to correlate the laryngeal pathology and voice changes, indicating the hoarseness of the voice. The lower the value is, the higher the hoarseness of the sound. This value also reflects the muscle tone of its larynx42,43. Our results showed that the PD patients displayed lower HNR than the control subjects, but the difference was not statistically significant. Zwirner et al. found that the HNR was lower in PD patients than in controls, but no significant difference was found44. This finding is consistent with our research. The perturbation in frequency in successive vocal fold cycles is termed jitter, which may be related to tremors in the vocal cords of PD patients. Due to the lack of control over the vocal cord vibration cycle of the glottis, the jitter may change, which is usually found in neurological diseases45. Our results indicated that the jitter values were lower in PD patients than in control subjects, and there were significant differences between male participants speaking double syllables. As the jitter of the sound decreases, its periodicity also decreases, and a creakier voice will be produced46. The depletion of dopaminergic neurons in the substantia nigra pars compacta usually leads to muscle rigidity and changes in the muscle control of the larynx (phonatory subsystem), which may induce increased throat tension (which is physiologically related) and decreased verbal variability47. Nevertheless, Gamboa et al. found that both male and female PD patients had higher jitter values than control subjects. The increased jitter may be related to the perceived low tone, which should correspond to the real f048.
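As an illustration of the autocorrelation-based view of the HNR mentioned above, one widely used convention converts the normalized autocorrelation r of the signal at a lag of one pitch period into decibels as 10·log10(r / (1 − r)). Whether the MATLAB script used in this study followed this exact convention is not stated, so the Python sketch below is an assumption, and the function name is hypothetical.

```python
from math import log10


def hnr_db(r):
    """HNR in dB from the normalized autocorrelation r at one pitch period.

    r is treated as the periodic (harmonic) share of the signal energy and
    1 - r as the noise share, so r must lie strictly between 0 and 1.
    """
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie strictly between 0 and 1")
    return 10.0 * log10(r / (1.0 - r))


equal_parts = hnr_db(0.5)    # harmonic and noise energy are equal
mostly_tone = hnr_db(0.99)   # 99% of the energy is periodic
```

Under this convention, a voice whose energy is half periodic and half noise has an HNR of 0 dB, and the HNR rises as the periodic share grows, which is consistent with lower values indicating greater hoarseness.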

The regression analysis indicated that the Z-values of the acoustic parameters start f0, min f0, max f0, slope S-M, duration and jitter were less than zero, signifying that these parameters are protective factors (Table 3). Pathological voice tremor occurs when involuntary and rhythmical oscillatory movements are initiated in the vocal tract. These movements can induce rhythmic fluctuations in the fundamental frequency and amplitude of the voice49. These fluctuations are perceived as rhythmic fluctuations in pitch and loudness. For example, Midi et al. revealed that patients with PD had higher jitter, fundamental frequency and fundamental frequency variability than control subjects. These results indicated that the higher f0 and f0 variations in PD patients are generally attributable to the increased stiffness of the vocal folds because of the rigidity of the laryngeal musculature7. Moreover, Alexander et al. analyzed the acoustic characteristics of PD speech before and after PD patients took a medication, and they revealed a higher fundamental frequency (f0) variability in vowels and mean f0 but a lower intensity range in PD patients than in the controls50.

Conversely, the Z-values from the regression analysis indicated that the acoustic parameters of end f0, HNR and the median intensity were greater than zero, signifying that they are negative factors. End f0 represents the frequency of vocal fold vibration at the end of speech, which indicates the voice change trend from the maximum frequency to the ending frequency. There was a significant difference in the end f0 between the two groups of female participants. Interestingly, the results of the logistic regression analysis revealed that the end f0 was a negative factor. Moreover, the absolute HNR was smaller in the PD group than in the control group except in the double- and multiple-syllable tests in the female PD group. Yumoto et al. demonstrated that lower absolute HNR values correspond to a greater proportion of noise. This finding suggests that a lower HNR represents a larger proportion of noise. Yumoto et al. also showed that HNR is an indicator of the degree of hoarseness43. Rusz et al. showed that patients with PD had lower HNR than the control subjects, which may be clinically interpreted as hypophonia, voice hoarseness, or tremolo51. It is worth noting that the median intensity was a protective factor in the logistic analysis, which indicates that the lower vocalization in PD corresponds to worse PD severity. Namely, the healthy controls presented louder speech. Rusz et al. assessed the extent of vocal impairment in PD patients and healthy controls, and the results revealed that PD patients have an overall lower speech intensity level, insufficient intensity range, and intensity variations during speech production25.

Our results showed that the acoustic parameters of end f0, HNR and jitter were negatively correlated with the clinical severity of PD. The slope M-E, median intensity and duration were positively correlated with the severity. Bayestehtashk et al. conducted three tasks, namely the sustained phonation task, the diadochokinetic task and a reading task with 168 PD patients, and they used a time-varying harmonic model of speech to capture clues related to pitch more accurately, including the jitter and shimmer. The results show that the severity of the disease can be inferred from speech, with an average absolute error of about 5.5, explaining 61% of the variance52. Similarly, Asgari et al. showed that it is possible to predict the severity of the disease by extracting voice information from PD patients (time domain, spectrum domain, cepstrum domain, HNR, and jitter)53. This finding also reflects the correlation between the voice information of these PD patients and the severity of the disease.

Our research described the current state of voice features at either an early stage of PD or an early stage of developing dysarthria. In addition, we discovered the significance of the physical and clinical aspects of the acoustic parameters. The quality of speech changes is universal in PD patients during disease progression.

Conclusion

The mean f0, max f0, min f0, jitter, duration and median intensity of speaking in PD patients were significantly different from those of the healthy controls. The end f0, slope M-E, median intensity, duration, HNR and jitter are related to the clinical severity of PD. In addition to these parameters, the mean f0, mid f0, and max f0 are negatively related to the VHI-30. These changes may strengthen public awareness of PD disease progression.

Limitation

First, although the patients stopped taking levodopa on the morning of the sound test, they continued to take other anti-Parkinson's drugs and were still in the “ON” phase. Therefore, the sound test was not performed during the “OFF” phase, when the patients had severe motor symptoms. Second, we did not perform a further comparative analysis on the speech of the PD patients at different stages of disease development. Third, the variation in the experimental data may be affected by the current situation of the participants in terms of age, sex and medication regimen. Moreover, we have not yet evaluated voice tremors in PD patients using the related scales.

Ethics approval and consent to participate

The Institute's Institutional Review Board and Ethics Committee at the First Affiliated Hospital of Chengdu Medical College approved this study. All experiments were performed in accordance with relevant guidelines and regulations. Written informed consent was provided by all participants.

Consent for publication

Yes.