Introduction

Idiopathic Parkinson’s disease (PD) is a neurological disorder that affects 1.6% of the population over 65 years of age and is characterized by the progressive loss of dopaminergic neurons in the substantia nigra pars compacta. Dopamine concentrations have been shown to be significantly reduced before the onset of motor deficits1,2. The cardinal signs of PD, usually referred to as parkinsonism, include postural instability, bradykinesia, resting tremor and muscular rigidity. Other neurodegenerative diseases whose signs and symptoms go beyond parkinsonism are known as atypical parkinsonian syndromes (APS). Multiple system atrophy (MSA) and progressive supranuclear palsy (PSP) are subgroups of APS, with a prevalence of around 30–40 per 100,000 in the population older than 65 years3. Clinical features of PSP include supranuclear gaze palsy, axial rigidity, bradykinesia, frequent falls, cognitive decline and communication disorders4,5, reflecting widespread neurodegeneration involving the midbrain as well as the hypothalamic nucleus, globus pallidus, pons, striatum, superior cerebellar peduncle and cerebellar dentate nucleus4. Conversely, MSA is characterized by various combinations of parkinsonian, autonomic and cerebellar features6, corresponding to degeneration of the striatum, substantia nigra, middle cerebellar peduncle, cerebellum, inferior olivary nucleus and pons7. APS differ from PD in their poor response to levodopa and more rapid disease progression, resulting in a shorter life expectancy8,9.

Most PD and APS patients manifest similar clinical features, which can make a correct differential diagnosis very challenging10. Clinical criteria exist for the diagnosis of “probable” and “possible” MSA and PSP, based on clinical and/or imaging features, but a definite diagnosis of MSA or PSP requires postmortem confirmation by neuropathological examination7. Currently, several imaging techniques such as MRI, positron emission tomography, diffusion tensor imaging, single-photon emission computed tomography and transcranial sonography can be used to assess various parkinsonian syndromes11. In particular, automatic image-based classification based on metabolic patterns is highly accurate in differentiating between PD, MSA and PSP patients at early disease stages, with more than 84% sensitivity and 94% specificity12. However, metabolic imaging requires the invasive use of radiopharmaceuticals, whilst financial costs and technical demands may limit the use of other imaging methods.

It is now well established that dysarthria, a class of motor speech impairments resulting from neurological disorders, is an early clinical feature of PD and APS13. Owing to basal ganglia dysfunction, most PD patients manifest hypokinetic dysarthria, which is characterized by monoloudness, monopitch, variable rate, reduced stress, harsh voice quality, imprecise articulation, inappropriate silences and speech dysfluencies14,15. Conversely, MSA and PSP patients typically manifest mixed dysarthria with a combination of ataxia, hypokinesia and spasticity as a result of more widespread neuronal atrophy16,17,18. Indeed, previous studies16,17 that investigated 44 PSP and 46 MSA patients using perceptual speech and oral motor analysis reported mixed dysarthria with combinations of ataxic, hypokinetic and spastic components in two-thirds of the patients. Spastic components were mostly present in PSP patients, while hypokinetic components, followed by ataxic components, predominated in MSA patients16,17. Dysarthria can manifest at all levels of speech production19: respiration, articulation, phonation, timing and prosody (and, to a lesser extent, nasality).

During the last decades, PD speech analysis has attracted increasing interest. However, most research has focused on distinguishing between PD and healthy subjects, with the motivation of using speech assessment as a supporting method for early PD diagnosis. While this may be interesting from a fundamental perspective or for monitoring purposes, it actually has limited impact on clinical diagnosis. Indeed, early PD diagnosis cannot be claimed, as is often done, when APS dysarthria is not taken into account. Moreover, the clinical diagnosis often neglects the possibility of an APS altogether. The resulting speech corpora may thus be noisy, in the sense that some patients diagnosed with PD may actually have an APS. At best, such studies can claim features or methods that correlate with a diagnosis of parkinsonism (which groups PD and APS), bearing in mind that parkinsonism does not require speech analysis to be correctly diagnosed.

On the other hand, only a few studies have compared or discriminated between PD and APS or between APS subgroups13,20,21,22,23,24,25,26,27,28,29,30,31. In this work, we focused on this challenging problem by first assessing all basic subsystems of connected speech: timing, prosody, articulation, phonation and respiration. Then, based on the findings, we proposed an assessment based on dysarthria subtypes: hypokinetic, ataxic and spastic.

Results

Univariate statistical analysis

An overview of the methodology and major findings is given in Fig. 1 (Supplementary Audio S1–S4). Univariate statistical analysis of the initial acoustic features, described in Table 7, is shown in Table 1. Only two features individually yielded a significant group difference between PSP and MSA: stdF0a and RFAm (p ~ 0.01). DUSm, stdPSD and RFAt approached significance (p ~ 0.06). However, classification accuracy was poor when these features were used individually.

Fig. 1: Scheme chart depicting the methodology and major findings.

PC hypothesized perceptual correlates, AF acoustic features, WF weighting factor, RLR relative loudness of respiration, PIR pause intervals per respiration, RSR rate of speech respiration, DUS duration of stop consonants, stdF0 standard deviation of fundamental frequency, RFA resonant frequency attenuation, stdPSD standard deviation of the power spectral density, GVI gaping in-between voiced intervals, stdF0a pitch fluctuations, EST entropy of speech timing, RST rate of speech timing, AST acceleration of speech timing, VD vowel duration, DDKI diadochokinetic instability, NSR net speech rate, DDKR diadochokinetic rate, PSI proportion of sub-harmonic intervals, PSP progressive supranuclear palsy, MSA multiple system atrophy.

Table 1 Statistical difference between groups.

The results of the statistical analysis of the designed subsystem features (see the Methods section) are illustrated in Fig. 2. The comparison between groups using Fresp is shown in Fig. 2a. This feature reflects a respiration deficit in both MSA and PSP, although PSP showed greater severity than MSA (p < 0.05). As shown in Fig. 2b, both MSA and PSP develop a significant phonation impairment as measured by Fphon (p < 0.00001 when compared to HC). This impairment tended to be more pronounced in MSA, although the difference did not reach significance (p = 0.08 for MSA vs. PSP). Using the articulation feature Fart, we did not find a statistically significant difference between PSP and HC, as shown in Fig. 2c, suggesting that Fart does not capture a particular articulation impairment in PSP (this does not mean that PSP patients do not develop articulation impairment in general). In contrast, Fart reflects a significant articulation impairment in MSA (p < 0.0001 compared to both PSP and HC). We did not find a group difference between PSP and MSA using Fpros, as illustrated in Fig. 2d. However, as expected, monopitch (measured by Fpros) was prominent not only in PSP and MSA but also in PD (p < 0.00001 compared to HC). Likewise, we did not find a group difference between PSP and MSA using Ftime, as illustrated in Fig. 2e. However, Ftime indicated a severe timing impairment in PSP and MSA compared to HC (p < 0.00001) and PD (p < 0.0001).

Fig. 2: Boxplots of the distribution of subsystem features across groups.

a Fresp = respiration feature; b Fphon = phonation feature; c Fart = articulation feature; d Fpros = prosodic feature; e Ftime = timing feature. HC healthy controls, PD Parkinson’s disease, MSA multiple system atrophy, PSP progressive supranuclear palsy. Statistically significant differences between groups: *p < 0.05, **p < 0.01, ***p < 0.001.

The results of the statistical analysis of the designed indices (see the Methods section) are illustrated in Fig. 3. Using the index SSI1, the impairment is more predominant in PSP, as shown in Fig. 3a. Moreover, SSI1 yields a statistically significant difference between every pair of groups. We recall that we used neither PD nor HC data in the design process. This suggests that SSI1 has strong potential for discriminating between all groups. Using the index SSI2, the impairment is more predominant in MSA, as shown in Fig. 3b; however, SSI2 does not reflect a particular impairment in PD. Using the index DTI1, the impairment is more predominant in PSP, as shown in Fig. 3c. Like SSI1, DTI1 yields a statistically significant difference between every pair of groups, suggesting that DTI1 also has strong potential for discriminating between all groups. Using the index DTI2, the impairment is more predominant in MSA, as shown in Fig. 3d. Moreover, DTI2 yields a statistically significant difference between every pair of groups except PD and PSP.

Fig. 3: Boxplots of the distribution of subsystem and dysarthria type indices across groups.

a SSI1 = first subsystem index; b SSI2 = second subsystem index; c DTI1 = first dysarthria type index; d DTI2 = second dysarthria type index. HC healthy controls, PD Parkinson’s disease, MSA multiple system atrophy, PSP progressive supranuclear palsy. Statistically significant differences between groups: *p < 0.05, **p < 0.01, ***p < 0.001.

Bivariate classification analysis

Figure 4a (resp. b) displays the two-dimensional distribution of the subsystem indices SSI1 and SSI2 (resp. the dysarthria type indices DTI1 and DTI2). Both representations visually achieve a good mutual separation between the PD, MSA and PSP groups. Table 2 shows the scores of classification between PSP and MSA using all indices. Individual indices did not give good classification performance. However, the composite indices, CSSI and CDTI, yielded high classification performance, which we attribute to the orthogonality incorporated in their design. CSSI gave very good classification scores (84%), already higher than the accuracies reported in the literature. CDTI yielded an even higher performance (>88%), showing that including prior knowledge on the predominant dysarthria types in PSP and MSA does indeed improve the discriminative power.

Fig. 4: Two-dimensional projection of all subjects over the indices.

a Using the subsystem indices SSI1 and SSI2, b using the dysarthria type indices DTI1 and DTI2. The black line is the logistic regression boundary for the classification between PSP and MSA using all data.

Table 2 Classification accuracy between PSP and MSA.

Table 3 shows the scores of classification between PD and MSA. Individual indices, except DTI2, yielded relatively low specificity. However, the composite indices, CSSI and CDTI, yielded high classification performance. Table 4 shows the scores of classification between PD and PSP. Here, DTI1 alone yields high classification performance. We recall that PD data was not used in the design of the features and indices. These results can thus be seen as an additional a posteriori validation of our approach.

Table 3 Classification accuracy between PD and MSA.
Table 4 Classification accuracy between PD and PSP.

Relation between speech and motor manifestations

The correlations between the speech indices and the bradykinesia/rigidity, bulbar, cerebellar and overall NNIPPS scores in the APS (PSP + MSA) group are shown in Table 5. The overall NNIPPS score was correlated with the indices SSI2 and DTI2 (r = 0.5, p < 0.01). These two indices were also correlated with the bradykinesia/rigidity and cerebellar NNIPPS subscores (r ~ 0.5, p < 0.01), but not with the bulbar subscore. No correlations were detected between the NNIPPS (sub)scores and the indices SSI1 and DTI1.

Table 5 Correlations between speech and clinical motor indices.

Discussion

Our findings indicate that speech disorders reflect the differing underlying pathophysiology of tauopathy in PSP and α-synucleinopathy in MSA. Combining distinct speech patterns via objective acoustic evaluation discriminated between PSP and MSA with an accuracy of up to 88.6%, although the difference was not perceptually identifiable using the UPDRS III speech item. This is in accordance with systematic perceptual assessment, which was unable to distinguish between the speech of PSP and MSA32. To the best of our knowledge, this is the highest accuracy reported in the literature for speech-based discrimination between two APS with diverging pathophysiology. Although PD data was not used in training, our approach also separated both PSP and MSA from PD with an accuracy of up to 87%, reflecting the considerably higher dysarthria severity in APS than in PD. This finding is consistent with previous studies showing that, at mid/late stages, APS can be distinguished from PD by the severity of speech deterioration using both perceptual and acoustic analysis13,32. The greater severity of dysarthria in APS likely reflects the fact that early-stage PD patients manifest pure hypokinetic dysarthria33, whereas APS patients had dysarthria with different combinations of hypokinesia, ataxia, or spasticity16,17.

Speech features related to respiratory dysfunction, imprecise consonants, and monopitch, assessed via the SSI1 dimension, contributed to worse performance in PSP than in MSA. The influence of respiratory dysfunction on speech has not yet been systematically studied. However, it is well known that patients with PSP have profound impairment of voluntary respiratory control34. In addition, PSP patients had significantly more frequent respiratory infections and respiratory-related deaths than PD patients35. Voiceless consonant abnormalities in PSP have also been reported and associated with the perceptual severity of dysarthria29. Although monotone speech was found to a similar extent in both PSP and MSA, the discriminative power of pitch variability might be attributed to wider performance variability in MSA, owing to more frequent ataxic components causing excessive pitch fluctuations17,18, compared with the typical occurrence of hypokinetic and spastic elements of dysarthria in PSP16,36. The relevance of spastic speech components in PSP was further underlined by the findings of slow speaking rate and subharmonics, which contributed to discrimination between PSP and MSA via the DTI1 dimension. Both slow speaking rate and subharmonics are considered core features of spastic dysarthria, resulting from more widespread neuronal atrophy37,38. In particular, a relation between slow articulation rate and bilateral white and gray matter volume loss was observed in patients with progressive spastic dysarthria39 and in patients with multiple sclerosis with predominant spastic-ataxic dysarthria40. Surprisingly, ataxic features reflecting higher diadochokinetic irregularity and prolonged phonemes were on average more affected in PSP than in MSA. While the longer phonemes may reflect a slower diadochokinetic rate, the slightly higher diadochokinetic irregularity in PSP might simply be caused by a higher proportion of patients with severe dysarthria (40% in PSP vs. 32% in MSA), which appears to be the main factor influencing oral diadochokinetic performance41,42. Another possible explanation is that cerebellar characteristics in PSP may be related to the high concentration of tau protein deposits in the brainstem, through which the cerebello-thalamo-cortical and cortico-ponto-cerebellar pathways pass43.

The phonation and timing abnormalities and articulatory decay generally contributed to worse performance in MSA than in PSP via both the SSI2 and DTI2 indices. Our findings demonstrate overall poorer voice control in MSA, typically manifested as a strained-strangled voice that may give the perceptual impression of quivery-croaky, strained speech with increased pitch18. This observation is generally in agreement with a recent study showing that 93% of patients with MSA manifested laryngeal dysfunction during an endoscopic task, in contrast to only 1.8% of patients with PD44. The slightly worse articulatory decay in MSA than in PSP might be attributed to more distorted vowels, which are more common in ataxic dysarthria38,45. Timing abnormalities were relatively non-specific and affected both PSP and MSA to a similar extent; however, timing abnormalities in individual MSA patients might still contribute to discrimination accuracy together with phonatory and articulatory dysfunction.

In patients with APS, an overlap of individual speech features among dysarthria subtypes can be expected38, which makes correct recognition of a specific dysarthria subtype challenging. Although we strove to separate the hypokinetic, ataxic and spastic components of dysarthria as much as possible to eliminate most of this overlap, some dysarthric manifestations may still originate from different neuronal dysfunctions, to a greater extent than anticipated. For instance, subharmonics may arise from involvement of the corticobulbar pathways but also of the cerebellum and basal ganglia. In particular, variation in speech severity within a dysarthria subtype may explain as much variance in acoustic or perceptual data as variation across dysarthria types46. This assumption was further supported by the finding that only the SSI2 and DTI2 dysarthria indices were related to overall disease severity as measured by NNIPPS, with no specific correlations observed between the hypokinetic, ataxic and spastic speech dimensions and the bradykinesia/rigidity, ataxia and bulbar motor manifestations, respectively.

Our findings are based on sophisticated acoustic analyses, which might limit their application by movement disorder specialists or general neurologists. On the other hand, most of the applied acoustic features correspond to the main perceptual dimensions encountered in the dysarthria of parkinsonism (Fig. 1)16,17, and we therefore believe that experienced clinicians can still benefit from knowledge of the distinctive speech characteristics revealed in the present study. In addition, the fully automated Dysarthria Analyzer used in this study remains under development, but a free beta version is already available47. Most of the investigated speech features can also be analyzed using the widely used, freely available Praat software, although hand-labeling or additional user control of the analysis is required for some features. Last but not least, the detailed protocol on the speech tasks and speech metrics used in this study has already been published within a recent guideline for speech recording and acoustic analyses in dysarthrias of movement disorders48.

One potential limitation of the present study is that we did not differentiate between speech in the various subtypes of PSP and MSA, owing to the limited opportunity to recruit a larger number of participants via a single center. According to previous research, different disease subtypes appear to have no substantial effect on individual acoustic features21, although more perceptual ataxic abnormalities can be observed in the MSA cerebellar subtype than in the parkinsonian subtype31. In our study, patients with the MSA cerebellar subtype mostly had all components of dysarthria affected and thus did not principally differ in dysarthria type from patients with the MSA parkinsonian subtype. Another limitation is that we did not perform additional testing, including cerebrospinal fluid biomarkers and radionuclide scanning, to improve the accuracy of clinical diagnosis. Also, our results are based on participants recruited from a single center and thus might not generalize across racial groups. Future research is warranted to confirm and extend our findings via an international collaborative effort involving different languages and various racial groups. Finally, we did not investigate speech loudness, which is an important distinguishing feature of hypokinetic dysarthria, because obtaining reliable estimates via acoustic analysis requires precise microphone calibration.

In conclusion, our findings highlight that detailed speech analysis can potentially be used as a diagnostic screening tool to distinguish between PSP, MSA, and PD. This study may therefore serve as the basis for future multicenter studies of parkinsonism with speech testing. Future studies focusing on earlier stages of the disease, and potentially accompanied by longitudinal assessment, should further elaborate and extend our findings and establish the sensitivity of speech investigation in differentiating between APS. Since speech impairment in PD appears to be a progressive biomarker that reflects dopaminergic treatment response33, whereas APS progress more rapidly and are less responsive to L-dopa therapy3, longitudinal in-home assessment over a short period could substantially improve the sensitivity of speech-based evaluation reported here. As a result, vocal assessment may provide a low-cost alternative screening method to existing clinical and imaging diagnostic approaches.

Methods

Participants

For the present study, we recruited a total of 65 patients via a single center from 2011 to 2018: 25 with a diagnosis of probable MSA (15 men and 10 women), 20 with a diagnosis of probable PSP (13 men and 7 women) and 20 with a diagnosis of idiopathic PD (13 men and 7 women). A movement disorders specialist established the diagnoses of all patients according to the consensus diagnostic criteria for MSA7, the Movement Disorder Society diagnostic criteria for PSP49 and the Movement Disorder Society clinical diagnostic criteria for PD50. The MSA group consisted of 19 patients diagnosed with the MSA-parkinsonian (MSA-P) subtype and 6 patients with the MSA-cerebellar (MSA-C) subtype, while the PSP group comprised 17 patients diagnosed with PSP-Richardson (PSP-R) syndrome, 2 with PSP-parkinsonism (PSP-P) and 1 with PSP-pure akinesia with gait freezing (PSP-PGF). At the time of examination, each treated MSA and PSP patient had been on stable medication for at least 4 weeks, consisting of various doses of levodopa alone or combined with different dopamine agonists and/or amantadine. PD patients were examined immediately after diagnosis and before the initiation of dopaminergic treatment. No PD subject manifested dyskinesias at the time of the examination. Disease duration was estimated from the self-reported onset of the first motor symptoms. Each PSP and MSA patient underwent a neurological examination including scoring on the Neuroprotection and Natural History in Parkinson Plus Syndromes (NNIPPS) scale51, while PD patients were rated using the motor subscore of the Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS). None of the patients reported a history of speech-language disorders unrelated to potential neurologic disorder manifestations. No statistically significant differences were found between the MSA and PSP groups for medication doses, disease duration, cognitive status, or speech or motor severity (Mann–Whitney U test: p = 0.11–0.59). Dysarthria presence, severity and type were evaluated by a speech-language specialist experienced in movement disorders, based on auditory-perceptual judgment of audio recordings of vowel phonation, /pa/-/ta/-/ka/ syllable repetition, and monologue, following the perceptual criteria described in ref. 14. Patient clinical and demographic characteristics are summarized in Table 6. The control group consisted of 150 healthy subjects with a comparable gender distribution (95 men and 55 women; 63% men) and age (mean age 65.5, SD 7.1, range 45–83 years; p = 0.08 between controls and PSP and MSA). No control subject reported a history of neurological disorders or other disorders that may affect speech, language or hearing. A significant difference in age distribution was found among PSP, MSA, PD and controls (ANOVA: p = 0.002), mainly because the PD group was slightly younger than the MSA (p = 0.01) and PSP (p = 0.01) groups. All patients and controls were native Czech speakers, and none manifested a cognitive or depressive deficit that would interfere with the recording procedure. The study was approved by the Ethics Committee of the General University Hospital in Prague, Czech Republic (approval number 34/18 Grant AZV VES 2019 1.LF UK) and was therefore performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. All participants provided written informed consent to the neurological examination and recording procedure.

Table 6 Patient clinical and demographic characteristics.

Speech recording

Speech was recorded in a quiet room with a low ambient noise level using a head-mounted condenser microphone (Beyerdynamic Opus 55, Heilbronn, Germany) placed ~5 cm from the subject’s mouth. Speech signals were sampled at 48 kHz with 16-bit resolution. Each subject was recorded during a single session with a speech specialist. All participants performed four speaking tasks: (i) sustained phonation of the vowel /a/ on one breath for as long and as steadily as possible, (ii) fast /pa/-/ta/-/ka/ syllable repetition at least seven times on one breath, (iii) reading a short standard paragraph of 80 words and (iv) a monologue on a self-chosen topic for ~90 s. These speech tasks were chosen because they provide comprehensive information for the objective interpretation and description of motor speech disorders48. Each subject performed sustained phonation, fast syllable repetition and text reading twice per session.

Initial acoustic speech features

We performed a quantitative acoustic vocal assessment of 26 distinct speech dimensions related to hypokinetic (19), ataxic (4) and spastic (3) dysarthria, covering the subsystems of respiration (4), phonation (7), articulation (7), and prosody and speech timing (8). Acoustic analysis was preferred because it provides objective, sensitive and quantifiable information for the precise assessment of speech performance from very early stages of PD52. For hypokinetic dimensions and respiratory features, we obtained relative loudness of respiration (RLR), rate of speech respiration (RSR) and pause intervals per respiration (PIR) from the reading passage/monologue. To assess hypokinetic dimensions and phonatory features, we calculated jitter, shimmer and harmonics-to-noise ratio (HNR) from sustained phonation, and gaping in-between voiced intervals (GVI) from the reading passage/monologue. To examine hypokinetic dimensions and articulatory features, we assessed duration of stop consonants (DUS) and resonance frequency attenuation (RFA) from the reading passage/monologue, as well as voice onset time (VOT) from syllable repetition. To explore hypokinetic dimensions and timing features, we calculated duration of pause intervals (DPI), entropy of speech timing (EST), rate of speech timing (RST) and acceleration of speech timing (AST) from the reading passage/monologue. To investigate hypokinetic features and prosody, we assessed the standard deviation of fundamental frequency (stdF0) from the reading passage/monologue. Subsequently, the ataxic features of pitch fluctuation (stdF0a; phonation) and standard deviation of the power spectral density (stdPSD; articulation), which represent phonation and articulation deficits, respectively, were examined via sustained phonation, while vowel duration (VD; timing) and diadochokinetic instability (DDKI; timing) were assessed via syllable repetition. Finally, three spastic features, proportion of sub-harmonic intervals (PSI; phonation), diadochokinetic rate (DDKR; articulation) and net speech rate (NSR; timing), were calculated via sustained phonation, syllable repetition and the reading passage, respectively. The initial speech features are listed in Table 7. Comprehensive algorithmic details on individual acoustic measures have been reported previously53. The accuracy of the algorithms for the identification of glottal cycles, temporal intervals and pitch sequences has also been thoroughly tested in previous studies22,53,54.
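The study's own algorithms are described in ref. 53 and are not reproduced here. Purely as an illustration, the sketch below implements the common "local" definitions of jitter and shimmer (mean absolute difference between consecutive glottal periods or peak amplitudes, divided by their mean) and a pitch-variability measure in semitones, assuming glottal periods, amplitudes and F0 values have already been extracted. These definitions, the semitone scale and the function names are assumptions and may differ from the exact measures listed in Table 7.

```python
import math
import statistics
from typing import Sequence


def _local_perturbation(values: Sequence[float]) -> float:
    # Mean absolute difference between consecutive values, relative to the mean value.
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return statistics.mean(diffs) / statistics.mean(values)


def local_jitter(periods_s: Sequence[float]) -> float:
    """Cycle-to-cycle variation of glottal periods (seconds), as a fraction (multiply by 100 for %)."""
    return _local_perturbation(periods_s)


def local_shimmer(peak_amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle variation of glottal peak amplitudes, as a fraction."""
    return _local_perturbation(peak_amplitudes)


def f0_std_semitones(f0_hz: Sequence[float]) -> float:
    """Population SD of F0 after conversion to semitones; low values suggest monopitch."""
    semitones = [12.0 * math.log2(f) for f in f0_hz]  # the reference frequency cancels out in the SD
    return statistics.pstdev(semitones)


print(local_jitter([0.010, 0.011, 0.010, 0.011]))
print(f0_std_semitones([100.0, 200.0]))
```

Expressing F0 variability in semitones makes a given pitch excursion comparable between speakers with different mean F0, which is one reason it is a frequent choice for monopitch measures.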

Table 7 List of the initial acoustic features: Mlg = Monologue task and Txt = Reading passage task.

Design of acoustic indices by subsystem tasks

We first carried out a univariate statistical analysis of the features presented above. This analysis showed that individual features do not lead to acceptable discrimination performance. We therefore considered linear combinations of different features, as described below.

Individual features were first converted to z-scores using the HC mean and standard deviation. For acoustic features in which lower raw scores were associated with greater dysarthria, the sign of the z-score was reversed, so that higher z-scores always indicate greater speech impairment. We then followed a semi-supervised approach to find feature combinations. As our ultimate goal was to design indices that would not overfit our dataset and would be easy to reproduce, we restricted feature combinations to averaging. Moreover, the features were chosen carefully to minimize the potential overlap between subsystems, in order to achieve a degree of orthogonality between indices. For each subsystem task, to find the best combination of measures for separation between groups (HC vs. MSA, HC vs. PSP and MSA vs. PSP), an exhaustive search over all averages of the acoustic features of that task was performed. Separation performance was measured in terms of the p value of the difference between the MSA and PSP groups, which was to be minimized (threshold of significance set at p < 0.05). This search found a combination satisfying this criterion only in the articulation subsystem. For the other subsystems, we therefore added a step: we combined the two averages that yielded the lowest p values, multiplying the one with the lowest p value by 2 to give it more weight than the second. If no statistical significance was achieved, as in the prosodic and timing subsystems, the average giving the lowest p value was kept as the feature assessing that subsystem. The different weighting factors were inspired by the University of Michigan classification of dysarthria subtypes in APS17. We emphasize that we never used PD data in the design of the new features or indices. PD data was used a posteriori as “controls” to potentially reveal the unreliability of a particular new feature or index.

Using the scheme described above, we obtained the following feature for each subsystem:

  • Respiration subsystem:

    $${F}_{resp}=\frac{1}{2}\left(\frac{RL{R}_{m}+RL{R}_{t}+PI{R}_{m}+PI{R}_{t}}{4}+RS{R}_{m}+RS{R}_{t}\right)$$

    Hence, more weight/importance is given to RSR than RLR and PIR in the design of Fresp, by a factor of 2.

  • Phonation subsystem:

    $${F}_{phon}=\frac{1}{2}\left(\frac{Jitter+GV{I}_{m}+GV{I}_{t}}{3}+2{stdF0}_{a}\right)$$

    Hence, more weight/importance is given to stdF0a than Jitter and GVI in the design of Fphon, by a factor of 2.

  • Articulation subsystem:

    $${F}_{art}=\frac{RF{A}_{m}+RF{A}_{t}+stdPSD}{3}$$
  • Prosodic subsystem:

    $${F}_{pros}=\frac{{stdF0}_{m}+{stdF0}_{t}}{2}$$
  • Timing subsystem:

    $${F}_{time}=\frac{ES{T}_{m}+RS{T}_{m}+AS{T}_{t}}{3}$$
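The five subsystem features can be written as a small Python function that operates on HC-referenced z-scores. This is a sketch under one assumption: the dictionary keys (e.g. `RSR_m`, `stdF0_a`) follow a hypothetical naming convention in which the suffix mirrors the subscript in the formulas. Inputs may be scalars or NumPy arrays.

```python
# Hypothetical key names: the suffix (_m, _t, _a) mirrors the subscript in the formulas.
REQUIRED_KEYS = (
    "RLR_m", "RLR_t", "PIR_m", "PIR_t", "RSR_m", "RSR_t",
    "Jitter", "GVI_m", "GVI_t", "stdF0_a",
    "RFA_m", "RFA_t", "stdPSD",
    "stdF0_m", "stdF0_t",
    "EST_m", "RST_m", "AST_t",
)


def subsystem_features(z):
    """Compute F_resp, F_phon, F_art, F_pros and F_time from HC-referenced z-scores."""
    missing = [k for k in REQUIRED_KEYS if k not in z]
    if missing:
        raise KeyError(f"missing z-scores: {missing}")
    # RSR average weighted twice as much as the RLR/PIR average.
    f_resp = 0.5 * ((z["RLR_m"] + z["RLR_t"] + z["PIR_m"] + z["PIR_t"]) / 4
                    + z["RSR_m"] + z["RSR_t"])
    # stdF0_a weighted twice as much as the Jitter/GVI average.
    f_phon = 0.5 * ((z["Jitter"] + z["GVI_m"] + z["GVI_t"]) / 3 + 2 * z["stdF0_a"])
    f_art = (z["RFA_m"] + z["RFA_t"] + z["stdPSD"]) / 3
    f_pros = (z["stdF0_m"] + z["stdF0_t"]) / 2
    f_time = (z["EST_m"] + z["RST_m"] + z["AST_t"]) / 3
    return {"F_resp": f_resp, "F_phon": f_phon, "F_art": f_art,
            "F_pros": f_pros, "F_time": f_time}
```

Setting every z-score to 1 yields \(F_{resp}=F_{phon}=1.5\), while the other three features equal 1. This shows that the factor-of-2 weighting is not rescaled back to the z-score range.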

These new features showed encouraging behavior in terms of statistical differences between groups. However, they did not achieve good classification accuracy, either individually or in a bivariate analysis in which classification was performed in a two-dimensional input space with one feature along each dimension. We then grouped and combined the features according to the group in which the corresponding impairment was predominant. That is, the features reflecting an impairment more predominant in PSP (resp. MSA) were linearly combined, giving more weight to the feature showing the most significant impairment in PSP (resp. MSA). This led us to define 2 new speech subsystem indices (SSI) as:

  • \({SSI}_{1}\) as a combination of the 3 features \({F}_{resp}\), \({F}_{pros}\) and \(DU{S}_{m}\), with more emphasis on \({F}_{resp}\):

    $${SSI}_{1}={F}_{resp}+\frac{DU{S}_{m}}{2}+\frac{{F}_{pros}}{2},$$
  • and \({SSI}_{2}\) as a combination of the 3 features \({F}_{art}\), \({F}_{phon}\) and \({F}_{time}\), with more emphasis on \({F}_{art}\):

    $${SSI}_{2}={F}_{art}+\frac{{F}_{phon}}{2}+\frac{{F}_{time}}{2}.$$

Note that while \(DU{S}_{m}\) and \({F}_{art}\) both assess the articulation subsystem, they actually capture distinct articulation mechanisms.

Finally, we defined the composite speech subsystem index (CSSI) as the vector:

$${CSSI}=({SSI}_{1},{SSI}_{2}).$$
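The two speech subsystem indices and the composite index follow directly from the subsystem features and \(DU{S}_{m}\). Below is a minimal sketch with a hypothetical function name. It assumes that all inputs are already HC-referenced z-score features, either as scalars or as per-speaker arrays.

```python
def cssi(f_resp, f_pros, dus_m, f_art, f_phon, f_time):
    """Return the composite speech subsystem index CSSI = (SSI_1, SSI_2).

    SSI_1 gives full weight to F_resp and half weight to DUS_m and F_pros;
    SSI_2 gives full weight to F_art and half weight to F_phon and F_time.
    """
    ssi1 = f_resp + dus_m / 2 + f_pros / 2
    ssi2 = f_art + f_phon / 2 + f_time / 2
    return ssi1, ssi2
```

Each speaker is thus mapped to one point in a two-dimensional plane. This 2-D input can then be passed to a classifier.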

Design of acoustic indices by dysarthria type

After designing the 2 acoustic indices based on subsystem tasks, we used the findings to develop new indices that can be associated with subtypes of dysarthria. To do so, we first observed that \({SSI}_{1}\) can be regarded as a hypokinetic feature, and we therefore renamed it H1:

$$H1={SSI}_{1}={F}_{resp}+\frac{DU{S}_{m}}{2}+\frac{{F}_{pros}}{2}$$

As for \({SSI}_{2}\), it is a combination of a hypokinetic feature H2 and an ataxic feature A2:

$$H2={H}_{art}+\frac{{H}_{phon}}{2}+\frac{{H}_{time}}{2},$$

where Hart, Hphon and Htime are hypokinetic articulation, phonation and timing features respectively: \({H}_{art}=\frac{{RFA}_{m}+{RFA}_{t}}{2}\), \({H}_{phon}=\frac{Jitter+{GVI}_{m}+{GVI}_{t}}{3}\), \({H}_{time}=\frac{{EST}_{m}+{RST}_{m}+{AST}_{t}}{3}\), and

$$A2=\frac{{stdF0}_{a}+stdPSD}{2}.$$
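The hypokinetic/ataxic split can be sketched in the same way. In this hypothetical helper, `z` holds HC-referenced z-scores keyed as in the formulas (e.g. `RFA_m`, `stdPSD`). The arguments `f_resp` and `f_pros` are the respiration and prosodic subsystem features defined earlier in the text. The code makes explicit that stdPSD (taken from \({F}_{art}\)) and \({stdF0}_{a}\) (taken from \({F}_{phon}\)) are moved into A2, while the remaining terms form H2.

```python
def hypokinetic_ataxic_features(z, f_resp, f_pros):
    """Compute H1, H2 and A2 from HC-referenced z-scores (key names hypothetical)."""
    h1 = f_resp + z["DUS_m"] / 2 + f_pros / 2            # identical to SSI_1
    h_art = (z["RFA_m"] + z["RFA_t"]) / 2                # F_art without stdPSD
    h_phon = (z["Jitter"] + z["GVI_m"] + z["GVI_t"]) / 3  # F_phon without stdF0_a
    h_time = (z["EST_m"] + z["RST_m"] + z["AST_t"]) / 3  # same terms as F_time
    h2 = h_art + h_phon / 2 + h_time / 2
    a2 = (z["stdF0_a"] + z["stdPSD"]) / 2                # sustained-phonation terms only
    return {"H1": h1, "H2": h2, "A2": a2}
```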

We then investigated whether we could design other distinctive features related to particular dysarthria types. First, we observe that A2 is obtained from the sustained phonation task only, whereas syllable repetition is known to be well suited to detecting ataxic impairment. Following the same methodology as in the subsystem feature design, we defined an ataxic feature A1 that reflects greater impairment in PSP:

$$A1=\frac{VD+DDKI}{2}.$$

Similarly, there is relative consensus that patients with PSP develop spastic dysarthria. We thus defined a spastic feature S1 as:

$$S1=\frac{1}{2}\left(\frac{NSR+DDKR}{2}+PSI\right).$$

This led us to define 2 dysarthria type indices (DTI) as:

$${DTI}_{1}=H1+\frac{A1+S1}{2}$$

and

$${DTI}_{2}=H2+A2.$$

Finally, we defined the composite dysarthria type index (CDTI) as the vector:

$${CDTI}=({DTI}_{1},{DTI}_{2}).$$
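The dysarthria-type features can be combined as in the sketch below. The helper name `cdti` and the key names (`VD`, `DDKI`, `NSR`, `DDKR`, `PSI`) are hypothetical. H1, H2 and A2 are passed in as already-computed values, and `z` holds the HC-referenced z-scores of the remaining measures.

```python
def cdti(h1, h2, a2, z):
    """Return the composite dysarthria type index CDTI = (DTI_1, DTI_2)."""
    a1 = (z["VD"] + z["DDKI"]) / 2                      # ataxic feature A1
    s1 = 0.5 * ((z["NSR"] + z["DDKR"]) / 2 + z["PSI"])  # spastic feature S1
    dti1 = h1 + (a1 + s1) / 2
    dti2 = h2 + a2
    return dti1, dti2
```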

Statistical analysis

All analyses were performed in Python. The final speech values from the first and second runs of the sustained phonation, syllable repetition and reading tasks were averaged to provide greater stability of speech assessment48. The one-sample Kolmogorov–Smirnov test was used to evaluate the normality of distributions; the majority of acoustic features were found to be normally distributed. Group differences were assessed using analysis of variance for normally distributed data and the Kruskal–Wallis test for non-normally distributed data with possible outliers. A post hoc Tukey's test was then applied to identify differences between individual groups (HC vs. PD, HC vs. PSP, HC vs. MSA, PSP vs. MSA). Pearson and Spearman correlations were used to test for significant relationships in normally and non-normally distributed data, respectively.
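The statistical pipeline described above can be sketched with SciPy. This is an illustration under stated assumptions, not the original analysis code. Normality is checked by standardizing each sample and applying `scipy.stats.kstest` against a standard normal; strictly, estimating the mean and SD from the same sample calls for a Lilliefors-type correction. A parametric test is chosen only when every group passes this check. `scipy.stats.tukey_hsd`, available in SciPy 1.8 and later, provides the pairwise post hoc comparisons.

```python
import numpy as np
from scipy import stats


def average_runs(run1, run2):
    """Average the first and second run of a task for each speaker."""
    return (np.asarray(run1, dtype=float) + np.asarray(run2, dtype=float)) / 2


def is_normal(x, alpha=0.05):
    """One-sample Kolmogorov-Smirnov test of the standardized sample against N(0, 1)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return stats.kstest(z, "norm").pvalue >= alpha


def group_difference(groups, alpha=0.05):
    """ANOVA if every group passes the normality check, otherwise Kruskal-Wallis.

    groups: dict mapping group name (e.g. "HC", "PD", "MSA", "PSP") to the
    per-speaker values of one feature. Returns (test name, p value, Tukey result).
    """
    samples = [np.asarray(v, dtype=float) for v in groups.values()]
    if all(is_normal(s, alpha) for s in samples):
        name, p = "ANOVA", stats.f_oneway(*samples).pvalue
    else:
        name, p = "Kruskal-Wallis", stats.kruskal(*samples).pvalue
    # Pairwise post hoc comparisons; tukey.pvalue[i, j] compares samples i and j.
    tukey = stats.tukey_hsd(*samples)
    return name, p, tukey


def correlate(x, y, alpha=0.05):
    """Pearson if both variables pass the normality check, Spearman otherwise."""
    if is_normal(x, alpha) and is_normal(y, alpha):
        r, p = stats.pearsonr(x, y)
        return "Pearson", r, p
    rho, p = stats.spearmanr(x, y)
    return "Spearman", rho, p
```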

An overall indication of diagnostic accuracy was reported as the area under the receiver operating characteristic curve (AUC). The classification performance (sensitivity/specificity) for differentiating between groups was calculated using binary logistic regression with leave-one-speaker-out (LOSO) cross-validation. In PSP vs. MSA classification, the PSP group was treated as the positive class; in PD vs. PSP/MSA classification, the PSP/MSA group was treated as the positive class.
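Below is a sketch of the LOSO evaluation with scikit-learn. It assumes one row per speaker (runs already averaged), so that `LeaveOneGroupOut` with speaker IDs as groups reproduces leave-one-speaker-out cross-validation. The text does not specify several details, which are therefore our assumptions: computing the AUC from the pooled out-of-fold probabilities, the 0.5 decision threshold, and the default `LogisticRegression` settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict


def loso_evaluate(X, y, speaker_ids):
    """Binary logistic regression with leave-one-speaker-out cross-validation.

    X: (n_samples, n_features), e.g. the 2-D CSSI or CDTI vectors.
    y: 1 for the positive class (PSP in PSP vs. MSA; PSP/MSA in PD vs. PSP/MSA).
    Returns AUC, sensitivity and specificity from pooled out-of-fold predictions.
    """
    clf = LogisticRegression()
    # Each fold holds out every row of one speaker; probabilities are out-of-fold.
    proba = cross_val_predict(clf, X, y, groups=speaker_ids,
                              cv=LeaveOneGroupOut(), method="predict_proba")[:, 1]
    pred = (proba >= 0.5).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    return {
        "AUC": roc_auc_score(y, proba),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

If several rows per speaker were kept instead of averaging the runs, the same function would still hold out all rows of a speaker together. This is the reason for grouping by speaker rather than using plain leave-one-out.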