

Identifying Parkinson’s disease subtypes with motor and non-motor symptoms via model-based multi-partition clustering


Identification of Parkinson’s disease subtypes may help understand underlying disease mechanisms and provide personalized management. Although clustering methods have previously been used for subtyping, they have reported generic subtypes of limited relevance in real-life practice because patients do not always fit into a single category. The aim of this study was to identify new subtypes assuming that patients could be grouped differently according to certain sets of related symptoms. To this end, a novel model-based multi-partition clustering method was applied to data from an international, multi-center, cross-sectional study of 402 Parkinson’s disease patients. Both motor and non-motor symptoms were considered. As a result, eight sets of related symptoms were identified. Each of them provided a different way to group patients: impulse control issues, overall non-motor symptoms, presence of dyskinesias and psychosis, fatigue, axial symptoms and motor fluctuations, autonomic dysfunction, depression, and excessive sweating. Each of these groups could be seen as a subtype of the disease. Significant differences between subtypes (P < 0.01) were found in sex, age, age of onset, disease duration, Hoehn & Yahr stage, and treatment. Independent confirmation of these results could have implications for the clinical management of Parkinson’s disease patients.


Parkinson’s disease (PD) is a progressive neurodegenerative disease that is clinically characterized by a broad spectrum of motor and non-motor manifestations1. There is, however, considerable variability between PD patients in clinical phenotype and natural history, which may indicate the existence of disease subtypes. Identification of PD subtypes may help understand the underlying disease mechanisms, since homogeneous groups of patients may be more likely to share pathological and genetic features. In addition, identification of PD subtypes may ultimately lead to more precise treatment strategies (i.e., precision medicine)2.

Data-driven techniques such as clustering may be suitable for establishing PD subtypes. In clustering, patients are assigned to several groups (i.e., clusters) so that patients belonging to the same group share similarities. Each of these groups is then usually considered a subtype of the disease. Previous clustering studies have already identified PD subtypes with motor and non-motor symptoms3,4,5,6,7,8. However, to the best of our knowledge, all of them have used single-partition clustering methods such as k-means9, latent class analysis10, Gaussian mixture models11, or agglomerative hierarchical clustering12. Single-partition clustering algorithms assume that a dataset has a single true clustering. As a result, each patient is assigned to a single subtype that is defined by all the considered symptoms.

Recently, several issues have been raised about data-driven PD subtypes, such as small sample sizes, lack of internal homogeneity, difficulty in reproducing meaningful findings in real-life practice, and limited external validity13,14. We believe that these issues may be a consequence of using single-partition clustering methods. The assumption that each patient should be assigned to a single generic subtype does not hold for PD, which is usually multifaceted and can be meaningfully partitioned in multiple ways15,16. For this reason, we advocate model-based multi-partition clustering17,18,19,20, which extends model-based clustering11 by producing mixture models with multiple categorical latent variables. The idea is to use statistical principles to find sets of related symptoms, each of which divides patients into a number of distinct groups. Each set of symptoms therefore defines a different clustering of patients, and each patient is assigned to one subtype in each clustering. The analysis of these subtypes and their associations may provide more accurate insights into the considered symptoms, as well as into their relationship with the socio-demographic and clinical information of the patients.

Based on the above, the objectives of our study were: (i) to identify PD subtypes using model-based multi-partition clustering, and (ii) to analyze the associations between the resulting subtypes. To this end, we developed a novel model-based multi-partition clustering algorithm and applied it to data from a large, multi-center, international, and well-characterized cohort of patients.



The analysis was carried out on data gathered from the first validation study of the Movement Disorder Society Non-Motor Rating Scale (MDS-NMS), an international, multi-center, cross-sectional study that included English-speaking PD patients from England and the United States21. The study was approved by the institutional review boards or ethics committees of the participating centers: (1) National Research Ethics Service Committee East Midlands-Northampton, England; and (2) Institutional Review Board at the Perelman School of Medicine at the University of Pennsylvania, United States. All patients gave written informed consent to participate in the study. In addition, the study was conducted according to good clinical practice, and all research was performed in accordance with relevant guidelines and regulations. Data are publicly available in our GitHub repository22. For all patients, socio-demographic information and basic clinical variables (i.e., sex, age, age of onset, and disease duration) were recorded, and the following assessments were applied:

  1. The Movement Disorder Society Unified Parkinson’s Disease Rating Scale (MDS-UPDRS)23, which comprises 65 items divided into 4 parts: Part I, Non-motor Experiences of Daily Living (13 items); Part II, Motor Experiences of Daily Living (13 items); Part III, Motor Examination (33 items); and Part IV, Motor Complications (6 items). Each item has five response options, ranging from 0 (normal) to 4 (maximum intensity). The total score of each part is obtained by summing its item scores.

  2. The MDS-NMS21, which comprises 52 items grouped into 13 domains: depression, anxiety, apathy, psychosis, impulse control and related disorders (ICRDs), cognition, orthostatic hypotension, urinary, sexual, gastrointestinal, sleep and wakefulness, pain, and other. Each item is rated for both frequency and severity, each with five response options ranging from 0 (normal) to 4 (maximum intensity), and the item score is the product of frequency and severity. The score of each domain is obtained by summing its item scores. The MDS-NMS also includes an 8-item subscale for non-motor fluctuations, which was not considered in this study.

  3. The Hoehn & Yahr (H&Y) staging system24, which ranges from 1 to 5.
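The MDS-NMS scoring rule described above (item score = frequency × severity, domain score = sum of item scores) can be sketched as follows. This is a minimal illustration rather than the study's code; the function names and the example responses are hypothetical.

```python
def item_score(frequency: int, severity: int) -> int:
    """MDS-NMS item score: frequency x severity, each rated 0-4 (so 0-16)."""
    for rating in (frequency, severity):
        if not 0 <= rating <= 4:
            raise ValueError("frequency and severity must be integers in 0..4")
    return frequency * severity


def domain_score(items: list[tuple[int, int]]) -> int:
    """MDS-NMS domain score: sum of the item scores of that domain."""
    return sum(item_score(frequency, severity) for frequency, severity in items)


# Hypothetical 3-item domain: one (frequency, severity) pair per item.
responses = [(2, 3), (0, 4), (1, 1)]
print(domain_score(responses))  # 6 + 0 + 1 -> 7
```

Because item scores are products, a symptom that is frequent but trivial and one that is rare but severe can both contribute little to the domain score; only items that are both frequent and severe approach the maximum of 16.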

Motor items from the MDS-UPDRS were grouped into 5 motor cardinal signs (tremor, rigidity, bradykinesia, dyskinesias, and motor fluctuations) and 2 motor subtypes (axial symptoms and postural instability gait difficulty, PIGD)25, resulting in 7 motor variables. The specific MDS-UPDRS items that constitute each motor variable are provided in Section 1 of the Supplementary Information. Additionally, items from the MDS-NMS were grouped into their respective domains, except for the items of the “other” domain (unintentional weight loss, decrease in sense of smell, physical fatigue, mental fatigue, and excessive sweating), which were considered individually because each represents a distinct symptom. This resulted in 17 non-motor variables (12 domains plus 5 individual items). Finally, to improve the interpretability of the results, all motor and non-motor variables were normalized to the [0, 1] range using min-max scaling.
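A minimal NumPy sketch of the min-max normalization step is shown below. The text does not state whether the minimum and maximum were the observed values or the theoretical bounds of each scale; this sketch assumes observed per-variable values, ignores missing entries (NaN) when computing them, and leaves missing entries as NaN. The function name is hypothetical.

```python
import numpy as np


def min_max_scale(X: np.ndarray) -> np.ndarray:
    """Scale each column of X (patients x variables) to [0, 1].

    Missing values (NaN) are ignored when computing each column's
    min and max, and remain NaN in the output.
    """
    col_min = np.nanmin(X, axis=0)
    col_max = np.nanmax(X, axis=0)
    span = col_max - col_min
    # Constant columns would divide by zero; map them to 0 instead.
    span[span == 0] = 1.0
    return (X - col_min) / span


# Hypothetical raw scores: 3 patients x 2 variables, one value missing.
X = np.array([[0.0, 10.0],
              [5.0, np.nan],
              [10.0, 30.0]])
print(min_max_scale(X))
```

If theoretical bounds were used instead (e.g. 0 and 16 × number of items for an MDS-NMS domain), `col_min` and `col_max` would simply be replaced by those constants.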

A total of 402 patients were considered for this study. Mean age of onset was 59 ± 11 (s.d.) years, 62% were male, and mean PD duration was 8 ± 6 (s.d.) years. In terms of disease stage, 13% of the patients were in H&Y stage 1, 54% in stage 2, 28% in stage 3, and 5% in stage 4; no patients were in H&Y stage 5. Regarding medication, 87% of the patients took levodopa, with an average levodopa daily dose (LDD) of 658.57 ± 503.55 milligrams (mg) among these patients. In addition, 42% of the patients received dopamine agonist (DA) treatment. The average levodopa-equivalent daily dose of DA (LEDD-DA), calculated following Tomlinson et al.26, was 226.84 ± 132.14 mg among these patients. Finally, 64 values (< 1% of the total) were missing, mostly in the sexual domain of the MDS-NMS. As our multi-partition clustering method can handle missing values, no patients were excluded from the analysis. For more information about the data, see Table 1.

Table 1 Descriptive statistics of the data. Numbers between parentheses correspond to standard deviations (s.d.).

Model-based multi-partition clustering method

A novel model-based multi-partition clustering method was developed to identify groups of individuals with specific patterns in the motor and non-motor domains. The proposed method learned a conditional linear Gaussian Bayesian network (BN)27 with multiple categorical latent variables. Each latent variable provided a unique way to partition PD patients according to a unique set of symptom variables. Each group of patients was considered a PD subtype with respect to the partition variables.

Two components can be distinguished in every BN: (i) a directed acyclic graph that encodes conditional independences among triplets of variables in the BN; and (ii) a set of parameters that describe the conditional probability distributions of each variable given its parents in the graph. Together, both of these elements define a unique joint probability distribution. BNs are useful in multi-partition clustering for several reasons. First, their graphical structure allows for an easier interpretation, showing which variables define each partition, and how partitions relate to each other. Second, their conditional independences result in more compact models that are easier to learn from data. Finally, BNs allow probabilistic inference, which is useful for making predictions, diagnoses and explanations.
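For reference, the joint distribution encoded by a BN factorizes over its graph as in the first equation below. In a conditional linear Gaussian BN, each continuous variable \(Y\) with discrete parents \(\mathbf{D}\) (here, the latent partition variables) and continuous parents \(\mathbf{Z}\) follows a Gaussian whose mean is linear in \(\mathbf{z}\), with coefficients and variance indexed by the discrete configuration \(\mathbf{d}\). This is the standard textbook form, given as background rather than taken from the paper's Supplementary Information.

```latex
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p\big(x_i \mid \mathrm{pa}(x_i)\big)

p(y \mid \mathbf{d}, \mathbf{z}) = \mathcal{N}\big(y;\; \beta_{0,\mathbf{d}} + \boldsymbol{\beta}_{\mathbf{d}}^{\top}\mathbf{z},\; \sigma^{2}_{\mathbf{d}}\big)
```

A symptom whose only parent is a latent variable \(L\) therefore has density \(\mathcal{N}(y; \mu_l, \sigma^{2}_l)\) in subtype \(l\), which is the per-subtype mean and s.d. reported for each partition in the Results.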

Our proposal iteratively explores the space of conditional linear Gaussian BNs using five latent operators and a variational Bayesian28 version of the structural expectation-maximization29 algorithm. Latent operators introduce latent variables, remove latent variables, and change the cardinality (i.e., number of subtypes) of latent variables. Each application of these operators produces a candidate model whose structure is refined using the variational Bayesian structural expectation-maximization algorithm. Once all the candidate models have been evaluated, the highest-scoring model is selected. This process is repeated until the model score ceases to increase. Given its greedy nature, we refer to this method as the greedy latent structure learner. It is formally defined in Sections 2 and 3 of the Supplementary Information, and its Java 8 implementation is publicly available in the project’s GitHub repository22.
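The control flow of this greedy search can be sketched as a generic hill-climbing loop. This is an illustrative sketch, not the authors' Java implementation: the five actual operators and the variational Bayesian structural EM step are defined in the Supplementary Information, so here `refine` is a placeholder for that step, and the four toy operators and `toy_score` are hypothetical stand-ins acting on a model represented only by its tuple of latent cardinalities.

```python
from typing import Callable, Iterable, List, TypeVar

M = TypeVar("M")  # any model representation


def greedy_latent_search(
    initial: M,
    operators: List[Callable[[M], Iterable[M]]],
    refine: Callable[[M], M],
    score: Callable[[M], float],
) -> M:
    """Greedy hill climbing over latent structures.

    Each operator maps the current model to candidate models; every
    candidate is refined (placeholder for structural EM) and scored.
    The best candidate replaces the current model only if it improves
    the score; otherwise the search stops.
    """
    best, best_score = initial, score(initial)
    while True:
        candidates = [refine(c) for op in operators for c in op(best)]
        if not candidates:
            return best
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score <= best_score:
            return best  # the score ceased to increase
        best, best_score = top, top_score


# Toy usage (hypothetical): a "model" is a tuple of latent cardinalities.
def add_latent(m):
    return [m + (2,)]


def remove_latent(m):
    return [m[:-1]] if m else []


def increase_cardinality(m):
    return [m[:i] + (k + 1,) + m[i + 1:] for i, k in enumerate(m)]


def decrease_cardinality(m):
    return [m[:i] + (k - 1,) + m[i + 1:] for i, k in enumerate(m) if k > 2]


TARGET = (3, 2)  # hypothetical structure that the toy score rewards


def toy_score(m):
    return -10 * abs(len(m) - len(TARGET)) - sum(abs(a - b) for a, b in zip(m, TARGET))


ops = [add_latent, remove_latent, increase_cardinality, decrease_cardinality]
print(greedy_latent_search((), ops, refine=lambda m: m, score=toy_score))  # (3, 2)
```

Starting from a model with no latent variables, the toy run moves through (2,), (2, 2), and (3, 2), then stops because no candidate improves the score. Like any greedy search, the real method can stop at a local optimum.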

Analysis of multi-partition PD subtypes

Because a conditional linear Gaussian BN was used, each subtype in a partition was defined by a linear Gaussian distribution whose dimensions corresponded to the partition symptoms. For each symptom, \(\mu \) denotes the subtype mean and \(\sigma \) the subtype s.d. In addition, to improve the readability of these subtypes, we devised a simple scale that uses the quartiles of the normalized [0, 1] range to describe mean symptom severity: (i) slight [0.01, 0.25]; (ii) mild [0.26, 0.50]; (iii) moderate [0.51, 0.75]; and (iv) severe [0.76, 1]. Note that this scale differs from the MDS-UPDRS and MDS-NMS ratings.
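The severity scale can be expressed as a small lookup function. This is a sketch; the function name is hypothetical, and the handling of 0.00 is an assumption: the scale itself starts at 0.01, so a rounded mean of 0.00 is labelled "none", mirroring how the Results describe subtypes with \(\mu \) = 0.00.

```python
def severity_label(mu: float) -> str:
    """Describe a subtype mean on the normalized [0, 1] scale in words.

    Bands: slight [0.01, 0.25], mild [0.26, 0.50], moderate [0.51, 0.75],
    severe [0.76, 1]. Means are rounded to two decimals first, as reported.
    Assumption: a rounded mean of 0.00 is labelled "none".
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in the normalized [0, 1] range")
    m = round(mu, 2)
    if m < 0.01:
        return "none"
    if m <= 0.25:
        return "slight"
    if m <= 0.50:
        return "mild"
    if m <= 0.75:
        return "moderate"
    return "severe"


# Subtype E2 means (axial symptoms, bradykinesia, loss of smell, motor fluctuations)
print([severity_label(mu) for mu in (0.26, 0.30, 0.51, 0.30)])
# ['mild', 'mild', 'moderate', 'mild']
```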

To explore the relationship between socio-demographic information, basic clinical variables, H&Y stage, and the identified subtypes, hypothesis tests were performed. Each pair of subtypes in a partition was compared. For continuous variables such as age, age of PD onset, PD duration, LDD, and LEDD-DA, an ANOVA test or a Mann-Whitney U-test (both implemented in the Python library SciPy version 1.5.2) was used. When a partition had three or more subtypes, an ANOVA test or a Kruskal-Wallis test (both implemented in SciPy) was performed instead, followed by a post-hoc analysis using Tukey’s range test (implemented in the Python library Statsmodels version 0.11.1). For categorical variables, such as sex and the presence of levodopa and DA treatments, and for discrete variables, such as the H&Y stage, \(\chi ^{2}\) tests (implemented in SciPy) were performed. Statistical significance was set at p < 0.01.
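The test selection described above can be sketched with SciPy. This is a minimal sketch, not the study's analysis script: the text does not state how the parametric or non-parametric test was chosen, so that choice is left to the caller through a hypothetical `parametric` flag, and all data below are invented. The Tukey post-hoc step would typically use Statsmodels' `pairwise_tukeyhsd`, which is omitted here to keep the sketch SciPy-only.

```python
import numpy as np
from scipy import stats

ALPHA = 0.01  # significance threshold used in the study


def compare_continuous(*groups, parametric: bool = True) -> float:
    """p-value for a continuous variable (e.g. age) across the subtypes of a partition.

    Two subtypes: one-way ANOVA or Mann-Whitney U.
    Three or more: one-way ANOVA or Kruskal-Wallis.
    Assumption: the caller decides whether the parametric test applies.
    """
    if len(groups) < 2:
        raise ValueError("need at least two subtypes")
    if parametric:
        return stats.f_oneway(*groups).pvalue
    if len(groups) == 2:
        return stats.mannwhitneyu(*groups, alternative="two-sided").pvalue
    return stats.kruskal(*groups).pvalue


def compare_categorical(table) -> float:
    """Chi-squared test on a (subtypes x categories) table of patient counts."""
    _, p_value, _, _ = stats.chi2_contingency(np.asarray(table))
    return p_value


# Hypothetical ages (years) in two subtypes of one partition.
younger = [45, 48, 50, 52, 53, 55, 57]
older = [66, 68, 70, 71, 73, 75, 78]
print(compare_continuous(younger, older, parametric=False) < ALPHA)  # True

# Hypothetical counts: rows = subtypes, columns = (male, female).
print(compare_categorical([[150, 64], [134, 54]]))
```

Note that `chi2_contingency` applies Yates' continuity correction by default for 2 × 2 tables; whether the study used it is not stated.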

To analyze the associations between the identified subtypes, probabilistic inference was employed. For example, consider a hypothetical multi-partition clustering model with two partitions, A and B, connected by an arc. Partition A defines two PD subtypes {A1, A2} according to one set of symptoms, and partition B defines three PD subtypes {B1, B2, B3} according to a different set of symptoms. We are interested in the difference between the probability distributions P(B) and P(B|A = A1), and likewise between P(B) and P(B|A = A2); that is, how being assigned a subtype in A affects the probability distribution of B. The inverse queries are also relevant (i.e., how being assigned a subtype in B affects the probability distribution of A). Since each subtype in a partition is characterized by a set of symptoms with a certain severity, analyzing the relationships between subtypes of different partitions also indirectly examines the relationships between their respective symptoms (i.e., how an increase or decrease in the severity of certain symptoms affects the probability of other symptoms being more or less severe). In this study, probabilistic queries were carried out using Monte Carlo sampling in GeNIe (version 3.0), a tool for BN analysis.
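The two-partition example can be made concrete with a short sketch. The probabilities below are invented for illustration only, and the sampler is simple forward sampling with rejection of samples that contradict the evidence; it shows the idea behind a Monte Carlo query but is not a description of GeNIe's algorithms. Exact answers are computed alongside for comparison.

```python
import random

# Hypothetical parameters for the two-partition example (A -> B).
P_A = {"A1": 0.6, "A2": 0.4}
P_B_given_A = {
    "A1": {"B1": 0.5, "B2": 0.3, "B3": 0.2},
    "A2": {"B1": 0.1, "B2": 0.3, "B3": 0.6},
}


def exact_marginal_B():
    """P(B) = sum over a of P(A = a) * P(B | A = a)."""
    out = {}
    for a, pa in P_A.items():
        for b, pb in P_B_given_A[a].items():
            out[b] = out.get(b, 0.0) + pa * pb
    return out


def exact_posterior_A(b):
    """P(A | B = b) by Bayes' rule (the 'inverse' query)."""
    joint = {a: P_A[a] * P_B_given_A[a][b] for a in P_A}
    z = sum(joint.values())
    return {a: v / z for a, v in joint.items()}


def sample(dist, rng):
    """Draw one value from a {value: probability} dict."""
    values, weights = zip(*dist.items())
    return rng.choices(values, weights=weights, k=1)[0]


def mc_conditional_B(a_obs, n=100_000, seed=0):
    """Estimate P(B | A = a_obs) by forward sampling with rejection."""
    rng = random.Random(seed)
    counts, kept = {}, 0
    for _ in range(n):
        a = sample(P_A, rng)
        b = sample(P_B_given_A[a], rng)
        if a != a_obs:
            continue  # discard samples inconsistent with the evidence
        kept += 1
        counts[b] = counts.get(b, 0) + 1
    return {b: c / kept for b, c in counts.items()}


print({b: round(p, 2) for b, p in exact_marginal_B().items()})
# {'B1': 0.34, 'B2': 0.3, 'B3': 0.36}
print(mc_conditional_B("A2"))  # close to P(B | A = A2) = 0.1, 0.3, 0.6
```

In this toy model, knowing that a patient belongs to A2 raises the probability of B3 from 0.36 to 0.6, which is the kind of shift the study reports for pairs of real subtypes.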


Multi-partition PD subtypes

The BN structure that resulted from applying our multi-partition clustering algorithm to the 7 motor and 17 non-motor variables is portrayed in Fig. 1. It consisted of 9 latent variables, named alphabetically. Each latent variable defined a unique partition according to its descendant symptom variables in the graph. For example, in partition A, patients were divided into two subtypes according to the severity of their ICRDs and PIGD. One latent variable, however, differed from the rest in not being directly related to any symptom variable. This variable, I, acted as an auxiliary latent variable connecting partitions G (weight loss-depression) and H (excessive sweating-anxiety); to simplify the analysis, its relevant information was condensed into that of G and H. As a result, 8 partitions were discovered, each with its own number of subtypes. The sex, age, age of onset, disease duration, and H&Y stage of each subtype are provided in Table 2, and treatment information for each subtype is provided in Table 3. Significant differences between subtypes are indicated in the tables. Results (i.e., p-values) of the statistical tests performed are provided in Section 4 of the Supplementary Information.

Figure 1

BN structure of the multi-partition clustering model. Blue nodes represent motor variables and red nodes represent non-motor variables. Grey nodes represent categorical latent variables (i.e., partitions), where the number in parentheses corresponds to the variable’s cardinality (i.e., the number of subtypes). Graph arcs represent conditional dependences.

Table 2 Sex, age, age of onset, disease duration and H&Y stage of each PD subtype. Numbers between parentheses correspond to standard deviations.

Partition A (ICRDs-PIGD)

  • Subtype A1 (83%) comprised 335 patients who showed no impulse control problems (\(\mu \) = 0.00, \(\sigma \) = 0.00) but did show slight PIGD (\(\mu \) = 0.22, \(\sigma \) = 0.19).

  • Subtype A2 (17%) comprised 67 patients who showed slight impulse control problems (\(\mu \) = 0.09, \(\sigma \) = 0.08) and also presented slight PIGD (\(\mu \) = 0.20, \(\sigma \) = 0.16).

Partition B (apathy-cognitive-pain-gastrointestinal-sleep-urinary)

  • Subtype B1 (49%) was formed of 199 patients that showed no apathy (\(\mu \) = 0.00, \(\sigma \) = 0.00), slight cognitive changes (\(\mu \) = 0.03, \(\sigma \) = 0.04), slight pain (\(\mu \) = 0.06, \(\sigma \) = 0.07), slight gastrointestinal problems (\(\mu \) = 0.04, \(\sigma \) = 0.05), slight sleep disorders (\(\mu \) = 0.06, \(\sigma \) = 0.07), and slight urinary issues (\(\mu \) = 0.06, \(\sigma \) = 0.09).

  • Subtype B2 (51%) was formed of 203 patients that showed slight apathy (\(\mu \) = 0.16, \(\sigma \) = 0.18), slight cognitive changes (\(\mu \) = 0.17, \(\sigma \) = 0.14), slight pain (\(\mu \) = 0.19, \(\sigma \) = 0.17), slight gastrointestinal problems (\(\mu \) = 0.15, \(\sigma \) = 0.14), slight sleep disorders (\(\mu \) = 0.17, \(\sigma \) = 0.13), and slight urinary issues (\(\mu \) = 0.25, \(\sigma \) = 0.22).

Table 3 Treatment information of each subtype. Numbers between parentheses correspond to standard deviations. \(^a\)Significant differences between A1 and A2. \(^b\)Significant differences between B1 and B2. \(^{c_{1}}\)Significant differences between C1 and C2. \(^{c_{2}}\)Significant differences between C1 and C3. \(^{c_{3}}\)Significant differences between C2 and C3. \(^d\)Significant differences between D1 and D2. \(^e\)Significant differences between E1 and E2. \(^f\)Significant differences between F1 and F2. \(^h\)Significant differences between H1 and H2. Statistical significance was defined as p-value p < 0.01.

Partition C (dyskinesias-psychosis)

  • Subtype C1 (60%) was composed of 242 patients that showed no dyskinesias (\(\mu \) = 0.00, \(\sigma \) = 0.01) or psychosis (\(\mu \) = 0.00, \(\sigma \) = 0.00).

  • Subtype C2 (28%) was composed of 112 patients that showed slight dyskinesias (\(\mu \) = 0.18, \(\sigma \) = 0.18) and slight psychosis (\(\mu \) = 0.02, \(\sigma \) = 0.03).

  • Subtype C3 (12%) was composed of 48 patients that showed slight dyskinesias (\(\mu \) = 0.15, \(\sigma \) = 0.23) and slight psychosis (\(\mu \) = 0.14, \(\sigma \) = 0.11).

Partition D (mental fatigue-physical fatigue)

  • Subtype D1 (67%) consisted of 272 patients that showed no mental fatigue (\(\mu \) = 0.00, \(\sigma \) = 0.01) and slight physical fatigue (\(\mu \) = 0.14, \(\sigma \) = 0.06).

  • Subtype D2 (33%) consisted of 130 patients that showed mild mental fatigue (\(\mu \) = 0.28, \(\sigma \) = 0.23) and mild physical fatigue (\(\mu \) = 0.35, \(\sigma \) = 0.06).
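To illustrate how the reported Gaussian parameters characterize a subtype, the sketch below applies Bayes' rule to compute the posterior probability of each partition D subtype from a patient's normalized fatigue scores. It rests on explicitly stated assumptions: the reported subtype shares (67%, 33%) serve as prior probabilities, the reported \(\mu \) and \(\sigma \) serve as Gaussian parameters, the two fatigue variables are treated as conditionally independent given the subtype (the fitted model may contain dependencies this ignores), and the Gaussians are not truncated to [0, 1]. Function names and the example patient are hypothetical.

```python
import math

# Partition D as reported above: (prior share, {symptom: (mu, sigma)}).
PARTITION_D = {
    "D1": (0.67, {"mental_fatigue": (0.00, 0.01), "physical_fatigue": (0.14, 0.06)}),
    "D2": (0.33, {"mental_fatigue": (0.28, 0.23), "physical_fatigue": (0.35, 0.06)}),
}


def normal_logpdf(x: float, mu: float, sigma: float) -> float:
    """Log density of a univariate Gaussian."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def subtype_posterior(patient: dict, partition: dict = PARTITION_D) -> dict:
    """P(subtype | normalized symptom scores), assuming the partition's
    symptoms are conditionally independent given the subtype."""
    log_joint = {
        name: math.log(prior)
        + sum(normal_logpdf(patient[s], mu, sigma) for s, (mu, sigma) in params.items())
        for name, (prior, params) in partition.items()
    }
    # Normalize with the log-sum-exp trick to avoid underflow.
    top = max(log_joint.values())
    weights = {name: math.exp(v - top) for name, v in log_joint.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


patient = {"mental_fatigue": 0.30, "physical_fatigue": 0.33}  # hypothetical
print({k: round(v, 3) for k, v in subtype_posterior(patient).items()})
# {'D1': 0.0, 'D2': 1.0}
```

The very small \(\sigma \) of mental fatigue in D1 makes any clearly non-zero mental fatigue score almost incompatible with D1, which is why the assignment in this example is so decisive.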

Partition E (axial symptoms-bradykinesia-loss of smell-motor fluctuations)

  • Subtype E1 (49%) was constituted by 199 patients that showed slight axial symptoms (\(\mu \) = 0.19, \(\sigma \) = 0.15), slight bradykinesia (\(\mu \) = 0.21, \(\sigma \) = 0.02), mild loss of smell (\(\mu \) = 0.29, \(\sigma \) = 0.15), but no motor fluctuations (\(\mu \) = 0.00, \(\sigma \) = 0.02).

  • Subtype E2 (51%) was constituted by 203 patients that showed mild axial symptoms (\(\mu \) = 0.26, \(\sigma \) = 0.15), mild bradykinesia (\(\mu \) = 0.30, \(\sigma \) = 0.02), moderate loss of smell (\(\mu \) = 0.51, \(\sigma \) = 0.15), and mild motor fluctuations (\(\mu \) = 0.30, \(\sigma \) = 0.16).

Partition F (orthostatic hypotension-sexual problems)

  • Subtype F1 (53%) was composed of 214 patients that showed no orthostatic hypotension (\(\mu \) = 0.00, \(\sigma \) = 0.02) and slight sexual problems (\(\mu \) = 0.01, \(\sigma \) = 0.03).

  • Subtype F2 (47%) was composed of 188 patients that showed slight orthostatic hypotension (\(\mu \) = 0.15, \(\sigma \) = 0.17) and mild sexual problems (\(\mu \) = 0.29, \(\sigma \) = 0.30).

Partition G (weight loss-depression)

  • Subtype G1 (79%) was characterized by 317 patients that showed no weight loss (\(\mu \) = 0.00, \(\sigma \) = 0.01) and slight depression (\(\mu \) = 0.03, \(\sigma \) = 0.04).

  • Subtype G2 (21%) was characterized by 85 patients that showed mild weight loss (\(\mu \) = 0.26, \(\sigma \) = 0.32) and slight depression (\(\mu \) = 0.24, \(\sigma \) = 0.20).

Partition H (excessive sweating-anxiety)

  • Subtype H1 (74%) consisted of 299 patients who showed no excessive sweating (\(\mu \) = 0.00, \(\sigma \) = 0.01) and slight anxiety (\(\mu \) = 0.06, \(\sigma \) = 0.07).

  • Subtype H2 (26%) consisted of 103 patients who showed mild excessive sweating (\(\mu \) = 0.27, \(\sigma \) = 0.28) and slight anxiety (\(\mu \) = 0.19, \(\sigma \) = 0.19).

A total of 29 probabilistic queries were performed to analyze the connections between the identified subtypes. They are provided in Section 4 of the Supplementary Information.

Comparison with other model-based clustering methods

We compared our model-based multi-partition clustering method with three model-based single-partition clustering methods (namely, the latent class model10, the Gaussian mixture model11, and the unsupervised k-dependence Bayesian classifier30) and two model-based multi-partition clustering methods (namely, the Gaussian expansion simplification until termination algorithm17 and the multi-partition mixture model18). We evaluated the quality of the results from both a data-fitting and a clustering perspective.

In this comparative analysis, we observed that multi-partition clustering methods were able to obtain multiple partitions from data, which resulted in a higher number of subtypes than single-partition clustering methods. These subtypes were not only more specific but also more faithful to the data (i.e., they achieved higher model selection scores). Of the methods considered, our approach returned the highest-scoring model. The Gaussian expansion simplification until termination algorithm also obtained a high model selection score; however, its model suffered from overfitting and was difficult to interpret (it identified 18 partitions with 55 subtypes). For more information about the model selection process, see Section 4 of the Supplementary Information.


Clinical interpretation of PD subtypes and their associations

Partitions were underpinned by a reasonable spread of contributory PD symptoms, thus bridging a statistical and clinical divide. Tremor and rigidity were the exceptions, as they appeared to be independent of the remaining variables in the model (see Fig. 1). A weak correlation between rigidity, tremor, and non-motor symptoms is not uncommon and has also been observed in a recent study of a similar population31.

In partition A, patients were divided into two subtypes according to the severity of their ICRDs and PIGD. Although the mean PIGD of the two subtypes was similar, subtype A2 was characterized by more severe ICRDs, a younger age, and an earlier age of onset. A relationship between younger age, earlier PD onset, and more severe ICRDs has been previously observed32. Both socio-demographic factors are known risk factors for ICRDs, along with motor complications, a pre-PD history of ICRDs, and DA treatment33. Consistent with this, we observed a higher percentage of DA-treated patients in A2 than in A1. However, no causal relationship can be inferred from this observation.

Apathy, cognition, pain, gastrointestinal, sleep, and urinary symptoms were associated in partition B. Two subtypes were identified, with patients in subtype B2 presenting more severe symptoms. This subtype is consistent with the Parkinson’s apathy subtype34,35, which has been described as comprising older patients with cognitive impairment, sleep disorders, and relatively severe motor symptoms. The relationship between sleep disorders and urinary problems may indicate the presence of nocturia36. In addition, a recent study identified a relationship between constipation and cognitive dysfunction in two independent cohorts of patients37.

Partition C distinguished three subtypes that differed in the severity of dyskinesias and psychosis. Subtypes C2 and C3 presented more severe symptoms than subtype C1, and patients in C3 showed more severe psychosis than those in C2. Both C2 and C3 consisted of patients with a longer disease duration, an earlier age of onset, and a higher LDD. These subtypes are consistent with the observation that dyskinesias and psychosis usually appear in late stages of PD and may be associated with higher dopaminergic treatment doses38,39. Moreover, as PD progresses, individuals lose their long-duration response to dopaminergic treatment, which usually results in higher doses40.

Fatigue is considered a common and complex non-motor symptom of PD, prevalent from the prodromal to the palliative stage. It is usually present from early stages of the disease and may often persist or even worsen over time41. While fatigue is usually regarded as an independent symptom, it has been moderately associated with apathy, sleep disorders, depression, and motor problems42,43. Our model was able to capture this duality by identifying a specific partition for fatigue symptoms, and connecting it with partitions B (apathy, sleep and depression) and E (motor problems). In addition, patients that suffered from more severe fatigue showed a longer duration of the disease and a younger age of PD onset.

Bradykinesia, axial symptoms, and motor fluctuations were associated in partition E with a decrease in sense of smell (i.e., hyposmia). Partition E divided patients into two subtypes. While both subtypes presented motor issues, E2 was characterized by more severe motor symptoms and hyposmia, and by the presence of motor fluctuations. Anosmia/hyposmia is considered a preclinical marker of PD with relatively static severity. Although it has not been associated with any particular PD phenotype44,45, a recent study noted that normosmic PD patients usually display better motor function than hyposmic PD patients46.

Partition F identified two subtypes based on orthostatic hypotension and sexual problems. It was the only partition whose subtypes differed significantly in sex: 71% of patients in F2 were male. We also observed significant differences in the H&Y stage and PD duration of these patients, reflecting the later occurrence of the autonomic features of orthostatic hypotension and sexual dysfunction47,48,49.

Weight loss and depression were associated in partition G. Two subtypes were identified, with G2 characterized by mild weight loss and a higher, though still slight, severity of depression. Loss of appetite due to depression is a known weight loss factor50. There were no significant differences in sex, age, age of onset, or H&Y stage between patients in G1 and G2. With regard to treatment, however, we did observe significant differences between subtypes: considerably fewer patients received DA treatment in G2 than in G1, and those who did were taking a significantly lower LEDD-DA.

Regarding partition H, anxiety was associated with excessive sweating. Anxiety was present in both H1 and H2, but patients in H2 showed more severe anxiety along with mild excessive sweating. This result is consistent with a recent study51 in which anxiety and depression were more prominent in PD patients with hyperhidrosis.

It is important to note that none of the discovered subtypes were fully independent of each other: belonging to a specific subtype in one partition influenced the subtype probabilities in the other partitions. By using probabilistic inference, we were able to study the effect of these associations on their respective symptoms. Some interesting patterns that we observed included: (i) patients with ICRDs (subtype A2) had a 0.75 probability of presenting the symptoms of B2, which is consistent with a recent study that has challenged the traditional concept of apathy and ICRDs as opposite symptoms52; (ii) patients who suffered psychosis (subtype C3) had a 0.88 probability of suffering the symptoms of B2, in line with reports linking visual hallucinations to sleep deprivation, cognitive impairment, and depression53,54; and (iii) patients with mild mental and physical fatigue (subtype D2) had a 0.79 probability of suffering the symptoms of B2 and a 0.73 probability of suffering the symptoms of E2. As previously discussed, fatigue has been related to apathetic symptoms, sleep disturbances, and higher H&Y stages42,43.

The majority of partitions were directly or indirectly influenced by B, which acted as a pivotal latent variable in the multi-partition clustering model. This aligns with the current observation that sleep disorders, depression, constipation, and other non-motor symptoms appear across the spectrum of patients with PD55.


This study has some limitations. Concerning the study population, patients were not specifically selected for this analysis but rather for the validation of the MDS-NMS. Nonetheless, the large sample size and the high quality of the collected data will allow these results to be contrasted and compared with those of future studies. The sample was younger than the average population of patients with PD, so the results may differ in older patients, in whom comorbidities are more frequent. In addition, we did not include a control group, although our intention was not to describe the symptoms as discriminating PD patients from healthy subjects. Concerning the MDS-UPDRS and MDS-NMS, these scales do not take treatment into account. It is therefore difficult to determine whether symptom severity is a natural consequence of PD or a consequence of medication, especially as the majority of patients in this study were medicated. Finally, we did not consider PD biomarkers, which could provide more information about the identified subtypes.


Dividing PD patients into groups with common symptoms may help understand their underlying pathological processes. In this study, we used model-based multi-partition clustering to categorize patients according to 8 different sets of motor and/or non-motor symptoms. By using probabilistic inference, we were able to explore the associations between these subtypes and extract useful patterns. Independent confirmation of these results could allow for more precise PD treatments. In the future, it would be interesting to investigate how the progression of PD over the years affects these subtypes, and to what extent they could serve as markers of PD progression.

Data availability

All data, code and results are publicly available in our GitHub repository (


  1. Greenland, J. C., Williams-Gray, C. H. & Barker, R. A. The clinical heterogeneity of Parkinson’s disease and its therapeutic implications. Eur. J. Neurosci. 49(3), 328–338 (2019).

  2. Marras, C., Chaudhuri, K. R., Titova, N. & Mestre, T. A. Therapy of Parkinson’s disease subtypes. Neurotherapeutics 17, 1366–1377 (2020).

  3. 3.

    Van Rooden, S. M. et al. Clinical subtypes of Parkinson’s disease. Mov. Disord. 26(1), 51–58 (2011).

    PubMed  Google Scholar 

  4. 4.

    Ma, L. Y., Chan, P., Gu, Z. Q., Li, F. F. & Feng, T. Heterogeneity among patients with Parkinson’s disease: Cluster analysis and genetic association. J. Neurol. Sci. 351(12), 41–45 (2015).

    PubMed  Google Scholar 

  5. 5.

    Erro, R. et al. Clinical clusters and dopaminergic dysfunction in de-novo Parkinson disease. Parkinsonism Relat. Disord. 28, 137–140 (2016).

    PubMed  Google Scholar 

  6. 6.

    Fereshtehnejad, S. M. & Postuma, R. B. Subtypes of Parkinson’s disease: What do they tell us about disease progression?. Curr. Neurol. Neurosci. Rep. 17(4), 34 (2017).

    PubMed  Google Scholar 

  7. 7.

    Mu, J. et al. Parkinson’s disease subtypes identified from cluster analysis of motor and non-motor symptoms. Front. Aging Neurosci. 9, 301 (2017).

    PubMed  PubMed Central  Google Scholar 

  8. 8.

    Martinez-Martin, P. et al. Factor analysis and clustering of the movement disorder society non-motor rating scale. Mov. Disord. 35(6), 969–975 (2020).

    PubMed  Google Scholar 

  9. 9.

    MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (BSMSP) 281–297 (1967).

  10. 10.

    Lazarsfeld, P. F. & Henry, N. W. Latent Structure Analysis. (Hougton & Mifflin, 1968).

  11. 11.

    McLachlan, G. J., Lee, S. X. & Rathnayake, S. I. Finite mixture models. Annu. Rev. Stat. Appl. 6(1), 355–378 (2019).

    MathSciNet  Google Scholar 

  12. 12.

    Gordon, A. D. A review of hierarchical classification. J. R. Stat. Soc. (Ser. A Stat. Soc.) 150(2), 119–137 (1987).

    MathSciNet  MATH  Google Scholar 

  13. 13.

    Qian, E. & Huang, Y. Subtyping of Parkinson’s disease-where are we up to?. Aging Dis. 10(5), 1130 (2019).

    PubMed  PubMed Central  Google Scholar 

  14. 14.

    Mestre, T. A. et al. Parkinson’s disease subtypes: critical appraisal and recommendations. J. Parkinsons Dis. 11(2), 395–404 (2021).

    PubMed  PubMed Central  Google Scholar 

  15. 15.

    Thenganatt, M. A. & Jankovic, J. Parkinson disease subtypes. JAMA Neurol. 71(4), 499–504 (2014).

    PubMed  Google Scholar 

  16. 16.

    Marras, C. Subtypes of Parkinson’s disease: state of the field and future directions. Curr. Opin. Neurol. 28(4), 382–386 (2015).

    CAS  PubMed  Google Scholar 

  17. 17.

    Poon, L. K., Zhang, N. L., Liu, T. & Liu, A. H. Model-based clustering of high-dimensional data: variable selection versus facet determination. Int. J. Approx. Reason. 54(1), 196–215 (2013).

    MATH  Google Scholar 

  18. 18.

    Galimberti, G., Manisi, A. & Soffritti, G. Modelling the role of variables in model-based cluster analysis. Stat. Comput. 28(1), 145–169 (2018).

    MathSciNet  MATH  Google Scholar 

  19. 19.

    Li, X., Chen, Z., Poon, L. K. & Zhang, N. L. Learning latent superstructures in variational autoencoders for deep multidimensional clustering. In Proceedings of the 7th International Conference on Learning Representations (ICLR) 1–15 (2019).

  20. 20.

    Falck, F. et al. Multi-facet clustering variational autoencoders. Preprint at arXiv:2106.05241 (2021).

  21. 21.

    Chaudhuri, K. R. et al. The movement disorder society non-motor rating scale (MDS-NMS): initial validation study. Mov. Disord. 35(1), 116–123 (2019).

    PubMed  PubMed Central  Google Scholar 

  22. 22.

    Rodriguez-Sanchez et al. GitHub repository. (2021).

  23. 23.

    Goetz, C. G. et al. Movement disorder society-sponsored revision of the unified Parkinson’s disease rating scale (MDS-UPDRS): scale presentation and clinimetric testing results. Mov. Disord. 23(15), 2129–2170 (2008).

    Google Scholar 

  24. 24.

    Hoehn, M. M. & Yahr, M. D. Parkinsonism: onset, progression and mortality. Neurol. 17, 427–442 (1967).

    CAS  Google Scholar 

  25. 25.

    Stebbins, G. T. et al. How to identify tremor dominant and postural instability/gait difficulty groups with the movement disorder society unified Parkinson’s disease rating scale: comparison with the unified Parkinson’s disease rating scale. Mov. Disord. 28(5), 668–670 (2013).

    PubMed  Google Scholar 

  26. 26.

    Tomlinson, C. L. et al. Systematic review of levodopa dose equivalency reporting in Parkinson’s disease. Mov. Disord. 25(15), 2649–2653 (2010).

    PubMed  Google Scholar 

  27. 27.

    Lauritzen, S. L. & Wermuth, N. Graphical models for associations between variables, some of which are qualitative and some quantitative. Ann. Stat. 17, 31–57 (1989).

    MathSciNet  MATH  Google Scholar 

  28. 28.

    Attias, H. A variational Bayesian framework for graphical models. In Proceedings of the 14th Conference on Neural Information Processing Systems (NIPS) 209–215 (2000).

  29. 29.

    Friedman, N. Learning belief networks in the presence of missing values and hidden variables. In Proceedings of the 14th International Conference on Machine Learning (ICML) 125–133 (1997).

  30. 30.

    Pham, D. T. & Ruz, G. A. Unsupervised training of Bayesian networks for data clustering. Proc. R. Soc. Lond. A Math. Phys. Sci. 465(2109), 2927–2948 (2009).

    ADS  MathSciNet  MATH  Google Scholar 

  31. 31.

    Berganzo, K. et al. Motor and non-motor symptoms of Parkinson’s disease and their impact on quality of life and on different clinical subgroups. Neurologia 31(9), 585–591 (2016).

    CAS  PubMed  Google Scholar 

  32. 32.

    Weintraub, D., David, A. S., Evans, A. H., Grant, J. E. & Stacy, M. Clinical spectrum of impulse control disorders in Parkinson’s disease. Mov. Disord. 30(2), 121–127 (2015).

    CAS  PubMed  Google Scholar 

  33. 33.

    Gatto, E. M. & Aldinio, V. Impulse control disorders in Parkinson’s disease. A brief and comprehensive review. Front. Neurol. 10, 351 (2019).

    PubMed  PubMed Central  Google Scholar 

  34. 34.

    Dujardin, K. et al. Apathy in untreated early-stage Parkinson disease: relationship with other non-motor symptoms. Mov. Disord. 29(14), 1796–1801 (2014).

    PubMed  Google Scholar 

  35. 35.

    Sauerbier, A., Jenner, P., Todorova, A. & Chaudhuri, K. R. Non-motor subtypes and Parkinson’s disease. Parkinsonism Relat. Disord. 22, 41–46 (2015).

    Google Scholar 

  36. 36.

    Batla, A., Phé, V., Min, L. D. & Panicker, J. N. Nocturia in Parkinson’s disease: Why does it occur and how to manage? Mov. Disord. Clin. Pract. 3(5), 443–451 (2016).

    PubMed  PubMed Central  Google Scholar 

  37. 37.

    Leta, V. et al. Constipation is associated with development of cognitive Impairment in de novo Parkinson’s disease: a longitudinal analysis of two international cohorts. J. Parkinsons Dis. 11(3), 1209–1219 (2021).

    MathSciNet  CAS  PubMed  Google Scholar 

  38. 38.

    Espay, A. J. et al. Levodopa-induced dyskinesia in Parkinson disease: current and evolving concepts. Ann. Neurol. 84(6), 797–811 (2018).

    PubMed  Google Scholar 

  39. 39.

    Dave, S., Weintraub, D., Aarsland, D. & Ffytche, D. H. Drug and disease effects in Parkinson’s psychosis: revisiting the role of dopamine. Mov. Disord. Clin. Pract. 7(1), 32–36 (2020).

    PubMed  Google Scholar 

  40. 40.

    Armstrong, M. J. & Okun, M. S. Diagnosis and treatment of Parkinson disease: a review. JAMA Neurol. 323(6), 548–560 (2020).

    Google Scholar 

  41. 41.

    Friedman, J. H. et al. Fatigue in Parkinson’s disease: report from a multidisciplinary symposium. NPJ Parkinsons Dis. 2(1), 1–6 (2016).

    Google Scholar 

  42. 42.

    Stocchi, F. et al. Prevalence of fatigue in Parkinson disease and its clinical correlates. Neurol. 83(3), 215–220 (2014).

    Google Scholar 

  43. 43.

    Siciliano, M. et al. Fatigue in Parkinson’s disease: a systematic review and meta-analysis. Mov. Disord. 33(11), 1712–1723 (2018).

    PubMed  Google Scholar 

  44. 44.

    Rossi, M. et al. Motor features in Parkinson’s disease with normal olfactory function. Mov. Disord. 31(9), 1414–1417 (2016).

    PubMed  Google Scholar 

  45. 45.

    Sui, X. et al. Hyposmia as a predictive marker of Parkinson’s disease: a systematic review and meta-analysis. Biomed. Res. Int. 19, 1–9 (2019).

    Google Scholar 

  46. 46.

    Lee, D. H. et al. Is normosmic Parkinson disease a unique clinical phenotype?. Neurol. 85(15), 1270–1275 (2015).

    CAS  Google Scholar 

  47. 47.

    Özcan, T. et al. The association between symptoms of sexual dysfunction and age at onset in Parkinson’s disease. Clin. Auton. Res. 26(3), 205–209 (2016).

    PubMed  Google Scholar 

  48. 48.

    Palma, J. A. & Kaufmann, H. Treatment of autonomic dysfunction in Parkinson disease and other synuclinopathies. Mov. Disord. 33(3), 372–390 (2018).

    PubMed  PubMed Central  Google Scholar 

  49. 49.

    Hiorth, Y. H., Pedersen, K. F., Dalen, I., Tysnes, O. B. & Alves, G. Orthostatic hypotension in Parkinson disease: a 7-year prospective population-based study. Neurol. 93(16), 1526–1534 (2019).

    Google Scholar 

  50. 50.

    Ma, K. et al. Weight loss and malnutrition in patients with Parkinson’s disease: current knowledge and future prospects. Front. Aging Neurosci. 10, 1 (2018).

    PubMed  PubMed Central  Google Scholar 

  51. 51.

    Van Wamelen, D. et al. Exploring hyperhidrosis and related thermoregulatory symptoms as a possible clinical identifier for the dysautonomic subtype of Parkinson’s disease. J. Neurol. 266(7), 1736–1742 (2019).

    PubMed  PubMed Central  Google Scholar 

  52. 52.

    Scott, B. M. et al. Cooccurrence of apathy and impulse control disorders in Parkinson disease. Neurol. 95(20), 2769–2780 (2020).

    Google Scholar 

  53. 53.

    Ffytche, D. H. et al. The psychosis spectrum in Parkinson disease. Nat. Rev. Neurol. 13(2), 81–95 (2017).

    PubMed  PubMed Central  Google Scholar 

  54. 54.

    Lenka, A., Pagonabarraga, J., Pal, P. K., Bejr-Kasem, H. & Kulisvesky, J. Minor hallucinations in Parkinson disease: a subtle symptom with major clinical implications. Neurol. 93(6), 259–266 (2019).

    CAS  Google Scholar 

  55. 55.

    Schapira, A. H. V., Chaudhuri, K. R. & Jenner, P. Non-motor features of Parkinson disease. Nat. Rev. Neurosci. 18(7), 435–450 (2017).

    CAS  PubMed  Google Scholar 



F.R.-S., C.B., and P.L. are supported in part by the Spanish Ministry of Economy and Competitiveness through the PID2019-109247GB-I00 project, by the European Union’s Horizon 2020 Framework Programme for Research and Innovation under the Specific Grant Agreement No. 945539 (Human Brain Project SGA3), and by the BBVA Foundation (2019 Call) through the “Score-based nonstationary temporal Bayesian networks. Applications in climate and neuroscience” project.

Author information




F.R.-S., C.R.-B., C.B., P.L., and P.M.-M. conceived the study. F.R.-S., C.B., and P.L. developed the methodology. F.R.-S. conducted the data analysis. All authors reviewed the results of the analysis and discussed the findings. F.R.-S. wrote the original draft of the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Fernando Rodriguez-Sanchez.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Rodriguez-Sanchez, F., Rodriguez-Blazquez, C., Bielza, C. et al. Identifying Parkinson’s disease subtypes with motor and non-motor symptoms via model-based multi-partition clustering. Sci Rep 11, 23645 (2021).


  • Received:

  • Accepted:

  • Published:

  • DOI:

