## Introduction

In later stages of Parkinson’s disease (PD), some patients develop dementia or permanently need to use a wheelchair (Hoehn and Yahr stage 5; HY5), whereas others never become as severely affected1. The time to and pattern of progression to such PD end-stages are heterogeneous2, and various classification systems have been developed to elucidate these differences3,4,5,6,7,8,9,10,11.

One frequently used classification system, motor-phenotypes, is based on the predominant motor findings in PD4. However, nonmotor symptoms are found to hold prognostic information12,13,14,15 that is crucial for accurate PD prognostication5,16 and have been successfully implemented in several PD classifications5,6,7,8,9. Unfortunately, the data-driven approaches commonly used can make replication difficult17, and identifying scales and measures that are easily applied in a normal out-patient appointment, and that are also highly reproducible between examiners and patients from different ethnicities, can be challenging18. Recently, a study constructed an algorithm for a new clinical subtype classification system, based on cluster analysis of both motor and nonmotor findings3. Patients of the different groups were shown to vary in cerebrospinal fluid biomarkers3 and clinically and radiologically evaluated disease progression19,20,21. Improved clinical feasibility and confirmation of this system with population-based inclusion and longer time of observation have been requested22. The prognostic capabilities of these subtypes have been assessed in two other longitudinal PD cohorts, grouping patients close to diagnosis19,20,21, but, to the best of our knowledge, the long-term efficacy of this system when applied in mid-stage PD has not yet been examined. In this study, we adapted the motor-nonmotor subtyping system to facilitate clinical use and applied both this system and motor-phenotypes in patients with mid and late stages of PD from a cohort with long follow-up, comparing risks for reaching relevant PD milestones.

## Results

### Demographics

Baseline examination was performed at 7.9 ± 5.3 (mean ± SD) years after disease onset in 89 patients with HY-stage 2.8 ± 1.1. Follow-up data covering the following 8.1 ± 2.7 years was extracted from medical records, more specifically until 16.0 ± 5.4 total years of disease duration and 75.7 ± 8.0 years of age. In the most benign groups in each system (mild-motor-predominant and tremor-dominant), there were fewer patients that reached the assessed milestones, but they also had younger-onset ages, a higher proportion of women, and shorter disease duration (Table 1).

### Risks of reaching disease milestones

Risks of reaching the five disease milestones differed between the groups of both systems in an ordered fashion (Kaplan–Meier curves, Fig. 1).

In the motor-nonmotor system, log-rank tests showed significant differences in risks between the groups for all outcomes except walker usage. Compared with mild-motor-predominant patients, the diffuse-malignant patients had significantly increased risks for all outcomes, with and without adjustments (Table 2). Hazard ratios (HRs) showed 4.2 (CI 1.2–14.9) times the risk of dementia development in diffuse-malignant patients compared with mild-motor-predominant patients. Men and women showed HRs of 9.9 (CI 2.0–49.5) and 10.8 (CI 1.9–62.8), respectively, for developing HY5 during the observation period (diffuse-malignant vs mild-motor-predominant). Diffuse-malignant patients had an increased risk for walker usage and nursing home living compared to the intermediate group (adjusted HRs of 3.04 [CI 1.1–8.1] and 3.14 [CI 1.3–7.5], respectively). For women, the diffuse-malignant group’s risk of reaching HY5 was strongly increased compared to the intermediate patients’ risk during our observation period; however, this subgroup was small (Tables 1 and 2, Supplementary Fig. 1).

The motor-phenotypes showed significantly different risks on log-rank test only for mortality, and there were no significant adjusted HRs for any outcome (Table 2). However, after ad hoc removal of the tremor part of this system, the postural instability and gait disorder (PIGD) scores showed significant HRs after adjustment for the death, nursing home, and HY5 milestones.

In the sensitivity analysis, which removed the 28 individuals with imputed UPDRS scores, motor-phenotypes showed a significant adjusted HR for walker usage, and the HRs for PIGD scores remained significant only for the HY5 milestone (Supplementary Table 2). For the motor-nonmotor system, with diffuse-malignant as the reference group, the intermediate group showed HRs similar to those in the primary results, but the mild-motor-predominant group retained statistical significance of adjusted HRs only for nursing home living and dementia development (Supplementary Table 2).

### Dementia

Dementia ensued in 27 patients (32.9%) after 12.6 ± 5.9 years of disease (Table 1). Of the 16 surviving patients who had >20 years of disease duration, five (31.3%) had developed dementia; their average onset age was 56.9 ± 12.8 years. Among the patients with older onset age, the proportion of patients with dementia was higher: five of nine (55.6%) with onset age over 70 and two of three (66.7%) with onset age over 75 developed dementia. Male sex, older onset age, and longer disease duration contributed significantly to the risk of developing dementia in Cox regression models (Supplementary Table 3). Hallucinations were present at baseline for 17 of the 27 patients who developed dementia in this study. A χ² test confirmed that dementia development differed between patients with and without hallucinations at baseline (χ² = 20.03, 1 degree of freedom, p < 0.001). The risk for developing dementia was seven times higher among patients with hallucinations at baseline compared to those without (Cox regression, unadjusted HR 7.4, CI 3.0–18.1, p < 0.001; adjusted HR 7.1, CI 2.5–19.7, p < 0.001; Supplementary Fig. 2, Supplementary Table 3).

### Re-examinations

Re-examinations were performed in 34 patients, 8.24 ± 2.0 years after the baseline examinations. These patients were then 70.5 ± 8.1 years old and had 15.2 ± 5.3 years of PD duration. Onset age and Unified Parkinson’s Disease Rating Scale (UPDRS) III score were lower in patients with re-examination compared to those without: 55.3 ± 8.2 vs 62.4 ± 8.8 years, and 15.3 ± 7.3 vs 26.2 ± 11.4 points, respectively. Nonmotor Symptoms Questionnaire (NMSQ) scores showed a slight difference between re-examined and not re-examined patients, 9.0 ± 4.2 vs 10.4 ± 5.2 points, respectively. Nine women and 27 men had died before re-examination, and 53.0% of re-examined patients were men compared to 60.1% at baseline. Hallucinations at baseline were associated with individual Addenbrooke’s Cognitive Examination Revised (ACER) scores at re-examination (linear regression, unadjusted B = −40.3, CI −52.8 to −27.9, p < 0.001; adjusted B = −37.8, CI −52.0 to −23.5, p < 0.001). When both classification systems were re-applied, 29.4% and 64.7% of the re-examined patients changed classification groups in the motor-phenotype and motor-nonmotor systems, respectively (Fig. 2). Among these patients, milestones were reached only by those who had PIGD motor-phenotype at baseline or had transitioned to PIGD motor-phenotype at re-examination (data not shown).

## Discussion

We used two separate classification systems, motor-phenotype and simplified motor-nonmotor subtypes, to compare risks for major disease milestones in PD. At a mean of 7.9 years of disease duration, we found that the motor-nonmotor system was able to estimate relative risks for using a walker, developing dementia, death, nursing home living, and reaching HY5 during the subsequent mean 8.1 years. Motor-phenotypes were unable to successfully stratify risks for reaching the disease milestones during this time, but PIGD score was associated with HY5 development, nursing home living, and death. Furthermore, our simplified motor-nonmotor classification showed high clinical feasibility. Prognostic classification is of importance to enable lifestyle counseling and individualize medical treatment and paramedical care for PD patients and their families.

In the motor-nonmotor system, the diffuse-malignant group showed 2.7–10.8 times the risk of the mild-motor-predominant group for all five disease milestones studied (adjusted HRs; Table 2). Our results support those of previous longitudinal studies, generally substantiating clinical use of the motor-nonmotor system3,21,23.

We simplified two nonmotor subparts of the motor-nonmotor system, introducing NMSQ as nonmotor burden assessment and having experienced hallucinations as a proxy for cognitive assessment. This facilitated data collection and made clinical categorization feasible during one office visit. The proportion of hallucinations was, however, not found to substantially differ between the groups of the original motor-nonmotor system3. On the other hand, hallucinations and cognitive decline in PD were clearly linked in our work as well as in other studies24,25 and were also found critical for PD subtyping in another study on the same cohort as the original motor-nonmotor system10. As hallucinations are more prone to develop with higher age and longer disease duration, the relatively short durations in the original study of the motor-nonmotor system could have obscured a larger contribution of hallucinations in later PD stages. This might, however, speak against using the simplified approach of the motor-nonmotor system soon after the onset of motor symptoms.

The associations found in the present and earlier work on the motor-nonmotor system used different determinations of the nonmotor subparts. The notion that different methods can achieve successful subtyping could support using different variants of the motor-nonmotor system, with more complex, more precise variants used in research and more easily applied simplifications used in clinical practice.

At the end of this study, the patients had a mean of 16.0 ± 5.4 years of disease duration and 27 patients (32.9%) had then developed dementia. Previous studies on patients with 20 years of PD duration have reported high diversity regarding cognitive outcome; 83% of PD patients developed dementia in one longitudinal study24 while the patients in a cross-sectional study showed substantially less cognitive impairment26. Two other longitudinal studies evaluated PD development at 10 years of disease and reported 46 and 49% of patients with dementia, respectively27,28. Compared with these studies we found a low proportion of patients with dementia, both before and after 20 years of duration, despite using a very broad definition of dementia. We did not aim to investigate reasons for differing dementia incidence, but the relatively low onset age in our study might have influenced this finding as well as possible disinclination to examine or report symptoms of cognitive decline and the fact that we used retrospective chart reviews to determine cognitive decline and dementia.

For the motor-phenotype system we found no significant risk-stratification in adjusted analyses, and conclude that age and duration had a greater impact than the motor-phenotypes at the disease stages we studied, in concurrence with previous findings (Supplementary Table 3)29,30. Lack of usefulness of the motor-phenotypes could be due to a confounding effect of disease stage29,30,31. A large proportion, 66.3%, of the present cohort had reached PIGD motor-phenotype at baseline, which likely diminishes the usefulness of this system when it is applied in the middle stage of PD, as in the present study. This is not surprising, since similar ceiling effects have been observed in this system already at 4.5 years of disease duration20. Limited usability of motor-phenotypes in mid- and late-stage PD was also indicated for the re-examined patients of the present study where PD milestones were only reached for individuals with PIGD motor-phenotype at inclusion or at re-examination (data not shown), as shown in an earlier study29.

We ad hoc used only the PIGD aspect of motor-phenotype, yielding significant HRs of 1.11–1.34 for the death, nursing home, and HY5 milestones. As HRs express the change in risk per one-step increase of a covariate, and since PIGD score ranged 0–17 in this cohort, differences in PIGD score corresponded to a large spread of risks for reaching these milestones. An individual with a PIGD score one SD (2.95) higher than others in this cohort, but similar onset age, sex, and duration, had 56.1, 100.3, and 32.5% higher risk for living at a nursing home, developing HY5, and dying during the observation period, respectively. These results support those of another study in which PIGD score but not tremor was associated with negative patient outcome32. However, the significance level of results for the death and nursing home milestones changed after sensitivity analysis, which precludes robust interpretation for milestones other than HY5. A strong association between PIGD score and HY5 development could be considered expected because both reflect the severity of axial motor symptoms and balance.
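The step from a per-point HR to a per-SD risk increase follows from the log-linear form of the Cox model: the log-hazard is linear in the covariate, so a difference of d points corresponds to HR raised to the power d. The Python sketch below illustrates that conversion. The function names and the per-point HR of 1.10 are illustrative assumptions and are not values taken from Table 2; only the SD of 2.95 comes from the text.

```python
import math


def hr_for_difference(hr_per_point: float, difference: float) -> float:
    """Hazard ratio implied by a covariate difference of `difference` points.

    In a Cox model the log-hazard is linear in the covariate, so
    HR(d) = exp(d * ln(HR per point)) = HR_per_point ** d.
    """
    if hr_per_point <= 0:
        raise ValueError("hazard ratio must be positive")
    return math.exp(difference * math.log(hr_per_point))


def percent_higher_risk(hr: float) -> float:
    """Express a hazard ratio as a percentage increase in hazard."""
    return (hr - 1.0) * 100.0


# Illustrative input only: a hypothetical per-point HR of 1.10 combined with
# the PIGD-score SD of 2.95 reported for this cohort.
hr_sd = hr_for_difference(1.10, 2.95)
pct = percent_higher_risk(hr_sd)
```

With these illustrative inputs the hazard is roughly a third higher per SD, which shows how a modest per-point HR compounds over the 0–17 range of the PIGD score.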

For the motor-nonmotor subtype system, a recent cross-sectional study showed the effects of disease stage and duration on motor-nonmotor subtypes as examined at 5.9 ± 5.4 years of disease duration23. In the present work, similar effects were indicated by Cox regression results (Supplementary Table 3), group redistributions at re-examination (Fig. 2), and diverging age of onset and duration in the different motor-nonmotor groups at baseline (Table 1). Nevertheless, these effects did not abolish the simplified motor-nonmotor system’s prognostic capabilities during the observation period, which was longer than other longitudinal studies examining the motor-nonmotor system3,19,20,21, and the present motor-nonmotor groups were also more equally distributed than the motor-phenotypes, both at baseline and after reclassification (Table 1). We conclude that onset age and disease duration substantially affect this system’s risk-stratification capabilities but subvert it to a lesser extent than motor-phenotypes.

Since scales and composite measures of the motor-nonmotor system are evaluated relative to the cohort studied, cutoffs must be determined to enable generalizability. We propose that establishing different cutoffs for ranges of onset ages and/or PD durations could compensate for the stage effects observed in the present work and other studies21,23.

It has been postulated that the progression rate of PD is more heterogeneous in early–middle than in late disease stages because all patients reach the same neuropathological end-stage21,33. In contrast to this concept, we found that the motor-nonmotor system has prognostic value in mid–late PD and may hence convey relevant information to patients, families, and caregivers. PD patients often become confronted with worsening motor control and increasing nonmotor symptoms at this time and will likely benefit from individualized information and care.

Limitations of this study include that four out of five primary outcomes investigated were retrieved from medical records and could be affected by inconsistencies due to different reporters. There might have been selection biases, where participating patients were healthier than average, which might have affected the proportion of patients with dementia. Differences in medication34, comorbidities35, and education level36,37 can affect the outcome and classifications of PD patients. These factors were not adjusted for, which could confound the results and conclusions of this study.

Strengths of this study include the relatively long follow-up time, which also solidified the clinical diagnoses, access to the major parts of the patients’ medical records, only three cases lost to follow-up, each assessment performed by the same clinician, and that half of the patients studied were not recruited from a tertiary center but from a geographically defined population.

In summary, we confirmed that a simplified clinical motor-nonmotor subtyping system identifies PD patients at different risks for future disease milestones better than motor-phenotypes. The patients were classified in mid-stage PD with variable and relatively long durations. Both systems showed instability later in the disease course, but our results imply a larger timeframe for the usability of the motor-nonmotor system. Our adaptation of two parameters used in the motor-nonmotor algorithm facilitated classification in the clinical setting. We also confirmed, ad hoc, that when using the motor-phenotype system in mid–late disease the tremor part should be omitted.

## Methods

### Patient cohort

Since 2006, patients with PD have been continuously included in a research cohort (PARkinson Lund study; PARLU) consisting of patients with PD living in three municipalities in southern Sweden (population subgroup, 50.6% of the cohort) and patients with familial PD without a known genetic cause on testing who were known to the Department of Neurology at Skåne University Hospital, Lund (hereditary subgroup). For the population subgroup, every resident of three adjacent municipalities (Olofström, Karlshamn, and Sölvesborg) who had a diagnosis of PD or parkinsonism in registries from all public health care providers in the region between 2006 and 2010 was contacted, and 76% of those contacted were included in the cohort.

### Patient selection

Patients within PARLU with PD or PD-dementia and complete baseline visits were selected for this study. Aiming for long-term observation, patients with <2 years of follow-up data were excluded. Patients whose diagnosis had been changed to any disorder other than PD or PD-dementia were excluded, as were two individuals with monogenic disease (Fig. 3).

### Examinations

Standardized baseline visits were performed from 2007 to 2013. Patients were then interviewed and clinically examined by the same physician (AP) including UPDRS38, modified HY-stage39, NMSQ40, and clinical assessment for other neurological symptoms. The presence of bradykinesia and one of rigidity, tremor, or postural imbalance was confirmed.

In 2017–2018, all surviving patients were invited to a follow-up research visit, including an interview and neurological re-examination by one physician (EYR). The same examination protocol was used with the addition of ACER. Patients used their regular medication on both examinations. We did not measure the doses of dopaminergic therapy because we aimed at evaluating the real-life situation of the patients and because several outcomes of this study were independent of treatment.

### Follow-up data collection from medical records

Time to relevant social and clinical milestones2 was extracted from medical records of all individuals: regularly using walker, living in a nursing home, developing HY5, or dementia. This was performed 2018–2019 by one physician (EYR) by searching all medical records from medicine/neurology departments and memory clinics in the southern health care region (regions of Skåne, Halland, and Blekinge) in Sweden, ranging back up to six decades. Medical records from primary health care were acquired when data was missing or inconsistent. All available paper records were scrutinized manually in full length. Electronic records were digitally searched for phrases, words, or parts of words associated with the milestones, including several grammatical forms and common spelling mistakes. Pre-defined criteria for fulfilling disease milestones were used. Patients were considered to have developed dementia when obtaining a diagnosis of dementia not otherwise specified or PD-dementia, being prescribed acetylcholinesterase-inhibitors, or when being repeatedly and clearly described as having dementia in medical records. Disease onset was defined as the first notion of rest tremor or subjective PD motor symptoms. Time at diagnosis was used when no description of onset was available (n = 6). To decrease the effect of peri-mortal comorbidities, milestones were ignored if they were only reached within two months before death. Periodicity of follow-up differed between patients and clinics, and to mitigate effects related to the exact timing of the patients’ contact with the medical services (differing interval censoring), all dates when reaching milestones were registered as a calendar year. Dates of birth, death, and end of observation were not standardized but registered as the actual date. The date of death was retrieved in 2019 from the Swedish population register kept by the Swedish Tax Agency, Skatteverket.
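Two of the registration rules described above, ignoring milestones reached only within two months before death and recording milestone dates as calendar years, can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the procedure actually used: the function name is hypothetical, and "two months" is approximated as 61 days because the text does not specify how the window was counted.

```python
from datetime import date
from typing import Optional

# Assumption: "two months" is approximated as 61 days.
PERIMORTEM_WINDOW_DAYS = 61


def registered_milestone_year(
    milestone: Optional[date], death: Optional[date]
) -> Optional[int]:
    """Return the calendar year under which a milestone is registered.

    Returns None when the milestone was never reached, or when it was
    reached only within the assumed two-month window before death.
    """
    if milestone is None:
        return None
    if death is not None:
        days_before_death = (death - milestone).days
        if 0 <= days_before_death <= PERIMORTEM_WINDOW_DAYS:
            return None  # ignored as a likely effect of perimortem comorbidity
    return milestone.year  # only the calendar year is kept


# Hypothetical dates for illustration.
ignored = registered_milestone_year(date(2015, 3, 10), date(2015, 4, 20))
kept = registered_milestone_year(date(2014, 1, 5), date(2015, 4, 20))
alive = registered_milestone_year(date(2016, 7, 1), None)
```

Keeping only the year trades precision for robustness against differences in how often each patient was seen, which is the motivation given in the text.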

### Patient consents and ethical approval

Written informed consent was obtained from all included patients. If a patient was unable to decide, a close relative was instructed to determine consent and to act in accordance with the patient’s presumed previous intention. All parts of this study were approved by the Regional Ethics Review Board in Lund.

### Application of classification systems

Each patient was classified according to the two classification systems at the baseline examination. The three motor-phenotypes (tremor-dominant, undetermined, and PIGD) were determined by applying cutoffs to the ratio of the mean value of selected tremor items to the mean value of selected postural stability items in UPDRS (PIGD ≤ 1.0 < undetermined < 1.5 ≤ tremor-dominant), as previously described4. The three motor-nonmotor subtype groups (mild-motor-predominant, intermediate, and diffuse-malignant) were determined by combining a composite motor score and three nonmotor parameters: a nonmotor rating scale, cognitive assessments, and assessment of REM-sleep behavioral disorder (RBD), similar to the original work3. We adapted the nonmotor parameters from the original publication to similar parameters collected in our study (Fig. 4). We used NMSQ instead of Scales for Outcomes in Parkinson’s disease-Autonomic, and information provided by the patient and/or caregiver on RBD symptomatology, such as the enactment of dreams, talking, laughing, or screaming while sleeping, replaced the RBD screening questionnaire. As a cognitive marker, we used the occurrence of hallucinations instead of neuropsychological examinations. We considered patients to have had hallucinations if this was indicated either in medical records before examinations or in UPDRS item 2 or NMSQ item 14 at examinations. Thus, RBD and hallucinations had binary (“yes” or “no”) states. NMSQ and composite motor score were continuous rating scales, and cutoffs at the 75th percentile of the cohort’s values were used to determine the positive or negative state, as in the original work. We also simplified the composite motor score, derived from averaging individual z values of UPDRSII, UPDRSIII, and PIGD subparts of UPDRS as in the original work3. We inserted the means and SDs of the present cohort and then mathematically reduced the averaged z values to:

$$\mathrm{Composite\ motor} = \mathrm{UPDRSIII} + \mathrm{UPDRSII} \times 2.3 + \mathrm{PIGD} \times 21 + 55$$

Of note, the cutoffs for the motor-phenotypes are absolute, whereas the classification in the motor-nonmotor subtypes is by design relative to the distribution of values in the entire cohort studied3,4. In our cohort, 75th percentile cutoffs for motor-nonmotor categorization at baseline were 130.4 for composite motor score and 13.5 for NMSQ score. At re-examination, the simplified formula for composite motor score derived at baseline was utilized, but all other aspects of the systems, including cutoffs of the motor-nonmotor subparts (14.8 for NMSQ and 248.7 for composite motor score), were adapted to re-examination data.
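The building blocks described in this section (the absolute ratio cutoffs for motor-phenotypes, the simplified composite motor score, and a cohort-relative 75th percentile cutoff) can be sketched in Python as below. This is an illustration under stated assumptions: the function names are hypothetical, the handling of a zero PIGD mean is an assumption not stated in the text, values strictly above the cutoff are assumed to count as positive, and the percentile interpolation method may differ from the one used in the original analysis. The rule that combines the four parameters into subtypes (Fig. 4) is deliberately not reproduced.

```python
from statistics import quantiles
from typing import Sequence


def motor_phenotype(mean_tremor: float, mean_pigd: float) -> str:
    """Classify by the tremor/PIGD ratio using the cutoffs quoted in the text."""
    if mean_pigd == 0:
        # Assumption: the ratio is undefined here; any tremor is treated as
        # tremor-dominant, and no tremor at all as undetermined.
        return "tremor-dominant" if mean_tremor > 0 else "undetermined"
    ratio = mean_tremor / mean_pigd
    if ratio <= 1.0:
        return "PIGD"
    if ratio >= 1.5:
        return "tremor-dominant"
    return "undetermined"


def composite_motor(updrs3: float, updrs2: float, pigd: float) -> float:
    """Simplified composite motor score with the coefficients given above."""
    return updrs3 + updrs2 * 2.3 + pigd * 21 + 55


def percentile_75(values: Sequence[float]) -> float:
    """Cohort-relative cutoff: third quartile with linear interpolation."""
    return quantiles(values, n=4, method="inclusive")[2]


def is_positive(value: float, cutoff: float) -> bool:
    """Assumption: only values strictly above the cutoff count as positive."""
    return value > cutoff


# Hypothetical scores for illustration.
score = composite_motor(updrs3=20, updrs2=10, pigd=1.0)
cutoff = percentile_75([1, 2, 3, 4, 5])
```

Because the percentile cutoff is computed from the cohort itself, the same raw score can fall on different sides of the cutoff in different cohorts, which is why the baseline and re-examination cutoffs reported above differ.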

### Statistics

Of 141 patients, eight individuals with > 20% of total data points missing in UPDRS parts II and III were excluded (Fig. 3). For 28 patients who were missing ≤ 20% of data points (mean ± SD 6.6 ± 5.3%), missing values were imputed with the mean of each patient’s results for the corresponding UPDRS part. We performed a sensitivity analysis without these 28 cases (Supplementary Table 2). A mean of 5.4 individuals (range 0–14) had already reached a given milestone before baseline and could not contribute to the Cox regression analyses (Table 1). Linear regressions were performed after the normal distribution of residuals and equality of variances were confirmed. All regression analyses were adjusted for onset age and sex since these factors are known to affect PD severity41,42. Adjustments also included disease duration since durations at baseline differed between patients. In each classification system, fulfillment of the five milestones was assessed with Kaplan–Meier survival curves, log-rank tests, and Cox regressions. In both classification systems, the group with the worst prognosis was selected as the reference category in Cox regressions. The proportionality of hazards assumption was tested with the cox.zph function of the survival package in R v4.0.2. For all other statistical analyses, SPSS v25.0 was used. In the case of non-proportional hazards, analyses were instead performed in subgroups based on the least contributing covariate (sex), which made hazards proportional (Supplementary Table 1). P values < 0.05 were regarded as significant and 95% confidence intervals were consistently used.
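The missing-data rule described above (exclusion above 20% missing, otherwise imputation with the patient's own mean for the corresponding UPDRS part) can be sketched in Python as follows. This is a minimal illustration and not the analysis code used in the study, which was run in SPSS and R. The function names are hypothetical, missing items are represented as `None`, and the missing fraction is assumed to be computed over parts II and III combined.

```python
from typing import List, Optional, Sequence, Tuple

MAX_MISSING_FRACTION = 0.20  # exclusion threshold stated in the text

Items = Sequence[Optional[float]]


def impute_part(items: Items) -> List[float]:
    """Replace missing items with the mean of the patient's observed items
    in the same UPDRS part."""
    observed = [x for x in items if x is not None]
    if not observed:
        raise ValueError("cannot impute a part with no observed items")
    part_mean = sum(observed) / len(observed)
    return [part_mean if x is None else x for x in items]


def prepare_patient(
    part2: Items, part3: Items
) -> Optional[Tuple[List[float], List[float]]]:
    """Return imputed (part II, part III) items, or None if the patient is
    excluded for having more than 20% of data points missing.

    Assumption: the fraction is taken over both parts combined.
    """
    total = len(part2) + len(part3)
    missing = sum(x is None for x in part2) + sum(x is None for x in part3)
    if total == 0 or missing / total > MAX_MISSING_FRACTION:
        return None
    return impute_part(part2), impute_part(part3)


# Hypothetical item scores for illustration.
included = prepare_patient([1, None, 3], [2, 2, 2, 2, 2, 2, 2])
excluded = prepare_patient([None, None, 3], [2, 2])
```

Imputing within a part rather than across the whole scale keeps an imputed value on the same footing as the items it replaces. The sensitivity analysis without the imputed cases addresses the remaining uncertainty.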

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.