Clinical classification systems and long-term outcome in mid- and late-stage Parkinson’s disease

Parkinson’s disease shows a heterogeneous course and different clinical subtyping systems have been described. To compare the capabilities of two clinical classification systems, motor-phenotypes, and a simplified clinical motor-nonmotor subtyping system, a cohort was included at mean 7.9 ± 5.3 years of disease duration, classified using both clinical systems, and reexamined and reclassified at the end of an observation period. Time-points were retrospectively extracted for five major disease milestones: death, dementia, Hoehn and Yahr stage 5, nursing home living, and walking aid use. Eighty-nine patients were observed for 8.1 ± 2.7 years after inclusion. Dementia developed in 32.9% of the patients and 36.0–67.4% reached the other milestones. Motor-phenotypes were unable to stratify risks during this period, but the worst compared with the more favorable groups in the motor-nonmotor system conveyed hazard ratios between 2.6 and 63.6 for all milestones. A clear separation of risks for dying, living at the nursing home, and reaching motor end-stage was also shown when using only postural instability and gait disorder symptoms, without weighing them against the severity of the tremor. At reexamination, 29.4% and 64.7% of patients had changed classification groups in the motor-phenotype and motor-nonmotor systems, respectively. The motor-nonmotor system thus stratified risks of reaching crucial outcomes in mid–late Parkinson’s disease far better than the well-studied motor-phenotypes. Removing the tremor aspect of motor-phenotypes clearly improved this system, however. Classifications in both systems became unstable over time. The simplification of the motor-nonmotor system was easily applicable and showed potential as a prognostic marker during a large part of Parkinson’s disease.

One frequently used classification system, motor-phenotypes, is based on the predominant motor findings in PD 4 . However, nonmotor symptoms have been found to hold prognostic information [12][13][14][15] that is crucial for accurate PD prognostication 5,16 and have been successfully implemented in several PD classifications [5][6][7][8][9] . Unfortunately, the data-driven approaches commonly used can make replication difficult 17 , and identifying scales and measures that are easily applied in a normal out-patient appointment, and that also hold high reproducibility between examiners and patients from different ethnicities, can be challenging 18 . Recently, a study constructed an algorithm for a new clinical subtype classification system, based on cluster analysis of both motor and nonmotor findings 3 . Patients of the different groups were shown to vary in cerebrospinal fluid biomarkers 3 and in clinically and radiologically evaluated disease progression [19][20][21] . Improved clinical feasibility and confirmation of this system with population-based inclusion and longer time of observation have been requested 22 . The prognostic capabilities of these subtypes have been assessed in two other longitudinal PD cohorts, grouping patients close to diagnosis [19][20][21] , but, to the best of our knowledge, the long-term efficacy of this system when applied in mid-stage PD has not yet been examined. In this study, we adapted the motor-nonmotor subtyping system to facilitate clinical use and applied both this system and motor-phenotypes in patients with mid and late stages of PD from a cohort with long follow-up, comparing risks for reaching relevant PD milestones.

Demographics
Baseline examination was performed at 7.9 ± 5.3 (mean ± SD) years after disease onset in 89 patients with HY-stage 2.8 ± 1.1. Follow-up data covering the following 8.1 ± 2.7 years were extracted from medical records, more specifically until 16.0 ± 5.4 total years of disease duration and 75.7 ± 8.0 years of age. In the most benign groups in each system (mild-motor-predominant and tremor-dominant), fewer patients reached the assessed milestones, but these groups also had younger onset ages, a higher proportion of women, and shorter disease duration (Table 1).

Risks of reaching disease milestones
Risks of reaching the five disease milestones differed between the groups of both systems in an ordered fashion (Kaplan-Meier curves, Fig. 1).
In the motor-nonmotor system, log-rank tests showed significant differences in risks between the groups for all outcomes except walker usage. Compared with mild-motor-predominant patients, the diffuse-malignant patients had significantly increased risks for all outcomes, with and without adjustments (Supplementary Fig. 1).
The motor-phenotypes showed significantly different risks on log-rank test only for mortality, and there were no significant adjusted HRs for any outcome (Table 2). After ad hoc removal of the tremor part of this system, however, the postural instability and gait disorder (PIGD) scores showed significant HRs after adjustment for the death, nursing home, and HY5 milestones.
In a sensitivity analysis that removed the 28 individuals who had imputed UPDRS scores, motor-phenotypes showed a significant adjusted HR for walker usage, and the HRs of PIGD scores remained significant only for the HY5 milestone (Supplementary Table 2). For the motor-nonmotor system, with diffuse-malignant as the reference group, the intermediate group showed HRs similar to those in the primary results, but the mild-motor-predominant group retained statistical significance of adjusted HRs only for nursing home living and dementia development (Supplementary Table 2).

Dementia
Dementia ensued in 27 patients (32.9%) after 12.6 ± 5.9 years of disease (Table 1). Of the 16 surviving patients who had >20 years of disease duration, five (31.3%) had developed dementia; their average onset age was 56.9 ± 12.8 years. Among the patients with older onset age, the proportion of patients with dementia was higher: five of nine (55.6%) with onset age over 70 and two of three (66.7%) with onset age over 75 developed dementia. Male sex, older onset age, and longer disease duration contributed significantly to the risk of developing dementia in Cox regression models (Supplementary Table 3).

Re-examinations
Re-examinations were performed in 34 patients, 8.24 ± 2.0 years after the baseline examinations. These patients were then 70.5 ± When both classification systems were re-applied, 29.4% and 64.7% of the re-examined patients changed classification groups in the motor-phenotype and motor-nonmotor systems, respectively (Fig. 2). For these patients, any milestone was only reached if they had PIGD motor-phenotype at baseline or if they had transitioned to PIGD motor-phenotype at re-examination (data not shown).

DISCUSSION
We used two separate classification systems, motor-phenotype and simplified motor-nonmotor subtypes, to compare risks for major disease milestones in PD. At a mean of 7.9 years of disease duration, we found that the motor-nonmotor system was able to estimate relative risks for using walker, developing dementia, death, nursing home living, and reaching HY5, during the subsequent mean 8.1 years. Motor-phenotypes were unable to successfully stratify risks for reaching the disease milestones during this time, but PIGD score was associated with HY5 development, nursing home living, and death. Furthermore, our simplified motor-nonmotor classification showed high clinical feasibility. Prognostic classification is of importance to enable lifestyle counseling and individualize medical treatment and paramedical care for PD patients and their families.
In the motor-nonmotor system, the diffuse-malignant group showed 2.7-10.8 times the risk of the mild-motor-predominant group for all five disease milestones studied (adjusted HRs; Table 2). Our results support those of previous longitudinal studies, generally substantiating clinical use of the motor-nonmotor system 3,21,23 .
We simplified two nonmotor subparts of the motor-nonmotor system, introducing NMSQ as the nonmotor burden assessment and a history of hallucinations as a proxy for cognitive assessment. This facilitated data collection and made clinical categorization feasible during one office visit. The proportion of patients with hallucinations was, however, not found to differ substantially between the groups of the original motor-nonmotor system 3 . On the other hand, hallucinations and cognitive decline in PD were clearly linked in our work as well as in other studies 24,25 and were also found critical for PD subtyping in another study on the same cohort as the original motor-nonmotor system 10 . As hallucinations are more prone to develop with higher age and longer disease duration, the relatively short durations in the original study of the motor-nonmotor system could have obscured a larger contribution of hallucinations in later PD stages. This might, however, speak against using the simplified approach of the motor-nonmotor system soon after the onset of motor symptoms.
The associations found in the present and earlier work on the motor-nonmotor system used different determinations of the nonmotor subparts. The notion that different methods can achieve successful subtyping, could support using different variants of the motor-nonmotor system, with more complex, more precise variants used in research and more easily applied simplifications used in clinical practice.
At the end of this study, the patients had a mean of 16.0 ± 5.4 years of disease duration and 27 patients (32.9%) had then developed dementia. Previous studies on patients with 20 years of PD duration have reported high diversity regarding cognitive outcome; 83% of PD patients developed dementia in one longitudinal study 24 while the patients in a cross-sectional study showed substantially less cognitive impairment 26 . Two other longitudinal studies evaluated PD development at 10 years of disease and reported 46 and 49% of patients with dementia, respectively 27,28 . Compared with these studies we found a low proportion of patients with dementia, both before and after 20 years of duration, despite using a very broad definition of dementia. We did not aim to investigate reasons for differing dementia incidence, but the relatively low onset age in our study might have influenced this finding as well as possible disinclination to examine or report symptoms of cognitive decline and the fact that we used retrospective chart reviews to determine cognitive decline and dementia.
For the motor-phenotype system we found no significant risk stratification in adjusted analyses, and conclude that age and duration had a greater impact than the motor-phenotypes at the disease stages we studied, in concurrence with previous findings (Supplementary Table 3) 29,30 . Lack of usefulness of the motor-phenotypes could be due to a confounding effect of disease stage [29][30][31] . A large proportion, 66.3%, of the present cohort had reached PIGD motor-phenotype at baseline, which likely diminishes the usefulness of this system when it is applied in the middle stage of PD, as in the present study. This is not surprising, since similar ceiling effects have been observed in this system already at 4.5 years of disease duration 20 . Limited usability of motor-phenotypes in mid- and late-stage PD was also indicated for the re-examined patients of the present study, where PD milestones were only reached by individuals with PIGD motor-phenotype at inclusion or at re-examination (data not shown), as shown in an earlier study 29 .
We ad hoc used only the PIGD aspect of motor-phenotype, yielding significant HRs of 1.11-1.34 for the death, nursing home, and HY5 milestones. As HRs express the change in risk per 1-step increase of a covariate, and since PIGD score ranged 0-17 in this cohort, the PIGD score was associated with a wide spread of risks for reaching these milestones. An individual with one SD (2.95) higher PIGD score than others in this cohort, but similar onset age, sex, and duration, had 56.1, 100.3, and 32.5% higher risk for living at a nursing home, developing HY5, and dying during the observation period, respectively. These results support those of another study in which PIGD score but not tremor was associated 32 . The significance level of the results for the death and nursing home milestones changed after sensitivity analysis though, impairing robust interpretation for milestones other than HY5. A strong association between PIGD score and HY5 development could be considered expected because both reflect the severity of axial motor symptoms and balance. For the motor-nonmotor subtype system, a recent cross-sectional study showed the effects of disease stage and duration on motor-nonmotor subtypes as examined at 5.9 ± 5.4 years of disease duration 23 . In the present work, similar effects were indicated by Cox regression results (Supplementary Table 3), group redistributions at re-examination (Fig. 2), and diverging age of onset and duration in the different motor-nonmotor groups at baseline (Table 1). Nevertheless, these effects did not abolish the simplified motor-nonmotor system's prognostic capabilities during the observation period, which was longer than in other longitudinal studies examining the motor-nonmotor system 3,[19][20][21] , and the present motor-nonmotor groups were also more equally distributed than the motor-phenotypes, both at baseline and after reclassification (Table 1).
We conclude that onset age and disease duration substantially affect this system's risk-stratification capabilities but subvert it to a lesser extent than motor-phenotypes.
Since scales and composite measures of the motor-nonmotor system are evaluated relative to the cohort studied, cutoffs must be determined to enable generalizability. We propose that establishing different cutoffs for ranges of onset ages and/or PD durations could compensate for the stage effects observed in the present work and other studies 21,23 .
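The per-SD risk figures quoted above for the PIGD score rest on the multiplicative scale of Cox hazard ratios: a per-step HR raised to the power of the number of steps gives the HR for that larger difference. A minimal sketch of this arithmetic follows; the function name and the HR value in the usage note are hypothetical illustrations, not values taken from Table 2.

```python
def percent_risk_increase(hr_per_step: float, steps: float) -> float:
    """Percent change in hazard for a covariate difference of `steps` units,
    given a Cox hazard ratio expressed per 1-unit increase."""
    if hr_per_step <= 0:
        raise ValueError("A hazard ratio must be positive")
    # In a Cox model the log-hazard is linear in the covariate,
    # so the HR for a k-unit difference is HR ** k.
    return (hr_per_step ** steps - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical per-step HR, scaled to one cohort SD of the PIGD score (2.95).
    print(round(percent_risk_increase(1.10, 2.95), 1))
```

Because the scaling is exponential, even per-step HRs close to 1 correspond to sizeable differences across a score that spans 0-17.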
It has been postulated that the progression rate of PD is more heterogeneous in early-middle than in late disease stages because all patients reach the same neuropathological end-stage 21,33 .
In contrast to this concept, we found that the motor-nonmotor system has prognostic value in mid-late PD and may hence convey relevant information to patients, families, and caregivers. PD patients often become confronted with worsening motor control and increasing nonmotor symptoms at this time and will likely benefit from individualized information and care.
Limitations of this study include that four out of five primary outcomes investigated were retrieved from medical records and could be affected by inconsistencies due to different reporters. There might have been selection biases, where participating patients were healthier than average, which might have affected the proportion of patients with dementia. Differences in medication 34 , comorbidities 35 , and education level 36,37 can affect the outcome and classifications of PD patients. These factors were not adjusted for which could confound results and conclusions made in this study.
Strengths of this study include the relatively long follow-up time, which also solidified the clinical diagnoses, access to the major parts of the patients' medical records, only three cases lost to follow-up, each assessment performed by the same clinician, and that half of the patients studied were not recruited from a tertiary center but from a geographically defined population.
In summary, we confirmed that a simplified clinical motor-nonmotor subtyping system identifies PD patients at different risks for future disease milestones better than motor-phenotypes. The patients were classified in mid-stage PD with variable and relatively long durations. Both systems showed instability later in the disease course, but our results imply a larger timeframe for the usability of the motor-nonmotor system. Our adaptation of two parameters used in the motor-nonmotor algorithm facilitated classification in the clinical setting. We also confirmed, ad hoc, that when using the motor-phenotype system in mid-late disease the tremor part should be omitted.

Patient cohort
Since 2006, patients with PD have been continuously included in a research cohort (PARkinson Lund study; PARLU) consisting of patients with PD living in three municipalities in southern Sweden (population subgroup, 50.6% of the cohort) and patients with familial PD without known genetic cause on testing who were known to the Department of Neurology at Skåne University Hospital, Lund (hereditary subgroup). For the population subgroup, every resident in three adjacent municipalities (Olofström, Karlshamn, and Sölvesborg) who had a diagnosis of PD or parkinsonism in registries from all public health care providers in the region between 2006 and 2010 was contacted, and 76% of those contacted were included in the cohort.

Patient selection
Patients within PARLU with PD or PD-dementia and complete baseline visits were selected for this study. Aiming for long-term observation, patients with <2 years of follow-up data were excluded. Patients whose diagnosis had been changed to any other disorder than PD or PD-dementia were excluded, as were two individuals with monogenetic disease (Fig. 3).

Examinations
Standardized baseline visits were performed from 2007 to 2013. Patients were then interviewed and clinically examined by the same physician (AP) including UPDRS 38 , modified HY-stage 39 , NMSQ 40 , and clinical assessment for other neurological symptoms. The presence of bradykinesia and one of rigidity, tremor, or postural imbalance was confirmed.
In 2017-2018, all surviving patients were invited to a follow-up research visit, including an interview and neurological re-examination by one physician (EYR). The same examination protocol was used with the addition of ACER. Patients used their regular medication on both examinations. We did not measure the doses of dopaminergic therapy because we aimed at evaluating the real-life situation of the patients and because several outcomes of this study were independent of treatment.

Follow-up data collection from medical records
Time to relevant social and clinical milestones 2 was extracted from medical records of all individuals: regularly using walker, living in a nursing home, developing HY5, or dementia. This was performed 2018-2019 by one physician (EYR) by searching all medical records from medicine/neurology departments and memory clinics in the southern health care region (regions of Skåne, Halland, and Blekinge) in Sweden, ranging back up to six decades. Medical records from primary health care were acquired when data was missing or inconsistent. All available paper records were scrutinized manually in full length. Electronic records were digitally searched for phrases, words, or parts of words associated with the milestones, including several grammatical forms and common spelling mistakes. Pre-defined criteria for fulfilling disease milestones were used. Patients were considered to have developed dementia when obtaining a diagnosis of dementia not otherwise specified or PD-dementia, being prescribed acetylcholinesterase-inhibitors, or when being repeatedly and clearly described as having dementia in medical records. Disease onset was defined as the first notion of rest tremor or subjective PD motor symptoms. Time at diagnosis was used when no description of onset was available (n = 6). To decrease the effect of peri-mortal comorbidities, milestones were ignored if they were only reached within two months before death. Periodicity of follow-up differed between patients and clinics, and to mitigate effects related to the exact timing of the patients' contact with the medical services (differing interval censoring), all dates when reaching milestones were registered as a calendar year. Dates of birth, death, and end of observation were not standardized but registered as the actual date. The date of death was retrieved in 2019 from the Swedish population register kept by the Swedish Tax Agency, Skatteverket.

Patient consents and ethical approval
Written informed consent was obtained from all included patients. If the patient was unable to decide, a close relative was instructed to determine patient consent and to act within the presumed previous intention of the patient. All parts of this study were approved by the Regional Ethics Review Board in Lund.

Application of classification systems
Each patient was classified according to the two classification systems at the baseline examination. The three motor-phenotypes (tremor-dominant, undetermined, and PIGD) were determined by applying cutoffs to the ratio between the mean values of selected tremor and postural stability items in UPDRS (PIGD ≤ 1.0 < undetermined < 1.5 ≤ tremor-dominant) as previously described 4 . The three motor-nonmotor subtype groups (mild-motor-predominant, intermediate, and diffuse-malignant) were determined by combining a composite motor score and three nonmotor parameters: a nonmotor rating scale, cognitive assessments, and assessment of REM sleep behavioral disorder (RBD), similar to the original work 3 . We adapted the nonmotor parameters from the original publication to similar parameters collected in our study (Fig. 4). We used NMSQ instead of Scales for Outcomes in Parkinson's disease-Autonomic, and information provided by the patient and/or caregiver on RBD symptomatology, such as the enactment of dreams, talking, laughing, or screaming while sleeping, replaced the RBD screening questionnaire. As a cognitive marker, we used the occurrence of hallucinations instead of neuropsychological examinations. We considered patients to have had hallucinations if this was indicated in either medical records before examinations or in UPDRS item 2 or NMSQ item 14 at examinations. Thus, RBD and hallucinations had binary ("yes" or "no") states. NMSQ and composite motor score were continuous rating scales, and cutoffs at the 75th percentile of the cohort's values were used to determine the positive or negative state, as in the original work. We also simplified the composite motor score, derived from averaging individual z values of UPDRSII, UPDRSIII, and PIGD subparts of UPDRS as in the original work 3 .
We inserted the means and SDs of the present cohort and then mathematically reduced the z values to a simplified formula for the composite motor score.
Of note, the cutoffs for the motor-phenotypes are absolute, whereas the classification in the motor-nonmotor subtypes is per design relative to the distribution of values in the entire cohort studied 3,4 . In our cohort, 75th percentile cutoffs for motor-nonmotor categorization at baseline were 130.4 for composite motor score and 13.5 for NMSQ score. At reexamination, the simplified formula for composite motor score derived at baseline was utilized, but all other aspects of the systems, including cutoffs of the motor-nonmotor subparts (14.8 for NMSQ and 248.7 for composite motor score), were adapted to reexamination data.
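The motor-phenotype ratio rule and the 75th-percentile dichotomization described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the function names are hypothetical, the handling of a zero PIGD mean is an assumed convention that the text does not specify, treating values strictly above the cutoff as positive is likewise an assumption, and the percentile is computed with Python's `statistics.quantiles`, which may differ slightly from the procedure in the statistical software used in the study.

```python
from statistics import mean, quantiles


def motor_phenotype(tremor_items, pigd_items):
    """Classify by the ratio of the mean tremor score to the mean PIGD score.

    Cutoffs follow the text: ratio <= 1.0 is PIGD, ratio >= 1.5 is
    tremor-dominant, and anything in between is undetermined.
    """
    tremor_mean = mean(tremor_items)
    pigd_mean = mean(pigd_items)
    if pigd_mean == 0:
        # Assumption (not stated in the text): the ratio is undefined without
        # PIGD signs, so any tremor counts as tremor-dominant.
        return "tremor-dominant" if tremor_mean > 0 else "undetermined"
    ratio = tremor_mean / pigd_mean
    if ratio <= 1.0:
        return "PIGD"
    if ratio >= 1.5:
        return "tremor-dominant"
    return "undetermined"


def dichotomize_at_75th_percentile(values):
    """Return the cohort's 75th-percentile cutoff and a positive/negative flag
    per value. Assumption: 'positive' means strictly above the cutoff."""
    cutoff = quantiles(values, n=4, method="inclusive")[2]
    return cutoff, [v > cutoff for v in values]
```

The contrast noted in the text is visible here: `motor_phenotype` uses fixed cutoffs, whereas `dichotomize_at_75th_percentile` depends on the whole cohort, so the same patient score can change state when the cohort changes.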

Statistics
Of 141 patients, eight individuals with > 20% of total data points missing in UPDRS parts II and III were excluded (Fig. 3). For 28 patients who were missing ≤ 20% of data points (mean ± SD 6.6 ± 5.3%), missing values were imputed with the mean of each patient's results for the corresponding UPDRS part. We performed a sensitivity analysis without these 28 cases (Supplementary Table 2). A mean of 5.4 individuals (range 0-14) had experienced the milestones before baseline and could not add to Cox regression analyses (Table 1). Linear regressions were performed after the normal distribution of residuals and equality of variances were confirmed. All regression analyses were adjusted for onset age and sex since these factors are known to affect PD severity 41,42 . Adjustments also included disease duration since durations at baseline differed between patients. In each classification system, fulfillment of the five milestones was assessed with Kaplan-Meier survival curves, log-rank tests, and Cox regressions. In each classification system, the group with the worst prognosis was selected as the reference category in Cox regressions. The proportionality of hazards assumption was tested with the cox.zph function of the survival package in R v4.0.2. For all other statistical analyses, SPSS v25.0 was used. In the case of non-proportional hazards, analyses were instead performed in subgroups based on the least contributing covariate (sex), which made hazards proportional (Supplementary Table 1). P values < 0.05 were regarded as significant and 95% confidence intervals were consistently used.
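The handling of missing UPDRS items described above (exclusion above 20% missing, otherwise imputation with the patient's own mean for the corresponding part) can be sketched as follows. The function name, the data layout (a dictionary of item lists with `None` marking a missing item), and the treatment of a part with no observed items are assumptions made for illustration only.

```python
def impute_patient(parts, max_missing_fraction=0.20):
    """Impute missing UPDRS items for one patient.

    `parts` maps a UPDRS part name (e.g. "II", "III") to a list of item
    scores, with None marking a missing item. Returns a new dictionary with
    imputed values, or None if the patient should be excluded.
    """
    all_items = [x for items in parts.values() for x in items]
    missing_fraction = sum(x is None for x in all_items) / len(all_items)
    if missing_fraction > max_missing_fraction:
        return None  # more than 20% of all data points missing: excluded

    imputed = {}
    for name, items in parts.items():
        observed = [x for x in items if x is not None]
        if not observed:
            # Assumption: a part with no observed items cannot be imputed.
            return None
        part_mean = sum(observed) / len(observed)
        # Replace each missing item with the patient's mean for this part.
        imputed[name] = [part_mean if x is None else x for x in items]
    return imputed
```

Imputing with the patient's own part mean keeps the part total proportional to the observed items, which is why the sensitivity analysis without the imputed cases is a useful check.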

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
Statistical protocols or 100% depersonalized original data can be made available to researchers by contacting the corresponding author. According to Swedish law, we are only able to directly share data sets (cohort level/pooled data) that can never be traced back to an individual person. Any data that can be traced back to an individual requires prior written permission by Region Skåne, Sweden.