Introduction

Deep brain stimulation (DBS) of the subthalamic nucleus (STN) is a well-established therapy with long-term efficacy in improving motor symptoms, quality of life (QoL), and non-motor symptoms (NMS) in patients with Parkinson’s disease (PD)1,2,3,4,5. Previous research also demonstrated beneficial effects of STN-DBS on QoL compared to medical treatment6,7,8. However, on the individual level, 43–49% of patients experience no clinically relevant improvement of QoL postoperatively at 6-month follow-up6,9,10. Furthermore, there is Class I evidence that in 36% of pairs of patients treated either with best medical treatment alone or with STN-DBS, medical treatment alone results in better QoL outcomes than STN-DBS2. Identifying preoperative factors that predict QoL outcome could support the decision-making process for DBS eligibility and improve individual treatment results. Amongst other parameters, younger age, worse baseline QoL, and specific NMS have been identified as predictors of greater QoL improvement at 6-month follow-up. However, it is unclear which demographic and clinical parameters influence the evolution of QoL beyond such a short-term follow-up. Therefore, we investigated predictors of QoL outcome after STN-DBS at 36-month follow-up and, based on previous studies with shorter follow-up periods, hypothesized that QoL outcome depends on demographic and non-motor predictors as well as baseline QoL.

Results

Of 129 patients screened, 73 patients (43 male) were included in the final analysis (see Fig. 1). The mean age at baseline was 62.0 years (SD = 8.3) and disease duration 10.3 years (SD = 4.7). The mean time to follow-up was 3.0 years (SD = 0.31).

Fig. 1: Enrollment.

The flow chart describes the enrollment of patients. DBS deep brain stimulation.

Clinical outcomes at baseline, 6-month, and 36-month follow-up

Friedman tests revealed significant differences between the three visits for all outcome scores (see Table 1). In post hoc tests comparing baseline and 36-month follow-up, we observed significant longitudinal changes for the NMSS total score (P = 0.037), SCOPA-motor examination (P = 0.001), and -motor complications (P < 0.001), and significant sustained levodopa equivalent daily dose (LEDD) reduction (P < 0.001). No significant changes at 36-month follow-up were found for the PDQ-8 SI (P = 0.296) and SCOPA-activities of daily living (P = 0.161). PDQ-8 domains are reported in supplementary Table e-1.

Table 1 Outcome parameters at preoperative baseline and postoperative 6-month and 36-month follow-up.

Correlation analyses

Table 2 shows correlations between PDQ-8 SI change score (baseline vs. 36-month follow-up) and demographic variables and preoperative clinical scores. Significant correlations were found between PDQ-8 SI changes and PDQ-8 SIbaseline (moderate strength) and agebaseline (weak). The correlation between improvement in PDQ-8 SI and NMSS totalbaseline showed a trend toward significance. Explorative Spearman correlations between PDQ-8 SI changes at 36-month follow-up and NMSS items at baseline showed significant associations with the items “difficulty experiencing pleasure” (NMSS-12baseline, r = 0.24, P = 0.041), “concentration” (NMSS-16baseline, r = 0.34, P = 0.003), and “urinary frequency” (NMSS-23baseline, r = 0.27, P = 0.022). We observed no significant correlation between these NMSS items at baseline. A partial correlation between PDQ-8 SI change score and NMSS-16baseline was still significant after controlling for NMSS-12baseline (r = 0.31, P = 0.007).

Table 2 Correlations between preoperative baseline test scores or demographic variables and 36-month change scores of quality of life.

Linear regression analysis

Univariate regression analyses were performed using the variables with a P < 0.2 in the correlation analyses as candidate predictors. This additionally included the items “fainting” (NMSS-2baseline, r = −0.17, P = 0.157), “hallucinations” (NMSS-13baseline, r = 0.18, P = 0.131), “forget things or events” (NMSS-17baseline, r = 0.17, P = 0.157), “interest in sex” (NMSS-25baseline, r = 0.21, P = 0.082), and “pain” (NMSS-27baseline, r = 0.22, P = 0.063). Univariate regression analyses with change in PDQ-8 SI at 36-month follow-up as the criterion variable were significant for the following independent variables: PDQ-8 SIbaseline (β = 0.42, P < 0.001), agebaseline (β = −0.29, P = 0.012), NMSS totalbaseline (β = 0.26, P = 0.025), NMSS-2baseline (β = −0.25, P = 0.034), NMSS-12baseline (β = 0.48, P < 0.001), NMSS-16baseline (β = 0.37, P = 0.001), and NMSS-23baseline (β = 0.25, P = 0.032). For the multivariate regression analysis, we excluded the variable NMSS totalbaseline due to high intercorrelation with PDQ-8 SIbaseline (r = 0.65, P < 0.001). In the stepwise multivariate regression analysis, the variables agebaseline, NMSS item 2baseline, NMSS item 12baseline, and NMSS item 16baseline remained significant. The multivariate model accounted for 36% of the variance (R2 = 0.40) in PDQ-8 SI change (F4,68 = 11.1, P < 0.001). In this model, NMSS item 12baseline had the highest predictive value (β = 0.35, P = 0.001), followed by agebaseline (β = −0.28, P = 0.004), NMSS item 16baseline (β = 0.26, P = 0.011), and NMSS item 2baseline (β = −0.22, P = 0.025).
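The two variance figures quoted for the multivariate model (36% and R2 = 0.40) can be related through the standard adjustment of R2 for the number of predictors. The following is a minimal sketch, not the authors' SPSS procedure; reading the 36% as an adjusted R2 is an assumption, since the text does not label it, and the function names are illustrative.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2: float, n: int, p: int) -> float:
    """Overall regression F statistic with (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Values taken from the text: R^2 = 0.40, n = 73 patients, 4 retained predictors.
r2, n, p = 0.40, 73, 4
print(f"adjusted R^2 = {adjusted_r2(r2, n, p):.2f}")
print(f"F({p},{n - p - 1}) = {overall_f(r2, n, p):.1f}")
```

With these inputs the adjusted R2 rounds to 0.36 and F is about 11.3, which is close to the reported 11.1 given that R2 is itself a rounded value.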

Logistic regression analysis

The cut-off for a clinically relevant change in PDQ-8 SI at 36-month follow-up was 8.4 points (\(\frac{1}{2}\) SD of PDQ-8 SIbaseline). Out of 73 patients in our cohort, 28 patients (38.4%) were classified as 36-month QoL “responders”, 29 patients (39.7%) reported unchanged QoL, and 16 patients (21.9%) indicated a clinically relevant worsening of long-term QoL.
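The three-way classification above can be sketched as follows. This is an illustration under stated assumptions: the change score is taken as baseline minus follow-up (so positive values mean improvement), the threshold is treated as inclusive, and the helper names `classify` and `share` are hypothetical.

```python
THRESHOLD = 8.4  # half a standard deviation of baseline PDQ-8 SI, as reported


def classify(change: float, threshold: float = THRESHOLD) -> str:
    """Label one patient from the PDQ-8 SI change (baseline minus follow-up).

    Assumption: a change exactly equal to the threshold counts as relevant.
    """
    if change >= threshold:
        return "responder"
    if change <= -threshold:
        return "worsened"
    return "unchanged"


def share(count: int, total: int) -> float:
    """Percentage of the cohort, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Group sizes reported in the text for the cohort of 73 patients.
for label, count in [("responder", 28), ("unchanged", 29), ("worsened", 16)]:
    print(label, share(count, 73))
```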

For binary logistic regression analyses, patients reporting unchanged and worsened QoL were grouped as QoL “non-responders” (n = 45, 61.6%). In explorative logistic regression analyses, every additional year of age at baseline decreased the odds of 36-month QoL improvement by ~5% (odds ratio [OR]= 0.949, confidence interval [CI]=0.895–1.007, P = 0.082). Furthermore, the odds of QoL improvement were increased by ~5% with every additional point of baseline QoL impairment in the PDQ-8 SIbaseline (OR = 1.048, CI = 1.012–1.086, P = 0.008) and by ~14% for every additional 10 points of baseline non-motor burden in the NMSS-total scorebaseline (OR = 1.014, CI = 0.999–1.029, P = 0.074). Moreover, specific NMSS items had a predictive value: one additional point in NMSS-12baseline (“difficulties experiencing pleasure”) increased the odds of QoL improvement by 46% (OR = 1.462, CI = 1.054–2.028, P = 0.023) and in NMSS-16baseline (“concentration”) by 30% (OR = 1.302, CI = 1.064–1.593, P = 0.010).
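The percentages quoted alongside each odds ratio follow from the usual reading of an OR as a multiplicative change in odds per unit of the predictor. A minimal sketch, assuming each reported OR refers to a one-point (or one-year) increase; the function name is illustrative.

```python
def percent_change_in_odds(odds_ratio: float, units: float = 1.0) -> float:
    """Percent change in the odds for a `units`-step increase in the predictor.

    A step of several units multiplies the odds by odds_ratio ** units.
    """
    return (odds_ratio ** units - 1.0) * 100.0


# Odds ratios as reported in the text.
print(round(percent_change_in_odds(0.949), 1))      # age, per year
print(round(percent_change_in_odds(1.048), 1))      # PDQ-8 SI, per point
print(round(percent_change_in_odds(1.462), 1))      # NMSS-12, per point
print(round(percent_change_in_odds(1.302), 1))      # NMSS-16, per point
print(round(percent_change_in_odds(1.014, 10), 1))  # NMSS total, per 10 points
```

For the NMSS total score, compounding the rounded per-point OR over 10 points gives roughly 15%, close to the ~14% stated in the text; the small difference is within what rounding of the OR can produce.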

A logistic regression model using the aforementioned parameters (agebaseline, PDQ-8 SIbaseline, NMSS-totalbaseline, NMSS-12baseline, and NMSS-16baseline) correctly classified 75.3% of patients into groups of long-term QoL “responders/non-responders” (Nagelkerke’s R2 = 0.338, χ2 = 0.9, P = 0.001, n = 73) as opposed to only 61.6% without predictors. The model reached 75.0% sensitivity and 73.3% specificity at the optimal trade-off point (C-statistic = 0.779, P < 0.001, CI = 0.667–0.892, see Fig. 2)11.
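Sensitivity and specificity at a chosen cut-off are simple ratios of confusion-matrix counts. In the sketch below the counts are not reported in the text; they are inferred as the only whole-number splits of 28 responders and 45 non-responders that reproduce 75.0% and 73.3%, and should be read as an assumption.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of true responders that the model labels as responders."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of true non-responders that the model labels as non-responders."""
    return tn / (tn + fp)


# Inferred counts (assumption): 21 of 28 responders and 33 of 45 non-responders
# correctly classified at the optimal trade-off point.
tp, fn, tn, fp = 21, 7, 33, 12

print(round(100 * sensitivity(tp, fn), 1))
print(round(100 * specificity(tn, fp), 1))

# Reference rate without predictors: always predicting the larger group.
print(round(100 * 45 / 73, 1))
```

The last line reproduces the 61.6% reference rate, which corresponds to assigning every patient to the larger non-responder group.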

Fig. 2: Receiver operating characteristic curve.

The receiver operating characteristic curve (blue) illustrates the classification accuracy of the fitted logistic regression model (dependent variable: PDQ-8 SI “Responder”/“Non-Responder”, independent variables: agebaseline, PDQ-8 SIbaseline, NMSS-total scorebaseline, NMSS item 12baseline, NMSS item 16baseline). The discriminatory power of the test with these parameters is demonstrated by C-statistic = 0.78. The diagonal line (red) represents chance classification accuracy. The cross of black reference lines indicates the optimal trade-off point in which the model reached 75.0% sensitivity and 73.3% specificity. NMSS Non-Motor Symptom Scale, PDQ-8 SI 8-item Parkinson’s Disease Questionnaire Summary Index.
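The C-statistic can be read as the probability that a randomly chosen responder receives a higher predicted probability than a randomly chosen non-responder. The sketch below computes it by direct pairwise comparison; the predicted probabilities are made-up toy values, not study data, and the function name is illustrative.

```python
from typing import Sequence


def c_statistic(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for illustration only.
responders = [0.8, 0.6, 0.4]
non_responders = [0.5, 0.3, 0.2, 0.1]
print(c_statistic(responders, non_responders))
```

A value of 0.5 corresponds to the chance diagonal and 1.0 to perfect separation of the two groups.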

Linear and logistic regression analyses were confirmed by Mann–Whitney U tests comparing baseline parameters between long-term QoL responders and non-responders. Significant differences were found for PDQ-8 SI (responders: 40.1 ± 20.4, non-responders 28.3 ± 12.5, P = 0.021), NMSS-12baseline (responders: 1.7 ± 3.0, non-responders 0.4 ± 1.0, P = 0.015), NMSS-16baseline (responders: 3.1 ± 2.9, non-responders 1.5 ± 2.1, P = 0.009), NMSS-23baseline (responders: 3.1 ± 4.0, non-responders 1.6 ± 3.0, P = 0.017), and NMSS-27baseline (responders: 3.5 ± 3.7, non-responders 2.0 ± 3.5, P = 0.027).

Discussion

In the present study, we report the 36-month effects of STN-DBS on QoL in a cohort of 73 patients with PD. We observed significant improvements in QoL following STN-DBS at short-term (6-month) follow-up, with subsequent decrements in gains at 36-month follow-up, when only 38% of the patients experienced a sustained clinically relevant QoL improvement compared to preoperative baseline. Our results provide evidence that clinically relevant QoL improvement three years after preoperative baseline assessment can be predicted with 75% accuracy. Greater QoL improvement was observed for patients with younger age at intervention, worse baseline QoL, and a higher burden of specific NMS, such as anhedonia and concentration impairments. In contrast, patients more severely affected by fainting at baseline experienced less QoL improvement.

To our knowledge, the present study is the first to report an association between younger age at intervention and greater QoL improvement at 36-month follow-up. The association between these parameters was previously described for a 12-month period by Soulas et al.12. However, other studies found no association between age and changes in QoL6,13. This inconsistency might be explained by the fact that calendar age may not predict QoL. Instead, QoL after STN-DBS may be associated with ‘physiological age’. For example, frailty and co-morbidities may impact QoL post STN-DBS more than calendar age14. In line with previous research, other sociodemographic parameters, such as sex and disease duration, were not significantly correlated with long-term change of QoL6,15.

Confirming results of earlier studies with shorter follow-up periods, the dosage of dopaminergic medication at preoperative baseline was not associated with QoL outcome6,13. In line with previous studies with follow-up periods up to 5 years, motor examination did not predict QoL changes13,16. Daniels et al.6 reported that the cumulative daily OFF time is the strongest predictor for improvement in disease-related QoL after DBS at 6-month follow-up. Further studies including cumulative OFF time with a longer follow-up are needed.

To our knowledge, this is the first report of a significant relationship between more severe preoperative QoL impairment and greater postoperative QoL improvement at 36-month follow-up3. This is in line with previous studies that reported a relationship between these parameters at 6-month and 24-month follow-up3,9. Every additional point in the PDQ-8 SI at baseline increased the odds of favorable long-term QoL outcome by 5%. The strength of the association is in line with the results of our previous study at short-term follow-up9 and the Cleveland Clinic cohort results10, emphasizing the essential role of baseline QoL for the prediction of even long-term QoL outcome and also demonstrating the validity of our results. Our results are in line with several previous studies which have demonstrated that higher baseline QoL impairments predict greater postoperative QoL improvement at short-term follow-up3,6,10. In contrast, a study by Lezcano et al.16 observed that less severe QoL impairments could predict greater QoL improvement at 1- and 5-year follow-up. These differences could be explained by demographic and clinical parameters in the study by Lezcano et al., such as a longer mean disease duration (13.2 years) and higher mean baseline PDQ impairments (41.1 points), than in the present study (10.3 years and 32.8 points)3,6,10,16. In the multivariate model, anhedonia, age, concentration problems and fainting contributed toward explaining QoL outcome at 36-month follow-up, whereas baseline QoL did not add to the predictive value of this model. This means that, although baseline QoL was a significant predictor of QoL change at 36-month follow-up in the univariate analysis, its contribution in the multivariate model was dominated by the other four variables mentioned earlier.

In the present study, specific preoperative NMS, namely more severe anhedonia and problems with sustaining concentration, were predictors for greater QoL improvement.

The predictive potential of depressive symptoms is in line with the results at 6-month follow-up in a previous study of our group9 and 8-month follow-up in the Cleveland Clinic cohort10. The present study results also extend the time frame of a 24-month follow-up study by Schuepbach et al. which reported greater QoL improvement in patients with worse baseline scores in two depression scales (Beck Depression Inventory and Montgomery-Åsberg Depression Rating Scale)3. One must acknowledge that preoperative psychological interviews and strict formal testing resulted in a highly selected cohort with low baseline depression similar to other cohorts2,3,17. Therefore, the observation that worse baseline depression results in greater QoL improvements is only valid for patients with minimal or subclinical depression. More severe preoperative depression is a known risk factor for postoperative attempted or completed suicide18.

Furthermore, we observed that patients with greater baseline concentration deficits experienced greater QoL improvements at 36-month follow-up. The relationship between baseline concentration and QoL changes remained significant after controlling for anhedonia. Floden et al. and Witt et al. have reported that higher preoperative verbal memory deficits (Rey Auditory Verbal Learning Test single-trial memory and Dementia Rating Scale-2) are predictors of more unsatisfactory postoperative QoL outcome at 6- and 8-month follow-up10,19. Concentration/attention deficits are often accompanied by global cognition impairment in patients with PD. However, in our cohort, multi-disciplinary team assessments included expert neuropsychological assessments with formal testing of global cognition scores, psychiatric interviews, and neurological examinations to identify risks of adverse outcomes in patients with poor preoperative global cognition as these patients have a higher risk to progress to dementia. Strict indication assessments resulted in normal global cognition at baseline which remained stable at 6- and 36-month follow-up. Therefore, in this highly selected cohort, a higher burden of isolated concentration deficits constituted a predictor of greater QoL improvement. Future studies in larger cohorts including formal testing of concentration are warranted to confirm this finding.

To our knowledge, our study is the first to report an association between the presence of preoperative fainting and worse QoL outcome at 36-month follow-up. This finding is in line with the observation that cardiovascular symptoms, such as fainting/syncopes, worsen at 36-month follow-up8 and have a marked negative impact on QoL20.

Some limitations of our study should be acknowledged. One important limiting factor is the underrepresentation of patients with severe NMS, such as clinically relevant psychiatric disorders or cognitive impairment, as these patients were not eligible for DBS. Although the cohort size of the present study (n = 73) is limited, it is still one of the largest beyond short-term follow-up. Furthermore, the multicenter design of our study increases external validity by reducing bias caused by single-center studies. We did not systematically assess apathy, which could have improved our prediction model, as patients with negative QoL outcome showed higher preoperative apathy scores in previous research21. QoL was assessed with the PDQ-8, which may be less sensitive to small QoL changes than the PDQ-39 due to a reduced scale gradation resulting from fewer items22. Due to the focus on QoL and non-motor aspects of PD, we did not conduct assessments of motor examination in pre- or postoperative medication or stimulation OFF states and we did not assess other motor aspects, such as the cumulative daily OFF time or severity of dyskinesia. Future studies are needed to further explore a possible predictive potential of these parameters. Another limitation is that severe disease progression can result in patients being lost to follow-up which could introduce a systematic bias in studies with longer follow-up periods23.

Also, the variability of the exact location of stimulation in the target area might be relevant for postoperative QoL improvement24, but was not investigated in the present study as we focused on preoperative predictors of QoL outcome. A recent study by Petry-Schmelzer et al. reported that non-motor outcomes, such as mood/apathy and attention/memory, depend on the location of neurostimulation and are correlated with QoL outcome24,25,26. These results and the predictive value of baseline anhedonia and concentration deficits observed in the present study highlight the importance of assessments of a wide range of NMS which may have implications for DBS programming to achieve optimal long-term QoL outcomes.

The observation of greater QoL improvements at 36-month follow-up in patients with younger age at intervention, worse preoperative QoL, worse preoperative anhedonia and concentration problems, and less autonomic dysfunction, such as fainting, highlights the importance of preoperative assessments of a wide range of motor and non-motor symptoms. Our results, therefore, contribute to the long-term goal of identifying patients who experience greater postoperative QoL improvement and optimizing patient selection for STN-DBS.

Methods

Study design

In this ongoing, prospective, observational, multicenter international study (Cologne, London, Manchester), we examined patients with PD undergoing STN-DBS as part of the DBS arm of the NILS study at preoperative baseline, 6-month, and 36-month follow-up postoperatively27,28. Patients were screened between 06/2011 and 07/2017. The study was conducted under the Declaration of Helsinki. Study protocols had been approved by the local ethics committees (Cologne, study no.: 12-145; German Clinical Trials Register: DRKS00006735; United Kingdom: NIHR portfolio, number: 10084; National Research Ethics Service South East London REC 3, 10/H0808/141). All patients gave written informed consent before study procedures.

Participants

PD diagnosis was based on the UK Brain Bank criteria and patients were screened for DBS treatment according to the guidelines of the International PD and Movement Disorders Society29. A sufficient levodopa responsiveness (>30% improvement in the Unified Parkinson’s Disease Rating Scale-III) was required for each patient. Furthermore, eligibility for STN-DBS was based on multi-disciplinary assessments including movement disorders specialists, stereotactic neurosurgeons, neuropsychologists, psychiatrists, and when necessary, speech therapists and physiotherapists. This led to the exclusion of patients with clinically relevant cognitive impairment and psychiatric diseases30.

Clinical assessment

Clinical assessments were carried out under medication ON (MedON) at preoperative baseline and with neurostimulation ON and medication ON (MedON/StimON) at 6-month and 36-month follow-up.

The following scales and questionnaires were assessed:

(1) QoL was investigated with the PD Questionnaire-8 (PDQ-8) reported as PDQ-8 Summary Index (PDQ-8 SI) ranging from 0 (no impairment) to 100 (maximum impairment)31,32. The PDQ-8 assesses eight aspects of QoL (mobility, activities of daily living, emotional wellbeing, stigma, social support, cognition, communication, bodily discomfort) and has been commonly used in PD33 and STN-DBS28,34,35.

(2) The clinician-rated NMS Scale (NMSS) contains 30 items covering nine domains of NMS: cardiovascular, sleep/fatigue, mood/apathy, perceptual problems/hallucinations, attention/memory, gastrointestinal tract, urinary, sexual function, and miscellaneous (including pain, inability to smell/taste, weight changes, and sweating). Symptoms are surveyed over the last four weeks and therefore reflect ON and OFF states. The NMSS total score ranges from 0 (no impairment) to 360 (maximum impairment)36.

(3) Motor examination, activities of daily living, and motor complications were assessed with the Scales for Outcomes in PD (SCOPA) -motor examination, -activities of daily living, and -motor complications37. The SCOPA is an abbreviated version of the Unified PD Rating Scale. It strongly correlates with the corresponding parts of the Unified PD Rating Scale and was used here as its administration time is approximately four times shorter than the MDS-Unified PD Rating Scale37,38. SCOPA subscales range from 0 (no impairment) to 42 (motor examination), 21 (activities of daily living), and 12 (motor complications). Motor examinations were conducted by movement disorders specialists.

(4) Global cognition was assessed with the Mini-Mental State Examination (MMSE) which ranges between 0 (maximum impairment) and 30 (no impairment).

(5) To record the medical regimen, we calculated the LEDD following Tomlinson et al.39.
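The score ranges given for the PDQ-8 SI and the NMSS follow from how these instruments are conventionally scored. The sketch below uses the standard scoring rules, which are not spelled out in the text and are therefore stated here as assumptions: eight PDQ-8 items rated 0 to 4 and rescaled to 0 to 100, and 30 NMSS items each scored as severity (0 to 3) multiplied by frequency (1 to 4).

```python
from typing import Sequence


def pdq8_summary_index(items: Sequence[int]) -> float:
    """PDQ-8 SI: sum of eight items (each 0-4) rescaled to 0-100."""
    if len(items) != 8 or any(not 0 <= x <= 4 for x in items):
        raise ValueError("PDQ-8 needs eight item scores between 0 and 4")
    return sum(items) / (4 * 8) * 100.0


def nmss_item_score(severity: int, frequency: int) -> int:
    """NMSS item score: severity (0-3) times frequency (1-4), range 0-12."""
    if not 0 <= severity <= 3 or not 1 <= frequency <= 4:
        raise ValueError("severity must be 0-3 and frequency 1-4")
    return severity * frequency


def nmss_total(item_scores: Sequence[int]) -> int:
    """NMSS total: sum of 30 item scores, range 0-360."""
    if len(item_scores) != 30:
        raise ValueError("NMSS has 30 items")
    return sum(item_scores)


# Hypothetical responses for illustration only.
print(pdq8_summary_index([1, 2, 0, 1, 3, 2, 1, 0]))
print(nmss_total([nmss_item_score(3, 4)] * 30))
```

The second call reproduces the maximum NMSS total of 360 quoted in the list above.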

Statistical analysis

Longitudinal outcome changes

Statistical analyses were performed using SPSS Statistics 26. The Kolmogorov-Smirnov test was applied to check the assumption of normality. Longitudinal outcome changes between the three visits were analyzed with Friedman tests or repeated-measures analyses of variance when parametric test criteria were fulfilled. Post hoc, we calculated Wilcoxon signed-rank and t-tests, respectively, to compare outcome changes between pairs of visits. Benjamini-Hochberg correction was applied to account for multiple testing. Unless stated otherwise, the presented P-values are adjusted for multiple testing, and the significance threshold was P < 0.05.
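The Benjamini-Hochberg procedure mentioned above can be expressed as adjusted P-values that are then compared against 0.05. This is a generic sketch of the step-up method, not the SPSS routine used in the study; the input values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so each adjusted value is capped by
    # the adjusted values of all larger P-values (keeps them monotone).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values from four post hoc comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print([round(p, 3) for p in benjamini_hochberg(raw)])
```

A comparison is then called significant when its adjusted P-value is below 0.05.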

Correlation analyses

The relationship between changes in QoL scores and preoperative demographic and clinical parameters was explored using Spearman correlations, or Pearson correlations for normally distributed variables. PDQ-8 SI change score (mean Testbaseline – mean Test36-month follow-up) was correlated with the following variables: agebaseline, sex, disease duration since diagnosis, NMSS total scorebaseline, PDQ-8 SIbaseline, SCOPA-motor examinationbaseline, -activities of daily livingbaseline, -motor complicationsbaseline, MMSEbaseline, and LEDDbaseline. In addition, we explored if PDQ-8 SI change score correlated with specific NMSS itemsbaseline and, when appropriate, if these results remained significant after controlling for changes in other NMSS items in partial correlations.
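A first-order partial correlation can be obtained from the three pairwise correlations involved. The sketch below shows the textbook formula only; the correlation between the two NMSS items is not reported in the text, so the value used for it is hypothetical, and the result is not expected to match the rank-based partial correlation computed in SPSS.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))


# x = PDQ-8 SI change, y = NMSS-16 at baseline, z = NMSS-12 at baseline.
r_change_item16 = 0.34  # reported in the text
r_change_item12 = 0.24  # reported in the text
r_item16_item12 = 0.15  # hypothetical, not reported

print(round(partial_correlation(r_change_item16, r_change_item12, r_item16_item12), 2))
```

When the control variable is uncorrelated with both other variables, the partial correlation equals the simple correlation.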

Linear regression analysis

In a second step, we aimed to identify preoperative predictors of long-term QoL outcome using stepwise linear regression analysis. We included parameters from the correlation analyses (P < 0.2)40 as candidate predictor variables and PDQ-8 SI change score as criterion variable. Multi-collinearity was checked using intercorrelations between candidate predictor variables (r > 0.6) and Variance Inflation Factors, which should not exceed 10 (ref. 41).
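The two collinearity checks named above operate on different scales, which a small calculation makes concrete. The sketch assumes the simplest case of two predictors, where the Variance Inflation Factor depends only on their pairwise correlation; the function names are illustrative.

```python
def variance_inflation_factor(r_squared: float) -> float:
    """VIF of a predictor, given the R^2 from regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared)


def flag_intercorrelation(r: float, cutoff: float = 0.6) -> bool:
    """True when the absolute pairwise correlation exceeds the screening cutoff."""
    return abs(r) > cutoff


# Correlation between NMSS total and PDQ-8 SI at baseline, as reported in the text.
r = 0.65
print(flag_intercorrelation(r))
print(round(variance_inflation_factor(r ** 2), 2))
```

In this two-predictor illustration, a correlation of 0.65 is flagged by the r > 0.6 screen while the corresponding VIF stays well below 10, so the intercorrelation criterion is the stricter of the two.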

Logistic regression analyses and receiver operating characteristics

Furthermore, the cohort was divided into groups of patients with clinically relevant QoL improvement and patients reporting stable/worsened QoL at 36 months. Each patient was classified as a long-term QoL “responder” or “non-responder” based on a preassigned threshold (½ SD of PDQ-8 SIbaseline) to report clinically important differences42. We employed exploratory logistic regression models and receiver operating characteristic analyses with dichotomized QoL outcome as criterion variable and demographic and preoperative clinical parameters as predictor variables to evaluate the utility of linear regression models to predict patients’ postoperative long-term QoL changes. Moreover, we analyzed differences of baseline characteristics between “responders”/“non-responders” using Mann–Whitney U tests or t-tests, respectively. To explore the relationship between QoL outcome changes and specific NMS, all analyses were additionally conducted for NMSS item scores.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.