Long-term Parkinson’s disease quality of life after staged DBS: STN vs GPi and first vs second lead

Deep brain stimulation (DBS) for Parkinson’s disease (PD) improves quality of life (QoL), but longitudinal follow-up data are scarce. We sought to quantify long-term benefits of subthalamic nucleus (STN) vs globus pallidus internus (GPi), and unilateral vs staged bilateral PD-DBS on postoperative QoL. This is a retrospective, longitudinal, non-randomized study using the PD QoL questionnaire (PDQ)-39 in patients with STN- or GPi-DBS, and with unilateral (N = 191) or staged bilateral (an additional contralateral lead implant) surgery (N = 127 and 156 for the first and second lead, respectively). Changes in PDQ-39 summary index (PDQ-39SI) and subscores throughout 60 months of follow-up were used as the primary analysis. We applied mixed models that included levodopa and covariates that differed at baseline across groups. For unilateral implantation, we observed an initial improvement in PDQ-39SI of 15.55 ± 3.29% (µ ± SE) across both brain targets at 4 months postoperatively. Unilateral STN patients demonstrated greater improvement in PDQ-39SI than GPi patients at 4 and 18 months postoperatively. Analysis of patients with staged bilateral leads revealed an initial 25.34 ± 2.74% (µ ± SE) improvement in PDQ-39SI at 4 months after the first lead with further improvement until 18 months, with no difference across targets. Scores did not improve after the second lead with gradual worsening starting at 18 months postoperatively. STN-DBS provided greater short-term QoL improvement than GPi-DBS for unilateral surgery. For staged bilateral DBS, overall QoL improvement was explained primarily by the first lead. Decision-making for patients considering DBS should include a discussion surrounding the potential risks and benefits from a second DBS lead.


INTRODUCTION
Quality of life (QoL) is one of the most important outcome measures in healthcare 1,2. Parkinson's disease (PD) patients experience a multitude of motor and non-motor symptoms, resulting in a debilitating reduction in QoL 3,4. Today, deep brain stimulation (DBS) of the subthalamic nucleus (STN) or globus pallidus internus (GPi) is considered a safe and effective surgical treatment for PD based on randomized controlled trials that have included motor symptom scales and QoL as primary or secondary outcomes 5-9. In PD, QoL is commonly assessed using the validated PD QoL questionnaire (PDQ)-39 10. Although DBS has been shown to improve QoL, longitudinal QoL follow-up data after PD-DBS are scarce and are mostly drawn from databases with small sample sizes or limited target inclusion 11.
Since GPi stimulation is increasingly used for PD-DBS 6,12-15, data are needed that compare not only the effects of STN or GPi stimulation on QoL measurements, but also of unilateral and staged bilateral stimulation in each of these brain targets. Furthermore, current studies of QoL outcomes with bilateral implants have reported outcomes before and after the implantation of both leads 5,6,16,17; however, data are lacking that systematically analyze the QoL effects of each lead independently. With real-life data from standard-of-care intervention, we aimed to evaluate the long-term effects of DBS on QoL across both brain targets and surgery types, while also uncovering the effects of the first vs the second lead in patients with staged bilateral implantations.

RESULTS
Sample characteristics
A total of 121 unilateral GPi and 70 unilateral STN patients were included. For bilateral surgery, 60 GPi and 67 STN patients were included for the first lead, and 72 GPi and 84 STN patients were included for the second lead. Baseline PDQ scores for unilateral, bilateral first, and bilateral second surgeries were obtained 4.91 ± 0.20, 4.99 ± 0.20, and 2.48 ± 0.21 months before surgery, respectively. At baseline, unilateral STN patients had a higher tremor score compared to GPi patients (p < 0.01), and unilateral GPi patients had a worse postural instability gait disorder (PIGD) score compared to STN patients (p < 0.01; Table 1, Supplementary Table 1). Therefore, in our unilateral analyses, we included tremor score, PIGD, an interaction between tremor score and target, and an interaction between PIGD and target to control for the influence of these differences on QoL outcomes. For the PDQ mobility subscore specifically, baseline mobility scores were added as a covariate since STN patients had a lower baseline score compared to GPi patients (p < 0.05; Table 1, Supplementary Table 1).
In the staged bilateral cohort, at the first surgery, the STN cohort exhibited significantly higher scores on the Berg balance scale (p < 0.05) and lower PIGD scores (p < 0.05) than the GPi cohort (Table 2, Supplementary Table 2). Furthermore, three significantly different baseline variables between the staged bilateral STN and GPi cohorts were present at the second lead surgery, namely rigidity (p < 0.05), levodopa equivalent dose (LEDD; p < 0.05), and PIGD (p < 0.001), which were all lower in the STN cohort. Additionally, in the STN group, patients showed more dopamine responsiveness (p < 0.01), a higher rigidity score (p < 0.01), a higher unified Parkinson's disease rating scale (UPDRS)-II score (p < 0.05), and higher LEDD use (p < 0.01) before the first compared to the second lead. Furthermore, in the GPi group, patients had higher UPDRS-II (p < 0.05) and tremor scores (p < 0.05) before the first compared to the second lead. The number of months between the first and second surgeries was similar for the GPi and STN groups (p = 0.27; Table 2, Supplementary Table 2).
Given these numerous differences across the staged bilateral surgery cohorts, mixed models included the following covariates: rigidity score, UPDRS-II, tremor score, PIGD, Berg balance scale, LEDD, and dopamine responsiveness, as well as interactions between rigidity score and target; rigidity score and first vs second surgery; UPDRS-II and first vs second surgery; tremor score and first vs second surgery; PIGD and target; Berg balance scale and target; LEDD and target; LEDD and first vs second surgery; and dopamine responsiveness and first vs second surgery. PDQ-39 summary index (PDQ-39SI) and all PDQ subscores except communication demonstrated baseline differences with respect to target, the first vs second surgery, or both (Table 2,  Supplementary Table 2). These baseline scores and the necessary interactions were thus also added to each mixed model when appropriate.
Unilateral implantation
For unilateral implantation, we observed a significant main effect of time (p < 0.0001) and target (p < 0.05) on postoperative percent change in PDQ-39SI (Fig. 1a). Across targets compared to baseline, there was an initial improvement in PDQ-39SI at 4 months, whereas PDQ-39SI scores gradually returned to baseline during the remainder of follow-up (Fig. 1a). Regarding the effect of target, STN patients showed greater improvement in PDQ-39SI compared to GPi patients, particularly at 4 (27.32 ± 4.94% vs 9.20 ± 4.20%, p < 0.05) and 18 (22.92 ± 6.36% vs 7.12 ± 6.57%, p < 0.05) months of follow-up (Fig. 1a). Similar to PDQ-39SI, for all unilateral subscore mixed models (Fig. 2), there was also a significant main effect of time (Table 3), with the exception of the stigma subscore. In the cases of the ADL (p < 0.01) and communication (p < 0.05) subscores, STN patients improved more and worsened less, respectively, compared to GPi patients and independent of other covariates (Fig. 2, Table 3). Postoperative improvement was associated with increased LEDD for mobility (p < 0.05) and ADL (p < 0.001) subscores. Additionally, we observed positive associations between baseline tremor severity and postoperative improvement in the ADL (p < 0.05) and stigma (p < 0.05) subscores, as well as between baseline mobility scores and postoperative improvement in mobility (p < 0.0001). Lastly, for STN and GPi patients, higher PIGD scores at baseline were associated with less and more improvement in the ADL subscore, respectively (Supplementary Table 3).

Staged bilateral implantation
Within the bilateral lead implantation cohort, there was a significant effect of time (p < 0.01) and first vs second lead surgery (p < 0.05; Table 4), suggesting an overall improvement in PDQ after the first (p < 0.0001), but overall worsening after the second (p < 0.0001) lead placement across all follow-up. More specifically, for the first lead implantation across targets, there was an initial large improvement in PDQ-39SI within 18 months postoperatively (Fig. 1b). For the second lead implantation, PDQ-39SI returned to baseline by 4 months and progressively worsened starting at 18 months postoperatively (Fig. 1c). These results suggest that at the sample level, the second lead provided sustained (but not additional) benefit beyond the first lead for at least 12 months.
[...] For visualization purposes, bins with fewer than five data points are removed. Blue (GPi) or black (STN) stars (*p < 0.05; **p < 0.01) represent significant improvements in PDQ scores compared to baseline values, whereas gray stars with a line underneath represent significant differences between the STN and GPi at the specified time point. If a target was not significant within the mixed model, STN and GPi were pooled together, and compared to baseline, represented by a gray star without a line. These values were corrected for multiple comparisons using false discovery rate correction.
Postoperative medication (med) adjustments were related to changes in PDQ. Namely, the effect of the first vs the second surgery (p < 0.05) and target (p < 0.01) for PDQ-39SI outcomes further interacted with LEDD. Specifically, a greater postoperative decrease in LEDD was associated with more improvement after the first vs the second surgery, as well as in the STN vs the GPi cohort (Supplementary Table 3). For both the mobility (p < 0.05) and ADL (p < 0.05) subscores, a postoperative decrease in LEDD was associated with more improvement after the first surgery (Supplementary Table 3). For the subscores emotional well-being (p < 0.05) and communication (p < 0.05), postoperative med reduction was linked to postoperative improvement independent of the first vs the second surgery and target (Table 4). Lastly, more baseline dopamine responsiveness was associated with postoperative worsening of the cognition subscore, independent of target (p < 0.05; Table 4).
Excluding communication, all PDQ models included baseline PDQ scores as a covariate. In all cases, higher baseline PDQ subscores were significantly associated with more postoperative improvement (Table 4). In addition, this relationship was stronger after the first surgery compared to the second surgery for the ADL (p < 0.01), emotional well-being (p < 0.05), and cognition subscores (p < 0.001; Supplementary Table 3).
There were a few other notable dependencies of our model variables on the change in PDQ after staged bilateral surgery (Supplementary Tables 3 and 4). Higher baseline tremor scores were associated with less improvement in emotional well-being after the first surgery (p < 0.01). Higher baseline tremor scores were also associated with more postoperative improvement in the ADL subscore (p < 0.05), independent of the first vs the second surgery and of target. Finally, independent of the surgery, the ADL subscore further depended on UPDRS-II, in which higher scores were associated with postoperative improvement (p < 0.01).

DISCUSSION
Although DBS alleviates motor symptoms associated with PD, the long-term effect of DBS on QoL has not been established, largely because long-term longitudinal QoL data after DBS are scarce. Furthermore, QoL data comparing targets (i.e., STN vs GPi) are lacking across different types of implantations (i.e., unilateral and staged bilateral). In addition, the effect of using a staged bilateral approach is less understood since most centers have traditionally performed simultaneous implantations. Addressing these gaps in knowledge, we found that STN-DBS offered a short-term QoL benefit superior to GPi-DBS for unilateral surgery, but with no long-term differences. The data also showed that the majority of overall QoL improvement after staged bilateral DBS was explained by the first lead.
In our unilateral outcome analyses, PDQ-39SI significantly improved regardless of brain target, but more improvement was seen with STN implantation (Table 3). Although improvement varied across subscores, PDQ generally improved in the short-term and stabilized in the long-term. No short-term or long-term improvements were found in the domains of communication and social support (Fig. 2). This is consistent with literature suggesting that speech does not typically respond to DBS 18-21. Furthermore, the microlesion effect from surgery may impair verbal fluency 22-24, and there remains ongoing progression of PD after DBS 25. A lack of any significant change in the social support subscore may reflect our extensive preoperative neuropsychological screening, and the availability or quality of social support 26.
Beyond the choice of brain target, several covariates significantly and independently affected postoperative PDQ outcomes after unilateral DBS. More LEDD use was associated with improvements in mobility and ADL subscores, which likely capture levodopa-responsive motor symptoms. Given the debilitating effects of tremor, particularly on ADLs and on stigma 27,28, it was unsurprising that higher baseline tremor scores were associated with postoperative improvements for these subscores. A more unexpected result after unilateral DBS was that higher baseline PIGD scores were associated with less postoperative ADL improvement in the STN cohort, but more improvement in the GPi cohort. It is known that the ADL subscore is impacted by gait function 29. DBS outcomes on gait have been highly heterogeneous 19,30 and our results suggest that this could be in part due to differences across specific outcomes (e.g., different QoL domains), as well as a more complex interdependency between brain target and baseline patient status. Similarly, for mobility PDQ outcomes, patients with higher mobility subscores at baseline improved more after DBS. This may reflect a common observation in this study that patients with more motor deficits at baseline may perceive a larger additive benefit of DBS on their motor symptoms 31.
In comparing these results to established literature, in a cohort of unilateral STN- and GPi-DBS patients Zahodne and colleagues 8 reported improvements at 6 months in PDQ-39SI and in the subscores mobility, ADL, emotional well-being, stigma, cognition, and bodily discomfort, and these were overall similar to our results. However, for each target, they reported significant improvements in mobility, ADL, social support, and stigma for the GPi group, whereas the STN group only improved in stigma. In contrast, we found that the STN was superior in improving PDQ-39SI; however, our analyses were long-term. In addition, Zahodne et al. performed a randomized controlled trial, whereas this was a retrospective study. We did, however, attempt to minimize potential selection biases by controlling for many confounding clinical factors that differed at baseline.
Within the staged bilateral implantation cohort, after controlling for baseline PDQ scores, dopamine responsiveness, LEDD, UPDRS-II, tremor score, Berg balance scale, PIGD, and rigidity score, there was a significant, independent effect of time on postoperative PDQ-39SI. To our knowledge, this is the first study characterizing a larger overall improvement in PDQ-39SI from the first compared to the second surgery for staged bilateral patients. The second DBS lead provided sustained PDQ-39SI scores up to 12 months after surgery, followed by progressive worsening (Fig. 1c).
The first surgery may lead to larger improvements in QoL for several reasons (Fig. 1b, Fig. 3). This could be related to postoperative programming or med optimization that occurs between the two surgeries, though we included LEDD as a covariate in this model. Future studies could incorporate DBS programming settings or similar proxy measures, such as time until DBS optimization. Given the subjective nature of PDQ-39, it is also plausible that patient expectations differed from the first to the second surgery, and this could be an intriguing area for future studies 32,33. The effect is likely not explained by asymmetric symptoms with respect to DBS laterality, since our analyses controlled for possible differences in contralateral motor scores between the first and second surgery. Emerging literature also has demonstrated bilateral effects from unilateral stimulation, suggesting the possibility for clinical improvement ipsilateral to the implanted lead 34-39. Though our study cannot ascertain the causal explanation for this phenomenon, this result calls for careful consideration of the necessity of a second lead, especially in patients with asymmetric symptoms, and highlights the importance of a careful risk-benefit analysis.
Within the literature, QoL outcomes after bilateral DBS have been characterized in randomized controlled trials, though not separated by first vs second lead implantation. In Follett and colleagues' trial, QoL improved in six out of the eight subscales at 24 months after STN- or GPi-DBS, with communication worsening in both cohorts 6. In our data, no statistical improvements were observed at 24 months in either the STN or GPi after the second lead implantation (Fig. 4). This difference in outcomes may be related to separating the first and second lead implantations. In the 36-month outcomes of the same study, patients were worse compared to baseline in only social support and cognition subscores 16. In our data, across targets, patients were worse in the mobility, emotional well-being, and communication subscores. In ADL, patients in the GPi cohort worsened as well at 36 months. In another randomized controlled trial comparing STN and GPi outcomes at 12 and 36 months, there were no between-group differences at either time point, which our results also demonstrated with the exception of the significant difference at 60 months in the emotional well-being subscore; however, the authors did not report individual PDQ subscores 14,40. Overall, our data corroborate and extend prior literature findings.
All baseline PDQ scores except the communication subscore significantly affected postoperative change in PDQ. This finding was also stronger in the first vs the second surgery for the ADL, emotional well-being, and cognition subscores. This reinforces the need to control for baseline states in evaluating potential postoperative changes after DBS, and the importance of evaluating postoperative change with percentage change. As expected, patients with a worse baseline state improved more 31, with the exception of communication, which could reflect the well-known lack of improvement in speech function from DBS 41. Similar to unilateral surgery, as expected, there was an important influence of LEDD on PDQ outcomes after staged bilateral DBS. We found that for bilateral surgery, increasing LEDD use was associated with less improvement in the first vs the second surgery for total PDQ-39SI, ADL, and mobility subscores, as well as for the STN vs the GPi cohorts for total PDQ-39SI. This effect was also found in the emotional well-being and communication subscores independent of target or the first vs second surgery. These effects could reflect a situation in which people with suboptimal DBS improvement tended to require more rescue LEDD as time progressed, and these individuals may have been more likely to progress to a contralateral implantation. Clearly, there is a complex interplay between DBS and LEDD, and to fully disentangle their relationship, a statistical model could include change scores for all covariates at all follow-ups to better track disease severity alongside changes in therapy.
The ADL subscore had several notable dependencies, specifically UPDRS-II and tremor scores. A higher baseline score for both variables was associated with postoperative improvements in the ADL subscore. The ADL subscore is correlated with UPDRS-II and UPDRS-III tremor scores 42,43; thus, this outcome likely represents an improvement of ADL alongside lessening of tremor severity. Within the cognition subscore, more dopamine responsiveness was associated with postoperative worsening. The cognition subscore is more related to depression than to cognitive functioning measured through neuropsychological exams 44, and higher levodopa has been associated with a worsening of depression 45. Finally, higher tremor scores were associated with less improvement in emotional well-being after the first surgery compared to the second. The emotional well-being subscore has been linked to mood measures, such as anxiety, depression, and apathy 44; thus, this effect may stem from the functional impairments of tremor and their effects on mood 46.
There are several limitations associated with these analyses. First, this was a retrospective study and data were limited to the scope of what was entered into the University of Florida (UF) INFORM database. However, our data are representative of real-life outcomes and aimed to characterize current standard of care. The data also had a high dropout rate across time, which could influence our results compared to a completed dataset. We elected not to impute missing values because imputation could have introduced inaccurate data into the model. Additionally, a UF-specific selection bias may have impacted STN vs GPi group effects. However, we controlled for a selection bias by including many baseline covariates. The way in which we binned and analyzed the data could also influence the results of the mixed model (Supplementary Fig. 1). Furthermore, it may be difficult to directly compare our results to the majority of studies examining DBS outcomes, which use change (as opposed to percent change) in PDQ scores; however, as discussed, we opted to use percent change given the variability in baseline PDQ. Finally, our paper did not aim to directly explain why QoL changes after DBS surgery beyond associations of target and surgery type. Future studies could therefore develop statistical models using all covariates across time to better explain factors contributing most to QoL.

METHODS
Study subjects
Data were retrospectively collected following Institutional Review Board approval to access the UF INFORM database from the Norman Fixel Institute for Neurological Diseases (IRB #201901807). All participants in the INFORM database provided written informed consent. PDQ-39 is routinely given for completion before clinical DBS programming visits. Inclusion criteria were unilateral and staged bilateral PD patients diagnosed by a movement disorders-trained neurologist, undergoing STN- or GPi-DBS. Staged bilateral cases were defined as bilateral lead implantations specifically in opposite hemispheres, in the same target, and on two different days. At UF, patients undergo a detailed risk-benefit profile assessment by an interdisciplinary team to determine their DBS candidacy 47,48. Outcomes for bilateral patients were assessed for the first and second surgery, in which both interventions had different baseline assessments. Patients with only unilateral implantations were not included within the first bilateral implantation cohort. Due to separate analyses examining the effect of the first and second lead, and given that some patients lacked baseline scores for one but not both surgeries, different patients could be included in these two groups.
[...] cognition (f), communication (g), and bodily discomfort (h) subscores after staged bilateral first surgery. The number of patients included in each data point across time is listed at the bottom of each graph. For visualization purposes, bins with fewer than five data points are removed. Blue (GPi) or black (STN) stars (*p < 0.05; **p < 0.01) represent significant improvements in PDQ scores compared to baseline values, whereas gray stars with a line underneath represent significant differences between the STN and GPi at the specified time point. If a target was not significant within the mixed model, STN and GPi were pooled together, and compared to baseline, represented by a gray star without a line. These values were corrected for multiple comparisons using false discovery rate correction.
Baseline PDQ scores for all surgeries were defined as the score nearest the date of surgery, but no more than 12 months before the surgery. Only complete PDQ-39 questionnaires were included, and no data in this study were imputed. For bilateral cases, the baseline scores for the second lead implantation were obtained after the date of the first surgery. Similarly, follow-up assessments for the first lead were only considered if they occurred prior to the second lead implantation. Patients' data were available up to 5 years after lead implantation.
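The baseline rule above can be illustrated with a minimal sketch. It is not the authors' code: the function name `select_baseline`, the `(date, score)` tuple layout, the `not_before` parameter, and the approximation of 12 months as 365 days are all assumptions made for illustration.

```python
from datetime import date

# Assumption: the 12-month window is approximated as 365 days; the text does
# not state how months were converted to days.
MAX_DAYS_BEFORE_SURGERY = 365


def select_baseline(assessments, surgery_date, not_before=None):
    """Return the (date, score) pair nearest to, and not after, the surgery.

    assessments: iterable of (date, score) tuples (hypothetical structure).
    not_before:  optional lower bound, e.g. the first-surgery date when the
                 baseline for a second lead is being selected.
    Returns None when no assessment falls inside the allowed window.
    """
    eligible = []
    for when, score in assessments:
        days_before = (surgery_date - when).days
        # Skip assessments taken after surgery or more than 12 months before it.
        if days_before < 0 or days_before > MAX_DAYS_BEFORE_SURGERY:
            continue
        # For a second lead, the baseline must come after the first surgery.
        if not_before is not None and when <= not_before:
            continue
        eligible.append((days_before, when, score))
    if not eligible:
        return None
    _, when, score = min(eligible)  # smallest gap to the surgery date
    return when, score
```

Passing the first-surgery date as `not_before` mirrors the requirement that second-lead baselines were obtained after the first surgery; whether a same-day assessment counts as a baseline is an assumption of this sketch.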
The following potential confounding baseline parameters were included in the analysis: age at surgery, age at disease onset, disease duration, gender, LEDD, UPDRS-I, UPDRS-II, off-med UPDRS-III, contralateral off-med UPDRS-III rigidity score (referred to as rigidity score throughout), contralateral off-med UPDRS-III tremor score (referred to as tremor score throughout), contralateral off-med UPDRS-III bradykinesia score, UPDRS-III off-med PIGD score (sum of questions 27-30), Berg balance scale 49, timed up and go 50, percent improvement from off- to on-med UPDRS-III (referred to as dopamine responsiveness throughout), mini-mental state examination (MMSE) 51, swallowing quality of life (SWAL-QOL) 52-54, Beck Anxiety Inventory (BAI) 55, Beck Depression Inventory-II 56, and baseline PDQ scores. We defined dopamine responsiveness as the difference between on-med and off-med UPDRS-III scores, divided by the off-med score. Therefore, negative values indicate improvement from medication (i.e., more dopamine responsiveness). UPDRS scores after DBS implantation were completed off-med and off-stimulation. Overall, these baseline measures were not available for all study participants (Tables 1 and 2).
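Because the sign convention for dopamine responsiveness is easy to misread, a short worked sketch may help. The function name and the example scores are hypothetical; the formula follows the definition given above, expressed as a fraction (multiplying by 100 would give a percentage).

```python
def dopamine_responsiveness(on_med_updrs3, off_med_updrs3):
    """(on-med - off-med) / off-med, as defined in the text.

    Lower UPDRS-III scores mean milder motor symptoms, so a negative result
    indicates improvement from medication (more dopamine responsiveness).
    """
    if off_med_updrs3 == 0:
        raise ValueError("off-med score of zero leaves the ratio undefined")
    return (on_med_updrs3 - off_med_updrs3) / off_med_updrs3


# Hypothetical patient: off-med score 40, on-med score 20.
example = dopamine_responsiveness(20, 40)  # -0.5, i.e. a 50% improvement
```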

Data analysis
Changes from the baseline values were calculated in 4-month bins. Bins were centered at peaks in the distribution of the frequency of retrospective postoperative data (Supplementary Fig. 1). If a patient had multiple follow-up values within a bin, we used the mean of the PDQ scores. For the PDQ-39SI score, we used percent change from baseline, whereas for PDQ subscores, we used the difference from baseline, thus preventing undefined values from zeros present at baseline, which was not an issue encountered with PDQ-39SI.
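The binning and change-score steps described above could be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the bin centers in `BIN_CENTERS` are hypothetical placeholders (the actual centers come from Supplementary Fig. 1), all function names are invented, and the sign convention (positive values mean improvement, since lower PDQ scores indicate better QoL) is assumed.

```python
from statistics import mean

# Hypothetical bin centers in months; the real centers were placed at peaks
# of the follow-up distribution and are not listed in the text.
BIN_CENTERS = (4, 12, 18, 24, 36, 48, 60)
BIN_WIDTH = 4.0  # 4-month bins, i.e. center +/- 2 months


def assign_bin(months_after_surgery):
    """Return the center of the bin containing this follow-up, or None."""
    for center in BIN_CENTERS:
        if abs(months_after_surgery - center) <= BIN_WIDTH / 2:
            return center
    return None


def binned_means(followups):
    """Average multiple scores from one patient that fall in the same bin.

    followups: iterable of (months_after_surgery, score) tuples.
    """
    grouped = {}
    for months, score in followups:
        center = assign_bin(months)
        if center is not None:
            grouped.setdefault(center, []).append(score)
    return {center: mean(scores) for center, scores in sorted(grouped.items())}


def percent_improvement(baseline, followup):
    """Percent change for PDQ-39SI; positive means improvement (assumed sign)."""
    if baseline == 0:
        raise ValueError("percent change is undefined for a zero baseline")
    return 100.0 * (baseline - followup) / baseline


def improvement_difference(baseline, followup):
    """Plain difference for subscores, which stays defined at a zero baseline."""
    return baseline - followup
```

The two change functions reflect the design choice stated in the text: percent change for the summary index, and a simple difference for subscores, where a baseline of zero would otherwise make the ratio undefined.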

Statistical analysis
Potential differences in baseline scores across the groups were evaluated with unpaired t-test, Mann-Whitney U, or chi-squared analyses when appropriate. Normality was assessed using Shapiro-Wilk tests. Significant changes at follow-up were analyzed using mixed models to fully use the available data 57. A term in the model for the random effect of each subject further allowed us to control for individual variability. We treated time as categorical bins since the time-dependent effect of DBS on PDQ may be nonlinear. Models were fitted using a restricted maximum likelihood estimation approach. For unilateral surgery, we tested for effects of target, time after surgery, and their interaction. For bilateral cases, the effects of target, time after surgery, first or second lead implantation, and their respective interactions were computed. Additionally, covariates stemming from baseline differences were included within mixed models when necessary to address confounding factors. The change in LEDD from baseline at each follow-up was included as a covariate in every model. For significant interactions, estimated marginal means are provided in Supplementary Table 3. Significance was defined as p-values less than or equal to 0.05. All reported p-values are adjusted with false discovery rate correction unless otherwise specified. PDQ scores more than three standard deviations away from the mean were removed from each bin.
Fig. 4 Change in subscores of PDQ after staged bilateral second surgery. Change from baseline (µ ± 2 × s.e.m.) within the mobility (a), ADL (b), emotional well-being (c), stigma (d), social support (e), cognition (f), communication (g), and bodily discomfort (h) subscores after staged bilateral second surgery. The number of patients included in each data point across time is listed at the bottom of each graph. For visualization purposes, bins with fewer than five data points are removed. Blue (GPi) or black (STN) stars (*p < 0.05; **p < 0.01) represent significant improvements in PDQ scores compared to baseline values, whereas gray stars with a line underneath represent significant differences between the STN and GPi at the specified time point. If a target was not significant within the mixed model, STN and GPi were pooled together, and compared to baseline, represented by a gray star without a line. These values were corrected for multiple comparisons using false discovery rate correction.
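The statistical analysis reports p-values adjusted with false discovery rate correction but does not name the procedure. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up adjustment, which is one common choice; that the study used this particular procedure is an assumption, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four comparisons against baseline.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
# With the threshold stated in the text (adjusted p <= 0.05), only the first
# comparison remains significant in this made-up example.
significant = [p <= 0.05 for p in adjusted]
```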