Subjective estimates of uncertainty and volatility during gambling predict impulsivity after subthalamic deep brain stimulation for Parkinson’s disease

Subthalamic deep brain stimulation (DBS) for Parkinson’s disease (PD) may modulate chronometric and instrumental aspects of choice behaviour, including motor inhibition, decisional slowing, and value sensitivity. However, it is unknown whether subthalamic DBS affects more complex aspects of decision-making, such as estimating the uncertainty around the probability of obtaining rewarding outcomes and the tendency of this probability to change over time. In this study, 38 participants with PD played a slot-machine in a virtual casino prior to subthalamic DBS (whilst ‘on’ medication) and again, 3-months postoperatively (whilst ‘on’ stimulation). Gambling behaviour during game play reflected self-reported measures of impulsivity, as quantified by the Barratt Impulsiveness Scale. We fit several computational models, including a hierarchical model of decision-making in the presence of uncertainty (the Hierarchical Gaussian Filter, HGF) and a reinforcement learning model (based on the Rescorla-Wagner formalism), to choices during slot machine play. The HGF was superior in accounting for the behaviour of our participants. Estimates of the perceptual model parameters, which encoded a participant’s uncertainty regarding the winning probability of the slot machine and its volatility, were significantly associated with impulsivity. Moreover, preoperative parameter estimates enabled significant out-of-sample predictions of the maximum postoperative change in impulsivity during longitudinal follow up. Our findings suggest that impulsivity in PD patients may be underpinned by uncertainty, and implicate a role for the subthalamic nucleus in the modulation of outcome certainty.


A paradigmatic approach to inference and learning under uncertainty uses Bayes' theorem to understand how prior knowledge (represented as a probability distribution known as the prior) is combined with new information from the environment (the likelihood) in order to update beliefs (the posterior). To obtain the posterior, a Bayesian agent inverts a 'generative' model (that describes how noisy sensory data result from environmental states); this corresponds to perception. Inferring environmental states from noisy sensory data allows the agent to plan actions that take into account the uncertainty of the environment. 24 Human behaviour often closely resembles that of Bayesian agents, for example, during low-level sensory processing, 25,26 sensorimotor learning, 27,28 and higher-level reasoning, 29 although approximations to ideal Bayesian inference are likely required for most domains of cognition. 30-32 Critically, a Bayesian perspective can accommodate multiple forms of uncertainty, beyond sensory noise. For example, the agent's environment might change over time. In order to account for this higher-order uncertainty (or volatility), Bayesian agents are able to modulate the rate at which they learn (update their beliefs). This learning rate can be linked to an agent's encoding of volatility. 33-35 For instance, in more volatile environments, estimates of uncertainty (and thus learning rate) should be higher so that more emphasis is given to very recent information; at the same time, predicting the longer-term consequences of actions becomes more difficult. This link between uncertainty and decision-making may be of crucial importance for impulsivity. 23 Furthermore, individual differences in approximate Bayesian inference plausibly contribute to inter-individual variability in behaviour. Such differences can be quantified using models with subject-specific parameters. 34 For example, individual differences could concern the estimation of environmental volatility 36 or the formation of unusually confident or 'precise' beliefs. 37
Here, we applied a hierarchical Bayesian model (the Hierarchical Gaussian Filter, HGF) to behavioural data from 38 participants with PD who played a virtual casino before and after subthalamic DBS. By allowing participants to vary their bet size, switch between slot machines and place 'double or nothing' bets, we could estimate how participants not only inferred the trial-by-trial probability of winning, but also updated higher-order beliefs about the likelihood of a machine running 'hot' or 'cold', and the fluctuation of machines between these states. These model-based estimates afford us an individual profile of how each participant represented (and responded to) environmental uncertainty. We assessed these findings against standard measures of impulsivity derived from clinical assessment and questionnaires, focusing our attention on self-reported impulsivity as measured by the Barratt Impulsiveness Scale (BIS). Our computational analysis examines how DBS changes the manner in which persons with PD engage in Bayesian belief updating, and whether subject-specific estimates of uncertainty and volatility prior to DBS may predict impulsivity postoperatively.

Subjects
Prior to the commencement of data collection, the full protocol was approved by the Human Research Ethics Committees of the Royal Brisbane & Women's Hospital, the University of Queensland, the QIMR Berghofer Medical Research Institute and UnitingCare Health. All research was performed in accordance with relevant guidelines and regulations. All participants gave written, informed consent to participate in the study.
Thirty-eight participants with PD undertaking STN-DBS were consecutively recruited at the Asia-Pacific Centre for Neuromodulation in Brisbane, Australia. All participants met the UK Brain Bank criteria for PD. 38 No participants met the Movement Disorder Society criteria for dementia. 39 The PD subtype and the Hoehn and Yahr stage at device implantation were recorded. 40 Patients underwent bilateral implantation of Medtronic 3389 or Boston Vercise electrodes in a single-stage procedure.
Stimulation was commenced immediately using microelectrode recording data to identify the optimal contact. Further contact testing took place over the following week as an inpatient, with participants returning to the DBS centre following discharge for further stimulation titration, guided by residual motor symptoms. Further details have previously been reported. 41,42

Neuropsychiatric Assessment
Impulsivity amongst participants was assessed with patient and clinician-rated instruments prior to STN-DBS and subsequently 2 weeks, 6 weeks, 13 weeks and 26 weeks postoperatively (see Supplementary Figure 1 for a study flowchart). A range of measures were obtained, to account for the fact that impulsivity is not a unitary construct. These included the Barratt Impulsiveness Scale 11 (BIS) and second-order factors attentional, motor and non-planning; 18 the Questionnaire for Impulsive-Compulsive disorders in PD Rating Scale (QUIP-RS); 43 the delay discounting task; 44 the Excluded letter fluency task (ELF); 45

Design and Setting
Participants completed the experimental task prior to DBS and at 13 weeks post-DBS. Participants were 'on' medication and stimulation for all assessments. We opted against a counterbalanced 'off' and 'on' DBS assessment at the same visit for several reasons. First, our aim was to provide a naturalistic insight into the subtle behavioural changes that emerge as patients transition from dopaminergic therapies to subthalamic stimulation; changes in levodopa equivalent daily dose were included as covariates in our analyses. Second, our experience is that many patients would not tolerate the DBS 'off' state without severe discomfort. Third, despite allowing DBS washout, plastic network effects of chronic DBS may persist and contaminate findings in an on-off design.

Task
We employed a modified version of an established slot machine gambling paradigm validated in healthy controls. 19 Subjects read an instruction screen and played through 5 training trials, after which they entered a 'virtual' casino, starting with 2000 AUD available to gamble and playing 100 trials (Figure 1). The win-loss likelihood of the slot machines was predetermined and changed at regular intervals. On completion of the task, participants received a small monetary reward proportional to their total winnings. The naturalistic gambling task allows for risk-taking and impulsive behaviour to be expressed and offers four actions on each trial, each of which reflects exploration and, thereby, risk-taking.
... option. If the subject elects to engage in this gamble, a card flips over revealing the result, and subjects are taken to the next trial. If the subject does nothing, or decides not to gamble, he or she is taken to the next trial. The trajectory of win-loss outcomes is fixed, ensuring comparable inference on perceptual and response parameters across participants.

The Hierarchical Gaussian Filter (HGF)
The HGF is a hierarchical Bayesian model 34 Bayesian inference (of an agent). 24 An informal description is given below and a formal summary is provided in the Supplementary Material.

The Perceptual Model
The HGF is used to infer how an individual subject learns about hierarchically-coupled environmental quantities under different forms of uncertainty (including volatility). In our case, the lowest level of the HGF, $x_1$, represents the trial-wise binary outcome (win or loss) in the slot machine. This derives from a sigmoid transformation of $x_2$, representing probability in logit space (i.e., whether the machine is currently "hot" or "cold"). $x_2$ evolves as a Gaussian random walk whose step size is a function $f_2(x_3)$ of a third-level variable, $x_3$, which performs a Gaussian random walk of its own.
... Bernoulli distribution, around the probability of win or loss, $x_2^{(k)}$. In turn, $x_2^{(k)}$ is modelled as a Gaussian random walk, whose step size is governed by a combination of $x_3^{(k)}$, via coupling parameter $\kappa$, and a tonic volatility parameter $\omega$. $x_3^{(k)}$ also evolves as a Gaussian random walk over trials, with step size $\vartheta$ (meta-volatility). In this investigation, after observing trial-wise outcomes (win or lose), the gambler updates her belief about the probability of win on a given trial k ($x_2^{(k)}$), as well as how swiftly that slot machine is moving between being 'hot' (high probability of win) or 'cold' (low probability of win) ($x_3^{(k)}$). On any trial, the ensuing beliefs then provide a basis for the gambler's response, which may be to increase the bet size, 'double up' after a win, switch to a new slot machine or leave the casino.
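As an illustration of the generative structure described informally above, the following minimal sketch samples one trajectory from a three-level binary hierarchy. It is not the authors' implementation: the function name, the parameter names (`kappa` for the coupling parameter, `omega` for tonic volatility, `theta` for meta-volatility), the default values and the treatment of the step sizes as variances are all assumptions made for this example.

```python
import numpy as np

def simulate_hgf_binary(n_trials=100, kappa=1.0, omega=-2.0, theta=0.05,
                        x2_init=0.0, x3_init=0.0, seed=0):
    """Sample wins/losses from a three-level hierarchy (illustrative values only)."""
    rng = np.random.default_rng(seed)
    x1 = np.zeros(n_trials, dtype=int)   # binary outcome: 1 = win, 0 = loss
    x2 = np.zeros(n_trials)              # win probability in logit space
    x3 = np.zeros(n_trials)              # (log-)volatility of level 2
    prev2, prev3 = x2_init, x3_init
    for k in range(n_trials):
        # Level 3: Gaussian random walk; theta is assumed to be the step variance.
        x3[k] = prev3 + np.sqrt(theta) * rng.standard_normal()
        # Level 2: Gaussian random walk with step variance exp(kappa * x3 + omega).
        step_var = np.exp(kappa * x3[k] + omega)
        x2[k] = prev2 + np.sqrt(step_var) * rng.standard_normal()
        # Level 1: Bernoulli draw with probability sigmoid(x2); clipping avoids overflow.
        p_win = 1.0 / (1.0 + np.exp(-np.clip(x2[k], -30.0, 30.0)))
        x1[k] = int(rng.random() < p_win)
        prev2, prev3 = x2[k], x3[k]
    return x1, x2, x3
```

Larger values of `theta` make the volatility itself drift faster, which in turn lets the win probability swing between 'hot' and 'cold' states more abruptly.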

The Response Model
The response model maps a subject's beliefs (obtained by inverting the perceptual model under given parameter values) to observed gambling behaviour. Here, we use a sigmoidal response model; 34 if this function is steep, there is a close relationship between current perceptual beliefs and betting behaviour. Conversely, a gentler sigmoidal slope results in a lower-precision belief-behaviour mapping. This response function has a parameter, $\beta$, the decision 'temperature', which determines the steepness of the sigmoid and thus the degree of stochasticity in the belief-to-choice mapping. In this paper, we test the following two variants of this response model:
(i) 'Standard' HGF: $\beta$ is a free parameter, i.e., the mapping from beliefs to behaviour is fixed across the experiment. This parameter is estimated for each subject.
(ii) 'Uncertainty-driven' HGF: $\beta = 1/\sigma_2^{(k)}$, where $\sigma_2^{(k)}$ is the variance of the inferred probability of win on trial k. That is, the response behaviour dynamically adapts to the precision of the subject's belief about the hotness or coldness of the machine.
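The two belief-to-choice mappings can be sketched as follows. The text specifies only that the response model is sigmoidal; the unit-square sigmoid used here, the choice of the predicted win probability as its input, and all function and argument names are assumptions for illustration.

```python
import numpy as np

def unit_square_sigmoid(m, beta):
    """Probability of a risk-taking action given belief m in (0, 1) and steepness beta."""
    m = np.clip(m, 1e-9, 1.0 - 1e-9)  # keep the powers well defined at the edges
    return m**beta / (m**beta + (1.0 - m)**beta)

def standard_response(m, beta):
    # Fixed mapping: one subject-specific steepness applied on every trial.
    return unit_square_sigmoid(np.asarray(m, dtype=float), beta)

def uncertainty_driven_response(m, sigma2):
    # Dynamic mapping: steepness on each trial is the inverse of the belief variance.
    beta_k = 1.0 / np.asarray(sigma2, dtype=float)
    return unit_square_sigmoid(np.asarray(m, dtype=float), beta_k)
```

With a steepness of 1 the choice probability simply equals the belief; as the belief variance shrinks, the uncertainty-driven variant pushes the same belief towards a near-deterministic choice.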

Perceptual Variable
Based on previous work that examined different computational models of our slot machine paradigm, 19 the perceptual variable used here was simple: a binary variable in which wins were represented by 1 and losses by 0.

Response Variable
The response variable is a binary representation of actions associated with risk taking. It is constructed using a logical OR operator on four choices during the slot machine paradigm: bet increases, machine switches, double-ups and casino switches. For each trial, the response variable takes a 1 when any of these four events occur, and 0 otherwise. For details, please see Supplementary Table 1.
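The construction of the response variable can be sketched as below, assuming each of the four choices has already been coded as a per-trial 0/1 indicator; the function and argument names are hypothetical.

```python
import numpy as np

def risk_response(bet_increase, machine_switch, double_up, casino_switch):
    """Return 1 on trials where any of the four risk-related choices occurred, else 0."""
    indicators = [np.asarray(a, dtype=bool)
                  for a in (bet_increase, machine_switch, double_up, casino_switch)]
    # Element-wise logical OR across the four indicator vectors.
    return np.logical_or.reduce(indicators).astype(int)
```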

Reinforcement Learning
As an alternative model, we used a classical associative learning model, Rescorla-Wagner (RW), often used in reinforcement learning (RL). 53 The RW model updates the probability of a win on trial k by combining the probability on trial k-1 with a prediction error (PE) weighted by a constant learning rate. Hence, in contrast to the HGF, the RW model does not have a dynamic learning rate over trials, nor can it account for different forms of perceptual uncertainty. Here, we combine the RW learning rule with the same sigmoidal response model described above, with free parameter $\beta$, that we estimate on a subject-specific basis. This results in a model that is (i) structurally similar to, but less complex than, the HGF and (ii) almost identical to the RL model used in a prior investigation of learning after STN-DBS. 17
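The RW learning rule described above can be written in a few lines. This is a sketch only: the function name, the learning-rate name `alpha` and the initial value of 0.5 are assumptions, not values taken from the study.

```python
def rescorla_wagner(outcomes, alpha, v0=0.5):
    """Return the predicted win probability before each trial and the final estimate."""
    v = v0
    predictions = []
    for u in outcomes:              # u is 1 for a win and 0 for a loss
        predictions.append(v)
        prediction_error = u - v
        v = v + alpha * prediction_error   # constant learning rate on every trial
    return predictions, v
```

Because `alpha` never changes, a surprising outcome late in the task moves the estimate exactly as much as the same surprise early on, which is the contrast with the HGF drawn in the text.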

Model Inversion
The HGF and RW models were inverted using population Markov-Chain Monte Carlo (MCMC) sampling. 54 Parameter estimation in the HGF is classically 'fully Bayesian' and requires a selection of priors, which influence parameter estimation to a lesser or greater degree. In order to minimise this influence, we used a novel empirical Bayesian inference scheme for the HGF where a Gaussian group-level distribution of parameters is constructed from samples across the group. This group-level empirical prior is then used to obtain posterior parameter estimates in each subject (Supplementary

Model Comparison
As described above, we considered two competing hypotheses of how subjects might incorporate uncertainty into their choice of actions, i.e., two different belief-to-choice mappings in the response model for the HGF (the 'Standard' and 'Uncertainty-driven' models). These two versions of the HGF were compared with the RW model. As we were primarily interested in the pre-DBS to post-DBS change, we selected the winning model for the pre-DBS measurements. We then evaluated if the parameter estimates of that winning model changed postoperatively. Estimates of the negative free energy (log model evidence) were computed using thermodynamic integration. 54 The negative free energy balances goodness of fit with a complexity penalty. Group-level free energy estimates were compared to select a winning model.
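One simple way to compare two models at the group level from subject-wise log model evidence estimates is sketched below. It assumes a fixed-effects comparison in which log evidences are summed over subjects; the text does not state that this is how the group-level estimates were combined, and the function name is hypothetical.

```python
import numpy as np

def group_log_bayes_factor(log_evidence_a, log_evidence_b):
    """Group log Bayes factor of model A over model B under a fixed-effects assumption."""
    # Summing log evidences treats subjects as independent and governed by the same model.
    return float(np.sum(log_evidence_a) - np.sum(log_evidence_b))
```

Exponentiating the result gives the Bayes factor itself; a positive log Bayes factor favours the first model.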

General Considerations
All computational modelling and model inversion were performed using MATLAB (MathWorks). We were particularly interested in examining whether the computational characterisation of individual uncertainty estimates prior to DBS could predict clinically relevant changes in impulsivity at any time point after DBS. This follows the 'generative embedding' approach, in which individual predictions are not derived from measured data but from parameter estimates obtained by a generative model. 55,56 Importantly, the optimal BIS cut-off score for clinically significant impulsivity varies by age and disease, 57 with only one existing investigation specific to a PD cohort. 58 Therefore, we examined whether we could predict the maximum postoperative increase in impulsivity, as measured by the BIS, from each participant's parameter estimates obtained at baseline, using regression and cross-validation.
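Out-of-sample prediction of a clinical score from baseline parameter estimates can be sketched with a leave-one-out linear regression. The permutation-based null distribution, the use of the correlation between predicted and observed values as the test statistic, and all names below are assumptions for illustration and not a description of the authors' exact procedure.

```python
import numpy as np

def loo_predictions(X, y):
    """Predict each subject's score from a regression fitted on all other subjects."""
    design = np.column_stack([np.ones(len(y)), X])   # add an intercept column
    preds = np.empty(len(y))
    for i in range(len(y)):
        train = np.arange(len(y)) != i               # leave subject i out
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        preds[i] = design[i] @ coef
    return preds

def loo_permutation_test(X, y, n_perm=1000, seed=0):
    """Correlation of predicted vs observed scores, with a permutation p-value."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = np.corrcoef(loo_predictions(X, y), y)[0, 1]
    null = np.empty(n_perm)
    for j in range(n_perm):
        y_perm = rng.permutation(y)                  # break the predictor-outcome link
        null[j] = np.corrcoef(loo_predictions(X, y_perm), y_perm)[0, 1]
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value
```

In use, `X` would hold one row per participant (for example, baseline parameter estimates and a covariate) and `y` the maximum postoperative change in the score of interest.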

Cross-Validation
In order to evaluate the out-of-sample predictability of the maximum change in BIS by pre-DBS model parameter estimates, a leave-one-out cross-validation was performed. A null distribution was

Participant Characteristics
Participants were a predominantly middle-aged sample, with a bias towards male gender and akinetic-rigid/mixed phenotype over tremor (Table 1). Most participants had bilateral disease with consequent impairment of functioning in their activities of daily living.

Neuropsychiatric Assessment Pre- and Post-DBS
Concerning symptoms of primary interest (  Figure 3).
The BIS and the BDI showed a significant positive correlation at each time point ( =0.46, p=0.003; =0.53, p<0.001), and both showed (near-)significant changes from pre-to post-DBS (Table 2).
Therefore, to rule out that impulsivity-related findings were driven by changes in depression, the BDI was included as a covariate when regressing behaviour and model parameter estimates against BIS scores. Whilst the LEDD is conceivably related to impulsivity, it did not correlate with the BIS total we focused our attention on exploring impulsivity as measured by the BIS.

Gambling Behaviour Pre- and Post-DBS
At the group level, there were no significant differences in the behaviour of participants on the slot machine from pre- to post-DBS (Supplementary Table 3). Because subjects did not engage in the 'casino switch' option, this variable was eliminated from regression analyses.

Pre-DBS Regression of Gambling Behaviour on BIS scores
We studied the relationship between pre-DBS gambling behaviour and pre-DBS impulsivity as measured by the BIS (

Post-DBS Regression of Gambling Behaviour on BIS scores
The full model of postoperative gambling behaviour was also significantly associated with BIS total score [F(4,33)=4.920, p=0.003] (Table 3). Again, post-hoc t-tests revealed that no task behaviour was significant on its own. When subscales of the BIS were examined, gambling behaviour correlated significantly with the BIS Attentional subscale [F(4,33)=8.123, p<0.001]. Post-hoc t-tests revealed that higher bet sizes (t(37)=2.604, p=0.014) and more frequent double or nothing gambles (t(37)=2.589, p=0.014) corresponded to higher BIS Attentional scores (Supplementary Table 5).
Post-DBS, higher bets and more frequent machine switches were significantly associated with higher QUIP-RS scores (Supplementary Table 6). No other measures of impulsivity were significantly associated with pre- or post-DBS slot machine activity.

Computational Modelling
As described above, we were primarily interested in finding pre-DBS predictors for post-DBS changes in impulsivity. We therefore first determined, using Bayesian model comparison, which of our three models best explained pre-DBS behaviour, before evaluating whether the parameter estimates of this winning model changed postoperatively and predicted maximum postoperative BIS scores. Bayesian model comparison selected the 'standard' HGF (with a subject-specific decision temperature in the response model) as the winning model, with a group-level Bayes factor of approximately 11, compared to the next best model (the Rescorla-Wagner model) (Figure 3).

Changes in Model Parameter Estimates Pre- to Post-DBS
Estimates of both HGF model perceptual parameters $\omega$ and $\vartheta$ significantly increased postoperatively, implying larger subjective estimates of uncertainty (specifically, volatility) after DBS (Table 5 and

Pre-DBS Regression of Model Parameter Estimates on BIS scores
The full regression model (including the estimates of preoperative perceptual parameters $\omega$ and $\vartheta$) was significantly associated with BIS Total [F(3,34)=4.093, p=0.014] (Table 5). Post-hoc t-tests on model parameter estimates revealed that no single parameter was significantly related to the BIS.

Post-DBS Regression of Model Parameter Estimates on BIS scores
The full regression model was significantly associated with BIS Total [F(3,34)=10.183, p<0.001] (Table 5). Post-hoc t-tests revealed that $\omega$ was a significant regressor (t(34)=2.863, p=0.014). The positive regression coefficient for $\omega$ implied that the greater the subjective estimate of tonic volatility, the higher the BIS score.

Pre-DBS Model Parameter Estimates Predict Post-DBS Impulsivity
We next regressed the estimates of $\omega$ and $\vartheta$ at baseline against the maximum postoperative increase in impulsivity, as measured by the BIS. Pre-DBS parameter estimates, along with pre-DBS BDI, were significantly associated with the maximum postoperative increase in impulsivity [F(3,34)=3.235, p=0.034] (Table 5). Post-hoc t-tests revealed that was significantly related to the maximum change in BIS (t(37)=2.301, p=0.027). Notably, the BDI alone did not predict the maximum change in BIS

DISCUSSION
In this study, we employed a naturalistic gambling task and a hierarchical Bayesian model (for inference on subject-specific estimates of uncertainty) in order to investigate impulsive decision-making in patients with PD undertaking subthalamic DBS. We found that parameter estimates representing different forms of uncertainty (volatility) changed significantly from pre- to postoperative conditions. In particular, there was a significant increase in $\vartheta$, the parameter that captures a gambler's uncertainty about changes in the tendency of slot machines to oscillate between 'hot' and 'cold' states (meta-volatility). There was also a postoperative increase in a second parameter, $\omega$, that reflects a gambler's uncertainty about how quickly the probability of winning on a given slot machine was changing (volatility).
Notably, these model-based estimates of subjective uncertainty related to postoperative impulsivity. For example, the maximum postoperative increase in BIS was significantly associated with preoperative estimates. Most importantly, our model-based estimates also allowed for out-of-sample predictions: leave-one-out cross validation demonstrated that the regression model as a whole, but also individually, significantly predicted the maximum postoperative change in BIS ( Figure 5).
Increased estimates of uncertainty accelerate the rate of learning at higher hierarchical levels, which could engender maladaptive learning at lower levels of the hierarchy. A high learning rate means suppressing top-down expectations, and may impair learning about probabilistically aberrant events.
This offers a parallel but computationally distinct account of the stimulation-related learning changes described previously, 17 in which reduced positive and negative instrumental outcome sensitivity was reported as a consequence of neurostimulation. Similar to prior work, we also find a positive relationship between model-based estimates of uncertainty and impulsivity. 19,20,22,23 A plausible computational account of impulsivity is that high subjective uncertainty leads to lack of predictability and increases a tendency for exploration. In a non-surgical population, persons with PD withdrawn from medication display a characteristic impairment in reward learning and may show enhanced punishment sensitivity. 65 However, whilst dopamine replacement enhances the ability to learn from positive outcomes, learning from negative outcomes is impaired. 65 From the HGF perspective, an agent with increased uncertainty at higher levels would be expected to show both decreased reward and punishment learning, as surprise to both positive and negative unexpected outcomes would be reduced (see Lawson et al, 2017). 66 However, if LEDD reduction is a principal driver of a change in behaviour, then a selective impairment in positive outcome representation would be observed. This suggests that LEDD changes may have a secondary role, but further careful experiments will be necessary to address this question.
We did not observe significant correlations between behaviour or parameters inferred from slot machine play with other estimates of impulsivity including the excluded letter fluency task, the Hayling test and the delay discounting task. This reflects the compound nature of inhibitory control, which may implicate discrete subcortical and cortical regions and may evidence differential patterns of expression amongst impulsive endophenotypes. 67,68 For example, the Hayling and ELF tasks are more commonly included amongst measures of task-switching and conflict interference, whilst the delay discounting task assesses impatience. Alternative paradigms may be required to capture participant-wise behaviour amongst these constructs.
We note that the lack of a counterbalanced on-off stimulation design is a potential limitation of our study but suggest that our longitudinal design is more reflective of the natural clinical course taken by persons with PD in the clinic. Our participants simply would not have tolerated an extended DBS washout and we hypothesise that the younger age of participants in the study of Seymour et al. may have facilitated their crossover design. 17
In summary, this study suggests that hierarchically related forms of uncertainty (volatility and meta-volatility) change after subthalamic DBS in PD. Our results demonstrate that a naturalistic assessment of gambling behaviour in a virtual casino is useful for investigating impulsivity in PD: behavioural measures on this task and model-based estimates of subjective uncertainty (volatility) related significantly to BIS scores, both pre- and postoperatively (Tables 3, 5). Increased uncertainty about environmental volatility may be maladaptive, as informed predictions about the world are necessary for learning from errors about trial-wise outcomes. We therefore posit a cognitive mechanism for the genesis of impulsive behaviour in this population. Finally, the potential of our model to predict changes in postoperative impulsivity from game play (Figure 5) could be most valuable in PD, given the significant, but poorly quantified, risks relating to surgical (neurostimulation) and medical (dopamine agonist) treatments. If those at a higher risk of neuropsychiatric harm could be identified, this would improve the nature of treatment choice and informed consent and the effectiveness of clinical follow-up.

ACKNOWLEDGEMENTS
The authors gratefully acknowledge the commitment of patients and caregivers who contributed their time to this study. The authors acknowledge the ongoing support of St Andrew's War Memorial Hospital and the Herston Imaging Research Facility.

Author Contribution Statement
Paliwal: Task