Subjective estimates of uncertainty during gambling and impulsivity after subthalamic deep brain stimulation for Parkinson’s disease

Subthalamic deep brain stimulation (DBS) for Parkinson’s disease (PD) may modulate chronometric and instrumental aspects of choice behaviour, including motor inhibition, decisional slowing, and value sensitivity. However, it is not well known whether subthalamic DBS affects more complex aspects of decision-making, such as the influence of subjective estimates of uncertainty on choices. In this study, 38 participants with PD played a virtual casino prior to subthalamic DBS (whilst ‘on’ medication) and again, 3-months postoperatively (whilst ‘on’ stimulation). At the group level, there was a small but statistically significant decrease in impulsivity postoperatively, as quantified by the Barratt Impulsiveness Scale (BIS). The gambling behaviour of participants (bet increases, slot machine switches and double or nothing gambles) was associated with this self-reported measure of impulsivity. However, there was a large variance in outcome amongst participants, and we were interested in whether individual differences in subjective estimates of uncertainty (specifically, volatility) were related to differences in pre- and postoperative impulsivity. To examine these individual differences, we fit a computational model (the Hierarchical Gaussian Filter, HGF), to choices made during slot machine game play as well as a simpler reinforcement learning model based on the Rescorla-Wagner formalism. The HGF was superior in accounting for the behaviour of our participants, suggesting that participants incorporated beliefs about environmental uncertainty when updating their beliefs about gambling outcome and translating these beliefs into action. A specific aspect of subjective uncertainty, the participant’s estimate of the tendency of the slot machine’s winning probability to change (volatility), increased subsequent to DBS. 
Additionally, the decision temperature of the response model decreased post-operatively, implying greater stochasticity in the belief-to-choice mapping of participants. Model parameter estimates were significantly associated with impulsivity; specifically, increased uncertainty was related to increased postoperative impulsivity. Moreover, changes in these parameter estimates were significantly associated with the maximum post-operative change in impulsivity over a six month follow up period. Our findings suggest that impulsivity in PD patients may be influenced by subjective estimates of uncertainty (environmental volatility) and implicate a role for the subthalamic nucleus in the modulation of outcome certainty. Furthermore, our work outlines a possible approach to characterising those persons who become more impulsive after subthalamic DBS, an intervention in which non-motor outcomes can be highly variable.

withdrawn 38 . However, other centres report the emergence of harmful impulsivity subsequent to DBS, in persons with no prior history of clinically-significant psychiatric symptoms [39][40][41][42][43] . At present, there is little evidence to guide the identification of surgical candidates at risk of postoperative impulsivity 4,44 .
In this analysis, we employed a similar computational framework to that previously reported 19 , applying a hierarchical Bayesian model (the Hierarchical Gaussian Filter, HGF) to behavioural data from 38 participants with PD who played a virtual casino before and after subthalamic DBS. By allowing participants to vary their bet size, switch between slot machines and place 'double or nothing' bets, we could estimate how participants not only inferred the trial-by-trial probability of winning, but also updated higher-order beliefs about the fluctuations (volatility) of a slot machine's winning probability. Similar to the rationale outlined in prior work 19 , we believe that a naturalistic paradigm engenders increased behavioural engagement, allowing us to quantify behaviour that has a higher fidelity to 'real world' impulsivity. Additionally, model-based estimates derived from the computational framework may afford us an individual profile of how each participant represented (and responded to) environmental uncertainty. We assessed these findings against standard measures of impulsivity derived from clinical assessment and questionnaires, focusing our attention on self-reported impulsivity as measured by the Barratt Impulsiveness Scale (BIS). Our computational analysis examined how DBS changes the manner in which persons with PD engage in Bayesian belief updating, and whether changes in subject-specific estimates of uncertainty before and after DBS relate to changes in impulsivity.

Materials and Methods
Subjects. Prior to the commencement of data collection, the full protocol was approved by the Human Research Ethics Committees of the Royal Brisbane & Women's Hospital, the University of Queensland, the QIMR Berghofer Medical Research Institute and UnitingCare Health. All research was performed in accordance with relevant guidelines and regulations. All participants gave written, informed consent to participate in the study.
Thirty-eight participants with PD undertaking STN-DBS were consecutively recruited at the Asia-Pacific Centre for Neuromodulation in Brisbane, Australia. All participants met the UK Brain Bank criteria for PD 45. No participants met the Movement Disorder Society criteria for dementia 46. The PD subtype and the Hoehn and Yahr stage at device implantation were recorded 47. Patients underwent bilateral implantation of Medtronic 3389 or Boston Vercise electrodes in a single-stage procedure. Stimulation was commenced immediately using microelectrode recording data to identify the optimal contact. Further contact testing took place over the following week as an inpatient, with participants returning to the DBS centre following discharge for further stimulation titration, guided by residual motor symptoms. Further details have previously been reported 48,49.
Neuropsychiatric assessment. Impulsivity amongst participants was assessed with patient and clinician-rated instruments prior to STN-DBS and subsequently 2 weeks, 6 weeks, 13 weeks and 26 weeks postoperatively (see Supplementary Fig. 1 for a study flowchart). A range of measures were obtained, to account for the fact that impulsivity is not a unitary construct. These included the Barratt Impulsiveness Scale 11 (BIS) and second-order factors attentional, motor and non-planning 18; the Questionnaire for Impulsive-Compulsive disorders in PD Rating Scale (QUIP-RS) 50; the delay discounting task 51; the Excluded letter fluency task (ELF) 52; and the Hayling test 53. Further information on these instruments is detailed in the Supplementary Information. Additional neuropsychiatric symptoms were captured with the Beck Depression Inventory II (BDI) 54; the Empathy Quotient (EQ) 55; the Geriatric anxiety inventory (GAI) 56; and the Apathy Scale 57. For each self-report scale, participants were instructed to refer to 'the last two weeks', in order to obtain a measurement of current 'state'.
At each visit, PD motor symptoms were assessed using the UPDRS Part III motor examination 58. Dopaminergic medication was recorded and converted to a levodopa-equivalent daily dose (LEDD) value 59.
Design and setting. Participants completed the experimental task prior to DBS and at 13-weeks post-DBS. Participants were 'on' medication and stimulation for all assessments. We opted against a counterbalanced 'off' and 'on' DBS assessment at the same visit for several reasons. First, our aim was to provide a naturalistic insight into the subtle behavioural changes that emerge as patients transition from dopaminergic therapies to subthalamic stimulation; changes in levodopa equivalent daily dose were included as co-variates in our analyses. Second, our experience is that many patients would not tolerate the DBS 'off' state without severe discomfort. Third, despite allowing DBS washout, plastic network effects of chronic DBS may persist and contaminate findings in an on-off design.
Task. We employed a modified version of an established slot machine gambling paradigm validated in healthy controls 19. Subjects read an instruction screen and played through 5 training trials, after which they entered a 'virtual' casino, starting with 2000 AUD available to gamble and playing 100 trials (Fig. 1). The win-loss likelihood of the slot machines was predetermined and changed at regular intervals. On completion of the task, participants received a small monetary reward proportional to their total winnings. The naturalistic gambling task allows for risk-taking and impulsive behaviour to be expressed and offers four actions on each trial, each of which reflects exploration and, thereby, risk-taking.
(i) Bet Increase: increasing the amount wagered on consecutive trials (minimum of 5 AUD per bet, no maximum)
(ii) Machine-Switch: switching between slot machines (four machines in total)
(iii) Casino Switch: cashing out and switching 'virtual' casino days
(iv) Double-Up: engaging in a secondary double-or-nothing gamble on all win trials
www.nature.com/scientificreports
As in our previous work 19, these responses, together with trial-wise outcome information (wins/losses), served as the input for our computational models (for a brief summary, see below). Details on the paradigm and computational modelling can be found in prior work 19.
The HGF is a hierarchical Bayesian model 34,35 (Fig. 2) in which each level of the hierarchy encodes distributions of environmental variables (in ascending complexity) that evolve as Gaussian random walks. The HGF is an extension of the model presented in Behrens et al. 33, and describes an agent whose learning rate is a function of his or her uncertainty. In the HGF, an agent is assumed not only to represent current environmental contingencies, but also to track how these contingencies change over time (volatility), and to what degree volatility itself is constant (tonic volatility) or may change in time (phasic volatility). Importantly, the agent modelled in the HGF employs these representations to make predictions about emerging environmental fluctuations and future sensory feedback. Furthermore, the agent is able to encode the precision of each prediction and use these precision estimates to scale trial-wise updates of beliefs about the environment and its statistical structure. Each level of the HGF is coupled such that higher states determine how quickly the next lower state evolves, with the lowest hierarchical level representing sensory events.
Inversion of this 'perceptual model' produces subject-specific parameter estimates that determine the nature of the coupling between levels of the HGF. Inverting this model under generic (mean-field) approximations results in analytical belief-update equations, in which trial-wise belief updates are proportional to prediction errors (PEs) weighted by uncertainty (or its inverse, precision). The subject-specific parameters shape an individual's approximation to ideal Bayesian inference, specifically how phasic and tonic volatility impacts trial-wise estimates of uncertainty at all levels of the hierarchy. Posterior estimates of HGF parameters can thus be regarded as a compact summary of an individual's uncertainty processing during an experiment.
Furthermore, in a 'response model' , trial-wise beliefs are probabilistically linked to observed trial-wise decisions. Inverting both perceptual and response models allows for estimating the parameters; this corresponds to Bayesian inference (of an observer) on Bayesian inference (of an agent) 24 . An informal description is given below and a formal summary is provided in the Supplementary Material.
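The statement that belief updates are prediction errors weighted by uncertainty can be illustrated with a minimal sketch of a single second-level update for a binary outcome. It follows the general form of the published HGF update equations, but it is a simplification: the third-level belief (`mu3`) is held fixed rather than updated, and the function and variable names are ours and purely illustrative.

```python
import math

def sigmoid(x):
    """Logistic function mapping logit space to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def hgf_level2_update(mu2, sigma2, u, omega, kappa=1.0, mu3=0.0):
    """One trial of a second-level belief update for a binary outcome u (0 or 1).

    Simplifying assumption: the volatility belief mu3 is treated as fixed.
    Returns the updated mean, updated variance and the effective learning rate.
    """
    # Prediction step: the belief variance grows by the volatility term.
    sigma2_hat = sigma2 + math.exp(kappa * mu3 + omega)
    p_hat = sigmoid(mu2)              # predicted probability of a win
    delta1 = u - p_hat                # outcome prediction error
    # Posterior precision = predicted precision + information in the outcome.
    pi2 = 1.0 / sigma2_hat + p_hat * (1.0 - p_hat)
    learning_rate = 1.0 / pi2         # larger when the agent is more uncertain
    return mu2 + learning_rate * delta1, 1.0 / pi2, learning_rate

# A higher tonic volatility (omega) yields a larger update after the same win.
_, _, lr_low = hgf_level2_update(mu2=0.0, sigma2=1.0, u=1, omega=-4.0)
_, _, lr_high = hgf_level2_update(mu2=0.0, sigma2=1.0, u=1, omega=-1.0)
print(lr_low < lr_high)
```

In this sketch the learning rate is not a free constant; it falls out of the agent's current uncertainty, which is the property that distinguishes the HGF from fixed-learning-rate models.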
Figure 1. Slot machine gambling paradigm: The task consists of 100 trials. On every trial, players are able to place a bet of unlimited magnitude, switch slot machines or 'cash out', exiting the casino and returning again on another virtual 'day'. The overall win probability is 25%, with wins split into big wins and small wins. The two possible types of losses are near-misses, in which the first two wheels are the same and the third is different (i.e. AAB) or a true loss, in which all the wheels are different (i.e. ABC). Game play proceeds as follows. Each trial begins with the slot machine main screen loading, displaying the player's account value. The player then places a continuous-valued bet amount, incremented in units of 5 or 10 AUD. After the player has placed a bet, he or she presses the 'Pull' button and watches as the wheels begin to spin. At any point, the player has the ability to press the 'Stop' button, ending the trial and subsequently revealing the outcome of the three wheels. Unbeknownst to the participant, pressing the stop button has no effect on the trial outcome. If the stop button is not pressed, the trial times out after 5 seconds, and the player sees the outcome of the first, second and third wheel sequentially. On trials in which the outcome is a win, there are ten possible reward grades (or multiples of the bet amount). After every win trial, players are offered a possible 'double-up' option, during which players are given 3 seconds to decide whether or not to engage in a 'double-or-nothing' option, thereby risking his or her entire win amount. If the player elects to engage in this gamble, a card flips over revealing the result, and subjects are taken to the next trial. If the player does nothing, or decides not to gamble, he or she is taken to the next trial. For each loss trial, players are taken directly to the beginning of the next trial. Again, the trajectory of win-loss outcomes is fixed, ensuring comparable inference upon perceptual and response parameters across participants.
The perceptual model. The HGF is used to infer how an individual subject learns about hierarchically-coupled environmental quantities under different forms of uncertainty (including volatility). In our case, the lowest level of the HGF, $x_1$, represents the trial-wise binary outcome (win or loss) in the slot machine. This derives from a sigmoid transformation of $x_2$ representing winning probability in logit space (i.e., whether the machine is currently 'hot' or 'cold' and likely (or not) to pay out). $x_2$ evolves as a Gaussian random walk whose step size is a function $f_2(x_3)$ of a third-level variable, $x_3$, which performs a Gaussian random walk of its own. $x_3$ represents the slot machine's 'volatility', the speed at which it fluctuates between 'hot' and 'cold' states. The coupling function $f_2$ between levels contains subject-specific parameters κ and ω that determine an individual's approximation to ideal Bayesian inference. Finally, the parameter ϑ at the highest level denotes how quickly volatility itself is changing (meta-volatility). A detailed derivation of the exact equations can be found in Mathys et al. 34.
More concretely, in the context of our study, parameters ω and ϑ at the second and third level of the hierarchy, respectively, encode different aspects of subjective estimates of uncertainty. Specifically, these estimates concern environmental uncertainty, i.e., hidden fluctuations (volatility) of environmental states (for details, see Mathys et al.) 34 . These volatility estimates are potentially important for explaining the observed behaviour because they shape participants' belief updates about the slot machine and their ensuing choices about gambling. Parameter ω represents a subject's estimate of tonic volatility, i.e., how quickly a slot machine could be moving from a state where it is likely to pay out (running 'hot') to a state where it is not (running 'cold') and vice versa. Parameter ϑ encodes a subject's estimate of meta-volatility, i.e., the tendency of volatility itself to change over time. Larger values of each parameter correspond to greater uncertainty in the subject's perceptual inference process.
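To make the roles of ω, κ and ϑ concrete, the generative process described above can be simulated in a few lines. This is a minimal sketch rather than the code used in the study: it assumes the standard three-level form in which the step variance of the second level is exp(κ·x3 + ω) and ϑ is the step variance of the third level, and all names and example values are illustrative.

```python
import math
import random

def sigmoid(x):
    """Logistic function, clipped to avoid overflow for extreme logits."""
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))

def simulate_hgf(n_trials, omega, theta, kappa=1.0, x2=0.0, x3=0.0, seed=0):
    """Simulate binary outcomes from a three-level hierarchy of random walks.

    x3: volatility, a random walk with step variance theta (meta-volatility)
    x2: win tendency in logit space, with step variance exp(kappa * x3 + omega)
    x1: binary outcome (1 = win, 0 = loss) drawn with probability sigmoid(x2)
    """
    rng = random.Random(seed)
    outcomes, win_probs = [], []
    for _ in range(n_trials):
        x3 = rng.gauss(x3, math.sqrt(theta))
        x2 = rng.gauss(x2, math.sqrt(math.exp(kappa * x3 + omega)))
        p_win = sigmoid(x2)
        win_probs.append(p_win)
        outcomes.append(1 if rng.random() < p_win else 0)
    return outcomes, win_probs

# Illustrative values only: a fairly stable versus a more volatile machine.
stable, _ = simulate_hgf(100, omega=-6.0, theta=0.01)
volatile, _ = simulate_hgf(100, omega=-1.0, theta=0.01)
print(sum(stable), sum(volatile))
```

Note that in the study the outcome sequence was fixed by the experimenters; the simulation only illustrates what an agent with a given ω and ϑ believes about how such sequences arise.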
The response model. The response model maps a subject's beliefs (obtained by inverting the perceptual model under given parameter values) to observed gambling behaviour. Here, we use a sigmoidal response model 34 ; if this function is steep, there is a close relationship between current perceptual beliefs and betting behaviour. Conversely, a gentler sigmoidal slope results in a more stochastic mapping of beliefs to behaviour. This response function has a parameter, β, the decision 'temperature' (also known as the inverse temperature), that determines the steepness of the sigmoid and thus the degree of stochasticity in the belief-to-choice mapping. The larger the value of β, the steeper the function, and the more deterministic is the relationship between a subject's belief and their actions. In this paper, we test the following two variants of this response model: (i) 'Standard' HGF: β = constant, i.e., the mapping from beliefs to behaviour is fixed across the experiment.
This parameter is estimated for each subject.
Figure 2. [...] In turn, $x_2^{(k)}$ is modelled as a Gaussian random walk, whose step-size is governed by a combination of $x_3^{(k)}$, via coupling parameter κ, and a tonic volatility parameter ω. $x_3^{(k)}$ also evolves as a Gaussian random walk over trials, with step size ϑ (meta-volatility). In this investigation, after observing trial-wise outcomes (win or lose), the gambler updates her belief about the probability of win on a given trial $k$ ($x_2^{(k)}$), as well as how swiftly that slot machine is moving between being 'hot' (high probability of win) or 'cold' (low probability of win) ($x_3^{(k)}$). On any trial, the ensuing beliefs then provide a basis for the gambler's response, which may be to increase the bet size, 'double up' after a win, switch to a new slot machine or leave the casino.
(2019) 9:14795 | https://doi.org/10.1038/s41598-019-51164-2
Perceptual variable. Based on previous work that examined different computational models of our slot machine paradigm 19, the perceptual variable used here was simple: a binary variable in which wins were represented by 1 and losses by 0. This binary representation of win or loss in the task allows for increased interpretability of model parameters in measuring uncertainty-updating and impulsive-responding in reaction to a binary win/loss outcome.
Response variable. The response variable is a binary representation of actions associated with risk taking. It is constructed using a logical OR operator on four choices during the slot machine paradigm: bet increases, machine switches, double-ups and casino switches. For each trial, the response variable takes a 1 when any of these four events occur, and 0 otherwise. For details, please see Supplementary Table 1.
While these actions might at first glance appear to relate to different behaviours, they all share a common theme in that they enhance outcome variance and thus the amount of risk the player takes in the game. For example, increasing bet size from one trial to the next results in higher reward variability in the trial outcome, thereby making the player more susceptible to larger wins and losses. In aggregate, these four actions relate to a player's risk-taking tendencies (described further in Supplementary Section 1.4).
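The construction of the response variable described above amounts to a trial-wise logical OR over the four actions. A minimal sketch follows; the function name and the toy input sequences are hypothetical and not taken from the study data.

```python
def risk_response(bet_increase, machine_switch, double_up, casino_switch):
    """Trial-wise logical OR over four binary action indicators.

    Each argument is a list with one 0/1 entry per trial. The result is 1 on
    any trial where at least one risk-taking action occurred, and 0 otherwise.
    """
    return [
        int(bool(b) or bool(m) or bool(d) or bool(c))
        for b, m, d, c in zip(bet_increase, machine_switch, double_up, casino_switch)
    ]

# Toy example with four trials.
y = risk_response(
    bet_increase=[1, 0, 0, 0],
    machine_switch=[0, 0, 1, 0],
    double_up=[0, 0, 0, 0],
    casino_switch=[0, 0, 0, 0],
)
print(y)  # [1, 0, 1, 0]
```

Collapsing the actions in this way discards which action was taken, but yields a single binary series that can be paired with the binary win/loss input.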

Reinforcement learning.
As an alternative model, we used a classical associative learning model, Rescorla-Wagner (RW), often used in reinforcement learning (RL) 60. The RW model updates the probability of a win on trial k by combining the probability on trial k−1 with a PE weighted by a constant learning rate. Hence, in contrast to the HGF, the RW model does not have a dynamic learning rate over trials, nor can it account for different forms of perceptual uncertainty; essentially, the RW model corresponds to an HGF with a fixed learning rate. Here, we combine the RW learning rule with the same sigmoidal response model described above, with free parameter β, that we estimate on a subject-specific basis. This results in a model that is (i) structurally similar to, but less complex than, the HGF and (ii) almost identical to the RL model used in a prior investigation of learning after STN-DBS 17.
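The RW learning rule and its coupling to a sigmoidal response model can be sketched as follows. The exact form of the sigmoid is not specified in the text above; the sketch assumes the 'unit-square' sigmoid that is commonly paired with the HGF, and the names `alpha`, `v0` and both function names are illustrative.

```python
def rescorla_wagner(outcomes, alpha, v0=0.5):
    """Return the predicted win probability before each trial.

    The prediction is moved towards each observed outcome (1 = win, 0 = loss)
    by a constant learning rate alpha times the prediction error.
    """
    v = v0
    predictions = []
    for u in outcomes:
        predictions.append(v)
        v = v + alpha * (u - v)   # fixed learning rate, unlike the HGF
    return predictions

def response_probability(belief, beta):
    """Assumed unit-square sigmoid: probability of a risk-taking action.

    Larger beta gives a steeper, more deterministic belief-to-choice mapping;
    beta close to 0 gives near-random responding.
    """
    return belief ** beta / (belief ** beta + (1.0 - belief) ** beta)

beliefs = rescorla_wagner([1, 1, 0, 1], alpha=0.5)
print(beliefs)                                   # [0.5, 0.75, 0.875, 0.4375]
print(response_probability(0.7, beta=5.0) > response_probability(0.7, beta=1.0))
```

With β = 1 the response probability equals the belief itself; lowering β flattens the mapping towards 0.5, which corresponds to the more stochastic responding discussed in the text.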
Model inversion. The HGF and RW models were inverted using population Markov-Chain Monte Carlo (MCMC) sampling 61 . Parameter estimation in the HGF is classically 'fully Bayesian' and requires a selection of priors, which influence parameter estimation to a lesser or greater degree. In order to minimise this influence, we used a novel empirical Bayesian inference scheme for the HGF where a Gaussian group-level distribution of parameters is constructed from samples across the group. This group-level empirical prior is then used to obtain posterior parameter estimates in each subject ( Supplementary Fig. 2). Subject-specific point estimates for model parameters are calculated as the median value of the subject's posterior distribution.
Given the clinical constraints of our investigation (to reduce any burden on the participants, we only used 100 trials per episode of gambling, i.e., only half as many as in our previous work) 19, it was important to ensure that our parameter estimates were robust. Therefore, in order to verify that HGF parameter estimates reliably reflected subject-specific characteristics of uncertainty encoding and decision noise, we tested our ability to recover ground-truth parameter values from simulated response data. In order to assess parameter recoverability, we used three parameter values per parameter (shown in Supplementary Fig. 4) and generated a batch of 38 synthetic response variables based on these assigned values, using the underlying trace of the slot machine as the perceptual variable. We then inverted the HGF and explored the relationship of the recovered parameter estimates, using the median of the posterior, with the ground truth values. When estimating ω and ϑ, β was held fixed; conversely, when estimating β, ω and ϑ were fixed. This process was repeated for 10 batches across each parameter.
Model comparison. As described above, we considered two competing hypotheses of how subjects might incorporate uncertainty into their choice of actions, i.e., two different belief-to-choice mappings in the response model for the HGF (the 'Standard' and 'Uncertainty-driven' models). These two versions of the HGF were compared with the RW model. As we were primarily interested in the pre-DBS to post-DBS change, we selected the winning model for the pre-DBS measurements. We then evaluated if the parameter estimates of that winning model changed postoperatively. Estimates of the negative free energy (log model evidence) were computed using thermodynamic integration 61. The negative free energy balances goodness of fit with a complexity penalty. Group-level free energy estimates were compared to select a winning model.
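One common way to compare models at the group level is to sum subject-wise log model evidences and convert the difference into a Bayes factor. The sketch below assumes this fixed-effects approach, which may differ from the exact procedure used here; the function name and the log-evidence values are invented for illustration.

```python
import math

def group_log_bayes_factor(log_evidence_a, log_evidence_b):
    """Fixed-effects group log Bayes factor of model A over model B.

    Each argument holds one log model evidence (negative free energy) per
    subject, in the same subject order. Positive values favour model A.
    """
    return sum(a - b for a, b in zip(log_evidence_a, log_evidence_b))

# Hypothetical log evidences for two subjects under two models.
log_bf = group_log_bayes_factor([-50.0, -60.0], [-51.0, -61.5])
print(log_bf)             # 2.5
print(math.exp(log_bf))   # Bayes factor on the natural scale
```

Because the negative free energy already includes a complexity penalty, no further correction for the number of parameters is applied in this comparison.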
Neuropsychiatric assessment data from baseline, prior to DBS, were compared with data gathered at 13-weeks post-DBS, when the gambling task was repeated. To test for differences in pre-DBS and post-DBS questionnaire scores and model parameter estimates, a paired t-test was employed when the data were normally distributed and the Wilcoxon signed-rank test otherwise, where distribution was assessed using the Lilliefors test. Gambling behaviours (such as bet increases and machine switches) were also compared at both intervals. Gambling behaviours were regressed against clinical measures of impulsivity to determine significant relations. After determining the winning computational model, model parameter estimates were extracted for each participant and regressed against clinical measures of impulsivity to determine significant associations and predictors of postoperative impulsivity. Based on previous work showing a significant association between BIS scores and both slot machine behaviour and HGF-based estimates of uncertainty encoding 19, we focused our analyses on the BIS and its subscales. Perceptual model parameters were extracted in log space: ω and β are naturally estimated in log space, since they are part of exponential terms in their respective equations (see equation 5 in the supplementary material).
From a clinical perspective, we were interested in examining whether changes in the computational characterisation of individual uncertainty estimates pre- to post-DBS were associated with clinically-relevant changes in impulsivity at any time point after DBS. Our strategy to attempt prediction of clinical outcomes follows the 'generative embedding' approach, in which individual predictions are not derived from measured data but from parameter estimates obtained by a generative model 62,63. Importantly, stimulation-dependent changes in impulsivity may evolve in an unpredictable manner subsequent to DBS, related to variations in DBS programming over time (with considerable adjustments to stimulation in the first six postoperative months). Furthermore, the optimal BIS cut-off score for clinically-significant impulsivity varies by age and disease 64, with only one existing investigation specific to a PD cohort 65. Therefore, we examined whether individual changes in parameter estimates were associated with the maximum postoperative increase in impulsivity, as measured by the BIS, compared to baseline, across six months of longitudinal follow up.

Results
Participant characteristics. Participants were a predominantly middle-aged sample, with a bias towards male gender and akinetic-rigid/mixed phenotype over tremor (Table 1). Most participants had bilateral disease with consequent impairment of functioning in their activities of daily living.
Neuropsychiatric assessment pre- and post-DBS. Concerning symptoms of primary interest (Table 2), there was a small but statistically-significant group-level post-DBS decrease in impulsivity, as measured by the BIS Total, compared to baseline. There was also a significant reduction of motor symptoms assessed using the UPDRS Part III Motor Examination, with a corresponding significant reduction in the requirement for dopaminergic therapy (LEDD). There were no statistically-significant changes in other behavioural measures related to impulsivity, including the Hayling test, the Excluded Letter Fluency task and the delay discounting task. Comparable to the BIS, the QUIP-RS total score demonstrated a trend towards a reduction at 13-weeks post-DBS, but this did not reach significance.
For symptoms of secondary interest (and subscales), see Supplementary Table 2. There was considerable variance between subjects across assessment scores at each interval and within subjects across the course of longitudinal follow up (Supplementary Fig. 3).
The BIS and the BDI showed a significant positive correlation at each time point (ρ pre = 0.46, p = 0.003; ρ post = 0.53, p < 0.001), and both showed (near-)significant changes from pre- to post-DBS (Table 2). Therefore, to rule out that impulsivity-related findings were driven by changes in depression, the BDI was included as a covariate when regressing behaviour and model parameter estimates against BIS scores. Whilst the LEDD is conceivably related to impulsivity, it did not correlate with the BIS total (ρ pre = −0.126, p = 0.450; ρ post = −0.042, p = 0.799) and was therefore not included in these regression analyses. However, the QUIP and LEDD correlated significantly at both time points (ρ pre = 0.42, p = 0.008; ρ post = 0.44, p = 0.005), with LEDD decreasing significantly post-DBS. There were no significant correlations between LEDD and the other measures of impulsivity (ELF Rule Violations, Hayling AB Error Score and Delay Discount K). Based on previous work using this task and modelling framework 19, we focused our attention on exploring impulsivity as measured by the BIS.

Gambling behaviour
Gambling behaviour pre- and post-DBS.
At the group level, there were no significant differences in the behaviour of participants on the slot machine from pre-to post-DBS (Supplementary Table 3). Due to subjects not engaging in the 'casino switch' option, this variable was eliminated from regression analyses.
Pre-DBS regression of BIS scores on gambling behaviour. We studied the relationship between pre-DBS gambling behaviour and pre-DBS impulsivity as measured by the BIS. The BDI total was included in the regression in order to control for changes in clinical state attributable to depressive symptoms. The overall preoperative model including the BDI total was significantly associated with the BIS total score [F (4,33) = 3.024, p = 0.031]. Post-hoc t-tests on task behaviour revealed that no behavioural variable was significantly related to the BIS individually. When subscales of the BIS were examined, gambling behaviour was significantly associated with the BIS Attentional subscale [F (4,33) = 4.094, p = 0.008], where higher bet sizes corresponded to higher attentional impulsivity (t (37) = 2.303, p = 0.028) (Supplementary Table 4).
Post-DBS regression of BIS scores on gambling behaviour. The full model of postoperative gambling behaviour was also significantly associated with BIS total score [F (4,33) = 4.920, p = 0.003] (Table 3). Again, post-hoc t-tests revealed that no task behaviour was significant on its own. When subscales of the BIS were examined, gambling behaviour was significantly associated with the BIS Attentional subscale [F (4,33) = 8.123, p < 0.001]. Post-hoc t-tests revealed that higher bet sizes (t (37) = 2.604, p = 0.014) and more frequent double or nothing gambles (t (37) = 2.589, p = 0.014) corresponded to higher BIS Attentional scores (Supplementary Table 5).
Post-DBS, higher bets and more frequent machine switches were significantly associated with higher QUIP-RS scores (Supplementary Table 6). No other measures of impulsivity were significantly associated with pre- or post-DBS slot machine activity.
Regression of maximum BIS [...] Table 3). Post-hoc t-tests revealed that the change in bet behaviour was significantly associated with maximum BIS increase (t (37) = 2.866, p = 0.007). Additionally, the change in machine switch behaviour was also significantly associated with maximum BIS increase (t ( [...] p = 0.034). In other words, changes in betting and slot machine switching behaviours after DBS indexed changes in impulsivity as assessed by the BIS.
Computational modelling. As described above, we were interested in evaluating the role of uncertainty and its association with postoperative changes in impulsivity in our cohort. We therefore first determined, using Bayesian model comparison, which of our three models best explained pre-DBS behaviour, before evaluating whether the parameter estimates of this winning model changed postoperatively and were associated with postoperative BIS scores. Bayesian model comparison selected the 'standard' HGF (with a subject-specific decision temperature in the response model) as the winning model, with a group-level Bayes factor of approximately 12.5, compared to the next best model (the Rescorla-Wagner model) (Fig. 3).
Parameter recoverability in the HGF. We tested for parameter recoverability in the HGF, finding that ground truth parameter values for ω and β could be recovered consistently, but we were unable to reliably recover ϑ (for details, see Supplementary Fig. 4). For this reason, we restricted the following analysis to parameters ω and β when exploring the association between parameter values and questionnaire-based measures of impulsivity.

Changes in model parameter estimates pre- to post-DBS.
Estimates of the HGF model perceptual parameter ω significantly increased postoperatively (t 37 = −61.328, p < 0.001), and estimates of β significantly decreased (t 37 = 2.124, p = 0.04), implying larger subjective estimates of uncertainty (volatility) and greater stochasticity in the selection of responses after DBS (Table 5 and Fig. 4). ω represents a subject's estimate about the tonic component of environmental volatility; i.e., how quickly the likelihood of winning on a given slot machine might be changing, while β represents the decision noise, or the stochasticity involved in the belief-to-choice mapping process.
Pre-DBS regression of BIS scores on model parameter estimates. The full regression model (including the estimates of preoperative perceptual and response parameters ω and β) was significantly associated with BIS total [F (3,34) = 3.372, p = 0.03] (Table 5). Post-hoc t-tests on model parameter estimates did not reveal any single parameter to be significantly related to the BIS on its own. When subscales of the BIS were examined, the full regression model was significantly associated with the BIS Attentional subscale [F (3,34) = 3.314, p = 0.031], but again no single parameter was independently significant (Supplementary Table 7).
[...] (Table 5). Post-hoc t-tests revealed that ω was a significant regressor (t (34) = 3.761, p = 0.001). The positive regression coefficient for ω implies that the greater the subjective estimate of uncertainty (tonic volatility), the higher the BIS score (Fig. 4). In other words, although there was a group-level decrease in BIS score from pre- to post-DBS, at an individual level the greater the postoperative estimate of uncertainty (i.e., the higher the estimated volatility of the slot machine's winning probability), the greater the postoperative impulsivity. Thus, for participants with high subjective volatility estimates, there was likely to be a postoperative increase in BIS. When subscales of the BIS were examined, model parameter estimates correlated significantly with the BIS Attentional subscale [F (3,34) [...] Table 8).
Change in model parameters associated with maximum change in BIS. We were interested in whether individual pre- to postoperative changes in estimates of subjective uncertainty would correlate with the maximum postoperative change in impulsivity across six months of follow-up post-DBS. The pre-to-post change in model parameter estimates was calculated as the preoperative minus the postoperative parameter estimate. In the case of ω, postoperative parameter estimates were significantly higher than preoperative values (higher values of ω imply higher uncertainty); therefore the Δω value for each subject is negative and implies a postoperative increase in uncertainty. With regard to β, postoperative parameter estimates were significantly lower than preoperative values (lower values of β imply greater stochasticity); therefore the Δβ value for most subjects is positive, implying a postoperative increase in the stochasticity of the belief-to-choice mapping. Changes in model parameter estimates were associated with the maximum postoperative increase in the BIS across all longitudinal assessments over six months [F (3,34) = 3.987, p = 0.015] (Table 5). Post-hoc t-tests revealed that Δβ was a significant regressor (t (34) = 3.312, p = 0.002). The positive regression coefficient here implies that a postoperative increase in decisional randomness was related to a greater maximum increase in BIS (Fig. 4).
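Because the change scores are defined as preoperative minus postoperative, their signs can be counter-intuitive. The short sketch below works through the convention with hypothetical parameter values (not estimates from this study); the helper name `pre_minus_post` is illustrative.

```python
def pre_minus_post(pre: float, post: float) -> float:
    """Change score as defined in the text: preoperative minus postoperative."""
    return pre - post


# Hypothetical estimates for one participant (not study data).
omega_pre, omega_post = -4.0, -2.5   # omega rises after DBS
beta_pre, beta_post = 3.0, 2.0       # beta falls after DBS

delta_omega = pre_minus_post(omega_pre, omega_post)
delta_beta = pre_minus_post(beta_pre, beta_post)

# Negative delta_omega: higher postoperative volatility estimate.
print(f"delta omega = {delta_omega:+.1f}")
# Positive delta_beta: more stochastic belief-to-choice mapping after DBS.
print(f"delta beta  = {delta_beta:+.1f}")
```

Under this convention, a positive regression coefficient for Δβ means that participants whose β fell the most showed the largest maximum increase in BIS.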
Supplementary analyses. We examined whether dopaminergic medication dosage expressed as a standardised unit (LEDD) was related to computational model parameters and whether postoperative changes in drug doses were related to changes in uncertainty encoding. There was no significant relationship between pre-DBS model parameters and pre-DBS LEDD [F (2,35) [...] Table 5). The BDI was not included in these regression models, as it is not a confound when examining a relationship between model parameter estimates and the LEDD. Post-hoc t-tests showed that Δβ was a significant regressor (t (34) = 2.832, p = 0.008). The positive regression coefficient here implies that a postoperative increase in decisional stochasticity was observed in patients who had a larger postoperative decrease in LEDD (Fig. 4). However, this relationship appeared to be driven by an outlier participant with a particularly large perioperative decrease in LEDD. When this participant was removed, the relationship was no longer statistically significant. We have removed the outlier in Fig. 4, but a full plot including the outlier can be found in Supplementary Fig. 5.

Discussion
In this study, we employed a naturalistic gambling task and a hierarchical Bayesian model (for inference on subject-specific estimates of uncertainty) in order to investigate impulsive decision-making in participants with PD undergoing subthalamic DBS. Gambling behaviour was associated with a 'gold-standard' questionnaire (BIS) measure of impulsivity, with post-DBS changes in gambling behaviours indexing postoperative changes in impulsivity. We also found that parameter estimates representing subjective estimates of environmental uncertainty (volatility) changed significantly from pre- to postoperative conditions. In particular, there was a significant increase in ω, which reflects a gambler's estimate of how quickly the probability of winning on a given slot machine was changing (volatility). There was also a postoperative decrease in a second parameter, β, which captures the decision noise in a player's belief-to-choice mapping. Notably, these model-based estimates of uncertainty were related to postoperative impulsivity. The greater the postoperative estimate of ω, the greater the postoperative BIS score. In other words, the more a participant perceived the pay-out tendency of a slot machine to be changing after DBS, the more impulsive they rated themselves. Additionally, the larger the pre- to postoperative decrease in estimates of β, the larger the postoperative increase in BIS score across six months of longitudinal follow-up. In other words, the more a participant became indiscriminate in their belief-to-choice mapping after DBS, the more impulsive they rated themselves.
Figure caption (beginning not recovered): [...] Table 4. Also shown is the change in BIS pre- and post-DBS (t (37) = −2.66, p = 0.033), as shown in Table 2. p-values are Holm-Bonferroni corrected for multiple comparisons with α = 0.05. The second row illustrates the relationship between pre-DBS ω and pre-DBS BIS, pre-DBS β and pre-DBS BIS and the pre- to post change in ω with the max increase in BIS, as well as the max decrease in LEDD. The third row illustrates the relationship between post-DBS ω and post-DBS BIS, post-DBS β and post-DBS BIS and the pre-to-post change in β with the max increase in BIS. Here, we have removed the outlier in the plot relating the change in β to the max decrease in LEDD. These plots serve to better illustrate the results shown in Table 5: specifically, that greater volatility estimates (ω, the tendency of a slot machine's winning probability to change) were associated with greater maximum postoperative BIS scores, and that greater stochasticity in belief-to-choice mapping (decision temperature, β) was significantly associated with the maximum postoperative increase in BIS.
Our gambling task utilised a multivariate response variable (bet increase, machine switch, casino switch and double-up) that captured different aspects of impulsivity and explorative behaviour. Furthermore, by employing a generative model that mapped observed responses to perceptual states, we were able to infer directly upon subject-specific parameters defining individual differences in uncertainty encoding. This is an important point of difference from a purely behavioural analysis, in which responses can have more than one (ambiguous) proximate cause. In the HGF, parameters are mathematically defined and have a concrete influence upon learning at different levels of the hierarchy (see Mathys et al. 2011, 2014 for simulations that demonstrate these effects) 34,35 .
What is the significance of individual differences in uncertainty encoding? Increased estimates of environmental uncertainty accelerate the rate of learning at higher hierarchical levels, which could engender maladaptive learning at lower levels of the hierarchy. A high learning rate suppresses the influence of top-down expectations, and may impair learning about probabilistically aberrant events. In a recent investigation employing the HGF to model surprise about unexpected events, persons with autism learned more quickly about environmental volatility than controls without autism 66 . However, at lower levels of the hierarchy, the tendency to believe that environmental instability is unstable resulted in smaller prediction errors (surprise) when events violated expectations. In other words, when the world is judged to be unstable and unpredictable, an agent differentiates less between expected and unexpected outcomes. This offers a similar but computationally distinct account of the stimulation-related learning changes described in a previous study 17 , in which reduced positive and negative instrumental outcome sensitivity was reported as a consequence of neurostimulation. Similar to prior work, we found a positive relationship between model-based estimates of uncertainty and impulsivity 19,20,22,23 . A plausible computational account of impulsivity is that high subjective uncertainty leads to lack of predictability and thus increases a tendency for short-term reward seeking and exploration.
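The link between volatility estimates and the rate of belief updating can be sketched with a simplified single-trial update for the second level of a binary HGF, following the update equations as we understand them from Mathys et al. (2011). As a simplifying assumption, the third level is held fixed (`mu3` constant) and the coupling `kappa` is set to 1; this is not the TAPAS implementation, and all names and numerical values are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def level2_update(mu2, sigma2, outcome, omega, kappa=1.0, mu3=0.0):
    """One belief update at level 2 of a binary HGF with level 3 held fixed.

    mu2, sigma2: prior mean and variance of the win tendency (logit space)
    outcome:     1 for a win, 0 for a loss
    omega:       tonic volatility parameter
    Returns the posterior mean and variance; the posterior variance also
    acts as the learning rate applied to the prediction error.
    """
    s_hat = sigmoid(mu2)                                   # predicted win probability
    pi_hat = 1.0 / (sigma2 + math.exp(kappa * mu3 + omega))  # predicted precision
    pi_post = pi_hat + s_hat * (1.0 - s_hat)               # posterior precision
    sigma2_post = 1.0 / pi_post                            # posterior variance
    delta = outcome - s_hat                                # prediction error
    mu2_post = mu2 + sigma2_post * delta                   # precision-weighted update
    return mu2_post, sigma2_post


# Same prior belief and same outcome (a win), two hypothetical omega values.
for omega in (-6.0, -2.0):
    mu_post, lr = level2_update(mu2=0.0, sigma2=1.0, outcome=1, omega=omega)
    print(f"omega = {omega:+.1f}: learning rate = {lr:.3f}, updated belief = {mu_post:.3f}")
```

In this sketch a higher ω inflates the predicted variance before each outcome, so the same prediction error produces a larger shift in belief.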
We established that the 'standard' HGF explained the gambling behaviour of our participants better than either a Rescorla-Wagner model or an 'uncertainty-driven' HGF. Importantly, the distinction between the 'standard' and 'uncertainty-driven' HGF models pertains only to the modelling of responses (the perceptual model is identical): the 'standard' HGF employs a fixed decision temperature, whereas the 'uncertainty-driven' HGF employs a dynamic belief-to-response mapping based on online estimates of uncertainty (of beliefs about winning probability). These model comparison results suggest that our participants incorporate estimates of volatility into their prediction of reward probability but do not vary the stochasticity of their responses according to these estimates. This is an interesting point of difference from the findings amongst younger, healthy males who completed a similar (albeit much longer) version of this task 19 , and future work should examine whether this finding of a static decision temperature is also observed in other neurodegenerative disorders.
In our participants, neurostimulation may interact with the physiology of the STN and alter the computations it implements. A tripartite functional organisation of the STN into limbic, associative and motor subregions is suggested by primate and human studies 67,68 , with electrode implantation targeted to the dorsolateral sensorimotor region to address motor symptoms of PD 69 . Yet, the small size of the STN means that dispersion of electrical charge from a stimulating contact in this region could still modulate subthalamic regions with greater connectivity to fronto-striatal networks. The more ventral and medial the stimulating contact, the more likely these networks are to be affected by DBS. Previous investigations have suggested that the site of subthalamic stimulation can modulate cognitive 70 and psychiatric symptoms 49,71,72 . How could STN-DBS modulate uncertainty? From a computational perspective, the STN has been considered to implement a 'delay' on cognitive-associative circuits in the basal ganglia, allowing more information to be gathered to guide the most appropriate behavioural policy, suppressing impulsive and potentially error-prone responding 13,14 . It is possible that by modulating the decision threshold, STN-DBS could alter the bound for evidence accumulation and thus uncertainty in the representation of the reward environment 73,74 . Further work employing drift diffusion modelling to quantify rates of evidence accumulation and decision boundaries after STN-DBS may be illuminating, having previously helped to elucidate the mechanisms underlying hallucinations in PD 75 . Further work is also required to determine if the site of stimulation affects the magnitude of changes in uncertainty estimation observed here and specifically if cognitive-associative or sensorimotor regions of the STN are most implicated in these shifts.
We did not observe a cross-sectional relationship between dopaminergic medication (expressed as LEDD) and uncertainty encoding, nor did we observe a longitudinal relationship between LEDD and self-reported impulsivity. However, there was a longitudinal relationship between changes in model parameter estimates and the maximum reduction in LEDD during longitudinal follow-up. Specifically, the greater the increase in decision noise (the greater the decrease in β), the greater the postoperative reduction in LEDD. It is difficult to be certain whether this is a causal relationship, and it may be an epiphenomenon of effective subthalamic DBS: one of the benefits of targeting the STN (as opposed to other surgical targets in DBS for PD, such as the internal segment of the globus pallidus) is that it allows for significant postoperative reduction in dopaminergic medication. Therefore, this apparent relationship could well be mediated by the effect of electrical stimulation, which may both increase indiscriminate responding and reduce the requirement for dopaminergic therapy. Moreover, the finding that this relationship no longer held after the removal of an outlying participant decreases confidence in this result.
There are likely to be fundamental differences in the computational operations subserved by the STN and dopamine in decision-making and impulsive behaviour. We have discussed the chronometric role of the STN in setting a decision bound and delaying impulsive choice, whereas dopamine is likely to have an essential role in reinforcement learning and reward evaluation 76-83 . In a non-surgical population, persons with PD withdrawn from medication display a characteristic impairment in reward learning and may show enhanced punishment sensitivity 84 . However, whilst dopamine replacement enhances the ability to learn from positive outcomes, learning from negative outcomes is impaired 84 . Thus, if postoperative LEDD reduction were a principal driver of a change in behaviour subsequent to DBS, then a selective impairment in positive outcome representation would be expected. However, from the HGF perspective, an agent with increased uncertainty at higher levels would be expected to show both decreased reward and punishment learning, as surprise to both positive and negative unexpected outcomes would be reduced (see Lawson et al.) 66 . This suggests that LEDD changes may have a secondary role, but further careful experiments will be necessary to address this question. For example, the goal of this behavioural analysis was to relate a computational marker of uncertainty (over all trials) to impulsivity, but future neuroimaging investigations could model trial-wise positive and negative reward prediction errors and relate these to trial-wise brain activity. Participants could also be tested prior to STN-DBS 'on' and 'off' medication (although in our cohort, participants were too impaired by their movement disorder to tolerate this and a group of less severely-affected individuals would be required).
We did not observe significant correlations between behaviour or parameters inferred from slot machine play and other measures of impulsivity, including the excluded letter fluency (ELF) task, the Hayling test and the delay discounting task. This reflects the multifaceted nature of impulsivity, which may implicate discrete subcortical and cortical regions and may show differential patterns of expression amongst impulsive endophenotypes 85,86 . For example, the Hayling and ELF tasks are more commonly included amongst measures of task-switching and conflict interference, whilst the delay discounting task assesses impatience. Alternative paradigms may be required to capture participant-wise behaviour amongst these constructs.
We acknowledge methodological limitations of our investigation. The lack of a counterbalanced on-off stimulation design means that we cannot directly infer that stimulation underlies the changes in perceptual modelling observed in our participants (rather than, for example, practice effects or time). Specifically, for our participants the pre-DBS session was the first time they had performed the task, and so changes in postoperative behaviour could also be attributable to greater familiarity with the task and perhaps with the inherent volatility of the reward structure. However, we suggest that a strength of our longitudinal design is that it is more reflective of the natural clinical course followed by persons with PD. Moreover, our participants simply would not have tolerated an extended DBS washout, and we hypothesise that the younger age of participants in the study of Seymour et al. may have facilitated their crossover design 17 . Nevertheless, it would be important to consider future experiments that could resolve this question, for example, selecting a cohort of younger PD participants who could tolerate a washout of stimulation, or testing a cohort of PD participants without DBS twice, 13 weeks apart.
Unfortunately, in this study, we were unable to utilise estimates of meta-volatility in our analysis as ϑ could not be robustly recovered from simulated data. This failure to recover ϑ might result from the limited number of trials (100) completed by each participant, which limits the amount of information that can be gathered to update estimates of this higher-level HGF parameter from the population prior. Again, the disability of our participant cohort prohibited a greater number of trials (as employed in previous studies using this paradigm) 19 , but this could be considered in future studies using younger or less severely-affected participants.
In summary, this study suggests that subjective estimates of uncertainty pertaining to environmental volatility and the stochasticity in belief-to-choice mapping change after subthalamic DBS for PD and relate significantly to postoperative impulsivity. Increased estimates of environmental uncertainty (volatility) and increased noise in the decision process may contribute to impulsivity as a clinically relevant form of maladaptive behaviour. Uncertainty elevates the learning rate and suppresses top-down expectations, which may blunt error signalling in a series of trial-wise outcomes. Similarly, a consistent decision rule with regards to acting on an internal model of the world is important to make appropriate decisions based on what has been learned. We therefore posit a cognitive mechanism for the genesis of impulsive behaviour in this population. Finally, our results demonstrate that a naturalistic assessment of gambling behaviour in a virtual casino is useful for investigating impulsivity in PD. The potential of our model to explain changes in impulsivity through game play could be most valuable in PD, given the significant, but poorly quantified risks relating to surgical (neurostimulation) and medical (dopamine agonist) treatments. If those at a higher risk of neuropsychiatric harm could be identified, this would improve the nature of treatment choice and informed consent and the effectiveness of clinical follow-up.

Data Availability
The HGF toolbox is part of the open source TAPAS software and available for download at http://www.translationalneuromodeling.org/tapas. The gambling paradigm is provided for download on a git repository at https://github.com/saeepaliwal/breakspear_slot_machine.git. The analysis pipeline is provided at https://github.com/saeepaliwal/dbs_pd_analysis_pipeline.git. A de-identified data set containing neuropsychiatric assessment and gambling data can be provided by Dr Philip Mosley (Philip.Mosley@qimrberghofer.edu.au) on application, subject to institutional review board approval.