## Introduction

Major depressive disorder (MDD) has been associated with impairments in reward processing, and many studies indicate that symptoms of MDD correlate with diminished neural and behavioral responses when rewards are presented1,2,3,4,5,6. These studies have typically used reward learning and other tasks that provide feedback about rewards and focused on individuals’ responses at this feedback or ‘reward outcome’ phase (see Rizvi et al.7 for review). However, little is known about how depression affects reward valuation during decision-making in the absence of learning and feedback. Understanding whether individuals with MDD have disrupted valuation during decision-making at the ‘decision phase’, separate from reward outcome, will clarify whether individuals with MDD are disrupted overall in reward valuation or more specifically in experiencing rewards. Here, we used a risky decision-making task, a model-based analytic approach, and a repeated measures within-subject design across four visits to investigate whether participants with MDD have intact or disrupted valuation during decision-making in the absence of learning and feedback.

Sixty-nine individuals with current MDD and 41 non-psychiatric controls were recruited for the current study. To investigate ‘value sensitivity’ during decision-making independent from feedback, we asked participants to complete a risky decision-making task (adapted from Holt & Laury8 and Dickhaut et al.9) (Fig. 1). During the task, participants made a series of nine choices between two gambles, one of which was objectively riskier than the other8. Each pair of gambles had the same high- and low-payoff probabilities; the high-payoff probability increased from 10% to 90% in 10% increments (as the low-payoff probability decreased) along the nine pairs. Participants’ choices between the safer and riskier options, at each payoff and probability combination, were recorded to investigate individual value sensitivity. Participants were paid based on the actual outcome of one of their choices; the outcome was determined after all choices had been made (i.e., no feedback at each decision). This paradigm allowed us to examine valuation during decision-making, independent from potential learning and outcome effects.

Tasks of this sort are classically used to study individuals’ value-based decision-making under risk10,11, and expected utility theory12 points to two basic components that account for differences among individuals’ choices in such tasks. The first, ‘risk preference13,14 (RP)’ reflects how objective values are subjectively perceived (subjective value) and is quantified by the curvature of a power utility function12. This approach is equivalent to defining risk as the variance of each option and thus, takes into account both the reward magnitude outcome spread and probability information. The second component determines the degree to which subjective value differences between options affect the probability of choosing one option over the other, and is often referred to as ‘inverse temperature15 (IT)’. Both components characterize individual differences in the direction and the degree to which objective values impact individual choices, and thus are used as measures of value sensitivity in the current study. Note that each measure explains a different functional relationship between subjective values and decision-making: RP accounts for nonlinear (concave or convex function) subjective valuation and IT is a linear scaling of subjective values (similar to ‘reward sensitivity’ in other MDD studies1; see Methods for expected utility model specifications). The value sensitivity measures were estimated from individuals’ choices using maximum a posteriori fitting (see Methods for parameter estimation procedure).

Participants completed the decision-making task on up to four laboratory visits as part of a longitudinal study; on average, visits were separated by 5.5 weeks (mean of 116.27 days between Time 1 and Time 4 visits). At each visit, participants were instructed that one of their actual choices would be randomly selected and played out to determine their payoff at the end of the visit. The payoff was determined by the values of a gamble selected via random number generator from the participant’s actual choices and a roll of a hundred-sided die (the first determining which gamble would be played and the second determining the payoff). Participants who made less than two visits to the laboratory, had Beck Depression Inventory (BDI-II) scores16 >12 for controls or <13 at Time 1 for MDD participants, or always chose the option with smaller expected value were excluded from analyses (see Methods for numbers of excluded participants for each criterion). The analyzed sample for RP and IT parameter estimation included 33 non-psychiatric controls (14 females; age = 33.00 ± 11.31) and 65 individuals with MDD (48 females; age = 37.92 ± 11.48). See Table 1 for further demographic information.

## Results

### Valuation is comparable between MDD and non-psychiatric control participants

To compare value sensitivity in MDD participants with that of non-psychiatric controls, we estimated each individual’s risk preference and inverse temperature for each visit, and first compared the means of these parameters between groups (see Methods for details about parameter estimation). Thus, RP and IT at each of the four visits were computed for each individual, for participants who visited all four times (Ncontrol = 28, NMDD = 47). Group mean parameter values were: RPcontrol = 0.50 ± 0.31; RPMDD = 0.46 ± 0.31; ITcontrol = 3.41 ± 0.41; and ITMDD = 3.25 ± 0.43 (mean ± s.d.). Note that both the MDD and non-psychiatric control groups showed risk aversion (RP < 1) consistent with Holt & Laury8. Across four laboratory visits, participants with MDD showed comparable RP and IT to that of non-psychiatric controls (Fig. 2ai,bi; RP: F(1, 219) = 0.63, P = 0.43; IT: F(1, 219) = 2.68, P = 0.11; Group × Time mixed-design ANOVAs with rank transformation17). These results were confirmed using Bayesian null hypothesis significance testing (JASP, version 0.8.0.018) which similarly indicated no group differences in RP or IT (see Supplementary Text for additional details and Table S5 for Bayes factors). These results indicate that MDD and non-psychiatric control participants have comparable linear and nonlinear value sensitivities during decision-making.

### Valuation is stable over time for both MDD and non-psychiatric control participants

Previous studies have shown that risk preferences measured with variations of the Holt & Laury task8 are stable over time in unselected control individuals, particularly when model-based estimates were used19,20. As an initial test of RP and IT stability within MDD and control participants, we compared within-group means over time; these analyses indicate that neither parameter differed over time for either group (RPcontrol: χ2(3, 81) = 2.12, P = 0.55; RPMDD: χ2(3, 138) = 0.66, P = 0.88; ITcontrol: χ2(3, 81) = 2.94, P = 0.40; ITMDD: χ2(3, 138) = 2.40, P = 0.49; Friedman’s tests, analogous to non-parametric repeated measures ANOVAs; only participants who visited all four times were examined here (Ncontrol = 28, NMDD = 47)). Adopting the approach of previous studies for measuring temporal stability, we also examined the stability of RP and IT within controls and participants with MDD by correlating the value of each parameter between pairs of visits ([1st vs 2nd visit], [1st vs 3rd visit], … [3rd vs 4th visit]) (Fig. 2aii,bii). Both control and MDD participants showed moderate to high stability in both RP and IT, respectively (mean correlation coefficients: RPcontrol: Spearman ρ = 0.57; RPMDD: ρ = 0.54; ITcontrol: ρ = 0.48; ITMDD: ρ = 0.57; see Fig. 2aii and bii for full correlation matrix; Supplementary Table S2 for sample size at each visit). Note that the proportion of risky choices, a model-free measure of risk preference, was also stable over time in both the MDD and control groups (see Supplementary Fig. S2 for model-free risk preference stability over time). Again, Bayesian null hypothesis significance testing confirmed the stable RP and IT across visits (see Supplementary Text and Table S5 for details and Bayes factors). These significant correlations indicate that for MDD and control participants, the risk preference and inverse temperature measures of value sensitivity at the decision phase are stable over time.

### Valuation does not vary with severity of clinical symptoms in MDD

Given previous reports that reward sensitivity at decision outcome varies with symptoms in depression1,21,22,23, we also examined whether RP and IT varied systematically with depressive or anxious symptoms. Symptoms were measured using the BDI-II16, Anhedonia subscale of the BDI-II (sum of BDI items 4, 12, 15, and 21)24, State Anxiety Scale of the Spielberger State-Trait Anxiety Inventory (STAI)25, and the five subscales of the Mood and Anxiety Symptoms Questionnaire (MASQ; Anhedonic Depression, Anxious Arousal, General distress (GD):Anxiety, GD:Depression, and GD:Mixed)26; correlations were performed within the MDD group. None of the clinical symptom scores or changes in symptoms over time were related to MDD participants’ RP or IT parameter values (see Supplementary Fig. S1, Supplementary Table S3, and S4 for statistical test results). These data demonstrate that individual differences in value sensitivity during decision-making are not explained by clinical characteristics of MDD.

## Discussion

The current study used a risky decision-making task to investigate MDD individuals’ value sensitivity at the decision phase independent from learning and feedback. The within-subjects repeated-measures design allowed us to examine the stability of the value sensitivity measures, and the model-based approach dissociated linear (inverse temperature) and nonlinear (risk preference) value sensitivities that together determine behavioral choices during risky decision-making.

A few previous studies have used risky decision-making paradigms and measured MDD individuals’ risk preferences. The results, however, have been inconsistent. Some studies have reported decreased risk seeking behavior in individuals with MDD21,27,28, while other studies have reported comparable risk preferences between individuals with MDD and healthy individuals29,30. In the current study, we showed that risk preferences (nonlinear value sensitivity) in individuals with MDD are comparable with those of healthy individuals. The stability of risk preferences was tested across four repeated visits, and consistent with previous findings in unselected control individuals19,20,31, MDD participants showed stable risk preferences over time (c.f., model-free measures showing lower reliability32,33,34). In addition to estimating risk preference, we examined inverse temperature (linear value sensitivity, similar to ‘reward sensitivity’ in other MDD studies1) at the decision phase, and showed that MDD participants have stable and comparable inverse temperature compared with non-psychiatric controls. In addition, none of the clinical symptom severity measures within participants with MDD were related to individual differences in risk preference or inverse temperature. These results indicate that in contrast with previous decision-making studies showing blunted valuation at the outcome phase in MDD1, neither linear nor nonlinear value sensitivity at the decision phase in MDD was different from that of controls.

To date, studies examining valuation in MDD have focused primarily on the outcome phase of reward learning tasks and shown impaired valuation, including diminished neural reward responses35,36,37, reduced learning rate38, lower reward sensitivity1, or enhanced exploration (more frequent choice shifting)39,40 in participants with MDD. A few other studies have used various non-learning tasks and have suggested that individuals with MDD have low motivation for monetary reward21,41,42; however, in these studies, the focus was also on responses at the outcome phase22,23. Unlike the abundant literature about responses to reward outcome (particularly during reward learning), little is known about whether individuals with MDD have intact ability to process and compare values during decision-making when no learning is required. The current study provided no outcome feedback during the task and thus focused on the decision phase dissociated from learning and reward experience. These data showed that during the decision phase, participants with MDD have value processes comparable to that of healthy individuals. This is consistent with previous studies showing intact neural responses in individuals with MDD during reward anticipation (prior to outcome)43,44. Together, the present data suggest that individuals with MDD have intact valuation when reward contingencies are fully known (no reward learning required) and suggest that previously reported valuation deficits in MDD are specific to the outcome phase of tasks in which rewards are experienced and learning occurs. We note that this conclusion is drawn based on the assumption of linear probability weighting (see Supplementary Text for nonlinear probability weighting function) and the present task where each pair of gambles had the same high- and low-payoff probabilities. Individuals with MDD may exhibit altered valuation in other environments (e.g., when making choices between two options that have different payoff probabilities); these possibilities cannot be ruled out in the present data.

In MDD, intact valuation, dissociated from learning, may provide mechanistic insight about behavioral activation therapies for depression45. These type of therapies engage individuals with potential positive reinforcers (rewards) in a structured manner and, in doing so, allow individuals with MDD to largely bypass disrupted learning processes. That is, behavioral activation provides a guided learning environment wherein engagement and experience of action-reward contingencies are enforced, allowing for the value of rewards to evolve from being unsampled and ambiguous to sampled and fully known. Once these values are known, intact decision processes such as those identified here allow individuals to engage in healthy choices. As our data indicate, when action-reward contingencies are fully known, participants with MDD show intact valuation during decision-making. We speculate that this state is comparable to the endpoint of successful behavioral activation wherein the experience of reward is restored. In brief summary, the current study suggests specificity of previously reported value processing disruptions in MDD, informs the conditions under which sensitivity to reward values is preserved, and offers the possibility that learning about reward values, rather than discriminating among values when making decisions, may be a mechanistic target for intervention in MDD.

## Methods

### Participants

Fifty non-psychiatric controls and 80 individuals with MDD were recruited as part of a larger ongoing study examining neural substrates of treatment response in MDD (neural and treatment data will be analyzed as part of another manuscript). Among these participants, we included individuals who at least participated in both Time 1 and 4 laboratory visits to maximize the time interval for test-retest reliability. These inclusion criteria yielded 41 non-psychiatric controls and 69 individuals with MDD for the present study. Basic inclusion/exclusion criteria were initially assessed via telephone and were confirmed during the first laboratory visit using the Structured Clinical Interview for DSM-IV-TR Axis I Disorders – Research Version – Patient Edition (With Psychotic Screen) (SCID-I/P)46 and selected modules of the Mini-International Neuropsychiatric Interview (M.I.N.I.)47. At study intake, individuals in the MDD group met DSM-IV criteria for MDD and/or dysthymia while individuals in the control group did not meet criteria for any current Axis I disorder. Exclusion criteria for all participants included contraindications to magnetic resonance imaging (MRI) and history of neurological disease. Following the initial screening visit (Time 1), participants returned to the lab up to three times; on average, there were 5.5 weeks between each visit. Three controls whose BDI-II scores were above the non-depressive range (i.e., greater than 12) at any visit and two individuals with MDD who had BDI-II scores in the non-depressive range (i.e., less than 13) at Time 1 were additionally excluded from analyses48. Five controls and two individuals with MDD who always chose the option with smaller expected value were also excluded. Therefore, the analyzed sample for RP and IT parameter estimation included 33 healthy controls (14 females; age = 33.00 ± 11.31) and 65 participants with MDD (48 females; age = 37.92 ± 11.48). See Table 1 for additional demographic information. Use of psychotropic medication was not an exclusion criterion for individuals with MDD, and at study enrollment, 20 participants with MDD were taking one or more psychotropic medications. As noted above, these data were collected as part of a larger study examining neural substrates of treatment response in MDD, and a subgroup of individuals with MDD received cognitive behavioral therapy (CBT) over the course of participation (N = 45; see Supplementary Table S1 for demographic data for the treatment group). As such, participants showed decline in symptoms over time; neither symptoms nor symptom change was related to either RP or IT (as detailed throughout the main text). All participants provided written informed consent following an explanation of study procedures. The study was approved by the Institutional Review Board (IRB) of Virginia Tech, and all experimental procedures followed relevant institutional guidelines and regulations.

### Experimental procedure

Participants made a series of nine choices between two gambles, one of which was objectively riskier than the other (adapted from Holt & Laury8) (Fig. 1). Each pair of gambles had the same high- and low-payoff probabilities that varied from 10% to 90% in 10% increments along the nine pairs. Payoff spreads between high- and low- payoffs were fixed for each option; ‘Option A’ had $5.00 and$4.00, and ‘Option B’ had $9.63 and$0.25 as potential payoffs. Participants were paid based on the actual outcome of one of their choices; the payoff was determined by the values of a gamble selected via random number generator from the participant’s actual choices and a roll of a hundred-sided die (the first determining which gamble would be played and the second determining the payoff).

### Model-free analyses

For model-free behavioral analyses, the proportion of choices of the risky option (P(risky)) among the nine pairs of gambles was used as a measure of risk preference. Given the expected value (EV) between pairs of choices (Fig. 1), a risk neutral individual should show P(risky) = 5/9 ≈ 0.56 (as per expected utility theory, a risk neutral individual is expected to choose Option B in the trials where EV(B) > EV(A), decisions 5–9, and to choose Option A in the trials where EV(B) < EV(A)). Higher P(risky) thus indicates risk seeking. P(risky) was calculated per visit and used to examine stability of model-free risk preferences over time in each group.

### Estimates of individual risk preference

We applied expected utility theory12 to estimate each individual’s risk preference (RP) and inverse temperature (IT) that predict the individual’s choices. We used a standard power utility function and softmax choice rule as described below:

where UA (UB) is the utility of the Option A (Option B), P is the probability of earning a payoff, V represents the payoff amount for each gamble, α is the risk preference, and μ is the inverse temperature. The estimated RP, α, indicates whether an individual is risk averse (0 < α < 1), risk neutral (α = 1), or risk seeking (α > 1). The estimated IT, μ, indicates how sensitive an individual is to the utility differences between the two gambles; larger μ indicates higher sensitivity to utility differences and μ ≈ 0 indicates utility (subjective value) insensitivity.

To achieve a more stable parameter estimation for each individual, we adopted a hierarchical model structure of the population49 in which it is assumed that a participant i’s parameters (μi and αi) are sampled from the population’s parameter distribution. Of importance, both controls and participants with MDD were considered to share the same group-level (population) distribution (equal prior), which allowed us to compare the two participant groups in the further analyses. That is, the two groups were assumed to be coming from the same distribution every time the group-level parameters were updated throughout the estimation procedures. As this approach may bias against finding group differences (because individuals from two groups are treated as samples from equal priors), we also implemented two additional approaches with different assumptions: i) separately estimating group-level distributions (biasing toward finding group differences) and ii) estimating an additional variable that captures a potential group mean difference (examining whether the mean group difference is different from zero) (see Supplementary Text for details).

Based on the assumptions for each approach, we estimated the group-level parameter distribution for each parameter and set the distribution as a prior for individual estimation (maximum a posteriori (MAP) estimation). In the current study, we set the group-level distribution of each parameter as a gamma distribution50 with a shape parameter, k, and a scale parameter, θ, (μ ~ Γ(kμ, θμ); and α ~ Γ(kα, θα)). For each iteration of the group-parameter estimation (max iteration of 15,000), 100 random samples were drawn from each parameter distribution for each participant, and the average of the calculated likelihoods was used as an approximation of the integral in the following equation:

Note that all participants visited at least twice, including the 1st and the 4th visits.

To allow for RP and IT values to vary across time independent from subject-level information, we did not provide any information about subjects’ identity in the estimation step; that is, behavioral choices from a participant’s 1st and 4th visits were considered decision patterns from two independent participants. Note that estimated value sensitivities for the same subject from repeated visits were considered as repeated-measures for post estimation stability testing. To apply this method, we used 196 sets of behavioral choices for the group-level parameter estimation ([33 HC + 65 MDD] × [1st visit + 4th visit]; only 1st and 4th visits were used to provide an equal amount of choice information from each individual participant). The group-level parameters were used to define each parameter’s prior distribution for individual-level estimation, which was equally applied to individual-level estimations for all four visits. We fit the data using MAP, with the posterior function below:

All parameter estimations were conducted with custom MATLAB R2015b (MathWorks) scripts and the fminsearch function in MATLAB with multiple initial values.

### Clinical measures

At each visit, participants completed a battery of self-report measures to assess current depression and anxiety symptoms. Depressive symptom severity was measured using the BDI-II, Anhedonia subscale of the BDI-II (sum of BDI items 4, 12, 15, and 21)24, and the Anhedonic Depression subscale score of the MASQ. Anxiety symptom severity was measured using the State Anxiety Scale of the STAI (State-Trait Anxiety Inventory) and the Anxious Arousal subscale of the MASQ. Additionally, general distress (GD) related to depressive symptoms, anxious symptoms, or a mixture of the two were measured using the MASQ subscales, GD: Anxiety, GD: Depression, and GD: Mixed, respectively.

### Statistical analyses

We examined if model-free risk preference (proportion risky choices) and model-based measures of value sensitivity (inverse temperature and risk preference) were consistent across multiple visits. IT and RP measures in both participant groups were not normally distributed (Shapiro-Wilk test P < 0.01 for IT and RP in each group and in each visit), and thus non-parametric tests were used as appropriate and available. First, to compare the means of IT and RP across four laboratory visits and between groups, we used mixed-design ANOVAs where visit number (Time 1, Time 2, Time 3, Time 4) was the within-subject factor and diagnostic group (MDD, control) was the between-subject factor. Parameters were first rank transformed and then inserted for the mixed-design ANOVAs17. In addition, we used Friedman’s test to examine whether IT and RP across four visits were stable or not, within each group. Second, Spearman’s correlations between risk preference measures from two different visits (‘1st visit’ (T1) vs T2, T1 vs T3, T1 vs T4, T2 vs T3, T2 vs T4, and T3 vs T4) were calculated to test if the rank-order of risk preference within each group was consistent across multiple visits. All statistical tests were two-sided. False discovery rate (FDR) adjusted q-values where indicated were reported for multiple comparisons51. MATLAB R2015b was used for all statistical tests.