Deficits in reinforcement learning but no link to apathy in patients with schizophrenia

Negative symptoms in schizophrenia have been linked to selective reinforcement learning deficits in the context of gains combined with intact loss-avoidance learning. Fundamental mechanisms of reinforcement learning and choice are prediction error signaling and the precise representation of reward value for future decisions. It is unclear which of these mechanisms contribute to the impairments in learning from positive outcomes observed in schizophrenia. A recent study suggested that patients with severe apathy symptoms show deficits in the representation of expected value. Considering the fundamental relevance for the understanding of these symptoms, we aimed to assess the stability of these findings across studies. Sixty-four patients with schizophrenia and 19 healthy control participants performed a probabilistic reward learning task. They had to associate stimuli with gain or loss-avoidance. In a transfer phase participants indicated valuation of the previously learned stimuli by choosing among them. Patients demonstrated an overall impairment in learning compared to healthy controls. No effects of apathy symptoms on task indices were observed. However, patients with schizophrenia learned better in the context of loss-avoidance than in the context of gain. Earlier findings were thus partially replicated. Further studies are needed to clarify the mechanistic link between negative symptoms and reinforcement learning.

Scientific RepoRts | 7:40352 | DOI: 10.1038/srep40352 of these previously learned values. The orbitofrontal cortex (OFC) is thought to be significantly involved in the representation of expected value 14 . Thus, value-based decision-making relies on a learning mechanism at the time of outcome, which relates to relevant previous situations (PE signaling in the midbrain) and a "storage unit" that holds a representation of the expected value of choice options (expected value representation in the OFC).
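The two components described above can be illustrated with a minimal delta-rule sketch: a prediction error is computed at the time of outcome and is used to update a stored expected value. This is a generic textbook update, not the model used in any of the cited studies; the function name `delta_rule`, the learning rate `alpha`, and the outcome sequence are illustrative assumptions.

```python
def delta_rule(outcomes, alpha=0.5, v0=0.0):
    """Track an expected value with a prediction-error (delta-rule) update.

    alpha (learning rate) and v0 (initial value) are illustrative assumptions.
    Returns a list of (prediction_error, updated_value) tuples, one per outcome.
    """
    value = v0
    history = []
    for outcome in outcomes:
        prediction_error = outcome - value   # PE: obtained minus expected outcome
        value += alpha * prediction_error    # stored expected value moves toward the outcome
        history.append((prediction_error, value))
    return history


# Hypothetical outcome sequence: 1 = reward delivered, 0 = reward omitted.
history = delta_rule([1, 1, 0, 1])
for pe, v in history:
    print(f"PE = {pe:+.4f} -> expected value = {v:.4f}")
```

In this sketch the omitted reward on the third trial produces a negative PE that lowers the stored value, while the rewarded trials produce positive PEs that raise it.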
Selective reinforcement learning deficits with reduced learning from positive, but intact learning from negative outcomes have been reported in patients with schizophrenia [15][16][17] , which is in line with a proposed neurocomputational model of dopamine-induced basal ganglia-cortex interactions 15,18 . It has further been argued that this selective deficit might be linked to the severity of negative symptoms 16,19 . The selective learning deficit in the context of reward constitutes a plausible factor causing and/or maintaining negative symptoms, especially motivational deficits manifesting themselves as apathy. However, it is unclear whether these findings reflect impairments in positive PE signaling during outcomes or in the precise representation of expected reward values to guide decision-making. In their seminal paper, Gold and colleagues 20 applied a task that allowed the investigation of the relative contribution of these two mechanisms. In an acquisition phase, participants were presented with stimulus pairs involving either potential gain or potential loss. Following this acquisition phase, where participants learned the stimulus-outcome associations of these pairs, they indicated the valuation of the different stimuli in a transfer test phase. In this transfer test phase, participants chose between novel combinations of all learned stimuli without additional feedback. The most critical novel stimulus combination was the one that involved choosing between an action that had been rewarded and one that avoided loss. Both these stimuli should have elicited positive PEs during acquisition and thus should have been reinforced. If an actor made decisions based only on PE signals during learning, he would be indifferent between the rewarded and the loss-avoiding stimulus.
In contrast, if the actor would base his decision on representations of expected value, he should prefer the rewarded to the loss-avoiding stimulus, since the latter does not yield reward. Patients with high apathy symptoms showed impairments in learning from gain but intact loss-avoidance learning. Moreover, patients with high apathy levels failed to prefer the rewarded stimuli (that with higher expected value) to the loss-avoiding stimuli in the transfer test phase, indicating deficits in the representation of expected value.
The findings of Gold and colleagues 20 have provided new insights and contributed to a mechanistic understanding of motivational negative symptoms in schizophrenia. In light of the relevance of this pattern and the neurobiological plausibility thereof, we aimed to assess the stability of the finding across studies as part of a larger study investigating neural and cognitive correlates of negative symptoms in schizophrenia. Based on the study by Gold and colleagues 20 and other previous studies [15][16][17] , we hypothesized that patients compared to healthy control participants would show impaired gain, but intact loss-avoidance learning in the acquisition phase. Moreover, we expected that patients would generally show better learning in loss-avoidance relative to gain-seeking trials, as revealed by within-group analyses. Critically, we hypothesized that both these effects would be more pronounced in patients with high apathy compared to patients with low apathy symptoms. Moreover, based on the paper of Gold and colleagues 20 , we expected that in the transfer test phase high apathy patients, compared to low apathy patients and controls, would show a weaker preference for rewarded stimuli over the stimuli that simply avoided loss, which would be indicative of impaired expected value representation.

Methods
Participants. Participants included 64 patients with a DSM-IV 21 diagnosis of schizophrenia (n = 60) or schizoaffective disorder (n = 4; no mood episode; the patient group is abbreviated SZ), recruited from the in- and outpatient units of the Psychiatric University Hospital in Zurich (Switzerland) and the Psychiatric Hospital Karlsbad Langensteinbach (Germany). Patients were stable regarding medication and psychopathology and met the following inclusion criteria: daily lorazepam dosage ≤ 1 mg, no florid positive symptoms (Positive and Negative Syndrome Scale, PANSS 22 ; no positive item score > 4), no extrapyramidal symptoms on clinical examination, no additional DSM-IV axis I or II diagnosis. Nineteen healthy control participants (HC) were recruited from the hospital staff of the Psychiatric University Hospital in Zurich. In order to confirm axis I diagnosis in the SZ group, and exclude potential HC group participants with axis I disorders, we conducted the Mini-International Neuropsychiatric Interview 23 .
Participants did not enter data analysis if learning performance in the probabilistic learning task did not exceed 0.5. In other words, they had to perform better than would be expected from choosing by chance. Seven participants from the SZ group and one HC participant failed to reach the criterion and were thus excluded from further analyses. All remaining patients received antipsychotic medication. Eight were also treated with antidepressants, 5 with mood stabilizers, 1 with an anxiolytic agent, and 2 with anticholinergic medication. Further clinical and demographic information on the study sample is presented in Table 1.
The ethics committee of the Canton of Zurich approved the study protocol and all methods and experimental procedures were performed in accordance with the latest version of the Declaration of Helsinki. All participants provided written informed consent.

Assessment of psychopathology and cognition.
To assess the symptom severity in our patient sample, we applied the PANSS 22 . To further assess negative symptoms more accurately and according to the recent consensus regarding their two-dimensionality (i.e., apathy and diminished expression) 1,2 , we used the German version of the Brief Negative Symptom Scale (BNSS) 24 . Patients were divided into a high apathy (HA) and a low apathy (LA) group according to a median split on the apathy factor (anhedonia, asociality, and avolition) of the BNSS. Please note that Gold and colleagues 20 applied the Scale for the Assessment of Negative Symptoms 25 and extracted a factor comprising avolition and anhedonia global items 26 , which is similar to the apathy factor of the BNSS 27 . Thus the study by Gold and colleagues and the present study have targeted the same psychopathological construct. Social, occupational, and psychological functioning of patients was evaluated with the Global Assessment of Functioning scale (GAF) 28 . Participants were also characterized regarding premorbid intelligence using a multiple-choice vocabulary intelligence test 29 . We further applied the digit-symbol coding test of the Wechsler Intelligence Scale 30 , which assesses processing speed and has been shown to significantly predict composite cognition scores in schizophrenia 31 .
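The median split on the apathy factor can be sketched as follows. The scores and participant labels are invented for illustration, and the handling of scores exactly at the median (assigned here to the low apathy group) is an assumption, since the text does not specify it.

```python
import statistics


def median_split(apathy_scores):
    """Assign each participant to 'HA' or 'LA' by a median split.

    Assumption: scores equal to the median are assigned to 'LA';
    the source does not state how such ties were handled.
    """
    cutoff = statistics.median(apathy_scores.values())
    return {pid: ("HA" if score > cutoff else "LA")
            for pid, score in apathy_scores.items()}


# Hypothetical BNSS apathy factor scores for five patients.
apathy = {"p01": 5, "p02": 12, "p03": 8, "p04": 20, "p05": 15}
groups = median_split(apathy)
print(groups)
```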
Patients differed from the HC group regarding processing speed and premorbid verbal intelligence (see Table 1). Groups did not significantly differ regarding age and gender. The LA and HA group showed differences in gender distribution (relatively more men in the HA group), in total negative symptom severity, in severity of positive symptoms (more severe positive symptoms in the HA group), global psychopathology (more severe global symptoms in the HA group), and in functioning (lower functioning in the HA group).

Probabilistic reward learning task. The present learning task is adapted from the version introduced by Gold and colleagues 20 . In a pilot study, we implemented the task using images of landscapes as learning stimuli as in the original Gold study. However, five patients with schizophrenia all failed to learn stimulus-outcome associations above chance level. Consequently, we adjusted the task in several respects. First, we replaced the landscape stimuli with line drawings of everyday objects (taken from a larger set of 260 pictures by Snodgrass and Vanderwart) 32 . Second, to emphasize the learning context (gain vs. loss-avoidance), we added green or red bars above and below the stimuli. Third, instead of the verbal feedback in the trials with "zero outcome" ("Not a winner, Try again!" or "Keep your money!"), we presented an empty white circle. Each of these adaptations was aimed at making the task easier for the participants.
The resulting adapted version of the task was administered via the MATLAB toolboxes Cogent 2000 and Cogent Graphics on a 19-in monitor. The task is structured in two phases, an acquisition and a transfer phase. During the acquisition phase, participants were presented with 4 pairs of line drawings of everyday objects (white on black background), 1 pair at a time (see Fig. 1). Two pairs involved a potential gain if the correct item was selected and two pairs involved a potential loss if the incorrect item was selected. For the gain pairs, if the correct item was selected, participants saw the image of either 10 Swiss cents (≈ 0.096 US dollars at the time of testing) or 10 Euro cents (≈ 0.090 US dollars at the time of testing) coupled with the feedback "Win!" (positive PE), whereas if the incorrect item was selected, an empty white circle was presented, indicating that nothing was won in this trial ("zero outcome", negative PE). The correct response was reinforced in 90% of trials in one pair and in 80% in the other pair. Two other pairs involved a potential loss (loss-avoidance pairs). In these pairs, selection of the incorrect response resulted in the feedback "Lose!" combined with a crossed out image of the coin (negative PE). If the correct item was selected in the loss-avoidance pairs, participants avoided a loss 90% or 80% of the time and were presented with an empty white circle ("zero outcome", positive PE). A 12-trial practice session was administered to ensure task comprehension, followed by 160 learning trials, which were presented in randomized order within 4 blocks. Each pair was presented 40 times in the acquisition phase. Following acquisition, the 64 trials of the transfer test phase were presented. The original pairings of the acquisition phase were each presented 4 times, and 24 novel pairings were each presented twice.
Novel pairings consisted of each trained item combined with one another (e.g., a 90% winner item was paired with both items from the 80% gain pair, the 90% loss-avoidance pair, and the 80% loss-avoidance pair). No feedback was given during the transfer test phase. Participants were instructed to select the item that they thought was "best" based on prior learning in the acquisition phase.
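The trial counts given above can be checked by enumerating the stimulus pairings. The sketch below is only a bookkeeping illustration, not the Cogent/MATLAB task code; the stimulus labels are hypothetical names for the 8 trained items (4 pairs with a correct and an incorrect item each).

```python
from itertools import combinations

# Hypothetical labels: learning context, reinforcement rate, and item role.
stimuli = [
    f"{context}{rate}_{role}"
    for context in ("gain", "loss")
    for rate in (90, 80)
    for role in ("correct", "incorrect")
]


def same_training_pair(a, b):
    """Two items belong to the same acquisition pair if context and rate match."""
    return a.rsplit("_", 1)[0] == b.rsplit("_", 1)[0]


all_pairs = list(combinations(stimuli, 2))
original_pairs = [p for p in all_pairs if same_training_pair(*p)]
novel_pairs = [p for p in all_pairs if not same_training_pair(*p)]

acquisition_trials = 40 * len(original_pairs)                    # each pair shown 40 times
transfer_trials = 4 * len(original_pairs) + 2 * len(novel_pairs)  # 4x original, 2x novel

print(len(original_pairs), len(novel_pairs), acquisition_trials, transfer_trials)
```

Eight items yield 28 possible pairings, of which 4 are the original acquisition pairs and 24 are novel, which reproduces the 160 acquisition trials and the 64 transfer trials.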

Statistical analyses.
To investigate performance in the acquisition phase, we computed a repeated-measures analysis of variance (ANOVA) with the between-subject factor group (HC, LA, HA) and within-subjects factors of feedback valence (gain vs. loss-avoidance), probability (80% and 90%), and learning block (1-4). Huynh-Feldt correction was applied if the assumption of sphericity was violated. Significant interaction terms were followed up by post hoc Fisher's Least Significant Difference (LSD) contrasts. In line with Gold and colleagues 20 , we subtracted performance in loss-avoidance trials from the performance in gain-seeking trials (gain-loss difference score) to additionally investigate whether learning was better in the context of gain or loss-avoidance. Potential group differences were then examined using ANOVA, while within-group balance between gain vs. loss-avoidance learning was examined using one-sample t-tests. Performance in the transfer test phase was evaluated using ANOVA, follow-up post hoc LSD tests, and one-sample t-tests to test for within-group preferences between stimuli.
Please note that p-values of one-sample t-tests were not corrected for multiple testing. Raw task data was processed using MATLAB. All statistical analyses were performed using SPSS version 22.0.
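As an illustration of the gain-loss difference score and the one-sample t-test against zero, the following sketch computes the t statistic by hand using only the Python standard library. It is not the SPSS procedure used in the study, and the accuracy values are invented for demonstration.

```python
import math
import statistics


def one_sample_t(values, mu=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test against mu."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)            # sample SD (n - 1 in the denominator)
    t = (mean - mu) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical proportions of correct choices for four participants.
gain_accuracy = [0.8, 0.7, 0.9, 0.6]
loss_avoidance_accuracy = [0.9, 0.8, 0.9, 0.8]

# Gain-loss difference score: gain performance minus loss-avoidance performance.
difference = [g - l for g, l in zip(gain_accuracy, loss_avoidance_accuracy)]
t_stat, df = one_sample_t(difference)
print(f"t({df}) = {t_stat:.2f}")
```

With this sign convention, a negative t value indicates better learning in loss-avoidance than in gain trials.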

Results
Acquisition phase. As depicted in Fig. 2, all groups showed learning across the 4 blocks in all conditions of the acquisition phase. However, the HC group performed better than the patient groups in all conditions, while the LA and the HA groups did not show consistent differences in learning performance. More precisely, a mixed-effects ANOVA with the factors group, feedback valence, probability and learning block revealed significant main effects of group (F(2,72) = 4.01, p = 0.02) and learning block (F(3,72) = 81.20, p < 0.001). Follow-up LSD tests yielded significant differences between the HC group and both the LA (p = 0.01) and HA (p = 0.03) groups, but no significant difference between the two patient groups (p = 0.80). The significant main effect of learning block reflects significant learning performance across blocks. All other main effects, as well as interaction effects (two-way and three-way) were non-significant.
In line with Gold and colleagues 20 , we additionally calculated a gain-loss difference score to investigate the balance of learning from gain vs. loss-avoidance (Fig. 3). We subtracted learning performance in loss-avoidance trials from the learning in gain trials. An ANOVA with this gain-loss difference score as dependent variable yielded no significant group differences between HC, LA, and HA groups (F(2,72) = 0.51, p = 0.60). One-sample t-tests were conducted to investigate the balance of gain vs. loss-avoidance learning within the groups. The HC group did not show a preference for feedback valence in the learning phase (t(17) = − 0.38, p = 0.71). The patient groups showed a tendency towards better loss-avoidance than reward learning; however, the effects were not significant (LA: t(27) = − 1.62, p = 0.12; HA: t(28) = − 1.39, p = 0.18). We further pooled the two patient groups to explore potential effects disregarding negative symptom severity. The pooled patient group showed significantly better learning in the context of loss-avoidance than in the context of reward (t(56) = − 2.14, p = 0.04).
In sum, in the acquisition phase all groups showed learning across blocks. The HC group performed significantly better than the patient groups, which did not differ from each other regarding acquisition learning. Moreover, the pooled patient group learned better in the context of loss-avoidance relative to gain trials independent of apathy symptom severity.

Transfer test phase. In the transfer test phase (Fig. 4), participants were presented with novel stimulus pairings and were instructed to choose the stimulus that they thought was "best" based on prior learning.
Two stimulus pairings in the transfer test phase are indicative of the relative contribution of PEs and representations of expected value. A first relevant pair is the one with the most frequently rewarded stimulus (FW) and the item that most frequently avoided losses (FLA). Both stimuli were associated with identical positive PEs during acquisition learning. However, the expected value of the FW stimulus (frequently wins 10 centimes) is higher than that of the FLA item (frequently avoids the loss of 10 centimes). The one-way ANOVA testing for this first critical group difference was not significant (F(2, 72) = 0.23, p = 0.79). One-sample t-tests on the FW-FLA pair, testing for a difference from 0.5, revealed that the HC (t(17) = 2.83, p = 0.01), the LA (t(27) = 2.15, p = 0.04), and the HA group (t(28) = 4.48, p < 0.001) showed a choice preference implying at least partial representation of expected value. In other words, they preferred the stimulus with higher expected value, despite equal association with positive PEs.
The second relevant pairing in the transfer test phase was the pair with the infrequent winners (IW) and the frequent loss-avoiders (FLA). All participant groups showed a robust preference for the frequent loss avoiding items, although the infrequent winner items had a higher expected value (one-sample t-tests all p's < 0.001). Thus, all groups preferred the stimulus that was more frequently associated with positive prediction errors to the stimulus that had a higher expected value, but was more frequently associated with negative prediction errors. The one-way ANOVA testing for group differences was only trend-level significant (F(2, 72) = 3.11, p = 0.051). In fact, patients showed a tendency for stronger preference for the higher expected value item compared to healthy controls.
In sum, there were no significant group differences in the transfer phase, which tested for learned valuation of the stimuli. Participants showed preference for the FW stimulus in the FW-FLA pairs, indicating expected value representation in all groups. In contrast, in the IW-FLA pairs, all groups preferred the stimuli with lower expected value but more frequent positive prediction errors, which suggests valuation that was more strongly driven by prediction errors. Thus, all groups showed a valuation pattern that is indicative of prediction error based learning combined with a representation of expected value.
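The logic of the two critical pairs can be made explicit with a small worked comparison of what a purely PE-driven chooser and a purely expected-value-driven chooser would predict. This is a simplified illustration, not the computational model of Gold and colleagues. It assumes a payoff of 10 cents, takes the 90% items as the "frequent" stimuli, and assumes that the infrequent winner was rewarded on 10% of trials; that last rate is not stated in the text above.

```python
import math

# (probability of the better outcome in its context, better outcome, worse outcome); cents.
# The 10% win rate for IW is an assumption made for this illustration.
STIMULI = {
    "FW": (0.9, 10, 0),     # frequent winner: usually wins 10 cents
    "IW": (0.1, 10, 0),     # infrequent winner: rarely wins 10 cents
    "FLA": (0.9, 0, -10),   # frequent loss-avoider: usually avoids losing 10 cents
}


def pe_score(name):
    """PE-only account: how often the stimulus yielded the better outcome of its context."""
    return STIMULI[name][0]


def ev_score(name):
    """Expected-value account: probability-weighted monetary outcome."""
    p_better, better, worse = STIMULI[name]
    return p_better * better + (1 - p_better) * worse


def predicted_choice(a, b, score):
    """Return the preferred stimulus, or 'indifferent' if the scores are equal."""
    score_a, score_b = score(a), score(b)
    if math.isclose(score_a, score_b):
        return "indifferent"
    return a if score_a > score_b else b


for pair in (("FW", "FLA"), ("IW", "FLA")):
    print(pair,
          "| PE-only:", predicted_choice(*pair, pe_score),
          "| expected value:", predicted_choice(*pair, ev_score))
```

Under these assumptions, the observed preference for FW in the FW-FLA pair matches the expected-value prediction, whereas the observed preference for FLA in the IW-FLA pair matches the PE-only prediction, which is the mixed pattern summarized above.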
Effects of potential confounders. HC group performance in the cognitive measures used in the present study was significantly better compared to the SZ group. If cognitive measures were added as covariates in the reported mixed-effects ANOVA for the acquisition phase data, the main effect of group did not remain significant (F(2,70) = 0.05, p = 0.56). It can thus not be ruled out that differences in acquisition learning between the SZ and HC group were driven by general cognitive deficits in the SZ group. However, the gain-loss difference score was not significantly associated with cognitive measures in the study groups.
We additionally computed Pearson correlation coefficients (r) between positive symptom severity and chlorpromazine equivalents (CPZ) on the one hand and relevant task indices on the other. Interestingly, positive symptoms correlated negatively (r = − 0.28, p = 0.04) with learning performance in the acquisition phase. Moreover, positive symptoms correlated negatively with the ratio of optimal selection in the FW-FLA pairs of the transfer phase (r = − 0.30, p = 0.03), which might suggest that positive symptoms interfere with the ability to represent the expected reward value of actions. However, since the present study focused on negative symptoms, severe positive symptoms were an exclusion criterion to control for secondary negative symptoms. Thus positive symptom severity was low in the present patient sample. Please note that adding positive symptom severity as a covariate to the ANOVAs computed in the present paper had no significant influence on the results. CPZ equivalents correlated with the ratio of optimal selection in the IW-FLA pairs (r = 0.30, p = 0.02), which might indicate that antipsychotic medication has a positive effect on reward value representation. CPZ equivalents and antipsychotic medication regimens did not differ significantly between the LA and HA group.

Discussion
The present study aimed to investigate probabilistic reinforcement learning in the context of gain and loss-avoidance in a sample of schizophrenia patients and how learning and valuation relate to the severity of apathy. More specifically, we aimed to assess the stability of the findings of a previous study by Gold and colleagues 20 . We found a general deficit in probabilistic reinforcement learning in the patient group compared to healthy controls. However, LA and HA patients did not differ regarding acquisition learning performance. Moreover, we found that the pooled patient group learned better in the context of loss-avoidance relative to gain, independent of severity of apathy symptoms. In the transfer test phase, groups did not differ in their valuation in the two critical stimulus pairs. In other words, diagnosis of schizophrenia and apathy symptom severity did not affect valuation. All participants showed choice behavior in the transfer test phase indicating both learning from positive PEs and representation of expected reward value. The present study could thus partially replicate the findings by Gold and colleagues 20 .
Several studies have suggested that patients with schizophrenia have a selective learning deficit when they have to learn contingencies of reward-related stimulus-outcome associations [15][16][17]19,20 . In the context of preventing a potential loss, learning performance seems to be unimpaired. This pattern has been shown to be most pronounced in patients suffering from severe negative symptoms 16,20 . An organism that is impaired in encoding and learning from positive outcomes and at the same time is unimpaired in learning from negative outcomes is likely to develop defensive strategies, and show decreased exploration and gain-oriented goal-directed behavior. This description matches the clinical symptoms of motivational negative symptoms (i.e., apathy) quite well. In the present study, we could replicate the relatively better learning from loss-avoidance compared to gain learning in patients with schizophrenia. However, both learning indices were impaired compared to the HC group, which is in contrast to some previous studies reporting intact loss-avoidance learning [15][16][17]19,20 , but in line with studies showing impairments in both gain and loss-avoidance learning in patients with schizophrenia [33][34][35] and first episode psychosis 36 . In further contrast to some previous studies 16,19,20 , we found no association of apathy symptoms with a relative preference for loss-avoidance learning or other learning parameters. However, other recent studies have also reported no relation of learning parameters with apathy symptom severity 34,37,38 .
In the present study, we investigated choice in the transfer test phase in two critical stimulus pairings. Contrary to our hypothesis, we did not find any significant group differences. One test pair (FW-FLA) indicated that all participant groups attained an expected value representation during acquisition. However, in the other critical pair (IW-FLA) all participant groups preferred the stimulus with lower expected value, but stronger association with positive prediction errors. These inconsistent findings likely reflect the fact that the proposed two relevant mechanisms (PEs and precise expected reward value representation) are not mutually exclusive, but rather dynamic systems that strongly interact. This notion is in line with the reasoning of Gold and colleagues 20 , who additionally supported the assumption of a "mixed-strategy" regarding the two mechanisms with elegant computational modeling. Considering non-significant group differences in transfer phase choice behavior, additional computational modeling is not indicated to address the primary hypotheses of the present study. In line with our findings, a recent study in patients with first episode psychosis has applied a very similar probabilistic reward learning task and also found no group differences between patients and healthy controls in transfer test performance 36 .
Certain limitations should be considered when interpreting the present findings. First, the sample size of the present study was modest and thus further studies are needed. Second, our patient sample included in- and outpatients. This approach was chosen to include a wide array of patients with different severity of negative symptoms. In the study by Gold and colleagues 20 , the patient sample was recruited from an outpatient clinic. It is thus possible that the heterogeneity in the patient group contributed both to the need for task adaptations to ensure comprehension and to the subsequent results. Participants in the present study showed overall better learning in the acquisition phase compared to the original study. This difference might have also arisen because of the adaptations that were made. Moreover, it cannot be ruled out that the better learning in the acquisition phase impacted choice behavior in the critical transfer test phase. It is conceivable that an "overlearning" in the acquisition phase has led participants to develop a cognitive strategy to solely divide stimuli into "good" and "bad" items, which would decouple them from their feedback valence. The consistent preference in all groups for the stimulus with lower expected value in the IW-FLA pair would be in line with this reasoning. However, it could not explain the preference for the frequent winner over the frequent loss avoider (FW-FLA pair). Another possible factor that might explain discrepancies in our findings compared to the study by Gold and colleagues 20 is the age difference in the two samples. Patients in our sample were on average more than 10 years younger, which might be associated with lower chronicity in our sample.
As mentioned above, it has recently been reported that patients with first episode psychosis show less pronounced learning deficits and intact expected value representation 36 , which is more consistent with our findings and might point towards effects of illness stage on learning and value representation. A direct comparison of chronicity in our sample and the study by Gold and colleagues 20 is not possible because they did not report illness duration. Our sample was also different regarding antipsychotic medication. Patients in our sample were only medicated with second-generation antipsychotics. However, a direct comparison of antipsychotic medication effects is not possible since Gold and colleagues 20 did not report CPZ or haloperidol equivalents in their sample characteristics.
The present paper reports that schizophrenia patients showed a general deficit in probabilistic reinforcement learning and a tendency towards better loss-avoidance than gain learning. However, these effects were not driven by apathy symptom severity. These findings are in partial contrast to previous studies and thus add to the growing body of literature that reports heterogeneous findings regarding the link between task-based assessments and negative symptoms (e.g., effort-based decision-making). These inconsistencies might be due to critical variation in task design and characteristics of the study population. Future studies with unmedicated patients or studies applying functional magnetic resonance imaging might help to clarify the mechanistic link between negative symptoms and reinforcement learning and might also explain the heterogeneity in findings. Shedding light on these potentially basic mechanisms of schizophrenia symptoms is of fundamental importance not only for the development of pharmacological therapies, but also for psychosocial and psychotherapeutic interventions. For example, the success of cognitive behavioral therapy is strongly dependent on the patient's ability to learn from experiences that contradict expectations (e.g., "better than expected" outcomes when challenging defeatist beliefs).