Introduction

Monitoring and evaluating the consequences of our behaviour is important for learning from past events and for action selection in an uncertain environment. Crucially, both positive and negative outcomes (i.e., reward and punishment, respectively) differentially influence our future behaviour. Such reinforcement-guided learning has been shown to engage a specific neural circuitry including the mesencephalic dopamine system and its prominent target areas, e.g., the striatum and the medial frontal cortex, especially the anterior cingulate cortex (ACC). The present study was designed to elucidate the specific roles of these brain areas in feedback processing.

The seminal work by Schultz and colleagues demonstrated that the presentation of an unpredicted reward elicits a phasic response in mesencephalic dopamine neurons of monkeys. Dopamine neurons also decrease their firing rate when a stimulus predicts an aversive outcome or when an expected reward is not given1. On the basis of these results, it was assumed that dopamine neurons code the mismatch between expected and actual reward outcomes2 and relay this information to target regions in order to drive learning3.

In one of the most influential theories, the reinforcement learning (RL) theory4, the above findings have been adapted to humans and the mesencephalic dopamine system has been linked to error- and feedback-induced learning. The RL theory, building on an actor-critic architecture, postulates that the basal ganglia play the role of an adaptive critic that computes the value of events. Actor functions are carried out at least in part in the dorsal ACC4. When an event is evaluated as “worse than expected” (as is the case for unexpected negative feedback), decreased dopaminergic input disinhibits neurons in the ACC and trains it to adjust control of the motor system. The RL theory assumes that undesirable outcomes like negative feedback or reduced reward signal the need to switch to an alternative response. In contrast, positive outcomes suppress ACC activity by inhibiting ACC neurons via a phasic increase in dopaminergic activity in the basal ganglia. In both cases, the ACC uses these signals to select the response that is most successful for the task at hand and thereby guides future behaviour based on experience. Supporting evidence for this view comes from studies showing a greater involvement of the ACC after negative as compared to positive outcomes, e.g., after negative feedback in a dynamically adaptive motion prediction task5, during category learning6,7, after negative feedback in a probabilistic learning task8, or after feedback signalling reduced monetary reward9. Additional evidence comes from event-related potential (ERP) studies showing that unexpected negative but not unexpected positive feedback elicits the feedback-related negativity (FRN)10,11, which is related to learning processes and generated in the dorsal ACC4,12,13,14,15,16. In accordance with the RL theory, these results indicate that the ACC plays a crucial role in feedback processing, with undesired, i.e., negative, outcomes leading to a greater involvement of the ACC.

Despite such converging evidence for increased ACC activity solely to negative feedback in a variety of tasks17, several other studies have failed to find ACC activity that was specific for negative feedback. For instance, Nieuwenhuis and colleagues18 found a large area in the posterior medial frontal cortex, extending from caudal ACC into the presupplementary motor area, that was activated to a similar degree by positive and negative feedback in a time estimation task (for similar results see19,20,21). Other studies have observed even greater ACC activation after positive than after negative feedback22 or after unexpected rewarding events23. Similarly, several ERP studies reported FRNs after unexpected positive outcomes24,25,26,27. These findings cannot readily be explained by the RL theory. However, the prediction of response-outcome (PRO) theory28,29 offers an alternative account to reconcile these apparent inconsistencies. The PRO theory suggests that the key function of the ACC is predicting the likely outcomes of actions and signalling unexpected non-occurrences of those events. This includes detecting not only unexpected undesirable outcomes like negative feedback, but also unexpected desirable or rewarding outcomes. Consistent with this view, researchers found larger ACC activation in imaging studies after low-frequency, surprising events30,31, signalling the need for increased control32.

Taken together, the two competing theories make different predictions regarding the effects of valence (positive vs. negative feedback) and unexpectedness of the feedback stimulus on ACC activity. However, a direct test of these different predictions, i.e., the question of which of these aspects or maybe even a combination of both is the determining factor of the involvement of the ACC in feedback processing, has received little attention in previous research. In fact, in studies using learning paradigms (where feedback is behaviourally relevant) these two critical aspects are typically confounded. Negative outcomes generally occur unexpectedly, while positive outcomes can be expected after learning has taken place7,9,33. Hence, it is conceivable that the ACC is activated not only by negative expectancy violations but by expectancy violations independent of their valence.

In an earlier ERP study, we applied a time estimation task with unexpected positive, unexpected negative and expected intermediate feedback. To disentangle feedback valence and expectancy, feedback was given via an adaptive mechanism that was tied to participants' performance. This mechanism ensured that positive and negative feedback each occurred in only 20% of cases and thus were unexpected, while intermediate feedback occurred in most cases (60%) and therefore was expected. Our results showed that an FRN (measured by applying a peak-to-peak approach) was elicited for unexpected feedback irrespective of feedback valence25. Assuming that the FRN is generated in the ACC12,13,15,16, our data confirmed the PRO model of ACC function28,29. However, ACC activation was inferred rather indirectly from scalp-recorded ERPs, and conclusions from ERPs about the underlying brain structures responsible for specific processes are problematic in general. Therefore, the aim of the present study was to test the different claims that can be derived from the PRO and RL models and to examine, by means of functional magnetic resonance imaging (fMRI), whether the ACC is differentially activated by positive and negative feedback. To our knowledge, there is only one fMRI study with the explicit goal of disentangling the contribution of these factors34. The authors encouraged participants to choose a high-probability gamble over a sure win by manipulating the win value (gamble: 4 cents with a probability of .8, sure win: 3 cents) and found that feedback indicating losses resulted in greater activity of the ACC than feedback indicating wins. Crucially, the probabilities of these two outcomes were rather unbalanced, with losses being very improbable, and thus the result could equally be attributed to the valence and to the unexpectedness of losses. Moreover, feedback may play a different role in gambling than in learning paradigms.
In gambling tasks, feedback expectations are not based on feedback frequency alone (cf. the gamblers' fallacy) and the feedback cannot be utilized to improve performance. Here, valence seems more important as it reports back the outcome of the current trial. A similar reasoning also holds for other tasks where feedback is invalid or fictive.

To disentangle the potential roles of feedback valence and expectancy in learning tasks, we used a scanner-adapted version of our above-described time estimation task25 (see Figure 1). By comparing positive and negative feedback, i.e., both extremes on the valence dimension that are equally unexpected, we were able to investigate the effects of feedback valence without the confounding influence of differences in expectancy. Likewise, combining positive and negative feedback into a single class of rare events minimizes the influence of feedback valence when investigating feedback expectancy effects on brain activation. Moreover, a completely crossed factorial design, as employed in previous studies separating the effects of feedback valence and expectancy34, necessarily requires two different conditions (one with rare positive and frequent negative outcomes and one with the opposite frequency assignment) that have to be compared. This is problematic as participants' subjective evaluation of a bad outcome critically depends on the context in which it occurred, i.e., amplitude differences between equally probable positive and negative feedback vary depending on whether participants expect to win in the upcoming trials as compared to when they expect to lose35. Consequently, in paradigms controlling feedback frequency across different blocks there might still be a residual confound between feedback valence and subjective expectancy. It should further be noted that a) unlike feedback in gambling tasks, positive and negative feedback in our study are equally salient by virtue of their confirmatory or corrective nature and are therefore relevant for adequately performing the time estimations and b) unlike feedback in probabilistic learning tasks, the feedback in the present task was always valid since it was specifically tied to the participants' task performance19.
On the basis of our ERP results, our hypothesis was that the ACC should be more sensitive to unexpected events regardless of the feedback's valence and thus support the PRO theory.

Figure 1

(a) Trial procedure for the time estimation task. Participants had to estimate 2500 ms from the offset of the fixation cross and to indicate their estimate by pressing a response button. Feedback was provided 5000 ms after the offset of the fixation cross for 500 ms. (b) Time estimation data and adjustment of the inner and outer time windows for one representative subject. The received feedback is represented in the colour bar (positive = grey, negative = red, intermediate = blue). To keep the rate of positive and negative feedback low and thus unexpected an adaptive procedure was used to adjust the feedback to the participants' performance. The inner and outer time windows (indicated by the grey lines) were adjusted independently of each other whenever negative or positive feedback occurred in less or more than 20% of cases.

Results

Behavioural data

The adaptive mechanism succeeded in generating the intended feedback frequencies: Mean frequencies were 20.81% (SE = .81) for positive, 18.58% (SE = .60) for negative and 60.37% (SE = .79) for intermediate feedback. An ANOVA including the factors Feedback (positive, negative, intermediate) and Block (1st, 2nd, 3rd and 4th block of the experiment) with feedback frequencies as dependent variable confirmed that there was a main effect for Feedback (F(2,32) = 676.12, p < .001, ε = .98). As intended, intermediate feedback was more frequent than the mean of positive and negative feedback (F(1,16) = 1123.98, p < .001) and positive and negative feedback did not differ reliably (p = .08). Feedback frequencies did not vary over experimental blocks (p = .99).

Additionally, mean feedback frequencies over all four blocks of the experiment and for the first block only were subjected to a χ2 test (α = .05, df = 1) for each individual participant. The results indicated that the adaptive mechanism succeeded in adequately adjusting the time windows in all participants since the frequency distributions of positive and negative feedback did not differ from the intended equal distribution in any subject (all χ2 values ≤1.81).
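As an illustration of this per-participant check, the following minimal Python sketch computes the χ2 statistic for the counts of positive and negative feedback against an equal split (df = 1). It is a sketch under stated assumptions, not the analysis code of the study: the function names are illustrative, and the counts in the example are hypothetical values chosen to resemble the group means.

```python
import math

CRITICAL_CHI2_DF1 = 3.841  # critical value for alpha = .05, df = 1


def chi_square_equal_split(n_positive, n_negative):
    """Chi-square statistic testing two counts against an equal distribution."""
    expected = (n_positive + n_negative) / 2
    return ((n_positive - expected) ** 2 + (n_negative - expected) ** 2) / expected


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts for one participant (not data from the study).
chi2 = chi_square_equal_split(50, 45)
deviates = chi2 > CRITICAL_CHI2_DF1  # False means no reliable deviation
```

With these hypothetical counts the statistic is about 0.26, well below the critical value of 3.841, which is also the case for the largest value reported above (1.81).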

To evaluate participants' performance, the deviation of the time estimation error, i.e., the absolute difference between estimated time and target time, was compared across the four blocks of the experiment. This analysis revealed a significant variance reduction (linear trend: F(1,16) = 10.96, p = .004), indicating that participants' time estimations became less variable with learning.

fMRI data

The first analysis tested the effect of feedback expectancy. Unexpected feedback elicited greater activity than expected feedback in the ACC and in the anterior insula bilaterally (see Figure 2 and Table 1 for a complete list of activations). To test whether ACC activity differed as a function of valence, activity elicited by unexpected negative and unexpected positive feedback was extracted from the activated ACC cluster and subjected to a repeated-measures ANOVA. Crucially, negative and positive feedback did not differ from each other (p = .78; see Figure 2).

Table 1 Brain areas that exhibit a significantly larger BOLD signal for unexpected than for expected feedback and for positive than for negative feedback. Talairach coordinates, t- and p-values are given for the peak activity.
Figure 2

Increase in brain activity for unexpected (positive and negative) feedback as compared to expected (intermediate) feedback.

The bottom right corner depicts the mean % signal change of the BOLD response of all activated voxels in the ACC in the three feedback conditions. Whiskers denote standard errors of the mean.

Although the adaptive mechanism was used to control for the frequency of positive and negative feedback, there nevertheless were some modest individual differences. This variability was used to additionally analyse the relationship between feedback frequency (for positive and negative feedback) and ACC activation by means of a correlation analysis. Corroborating the above results, this correlation analysis revealed that increased activity in the ACC was associated with less expected feedback, i.e., the mean percent signal change of all activated voxels in the ACC was negatively correlated with the mean frequency of positive or negative feedback (r = −.32, p = .034, one-tailed; see Figure 3).

Figure 3

Correlation between the percent signal change in the ACC for positive and negative feedback and the mean frequency of positive and negative feedback.

The second analysis investigated the effect of feedback valence by contrasting positive and negative feedback. Areas that were activated significantly more by positive than by negative feedback included the ventral striatum bilaterally (more specifically, the head of the caudate nucleus, nucleus accumbens and putamen; see Figure 4 and Table 1). Interestingly, for the reversed contrast, i.e., negative > positive, no significant activations were observed. To further examine the activity in the striatum with respect to effects of feedback expectancy, the activity elicited by expected intermediate feedback was compared separately to the activity elicited by rare positive and by rare negative feedback. This analysis indicated that the striatal activity for intermediate feedback was smaller than for positive feedback (F(1,16) = 36.18, p < .001, η2p = .693) but did not differ from the activity elicited by negative feedback (p = .27). This suggests that there is no effect of feedback expectancy on striatal activity per se but that such an effect might depend on the valence of the feedback.

Figure 4

Increase in brain activity for positive as compared to negative feedback.

The bottom right corner depicts the mean % signal change of the BOLD response of all activated voxels in the left (dark bars) and the right (brighter bars) ventral striatum in the three feedback conditions. Note that the relative activation in the left and right hemisphere was quite similar. Whiskers denote standard errors of the mean.

Discussion

The aim of the current study was to examine whether the valence or the unexpectedness of an event or a combination of both is the determining factor of ACC involvement in feedback processing in a task context where feedback is behaviourally relevant. More specifically, our goal was to investigate ACC activity after unexpected positive feedback and by this corroborate our previous FRN findings25. For this purpose, we used a time estimation task with three different types of feedback. Because an adaptive procedure ensured that positive and negative feedback stimuli were equally rare and thus unexpected, while intermediate feedback was frequent and therefore expected, we were able to compare positive and negative feedback without the confounding influence of expectancy differences.

Examining the activity profile in the ACC as a function of feedback expectancy, unexpected feedback as compared to expected feedback was found to elicit greater activity in the ACC irrespective of its valence. Importantly, both unexpected negative and unexpected positive feedback differed reliably from expected intermediate feedback but did not differ from each other. Moreover, ACC was also sensitive to the size of these expectancy violations as reflected in increased activity for less expected feedback.

At first glance, our findings are at odds with those of several fMRI studies showing ACC activation for negative feedback only5,7,8,9,33 and ERP studies showing that unexpected negative feedback elicits an FRN4,13,16. However, the present results are consistent with our previous ERP findings which demonstrated that a peak-to-peak FRN can be elicited by unexpected feedback irrespective of feedback valence25,26 (see also24,27). By this, they indicate that the ACC is a likely generator for the peak-to-peak FRN. Note however, that there is an ongoing debate on how the FRN should best be measured and that the method used to quantify the FRN may have consequences for the results obtained and the conclusions drawn (for a detailed discussion see36). The present results are also well in line with other recent studies demonstrating indistinguishable ACC activity to positive and negative feedback in variants of a time estimation task18,19,20, but also in a task switching paradigm21, in a probabilistic Pavlovian conditioning procedure37 and in a problem solving task23. For example, participants in the study by Amiez and colleagues23 had to choose the reward stimulus out of four alternative choices. Greater activity in the ACC was found after unexpected positive and negative feedback as compared to expected positive feedback. Crucially, both types of unexpected feedback were signalling participants to either change their response (negative feedback in the exploration phase) or their response strategy (first positive feedback in the exploration phase). In addition, our findings fit in nicely with single cell recordings in monkeys revealing that separate populations of neurons within the same region of the ACC code positive and negative expectancy violations, respectively. 
One study38, for instance, found that macaque ACC sulcal neurons respond after the omission of an expected reward and that a similar number of ACC sulcal neurons also respond to the delivery of positive reinforcers (see also39).

The present results might help to reconcile the apparent inconsistencies in previous results by taking into account the co-variation of feedback valence and expectancy, a so far neglected problem in research on reinforcement learning. In the study by Daniel and Pollmann7, for instance, participants were trained to a criterion of 80% correct answers prior to scanning, thus positive and negative feedback were probably not equally unexpected. Rather, participants were likely to expect positive feedback in the actual experiment, whereas at the same time negative feedback was more unexpected. Similarly, in most of the learning studies mentioned above negative outcomes become increasingly unexpected while positive outcomes are more and more expected with learning. Consequently, these particular learning paradigms render it difficult to find ACC activation in response to a positive expectancy violation.

The present results suggest a similar involvement of the ACC in processing unexpected positive and negative outcomes of actions when both carry information that is behaviourally relevant. Rather than assuming that the ACC predicts both outcomes separately, the most parsimonious explanation for this finding is that during performance monitoring, the likely outcomes of actions are predicted and that the key function of the ACC is to signal unexpected non-occurrences of those outcomes irrespective of their valence as suggested by the PRO model28. In the context of the present study, this would mean that the ACC is activated by the unexpected non-occurrence of intermediate feedback rather than the unexpected occurrence of the alternative outcomes, i.e., positive and negative feedback. Note however, that the present design cannot distinguish between these two cases. Since our results show that when the task characteristics are optimal, similar responses to positive and negative expectancy violations can be found in the ACC and thus, that the influence of expectancy violation can override the role of valence, we propose that the relative contribution of valence and expectancy to ACC involvement might depend on the respective task requirements: Tasks involving behaviourally irrelevant feedback (like gambling tasks) or with partly invalid feedback (like probabilistic learning tasks) put an emphasis on feedback valence (i.e., a difference between positive and negative feedback) while learning tasks that involve reliable and behaviourally relevant feedback emphasize the contribution of feedback expectancy. Although expectancy is the most parsimonious explanation of our data, we cannot exclude other explanations. According to recent studies37, proposing that the ACC is part of a brain network processing saliency, positive and negative feedback should activate the ACC in a similar manner because both are motivationally salient events. 
It is, thus, conceivable that positive and negative feedback in our study in addition to being unexpected can be regarded as salient because they carry possibly task relevant information. In addition, the hierarchical reinforcement learning (HRL) framework40 assumes that the ACC is responsible for option selection and maintenance. In line with this model, positive and negative rather than the intermediate feedback should inform the selection and maintenance function of the ACC in order to carry out the time estimation task at a high performance level. By this, positive and negative feedback are behaviourally relevant to a similar degree and should lead to a uniform ACC activation. Although our data are consistent with both views further research is required to directly test these alternative accounts.

Another brain region showing greater activity for unexpected positive and negative feedback than for expected intermediate feedback was the anterior insula, bilaterally. Although the insula consistently shows up in studies on reward and feedback processing across multiple tasks, it has not received as much attention in this context as other brain areas. Together with ACC activation, anterior insula activation was found to be larger during the processing of negative than positive feedback in learning tasks5,7 and larger for loss than gain feedback in a gambling task34. Daniel and Pollmann7 also found that activity in the insula was larger for negative than positive feedback; however, they additionally found activation of the anterior insula when contrasting the receipt of positive as well as negative feedback with fixation, which suggests that insula activation can in principle also be found after positive outcomes. This is in line with the view of Critchley41, who suggested that in contrast to the ACC, which is implicated in generating autonomic changes, the insula is specialized in mapping visceral responses and thereby influences subjective feeling states in response to arousing events. In this context, it has to be mentioned that in our study design, positive as well as negative feedback can, in principle, be regarded as more arousing than intermediate feedback because arousal is coupled to the valence of a (feedback) stimulus. For this reason, we cannot exclude the possibility that arousal might be an explanation for our present ACC findings. However, we consider this explanation rather unlikely because arousal cannot explain earlier findings showing greater activity in the ACC to negative as compared to positive feedback despite rather minimal differences in arousal5,7,8,9,33, and therefore it seems plausible that the ACC is more sensitive to expectancy per se.

In stark contrast to the activation patterns in the ACC and insula that did not distinguish between unexpected positive and negative feedback, the ventral striatum did show an effect of feedback valence in line with the predictions of the RL theory4. More specifically, the head of the caudate nucleus and the putamen were significantly more activated by positive than negative or intermediate feedback. Interestingly, no significant activations were observed for the reversed contrast. The striatum, like the ACC, is also a target area of the mesencephalic dopamine system and has been shown to be involved in a variety of tasks related to reward processing7,42,43,44,45,46. Crucially, most of these studies reported increased activity in the striatum related to the unexpected delivery of a reward47,48,49 and decreases in activity for unexpected negative outcomes like reward omission48. Other neuroimaging studies have implicated the striatum in feedback learning differentiating between positive and negative feedback5,50,51,52,53,54.

In light of these findings, the pattern of striatal activity observed in the present study could be interpreted in terms of the special role of the striatum in sustaining the rewarded action. It is conceivable that the ventral striatum evaluates the positive outcome to boost the immediate action that led to this positive outcome. This view is supported by recent findings demonstrating that striatal activity to positive feedback is dependent upon the action that led to this feedback55. Only when the reward required a specific action from their participants (but not when it was delivered without any action), the ventral striatum was active. The dependency upon an action leading to positive outcome might also explain the identical activity for intermediate and negative feedback observed in the present study as both feedback types similarly indicate a need for response adjustment. Unlike in tasks using only two response options (i.e., where the correct response can also be inferred from negative feedback), in the present task the three feedback types were largely independent of each other and participants received intermediate or negative feedback when they responded too fast or too slow. Consequently, the correct response is much more difficult to infer from these two types of feedback than from positive feedback which simply required repeating this particular response. In other words, in the present task positive feedback is of greater utility to the ‘critic’ function of the ventral striatum to guide future behaviour because it implies a specific action. In contrast, negative and intermediate feedback only indicates that the behaviour should be changed, while the specific action that would lead to success is less clear. However, this idea needs to be directly tested in future research.

A methodological issue that needs to be mentioned is that studies examining the neuronal basis of time estimation have shown activity in a distributed network which partially overlaps with the one responsible for feedback processing56. Consequently, it is important to try to disentangle the activations related to the respective processes. In the present study, we did so by including the actual estimation time as a predictor of no interest in our analyses and by always presenting feedback stimuli 5 s after the start of the time estimation, so that the estimation process should already have been finished for some time when brain activation related to feedback presentation was measured (except for some rare cases of extreme overestimation). Thus, although we think that the influence of activation related to time estimation is low in the present data, the observed pattern of results should be replicated in the future with other experimental tasks that are not dependent on time estimation.

Taken together, the present results illustrate a network of brain structures differentially contributing to feedback processing during learning. The functional role of the ACC in this network seems to be the processing of unexpected non-occurrences of events irrespective of the valence of the alternative event, as proposed by the PRO model28,29. The striatum, on the other hand, was differentially activated by positive and negative feedback as proposed by the RL theory4 and seems to be selectively associated with behaviourally relevant positive events. By this, both structures contribute differentially to a complex process of feedback-related behavioural adaptation with their specific engagement depending on the current task demands.

Methods

Participants

Twenty volunteers without any psychiatric, neurological, or medical illness participated in the experiment. The study was approved by the local ethics committee and was in accordance with the ethical guidelines of the Declaration of Helsinki. All participants but one were right-handed and all had normal or corrected-to-normal vision. All signed informed consent before the experiment and were paid €25.

One participant had to be excluded from all analyses because of an equipment malfunction. Two further participants were excluded from the data analyses because they systematically produced underestimations of the target time with their mean estimation errors being larger than two standard deviations of the mean sample error. These participants were excluded because it is not clear whether they actually performed the time estimation task correctly. Consequently, all analyses were based on the data of 17 participants (7 female/10 male, aged 19–31 years, mean age = 23.4 years).

Task, stimuli and procedure

After participants filled out a short demographic questionnaire, they performed the time estimation task (adapted from25). The task started with a fixation cross which was presented for 250 ms, 500 ms, 750 ms, or 1000 ms. Participants were instructed to press a response button 2.5 seconds after the cross vanished from the screen. Five seconds after the fixation cross disappeared, they received positive (“excellent”), negative (“bad”), or intermediate (“ok”) feedback about their estimation accuracy in form of a yellow, purple, or blue rectangle (see Figure 1a). To avoid differences in perceptual processing between the feedback conditions, we used simple coloured rectangles as feedback stimuli. The assignment of colours to the type of feedback was counterbalanced across subjects. In order to prevent mere rhythmic responses variable presentation times were used for the fixation cross (see above) and for the inter-trial interval (ITI). The ITI varied between 2–11 s (mean = 4 s) in steps of 1 s and followed an exponential distribution in order to get an optimal trade-off between detectability and estimation efficiency of the BOLD response57,58. Participants completed 20 practice trials and 240 experimental trials, i.e., four blocks of 60 trials each separated by a short self-paced break.
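One way to obtain inter-trial intervals with the stated properties (2 to 11 s in 1-s steps, exponentially decaying frequencies, mean of 4 s) is sketched below in Python. This is only an illustration under assumptions: the text does not state how the intervals were actually generated, so the discretised, truncated exponential form, the bisection search and all names are hypothetical.

```python
import random

ITI_VALUES_S = list(range(2, 12))   # 2, 3, ..., 11 s in steps of 1 s
TARGET_MEAN_S = 4.0


def iti_weights(ratio):
    """Normalised weights that decay by the factor `ratio` per 1-s step."""
    raw = [ratio ** k for k in range(len(ITI_VALUES_S))]
    total = sum(raw)
    return [w / total for w in raw]


def mean_iti(ratio):
    """Expected ITI in seconds for a given decay ratio."""
    return sum(v * w for v, w in zip(ITI_VALUES_S, iti_weights(ratio)))


def solve_ratio(target_mean=TARGET_MEAN_S, iterations=60):
    """Bisection for the decay ratio that yields the target mean ITI."""
    low, high = 0.01, 1.0   # mean_iti increases monotonically with ratio
    for _ in range(iterations):
        mid = (low + high) / 2
        if mean_iti(mid) < target_mean:
            low = mid
        else:
            high = mid
    return (low + high) / 2


RATIO = solve_ratio()
WEIGHTS = iti_weights(RATIO)


def sample_itis(n_trials, seed=None):
    """Draw one ITI per trial from the discretised exponential distribution."""
    rng = random.Random(seed)
    return rng.choices(ITI_VALUES_S, weights=WEIGHTS, k=n_trials)
```

For example, `sample_itis(60, seed=1)` would return a reproducible ITI sequence for one block of 60 trials; short intervals are most frequent and the long-run mean is 4 s.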

To keep the rate of positive and negative feedback low (at about 20% each) and thus unexpected, and the rate of intermediate feedback high (at about 60%) and therefore expected, an adaptive procedure was used to adjust the feedback to the participants' estimation performance (for a detailed description of this procedure see25). In the first 20 trials, positive feedback was given if the participant's response occurred within a ±100 ms time window around the target time (2.5 s), and negative feedback was given if the response was outside a ±500 ms time window. For all other responses intermediate feedback was given. Thereafter, the inner time windows were adjusted every 20 trials whenever positive feedback occurred in less (more) than 20% of the last 20 trials and overall by symmetrically increasing (decreasing) the time window by 30 ms. Outer time windows were adjusted likewise for negative feedback by adding or subtracting 150 ms (see Figure 1b for an example).
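The logic of the adaptive procedure can be sketched in Python as follows. This is a minimal illustration, not the original experiment code, and it rests on several assumptions: all names are hypothetical; a window is changed only when both the rate over the last 20 trials and the overall rate deviate from 20% in the same direction; the direction of the outer-window adjustment (narrowing when negative feedback is too rare) is inferred from the goal of the procedure; and no lower or upper bounds on the windows are enforced.

```python
TARGET_MS = 2500        # target interval (2.5 s)
TARGET_PERCENT = 20     # intended rate of positive and of negative feedback
INNER_STEP_MS = 30      # adjustment step for the inner (positive) window
OUTER_STEP_MS = 150     # adjustment step for the outer (negative) window


def classify(response_ms, inner_ms, outer_ms):
    """Return the feedback type for one time estimate."""
    error = abs(response_ms - TARGET_MS)
    if error <= inner_ms:
        return "positive"
    if error > outer_ms:
        return "negative"
    return "intermediate"


def direction(history, kind):
    """-1 if `kind` occurred in less than 20% of `history`, +1 if more, else 0."""
    count = sum(f == kind for f in history)
    # Integer arithmetic avoids floating-point issues at exactly 20%.
    diff = count * 100 - TARGET_PERCENT * len(history)
    return (diff > 0) - (diff < 0)


def adjust_windows(history, inner_ms, outer_ms):
    """Adjust both windows after a complete set of 20 trials.

    Assumption: the last 20 trials and the whole history must deviate
    from 20% in the same direction before a window is changed.
    """
    recent = history[-20:]
    pos = direction(recent, "positive")
    if pos != 0 and pos == direction(history, "positive"):
        # Too rare -> widen the inner window; too frequent -> narrow it.
        inner_ms -= pos * INNER_STEP_MS
    neg = direction(recent, "negative")
    if neg != 0 and neg == direction(history, "negative"):
        # Too rare -> narrow the outer window; too frequent -> widen it.
        outer_ms += neg * OUTER_STEP_MS
    return inner_ms, outer_ms
```

Starting from the initial windows of ±100 ms and ±500 ms, a set of 20 trials with only intermediate feedback would, under these assumptions, widen the inner window to ±130 ms and narrow the outer window to ±350 ms.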

To get used to the task, participants completed 20 practice trials in which they received the feedback “faster” or “slower”. This feedback was given to avoid expectancy formation during practice concerning the positive, negative and intermediate feedback that was used in the experiment. The adaptive mechanism was already active in these practice trials so that the experiment could start with adjusted time windows. After practice, subjects were told that they would receive “excellent” feedback when their time estimation had been very close to 2.5 s and “bad” feedback when their button press was very late or very early and thus far away from 2.5 s. It was also explained to them that they would probably receive the intermediate “ok” feedback most of the time because this type of feedback is easiest to get. They were also told that they should try to get positive feedback as often as possible and avoid negative feedback. Additionally, because the adaptive mechanism changed the time windows according to participants' performance over the course of the experiment, participants might have gained the impression that the feedback was not valid. To prevent this, they were told that they would play against the computer, which would try to make the task more difficult for them when they succeeded too often.

fMRI acquisition

Structural and functional brain imaging was performed on a 3 Tesla Siemens Skyra scanner. A T1 weighted 3D whole brain scan was performed for anatomical co-registration (MP-Rage sequence: TR = 1900 ms, TE = 4 ms, flip angle = 15°, FOV = 256 mm, 192 sagittal slices). During functional imaging, 28 axial slices (3 mm thickness, .75 mm inter-slice distance, FOV = 192 mm, 94 × 94 data acquisition matrix) were acquired with a T2* weighted blood-oxygenation-level dependent (BOLD) sensitive gradient echo planar sequence (TR = 2000 ms, TE = 30 ms, inter slice time 71 ms, flip angle = 90°). A total of 325 functional volumes were acquired during each experimental block.

Statistical analyses of behavioural data

Statistical analyses of behavioural data included mean feedback frequencies and the variance of time estimation errors; timeout trials were excluded. Behavioural data were analysed using repeated measures analyses of variance (ANOVAs) with an alpha level of .05. The Greenhouse-Geisser correction for non-sphericity was applied whenever appropriate, and epsilon-corrected p-values are reported together with uncorrected degrees of freedom and Greenhouse-Geisser epsilon values. Additionally, to ensure that the above-reported frequencies were actually due to an equal distribution of positive and negative feedback in each subject, mean feedback frequencies over all four blocks of the experiment and for the first block only were subjected to a χ2 test (α = .05, df = 2) for each individual participant. For this assessment, we used the first block of the experiment in addition to the overall mean because the beginning of the experiment should be most critical for expectancy formation.
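The per-participant χ2 test can be illustrated with a short sketch. It assumes a goodness-of-fit test of the three observed feedback counts against the intended 20% / 60% / 20% distribution, which is consistent with df = 2 but is not spelled out in the text; the function name and the example counts are hypothetical. For df = 2, the χ2 survival function has the closed form exp(−x/2), so no statistics library is required.

```python
import math

# Assumption: expected proportions of positive, intermediate, negative feedback.
EXPECTED_PROPORTIONS = (0.20, 0.60, 0.20)
ALPHA = 0.05


def chi_square_feedback_test(observed_counts, expected=EXPECTED_PROPORTIONS):
    """Goodness-of-fit test for three feedback counts (df = 2).

    Returns the chi-square statistic and its p-value.
    """
    if len(observed_counts) != 3 or len(expected) != 3:
        raise ValueError("exactly three feedback categories are required")
    n = sum(observed_counts)
    expected_counts = [p * n for p in expected]
    statistic = sum((o - e) ** 2 / e
                    for o, e in zip(observed_counts, expected_counts))
    # Closed-form survival function of the chi-square distribution, df = 2 only.
    p_value = math.exp(-statistic / 2)
    return statistic, p_value


# Hypothetical participant who matched the intended distribution exactly.
stat_ok, p_ok = chi_square_feedback_test((48, 144, 48))
# Hypothetical participant with too much positive and negative feedback.
stat_off, p_off = chi_square_feedback_test((60, 120, 60))
```

A participant whose p-value falls below α would show feedback frequencies that deviate reliably from the intended distribution.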

Analyses of fMRI data

Analyses of fMRI data were carried out using the BrainVoyager software package (Brain Innovation B.V., Maastricht, The Netherlands). The first 4 volumes of each experimental block were discarded to allow for T1 equilibration. The remaining volumes were pre-processed using the standard routines as implemented in BrainVoyager. First, to correct for the sequentially executed interleaved slice acquisition, a slice scan time correction was performed using sinc interpolation. Next, a 3D motion correction (sinc interpolation) was performed to spatially align the functional volumes of all four blocks to the first acquired volume of the first block. An isotropic spatial Gaussian filter (FWHM = 6 mm) was then applied to the data. The data were high-pass filtered at 3 cycles per block to account for low-frequency signal changes and baseline drifts. Functional slices were then co-registered to the high-resolution whole-brain anatomical scans obtained at the beginning of the session and were subsequently spatially transformed into stereotactic Talairach space59 and re-sampled to a spatial resolution of 2 × 2 × 2 mm.
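For orientation, the filter settings above can be converted into more familiar units. The following arithmetic sketch assumes that the 3 cycles are counted over the retained time course of one block (325 acquired volumes minus the 4 discarded ones, at TR = 2 s); the variable names are illustrative only.

```python
import math

TR_S = 2.0               # repetition time of the functional sequence
VOLUMES_ACQUIRED = 325   # volumes per experimental block
VOLUMES_DISCARDED = 4    # removed to allow for T1 equilibration
CYCLES_PER_BLOCK = 3     # high-pass filter setting
FWHM_MM = 6.0            # spatial smoothing kernel

volumes_kept = VOLUMES_ACQUIRED - VOLUMES_DISCARDED
block_duration_s = volumes_kept * TR_S

# Assumption: cycles refer to the retained time course of one block.
cutoff_hz = CYCLES_PER_BLOCK / block_duration_s
cutoff_period_s = block_duration_s / CYCLES_PER_BLOCK

# Standard conversion from FWHM to the Gaussian standard deviation.
sigma_mm = FWHM_MM / (2 * math.sqrt(2 * math.log(2)))

print(f"cutoff: {cutoff_hz:.5f} Hz (period {cutoff_period_s:.0f} s)")
print(f"smoothing sigma: {sigma_mm:.3f} mm")
```

Under this assumption, signal components slower than roughly one cycle per 214 s are removed, and the 6 mm FWHM kernel corresponds to a standard deviation of about 2.5 mm.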

All blocks from each individual were analysed together using a random-effects multi-subject general linear model (GLM). The three feedback types (i.e., positive, intermediate and negative feedback) were modelled as separate events with a duration of .5 s each for each participant. In addition, the fixation cross (duration modelled according to the respective presentation time) and the estimation time (modelled as an event of the projected estimation time of 2.5 s) as well as motion parameters were added as predictors of no interest to the design matrix of each block. Predictor time courses were adjusted for the hemodynamic response delay by convolution with a double-gamma hemodynamic response function (onset: 0 s, time to response peak: 5 s, time to undershoot peak: 15 s)60. In a first analysis, contrasts tested for differential BOLD responses as a function of feedback expectancy, i.e., positive and negative feedback against intermediate feedback. The valence effect was estimated in a second analysis by contrasting positive against negative feedback. Results are reported at thresholds of voxel level p < .00001, cluster level p < .05 FDR (false discovery rate) corrected.
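The construction of a single event predictor can be sketched as follows. This is not BrainVoyager's implementation: the sketch assumes a difference of two unit-dispersion gamma densities whose modes lie at 5 s and 15 s, and a response-to-undershoot ratio of 6, a common default that is not reported in the text. All function names are hypothetical.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, defined as 0 for t <= 0."""
    if t <= 0:
        return 0.0
    log_pdf = ((shape - 1) * math.log(t) - t / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def double_gamma_hrf(t, peak_s=5.0, undershoot_s=15.0, ratio=6.0):
    """Double-gamma HRF; ratio = 6 is an assumed default, not from the text."""
    # With scale 1 the mode of a gamma density is shape - 1, so shape = peak + 1.
    return gamma_pdf(t, peak_s + 1) - gamma_pdf(t, undershoot_s + 1) / ratio


def event_predictor(onsets_s, duration_s, n_volumes, tr_s=2.0, dt=0.1,
                    hrf_length_s=32.0):
    """Convolve a boxcar of events with the HRF and sample once per volume."""
    n_fine = int(round(n_volumes * tr_s / dt))
    boxcar = [0.0] * n_fine
    for onset in onsets_s:
        start = int(round(onset / dt))
        stop = int(round((onset + duration_s) / dt))
        for i in range(start, min(stop, n_fine)):
            boxcar[i] = 1.0
    hrf = [double_gamma_hrf(k * dt) for k in range(int(round(hrf_length_s / dt)))]
    convolved = [0.0] * n_fine
    for i, value in enumerate(boxcar):
        if value:
            for k, h in enumerate(hrf):
                if i + k < n_fine:
                    convolved[i + k] += value * h * dt
    step = int(round(tr_s / dt))
    return convolved[::step][:n_volumes]


# One hypothetical feedback event (duration .5 s) with onset at 10 s.
predictor = event_predictor([10.0], duration_s=0.5, n_volumes=30)
```

In a full design matrix, one such predictor would be built per feedback type and block, alongside the predictors of no interest described above.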