INTRODUCTION

Adequate adaptation to our constantly changing environment requires the anticipation of biologically relevant events by learning the signals of their occurrence, that is, prediction. Models of reinforcement learning use a temporal difference prediction error signal, representing the difference between expected and obtained events, to update their predictions based on states of the environment (Sutton and Barto, 1998). A putative neuronal mechanism of this temporal difference prediction error for future reward is the fast, phasic firing of dopamine cells in the ventral tegmental area (Montague et al, 1996; Schultz et al, 1997). According to this proposal, a positive reward prediction error, that is, unexpected reward, produces a burst in the firing of dopamine neurons, whereas a negative reward prediction error, that is, unexpected omission of reward, produces a pause in their firing.
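In its standard formulation (Sutton and Barto, 1998), the temporal difference prediction error at time t is

\[
\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t),
\]

where $r_t$ is the obtained reward, $V(s)$ is the learned value prediction for state $s$, and $\gamma$ is a temporal discount factor; predictions are updated as $V(s_t) \leftarrow V(s_t) + \alpha\,\delta_t$. On this account, dopamine bursts report $\delta_t > 0$ and pauses report $\delta_t < 0$.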

Recently, Daw et al (2002) have highlighted that existing models of reward prediction cannot easily account for the prediction of future punishment. They have extended the proposal that dopamine subserves the reward prediction error by suggesting a way in which central serotonin (5-HT), released by the dorsal raphe nucleus, could act as a motivational opponent to dopamine in prediction learning. According to this theoretical model, the phasic release of 5-HT mirrors the phasic release of dopamine, and reports a prediction error for future punishment.
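As a minimal sketch of this opponency (a simplified rendering, not the full average-reward formulation of Daw et al, 2002), an analogous prediction error can be written for punishment,

\[
\delta^{-}_t = p_t + \gamma V^{-}(s_{t+1}) - V^{-}(s_t),
\]

where $p_t$ denotes the obtained punishment and $V^{-}(s)$ the learned punishment prediction for state $s$; on this view, phasic 5-HT reports $\delta^{-}_t$, mirroring the dopaminergic reward prediction error.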

The hypothesis that 5-HT is involved in the prediction of aversive signals is plausible, but not yet proven. There is abundant empirical evidence implicating 5-HT in controlling aversion and potentiating anxiety-induced avoidance (Tye et al, 1977; Gray, 1982; Graeff et al, 1996). The hypothesis concurs with the observation that serotonergic neurotransmission is implicated in a range of mood and anxiety disorders (Young et al, 1985; Anderson et al, 1990; Deakin et al, 1990; Blier and de Montigny, 1999) that are characterized by enhanced anticipation of, and sensitivity to threat-related stimuli, punishment, and negative feedback (Beats et al, 1996; Elliott et al, 1997; Mathews and Mackintosh, 2000; Steffens et al, 2001; Richards et al, 2002; Murphy et al, 2003). Studies with healthy human volunteers have demonstrated potentiated processing of punishment-related signals after dietary depletion of the 5-HT precursor tryptophan (TRP; acute tryptophan depletion, ATD), particularly in vulnerable individuals. For example, ATD enhanced the amygdala response to fearful faces (Cools et al, 2005b; Van der Veen et al, 2007), decreased the impact of positively valenced words (Murphy et al, 2002), and increased the impact of negatively valenced emotional words in Stroop-like tasks (Evers et al, 2006a). In addition, ATD potentiated neural activity during negative feedback in a probabilistic reversal-learning task (Evers et al, 2005), which was also sensitive to acute administration of the selective serotonin reuptake inhibitor citalopram (Chamberlain et al, 2006). Finally, a processing bias in favor of aversive signals is seen in healthy individuals who carry one or two copies of the short allele of the 5-HT transporter polymorphism, which is associated with reduced expression of the 5-HT transporter (Hariri et al, 2002; Heinz et al, 2005; Pezawas et al, 2005) and possibly reduced 5-HT function (Bethea et al, 2004).

However, despite the existence of abundant evidence for a role of 5-HT in aversive processing, there is currently no direct experimental evidence supporting the specific hypothesis that central 5-HT mediates the prediction of future punishment, but not that of future reward. Here, we test this hypothesis by investigating the effects of ATD, a well-known procedure to reduce central nervous system 5-HT (Nishizawa et al, 1997; Carpenter et al, 1998), on performance of an observational learning paradigm that allowed the independent assessment of reward and punishment prediction. Reward- and punishment-prediction trials were matched in terms of response inhibition demands, and learning demands were maximized by repeatedly reversing contingencies at unpredictable intervals.

METHODS

Subjects

Procedures were approved by the Norfolk Research Ethical Committee (06/Q0101/5) and were in accord with the Helsinki Declaration of 1975.

Subjects were screened for psychiatric and neurological disorders, gave written informed consent, and were compensated for participation. Exclusion criteria were any history of cardiac, hepatic, renal, pulmonary, neurological, psychiatric, or gastrointestinal disorder; medication/drug use; and a personal or family history of major depression or bipolar affective disorder. Following the screening interview, subjects were assigned in a double-blind, approximately counterbalanced fashion to the ‘first-TRP−’ or the ‘first-BAL’ group. After the exclusions described below, twelve subjects remained in the final analyses (‘first-TRP−’ n=5; ‘first-BAL’ n=7; mean age 22.4 years, SD=4.0; four males).

The experimental paradigm of interest in the current paper was administered as part of a larger study (data to be published separately by OJR and BJS). One subject vomited immediately after consuming the drink and her data were excluded from analysis. Three subjects failed to comply with the task instructions, as revealed by error rates at or worse than chance in one or more blocks (eg one subject made zero correct responses in one block with a mean reaction time of 305 ms). One subject did not return for the second visit, and three subjects encountered technical difficulties with the task; their data were also excluded.

General Procedure

Subjects were assessed on a neuropsychological battery in two test sessions, separated by at least 1 week. Volunteers were asked to abstain from alcohol, caffeine, and food from midnight before each session. During the test days, they followed a low-protein diet. On the morning of a test day (between 0830 and 1030 hours), volunteers arrived at the research center, where a blood sample was taken and a nutritionally balanced (BAL) or a TRP-free (TRP−) amino-acid drink was ingested. Testing started after a resting period of approximately 5.5 h to ensure stable and low plasma TRP levels. After a second blood sample, the task was completed.

Amino-Acid Mixtures

Central TRP was depleted by ingesting an amino-acid load that contained no TRP but did include the other large neutral amino acids (LNAAs) (Reilly et al, 1997). The quantities of amino acids in each drink were based on those used by Young et al (1985), although a 75 g mixture was employed to minimize nausea. The amino-acid mixtures (prepared by SHS International, Liverpool, UK) were as follows:

BAL: L-alanine, 4.1 g; L-arginine, 3.7 g; L-cystine, 2.0 g; glycine, 2.4 g; L-histidine, 2.4 g; L-isoleucine, 6 g; L-leucine, 10.1 g; L-lysine, 6.7 g; L-methionine, 2.3 g; L-proline, 9.2 g; L-phenylalanine, 4.3 g; L-serine, 5.2 g; L-threonine, 4.9 g; L-tyrosine, 5.2 g; L-valine, 6.7 g; and L-tryptophan, 3.0 g—total 78.2 g.

TRP−: L-alanine, 4.1 g; L-arginine, 3.7 g; L-cystine, 2.0 g; glycine, 2.4 g; L-histidine, 2.4 g; L-isoleucine, 6 g; L-leucine, 10.1 g; L-lysine, 6.7 g; L-methionine, 2.3 g; L-proline, 9.2 g; L-phenylalanine, 4.3 g; L-serine, 5.2 g; L-threonine, 4.9 g; L-tyrosine, 5.2 g; and L-valine, 6.7 g—total 75.2 g.

For female participants, the same ratios of amino acids were used, but with a 20% reduction in quantity to take into account lower body weight. The drinks were prepared by stirring the mixture into approximately 200 ml tap water. Subjects were given the choice of adding either lemon-lime or grapefruit flavoring to compensate for the unpleasant taste. They reported no side effects apart from transient nausea following ingestion of the drink.
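As a simple check of the mixture arithmetic, the short Python sketch below reproduces the totals listed above and the 20% reduction applied for female participants; the gram values are taken directly from the lists above.

```python
# Amino-acid quantities (grams) as listed above for the BAL drink.
bal = {
    "L-alanine": 4.1, "L-arginine": 3.7, "L-cystine": 2.0, "glycine": 2.4,
    "L-histidine": 2.4, "L-isoleucine": 6.0, "L-leucine": 10.1, "L-lysine": 6.7,
    "L-methionine": 2.3, "L-proline": 9.2, "L-phenylalanine": 4.3, "L-serine": 5.2,
    "L-threonine": 4.9, "L-tyrosine": 5.2, "L-valine": 6.7, "L-tryptophan": 3.0,
}

# The TRP- drink is identical except that tryptophan is omitted.
trp_minus = {aa: g for aa, g in bal.items() if aa != "L-tryptophan"}

print(round(sum(bal.values()), 1))        # 78.2 g (BAL total)
print(round(sum(trp_minus.values()), 1))  # 75.2 g (TRP- total)

# Same ratios with a 20% reduction in quantity for female participants.
female_bal = {aa: g * 0.8 for aa, g in bal.items()}
print(round(sum(female_bal.values()), 2))  # 62.56 g
```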

Self-Report Measurements

The Positive and Negative Affect Scale (PANAS; Watson et al, 1988) was administered on nine occasions during the test day (with the first measure administered before the drink). We analyzed the difference in positive and negative affect scores obtained from the PANAS between the following two time points: (1) immediately before drink ingestion and (2) immediately before test administration. Analysis of these difference scores revealed that ATD did not significantly affect time-related changes in positive affect (F1,11=2.1, P=0.2) or negative affect (F1,11=0.08, P=0.8).
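With drink as a two-level within-subject factor, the repeated-measures ANOVA on these difference scores is equivalent to a paired t-test (F = t²). A minimal sketch of that comparison in Python, using hypothetical per-subject difference scores rather than the actual data, is given below.

```python
import numpy as np
from scipy import stats

# Hypothetical change scores (time point 2 minus time point 1) for positive
# affect, one value per subject under each drink; the real data are not shown.
diff_bal = np.array([1, -2, 0, 3, -1, 2, 0, -1, 1, 0, 2, -2], dtype=float)
diff_trp = np.array([0, -3, 1, 2, -2, 1, -1, -1, 0, 1, 1, -3], dtype=float)

# Paired (within-subject) comparison of the two drinks.
res = stats.ttest_rel(diff_trp, diff_bal)
print(f"F(1,{len(diff_bal) - 1}) = {res.statistic**2:.2f}, P = {res.pvalue:.2f}")
```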

Subjects also completed a number of questionnaires: the behavioral inhibition system/behavioral activation system (BIS/BAS) scales (Carver and White, 1994), the Eysenck Personality Questionnaire (EPQ), the Impulsiveness Venturesomeness Empathy questionnaire (IVE-7; Eysenck and Eysenck, 1978), the Beck Depression Inventory (BDI; Beck et al, 1961), and the Barratt Impulsiveness Scale (BIS; Patton et al, 1995). Scores are reported in Table 1.

Table 1 Demographic and Trait Characteristics

Task Design

General description

The paradigm was previously described by Cools et al (2006) and the reader is referred to that manuscript for additional details (Table 2).

Table 2 Schematic of Sample Trial Sequences from the Two Conditions

Subjects were presented with a series of trials involving two stimuli, which remained the same throughout the experiment. At any one point in time, one of the stimuli was associated with reward, while the other was associated with punishment. On each trial, one of the two stimuli was highlighted and subjects had to predict, based on trial-and-error learning, whether the highlighted stimulus would lead to reward or punishment. The outcome was presented after subjects made their prediction. Outcomes were not response contingent, but depended on which stimulus was highlighted; thus, the outcome did not provide performance feedback. To minimize confusion regarding the task instructions, we provided performance feedback in an indirect fashion, by highlighting the same stimulus again after error trials. This procedure was identical for punishment- and reward-prediction trials and allowed us to track whether subjects adhered to the task instructions. During the task, the stimulus-outcome contingencies reversed multiple times, provided learning criteria were attained.

Trial details

On each trial, subjects were presented with two vertically adjacent stimuli, one scene and one face (locations randomized), at a viewing distance of about 19 inches (each stimulus subtending about 3° horizontally and 3.5° vertically). One of the stimuli was highlighted with a black border surrounding it. Subjects indicated their predictions by pressing, with the index or middle finger, one of two colored buttons (corresponding to keys ‘b’ and ‘n’, depending on the response-outcome mapping) on a laptop keyboard: the green button for reward and the red button for punishment. The response-outcome mappings were counterbalanced between subjects. The (self-paced) response was followed by an interval of 1000 ms, after which the outcome was presented for 500 ms. Reward consisted of a green smiley face, a ‘+$100’ sign, and a high-frequency jingle tone. Punishment consisted of a red sad face, a ‘−$100’ sign, and a single low-frequency tone. After the outcome, the screen was cleared for 500 ms, after which the next two stimuli were presented.

Task procedure

Each subject performed one practice block and four experimental blocks. Each practice block consisted of one acquisition stage and one reversal stage. Each experimental block consisted of one acquisition stage and a variable number of reversal stages. The task proceeded from one stage to the next following a number of consecutive correct trials, as determined by a preset learning criterion. Learning criteria (that is the number of consecutive correct trials following which the contingencies changed) varied between stages (mean=6.9, SD=1.8, range from 5 to 9), to prevent predictability of reversals. The maximum number of reversal stages per experimental block was 16, although the block terminated automatically after completion of 120 trials (∼6.6 min), so that each subject performed 480 trials (four blocks) per experimental session.

The task consisted of two conditions (two blocks per condition). A schematic of sample trial sequences for each condition is shown in Table 2. In the unexpected reward condition, reversals were signaled by unexpected reward. Specifically, on reversal trials of the unexpected reward condition, the previously punished stimulus was highlighted and followed unexpectedly by reward. In the unexpected punishment condition, reversals were signaled by unexpected punishment. Thus, on reversal trials of this condition, the previously rewarded stimulus was highlighted and followed unexpectedly by punishment. The order of conditions was counterbalanced between groups (six subjects received the unexpected punishment condition first).

The stimulus that was highlighted on the first trial of each reversal stage (on which the unexpected outcome was presented) was always highlighted again on the second trial of that stage (ie the switch trial on which the subject had to implement the reversed contingencies and switch their predictions) (Table 2). For example, if the previously rewarded stimulus A was highlighted on the first trial of a reversal stage and followed by unexpected punishment, then stimulus A was highlighted again on the second trial of that reversal stage.
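To make the block structure concrete, the Python sketch below generates a trial sequence consistent with the rules described above. It is an illustration under simplifying assumptions (stimulus labels ‘A’ and ‘B’, an error-free simulated subject, and a criterion redrawn between 5 and 9 consecutive correct trials after each reversal), not the original task code.

```python
import random


def run_block(condition, max_trials=120, max_reversals=16, seed=0):
    """Simulate the trial sequence of one block with an error-free subject.

    condition: 'unexpected_reward' or 'unexpected_punishment'.
    Contingencies reverse after a criterion number of consecutive correct
    trials. On the first trial of a reversal stage the newly relevant stimulus
    is highlighted (previously punished stimulus in the unexpected reward
    condition, previously rewarded stimulus in the unexpected punishment
    condition), and the same stimulus is highlighted again on the following
    switch trial.
    """
    rng = random.Random(seed)
    rewarded, punished = "A", "B"       # current stimulus-outcome assignment
    criterion = rng.randint(5, 9)       # consecutive correct trials before reversal
    consecutive_correct = 0
    reversals = 0
    forced = []                         # stimuli that must be highlighted next
    trials = []

    for _ in range(max_trials):
        highlighted = forced.pop(0) if forced else rng.choice([rewarded, punished])
        outcome = "reward" if highlighted == rewarded else "punishment"
        trials.append((highlighted, outcome))

        # Simplification: the criterion counter treats every trial as correct,
        # including the reversal trial itself.
        consecutive_correct += 1
        if consecutive_correct >= criterion and reversals < max_reversals:
            rewarded, punished = punished, rewarded          # reverse contingencies
            first = rewarded if condition == "unexpected_reward" else punished
            forced = [first, first]     # reversal trial, then the switch trial
            criterion = rng.randint(5, 9)
            consecutive_correct = 0
            reversals += 1

    return trials


block = run_block("unexpected_punishment", seed=1)
print(len(block), "trials; first eight:", block[:8])
```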

Data Analysis

Biochemical measures

Venous blood samples (10 ml) were taken immediately before ingestion of the amino-acid drink and after the testing session, approximately 5.5 h after administration, to determine the level of total and free TRP in plasma and the TRP/∑LNAA ratio. This ratio was calculated as the serum concentration of total TRP divided by the sum of the competing LNAAs (tyrosine, phenylalanine, valine, isoleucine, and leucine), and it is important because the uptake of TRP into the brain depends strongly on the amounts of these other LNAAs, with which TRP competes for transport across the blood–brain barrier. Venous samples were taken in lithium heparin tubes and stored at −20°C. Plasma TRP concentrations were determined by an isocratic high-performance liquid chromatography (HPLC) method. Plasma proteins were removed by precipitation with 3% trichloroacetic acid and centrifugation (3000 revolutions per minute, 4°C, 10 min), and the resulting supernatant was pipetted into heparin aliquots. An aliquot was diluted in mobile phase before injection onto the HPLC analytical column. Fluorescence end-point detection was used to identify TRP.
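Explicitly, the ratio was computed as

\[
\mathrm{TRP}/\Sigma\mathrm{LNAA} \;=\; \frac{[\mathrm{TRP}]}{[\mathrm{Tyr}] + [\mathrm{Phe}] + [\mathrm{Val}] + [\mathrm{Ile}] + [\mathrm{Leu}]},
\]

where the bracketed terms denote the plasma concentrations of total TRP and of the five competing LNAAs.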

Behavioral measures

Data were analyzed in three steps. First, we assessed the effects of ATD on the mean number of errors on the task as a whole, regardless of outcome type (that is regardless of whether subjects predicted reward or punishment). Errors were square-root transformed to stabilize variances and decrease skewness (√x; as is usual when data are in the form of counts (Howell, 1997, p327)) and submitted to an ANOVA with condition (which contrasted the unexpected punishment condition with the unexpected reward condition) and drink (TRP− vs BAL) as within-subject factors.

Second, the data were decomposed according to outcome type. This trial-by-trial analysis included only those trials that followed correct responses. (We excluded trials following errors, because errors on these error +1 trials probably reflected a failure to maintain the task instructions. This assumption was based on the fact that the same stimulus was highlighted on trials following errors and likely provided no significant cognitive challenge on non-switch trials.) Errors were transformed into proportional scores, given that the number of data points varied per trial type as a function of performance. Mean proportions of errors were arcsine transformed (2 × arcsine(√x); as is appropriate when the variance is proportional to the mean (Howell, 1997; p328)) and analyzed using repeated-measures ANOVAs (SPSS 11, Chicago, IL) with condition, drink, and outcome type as within-subject factors.
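For concreteness, a minimal sketch of the two transforms (square-root for error counts, arcsine for error proportions), applied to hypothetical arrays rather than the actual data, is shown below.

```python
import numpy as np

# Hypothetical per-subject error counts (block analysis) and error proportions
# (trial-by-trial analysis); the real data are not reproduced here.
error_counts = np.array([3, 7, 2, 12, 5, 4], dtype=float)
error_proportions = np.array([0.05, 0.12, 0.02, 0.20, 0.08, 0.04])

# Square-root transform for count data (Howell, 1997, p 327).
sqrt_counts = np.sqrt(error_counts)

# Arcsine transform for proportions (Howell, 1997, p 328).
arcsine_props = 2 * np.arcsin(np.sqrt(error_proportions))

print(sqrt_counts.round(2))
print(arcsine_props.round(2))
```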

Finally, we separately analyzed trials after unexpected outcomes that required a behavioral switch (switch trials) and trials after expected outcomes that did not require such switching (non-switch trials) (Table 2). This analysis allowed us to assess whether ATD differentially affected switching as a function of the valence of the unexpected outcomes. For these analyses we excluded trials from the first acquisition stage of each block, which did not differ between conditions.

We report two-tailed P-values. Greenhouse–Geisser corrections were applied when the sphericity assumption was violated (Howell, 1997). The data in the figures represent raw data.

RESULTS

Biochemical Measures

Repeated-measures ANOVA revealed significant two-way interactions of drink by time of blood test, due to significant reductions in total TRP levels (F1,11=94.3, P<0.0001), free TRP levels (F1,11=28.2, P<0.0001), and the critical ratio TRP/∑LNAA measure (F1,11=64.6, P<0.0001) approximately 5.5 h after TRP− relative to BAL (Table 3).

Table 3 Biochemical Measures as a Function of Time of Test and Drink

Analyses of simple effects for the critical ratio data revealed a significant main effect of time for the TRP− drink (T11=12.7, P<0.0001), but not for the BAL drink (T11=−1.2, P=0.25). Thus, the ratio of TRP/∑LNAA was significantly reduced after the TRP− drink, but remained unaltered after the BAL drink.

Behavioral Data: Block Analysis

In Figure 1 we present the total number of errors made on the task as a whole as a function of condition and drink. Subjects made on average 5% errors on the task (chance=50%). Significantly fewer errors were made after the TRP− drink than after the BAL drink (main effect of drink: F1,11=9.5, P=0.01). The effect of ATD did not differ between the unexpected punishment and unexpected reward condition (drink by condition interaction: F1,11=0.1, P=0.7; main effect of condition: F1,11=0.5, P=0.5). Separate paired-sample t-tests confirmed that ATD improved performance in both the unexpected reward (T11=2.2, P=0.05) and the unexpected punishment condition (T11=2.6, P=0.03).

Figure 1

The mean number of errors as a function of condition and drink. Error bars represent standard errors of the difference as a function of drink.

In keeping with the reduced number of errors, subjects completed more stages within the maximum of 120 trials after the TRP− drink than after the BAL drink and this main effect of drink on the number of completed stages across both conditions was marginally significant (F1,11=3.8, P=0.08).

Behavioral Data: Trial-by-Trial Analysis

We assessed whether the improvement depended on outcome type by comparing reward- and punishment-prediction trials using a second ANOVA. This analysis revealed that the effect of ATD was not equally distributed between punishment- and reward-prediction trials (drink by outcome interaction: F1,11=5.6, P=0.04). In keeping with our hypothesis, the effect of ATD was restricted to punishment-prediction trials and did not extend to reward-prediction trials. Simple effects analyses confirmed that subjects made significantly fewer punishment-prediction errors after the TRP− drink than after the BAL drink (F1,11=8.3, P=0.015), whereas there was no drink effect on reward-prediction errors (F1,11=0.1, P=0.7). There was also a significant interaction between condition and outcome (F1,11=10.2, P=0.009), due to more punishment- than reward-prediction errors in the unexpected punishment condition, but more reward- than punishment-prediction errors in the unexpected reward condition. However, as mentioned above, the condition factor did not interact with drink.

The next set of analyses assessed switch and non-switch trials separately. An ANOVA on switch trials with condition and drink as within-subject factors revealed that there was no main effect of drink on switch trials (Table 4; F1,11=0.2, P=0.7), nor a drink by condition interaction (F1,11=0.005, P=0.9). Note that on switch trials (ie trials immediately after unexpected outcomes), the condition factor overlaps with the outcome factor, because unexpected punishment was always followed by a punishment-prediction trial and unexpected reward was always followed by a reward-prediction trial. Thus, there was no punishment prediction improvement on switch trials.

Table 4 The Mean Percentage of Errors on Switch Trials as a Function of Condition and Drink

Conversely, an ANOVA on non-switch trials with condition, drink, and outcome as within-subject factors confirmed again a significant drink by outcome interaction (F1,11=10.6, P=0.008; Figure 2; Table 5). Simple effects analyses revealed that subjects made significantly fewer errors on punishment-prediction trials after the TRP− drink than after the BAL drink (F1,11=7.6, P=0.02), while there was no drink effect on reward-prediction trials (F1,11=0.07, P=0.4). Further simple effects analyses of these non-switch trials revealed that subjects made significantly more errors on punishment- than reward-prediction trials after the BAL drink (F1,11=4.8, P=0.05). By contrast, there was no difference between punishment- and reward-prediction errors after the TRP− drink (F1,11=0.003, P=0.96). Thus, ATD abolished a disproportionate difficulty with punishment prediction, but did not affect reward prediction. This effect was restricted to non-switch trials, and did not extend to switch trials.

Figure 2

The mean percentage of errors on non-switch trials as a function of drink and outcome trial type. Error bars represent standard errors of the difference as a function of drink.

Table 5 The Mean Percentage of Errors on Non-switch Trials as a Function of Condition, Drink, and Outcome

The order of drink administration could not account for the data: additional analyses of non-switch trials showed that the interaction between drink and outcome remained significant when drink order was entered as a between-subject variable (F1,10=10.9, P=0.008). In addition, there was no evidence for an interaction between drink, outcome, and drink order (F1,10=0.07, P=0.4).

In summary, ATD improved performance by abolishing a disproportionate difficulty with punishment prediction relative to reward prediction. The effect of ATD was present only on non-switch trials, when subjects found punishment prediction more difficult than reward prediction after the BAL drink. There was no effect of ATD on switch trials. These findings indicate that ATD increased the prediction of punishment, but left unchanged the prediction of reward. Furthermore, ATD did not affect the ability to flexibly alter responding based on unexpected outcomes.

DISCUSSION

The observation that ATD increased punishment prediction concurs with classic and more recent findings indicating that 5-HT controls the processing of aversive signals (Tye et al, 1977; Iversen, 1984; Deakin and Graeff, 1991; Daw et al, 2002; Cools et al, 2005b; Pezawas et al, 2005; Harmer et al, 2006). More specifically, the selective effect of ATD on punishment prediction is consistent with a recent theoretical model, suggesting that in prediction learning 5-HT acts as a motivational opponent to dopamine, which is commonly implicated in the prediction of future reward (Daw et al, 2002).

In this model, learning to predict punishment depends on a transfer, with learning, of a high-amplitude, transient phasic 5-HT response from an aversive stimulus to a conditioned stimulus that predicts it. We demonstrate that a modest reduction in ‘background’ levels of tonic 5-HT increased the ability to predict punishment. One possibility is that the depletion of tonic 5-HT increased the dynamic range, and thus the impact, of changes in phasic 5-HT, shifting the system from a tonic toward a phasic mode of neurotransmission and effectively increasing the signal-to-noise ratio (Figure 3). Similar antagonistic interactions between phasic and tonic neurotransmission have been proposed for dopamine, where tonic levels regulate the phasic dopamine responses to biologically relevant stimuli (Grace, 1991). Although definitive confirmation of the pharmacological mechanism underlying our selective effect requires electrophysiological recording from serotonergic neurons and voltammetric data during punishment and reward prediction, our findings provide the first direct evidence in support of the hypothesis that 5-HT is critical for the prediction of punishment.

Figure 3

Schematic representation of the hypothetical effect of ATD (in red) on phasic 5-HT neuronal activity. ATD is hypothesized to increase the dynamic range of the phasic 5-HT burst. Time series for the balanced (BAL) and the TRP-free (TRP−) condition are shifted in time to facilitate visualization. The bars on the right represent the height of the phasic burst. CS, conditioned stimulus predictive of punishment.

As with most neurochemical manipulations available for human research, we cannot fully exclude the possibility that the manipulation of TRP levels also affected levels of dopamine, given known interactions between 5-HT and dopamine (Millan et al, 1998). However, it should be noted that direct manipulation of dopamine, by withdrawal of the dopamine precursor L-DOPA and dopamine receptor agonists, had diametrically opposite effects on this same learning paradigm from those reported here. Specifically, we demonstrated that withdrawal of dopaminergic medication in patients with mild Parkinson's disease selectively improved the ability to switch predictions based on unexpected punishment, while not affecting the ability to predict punishment (or reward) on non-switch trials (Cools et al, 2006). Thus, the effects of ATD dissociated from the effects of withdrawal of dopaminergic drugs, likely reflecting neurochemically specific effects of central 5-HT and dopamine, respectively.

In temporal difference models, the prediction error for future punishment is largest when events are unexpected. At first sight, one may thus argue that the effect of ATD should be most pronounced following unexpected punishment. In fact, the improvement was not present on such switch trials, but only surfaced on non-switch trials. This finding may be reconciled with the above-described model, by assuming that the prediction error due to unexpected punishment was too large and too robust to be sensitive to the small reduction in central 5-HT.

An important implication of the lack of effect on switch trials is that ATD did not modulate attention to punishment per se. Thus, the matched performance following unexpected punishment indicates that regardless of drink subjects attended equally to the unexpected punishment and were equally able to implement the changed contingency on the next trial.

An alternative account of our effect on punishment prediction is that it does not reflect a modulation of learning per se, but rather of the memory of specific stimulus-punishment contingencies. These two alternative learning and memory hypotheses can be disentangled in future studies by assessing the effect of ATD on slow learning curves after reversals of more difficult (eg probabilistic) contingencies than those presented in the present paradigm (where learning curves reached asymptote on the second trial following reversal).

The increased tendency to learn and/or memorize stimulus-punishment contingencies was not a result of a nonspecific, generalized increase in punishment anticipation, because subjects did not predict punishment more often for reward-associated stimuli. This finding indicates that our effect reflects enhanced learning and/or memory of specific stimulus-punishment contingencies and concurs with results from studies with experimental animals indicating an important role for 5-HT in fear conditioning and fear memory (Inoue et al, 1993, 1996, 2004; Wilkinson et al, 1995; Burghardt et al, 2004).

After the BAL drink, subjects found punishment prediction significantly more difficult than reward prediction. It is unlikely that this reflects an effect of the BAL drink, for two reasons. First, the BAL drink did not affect the critical TRP/∑LNAA ratio. Second, we observed a similar disproportionate difficulty with punishment prediction in elderly volunteers who did not take any substance in our previous study with this paradigm (Cools et al, 2006). Therefore, the selective difficulty with punishment prediction may reflect a protective bias in subjects at baseline. Suppression of the learning and/or memory of stimulus-punishment contingencies may be adaptive in this task, where the punishment is uncontrollable. Critically, the difference between punishment and reward prediction was abolished by ATD. Thus, after TRP−, subjects exhibited a form of depressive realism (Alloy and Abramson, 1979): ATD did not induce a negative bias, but rather abolished a protective bias against punishment anticipation. This observation concurs with previous suggestions that depressed individuals, who exhibit low 5-HT levels, do not show an attentional bias toward negative information, but rather fail to demonstrate the protective bias that is evident in nondepressed individuals (McCabe and Gotlib, 1995).

The protective bias at baseline, that is, the impairment in punishment relative to reward prediction, may reflect resilience to aversive signals (Amat et al, 2005; Yehuda et al, 2006; JV Taylor Tavares, L Clark, ML Furey, GB Williams, BJ Sahakian, WC Drevets, unpublished observations). Resilience protects subjects from the detrimental consequences of exposure to adversity and enables them to recover quickly from negative experiences. In the present task, resilience may have resulted in a paradoxical impairment in the ability to anticipate punishment given specific predictive stimuli. Resilience has been hypothesized to result from cortical, top-down control (from the prefrontal cortex, PFC) over subcortical brain regions that mediate aversive conditioning (eg the amygdala and the dorsal raphe nucleus) (Quirk and Gehlert, 2003; Amat et al, 2005; Pezawas et al, 2005; Urry et al, 2006; Yehuda et al, 2006). In keeping with this hypothesis, recent neuroimaging observations suggest that the PFC controls amygdala activity when subjects are presented with negatively valenced stimuli (Ochsner et al, 2002; Phelps and LeDoux, 2005). Based on previous suggestions that 5-HT conveys resilience to adversity (Deakin and Graeff, 1991; Deakin, 1991; Richell et al, 2005), we hypothesize that ATD disrupts PFC-mediated control over subcortical brain regions, such as the amygdala and/or the dorsal raphe nucleus (Amat et al, 2005; Heinz et al, 2005; Pezawas et al, 2005). Such a top-down control failure may interact with reductions in ‘background’ levels of tonic 5-HT to bias the system toward anticipation of adversity (by increasing prediction errors for future punishment). This hypothesis can be tested using event-related functional neuroimaging.

Trial-by-trial analyses showed that the effect of ATD was present only on non-switch trials and there was no evidence for a similar effect on switch trials. This is consistent with the finding that effects of systemic serotonergic manipulations in human volunteers are not specific to the reversal stage of discrimination learning tasks, but extend to simple and compound discrimination learning stages of such tasks (Park et al, 1994; Rogers et al, 1999; Murphy et al, 2002; Chamberlain et al, 2006). In keeping with these findings, ATD enhanced the BOLD response to punishment in a probabilistic reversal-learning paradigm regardless of whether punishment led to switching (Evers et al, 2005). Thus, ATD in human volunteers does not selectively alter behavioral flexibility, but rather has a more generalized effect on the learning and/or memory of contingencies via effects on punishment processing.

It may be noted that the effect of systemic serotonergic manipulations on human reversal learning differs from that of selective 5-HT depletion following injection of the neurotoxin 5,7-dihydroxytryptamine (5,7-DHT) into the orbitofrontal cortex (OFC) of nonhuman primates. This manipulation dramatically impairs the ability to inhibit responding to the previously rewarded stimulus, while not affecting the initial acquisition of a discrimination (Clarke et al, 2004, 2005, 2007). To explain this discrepancy, we must take into account three factors: (1) the method of depletion, (2) the extent to which the task depends on inhibitory control, and (3) the neural site of action of 5-HT. First, injection of 5,7-DHT removes almost all brain 5-HT, whereas ATD in humans reduces central 5-HT levels only modestly. These methods may well have different effects on the hypothetical equilibrium between tonic and phasic modes of neurotransmission. Second, the (serial) reversal-learning tasks used in studies with humans, particularly the one employed here, do not load on inhibitory control as much as the paradigms used in studies with nonhuman primates, for whom reinforcement is more salient and thus habit formation more pronounced (Clarke et al, 2007). Finally, neuroimaging studies with human volunteers have shown particularly pronounced effects of ATD on the (dorso)medial PFC during cognitive performance (Cools et al, 2005a; Evers et al, 2005, 2006a, 2006b; Talbot and Cooper, 2006; Van der Veen et al, 2007), whereas the disinhibitory effects described above resulted from selective 5-HT depletion of the OFC (Clarke et al, 2004, 2005, 2007). Thus, 5-HT depletion may have different functional consequences depending on the extent of depletion, the task demands, and the neural site of action (medial PFC vs OFC).