Abstract
NmethylDaspartate (NMDA) receptors are known to fulfill crucial functions in many forms of learning and plasticity. More recently, biophysical models, however, have suggested an additional role of NMDA receptors in evidence integration for decisionmaking, going beyond their role in learning. We designed a task to study the role of NMDA receptors in human rewardguided learning and decisionmaking. Human participants were assigned to receive either 250 mg of the partial NMDA agonist dcycloserine (n=20) or matching placebo capsules (n=27). Rewardguided learning and decisionmaking were assessed using a task in which participants had to integrate learnt and explicitly shown value information to maximize their monetary wins and minimize their losses. To tease apart the effects of NMDA on learning and decisionmaking we used simple learning models. Dcycloserine shifted decisionmaking towards a more optimal integration of the learnt and the explicitly shown information, in the absence of a direct learning effect. In conclusion, our results reveal a distinct role for NMDA receptors in rewardguided decisionmaking. We discuss these findings in the context of NMDA’s roles in neuronal superadditivity and as crucial for evidence integration for decisions.
INTRODUCTION
Animals and humans live in ever changing and complex environments. They need to continuously track and learn about the changing properties of their environment and use them to behave adaptively. The neural mechanisms engaged in learning about the environment depend on the type of information being tracked. Nevertheless, the plasticity underpinning several of these various forms of learning has been argued to depend on the same molecular mechanism, ie, NmethylDaspartate (NMDA) subtypes of glutamate receptors. Accordingly a number of studies have investigated whether NMDA receptor manipulations affect learning and memory (Bannerman et al, 2012; Kuriyama et al, 2011; Bohn et al, 2003).
Most of the studies looking at the role of NMDA receptors in reward learning have focused on simple tasks, such as a single association between a stimulus and an outcome or its reversal. However, in more ecological, and thus complex, scenarios, learning and decisionmaking often involve consideration of positive and negative aspects of potential outcomes as well as the integration of learnt information with information that is explicitly cued. Importantly, it has recently been shown that integration of information conveyed by different dimensions during decisionmaking is an active process recruiting particular neural mechanisms (Burke et al, 2013; Stein and Stanford, 2008). Thus, to understand the full role of NMDA receptors in reward learning and decisionmaking, it might not be sufficient to only study a single simple component in isolation.
Interestingly, there is some, albeit indirect, evidence that NMDA receptors are not only important for learning, but also for integration of information: NMDA receptor blockade has been reported to affect the integration of multisensory information in cat superior colliculus (Binns and Salt, 1996) or the integration of reward and delay in rats (Floresco et al, 2008). However, to our knowledge, it has not been investigated whether NMDA receptors play a role in human valuebased decisionmaking and learning.
To examine the influence of changes in NMDA receptor activity on complex learning and decisionmaking behaviors in humans, we used the partial NMDA agonist dcycloserine. Dcycloserine binds to the glycine site of the NMDA receptor. Glycine is a coagonist of the NMDA receptor, meaning that NMDA receptors only open when both glycine and glutamate bind. Dcycloserine can thus increase the probability of glutamate release opening NMDA receptors, which in turn enhances NMDA receptormediated activation. We designed a multiattribute decisionmaking task in which participants learnt changing accepted article preview 13 June 2014 of gains and losses of two options. They made choices between options by integrating those learnt probabilities with explicitly cued information about gain and loss magnitudes. To assess potential effects of dcycloserine on learning and decisionmaking, we used reinforcementlearning models.
Surprisingly, we did not find any evidence for a change in the rate at which participants learned about reward or punishment outcomes. However, we found that dcycloserine improved decisionmaking. Dcycloserine led to a more optimal integration of the learnt probability information with the explicitly cued magnitude information.
METHODS
Participants
The study was approved by the local ethics committee. In total, 52 healthy volunteers (age 18–30) took part in the study (inclusion details in Supplementary Methods). The groups were wellmatched on sociodemographic and personality parameters (Table 1). Five participants were excluded (Supplementary Methods). There remained 20 participants in the dcycloserine and 27 participants in the placebo group.
Procedure
In a doubleblind, placebocontrolled design, participants were randomly allocated to a single dose of dcycloserine (250 mg) or matching placebo capsule. They fasted 2 h before the testing visit. Approximately, 250 mg was chosen in agreement with recent studies (Klumpers et al, 2012; Onur et al, 2010). Participants were tested 3 h after drug administration. According to product information (King’s Pharmaceutical), plasma peak levels are reached within 3–4 h; other studies (van Berckel et al, 1997, 1998; Patel et al, 2011) found that peak levels are reached within ∼1 h. However, given dcycloserine’s halflife of 8–12 h (product information) or 15 h (Patel et al, 2011), plasma levels would have been close to peak levels during testing, given either timetopeak information. To assess potential subjective changes following dcycloserine, participants completed questionnaires (Table 1) before capsule intake and before testing.
Probabilistic Instrumental Learning Task
Participants performed a probabilistic learning task with monetary wins and losses (Figure 1a). Participants made repeated choices between two options with the aim to maximize their monetary payoff.
On each trial, participants had a choice between two options. Each option had four independent attributes: a reward and a loss magnitude, a reward and a loss probability. The magnitude determined how many points could be won (and lost) on this trial, while the probabilities determined how probable winning and losing was respectively. After participants selected one of the options, they were shown the outcomes for both options. However, only the option they had chosen contributed to the participants’ earnings. In trials where the chosen option incurred both, wins and losses, the participants’ earnings in that trial were the sum of both. Therefore, to maximize the overall gains, participants should be choosing the option on each trial with the highest reward utility (reward magnitude × reward probability) and with the lowest loss utility (loss magnitude × loss probability).
Reward and loss magnitudes were explicitly cued at the time of choice and were randomly drawn from a flat distribution between 1 and 100 points. In contrast, the probabilities were not explicitly shown and had to be learnt across trials by observing the outcomes. The outcomes for one option could either be a win and a loss, only a win, only a loss, or neither win nor loss. The independent reward and loss probabilities determined the probability of these outcomes. The probabilities varied over the course of the experiment between 20 and 80%, with only one of the four probabilities varying at any given time (Figure 1c).
Before the experiment, participants were instructed about the task (see Supplementary Methods). Participants first performed 30 training trials, followed by 381 test trials, which were included in the analysis. Each participant was tested using the same task schedule to allow for better group comparisons. At the start of the task participants were given 4£ (400 points) to ensure that they had sufficient funds to sustain losses incurred even at the beginning of the experiment.
Analysis
The behavioral analysis compared the effects of dcycloserine vs placebo on (a) learning of reward and loss probabilities, and (b) integration of learnt (probabilities) and explicitly cued (magnitudes) information for guiding complex decisions. All analyses were performed in Matlab and SPSS.
Logistic regression analysis
To ensure that participants learnt the probabilities, we first assessed the impact of past outcomes (reward and loss) and the explicitly cued magnitudes (reward and loss) on choice, using a logistic regression analysis and normalized regressor estimates. We included regressors for the last five trial differences in reward and loss outcomes between the two options, as well as the differences in the explicitly cued magnitudes.
To investigate whether the groups differed in their learning speeds, the resulting regression weights for the past outcomes were entered into an ANOVA with group (dcycloserine vs placebo) as a betweensubject factor and time (1, 2, 3, 4, or 5 trials in the past) and valence (reward or loss) as withinsubject factors.
Modeling
To look at the learning effects more specifically and assess participants’ strategies for the integration of information to make decisions, we used reinforcementlearning models to fit each participant’s trialbytrial behavior.
Each model consisted of three main components. First, the model had estimates about the probabilities underlying the outcomes of both options. These were updated on every trial using a reinforcementlearning algorithm. Second, the probability estimates were integrated with the explicitly cued magnitudes to calculate how valuable each of the two options was (ie, their utility). Third, these two utilities were compared to predict participants’ choices. To determine the best parameter estimates for every participant, we used a standard loglikelihood maximization procedure.
When calculating how valuable each option is, participants might use different decision strategies for integrating learnt probabilities with explicit magnitudes. They could use a mathematically optimal strategy (utility as probability × magnitude). However, as this optimal strategy may be quite cognitively taxing, they could resort to a heuristic strategy (utility as a weighted sum of probability and magnitude). To test which decision strategy participants used, we fitted different models to the data. To test for differences in strategy, we then compared how well each of these models explained the groups’ behavior. In addition, we fitted a third model, which directly estimated to what degree they used either decision strategy.
Optimal model
This model assumed that participants integrated the learnt probabilities optimally with the explicit magnitudes (magnitude × probability). The learning of probabilities was modeled using a standard reinforcementlearning rule. On each trial, the estimated probability of an attribute was updated based on the trial’s outcome, as a function of the prediction error (PE):
with
where α is the learning rate. Thus, the learning rate is a measure of how much participants updated their probability estimate when the outcome associated with an attribute differed from their expectation (eg, the probability that the left option yields a reward). Separate learning rates were used for learning about wins and losses.
These predictions were combined optimally with the shown magnitudes:
The loss utility was computed in the same way and combined with the reward utility:
where λ determines how much participants weighted the prospect of rewards vs losses.
A standard softmax decision rule was used to predict the probability of choosing an option.
where β reflects a participant’s ability to pick the option with higher utility.
To assess the effect of dcycloserine on learning about wins and losses, we compared their respective learning rates between groups.
Heuristic model
This model differed from the optimal model only in the decision strategy for integrating learnt probabilities with explicit magnitudes. Instead of being a product of probability and magnitude, utility was computed as a weighted sum.
where μ is the probability weighting factor, describing the relative importance of the learnt probability compared to the explicit. The loss utility was computed in the same way, sharing the same μ.
Again, we compared the groups’ learning rates. Additionally, a change in learning could also manifest as a changed reliance on learnt compared to explicit information, we therefore also compared the probability weighting factors between the groups.
Hybrid model
We used the hybrid model to examine whether dcycloserine affected how participants integrated information, shifting them towards a more optimal, less heuristic decision strategy. The hybrid model computed utility as a weighted sum of the utilities from the optimal and the heuristic model:
where ω is the heuristic weight factor, determining how much the overall utility is like the heuristic or the optimal utility. The higher ω (between 0 and 1), the more a participant relied on a heuristic decision rule.
Model Comparison
If the groups differ in their decision strategy, this should also be reflected in how well the models incorporating the different strategies can explain behavior. To assess this, we compared the model fits using the Akaike information criterion (AIC). We calculated for each participant the AIC differences between the optimal and the heuristic model and also between the heuristic and the hybrid model. This allowed us to compare whether the dcycloserine differed from the placebo group in how well their behavior was explained by the heuristic relative to the optimal and by the heuristic relative to the hybrid model.
To confirm the results from the modeling analysis using a different method, we performed an additional regression analysis with regressors analogous to the components of the hybrid model. We included regressors for the explicit magnitude differences, for the learnt probability estimate differences, as well as for the difference in optimal utilities (magnitude × probability). The probability estimates were obtained using a Bayesian learner, like the one described in Behrens et al, 2007; also see Supplementary Methods. These Bayesian probability estimates are the most accurate estimates a participant could have given the past outcomes (Figure 1d). If participants’ decision strategy is more heuristic, the main effects should have a larger impact on behavior. Conversely, if the decision strategy is more optimal, the interaction term (magnitude × probability) should have a higher impact.
Learning About the Unchosen Option
There is some evidence (Boorman et al, 2011) that different brain areas are used to learn about the chosen and the unchosen option, thus there is a possibility they could be affected differently by dcycloserine. We found that dcycloserine did not affect the usage of the unchosen option’s outcomes for decisionmaking or learning (see Supplementary Methods and Results).
RESULTS
General Performance
In the task, participants had to constantly track the independent win and loss probabilities of two options and integrate these with explicitly cued reward and loss magnitudes. The groups did not differ in the overall earnings or in the mean points won/lost (Table 2a).
Logistic Regression
As a measure of learning, we assessed the impact of past reward/loss outcomes on choice, using a logistic regression analysis. We also included the currently displayed reward/loss magnitudes. Participants were more likely to pick options with higher reward (t(46)=18.6, P=5 × 10^{−23}) and lower loss magnitudes (t(46)=−14.9, P=3 × 10^{−19}). They also chose options more frequently when they were associated with more past wins and fewer losses (Figure 2a), thus suggesting that they were able to learn from past outcomes.
To test whether learning differed between the groups, and maybe in dependence on reward and loss valence, we ran a 2 (group) × 5 (time point) × 2(valence) ANOVA on the regression weights of the past outcomes (Figure 2a). That participants learnt the reward/loss probabilities over time was evidenced by the fact that recent reward/loss outcomes influenced choices more than longer ago ones (main effect of time, F(4,180)=80.6, P<10^{−6}). This effect of time was stronger for losses than rewards (interaction effect, time × valence, F(4,180)=6.2, P=3.8 × 10^{−4}), suggesting that loss probabilities were learnt more quickly.
Importantly, the groups neither differed in overall learning speed (time × group: F(4,180)=0.9, P=0.42), nor in their relative learning speeds for wins and losses (time × valence × group, F(4,180)=0.3, P=0.89).
Modeling
The aims of the modeling analyses were to assess whether the groups differed in their learning rate (all three models), as well as whether they differed in their decision strategies (model three). Each model used a standard predictionerror learning rule for learning about reward/loss probabilities. The models only differed in how learnt probabilities were integrated with explicit reward/loss magnitudes.
Optimal model
This model integrated magnitudes and probabilities optimally using a multiplicative rule. For assessing differences in learning, we compared the learning rate parameters between the groups. Surprisingly, the groups did not differ in their learning rates (reward: t(45)=−0.2, P=0.8, loss: t(45)=0.7, P=0.5), nor did they differ in any other model parameter (Table 2b).
Heuristic model
Participants might use a more heuristic and less optimal decision strategy to reduce task complexity. This was accounted for in the heuristic model, in which the decision utility was calculated as a weighted sum of probability and magnitude. Here, we could detect a change in learning either as a changed learning rate or as a change in how much impact the learnt probabilities have on decisions relative to the explicit magnitudes. Again, we did not find any group differences for the two learning rates (reward: t(45)=1.3, P=0.20, loss: t(45)=0.7, P=0.5) or for the probability weighting factor (t(45)=−0.36, P=0.72). The groups did not differ either in any other model parameter (Table 2c).
Hybrid model
This model combined the optimal and heuristic model’s decision strategy using an additional weighting factor (ω, heuristic weight); ω indexed the degree to which participants used the heuristic relative to the optimal strategy. Again, the two groups did not differ in their learning rates (Table 2d). However, there was a strong difference in the heuristic weight between the groups, with the dcycloserine group relying more on an optimal rather than heuristic decision strategy compared to the placebo group (t(45)=4.6, P=3.6 × 10^{−5}).
Model comparison
To investigate this shift in decision strategy further we compared how well each model explained the behavior of each group, using the AIC (Figure 2c). We hypothesized that if participants in the dcycloserine group used a more optimal decision strategy, models incorporating the optimal decision rule should better explain their behavior compared to the placebo group.
First, we compared the optimal model relative to the heuristic model. In both groups, the heuristic model provided a better fit to the data than the optimal model. However, this effect was weaker in the dcycloserine group than in the placebo group, as shown by a significant group difference (t(45)=2.1, P=0.046). This suggests that relative to the placebo, the dcycloserine group used a more optimal strategy. We also compared the fits of the heuristic to the hybrid model. Similarly, we found that the difference in scores varied significantly between the groups (t(45)=3.6, P=8.0 × 10^{−4}). For the dcycloserine group, the hybrid model provided the best fit (t(19)=3.9, P=0.01). However, for the placebo group, both models explained their behavior equally well (t(26)=1.6, P=0.11). These findings are in line with those from the modeling analysis above and suggest that dcycloserine shifted participants’ behavior to rely more on an optimal decision strategy.
To investigate this further, we ran a logistic regression analysis, analogous to the hybrid model (Figure 2b). It included an interaction term for magnitude and probability (analogous to the optimal model), as well as each factor separately (analogous to the heuristic model). Similar to our hybrid model findings, the groups did not differ in the heuristic aspect: there was no difference between the main effects for magnitude and probability (magnitude: F(1,45)=2.2, P=0.14, probability: F(1,45)=0.98, P=0.33). In contrast, the groups differed on the optimal aspect of the model: a twobytwo ANOVA analysis on the interaction regressors of magnitude and probability for reward and loss revealed that those regressors had a stronger impact on the decisions in the dcycloserine group (F(1,45)=10.8, P=0.002). Posthoc tests revealed that this effect was significant for loss (t(45)=−2.7, P=0.01) and was marginally significant for reward (t(32.4)=−2.0, P=0.056).
DISCUSSION
Summary of the Results
We studied the effect of changes in NMDA activity on human rewardguided learning and decisionmaking using dcycloserine, a partial NMDA agonist. Participants had to learn changing probabilities of positive and negative outcomes over time and integrate them with explicitly cued reward and loss magnitudes in order to make decisions. Although participants were able to track the changing probabilities, this learning was not modulated by dcycloserine. If there had been an effect on learning, we would have expected either a change in learning rates or alternatively a changed reliance on the learnt compared to the explicit information, neither was the case.
However, dcycloserine had a significant effect on decisionmaking. The drug shifted decisionmaking away from a heuristic approach towards a more optimal one. In the current context, more optimal behavior meant that participants multiplied the learnt probability and the displayed magnitude information rather than adding them (heuristic behavior). Note, however, that such multiplication might be more demanding especially since participants tracked several pieces of information at a time, making heuristic strategies an appealing way of reducing task complexity.
Absence of a Learning Effect
To our knowledge, there are no studies in humans that have looked at the effects of dcycloserine on rewardguided learning and decisionmaking. However, animal studies suggest that dcycloserine improves fear extinction (Walker et al, 2002), drug use extinction (Flavell et al, 2011), and appetitive conditioning (Golden and Houpt, 2007). However, our task is most similar to instrumental reward reversal learning task paradigms and it is still unknown whether this neurally distinct form of learning is also improved by dcycloserine, with Golden and Houpt, 2007 finding that at least simple reversal learning might not be.
Additionally, evidence for learning effects in human studies analogous to the animal studies above has been more inconsistent. Although there is evidence from clinical trials that dcycloserine may accelerate the therapy of different anxiety disorders (reviewed in Hofmann et al, 2013), some experimental studies have not been able to replicated the basic fear extinction effects in humans (Guastella et al, 2007; Klumpers et al, 2012). Furthermore, clinical trials for alcohol or for cocaine use extinction have failed to find an enhancement by dcycloserine (Kamboj et al, 2011, Price et al, 2013). This discrepancy between human and animal studies could be due to the strong dose dependency of dcycloserine’s effects (Walker et al, 2002): in general animal studies have used a higher dose (by about a factor of three) and injection rather than oral administration of the drug.
Changes in Evidence Integration for DecisionMaking
We found that dcycloserine led to a very selective shift in participants’ decision strategy, making them integrate different sources of information in a more optimal manner, compared to the placebo group. More specifically, participants integrated the learnt probability and the explicitly cued magnitude more multiplicatively (ie, nonlinearly) and thus less additively (ie, linearly).
Recent work has suggested functions for NMDA receptors beyond traditional longterm consolidation in learning. To our knowledge, nobody has yet tested whether changes in NMDAmediated neural processing would affect evidence integration in a complex valuebased decisionmaking task in humans. However, there is some evidence that NMDA receptors can modulate higher cognitive functions, like working memory (Wang et al, 2013), attention (Herrero et al, 2013; Meuwese et al, 2013; Self et al, 2012), or delay discounting (Floresco et al, 2008). However, it is important to note that it is unlikely that the decisionmaking effects in our task are a workingmemory effect in disguise. Collins and Frank, 2012 have shown that in a standard human learning paradigm (similar to the one used here), learning rate captures a mixture of corticostriatal learning systems as well as prefrontal working memory. Therefore, if the observed effect were driven by a workingmemory mechanism, we would have expected to see a change in at least one of the different measures we used to probe for learning effects. However, we did not see any such effects.
There are at least two theoretical frameworks that are, however, in line with our results. First, NMDA receptors have been proposed as crucial for the integration of information over time during decisionmaking in a biophysical model (Wang, 2002). Although this model was originally based on animal recordings during perceptual decisionmaking tasks (reviewed in Wang, 2008), recent human brain imaging studies have shown that it is also relevant for human rewardbased decisionmaking (Chau et al, 2014; Hunt et al, 2012; Jocham et al, 2012). Simulation studies have suggested that modulation of the NMDA receptor activity should enhance or impair decisionmaking, depending on the baseline activation of NMDA receptors (Standage and Paré, 2011). Following these findings, the decisionmaking improvement in our study could be explained as shifting the NMDA receptor activation into a more favorable range for decisionmaking.
Second, there is evidence that NMDA receptors are important for neural computations when more than one input is integrated nonlinearly. For example, Binns and Salt, 1996 have shown that NMDA receptors are crucial for multisensory integration. More specifically, they found that the neuronal superadditivity between inputs of different modalities in the cat superior colliculus was strongly reduced by NMDA blockade. Furthermore, Self et al, 2012 have shown that NMDA receptors are crucial for the integration of topdown and bottomup information during figure–ground separation. In the context of our task the optimal way to integrate the probabilities and the magnitudes presented, is nonlinearly (ie, multiplicatively), rather than linearly (additively). Therefore dcycloserine may have led to more optimal integration by biasing processing towards more nonlinear neural interactions.
One possible explanation for the occurrence of an effect on decisionmaking in the absence of a learning effect is that while both are supported by NMDA receptors, there is a different optimal range of NMDA activation supporting both functions (Standage and Paré, 2011). In addition, the strength of the effect of dcycloserine might vary across brain areas, due to different NMDA receptor subunit compositions or different local levels of glycine (see Yang and Svensson, 2008 for a review). Therefore, the behavioral dissociation could be explained as due to different dcycloserine sensitivities in reward decision compared to reward learningrelated brain areas.
Future Directions
Future experiments will have to establish whether our results generalize to the integration of dimensions other than probability and magnitude, and to situations where all information to be integrated is explicitly shown rather than learnt.
It would also be interesting to manipulate the predicted choices for linear vs nonlinear integration more widely to produce more distinct changes in choices and outcome between the groups, as well as testing whether other forms of nonlinear integration or conditional reasoning (modelbased) would also be affected.
Lastly, future decisiontask designs could explicitly contrast which theoretical NMDA integration mechanism mentioned above drives the behavioral effect: while the biophysical model explanation would predict decision enhancements in situations where attributes are integrated linearly or nonlinearly, the superadditivity mechanism would predict enhancements only where attributes need to be integrated nonlinearly.
CONCLUSION
We have found that enhancing the activation of NMDA receptors changed how humans integrated evidence in a multiattribute decisionmaking and learning task in the absence of more direct effects on learning. This suggests a role for NMDA receptors beyond their wellstudied role in longterm consolidation and plasticity. Although we cannot clearly dissociate between different candidate mechanisms for our observed effect, either biophysical attractor networks or basic superadditivity models could plausibly explain our data.
FUNDING AND DISCLOSURE
This research was funded by the Oxfordshire Health Services Research Committee (research grant awarded to AR, OHSRC 1021) and by the Wellcome Trust (JS and NK were awarded FourYear PhD Studentships, 092759/Z/10/Z and 089280/Z/09/Z and MFSR holds a Senior Investigator Award, WT100973AIA). CJH has valueless shares in p1vital and serves on their advisory panel. She has received consultancy payments from p1vital, Servier, Eli Lilly, AstraZeneca, and Lundbeck and is a director of Oxford Psychologists. The remaining authors declare no conflict of interest.
References
Bannerman DM, Bus T, Taylor A, Sanderson DJ, Schwarz I, Jensen V et al (2012). Dissecting spatial knowledge from spatial choice by hippocampal NMDA receptor deletion. Nat Neurosci 15: 1153–1159.
Beck AT, Steer RA, Brown GK (1996) Manual for the Beck Depression InventoryII. Psychological Corporation: San Antonio, TX.
Behrens TEJ, Woolrich MW, Walton ME, Rushworth MFS (2007). Learning the value of information in an uncertain world. Nat Neurosci 10: 1214–1221.
Binns KE, Salt TE (1996). Importance of NMDA receptors for multimodal integration in the deep layers of the cat superior colliculus. J Neurophysiol 75: 920–930.
Bohn I, Giertler C, Hauber W et al (2003). NMDA receptors in the rat orbital prefrontal cortex are involved in guidance of instrumental behaviour under reversal conditions. Cereb Cortex 13: 968–976.
Boorman ED, Behrens TE, Rushworth MF (2011). Counterfactual choice and learning in a neural network centered on human lateral frontopolar cortex. PLoS Biol 9: e1001093.
Burke CJ, Brünger C, Kahnt T, Park SQ, Tobler PN (2013). Neural integration of risk and effort costs by the frontal pole: only upon request. J Neurosci 33: 1706–1713a.
Carver CS, White TL (1994). Behavioral inhibition, behavioral activation, and affective responses to impending reward and punishment: the BIS/BAS Scales. J Pers Soc Psychol 67: 319–333.
Chau BKH, Kolling N, Hunt LT, Walton ME, Rushworth MFS (2014). A neural mechanism underlying failure of optimal choice with multiple alternatives. Nat Neurosci 17: 463–470.
Collins AGE, Frank MJ (2012). How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis. Eur J Neurosci 35: 1024–1035.
Derryberry D, Reed MA (2002). Anxietyrelated attentional biases and their regulation by attentional control. J Abnorm Psychol 111: 225–236.
Eysenck HJ, Eysenck SBG (1994) Manual for the Eysenck Personality Questionnaire (EPQR Adult). Educational Industrial Testing Service.
Flavell CR, Barber DJ, Lee JLC (2011). Behavioural memory reconsolidation of food and fear memories. Nat Commun 2: 504.
Floresco SB, Tse MTL, GhodsSharifi S (2008). Dopaminergic and glutamatergic regulation of effort and delaybased decision making. Neuropsychopharmacology 33: 1966–1979.
Golden GJ, Houpt TA (2007). NMDA receptor in conditioned flavortaste preference learning: blockade by MK801 and enhancement by Dcycloserine. Pharmacol Biochem Behav 86: 587–596.
Guastella AJ, Lovibond PF, Dadds MR, Mitchell P, Richardson R (2007). A randomized controlled trial of the effect of Dcycloserine on extinction and fear conditioning in humans. Behav Res Ther 45: 663–672.
Herrero JL, Gieselmann Ma, Sanayei M, Thiele A (2013). Attentioninduced variance and noise correlation reduction in macaque V1 is mediated by NMDA receptors. Neuron 78: 729–739.
Hofmann SG, Wu JQ, Boettcher H (2013). DCycloserine as an augmentation strategy for cognitive behavioral therapy of anxiety disorders. Biol Mood Anxiety Disord 3: 11.
Hunt LT, Kolling N, Soltani A, Woolrich MW, Rushworth MFS, Behrens TEJ (2012). Mechanisms underlying cortical activity during valueguided choice. Nat Neurosci 15: 470–476, S1–3.
Jocham G, Hunt LT, Near J, Behrens TEJ (2012). A mechanism for valueguided choice based on the excitationinhibition balance in prefrontal cortex. Nat Neurosci 15: 960–961.
Kamboj SK, MasseyChase R, Rodney L, Das R, Almahdi B, Curran HV et al (2011). Changes in cue reactivity and attentional bias following experimental cue exposure and response prevention: a laboratory study of the effects of Dcycloserine in heavy drinkers. Psychopharmacology (Berl) 217: 25–37.
Klumpers F, Denys D, Kenemans JL, Grillon C, van der Aart J, Baas JM (2012). Testing the effects of Δ9THC and Dcycloserine on extinction of conditioned fear in humans. J Psychopharmacol 26: 471–478.
Kuriyama K, Honma M, Soshi T, Fujii T, Kim Y (2011). Effect of Dcycloserine and valproic acid on the extinction of reinstated fearconditioned responses and habituation of fear conditioning in healthy humans: a randomized controlled trial. Psychopharmacology (Berl) 218: 589–597.
Meuwese JDI, van Loon AM, Scholte HS, Lirk PB, Vulink NC, Hollmann MW et al (2013). NMDA receptor antagonist ketamine impairs feature integration in visual perception. PLoS One 8: e79326.
Onur OA, Schlaepfer TE, Kukolja J, Bauer A, Jeung H, Patin A et al (2010). The NmethylDaspartate receptor coagonist Dcycloserine facilitates declarative learning and hippocampal activity in humans. Biol Psychiatry 67: 1205–1211.
Patel DS, Sharma N, Patel MC, Patel BN, Shrivastav PS, Sanyal M (2011). Development and validation of a selective and sensitive LCMS/MS method for determination of cycloserine in human plasma: application to bioequivalence study. J Chromatogr B Analyt Technol Biomed Life Sci 879: 2265–2273.
Price KL, Baker NL, McRaeClark AL, Saladin ME, Desantis SM, Santa Ana EJ et al (2013). A randomized, placebocontrolled laboratory study of the effects of Dcycloserine on craving in cocainedependent individuals. Psychopharmacology (Berl) 226: 739–746.
Self MW, Kooijmans RN, Supèr H, Lamme Va, Roelfsema PR (2012). Different glutamate receptors convey feedforward and recurrent processing in macaque V1. Proc Natl Acad Sci USA 109: 11031–11036.
Spielberger CD, Gorsuch RL, Lushene PR, Vagg PR, Jacobs AG (1983). Manual for the StateTrait Anxiety Inventory (Form Y). Man StateTrait Anxiety Invent STAI 4–6.
Standage D, Paré M (2011). Persistent storage capability impairs decision making in a biophysical network model. Neural Netw 24: 1062–1073.
Stein BE, Stanford TR (2008). Multisensory integration: current issues from the perspective of the single neuron. Nat Rev Neurosci 9: 255–266.
van Berckel BN, Lipsch C, Gispende Wied C, Wynne HJ, Blankenstein MA, van Ree JM et al (1998). The partial NMDA agonist Dcycloserine stimulates LH secretion in healthy volunteers. Psychopharmacology (Berl) 138: 190–197.
van Berckel BN, Lipsch C, Timp S, Gispende Wied C, Wynne H, van Ree JM et al (1997). Behavioral and neuroendocrine effects of the partial NMDA agonist Dcycloserine in healthy subjects. Neuropsychopharmacology 16: 317–324.
Walker DL, Ressler KJ, Lu KT, Davis M (2002). Facilitation of conditioned fear extinction by systemic administration or intraamygdala infusions of Dcycloserine as assessed with fearpotentiated startle in rats. J Neurosci 22: 2343–2351.
Wang M, Yang Y, Wang CJ, Gamo NJ, Jin LE, Mazer JA et al (2013). NMDA receptors subserve persistent neuronal firing during working memory in dorsolateral prefrontal cortex. Neuron 77: 736–749.
Wang XJ (2002). Probabilistic decision making by slow reverberation in cortical circuits. Neuron 36: 955–968.
Wang XJ (2008). Decision making in recurrent neuronal circuits. Neuron 60: 215–234.
Yang CR, Svensson KA (2008). Allosteric modulation of NMDA receptor via elevation of brain glycine and Dserine: the therapeutic potentials for schizophrenia. Pharmacol Ther 120: 317–332.
Acknowledgements
We would like to thank Marco Wittmann for helpful comments on an earlier version of the manuscript and Bolton Chau, Rei Akaishi and David Bannerman for helpful discussions.
Author information
Authors and Affiliations
Corresponding author
Additional information
Supplementary Information accompanies the paper on the Neuropsychopharmacology website
Supplementary information
PowerPoint slides
Rights and permissions
About this article
Cite this article
Scholl, J., Günthner, J., Kolling, N. et al. A Role Beyond Learning for NMDA Receptors in RewardBased DecisionMaking—a Pharmacological Study Using dCycloserine. Neuropsychopharmacol 39, 2900–2909 (2014). https://doi.org/10.1038/npp.2014.144
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1038/npp.2014.144
This article is cited by

Activation and disruption of a neural mechanism for novel choice in monkeys
Nature (2021)

Cognitive and emotional empathy after stimulation of brain mineralocorticoid and NMDA receptors in patients with major depression and healthy controls
Neuropsychopharmacology (2020)

A randomized, placebocontrolled laboratory study of the effects of Dcycloserine on sexual memory consolidation in women
Psychopharmacology (2020)