INTRODUCTION

A large portion of the variance in higher cognitive function across the population can be accounted for by genetic factors (Friedman et al, 2008). For example, although there are multiple separable components of executive function, as indexed by latent factor analysis (Miyake et al, 2000), these components share a nearly perfectly (99%) heritable ‘common factor’ that is itself separable from general intelligence and perceptual speed (Friedman et al, 2008). Nevertheless, some components of executive function are trainable (eg, Dahlin et al, 2008a, 2008b), and are therefore sensitive to environmental factors.

Heritability does not preclude trainability, and indeed, these genetic and environmental factors may even be synergistic. One possibility is that the common heritable factor includes a strong motivational component that also influences learning. Indeed, motivational factors can influence neural activation and cognitive performance, both across levels of incentive within a given individual and across intrinsic levels of motivation between individuals (Watanabe and Sakagami, 2007; Locke and Braver, 2008; Linke et al, 2010). Motivation might also influence the extent to which one engages in activities that would further improve performance through training. Genes that modulate the neural components of motivation, and in turn, the synaptic plasticity that results from having achieved motivated outcomes, should influence not only ‘baseline’ cognitive performance but also the degree to which training and experience would further enhance this performance. Supporting this interpretation, the heritability of general cognitive function actually increases linearly from childhood to young adulthood (Haworth et al, 2009), suggesting genetic modulation of environmental influences on cognition (Scarr, 1992). Thus, genetic contributions to cognitive function might, in part, affect the degree to which cognitive strategies are learned. Here, we review some of the genetic factors that are expected to be relevant to the particular computations that support aspects of motivation, learning, and cognition.

A wealth of evidence across a range of species (including humans) implicates the dopaminergic system as a key component in learning and motivational processes (for reviews, see Robbins and Everitt, 2007; Berridge, 2007; Salamone and Correa, 2002; Cools, 2008; Doll and Frank, 2009). Dopamine is also critically involved in higher cognitive processes (Sawaguchi and Goldman-Rakic, 1991; Mehta et al, 1999; Cools et al, 2001; Frank and O’Reilly, 2006), and theoretical work suggests a link between these functions (Montague et al, 2004; Frank and O’Reilly, 2006). As reviewed in more detail below, in the striatum, dopamine is critical for reinforcing actions that are most likely to lead to rewards. Critically, the term ‘action’ refers to both lower-level motor programs and higher-level cognitive actions such as when and when not to update information into working memory (Braver and Cohen, 2000; Frank et al, 2001, 2005). This conceptualization fits with the observation that striatal activation predicts the extent to which working memory updating can be trained (Dahlin et al, 2008a). Further, in addition to learning effects, dopaminergic agents can also directly potentiate effortful motivated behavior (Salamone et al, 2009; Farrar et al, 2010).

Individual Differences in Response to Pharmacological Manipulations

Circumstantial evidence for a dopaminergic locus of individual differences in cognition comes from pharmacological studies. If dopamine modulates cognitive function, it should be possible to improve or impair performance by administering pharmacological agents that induce dopamine release or directly stimulate/block dopamine receptors. Indeed, dopamine-releasing stimulants are well known to improve cognitive function in both participants with ADHD and healthy individuals (Klorman et al, 1984; Elliott et al, 1997). However, multiple studies have shown that whether dopaminergic agents (particularly D2 receptor agonists) improve or impair cognitive function strongly depends on baseline measures (Kimberg et al, 1998; Mehta et al, 2004; Roesch-Ely et al, 2005; Frank and O’Reilly, 2006; Cools et al, 2009, 2007; Clatworthy et al, 2009). For example, D2 stimulation generally improves performance in individuals with low working memory span (Kimberg et al, 1998; Frank and O’Reilly, 2006), high impulsivity (Cools et al, 2007), or low baseline DA synthesis (Cools et al, 2009), whereas it impairs performance in the opposite groups.

These seemingly disparate effects likely share a common mechanism. Indeed, baseline striatal DA synthesis is directly predictive of baseline working memory span (Cools et al, 2008). Moreover, the dependency of D2 drug effects on baseline working memory abilities applies not only to tasks with working memory demands, but also to those that tap into basic reinforcement learning processes (Frank and O’Reilly, 2006). Relatedly, baseline striatal DA synthesis predicts the extent to which DA drugs alter reinforcement learning processes (Cools et al, 2009). Furthermore, DA drugs can affect the degree to which working memory updating strategies are themselves learned across trials (Frank and O’Reilly, 2006; Moustafa et al, 2008b). Finally, a recent study showed that the extent to which stimulants affect cognitive function is predicted by drug-induced changes in D2 receptor availability in distinct striatal sub-regions, depending on the task (Clatworthy et al, 2009).

There is strong reason to suspect that these individual differences in response to DA drug effects on cognition and brain activity are, at least in part, genetic. For example, one study showed that a polymorphism associated with striatal D2 receptor affinity is predictive of the direction of the neural response to D2 agonist stimulation (Cohen et al, 2007). Other studies reviewed below suggest that DA polymorphisms are predictive of learning and performance in the same tasks and conditions that are modulated by pharmacological manipulations. Thus, these converging literatures suggest a key genetic component that may explain individual differences in cognitive function, motivation, and responses to pharmacological agents.

The remainder of this review provides a theoretical framework and associated evidence in the dopaminergic system, while identifying other candidate genetic loci of interest. But before doing so, we first address a few key caveats and common criticisms of the neurogenetic approach, and introduce a general recipe for addressing these criticisms.

A Note of Caution, Caveats, and Common Criticisms

Given the myriad of potential genes that could influence learning, it is important to note that exploratory genome-wide association studies (GWASs) that scan the human genome (some 20 500 genes) to identify predictive factors x, y, and z may suffer from an inability to draw substantive conclusions because of multiple comparisons, type-I errors, and the correlational nature of genetic findings. A number of recent reports suggest that current GWAS efforts to identify common genetic variants that underlie common psychiatric disorders do indeed suffer from these statistical limitations, as the reported proportion of risk explained by common variation seems to be modest (Purcell et al, 2009; Stefansson et al, 2009; Shi et al, 2009). Further efforts to re-sequence genomic regions containing common variants of interest have yielded few functional mutations or polymorphisms that track illness within familial pedigrees. However, recent simulations suggest that when low-frequency disorder-causing variants are dispersed within common variants across large regions of the genome, sequencing that extends further outward along chromosomes containing GWAS loci may be sufficient to identify the much sought-after disease-causing variants (Dickson et al, 2010). As genome sequencing technology continues to improve in price and speed, it seems likely that the assignment of genetic polymorphisms to residual unexplained variability and risk could occur at a rapid pace (although this remains to be substantiated). For these reasons, along with others, we have begun to evaluate an approach that is well suited for the analysis of as-yet-undetermined bona-fide risk variants as well as known genetic variants that have been implicated in the development of normal and abnormal mental function.

A so-called neurocognitive–genetic strategy is an alternative to the large-scale GWAS approach to clinical or neuropsychological phenomena. This strategy leverages what is currently understood regarding the causal relationships of candidate genetic polymorphisms to protein function, synaptic physiology/structure and beyond, to the dynamics of local neural circuits and large-scale networks. This approach involves the co-application of two historically separate research disciplines—a molecular genetic approach that seeks to ascertain the genetic origins of individual differences and a cognitive neuroscience approach that seeks to understand human behavior in terms of specific cognitive systems and component neural mechanisms. The combination of these research methods is synergistic, as both seek to dissociate and dissect mental function along the lines of naturally occurring biological processes (ie, to cut nature at its joints). Thus, this approach is suited to constrain genetic analysis to candidate genetic factors that are hypothesized to alter processing in brain regions critical for the cognitive process of interest (Green et al, 2008; Tan et al, 2007a, 2007b; Frank et al, 2007a, 2009; Ullsperger, 2009). Critically, the hypotheses are informed by existing converging literatures based on patient populations, lesion studies, psychopharmacological data, functional imaging in humans, and direct measures in animal model systems. The heart of this neurocognitive–genetic method involves a working model that serves to interlink the biochemical processes that are carried out by proteins encoded by candidate genes with the neural dynamics of local circuits and broader networks that are measured and treated as endophenotypes (Box 1).

Before we elaborate several hypotheses linking candidate genetic variation with specific cognitive operations, we highlight a number of operational criteria that pertain to the design of neurocognitive–genetic association studies. These criteria are important to consider mainly because genetic variation can vary in its utility and suitability for hypothesis testing. For example, candidate genes used in extant studies on reinforcement learning have been selected based on pharmacological targets known to exert physiological or information-processing changes. These are, without question, the most well-studied class of candidate genes, whose effects are described further below. Psychopharmacological studies, especially in combination with patients or neuroimaging, can also provide a ‘proof of concept’ that genetic effects associated with certain neurotransmitters are causal rather than simply correlational. Other forms of genetic variation derived from spontaneous or targeted mutations that confer anatomical or behavioral changes similar to those observed in pharmacological manipulations, are also relevant from a systems perspective. Yet another class of genetic variation is found in genes whose expression and/or change in expression is correlated with physiology and/or anatomy. Further discussion below of genes whose normal expression is limited to striatonigral- and striatopallidal-specific circuitry (Table 1) provides an entry point to begin to understand the differential development of these circuits.

Table 1 Candidate Polymorphisms Modulating Striatonigral vs Striatopallidal Function

Within each candidate gene, genetic variation in human populations may consist of changes as small as a single-nucleotide polymorphism (SNP) or as large as multi-kilobase duplications, repeats, and/or deletions. These polymorphic changes, which occur approximately once every 1000 bases, include many variants whose frequencies differ markedly and unevenly across the human diaspora. Indeed, the varied migratory patterns of human populations as far back as 10 000 and even 60 000 years ago and the highly admixed and shifting genetic structures of current human genetic populations pose a constant and formidable challenge to the design of genetic association studies (Fagundes et al, 2007). For example, the A-allele of a SNP consisting of a G vs A (valine vs methionine) at position 68690 (rs6265) in the brain-derived neurotrophic factor gene is found rather infrequently (<10%) in European and African populations but much more frequently (as high as 60%) in Asian populations (http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=6265). This type of disparity between the genetic structure of Asian vs European and African populations can, in principle, confound the conclusions drawn from genetic association studies wherein contrasting experimental groups are not well matched for ethnicity. Even as statistical methods are developed that take into account differences in hidden genetic structure between experimental populations, caution must be taken when basing conclusions on genetic polymorphisms that are rare and/or known to vary widely in frequency across different ethnic groups (Montana and Pritchard, 2004). To this end, we emphasize the utility of polymorphisms whose major and minor variants are found in equal or similar frequencies across ethnic groups.
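To make the stratification concern concrete, the following sketch (in Python; the genotype counts are hypothetical, chosen only to mirror the rs6265 frequency disparity described above) tests whether two samples are matched on allele frequency before any association analysis is attempted:

```python
# Sketch: checking that two experimental groups are matched on allele
# frequency before running a genetic association analysis. The counts
# below are invented, loosely reflecting the ~10% vs ~55% A-allele
# frequency disparity for rs6265 across populations.
from scipy.stats import chi2_contingency

# Rows: group 1 vs group 2; columns: counts of G and A alleles.
allele_counts = [
    [180, 20],   # group 1: ~10% A allele (eg, a European-ancestry sample)
    [90, 110],   # group 2: ~55% A allele (eg, an Asian-ancestry sample)
]

chi2, p, dof, expected = chi2_contingency(allele_counts)
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
# A significant frequency difference warns that any genotype-phenotype
# association in the pooled sample could reflect population stratification
# rather than the SNP itself.
```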

Polymorphic variants can also be distinguished, not only by their frequency, but also by their location and functionality within a candidate gene. A linear string of nucleotide pairs along a chromosome consists of nucleotides that regulate the expression of genes (regulatory sites) as well as nucleotides that encode protein products (exons) and long stretches of intervening nucleotides that separate exons (intronic sequences). Thus, it becomes apparent that certain variations are more desirable for hypothesis testing than others. One of the most well-studied polymorphisms in the neurocognitive–genetic literature is the Val158Met polymorphism in the COMT gene. The change from valine to methionine has implications for the structure and activity of the mature protein, making this particular genetic variant highly useful in genetic association studies. Other genetic variants such as intronic polymorphisms, for example, may give rise to no changes in the biochemical function of a mature protein or its expression levels. Still other variants, such as those located within DNA sequences that regulate gene transcription, messenger RNA (mRNA) splicing, and/or chromosomal structure, are desirable for hypothesis testing although they do not lead to alterations in protein structure or activity. Thus, it is of interest to consider the presumed functionality of genetic variants within any candidate gene of interest. We recognize that for the majority of candidate genetic polymorphisms, however, current biochemical evidence to support presumed biochemical functionality may be scarce.

In spite of the caution exercised in selecting valid endophenotypes and appropriate candidate-gene variants, it is important to keep in mind that the genetic analysis of intermediate phenotypes is likely subject to many of the same limitations inherent to the alternative large-scale gene-mapping approaches. The main limitation seems to be the small contribution any single variant can make toward phenotypic variation. Whether the phenotype is a DSM-based questionnaire or the neural response of a particular brain region, both measures rely on extraordinarily complex biological systems and, therefore, caution must be taken when interpreting intermediate phenotypes as ‘more proximal’ to clinical assessments and therefore as yielding a higher signal-to-noise relationship to genotype. Although others have estimated that the effect size (d≈0.5) for a genetic variant and a brain-based phenotype is not dramatically greater than the effect size for the same genetic variant and psychometric assessments of personality traits (d≈0.2), recent meta-analytical critiques involving the serotonin-transporter-linked promoter region (5HTT-LPR) suggest that genetic associations between genotype and brain endophenotypes can survive meta-analysis while associations between genotype and clinical assessments may not (Munafo et al, 2005). We argue that a candidate gene is more likely to yield an effect on behavior and/or brain activity when (i) the literature implicating a role for the neurotransmitter in question is relatively mature, and (ii) the task probing this role is sufficiently basic and includes conditions designed to be maximally sensitive to the computation of interest. In this manner, one can control for numerous influences that would otherwise overwhelm those of a single gene, and directly compare performance or brain activity in conditions that differ primarily in terms of a single process.
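The practical force of this limitation is easy to quantify. The sketch below applies the standard normal-approximation sample-size formula to the two effect sizes mentioned above (two-tailed α=0.05, 80% power); it is illustrative arithmetic, not a substitute for a formal power analysis:

```python
# Sketch: per-group sample size needed to detect a standardized effect d
# with 80% power at two-tailed alpha = 0.05, using the standard
# normal-approximation formula n = 2 * ((z_alpha + z_beta) / d)^2.
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    z_alpha = norm.ppf(1 - alpha / 2)  # ~1.96
    z_beta = norm.ppf(power)           # ~0.84
    return 2 * ((z_alpha + z_beta) / d) ** 2

for d in (0.5, 0.2):
    print(f"d = {d}: ~{n_per_group(d):.0f} participants per group")
# d = 0.5: ~63 per group (plausible for a brain-based endophenotype)
# d = 0.2: ~392 per group (typical for questionnaire-based phenotypes)
```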

This point cannot be over-emphasized. For example, as reviewed in the context of learning below, the COMT Val158Met polymorphism is known to affect prefrontal dopamine levels and D1 receptor binding (Gogos et al, 1998; Matsumoto et al, 2003; Meyer-Lindenberg et al, 2005; Slifstein et al, 2008), and influences cognitive processing in various tasks that depend on prefrontal function (Egan et al, 2001; Goldberg et al, 2003; Blasi et al, 2005; Meyer-Lindenberg et al, 2005; Tan et al, 2007b; Frank et al, 2007a, 2009). However, in other studies COMT has no behavioral effect (eg, Ho et al, 2005), and a recent meta-analysis suggested that there is overall no effect of COMT (Barnett et al, 2008). It seems likely that variations in COMT, an enzyme that degrades DA, do not exert a ‘blanket’ effect across all cognitive tasks, but rather might affect a more basic computation (eg, increased signal-to-noise ratio in prefrontal attractor states; Durstewitz and Seamans, 2008) that may well improve performance in some measures but not others (Goldman et al, 2009). Indeed, another meta-analysis concluded that COMT genotype reliably affects prefrontal activation with large effect size (d=0.73), with an advantage for met carriers in executive cognition, but an advantage for val carriers in emotional processing (Mier et al, 2009). In other cases, COMT genotype does not affect cognitive accuracy but the COMT-related increase in neural activity during executive control is associated with speeded execution of that control (Frank et al, 2007).

A remaining caution arises from the inherent complexity of neural systems. In what follows, we describe only a handful of well-substantiated genetic variants that meet the desired criteria for functionality, frequency, and so on. As each of these might be expected to only subtly alter the neural makeup of an entire circuit, it remains inherently difficult to understand how these genetic factors interact with each other. Unconstrained exploratory analyses of genetic interactions will require prohibitively large populations, and even hypothesis-driven designs will require substantial scale-ups. For example, the effect of the well-studied COMT val/met polymorphism can itself be moderated by other dopaminergic polymorphisms such as DAT1 and DRD2 (see gene–gene interaction section below) as well as by other polymorphisms within the COMT gene itself (Meyer-Lindenberg et al, 2006; Diaz-Asper et al, 2008; Nackley et al, 2006).

To this end, we recognize the utility of biologically constrained computational models that attempt to bridge cellular, network, and behavioral levels. Such models provide a powerful framework from which one can examine functional roles of specific mechanisms that are affected by individual genes. In particular, models specify the particular computations that might be altered by genetic variation, such that hypotheses can be tested with specific tasks designed to include conditions that vary in their dependence on these computations.

DISSOCIATING CORTICOSTRIATAL GENETIC CONTRIBUTIONS TO LEARNING, MOTIVATION, AND COGNITION

As introduced earlier, much evidence implicates the dopaminergic system in reinforcement-based decision making, learning, and cognition. Across a range of species from rats to humans, dopamine cells burst fire in response to positive ‘prediction errors’ (events that are better than expected), whereas these same cells pause in response to negative prediction errors (events that are worse than expected) (Schultz et al, 1997; Bayer and Glimcher, 2005; Roesch et al, 2007; Joshua et al, 2006; Pan et al, 2008; Zaghloul et al, 2009). A key assumption of reinforcement learning models is that these bursts and dips act as a ‘teaching signal’ by modifying synaptic plasticity in target structures (Houk et al, 1995; Wickens et al, 2003; Frank, 2005). Indeed, synaptic plasticity is strongly modulated by dopamine, and particularly, phasic dopamine signals in the striatum (Reynolds et al, 2001). Optogenetic studies in rodents reveal that phasic, but not tonic, stimulation of dopaminergic cells induces behavioral conditioning (Tsai et al, 2009). Conversely, selective genetic disruption of phasic dopaminergic burst firing (while leaving tonic activity intact) produces behavioral learning deficits (Zweifel et al, 2009).
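The teaching-signal idea can be expressed in a few lines. The sketch below implements a textbook Rescorla-Wagner update in which the prediction error stands in for the phasic dopamine burst (positive) or dip (negative); the learning rate and outcome sequence are arbitrary illustrative choices:

```python
# Sketch: reward prediction error as a teaching signal (Rescorla-Wagner).
# A positive error (outcome better than expected) mimics a dopamine burst;
# a negative error (worse than expected) mimics a dopamine dip.
alpha = 0.1          # learning rate (illustrative)
V = 0.0              # learned value (reward prediction) for a cue

for reward in [1, 1, 0, 1, 0, 0, 1]:   # hypothetical outcome sequence
    delta = reward - V                 # prediction error: burst if > 0, dip if < 0
    V += alpha * delta                 # value is nudged toward the outcome
    print(f"reward={reward}, delta={delta:+.2f}, V={V:.2f}")
```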

A related issue is how these phasic signals modify learning in target structures. Much evidence implicates dopamine receptor signaling in the striatum, in which the major cell class is the medium spiny neuron (MSN). In particular, D1 receptor stimulation is necessary for long-term potentiation in response to phasic bursts (Kerr and Wickens, 2001; Reynolds et al, 2001). It was recently shown using spike-timing-dependent plasticity protocols that a theta burst of corticostriatal activity leads to synaptic potentiation (LTP) in MSNs of the striatonigral ‘direct’ pathway, which express D1 receptors (Shen et al, 2008). Conversely, the same protocol leads to depression in the striatopallidal ‘indirect’ pathway through D2 receptor stimulation. Furthermore, a lack of D2 receptor stimulation is associated with enhanced rather than depressed synaptic potentiation in striatopallidal neurons (Shen et al, 2008).

Together, these findings converge with computational models suggesting that dopamine bursts during positive prediction errors promote ‘Go learning’ in the basal ganglia through D1 receptor signaling, whereas dopamine dips during negative prediction errors promote ‘NoGo learning’ by disinhibiting D2 receptors and making striatopallidal cells more excitable to corticostriatal input patterns (see Figure 1; Frank, 2005). This network model can learn to select responses that are probabilistically most likely to be rewarded and to avoid those responses likely to yield negative outcomes. Recent feats in genetic engineering revealed striking support for these mechanisms, whereby reward and aversive/avoidance learning were impaired in animals with selective targeted disruption of striatonigral and striatopallidal cells, respectively (Hikida et al, 2010). Moreover, the model correctly predicted that human Parkinson's patients would show impairments in learning to make responses associated with positive outcomes but relative enhancements in NoGo learning from negative prediction errors—and that both these effects would be reversed when patients were taking dopamine medication (Frank et al, 2004, 2007b; Moustafa et al, 2008a; Cools et al, 2006; Bodi et al, 2009; Palminteri et al, 2009; Voon et al, 2010).
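Abstracting away from the full neural network model (Frank, 2005), the two-pathway logic can be sketched algorithmically as follows; this is our simplified rendering, with invented parameters and a fixed reward expectation rather than a learned one:

```python
import random

# Sketch of the two-pathway learning rule abstracted from the basal ganglia
# model: bursts (positive prediction errors) strengthen D1/'Go' weights,
# dips (negative prediction errors) strengthen D2/'NoGo' weights.
alpha = 0.2                              # learning rate (illustrative)
p_reward = {"A": 0.8, "B": 0.2}          # true reward probabilities
go = {"A": 0.0, "B": 0.0}
nogo = {"A": 0.0, "B": 0.0}
V = 0.5                                  # fixed reward expectation (simplification)

for _ in range(500):
    if random.random() < 0.1:            # occasional random exploration
        act = random.choice(["A", "B"])
    else:                                # choose the action with largest Go - NoGo
        act = max(p_reward, key=lambda a: go[a] - nogo[a])
    delta = (random.random() < p_reward[act]) - V    # burst (+) or dip (-)
    if delta > 0:
        go[act] += alpha * delta         # D1-mediated 'Go' potentiation
    else:
        nogo[act] -= alpha * delta       # dips disinhibit D2 cells: 'NoGo' LTP
print(go, nogo)  # action A accrues Go strength; action B accrues NoGo strength
```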

Figure 1

(a) Anatomy of corticostriatal circuitry simulated in computational models and used as a framework for studying the roles of these systems in reinforcement learning and decision making. Potential actions in cortex are communicated to the striatum (caudate and putamen). The probabilities of yielding positive vs negative outcomes for these actions are learned as a function of dopaminergic reinforcement signals conveyed to striatonigral (‘Go’) and striatopallidal (‘NoGo’) neural populations expressing D1 and D2 receptors, respectively. The likelihood of selecting a given action is a function of the relative difference in these populations. In parallel, the hyperdirect pathway from frontal cortex to the STN implements cognitive control by modulating the overall threshold for executing an action as a function of decision conflict. (b) Candidate genetic factors specific to striatonigral and striatopallidal function, posited to alter learning and decision-making function.


According to the model, dopamine depletion leads to relatively enhanced NoGo learning because of disinhibition of striatopallidal neurons, making them more excitable and plastic in response to negative prediction errors. This notion is supported by rodent studies showing that striatal dopamine depletion results in exaggerated excitability and synaptic potentiation in these indirect pathway cells (Day et al, 2008; Shen et al, 2008). Similarly, according to the model, increased phasic dopamine induced by levodopa medication (Harden and Grace, 1995; Wightman et al, 1988; Keller et al, 1988) alleviates Go learning deficits by increasing D1 receptor signaling in striatonigral neurons. However, medication tonically stimulates D2 receptors, even during periods when DA dips would normally be elicited, thereby making the striatum insensitive to negative prediction errors and impairing NoGo learning (Frank, 2005). Recent functional imaging data support this notion, showing that medication blunts the normal striatal response to negative prediction errors and also impairs learning in this condition (Voon et al, 2010). Conversely, D2 receptor blockade actually improves NoGo learning from negative prediction errors in patients with Tourette's syndrome (Palminteri et al, 2009). This latter result is also supported by both theoretical and empirical data: D2-blockade and resulting enhancement of striatopallidal excitability and plasticity promotes avoidance learning in both models and rats (Wiecki et al, 2009).

In addition to the learning process, the computational models specify mechanisms associated with dynamics of the decision-making process itself, which may also be affected by genetic variation. Two main processes determine the speed with which responses are executed. First, faster responses are made as a function of the relative difference in activity levels between striatonigral/Go and striatopallidal/NoGo cells coding for the executed action. These relative activation differences can arise either because of previous learning (such that responses with higher reinforcement probabilities are more swiftly executed), or because of greater induced DA release (such that DA bursts as a function of novelty or reward-predicting variables may directly influence Go vs NoGo activity, and hence response speed), or both (Moustafa et al, 2008a; Wiecki et al, 2009). It is to be noted that the same Go–NoGo mechanisms, operating in the ventral striatum (as opposed to the dorsal striatum's coding of specific instrumental actions), can support the selection of general Pavlovian approach ‘actions’ in pursuit of rewards. These motor and motivational effects are supported by empirical observations, such that increased response–reward probabilities, DA release, or pharmacological DA stimulation are associated with speeded responding and motivated behavior (eg, Satoh et al, 2003; Nakamura and Hikosaka, 2006; Everitt and Robbins, 2005; Robbins and Everitt, 2007; Salamone et al, 2009; Farrar et al, 2010). Similarly, the cost of performing an effortful action can be modulated through manipulation of the ventral-striatopallidal NoGo pathway (Mingote et al, 2008).

The second main process determining response speed is modulated by a third cortico-basal ganglia pathway, namely the so-called hyperdirect pathway from mediofrontal cortex (preSMA and anterior cingulate) to the subthalamic nucleus (STN), which sends excitatory projections to BG output nuclei (Figure 1; Nambu et al, 2000; Miller, 2007). According to the model, this pathway is particularly active under conditions associated with response conflict (Frank, 2006). As a result, the STN sends a transient but global signal to prevent any response from being executed prematurely. Consistent with this functionality, functional imaging studies report enhanced frontal cortex and STN activity as a function of response conflict and associated with response slowing (Aron et al, 2007; Fleming et al, 2010), and manipulation of STN function with deep brain stimulation leads to impulsive responding under decision conflict (Frank et al, 2007b).
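As a rough illustration of this threshold-modulation principle (and not of the published network model itself), consider a toy accumulator race in which the decision threshold is raised in proportion to momentary conflict; all parameter values here are invented:

```python
import random

# Toy illustration of the hyperdirect-pathway principle: when two response
# accumulators are similarly active (high conflict), the decision threshold
# is transiently raised, trading speed for accuracy.
def race(drift_a, drift_b, conflict_control=True, noise=0.3):
    a = b = 0.0
    t = 0
    while True:
        t += 1
        # Conflict is high when the two accumulators are close together.
        conflict = 1.0 - abs(a - b) / (abs(a) + abs(b) + 1e-6)
        threshold = 1.0 + (0.5 * conflict if conflict_control else 0.0)
        a += drift_a + random.gauss(0, noise)
        b += drift_b + random.gauss(0, noise)
        if max(a, b) >= threshold:
            return ("A" if a > b else "B"), t

# With nearly equal evidence (high conflict), the STN-like signal tends to
# slow responding; without it, choices are faster but more often premature.
print(race(0.10, 0.09, conflict_control=True))
print(race(0.10, 0.09, conflict_control=False))
```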

Striatal Genetics of Reinforcement and Decision Making

The above body of data in basic science, computational modeling, and human pharmacology provides a basis for examining candidate genes that may affect the learning process, the decision-making process, or both. Although there is little work examining human genetics of DA release (but see Keri et al, 2008), there are now several studies examining polymorphisms affecting downstream signaling in the striatum during reinforcement learning and decision-making tasks. For example, dopamine- and cAMP-regulated phosphoprotein (DARPP-32) is largely expressed in striatum, is phosphorylated following D1 receptor stimulation, and is necessary for D1-mediated corticostriatal plasticity (Ouimet et al, 1984; Stipanovich et al, 2008; Calabresi et al, 2000; Valjent et al, 2005). Recent studies show that in response to physiological rewards, DARPP-32 accumulates in the nucleus, an effect that is mediated by D1 (and apparently not D2) receptor stimulation and is needed for behavioral reward learning (Stipanovich et al, 2008). Moreover, both nuclear accumulation induced by D1 stimulation and behavioral reward learning were abolished in a knock-in mouse in which a particular DARPP-32 phosphorylation site (Ser97) was mutated. Biophysical models further show that DARPP-32 serves to integrate dopamine signals across time (Lindskog et al, 2006), and is therefore a key substrate in probabilistic reinforcement learning.

It is noteworthy that genetic polymorphisms modulating DARPP-32 mRNA expression and cognition in humans are associated with changes in activation in the entire striatum, and striatal connectivity with frontal cortex, with no direct effect on any other brain region (Figure 2a) (Meyer-Lindenberg et al, 2007). Given the above discussion on the role of DARPP-32 in corticostriatal plasticity, we analyzed the effects of DARPP-32 genotype on human reinforcement learning. Two studies reported, in independent samples and across different reinforcement learning tasks, that a DARPP-32 polymorphism is predictive of probabilistic Go learning (Frank et al, 2007a, 2009). These effects were observed behaviorally by comparing performance in different conditions that rely on Go vs NoGo learning (ie, choosing an action based on its high probability of yielding a positive outcome or avoiding an action based on its high probability of yielding a negative outcome). Mathematical reinforcement learning models were used to quantitatively fit learning performance to determine the best-fitting parameters that explain each individual's sequence of choices. These parameter estimates revealed that the above effects were mediated by DARPP-32 modulation of learning from positive prediction errors (Figure 3). Specifically, DARPP-32 genotype modulated the ability to discriminate between subtly different probabilistic reward values (Frank et al, 2007a).
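Such model fits typically rest on a Q-learning rule with separate learning rates for positive and negative prediction errors, combined with a softmax choice rule whose likelihood is maximized per participant. The sketch below shows the core likelihood computation; the parameterization is a generic example of this model class, not the exact published code:

```python
import math

# Sketch: likelihood of a choice sequence under a Q-learning model with
# separate learning rates for positive (alpha_gain) and negative
# (alpha_loss) prediction errors, as in asymmetric-learning model fits.
def log_likelihood(choices, rewards, alpha_gain, alpha_loss, beta):
    """choices: list of 0/1 actions; rewards: list of 0/1 outcomes."""
    q = [0.5, 0.5]
    ll = 0.0
    for c, r in zip(choices, rewards):
        # Softmax probability of the action actually taken.
        p_c = math.exp(beta * q[c]) / (math.exp(beta * q[0]) + math.exp(beta * q[1]))
        ll += math.log(p_c)
        delta = r - q[c]
        q[c] += (alpha_gain if delta > 0 else alpha_loss) * delta
    return ll

# 'Fitting' means searching (eg, by grid or optimizer) for the parameters
# that maximize this quantity per participant; a relatively larger
# alpha_gain then indexes Go learning, a larger alpha_loss NoGo learning.
```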

Figure 2

(a) DARPP-32 genotype modulates striatal activation across a range of cognitive tasks (from Meyer-Lindenberg et al, 2007). (b) COMT genotype modulates orbitofrontal cortex activation during reward receipt, with a gene–dose relationship (met/met>val/met>val/val) (from Dreher et al, 2009).


Figure 3

(a) Mathematical models were quantitatively fit to each individual's sequence of choices in a reinforcement-learning task. Plotted is the extent to which striatal DA genes, and dopaminergic medications in Parkinson's disease, affect learning as a function of positive relative to negative reward prediction errors. Enhanced DARPP-32 genetic function (T/T homozygotes) was associated with relatively greater learning from positive than negative outcomes as compared with C carriers. Conversely, enhanced striatal D2 receptor affinity in DRD2 C957T T/T carriers was associated with relatively greater learning from negative outcomes (Frank et al, 2009). Dopaminergic medication increased relative learning from positive outcomes in Parkinson's patients performing the same task (Moustafa et al, 2008a; here we fit the mathematical model from Frank et al, 2009 to choices from the 2008 study). (b) Model fits enable inference regarding the learned ‘Q’ values of stimulus–action combinations as a function of their true probability of reinforcement. DARPP-32 T/T homozygotes could better differentiate between subtly different positive probabilities (80, 70, and 60%), whereas C carriers showed less differentiation (Frank et al, 2007). Error bars reflect standard error.


In the same learning studies, we also examined the C957T polymorphism within the DRD2 gene, which has been reported to affect D2 receptor density in the striatum (Hirvonen et al, 2005), in which D2 receptors are by far most prevalent (Camps et al, 1989). In both studies, this D2 polymorphism was strongly predictive of learning from negative reward prediction errors, that is, to avoid those responses that most often led to negative outcomes (Frank et al, 2007a, 2009), consistent with the posited role of D2 receptors in the striatopallidal NoGo pathway. Moreover, such learning obeyed a gene-dose effect whereby T/T homozygotes showed the greatest learning from negative, but not positive outcomes, and C/C carriers performed most poorly in this condition. This DRD2 gene-dose effect has been replicated in a third independent sample in our lab (unpublished data). Others have since reported D2-related genetic influences on behavioral NoGo learning and modulation of striatal activity during negative outcomes (Klein et al, 2007; Jocham et al, 2009). These studies examined the Taq1A polymorphism, which has also been reported to be associated with striatal D2 receptor density (Pohjalainen et al, 1998). Although this SNP is 3′ downstream from DRD2, its effect on D2 function is now thought to be mediated through indirect linkage with other polymorphisms within DRD2 (including C957T) (Zhang et al, 2007). Indeed, a recent analysis found that while Taq1A is indeed associated with probabilistic NoGo learning, this effect vanished when controlling for C957T genotype (Frank and Hutchison, 2009). In contrast, two other functional polymorphisms within DRD2 accounted for additional variance in NoGo learning. Together, these multiple variations within DRD2 accounted for approximately 20% of the variance in a single measure of avoidance learning (Frank and Hutchison, 2009).

That these effects are related to striatal D2 function is supported by evidence cited above showing that learning from negative outcomes, and its sensitivity to D2 drug administration, is predicted by baseline striatal DA synthesis (Cools et al, 2009). Further, a recent neuroimaging study showed that the adverse effects of dopaminergic medication on learning from negative outcomes in Parkinson's disease are accompanied by blunted striatal response to negative prediction errors (Voon et al, 2010). Rodent genetic manipulations provide additional support for this mechanism: transient overexpression of striatal D2 receptors, or selective disruption of striatopallidal cells, leads to NoGo learning deficits (Bach et al, 2008; Hikida et al, 2010). Similarly, genetically induced elevations in striatal DA (resulting from dopamine transporter (DAT) knockout) produce deficits in avoidance learning (Costa et al, 2007), similar to the pattern observed with pharmacologically induced striatal DA elevations (Frank et al, 2004). These findings may also be related to the observation that drug-addicted rats with ventral striatal D2 receptor dysfunction fail to learn to inhibit responding to drug-related cues even when they are followed by punishment (Dalley et al, 2007; Belin et al, 2008).

The above human genetic findings have been replicated in multiple studies, samples, and tasks, all of which were designed to vary demands loading on the hypothesized probabilistic reinforcement mechanisms, and thereby confirm the validity of the approach. Given the role of basal ganglia and dopamine across a range of higher-level cognitive processes (eg, working memory updating and manipulation, reviewed earlier), it may be expected that striatal genes would also affect such processes. Indeed, DARPP-32 genotype modulates not only reinforcement learning but also higher-level cognitive processes (Meyer-Lindenberg et al, 2007). Other evidence comes from studies investigating the effects of the DAT1 polymorphism, which modulates gene expression and density of the DAT and consequently striatal dopamine availability (Mill et al, 2002; Sesack et al, 1998; VanNess et al, 2005), and is a candidate gene for ADHD surviving meta-analysis (Gizer et al, 2009). It is noteworthy that DAT1 genotype is predictive of brain and behavioral indices of cognitive flexibility (Garcia-Garcia et al, 2010) and, in particular, a recent study showed that DAT1 modulates striatal (caudate) activation as a function of working memory load in an updating task (Stollstorff et al, 2010). A role for striatal dopamine in cognitive flexibility and working memory updating is consistent with multiple behavioral pharmacological and imaging studies (Cools et al, 2007; Clatworthy et al, 2009; Frank and O’Reilly, 2006; Moustafa et al, 2008b). Thus, given the evidence that striatal activation predicts the extent to which cognitive training can be achieved with updating tasks (Dahlin et al, 2008a), future research should analyze the degree to which these training effects are modulated by variations within striatal dopaminergic genes such as DARPP-32, DAT1, and DRD2. Such an association would provide some evidence for the notion raised in the introduction, that genetic effects on executive function reflect, in part, modulation of the degree to which cognitive strategies are learned.

Prefrontal Genetic Contributions: COMT and DRD4

As mentioned in the introduction, the val/met polymorphism within the COMT gene has been extensively studied in cognitive tasks and as an intermediate phenotype for schizophrenia (Egan et al, 2001; Blasi et al, 2005; Meyer-Lindenberg et al, 2005; Tan et al, 2007b; Frank et al, 2007, 2009). COMT is an enzyme that breaks down dopamine, and affects dopamine levels in prefrontal cortex (in which it is the primary mechanism for DA clearance) but has little to no effect on striatal dopamine (in which DATs are abundant and more efficient) (Sesack et al, 1998; Gogos et al, 1998; Tunbridge et al, 2004; Matsumoto et al, 2003; Meyer-Lindenberg et al, 2005; Slifstein et al, 2008). In a relatively large-sample imaging genetics study, striatal dopaminergic polymorphisms predicted striatal reward-related activity, but COMT genotype did not (Forbes et al, 2009). Accordingly, COMT genotype does not influence learning in measures that require integrating the statistics of reinforcement across multiple trials (Frank et al, 2007a, 2009; Frank et al, 2007), a process thought to depend on striatal dopaminergic mechanisms. Rather, COMT influences other task measures requiring rapid changes in behavior on a trial-to-trial basis. Indeed, a gene-dose effect of COMT was found on the tendency for participants to slow down and shift their response after a single instance of negative feedback (‘lose-switch’) the next time the same stimulus was encountered (Frank et al, 2007a). Met/met participants showed the greatest level of such shifting (particularly during early phases of task acquisition), whereas val/val participants showed the least. Recent EEG studies revealed that lateral frontal activity during reward prediction errors predicts the speed of these subsequent-trial behavioral adaptations (but is not associated with probabilistic integration) (Cavanagh et al, 2010), supporting the notion that COMT effects on behavior are frontal.

Overall, these findings are consistent with the broader literature on COMT and its tendency to influence performance in shifting or updating tasks (Egan et al, 2001; Goldberg et al, 2003; Diaz-Asper et al, 2008). Superficially, however, they seem to conflict with prominent biophysical models of prefrontal dopamine function and associated empirical data (Durstewitz and Seamans, 2008; Seamans and Yang, 2004). These models suggest that because of higher PFC DA levels, met/met individuals show stable prefrontal activation states, which facilitate working memory maintenance but are at the same time less flexible or labile (Durstewitz and Seamans, 2008). Val/val participants would then show the opposite characteristic, having less robust maintenance but increased flexibility to update or shift.

This account seems to conflict with our data showing that met carriers are more likely to shift responses during acquisition of reinforcement contingencies. However, it is to be noted that in the reinforcement learning task, to detect the relevant conditions in which to shift, one must maintain the outcome associated with a particular stimulus choice in working memory for a number of intervening trials (during which other stimuli are presented and selected, yielding their own outcomes) before the relevant stimulus appears again. Thus, shifting because of negative outcomes in this context requires robust maintenance capabilities and the ability to prevent intervening information from disrupting stable attractor states. It is precisely this demand that various computational models posit to require sufficient PFC DA to increase the signal-to-noise ratio of stable activation states (Durstewitz et al, 2000; Cohen et al, 2002; Seamans and Yang, 2004; Rolls et al, 2008), and that is thought to be handled more robustly by met/met individuals (Durstewitz and Seamans, 2008), particularly when interference management is critical (Goldman et al, 2009). In reward learning tasks, models suggest that the orbitofrontal cortex maintains recent reinforcement experiences in an active working memory-like state, which can govern trial-to-trial behavioral adjustments through top–down projections (Frank and Claus, 2006; Deco and Rolls, 2004). Supporting this interpretation, orbitofrontal patients show impairments in early phases of acquisition in reinforcement tasks (Chase et al, 2008), as do patients with schizophrenia (Waltz et al, 2007). Moreover, recent studies reported a gene-dose effect of COMT on orbitofrontal activity during reward receipt (Figure 2b) (Dreher et al, 2009) and in lateral PFC during reward anticipation (Dreher et al, 2009; Yacubian et al, 2007). Thus, COMT, which affects prefrontal (and particularly orbitofrontal) dopamine levels (Slifstein et al, 2008), may affect the robustness with which recent reinforcement occurrences are maintained online to determine when it might be appropriate to adjust behavioral strategies. Nevertheless, future studies should discriminate between shifting per se, and the maintenance abilities required to detect conditions under which to shift.

More recently, we examined effects of COMT, DARPP-32, and DRD2 genotypes on a novel task requiring response time adjustments to maximize reward. Participants had to integrate both reward probability and reward magnitude across multiple trials to determine whether it was better (ie, whether expected value was higher) to respond fast or slow in a given block of trials (Frank et al, 2009). As multiple factors were found to govern response times, a mathematical model was used to estimate the degree to which each of several strategies was used by a given individual. It is noteworthy that the striatal polymorphisms again were associated with measures of Go and NoGo learning: the DARPP-32 genotype was predictive of the extent to which participants incrementally speeded their responses as a function of positive reward prediction errors, whereas the DRD2 genotype was predictive of incremental slowing because of negative prediction errors (Figure 3). These effects again converge with those observed as a function of pharmacological manipulations of striatal dopamine in Parkinson's patients performing the same task (Moustafa et al, 2008a), and with effects of striatal D1 and D2 receptor manipulation on approach and avoidance as a function of reinforcement outcomes in rodents and primates (Dalley et al, 2005; Nakamura and Hikosaka, 2006; Klein and Schmidt, 2003; Wiecki et al, 2009).

In contrast to the striatal DA genes, COMT did not influence incremental response time adjustments, but had a large effect on rapid trial-to-trial exploratory adjustments (Frank et al, 2009). Specifically, COMT influenced the extent to which participants strategically adjusted their responses in the direction of greater Bayesian uncertainty regarding the potential outcomes of those responses. In other words, if fast responses had been reinforced, met allele carriers were more likely to shift their responses toward slow, particularly when they would be most uncertain regarding the likelihood that slower responses might produce yet larger rewards. This effect was manifest in terms of a strong and reliable monotonic gene–dose relationship between met allele dose and the model parameter estimating the individual's degree of ‘uncertainty-driven exploration’ (Figure 4a). Thus, COMT influenced the degree to which individuals were motivated to pursue actions that could potentially improve their performance beyond the status quo.
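A simplified analogue of this mechanism can be sketched with Beta posteriors over the reward probabilities of the two response options, with an exploration parameter (playing the role of the genotype-dependent parameter above) scaling a bonus toward the more uncertain option. The published model tracks uncertainty about reward prediction errors rather than Beta posteriors, so treat this as an analogy rather than the fitted model:

```python
import math

# Sketch: uncertainty-driven exploration with Beta posteriors over the
# reward probability of 'fast' vs 'slow' responding. The exploration
# parameter eps scales a bonus toward the more uncertain option
# (a simplified analogue of the Frank et al, 2009 model).
def beta_std(a, b):
    """Standard deviation of a Beta(a, b) posterior."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

posteriors = {"fast": [1, 1], "slow": [1, 1]}   # Beta(1,1) priors
eps = 2.0                                        # exploration weight (genotype-dependent)

def preference(option):
    a, b = posteriors[option]
    mean = a / (a + b)                           # exploit: estimated reward probability
    return mean + eps * beta_std(a, b)           # explore: uncertainty bonus

def update(option, rewarded):
    posteriors[option][0 if rewarded else 1] += 1

# After a run of rewarded fast responses, 'slow' remains uncertain and its
# bonus can pull responding toward it, exactly when its payoff is least known.
for _ in range(10):
    update("fast", rewarded=True)
print({o: round(preference(o), 2) for o in posteriors})
```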

Figure 4

Frontal genetic effects on decision making. (a) Gene-dose effect of COMT on model-derived parameters estimating the degree to which individuals make trial-to-trial exploratory responses in proportion to the uncertainty regarding whether other responses might yield better outcomes than the status quo. Met/met participants showed the greatest degree of exploration in proportion to outcome uncertainty, and val/val participants showed the least amount (Frank et al, 2009). (b) DRD4 effects on conflict-related activity in anterior cingulate cortex (Fan et al, 2003), which were accompanied by conflict-induced response time slowing.


This dissociation between prefrontal and striatal dopaminergic components, found across multiple tasks and independent samples, highlights the advantage of a brain-based, hypothesis-driven approach to genetic research. Whereas the striatal reinforcement learning mechanisms have been specified by neural models of the basal ganglia, the prefrontal effects are interpreted in the context of other models and empirical data. Imaging studies reveal anterior prefrontal cortical activations when participants make exploratory choices in reinforcement tasks (Daw et al, 2006), which are sensitive to the value of alternative outcomes (Boorman et al, 2009). Again, we interpret the COMT effects in terms of the role of DA in computational models suggesting that dopamine enhances the signal-to-noise ratio and stabilizes prefrontal attractor states (Durstewitz et al, 2000; Seamans and Yang, 2004; Durstewitz and Seamans, 2008), and that the same functions may apply to working memory for reinforcement outcomes (Frank and Claus, 2006). Although the mechanism for monitoring uncertainty to govern exploration is somewhat less specified, other neural models have shown that uncertainty quantities are naturally extracted from probabilistic population codes (Zemel et al, 1998). It is possible that such population codes encode reward values in orbitofrontal cortex. Alternatively, it is possible that COMT modulates the extent to which individuals can over-ride their learned associations through top–down cognitive control and behavioral inhibition, as found in other inhibitory control tasks (Krämer et al, 2007). Thus, future imaging genetics studies are required to determine which aspect of these computations might be affected by COMT, leading to a greater influence of outcome uncertainty in driving exploration.

DRD4

The DRD4 gene is also associated with dopaminergic function in prefrontal cortex, in which the D4 receptor is primarily expressed (Oak et al, 2000). Interestingly, the DRD4 gene is predictive of error-related prefrontal activity and subsequent behavioral adjustments following these errors (Krämer et al, 2007). Furthermore, DRD4 also predicts the extent to which individuals show response slowing as a function of decision conflict (Fossella et al, 2002), an effect that is accompanied by changes in anterior cingulate activation (Fan et al, 2003) (Figure 4b). How can these effects be interpreted mechanistically or computationally? As noted earlier (see also Figure 1), neurocomputational models and other data suggest that conflict-induced slowing and inhibitory control are mediated by interactions between mediofrontal areas and the STN (Aron et al, 2007; Frank et al, 2007b). It is noteworthy that although the D4 receptor is very minimally expressed in striatum, it is more strongly represented in STN (Matsumoto et al, 1996) and controls presynaptic input to and postsynaptic output from that nucleus (Flores et al, 1999; Hernández et al, 2006). Thus, future studies should assess whether DRD4 modulates functional connectivity between frontal cortex and the STN, or the STN and BG output structures, under decision conflict.

GENE–GENE INTERACTIONS

Cellular

As the previous discussion suggests, individual differences in reinforcement learning can be probed with the joint use of suitable genetic variations and suitable cognitive tasks in a manner that permits the testing of models of physiological processes and neural circuitry. However, we acknowledge that individual genes, even when used as experimental tools in efforts to dissociate neurocognitive processes, function themselves in complex intracellular networks. A short review of these intracellular networks within both striatonigral and striatopallidal cells reveals (i) that the experimental utility of individual genetic polymorphisms can depend on the interaction with (and hence genotype of) other polymorphisms and (ii) that there are many other biochemical factors beyond DRD1, DRD2, COMT, and DRD4 whose expression and/or function is enriched in or limited to either the striatonigral or striatopallidal pathway. For example, striatonigral MSNs selectively express an abundance of D1 receptors, which can activate adenylyl cyclase by way of G-olf G-protein binding (Surmeier et al, 1997). Consequent increases in cytosolic cAMP levels then lead to the activation of protein kinase A (PKA), which can phosphorylate signaling proteins, such as DARPP-32, and enhance neuronal excitability by promoting the trafficking of AMPA and NMDA receptors to the cell surface (Hallett et al, 2006). In striatopallidal MSNs, which selectively express an abundance of D2 receptors, adenylyl cyclase is normally inhibited by D2 stimulation through G-α-i/o protein binding (Kheirbek et al, 2009). Only when DRD2 signaling is reduced does adenylyl cyclase become derepressed and able to support cAMP-dependent phospho-signaling of downstream targets such as DARPP-32 and the GluR1 subunit of the glutamate AMPA receptor (Stoof and Kebabian, 1981). This protein–protein interaction is, however, dependent on an additional layer of regulation: that of adenosine A2A receptors (A2ARs), which also interact with adenylyl cyclase through a stimulatory G-olf-dependent interaction (Corvol et al, 2001). Hence, when A2ARs are inactivated, it is not possible to activate adenylyl cyclase through the DRD2-blocker haloperidol, or to induce extended states of potentiation (Hakansson et al, 2006; Shen et al, 2008). Others have found that A2ARs are necessary for DRD2-induced changes in immediate-early gene expression (Chen et al, 2001). Thus, it is likely that genetic associations between inhibitory behavioral responses and genetic variants in the DRD2 gene would be dependent on genetic variability in these other members of the signaling cascade. That members of this signal-transduction pathway (DRD2, ADORA2A, and ADK) are differentially expressed in striatopallidal cells seems to support this hypothesis. Similarly, the expression of the striatonigral-enriched genes DRD1, TAC1, and PDYN is also coupled (Xu et al, 1994). Thus, future studies might benefit from a consideration of the genotypic status of such additional interactions. As a practical matter, however, this creates a new experimental challenge wherein large population sizes are needed to obtain sufficient numbers of allele-specific genotypic sub-groups. Nevertheless, newly identified striatonigral- and striatopallidal-specific genes can, as described in the Future Directions section below, be used as experimental tools in human populations to test new aspects of core neuro-anatomical models.
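The scale of this challenge is easy to see with some illustrative arithmetic: under Hardy-Weinberg proportions and independence between loci, crossing genotypes at just two loci fragments a cohort into small cells. The allele frequencies below are hypothetical:

```python
# Sketch: expected genotype-by-genotype cell sizes when crossing two loci,
# assuming Hardy-Weinberg proportions and independence between loci.
def hw(p):
    """Genotype proportions (AA, Aa, aa) for minor-allele frequency p."""
    return {"AA": (1 - p) ** 2, "Aa": 2 * p * (1 - p), "aa": p ** 2}

n = 300                     # total sample size
locus1 = hw(0.25)           # eg, a hypothetical DRD2 SNP
locus2 = hw(0.40)           # eg, a hypothetical ADORA2A SNP

for g1, p1 in locus1.items():
    for g2, p2 in locus2.items():
        print(f"{g1} x {g2}: ~{n * p1 * p2:.0f} participants")
# The rarest cell (minor-homozygote x minor-homozygote) holds only a few
# participants, which is why interaction designs demand much larger cohorts.
```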

Network

Furthermore, we acknowledge that the associations of individual genes with cognitive performance may depend, not only on gene–gene interactions within cells, but also on cell–cell interactions at the level of the synapse and even of circuit dynamics. Interactions between genes primarily controlling function in distinct brain regions may also be implicated in individual differences in behavioral conditions requiring functional connectivity between these regions. For example, there is evidence that the frontal cortex regulates dopaminergic input to the striatum, and that there is a negative relationship between prefrontal and striatal dopamine (Roberts et al, 1994; Bertolino et al, 2000; Jackson et al, 2001). In addition, prefrontal activity may directly influence striatal activity, and its response to dopaminergic signals, through top–down corticostriatal projections. Thus, it may be expected that prefrontal dopaminergic genes might interact with striatal dopaminergic genes to predict striatal response to reward. Recall the simple depiction that COMT directly affects prefrontal but not striatal measures, as supported by rodent data and large-sample human imaging studies (Gogos et al, 1998; Matsumoto et al, 2003; Tunbridge et al, 2004; Meyer-Lindenberg et al, 2007; Forbes et al, 2009), and that conversely, DAT1 influences striatal but not prefrontal DA (Sesack et al, 1998; Madras et al, 2005; Durston et al, 2008). However, two other imaging studies report that although COMT modulates prefrontal activity, it also influences the extent to which striatal activations are modulated by DAT1 during reward anticipation (Yacubian et al, 2007; Dreher et al, 2009).

These recent findings are consistent with a role for PFC in modulating striatal DA (Bilder et al, 2004), or striatal activation in response to DA. Nevertheless, there remains little evidence that COMT influences behavioral measures typically associated with striatal function. A notable exception is a recent study in which COMT influenced both reinforcement learning in a dynamic environment, and striatal response to reward prediction errors (Krugel et al, 2009). Interestingly, the investigators reported that COMT effects on behavior were mediated by functional connectivity between prefrontal cortex and striatum. These interactions might be expected in conditions in which information held in working memory provides a contextual signal that modulates how the striatum interprets reinforcement outcomes. Recent evidence indicates that when participants are given prior instruction regarding which choices are likely to be correct, their subsequent learning is subject to a confirmation bias that distorts the true reinforcement statistics observed in the environment. Interestingly, this confirmation bias is modulated by both COMT and striatal genotypes (Doll, Hutchison, and Frank, unpublished data), with the overall pattern of genetic data supporting the predictions of one of two competing models. Specifically, the data support a model positing that, because of top–down frontostriatal projections, reinforcement outcomes may be amplified or discounted in the striatum so that learned associations are consistent with previous ‘beliefs’ held in working memory (Doll et al, 2009). (The competing model suggests that the striatum learns the actual statistics of reinforcement but is over-ridden by PFC for control of behavior.) Although preliminary and requiring further investigation, this example illustrates how neurogenetic research may not only analyze individual differences in brain–behavior phenomena, but may itself provide an instrument with which one can test between alternative theories in cognitive neuroscience.
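The favored model can be stated compactly: prediction errors that confirm the instructed belief are amplified, and disconfirming errors are discounted, so that the learned value itself drifts toward the prior. The sketch below is our rendering of that rule, with invented gain parameters:

```python
import random

# Sketch of the confirmation-bias model favored above (after Doll et al,
# 2009): prediction errors consistent with an instructed 'belief' are
# amplified, inconsistent ones discounted, so the learned striatal value
# itself is distorted. Gains and probabilities are illustrative.
alpha = 0.2
amplify, discount = 1.5, 0.5      # hypothetical top-down gains on delta
instructed_good = True            # participant instructed: 'this option is good'
Q = 0.5
true_p = 0.3                      # actual reward probability (instruction is wrong)

for _ in range(2000):
    r = random.random() < true_p
    delta = r - Q
    consistent = (delta > 0) == instructed_good   # does the outcome match the belief?
    Q += alpha * delta * (amplify if consistent else discount)

print(f"learned Q = {Q:.2f} vs true reward probability {true_p}")
# Q settles near 0.56 rather than 0.30: the value itself, not merely the
# choice policy, is biased toward the prior belief held in working memory.
```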

LIMITATIONS OF MODEL

The model we have depicted, while useful for theory building and consistent with the available evidence, has limitations. First, for simplicity, the model does not consider other aspects of the neural circuitry, such as the influences of other neurotransmitters, including serotonin and acetylcholine, within the basal ganglia (for more extensive discussion of these limitations, see Cohen and Frank, 2009). Similarly, we have focused on individual differences in learning from dopaminergic reinforcement signals, assuming that the signals themselves are intact. Of course, variation in learning abilities may also involve modulation of neural functions upstream of the dopaminergic system, leading (for example) to a disrupted ability to convey prediction error signals, as seems to occur in schizophrenia (Waltz et al, 2009). Computational models of these upstream processes exist (Barto et al, 1995; Brown et al, 1999; Hazy et al, 2010), but much empirical work remains to identify the critical computations of the myriad brain structures involved in generating reward and punishment predictions before we can identify the most likely genetic candidates controlling individual differences in such functions. As for prefrontal function, the neurogenetic strategy has thus far not identified genetic factors associated with function in restricted PFC subregions (which would allow dissociation between, for example, processes depending on the frontal pole vs the dorsolateral PFC). Within corticostriatal circuitry, models must be developed to better understand how the many corticostriatal loops interact to support behavioral plans, and how individual differences in these interactions might lead to disrupted planning at different behavioral time horizons.

Similarly, the model discussed here leaves out contributions of other structures known to have critical roles in cognitive function, such as the hippocampus and parietal cortex. Indeed, although we focused here on COMT effects on PFC/working memory and exploration, there is increasingly appreciated evidence that COMT genotype also modulates episodic memory, through additional effects in the hippocampus and interactions with ventrolateral PFC (Alessandro et al, 2006; Bertolino et al, 2008; Ehrlich et al, 2009). It is possible that some of the effects discussed above rely to some extent on retrieval from episodic memory rather than on online working memory maintenance. Indeed, other evidence that met carriers show rapid behavioral adaptation comes from an episodic memory task in which participants had to decide whether items on a list were among those studied previously (Frank et al, 2007). The first decision was required to be speeded, but after further consideration participants were given the opportunity to reverse their decision with a second response if they decided that their first instinct was erroneous. Reversals of this type were accompanied by an event-related potential recorded with EEG, the error positivity. Notably, met/met individuals both showed larger error positivity signals and were faster to reverse their errors than val carriers (Frank et al, 2007). Although the EEG signals reflect cortical rather than hippocampal processes, the papers cited at the beginning of this paragraph suggest that COMT likely modulates functional coupling between PFC and hippocampus during episodic memory retrieval.

To this end, we briefly acknowledge the vital importance of animal models, past and future, in model building and hypothesis testing. In most cases, human behavioral–genetic hypotheses are based on a wealth of molecular genetic research conducted in animal models. Even when cognitive neurogenetic hypotheses are informed by pharmacological data (as reviewed earlier), a great deal of converging in vitro, mouse, rat, and primate data is close at hand. The conservation of both genomic information and, to some extent, gross neuroanatomy across mammalian species suggests that hypotheses about human cognitive function can, at least initially, be reasonably derived from animal model research. For example, mice lacking functional striatonigral- and striatopallidal-enriched genes show attenuated responses to reward and difficulties in reinforcement learning paradigms. This is the case for mice carrying targeted mutations of DRD1 or DRD2, which show abnormalities in learning and in neurophysiological correlates of learning (Centonze et al, 2003; Holmes and Sibley, 2004; Waddington et al, 2005). Mice with knock-out of the DARPP-32 gene show markedly reduced striatal responses to dopamine and fail to show the corticostriatal D1-dependent long-term potentiation and depression (Fienberg et al, 1998; Calabresi et al, 2000) needed to support a faithful representation of probabilistic reward values. Similarly, adenylyl cyclase (type 5) mutant mice were unable to learn reward contingencies in a cross-maze paradigm (Kheirbek et al, 2009), and mice with selective deletion of A2A receptors in the striatum were impaired in their ability to form habitual responses, such that their actions remained sensitive to sudden changes in reward contingencies (Yu et al, 2009). These data converge with those reviewed above in suggesting that A2A receptors, which are prevalent in the striatopallidal pathway and interact with D2 receptors, are critical for synaptic plasticity and learning.

FUTURE RESEARCH DIRECTIONS

The above review highlights the advantages of a brain-based, hypothesis-driven approach to genetic research. It should be clear that this field will improve as theories of the brain mechanisms underlying particular phenotypes are further developed in basic research, as novel candidate genes are discovered, and as genetic methods are combined with other methodologies, including imaging. Moreover, because both genetic and imaging methods are correlational, it is vital that theories implicating genetic contributions to behavioral and neural phenotypes be tested with manipulation studies, including pharmacology and transcranial magnetic stimulation. For example, if the association between a particular dopaminergic gene and behavior or neural activity is causal, then one should be able to alter this relationship by manipulating the dopaminergic system. Conversely, dopaminergic genes may predict the direction of pharmacological effects on brain and behavior. At this writing, studies examining this question are extremely rare, but they are required for making stronger causal inferences.

The above approach can also be used to better understand the neural and genetic basis of other disorders. For example, grooming disorders within the OCD spectrum have been genetically associated with SAPAP3, a gene whose expression is enriched in the striatum (Bienvenu et al, 2008). Mice that lack the striatal-enriched, dendritically expressed, postsynaptic density scaffolding protein DLGAP3 (SAPAP3) show excessive grooming, which, as in human OCD, can be ameliorated with SSRIs (Welch et al, 2007). Exactly how serotonin modulates the computations within corticostriatal circuits is poorly understood, but the application of convergent strategies will be critical for deciphering this complex issue.

Genes and Environment

As suggested earlier, individual differences in learning efficiency may underlie a number of behaviors over the course of development. Because reinforcement learning is driven by dopamine signals that report the outcomes of interactions with the environment, the process is necessarily environment-dependent. It therefore seems relevant to examine not only how neural circuits process contingencies in the environment, but also how the environment influences the biochemical and genetic pathways involved in the development and physiology of the component neural circuits. Genes involved in reinforcement learning may interact with the environment in ways that buffer against new mutations and/or environmental insults, ensuring the stability and robustness of the learning process. Longitudinal genetic studies suggest that genetic variation can influence individual differences in behavior in ways that are persistent or continuous over long periods of development, a process that has been termed ‘canalization’ (Lenroot et al, 2009). In the case of dopaminergic neuromodulation, its effect on adenylyl cyclase is known to influence the expression of immediate early genes, which may have short-term effects on neuronal responsivity and plasticity, and, hence, on the learning of new skills and/or contingencies.

Environmental conditions can also exert effects on the genome that are much longer in duration, potentially inducing shifts in gene expression that last for many years and, in some cases, influence maternal care and the maternal environment experienced by subsequent generations (Kosten and Kehoe, 2010). One of the leading mechanisms for such enduring gene–environment interactions is epigenetic change, a set of biochemical mechanisms that regulate gene expression through the opening and closing of chromatin structure. In some model systems, environmental conditions in the form of chemical toxins or extreme physiological stress appear sufficient to engage signal transduction cascades that induce methylation of cytosine residues and/or acetylation/de-acetylation of histone proteins in MSNs (Song et al, 2010; Veldic et al, 2007). Along chromosomal stretches in which DNA is highly methylated, the DNA is generally less accessible for transcription, and gene expression is therefore reduced. Similarly, when histone proteins are de-acetylated, the DNA becomes sterically much less accessible and gene expression is reduced. That the ‘opening’ and ‘closing’ of chromatin structures can so effectively regulate gene expression over long periods of time has given impetus to the study of the biochemical pathways involved in DNA methylation and histone acetylation. Previous findings revealed, for instance, that the DRD2-antagonist haloperidol can lead to DARPP-32-dependent phosphorylation of the acetylated form of histone H3 selectively in striatopallidal neurons (Bertran-Gonzalez et al, 2008). Other studies show that proper DNA methylation is necessary for normal dendritic arborization and neuronal excitability in the striatum. Finally, the functional relevance of such epigenetic pathways to striatal function is supported by data showing that local knock-out of the histone deacetylase HDAC1 in the striatum abolished amphetamine-induced desensitization of immediate-early gene expression (Renthal et al, 2008). Although their connection to these epigenetic pathways is not presently understood, the existence of the striatonigral-enriched gene EYA1, a tyrosine phosphatase that interacts with histone proteins, and of the striatopallidal-enriched histone gene HIST1H2BC, suggests that differential regulation of the direct vs indirect circuitry may be susceptible to epigenetic influences.

Identification of Novel Striatonigral and Striatopallidal Genes

A majority of the evidence for the model comes from dopaminergic genes or pharmacological manipulations (with the notable exception of studies manipulating the A2A receptor). To be more confident in the precise loci of the effects, it will be immensely helpful to assess other genes, which may provide clues to the molecular pathways that regulate the differentiation of the striatonigral vs striatopallidal circuitry. Early radiolabelling and radioimmunohistochemical work showed that a number of genes are differentially expressed in MSNs of these pathways. TAC1 and PDYN, for example, were first identified through early radiolabelling studies that sought the origins of substance P- and dynorphin-containing terminals in the substantia nigra, respectively (Brownstein, 1977; Hong et al, 1977). Similarly, PENK was identified as a source of enkephalin-producing terminals in the GPe. In recent years, genes differentially expressed in striatonigral vs striatopallidal circuitry, as well as in the patch vs matrix compartments of the dorsal striatum, have been identified through innovations in cell labelling, cell separation, and genome-wide expression analysis (Lobo, 2009; Heiman et al, 2008). Transgenic mice expressing enhanced green fluorescent protein (EGFP) in striatonigral MSNs and their axonal projections to the GPi and SN (the so-called Drd1a-EGFP and Chrm4-EGFP BAC mice), as well as Drd2-EGFP BAC mice in which striatopallidal cells and their axonal projections to the GPe are labelled, have been used to overcome the long-standing difficulty of isolating differentially expressed genes. Fluorescence-activated cell sorting has been used in conjunction with this transgenic cell-labelling approach to purify individual striatonigral vs striatopallidal cells, whose mRNA has then been subjected to genome-wide expression comparisons (Lobo, 2009). This approach has been useful both for confirming genes whose differential expression was previously known (such as DRD1, TAC1, PDYN, and CHRM4 in the striatonigral pathway, and DRD2 and PENK1 in the striatopallidal pathway), and for identifying novel genes that have roles in the development and physiology of striatonigral vs striatopallidal circuits.
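The analysis step at the end of this pipeline is conceptually simple: rank genes by their expression difference between the two sorted cell populations. A toy sketch with entirely hypothetical numbers (not data from Lobo (2009) or Heiman et al (2008)) illustrates the fold-change ranking:

```python
import numpy as np
import pandas as pd

# Hypothetical mean expression values from FACS-purified striatonigral
# (Drd1a-EGFP) vs striatopallidal (Drd2-EGFP) cells; arbitrary units.
expr = pd.DataFrame(
    {"striatonigral": [120.0, 95.0, 8.0, 10.0],
     "striatopallidal": [9.0, 11.0, 130.0, 88.0]},
    index=["DRD1", "TAC1", "DRD2", "PENK1"],
)

# Log2 fold change: positive = striatonigral-enriched,
# negative = striatopallidal-enriched.
expr["log2_fc"] = np.log2(expr["striatonigral"] / expr["striatopallidal"])
print(expr.sort_values("log2_fc", ascending=False))
```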

For example, Slc35d3 was identified as the gene whose expression was most enriched in striatonigral cells. This gene encodes a sugar-transporter protein with a role in the glycosylation of proteins expressed on the cell surface (Selva et al, 2001) and, in some cases, in development, by regulating the surface presentation of glycosylated proteins involved in cell fate decisions (Goto et al, 2001). Further clues to the function of Slc35d3 come from mice carrying inactivated copies of the gene, which show striatal and motor abnormalities (Lu et al, 2008). Another gene enriched in striatonigral cells is Znf521, a transcription activator/repressor involved in BMP signaling and in the development of B cells through its binding to EBF1. Interestingly, mice that lack functional copies of EBF1 show abnormalities in the development of the matrix compartment of the dorsal striatum, but not the patch compartment. The relationship between the striatonigral/pallidal and matrix/patch compartments of the striatum is not well understood at the molecular level, but some data suggest functional differentiation in ways that would be relevant to information processing along the direct and indirect pathways. For example, the two compartments project to different areas of the SN: MSNs in the matrix project to GABAergic targets in the SNr, whereas MSNs in the patch compartment synapse with dopaminergic targets in the SNc (Gerfen, 1984).

Within striatopallidal cells, a G-protein coupled receptor known as GPR6 was identified that showed colocalization with enkephalin protein and expression in striatopallidal but not striatonigral projections (Lobo et al, 2007). Targeted mutation of GPR6 produced mice that were insensitive to the GPR6 ligand sphingosine-1-phosphate and had lower striatal cAMP levels, an effect similar to that of dopamine binding to D2 receptors on striatopallidal cells (ie, reduced ‘NoGo’). Similar to DRD2 mutant animals, GPR6 mutant animals, despite showing normal locomotor function, were faster to acquire the bar-press response in a variety of reward-based instrumental conditioning assays, an effect that was not related to alterations in motivation or in the processing of reward values themselves (Lobo et al, 2007). These effects on behavior and cAMP levels are similar to those found in mice lacking the striatopallidal-enriched adenosine 2A receptor (ADORA2A) gene, which, as discussed previously, has a direct role in moderating adenylyl cyclase activity (Corvol et al, 2001). Finally, the developmental genes plexin domain-containing 1 (PLXDC1), LIM homeobox 8 (LHX8), and tropomyosin 2 (β) (TPM2) have not been directly tied to the regulation of cAMP levels or other aspects of dopaminergic signal transduction, but may instead point to developmental pathways in neural migration and connectivity.

Hypothesis Development for Novel Striatonigral- and Striatopallidal-Enriched Genes

In much the same way that the physiological properties of the D1 vs D2 signal transduction cascades have been related to the mathematical parameters of reinforcement learning, the genes described above may also modulate the dynamics of information flow through the basal ganglia. In striatonigral cells, for example, the binding of dopamine to DRD1 G-protein coupled (Golf) receptors activates adenylyl cyclase, increases PKA activity, and inactivates K+ channels, thereby facilitating the stable ‘up’ state and associated firing of activated MSNs. In contrast, when dopamine binds to DRD2 Gi/o-coupled receptors on striatopallidal cells, there is a net decrease in cAMP levels and an increase in protein kinase C activity, which facilitates K+ channel opening and stabilizes the ‘down’ state.
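These opponent cascades map naturally onto a reinforcement learning rule with separate Go (D1) and NoGo (D2) weights. The following reduced sketch is in the spirit of the basal ganglia models cited earlier, not a reimplementation of them; the learning rates and initial weights are placeholder values:

```python
import numpy as np

rng = np.random.default_rng(1)

def opponent_update(G, N, delta, alpha=0.1):
    """A dopamine burst (delta > 0) strengthens the Go/D1 weight and weakens
    the NoGo/D2 weight (D2 cells are inhibited by phasic dopamine); a dip
    (delta < 0) does the reverse, via D2 disinhibition."""
    G = max(G + alpha * delta, 0.0)
    N = max(N - alpha * delta, 0.0)
    return G, N

G, N, V = 0.5, 0.5, 0.0    # Go weight, NoGo weight, expected value
for _ in range(500):
    reward = float(rng.random() < 0.8)   # action rewarded 80% of the time
    delta = reward - V                   # reward prediction error
    V += 0.1 * delta                     # critic: update expected value
    G, N = opponent_update(G, N, delta)  # actor: update pathway weights

print(f"Go={G:.2f}, NoGo={N:.2f}, net propensity={G - N:.2f}")
```

In this reduced form, a gene that enhances D2-pathway efficacy would act like a larger effective learning rate on the NoGo weight, biasing learning toward avoidance; it is this kind of parameter-level hypothesis that the novel striatopallidal genes invite.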

Further exploration of the signaling cascades of novel genes differentially expressed in striatonigral vs striatopallidal cells may point to similar or intersecting intracellular connections with adenylyl cyclase, PKA and/or PKC, or ion channels that differentially stabilize ‘up’ vs ‘down’ states. One such example is found in the striatopallidal-enriched ADORA2A gene. Caffeine, an adenosine receptor antagonist, exerts reinforcing effects that are ADORA2A-dependent (Castañé et al, 2006) and are associated with downstream phosphorylation of DARPP-32 (Hsu et al, 2009). The ADORA2A-dependent promotion of the ‘down’ state is further supported by findings that gamma oscillation strength is opposed by A2A receptors (Pietersen et al, 2009). Many of the other novel genes described above are presently distinguished merely as being enriched in terminally differentiated cells; however, the wider literature offers clues to their roles in cell differentiation, migration, and connectivity, and hence in the information-processing functions of MSNs.

CONCLUSIONS

We have presented a neurocognitive strategy for understanding genetic contributions to individual differences in motivation, learning, and cognition. We hope it is clear that, in principle, dopaminergic genetic factors can lead to differences both in motivation (ie, the desire to seek rewarded outcomes) and in learning from those outcomes (via synaptic plasticity and working memory). Together, these factors can dynamically influence cognitive performance across multiple time scales. We believe that initial findings across a range of domains, labs, and species lend credence to the notion that much can be gained from a brain-based, hypothesis-driven approach to neurogenetics. Of course, much work remains to validate and refine the assumptions of both the particular models and the more general strategy. We also acknowledge the complementary potential of the genome-wide association approach, particularly in domains for which the theoretical literature is relatively immature, but also for identifying loci that may not be predicted by current theoretical models. Ultimately, the two strategies should be useful in tandem.