INTRODUCTION

There has been extensive interest in the effect of the Val158Met polymorphism of the catechol-O-methyltransferase (COMT) gene on working memory. The Val158Met polymorphism is a common functional genetic variant that impacts dopamine (DA) levels in the prefrontal cortex (PFC; Tunbridge et al, 2006), an area critical to working memory function (Cools and D'Esposito, 2011). Specifically, individuals who carry one or two copies of the Met allele are believed to have higher levels of DA in the PFC, and several studies have shown that this is associated with superior performance on measures of working memory (Egan et al, 2001; Goldberg et al, 2003; Mattay et al, 2003). However, other studies have failed to replicate this effect (Blanchard et al, 2011; Stefanis et al, 2004) and a 2008 meta-analysis found no significant relationship between COMT genotype and working memory performance in healthy adults, as assessed by the typical ‘N-Back’ measure of working memory (Barnett et al, 2008, 2011). A persistent problem in these candidate gene studies is the relatively small samples that have been used. Combining these studies in a meta-analysis also showed substantial between-study heterogeneity, limiting the conclusions that could be drawn from this approach (Barnett et al, 2008; Goldman et al, 2009). Here, we examined the relationship between COMT and working memory, in a cohort of over 2400 18-year-olds.

COMT inactivates DA through enzymatic degradation, and is a particularly important regulator of DA in the PFC (Tunbridge et al, 2006). Manipulations of COMT preferentially affect PFC DA levels without strongly affecting DA in other regions, or altering levels of noradrenaline (Gogos et al, 1998). The relative scarcity of DA transporters in PFC may give COMT this large role in DA regulation in the PFC (Tunbridge et al, 2006). The Met allele of the Val158Met polymorphism results in lower COMT activity, and correspondingly higher tonic DA levels in PFC (Chen et al, 2004). Goldman-Rakic et al (2000) proposed that the relationship between DA levels and PFC functioning follows an inverted ‘U,’ such that PFC functioning is optimal in a limited range of PFC DA, with poorer functioning above and below those levels, a hypothesis that is now generally accepted (Robbins and Arnsten, 2009; Seamans and Yang, 2004). Given this, COMT variation might be expected to relate to working memory ability in two ways. First, Met/Met individuals might be in the optimal range of DA functioning, and thus perform better than Val/Met heterozygotes, who in turn would perform better than Val/Val individuals (ie, a dose-response or linear relationship between the Met allele and working memory). This relationship was observed by Goldberg et al (2003) and Mattay et al (2003). Second, if Met/Met individuals are actually at higher than optimal levels of tonic DA, heterozygotes might perform better than either homozygous group (ie, a quadratic relationship between the Met allele and working memory). This relationship was observed by Gosso et al (2008).

In the current study we examine both of these possible relationships in a single large cohort of over 2400 18-year-olds in the Avon Longitudinal Study of Parents and Children (ALSPAC; Golding et al, 2001), who performed the N-Back task of working memory. These participants have been repeatedly assessed for cognitive development since age seven. Their performance on another working memory task, the counting span, was also examined in relation to COMT when participants were 10 years of age (Barnett et al, 2007). At this assessment, presence of the Met allele was associated with superior counting span, such that a higher number of Met alleles corresponded to better working memory performance. This association was especially evident in males. These findings are consistent with other reports in adults (Egan et al, 2001; Goldberg et al, 2003; Mattay et al, 2003), and therefore we hypothesized COMT would show a linear relationship with performance on the N-Back test, with the best performance in Met/Met carriers, followed by Val/Met, and Val/Val individuals. On the basis of the earlier findings, we also hypothesized that this association would be stronger in males than females.

MATERIALS AND METHODS

Sample

ALSPAC consists of a cohort recruited in 1991 to 1992. All pregnant women in what was formerly the Avon district of the UK were invited to take part, with data being collected on N=14 541 pregnancies with the mother, child, or partner completing at least some measures. When the oldest children were 7 years of age, additional families who had been eligible but had not enrolled were recruited, resulting in a total of 15 247 pregnancies with some data collected. Data were collected yearly via questionnaires, medical records, and interviews, and, beginning at age 7 all children were invited to clinic-based assessments that included physical, social, and cognitive measures. The primary measure for the current study was taken at age 18. Parents who enrolled their children into ALSPAC gave written informed consent at the time of enrollment, with parents and subsequently children reconsented at later assessments. Ethics approval for the current study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees.

Measures

N-Back

Working memory was assessed using a computerized version of the N-Back task. In this task, participants continuously monitored a series of numbers presented on a computer screen, and pressed ‘1’ if the number was the same as the number presented N numbers ago, or ‘2’ if it was not. Stimuli were numbers 0–9, presented in black on a white background with a random spatial jitter of 180 pixels on the y-axis and 200 pixels on the x-axis. Each target was presented for 500 ms, followed by a 3000 ms period in which to respond with a key press (‘1’ for a non target, ‘2’ for a target). The practice block consisted of 12 trials, with two targets. Each experimental block consisted of 48 trials, with eight targets, with a single block for each of the 2- and 3-Back conditions. We examined four metrics for both the 2- and 3-Back conditions: (i) hits, or the percentage of matching numbers correctly identified as matches; (ii) false alarms, or the percentage of non-matching numbers incorrectly identified as matches; (iii) the discriminability index, or d’, a signal-detection metric that takes into account both hits and false alarms to derive an overall estimate of signal detection ability (see McNicol, 1972 for calculation information); and (iv) median reaction times, as an indicator of processing efficiency.
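As an illustration of metric (iii), the following Python sketch computes d’ as the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The function name and the log-linear correction for extreme rates are assumptions made for this example only; the text does not state how hit or false-alarm rates of 0 or 1 were handled, and McNicol (1972) remains the reference for the calculation actually used.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate) from raw trial counts.

    Assumption (not from the source): a log-linear correction adds 0.5 to
    each count and 1 to each total, so that rates of exactly 0 or 1 do not
    produce infinite z-scores.
    """
    n_targets = hits + misses
    n_nontargets = false_alarms + correct_rejections
    hit_rate = (hits + 0.5) / (n_targets + 1)
    fa_rate = (false_alarms + 0.5) / (n_nontargets + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participant in one 48-trial block (8 targets, 40 non-targets).
example = d_prime(hits=6, misses=2, false_alarms=5, correct_rejections=35)
print(round(example, 2))
```

A participant who responds at chance (equal hit and false-alarm rates) obtains d’ = 0, and larger values indicate better discrimination of targets from non-targets.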

N-Back data were collected from participants (N=5081) who attended the clinic at 18 years of age (M=17 years 10 months, SD=5 months). N-Back data were available in either the 2- or 3-Back condition for n=3933 participants. We excluded 387 individuals from the 2-Back and 335 individuals from the 3-Back analyses due to non-responsiveness to the task (giving no answers on any item). Thus, a total of 3159 individuals had usable 2-Back data, and 3170 individuals had usable 3-Back data.

Genotyping

DNA was obtained from cord blood, blood samples taken at clinic days, or mouthwash samples, and extracted using standardized procedures (Jones et al, 2000). COMT genotyping was performed as described previously (Barnett et al, 2007). Of the individuals with usable 3-Back data, 2624 had genotype information available, while of those with usable 2-Back data, 2659 had genotype information available. A small number of individuals in the sample (n=34, 1%) were half of a sibling pair. Excluding one half of each sibling pair did not change the results. The majority of the sample was of European ancestry (n=2604, 93%), with the ethnicity/race of the remainder non-European (n=189, 6.8%). Excluding individuals of non-European ethnicity did not change the results. The three COMT genotypes were in Hardy–Weinberg equilibrium.
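A Hardy–Weinberg check of the kind mentioned above can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test. The function name is hypothetical and the genotype counts in the usage line are invented for illustration; they are not the study's counts, and the text does not state which test was actually used.

```python
import math


def hwe_chi_square(n_val_val, n_val_met, n_met_met):
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic locus.

    Returns (chi2, p_value). Assumes both alleles are present in the sample,
    so that no expected genotype count is zero.
    """
    n = n_val_val + n_val_met + n_met_met
    p = (2 * n_val_val + n_val_met) / (2 * n)  # Val allele frequency
    q = 1.0 - p                                # Met allele frequency
    observed = (n_val_val, n_val_met, n_met_met)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts for illustration only.
chi2, p_value = hwe_chi_square(n_val_val=650, n_val_met=1300, n_met_met=700)
print(f"chi2={chi2:.3f}, p={p_value:.3f}")
```

A non-significant p-value indicates that the observed genotype counts do not depart detectably from the proportions expected from the allele frequencies.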

Covariates

We controlled for standard demographic information collected via questionnaire, including sex, mother’s education level, and family home-ownership status (coded as subsidized rental, private rental, or home owners). In addition, because smoking status and recent smoking may affect cognitive functioning (Loughead et al, 2008; Zhang et al, 2010), we controlled for number of cigarettes smoked in lifetime (coded as none, <5, 5–19, 20–49, 50–99, or 100+), and recent smoking (coded as never smoker, has smoked but not in last 30 days, or has smoked in last 30 days). We also controlled for current alcohol use, represented as frequency of drinking alcohol (never, monthly or less, 2–4 × per month, or >2 × per week), and frequency of binge drinking, defined here as having six or more drinks on one occasion (never, once or twice in lifetime, less than monthly, monthly, or weekly). Lastly, we used Full-Scale IQ (FSIQ) estimates collected at the 8-year assessment using the WISC 3rd Edition to control for general cognitive functioning. Alternate items of the WISC-III were given for all tests except the coding subtest, for which all items were administered. Verbal, Performance, and FSIQ scores were calculated whenever four subscale scores were available. FSIQ is generally believed to be stable over time (r=0.91 in one study with an average 2.8-year interval between tests; Canivez and Watkins, 1998). Of those with FSIQ scores available, n=77 (3.1%) had FSIQs in the borderline range or below (<80). Excluding these individuals did not change the results. Table 1 shows the means/percentages for the demographics and other covariates, and their relationships with COMT genotype. None of the covariates were significantly related to COMT genotype.
We did not have clinician verified DSM-IV diagnoses, but we did have probability estimates for each individual having an anxiety, mood disorder, ADHD, ODD, or Conduct Disorder diagnosis, produced by the Development and Well-Being Assessment (DAWBA; Goodman et al, 2011). The DAWBA uses computer-administered self- and parent-questionnaires to automatically generate probability bands for given diagnoses. Setting a liberal cut-point of >50% probability, only 4.4% of the sample was identified as likely to have one or more of the above disorders. Excluding these individuals and the 14% of the sample who had missing information on the DAWBA did not change the results.

Table 1 Demographics and Other Covariates by Genotype

Analyses

In all analyses we used hierarchical multiple regressions (Cohen et al, 2003) with planned comparisons for COMT, which allowed us to test for both the predicted linear effect of an increased number of Met alleles (Met/Met>Val/Met>Val/Val), and a possible quadratic effect (Val/Met>both Met/Met and Val/Val). In each analysis, the main effect of sex, linear effect of COMT, and quadratic effect of COMT were entered in one block, and then interactions between sex and COMT were added in a second block. Because there were two N-Back levels available for analysis, our primary analyses took a regression approach to repeated-measures ANOVA (as described in Judd et al, 2009). We first examined the main effects of sex and COMT on N-Back performance by using the average of the 2-Back and 3-Back levels as the dependent variable in our regression. We then examined the interactions between N-Back level, sex, and COMT by using the difference between the 2-Back and 3-Back levels as the dependent variable in our regression. Combined, these two regressions yielded exactly the same results as a repeated-measures ANOVA with N-Back level as a within-subject variable, with the advantage that a regression approach makes it easier to then include and adjust for multiple covariates, as we did in the next step. We then examined the effect of adjusting these analyses for mother’s education, home-ownership, smoking status, recent smoking, drinking frequency, binge drinking frequency, and mean-centered FSIQ by adding these variables in the first block, before the effects of sex, COMT, and their interactions. Lastly, we conducted secondary analyses of COMT, sex, and their interactions at the 2-Back and 3-Back levels separately. This was done because some individuals were missing the data from one N-Back level only, reducing the number of individuals available for the repeated-measures analysis. We also examined the effect of including the covariates in these secondary 2-Back and 3-Back analyses.
The previous meta-analysis indicated an effect size of d=0.06 for COMT on N-Back performance, or an R2 of 0.009, or 0.9% of variance explained. In the current study with P=0.05 we had 95% power to detect an R2 of 0.007, or 0.7% of variance explained.
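The planned comparisons and the average/difference scores described in this section can be sketched as follows. The specific contrast weights, the ±1 coding of sex, and all function names are assumptions chosen for illustration; the text does not report the codes actually used.

```python
# Orthogonal contrast codes indexed by number of Met alleles
# (0 = Val/Val, 1 = Val/Met, 2 = Met/Met). These weights are an assumption.
LINEAR = {0: -1, 1: 0, 2: 1}      # Met/Met > Val/Met > Val/Val
QUADRATIC = {0: -1, 1: 2, 2: -1}  # Val/Met > both homozygous groups


def design_row(met_alleles, sex):
    """Build one row of predictors for the hierarchical regression.

    sex is assumed to be effect-coded as -1 or +1 (hypothetical coding).
    Block 1: sex, linear COMT, quadratic COMT.
    Block 2: sex x linear COMT, sex x quadratic COMT.
    """
    lin = LINEAR[met_alleles]
    quad = QUADRATIC[met_alleles]
    block1 = [sex, lin, quad]
    block2 = [sex * lin, sex * quad]
    return block1 + block2


def repeated_measures_outcomes(score_2back, score_3back):
    """Return the two dependent variables of the regression approach.

    The average carries the between-subject main effects; the difference
    carries the N-Back level effect and its interactions.
    """
    average = (score_2back + score_3back) / 2
    difference = score_2back - score_3back
    return average, difference


# Hypothetical Met/Met participant with made-up hit percentages.
print(design_row(met_alleles=2, sex=-1))
print(repeated_measures_outcomes(75.0, 62.5))
```

Because the linear and quadratic codes are orthogonal across the three genotypes, each contrast can be interpreted separately when entered in the same block.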

RESULTS

Hits

We first conducted the repeated-measures analysis of the effects of sex, COMT, and N-Back level on hits (n=2490). Means and standard deviations by sex and genotype are available in Supplementary Table 1. Hits were normally distributed at the 2- and 3-Back levels. As shown in Table 2, hits were significantly lower at the 3-Back level (M=57.03, SD=22.84) compared with the 2-Back level (M=73.45, SD=22.74), giving a significant effect of N-Back. Neither COMT nor sex was related to hits, and there were no interactions. Total variance in hits explained by COMT genotype was around 0.2% (R2=0.002). Including the covariates reduced the number of complete cases available for analysis (from n=2490 to n=1874), but did not change the primary results for COMT, sex, or their interactions. Of the covariates, only FSIQ uniquely predicted hits, with increased FSIQ predicting better hit percentage, B=0.27, SE=0.03, t(1848)=9.08, P<0.001. We also conducted secondary linear regressions examining the 2- and 3-Back levels individually (n=2659 and n=2624, respectively). Sex, COMT, and their interactions did not predict either 2- or 3-Back hits when these levels were considered individually. Inclusion of the covariates did not alter these results.

Table 2 Hierarchical Linear Regression Analysis of the Effect of Sex, COMT and N-Back Level on N-Back Hits (% correct). The Main Effect of Sex, Linear Effect of COMT and Quadratic Effect of COMT Were Entered in Block 1, and the Interactions Between Sex and COMT Were Added in Block 2. We Took a Repeated-measures Approach, First Predicting the Average of the 2-Back and 3-Back Levels (Yielding the Main Effects of Sex and COMT), Then Difference Between the 2-Back and 3-Back Levels (Yielding the Interactions Between N-Back, Sex and COMT)

False Alarms

We next conducted the repeated-measures analysis of the effects of sex, COMT, and N-Back level on false alarms (n=2490). Means and standard deviations by sex and genotype are available in Supplementary Table 2. False alarms were positively skewed, but transforming false alarms for normality did not alter the results. Thus, the results are reported in the original metric for ease of interpretation. As shown in Table 3, false alarms were higher at the 3-Back level (M=20.63, SD=17.01) compared with the 2-Back level (M=19.25, SD=22.26), for a significant effect of N-Back, but COMT did not predict false alarms. Men had fewer false alarms than women, but COMT and sex did not interact. Total variance in false alarms explained by COMT was around 0.6% (R2=0.006). Including the covariates reduced the number of cases available for analysis (n=1874), and did not change the results for COMT, but the effect of sex was no longer significant, B=0.32, SE=0.81, t(1848)=0.39, P=0.70. Two covariates uniquely predicted false alarms. Increased FSIQ predicted fewer false alarms, B=0.32, SE=0.03, t(1848)=11.53, P<0.001, and higher mother’s education predicted fewer false alarms, particularly having a mother who completed either A-level exams or a college degree, B=2.77, SE=0.98, t(1848)=2.83, P=0.005 and B=3.67, SE=1.32, t(1854)=3.40, P=0.001, respectively. We also conducted secondary linear regressions examining the 2- and 3-Back levels individually (n=2659 and n=2624, respectively). Sex, COMT, and their interactions did not predict 2-Back false alarms, even after including the covariates. Examining 3-Back false alarms, there was a significant interaction between sex and the quadratic effect of COMT, B=4.08, SE=1.87, t(2618)=2.18, P=0.03. Post-hoc t-tests determined that the only significant difference between genotypes was that Val/Val males performed significantly better than Val/Met males, t(871)=2.51, P=0.01, but not Met/Met males.
Performance of all genotypes was similar for females. This interaction did not conform either to our linear prediction (Met/Met better), or to a biologically plausible quadratic effect (Val/Met better) and was no longer significant once covariates were included, B=3.37, SE=2.04, t(1940)=1.66, P=0.10. Thus this seems likely to represent a spurious finding.

Table 3 Hierarchical Linear Regression Analysis of the Effect of Sex, COMT and N-Back Level on N-Back False Alarms. The Main Effect of Sex, Linear Effect of COMT and Quadratic Effect of COMT Were Entered in Block 1, and the Interactions Between Sex and COMT Were Added in Block 2. We Took a Repeated-measures Approach, First Predicting the Average of the 2-Back and 3-Back Levels (Yielding the Main Effects of Sex and COMT), Then Difference Between the 2-Back and 3-Back Levels (Yielding the Interactions Between N-Back, Sex, and COMT)

Discrimination

We next conducted the repeated-measures analysis of the effects of sex, COMT, and N-Back level on d’ (n=2490). Means and standard deviations by sex and genotype are available in Supplementary Table 3. d’ was normally distributed. As shown in Table 4, d’ was lower at the 3-Back (M=0.95, SD=0.93) than 2-Back (M=1.05, SD=1.17), for a significant effect of N-Back, but COMT did not predict d’. Performance was better in men than women, but COMT and sex did not interact. Total variance in d’ explained by COMT was around 0.6% (R2=0.006). Including the covariates reduced the number of cases available for analysis (n=1874), and did not change the primary results for sex, COMT, or their interactions. Of the covariates, increased FSIQ uniquely predicted better d’ performance, B=0.008, SE=0.001, t(1848)=6.49, P<0.001, and binge drinking predicted worse performance, particularly binge drinking at least monthly, B=−0.20, SE=0.07, t(1848)=−2.68, P=0.007. We also conducted secondary linear regressions examining the 2- and 3-Back levels individually (n=2659 and n=2624, respectively). Sex, COMT, and their interactions did not predict either 2- or 3-Back d’ when these levels were considered individually. Inclusion of the covariates did not alter these results.

Table 4 Hierarchical Linear Regression Analysis of the Effect of Sex, COMT and N-Back Level on N-Back Discriminability (d’). The Main Effect of Sex, Linear Effect of COMT and Quadratic Effect of COMT Were Entered in Block 1, and the Interactions Between Sex and COMT Were Added in Block 2. We Took a Repeated-measures Approach, First Predicting the Average of the 2-Back and 3-Back Levels (Yielding the Main Effects of Sex and COMT), Then Difference Between the 2-Back and 3-Back Levels (Yielding the Interactions Between N-Back, Sex and COMT)

Reaction Time (RT)

We next conducted the repeated-measures analysis of the effects of sex, COMT, and N-Back level on reaction time (n=2490). Means and standard deviations by sex and genotype are available in Supplementary Table 4. Median RT was right skewed, but transforming RT for normality did not alter the results. Thus, the results are reported in the original metric for ease of interpretation. As shown in Table 5, RT was slower at the 3-Back (M=715.42, SD=265.98) than 2-Back (M=685.21, SD=208.02), for a significant effect of N-Back, but COMT did not predict RT. There was an interaction between sex and N-Back level, such that RT slowed more from the 2-Back to the 3-Back condition in men than in women. COMT and sex did not interact. Total variance in RT explained by COMT was around 0.6% (R2=0.006). Including the covariates reduced the number of cases available for analysis (n=1874), and did not change the primary results for sex, COMT, or their interactions. Of the covariates, increased FSIQ uniquely predicted slightly slower reaction time, B=1.01, SE=0.34, t(1848)=2.95, P=0.003; any alcohol use (greater than never) also predicted slower reaction times. We also conducted secondary linear regressions examining the 2- and 3-Back levels individually (n=2659 and n=2624, respectively). Sex, COMT, and their interactions did not predict either 2- or 3-Back RT when these levels were considered individually. Inclusion of the covariates did not alter these results.

Table 5 Hierarchical Linear Regression Analysis of the Effect of Sex, COMT and N-Back Level on N-Back Reaction Time (RT). The Main Effect of Sex, Linear Effect of COMT and Quadratic Effect of COMT Were Entered in Block 1, and the Interactions Between Sex and COMT Were Added in Block 2. We Took a Repeated-measures Approach, First Predicting the Average of the 2-Back and 3-Back Levels (Yielding the Main Effects of Sex and COMT), Then Difference Between the 2-Back and 3-Back Levels (Yielding the Interactions Between N-Back, Sex and COMT)

DISCUSSION

Contrary to our hypotheses, the Val158Met polymorphism in COMT was not associated with hits, false positives, d’, or reaction time on the N-Back task in this large sample of 18-year-olds. There was also no moderation of the effects of COMT by sex on any of the measures. Our sample was much larger than previous studies, and thus had more power to detect a true effect. After controlling for the participants’ mother’s education, home-ownership, smoking status, recent smoking, and IQ, there were no differences in performance on any of the outcome measures for the three genotypic groups. The findings suggest that COMT genotype is not related to working memory performance using the N-Back task in young adults. This lack of association was somewhat surprising in light of the previous association reported in this same sample between COMT genotype and working memory performance on a different working memory task at 10 years of age, and positive reports with the N-Back in previous smaller samples of healthy normal adults.

Regarding the discrepancy between our current findings and the positive association in this same sample in childhood, there are several possible explanations. First, it may be that the relationship between COMT and working memory changes with age. Basal DA levels alter throughout development, which would place the effect of COMT against a different dopaminergic ‘background’ (Weickert et al, 2007), and COMT activity also changes during development, increasing up until adolescence (Tunbridge et al, 2007). One previous study has suggested differential effects of COMT on brain activation during the N-Back in children vs adults, but this study found a greater ‘advantage’ of COMT (measured in this case as more focused brain activity during the working memory task) in adults (Dumontheil et al, 2011), which would not be consistent with our lack of effect in adults. Second, COMT effects might not be behaviorally evident in adulthood because individuals learn compensatory strategies. In this case, differences between genotypes may only be detectable using techniques such as fMRI that measure neural differences in how individuals perform tasks, rather than in performance of the task itself. Indeed, the imaging literature suggests more consistent differences in brain activation between genotypes in adults, rather than behavioral differences (Caldu et al, 2007; Egan et al, 2001; Meyer-Lindenberg et al, 2005; Meyer-Lindenberg et al, 2006). A recent study examining several neurocognitive tasks (including working memory) and brain imaging concluded that there were differences in brain activation between COMT genotypes, perhaps indicating recruitment of different strategies, but no behavioral differences (Dennis et al, 2010). This interpretation would be consistent with our findings. Third, it may be that certain working memory tasks are more sensitive to COMT effects than others.
It has been suggested that PFC circuitry is particularly important to information manipulation rather than information storage (Bruder et al, 2005). If the counting span task used in the childhood evaluation is more ‘manipulation-heavy’ while the current N-Back task is more ‘storage-heavy’ that could explain the lack of a COMT effect in the adult assessment. Finally, it is always possible that the previous results represented Type I error.

Regarding the discrepancy between our results and previous findings in other samples of adults, some of the same possibilities apply. First, our sample is at the younger end of the age range considered ‘adult’ that has been studied in previous investigations. As noted above, both basal DA and COMT activity alter with age. In one study, COMT effects on delay discounting (a form of impulsivity) were more pronounced in adulthood (defined in that investigation as over the age of 22) than they were in late adolescence (defined as ages 18–21; Smith and Boettiger, 2012), supporting the possibility that this cohort is still too young for reliable COMT effects to emerge. However, in the Dumontheil et al (2011) study noted in the previous paragraph, which examined age-related changes in COMT influences on brain activity during working memory, an advantage for the Met/Met genotype emerged as early as 10 years of age. Thus, more systematic study of age-related effects of COMT on working memory performance appears warranted. Second, there are two common variants of the N-Back task in use: the version used here, which requires a yes/no match for each stimulus, and a version that requires the participant to indicate the exact number that was presented N numbers ago. It has been argued that the yes/no version utilized here is more heavily dependent on storage, while the continuous response version requires more information manipulation, which may be more sensitive to COMT variation (Goldman et al, 2009). However, there has been no systematic examination of the differences between these N-Back variants in terms of difficulty or sensitivity to COMT effects, and the more ‘manipulation heavy’ version has also produced negative results in a fairly large (n=291) sample (Blanchard et al, 2011). Third, it may be that COMT effects are only evident in combination with other genotypes, or against a psychiatric disease background.
Several studies have suggested the importance of a COMT haplotype, or of interactions with other genes that collectively determine dopaminergic tone (Meyer-Lindenberg et al, 2006; Wishart et al, 2011). Indeed, although there was a main effect of COMT in the childhood assessment, a COMT-related haplotype also emerged as a significant predictor (Barnett et al, 2009). COMT effects may be more evident in the presence of a psychiatric disease background, as COMT effects on working memory may be more consistent in schizophrenic samples than in healthy normal adults (Wirgenes et al, 2010; although cf. Barnett et al, 2011). Finally, there is the possibility that previous positive results with COMT and the N-Back in adults represented Type I error.

These findings should be considered in the context of the following limitations. First, the N-Back is often administered using a ‘0-Back’ attentional task, in which participants simply respond to each number as it appears. Due to time constraints we were not able to administer this type of task, and thus we are limited in our ability to distinguish poor attentional performance from poor working memory performance. However, analysis of the 2-Back and 3-Back conditions in isolation is not unusual in this literature (eg, Mattay et al, 2003; Bruder et al, 2005; Barnett et al, 2008; Stefanis et al, 2004). Second, because the N-Back has only been administered at one time point in this study thus far, we are not able to address questions raised about the relationship of COMT to working memory across development. Third, although we attempted to control for a number of covariates, there are other possible confounding variables we were not able to examine in this sample, including educational attainment, and clinician-verified diagnoses of psychiatric disorder.

In conclusion, the current results constitute the largest assessment of COMT and working memory performance in healthy normal adults reported to date, and suggest no important impact of COMT genotype on behavioral working memory performance in healthy young adults. In this study we were powered to detect anything over 0.7% variance in the phenotype with 95% power, yet we saw no significant differences. This is in line with both the previous meta-analysis of COMT and N-Back performance (Barnett et al, 2008, 2011), and with other recent large-scale studies of COMT and working memory (Blanchard et al, 2011; Dennis et al, 2010). This suggests that COMT has no impact on working memory performance as measured by the N-Back, or at least not one that is measurable without prohibitively large sample sizes or perhaps under conditions of additional challenge to the DA system, such as nicotine withdrawal or compromised DA functioning in schizophrenia. It is also possible that COMT has an effect on working memory, but that the N-Back test as typically given is not an adequate measure for assessing behavioral effects of COMT. It may be that other assessment methods are needed to detect COMT effects on cognition in healthy adults.