Introduction

Personality and temperament traits are stable representations of emotional, motor and attentional reactivity to stimulation, as manifested by an organized pattern of behavioral responses across a range of contexts.1 The assessment of personality and temperament measures in human populations is therefore a major component of efforts to correlate higher order behaviors with underlying biology. Two common models for assessing personality include the five factor model of personality (FFM)2 and the temperament and character inventory (TCI).3, 4

The big five factors are openness to experience, conscientiousness, extraversion, agreeableness and neuroticism. Openness to experience is reflected in a strong intellectual curiosity and a preference for novelty and variety. Individuals scoring high on conscientiousness are characterized as being disciplined, organized and achievement-oriented. The extraversion dimension characterizes the tendency to be active, seek out stimulation and the company of others. Agreeableness evaluates the tendency to be helpful, cooperative and sympathetic towards others. Lastly, emotional stability, impulse control and anxiety are all components of the Neuroticism dimension.

The four temperament dimensions of the TCI are harm avoidance (HA), novelty seeking (NS), reward dependence (RD) and persistence (P). HA is a tendency to respond intensely to signals of aversive stimuli, thereby inhibiting behavior. NS is a tendency to respond with intense excitement to novel stimuli, or to cues for potential rewards or potential relief of punishment, thereby activating behavior. RD is a tendency to respond intensely to signals of reward, especially social rewards, thereby maintaining and continuing particular behaviors. P is a tendency to persevere in behaviors that have been associated with reward or relief from punishment.

Although there is some evidence in support of convergence between the TCI and FFM, they differ in how the underlying models were created, with the FFM based on a lexical analysis of trait adjectives and the TCI based on a theoretical model that sought to account for individual differences in personality by integrating neurobiological systems, learning and social influences (for a review, see Stallings et al.5). In a sample of 130 individuals who had been administered both the TCI and the NEO-PI-R (representing the FFM2), De Fruyt6 used a multiple regression approach to show that 23–51% of the variance in the TCI scales was explained by the NEO-PI-R scales, and 29–55% of the variance in the NEO-PI-R scales was explained by the TCI scales, indicating that a substantial portion of the variance in each instrument is unaccounted for by the other.

Personality measures from these models show correlations to problem behaviors and psychiatric diagnoses,7, 8, 9, 10, 11, 12 and may serve as useful endophenotypes for the study of genetic components of behavior. An endophenotype is a quantitative, heritable trait, thought to more directly reflect the influence of genetic variation.13 Personality dimensions assessed in both the FFM and TCI demonstrate broad-sense heritability in excess of 30%,14 and have been the focus of several genome-wide association studies (GWAS).

The largest GWAS of personality measures has been conducted using the NEO-PI-R. de Moor et al.15 analyzed the FFM dimensions for association to 2.4M single-nucleotide polymorphisms (SNPs) in a meta-analysis of 17 375 individuals, and while two dimensions (openness to experience and conscientiousness) were associated at genome-wide significance levels to several SNPs, the associations were not replicated in an independent sample.

Although Cloninger proposed that HA, NS and RD were influenced by the serotonergic system, the dopamine system and the noradrenaline system, respectively,16 genetic evidence to support this assertion has been mixed, and a recent large meta-analysis of the 5-HTTLPR polymorphism and HA showed no association.17 Only a single GWAS of the TCI has been reported to date. Verweij et al.18 tested all four TCI temperament scales for association with 1.2M SNPs in a sample of 5117 Australians, employing single-marker analyses, gene-based tests and pathway analyses, and did not identify any genome-wide significant genetic variants.

Although both de Moor and Verweij failed to find genome-wide association to scales related to personality, there were key differences between these studies. They used different instruments to assess personality, and had very different sample sizes. One cannot rule out that the null results of de Moor et al.15 were due in part to the instrument used. While both the TCI and FFM dimensions are clearly heritable, at a similar magnitude, the locus-specific heritabilities of dimensions of both instruments are unknown and may differ. That is, if the proportion of variance in the TCI that is not accounted for by the FFM has higher locus-specific heritability than the FFM dimensions themselves, it is possible that the TCI will have greater success in genetic mapping. Although the GWAS of Verweij et al.18 used the TCI, the sample was not powered to detect genetic variants of small effect size (<1% of the variance explained). The aim of the current study was to employ meta-analytic techniques to evaluate the possibility of small genetic effects on the TCI. By combining four cohorts totaling over 11 000 persons, this sample has 80% power to identify association with an SNP responsible for as little as 0.4% of the variance in the temperament scales, at an alpha level of 1.2 × 10⁻⁸ (a genome-wide significance level of 5 × 10⁻⁸ corrected for testing four temperament traits). Additionally, two features of the study design should minimize the degree of phenotypic and genotypic heterogeneity across the cohorts. First, temperament in all four cohorts was analyzed using identical items. Second, three of the four cohorts were derived from Finland, a relatively genetically homogeneous population.
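The power figure quoted above can be checked approximately with a short calculation. The sketch below is not the authors' power analysis; it assumes a quantitative trait, unrelated individuals, an additive 1-df test and a round sample size of 11 000. The function name `gwas_power` and its arguments are illustrative only.

```python
from statistics import NormalDist


def gwas_power(n, var_explained, alpha):
    """Approximate power of a 1-df association test.

    Assumes n unrelated individuals and a SNP explaining `var_explained`
    of the phenotypic variance (illustrative model, not the paper's code).
    """
    std_normal = NormalDist()
    # Non-centrality parameter of the 1-df chi-square test statistic.
    ncp = n * var_explained / (1.0 - var_explained)
    delta = ncp ** 0.5
    # Two-sided critical value on the Z scale (lower-tail quantile is negative).
    z_crit = -std_normal.inv_cdf(alpha / 2.0)
    # P(|Z + delta| > z_crit) for Z ~ N(0, 1).
    return std_normal.cdf(delta - z_crit) + std_normal.cdf(-delta - z_crit)


alpha = 5e-8 / 4  # genome-wide threshold divided by four temperament scales
power = gwas_power(n=11_000, var_explained=0.004, alpha=alpha)
print(f"alpha = {alpha:.3g}, power = {power:.2f}")
```

Under these assumptions the computed power lies somewhat above 80%, in line with the statement in the text; a smaller effective sample size or related individuals would lower it.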

Materials and methods

Sample descriptions

The Northern Finland Birth Cohort (NFBC) is a population-based birth cohort comprised of 12 058 individuals born in the northernmost two Finnish provinces in 1966.19 In 1997, a temperament questionnaire was given to 5999 individuals who participated at the age of 31 in a follow-up assessment. Subjects were asked to complete the questionnaire and return it by mail; 5105 individuals returned the questionnaire,20 of whom 4508 were genotyped and passed quality control (QC, 55% female).

The cardiovascular risk in young Finns (YFS) study is a stratified random sample of children and adolescents aged 3–18 years from five university cities and surrounding areas with medical schools.21, 22 Subjects were born in years 1962, 1965, 1968, 1971, 1974 and 1977 and followed up every 3–5 years, beginning in 1980. Temperament data used in these analyses were collected in 2001; 2105 persons had valid phenotype data, and of these 1383 were genotyped and passed QC (54% female). The mean age of participants was 32.5 (±5.1) years.

The Helsinki Birth Cohort Study (HBCS) is a birth cohort sample of individuals born at Helsinki University Central Hospital in 1934–44.23, 24, 25 Temperament data used in these analyses were collected in 2004; 1671 persons had valid phenotype data, and of these 1425 were genotyped and passed QC (60% female). The mean age of participants was 63.4 (±2.9) years.

The Australian twin registry (QIMR) was initiated in 1978. Temperament questionnaires were sent to two cohorts of Australian twins and their families (parents, children, spouses and siblings), the first in 1988 and the second in 1990. A total of 20 464 individuals had valid phenotypic data, and of these 5117 were genotyped and passed QC (1727 males and 3390 females, from 2567 independent families). The mean age of the participants was 36.2 (±12.1) years. The effective sample size (that is, correcting for non-independence of family members) was calculated to be 4312. Verweij et al.18 used this sample in their GWAS of TCI scales.19

Temperament assessment

Temperament in all samples was assessed using Cloninger's TCI.4 The QIMR sample used a short version (54 dichotomous items26) of the Tridimensional Personality Questionnaire (TPQ) subset of the TCI. Although the TPQ originally measured three dimensions, revisions showed that five items contributing to RD should be analyzed as a separate P scale, and that one of the RD items should be assigned to NS. Therefore, the final TPQ measure as obtained in the QIMR sample included 18 HA, 19 NS, 12 RD and 5 P items. For each scale, missing items were replaced with the mean item score. If individuals had >25% of the scales’ items missing, their scale score was treated as missing. Scale scores were transformed by taking the arcsine of the square root, corrected for the linear combination of age, age-squared, sex, a sex by age interaction and a sex by age-squared interaction, and standardized separately for each sex to a mean of 0 and a s.d. of 1.

The NFBC used the TPQ subset of the TCI version 9, with 107 binary items. The YFS used the full TCI version 9 with 240 Likert items. HBCS used the TPQ version 4 with 98 binary items. NFBC, YFS and HBCS questionnaires were examined and the subset of questions identical to those administered to the QIMR sample were identified and used in all analyses. Negatively keyed questionnaire items were reverse scored as necessary. Persons missing >25% of data for a scale were set to missing for that scale. Persons missing <25% of data had any missing values imputed by the mean of other persons’ responses (in the same study) to that item. The Likert-like scale used in the YFS was converted to a 0–1 measure by mapping 1=0, 2=0.25, 3=0.5, 4=0.75, and 5=1.0. A sum score across all items in a scale was taken as the final measure. The HBCS sum scores were regressed on age and sex, and residuals taken as the phenotype. The NFBC sample was of a uniform age, and the sum score was regressed only on sex, and residuals taken as the final measure. Although the YFS sample varied in age, age was not significantly related to the sum score for any scale; therefore, the score was regressed only on sex, and residuals taken as the final measure. Data transformations were not employed for NFBC/YFS analyses, and for HBCS analysis a natural logarithmic (ln) transform was applied to the HA data.
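The scoring steps described above (Likert conversion, the 25% missingness rule, item-mean imputation and summing) can be sketched as follows. This is a minimal illustration and not the cohorts' actual scoring code: the names `LIKERT_TO_UNIT` and `score_scale` are hypothetical, the toy responses are invented, item means are computed over everyone who answered the item, and a person missing exactly 25% of items is kept, which the text does not specify.

```python
# Mapping stated in the text for the YFS Likert items.
LIKERT_TO_UNIT = {1: 0.0, 2: 0.25, 3: 0.5, 4: 0.75, 5: 1.0}


def score_scale(responses, max_missing=0.25):
    """Sum-score one scale.

    responses: one list per person of item values in [0, 1]; None = missing.
    Returns one sum score per person, or None if too many items are missing.
    Assumes every item was answered by at least one person.
    """
    n_items = len(responses[0])
    # Mean of each item over the persons (in this study) who answered it.
    item_means = []
    for j in range(n_items):
        observed = [row[j] for row in responses if row[j] is not None]
        item_means.append(sum(observed) / len(observed))

    scores = []
    for row in responses:
        n_missing = sum(value is None for value in row)
        if n_missing / n_items > max_missing:
            scores.append(None)  # scale treated as missing for this person
        else:
            scores.append(
                sum(item_means[j] if v is None else v for j, v in enumerate(row))
            )
    return scores


# Invented Likert responses for three persons on a four-item scale.
likert = [[1, 5, 3, None], [2, 4, None, None], [5, 5, 5, 1]]
unit = [[None if v is None else LIKERT_TO_UNIT[v] for v in row] for row in likert]
scores = score_scale(unit)
print(scores)
```

The residualization on sex (and age for HBCS) described in the text would follow as an ordinary regression on these sum scores and is omitted here.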

The means of the raw sum scores are very similar for three of the four cohorts (Table 1). HBCS, with a mean age about 30 years older than the other three cohorts (63 years vs mid-30s), has lower average NS and P than the other three cohorts.

Table 1 Raw sum scores for each scale, by sex and cohort

Genotyping and imputation

Individuals were genotyped on the following platforms, with the respective genotyping centers indicated in parentheses. NFBC: Illumina 370duo Chip (Broad Institute, Cambridge, MA, USA), YFS: Illumina 670K Custom BeadChip, HBCS: Illumina 610K Quad Chip (Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK), QIMR samples: Illumina 317K (Finnish Genome Centre, Helsinki, Finland); Illumina HumanCNV370-Quadv3 (Center for Inherited Disease Research, Baltimore, MD, USA); Illumina Human610-Quad (DeCode, Reykjavik, Iceland) and Affymetrix 6.0 (TGen, Phoenix, AZ, USA). Rather than combining genotype data from different platforms in a joint analysis of all four cohorts, we used meta-analytic techniques to combine results from association analyses performed separately by cohort (see below).

Sample and SNP-level QC in all Finnish cohorts proceeded using the same protocols. Individuals were excluded if they were missing >5% of data, if there was a discrepancy between reported sex and sex determined from the X chromosome, if they were sibs or half-sibs of other subjects or if they withdrew consent. YFS and NFBC subjects were excluded if they had low IQ (<70). Fewer than 5% of subjects were excluded during QC. Genotyped SNPs were excluded if they had a call rate <95%, a P-value from an exact test of Hardy–Weinberg Equilibrium (HWE) <10⁻⁴ or a minor allele frequency <1%. Imputation to HapMap2 (HM2) was done at the Wellcome Trust Sanger Institute, separately by cohort, using all samples that passed QC and all genotyped SNPs that passed QC in an individual cohort. Imputation was done using Markov Chain Haplotyper (MaCH),27 and all data were imputed to the forward/positive strand. The following numbers of SNPs were successfully imputed with r²>0.30: NFBC: 2 454 909, YFS: 2 489 350, HBCS: 2 492 667.

Initial QC in the QIMR sample was applied separately to data from different genotyping platforms and different projects. Data were checked for ancestry outliers, Mendelian errors, HWE failure (excluded if P<10⁻⁶) and minor allele frequency. After separate QC checks, Illumina and Affymetrix data were imputed separately with MaCH using the data from the European HM2. SNPs with an imputation quality score (r²) >0.3 were retained, resulting in 2 380 486 Illumina and 2 369 130 Affymetrix SNPs. In addition, using individuals who were imputed on both the Illumina and the Affymetrix platforms, SNPs were retained only if they had high concordance rates for the most probable genotype and a minor allele frequency >0.01. In total, 1 252 387 SNPs were available for association analyses. More details on QC procedures in the QIMR sample can be found in Verweij et al.18

Cohort level analysis

The Finnish HM2 imputed data were analyzed separately by cohort, following identical protocols. A separate analysis was performed for each scale. Data for HA and RD were also analyzed separately by sex, as previous work in twins indicated sex differences in the source of genetic variance for these scales.28 An additive model was assumed for SNP genotype/dose, and principal components (from a PCA of the genotype IBS matrix between persons) that were significantly related to the phenotype were included as covariates (as per the method of Price et al.29) to guard against possible stratification. The first two PCs were always included as covariates, as previous work showed they correlated strongly with geographic birthplace and ancestry.30 The HM2 imputed dosage data were analyzed using ProbABEL.31

For the QIMR sample, the most probable imputed genotype at each SNP was tested for association with the four TPQ scales using a family-based association test,32 which takes family relationships into account (including identical twins). The additive genetic effect was calculated.

Meta-analysis

Meta-analysis combining results from all four samples was done using METAL (http://www.sph.umich.edu/csg/abecasis/metal/) by calculating a Z-statistic that was a weighted average of sample-level statistics, where the weights were proportional to the square root of the number of individuals examined in each sample, and selected so that the squared weights summed to 1. The weight for QIMR reflected only independent individuals. The direction of effect in each study was taken into account in calculating the average. There were 1 252 222 SNPs common to all four data cohorts and scales. The number of individuals analyzed varied by phenotype: HA: 11 597, NS: 11 612, RD: 11 590, P: 11 610.
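The weighting scheme described above can be written out in a few lines. The sketch below is not METAL itself; it only reproduces the stated formula (weights proportional to the square root of sample size, squared weights summing to 1). The function name `weighted_z` and the example Z-scores are hypothetical, and the Z-scores are assumed to be signed with respect to the same effect allele in every cohort.

```python
from math import sqrt
from statistics import NormalDist


def weighted_z(z_scores, sample_sizes):
    """Combine signed per-cohort Z-scores using sqrt(N) weights."""
    total_n = sum(sample_sizes)
    # w_i = sqrt(N_i / sum(N)), so that the squared weights sum to 1.
    weights = [sqrt(n / total_n) for n in sample_sizes]
    z_meta = sum(w * z for w, z in zip(weights, z_scores))
    p_value = 2.0 * NormalDist().cdf(-abs(z_meta))  # two-sided P-value
    return z_meta, p_value


# Cohort sizes from the sample descriptions (NFBC, YFS, HBCS, effective QIMR);
# the Z-scores for this single SNP are invented for illustration.
sizes = [4508, 1383, 1425, 4312]
z_meta, p_meta = weighted_z([1.8, 0.4, -0.6, 2.1], sizes)
print(f"Z = {z_meta:.3f}, P = {p_meta:.3g}")
```

Because the signs of the cohort statistics are retained, effects in opposite directions partly cancel, which is how the direction of effect enters the average.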

We also employed the heterogeneity option of METAL. The METAL heterogeneity analysis requires a second pass of analysis to decide whether observed effect sizes (or test statistics) are homogeneous across samples. The resulting heterogeneity statistic has n–1 degrees of freedom for n samples.

Gene-based tests

To determine whether any genes harbor more associated SNPs than expected by chance, we performed a gene-based test for each personality scale (VEGAS, Versatile Gene-based Association Study33). VEGAS tests for association on a per-gene basis, by considering the P-value of all SNPs within genes (including +/−50 kb from the 5′ and 3′ UTR), accounting for the number of SNPs per gene, and linkage disequilibrium between the SNPs. As such, the test identifies genes that show more signals of association than expected by chance given their length and linkage disequilibrium between the SNPs. The gene-based test was performed on the meta-analysis association results.

Pathway analysis

Subsequently, all genes from the gene-based test with a P-value<0.01 were included in a pathway analysis using the Ingenuity Pathway analysis program (Ingenuity Systems, Redwood City, CA, USA, release IPA 6.0). By performing these pathway analyses we tried to identify whether the genes most associated with the personality scales were more prevalent in any known biological or canonical pathway than would be expected by chance. The alpha level was set at 0.0125 (0.05/4 personality scales) and significance of individual pathways was corrected for multiple testing by the Benjamini–Hochberg procedure as implemented in Ingenuity. The pathway analysis was performed on the results from the gene-based test of the meta-analysis results.
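For readers unfamiliar with the Benjamini–Hochberg correction mentioned above, a generic version is sketched below. It is a textbook implementation, not Ingenuity's code, and how Ingenuity combines it with the 0.0125 alpha level is not described beyond the text; the function name `benjamini_hochberg` and the example P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented pathway P-values for one scale.
pathway_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(pathway_p)
alpha = 0.05 / 4  # alpha level stated in the text for four scales
significant = [q <= alpha for q in adjusted_p]
print(adjusted_p, significant)
```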

Prediction

We used the results from a meta-analysis using only the three Finnish cohorts to predict the four TCI scales in the QIMR sample, using the ‘score’ function in PLINK.34 We restricted this analysis to the same set of SNPs used in the full meta-analysis, and used only one individual per family. The ‘risk score’ for individuals in the QIMR sample was constructed by multiplying the number of copies of the effect allele at each SNP by the Z-score from the Finnish-only meta-analysis of a given scale, and summing across SNPs. The observed TCI score in the QIMR sample was regressed on this risk score to assess the degree to which variability in the observed phenotype could be explained by variability in the risk score. The risk score was calculated using all SNPs, and also using the top 10, 20, 30, 40 and 50% of SNPs in the Finnish-only meta-analysis.
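The risk-score construction described above amounts to a weighted allele count followed by a simple regression. The sketch below illustrates that arithmetic only; it does not reproduce PLINK's implementation or options. The helper names (`top_snps`, `risk_scores`, `variance_explained`) and the toy genotypes are hypothetical, and ranking SNPs by absolute Z-score is assumed to be equivalent to ranking by P-value.

```python
def top_snps(z_scores, fraction):
    """Indices of the top `fraction` of SNPs ranked by |Z| (smallest P first)."""
    keep = max(1, round(len(z_scores) * fraction))
    ranked = sorted(range(len(z_scores)), key=lambda j: -abs(z_scores[j]))
    return sorted(ranked[:keep])


def risk_scores(genotypes, z_scores, snps=None):
    """Sum over SNPs of (effect-allele count) x (discovery Z-score).

    genotypes: one list per person of effect-allele counts (0, 1 or 2).
    z_scores: discovery Z-scores signed for the same effect allele.
    """
    snps = range(len(z_scores)) if snps is None else snps
    return [sum(person[j] * z_scores[j] for j in snps) for person in genotypes]


def variance_explained(x, y):
    """R^2 of a simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Invented data: four persons, four SNPs, one observed scale score each.
genotypes = [[0, 1, 2, 1], [2, 0, 1, 0], [1, 1, 0, 2], [0, 2, 2, 1]]
discovery_z = [1.0, -2.0, 0.5, 0.1]
observed = [-0.3, 0.9, 0.1, -1.2]
scores = risk_scores(genotypes, discovery_z, top_snps(discovery_z, 0.5))
print(scores, variance_explained(scores, observed))
```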

Results

Meta-analysis

Genomic control lambda parameters35 estimated from the meta-analysis of 1 252 222 autosomal SNPs indicated minimal inflation of test statistics over the null value of 1.0; HA: 1.01, NS: 1.04, RD: 1.00, P: 1.02 (Figure 1, QQ plots). No SNPs were significant at a genome-wide threshold of 5 × 10⁻⁸. The most significant finding was for rs17608059 on chromosome 17 with scale P, with a P-value of 2.8 × 10⁻⁷ (Table 2). There were 83 SNPs from 16 independent genomic locations on 12 chromosomes with P<10⁻⁵ (HA: 9 SNPs, NS: 57 SNPs, RD: 10 SNPs, P: 7 SNPs, Supplemental Table S1). Scales HA and RD were also analyzed separately by sex; across all four analyses 73 SNPs from 13 independent genomic locations resulted in P<10⁻⁵ but none were significant at a genome-wide level (Supplemental Table S2). Meta-analysis of the three Finnish cohorts alone also did not produce any genome-wide significant results (data not shown), nor did meta-analysis including the heterogeneity option. A priori, both QIMR and HBCS might be considered to be cohorts with a heterogeneous signal; QIMR due to population differences and HBCS due to age differences. Among the handful of markers with METAL heterogeneity P<10⁻⁵ for one or more scales, it was never true that only the QIMR sample or the HBCS sample had a test result considered to be heterogeneous from the other three cohorts.
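The genomic control lambda reported above is commonly computed as the median observed 1-df chi-square statistic divided by its median under the null. The sketch below assumes that common definition (the text does not spell out the estimator used) and works from Z-scores; the function name `genomic_control_lambda` and the simulated null Z-scores are illustrative only.

```python
import random
from statistics import NormalDist, median


def genomic_control_lambda(z_scores):
    """Median observed chi-square (Z squared) over the null median (1 df)."""
    # Median of a 1-df chi-square equals the squared 75th normal percentile.
    null_median = NormalDist().inv_cdf(0.75) ** 2  # about 0.455
    return median(z * z for z in z_scores) / null_median


# Simulated null test statistics (no true associations, no stratification).
random.seed(1)
null_z = [random.gauss(0.0, 1.0) for _ in range(20_000)]
lam = genomic_control_lambda(null_z)
print(f"lambda = {lam:.3f}")
```

Values close to 1.0, as reported for all four scales, indicate little inflation from stratification or cryptic relatedness.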

Figure 1

QQ plots of meta-analysis results for each of the four temperament scales. On the x-axis is the distribution of –log10 P-values expected under the null hypothesis of no association of SNPs to the phenotype. On the y-axis is the ordered distribution of observed –log10 P-values. Deviation from the 1:1 line in the bulk of the distribution can suggest inflation of test statistics. The 95% confidence bands (dashed lines) are generated assuming the jth order statistic from a uniform (0,1) sample has a beta (j, n–j+1) distribution, and assuming independence. (a) Harm avoidance; (b) novelty seeking; (c) reward dependence; (d) persistence.

Table 2 Meta-analysis and individual study-level results for the most significant SNPs in each of 16 regions demonstrating association to a temperament phenotype at P<10⁻⁵ in the meta-analysis

de Moor et al.15 found two SNPs on 5q14.3 to be genome-wide significantly associated with openness to experience (rs1477268 and rs2032794) and one SNP on 18q21.1 to be genome-wide significantly associated with conscientiousness (rs2576037). Neither association was replicated in an independent sample. Openness to experience and conscientiousness are not measured in our sample and are only modestly correlated to our phenotypes (correlations of openness to experience/conscientiousness with NS, HA, P and RD are 0.27/–0.36, –0.33/–0.24, 0.03/0.46 and 0.32/0.07, respectively6); however, we still reviewed the association findings for these three SNPs in our results. We find rs1477268 and rs2032794 to be associated to NS (P=0.03). In de Moor et al.,15 the ‘T’ allele at these markers resulted in a decrease in openness to experience; we find the ‘T’ allele to result in a decrease in NS. Association of HA, RD and P to these two SNPs all resulted in P>0.30. de Moor et al.15 found the ‘T’ allele of rs2576037 to result in a decrease in conscientiousness. In our sample the ‘T’ allele was associated with a decrease in HA (P=0.10) and P (P=0.076). Association of RD and NS to rs2576037 both resulted in P>0.27.

Gene-based tests and pathway analysis

Approximately 17 200 tests were performed as part of the autosomal gene-based analysis. Genomic control parameters indicated minimal inflation of the test statistics over the null value of 1.0; HA: 1.00, NS: 1.04, RD: 0.991, P: 0.963. The percentage of associations significant at the 0.05 level ranged from 4.7% for scale P to 5.5% for scales HA and RD. None of the scales resulted in gene-based associations that survived correction for multiple testing (17 261 tests and four scales, α=7.2 × 10⁻⁷). The top five genes for each scale are presented in Supplemental Table S3.

We then examined, for each scale, all genes with P<0.01 in the gene-based test to see whether they were concentrated in known biological or canonical pathways, using the Ingenuity Pathway analysis program (Ingenuity Systems, release IPA 6.0). The number of genes included in this analysis is HA: 193 genes, NS: 198 genes, RD: 225 genes, P: 135 genes. Results from the pathway analyses were not significant after correction for multiple testing, indicating that our top genes were not over-represented in known biological or canonical pathways more than one would expect by chance.

Prediction

A risk score calculated from the top 50% of SNPs in the HA Finnish-only meta-analysis accounted for 0.28% of the variance in HA in the QIMR sample (P=0.007); however, this result was not significant after correction for multiple testing (six thresholds and four scales=24 tests). The top 10% of SNPs in the RD and P Finnish-only meta-analysis accounted for 0.15% (P=0.052) and 0.17% (P=0.04) of the variance in RD and P in the QIMR sample, respectively. Using all SNPs in the Finnish-only meta-analysis of NS accounted for 0.059% (P=0.22) of the variance in NS in the QIMR sample. Other SNP thresholds resulted in less of the variance in the QIMR sample being accounted for by the risk score (Supplemental Table S4).

Discussion

We report here the results of the largest GWAS conducted to date for personality assessed using the TCI. The lack of genome-wide significant associations in our meta-analysis of more than 11 000 subjects, and the lack of replicated associations for personality measured by the NEO-PI-R in an even larger meta-analysis15 suggest that it will be challenging to identify such associations using standard approaches for studying personality traits. Although we find modest association of the top findings of de Moor et al.15 to some of our phenotypes, the statistical evidence is well below the level required for replication. Additionally, we find no association evidence to support the suggestion of Cloninger16 that NS, HA and RD would be influenced by genes directly affecting the dopamine, serotonergic or noradrenaline systems, respectively. Two previous studies have identified genome-wide significant linkage to HA36 and NS.37 None of our top association signals (P<10⁻⁵) for these phenotypes were on the same chromosomal arms as these linkage findings.

The true genetic architecture underlying variation in personality is of course unknown, but as with other complex polygenic phenotypes, causal loci are likely represented by a mixture of common variants of small effect and rare variants, some of which could have larger effects. Height is a classic polygenic phenotype; a recent meta-analysis of 180K persons has demonstrated that 10.5% of the phenotypic variation in height can be explained by 180 associated loci.38 Although our study is much smaller, it is worth noting that our prediction analysis accounted for, at most, only 0.28% of the phenotypic variability in temperament scales.

Our study had >80% power to detect loci responsible for 0.4% of the phenotypic variation in temperament scales, at genome-wide significance levels. Failure of our study to detect significant association to what are clearly heritable phenotypes suggests that either the true effect sizes of causal loci are much smaller, and/or that the causal loci are rare and not well tagged by common variation.

GWAS are designed to identify common polymorphisms responsible for variation between individuals, and identification of common loci with small effects on personality traits could be possible by assembling very large sample sizes. For example, GWAS of blood pressure have identified replicated associations at genome-wide significance but only once sample sizes in excess of 60 000 individuals were available for meta-analysis.39 There are substantial obstacles to amassing a sample of such size for meta-analyses of personality, which do not exist for traits such as blood pressure. Blood pressure is a relatively direct biological measure, which is assessed in an objective and standardized way throughout the world. It is therefore straightforward to combine data across studies. In sharp contrast, personality trait assessment relies on self-report instruments, such as the TCI and NEO-PI-R, which pose two potential problems. First, while test–retest correlations for HA, RD and NS range from 0.58 to 0.84 for measures collected an average of 2 years apart,26 it is possible that self-report biases and differences in subjective interpretation of questionnaire items may introduce error in the assessment of traits. Second, different instruments reflect different models of personality, highlighting philosophical differences in schools of thought about core components of personality. It therefore remains unclear to what degree the phenotypes in individuals assessed via TCI overlap with those obtained in individuals assessed via NEO-PI-R.

Although combining data from studies using different personality assessments may enable GWAS of personality in sample sizes large enough to detect common loci with small effects on personality, this is a challenging undertaking. A naïve meta-analysis that would use simple sum scores is unlikely to be effective, given the modest correlation between dimensions measured in different instruments (for example, De Fruyt et al.6 showed the maximum correlation between dimensions of the TPQ and the FFM to be 0.54). Alternatively, one might attempt to map in a meta-analysis not the sum scores themselves, but the scores from a principal components analysis that would combine information across scales within the same instrument, and potentially account for more of the phenotypic variance. A more sophisticated approach would be to employ item response theory (IRT40) to estimate the unmeasured, latent trait thought to be evaluated by personality assessments. van den Berg et al.41 applied IRT to the attention problems subscale of the Young Adults Self Report questionnaire42 assessed in a sample of individuals from the Netherlands Twin Registry. Heritability of the estimated latent trait was found to be much larger than the heritability of the traditionally used sum score: 73% vs 40%, respectively. Using IRT in samples evaluated with different personality assessments would identify a subset of items in these different instruments that are related to a common, unmeasured latent trait. Samples measured with multiple instruments are needed to identify these items, which are then extracted from samples evaluated using only one instrument. Refinement of personality phenotypes in this manner has the potential to greatly improve power to genetically map these traits.

The results of our meta-analysis, as well as those of de Moor et al.,15 demonstrate that the null GWAS findings are not simply due to the instrument used, and appear to suggest that successful mapping of loci contributing to personality will require new strategies and methodology. Additionally, next-generation sequencing will soon provide a host of data that may reveal rare variants that, when aggregated in the form of a ‘burden analysis’,43 account for variability in personality traits. Understanding the biological processes underlying personality-related traits would be greatly facilitated by discovery of any such associated loci, and such loci may also provide a window for understanding cognitive and behavioral disorders.