Introduction

The personality traits of neuroticism and extraversion are two higher-order dimensions of personality that are consistently identified across dimensional models of personality, such as the Five Factor Model. Both traits are captured by a range of personality inventories (for example, the NEO-Five Factor Inventory,1 the Eysenck Personality Questionnaire2 and the Multidimensional Personality Questionnaire3, 4).

Neuroticism is characterized by a tendency toward emotional instability, psychological distress, low self-esteem and negative emotions such as anxiety and depression. Extraversion is characterized by a tendency toward sociability, activity, sensation seeking and positive emotions. Both traits predict a number of social and behavioural outcomes, as well as anxiety and depressive disorders.5, 6, 7, 8, 9 For example, high levels of neuroticism are associated with lifetime disorders such as major depressive disorder, generalised anxiety disorder, social phobia, dysthymia and obsessive-compulsive disorder, whereas low levels of extraversion are associated with social phobia, agoraphobia and dysthymia. The causes of individual differences in neuroticism and extraversion have been studied extensively in twin and adoption studies, which report heritability estimates ranging from 13 to 58% for neuroticism and from 34 to 57% for extraversion.10, 11, 12, 13, 14, 15, 16, 17, 18 Genetic factors related to neuroticism and extraversion may also be genetic risk factors for the associated mental disorders.7, 19, 20 Insight into the genetic architecture of neuroticism and extraversion may therefore shed light on the biological mechanisms underlying the mental disorders with which these traits are associated.

Gene-finding studies of higher-order personality traits have focused mainly on neuroticism, because of its association with anxiety and depression. Identifying genetic variants associated with neuroticism, and also with extraversion, has proven difficult. Suggestive linkage has been reported for several chromosomal regions, but with few replications.21, 22, 23, 24, 25, 26, 27 Candidate gene studies have reported associations with markers within several genes,28, 29, 30, 31, 32 but again, replication generally failed.33, 34, 35 A number of genome-wide association studies (GWAS) of higher-order personality traits have now been published;36, 37, 38, 39, 40 none of them, however, has reported a genome-wide significant association between any marker and neuroticism or extraversion. The dilemma of moderate to high heritability estimates, of which only a small proportion can be explained by identified markers, has been termed ‘the case of the missing heritability’.41, 42

Recently, Yang et al.43 showed (i) that 45% of the phenotypic variance of human height (roughly half of the heritability) can be explained by considering all single-nucleotide polymorphisms (SNPs) simultaneously in a linear model analysis, implying that most of the heritability is not missing but has not yet been detected, because the effect sizes of individual variants are too small to reach significance in the GWAS conducted to date; and (ii) that the remaining heritability can be ascribed to incomplete linkage disequilibrium (LD) between causal variants and genotyped SNPs. Incomplete LD is likely if causal variants have a lower minor allele frequency (MAF) than the genotyped SNPs.44

The purpose of the present study is to estimate the proportion of the phenotypic variance of neuroticism and extraversion that is explained by considering all SNPs on currently used genome-wide arrays simultaneously.43, 45 The basic idea of this method is to aggregate the effects of all SNPs, including associated SNPs whose individual effects are too small to reach the stringent significance level required in single-SNP analyses. To this end, we analysed neuroticism and extraversion scores measured in approximately 12 000 unrelated individuals from combined samples from Australia, Sweden, the United Kingdom and the United States of America.

Methods

Sample and phenotypes

Information on samples and phenotypes is provided in Supplementary Information Section 1. Briefly, neuroticism data were available for 17 875 individuals (58% female; mean age 39.07, s.d.=16.02, range 14–86). After estimating pairwise genetic similarity from all autosomal markers and removing one member of each pair of individuals with a genetic relatedness >0.025 (see Supplementary Information Section 1), 11 961 individuals (58% female) were retained (mean age 42.36, s.d.=16.91, range 14–86). For extraversion, data were available for 17 557 individuals (59% female; mean age 39.01, s.d.=16.18, range 14–86), of whom 11 786 (58% female) were genetically ‘unrelated’ (mean age 42.36, s.d.=17.09, range 14–86).
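The relatedness filter can be sketched as follows, assuming a precomputed genetic relationship matrix (GRM) held as a square NumPy array. The function name `prune_related` and the rule of always dropping the second member of a pair are hypothetical illustrative choices; the exact procedure used in the study is the one described in Supplementary Information Section 1.

```python
import numpy as np

def prune_related(grm, threshold=0.025):
    """Greedy sketch: for every pair whose off-diagonal relatedness exceeds
    `threshold`, drop one member; return the indices of individuals kept."""
    n = grm.shape[0]
    # Upper triangle above the diagonal, so each pair is visited once.
    rows, cols = np.where(np.triu(grm, k=1) > threshold)
    removed = set()
    for j, k in zip(rows, cols):
        j, k = int(j), int(k)
        # Pair already resolved by an earlier removal: nothing to do.
        if j in removed or k in removed:
            continue
        removed.add(k)  # arbitrary choice: drop the second individual
    return [i for i in range(n) if i not in removed]
```

A greedy rule like this is simple but not optimal: choosing which member to drop so as to retain the largest possible sample is a harder combinatorial problem.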

Analyses

The method applied in the present study is described in detail by Yang et al.43, 45 and is designed to capture variation due to LD between genotyped SNPs and unknown causal variants in the genome. This estimate of the variance explained by all SNPs differs from heritability estimates from twin and family studies, because the latter include the variance explained by all causal variants (an estimate of the latent genetic effect). For more detailed information on the methods, see Supplementary Information Section 2.
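The central quantity in this approach is the genetic similarity (relationship) matrix between individuals. A minimal sketch, assuming genotypes coded as 0/1/2 allele counts and the standardised cross-product A = ZZᵀ/N described by Yang et al.; for simplicity the same formula is used on the diagonal, which differs slightly from the diagonal estimator in the GCTA software, and the function name `grm` is hypothetical.

```python
import numpy as np

def grm(genotypes):
    """Genetic relationship matrix from an (individuals x SNPs) array of
    0/1/2 allele counts: A = Z Z^T / N with
    z_ij = (x_ij - 2 p_i) / sqrt(2 p_i (1 - p_i))."""
    x = np.asarray(genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0             # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)             # drop monomorphic SNPs (zero variance)
    x, p = x[:, keep], p[keep]
    z = (x - 2 * p) / np.sqrt(2 * p * (1 - p))
    return z @ z.T / x.shape[1]          # average over the N retained SNPs
```

Because allele frequencies are estimated in the same sample, each column of Z sums to zero, so every row of A sums to zero and off-diagonal elements for unrelated individuals centre slightly below zero; their spread shrinks roughly as 1/√N as more independent SNPs are used.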

We first fitted a mixed linear model including all autosomal SNPs to estimate the proportion of the variance explained by all SNPs. Study cohort and the first 20 principal components (PCs) from a principal-component analysis of the genome-wide genotypes were fitted as covariates to control for effects attributable to population structure. Analyses were repeated (i) without PC adjustment, (ii) adjusting for imprecise LD between genotyped SNPs and causal variants (see the notes to Supplementary Information Table S2 for details), (iii) separately for each study cohort, (iv) in the total sample excluding one study cohort at a time, (v) fitting only SNPs on the X chromosome, (vi) separately for men and women and (vii) including a genotype-by-sex interaction term.
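The variance component can be estimated by restricted maximum likelihood (REML) under the model y = Xb + g + e, with var(g) = Aσ²g and var(e) = Iσ²e, where X holds the fixed covariates (cohort indicators and PCs). The sketch below is a simplified stand-in for the average-information REML algorithm used in GCTA: it assumes a single GRM, rotates the data by the eigenvectors of A so that the covariance becomes diagonal, profiles out the total variance, and grid-searches h² = σ²g/(σ²g + σ²e). The name `reml_h2` is hypothetical.

```python
import numpy as np

def reml_h2(y, A, X, grid=np.linspace(0.0, 0.999, 1000)):
    """Grid-search REML estimate of h2 for y = X b + g + e with
    V = sigma2 * (h2 * A + (1 - h2) * I). Returns (h2_hat, sigma2_hat)."""
    s, U = np.linalg.eigh(A)              # A = U diag(s) U^T
    yr, Xr = U.T @ y, U.T @ X             # rotated data: V becomes diagonal
    n, k = Xr.shape
    best = (-np.inf, None, None)
    for h2 in grid:
        d = h2 * s + (1.0 - h2)           # eigenvalues of V / sigma2
        if np.any(d <= 0):
            continue
        w = 1.0 / d
        XtWX = Xr.T @ (w[:, None] * Xr)
        b = np.linalg.solve(XtWX, Xr.T @ (w * yr))   # GLS fixed effects
        r = yr - Xr @ b
        sigma2 = (r @ (w * r)) / (n - k)             # profiled total variance
        # Restricted log-likelihood, up to an additive constant.
        ll = -0.5 * ((n - k) * np.log(sigma2) + np.sum(np.log(d))
                     + np.linalg.slogdet(XtWX)[1])
        if ll > best[0]:
            best = (ll, h2, sigma2)
    return best[1], best[2]
```

With roughly 12 000 individuals the eigendecomposition is still feasible, but iterative algorithms such as AI-REML are preferred in practice, particularly when several GRMs are fitted jointly.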

Twin and family studies have reported sex effects on the genetic factors underlying both neuroticism and extraversion, although inconsistently.3, 18, 46, 47, 48, 49, 50 To investigate possible sex effects on the genetic variance of neuroticism and extraversion in the current design, we tested the significance of a genotype-by-sex interaction term. We also estimated the genetic variance explained by all SNPs in sex-specific analyses. Possible sex effects were investigated both in the total sample and in the Australian cohort only. Twin and family studies have generally suggested that the genetic factors underlying both neuroticism and extraversion are stable across age,3, 18, 46, 47, 48, 49, 50 although others have suggested instability.50 Age effects were not investigated, because age is confounded with study cohort in the present study.
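One common way to specify a genotype-by-sex interaction in this framework, assumed here by analogy with the genotype-by-environment option in GCTA, is a second relationship matrix that keeps A for same-sex pairs and sets it to zero for opposite-sex pairs. Under var(y) = Aσ²g + A_GxS σ²gxs + Iσ²e, same-sex pairs then share Aσ²g + Aσ²gxs while opposite-sex pairs share only Aσ²g, so σ²gxs = 0 means the SNP-tagged covariance between the sexes equals the within-sex genetic variance. The helper name `gxs_matrix` is hypothetical.

```python
import numpy as np

def gxs_matrix(A, sex):
    """Relationship matrix for a genotype-by-sex interaction component:
    keep A[j, k] when j and k are the same sex, 0 otherwise."""
    sex = np.asarray(sex)
    same_sex = sex[:, None] == sex[None, :]   # (n x n) boolean mask
    return np.where(same_sex, A, 0.0)
```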

To investigate whether the variance explained by a chromosome is proportional to its length, we subsequently partitioned the variance explained across the individual autosomes. To this end, all chromosomes were fitted simultaneously in a mixed linear model, and the proportion of the variance explained by each chromosome was estimated.51
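Genome partitioning replaces the single GRM by one GRM per chromosome, each fitted as its own variance component: V = Σ_c A_c σ²_c + Iσ²e. A minimal sketch of building these matrices, assuming 0/1/2 genotypes with only polymorphic SNPs and a per-SNP chromosome label (the helper names are hypothetical). With standardised genotypes, the genome-wide GRM is exactly the SNP-count-weighted average of the per-chromosome GRMs, Σ_c (N_c/N) A_c.

```python
import numpy as np

def standardise(x):
    """Standardise 0/1/2 genotypes column-wise with sample allele frequencies
    (assumes every SNP is polymorphic)."""
    p = x.mean(axis=0) / 2.0
    return (x - 2 * p) / np.sqrt(2 * p * (1 - p))

def chromosome_grms(genotypes, chrom):
    """Return {chromosome: (n_snps, GRM)} built from the SNPs on each chromosome."""
    z = standardise(np.asarray(genotypes, dtype=float))
    chrom = np.asarray(chrom)
    out = {}
    for c in np.unique(chrom):
        zc = z[:, chrom == c]                       # SNPs on chromosome c only
        out[int(c)] = (zc.shape[1], zc @ zc.T / zc.shape[1])
    return out
```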

In a subsequent series of analyses, we investigated a possible discrepancy between the percentage of variance explained by SNPs and the variance explained by latent additive genetic influences as estimated in twin and family studies. We mimicked the conventional AE model (that is, estimation of additive genetic and environmental factors) using the entire Australian sample, including close relatives (for example, MZ/DZ twin pairs, full sibs, parent–offspring pairs and cousins); that is, no cutoff in genetic similarity between individuals was applied. In twin and family studies, heritability is estimated from the relationship between phenotypic resemblance and the expected genetic similarity given pedigree information (for example, MZ twins are expected to share 100% of their genetic material, whereas DZ twins, full sibs and parent–offspring pairs are expected to share 50%). SNP data, however, provide estimates of the realized genetic similarity, which varies around the values expected from pedigrees (see Supplementary Information Figure S2). Genetic variation around the expected values (for example, 0, 50 or 100%) is not captured by pedigree studies but can be captured by SNP data. To partition the genetic variance into variance explained by pedigree data and variance explained by SNP data, genetic similarity was estimated from pedigree information (expected genetic similarity) and from SNP data (realized genetic similarity) in a series of separate and joint analyses of the pedigree and SNP data. The analyses are described in detail in Supplementary Information Section 1.
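The difference between expected and realized similarity can be illustrated by simulation. This sketch assumes unlinked SNPs with known allele frequencies and two full sibs per family; because real genomes are inherited in linked chromosome segments, the actual spread of realized sib relatedness is wider than in this toy model. All function names are hypothetical.

```python
import numpy as np

def simulate_full_sibs(n_families, p, rng):
    """Two full sibs per family at unlinked biallelic SNPs with allele
    frequencies p; returns two (families x SNPs) arrays of 0/1/2 counts."""
    m = len(p)
    # Parental haplotypes: (families, 2 parents, 2 haplotypes, SNPs).
    hap = rng.random((n_families, 2, 2, m)) < p
    sibs = []
    for _ in range(2):
        # Each parent transmits one of its two alleles at random, per SNP.
        pick = rng.integers(0, 2, size=(n_families, 2, m))
        transmitted = np.take_along_axis(hap, pick[:, :, None, :], axis=2)[:, :, 0, :]
        sibs.append(transmitted.sum(axis=1))       # sum over the two parents
    return sibs

def realised_relatedness(x1, x2, p):
    """Per-pair SNP relatedness with known allele frequencies p:
    mean over SNPs of (x1 - 2p)(x2 - 2p) / (2p(1 - p))."""
    return np.mean((x1 - 2 * p) * (x2 - 2 * p) / (2 * p * (1 - p)), axis=1)
```

Realized relatedness of simulated full sibs averages about 0.5, the pedigree expectation, but individual pairs scatter around it; pairing sibs from different families gives values scattered around 0.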
Briefly, we (1) estimated the variance explained by all SNPs in ‘unrelated’ individuals using genome-wide SNP data, as above; (2) estimated the variance explained by all SNPs in all individuals using genome-wide SNP data; (3) estimated heritability in all individuals using expected genetic similarity (pedigree similarity matrix); (4) partitioned the variance in all individuals into variance captured by the SNP similarity matrix and variance captured by the pedigree similarity matrix; and (5) further partitioned the variance into variance captured by the pedigree similarity matrix for all individuals, the SNP similarity matrix for ‘unrelated’ individuals and the SNP similarity matrix for ‘related’ individuals. Note that estimates of variance from analyses including close relatives are increased by (i) possible environmental factors that induce phenotypic similarity between genetically more similar individuals and (ii) causal variants that are not correlated with genotyped SNPs but are captured by pedigree structure. Close relatives share a substantial proportion of their segregating genes on all chromosomes (for example, 50% for full siblings); consequently, variance attributable to causal variants on one chromosome will be captured by SNPs on other chromosomes.43, 52

Results

A genetic similarity matrix was estimated for all individuals using the 849 801 autosomal SNPs that passed quality control. After removing one individual of each pair with a genetic similarity >0.025, approximately 12 000 individuals were retained for the analyses of neuroticism and extraversion (see Supplementary Information Table S2 for exact numbers for the combined sample and for each study cohort). Estimates of genetic similarity for pairs of ‘unrelated’ individuals were normally distributed with a mean of -0.0002 and s.d.=0.0043 (see Supplementary Information Figure S1).

The proportion of phenotypic variance explained by all SNPs (h²SNP), estimated in all ‘unrelated’ individuals, was 0.06 (s.e.=0.03, P<0.05) for neuroticism and 0.12 (s.e.=0.03, P<0.001) for extraversion (see Figure 1 and Supplementary Information Table S2).

Figure 1

Proportion of variance explained by autosomal single-nucleotide polymorphisms (SNPs) for neuroticism (dark blue) and extraversion (light blue). Notes: all individuals included in the analyses have a pairwise genetic similarity <0.025, except for the analyses marked *, for which no cutoff was applied; the first 20 principal components (PCs) were included as fixed effects in the model, except for the analyses marked ***; **analyses based on the 162 056 SNPs common to all study cohorts; error bars represent s.e.; adjusted: estimate adjusted for imprecise LD between genotyped SNPs and causal variants, for causal variants with the same allele frequency spectrum as the genotyped SNPs, using the regression coefficient $\beta = 1 - (c + 1/N)/\operatorname{var}(A_{jk})$ (assuming c=0), where $\operatorname{var}(A_{jk})$ is the variance of the off-diagonal elements $A_{jk}$ of the genetic similarity matrix and N is the number of SNPs used to calculate $A_{jk}$. The value of c depends on the minor allele frequency of the causal variants.44 Further details and statistics are provided in Supplementary Information Table S2. Abbreviations: LD, linkage disequilibrium; PC, principal component; SNPs, single-nucleotide polymorphisms; UK, United Kingdom; USA, United States of America.

When the analyses were based only on the SNPs common to all study cohorts, estimates of the variance explained were slightly lower (see Figure 1 and Supplementary Information Table S2). This is expected, because a less dense SNP set (162 056 versus 871 333 SNPs) provides lower LD between the genotyped SNPs and the unknown causal variants. Analyses with and without the first 20 principal components as covariates gave very similar results, implying that any population stratification within our European-ancestry sample had a negligible effect on the results. The analyses were repeated in subsets of the sample (Figure 1 and Supplementary Information Table S2). Cohort-specific analyses showed a fairly inconsistent pattern, reflecting their larger s.e., but estimates were generally lower for neuroticism than for extraversion. Cohort-specific analyses based only on the 162 056 SNPs common to all study cohorts showed a pattern similar to that of the analyses based on all SNPs, suggesting that differences in estimates between study cohorts are not a result of the different SNP arrays used for genotyping. Excluding each study cohort in turn provided no evidence that any single cohort affected the results disproportionately. We also estimated the variance explained by the X chromosome and found no significant result for either neuroticism or extraversion.

Sex-specific analyses in the total sample gave different point estimates for men and women, with higher point estimates in men for both neuroticism and extraversion and a lower point estimate in women for extraversion; however, the genotype-by-sex interaction analysis suggested that these differences were not significant (Supplementary Information Table S2). The estimate of genetic variance from the total sample reflects a weighted average of the estimates of variance within each sex and the covariance tagged by SNPs between the sexes. The nonsignificant genotype-by-sex interaction implies that the genetic variance within each sex did not differ significantly from the genetic covariance between the sexes.

Partitioning the genetic variance across the 22 autosomes showed that the variance in extraversion explained by each chromosome was proportional to its length (R²=0.20, P<0.05; Figure 2). This is consistent with a polygenic model underlying variation in extraversion: because longer chromosomes harbour more genes, a linear relationship between the length of a chromosome and the variance it explains is expected if many genes of small effect contribute to genetic variation in the trait. The s.e. of the per-chromosome estimates were large, and none of the estimates differed significantly from the regression line, suggesting that no individual locus disproportionately affects genetic variation. Genome partitioning for neuroticism did not show a significant relationship between chromosome length and variance explained (Figure 2). This is not unexpected, given the smaller genetic variance that can be attributed to common SNPs for neuroticism.
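The length test is an ordinary least-squares regression of the per-chromosome variance estimates on chromosome length. A minimal sketch (the function name is hypothetical; a P-value for the slope would additionally need a t-test with 20 degrees of freedom for 22 chromosomes, which is omitted here):

```python
import numpy as np

def length_regression(lengths_mb, var_explained):
    """OLS fit of per-chromosome variance explained on chromosome length (Mb);
    returns (slope, intercept, r_squared)."""
    x = np.asarray(lengths_mb, dtype=float)
    y = np.asarray(var_explained, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    fitted = slope * x + intercept
    r2 = 1.0 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r2
```

Feeding it noise-free values generated from the extraversion line in the Figure 2 notes (slope 4×10⁻⁵ per Mb, intercept 0.0011) simply recovers that line with R² = 1; with real estimates, the large per-chromosome s.e. pull R² well below 1.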

Figure 2

Variance explained by each chromosome for neuroticism (a) and extraversion (b). Notes: the linear regression lines are y = 8×10⁻⁶x + 0.0017 for neuroticism and y = 4×10⁻⁵x + 0.0011 for extraversion, where y is the variance explained by a chromosome and x is the total length of that chromosome in megabases (Mb); the corresponding R² values are 0.0202 (P=0.460) for neuroticism and 0.203 (P=0.035) for extraversion.

Estimates of heritability from the separate and joint analyses of genetic similarity estimated from pedigree and from SNP data in the Australian sample only are listed in Supplementary Information Table S3. The coefficient from the regression of pedigree pairwise similarities on SNP pairwise similarities was 1.003 (see Supplementary Information Figure S2), as expected.

Heritability estimates from SNP data using only ‘unrelated’ individuals were substantially lower than estimates reported by twin and family studies (analysis 1 in Supplementary Information Table S3). However, heritability estimates from both SNP and pedigree data using all individuals (analyses 2 and 3 in Supplementary Information Table S3) were consistent with the moderate heritability of both neuroticism and extraversion reported by twin and family studies (42% from SNP analyses and 45% from pedigree analyses). Note that estimates of variance from analyses including close relatives capture the effects of all causal variants (rather than only those of causal variants in sufficient LD with the genotyped SNPs included in the analyses), but may also be inflated by common-environmental factors shared by close relatives and by nonadditive genetic effects.

Estimates of variance using all data reflect a weighted average of the estimates from close and distant relatives, with relatively more weight given to close relatives because their similarity coefficients are much higher. Estimates from SNP data were slightly lower than estimates from pedigree data, reflecting the relatively large number of pairs of ‘unrelated’ individuals in the SNP analyses.

Partitioning the variance into variance explained by pedigree data and variance explained by SNP data showed that, for both neuroticism and extraversion, one third of the variance explained by pedigree data (14%) could be picked up by SNP similarity alone (analysis 4 in Supplementary Information Table S3). Further partitioning showed that the variance explained by SNP similarity was mainly due to variation among individuals who are ‘unrelated’ in the conventional (pedigree) sense, and much less to variation in the realized genetic similarity of related individuals around the expected (pedigree) values (analyses 5–7 in Supplementary Information Table S3). However, the s.e. of the estimates from realized similarity were large, as expected from theory and empirical studies.53, 54

Discussion

The aim of this study was to estimate the proportion of variance in the personality dimensions neuroticism and extraversion explained by all SNPs in a linear model analysis. Common SNPs explain 6% (s.e. 3%) and 12% (s.e. 3%) of the phenotypic variance in neuroticism and extraversion, respectively.

Estimates of the variance explained by all SNPs for neuroticism and extraversion are similar to estimates reported for the Cloninger temperament subscales Harm Avoidance and Novelty Seeking in a Finnish sample.55 The significant estimates from the present study contrast considerably with the results of candidate gene studies and large-scale GWAS. To date, very little variance (<2%) could be attributed to markers reported in candidate gene studies,28, 29, 30, 31, 32 and no variance could be explained by SNPs detected in large-scale GWAS.36, 37, 38, 39, 40 Those GWAS were powered to detect genetic variants that explain 1–2% of the phenotypic variance.

Our results suggest a polygenic model in which a large number of variants each explain a small proportion of the variance. Moreover, for extraversion we observed a linear relationship between the length of a chromosome and the proportion of the variance explained by that chromosome, which also supports a model in which many variants affect the trait. As none of the chromosomes differed significantly from the regression line, our analyses provide no support for major gene effects. The relationship between chromosome length and variance explained was much weaker, and not significant, for neuroticism, as expected given the smaller genetic variance that can be attributed to common SNPs.

In our approach, we tried to eliminate possible sources of sample structure (for example, hidden relatedness and population stratification) that may lead to spurious associations.56 Consequently, phenotypic correlations between individuals are due to genetic similarity, so that the estimated genetic variance component entirely reflects LD between the genotyped SNPs and the unknown causal variants. Our estimates are unbiased estimates of the variance in neuroticism and extraversion explained by common SNPs, so they are not expected to change as sample sizes increase (within the bounds implied by their s.e., and assuming the same genomic coverage by SNPs). These results imply that, with larger sample sizes, common associated variants could be detected in single-SNP analyses, depending on the effect sizes of the individual variants.
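How strongly detection depends on sample size can be made concrete with a standard approximation for a single-SNP test of a quantitative trait: a variant explaining a fraction q² of the phenotypic variance gives a 1-df test with non-centrality parameter n·q²/(1 − q²). The sketch below assumes unrelated individuals and the conventional genome-wide threshold α = 5×10⁻⁸; the sample sizes in the usage note are hypothetical and do not correspond to any particular published GWAS.

```python
import math

def norm_cdf(z):
    """Standard-normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_ppf(q, lo=-40.0, hi=40.0):
    """Inverse standard-normal CDF by bisection (standard library only)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def gwas_power(n, q2, alpha=5e-8):
    """Approximate power of a two-sided single-SNP test in n unrelated
    individuals for a variant explaining a fraction q2 of the variance."""
    z_crit = norm_ppf(1.0 - alpha / 2.0)          # about 5.45 for alpha = 5e-8
    ncp_sqrt = math.sqrt(n * q2 / (1.0 - q2))     # mean of the test statistic
    return norm_cdf(ncp_sqrt - z_crit) + norm_cdf(-ncp_sqrt - z_crit)
```

With 10 000 individuals, a variant explaining 1% of the variance is detected almost surely, whereas one explaining 0.1% is detected only about 1% of the time; the same 0.1% variant is detected almost surely with 100 000 individuals.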

Although our estimates exceed those from GWAS, they are still lower than the heritability estimates reported in twin and family studies. The discrepancy can be attributed to several factors. First, heritability estimates from twin and family studies include the effects of all causal variants, whereas heritability estimated using the methods employed in this study includes only the effects of variants in (sufficient) LD with the genotyped SNPs included in the analyses. LD is especially low if the causal variant has a different MAF from the genotyped marker. Consequently, the contribution of rare variants (MAF <0.5%) and of variants with low MAF (0.5%< MAF <1%) is included in heritability estimates from twin and family studies but is generally not included in the variance explained by common SNPs. As causal variants are unknown, we cannot estimate LD between genotyped SNPs and causal variants directly. However, assuming that LD between the causal variants and genotyped SNPs is as strong as LD among the genotyped SNPs (that is, adjusting for prediction error of causal variants with the same allele frequency spectrum as the genotyped SNPs), we corrected the estimate of the variance explained by the 162 056 SNPs common to all study cohorts for incomplete LD with causal variants. The adjusted point estimates were considerably higher for both neuroticism (7 versus 5%) and extraversion (13 versus 9%), suggesting that more variance could be explained with a denser SNP set (see also Figure 1 and Supplementary Information Table S2).
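The adjustment can be written as a small calculation, assuming our reading of Yang et al.:43 the regression coefficient is β = 1 − (c + 1/N)/var(A_jk), where var(A_jk) is the variance of the off-diagonal GRM elements and N the number of SNPs, and the adjusted estimate is the SNP-based estimate divided by β. Taking c = 0 corresponds to causal variants with the same allele frequency spectrum as the SNPs. The function name and the value of var(A_jk) used in the usage note are hypothetical.

```python
def ld_adjustment(h2_snp, var_offdiag, n_snps, c=0.0):
    """Adjust a SNP-based variance estimate for imperfect LD with causal
    variants: beta = 1 - (c + 1/N) / var(A_jk); returns (beta, h2_snp / beta)."""
    beta = 1.0 - (c + 1.0 / n_snps) / var_offdiag
    if beta <= 0:
        raise ValueError("var(A_jk) must exceed c + 1/N for a positive beta")
    return beta, h2_snp / beta
```

For instance, with N = 162 056 SNPs and a hypothetical var(A_jk) of 2×10⁻⁵, β ≈ 0.69, and an unadjusted estimate of 0.05 becomes about 0.072. Because 1/N is part of the sampling noise in A_jk, a denser SNP set (larger N) pushes β towards 1.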

Similar analyses of a number of other complex traits, for example human height,43 crystallised and fluid intelligence,57 Crohn's disease, bipolar disorder and type 1 diabetes,58 body mass index and biological measures such as von Willebrand factor and QT interval (QTi),51 have shown that common SNPs explain a substantial part of their reported heritabilities. If rarer variants are more likely to affect variation in neuroticism and extraversion than in the traits described above, relatively low estimates would be expected from the current design (that is, genome-wide SNP analysis). A potential reason for a relatively large contribution of rare variants is a possible relation between neuroticism/extraversion and fitness. If neuroticism and extraversion are correlated with fitness, and reduced reproductive fitness lowers the frequencies of risk variants for less desirable personality traits, one would expect rare variants to make a relatively large contribution to standing genetic variation in these traits. As the current data set primarily consists of SNPs with MAF >0.01, rare (causal) variants are generally poorly tagged, which could, in turn, explain some of the remaining heritability. Plausible arguments can be made that neuroticism and extraversion are correlated with fitness: mental disorders that are related to, or lie at the extreme end of, the neuroticism and extraversion domains are associated with lower fertility, for example because of higher mortality rates and reduced mating opportunities.59, 60 Correlations between fitness and the neuroticism and extraversion domains, however, are likely to be very low.
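Why rare variants are poorly tagged follows from a standard population-genetics bound (not a result of this study): for two biallelic loci with minor allele frequencies p ≤ q, the LD r² can be at most p(1 − q)/(q(1 − p)), reached when the two minor alleles always occur together. A minimal sketch, with a hypothetical function name:

```python
def max_r2(maf_a, maf_b):
    """Upper bound on LD r^2 between two biallelic loci with minor allele
    frequencies maf_a, maf_b (both <= 0.5): p(1 - q) / (q(1 - p)), p <= q."""
    p, q = sorted((maf_a, maf_b))
    return p * (1.0 - q) / (q * (1.0 - p))
```

Two SNPs with equal MAF can be in perfect LD, but a causal variant with MAF 0.005 can reach at most r² ≈ 0.095 with a SNP of MAF 0.05, so most of its effect is invisible to a common-SNP array.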

A second reason for the lower heritability estimated from SNP data compared with pedigree data is that the latter may be biased upwards through confounding with nonadditive effects, whereas heritability estimates from SNP data are ‘narrow-sense’ estimates that reflect purely additive genetic effects. Nonadditive effects have been reported inconsistently for both neuroticism and extraversion, with evidence from some studies11, 12, 13, 48, 59, 60, 61 but not others.48, 62 However, nonadditive effects are difficult to distinguish from common-environmental effects in the classical twin design.48, 62, 63 Consequently, common-environmental effects may well be masked by nonadditive effects in twin and family studies, resulting in inflated estimates of additive genetic effects and deflated estimates of common-environmental effects. In a sample including both twin pairs reared apart and twin pairs reared together, Pedersen et al.18 showed that common-environmental factors made a modest contribution to the variance of both neuroticism and extraversion, whereas nonadditive genetic effects were absent for neuroticism but significant for extraversion.
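The confounding of common-environmental and nonadditive effects is visible in the classical Falconer decomposition. With only MZ and DZ correlations, there are two equations (r_MZ = a² + c² + d², r_DZ = a²/2 + c² + d²/4) but three unknowns, so either an ACE or an ADE model can be solved, chosen by whether r_MZ ≤ 2r_DZ. This is a textbook sketch, not the model-fitting approach of the cited studies; the function name is hypothetical.

```python
def falconer(r_mz, r_dz):
    """Classical twin-design decomposition from MZ and DZ correlations.
    Only two equations are available, so C (shared environment) and D
    (dominance) cannot both be estimated: ACE if r_mz <= 2 r_dz, else ADE."""
    if r_mz <= 2 * r_dz:
        a2 = 2 * (r_mz - r_dz)
        c2 = 2 * r_dz - r_mz
        return {"model": "ACE", "a2": a2, "c2": c2, "d2": 0.0, "e2": 1 - r_mz}
    a2 = 4 * r_dz - r_mz
    d2 = 2 * r_mz - 4 * r_dz
    return {"model": "ADE", "a2": a2, "c2": 0.0, "d2": d2, "e2": 1 - r_mz}
```

Because C raises r_DZ relative to r_MZ/2 while D lowers it, a true C and a true D partly cancel in 2r_DZ − r_MZ, and either can be hidden by the other.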

Third, and related to the second, heritability estimates from twin and family studies may be inflated by the common environment of family members, because of deviations from the assumptions that underpin the classical twin design, such as equal common environmental influences for monozygotic and dizygotic twins, although credible arguments against this special ‘MZ twin environment’ have been made.64 In this study, the variance explained by all SNPs is estimated from such distant genetic relationships that confounding with common environment is unlikely. Furthermore, it is common practice in twin and family studies to ‘drop’ nonsignificant variance components from the model and to report only the remaining significant components. In this way, the common-environmental term is often dropped; but if it truly contributes to variance between individuals, and its nonsignificance reflects limited sample size and power, the reported heritability estimates may be biased upwards and their s.e. biased downwards.

Twin and family studies show that heritability estimates for neuroticism and extraversion are moderate, with no major differences between the two dimensions. In the current design, however, we observe a larger estimate of variance explained for extraversion than for neuroticism. A possible explanation for this discrepancy is that the phenotypic and biological complexity underlying neuroticism is greater than that underlying extraversion, diluting associations between genetic polymorphisms and neuroticism.65 The neuroticism dimension includes facets related to both anxiety and depression, which, although genetically correlated,66 may reflect multiple biological dimensions. Alternatively, the lower estimate of variance explained by SNPs for neuroticism may reflect a differential impact of some of the factors listed above. All of these explanations, however, remain speculative.

The observed difference between the variance explained in a sample of ‘unrelated’ individuals (that is, not affected by common environmental influences or pedigree structure) and the variance explained in a sample of ‘related’ individuals (that is, possibly affected by common environmental influences and/or pedigree structure), especially for neuroticism, may shed light on the poor outcomes reported to date from GWAS of major depressive disorder.67 Future research may consider environmental factors that could dilute associations between genetic polymorphisms and the trait in the classic GWAS design.

In summary, a significant proportion of the phenotypic variance of neuroticism and extraversion can be explained by unknown causal variants in LD with common SNPs. This means that some of the so-called missing heritability of neuroticism and extraversion can be found in many common variants of small effect. The remaining discrepancy between our estimates from all SNPs and the reported estimates of narrow-sense heritability from pedigree studies might be attributed to (i) a large(r) role of rare variants, (ii) nonadditive effects (dominance and epistasis) and/or (iii) environmental influences that are included in heritability estimates from pedigree studies but not in heritability estimates based on SNP data. Large samples with genome-wide sequencing data, which are likely to become available with current developments in sequencing technologies, may reveal the contribution of rare variants to the heritability of neuroticism and extraversion. Applying the current method to sequencing data would not only allow quantification of the contribution of rare variants but should also be more conclusive about the total contribution of additive genetic effects to phenotypic variation in personality traits.