Introduction

Human personality is a compound of complex traits that are associated with several psychiatric,1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 and somatic disorders.17, 18, 19, 20, 21, 22, 23, 24, 25, 26 Despite a high heritability, ranging from 33 to 60%, our understanding of the genetic origins of variation in personality traits is extremely limited. Although linkage analyses have identified several large regions, very few of them overlap.27 Candidate gene studies have their own limitations, as their findings have generally not been replicated.27 A recent genome-wide association study (GWAS), including up to 18 000 individuals, yielded only a few loci that attained genome-wide significance.28

The general success of GWAS in gene discovery, together with the failure to replicate most significant linkage peaks for complex traits and diseases, shifted interest towards GWAS. Association studies have benefited from precise localization of loci, reproducibility of findings and the availability of large population-based cohorts, which have enabled powerful studies for gene discovery as well as replication through meta-analyses. However, GWAS have not been very successful in finding genes for complex psychiatric and behavioral traits. The ‘case of the missing heritability’29 has led to the view that common traits may be driven by relatively rare variants.30 Although rare variants are only weakly tagged by the common single-nucleotide polymorphisms (SNPs) present on current arrays,30, 31 association signals in GWAS at loci that include rare variants have been observed, in particular for lipids.29 Also, two recent papers argued from a theoretical perspective that common variants may tag rarer ones,30, 31 implying that linkage and association analyses should be able to identify the same loci. For complex traits, it has been suggested that genome-wide linkage and association analyses be combined in two steps to maximize power.32 Linkage can point to the regions of the GWAS in which to look for associated variants, reducing the number of tests and relaxing the significance threshold, and thus increasing the chance of finding genetic variants.

In this study, we aimed to discover genetic variants, both rare and common, that affect the personality traits neuroticism, extraversion, openness, agreeableness and conscientiousness. We first performed a quantitative trait linkage analysis in four independent cohorts, assuming that this would pick up signals at loci containing relatively rare (population-specific) variants with large effects. We then combined these scans in a meta-analysis, assuming that the meta-analysis would pick up linkage signals at loci harboring genes with moderate to small effects. Finally, we compared the findings of our meta-analysis of linkage scans with those of the largest meta-analysis of GWAS of the same traits (which included the linkage samples examined)33 to localize the possibly linked genes. This is by far the largest linkage study of the NEO personality traits, and it is combined with the largest GWAS of these traits conducted to date.

Materials and methods

We performed a meta-analysis of linkage studies that used the NEO Five-Factor Inventory (NEO-FFI) or the Revised NEO Personality Inventory (NEO-PI-R)34 to assess the five basic personality traits: (1) neuroticism, the tendency to experience negative emotions; (2) extraversion, a measure of sociability, positive emotions and action; (3) openness, a measure of intellectual curiosity and preference for variety; (4) agreeableness, a measure of altruism, cooperation and harmony; and (5) conscientiousness, a measure of an individual’s tendency to plan, organize and direct their impulses.34 This five-factor model is hierarchical: in the NEO-PI-R, each of the five traits is defined by six underlying facets. These five traits, also known as the Big Five, are considered universal,35 stable in adulthood36 and orthogonal, although correlations between them are observed, possibly because of self-report.37 Women generally score higher than men on neuroticism and agreeableness.38, 39, 40 The NEO-FFI consists of 60 items, 12 for each trait, whereas the NEO-PI-R consists of 240 items, including those of the NEO-FFI. This study included linkage scans from four independent populations: the Erasmus Rucphen Family (ERF) study, the Netherlands Twin Register (NTR), the Australian adolescent sample (QIMR_adolescent) and the Australian adult sample (QIMR_adult) (Table 1). ERF and NTR used the NEO-FFI for personality assessment, whereas the Australian samples were assessed with either the NEO-FFI or the NEO-PI-R; for this study, however, only the 60 NEO-FFI items were used for the final assessment. Within each cohort, scores were considered invalid if an individual answered fewer than nine questions; otherwise, missing items were imputed with the individual’s average score on the valid items of that dimension. Descriptive statistics of the study samples are provided in Supplementary Table 1.
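The missing-item rule above can be sketched as a small scoring function. This is an illustration, not the cohorts' scoring code: it assumes the "fewer than nine questions" rule applies per trait (out of its 12 NEO-FFI items), that item responses are already reverse-keyed where needed, and that a trait score is the sum of its items; `score_trait` and the constants are hypothetical names.

```python
from typing import Optional, Sequence

ITEMS_PER_TRAIT = 12  # NEO-FFI: 60 items, 12 per trait
MIN_ANSWERED = 9      # fewer than nine answered items -> invalid score (assumed per trait)


def score_trait(responses: Sequence[Optional[float]]) -> Optional[float]:
    """Sum score for one NEO-FFI trait with person-mean imputation of missing items.

    `responses` holds the item scores for one trait, already reverse-keyed where
    needed; None marks an unanswered item. Returns None if the score is invalid.
    """
    if len(responses) != ITEMS_PER_TRAIT:
        raise ValueError(f"expected {ITEMS_PER_TRAIT} items, got {len(responses)}")
    answered = [r for r in responses if r is not None]
    if len(answered) < MIN_ANSWERED:
        return None
    person_mean = sum(answered) / len(answered)
    # each missing item is replaced by the individual's mean over the valid items
    return sum(r if r is not None else person_mean for r in responses)


# hypothetical answers on a 0-4 scale; two items left blank
print(score_trait([4, 3, 2, 4, 3, 2, 4, 3, 2, 4, None, None]))
```

Mean imputation of this kind is equivalent to prorating the observed sum to 12 items, so a respondent with a few blanks is not penalized with an artificially low score.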
Variance component (VC) linkage analysis, adjusted for age and gender, was performed in each cohort with MERLIN (Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA) (Table 1). A power analysis for a quantitative trait locus (QTL) explaining 1, 5 and 10% of the trait variance was also performed in each cohort with POLY (Chen and Abecasis)41 at fixed type I error rates of 1 and 5%.
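Linkage results in this study are reported as LOD scores compared against fixed thresholds (such as 1.9 and 3.3). As a hedged aid to interpreting those numbers, the sketch below uses the standard asymptotic result for a single variance component tested on its boundary: the statistic 2·ln(10)·LOD is taken to follow a 50:50 mixture of χ²₀ and χ²₁ under no linkage. This conversion is an assumption added for illustration; it is not how the cohort-specific thresholds in this paper were derived, and `lod_to_pointwise_p` is a hypothetical name.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Asymptotic point-wise P-value for a variance-component linkage LOD score.

    Assumes a single QTL variance component tested on its boundary (zero), so under
    the null the statistic 2*ln(10)*LOD follows a 50:50 mixture of chi2_0 and chi2_1.
    """
    if lod <= 0:
        return 1.0  # every null statistic is >= 0, so P(T >= 0) = 1
    chi2 = 2.0 * math.log(10.0) * lod
    # for one degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2))
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


for threshold in (1.9, 3.3):
    print(threshold, lod_to_pointwise_p(threshold))
```

Only the standard library is needed because the χ²₁ tail probability can be written with `math.erfc`. The values are point-wise; genome-wide thresholds additionally account for the number of effectively independent tests in a scan.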

Table 1 General features of the independent genome scans

Erasmus Rucphen Family

The study sample consisted of 2657 individuals who participated in the ERF study.42 The study population is essentially one extended family spanning over 23 generations and including more than 23 000 individuals descended from 20 related couples who lived in the region between 1850 and 1900. All descendants were ascertained, and those aged 18 years and older were invited to participate. Spouses were invited only if the family member had children aged 18 years or older.

For all participants, genomic DNA was extracted from peripheral venous blood using the salting-out method.43 For the genome-wide linkage analysis, genotyping was performed with the Illumina 6K linkage panel, whose markers are distributed evenly across the human genome (median inter-marker distance 301 kb). Of the 6000 SNPs, we used 5250 after quality control and exclusion of X-chromosomal SNPs. Genotyping was performed at the Centre National de Génotypage in France according to the standard protocol.

Multipoint VC linkage analysis of all five personality traits was performed with MERLIN v.1.0.1 software.44 Marker allele frequencies were estimated from the data. The pedigrees were split into non-overlapping fragments of no more than 18 bits with two programs, Jenti45 and PedSTR.46 Using different parameters in these programs, we obtained three sets of subpedigrees, which were analyzed separately. The three sets differed from one another in the number, size and structure of the pedigree fragments. However, they yielded similar LOD profiles for all analyzed traits (Spearman’s correlations ranged from 0.6 (P<0.001) to 0.8 (P<0.001)), which allowed us to use the maximum of the three LOD scores at each marker locus.47 Based on Bonferroni correction, the suggestive and significant thresholds were estimated as 2.34 and 3.75, respectively. The analysis was based on 2244 genotyped and phenotyped persons from ERF.
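The 18-bit limit refers to the complexity of exact multipoint (Lander-Green) likelihood computation, which grows as 2 to the power of the pedigree's bit size. The sketch below uses the commonly quoted measure 2n − f (n non-founders, f founders) to check whether a fragment fits. It is an assumption-laden illustration: MERLIN's internal accounting can be smaller (for example, by exploiting founder-couple symmetry), and the sketch does not reproduce how Jenti or PedSTR choose where to split a pedigree. `Person`, `bit_size` and `fits_limit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

MAX_BITS = 18  # fragment limit used when splitting the ERF pedigree


@dataclass(frozen=True)
class Person:
    pid: str
    father: Optional[str] = None  # both parents None -> founder
    mother: Optional[str] = None


def bit_size(fragment: Iterable[Person]) -> int:
    """Lander-Green complexity 2n - f (n = non-founders, f = founders)."""
    people = list(fragment)
    founders = sum(1 for p in people if p.father is None and p.mother is None)
    non_founders = len(people) - founders
    return 2 * non_founders - founders


def fits_limit(fragment: Iterable[Person], max_bits: int = MAX_BITS) -> bool:
    return bit_size(fragment) <= max_bits


# hypothetical nuclear family: two founders and three children -> 2*3 - 2 = 4 bits
nuclear = [Person("F"), Person("M")] + [Person(f"C{i}", "F", "M") for i in range(3)]
print(bit_size(nuclear), fits_limit(nuclear))  # 4 True
```

Because each extra non-founder adds two bits, a large pedigree like ERF quickly exceeds any practical limit, which is why it must be cut into fragments before exact multipoint analysis.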

Netherlands Twin Register

The NTR48 sample consists of 711 families with 3412 non-clone individuals (1438 founders, 1870 females), with an average of 4.8 subjects per family. In all, 282 of these families had both founders genotyped and 138 had one genotyped founder. In addition, there were 290 nuclear families with no genotyped founders and one extended pedigree with four ungenotyped founders. The autosomal genome scan comprised 757 markers spaced at an average of 4.76 cM (range 0.0–20.59 cM), with an average heterozygosity of 0.76. Founders had genotype data for 446 autosomal microsatellites. NEO measures together with age and gender information were available for a total of 1507 subjects (998 non-founders; 509 founders) from 409 families with genetic data. Of these 409 families, 270 had two phenotyped siblings, 113 had three, 19 had four, 1 had five, 4 had six and 2 had seven, resulting in a total of 835 sibling pairs. In addition, the sample included 83 phenotyped MZ clones.
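The NTR sibship counts are internally consistent, which can be checked with a few lines of arithmetic: a sibship of k phenotyped siblings contributes k(k − 1)/2 sibling pairs. The dictionary below simply restates the family counts given in the text; the variable names are illustrative.

```python
from math import comb

# phenotyped siblings per family -> number of NTR families (counts from the text)
sibship_sizes = {2: 270, 3: 113, 4: 19, 5: 1, 6: 4, 7: 2}

families = sum(sibship_sizes.values())
siblings = sum(k * n for k, n in sibship_sizes.items())
sib_pairs = sum(comb(k, 2) * n for k, n in sibship_sizes.items())  # k choose 2 pairs per sibship
print(families, siblings, sib_pairs)  # 409 998 835
```

The result reproduces the 409 families and 835 sibling pairs reported, and gives 998 phenotyped siblings in total.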

The genetic maps were obtained through the Rutgers University Map Interpolator.49 The allele frequencies were estimated with the Mendel v.10.0 software package (Kenneth Lange, University of California, CA, USA).50 VC linkage scan of the autosomal genome was conducted with MERLIN v.1.1.2.

Queensland Institute of Medical Research, Australian Study Sample

NEO personality data (NEO-PI-R or NEO-FFI) were collected as part of two independent research streams, one focused on adults (QIMR_adult) and the other on adolescents and young adults (QIMR_adolescent). The QIMR_adult data were collected as part of the Nicotine Addiction Genetics (NAG) study (2001–2005), which targeted families of heavy-smoking index cases identified in earlier interviews and questionnaires51 and was itself part of the Interactive Research Project Grants (IRPG). This sample comprised 1349 genotyped individuals aged 21 to 85 years (M=45.5±13.1) from 519 families and included 15 complete MZ pairs whose data were averaged.

The QIMR_adolescent data were collected from two population studies under the umbrella of the Brisbane Adolescent and Young Adult Twin Studies, specifically, studies of cognition (1996–ongoing)52 and health and well-being (2002–2003),52 and from a study of borderline personality disorder (2004–2006).53 This sample comprised 1096 genotyped individuals aged 16 to 27 years (M=19.4±2.7), from 563 families, and included 127 complete MZ pairs for whom data were averaged.

Participants were typically Caucasian, predominantly Anglo-Celtic (ancestry outliers, identified using HapMap3 and GenomeEUTwin individuals as a reference panel, were excluded). Written, informed consent was obtained from all participants and from a parent or guardian for those aged under 18 years. Ethics approval was received from the institutional review boards appropriate to each study (QIMR and Washington University School of Medicine).

Genotyping was carried out on the Illumina (San Diego, CA, USA) 610K or 370K SNP platforms with Illumina BeadStudio software; 269 840 SNPs common to the subsamples passed quality control (QC) (28% of the SNPs selected for linkage were from this set).54 Data were imputed to HapMap I+II (CEU, build 36, release 22) with the MACH software (Center for Statistical Genetics, Ann Arbor, MI, USA).54 SNPs for the linkage analysis were selected to match the SNP set used in the ERF sample as closely as possible. The final selection contained 5479 SNPs, of which 5181 had a direct match in our genotyped or imputed data. For the remaining 298 SNPs, proxies selected on the basis of linkage disequilibrium (>0.8) or position were used. Multipoint VC linkage analysis was performed with MERLIN v.1.1.2 software44 in both samples.

Meta-analysis of linkage scans

Results from the individual genome-wide linkage scans were combined in a meta-analysis using the Genome Search Meta-Analysis (GSMA) method.55, 56 GSMA divides the genome into bins of equal width (chosen arbitrarily, but such that the smallest chromosome has at least two bins) and is robust to differences between studies in ascertainment schemes, marker maps and the statistical methods used to detect linkage. Customarily, a bin width of 30 cM is used, which gives 120 bins across the autosomal genome. Within each study, the bins are then ranked by their maximum LOD score, with the bin with the highest LOD score receiving the highest rank. The ranks for a bin are summed across studies to obtain a summed rank (SR). The SR statistic is tested for significance using its distribution function or by simulation,55 which gives the probability of observing a given SR for a bin (PSR). PSR gives only the point-wise probability of the SR for a given bin. A genome-wide interpretation of the results is obtained through the ordered statistic (OR), which gives the probability (POR) of obtaining a given SR for a bin by chance when ranks are assigned to bins at random in repeated simulations.57 Simulations show that a bin with both a significant PSR and a significant POR (PSR<0.05 and POR<0.05) has a high probability of containing a true susceptibility locus.57 For an individual bin, genome-wide significance is defined as PSR<0.05/number of bins and suggestive significance as PSR<1/number of bins. For the individual scans, we used LOD thresholds of 3.3 for significant and 1.9 for suggestive linkage.58
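The core GSMA steps described above (rank bins within each study, sum ranks across studies, estimate PSR by simulation) can be sketched as follows. This is a minimal unweighted illustration under stated assumptions, not the GSMA software: ties share the average rank, every study uses the same bins, and under the null a bin's rank in each study is uniform and independent across studies. The ordered statistic (POR) is not implemented. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def rank_bins(bin_lods: Sequence[float]) -> List[float]:
    """Rank the bins of one study: highest maximum LOD gets the highest rank; ties share the average rank."""
    order = sorted(range(len(bin_lods)), key=lambda i: bin_lods[i])
    ranks = [0.0] * len(bin_lods)
    start = 0
    while start < len(order):
        end = start
        # extend the tie group while the next bin has the same LOD
        while end + 1 < len(order) and bin_lods[order[end + 1]] == bin_lods[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def summed_ranks(studies: Sequence[Sequence[float]]) -> List[float]:
    """SR for each bin: the bin's rank summed over studies (all studies use the same bins)."""
    per_study = [rank_bins(lods) for lods in studies]
    return [sum(r[b] for r in per_study) for b in range(len(studies[0]))]


def p_sr(observed: float, n_bins: int, n_studies: int, n_sim: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo point-wise P(SR >= observed) under random ranking.

    Under the null, a given bin's rank in each study is uniform on 1..n_bins and
    independent across studies, so one bin's SR can be simulated directly.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        sr = sum(rng.randint(1, n_bins) for _ in range(n_studies))
        hits += sr >= observed
    return (hits + 1) / (n_sim + 1)  # add-one estimate avoids reporting P = 0


# two hypothetical studies, three bins each (maximum LOD per bin)
print(summed_ranks([[0.1, 2.0, 0.5], [0.3, 1.0, 0.2]]))  # [3.0, 6.0, 3.0]
```

Working on ranks rather than LOD values is what makes the method tolerant of different marker maps and linkage statistics across studies, at the cost of discarding how strong an individual peak is.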

In this study, the genome-wide linkage results of all samples except NTR were reported at SNP markers, which allowed us to map them directly to base pair positions. For NTR, we mapped all results to base pair positions by interpolation, using the base pair positions from the Rutgers map.49 For each study, we divided the autosomal genome into 125 bins of 25 million base pairs (mbp) each, giving a maximum of 10 bins on chromosome 1 and a minimum of 2 bins on chromosome 22. To evaluate possible correlation between adjacent bins, we also repeated the analysis with a bin width of 40 mbp. We performed a weighted meta-analysis in which each study was weighted by the square root of its sample size. A total of 10 000 permutations were performed to obtain PSR and POR. A bin was considered significant if either (1) Bonferroni-corrected significance was achieved, that is, PSR<0.0004 (0.05/125), or (2) both PSR and POR were nominally significant for multiple bins.57 The suggestive linkage threshold was set at PSR<0.008 (1/125). Heterogeneity testing was performed with the HEGESMA59 software (School of Medicine, University of Thessaly, Thessaly, Greece).
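The physical binning and weighting choices above can be sketched as follows. Several details are assumptions not stated in the text: bins are taken to start at 0 bp on each chromosome (so that bin number 4 on chromosome 10 would correspond to a label such as "bin 10.4"), a bin's score is the maximum LOD within it, and the √N weights are rescaled to average 1. The sample sizes in the example are the analyzed or genotyped counts quoted in the Materials and are used only for illustration; the exact N the authors used for weighting is not stated here. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

BIN_WIDTH_BP = 25_000_000  # 25 mbp bins, as used in this study


def n_bins(last_marker_bp: int, width: int = BIN_WIDTH_BP) -> int:
    """Number of bins needed to cover a chromosome up to its last reported marker."""
    return max(1, math.ceil(last_marker_bp / width))


def bin_maxima(results: Sequence[Tuple[int, int, float]],
               width: int = BIN_WIDTH_BP) -> Dict[Tuple[int, int], float]:
    """Maximum LOD per (chromosome, bin) from (chromosome, bp, LOD) rows; bin 1 starts at 0 bp."""
    best: Dict[Tuple[int, int], float] = {}
    for chrom, bp, lod in results:
        key = (chrom, bp // width + 1)  # assumed to match labels such as "bin 10.4"
        best[key] = max(lod, best.get(key, float("-inf")))
    return best


def sqrt_n_weights(sample_sizes: Sequence[int]) -> List[float]:
    """Study weights proportional to sqrt(N), rescaled to average 1 (rescaling is an assumption)."""
    raw = [math.sqrt(n) for n in sample_sizes]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_sr(ranks_by_study: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted summed rank per bin."""
    return [sum(w * ranks[b] for w, ranks in zip(weights, ranks_by_study))
            for b in range(len(ranks_by_study[0]))]


weights = sqrt_n_weights([2244, 1507, 1349, 1096])  # ERF, NTR, QIMR_adult, QIMR_adolescent (illustrative N)
N_BINS = 125
print(round(0.05 / N_BINS, 4), 1 / N_BINS)  # 0.0004 0.008
```

With 25 mbp bins, a last chromosome 22 marker at 49 mbp yields two bins, and the Bonferroni (0.05/125) and suggestive (1/125) cut-offs reproduce the PSR thresholds of 0.0004 and 0.008 used here. A weighted PSR would be simulated in the same way as the unweighted one, drawing each study's random rank and multiplying it by that study's weight.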

In an attempt to identify the variants that might underlie the linkage signals, we fine-mapped the interesting regions from the meta-analysis of linkage scans using the results of a meta-analysis of GWAS of the NEO personality traits.28 The meta-analysis of GWAS (n>17 000) included the samples used in this study (and many others), but the marker sets differed. A brief description of the studies included in the meta-analysis of GWAS is provided in Supplementary Table 2.
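The fine-mapping step corrects association P-values for the total number of SNPs falling in a linkage bin. A minimal sketch of that correction is given below, assuming GWAS summary statistics with chromosome, base pair position and P-value, and half-open bin boundaries. The SNP identifiers and positions are hypothetical toy values, not results from the study, and `Snp` and `bin_bonferroni` are illustrative names.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Snp(NamedTuple):
    snp_id: str
    chrom: int
    bp: int
    p: float


def bin_bonferroni(gwas: Sequence[Snp], chrom: int, start_bp: int, end_bp: int,
                   alpha: float = 0.05) -> Tuple[float, List[Snp]]:
    """Bonferroni threshold for the SNPs in one linkage bin and the SNPs that pass it.

    The threshold divides alpha by the total number of SNPs in [start_bp, end_bp),
    not by the number of independent SNPs, so it is conservative under LD.
    """
    in_bin = [s for s in gwas if s.chrom == chrom and start_bp <= s.bp < end_bp]
    if not in_bin:
        return alpha, []
    threshold = alpha / len(in_bin)
    hits = sorted((s for s in in_bin if s.p < threshold), key=lambda s: s.p)
    return threshold, hits


# hypothetical summary statistics; with bins starting at 0 bp, bin 11.6 spans 125-150 mbp
toy = [Snp("snp1", 11, 126_000_000, 1e-3), Snp("snp2", 11, 128_000_000, 0.2),
       Snp("snp3", 11, 131_000_000, 0.05), Snp("snp4", 11, 133_000_000, 0.01),
       Snp("snp5", 10, 128_000_000, 1e-9)]
threshold, hits = bin_bonferroni(toy, 11, 125_000_000, 150_000_000)
print(threshold, [s.snp_id for s in hits])
```

Restricting the correction to one bin, instead of the whole genome, is what gives the two-step linkage-then-association strategy its gain in power: a 25 mbp bin contains far fewer SNPs than a genome-wide panel.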

Results

Individual scans

Results of the power analysis are provided in Supplementary Table 3. The results of the individual genome-wide linkage scans are shown in Supplementary Figures 1–5, and significant and suggestive findings are summarized in Supplementary Table 4. Significant evidence of linkage was observed for neuroticism at chromosome 3p14 in the ERF study (rs1490265, LOD=4.67) and at chromosome 19q13 in the QIMR_adolescent sample (rs628604, LOD=3.55); for extraversion at 14q32 in the NTR sample (ATGG002, LOD=3.3); and for agreeableness at 3p25 (rs709160, LOD=3.67) and at two adjacent regions on chromosome 15, 15q13 (rs970408, LOD=4.07) and 15q14 (rs1055356, LOD=3.52), in the QIMR_adolescent sample. Among the suggestive findings, there was an overlap at chromosome 2q14 between ERF (LOD=2.1) and NTR (LOD=2.03) for neuroticism and at 12q23 between ERF (LOD=2.85) and NTR (LOD=1.96) for openness (Supplementary Table 4).

Meta-analysis

Genome-wide results of the meta-analysis are illustrated in Figure 1. Table 2 provides a summary of the bins with significant evidence of linkage. None of the bins crossed the Bonferroni-corrected genome-wide significance threshold. However, there were multiple bins for which both PSR and POR were nominally significant (Table 2 and Figure 1).

Figure 1

Results of the meta-analysis of the linkage scans for (a) neuroticism, (b) extraversion, (c) openness, (d) agreeableness and (e) conscientiousness. Dots represent PSR and triangles represent POR. The X axis shows the autosomal genome, divided into chromosomes by solid vertical lines and further into bins by dotted gray lines. The Y axis shows the negative logarithm of the P-values. The red horizontal dotted line represents the nominal threshold (P=0.05), the gray horizontal dotted line the suggestive threshold (P=0.008) and the light blue horizontal dotted line the Bonferroni-corrected threshold (P=0.0004). A bin is considered significant if its dot lies above the light blue line or if both its dot and its triangle lie above the red dotted line. Green diamonds represent the P-values, from the meta-analysis of GWAS of the same traits, of SNPs falling in the bins of interest. Green diamonds with a red outline mark P-values that are significant or borderline significant after correction for the total number of SNPs in the bin.

Table 2 Comparison of interesting regions of linkage meta-analysis with the personality GWAS meta-analysis

For extraversion, five bins showed nominally significant PSR and POR. These included bins 9.6 and 11.5 and two adjacent bins on chromosome 10 (10.4 and 10.5). The linkage signals in the adjacent bins on chromosome 10 were caused by the same peak, which extended over 40 cM (Supplementary Figure S2). When the bin width was increased to 40 mbp, the new bin on chromosome 10, covering the former bin 10.4 and part of bin 10.5, remained significant, and the adjacent bin showed no linkage signal. Comparing the significant bins with the meta-analysis of GWAS, we identified a cluster of SNPs with low P-values in the bin on chromosome 10 (rs7088779, P=4.2 × 10⁻⁶) (Table 2 and Figure 1b). SNP rs7088779 was marginally significant after correction for the number of SNPs in the bin. It is located between CRTAC1 (cartilage acidic protein 1) and C10orf28, a region previously implicated in Alzheimer’s disease.

For openness, five bins (9.1, 11.6, 15.4 and 19.3) were significant in that they showed both PSR and POR<0.05 (Table 2 and Figure 1c). The GWAS results revealed a cluster of eight SNPs with very low P-values in bin 11.6 (Figure 1c), which maps to the 11q24 region. The most significant SNP, rs677035 (P=2.6 × 10⁻⁶), passed the Bonferroni threshold. Rs677035 is an intergenic SNP located between FLI1 and KCNJ1.

Two bins (4.8 and 19.1) showed nominally significant PSR and POR for agreeableness but were not supported by association (Table 2 and Figure 1d). Heterogeneity between studies was detected for bin 4.8. None of the bins for neuroticism or conscientiousness showed significant linkage (Figure 1a and e).

Discussion

In the individual scans, we found significant evidence of linkage of neuroticism to chromosomes 3p14 (rs1490265, LOD=4.67) and 19q13 (rs628604, LOD=3.55); of extraversion to 14q32 (ATGG002, LOD=3.3); and of agreeableness to 3p25 (rs709160, LOD=3.67) and to two adjacent regions on chromosome 15, 15q13 (rs970408, LOD=4.07) and 15q14 (rs1055356, LOD=3.52). In the meta-analysis, we identified one region, on chromosome 11q24, that was significantly linked to openness and was also strongly supported by the results of the largest GWAS of personality traits performed to date.

Our meta-analysis included 6149 individuals from multiple extended families and from families with sibships. Several methodological issues are relevant to the interpretation of the findings. We used a physical map (base pair positions) and a bin width of 25 mbp (roughly 25 cM), as opposed to the traditional 30 cM bin width used in all previous studies. We chose 25 mbp after considering the genetic maps of all four studies, especially the position of the last reported marker on chromosome 22 (49 mbp); this width gives two bins of about equal size on the smallest chromosome, as required, and thus avoids manipulation of the data. However, this choice could lead to correlation between adjacent bins. Interestingly, such correlation was observed only for extraversion, in two pairs of adjacent bins on chromosomes 9 and 10. The linkage peak on chromosome 9 extends over about 50 cM (Supplementary Figure S2), implying that even a bin width of more than 30 cM could not have removed this correlation. For chromosome 10, the signal remained significant after the bin width was increased to 40 mbp, which suggests that our result on 10q24 for extraversion is robust. GSMA yielded broad linkage regions spanning 25 mbp, but we attempted to localize the susceptibility genes using additional information from the meta-analysis of GWAS.

Interestingly, none of the significant regions from the individual scans showed any evidence of linkage in the meta-analysis. There may be several explanations. First, the significant findings of the individual studies may simply be false positives. This is possible, but it is difficult to believe that all significant findings are false, as VC linkage analysis is usually robust in detecting linkage.60 Second, this may be due to differences in power between the studies (see Supplementary Table 3). For example, linkage of neuroticism to 3p14 may have been detected only in the ERF study because of its size, as it was twice as large as any other included study. Third, and more likely, there may be locus heterogeneity across populations. High locus heterogeneity, which results in inconsistent linkage peaks, is a characteristic of complex traits such as personality. However, the heterogeneity analysis did not provide evidence of heterogeneity at this locus. Moreover, the rank-based test used for the meta-analysis of linkage scans is insensitive to the significance of a linkage peak in an individual study and is better suited to finding subtle linkage peaks present in all studies included in the meta-analysis. For instance, the meta-analysis did not pick up the overlap at chromosome 2q14 between ERF (LOD=2.01) and NTR (LOD=2.03) for neuroticism, even though both peaks fall in the same bin. Similarly, suggestive linkage for openness in two pairs of adjacent regions, 11q25 in ERF (LOD=2.05) and 11q24 in QIMR_adolescent (LOD=2.55), and 12q23 in ERF (LOD=2.85) and 12q24 in NTR (LOD=1.96), was not picked up in the meta-analysis. Similar results have been observed in studies that used parametric methods (Fisher’s method) to meta-analyze genome-wide linkage scans.61
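Why a rank-based meta-analysis can overlook a strong single-study peak is easy to see with hypothetical numbers. With 125 bins and four studies, a bin's expected SR under random ranking is 4 × 126/2 = 252. The ranks below are invented for illustration only: a bin that is the top bin in one scan but unremarkable elsewhere barely exceeds that expectation, whereas a bin that is never the top bin but ranks consistently high stands out.

```python
N_BINS = 125
null_mean_sr = 4 * (N_BINS + 1) / 2   # expected SR of one bin over four studies under random ranking

# hypothetical ranks (125 = highest LOD in that study) for two bins across four studies
strong_in_one = [125, 40, 50, 45]     # top bin in one scan, unremarkable in the others
modest_in_all = [100, 95, 98, 102]    # never the top bin, but consistently high

print(null_mean_sr, sum(strong_in_one), sum(modest_in_all))  # 252.0 260 395
```

The rank of 125 carries the same weight whether the underlying LOD is 2 or 5, so the magnitude of an individual study's peak is lost once it is converted to a rank.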

The most interesting finding of the meta-analysis is the 11q24 region, which showed nominally significant PSR and POR as well as significant association in the GWAS after adjustment for multiple testing. The 11q24 region has been implicated in mental retardation62 and migraine.63 The SNP significant in the GWAS, rs677035, is located between FLI1 and KCNJ1. We have previously found linkage to other potassium channel genes, including KCNJ2, KCNJ6 and KCNJ16,27 which makes KCNJ1 the most interesting gene in the region. Another region of interest is 10q24 for extraversion. This region showed both PSR and POR<0.05 and, in addition, borderline significant evidence of association in the meta-analysis of GWAS after adjustment for multiple testing. The region has been implicated in spinocerebellar ataxia,64 Alzheimer’s disease and cognitive function.65, 66

We cannot dismiss as false positives the other regions that showed a high probability of containing true susceptibility loci in the linkage analyses merely because their association results in the GWAS were not significant. First, our correction for multiple testing of association within a linkage region was based on the total number of SNPs in the region (rather than the number of independent SNPs), which made the Bonferroni threshold very conservative. Second, the power of linkage and association analyses to detect a QTL depends on the effect size of the QTL and on the linkage disequilibrium between the QTL and the candidate locus:32 linkage analysis is more powerful for detecting relatively rare loci with large effects, whereas association analysis is more powerful for finding common loci with subtle effects. The power of association analysis to detect a QTL declines rapidly as the linkage disequilibrium between the QTL and the candidate locus decreases.32 Finally, because the binning of linkage peaks for the meta-analysis is arbitrary and not very precise, we may have missed the association signal entirely.
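To make the point about conservativeness concrete, the sketch below contrasts a Bonferroni threshold based on all SNPs in a region with one based on an effective number of independent tests. The effective-number estimate used here is Nyholt's (2004) eigenvalue method, a swapped-in technique that was not used in this study; it is shown only to illustrate that, when SNPs are in LD, the effective number is smaller than the SNP count and the resulting threshold is less strict. The correlation matrix is a hypothetical toy example, and the function names are illustrative.

```python
import numpy as np


def nyholt_meff(corr: np.ndarray) -> float:
    """Nyholt (2004) effective number of independent tests from an SNP-by-SNP correlation matrix."""
    m = corr.shape[0]
    if m < 2:
        return float(m)
    eigvals = np.linalg.eigvalsh(corr)  # symmetric matrix -> real eigenvalues
    # eigenvalues of a correlation matrix average 1, so this is their sample variance
    var = float(np.sum((eigvals - 1.0) ** 2) / (m - 1))
    return 1.0 + (m - 1) * (1.0 - var / m)


def thresholds(corr: np.ndarray, alpha: float = 0.05) -> tuple:
    """(threshold using all SNPs, threshold using Meff); the first is never larger."""
    m = corr.shape[0]
    return alpha / m, alpha / nyholt_meff(corr)


# hypothetical LD: two blocks of three SNPs with pairwise r = 0.9 inside each block
block = np.full((3, 3), 0.9) + 0.1 * np.eye(3)
corr = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
print(nyholt_meff(corr), thresholds(corr))
```

For independent SNPs the estimate equals the SNP count, and for perfectly correlated SNPs it collapses to one; real regions fall in between, which is why dividing by the total SNP count overcorrects.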

The regions we have identified by linkage are good candidates for sequencing of exomes and regulatory regions, which may unveil variants with moderate to large effects contributing to the make-up of human personality. The Bonferroni-corrected criterion for genome-wide significant linkage in the meta-analyses was not reached in this study. The highest level of significance for linkage that was reached translates into a high probability (of unknown size) that a bin includes a susceptibility locus. Clearly, false negatives may occur with the current methods because of locus heterogeneity. To overcome these problems, we combined the data from the linkage analyses of the five basic NEO personality traits in four independent populations with those of the genome-wide association analyses. Here we relied on classical association analysis and genotype imputation based on HapMap. This approach will be strengthened by imputing the GWAS populations with the 1000 Genomes reference, which provides greater resolution and better coverage of rare variants.