Original Article

Molecular Psychiatry (2016) 21, 1145–1151; doi:10.1038/mp.2015.108; published online 4 August 2015

There is a Corrigendum (21 July 2016) associated with this article.

A genome-wide analysis of putative functional and exonic variation associated with extremely high intelligence

S L Spain1,8, I Pedroso1,8, N Kadeva1, M B Miller2, W G Iacono2, M McGue2, E Stergiakouli3, G D Smith3, M Putallaz4, D Lubinski5, E L Meaburn6, R Plomin7 and M A Simpson1

  1. 1Department of Medical and Molecular Genetics, Division of Genetics and Molecular Medicine, King’s College London, London, UK
  2. 2Department of Psychology, University of Minnesota, Minneapolis, MN, USA
  3. 3MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK
  4. 4Duke University Talent Identification Program, Duke University, Durham, NC, USA
  5. 5Department of Psychology and Human Development, Vanderbilt University, Nashville, TN, USA
  6. 6Department of Psychological Sciences, Birkbeck, University of London, London, UK
  7. 7MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK

Correspondence: Dr MA Simpson, Division of Genetics and Molecular Medicine, King’s College London, Medicine Guy’s Hospital, Great Maze Pond, London SE1 9RT, UK E-mail: michael.simpson@kcl.ac.uk; Professor R Plomin, MRC Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology and Neuroscience, King's College London, DeCrespigny Park, Denmark Hill, London SE5 8AF, UK. E-mail: robert.plomin@kcl.ac.uk

8These authors contributed equally to this work.

Received 17 March 2015; Revised 21 May 2015; Accepted 16 June 2015
Advance online publication 4 August 2015



Although individual differences in intelligence (general cognitive ability) are highly heritable, molecular genetic analyses to date have had limited success in identifying specific loci responsible for its heritability. This study is the first to investigate exome variation in individuals of extremely high intelligence. Under the quantitative genetic model, sampling from the high extreme of the distribution should provide increased power to detect associations. We therefore performed a case–control association analysis with 1409 individuals drawn from the top 0.0003 (IQ >170) of the population distribution of intelligence and 3253 unselected population-based controls. Our analysis focused on putative functional exonic variants assayed on the Illumina HumanExome BeadChip. We did not observe any individual protein-altering variants that are reproducibly associated with extremely high intelligence and within the entire distribution of intelligence. Moreover, no significant associations were found for multiple rare alleles within individual genes. However, analyses using genome-wide similarity between unrelated individuals (genome-wide complex trait analysis) indicate that the genotyped functional protein-altering variation yields a heritability estimate of 17.4% (s.e. 1.7%) based on a liability model. In addition, investigation of nominally significant associations revealed fewer rare alleles associated with extremely high intelligence than would be expected under the null hypothesis. This observation is consistent with the hypothesis that rare functional alleles are more frequently detrimental than beneficial to intelligence.



General cognitive ability, usually called intelligence, indexes the covariance among diverse cognitive tests and is one of the best predictors of important life outcomes such as education, occupation, and mental and physical health and illness.1, 2, 3 Extensive quantitative genetic research consistently indicates that about half of the total variance of intelligence can be accounted for by genetic factors.4, 5

Unlike psychiatric disorders, intelligence is a normally distributed quantitative trait with a positive end of high performance as well as a problematic end of intellectual disability.6 Unlike other normally distributed behavioural traits, such as personality that assess average behaviour across situations, intelligence indexes maximal performance on a battery of tests, analogous to musical ability and athletic ability.7 These features of intelligence make it especially well suited phenotypically as a target for positive genetics—that is, considering the positive end of the normal distribution of genetic effects rather than focusing on the negative effects of genetic mutations.8 The investigation of high intelligence could be of particular utility for finding DNA variants responsible for the heritability of the entire distribution of intelligence. The quantitative genetics model proposes that quantitative traits are influenced by multiple genes distributed as a bell-shaped curve, which implies that extremely high intelligence can be achieved only if an individual has many of the positive alleles and few of the negative alleles that affect intelligence. In other words, a large group of individuals of extremely high intelligence should be enriched for alleles associated with intelligence and thus yield increased power to detect associations.9

This quantitative genetics model could be construed to suggest that the extreme low end of the distribution of intelligence is the mirror image of the extreme high end. However, it seems likely that new mutations can more easily disrupt than improve a finely tuned performance system, which implies that genetic variation responsible for the heritability of the entire distribution of intelligence may be more likely to be found for extremely high rather than extremely low intelligence. Strong empirical support for this hypothesis comes from recent studies based on 360000 sibling pairs and 9000 twin pairs from 3 million 18-year-old males with cognitive assessments administered as part of conscription to military service in Sweden. These studies concluded that extremely high intelligence is as familial and heritable as the rest of the distribution10 but that extremely low intelligence—severe intellectual disability—is not familial or heritable.11 The latter finding, predicted initially in 1938 by Lionel Penrose,12 suggests that severe intellectual disability is aetiologically distinct from the normal distribution of intelligence, which may explain why DNA variants recently found to be associated with severe intellectual disability appear to be rare non-inherited de novo mutations.13, 14, 15

Research to date has had limited success in identifying DNA variants responsible for the heritability of intelligence;16, 17, 18, 19, 20, 21 the ‘missing heritability’ problem, is not unique to intelligence—it is pandemic throughout the life sciences.22 What has been learned from the whirlwind of genome-wide association (GWA) studies during the past few years is that the biggest effect sizes for associations between DNA variants and traits—common disorders as well as quantitative dimensions—are much smaller than expected.23 For example, in a GWA meta-analysis of intelligence for nearly 18000 children, the largest effect size accounts for <0.2% of the variance.16 Similar effect sizes were reported in a recent GWA meta-analysis of intelligence for nearly 54000 adults.21 If the largest effects are so small, the smallest effects will be infinitesimal, which means that they will be difficult to detect in GWA studies and even harder to replicate.24

One strategy for addressing this ‘missing heritability’ problem is a brute-force approach in which larger samples are amassed in order to detect associations of smaller effect size using the common single-nucleotide polymorphisms (SNPs) available on genome-wide DNA arrays.25 An alternative strategy is to consider the role of low frequency and rare alleles. The role of rare variants is well established in monogenic syndromes for which lowered intelligence is a symptom.26, 27, 28 However, it remains unclear to what extent low frequency and rare alleles contribute to the genetic architecture of complex traits like intelligence in the general population, although there have been reports finding no association between intelligence and low frequency copy number variants29, 30, 31 and between intelligence and rarer exonic variants;32 one pilot study reported more minor (<1% minor allele frequency (MAF)) exonic alleles in a higher-IQ group.33

This study capitalises on the increased power of association at the high extreme of intelligence as a strategy to expedite the discovery of alleles contributing to variation in intelligence throughout the distribution. Another novel feature of this study is its focus on low frequency protein-coding variants in exons rather than genome-wide common variants. The combination of these two features is expected to be synergistic because the gain in power that comes from studying the high extreme is particularly pronounced for low frequency and rare alleles.9, 34

For these reasons, we undertook a threshold-selected case–control study by obtaining DNA from individuals representing approximately the top 0.03% of the intelligence distribution and comparing them with controls drawn from an unselected population cohort. Within this extreme sampling design, we investigate the role of common and low frequency protein-coding variation in intelligence.


Materials and methods


Data from four samples were used in this study: a sample of high-intelligence cases, a control sample and two representative samples used to extend our case–control results to individual differences in the population. The project received ethical approval from the King’s College London Research Ethics Committee (reference number PNM/11/12-51) and from the European Research Council Executive Agency (reference number Ares(2012)56321).

High-intelligence cases (HiQ)

Individuals were recruited from the Duke University Talent Identification Program (TIP), a non-profit organisation established in 1980 and dedicated to identifying and fostering the development of academically gifted children35 (see http://www.tip.duke.edu). Individuals were selected from the United States for participation in the HiQ study on the basis of performance on the Scholastic Assessment Test (SAT) or American College Test (ACT) taken at age 12 rather than the usual age of 18 years. A composite that aggregates verbal and mathematics SAT and ACT scores correlates >0.80 with intelligence tests and it is estimated that the TIP program recruits from the top 3% of the intelligence distribution.36 For this study (HiQ), cases were selected from the top 1% of these TIP individuals, representing approximately the top 0.03% of the intelligence distribution. Exome array genotyping data were available for 1759 HiQ individuals who reported their ethnicity as ‘white’. Following genotyping and quality control (see below), the HiQ sample included 1409 individuals (818 males).

Unselected controls (MTFS)

The Minnesota Twin Family Study (MTFS) is a population-based study of twins and their parents recruited from population databases for twins born between 1972 and 1994 in the state of Minnesota in the United States and tested at ages 11 or 17 (refs 37, 38) (see https://mctfr.psych.umn.edu/). Cognitive ability was measured using a short form of the Wechsler Adult Intelligence Scale-Revised (WAIS-R) for participants age 16 or older, or the Wechsler Intelligence Scale for Children-Revised (WISC-R) for those younger than 16. The short form consisted of two verbal subtests (information and vocabulary) and two performance subtests (block design and picture arrangement). An estimate of full-scale IQ was determined by prorating the scaled scores for these four subtests; the effective ceiling for the IQ scores was about 150. We excluded individuals with IQ <70 and those who were not white. We included 3253 unrelated individuals (1500 males) from MTFS for whom exome array genotyping and intelligence data were available.

Extension sample 1

The Twins Early Development Study (TEDS) is a longitudinal UK-based population sample of over 15000 families with twins born in England and Wales between 1994, 1995 and 1996 and identified from birth records39 (see http://www.teds.ac.uk). Individuals were tested at 12 years using two verbal and two nonverbal measures. Test scores were adjusted for age within each testing period, and first principal component scores were derived using principal component analysis implemented in R. The TEDS sample with genotyping and age 12 intelligence data included 4072 unrelated individuals (1780=males). Individuals with severe recurrent medical problems or severe perinatal medical problems were excluded, as were individuals whose first language was not English. Only included individuals who reported their ethnicity as ‘white’ were included.

Extension sample 2

The Avon Longitudinal Study of Parents and Children (ALSPAC) is a prospective birth cohort, which recruited pregnant women with expected delivery dates between April 1991 and December 1992 from Bristol, UK. A total of 14541 pregnant women were initially enrolled with 14062 children born. Detailed information on health and development of children and their parents were collected from regular clinic visits and completion of questionnaires. A detailed description of the cohort has been published previously.40, 41 The study website contains details of all the data that is available through a fully searchable data dictionary: http://www.bris.ac.uk/alspac/researchers/data-access/data-dictionary/. Ethical approval was obtained from the ALSPAC Law and Ethics Committee and the local ethics committees. Intelligence in ALSPAC was measured with a short version of the Wechsler Intelligence Scale for Children (WISC-III) at 8 years of age. Verbal (information, similarities, arithmetic, vocabulary and comprehension) and performance (picture completion, coding, picture arrangement, block design and object assembly) subscales were administered, each subtest was age scaled according to population norms and a summary score for total IQ derived. The sample we used included 6461 unrelated individuals (3209 males) who had been tested for intelligence using a short form of the WISC-III assessment at age 8 years (mean=8.7 years, s.d.=0.30) and whose reported ethnicity was ‘white’.

Genotyping and quality control procedures

DNA from cheek swabs was collected by post from 1759 HiQ individuals and extracted in accordance with standard protocols. DNA was genotyped using the Illumina HumanExome-12v1-1 array according to the manufacturer’s protocols. The control cohort were previously genotyped as part of the MTFS using the Illumina HumanExome-12v1_A array (San Diego, CA, USA). We limited our analyses to 242901 variants present on both versions of the array.

Provisional genotypes were called using Illumina's GenomeStudio v2011.1 software. Sample QC was performed on the HiQ and MTFS cohorts separately using PLINK and R. Samples were removed from subsequent analyses on the basis of call rate (<0.95), suspected non-European ancestry, heterozygosity (±4 s.d. from the mean), array signal intensity (>4 s.d. from the mean) and relatedness (Supplementary Table 1). SNPs were removed based on the criteria of call rate (<0.99), deviations from Hardy–Weinberg equilibrium (P <1 × 10−4) and GenomeStudio cluster separation score (<0.4). Duplicate assays, tri-allelic variants and insertion/deletions were also removed from downstream analysis (Supplementary Table 2). The ZCALL program (see Web resources section) was used to augment the genotype calling for samples and SNPs that passed the initial QC. Only samples that achieved a call rate >99% after ZCALL genotype refinement were included in the analysis.

After a preliminary association analysis, we manually reviewed the cluster plots for 10000 variants (ordered by association P-value) using the EVOKER tool (see Web resources section) and recalled genotypes or removed SNPs where necessary. During this process, 1083 SNPs were removed from the analysis owing to poor clustering. A total of 227858 variants passed QC and were available for analysis (Supplementary Table 2). The genotype data were converted to VCF format for downstream analysis using tools within the EPACTS framework (see Web resources section).

Genotyping in the TEDS and ALSPAC extension samples

Genotyping of rs4713668 was undertaken in the TEDS and ALSPAC samples with a TaqMan SNP genotyping assay (Applied Biosystems, Life Technologies Ltd, Paisley, UK).

Statistical analysis

Discovery study

Single-SNP analysis: Analysis was restricted to the 199039 variants on the HumanExome array that were successfully genotyped and predicted to be functional (non-synonymous, stop gain/loss, start gain/loss or splice site altering) based on annotation performed in EPACTS v3.2.3 using the ‘anno’ option and the default ‘refFlat’ file (refFlat_hg19.txt.gz) (Supplementary Table 3). A total of 114416 variants were polymorphic (minor allele count >0) in the complete data set of HiQ and MTFS individuals, of which 70112 were polymorphic in each of the HiQ and MTFS data sets. In all, 17309 variants (MAF >0.01) on the array were assessed for association with a linear mixed model using the EPACTS v3.2.3 implementation of EMMAX. Population stratification was corrected for using a kinship matrix, which was created with the ‘make-kin’ function in EPACTS using autosomal SNPs with a call rate of >0.99 and MAF >0.01 in both HiQ cases and MTFS controls.

Replication of previously reported cognition-associated variants

Seventeen cognition-related GWA studies were identified in the NHGRI GWAS Catalogue (as of March 2014, Supplementary Table 4), which reported 567 SNPs with P <1 × 10−4. We identified variants on the exome array that were either in linkage disequilibrium (LD; r2 >0.6) or within 250kb of the previously published variant (Supplementary Table 5). Pairwise LD relationships were calculated from 1000 genomes (version 3.20101123) EUR haplotypes using a 500kb window in the vcftools program (see Web resources section).

Extension studies

In order to further investigate suggestive hits (P <5 × 10−5) to emerge from the main case–control single-SNP analysis, we performed a quantitative trait analysis using the full distribution of IQ scores. In the MTFS data set, a kinship matrix was used to correct for population structure and association was assessed using a linear mixed model implemented in EMMAX. In the TEDS and ALSPAC data sets, allelic tests of association were performed using linear regression in PLINK.

Gene-based rare variant analysis

A total of 104644 rare exonic variants (MAF <0.01; see Supplementary Table 3) were tested for association in the discovery case–control sample using a variant-collapsing method implemented within EPACTS v3.2.3. The analysis was performed using two different methods, a burden analysis (emmaxCMC) and the optimal sequence kernel association test (SKAT-O) method (see Web resources section). Both methods, as implemented in EPACTS, use EMMAX to correct for population stratification. Variants within each gene were assigned to groups by function to form three different combinations of variants to be analysed separately: (1) StopGain, Essential Splice Site (ESS) and NonSynonymous (NS), (2) StopGain, ESS and predicted damaging by polyphen2 (PP2) and (3) StopGain and ESS.

Estimation of phenotypic variance explained by genotype data

Kinship matrices were calculated with LDAK software42 using all variants and estimating weights for all SNPs in a chromosome simultaneously. Principal components, used as covariates for heritability estimation, were calculated using the GCTA software43 for variants with MAF >0.05 and polymorphic in both cases and control data sets. We used GCTA to fit a linear mixed model to estimate the fraction of phenotypic variance explained by genotyped variants on the exome array, on both the observed and liability scale. For individual SNP association, we transformed the case–control effect size into the corresponding variance explained on a liability scale.44

Increaser/decreaser ratio

The ratio of the number of associated variants for which the minor allele is at elevated frequency in the high-intelligence cohort to those in which the minor allele is at elevated frequency in the control cohort (increaser/decreaser (I/D) ratio) was calculated based on the method reported recently by Chan et al.45 The beta values for each SNP’s association were recalculated to reflect the effect of the minor allele. The I/D ratio was calculated for different P-value thresholds (0.01 and 0.001) and MAF ranges (0.00–0.01, 0.01–0.05, 0.05–0.15, and 0.15–0.50). SNPs were grouped using the LD-based clumping procedure in PLINKv1.07 (—clump); SNPs that had an r2 >0.1 with a SNP with stronger evidence of association in the locus (defined as a 1MB window) were removed. In all, 1000 permutations of the single-variant association analysis were performed to determine the expected distribution under the null hypothesis and to allow the calculation of P-values for significance. The permuted phenotypes were generated using ‘make-perm-pheno’ function in PLINK, which preserves the case–control sample number. LD pruning and the generation of I/D allele counts and ratios were repeated for each permutation as described above.

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

where μ is the mean and σ is the s.d. of the expected log2 I/D ratios. Squared Z-scores were converted to two-tailed P-values in R using the chi-squared distribution with 1 degree of freedom.



Study design and statistical power

Following genotyping and quality control, the HiQ sample included 1409 individuals selectively sampled from the upper tail of the intelligence distribution and 3253 unselected MTFS controls. A previous study of intelligence in the TIP cohort derived a conversion scale relating SAT scores at age 12 years to the familiar IQ scale (normal distribution with a mean of 100 and s.d. of 15).36 Using this conversion factor, the successfully genotyped HiQ cohort has a mean estimated IQ of 176, ranging from 144 to 214 with 93.6 % of the cohort with estimated IQ scores >170 (Figure 1). This distribution of estimated IQ scores illustrates the extreme sampling of individuals from the high end of the trait distribution in comparison with the unselected MTFS controls (Figure 1). A threshold-selected case–control analysis of this design provides 80% power to detect associated variants explaining >0.0015 of the trait variance assuming an additive model and α=1 × 10−7.

Figure 1.
Figure 1 - Unfortunately we are unable to provide accessible alternative text for this. If you require assistance to access this image, please contact help@nature.com or the author

Extreme sampling of the high cognitive ability cohort. Distribution of general cognitive ability, measured on a standardised IQ scale (mean 100, s.d. 15), of the high cognitive ability cohort (HiQ, n=1409, top panel) and the control cohort (MTFS, n=3253, lower panel). The red lines illustrate the expected distribution of IQ scores with unbiased sampling.

Full figure and legend (67K)Download PowerPoint slide (232 KB)

Single-variant analysis

A case–control association analysis was undertaken with high quality genotypes at 70112 common (>5%) and low frequency (1–5%) protein-altering and splice site SNPs from 1409 individuals from the HiQ cohort and 3253 control subjects. The quantile–quantile plot indicated adequate control for confounders (λGC=1.06, Supplementary Figure 1). We explored whether any loci previously reported to be associated with cognition-related traits showed evidence of association in our data set. For the 567 SNPs that had been previously reported, seven were within 250kb of a SNP that showed evidence of association (PCC <0.01) with intelligence in our data set (Supplementary Table 5). One SNP rs4713668 (P=4.62 × 10−4) identified in our study, was in LD (r2=0.65) with rs3227, which is one of the three variants recently reported with genome-wide significant evidence of association with educational attainment.46

Our single-SNP analysis of 70112 protein-altering and splice site SNPs that were polymorphic in both the HiQ and MTFS data sets did not reveal any associations that satisfied the genome-wide significant threshold (Supplementary Table 6). However, the most strongly associated SNP was a missense variant located in PLXNB2 at 22q13 (rs28379706, c.A952G, p.K318E; PCC=1.31 × 10-5, beta=−0.04±0.009, odds ratio=0.76 (0.69–0.83)), which explains an estimated 0.16% of the variance (Supplementary Table 7). Given the unique sampling design used in the HiQ sample, an equivalent case–control replication cohort is not currently available. However, the quantitative genetic model suggests that variants associated with high intelligence will also associate with individual differences in intelligence throughout the distribution. We therefore sought to evaluate the role of rs28379706 in the normal distribution of intelligence in three independent unselected population cohorts. A nominally significant association with intelligence was observed in the MTFS cohort (PQT=0.02). However, we did not observe a significant association in either the TEDS or ALSPAC cohorts (PQT=0.56 and PQT=0.94, respectively). The sample sizes of these cohorts, together with the small effect size of rs28379706 observed in the case–control analysis, indicates that our power to detect the association is limited in these cohorts. For this reason, it warrants mention that the allelic effect was directionally consistent in the discovery case–control study and all three population cohorts (Supplementary Table 7).

Gene-based rare variant analysis

To address the hypothesis that burden of rare variants (MAF <0.01) with genes is associated with intelligence, we grouped variants within each gene by function and applied two gene-based analyses, CMC and SKAT-O (see Materials and methods section). A limitation of the CMC method is its reduced power to detect association when alleles that increase and decrease intelligence are present in the same gene; SKAT-O models protective and risk effects. Both tests were performed using a linear mixed model to control for confounders (quantile–quantile plots in Supplementary Figure 2). No individual gene reached our study-wide significance threshold using either test (P <3.18 × 10−6, based on the analysis of 15686 genes, Supplementary Table 8). PLXNB2, which harboured the most strongly associated common variant, did not demonstrate compelling evidence of association of rare variation in the gene-based analysis (StopGain_ESS_NS CMC P=0.0036; SKAT-O P=0.0042).

Variance explained by protein-altering variants

To establish the contribution of the genotyped exonic SNPs to the heritability of intelligence, we used a linear mixed model implemented in the genome-wide complex trait analysis (GCTA) framework to estimate the variance explained by the 70112 functional SNPs used in this analysis. We converted the variance to a liability scale assuming a population prevalence of 0.0003 based on the ascertainment criteria for the HiQ sample. The results suggest that the genotyped protein-altering SNPs (or non-coding variants that they are tagging) explain 17.4% (1.7% s.e.) of the variance in intelligence on the liability scale. The observation that a substantial proportion of the variance of liability to high intelligence could be explained by these SNPs suggests that these data could be further explored to gain insight into the genetic architecture of intelligence.

Ratio of minor allele increaser to decreaser associations

A recent study of the genetic architecture of complex human diseases and traits elegantly illustrated that an elevated ratio of the direction of allelic effect (risk to protective in a typical disease case–control study) of nominally associated rare alleles can be interpreted as a signature of polygenic inheritance from lower frequency variants.45 The ratio of associated risk to protective variants is substantially influenced by unequal sizes of the case and control cohorts or uneven population stratification. The ratio must therefore be evaluated in the context of the null distribution of ratios that is generated through permutation. A risk to protective ratio for rare alleles that is high in comparison with the null may be attributed to negative selection keeping detrimental rare alleles at low frequency. High intelligence may be considered as an advantageous rather than detrimental trait. As a result, we may expect to observe a reduction of the ratio of rare IQ-increasing alleles to IQ-decreasing alleles (the I/D ratio), indicating an enrichment of rare intelligence-decreasing alleles or depletion of rare intelligence-increasing alleles. To test this supposition, we evaluated this I/D ratio in our data set (see Materials and methods section, Table 1). We observe a significantly lower I/D ratio for variants within the lowest frequency class (MAF <0.01) evaluated in comparison with the null. The low ratios are observed for both 73 variants with a PCC<0.001 (I/Dobs=8.000, I/Dexp=32.159, Z-score=−2.67, PI/D=0.001) and also 467 variants with PCC <0.01 (I/Dobs=3.396, I/Dexp=6.400, Z-score=−3.28, PI/D=0.008). These results indicate that fewer rare alleles are associated with high intelligence or more rare alleles are associated with lower intelligence than would be expected.



We have described the results from the first investigation of exome variation in individuals of extremely high intelligence. The sampling design from the extreme tail of the distribution provides increased power for the detection of alleles associated with high intelligence. However, this unique sample also presents challenges including the absence of a direct replication sample. The strongest associated single variant identified in the discovery case–control analysis is a non-synonymous variant located in the PLXNB2 gene, whose protein product has been previously implicated in neuronal migration. Given the unavailability of a direct replication sample, we sought to evaluate the variant’s association with individual differences in intelligence across the normal distribution, aware that power would be limited because the SNP accounted for only 0.16% of variance of a liability model comparing the extremely high intelligence group and unselected controls. Using this approach, we were unable to replicate this association observed in the case–control study, although it is noteworthy that the same direction of allelic effect was observed in all three extension cohorts together with the case–control study.

Evaluating the role of all of the genotyped protein-coding variation on individual differences in intelligence strongly suggests that such variation does contribute a substantial proportion (about one-fifth) to the variance of the trait. The failure to detect any single robust association suggests that the contribution emanates from multiple alleles with effect sizes smaller than this study’s ability to detect robustly. It is also important to note that the variance explained by this group of functional SNPs could also be a consequence of linkage disequilibrium between the coding variants and non-coding true ‘causal’ SNPs and is therefore an upper bound of the variance explained by this group of coding variants. It should also be noted that the heritability of individual differences in intelligence increases substantially throughout the lifespan, from about 20% in early childhood, 40% in adolescence and 60% or higher in adulthood.47 As a result the samples used in this study were tested in adolescence, our estimates of heritability may not be as high as they would be in a sample of older adults.

The observed ratio of I/D rare alleles nominally associated with high intelligence indicates that we observe fewer rare alleles that are associated with high intelligence or more rare alleles that are associated with normal intelligence than would be expected. Although we cannot exclude the possibility that this signal is driven by asymmetric population stratification, this finding is consistent with the hypothesis that rare alleles are more frequently detrimental to intelligence. This hypothesis is important when considering appropriate study designs for future investigations. We have demonstrated here the increased power from extreme sampling as compared with unbiased sampling designs. For example, a study that maximises power to detect rare alleles associated with decreased intelligence would ideally seem to use a low-intelligence sample selected to be as extreme as the high-intelligence sample in order to ensure balanced sampling. However, this design may not be advisable because, as noted earlier, extremely low intelligence is not familial or heritable.11 Instead, a design seems warranted that compares the high-intelligence group with a group representative of the low end of the heritable normal distribution of intelligence (that is, between 1 and 2 s.d. below the mean).

As mentioned in the Introduction section, one approach to the ‘missing heritability’ problem is a brute-force approach that moves toward ever-larger consortia in order to increase power. Although an intelligence GWA consortium of 54000 adults was recently reported, a polygenic score based on this GWA only accounted for about 1% of the variance of intelligence scores in independent samples,21 which suggests that very much larger samples will be needed to narrow the missing heritability gap. This study is novel in suggesting two alternative strategies to increase power. One strategy is specific to intelligence and other normally distributed traits: selecting individuals extremely high in intelligence. The other strategy is to move beyond the common SNPs that have been the focus so far of GWA studies to consider functional exonic SNPs and rarer SNPs. Although combining these two novel strategies has suggested that some of the missing heritability could reside in functional and rarer SNPs, our results indicate that these strategies are not a panacea to the problem of missing heritability. Being optimists, we are hopeful that new strategies will be developed, especially as whole-genome sequencing comes on line, which will complement the common-SNP, exonic-SNP and extreme sampling strategies that have been tried so far.

Some remarks about the phenotype under analysis are in order because they point to a direction for future research as well as a limitation. Individuals in the profoundly gifted (IQ >170) group were identified by above-level assessment procedures, which involved administering standardised college-entrance measures of quantitative and verbal reasoning to intellectually talented young adolescents. A composite of quantitative and verbal reasoning was used to capture general intelligence, which is justified for genetic and neurocognitive reasons.6 Such a composite based on above-level assessment has been shown to capture extremely high levels of general intellectual functioning with precision, well beyond 4 s.d. above the population mean.36, 48 However, quantitative reasoning and verbal reasoning predict different outcomes. For example, 25-year longitudinal studies have revealed that differences in the quantitative versus verbal prowess of profoundly gifted youth portend contrasting occupational and creative accomplishments in scientific/technical versus humanistic/linguistic endeavours.49, 50 For this reason, future research might profit by considering quantitative and verbal reasoning separately. In addition, a specific cognitive ability missing from this study, spatial ability, should be added to the mix because spatial ability predicts occupational and creative accomplishments beyond quantitative and verbal reasoning abilities.51, 52

This investigation of protein-altering variation in high intelligence provides clear insight in the role of this class of genetic variation in the architecture of the trait. A common theme emerging from genetic studies of intelligence, similar to all complex traits and common disorders, is its highly polygenic nature with its heritability explained by many variants of small effect. Although the unique extreme sampling design used in this study provides improved power to detect associations in certain situations it has also provided challenges for direct replication. Nevertheless, the evidence for the contribution of protein-altering variants to the heritability of intelligence and the evidence that rare functional alleles are detrimental to intelligence provides a framework for defining the role of individual rare alleles. However, we did not find any individual protein-altering variants that are reproducibly associated with extremely high intelligence. Thus, despite the power of sampling from the extreme high end of the distribution of intelligence, we conclude that these results primarily highlight the complex genetic architecture of intelligence.


Conflict of interest

The authors declare no conflict of interest.



  1. Hunt EB. Human Intelligence. Cambridge University Press: : Cambridge, 2011.
  2. Deary IJ, Batty GD. Cognitive epidemiology. J Epidemiol Community Health 2007; 61: 378–384. | Article | PubMed | ISI |
  3. Lubinski D, Benbow CP, Kell HJ. Life paths and accomplishments of mathematically precocious males and females four decades later. Psychol Sci 2014; 25: 2217–2232. | Article | PubMed |
  4. Deary IJ, Johnson W, Houlihan LM. Genetic foundations of human intelligence. Hum Genet 2009; 126: 215–232. | Article | PubMed | ISI |
  5. Plomin R, DeFries JC, Knopik VS, Neiderhiser JM. Behavioral genetics 6th edn. Worth Publishers: New York, 2013.
  6. Plomin R, Deary IJ. Genetics and intelligence differences: five special findings. Mol Psychiatry 2015; 20: 98–108. | Article | PubMed | CAS |
  7. Epstein D. The Sports Gene: Inside the Science of Extraordinary Athletic Performance. Current: New York, 2013.
  8. Plomin R, Haworth CMA, Davis OSP. Common disorders are quantitative traits. Nat Rev Genet 2009; 10: 872–878. | Article | PubMed | ISI | CAS |
  9. Pütter C, Pechlivanis S, Nöthen MM, Jöckel KH, Wichmann HE, Scherag A. Missing heritability in the tails of quantitative traits? A simulation study on the impact of slightly altered true genetic models. Hum Hered 2011; 72: 173–181. | Article | PubMed |
  10. Shakeshaft NG, Trzaskowski M, McMillan A, Krapohl E, Simpson MA, Reichenberg A et al. Thinking positively: the genetics of high intelligence. Intelligence 2015; 48: 123–132. | Article | PubMed |
  11. Reichenberg A, Cederlöf M, McMillan A, Trzaskowski M, Davidson M, Weiser M et al. Discontinuity in the genetic and environmental causes of the intellectual disability spectrum. Proc Natl Acad Sci USA (in press).
  12. Penrose LS. A Clinical and Genetic Study of 1280 Cases of Mental Defect. HM Stationery Office: London, 1938.
  13. de Ligt J, Willemsen MH, van Bon BW, Kleefstra T, Yntema HG, Kroes T et al. Diagnostic exome sequencing in persons with severe intellectual disability. N Engl J Med 2012; 367: 1921–1929. | Article | PubMed | ISI | CAS |
  14. Ellison JW, Rosenfeld JA, Shaffer LG. Genetic basis of intellectual disability. Annu Rev Med 2013; 64: 441–450. | Article | PubMed | ISI | CAS |
  15. Veltman JA, Brunner HG. De novo mutations in human genetic disease. Nat Rev Genet 2012; 13: 565–575. | Article | PubMed | ISI | CAS |
  16. Benyamin B, Pourcain B, Davis OS, Davies G, Hansell NK, Brion MJ et al. Childhood intelligence is heritable, highly polygenic and associated with FNBP1L. Mol Psychiatry 2014; 19: 253–258. | Article | PubMed | ISI | CAS |
  17. Butcher LM, Davis OSP, Craig IW, Plomin R. Genome-wide quantitative trait locus association scan of general cognitive ability using pooled DNA and 500K single nucleotide polymorphism microarrays. Genes Brain Behav 2008; 7: 435–446. | Article | PubMed | CAS |
  18. Davies G, Tenesa A, Payton A, Yang J, Harris SE, Liewald D et al. Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Mol Psychiatry 2011; 16: 996–1005. | Article | PubMed | ISI | CAS |
  19. Davis OSP, Butcher LM, Docherty SJ, Meaburn EM, Curtis CJC, Simpson A et al. A three-stage genome-wide association study of general cognitive ability: hunting the small effects. Behav Genet 2010; 40: 759–767. | Article | PubMed |
  20. Rietveld CA, Esko T, Davies G, Pers TH, Turley P, Benyamin B et al. Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proc Natl Acad Sci USA 2014; 111: 13790–13794. | Article | PubMed | CAS |
  21. Davies G, Armstrong N, Bis JC, Bressler J, Chouraki V, Giddaluru S et al. Genetic contributions to variation in general cognitive function: a meta-analysis of genome-wide association studies in the CHARGE consortium (N=53 949). Mol Psychiatry 2015; 20: 183–192. | Article | PubMed | CAS |
  22. Robinson MR, Wray NR, Visscher PM. Explaining additional genetic variation in complex traits. Trends Genet 2014; 30: 124–132. | Article | PubMed | ISI | CAS |
  23. Gratten J, Wray NR, Keller MC, Visscher PM. Large-scale genomics unveils the genetic architecture of psychiatric disorders. Nat Neurosci 2014; 17: 782–790. | Article | PubMed | CAS |
  24. Plomin R. Child development and molecular genetics: 14 years later. Child Dev 2013; 84: 104–120. | Article | PubMed |
  25. Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet 2012; 90: 7–24. | Article | PubMed | ISI | CAS |
  26. Inlow JK, Restifo LL. Molecular and comparative genetics of mental retardation. Genetics 2004; 166: 835–881. | Article | PubMed | ISI | CAS |
  27. Baker K, Raymond FL, Bass N. Genetic investigation for adults with intellectual disability: opportunities and challenges. Curr Opin Neurol 2012; 25: 150–158. | Article | PubMed |
  28. Willemsen MH, Kleefstra T. Making headway with genetic diagnostics of intellectual disabilities. Clin Genet 2014; 85: 101–110. | Article | PubMed |
  29. Kirkpatrick RM, McGue M, Iacono WG, Miller MB, Basu S, Pankratz N. Low-frequency copy-number variants and general cognitive ability: no evidence of association. Intelligence 2014; 42: 98–106. | Article | PubMed |
  30. MacLeod AK, Davies G, Payton A, Tenesa A, Harris SE, Liewald D et al. Genetic copy number variation and general cognitive ability. PLoS One 2012; 7: e37385. | Article | PubMed |
  31. McRae AF, Wright MJ, Hansell NK, Montgomery GW, Martin NG. No association between general cognitive ability and rare copy number variation. Behav Genet 2013; 43: 202–207. | Article | PubMed |
  32. Marioni RE, Penke L, Davies G, Huffman JE, Hayward C, Deary IJ. The total burden of rare, non-synonymous exome genetic variants is not associated with childhood or late-life cognitive ability. Proc R Soc Lond B Biol Sci 2014; 281: 20140117. | Article |
  33. Luciano M, Svinti V, Campbell A, Marioni RE, Hayward C, Wright AF et al. Exome sequencing to detect rare variants associated with general cognitive ability: a pilot study. Twin Res Hum Genet 2015; 18: 117–125. | Article | PubMed |
  34. Barnett IJ, Lee S, Lin X. Detecting rare variant effects using extreme phenotype sampling in sequencing association studies. Genet Epidemiol 2013; 37: 142–151. | Article | PubMed | ISI |
  35. Putallaz M, Baldwin J, Selph H. The Duke University Talent Identification Program. High Ability Studies 2005; 16: 41–54. | Article |
  36. Lubinski D, Webb RM, Morelock MJ, Benbow CP. Top 1 in 10,000: a 10-year follow-up of the profoundly gifted. J Appl Psychol 2001; 86: 718–729. | Article | PubMed |
  37. Iacono WG, Carlson SR, Taylor J, Elkins IJ, McGue M. Behavioral disinhibition and the development of substance-use disorders: findings from the Minnesota Twin Family Study. Dev Psychopathol 1999; 11: 869–900. | Article | PubMed | ISI | CAS |
  38. Iacono WG, McGue M. Minnesota Twin Family Study. Twin Res 2002; 5: 482–487. | Article | PubMed |
  39. Haworth CMA, Davis OSP, Plomin R. Twins Early Development Study (TEDS): a genetically sensitive investigation of cognitive and behavioral development from childhood to young adulthood. Twin Res Hum Genet 2013; 16: 117–125. | Article | PubMed |
  40. Boyd A, Golding J, Macleod J, Lawlor DA, Fraser A, Henderson J et al. Cohort profile: the 'children of the 90s' — the index offspring of the Avon Longitudinal Study of Parents and Children. Int J Epidemiol 2013; 42: 111–127. | Article | PubMed | ISI |
  41. Fraser A, Macdonald-Wallis C, Tilling K, Boyd A, Golding J, Davey Smith G et al. Cohort profile: the Avon Longitudinal Study of Parents and Children: ALSPAC mothers cohort. Int J Epidemiol 2013; 42: 97–110. | Article | PubMed | ISI |
  42. Speed D, Hemani G, Johnson MR, Balding DJ. Improved heritability estimation from genome-wide SNPs. Am J Hum Genet 2012; 91: 1011–1021. | Article | PubMed | ISI | CAS |
  43. Yang JA, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 2011; 88: 76–82. | Article | PubMed | ISI | CAS |
  44. Lee SH, Goddard ME, Wray NR, Visscher PM. A better coefficient of determination for genetic profile analysis. Genet Epidemiol 2012; 36: 214–224. | Article | PubMed | ISI |
  45. Chan Y, Lim ET, Sandholm N, Wang SR, McKnight AJ, Ripke S et al. An excess of risk-increasing low-frequency variants can be a signal of polygenic inheritance in complex diseases. Am J Hum Genet 2014; 94: 437–452. | Article | PubMed | ISI | CAS |
  46. Rietveld CA, Medland SE, Derringer J, Yang J, Esko T, Martin NW et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 2013; 340: 1467–1471. | Article | PubMed | ISI | CAS |
  47. Haworth CMA, Wright MJ, Luciano M, Martin NG, de Geus EJC, van Beijsterveldt CEM et al. The heritability of general cognitive ability increases linearly from childhood to young adulthood. Mol Psychiatry 2010; 15: 1112–1120. | Article | PubMed | ISI | CAS |
  48. Frey MC, Detterman DK. Scholastic assessment or g? the relationship between the scholastic assessment test and general cognitive ability. Psychol Sci 2004; 15: 373–378. | Article | PubMed |
  49. Kell HJ, Lubinski D, Benbow CP. Who rises to the top? Early indicators. Psychol Sci 2013; 24: 648–659. | Article | PubMed |
  50. Lubinski D. Exceptional cognitive ability: the phenotype. Behav Genet 2009; 39: 350–358. | Article | PubMed |
  51. Kell HJ, Lubinski D, Benbow CP, Steiger JH. Creativity and technical innovation: spatial ability's unique role. Psychol Sci 2013; 24: 1831–1836. | Article | PubMed |
  52. Wai J, Lubinski D, Benbow CP. Spatial ability for STEM domains: aligning over 50 years of cumulative psychological knowledge solidifies its importance. J Educ Psychol 2009; 101: 817–835. | Article |


This research was supported by a European Research Council Advanced Investigator award (295366) to RP. Collecting DNA from the highest-scoring TIP individuals was supported by an award from the John Templeton Foundation (13575) to RP. Data from the unselected population-based controls came from the Minnesota Twin Family Study, which is supported by the United States Public Health Service Grants to WGI and MM from the National Institute on Alcohol Abuse and Alcoholism (AA09367 and AA11886), the National Institute on Drug Abuse (DA05147, DA13240, DA024417 and DA036216), and the National Institute on Mental Health (MH066140). Two other unselected population-based control samples were used to extend case–control associations for associations with individual differences throughout the distribution. TEDS is supported by grants to RP from the UK Medical Research Council (G0901245 and G19/2). ALSPAC is supported in part by a grant to GDS from the UK Medical Research Council (092731).

Supplementary Information accompanies the paper on the Molecular Psychiatry website

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.