Genetic contributions to variation in general cognitive function: a meta-analysis of genome-wide association studies in the CHARGE consortium (N=53 949)

General cognitive function is substantially heritable across the human life course from adolescence to old age. We investigated the genetic contribution to variation in this important, health- and well-being-related trait in middle-aged and older adults. We conducted a meta-analysis of genome-wide association studies of 31 cohorts (N=53 949) in which the participants had undertaken multiple, diverse cognitive tests. A general cognitive function phenotype was tested for, and created in each cohort by principal component analysis. We report 13 genome-wide significant single-nucleotide polymorphism (SNP) associations in three genomic regions, 6q16.1, 14q12 and 19q13.32 (best SNP and closest gene, respectively: rs10457441, P=3.93 × 10−9, MIR2113; rs17522122, P=2.55 × 10−8, AKAP6; rs10119, P=5.67 × 10−9, APOE/TOMM40). We report one gene-based significant association with the HMGN1 gene located on chromosome 21 (P=1 × 10−6). These genes have previously been associated with neuropsychiatric phenotypes. Meta-analysis results are consistent with a polygenic model of inheritance. To estimate SNP-based heritability, the genome-wide complex trait analysis procedure was applied to two large cohorts, the Atherosclerosis Risk in Communities Study (N=6617) and the Health and Retirement Study (N=5976). The proportion of phenotypic variation accounted for by all genotyped common SNPs was 29% (s.e.=5%) and 28% (s.e.=7%), respectively. Using polygenic prediction analysis, ~1.2% of the variance in general cognitive function was predicted in the Generation Scotland cohort (N=5487; P=1.5 × 10−17). In hypothesis-driven tests, there was significant association between general cognitive function and four genes previously associated with Alzheimer's disease: TOMM40, APOE, ABCG1 and MEF2C.


Supplementary Tables
Supplementary Table S1

Supplementary Figures
Supplementary Figure S1
Genetic and Environmental Risk in AD consortium -GERAD). In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer's disease cases and 11,312 controls.
Finally, a meta-analysis was performed combining results from stages 1 & 2. The results from the stage 1 meta-analysis were used to create a polygenic predictor which was used to predict general cognitive function and general fluid cognitive function phenotypes in the GS cohort.

Pathway and network analyses
INRICH
INRICH 1 was used to quantify the degree to which the most significant genomic regions identified in our GWAS overlapped with known biological pathways. Significant genomic intervals were identified using the PLINK 2 clumping procedure. All SNPs with P < 0.0005 were selected as index SNPs. The region around each index SNP was then extended across a 250 kb range, and clumps were formed by including other SNPs that were both nominally associated (P < 0.05) with general cognitive function and in moderate LD (r² > 0.5) with the index SNP according to the HapMap II CEU reference panel. Genomic intervals were included in subsequent analysis if they were within 20 kb (5' or 3') of any known gene in the UCSC human genome browser hg18 assembly. A total of 722 genomic intervals were found, of which 434 were located within 20 kb of a known gene. Overlapping intervals were then merged, leaving 284 LD-independent intervals to be analysed for enrichment.
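The interval filtering and merging steps can be illustrated with a minimal Python sketch. It is not the INRICH or PLINK implementation: the function names, the tuple layout and the toy coordinates are assumptions made for illustration, and only the 20 kb gene window is taken from the text.

```python
from typing import List, Sequence, Tuple

# (chromosome, start_bp, end_bp); this layout is an assumption for illustration
Interval = Tuple[str, int, int]

GENE_WINDOW_BP = 20_000  # 20 kb flank (5' or 3'), as stated in the text


def near_gene(interval: Interval, genes: Sequence[Interval],
              window: int = GENE_WINDOW_BP) -> bool:
    """True if the interval lies within `window` bp of any gene on the same chromosome."""
    chrom, start, end = interval
    return any(
        g_chrom == chrom and start <= g_end + window and end >= g_start - window
        for g_chrom, g_start, g_end in genes
    )


def merge_overlapping(intervals: Sequence[Interval]) -> List[Interval]:
    """Merge intervals that overlap on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy data (hypothetical coordinates, not taken from the study)
clumps = [("6", 100_000, 150_000), ("6", 140_000, 200_000),
          ("14", 500_000, 520_000), ("19", 1_000_000, 1_010_000)]
genes = [("6", 160_000, 170_000), ("14", 535_000, 560_000)]

kept = [c for c in clumps if near_gene(c, genes)]
independent = merge_overlapping(kept)
```

In this toy example the chromosome 19 clump is dropped because no gene lies within 20 kb, and the two overlapping chromosome 6 clumps collapse into one interval.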
Enrichment testing was carried out using the pathways found in Gene Ontology 3. After filtering gene-sets by size, 1284 gene-sets of between 5 and 200 genes were included. The number of intervals that overlapped with genes in each Gene Ontology gene-set was then counted. The significance of the overlap between the gene-sets and the intervals was assessed using 10 000 randomly assigned intervals, matched to the observed intervals for gene density, SNP number and SNP density.
Finally, a bootstrapping-based re-sampling method using 5000 permutations was used to correct the enrichment P-values of each gene-set for the number of sets tested.
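The first-stage enrichment test can be sketched as an empirical P-value: the observed number of intervals overlapping a gene-set is compared with the counts obtained from the randomly placed interval sets. The sketch below is illustrative only. The function names are hypothetical, the generation of matched random intervals is assumed to have been done elsewhere, the add-one form of the empirical P-value is a common convention rather than a documented detail of INRICH, and the second-stage correction across gene-sets is not shown.

```python
from typing import FrozenSet, Sequence

# Each interval is represented by the set of genes it overlaps (an assumption
# made to keep the sketch short).
IntervalGenes = FrozenSet[str]


def overlap_count(intervals: Sequence[IntervalGenes],
                  gene_set: FrozenSet[str]) -> int:
    """Number of intervals that overlap at least one gene in the gene-set."""
    return sum(1 for genes in intervals if genes & gene_set)


def empirical_p(observed: int, null_counts: Sequence[int]) -> float:
    """Empirical P-value with the add-one convention: (r + 1) / (n + 1)."""
    as_extreme = sum(1 for count in null_counts if count >= observed)
    return (as_extreme + 1) / (len(null_counts) + 1)


# Toy data (hypothetical gene names and null counts)
observed_intervals = [frozenset({"A"}), frozenset({"B"}), frozenset({"C", "D"})]
pathway = frozenset({"A", "C"})

observed = overlap_count(observed_intervals, pathway)
null_counts = [0, 1, 2, 0, 1, 0, 0, 1, 3]  # one count per random interval set
p_value = empirical_p(observed, null_counts)
```

With 10 000 random interval sets, as used here, the smallest attainable P-value under this convention is 1/10 001.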

Ingenuity Pathway Analysis (IPA)
Ingenuity Pathway Analysis (IPA; Ingenuity Systems, www.ingenuity.com) was used to identify functions, pathways, and networks associated with general cognitive function. Gene symbols and P-values from the gene-based association analysis were uploaded to IPA, and 17 013 of 17 715 were successfully mapped to corresponding objects in the Ingenuity Knowledge Base (IKB; September 2014).
A filter criterion of P-value ≤ 0.01 was used to identify 581 molecules of interest (focus molecules), and the full list was used as the reference set for the IPA analysis. An IPA core analysis was performed on the dataset. Networks were constructed using the 581 focus molecules, and associated bio-functions and canonical pathways were identified. Each network was given a score, the −log(P-value) of Fisher's exact test, which indicates how well the network fits the focus molecule set. IPA builds networks to a user-specified target size; networks of 70 and 140 nodes were generated using direct interactions only.
All other settings were left as default.
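The network score described above can be illustrated with a right-tailed Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. This is a sketch under stated assumptions rather than IPA's own algorithm, which is proprietary: the function names are hypothetical, the logarithm is assumed to be base 10, and the toy numbers are not from the study.

```python
from math import comb, log10


def hypergeom_upper_tail(k: int, population: int, successes: int, draws: int) -> float:
    """P(X >= k) when `draws` items are sampled without replacement from a
    population containing `successes` marked items (right-tailed Fisher's exact test)."""
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, draws)


def network_score(k: int, population: int, successes: int, draws: int) -> float:
    """-log10 of the right-tailed P-value (base 10 is an assumption)."""
    return -log10(hypergeom_upper_tail(k, population, successes, draws))


# Toy example: 10 reference molecules, 4 focus molecules, a 5-node network
# containing 3 focus molecules.
p = hypergeom_upper_tail(3, 10, 4, 5)
score = network_score(3, 10, 4, 5)
```

In the setting of the text, `population` would correspond to the mapped reference set, `successes` to the 581 focus molecules and `draws` to the network size (70 or 140 nodes).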

Functional annotation and gene expression
For three genomic regions, located on chromosomes 6, 14 and 19, functional annotation, gene expression and evidence of expression quantitative trait loci (eQTL) were explored using publicly available online resources. These three regions were identified from the genome-wide significant findings in the SNP-based meta-analysis (P < 5 × 10−8). The Genotype-Tissue Expression Portal (GTEx) (http://www.gtexportal.org) was used to identify eQTLs associated with any SNP that had a P-value < 5 × 10−8 in the meta-analysis (13 SNPs). Functional annotation was investigated for the same 13 SNPs (P < 5 × 10−8) described above using the Regulome DB database 4. Regulome DB was used to identify regulatory DNA elements in non-coding and intergenic regions of the genome. Data describing differential expression of the top two genes from the VEGAS analyses, in six brain regions across the life course, were extracted from the Human Brain Transcriptome Project (hbatlas.org) 5.
indicates that the SNP did not pass QC in that cohort.

Supplementary Table S4: (to be included in SI as Excel file). Genes showing association with general cognitive function (P < 1 × 10−3) in the VEGAS gene-based analysis; bold type indicates genome-wide significance (P < 2.8 × 10−6). Abbreviations: SNP, single-nucleotide polymorphism; N SNPs, the number of SNPs in the gene (±50 kb); Best-SNP, the most significant SNP within the gene; SNP-Pvalue, the original association P-value for the most significant SNP within the gene.

Supplementary Table S8
Polygenic prediction results. The results from the meta-analysis (excluding Generation Scotland (GS)) were used to create a polygenic predictor, which was used to predict cognitive phenotypes and health outcomes in the GS cohort. R² was calculated as the difference in R² between a null model that adjusted for age, sex and 4 PCs and a model that also included the polygenic prediction score. The P-value corresponds to the prediction score term in the model. R² is presented as a percentage.
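The incremental R² described in this caption can be sketched as the difference between two ordinary least-squares fits. The code below is a standard-library illustration, not the analysis pipeline used for the table: the function names are hypothetical, the covariate columns stand in for age, sex and the 4 PCs, the toy data are invented, and the design matrix is assumed to be free of collinearity.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y: Sequence[float], covariates: Sequence[Sequence[float]]) -> float:
    """R^2 of an OLS fit of y on an intercept plus the covariate columns."""
    rows = [[1.0] + list(r) for r in covariates]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2_percent(y: Sequence[float],
                           null_covariates: Sequence[Sequence[float]],
                           score: Sequence[float]) -> float:
    """R^2 of the full model minus R^2 of the null model, as a percentage."""
    full = [list(r) + [s] for r, s in zip(null_covariates, score)]
    return 100.0 * (r_squared(y, full) - r_squared(y, null_covariates))


# Toy data: one binary covariate and a score that predicts y perfectly
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
null_covariates = [[0.0], [1.0], [0.0], [1.0], [0.0], [1.0]]
score = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
delta_r2 = incremental_r2_percent(y, null_covariates, score)
```

Because the null model already absorbs the variance explained by the covariates, the reported percentage reflects only what the polygenic score adds beyond them.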