Ninety-nine independent genetic loci influencing general cognitive function include genes associated with brain health and structure (N = 280,360)

General cognitive function is a prominent human trait associated with many important life outcomes1,2, including longevity3. The substantial heritability of general cognitive function is known to be polygenic, but it has had little explication in terms of the contributing genetic variants4,5,6. Here, we combined cognitive and genetic data from the CHARGE and COGENT consortia, and UK Biobank (total N=280,360). We found 9,714 genome-wide significant SNPs (P<5 × 10−8) in 99 independent loci. Most showed clear evidence of functional importance. Among many novel genes associated with general cognitive function were SGCZ, ATXN1, MAPT, AUTS2, and P2RY6. Within the novel genetic loci were variants associated with neurodegenerative disorders, neurodevelopmental disorders, physical and psychiatric illnesses, brain structure, and BMI. Gene-based analyses found 536 genes significantly associated with general cognitive function; many were highly expressed in the brain, and associated with neurogenesis and dendrite gene sets. Genetic association results predicted up to 4% of general cognitive function variance in independent samples. There was significant genetic overlap between general cognitive function and information processing speed, as well as many health variables including longevity.

Since the general factor was first described in 1904 7 , hundreds of studies have replicated the finding that around 40% of the variance in people's test scores on a diverse battery of cognitive tests can be accounted for by a single general factor 8 . General cognitive function is peerless among human psychological traits in terms of its empirical support and importance for life outcomes 1,2 . Individual differences in general cognitive function are stable across most of the life course 9 . Twin studies find that general cognitive function has a heritability of more than 50% from adolescence through adulthood to older age 4,10,11 .
SNP-based estimates of heritability for general cognitive function are about 20-30% 5 . To date, little of this substantial heritability has been explained; only a few relevant genetic loci have been discovered (Table 1 and Fig. 1). As for other highly polygenic traits, sample size limits the discovery of relevant genetic loci 12 ; until now, studies of general cognitive function have included fewer than 100,000 individuals 5,6 .
General cognitive function, unlike height for example, is not measured the same way in all samples.
Here, this was mitigated by applying a consistent method of extracting a general cognitive function component from cognitive test data in the cohorts of the CHARGE and COGENT consortia (Supplementary Materials). Cohorts' participants were required to have scores from at least three cognitive tests, each of which tested a different cognitive domain. Each cohort applied the same data reduction technique (principal component analysis) to extract a general cognitive component.
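The phenotype extraction described above can be sketched as follows. This is a minimal illustration, assuming each cohort's test scores are complete, coded so that higher means better performance, and standardised before PCA (i.e. PCA on the correlation matrix); the function name `first_unrotated_pc_scores` is hypothetical and is not the cohorts' actual pipeline.

```python
import numpy as np

def first_unrotated_pc_scores(test_scores: np.ndarray) -> tuple[np.ndarray, float]:
    """Return first unrotated principal component scores and the variance it explains.

    test_scores: (n_participants, n_tests) array with at least three tests,
    each assumed to tap a different cognitive domain.
    """
    if test_scores.shape[1] < 3:
        raise ValueError("at least three cognitive tests are required")
    # Standardise each test so that PCA operates on the correlation matrix
    z = (test_scores - test_scores.mean(axis=0)) / test_scores.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    loadings = eigvecs[:, -1]                # first (largest) component, unrotated
    # The sign of an eigenvector is arbitrary; orient so higher scores mean better performance
    if loadings.sum() < 0:
        loadings = -loadings
    scores = z @ loadings
    return scores, eigvals[-1] / eigvals.sum()
```

The second return value is the proportion of total test variance captured by the general component, the figure usually reported when describing a cohort's psychometric characteristics.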
Scores from the first unrotated principal component were used as the general cognitive function phenotype. The psychometric characteristics of the general cognitive component from each of the 57 cohorts in the CHARGE consortium are shown in Supplementary Materials. We have previously shown that general cognitive function components extracted from different sets of cognitive tests on the same participants correlate highly 5 . The cognitive test used in the large UK Biobank sample was the so-called 'fluid' test, a 13-item test of verbal-numerical reasoning that has a high genetic correlation with general cognitive function 13 . With the CHARGE and COGENT samples' general cognitive function scores and UK Biobank's verbal-numerical reasoning scores (from two samples: assessment centre-tested and online-tested), the present study's genome-wide association (GWA) analysis included 280,360 participants. We performed post-GWA meta-analyses separately for the CHARGE-COGENT cohorts and for UK Biobank's two samples. Before running the meta-analysis of CHARGE-COGENT with UK Biobank, we estimated the genetic correlation between the two groups using linkage disequilibrium score (LDSC) regression at 0.82 (SE = 0.02), indicating very substantial overlap between the genetic variants influencing general cognitive function in the two groups. We then performed an inverse-variance weighted meta-analysis of CHARGE-COGENT and UK Biobank.
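The inverse-variance weighted (fixed-effect) combination used to merge the CHARGE-COGENT and UK Biobank results can be illustrated for a single SNP. This sketch assumes both groups report a per-allele effect (beta) and standard error on a comparable scale; `ivw_meta` is a hypothetical helper, not METAL's interface.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis of one SNP's effects.

    Each study is weighted by 1 / SE^2, so more precise studies count for more.
    Returns the pooled beta, its SE, the Z statistic and a two-sided P-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P-value
    return beta, se, z, p
```

For a SNP, a call such as `ivw_meta([beta_cc, beta_ukb], [se_cc, se_ukb])` (hypothetical variable names) would give the pooled estimate across the two groups.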
Genome-wide results for general cognitive function showed 9,714 significant (P < 5 × 10−8) SNP associations, and 17,563 at a suggestive level (5 × 10−8 < P < 1 × 10−5); see Fig. 2a and Supplementary Tables 3 and 4. FUnctional MApping and annotation of genetic associations (FUMA) 14 identified 120 independent lead SNPs. A comparison of these lead SNPs with results from the largest previous GWAS of cognitive function 6 and of educational attainment 15 (which included a subsample of UK Biobank) showed that 4 and 12 of these, respectively, were genome-wide significant in the previous studies (Supplementary Table 14; Supplementary Table 16). Therefore, our study uncovered 87 novel independent loci associated with cognitive function. Of the five completely novel loci, two are in or near notable candidate genes: MAPT gene mutations are associated with neurodegenerative disorders such as Alzheimer's disease and frontotemporal dementia 16 , and AUTS2 is a candidate gene for neurological disorders such as autism spectrum disorder, intellectual disability, and developmental delay 17 . These general cognitive function-associated genes also showed significant associations in the gene-based tests (except for P2RY6); see Supplementary Table 7 and Fig. 2b for the 536 genes that the present study finds to be significantly associated with general cognitive function.
For the 120 lead SNPs, a summary of previous SNP associations is listed in Supplementary Table 15.
Using FUMA 14 , we sought to identify potentially functional lead and tagged SNPs within the 99 genomic risk loci associated with general cognitive function (Supplementary Table 16); see Online Methods for further details. Seventy-nine of the genomic risk loci contained at least one SNP with a Combined Annotation Dependent Depletion (CADD) score > 12.37, indicating a likely deleterious variant. Sixty-five of the genomic risk loci contained at least one SNP with a RegulomeDB score < 3, indicating likely involvement in gene regulation.
Ninety-seven of the loci contained at least one SNP with a minimum 15-core chromatin state of < 8, indicating that it lies in an open chromatin state consistent with a regulatory region. Sixty-eight of the loci contained at least one eQTL. Of interest, rs1135840 in CYP2D6 (P = 1.42 × 10−11) is a non-synonymous SNP (Ser486Thr) that has previously been associated with the metabolism of several commonly used drugs 18 .
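The locus-level counts above mean that a locus is flagged if any one of its lead or tagged SNPs passes a threshold. A minimal sketch of that bookkeeping follows, assuming per-SNP annotations are already available; the `SnpAnnotation` record and `locus_flags` helper are hypothetical, and RegulomeDB categories are assumed to be strings such as "1a" or "5" whose leading digit is the category.

```python
from dataclasses import dataclass

@dataclass
class SnpAnnotation:
    cadd: float               # CADD PHRED-scaled score
    regulomedb: str           # RegulomeDB category, e.g. "1a", "2b", "5"
    min_chromatin_state: int  # lowest 15-core chromatin state across tissues (1-15)
    is_eqtl: bool             # SNP is a significant eQTL for at least one gene

def locus_flags(snps: list[SnpAnnotation]) -> dict[str, bool]:
    """Flag a locus if ANY of its SNPs meets each criterion used in the text."""
    return {
        "likely_deleterious": any(s.cadd > 12.37 for s in snps),
        "likely_regulatory": any(int(s.regulomedb[0]) < 3 for s in snps),
        "open_chromatin": any(s.min_chromatin_state < 8 for s in snps),
        "has_eqtl": any(s.is_eqtl for s in snps),
    }
```

Counting, for example, `sum(locus_flags(l)["likely_deleterious"] for l in loci)` over all loci gives numbers of the kind reported above.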
The identification of these gene sets is consistent with genes associated with cognitive function being involved in regulating the generation of cells within the nervous system, including the formation of neuronal dendrites. MAGMA gene-property analysis indicated that genes expressed in all brain regions except the spinal cord (cervical c-1), and genes expressed in the pituitary, show a stronger association with general cognitive function than genes not expressed in the brain or pituitary (Table 2).
Genetic correlations for general cognitive function amongst these cohorts, estimated using bivariate GCTA-GREML, ranged from rg = 0.88 to 1.0 (Table 2). There were slight differences in the test questions and the testing environment between the assessment centre and online versions of UK Biobank's 'fluid' (verbal-numerical reasoning) test. We therefore investigated the genetic contribution to the stability of individual differences in verbal-numerical reasoning using a bivariate GCTA-GREML analysis that included only those individuals who completed the test on both occasions (mean time gap = 4.93 years). We found a genetic correlation of rg = 1.0 (SE = 0.02).
We tested how well the genetic results from our CHARGE-COGENT-UK Biobank general cognitive function GWA analysis accounted for cognitive test score variance in independent samples. We reran the GWA analysis excluding three of the larger cohorts: ELSA, Generation Scotland, and Understanding Society.
Using the CHARGE-COGENT-UK Biobank GWA results, we tested the genetic correlations between general cognitive function and 25 health traits. Sixteen of the 25 health traits were significantly genetically correlated with general cognitive function (Supplementary Table 12). Novel genetic correlations were identified between general cognitive function and ADHD (rg = -0.36, SE = 0.03, P = 3.91 × 10−32), bipolar disorder (rg = -0.09, SE = 0.04, P = 0.008), major depression (rg = -0.30, SE = 0.05, P = 4.13 × 10−12), and longevity (rg = 0.15, SE = 0.06, P = 0.014).
Reaction time is an elementary cognitive task that assesses a person's information processing speed.
It is both phenotypically and genetically correlated with general cognitive function and accounts for some of its association with health 20,21 . We explored the genetic foundations of reaction time and its genetic association with general cognitive function. We note the limitation that the UK Biobank'
People with higher general cognitive function are broadly healthier 22,23 ; here, we find overlap between genetic loci for general cognitive function and a number of physical health traits. These shared genetic associations may reflect a causal path from cognitive function to disease, cognitive consequences of disease, or pleiotropy 24 . For psychiatric illness, conditions like schizophrenia (and, to a lesser extent, bipolar disorder) are characterised by cognitive impairments 25 , and thus reverse causality (i.e. from cognitive function to disease) is less likely. In terms of localising more proximal structural and functional causes of variation in cognitive function, researchers could prioritise the genetic loci uncovered here that overlap with brain-related measures.
General cognitive function has prominence and pervasiveness in the human life course, and it is important to understand the environmental and genetic origins of its variation in the population 4 .
Here we have uncovered many new genetic loci, genes, and genetic pathways that contribute to its heritability (Supplementary Tables 3, 7 and 18; Fig. 2), a heritability it shares with many health outcomes, longevity, brain structure, and processing speed. These findings provide a foundation for exploring the mechanisms that bring about and sustain cognitive efficiency through life.

Acknowledgments
This research was conducted in The University of Edinburgh Centre for Cognitive Ageing and

Genome-wide association analyses
Genotype-phenotype association analyses were performed within each cohort, using an additive model, on imputed SNP dosage scores. Adjustments for age, sex, and population stratification, if required, were included in the model. Cohort-specific covariates (for example, site or familial relationships) were also fitted as required. Cohort-specific quality control procedures, imputation methods, and covariates are described in Supplementary Table S2. Quality control of the cohort-level summary statistics was performed using the EasyQC software 31 , which excluded SNPs with imputation quality < 0.6 or minor allele count < 25.
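A minimal sketch of the two SNP exclusion rules, assuming the minor allele count is computed as 2 × N × MAF from the allele frequency and sample size in each cohort's summary statistics. EasyQC's own implementation and column names may differ; `qc_filter` and the dictionary keys are hypothetical.

```python
from typing import Iterable

def minor_allele_count(allele_freq: float, n: int) -> float:
    # Assumption: expected minor allele count from frequency and sample size
    # (two alleles per person); min() handles frequencies reported for either allele
    return 2 * n * min(allele_freq, 1 - allele_freq)

def qc_filter(snps: Iterable[dict], min_info: float = 0.6, min_mac: float = 25) -> list[dict]:
    """Keep SNPs with imputation quality >= 0.6 and minor allele count >= 25."""
    kept = []
    for snp in snps:
        mac = minor_allele_count(snp["freq"], snp["n"])
        if snp["info"] >= min_info and mac >= min_mac:
            kept.append(snp)
    return kept
```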

Meta-analysis
A meta-analysis of the 57 CHARGE-COGENT cohorts was performed using the METAL package (http://www.sph.umich.edu/csg/abecasis/Metal) with an inverse-variance weighted model and single genomic control applied. The two UK Biobank groups, VNR Assessment Centre and VNR Web-Based, were also meta-analysed using the same method. An inverse-variance weighted meta-analysis of the CHARGE-COGENT and UK Biobank summary results was then performed.
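Genomic control divides each study's test statistics by an inflation factor λ, the median observed 1-df χ2 statistic divided by its expected median under the null (about 0.4549). The sketch below assumes z-scores as input and follows the common convention of correcting only when λ > 1; it illustrates the idea and is not METAL's code.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def genomic_control(z_scores):
    """Return (lambda, corrected z-scores) for one study's summary statistics."""
    chi2 = [z * z for z in z_scores]
    lam = statistics.median(chi2) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # only deflate inflated statistics; never inflate
    corrected = [z / math.sqrt(lam) for z in z_scores]
    return lam, corrected
```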

Reaction Time Genome-wide association analysis
The GWA of reaction time in the UK Biobank sample was performed using the BGENIE v1.2 analysis package (https://jmarchini.org/bgenie/). A linear SNP association model that accounts for genotype uncertainty was tested. Reaction time was adjusted for the following covariates: age, sex, genotyping batch, genotyping array, assessment centre, and 40 principal components.
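The per-SNP linear model can be sketched with ordinary least squares. This assumes genotypes are supplied as expected allele dosages (one common way to carry imputation uncertainty into the test) and fits the covariates in the same model; it is not BGENIE's implementation, and `snp_association` is a hypothetical name.

```python
import numpy as np

def snp_association(phenotype: np.ndarray, dosage: np.ndarray, covariates: np.ndarray):
    """OLS fit of phenotype ~ intercept + dosage + covariates.

    phenotype, dosage: length-n arrays; covariates: (n, k) array
    (e.g. age, sex, batch, array, centre, principal components).
    Returns the dosage effect (beta), its standard error and t statistic.
    """
    n = phenotype.shape[0]
    X = np.column_stack([np.ones(n), dosage, covariates])
    coef, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    resid = phenotype - X @ coef
    dof = n - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # coefficient covariance matrix
    beta, se = coef[1], np.sqrt(cov[1, 1])
    return beta, se, beta / se
```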

Gene-based analysis (MAGMA)
Gene-based analysis was conducted using MAGMA 32 . All SNPs located within protein-coding genes were used to derive a gene-level P-value for association with general cognitive function and with reaction time. The SNP-wise model from MAGMA was used, and NCBI build 37 was used to determine the location and boundaries of 18,199 autosomal genes. Linkage disequilibrium within and between genes was estimated using the 1000 Genomes phase 3 release 33 . A Bonferroni correction was applied to control for multiple testing.
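With 18,199 genes tested, the Bonferroni threshold at an assumed family-wise α of 0.05 is 0.05 / 18,199 ≈ 2.75 × 10−6. A small helper (hypothetical name) that applies the same rule to any collection of gene P-values:

```python
def bonferroni_significant(pvalues: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Return genes whose P-value is below alpha divided by the number of genes tested."""
    threshold = alpha / len(pvalues)
    return sorted(gene for gene, p in pvalues.items() if p < threshold)
```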

Estimation of SNP-based heritability
Univariate GCTA-GREML analyses 34 were used to estimate the proportion of phenotypic variance explained by all common SNPs in four of the largest individual cohorts: ELSA, Understanding Society, UK Biobank, and Generation Scotland. Sample sizes for the GCTA analyses in these cohorts differ from those of the association analyses because one individual was excluded from every pair of individuals with an estimated coefficient of relatedness > 0.025, to ensure that effects due to shared environment were not included. The same covariates were included in all GCTA-GREML analyses as in the SNP-based association analyses.
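The relatedness exclusion can be sketched as a greedy pass over pairs, assuming pairwise relatedness estimates have already been computed (for example, from a genetic relationship matrix). This is a simplified illustration; GCTA's own cutoff procedure may choose which member of a pair to drop differently, and `prune_related` is a hypothetical name.

```python
def prune_related(pairs, threshold=0.025):
    """Return IDs to exclude so that no retained pair has relatedness > threshold.

    pairs: iterable of (id_a, id_b, relatedness) tuples.
    """
    removed = set()
    for a, b, r in pairs:
        # Only act if both members are still retained; drop the second member
        if r > threshold and a not in removed and b not in removed:
            removed.add(b)
    return removed
```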

Univariate Linkage Disequilibrium Score Regression (LDSC)
Univariate LDSC regression was performed on the summary statistics from the GWASs of general cognitive function and reaction time. The heritability Z-score provides a measure of the polygenic signal in each data set; values greater than 4 indicate that the data are suitable for use with bivariate LDSC regression 35 . The mean χ2 statistic indicates the inflation of the GWAS test statistics; under the null hypothesis of no association, it would be 1. For each GWAS, LD score regression was carried out by regressing the GWA test statistics (χ2) on each SNP's LD score (the sum of squared correlations between the minor allele count of that SNP and the minor allele count of every other SNP in a surrounding window).
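Under the LDSC model, E[χ2_j] = N·h2·ℓ_j/M + N·a + 1, where N is the sample size, M the number of SNPs, ℓ_j the LD score of SNP j and a a confounding term. The slope therefore yields the SNP heritability h2, and an intercept near 1 suggests that inflation reflects polygenicity rather than confounding. The sketch below is an unweighted simplification with a hypothetical function name; the LDSC software additionally uses regression weights and block-jackknife standard errors.

```python
import numpy as np

def ldsc_h2(chi2: np.ndarray, ld_scores: np.ndarray, n: float, m: float):
    """Unweighted LD score regression of chi2 on LD score.

    Returns (intercept, h2), with h2 = slope * M / N from the LDSC expectation.
    """
    X = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return intercept, slope * m / n
```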
Bivariate GCTA was used to calculate genetic correlations between phenotypes and between cohorts where genotyping data were available. This method was used to calculate the genetic correlations between different cohorts for the general cognitive function phenotype. It was also used to investigate the genetic contribution to the stability of UK Biobank participants' verbal-numerical reasoning test scores from the assessment centre and from later web-based, online testing. Where only GWA summary results were available, LDSC was used to estimate genetic correlations between two traits (for example, general cognitive function and longevity) in order to estimate the degree of overlap between the polygenic architectures of the traits. Genetic correlations with a number of health outcomes were estimated for both general cognitive function and reaction time.

Polygenic prediction
Polygenic profile score analysis was used to predict cognitive test performance in Generation Scotland, the English Longitudinal Study of Ageing, and Understanding Society. Polygenic profiles were created in PRSice 37 using results of a general cognitive function meta-analysis that excluded the Generation Scotland, the English Longitudinal Study of Ageing, and Understanding Society cohorts. Polygenic profiles were also created based on the UK Biobank GWA reaction time results.
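In its simplest form, a polygenic profile score is a weighted sum of a person's effect-allele dosages, using GWA effect sizes as weights, over SNPs that pass a P-value threshold. This sketch omits the LD clumping and multiple thresholds that PRSice typically applies; `polygenic_score` is a hypothetical helper.

```python
def polygenic_score(dosages: dict[str, float], betas: dict[str, float],
                    pvalues: dict[str, float], p_threshold: float) -> float:
    """Sum of dosage x beta over SNPs with GWA P-value <= p_threshold.

    dosages: one person's effect-allele dosages (0-2) keyed by SNP ID.
    betas, pvalues: GWA results from a discovery sample that excludes the target cohort.
    """
    score = 0.0
    for snp, beta in betas.items():
        if pvalues[snp] <= p_threshold and snp in dosages:
            score += dosages[snp] * beta
    return score
```

Scores computed this way in a held-out cohort would then be regressed against measured cognitive test performance to estimate the variance explained.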

Functional Annotation and Loci Discovery
Genomic risk loci were derived using Functional mapping and annotation of genetic associations (FUMA) 14 . First, independent significant SNPs were identified using the SNP2GENE function, defined as SNPs with P ≤ 5 × 10−8 that were independent of other genome-wide significant SNPs at R2 < 0.6. Using these independent significant SNPs, candidate SNPs for subsequent annotation were identified as all SNPs with a MAF ≥ 0.0005 that were in LD (R2 ≥ 0.6) with at least one of the independent significant SNPs. These candidate SNPs included those from the 1000 Genomes reference panel and need not have been included in the GWAS performed in the current study. Lead SNPs were then identified from the independent significant SNPs as those independent of each other at R2 < 0.1. Genomic risk loci within 250 kb of each other were merged into a single locus.
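The final merging step can be sketched as an interval merge within each chromosome. This assumes each locus is given as (chromosome, start, end) in base pairs and that the 250 kb distance is measured from the end of one locus to the start of the next; FUMA's exact definition of locus boundaries may differ, and `merge_loci` is a hypothetical name.

```python
def merge_loci(loci, max_gap=250_000):
    """Merge (chrom, start, end) loci lying within max_gap bp of each other."""
    merged = []
    for chrom, start, end in sorted(loci):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Close enough to the previous locus: extend it
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged
```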
The lead SNPs and those in LD with them were then mapped to genes according to the functional consequences of the variants, annotated using ANNOVAR 38 with Ensembl genes build 85. Intergenic SNPs were mapped to the two closest up- and downstream genes, so a single SNP can be assigned to more than one gene. All SNPs in 1000 Genomes phase 3 were then annotated with a CADD score 39 , a RegulomeDB score 40 , and 15-core chromatin states 41,42,43 .
eQTL mapping was performed using each independent significant SNP and the SNPs in LD with it. SNP-gene pairs that were not significant at FDR ≤ 0.05 were omitted from the analysis.
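The FDR filter can be illustrated with the Benjamini-Hochberg procedure. The text does not state which FDR method underlies the eQTL resources, so this is a stand-in assumption; SNP-gene pairs with an adjusted value ≤ 0.05 would be retained. `benjamini_hochberg` is a hypothetical helper.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted P-values (q-values) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonic q-values
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q
```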

Gene-set analysis
Gene-set analysis was conducted in MAGMA 32 using competitive testing, which examines whether genes within a gene set are more strongly associated with each of the cognitive phenotypes than other genes. Such competitive tests have been shown to control the Type 1 error rate and to facilitate an understanding of the underlying biology of cognitive differences 44,45 . A total of 10,891 gene sets (sourced from Gene Ontology 46 , Reactome 47 , and MSigDB 48 ) were examined for enrichment of association with each phenotype. A Bonferroni correction was applied to control for the multiple tests performed on these 10,891 gene sets.
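The competitive test can be written, in simplified form, as a gene-level regression. Following MAGMA's general approach, each gene's P-value p_g is converted to a Z-score and the vector of Z-scores is regressed on a set-membership indicator S, with a one-sided test of the set effect; MAGMA additionally conditions on gene-level covariates such as gene size and gene density, which are omitted here. The last line gives the Bonferroni threshold for 10,891 sets, assuming α = 0.05.

```latex
\begin{aligned}
Z_g &= \Phi^{-1}(1 - p_g) \\
Z &= \beta_0 \mathbf{1} + \beta_S S + \varepsilon, \qquad S_g = \mathbb{1}[\,g \in \text{gene set}\,] \\
H_0 &: \beta_S = 0 \quad \text{vs.} \quad H_1 : \beta_S > 0 \\
\alpha_{\text{set}} &= 0.05 / 10{,}891 \approx 4.6 \times 10^{-6}
\end{aligned}
```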

Gene property analysis
To identify the tissue types involved in differences in general cognitive function and reaction time, a gene-property analysis was conducted using MAGMA.
The goal of this analysis was to determine whether, in 30 broad tissue types and 53 specific tissues, tissue-specific differential expression levels predicted the association of a gene with general cognitive function and reaction time. Tissue expression data were taken from the GTEx v6 RNA-seq database 49 ; expression values were winsorised at 50 and log2-transformed with a pseudocount of 1, and the average expression value was taken for each tissue. Multiple testing was controlled using a Bonferroni correction.
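The expression preprocessing can be sketched as follows, assuming a genes × samples matrix of GTEx expression values and one tissue label per sample. `tissue_average_expression` is a hypothetical helper, and averaging after the log transform is an assumption about the order of operations.

```python
import numpy as np

def tissue_average_expression(expr: np.ndarray, tissue_labels, cap: float = 50.0):
    """Winsorise at `cap`, apply log2(x + 1), then average samples within each tissue.

    expr: (n_genes, n_samples) expression matrix; tissue_labels: length n_samples.
    Returns (sorted tissue names, (n_genes, n_tissues) matrix of mean expression).
    """
    transformed = np.log2(np.minimum(expr, cap) + 1.0)  # cap high values, pseudocount 1
    labels = np.asarray(tissue_labels)
    tissues = sorted(set(tissue_labels))
    means = np.column_stack([transformed[:, labels == t].mean(axis=1) for t in tissues])
    return tissues, means
```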

Figure Captions
[Figure: tissue labels recovered from the figure include Brain Cortex, Brain Anterior cingulate cortex (BA24), and Brain Nucleus accumbens (basal ganglia).]