Article | Published:

Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals

Nature Genetics volume 50, pages 1112–1121 (2018)

Abstract

Here we conduct a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identify 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we find evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identify 10 independent genome-wide-significant SNPs and estimate a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generates polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.

Main

Educational attainment is moderately heritable1 and an important correlate of many social, economic and health outcomes2,3. Because of its relationship with many health outcomes, measures of educational attainment are available in most medical datasets. Partly for this reason, educational attainment was the focus of the first large-scale genome-wide association study (GWAS) of a social-science phenotype4 and has continued to serve as a ‘model phenotype’ for behavioral traits (analogous to height for medical traits). Genetic associations with educational attainment identified by GWAS have been used in follow-up work in which biological5 and behavioral mechanisms6,7 and genetic overlap with health outcomes8,9 were analysed.

The largest (n = 293,723) GWAS of educational attainment to date identified 74 approximately independent SNPs at genome-wide significance (hereafter, lead SNPs) and reported that a 10-million-SNP linear predictor (hereafter, polygenic score) had an out-of-sample predictive power of 3.2%10. Here, we expand the sample size to over a million individuals (n = 1,131,881). We identify 1,271 lead SNPs. For a subsample (n = 694,894), we also conduct genome-wide association analyses of variants on the X chromosome, identifying ten lead SNPs.

The marked increase in our GWAS sample size enables us to conduct a number of additional informative analyses. For example, we show that the lead SNPs have heterogeneous effects, and we perform within-family association analyses that probe the robustness of our results. Our biological annotation analyses, which focus on the results from the autosomal GWAS, reinforce the main findings from earlier GWAS in smaller samples, such as the role of many of the prioritized genes in brain development. However, the newly identified SNPs also lead to several new findings. For example, they strongly implicate genes involved in almost all aspects of neuron-to-neuron communication.

We find that a polygenic score derived from our results explains around 11% of the variance in educational attainment. We also report additional GWAS of three phenotypes that are highly genetically correlated with educational attainment: cognitive (test) performance (n = 257,841), self-reported math ability (n = 564,698) and hardest math class completed (n = 430,445). We identify 225, 618 and 365 lead SNPs, respectively. When we jointly analyze all four phenotypes using a recently developed method11, we find that the explanatory power of polygenic scores based on the resulting summary statistics increases, to 12% for educational attainment and 7–10% for cognitive performance.

Results

Primary GWAS of educational attainment

In our primary GWAS, we study educational attainment, which is measured as the number of years of schooling that individuals completed (EduYears). All association analyses were performed at the cohort level in samples restricted to European-descent individuals. We applied a uniform set of quality-control procedures to all cohort-level results. Our final sample-size-weighted meta-analysis produced association statistics for around 10 million SNPs from phase 3 of the 1000 Genomes Project12.

The quantile–quantile plot of the meta-analysis (Supplementary Fig. 1) exhibits substantial inflation (λGC = 2.04). According to our linkage disequilibrium (LD) score regression13 estimates, only a small share (approximately 5%) of this inflation is attributable to bias (Supplementary Fig. 2 and Supplementary Table 1). We used the estimated LD score intercept (1.11) to generate inflation-adjusted test statistics.
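The adjustment can be illustrated with a minimal sketch. It assumes that inflation-adjusting a test statistic means dividing each χ2 statistic by the LD score intercept (equivalently, dividing each Z-statistic by its square root); the function name and the example Z-statistic are hypothetical and are not taken from the study.

```python
import math

LD_SCORE_INTERCEPT = 1.11  # estimated intercept reported in the text


def inflation_adjust(z, intercept=LD_SCORE_INTERCEPT):
    """Return (adjusted Z, adjusted chi-square, two-sided P) for one SNP.

    Assumption: the adjustment divides the chi-square statistic by the
    intercept, i.e. divides Z by sqrt(intercept).
    """
    z_adj = z / math.sqrt(intercept)
    chi2_adj = z_adj ** 2
    # Two-sided P value of a standard normal Z via the complementary error function.
    p_adj = math.erfc(abs(z_adj) / math.sqrt(2.0))
    return z_adj, chi2_adj, p_adj


# Hypothetical Z-statistic, used only to show the direction of the adjustment.
z_adj, chi2_adj, p_adj = inflation_adjust(6.0)
print(f"adjusted Z = {z_adj:.3f}, chi2 = {chi2_adj:.2f}, P = {p_adj:.2e}")
```

Because the intercept exceeds one, every adjusted statistic is smaller in magnitude and every adjusted P value is larger than its unadjusted counterpart.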

Figure 1 shows the Manhattan plot of the resulting P values. We identified 1,271 approximately independent (pairwise r2 < 0.1) SNPs at genome-wide significance (P < 5 × 10−8), 995 of which remain if we adopt the stricter significance threshold (P < 1 × 10−8) proposed in a recent study14 (Supplementary Table 2, see Methods for a description of the clumping algorithm). The results from a conditional-joint analysis15 are reported in the Supplementary Note and Supplementary Table 3.

Fig. 1: Manhattan Plot for GWAS of EduYears.

The P value and mean χ2 value are based on inflation-adjusted test statistics. The x axis is chromosomal position and the y axis is the significance on a \(-\log_{10}\) scale. The dashed line marks the threshold for genome-wide significance (P = 5 × 10−8) (n = 1,131,881).

We used a Bayesian statistical framework to calculate winner’s-curse-adjusted posterior distributions of the effect sizes of the lead SNPs (Methods). We found that the median effect size of the lead SNPs corresponds to 1.7 weeks of schooling per allele; at the 5th and 95th percentiles, 1.1 and 2.6 weeks, respectively. We also examined the replicability of the 162 single-SNP associations (P < 5 × 10−8) that were reported in the combined discovery and replication sample (n = 405,073) of the largest previous study10. In the subsample of our data (n = 726,808) that did not contribute to the analyses of the previous study, the SNPs replicate at a rate that closely matches theoretical projections derived from our Bayesian framework (Supplementary Fig. 3).

Within-family association analyses

We conducted within-family association analyses in four sibling cohorts (22,135 sibling pairs) and compared the resulting estimates to those from a meta-analysis that excluded the siblings (n = 1,070,751). The latter association statistics were adjusted for stratification bias using the LD score intercept. Figure 2 shows the observed sign concordance for three sets of approximately independent SNPs, selected using P value cutoffs of 5 × 10−3, 5 × 10−5 and 5 × 10−8. The concordance is substantially greater than expected by chance but weaker than predicted by our Bayesian framework, even after we extend the framework to account for inflation in GWAS coefficients owing to assortative mating. In a second analysis based on all SNPs, we estimate that within-family effect sizes are roughly 40% smaller than GWAS effect sizes and that our assortative-mating adjustment explains at most one third of this deflation. (For comparison, when we apply the same method to height, we find that the assortative-mating adjustment fully explains the deflation of the within-family effects.)
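The sign-concordance statistic itself is simple to compute. The sketch below is illustrative only: the function names and the five pairs of effect estimates are hypothetical, and the chance benchmark uses a normal approximation to a Binomial(n, 0.5) rather than the Bayesian framework described in the text.

```python
import math


def sign_concordance(gwas_betas, family_betas):
    """Fraction of SNPs whose two effect estimates share the same sign."""
    pairs = list(zip(gwas_betas, family_betas))
    same = sum(1 for g, f in pairs if g * f > 0)
    return same / len(pairs)


def chance_z(concordance, n_snps):
    """Z-score of the observed concordance against the 50% expected by chance
    (normal approximation to a Binomial(n_snps, 0.5))."""
    return (concordance - 0.5) / math.sqrt(0.25 / n_snps)


# Hypothetical estimates for five SNPs (not taken from the study).
gwas = [0.020, -0.015, 0.012, 0.018, -0.011]
family = [0.010, -0.004, -0.003, 0.009, -0.006]
c = sign_concordance(gwas, family)
print(c, chance_z(c, len(gwas)))
```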

Fig. 2: Sign concordance in within-family association analyses.

a, Sign concordance for LD-pruned SNPs reaching P < 5 × 10−3 (14,670 SNPs). b, Sign concordance for LD-pruned SNPs reaching P < 5 × 10−5 (4,594 SNPs). c, Sign concordance for LD-pruned SNPs reaching P < 5 × 10−8 (that is, lead SNPs; 1,318 SNPs). Each panel compares the observed sign concordance between within-family and GWAS estimates to the distributions expected by chance alone (pink); according to a Bayesian framework that adjusts the GWAS estimates for bias due to winner’s curse (green); and according to the same framework with an additional adjustment for bias due to assortative mating (blue). These results are based on a GWAS sample size of 1,070,751 individuals and a within-family sample of 22,135 sibling pairs (44,270 individuals).

The Supplementary Note contains analyses and discussion of the possible causes of the remaining deflation we observe for EduYears. Although the evidence is not conclusive, it suggests that the GWAS effect-size estimates may be biased upward by correlation between educational attainment and a rearing environment conducive to educational attainment. Consistent with this hypothesis, a recent paper16 reports that a polygenic score for EduYears based entirely on the non-transmitted alleles of the parents is approximately 30% as predictive as a polygenic score based on transmitted alleles. (For height, the analogous estimate is only 6%.) The non-transmitted alleles affect the educational attainment of the parents but can only influence the educational attainment of the child indirectly. If greater parental educational attainment positively influences the rearing environment, then GWAS that control imperfectly for rearing environment will yield inflated estimates. The LD score regression intercept does not capture this bias because the bias scales with the LD score in the same way as a direct genetic effect.

Heterogeneous effect sizes

Because educational institutions vary across places and time, the effects of specific SNPs may vary across environments. Consistent with such heterogeneity, for the lead SNPs, we reject the joint null hypothesis of homogeneous cohort-level effects (P = 9.7 × 10−12; Supplementary Fig. 4). Moreover, we found that the inverse-variance-weighted mean genetic correlation of EduYears across pairs of cohorts in our sample is 0.72 (s.e. = 0.14), which is statistically distinguishable from one (P = 0.03).

Our finding of an imperfect genetic correlation replicates earlier results from smaller samples17,18. This imperfect genetic correlation is an important factor to consider in power calculations and study design. In the Supplementary Note, we report exploratory analyses that aim to identify specific sources of measurement heterogeneity or gene–environment interactions that may explain the imperfect genetic correlation. Unfortunately, the estimates are noisy, and the only robust finding was that SNP heritability was smaller in cohorts for which the measurement of EduYears was derived from questions with fewer response categories.

X-chromosome GWAS results

We supplemented our autosomal analyses with association analyses of SNPs on the X chromosome. We first conducted separate association analyses of males (n = 152,608) and females (n = 176,750) in the UK Biobank. We found a male–female genetic correlation close to unity. We also found nearly identical SNP heritability estimates for men and women, which is consistent with partial dosage compensation (that is, on average the per-allele effect sizes are smaller in women) and indicates that any contribution of common variants on the X chromosome to sex differences in the normal-range variance of cognitive phenotypes19 is quantitatively negligible.

Next, we conducted a large (n = 694,894) meta-analysis of summary statistics from mixed-sex analyses (Supplementary Fig. 5). We identified 10 lead SNPs and estimated a SNP heritability due to the X chromosome of approximately 0.3% (Supplementary Table 4). This heritability is lower than that expected for an autosome of similar length (Supplementary Fig. 6 and Supplementary Table 5). We cannot distinguish whether the lower heritability is due to smaller per-allele effect sizes for SNPs on the X chromosome or to the combination of haploidy in males and (partial) X inactivation in females.

Biological annotation

For biological annotation, we focus on the results from the autosomal meta-analysis of EduYears. Across an extensive set of analyses (see Supplementary Fig. 7 for a flow chart), all major conclusions from the largest previous GWAS of EduYears10 continue to hold but are statistically stronger. For example, we applied the bioinformatic tool DEPICT20 and found that, relative to other genes, genes near our lead SNPs were overwhelmingly enriched for expression in the central nervous system (Fig. 3a and Supplementary Table 6).

Fig. 3: Tissue-specific expression of genes in DEPICT-defined loci.

a, We took microarray measurements from the Gene Expression Omnibus20 and determined whether the genes overlapping EduYears-associated loci (as defined by DEPICT) are significantly overexpressed (relative to genes in random sets of loci) in each of 180 tissues or cell types. These types are grouped by first-level terms according to the medical subject headings (MeSH). The y axis is the one-sided P value from DEPICT on a \(-\log_{10}\) scale. The 28 dark bars correspond to tissues or cell types in which the genes are significantly overexpressed (FDR < 0.01), including all 22 classified as part of the central nervous system (see Supplementary Table 6 for identifiers of all tissues and cell types). b, Whereas genes prioritized by DEPICT in a previous analysis based on a smaller sample10 tend to be more strongly expressed in the brain prenatally (red curve), the 1,703 newly prioritized genes show a flat trajectory of expression across development (blue curve). Both groups of DEPICT-prioritized genes show elevated levels of expression relative to protein-coding genes that are not prioritized (gray curve). Analyses were based on RNA-sequencing data from the BrainSpan Developmental Transcriptome35. These results are based on the full GWAS sample of 1,131,881 individuals. Error bars represent 95% confidence intervals. RPKM, reads per kilobase of transcript per million reads mapped.

There are also many novel findings associated with the large number of genes newly implicated by our analyses. At the standard false discovery rate (FDR) threshold of 5%, the bioinformatic tool DEPICT20 prioritizes 1,838 genes (Supplementary Table 7), a tenfold increase relative to the DEPICT results from an earlier GWAS of EduYears10. In the following paragraphs, we distinguish between the 1,703 ‘newly prioritized’ genes and the 135 ‘previously prioritized’ genes.

An extensive analysis of many of the newly prioritized genes and their brain-related functions is described in the Supplementary Note. Here we highlight two especially noteworthy regularities. First, whereas previously prioritized genes exhibited especially high expression in the brain prenatally, the newly prioritized genes show elevated levels of expression both pre- and postnatally (Fig. 3b). Many of the newly prioritized genes encode proteins that carry out neurophysiological functions such as neurotransmitter secretion, the activation of ion channels and metabotropic pathways, and synaptic plasticity (Supplementary Fig. 8).

Second, even though glial cells are at least as numerous as neurons in the human brain21, gene sets related to glial cells (astrocytes, myelination and positive regulation of gliogenesis) are absent from those identified as positively enriched (Supplementary Table 8). Furthermore, using stratified LD score regression22, we estimated relatively weak enrichment of genes highly expressed in glial cells (Supplementary Table 9): 1.08-fold for astrocytes (P = 0.07) and 1.09-fold for oligodendrocytes (P = 0.06) versus 1.33-fold for neurons (P = 2.89 × 10−11). Because myelination increases the speed with which signals are transmitted along axons23, the absence of enrichment of genes related to glial cells may weigh against the hypothesis that differences across people in cognition are driven by differences in transmission speed.

The results also raise a number of possible targets for functional studies. Among SNPs within 50 kb of lead SNPs, 127 of them are identified by the fine-mapping tool CAVIARBF24 as likely causal SNPs (posterior probability > 0.9; Supplementary Table 10). Eight of these are non-synonymous, and one of these eight (rs61734410) is located in CACNA1H (Supplementary Fig. 9), which encodes the pore-forming subunit of a voltage-gated calcium channel that has been implicated in the trafficking of N-methyl-D-aspartate receptors25.

Polygenic prediction

Polygenic predictors derived from earlier GWAS of EduYears have proven to be a valuable tool for researchers, especially in the social sciences6,7. We constructed polygenic scores for individuals of European ancestry in two prediction cohorts: the National Longitudinal Study of Adolescent to Adult Health (Add Health, n = 4,775), a representative sample of American adolescents; and the Health and Retirement Study (HRS, n = 8,609), a representative sample of Americans over the age of 50. We measure prediction accuracy by the ‘incremental R2’ statistic: the gain in the coefficient of determination (R2) when the score is added as a covariate to a regression of the phenotype on a set of baseline controls (sex, birth year, their interaction and 10 principal components of the genetic relatedness matrix).
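The incremental R2 statistic defined above can be sketched as the difference between two ordinary-least-squares fits. This is a minimal illustration on simulated data, assuming NumPy is available; the function names, the simulated controls and the effect sizes are hypothetical and do not reproduce the Add Health or HRS analyses.

```python
import numpy as np


def r_squared(y, X):
    """R^2 from an OLS regression of y on X (an intercept is added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()


def incremental_r2(y, controls, score):
    """Gain in R^2 from adding the score to the baseline controls."""
    baseline = r_squared(y, controls)
    full = r_squared(y, np.column_stack([controls, score]))
    return full - baseline


# Simulated data standing in for a prediction cohort (purely illustrative).
rng = np.random.default_rng(0)
n = 2000
controls = rng.normal(size=(n, 3))   # stand-ins for sex, birth year, PCs
score = rng.normal(size=n)           # stand-in for a polygenic score
y = 0.3 * score + controls @ np.array([0.2, -0.1, 0.05]) + rng.normal(size=n)
print(f"incremental R2 = {incremental_r2(y, controls, score):.3f}")
```

Because the baseline model is nested in the full model, the incremental R2 is never negative in sample; a score that is a linear combination of the controls adds nothing.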

All scores are based on the results from a meta-analysis that excluded the prediction cohorts. Our first four scores were constructed from sets of LD-pruned SNPs associated with EduYears at various P-value thresholds: 5 × 10−8, 5 × 10−5, 5 × 10−3 and 1 (that is, all SNPs). In both cohorts, the predictive power is greater for scores constructed with less stringent thresholds (Supplementary Fig. 10). The sample-size-weighted mean incremental R2 increases from 3.2% at P < 5 × 10−8 to 9.4% at P ≤ 1. Our fifth score was generated from HapMap3 SNPs using the software LDpred26. Rather than removing SNPs that are in LD with each other, LDpred is a Bayesian method that weights each SNP by (an approximation to) the posterior mean of its conditional effect, given other SNPs. This score was the most predictive in both cohorts, with an incremental R2 of 12.7% in AddHealth and 10.6% in HRS (and a sample-size weighted mean of 11.4%).
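The P-value-threshold scores can be illustrated as weighted allele counts. The sketch below assumes the SNPs have already been LD-pruned and uses hypothetical genotypes, weights and P values; it does not implement LDpred, whose posterior-mean weights require an LD reference panel.

```python
def polygenic_score(genotypes, weights, p_values, p_threshold):
    """Weighted allele count over SNPs whose GWAS P value passes the threshold.

    genotypes: effect-allele counts (0, 1 or 2) for one individual, per SNP
    weights:   GWAS effect-size estimates, per SNP
    p_values:  GWAS P values, per SNP (SNPs are assumed already LD-pruned)
    """
    return sum(
        g * w
        for g, w, p in zip(genotypes, weights, p_values)
        if p <= p_threshold
    )


# Hypothetical individual and summary statistics for four SNPs.
genotypes = [2, 1, 0, 1]
weights = [0.02, -0.01, 0.03, 0.005]
p_values = [1e-9, 1e-6, 1e-4, 0.2]
for threshold in (5e-8, 5e-5, 5e-3, 1.0):
    print(threshold, round(polygenic_score(genotypes, weights, p_values, threshold), 4))
```

Loosening the threshold adds SNPs with noisier weights; whether that helps prediction is an empirical question, which is why the text compares the thresholds out of sample.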

To put the predictive power of this score in perspective, Fig. 4a shows the mean college completion rate by polygenic-score quintile. The difference between the bottom and top quintiles in Add Health and HRS is, respectively, 45 and 36 percentage points (see Supplementary Fig. 11 for analogous analyses of high school completion and grade retention). Figure 4b compares the incremental R2 of the score to that of standard demographic variables. The score is a better predictor of EduYears than household income and a worse predictor than the educational attainment of the mother or father. Controlling for all the demographic variables jointly, the score’s incremental R2 is 4.6% (Supplementary Fig. 12).

Fig. 4: Prediction Accuracy.

a, Mean prevalence of college completion by EduYears polygenic score (PGS) quintile. Data are mean ± 95% confidence interval. b, Incremental R2 of the EduYears PGS compared to that of other variables. c, Incremental R2 of the PGS for EduYears and cognitive performance constructed from the respective GWAS or MTAG summary statistics. Error bars for the R2 values show bootstrapped 95% confidence intervals with 1,000 iterations each. Sample sizes are n = 4,775 for Add Health and n = 8,609 for HRS.

We also found that the score has substantial predictive power for a variety of other cognitive phenotypes measured in the prediction cohorts (Supplementary Fig. 13). For example, it explains 9.2% of the variance in overall grade point average in Add Health.

Because the discovery sample used to construct the score consisted of individuals of European ancestry, we would not expect the predictive power of our score to be as high in other ancestry groups7,27,28. Indeed, when our score is used to predict EduYears in a sample of African-Americans from the HRS (n = 1,519), it has an incremental R2 of only 1.6%, implying an attenuation of 85%. The Supplementary Note shows that this amount of attenuation is typical of what has been reported in previous studies.
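As a rough check of the stated attenuation, assume that it is measured relative to the 10.6% incremental R2 obtained for the European-ancestry HRS sample (this baseline is an assumption; the text does not name it):

$$1-\frac{{R}_{{\rm{African}}{\rm{-}}{\rm{American}}}^{2}}{{R}_{{\rm{European}}}^{2}}=1-\frac{1.6 \% }{10.6 \% }\approx 0.85$$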

Related cognitive phenotypes and multi-trait analysis of GWAS

We performed GWAS on three complementary phenotypes: cognitive performance (n = 257,841), self-reported math ability (n = 564,698), and highest math class taken (highest math, n = 430,445). For cognitive performance, we meta-analyzed published results from the COGENT consortium29 with results based on new analyses of the UK Biobank (UKB), as did another study30. For the two math phenotypes, we conducted new genome-wide analyses in samples of research participants from 23andMe. We identified 225, 618 and 365 genome-wide significant SNPs for cognitive performance, math ability and highest math, respectively (Supplementary Figs. 14–16 and Supplementary Tables 11–13).

We conducted a multi-trait analysis of EduYears and our supplementary phenotypes to improve polygenic prediction accuracy. These phenotypes are well suited to joint analysis because their pairwise genetic correlations are high, in all cases exceeding 0.5 (Supplementary Table 14). We applied a recently developed method, multi-trait analysis of GWAS (MTAG)11, to summary statistics for the four phenotypes from meta-analyses that exclude the prediction cohorts. For all four phenotypes, MTAG increases the number of lead SNPs identified at genome-wide significance (Supplementary Figs. 17–20 and Supplementary Table 15). Figure 4c shows the incremental R2 for the polygenic scores based on GWAS and MTAG association statistics (but otherwise constructed using identical methods) when the target phenotype is either EduYears (left panel) or cognitive performance (right panel).

In Add Health, in which our measure of cognitive performance is the respondent’s score on a test of verbal cognition, the incremental R2 values of the GWAS and MTAG scores are 5.1% and 6.9%, respectively. To obtain a better measure of prediction accuracy for cognitive performance, we used an additional validation cohort, the Wisconsin Longitudinal Study (WLS), which administered a cognitive test with excellent retest reliability and psychometric properties that were similar to those used in our discovery GWAS of cognitive performance. In the WLS, the MTAG score predicts 9.7% of the variance in cognitive performance, a substantial improvement over the 7.0% predicted by the GWAS score and approximately double the prediction accuracy reported in three recent GWAS analyses of cognitive performance29,31,32.

Discussion

The results of this study illustrate what the advocates of GWAS anticipated: as sample sizes get larger, thousands of lead SNPs will be identified, and polygenic predictors will attain non-trivial levels of predictive power. However, theoretical projections that failed to consider heterogeneity of effect sizes were optimistic4. Our and others’ findings17,18 suggest that imperfect genetic correlation across cohorts will be the norm for phenotypes, such as educational attainment, that are environmentally contingent.

For research at the intersection of genetics and neuroscience, the set of 1,271 lead SNPs that we identify is a treasure trove for future analyses. For research in social science and epidemiology, the polygenic scores that we construct—which explain 11–13% and 7–10% of the variance in educational attainment and cognitive performance, respectively—will prove useful across at least three types of applications.

First, by examining associations between the scores and high-quality measures of endophenotypes, researchers may be able to disentangle the mechanisms by which genetic factors affect educational attainment and cognitive phenotypes. Such studies are already being conducted with polygenic scores from earlier GWAS of educational attainment6,7, but they can now be well powered in samples as small as those from laboratory experiments. For example, if our polygenic score explains 10% of the variance in an endophenotype, then its effect can be detected at a 5% significance threshold with 80% power in a sample of only 75 individuals. Second, the polygenic scores can be used as control variables in randomized controlled trials (RCTs) of interventions that aim to improve academic and cognitive outcomes. Given the current levels of predictive power of the scores, such use can now generate non-trivial gains in statistical power for the RCT. For example, if adding the polygenic score to the set of control variables in an RCT increases their joint explanatory power from 10% to 20%, then the gain in power from including the polygenic score is equivalent to increasing the sample size of the RCT by 11% (for such calculations, see the supplementary online material of a previous study4). Third, the polygenic scores can be used as a tool for exploring gene–environment interactions33, which are known to be important for genetic effects on educational attainment and cognitive performance1,34.

Our results also highlight two caveats to the use of the polygenic scores in research. First, our within-family analyses suggest that GWAS estimates may overstate the causal effect sizes: if educational attainment-increasing genotypes are associated with parental educational attainment-increasing genotypes, which are in turn associated with rearing environments that promote educational attainment, then failure to control for rearing environment will bias GWAS estimates. If this hypothesis is correct, some of the predictive power of the polygenic score reflects environmental amplification of the genetic effects. Without controls for this bias, it is therefore inappropriate to interpret the polygenic score for educational attainment as a measure of genetic endowment.

Second, we found that our score for educational attainment has much lower predictive power in a sample of African-American individuals than in a sample of individuals of European ancestry, and we anticipate that the score would also have reduced predictive power in other samples of individuals of non-European ancestry. Therefore, until polygenic scores are available that have as much predictive power in other ancestry groups, the score will be most useful in research that is focused on samples of individuals of European ancestry.

URLs

Social Science Genetic Association Consortium (SSGAC), http://www.thessgac.org/#!data/kuzq8; Minimac2, https://genome.sph.umich.edu/wiki/Minimac2; BEAGLE v.2.1.2, http://faculty.washington.edu/browning/beagle/b3.html; IMPUTE2 v.2.3.1, http://mathgen.stats.ox.ac.uk/impute/impute_v2.html; PBWT, https://github.com/richarddurbin/pbwt; IMPUTE4, https://jmarchini.org/impute-4/; ShapeIT v.2.r790, http://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html; BOLT-LMM, https://data.broadinstitute.org/alkesgroup/BOLT-LMM/; SNPTEST v.2.4.1, https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html; REGSCAN v.0.2.0, https://www.geenivaramu.ee/en/tools/regscan; METAL, release 2011-03-25, http://csg.sph.umich.edu/abecasis/metal/; EasyQC v.9.0, http://www.uni-regensburg.de/medizin/epidemiologie-praeventivmedizin/genetische-epidemiologie/software/; ldsc v.1.0.0, https://github.com/bulik/ldsc; Plink, 1.90b3p, http://zzz.bwh.harvard.edu/plink/plink2.shtml; LDpred v.0.9.09, https://bitbucket.org/bjarni_vilhjalmsson/ldpred; Stata v.14.2, https://www.stata.com/install-guide/windows/download/; DEPICT (downloaded February 2016), https://data.broadinstitute.org/mpg/depict/; MAGMA v.1.06b, https://ctg.cncr.nl/software/magma; PANTHER release 20170403, http://www.geneontology.org; CAVIARBF v.0.2.1, https://bitbucket.org/Wenan/caviarbf; MTAG software v.1.0.1, https://github.com/omeed-maghzian/mtag.

Methods

This article is accompanied by a Supplementary Note with further details.

GWAS meta-analyses

Our primary analysis extends the (combined discovery and replication) sample of a previous GWAS of educational attainment10 from n = 405,072 to n = 1,131,881 individuals. We performed a sample-size-weighted meta-analysis of 71 quality-controlled cohort-level results files using the METAL software36. The meta-analysis combines 59 cohort-level result files from the previous study with 12 new result files: 8 from cohorts that were not included in the previous study10 and 4 from cohorts that updated their results with larger samples.
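A sample-size-weighted meta-analysis combines cohort-level Z-statistics with weights proportional to the square root of each cohort's sample size. The sketch below shows that combination for a single SNP; it is not the METAL implementation, the function name is hypothetical, and the two cohorts in the example are invented.

```python
import math


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine cohort-level Z-statistics for one SNP with weights sqrt(N_i).

    All Z-statistics are assumed to be aligned to the same effect allele.
    """
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


# Two hypothetical cohorts with 10,000 and 40,000 individuals.
print(sample_size_weighted_z([2.0, 3.0], [10_000, 40_000]))
```

With these weights the combined statistic has unit variance under the null when the cohort-level statistics are independent, which is what makes the pooled Z directly interpretable.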

All cohort-level analyses were restricted to individuals of European ancestry who passed the quality control of the cohort and for whom EduYears was measured at an age of at least 30. The EduYears phenotype was constructed by mapping each major educational qualification that can be identified from the survey measure of the cohort to an International Standard Classification of Education (ISCED) category and imputing a years-of-education equivalent for each ISCED category. Details on cohort-level phenotype measures, genotyping, imputation, association analyses and quality-control filters are described in Supplementary Tables 16–19.

We used the estimated intercept from LD score regression13 to inflation-adjust the test statistics. We then used the clumping algorithm described below to determine the number of approximately independent SNPs identified at any given P-value threshold.

Clumping algorithm

Our clumping algorithm is iterative and has been used previously10. We describe it here for the case of identifying lead SNPs among the set of SNPs that reached P < 5 × 10−8; the algorithm is the same when determining sets of approximately independent SNPs for other P-value thresholds.

First, the SNP with the smallest P value in the pooled meta-analysis results is identified as the lead SNP of the first clump. Next, all SNPs in LD with the lead SNP are also assigned to this clump. SNPs are defined to be in LD with each other if they are on the same chromosome and the squared correlation of their genotypes is r2 > 0.1. To determine the second lead SNP and second clump, the first clump is removed, and the same steps are applied to the remaining SNPs. The process is repeated until no SNPs with a P value below 5 × 10−8 remain. Each locus is defined by a lead SNP and the SNPs assigned to its clump. Therefore, each lead SNP maps to exactly one locus, and each locus maps to exactly one lead SNP.

We performed the clumping in Plink37. Note that we measure the LD between every pair of SNPs on each chromosome without regard to the physical distance between them. Therefore, if two SNPs on the same chromosome have a pairwise r2 above 0.1, then they cannot both be lead SNPs. On the other hand, it is possible for two SNPs in close physical proximity both to be lead SNPs, provided their pairwise r2 is below 0.1. Analyses of the sensitivity of the number of lead SNPs and loci to alternative definitions and to the choice of the reference file used to estimate LD are included in the Supplementary Note.
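The iterative procedure described above can be sketched in a few lines. This is an illustrative reimplementation, not the Plink routine used in the study: the function name, the dictionary-based inputs and the five example SNPs with their LD values are all hypothetical.

```python
def clump(p_values, chrom, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy clumping: returns {lead_snp: [other SNPs in its clump]}.

    p_values: dict snp -> P value
    chrom:    dict snp -> chromosome
    r2:       function (snp_a, snp_b) -> squared genotype correlation
    """
    remaining = set(p_values)
    clumps = {}
    while True:
        candidates = [s for s in remaining if p_values[s] < p_threshold]
        if not candidates:
            break
        # The remaining SNP with the smallest P value leads the next clump.
        lead = min(candidates, key=lambda s: (p_values[s], s))
        # Same-chromosome SNPs in LD with the lead join the clump,
        # regardless of physical distance or their own P value.
        members = [
            s for s in remaining
            if s != lead and chrom[s] == chrom[lead] and r2(lead, s) > r2_threshold
        ]
        clumps[lead] = sorted(members)
        remaining -= {lead, *members}
    return clumps


# Hypothetical summary statistics and pairwise LD (unlisted pairs have r2 = 0).
p_values = {"A": 1e-20, "B": 1e-10, "C": 1e-9, "D": 1e-12, "E": 0.01}
chrom = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 1}
ld = {frozenset("AB"): 0.5, frozenset("AC"): 0.05, frozenset("CE"): 0.3}


def r2(a, b):
    return ld.get(frozenset((a, b)), 0.0)


print(clump(p_values, chrom, r2))
```

In this toy example A and C are both lead SNPs on the same chromosome because their pairwise r2 is below 0.1, whereas B is absorbed into the clump led by A.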

Conditional and joint multiple-SNP analysis

Given a P-value threshold specified by the user, conditional and joint multiple-SNP analysis (COJO)15 is a method that identifies a set of SNPs such that, in a multivariate regression of the phenotype on all the SNPs in the set, every SNP has a P value below the threshold. COJO uses the meta-analysis summary statistics together with LD estimates from a reference sample. Our COJO analysis was conducted using a reference sample of approximately unrelated individuals of European ancestry from UK Biobank. We specified the P-value threshold as 5 × 10−8. The analyses were restricted to SNPs satisfying recommended quality-control filters. The Supplementary Note contains additional details.

Bayesian framework for calculating winner’s-curse-adjusted posterior effect-size distributions

We assume that the marginal effect size of each SNP is drawn from the following mixture distribution:

$${\beta }_{j} \sim \left\{\begin{array}{ll}N\left(0,{\tau }^{2}\right) & {\rm{with}}\,{\rm{probability}}\,\pi \\ 0 & {\rm{otherwise}}\end{array}\right.$$

where τ2 is the effect-size variance for non-null SNPs and π is the fraction of non-null SNPs in our data. We estimate the parameters τ2 and π by maximum likelihood. Given their values, the posterior distribution of SNP j can be calculated from Bayes’ rule. Relative to the GWAS effect estimate, the mean of the posterior distribution is shrunken toward zero (because zero is the mean of the prior distribution) and is not biased by the winner’s curse. Further details and a derivation of the likelihood function used in the maximum-likelihood estimation are provided in the supplementary note of a previous SSGAC study38.
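To make the shrinkage concrete, the following sketch computes the posterior probability that a SNP is non-null and the posterior mean of its effect. It relies on an assumption that is not stated in this section, namely that the GWAS estimate is normally distributed around the true effect with a known standard error; the full likelihood is given in the supplementary note cited above. The function names and the example estimate and standard error are hypothetical, and the values of τ2 and π are used for illustration only.

```python
import math


def normal_pdf(x: float, var: float) -> float:
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def posterior_summary(beta_hat: float, se: float, tau2: float, pi: float):
    """Return (P(non-null | beta_hat), posterior mean of beta).

    Assumed sampling model (not from the text): beta_hat | beta ~ N(beta, se^2).
    """
    s2 = se * se
    like_nonnull = normal_pdf(beta_hat, tau2 + s2)  # marginal if beta ~ N(0, tau2)
    like_null = normal_pdf(beta_hat, s2)            # marginal if beta = 0
    p_nonnull = pi * like_nonnull / (pi * like_nonnull + (1.0 - pi) * like_null)
    shrink = tau2 / (tau2 + s2)                     # shrinkage factor given non-null
    return p_nonnull, p_nonnull * shrink * beta_hat


# Hypothetical estimate and standard error; tau2 and pi are illustrative.
p_nonnull, post_mean = posterior_summary(beta_hat=0.02, se=0.003,
                                         tau2=5.02e-6, pi=0.33)
```

The posterior mean always lies between zero and the GWAS estimate, which is the shrinkage toward the prior mean described above.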

To calculate the 5th, 50th and 95th percentile of the effect-size distribution of our lead SNPs, we simulated effect sizes from the posterior distribution of each lead SNP and identified the 5th, 50th and 95th percentiles of the complete set of simulated effect sizes.

As described below, we also use this Bayesian framework in our GWAS and MTAG replication analyses and in our within-family analyses.

Replication of lead SNPs from the previous combined-stage analysis

We conducted a replication analysis of the 162 lead SNPs identified at genome-wide significance in a previous10 combined-stage (discovery and replication) meta-analysis (n = 405,073). Of the 162 SNPs, 158 passed quality-control filters in our updated meta-analysis. To examine their out-of-sample replicability, we calculated Z-statistics from the subsample of our data (n = 726,808) that was not included in the previous study10. Let the Z-statistics of association from, respectively, the previous study10, the new data and our current meta-analysis, be denoted by Z1, Z2 and Z. Since our meta-analysis used sample-size weighting36, Z2 is implicitly defined by:

$$Z=\sqrt{\frac{{N}_{1}}{N}}{Z}_{1}+\sqrt{\frac{{N}_{2}}{N}}{Z}_{2}$$

where SNP subscripts have been dropped and N’s are sample sizes. Because this formula holds when Z1 and Z2 are independent, the implicitly defined Z2 is interpreted as the additional information contained in the new data.
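Solving the formula above for Z2 gives the statistic attributable to the new data. A minimal sketch, assuming that N = N1 + N2 (which holds for the sample sizes quoted in this section, 405,073 + 726,808 = 1,131,881); the function name and the example Z-statistics are hypothetical.

```python
import math


def implied_z2(z: float, z1: float, n1: int, n2: int) -> float:
    """Back out Z2 from Z = sqrt(N1/N) * Z1 + sqrt(N2/N) * Z2, with N = N1 + N2."""
    n = n1 + n2
    return (z - math.sqrt(n1 / n) * z1) / math.sqrt(n2 / n)


# Round trip with hypothetical Z-statistics and the sample sizes from the text.
n1, n2 = 405_073, 726_808
z1_example, z2_example = 5.5, 4.0
z_combined = (math.sqrt(n1 / (n1 + n2)) * z1_example
              + math.sqrt(n2 / (n1 + n2)) * z2_example)
z2_recovered = implied_z2(z_combined, z1_example, n1, n2)
```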

Of the 158 SNPs, we found that 154 have matching signs in the new data (for the remaining four SNPs, the estimated effect is never statistically distinguishable from zero at P < 0.10). Of the 154 SNPs with matching signs, 143 are significant at P < 0.01, 119 are significant at P < 10−5 and 97 are significant at P < 5 × 10−8. The replication results are shown graphically in Supplementary Fig. 3. To help to interpret these results, we used the Bayesian framework described above to calculate the expected replication record under the hypothesis that all 158 SNPs are true associations. The posterior distributions of the effect sizes of the SNPs are calculated using parameters estimated from the summary statistics of the previous study10: \(\left({\hat{\tau }}^{2},\hat{\pi }\right)=\left(5.02\times 1{0}^{-6},0.33\right)\).

Within-family analyses

We conducted within-family association analyses on a sample of 22,135 sibling pairs from the Swedish Twin Registry’s Twingene study, the Swedish Twin Registry’s Screening Across the Lifespan Twin Youth study, UKB and WLS. For each cohort, we standardized EduYears within the cohort and then residualized this variable using the same controls as in the GWAS. We then regressed the sibling difference in the residuals on the sibling difference in genotype. We restricted analyses to SNPs with a minor allele frequency (MAF) above 5% in each of the sibling cohorts and meta-analyzed the cohort-level results using inverse-variance weighting.
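The sibling-difference regression and the inverse-variance-weighted meta-analysis can be sketched as follows. The sketch assumes a regression without an intercept (the ordering of siblings within a pair is arbitrary), which is our simplification rather than a detail given in the text; the function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sibling_difference_fit(dy: Sequence[float],
                           dg: Sequence[float]) -> Tuple[float, float]:
    """No-intercept OLS of sibling phenotype differences on genotype differences.

    Returns (slope, standard error).
    """
    sxx = sum(g * g for g in dg)
    beta = sum(g * y for g, y in zip(dg, dy)) / sxx
    rss = sum((y - beta * g) ** 2 for g, y in zip(dg, dy))
    se = math.sqrt(rss / (len(dy) - 1) / sxx)
    return beta, se


def inverse_variance_meta(estimates: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine cohort-level (estimate, standard error) pairs."""
    weights = [1.0 / (se * se) for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))


# Toy sibling differences (hypothetical) for a single SNP in one cohort.
dg_toy = [1.0, -1.0, 2.0, 0.0, 1.0]
dy_toy = [0.6, -0.4, 0.9, 0.1, 0.5]
beta_toy, se_toy = sibling_difference_fit(dy_toy, dg_toy)

# Meta-analysis of two hypothetical cohort-level estimates.
beta_meta, se_meta = inverse_variance_meta([(0.1, 0.1), (0.3, 0.1)])
```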

We followed a previous study38 to compare the signs of the within-family estimates to the signs of the estimates from a GWAS meta-analysis that we re-ran after removing the sibling samples (n = 1,070,751). We benchmarked our observed fraction of concordant signs against the three theoretical benchmarks shown in Fig. 2. The theoretical benchmarks are calculated using posterior distributions for the GWAS effect sizes obtained from our Bayesian statistical framework. Treating each benchmark as a null hypothesis, we conducted one-sided binomial tests for which the alternative hypothesis is that the observed sign concordance falls short of the benchmark. We conducted this test for sets of approximately independent SNPs selected at the P-value thresholds of 5 × 10−8, 5 × 10−5 and 5 × 10−3 (Fig. 2 and Supplementary Table 20).
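The one-sided binomial test described above can be illustrated with a short sketch. It computes the lower-tail probability of observing at most k concordant signs out of n SNPs when the benchmark concordance rate is p0. The function name and the example counts are hypothetical and are not results from this study.

```python
import math


def binom_lower_tail(k: int, n: int, p0: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p0), summed in log space; needs 0 < p0 < 1."""
    log_p, log_q = math.log(p0), math.log(1.0 - p0)
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return min(total, 1.0)


# Hypothetical example: 60 of 100 signs concordant against a benchmark of 70%.
p_value = binom_lower_tail(k=60, n=100, p0=0.70)
```

Working in log space avoids overflow in the binomial coefficients when the number of SNPs is in the hundreds or thousands.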

We also performed regression-based comparisons of the within-family estimates and the GWAS estimates (Supplementary Table 21 and Supplementary Fig. 21). Further details, including a derivation of our assortative-mating adjustment, can be found in the Supplementary Note.

Joint F-test of heterogeneity

When the SNPs are considered individually, for all but one of the 1,271 lead SNPs, we fail to reject a null hypothesis of homogeneous effects across cohorts at the Bonferroni-adjusted P value threshold of 0.05/1,271. We generated an omnibus test statistic for heterogeneity by summing the Cochran Q-statistics for heterogeneity across all 1,271 lead SNPs39. Because the software used for meta-analysis does not report Q-statistics, we inferred these values based on the reported heterogeneity P values. To do so, we treated each lead SNP as if it were available for each of the 71 cohorts in the meta-analysis, which implies that the Q-statistic for each lead SNP has a χ2 distribution with 70 degrees of freedom. The sum of these Q-statistics is therefore (approximately) χ2-distributed with 70 × 1,271 = 88,970 degrees of freedom. This gave us an omnibus Q-statistic of 91,830, with corresponding P value equal to 9.68 × 10−12.
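As a rough check on the arithmetic, the degrees of freedom and the tail probability of the omnibus statistic can be reproduced with the standard library alone. The sketch below uses the Wilson-Hilferty cube-root normal approximation to the χ2 distribution, which is our substitution for an exact χ2 routine and not necessarily what was used in the analysis; the function name is hypothetical.

```python
import math


def chi2_upper_tail_wilson_hilferty(q: float, df: int) -> float:
    """Approximate P(chi2_df > q) with the Wilson-Hilferty normal approximation."""
    c = 2.0 / (9.0 * df)
    z = ((q / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


n_cohorts = 71        # each lead SNP treated as available in all cohorts
n_lead_snps = 1271
df_total = (n_cohorts - 1) * n_lead_snps  # 70 degrees of freedom per lead SNP

# Omnibus Q-statistic reported in the text.
p_omnibus = chi2_upper_tail_wilson_hilferty(91830.0, df_total)
```

With 88,970 degrees of freedom the approximation should be close to the exact tail probability, so `p_omnibus` is expected to have the same order of magnitude as the reported P value.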

Cross-cohort genetic correlation

We estimated the genetic correlation of EduYears across all pairs of cohorts with non-negative heritability estimates (Supplementary Table 22). We used bivariate LD score regression40 implemented by the LDSC software with a European reference population, filtered to HapMap3 SNPs. The estimated genetic correlations of EduYears for each of our 933 pairs of cohorts are shown in Supplementary Table 23.

We calculated the inverse-variance-weighted mean of the genetic-correlation estimates. Estimates for pairs of cohorts that share a cohort are correlated with one another. Therefore, to obtain correct standard errors, we used the node-jackknife variance estimator described previously41. As detailed in the Supplementary Note, we also estimated the variance of SNP heritability of EduYears across cohorts, and we conducted analyses to assess the extent to which we can predict variation in SNP heritability and genetic correlation of EduYears based on several observable cohort characteristics (Supplementary Tables 24, 25).

X chromosome

We performed association analyses of SNPs on the X chromosome in our two largest cohorts, UKB (n = 329,358) and 23andMe (n = 365,536). The UKB analyses were conducted in a sample of conventionally unrelated individuals of European ancestry, yielding a smaller sample size than the autosomal UKB analyses (Supplementary Table 26). Imputed genotypes for the X chromosome were not included in the data officially released by UKB. We therefore imputed the data ourselves using the 1000 Genomes Project42 as our reference panel.

In both cohorts, the association analyses were performed on a pooled male–female sample with male genotypes coded 0/2. Except for this allele coding in males, all major aspects of the 23andMe analysis were identical to those described for the autosomal analyses; see Supplementary Tables 17–19 for details.

Both sets of association results underwent the same set of quality-control filters as the autosomal analyses prior to meta-analysis. Additionally, we dropped a small number of SNPs with male–female allele frequency differences above 0.005 in UKB. The meta-analysis was conducted in METAL36, using sample-size weighting. Only SNPs that were present in both cohorts’ result files were used. To adjust the test statistics for bias, we inflated the standard errors using the LD score regression intercept estimated from our main autosomal analysis \(\left(\sqrt{1.113}\right)\).
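The sample-size-weighted combination and the intercept adjustment can be sketched as follows. Inflating the standard errors by the square root of the LD score regression intercept is equivalent to dividing each Z-statistic by the same factor. The function names and the example Z-statistics are hypothetical; the sample sizes and the intercept are those given in the text.

```python
import math
from typing import Sequence


def sample_size_weighted_z(z_scores: Sequence[float],
                           sample_sizes: Sequence[float]) -> float:
    """Combine cohort Z-statistics with weights proportional to sqrt(N)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


def intercept_adjusted_z(z: float, ldsc_intercept: float) -> float:
    """Deflate a Z-statistic by sqrt(intercept), i.e. inflate its standard error."""
    return z / math.sqrt(ldsc_intercept)


# Hypothetical Z-statistics for one X-chromosome SNP in UKB and 23andMe.
z_meta = sample_size_weighted_z([3.1, 4.2], [329_358, 365_536])
z_adjusted = intercept_adjusted_z(z_meta, 1.113)
```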

Heritability of the X chromosome and dosage compensation

To estimate SNP heritability for males and females, we use the equation

$$E\left[{\chi }_{i}^{2}\right]=1+\frac{{N}_{i}{h}_{i}^{2}}{{M}_{{\rm{eff}}}}$$

where \(i\in \left\{{\rm{m,f}}\right\}\) indicates males or females, \(E\left[{\chi }_{i}^{2}\right]\) is the expected χ2 statistic, \({h}_{i}^{2}\) is the SNP heritability for the X chromosome, Ni is the GWAS sample size, and Meff is the effective number of SNPs (which is assumed to be the same in males and females). We replaced \(E\left[{\chi }_{i}^{2}\right]\) with its sample analog and Meff with its estimated value, and then we solved for \({h}_{i}^{2}\).

Let \(\gamma ={h}_{{\rm{m}}}^{2}/{h}_{{\rm{f}}}^{2}\) denote the dosage compensation ratio. The ratio takes on a value between 0.5 (zero dosage compensation) and 2 (full dosage compensation). On the basis of the above equation, we estimated it as

$$\hat{\gamma }=\frac{({\bar{\chi }}_{{\rm{m}}}^{2}-1){N}_{{\rm{f}}}}{({\bar{\chi }}_{{\rm{f}}}^{2}-1){N}_{{\rm{m}}}}$$

where \({\bar{\chi }}_{i}^{2}\) is the mean χ2 statistic. (Equivalently, our \(\hat{\gamma }\) estimate is equal to the ratio of our SNP heritability estimates.)
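The two estimators above amount to simple arithmetic, as the following sketch shows. It also checks that the dosage compensation estimate equals the ratio of the male and female heritability estimates when Meff is the same in both sexes. The function names and all input values are hypothetical and are not estimates from this study.

```python
def snp_heritability(mean_chi2: float, n: float, m_eff: float) -> float:
    """Solve E[chi2] = 1 + N * h2 / M_eff for h2, using the sample mean chi2."""
    return (mean_chi2 - 1.0) * m_eff / n


def dosage_compensation_ratio(mean_chi2_m: float, n_m: float,
                              mean_chi2_f: float, n_f: float) -> float:
    """Estimate gamma = h2_m / h2_f; M_eff cancels because it is shared."""
    return ((mean_chi2_m - 1.0) * n_f) / ((mean_chi2_f - 1.0) * n_m)


# Hypothetical inputs for illustration only.
h2_m = snp_heritability(mean_chi2=1.3, n=100_000, m_eff=1_000)
h2_f = snp_heritability(mean_chi2=1.2, n=120_000, m_eff=1_000)
gamma_hat = dosage_compensation_ratio(1.3, 100_000, 1.2, 120_000)
```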

Biological annotation

We used DEPICT20 (downloaded February 2016 from https://github.com/perslab/depict) to identify the tissues or cell types in which the causal genes are strongly expressed, detect enrichment of gene sets and prioritize likely causal genes. We ran DEPICT as described previously10 with the following exceptions: we used 37,427 human Affymetrix HGU133a2.0 platform microarrays20, discarded gene sets that were not well-reconstituted43, and relaxed the significance threshold for defining a matching SNP in the simulated null GWAS from 5 × 10−4 to 5 × 10−3. ‘Previously prioritized’ genes were prioritized by DEPICT (in the sense of achieving FDR < 0.05) both in the previous study10 and in the current work; ‘newly prioritized’ genes, on the other hand, were not prioritized in the previous study10. We used expression data from the BrainSpan Developmental Transcriptome35 and calculated the average expression in the brain of all DEPICT-prioritized EduYears genes (Supplementary Table 7) as a function of developmental stage (Supplementary Table 8 and Supplementary Fig. 22).

In addition to the analyses presented in the main text, we determined which functional systems are least implicated by DEPICT (Supplementary Table 27) and how enrichment of gene sets differs across phenotypes (Supplementary Table 28).

We tested the robustness of our DEPICT results using the bioinformatics tools MAGMA44 and PANTHER45,46. For MAGMA, we used the ‘multi=snp-wise’ option, mapping a SNP to a gene if it resides within the gene boundaries or 5 kb of either endpoint. We estimated the LD using a reference panel of Europeans in 1000 Genomes phase 3, and we defined a gene as significant if its joint P value falls below the threshold corresponding to FDR < 0.05 (Supplementary Table 29). For PANTHER, we used the binomial overrepresentation test with the DEPICT-prioritized genes as input (Supplementary Table 30).

We also used stratified LD score regression22 to partition the heritability of the trait between SNPs of different types. In addition to the baseline SNP-level annotations (Supplementary Table 31), we tested a number of novel annotation types, described more completely in the Supplementary Note. We tested the heritability enrichment of neural cell types (Supplementary Table 9), various SNP-level annotations assembled by Pickrell47 (Supplementary Fig. 23 and Supplementary Table 32), developmental stages (Supplementary Table 33), and genes that are broadly expressed or specifically expressed in a particular tissue (Supplementary Fig. 24 and Supplementary Table 34). We also applied LD score regression to DEPICT-reconstituted gene sets (Supplementary Table 35) and binary gene sets (Supplementary Table 36 and Supplementary Fig. 25).

We used the tool CAVIARBF24,48 in a fine-mapping exercise to identify candidate causal SNPs. We used the 74 baseline annotations employed by stratified LD score regression as well as 451 annotations from Pickrell47. We applied a MAF filter of 0.01 and a sample-size filter of 400,000 and only considered SNPs within a 50-kb radius of a lead SNP. We computed exact Bayes factors by averaging over prior variances of 0.01, 0.1 and 0.5; we set the sample size to the mean sample size of our considered SNPs; and we added 0.2 to the main diagonal of the LD matrix because we used a reference panel for LD estimation. To incorporate annotations, we used the elastic net setting with parameters selected via fivefold cross-validation. The resulting annotation effect sizes and list of candidate causal SNPs are given in Supplementary Tables 37 and 10. Regional association plots of four noteworthy candidates are shown in Supplementary Fig. 9.

Polygenic prediction

Prediction analyses were performed using Add Health, HRS and WLS. Polygenic scores were constructed using HapMap3 SNPs that meet the following conditions: (i) the variant has a call rate greater than 98% in the prediction cohort; (ii) the variant has a MAF greater than 1% in the prediction cohort; and (iii) the allele frequency discrepancy between the meta-analysis and the prediction cohort does not exceed 0.15. To calculate the SNP weights, we used the software package LDpred26, assuming a fraction of causal variants equal to 1, and then we constructed the scores in PLINK.
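The three inclusion conditions and the final scoring step can be expressed compactly. The sketch below is illustrative only: it does not reproduce LDpred or PLINK, the weights stand in for LDpred output, and all names and values are hypothetical.

```python
from typing import Dict


def passes_score_qc(call_rate: float, maf: float,
                    freq_meta: float, freq_cohort: float) -> bool:
    """Conditions (i)-(iii): call rate > 98%, MAF > 1%, frequency gap <= 0.15."""
    return (call_rate > 0.98
            and maf > 0.01
            and abs(freq_meta - freq_cohort) <= 0.15)


def polygenic_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of allele counts over SNPs that have both a weight and a dosage."""
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Hypothetical individual with two scored SNPs.
example_score = polygenic_score({"rs1": 2.0, "rs2": 1.0},
                                {"rs1": 0.01, "rs2": -0.02, "rs3": 0.05})
```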

All prediction exercises were performed with an Ordinary Least Squares or probit regression of a phenotype on our score and a set of controls consisting of a full set of dummy variables for year of birth, an indicator variable for sex, a full set of interactions between sex and year of birth, and the first 10 principal components of the variance–covariance matrix of the genetic relatedness matrix.

Our measure of prediction accuracy is the incremental R2. To calculate this value, we first regress a phenotype on our set of controls without the polygenic score. Next, we re-run the same regression but with the score included as a regressor. For quantitative phenotypes, our measure of predictive power is the change in R2. For binary outcomes, we calculated the incremental pseudo-R2 from a probit regression. To obtain 95% confidence intervals, we bootstrapped the incremental R2 values with 1,000 repetitions (Supplementary Table 38 and Supplementary Figs. 13, 26–28).
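For a quantitative phenotype, the incremental R2 and its bootstrap interval can be sketched with the standard library. The sketch assumes a percentile bootstrap, which is our choice and is not specified in the text; the data are simulated, and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def ols_r2(y: Sequence[float], columns: List[Sequence[float]]) -> float:
    """R^2 from OLS of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][k] / A[r][r] for r in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(y, controls, score) -> float:
    """Change in R^2 when the score is added to the control-only regression."""
    return ols_r2(y, controls + [score]) - ols_r2(y, controls)


def bootstrap_ci(y, controls, score, reps: int = 1000,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile 95% interval from resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(incremental_r2([y[i] for i in idx],
                                    [[c[i] for i in idx] for c in controls],
                                    [score[i] for i in idx]))
    draws.sort()
    return draws[int(0.025 * reps)], draws[int(0.975 * reps) - 1]


# Simulated data: one control and a score with a true effect on the phenotype.
sim = random.Random(1)
control = [sim.gauss(0, 1) for _ in range(300)]
score = [sim.gauss(0, 1) for _ in range(300)]
pheno = [0.5 * c + 0.3 * s + sim.gauss(0, 1) for c, s in zip(control, score)]
inc = incremental_r2(pheno, [control], score)
ci_low, ci_high = bootstrap_ci(pheno, [control], score, reps=200)
```

The actual analyses use a much larger control set (birth-year dummies, sex, their interactions and principal components); the single simulated control here only keeps the example short.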

Prediction of other phenotypes

In addition to EduYears, we also used our polygenic score to predict a number of other phenotypes. For the HRS and Add Health datasets, we analyzed three binary variables related to educational attainment: (i) high school completion; (ii) college completion; and (iii) grade retention (that is, retaking a grade).

In additional analyses in Add Health, we predicted an augmented version of the Peabody picture vocabulary test, measured when participants were 12–20 years old. Peabody scores were age-standardized. We also predicted a number of grade point average (GPA) variables (range: 0.0–4.0) from the third wave of Add Health, when transcripts were collected from respondents’ high schools. We analyzed overall GPA, math GPA, science GPA and verbal GPA, controlling for high school fixed effects.

In additional analyses in the HRS, we predicted several cognitive phenotypes. Total cognition is the sum of four cognitive measures measured in waves 3 through 10: an immediate word recall task, a delayed word recall task, a naming task and a counting task. Verbal cognition measures the ability of the subject to define five words. To evaluate changes over time, we also studied wave-to-wave changes in total cognition and verbal cognition. Our next cognitive outcome, Alzheimer’s, is an indicator variable equal to 1 for subjects who report having been diagnosed with Alzheimer’s disease, and 0 otherwise. Because the HRS data are longitudinal, the unit of analysis for our four cognitive outcomes is a person-year. For these analyses, because an individual took the cognitive tests at different ages, in our set of controls we replaced our person-specific age variable with age at assessment (which differs for an individual across the cognitive outcomes); we also clustered all standard errors at the person level.

In the WLS, we measured cognitive performance using the raw score of the respondent in a Henmon–Nelson test of mental ability49.

For all of these additional prediction exercises, results are shown in Supplementary Table 38 and depicted in Fig. 4a and Supplementary Figs. 13, 11.

Benchmarking the predictive power of the EduYears polygenic score

To benchmark the predictive power of our score, we compared its predictive power to the predictive power of other common variables: education of the mother, education of the father, education of both mother and father, verbal cognition, household income and a binary indicator for marital status. For each variable, we calculated the incremental R2 of the variable using the same procedures as those described above, with the same set of control variables. (For ‘education of both mother and father’, we calculated the incremental R2 from adding both variables as regressors.) The results of this analysis are shown in Supplementary Table 39a and depicted in Fig. 4b and Supplementary Fig. 12.

We also evaluated the attenuation in the incremental R2 of the polygenic score in predicting EduYears when we control for available demographic variables one at a time: marital status, household income, education of the mother and education of the father. We next controlled for the education of both mother and father, and finally, we controlled for the full set of demographic controls. The results of this analysis are shown in Supplementary Table 39b and Supplementary Fig. 12.

GWAS of cognitive performance, math ability and highest math

The GWAS of math ability (n = 564,698) and highest math (n = 430,445) phenotypes were conducted exclusively among research participants of the personal genomics company 23andMe who answered survey questions about their mathematical background. In our analyses of cognitive performance, we combined a published study of general cognitive ability (n = 35,298) conducted by the COGENT consortium29 with new genome-wide association analyses of cognitive performance in the UKB (n = 222,543). The phenotype measures are described in detail in Supplementary Table 40. Our new genome-wide analyses of cognitive performance in UKB, and math ability and highest math in 23andMe, were conducted using methods identical to those for EduYears in UKB and 23andMe, respectively (Supplementary Table 19).

For cognitive performance, we conducted a sample-size-weighted meta-analysis (n = 257,841), imposing a minimum-sample-size filter of 100,000. We similarly applied minimum-sample-size filters to the math ability (n > 500,000) and highest math (n > 350,000) results. We adjusted the test statistics using the estimated intercepts from LD score regressions (1.073 for math ability, 1.105 for highest math and 1.046 for cognitive performance). The summary statistics underwent quality control using the same procedures applied to the EduYears results.

The lists of lead SNPs were obtained by applying the same clumping algorithm used in the EduYears analyses (Supplementary Tables 11–13). Manhattan plots from the analyses are shown in Supplementary Figs. 14–16.

MTAG of cognitive performance, math ability and highest math

We performed a joint analysis of our GWAS results on EduYears, cognitive performance, math ability and highest math using MTAG11. Supplementary Table 14 shows moderately high pairwise genetic correlations, ranging from 0.51 to 0.85, which motivate the multivariate analysis. The MTAG analyses were restricted to SNPs that passed MTAG-recommended filters in all files with summary statistics. We removed (i) SNPs with a MAF below 1% or (ii) SNPs with sample sizes below a cutoff (66.6% of the 90th percentile), leaving approximately 7.1 million SNPs found in all four results files. Supplementary Table 41 reports the increases in effective sample size from using MTAG for each set of GWAS results.
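The two filters can be illustrated for a single summary-statistics file. The sketch uses a nearest-rank definition of the 90th percentile, which is an assumption on our part because the percentile convention is not stated here, and it omits the additional requirement that a SNP be present in all four files. All names and toy values are hypothetical.

```python
import math
from typing import Dict, List


def sample_size_cutoff(sample_sizes: List[float], percent: int = 90,
                       fraction: float = 0.666) -> float:
    """Return fraction * (nearest-rank percentile of the per-SNP sample sizes)."""
    ordered = sorted(sample_sizes)
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return fraction * ordered[rank - 1]


def apply_filters(snps: Dict[str, Dict[str, float]],
                  min_maf: float = 0.01) -> List[str]:
    """Keep SNPs with MAF >= 1% and sample size at or above the cutoff."""
    cutoff = sample_size_cutoff([s["n"] for s in snps.values()])
    return sorted(name for name, s in snps.items()
                  if s["maf"] >= min_maf and s["n"] >= cutoff)


# Toy file: ten SNPs with sample sizes 1,000 to 10,000; rs10 is rare.
toy_snps = {f"rs{i}": {"n": 1000.0 * i, "maf": 0.005 if i == 10 else 0.2}
            for i in range(1, 11)}
toy_cutoff = sample_size_cutoff([s["n"] for s in toy_snps.values()])
kept = apply_filters(toy_snps)
```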

Supplementary Table 15 lists all the lead SNPs in the MTAG analysis. Supplementary Figs. 17–20 show inverted Manhattan plots that compare the MTAG and GWAS results, restricted to the set of SNPs that pass MTAG filters.

Polygenic scores were constructed from MTAG results using the same procedures as for the GWAS results. Supplementary Figure 29 and Supplementary Tables 42 and 43 compare the predictive power of scores constructed from MTAG results in the Add Health and WLS cohorts (see Supplementary Note for details).

To examine the credibility of the MTAG-identified lead SNPs of our lowest-powered GWAS, cognitive performance, we conducted a replication analysis. We re-ran MTAG with GWAS results that exclude COGENT cohorts, and we used the COGENT meta-analysis as our replication sample. In addition to applying the MTAG filters above, we limited the analysis to SNPs for which the COGENT results file contains summary statistics based on analyses of at least 25,000 individuals. The MTAG-identified lead SNPs for cognitive performance from our restricted sample are reported in Supplementary Table 44. We used our Bayesian framework to calculate the expected replication record of the MTAG results under the hypothesis that the MTAG-identified lead SNPs are true positives, given sampling variation and adjusted for winner’s curse and differences in SNP heritability across the samples.

Reporting Summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this article.

Code availability

All software used to perform these analyses is available online.

Data availability

Summary statistics can be downloaded from http://www.thessgac.org/data. We provide association results for all SNPs that passed quality-control filters in a GWAS meta-analysis of EduYears that excludes the research participants from 23andMe. SNP-level summary statistics from analyses based entirely or in part on 23andMe data can only be reported for up to 10,000 SNPs. We provide summary statistics for all lead SNPs identified in our GWAS analyses of cognitive performance, math ability and highest math and the MTAG analyses of our four phenotypes. For the complete EduYears GWAS, which includes 23andMe, clumped results for the 3,575 SNPs with P < 10−5 are provided; this P-value threshold was chosen such that the total number of SNPs across the analyses that include data from 23andMe does not exceed 10,000. Contact information for each of the cohorts included in this paper can be found in the Supplementary Note.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Branigan, A. R., McCallum, K. J. & Freese, J. Variation in the heritability of educational attainment: an international meta-analysis. Soc. Forces 92, 109–140 (2013).
  2. Conti, G., Heckman, J. & Urzua, S. The education–health gradient. Am. Econ. Rev. 100, 234–238 (2010).
  3. Cutler, D. M. & Lleras-Muney, A. in Making Americans Healthier: Social and Economic Policy as Health Policy (eds House, J. et al.) (Russell Sage Foundation, New York, 2008).
  4. Rietveld, C. A. et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013).
  5. Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016).
  6. Belsky, D. W. et al. The genetics of success: how single-nucleotide polymorphisms associated with educational attainment relate to life-course development. Psychol. Sci. 27, 957–972 (2016).
  7. Domingue, B. W., Belsky, D. W., Conley, D., Harris, K. M. & Boardman, J. D. Polygenic influence on educational attainment: new evidence from The National Longitudinal Study of Adolescent to Adult Health. AERA Open 1, 1–13 (2015).
  8. Marioni, R. E. et al. Genetic variants linked to education predict longevity. Proc. Natl Acad. Sci. USA 113, 13366–13371 (2016).
  9. Anttila, A. V. et al. Analysis of shared heritability in common disorders of the brain. Science 360, eaap8757 (2018).
  10. Okbay, A. et al. Genome-wide association study identifies 74 loci associated with educational attainment. Nature 533, 539–542 (2016).
  11. Turley, P. et al. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat. Genet. 50, 229–237 (2018).
  12. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
  13. Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
  14. Wu, Y., Zheng, Z., Visscher, P. M. & Yang, J. Quantifying the mapping precision of genome-wide association studies using whole-genome sequencing data. Genome Biol. 18, 86 (2017).
  15. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
  16. Kong, A. et al. The nature of nurture: effects of parental genotypes. Science 359, 424–428 (2018).
  17. de Vlaming, R. et al. Meta-GWAS accuracy and power (MetaGAP) calculator shows that hiding heritability is partially due to imperfect genetic correlations across studies. PLoS Genet. 13, e1006495 (2017).
  18. Tropf, F. C. et al. Hidden heritability due to heterogeneity across seven populations. Nat. Hum. Behav. 1, 757–765 (2017).
  19. Johnson, W., Carothers, A. & Deary, I. J. Sex differences in variability in general intelligence: a new look at the old question. Perspect. Psychol. Sci. 3, 518–531 (2008).
  20. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
  21. Azevedo, F. A. C. et al. Equal numbers of neuronal and nonneuronal cells make the human brain an isometrically scaled-up primate brain. J. Comp. Neurol. 513, 532–541 (2009).
  22. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
  23. Reed, T. E. & Jensen, A. R. Arm nerve conduction velocity (NCV), brain NCV, reaction time, and intelligence. Intelligence 15, 33–47 (1991).
  24. Chen, W., McDonnell, S. K., Thibodeau, S. N., Tillmans, L. S. & Schaid, D. J. Incorporating functional annotations for fine-mapping causal variants in a Bayesian framework using summary statistics. Genetics 204, 933–958 (2016).
  25. Wang, G. et al. CaV3.2 calcium channels control NMDA receptor-mediated transmission: a new mechanism for absence epilepsy. Genes Dev. 29, 1535–1551 (2015).
  26. Vilhjálmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).
  27. Martin, A. R. et al. Human demographic history impacts genetic risk prediction across diverse populations. Am. J. Hum. Genet. 100, 635–649 (2017).
  28. Scutari, M., Mackay, I. & Balding, D. Using genetic distance to infer the accuracy of genomic prediction. PLoS Genet. 12, e1006288 (2016).
  29. Trampush, J. W. et al. GWAS meta-analysis reveals novel loci and genetic correlates for general cognitive function: a report from the COGENT consortium. Mol. Psychiatry 22, 336–345 (2017).
  30. Davies, G. et al. Ninety-nine independent genetic loci influencing general cognitive function include genes associated with brain health and structure (n = 280,360). https://doi.org/10.1101/176511 (2017).
  31. Sniekers, S. et al. Genome-wide association meta-analysis of 78,308 individuals identifies new loci and genes influencing human intelligence. Nat. Genet. 49, 1107–1112 (2017).
  32. Savage, J. E. et al. GWAS meta-analysis (n=279,930) identifies new genes and functional links to intelligence. https://doi.org/10.1101/184853 (2017).
  33. Schmitz, L. L. & Conley, D. The effect of Vietnam-era conscription and genetic potential for educational attainment on schooling outcomes. Econ. Educ. Rev. 61, 85–97 (2017).
  34. Heath, A. C. et al. Education policy and the heritability of educational attainment. Nature 314, 734–736 (1985).
  35. Kang, H. J. et al. Spatio-temporal transcriptome of the human brain. Nature 478, 483–489 (2011).
  36. Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
  37. Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 1–16 (2015).
  38. Okbay, A. et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 48, 624–633 (2016).
  39. Cochran, W. G. The combination of estimates from different experiments. Biometrics 10, 101–129 (1954).
  40. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
  41. Cameron, A. C. & Miller, D. Robust inference with dyadic data. Winter North American Meetings of the Econometric Society, Boston, January 5, 2015.
  42. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
  43. Fehrmann, R. S. N. et al. Gene expression analysis identifies global gene dosage sensitivity in cancer. Nat. Genet. 47, 115–125 (2015).
  44. de Leeuw, C. A. et al. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
  45. Liu, J. Z. et al. A versatile gene-based test for genome-wide association studies. Am. J. Hum. Genet. 87, 139–145 (2010).
  46. Mi, H., Muruganujan, A., Casagrande, J. T. & Thomas, P. D. Large-scale gene function analysis with the PANTHER classification system. Nat. Protoc. 8, 1551–1566 (2013).
  47. Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).
  48. Chen, W. et al. Fine mapping causal variants with an approximate Bayesian method using marginal test statistics. Genetics 200, 719–736 (2015).
  49. Henmon, V. A. C. & Nelson, M. J. Henmon–Nelson Tests of Mental Ability, High School Examination—Grades 7 to 12—Forms A, B, and C. Teacher’s Manual. (Houghton-Mifflin, Boston, 1946).

Acknowledgements

This research was carried out under the auspices of the Social Science Genetic Association Consortium (SSGAC). The research has also been conducted using the UK Biobank Resource under application numbers 11425 and 12512. We acknowledge the Swedish Twin Registry for access to data. The Swedish Twin Registry is managed by the Karolinska Institutet and receives funding through the Swedish Research Council under the grant number 2017-00641. This study was supported by funding from the Ragnar Söderberg Foundation (E9/11, E42/15), the Swedish Research Council (421-2013-1061), The Jan Wallander and Tom Hedelius Foundation, an ERC Consolidator Grant (647648 EdGe), the Pershing Square Fund of the Foundations of Human Behavior, The Open Philanthropy Project (2016-152872), and the NIA/NIH through grants P01-AG005842, P01-AG005842-20S2, P30-AG012810 and T32-AG000186-23 to N.B.E.R. and R01-AG042568 to U.S.C. A full list of acknowledgments is provided in the Supplementary Note.

Author information

Author notes

  1. These authors contributed equally: James J. Lee, Robbee Wedow, Aysu Okbay.

  2. These authors jointly supervised this work: Patrick Turley, Peter M. Visscher, Daniel J. Benjamin, David Cesarini.

  3. A list of members and affiliations appears at the end of the paper.

  4. A list of members and affiliations appears in the Supplementary Information.

Affiliations

  1. Department of Psychology, University of Minnesota Twin Cities, Minneapolis, MN, USA

    • James J. Lee
    • , Emily A. Willoughby
    • , Michael B. Miller
    • , William G. Iacono
    • , Matt McGue
    •  & Robert F. Krueger
  2. Department of Sociology, University of Colorado Boulder, Boulder, CO, USA

    • Robbee Wedow
    •  & Jason D. Boardman
  3. Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA

    • Robbee Wedow
    •  & Jason D. Boardman
  4. Institute of Behavioral Science, University of Colorado Boulder, Boulder, CO, USA

    • Robbee Wedow
    •  & Jason D. Boardman
  5. Department of Complex Trait Genetics, Center for Neurogenomics and Cognitive Research, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

    • Aysu Okbay
    • , Richard Karlsson Linnér
    • , S. Fleur W. Meddens
    • , Christiaan de Leeuw
    • , Danielle Posthuma
    •  & Philipp D. Koellinger
  6. Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands

    • Aysu Okbay
    • , Richard Karlsson Linnér
    •  & Philipp D. Koellinger
  7. Department of Economics, Harvard University, Cambridge, MA, USA

    • Edward Kong
    • , Omeed Maghzian
    • , Peter Bowers
    • , Chanwook Lee
    • , Hui Li
    • , Olga Rostapshova
    •  & David I. Laibson
  8. Department of Sociology, Harvard University, Cambridge, MA, USA

    • Meghan Zacher
  9. Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA

    • Tuan Anh Nguyen-Viet
    • , Mark Alan Fontana
    • , Tushar Kundu
    • , Ruoxi Li
    • , Rebecca Royer
    • , Chelsea Watson
    •  & Daniel J. Benjamin
  10. Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia

    • Julia Sidorenko
    • , Loïc Yengo
    • , Kathryn E. Kemper
    • , Yang Wu
    • , Zhili Zheng
    • , Matthew R. Robinson
    • , Jian Yang
    •  & Peter M. Visscher
  11. Estonian Genome Center, University of Tartu, Tartu, Estonia

    • Julia Sidorenko
    • , Evelin Mihailov
    • , Natalia Pervjakova
    • , Reedik Mägi
    • , Lili Milani
    • , Andres Metspalu
    • , Markus Perola
    • , Maris Alver
    •  & Tõnu Esko
  12. Institute for Behavior and Biology, Erasmus University Rotterdam, Rotterdam, The Netherlands

    • Richard Karlsson Linnér
    • , Cornelius A. Rietveld
    • , S. Fleur W. Meddens
    • , Ronald de Vlaming
    • , A. Roy Thurik
    •  & Philipp D. Koellinger
  13. Center for the Advancement of Value in Musculoskeletal Care, Hospital for Special Surgery, New York, NY, USA

    • Mark Alan Fontana
  14. The Novo Nordisk Foundation Center for Basic Metabolic Research, Section of Metabolic Genetics, University of Copenhagen, Faculty of Health and Medical Sciences, Copenhagen, Denmark

    • Pascal N. Timshel
    • , Tarunveer S. Ahluwalia
    • , Thorkild I. A. Sørensen
    •  & Tune H. Pers
  15. Statens Serum Institut, Department of Epidemiology Research, Copenhagen, Denmark

    • Pascal N. Timshel
    •  & Tune H. Pers
  16. Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA

    • Raymond K. Walters
    • , Aarno Palotie
    •  & Patrick Turley
  17. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA

    • Raymond K. Walters
    • , Aarno Palotie
    •  & Patrick Turley
  18. Institute for Social and Economic Research, University of Essex, Colchester, UK

    • Yanchun Bao
    • , Pamela Herd
    •  & Meena Kumari
  19. Centre for Global Health Research, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK

    • Katharina E. Schraut
    • , Harry Campbell
    • , Peter K. Joshi
    • , Igor Rudan
    • , Ozren Polasek
    • , David W. Clark
    • , Melissa C. Smart
    •  & James F. Wilson
  20. MRC Epidemiology Unit, Institute of Metabolic Science, University of Cambridge, Cambridge, UK

    • Felix R. Day
    • , Claudia Langenberg
    • , Jing Hua Zhao
    • , Jian’an Luan
    • , Ken K. Ong
    • , John R. B. Perry
    •  & Nicholas J. Wareham
  21. 23andMe, Inc., Mountain View, CA, USA

    • Michelle Agee
    • , Babak Alipanahi
    • , Adam Auton
    • , Robert K. Bell
    • , Katarzyna Bryc
    • , Sarah L. Elson
    • , Pierre Fontanillas
    • , David A. Hinds
    • , Jennifer C. McCreight
    • , Karen E. Huber
    • , Nadia K. Litterman
    • , Matthew H. McIntyre
    • , Joanna L. Mountain
    • , Elizabeth S. Noblin
    • , Carrie A. M. Northover
    • , Steven J. Pitts
    • , J. Fah Sathirapongsasuti
    • , Olga V. Sazonova
    • , Janie F. Shelton
    • , Suyash Shringarpure
    • , Chao Tian
    • , Vladimir Vacic
    • , Catherine H. Wilson
    • , Nicholas A. Furlotte
    • , Aaron Kleinman
    •  & Joyce Y. Tung
  22. Institute of Social and Preventive Medicine, University Hospital of Lausanne, Lausanne, Switzerland

    • Peter K. Joshi
  23. BrainWorkup, LLC, Santa Monica, CA, USA

    • Joey W. Trampush
  24. Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA

    • Joey W. Trampush
  25. Department of Biomedical and Translational Informatics, Geisinger Health System, Danville, PA, USA

    • Shefali Setia Verma
    •  & Marylyn D. Ritchie
  26. Institute of Mental Health, Singapore, Singapore

    • Max Lam
  27. Genome Institute, Singapore, Singapore

    • Max Lam
  28. The Eye Hospital, School of Ophthalmology and Optometry, Wenzhou Medical University, Wenzhou, China

    • Zhili Zheng
  29. Department of Sociology, Stanford University, Stanford, CA, USA

    • Jeremy Freese
  30. Department of Sociology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

    • Kathleen Mullan Harris
  31. Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA

    • Kathleen Mullan Harris
  32. MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK

    • Jennifer E. Huffman
    • , Jonathan Marten
    • , Caroline Hayward
    • , Veronique Vitart
    • , Alan F. Wright
    •  & James F. Wilson
  33. La Follette School of Public Affairs, University of Wisconsin-Madison, Madison, WI, USA

    • Pamela Herd
  34. Departments of Psychiatry and Molecular Medicine, Hofstra Northwell School of Medicine, Hempstead, NY, USA

    • Todd Lencz
    •  & Anil K. Malhotra
  35. Center for Psychiatric Neuroscience, Feinstein Institute for Medical Research, Manhasset, NY, USA

    • Todd Lencz
    •  & Anil K. Malhotra
  36. Psychiatry Research, The Zucker Hillside Hospital, Glen Oaks, NY, USA

    • Todd Lencz
    •  & Anil K. Malhotra
  37. Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia

    • Andres Metspalu
  38. Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK

    • Sarah E. Harris
    •  & David J. Porteous
  39. Division of Population Health Sciences, Ninewells Hospital and Medical School, University of Dundee, Dundee, UK

    • Blair H. Smith
  40. Medical Research Institute, University of Dundee, Dundee, UK

    • Blair H. Smith
  41. Department of Economics, University of Toronto, Toronto, Ontario, Canada

    • Jonathan P. Beauchamp
    •  & Riccardo E. Marioni
  42. Department of Sociology, Princeton University, Princeton, NJ, USA

    • Dalton C. Conley
  43. School of Policy Studies, Queen’s University, Kingston, Ontario, Canada

    • Steven F. Lehrer
  44. Department of Economics, New York University Shanghai, Pudong, Shanghai, China

    • Steven F. Lehrer
  45. National Bureau of Economic Research, Cambridge, MA, USA

    • Steven F. Lehrer
    • , Daniel J. Benjamin
    •  & David Cesarini
  46. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden

    • Robert Karlsson
    • , Paul Lichtenstein
    • , Nancy L. Pedersen
    •  & Patrik K. E. Magnusson
  47. Department of Government, Uppsala University, Uppsala, Sweden

    • Sven Oskarsson
    •  & Karl-Oskar Lindgren
  48. Department of Computational Biology, University of Lausanne, Lausanne, Switzerland

    • Matthew R. Robinson
  49. Department of Economics, New York University, New York, NY, USA

    • Kevin Thom
    •  & David Cesarini
  50. Autism and Developmental Medicine Institute, Geisinger Health System, Lewisburg, PA, USA

    • Christopher F. Chabris
  51. Center for Translational Bioethics and Health Care Policy, Geisinger Health System, Danville, PA, USA

    • Michelle N. Meyer
  52. Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia

    • Guo-Bo Chen
    • , Zhihong Zhu
    • , Andrew Bakshi
    • , Anna A. E. Vinkhuyzen
    • , Jacob Gratten
    • , Jian Yang
    •  & Peter M. Visscher
  53. Department of Economics, Stockholm School of Economics, Stockholm, Sweden

    • Magnus Johannesson
  54. Department of Economics, University of Southern California, Los Angeles, CA, USA

    • Daniel J. Benjamin
  55. Center for Experimental Social Science, New York University, New York, NY, USA

    • David Cesarini
  56. Department of Applied Economics, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, The Netherlands

    • Cornelius A. Rietveld
    • , Ronald de Vlaming
    •  & A. Roy Thurik
  57. Department of Epidemiology, Erasmus Medical Center, Rotterdam, The Netherlands

    • Cornelius A. Rietveld
    • , Ronald de Vlaming
    • , Sven J. van der Lee
    • , Najaf Amin
    • , Frank J. A. van Rooij
    • , Cornelia M. van Duijn
    • , Henning Tiemeier
    • , André G. Uitterlinden
    •  & Albert Hofman
  58. Icelandic Heart Association, Kopavogur, Iceland

    • Valur Emilsson
    • , Albert V. Smith
    •  & Vilmundur Gudnason
  59. Faculty of Pharmaceutical Sciences, University of Iceland, Reykjavík, Iceland

    • Valur Emilsson
  60. Amsterdam Business School, University of Amsterdam, Amsterdam, The Netherlands

    • S. Fleur W. Meddens
    •  & Maël P. Lebreton
  61. New York Genome Center, New York, NY, USA

    • Joseph K. Pickrell
  62. Department of Biological Psychology, VU University Amsterdam, Amsterdam, The Netherlands

    • Abdel Abdellaoui
    • , Jouke-Jan Hottenga
    • , Gonneke Willemsen
    •  & Dorret I. Boomsma
  63. COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark

    • Tarunveer S. Ahluwalia
    • , Klaus Bønnelykke
    • , Johannes Waage
    •  & Hans Bisgaard
  64. Steno Diabetes Center, Gentofte, Denmark

    • Tarunveer S. Ahluwalia
    •  & Johannes Waage
  65. Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, Gothenburg, Sweden

    • Jonas Bacelis
    •  & Bo Jacobsson
  66. Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany

    • Clemens Baumbach
    •  & Christian Gieger
  67. Institute of Epidemiology II, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany

    • Clemens Baumbach
    •  & Christa Meisinger
  68. deCODE Genetics/Amgen Inc., Reykjavík, Iceland

    • Gyda Bjornsdottir
    • , Augustine Kong
    • , Gudmar Thorleifsson
    • , Bjarni Gunnarsson
    • , Bjarni V. Halldórsson
    • , Kari Stefansson
    •  & Unnur Thorsteinsdottir
  69. Department of Cell Biology, Erasmus Medical Center Rotterdam, Rotterdam, The Netherlands

    • Johannes H. Brandsma
    •  & Raymond A. Poot
  70. Istituto di Ricerca Genetica e Biomedica U.O.S. di Sassari, National Research Council of Italy, Sassari, Italy

    • Maria Pina Concas
    • , Simona Vaccargiu
    •  & Mario Pirastu
  71. Psychology, University of Illinois, Champaign, IL, USA

    • Jaime Derringer
  72. Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands

    • Tessel E. Galesloot
    •  & Lambertus A. L. M. Kiemeney
  73. Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste, Italy

    • Giorgia Girotto
    • , Dragana Vuckovic
    • , Ilaria Gandin
    • , Paolo Gasparini
    •  & Nicola Pirastu
  74. Department of Public Health, University of Helsinki, Helsinki, Finland

    • Richa Gupta
    • , Antti Latvala
    • , LifeLines Cohort Study, Anu Loukola
    •  & Jaakko Kaprio
  75. Department of Cardiovascular Sciences, University of Leicester, Leicester, UK

    • Leanne M. Hall
    • , Christopher P. Nelson
    •  & Nilesh J. Samani
  76. Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh, UK

    • Leanne M. Hall
    • , Sarah E. Harris
    • , Gail Davies
    • , David C. M. Liewald
    • , Riccardo E. Marioni
    •  & Ian J. Deary
  77. Department of Neurology, General Hospital and Medical University Graz, Graz, Austria

    • Edith Hofer
    • , Katja E. Petrovic
    • , Helena Schmidt
    •  & Reinhold Schmidt
  78. Institute for Medical Informatics, Statistics and Documentation, General Hospital and Medical University Graz, Graz, Austria

    • Edith Hofer
  79. Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK

    • Momoko Horikoshi
  80. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK

    • Momoko Horikoshi
  81. Institute of Behavioural Sciences, University of Helsinki, Helsinki, Finland

    • Kadri Kaasik
    • , Jari Lahti
    • , Liisa Keltikangas-Järvinen
    •  & Katri Räikkönen
  82. Nutrition and Dietetics, Health Science and Education, Harokopio University, Athens, Greece

    • Ioanna P. Kalafati
    •  & George V. Dedoussis
  83. Folkhälsan Research Centre, Helsingfors, Finland

    • Jari Lahti
    • , Katri Räikkönen
    •  & Johan G. Eriksson
  84. Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, The Netherlands

    • Christiaan de Leeuw
  85. Quantitative Genetics, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia

    • Penelope A. Lind
    •  & Sarah E. Medland
  86. Lifespan Psychology, Max Planck Institute for Human Development, Berlin, Germany

    • Tian Liu
  87. Department of Twin Research and Genetic Epidemiology, King’s College London, London, UK

    • Massimo Mangino
    • , Lydia Quaye
    • , Cristina Venturini
    •  & Tim D. Spector
  88. NIHR Biomedical Research Centre, Guy’s and St. Thomas’ Foundation Trust, London, UK

    • Massimo Mangino
    •  & Cristina Venturini
  89. Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands

    • Peter J. van der Most
    • , Behrooz Z. Alizadeh
    • , Jennifer A. Smith
    •  & Judith M. Vonk
  90. Public Health Stream, Hunter Medical Research Institute, New Lambton, New South Wales, Australia

    • Christopher Oldmeadow
    • , Elizabeth G. Holliday
    •  & John R. Attia
  91. Faculty of Health and Medicine, University of Newcastle, Newcastle, New South Wales, Australia

    • Christopher Oldmeadow
    • , Elizabeth G. Holliday
    • , Rodney J. Scott
    •  & John R. Attia
  92. Centre for Integrated Genomic Medical Research, Institute of Population Health, The University of Manchester, Manchester, UK

    • Antony Payton
    •  & William E. R. Ollier
  93. School of Psychological Sciences, The University of Manchester, Manchester, UK

    • Antony Payton
  94. Department of Health, THL-National Institute for Health and Welfare, Helsinki, Finland

    • Natalia Pervjakova
    • , Niina Eklund
    • , Seppo Koskinen
    • , Tomi Mäki-Opas
    • , Veikko Salomaa
    • , Jaakko Kaprio
    •  & Markus Perola
  95. Psychiatry, VU University Medical Center and GGZ inGeest, Amsterdam, The Netherlands

    • Wouter J. Peyrot
    • , Yusplitri Milaneschi
    •  & Brenda W. J. H. Penninx
  96. Laboratory of Genetics, National Institute on Aging, Baltimore, MD, USA

    • Yong Qian
    • , Jun Ding
    •  & David Schlessinger
  97. Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, Turku, Finland

    • Olli Raitakari
  98. Department of Medical Genetics, University of Lausanne, Lausanne, Switzerland

    • Rico Rueedi
    •  & Zoltan Kutalik
  99. Swiss Institute of Bioinformatics, Lausanne, Switzerland

    • Rico Rueedi
    •  & Zoltan Kutalik
  100. Department of Health Sciences, University of Milan, Milan, Italy

    • Erika Salvi
    •  & Daniele Cusi
  101. Institute for Medical Informatics, Biometry and Epidemiology, University Hospital of Essen, Essen, Germany

    • Börge Schmidt
    • , Lewin Eisele
    •  & Karl-Heinz Jöckel
  102. Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD, USA

    • Jianxin Shi
  103. Faculty of Medicine, University of Iceland, Reykjavík, Iceland

    • Albert V. Smith
    • , Vilmundur Gudnason
    • , Kari Stefansson
    •  & Unnur Thorsteinsdottir
  104. MRC Integrative Epidemiology Unit, University of Bristol, Bristol, UK

    • Beate St Pourcain
    • , David M. Evans
    • , George McMahon
    • , Lavinia Paternoster
    • , Susan M. Ring
    • , Thorkild I. A. Sørensen
    • , Nicholas J. Timpson
    •  & George Davey Smith
  105. School of Oral and Dental Sciences, University of Bristol, Bristol, UK

    • Beate St Pourcain
  106. Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany

    • Alexander Teumer
    • , Sebastian E. Baumeister
    • , Henry Völzke
    •  & Wolfgang Hoffmann
  107. Department of Cardiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands

    • Niek Verweij
    • , Klaus Berger
    •  & Pim van der Harst
  108. Institute of Epidemiology and Social Medicine, University of Münster, Münster, Germany

    • Juergen Wellmann
  109. Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA

    • Harm-Jan Westra
  110. Partners Center for Personalized Genetic Medicine, Boston, MA, USA

    • Harm-Jan Westra
  111. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA

    • Harm-Jan Westra
    • , Philip L. de Jager
    •  & Aarno Palotie
  112. Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL, USA

    • Jingyun Yang
    • , Patricia A. Boyle
    •  & David A. Bennett
  113. Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA

    • Jingyun Yang
    •  & David A. Bennett
  114. Department of Epidemiology, University of Michigan, Ann Arbor, MI, USA

    • Wei Zhao
    • , Erin B. Ware
    •  & Sharon L. R. Kardia
  115. Department of Gastroenterology and Hepatology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands

    • Behrooz Z. Alizadeh
  116. Institute of Epidemiology and Preventive Medicine, University of Regensburg, Regensburg, Germany

    • Sebastian E. Baumeister
  117. Institute of Molecular Genetics, National Research Council of Italy, Pavia, Italy

    • Ginevra Biino
  118. Department of Behavioral Sciences, Rush University Medical Center, Chicago, IL, USA

    • Patricia A. Boyle
  119. Warwick Medical School, University of Warwick, Coventry, UK

    • Francesco P. Cappuccio
  120. Department of Psychology, University of Edinburgh, Edinburgh, UK

    • Gail Davies
    • , David C. M. Liewald
    •  & Ian J. Deary
  121. Saïd Business School, University of Oxford, Oxford, UK

    • Jan-Emmanuel De Neve
  122. William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK

    • Panos Deloukas
    •  & Stavroula Kanoni
  123. Princess Al-Jawhara Al-Brahim Centre of Excellence in Research of Hereditary Disorders (PACER-HD), King Abdulaziz University, Jeddah, Saudi Arabia

    • Panos Deloukas
  124. The Berlin Aging Study II; Research Group on Geriatrics, Charité-Universitätsmedizin Berlin, Berlin, Germany

    • Ilja Demuth
    •  & Elisabeth Steinhagen-Thiessen
  125. Institute of Medical and Human Genetics, Charité-Universitätsmedizin Berlin, Berlin, Germany

    • Ilja Demuth
  126. German Socio-Economic Panel Study, DIW Berlin, Berlin, Germany

    • Peter Eibich
    •  & Martin Kroh
  127. Health Economics Research Centre, Nuffield Department of Population Health, University of Oxford, Oxford, UK

    • Peter Eibich
  128. The University of Queensland Diamantina Institute, The Translational Research Institute, Brisbane, Queensland, Australia

    • David M. Evans
  129. Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA

    • Jessica D. Faul
    •  & David R. Weir
  130. Department of Genetics, Division of Statistical Genomics, Washington University School of Medicine, St Louis, MO, USA

    • Mary F. Feitosa
    • , Aldi T. Kraja
    • , Ingrid B. Borecki
    •  & Michael A. Province
  131. Institute of Human Genetics, University of Bonn, Bonn, Germany

    • Andreas J. Forstner
  132. Department of Genomics, Life and Brain Center, University of Bonn, Bonn, Germany

    • Andreas J. Forstner
  133. Institute of Biomedical and Neural Engineering, School of Science and Engineering, Reykjavík University, Reykjavík, Iceland

    • Bjarni V. Halldórsson
  134. Laboratory of Epidemiology, Demography, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA

    • Tamara B. Harris
  135. Department of Psychiatry, Washington University School of Medicine, St Louis, MO, USA

    • Andrew C. Heath
    •  & Pamela A. Madden
  136. Division of Applied Health Sciences, University of Aberdeen, Aberdeen, UK

    • Lynne J. Hocking
  137. Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Greifswald, Germany

    • Georg Homuth
    •  & Uwe Völker
  138. Manchester Medical School, The University of Manchester, Manchester, UK

    • Michael A. Horan
  139. Program in Translational NeuroPsychiatric Genomics, Departments of Neurology and Psychiatry, Brigham and Women’s Hospital, Boston, MA, USA

    • Philip L. de Jager
  140. Harvard Medical School, Boston, MA, USA

    • Philip L. de Jager
  141. Department of Genes and Environment, Norwegian Institute of Public Health, Oslo, Norway

    • Astanand Jugessur
    • , Ronny Myhre
    •  & Bo Jacobsson
  142. Department of Genomics of Common Disease, Imperial College London, London, UK

    • Marika A. Kaakinen
  143. Department of Clinical Physiology, Tampere University Hospital, Tampere, Finland

    • Mika Kähönen
  144. Department of Clinical Physiology, University of Tampere, School of Medicine, Tampere, Finland

    • Mika Kähönen
  145. Public Health, Medical School, University of Split, Split, Croatia

    • Ivana Kolcic
  146. Institute of Social and Preventive Medicine, Lausanne University Hospital (CHUV), Lausanne, Switzerland

    • Zoltan Kutalik
  147. Neuroepidemiology Section, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA

    • Lenore J. Launer
  148. Amsterdam Brain and Cognition Center, University of Amsterdam, Amsterdam, The Netherlands

    • Maël P. Lebreton
  149. Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA

    • Douglas F. Levinson
  150. Institute of Human Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany

    • Peter Lichtner
    •  & Thomas Meitinger
  151. Medical Genetics Section, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK

    • Riccardo E. Marioni
  152. Department of Internal Medicine, Lausanne University Hospital (CHUV), Lausanne, Switzerland

    • Pedro Marques-Vidal
    •  & Peter Vollenweider
  153. Tema BV, Hoofddorp, The Netherlands

    • Gerardus A. Meddens
  154. Molecular Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia

    • Grant W. Montgomery
    •  & Dale R. Nyholt
  155. NIHR Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester, UK

    • Christopher P. Nelson
    •  & Nilesh J. Samani
  156. Institute of Health and Biomedical Innovation, Queensland Institute of Technology, Brisbane, Queensland, Australia

    • Dale R. Nyholt
  157. Psychiatric and Neurodevelopmental Genetics Unit, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA

    • Aarno Palotie
  158. Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland

    • Aarno Palotie
    • , Antti-Pekka Sarin
    •  & Jaakko Kaprio
  159. Department of Neurology, Massachusetts General Hospital, Boston, MA, USA

    • Aarno Palotie
  160. Medical Genetics, Institute for Maternal and Child Health IRCCS “Burlo Garofolo”, Trieste, Italy

    • Antonietta Robino
    • , Sheila Ulivi
    • , Diego Vozzi
    •  & Paolo Gasparini
  161. Social Impact, Arlington, VA, USA

    • Olga Rostapshova
  162. Department of Economics, University of Minnesota Twin Cities, Minneapolis, MN, USA

    • Aldo Rustichini
  163. Department of Psychiatry and Behavioral Sciences, NorthShore University HealthSystem, Evanston, IL, USA

    • Alan R. Sanders
    •  & Pablo V. Gejman
  164. Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, IL, USA

    • Alan R. Sanders
    •  & Pablo V. Gejman
  165. Public Health Genomics Unit, National Institute for Health and Welfare, Helsinki, Finland

    • Antti-Pekka Sarin
  166. Research Unit for Genetic Epidemiology, Institute of Molecular Biology and Biochemistry, Center of Molecular Medicine, General Hospital and Medical University, Graz, Graz, Austria

    • Helena Schmidt
  167. Information Based Medicine Stream, Hunter Medical Research Institute, New Lambton, New South Wales, Australia

    • Rodney J. Scott
  168. Research Unit Hypertension and Cardiovascular Epidemiology, Department of Cardiovascular Science, University of Leuven, Leuven, Belgium

    • Jan A. Staessen
  169. R&D VitaK Group, Maastricht University, Maastricht, The Netherlands

    • Jan A. Staessen
  170. Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany

    • Konstantin Strauch
  171. Institute of Medical Informatics, Biometry and Epidemiology, Chair of Genetic Epidemiology, Ludwig Maximilians-Universität, Munich, Germany

    • Konstantin Strauch
  172. Department of Geriatrics, Florida State University College of Medicine, Tallahassee, FL, USA

    • Antonio Terracciano
  173. Department of Health Sciences and Genetics, University of Leicester, Leicester, UK

    • Martin D. Tobin
  174. Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands

    • Frank J. A. van Rooij
  175. Research Center for Group Dynamics, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA

    • Erin B. Ware
  176. Platform for Genome Analytics, Institutes of Neurogenetics and Integrative and Experimental Genomics, University of Lübeck, Lübeck, Germany

    • Lars Bertram
  177. Neuroepidemiology and Ageing Research Unit, School of Public Health, Faculty of Medicine, Imperial College London, London, UK

    • Lars Bertram
  178. Department of Health Sciences, Community and Occupational Medicine, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands

    • Ute Bültmann
  179. Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche, c/o Cittadella Universitaria di Monserrato, Monserrato, Cagliari, Italy

    • Francesco Cucca
  180. Institute of Biomedical Technologies, Italian National Research Council, Segrate, Milan, Italy

    • Daniele Cusi
  181. Department of General Practice and Primary Health Care, University of Helsinki, Helsinki, Finland

    • Johan G. Eriksson
  182. Departments of Human Genetics and Psychiatry, Donders Centre for Neuroscience, Nijmegen, The Netherlands

    • Barbara Franke
  183. Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands

    • Lude Franke
    •  & Pim van der Harst
  184. Sidra, Experimental Genetics Division, Sidra, Doha, Qatar

    • Paolo Gasparini
  185. Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany

    • Hans-Jörgen Grabe
  186. Department of Psychiatry and Psychotherapy, HELIOS-Hospital Stralsund, Stralsund, Germany

    • Hans-Jörgen Grabe
  187. Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, The Netherlands

    • Patrick J. F. Groenen
  188. Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute, Utrecht, The Netherlands

    • Pim van der Harst
  189. Centre for Population Health Research, School of Health Sciences and Sansom Institute, University of South Australia, Adelaide, South Australia, Australia

    • Elina Hyppönen
  190. South Australian Health and Medical Research Institute, Adelaide, South Australia, Australia

    • Elina Hyppönen
    •  & Christine Power
  191. Population, Policy and Practice, UCL Institute of Child Health, University College London, London, UK

    • Elina Hyppönen
  192. Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment and Health, School of Public Health, Imperial College London, London, UK

    • Marjo-Riitta Järvelin
  193. Center for Life Course Epidemiology, Faculty of Medicine, University of Oulu, Oulu, Finland

    • Marjo-Riitta Järvelin
  194. Unit of Primary Care, Oulu University Hospital, Oulu, Finland

    • Marjo-Riitta Järvelin
  195. Biocenter Oulu, University of Oulu, Oulu, Finland

    • Marjo-Riitta Järvelin
  196. Fimlab Laboratories, Tampere, Finland

    • Terho Lehtimäki
  197. Department of Clinical Chemistry, University of Tampere, School of Medicine, Tampere, Finland

    • Terho Lehtimäki
  198. Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, Queensland, Australia

    • Nicholas G. Martin
  199. Centre for Clinical and Cognitive Neuroscience, Institute Brain Behaviour and Mental Health, Salford Royal Hospital, Manchester, UK

    • Neil Pendleton
  200. Manchester Institute Collaborative Research in Ageing, University of Manchester, Manchester, UK

    • Neil Pendleton
  201. Faculty of Medicine, University of Split, Split, Croatia

    • Ozren Polasek
  202. Department of Clinical Genetics, VU Medical Centre, Amsterdam, The Netherlands

    • Danielle Posthuma
  203. Institute of Preventive Medicine, Bispebjerg and Frederiksberg Hospitals, The Capital Region, Frederiksberg, Denmark

    • Thorkild I. A. Sørensen
  204. Montpellier Business School, Montpellier, France

    • A. Roy Thurik
  205. Panteia, Zoetermeer, The Netherlands

    • A. Roy Thurik
  206. Department of Psychiatry, Erasmus Medical Center, Rotterdam, The Netherlands

    • Henning Tiemeier
  207. Department of Child and Adolescent Psychiatry, Erasmus Medical Center, Rotterdam, The Netherlands

    • Henning Tiemeier
  208. Department of Internal Medicine, Erasmus Medical Center, Rotterdam, The Netherlands

    • André G. Uitterlinden

Authors

  1. James J. Lee
  2. Robbee Wedow
  3. Aysu Okbay
  4. Edward Kong
  5. Omeed Maghzian
  6. Meghan Zacher
  7. Tuan Anh Nguyen-Viet
  8. Peter Bowers
  9. Julia Sidorenko
  10. Richard Karlsson Linnér
  11. Mark Alan Fontana
  12. Tushar Kundu
  13. Chanwook Lee
  14. Hui Li
  15. Ruoxi Li
  16. Rebecca Royer
  17. Pascal N. Timshel
  18. Raymond K. Walters
  19. Emily A. Willoughby
  20. Loïc Yengo
  21. Maris Alver
  22. Yanchun Bao
  23. David W. Clark
  24. Felix R. Day
  25. Nicholas A. Furlotte
  26. Peter K. Joshi
  27. Kathryn E. Kemper
  28. Aaron Kleinman
  29. Claudia Langenberg
  30. Reedik Mägi
  31. Joey W. Trampush
  32. Shefali Setia Verma
  33. Yang Wu
  34. Max Lam
  35. Jing Hua Zhao
  36. Zhili Zheng
  37. Jason D. Boardman
  38. Harry Campbell
  39. Jeremy Freese
  40. Kathleen Mullan Harris
  41. Caroline Hayward
  42. Pamela Herd
  43. Meena Kumari
  44. Todd Lencz
  45. Jian’an Luan
  46. Anil K. Malhotra
  47. Andres Metspalu
  48. Lili Milani
  49. Ken K. Ong
  50. John R. B. Perry
  51. David J. Porteous
  52. Marylyn D. Ritchie
  53. Melissa C. Smart
  54. Blair H. Smith
  55. Joyce Y. Tung
  56. Nicholas J. Wareham
  57. James F. Wilson
  58. Jonathan P. Beauchamp
  59. Dalton C. Conley
  60. Tõnu Esko
  61. Steven F. Lehrer
  62. Patrik K. E. Magnusson
  63. Sven Oskarsson
  64. Tune H. Pers
  65. Matthew R. Robinson
  66. Kevin Thom
  67. Chelsea Watson
  68. Christopher F. Chabris
  69. Michelle N. Meyer
  70. David I. Laibson
  71. Jian Yang
  72. Magnus Johannesson
  73. Philipp D. Koellinger
  74. Patrick Turley
  75. Peter M. Visscher
  76. Daniel J. Benjamin
  77. David Cesarini

Consortia

  1. 23andMe Research Team

  2. COGENT (Cognitive Genomics Consortium)

  3. Social Science Genetic Association Consortium

    Contributions

    D.J.B., D.C., P.T. and P.M.V. designed and oversaw the study. A.O. was the lead analyst of the study, responsible for quality control and meta-analyses. Analysts who assisted A.O. in major ways include: E.K. (quality control), O.M. (COJO, MTAG, quality control), T.A.N.-V. (figure preparation), H.L. (quality control), C.L. (quality control), J.S. (UKB association analyses) and R.K.L. (UKB association analyses). P.B. and E.K. conducted the within-family association analyses. The cross-cohort heritability and genetic-correlation analyses were conducted by R.W. and M.Z. The analyses of the X chromosome in UK Biobank were conducted by J.S.; A.O. ran the meta-analysis. J.J.L. organized and oversaw the bioinformatic analyses, with assistance from T.E., E.K., K.T., T.H.P. and P.N.T. Polygenic-prediction analyses were designed and conducted by A.O., K.T. and R.W. Besides the contributions explicitly listed above, T.K., R.L. and R.R. conducted additional analyses for several subsections. C.W. helped with the coordination of the participating cohorts. J.P.B., D.C.C., T.E., M.J., J.J.L., P.D.K., D.I.L., S.F.L., S.O., M.R.R., K.T. and J.Y. provided helpful advice and feedback on various aspects of the study design. All authors contributed to and critically reviewed the manuscript. E.K., J.J.L. and R.W. made especially large contributions to the writing and editing.

    Competing interests

    Anil Malhotra is a consultant for Genomind Inc., Informed DNA, Concert Pharmaceuticals, and Biogen. Nicholas A. Furlotte, Aaron Kleinman and Joyce Tung are employees of 23andMe, Inc.

    Corresponding authors

    Correspondence to Aysu Okbay or Peter M. Visscher or Daniel J. Benjamin.

    Supplementary information

    1. Supplementary Text and Figures

      Supplementary Note and Supplementary Figures 1–29

    2. Reporting Summary

    3. Supplementary Tables

      Supplementary Tables 1–44

    About this article

    Publication history

    Received

    Accepted

    Published

    DOI

    https://doi.org/10.1038/s41588-018-0147-3

    Further reading