Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals

We conduct a genome-wide association study (GWAS) of educational attainment (EA) in a sample of ~3 million individuals and identify 3,952 approximately uncorrelated genome-wide-significant single-nucleotide polymorphisms (SNPs). A genome-wide polygenic predictor, or polygenic index (PGI), explains 12–16% of EA variance and contributes to risk prediction for ten diseases. Direct effects (i.e., controlling for parental PGIs) explain roughly half the PGI’s magnitude of association with EA and other phenotypes. The correlation between mate-pair PGIs is far too large to be consistent with phenotypic assortment alone, implying additional assortment on PGI-associated factors. In an additional GWAS of dominance deviations from the additive model, we identify no genome-wide-significant SNPs, and a separate X-chromosome additive GWAS identifies 57.


Articles
NATurE GENETIcS a polygenic score) that has greater prediction accuracy, increasing the percentage of variance in EA explained from 11-13% to 12-16%, depending on the validation sample, an increase of approximately 20%. In meta-analyses of the expanded 23andMe sample and the UK Biobank (UKB) 3 , we also conduct an updated GWAS of the X chromosome (N = 2,713,033) and the first large-scale 'dominance GWAS' (i.e., a SNP-level GWAS of dominance deviations) of EA on the autosomes (N = 2,574,253). In our updated X-chromosome GWAS, we increase the number of approximately uncorrelated genome-wide-significant SNPs from 10 to 57. Our dominance GWAS identifies no genome-wide-significant SNPs. Moreover, with high confidence, we can rule out the existence of any common SNPs whose dominance effects explain more than a negligible fraction of the variance in EA. Table 1 summarizes the GWASs conducted in this paper and compares them to previous large-scale GWASs of educational attainment.
The rest of the paper investigates the scope and sources of the PGI's predictive power. We first document that the EA PGI not only predicts a range of cognitive phenotypes, as has been found in previous work 2,4 , but also adds nontrivial predictive power for ten diseases we examine, even after controlling for disease-specific PGIs. Next, using a combined sample of ~53,000 individuals with genotyped siblings and ~3,500 individuals with both parents genotyped, we examine the predictive power of the EA PGI controlling for parental EA PGIs. By controlling for parental EA PGIs, we isolate the component of predictive power that is due to direct effects 5 , or the causal effects of an individual's genetic material on that individual 6 . For EA and 22 other phenotypes, controlling for the parental EA PGIs roughly halves the EA PGI's association with the phenotype. In contrast, when we examine PGIs for height, body mass index (BMI) and cognitive performance, controlling for parental PGIs has far less impact on their associations with their corresponding phenotype. Thus, the EA PGI stands out as unusual in terms of how much of its predictive power is not due to direct effects.
Finally, we use PGIs to study assortative mating. Using 862 genotyped mate pairs in the UKB and 1,603 pairs in Generation Scotland (GS) 7 , we estimate the correlation between mate-pair PGIs for EA, as well as for height. For height, the correlation between mate-pair PGIs is close to that expected under phenotypic assortment (that is, all similarity between mate pairs on the genetic component of the phenotype arises via matching on the phenotype). Once again, EA is different; the correlation between mate-pair PGIs for EA is much larger than one would expect from phenotypic assortment on EA. We find evidence that population structure captured by principal components (PCs) and assortment on cognitive performance explain some, but not all, of the excess mate-pair PGI correlation. These findings shed further light on the EA PGI's predictive power for EA and other phenotypes; the factors on which mate pairs assort that are not EA but are correlated with the EA PGI (e.g., geographic location at courtship age (we speculate)) likely also contribute to the PGI's predictive power.
For a less technical description of the paper and of how it should-and should not-be interpreted, see the frequently asked questions in Supplementary Data 1.

Additive GWAS of EduYears in autosomes.
We conducted a sample-size-weighted meta-analysis of association results on EA, measured as number of years of schooling completed (EduYears), by combining three sets of summary statistics: public results from our previous meta-analysis of 69 cohorts (N = 324,162, excluding UKB and 23andMe), new association results from 23andMe (N = 2,272,216) and new association results from a GWAS we conducted in UKB with an improved coding of the EA measure (N = 441,121; Supplementary Note). All analyses were conducted in samples of European genetic ancestries, included controls for Table 1 Fig. 2 shows the LD score plot). The Manhattan plot in Fig. 1 and many of our subsequent analyses are based on test statistics adjusted for the LD score intercept. We identify 3,952 lead SNPs, defined as approximately uncorrelated (pairwise r 2 < 0.1) variants with an association P value below 5 × 10 −8 . At the stricter threshold 9 of P < 1 × 10 −8 , the number declines to 3,277 (Supplementary Table 1; Supplementary Note contains a description of the clumping algorithm). To assess the sensitivity of our conclusions about the number of independent SNPs, we conducted a conditional and joint (COJO) multiple-SNP analysis 10 . This analysis identified 2,925 SNPs (Supplementary Table 2); 41 of these are in LD (r 2 > 0.1) with other COJO lead SNPs and may represent secondary associations within a locus. Adjusted for the winner's curse, we find that the effects of our lead SNPs are consistently quite small. On average, an additional copy of the reference allele of the median SNP is associated with 1.4 weeks more schooling: the effects at the 5th and 95th percentiles (in absolute value) are 0.9 and 3.5 weeks, respectively (Supplementary Note contains details on these calculations). We also examined the out-of-sample replicability of the lead SNPs identified in the most recent previous meta-analysis 2 . In the independent 23andMe data, the replication record is broadly in line with theoretical predictions derived from an empirical Bayesian framework described in the Supplementary Note (Extended Data Fig. 3).

Biological annotation.
To compare results from biological annotation of our meta-analysis to that of the most recent previous meta-analysis, we applied stratified LD score regression 11 to both sets of summary statistics using a recent set of SNP annotations 12 . The results are very similar across the two meta-analyses, but standard errors are smaller when using the current meta-analysis results, as expected given the larger sample size (Supplementary Fig. 1a-d). Notably, we replicate the unexpected result of relatively weak enrichment of genes highly expressed in glial cells (astrocytes and oligodendrocytes) relative to neurons.

X-chromosome GWAS results.
To update the previous X-chromosome analysis, we conducted a sample-size-weighted meta-analysis of mixed-sex association results from UKB and 23andMe (N = 2,713,033) for ~200,000 SNPs on the X chromosome (Extended Data Fig. 4). We identified 57 lead SNPs with estimated effects in the range 1 to 3 weeks of schooling. Our findings are fully consistent with earlier conclusions: SNP heritability due to the X chromosome of 0.4% and (using sex-stratified association analyses in the UKB) a male-female genetic correlation on the X chromosome close to unity (rg = 0.94, s.e. = 0.03).

Dominance GWAS.
We conducted a GWAS of dominance deviations from the additive model (Supplementary Note) by meta-analyzing summary statistics from association analyses conducted in 23andMe and UKB (N = 2,574,253). Theory and evidence from the quantitative genetics literature, including findings from two recent papers 13,14 that estimated dominance SNP heritability across dozens of phenotypes (but not EA), suggest that dominance effects explain at most a very small share of the variance in polygenic phenotypes 15 . Nevertheless, in the behavior genetics literature, when the phenotypic correlation between monozygotic twins is more than twice as large as the phenotypic correlation between dizygotic twins, it remains common practice to attribute the violation of the additive model to dominance variance.
The Manhattan plot from our dominance GWAS is shown in red in the bottom panel of Fig. 1. There are no genome-wide-significant SNPs. Power calculations indicate that, at genome-wide significance, we had 80% power to detect dominance effects with an R 2 of 0.0015% (Supplementary Note). Such effect sizes would be over an order of magnitude smaller than the largest additive effects (R 2 ≅ 0.04%). Therefore, the absence of genome-wide-significant SNPs suggests that dominance effects of common SNPs, taken individually, are negligibly small.

NATurE GENETIcS
Next, we turn to the combined dominance effects of common SNPs. Applying an adapted version of LD Score regression to the summary statistics, we estimate a SNP heritability of 0.00015 (s.e. = 0.00024), which is statistically indistinguishable from zero (P = 0.54). In the Supplementary Note, we report additional analyses (that rely on different assumptions) that similarly conclude that the combined variance explained by dominance deviations in common SNPs is negligible. Our results do not rule out the possibility that rare SNPs have substantial dominance effects.
Even when the phenotypic variance across individuals explained by dominance is negligible, the combined dominance effects on an individual can be substantial when homozygosity (which is deleterious on average) is increased genome-wide due to inbreeding 16 . This reduction of fitness-related phenotypic values is called directional dominance, or inbreeding depression (ID). We applied a recently developed method that uses dominance GWAS summary statistics to estimate ID 17 . Our estimate implies the offspring of first cousins have on average ~1.0 fewer months of EA (P = 0.04) than the offspring of unrelated individuals.
Polygenic prediction. We assessed empirically how well a PGI derived from the autosomal GWAS of additive variation predicts a host of phenotypes related to EA, academic achievement and cognition. We used three European genetic-ancestry holdout samples from the National Longitudinal Study of Adolescent to Adult Health (Add Health) 18 , a representative sample of American adolescents followed into adulthood; the Health and Retirement Study (HRS) 19 , a representative sample of Americans over age 50 years; and the Wisconsin Longitudinal Study (WLS) 20 , a sample of individuals who graduated from high school in Wisconsin in 1957. Because of the range restriction for EduYears in WLS, we do not use it to evaluate predictive power for EA. Our measure of prediction accuracy is the 'incremental R 2 ' , or the gain in coefficient of determination (R 2 ) when the PGI is added as a covariate to a regression of the phenotype on a set of baseline controls (sex, dummy variables for birth year and/or age at assessment, their interactions and ten PCs of the genomic relatedness matrix). All PGIs that we analyze are based on a meta-analysis that excluded Add Health, HRS and WLS.
A PGI constructed using only genome-wide-significant SNPs has an incremental R 2 of 9.1% in Add Health and 7.0% in HRS (Extended Data Fig. 5). For all PGI analyses hereafter, unless stated otherwise, we use a PGI generated from HapMap3 SNPs using the software LDpred (ref. 21 ). This PGI explains 15.8% of the variance in EduYears in Add Health and 12.0% in HRS (Extended Data Fig. 6). The sample-size-weighted mean is 13.3%. Fig. 2a depicts how the predictive power has increased as GWAS sample sizes have increased. Fig. 2b shows that the prevalence of college completion varies a great deal over PGI deciles (Extended Data Fig. 7a,b shows prevalences of high school completion and grade retention). For example, only 7.3% and 6.8% of individuals in the lowest PGI decile have a college degree in Add Health and HRS, respectively, compared to 70.7% and 53.0% in the highest PGI decile. Fig. 2c, which displays scatterplots of individual EA versus PGIs, shows that throughout the PGI distribution, there is substantial variation in EA at the individual level. Thus, although average EA varies substantially across the PGI distribution, the PGI cannot be used to meaningfully predict an individual's EA.
In post hoc analyses, we found that a PGI generated from ~2.5 million pruned common SNPs using the software SBayesR (ref. 22 ) is more predictive than our LDpred PGI. It explains 17.0% of the variance in EduYears in Add Health and 12.9% in HRS, with a sample-size-weighted mean of 14.3% (Supplementary Table 3).
We supplemented our analyses of education outcomes with other cognitive and academic achievement outcomes (Extended Data Fig. 6 and Supplementary Table 4). For example, in Add Health, we found that the PGI explains 8.7% of the variation in Peabody verbal test scores and 12.3% in overall grade point average. In WLS, the PGI explains 6.1% of the variation in Henmon-Nelson test scores and 7.7% in high-school-grade percentile rank.
PGIs like ours that are constructed from GWAS in samples of European genetic ancestries are generally found to have much lower predictive power in samples with other genetic ancestries; for example, on average across phenotypes, estimates of relative accuracy (ratio of R 2 ) in African-genetic-ancestry to European-genetic-ancestry samples have been 22% (ref. 23 ) and 36% (ref. 24 ). When we used our PGI to predict EduYears in samples with African genetic ancestries from the HRS (N = 2,507) and Add Health (N = 1,716), the incremental R 2 was 1.3% (95% confidence interval (CI), 0.6% to 2.2%) and 2.3% (95% CI, 1.1% to 3.7%), implying that the relative accuracies for EA in the HRS and Add Health are only 11% and 15%, respectively. Using the UKB, we find that the relative accuracy is smaller than would be predicted based on population differences in allele frequencies and LD alone (Online Methods), and this discrepancy is greater for EA than has been found in prior work 25 for height, BMI and six other phenotypes (Extended Data Fig. 8 and Supplementary Table 5). The remaining reduction in predictive power is due to factors including epistasis (although epistatic variance is likely small 13,15 ), gene-environment interactions and differences between populations in gene-environment correlations, assortative mating and environmental variance.
Predicting disease risk. Among individuals of European genetic ancestries in the UKB, we estimated the predictive power of the EA PGI for ten common diseases for which large-scale GWASs have been conducted (Fig. 3). Because disease status is dichotomous, we assess predictive power using Nagelkerke's coefficient of determination 26 . Consistent with prior work that has estimated nonzero genetic correlations between EA and many diseases and health-related phenotypes 27 , some using an earlier EA PGI 1,28,29 , our EA PGI significantly predicts all ten diseases (all ten P values are smaller than 3 × 10 −8 ; Supplementary Table 6). The mean incremental R 2 across all ten diseases is 0.63%. This predictive power is nontrivial compared with the average incremental R 2 of 1.19% for disease-specific PGIs constructed using summary statistics from large-scale GWASs of the diseases. Moreover, the EA and disease-specific PGIs contribute roughly independently to predicting disease risk; the incremental R 2 from adding both PGIs and their interaction to the regression model is typically roughly equal to the sum of the incremental R 2 values of each of the two PGIs considered separately. Higher values of the EA PGI correspond to lower relative risk for each of the ten diseases (Extended Data Fig. 9 and Supplementary Tables 7 and 8).
Within-family analyses. Our next set of analyses, like related prior work 5,30,31 , aimed to isolate the component of the PGI's predictive . b, Prevalence of college completion by EA PGI decile, with 95% CIs. c, Scatterplot of EA PGI (residualized on ten principal components) and EduYears (residualized on sex, a full set of birth-year dummies, their interactions and ten principal components). Prediction samples for all panels are European-ancestry participants in Add Health (N = 5,653) and the HRS (N = 10,843). All PGIs were constructed from EduYears GWAS results that exclude Add Health and HRS using the software LDpred and assuming a normal prior for SNP effect sizes. Incremental R 2 is the difference between the R 2 from a regression of EduYears on the PGI and the controls (sex, a full set of birth-year dummies, their interactions and ten principal components) and the R 2 from a regression of EduYears on just the controls. The individual-level data plotted in c have been jittered by adding a small amount of noise to each observation.

NATurE GENETIcS
power that is due to direct effects 5,6 , or causal effects of an individual's genetic material on that individual. When controls for both parents' PGIs are included, we refer to the coefficient from a regression of an individual's phenotype on the individual's PGI as the direct effect of the PGI; when those controls are omitted, we refer to it as the population effect. (The regression controlling for parental PGIs gives an equivalent estimate of the direct effect of the PGI as a regression on PGIs constructed from transmitted and nontransmitted parental alleles 5 ; Supplementary Note.) The population effect captures the sum of the direct effect, indirect effects from relatives (e.g., genetic influences on parents' education, socioeconomic status and behavior), other gene-environment correlation (i.e., correlation a Rietveld

NATurE GENETIcS
between genotypes and environmental exposure, with population stratification being one possible cause) and a contribution from the genetic component of the phenotype that would be uncorrelated with the PGI under random mating but becomes correlated with the PGI due to the LD between causal alleles induced by assortative mating (Supplementary Note) 5,32 . Because the PGI is constructed from summary statistics that partly reflect indirect effects and other gene-environment correlation, estimating the direct effect of the PGI is different from estimating the total contribution of direct effects of SNPs 33,34 , for which relatedness disequilibrium regression 35 or summary statistics from within-family GWAS 36 could be used. For this analysis, we used a combined sample of ~53,000 individuals with genotyped siblings and ~3,500 individuals with both parents genotyped (Online Methods and Supplementary Note). Direct-effect estimates from the sibling data may be biased by sibling indirect effects, but estimates of such effects are small, including for some of the phenotypes we study 37 . The data are from the UKB (ref. 3 ), GS (ref. 7 ) and the Swedish Twin Registry (STR) 38 . We did not have sufficient power to study the diseases from Fig. 3 when restricting to these family samples. We instead analyze a set of 23 health, cognitive and socioeconomic phenotypes, which include cardiometabolic and lung biomarkers related to disease risk (Supplementary Tables 9 and 10).  Table 10) shows our meta-analysis estimates of the direct and population effects of the EA PGI. For predicting EA, the ratio of direct to population effect estimates is 0.556 (s.e. = 0.020), implying that 100% × 0.556 2 = 30.9% of the PGI's R 2 is due to its direct effect. This is smaller than the estimate of 48.9% reported in a previous analysis of Icelandic data 5 . For comparison with EA, we similarly estimate the direct and population effects of PGIs for height, BMI and cognitive performance on their respective phenotypes (Fig. 4a). The ratio of direct to population effect estimates is 0.910 (s.e. = 0.009) for height, 0.962 (s.e. = 0.017) for BMI and 0.824 (s.e. = 0.033) for cognitive performance, implying that 82.8%, 92.5% and 67.9%, respectively, of the PGI R 2 values are due to their direct effects (Supplementary Tables 11-13). The EA PGI has by far the lowest ratio.
We similarly assessed how much of the EA PGI's predictive power for the other 22 phenotypes (other than EA) is due to direct effects. Assortative mating. We also use the PGI to study assortative mating. For this analysis, we use data on genotyped mate pairs in the UKB (862 pairs) and GS (1,603 pairs). Under the (commonly assumed) hypothesis of phenotypic assortment-according to which the mate-pair genetic components are independent conditional on the mate-pair phenotypes 39,40 -the mate-pair PGI correlation should equal the product of the mate-pair phenotypic correlation, the correlation between the father's phenotype and PGI and the correlation between the mother's phenotype and PGI. We examined whether correlations between mate-pair EA PGIs fit this model (Fig. 5a), and we performed the same analysis for the height PGI (Fig. 5b). Height provides a useful comparison, because its mate-pair phenotypic correlation (0.290, s.e. = 0.018) and mate-pair PGI correlation (0.106, s.e. = 0.020) are somewhat similar to EA's mate-pair phenotypic correlation (0.430, s.e. = 0.017) and mate-pair PGI correlation (0.175, s.e. = 0.020). (For completeness, Supplementary Table 14 also shows results for the BMI and cognitive performance PGIs, but these are less informative because the mate-pair PGI correlations are not statistically distinguishable from zero.) For height, phenotypic assortment predicts a mate-pair PGI correlation of 0.087 (s.e. = 0.007) (the gray point in the figure), which is only somewhat smaller than the observed estimate of 0.106 and is contained within the 95% CI. In contrast, for EA, the predicted value of 0.031 (s.e. = 0.004) is much smaller than, and statistically distinguishable from, the mate-pair PGI correlation of 0.175. Phenotypic assortment on EA would also imply that after residualizing the PGI on EA, the mate-pair PGI correlation should fall to zero. In fact, the correlation falls by only 37%, to 0.110 (s.e. = 0.021).

NATurE GENETIcS
We explore two plausible explanations of the high mate-pair EA PGI correlation. The first is mate pairs tending to share genetic ancestry. Not all forms of social homogamy generate a mate-pair PGI correlation 41 , but social homogamy that is related to genetic ancestry (e.g., due to geographic proximity that tracks genetic structure in the population) will do so if there are components of genetic ancestry correlated with the PGI. After residualizing the EA PGI on 40 PCs of the genomic relatedness matrix in addition to EA, we find that the mate-pair PGI correlation falls to 0.091 (s.e. = 0.021). This implies that some, but not most, of the mate-pair PGI correlation is due to assortment on genetic ancestry captured by the PCs (or some factor correlated with the PCs). In the UKB, further adjustment for birth coordinates and the center where participants were assessed (Online Methods) resulted in a slight reduction of the correlation between mate-pair PGIs (Supplementary Table 14), suggesting that geographic factors not captured by the top 40 PCs also contribute to the high mate-pair EA PGI correlation. The second explanation is assortment on a phenotype or composite of phenotypes that is more strongly correlated with the EA PGI than EA itself. The GS cohort contains high-quality measures of cognitive performance and vocabulary, proxies for plausible candidates of such a composite. In this cohort, after residualizing on these proxies as well as on EA and 40 PCs, the mate-pair PGI correlation is 0.083 (s.e. = 0.027) compared to 0.113 (s.e. = 0.026) when residualizing on EA and PCs alone, which leaves a substantial remainder of the mate-pair PGI correlation unexplained. This remainder is due to assortment on phenotypes correlated with the EA PGI other than EA, cognitive performance and vocabulary-possibly including various personality traits 42-44 -and sources of social homogamy other than genetic ancestry captured by the top 40 PCs-possibly including

NATurE GENETIcS
geographic location at courtship age 45,46 , socioeconomic status and social class 47 .
Any factor that contributes to explaining the mate-pair PGI correlation must be correlated with the EA PGI. Therefore, these factors likely contribute to the EA PGI's predictive power for EA and other phenotypes. Moreover, assortative mating on these factors increases the variance of the component of the EA PGI with which they are correlated, which amplifies their contribution to the EA PGI's predictive power.

Discussion
The results of previous large-scale GWAS of EA have proven useful across many different areas of research, including medicine 48 , epidemiology 49,50 , psychology 42 , economics 51,52 and sociology 47,53,54 . The substantial increase in power from our large sample size will make the summary statistics from the current paper even more useful. Beyond increasing power, the GWAS reported in this paper also included extensive dominance, within-family and assortative mating analyses. These analyses illustrate how, as GWAS have advanced from relatively small samples (by today's standards) that identify just a few SNPs to well-powered analyses of most of the variation from common SNPs, it has become possible to address an ever-increasing set of questions. For example, we find that the EA PGI has predictive power across a broad range of educational, cognitive and health-related phenotypes and diseases. Our results show that this predictive power derives both from direct genetic effects and from gene-environment correlation (likely including indirect genetic effects from relatives), with assortative mating amplifying the predictive power over what would be expected under random mating.
Our findings are also relevant for informing some decades-old debates in the behavior genetics literature. Because the parameters of a general biometric model cannot be separately identified from a small number of phenotypic correlations among different types of relatives, researchers typically have to assume that some of the parameters equal zero in order to estimate other parameters. In the 1970s, for example, researchers from the Birmingham School 55,56 , researchers from the Hawaii School 57,58 and the sociologist Sandy Jencks famously came up with strikingly different explanations for a set of kinship correlations on cognitive test scores assembled by Jencks et al. 59 . A careful analysis by Loehlin 60 showed that the three sets of researchers arrived at different explanations for the same data primarily due to their divergent assumptions about dominance, assortative mating, and special twin environments.
Although our results concern EA rather than cognitive test scores, we believe they are relevant for evaluating the plausibility of some of the assumptions underlying the modeling approaches that have been used to explain familial resemblance in EA and cognitive phenotypes. Three of our findings are especially relevant: (1) dominance variance due to common variants is negligible, (2) much of the predictive power of the EA PGI is not explained by direct effects and (3) the mate-pair PGI correlation is far too strong to be consistent with assortative mating purely on phenotype. Overall, these findings suggest that any model of EA that requires substantial dominance to fit the data, restricts gene-environment correlations to zero or assumes assortative mating is purely based on phenotype is likely to be misspecified. Thus, our analyses demonstrate how results from large-scale GWAS and the resulting PGIs can be used to improve the identifiability of behavior-genetic models.
The sample size of the GWAS of EA reported in this paper is the largest published to date. For some purposes, such as attaining greater predictive power for the PGI, there are clearly diminishing returns. However, even larger samples will enable other analyses that have not yet been adequately powered, such as estimating differences in SNP effect sizes across phenotypes or populations and estimating the fraction of variance explained by epistatic interactions 13 .

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-022-01016-z.

This article is accompanied by a Supplementary Note with further details.
Coding the EduYears phenotype. As in previous GWAS 2,61,62 , the EduYears phenotype was coded by mapping the highest level of education that a respondent achieved to an International Standard Classification of Education 1997 category and then imputing a years-of-education equivalent for each International Standard Classification of Education 1997 category. Details on cohort-level phenotype measures, genotyping and imputation are in Supplementary Table 15.
Our phenotype coding was unchanged from previous GWAS, except in the UKB. UKB participants with a qualification of 'NVQ or HND or HNC or equivalent' (National Vocational Qualification, Higher National Diploma and Higher National Certificate, respectively) but no college or university degree were previously coded as having 19 years of education 2,62 , but this classification overstates their average years of schooling (Supplementary Note section 1 and Supplementary Fig. 3). We therefore recoded EduYears for these participants as the age they reported leaving full-time education minus five. We dropped holders of a National Vocational Qualification/Higher National Diploma/Higher National Certificate/equivalent who reported leaving full-time education before age 12 years (fewer than 50 individuals).
In previous GWAS, individuals younger than 30 years when EA was measured were excluded to ensure that almost everyone had completed formal schooling. In the 23andMe GWAS for the current paper, ~16% of the individuals are aged 16-29 years. To explore the effect of including these individuals, we conducted a simulation using the UKB data (Supplementary Note section 1.2). The results indicate that the inclusion of individuals aged younger than 30 years in the 23andMe GWAS is unlikely to have materially affected our meta-analysis results.
Additive GWAS. For our additive GWAS of EduYears, we meta-analyzed three sets of summary statistics: publicly available results from Lee et al. 2 that exclude 23andMe and UKB (N = 324,162), new association results from 23andMe (N = 2,272,216) and new association results from a GWAS we conducted in UKB with the identical methodology as in Lee et al. but with the improved coding of EduYears described above (N = 441,121). All cohort-level analyses were restricted to European-genetic-ancestry individuals who passed the cohort's quality-control filters and, except in 23andMe as described above, whose EA was measured at an age of at least 30 years. We did not run sex-stratified analyses for the autosomal meta-analysis, because there is compelling evidence from our prior work that the male-female genetic correlation for EduYears is close to one. For example, the Okbay et al. 62 data yield an estimate of 0.98 (s.e. = 0.029).
To the new 23andMe and UKB results, we applied a quality-control protocol similar to the one described previously 62 and implemented in the EasyQC R package but updated to a more recent reference panel and adjusted to account for the large GWAS sample sizes (Supplementary Note section 2.2.5 and Supplementary Table 16). Using the software METAL (ref. 63 ), for all SNPs that passed the quality-control thresholds in the new 23andMe and UKB results, we conducted a sample-size-weighted meta-analysis of these new results with the 69 results files from Lee et al. 2 (all except 23andMe and UKB). After the meta-analysis, we inflated the standard errors by the square root of the intercept (√ 1.663 ) from an LD score regression 8 estimated from the meta-analysis summary statistics.
We selected the set of approximately independent genome-wide-significant SNPs using the same iterative clumping algorithm used previously 2 and implemented in Plink (ref. 64 Table 2).

X-chromosome analyses.
We conducted separate association analyses of the X-chromosome SNPs in UKB and 23andMe (Supplementary Note section 3). The 23andMe analysis (N = 2,272,216) was conducted in a pooled male-female sample using a 0/2 genotype coding for males. The UKB analysis (N = 440,817) was an inverse-variance-weighted meta-analysis (assuming 0/2 genotype coding to match the 23andMe analysis) of sex-stratified association analyses conducted using BOLT-LMM v2.3.4 (ref. 66 ). Following Supplementary Note section 4.1 of Lee et al., we used the sex-stratified UKB analyses to estimate the X-chromosome SNP heritability for males and females, as well as the male-female genetic correlation (Supplementary Note section 3.1, Supplementary Table 17).
We performed a sample-size-weighted meta-analysis of the 211,581 SNPs that were available in both UKB and 23andMe, passed the quality control filters (Supplementary Note section 3.3 and Supplementary Table 16) and had a sample size greater than 500,000. To adjust for uncontrolled-for population stratification, we inflated the standard errors by the square root of the LD score intercept from an autosomal meta-analysis of UKB and 23andMe (√ 1.666 ) . We selected the set of approximately independent genome-wide-significant SNPs using the same clumping algorithm as in the additive GWAS (Supplementary Note section 2.2.6).

Dominance GWAS.
We conducted a sample-size-weighted meta-analysis for 5,870,596 autosomal SNPs that passed quality control filters and were available in both the 23andMe (N = 2,272,216) and UKB (N = 302,037) summary statistics. Similar to the additive GWAS, after the meta-analysis, we inflated the standard errors by the square root of the intercept from an LD score regression. We used LD scores that account for the faster decay of information from tagged SNPs as a function of LD for dominance effects (e.g., Hivert et al. 13 ). The LD score regression was restricted to the set of HapMap3 SNPs, and the dominance LD scores were estimated using the 1000 Genomes phase 1 reference sample 67 .
We decomposed the variance in the estimated dominance effect sizes into shares due to true signal of dominance genetic variance and sampling variation (Supplementary Note section 4.5 and Supplementary Table 18). We also conducted a series of preregistered replication exercises (https://osf.io/uegqv/) to assess whether the estimates of the dominance effects for various subsets of SNPs are consistent across UKB and 23andMe (Supplementary Note sections 4.6 and 8 and Supplementary Table 19).
To estimate ID for EA, we used ldscdom software, which implements a recently developed method 17 that uses GWAS summary statistics to obtain an estimate of the slope from the regression of the phenotype of interest (EA) on the inbreeding coefficient across individuals. Supplementary Note section 4.7 provides details, and Supplementary Table 20 shows the estimates of ID for each cohort separately, as well as the inverse-variance-weighted meta-analysis of these two estimates.
Polygenic prediction. From a GWAS meta-analysis that omits Add Health, HRS and WLS, the SNP weights for our main PGIs were obtained using LDpred  Table 4). We also constructed a PGI for the African-genetic-ancestry individuals in HRS and Add Health using the same LDpred weights (Supplementary Table 21).
The 'clumping and thresholding' PGIs with P value cutoffs of 5 × 10 −8 , 5 × 10 −5 , 5 × 10 −3 and 1 (i.e., all SNPs) were made in Plink2 (ref. 69 ) using the clumping algorithm described in the section ' Additive genome-wide-association study meta-analysis' and the procedure described above. The SNP weights were set equal to the coefficient estimates from the meta-analysis (Supplementary Table 3).
The SNP weights for the SBayesR (ref. 22 ) PGI were obtained using GCTB software 70 . We assume four components in the finite mixture model, with initial mixture probabilities π = (0.95,0.02,0.02,0.01) and fixed γ = (0.0,0.01,0.1,1), where γ is a parameter that constrains how the SNP-effect-size variance scales in each of the four distributions. LD was estimated using 2,865,810 pruned common variants from the full UKB European-genetic-ancestry (N ≈ 450,000) dataset from Lloyd-Jones et al. 22 . Weights were obtained for 2,548,339 of these SNPs that overlapped with the summary statistics after excluding the major histocompatibility complex region. PGIs were constructed in Plink2 (ref. 69 ) by multiplying the genotype probabilities at each SNP by the corresponding estimated posterior mean calculated by SBayesR and then summing over all included SNPs (Supplementary Table 3).
We analyzed how well the PGIs predict a host of phenotypes related to educational attainment, academic achievement and cognition (Supplementary Note section 5.2). All regressions include controls for year of birth or age at assessment, sex, their interactions and the first ten PCs of the variance-covariance matrix of the genomic relatedness matrix. In our analyses of grade point average outcomes in Add Health, we also controlled for high-school fixed effects (Supplementary Note section 5.3).
To evaluate prediction accuracy, we first regress the phenotype on the controls listed above without the PGI. Next, we rerun the regression but with the PGI included. For quantitative phenotypes, our measure of predictive power is the incremental R 2 , or the difference in R 2 between the regressions with and without the PGI. For binary outcomes, we proceed similarly but calculate the incremental Nagelkerke R 2 from a Probit regression. We obtained 95% CIs around the incremental (Nagelkerke) R 2 values by performing a bootstrap with 1,000 repetitions.
Expected prediction accuracy of the EA PGI. We calculate the expected prediction accuracy of the EA PGI using a generalization of de Vlaming et al. 71 . The expected coefficient of determination, R 2 , can be expressed as the following function of the discovery sample size, N: Although A may vary by prediction sample, B does not. We estimate A and B by nonlinear least squares using data from Add Health and HRS. More details of this calculation can be found in Supplementary Note section 5.5.
Analysis of European genetic ancestries to African genetic ancestries relative accuracy in UKB. We used a method that was recently developed by Wang et al. 25 Articles NATurE GENETIcS to investigate the factors contributing to the substantial loss of prediction accuracy of the EduYears PGI in samples of African genetic ancestries. We define the European genetic ancestries to African genetic ancestries relative accuracy (RA) as where R 2 AFR and R 2 EUR are prediction accuracies of PGIs derived from a GWAS conducted in European-genetic-ancestry populations. To facilitate comparability with Wang et al. 's results for eight other phenotypes, we extended their original analyses to also include EduYears. We thus performed a GWAS of HapMap3 SNPs (1,365,446 SNPs) in a sample of European-genetic-ancestry individuals in UKB (N = 425,231). We identified 507 approximately independent genome-wide-significant SNPs (using the LD clumping algorithm implemented in Plink (ref. 64 ), setting the window size equal to 1 Mb and the LD r 2 threshold to 0.1). We then used these 507 SNPs to generate PGIs and evaluate their accuracy in UKB holdout samples of African-genetic-ancestry individuals (N = 6,514) and European-genetic-ancestry individuals (N = 10,000). To compare our empirical estimate of RA to the RA predicted by the model, we used genotypes from 503 European-genetic-ancestry and 504 African-genetic-ancestry participants in the 1000 Genomes Project to estimate genetic-ancestry-specific MAF and LD correlations between all candidate causal variants (defined as any SNP within a 100-kb window of a genome-wide-significant SNP whose squared correlation with the genome-wide-significant SNP is above 0.45). Following Wang et al., we then substituted these estimates into their equation (2) (Supplementary Table 5 and Extended Data Fig. 8).
Prediction of disease risk from the EA PGI. The EA PGI was constructed using LDpred (v.1.0.11) (ref. 21 ) as described above but using the summary statistics of a meta-analysis of EA that excludes UKB. Disease-specific PGIs were constructed using summary statistics from GWAS conducted among participants of European genetic ancestries for nine phenotypes (Supplementary Table 22). The PGI for coronary artery disease was used to predict two diseases, ischemic heart disease and myocardial infarction. For all phenotypes other than migraine, we generated weights using LDpred and constructed the PGI using Plink1.9. LDpred was run using the same settings and Haplotype Reference Consortium reference data used for the EA PGI. For migraine, only SNPs with association P value < 10 −5 were available in the summary statistics, so we generated the PGI using clumping and thresholding. Disease phenotypes were generated based on UKB Category 1712 and Data Field 41270 (Supplementary Note section 6.1.2 and Supplementary  Tables 23 and 24).
For the various diseases, we computed the predictive power of (1) the EA PGI, (2) the disease-specific PGI and (3) these two PGIs together with their interaction (Supplementary Table 6). Our measure of predictive power is the incremental Nagelkerke's R 2 of adding the variable(s) to a logistic regression of the disease phenotype on sex, a third-degree polynomial in birth year and interactions with sex, the first 40 PCs and batch dummies. 95% CIs around the incremental Nagelkerke's R 2 were obtained by performing a bootstrap with 1,000 repetitions.
We also computed the odds ratio for selected diseases by deciles of the EA PGI in UKB (Supplementary Tables 7 and 8). Odds ratios and 95% CIs were estimated using logistic regression while controlling for covariates (Supplementary Note section 6.2.1).

Comparing direct and population effects.
To compare the direct effect of the PGI on various phenotypes to its population effect, we used data on siblings and trios from UKB (ref. 3 ), GS (ref. 7 ) and STR (ref. 38 ). In both UKB and GS, first-degree relatives were identified using KING with the "-related-degree 1" option 72 . For parent-offspring relations, the parent was identified as the older individual in the pair. We removed 621 individuals from GS that had been previously identified by GS as being also present in UKB (Supplementary Note section 7.3).
We analyzed PGIs for EA and cognitive performance in all three samples and height and BMI only in UKB and GS. PGIs were made using GWAS results that exclude GS, STR and all related individuals of up to third degree from UKB (Supplementary Note section 7.3), following the LDpred PGI pipeline described in Supplementary Note section 5.1.
We selected 23 phenotypes related to education, cognition, income and health (Supplementary Table 9) available in at least one of the datasets. For each phenotype in each dataset, we first regressed the phenotype onto sex and age, age 2 and age 3 and their interactions with sex. In addition, for UKB, we included as covariates the top 40 genetic PCs provided by UKB and the genotyping array dummies 3 . For GS and STR, we included the top 20 genetic PCs (Supplementary Note section 5.3 explains how the PCs were created). We then took the residuals from the regression of the phenotype on the covariates and normalized the residual variance within each sex separately so that the phenotypic residual variance was 1 in each sex in the combined sample of siblings and individuals with both parents genotyped. The PGIs of the phenotyped individuals were also normalized to have variance 1 in the same sample. Thus, effect estimates correspond to (partial) correlations, and their squares to proportions of phenotypic variance explained.
We give an overview of the statistical analyses performed here, with details in Supplementary Note section 7.4. In the siblings, we regressed individuals' phenotypes onto the difference between the individual's PGI and the mean PGI among the siblings in that individual's family and the mean PGI among siblings in that family. In trios, we regressed phenotypes onto the individual's PGI and the individual's father's and mother's PGIs. In both the siblings and trios, we used a linear mixed model to account for relatedness in the samples. We meta-analyzed the results from the siblings and trios, accounting for covariance between the estimates from the sibling and trio samples from the same datasets. We applied a transformation to the meta-analysis that accounts for assortative mating to estimate the population effect of the PGI and the difference between the direct and population effects.
Analysis of assortative mating. We identified mate pairs in UKB (862 mate pairs) and GS (1603 mate pairs) by identifying genotyped parents of genotyped individuals within each sample. Let ry denote the phenotypic correlation between mate pairs, and let rp and rm denote the correlations between the phenotype and PGI for the father and mother, respectively. The correlation between the mate-pair PGIs should be equal to ryrprm if the correlation is explained by assortative mating on the phenotype alone, and the relationship between the PGI and the phenotype is linear. To test the model of phenotypic assortment, we estimated the expected correlation between mate-pair PGIs by estimating ry, rp and rm. We estimated the standard error of the product of ry, rp and rm using 1,000 bootstrap samples where we sampled over the mate pairs. We also estimated the correlation between the residual of the father's PGI after regression onto the father's phenotype and the residual of the mother's PGI after regression onto the mother's phenotype, which should be zero under phenotypic assortment if the relationship between phenotype and PGI is linear. We performed further analyses adjusting for genetic PCs, birth coordinates, UKB assessment center, cognitive performance and vocabulary to test whether assortative mating on factors related to ancestry, geography and cognition explained the mate-pair PGI correlations (Supplementary Note section 9).
Reporting Summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
GWAS summary statistics can be downloaded from http://www.thessgac.org/ data subject to a terms of use to encourage responsible use of the data. We provide association results for all SNPs that passed quality-control filters in autosomal, X chromosome and dominance GWAS meta-analyses that exclude the research participants from 23andMe. SNP-level summary statistics from analyses based entirely or in part on 23andMe data can only be reported for up to 10,000 SNPs. For the complete dominance GWAS meta-analysis, which includes 23andMe, clumped results for the 1,000 SNPs with the smallest P values are provided. For the complete autosomal and X chromosome GWAS meta-analyses, respectively, clumped results for the 8,618 and 141 SNPs with P < 10 −5 are provided; this P value threshold was chosen such that the total number of SNPs across the analyses that include data from 23andMe does not exceed 10,000. The full GWAS summary statistics from 23andMe will be made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Please visit https://research.23andme.com/collaborate/#dataset-access/ for more information and to apply to access the data.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly The statistical test(s) used AND whether they are one-or two-sided Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted

Software and code
Policy information about availability of computer code Data collection No software was used for data collection.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: -Accession codes, unique identifiers, or web links for publicly available datasets -A list of figures that have associated raw data -A description of any restrictions on data availability GWAS summary statistics can be downloaded from http://www.thessgac.org/data subject to a Terms of Use to ensure responsible use of the data. We provide association results for all SNPs that passed quality-control filters in autosomal, X chromosome, and dominance GWAS meta-analyses that excludes the research nature research | reporting summary April 2020 participants from 23andMe. SNP-level summary statistics from analyses based entirely or in part on 23andMe data can only be reported for up to 10,000 SNPs. For the complete dominance GWAS meta-analysis, which includes 23andMe, clumped results for the 1,000 SNPs with the smallest P values are provided. For the complete autosomal and X chromosome GWAS meta-analyses, respectively, clumped results for the 8,617 and 143 SNPs with P < 10-5 are provided; this P value threshold was chosen such that the total number of SNPs across the analyses that include data from 23andMe does not exceed 10,000. The full GWAS summary statistics from 23andMe will be made available through 23andMe to qualified researchers under an agreement with 23andMe that protects the privacy of the 23andMe participants. Please visit https://research.23andme.com/collaborate/#dataset-access/ for more information and to apply to access the data.

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences
Behavioural & social sciences Ecological, evolutionary & environmental sciences For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf Behavioural & social sciences study design All studies must disclose on these points even when the disclosure is negative.

Study description
This is a genome-wide association study (GWAS) meta-analysis of educational attainment (EA) in a sample of ~3 million individuals. All data used in this study (genetic and phenotype data) are quantitative.