
The role of common genetic variation in educational attainment and income: evidence from the National Child Development Study

Scientific Reports volume 5, Article number: 16509 (2015)


We investigated the role of common genetic variation in educational attainment and household income. We used data from 5,458 participants of the National Child Development Study to estimate: 1) the associations of rs9320913, rs11584700 and rs4851266 with socioeconomic position and educational phenotypes; and 2) the univariate chip-heritability of each phenotype, and the genetic correlation between each phenotype and educational attainment at age 16. The three SNPs were associated with most measures of educational attainment. Common genetic variation contributed to 6 of 14 socioeconomic background phenotypes, and 17 of 29 educational phenotypes. We found evidence of genetic correlations between educational attainment at age 16 and 4 of 14 social background and 8 of 28 educational phenotypes. This suggests common genetic variation contributes both to differences in educational attainment and to its relationship with other phenotypes. However, we remain cautious that cryptic population structure, assortative mating, and dynastic effects may influence these associations.


Young people’s human capital accumulates over their childhood, partially via formal schooling, and is affected by their decisions and opportunities1,2. However, twin and family studies suggest that socio-economic characteristics, such as educational attainment, are heritable3. Branigan et al. (2013) meta-analysed twin studies from around the world that suggested a heritability of educational attainment due to additive genetic variation of 40.0% (95% confidence interval (95%CI): 35.3%, 44.7%). This implies genetics could play an important role in influencing educational attainment and could potentially explain some of the observed relationships between peoples’ backgrounds, education, and outcomes.

Recent studies have identified single-nucleotide polymorphisms (SNPs) associated with educational attainment and cognition4,5,6. Rietveld et al. identified three SNPs that were consistently associated with educational attainment. These were rs9320913, which was associated with the number of years of education, and rs11584700 and rs4851266, which were associated with graduating from college (university). Henceforth, we use “allele” or “alleles” to refer to the alleles that were associated with higher educational attainment.

An alternative source of evidence about the relationships between genetic variation and educational phenotypes comes from estimates of the combined contribution of common genetic variation measured on genotyping arrays to these phenotypes using genome-wide data from unrelated individuals7. Henceforth, we refer to these estimates as “chip-heritability” to distinguish these estimates from the heritability studies using family data that account for all genetic variation (including rare variants). Marioni et al. (2014) found educational attainment had a chip-heritability of 21%, and socioeconomic position had a chip-heritability of 18%8. They also reported that education and socioeconomic position had a bivariate chip-heritability of 41%. This is the proportion of the phenotypic correlation between the two phenotypes that can be explained by shared additive SNP effects. For heritable traits a substantive bivariate heritability is a necessary (but not sufficient) condition for a causal relationship. Krapohl and Plomin (2015) found that common genetic variation explained 31% (SE = 0.11) of differences in educational attainment, and 20% (SE = 0.11) of the variation in socioeconomic position. They also found that education and socioeconomic position had a bivariate chip-heritability of 50%9.

There is still much that we do not know about the relationship between the genome and educational attainment and outcomes later in life, such as household income. Here, we provide new evidence from the National Child Development Study (NCDS) using three approaches10. First, we investigated the associations of the three SNPs described above with a range of educational and socio-economic phenotypes. Second, we investigated the associations of a genome-wide allele score constructed using all the coefficients reported in the educational attainment GWAS. Third, we used genome-wide approaches to estimate the chip-heritability of educational attainment and the genetic correlation and bivariate chip-heritability of educational attainment and other phenotypes.


Of the 17,416 infants enrolled into the NCDS, 9,377 had biological data in the 2003 biomedical survey, and 5,458 were genotyped and passed the quality control described in the methods below. See Supplementary Figure 1 for a flow chart of the participants’ inclusion and exclusion from the study, and Supplementary Table 1 for a description of their characteristics. Most participants (52.6%) reported having no O-levels (Supplementary Table 2). Of participants who reported having O-levels, the median was 4. A minority (19%) of participants stayed on for A-levels and 11% had been awarded a degree by age 23. The average household nominal income was £45,247, from all sources before taxes at age 46 in 2004, which is equivalent to £59,038 in June 2015 prices (Supplementary Table 2). This is comparable to figures from the UK Office for National Statistics which suggest that average household nominal incomes were £37,554 in 2004 for people aged 30 to 4911. The three SNPs were in Hardy-Weinberg equilibrium (Supplementary Table 3).

Association of number of O-levels and phenotypes

Number of O-levels was associated with 42 of 44 observed phenotypes. Participants who achieved more O-levels had older parents, taller mothers and fathers, more educated parents, fewer mothers who smoked, and more fathers and grandfathers of high social class, and scored more highly in all measures of educational attainment at all ages (Tables 1, 2 and 3). Participants with more O-levels were more likely to obtain A-levels and to obtain a degree, and had higher values on all measures of income (Table 4).

Table 1: Association of number of O-levels and Rietveld et al. (2013) allele score and perinatal covariates.
Table 2: Association of number of O-levels and Rietveld allele score and educational phenotypes at age 7 and 11.
Table 3: Association of number of O-levels and Rietveld allele score and educational phenotypes at age 16.
Table 4: Association of number of O-levels and Rietveld allele score and educational attainment and log household income and wealth.

Association of the 3 SNP allele score and the phenotypes

The 3 SNP allele score was associated with two measures taken from the perinatal survey: fathers’ and maternal grandfathers’ social class (odds-ratio = 1.07, 95%CI: 1.01, 1.14, p-value = 0.03 and odds-ratio = 1.08, 95%CI: 1.01, 1.15, p-value = 0.02, respectively) (Table 1). The score was also positively associated with paternal grandfathers’ social class; however, this association was small and imprecisely estimated (odds-ratio = 1.01, 95%CI: 0.94, 1.09, p-value = 0.74). Adjusting for the first twenty principal components of the genotype data matrix did not meaningfully affect the results.

The 3 SNP allele score was positively associated with educational attainment at age 7 and 11 (Table 2). Each allele was associated with a higher likelihood of intending to stay on in school at age 11 (odds-ratio = 1.06, 95%CI: 1.00, 1.12, p-value = 0.03). The score was weakly associated with maths test score at age 16 (mean-difference = 0.12, 95%CI: −0.07, 0.30, p-value = 0.22), and was more strongly associated with reading test scores (mean-difference = 0.27, 95%CI: 0.11, 0.43, p-value = 0.001) (Table 3). There was little evidence that the 3 SNP allele score was associated with the teachers’ reports of the participants’ ability, or whether the participant was hardworking or lazy. It was positively associated with risk of poor eyesight (odds-ratio = 1.13, 95%CI: 1.03, 1.25, p-value = 0.01), but there was little evidence it was associated with having poor hearing, speech, or being clumsy. The 3 SNP allele score was associated with the participants’ aspirations for staying on past the school minimum age, studying A-levels and further full time study. These associations were not substantially affected by adjusting for the first twenty principal components, which suggests that population stratification is unlikely to explain our results.

On average each extra allele was associated with an additional 0.07 (95%CI: 0.01, 0.14, p-value = 0.03) O-levels. Participants with no alleles achieved 1.70 O-levels, whereas those with five alleles achieved 2.30 O-levels (Fig. 1). Participants were 11.2% (95%CI: 4.7%, 18.2%, p-value < 0.001) more likely to achieve A-levels per allele (Table 4). Only 14.4% of 347 participants with no alleles had A-levels, whereas 25.4% of the 106 participants with 5 alleles had A-levels (Supplementary Figure 2). Participants were 3.6% (95%CI: −4.0%, 11.9%, p-value = 0.36) more likely to achieve a degree per allele. Finally, each additional education associated allele was associated with a 1.0% (95%CI: −1.3%, 3.2%, p-value = 0.41) higher household income (Table 4). The association of the three individual SNPs and educational attainment and before tax household income is shown in Fig. 2.

Figure 1: The association of the Rietveld et al. (2013) allele score and number of O-levels achieved at age 23.

The dashed line indicates linear prediction of household income based on the number of alleles. The grey line indicates the confidence intervals of linear prediction.

Figure 2: Associations of rs9320913, rs11584700, rs4851266 and household income at age 46 compared to associations with educational attainment reported by Rietveld et al. (2013).

Point estimates and confidence intervals for the effects of the SNPs on education (x-axis) taken from Rietveld et al. Point estimates for the effects of the SNPs on household income (y-axis) taken from the National Child Development Study. The dashed line indicates the line of best fit between the points.

The association of the three SNP allele score with the number of O-levels was larger for children of high social class fathers than for those of low social class fathers (Supplementary Table 4). However, we found little evidence of such an interaction for obtaining A-levels, obtaining a degree, or household income. Therefore the interaction of social class and number of O-levels may simply be due to chance.

Each extra O-level was associated with a 7.4% (95%CI: 6.5%, 8.3%, p-value < 0.001) higher before-tax household income. Participants with A-levels had 48.0% (95%CI: 41.7%, 54.2%, p-value < 0.001) higher household income than those without A-levels, and those with degrees had 55.7% (95%CI: 48.1%, 63.1%, p-value < 0.001) more than those without. The instrumental variable results implied a larger effect of education on earnings than the observational association; however, the two-stage least squares results using the allele score were imprecise. The instrumental variable results using a saturated model and the continuously updating estimator were more precise and suggested that each additional O-level generated a 26.2% (95%CI: 1.3%, 51.1%, p-value = 0.04) increase in earnings, obtaining A-levels generated a 142.6% (95%CI: 5.8%, 279.3%, p-value = 0.04) increase in earnings and obtaining a degree generated a 524.3% (95%CI: −225.3%, 1274.9%, p-value = 0.17) increase in earnings compared with not having A-levels or a degree. The effect estimates are large and depend on strong assumptions that are unlikely to hold; see the discussion for further details.

The genome-wide allele score was positively associated with 41 of 47 traits (Supplementary Tables 5 to 10). The score was strongly associated with take home pay and savings and investment wealth: a one standard deviation increase in the score was associated with a 13.8% (95%CI: 10.0%, 17.6%) increase in net take home pay, and a 32.3% (95%CI: 22.5%, 42.1%) increase in wealth.

Contribution of common genetic variation

Estimates of the chip-heritability for the social background and education related phenotypes are shown in Tables 5, 6, 7 and 8. Overall we found evidence (p < 0.05) of chip-heritability for 6 of 14 background and 17 of 29 educational phenotypes. There was modest evidence of gene-environment correlations, in the form of non-zero estimates of chip-heritability for the background phenotypes, for example the age at which the participants’ fathers and mothers left school ($h^2$ = 0.07, SE = 0.07 and $h^2$ = 0.07, SE = 0.08 respectively, Table 5). We found evidence of a moderate chip-heritability of mothers’ smoking behaviour during pregnancy ($h^2$ = 0.10, p-value = 0.04). We found positive genetic correlations between the participants’ fathers’, and their paternal and maternal grandfathers’ social classes and the number of O-levels obtained (rg = 0.87, p-value = 0.002, rg = 0.39, p-value = 0.27 and rg = 0.64, p-value = 0.03, respectively). There was substantial bivariate chip-heritability between the participants’ number of O-levels and their fathers’, and paternal and maternal grandfathers’ social classes ($h^2_{biv}$ = 0.54, $h^2_{biv}$ = 0.28, and $h^2_{biv}$ = 0.59 respectively).

Table 5: Univariate estimates of chip-heritability for pre-natal phenotypes and their genetic correlation with number of O-levels.
Table 6: Univariate estimates of chip-heritability for educational phenotypes at age 7 and 11 and genetic correlation with number of O-levels.
Table 7: Univariate estimates of chip-heritability for educational phenotypes at age 16 and genetic correlation with number of O-levels.
Table 8: Univariate estimates of chip-heritability for educational and income phenotypes and genetic correlation with number of O-levels.

There was moderate chip-heritability of the participants’ arithmetic and reading test scores, and their teacher reported ability at age 7 ($h^2$ = 0.11, SE = 0.07; $h^2$ = 0.15, SE = 0.07; and $h^2$ = 0.17, SE = 0.07 respectively) (Table 6), and there was a substantial genetic correlation between teacher reported ability and the number of O-levels achieved at age 16 (rg = 1.00, p-value = 0.002). Maths and reading test scores measured at age 11 were also moderately heritable ($h^2$ = 0.23, SE = 0.07 and $h^2$ = 0.28, SE = 0.07), and both had moderate genetic correlations with the number of O-levels (rg = 0.53 and rg = 0.61 respectively, Table 6). The heritabilities of maths and reading scores at age 16 were similar to those at age 11 (Table 7). There was moderate heritability of parents’ wishes about their child’s education and of the participants’ aspirations and intentions at age 16 about staying on in school to obtain A-levels and a degree, and these phenotypes were moderately genetically correlated with the number of O-levels. In contrast, there was little evidence from this sample that the participants’ speech, eyesight or clumsiness were due to common variation.

Finally, the heritabilities of the number of O-levels, obtaining A-levels and obtaining a degree were similar ($h^2$ = 0.10, SE = 0.06; $h^2$ = 0.11, SE = 0.06; and $h^2$ = 0.13, SE = 0.06 respectively, Table 8). There was evidence of large genetic correlations between obtaining A-levels and a degree and the number of O-levels. We found modest evidence that net take home pay and wealth were heritable (Table 8).


In this study, we found that three SNPs: rs9320913, rs11584700, and rs4851266 were associated with educational attainment across childhood and years of education obtained by early adulthood. We found little evidence that the size of the genetic effects differed between reading and maths ability. The SNPs were also associated with 11-year-old children’s intentions to stay on in school after age 15 (the school minimum leaving age at the time of the study). Individuals with more alleles were more likely to obtain O-levels, A-levels and degrees.

Of the 29 measured education phenotypes 17 had detectable chip-heritability (p < 0.05). Common SNPs explained 10% of the variation in number of O-levels achieved and 13% of the reading scores at age 11, and 13% of the variance in obtaining a degree by age 23. Previous authors have reported evidence from twin studies suggesting that the heritability of educational phenotypes is relatively stable across childhood12. Consistent with this, we found little evidence that the contribution of common genetic variation changed over childhood and adolescence. However, our results are imprecise, reflecting relatively low statistical power.

Further, we found evidence of genetic correlations between the number of O-levels and 8 of 28 educational phenotypes. For example, the genetic correlations between the number of O-levels a participant achieved and their arithmetic at age 7 and Maths score at ages 11, and 16 were 0.50, 0.53, and 0.59 respectively. This suggests similar genetic pathways influence both the number of O-levels and students’ motivation as reported by the teachers. There were further genetic correlations between O-levels and parents’ interest in their children’s education at age 11. This suggests that a proportion of the associations of parents’ engagement with their children’s education and their children’s educational attainment may be due to a shared common genetic architecture.

Our genome-wide results are consistent with the results reported by Marioni et al. (2014)8. They found a bivariate heritability between cognition, educational attainment and socio-economic position of between 24% and 59%. Our point estimate of the bivariate chip-heritability of socioeconomic position and educational attainment is also consistent with results reported by Krapohl and Plomin (2015)9. Our results add to the growing evidence that socioeconomic gradients in educational attainment may be partially due to common genetic variation.

We also note that the putative positive bivariate chip-heritability between number of O-levels and parents’ and grandparents’ socio-economic position could be due to a number of factors, including pleiotropic effects, dynastic effects, assortative mating, and population stratification. The simplest explanation is a direct pleiotropic effect of common SNPs on both phenotypes, indicated by the arrow A on Supplementary Figure 4. However, it is also possible that there are dynastic (direct) effects of parents’ socioeconomic position on their offspring, as indicated by the B1 and B2 arrows on Supplementary Figure 4. Our analysis would attribute a portion of the direct effects of the parents’ socioeconomic position to common genetic variation. Finally, assortative mating between parents on socioeconomic position and educational attainment or hidden population stratification may lead to over-estimates of the contribution of common genetic variation to the association of socioeconomic position and educational attainment (arrows C1 and C2 in Supplementary Figure 4). Hence the estimated genetic correlation between these phenotypes is not conclusive proof that these phenotypes are due to a shared biological process.

A strength of our study is that we used a large geographically representative sample (N = 5,458). The SNPs were imputed using the 1,000 genomes reference panel. We found weak evidence that SNPs associated with higher educational attainment were also associated with higher adult household income. This is consistent with a causal effect of education on earnings. We also exploited extremely detailed data on participants’ educational attainment over childhood and adolescence. This provides important evidence about the aetiology of educational attainment and labour market success across the life-course.

The SNPs detected in Rietveld et al. are unlikely to be valid instrumental variables for educational choices at a specific point in time, such as “took A-levels”. This is because the three Rietveld et al. (2013) SNPs affect a range of phenotypes other than educational attainment. For example, we know that the SNPs are associated with cognition6. The SNPs also affect educational attainment and decisions across the life-course, whereas the instrumental variable point estimates assume that the SNPs only affect earnings via the number of O-levels, having A-levels or having a degree. This would invalidate the use of these SNPs as genetic instrumental variables for educational attainment13. This could explain why our instrumental variable estimates of the effects of education on wages are so substantial. Population stratification could explain our results; however, we excluded individuals who were not of European genetic origin, and adjusting for the first twenty principal components did not affect the three SNP allele score results, so this is unlikely. We found that estimates of the contribution of common genetic variation dropped substantially when adjusting for principal components, and they may reduce further still if more accurate measures of cryptic population structure are used.

Common genetic variation explained more of the variability in children’s educational attainment than is typically attributed to teachers or schools14,15,16. These results add to a growing body of literature that suggests a portion of the observed differences in educational attainment between children can be explained by underlying genetic differences. Furthermore, the increasing availability of genome-wide datasets provides an opportunity to produce new evidence about the genome’s role in educational attainment and other important outcomes such as earnings. These findings are important as they suggest that commonly studied relationships, such as socioeconomic gradients in educational attainment may be substantially explained by common genetic variation. However, shared genetic variation could reflect genetic variation influencing one phenotype (e.g. education) which then influences outcomes, which would both generate genetic correlation. Advances in Mendelian randomization methods17, such as bidirectional Mendelian randomization, two-sample, and invalid (pleiotropic) instrument robust methods could potentially provide further insights into the development of educational phenotypes18,19,20,21.

Materials and Methods

The Data—The National Child Development Study (NCDS)

The NCDS is a nationally representative cohort study of 17,416 births in a single week in 1958. The participants and their families have been surveyed a further nine times during childhood, adolescence and as adults. We had information about the participants’ family’s socio-economic position at birth, their intermediate educational attainment at ages 7, 11 and 16 and their labour market outcomes, measured by their household income at age 46. For further details of the cohort see the published cohort profile10. We report the associations with a range of phenotypes, for which the summary statistics can be seen in Supplementary Table 1.

Perinatal Mortality Study 1958 shortly after birth

We defined the participants’ family background using the birth survey: including their mother’s and father’s age and height, mother’s weight, the age at which their parents left school, whether their mother smoked before and/or during pregnancy, and whether their fathers, and paternal and maternal grandfathers were in social class I or II.

NCDS Survey 1 1965 at age 7

From this survey we used the participants’ academic ability, as reported both by arithmetic and reading tests and a summation of teacher reported Likert scores for awareness, ability at reading, creativity, number skills and speaking.

NCDS Survey 2 1969 at age 11

From this survey we used the parents’ initiative to discuss the child’s education with the school and whether the participants intended to stay on in school after the age of 15.

NCDS Survey 3 1974 at age 16

We extracted the participants’ maths and reading test scores at age 16; the teachers’ opinion about whether the participant was above average ability in Math, English and Science; whether the participant was lazy or hardworking, which was measured on a Likert scale (1-5); teacher reported difficulties in hearing, speaking, eyesight and clumsiness; the parents’ positive attitude to their children’s education; whether the parents wanted the child to get a degree and expected them to stay in school after age 16; and finally whether the participant aspired to full-time study after leaving school.

NCDS Survey 4 1981 educational attainment at age 23

We report three educational outcomes: 1) the number of school examinations (O-levels) each participant achieved at age 16; 2) whether the participant achieved A-levels at age 18; and 3) whether the participant completed a college (university) degree. Only a minority of students stayed on at school until age 18 to take A-levels, which are usually a requirement for attending university (Supplementary Table 2). We used data from the survey at age 23, because this represents the educational achievement of the participants after they had passed through the conventional education system, but before their educational attainment could be affected by adult education.

Income and wealth

We used four measures of income and wealth: first, household income before taxes; second, take home pay; third, equivalised family income; and finally, investment and savings wealth. Household income before taxes was measured in NCDS Survey 7 in 2004, at age 46. This includes labour market earnings, state and private pensions, state benefits such as child benefits or tax credits, and investment income such as interest from savings and income from rental properties. Net take home pay is from the 2008 survey at age 50, and records labour market income only. Family equivalent net income is a derived variable from the 1981 survey at age 23. It measured the participants’ income accounting for family structure.

Genome-wide data

The NCDS has biological samples from 9,377 of the participants who took part in a biomedical survey between 2002 and 2004. These samples were used to extract DNA for use in high-throughput genotyping arrays. These biological samples were originally genotyped in three different, but overlapping, samples. These results were used as control data for the first and second waves of the Wellcome Trust Case Control Consortium22. Each sample had between 516,115 and 653,522 SNPs genotyped. Prior to imputation we excluded SNPs with a minor allele frequency of less than 1% or a Hardy-Weinberg equilibrium test p-value of less than 1e-6. We also excluded individuals missing data for more than 3% of SNPs. We aligned SNP positions to Human Genome 19 and flipped strands to the positive strand. We performed haplotyping using ShapeIt V2 software and imputed the data using Impute V2.2.2. The imputation used all population samples in the 1,000 genomes reference panel (phase 1 version 3, phased using ShapeIt v2 software, release 9-12-13)23,24,25. For whole genome variance estimation we retained HapMap3 SNPs with an imputation quality score (R2) > 0.8 and a minor allele frequency of more than 1%26. This resulted in a final dataset of 1,187,090 individual SNPs. We excluded individuals not born in the UK for two reasons: they may have experienced different education systems, and they may introduce population stratification into our data. We combined the three genome-wide datasets to give a final sample of 5,458 individuals with genome-wide data.
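The pre-imputation filters described above can be sketched in a few lines. This is a minimal illustration only, not the pipeline used in the study: the function names are hypothetical, the inputs are assumed to be hard-call genotype counts, and the Hardy-Weinberg test is a 1-degree-of-freedom chi-square approximation rather than the exact test that genotype QC software typically applies.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1 - freq_a)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Approximate HWE p-value from a 1-df chi-square goodness-of-fit test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no departure from HWE can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

def passes_snp_qc(n_aa, n_ab, n_bb, maf_min=0.01, hwe_p_min=1e-6):
    """Keep a SNP unless MAF < 1% or HWE p-value < 1e-6 (thresholds from the text)."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) >= maf_min
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p_min)

def passes_sample_qc(n_missing, n_snps, max_missing=0.03):
    """Keep an individual unless more than 3% of their SNPs are missing."""
    return n_missing / n_snps <= max_missing
```

For example, `passes_snp_qc(250, 500, 250)` keeps a SNP whose genotype counts match Hardy-Weinberg proportions exactly, while a SNP with no heterozygotes at the same allele frequency is dropped.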

We constructed an allele score equal to each participant’s number of G, T, and A alleles in rs11584700, rs4851266, and rs9320913 respectively. These SNPs were in strong linkage disequilibrium with genotyped SNPs, with IMPUTE 2 info-scores >0.99. In the Rietveld et al. GWAS these alleles were associated with higher educational attainment. We refer to this variable as the three SNP Rietveld allele score27. The NCDS was not included in the Rietveld et al. GWAS. We report the allele frequencies and tests for Hardy-Weinberg equilibrium in Supplementary Table 3. In a sensitivity analysis we repeated the allele score analysis using a weighted allele score of the three SNPs. These results were similar to the analysis using an unweighted score, so we do not discuss the weighted allele score results further.
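As an illustration of the unweighted score, the sketch below counts the education-associated alleles named above. It assumes hard-called genotypes stored as two-letter strings keyed by rsID, which is a hypothetical input format chosen for the example; the non-effect alleles used in the usage note are placeholders.

```python
# Education-associated alleles as stated in the text.
EDUCATION_ALLELES = {
    "rs11584700": "G",
    "rs4851266": "T",
    "rs9320913": "A",
}

def allele_score(genotypes):
    """Count education-associated alleles across the three SNPs (0 to 6).

    `genotypes` maps each rsID to a two-character hard call such as "AG"
    (hypothetical format, assumed for this sketch).
    """
    return sum(
        genotypes[snp].count(allele)
        for snp, allele in EDUCATION_ALLELES.items()
    )
```

A participant with `{"rs11584700": "GG", "rs4851266": "CT", "rs9320913": "CC"}` would have a score of 3 under this coding.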

Statistical methods

We report the results of three sets of analyses: first, the associations of the Rietveld allele score and the background and educational phenotypes described above; second, the association of a genome-wide allele score and the phenotypes; and third, we report the chip-heritability, denoted $h^2$, which is the proportion of each phenotype explained by common genetic variation measured on the genome-wide arrays7. This is a lower bound estimate of total heritability as estimated by twin studies, because chip-heritability does not account for the effects of a large proportion of rare variation or unmeasured common variation which is in linkage equilibrium with the measured SNP data.

We estimated the associations between the number of O-levels each participant achieved and their observable phenotypes listed in Supplementary Table 1 using linear and logistic regression for the continuous and binary phenotypes respectively. We compared these associations to the associations of each phenotype and the Rietveld allele score, again estimated using linear and logistic regression. We report mean differences for the continuous phenotypes and odds-ratios for the binary phenotypes. All variance estimates from linear models use sandwich estimators that allow for heteroskedasticity28. We also report the Rietveld allele score results adjusted for the first twenty principal components of the genotype data matrix to allow for population stratification. The principal components were estimated using PLINK29,30.
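To make the heteroskedasticity-robust variance concrete, here is a minimal sketch of a single-predictor linear regression with a sandwich (HC0) standard error for the slope. It is an illustration under simplifying assumptions (one predictor, no covariates, no small-sample correction) and not the code used for the analyses; the function name is hypothetical.

```python
import math

def ols_robust(x, y):
    """Regress y on x; return (slope, intercept, HC0 robust SE of the slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich estimator for the slope: the "meat" weights each squared
    # residual by the squared deviation of x from its mean, so the variance
    # does not rely on residuals having constant spread.
    meat = sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, residuals))
    robust_se = math.sqrt(meat) / sxx
    return slope, intercept, robust_se
```

Here `x` could be the allele score and `y` a continuous phenotype such as a test score; the slope is then the mean difference per additional allele.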

In an exploratory analysis, we investigated whether there was an interaction between socioeconomic position of the participants’ fathers and the effects of the allele score on educational attainment and household income. In a further exploratory analysis we investigated using instrumental variable analysis to estimate the effects of educational attainment on household income. We report instrumental variable estimates on household income of: the number of O-levels, whether they had A-levels, or had obtained a degree. We used two methods: first, two-stage least squares using the Rietveld allele score as an instrument and, second, a model in which each of the three genotypes was indicated by two binary variables representing how many alleles the participant had. This resulted in six binary variables. We were concerned that the latter model might suffer from weak instrument bias, therefore we report Newey-Windmeijer standard errors using the continuously updating estimator31,32,33. This estimator is robust to weak instruments and allows for a general form of heteroskedasticity.
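The two instrumental variable set-ups can be sketched as follows. With a single instrument and no covariates, two-stage least squares reduces to the ratio of the instrument-outcome covariance to the instrument-exposure covariance, which is what `wald_iv_estimate` computes. The indicator coding in `genotype_indicators` is one plausible reading of "two binary variables" per genotype and is an assumption, as are all function names; the continuously updating estimator itself is not shown.

```python
def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)

def wald_iv_estimate(z, x, y):
    """Single-instrument IV estimate of the effect of x on y.

    z: instrument (e.g. allele score), x: exposure (e.g. number of O-levels),
    y: outcome (e.g. log household income).
    """
    return covariance(z, y) / covariance(z, x)

def genotype_indicators(allele_count):
    """Code a 0/1/2 allele count as two binary variables (assumed coding)."""
    return int(allele_count == 1), int(allele_count == 2)

def saturated_instruments(allele_counts):
    """Six binary instruments from the allele counts at the three SNPs."""
    indicators = []
    for count in allele_counts:
        indicators.extend(genotype_indicators(count))
    return indicators
```

The ratio form makes the weak-instrument concern visible: when the covariance between the instrument and the exposure is close to zero, the estimate becomes unstable.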

We investigated the genetic architecture of educational attainment and socio-economic position using genomic restricted maximum likelihood (GREML) analysis in Genome-wide Complex Trait Analysis (GCTA) software7. GREML estimates the “chip-heritability” ($h^2$), the proportion of variation in a phenotype explained by common SNPs measured in genotyping arrays. For this analysis we inverse normal rank transformed the continuous variables. For each phenotype, we tested whether chip-heritability differed from zero using likelihood-ratio tests. We report the genetic correlation between the genome-wide genetic effects for O-levels and the other phenotypes, indicated by rg. We calculated the bivariate heritability of the number of O-levels and other phenotypes; we refer to this as the bivariate chip-heritability ($h^2_{biv}$). This parameter is the proportion of the correlation between number of O-levels and other phenotypes that can be explained by common genetic variation measured on genotyping arrays. We calculated bivariate chip-heritability using the following formula: $h^2_{biv} = r_g \sqrt{h^2_1 h^2_2} / r_p$, where $r_g$ is the genetic correlation, $h^2_1$ is the chip-heritability of the number of O-levels, $h^2_2$ is the heritability of the second phenotype, and $r_p$ is the observed correlation between the two phenotypes. All estimates of chip-heritability adjust for indicator variables for the genotyping sample and the first twenty principal components of the genotype data matrix.
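Two of the steps in this paragraph are simple enough to sketch directly. The first function applies a rank-based inverse normal transform; the offset used to turn ranks into quantiles, (rank − 0.5)/n, is one common convention and is an assumption here. The second computes bivariate chip-heritability, assuming the standard form in which the genetic correlation is multiplied by the square root of the product of the two heritabilities and divided by the observed phenotypic correlation. Function and argument names are hypothetical.

```python
import math
from statistics import NormalDist

def inverse_normal_rank_transform(values):
    """Map values to standard normal quantiles by rank (ties share a mean rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda idx: values[idx])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - 0.5) / n) for r in ranks]

def bivariate_chip_heritability(r_g, h2_first, h2_second, r_p):
    """Share of the phenotypic correlation r_p explained by common SNPs."""
    return r_g * math.sqrt(h2_first * h2_second) / r_p
```

With purely illustrative inputs, a genetic correlation of 0.5, heritabilities of 0.16 and 0.25, and a phenotypic correlation of 0.2 give a bivariate chip-heritability of 0.5.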

Statistical power

To maximise the power of our study, we only investigated the associations of the three SNPs found to associate with educational attainment at genome-wide levels of significance (p < 5e-8) in Rietveld et al. (2013). This reduced the number of statistical tests we had to run and increased our power to detect associations. This approach was used in Rietveld et al. (2014) to investigate the association of these SNPs with cognitive ability.

Of the participants, 1,034 obtained A-levels and 586 obtained a degree. We used Visscher’s method to calculate the likely power of our chip-heritability estimators34. Assuming a heritability of 0.3 and a variance of the genetic relatedness matrix of 2e-5, we had 70% and 45% power, for A-levels and degree respectively, to reject the null hypothesis that chip-heritability was zero. For continuous traits we had greater power: 93.67% and 99.95%, assuming heritabilities of 0.2 and 0.3 respectively.
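The continuous-trait calculation can be approximated with the large-sample result from Visscher et al.34 that, for a quantitative trait in unrelated individuals, the sampling variance of the chip-heritability estimate is roughly 2 / (N² × var(A)), where var(A) is the variance of the off-diagonal elements of the genetic relatedness matrix. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, it uses a two-sided Wald test at the 5% level, and it does not cover the binary-trait case, which needs additional terms. It is not expected to reproduce the reported figures exactly.

```python
from math import sqrt
from statistics import NormalDist

def greml_power_quantitative(n, h2, var_grm=2e-5, alpha=0.05):
    """Approximate standard error and power for detecting h2 > 0 with
    GREML in n unrelated individuals (quantitative trait only)."""
    # Approximate sampling standard error of the heritability estimate.
    standard_error = sqrt(2.0 / (n ** 2 * var_grm))
    # Expected test statistic under the alternative.
    expected_z = h2 / standard_error
    normal = NormalDist()
    z_critical = normal.inv_cdf(1.0 - alpha / 2.0)
    # Two-sided Wald test (assumption; a one-sided test gives more power).
    power = normal.cdf(expected_z - z_critical) + normal.cdf(-expected_z - z_critical)
    return standard_error, power

# Sample size taken from the abstract; heritabilities as assumed in the text.
for h2 in (0.2, 0.3):
    se, power = greml_power_quantitative(n=5458, h2=h2)
    print(f"h2 = {h2}: SE = {se:.4f}, power = {power:.4f}")
```

Because the standard error depends only on the sample size and var(A) under this approximation, power rises quickly with the assumed heritability, which matches the pattern described in the text.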

Data access

The data can be accessed by submitting an application to the NCDS35. The statistical code used to create these results can be accessed here (

Additional Information

How to cite this article: Davies, N. M. et al. The role of common genetic variation in educational attainment and income: evidence from the National Child Development Study. Sci. Rep. 5, 16509; doi: 10.1038/srep16509 (2015).


References

1. The Technology of Skill Formation. Am. Econ. Rev. 97, 31–47 (2007).
2. Schools, skills, and synapses. Econ. Inq. 46, 289–324 (2008).
3. Variation in the Heritability of Educational Attainment: An International Meta-Analysis. Soc. Forces 92, 109–140 (2013).
4. GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment. Science 340, 1467–1471 (2013).
5. Genetic Variation Associated with Differential Educational Attainment in Adults Has Anticipated Associations with School Performance in Children. PLoS ONE 9, e100248, 10.1371/journal.pone.0100248 (2014).
6. Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. Proc. Natl. Acad. Sci. 111, 13790–13794 (2014).
7. GCTA: A Tool for Genome-wide Complex Trait Analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
8. Molecular genetic contributions to socioeconomic status and intelligence. Intelligence 44, 26–32 (2014).
9. Genetic link between family socioeconomic status and children’s educational achievement estimated from genome-wide SNPs. Mol. Psychiatry (2015), 10.1038/mp.2015.2.
10. Cohort profile: 1958 British birth cohort (National Child Development Study). Int. J. Epidemiol. 35, 34–41 (2005).
11. Family Spending, 2004 Edition - ONS. (2004) (Date of access: 06/08/2015).
12. Literacy and numeracy are more heritable than intelligence in primary school. Psychol. Sci. 24, 2048–2056 (2013).
13. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. Epidemiol. 32, 1–22 (2003).
14. Do Teachers Matter? Measuring the Variation in Teacher Effectiveness in England. Oxf. Bull. Econ. Stat. 74, 629–645 (2012).
15. Teachers, schools, and academic achievement. Econometrica 73, 417–458 (2005).
16. The Distribution of Teacher Quality and Implications for Policy. Annu. Rev. Econ. 4, 131–157 (2012).
17. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum. Mol. Genet. 23, R89–R98 (2014).
18. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).
19. Identification and Inference with Many Invalid Instruments. Journal of Business & Economic Statistics (2014), 10.1080/07350015.2014.978175.
20. Two-Sample Instrumental Variables Estimators. Rev. Econ. Stat. 92, 557–561 (2010).
21. Instrumental Variables Estimation With Some Invalid Instruments and its Application to Mendelian Randomization. J. Am. Stat. Assoc. (2015), 10.1080/01621459.2014.994705.
22. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
23. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906–913 (2007).
24. A General Approach for Haplotype Phasing across the Full Spectrum of Relatedness. PLoS Genet. 10, e1004234, 10.1371/journal.pgen.1004234 (2014).
25. Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nat. Commun. 5, 3934 (2014).
26. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
27. Use of allele scores as instrumental variables for Mendelian randomization. Int. J. Epidemiol. 42, 1134–1144 (2013).
28. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48, 817–838 (1980).
29. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
30. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).
31. GMM with many weak moment conditions: Replication and application of Newey and Windmeijer (2009). J. Appl. Econom. 27, 343–346 (2012).
32. GMM with many weak moment conditions. Econometrica 77, 687–719 (2009).
33. The many weak instruments problem and Mendelian randomization. Stat. Med. 34, 454–468 (2015).
34. Statistical Power to Detect Genetic (Co)Variance of Complex Traits Using SNP Data in Unrelated Samples. PLoS Genet. 10, e1004269, 10.1371/journal.pgen.1004269 (2014).
35. Welcome to the 1958 National Child Development Study - Centre for Longitudinal Studies. (2015) (Date of access: 01/04/2015).



Acknowledgements

We are extremely grateful to all the families who took part in this study and the whole National Child Development Study team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. This work made use of data and samples generated by the 1958 Birth Cohort (NCDS). Access to these resources was enabled via the 58READIE Project funded by the Wellcome Trust and Medical Research Council (grant numbers WT095219MA and G1001799). A full list of the financial, institutional and personal contributions to the development of the 1958 Birth Cohort Biomedical resource is available online. Genotyping was undertaken as part of the Wellcome Trust Case-Control Consortium (WTCCC) under Wellcome Trust award 076113, and a full list of the investigators who contributed to the generation of the data is available online. This work was carried out using the computational facilities of the Advanced Computing Research Centre, University of Bristol, and the Research Data Storage Facility of the University of Bristol. Funding: The Integrative Epidemiology Unit is supported by the Medical Research Council and the University of Bristol (G0600705, MC_UU_12013/1-9). No funding body has influenced data collection, analysis or its interpretation. This publication is the work of the authors, who serve as the guarantors for the contents of this paper.

Author information


  1. Medical Research Council Integrative Epidemiology Unit, University of Bristol, BS8 2BN, United Kingdom

    Neil M. Davies, Gibran Hemani, Nic J. Timpson, Frank Windmeijer & George Davey Smith

  2. School of Social and Community Medicine, University of Bristol, Barley House, Oakfield Grove, Bristol, BS8 2BN, United Kingdom

    Neil M. Davies, Gibran Hemani, Nic J. Timpson & George Davey Smith

  3. Department of Economics, University of Bristol, 8 Woodland Road, Bristol BS8 1TN, United Kingdom

    Frank Windmeijer




N.M.D., G.H., N.J.T., F.W. and G.D.S. wrote the main manuscript text. N.M.D. prepared the tables and figures. G.H. ran the imputation of the GWAS data. N.M.D., G.H., N.J.T., F.W. and G.D.S. reviewed the manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Neil M. Davies.
