Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals

Abstract

Here we conducted a large-scale genetic association analysis of educational attainment in a sample of approximately 1.1 million individuals and identified 1,271 independent genome-wide-significant SNPs. For the SNPs taken together, we found evidence of heterogeneous effects across environments. The SNPs implicate genes involved in brain-development processes and neuron-to-neuron communication. In a separate analysis of the X chromosome, we identified 10 independent genome-wide-significant SNPs and estimated a SNP heritability of around 0.3% in both men and women, consistent with partial dosage compensation. A joint (multi-phenotype) analysis of educational attainment and three related cognitive phenotypes generated polygenic scores that explain 11–13% of the variance in educational attainment and 7–10% of the variance in cognitive performance. This prediction accuracy substantially increases the utility of polygenic scores as tools in research.
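The step from equal X-chromosome SNP heritability in men and women to partial dosage compensation can be illustrated with a short calculation. The sketch below is not the authors' analysis. It assumes a simple additive model with the same allele frequency in both sexes, Hardy-Weinberg proportions in women, and equal phenotypic variance in men and women, so that the ratio of heritabilities equals the ratio of genetic variances. The function names and the parameter `c` (the female per-allele effect divided by the male per-allele effect) are hypothetical and introduced only for illustration.

```python
import math


def male_to_female_variance_ratio(c: float) -> float:
    """Ratio of X-linked additive genetic variance in men to that in women.

    c is an assumed ratio of the female per-allele effect to the male
    per-allele effect: c = 1.0 means no dosage compensation and c = 0.5
    means full dosage compensation.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    # Men are hemizygous (genotype 0 or 1):  Var = p(1-p) * beta_m^2
    # Women carry two copies (0, 1 or 2):    Var = 2p(1-p) * (c * beta_m)^2
    # p(1-p) and beta_m^2 cancel when the two variances are divided.
    return 1.0 / (2.0 * c * c)


def implied_effect_ratio(variance_ratio: float) -> float:
    """Invert the relation above: the c implied by a male/female variance ratio."""
    if variance_ratio <= 0.0:
        raise ValueError("variance_ratio must be positive")
    return 1.0 / math.sqrt(2.0 * variance_ratio)


if __name__ == "__main__":
    print(male_to_female_variance_ratio(1.0))  # no compensation
    print(male_to_female_variance_ratio(0.5))  # full compensation
    print(implied_effect_ratio(1.0))           # equal variance in both sexes
```

Under these assumptions the male-to-female ratio is 0.5 with no dosage compensation and 2 with full dosage compensation. A ratio near 1, as suggested by the similar heritability estimates in men and women, corresponds to a value of `c` between 0.5 and 1, which is what partial dosage compensation means in this simplified model.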

Fig. 1: Manhattan Plot for GWAS of EduYears.
Fig. 2: Sign concordance in within-family association analyses.
Fig. 3: Tissue-specific expression of genes in DEPICT-defined loci.
Fig. 4: Prediction Accuracy.

Acknowledgements

This research was carried out under the auspices of the Social Science Genetic Association Consortium (SSGAC). The research has also been conducted using the UK Biobank Resource under application numbers 11425 and 12512. We acknowledge the Swedish Twin Registry for access to data. The Swedish Twin Registry is managed by the Karolinska Institutet and receives funding through the Swedish Research Council under the grant number 2017-00641. This study was supported by funding from the Ragnar Söderberg Foundation (E9/11, E42/15), the Swedish Research Council (421-2013-1061), The Jan Wallander and Tom Hedelius Foundation, an ERC Consolidator Grant (647648 EdGe), the Pershing Square Fund of the Foundations of Human Behavior, The Open Philanthropy Project (2016-152872), and the NIA/NIH through grants P01-AG005842, P01-AG005842-20S2, P30-AG012810 and T32-AG000186-23 to N.B.E.R. and R01-AG042568 to U.S.C. A full list of acknowledgments is provided in the Supplementary Note.

Author information

Contributions

D.J.B., D.C., P.T. and P.M.V. designed and oversaw the study. A.O. was the lead analyst of the study, responsible for quality control and meta-analyses. Analysts who assisted A.O. in major ways include: E.K. (quality control), O.M. (COJO, MTAG, quality control), T.A.N.-V. (figure preparation), H.L. (quality control), C.L. (quality control), J.S. (UKB association analyses) and R.K.L. (UKB association analyses). P.B. and E.K. conducted the within-family association analyses. The cross-cohort heritability and genetic-correlation analyses were conducted by R.W. and M.Z. The analyses of the X chromosome in UK Biobank were conducted by J.S.; A.O. ran the meta-analysis. J.J.L. organized and oversaw the bioinformatic analyses, with assistance from T.E., E.K., K.T., T.H.P. and P.N.T. Polygenic-prediction analyses were designed and conducted by A.O., K.T. and R.W. Besides the contributions explicitly listed above, T.K., R.L. and R.R. conducted additional analyses for several subsections. C.W. helped with the coordination of the participating cohorts. J.P.B., D.C.C., T.E., M.J., J.J.L., P.D.K., D.I.L., S.F.L., S.O., M.R.R., K.T. and J.Y. provided helpful advice and feedback on various aspects of the study design. All authors contributed to and critically reviewed the manuscript. E.K., J.J.L. and R.W. made especially large contributions to the writing and editing.

Corresponding authors

Correspondence to Aysu Okbay or Peter M. Visscher or Daniel J. Benjamin.

Ethics declarations

Competing interests

Anil Malhotra is a consultant for Genomind Inc., Informed DNA, Concert Pharmaceuticals, and Biogen. Nicholas A. Furlotte, Aaron Kleinman and Joyce Tung are employees of 23andMe, Inc.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Text and Figures

Supplementary Note and Supplementary Figures 1–29

Reporting Summary

Supplementary Tables

Supplementary Tables 1–44

About this article

Cite this article

Lee, J.J., Wedow, R., Okbay, A. et al. Gene discovery and polygenic prediction from a genome-wide association study of educational attainment in 1.1 million individuals. Nat Genet 50, 1112–1121 (2018). https://doi.org/10.1038/s41588-018-0147-3
