Genome-wide association study identifies 74 loci associated with educational attainment

Journal name: Nature
Volume: 533
Pages: 539–542
DOI: 10.1038/nature17671

Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals (ref. 1). Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample (refs 1, 2) of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.

At a glance

Figures

  1. Figure 1: Manhattan plot for EduYears associations (n = 293,723).

    The x axis is chromosomal position, and the y axis is the significance on a −log10 scale (two-tailed test). The black dashed line shows the genome-wide significance level (5 × 10⁻⁸). The red crosses are the 74 approximately independent genome-wide significant associations (lead SNPs). The black dots labelled with rs numbers are the three SNPs identified in ref. 1.

  2. Figure 2: Genetic correlations between EduYears and other traits.

    Results from bivariate LD score regressions (ref. 9): estimates of genetic correlation with brain volume, neuropsychiatric, behavioural, and anthropometric phenotypes using published GWAS summary statistics. The error bars show the 95% confidence intervals (CI).

  3. Figure 3: Overview of biological annotation.

    Thirty-four clusters of significantly enriched gene sets. Each cluster is named after one of its member gene sets. The colour represents the permutation P value of the member set exhibiting the most statistically significant enrichment. Overlap between pairs of clusters is represented by an edge. Edge width represents the Pearson correlation ρ between the two vectors of gene membership scores (ρ < 0.3, no edge; 0.3 ≤ ρ < 0.5, thin edge; 0.5 ≤ ρ < 0.7, intermediate edge; ρ ≥ 0.7, thick edge), where each cluster’s vector is the vector for the gene set after which the cluster is named.

  4. Extended Data Fig. 1: Q–Q plot of the genome-wide association meta-analysis of 64 EduYears results files (n = 293,723).

    Observed and expected P values are on a −log10 scale (two-tailed). The grey region depicts the 95% confidence interval under the null hypothesis of a uniform P value distribution. The observed λGC is 1.28. (As reported in Supplementary Information section 1.5.4, the unweighted mean λGC is 1.02, the unweighted median is 1.01, and the range across cohorts is 0.95–1.15.)

  5. Extended Data Fig. 2: The distribution of effect sizes of the 74 lead SNPs.

    a, SNPs ordered by absolute value of the standardized effect of one more copy of the education-increasing allele, with 95% confidence intervals. b, SNPs ordered by R². Effects on EduYears are benchmarked against the top 74 genome-wide significant hits identified in the largest GWAS conducted to date of height and body mass index (BMI), and the 48 associations reported for waist-to-hip ratio adjusted for BMI (WHR). These results are based on the GIANT consortium’s publicly available results for pooled analyses restricted to European-ancestry individuals: https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium.

  6. Extended Data Fig. 3: Assessing the extent to which population stratification affects the estimates from the GWAS.

    a, LD score regression plot with the summary statistics from the GWAS. Each point represents an LD score quantile for a chromosome (the x and y coordinates of the point are the mean LD score and the mean χ² statistic of variants in that quantile). That the intercept is close to 1 and that the χ² statistics increase linearly with the LD scores suggest that the bulk of the inflation in the χ² statistics is due to true polygenic signal and not to population stratification. b, Estimates and 95% confidence intervals from individual-level and within-family regressions of EduYears on polygenic scores, for scores constructed with sets of SNPs meeting different P value thresholds. In addition to the analyses shown here, we conduct a sign concordance test, and we decompose the variance of the polygenic score. Overall, these analyses suggest that population stratification is unlikely to be a major concern for our 74 lead SNPs. See Supplementary Information section 3 for additional details.

  7. Extended Data Fig. 4: Replication of 74 lead SNPs in the UK Biobank data.

    Estimated effect sizes (in years of schooling) and 95% confidence intervals of the 74 lead SNPs in the meta-analysis sample (n = 293,723) and the UK Biobank replication sample (n = 111,349). The reference allele is the allele associated with higher values of EduYears in the meta-analysis sample. SNPs are in descending order of R² in the meta-analysis sample. Of the 74 lead SNPs, 72 have the anticipated sign in the replication sample, 52 replicate at the 0.05 significance level, and 7 replicate at the 5 × 10⁻⁸ significance level.

  8. Extended Data Fig. 5: Q–Q plots for the 74 lead EduYears SNPs (or LD proxies) in published GWAS of other phenotypes.

    SNPs with concordant effects on both phenotypes are pink, and SNPs with discordant effects are blue. SNPs outside the grey area pass Bonferroni-corrected significance thresholds that correct for the total number of SNPs we tested (P < 0.05/74 = 6.8 × 10⁻⁴) and are labelled with their rs numbers. Observed and expected P values are on a −log10 scale. For the sign concordance test: *P < 0.05, **P < 0.01 and ***P < 0.001.

  9. Extended Data Fig. 6: Regional association plots for four of the ten prioritized SNPs for mental health, brain anatomy, and anthropometric phenotypes identified using EduYears as a proxy phenotype.

    a, Cognitive performance; b, hippocampus; c, intracranial volume; d, neuroticism. The four were selected because very few genome-wide significant SNPs have been previously reported for these traits. Data sources and methods are described in Supplementary Information section 3. The R² values are from the hg19 / 1000 Genomes Nov 2014 EUR reference samples. The figures were created with LocusZoom (http://csg.sph.umich.edu/locuszoom/). Mb, megabases.

  10. Extended Data Fig. 7: Application of fgwas to EduYears.

    See Supplementary Information section 4.2 for further details. a, The results of single-annotation models. ‘Enrichment’ refers to the factor by which the prior odds of association at an LD-defined region must be multiplied if the region bears the given annotation; this factor is estimated using an empirical Bayes method applied to all SNPs in the GWAS meta-analysis regardless of statistical significance. Annotations were derived from ENCODE and a number of other data sources. Plotted are the base 2 logarithms of the enrichments and their 95% confidence intervals. Multiple instances of the same annotation correspond to independent replicates of the same experiment. b, The results of combining multiple annotations and applying model selection and cross-validation. Although the maximum-likelihood estimates are plotted, model selection was performed with penalized likelihood. c, Reweighting of GWAS loci. Each point represents an LD-defined region of the genome, and shown are the regional posterior probabilities of association (PPAs). The x axis gives the PPA calculated from the GWAS summary statistics alone, whereas the y axis gives the PPA upon reweighting on the basis of the annotations in b. The orange points represent genomic regions where the PPA is equivalent to the standard GWAS significance threshold only upon reweighting.

  11. Extended Data Fig. 8: Tissue-level biological annotation.

    a, The enrichment factor for a given tissue type is the ratio of variance explained by SNPs in that group to the overall fraction of SNPs in that group. To benchmark the estimates for EduYears, we compare the enrichment factors to those obtained when we use the largest GWAS conducted to date on BMI, height, and waist-to-hip ratio adjusted for BMI. The estimates were produced with the LDSC Python software, using the LD scores and functional annotations introduced in ref. 17 and the HapMap3 SNPs with minor allele frequency >0.05. Each of the ten enrichment calculations for a particular cell type is performed independently, each controlling for the 52 functional annotation categories in the full baseline model. The error bars show the 95% confidence intervals. b, We took measurements of gene expression by the Genotype-Tissue Expression (GTEx) Consortium and determined whether the genes overlapping EduYears-associated loci are significantly overexpressed (relative to genes in random sets of loci matched by gene density) in each of 37 tissue types. These types are grouped in the panel by organ. The dark bars correspond to tissues where there is significant overexpression. The y axis is the significance on a −log10 scale.

  12. Extended Data Fig. 9: Gene-level biological annotation.

    a, The DEPICT-prioritized genes for EduYears measured in the BrainSpan Developmental Transcriptome data (red curve) are more strongly expressed in the brain prenatally than postnatally. The DEPICT-prioritized genes exhibit similar gene expression levels across different brain regions (grey lines). Analyses were based on log2-transformed RNA-seq data. Error bars represent 95% confidence intervals. b, For each phenotype and disorder, we calculated the overlap between the phenotype’s DEPICT-prioritized genes and genes believed to harbour de novo mutations causing the disorder. The bars correspond to odds ratios. c, DEPICT-prioritized genes in EduYears-associated loci exhibit substantial overlap with genes previously reported to harbour sites where mutations increase risk of intellectual disability and autism spectrum disorder (Supplementary Table 4.6.1).

  13. Extended Data Fig. 10: The predictive power of a polygenic score (PGS) varies in Sweden by birth cohort.

    Five-year rolling regressions of years of education on the PGS (left axis in all four panels), share of individuals not affected by the comprehensive school reform (a, right axis), and average distance to nearest junior high school (b, right axis), nearest high school (c, right axis) and nearest college/university (d, right axis). The shaded area displays the 95% confidence intervals for the PGS effect.
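
Several of the captions above quote simple numerical conventions: the Bonferroni threshold 0.05/74, the −log10 scale of the Manhattan and Q–Q plots, the edge-width classes in Figure 3, and the count of 72 of 74 lead SNPs with the anticipated sign. The following Python sketch reproduces that arithmetic for illustration only. The function names are hypothetical, and the one-sided binomial tail is an assumed stand-in for the sign concordance test, whose actual specification is given in the Supplementary Information rather than here.

```python
from math import comb, log10


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def edge_width(rho: float) -> str:
    """Map a Pearson correlation to the edge classes in the Figure 3 caption."""
    if rho < 0.3:
        return "none"
    if rho < 0.5:
        return "thin"
    if rho < 0.7:
        return "intermediate"
    return "thick"


def sign_test_p(n_concordant: int, n_total: int) -> float:
    """One-sided binomial tail P(X >= n_concordant) under a 50/50 null.

    Assumption: a plain binomial test, not necessarily the paper's procedure.
    """
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


# Threshold used for the 74 lead SNPs (Extended Data Fig. 5), about 6.8e-4.
threshold = bonferroni_threshold(0.05, 74)

# Height of the genome-wide significance line on a -log10 axis, about 7.3.
gws_line = -log10(5e-8)

# Tail probability of seeing at least 72 of 74 concordant signs by chance.
p_sign = sign_test_p(72, 74)
```

Under these assumptions the binomial tail for 72 of 74 is far below any of the thresholds quoted in the captions, which is consistent with the replication summary in Extended Data Fig. 4.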

References

  1. Rietveld, C. A. et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–1471 (2013).
  2. Rietveld, C. A. et al. Replicability and robustness of genome-wide-association studies for behavioral traits. Psychol. Sci. 25, 1975–1986 (2014).
  3. Yang, J. et al. Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet. 19, 807–812 (2011).
  4. Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
  5. Sudlow, C. et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
  6. Fowler, T., Zammit, S., Owen, M. J. & Rasmussen, F. A population-based study of shared genetic variation between premorbid IQ and psychosis among male twin pairs and sibling pairs from Sweden. Arch. Gen. Psychiatry 69, 460–466 (2012).
  7. Tambs, K., Sundet, J. M., Magnus, P. & Berg, K. Genetic and environmental contributions to the covariance between occupational status, educational attainment, and IQ: a study of twins. Behav. Genet. 19, 209–222 (1989).
  8. Thompson, L. A., Detterman, D. K. & Plomin, R. Associations between cognitive abilities and scholastic achievement: Genetic overlap but environmental differences. Psychol. Sci. 2, 158–165 (1991).
  9. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).
  10. Ardlie, K. G. et al.; GTEx Consortium. Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
  11. Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
  12. Allen Institute for Brain Science. BrainSpan atlas of the developing human brain http://www.brainspan.org (2015).
  13. Purcell, S. M. et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
  14. Krapohl, E. et al. The high heritability of educational achievement reflects many genetically influenced traits, not just intelligence. Proc. Natl Acad. Sci. USA 111, 15273–15278 (2014).
  15. Branigan, A. R., McCallum, K. J. & Freese, J. Variation in the heritability of educational attainment: An international meta-analysis. Social Forces 92, 109–140 (2013).
  16. Heath, A. C. et al. Education policy and the heritability of educational attainment. Nature 314, 734–736 (1985).
  17. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

Author information

  1. These authors contributed equally to this work.

    • Aysu Okbay,
    • Jonathan P. Beauchamp,
    • Mark Alan Fontana,
    • James J. Lee,
    • Tune H. Pers,
    • Cornelius A. Rietveld &
    • Patrick Turley
  2. These authors jointly supervised this work.

    • Peter M. Visscher,
    • Tõnu Esko,
    • Philipp D. Koellinger,
    • David Cesarini &
    • Daniel J. Benjamin
  3. A list of participants and affiliations appears in the Supplementary Information.

    • LifeLines Cohort Study

Affiliations

  1. Department of Applied Economics, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, 3062 PA, The Netherlands

    • Aysu Okbay,
    • Cornelius A. Rietveld,
    • Ronald de Vlaming &
    • A. Roy Thurik
  2. Department of Epidemiology, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands

    • Aysu Okbay,
    • Cornelius A. Rietveld,
    • Ronald de Vlaming,
    • Sven J. van der Lee,
    • Najaf Amin,
    • Frank J. A. van Rooij,
    • Cornelia M. van Duijn,
    • Henning Tiemeier,
    • André G. Uitterlinden &
    • Albert Hofman
  3. Erasmus University Rotterdam Institute for Behavior and Biology, Rotterdam, 3062 PA, The Netherlands

    • Aysu Okbay,
    • Cornelius A. Rietveld,
    • S. Fleur W. Meddens,
    • Ronald de Vlaming,
    • A. Roy Thurik &
    • Philipp D. Koellinger
  4. Department of Economics, Harvard University, Cambridge, Massachusetts 02138, USA

    • Jonathan P. Beauchamp,
    • Patrick Turley,
    • Olga Rostapshova &
    • David I. Laibson
  5. Center for Economic and Social Research, University of Southern California, Los Angeles, California 90089-3332, USA

    • Mark Alan Fontana &
    • Daniel J. Benjamin
  6. Department of Psychology, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, USA

    • James J. Lee,
    • Michael B. Miller,
    • William G. Iacono,
    • Matt McGue &
    • Robert F. Krueger
  7. Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children’s Hospital, Boston, Massachusetts 02116, USA

    • Tune H. Pers &
    • Tõnu Esko
  8. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA

    • Tune H. Pers,
    • Pascal Timshel,
    • Harm-Jan Westra,
    • Philip L. de Jager,
    • Aarno Palotie &
    • Tõnu Esko
  9. The Novo Nordisk Foundation Center for Basic Metabolic Research, Section of Metabolic Genetics, University of Copenhagen, Faculty of Health and Medical Sciences, Copenhagen 2100, Denmark

    • Tune H. Pers,
    • Tarunveer S. Ahluwalia &
    • Thorkild I. A. Sørensen
  10. Statens Serum Institut, Department of Epidemiology Research, Copenhagen 2300, Denmark

    • Tune H. Pers
  11. Queensland Brain Institute, The University of Queensland, Brisbane, QLD 4072, Australia

    • Guo-Bo Chen,
    • Zhihong Zhu,
    • Andrew Bakshi,
    • Riccardo E. Marioni,
    • Anna A. E. Vinkhuyzen,
    • Jacob Gratten,
    • Jian Yang &
    • Peter M. Visscher
  12. Icelandic Heart Association, Kopavogur 201, Iceland

    • Valur Emilsson &
    • Vilmundur Gudnason
  13. Faculty of Pharmaceutical Sciences, University of Iceland, Reykjavík 107, Iceland

    • Valur Emilsson
  14. Department of Complex Trait Genetics, VU University, Center for Neurogenomics and Cognitive Research, Amsterdam, 1081 HV, The Netherlands

    • S. Fleur W. Meddens,
    • Christiaan de Leeuw,
    • Danielle Posthuma &
    • Philipp D. Koellinger
  15. Amsterdam Business School, University of Amsterdam, Amsterdam, 1018 TV, The Netherlands

    • S. Fleur W. Meddens,
    • Maël P. Lebreton &
    • Philipp D. Koellinger
  16. Department of Government, Uppsala University, Uppsala 751 20, Sweden

    • Sven Oskarsson &
    • Karl-Oskar Lindgren
  17. New York Genome Center, New York, New York 10013, USA

    • Joseph K. Pickrell
  18. Department of Economics, New York University, New York, New York 10012, USA

    • Kevin Thom &
    • David Cesarini
  19. Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby 2800, Denmark

    • Pascal Timshel
  20. Department of Biological Psychology, VU University Amsterdam, Amsterdam, 1081 BT, The Netherlands

    • Abdel Abdellaoui,
    • Jouke-Jan Hottenga,
    • Gonneke Willemsen &
    • Dorret I. Boomsma
  21. COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen 2820, Denmark

    • Tarunveer S. Ahluwalia,
    • Klaus Bønnelykke,
    • Johannes Waage &
    • Hans Bisgaard
  22. Steno Diabetes Center, Gentofte 2820, Denmark

    • Tarunveer S. Ahluwalia &
    • Johannes Waage
  23. Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, Gothenburg 416 85, Sweden

    • Jonas Bacelis &
    • Bo Jacobsson
  24. Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg 85764, Germany

    • Clemens Baumbach &
    • Christian Gieger
  25. Institute of Epidemiology II, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg 85764, Germany

    • Clemens Baumbach &
    • Christa Meisinger
  26. deCODE Genetics/Amgen Inc., Reykjavik 101, Iceland

    • Gyda Bjornsdottir,
    • Augustine Kong,
    • Gudmar Thorleifsson,
    • Bjarni Gunnarsson,
    • Bjarni V. Halldórsson,
    • Kari Stefansson &
    • Unnur Thorsteinsdottir
  27. Department of Cell Biology, Erasmus Medical Center Rotterdam, 3015 CN, The Netherlands

    • Johannes H. Brandsma &
    • Raymond A. Poot
  28. Istituto di Ricerca Genetica e Biomedica U.O.S. di Sassari, National Research Council of Italy, Sassari 07100, Italy

    • Maria Pina Concas,
    • Simona Vaccargiu &
    • Mario Pirastu
  29. Psychology, University of Illinois, Champaign, Illinois 61820, USA

    • Jaime Derringer
  30. 23andMe, Inc., Mountain View, California 94041, USA

    • Nicholas A. Furlotte,
    • David A. Hinds &
    • Joyce Y. Tung
  31. Radboud Institute for Health Sciences, Radboud University Medical Center, Nijmegen, 6500 HB, The Netherlands

    • Tessel E. Galesloot &
    • Lambertus A. L. M. Kiemeney
  32. Department of Medical, Surgical and Health Sciences, University of Trieste, Trieste 34100, Italy

    • Giorgia Girotto,
    • Dragana Vuckovic,
    • Ilaria Gandin,
    • Paolo Gasparini &
    • Nicola Pirastu
  33. Department of Public Health, University of Helsinki, 00014 Helsinki, Finland

    • Richa Gupta,
    • Antti Latvala,
    • Anu Loukola &
    • Jaakko Kaprio
  34. Department of Cardiovascular Sciences, University of Leicester, Leicester LE3 9QP, UK

    • Leanne M. Hall,
    • Christopher P. Nelson &
    • Nilesh J. Samani
  35. NIHR Leicester Cardiovascular Biomedical Research Unit, Glenfield Hospital, Leicester LE3 9QP, UK

    • Leanne M. Hall,
    • Christopher P. Nelson &
    • Nilesh J. Samani
  36. Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, Edinburgh EH8 9JZ, UK

    • Sarah E. Harris,
    • Gail Davies,
    • David C. M. Liewald,
    • Riccardo E. Marioni &
    • Ian J. Deary
  37. Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK

    • Sarah E. Harris &
    • David J. Porteous
  38. Department of Neurology, General Hospital and Medical University Graz, Graz 8036, Austria

    • Edith Hofer,
    • Katja E. Petrovic,
    • Helena Schmidt &
    • Reinhold Schmidt
  39. Institute for Medical Informatics, Statistics and Documentation, General Hospital and Medical University Graz, Graz 8036, Austria

    • Edith Hofer
  40. Oxford Centre for Diabetes, Endocrinology & Metabolism, University of Oxford, Oxford OX3 7LE, UK

    • Momoko Horikoshi
  41. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK

    • Momoko Horikoshi
  42. MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK

    • Jennifer E. Huffman,
    • Jonathan Marten,
    • Caroline Hayward,
    • Veronique Vitart,
    • James F. Wilson &
    • Alan F. Wright
  43. Institute of Behavioural Sciences, University of Helsinki, 00014 Helsinki, Finland

    • Kadri Kaasik,
    • Jari Lahti,
    • Liisa Keltikangas-Järvinen &
    • Katri Räikkönen
  44. Nutrition and Dietetics, Health Science and Education, Harokopio University, Athens 17671, Greece

    • Ioanna P. Kalafati &
    • George V. Dedoussis
  45. Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm 171 77, Sweden

    • Robert Karlsson,
    • Paul Lichtenstein,
    • Nancy L. Pedersen &
    • Patrik K. E. Magnusson
  46. Folkhälsan Research Centre, 00014 Helsingfors, Finland

    • Jari Lahti,
    • Katri Räikkönen &
    • Johan G. Eriksson
  47. Institute for Computing and Information Sciences, Radboud University Nijmegen, Nijmegen, 6525 EC, The Netherlands

    • Christiaan de Leeuw
  48. Quantitative Genetics, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4029, Australia

    • Penelope A. Lind &
    • Sarah E. Medland
  49. Lifespan Psychology, Max Planck Institute for Human Development, Berlin 14195, Germany

    • Tian Liu
  50. Department of Twin Research and Genetic Epidemiology, King’s College London, London SE1 7EH, UK

    • Massimo Mangino,
    • Lydia Quaye,
    • Cristina Venturini &
    • Tim D. Spector
  51. NIHR Biomedical Research Centre, Guy’s and St. Thomas’ Foundation Trust, London SE1 7EH, UK

    • Massimo Mangino &
    • Cristina Venturini
  52. Estonian Genome Center, University of Tartu, Tartu 51010, Estonia

    • Evelin Mihailov,
    • Natalia Pervjakova,
    • Reedik Mägi,
    • Lili Milani,
    • Andres Metspalu,
    • Markus Perola &
    • Tõnu Esko
  53. Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, 9700 RB, The Netherlands

    • Peter J. van der Most,
    • Behrooz Z. Alizadeh &
    • Judith M. Vonk
  54. Public Health Stream, Hunter Medical Research Institute, New Lambton, NSW 2305, Australia

    • Christopher Oldmeadow,
    • Elizabeth G. Holliday &
    • John R. Attia
  55. Faculty of Health and Medicine, University of Newcastle, Newcastle, NSW 2300, Australia

    • Christopher Oldmeadow,
    • Elizabeth G. Holliday,
    • Rodney J. Scott &
    • John R. Attia
  56. Centre for Integrated Genomic Medical Research, Institute of Population Health, The University of Manchester, Manchester M13 9PT, UK

    • Antony Payton &
    • William E. R. Ollier
  57. Human Communication and Deafness, School of Psychological Sciences, The University of Manchester, Manchester M13 9PL, UK

    • Antony Payton
  58. Department of Health, THL-National Institute for Health and Welfare, 00271 Helsinki, Finland

    • Natalia Pervjakova,
    • Niina Eklund,
    • Seppo Koskinen,
    • Tomi Mäki-Opas,
    • Veikko Salomaa,
    • Jaakko Kaprio &
    • Markus Perola
  59. Psychiatry, VU University Medical Center & GGZ inGeest, Amsterdam, 1081 HL, The Netherlands

    • Wouter J. Peyrot,
    • Yuri Milaneschi &
    • Brenda W. J. H. Penninx
  60. Laboratory of Genetics, National Institute on Aging, Baltimore, Maryland 21224, USA

    • Yong Qian,
    • Jun Ding &
    • David Schlessinger
  61. Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku, 20521 Turku, Finland

    • Olli Raitakari
  62. Department of Medical Genetics, University of Lausanne, Lausanne 1005, Switzerland

    • Rico Rueedi &
    • Zoltan Kutalik
  63. Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland

    • Rico Rueedi &
    • Zoltan Kutalik
  64. Department of Health Sciences, University of Milan, Milano 20142, Italy

    • Erika Salvi &
    • Daniele Cusi
  65. Institute for Medical Informatics, Biometry and Epidemiology, University Hospital of Essen, Essen 45147, Germany

    • Börge Schmidt,
    • Lewin Eisele &
    • Karl-Heinz Jöckel
  66. Centre for Global Health Research, The Usher Institute for Population Health Sciences and Informatics, University of Edinburgh, Edinburgh EH8 9AG, UK

    • Katharina E. Schraut,
    • Harry Campbell,
    • Peter K. Joshi,
    • Igor Rudan,
    • Ozren Polasek &
    • James F. Wilson
  67. Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland 20892-9780, USA

    • Jianxin Shi
  68. Icelandic Heart Association, Kopavogur 201, Iceland

    • Albert V. Smith
  69. Faculty of Medicine, University of Iceland, Reykjavik 101, Iceland

    • Albert V. Smith,
    • Vilmundur Gudnason,
    • Kari Stefansson &
    • Unnur Thorsteinsdottir
  70. MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK

    • Beate St Pourcain,
    • David M. Evans,
    • George McMahon,
    • Lavinia Paternoster,
    • Susan M. Ring,
    • Thorkild I. A. Sørensen,
    • Nicholas J. Timpson &
    • George Davey Smith
  71. School of Oral and Dental Sciences, University of Bristol, Bristol BS1 2LY, UK

    • Beate St Pourcain
  72. Institute for Community Medicine, University Medicine Greifswald, Greifswald 17475, Germany

    • Alexander Teumer,
    • Sebastian E. Baumeister,
    • Henry Völzke &
    • Wolfgang Hoffmann
  73. Department of Cardiology, University Medical Center Groningen, University of Groningen, Groningen, 9700 RB, The Netherlands

    • Niek Verweij,
    • Klaus Berger &
    • Pim van der Harst
  74. Institute of Epidemiology and Social Medicine, University of Münster, Münster 48149, Germany

    • Juergen Wellmann
  75. Divisions of Genetics and Rheumatology, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA

    • Harm-Jan Westra
  76. Partners Center for Personalized Genetic Medicine, Boston, Massachusetts 02115, USA

    • Harm-Jan Westra
  77. Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois 60612, USA

    • Jingyun Yang,
    • Patricia A. Boyle &
    • David A. Bennett
  78. Department of Neurological Sciences, Rush University Medical Center, Chicago, Illinois 60612, USA

    • Jingyun Yang &
    • David A. Bennett
  79. Department of Epidemiology, University of Michigan, Ann Arbor, Michigan 48109, USA

    • Wei Zhao,
    • Jennifer A. Smith,
    • Erin B. Ware &
    • Sharon L. R. Kardia
  80. Department of Gastroenterology and Hepatology, University of Groningen, University Medical Center Groningen, Groningen, 9713 GZ, The Netherlands

    • Behrooz Z. Alizadeh
  81. Institute of Epidemiology and Preventive Medicine, University of Regensburg, Regensburg D-93053, Germany

    • Sebastian E. Baumeister
  82. Institute of Molecular Genetics, National Research Council of Italy, Pavia 27100, Italy

    • Ginevra Biino
  83. Department of Behavioral Sciences, Rush University Medical Center, Chicago, Illinois 60612, USA

    • Patricia A. Boyle
  84. Warwick Medical School, University of Warwick, Coventry CV4 7AL, UK

    • Francesco P. Cappuccio
  85. Department of Psychology, University of Edinburgh, Edinburgh EH8 9JZ, UK

    • Gail Davies,
    • David C. M. Liewald &
    • Ian J. Deary
  86. Saïd Business School, University of Oxford, Oxford OX1 1HP, UK

    • Jan-Emmanuel De Neve
  87. William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK

    • Panos Deloukas &
    • Stavroula Kanoni
  88. Princess Al-Jawhara Al-Brahim Centre of Excellence in Research of Hereditary Disorders (PACER-HD), King Abdulaziz University, Jeddah 21589, Saudi Arabia

    • Panos Deloukas
  89. The Berlin Aging Study II; Research Group on Geriatrics, Charité – Universitätsmedizin Berlin, Berlin 13347, Germany

    • Ilja Demuth &
    • Elisabeth Steinhagen-Thiessen
  90. Institute of Medical and Human Genetics, Charité-Universitätsmedizin, Berlin, Berlin 13353, Germany

    • Ilja Demuth
  91. German Socio-Economic Panel Study, DIW Berlin, Berlin 10117, Germany

    • Peter Eibich &
    • Martin Kroh
  92. Health Economics Research Centre, Nuffield Department of Population Health, University of Oxford, Oxford OX3 7LF, UK

    • Peter Eibich
  93. The University of Queensland Diamantina Institute, The Translational Research Institute, Brisbane, QLD 4102, Australia

    • David M. Evans,
    • Jian Yang &
    • Peter M. Visscher
  94. Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, Michigan 48109, USA

    • Jessica D. Faul &
    • David R. Weir
  95. Department of Genetics, Division of Statistical Genomics, Washington University School of Medicine, St. Louis, Missouri 63018, USA

    • Mary F. Feitosa,
    • Aldi T. Kraja,
    • Ingrid B. Borecki &
    • Michael A. Province
  96. Institute of Human Genetics, University of Bonn, Bonn 53127, Germany

    • Andreas J. Forstner
  97. Department of Genomics, Life and Brain Center, University of Bonn, Bonn 53127, Germany

    • Andreas J. Forstner
  98. Institute of Biomedical and Neural Engineering, School of Science and Engineering, Reykjavik University, Reykjavik 101, Iceland

    • Bjarni V. Halldórsson
  99. Laboratory of Epidemiology, Demography, National Institute on Aging, National Institutes of Health, Bethesda, Maryland 20892-9205, USA

    • Tamara B. Harris
  100. Department of Psychiatry, Washington University School of Medicine, St. Louis, Missouri 63110, USA

    • Andrew C. Heath &
    • Pamela A. Madden
  101. Division of Applied Health Sciences, University of Aberdeen, Aberdeen AB25 2ZD, UK

    • Lynne J. Hocking
  102. Interfaculty Institute for Genetics and Functional Genomics, University Medicine Greifswald, Greifswald 17475, Germany

    • Georg Homuth &
    • Uwe Völker
  103. Manchester Medical School, The University of Manchester, Manchester M13 9PT, UK

    • Michael A. Horan
  104. Program in Translational NeuroPsychiatric Genomics, Departments of Neurology & Psychiatry, Brigham and Women’s Hospital, Boston, Massachusetts 02115, USA

    • Philip L. de Jager
  105. Harvard Medical School, Boston, Massachusetts 02115, USA

    • Philip L. de Jager
  106. Department of Genes and Environment, Norwegian Institute of Public Health, N-0403 Oslo, Norway

    • Astanand Jugessur,
    • Ronny Myhre &
    • Bo Jacobsson
  107. Department of Genomics of Common Disease, Imperial College London, London, W12 0NN, UK

    • Marika A. Kaakinen
  108. Department of Clinical Physiology, Tampere University Hospital, 33521 Tampere, Finland

    • Mika Kähönen
  109. Department of Clinical Physiology, University of Tampere, School of Medicine, 33014 Tampere, Finland

    • Mika Kähönen
  110. Public Health, Medical School, University of Split, 21000 Split, Croatia

    • Ivana Kolcic
  111. Institute of Social and Preventive Medicine, Lausanne University Hospital (CHUV), Lausanne 1010, Switzerland

    • Zoltan Kutalik
  112. Neuroepidemiology Section, National Institute on Aging, National Institutes of Health, Bethesda, Maryland 20892-9205, USA

    • Lenore J. Launer
  113. Amsterdam Brain and Cognition Center, University of Amsterdam, Amsterdam, 1018 XA, The Netherlands

    • Maël P. Lebreton
  114. Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, California 94305-5797, USA

    • Douglas F. Levinson
  115. Institute of Human Genetics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg 85764, Germany

    • Peter Lichtner &
    • Thomas Meitinger
  116. Medical Genetics Section, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK

    • Riccardo E. Marioni
  117. Department of Internal Medicine, Internal Medicine, Lausanne University Hospital (CHUV), Lausanne 1011, Switzerland

    • Pedro Marques-Vidal &
    • Peter Vollenweider
  118. Tema BV, Hoofddorp, 2131 HE, The Netherlands

    • Gerardus A. Meddens
  119. Molecular Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4029, Australia

    • Grant W. Montgomery &
    • Dale R. Nyholt
  120. Institute of Health and Biomedical Innovation, Queensland Institute of Technology, Brisbane, QLD 4059, Australia

    • Dale R. Nyholt
  121. Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital, Boston, Massachusetts 02114, USA

    • Aarno Palotie
  122. The Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA

    • Aarno Palotie
  123. Psychiatric & Neurodevelopmental Genetics Unit, Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts 02114, USA

    • Aarno Palotie
  124. Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki 00014, Finland

    • Aarno Palotie,
    • Antti-Pekka Sarin &
    • Jaakko Kaprio
  125. Department of Neurology, Massachusetts General Hospital, Boston, Massachusetts 02114, USA

    • Aarno Palotie
  126. Medical Genetics, Institute for Maternal and Child Health IRCCS “Burlo Garofolo”, Trieste 34100, Italy

    • Antonietta Robino,
    • Sheila Ulivi &
    • Paolo Gasparini
  127. Social Impact, Arlington, Virginia 22201, USA

    • Olga Rostapshova &
    • Diego Vozzi
  128. Department of Economics, University of Minnesota Twin Cities, Minneapolis, Minnesota 55455, USA

    • Aldo Rustichini
  129. Department of Psychiatry and Behavioral Sciences, NorthShore University HealthSystem, Evanston, Illinois 60201-3137, USA

    • Alan R. Sanders &
    • Pablo V. Gejman
  130. Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, Illinois 60637, USA

    • Alan R. Sanders &
    • Pablo V. Gejman
  131. Public Health Genomics Unit, National Institute for Health and Welfare, 00300 Helsinki, Finland

    • Antti-Pekka Sarin
  132. Research Unit for Genetic Epidemiology, Institute of Molecular Biology and Biochemistry, Center of Molecular Medicine, General Hospital and Medical University, Graz, Graz 8010, Austria

    • Helena Schmidt
  133. Information Based Medicine Stream, Hunter Medical Research Institute, New Lambton, NSW 2305, Australia

    • Rodney J. Scott
  134. Medical Research Institute, University of Dundee, Dundee DD1 9SY, UK

    • Blair H. Smith
  135. Research Unit Hypertension and Cardiovascular Epidemiology, Department of Cardiovascular Science, University of Leuven, Leuven 3000, Belgium

    • Jan A. Staessen
  136. R&D VitaK Group, Maastricht University, Maastricht, 6229 EV, The Netherlands

    • Jan A. Staessen
  137. Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg 85764, Germany

    • Konstantin Strauch
  138. Institute of Medical Informatics, Biometry and Epidemiology, Chair of Genetic Epidemiology, Ludwig Maximilians-Universität, Munich 81377, Germany

    • Konstantin Strauch
  139. Department of Geriatrics, Florida State University College of Medicine, Tallahassee, Florida 32306, USA

    • Antonio Terracciano
  140. Department of Health Sciences and Genetics, University of Leicester, Leicester LE1 7RH, UK

    • Martin D. Tobin
  141. Department of Internal Medicine, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands

    • Frank J. A. van Rooij
  142. Research Center for Group Dynamics, Institute for Social Research, University of Michigan, Ann Arbor, Michigan 48104, USA

    • Erin B. Ware
  143. Platform for Genome Analytics, Institutes of Neurogenetics & Integrative and Experimental Genomics, University of Lübeck, Lübeck 23562, Germany

    • Lars Bertram
  144. Neuroepidemiology and Ageing Research Unit, School of Public Health, Faculty of Medicine, Imperial College of Science, Technology and Medicine, London SW7 2AZ, UK

    • Lars Bertram
  145. Department of Health Sciences, Community & Occupational Medicine, University of Groningen, University Medical Center Groningen, Groningen, 9713 AV, The Netherlands

    • Ute Bültmann
  146. Department of Psychology, Union College, Schenectady, New York 12308, USA

    • Christopher F. Chabris
  147. Istituto di Ricerca Genetica e Biomedica (IRGB), Consiglio Nazionale delle Ricerche, c/o Cittadella Universitaria di Monserrato, Monserrato, Cagliari 9042, Italy

    • Francesco Cucca
  148. Institute of Biomedical Technologies, Italian National Research Council, Segrate (Milano) 20090, Italy

    • Daniele Cusi
  149. Department of General Practice and Primary Health Care, University of Helsinki, 00014 Helsinki, Finland

    • Johan G. Eriksson
  150. Departments of Human Genetics and Psychiatry, Donders Centre for Neuroscience, Nijmegen, 6500 HB, The Netherlands

    • Barbara Franke
  151. Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, 9700 RB, The Netherlands

    • Lude Franke &
    • Pim van der Harst
  152. Sidra, Experimental Genetics Division, Sidra, Doha 26999, Qatar

    • Paolo Gasparini
  153. Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald 17475, Germany

    • Hans-Jörgen Grabe
  154. Department of Psychiatry and Psychotherapy, HELIOS-Hospital Stralsund, Stralsund 18437, Germany

    • Hans-Jörgen Grabe
  155. Econometric Institute, Erasmus School of Economics, Erasmus University Rotterdam, Rotterdam, 3062 PA, The Netherlands

    • Patrick J. F. Groenen
  156. Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute, Utrecht, 1105 AZ, The Netherlands

    • Pim van der Harst
  157. Generation Scotland, Centre for Genomics and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh EH4 2XU, UK

    • Caroline Hayward
  158. Centre for Population Health Research, School of Health Sciences and Sansom Institute, University of South Australia, Adelaide, SA 5000, Australia

    • Elina Hyppönen
  159. South Australian Health and Medical Research Institute, Adelaide, SA 5000, Australia

    • Elina Hyppönen
  160. Population, Policy and Practice, UCL Institute of Child Health, London WC1N 1EH, UK

    • Elina Hyppönen &
    • Christine Power
  161. Department of Epidemiology and Biostatistics, MRC-PHE Centre for Environment & Health, School of Public Health, Imperial College London, London W2 1PG, UK

    • Marjo-Riitta Järvelin
  162. Center for Life Course Epidemiology, Faculty of Medicine, University of Oulu, 90014 Oulu, Finland

    • Marjo-Riitta Järvelin
  163. Unit of Primary Care, Oulu University Hospital, 90029 Oulu, Finland

    • Marjo-Riitta Järvelin
  164. Biocenter Oulu, University of Oulu, 90014 Oulu, Finland

    • Marjo-Riitta Järvelin
  165. Fimlab Laboratories, 33520 Tampere, Finland

    • Terho Lehtimäki
  166. Department of Clinical Chemistry, University of Tampere, School of Medicine, 33014 Tampere, Finland

    • Terho Lehtimäki
  167. Economics, NYU Shanghai, 200122 Pudong, China

    • Steven F. Lehrer
  168. Policy Studies, Queen’s University, Kingston, Ontario K7L 3N6, Canada

    • Steven F. Lehrer
  169. Genetic Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, QLD 4029, Australia

    • Nicholas G. Martin
  170. Institute of Molecular and Cell Biology, University of Tartu, Tartu 51010, Estonia

    • Andres Metspalu
  171. Centre for Clinical and Cognitive Neuroscience, Institute Brain Behaviour and Mental Health, Salford Royal Hospital, Manchester M6 8HD, UK

    • Neil Pendleton
  172. Manchester Institute for Collaborative Research in Ageing, University of Manchester, Manchester M13 9PL, UK

    • Neil Pendleton
  173. Faculty of Medicine, University of Split, Split 21000, Croatia

    • Ozren Polasek
  174. Department of Clinical Genetics, VU Medical Centre, Amsterdam, 1081 HV, The Netherlands

    • Danielle Posthuma
  175. Institute of Preventive Medicine, Bispebjerg and Frederiksberg Hospitals, The Capital Region, Frederiksberg 2000, Denmark

    • Thorkild I. A. Sørensen
  176. Montpellier Business School, Montpellier 34080, France

    • A. Roy Thurik
  177. Panteia, Zoetermeer, 2715 CA, The Netherlands

    • A. Roy Thurik
  178. Department of Psychiatry, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands

    • Henning Tiemeier
  179. Department of Child and Adolescent Psychiatry, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands

    • Henning Tiemeier
  180. Department of Internal Medicine, Erasmus Medical Center, Rotterdam, 3015 GE, The Netherlands

    • André G. Uitterlinden
  181. Department of Sociology, New York University, New York, New York 10012, USA

    • Dalton C. Conley
  182. School of Medicine, New York University, New York, New York 10016, USA

    • Dalton C. Conley
  183. Bioethics Program, Union Graduate College – Icahn School of Medicine at Mount Sinai, Schenectady, New York 12308, USA

    • Michelle N. Meyer
  184. Department of Economics, Stockholm School of Economics, Stockholm 113 83, Sweden

    • Magnus Johannesson
  185. Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA

    • Tõnu Esko
  186. Research Institute for Industrial Economics, Stockholm 10215, Sweden

    • David Cesarini

Contributions

Study design and management: D.J.B., D.Ce., T.E., M.J., P.D.K. and P.M.V. Quality control and meta-analysis: A.O., G.B.C., T.E., M.A.F., C.A.R. and T.H.P. Stratification: P.T., J.P.B., C.A.R. and J.Y. Genetic overlap: J.P.B., M.A.F. and P.T. Biological annotation: J.J.L., T.E., T.H.P., J.K.P., J.H.B., J.P.B., L.F., V.E., G.A.M., M.A.F., S.F.W.M., P.Ti., R.A.P., R.d.V. and H.J.W. Prediction and mediation: J.P.B., M.A.F. and J.Y. G×E: D.Co., S.F.L., K.O.L., S.O. and K.T. Replication in UKB: M.A.F. and C.A.R. SSGAC advisory board: D.Co., T.E., A.H., R.F.K., D.I.L., S.E.M., M.N.M., G.D.S. and P.M.V. All authors contributed to and critically reviewed the manuscript. Authors not listed above contributed to the recruitment, genotyping, or data processing for the contributing components of the meta-analysis. For a full list of author contributions, see Supplementary Information section 8.

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:

Results can be downloaded from the SSGAC website (http://ssgac.org/Data.php). Data for our analyses come from many studies and organizations, some of which are subject to a material transfer agreement (MTA), and are listed in the Supplementary Information.

Extended data figures and tables

Extended Data Figures

  1. Extended Data Figure 1: Q–Q plot of the genome-wide association meta-analysis of 64 EduYears results files (n = 293,723). (90 KB)

    Observed and expected P values are on a −log₁₀ scale (two-tailed). The grey region depicts the 95% confidence interval under the null hypothesis of a uniform P value distribution. The observed λGC is 1.28. (As reported in Supplementary Information section 1.5.4, the unweighted mean λGC is 1.02, the unweighted median is 1.01, and the range across cohorts is 0.95–1.15.)

  2. Extended Data Figure 2: The distribution of effect sizes of the 74 lead SNPs. (225 KB)

    a, SNPs ordered by absolute value of the standardized effect of one more copy of the education-increasing allele, with 95% confidence intervals. b, SNPs ordered by R². Effects on EduYears are benchmarked against the top 74 genome-wide significant hits identified in the largest GWAS conducted to date of height and body mass index (BMI), and the 48 associations reported for waist-to-hip ratio adjusted for BMI (WHR). These results are based on the GIANT consortium’s publicly available results for pooled analyses restricted to European-ancestry individuals: https://www.broadinstitute.org/collaboration/giant/index.php/GIANT_consortium.

  3. Extended Data Figure 3: Assessing the extent to which population stratification affects the estimates from the GWAS. (77 KB)

    a, LD score regression plot with the summary statistics from the GWAS. Each point represents an LD score quantile for a chromosome (the x and y coordinates of the point are the mean LD score and the mean χ² statistic of variants in that quantile). That the intercept is close to 1 and that the χ² statistics increase linearly with the LD scores suggest that the bulk of the inflation in the χ² statistics is due to true polygenic signal and not to population stratification. b, Estimates and 95% confidence intervals from individual-level and within-family regressions of EduYears on polygenic scores, for scores constructed with sets of SNPs meeting different P value thresholds. In addition to the analyses shown here, we conduct a sign concordance test, and we decompose the variance of the polygenic score. Overall, these analyses suggest that population stratification is unlikely to be a major concern for our 74 lead SNPs. See Supplementary Information section 3 for additional details.

  4. Extended Data Figure 4: Replication of 74 lead SNPs in the UK Biobank data. (283 KB)

    Estimated effect sizes (in years of schooling) and 95% confidence intervals of the 74 lead SNPs in the meta-analysis sample (n = 293,723) and the UK Biobank replication sample (n = 111,349). The reference allele is the allele associated with higher values of EduYears in the meta-analysis sample. SNPs are in descending order of R² in the meta-analysis sample. Of the 74 lead SNPs, 72 have the anticipated sign in the replication sample, 52 replicate at the 0.05 significance level, and 7 replicate at the 5 × 10⁻⁸ significance level.

  5. Extended Data Figure 5: Q–Q plots for the 74 lead EduYears SNPs (or LD proxies) in published GWAS of other phenotypes. (227 KB)

    SNPs with concordant effects on both phenotypes are pink, and SNPs with discordant effects are blue. SNPs outside the grey area pass Bonferroni-corrected significance thresholds that correct for the total number of SNPs we tested (P < 0.05/74 = 6.8 × 10⁻⁴) and are labelled with their rs numbers. Observed and expected P values are on a −log₁₀ scale. For the sign concordance test: *P < 0.05, **P < 0.01 and ***P < 0.001.

  6. Extended Data Figure 6: Regional association plots for four of the ten prioritized SNPs for mental health, brain anatomy, and anthropometric phenotypes identified using EduYears as a proxy phenotype. (291 KB)

    a, Cognitive performance; b, hippocampus; c, intracranial volume; d, neuroticism. The four were selected because very few genome-wide significant SNPs have been previously reported for these traits. Data sources and methods are described in Supplementary Information section 3. The R² values are from the hg19 / 1000 Genomes Nov 2014 EUR reference samples. The figures were created with LocusZoom (http://csg.sph.umich.edu/locuszoom/). Mb, megabases.

  7. Extended Data Figure 7: Application of fgwas to EduYears. (230 KB)

    See Supplementary Information section 4.2 for further details. a, The results of single-annotation models. ‘Enrichment’ refers to the factor by which the prior odds of association at an LD-defined region must be multiplied if the region bears the given annotation; this factor is estimated using an empirical Bayes method applied to all SNPs in the GWAS meta-analysis regardless of statistical significance. Annotations were derived from ENCODE and a number of other data sources. Plotted are the base 2 logarithms of the enrichments and their 95% confidence intervals. Multiple instances of the same annotation correspond to independent replicates of the same experiment. b, The results of combining multiple annotations and applying model selection and cross-validation. Although the maximum-likelihood estimates are plotted, model selection was performed with penalized likelihood. c, Reweighting of GWAS loci. Each point represents an LD-defined region of the genome, and shown are the regional posterior probabilities of association (PPAs). The x axis gives the PPA calculated from the GWAS summary statistics alone, whereas the y axis gives the PPA upon reweighting on the basis of the annotations in b. The orange points represent genomic regions where the PPA is equivalent to the standard GWAS significance threshold only upon reweighting.

  8. Extended Data Figure 8: Tissue-level biological annotation. (150 KB)

    a, The enrichment factor for a given tissue type is the ratio of variance explained by SNPs in that group to the overall fraction of SNPs in that group. To benchmark the estimates for EduYears, we compare the enrichment factors to those obtained when we use the largest GWAS conducted to date on BMI, height, and waist-to-hip ratio adjusted for BMI. The estimates were produced with the LDSC Python software, using the LD scores and functional annotations introduced in ref. 17 and the HapMap3 SNPs with minor allele frequency >0.05. Each of the ten enrichment calculations for a particular cell type is performed independently, each controlling for the 52 functional annotation categories in the full baseline model. The error bars show the 95% confidence intervals. b, We took measurements of gene expression by the Genotype-Tissue Expression (GTEx) Consortium and determined whether the genes overlapping EduYears-associated loci are significantly overexpressed (relative to genes in random sets of loci matched by gene density) in each of 37 tissue types. These types are grouped in the panel by organ. The dark bars correspond to tissues where there is significant overexpression. The y axis is the significance on a −log₁₀ scale.

  9. Extended Data Figure 9: Gene-level biological annotation. (279 KB)

    a, The DEPICT-prioritized genes for EduYears measured in the BrainSpan Developmental Transcriptome data (red curve) are more strongly expressed in the brain prenatally than postnatally. The DEPICT-prioritized genes exhibit similar gene expression levels across different brain regions (grey lines). Analyses were based on log₂-transformed RNA-seq data. Error bars represent 95% confidence intervals. b, For each phenotype and disorder, we calculated the overlap between the phenotype’s DEPICT-prioritized genes and genes believed to harbour de novo mutations causing the disorder. The bars correspond to odds ratios. c, DEPICT-prioritized genes in EduYears-associated loci exhibit substantial overlap with genes previously reported to harbour sites where mutations increase risk of intellectual disability and autism spectrum disorder (Supplementary Table 4.6.1).

  10. Extended Data Figure 10: The predictive power of a polygenic score (PGS) varies in Sweden by birth cohort. (250 KB)

    Five-year rolling regressions of years of education on the PGS (left axis in all four panels), share of individuals not affected by the comprehensive school reform (a, right axis), and average distance to nearest junior high school (b, right axis), nearest high school (c, right axis) and nearest college/university (d, right axis). The shaded area displays the 95% confidence intervals for the PGS effect.
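
Two of the calculations mentioned in the captions above can be sketched in a few lines: the Bonferroni threshold quoted for Extended Data Figure 5 (0.05/74) and a sign concordance check on the replication counts in Extended Data Figure 4 (72 of 74 lead SNPs with the anticipated sign). This is only an illustration under stated assumptions. The function names are hypothetical, and the one-sided binomial test against a null probability of 0.5 is a common textbook formulation chosen here for illustration, not necessarily the procedure documented in the Supplementary Information.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def sign_concordance_p(n_concordant: int, n_total: int) -> float:
    """One-sided binomial P value for observing at least n_concordant
    matching signs out of n_total.

    Assumption (not taken from the paper): under the null hypothesis each
    SNP's sign matches with probability 0.5, independently of the others.
    """
    # Count the outcomes with n_concordant or more matches, then divide by
    # the total number of equally likely sign patterns (2 ** n_total).
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


if __name__ == "__main__":
    # Threshold used for the 74 lead SNPs in Extended Data Figure 5.
    print(f"Bonferroni threshold: {bonferroni_threshold(0.05, 74):.1e}")
    # Counts taken from the Extended Data Figure 4 caption.
    print(f"Sign concordance P:   {sign_concordance_p(72, 74):.2e}")
```

The tail sum uses exact integer arithmetic before the final division, which avoids rounding error when the P value is very small.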

Supplementary information

PDF files

  1. Supplementary Information (4.2 MB)

    This file contains Supplementary Text and Data – see contents page for details.

Excel files

  1. Supplementary Data (4.1 MB)

    This file contains Supplementary Tables.

Comments

  1. Comment #68083

    David Whitlock said:

    This is an interesting and daunting analysis, with what I consider to be a surprising result. My null hypothesis in such a study would be that effects of social discrimination would be important, and unless specifically accounted for, would likely dominate.

    I don't see mention as to how educational discrimination and economic discrimination associated with ethnicity was "corrected" for. The phenotype is represented as a linear sum of purely linear weighting factors which are considered to be independent of environmental effects. In other words, the variance due to genetic × environmental effects is considered to be zero.

    We know that discrimination differentially and adversely affects individuals based upon the ethnicity of the person being discriminated against. The social environment does not treat individuals of different ethnicity equivalently. Individuals with SNPs associated with an ethnicity that is discriminated against will experience an environment that is not equivalent to the environment experienced by individuals with an ethnicity that is not discriminated against (and accordingly who do not have those SNPs).

    Without proper accounting for differential effects of ethnicity on social discrimination, phantom heritability will certainly be generated.

    In any case, the "genetic component" found is tiny, and cannot be changed (at present). There are multiple non-genetic interventions that can be implemented that do positively influence educational attainment; higher socioeconomic status, better prenatal, infant, child and young adult healthcare, better schools, reduced violence in neighborhoods. Wouldn't public policy be better served by focusing on interventions we can do to increase educational attainment?

  2. Comment #68085

    David Whitlock said:

    I just noticed a NYT article that illustrates some of the ethnicity differences that need to be accounted for.

    http://www.nytimes.com/interactive/2016/04/29/upshot/money-race-and-success-how-your-school-district-compares.html?_r=1
