Technical Report

LD Score regression distinguishes confounding from polygenicity in genome-wide association studies

Nature Genetics volume 47, pages 291–295 (2015)

Abstract

Both polygenicity (many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from a true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size.
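The relationship the abstract describes can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation. It assumes the regression model from the paper's main text, which this abstract does not spell out: E[χ² | ℓ_j] = N h² ℓ_j / M + N a + 1, where ℓ_j is the LD Score of SNP j, N is the sample size, M is the number of SNPs, h² is the heritability and a is the per-sample contribution of confounding. The helper names, the parameter values and the uniform distribution of LD Scores are all hypothetical, and the fit uses plain unweighted least squares as a simplification.

```python
import random


def simulate_chi2(ld_scores, n_samples, h2, confounding, rng):
    """Draw chi-square statistics whose mean follows the assumed LD Score model."""
    m = len(ld_scores)
    stats = []
    for ld in ld_scores:
        # Assumed model: E[chi2] = 1 + N*a + N*h2*ld/M
        expected = 1.0 + n_samples * confounding + n_samples * h2 * ld / m
        z = rng.gauss(0.0, 1.0)
        # z*z is a 1-df chi-square with mean 1, so the product has mean `expected`.
        stats.append(expected * z * z)
    return stats


def ols(x, y):
    """Unweighted simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical parameters chosen only for illustration.
rng = random.Random(0)
M = 50_000   # number of simulated SNPs
N = 1_000    # GWAS sample size
H2 = 0.5     # true heritability (polygenic signal)
A = 1e-4     # confounding term, so the true intercept is 1 + N*A = 1.1

ld_scores = [rng.uniform(1.0, 200.0) for _ in range(M)]
chi2 = simulate_chi2(ld_scores, N, H2, A, rng)

intercept, slope = ols(ld_scores, chi2)
h2_estimate = slope * M / N   # the slope is N*h2/M under the assumed model
mean_chi2 = sum(chi2) / M     # overall inflation: confounding plus polygenicity

print(f"mean chi2  = {mean_chi2:.3f}")
print(f"intercept  = {intercept:.3f}  (confounding only)")
print(f"h2 (slope) = {h2_estimate:.3f}")
```

In this sketch the mean χ² is inflated by both sources, while the fitted intercept reflects only the confounding term. That is why the intercept, rather than the overall inflation, is the quantity the abstract proposes as a correction factor.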

Acknowledgements

We would like to thank P. Sullivan for helpful discussion. This work was supported by US National Institutes of Health grants F32 HG007805 (P.-R.L.), R01 HG006399 (A.L.P.), R03 CA173785 (H.K.F.) and R01 MH094421 (PGC) and by the Fannie and John Hertz Foundation (H.K.F.). Data on coronary artery disease and myocardial infarction were contributed by CARDIoGRAMplusC4D investigators and were downloaded from Psychiatric Genomics Consortium.

Author information

Affiliations

  1. Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    Brendan K Bulik-Sullivan, Po-Ru Loh, Nick Patterson, Mark J Daly, Alkes L Price & Benjamin M Neale

  2. Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA.

    Brendan K Bulik-Sullivan, Stephan Ripke, Mark J Daly & Benjamin M Neale

  3. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    Brendan K Bulik-Sullivan, Stephan Ripke, Mark J Daly & Benjamin M Neale

  4. Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

    Po-Ru Loh, Hilary K Finucane & Alkes L Price

  5. Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.

    Hilary K Finucane

  6. Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia.

    Jian Yang

  7. Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

    Alkes L Price

Consortia

  1. Schizophrenia Working Group of the Psychiatric Genomics Consortium

    A full list of members and affiliations appears in the Supplementary Note.

Authors

  1. Brendan K Bulik-Sullivan

  2. Po-Ru Loh

  3. Hilary K Finucane

  4. Stephan Ripke

  5. Jian Yang

  6. Nick Patterson

  7. Mark J Daly

  8. Alkes L Price

  9. Benjamin M Neale

Contributions

B.K.B.-S. conceived the idea, analyzed the data, performed the analyses and drafted the manuscript. B.M.N. conceived the idea and drafted the manuscript. M.J.D. conceived the idea and supplied reagents. N.P. conceived the idea and supplied reagents. A.L.P. conceived the idea and supplied reagents. P.-R.L. analyzed the data and performed the analyses. H.K.F. analyzed the data and performed the analyses. S.R. analyzed the data and performed the analyses. J.Y. provided software. All authors provided input and revisions for the final manuscript.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Benjamin M Neale.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Note, Supplementary Figures 1–9 and Supplementary Tables 1–10.

About this article

DOI

https://doi.org/10.1038/ng.3211
