Abstract
Both polygenicity (many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield an inflated distribution of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from a true polygenic signal and bias. We have developed an approach, LD Score regression, that quantifies the contribution of each by examining the relationship between test statistics and linkage disequilibrium (LD). The LD Score regression intercept can be used to estimate a more powerful and accurate correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of the inflation in test statistics in many GWAS of large sample size.
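The idea of separating polygenic signal from bias can be illustrated with a small simulation. The sketch below assumes the regression model from the full paper (it is not stated in this abstract), E[chi2_j] = N*h2*l_j/M + N*a + 1, where l_j is the LD Score of variant j, N the sample size, M the number of variants, h2 the heritability and a the contribution of confounding. The function name, the simulated data and all parameter values are illustrative assumptions. The fit is a plain unweighted least-squares fit, which is a simplification and not the authors' estimator.

```python
import numpy as np

def ld_score_regression(chi2, ld_scores, n_samples, n_snps):
    """Unweighted OLS of chi-square statistics on LD Score (simplified sketch).

    Returns (intercept, h2_estimate). Under the assumed model the intercept
    is 1 + N*a, so values above 1 point to confounding, while the slope
    N*h2/M carries the polygenic signal.
    """
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    return intercept, slope * n_snps / n_samples

# Hypothetical simulation parameters, chosen only for illustration.
rng = np.random.default_rng(0)
M, N = 200_000, 50_000
h2_true, intercept_true = 0.2, 1.05

# Assumed LD Score distribution (gamma with mean 100); real LD Scores are
# estimated from a reference panel.
ld = rng.gamma(shape=2.0, scale=50.0, size=M)

# Draw z-scores whose variance matches the assumed expectation, so that
# E[z^2] = intercept + N * h2 * l_j / M for each variant.
expected_chi2 = intercept_true + N * h2_true * ld / M
chi2 = rng.normal(0.0, np.sqrt(expected_chi2)) ** 2

intercept_hat, h2_hat = ld_score_regression(chi2, ld, N, M)
print(f"mean chi2 = {chi2.mean():.2f}")
print(f"intercept = {intercept_hat:.2f}, h2 = {h2_hat:.2f}")
```

In this simulated setting the mean chi-square statistic is far above 1 while the fitted intercept stays close to 1, which is the pattern the abstract attributes to polygenicity as opposed to bias.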
Acknowledgements
We thank P. Sullivan for helpful discussion. This work was supported by US National Institutes of Health grants F32 HG007805 (P.-R.L.), R01 HG006399 (A.L.P.), R03 CA173785 (H.K.F.) and R01 MH094421 (PGC) and by the Fannie and John Hertz Foundation (H.K.F.). Data on coronary artery disease and myocardial infarction were contributed by CARDIoGRAMplusC4D investigators and were downloaded from the Psychiatric Genomics Consortium.
Author information
Affiliations
Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
- Brendan K Bulik-Sullivan
- Po-Ru Loh
- Nick Patterson
- Mark J Daly
- Alkes L Price
- Benjamin M Neale
Analytical and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, USA.
- Brendan K Bulik-Sullivan
- Stephan Ripke
- Mark J Daly
- Benjamin M Neale
Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.
- Brendan K Bulik-Sullivan
- Stephan Ripke
- Mark J Daly
- Benjamin M Neale
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
- Po-Ru Loh
- Hilary K Finucane
- Alkes L Price
Department of Mathematics, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
- Hilary K Finucane
Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia.
- Jian Yang
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
- Alkes L Price
Consortia
Schizophrenia Working Group of the Psychiatric Genomics Consortium
A full list of members and affiliations appears in the Supplementary Note.
Contributions
B.K.B.-S. conceived the idea, analyzed the data, performed the analyses and drafted the manuscript. B.M.N. conceived the idea and drafted the manuscript. M.J.D. conceived the idea and supplied reagents. N.P. conceived the idea and supplied reagents. A.L.P. conceived the idea and supplied reagents. P.-R.L. analyzed the data and performed the analyses. H.K.F. analyzed the data and performed the analyses. S.R. analyzed the data and performed the analyses. J.Y. provided software. All authors provided input and revisions for the final manuscript.
Competing interests
The authors declare no competing financial interests.
Corresponding author
Correspondence to Benjamin M Neale.
Supplementary information
PDF files
- 1. Supplementary Text and Figures: Supplementary Note, Supplementary Figures 1–9 and Supplementary Tables 1–10.