
  • Review Article

Dissecting the genetics of complex traits using summary association statistics

Key Points

  • Summary association statistics from genome-wide association studies (GWAS) are widely available in large sample sizes across hundreds of complex traits. Analyses of such data can yield important insights, motivating the development of new statistical methods in this area.

  • Single variant association analysis (including meta-analyses, conditional association and imputation) can be performed effectively using summary association data. These methods often rely on linkage disequilibrium (LD) information from population reference panels.

  • Summary association data can be used to perform gene-based association tests to identify genes influencing complex traits. In particular, expression quantitative trait loci (eQTLs) can be integrated to identify genes whose expression levels influence complex traits, and rare variant association tests can aggregate evidence of association across multiple rare variants in a gene.

  • Statistical fine-mapping of the causal variant (or variants) at GWAS loci can be performed using summary association data, leveraging information on the strength of association, functional genomic annotations and differences in LD patterns across populations.

  • It is becoming increasingly clear that most complex traits and common diseases have a large number of causal variants with small effects. Summary association statistics can be used to understand these polygenic architectures and leverage them for polygenic risk prediction.

  • Summary association statistics have broad utility in cross-trait analyses, including detecting pleiotropic effects and inferring genetic correlations between traits. Pleiotropic effects can be used in Mendelian randomization analyses to draw inferences about causal relationships among traits.

Abstract

During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.

Figure 1: Illustration of summary association statistics.
Figure 2: TWAS using predicted expression and summary data.
Figure 3: Leveraging functional annotation and trans-ethnic data to improve fine-mapping.

References

  1. Visscher, P. M., Brown, M. A., McCarthy, M. I. & Yang, J. Five years of GWAS discovery. Am. J. Hum. Genet. 90, 7–24 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  2. 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).

  3. Evangelou, E. & Ioannidis, J. P. Meta-analysis methods for genome-wide association studies and beyond. Nat. Rev. Genet. 14, 379–389 (2013).

    CAS  PubMed  Google Scholar 

  4. Lin, D. Y. & Zeng, D. Meta-analysis of genome-wide association studies: no efficiency gain in using individual participant data. Genet. Epidemiol. 34, 60–66 (2010).

    CAS  PubMed  Google Scholar 

  5. Han, B. & Eskin, E. Random-effects model aimed at discovering associations in meta-analysis of genome-wide association studies. Am. J. Hum. Genet. 88, 586–598 (2011). This study introduces a powerful new random-effects meta-analysis method that uses a null model of no heterogeneity.

    CAS  PubMed  PubMed Central  Google Scholar 

  6. Han, B. & Eskin, E. Interpreting meta-analyses of genome-wide association studies. PLoS Genet. 8, e1002555 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  7. Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012). This study demonstrates that conditional association analysis can be performed using summary statistics.

    CAS  PubMed  PubMed Central  Google Scholar 

  8. Wood, A. R. et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, 1173–1186 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  9. Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  10. Shi, H., Kichaev, G. & Pasaniuc, B. Contrasting the genetic architecture of 30 complex traits from summary association data. Am. J. Hum. Genet. 99, 139–153 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  11. Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11, 499–511 (2010).

    CAS  PubMed  Google Scholar 

  12. Wen, X. & Stephens, M. Using linear predictors to impute allele frequencies from summary or pooled genotype data. Ann. Appl. Stat. 4, 1158–1182 (2010). This study is the first to show that Gaussian imputation methods can be applied to summary-level genetic data.

    PubMed  PubMed Central  Google Scholar 

  13. Kostem, E., Lozano, J. A. & Eskin, E. Increasing power of genome-wide association studies by collecting additional single-nucleotide polymorphisms. Genetics 188, 449–460 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  14. Lee, D., Bigdeli, T. B., Riley, B. P., Fanous, A. H. & Bacanu, S. A. DIST: direct imputation of summary statistics for unmeasured SNPs. Bioinformatics 29, 2925–2927 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  15. Pasaniuc, B. et al. Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics 30, 2906–2914 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  16. Xu, Z. et al. DISSCO: direct imputation of summary statistics allowing covariates. Bioinformatics 31, 2434–2442 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  17. Lee, D. et al. DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts. Bioinformatics 31, 3099–3104 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  18. Park, D. S. et al. Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses. Bioinformatics 31, i181–189 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  19. Liu, J. Z. et al. A versatile gene-based test for genome-wide association studies. Am. J. Hum. Genet. 87, 139–145 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  20. Li, M.-X., Gui, H.-S., Kwan, J. S. H. & Sham, P. C. GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am. J. Hum. Genet. 88, 283–293 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  21. Conneely, K. N. & Boehnke, M. So many correlated tests, so little time! Rapid adjustment of P values for multiple correlated tests. Am. J. Hum. Genet. 81, 1158–1168 (2007).

    CAS  PubMed  PubMed Central  Google Scholar 

  22. Hormozdiari, F., Kichaev, G., Yang, W.-Y., Pasaniuc, B. & Eskin, E. Identification of causal genes for complex traits. Bioinformatics 31, i206–i213 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  23. Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).

    CAS  PubMed  Google Scholar 

  24. Nicolae, D. L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).

    PubMed  PubMed Central  Google Scholar 

  25. Nica, A. C. et al. Candidate causal regulatory effects by integration of expression QTLs with complex trait genetic associations. PLoS Genet. 6, e1000895 (2010).

    PubMed  PubMed Central  Google Scholar 

  26. Xiong, Q., Ancona, N., Hauser, E. R., Mukherjee, S. & Furey, T. S. Integrating genetic and gene expression evidence into genome-wide association analysis of gene sets. Genome Res. 22, 386–397 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  27. He, X. et al. Sherlock: detecting gene–disease associations by matching patterns of expression QTL and GWAS. Am. J. Hum. Genet. 92, 667–680 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  28. Huang, Y. T., Liang, L., Moffatt, M. F., Cookson, W. O. C. M. & Lin, X. iGWAS: integrative genome-wide association studies of genetic and genomic data for disease susceptibility using mediation analysis. Genet. Epidemiol. 39, 347–356 (2015).

    PubMed  PubMed Central  Google Scholar 

  29. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014). This study introduces a method for performing TWAS using summary statistics by assessing whether a single causal variant affects both gene expression and trait.

    PubMed  PubMed Central  Google Scholar 

  30. Onengut-Gumuscu, S. et al. Fine mapping of type 1 diabetes susceptibility loci and evidence for colocalization of causal variants with lymphoid gene enhancers. Nat. Genet. 47, 381–386 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  31. Fortune, M. D. et al. Statistical colocalization of genetic risk variants for related autoimmune diseases in the context of common controls. Nat. Genet. 47, 839–846 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  32. Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  33. Lee, D. et al. JEPEG: a summary statistics based tool for gene-level joint testing of functional variants. Bioinformatics 31, 1176–1182 (2015).

    PubMed  Google Scholar 

  34. Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016). This study identifies 69 new genes associated with obesity-related traits using a powerful new method for performing TWAS using summary statistics by assessing the association between predicted gene expression (using all cis SNPs) and trait.

    CAS  PubMed  PubMed Central  Google Scholar 

  35. Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).

    CAS  PubMed  Google Scholar 

  36. Pavlides, J. M. W. et al. Predicting gene targets from integrative analyses of summary data from GWAS and eQTL studies for 28 human complex traits. Genome Med. 8, 84 (2016).

    PubMed  PubMed Central  Google Scholar 

  37. Gibson, G. Rare and common variants: twenty arguments. Nat. Rev. Genet. 13, 135–145 (2011).

    Google Scholar 

  38. Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl Acad. Sci. USA 111, E455–E464 (2014).

    CAS  PubMed  Google Scholar 

  39. Lee, S., Abecasis, G. R., Boehnke, M. & Lin, X. Rare-variant association analysis: study designs and statistical tests. Am. J. Hum. Genet. 95, 5–23 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  40. Lee, S., Teslovich, T. M., Boehnke, M. & Lin, X. General framework for meta-analysis of rare variants in sequencing association studies. Am. J. Hum. Genet. 93, 42–53 (2013). This study is the first of three studies to demonstrate that rare variant burden and overdispersion tests can be performed using summary statistics.

    CAS  PubMed  PubMed Central  Google Scholar 

  41. Hu, Y.-J. et al. Meta-analysis of gene-level associations for rare variants based on single-variant statistics. Am. J. Hum. Genet. 93, 236–248 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  42. Liu, D. J. et al. Meta-analysis of gene-level tests for rare variant association. Nat. Genet. 46, 200–204 (2014).

    CAS  PubMed  Google Scholar 

  43. Faye, L. L., Machiela, M. J., Kraft, P., Bull, S. B. & Sun, L. Re-ranking sequencing variants in the post-GWAS era for accurate causal variant identification. PLoS Genet. 9, e1003609 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  44. Stephens, M. & Balding, D. J. Bayesian statistical methods for genetic association studies. Nat. Rev. Genet. 10, 681–690 (2009).

    CAS  PubMed  Google Scholar 

  45. Wellcome Trust Case Control Consortium et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–1301 (2012). This study uses posterior probabilities of causality to construct credible sets of causal disease-associated SNPs across multiple loci and diseases under a single causal variant per locus assumption.

  46. Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying causal variants at loci with multiple signals of association. Genetics 198, 497–508 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  47. Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).

    PubMed  PubMed Central  Google Scholar 

  48. Chen, W. et al. Fine mapping causal variants with an approximate bayesian method using marginal test statistics. Genetics 200, 719–736 (2015).

    PubMed  PubMed Central  Google Scholar 

  49. Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  50. Newcombe, P. J., Conti, D. V. & Richardson, S. JAM: a scalable bayesian framework for joint analysis of marginal SNP effects. Genet. Epidemiol. 40, 188–201 (2016).

    PubMed  PubMed Central  Google Scholar 

  51. Van de Bunt, M. et al. Evaluating the performance of fine-mapping strategies at common variant GWAS loci. PLoS Genet. 11, e1005535 (2015).

    PubMed  PubMed Central  Google Scholar 

  52. Li, Y. & Kellis, M. Joint Bayesian inference of risk variants and tissue-specific epigenomic enrichments across multiple complex human diseases. Nucleic Acids Res. 44, e144 (2016).

    PubMed  PubMed Central  Google Scholar 

  53. Udler, M. S. et al. FGFR2 variants and breast cancer risk: fine-scale mapping using African American studies and analysis of chromatin conformation. Hum. Mol. Genet. 18, 1692–1703 (2009).

    CAS  PubMed  PubMed Central  Google Scholar 

  54. Udler, M. S., Tyrer, J. & Easton, D. F. Evaluating the power to discriminate between highly correlated SNPs in genetic association studies. Genet. Epidemiol. 34, 463–468 (2010).

    PubMed  Google Scholar 

  55. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  56. Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  57. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  58. Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).

    CAS  PubMed  Google Scholar 

  59. Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014). This study uses a Bayesian hierarchical model to estimate posterior probabilities of causality and to identify functional annotations enriched for disease heritability under a single causal variant per locus assumption.

    CAS  PubMed  PubMed Central  Google Scholar 

  60. Chung, D., Yang, C., Li, C., Gelernter, J. & Zhao, H. GPA: a statistical approach to prioritizing GWAS results by integrating pleiotropy and annotation. PLoS Genet. 10, e1004787 (2014).

    PubMed  PubMed Central  Google Scholar 

  61. Kichaev, G. & Pasaniuc, B. Leveraging functional-annotation data in trans-ethnic fine-mapping studies. Am. J. Hum. Genet. 97, 260–271 (2015). This study shows that fine-mapping accuracy can be improved by leveraging functional annotation data and trans-ethnic samples and modelling multiple causal variants per locus.

    CAS  PubMed  PubMed Central  Google Scholar 

  62. Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).

    CAS  PubMed  Google Scholar 

  63. Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  64. Liu, C.-T. et al. Trans-ethnic meta-analysis and functional annotation illuminates the genetic architecture of fasting glucose and insulin. Am. J. Hum. Genet. 99, 56–75 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  65. Grubert, F. et al. Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell 162, 1051–1065 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  66. Waszak, S. M. et al. Population variation and genetic control of modular chromatin architecture in humans. Cell 162, 1039–1050 (2015).

    CAS  PubMed  Google Scholar 

  67. Zaitlen, N., Pasaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  68. Morris, A. P. Transethnic meta-analysis of genomewide association studies. Genet. Epidemiol. 35, 809–822 (2011).

    PubMed  PubMed Central  Google Scholar 

  69. Ong, R. T.-H., Wang, X., Liu, X. & Teo, Y. Y. Efficiency of trans-ethnic genome-wide meta-analysis and fine-mapping. Eur. J. Hum. Genet. 20, 1300–1307 (2012).

    PubMed  Google Scholar 

  70. Asimit, J. L., Hatzikotoulas, K., McCarthy, M., Morris, A. P. & Zeggini, E. Trans-ethnic study design approaches for fine-mapping. Eur. J. Hum. Genet. 24, 1330–1336 (2016).

    PubMed  PubMed Central  Google Scholar 

  71. Liu, C.-T. et al. Multi-ethnic fine-mapping of 14 central adiposity loci. Hum. Mol. Genet. 23, 4738–4744 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  72. Kuo, J. Z. et al. Trans-ethnic fine mapping identifies a novel independent locus at the 3′ end of CDKAL1 and novel variants of several susceptibility loci for type 2 diabetes in a Han Chinese population. Diabetologia 56, 2619–2628 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  73. Chatterjee, N., Shi, J. & Garcia-Closas, M. Developing and evaluating polygenic risk prediction models for stratified disease prevention. Nat. Rev. Genet. 17, 392–406 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  74. Chatterjee, N. et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat. Genet. 45, 400–405 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  75. International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009). This study uses polygenic risk scores to predict schizophrenia risk with appreciable accuracy, implicating a highly polygenic disease architecture.

  76. Stahl, E. A. et al. Bayesian inference analyses of the polygenic architecture of rheumatoid arthritis. Nat. Genet. 44, 483–489 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  77. Vilhjalmsson, B. J. et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 97, 576–592 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  78. Henderson, C. R. Best linear unbiased estimation and prediction under a selection model. Biometrics 31, 423–447 (1975).

    CAS  PubMed  Google Scholar 

  79. de los Campos, G., Gianola, D. & Allison, D. B. Predicting genetic predisposition in humans: the promise of whole-genome markers. Nat. Rev. Genet. 11, 880–886 (2010).

    CAS  PubMed  Google Scholar 

  80. Speed, D. & Balding, D. J. MultiBLUP: improved SNP-based prediction for complex traits. Genome Res. 24, 1550–1557 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  81. Zhou, X., Carbonetto, P. & Stephens, M. Polygenic modeling with Bayesian sparse linear mixed models. PLoS Genet. 9, e1003264 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  82. Moser, G. et al. Simultaneous discovery, estimation and prediction analysis of complex traits using a Bayesian mixture model. PLoS Genet. 11, e1004969 (2015).

    PubMed  PubMed Central  Google Scholar 

  83. Wray, N. R. et al. Pitfalls of predicting complex traits from SNPs. Nat. Rev. Genet. 14, 507–515 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  84. Palla, L. & Dudbridge, F. A. Fast method that uses polygenic scores to estimate the variance explained by genome-wide marker panels and the proportion of variants affecting a trait. Am. J. Hum. Genet. 97, 250–259 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  85. Bulik-Sullivan, B. K. et al. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  86. Yang, J. et al. Genomic inflation factors under polygenic inheritance. Eur. J. Hum. Genet. 19, 807–812 (2011).

    PubMed  PubMed Central  Google Scholar 

  87. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  88. Loh, P.-R. et al. Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. Nat. Genet. 47, 1385–1392 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  89. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  90. Yang, J. et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet. 43, 519–525 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  91. Cotsapas, C. et al. Pervasive sharing of genetic effects in autoimmune disease. PLoS Genet. 7, e1002254 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  92. Sivakumaran, S. et al. Abundant pleiotropy in human complex diseases and traits. Am. J. Hum. Genet. 89, 607–618 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  93. Styrkársdottir, U. et al. Nonsense mutation in the LGR4 gene is associated with several human diseases and other traits. Nature 497, 517–520 (2013).

    PubMed  Google Scholar 

  94. Denny, J. C. et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat. Biotechnol. 31, 1102–1110 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  95. Gusev, A. et al. Quantifying missing heritability at known GWAS loci. PLoS Genet. 9, e1003993 (2013).

    PubMed  PubMed Central  Google Scholar 

  96. Stefansson, H. et al. CNVs conferring risk of autism or schizophrenia affect cognition in controls. Nature 505, 361–366 (2014).

    CAS  PubMed  Google Scholar 

  97. Pickrell, J. K. et al. Detection and interpretation of shared genetic influences on 42 human traits. Nat. Genet. 48, 709–717 (2016). This study applies a Bayesian framework to identify pleiotropic effects across a broad set of complex traits and diseases.

    CAS  PubMed  PubMed Central  Google Scholar 

  98. Voight, B. F. et al. Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. Lancet 380, 572–580 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  99. Burgess, S., Butterworth, A. & Thompson, S. G. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet. Epidemiol. 37, 658–665 (2013).

    PubMed  PubMed Central  Google Scholar 

  100. Burgess, S., Dudbridge, F. & Thompson, S. G. Combining information on multiple instrumental variables in Mendelian randomization: comparison of allele score and summarized data methods. Stat. Med. 35, 1880–1906 (2016).

    PubMed  Google Scholar 

  101. Lee, S. H. et al. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat. Genet. 45, 984–994 (2013).

    CAS  PubMed  Google Scholar 

  102. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015). This study introduces a new method for estimating genome-wide genetic correlations from summary statistics.

    CAS  PubMed  PubMed Central  Google Scholar 

  103. Brown, B. C. et al. Transethnic genetic-correlation estimates from summary statistics. Am. J. Hum. Genet. 99, 76–88 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  104. Nieuwboer, H. A., Pool, R., Dolan, C. V., Boomsma, D. I. & Nivard, M. G. GWIS: genome-wide inferred statistics for functions of multiple phenotypes. Am. J. Hum. Genet. 99, 917–927 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  105. Hormozdiari, F. et al. Imputing phenotypes for genome-wide association studies. Am. J. Hum. Genet. 99, 89–103 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  106. [No authors listed.] Asking for more. Nat. Genet. 44, 733 (2012).

  107. Homer, N. et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 4, e1000167 (2008).

    PubMed  PubMed Central  Google Scholar 

  108. Yang, J., Zaitlen, N. A., Goddard, M. E., Visscher, P. M. & Price, A. L. Advantages and pitfalls in the application of mixed-model association methods. Nat. Genet. 46, 100–106 (2014).

    PubMed  PubMed Central  Google Scholar 

  109. Sankararaman, S., Obozinski, G., Jordan, M. I. & Halperin, E. Genomic privacy and limits of individual detection in a pool. Nat. Genet. 41, 965–967 (2009).

    CAS  PubMed  Google Scholar 

  110. Visscher, P. M. & Hill, W. G. The limits of individual identification from sample allele frequencies: theory and statistical analysis. PLoS Genet. 5, e1000628 (2009).

    PubMed  PubMed Central  Google Scholar 

  111. Erlich, Y. & Narayanan, A. Routes for breaching and protecting genetic privacy. Nat. Rev. Genet. 15, 409–421 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  112. Madsen, B. E. & Browning, S. R. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 5, e1000384 (2009).

    PubMed  PubMed Central  Google Scholar 

  113. Li, B. & Leal, S. M. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am. J. Hum. Genet. 83, 311–321 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  114. Price, A. et al. Pooled association tests for rare variants in exon resequencing studies. 86, 832–838 (2010).

  115. Neale, B. M. et al. Testing for an unusual distribution of rare variants. PLoS Genet. 7, e1001322 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  116. Wu, M. C. et al. Rare-variant association testing for sequencing data with the sequence kernel association test. Am. J. Hum. Genet. 89, 82–93 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  117. Daetwyler, H. D., Villanueva, B. & Woolliams, J. A. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PloS One 3, e3395 (2008).

    PubMed  PubMed Central  Google Scholar 

  118. Lee, S. H., Wray, N. R., Goddard, M. E. & Visscher, P. M. Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet. 88, 294–305 (2011).

    PubMed  PubMed Central  Google Scholar 

  119. Perry, J. R. et al. Parent-of-origin-specific allelic associations among 106 genomic loci for age at menarche. Nature 514, 92–97 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  120. Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat. Genet. 45, 1452–1458 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  121. Zheng, H. F. et al. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture. Nature 526, 112–117 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  122. Speliotes, E. K. et al. Association analyses of 249,796 individuals reveal 18 new loci associated with body mass index. Nat. Genet. 42, 937–948 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  123. Schunkert, H. et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  124. Jostins, L. et al. Host–microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491, 119–124 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  125. Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat. Genet. 47, 979–986 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  126. Okbay, A. et al. Genetic variants associated with subjective well-being, depressive symptoms, and neuroticism identified through genome-wide analyses. Nat. Genet. 48, 624–633 (2016).

    CAS  PubMed  PubMed Central  Google Scholar 

  127. Tobacco and Genetics Consortium. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat. Genet. 42, 441–447 (2010).

  128. Manning, A. K. et al. A genome-wide approach accounting for body mass index identifies genetic variants influencing fasting glycemic traits and insulin resistance. Nat. Genet. 44, 659–669 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  129. Soranzo, N. et al. Common variants at 10 genomic loci influence hemoglobin A1C levels via glycemic and nonglycemic pathways. Diabetes 59, 3229–3239 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  130. Teslovich, T. M. et al. Biological, clinical and population relevance of 95 loci for blood lipids. Nature 466, 707–713 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  131. Global Lipids Genetics Consortium. Discovery and refinement of loci associated with lipid levels. Nat. Genet. 45, 1274–1283 (2013).

  132. Lango Allen, H. et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature 467, 832–838 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  133. Shungin, D. et al. New genetic loci link adipose and insulin biology to body fat distribution. Nature 518, 187–196 (2015).

  134. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).

  135. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).

  136. Morris, A. P. et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012).

  137. Zheng, J. et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics http://dx.doi.org/10.1093/bioinformatics/btw613 (2016).

Acknowledgements

The authors are grateful to H. Finucane, S. Gazal, N. Mancuso and H. Shi for helpful discussions, and to G. Kichaev and R. Johnson for help with figure 3. The work of the authors is funded by US National Institutes of Health grants R01 HG006399, R01 MH101244, R01 GM105857 and R01 MH107649.

Author information

Correspondence to Bogdan Pasaniuc or Alkes L. Price.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Individual-level data

Genome-wide single nucleotide polymorphism genotypes and trait values for each individual included in a genome-wide association study.

Summary association statistics

Estimated effect sizes and their standard errors for each single nucleotide polymorphism analysed in a genome-wide association study.

z-scores

Association statistics that follow a standard normal distribution under the null model; often computed as per-allele effect sizes divided by their standard errors.
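
As a minimal sketch of this definition, the z-score for a single variant can be computed from its per-allele effect size and standard error, and converted to a two-sided P value under the standard normal null. The function names `z_score` and `two_sided_p_value` and the example numbers are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def z_score(beta: float, se: float) -> float:
    """Per-allele effect size divided by its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return beta / se


def two_sided_p_value(z: float) -> float:
    """Two-sided P value assuming a standard normal null distribution."""
    return 2.0 * NormalDist().cdf(-abs(z))


# Illustrative (made-up) effect size and standard error for one variant.
z = z_score(beta=0.10, se=0.05)
print(z, two_sided_p_value(z))
```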

Meta-analysis

A method for combining data from different studies in which summary association statistics from each study are jointly analysed.
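
One common way to jointly analyse summary statistics is fixed-effect inverse-variance weighting, sketched below for a single variant. This particular weighting scheme is an assumption of the example (the definition above does not prescribe one), and the function name `inverse_variance_meta` and the input values are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one variant.

    betas: per-study effect size estimates
    ses:   per-study standard errors (all positive)
    Returns the combined effect, its standard error and the z-score.
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need one standard error per effect size")
    weights = [1.0 / (se * se) for se in ses]  # precision of each study
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    return beta, se, beta / se


# Two made-up studies reporting the same variant.
print(inverse_variance_meta([0.10, 0.30], [0.10, 0.10]))
```

More precise studies (smaller standard errors) receive larger weights, so the combined estimate is pulled towards them.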

Mega-analysis

A method for combining data from different studies in which individual-level data from each study are merged and jointly analysed.

Summary LD information

(summary linkage disequilibrium information). In-sample correlations between each pair of typed single nucleotide polymorphisms (SNPs) analysed in a genome-wide association study; these can be restricted to proximal pairs of typed SNPs to limit the number of SNP pairs.

Transcriptome-wide association studies

(TWAS). Studies that evaluate the association between the expression of each gene and a trait of interest; predicted expression may be used instead of measured expression to improve practicality.

Mendelian randomization

A method that uses significantly associated single nucleotide polymorphisms as instrumental variables to quantify causal relationships between two traits.
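
A minimal sketch of how summary statistics can feed such an analysis is given below, using per-SNP ratio estimates and an inverse-variance-weighted combination. It assumes that the SNPs are independent, valid instruments and that uncertainty in the SNP effects on the exposure can be ignored; the function names and input values are hypothetical.

```python
def wald_ratios(beta_exposure, beta_outcome):
    """Per-SNP causal estimates: outcome effect divided by exposure effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted combination of the per-SNP ratio estimates.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator


# Made-up effects of three SNPs on an exposure and on an outcome.
bx = [0.20, 0.10, 0.30]
by = [0.10, 0.05, 0.15]
se_y = [0.02, 0.02, 0.03]
print(wald_ratios(bx, by), ivw_estimate(bx, by, se_y))
```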

Burden tests

Gene-based rare variant tests in which all rare variants in a gene are assumed to have the same direction of effect.

Overdispersion tests

Gene-based rare variant tests in which rare variants in a gene are assumed to affect the trait in either direction.
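
The contrast between burden and overdispersion tests can be illustrated with a toy calculation on per-variant z-scores. The sketch assumes that the rare variants in the gene are uncorrelated, which real tests do not require because they account for linkage disequilibrium; the function names, weights and z-scores are hypothetical.

```python
def burden_statistic(z_scores, weights):
    """Squared weighted sum of z-scores, scaled by the sum of squared weights.

    Effects in opposite directions cancel, so the statistic is large only
    when the variants tend to act in the same direction.
    """
    weighted_sum = sum(w * z for w, z in zip(weights, z_scores))
    return weighted_sum ** 2 / sum(w * w for w in weights)


def overdispersion_statistic(z_scores, weights):
    """Weighted sum of squared z-scores.

    Squaring each term first means that effects in either direction
    contribute positively to the statistic.
    """
    return sum((w * z) ** 2 for w, z in zip(weights, z_scores))


same_direction = [2.0, 2.0]
opposite_direction = [2.0, -2.0]
equal_weights = [1.0, 1.0]
print(burden_statistic(same_direction, equal_weights),
      overdispersion_statistic(same_direction, equal_weights))
print(burden_statistic(opposite_direction, equal_weights),
      overdispersion_statistic(opposite_direction, equal_weights))
```

With equal weights, the two statistics agree when both variants act in the same direction, whereas the burden statistic collapses to zero when the effects have opposite signs.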

Posterior probability of causality

The inferred probability that a single nucleotide polymorphism is causal based on association data and optional prior information.
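
As an illustration only, posterior probabilities of causality can be sketched under the simplifying assumption that exactly one variant at the locus is causal, using approximate Bayes factors computed from z-scores and standard errors. The prior effect-size standard deviation `prior_sd` is an arbitrary illustrative value, the optional `priors` argument stands in for prior information such as functional annotations, and all names are hypothetical.

```python
import math


def log_approx_bayes_factor(z, se, prior_sd):
    """Log approximate Bayes factor for association versus the null,
    assuming a normal prior with standard deviation prior_sd on the effect."""
    v = se * se              # sampling variance of the effect estimate
    w = prior_sd * prior_sd  # prior variance of the true effect
    shrinkage = w / (v + w)
    return 0.5 * math.log(1.0 - shrinkage) + 0.5 * z * z * shrinkage


def posterior_causal_probabilities(z_scores, ses, prior_sd=0.2, priors=None):
    """Posterior probability that each SNP is the single causal variant."""
    n = len(z_scores)
    if priors is None:
        priors = [1.0 / n] * n  # flat prior when no annotation is supplied
    log_terms = [
        math.log(p) + log_approx_bayes_factor(z, se, prior_sd)
        for p, z, se in zip(priors, z_scores, ses)
    ]
    largest = max(log_terms)  # subtract the maximum for numerical stability
    unnormalised = [math.exp(t - largest) for t in log_terms]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Made-up z-scores for three SNPs at one locus with equal standard errors.
print(posterior_causal_probabilities([5.0, 2.0, 1.0], [0.05, 0.05, 0.05]))
```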

Polygenic risk scores

A method of predicting a trait by summing, over all markers below a P value threshold, the marginal effect estimated in a training sample multiplied by the marker genotype in a validation sample.
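
The definition above can be sketched for a single individual in a validation sample as follows. Genotypes are assumed to be coded as allele counts (0, 1 or 2); the function name `polygenic_risk_score` and the input values are hypothetical.

```python
def polygenic_risk_score(genotypes, effects, p_values, p_threshold):
    """Sum of genotype x marginal effect over markers below a P value threshold.

    genotypes:   allele counts for one individual in the validation sample
    effects:     marginal effect estimates from the training sample
    p_values:    association P values from the training sample
    p_threshold: markers with P value below this threshold are included
    """
    if not (len(genotypes) == len(effects) == len(p_values)):
        raise ValueError("inputs must have one entry per marker")
    return sum(
        g * b
        for g, b, p in zip(genotypes, effects, p_values)
        if p < p_threshold
    )


# Three made-up markers; only the first two pass the threshold of 0.05.
genotypes = [0, 1, 2]
effects = [0.20, -0.10, 0.05]
p_values = [1e-9, 0.03, 0.6]
print(polygenic_risk_score(genotypes, effects, p_values, p_threshold=0.05))
```

In practice the threshold is a tuning parameter, and scores computed at several thresholds are often compared.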

LD score regression

A method of assessing trait polygenicity by regressing χ² association statistics against linkage disequilibrium (LD) scores for each single nucleotide polymorphism (SNP), where the LD score of a SNP is computed as the sum of its squared correlations with all SNPs, including itself.
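
The two ingredients of this definition, LD scores and the regression of χ² statistics on them, can be sketched as below. The sketch uses a full SNP correlation matrix and unweighted least squares for simplicity; both choices are assumptions of this example rather than a description of the published estimator, and the function names and input values are hypothetical.

```python
def ld_scores(correlation_matrix):
    """LD score of each SNP: sum of squared correlations with all SNPs,
    including itself (the diagonal contributes 1)."""
    return [sum(r * r for r in row) for row in correlation_matrix]


def least_squares_fit(x, y):
    """Ordinary least squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up correlation matrix for three SNPs and made-up chi-squared statistics.
r = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.2],
    [0.0, 0.2, 1.0],
]
scores = ld_scores(r)
chi2 = [1.3, 1.4, 1.1]
intercept, slope = least_squares_fit(scores, chi2)
print(scores, intercept, slope)
```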

Pleiotropy

The existence of a genetic variant (or variants) that affects more than one trait.

Genetic correlation

The signed correlation across single nucleotide polymorphisms between causal effect sizes for two traits.
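
As a toy illustration of this definition, the correlation can be computed directly when the causal effect sizes for both traits are known, as in a simulation. True causal effects are not observed in real data and must be inferred from association statistics, so the sketch below only illustrates the quantity being estimated; the function name and input values are hypothetical.

```python
import math


def effect_size_correlation(effects_a, effects_b):
    """Pearson correlation between per-SNP effect sizes for two traits."""
    n = len(effects_a)
    if n != len(effects_b) or n < 2:
        raise ValueError("need at least two SNPs with effects for both traits")
    mean_a = sum(effects_a) / n
    mean_b = sum(effects_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(effects_a, effects_b))
    var_a = sum((a - mean_a) ** 2 for a in effects_a)
    var_b = sum((b - mean_b) ** 2 for b in effects_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("effect sizes must vary across SNPs")
    return cov / math.sqrt(var_a * var_b)


# Made-up causal effects of four SNPs on two traits.
trait_a = [0.10, -0.05, 0.02, 0.00]
trait_b = [0.08, -0.02, 0.03, -0.01]
print(effect_size_correlation(trait_a, trait_b))
```

The sign of the result indicates whether alleles that increase one trait tend to increase or decrease the other.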

About this article

Cite this article

Pasaniuc, B., Price, A. Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet 18, 117–127 (2017). https://doi.org/10.1038/nrg.2016.142

  • DOI: https://doi.org/10.1038/nrg.2016.142
