During the past decade, genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.