  • Analysis
Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets

Abstract

Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with human complex traits. However, the genes or functional DNA elements through which these variants exert their effects on the traits are often unknown. We propose a method (called SMR) that integrates summary-level data from GWAS with data from expression quantitative trait locus (eQTL) studies to identify genes whose expression levels are associated with a complex trait because of pleiotropy. We apply the method to five human complex traits using GWAS data on up to 339,224 individuals and eQTL data on 5,311 individuals, and we prioritize 126 genes (for example, TRAF1 and ANKRD55 for rheumatoid arthritis and SNX19 and NMRAL1 for schizophrenia), of which 25 genes are new candidates; 77 genes are not the nearest annotated gene to the top associated GWAS SNP. These genes provide important leads to design future functional studies to understand the mechanism whereby DNA variation leads to complex trait variation.
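The core SMR calculation can be illustrated with a minimal sketch. It assumes the test statistic as we understand it from the published method: the effect of expression on the trait is estimated as b_xy = b_zy / b_zx at the top cis-eQTL, and T_SMR = z_zy² z_zx² / (z_zy² + z_zx²) is compared against a chi-squared distribution with 1 degree of freedom. The function name, argument names and the numbers in the usage note are illustrative only and are not taken from the SMR software.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """Sketch of an SMR-style test from summary statistics at one SNP.

    b_zx, se_zx: SNP effect on gene expression and its standard error (eQTL study).
    b_zy, se_zy: SNP effect on the trait and its standard error (GWAS).
    Returns (b_xy, se_xy, t_smr, p_smr). Assumes b_zx is not zero.
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy

    # Ratio estimate of the effect of expression on the trait.
    b_xy = b_zy / b_zx

    # Delta-method approximation of the sampling variance of the ratio.
    var_xy = (b_zy ** 2) * (se_zx ** 2) / (b_zx ** 4) + (se_zy ** 2) / (b_zx ** 2)
    se_xy = math.sqrt(var_xy)

    # Approximate chi-squared (1 d.f.) statistic; equal to b_xy**2 / var_xy.
    t_smr = (z_zy ** 2) * (z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)

    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_smr = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, se_xy, t_smr, p_smr
```

For example, `smr_test(0.5, 0.05, 0.1, 0.02)` uses made-up inputs with z_zx = 10 and z_zy = 5. The resulting statistic is bounded above by the smaller of the two squared z statistics, which is why a strong eQTL instrument matters for power.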

Figure 1: Association between gene expression and phenotype through genotypes.
Figure 2: Manhattan plots of SMR tests for association between gene expression and complex traits.
Figure 3: Prioritizing genes at a GWAS locus using SMR analysis.
Figure 4: Prioritizing genes at the TRAF1-C5 locus for rheumatoid arthritis.
Figure 5: The SNX19 locus for schizophrenia.

Acknowledgements

We thank J. McGrath for helpful comments. This research was supported by the Australian Research Council (DP130102666), the Australian National Health and Medical Research Council (grants 1078037, 1048853 and 1046880) and the Sylvia and Charles Viertel Charitable Foundation. This study makes use of data from the database of Genotypes and Phenotypes (dbGaP) available under accession phs000090.v3.p1 (see the Supplementary Note for the full set of acknowledgments for these data).

Author information

Contributions

J.Y. conceived and designed the study. J.Y. and Z.Z. derived the theories. Z.Z. performed simulations and statistical analyses. F.Z., Z.Z. and J.Y. developed the software tool. H.H., A.B., M.R.R., J.E.P., G.W.M., M.E.G., N.R.W. and P.M.V. contributed by providing statistical support and/or advice on interpretation of results. J.E.P., G.W.M. and P.M.V. provided the Brisbane Systems Genetics Study data. J.Y. and Z.Z. wrote the manuscript with the participation of all authors.

Corresponding author

Correspondence to Jian Yang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 QQ plots of P values from the SMR tests under a range of simulation scenarios.

The simulation strategy is described in the Supplementary Note. Shown are the results from 10,000 simulations. MR one sample, Mendelian randomization analysis in one sample; SMR one sample, summary data–based Mendelian randomization analysis in one sample; SMR two samples, the effect sizes of a SNP on gene expression (b_zx) and a SNP on phenotype (b_zy) are estimated from two independent samples.
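As a rough illustration of how the points of a QQ plot of P values are obtained, the sketch below pairs each sorted observed P value with its expected quantile under the null hypothesis. The function name and the choice of rank / (n + 1) as the expected quantile are assumptions made for this example, not details taken from the Supplementary Note.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale for a QQ plot.

    Under the null hypothesis P values are uniform on (0, 1), so the k-th
    smallest of n P values is expected to be close to k / (n + 1).
    All P values must be strictly greater than zero.
    """
    n = len(p_values)
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected = rank / (n + 1)
        points.append((-math.log10(expected), -math.log10(p)))
    return points
```

Points that lie along the diagonal indicate a well-calibrated test; systematic departure above the diagonal would suggest inflated test statistics.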

Supplementary Figure 2 Prioritizing genes at the NMRAL1 locus for schizophrenia.

(a) Top, P values from the GWAS for schizophrenia (ref. 1; brown dots) and P values from SMR tests (diamonds) using Westra eQTL data (ref. 2). Bottom, P values from the Westra and Ramasamy eQTL studies (ref. 3) for NMRAL1. Shown in a are all the SNPs available in the GWAS and eQTL data. (b,c) Effect sizes of the SNPs (used for the HEIDI test) from the GWAS against those from the Westra and Ramasamy eQTL studies. The orange dashed lines represent the estimate of b_xy at the top cis-eQTL (rather than the regression line). Error bars are the standard errors of SNP effects.

Supplementary Figure 3 Genes with multiple tagging probes that pass the SMR and HEIDI tests—the ANKRD55 locus for rheumatoid arthritis.

(a) Top, dots in brown represent P values for SNPs from the latest GWAS for rheumatoid arthritis (ref. 4), diamonds represent P values for probes from the SMR analysis and triangles represent probes without a cis-eQTL at P_eQTL < 5.0 × 10⁻⁸. Probes that passed the HEIDI test (P_HEIDI ≥ 0.05) are highlighted in red. Bottom, P values from eQTL studies for the two probes that both tag ANKRD55. Shown in a are all SNPs available in the GWAS and eQTL summary data (rather than those in common). (b,c) Effect sizes of the SNPs (used in the HEIDI test) from the GWAS against those from the eQTL study for the two probes. The orange dashed lines represent the estimate of b_xy at the top cis-eQTL. Error bars are the standard errors of SNP effects. (d) Plot of expression levels of the two probes against each other in the data (328 unrelated individuals) from the Brisbane Systems Genetics Study (ref. 5).
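The probe selection described in this caption can be sketched as a simple filter. The eQTL and HEIDI thresholds below are the ones stated in the caption; the SMR significance threshold is left as a required argument because it is not given here, and the dictionary keys and example values are hypothetical.

```python
def passes_smr_and_heidi(probe, smr_threshold,
                         eqtl_threshold=5.0e-8, heidi_threshold=0.05):
    """Return True if a probe would be kept under the criteria in the caption.

    probe: dict with hypothetical keys 'p_eqtl', 'p_smr' and 'p_heidi'.
    smr_threshold: significance threshold for the SMR test (caller-supplied).
    """
    has_cis_eqtl = probe["p_eqtl"] < eqtl_threshold         # usable instrument
    smr_significant = probe["p_smr"] < smr_threshold        # associated with trait
    no_heterogeneity = probe["p_heidi"] >= heidi_threshold  # HEIDI test not rejected
    return has_cis_eqtl and smr_significant and no_heterogeneity


# Made-up probes, for illustration only.
example_probes = [
    {"id": "probe_A", "p_eqtl": 1e-20, "p_smr": 1e-7, "p_heidi": 0.40},
    {"id": "probe_B", "p_eqtl": 1e-20, "p_smr": 1e-7, "p_heidi": 0.001},
    {"id": "probe_C", "p_eqtl": 1e-4, "p_smr": 1e-7, "p_heidi": 0.40},
]
kept = [p["id"] for p in example_probes if passes_smr_and_heidi(p, smr_threshold=1e-5)]
```

Note the direction of the HEIDI criterion: a probe is retained when the heterogeneity test is not significant, whereas the other two criteria require small P values.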

Supplementary Figure 4 Power of detecting heterogeneity in b_xy as a function of LD between two causal variants in simulations.

We simulated data under the alternative hypothesis that there are two distinct causal variants, one affecting gene expression and one affecting the trait. The method for simulation is described in the Supplementary Note.

Supplementary Figure 5 Distribution of LD between the top associated GWAS SNPs and the simulated causal eQTLs.

The results are from an SMR analysis of real GWAS data for five complex traits (Supplementary Table 6) and simulated eQTL data. The eQTL data were simulated on the basis of real genotypes, mimicking the Westra study but with the causal variants of cis-eQTLs randomly placed across the genome (see the Supplementary Note for details). There were a number of simulated probes that passed the SMR and HEIDI tests (Supplementary Table 14). For these probes, the overlaps between the GWAS and eQTL signals are due to the strong LD between the GWAS causal variants and the causal eQTLs. Unfortunately, the GWAS causal variants are unknown. Shown are LD r² values for the top associated GWAS SNPs and the simulated causal eQTLs in these probe regions, which are likely to be underestimations of LD between the causal variants. For regions where there were multiple independent GWAS signals as indicated by a GCTA-COJO analysis (ref. 6), we selected the top GWAS SNP, conditioning on the secondary GWAS signal (a secondary GWAS signal was defined as the top associated GWAS SNP in the region when conditioning on the GWAS SNP used in the SMR test).

Supplementary Figure 6 Estimating SNP effects from z statistics.

The reported GWAS effect sizes are equivalent to the SNP effects reported in the GWAS summary data. The estimated GWAS effect sizes are equivalent to the SNP effects estimated from z statistics using the sample size of the whole study rather than the per-SNP sample size reported in the summary data (Supplementary Note). This is just a demonstration of the difference between the reported estimate of effect size and the effect size estimated from z statistics without taking the per-SNP sample size into account. In our SMR analyses for height, we used the reported estimate of effect size.
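The conversion from z statistics to effect sizes can be sketched as follows. This assumes the approximation we understand the method to use for a standardized phenotype, b = z / sqrt(2p(1 − p)(n + z²)) with standard error 1 / sqrt(2p(1 − p)(n + z²)), where p is the allele frequency and n the sample size. The function name and the numbers in the usage note are illustrative.

```python
import math


def effect_from_z(z, freq, n):
    """Approximate a SNP effect and its standard error from a z statistic.

    z: z statistic of the SNP association.
    freq: allele frequency p (strictly between 0 and 1).
    n: sample size used for this SNP.
    Assumes the phenotype is standardized to unit variance.
    """
    denom = math.sqrt(2.0 * freq * (1.0 - freq) * (n + z * z))
    return z / denom, 1.0 / denom
```

For instance, `effect_from_z(5.0, 0.5, 9975)` and `effect_from_z(5.0, 0.5, 39975)` use the same made-up z statistic with a smaller per-SNP sample size and a larger whole-study sample size; the second call yields an estimate half as large, which is the kind of discrepancy the figure demonstrates.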

Supplementary Figure 7 An example of pleiotropic signal being diluted by multiple associated SNPs in GWAS.

(a) Effect sizes of the SNPs (used in the HEIDI test) from the latest GWAS for height (ref. 7) against those from the Westra eQTL study (ref. 2) at the SPI1 locus. We detected a secondary signal in the GWAS data. (b) Effect sizes conditioning on the secondary signal. The orange dashed lines represent the estimate of b_xy at the top cis-eQTL. Error bars are the standard errors of SNP effects.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7 and Supplementary Note. (PDF 4402 kb)

Supplementary Tables 1–16

Supplementary Tables 1–16. (XLSX 161 kb)

Cite this article

Zhu, Z., Zhang, F., Hu, H. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet 48, 481–487 (2016). https://doi.org/10.1038/ng.3538

  • DOI: https://doi.org/10.1038/ng.3538
