A gene-based association method for mapping traits using reference transcriptome data

Abstract

Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual's genetic profile and correlates 'imputed' gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys the benefits of gene-based approaches such as reduced multiple-testing burden and a principled approach to the design of follow-up experiments. Our results demonstrate that PrediXcan can detect known and new genes associated with disease traits and provide insights into the mechanism of these associations.
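The two steps described in the abstract (imputing genetically regulated expression from genotype, then correlating it with the phenotype) can be illustrated with a minimal sketch. It is not the authors' implementation. The SNP identifiers, weights, dosages and phenotype values are invented for illustration, the weights stand in for a prediction model trained elsewhere on reference transcriptome data, treating a missing SNP as dosage 0 is a simplifying assumption, and a simple linear test for a quantitative trait is assumed.

```python
import math

def impute_expression(dosages, weights):
    """Predicted expression of one gene for each individual.

    dosages: list of dicts {snp_id: allele dosage in [0, 2]}, one per person
    weights: dict {snp_id: weight} from a hypothetical pre-trained model
    A SNP missing for a person is treated as dosage 0 (an assumption).
    """
    return [sum(w * person.get(snp, 0.0) for snp, w in weights.items())
            for person in dosages]

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def association_t(expr, phenotype):
    """t statistic (n - 2 degrees of freedom) for phenotype ~ imputed expression."""
    r = pearson_r(expr, phenotype)
    n = len(expr)
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical model weights and toy genotype/phenotype data.
weights = {"rs_a": 0.5, "rs_b": -0.2}
dosages = [
    {"rs_a": 0, "rs_b": 2}, {"rs_a": 1, "rs_b": 1}, {"rs_a": 2, "rs_b": 0},
    {"rs_a": 1, "rs_b": 2}, {"rs_a": 2, "rs_b": 1}, {"rs_a": 0, "rs_b": 0},
]
phenotype = [1.0, 2.1, 3.2, 1.9, 2.8, 1.4]

expr = impute_expression(dosages, weights)
t_stat = association_t(expr, phenotype)
print(expr, t_stat)
```

Because the test is run once per gene rather than once per SNP, the number of tests is the number of genes with a usable prediction model, which is the source of the reduced multiple-testing burden mentioned above.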

Figure 1: Mechanism tested by the PrediXcan method.
Figure 2: PrediXcan framework.
Figure 3: Cross-validated prediction performance versus heritability.
Figure 4: Prediction performance of elastic net tested on a separate cohort.
Figure 5: Examples of well-predicted genes.
Figure 6: PrediXcan results for T1D.
Figure 7: Comparison of gene-based methods.

Acknowledgements

We thank A. Konkashbaev and C. Fuchsberger for outstanding technical support and N. Knoblauch for assistance in performing the quality control pipeline. We acknowledge the following US National Institutes of Health grants: K12 CA139160 (H.K.I.), T32 MH020065 (K.P.S.), F32 CA165823 (H.E.W.), R01 MH101820 and R01 MH090937 (GTEx), P30 DK20595 and P60 DK20595 (Diabetes Research and Training Center), P50 DA037844 (Rat Genomics), U01 GM61393 (Pharmacogenomics of Anticancer Agents Research), P50 MH094267 (Conte), U01 GM092691 (J.C.D.) and U19 HL065962 (PGRN Statistical Analysis Resource). Additional acknowledgments can be found in the Supplementary Note.

Author information

Contributions

H.K.I., H.E.W., E.R.G., K.P.S., S.V.M. and K.A.-M. performed the analyses. J.C.D., R.J.C. and A.E.E. provided replication data. E.R.G., H.E.W., K.P.S. and H.K.I. wrote the manuscript. D.L.N., N.J.C. and H.K.I. provided intellectual input and supervised the study. H.K.I. designed the study. All authors reviewed and contributed to the final manuscript.

Corresponding author

Correspondence to Hae Kyung Im.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

A full list of members and affiliations appears in the Supplementary Note.

Integrated supplementary information

Supplementary Figure 1 Comparison of tenfold cross-validated predictive performance between all tested methods (LASSO, elastic net with α = 0.5, top SNP, polygenic score at several P-value thresholds) in the DGN whole-blood cohort.

Predictive performance was measured by the R² value between predicted (GReX) and observed expression.
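As a rough illustration of how a tenfold cross-validated R² can be computed, the sketch below uses the simplest of the compared predictors, a single "top SNP" regression. The data are simulated, the SNP is assumed to be chosen in advance (in practice the selection should happen inside each training fold to avoid leakage), and pooling held-out predictions before computing R² is one of several reasonable conventions, not necessarily the one used for this figure.

```python
import math
import random

def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx

def cross_validated_r2(dosage, expression, k=10):
    """Predict each sample from a model fit on the other k-1 folds,
    then compare the pooled held-out predictions with observed expression."""
    n = len(dosage)
    pred = [0.0] * n
    for fold in range(k):
        test = set(range(fold, n, k))            # round-robin fold assignment
        train = [i for i in range(n) if i not in test]
        slope, intercept = fit_line([dosage[i] for i in train],
                                    [expression[i] for i in train])
        for i in test:
            pred[i] = intercept + slope * dosage[i]
    return pearson_r2(pred, expression)

# Simulated data: one SNP with a strong effect on expression (illustrative only).
rng = random.Random(1)
dosage = [rng.choice([0, 1, 2]) for _ in range(40)]
expression = [d + rng.gauss(0, 0.3) for d in dosage]

cv_r2 = cross_validated_r2(dosage, expression)
print(round(cv_r2, 3))
```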

Supplementary Figure 2 Comparison of tenfold cross-validated predictive performance of elastic net in different starting SNP sets (4.6 million 1000 Genomes Project (TGP) SNPs, 1.9 million HapMap Phase 2 SNPs, 331,800 WTCCC genotyped SNPs) in the DGN whole-blood cohort.

Predictive performance was measured by the R² value between predicted (GReX) and observed expression.

Supplementary Figure 3 Comparison of predicted levels of expression with observed levels from nine tissues of the GTEx pilot project.

The observed squared correlation between predicted and observed gene expression levels, R², is plotted against the null distribution of R².
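One way to build such a comparison is sketched below. The caption does not say how the null distribution was obtained, so this sketch assumes a permutation null: for each gene, the observed expression values are shuffled across individuals before R² is recomputed, and the sorted null values are paired with the sorted observed values. Gene names and data are simulated.

```python
import random

def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def null_vs_observed_r2(pred_by_gene, obs_by_gene, seed=0):
    """Return (sorted null R2, sorted observed R2), one value per gene.

    The null value for a gene is the R2 obtained after shuffling the observed
    expression across individuals, which breaks any genotype link.
    """
    rng = random.Random(seed)
    null, observed = [], []
    for gene, pred in pred_by_gene.items():
        obs = obs_by_gene[gene]
        observed.append(pearson_r2(pred, obs))
        shuffled = list(obs)
        rng.shuffle(shuffled)
        null.append(pearson_r2(pred, shuffled))
    return sorted(null), sorted(observed)

# Simulated predictions and observations for five hypothetical genes.
rng = random.Random(42)
pred_by_gene, obs_by_gene = {}, {}
for g in range(5):
    pred = [rng.gauss(0, 1) for _ in range(20)]
    pred_by_gene[f"gene{g}"] = pred
    obs_by_gene[f"gene{g}"] = [0.5 * p + rng.gauss(0, 1) for p in pred]

null_r2, observed_r2 = null_vs_observed_r2(pred_by_gene, obs_by_gene)
for x, y in zip(null_r2, observed_r2):
    print(round(x, 3), round(y, 3))  # x axis: null, y axis: observed
```

Points lying above the diagonal indicate genes predicted better than expected by chance.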

Supplementary Figure 4 Comparison of prediction performance between local- and distal-based prediction models.

Using whole-blood prediction models trained in DGN, we compared predicted levels of expression with observed levels in GTEx whole blood. Local predictors were generated using elastic net on SNPs within 1 Mb of each gene, and distal predictors included any trans eQTLs outside this region with linear regression P < 1 × 10⁻⁵. The observed (y axis) squared correlation between predicted and observed gene expression levels, R², is plotted against the null distribution of R² (x axis).

Supplementary Figure 5 Quantile-quantile plot of the association P values from the PrediXcan analysis of six remaining WTCCC diseases using expression levels imputed from DGN whole blood.

The red line in each panel shows the null expected distribution of P values, and the blue line represents the Bonferroni-corrected genome-wide significance threshold. For each disease, the top three genes that exceed the Bonferroni significance threshold are labeled. The diseases shown are (a) rheumatoid arthritis, (b) Crohn's disease, (c) bipolar disorder, (d) coronary artery disease, (e) hypertension and (f) type 2 diabetes.
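The quantities behind such a plot can be sketched as follows. The significance level of 0.05 and the plotting positions (i + 0.5)/n for the expected quantiles are common conventions assumed here, not details taken from the caption, and the gene names and P values are invented.

```python
import math

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P-value threshold controlling family-wise error at alpha."""
    return alpha / n_tests

def qq_points(p_values):
    """(expected, observed) pairs on the -log10 scale, smallest P first.

    Assumes every P value is strictly positive.
    """
    n = len(p_values)
    ordered = sorted(p_values)
    return [(-math.log10((i + 0.5) / n), -math.log10(p))
            for i, p in enumerate(ordered)]

# Invented gene-level P values for illustration.
gene_p = {"GENE_A": 1e-8, "GENE_B": 0.03, "GENE_C": 0.4, "GENE_D": 2e-7}

threshold = bonferroni_threshold(len(gene_p))
# Up to three most significant genes that pass the threshold, as labeled in the figure.
hits = [g for g in sorted(gene_p, key=gene_p.get) if gene_p[g] < threshold][:3]
points = qq_points(list(gene_p.values()))
print(threshold, hits)
```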

Supplementary Figure 6 Plot of the association P values based on genomic position from the PrediXcan analysis of six remaining WTCCC diseases using expression levels imputed from DGN whole blood.

The blue line in each panel represents the Bonferroni-corrected genome-wide significance threshold. For each disease, the top three genes that exceed the Bonferroni significance threshold are labeled. The diseases shown are (a) rheumatoid arthritis, (b) Crohn's disease, (c) bipolar disorder, (d) coronary artery disease, (e) hypertension and (f) type 2 diabetes.

Supplementary Figure 7 Enrichment of known disease genes.

Each plot shows the null expected distribution for the number of genes expected to fall below a P-value threshold of 0.01. The null distribution was derived via 10,000 random permutations. The large point on the horizontal axis of each plot shows the observed number of previously known disease genes that fall below the P-value threshold. The diseases shown are (a) rheumatoid arthritis, (b) Crohn's disease, (c) bipolar disorder, (d) coronary artery disease, (e) hypertension and (f) type 2 diabetes.
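A permutation test of this kind can be sketched as below. The caption gives the threshold (0.01) and the number of permutations (10,000) but not the permutation scheme, so the sketch assumes that each permutation draws a random gene set of the same size as the known-gene set. Gene names and P values are invented, and fewer permutations are used to keep the example fast.

```python
import random

def enrichment_permutation(p_by_gene, known_genes, threshold=0.01,
                           n_perm=10000, seed=0):
    """Compare the number of known disease genes with P < threshold
    against counts from random gene sets of the same size.

    Returns (observed count, list of null counts, empirical P value).
    """
    rng = random.Random(seed)
    genes = list(p_by_gene)
    known = [g for g in known_genes if g in p_by_gene]
    observed = sum(p_by_gene[g] < threshold for g in known)
    null_counts = []
    for _ in range(n_perm):
        sample = rng.sample(genes, len(known))
        null_counts.append(sum(p_by_gene[g] < threshold for g in sample))
    # Add-one correction keeps the empirical P value strictly positive.
    exceed = sum(c >= observed for c in null_counts)
    return observed, null_counts, (exceed + 1) / (n_perm + 1)

# Invented example: 200 genes, the 10 "known" genes all have small P values.
p_by_gene = {f"g{i}": (0.001 if i < 10 else 0.5) for i in range(200)}
known_genes = [f"g{i}" for i in range(10)]

observed, null_counts, emp_p = enrichment_permutation(
    p_by_gene, known_genes, n_perm=2000)
print(observed, max(null_counts), emp_p)
```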

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7 and Supplementary Note. (PDF 1472 kb)

Supplementary Table 1

Supplementary Table 1. (XLSX 69 kb)

Cite this article

Gamazon, E., Wheeler, H., Shah, K. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 47, 1091–1098 (2015). https://doi.org/10.1038/ng.3367

  • DOI: https://doi.org/10.1038/ng.3367
