Opportunities and challenges for transcriptome-wide association studies

Abstract

Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and gene expression datasets to identify gene–trait associations. In this Perspective, we explore properties of TWAS as a potential approach to prioritize causal genes at GWAS loci, by using simulations and case studies of literature-curated candidate causal genes for schizophrenia, low-density-lipoprotein cholesterol and Crohn’s disease. We explore risk loci where TWAS accurately prioritizes the likely causal gene as well as loci where TWAS prioritizes multiple genes, some likely to be non-causal, owing to sharing of expression quantitative trait loci (eQTL). TWAS is especially prone to spurious prioritization with expression data from non-trait-related tissues or cell types, owing to substantial cross-cell-type variation in expression levels and eQTL strengths. Nonetheless, TWAS prioritizes candidate causal genes more accurately than simple baselines. We suggest best practices for causal-gene prioritization with TWAS and discuss future opportunities for improvement. Our results showcase the strengths and limitations of using eQTL datasets to determine causal genes at GWAS loci.
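To make the eQTL-sharing problem concrete, the following is a minimal sketch, not the authors' pipeline. It assumes the summary-statistic form of the TWAS test that is widely used in the field, z = (w · z_GWAS) / sqrt(wᵀ R w), where w holds a gene's expression-prediction weights and R is the LD (correlation) matrix between variants. The two-variant locus, the LD value of 0.9, the GWAS z-scores and the gene labels are all hypothetical numbers chosen for illustration.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Summary-statistic TWAS z-score: (w . z) / sqrt(w' R w)."""
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return numerator / math.sqrt(variance)


# Hypothetical locus with two variants in strong LD (r = 0.9).
ld = [[1.0, 0.9],
      [0.9, 1.0]]

# Hypothetical GWAS z-scores: variant 1 carries the signal, and
# variant 2 inherits most of it through LD (0.9 * 6.0 = 5.4).
gwas_z = [6.0, 5.4]

# Gene A (assumed causal): predicted expression depends on variant 1.
weights_a = [1.0, 0.0]
# Gene B (assumed non-causal): predicted expression depends on variant 2.
weights_b = [0.0, 1.0]

z_a = twas_z(weights_a, gwas_z, ld)
z_b = twas_z(weights_b, gwas_z, ld)

print(f"gene A TWAS z = {z_a:.2f}")
print(f"gene B TWAS z = {z_b:.2f}")
```

In this toy locus both genes receive large z-scores even though only gene A was assumed to be causal, because the variants that predict their expression are in LD. The association statistic alone cannot separate the two genes.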

Fig. 1: TWAS, like GWAS, frequently has multiple significant associations per locus.
Fig. 2: Co-regulation strongly predicts TWAS hit strength at the SORT1 locus.
Fig. 3: Correlated predicted expression can cause non-causal hits even in the absence of correlated total expression.
Fig. 4: Sharing of GWAS variants between expression models can contribute to non-causal hits even without correlated predicted expression.
Fig. 5: Co-regulation scenarios in TWAS that may lead to non-causal hits, from least to most general.
Fig. 6: Most candidate causal genes drop out after switching to a tissue with a less clear mechanistic relationship to the trait, owing to a lack of sufficient expression or sufficiently heritable expression.

Acknowledgements

We gratefully acknowledge J. Pritchard, H. Tang and members of the laboratory of N. Zaitlen for helpful discussions. This work was funded in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) (grant PGSD3-476082-2015 to M.W.); a Stanford Bio-X Bowes fellowship (to M.W.); a Stanford Graduate Fellowship (to N.S.-A.); a National Defense Science & Engineering Grant (to N.S.-A.); NIH grants 1DP2OD022870 and U01HG009431 (to A.K.), 1U24HG008956 and 5U01HG009080 (to M.A.R.), R01HG009120 and R01MH115676 (to B.P.), R01MH107666, R01MH101820 and P30DK20595 (to H.K.I.), and R01HL125863 and R21TR001739 (to J.L.M.B.); NHGRI grant R01HG010140 (to M.A.R.); Leducq Foundation grant 12CVD02 (to J.L.M.B.); and American Heart Association grant A14SFRN20840000 (to J.L.M.B.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

M.W., M.A.R. and A.K. conceived the study. M.W., N.M. and A.N.B. performed analyses. N.S.-A., D.A.K. and D.G. provided intellectual input. R.E., A.R., T.Q., K.H. and J.L.M.B. provided assistance with analysis of the STARNET dataset. H.K.I., B.P., M.A.R. and A.K. supervised the study. M.W., H.K.I., B.P., M.A.R. and A.K. wrote the manuscript. All authors reviewed the manuscript.

Correspondence to Johan L. M. Björkegren, Hae Kyung Im, Bogdan Pasaniuc, Manuel A. Rivas or Anshul Kundaje.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Supplementary Figures 1–8 and Supplementary Tables 1–6

Reporting Summary
