Probabilistic fine-mapping of transcriptome-wide association studies


Transcriptome-wide association studies (TWAS) using predicted expression have identified thousands of genes whose locally regulated expression is associated with complex traits and diseases. In this work, we show that linkage disequilibrium induces significant gene–trait associations at non-causal genes as a function of the expression quantitative trait loci (eQTL) weights used in expression prediction. We introduce a probabilistic framework that models correlation among TWAS signals to assign every gene in the risk region a probability of explaining the observed association signal. Importantly, by leveraging expression prediction from other tissues, the approach remains accurate when expression data for causal genes are not available in the causal tissue. It yields credible sets of genes containing the causal gene at a nominal confidence level (for example, 90%) that can be used to prioritize genes for functional assays. We illustrate the framework with an integrative analysis of lipid traits, where it prioritizes genes with strong evidence for causality.
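To make the notion of a credible set concrete, the following is a minimal sketch of how a set at a nominal level (for example, 90%) could be assembled once each gene in a risk region has a posterior probability. It is not the FOCUS implementation: the function name `credible_set`, the gene labels, and the probabilities are hypothetical, and the sketch assumes the probabilities can be normalized to sum to one across the candidate genes in the region.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], level: float = 0.9) -> List[str]:
    """Return the smallest top-ranked set of genes whose normalized
    posterior mass reaches `level` (illustrative sketch only)."""
    if not 0.0 < level <= 1.0:
        raise ValueError("level must be in (0, 1]")
    total = sum(posteriors.values())
    if total <= 0.0:
        raise ValueError("posteriors must have positive total mass")

    # Rank genes from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)

    selected: List[str] = []
    mass = 0.0
    for gene, prob in ranked:
        selected.append(gene)
        mass += prob / total  # assumption: normalize across candidates
        if mass >= level:
            break
    return selected


# Hypothetical posterior probabilities for four genes at one risk region.
example = {"GENE_A": 0.70, "GENE_B": 0.22, "GENE_C": 0.05, "GENE_D": 0.03}
print(credible_set(example, level=0.9))
```

In this toy region the two top-ranked genes already carry more than 90% of the posterior mass, so the remaining genes are left out of the set. Raising `level` can only enlarge the set, which is the trade-off between confidence and the number of genes carried forward to functional assays.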


Fig. 1: Illustration of the induced correlation structure for predicted expression.
Fig. 2: Simulation diagram for alternative and null scenarios.
Fig. 3: Credible gene sets are well calibrated in simulations.
Fig. 4: FOCUS credible sets alleviate bias in confounding simulations.
Fig. 5: FOCUS accurately prioritizes causal genes in simulations.
Fig. 6: 1p13 locus for LDL.

Code availability

The FUSION TWAS method and the FOCUS fine-mapping method are available online.

Data availability

Data used in this study are available online: TWAS eQTL weights, TWAS and fine-mapping results, and lipid GWAS summary data.



Acknowledgements

We thank C. Giambartolomei for discussions. This work was funded by NIH awards nos. T32NS048004 (N.M.), T32LM012424 (M.K.F.), R01HG009120 (N.M., M.K.F., R.J., G.K., H.S., B.P.), R01MH115676 (N.M., M.K.F., R.J., G.K., H.S., A.G., B.P.), R01HG006399 (N.M., M.K.F., R.J., G.K., H.S., B.P.), and U01CA194393 (N.M., M.K.F., R.J., G.K., H.S., B.P.); NSF award no. DGE-1829071 (R.J.); and the Claudia Adams Barr Award (A.G.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Author information

N.M., A.G., and B.P. developed the model. N.M., M.K.F., H.S., and G.K. performed simulations and analyses. N.M. and R.J. designed and wrote the FOCUS software. All authors read and approved the manuscript.

Correspondence to Nicholas Mancuso or Bogdan Pasaniuc.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Text and Figures

Supplementary Note and Supplementary Figures 1–24

Reporting Summary

Supplementary Tables

Supplementary Tables 1–5
