A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis

Abstract

Here, we present a joint-tissue imputation (JTI) approach and a Mendelian randomization framework for causal inference, MR-JTI. JTI borrows information across transcriptomes of different tissues, leveraging shared genetic regulation, to improve prediction performance in a tissue-dependent manner. Notably, JTI includes the single-tissue imputation method PrediXcan as a special case and outperforms other single-tissue approaches (the Bayesian sparse linear mixed model and Dirichlet process regression). MR-JTI models variant-level heterogeneity (primarily due to horizontal pleiotropy, a major challenge in interpreting transcriptome-wide association studies) and performs causal inference with type I error control. We make explicit the connection between the genetic architecture of gene expression and that of complex traits, and the suitability of Mendelian randomization as a causal inference strategy for transcriptome-wide association studies. We provide a resource of imputation models generated from the GTEx and PsychENCODE panels. Analyses of biobank and meta-analysis data, together with extensive simulations, show substantially improved statistical power, replication and causal mapping rate for JTI relative to existing approaches.
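The idea of borrowing information across tissues while keeping a single-tissue fit as a special case can be illustrated with sample weights. The sketch below is a toy under stated assumptions, not the JTI implementation: it uses one SNP and plain weighted least squares instead of a penalized multi-SNP model, stacks target-tissue samples (weight 1) with samples from a related tissue (weight `w_other`, a hypothetical value), and uses made-up data. The helper names `weighted_fit` and `joint_tissue_fit` are illustrative only.

```python
# Toy illustration only (not the JTI implementation): one SNP, weighted least
# squares, made-up data and a hypothetical weight for the related tissue.


def weighted_fit(x, y, w):
    """Return (intercept, slope) of the weighted least-squares fit y ~ a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def joint_tissue_fit(target, other, w_other):
    """Stack target-tissue samples (weight 1) with related-tissue samples (weight w_other).

    target, other: lists of (dosage, expression) pairs.
    With w_other == 0 the related tissue contributes nothing: a single-tissue fit.
    """
    rows = [(g, e, 1.0) for g, e in target] + [(g, e, w_other) for g, e in other]
    x, y, w = zip(*rows)
    return weighted_fit(x, y, w)


# Hypothetical (dosage, expression) pairs for one gene in two tissues.
target = [(0, 1.0), (1, 1.4), (2, 2.1), (1, 1.6), (0, 0.8)]
other = [(0, 0.9), (1, 1.5), (2, 2.0), (2, 2.3), (1, 1.3)]

xs, ys = zip(*target)
single_tissue = weighted_fit(xs, ys, [1.0] * len(xs))
print(single_tissue)
print(joint_tissue_fit(target, other, 0.0))  # same as the single-tissue fit
print(joint_tissue_fit(target, other, 0.5))  # borrows from the related tissue
```

With `w_other = 0` the joint fit reproduces the target-only fit exactly, mirroring (in miniature) how a single-tissue model can sit inside a joint-tissue one; any positive weight lets the related tissue shift the estimate.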

Fig. 1: The JTI framework: the multi-tissue gene expression imputation model and causal inference engine.
Fig. 2: Comparison of the prediction performance between PrediXcan and JTI.
Fig. 3: Comparison of the performance of the various methods.
Fig. 4: Manhattan plot for LDL-C level from the application of PrediXcan and JTI models in liver to UK Biobank LDL-C GWAS summary statistics.
Fig. 5: Improved performance of JTI relative to PrediXcan in GTEx liver tissue for LDL-C-associated genes.
Fig. 6: MR-JTI identifies LDL-C-associated genes with a potential causal effect.

Data availability

The protected data for the GTEx project (for example, genotype and RNA sequencing data) are available via access request to the database of Genotypes and Phenotypes (accession number phs000424.v8.p2). Processed GTEx data (for example, gene expression and eQTLs) are available from the GTEx portal (https://gtexportal.org). The URLs of the summary statistics datasets of all of the GWAS meta-analyses analyzed in this paper can be found in Supplementary Table 7. All of the summary results from the gene-based analyses are provided in Supplementary Table 7. The JTI GTEx models (as well as the PrediXcan and (modified) UTMOST models we generated) are available for download from Zenodo (https://doi.org/10.5281/zenodo.3842289). The PsychENCODE (https://doi.org/10.5281/zenodo.3859065) and GEUVADIS (https://doi.org/10.5281/zenodo.3859075) models have also been deposited.

Code availability

The code for JTI and MR-JTI and for reproducing the figures in this paper is available from GitHub (https://github.com/gamazonlab/MR-JTI).

Acknowledgements

E.R.G. is grateful to the president and fellows of Clare Hall, University of Cambridge for providing a stimulating intellectual home and for generous support during the Lent and Easter terms (2018). E.R.G. is supported by the National Human Genome Research Institute of the National Institutes of Health under award numbers R35HG010718 and R01HG011138. N.J.C. is supported by U01HG009086 and R01MH113362. This content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The datasets used for part of the replication analysis were obtained from Vanderbilt University Medical Center’s BioVU biorepository, which is supported by institutional funding, private agencies and federal grants, including the NIH-funded Shared Instrumentation Grant S10RR025141 and Clinical and Translational Science Award grants UL1TR002243, UL1TR000445 and UL1RR024975. Genomic data are also supported by investigator-led projects that include U01HG004798, R01NS032830, RC2GM092618, P50GM115305, U01HG006378, U19HL065962 and R01HD074711, as well as the additional funding sources listed at https://victr.vanderbilt.edu/pub/biovu/. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS.

Author information

Contributions

E.R.G. and D.Z. designed the study and wrote the manuscript. E.R.G. and D.Z. developed and implemented the methodology. D.Z. performed the analyses. Y.J., X.Z., N.J.C. and C.L. provided critical input and contributed to reviewing and editing the manuscript. E.R.G. supervised and acquired funding for the study.

Corresponding authors

Correspondence to Dan Zhou or Eric R. Gamazon.

Ethics declarations

Competing interests

E.R.G. receives an honorarium from the journal Circulation Research of the American Heart Association as a member of the Editorial Board. The other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 TWAS can be affected by possible sources of bias (PSB), including invalid instrumental variables (IVs) due to horizontal pleiotropy, and weak instruments.

Conventional TWAS, such as PrediXcan, can be viewed as Mendelian randomization with multiple IVs but without control for horizontal pleiotropy. a, A major source of false positives in TWAS is the use of invalid IVs due to horizontal pleiotropy. b, Horizontal pleiotropy can arise in multiple ways. For example, it can come from LD-induced invalid IVs, that is, LD contamination: if we are testing the significance of Gene A, but one of the SNPs (yellow) in its prediction model tags another coding (red) or regulatory (blue) variant that is causal for the trait through Gene B, the causal effect estimate will be biased. c, Even without LD contamination, the estimate may be biased by the inclusion of weak or false-positive eQTLs in the prediction model for Gene A. In this case, the effect of the weak or false-positive eQTL for Gene A on the trait is actually mediated by another gene, Gene B (by affecting its coding sequence or regulation). More generally, weak instrument bias is a type of finite-sample bias; it arises when the gene expression (the “exposure”) is only weakly correlated with the instrument set. Both b and c result in d, a biased estimate of the gene's causal effect on the trait. We estimate the heterogeneity due to PSB using threshold-based residual bootstrap LASSO (see Methods). Our approach estimates the heterogeneity due to invalid IVs and gives an adjusted estimate of the gene's causal effect on the trait.
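Viewing a TWAS gene test as Mendelian randomization with cis-eQTL instruments, the effect of a single invalid IV can be seen in a toy calculation. The sketch below assumes independent SNPs (no LD) and hypothetical summary statistics, and pools the per-SNP evidence with a standard fixed-effect inverse-variance-weighted (IVW) estimator. It only illustrates the bias and the resulting heterogeneity (Cochran's Q); it does not implement MR-JTI's threshold-based residual bootstrap LASSO.

```python
# Hypothetical summary statistics for four independent cis-eQTL instruments of
# one gene (no LD assumed). The true causal effect of expression on the trait
# is 0.5; SNP 4 also has a direct, horizontally pleiotropic effect of 0.3.
TRUE_EFFECT = 0.5
beta_eqtl = [0.40, 0.35, 0.30, 0.25]  # SNP -> expression
pleiotropy = [0.0, 0.0, 0.0, 0.3]     # SNP -> trait, bypassing the gene
se_gwas = [0.02] * 4                  # standard errors of the GWAS estimates

beta_gwas_valid = [bx * TRUE_EFFECT for bx in beta_eqtl]
beta_gwas = [bx * TRUE_EFFECT + d for bx, d in zip(beta_eqtl, pleiotropy)]


def ivw(bx, by, se):
    """Fixed-effect inverse-variance-weighted estimate of the causal effect."""
    num = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se))
    den = sum(x ** 2 / s ** 2 for x, s in zip(bx, se))
    return num / den


def cochran_q(bx, by, se, estimate):
    """Cochran's Q with first-order weights; large values signal heterogeneity."""
    return sum(((y - estimate * x) / s) ** 2 for x, y, s in zip(bx, by, se))


for label, by in [("all valid", beta_gwas_valid), ("one pleiotropic", beta_gwas)]:
    est = ivw(beta_eqtl, by, se_gwas)
    q = cochran_q(beta_eqtl, by, se_gwas, est)
    print(f"{label}: IVW = {est:.3f}, Q = {q:.1f} on {len(by) - 1} df")
```

With all instruments valid the IVW estimate recovers 0.5 with Q = 0; the single pleiotropic SNP moves the estimate to about 0.67 and Q to about 193 on 3 df, far above the 5% chi-square critical value of about 7.81.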

Extended Data Fig. 2 The gene expression similarity matrix.

The median expression level (log2-transformed TPM) across all samples of a given tissue was used to compute the Pearson correlation between each pair of tissues across the transcriptome. The similarity map was generated by hierarchical clustering.
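The similarity computation amounts to correlating tissue-level median expression profiles gene by gene. A minimal sketch, assuming a pseudocount of 1 before the log2 transform (the caption does not state one) and a hypothetical toy input of four genes in three tissues:

```python
import math
import statistics

# Hypothetical median TPM values for four genes (same gene order in every tissue).
median_tpm = {
    "Liver": [120.0, 3.0, 0.5, 40.0],
    "Adipose": [15.0, 25.0, 2.0, 35.0],
    "Muscle": [10.0, 30.0, 1.5, 33.0],
}


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def similarity_matrix(tpm, pseudocount=1.0):
    """Tissue-by-tissue Pearson correlations of log2(median TPM + pseudocount)."""
    log_expr = {t: [math.log2(v + pseudocount) for v in vals] for t, vals in tpm.items()}
    tissues = list(log_expr)
    matrix = [[pearson(log_expr[a], log_expr[b]) for b in tissues] for a in tissues]
    return tissues, matrix


tissues, sim = similarity_matrix(median_tpm)
for name, row in zip(tissues, sim):
    print(name.ljust(8), " ".join(f"{r:5.2f}" for r in row))
```

The resulting matrix could then be clustered, for example with `scipy.cluster.hierarchy.linkage` on 1 − r distances; that is one reasonable option, not necessarily the procedure used for the figure.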

Extended Data Fig. 3 Comparison of prediction performance between PrediXcan and JTI in all GTEx v8 tissues.

We compared the performance of PrediXcan and JTI using the Pearson correlation r between predicted and observed expression levels for each of the 49 GTEx v8 tissues with more than 70 samples. The white box edges depict the interquartile range, the whiskers extend to 1.5× the interquartile range, the central black dot marks the median, and the outlines display the kernel probability density. The median correlation is also shown below the x-axis label.
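To make the box elements concrete, the helper below computes them for one tissue's per-gene correlations. It assumes the usual Tukey convention, in which whiskers stop at the most extreme observation lying within 1.5× the IQR of the box edges, and it uses hypothetical r values; `box_summary` is an illustrative name.

```python
import statistics


def box_summary(values):
    """Median, quartiles, Tukey whiskers and outliers for one set of values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical per-gene Pearson r values (predicted vs observed) for one tissue.
r_values = [0.05, 0.30, 0.32, 0.35, 0.40, 0.42, 0.45, 0.50, 0.95]
print(box_summary(r_values))
```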

Extended Data Fig. 4 Prediction performance comparison between JTI and three single-tissue approaches (top eQTL, BSLMM, and DPR) in two independent datasets.

Prediction models were trained using BSLMM (FUSION default setting, 5-fold cross-validation) and JTI (see Methods) in GTEx v8 a, brain frontal cortex BA9 and d, EBV-transformed lymphocytes. The x and y axes show the Pearson correlation r between predicted and observed expression in external (non-GTEx, independent) datasets, that is, a, PsychENCODE and d, GEUVADIS. b and e show the corresponding comparisons between JTI and the top-eQTL model, which models the genetically regulated expression using only the top eQTL. c and f compare prediction performance with the DPR model, a nonparametric Bayesian method with a Dirichlet process prior on effect-size variance, fitted using the software tool ‘TIGAR’ with 5-fold cross-validation. Green and purple circles denote genes imputable only by JTI and only by the other single-tissue approach, respectively. Black and grey dots denote genes that are imputable and not imputable, respectively, by both methods.
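The top-eQTL comparator and the external evaluation can be sketched in a few lines. This is a minimal illustration under stated assumptions: candidate SNPs are scored by absolute Pearson correlation with expression (with equal sample sizes, the largest |r| is also the smallest eQTL P value), the chosen SNP is fitted by ordinary least squares, and all data are hypothetical. The helper names are illustrative, not part of any published tool.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def fit_top_eqtl(dosages, expression):
    """Pick the SNP most correlated with expression and fit expression ~ a + b * dosage.

    dosages: one list of per-sample dosages for each candidate cis-SNP.
    """
    top = max(range(len(dosages)), key=lambda j: abs(pearson(dosages[j], expression)))
    g = dosages[top]
    mg, me = sum(g) / len(g), sum(expression) / len(expression)
    sxy = sum((x - mg) * (y - me) for x, y in zip(g, expression))
    sxx = sum((x - mg) ** 2 for x in g)
    slope = sxy / sxx
    return top, me - slope * mg, slope


def predict(model, dosages):
    """Genetically regulated expression predicted from the selected SNP only."""
    top, intercept, slope = model
    return [intercept + slope * x for x in dosages[top]]


# Hypothetical training (reference panel) and external test data for one gene.
train_dosages = [[0, 1, 2, 0, 1, 2], [1, 1, 0, 2, 0, 1]]
train_expr = [0.1, 1.0, 2.1, 0.0, 0.9, 2.0]
test_dosages = [[2, 0, 1, 1, 2, 0], [0, 1, 1, 2, 1, 0]]
test_expr = [1.9, 0.2, 1.1, 0.8, 2.2, 0.1]

model = fit_top_eqtl(train_dosages, train_expr)
r_external = pearson(predict(model, test_dosages), test_expr)
print(model, round(r_external, 3))
```

The external r computed this way is the quantity plotted on the axes; whether a gene counts as imputable depends on a threshold that is not stated in this caption.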

Extended Data Fig. 5 JTI and PrediXcan showed a substantial increase in iGene discovery between GTEx v6p and v8.

We compared the number of imputable genes (iGenes) across all tissues between GTEx v6p (yellow) and v8 (green). For both a, PrediXcan and b, JTI, prediction performance with GTEx v8 was superior to that with v6p in all tissues. The numbers of iGenes are given in Supplementary Table 1.

Extended Data Fig. 6 Prediction performance comparison among PrediXcan, JTI, original UTMOST, and modified UTMOST for brain frontal cortex BA9.

We compared cross-validation prediction performance (r²) in GTEx (internal; brain frontal cortex BA9) with prediction performance in PsychENCODE (an external test dataset; brain prefrontal cortex) for a, PrediXcan, b, JTI, c, original UTMOST and d, modified UTMOST. Panels e–h are zoomed-in versions of the corresponding upper panels. Yellow, green and purple dots indicate high, medium and low point density, respectively.

Extended Data Fig. 7 Prediction performance comparison among PrediXcan, JTI, original UTMOST, and modified UTMOST for EBV-transformed lymphocytes.

We compared cross-validation prediction performance (r²) in GTEx (internal; EBV-transformed lymphocytes) with prediction performance in GEUVADIS (an external test dataset; LCLs) for a, PrediXcan, b, JTI, c, original UTMOST and d, modified UTMOST. Panels e–h are zoomed-in versions of the corresponding upper panels. Yellow, green and purple dots indicate high, medium and low point density, respectively.

Extended Data Fig. 8 Type I error rate for PrediXcan, UTMOST, and JTI.

The Q-Q plots show the type I error from applying PrediXcan, UTMOST and JTI models in a–c, brain frontal cortex BA9 and d–f, liver. The blue dashed lines show the 95% CI of the expected −log(P). Type I error rates for all tissues are given in Supplementary Table 3.
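A Q-Q plot of null results compares sorted observed P values with their expected uniform quantiles, and the empirical type I error rate is the share of null tests below α. The sketch below assumes the common (i − 0.5)/n plotting position and uses simulated uniform P values as a hypothetical stand-in for results from null phenotypes.

```python
import math
import random


def qq_points(pvalues):
    """Return (expected, observed) -log10 P pairs for a Q-Q plot, smallest P first."""
    n = len(pvalues)
    observed = sorted(pvalues)
    # Expected quantiles of n uniform P values, using the (i - 0.5) / n plotting position.
    expected = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


def type1_error_rate(pvalues, alpha=0.05):
    """Fraction of null tests declared significant at level alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


# Hypothetical null P values; 1 - random() lies in (0, 1], so log10 is always defined.
rng = random.Random(42)
null_p = [1.0 - rng.random() for _ in range(20000)]
print(type1_error_rate(null_p))
points = qq_points(null_p)
print(points[:3])
```

For the confidence band, the i-th smallest of n independent uniform P values follows a Beta(i, n − i + 1) distribution, so pointwise 95% limits can be taken from its 2.5% and 97.5% quantiles (for example via `scipy.stats.beta.ppf`).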

Extended Data Fig. 9 TWAS power analysis for PrediXcan, UTMOST, and JTI.

The true expression levels of randomly sampled causal genes, and the effect size of each gene on the trait, were simulated. In this model, each gene contributed, on average, 0.5% of the phenotypic variance. For each gene, the predicted (that is, genetically determined) expression level was generated according to the proportion of variance explained (PVE), based on the actual prediction performance (R²) in two external datasets (a, PsychENCODE and b, GEUVADIS), for each of the three imputation approaches (PrediXcan, UTMOST and JTI). Power was estimated as the proportion of simulations attaining significance (defined as Bonferroni-adjusted P < 0.05).
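A simplified version of this power simulation is sketched below under explicit assumptions: true expression is standard normal, the trait depends on true expression with a given PVE, predicted expression correlates with true expression with squared correlation `r2_pred`, association is tested with a Fisher-z approximation to the Pearson correlation test, and `n_tests` (a hypothetical number of genes tested) sets the Bonferroni threshold. It is not the authors' simulation code, and all names are illustrative.

```python
import math
import random


def assoc_pvalue(x, y):
    """Two-sided P value for a Pearson correlation via Fisher's z (normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_power(n_samples, pve_gene, r2_pred, n_tests, n_reps=100, alpha=0.05, seed=0):
    """Share of replicates in which predicted expression passes Bonferroni significance.

    pve_gene: trait variance explained by true expression (e.g. 0.005), in [0, 1].
    r2_pred: squared correlation between predicted and true expression, in [0, 1].
    """
    rng = random.Random(seed)
    threshold = alpha / n_tests  # Bonferroni-adjusted P < alpha
    a, a_noise = math.sqrt(r2_pred), math.sqrt(1 - r2_pred)
    b, b_noise = math.sqrt(pve_gene), math.sqrt(1 - pve_gene)
    hits = 0
    for _ in range(n_reps):
        predicted, trait = [], []
        for _ in range(n_samples):
            true_expr = rng.gauss(0.0, 1.0)
            predicted.append(a * true_expr + a_noise * rng.gauss(0.0, 1.0))
            trait.append(b * true_expr + b_noise * rng.gauss(0.0, 1.0))
        if assoc_pvalue(predicted, trait) < threshold:
            hits += 1
    return hits / n_reps


for r2 in (0.05, 0.2, 0.5):
    print(r2, simulate_power(n_samples=10000, pve_gene=0.005, r2_pred=r2, n_tests=20000, n_reps=20))
```

Because the expected correlation between predicted expression and the trait is sqrt(PVE × R²), better imputation (higher R²) translates directly into higher power at a fixed sample size, which is the comparison the figure makes across methods.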

Extended Data Fig. 10 Comparison of the estimated gene effect size on LDL-C from MR-JTI and the median estimator.

For each gene, the median estimator was calculated as the median of the Wald ratio estimates across all cis-eQTLs. The Wald ratio estimate for a cis-eQTL is the ratio of its estimated GWAS effect size to its estimated eQTL effect size. a, A positive correlation was observed between the gene effect sizes estimated by MR-JTI and by the median estimator (Spearman r = 0.72, P < 2.2e-16). b, No significant correlation was observed between the median estimator and the MR-JTI estimate obtained from shuffled GWAS summary statistics. Furthermore, MR-JTI's type I error is well controlled.
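The median estimator described here can be computed directly from summary statistics. A minimal sketch with hypothetical effect sizes; it takes the plain (unweighted) median of the Wald ratios, as the caption describes, rather than a variance-weighted median.

```python
import statistics


def wald_ratios(beta_gwas, beta_eqtl):
    """Per-variant causal estimates: GWAS effect divided by eQTL effect."""
    return [bg / be for bg, be in zip(beta_gwas, beta_eqtl)]


def median_estimator(beta_gwas, beta_eqtl):
    """Simple (unweighted) median of the Wald ratios across cis-eQTLs."""
    return statistics.median(wald_ratios(beta_gwas, beta_eqtl))


# Hypothetical estimates for five cis-eQTLs; the fourth carries a pleiotropic effect.
beta_eqtl = [0.40, 0.35, 0.30, 0.25, 0.20]
beta_gwas = [0.20, 0.175, 0.15, 0.425, 0.10]

print(wald_ratios(beta_gwas, beta_eqtl))
print(median_estimator(beta_gwas, beta_eqtl))
```

The pleiotropic variant's ratio (1.7) leaves the median at 0.5, whereas the mean of the five ratios would be 0.74; this robustness to a minority of invalid instruments is what makes the median estimator a useful reference point.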

Supplementary information

Supplementary Information

Supplementary Note and Figs. 1–4

Reporting Summary

Supplementary Tables

Supplementary Tables 1–10

About this article

Cite this article

Zhou, D., Jiang, Y., Zhong, X. et al. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nat Genet 52, 1239–1246 (2020). https://doi.org/10.1038/s41588-020-0706-2
