A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis

Zhou, Dan; Jiang, Yi; Zhong, Xue; Cox, Nancy J.; Liu, Chunyu; Gamazon, Eric R.

doi:10.1038/s41588-020-0706-2

Technical Report
Published: 05 October 2020

Dan Zhou ORCID: orcid.org/0000-0002-5313-8164^1,2,
Yi Jiang^2,3,4,
Xue Zhong^1,2,
Nancy J. Cox^1,2,5,
Chunyu Liu ORCID: orcid.org/0000-0002-5986-4415^3,4,6 &
…
Eric R. Gamazon ORCID: orcid.org/0000-0003-4204-8734^1,2,5,7,8

Nature Genetics volume 52, pages 1239–1246 (2020)

Abstract

Here, we present a joint-tissue imputation (JTI) approach and a Mendelian randomization framework for causal inference, MR-JTI. JTI borrows information across transcriptomes of different tissues, leveraging shared genetic regulation, to improve prediction performance in a tissue-dependent manner. Notably, JTI includes the single-tissue imputation method PrediXcan as a special case and outperforms other single-tissue approaches (the Bayesian sparse linear mixed model and Dirichlet process regression). MR-JTI models variant-level heterogeneity (primarily due to horizontal pleiotropy, addressing a major challenge of transcriptome-wide association study interpretation) and performs causal inference with type I error control. We make explicit the connection between the genetic architecture of gene expression and of complex traits and the suitability of Mendelian randomization as a causal inference strategy for transcriptome-wide association studies. We provide a resource of imputation models generated from GTEx and PsychENCODE panels. Analysis of biobanks and meta-analysis data, and extensive simulations show substantially improved statistical power, replication and causal mapping rate for JTI relative to existing approaches.

**Fig. 1: The JTI framework: the multi-tissue gene expression imputation model and causal inference engine.**

**Fig. 2: Comparison of the prediction performance between PrediXcan and JTI.**

**Fig. 3: Comparison of the performance of the various methods.**

**Fig. 4: Manhattan plot for LDL-C level from the application of PrediXcan and JTI models in liver to UK Biobank LDL-C GWAS summary statistics.**

**Fig. 5: Improved performance of JTI relative to PrediXcan in GTEx liver tissue for LDL-C-associated genes.**

**Fig. 6: MR-JTI identifies LDL-C-associated genes with a potential causal effect.**

Data availability

The protected data for the GTEx project (for example, genotype and RNA sequencing data) are available via access request to the database of Genotypes and Phenotypes (accession number phs000424.v8.p2). Processed GTEx data (for example, gene expression and eQTLs) are available from the GTEx portal (https://gtexportal.org). The URLs of the summary statistics datasets of all of the GWAS meta-analyses analyzed in this paper can be found in Supplementary Table 7. All of the summary results from the gene-based analyses are provided in Supplementary Table 7. The JTI GTEx models (as well as the PrediXcan and (modified) UTMOST models we generated) are available for download from Zenodo (https://doi.org/10.5281/zenodo.3842289). The PsychENCODE (https://doi.org/10.5281/zenodo.3859065) and GEUVADIS (https://doi.org/10.5281/zenodo.3859075) models have also been deposited.

Code availability

The code for JTI and MR-JTI and for reproducing the figures in this paper is available from GitHub (https://github.com/gamazonlab/MR-JTI).

Acknowledgements

E.R.G. is grateful to the president and fellows of Clare Hall, University of Cambridge for providing a stimulating intellectual home and for generous support during the Lent and Easter terms (2018). E.R.G. is supported by the National Human Genome Research Institute of the National Institutes of Health under award numbers R35HG010718 and R01HG011138. N.J.C. is supported by U01HG009086 and R01MH113362. This content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The datasets used for part of the replication analysis were obtained from Vanderbilt University Medical Center’s BioVU biorepository, which is supported by institutional funding, private agencies and federal grants, including the NIH-funded Shared Instrumentation Grant S10RR025141 and Clinical and Translational Science Award grants UL1TR002243, UL1TR000445 and UL1RR024975. Genomic data are also supported by investigator-led projects that include U01HG004798, R01NS032830, RC2GM092618, P50GM115305, U01HG006378, U19HL065962 and R01HD074711, as well as the additional funding sources listed at https://victr.vanderbilt.edu/pub/biovu/. The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH and NINDS.

Author information

Authors and Affiliations

Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
Dan Zhou, Xue Zhong, Nancy J. Cox & Eric R. Gamazon
Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
Dan Zhou, Yi Jiang, Xue Zhong, Nancy J. Cox & Eric R. Gamazon
Center of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
Yi Jiang & Chunyu Liu
Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
Yi Jiang & Chunyu Liu
Data Science Institute, Vanderbilt University Medical Center, Nashville, TN, USA
Nancy J. Cox & Eric R. Gamazon
Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA
Chunyu Liu
Clare Hall, University of Cambridge, Cambridge, UK
Eric R. Gamazon
MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
Eric R. Gamazon

Contributions

E.R.G. and D.Z. designed the study and wrote the manuscript. E.R.G. and D.Z. developed and implemented the methodology. D.Z. performed the analyses. Y.J., X.Z., N.J.C. and C.L. provided critical input and contributed to reviewing and editing the manuscript. E.R.G. supervised and acquired funding for the study.

Corresponding authors

Correspondence to Dan Zhou or Eric R. Gamazon.

Ethics declarations

Competing interests

E.R.G. receives an honorarium from the journal Circulation Research of the American Heart Association as a member of the Editorial Board. The other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 TWAS could be biased by possible sources of bias (PSB), including invalid instrumental variables (IVs) due to horizontal pleiotropy and weak instruments.

Conventional TWAS, such as PrediXcan, can be viewed as Mendelian randomization with multiple IVs but without control for horizontal pleiotropy. a, A major source of false positives in TWAS is the use of invalid IVs due to horizontal pleiotropy. b, Horizontal pleiotropy can arise in multiple ways. For example, it can come from LD-induced invalid IVs, that is, LD contamination. If we are testing the significance of Gene A, but one of the SNPs (yellow) in the prediction model tags another coding (red) or regulatory (blue) variant that is causal for the trait through another Gene B, causal effect estimation will be biased. c, Even without LD contamination, the estimate may be biased by the inclusion of weak or false-positive eQTLs in the prediction model for Gene A. In this case, the effect of the weak or false-positive eQTL for Gene A on the trait is actually mediated by another Gene B (by affecting coding or regulation). More generally, weak instrument bias is a type of finite-sample bias; it arises in finite samples where the gene expression (“exposure”) is only weakly correlated with the instrument set. Both b and c result in d, a biased estimate of the gene's causal effect on the trait. We estimate the heterogeneity due to PSB using threshold-based residual bootstrap LASSO (see Methods). Our approach estimates the heterogeneity due to invalid IVs and gives an adjusted estimate of the gene's causal effect on the trait.

Extended Data Fig. 2 The gene expression similarity matrix.

The median expression level (log₂-transformed TPM) across all the samples of a given tissue was used to evaluate the correlation (Pearson) of tissue-tissue pairs across the transcriptome. The similarity map was generated by performing hierarchical clustering.
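
The similarity computation described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy TPM values and the pseudocount of 1 inside the log₂ transform are all assumptions, and the hierarchical clustering step is left to a dedicated library.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_log2_profile(tpm_by_gene):
    """Per-gene median TPM across samples, then log2(x + 1).

    The pseudocount of 1 is an assumption made here to avoid log2(0).
    """
    return [math.log2(median(samples) + 1.0) for samples in tpm_by_gene]


def tissue_similarity(tpm_by_tissue):
    """Return (sorted tissue names, Pearson correlation matrix of profiles)."""
    profiles = {t: median_log2_profile(m) for t, m in tpm_by_tissue.items()}
    tissues = sorted(profiles)
    matrix = [[pearson(profiles[a], profiles[b]) for b in tissues]
              for a in tissues]
    return tissues, matrix


# Hypothetical toy data: 4 genes x 3 samples per tissue (TPM values).
toy = {
    "liver": [[10, 12, 11], [0, 1, 0], [100, 90, 110], [5, 5, 6]],
    "muscle": [[9, 13, 11], [1, 0, 0], [95, 100, 105], [4, 6, 5]],
    "brain": [[1, 0, 2], [50, 60, 55], [3, 2, 4], [80, 70, 75]],
}
tissues, sim = tissue_similarity(toy)
```

In this toy input, liver and muscle share the same per-gene medians and therefore correlate perfectly, while brain is anticorrelated with both. The resulting matrix could then be passed to a hierarchical clustering routine to order the tissues.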

Extended Data Fig. 3 Comparison of prediction performance between PrediXcan and JTI in all GTEx v8 tissues.

We compared the performance of PrediXcan and JTI using the Pearson correlation r between predicted and observed expression levels for each of the 49 GTEx v8 tissues with more than 70 samples. The white box edges depict the interquartile range, the whiskers extend to 1.5× the interquartile range, the central black dot marks the median and the outlines display the kernel probability density. The median correlation is also shown below the x-axis label.

Extended Data Fig. 4 Prediction performance comparison between JTI and three single-tissue approaches (top eQTL, BSLMM, and DPR) in two independent datasets.

Prediction models were trained using BSLMM (5-fold cross-validation, FUSION default setting) and JTI (see Methods) in GTEx v8 a, brain frontal cortex BA9 region and d, EBV-transformed lymphocytes. The x-axis and y-axis represent the Pearson correlation r between the predicted expression and observed expression in external (non-GTEx and independent) datasets, that is, a, PsychENCODE and d, GEUVADIS. b and e show the corresponding comparisons between JTI and top eQTL, which simply models the genetically regulated expression using the top eQTL. c and f, We also compared the prediction performance with the DPR model, a nonparametric Bayesian method with a Dirichlet process prior on effect-size variance, using the software tool ‘TIGAR’ with 5-fold cross-validation. The green and purple circles denote genes imputable using only JTI and only the other single-tissue approach, respectively. The black and grey dots denote genes consistently imputable and not imputable, respectively, using both methods.

Extended Data Fig. 5 JTI and PrediXcan showed a substantial increase in iGene discovery between GTEx v6p and v8.

We compared the number of imputable genes across all the tissues between GTEx v6p (yellow) and v8 (green). The prediction performance of GTEx v8 was superior to v6p for both a, PrediXcan and b, JTI in all tissues. The number of iGenes can be found in Supplementary Table 1.

Extended Data Fig. 6 Prediction performance comparison among PrediXcan, JTI, original UTMOST, and modified UTMOST for brain frontal cortex BA9.

We compared the cross-validation prediction performance (r²) in GTEx (internal, brain frontal cortex BA9) and the prediction performance in PsychENCODE (an external test dataset, brain prefrontal cortex) among a, PrediXcan, b, JTI, c, original UTMOST, and d, modified UTMOST. The lower panels e–h are zoomed-in versions of the corresponding upper panels. The yellow, green, and purple dots indicate high, medium, and low density, respectively.

Extended Data Fig. 7 Prediction performance comparison among PrediXcan, JTI, original UTMOST, and modified UTMOST for EBV-transformed lymphocytes.

We compared the cross-validation prediction performance (r²) in GTEx (internal, EBV-transformed lymphocytes) and the prediction performance in GEUVADIS (an external test dataset, LCLs) among a, PrediXcan, b, JTI, c, original UTMOST, and d, modified UTMOST. The lower panels e–h are zoomed-in versions of the corresponding upper panels. The yellow, green, and purple dots indicate high, medium, and low density, respectively.

Extended Data Fig. 8 Type I error rate for PrediXcan, UTMOST, and JTI.

The Q-Q plots show the type I error from applying PrediXcan, UTMOST, and JTI models in a–c, brain frontal cortex BA9 and d–f, liver. The blue dashed lines show the 95% CI of the expected −log(P). The type I error rates for all tissues can be found in Supplementary Table 3.

Extended Data Fig. 9 TWAS power analysis for PrediXcan, UTMOST, and JTI.

The true expression level of randomly sampled causal genes and the effect size for each gene on trait were simulated. In this model, each gene, on average, contributed 0.5% to the phenotypic variance. For each gene, the predicted (that is, genetically determined) expression level was generated according to the proportion of variance explained (PVE), based on the actual prediction performance (R²) in two external datasets (a, PsychENCODE and b, GEUVADIS), for each of the three imputation approaches (PrediXcan, UTMOST, and JTI). Power was estimated as the proportion of simulations that attain significance (defined as Bonferroni adjusted P < 0.05).
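
The power calculation described in this caption can be illustrated with a small simulation. This is a sketch under stated assumptions rather than the authors' procedure: the sample size, the number of tests used for the Bonferroni correction, the normal approximation to the test statistic and all function and parameter names are hypothetical. Only the 0.5% per-gene contribution to phenotypic variance and the Bonferroni-adjusted P < 0.05 criterion come from the text.

```python
import math
import random
from statistics import NormalDist


def simulate_power(pve, n_samples=4000, n_sims=100, trait_var=0.005,
                   n_tests=1000, alpha=0.05, seed=0):
    """Estimate power to detect one causal gene from its predicted expression.

    pve: variance in true expression explained by the predicted expression
         (standing in for the prediction R^2 of an imputation model).
    trait_var: share of phenotypic variance due to the gene (0.5% in the text).
    n_samples, n_tests, n_sims: assumed values, chosen only for illustration.
    """
    rng = random.Random(seed)
    threshold = alpha / n_tests  # Bonferroni-adjusted significance level
    norm = NormalDist()
    hits = 0
    for _ in range(n_sims):
        sx = sy = sxx = syy = sxy = 0.0
        for _ in range(n_samples):
            g = rng.gauss(0.0, 1.0)  # predicted (genetically determined) expression
            e = math.sqrt(pve) * g + math.sqrt(1.0 - pve) * rng.gauss(0.0, 1.0)
            y = (math.sqrt(trait_var) * e
                 + math.sqrt(1.0 - trait_var) * rng.gauss(0.0, 1.0))
            sx += g
            sy += y
            sxx += g * g
            syy += y * y
            sxy += g * y
        # Pearson correlation between predicted expression and the trait.
        cov = sxy - sx * sy / n_samples
        var_g = sxx - sx * sx / n_samples
        var_y = syy - sy * sy / n_samples
        r = cov / math.sqrt(var_g * var_y)
        # Large-sample normal approximation to the correlation t-statistic.
        z = r * math.sqrt((n_samples - 2) / (1.0 - r * r))
        p_value = 2.0 * (1.0 - norm.cdf(abs(z)))
        hits += p_value < threshold
    return hits / n_sims


power_high = simulate_power(pve=0.8, seed=1)
power_low = simulate_power(pve=0.05, seed=1)
```

Under these assumptions, a model with higher prediction performance yields higher power for the same causal effect, which is the relationship the figure compares across imputation approaches.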

Extended Data Fig. 10 Comparison of the estimated gene effect size on LDL-C from MR-JTI and the median estimator.

For each gene, the median estimator was calculated as the median of the Wald ratio estimates across all the cis-eQTLs. The Wald ratio estimate for a cis-eQTL is the ratio of the estimated GWAS effect size to the estimated eQTL effect size. a, A positive correlation was observed between the estimated gene effect size from MR-JTI and the median estimator (Spearman r = 0.72, P < 2.2e-16). b, No significant correlation was observed between the median estimator and the MR-JTI estimate from shuffled GWAS summary statistics data, which indicates that MR-JTI's type I error is well controlled.
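
The median estimator defined in this caption can be written in a few lines. The sketch below is illustrative only: the function names and effect sizes are hypothetical, and variants with a zero eQTL effect are skipped as an assumption to avoid division by zero.

```python
from statistics import median


def wald_ratios(gwas_betas, eqtl_betas):
    """Per-variant Wald ratio: GWAS effect size divided by eQTL effect size."""
    return [bg / be for bg, be in zip(gwas_betas, eqtl_betas) if be != 0]


def median_estimator(gwas_betas, eqtl_betas):
    """Median of the Wald ratios across all cis-eQTLs of one gene."""
    return median(wald_ratios(gwas_betas, eqtl_betas))


# Hypothetical effect sizes for four cis-eQTLs of one gene.
eqtl = [0.5, 0.4, 0.25, 0.2]
gwas = [0.1, 0.08, 0.05, 0.2]  # the last variant mimics an invalid instrument
estimate = median_estimator(gwas, eqtl)
```

In this toy example three variants give a ratio of about 0.2 and the fourth gives 1.0. The median stays close to 0.2, which shows why a median-based estimate is less sensitive to a minority of invalid instruments than a mean would be.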

Supplementary information

Supplementary Information

Supplementary Note and Figs. 1–4

Reporting Summary

Supplementary Tables

Supplementary Tables 1–10

About this article

Cite this article

Zhou, D., Jiang, Y., Zhong, X. et al. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nat Genet 52, 1239–1246 (2020). https://doi.org/10.1038/s41588-020-0706-2

Received: 07 January 2020
Accepted: 28 August 2020
Published: 05 October 2020
Issue Date: November 2020
DOI: https://doi.org/10.1038/s41588-020-0706-2
