Abstract
Transcriptome-wide association analysis is a powerful approach to studying the genetic architecture of complex traits. A key component of this approach is to build a model that imputes gene expression levels from genotypes, using samples with matched genotype and gene expression data in a given tissue. However, it is challenging to develop robust and accurate imputation models with the limited sample size available for any single tissue. Here, we first introduce a multi-task learning method to jointly impute gene expression in 44 human tissues. Compared with single-tissue methods, our approach achieved an average improvement of 39% in imputation accuracy and generated effective imputation models for an average of 120% more genes. We then describe a summary-statistic-based testing framework that combines multiple single-tissue associations into a powerful metric to quantify the overall gene–trait association. We applied our method, called UTMOST (unified test for molecular signatures), to multiple genome-wide-association results and demonstrated its advantages over single-tissue strategies.
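To make the idea of combining single-tissue associations concrete, the following is a minimal sketch, not the test used by UTMOST. It assumes a simple weighted sum of per-tissue z-scores, rescaled by its null standard deviation given a tissue-tissue correlation matrix. The function name `combine_z_scores`, its parameters, and the example values are hypothetical and chosen only for illustration.

```python
import math


def combine_z_scores(z, corr, weights=None):
    """Combine correlated single-tissue z-scores into one statistic.

    Illustrative stand-in only: a weighted sum of z-scores divided by its
    standard deviation under the null, where `corr` is the assumed
    correlation matrix of the z-scores across tissues.
    Returns (combined z-score, two-sided p-value).
    """
    k = len(z)
    w = weights if weights is not None else [1.0] * k
    # Weighted sum of the per-tissue z-scores.
    numerator = sum(w[i] * z[i] for i in range(k))
    # Null variance of the weighted sum: w' * corr * w.
    variance = sum(w[i] * corr[i][j] * w[j] for i in range(k) for j in range(k))
    z_comb = numerator / math.sqrt(variance)
    # Two-sided p-value for a standard normal statistic.
    p_value = math.erfc(abs(z_comb) / math.sqrt(2.0))
    return z_comb, p_value


# Hypothetical example: the same gene tested in two tissues.
independent = [[1.0, 0.0], [0.0, 1.0]]
identical = [[1.0, 1.0], [1.0, 1.0]]
z_ind, p_ind = combine_z_scores([2.0, 2.0], independent)
z_dup, p_dup = combine_z_scores([2.0, 2.0], identical)
```

In this sketch, two independent tissues with the same modest signal yield a stronger combined statistic than either alone, whereas two perfectly correlated tissues add no information and the combined z-score equals the single-tissue value. A signed sum can cancel when effects point in opposite directions across tissues, which is one reason more elaborate cross-tissue tests are used in practice.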
Data availability
All data used in the manuscript are publicly available (see URLs). GTEx and GERA data can be accessed by application to dbGaP. CommonMind data are available through formal application to NIMH. ADGC phase 2 summary statistics used for validation are available through the NIAGADS portal under accession number NG00076.
Acknowledgements
This study was supported in part by NIH grants R01 GM59507 and 3P30AG021342-16S2 (Y.H., M.L., Q.L., and H.Z.), CTSA UL1TR000427 (Q.L.), R01 AG042437 and U01 AG006781 (P.K.C. and S. Mukherjee); the Yale World Scholars Program sponsored by the China Scholarship Council (J.W. and Z.Y.); Neil Shen’s SJTU Medical Research Fund, the SJTU-Yale Collaborative Research Seed Fund; and NSFC 31728012 (J.G., H.L., and H.Z.), and the National Key R&D Program of China 2018YFC0910500 (J.G. and H.L.). We thank C. Brown for assistance in matching GTEx tissues to Roadmap cell types. This study makes use of summary statistics from many GWAS consortia. We thank the investigators in these GWAS consortia for generously sharing their data. We thank the IGAP for providing summary results data for these analyses. The investigators within IGAP contributed to the design and implementation of IGAP and/or provided data but did not participate in the analysis or writing of this report. IGAP was made possible by the generous participation of the subjects and their families. The i-Select chips were funded by the French National Foundation on Alzheimer’s disease and related disorders. EADI was supported by the LABEX (Laboratory of Excellence Program Investment for the Future) DISTALZ grant, Inserm, Institut Pasteur de Lille, Université de Lille 2, and the Lille University Hospital. The Genetic and Environmental Risk in AD consortium (GERAD) was supported by the Medical Research Council (grant no. 503480), Alzheimer’s Research UK (grant no. 503176), the Wellcome Trust (grant no. 082604/2/07/Z), and the German Federal Ministry of Education and Research (BMBF): Competence Network Dementia (CND) grant nos. 01GI0102, 01GI0711, and 01GI0420. The Cohorts for Heart and Aging Research in Genomic Epidemiology consortium (CHARGE) was partly supported by NIH/NIA grant no. R01 AG033193, NIA grant no. AG081220, AGES contract N01–AG–12100, NHLBI grant no. R01 HL105756, the Icelandic Heart Association, and the Erasmus Medical Center and Erasmus University. ADGC was supported by NIH/NIA grant nos. U01 AG032984, U24 AG021886, and U01 AG016976, and the Alzheimer’s Association grant no. ADGC–10–196728. We thank the contributors who collected the samples used in this study, as well as the patients and their families, whose help and participation made this work possible; data for this study were prepared, archived, and distributed by the National Institute on Aging Alzheimer’s Disease Data Storage Site (NIAGADS) at the University of Pennsylvania (U24-AG041689-01). We are also grateful to all the consortia and investigators that provided publicly accessible GWAS summary statistics.
Author information
Contributions
Y.H., M.L., Q.L., H.L., and H.Z. conceived the study and developed the statistical model. Y.H., M.L., Q.L., H.W., J.W., S.M.Z., B.L., Y.S., S. Muchnik, and J.G. performed the statistical analyses. S.M.Z. and P.N. assisted in LDL analysis. Y.H., M.L., Z.Y., and Q.L. implemented the software. B.W.K. prepared ADGC summary statistics. A.N., A.K., and Y.Z. assisted in data preparation. S. Mukherjee and P.K.C. assisted in Alzheimer’s disease data application, curation, and interpretation. Y.H., M.L., Q.L., H.L., and H.Z. wrote the manuscript. H.Z. advised on statistical and genetic issues. All authors contributed to manuscript editing and approved the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Supplementary information
Supplementary Text and Figures
Supplementary Note, Supplementary Tables 1–13, 21–23 and 26–29, and Supplementary Figures 1–13
Supplementary Tables
Supplementary Tables 14–20, 24 and 25
Cite this article
Hu, Y., Li, M., Lu, Q. et al. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat Genet 51, 568–576 (2019). https://doi.org/10.1038/s41588-019-0345-7
DOI: https://doi.org/10.1038/s41588-019-0345-7