Abstract
Disease variants identified by genome-wide association studies (GWAS) tend to overlap with expression quantitative trait loci (eQTLs), but it remains unclear whether this overlap is driven by gene expression levels ‘mediating’ genetic effects on disease. Here, we introduce a new method, mediated expression score regression (MESC), to estimate disease heritability mediated by the cis genetic component of gene expression levels. We applied MESC to GWAS summary statistics for 42 traits (average N = 323,000) and cis-eQTL summary statistics for 48 tissues from the Genotype-Tissue Expression (GTEx) consortium. Averaging across traits, only 11 ± 2% of heritability was mediated by assayed gene expression levels. Expression-mediated heritability was enriched in genes with evidence of selective constraint and genes with disease-appropriate annotations. Our results demonstrate that assayed bulk tissue eQTLs, although disease relevant, cannot explain the majority of disease heritability.
Data availability
GWAS summary statistics for 42 diseases and complex traits can be found at https://data.broadinstitute.org/alkesgroup/sumstats_formatted/. Genotypes for 1000 Genomes Phase 3 data can be found at ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502. GTEx v.7 data can be found at https://www.gtexportal.org/home/datasets, although access to genotypes requires an approved application. eQTLGen data can be found at https://www.eqtlgen.org/cis-eqtls.html. BaselineLD v.2.0 annotations can be found at https://data.broadinstitute.org/alkesgroup/LDSCORE/. Gene sets are available from the MacArthur laboratory, https://github.com/macarthur-lab/gene_lists, and the Molecular Signatures Database, http://software.broadinstitute.org/gsea/msigdb/collections.jsp. S-LDSC software can be found at https://github.com/bulik/ldsc. BOLT-LMM software can be found at https://data.broadinstitute.org/alkesgroup/BOLT-LMM/downloads/.
Code availability
Software implementing MESC can be found at https://github.com/douglasyao/mesc.
References
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Visscher, P. M. et al. 10 Years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).
Stunnenberg, H. G. et al. The international human epigenome consortium: a blueprint for scientific collaboration and discovery. Cell 167, 1145–1149 (2016).
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).
Ongen, H. et al. Estimating the causal tissues for complex traits and diseases. Nat. Genet. 49, 1676–1683 (2017).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
Hemani, G. et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife 7, e34408 (2018).
Barfield, R. et al. Transcriptome-wide association studies accounting for colocalization using Egger regression. Genet. Epidemiol. 42, 418–433 (2018).
Mancuso, N. et al. Large-scale transcriptome-wide association study identifies new prostate cancer risk regions. Nat. Commun. 9, 4079 (2018).
Wu, L. et al. A transcriptome-wide association study of 229,000 women identifies new candidate susceptibility genes for breast cancer. Nat. Genet. 50, 968–978 (2018).
Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).
Gandal, M. J. et al. Transcriptome-wide isoform-level dysregulation in ASD, schizophrenia, and bipolar disorder. Science 362, eaat8127 (2018).
Gusev, A. et al. A transcriptome-wide association study of high-grade serous epithelial ovarian cancer identifies new susceptibility genes and splice variants. Nat. Genet. 51, 815–823 (2019).
Huckins, L. M. et al. Gene expression imputation across multiple brain regions provides insights into schizophrenia risk. Nat. Genet. 51, 659–674 (2019).
Gamazon, E. R., Zwinderman, A. H., Cox, N. J., Denys, D. & Derks, E. M. Multi-tissue transcriptome analyses identify genetic mechanisms underlying neuropsychiatric traits. Nat. Genet. 51, 933–940 (2019).
Porcu, E. et al. Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nat. Commun. 10, 3300 (2019).
Davis, L. K. et al. Partitioning the heritability of Tourette syndrome and obsessive compulsive disorder reveals differences in genetic architecture. PLoS Genet. 9, e1003864 (2013).
Torres, J. M. et al. Cross-tissue and tissue-specific eQTLs: partitioning the heritability of a complex trait. Am. J. Hum. Genet. 95, 521–534 (2014).
Hormozdiari, F. et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat. Genet. 50, 1041–1047 (2018).
Gamazon, E. R. et al. Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Nat. Genet. 50, 956–967 (2018).
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
Mancuso, N. et al. Probabilistic fine-mapping of transcriptome-wide association studies. Nat. Genet. 51, 675–682 (2019).
Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E. & Montgomery, S. B. Abundant associations with gene expression complicate GWAS follow-up. Nat. Genet. 51, 768–769 (2019).
Võsa, U. et al. Unraveling the polygenic architecture of complex traits using blood eQTL metaanalysis. Preprint at bioRxiv https://doi.org/10.1101/447367 (2018).
Fairfax, B. P. et al. Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science 343, 1246949 (2014).
van der Wijst, M. G. P. et al. Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs. Nat. Genet. 50, 493–497 (2018).
Strober, B. J. et al. Dynamic genetic regulation of gene expression during cellular differentiation. Science 364, 1287–1290 (2019).
Hemani, G., Bowden, J. & Davey Smith, G. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Hum. Mol. Genet. 27, R195–R208 (2018).
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Glassberg, E. C., Gao, Z., Harpak, A., Lan, X. & Pritchard, J. K. Evidence for weak selective constraint on human gene expression. Genetics 211, 757–772 (2019).
Hernandez, R. D. et al. Ultrarare variants drive substantial cis heritability of human gene expression. Nat. Genet. 51, 1349–1355 (2019).
Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).
Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).
Gazal, S. et al. Linkage disequilibrium–dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019).
Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).
Cassa, C. A. et al. Estimating the selective effects of heterozygous protein-truncating variants from human exome data. Nat. Genet. 49, 806–810 (2017).
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353–D361 (2017).
Fabregat, A. et al. The reactome pathway knowledgebase. Nucleic Acids Res. 46, D649–D655 (2018).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
Wishart, D. S. et al. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic Acids Res. 46, D1074–D1082 (2018).
Blake, J. A. et al. The mouse genome database (MGD): premier model organism resource for mammalian genomics and genetics. Nucleic Acids Res. 39, D842–D848 (2011).
Georgi, B., Voight, B. F. & Bućan, M. From mouse to human: evolutionary genomics analysis of human orthologs of essential genes. PLoS Genet. 9, e1003484 (2013).
Liu, X., Jian, X. & Boerwinkle, E. dbNSFP v2.0: a database of human non-synonymous SNVs and their functional predictions and annotations. Hum. Mutat. 34, E2393–E2402 (2013).
MacArthur, J. et al. The new NHGRI-EBI catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res. 45, D896–D901 (2017).
Darnell, J. C. et al. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146, 247–261 (2011).
Pardiñas, A. F. et al. Common schizophrenia alleles are enriched in mutation-intolerant genes and in regions under strong background selection. Nat. Genet. 50, 381–389 (2018).
Krishnan, V., Bryant, H. U. & MacDougald, O. A. Regulation of bone mass by Wnt signaling. J. Clin. Invest. 116, 1202–1209 (2006).
Periayah, M. H., Halim, A. S. & Mat Saad, A. Z. Mechanism action of platelets and crucial blood coagulation pathways in hemostasis. Int. J. Hematol. Oncol. Stem Cell Res. 11, 319–327 (2017).
Segrè, A. V. et al. Common inherited variation in mitochondrial genes is not enriched for associations with type 2 diabetes or related glycemic traits. PLoS Genet. 6, e1001058 (2010).
Lee, P. H., O’Dushlaine, C., Thomas, B. & Purcell, S. M. INRICH: interval-based enrichment analysis for genome-wide association studies. Bioinformatics 28, 1797–1799 (2012).
Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).
de Leeuw, C. A., Neale, B. M., Heskes, T. & Posthuma, D. The statistical properties of gene-set analysis. Nat. Rev. Genet. 17, 353–364 (2016).
Yoon, S. et al. Efficient pathway enrichment and network analysis of GWAS summary data using GSA-SNP2. Nucleic Acids Res. 46, e60 (2018).
Zhu, X. & Stephens, M. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes. Nat. Commun. 9, 4361 (2018).
Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
Delaneau, O. et al. Chromatin three-dimensional interactions mediate genetic effects on gene expression. Science 364, eaat8266 (2019).
Hu, Y. et al. A statistical framework for cross-tissue transcriptome-wide association analysis. Nat. Genet. 51, 568–576 (2019).
Price, A. L. et al. Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genet. 7, e1001317 (2011).
Liu, X. et al. Functional architectures of local and distal regulation of gene expression in multiple human tissues. Am. J. Hum. Genet. 100, 605–616 (2017).
Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Series B Stat. Methodol. 58, 267–288 (1996).
The International HapMap 3 Consortium. Integrating common and rare genetic variation in diverse human populations. Nature 467, 52–58 (2010).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Stegle, O., Parts, L., Piipari, M., Winn, J. & Durbin, R. Using probabilistic estimation of expression residuals (PEER) to obtain increased power and interpretability of gene expression analyses. Nat. Protoc. 7, 500–507 (2012).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Kim, S. S. et al. Genes with high network connectivity are enriched for disease heritability. Am. J. Hum. Genet. 104, 896–913 (2019).
Loh, P.-R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Loh, P.-R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).
Hart, T. et al. Evaluation and design of genome-wide CRISPR/SpCas9 knockout screens. G3 7, 2719–2727 (2017).
Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).
Samocha, K. E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014).
Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
Acknowledgements
We thank B. Pasaniuc, R. Ophoff, H. Shi, S. Groha, K. Siewert, S. Gazal and A. Liu for helpful discussions. This research was funded by NIH grant nos. T32 HG002295 (D.W.Y.), R01 MH115676 (A.L.P. and A.G.), R01 CA227237 (A.G.), R01 MH107649 (A.L.P.), R01 MH101244 (A.L.P.), R01 HG006399 (A.L.P.) and U01 HG009379 (A.L.P.). This research was conducted using the UK Biobank Resource under Application 16549.
Author information
Contributions
D.W.Y., L.J.O., A.L.P. and A.G. conceived the project. D.W.Y. and A.G. designed experiments. D.W.Y. performed the experiments and analyzed the data. D.W.Y. and A.G. wrote the manuscript with input from L.J.O. and A.L.P.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Relationship between \(h_{med}^2/h_g^2\) and \(h_g^2\).
\(h_{med}^2/h_g^2\) estimates were obtained using all-tissue meta-analyzed expression scores. \(h_g^2\) estimates were obtained using stratified LD-score regression. Error bars represent jackknife standard errors.
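The jackknife standard errors mentioned in these captions can be illustrated with a minimal sketch of a delete-one-block jackknife for a ratio estimate. This is not the MESC implementation: the function name `block_jackknife_ratio`, the use of equal-sized contiguous blocks and the default of 200 blocks are all assumptions made for illustration.

```python
import math


def block_jackknife_ratio(num, den, n_blocks=200):
    """Delete-one-block jackknife for the ratio sum(num) / sum(den).

    Hypothetical helper for illustration only. `num` and `den` are
    per-SNP contributions to a numerator and a denominator, ordered
    along the genome so that contiguous blocks are roughly independent.
    Returns (estimate, jackknife standard error).
    """
    n = len(num)
    if len(den) != n:
        raise ValueError("num and den must have the same length")
    if not 2 <= n_blocks <= n:
        raise ValueError("n_blocks must be between 2 and len(num)")

    # Boundaries of n_blocks contiguous, near-equal-sized blocks.
    bounds = [i * n // n_blocks for i in range(n_blocks + 1)]

    total_num, total_den = sum(num), sum(den)
    estimate = total_num / total_den

    # Re-estimate the ratio with each block left out in turn.
    leave_one_out = []
    for start, stop in zip(bounds, bounds[1:]):
        loo_num = total_num - sum(num[start:stop])
        loo_den = total_den - sum(den[start:stop])
        leave_one_out.append(loo_num / loo_den)

    # Jackknife variance: (B - 1) / B * sum of squared deviations.
    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    return estimate, math.sqrt(variance)
```

Leaving out whole blocks rather than single SNPs is the usual way to account for correlation between nearby SNPs; the block count trades off that protection against the stability of the variance estimate.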
Extended Data Fig. 2 \(h_{med}^2/h_g^2\) estimates for all diseases and expression scores.
Same as Fig. 3a, but containing \(h_{med}^2/h_g^2\) estimates for all 42 traits from all three types of expression scores: ‘All tissues’ (expression scores meta-analyzed across all 48 GTEx tissues), ‘Best tissue group’ (expression scores meta-analyzed within 7 tissue groups), and ‘Best tissue’ (expression scores computed within individual tissues). Here, ‘best’ refers to the tissue/tissue group resulting in the highest estimates of \(h_{med}^2/h_g^2\) compared to all other tissues/tissue groups. Error bars represent jackknife standard errors.
Extended Data Fig. 3 Relationship between individual tissue sample size and magnitude of \(h_{med}^2/h_g^2\).
\(h_{med}^2/h_g^2\) estimates from expression scores estimated in each of 48 individual GTEx tissues were meta-analyzed across 42 complex traits, then plotted against the number of samples in each tissue. We use the following abbreviations: adipose visceral, adipose visceral omentum; brain ACC, brain anterior cingulate cortex BA24; brain CBG, brain caudate basal ganglia; brain CH, brain cerebellar hemisphere; brain FC, brain frontal cortex BA9; brain NABG, brain nucleus accumbens basal ganglia; brain PBG, brain putamen basal ganglia; cells CETL, cells EBV-transformed lymphocytes; cells TF, cells transformed fibroblasts; esophagus GJ, esophagus gastroesophageal junction; heart AA, heart atrial appendage; heart LV, heart left ventricle; skin NSES, skin not sun exposed suprapubic; skin SELL, skin sun exposed lower leg; small intestine, small intestine terminal ileum.
Extended Data Fig. 4 \(h_{med}^{2}/h_{g}^{2}\) estimates for 42 diseases and complex traits using data from eQTLGen.
We estimated expression scores for all SNPs using cis-eQTL summary statistics from eQTLGen (N = 31,684 blood samples), then estimated \(h_{med}^2/h_g^2\) using GWAS summary statistics for the same 42 traits analyzed in the main text. Expression cis-heritability estimates for eQTLGen data were obtained using LD-score regression. For comparison, we also display \(h_{med}^2/h_g^2\) estimates obtained from expression scores from GTEx all-tissue meta-analysis and GTEx whole blood only. (a) \(h_{med}^2/h_g^2\) estimates for 42 individual traits, organized into blood/immune and non-blood/immune traits. Error bars represent jackknife standard errors. (b) Results from (a) meta-analyzed across traits. Error bars represent standard errors from random-effects meta-analysis. Note that low estimates of \(h_{med}^2/h_g^2\) for GTEx whole blood expression scores are caused by the small sample size of the GTEx whole blood data set (N = 369).
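As an illustration of how per-trait estimates and their standard errors can be combined in a random-effects meta-analysis, the following is a minimal sketch. The caption does not state which estimator was used; the DerSimonian-Laird estimator of between-trait variance is assumed here, and the function name `random_effects_meta` is hypothetical.

```python
import math


def random_effects_meta(estimates, std_errors):
    """Random-effects meta-analysis (DerSimonian-Laird; an assumption).

    Hypothetical helper for illustration only. Takes per-trait point
    estimates and their standard errors and returns
    (pooled estimate, pooled standard error, tau^2).
    """
    k = len(estimates)
    if k < 2 or len(std_errors) != k:
        raise ValueError("need at least two estimates with matching SEs")

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w

    # Method-of-moments between-trait variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each sampling variance.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum_w_re
    return pooled, math.sqrt(1.0 / sum_w_re), tau2
```

When the per-trait estimates agree to within their sampling error, the between-trait variance is truncated to zero and the result coincides with a fixed-effect inverse-variance average; heterogeneity across traits widens the pooled standard error.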
Extended Data Fig. 5 Relationship between expression cis-heritability and metrics of gene essentiality.
For each gene, pLI (probability of loss-of-function intolerance) was obtained from Lek et al. (2016) and \(s_{het}\) (selection against protein-truncating variants) was obtained from Cassa et al. (2017).
Extended Data Fig. 6 \(h_{med}^{2}\) enrichment estimates for all 10 broadly essential gene sets across all 26 complex traits.
Same as Fig. 5a, but showing \(h_{med}^2\) enrichment estimates for individual traits rather than meta-analyzed estimates.
Extended Data Fig. 7 \(h_{med}^{2}\) enrichment estimates for 97 pathway-specific gene sets across all 26 complex traits.
Same as Fig. 5b, but plotting all pathway-specific gene sets (out of 780 total) with FDR-significant \(h_{med}^2\) enrichment in at least one of the 26 complex traits. For ease of display, we grouped together related traits and gene sets.
Extended Data Fig. 8 Comparison between gene set enrichment estimates from MESC, MAGMA, and DEPICT.
See Supplementary Note for details on these analyses. (a) Venn diagram showing the overlap between significantly enriched trait-gene set pairs (FDR < 0.05) identified by the three methods. (b) Scatterplots of -log10 enrichment p-values from MESC vs. MAGMA (left), MESC vs. DEPICT (middle), and MAGMA vs. DEPICT (right). Each point represents a trait-gene set pair. (c) List of all 32 trait-gene set pairs detected as significant by MESC (FDR q-value < 0.05) that are not detected as significant by MAGMA or DEPICT. See Supplementary Table 9 for enrichment estimates for all trait-gene set pairs.
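The FDR q-value threshold used in this comparison can be illustrated with a minimal sketch that converts a list of enrichment p-values into q-values. The caption does not say which FDR procedure was applied; the Benjamini-Hochberg step-up procedure is assumed here, and the function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values).

    Hypothetical helper for illustration only; the choice of the
    Benjamini-Hochberg procedure is an assumption. Returns q-values in
    the same order as the input p-values.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def significant_pairs(labels, p_values, threshold=0.05):
    """Return the labels whose q-value falls below the threshold."""
    q_values = benjamini_hochberg(p_values)
    return [lab for lab, q in zip(labels, q_values) if q < threshold]
```

Here `labels` would be trait-gene set pairs and `p_values` their enrichment p-values from a single method; applying the same threshold to each method's output gives the sets whose overlap a Venn diagram would display.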
Supplementary information
Supplementary Information
Supplementary Note and Figs. 1–25
Supplementary Tables
Supplementary Tables 1–9
About this article
Cite this article
Yao, D.W., O’Connor, L.J., Price, A.L. et al. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat Genet 52, 626–633 (2020). https://doi.org/10.1038/s41588-020-0625-2
DOI: https://doi.org/10.1038/s41588-020-0625-2