There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability (5.84× for expression QTLs (eQTLs); P = 1.19 × 10⁻³¹) across 41 diseases and complex traits than annotations containing all significant molecular QTLs (1.80× for eQTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06×; P = 1.20 × 10⁻³⁵). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.
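As a rough illustration of how a per-SNP annotation could be built from causal posterior probabilities, the sketch below assumes that each fine-mapped SNP-gene pair carries a causal posterior probability (CPP) and that the annotation value of a SNP is its maximum CPP across genes (the MaxCPP annotation named in the supplementary figure captions). The function name `max_cpp`, the record layout and the example values are hypothetical, and giving SNPs without any fine-mapped CPP a value of 0 is an assumption.

```python
from collections import defaultdict


def max_cpp(records, all_snps=None):
    """Collapse (snp, gene, cpp) records into one annotation value per SNP.

    The value for a SNP is the maximum CPP observed across genes.
    SNPs listed in `all_snps` but absent from `records` get 0.0
    (an assumption made for this sketch).
    """
    annotation = defaultdict(float)  # missing SNPs start at 0.0
    for snp, gene, cpp in records:
        if not 0.0 <= cpp <= 1.0:
            raise ValueError(f"CPP for {snp}/{gene} is not a probability: {cpp}")
        annotation[snp] = max(annotation[snp], cpp)

    result = dict(annotation)
    if all_snps is not None:
        for snp in all_snps:
            result.setdefault(snp, 0.0)
    return result


# Toy input: hypothetical SNP and gene identifiers, not data from the study.
records = [
    ("rs1", "GENE_A", 0.10),
    ("rs1", "GENE_B", 0.65),
    ("rs2", "GENE_A", 0.30),
]
annotation = max_cpp(records, all_snps=["rs1", "rs2", "rs3"])
```

Taking the maximum rather than the sum keeps the annotation in the range 0 to 1, so a SNP that is a likely causal variant for one gene is not outranked by a SNP with small probabilities spread over many genes.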
We thank S. Raychaudhuri, N. Zaitlen, B. Pasaniuc, M. Nivard, J.-H. Sul and F. Hormozdiari for helpful discussions. This research was funded by NIH grants U01 HG009379, R01 MH101244, R01 MH109978, T32 DK110919 and R01 MH107649. This research was conducted using the UK Biobank Resource under application 16549.
The authors declare no competing interests.
Integrated supplementary information
Supplementary Figure 1 Enrichment and τ* of different QTL annotations for whole blood in the GTEx dataset without conditioning on the baselineLD model.
a, Meta-analysis results of enrichment for whole blood tissue from the GTEx datasets. b, Meta-analysis results of τ* for whole blood. The y axis is the meta-analyzed value, and error bars represent jackknife 95% confidence intervals. These values were computed by meta-analyzing 41 independent traits (n = 41 independent traits to derive the statistics). Numerical results are reported in Supplementary Table 9.
We report annotation effect size (τ*) estimates at different eQTL sample sizes simulated under the alternative simulation framework (Methods). We simulated eQTL studies with sample sizes ranging from 100 to 1,000 in increments of 100. The x axis is the eQTL sample size, and the y axis is the estimated annotation effect size (τ*). We observed a linear relationship between eQTL sample size and τ* (R² = 0.84, P = 9.39 × 10⁻⁵).
The x axis is the MaxCPP value, and the y axis is the frequency corresponding to each MaxCPP value.
Supplementary Figure 4 Pairwise correlation among all baselineLD model annotations and our six molecular QTL MaxCPP annotations.
We compute the pairwise correlation between annotation values.
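A minimal sketch of such a pairwise correlation computation, assuming Pearson correlation (the caption does not state which coefficient was used) and annotations stored as equal-length lists of per-SNP values. The annotation names and values are toy placeholders, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant annotation")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlation(annotations):
    """Return {(name_a, name_b): r} for every ordered pair of annotations."""
    names = list(annotations)
    return {
        (a, b): pearson(annotations[a], annotations[b])
        for a in names
        for b in names
    }


# Toy per-SNP values for three hypothetical annotations.
toy = {
    "annot_A": [0.0, 0.2, 0.4, 0.9],
    "annot_B": [0.0, 0.4, 0.8, 1.8],  # exactly 2 x annot_A
    "annot_C": [1.0, 0.8, 0.6, 0.1],  # 1 - annot_A
}
corr = pairwise_correlation(toy)
```

The same function applies unchanged to LD scores: pass the per-SNP LD score of each annotation instead of the annotation values.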
Supplementary Figure 5 Pairwise correlation among LD scores of all baselineLD model annotations and six molecular QTL MaxCPP annotations.
We compute the pairwise correlation between the LD score of each annotation.
Supplementary Figure 6 Pairwise correlation among all baselineLD model annotations and GTEx blood and Brain+Nerve MaxCPP annotations.
We compute the pairwise correlation between annotation values.
Supplementary Figure 7 Pairwise correlation among LD scores of all baselineLD model annotations and GTEx blood and Brain+Nerve MaxCPP annotations.
We compute the pairwise correlation between the LD score of each annotation.
Supplementary Figure 8 Histogram of values for MaxCPP annotation for FE-Meta-Tissue for each molecular QTL in BLUEPRINT.
a, eQTL. b, H3K27ac hQTL. c, H3K4me1 hQTL. d, meQTL. e, sQTL. All these annotations were obtained by creating the MaxCPP annotation for each QTL dataset.
Supplementary Figure 9 MaxCPP enrichment estimates are not sensitive to the maximum number of causal variants per locus modeled by CAVIAR.
We report MaxCPP enrichment estimates for each of 44 GTEx tissues with CAVIAR modeling either up to six or up to three causal variants per locus. We determined that results were not statistically different. The y axis is the enrichment meta-analyzed value, and error bars represent 95% confidence intervals. These values were computed by meta-analyzing 41 independent traits (n = 41 independent traits to derive the statistics). Numerical results are reported in Supplementary Table 37.
Supplementary Figure 10 MaxCPP τ* estimates are not sensitive to the maximum number of causal variants per locus modeled by CAVIAR.
We report MaxCPP τ* estimates for each of 44 GTEx tissues with CAVIAR modeling either up to six or up to three causal variants per locus. We determined that results were not statistically different. The y axis is the τ* meta-analyzed value, and error bars represent 95% confidence intervals. These values were computed by meta-analyzing 41 independent traits (n = 41 independent traits to derive the statistics). Numerical results are reported in Supplementary Table 37.
Supplementary Figure 11 Simulations confirm that S-LDSC estimates for τ* are unique to the focal annotation.
We generated simulated data under a model in which only baselineLD and GTEx-FE-Meta-Tissue MaxCPP annotations directly influence trait heritability, using estimated τ* values from meta-analysis of 41 independent traits (estimated using a model that contains only baselineLD and GTEx-FE-Meta-Tissue MaxCPP annotations). We then estimated τ* values using a model that contains baselineLD, GTEx-FE-Meta-Tissue MaxCPP, and GTEx whole-blood MaxCPP annotations. Results are averaged across 2,000 simulations. a, We report τ* estimates for each annotation. The y axis is the mean of τ* values, and error bars represent 95% confidence intervals. We computed the mean and confidence intervals using 400 simulations (n = 400 independent simulations to derive the statistics). τ* estimates for GTEx whole-blood MaxCPP were not significantly different from 0. b, We report the false positive rate for different P-value thresholds of τ* estimates for GTEx whole-blood MaxCPP. We observed correct null calibration. We computed the false positive rate using 400 simulations (n = 400 independent simulations to derive the statistics).
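The false positive rate in panel b can be read as the fraction of null simulations whose P value falls below a given threshold. The sketch below assumes one P value per simulation; the function name and the toy P values are hypothetical and are not results from the study.

```python
def false_positive_rate(p_values, thresholds):
    """Fraction of null simulations with P below each threshold."""
    if not p_values:
        raise ValueError("need at least one simulation")
    n = len(p_values)
    return {t: sum(p < t for p in p_values) / n for t in thresholds}


# Toy P values standing in for ten null simulations.
toy_p = [0.003, 0.04, 0.2, 0.35, 0.5, 0.62, 0.7, 0.81, 0.9, 0.97]
fpr = false_positive_rate(toy_p, thresholds=[0.01, 0.05, 0.5])
```

Under correct null calibration, the rate at threshold t should be close to t once the number of simulations is large; ten toy values are far too few to judge calibration and serve only to show the calculation.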
Supplementary Figures 1–11 and Supplementary Tables 1–5, 7–15, 17–26 and 28–36
List of 47 datasets analyzed in this study
τ* of MaxCPP annotations for individual tissues and individual traits
Enrichment and τ* for CD14+ monocytes, CD16+ neutrophils, naive CD4+ T cells and FE-Meta-Tissue in the BLUEPRINT dataset
MaxCPP enrichment and τ* estimates are not sensitive to the maximum number of causal variants per locus modeled by CAVIAR
Results using baselineLD model v1.1 and baselineLD model v1.0 are highly concordant
About this article
Cite this article
Hormozdiari, F., Gazal, S., van de Geijn, B. et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat Genet 50, 1041–1047 (2018). https://doi.org/10.1038/s41588-018-0148-2