Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits

Abstract

There is increasing evidence that many risk loci found using genome-wide association studies are molecular quantitative trait loci (QTLs). Here we introduce a new set of functional annotations based on causal posterior probabilities of fine-mapped molecular cis-QTLs, using data from the Genotype-Tissue Expression (GTEx) and BLUEPRINT consortia. We show that these annotations are more strongly enriched for heritability (5.84× for expression QTLs (eQTLs); P = 1.19 × 10⁻³¹) across 41 diseases and complex traits than annotations containing all significant molecular QTLs (1.80× for eQTLs). eQTL annotations obtained by meta-analyzing all GTEx tissues generally performed best, whereas tissue-specific eQTL annotations produced stronger enrichments for blood- and brain-related diseases and traits. eQTL annotations restricted to loss-of-function intolerant genes were even more enriched for heritability (17.06×; P = 1.20 × 10⁻³⁵). All molecular QTLs except splicing QTLs remained significantly enriched in joint analysis, indicating that each of these annotations is uniquely informative for disease and complex trait architectures.

Fig. 1: S-LDSC and GCTA estimates for Topcis-QTL are upward biased in simulations.
Fig. 2: Fine-mapped eQTLs are strongly enriched for disease and trait heritability.
Fig. 3: Relationship between eQTL sample size and the annotation effect size (τ*).
Fig. 4: Tissue-specific fine-mapped eQTL enrichments for blood- and brain-related traits.
Fig. 5: Heritability enrichment of fine-mapped eQTLs is concentrated in disease-relevant gene sets.
Fig. 6: Fine-mapped eQTL, hQTL, sQTL and meQTL annotations are enriched for disease and trait heritability.

Acknowledgements

We thank S. Raychaudhuri, N. Zaitlen, B. Pasaniuc, M. Nivard, J.-H. Sul and F. Hormozdiari for helpful discussions. This research was funded by NIH grants U01 HG009379, R01 MH101244, R01 MH109978, T32 DK110919 and R01 MH107649. This research was conducted using the UK Biobank Resource under application 16549.

Author information

Contributions

F.H. and A.L.P. designed experiments. F.H. performed experiments. F.H., S.G., B.v.d.G., H.K.F., C.J.-T.J., P.-R.L., A.S., Y.R., X.L., L.O., A.G. and E.E. analyzed data. F.H. and A.L.P. wrote the manuscript with assistance from all authors.

Corresponding authors

Correspondence to Farhad Hormozdiari or Alkes L. Price.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Enrichment and τ* of different QTL annotations for whole blood in the GTEx dataset without conditioning on the baselineLD model.

a, Meta-analysis results of enrichment for whole blood tissue from the GTEx dataset. b, Meta-analysis results of τ* for whole blood. The y axis is the meta-analyzed value, and error bars represent jackknife 95% confidence intervals. These values were computed by meta-analyzing 41 independent traits (n = 41 independent traits to derive the statistics). Numerical results are reported in Supplementary Table 9.
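As an illustration of how jackknife 95% confidence intervals such as those in the error bars can be formed, the following is a minimal delete-one jackknife sketch in Python. The input values and the `jackknife_ci` helper are hypothetical, and the blocking scheme and estimator actually used in the analysis are not described in this section.

```python
from math import sqrt

def jackknife_ci(data, estimator, z=1.96):
    """Delete-one jackknife: returns (estimate, standard error, 95% interval)."""
    n = len(data)
    full_estimate = estimator(data)
    # Re-estimate with each observation left out in turn.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(leave_one_out) / n
    # Jackknife variance: (n - 1) / n times the sum of squared deviations.
    se = sqrt((n - 1) / n * sum((t - loo_mean) ** 2 for t in leave_one_out))
    return full_estimate, se, (full_estimate - z * se, full_estimate + z * se)

def mean(values):
    return sum(values) / len(values)

# Hypothetical per-block estimates, invented for illustration only.
block_estimates = [1.0, 2.0, 3.0, 4.0, 5.0]
estimate, se, (lower, upper) = jackknife_ci(block_estimates, mean)
print(f"estimate={estimate:.3f} se={se:.3f} 95% CI=({lower:.3f}, {upper:.3f})")
```

For the sample mean, the jackknife standard error coincides with the usual standard error of the mean, which makes this toy case easy to check by hand.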

Supplementary Figure 2 Simulation results at different eQTL sample sizes.

We report annotation effect size (τ*) estimates at different eQTL sample sizes simulated under the alternative simulation framework (Methods). We simulated eQTL studies with sample sizes of 100, 200, 300, 400, 500, 600, 700, 800, 900 and 1,000. The x axis is the eQTL sample size, and the y axis is the estimated annotation effect size (τ*). We observed a linear relationship between eQTL sample size and τ* (R² = 0.84, P = 9.39 × 10⁻⁵).
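The linear relationship reported above can be quantified with an ordinary least-squares fit. The sketch below shows one way to compute the slope, intercept and R² in plain Python. The τ* values are invented for illustration and are not the simulation results, and `linear_fit` is a hypothetical helper name.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 is the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# The ten eQTL sample sizes listed in the legend: 100, 200, ..., 1,000.
sample_sizes = list(range(100, 1001, 100))
# Hypothetical tau* estimates, invented for illustration only.
tau_star = [0.10, 0.14, 0.15, 0.21, 0.22, 0.27, 0.26, 0.33, 0.34, 0.38]

slope, intercept, r_squared = linear_fit(sample_sizes, tau_star)
print(f"slope={slope:.5f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```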

Supplementary Figure 3 Histogram of values for MaxCPP annotation in GTEx.

The x axis is the MaxCPP value, and the y axis is the frequency corresponding to each MaxCPP value.
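A histogram like this one can be built from per-SNP annotation values. The sketch below assumes that the MaxCPP value of a SNP is the maximum causal posterior probability (CPP) over all genes for which that SNP was fine-mapped; this definition is not stated in this section and should be treated as an assumption. The records, identifiers and helper names are hypothetical.

```python
def max_cpp(records):
    """Map each SNP to its largest CPP across genes.

    `records` is an iterable of (snp_id, gene_id, cpp) tuples.
    """
    best = {}
    for snp, _gene, cpp in records:
        best[snp] = max(best.get(snp, 0.0), cpp)
    return best

def histogram(values, n_bins=10):
    """Count values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for value in values:
        # A value of exactly 1.0 is placed in the last bin.
        index = min(int(value * n_bins), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical fine-mapping output, invented for illustration only.
records = [
    ("rs1", "GENE_A", 0.25),
    ("rs1", "GENE_B", 0.75),
    ("rs2", "GENE_A", 0.05),
    ("rs3", "GENE_C", 1.0),
]
annotation = max_cpp(records)
print(annotation)
print(histogram(annotation.values()))
```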

Supplementary Figure 4 Pairwise correlation among all baselineLD model annotations and our six molecular QTL MaxCPP annotations.

We compute the pairwise correlation between annotation values.
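A pairwise correlation of this kind can be sketched as a Pearson correlation between per-SNP annotation vectors. The choice of Pearson correlation is an assumption, since the correlation measure is not named in this section, and the annotation names and values below are invented. The same function applies unchanged to vectors of LD scores.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def correlation_matrix(annotations):
    """Return {(name_i, name_j): r} for every pair of annotations."""
    names = list(annotations)
    return {
        (p, q): pearson(annotations[p], annotations[q])
        for p in names
        for q in names
    }

# Hypothetical per-SNP annotation values, invented for illustration only.
annotations = {
    "eQTL_MaxCPP": [0.0, 0.1, 0.9, 0.3, 0.0],
    "hQTL_MaxCPP": [0.0, 0.2, 0.7, 0.1, 0.1],
    "coding": [0.0, 0.0, 1.0, 0.0, 0.0],
}
for (p, q), r in correlation_matrix(annotations).items():
    if p < q:  # print each unordered pair once
        print(f"{p} vs {q}: r={r:.3f}")
```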

Supplementary Figure 5 Pairwise correlation among LD scores of all baselineLD model annotations and six molecular QTL MaxCPP annotations.

We compute the pairwise correlation between the LD score of each annotation.

Supplementary Figure 6 Pairwise correlation among all baselineLD model annotations and GTEx blood and Brain+Nerve MaxCPP annotations.

We compute the pairwise correlation between annotation values.

Supplementary Figure 7 Pairwise correlation among LD scores of all baselineLD model annotations and GTEx blood and Brain+Nerve MaxCPP annotations.

We compute the pairwise correlation between the LD score of each annotation.

Supplementary Figure 8 Histogram of values for MaxCPP annotation for FE-Meta-Tissue for each molecular QTL in BLUEPRINT.

a, eQTL. b, H3K27ac hQTL. c, H3K4me1 hQTL. d, meQTL. e, sQTL. All these annotations were obtained by creating the MaxCPP annotation for each QTL dataset.

Supplementary Figure 9 MaxCPP enrichment estimates are not sensitive to the maximum number of causal variants per locus modeled by CAVIAR.

We report MaxCPP enrichment estimates for each of 44 GTEx tissues with CAVIAR modeling either up to six or up to three causal variants per locus. We determined that results were not statistically different. The y axis is the enrichment meta-analyzed value, and error bars represent 95% confidence intervals. These values were computed by meta-analyzing 41 independent traits (n = 41 independent traits to derive the statistics). Numerical results are reported in Supplementary Table 37.

Supplementary Figure 10 MaxCPP τ* estimates are not sensitive to the maximum number of causal variants per locus modeled by CAVIAR.

We report MaxCPP τ* estimates for each of 44 GTEx tissues with CAVIAR modeling either up to six or up to three causal variants per locus. We determined that results were not statistically different. The y axis is the τ* meta-analyzed value, and error bars represent 95% confidence intervals. These values were computed by meta-analyzing 41 independent traits (n = 41 independent traits to derive the statistics). Numerical results are reported in Supplementary Table 37.

Supplementary Figure 11 Simulations confirm that S-LDSC estimates for τ* are unique to the focal annotation.

We generated simulated data under a model in which only baselineLD and GTEx-FE-Meta-Tissue MaxCPP annotations directly influence trait heritability, using estimated τ* values from meta-analysis of 41 independent traits (estimated using a model that contains only baselineLD and GTEx-FE-Meta-Tissue MaxCPP annotations). We then estimated τ* values using a model that contains baselineLD, GTEx-FE-Meta-Tissue MaxCPP, and GTEx whole-blood MaxCPP annotations. Results are averaged across 2,000 simulations. a, We report τ* estimates for each annotation. The y axis is the mean of τ* values, and error bars represent 95% confidence intervals. We computed the mean and confidence intervals using 400 simulations (n = 400 independent simulations to derive the statistics). τ* estimates for GTEx whole-blood MaxCPP were not significantly different from 0. b, We report the false positive rate for different P-value thresholds of τ* estimates for GTEx whole-blood MaxCPP. We observed correct null calibration. We computed the false positive rate using 400 simulations (n = 400 independent simulations to derive the statistics).
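The calibration check in panel b amounts to counting how often a null P value falls below each threshold. The sketch below illustrates this with 400 P values drawn uniformly at random as a stand-in for simulation output. The seed, the thresholds and the `false_positive_rate` helper are assumptions made for illustration; they are not the values or code used in the study.

```python
import random

def false_positive_rate(null_p_values, threshold):
    """Fraction of null simulations whose P value falls below the threshold."""
    return sum(p < threshold for p in null_p_values) / len(null_p_values)

# Stand-in for 400 null simulations: under a well-calibrated test,
# null P values are uniform on [0, 1]. These are hypothetical values.
rng = random.Random(0)
null_p_values = [rng.random() for _ in range(400)]

for threshold in (0.05, 0.01, 0.001):
    rate = false_positive_rate(null_p_values, threshold)
    print(f"threshold={threshold}: false positive rate={rate:.4f}")
```

Under correct null calibration, the observed rate at each threshold should be close to the threshold itself, up to sampling noise from the finite number of simulations.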

Supplementary information

Supplementary Figures, Text and Tables

Supplementary Figures 1–11 and Supplementary Tables 1–5, 7–15, 17–26 and 28–36

Reporting Summary

Supplementary Table 6

List of 47 datasets analyzed in this study

Supplementary Table 16

τ* of MaxCPP annotations for individual tissues and individual traits

Supplementary Table 27

Enrichment and τ* for CD14+ monocytes, CD16+ neutrophils, naive CD4+ T cells and FE-Meta-Tissue in the BLUEPRINT dataset

Supplementary Table 37

MaxCPP enrichment and τ* estimates are not sensitive to the maximum number of causal variants per locus modeled by CAVIAR

Supplementary Table 38

Results using baselineLD model v1.1 and baselineLD model v1.0 are highly concordant

Cite this article

Hormozdiari, F., Gazal, S., van de Geijn, B. et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat Genet 50, 1041–1047 (2018). https://doi.org/10.1038/s41588-018-0148-2
