Systematic differences in discovery of genetic effects on gene expression and complex traits

Abstract

Most signals in genome-wide association studies (GWAS) of complex traits implicate noncoding genetic variants with putative gene regulatory effects. However, currently identified regulatory variants, notably expression quantitative trait loci (eQTLs), explain only a small fraction of GWAS signals. Here, we show that GWAS and cis-eQTL hits are systematically different: eQTLs cluster strongly near transcription start sites, whereas GWAS hits do not. Genes near GWAS hits are enriched in key functional annotations, are under strong selective constraint and have complex regulatory landscapes across different tissue/cell types, whereas genes near eQTLs are depleted of most functional annotations, show relaxed constraint, and have simpler regulatory landscapes. We describe a model to understand these observations, including how natural selection on complex traits hinders discovery of functionally relevant eQTLs. Our results imply that GWAS and eQTL studies are systematically biased toward different types of variant, and support the use of complementary functional approaches alongside the next generation of eQTL studies.

Fig. 1: Study workflow and key results.
Fig. 2: GWAS and eQTL genes are under different selective constraints.
Fig. 3: GWAS and eQTL genes have different transcriptional regulatory landscapes.
Fig. 4: Diverse categories of functional genes are enriched among GWAS genes but not among eQTL genes.
Fig. 5: GWAS hits are less enriched at TSSs than are eQTLs.
Fig. 6: A model for variant discovery in GWAS and eQTL assays.

Data availability

Data generated by or processed for this study can be found in the Supplementary Tables, on Zenodo at https://doi.org/10.5281/zenodo.6618073 (ref. 84), and on GitHub (https://github.com/hakha-most/gwas_eqtl) with https://doi.org/10.5281/zenodo.8330029 (ref. 85). Public data used in this study are accessible via URLs cited at appropriate locations in the Methods, as listed: Neale lab UKB data: http://www.nealelab.is/uk-biobank; GTEx data: https://gtexportal.org/home/datasets; NCBI’s gene_info file: https://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz; GENCODE Basic annotations: https://www.gencodegenes.org/human/release_39lift37.html; Ensembl’s BioMart: http://uswest.ensembl.org/biomart/martview; gnomAD: https://gnomad.broadinstitute.org/downloads; ABC enhancer–gene links: https://www.engreitzlab.org/resources; Liu et al.’s enhancer–gene links: https://ernstlab.biolchem.ucla.edu/roadmaplinking; FANTOM5 promoters: https://fantom.gsc.riken.jp/5/datafiles/latest/extra/CAGE_peaks; FANTOM5 enhancers: https://fantom.gsc.riken.jp/5/datafiles/latest/extra/Enhancers; Transcription factors: http://humantfs.ccbr.utoronto.ca; ldsc software: https://github.com/bulik/ldsc; LD annotations: https://alkesgroup.broadinstitute.org/LDSCORE; ENCODE cCREs: https://screen-v2.wenglab.org.

Code availability

Code used to process and analyze GWAS and eQTL data is available on GitHub (https://github.com/hakha-most/gwas_eqtl) with https://doi.org/10.5281/zenodo.8330029 (ref. 85).

References

  1. Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).

  2. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

  3. Gusev, A. et al. Partitioning heritability of regulatory and cell-type-specific variants across 11 common diseases. Am. J. Hum. Genet. 95, 535–552 (2014).

  4. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  5. Meuleman, W. et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244–251 (2020).

  6. Nicolae, D. L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).

  7. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

  8. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

  9. Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).

  10. Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).

  11. Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).

  12. Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).

  13. Aguet, F. et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  14. GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

  15. Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).

  16. Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. 37, 109–124 (2021).

  17. Connally, N. J. et al. The missing link between genetic association and regulatory function. eLife 11, e74970 (2022).

  18. Yao, D. W., O’Connor, L. J., Price, A. L. & Gusev, A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 52, 626–633 (2020).

  19. Strober, B. J. et al. Dynamic genetic regulation of gene expression during cellular differentiation. Science 364, 1287–1290 (2019).

  20. D’Antonio-Chronowska, A. et al. iPSC-derived pancreatic progenitors are an optimal model system to study T2D regulatory variants active during fetal development of the pancreas. Preprint at bioRxiv https://doi.org/10.1101/2021.03.17.435846 (2021).

  21. Walker, R. L. et al. Genetic control of expression and splicing in developing human brain informs disease mechanisms. Cell 179, 750–771.e22 (2019).

  22. Jerber, J. et al. Population-scale single-cell RNA-seq profiling across dopaminergic neuron differentiation. Nat. Genet. 53, 304–312 (2021).

  23. Zhernakova, D. V. et al. Identification of context-dependent expression quantitative trait loci in whole blood. Nat. Genet. 49, 139–145 (2017).

  24. Young, A. M. H. et al. A map of transcriptional heterogeneity and regulatory variation in human microglia. Nat. Genet. 53, 861–868 (2021).

  25. Kim-Hellmuth, S. et al. Cell type-specific genetic regulation of gene expression across human tissues. Science 369, eaaz8528 (2020).

  26. Yazar, S. et al. Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease. Science 376, eabf3041 (2022).

  27. Fairfax, B. P. et al. Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression. Science 343, 1246949 (2014).

  28. Calderon, D. et al. Landscape of stimulation-responsive chromatin across diverse human immune cells. Nat. Genet. 51, 1494–1505 (2019).

  29. Gutierrez-Arcelus, M. et al. Allele-specific expression changes dynamically during T cell activation in HLA and other autoimmune loci. Nat. Genet. 52, 247–253 (2020).

  30. Ota, M. et al. Dynamic landscape of immune cell-specific gene regulation in immune-mediated diseases. Cell 184, 3006–3021.e17 (2021).

  31. Mu, Z. et al. The impact of cell type and context-dependent regulatory variants on human immune traits. Genome Biol. 22, 122 (2021).

  32. Hukku, A. et al. Probabilistic colocalization of genetic variants from complex and molecular traits: promise and limitations. Am. J. Hum. Genet. 108, 25–35 (2021).

  33. Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).

  34. Li, L. et al. An atlas of alternative polyadenylation quantitative trait loci contributing to complex trait and disease heritability. Nat. Genet. 53, 994–1005 (2021).

  35. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

  36. Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034.e6 (2019).

  37. Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).

  38. Pierce, B. L. et al. Mediation analysis demonstrates that trans-eQTLs are often explained by cis-mediation: a genome-wide analysis among 1,800 South Asians. PLoS Genet. 10, e1004818 (2014).

  39. Mountjoy, E. et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 53, 1527–1533 (2021).

  40. O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019).

  41. Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).

  42. Zeng, J. et al. Signatures of negative selection in the genetic architecture of human complex traits. Nat. Genet. 50, 746–753 (2018).

  43. Koch, E. M. & Sunyaev, S. R. Maintenance of complex trait variation: classic theory and modern data. Front. Genet. 12, 763363 (2021).

  44. Simons, Y. B., Bullaughey, K., Hudson, R. R. & Sella, G. A population genetic interpretation of GWAS findings for human quantitative traits. PLoS Biol. 16, e2002985 (2018).

  45. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016).

  46. Siewert-Rocks, K. M., Kim, S. S., Yao, D. W., Shi, H. & Price, A. L. Leveraging gene co-regulation to identify gene sets enriched for disease heritability. Am. J. Hum. Genet. 109, 393–404 (2022).

  47. Weiner, D. J., Gazal, S., Robinson, E. B. & O’Connor, L. J. Partitioning gene-mediated disease heritability without eQTLs. Am. J. Hum. Genet. 109, 405–416 (2022).

  48. Fuller, Z. L., Berg, J. J., Mostafavi, H., Sella, G. & Przeworski, M. Measuring intolerance to mutation in human genetics. Nat. Genet. 51, 772–776 (2019).

  49. Wang, X. & Goldstein, D. B. Enhancer domains predict gene pathogenicity and inform gene discovery in complex disease. Am. J. Hum. Genet. 106, 215–233 (2020).

  50. Liu, Y., Sarkar, A., Kheradpour, P., Ernst, J. & Kellis, M. Evidence of reduced recombination rate in human regulatory domains. Genome Biol. 18, 193 (2017).

  51. Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).

  52. Forrest, A. R. R. et al. A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).

  53. Saha, A. et al. Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Res. 27, 1843–1858 (2017).

  54. Kim, S. S. et al. Genes with high network connectivity are enriched for disease heritability. Am. J. Hum. Genet. 104, 896–913 (2019).

  55. Dey, K. K. et al. SNP-to-gene linking strategies reveal contributions of enhancer-related and candidate master-regulator genes to autoimmune disease. Cell Genom. 2, 100145 (2022).

  56. Battle, A. et al. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res. 24, 14–24 (2014).

  57. Veyrieras, J. B. et al. High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet. 4, e1000214 (2008).

  58. Dimas, A. S. et al. Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 325, 1246–1250 (2009).

  59. Brown, C. D., Mangravite, L. M. & Engelhardt, B. E. Integrative modeling of eQTLs and cis-regulatory elements suggests mechanisms underlying cell type specificity of eQTLs. PLoS Genet. 9, e1003649 (2013).

  60. Zuin, J. et al. Nonlinear control of transcription through enhancer–promoter interactions. Nature 604, 571–577 (2022).

  61. Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).

  62. Nair, S., Kim, D. S., Perricone, J. & Kundaje, A. Integrating regulatory DNA sequence and gene expression to predict genome-wide chromatin accessibility across cellular contexts. Bioinformatics 35, i108–i116 (2019).

  63. Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).

  64. Abell, N. S. et al. Multiple causal variants underlie genetic associations in humans. Science 375, 1247–1254 (2022).

  65. Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390 (2019).

  66. Morris, J. A. et al. Discovery of target genes and pathways at GWAS loci by pooled single-cell CRISPR screens. Science 380, eadh7699 (2023).

  67. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  68. GTEx Consortium. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).

  69. Hinrichs, A. S. et al. The UCSC genome browser database: update 2006. Nucleic Acids Res. 34, D590–D598 (2006).

  70. Aygün, N. et al. Brain-trait-associated variants impact cell-type-specific gene regulation during neurogenesis. Am. J. Hum. Genet. 108, 1647–1668 (2021).

  71. Agarwal, I., Fuller, Z. L., Myers, S. R. & Przeworski, M. Relating pathogenic loss-of-function mutations in humans to their evolutionary fitness costs. eLife 12, e83172 (2023).

  72. Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal Complex Systems, 1695 (2006).

  73. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2020); https://www.R-project.org/

  74. Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).

  75. Alexa, A. & Rahnenfuhrer, J. topGO: enrichment analysis for Gene Ontology. R package version 2.44.0 (2021).

  76. Lambert, S. A. et al. The human transcription factors. Cell 172, 650–665 (2018).

  77. Pintacuda, G. et al. Genoppi is an open-source software for robust and standardized integration of proteomic and genetic data. Nat. Commun. 12, 2580 (2021).

  78. Li, T. et al. A scored human protein–protein interaction network to catalyze genomic interpretation. Nat. Methods 14, 61–64 (2017).

  79. Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 47, 291–295 (2015).

  80. Berisa, T. & Pickrell, J. K. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 32, 283–285 (2016).

  81. Storey, J. D., Bass, A. J., Dabney, A. & Robinson, D. qvalue: Q-value estimation for false discovery rate control. R package version 2.24.0 http://github.com/jdstorey/qvalue (2021).

  82. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).

  83. Schoech, A. P. et al. Quantification of frequency-dependent genetic architectures in 25 UK Biobank traits reveals action of negative selection. Nat. Commun. 10, 790 (2019).

  84. Mostafavi, H. Supplementary data for ‘Systematic differences in discovery of genetic effects on gene expression and complex traits’. Zenodo https://doi.org/10.5281/zenodo.6618073 (2023).

  85. Mostafavi, H. Code repository for ‘Systematic differences in discovery of genetic effects on gene expression and complex traits’. Zenodo https://doi.org/10.5281/zenodo.8330029 (2023).

Acknowledgements

This research has been conducted using the UK Biobank resource under application number 24983. We thank the Rivas lab at Stanford University for assistance with accessing this resource. We are grateful to J. Engreitz, M. Przeworski, G. Sella, A. Kundaje, Y. Simons, I. Agarwal, M. Ota, R. Patel and members of the Pritchard lab for helpful conversations, and to J. Engreitz, B. Pasaniuc, A. Battle, A. Harpak, M. Przeworski, G. Sella and W. Wohns for valuable feedback on an earlier draft of the manuscript. This research was supported by National Institutes of Health grants R01HG008140 and R01HG011432 to J.K.P., and U01HG012069 to A. Kundaje. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

H.M. and J.K.P. conceived and designed the study. H.M. performed all data analyses and developed the model. J.P.S. contributed to the design and interpretation of the statistical analyses and validation of the model. J.P.S. and S.N. provided intellectual contributions to all aspects of the study. H.M. and J.K.P. wrote the paper. J.K.P. supervised the study and acquired funding.

Corresponding authors

Correspondence to Hakhamanesh Mostafavi or Jonathan K. Pritchard.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Tiffany Amariuta, Andrew Jaffe and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Genes closest to eQTLs versus eGenes.

(A) Fraction of eQTLs for which the target eGene is also the gene with the closest TSS, as a function of eQTL association p-value. Error bars show ±2 standard errors computed as \(\sqrt{2f(1-f)/M}\), where f is the estimated fraction, and M is the number of eQTLs per p-value group. In the p-value groups shown, from left to right, there are 50,859, 45,650, 11,246, 4,781, 2,575, and 3,885 eQTLs, respectively. The dashed line shows the mean value of 0.52 across all eQTLs. (B) Same as Fig. 2a, but with different gene assignments to eQTLs (N=118,996). Fraction of eGenes linked to eQTLs (green), or closest genes to eQTLs (red), or closest genes to control SNPs matched for MAF, LD score and gene density (light red) with high pLI (pLI > 0.9, a measure of selective constraint). Error bars corresponding to eQTL properties (red and green points) show 95% confidence intervals as determined by quantile bootstrapping. For matched SNPs (light red), points and error bars show mean values and 95% confidence intervals in 1000 sampling iterations.
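The quantile-bootstrap intervals mentioned in this caption can be illustrated with a minimal sketch. It assumes each variant has already been assigned one gene and that a pLI value is known for that gene; the function name `bootstrap_fraction_ci`, the toy pLI values and the fixed seed are hypothetical and are not taken from the authors' code.

```python
import random

def bootstrap_fraction_ci(flags, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for the fraction of True values in `flags`."""
    rng = random.Random(seed)
    n = len(flags)
    stats = []
    for _ in range(n_boot):
        resample = rng.choices(flags, k=n)  # resample with replacement
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(flags) / n, lower, upper

# Hypothetical pLI values for the genes assigned to a small set of variants.
pli = [0.99, 0.02, 0.95, 0.40, 0.91, 0.10, 0.05, 0.97, 0.30, 0.00]
high_pli = [value > 0.9 for value in pli]  # threshold stated in the caption
fraction, ci_low, ci_high = bootstrap_fraction_ci(high_pli)
print(f"fraction={fraction:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

With the real data the resampling unit would be the variant (118,996 eQTLs), so the interval would be far narrower than in this ten-gene toy example.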

Extended Data Fig. 2 Basic variant-level differences between GWAS hits and eQTLs.

Distribution of minor allele frequency (MAF), linkage disequilibrium (LD) score and gene density for 118,996 eQTLs (red), 22,119 GWAS hits (blue), and 100,000 randomly chosen variants. LD score values are cut at 1000 for clarity.

Extended Data Fig. 3 GWAS and eQTL genes are under different selective constraints: robustness to gene-level measures of selective constraints.

Logistic regression coefficients corresponding to different gene-level measures of selection for predicting GWAS hits (N=22,119) or eQTLs (N=118,996) versus random SNPs (N=100,000) after adjusting for confounders (see Methods). Results are plotted as regression coefficients on the original data with error bars showing the 2.5th and 97.5th percentiles over 1000 bootstrap samples. The measures of selection are pLI and LOEUF from the gnomAD study (refs. 45,67), and hs estimates from Agarwal et al. (ref. 71). Lower LOEUF values correspond to higher selective constraint; we therefore used -LOEUF values to match the other measures, such that higher values mean higher constraint.

Extended Data Fig. 4 GWAS and eQTL genes have different enhancer architectures.

Same as Fig. 3b, but using enhancer-gene links predicted by the activity-by-contact (ABC) model from Nasser et al. (ref. 51; Methods). For a given gene, we computed (i) the number of biosamples in which a gene has an enhancer, and (ii) the average total enhancer length (in base pairs) across active biosamples. Shown are logistic regression coefficients corresponding to the two enhancer features for predicting 22,119 GWAS hits (blue) and 118,996 eQTLs (red) versus 100,000 random variants after adjusting for confounders (Methods). Results are plotted as regression coefficients on the original data with error bars showing the 2.5th and 97.5th percentiles over 1000 bootstrap samples.
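The two per-gene enhancer features described above can be sketched as follows. This is an illustration only: the input layout (one tuple per enhancer-gene link), half-open coordinates, and the assumption that a gene's enhancers within a biosample do not overlap are all hypothetical simplifications, and `enhancer_features` is not a function from the authors' repository.

```python
from collections import defaultdict

def enhancer_features(links):
    """Compute per-gene enhancer features.

    links: iterable of (gene, biosample, enhancer_start, enhancer_end).
    Returns {gene: (n_active_biosamples, mean_total_enhancer_length_bp)}.
    """
    total_length = defaultdict(lambda: defaultdict(int))
    for gene, biosample, start, end in links:
        # Sum enhancer lengths per gene and biosample (assumes no overlaps).
        total_length[gene][biosample] += end - start
    features = {}
    for gene, per_sample in total_length.items():
        n_active = len(per_sample)  # biosamples with at least one enhancer
        mean_length = sum(per_sample.values()) / n_active
        features[gene] = (n_active, mean_length)
    return features

# Hypothetical enhancer-gene links.
links = [
    ("GENE_A", "liver", 1000, 1500),
    ("GENE_A", "liver", 3000, 3200),
    ("GENE_A", "heart", 1000, 1400),
    ("GENE_B", "liver", 500, 800),
]
features = enhancer_features(links)
print(features)
```

Averaging only over active biosamples, as the caption states, keeps feature (ii) from being driven by feature (i): a gene active in few biosamples is not penalized with zeros for the rest.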

Extended Data Fig. 5 Contribution of transcription factors (TFs) in Gene Ontology (GO) annotations and their enrichment in GWAS and eQTL genes.

(A) Proportion of TFs in 41 GO biological processes shown in Fig. 4a. (B) Same as Fig. 4a, but now excluding TFs from all 41 gene categories before computing enrichment values among GWAS and eQTL genes. Traits and tissues (x-axis) are sorted by hit count (decreasing from left to right), and GO terms (y-axis) are sorted by the mean pLI value of associated genes (before removing TFs, replicating the ordering in Fig. 4a). For each trait- or tissue-GO term pair we computed enrichment z-scores based on 1000 sampling iterations of variants matched for MAF, LD score, and gene density (see Methods). The color map represents enrichment (green) or depletion (magenta) of a given gene set among GWAS or eQTL genes. See Fig. 4a for additional details.
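An enrichment z-score of the kind described here compares the observed count of gene-set members among hit genes with the distribution of counts over repeated control draws. The sketch below is a simplified stand-in: the control draws are uniform random samples of genes, whereas the caption describes variants matched for MAF, LD score and gene density (see Methods). All names and data are hypothetical.

```python
import random
import statistics

def count_in_set(genes, gene_set):
    """Number of genes in `genes` that belong to `gene_set`."""
    return sum(gene in gene_set for gene in genes)

def enrichment_z(hit_genes, control_draws, gene_set):
    """z-score of the observed count against the control-draw distribution."""
    observed = count_in_set(hit_genes, gene_set)
    null_counts = [count_in_set(draw, gene_set) for draw in control_draws]
    mean_null = statistics.fmean(null_counts)
    sd_null = statistics.stdev(null_counts)
    return (observed - mean_null) / sd_null

rng = random.Random(1)
background = [f"g{i}" for i in range(200)]
gene_set = set(background[:20])                     # hypothetical GO term
hit_genes = background[:10] + background[100:110]   # 10 of 20 hits in the set
# Stand-in for matched sampling: uniform draws of the same size as the hit set.
control_draws = [rng.sample(background, len(hit_genes)) for _ in range(1000)]
z = enrichment_z(hit_genes, control_draws, gene_set)
print(f"enrichment z-score: {z:.2f}")
```

A positive z indicates enrichment and a negative z indicates depletion, corresponding to the green and magenta ends of the color map.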

Extended Data Fig. 6 Multi-functionality of highly interacting genes in protein-protein interaction (PPI) networks and their enrichment in GWAS genes.

(A) Proportion of genes in bins ranked by the number of interactions in the InWeb PPI network (ref. 77) that are among the top multi-functional genes (defined as top 20% of genes ranked by the count of Gene Ontology (GO) terms they belong to, see Methods). Error bars show 2 standard errors. 16,510 genes with an assigned PPI degree are evenly split into the 5 gene bins shown. (B) Fraction of GWAS and eQTL genes in gene bins ranked by the number of interactions in the InWeb PPI network. For GWAS hits and eQTLs, error bars show 95% confidence intervals as determined by quantile bootstrapping over 1000 sampling iterations. For matched variants (for MAF, LD score and gene density, shown in light blue and red colors), points and error bars show mean values and 95% confidence intervals in 1000 sampling iterations. See Supplementary Table 5 for the counts of genes in each bin shown.
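The binning in panel (A) can be sketched in a few lines: rank genes by GO term count and keep the top 20% as multi-functional, split genes into five equal bins by PPI degree, then report the multi-functional proportion per bin. The gene names, counts and degrees below are made up, and ties are broken by input order, which is an assumption rather than the authors' stated rule.

```python
def top_fraction(scores, fraction=0.2):
    """Keys in the top `fraction` of `scores`, ranked from high to low."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[: int(len(ranked) * fraction)])

def equal_bins(scores, n_bins=5):
    """Split keys into `n_bins` near-equal bins, ranked from low to high."""
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    return [ranked[i * n // n_bins:(i + 1) * n // n_bins] for i in range(n_bins)]

# Hypothetical GO term counts and PPI degrees for ten genes.
go_counts = {f"g{i}": i for i in range(10)}
ppi_degree = {f"g{i}": 3 * i for i in range(10)}

multifunctional = top_fraction(go_counts)   # top 20% by GO term count
bins = equal_bins(ppi_degree)               # five bins by PPI degree
proportions = [sum(g in multifunctional for g in b) / len(b) for b in bins]
print(proportions)
```

In this toy input, GO count and PPI degree rise together by construction, so only the highest-degree bin contains multi-functional genes.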

Extended Data Fig. 7 Effect of selection on variants' contribution to variance in phenotype and gene expression.

(A,B) As described in the main text, we consider a model of phenotypic effects mediated by effects on gene expression intermediates: a genetic variant affects the expression of the target gene with effect β, and the gene expression intermediate affects the downstream phenotype with effect size γ. (A) Contribution to phenotypic variance. Under a neutral model, contribution to phenotypic variance, E[2p(1 − p)]β²γ², is proportional to phenotypic effect, β²γ², as effect size and allele frequency are uncoupled. Selection keeps higher-effect variants at lower frequencies (that is, lowering E[2p(1 − p)]) and thus “flattens” the expected contribution to variance. The red line shows a flattened curve taking \(E[2p(1-p)\beta^{2}\gamma^{2} \mid \beta, \gamma] \sim \kappa(1 - e^{-\beta^{2}\gamma^{2}}/\kappa)\), with κ = 2.986 (Methods). (B) Contribution to variance in gene expression. Similar to the argument in (A), under neutrality, contribution to variance in gene expression, E[2p(1 − p)]β², is proportional to the effect on expression, β². Under selection, flattening (that is, lowering of E[2p(1 − p)]) is more pronounced for variants regulating high-effect (that is, high γ²) genes. Red lines show trends for four quantiles of γ², where γ ~ N(0, 1); darker colors show higher γ² values. See Methods for modeling details.
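The quantity 2p(1 − p)β²γ² in this caption can be made concrete with a small numerical sketch. It evaluates the per-variant contribution for a fixed allele frequency p, and does not implement the flattened curve or the selection model from the Methods. The allele frequencies and effect sizes are arbitrary illustrative values.

```python
def variance_contribution(p, beta, gamma=1.0):
    """Return 2p(1-p) * beta^2 * gamma^2 for a variant at allele frequency p.

    With gamma=1.0 this reduces to the contribution to expression variance,
    2p(1-p) * beta^2.
    """
    return 2 * p * (1 - p) * beta**2 * gamma**2

# A common variant with a small effect on expression ...
common_small = variance_contribution(p=0.5, beta=0.1)
# ... and a rare variant whose effect is five times larger.
rare_large = variance_contribution(p=0.01, beta=0.5)
print(common_small, rare_large)
```

Despite a 25-fold larger squared effect, the rare variant contributes about the same variance as the common one, which illustrates how keeping high-effect variants at low frequency flattens the relationship between effect size and contribution to variance.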

Extended Data Fig. 8 Depletion of selectively constrained genes among non-GTEx eGenes.

The factors we described as working against the discovery of trait-eQTLs likely bias eQTL assays in any context. As proof of concept, we show that, similar to GTEx eGenes, eGenes identified in non-conventional eQTL assays are also depleted of strongly selected genes. (A) Enrichment of high pLI genes in eGenes identified (i) in fetal brain samples by Aygün et al. (ref. 70), (ii) at multiple stages of iPS cell differentiation towards neuronal fate by Jerber et al. (ref. 22) and (iii) in GTEx brain tissues. Sample labels for Jerber et al. refer to different ascertained cell types, at different days of differentiation, and in the presence or absence of stimulation by rotenone (ROT). Cell labels for Jerber et al.: Astro, astrocyte-like; DA, dopaminergic neuron; epen1, ependymal-like 1; FPP, floor plate progenitors; prolif. FPP, proliferating floor plate progenitors; sert, serotonergic-like neuron; D11, day 11 of differentiation; D30, day 30; D52, day 52. (B) Enrichment of high pLI genes in eGenes identified in (i) single-cell analyses of blood cell types by Yazar et al. (ref. 26) and (ii) GTEx whole blood. Sample labels for Yazar et al. refer to different blood cell types: B_IN, immature and naive B cell; B_Mem, memory B cell; CD4_ET, CD4+ effector memory and central memory T cell; CD4_NC, CD4+ naive and central memory T cell; CD4_SOX4, CD4+ SOX4 T cell; CD8_ET, CD8+ effector memory T cell; CD8_NC, CD8+ naive and central memory T cell; CD8_S100B, CD8+ S100B T cell; DC, dendritic cell; Mono_C, classical monocyte; Mono_NC, non-classical monocyte; NK, natural killer cell; NK_R, natural killer cell recruiting; Plasma, plasma cell. Enrichment values (on the x-axis) and z-scores (on the y-axis) were computed based on values observed in 10,000 sampling iterations of random genes (Methods).

Extended Data Fig. 9 Effect of eQTL assay sample size on discovery.

Same as Fig. 6b, but with three eQTL discovery thresholds corresponding to different sample sizes. The discovery thresholds are derived by setting the power rate to 15% for GWAS under the assumptions detailed in the Methods section, and to 10%, 15% and 20% for eQTLs.

Supplementary information

Supplementary Information

Supplementary Note.

Reporting Summary

Peer Review File

Supplementary Table 1

Supplementary Table 1 List of traits and tissues.

Supplementary Table 2 List of autosomal protein-coding genes.

Supplementary Table 3 List of GWAS hits. The P value column displays association P values reported by the original GWAS study conducted by the Neale lab.

Supplementary Table 4 List of eQTLs. The P value column displays association P values obtained from the GTEx data.

Supplementary Table 5 Count of GWAS genes, eQTL genes and eGenes within gene groups categorized by quantiles of continuous gene features.

Supplementary Table 6 List of broadly unrelated GO biological process terms.

Supplementary Table 7 Enrichment of GO biological processes in GWAS and eQTL genes for individual traits and tissues.

Supplementary Table 8 Count of variants located within promoter/enhancer regulatory annotations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mostafavi, H., Spence, J.P., Naqvi, S. et al. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat Genet 55, 1866–1875 (2023). https://doi.org/10.1038/s41588-023-01529-1

  • DOI: https://doi.org/10.1038/s41588-023-01529-1
