Genetic variation influencing DNA methylation provides insights into molecular mechanisms regulating genomic function

Abstract

We determined the relationships between DNA sequence variation and DNA methylation using blood samples from 3,799 Europeans and 3,195 South Asians. We identify 11,165,559 SNP–CpG associations (methylation quantitative trait loci (meQTL), P < 10⁻¹⁴), including 467,915 meQTL that operate in trans. The meQTL are enriched for functionally relevant characteristics, including shared chromatin state, high-throughput chromosome conformation interaction, and association with gene expression, metabolic variation and clinical traits. We use molecular interaction and colocalization analyses to identify multiple nuclear regulatory pathways linking meQTL loci to phenotypic variation, including UBASH3B (body mass index), NFKBIE (rheumatoid arthritis), MGA (blood pressure) and COMMD7 (white cell counts). For rs6511961, chromatin immunoprecipitation followed by sequencing (ChIP–seq) validates zinc finger protein (ZNF)333 as the likely trans-acting effector protein. Finally, we used interaction analyses to identify population- and lineage-specific meQTL, including rs174548 in FADS1, with the strongest effect in CD8+ T cells, thus linking fatty acid metabolism with immune dysregulation and asthma. Our study advances understanding of the potential pathways linking genetic variation to human phenotype.
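As an illustration of what a single SNP–CpG association test involves, the following minimal sketch regresses methylation (0-1 scale) on allele dosage and compares the result with the genome-wide threshold of P < 10^-14 quoted above. It is not the authors' pipeline: the function name `meqtl_test`, the simulated effect size and noise level are hypothetical, no covariates are included, and the P value uses a normal approximation that is only reasonable for samples in the thousands.

```python
import math
import random


def meqtl_test(dosage, methylation):
    """Simple linear regression of methylation on allele dosage (0, 1, 2).

    Returns (slope, z, p). The two-sided P value uses a normal
    approximation to the t distribution (an assumption of this sketch).
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(methylation) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, methylation))
    slope = sxy / sxx  # change in methylation per allele copy
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, methylation))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    z = slope / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return slope, z, p


# Simulated data; the effect size (0.02 per allele) and noise are hypothetical.
rng = random.Random(0)
n = 3000
dosage = [rng.choice([0, 1, 1, 2]) for _ in range(n)]
methylation = [0.5 + 0.02 * g + rng.gauss(0, 0.05) for g in dosage]

slope, z, p = meqtl_test(dosage, methylation)
significant = p < 1e-14  # threshold quoted in the abstract
```

In a real analysis this test would be repeated across millions of SNP–CpG pairs, which is why a stringent threshold is needed.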

Fig. 1: Summary of results for genome-wide association and replication testing.
Fig. 2: Replication in isolated white cells, isolated adipocytes and adipose tissue.
Fig. 3: Candidate genes for sentinel SNPs that are associated with trans CpG sites that overlap transcription factor-binding sites.
Fig. 4: Regulatory networks and locus colocalization analyses.
Fig. 5: Experimental evaluation of ZNF333 by ChIP–seq.
Fig. 6: White cell iQTL.

Data availability

Summary statistics for the 11.2 million SNP–CpG pairs reaching genome-wide significance are available at https://zenodo.org/record/5196216#.YRZ3TfJxeUk. ChIP–seq data for ZNF333 are available through the NCBI SRA (accession code SRP284104). Raw genotype, methylation and expression data can be made available upon reasonable request to the authors. Controlled access to data from the KORA cohort can be obtained through https://epi.helmholtz-muenchen.de. The web links for the publicly available datasets used in the study are as follows: PhenoScanner version 2 (http://www.phenoscanner.medschl.cam.ac.uk), GWAS Catalog (https://www.ebi.ac.uk/gwas/docs/file-downloads), meQTL and eQTM data from Bonder et al. 2017 (ref. 14; https://molgenis26.gcc.rug.nl/downloads/biosqtlbrowser/2015_09_02_Primary_cis_meQTLsFDR0.05-ProbeLevel.zip, https://molgenis26.gcc.rug.nl/downloads/biosqtlbrowser/2015_09_02_trans_meQTLsFDR0.05-CpGLevel.txt, https://molgenis26.gcc.rug.nl/downloads/biosqtlbrowser/2015_09_02_cis_eQTMsFDR0.05-CpGLevel.txt), GTEx version 6 eQTL results (https://storage.googleapis.com/gtex_analysis_v6/single_tissue_eqtl_data/GTEx_Analysis_V6_eQTLs.tar.gz), eQTLGen cis eQTL results (https://molgenis26.gcc.rug.nl/downloads/eqtlgen/cis-eqtl/cis-eQTLs_full_20180905.txt.gz), TWAS hub (http://twas-hub.org/genes/UBASH3B/), GWAS summary statistics of 114 traits for colocalization analysis (https://zenodo.org/record/3629742), ChIP–seq binding sites (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegTfbsClustered/wgEncodeRegTfbsClusteredWithCellsV3.bed.gz, http://tagc.univ-mrs.fr/remap/download/All/filPeaks_public.bed.gz), chromHMM states (http://egg2.wustl.edu/roadmap/data/byFileType/chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/all.mnemonics.bedFiles.tgz), Hi-C data (EGAD00001003106) and PPIs (http://string90.embl.de/newstring_download/protein.links.detailed.v9.0.txt.gz). Source data are provided with this paper.

Code availability

Code for the analysis is available at GitHub (https://github.com/heiniglab/hawe2021_meQTL_analyses) and also through Zenodo (https://doi.org/10.5281/zenodo.5529828 (ref. 84)).

References

  1. Bird, A. Perceptions of epigenetics. Nature 447, 396–398 (2007).

  2. Schubeler, D. Function and information content of DNA methylation. Nature 517, 321–326 (2015).

  3. Parry, A., Rulands, S. & Reik, W. Active turnover of DNA methylation during cell fate decisions. Nat. Rev. Genet. 22, 59–66 (2021).

  4. Jaenisch, R. & Bird, A. Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat. Genet. 33, 245–254 (2003).

  5. Chambers, J. C. et al. Epigenome-wide association of DNA methylation markers in peripheral blood from Indian Asians and Europeans with incident type 2 diabetes: a nested case–control study. Lancet Diabetes Endocrinol. 3, 526–534 (2015).

  6. Marioni, R. E. et al. DNA methylation age of blood predicts all-cause mortality in later life. Genome Biol. 16, 25 (2015).

  7. van der Harst, P., de Windt, L. J. & Chambers, J. C. Translational perspective on epigenetics in cardiovascular disease. J. Am. Coll. Cardiol. 70, 590–606 (2017).

  8. Wahl, S. et al. Epigenome-wide association study of body mass index, and the adverse outcomes of adiposity. Nature 541, 81–86 (2017).

  9. Zhang, Y. et al. DNA methylation signatures in peripheral blood strongly predict all-cause mortality. Nat. Commun. 8, 14617 (2017).

  10. Sugiura, M. et al. Epigenetic modifications in prostate cancer. Int. J. Urol. 28, 140–149 (2020).

  11. Blokhin, I. O., Khorkova, O., Saveanu, R. V. & Wahlestedt, C. Molecular mechanisms of psychiatric diseases. Neurobiol. Dis. 146, 105136 (2020).

  12. Darwiche, N. Epigenetic mechanisms and the hallmarks of cancer: an intimate affair. Am. J. Cancer Res. 10, 1954–1978 (2020).

  13. Bonder, M. J. et al. Genetic and epigenetic regulation of gene expression in fetal and adult human livers. BMC Genomics 15, 860 (2014).

  14. Bonder, M. J. et al. Disease variants alter transcription factor levels and methylation of their binding sites. Nat. Genet. 49, 131–138 (2017).

  15. Gibbs, J. R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 6, e1000952 (2010).

  16. Grundberg, E. et al. Global analysis of DNA methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements. Am. J. Hum. Genet. 93, 876–890 (2013).

  17. Gutierrez-Arcelus, M. et al. Passive and active DNA methylation and the interplay with genetic variation in gene regulation. eLife 2, e00523 (2013).

  18. Lemire, M. et al. Long-range epigenetic regulation is conferred by genetic variation located at thousands of independent loci. Nat. Commun. 6, 6326 (2015).

  19. Huan, T. et al. Genome-wide identification of DNA methylation QTLs in whole blood highlights pathways for cardiovascular disease. Nat. Commun. 10, 4267 (2019).

  20. Hannon, E. et al. Leveraging DNA-methylation quantitative-trait loci to characterize the relationship between methylomic variation, gene expression, and complex traits. Am. J. Hum. Genet. 103, 654–665 (2018).

  21. Gaunt, T. R. et al. Systematic identification of genetic influences on methylation across the human life course. Genome Biol. 17, 61 (2016).

  22. McRae, A. F. et al. Identification of 55,000 replicated DNA methylation QTL. Sci. Rep. 8, 17605 (2018).

  23. Hop, P. J. et al. Genome-wide identification of genes regulating DNA methylation using genetic anchors for causal inference. Genome Biol. 21, 220 (2020).

  24. Peterson, R. E. et al. Genome-wide association studies in ancestrally diverse populations: opportunities, methods, pitfalls, and recommendations. Cell 179, 589–603 (2019).

  25. Bell, C. G. et al. Obligatory and facilitative allelic variation in the DNA methylome within common disease-associated loci. Nat. Commun. 9, 8 (2018).

  26. Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).

  27. Brenner, C. et al. Myc represses transcription through recruitment of DNA methyltransferase corepressor. EMBO J. 24, 336–346 (2005).

  28. Esteve, P. O., Chin, H. G. & Pradhan, S. Human maintenance DNA (cytosine-5)-methyltransferase and p53 modulate expression of p53-repressed promoters. Proc. Natl Acad. Sci. USA 102, 1000–1005 (2005).

  29. Shlyueva, D., Stampfel, G. & Stark, A. Transcriptional enhancers: from properties to genome-wide predictions. Nat. Rev. Genet. 15, 272–286 (2014).

  30. Visel, A., Rubin, E. M. & Pennacchio, L. A. Genomic views of distant-acting enhancers. Nature 461, 199–205 (2009).

  31. Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).

  32. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).

  33. Liu, Y., Toh, H., Sasaki, H., Zhang, X. & Cheng, X. An atomic model of Zfp57 recognition of CpG methylation within a specific DNA sequence. Genes Dev. 26, 2374–2379 (2012).

  34. Shi, H. et al. ZFP57 regulation of transposable elements and gene expression within and beyond imprinted domains. Epigenetics Chromatin 12, 49 (2019).

  35. Yengo, L. et al. Meta-analysis of genome-wide association studies for height and body mass index in approximately 700000 individuals of European ancestry. Hum. Mol. Genet. 27, 3641–3649 (2018).

  36. Lee, S. T. et al. Protein tyrosine phosphatase UBASH3B is overexpressed in triple-negative breast cancer and promotes invasion and metastasis. Proc. Natl Acad. Sci. USA 110, 11121–11126 (2013).

  37. Pulit, S. L. et al. Meta-analysis of genome-wide association studies for body fat distribution in 694 649 individuals of European ancestry. Hum. Mol. Genet. 28, 166–174 (2019).

  38. Kichaev, G. et al. Leveraging polygenic functional enrichment to improve GWAS power. Am. J. Hum. Genet. 104, 65–75 (2019).

  39. Zhu, Z. et al. Shared genetic and experimental links between obesity-related traits and asthma subtypes in UK Biobank. J. Allergy Clin. Immunol. 145, 537–549 (2020).

  40. Richardson, T. G. et al. Evaluating the relationship between circulating lipoprotein lipids and apolipoproteins with risk of coronary heart disease: a multivariable Mendelian randomisation analysis. PLoS Med. 17, e1003062 (2020).

  41. Konieczna, J., Sanchez, J., Palou, M., Pico, C. & Palou, A. Blood cell transcriptomic-based early biomarkers of adverse programming effects of gestational calorie restriction and their reversibility by leptin supplementation. Sci. Rep. 5, 9088 (2015).

  42. Mancuso, N. et al. Integrating gene expression with summary association statistics to identify genes associated with 30 complex traits. Am. J. Hum. Genet. 100, 473–487 (2017).

  43. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).

  44. Emery, P. et al. IL-6 receptor inhibition with tocilizumab improves treatment outcomes in patients with rheumatoid arthritis refractory to anti-tumour necrosis factor biologicals: results from a 24-week multicentre randomised placebo-controlled trial. Ann. Rheum. Dis. 67, 1516–1523 (2008).

  45. Navarro-Millan, I., Singh, J. A. & Curtis, J. R. Systematic review of tocilizumab for rheumatoid arthritis: a new biologic agent targeting the interleukin-6 receptor. Clin. Ther. 34, 788–802 (2012).

  46. Wen, X., Pique-Regi, R. & Luca, F. Integrating molecular QTL data into genome-wide genetic association analysis: probabilistic assessment of enrichment and colocalization. PLoS Genet. 13, e1006646 (2017).

  47. Burnichon, N. et al. MAX mutations cause hereditary and sporadic pheochromocytoma and paraganglioma. Clin. Cancer Res. 18, 2828–2837 (2012).

  48. Li, H. et al. Novel treatment of hypertension by specifically targeting E2F for restoration of endothelial dihydrofolate reductase and eNOS function under oxidative stress. Hypertension 73, 179–189 (2019).

  49. Burstein, E. et al. COMMD proteins, a novel family of structural and functional homologs of MURR1. J. Biol. Chem. 280, 22222–22232 (2005).

  50. Astle, W. J. et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell 167, 1415–1429 (2016).

  51. Suhail, A. et al. DeSUMOylase SENP7-mediated epithelial signaling triggers intestinal inflammation via expansion of γδ T cells. Cell Rep. 29, 3522–3538 (2019).

  52. Jing, Z., Liu, Y., Dong, M., Hu, S. & Huang, S. Identification of the DNA binding element of the human ZNF333 protein. J. Biochem. Mol. Biol. 37, 663–670 (2004).

  53. Chen, M. H. et al. Trans-ethnic and ancestry-specific blood-cell genetics in 746,667 individuals from 5 global populations. Cell 182, 1198–1213 (2020).

  54. Nedelec, Y. et al. Genetic ancestry and natural selection drive population differences in immune responses to pathogens. Cell 167, 657–669 (2016).

  55. Joehanes, R. et al. Epigenetic signatures of cigarette smoking. Circ. Cardiovasc. Genet. 9, 436–447 (2016).

  56. Singmann, P. et al. Characterization of whole-genome autosomal differences of DNA methylation between men and women. Epigenetics Chromatin 8, 43 (2015).

  57. Zeilinger, S. et al. Tobacco smoking leads to extensive genome-wide changes in DNA methylation. PLoS ONE 8, e63812 (2013).

  58. Giri, A. K. et al. DNA methylation profiling reveals the presence of population-specific signatures correlating with phenotypic characteristics. Mol. Genet. Genomics 292, 655–662 (2017).

  59. Breeze, C. E. et al. eFORGE: a tool for identifying cell type-specific signal in epigenomic data. Cell Rep. 17, 2137–2150 (2016).

  60. Westra, H. J. et al. Cell specific eQTL analysis without sorting cells. PLoS Genet. 11, e1005223 (2015).

  61. Guan, W. et al. Genome-wide association study of plasma N6 polyunsaturated fatty acids within the cohorts for heart and aging research in genomic epidemiology consortium. Circ. Cardiovasc. Genet. 7, 321–331 (2014).

  62. Shin, S. Y. et al. An atlas of genetic influences on human blood metabolites. Nat. Genet. 46, 543–550 (2014).

  63. Kamat, M. A. et al. PhenoScanner V2: an expanded tool for searching human genotype–phenotype associations. Bioinformatics 35, 4851–4853 (2019).

  64. Gelfand, E. W. & Dakhama, A. CD8+ T lymphocytes and leukotriene B4: novel interactions in the persistence and progression of asthma. J. Allergy Clin. Immunol. 117, 577–582 (2006).

  65. Cho, S. H., Stanciu, L. A., Holgate, S. T. & Johnston, S. L. Increased interleukin-4, interleukin-5, and interferon-γ in airway CD4+ and CD8+ T cells in atopic asthma. Am. J. Respir. Crit. Care Med. 171, 224–230 (2005).

  66. Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).

  67. Ernst, J. & Kellis, M. Discovery and characterization of chromatin states for systematic annotation of the human genome. Nat. Biotechnol. 28, 817–825 (2010).

  68. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  69. Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012).

  70. GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).

  71. Kim, K. A. et al. Environmental risk factors and comorbidities of primary biliary cholangitis in Korea: a case–control study. Korean J. Intern. Med. 36, 313–321 (2020).

  72. GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

  73. Staley, J. R. et al. PhenoScanner: a database of human genotype–phenotype associations. Bioinformatics 32, 3207–3209 (2016).

  74. Griffon, A. et al. Integrative analysis of public ChIP–seq experiments reveals a complex multi-cell regulatory landscape. Nucleic Acids Res. 43, e27 (2015).

  75. Franceschini, A. et al. STRING v9.1: protein–protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 41, D808–D815 (2013).

  76. Suthram, S., Beyer, A., Karp, R. M., Eldar, Y. & Ideker, T. eQED: an efficient method for interpreting eQTL associations using protein networks. Mol. Syst. Biol. 4, 162 (2008).

  77. Tu, Z., Wang, L., Arbeitman, M. N., Chen, T. & Sun, F. An integrative approach for causal gene identification and gene regulatory pathway inference. Bioinformatics 22, e489–e496 (2006).

  78. Haghverdi, L., Buettner, F. & Theis, F. J. Diffusion maps for high-dimensional single-cell analysis of differentiation data. Bioinformatics 31, 2989–2998 (2015).

  79. Haghverdi, L., Buttner, M., Wolf, F. A., Buettner, F. & Theis, F. J. Diffusion pseudotime robustly reconstructs lineage branching. Nat. Methods 13, 845–848 (2016).

  80. Schramm, K. et al. Mapping the genetic architecture of gene regulation in whole blood. PLoS ONE 9, e93844 (2014).

  81. Benjamini, Y., Drai, D., Elmer, G., Kafkafi, N. & Golani, I. Controlling the false discovery rate in behavior genetics research. Behav. Brain Res. 125, 279–284 (2001).

  82. Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).

  83. Taylor-Weiner, A. et al. Scaling computational genomics to millions of individuals with GPUs. Genome Biol. 20, 228 (2019).

  84. Hawe, J. S., Heinig, M. & Loh, M. Code for the analyses described in Hawe et al. Nature Genetics. Zenodo https://doi.org/10.5281/zenodo.5529828 (2021).

Acknowledgements

The KORA study was initiated and financed by the Helmholtz Zentrum München (German Research Center for Environmental Health), which is funded by the German Federal Ministry of Education and Research (BMBF) and by the state of Bavaria. KORA research was supported within the Munich Center of Health Sciences (MC-Health), Ludwig-Maximilians-Universität, as part of LMUinnovativ. The work was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the EU Joint Programming Initiative ‘a Healthy Diet for a Healthy Life’ (DIMENSION grant number 01EA1902A). The work was further supported by the Bavarian State Ministry of Health and Care through the research project DigiMed Bayern (https://www.digimed-bayern.de/). The German Diabetes Center (DDZ) is supported by the Ministry of Culture and Science of the State of North Rhine–Westphalia and the German Federal Ministry of Health. This study was supported in part by a grant from the German Federal Ministry of Education and Research to the German Center for Diabetes Research (DZD). The LOLIPOP study is supported by the National Institute for Health Research (NIHR) Comprehensive Biomedical Research Centre Imperial College Healthcare NHS Trust, the British Heart Foundation (SP/04/002), the Medical Research Council (G0601966, G0700931), the Wellcome Trust (084723/Z/08/Z), the NIHR (RP-PG-0407-10371), European Union FP7 (EpiMigrant, 279143) and European Union Horizon 2020 (iHealth-T2D, 643774). B.C.L. is supported by the Imperial College Junior Research Fellowship scheme as well as an Academy of Medical Sciences Springboard award. J.C.C. is also supported by the Singapore NMRC (NMRC/STaR/0028/2017). We thank the participants and research staff who made the study possible. For the Northern Finnish Birth Cohort studies, M. Wielscher was supported by the European Union’s Horizon 2020 research and innovation program (grant 633212). 
NFBC1966 received financial support from the Academy of Finland (grants 104781, 120315, 129269, 1114194 and 24300796, Center of Excellence in Complex Disease Genetics and SALVE), University Hospital Oulu, Biocenter, University of Oulu, Finland (75617), NHLBI grant 5R01HL087679-02 through the STAMPEED program (1RL1MH083268-01), the NIH–NIMH (5R01MH63706:02), the ENGAGE project and grant agreement HEALTH-F4-2007-201413, EU FP7 EurHEALTHAgeing (277849), the Medical Research Council, UK (G0500539, G0600705, G1002319, PrevMetSyn/SALVE) and an MRC Centenary Early Career Award. NFBC1986 received financial support from EU QLG1-CT-2000-01643 (EUROBLCS) grant E51560, NorFA grant nos. 731, 20056 and 30167 and USA/NIHH 2000 G DF682 grant 50945. The NFBC programs are also funded by the H2020-633595 DynaHEALTH action, the Academy of Finland Exposomic, Genomic and Epigenomic Approach to Prediction of Metabolic and Cardiorespiratory Function and Ill-Health project (285547) and the EU H2020 ALEC project (grant agreement 633212). The MuTHER study was funded by the WT (081917/Z/07/Z). TwinsUK was funded by the WT and the European Community’s Seventh Framework Programme (FP7/2007-2013). The study also received support from the NIHR Clinical Research Facility at Guy’s and St. Thomas’ and King’s College London. Analysis was funded by British Heart Foundation grant RG/14/5/30893 to P.D. and forms part of the research themes contributing to the translational research portfolio of the Barts Cardiovascular Biomedical Research Unit, which is funded by the NIHR. The Saguenay Youth Study has been funded by the Canadian Institutes of Health Research (T.P., Z.P.), the Heart and Stroke Foundation of Canada (Z.P.) and the Canadian Foundation for Innovation (Z.P.). We acknowledge G. Möller and J. Adamski (Helmholtz Center Munich) for their support in the IP–MS transfection experiment. 
We used data generated by the PCHI-C Consortium (ref. 31), funded by the UK NIHR, the Medical Research Council (MR/L007150/1) and the Biotechnology and Biological Sciences Research Council (BB/J004480/1).

Author information

Contributions

Data collection and analysis in the contributing population studies: KORA, A.P., B.K., C.G., C.H., C.B., H.P., K. Strauch, L. Pfeiffer, M. Waldenberger, M.R., R.W., T.I., T.M. and W.R.; LOLIPOP, B.C.L., J.S.K., J.C.C., W.Z. and W.R.S.; MuTHER, E. Marouli; MuTHER Consortium, P.D. and S.B.; NFBC, M.-R.J., M. Wielscher, S.S. and V.K.; SYS, J. Shin, M.B., T.P. and Z.P. Data collection and molecular follow-up analyses: ChIP–seq, D.P.L., M.I.A., R.S.Y.F. and W.L.W.T.; ChIP–MS, S.M.H., J.M.-P. and P.R.M.-G. Data analysis and writing group (alphabetical order): J.C.C., J.S.H., M.H., C.G., B.C.L., M.L., K. Schmid, M. Waldenberger and R.W.

Corresponding authors

Correspondence to Jaspal S. Kooner, Marie Loh, Matthias Heinig, Christian Gieger, Melanie Waldenberger or John C. Chambers.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Genetics thanks Charles Danko and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Study design.

Overview of study design.

Extended Data Fig. 2 Replication testing of meQTLs within and across ancestries.

a: Ancestry-specific replication of SNP-CpG pairs identified by genome-wide association. Effect size: change in methylation (0-1 scale) per allele copy of the SNP. Axes set to [-0.5, 0.5]. b: Ancestry-specific replication, by pair proximity and MAF. Bars: no. of pairs identified in discovery in given category. Blue: replicated; yellow: not replicated. c: Cross-ancestry replication, by pair proximity and MAF. Top: discovery in EU, replication in SA; bottom: discovery in SA, replication in EU. Bars: no. of pairs identified in discovery in given category. Blue: replicated; yellow: not replicated. d: Cross-platform: replication in KORA F4 (N ≤ 1,731) of published MeDIP-seq meQTLs, by significance threshold. Blue lines: no. of replicated results (of 328); histograms: no. of replicated results over 100 randomly selected matched datasets. P-values: one-sided, no adjustment for multiple testing. See Methods for test description. EU: European; SA: South Asian; MAF: minor allele frequency.

Extended Data Fig. 3 Variance in DNA methylation explained by meQTL SNPs.

Histograms showing the proportions of variance of DNA methylation explained by genetic variants in both populations when variants are located in cis (left), long-range cis (middle) or trans (right) of the associated CpG site. EU: European; SA: South Asian.
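For a single SNP, the proportion of methylation variance explained can be illustrated as the squared Pearson correlation between allele dosage and methylation. The sketch below is an assumption about how such a quantity might be computed, not the authors' stated method; the function name `variance_explained` and the toy values are hypothetical.

```python
def variance_explained(dosage, methylation):
    """Squared Pearson correlation (R^2) between dosage and methylation."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(methylation) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in methylation)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, methylation))
    return sxy ** 2 / (sxx * syy)


# Toy data (hypothetical): methylation rises linearly with dosage.
r2_linear = variance_explained([0, 1, 2, 1], [0.1, 0.2, 0.3, 0.2])

# Toy data (hypothetical): no linear relationship with dosage.
r2_none = variance_explained([0, 1, 2, 0, 1, 2], [0.2, 0.5, 0.2, 0.2, 0.5, 0.2])
```

Here `r2_linear` is close to 1 and `r2_none` is close to 0, the two ends of the range shown in the histograms.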

Extended Data Fig. 4 Analysis of proximity between meQTL SNPs and CpGs.

Panel a: Histogram showing for each CpG site the genomic distance between CpG and the closest associated SNP from the cosmopolitan set of 10,346,172 SNP-CpG pairs identified in cis (association confirmed in both Europeans and South Asians). Panel b: Boxplots showing the proportion of SNP-CpG pairs that reach genome-wide significance for different distance categories (x-axis), compared to SNP-CpG pairs on different chromosomes (trans). 1,000 random samples of 10,000 SNPs were taken. P-values above each box are based on a comparison (one-sided t-test) between the proportion of SNP-CpG pairs in trans that reach significance, and the proportion that reach significance in the respective same-chromosome distance window. Boxplots show medians (center lines), first and third quartiles (lower and upper box limits, respectively), 1.5-fold interquartile ranges (whisker extents) and outliers (black circles).
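The proximity categories used in these captions can be sketched as a small classifier. The captions describe trans pairs as lying on different chromosomes and long-range cis pairs as lying on the same chromosome more than 1 Mb apart; treating every remaining same-chromosome pair as cis is an assumption of this sketch, and the function name and coordinates are hypothetical.

```python
from collections import Counter

LONG_RANGE_BP = 1_000_000  # 1 Mb boundary for long-range cis, from the captions


def classify_pair(snp_chrom, snp_pos, cpg_chrom, cpg_pos):
    """Label a SNP-CpG pair as 'cis', 'long-range cis' or 'trans'."""
    if snp_chrom != cpg_chrom:
        return "trans"
    if abs(snp_pos - cpg_pos) > LONG_RANGE_BP:
        return "long-range cis"
    return "cis"  # assumption: same chromosome, within 1 Mb


# Hypothetical coordinates, for illustration only.
pairs = [
    ("chr1", 1_000, "chr1", 5_000),
    ("chr1", 1_000, "chr1", 2_500_000),
    ("chr1", 1_000, "chr7", 1_200),
]
counts = Counter(classify_pair(*pair) for pair in pairs)
```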

Extended Data Fig. 5 Functional genomic context of meQTL SNPs and CpGs.

Panel a: Genomic overlap between chromatin state annotations (15-state model; Roadmap Epigenomics Project) and SNPs/CpGs identified by genome-wide association and cross-ancestry replication testing. Results are presented as a heatmap showing the P-values for enrichment (blue) or depletion (yellow) in the respective chromatin state (two-sided t-test). P-values have been Bonferroni-adjusted for the total number of tests (see Methods for details). Panel b: Colocalisation of SNPs and CpG sites in promoter and enhancer chromatin states. The histograms show the frequency at which CpG sites that localise in promoter or enhancer chromatin states have at least one cis-meQTL SNP that localises to the same chromatin state. Observed (turquoise) cis-meQTL pairs colocalise to the same chromatin state more frequently than matched background SNP-CpG pairs (grey). Panel c: Distance distributions for cis SNP-CpG pairs 1) localising to the same state (left), 2) where one entity localises to a promoter/enhancer state and the other to neither promoter nor enhancer state (center) and 3) where one entity localises to a promoter and the other to an enhancer state (right). Panel d: Overlap of SNP-CpG associations with chromatin contacts in primary cells. The x-axis shows the fraction of SNP-CpG pairs that localise within the same topologically associated domain (TAD, left panel) or that overlap with Hi-C contacts (center and right panels). The left panel shows localisation of long-range cis-meQTLs within the same TAD. The center panel shows the overlap of long-range cis-meQTLs (same chromosome, SNP-CpG distance > 1 Mb) with contacts from promoter capture Hi-C (PCHi-C). The right panel shows overlap of trans-meQTLs with Hi-C contacts. The blue vertical arrows indicate the overlap observed in the data. The grey histograms show the distribution of the fraction of randomly sampled SNP-CpG pairs overlapping contact regions for each category.

Extended Data Fig. 6 Enrichment of meQTL SNPs and CpGs for association with gene expression.

Sentinel meQTL SNPs and CpGs are enriched for association with gene expression in cis and trans (SNPs) and only in cis (CpGs). Panel a: Results are presented as the proportion of SNPs that are observed to be associated with gene expression in cis (top row) or in trans (bottom row), stratified by proximity between SNP and CpG for the respective SNP-CpG pair (cis, long-range cis and trans from left to right). Panel b: Similarly, results are presented as the proportion of CpGs that are observed to be associated with gene expression in cis (top row) or in trans (bottom row), stratified by proximity between SNP and CpG for the respective SNP-CpG pair (cis, long-range cis and trans from left to right). Both panels: In each plot, the observed proportion (yellow boxplots) is compared to the proportion expected under the null hypothesis based on permutation testing (blue boxplots, see Methods). Inset in each plot is the P-value for the comparison between observed and expected proportions (t-test). Boxplots show medians (center lines), first and third quartiles (lower and upper box limits, respectively), 1.5-fold interquartile ranges (whisker extents) and outliers (black circles). Proportions were calculated based on 100 sets of permutations with 1,000 SNPs (Panel a) or 1,000 CpGs (Panel b) in each permutation.
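As a rough illustration of the comparison described above, the sketch below draws 100 sets of 1,000 items from a pool and records, for each set, the proportion flagged as associated with expression, then compares two such distributions with a Welch t statistic. How the sets were actually drawn and matched is described in the Methods; the pools, flag rates, seeds and function names here are illustrative assumptions only.

```python
import random
import statistics

def sampled_proportions(flags, n_sets=100, set_size=1000, seed=0):
    """Proportion of flagged items in each of n_sets random draws (without replacement)."""
    rng = random.Random(seed)
    return [sum(rng.sample(flags, set_size)) / set_size for _ in range(n_sets)]

def welch_t(x, y):
    """Welch's t statistic for two samples with possibly unequal variances."""
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / (
        (var_x / len(x) + var_y / len(y)) ** 0.5
    )

# Toy pools: True marks an item associated with expression (rates are made up).
toy_rng = random.Random(1)
meqtl_flags = [toy_rng.random() < 0.30 for _ in range(5000)]
background_flags = [toy_rng.random() < 0.20 for _ in range(5000)]

observed = sampled_proportions(meqtl_flags, seed=2)
expected = sampled_proportions(background_flags, seed=3)
t_stat = welch_t(observed, expected)
```

The two lists correspond loosely to the yellow (observed) and blue (expected) boxplots; a P-value would follow from the t distribution, which is omitted here to keep the sketch free of external dependencies.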

Extended Data Fig. 7 Enrichment of meQTL SNPs and CpGs for associations with phenotypic traits.

(A) SNPs influencing DNA methylation (left panel) and SNPs identified to be population-interacting meQTL based on our cosmopolitan discovery analysis (right panel) are both enriched for association with phenotypic traits. Analysis was carried out using QTLEnrich and 114 uniformly processed GWAS summary statistics. The volcano plot shows the log2 fold enrichment of significant GWAS hits among iQTL on the x-axis and the -log10 of the P-value of the enrichment test on the y-axis. Each point represents one of 114 GWAS studies. The transparency of the fill colour indicates the false discovery rate (FDR < 5%: no transparency). (B) Sentinel CpGs are enriched for clinical and metabolic traits. We tested the sentinel CpGs for association with 277 available clinical and metabolic traits (NMR metabolomics). We used permutation testing to generate expectations under the null hypothesis, and to determine both the magnitude and probability for enrichment. Results show strong evidence that our genetically regulated sentinel CpGs are enriched for association with traits (enriched at P<0.05/277 for 252 phenotypes) with median enrichment 1.10 (IQR: 1.06-1.15).

Extended Data Fig. 8 CpG sites associated with trans-acting sentinel SNPs are enriched for location in transcription factor binding sites.

Heatmap showing the enrichment (or depletion) of CpG sites for trans-acting sentinel SNPs (x-axis) with the DNA binding sites of known transcription factors (y-axis). Log2 odds ratios compare the frequency of overlap for the CpGs associated with the respective SNP, compared to the background frequency of overlap for all tested CpG sites. Results are shown for the 45 sentinel SNPs that show evidence for overlap with known transcription factor binding sites (out of the 115 tested trans-acting sentinel SNPs with at least five associated CpG sites).
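The log2 odds ratio in the heatmap can be sketched from a 2x2 table of overlap counts. The function below is an illustration, not the authors' code; in particular the pseudocount (a Haldane-Anscombe style correction) is an assumption added here so that the ratio stays finite when a cell is zero.

```python
import math

def log2_odds_ratio(hits_fg, total_fg, hits_bg, total_bg, pseudocount=0.5):
    """Log2 odds ratio of TFBS overlap for a SNP's CpGs versus background CpGs.

    hits_fg / total_fg: CpGs associated with the SNP that overlap the TFBS / all such CpGs.
    hits_bg / total_bg: background CpGs that overlap the TFBS / all background CpGs.
    pseudocount: assumed continuity correction added to every cell of the 2x2 table.
    """
    a = hits_fg + pseudocount                # foreground, overlapping
    b = (total_fg - hits_fg) + pseudocount   # foreground, not overlapping
    c = hits_bg + pseudocount                # background, overlapping
    d = (total_bg - hits_bg) + pseudocount   # background, not overlapping
    return math.log2((a * d) / (b * c))

# Toy counts: 10 of 20 associated CpGs overlap, versus 100 of 400 background CpGs.
toy_lor = log2_odds_ratio(10, 20, 100, 400, pseudocount=0)
```

Positive values indicate enrichment of the SNP's CpGs in the binding sites relative to background, and negative values indicate depletion.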

Extended Data Fig. 9 Trans-acting regulatory networks at the CTCF, NFKB1, REST, NFE2, MAD1L1 and ERICH1 loci.

(a) Circos plots summarising i. genomic distribution of CpGs associated in trans [inner connections], and ii. known DNA binding sites of the transcription factor encoded in cis [outer ring], for sentinel SNPs at the CTCF, NFKB1, REST and NFE2 loci. Inset are observed and expected proportions of CpG sites that overlap the respective DNA binding sites as available for different cell lines (see Methods). FDR < 1.17 × 10−2 for all cell lines and transcription factors. (b) Regulatory network of the ERICH1 locus illustrating the connection between SNP rs10103269 (yellow rectangle) and expression of the identified candidate gene ERICH1 (yellow ellipse), which is connected through protein-protein and protein-DNA interactions to methylation at trans-associated CpG sites (beige rectangles). Ellipses represent genes encoded at the genetic locus identified by the sentinel or that are part of the protein-protein interaction network. Genes marked with an asterisk (*) show co-expression with the candidate gene. Bold gene names indicate a strong genetic effect of the sentinel on the expression of that gene (eQTL). The fill colour of ellipses represents the random walk score (colour bar legend). The colour of edges connecting genes and CpG sites represents: i. protein-protein interactions (purple), ii. protein-DNA interactions identified by TFBS overlap (green), and iii. proximity (distance < 1 Mb) between genes and SNPs or CpG sites (blue). The thickness of edges represents correlation with gene expression (thick) or no correlation with gene expression (thin). The boxplot shows the effect of the sentinel SNP (rs10103269) in cis on expression of ERICH1 with the P-value from linear regression of expression ~ genotype (n=1,546 biologically independent samples combined from both cohorts). Center line indicates median; lower and upper box limits correspond to the first and third quartiles, respectively; whisker extent indicates 1.5-fold interquartile range; outliers not shown. (c) MAD1L1 locus pathway analysis. Annotations and symbols are as described in (b).
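The boxplot in panel (b) reports a linear regression of expression ~ genotype. A minimal sketch of the least-squares fit is given below, assuming genotypes are coded additively as allele dosage (0, 1 or 2); the coding, the function name and the toy values are assumptions, and covariates and the P-value calculation are omitted.

```python
def ols_slope_intercept(genotypes, expression):
    """Ordinary least-squares fit of expression on genotype dosage.

    Returns (slope, intercept); the slope is the estimated change in expression per allele.
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    return slope, mean_e - slope * mean_g

# Toy data: expression rises by 0.5 units per allele (values are made up).
toy_genotypes = [0, 1, 2, 0, 1, 2]
toy_expression = [1.0, 1.5, 2.0, 1.0, 1.5, 2.0]
toy_slope, toy_intercept = ols_slope_intercept(toy_genotypes, toy_expression)
```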

Extended Data Fig. 10 Experimental validation at the ZNF333 locus.

Panel a. Regulatory network of the ZNF333 locus. Annotations and symbols are as described in Extended Data Fig. 9. The boxplot shows the effect of the sentinel SNP (rs6511961) in cis on expression of the candidate gene ZNF333 with the P-value from the linear regression of expression ~ genotype (n=1,546 biologically independent samples combined from both cohorts). Panels b-d. HCT116 cells were transfected with ZNF333-FLAG/Myc tagged or GFP-control plasmids in biological replicates. Panel b. Protein lysates were Western blotted for ZNF333 expression using FLAG or MYC antibodies as validations. GAPDH was used as loading control (n=2). Source data: Membranes were cut into three pieces for optimisation of exposure. Top left panel: Original uncropped and unprocessed scans. Top right panel: Scan exposure optimised for molecular ladder. Bottom left panel: Scan exposure optimised for GAPDH. Bottom right panel: Final overlay figure. Panel c. Heatmap showing the Pearson correlation between ChIP-seq performed for ZNF333 using either FLAG or MYC antibodies. Panel d. Motifs of known TFs enriched in ZNF333 binding sites showing perfect overlap between ChIP with FLAG and MYC antibodies.

Supplementary information

Supplementary Information

Supplementary Note and Figs. 1–9

Reporting Summary

Supplementary Tables

Supplementary Tables 1–41.

Source data

Source Data Extended Data Fig. 9

Unprocessed data for ZNF333 ChIP–seq experiment.

About this article

Cite this article

Hawe, J.S., Wilson, R., Schmid, K.T. et al. Genetic variation influencing DNA methylation provides insights into molecular mechanisms regulating genomic function. Nat Genet 54, 18–29 (2022). https://doi.org/10.1038/s41588-021-00969-x

  • DOI: https://doi.org/10.1038/s41588-021-00969-x
