Identification of 55,000 Replicated DNA Methylation QTL

McRae, Allan F.; Marioni, Riccardo E.; Shah, Sonia; Yang, Jian; Powell, Joseph E.; Harris, Sarah E.; Gibson, Jude; Henders, Anjali K.; Bowdler, Lisa; Painter, Jodie N.; Murphy, Lee; Martin, Nicholas G.; Starr, John M.; Wray, Naomi R.; Deary, Ian J.; Visscher, Peter M.; Montgomery, Grant W.

doi:10.1038/s41598-018-35871-w


Article
Open access
Published: 04 December 2018


Allan F. McRae ORCID: orcid.org/0000-0001-5286-5485^1,2,
Riccardo E. Marioni^3,4,
Sonia Shah¹,
Jian Yang ORCID: orcid.org/0000-0003-2001-2474^1,2,
Joseph E. Powell¹,
Sarah E. Harris^3,4,
Jude Gibson⁵,
Anjali K. Henders¹,
Lisa Bowdler⁶,
Jodie N. Painter⁶,
Lee Murphy ORCID: orcid.org/0000-0001-6467-7449⁵,
Nicholas G. Martin ORCID: orcid.org/0000-0003-4069-8020⁶,
John M. Starr^4,7,
Naomi R. Wray^1,2,
Ian J. Deary^4,8^na1,
Peter M. Visscher^1,2,4^na1 &
…
Grant W. Montgomery ORCID: orcid.org/0000-0002-4140-8139¹^na1

Scientific Reports volume 8, Article number: 17605 (2018)


Abstract

DNA methylation plays an important role in the regulation of transcription. Genetic control of DNA methylation is a potential candidate for explaining the many identified SNP associations with disease that are not found in coding regions. We replicated 52,916 cis and 2,025 trans DNA methylation quantitative trait loci (mQTL) using methylation from whole blood measured on Illumina HumanMethylation450 arrays in the Brisbane Systems Genetics Study (n = 614 from 177 families) and the Lothian Birth Cohorts of 1921 and 1936 (combined n = 1366). The trans mQTL SNPs were found to be over-represented in 1 Mbp subtelomeric regions, and on chromosomes 16 and 19. There was a significant increase in trans mQTL DNA methylation sites in upstream and 5′ UTR regions. The genetic heritability of a number of complex traits and diseases was partitioned into components due to mQTL and the remainder of the genome. Significant enrichment was observed for height (p = 2.1 × 10⁻¹⁰), ulcerative colitis (p = 2 × 10⁻⁵), Crohn’s disease (p = 6 × 10⁻⁸) and coronary artery disease (p = 5.5 × 10⁻⁶) when compared to a random sample of SNPs with matched minor allele frequency, although this enrichment is explained by the genomic location of the mQTL SNPs.


Introduction

DNA methylation plays an important role in transcriptional regulation and is increasingly recognised as having a role in health and disease^1,2. The contribution of genetic variation to the inheritance of DNA methylation levels across a range of tissues has been widely demonstrated both through studies investigating the heritability of DNA methylation using twin pairs and families^3,4,5,6, and through the identification of methylation quantitative trait loci or mQTL acting in both cis and trans^{7,8,9,10,11,12,13,14,15,16,17,18,19,20,21}.

As the majority of single nucleotide polymorphisms (SNPs) associated with complex traits and disease are found in non-protein coding regions²², it is hypothesised that the SNPs act through the perturbation of the regulation of gene-expression. DNA methylation QTL have been associated with other genomic marks that affect gene regulation, including DNase I accessibility and histone modifications^16,17, as well as directly with gene-expression^15,16. Therefore, they are potential causal variants for disease. Indeed, the overlap between mQTL and disease SNPs has been investigated previously, finding inflation for the number of mQTL in bipolar risk SNPs¹¹, schizophrenia¹⁸ and autoimmune disease¹⁷.

These published studies indicate that mQTL influence disease risk; however, some aspects of the methodological approach used to determine the significance of the overlap may be sub-optimal. For example, most identified mQTL have been found using Illumina HumanMethylation arrays, but the analytical methods have not recognised that the measures of DNA methylation are distributed non-randomly throughout the genome. Most of the DNA methylation probes on these arrays are located in genic regions, and, given that the majority of mQTL are found in cis to DNA methylation sites, the mQTL SNPs are also preferentially located in genic regions. Genic regions are also known to explain a larger proportion of the genetic variation underlying complex traits and disease²³. Therefore, any analysis looking into the overlap of mQTL with SNPs identified in genome-wide association studies (GWAS) needs to account for the proportion of methylation sites assessed in different genomic regions. In addition, determining the overlap between an mQTL and a disease SNP often uses criteria such as an arbitrary linkage disequilibrium (LD) threshold of r² > 0.8 between the best disease GWAS SNP and the mQTL SNP. This implicitly assumes that a common causal variant for the mQTL and disease is being tagged by two different SNPs, rather than there being two different causal variants.

In this study we identify >50,000 mQTL in whole blood that are replicated at a stringent significance level in the Brisbane Systems Genetics Study (BSGS)^6,24 and the Lothian Birth Cohorts of 1921 and 1936 (LBC)^25,26,27. We then use LD Score regression^28,29 to partition the genetic variation for complex traits and diseases into components due to mQTL SNPs and the remainder of the SNPs in the genome using summary statistics from large GWAS meta-analyses. These results are compared to null distributions generated by selecting random sets of SNPs that have been matched by allele frequency or by both allele frequency and genomic annotation. This analysis both avoids the selection of an arbitrary linkage disequilibrium threshold above which mQTL and disease SNPs are considered as overlapping, and accounts for the non-random distribution of methylation sites tested across the genome, providing an unbiased assessment of the role of mQTL in complex traits and disease.

Results

Identification of mQTL

Due to prior evidence showing large cis SNP effects on DNA methylation, we first tested for association in a window spanning 2 Mbp on both sides of the target CpG site. This window is larger than what is usually considered for cis mQTL, but our prior observation of significant cis mQTL effects spanning this far in the MHC region on chromosome 6 indicated that a larger window is warranted⁶. This was further justified by noting that the number of cis mQTL rapidly drops off to a constant background level between 1 and 2 Mbp from the target CpG site (Figure S1).

A total of 62,257 and 61,180 cis mQTL were identified in whole blood in the BSGS and LBC cohorts respectively at a significance threshold of p < 10⁻¹¹. While only the most significant SNP for each DNA methylation probe is considered, many of the mQTL are non-independent due to both correlations between DNA methylation levels for probes separated by small distances and linkage disequilibrium between SNPs. Of these, 52,916 (~85%) replicated in the other cohort at a Bonferroni-corrected significance threshold of p < 10⁻⁶ and also had SNP effects on DNA methylation in the same direction in the other cohort. The correlation of cis mQTL effect sizes between the two cohorts was 0.97. Thus we have stringently replicated cis mQTL for more than 13% of the methylation sites tested.

Trans mQTL were defined using a more stringent significance threshold of p < 10⁻¹³ to account for the extra multiple testing burden from testing association with the whole genome. The number of significant trans mQTL found in the BSGS and LBC was 2,454 and 2,048 respectively. Of these, 2,025 replicated in the other cohort with a Bonferroni-corrected p-value of p < 10⁻⁵ and also had the same direction of effect. The correlation in trans mQTL effect sizes across the two cohorts was 0.91. The locations of the replicated mQTL are given in Fig. 1. The extremely high replication rate for both cis- and trans-mQTL in independent samples demonstrates the high quality of the data and reliability of the results.
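The replication rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Assoc` record and the `is_replicated` helper are hypothetical names, and only the thresholds and the same-direction requirement come from the text.

```python
from dataclasses import dataclass

@dataclass
class Assoc:
    p: float     # association p-value in one cohort
    beta: float  # estimated SNP effect on DNA methylation in that cohort

def is_replicated(discovery: Assoc, replication: Assoc,
                  p_discovery: float, p_replication: float) -> bool:
    """True when the discovery cohort passes its threshold, the other cohort
    passes the replication threshold, and both effects share a sign."""
    same_direction = discovery.beta * replication.beta > 0
    return (discovery.p < p_discovery
            and replication.p < p_replication
            and same_direction)

# Thresholds quoted in the text for cis and trans mQTL.
CIS = dict(p_discovery=1e-11, p_replication=1e-6)
TRANS = dict(p_discovery=1e-13, p_replication=1e-5)
```

For example, `is_replicated(Assoc(1e-12, 0.5), Assoc(1e-7, 0.3), **CIS)` is true, while the same pair fails the stricter trans discovery threshold.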

The proportion of phenotypic variation in DNA methylation levels explained by all replicated mQTL in the LBC cohort is given in Fig. 2. As expected from QTL identified using limited sample sizes (as compared to contemporary GWAS for complex traits and disease), the phenotypic variation explained by the mQTL is very large, with 8% of cis mQTL explaining greater than 50% of phenotypic variation. While trans mQTL still explain a substantial proportion of the phenotypic variance, the overall distribution has fewer mQTL explaining very large amounts of variance. The effect of the “winner’s curse”, where the variance explained by the top SNPs identified in a GWAS is biased upwards, is likely to be small in this study given the stringency of testing and the high replication rate.

There is potential for SNPs located within DNA methylation probe binding regions to have an effect on the measurement of methylation levels, and thus potentially create false positive mQTL. To address this, we used the 1000 Genomes (v3) European samples to identify any genetic variation within a probe site and identified a SNP in 27% of the probes passing QC. It is of note that many of the SNPs identified within probe sequences are rare and would not be in strong linkage disequilibrium with the common (>1% frequency) SNPs used for the GWAS. For trans mQTL, it is very unlikely that a SNP in the probe site was associated with the mQTL SNP, particularly given the very stringent significance thresholds that were used for mQTL mapping. This is reflected in 499 (25%) trans mQTL having a SNP in the probe site, which is the same as the null proportion of probes that do not have an associated mQTL that have SNPs in their binding site (85,621/342,967). SNPs were found within the probe binding site for 22,267 (42%) of cis mQTL. Thus, we can potentially attribute 15% (42–27%) of cis mQTL to genetic variation within the probe location causing genotype-specific measurement error. However, it can also be argued that the majority of cis mQTL are found within a very small distance of the probe location, and it would not be surprising for genetic variation very close to a CpG site to have a genuine effect on methylation levels. To take an extreme example, a SNP falling within a CpG site completely disrupts DNA methylation at this site, which occurs for 6,160 (12%) of cis mQTL. For this reason, we include all mQTL – regardless of whether a SNP was identified within the probe site – in the further analyses.
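The proportions quoted in this paragraph can be re-derived from the counts it gives. The short sketch below only repeats that arithmetic; the variable names are illustrative.

```python
# Counts taken directly from the paragraph above.
null_share = 85_621 / 342_967  # probes without an mQTL that carry a probe-site SNP
trans_share = 499 / 2_025      # replicated trans mQTL with a probe-site SNP
cis_share = 22_267 / 52_916    # replicated cis mQTL with a probe-site SNP
cpg_share = 6_160 / 52_916     # cis mQTL whose SNP falls within the CpG site

# Excess of cis mQTL with a probe-site SNP over the 27% seen across all QC'd probes.
cis_excess_points = round(cis_share * 100) - 27
```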

Genomic Distribution of Trans mQTL

Fig. 1 indicates that trans mQTL SNPs are non-randomly distributed throughout the genome. This is investigated in Fig. 3, which shows that a large number of trans mQTL SNPs are located on chromosomes 16 and 19 given their respective sizes. This may not be surprising under a polygenic model of inheritance given those chromosomes have a higher gene density than other chromosomes. However, this inflation is beyond that expected given the gene count on those two chromosomes. The rest of the genome shows a strong correlation between the number of genes on a chromosome and the number of trans mQTL SNPs, except for chromosome 1, which has fewer trans mQTL SNPs than expected. Of interest, chromosome 19 contains DNMT1 (DNA methyltransferase 1), which has a role in the establishment and regulation of DNA methylation. However, there is no clustering of trans mQTL SNPs around its location.

There are clear horizontal bands of SNPs in Fig. 1, located in the subtelomeric regions of the genome. Indeed, 17.9% of all trans mQTL SNPs are located in telomeric regions covering the 1 Mbp at the end of chromosomes, which represents 1.53% of the genome. There is also some inflation of the number of trans mQTL methylation probes found in the 1 Mbp subtelomeric region (7.0%), but this is primarily due to the increased number of array probes in the subtelomeric region (5.5%), and this inflation is also reflected in the number of cis mQTL methylation probes (7.5%). Given the association with trans mQTL SNPs in subtelomeric regions, we tested whether the trans CpG probes or SNPs were significantly associated with telomere length in the LBC1936 cohort. This identified no inflation of test statistics for either the SNPs or methylation compared to the whole genome (Figure S3).
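To put the subtelomeric percentages on a common scale, they can be expressed as fold enrichments against the relevant baseline. The helper below is a hypothetical convenience function; the percentages are those quoted above, and no significance test is implied.

```python
def fold_enrichment(observed_pct: float, expected_pct: float) -> float:
    """Ratio of the observed share to the share expected from the baseline."""
    return observed_pct / expected_pct

# trans mQTL SNPs: 17.9% observed vs. 1.53% of the genome.
trans_snp_fold = fold_enrichment(17.9, 1.53)
# Methylation probes: compared against the 5.5% of array probes in the region.
trans_probe_fold = fold_enrichment(7.0, 5.5)
cis_probe_fold = fold_enrichment(7.5, 5.5)
```

On these figures the SNP enrichment is more than eleven-fold, whereas both probe sets are enriched by well under 1.5-fold, which is consistent with the text's point that the probe inflation largely reflects array design.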

Unlike trans mQTL SNPs, the CpG probe locations showed no clustering across the genome. To investigate a functional role of the trans mQTL methylation sites, we annotated the genomic locations of all the array probes tested (Table 1). As expected from the design of the array, the majority of the probe CpG targets were located in genic regions. While cis mQTL methylation probes showed no large deviation in genomic annotation from all probes, the number of trans mQTL CpGs was substantially inflated in both upstream and 5′ UTR regions.

Table 1 Genomic annotation of mQTL CpG site locations.


Role of mQTL in Complex Traits and Disease

To assess the role of mQTL in driving the phenotypic variation of complex traits and disease, we used LD Score regression^28,29 to partition the trait heritability into components due to mQTL and the rest of the genome. LD Score regression uses summary statistics from GWAS, allowing us to investigate a range of traits and diseases using results from large consortia (for height³⁰, BMI³¹, schizophrenia³², ulcerative colitis³³, Crohn’s disease³³, coronary artery disease³⁴, type 2 diabetes³⁵, rheumatoid arthritis³⁶, and educational attainment³⁷).

The replicated mQTL were firstly filtered to have no SNP pairs with an estimated r² of greater than 0.8. This allows for straightforward generation of sets of SNPs to estimate the distribution of variance explained under the null hypothesis, as then the LD structure is similar to that of a random set of minor allele frequency matched SNPs. Two different null hypotheses were used. The first (null #1) accounted for the fact that on average SNPs with a higher heterozygosity explain more variation in a trait by drawing random sets of SNPs with a matched minor allele frequency (in bins of 0.05 width). The second (null #2) in addition matched the genomic location of randomly sampled SNPs using annotation from ANNOVAR³⁸. This accounts for the observation that a large proportion of the genetic variation in complex traits is explained by genic regions and that the array (and thus cis mQTL locations) is very gene centric.
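The two null hypotheses can be illustrated with a small sampling routine. This is a sketch under stated assumptions rather than the authors' code: SNPs are represented as dictionaries with hypothetical `maf` and `annot` keys, sampling is done without replacement, and the function names are invented for the example. Only the 0.05-wide MAF bins and the optional matching on annotation come from the text.

```python
import math
import random
from collections import defaultdict

def maf_bin(maf, width=0.05):
    """Index of the MAF bin. The small epsilon guards against float rounding
    at bin edges, and MAF = 0.5 is folded into the last bin."""
    n_bins = round(0.5 / width)
    return min(math.floor(maf / width + 1e-9), n_bins - 1)

def matching_key(snp, match_annotation):
    """Null #1 matches on MAF bin only; null #2 also matches on annotation."""
    key = (maf_bin(snp["maf"]),)
    return key + (snp["annot"],) if match_annotation else key

def draw_null_set(mqtl_snps, pool, rng, match_annotation=False):
    """Draw one random pool SNP per mQTL SNP from the same stratum
    (assumption: without replacement)."""
    by_key = defaultdict(list)
    for snp in pool:
        by_key[matching_key(snp, match_annotation)].append(snp)
    for members in by_key.values():
        rng.shuffle(members)
    null_set = []
    for snp in mqtl_snps:
        candidates = by_key[matching_key(snp, match_annotation)]
        if not candidates:
            raise ValueError("no unused pool SNP left for this stratum")
        null_set.append(candidates.pop())
    return null_set
```

Calling `draw_null_set(mqtl_snps, pool, random.Random(1))` many times with different seeds would give an empirical null distribution once each drawn set is passed through the heritability partitioning.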

Under null #1, height, ulcerative colitis, Crohn’s disease and coronary artery disease all showed a significant inflation of the proportion of genetic variation explained by mQTL (Table 2), although none of these were significant after accounting for the genomic location of the mQTL SNP (null #2). However, sets of SNPs generated for null #2 tag many of the same regions of the genome as the mQTL SNPs due to the large number of genic mQTL identified in this study compared to the number of genes in the genome. Thus it is not surprising that none of the tests under null #2 are significant, and we cannot distinguish between the hypotheses of close linkage and causality. It is of note that all of those tests that were significant under null #1 explained more than average variation under null #2.

Table 2 LDScore regression partitioning of the heritability for a variety of traits and disease.


Due to the limitations of the genomic partitioning, a second approach to investigate the effect of mQTL on complex traits and disease was taken. If mQTL are a driving force behind phenotypic variation, then it would be expected that mQTL SNPs with large effects on DNA methylation would also have large effects on the complex trait. To test this, we estimated the correlation between the mQTL SNP effect size and its effect from the large GWAS studies. The absolute value of the effect (or log odds-ratio) on both DNA methylation and the trait was used as it is expected that there will be variation in whether DNA methylation is protective or not for different regions of the genome. In addition, the effect sizes were corrected for the expected relationship between effect size and minor allele frequency by multiplying the effect size by $\sqrt{2f(1-f)}$, where f is the minor allele frequency of the SNP. After correcting for minor allele frequency, no significant correlation was observed between the effect sizes of the SNPs on the mQTL and the corresponding SNP effect sizes on any of the tested traits (Table S2).
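The effect-size comparison described above can be sketched as follows. The scaling by the square root of 2f(1 − f) and the use of absolute effects come from the text; the choice of a Pearson correlation and the function names are assumptions made for illustration, since the text does not state which correlation estimator was used.

```python
import math

def scaled_abs_effect(beta: float, maf: float) -> float:
    """Absolute effect multiplied by sqrt(2 f (1 - f)), f = minor allele frequency."""
    return abs(beta) * math.sqrt(2.0 * maf * (1.0 - maf))

def pearson(x, y):
    """Plain Pearson correlation (assumed estimator, not confirmed by the text)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def effect_correlation(mqtl_betas, trait_betas, mafs):
    """Correlation between MAF-scaled absolute mQTL and trait effects."""
    m = [scaled_abs_effect(b, f) for b, f in zip(mqtl_betas, mafs)]
    t = [scaled_abs_effect(b, f) for b, f in zip(trait_betas, mafs)]
    return pearson(m, t)
```

For a case-control trait, `trait_betas` would hold log odds-ratios, as the paragraph notes.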

Discussion

We have identified 52,916 cis and 2,025 trans mQTL that are replicated across two independent cohorts at very stringent significance levels. While the mQTL can explain a large proportion of the genetic variation underlying DNA methylation variation, there is still substantial genetic variation remaining to be explained. Using the twin family structure in the Brisbane Systems Genetics Study, we have previously shown that the average heritability of DNA methylation at sites measured by the Illumina HumanMethylation450 array is 0.187⁶. The average proportion of phenotypic variation explained by all mQTL across all DNA methylation probes in this study (including probes that had no mQTL and thus explained zero variation) is 0.021. Thus, the mQTL identified here explain approximately 11.2% of the total genetic variation for DNA methylation. This implies there is substantial genetic variation for DNA methylation remaining to be discovered through additional variants in cis and/or many more trans variants with small effects in larger samples.
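As a quick check of the figure quoted above, the proportion of genetic variation explained is the ratio of the two averages given in this paragraph (a ratio of averages, which need not equal the average of per-probe ratios):

$$\frac{0.021}{0.187} \approx 0.112$$

That is, roughly 11.2% of the average heritability is accounted for by the replicated mQTL, leaving close to 89% unexplained.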

By partitioning heritability into components due to mQTL SNPs and the rest of the genome, we established that the identified mQTL explained a significant amount of the genetic variation for a number of complex traits and diseases. Using a null distribution generated by randomly sampling SNPs from the genome with matching minor allele frequencies showed significant amounts of genetic variation were explained by mQTL for height, schizophrenia, ulcerative colitis, Crohn’s disease, and coronary artery disease. This enrichment of mQTL in disease-associated regions was explained by the genomic location of the mQTL SNPs. This is due to most mQTL SNPs being cis to the DNA methylation probes, which also tend to be found in genic regions due to the design of the array, combined with the observation that genic regions explain more of the heritability for many traits²³. Previous studies that have shown a relationship between mQTL and bipolar disorder¹¹ and schizophrenia¹⁸ considered only MAF when sampling SNPs for the null distribution, and, as demonstrated here, the results are likely to be driven by the common genomic function of the SNPs. Testing for a role of mQTL in complex traits and disease beyond that explained by genomic location is difficult due to the large number of mQTL replicated in this study. This means that a large proportion of genes in the genome are tagged by an mQTL and any null sample of SNPs will cover many of the same genomic regions. This makes any test for the proportion of heritability explained by mQTL extremely conservative.

Determining whether associations detected in the same genetic region for DNA methylation and a disease are the result of (mediated) pleiotropy or just close linkage is a difficult prospect. To have potential for pleiotropy, the set of potential causal variants for the two associations will need to overlap. Fine-mapping to a set of potential causal variants can be determined by statistical prioritisation using only association statistics^39,40,41, or in combination with other genomic data^42,43,44. Reducing the set of potential causal variant(s) underlying a mQTL using these approaches is helped by the large amount of phenotypic variation the mQTLs explain. There is also strong potential to determine causal SNPs for mQTLs in cell lines using CRISPR genome editing⁴⁵ as the end phenotype is directly observable in the cell, unlike the case for complex traits and disease where a phenotype to investigate in cell lines is generally unclear.

We observed a strong over-representation of trans mQTL SNP in the 1 Mbp subtelomeric region of the genome, as had been previously noted¹⁷. No association of the trans mQTL SNP or methylation probes was found with telomere length in the LBC1936 cohort. The trans mQTLs were significantly inflated for methylation probes found in the upstream regions of genes, indicating a potential effect on the regulation of gene-expression. However, there was no overlap with trans eQTLs identified in the BSGS²⁴. The mechanism and potential importance of subtelomeric regions in altering DNA methylation throughout the genome warrants further investigation and at this stage artefacts of the technology cannot be excluded.

In summary, we have identified and replicated a large number of genetic loci associated with DNA methylation in both cis and trans. We demonstrated an overlap of mQTL and loci for complex traits and diseases, which was explained by the genomic location of the mQTL SNPs.

Materials and Methods

Brisbane Systems Genetics Study (BSGS)

DNA methylation was measured on 614 individuals from 177 families of European descent recruited as part of a study on adolescent twins and selected from individuals in the Brisbane Systems Genetics Study^6,24. Families consist of adolescent monozygotic (MZ) and dizygotic (DZ) twins, their siblings, and their parents. DNA was extracted from peripheral blood lymphocytes by the salt precipitation method⁴⁶. The BSGS study was approved by the Queensland Institute for Medical Research Human Research Ethics Committee, and all methods were performed in accordance with the relevant guidelines and regulations. All participants gave informed written consent.

Lothian Birth Cohorts

Methylation data were analysed from the combined data of the Lothian Birth Cohort 1921 (LBC1921) and the Lothian Birth Cohort 1936 (LBC1936)^25,26,27. The LBC1921 and LBC1936 are longitudinal studies of ageing, with a focus on cognition, in groups of initially healthy older people. DNA methylation was measured in 446 LBC1921 subjects at an average age of 79 years, and in 920 LBC1936 subjects at an average age of 70 years⁴⁷. Following informed consent, venesected whole blood was collected for DNA extraction by standard methods in both LBC1921 and LBC1936. Ethics permission for the LBC1921 was obtained from the Lothian Research Ethics Committee (Wave 1: LREC/1998/4/183). Ethics permission for the LBC1936 was obtained from the Multi-Centre Research Ethics Committee for Scotland (Wave 1: MREC/01/0/56), the Lothian Research Ethics Committee (Wave 1: LREC/2003/2/29) and all methods were performed in accordance with the relevant guidelines and regulations. Written informed consent was obtained from all subjects.

DNA Methylation

DNA methylation was measured using Illumina HumanMethylation450 BeadChips as described in detail elsewhere^6,47. The HumanMethylation450 BeadChip interrogates methylation status at 485,577 CpG sites across the genome and provides coverage of 99% of RefSeq genes. Methylation scores for each CpG site are obtained as a ratio of the intensities of fluorescent signals and are represented as β-values. DNA methylation data for the BSGS is available at the Gene Expression Omnibus under accession code GSE56105, and the LBC data is available at the European Genome-phenome Archive under accession number EGAS00001000910.

Probes on the sex chromosomes or having been annotated as binding to multiple chromosomes⁴⁸ were removed from the analysis, as were non CpG sites. Probes with excess missingness or high numbers of individuals with detection p-value less than 0.001 were also removed. After cleaning, 397,710 probes remained for association analysis in both cohorts.

Normalisation

Array data were background corrected, followed by individual probes being normalised using a generalised linear model with a logistic link function. Corrections were made for the effects of chip (which encompasses batch processing effects), position on the chip, sex, age, age², sex × age and sex × age². In addition, the LBC data were corrected for white blood cell counts (basophils, eosinophils, monocytes, lymphocytes, and neutrophils). The LBC data were normalised for the two cohorts individually before combining the data for further analysis.

Outlying data points can result in a high number of false positives in GWAS analysis when associated with rare variants. To address this, in the BSGS cohort we removed any measurement at a probe that was greater than five interquartile ranges from its nearest quartile. In the LBC, probes that had such outliers were restricted to testing association with SNPs having a minor allele frequency greater than 5%.
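The BSGS outlier rule can be written down compactly. The sketch below is illustrative only: the function name is hypothetical, and the quartile definition (Python's default "exclusive" method in `statistics.quantiles`) is an assumption, since the text does not say how quartiles were computed.

```python
import statistics

def mask_outliers(values, n_iqr=5.0):
    """Replace measurements lying more than n_iqr interquartile ranges beyond
    the nearest quartile with None, keeping the remaining values in place."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # assumed quartile method
    iqr = q3 - q1
    lower, upper = q1 - n_iqr * iqr, q3 + n_iqr * iqr
    return [v if lower <= v <= upper else None for v in values]
```

Masking rather than dropping keeps each value aligned with its individual, so genotypes can still be matched by position.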

Genotyping and Imputation

Both the BSGS and LBC were genotyped on Illumina 610-Quad Beadchip arrays, with full details of genotyping procedures described elsewhere^49,50. After standard quality control, the BSGS and LBC had 528,509 and 549,692 SNPs remaining respectively.

The remaining genotyped SNPs were phased using SHAPEIT^51,52 and imputed against 1000 Genomes Phase I Version 3^53,54 using Impute V2^55,56. Raw imputed SNPs were filtered to remove any SNPs with low imputation quality as defined by an r² < 0.8. Subsequent quality control removed SNPs with MAF < 0.05, and those with HWE p < 1 × 10⁻⁶. The “best-guess” (highest probability) genotype was used for the GWAS analyses.

Genome-Wide Association Analysis

Genome-wide association (GWAS) was performed individually on the BSGS and LBC cohorts, with each serving as an independent discovery cohort and replication performed in the other. Association testing was performed using MERLIN⁵⁷ with the --fastAssoc option for the BSGS cohort (to account for family structure) and PLINK⁵⁸ for the combined LBC cohorts.

To reduce the massive computational burden, GWAS was performed in two stages. Firstly the cis region to the methylation probe – defined as a window 2 Mbp each side of the target CpG site location – was investigated. A significance threshold of 10⁻¹¹ was used, which is a stringent p = 0.05 Bonferroni correction for the approximate number of independent SNPs in the window and number of probes analysed. Significant associations were replicated with a Bonferroni corrected (based on the approximate number of independent mQTL) p-value of 10⁻⁶ and having effect in the same direction in the other sample. When a single methylation probe had a replicated association from both cohorts but at a different SNP, the SNP with the best combined evidence of association was selected for further analyses.
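The cis window defined above amounts to a simple positional test. In this sketch the function name and the inclusive treatment of a SNP sitting exactly 2 Mbp away are assumptions; the window size is from the text.

```python
CIS_WINDOW_BP = 2_000_000  # 2 Mbp on each side of the target CpG site

def is_cis(snp_chrom: str, snp_pos: int, cpg_chrom: str, cpg_pos: int) -> bool:
    """True when the SNP is on the CpG's chromosome and within the cis window.
    Any other SNP-probe pair is treated as trans."""
    return snp_chrom == cpg_chrom and abs(snp_pos - cpg_pos) <= CIS_WINDOW_BP
```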

Association with trans SNPs (defined as all SNPs outside the 4 Mbp window used in the cis analysis) was performed in two steps. Firstly, all chromosome/probe pairs were analysed on non-imputed genotyped data, which reduced the number of tests performed by a factor of 10. This was particularly important for the BSGS cohort which had related individuals and thus was much slower to analyse. Any chromosome/probe pair that had an association at p < 10⁻⁷ was then reanalysed using imputed SNP data. An experiment-wide significance of 10⁻¹³ was used for trans associations, which is the standard GWAS genome-wide significance threshold of 5 × 10⁻⁸ Bonferroni corrected for the number of probes tested. The replication threshold of 10⁻⁵ was used, again being more stringent than a 5% significance Bonferroni corrected for the number of associations to be replicated.
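The trans thresholds can be checked against the counts reported elsewhere in the paper (397,710 probes after cleaning; at most 2,454 trans mQTL to replicate). The sketch below simply repeats that arithmetic with illustrative variable names; the cis thresholds are not reproduced because they rest on approximate numbers of independent SNPs and mQTL that the text does not give.

```python
N_PROBES = 397_710          # probes remaining after cleaning
GWAS_THRESHOLD = 5e-8       # standard genome-wide significance level
N_TRANS_DISCOVERED = 2_454  # larger of the two cohorts' trans mQTL counts

# Genome-wide threshold Bonferroni corrected for the number of probes.
trans_bonferroni = GWAS_THRESHOLD / N_PROBES
# 5% significance Bonferroni corrected for the associations to replicate.
replication_bonferroni = 0.05 / N_TRANS_DISCOVERED

# The thresholds actually applied (1e-13 and 1e-5) are slightly stricter.
assert 1e-13 < trans_bonferroni
assert 1e-5 < replication_bonferroni
```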

Genomic Annotation of SNP and Methylation Sites

SNPs and the CpG targets of methylation probes were functionally annotated using ANNOVAR³⁸, using the hg19 annotation with the distance of the upstream and downstream regions of genes being 2 Mbp to align with our definition of cis loci.

Telomere Measurements

Telomere length was measured using the same blood sample as methylation in the LBC1936 cohort using a quantitative real-time polymerase chain reaction (PCR) assay⁵⁹. The intra-assay coefficient of variation was 2.7% and the inter-assay coefficient of variation was 5.1%. Four internal control DNA samples were run within each plate to correct for plate-to-plate variation. These internal controls are cell lines of known absolute telomere length whose relative ratio values (telomere starting quantity/glyceraldehyde 3-phosphate dehydrogenase starting quantity) were used to generate a regression line by which values of relative telomere length for the actual samples were converted into absolute telomere lengths. Measurements were performed in quadruplicate and the mean of the measurements used. PCRs were performed on an Applied Biosystems (Pleasanton, CA, USA) 7900HT Fast Real Time PCR machine.

Partitioning Heritability

The heritability of a trait explained by all GWASed SNPs was partitioned into a component due to all discovered mQTL and a component due to all remaining SNPs using LD Score regression^28,29. The sum of the LD r² values between a target SNP and all other SNPs within the 1 Mbp region centred on the target SNP⁶⁰ was calculated using the European samples from the 1000 Genomes project^53,54 with the software GCTA (--ld-score option)⁶¹. The LD score at a SNP, j, is then calculated as:

$$L_{j} = 1 + \sum r^{2} - \frac{n}{N}$$

(1)

where n is the number of SNPs in the window and N is the sample size used to calculate the r² measures.

Using the summary statistics from a large GWAS for a quantitative trait or disease, the heritability of the trait is partitioned into components due to mQTL and the rest of the genome using a regression

$$\chi_{j}^{2} = \alpha + \beta_{\mathrm{mQTL}} L_{j,\mathrm{mQTL}} + \beta_{G} L_{j,G}$$

(2)

where $\chi_{j}^{2}$ is the chi-square test statistic for SNP j. The heritability attributable to mQTL is calculated as

$$\frac{\beta_{\mathrm{mQTL}} \times M_{\mathrm{mQTL}}}{N_{\mathrm{GWAS}}}$$

(3)

where M_mQTL is the number of mQTL SNPs and N_GWAS is the sample size of the GWAS from which the summary statistics were obtained. The heritability attributable to the rest of the genome is calculated similarly.
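Equations (1) to (3) can be strung together in a small, dependency-free sketch. Several simplifications are assumed here and are not the authors' procedure: the regression is fitted by ordinary (unweighted) least squares, no standard errors are computed, `r2_values` is taken to exclude the target SNP itself, and all function names are hypothetical.

```python
def ld_score(r2_values, n_ref):
    """Equation (1): 1 + sum(r^2) - n/N, where n = len(r2_values) is the number
    of SNPs in the window and n_ref is the sample size used to estimate r^2."""
    return 1.0 + sum(r2_values) - len(r2_values) / n_ref

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_partition(chi2, l_mqtl, l_rest):
    """Equation (2) by unweighted least squares (a simplification).
    Returns (alpha, beta_mQTL, beta_G)."""
    rows = [(1.0, a, b) for a, b in zip(l_mqtl, l_rest)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, chi2)) for i in range(3)]
    return tuple(solve3(xtx, xty))

def h2_component(beta, m_snps, n_gwas):
    """Equation (3): beta * M / N_GWAS for one component."""
    return beta * m_snps / n_gwas
```

Given per-SNP chi-square statistics and the two partitioned LD scores, `fit_partition` returns the coefficients and `h2_component(beta_mqtl, M_mQTL, N_GWAS)` gives the mQTL share; the same call with the remaining SNP count gives the rest-of-genome component.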

Data Availability

DNA methylation data for the BSGS is available at the Gene Expression Omnibus under accession code GSE56105, and the LBC data is available at the European Genome-phenome Archive under accession number EGAS00001000910.

References

Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat. Biotechnol. 28, 1057–68 (2010).
Bergman, Y. & Cedar, H. DNA methylation dynamics in health and disease. Nat. Struct. Mol. Biol. 20, 274–81 (2013).
Kaminsky, Z. A. et al. DNA methylation profiles in monozygotic and dizygotic twins. Nat. Genet. 41, 240–245 (2009).
Boks, M. P. et al. The relationship of DNA methylation with age, gender and genotype in twins and healthy controls. PLoS One 4, e6767 (2009).
Gordon, L. et al. Neonatal DNA methylation profile in human twins is specified by a complex interplay between intrauterine environmental and genetic factors, subject to tissue-specific influence. Genome Res. 22, 1395–406 (2012).
McRae, A. F. et al. Contribution of genetic variation to transgenerational inheritance of DNA methylation. Genome Biol. 15, R73 (2014).
Gibbs, J. R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 6, e1000952 (2010).
Article Google Scholar
Bell, J. T. et al. DNA methylation patterns associate with genetic and gene expression variation in HapMap cell lines. Genome Biol. 12, R10 (2011).
Article CAS Google Scholar
van Eijk, K. R. et al. Genetic analysis of DNA methylation and gene expression levels in whole blood of healthy human subjects. BMC Genomics 13, 636 (2012).
Article Google Scholar
Fraser, H., Lam, L., Neumann, S. & Kobor, M. Population-Specificity of Human s - DNA Methylation. Genome Biol. 13, R8 (2012).
Article CAS Google Scholar
Smith, A. K. et al. Methylation quantitative trait loci (meQTLs) are consistently detected across ancestry, developmental stage, and tissue type. BMC Genomics 15, 145 (2014).
Article Google Scholar
Drong, A. W. et al. The presence of methylation quantitative trait loci indicates a direct genetic influence on the level of DNA methylation in adipose tissue. PLoS One 8, e55923 (2013).
Article ADS CAS Google Scholar
Quon, G., Lippert, C., Heckerman, D. & Listgarten, J. Patterns of methylation heritability in a genome-wide analysis of four brain regions. Nucleic Acids Res. 41, 2095–104 (2013).
Article CAS Google Scholar
Teh, A. L. et al. The effect of genotype and in utero environment on interindividual variation in neonate DNA methylomes. Genome Res. 24, 1064–74 (2014).
Article CAS Google Scholar
Wagner, J. R. et al. The relationship between DNA methylation, genetic and expression inter-individual variation in untransformed human fibroblasts. Genome Biol. 15, R37 (2014).
Article Google Scholar
Banovich, N. E. et al. Methylation QTLs are associated with coordinated changes in transcription factor binding, histone modifications, and gene expression levels. PLoS Genet. 10, e1004663 (2014).
Article Google Scholar
Lemire, M. et al. Long-range epigenetic regulation is conferred by genetic variation located at thousands of independent loci. Nat. Commun. 6, 6326 (2015).
Article CAS Google Scholar
Hannon, E. et al. Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci. Nat Neurosci 19, 48–54 (2016).
Article CAS Google Scholar
Grundberg, E. et al. Global analysis of dna methylation variation in adipose tissue from twins reveals links to disease-associated variants in distal regulatory elements. Am. J. Hum. Genet. 93, 876–890 (2013).
Article CAS Google Scholar
Gaunt, T. R. et al. Systematic identification of genetic influences on methylation across the human life course. Genome Biol. 17, 61 (2016).
Article Google Scholar
Volkov, P. et al. A Genome-Wide mQTL Analysis in Human Adipose Tissue Identifies Genetic Variants Associated with DNA Methylation, Gene Expression and Metabolic Traits. PLoS One 11, 1–31 (2016).
Article Google Scholar
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, (2014).
Yang, J. et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat. Genet. 43, 519–25 (2011).
Article CAS Google Scholar
Powell, J. E. et al. The Brisbane Systems Genetics Study: genetical genomics meets complex trait genetics. PLoS One 7, e35430 (2012).
Article ADS CAS Google Scholar
Deary, I. J., Whiteman, M. C., Starr, J. M., Whalley, L. J. & Fox, H. C. The impact of childhood intelligence on later life: following up the Scottish mental surveys of 1932 and 1947. J. Pers. Soc. Psychol. 86, 130–147 (2004).
Article Google Scholar
Deary, I. J. et al. The Lothian Birth Cohort 1936: a study to examine influences on cognitive ageing from age 11 to age 70 and beyond. BMC Geriatr. 7, 28 (2007).
Article Google Scholar
Deary, I. J., Gow, A. J., Pattie, A. & Starr, J. M. Cohort profile: the Lothian Birth Cohorts of 1921 and 1936. Int. J. Epidemiol. 41, 1576–84 (2012).
Article Google Scholar
Bulik-Sullivan, B. K. et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet advance on, 291–295 (2015).
Article CAS Google Scholar
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 47, 1228–1235 (2015).
Article CAS Google Scholar
Wood, A. R. Defining the role of common variation in the genomic and biological architecture of adult human height. Nat. Genet. 46, (2014).
Locke, A. E. et al. Genetic studies of body mass index yield new insights for obesity biology. Nature 518, 197–206 (2015).
Article CAS Google Scholar
The Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
Article ADS Google Scholar
Jostins, L. et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 491, 119–24 (2012).
Article CAS Google Scholar
Schunkert, H. et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338 (2011).
Article CAS Google Scholar
Morris, A., Voight, B. & Teslovich, T. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat. Genet. 44, 981–990 (2012).
Article CAS Google Scholar
Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature advance on, 376–81 (2013).
Rietveld, C. A. et al. GWAS of 126,559 individuals identifies genetic variants associated with educational attainment. Science 340, 1467–71 (2013).
Article ADS CAS Google Scholar
Wang, K., Li, M. & Hakonarson, H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 38, e164 (2010).
Article Google Scholar
Maller, J. B. et al. Bayesian refinement of association signals for 14 loci in 3 common diseases. Nat. Genet. 44, 1294–301 (2012).
Article CAS Google Scholar
Faye, L. L., Machiela, M. J., Kraft, P., Bull, S. B. & Sun, L. Re-Ranking Sequencing Variants in the Post-GWAS Era for Accurate Causal Variant Identification. PLoS Genet. 9, e1003609 (2013).
Article CAS Google Scholar
Hormozdiari, F., Kostem, E., Kang, E. Y., Pasaniuc, B. & Eskin, E. Identifying Causal Variants at Loci with Multiple Signals of Association. Genetics genetics. 114, 167908, https://doi.org/10.1534/genetics.114.167908 (2014).
Article CAS Google Scholar
Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).
Article CAS Google Scholar
Gaffney, D. J. et al. Dissecting the regulatory architecture of gene expression QTLs. Genome Biol. 13, R7 (2012).
Article CAS Google Scholar
Kichaev, G. et al. Integrating functional data to prioritize causal variants in statistical fine-mapping studies. PLoS Genet. 10, e1004722 (2014).
Article Google Scholar
Jinek, M. et al. A Programmable Dual-RNA – Guided DNA Endonuclease in Adaptive Bacterial Immunity. Science (80-.). 337, 816–822 (2012).
Article ADS CAS Google Scholar
Miller, S. A., Dykes, D. D. & Polesky, H. F. A simple salting out procedure for extracting DNA from human nucleated cells. Nucleic Acids Res. 16, 1215 (1988).
Article CAS Google Scholar
Shah, S. et al. Genetic and environmental exposures constrain epigenetic drift over the human life course. Genome Res. 24, 1725–1733 (2014).
Article CAS Google Scholar
Price, E. M. et al. Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics Chromatin 6, 4 (2013).
Article CAS Google Scholar
Medland, S. E. et al. Common variants in the trichohyalin gene are associated with straight hair in Europeans. Am. J. Hum. Genet. 85, 750–5 (2009).
Article CAS Google Scholar
Houlihan, L. M. et al. Common variants of large effect in F12, KNG1, and HRG are associated with activated partial thromboplastin time. Am. J. Hum. Genet. 86, 626–31 (2010).
Article CAS Google Scholar
Delaneau, O., Howie, B., Cox, A. J., Zagury, J. F. & Marchini, J. Haplotype estimation using sequencing reads. Am. J. Hum. Genet. 93, 687–696 (2013).
Article CAS Google Scholar
Delaneau, O., Marchini, J. & Zagury, J.-F. A linear complexity phasing method for thousands of genomes. Nature Methods 9, 179–181 (2011).
Article Google Scholar
The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–73 (2010).
The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
Howie, B., Marchini, J. & Stephens, M. Genotype imputation with thousands of genomes. G3(1), 457–70 (2011).
Google Scholar
Howie, B., Fuchsberger, C., Stephens, M., Marchini, J. & Abecasis, G. R. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat. Genet. 44, 955–9 (2012).
Article CAS Google Scholar
Abecasis, G. R., Cherny, S. S., Cookson, W. O. & Cardon, L. R. Merlin - rapid analysis of dense genetic maps using sparse gene flow trees. Nat. Genet. 30, 97–101 (2002).
Article CAS Google Scholar
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–75 (2007).
Article CAS Google Scholar
Martin-Ruiz, C. et al. Stochastic variation in telomere shortening rate causes heterogeneity of human fibroblast replicative life span. J Biol Chem 279, 17826–17833 (2004).
Article CAS Google Scholar
Yang, J. et al. Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. Nat Genet 47, 1114–1120 (2015).
Article CAS Google Scholar
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: A tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Article CAS Google Scholar

Download references

Acknowledgements

We thank the cohort participants and team members who contributed to these studies. The Brisbane Systems Genetics Study (BSGS) was supported by NHMRC grants 1010374, 496667, 1046880. A.F.M., J.E.P., N.R.W., P.M.V., and G.W.M. are supported by the NHMRC Fellowship Scheme (1083656, 1107599, 1078901, 1078037 and 1078399) and grants (1050218). J.Y. is supported by the Sylvia & Charles Viertel Charitable Foundation. We acknowledge funding by the Australian Research Council (A7960034, A79906588, A79801419, DP0212016, DP0343921), and the Australian National Health and Medical Research Council (NHMRC) Medical Bioinformatics Genomics Proteomics Program (grant 389891) for building and maintaining the adolescent twin family resource through which samples were collected. Phenotype collection in the Lothian Birth Cohort 1921 (LBC1921) was supported by the UK’s Biotechnology and Biological Sciences Research Council (BBSRC), The Royal Society and The Chief Scientist Office of the Scottish Government. Phenotype collection in the Lothian Birth Cohort 1936 (LBC1936) was supported by Age UK (The Disconnected Mind project). Genotyping of LBC1921 and LBC1936 was funded by the BBSRC. Methylation typing of LBC1921 and LBC1936 was supported by The Centre for Cognitive Ageing and Cognitive Epidemiology (Pilot Fund award), Age UK, The Wellcome Trust Institutional Strategic Support Fund, The University of Edinburgh, and The University of Queensland. Telomere length data was generated with the support of Carmen Martin-Ruiz and Thomas von Zglinicki. R.E.M., S.E.H., J.M.S., I.J.D. and P.M.V. are members of the University of Edinburgh Centre for Cognitive Ageing and Cognitive Epidemiology (CCACE). CCACE is supported by funding from the BBSRC, the Economic and Social Research Council (ESRC), the Medical Research Council (MRC), and the University of Edinburgh as part of the cross-council Lifelong Health and Wellbeing initiative (MR/K026992/1).

Author information

Ian J. Deary, Peter M. Visscher and Grant W. Montgomery contributed equally.

Authors and Affiliations

The Institute for Molecular Bioscience, The University of Queensland, Brisbane, 4072, QLD, Australia
Allan F. McRae, Sonia Shah, Jian Yang, Joseph E. Powell, Anjali K. Henders, Naomi R. Wray, Peter M. Visscher & Grant W. Montgomery
Queensland Brain Institute, The University of Queensland, Brisbane, 4072, QLD, Australia
Allan F. McRae, Jian Yang, Naomi R. Wray & Peter M. Visscher
Medical Genetics Section, Centre for Genomic and Experimental Medicine, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XU, UK
Riccardo E. Marioni & Sarah E. Harris
Centre for Cognitive Ageing and Cognitive Epidemiology, University of Edinburgh, 7 George Square, Edinburgh, EH8 9JZ, UK
Riccardo E. Marioni, Sarah E. Harris, John M. Starr, Ian J. Deary & Peter M. Visscher
Edinburgh Clinical Research Facility, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, UK
Jude Gibson & Lee Murphy
QIMR Berghofer Medical Research Institute, Brisbane, 4029, QLD, Australia
Lisa Bowdler, Jodie N. Painter & Nicholas G. Martin
Alzheimer Scotland Dementia Research Centre, University of Edinburgh, Edinburgh, EH8 9JZ, UK
John M. Starr
Department of Psychology, University of Edinburgh, Edinburgh, EH8 9JZ, UK
Ian J. Deary

Contributions

Conceived and designed the experiments: A.F.M., R.E.M., N.R.W., I.J.D., P.M.V., G.W.M. Performed the experiments: S.E.H., J.G., A.K.H., L.B., J.N.P., L.M. Analyzed the data: A.F.M., R.E.M., S.S., J.Y. Contributed reagents/materials/analysis tools: S.E.H., J.G., A.K.H., N.G.M., J.M.S., L.M. Wrote the paper: A.F.M., R.E.M., N.R.W., I.J.D., P.M.V., G.W.M. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Allan F. McRae.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Electronic supplementary material

Supplementary Figures and Tables

Supplementary table 1: List of replicated mQTL

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

McRae, A.F., Marioni, R.E., Shah, S. et al. Identification of 55,000 Replicated DNA Methylation QTL. Sci Rep 8, 17605 (2018). https://doi.org/10.1038/s41598-018-35871-w

Received: 14 April 2016
Accepted: 12 November 2018
Published: 04 December 2018
DOI: https://doi.org/10.1038/s41598-018-35871-w
