We characterized DNA methylation quantitative trait loci (mQTLs) in a large collection (n = 166) of human fetal brain samples spanning 56–166 d post-conception, identifying >16,000 fetal brain mQTLs. Fetal brain mQTLs were primarily cis-acting, enriched in regulatory chromatin domains and transcription factor binding sites, and showed substantial overlap with genetic variants that were also associated with gene expression in the brain. Using tissue from three distinct regions of the adult brain (prefrontal cortex, striatum and cerebellum), we found that most fetal brain mQTLs were developmentally stable, although a subset was characterized by fetal-specific effects. Fetal brain mQTLs were enriched amongst risk loci identified in a recent large-scale genome-wide association study (GWAS) of schizophrenia, a severe psychiatric disorder with a hypothesized neurodevelopmental component. Finally, we found that mQTLs can be used to refine GWAS loci through the identification of discrete sites of variable fetal brain methylation associated with schizophrenia risk variants.
Human brain development is orchestrated by complex transcriptional programs1, which are guided and reinforced by epigenetic modifications to DNA and histone proteins. DNA methylation is the most extensively studied epigenetic modification, having a key role in many important genomic regulatory processes2. Of note, the establishment and maintenance of cell- and tissue-specific DNA methylation patterns are crucial for normal mammalian development3. Although traditionally regarded as a mechanism of transcriptional repression, DNA methylation can be associated with both increases and decreases in gene expression4, and has recently been implicated in other genomic functions, including alternative splicing and promoter usage5.
We recently characterized widespread changes in DNA methylation across human fetal brain development6, although the factors influencing inter-individual methylomic variation during the prenatal period are unknown. Studies in a variety of tissues, including adult human brain7,8, have shown that DNA methylation can be influenced by DNA sequence variation. These mQTLs have been found to overlap with DNA variants associated with levels of gene expression (expression quantitative trait loci, eQTLs)4,9, and may serve as markers for these as well as other genetic influences on gene regulation. Although mQTLs have been assessed in the adult human brain using low-resolution DNA methylation arrays7,8, mQTLs in the developing human brain have not been explored.
We combined high-density DNA methylation profiling with genome-wide SNP genotyping in a large (n = 166) collection of human brain samples from the first and second trimesters of gestation. Given the growing evidence that many common variants associated with complex diseases act through effects on gene regulation10,11, we subsequently tested for enrichment of fetal brain mQTLs among risk loci identified in a recent large-scale GWAS of schizophrenia12, a neuropsychiatric disorder with a hypothesized neurodevelopmental component13,14. Finally, we found that mQTL data can be used to refine broad GWAS loci through the identification of discrete sites of variable fetal brain methylation associated with schizophrenia risk variants. As a resource to the wider community, we have developed a searchable online database of fetal brain mQTLs that can be accessed at http://epigenetics.essex.ac.uk/mQTL/.
mQTLs in the developing human brain are widespread and predominantly characterized by cis effects
We performed genome-wide single-nucleotide polymorphism (SNP) genotyping and DNA methylation profiling in 166 human fetal brain samples ranging from 56–166 d post-conception (Online Methods and Supplementary Table 1). After stringent quality control, we tested for an additive effect of allele dosage on DNA methylation across all potential pairings of 430,304 SNPs and 314,554 DNA methylation sites to identify fetal brain mQTLs (Supplementary Table 1). We identified 16,809 mQTLs at a conservative Bonferroni-corrected significance threshold of P < 3.69 × 10−13 (Supplementary Tables 2 and 3). The median DNA methylation change per allele across all identified mQTLs was 6.69% (interquartile range (IQR) = 3.17–8.96%; Supplementary Fig. 1), slightly larger than that reported in a previous analysis of genome-wide mQTLs in the adult brain (median effect size = 4.11%, IQR = 2.13–6.97%)7. The majority of mQTL SNPs (74.17%) were associated with DNA methylation at only a single probe; in contrast, most DNA methylation sites showing evidence for association (69.83%) were associated with multiple mQTL SNPs, presumably as a result of linkage disequilibrium (LD) between SNPs (Supplementary Fig. 2). A searchable database of fetal brain mQTLs is available at http://epigenetics.essex.ac.uk/mQTL/.
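Assuming a family-wise error rate of α = 0.05 (the conventional level; the text does not state it explicitly), the Bonferroni thresholds used here and in the later heterogeneity analysis follow directly from the number of tests performed:

```latex
\begin{aligned}
P_{\text{mQTL}} &= \frac{0.05}{430{,}304 \times 314{,}554} = \frac{0.05}{1.3535 \times 10^{11}} \approx 3.69 \times 10^{-13},\\
P_{\text{het}}  &= \frac{0.05}{10{,}663} \approx 4.69 \times 10^{-6}.
\end{aligned}
```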
The majority of fetal brain mQTLs (96.3%) involve SNPs and DNA methylation sites on the same chromosome (Fig. 1a and Supplementary Table 2). We defined significant SNP-methylation relationships spanning <500 kb as cis-mQTLs (n = 15,942, 94.8% of total), and those spanning >500 kb or involving inter-chromosomal effects (n = 867, 5.16% of total) as trans-mQTLs. This predominance of cis-mQTLs is consistent with data from other tissues and cell types7,15,16,17,18. Among the cis-mQTLs, both effect size (that is, DNA methylation change per allele; Fig. 1b) and significance (that is, P value; Supplementary Fig. 3) were related to the distance between the Illumina 450K array probe and the mQTL SNP.
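The cis/trans definition above can be expressed as a small classification rule. This is a minimal sketch under stated assumptions: `MQTLPair` and `classify_mqtl` are hypothetical names, positions are taken as base-pair coordinates, and a distance of exactly 500 kb (which the text does not assign) is treated here as trans.

```python
from dataclasses import dataclass

CIS_WINDOW_BP = 500_000  # 500 kb cis window, as defined in the text


@dataclass(frozen=True)
class MQTLPair:
    """Hypothetical record for one significant SNP-CpG association."""
    snp_chr: str
    snp_pos: int
    cpg_chr: str
    cpg_pos: int


def classify_mqtl(pair: MQTLPair, window: int = CIS_WINDOW_BP) -> str:
    """Return 'cis' or 'trans' for a SNP-CpG pair (illustrative helper)."""
    if pair.snp_chr != pair.cpg_chr:
        return "trans"  # inter-chromosomal effects are always trans
    distance = abs(pair.snp_pos - pair.cpg_pos)
    # Assumption: a distance of exactly 500 kb falls on the trans side.
    return "cis" if distance < window else "trans"
```

Applied to a full list of pairs, `collections.Counter(classify_mqtl(p) for p in pairs)` would give the cis and trans totals reported above.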
Despite the preponderance of cis-mQTLs, there were some notable trans-mQTL effects (Fig. 1c and Supplementary Table 4), consistent with previous reports of long-range genetic regulation of epigenetic variation in multiple cell types19. Although the average effect size for trans-mQTLs was significantly lower than that observed for cis-mQTLs (two-sided Wilcoxon rank sum test, P = 6.74 × 10−7; Supplementary Fig. 1), there was a higher proportion of larger (DNA methylation change per allele > 25%) effects among trans-mQTLs than cis-mQTLs (1.04% versus 0.715%). Of the 178 DNA methylation sites identified as being associated with trans-acting genetic variation in the fetal brain, 50 (28.09%) and 108 (60.67%) were also identified in studies of pancreatic islet cells16 and lymphocytes19, respectively (Supplementary Tables 4 and 5). These long-range associations between genotype and DNA methylation complement data showing interactions between regulatory elements spanning several Mb20, and even between chromosomes21.
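Effect-size distributions are compared here (and again for the schizophrenia-associated mQTLs) with a two-sided Wilcoxon rank-sum test. The sketch below shows the statistic using only the standard library. It uses average ranks for ties and a normal approximation with continuity correction, and it omits the tie correction to the variance. R's `wilcox.test` uses an exact distribution for small tie-free samples, so P values from this approximation may differ slightly.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for x, two-sided P value).
    """
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    ranks = average_ranks(x + y)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = max((abs(u - mu) - 0.5) / sigma, 0.0)  # continuity correction
    return u, math.erfc(z / math.sqrt(2))      # 2 * (1 - Phi(z))
```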
Fetal brain mQTLs are significantly enriched in functional regulatory domains
We used data from ENCODE and the Roadmap Epigenomics Project10,22,23,24 to assess whether sites characterized by genotype-associated DNA methylation colocalize with genomic regions associated with markers of transcriptional activity. We observed an enrichment of fetal brain mQTLs in genomic regions characterized by chromatin immunoprecipitation sequencing (ChIP-seq) peaks for repressive histone modifications in fetal brain, such as H3K9me3 (relative enrichment = 1.43, P = 0.00107) and H3K27me3 (relative enrichment = 1.16, P = 0.000144), and a significant depletion of fetal brain mQTLs in genomic regions defined by ChIP-seq peaks for histone modifications associated with active transcription, such as H3K4me1 (relative enrichment = 0.828, P = 6.55 × 10−8) and H3K36me3 (relative enrichment = 0.543, P = 1.39 × 10−15; Supplementary Table 6). Fetal brain mQTLs were also significantly enriched in regions of open chromatin, as indicated by DNase1 hypersensitivity sites (DHSs) identified in the adult human brain (relative enrichment = 1.10–1.15, P = 0.00592–6.35 × 10−5; Supplementary Table 7), consistent with the observation that intermediately methylated domains, one potential consequence of allele-specific DNA methylation, are enriched in DHSs25. We also identified a significant enrichment of genotype-associated DNA methylation sites overlapping annotated transcription factor binding sites identified by the ENCODE project10,22 (relative enrichment = 1.26, P = 2.96 × 10−11; Supplementary Table 8). Of note, there was a highly significant enrichment (odds ratio = 1.35, P = 1.66 × 10−9) of fetal brain mQTLs influencing DNA methylation in CCCTC-binding factor (CTCF) motifs (Supplementary Table 8), confirming a finding from a previous study of heritable DNA methylation sites in the human brain26. CTCF is an 11-zinc-finger protein with insulator and chromatin barrier activity whose binding affinity is known to be strongly influenced by DNA methylation27.
Given the important role of CTCF in core genomic processes, including transcription, chromosomal interactions and chromatin structure28, the enrichment of genetically mediated DNA methylation at CTCF binding sites highlights a potential mechanism linking genetic variation to genomic function. In addition to CTCF, a significant enrichment was also observed in binding sites for several other transcription factors, including IRF1 (relative enrichment = 1.34, P = 1.12 × 10−6), GABP (relative enrichment = 1.32, P = 2.99 × 10−6), ELF1 (relative enrichment = 1.26, P = 6.07 × 10−6), Rad21 (relative enrichment = 1.31, P = 0.000133) and CCNT2 (relative enrichment = 1.25, P = 0.000298), with significant depletion in binding sites for others (for example, SUZ12 (relative enrichment = 0.350, P = 1.88 × 10−9) and CtBP2 (relative enrichment = 0.444, P = 0.000201); Supplementary Table 8).
Although the majority of fetal brain mQTLs are conserved in adult brain regions, there are fetal-specific genetic effects on DNA methylation at certain loci
We next generated mQTL data from three adult human brain regions (prefrontal cortex (PFC), striatum (STR) and cerebellum (CER)) dissected from matched donors (n = 83; 21–96 years old; Online Methods and Supplementary Table 1) to explore the extent to which fetal brain mQTLs are also present in the adult brain. Using a replication mQTL significance threshold of P < 10−5, we found that the majority (83.46%) of fetal brain mQTLs were present in at least one of the tested adult brain regions (Supplementary Table 9 and Fig. 2a). Across all Bonferroni-significant fetal brain mQTLs, individual mQTL effect sizes were highly correlated between fetal brain and each adult brain region (PFC: r = 0.911, P < 2.2 × 10−16; STR: r = 0.899, P < 2.2 × 10−16; CER: r = 0.835, P < 2.2 × 10−16; Supplementary Fig. 4), including mQTLs that did not meet our replication threshold (Supplementary Fig. 5). Of note, fetal brain mQTLs that did not replicate in adult brain were characterized by significantly lower effect sizes across all brain regions, including the fetal brain discovery sample (P = 3.18 × 10−141; Supplementary Fig. 6). Despite the overall strong concordance in the direction of mQTL effects between fetal and adult brain, there were notable examples of heterogeneity between fetal and adult brain tissue. We used a multilevel linear regression model to test the significance of a genotype-by-developmental-stage interaction term and thereby identify differential mQTL effects across our data sets. Of the 10,663 fetal brain mQTL effects that we tested, 3,173 (29.76%) were significantly heterogeneous (Bonferroni-corrected, P < 4.69 × 10−6) across the fetal and adult data sets (Supplementary Table 10). These include mQTLs that had notably larger or smaller effects in the adult brain, and fetal-specific mQTLs showing no significant association with DNA methylation in any adult brain region (Fig. 2b).
We also identified a small number (n = 45) of fetal brain mQTLs that had opposite effects on DNA methylation in fetal and adult brain samples (Fig. 2c and Supplementary Table 11).
Fetal brain mQTLs overlap with genetic variants associated with RNA transcript abundance in the brain
We used eQTL data from ten adult brain regions29 to test whether the identified fetal brain mQTLs overlap with genetic variants associated with RNA transcript abundance in cis. We compared the distribution of the minimum brain eQTL P values of all interrogated SNPs, split into those identified as fetal brain mQTLs and those that were not (Supplementary Fig. 7), finding that variants associated with DNA methylation were indeed more likely to be associated with gene expression in cis (Wilcoxon rank-sum test P < 2.2 × 10−16). Of the 414,172 SNPs tested in both the mQTL and eQTL data sets, 9,869 were identified as Bonferroni-significant cis mQTLs and 2,674 as Bonferroni-significant (P < 5.99 × 10−9) eQTLs, with an overlap of 750 variants associated with 227 DNA methylation probes and 127 transcript probes (Supplementary Table 12). At a more relaxed eQTL threshold (P < 1.00 × 10−7), there was an overlap of 1,042 variants associated with 315 DNA methylation probes and 183 transcript probes. A list of all variants associated with both DNA methylation and gene expression in cis is given in Supplementary Table 13. This overlapping set of variants likely includes multiple SNPs in LD that are associated with the same expression transcript and DNA methylation site. Because the extent and magnitude of LD varies across the genome, we established an LD-independent set of SNPs associated with DNA methylation and tested its overlap with the sentinelized subsignals (that is, the most strongly associated marker from a set in high LD, r2 > 0.8) from the brain eQTL data set29. This overlap was significantly greater than expected when compared with 1,000,000 simulated mQTL SNP sets matched for allele frequency (relative enrichment = 4.23, P < 1.00 × 10−6; Supplementary Table 14 and Supplementary Fig. 8).
There is a significant enrichment of schizophrenia-associated GWAS variants in fetal brain mQTLs
Our catalog of fetal brain mQTLs provides a unique resource for investigating putative functional consequences of genetic variation associated with postulated neurodevelopmental disorders such as schizophrenia. A recent large-scale GWAS identified 108 independent genomic loci exhibiting genome-wide significant association with the disorder (P < 5 × 10−8), with evidence for a substantial polygenic component in signals that fall below this stringent level of significance12. Because the majority of these variants reside in regions of strong LD and do not index coding variants affecting protein structure, there remains considerable uncertainty about the causal genes involved in pathogenesis and the way in which they are functionally regulated by schizophrenia risk variants. We used PLINK30 to 'clump' our list of significant (P < 3.69 × 10−13) fetal brain mQTL variants into a set of quasi-independent SNPs (SNP pairwise r2 < 0.25 in 250 kb (non-major histocompatibility complex, MHC) or 10,000 kb (MHC); see Online Methods) and tested for enrichment of schizophrenia-associated variants across a range of GWAS significance thresholds, using up to 1,000,000 simulated SNP sets to generate empirical P values (Online Methods). We observed a highly significant enrichment (relative enrichment = 4.11, P = 3.0 × 10−6) of genome-wide significant schizophrenia risk variants amongst fetal brain mQTLs, with a trend for stronger enrichment at more stringent levels of GWAS significance (Supplementary Fig. 9 and Table 1). To examine the specificity of any enrichment, we repeated these analyses using large GWAS data sets from a non-neurodevelopmental brain disorder (Alzheimer's disease, AD31) and two non-neurological phenotypes (body mass index, BMI32, and type 2 diabetes, T2D33). 
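The 'clumping' step can be pictured as a greedy procedure: take the most significant remaining SNP as an index SNP, discard nearby SNPs in LD with it, and repeat. The sketch below uses the r2 < 0.25, 250 kb and 10,000 kb MHC parameters from the text. It is not PLINK's implementation. In particular, LD is approximated here as the squared Pearson correlation of genotype dosages, PLINK's P-value thresholds (`--clump-p1`/`--clump-p2`) are ignored, and the dictionary layout of each SNP record is a hypothetical choice.

```python
MHC_CHR, MHC_START, MHC_END = "6", 25_000_000, 35_000_000


def r_squared(g1, g2):
    """Squared Pearson correlation of two dosage vectors (an LD r^2 proxy)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0  # monomorphic SNP: LD undefined, treated as independent
    return cov * cov / (v1 * v2)


def in_mhc(snp):
    return snp["chr"] == MHC_CHR and MHC_START <= snp["pos"] <= MHC_END


def clump(snps, r2_max=0.25, window_kb=250, mhc_window_kb=10_000):
    """Greedy clumping; each SNP is a dict with id, chr, pos, p and dosage."""
    remaining = sorted(snps, key=lambda s: s["p"])  # most significant first
    kept = []
    while remaining:
        index = remaining.pop(0)
        kept.append(index)
        window = (mhc_window_kb if in_mhc(index) else window_kb) * 1000
        survivors = []
        for s in remaining:
            nearby = s["chr"] == index["chr"] and abs(s["pos"] - index["pos"]) <= window
            if nearby and r_squared(index["dosage"], s["dosage"]) >= r2_max:
                continue  # absorbed into the current index SNP's clump
            survivors.append(s)
        remaining = survivors
    return [s["id"] for s in kept]
```

The surviving index SNPs form the quasi-independent set that is then intersected with GWAS variants at each significance threshold.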
Although our confidence in the enrichment of fetal brain mQTLs in these data sets is limited by the smaller number of quasi-independent GWAS SNPs, levels of enrichment were notably lower for all other tested phenotypes (Table 1). Variants associated with AD were, however, nominally significantly enriched at the most relaxed GWAS threshold (GWAS threshold P < 5 × 10−5: relative enrichment = 3.18, P = 0.022), and several individual GWAS variants identified for this and the other tested phenotypes were also significant fetal brain mQTLs (P < 3.69 × 10−13; Supplementary Table 15).
The identified mQTLs residing in schizophrenia-associated GWAS regions do not necessarily represent the actual causal risk variants; in many instances, we are likely to be capturing 'passenger' effects, whereby the variant influencing DNA methylation and the schizophrenia-associated SNP simply co-segregate in the same LD block. Thus, we sought to identify instances in which a likely causal risk variant for schizophrenia was itself an mQTL SNP. We used data from the 1000 Genomes Project (http://www.1000genomes.org/) to identify all variants in strong LD (r2 > 0.8) with the 125 autosomal index SNPs associated with schizophrenia12. Of note, two of the schizophrenia GWAS index SNPs were themselves Bonferroni-significant fetal brain mQTLs: rs2535627 (associated with DNA methylation at cg11645453, P = 3.05 × 10−13; Supplementary Fig. 10) and rs4648845 (associated with DNA methylation at cg02275930, P = 4.54 × 10−15; Supplementary Fig. 11). A further 46 mQTL variants in strong LD with six other index SNPs were part of 86 highly significant fetal brain mQTL pairs (Supplementary Table 16).
mQTLs can be used to localize putative causal loci within large genomic regions associated with schizophrenia
To generate a more comprehensive database of mQTLs in the fetal brain and to identify more examples in which the same SNP is associated with both DNA methylation and disease, we imputed our genotype data using the most recent panel downloaded from the 1000 Genomes Project (Online Methods). Using an imputed set of 5,177,320 variants, we identified an additional 256,040 mQTLs, which resembled those from the non-imputed data set in terms of genomic distribution and observed effect sizes (Supplementary Table 2 and Supplementary Fig. 12). The full list of fetal brain mQTLs after imputation can be downloaded from http://epigenetics.essex.ac.uk/mQTL//All_Imputed_BonfSignificant_mQTLs.csv.gz. Our imputed data enabled us to identify 1,067 instances in which the same SNP was associated with both DNA methylation and schizophrenia, with a comprehensive list available for download at http://epigenetics.essex.ac.uk/mQTL//PGC_IndexSNPs_QTLs_Inc2PCs_AllTissues_MatchSNPPosition.csv. Because they could be biased by the LD structure at associated loci, the imputed mQTL data were not used for subsequent enrichment analyses, but they enabled us to further refine schizophrenia candidate regions and undertake colocalization analyses to identify variants associated with both DNA methylation and schizophrenia. We performed a Bayesian colocalization analysis34 across the 105 autosomal regions associated with schizophrenia12, spanning 19,378 DNA methylation sites included in our analysis. Rather than focusing only on the intersection of variants significantly associated with each of two phenotypes independently, this approach compares the pattern of association results from the schizophrenia GWAS and mQTL analyses across a region, combining the summary statistics into posterior probabilities for five hypotheses (Online Methods).
As this methodology assumes that the causal variant is present, or at least very well tagged, in the data set, these colocalization analyses were performed using our imputed fetal brain mQTL data set. The posterior probabilities for 65 regions, involving 296 DNA methylation sites in 306 pairs, supported association signals for both schizophrenia and DNA methylation in that region (PP3 + PP4 > 0.99; Supplementary Table 17). Of these, 26 pairs (covering 15 schizophrenia-associated regions) had a higher posterior probability for schizophrenia and DNA methylation being associated with the same causal variant (PP4/PP3 > 1), and 16 of these pairs (10 regions) had sufficient support to be considered 'convincing' (PP4/PP3 > 5) according to the criteria of a previous study34. Of note, three of these top-ranked pairs were annotated to the AS3MT locus in a robust schizophrenia-associated region on chromosome 10 (Supplementary Table 17). The utility of mQTL mapping for localizing putative causal loci associated with disease in this region is shown in Figure 3, with additional examples for other schizophrenia-associated regions available on our website (http://epigenetics.essex.ac.uk/mQTL/).
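The five hypotheses are H0 (no association), H1 (schizophrenia only), H2 (DNA methylation only), H3 (both, with distinct causal variants) and H4 (both, with a shared causal variant). The sketch below shows how per-SNP summary statistics become PP0 to PP4 under the widely used approximate Bayes factor formulation, as implemented in the R package coloc. The text does not specify the software, priors or effect-size prior used, so the coloc default priors (p1 = p2 = 10−4, p12 = 10−5), the prior standard deviation of 0.15 and all function names are assumptions. Like that method, the sketch assumes at most one causal variant per trait in a region.

```python
import math


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one SNP (prior_sd assumed)."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z * z)


def logsumexp(xs):
    m = max(xs)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a, b):
    """log(exp(a) - exp(b)) for a >= b, computed stably."""
    d = -math.expm1(b - a)
    return a + math.log(d) if d > 0 else float("-inf")


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PP0, ..., PP4] for two traits over one region."""
    s1, s2 = logsumexp(labf1), logsumexp(labf2)
    s12 = logsumexp([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                  # H0: no association
        math.log(p1) + s1,                                    # H1: trait 1 only
        math.log(p2) + s2,                                    # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                  # H4: shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(x - total) for x in log_h]
```

The thresholds in the text would then be applied as `pp[3] + pp[4] > 0.99` for association with both traits, and `pp[4] / pp[3] > 5` for 'convincing' support of a shared variant.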
Schizophrenia-associated genomic regions are enriched for fetal-specific mQTLs
Given the hypothesized neurodevelopmental component to schizophrenia, we examined the extent to which the mQTLs overlapping with schizophrenia-associated variants are characterized by fetal-specific effects (Supplementary Table 16). Across the 78 mQTLs also tested in adult brain samples, overall effect sizes were significantly larger in fetal brain than in each of the adult brain regions tested (Wilcoxon rank-sum test: PFC P = 0.0420, STR P = 0.00226, CER P = 0.00998; Fig. 4). Our heterogeneity analysis highlighted 16 (20.5%) instances in which significantly different relationships between genotype and DNA methylation were found across the adult and fetal data sets, with eight classed as fetal-specific variants (that is, those not reaching our replication threshold (P < 1.00 × 10−5) in any adult brain region) and the remaining eight demonstrating smaller effects across the adult brain.
To explore the functional consequences of genetic variation in the developing human brain, we characterized mQTLs in human fetal brain samples (spanning 56–166 d post-conception), identifying over 16,000 associated pairs of SNPs and DNA methylation sites. We found that fetal brain mQTLs were significantly enriched in functional regulatory domains, including DHSs, regions of repressive histone modifications and specific transcription factor binding sites across the genome, and observed significant overlap with genetic variants influencing gene expression in the brain. Although the majority of fetal brain mQTLs appear to be conserved across adult brain regions, we found evidence for fetal-specific genetic effects at certain loci. Our data concur with findings from an independent study of cortical mQTLs across development35; mQTL effects were highly consistent across both analyses (Supplementary Fig. 13) and largely conserved between fetal and adult brain.
There is growing evidence that the majority of common variants associated with complex traits act through effects on gene regulation10,11. Our data add to a growing literature showing that DNA methylation is genetically influenced26, with mQTLs representing a potential mechanism linking genetic variation to complex phenotypes4,9,36,37. We found a significant enrichment of schizophrenia-associated GWAS variants in fetal brain mQTLs, indicating that common genetic variants conferring risk for schizophrenia are associated with altered DNA methylation in fetal human brain. The hypothesis that schizophrenia has an early neurodevelopmental component is supported by several lines of epidemiological and neuropathological evidence13,14. However, direct molecular evidence of schizophrenia risk factors operating in the fetal brain is scarce38,39. We recently found that genomic loci that are differentially methylated between schizophrenia patients and unaffected controls in the adult brain are enriched at those undergoing dynamic changes in DNA methylation during human fetal brain development6,40. Here we found that genetic variants exhibiting genome-wide significant association with schizophrenia12 showed a fourfold enrichment amongst fetal brain mQTLs, directly implicating altered gene regulation during fetal brain development in the etiology of the disorder.
To conclude, we report, to the best of our knowledge, the first systematic analysis of genetically mediated DNA methylation in the developing human brain. Our data support the hypothesis that a substantial proportion of the genetic variants conferring schizophrenia risk have regulatory effects that become manifest early in the prenatal period and demonstrate the utility of mQTL mapping for localizing putative causal loci associated with complex disease phenotypes in large genomic regions. As a resource to the wider community, we have developed a searchable online database of fetal brain mQTLs that can be accessed at http://epigenetics.essex.ac.uk/mQTL/.
Human brain samples.
Human fetal brain tissue was acquired from the Human Developmental Biology Resource (HDBR) (http://www.hdbr.org) and the MRC Brain Banks network (http://www.mrc.ac.uk/research/facilities/brain-banks/access-for-research). Ethical approval for the HDBR was granted by the Royal Free Hospital research ethics committee under reference 08/H0712/34 and Human Tissue Authority (HTA) material storage license 12220; ethical approval for the MRC Brain Bank was granted under reference 08/MRE09/38. A detailed description of these samples can be found elsewhere6. Briefly, 173 fetal brain samples (94 male, 79 female) ranging from 56–169 d post-conception were used for DNA methylation and SNP profiling. Brain tissue was obtained frozen and had not been dissected into regions. Half of the brain tissue from each individual fetus was homogenized for subsequent genomic DNA extraction. Postnatal prefrontal cortex (PFC), striatum (STR) and cerebellum (CER) samples were obtained from the MRC London Neurodegenerative Disease Brain Bank and the Douglas Bell-Canada Brain Bank (DBCBB) (http://www.douglasbrainbank.ca) and included samples from both schizophrenia patients and controls. Brain specimens were collected postmortem following consent obtained from next of kin, dissected by neuropathology technicians, snap-frozen and stored at –80 °C. Genomic DNA was isolated from all brain samples using a standard phenol-chloroform extraction protocol. DNA was tested for degradation and purity using spectrophotometry and gel electrophoresis.
Genome-wide quantification of DNA methylation.
500 ng of DNA from each sample was treated with sodium bisulfite in duplicate, using the EZ-96 DNA Methylation kit (Zymo Research). DNA methylation was quantified using the Illumina Infinium HumanMethylation450 BeadChip (Illumina) run on an Illumina iScan System (Illumina) using the manufacturer's standard protocol. Signal intensities for each probe were extracted using Illumina GenomeStudio software (Illumina) and imported into the R statistical program using the methylumi and minfi packages41,42. Multi-dimensional scaling (MDS) plots of variable probes on the sex chromosomes were used to check that the predicted sex corresponded with the reported sex for each individual. Further data quality control and processing steps were conducted using the wateRmelon package43 in R. The pfilter function was used first to remove samples in which >1% of probes had a detection P value > 0.05, and then to remove, across all samples, poor-quality probes with a detection P value > 0.05 in at least 1% of samples and/or a bead count <3 in 5% of samples. The dasen function was used to normalize the data as previously described44. Cross-hybridizing probes44,45, probes with any SNP within 10 bp of the CpG site or single-base extension44 and probes on the sex chromosomes were excluded from the QTL analysis. These data are publicly available through GEO under accession numbers GSE58885, GSE61431 and GSE61380. Genotype data are available to access from dbSNP. All fetal brain mQTL data are also available via an online database at http://epigenetics.essex.ac.uk/mQTL/.
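The two-stage detection filter described above can be sketched as follows. This is a hypothetical stdlib re-implementation of the rules as worded in the text, not wateRmelon's `pfilter`. The dictionary inputs, the function name and the exact boundary handling (treating "at least 1%" and "in 5%" as ≥) are all assumptions.

```python
def filter_detection(det_p, bead_count, p_thresh=0.05,
                     max_bad_probe_frac=0.01, bad_sample_frac=0.01,
                     low_bead_frac=0.05, min_beads=3):
    """Hypothetical sample/probe QC following the rules in the text.

    det_p and bead_count map probe -> list of per-sample values (same sample
    order). Returns (kept sample indices, kept probe names).
    """
    probes = list(det_p)
    n_samples = len(next(iter(det_p.values())))
    # Stage 1: drop samples in which >1% of probes fail detection.
    kept_samples = [
        s for s in range(n_samples)
        if sum(det_p[p][s] > p_thresh for p in probes) / len(probes) <= max_bad_probe_frac
    ]
    if not kept_samples:
        return [], []
    n_kept = len(kept_samples)
    # Stage 2: drop probes failing detection in >=1% of the remaining samples,
    # or with fewer than 3 beads in >=5% of them.
    kept_probes = []
    for p in probes:
        bad_det = sum(det_p[p][s] > p_thresh for s in kept_samples) / n_kept
        low_bead = sum(bead_count[p][s] < min_beads for s in kept_samples) / n_kept
        if bad_det < bad_sample_frac and low_bead < low_bead_frac:
            kept_probes.append(p)
    return kept_samples, kept_probes
```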
Genome-wide SNP genotyping.
200 ng of genomic DNA from each sample was genotyped using the Illumina HumanOmniExpress BeadChip (Illumina). Following scanning, Illumina GenomeStudio software was used for genotype calling and the data were exported as ped and map files. PLINK30 was used to remove samples with >5% missing values, and SNPs with >1% missing values, a Hardy-Weinberg equilibrium P < 0.001 or a minor allele frequency <5%. SNPs were subsequently filtered so that each of the three genotype groups with 0, 1 or 2 minor alleles (or the two genotype groups, with 0 or 1 minor allele, in the case of rare SNPs) had a minimum of 5 observations.
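The per-SNP missingness, minor allele frequency and genotype-group filters can be sketched in a few lines. This is an illustrative helper, not PLINK. It assumes genotypes coded as 0/1/2 allele counts with `None` for missing calls, and it deliberately omits the Hardy-Weinberg test, which PLINK performs as an exact test.

```python
def minor_allele_dosage(genotypes):
    """Recode non-missing 0/1/2 calls so that they count the minor allele."""
    calls = [g for g in genotypes if g is not None]
    alt_freq = sum(calls) / (2 * len(calls))
    return [2 - g for g in calls] if alt_freq > 0.5 else calls


def passes_snp_qc(genotypes, max_missing=0.01, min_maf=0.05, min_group=5):
    """Missingness, MAF and genotype-group-size filters (HWE not included)."""
    if sum(g is None for g in genotypes) / len(genotypes) > max_missing:
        return False
    calls = minor_allele_dosage(genotypes)
    if sum(calls) / (2 * len(calls)) < min_maf:
        return False
    counts = [calls.count(k) for k in (0, 1, 2)]
    if counts[2] == 0:
        # rare SNP: only the 0- and 1-minor-allele groups are observed
        return counts[0] >= min_group and counts[1] >= min_group
    return all(c >= min_group for c in counts)
```

A SNP with 30, 4 and 1 individuals in the three genotype groups, for example, passes the MAF filter but fails the group-size rule.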
Methylation QTL (mQTL) analyses.
Before commencing QTL analyses, genotypes at the polymorphic SNP probes on the HumanMethylation 450K array were compared to calls from the HumanOmniExpress genotyping array to confirm sample identity. All genome-wide SNP-methylation probe pairs were tested using the R package MatrixEQTL46. This package enables fast computation of QTLs by only saving those more significant than a pre-defined threshold (set to P = 0.0001 for these analyses). An additive linear model was fitted to test if the number of alleles (coded 0, 1 and 2) predicted DNA methylation (beta value 0–100) at each site, including covariates for age, sex and the first two principal components from the genotype data to control for ethnicity differences. In addition, a brain bank covariate was also included for the adult data sets.
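MatrixEQTL fits this additive model for all SNP-probe pairs at once using matrix operations. For a single pair, the same test is an ordinary least-squares regression of methylation on allele count plus covariates, with a t-statistic for the genotype coefficient. The stdlib-only sketch below illustrates that per-pair computation. The function names are hypothetical, and the brain-bank covariate for adult samples would simply be one more covariate column.

```python
import math


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mqtl_test(genotype, methylation, covariates):
    """OLS fit of methylation ~ 1 + genotype + covariates.

    genotype: allele counts (0, 1, 2); methylation: beta values (0-100);
    covariates: list of columns (e.g. age, sex, genotype PCs).
    Returns (genotype coefficient, its t-statistic).
    """
    n = len(genotype)
    cols = [[1.0] * n, [float(g) for g in genotype]] + [[float(v) for v in c] for c in covariates]
    x = [list(row) for row in zip(*cols)]  # n x p design matrix
    p = len(cols)
    xtx = [[sum(x[i][a] * x[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(x[i][a] * methylation[i] for i in range(n)) for a in range(p)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [methylation[i] - sum(coef[a] * x[i][a] for a in range(p)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - p)
    se = math.sqrt(sigma2 * inv[1][1])
    return coef[1], coef[1] / se
```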
Identifying overlap and testing for enrichment of expression QTLs (eQTLs) among fetal brain mQTLs.
P values for all cis eQTLs (within 1 Mb) were supplied by the authors of a recent manuscript documenting eQTLs in the human brain29 to enable a more thorough examination of the overlap of eQTL and mQTL. To identify all variants associated with DNA methylation and gene expression in cis, our definition of cis mQTL was relaxed to match that used in the eQTL study. Chromosome and base position of the SNPs were used to map between the two data sets. A Bonferroni significance threshold was established for the eQTL results (P from aveALL analysis < 5.99 × 10−9), based on the number of cis eQTLs tested across all SNPs overlapping with those tested in the mQTL data set (414,172); two additional exploratory thresholds (P < 10−9 and P < 10−7) were also used.
Before testing for a significant overlap with SNPs associated with brain eQTLs, all SNPs associated with at least one DNA methylation site in the fetal brain were 'clumped' based on their best mQTL P value using PLINK30 to create a list of quasi-independent SNPs (r2 < 0.25 for all pairs of SNPs within 250 kb) and to prevent LD between SNPs in the set from biasing the results. Given the extensive correlation between variants in the major histocompatibility complex (MHC) region, a more stringent clumping procedure was used for SNPs located in chr6:25000000–35000000, where the window for pairwise SNP comparisons was extended to 10,000 kb. To test for a larger overlap than expected by chance, up to 1,000,000 simulated sets, matched for allele frequency, were drawn to calculate the expected overlap and generate empirical P values. SNPs were categorized into minor allele frequency (MAF) bins at intervals of 2%, and SNPs from each bin were selected to match the distribution in the test set. For each simulated set, we counted the number of SNPs overlapping the set of sentinelized subsignals from the aveALL analysis described previously29; the empirical P value was the number of simulations with an overlap at least as large as that of the true 'clumped' Bonferroni-significant mQTL SNP set, divided by the number of simulations performed. Fold change statistics were calculated as the true overlap divided by the mean overlap of these simulations, and 95% confidence intervals as the true overlap divided by the 2.5th and 97.5th quantiles of the distribution of simulated overlaps.
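The MAF-matched simulation can be sketched as follows: bin every tested SNP by MAF in 2% steps, repeatedly draw null sets with the same per-bin composition as the clumped mQTL set, and compare the observed overlap with the null distribution. This is an illustrative implementation. The function names, the seeded `random.Random` generator and the simple index-based quantiles are assumptions rather than details taken from the text.

```python
import random
from collections import defaultdict


def maf_bin(maf, width=0.02):
    return int(maf / width)


def matched_enrichment(test_ids, maf, in_target, n_sims=1000, seed=1):
    """Empirical enrichment of a SNP set in a target set via MAF-matched nulls.

    test_ids: clumped test SNP ids; maf: dict id -> MAF for all tested SNPs;
    in_target: set of target ids (e.g. eQTL sentinels or GWAS variants).
    Returns (fold_change, empirical_p, ci_low, ci_high).
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for snp, f in maf.items():
        pool[maf_bin(f)].append(snp)
    need = defaultdict(int)  # how many SNPs to draw from each MAF bin
    for snp in test_ids:
        need[maf_bin(maf[snp])] += 1
    observed = sum(snp in in_target for snp in test_ids)
    overlaps = []
    for _ in range(n_sims):
        drawn = [s for b, k in need.items() for s in rng.sample(pool[b], k)]
        overlaps.append(sum(s in in_target for s in drawn))
    overlaps.sort()
    empirical_p = sum(o >= observed for o in overlaps) / n_sims
    mean_overlap = sum(overlaps) / n_sims
    fold = observed / mean_overlap if mean_overlap else float("inf")
    q_lo = overlaps[int(0.025 * (n_sims - 1))]
    q_hi = overlaps[int(0.975 * (n_sims - 1))]
    ci_low = observed / q_hi if q_hi else float("inf")
    ci_high = observed / q_lo if q_lo else float("inf")
    return fold, empirical_p, ci_low, ci_high
```

The same routine covers the GWAS enrichment analyses if `in_target` is set to the variants passing a given GWAS significance threshold.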
Enrichment of regulatory regions.
Published 450K array probe annotations22 were used to identify probes located in transcription factor binding sites (TFBSs) or DNase1 hypersensitivity sites (DHSs) based on data made publicly available as part of the ENCODE project10,25. In addition, brain-specific DHSs were downloaded from the UCSC (University of California, Santa Cruz) Genome Browser for 'Frontal_cortex_OC', 'Cerebellum_OC' and 'Cerebrum_frontal_OC' and used to annotate DNA methylation sites in the same manner. Peaks associated with five histone modifications identified separately in two fetal brain samples (17 weeks gestation; 1 male, 1 female; sample IDs E081 and E082) were downloaded from the Roadmap Epigenomics Project23. Because of heterogeneity in the ChIP-seq profiles, presumed to reflect experimental rather than biological differences47, DNA methylation sites had to be located within peaks generated from both brain samples to be classed as overlapping any of the histone marks. The overlap between regulatory features and the DNA methylation sites identified from the set of Bonferroni-significant mQTLs in the fetal brain data set was tested for enrichment using a two-sided Fisher's exact test on a 2 × 2 table. The significance threshold for enrichment of overlap with transcription factor binding sites was calculated using a Bonferroni correction for the 149 different transcription factor binding sites tested.
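For each regulatory feature, the 2 × 2 table cross-classifies tested probes as mQTL versus non-mQTL and inside versus outside the feature. A minimal stdlib sketch of the two-sided Fisher's exact test follows. It sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table. The function name is hypothetical, and the odds ratio returned is the simple sample estimate; R's `fisher.test` reports a conditional maximum-likelihood estimate instead, so values can differ slightly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (P value, sample odds ratio a*d / (b*c)).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are counted
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return min(p, 1.0), odds_ratio
```

With 149 transcription factors tested, the Bonferroni threshold described above works out to 0.05/149 ≈ 3.36 × 10−4.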
All Bonferroni-significant mQTLs (P < 3.69 × 10⁻¹³) identified in the fetal brain for which corresponding mQTL data were available from all three adult brain regions (n = 10,663) were tested for heterogeneity in the relationship between DNA methylation and genetic variation across the data sets. A null model of no heterogeneity was fitted, mirroring the linear model used to test for mQTL effects: DNA methylation (beta value, 0–100) was regressed on the number of alleles (coded 0, 1 and 2), with fixed-effect covariates for sex, age and the first two genetic principal components. Because the adult brain regions were dissected from the same set of individuals, we expect their DNA methylation values to be correlated; we also expect DNA methylation values within a brain region to be correlated. Individual and brain region were therefore both included as random effects. An indicator variable distinguishing fetal from adult samples was also included, to control for absolute differences in DNA methylation level associated with age or developmental stage. This null model was compared with a heterogeneity model that additionally included an interaction between genotype and the developmental stage indicator, using an ANOVA to calculate the heterogeneity P value.
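The null-versus-heterogeneity comparison can be illustrated with a fixed-effects simplification. This sketch is not the study's model: it drops the random effects for individual and brain region (fitting those needs a mixed-model tool, for example lme4 in R or statsmodels' MixedLM in Python) and compares the two nested ordinary least-squares models with a likelihood-ratio test. Variable names are hypothetical.

```python
import math

import numpy as np


def _rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def heterogeneity_lrt(meth, geno, fetal, covars):
    """Likelihood-ratio test of a genotype x developmental-stage interaction.

    meth   -- DNA methylation (beta value, 0-100), one entry per sample
    geno   -- allele count (0, 1, 2)
    fetal  -- 1 for fetal samples, 0 for adult samples
    covars -- (n, k) array of fixed-effect covariates (e.g. sex, age, PCs)
    Random effects for individual and brain region are deliberately omitted.
    """
    meth, geno, fetal = (np.asarray(v, dtype=float) for v in (meth, geno, fetal))
    n = len(meth)
    null = np.column_stack([np.ones(n), geno, fetal, covars])
    alt = np.column_stack([null, geno * fetal])
    # For nested Gaussian linear models fitted by maximum likelihood,
    # 2 * (logL_alt - logL_null) = n * log(RSS_null / RSS_alt).
    lr = max(0.0, n * math.log(_rss(null, meth) / _rss(alt, meth)))
    # One extra parameter: chi-square with 1 df, upper tail = erfc(sqrt(x / 2)).
    return lr, math.erfc(math.sqrt(lr / 2))
```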
Enrichment of disease-associated variants among fetal brain mQTLs.
A simulation procedure similar to that used to test the overlap of mQTLs and eQTLs was used to test for a larger-than-expected overlap between fetal brain mQTL SNPs and those identified in GWAS of complex traits and disorders, including schizophrenia12, Alzheimer's disease31, body mass index (BMI)32 and type 2 diabetes33. The clumping procedure described for the eQTL enrichment analysis was repeated separately for each phenotype, ensuring that the best mQTL SNP among those analyzed in that GWAS was retained. Up to 1,000,000 simulations were performed to generate the expected overlap between the set of mQTL SNPs and variants associated with each trait at four GWAS significance thresholds (P < 5 × 10⁻⁵, 5 × 10⁻⁶, 5 × 10⁻⁷ and 5 × 10⁻⁸), and to derive fold change statistics and empirical P values.
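Restricting to the SNPs analyzed in each GWAS before clumping is what ensures that each retained SNP is the best mQTL SNP available for that phenotype. A greedy, PLINK `--clump`-style sketch of this is shown below, including the extended 10,000 kb window within the MHC. It is an illustration under assumptions, not PLINK's implementation: the `r2` lookup function, the tuple layout of `snps` and the rule of applying the MHC window only when both SNPs lie in chr6:25–35 Mb are hypothetical choices.

```python
MHC = ("6", 25_000_000, 35_000_000)  # chr6:25-35 Mb, as in the text


def in_mhc(chrom, pos):
    return chrom == MHC[0] and MHC[1] <= pos <= MHC[2]


def clump(snps, r2, present=None, r2_max=0.25, window_kb=250,
          mhc_window_kb=10_000):
    """Greedy LD clumping on best mQTL P value (PLINK --clump-style sketch).

    snps    -- list of (snp_id, chrom, pos, best_p) tuples
    r2      -- hypothetical lookup (id1, id2) -> LD r^2 between two SNPs
    present -- optional set of SNP IDs analysed in a given GWAS
    Returns retained SNP IDs, most significant first.
    """
    if present is not None:
        # Restrict to SNPs the GWAS analysed before clumping, so the index
        # SNP of each clump is the best mQTL SNP that the GWAS also tested.
        snps = [s for s in snps if s[0] in present]

    def linked(a, b):
        (aid, achrom, apos), (bid, bchrom, bpos) = a, b
        if achrom != bchrom:
            return False
        # Wider comparison window when both SNPs fall within the MHC.
        both_mhc = in_mhc(achrom, apos) and in_mhc(bchrom, bpos)
        window_bp = (mhc_window_kb if both_mhc else window_kb) * 1000
        return abs(apos - bpos) <= window_bp and r2(aid, bid) >= r2_max

    kept = []
    for sid, chrom, pos, _ in sorted(snps, key=lambda s: s[3]):
        cand = (sid, chrom, pos)
        # Keep a SNP only if it is not in LD with any already-retained SNP.
        if not any(linked(cand, k) for k in kept):
            kept.append(cand)
    return [k[0] for k in kept]
```

In practice the same filtering is done with PLINK's `--clump`, `--clump-r2` and `--clump-kb` options; the sketch only makes the ordering and window logic explicit.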
Prior to imputation, PLINK30 was used to remove samples with >5% missing data. We also excluded SNPs with >1% missing values, a Hardy-Weinberg equilibrium P < 0.001 or a minor allele frequency of <5%. The remaining genotypes were recoded as VCF files using PLINK1.9 (ref. 48) and VCFtools49 before uploading to the Michigan Imputation Server (https://imputationserver.sph.umich.edu/start.html#!pages/home), which uses SHAPEIT50,51 to phase haplotypes and Minimac3 (ref. 52) with the most recent 1000 Genomes reference panel (phase 3, version 5) to impute genotypes. Imputed genotypes were then filtered and recoded with PLINK1.9 (ref. 48), removing samples with >5% missing values. SNPs were removed if they had >2 alleles, were flagged as a fail in the FILTER column (using the flag '--vcf-filter'), or had >1% missing values, a Hardy-Weinberg equilibrium P < 0.001, a minor allele frequency of <5% or <5 observations in any genotype group, in line with the SNP filtering applied to the raw genotype data. This resulted in 5,177,320 variants in the imputed set of genotypes. MatrixEQTL46 was used to test genome-wide mQTLs as previously described, except that only mQTLs with P < 10⁻⁸ were recorded.
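The sample and SNP filters can be sketched as below, on hard-called genotypes coded 0/1/2 with `None` for missing. This is an illustration only, using a hypothetical data layout: PLINK applies these filters through its own options, and PLINK's Hardy-Weinberg filter uses an exact test, whereas this sketch substitutes a simpler 1-df chi-square goodness-of-fit test. The multi-allelic and FILTER-column checks are omitted because they depend on the VCF records themselves.

```python
import math


def hwe_chi2_p(n_hom_ref, n_het, n_hom_alt):
    """Hardy-Weinberg P value from a 1-df chi-square goodness-of-fit test.

    Approximation: PLINK's --hwe filter uses an exact test instead.
    """
    obs = (n_hom_ref, n_het, n_hom_alt)
    n = sum(obs)
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, expected) if e > 0)
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def genotype_qc(geno, max_sample_miss=0.05, max_snp_miss=0.01,
                min_maf=0.05, min_hwe_p=1e-3, min_group=5):
    """Apply sample and SNP filters to hard-called genotypes.

    geno -- dict of sample ID -> list of calls (0/1/2, or None if missing),
            all lists in the same SNP order (hypothetical layout)
    Returns (kept sample IDs, kept SNP indices).
    """
    # Samples first: drop those with >5% missing calls.
    samples = [s for s, calls in geno.items()
               if sum(c is None for c in calls) / len(calls) <= max_sample_miss]
    n_snps = len(next(iter(geno.values())))
    kept = []
    for j in range(n_snps):
        calls = [geno[s][j] for s in samples]
        called = [c for c in calls if c is not None]
        # SNP missingness >1%.
        if not called or 1 - len(called) / len(calls) > max_snp_miss:
            continue
        counts = [called.count(k) for k in (0, 1, 2)]
        alt_freq = (counts[1] + 2 * counts[2]) / (2 * len(called))
        # MAF <5% or fewer than 5 observations in any genotype group.
        if min(alt_freq, 1 - alt_freq) < min_maf or min(counts) < min_group:
            continue
        if hwe_chi2_p(*counts) < min_hwe_p:
            continue
        kept.append(j)
    return samples, kept
```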
Schizophrenia-associated genomic loci were defined as the 105 autosomal regions published as part of the PGC mega-analysis12. Given our definition of cis mQTLs (that is, associations between SNPs and DNA methylation probes within 500 kb), all DNA methylation sites located within 500 kb of these regions were identified, and cis mQTL analysis was repeated with the imputed genotypes using MatrixEQTL46, this time recording all mQTL results. Colocalization analysis was performed as previously described34, using the R coloc package (http://cran.r-project.org/web/packages/coloc), for each DNA methylation site within each region; in total, 19,607 possible mQTLs were tested. For both the PGC schizophrenia GWAS data and our mQTL results, we supplied the regression coefficients, their variances and the SNP minor allele frequencies; the prior probabilities were left at their default values. This method quantifies the support, across the results of the two association analyses, for five hypotheses by calculating posterior probabilities, denoted PPi for hypothesis Hi:
H0: there exist no causal variants for either trait;
H1: there exists a causal variant for one trait only, schizophrenia;
H2: there exists a causal variant for one trait only, DNA methylation;
H3: there exist two distinct causal variants, one for each trait;
H4: there exists a single causal variant common to both traits.
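For intuition, the posterior probabilities PP0 to PP4 can be reproduced from per-SNP approximate Bayes factors, following the approximate Bayes factor approach that coloc implements. The sketch below is a simplified re-implementation, not the coloc package itself: it takes the prior standard deviation of the effect size (`prior_sd`) directly on the scale of the regression coefficients, whereas coloc derives it from the trait type and, for quantitative traits, an estimate of the trait's standard deviation (which is where the minor allele frequencies enter). The default priors p1 = p2 = 10⁻⁴ and p12 = 10⁻⁵ match those documented for `coloc.abf`.

```python
import math


def logsum(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logdiff(a, b):
    """log(exp(a) - exp(b)), assuming a >= b."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def labf(beta, var_beta, prior_sd):
    """Wakefield log approximate Bayes factor for a single SNP."""
    z2 = beta ** 2 / var_beta
    r = prior_sd ** 2 / (prior_sd ** 2 + var_beta)
    return 0.5 * (math.log(1 - r) + r * z2)


def coloc_pp(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """PP0..PP4 from per-SNP log ABFs of trait 1 (GWAS) and trait 2 (mQTL)."""
    s1, s2 = logsum(l1), logsum(l2)
    s12 = logsum([a + b for a, b in zip(l1, l2)])
    lh = [0.0,                                    # H0: no causal variant
          math.log(p1) + s1,                      # H1: trait 1 only
          math.log(p2) + s2,                      # H2: trait 2 only
          math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),  # H3: two variants
          math.log(p12) + s12]                    # H4: one shared variant
    total = logsum(lh)
    return [math.exp(x - total) for x in lh]
```

When both traits peak at the same SNP, PP4 dominates; when they peak at different SNPs in the region, PP3 does.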
A Supplementary Methods Checklist is available.
Gene Expression Omnibus
We thank M. Weale for providing eQTL data from the BRAINEAC database. This work was supported by grants from the UK Medical Research Council (MRC; MR/K013807/1 to J.M. and MR/L010674/1 to N.J.B.) and the US National Institutes of Health (AG036039) to J.M. R.P. and H.S. were funded by MRC PhD studentships. The human embryonic and fetal material was provided by the Joint MRC/Wellcome Trust (grant #099175/Z/12/Z) Human Developmental Biology Resource.
Integrated supplementary information
R code for analyses and figures.