Emerging evidence suggests that schizophrenia (SZ) susceptibility involves variation at genetic, epigenetic and transcriptome levels. We describe an integrated approach that leverages DNA methylation and gene expression data to prioritize genetic variation involved in disease. DNA methylation levels were obtained from whole blood of 260 SZ patients and 250 unaffected controls; gene expression data were available for a subset of these subjects. By assessing DNA methylation and gene expression in cases and controls, we identified 432 CpG sites with differential methylation levels that are associated with differential gene expression. We hypothesized that genetic factors involved in these methylation levels may be associated with the genetic risk of SZ susceptibility. To test this hypothesis, we used results from the Psychiatric Genomics Consortium SZ genome-wide association study (GWAS). We observe an enrichment of SZ-associated SNPs among the methylation QTLs (mQTLs) whose associated CpG site is also correlated with differential gene expression in SZ. While this enrichment was already apparent when using nominally significant thresholds, enrichment was even more pronounced when applying more stringent significance levels. One locus, previously identified as a susceptibility locus in a SZ GWAS, involves SNP rs11191514:C>T, which regulates DNA methylation of calcium homeostasis modulator 1 that is also associated with differential gene expression in patients. Overall, our results suggest that epigenetic variation plays an important role in SZ susceptibility and that the integration of analyses of genetic, epigenetic and gene expression profiles may be a biologically meaningful approach for identifying disease susceptibility loci, even when using whole blood data in studies of brain-related disorders.
Although a number of susceptibility loci for psychiatric disorders have been identified through genome-wide association studies (GWAS), the vast majority of the estimated heritability of these traits remains unexplained. Schizophrenia (SZ), for example, is a common polygenic mental disorder affecting about 1% of the population and has an estimated heritability of 70–80%,1, 2, 3 but only a small fraction of the heritability can be attributed to known susceptibility loci.4, 5 Very large numbers of samples are needed to comprehensively identify the hundreds or possibly thousands of genetic loci involved in SZ susceptibility.6 It has been suggested that epigenetic variation might be partly responsible for the missing heritability7, 8 and be involved in phenotypic variation.9, 10, 11 Recent evidence shows that SZ susceptibility loci are enriched for gene expression QTLs,12 and a similar enrichment exists for QTLs affecting DNA methylation and expression in bipolar disorder.13 Given these results, we hypothesized that intersecting disease-related gene expression data with disease-related methylation data might lead to identification of genetic susceptibility loci. To test this hypothesis, we combined available whole blood DNA methylation and gene expression data of SZ patients and healthy controls. Using the results of the Psychiatric Genomics Consortium (PGC) SZ mega GWAS,14 we examined whether methylation QTLs (mQTLs) are enriched for disease susceptibility loci. While previous studies examined gene expression and DNA methylation separately and in different ways,12, 13 we considered both DNA methylation and gene expression data simultaneously in the same SZ cases and controls. Previous studies investigating enrichment of QTL signals (involving either gene expression or DNA methylation) focused on SZ-related SNPs with genome-wide significant evidence of association. 
We combined epigenome and transcriptome levels of information and did not restrict ourselves to only those loci with prior evidence of genome-wide association. Our results show that the combined use of gene expression and methylation data outperforms either data modality alone when it comes to identifying disease susceptibility loci, even in a sample that is small compared with those used in GWAS.
Subjects and methods
An outline of the approach is provided in Figure 1. Genome-wide genotype data and DNA methylation data were obtained from whole blood of 260 SZ patients and 250 control subjects; for a subset of 120 cases and 120 controls, whole blood gene expression data were available as well. A detailed description of the samples, procedures and quality control steps is provided in the Supplementary Material. The data are available via NCBI/GEO under accession numbers GSE41037 and GSE38484. First, we examined differential methylation levels (after quantile normalization) between cases and controls. DNA methylation levels with low variability (a Beta range <0.1) were excluded to focus on CpGs with reasonably large biological variation. We used a linear model (the limma package in R15) to regress methylation values on disease status, gender and age.16 FDR correction at the 5% level was applied to correct for multiple testing.
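The variability filter and the multiple-testing step described above can be sketched in a few lines of Python. This is a minimal illustration and not the limma-based pipeline used in the study: it assumes that the P-values have already been obtained from the per-CpG regression, and it assumes that the FDR correction is the Benjamini-Hochberg procedure, which the text does not state explicitly. The function names and the probe identifiers in the usage note are hypothetical.

```python
from typing import Dict, List, Sequence


def beta_range(values: Sequence[float]) -> float:
    """Range (max - min) of the methylation Beta values of one CpG across samples."""
    return max(values) - min(values)


def filter_low_variability(
    betas: Dict[str, Sequence[float]], min_range: float = 0.1
) -> Dict[str, Sequence[float]]:
    """Keep only CpGs with a Beta range of at least min_range (drops Beta range < 0.1)."""
    return {cpg: v for cpg, v in betas.items() if beta_range(v) >= min_range}


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, for each input P-value, whether it is rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg step-up procedure (the text only says 'FDR').
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected
```

For example, `filter_low_variability({"cg_a": [0.10, 0.15, 0.12], "cg_b": [0.2, 0.5, 0.35]})` keeps only the hypothetical probe `cg_b`, whose Beta range is 0.3.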
Next, for methylation levels that differed between cases and controls (referred to as differentially methylated loci, abbreviated as DMs), association with gene expression was investigated using a genome-wide DNA methylation–gene expression association study as described previously.17 We focused on CpG sites with cis effects on expression within a 500 kb interval. Locus-specific correction for multiple testing was applied for the number of expression probes within the cis region of each methylation probe (ie, 0.05 divided by the number of expression probes per cis area). This relatively lenient significance threshold was chosen to ensure a sufficient number of probes for subsequent filtering steps and analysis. The subset of DMs associated with gene expression, as determined in control samples, is referred to as DMEs. A subset of the DMEs is not only associated with gene expression levels in general but is also associated with transcript levels of genes that are differentially expressed between SZ patients and controls, as described previously.18 In short, we conducted linear regression to detect differences in expression levels between cases and controls after FDR correction at the 5% level, with age and gender as covariates.18 The DMEs associated with differential gene expression in SZ are referred to as Differential Methylation regions with Differential gene Expression (DMDEs).
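The locus-specific threshold amounts to a Bonferroni correction within each cis region. The sketch below illustrates the arithmetic only. It assumes that the cis region extends 500 kb to either side of the CpG site (the text does not specify whether the interval is one-sided or two-sided), and the probe names and coordinates in the usage note are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

CIS_WINDOW_BP = 500_000  # assumption: +/- 500 kb around the CpG site


def cis_expression_probes(
    cpg_chrom: str,
    cpg_pos: int,
    expr_probes: Dict[str, Tuple[str, int]],
    window_bp: int = CIS_WINDOW_BP,
) -> List[str]:
    """Names of expression probes on the same chromosome within window_bp of the CpG."""
    return sorted(
        name
        for name, (chrom, pos) in expr_probes.items()
        if chrom == cpg_chrom and abs(pos - cpg_pos) <= window_bp
    )


def locus_threshold(n_probes: int, alpha: float = 0.05) -> Optional[float]:
    """Locus-specific significance threshold: alpha divided by the number of cis probes.

    Returns None when the cis region contains no expression probe to test.
    """
    if n_probes == 0:
        return None
    return alpha / n_probes
```

With two expression probes in the cis region of a CpG site, `locus_threshold(2)` gives a threshold of 0.025 for that locus, whereas a CpG site with ten cis probes would be tested at 0.005.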
To identify mQTLs, we used a multivariate linear regression model for regressing all methylation values (dependent variable) on the SNPs (independent variable) with disease status, age and gender as covariates using PLINK.19 We defined a SNP as cis-acting if it was significantly associated (P<5.0e−08) with a methylation probe within 500 kb. Next, four groups of SNPs with a minimum minor allele frequency of 0.05 were generated with increasing functional relevance with regard to DNA methylation and gene expression: (i) SNPs representing all mQTLs regardless of differential methylation between patients and controls; (ii) SNPs from the previous step that represent mQTLs with differential methylation between cases and controls (DMs); (iii) SNPs from the previous step for which the associated methylation level is associated with gene expression (DMEs); and finally (iv) SNPs selected from the previous step that are associated with differential methylation that is associated with differential gene expression between patients and controls (DMDEs). After linkage disequilibrium (LD) pruning of the SNPs at an r2 threshold of 0.2 using PLINK, to exclude possible signal bias and enrich for independent genetic signal, we extracted the SNPs from the PGC mega GWAS results together with their association signal with SZ.
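The four nested SNP groups can be illustrated with the following sketch, which starts from a list of SNP-CpG association results. It is a simplified reading of the procedure and not the PLINK workflow itself: the `MQTL` record, its field names and the group labels are hypothetical, the three CpG sets are assumed to have been derived in the earlier steps, and LD pruning is not included.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class MQTL:
    """One SNP-CpG association result (hypothetical record layout)."""
    snp: str
    cpg: str
    p_value: float
    distance_bp: int  # signed distance between SNP and methylation probe
    maf: float        # minor allele frequency of the SNP


def snp_groups(
    mqtls: Iterable[MQTL],
    dm_cpgs: Set[str],
    dme_cpgs: Set[str],
    dmde_cpgs: Set[str],
    p_threshold: float = 5.0e-08,
    cis_window_bp: int = 500_000,
    min_maf: float = 0.05,
) -> Dict[str, Set[str]]:
    """Assign cis-mQTL SNPs to four nested groups of increasing functional relevance."""
    groups: Dict[str, Set[str]] = {"all_mQTL": set(), "DM": set(), "DME": set(), "DMDE": set()}
    for m in mqtls:
        # cis-mQTL definition: significant, within the cis window, common variant.
        if not (m.p_value < p_threshold and abs(m.distance_bp) <= cis_window_bp and m.maf >= min_maf):
            continue
        groups["all_mQTL"].add(m.snp)            # (i) all mQTLs
        if m.cpg in dm_cpgs:
            groups["DM"].add(m.snp)              # (ii) differentially methylated CpG
            if m.cpg in dme_cpgs:
                groups["DME"].add(m.snp)         # (iii) CpG associated with expression
                if m.cpg in dmde_cpgs:
                    groups["DMDE"].add(m.snp)    # (iv) expression also differs in SZ
    return groups
```

The nested conditions make each group a subset of the previous one by construction, which mirrors the "SNPs from the previous step" wording of the four definitions.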
Based on the significance values observed in the PGC SZ GWAS, we categorized the four different SNP lists and examined the observed/expected ratios for different thresholds ranging from P<0.5 to P<1.0e−04.
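One way to compute such observed/expected ratios is sketched below. The sketch assumes that the expected number of SNPs below a threshold is the number of SNPs in the list multiplied by the threshold, that is, a uniform distribution of P-values under the null hypothesis; the text does not specify how the expected counts were derived, and they may instead have been taken from a reference SNP set. The intermediate thresholds are illustrative, since only the two endpoints are given.

```python
from typing import Dict, Sequence

# Illustrative grid: only the endpoints 0.5 and 1.0e-04 are stated in the text.
THRESHOLDS = (0.5, 0.1, 0.05, 0.01, 0.001, 1.0e-04)


def observed_expected(
    gwas_pvalues: Sequence[float], thresholds: Sequence[float] = THRESHOLDS
) -> Dict[float, float]:
    """Observed/expected ratio of SNPs with a GWAS P-value below each threshold.

    Assumption: under the null, P-values are uniform, so the expected count
    below threshold t is len(gwas_pvalues) * t.
    """
    n = len(gwas_pvalues)
    if n == 0:
        raise ValueError("at least one GWAS P-value is required")
    ratios: Dict[float, float] = {}
    for t in thresholds:
        observed = sum(1 for p in gwas_pvalues if p < t)
        expected = n * t
        ratios[t] = observed / expected
    return ratios
```

A ratio close to 1 at a given threshold indicates no enrichment for that SNP list, while ratios that grow as the threshold becomes more stringent correspond to the pattern of enrichment examined here.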
Results
We examined whether genetic factors involved in differential methylation and gene expression in SZ are enriched for SZ susceptibility alleles. The results of the steps are shown in Figure 1. In total, 11 320 CpGs were differentially methylated in SZ cases versus controls (5% FDR correction). Of these, 1095 CpGs are associated with 1226 transcripts in cis (after locus-specific correction). The 1226 transcripts were examined for differential expression levels between patients and controls based on data from a previous study of our group.18 This step resulted in the identification of 391 transcripts (after 5% FDR correction); these 391 transcripts are associated with 432 CpG sites. Information about the differentially methylated probes is shown in Supplementary Table 1.
We subsequently calculated the mQTLs associated with CpGs in cis, and SNPs with a P-value<5.0e−08 were grouped by the biological relevance of the associated CpG site (see Subjects and methods). Figure 1 shows the number of associated CpGs and SNPs, and the statistics of the associations are provided in Supplementary Table 2.
After calculating the observed/expected ratio of the mQTLs per PGC GWAS P-value threshold, our results show a significant pattern of enrichment of PGC SZ association signal by adding DNA methylation and gene expression data (Figure 2). The strongest enrichment was observed for SNPs associated with DMDEs (P=0.0041 with OR=22 and 95%CI=2.60–83.7; Fisher’s exact test).
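For readers unfamiliar with the test, a two-sided Fisher's exact test on a 2x2 table can be sketched as follows. The table layout (SNPs in the group versus the comparison set, below versus above a GWAS P-value threshold) is an assumption, and the underlying counts are not given in the text, so the numbers in the usage note are purely illustrative. The function returns the sample odds ratio, which can differ from the conditional maximum-likelihood estimate reported by some statistical packages.

```python
from math import comb
from typing import Tuple


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns (sample odds ratio, two-sided P-value). The P-value sums the
    hypergeometric probabilities of all tables with the same margins that
    are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, min(1.0, p_value)
```

As a small worked case, `fisher_exact_2x2(3, 1, 1, 3)` gives a sample odds ratio of 9 and a two-sided P-value of 34/70, approximately 0.486.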
We identified one DMDE locus on chromosome 10 with genome-wide significant evidence of association with SZ. SNP rs11191514:C>T (chr10.hg19:c.104773364C>T) at this locus represents an mQTL associated with differential DNA methylation between cases and controls that is correlated with the expression of a gene that is also differentially expressed between cases and controls (Figures 3a–c). In addition, this SNP is in perfect LD with rs11191580:C>T (chr10.hg19:t.104906211C>T), a SNP with significant evidence of association with SZ according to the PGC study (PGC P=2.2e−08). SNP rs11191514:C>T is also strongly associated with SZ (PGC P=8.7e−08) and likely regulates DNA methylation in our data set (P=1.4e−14, Figure 3d). SNP rs11191514:C>T is located in CNNM2, while the methylation probe associated with this SNP is located at the gene Calcium Homeostasis Modulator 1 (CALHM1), some 630 kb apart (Supplementary Figure 1). CALHM1 regulates Ca2+ concentrations20 and is highly expressed in the hippocampus,20 which is implicated in SZ.21 Recently, a GWAS identified this whole locus, including the genes CALHM1 and CNNM2, as a susceptibility locus for SZ (P=3.7e−13).22
In addition to rs11191514:C>T, we identified four additional independent DMDE SNPs associated with SZ with a significance level of P<0.01 (Supplementary Table 3), that is, these SNPs are at best suggestive of a susceptibility locus for SZ.
These additional loci highlighted in our study include proline-rich transmembrane protein 1 (PRRT1), HLA-C and MRPL41. The CpG sites of these genes are all differentially methylated in patients, and are associated with genotype as well as with transcripts that are differentially expressed in patients. PRRT1 and HLA-C are both located within the MHC region on the short arm of chromosome 6, the region with the most significant association with SZ.14, 23 Little is known about the function of PRRT1, and this finding suggests that further study of its involvement in SZ is warranted. HLA-C belongs to the HLA class I molecules, which have a central role in the immune system, and is located within the chromosomal region first implicated in SZ.11 Finally, MRPL41 is a mitochondrial ribosomal protein and has an important role in cell growth suppression in association with p53 and p27Kip1.24 It is not clear how this ribosomal protein may be involved in the etiology of SZ. As the MHC region is overrepresented in the PGC SZ GWAS findings, we were concerned about a possible bias in our analysis. However, when we performed the same analyses without chromosome 6, we still observed an enrichment of genetic signal associated with SZ (P=0.0037 with OR=23 and 95%CI=2.8–89.2; Fisher’s exact test). This suggests that the enrichment observed when using gene expression and DNA methylation data from whole blood does not depend on the MHC region.
Discussion
We investigated the enrichment of mQTLs for SZ susceptibility alleles using genotype and DNA methylation data obtained from whole blood of >500 SZ cases and controls, for a subset of whom gene expression data were available. We detected over 10 000 CpG sites that are differentially methylated in SZ patients. A subset of these sites (n=1095) was associated with gene expression, with 50% of these transcripts also showing differential gene expression in SZ. In addition, SNPs associated with differential methylation levels are enriched for SZ susceptibility alleles. This enrichment was even stronger if the differential methylation levels associated with these SNPs were also associated with differential expression levels. While this enrichment was already apparent when using nominally significant thresholds, enrichment was even more pronounced when applying more stringent significance levels. Previous studies have shown that top GWAS findings are enriched for mQTLs and eQTLs in bipolar disorder13 and for eQTLs in SZ.12 In accordance with these findings, we show here that mQTLs are enriched for SZ SNPs. In addition, we used different layers of genomic information to identify mQTLs associated with CpGs of functional relevance and found increasing enrichment. Our findings show that genetic variation affecting DNA methylation that is associated with gene expression has an important role in SZ susceptibility.
As SZ is primarily a brain-related disease, a limiting factor of our study may be the use of whole blood. However, our findings show that even when using blood, we observe enrichment of SZ-associated alleles, indicating that blood might be a reasonable surrogate tissue for our approach. Replication of these analyses in brain tissue would be useful to understand the extent of enrichment of disease-specific signal when combining different genomic layers for prioritizing genomic loci. Another potential extension of this study would be to obtain allele-specific methylation and gene expression information, which could provide more insight into the precise mechanism of cis-acting regulatory SNPs and their effects on methylation and gene expression. Finally, the relationship between genetic variation, DNA methylation, gene expression and disease susceptibility is complex and warrants further study.
In summary, we identified biologically plausible SZ susceptibility loci in whole blood in a relatively small sample of <600 subjects. We demonstrate that enrichment of genetic data using different layers of genomic information may be an efficient approach to identify disease susceptibility loci for neuropsychiatric traits. While our results are supportive of an important role of epigenetic regulation in SZ, we expect that this integrated approach based on blood DNA methylation and gene expression data from the same subjects may help prioritize SNPs from other GWAS as well.
Acknowledgements
We thank the patients and controls for participating in the study. This study was supported by National Institutes of Health (NIH) grants R01 DA028526, R01 NS058980 and R01 MH078075 (to RAO). We thank Dr Kim Staats for critical reading of the manuscript.