DNA Methylation Signatures of a Large Cohort Monozygotic Twins Clinically Discordant for Multiple Sclerosis

Multiple sclerosis (MS) is an inflammatory demyelinating disease of the central nervous system with a modest concordance rate in monozygotic twins that strongly argues for involvement of epigenetic factors. We observe in 45 MS discordant monozygotic twins highly similar peripheral blood mononuclear cell-based methylomes. However, a few MS-associated differentially methylated positions (DMP) were identified and validated, including a region in the TMEM232 promoter and ZBTB16 enhancer. In CD4+ T cells we observed an MS-associated differentially methylated region in FIRRE. In addition, many regions showed large methylation differences in individual pairs, but were not clearly associated with MS. Furthermore, epigenetic biomarkers for current interferon-beta treatment were identified, and extensive validation revealed the ZBTB16 DMP as a signature of prior glucocorticoid treatment. Altogether, our study represents an important reference for epigenomic MS studies. It identifies new candidate epigenetic markers, highlights treatment effects and genetic background as major confounders, and argues against some previously reported MS-associated epigenetic candidates.

Multiple sclerosis (MS) is one of the leading causes of neurological disability in young adults 1 , and is considered to be an autoimmune disease, characterized by chronic inflammatory demyelination of neurons in the central nervous system 2 . Although nuclear genetic factors contribute to the development of MS 3 , the maximum reported concordance rate for MS in monozygotic (MZ) twins is only 25% 4,5 , which indicates that for the development of clinical symptoms interaction with other risk factors remains compulsory. Despite of various studies suggesting that mitochondrial DNA variants are plausible susceptibility factors for MS, we recently showed that mitochondrial DNA variation (e.g. skewed heteroplasmy) does not play a major role in the discordant clinical manifestation of MS in MZ twins 6 .
Another source of molecular variation that can cause discordant phenotypes within MZ twins are DNA methylation differences [7][8][9][10] . DNA methylation is an epigenetic modification that in mammals almost exclusively occurs at cytosines within CpG dinucleotides. Since changes in CpG methylation can cause transcriptional alterations, DNA methylation is crucial for normal development and aberrant DNA methylation has been observed in various human diseases 11,12 .
Discordant DNA methylation profiles within MZ twins have quite frequently been reported at imprinted regions 7,8 , which are characterized by parent-of-origin-specific methylation patterns resulting in mono-allelic expression. Since a number of studies reported a maternal parent-oforigin effect in MS susceptibility 13,14 , and several imprinted genes have been linked to immune system development and functioning (reviewed by Ruhrmann et al. 15 ), genomic imprinting errors might be involved in the pathogenesis of MS 15 . In addition, environmental risk factors such as smoking, history of symptomatic Epstein-Barr virus infection and vitamin D deficiency have been associated with an increased MS risk [16][17][18] . Although the molecular mechanisms underlying these associations remain unknown, it is possible that these environmental factors operate by inducing DNA methylation changes.
Thus far, several epigenome-wide association studies (EWAS) for MS have been carried out [19][20][21][22][23][24][25] , and a number of differentially methylated CpG positions (DMPs) have been reported, including DMPs in the HLA-DRB1 locus. However, even though these studies used the same array platform (i.e. Infinium HumanMethylation450 (450K) array), the reported results are inconsistent. Since these studies used genetically unmatched cases and controls, they are potentially hampered by DNA sequence variation. As genetic factors predispose to MS, these studies cannot dissect whether MS is due to genetic or epigenetic susceptibility. In addition, SNP-containing probes give rise to biased DNA methylation measurements 26 , and DNA methylation changes are also often the result of cis-or trans-acting genetic variants (methylation quantitative trait loci or mQTLs) 27 . Fortunately, an MZ twin-based design controls for these genetic differences as well as for other factors (potentially) affecting the methylome, including gender, age and a broad range of environmental factors. Thus far, one EWAS in MS discordant MZ twins has been reported, in which no DNA methylation differences were identified 28 . Since this study included only three twin pairs, further studies in larger cohorts are required. Hence, here we describe an EWAS comprising a unique cohort of 45 MZ twins clinically discordant for MS, in which we aim to identify DNA methylation changes associated with the clinical manifestation of MS and to study the effect of MS treatments on the DNA methylome (Fig. 1).

Peripheral blood mononuclear cell-based methylomes
For this study peripheral blood mononuclear cells (PBMCs) of 46 MZ twins clinically discordant for MS were accessible and genome-wide DNA methylation profiles were established using Illumina Infinium MethylationEPIC BeadChips (EPIC arrays). After quality control and filtering, methylation data of 849,832 sites were available for 45 MZ twin pairs. As expected, these twin pairs showed a very high mean within-pair array-wide correlation coefficient of 0.995, indicating high quality data. General demographic and clinical characteristics of these 45 MS discordant MZ twins at study entry are shown in Table 1.

Identification and confirmation of MS-associated DMPs in TMEM232 and ZBTB16
To identify DMPs associated with the clinical manifestation of MS (MS-DMPs), first a pair-wise analysis was carried out on the EPIC array data of the 45 pairs without adjusting for cell type composition (Supplementary Fig. 1a and 1b). The Q-Q plot in Supplementary Fig. 1b shows that the obtained P-values clearly deviate from the null expectation. This inflation of P-values was eliminated after adjusting for cell type composition ( Fig. 2a and 2b), indicating that many differences are due to variation in cellular composition. Fig. 2a and Supplementary Fig. 1a Fig. 2). However, these large within-pair methylation differences were not confirmed by validation using targeted deep bisulfite sequencing (TDBS) (Supplementary Fig. 3). This indicates that some EPIC probes are prone to technical artefacts, as reported by others 29 , and that validation using independent assays is required.

The volcano plots in
The unadjusted analysis revealed 39 MS-DMPs with a suggestive P-value<5*10 -6 and after correcting for multiple testing six MS-DMPs remained genome-wide significant (false discovery rate (FDR)<0.05) (Supplementary Fig. 1a). After adjusting for cell-type composition, no MS-DMP had an FDR<0.05, but five MS-DMPs had a suggestive P-value<5*10 -6 ( Fig. 2a and Table   2). One of these MS-DMPs is located in the promoter of the TMEM232 gene (cg27037608, mean Δ β -value=0.024), encoding for a transmembrane protein of unknown function. However, genetic variants in this gene have been associated with atopic dermatitis and allergic rhinitis in genome-wide association studies (GWAS) [30][31][32] . For this MS-DMP EPIC array data of 12 neighbouring CpGs were also available, which all showed a similar effect and eight CpGs had a P<0.01 (Fig. 2c) Table 2).
Next, a functional annotation analysis carried out with all MS-DMPs having a P<0.001 in the analysis adjusted for cell-type composition (698 in total) revealed TMEM232 as a prime candidate marker for MS. All other annotation categories were not significant.
Based on their significance, effect size and/or whether neighbouring probes were also differentially methylated, the TMEM232 (cg27037608) and ZBTB16 (cg25345365) MS-DMPs were selected for validation using TDBS. The TDBS data correlated highly with the array data (r TMEM232 =0.84 and r ZBTB16 =0.89, Supplementary Fig. 4a and 5a). In addition, both the individual MS-DMPs as well as the surrounding CpGs in the TMEM232 and ZBTB16 loci were significantly differentially methylated between the clinically non-affected and MS-affected co-twins (Supplementary Fig. 4b and 5b). This confirms that these MS-DMPs represent true effects in our twin cohort.

Whole genome bisulfite sequencing identifies MS-DMR in FIRRE
To identify additional MS-associated differentially methylated regions (MS-DMRs), we performed a pilot study using whole genome bisulfite sequencing (WBGS) in a subset of four MS discordant female twin pairs. Since raw PBMC-based methylomes are hampered by cell type composition differences and need adjustment, we decided to perform WGBS on CD4+ memory T cells, which have been implicated in the pathogenesis of MS 38 . First global DNA methylation changes in the WGBS data were evaluated by identifying and comparing partially methylated domains (PMD), fully methylated regions (FMRs), low methylated regions (LMRs) and unmethylated regions (UMRs). However, no global changes were observed between the non-affected and MS-affected MZ co-twins (P>0.05) (Supplementary Fig. 6 and 7).
Next, a DMR analysis was carried out and MS-DMRs were defined as ≥ 3 CpGs (max. distance 500bp), each having P<0.05 and absolute mean methylation difference >0.2. This analysis revealed a prominent MS-DMR located in an intronic CTCF/YY1 bound regulatory region in the FIRRE gene, that is located on the X-chromosome (chrX:130863481-130863509) and encodes a circular long non-coding RNA (Supplementary Fig. 8) 39 . The CpGs within this MS-DMR are not covered by the EPIC array, but a probe (cg08117231) located close to this DMR was not significant in the PBMC-based EWAS (P>0.05). The TMEM232 and ZBTB16 loci did not fulfill our filtering criterion of ≥ 10 reads coverage across all samples (note that they were validated by TDBS).

Within-pair differentially methylated regions are common among MZ twins
The EPIC array-based association analysis enables to identify relatively small DNA methylation differences present in a large number of cases. However, since MS is a heterogeneous disease, DNA methylation changes present in only a few cases might also have biological consequences.
We therefore performed a DMR analysis in the individual twin pairs to identify so-called within-  Fig. 9-12). Of the 45 WP-DMRs, 43 were solitary pair-specific and only two were found in two independent pairs. These recurrent WP-DMRs were located in the ISOC2 and the HIST1H3E promoter ( Supplementary Fig. 9 40 . Two of the pair-specific WP-DMRs were located in reported imprinted regions 41 , including SVOPL and HM13/MCTS2P (Supplementary Fig. 10), but in both cases the non-affected co-twin showed an abnormal methylation pattern. Furthermore, in one pair four WP-DMRs were identified in the protocadherin gamma (PCDHG) gene cluster due to hypermethylation in the MS-affected co-twin (Supplementary Fig. 11). In another pair a WP-DMR was observed in the promoter of the non-clustered protocadherin 10 (PCDH10) gene, also due to hypermethylation in the MS-affected co-twin (Supplementary Fig. 12). Protocadherins are highly expressed in the brain and are involved in neuronal development 42 .

Interferon treatment induces robust DNA methylation changes
Besides identifying methylation changes associated with the clinical manifestation of MS, our MZ twin design also enables to identify MS treatment-related DNA methylation changes. Since in our cohort interferon-beta (IFN) is the most common disease modifying treatment, we performed a pair-wise analysis only including the EPIC array data of the 12 pairs of which the MS-affected co-twins were treated with IFN at the moment of blood collection. We observe that the mean within-pair β -value differences (Δβ-values) for this sub-cohort are larger (Fig. 3a), since 257 DMPs with an absolute mean Δ β -value>0.05 and P<0.001 were identified. None of the MS- Table 1 were among these 257 IFN-associated DMPs (IFN-DMPs). The 257 IFN-DMPs were annotated to 212 genes, of which 124 genes (58%) overlap with IFN-regulated genes recorded in the INTERFEROME database (accessed May 2018), which is based on gene expression data 43 . In addition, a functional annotation analysis revealed a clear enrichment for genes involved in the antiviral defence and interferon homeostasis (Fig. 3b). Moreover, seven IFN-DMPs had an absolute mean Δ β -value>0.10 and P<0.001, due to strong hypomethylation in the IFN-treated MS-affected co-twins (Supplementary Table 2 and Supplementary Fig. 13).

DMPs listed in
These seven DMPs were located in RSAD2 (n=3), MX1 (n=2), IFI44L (n=1) and PLSCR1 (n=1), i.e. genes that have been reported to be up-regulated in blood and PBMCs of MS patients following IFN treatment [43][44][45] . Adjusting the data for cell-type composition resulted only in a slight attenuation of the IFN-effect (Supplementary Table 2), indicating that these seven DMPs are robust markers for monitoring IFN treatment effects in PBMCs.

Glucocorticoid (GC) treatment induces hypomethylation at ZBTB16 enhancer DMP
Among the MS-DMPs, the technically replicated ZBTB16 DMP (cg25345365) had the largest effect size (~4%). This MS-DMP is located in an enhancer in intron 3 of ZBTB16, which encodes for a transcription factor also known as promyelocytic leukemia zinc finger (PLZF).
ZBTB16/PLZF has been reported to be essential for NKT cell development 36 , and there is evidence that ZBTB16/PLZF contributes to T helper 17 (Th17) cell differentiation and phenotype maintenance 46 . However, ZBTB16 has also been identified as a major glucocorticoid (GC) response gene being highly upregulated after GC exposure 47 , and several days of high-dose intravenous GC therapy is generally used to treat relapses in MS.
None of the MS-affected co-twins included in the array-based EWAS received GCs within three months prior blood collection, but 43 out of the 45 MS-affected co-twins have a GC treatment history, of which 14 received GCs within >3-12 months prior blood collection (minimal 3x1g and 6g on average). In those 14 pairs, the within-pair methylation differences at the ZBTB16 DMP are significantly larger (more negative) compared to the pairs of which the MS-affected co-twin received GCs longer than 12 months ago (P=0.0004, Fig. 4a and Supplementary Fig. 14). This indicates that the strong association between the ZBTB16 DMP and the MS phenotype is due to the GC treatment history in the MS-affected co-twins.

GC-induced methylation changes are not widespread in GC-response genes
Then, the effect of recent GC treatment history on the PBMC methylomes was analysed by performing a pair-wise analysis including only the EPIC array data of 14 pairs of which the MSaffected co-twin received GCs within >3-12 months prior blood collection (Fig. 4b).
In total, 320 potential GC-DMPs had an absolute mean  Table 1 were not among the 320 GC-DMPs.
To study acute effects of GC treatment, a WGBS analysis was performed on CD4+ memory T cells of an MS discordant MZ twin pair of which the MS-affected co-twin was treated with GCs at the moment of blood collection. As shown in  Table 3), which represent potential GC-treatment epigenetic biomarkers. The majority of these 41 GC-DMRs were hypomethylated and the corresponding GC-response gene was recorded as upregulated due to GC treatment (Fig. 4d).

ZBTB16 enhancer DMP methylation correlates with global hypermethylation
Global DNA methylation levels of the PBMC-derived samples were assessed by TDBS of the repetitive elements Alu, human endogenous retrovirus type K (HERVK) and long interspersed nuclear element-1 (LINE1). Although Alu methylation correlated significantly with LINE1 methylation (r=0.43, P=0.003), Alu, HERVK and LINE1 methylation levels did not differ between the clinically non-affected and MS-affected co-twins (P>0.05). Alu and HERVK methylation were affected by cell-type composition differences, but the association also remained non-significant after adjusting for cell-type composition.
The volcano plots of the EPIC array data in Supplementary Fig 1a and

No evidence for within-pair copy number variations
Finally, discordant phenotypes within MZ twins can also be the result of within-pair copy number variations (WP-CNVs) 48 , therefore we checked the EPIC array data for CNVs. However, our analysis did not reveal evidence for chromosomal gains and losses within the MZ twin pairs. To illustrate, examples of copy number plots generated from the EPIC array data are presented in Supplementary Fig. 16.
The most prominent MS-DMP in our EWAS was the technically replicated cg25345365 DMP in the transcription factor ZBTB16, which is also a GC-response gene that becomes highly upregulated after GC exposure 47 . Although nobody received GCs within three months prior blood collection, since 95% of the MS-affected co-twins have a GC treatment history and the healthy co-twins do not, GC treatment is a serious confounder. Our extensive analysis shows that the strong association between hypomethylation at the ZBTB16 DMP and MS in our EWAS is due to the GC treatment history, which indicates that the generally used inclusion criterion of  Supplementary Fig. 17), which are believed to be sufficient for binding of the GC-GCR complex 49 . Although the precise demethylation mechanism remains unknown, data of Wiench et al. 50  However, variants in this gene have been associated with atopic dermatitis, allergic rhinitis and asthma in GWAS [30][31][32] , which might point towards a common immunologic pathway involving TMEM232. However, besides being biologically plausible, robust evidence that supports an association between atopic diseases and MS is thus far lacking 54,55 . At this point, further studies are needed to verify the association between the TMEM232 DMR and MS.
We also carried out a WGBS analysis on CD4+ memory T cells of four MS discordant MZ twin pairs. This pilot did not reveal widespread global or site-specific methylation differences between the MS-affected and non-affected co-twins. Nevertheless, one potential MS-DMR was identified in an intronic regulatory region in the X-linked FIRRE gene (Supplementary Fig. 13), which encodes for a circular long non-coding RNA that has been reported to be involved in positioning the inactive X-chromosome to the nucleolus and to maintain histone H3K27me3 methylation 39,56 .
Although our results are preliminary, MS is more common in women 3 and a role of X-inactivation in the pathogenesis of MS has been proposed (reviewed by Brooks et al. 57 ), therefore this DMR represents a possible candidate.
Our array-based EWAS allows us to detect small DNA methylation differences that are present in many patients, but MS is a heterogeneous disease and DNA methylation changes present in a few patients only might also contribute. In order to identify such rare methylation differences, we carried out a WP-DMR analysis using the EPIC array data. In total, 45 WP-DMRs were identified in 17 twin pairs, which indicates that WP-DMRs are quite common among MZ twins.
Two WP-DMRs were present in two pairs, but not associated with the MS phenotype. The remaining 43 WP-DMRs were pair-specific, of which 27 showed an abnormal methylation level in the MS-affected co-twin. These WP-DMRs have not previously been associated with MS, although two WP-DMRs were related to genes encoding protocadherins that are involved in neuronal development 42 . Hence, a contribution of these WP-DMRs to the discordant phenotype cannot be excluded, but since they are pair-specific, these results should be interpreted very cautiously.
Due to the observed maternal parent-of-origin effect in MS and the reported involvement of several imprinted genes in the immune system, a role of genomic imprinting in the aetiology of MS has been suggested 15  However, the unbalanced volcano plots in Fig. 2a and Supplementary Fig. 1a 61 . This might indicate that for detecting robust MS-DMPs it is required to immediately profile rare subpopulations such as Th1, Th17 and regulatory T cells, or to asses immune cells in the cerebrospinal fluid.
In conclusion, our EWAS shows that DNA methylation profiles in PBMCs of MZ twins clinically discordant for MS are overall very similar, and no evidence was found that genomic imprinting errors or CNVs explain discordance of MS in MZ twins. However, a DMR in the TMEM232 promoter was identified as a candidate loci associated with the clinical manifestation of MS. In addition, epigenetic biomarkers for MS treatments were identified, revealing that besides shortterm also medium-term treatment effects are detectable in blood cells, which should be considered in epigenomic and transcriptomic studies. Altogether, we believe that this study represents an important first step in elucidating epigenetic mechanisms underlying the pathogenesis of MS.

Twins were recruited by launching a nationally televised appeal and internet notification in
Germany with support from the German Multiple Sclerosis Society (DMSG, regional and national division

Infinium MethylationEPIC BeadChip data processing and DMP identification
Raw EPIC array data were preprocessed using the RnBeads R/Bioconductor package 63 . Lowquality samples and probes were removed using the Greedycut algorithm, based on a detection P-value threshold of 0.05, as implemented in the RnBeads package. In addition, probes with less than three beads and probes with a missing value in at least 5% of the samples were removed. For each CpG site a β -value was calculated, which represents the fraction of methylated cytosines at that particular CpG site (0 = unmethylated, 1 = fully methylated).  Fig. 18

Estimation of cell-type composition
Cell-type composition of each PBMC sample was estimated with the reference-based method first published by Houseman et al. 67 As a result, Supplementary Fig. 19 shows that the overall within-pair correlations are, as expected, higher after adjusting for cell-type composition.

Identification of within-pair differentially methylated regions (WP-DMRs)
To identify WP-DMRs in the EPIC array data, for each twin pair the

Copy-number variation (CNV) analysis
CNV analysis with the EPIC array data was performed using the conumee R/Bioconductor package with default settings 70 . Individual profiles and output were manually assessed. To define chromosomal gains and losses within the MZ twin pairs, an absolute segment mean threshold ≥ 0.3 was applied.

Targeted deep bisulfite sequencing (TDBS)
TDBS was used to validate DMPs resulting from the EPIC array analysis and to determine methylation levels of the repetitive elements HERVK, LINE1 and Alu. Amplicons were generated on bisulfite-treated DNA using region-specific primers with TruSeq adaptor sequences on their 5´-ends (Illumina). Reaction conditions and primer sequences are described in Supplementary   Table 5. Purified PCR products were quantified, pooled, amplified using index primers (five cycles), and sequenced in a 300-bp paired-end MiSeq run (Illumina). After demultiplexing, adaptor trimming and clipping overlapping mates, the resulting FASTQ files were imported into BiQ Analyzer HiMod 71 , to filter out low quality reads and call the methylation levels. Final coverage was >1500 reads/base.

Targeted deep sequencing (TDS)
The rs6471533 SNP was genotyped using TDS (see Supplementary Table 5 for reaction conditions and primer sequences). The workflow is similar as described for TDBS, except that genomic DNA was used and that the resulting FASTQ files were aligned to the reference sequence using Bowtie 2 72 . Subsequently, variants were called with SAMtools mpileup and variant information was extracted using filter pileup. Final coverage was >1500 reads/base.  Table 6.

DMR identification in WGBS data
To identify MS-associated DMRs (MS-DMRs), the WGBS data of all four pairs were analysed using the RnBeads package, in which for every CpG a paired t-test was performed. Only CpGs with a coverage

Partially methylated domain (PMD) analysis in WGBS data
The WGBS data was segmented into PMDs, low methylated regions (LMRs) and unmethylated regions (UMRs) using the MethylSeekR R/Bioconductor package 78  to assess the genome-wide PMD similarity across the eight samples, the genome was binned into 1kb windows, and each was annotated with 1 if the bin overlaps with a PMD and 0 otherwise. Based on this binarized matrix a hierarchical clustering was performed in R using "ward.D2" as agglomeration method and "euclidean" as a distance measurement. The very same procedure was performed for FMRs.

Data availability
The epigenomic data has been deposited at the European Genome-phenome Archive (EGA, http://www.ebi.ac.uk/ega/), which is hosted at the EBI, under accession number EGAS00001003076 (submission in progress).         37 . a Genome coordinates are human genome build GRCh37/hg19. b Based on information provided by the Illumina manifest. c The number of EPIC probes mapping close to the DMPs are listed as well as whether these probes have a P<0.01. d Behaves like a methylation quantitative trait loci. A = adjusted for cell-type composition, DHS = DNAse I hypersensitive site, FDRW-A = FDR Wilcoxon signed-rank test adjusted for cell-type composition, FDRW-U = FDR Wilcoxon signedrank test unadjusted for cell-type composition, GWAS = genome-wide association study, PW-A = P-value Wilcoxon signed-rank test adjusted for cell-type composition, PW-U = P-value Wilcoxon signed-rank test unadjusted for cell-type composition, TFBS = transcription factor binding site, TSS200 = the region from transcription start site (TSS) to −200 nt upstream of TSS, TSS1500 = −200 to −1500 nt upstream of TSS, U = unadjusted for cell-type composition, Δβ-value = within-pair β-value difference, 450K = CpG present on 450K array (N = no, Y = yes).