DNA methylation signatures of monozygotic twins clinically discordant for multiple sclerosis

Multiple sclerosis (MS) is an inflammatory, demyelinating disease of the central nervous system with a modest concordance rate in monozygotic twins, which strongly argues for involvement of epigenetic factors. We observe highly similar peripheral blood mononuclear cell-based methylomes in 45 MS-discordant monozygotic twins. Nevertheless, we identify seven MS-associated differentially methylated positions (DMPs) of which we validate two, including a region in the TMEM232 promoter and ZBTB16 enhancer. In CD4 + T cells we find an MS-associated differentially methylated region in FIRRE. Additionally, 45 regions show large methylation differences in individual pairs, but they do not clearly associate with MS. Furthermore, we present epigenetic biomarkers for current interferon-beta treatment, and extensive validation shows that the ZBTB16 DMP is a signature for prior glucocorticoid treatment. Taken together, this study represents an important reference for epigenomic MS studies, identifies new candidate epigenetic markers, and highlights treatment effects and genetic background as major confounders.


Supplementary Tables
. Characteristics of the top 15 most significantly differentially methylated positions (DMPs) associated with longstanding MS, identified by a pair-wise analysis including only the EPIC array data of the 25 MZ twins pairs that have been clinically discordant for MS for more than 10 years (n = 25 twin pairs). The first two DMPs had a suggestive P-value<5*10 -6 (adjusted for cell type composition). Source data are provided as a Source Data file. a All the genome coordinates are based on human genome build GRCh37/hg19. b Based on information provided by the Illumina manifest. Since all genes have multiple transcripts, the "UCSC_RefGene_Group" gene-related location is listed. A = adjusted for cell-type composition, CI = confidence interval, DHS = DNAse I hypersensitive site, n = number of MS-discordant MZ twin pairs, PW-A = P-value two-tailed Wilcoxon signed-rank test adjusted for cell-type composition, PW-U = P-value two-tailed Wilcoxon signed-rank test unadjusted for cell-type composition, TFBS = transcription factor binding site, TSS200 = the region from transcription start site (TSS) to −200 nt upstream of TSS, TSS1500 = −200 to −1500 nt upstream of TSS, U = unadjusted for cell-type composition, 450k = probe present on the 450k array (Y = yes, N = no), 5′UTR= 5' untranslated region, Δβ-value = withinpair β-value difference (clinically MS-affected MZ co-twin -non-affected MZ co-twin).

Supplementary Table 2. Evaluation of the TMEM232 MS-DMPs in blood-based 450K EWAS data of 140 MS patients and 139 unrelated controls from Kular et al 1 .
The results of the PBMC-based EPIC array EWAS in the 45 MZ twins clinically discordant for MS are shown as well. b Significance estimated using linear regression (see Methods for details). CI = confidence interval, n = number of individuals, PU = P-value unadjusted, PA = Pvalue adjusted for sex, age and smoking status, PA2 = P-value adjusted for sex, age, smoking status and cell type composition, PW-A = P-value Wilcoxon twotailed signed-rank test adjusted for cell-type composition, PW-U = P-value Wilcoxon two-tailed signed-rank test unadjusted for cell-type composition, TSS200 = the region from transcription start site (TSS) to −200 nt upstream of TSS, TSS1500 = −200 to −1500 nt upstream of TSS, Δβ-value = within-pair β-value difference (clinically MS-affected MZ co-twin -non-affected MZ co-twin). Source data are provided as a Source Data file. a Genomic coordinates are based on human genome build GRCh37/hg19. The MS-DMRs were annotated using the ChIPseeker R/Bioconductor package (v1.14.2) 4 . b In this column the approximate distance to the closest EWAS EPIC probe is listed. When the distance is <1 kb, then of this EPIC probe the PW-A-value of the pair-wise analysis using the EPIC array data of the 45 MZ twin pairs adjusted for cell-type composition is listed as well. If the distance is 0 bp, then the EPIC probe is located within the MS-DMR. c cg16446288 is exactly located at chr7:98424232-98424233. d cg16557370 is exactly located at chr13:27295982-27295983. e This MS-DMR fulfilled the stringent selection criteria of ≥3 CpGs, each having P<0.05 (two-tailed paired T-test) and absolute mean methylation difference >0.20, and a maximum of 500 bp distance between neighbouring significant CpGs. PW-A = P-value two-tailed Wilcoxon signed-rank test adjusted for cell-type composition. Robust DMR = DMR that shows overall consistent methylation differences (same direction) across the entire DMR (applying the lower methylation threshold of 0.15 resulted in several MS-DMRs not showing consistent methylation differences (same direction) across the entire DMR). Please see the Source Data File for details. Δβ = Within-pair β-value difference (clinically MS-affected MZ co-twin -non-affected MZ co-twin). Source data are provided as a Source Data file. a WP-DMRs were defined as ≥3 CpGs with a within-pair β-value difference >0.20 (adjusted for cell-type composition) and a maximum 1 kb distance between neighboring CpGs (the 257 IFN-associated CpGs were excluded from this analysis). In addition, the β-value of the "abnormally methylated" co-twin had to be greater than ±3 standard deviations from the mean. b All genome coordinates are based on human genome build GRCh37/hg19. Chr = chromosome, DMF = dimethyl fumarate, GLAT = glatiramer acetate, IFN = interferon-beta, IR = imprinted region (Y = yes), TFM = teriflunomide. Table 4), were present in other pairs as well by applying a lower Δβ-value threshold of 0.15. In total, four WP-DMRs were also identified in other twin pairs, of which one intergenic WP-DMR was present in 4 pairs and always associated with the MS phenotype (in bold). Hence, in total 24 MSassociated WP-DMRs were identified in 11 pairs, of which 23 were pair-specific and one present in 4 twin pairs. Clinical characteristics such as gender, MS course, disease duration at sampling date, age at first disease manifestation, MS treatment, and pack-years at sample collection did not differ between these 11 twin pairs and the 34 other pairs (P>0.05, two-tailed Wilcoxon rank sum test for continuous data and two-tailed Fisher's exact test for categorical data). Source data are provided as a Source Data file. a WP-DMRs were defined as ≥3 CpGs with a within-pair β-value difference >0.20 (adjusted for cell-type composition) and a maximum 1 kb distance between neighboring CpGs (the 257 IFN-associated CpGs were excluded from this analysis). In addition, the β-value of the "abnormally methylated" co-twin had to be greater than ±3 standard deviations from the mean. b All genome coordinates are based on human genome build GRCh37/hg19. c A Δβ-value threshold of 0.15 was used to evaluate whether the 27 WP-DMRs, that were aberrantly methylated in the MS-affected co-twins, were present in other twin pairs as well. Chr = chromosome, DMF = dimethyl fumarate, GLAT = glatiramer acetate, IFN = interferon-beta, IR = imprinted region (Y = yes), TFM = teriflunomide.    6 implemented in the minfi R/Bioconductor package 7 . b Values are expressed as mean ± SD. c Within-pair difference = clinically MS-affected, IFN-treated MZ co-twin -non-affected MZ co-twin. CI = confidence interval, n = number of pairs. PW=P-value nonparametric two-tailed Wilcoxon signed-rank test. P-values<0.05 are in bold.

Gene
Chr Source data are provided as a Source Data file. a The glucocorticoid treatment-associated DMRs (GC-DMRs) result from the DMR analysis of the WGBS data of CD4+ memory T-cells of a MS discordant MZ twin pair of which the MS-affected co-twin was very recently treated with GCs at the moment of blood collection (n=1). The GC-DMRs were identified using the DSS-single package 8 , including a smoothing span of 100 bp, a minimum region length of 25 bp with ≥2 CpGs and a P-value<0.01. The absolute mean methylation difference had to be larger than 0.25, and to limit the number of false positives only GC-DMRs located in reported GC-response genes were considered. b Genomic coordinates are based on human genome build GRCh37/hg19. Magnitude of the correlation Power to detect a mean β-value difference of (at least) 0.05 at alpha = 1x10 -7 Power to detect a mean β-value difference of (at least) 0.04 at alpha = 1x10 -7 Power to detect a mean β-value difference of (at least) 0. >0.999 The table shows the statistical power to detect a mean β-value difference of (at least) 0.05, 0.04 and 0.03 with a (genome-wide) significance threshold of 1x10 -7 , a sample size of 45 MS discordant MZ twin pairs using a twosided paired T-test and assuming a standard deviation of 0.0266 (which is the true median standard deviation observed in the EPIC array data).

Supplementary Table 12. Primer sequences and PCR conditions. a
a Loci were amplified in 30 µL mixes containing 40 ng bisulfite-treated DNA (TDS: 25 ng genomic DNA), 0.2 mM of each dNTP, n nM of each primer, 2.5 mM MgCl2, 1.5 U HotStarTaq DNA polymerase (Qiagen) and 1X PCR buffer. DNA was denatured for 15 min at 95°C, followed by n cycles of 30 sec at 95°C, 1 min at T°C and 30 sec up to 1 min at 72°C. The reaction was completed by a final extension step of 5 min at 72°C. PCR products were mixed, purified using AMPure XP beads (Agencourt) and quantified using the Qubit Fluorometer and Qubit dsDNA HS assay kit (Invitrogen). TMEM232 was sequenced with a minimum coverage of 1500 reads. All other amplicons were sequenced with a minimum coverage of at least 2000 reads. b Alu primers were designed using the consensus sequence published by Price et al. 9 (nucleotide positions 29-260) and generate in silico 10 an "infinite" number of specific PCR products (no mismatches allowed). c HERVK primers target the youngest subfamily LTR5Hs (nucleotide position 256-487) and generate in silico 10 328 different specific PCR products (no mismatches allowed) of which 98% matches to LTR5Hs according to RepeatMasker (http://repeatmasker.org/). d LINE1 Primers were designed using the promoter/5'-UTR consensus sequence (GenBank-Nr. X58075.1, nucleotide positions 105-361) and generate in silico 10 309 different specific PCR products (no mismatches allowed), which mainly comprise the youngest subfamilies L1HS (~64%), L1PA2 (~25%) and L1PA3 (~9%). The repetitive elements were sequenced with a minimum coverage of 2000 reads, giving a >6 fold coverage per individual HERVK and LINE1 element. BisConAssay = bisulfite conversion rate assay (nonbisulfite-dependent primers), C = primer concentration (nM), Cyc = number of cycles, NA = not applicable, T = annealing temperature (°C), TDBS = targeted deep bisulfite sequencing, TDS = targeted deep sequencing, #CpGs = number of CpGs present in the amplicon.

Supplementary Table 13. Sequencing coverage statistics of the whole genome bisulfite sequencing (WGBS) analysis in CD4+ memory T-cells of four MS discordant female MZ twin pairs.
a Only the MS-affected co-twin of pair AY had been treated very recently with GCs at the time of blood collection (but never received any immune-modulating therapy), while the MS-affected co-twins of the other three pairs had not received GCs or other immune-modulating therapies within at least 12 months prior to blood collection. GC-DMR = glucocorticoid treatment-associated differentially methylated region, MS-DMR = MS-associated DMR, SS = smoking status at sample collection (Y = yes, N = no).  9 12.0 N 27,970,390 10.8 25,398,195 15,270,455 2,693,926 18.7 2,796,900 20.6

Supplementary Figures
Supplementary Figure 1. Tukey boxplots (with all data points shown in green) of (A) the age of disease onset and (B) the years that the MZ twins were clinically discordant for MS at sample collection (n = 45 twin pairs). Our twin cohort has an average age of onset of 28 years, and contains 7 (16%) cases that were younger than 20 years and 6 (13%) cases that were older than 40 years at disease onset. Since MS has an average age of onset of about 30 years and manifests in 70% of the patients between 20 and 40 years of age 11,12 , the age of onset in our cohort is within the normal range. Boxplots represent the median (central line), the interquartile range or IQR (bottom and top of the box), and 1.5 times the IQR (whiskers). . This MS-DMR is located in an intronic CTCF/YY1 bound regulatory region in the FIRRE gene, 13 that is located on the X-chromosome (chrX:130863481-130863509) and encodes a circular long non-coding RNA. 14 MS-DMRs were defined as ≥3 CpGs, each having P-value<0.05 (two-tailed paired T test) and absolute mean methylation difference >0.2, and a maximum 500 bp distance between neighbouring CpGs. The green bars highlights the MS-affected co-twins. All genome coordinates are based on human genome build GRCh37/hg19. Labels indicate: Pair ID -Disease status (i.e. MS = MS-affected MZ co-twin, H = clinically non-affected MZ co-twin). Source data are provided as a Source Data file.

Bisulfite treatment
From each PBMC sample, 500 ng DNA was treated with bisulfite using the EZ DNA Methylation kit samples revealed an average conversion rate of 99.7% (SD=0.10%). Accordingly, the other 40 MZ twin pairs were processed using the adapted bisulfite treatment and the EPIC array bisulfite controls showed normal intensities for those samples. Both members of a twin pair were always processed in the same batch.

Estimation of cell-type composition
Cell-type composition of each PBMC sample was estimated with the reference-based method first published by Houseman et al. 6 , which employs DNA methylation reference profiles of individual cell types to estimate the cell-type composition of each sample. Several reference-based deconvolution algorithms were compared, including the implementation of the Houseman algorithm in the minfi R/Bioconductor package 7 , and the standard constrained projection as well as the two nonconstrained, reference-based, cell-type deconvolution approaches recently implemented in the EpiDISH R/Bioconductor package 16 . For a subset of samples (n=61) cell-type proportions determined using immunophenotyping were available, which showed the best correlation with the estimates provided by the minfi package. Accordingly, the minfi estimates were used to adjust the