Abstract
Sequence-based genetic testing identifies causative variants in ~ 50% of individuals with developmental and epileptic encephalopathies (DEEs). Aberrant changes in DNA methylation are implicated in various neurodevelopmental disorders but remain unstudied in DEEs. We interrogate the diagnostic utility of genome-wide DNA methylation array analysis on peripheral blood samples from 582 individuals with genetically unsolved DEEs. We identify rare differentially methylated regions (DMRs) and explanatory episignatures to uncover causative and candidate genetic etiologies in 12 individuals. Using long-read sequencing, we identify DNA variants underlying rare DMRs, including one balanced translocation, three CG-rich repeat expansions, and four copy number variants. We also identify pathogenic variants associated with episignatures. Finally, we refine the CHD2 episignature using an 850 K methylation array and bisulfite sequencing to gain potential insights into CHD2 pathophysiology. Our study demonstrates that genome-wide DNA methylation analysis identifies causal and candidate variants in 2% (12/582) of unsolved DEE cases.
Introduction
The developmental and epileptic encephalopathies (DEEs) are the most severe group of epilepsies, defined by frequent epileptiform activity associated with developmental slowing or regression1. While each genetic etiology is rare, with more than 825 genes implicated2, the cumulative incidence of DEEs overall is 1 in 590 children3. Currently, de novo, X-linked, or recessively inherited pathogenic germline variants are found in ~ 50% of individuals with DEEs who undergo genetic testing4. These are identified by gene panels, exome sequencing (ES), and now, genome sequencing (GS)5,6,7. A smaller subset is explained by copy number variants (CNVs)8. Understanding the etiology guides management, such as clinical trial participation, informs accurate reproductive counseling, enables families to join gene-based support groups, and facilitates the development of targeted therapies9,10,11,12. This, in turn, improves outcomes but is not possible when the etiology is unknown (“unsolved”)13,14,15.
Epigenetic modifications, which alter DNA without changing its nucleotide sequence, explain the etiology in some individuals with neurodevelopmental disorders but have not yet been studied in the DEEs. DNA methylation is an essential epigenetic modification that regulates cellular gene expression by adding a methyl (CH3) group to DNA, typically at CpG sites. This regulation operates through mechanisms such as methylation of promoter CpGs, genomic imprinting, and X-chromosome inactivation16. Rare epivariants, defined as rare alterations in DNA methylation with or without identified underlying DNA sequence alterations, contribute to human genetic variation17, but have also been shown to disrupt normal methylation and transcription to cause disease18,19. While DNA methylation does not change the DNA sequence itself, epivariants are often caused by underlying in-cis DNA changes, such as rare sequence variants, structural alterations, and CG-rich repeat expansions17, that are difficult to identify by standard sequencing. One example is the methylation of CGG repeats in the 5’ untranslated region (5’UTR) of FMR1 (MIM:309550) that represses gene expression and causes Fragile X syndrome (MIM:300624)20. Similarly, hypermethylation of the 5’UTR of Xylosyltransferase 1 (XYLT1, MIM:608124), leading to gene silencing, may identify the “missing” allele in the recessive disease Baratela-Scott syndrome (BSS [MIM:615777])21. In both Fragile X and BSS, the aberrant methylation is due to the expansion of a CG-rich repeat that is difficult to reliably detect using short-read sequencing. Rare epivariants, also called rare differentially (hyper- and hypo-) methylated regions (DMRs), are enriched in individuals with neurodevelopmental disorders and congenital anomalies (ND-CA) compared to controls22.
In contrast to rare DMRs, which represent discrete genomic regions with outlier methylation changes, genome-wide epigenetic profiles capture a collection of distinct methylation changes at individual CpG sites across the genome. These epigenetic profiles were first implemented for cancer diagnostics with the introduction of the brain tumor classifier in 201823. A growing number of rare diseases exhibit methylation patterns, or “episignatures,” in the blood that are reproducible among individuals with pathogenic variants within the same protein domain, gene, or protein complex, yielding highly sensitive and specific biomarkers24,25. Since episignatures were first clinically validated and implemented for the diagnosis of rare neurodevelopmental disorders with the EpiSign™ assay in 201926, episignatures for nearly 70 rare diseases have been published. Episignatures provide strong evidence for a genetic diagnosis, regardless of whether an underlying pathogenic DNA variant is identified, and can help resolve variants of uncertain significance (VUS). Episignatures have been found for neurodevelopmental disorders where epilepsy is part of the phenotype25,27,28,29,30, but the diagnostic yield for DEEs has not been determined. Furthermore, how these clinically relevant episignatures might be harnessed to inform underlying disease biology and give insights into potential distinct and overlapping pathogenic mechanisms among disorders is just beginning to be explored31.
Both rare DMRs and episignatures can be detected in peripheral blood samples. Rare DMRs derived from individuals with ND-CA are recapitulated across multiple tissue types, including blood and fibroblasts22. Episignature classifiers for rare diseases are trained on data obtained from blood-derived DNA and are, therefore, blood-specific.
Here, we assessed rare outlier DMRs and DNA methylation signatures in peripheral blood-derived DNA from 582 individuals with genetically unsolved DEEs (uDEEs, Fig. 1). We report our methylation array data processing pipeline, MethylMiner32, which automates quality control, normalization, an algorithm that mines rare DNA methylation events17, and interactive data visualization. Using a combination of short- and long-read sequencing (LRS), we identify variants underlying rare epivariants and episignatures. Finally, we refine the robust episignature for the DEE gene CHD2 (MIM:602119)25 to explore how clinically relevant episignatures may give insights into underlying biology. For individuals with uDEEs, we show that rare epivariants and episignatures uncover molecular causes missed by standard sequence-based approaches.
Results
Discovery and validation of DMRs
To determine the ability of our analysis pipeline to robustly detect rare, outlier DMRs, we included DNA from six positive controls with genetic alterations: three individuals with heterozygous or homozygous hypermethylation of XYLT1, and three individuals (two males and one female) with hypermethylation of FMR1. The outlier DMR analysis detected both rare DMRs (Supplementary Fig. 2, Supplementary Data 3A). Additionally, we identified a heterozygous XYLT1 hypermethylation carrier in our DEE cohort. Targeted X-chromosome analysis in males identified complete methylation at the FMR1 locus in both Fragile X males, whereas all remaining males in the cohort were completely unmethylated at FMR1. FMR1 methylation was also higher (~ 75%) in the Fragile X female sample than in the other females (25–50%); partial methylation in females likely reflects random X-inactivation. Thus, our methylation array analysis approach detects outlier DMRs at known disease loci on both the autosomes and the sex chromosomes.
Next, we assessed outlier DMRs in our cohort of 1194 individuals (582 uDEEs) across 1226 array samples. We predicted n = 2184 total DMRs for the autosomes, n = 49 DMRs for males on chromosome X, n = 27 DMRs for females on chromosome X, and no DMRs on chromosome Y (Supplementary Data 3B, D, F). After collapsing DMRs that overlapped across samples (≥ 50% probe overlap with the same direction of change, hyper- or hypomethylation), we derived n = 1545 unique DMRs for the autosomes (1009 hyper, 536 hypo), n = 37 for males on chrX (26 hyper, 11 hypo), and n = 22 for females on chrX (14 hyper, 8 hypo) (Supplementary Data 3C, E, G). Of the samples with one or more outlier DMRs, the majority had only a single outlier DMR (Supplementary Fig. 3).
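The collapsing of overlapping calls into unique DMRs can be illustrated with a minimal Python sketch. It assumes each call is reduced to a set of array probe IDs plus a direction, and that "≥ 50% probe overlap" is measured relative to the smaller of the two calls; that denominator, the union-find grouping, and the names DMRCall and collapse_dmrs are illustrative assumptions, not the MethylMiner implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DMRCall:
    sample: str
    direction: str      # "hyper" or "hypo"
    probes: frozenset   # array probe IDs covered by this call


def probe_overlap(a: DMRCall, b: DMRCall) -> float:
    """Shared probes as a fraction of the smaller call (assumed denominator)."""
    if not a.probes or not b.probes:
        return 0.0
    return len(a.probes & b.probes) / min(len(a.probes), len(b.probes))


def collapse_dmrs(calls, min_overlap=0.5):
    """Group calls with >= min_overlap probe overlap and the same direction.

    Union-find merges chains of overlapping calls into one group; each
    returned group (a list of DMRCall) counts as one unique DMR.
    """
    parent = list(range(len(calls)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(calls)):
        for j in range(i + 1, len(calls)):
            if (calls[i].direction == calls[j].direction
                    and probe_overlap(calls[i], calls[j]) >= min_overlap):
                parent[find(i)] = find(j)

    groups = {}
    for i, call in enumerate(calls):
        groups.setdefault(find(i), []).append(call)
    return list(groups.values())


calls = [  # hypothetical calls
    DMRCall("S1", "hyper", frozenset({"cg1", "cg2", "cg3", "cg4"})),
    DMRCall("S2", "hyper", frozenset({"cg3", "cg4", "cg5"})),  # 2/3 shared with S1
    DMRCall("S3", "hypo", frozenset({"cg1", "cg2", "cg3"})),   # opposite direction
    DMRCall("S4", "hyper", frozenset({"cg9", "cg10"})),
]
unique = collapse_dmrs(calls)
print(len(unique))  # 3 unique DMRs: {S1, S2}, {S3}, {S4}
```

Requiring the same direction keeps a hypermethylated and a hypomethylated call over the same probes as separate DMRs, mirroring the criterion stated above.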
To determine the robustness of our DMR calling algorithm, we (i) assessed the reproducibility of DMR calls in a subset of samples and (ii) performed validation of DMRs using targeted EM-seq (Supplementary Methods). Using replicate array data for 29 individuals, we found that 80% of DMRs were replicated across different batches for an individual (Supplementary Methods). We then used targeted EM-seq, a non-bisulfite approach, to validate a subset of DMRs. We confirmed that our positive control DMRs (XYLT1 and FMR1) could be detected in the targeted EM-seq data (Supplementary Fig. 4). We then validated 29 outlier DMRs by targeted EM-seq in six individuals with uDEEs and four family members (Supplementary Figs. 5, 6, and 12). In addition to DMR validation, targeted EM-seq provides much higher resolution of the extent of differential methylation than the methylation array (e.g. > 80 methylated CpG sites for the XYLT1 DMR by targeted EM-seq compared to eight representative probes on array; Supplementary Data 4). Thus, we detected and validated outlier DMRs at higher resolution using an orthogonal approach.
Rare outlier DMRs in uDEEs
We narrowed down outlier DMR calls for individuals with uDEEs to determine high-priority candidates for further study based on DMR recurrence across multiple individuals, population frequency17, functional annotations (Methods), and manual inspection of DMR plots for each DMR. We identified 12 individuals with uDEEs with one or more rare, potentially disease-associated DMRs and performed follow-up studies (Table 1, Supplementary Data 2A). One individual had multiple DMRs due to a balanced translocation between chrX and chr13, four individuals each had a DMR due to an expanded CG-rich repeat, and seven individuals had DMRs due to underlying CNVs.
Rare outlier DMR analysis detects hypermethylation of chr13 due to X;13 translocation
One female with the DEE syndrome, epilepsy of infancy with migrating focal seizures (EIMFS), had 26 rare outlier hypermethylated DMRs across chr13 (Fig. 2A, Supplementary Fig. 7), none of which were present in > 23,000 controls17. The DMRs were replicated on a second, independent methylation array from the same individual and validated using targeted EM-seq (Supplementary Fig. 6). Methylation array analysis of both parents revealed that all rare hypermethylated DMRs occurred de novo in the proband (Fig. 2B). Whole-genome Oxford Nanopore Technologies (ONT) long-read sequencing also confirmed the hypermethylated DMRs and identified a balanced translocation between chrX and chr13 (Fig. 2C), annotated as 46,XX,t(X;13)(q28;q14.2). The translocation provides a mechanism whereby random X-inactivation induces hypermethylation on the portion of chr13q attached to the large piece of the X chromosome. The translocation breakpoints were confirmed by PCR and Sanger sequencing of peripheral blood-derived DNA as chrX:152,092,342 to chr13:47,005,269 and chr13:47,005,271 to chrX:152,092,344 (GRCh38/hg38). Parental methylation studies and short-read GS confirmed that the translocation occurred de novo, and SNP analysis revealed that the haplotype containing the translocation was paternally derived. The translocation is likely causative in this individual given the de novo occurrence, the absence of clearly pathogenic sequence variants by trio sequencing (Supplementary Data 5), and the report of a similar translocation in a female individual with intellectual disability and bilateral retinoblastoma33.
Rare outlier DMR analysis detects hypermethylation caused by underlying triplet repeat expansions
We detected two individuals with uDEEs and two control individuals with hypermethylation spanning the 5’UTR and intron 1 of the Casein kinase 1 isoform epsilon (CSNK1E, MIM:600863, Fig. 3A) gene. Although present in one control and reported in 6/23,116 controls17, an individual with DEE and probable haploinsufficiency due to a de novo splicing variant (c.885+1 G > A) in CSNK1E has been reported34, suggesting further study is warranted to determine if variation in this gene causes DEE. Segregation analysis revealed that the hypermethylation in one proband was maternally inherited (Family 1, Supplementary Fig. 8), whereas the other arose de novo (Family 2). After validation of hypermethylation with targeted EM-seq for both probands (Supplementary Fig. 5), long-read sequencing of the proband (genome, ~ 1500–3000 bp) and mother (targeted, ~ 1500 bp) from Family 1 and the proband from Family 2 (genome, ~ 1100–3200 bp) confirmed the presence of an expanded CGG motif in both families (Fig. 3C), as previously reported in individuals with hypermethylation of CSNK1E at fragile site FRA22A and reduced expression in lymphoblastoid cells17. Through GeneMatcher35, we identified Family 3, consisting of a proband with the same CSNK1E hypermethylated DMR and CGG repeat expansion (genome, ~ 1300–2100 bp) inherited from his mother (genome, ~ 270–3500 bp), who is mildly affected by learning, speech, and sleep difficulties (Supplementary Phenotype data). Expression analysis in available fibroblasts from Families 2 and 3 showed that individuals with CSNK1E hypermethylation had decreased expression of CSNK1E compared to hypermethylation-negative controls (Fig. 3B). Analysis using the OUTRIDER algorithm36 confirmed “drop-out” of CSNK1E (ENSG00000213923) expression compared to publicly available fibroblast controls37 (Fig. 3B, Supplementary Fig. 9).
Thus, we report three individuals with uDEEs harboring inherited or de novo CSNK1E hypermethylation due to an underlying repeat expansion (n = 4 LRS) that leads to an approximately 50% reduction in CSNK1E expression (n = 3 RNA-seq drop-out). Trio GS analysis found no other candidate gene variants in these three probands. However, because this abnormality was also found in seemingly unaffected individuals, including one control and one mother (Family 1) in our cohort as well as individuals reported by others17, further work is required to determine whether variation in CSNK1E causes or contributes to the DEEs.
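OUTRIDER fits a negative binomial model and uses an autoencoder to control for confounders; the sketch below is a deliberately simplified stand-in that only conveys what an expression "drop-out" means: a case whose count for one gene falls far below the control distribution. It assumes counts are already library-size normalized; the function name expression_outlier, the pseudocount, and the z-score cutoff of −3 are hypothetical choices, not OUTRIDER parameters.

```python
import math
import statistics


def expression_outlier(case_counts, control_counts, z_cutoff=-3.0):
    """Flag under-expression of one gene in cases relative to controls.

    Simplified stand-in for OUTRIDER (no autoencoder, no negative binomial).
    Counts are assumed to be library-size normalized already.
    """
    log_ctrl = [math.log2(c + 1) for c in control_counts]
    mu = statistics.mean(log_ctrl)
    sd = statistics.stdev(log_ctrl)
    results = []
    for c in case_counts:
        z = (math.log2(c + 1) - mu) / sd
        results.append({
            "count": c,
            "z": z,
            # fold change vs the geometric mean of the pseudocounted controls
            "fold_change": (c + 1) / 2 ** mu,
            "drop_out": z <= z_cutoff,
        })
    return results


controls = [100, 104, 96, 102, 98, 101, 99]  # hypothetical normalized counts
for r in expression_outlier([50, 100], controls):
    print(r["count"], round(r["fold_change"], 2), r["drop_out"])
```

A case at roughly half the control level is flagged, which is the pattern expected from silencing of one allele; OUTRIDER itself additionally reports a statistical significance for each gene and sample.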
A male individual with uDEE displayed maternally inherited hypermethylation of the DIP2B (MIM:611379) promoter region and exon 1 (Supplementary Fig. 10), due to an underlying CGG-repeat expansion (~ 1300–2300 bp), previously characterized as fragile site FRA12A38. Loss of DIP2B is associated with an autosomal dominant neurodevelopmental disorder (NDD) with variable penetrance, including a DIP2B repeat expansion in an individual with epilepsy38.
We detected a rare hypermethylated DMR on the X chromosome in exon 1 of an uncharacterized gene (BCLAF3/CXorf23) in a male with uDEE (Supplementary Fig. 10), which was absent in > 23,000 unaffected controls (> 8000 males)17. We validated hypermethylation using targeted EM-seq (Supplementary Fig. 5), and ONT long-read sequencing of the proband and his mother revealed a novel CGG repeat expansion in the proband (~ 2500–3000 bp, Supplementary Fig. 11) inherited from his mother, who had a smaller expansion (~ 1700–1900 bp). LRS and standard X-inactivation studies39 show that the mother has skewed X-inactivation (Supplementary Data 6) of the allele with the expansion, which explains why outlier hypermethylation is not detected from her methylation array data. Trio GS identified no other candidate variants for the proband’s DEE. Collectively, these results highlight the detection of repeat-expansion-associated loci based on outlier DMR analysis of DNA methylation array data in individuals with uDEEs.
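X-inactivation status is commonly quantified with a methylation-sensitive restriction assay at the androgen receptor CAG repeat (the HUMARA assay); that reference 39 used exactly this assay is an assumption here. The sketch computes the conventional ratio, in which each allele's peak area after digestion is normalized by its area before digestion, and applies a commonly used 80% cutoff for skewing. The function names, peak areas, and cutoff are illustrative.

```python
def xci_percent(digested, undigested):
    """Percent of cells in which allele 1 is on the inactive X (HUMARA-style).

    digested, undigested: (allele1_area, allele2_area) PCR peak areas with and
    without a methylation-sensitive restriction enzyme. Only methylated
    (inactive-X) alleles survive digestion; dividing by the undigested areas
    corrects for preferential amplification of the shorter allele.
    """
    r1 = digested[0] / undigested[0]
    r2 = digested[1] / undigested[1]
    return 100.0 * r1 / (r1 + r2)


def classify_xci(percent, skew_cutoff=80.0):
    """Label the pattern 'skewed' if either allele is inactive in >= cutoff% of cells."""
    major = max(percent, 100.0 - percent)
    return "skewed" if major >= skew_cutoff else "random"


# Hypothetical peak areas: allele 1 is preferentially inactivated.
pct = xci_percent(digested=(900, 100), undigested=(1000, 1000))
print(round(pct, 1), classify_xci(pct))  # 90.0 skewed
```

Which allele carries the expansion has to be established separately (for example, by phasing long reads), since the assay reports only the ratio between the two alleles.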
Rare outlier DMR analysis detects copy number variants
Seven individuals displayed DMRs that were found to be due to underlying CNVs. One individual with uDEE displayed hypermethylation of the promoter region and TSS of STX1B (MIM: 601485, Fig. 4A), an established epilepsy gene known to cause generalized epilepsy with febrile seizures plus (GEFS+). Interestingly, the proband was indeed a member of a family displaying GEFS+ and other epileptic phenotypes (Fig. 4B, Supplementary Phenotype data, Supplementary Fig. 12). We validated the methylation finding in the family members for whom blood DNA was available (n = 6 including proband) using targeted EM-seq (Supplementary Fig. 12). Genome sequencing of the proband revealed a 1784 bp deletion encompassing the promoter, the TSS, exon 1, and part of intron 1 of STX1B. Importantly, because the deletion removes exon 1, it eliminates both the TSS and the sequence encoding the first 10 amino acids of the protein (Fig. 4D). We determined the exact breakpoints of the deletion using Sanger sequencing and tested its segregation in the family (Fig. 4C, Supplementary Fig. 13). The deletion was confirmed in the proband and was present in the affected sister, affected mother, and affected maternal grandmother. The deletion was absent in the unaffected brother and father. Altogether, DNA methylation analysis uncovered a presumably deleterious deletion encompassing an essential portion of the STX1B gene as a likely pathogenic finding for this family.
One individual with uDEE and one control had ~ 10–15 hypomethylated DMRs along chr2 spanning ≥ 144 Kb (Supplementary Fig. 14A). Short- and long-read sequencing analysis revealed that these “DMRs” were due to a homozygous ~ 182 Kb deletion encompassing the outlier DMRs (Supplementary Fig. 14C). Segregation testing found that the proband inherited the deletion from both parents, who were heterozygous carriers. The CNV was also detected from the DNA methylation array data using the R tool conumee40 (Supplementary Fig. 14B).
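conumee derives copy number from the combined methylated and unmethylated signal intensities of each probe, normalizes them against a set of reference samples, and segments the resulting log2 ratios. The sketch below reproduces only the core idea, using a per-probe median reference, fixed-size probe bins instead of conumee's genomic bins and segmentation, and hypothetical thresholds; it is not conumee's algorithm. A homozygous deletion, as in this family, appears as a bin whose intensity collapses toward background.

```python
import math
import statistics


def binned_log2_ratios(sample, references, bin_size=5):
    """Mean log2(sample / reference median) for consecutive bins of probes.

    sample: total intensity (methylated + unmethylated) per probe, in genomic order.
    references: intensity lists from control samples, same probe order.
    """
    ratios = [
        math.log2(value / statistics.median(ref[i] for ref in references))
        for i, value in enumerate(sample)
    ]
    return [statistics.mean(ratios[i:i + bin_size]) for i in range(0, len(ratios), bin_size)]


def call_bins(bin_ratios, loss=-0.3, gain=0.3, hom_loss=-1.5):
    """Classify bins with hypothetical thresholds; a strong collapse suggests homozygous loss."""
    calls = []
    for r in bin_ratios:
        if r <= hom_loss:
            calls.append("homozygous_loss")
        elif r <= loss:
            calls.append("loss")
        elif r >= gain:
            calls.append("gain")
        else:
            calls.append("neutral")
    return calls


references = [[1000.0] * 15 for _ in range(3)]              # hypothetical controls
sample = [1000.0] * 5 + [100.0] * 5 + [1000.0] * 5          # middle bin near background
print(call_bins(binned_log2_ratios(sample, references)))
# ['neutral', 'homozygous_loss', 'neutral']
```

Because each bin averages several probes, a CNV must span enough array probes to shift a bin, which is consistent with the observation below that smaller CNVs may escape array-based detection.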
Four individuals with uDEEs and one control had a 686 bp hypomethylated DMR in intron 2 of the gene LINGO1 (MIM:609791). DNA methylation array analysis for a proband’s mother found that the hypomethylation was at least in part maternally inherited, and short and long-read sequencing revealed that hypomethylation was caused by an underlying ~ 4 Kb inherited deletion (Supplementary Fig. 15).
Another individual with uDEE had hypermethylation in the 5’UTR of CFAP36/CCDC104 (Supplementary Fig. 16A, C), which was not present in > 23,000 controls17. DNA methylation array analysis of both parents indicated it was maternally inherited (Supplementary Fig. 16B), and targeted ONT long-read sequencing revealed a ~ 500 Kb tandem duplication from chr2:55,034,228-55,536,971 (GRCh38/hg38, Supplementary Fig. 16D). Collectively, these results indicate that outlier DNA methylation can be due to underlying CNVs and that the 850 K methylation array may not have sufficient coverage to detect smaller CNVs. Given the high population frequencies and inheritance patterns of the chr2 deletion, LINGO1 deletion, and CFAP36/CCDC104 tandem duplication, we determined that they are unlikely to contribute to the individuals’ phenotypes. However, these findings illuminate how detected DNA methylation changes are influenced by underlying DNA variation and highlight a novel copy number alteration in STX1B as a cause of GEFS+ and other related phenotypes in a family.
Episignature screening validates pathogenicity of genetic diagnoses and resolves variants of uncertain significance
We next performed episignature analysis using the EpiSign™ v4 classifier, which includes 70 conditions associated with 96 genes/genomic regions (Fig. 5). To validate our approach, we included several individuals with causal variants in episignature genes or CNVs and an individual with a VUS. These included sixteen individuals with variants in CHD2 (n = 15 pathogenic, n = 1 VUS) and one individual each with a pathogenic variant in KDM5C, SETD1B, KMT2A, or SMARCA2 (Supplementary Data 2B). We also included two individuals with CNVs: a chr17p11.2 deletion and a chr17p11.2 duplication. Fifteen of the individuals with variants in CHD2 were positive for the epileptic encephalopathy of childhood (EEOC) episignature25, also known as the developmental and epileptic encephalopathy 94 (DEE94) episignature. However, the individual with a VUS in CHD2 was negative for the episignature, and, in combination with other clinical evidence, the VUS was reclassified as likely benign (Supplementary Phenotype data). The individuals with variants in KDM5C (MIM:314690), SETD1B (MIM:611055), KMT2A (MIM:159555), and SMARCA2 (MIM:600014) were all positive for the episignatures associated with their disorders. While these individuals were considered solved before episignature screening, the finding was used to support the genetic diagnosis of the individual with a KDM5C variant.
Additionally, we identified two individuals with inconclusive episignature results despite definitive genetic and clinical findings for the associated syndromes. Inconclusive findings arise from methylation profiles that partially overlap existing signatures but are not a definitive match. These included an individual with a 17p11.2 deletion who was inconclusive for the Smith-Magenis syndrome episignature (SMS_del) and a female individual with a 17p11.2 duplication who was inconclusive for the Potocki-Lupski syndrome episignature (PTLS, Supplementary Fig. 17). In each case, the inconclusive episignature finding is concordant with the genetic diagnosis, and the lack of a definitive match is potentially attributable to variability introduced by differing CNV breakpoints. Because of this and other factors, inconclusive EpiSign™ results are reported with the caveat that further follow-up or investigation may be warranted if there is a clinical phenotype consistent with the inconclusive episignature in question.
Episignature screening solves genetically unsolved DEEs
We then tested our cohort of 582 individuals with uDEEs for 70 clinically validated episignatures, leading to a likely diagnosis in six individuals (Table 2). All methylation variant pathogenicity (MVP) scores for episignatures and detailed genomic variant information are in Supplementary Data 2C. Two unrelated individuals with uDEEs were positive for the KBG syndrome episignature (KBGS_MRD23) caused by pathogenic variants in ANKRD11 (Supplementary Figs. 18 and 19). Exome or genome sequencing analysis revealed de novo pathogenic stop-gain variants in both individuals, and the phenotype of each individual is consistent with the diagnosis (Supplementary Phenotype data). One proband had affected siblings and family members (n = 8, Supplementary Fig. 19). However, none harbored the ANKRD11 episignature and neither affected sibling harbored the variant, indicating that there is likely a different explanation for this familial epilepsy. One individual with uDEE was positive for the episignature associated with SETD1B (Supplementary Fig. 20). Exome sequencing revealed a pathogenic stop-gain variant in SETD1B. Another individual with uDEE harbored the episignature for TET3 and had a maternally inherited pathogenic stop-gain variant in TET3 on GS (Supplementary Fig. 21). This variant remains the likely cause of the individual’s DEE, as the mother has a milder phenotype including macrocephaly and learning difficulties (Supplementary Phenotype data). One male individual with uDEE was positive for the UBE2A episignature (Supplementary Fig. 22). Through exome sequencing, we identified a predicted damaging, maternally inherited missense variant absent in gnomAD (c.376 G > A, p.Ala126Thr).
Although the variant does not reach likely pathogenic classification using existing ACMG criteria, the prediction scores (REVEL = 0.776, CADD = 26.4, and PolyPhen-2 = 1.00) support pathogenicity; the variant is maternally inherited in an X-linked intellectual disability disorder; and the individual shares multiple phenotypic features with UBE2A disorder. Thus, the variant has been determined to be the most likely genetic cause of disease. Another male individual with uDEE was positive for the episignature for the SMS gene on chromosome X (Supplementary Fig. 23). Through ES, we identified a maternally inherited, likely pathogenic missense variant (CADD = 24.2) in the SMS gene.
Of the high-confidence episignature findings, only one individual had an established genetic diagnosis in another gene. This individual harbored a de novo variant in PTEN with a consistent phenotype of macrocephaly and focal epilepsy but also had the episignature for KDM2B. Further analysis identified a paternally inherited missense variant in KDM2B. We performed methylation array analysis for the unaffected father and found that he, too, harbored the KDM2B episignature. This variant is predicted to be likely pathogenic (LP) by ACMG criteria due to its putative effect on splicing regulation, though SpliceAI predicts a low likelihood of a splicing effect (Δ score for donor gain = 0.01). When this criterion is removed, the classification is downgraded from LP to VUS; other computational predictors assess the impact as uncertain (REVEL = 0.517). Thus, while it is unlikely that this KDM2B variant explains the individual’s phenotype, it still represents an underlying DNA change detected through episignature screening, and it remains possible that it has a modifying effect on phenotype. Collectively, we have identified positive episignatures and causal genetic etiologies in six previously unsolved individuals with DEEs through episignature screening.
An additional 40 individuals with DEEs (n = 32 unsolved, n = 8 solved) and nine controls had inconclusive results for episignatures, consistent with the rate of inconclusive results in previous studies41. Of the individuals with DEEs, 4/40 were run across multiple methylation array batches. Three of these individuals did not reproduce their inconclusive episignature result in the other sample(s). While one individual’s inconclusive result did replicate across the different batches, no pathogenic variants were found by GS in the associated gene(s). Of all the individuals with available sequencing data (n = 27), none harbored pathogenic variants in the genes associated with episignature findings. While some had overlapping clinical features, most were discordant with the described phenotypes for their inconclusive episignature finding. Additional follow-up will be required to determine whether these inconclusive results are due to array artifacts or have underlying biological or disease-associated meaning. If technical artifacts are ruled out, an inconclusive result may be caused by episignatures in other genes that are yet to be defined and trained against for specificity of the classifier.
Redefining the CHD2 episignature on the 850 K EPIC array
While episignatures have proven clinically useful for diagnosis, little work has been done to investigate how episignatures may inform disease biology by studying DMRs that may impact gene expression. Here, we performed refinement and in-depth analysis of the episignature for the DEE gene CHD2. The CHD2 episignature was originally derived using overlapping 450 K and 850 K DNA methylation array probes representing individual CpG sites in n = 9 individuals with pathogenic CHD2 variants25. We refer to this signature as the CHD2 450 K episignature (Fig. 6A upper, Supplementary Fig. 24A, Supplementary Data 7). Here, we refine the CHD2 episignature exclusively on 850 K EPIC methylation array probes with data from a cohort of n = 29 individuals with pathogenic CHD2 variants (Fig. 6A lower, Supplementary Fig. 24B, Supplementary Data 7). We refer to this signature as the CHD2 850 K episignature. Of the 200 probes included in the CHD2 850 K episignature, 79/200 are specific to the 850 K EPIC array.
Comparison of the CHD2 episignature to 55 other clinically validated episignatures
We then compared the CHD2 450 K and 850 K episignatures to 55 other NDD episignatures (57 total including CHD2)31 by examining shared probes (Fig. 6B, Supplementary Fig. 25), Euclidean clustering (Fig. 6C), probe mean methylation differences (Supplementary Fig. 26), and functional annotations (Supplementary Fig. 27). As expected, the CHD2 850 K episignature shares the most probe overlap with the CHD2 450 K episignature (86/200 or 43%, Fig. 6B, Supplementary Fig. 25). Euclidean clustering was used to examine the relatedness of the episignatures by probe overlap and directionality. The CHD2 850 K episignature shares the closest branchpoint with the MRXSCJ episignature for KDM5C, with which it shares 7% of its top 500 differentially methylated probes (DMPs). Apart from the primary branchpoint, neither the 450 K nor the 850 K episignature shares immediate branches with many other episignatures. This may indicate that different predominant pathways underlie CHD2 pathophysiology compared with the other episignatures. Additionally, the CHD2 850 K episignature represents more hypermethylated regions than the CHD2 450 K episignature, as depicted by the mean methylation differences in Fig. 6C and Supplementary Fig. 26. We also performed functional annotation of episignature probes for CpG characteristics and gene regions in relation to the 55 other NDD episignatures (Supplementary Fig. 27). We found that both CHD2 850 K and 450 K DMPs map predominantly to the coding regions of genes (46% and 41%, respectively), with a significant difference in the distribution of DMPs in these regions compared with the background probe distribution (P < 9.06 × 10⁻⁶⁹ and P < 2.02 × 10⁻⁷⁹, respectively). Though the CHD2 850 K episignature represents a higher proportion of interCGI (inter-CpG island) regions than the 450 K episignature (43% vs. 31%, respectively), both are enriched in interCGI regions relative to the background probe distribution (P < 2.26 × 10⁻¹²¹ and P < 9.17 × 10⁻¹⁴⁴).
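The probe-overlap and distance comparisons described above can be sketched by encoding each episignature as a signed vector over the union of all probes (+1 hypermethylated, −1 hypomethylated, 0 absent). This encoding is an assumption made for illustration; the published comparison may instead use mean methylation differences. In agglomerative clustering, the pair with the smallest Euclidean distance is joined first, so in this toy example the closest pair is the first to share a branchpoint.

```python
import itertools
import math


def compare_signatures(signatures):
    """Pairwise probe overlap and Euclidean distance between episignatures.

    signatures: {name: {probe_id: +1 (hyper) or -1 (hypo)}}.
    Overlap is reported relative to the first signature of each pair, so it is
    asymmetric when signatures differ in size (e.g. 86/200 probes).
    """
    probes = sorted(set().union(*signatures.values()))
    # Signed vector over the probe union; 0 where a signature lacks the probe.
    vectors = {name: [sig.get(p, 0) for p in probes] for name, sig in signatures.items()}
    overlap, distance = {}, {}
    for a, b in itertools.combinations(signatures, 2):
        shared = signatures[a].keys() & signatures[b].keys()
        overlap[(a, b)] = len(shared) / len(signatures[a])
        distance[(a, b)] = math.dist(vectors[a], vectors[b])
    return overlap, distance


sigs = {  # hypothetical toy signatures
    "A": {"cg1": 1, "cg2": 1, "cg3": -1, "cg4": 1},
    "B": {"cg1": 1, "cg2": 1, "cg3": -1, "cg5": -1},
    "C": {"cg1": -1, "cg6": 1, "cg7": 1},
}
overlap, distance = compare_signatures(sigs)
closest = min(distance, key=distance.get)
print(closest, overlap[closest], round(distance[closest], 3))  # ('A', 'B') 0.75 1.414
```

Including the sign means two signatures that share probes but change them in opposite directions end up far apart, which is why directionality matters alongside raw overlap.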
The CHD2 episignature is associated with differentially methylated regions
Since CHD2 encodes a chromatin remodeler that has been shown to regulate gene expression42,43, we investigated whether individual episignature probes are contained within larger DMRs between cases and controls. DMRs could potentially provide a link to downstream gene expression. We first investigated DMRs in an unbiased genome-wide manner by calling DMRs from the 850 K DNA methylation array data (n = 16 CHD2, n = 18 controls) using bumphunter44 and DMRcate45. We predicted 1684 DMRs from bumphunter and 963 DMRs from DMRcate. These DMRs were intersected, requiring an overlap in the same direction (hyper/hypo) of at least 50 bp, to derive a high-confidence DMR list of 712 overlapping regions (349 hyper, 363 hypo). Representative images of these DMRs are shown in Supplementary Fig. 28. These DMRs directly coincide with 86/200 (43%) CHD2 450 K episignature probes and an increased 90/200 (45%) CHD2 850 K episignature probes (Supplementary Fig. 29, Supplementary Data 8). Thus, the CHD2 episignature is characterized by DMRs, and this overlap increases by four probes for the CHD2 850 K episignature.
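The caller intersection can be expressed as a pairwise interval comparison. This Python sketch keeps a region when a bumphunter-style call and a DMRcate-style call overlap by at least 50 bp in the same direction and reports the shared interval. Reporting the intersection rather than the union of the two calls, the half-open coordinates, and the names Region and intersect_callers are assumptions made for illustration.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int       # 0-based, inclusive
    end: int         # 0-based, exclusive
    direction: str   # "hyper" or "hypo"


def overlap_bp(a: Region, b: Region) -> int:
    """Number of bases shared by two half-open intervals (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def intersect_callers(calls_a, calls_b, min_bp=50):
    """Keep regions called by both tools with >= min_bp overlap in the same direction.

    A simple O(n*m) scan, adequate for a few thousand DMRs per caller.
    """
    consensus = []
    for a in calls_a:
        for b in calls_b:
            if a.direction == b.direction and overlap_bp(a, b) >= min_bp:
                consensus.append(Region(a.chrom, max(a.start, b.start),
                                        min(a.end, b.end), a.direction))
    return consensus


bump = [Region("chr1", 1000, 1400, "hyper"), Region("chr1", 5000, 5300, "hypo"),
        Region("chr2", 100, 200, "hyper")]       # hypothetical calls
dmrcate = [Region("chr1", 1350, 1600, "hyper"), Region("chr1", 5100, 5500, "hyper"),
           Region("chr2", 120, 260, "hyper")]
for r in intersect_callers(bump, dmrcate):
    print(r)
```

The chr1 pair at 5000–5500 overlaps by 200 bp but is dropped because the two callers disagree on direction.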
Increased CpG resolution and genomic coverage of differentially methylated regions using whole genome-bisulfite sequencing
Due to limited genomic coverage, DNA methylation arrays can be skewed in their representation of CpGs across the genome, as evidenced by their tendency to bias gene set analyses46. To better understand the DMR landscape of CHD2 and investigate DMRs at higher CpG resolution, we performed whole-genome bisulfite sequencing (WGBS) with coverage of > 20,000,000 CpGs on three CHD2 trios and one singleton. We derived 11,019 DMRs from DSS47, 4078 DMRs from DMRcate48, and 3655 DMRs that overlap between both callers (2420 hyper and 1235 hypo). To determine the robustness of this approach, we manually inspected DMRs with a methylation difference of at least 20% (n = 207 DMRs, 146 hyper, 61 hypo) by examining the reads in all three trios in IGV and confirmed 169/207 DMRs, yielding a true call rate of 81.6%. Representative DMRs called from WGBS are shown in Supplementary Fig. 30. We then investigated the overlap of episignature probes with the WGBS DMRs with a methylation difference of at least 5% and found direct overlap with 76/200 (38%) CHD2 450 K episignature probes and an increased 94/200 (47%) CHD2 850 K episignature probes (Fig. 6D, Supplementary Fig. 29). Thus, considering the increased genomic coverage afforded by WGBS and increased DMRs, it is unsurprising that a higher proportion of CHD2 850 K episignature probes overlap with DMRs (Supplementary Fig. 29, Supplementary Data 8). Notably, for nearly all probes found within DMRs, those DMRs could be better visualized from the WGBS data due to the lack of probe coverage on the array. Thus, we have confirmed using an orthogonal approach with higher CpG coverage that the CHD2 episignature is characterized by DMRs.
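The overlap counts above reduce to asking, for each episignature probe, whether it lies inside any DMR whose absolute mean methylation difference is at least 0.05 (5%). A minimal sketch, assuming half-open DMR coordinates and hypothetical probe positions; the last line shows the arithmetic behind the manual-review call rate (169/207).

```python
def probes_in_dmrs(probe_positions, dmrs, min_delta=0.05):
    """Return episignature probes inside DMRs with |mean difference| >= min_delta.

    probe_positions: {probe_id: (chrom, pos)}.
    dmrs: iterable of (chrom, start, end, delta) with half-open [start, end)
    and delta as a fraction (0.05 = 5% methylation difference).
    """
    kept = [(c, s, e) for c, s, e, d in dmrs if abs(d) >= min_delta]
    return {
        probe
        for probe, (chrom, pos) in probe_positions.items()
        if any(chrom == c and s <= pos < e for c, s, e in kept)
    }


probes = {"cg1": ("chr1", 150), "cg2": ("chr1", 900),   # hypothetical positions
          "cg3": ("chr2", 40), "cg4": ("chr3", 10)}
dmrs = [("chr1", 100, 200, 0.12), ("chr1", 850, 950, -0.03), ("chr2", 0, 100, -0.08)]
hits = probes_in_dmrs(probes, dmrs)
print(sorted(hits), f"{len(hits)}/{len(probes)}")      # ['cg1', 'cg3'] 2/4

# Manual-review call rate reported above: 169 of 207 DMRs confirmed.
print(f"{100 * 169 / 207:.1f}%")                        # 81.6%
```

Probe cg2 sits inside a DMR but is not counted because that DMR's difference (3%) is below the 5% threshold.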
We further investigated the DMR calls by functionally annotating them with annotatr49. We first examined the representation of CpG islands, CpG shores, CpG shelves, and inter-CpG island (interCGI) regions among DMRs (Supplementary Fig. 31). Most DMRs called exclusively from WGBS are located in interCGI regions, in contrast to DMRs called from the array or from both platforms. This is likely because the array is biased toward gene-rich regions, whereas WGBS covers the genome more broadly. We also annotated DMRs with gene features (Supplementary Fig. 32) and found similar patterns across DMRs called by the 850 K array, WGBS, or both, especially for DMRs with a methylation difference of at least 5% between CHD2 and controls. We compared them with three independent background sets (Rep-1, Rep-2, Rep-3; Supplementary Data 9) of randomly generated regions of comparable number (n = 4767) and length (50–3100 bp). Relative to this background, the combined CHD2 episignature probes and DMRs (n = 4767) are enriched in gene regulatory regions (enhancers, promoters, and bivalent regions), transcription factor binding sites (TFBS), and DNase sites (Fig. 7B, C, D, Supplementary Data 10). Although our CHD2 episignature and DMR insights are limited to blood, this work supports further investigation of CHD2-associated methylation in brain-relevant tissue types, such as cultured neurons, brain organoids, or, when available, post-mortem tissue. Notably, we show that the global CHD2 episignature is characterized by DMRs (Fig. 7A, Supplementary Fig. 33) that are enriched in functional regions and are therefore poised to affect underlying disease biology.
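A background comparison of this kind can be sketched as follows. The exact procedure used to generate Rep-1 to Rep-3 is not specified here. This hypothetical version places one region per observed length uniformly at random, choosing chromosomes in proportion to their size. It applies no masking of assembly gaps or blacklisted regions. It then compares overlap fractions with an annotation track. All function names are illustrative.

```python
import numpy as np


def random_matched_regions(lengths, chrom_sizes, seed=0):
    """One random interval per input length; chromosomes drawn in proportion to their size.

    Returns (chrom, start, end) tuples in 0-based, half-open coordinates.
    """
    rng = np.random.default_rng(seed)
    names = list(chrom_sizes)
    sizes = np.array([chrom_sizes[c] for c in names], dtype=float)
    probs = sizes / sizes.sum()
    regions = []
    for length in lengths:
        chrom = names[rng.choice(len(names), p=probs)]
        # integers() excludes its upper bound, so the region never runs past the chromosome end
        start = int(rng.integers(0, chrom_sizes[chrom] - length + 1))
        regions.append((chrom, start, start + length))
    return regions


def fraction_overlapping(regions, features):
    """Fraction of regions overlapping at least one feature interval."""
    hits = sum(
        any(c == fc and s < fe and fs < e for fc, fs, fe in features)
        for c, s, e in regions
    )
    return hits / len(regions)


def fold_enrichment(observed, background, features):
    """Observed overlap fraction over background overlap fraction (background must hit at least once)."""
    return fraction_overlapping(observed, features) / fraction_overlapping(background, features)
```

Repeating the draw with different seeds gives independent background replicates, analogous in spirit to Rep-1 to Rep-3.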
Discussion
A major challenge in rare disease genetics is determining molecular causes in unsolved cases. Even if ES or comprehensive GS of trios identifies all de novo and recessively inherited coding and noncoding variants, prioritizing and functionally interpreting candidate variants is challenging. In the case of the DEEs, this difficulty is further compounded by immense phenotypic and genetic heterogeneity. Genome-wide DNA methylation analysis represents an innovative approach to discovering genetic etiologies by investigating rare DMRs and screening for DNA methylation signatures. Notably, rare DMRs and episignatures can be assessed with cost-effective, high-throughput DNA methylation arrays using blood-derived DNA. Here, we performed genome-wide DNA methylation analysis on 582 individuals with uDEEs and identified causal or candidate etiologies in 12 individuals: six from rare DMR analysis (Table 1) and six from episignature screening (Table 2). Thus, the diagnostic yield of genome-wide methylation analysis in individuals with uDEEs is 2%, similar to the added diagnostic yield of GS after ES or gene panel testing50,51. A study of unsolved ND-CA showed a similar 2-3% increase in diagnostic yield using episignature analysis52.
We have performed rare outlier DMR analysis of methylation array data for a cohort of individuals with uDEEs and uncovered various underlying DNA variants using ONT long-read sequencing. These include an X;13 translocation, CGG repeat expansions, and copy number variants. We first validated a subset of outlier DMRs using targeted EM-seq enriched for 3.98 M CpGs. EM-seq is a highly effective bisulfite-free, enzyme-based conversion method for detecting CpG methylation by sequencing. Targeted EM-seq has several advantages over bisulfite-based array approaches, including less DNA damage, lower input requirements (picograms of DNA), and detection of more CpGs53. All DMRs in this subset were confirmed by EM-seq, and the greater number of CpGs detected compared with the methylation array afforded higher resolution for interpreting DMRs. Future high-throughput DNA methylation analyses could consider using EM-seq for validation or discovery.
We report an individual with 26 outlier hypermethylation events along chr13q detected through the rare DMR analysis. Using ONT whole-genome long-read sequencing, we identified a de novo X;13 translocation, indicating that the hypermethylation pointed to the likely cause of disease. This discovery did not require live cellular material, which classical cytogenetic approaches typically need. This child passed away at 7 months of age due to the severity of the disease, and this approach provided a postmortem diagnosis using banked genomic material.
We also found that several individuals displayed hypermethylation of loci associated with known or novel CG-rich repeat expansions. These regions include the 5’UTR and intron 1 of the epilepsy candidate gene CSNK1E, the 5’UTR of the neurodevelopmental disorder gene DIP2B, and the 5’UTR of the uncharacterized gene BCLAF3. We report hypermethylation, a CGG repeat expansion, and reduced expression of CSNK1E in three unrelated individuals with uDEEs and a mildly affected mother. CSNK1E has been implicated in the circadian rhythm54,55, and variation causes familial advanced sleep phase syndrome (FASPS)56. A knockout mouse model also displays a rapid eye movement phenotype57. Interestingly, all of our probands with DEEs with CSNK1E hypermethylation and a repeat expansion, as well as the mildly affected mother, report sleep-related phenotypes (Supplementary Phenotype data). When our cohort is combined with previously reported data17, CSNK1E hypermethylation is enriched in individuals with DEEs compared with controls (Fisher’s exact P = 0.0276). This suggests that further studies are warranted to determine whether CSNK1E variation contributes to DEEs.
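The enrichment test above is a Fisher's exact test on a 2×2 table of hypermethylation status by group. The per-group counts are not given in this section, so the sketch below is shown only on textbook example tables. In practice, `scipy.stats.fisher_exact` provides the same two-sided test. This dependency-free version sums the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows could be cases/controls and columns hypermethylated/not hypermethylated.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def p_table(x: int) -> float:
        # hypergeometric probability that the top-left cell equals x given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p_table(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # small relative tolerance so tables tied with the observed one are included
    return sum(p for p in map(p_table, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Textbook example: "lady tasting tea" table [[3, 1], [1, 3]]
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```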
One male proband with uDEE displayed de novo outlier hypermethylation in a region annotated as intergenic on the GRCh37/hg19 genome build and at the 5’UTR of BCLAF3 on the GRCh38/hg38 genome build. Using ONT long-read sequencing, we discovered a novel CGG repeat expansion in exon 1 of BCLAF3 in this proband, inherited from his unaffected mother. The mother’s long-read data displayed skewed X-inactivation against the expanded allele. Skewed X-inactivation may explain why the mother does not have a detectable DNA methylation abnormality at this locus and could provide a mechanism for her to circumvent any functional consequences of the BCLAF3 abnormality. While BCLAF3 has been previously predicted to be a potential disorder-associated gene on chrX58, little is known about its function or disease associations. Thus, further work is needed to investigate whether this abnormality is present in other individuals and whether loss of this X-linked gene in males could cause a DEE.
Seven individuals displayed DMRs due to underlying CNVs, one of which we found to be likely pathogenic. In a proband with uDEE, hypermethylation of the STX1B TSS and promoter led to the identification in GS data of a 1784 bp heterozygous deletion ~ 65 bp away. This deletion was confirmed in an affected sister, affected mother, and affected grandmother. It encompasses the promoter region, the TSS, exon 1, and part of intron 1 of STX1B, resulting in probable loss of function. Importantly, the deletion is unlikely to be detected by standard microarray approaches because of its small size. It may also escape gene panels and exome sequencing, which do not cover the noncoding portions. DNA methylation served as a signpost of the cause of this family’s epilepsy and led to the identification of a pathogenic variant.
We performed episignature screening of our uDEE cohort using the EpiSign™ v4 classifier, which contains 90 episignatures representing 70 disorders encompassing 96 genes/genomic regions. Seven individuals with uDEEs had positive episignatures concordant with their phenotypes. We reviewed or reanalyzed available or newly generated ES or GS data and identified pathogenic variants in the episignature-associated genes in 6/7 individuals. In the individual with a pathogenic SETD1B variant, the father was unavailable for genetic testing to establish segregation of the variant. Thus, the positive episignature finding provided supportive evidence for the genetic diagnosis in lieu of inheritance data. Episignatures can be used to screen for disorders with broad, overlapping phenotypes. They can also identify individuals who lack the classical features of specific neurodevelopmental syndromes or DEEs. For instance, most DEEs have a phenotypic spectrum, so individuals with different etiologies, developmental trajectories, or subtle dysmorphic features may escape diagnosis until a molecular etiology is found.
The 27 most frequently implicated genetic causes of DEEs explain 80% of DEEs7. However, only 1 of these 27 genes (CHD2) has a clinically validated episignature. Like CHD2, 58/59 genes with robust episignatures localize to the nucleus and are associated with DNA binding, transcriptional regulation, and histone interactions. Since DNA methylation occurs in the nucleus, most genes for which episignatures have been derived are directly or indirectly involved in the epigenetic and transcriptional machinery. By contrast, the top 27 DEE genes are associated with a range of cellular processes5. Only a minority are associated with direct DNA interactions, and only 10 of the 27 are annotated to localize, at least partially, to the nucleus. The only gene with a clinically validated episignature that is not involved in any nuclear activity is SLC32A1. This gene encodes solute carrier family 32 member 1 (SLC32A1, MIM:616440), which is responsible for inhibitory neurotransmission, and variants in this gene cause a DEE59. Unfortunately, SLC32A1 is not among the ~60 most common DEE genes. Therefore, the diagnostic utility of episignatures for DEEs will increase as episignatures are confidently derived for more DEE genes, such as ion channel, synaptic transmission, and metabolic genes.
Episignature derivation is further complicated by variant-specific episignatures that exist for a subgroup of variants within a gene (e.g. SMARCA229,60). It is also complicated by shared episignatures for a set of genes within similar pathways, for example the Coffin-Siris syndrome episignature, due to variants in ARID1A (MIM: 603024), ARID1B (MIM: 614556), SMARCB1 (MIM: 601607), SMARCA4 (MIM: 603254), and SOX11 (MIM: 600898)60. Thus, there is a need not only to derive episignatures for more epilepsy-related genes but also to stratify variants by type (e.g. missense, nonsense) and protein domain, which may segregate with phenotypes. For instance, our cohort included two females with solved DEEs and pathogenic truncating variants in the X-linked gene SMC1A. Neither had a positive SMC1A episignature for Cornelia de Lange syndrome (CdLS). CdLS is usually caused by missense variants or in-frame small indels proposed to have a dominant-negative effect. In contrast, truncating loss-of-function SMC1A variants are found exclusively in girls with DEEs. This difference in underlying disease mechanism likely affects the composition of the distinct probe sets contained within the episignatures. Discordant or unusual findings like this example underscore additional considerations when deriving and interpreting episignatures. We also encountered five individuals reported as male whose methylation pattern on the X chromosome suggested two X chromosomes. In the two of these five individuals who underwent LRS, an XXY karyotype was confirmed, consistent with a diagnosis of Klinefelter syndrome. More unexpected and incidental findings will arise as more episignatures are derived and methylation testing becomes more routine.
Episignatures for many epilepsy-related genes are currently in development. As more episignatures are clinically validated, re-analysis of previously generated methylation array data from unsolved individuals is expected to identify additional pathogenic findings. This parallels the re-analysis of exome sequencing data for new epilepsy genes years after the initial sequencing61. Episignature analysis was also useful for clarifying VUSs. For example, one individual annotated as solved with a CHD2 VUS had a negative CHD2 episignature result, and the variant was re-assessed as benign. We anticipate that episignatures will also be useful for interpreting the impact of noncoding variants.
There are additional considerations when determining the utility of DNA methylation analysis for the molecular diagnosis of individuals with DEEs. Firstly, the diagnostic utility will vary depending on when the individual receives the test relative to other genetic testing modalities. In our study, we analyzed DNA from individuals with DEEs who had remained unsolved after undergoing extensive genetic testing, including gene panels, microarrays, exome, and genome sequencing. As DNA methylation testing becomes increasingly accessible to newly diagnosed individuals with DEEs and as the number of epilepsy-relevant genes with robust episignatures grows, the utility of DNA methylation analysis in uDEEs may increase and guide which regions should be sequenced to identify causal variants.
DNA methylation information can be readily assessed from both ONT and PacBio long-read sequencing data. Therefore, as long-read sequencing becomes more widely available, there is potential for an “all-in-one” approach to genetic testing in which individuals are simultaneously assessed for sequence variants, structural abnormalities, and rare DNA methylation changes. It is advantageous to study rare DMRs and their potential underlying DNA defects with the same technology. However, it is uncertain whether episignatures can be applied to long-read sequencing data, and new computational approaches may be needed to re-derive and validate episignatures on each platform. Long-read sequencing interrogates far more CpGs than arrays (> 20,000,000 CpGs versus ~ 850,000 CpGs), so it will offer an opportunity to study DNA methylation more broadly and deeply.
As advances in sequencing technologies make DNA methylation datasets larger, comparative data from controls will be needed to generate population-level reference information. For our DMR analysis, we leveraged 450 K DNA methylation array outlier DMR calls generated from peripheral blood-derived DNA for > 23,000 control individuals17. Where possible, we used these data to approximate population frequencies for the DMRs we derived. However, this reference information is not available for DMRs in regions covered only by the 850 K array or for DMRs called from whole-genome sequencing. Thus, interpretation of DNA methylation data for uDEEs and other unsolved genetic disorders will improve as we understand more of the methylome, including regions only recently resolved in the T2T genome assembly62, using appropriate reference datasets from diverse populations.
While episignatures provide a robust readout of the genetic etiology, they are composed of individual array probes representing single CpG sites that may not, on their own, inform the underlying disease mechanism. CHD2 is the most frequent DEE gene with a robust episignature and acts as a chromatin remodeler. We therefore sought to use the episignature to understand how DNA methylation relates to CHD2 pathophysiology. First, we re-defined the episignature using exclusively 850 K array probes and a sample size increased from n = 9 to n = 29 individuals with CHD2 pathogenic variants. Using the DNA methylation array and WGBS, we show that the CHD2 episignature is associated with DMRs between cases and controls. In a recent study, investigators derived DMRs for individuals with pathogenic HNRNPU (MIM:617391) variants versus controls from peripheral blood-derived methylation array data. They reported 19 DMRs called with DMRcate (Fisher P < 0.01, betacutoff = 0.05, minCpG = 5)63. Under the same conditions, DMRcate called 474 DMRs from our CHD2 versus control methylation array data. This larger number of DMRs may reflect the function of CHD2 as a chromatin remodeler that interacts directly with DNA, whereas HNRNPU forms complexes with RNA. Furthermore, a subset of CHD2 episignature probes overlap with DMRs in the TSS/5’UTR of developmentally relevant genes and might regulate their expression (Supplementary Data 8). For instance, a cluster of hypermethylated probes from the CHD2 450 K and 850 K episignatures lies within a larger hypermethylated DMR in the TSS and 5’UTR of HOXA4 (Fig. 7A). However, HOXA4 is not expressed in blood, so differential methylation at this locus would not be expected to affect its expression there.
Thus, we have shown that CHD2 is associated with DMRs in the blood that correspond with the episignature and are enriched in functional regions (enhancers, promoters, bivalent regions, TFBS, and DNase sites). Our work suggests that future studies should investigate the CHD2 episignature in disease-relevant tissue types where DMRs are likely to contribute directly to gene dysregulation and disease pathogenesis.
Here, we have used various DNA methylation analyses to identify causative and candidate etiologies in 2% of our cohort of 582 individuals with uDEEs. DNA methylation does not explain the majority of DEEs. However, the yield of methylation array analysis is comparable to the current added utility of GS50,51, and it remains a low-cost approach that can detect missed genetic etiologies and propose new molecular candidates. Importantly, this yield is expected to increase over time as we interrogate the functional consequences of rare DMRs and better understand which genes and pathways exhibit episignatures, including by resolving inconclusive episignature results. We have also investigated the episignature for the DEE gene CHD2 in depth and have provided evidence that it is associated with DMRs. These DMRs are enriched in functional regions and may affect gene expression, especially in disease-relevant tissue types. Furthermore, CHD2 episignatures and associated DMRs may serve as a biomarker readout for therapeutic testing, as the DNA methylation changes might be reversed with targeted treatment. Thus, our work highlights the value of investigating DNA methylation in DEEs, both for the genetic diagnosis of unsolved cases and for improving our understanding of disease mechanisms toward the development of targeted therapies.
Methods
Cohorts
Our cohorts consist of 593 affected individuals (43% female) with uDEEs and 475 healthy controls (47% female) (Fig. 1B, Supplementary Data 1A, Supplementary Methods). An additional 148 analytical controls (60% female) were included for validation. Individuals with DEEs were recruited from investigators’ research and clinical programs64,65. Methylation array data for healthy controls were drawn from a public database66 (n = 111), an internal institutional database (SJLIFE, n = 335), and unaffected parents or siblings (n = 29) of probands with DEEs (Supplementary Methods). Eight family members with epilepsy were studied to identify familial methylation patterns (shared rare DMRs or episignatures). Analytical controls, including i) six individuals each with a disease-associated rare DMR, ii) 26 individuals with a pathogenic variant in a gene or CNV associated with an episignature, and iii) 116 individuals with a pathogenic variant in a gene without a known episignature, were used to validate positive and negative rare DMR and episignature findings in the DEE cohort. After quality control and normalization (described below), there were 582 remaining individuals with uDEEs (43% female) who had undergone extensive molecular testing: 79% (458 individuals) had a gene panel, 51% (298 individuals) microarray or karyotype analysis, 75% (435 individuals) ES, and 40% (232 individuals) GS. Collectively, 97% (562 individuals) had at least one sequence-based investigation (gene panel, ES, or GS). There were also 461 healthy controls (47% female), 143 analytical controls (57% female), and eight affected family members for DNA methylation analysis. This study was approved by the Institutional Review Board (IRB) of St. Jude Children’s Research Hospital (SJCRH). 
Written informed consent was provided by parents or legal guardians of individuals with DEEs with local IRB approval from SJCRH, Austin Health (Australia), the University of Washington (UW), and the National Institutes of Health (NIH). For any photographs shown in the supplement, we affirm that the patients and representatives have consented to open-access publication and have seen the photos in the context of the publication.
Methylation array
All data were from peripheral blood-derived DNA, except for five analytical control samples used for outlier DMR analysis. These were saliva-derived DNA from one female individual with BSS and her mother (a carrier), and lymphoblastoid cell line (LCL)-derived DNA from three individuals (two males and one female) with Fragile X syndrome (Coriell). These samples were used as positive controls to validate the outlier analysis and were then removed from the final analysis to minimize potential cell type differences. DNA was extracted from peripheral blood samples using standard protocols, and approximately 250–500 ng of DNA was bisulfite converted. Samples were run on Illumina Infinium MethylationEPIC v1.0 (850 K array) BeadChip arrays, processed according to the manufacturer’s protocol. These arrays interrogate > 850,000 individual CpG sites, including CpG islands, promoter regions, gene bodies, FANTOM5 enhancers, and proximal ENCODE regulatory elements67.
Of the 1224 individuals included, three were run in triplicate and 29 in duplicate, producing a total of 1259 blood-derived DNA methylation array samples before quality control and processing. Each sample contained data for > 850,000 probes. Quality control was designed to remove outlier samples rather than outlier regions of interest. All data were combined and loaded into the R package minfi68 for quality control and normalization. We removed samples of poor quality (> 1% of probes failed) and samples deemed outliers on manual inspection of the principal component analysis (PC1 and PC2) of β values for probes on chromosome (chr) 1 (Supplementary Fig. 1). Individual CpG probes that failed (detection p > 0.01) in > 10% of samples were removed, as were probes overlapping common SNPs and those previously reported as cross-reactive67,69. Since samples were run in multiple batches and at different institutions, we visually examined the PCA plot for batch effects. The only batch effect observed was on PC1, between the SJLIFE unaffected control cohort and all other samples (including both cases and controls). We used the ComBat method in the R package SVA70 for batch correction and confirmed that the batch effect was eliminated (Supplementary Fig. 1, Supplementary Data 1B)71. We estimated blood cell type composition for six cell types (CD8T, CD4T, NK, B-cell, monocytes, and granulocytes) from the β values of each sample72. Samples with outlier cellular fractions, defined as ≥ 99th percentile + 2% or ≤ 1st percentile − 2% for at least two of the six cell types, were also removed. Methylation array intensity values on the sex chromosomes (X, Y) were used to infer sample sex, which was compared with the clinically reported sex. Samples with sex mismatches were removed.
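The sample- and probe-level filters above can be sketched with NumPy. This is a hypothetical illustration rather than the minfi code used. It assumes a probes × samples matrix of detection p-values. It drops samples before probes, because the order of these steps is not stated above. It also reads "2%" in the cell-type rule as an absolute margin of 0.02 on the estimated proportion. Function names are illustrative.

```python
import numpy as np


def detection_qc(detection_p, p_cut=0.01, max_sample_fail=0.01, max_probe_fail=0.10):
    """Sample and probe masks from a (n_probes, n_samples) matrix of detection p-values.

    Samples with > 1% failed probes are dropped first; probes failing in > 10% of
    the remaining samples are then dropped (the step order is an assumption).
    """
    failed = detection_p > p_cut
    keep_samples = failed.mean(axis=0) <= max_sample_fail
    keep_probes = failed[:, keep_samples].mean(axis=1) <= max_probe_fail
    return keep_samples, keep_probes


def celltype_outliers(fractions, margin=0.02, min_types=2):
    """Flag samples outside [1st pct - margin, 99th pct + margin] for >= min_types cell types.

    fractions: (n_samples, n_cell_types) estimated cell proportions.
    """
    hi = np.percentile(fractions, 99, axis=0) + margin
    lo = np.percentile(fractions, 1, axis=0) - margin
    n_extreme = ((fractions >= hi) | (fractions <= lo)).sum(axis=1)
    return n_extreme >= min_types
```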
Samples were separated into inferred sex (males and females) for all downstream analyses of sex chromosomes. Quality control and filtering left 1226 samples across 1194 individuals (26 individuals in duplicate and three individuals in triplicate across batches) assayed by the 850 K array and 793,009 probes (775,431 autosomal probes and 17,578 sex chromosome probes) (Supplementary Data 1C).
Identification and annotation of rare epivariants
To identify outlier DMRs, we used a sliding window approach as previously described17. In brief, this algorithm uses user-defined quantile thresholds to identify outlier β values across multiple CpG sites. Within a 1 Kb window, at least three consecutive CpG sites must show outlier β values in the same direction (hyper or hypo) in a sample, relative to the rest of the cohort, for the region to be considered an outlier DMR. For the autosomes (chr1–chr22), we considered β values above the 99.25th percentile plus 0.15 as hypermethylated and those below the 0.75th percentile minus 0.15 as hypomethylated. Because samples were split by inferred sex (males and females) for analysis of the sex chromosomes, the thresholds were adjusted accordingly. They were set to the 99th percentile plus 0.15 for hypermethylation and the 1st percentile minus 0.15 for hypomethylation. Samples with more than 100 rare autosomal DMRs in the initial analysis (n = 7) were removed from the final analysis, as such counts are thought to be artifactual and may interfere with real signal. To inform functional interpretation, DMRs were then annotated using HOMER73. Annotations included overlap with UCSC RefSeq gene bodies and promoter regions (defined as ± 2 Kb of the transcription start site, TSS), known CpG islands (CGIs), and repetitive elements (RepeatMasker and SimpleRepeats). They also covered imprinting control centers74, CTCF-binding sites75, gene molecular function73, OMIM phenotypes76, and average brain expression from GTEx Portal bulk RNA-seq data. Finally, in-house epilepsy- and candidate-gene lists were used to prioritize candidates but not as exclusion criteria. Additionally, a recent study delineated the rare DMR landscape in the human population using 450 K methylation array data from > 23,000 individuals17. Where possible, our DMRs were checked against regions from those data to determine how frequently each DMR occurs in the population.
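The thresholding and run-calling logic can be sketched in Python. This is a simplified reimplementation written for illustration, not the published script17 or the MethylMiner code. It assumes probes from a single chromosome sorted by position and no missing β values. It interprets the 1 Kb window as the maximum span from the first to the last CpG of a run. Function names are hypothetical.

```python
import numpy as np


def outlier_flags(beta, hi_q=0.9925, lo_q=0.0075, delta=0.15):
    """Flag outlier beta values per probe relative to the whole cohort.

    beta: (n_probes, n_samples) array for one chromosome, probes sorted by position.
    Returns +1 (hypermethylated outlier), -1 (hypomethylated outlier) or 0.
    """
    hi = np.quantile(beta, hi_q, axis=1, keepdims=True) + delta
    lo = np.quantile(beta, lo_q, axis=1, keepdims=True) - delta
    flags = np.zeros(beta.shape, dtype=int)
    flags[beta > hi] = 1
    flags[beta < lo] = -1
    return flags


def call_outlier_dmrs(positions, flags, min_cpgs=3, window=1000):
    """Report runs of >= min_cpgs consecutive same-direction outlier CpGs spanning <= window bp.

    Returns (sample index, direction, first CpG position, last CpG position, n CpGs).
    """
    calls = []
    n_probes, n_samples = flags.shape
    for s in range(n_samples):
        i = 0
        while i < n_probes:
            d = flags[i, s]
            if d == 0:
                i += 1
                continue
            j = i
            # extend while the next CpG is an outlier in the same direction and within the window
            while (j + 1 < n_probes and flags[j + 1, s] == d
                   and positions[j + 1] - positions[i] <= window):
                j += 1
            if j - i + 1 >= min_cpgs:
                calls.append((s, "hyper" if d > 0 else "hypo",
                              int(positions[i]), int(positions[j]), j - i + 1))
            i = j + 1
    return calls
```

For the sex chromosomes, analyzed within one inferred sex, the same functions would be called with `hi_q=0.99` and `lo_q=0.01`. Counting calls per sample then supports the exclusion of samples with more than 100 DMRs.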
Based on this annotation information, DMRs were prioritized by four features: (1) a low or negligible population frequency; (2) a well-annotated genomic location, such as in or near known epilepsy and candidate genes; (3) recurrence in multiple individuals; and (4) manual inspection of DMRs, including flanking regions.
Development of a DNA methylation array analysis and visualization pipeline
We developed MethylMiner, a methylation array analysis pipeline with interactive data visualization, tailored toward discovering rare epivariants. The pipeline requires standard input files: raw signal .idat files containing each sample’s green and red channels, and a metadata sheet. The metadata sheet includes sample names, Sentrix IDs, reported sample sex, and sample group (if applicable). In brief, the pipeline performs quality control and normalization as described above. It then outputs quality control reports, β values, M-values, and bigWig files for quick and convenient visualization in the Integrative Genomics Viewer (IGV)77. The pipeline next performs the outlier DMR analysis based on user-defined quantile thresholds, using scripts derived from the GitHub repository https://github.com/AndyMSSMLab/Methylation_script. It outputs the DMRs and annotations as a tabulated sheet. This annotated list of DMRs is then used as input for interactive data visualization in JupyterDash, which allows users to interact with plots of quality control metrics, DMR annotations, and DMR genomic tracks. Static DMR plots, like those displayed throughout this manuscript, were created using the <AndyMSSMLab/Methylation_script/blob/main/plotDMR.R> script. The MethylMiner pipeline is hosted on our GitHub page (https://github.com/stjude-biohackathon/MethylMiner).
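One way to produce per-sample IGV tracks like those described is to write β values as bedGraph and convert them with UCSC's `bedGraphToBigWig`. This is an assumption about how such files could be made, not a description of MethylMiner's internals. The helper names are hypothetical, and probe positions are assumed to be 1-based CpG coordinates.

```python
import math
from typing import Iterable, List, Tuple


def bedgraph_lines(probes: Iterable[Tuple[str, int]], betas: Iterable[float]) -> List[str]:
    """Convert (chrom, 1-based CpG position) probes and beta values to sorted bedGraph lines."""
    rows = sorted(
        (chrom, pos, beta)
        for (chrom, pos), beta in zip(probes, betas)
        if not math.isnan(beta)  # skip probes masked during QC
    )
    # bedGraph uses 0-based, half-open intervals, so a CpG at 1-based p spans [p - 1, p)
    return [f"{chrom}\t{pos - 1}\t{pos}\t{beta:.4f}" for chrom, pos, beta in rows]


def write_bedgraph(path: str, probes, betas) -> None:
    """Write one sample's beta values to a headerless bedGraph file."""
    with open(path, "w") as fh:
        fh.writelines(line + "\n" for line in bedgraph_lines(probes, betas))
```

The file could then be converted with `bedGraphToBigWig sample.bedGraph chrom.sizes sample.bw`. Rows are sorted by chromosome name and then position, as that tool requires, and no track header is written.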
Validation of outlier DMRs using enzymatic methyl-sequencing
We performed targeted enzymatic methyl-sequencing (targeted EM-seq) with enrichment using the Twist Human Methylome Panel, which targets 3.98 M CpGs across 123 Mb of genomic content. Targeted EM-seq of peripheral blood-derived DNA was used to validate a subset of outlier DMRs. This subset included n = 2 positive control DMRs (XYLT1 and FMR1) and n = 29 DMRs of interest called among n = 6 individuals with uDEEs and n = 4 family members. EM-seq library preparation, target enrichment, and sequencing were performed using standard protocols53. Reads were processed using the nf-core/methylseq pipeline with the --em_seq parameter. For detailed EM-seq methods, please refer to the Supplementary Methods.
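To compare EM-seq with the array at a given DMR, methylation can be summarized over the region from per-CpG calls. The sketch below assumes calls in Bismark's coverage format: chromosome, 1-based start, end, percent methylated, methylated count, and unmethylated count. nf-core/methylseq produces this format when run with its Bismark aligner. The function name and the 10-read minimum depth are hypothetical choices.

```python
import math


def region_methylation(cov_lines, chrom, start, end, min_depth=10):
    """Read-weighted methylation level over CpGs in [start, end] (1-based, inclusive).

    cov_lines: tab-separated Bismark coverage records
    (chrom, start, end, percent methylated, count methylated, count unmethylated).
    CpGs covered by fewer than `min_depth` reads are ignored.
    """
    meth = unmeth = 0
    for line in cov_lines:
        c, pos, _end, _pct, m, u = line.rstrip("\n").split("\t")
        pos, m, u = int(pos), int(m), int(u)
        if c == chrom and start <= pos <= end and m + u >= min_depth:
            meth += m
            unmeth += u
    total = meth + unmeth
    return meth / total if total else math.nan
```

Weighting by read counts (pooling methylated and unmethylated reads) rather than averaging per-CpG percentages keeps poorly covered CpGs from dominating the regional estimate.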
Identification of structural variants with long-read sequencing
We used both targeted and whole-genome LRS on the ONT platform to validate rare DMRs and identify candidate disease-causing variants at or near the site of interest (Supplementary Data 2A). Targeted LRS using the “read-until” function was performed on an ONT GridION using a single R9.4.1 flow cell as described previously78. At least 100 Kb of sequence was added to either side of the target region for capture. Libraries for GS were prepared using the ligation sequencing kit (SQK-LSK110) following the manufacturer’s instructions. They were then loaded onto a single flow cell (FLO-PRO110, R9.4.1) on a PromethION and run for 72 h with one wash and reload. All data were base called using Guppy 6.3.2 (ONT) with the super-accuracy model, including 5mC methylation calling. Reads were aligned to GRCh38/hg38 using minimap279, and SNPs and indels were called using Clair380. Structural variants were called using Sniffles81, SVIM82, and CuteSV83, and phasing was performed using LongPhase84. Aligned and phased BAM files were visualized in IGV77.
Episignature testing
Data were blinded and submitted to the clinical bioinformatics laboratory [Molecular Diagnostics Laboratory, London Health Sciences Centre (LHSC), Western University, London, Canada] through a secure file transfer protocol and stored on encrypted servers. The data analysis pipeline was adapted from previously described methods25, as summarized in Fig. 1A. Importantly, several probe classes were removed67,69: probes with a detection p-value > 0.01, probes on the X and Y chromosomes, probes containing SNPs at the CpG interrogation or single-nucleotide extension sites, and probes known to cross-react with other genomic locations. DNA methylation data for each sample were compared with clinically validated DNA methylation signatures for all disorders included in the EpiSign™ v4 clinical test85. The reference database, the EpiSign™ Knowledge Database (EKD), includes thousands of clinical peripheral blood DNA methylation profiles from disorder-specific reference samples and normal controls (general population samples of various ages and racial backgrounds). DNA methylation data for each individual were compared with the EKD using the support vector machine (SVM)-based classification algorithm for EpiSign™ disorders. A Methylation Variant Pathogenicity (MVP) score between 0 and 1 was generated to represent the confidence of prediction for the specific disorder the SVM was trained to detect. SVM decision values were converted to these scores using the Platt scaling method86.
Classification for a specific EpiSign™ disorder was based on a combination of the MVP score, hierarchical clustering, and multidimensional scaling (MDS) of an individual’s methylation data relative to the disorder-specific EpiSign™ probe sets and controls. MVP scores > 0.5 were considered positive, < 0.1 negative, and 0.1–0.5 inconclusive (moderate confidence). A detailed description of this analytics protocol was published previously25,87. Possible results were positive (matching an EpiSign™ disorder), negative (not matching any EpiSign™ disorder), and inconclusive (described in detail in the Results).
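Platt scaling maps an SVM decision value f to a probability-like score with a fitted sigmoid, P = 1/(1 + exp(A·f + B)). The sketch below fits A and B by Newton's method with Platt's smoothed targets and then applies the MVP thresholds listed above. It illustrates the general technique and is not the EpiSign™ implementation. Function names are hypothetical, and the clinical call additionally relies on clustering and MDS, which are not modeled here.

```python
import numpy as np


def fit_platt(decision_values, labels, n_iter=50):
    """Fit P(y=1 | f) = 1 / (1 + exp(A*f + B)) to SVM decision values by Newton's method.

    Platt's smoothed targets keep A and B finite even when the training data are separable.
    """
    f = np.asarray(decision_values, dtype=float)
    y = np.asarray(labels, dtype=bool)
    n_pos, n_neg = y.sum(), (~y).sum()
    t = np.where(y, (n_pos + 1) / (n_pos + 2), 1 / (n_neg + 2))
    A, B = 0.0, float(np.log((n_neg + 1) / (n_pos + 1)))
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(np.clip(A * f + B, -500, 500)))
        r = t - p                  # d(negative log-likelihood) / d(A*f + B)
        w = p * (1 - p) + 1e-12    # second derivative, plus a tiny ridge
        grad = np.array([np.sum(r * f), np.sum(r)])
        hess = np.array([[np.sum(w * f * f), np.sum(w * f)],
                         [np.sum(w * f), np.sum(w)]])
        A, B = np.array([A, B]) - np.linalg.solve(hess, grad)
    return A, B


def platt_probability(f, A, B):
    """Probability-like score for decision value(s) f."""
    return 1 / (1 + np.exp(A * np.asarray(f, dtype=float) + B))


def classify_mvp(score):
    """Apply the MVP thresholds: > 0.5 positive, < 0.1 negative, otherwise inconclusive."""
    if score > 0.5:
        return "positive"
    if score < 0.1:
        return "negative"
    return "inconclusive"
```

Platt fits the sigmoid on decision values from held-out data (for example, cross-validation) rather than on the SVM's own training predictions, so that the calibrated scores are not overconfident.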
Exome and genome sequencing
If sequencing data were already available for the individual on a collaborative research basis, these data were reviewed. If the data were unavailable, ES or GS was performed on peripheral blood-derived DNA using standard Illumina short-read sequencing techniques and bioinformatic approaches (Supplementary Methods). We validated potentially pathogenic variants with Sanger sequencing. We also confirmed sample identity and relatedness (e.g. trios) using PowerPlex short tandem repeat (STR) identification analysis.
RNA-sequencing and gene expression analysis
RNA was extracted using the Quick-RNA Miniprep Kit (Zymo Research) from dermal fibroblasts established from skin punch biopsies for Family 2 (n = 2) and Family 3 (n = 3), described in the Results. RNA-seq was performed using standard Illumina short-read sequencing practices (Supplementary Methods), and reads were processed using the nf-core/rnaseq pipeline. Adapter sequences were removed using Trim Galore!, and read quality was assessed with FastQC88. Reads were then aligned to a reference genome using the STAR aligner89, and transcript abundance was quantified using Salmon90. To identify gene “dropout,” the OUTRIDER algorithm36 was applied to RNA-seq data for Family 2 (proband and mother), Family 3 (proband and father), and Family 3 (mother and father) against a publicly available dataset of n = 139 fibroblast samples37. PCA showed no batch groupings, and genes with Fragments Per Kilobase of transcript per Million mapped reads (FPKM) < 1 were removed as lowly expressed. Results were considered significant at an adjusted p-value (padj) < 0.05 and a z-score cutoff of ± 2.
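A minimal Python sketch of the filtering applied to the OUTRIDER output, assuming per-gene FPKM values, adjusted p-values, and z-scores have already been computed. It does not reimplement OUTRIDER's autoencoder-based count model. The GeneResult record, the gene names, and the choice to treat |z| = 2 as passing are illustrative assumptions; the FPKM helper uses the standard formula, counting each read as one fragment for simplicity.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GeneResult:
    gene: str
    fpkm: float
    padj: float
    zscore: float


def fpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def aberrant_genes(results: Iterable[GeneResult], min_fpkm: float = 1.0,
                   padj_cutoff: float = 0.05, z_cutoff: float = 2.0) -> List[str]:
    # Drop lowly expressed genes first (FPKM < 1), then apply the significance thresholds.
    expressed = [r for r in results if r.fpkm >= min_fpkm]
    return [r.gene for r in expressed
            if r.padj < padj_cutoff and abs(r.zscore) >= z_cutoff]
```

For example, 500 reads on a 2 kb gene in a library of 10 million mapped reads give an FPKM of 25.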
Refinement of a CHD2 episignature
A total of 17 females and 12 males with genetic variants in CHD2 and clinical features consistent with CHD2-epileptic encephalopathy of childhood (EEOC) were included in this expanded 850 K cohort. The detailed list of genetic variants classified as pathogenic or likely pathogenic according to the American College of Medical Genetics guidelines is in Supplementary Data 2B. All samples and records were deidentified.
Details of the methylation data analysis and episignature refinement have been described previously25,52,60,91. Briefly, methylation signal intensities were imported into R 4.1.3 for analysis. Normalization was performed using the Illumina normalization method with background correction in minfi68. Probes located on the X and Y chromosomes, probes at known SNPs, and cross-reactive probes were excluded67,69. Samples in which more than 5% of probes failed (detection p > 0.1, calculated with the minfi package) were also removed. The genome-wide methylation density of all samples was examined, and principal component analysis (PCA) was performed to visualize the overall data structure of the batches and to identify outlier samples. All 29 samples passed quality control and were used for probe selection. The MatchIt package was used to randomly select controls matched for age, sex, and array type from the EKD at LHSC, as previously described25,92. The methylation level of each probe was calculated as the ratio of the methylated signal intensity to the sum of the methylated and unmethylated signal intensities (β-value), ranging from 0 (completely unmethylated) to 1 (fully methylated). β-values were converted to M-values by logit transformation, M = log2(β/(1 − β)), for linear regression modeling with the R package limma93 to identify differentially methylated probes (DMPs). The analysis was adjusted for blood cell-type composition estimated using the Houseman algorithm94; the estimated blood cell proportions were added to the model matrix of the linear models as confounding variables. The resulting test statistics were moderated using the eBayes function in limma, and p-values were corrected for multiple testing using the Benjamini–Hochberg (BH) method.
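The β-value, M-value, and Benjamini–Hochberg steps can be sketched in Python as follows. This is an illustration, not the authors' R code: it omits normalization, the limma linear model, eBayes moderation, and the cell-type covariates, and the small clamp applied before the logit (to keep β = 0 or 1 finite) is an assumption.

```python
import math
from typing import List, Sequence


def beta_value(methylated: float, unmethylated: float) -> float:
    """Methylated intensity over total intensity, in [0, 1]."""
    total = methylated + unmethylated
    if total <= 0:
        raise ValueError("total signal intensity must be positive")
    return methylated / total


def m_value(beta: float, eps: float = 1e-6) -> float:
    """Logit transform M = log2(beta / (1 - beta))."""
    # Assumption: clamp beta away from 0 and 1 so that M stays finite.
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

A β-value of 0.5 maps to M = 0, and β = 0.75 maps to M = log2(3) ≈ 1.58.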
Probe selection was then performed in three steps. First, the 1000 probes with the highest product of the mean methylation difference between case and control samples and the negative logarithm of the multiple-testing-corrected p-value from the linear modeling were selected. Second, a receiver operating characteristic (ROC) curve analysis was performed, and the 200 probes with the highest area under the ROC curve (AUC) were retained. Finally, probes with a pairwise Pearson correlation coefficient greater than 0.85 within case and control samples separately were removed (none of the 200 selected probes met this criterion). This resulted in 200 DMPs. These probes were used to construct a hierarchical clustering model using Ward’s method on Euclidean distance and an MDS model by scaling the pairwise Euclidean distances between samples.
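The three-step probe selection can be sketched in Python as below, assuming each probe's case and control β-values and its BH-adjusted p-value are already available. Several details are assumptions rather than statements of the published method: the absolute mean difference is used, AUC is made direction-agnostic as max(AUC, 1 − AUC), and when two probes are correlated the higher-ranked one is kept. Because the score is a product with a logarithm, the log base (log10 here) does not change the ranking.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Mann-Whitney estimate of the ROC AUC (ties count as one half)."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    a = wins / (len(cases) * len(controls))
    # Assumption: treat hypo- and hypermethylated probes symmetrically.
    return max(a, 1.0 - a)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_probes(probes: Dict[str, dict], n_first: int = 1000,
                  n_auc: int = 200, max_r: float = 0.85) -> List[str]:
    """probes maps a probe ID to {"cases": [...], "controls": [...], "padj": float}."""
    def score(pid: str) -> float:
        p = probes[pid]
        diff = abs(mean(p["cases"]) - mean(p["controls"]))
        # Clamp padj to avoid log10(0).
        return diff * -math.log10(max(p["padj"], 1e-300))

    # Step 1: rank by |mean difference| x -log10(padj).
    step1 = sorted(probes, key=score, reverse=True)[:n_first]
    # Step 2: keep the probes with the highest AUC (stable sort keeps step-1 order on ties).
    step2 = sorted(step1, key=lambda pid: auc(probes[pid]["cases"], probes[pid]["controls"]),
                   reverse=True)[:n_auc]
    # Step 3: drop probes with r > max_r to an already kept probe,
    # checking cases and controls separately.
    kept: List[str] = []
    for pid in step2:
        redundant = any(
            pearson(probes[pid][group], probes[k][group]) > max_r
            for k in kept for group in ("cases", "controls")
        )
        if not redundant:
            kept.append(pid)
    return kept
```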
Functional annotation and correlation of the CHD2 episignature
Functional annotation and episignature cohort comparisons were performed according to our published methods87. Briefly, heatmaps and circos plots were produced to assess the percentage of DMPs shared between the CHD2 episignature and other neurodevelopmental conditions on the EpiSign™ clinical classifier. Heatmaps were plotted using the R package pheatmap (version 1.0.12) and circos plots using the R package circlize (version 0.4.15)95. To determine the genomic locations of the DMPs, probes were annotated in relation to CpG islands (CGIs) and genes using the R package annotatr49 with AnnotationHub and the annotations hg19_cpgs, hg19_basicgenes, hg19_genes_intergenic, and hg19_genes_intronexonboundaries. CGI annotations included CGI shores (0–2 kb on either side of CGIs), CGI shelves (2–4 kb on either side of CGIs), and inter-CGI regions (all remaining regions). A chi-squared goodness-of-fit test was performed in R to assess whether the annotation distribution of the CHD2 cohort DMPs differed significantly from the background annotation distribution. P values were obtained for both annotation categories (genes and CGIs). To assess the relationship between the expanded 850 K-only CHD2 cohort and other EpiSign™ disorders, the distances and similarities between cohorts were analyzed using clustering methods and visualized in a tree-and-leaf plot. This analysis used the top 500 DMPs for each cohort, ranked by p-value; for cohorts with fewer than 500 DMPs, all DMPs were used. Tree-and-leaf plots, generated using the R package TreeAndLeaf96, also displayed the global mean methylation difference and the total number of DMPs identified for each cohort.
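The chi-squared goodness-of-fit comparison can be illustrated with a self-contained Python sketch. The authors ran the test in R; here the survival function of the χ² distribution is computed from its closed-form series for integer degrees of freedom, so no statistics library is needed. The category counts and background proportions used to exercise it are hypothetical.

```python
import math
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """P(X >= x) for X ~ chi-squared with integer df >= 1 (closed-form series)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!, with m = df / 2
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Q(m + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{m} y^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def chi2_goodness_of_fit(observed: Sequence[int],
                         background_proportions: Sequence[float]) -> Tuple[float, float]:
    """Compare observed DMP counts per annotation category with a background distribution."""
    n = sum(observed)
    scale = sum(background_proportions)
    expected = [n * p / scale for p in background_proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi2_sf(stat, len(observed) - 1)
```

For instance, 60/25/15 DMPs in three categories against a 50/30/20% background give χ² ≈ 4.08 on 2 degrees of freedom (p ≈ 0.13).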
Whole-genome bisulfite sequencing
Genomic peripheral blood-derived DNA from three CHD2 trios (proband and parents) and one CHD2 singleton (proband) (n = 10 samples in total) was bisulfite-converted and subjected to WGBS using standard Illumina short-read sequencing methods (Supplementary Methods). Reads were trimmed with Trim Galore! and aligned to the GRCh38/hg38 human reference genome using BSMAP 2.74. Methylation ratios were extracted from the BSMAP mapping results using methratio.py. Duplicate reads were removed, and CpG methylation from both strands was combined. Methylation ratios were also corrected for C/T SNPs, estimated from the G/A counts on the reverse strand.
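Combining CpG methylation from both strands amounts to summing the methylated and total read counts for the two cytosines of each CpG dinucleotide. The Python sketch below assumes a hypothetical record format of (chromosome, 1-based cytosine position, strand, methylated reads, total reads), in which the minus-strand cytosine sits one base after the plus-strand cytosine. It does not reproduce the methratio.py output format, duplicate removal, or the C/T SNP correction.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int, str, int, int]  # (chrom, pos, strand, n_methylated, n_total)


def combine_cpg_strands(records: Iterable[Record]) -> Dict[Tuple[str, int], Tuple[int, int, float]]:
    """Merge plus- and minus-strand counts of each CpG and recompute its methylation ratio."""
    merged: Dict[Tuple[str, int], List[int]] = defaultdict(lambda: [0, 0])
    for chrom, pos, strand, n_meth, n_total in records:
        # Key each CpG by its plus-strand cytosine position (hypothetical convention).
        key = (chrom, pos if strand == "+" else pos - 1)
        merged[key][0] += n_meth
        merged[key][1] += n_total
    # Sites with zero total coverage are dropped to avoid division by zero.
    return {key: (m, t, m / t) for key, (m, t) in merged.items() if t > 0}
```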
DMR calling of DNA methylation array and WGBS
We performed DMR analysis on Illumina 850 K EPIC methylation array data for 16 individuals with DEEs harboring pathogenic variants in CHD2 and 18 controls. Data were normalized using the functional normalization algorithm in minfi97, and DMRs were called with two independent R packages, bumphunter44 and DMRcate45. DMRs were defined as regions passing a significance threshold of p < 0.05 for bumphunter and Fisher’s multiple comparison P < 0.05 for DMRcate. A minimum of three CpGs and a mean methylation difference between CHD2 and controls of at least 5% in either the hyper- or hypomethylated direction were also required (bumphunter “cutoff” and DMRcate “betacutoff” = 0.05). For bumphunter, smoothing was applied, and the number of permutations for each condition was set to B = 1000. For DMRcate, default settings were used, with the Gaussian kernel bandwidth for smoothed-function estimation set to λ = 1000, meaning that significant CpGs more than 1000 nucleotides apart were assigned to separate DMRs.
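The effect of the λ = 1000 bandwidth together with the minimum-CpG and 5% thresholds can be illustrated with a simplified Python sketch that groups significant CpGs separated by no more than 1000 bp into candidate regions. This is not DMRcate's algorithm, which applies Gaussian kernel smoothing to per-CpG statistics before thresholding; the CpG record and example values are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, List, Tuple


@dataclass
class CpG:
    chrom: str
    pos: int
    significant: bool
    delta_beta: float  # mean beta in cases minus mean beta in controls


def group_dmrs(cpgs: Iterable[CpG], max_gap: int = 1000, min_cpgs: int = 3,
               min_delta: float = 0.05) -> List[Tuple[str, int, int, int, float, str]]:
    """Group significant CpGs no more than max_gap bp apart, then filter the regions."""
    sig = sorted((c for c in cpgs if c.significant), key=lambda c: (c.chrom, c.pos))
    regions: List[List[CpG]] = []
    current: List[CpG] = []
    for c in sig:
        # Start a new region on a chromosome change or a gap larger than max_gap.
        if current and (c.chrom != current[-1].chrom or c.pos - current[-1].pos > max_gap):
            regions.append(current)
            current = []
        current.append(c)
    if current:
        regions.append(current)

    dmrs = []
    for region in regions:
        d = mean(c.delta_beta for c in region)
        if len(region) >= min_cpgs and abs(d) >= min_delta:
            direction = "hyper" if d > 0 else "hypo"
            dmrs.append((region[0].chrom, region[0].pos, region[-1].pos, len(region), d, direction))
    return dmrs
```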
The methylCall data from WGBS, which comprise the total number of reads covering each CpG site and the number of methylated Cs at each site, were used to call DMRs between four individuals with DEEs caused by pathogenic CHD2 variants and six unaffected parents. First, CpG sites with less than 10× coverage and those on the sex chromosomes were removed. DMRs were then called from the WGBS methylCall data using two independent R packages, DMRcate48 and DSS47. DMRcate identifies and ranks the most differentially methylated regions across the genome, whereas DSS detects differentially methylated loci or regions from WGBS data. For DMRcate, the bandwidth scaling factor “C” was set to 50, as recommended for WGBS; DSS was run with default parameters. With each algorithm (with smoothing), DMRs were defined as regions of at least five CpGs with significance (Fisher’s multiple comparison P < 0.05) and a methylation difference of at least 5% in either the hyper- or hypomethylated direction (DSS “delta” and DMRcate “betacutoff” = 0.05) between cases and controls.
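The initial WGBS filtering step can be sketched in Python as follows. Keeping only sites that pass the coverage threshold in every sample is an assumption made so that all samples share the same CpG set; the sample names, data layout, and chromosome naming (chrX, chrY) are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int]
Calls = Dict[Site, Tuple[int, int]]  # site -> (n_methylated, n_total)


def filter_shared_cpgs(samples: Dict[str, Calls], min_coverage: int = 10,
                       excluded: Iterable[str] = ("chrX", "chrY")) -> Dict[str, Calls]:
    """Keep autosomal CpGs covered by at least min_coverage reads in every sample."""
    excluded_set = set(excluded)
    passing: Set[Site] = set()
    for idx, calls in enumerate(samples.values()):
        ok = {site for site, (_, total) in calls.items()
              if total >= min_coverage and site[0] not in excluded_set}
        # Intersect across samples (assumption: a site must pass in all samples).
        passing = ok if idx == 0 else passing & ok
    return {name: {site: calls[site] for site in sorted(passing)}
            for name, calls in samples.items()}
```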
The genomic locations of the DMRs output by the two callers were intersected, requiring a minimum overlap of 50 bp in the same direction, to reduce the false-positive rate. This yielded high-confidence DMR lists supported by two independent callers for both the array (bumphunter and DMRcate) and WGBS (DMRcate and DSS) data. For the final DMR list, the methylation difference between CHD2 and controls was averaged across both callers. DMRs were segmented by mean methylation difference between CHD2 and controls (5%, 10%, 15%, and 20%) for visualization and annotated with CpG elements (islands, shores, shelves) and gene regions (1–5 kb upstream of the TSS, promoters as < 1 kb upstream of the TSS, 5’UTRs, exons, introns, and 3’UTRs) using annotatr49. To avoid double-counting DMRs that span more than one CpG element (e.g., both a shore and a shelf), CpG annotations were adjusted for DMR size by calculating the representation of each CpG element as a fraction of the total DMR length. Details of the in-depth annotation and enrichment calculation of CHD2 episignature probes and DMRs for regulatory elements (bivalent regions, enhancers, promoters, TFBS, and DNase sites) can be found in the Supplementary Methods.
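The consensus and size-adjusted annotation steps can be sketched in Python. The sketch assumes 0-based, half-open coordinates, reports the overlapping interval of the two calls as the consensus region (the text does not specify whether the intersection or the union was kept), and averages the two callers' methylation differences. The fractional annotation divides each element's overlap by the DMR length, so a DMR spanning a shore and a shelf contributes fractions that sum to at most 1 rather than being counted twice. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class DMR:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    delta: float  # mean methylation difference, CHD2 minus control


def overlap_bp(a: DMR, b: DMR) -> int:
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def intersect_callers(calls_a: Sequence[DMR], calls_b: Sequence[DMR],
                      min_overlap: int = 50) -> List[DMR]:
    """Keep regions called by both tools with >= min_overlap bp in the same direction."""
    consensus = []
    for a in calls_a:
        for b in calls_b:
            same_direction = (a.delta > 0) == (b.delta > 0)
            if same_direction and overlap_bp(a, b) >= min_overlap:
                consensus.append(DMR(a.chrom, max(a.start, b.start), min(a.end, b.end),
                                     (a.delta + b.delta) / 2))
    return consensus


def fractional_annotation(dmr: DMR, features: Sequence[Tuple[str, str, int, int]]) -> Dict[str, float]:
    """Fraction of the DMR length falling in each labelled, non-overlapping feature."""
    length = dmr.end - dmr.start
    fractions: Dict[str, float] = {}
    for label, chrom, start, end in features:
        ov = overlap_bp(dmr, DMR(chrom, start, end, 0.0))
        if ov:
            fractions[label] = fractions.get(label, 0.0) + ov / length
    return fractions
```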
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Methylation array data for individuals with uDEEs and those with pathogenic variants in CHD2 who have given consent for data sharing are available through the Gene Expression Omnibus (GSE269416). Additional data requests can be directed to H.C.M.
Code availability
The methylation array analysis pipeline used in part of this study for epivariant detection can be accessed on GitHub: https://github.com/stjude-biohackathon/MethylMiner. Additional shell (bash) scripts created for this manuscript and used in the analysis can be found on the Mefford Laboratory GitHub: https://github.com/MeffordLab/2024_GenomeWideMethylationPaper. EpiSign™ is proprietary commercial software and is not publicly available.
References
Scheffer, I. E. et al. ILAE classification of the epilepsies: Position paper of the ILAE Commission for Classification and Terminology. Epilepsia 58, 512–521 (2017).
Oliver, K. L. et al. Genes4Epilepsy: An epilepsy gene resource. Epilepsia 64, 1368–1375 (2023).
Poke, G., Stanley, J., Scheffer, I. E. & Sadleir, L. G. Epidemiology of Developmental and Epileptic Encephalopathy and of Intellectual Disability and Epilepsy in Children. Neurology 100, e1363–e1375 (2023).
Palmer, E. E. et al. Integrating exome sequencing into a diagnostic pathway for epileptic encephalopathy: Evidence of clinical utility and cost effectiveness. Mol. Genet. Genom. Med. 6, 186–199 (2018).
McTague, A., Howell, K. B., Cross, J. H., Kurian, M. A. & Scheffer, I. E. The genetic landscape of the epileptic encephalopathies of infancy and childhood. Lancet Neurol. 15, 304–316 (2016).
Sanchez Fernandez, I., Loddenkemper, T., Gainza-Lein, M., Sheidley, B. R. & Poduri, A. Diagnostic yield of genetic tests in epilepsy: A meta-analysis and cost-effectiveness study. Neurology 92, e418–e428 (2019).
Symonds, J. D. & McTague, A. Epilepsy and developmental disorders: Next generation sequencing in the clinic. Eur. J. Paediatr. Neurol. 24, 15–23 (2020).
Mefford, H. C. et al. Rare copy number variants are an important cause of epileptic encephalopathies. Ann. Neurol. 70, 974–985 (2011).
EpiPM Consortium. A roadmap for precision medicine in the epilepsies. Lancet Neurol. 14, 1219–1228 (2015).
Bayat, A., Bayat, M., Rubboli, G. & Moller, R. S. Epilepsy Syndromes in the First Year of Life and Usefulness of Genetic Testing for Precision Therapy. Genes (Basel) 12, 1051 (2021).
D’Gama, A. M. et al. Evaluation of the feasibility, diagnostic yield, and clinical utility of rapid genome sequencing in infantile epilepsy (Gene-STEPS): an international, multicentre, pilot cohort study. Lancet Neurol. 22, 812–825 (2023).
Sheidley, B. R. et al. Genetic testing for the epilepsies: A systematic review. Epilepsia, https://doi.org/10.1111/epi.17141 (2021).
Kohler, J. N., Turbitt, E. & Biesecker, B. B. Personal utility in genomic testing: a systematic literature review. Eur. J. Hum. Genet. 25, 662–668 (2017).
Jeffrey, J. S. et al. Developmental and epileptic encephalopathy: Personal utility of a genetic diagnosis for families. Epilepsia Open 6, 149–159 (2021).
Swartwood, S. M. et al. Early genetic testing in pediatric epilepsy: Diagnostic and cost implications. Epilepsia Open, https://doi.org/10.1002/epi4.12878 (2023).
Moore, L. D., Le, T. & Fan, G. DNA methylation and its basic function. Neuropsychopharmacology 38, 23–38 (2013).
Garg, P. et al. A Survey of Rare Epigenetic Variation in 23,116 Human Genomes Identifies Disease-Relevant Epivariations and CGG Expansions. Am. J. Hum. Genet. 107, 654–669 (2020).
Ligtenberg, M. J. et al. Heritable somatic methylation and inactivation of MSH2 in families with Lynch syndrome due to deletion of the 3’ exons of TACSTD1. Nat. Genet. 41, 112–117 (2009).
Evans, D. G. R. et al. A Dominantly Inherited 5’ UTR Variant Causing Methylation-Associated Silencing of BRCA1 as a Cause of Breast and Ovarian Cancer. Am. J. Hum. Genet. 103, 213–220 (2018).
Bagni, C., Tassone, F., Neri, G. & Hagerman, R. Fragile X syndrome: causes, diagnosis, mechanisms, and therapeutics. J. Clin. Invest. 122, 4314–4322 (2012).
LaCroix, A. J. et al. GGC repeat expansion and exon 1 methylation of XYLT1 is a common pathogenic variant in Baratela-Scott syndrome. Am. J. Hum. Genet. 104, 35–44 (2019).
Barbosa, M. et al. Identification of rare de novo epigenetic variations in congenital disorders. Nat. Commun. 9, 2064 (2018).
Capper, D. et al. DNA methylation-based classification of central nervous system tumours. Nature 555, 469–474 (2018).
Levy, M. A. et al. Novel diagnostic DNA methylation episignatures expand and refine the epigenetic landscapes of Mendelian disorders. Hum. Genet. Genomics Adv. 3, 100075 (2021).
Aref-Eshghi, E. et al. Evaluation of DNA Methylation Episignatures for Diagnosis and Phenotype Correlations in 42 Mendelian Neurodevelopmental Disorders. Am. J. Hum. Genet. 106, 356–370 (2020).
Sadikovic, B. et al. Clinical epigenomics: genome-wide DNA methylation analysis for the diagnosis of Mendelian disorders. Genet. Med. 23, 1065–1074 (2021).
van Jaarsveld, R. H. et al. Delineation of a KDM2B-related neurodevelopmental disorder and its associated DNA methylation signature. Genet. Med. 25, 49–62 (2023).
Foroutan, A. et al. Clinical utility of a unique genome-wide DNA methylation signature for KMT2A-related syndrome. Int. J. Mol. Sci. 23, 1815 (2022).
Cappuccio, G. et al. De novo SMARCA2 variants clustered outside the helicase domain cause a new recognizable syndrome with intellectual disability and blepharophimosis distinct from Nicolaides-Baraitser syndrome. Genet. Med. 22, 1838–1850 (2020).
Levy, M. A. et al. Deficiency of TET3 leads to a genome-wide DNA hypermethylation episignature in human whole blood. NPJ Genom. Med. 6, 92 (2021).
Levy, M. A. et al. Functional correlation of genome-wide DNA methylation profiles in genetic neurodevelopmental disorders. Hum. Mutat. 43, 1609–1628 (2022).
LaFlamme, C. W., Pandurang K., Djekidel M. N., Rosikiewicz W. MethylMiner: A methylation array analysis pipeline tailored for discovering rare methylation events with interactive data visualization, <https://github.com/stjude-biohackathon/MethylMiner> (2022).
Tsutsumi, M. et al. A female patient with retinoblastoma and severe intellectual disability carrying an X;13 balanced translocation without rearrangement in the RB1 gene: a case report. BMC Med. Genomics 12, 182 (2019).
Chen, X. et al. A de novo pathogenic CSNK1E mutation identified by exome sequencing in family trios with epileptic encephalopathy. Hum. Mutat. 40, 281–287 (2019).
Sobreira, N., Schiettecatte, F., Valle, D. & Hamosh, A. GeneMatcher: a matching tool for connecting investigators with an interest in the same gene. Hum. Mutat. 36, 928–930 (2015).
Brechtmann, F. et al. OUTRIDER: A Statistical Method for Detecting Aberrantly Expressed Genes in RNA Sequencing Data. Am. J. Hum. Genet. 103, 907–917 (2018).
Yépez, V. A., Murdock, D. R. & Lee, B. Gene expression counts from fibroblast, strand-specific, BCM UDN. Zenodo, https://doi.org/10.5281/zenodo.3963474 (2020).
Winnepenninckx, B. et al. CGG-repeat expansion in the DIP2B gene is associated with the fragile site FRA12A on chromosome 12q13.1. Am. J. Hum. Genet. 80, 221–231 (2007).
Kiedrowski, L. A. et al. DNA methylation assay for X-chromosome inactivation in female human iPS cells. Stem Cell Rev. Rep. 7, 969–975 (2011).
Hovestadt, V. & Zapatka, M. conumee: Enhanced copy-number variation analysis using Illumina DNA methylation arrays. R package version 1.9.0.
Kerkhof, J. et al. Diagnostic utility and reporting recommendations for clinical DNA methylation episignature testing in genetically undiagnosed rare diseases. Genet. Med. 26, 101075 (2024).
Harada, A. et al. Chd2 interacts with H3.3 to determine myogenic cell fate. EMBO J. 31, 2994–3007 (2012).
Lamar, K. J. & Carvill, G. L. Chromatin Remodeling Proteins in Epilepsy: Lessons From CHD2-Associated Epilepsy. Front. Mol. Neurosci. 11, 208 (2018).
Jaffe, A. E. et al. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. Int. J. Epidemiol. 41, 200–209 (2012).
Peters, T. J. et al. De novo identification of differentially methylated regions in the human genome. Epigenetics Chromatin 8, 6 (2015).
Geeleher, P. et al. Gene-set analysis is severely biased when applied to genome-wide methylation data. Bioinformatics 29, 1851–1857 (2013).
Wu, H. et al. Detection of differentially methylated regions from whole-genome bisulfite sequencing data without replicates. Nucleic Acids Res. 43, e141 (2015).
Peters, T. J. et al. Calling differentially methylated regions from whole genome bisulphite sequencing with DMRcate. Nucleic Acids Res. 49, e109 (2021).
Cavalcante, R. G. & Sartor, M. A. annotatr: genomic regions in context. Bioinformatics 33, 2381–2383 (2017).
Alfares, A. et al. Whole-genome sequencing offers additional but limited clinical utility compared with reanalysis of whole-exome sequencing. Genet. Med. 20, 1328–1333 (2018).
Palmer, E. E. et al. Diagnostic yield of whole genome sequencing after nondiagnostic exome sequencing or gene panel in developmental and epileptic encephalopathies. Neurology 96, e1770–e1782 (2021).
Aref-Eshghi, E. et al. Diagnostic utility of genome-wide DNA methylation testing in genetically unsolved individuals with suspected hereditary conditions. Am. J. Hum. Genet. 104, 685–700 (2019).
Vaisvila, R. et al. Enzymatic methyl sequencing detects DNA methylation at single-base resolution from picograms of DNA. Genome Res. 31, 1280–1289 (2021).
Vielhaber, E., Eide, E., Rivers, A., Gao, Z. H. & Virshup, D. M. Nuclear entry of the circadian regulator mPER1 is controlled by mammalian casein kinase I epsilon. Mol. Cell Biol. 20, 4888–4899 (2000).
Lee, C., Weaver, D. R. & Reppert, S. M. Direct association between mouse PERIOD and CKIepsilon is critical for a functioning circadian clock. Mol. Cell Biol. 24, 584–594 (2004).
Toh, K. L. et al. An hPer2 phosphorylation site mutation in familial advanced sleep phase syndrome. Science 291, 1040–1043 (2001).
Zhou, L. et al. The circadian clock gene Csnk1e regulates rapid eye movement sleep amount, and nonrapid eye movement sleep architecture in mice. Sleep 37, 785–793 (2014).
Leitao, E. et al. Systematic analysis and prediction of genes associated with monogenic disorders on human chromosome X. Nat. Commun. 13, 6570 (2022).
Platzer, K. et al. De novo missense variants in SLC32A1 cause a developmental and epileptic encephalopathy due to impaired GABAergic neurotransmission. Ann. Neurol. 92, 958–973 (2022).
Aref-Eshghi, E. et al. BAFopathies’ DNA methylation epi-signatures demonstrate diagnostic utility and functional continuum of Coffin-Siris and Nicolaides-Baraitser syndromes. Nat. Commun. 9, 4885 (2018).
Liu, P. et al. Reanalysis of Clinical Exome Sequencing Data. N. Engl. J. Med. 380, 2478–2480 (2019).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Rooney, K. et al. DNA methylation episignature and comparative epigenomic profiling of HNRNPU-related neurodevelopmental disorder. Genet. Med. 25, 100871 (2023).
Carvill, G. L. et al. Targeted resequencing in epileptic encephalopathies identifies de novo mutations in CHD2 and SYNGAP1. Nat. Genet. 45, 825–830 (2013).
Scheffer, I. E. et al. Exome sequencing for patients with developmental and epileptic encephalopathies in clinical practice. Dev. Med. Child Neurol. 65, 50–57 (2023).
Parkinson Progression Marker Initiative. The Parkinson Progression Marker Initiative (PPMI). Prog. Neurobiol. 95, 629–635 (2011).
Pidsley, R. et al. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol. 17, 208 (2016).
Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014).
Chen, Y. A. et al. Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray. Epigenetics 8, 203–209 (2013).
Leek, J. T., Johnson, W. E., Parker, H. S., Jaffe, A. E. & Storey, J. D. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics 28, 882–883 (2012).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Jaffe, A. E. & Irizarry, R. A. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 15, R31 (2014).
Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
Court, F. et al. Genome-wide parent-of-origin DNA methylation analysis reveals the intricacies of human imprinting and suggests a germline methylation-independent mechanism of establishment. Genome Res. 24, 554–569 (2014).
Zhou, W., Laird, P. W. & Shen, H. Comprehensive characterization, annotation and innovative use of Infinium DNA methylation BeadChip probes. Nucleic Acids Res. 45, e22 (2017).
Hamosh, A., Scott, A. F., Amberger, J. S., Bocchini, C. A. & McKusick, V. A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 33, D514–D517 (2005).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Miller, D. E. et al. Targeted long-read sequencing identifies missing disease-causing variation. Am. J. Hum. Genet. 108, 1436–1449 (2021).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Zheng, Z. et al. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat. Comput. Sci. 2, 797–803 (2022).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Heller, D. & Vingron, M. SVIM: structural variant identification using mapped long reads. Bioinformatics 35, 2907–2915 (2019).
Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020).
Lin, J. H., Chen, L. C., Yu, S. C. & Huang, Y. T. LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants. Bioinformatics 38, 1816–1822 (2022).
Greenwood Diagnostic Laboratories, London Health Sciences Centre. EpiSign v4 Menu, <https://episign.lhsc.on.ca/img/EpiSign_v4_Menu.pdf>.
Platt, J. C. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classifiers 10, 63–70 (2000).
Sadikovic, B., Levy, M. A. & Aref-Eshghi, E. Functional annotation of genomic variation: DNA methylation episignatures in neurodevelopmental Mendelian disorders. Hum. Mol. Genet. 29, R27–R32 (2020).
Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data, <http://www.bioinformatics.babraham.ac.uk/projects/fastqc/> (2010).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Aref-Eshghi, E. et al. Genomic DNA Methylation Signatures Enable Concurrent Diagnosis and Clinical Genetic Variant Classification in Neurodevelopmental Syndromes. Am. J. Hum. Genet. 102, 156–174 (2018).
Ho, D. E., Imai, K., King, G. & Stuart, E. A. Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Anal. 15, 199–236 (2007).
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
Houseman, E. A. et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinforma. 13, 86 (2012).
Gu, Z., Gu, L., Eils, R., Schlesner, M. & Brors, B. circlize Implements and enhances circular visualization in R. Bioinformatics 30, 2811–2812 (2014).
TreeAndLeaf: Displaying binary trees with focus on dendrogram leaves. R package version 1.12.0 (2023).
Fortin, J.-P. et al. Functional normalization of 450k methylation array data improves replication in large cancer studies. Genome Biol. 15, 503 (2014).
Gel, B. & Serra, E. karyoploteR: an R/Bioconductor package to plot customizable genomes displaying arbitrary data. Bioinformatics 33, 3088–3090 (2017).
Giacopuzzi, E., Popitsch, N. & Taylor, J. C. GREEN-DB: a framework for the annotation and prioritization of non-coding regulatory variants from whole-genome sequencing data. Nucleic Acids Res. 50, 2522–2535 (2022).
Acknowledgements
We thank all the individuals and their families for participating in this research. Major funding for this project was provided by a grant (#631106) from Citizens United for Research in Epilepsy (CURE). A subset of DNA methylation arrays was provided by the University of Washington Center for Rare Disease Research (UW-CRDR), formerly known as the Center for Mendelian Genomics (CMG), with support from NHGRI grants U01 HG011744, UM1 HG006493, and U24 HG011746, and with enthusiastic support from the late Debbie Nickerson. We gratefully acknowledge support from the Australian Epilepsy Research Foundation grant, the Australian National Health and Medical Research Council (NHMRC) Centre for Research Excellence Grant (GNT2006841), NHMRC Synergy Grant (GNT2010562), the Health Research Council of New Zealand, Cure Kids New Zealand, and the Estate of Ernest Hyam Davis and the Tedd and Mollie Carr Endowment Trust. We acknowledge the Epi25 Consortium, which provided exome sequence data for review for a subset of individuals. C.W.L. has been funded through the American Epilepsy Society (AES) predoctoral fellowship (#919453) and the St. Jude Graduate School of Biomedical Sciences. We would also like to acknowledge the inaugural St. Jude Biohackathon 2022 for coordinating the event that led to a team composed of C.W.L., P.K., M.N.D., and W.R., who assembled the MethylMiner pipeline described here. K.L.P. has been funded through the GRIN2B foundation and CURE. The project was also supported by NIHR Manchester Biomedical Research Centre (NIHR203308) and the MRC Epigenomics of Rare Diseases Node (MR/Y008170/1); we thank Siddharth Banka and David Gokhale for their support. Research reported in this manuscript by M.W.H., S.K., H.D., K.C.W., J.A.R., and H.T.C. was supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under Award Numbers U01HG007709 and U01HG007942. I.E.S. is also supported by an NHMRC Senior Investigator Fellowship (GNT1172897). D.E.M. is supported by NIH grant DP5OD033357. We acknowledge Pratibha Kottapalli and Sanchit Trivedi from the St. Jude Hartwell Center, who performed Illumina sequencing for this project. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. Schematics featured in this manuscript were created with BioRender.com and released under a Creative Commons Attribution-Non-Commercial-Noderivs 4.0 International License (CC-BY-NC-ND).
Author information
Contributions
H.C.M. and C.W.L. conceptualized the study. C.W.L., H.C.M., B.S., C.R., I.E.S., D.E.M., L.G.S., and S.F.B. designed the research methodology. C.W.L., C.R., S.S., H.E.P., M.P.Z., J.G., S.B.G., D.M.N., M.H., E.V.W., D.D., P.K., M.N.D., W.R., H.M., J.K., M.A.L., R.R., and S.K. performed formal analysis. C.W.L., E.P.A., M.P.Z., J.G., N.L., M.H., E.V.W., and S.R.O. executed experimental investigation. H.C.M., C.W.L., UW-CRDR, and G.N. acquired funding. H.C.M., I.E.S., L.G.S., S.F.B., S.J.R., A.L.S., E.S.B., T.J.A., Z.W., G.L.C., H.T.C., J.A.R., K.C.W., H.D., M.W.H., D.L., T.L.S., K.L.P., UDN, G.C., N.C., L.D., D.G., G.L., T.R., D.S., M.L.T., M.A., S.G., and E.A.J. recruited, phenotyped, and provided samples or data for individuals in this study. C.W.L., C.R., S.S., and H.E.P. prepared figures. C.W.L. and H.C.M. wrote the original draft. All authors reviewed and edited the manuscript. Clarification on abbreviations used to refer to the authors and full lists of consortia members (UW-CRDR and UDN) may be found in the Supplementary Information.
Ethics declarations
Competing interests
B.S. is a shareholder in EpiSign Inc, a company involved in commercialization of EpiSign™ software. D.E.M. is on a scientific advisory board at ONT and has received travel support from ONT to speak on their behalf. D.E.M. is engaged in a research agreement with ONT. D.E.M. holds stock options in MyOme. I.E.S. has served on scientific advisory boards for BioMarin, Chiesi, Eisai, Encoded Therapeutics, GlaxoSmithKline, Knopp Biosciences, Nutricia, Rogcon, Takeda Pharmaceuticals, UCB, Xenon Pharmaceuticals, Cerecin; has received speaker honoraria from GlaxoSmithKline, UCB, BioMarin, Biocodex, Chiesi, LivaNova, Nutricia, Zuellig Pharma, Stoke Therapeutics and Eisai; has received funding for travel from UCB, Biocodex, GlaxoSmithKline, BioMarin, Encoded Therapeutics, Stoke Therapeutics and Eisai; has served as an investigator for Anavex Life Sciences, Cerevel Therapeutics, Eisai, Encoded Therapeutics, EpiMinder Inc, Epygenyx, ES-Therapeutics, GW Pharma, Marinus, Neurocrine BioSciences, Ovid Therapeutics, Takeda Pharmaceuticals, UCB, Ultragenyx, Xenon Pharmaceuticals, Zogenix and Zynerba; has consulted for Care Beyond Diagnosis, Epilepsy Consortium, Atheneum Partners, Ovid Therapeutics, UCB, Zynerba Pharmaceuticals, BioMarin, Encoded Therapeutics and Biohaven Pharmaceuticals; and is a Non-Executive Director of Bellberry Ltd and a Director of the Australian Academy of Health and Medical Sciences and the Australian Council of Learned Academies Limited. I.E.S. may accrue future revenue on pending patent WO61/010176 (filed: 2008): Therapeutic Compound; has a patent for SCN1A testing held by Bionomics Inc and licensed to various diagnostic companies; has a patent for a molecular diagnostic/theragnostic target for benign familial infantile epilepsy (BFIE) [PRRT2] 2011904493 & 2012900190 and PCT/AU2012/001321 (TECH ID:2012-009). L.G.S. receives funding from the Health Research Council of New Zealand and Cure Kids New Zealand, is a consultant for the Epilepsy Consortium, and has received travel grants from Seqirus and Nutricia. L.G.S. has received research grants and consultancy fees from Zynerba Pharmaceuticals and has served on Takeda and Eisai Pharmaceuticals scientific advisory panels. The Department of Molecular and Human Genetics at Baylor College of Medicine receives revenue from clinical genetic testing conducted at Baylor Genetics Laboratories. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
LaFlamme, C.W., Rastin, C., Sengupta, S. et al. Diagnostic utility of DNA methylation analysis in genetically unsolved pediatric epilepsies and CHD2 episignature refinement. Nat Commun 15, 6524 (2024). https://doi.org/10.1038/s41467-024-50159-6
DOI: https://doi.org/10.1038/s41467-024-50159-6