Introduction

Rolandic epilepsy (RE), or epilepsy with centro-temporal spikes (CTS), is one of the most common epilepsy syndromes of childhood. RE is related to rarer and less benign epilepsy syndromes, including atypical benign partial epilepsy, Landau–Kleffner syndrome and epileptic encephalopathy with continuous spike-and-waves during sleep, referred to as RE-related syndromes or atypical rolandic epilepsy (ARE) [1]. In up to 20% of sib pairs or families, mutations affecting GRIN2A, which encodes a subunit of the excitatory N-methyl-d-aspartate (NMDA) glutamate receptor, have been implicated as a major risk factor for RE and ARE by us and others [2, 3]. More recently, candidate gene and loci screens identified associations of the genes RBFOX1, RBFOX3, DEPDC5 and GABRG2, and of genomic duplications at 16p11.2, in 1.5–2.0% of patients with RE and ARE [4, 5, 6]. In the current study, we conducted an unbiased exome-wide survey in the RE/ARE cohort.

Patients and methods

Study participants

Two hundred and four unrelated European Rolandic cases (182 RE, 22 ARE) and 728 population control subjects were included [6]. Children with (typical) RE suffer from perisylvian oromotor seizures that frequently start during sleep. In adolescence, the epilepsy resolves spontaneously, frequently without any intellectual sequelae. ARE shares the essential electroencephalography feature with RE but shows a different seizure symptomatology, either on its own or in addition to rolandic seizures. As in RE, seizures resolve spontaneously, but the cognitive outcome is guarded in ARE. In detail, these epilepsies are: atypical benign partial epilepsy of childhood, with atonic seizures and atypical absences in addition to rolandic seizures; Landau–Kleffner syndrome, with loss of speech and cognitive decline; and epilepsia-aphasia syndrome, with seizures and language dysfunction [1, 6]. Written informed consent was obtained from participating subjects and, if appropriate, from both parents and adolescents.

Data generation and processing

Exome sequencing of all individuals was performed with the Illumina HiSeq 2000 using the EZ Human Exome Library Kit (NimbleGen, Madison, WI). Sequencing adapters were trimmed, and samples with <30× mean depth or <70% total exome coverage at 20× mean depth of coverage were excluded from further analysis. Variant calling was performed in targeted exonic intervals with 100 bp padding using the GATK best practices pipeline [7] against the GRCh37 human reference genome, followed by multi-allelic variant decomposition and left normalization. Samples were excluded from further analysis if they (i) were not ethnically matched, (ii) were related, (iii) showed discrepancy with reported sex, or (iv) had an excess heterozygosity >3 SD in any of the quality metrics (NALT, NMIN, NHET, NVAR, RATE and SINGLETON statistics, as calculated by the PLINKseq i-stats parameter) [8]. The genotypes of variants with read depth <10 or genotype quality <20 were set to missing. Variants were excluded if they (i) failed variant quality score recalibration (VQSR) or the GATK recommended hard filter, (ii) showed missingness >3%, (iii) were present in repeat regions or (iv) had an average read depth <10 in either cases or controls. The ExAC variants were restricted to those that lay within the exonic intervals used for variant calling in this study, were not present in repeat regions and passed the VQSR threshold.
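The genotype-level and sample-level thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (which relied on GATK and PLINKseq); the function names are hypothetical, and the use of the sample standard deviation with a two-sided cutoff around the cohort mean is an assumption made for illustration.

```python
from statistics import mean, stdev

MIN_DEPTH = 10   # read-depth threshold stated in the text
MIN_GQ = 20      # genotype-quality threshold stated in the text
MAX_SD = 3.0     # outlier cutoff (in standard deviations) stated in the text


def mask_genotype(genotype, depth, gq):
    """Return None (missing) when a call fails the depth or quality threshold."""
    if depth < MIN_DEPTH or gq < MIN_GQ:
        return None
    return genotype


def outlier_samples(metric_by_sample, max_sd=MAX_SD):
    """Return IDs of samples whose metric lies more than max_sd SDs from the mean.

    Assumption: a two-sided cutoff around the cohort mean, using the sample SD.
    """
    values = list(metric_by_sample.values())
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        return set()
    return {s for s, v in metric_by_sample.items() if abs(v - mu) > max_sd * sd}
```

In practice, a check of this kind would be repeated for each quality metric (for example NHET or SINGLETON), and a sample flagged in any of them would be dropped.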

Variant annotation and filtering

Variants were annotated using ANNOVAR [9] version 2015 Mar 22 with RefSeq and Ensembl, Combined Annotation Dependent Depletion (CADD) scores [10], allele frequencies and dbNSFP (v3.0) annotations. Because the samples used in this study are of Non-Finnish European (NFE) ancestry, we investigated rare variants by selecting those with a minor allele frequency <0.005 in the European populations of the 1000 Genomes, the Exome Variant Server and the NFE data from ExAC. We generated four classes of variants for further analyses: (1) deleterious variants (CADD15), defined as missense variants with a CADD Phred score >15, as this is the median value across all missense and canonical splice site variants [10], (2) loss-of-function (LOF) variants, comprising all rare indels, stop gain, stop loss and splice site variants (2 nt plus/minus the exon boundary), (3) CADD15+LOF variants, as the union of the above two datasets, and (4) rare synonymous variants.
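The variant classes defined above can be expressed as a small classification rule. The Python sketch below is illustrative only: the effect labels ("missense", "indel", "stopgain" and so on) are hypothetical placeholders rather than ANNOVAR's actual output vocabulary, and treating a missing reference-panel frequency as zero is an assumption.

```python
MAF_CUTOFF = 0.005    # minor allele frequency threshold from the text
CADD_CUTOFF = 15.0    # CADD Phred threshold from the text

# Hypothetical effect labels standing in for the annotation tool's categories.
LOF_EFFECTS = {"indel", "stopgain", "stoploss", "splicing"}


def is_rare(freqs, cutoff=MAF_CUTOFF):
    """True when the allele frequency is below the cutoff in every reference panel.

    Assumption: a missing frequency (None) means the variant was not observed.
    """
    return all((f or 0.0) < cutoff for f in freqs)


def variant_classes(effect, cadd_phred, freqs):
    """Return the set of analysis classes that a variant belongs to."""
    if not is_rare(freqs):
        return set()
    classes = set()
    if effect == "missense" and cadd_phred is not None and cadd_phred > CADD_CUTOFF:
        classes.add("CADD15")
    if effect in LOF_EFFECTS:
        classes.add("LOF")
    if classes:
        # The combined class is the union of CADD15 and LOF variants.
        classes.add("CADD15+LOF")
    if effect == "synonymous":
        classes.add("SYNONYMOUS")
    return classes
```

Here `freqs` would hold the three European reference frequencies (1000 Genomes, Exome Variant Server, ExAC NFE) for one variant.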

Single variant and gene association analysis

For the statistical analysis, we employed two independent control cohorts (in-house and ExAC) to increase the reliability and power of the statistical tests. For single variant burden analysis, we applied the single score method in RVTESTS [11] to cases and in-house controls, for which individual genotypes were available. For gene burden analysis, a 2 × 2 contingency table was constructed by counting the alternate alleles per gene in patients vs. controls (in-house controls and NFE ExAC controls). We then obtained a one-sided p-value using Fisher’s exact test, together with odds ratios and 95% confidence intervals [12]. The resulting p-values were corrected for 18,668 RefSeq protein-coding genes [13] using the Bonferroni approach. Finally, to ensure the exclusion of false positive association results and following the 'rare variant of large effect hypothesis', we selected those genes that are present in the first quartile of the Residual Variant Intolerance Score (RVIS) distribution [14].
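The gene burden test described above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' code: the second column of the table is assumed to hold the remaining (non-alternate) alleles, the function names are hypothetical, and the confidence interval calculation [12] is omitted.

```python
from math import comb

N_GENES = 18668  # number of RefSeq protein-coding genes used for correction


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment in the first row.

    Table layout (assumed): [[case alt, case other], [control alt, control other]].
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / comb(n, col1)


def odds_ratio(a, b, c, d):
    """Sample odds ratio; infinite when the denominator is zero."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def bonferroni(p, n_tests=N_GENES):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p * n_tests)
```

With this correction, a gene reaches exome-wide significance at the 0.05 level only if its uncorrected p-value is below 0.05/18,668, i.e. roughly 2.7 × 10⁻⁶.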

Selection of gene-sets

We investigated the following four groups of neuron-related gene-sets: (1) genes encoding proteins at synapses downloaded from the SynaptomeDB [15] database (“SYNAPTIC_GENES”, N = 1887), (2) genes of postsynaptic signalling complexes including NMDA receptors (NMDARs) and the neuronal activity-regulated cytoskeleton-associated protein (ARC) [16] (“NMDAR_ARC_COMPLEX”, N = 80), (3) genes encoding proteins at the inhibitory synapses (“INHIBITORY”, N = 5941) and excitatory synapses (“EXCITATORY”, N = 5261) [17], and (4) glutamate receptor subunit encoding genes (“GLUTAMATE_RECEPTORS”, N = 18). In addition, we included five gene-sets associated with disease and/or mutational intolerance: (1) genes encoding targets of Fragile-X-Mental-Retardation-1-Protein [18] (“FMRP_TARGETS_DARNELL”, N = 1772), (2) genes intolerant to variants according to ExAC (“EXAC_CONSTRAINED_GENES”, N = 3230), (3) genes intolerant to loss-of-function variants [19] ('constrained') (“CONSTRAINED_GENES_SAMOCHA”, N = 1004), (4) a curated list of dominant genes associated with developmental delay obtained from the DECIPHER database [20] (“DDG2P_MONOALLELIC”, N = 299), and (5) genes previously associated with epileptic encephalopathies [21] (“EPILEPTIC_ENCEPHALOPATHY”, N = 73). As control datasets, we used (1) for each dataset the corresponding set of synonymous variants, and (2) the ‘non-constrained’ gene-set comprising RefSeq genes that have been found to be tolerant to LOF variants (“GENES_WITHOUT_CONSTRAINT”, N = 14,417). GRIN2A, as the most significant single gene from the burden analysis, was excluded from all gene-sets in order to test whether other genes also contribute to the disease association.

Data availability

All the CADD15+LOF variants from our study within the “EPILEPTIC_ENCEPHALOPATHY” gene-set were deposited in the Leiden Open Variation Database (LOVD) (https://databases.lovd.nl/shared/genes). The accession numbers of the deposited variants in LOVD are 188117–188549. In addition, the variants present in the cases within the “EPILEPTIC_ENCEPHALOPATHY” gene-set are available in the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/) with the accession numbers SCV000588243–SCV000588353. The variants that were described in our previous studies are indicated in Supplementary Table 1.

Gene-set association analysis

The gene-set association analysis for the different types of variants was performed by logistic regression in R (version 3.2), adjusting for the following confounding variables: the total number of called genotypes per sample, the total number of rare coding variants per sample, the total number of rare coding singletons (variants observed only once in the entire dataset) per sample, calculated sex, the first four principal components, and the total number of variants per sample for each variant class.
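To make the regression step concrete, the following is a minimal sketch of a covariate-adjusted logistic regression fitted by Newton-Raphson. The study itself used R (version 3.2); Python is used here purely for illustration, the function names are hypothetical, and the design-matrix layout (an intercept column of 1.0, then the per-sample gene-set burden, then any covariates) is an assumption.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_logistic(X, y, iterations=25):
    """Fit coefficients by Newton-Raphson.

    X: rows of [1.0, burden, covariate_1, ...]; y: 1 for cases, 0 for controls.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        mu = [sigmoid(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Gradient of the log-likelihood and the information matrix X'WX.
        grad = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu))
                for j in range(p)]
        info = [[sum(mi * (1 - mi) * row[j] * row[k] for row, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta
```

The exponential of the burden coefficient (`exp(beta[1])`) is the adjusted odds ratio for the gene-set. With covariates on very different scales, such as total genotype counts alongside principal components, standardising the columns first is advisable for numerical stability, and perfectly separable data will cause this simple fit to fail.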

Results

Exome sequencing and variant filtering

We performed whole-exome sequencing on 204 patients with RE/ARE and 728 population controls. After quality control, the final dataset consisted of 19 ARE, 175 RE and 567 control samples. Across these 761 samples, 226,521 exonic and splice site variants were called. The mean transition/transversion ratio was 3.39 per sample. After the final filtering, 45,881 CADD15, 10,326 LOF and 38,802 synonymous variants were analysed.

Association analysis

To investigate the mutational burden within the RE spectrum, all associations were assessed for RE and ARE separately and for the combined cases from both phenotypes, treated as a single disease. In comparison with the 567 in-house controls, we did not observe a statistically significant burden for any variant or gene in cases after multiple-testing correction. To increase the statistical power, we used the non-Finnish European (NFE) ExAC cohort as an additional control dataset. Association testing against the much larger NFE-ExAC cohort (N = 33,370) identified an exome-wide significant burden for CADD15, CADD15+LOF and LOF variants for GRIN2A within the combined typical and atypical (RE+ARE) cohort. No other variant-intolerant gene (i.e., one present in the first quartile of RVIS) was significantly enriched for variants in any of the tested patient groups. Although variant enrichment for GRIN2A was not significant after correction when RE and ARE were analysed separately, the odds ratio for GRIN2A consistently exceeded unity in all the considered datasets (Fig. 1a).

Fig. 1

Burden analysis. Typical Rolandic epilepsy is represented as RE, atypical rolandic epilepsy as ARE and RE plus ARE as ROLANDIC. On the x axis, the odds ratios in cases vs controls are given. The names of the variant classes are given on the y axis. Each panel represents a different dataset. The dashed vertical line represents the expected odds ratio of 1. The horizontal lines indicate 95% confidence intervals. a Assessment of risk for deleterious variants in GRIN2A against two control groups (ExAC and In-house). The values on top of each point represent multiple-testing corrected p-values, the ones in red are significant p-values and the ones in black are the p-values that are not significant after multiple-testing correction. The odds ratios are restricted to a maximum value of 50. b Exome-wide burden analysis by different variant classes. The values on top of each point represent the p-value. Synonymous variants serve as a control functional group (colour figure online)

Exome-wide and gene-set burden analysis

Assuming a shared mutational burden in patients across gene-sets of convergent function and/or pathways, we performed gene-set burden analyses using the in-house controls. A logistic regression approach was used to account for various confounding variables (see Methods section). No significant exome-wide burden was observed across the different variant classes (Fig. 1b). Although none of the gene-sets showed a significant result after multiple-testing correction, we found several gene-sets with an odds ratio >1 for the CADD15, CADD15+LOF and LOF variant classes, especially for the LOF variants, but not for synonymous variants (Fig. 2). A similar result was seen when we analysed ARE and RE independently.

Fig. 2

Gene-set burden across different variant classes. Each panel represents a different variant class. The synonymous variants serve as a control variant class. GRIN2A was removed from all gene-sets to identify other contributing genes. On the x axis, the odds ratios in cases vs controls are given. On the y axis, the names of the different gene-sets are given. The red vertical line represents the expected odds ratio of 1. The horizontal lines indicate 95% confidence intervals and are restricted to the maximum of the odds ratios over all gene-sets. Where a confidence interval exceeds this limit, the point is shown without an error bar to its right. The uncorrected p-values are shown on top of each point. CADD15 = missense variants predicted to be deleterious. LOF = loss-of-function variants (colour figure online)

Discussion

We performed the first exome-wide association study investigating rare genetic variants of large effect in 194 patients with childhood focal epilepsies with CTS, in comparison with 567 in-house controls and 33,370 publicly available population controls from the ExAC database. By performing an unbiased gene-burden analysis of patients against the in-house and ExAC controls (Fig. 1a), we show that rare CADD15, CADD15+LOF and LOF variants are significantly more frequent in the combined RE and ARE cohort only for GRIN2A (odds ratio >1). Owing to the small sample size and genetic heterogeneity, no other gene or gene-set was significantly enriched for variants after correction for multiple testing (Fig. 2). Since we observe a consistent trend in the odds ratios for the enrichment of LOF variants in several disease-associated gene-sets, we are optimistic that larger cohorts will in future allow the identification of further genes associated with RE/ARE.