Introduction

Epileptic encephalopathies are severe and therapy-resistant epilepsies of childhood, which frequently lead to developmental delay and multiple associated medical issues. Infantile spasms (IS) and Lennox–Gastaut syndrome (LGS) represent two of the more common broad subtypes of epileptic encephalopathies. Many novel genes for epileptic encephalopathies have been discovered in the last 5 years, fueled by access to whole-exome sequencing. In particular, exome sequencing has highlighted the important role of de novo variants: current estimates suggest that over 15% of non-Dravet epileptic encephalopathy cases are explained by a disease-causing de novo variant in an established epileptic encephalopathy gene,1, 2 and this estimate rises to over 80% among individuals diagnosed with Dravet syndrome.3 Up to a further 3% have been reported to be explained by likely clinically relevant de novo copy-number variants.4

While the role of de novo genetic variation in epileptic encephalopathies is increasingly understood, the role of recessive genetic variation, outside of recessive neurometabolic disorders such as lysosomal disorders, amino acid or organic acid imbalances, congenital disorders of glycosylation, and some mitochondrial diseases, remains unclear. In our current study, we systematically assessed autosomal recessive inheritance in 320 IS or LGS patient–parent trios who did not have a likely disease-causing de novo variant in one of the established dominant epileptic encephalopathy genes.1, 2 In general, the 320 cases studied here had already been intensively investigated for neurometabolic disorders using biochemical assessments.

Subjects and methods

Cohort

Three-hundred and twenty epileptic encephalopathy trios were recruited through multiple international consortia, including 57 IS or LGS trios unpublished in our earlier studies.1, 2 Patients did not have a clearly identified metabolic or genetic cause for their epilepsy based on clinically available testing, which varied across institutions. This collection of 320 trios did not include: (a) patients previously found to have a disease-causing de novo variant in an established dominant epileptic encephalopathy gene, and (b) trios where exome sequencing was based on a lymphoblastoid cell line source for at least one of the three family members. The overall cohort was not enriched for consanguineous parents. Only two parent pairs showed an identity-by-descent >0.125, both <0.15, which is approximately equivalent to third degree relatives.5

Among the 320 trios, two families reported multiple affected children. For one of these families, both the proband and the affected sibling were investigated through exome sequencing, while for the second family only the proband and parents were studied. The sequencing methods used to generate the sequence data have been previously described.1, 2

Transmission disequilibrium tests

For the transmission test, we used two approaches that we have previously introduced.6, 7 First, we tested for an autosomal homozygous or compound heterozygous effect using coreTDT.7 In computing the test, we selected loss-of-function and missense single-nucleotide substitution variants with a global population minor allele frequency below 5% (MAF<0.05). Loss-of-function variants were defined as stop gain, stop lost, start lost, and canonical splice acceptor and donor site variants. For the missense variants, we used our in-house Analysis Tool for Annotated Variants platform to identify the possibly and probably damaging variants, defined as those with a maximum PolyPhen-2 HumDiv and HumVar prediction score8 of >0.4333. This test was applied to each autosomal gene individually as well as collectively across a set of 99 autosomal recessive neurometabolic genes published by van Karnebeek et al.9 The recessive neurometabolic gene-set analysis allowed us to assess whether there was evidence for an elevated rate of recessive genotypes among recessive neurometabolic disorder genes beyond what had already been screened out by the conventional biochemical assessments performed on this patient sample.
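The variant selection rules above can be sketched as a small filter. This is a minimal illustration, not the in-house platform itself: the `Variant` record, its field names, and the consequence labels are hypothetical, while the cutoffs (MAF below 0.05, maximum PolyPhen-2 score above 0.4333) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Cutoffs quoted in the text.
MAF_CUTOFF = 0.05
POLYPHEN_CUTOFF = 0.4333

# Hypothetical labels for the loss-of-function classes listed in the text.
LOF_CONSEQUENCES = {
    "stop_gain", "stop_lost", "start_lost",
    "splice_acceptor", "splice_donor",
}

@dataclass
class Variant:
    """Hypothetical annotated single-nucleotide variant."""
    consequence: str
    global_maf: float
    polyphen_humdiv: Optional[float] = None
    polyphen_humvar: Optional[float] = None

def is_qualifying(v: Variant) -> bool:
    """Return True if the variant passes the frequency and function filters."""
    if v.global_maf >= MAF_CUTOFF:
        return False
    if v.consequence in LOF_CONSEQUENCES:
        return True
    if v.consequence == "missense":
        scores = [s for s in (v.polyphen_humdiv, v.polyphen_humvar) if s is not None]
        # Use the larger of the HumDiv and HumVar scores, as described in the text.
        return bool(scores) and max(scores) > POLYPHEN_CUTOFF
    return False
```

Taking the maximum of the two PolyPhen-2 scores means a missense variant qualifies if either model calls it possibly or probably damaging.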

Second, we tested for a general effect of inherited autosomal variation by using a rare variant TDT that uses information from an independent collection of population controls (6503 EVS10 plus 1303 IGM sequenced controls) to weight the contribution of variants to the final test statistic.6 In this analysis, qualifying variants were defined using the same PolyPhen-2 thresholds as above and were again required to have a global MAF <5%. Given that population stratification can impact the power of the test but not the type I error, we restricted this second analysis to trios with European ancestry (n=286 trios).

Power simulation

As we had not previously performed power simulations for the type of gene-set application conducted in our neurometabolic analysis, we conducted a new power simulation to evaluate the types of effects that we could exclude based on this analysis (Figure 1). In these simulations, we conditioned on the parental genotype information contained in this IS/LGS population sample and characterized the distribution of offspring genotypes, given this information and the fact that the offspring is affected. This distribution depends on two quantities: the number of causal genes for which the family is informative, which is related to the density of causal genes within the actual gene set, and the relative risk of the offspring developing disease, given that they have two affected gene copies. We give the details of this procedure here.

Figure 1 coreTDT power simulation conditional on the parental genotypes of 54 informative families and 20 informative genes in the compound heterozygous analysis. The figure presents the combinations of relative risk and proportion of disease-causal genes among these 20 informative genes, analyzed as a single unit, under which the test achieves 80% power.

Let Gf, Gm, Go be the number of gene copies harboring a qualifying variant in the trio’s father, mother, and offspring, respectively. We condition our power analysis on the observed parental genotypes and study our ability to identify a signal under differing proportions of causal genes (out of the total number of genes considered), γ, and differing relative risks, R, of being diseased when two gene copies (of a causal disease gene) are affected versus fewer than two. Since the analysis is conditional on the observed parental data, only a subset of genes and families are informative.7 Specifically, only 20 genes across 54 families can have compound genotypes that lead to informative transmissions, that is, Gf=Gm=1; Gf=1, Gm=2; or Gf=2, Gm=1. A total of 46 families are informative for only one gene and eight families are informative for two genes. In each of these eight families, the two genes are located on different chromosomes, so we assume that the transmissions of each gene are independent.
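The informativeness rule can be sketched in a few lines. The data layout below (a nested dictionary of parental copy counts keyed by family and gene) and the function names are assumptions made for illustration; only the three informative parental configurations come from the text.

```python
# Parental (Gf, Gm) configurations listed in the text as informative.
INFORMATIVE_PARENTS = {(1, 1), (1, 2), (2, 1)}

def is_informative(gf: int, gm: int) -> bool:
    """True when the number of affected copies in the offspring is not fixed in advance."""
    return (gf, gm) in INFORMATIVE_PARENTS

def informative_genes_per_family(parent_counts):
    """Map each family to the sorted list of genes for which it is informative.

    parent_counts: {family_id: {gene: (gf, gm)}} (hypothetical layout).
    Families with no informative gene are dropped.
    """
    result = {}
    for family_id, genes in parent_counts.items():
        hits = sorted(g for g, (gf, gm) in genes.items() if is_informative(gf, gm))
        if hits:
            result[family_id] = hits
    return result
```

A pair with Gf=Gm=2 is excluded because every offspring would carry two affected copies, and a pair in which either parent carries none is excluded because no offspring could, so neither case contributes transmission information.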

Let Do=1 indicate that the offspring is affected. Let C be an indicator of whether the gene whose transmission is being considered is among the set of disease causal genes. When a family is informative for two genes, the disease causal indicators for the two genes are denoted C1 and C2. Note that we assume the disease risk for samples with multiple affected disease genes is the same as for those with only one affected disease gene.

To simulate trios under the alternative, we first randomly select 20γ genes as disease causal and then generate offspring as follows.

If the family is informative for only one gene, the probability that both of the offspring’s gene copies are affected is given by:
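One way to write this probability is sketched here from Bayes' rule; it is a form consistent with the definitions above and is not necessarily the published expression. The symbol p is introduced only for illustration and denotes the Mendelian probability that the offspring inherits two affected copies, assuming each parent transmits an affected copy with probability equal to half the number of affected copies they carry. The relative risk R is assumed to act only when the gene is causal (C=1).

```latex
P(G_o = 2 \mid G_f, G_m, D_o = 1, C)
  = \frac{p\,R^{C}}{p\,R^{C} + (1 - p)},
\qquad
p = P(G_o = 2 \mid G_f, G_m) = \frac{G_f}{2}\cdot\frac{G_m}{2}.
```

For a non-causal gene (C=0) this reduces to p, that is, 1/4 when Gf=Gm=1 and 1/2 when one parent carries two affected copies. For a causal gene with R=3 and Gf=Gm=1, it gives 0.75/(0.75+0.75)=0.5.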

If the family is informative for two genes and no more than one of them is disease causal, the compound genotypes for the two genes can be computed independently of one another using the equation above. When both genes are disease causal, their transmissions are not independent given that the offspring is affected. In this case, the compound genotypes of the offspring, for the two genes, can be given by,
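A joint form consistent with the stated assumptions can be sketched as follows; it is derived from Bayes' rule and is not necessarily the published expression. For gene k, p_k denotes the Mendelian probability of two affected copies given the parental genotypes, and π_k(g) is p_k when g=2 and 1−p_k otherwise (both symbols are introduced only for illustration). Following the assumption that several affected causal genes confer the same risk as one, R is applied once whenever at least one of the two genes has both copies affected.

```latex
P(G_{o1} = g_1,\, G_{o2} = g_2 \mid G_{f1}, G_{m1}, G_{f2}, G_{m2},\, D_o = 1,\, C_1 = C_2 = 1)
  = \frac{\pi_1(g_1)\,\pi_2(g_2)\,R^{\,\mathbb{1}[\,g_1 = 2 \ \text{or}\ g_2 = 2\,]}}
         {(1 - p_1)(1 - p_2) + R\,\bigl[\,1 - (1 - p_1)(1 - p_2)\,\bigr]},
\qquad
\pi_k(g) =
\begin{cases}
p_k, & g = 2,\\
1 - p_k, & g < 2.
\end{cases}
```

The denominator sums the numerator over the four joint outcomes, so the probabilities add to one; setting R=1 recovers the product of the two independent Mendelian probabilities.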

where Go1, Gm1, Gf1 and Go2, Gm2, Gf2 denote the trio’s compound genotypes at the first and second gene, respectively. We apply coreTDT to each simulated data set, and for each combination of γ and R, we use 1000 replicates to estimate the power. The combinations of γ and R that attain 80% power are presented in Figure 1.
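The overall shape of such a simulation can be sketched as below, under several loudly stated assumptions. The test used here is a simple one-sided normal-approximation test on the count of offspring with two affected copies; it is a stand-in for illustration and is not coreTDT. The family layout (54 single-gene families, all with Gf=Gm=1, spread evenly over 20 genes) is hypothetical, families informative for two genes are ignored, and all function names are invented for this example.

```python
import math
import random

def mendelian_p(gf, gm):
    """P(Go = 2 | Gf, Gm): each parent passes on an affected copy with probability G/2."""
    return (gf / 2) * (gm / 2)

def p_two_given_affected(gf, gm, rel_risk, causal):
    """P(Go = 2 | Gf, Gm, Do = 1, C) by Bayes' rule; R only matters when the gene is causal."""
    p = mendelian_p(gf, gm)
    weight = rel_risk if causal else 1.0
    return p * weight / (p * weight + (1.0 - p))

def estimate_power(families, n_genes, gamma, rel_risk, n_rep=1000, seed=0):
    """Fraction of simulated data sets in which the stand-in test rejects.

    families: list of (gf, gm, gene_index) tuples, one gene per family.
    """
    rng = random.Random(seed)
    null_p = [mendelian_p(gf, gm) for gf, gm, _ in families]
    null_mean = sum(null_p)
    null_sd = math.sqrt(sum(p * (1 - p) for p in null_p))
    z_crit = 1.6449  # one-sided 5% cutoff of the standard normal
    n_causal = round(gamma * n_genes)
    rejections = 0
    for _ in range(n_rep):
        # Randomly pick which genes are disease causal in this replicate.
        causal = set(rng.sample(range(n_genes), n_causal))
        # Draw each affected offspring's genotype from the conditional distribution.
        observed = sum(
            rng.random() < p_two_given_affected(gf, gm, rel_risk, gene in causal)
            for gf, gm, gene in families
        )
        if (observed - null_mean) / null_sd > z_crit:
            rejections += 1
    return rejections / n_rep

# Hypothetical layout: 54 families, all Gf = Gm = 1, spread over 20 genes.
FAMILIES = [(1, 1, i % 20) for i in range(54)]
```

Calling `estimate_power(FAMILIES, 20, gamma, rel_risk)` over a grid of γ and R values and keeping the combinations where the result first exceeds 0.8 would trace out a curve of the kind described for Figure 1, though the numbers from this stand-in test should not be expected to match the published ones.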

Accession numbers

The exome-sequencing data reported in this paper are deposited in the Database of Genotypes and Phenotypes (dbGaP) with the accession number phs000654.v2.p1. The EuroEPINOMICS-RES data are deposited in the European Genome-phenome Archive with the accession numbers: EGAS00001000190, EGAS00001000386, and EGAS00001000048.

Results

We assessed the role of inherited rare variation using the population control-weighted rare variant TDT.6 This test was applied to each autosomal gene across 320 eligible trios. No gene reached exome-wide significance after correcting for the 17 816 consensus coding sequence (CCDS release 14) autosomal genes (adjusted α=2.81 × 10−6, Table 1). Because population stratification can reduce the power of the test, although it cannot affect its false positive rate,6 we also conducted an analysis restricted to the 286 trios of European ancestry. Again, no gene reached the exome-wide significance level (Table 1).

Table 1 Top 10 genes from the analysis of rvTDT with 320 and subsequently with the subset of 286 European ancestry trios

We then tested for the presence of a recessive effect in each autosomal gene across the 320 trios. After quality control, only 3472 autosomal genes were found to have at least one informative family, that is, a family whose parents carry qualifying variants in the gene that could potentially produce homozygous or compound heterozygous offspring. None of these 3472 genes achieved significance after correcting for the number of genes tested (adjusted α=1.44 × 10−5). The 10 most significant genes are listed in Table 2. To investigate whether there is any evidence of recessive neurometabolic involvement in this sample, we also applied coreTDT to the set of 99 autosomal recessive neurometabolic genes,9 looking for an enrichment of homozygous or compound heterozygous offspring across the entire gene set as a single unit. No enrichment was found (P=0.51).
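The two adjusted thresholds quoted in this section are consistent with a Bonferroni correction at a family-wise α of 0.05. That starting level is an assumption here, inferred from the reported values rather than stated in the text, and the function name is illustrative.

```python
def bonferroni_alpha(n_tests, family_alpha=0.05):
    """Per-test significance level; family_alpha = 0.05 is an assumed starting level."""
    return family_alpha / n_tests

RVTDT_GENES = 17816    # CCDS autosomal genes tested with the rare variant TDT
CORETDT_GENES = 3472   # autosomal genes with at least one informative family

rvtdt_alpha = bonferroni_alpha(RVTDT_GENES)
coretdt_alpha = bonferroni_alpha(CORETDT_GENES)
print(f"rvTDT threshold:   {rvtdt_alpha:.2e}")
print(f"coreTDT threshold: {coretdt_alpha:.2e}")
```

Because the recessive test is applied only to genes with at least one informative family, it carries a smaller multiple-testing burden and therefore a less stringent per-gene threshold than the rare variant TDT.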

Table 2 Top 10 autosomal genes from the analysis of coreTDT with 320 trios

We investigated the power of this analysis. Since only 54 families are informative for at least one of the 99 autosomal recessive neurometabolic genes, and only 20 genes have at least one informative family, our analyses are effectively restricted to these 54 families and 20 genes. We varied the proportion of informative genes that are actually disease causal and the relative risk, and identified combinations of these parameters that attain at least 80% power (see ‘Power simulation’ for details). The results are shown in Figure 1. Even when the compound heterozygous or homozygous qualifying variants are fully penetrant, the causal gene proportion must be >40% to attain 80% power. When the proportion of causal genes is larger, for example, 80%, we have high power to detect an effect even at a relatively low relative risk.

Using established standards to identify clinically relevant recessive genotypes,11, 12 one trio was found to have inherited two clinically relevant SPATA5 variants in a compound heterozygous manner.13 The proband’s phenotype is consistent with the SPATA5 disease literature, and both variants (NM_145207.2; c.1677C>A (p.(Tyr559*)) and c.251G>A (p.(Arg84Gln))) have previously been described as clinically relevant among other patients with SPATA5 encephalopathy.13

Discussion

A number of rare recessive disorders can present with an epileptic encephalopathy, particularly neurometabolic disorders; the latter are generally identified by biochemical analyses of blood, urine or CSF. We performed a global, hypothesis-free test to assess the role of autosomal recessive genetic variation in 320 patients with classic epileptic encephalopathies undiagnosed with standard clinical workups. In our sample of patient–parent trios, we did not identify a genome-wide significant departure of the observed number of offspring with recessive genotypes from that expected, either for any specific gene or across the 99 genes compiled for autosomal recessive neurometabolic disorders.

Many classical recessive metabolic disorders are routinely identified through biochemical screening prior to research study enrollment. Within our sample of 320 trios, we did not find any genetic neurometabolic disorders that were missed through the conventional biochemical screening. From a clinical perspective, this emphasizes that conventional biochemical screening for these treatable causes should continue to be pursued. We did identify a single case among the 320 with a clinically relevant recessive genotype in SPATA5,13 a recently described gene for a recessive condition characterized by seizures, microcephaly, intellectual disability, and hearing loss.

The role of various dominant epilepsy genes, including ALG13, CDKL5, DNM1, GABRB3, SCN1A, SCN2A, and STXBP1, in epileptic encephalopathies was securely established through exome sequencing of 356 trios and subsequent genome-wide assessments for excess de novo variants identified in individual genes.1, 2 No single gene passes a comparable threshold among the 320 trios studied here when assessing autosomal recessive genotypes. We demonstrate that the current sample of 320 trios is insufficiently powered to appropriately estimate the overall contribution that autosomal recessive epilepsy genes make to the epileptic encephalopathies; however, our power analyses show that we do have sufficient power to rule out a large role for known recessive neurometabolic genes in this patient sample, which had previously been screened for such factors using conventional biochemical assessments. Using a similar approach, a recent study of 4125 patient–parent trios with various developmental disorders identified two novel autosomal recessive disease genes exceeding genome-wide significance,14 emphasizing the importance of acquiring larger numbers to more confidently interpret the current lack of signal for very rare genetic epilepsies with recessive inheritance. Large-scale collaborative initiatives like the Epilepsy Genetics Initiative and the Epi25 effort will aid the efforts to analyze genomic data on this scale.