Introduction

On 12 September 2011, the US National Human Genome Research Institute (NHGRI) convened an expert working group to address the challenges of assigning disease causality to sequence variants. Clear guidelines for distinguishing disease-causing sequence variants from false-positive reports of causality were provided.1 The US Centers for Disease Control and Prevention2 in the same year and the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG) in 2013, concerned about accuracy of clinical laboratory reports to clinical practitioners, also convened a workgroup consisting of clinical service providers.3

The NHGRI working group cautioned that the vast majority of genes reported as causally linked to monogenic diseases are true positives, but 27% of 406 published severe disease mutations in 104 sequenced individuals either were common polymorphisms or lacked direct evidence for pathogenicity.4,5,6,7,8 The NHGRI working group defined rare germ-line variants with minor allele frequencies of <0.01 that have relatively large effects on disease risk and variants that have been implicated in severe monogenic diseases and complex diseases.9 The working groups’ intended scope excluded common small-effects variants typically identified by genome-wide association studies of complex traits.10,11 Because “unambiguous assignment of disease causality for sequence variants is often impossible,” the NHGRI working group introduced the concept of “implication”—implicating by genetic evidence sequence variant(s) of a gene that is in the process of integrating and assessing experimental evidence for pathogenicity.1

For “implication,” the NHGRI workgroup emphasized the “critical primacy of strong robust statistical genetic support,” such as coinheritance in family studies, case–control association studies, allele frequencies in public genome databases, and predictions for conservation and pathogenicity. Strong statistical genetic support for “implication” is then supplemented by variant specific experimental studies that demonstrate that a gene product is functionally disrupted by variants. The NHGRI workgroup values disease models that recapitulate the relevant pathology of human disease and allow rescue of the phenotype when the molecular disease pathway is knocked out or eliminated.1 The ACMG workgroup, concerned more with reports of clinical genomic testing that impact medical decision making, took these evidentiary data and rated their strengths as “very strong,” “strong,” “moderate,” and “supportive.” They set rules for combining the strengths of evidentiary data when classifying sequence variants into “pathogenic,” “likely pathogenic,” “uncertain significance,” “likely benign,” and “benign”3 ( Figure 1 ).

Figure 1
figure 1

Assigning disease causality to sequence variants according to the National Human Genome Research Institute and American College of Medical Genetics and Genomics Guidelines.

The NHGRI core guidelines,1 the ACMG consensus recommendations for interpretation of sequence variants,3 and large genome databases representing different racial and continental populations, such as the Genome 1000 (ref. 12), the Exome Variant Server 6500 (ref. 13), and the Exome Aggregation Consortium (ExAC) (ref. 14), were not available in 2004 when variants of the EF-hand domain (C-terminal) containing 1 gene (EFHC1) were reported as disease-causing mutations in myoclonic and grand mal clonic-tonic-clonic (CTC) convulsions produced by juvenile myoclonic epilepsy (JME). Consequently, all EFHC1 variants discovered in the first decade of this millennium and reported with respect to epilepsy or not15,16,17,18,19,20,21,22,23,24,25,26,27,28 have not been “vetted” through NHGRI and ACMG guidelines. More importantly, both NHGRI and ACMG guidelines advise that “with evidence on variants evolving” and the “content of sequencing tests expanding,” “rigorous evaluation” and “reanalysis of variants are encouraged” to prevent misannotation of the pathogenicity of variants in public databases. For all these reasons, we applied NHGRI guidelines and ACMG rules ( Figure 1 ) for combining evidentiary criteria in reanalyzing 54 EFHC1 variants, of which 33 were originally published as mutations.

Materials and Methods

We gathered 54 EFHC1 variants reported in regard to epilepsy from scientific and medical literature, bibliographic resources from the NCBI PubMed literature server, the ClinVar database,15 and personal communications with authors of abstracts and posters published during neurological, epilepsy, and genetic meetings. The 33 purported EFHC1 mutations are scattered across the 640 amino acid protein of Myoclonin1 ( Figure 2 ; Supplementary Table S1 online places all 54 variants in the GRCh37/hg19 coordinate system, provides rsID number if identified in dbSNP 142, and translates the cDNA and protein nomenclature for all four coding transcripts identified by Ensembl).

Figure 2
figure 2

An EF hand–containing calcium-binding gene ( EFHC1 ) spans 72  k b, has 11 exons, three DM10 domains (DM refers to Drosophila melanogaster sequences), and one EF-hand motif that is calcium binding (HGNC: 16406). (a) The EFHC1 gene encodes a 640-amino-acid protein called myoclonin1. The EF-hand motif is located at the C-terminal between amino acids 578 and 606 and is encoded by a nucleotide sequence that is present in exon 10. Because the only motif of EFHC1 whose function was known consisted of the EF hand, the gene was first called EFHC1 for “EF hand containing one.”15 The diagram shows the domain organization of EFHC1 protein, the positions of various mutations found in JME families, and their frequency (number of independent families with a given mutation). Mutation numbering is based on the GenBank reference protein sequence with accession number 608816. A, Africa; B, Brazil; DM10, Domain 10; H, Honduras; IN*, India; IS, Israel; IT, Italy; J, Japan; M, Mexico; NY, New York. (b) Schematic representation of an EF-hand motif comprising two helixes—E and F—linked by a calcium-binding sequence. Symbolic representation of the same motif as a right hand in which the E helix corresponds to the index and the F helix to the thumb. Note: Part b of Figure 2 (EFH domain) has been reproduced with permission (publicly available) from: “http://www.ncbi.nlm.nih.gov/books/NBK98188/” Myoclonin1/EFHC1 in cell division, neuroblast migration, synapse/dendrite formation in juvenile myoclonic epilepsy. Jasper's Basic Mechanisms of the Epilepsies [Internet]. 4th edition. Noebels JL, Avoli M, Rogawski MA, Olsen RW, Delgado-Escueta AV, editors. Bethesda (MD): “http://www.ncbi.nlm.nih.gov/” National Center for Biotechnology Information (US); 2012.

Coinheritance

Twelve families were reported with EFHC1 variants cosegregating identically by descent with all disease-affected members16,17,19,23,25 and with two variants that did not cosegregate ( Table 1 ). We calculated Bayesian factor linkage likelihood to evaluate the significance of sequence variants29 based on the pedigrees as published. Eleven different genotyped EFHC1 variants that were found in these families were used as marker alleles. JME was assumed to be in linkage disequilibrium with the markers and at Ɵ = 0, with a standard penetrance model for JME (pnp2 = 0.001; pnpq = 0.7; pnq2 = 0.7). We corrected for ascertainment bias as proposed by Thompson et al.29

Table 1 Cosegregation analysis of EFHC1 variants found in juvenile myoclonic epilepsy families

Case–control association studies

Supplementary Table S2 online lists the study design of all published case–control studies, their population groups, their specific racial/ethnic groups and countries of their residencies, the number of index cases and controls, and the targets used in screening for EFHC1 mutations in 12 cohorts from 9 countries.

We studied the association of EFHC1 variants with JME or genetic generalized epilepsy index cases versus the association of EFHC1 variants with ancestry/race-matched controls as originally published. Table 2 summarizes the actual results of the unconditional exact homogeneity/independence test (Z-pooled method, one-tailed),30,31 which assessed whether the proportion of variants associated between the two groups reflected a statistically significant difference. Supplementary Table S3 online provides all the details of the case–control studies. Because many of the published studies on EFHC1 had insufficient control sample sizes, we also calculated odds ratios (OR) and statistical significance (P values) for both the study as published and the allele counts available in race-matched population groups from the ExAC database.14 An OR of 1.0 means that the variant does not affect the odds of having the disease; values higher than 1.0 indicate that there is an association between the variant and the risk for the disease.

Table 2 OR or risk for JME or GGE derived from case–control studies on putative pathogenic EFHC1 variants as originally published

Allele frequencies

We extracted the minor allele frequencies for all putative pathogenic EFHC1 variants in exomes from race-matched and ancestry-matched presumably normal populations of 6,503 persons stored in the 2013 Exome Variant Server13 (ESP6500SI-V2), in genome sequences of 2,504 persons of the 1000 Genomes Project12 (phase 3, release 16), and in 60,706, exomes collected by the ExAC consortium (Supplementary Table S4a,b online).14

Studies of epilepsy prevalence were selected primarily on the basis of whether they classified electroclinical syndromes such as genetic generalized epilepsies, JME, and childhood absence epilepsy in observed cases. Population prevalence for Hispanics was determined by a door-to-door study performed in rural Bolivia in 1999.32 Prevalence among Caucasians was determined by a study of the Norwegian National registry in 2015,33 and East Asian prevalence by a study of the regional registry of patients older than 15 years in Hong Kong, China.34 Another estimate of Southeast Asian prevalence was produced via a random cluster survey of Cambodian villages.35 Differing in methodology and ascertainment and not ideally race-matched to JME index cases studied, these studies represent the only data we could use to compare allele frequencies of EFHC1 variants with JME disease prevalence.

Algorithms predicting conservation and pathogenicity

We analyzed the theoretical pathogenicity of all 54 variants by applying: (i) four algorithms (PhyloP,36,37 SiPhy,38,39 GERP++,40 and PHASTCons41) that measure evolutionary conservation at the level of DNA base pairs and (ii) seven algorithms (SIFT,42,43 PolyPhen2-HVAR,44,45 LRT,46 Mutation Taster,47 Mutation Assessor48 and FATHMM,49 and CADD50) that calculate amino acid conservation and the likelihood of deleteriously altering the encoded amino acid function (Supplementary Table S5ac online). These algorithms were chosen because they were included in the database of Nonsynonymous Functional Predictions (dbNSFP v2.6),51,52 which is available for download from the ANNOVAR53 website. Variants found in non-RefSeq-defined transcripts were annotated manually when possible. To determine whether variants might affect the splicing consensus, all were run through the online algorithms provided by Human Splicing Finder.54 To assess the ACMG criterion for benign status (BP4), we used the Multiz alignment of 62 mammalian species available on the UCSC Genome Browser to determine whether the amino acid substitution would compromise function.

Variant-specific experimental functional studies on pathogenicity

Only 20 variants have undergone variant-level experimental functional studies (Supplementary Table S6ac online). These consist of five EFHC1 mutations and three polymorphisms, which we originally reported in 2004 (ref. 16) and 12 EFHC1 mutations reported from India (ref. 28). We summarize the results of functional studies in three tables: Supplementary Table S6a online, which presents the molecular and cellular models, Table 6b, which displays in vivo models of neurodevelopment, and Table 6b, which covers the protein–protein interactions (PPIs). We also report the level of statistical significance between each variant and the wild-type allele published in original articles.55,56,57,58

Paradigm V—gene level functional studies on pathogenesis

EFHC1 function has been studied at the gene level by knocking out EFHC1 orthologs in both mice59,60 and flies (Supplementary Table S7 online).61 Both mouse and fly models presented seizure-related and electroclinical phenotypes and neuroanatomical measures similar to those in the variant-level functional studies. These measures and their level of statistical significance with respect to the wild type as published are summarized in Supplementary Table S7 online.

Results

EFHC1

Ensembl identifies a total of four alternative coding transcripts and two noncoding transcripts for EFHC1 (see Supplementary Table S1 online). Most reported EFHC1 variants associated with epilepsies cause amino acid substitutions in two of the isoforms: transcripts A (length: 640aa) and B (length: 278aa). The two transcripts share the first 241-amino-acid sequence, and transcript B translates two additional variants, causing a frameshift and nonsense change in the protein, but it is prematurely truncated at the end of exon 4. Transcript B retains the first DM10 domain (see Figure 2 for an illustration of EFHC1). Transcript B is not expressed in the mouse brain, but it is expressed in human and chimpanzee neural tissue. Transcripts C and D have not been evaluated for their role in epilepsy or their expression in other mammalian species.

Cosegregation

Of 33 putative pathogenic variants, eight were reported to cosegregate with 40 clinically and EEG polyspike wave–affected family members across two to four generations of 12 JME families in four separate cohorts from Mexico, Honduras, Italy, and Tennessee (USA).16,17,19,23 The remaining variants were detected only in singletons ( Table 1 ).

Table 1 summarizes the pedigrees studied and calculates the estimated penetrances within and across all families. We reanalyzed the patterns of cosegregation with JME and the EEG polyspike wave trait in affected carrier families and calculated a Bayesian method for evaluating causality of variants29 using eight EFHC1 variants as markers ( Table 1 ). Our reanalysis identified the following variants as 3.07- to 1,000-times more likely (LOD: 0.4874 to 3.0300) to have cosegregation occur not by chance. P77T/R221H (as a double heterozygous variant), R221H, R118C, R221H, D253Y, F229L, and Q277X were identified as single autosomal dominant heterozygous variants.

The families in which R353W and D210N were found were too small and hence underpowered to detect significant linkage. P77T/R221H and trB:Q277X both had LOD scores >2.0 (pathogenic OR >100×), which is suggestive of linkage. R221H (including in the two families where it was found with P77T) had a LOD score >3.0, which is significant for linkage. Our reanalysis does not show whether the combination of P77T and R221H produces a disruption of EFHC1 function or whether R221H by itself is causal.

One reported variant, P429P, segregated with two out of three affected individuals in a family from Italy, and presumptively in trans with the nonpathogenic variant R182H, which segregated with all three affected individuals.19 The Bayesian LOD score for this variant was −2.2438. Another variant, R353Q, cosegregated with one of the four clinically affected members in the family25 with a Bayesian LOD score of −4.9308. These variants meet the “strong” criterion (BS4) for benign status.

Our reanalysis of cosegregating families indicated that EFHC1 variants were transmitted in an autosomal dominant manner and suggested the implication of EFHC1 variants in JME. In weighing the value of cosegregation as ACMG evidentiary data, we first considered it as only “supporting” evidence for pathogenicity because it is not clear from ACMG guidelines how to weight evidence of cosegregation in one large multi-generational family. However, because there was increasing segregation data in at least seven families from diverse ethnic backgrounds, these cosegregation data could be weighed by ACMG as moderate to strong evidence.3

Case association studies

The 12 case–control association studies yielded 32 putative pathogenic variants (note: R276X was not studied in case–controls) ( Table 2 and Supplementary Table S3 online). An additional four variants (L9L, R353P, E357K, and P429P) were originally published as polymorphisms, because they either were found in one control or produced a synonymous change in the protein product. These four variants were found to be absent in their race-matched population in ExAC.14 The final variant included in Supplementary Table S3 online is R159W, which has been reported as benign in nearly all studies but reached statistical significance during case association within two studies. It should be noted that the variant was no longer significant when compared with the ExAC population data. For each variant, all studies that genotyped the variant in a case or a control are summarized in Table 2 and all details are presented in Supplementary Table S3 online.

Within the scope of their specific study design, nine variants reached statistical significance and had an OR >5: P77T, D210N, R221H, F229L, trB: G264Vfs*280, trB:Q277X, T252K, D253Y, and c.*91T>C. When using the race-matched ExAC population data, all of the variants listed here still reached statistical significance within their own study, and an additional 18 variants also met the ACMG criterion for strong evidence (PS4). Three variants were replicated and met the criterion in two or more studies: R221H, F229L, and R353W.

Allele frequencies

Eight variants were completely absent across all populations in the ExAC database: C259Y, R276X, E322K, K378E, A394S, P429P, c.1640+1G>A, and c.*91T>C (Supplementary Table S4a,b online). Another eight variants were absent in the race-matched subpopulation in ExAC: L9L, R118C, R152Q, I174V, T252K, R353P, E357K, and Y485H. Two additional variants were absent in the race-matched European American subpopulation of the ESP6500si database: R221H and R353W.

Of the 54 variants we examined, 17 were found at allele frequencies greater than the expected population prevalence: P77T, H89R, R182C, D210N, R221H (only in the Latino subpopulation), F229L, trB:G264Vfs, trB:Q277X, R294H, R353W, R436C, M448T, T508R, N607N, I619S, and Y631C. The ESP6500si database corroborated only three of these variants as being greater than the population prevalence: F229L, M448T, and R294H. Finally, six variants met the stand-alone BA1 criteria based on their allele frequency in ExAC: c.-148_147delGC, R159W, R182H, c.573+10A>G, I619L, and c.*121C>A. ExAC and ESP6500si do not target intronic regions, so certain variants do not have allele frequencies. 1000 Genomes identifies two additional intronic variants, c.1492+175_176delTT and c.1851+59C>T, that were at allele frequencies >0.05.

In silico analysis for conservation and damaging effects

All exonic variants, except the two transcript B variants, were predicted to be evolutionarily constrained across a phylogeny of species by at least one nucleotide conservation measure (Supplementary Table S5ac online). None of the conservation scores predicted either of the transcript B variants to be conserved across species. This is predictable because transcript B was not found to be expressed in the mouse brain, but it was expressed in humans and chimpanzees; therefore, it may not be under evolutionary constraint across the entirety of the vertebrate or mammalian clades. V556L was not found to be conserved by any of the nucleotide-based calculations, but it was determined to be part of a conserved element by GERP++ spanning exon 10.

Eight exonic variants were predicted to be benign by all eight pathogenicity algorithms: P77T, R221H, R353P, E357K, A394S, M448T, K378E, and V556L. Of these variants, only P77T and R221H underwent experimental functional testing and were found to have a significant effect on several measures of calcium channel–dependent activities and neurodevelopment (see Results and Discussion, and Supplementary Table S6a,b online). The other variants that showed similar significant differences in functional experiments were D253Y (predicted to be pathogenic by five of the algorithms), D210N (predicted to be pathogenic by seven), and F229L (predicted to be pathogenic by four algorithms). The only variant that was predicted to be pathogenic by all eight measures—the ACMG requirement for “supporting” pathogenic evidence (PP3)—was R436C. It should be noted that the FATHMM algorithm predicted all but one of the variants to be benign. Five variants were found as the reference allele in two or more mammalian species: R221H, R296H, M448T, and Y355C, which meets the requirement for “supporting” benign evidence (BP4).

Human Splice Finder results are summarized in Supplementary Table S5c online. Ten variants were predicted to create new donor or acceptor sites, and one variant (c.1640+1G>A) disrupted the wild-type donor sites. Thirty-one variants were predicted to disrupt splicing enhancer motifs, and 17 variant were predicted to create new splicing silencer motifs. Eight variants were predicted to not alter the splicing consensus by any of the algorithms. We applied the PP3 and BP4 criteria only to synonymous variants and those affecting canonical splicing sites.

Variant-specific experimental evidence for pathogenicity

Only a few variants of EFHC1 have undergone functional testing (Supplementary Table S6ac online). Five EFHC1 variants were originally reported in JME patients from Mexico16 (P77T, D210N, R221H, F229L, and D253Y), in reverse TRPM2-induced (transient receptor potential calcium permeable M2 channel) apoptosis, and in current densities.57 Four of these variants (the exception was P77T, which was not tested) produce severe mitotic spindle defects during cell division55 and impair early radial and tangential migration of neuroblasts,56 thus providing experimental evidence that EFHC1 variants are damaging to gene function. Supplementary Table S6a,b online show the published statistical results of 14 experimental measures demonstrating a significant difference between the tested variants and the wild-type protein. Three variants (R159W, R182H, and I619L) classified as benign polymorphism did not produce statistically significant results in almost all of the measures in comparison to the wild type.16,55,56,57 R182H showed a small, but significant difference in apoptotic activity in primary mouse hippocampal neurons in culture.16

Most recently, Sahni et al.58 demonstrated that wild-type EFHC1 proteins interacted with products of 16 genes. Of the EFHC1 disease alleles tested, R221H and A394S perturbed EFHC1 interaction with all 16 genes. P77T disrupted interaction with all but one protein (TEX11, which is expressed exclusively in male germ cells and therefore may not play a role in epileptogenesis), and T508R disrupted interaction with four proteins. The PPI profiles of the two polymorphisms, R159W and M448T, did not show any significant perturbation of the wild-type network (see Supplementary Table S6c online for summary of these findings). Interactions with five gene products—CCDC36 (Coiled-coil Domain Containing 36), EIF4ENIF1 (Eukaryotic Translation Initiation Factor 4E Nuclear Import Factor 1), REL (V-Rel Avian Reticuloendotheliosis Viral Oncogene Homolog), TCF4 (Transcription Factor 4), and ZBED1 (Zinc-Finger, BED containing 1)—were interrupted by all four of the EFHC1 disease alleles. Four of these proteins—EIF4ENIF1, REL, TCF4, and TRAF2—play important roles in the regulation of neuron-specific differentiation or apoptosis.62,63,64,65 Two of the interactors, GOLGA2 and ZBED1, have been implicated in cell-cycle control and cell proliferation.66,67 TRIP6 is a positive regulator of lysophosphatidic acid (LPA)-induced cell migration.68 Finally, three of the proteins whose interactions were perturbed—REL, TRAF2, and TRIP6—play roles in the NF-κB signaling pathway and have been implicated in the processes of hippocampal synaptic plasticity and memory.69

Gene-level experimental evidence for pathogenesis

Supplementary Table S7 online summarizes 20 gene-level experimental studies of efhc1-deficient mouse59,60 and fly models.61 The most impressive experimental evidence for disease causality seeing the epileptic disorder manifest in a knockout animal model. Supplementary Video S1 online shows an efhc1KO mouse (efhc1−/−) having several massive myoclonias and a CTC convulsion.60 The massive myoclonic seizures and CTC convulsions shown in the video can occur in homozygous efhc1−/− and heterozygous efhc1+/− mutant mice.

Heterozygous (efhc1+/−) and null (efhc1−/−) mutants exhibit more spontaneous positive myoclonias than wild-type mice and quick, high-amplitude polyspikes on their EMG.59 Two measures of seizure susceptibility—the percentage of animals exhibiting generalized seizures within 600 s after treatment with pentylenetretrazole (PTZ) and latency to clonic seizures after PTZ treatment—are significantly increased in the same measures of wild-type mice, with the greatest significance reached in 9- to 12-month-old mice. Both efhc1+/− and efhc1−/− mutant mice show ependymal cilia in the lateral ventricles, with abnormally decreased movements at 3 months and slightly enlarged lateral ventricles and decreased hippocampal volume at 12 months.59 In mice with massive myoclonias and grand mal CTC convulsions, cell death occurs in ependymal cells along periventricular zones (both lateral and fourth ventricles) and in striatal cells, while disorganization of cell layering in paraventricular nucleus of hypothalamus, thalamus, hippocampus, and neocortex is present.60

The notion of overgrown and overexcitable neurons and neurites was further examined in vivo in Drosophila melanogaster.61 Knocking out the Drosophila DEFHC1.1 gene, a homolog of human EFHC1, resulted in supernumerary synaptic boutons at the neuromuscular junction synapse and increased terminal branching of dendritic arborization, along with increased spontaneous neurotransmitter release. The notion of overgrown and overexcitable neurons and neurites was further solidified when DEFHC1.1 overexpression rescued and markedly reduced dendrite branching and complexity.61 These rescue experiments strongly recommended by NHGRI core guidelines argue convincingly that EFHC1’s main function is to restrain excessive synaptic and dendritic growth and arborization.

Discussion

In research laboratories, the primary purpose of the search for disease-associated variants is to identify molecular disease mechanisms that can lead to a quest for a curative molecule.1 During clinical laboratory testing, however, the primary purpose of searching for disease-causing sequence variants is to support medical decision making.3 Here, in the reanalysis of EFHC1 variants, we search for disease mechanisms that produce the most feared and most neurologically damaging seizure phenotype—the grand mal CTC convulsions of JME—while we weigh evidence for or against pathogenicity of a given EFHC1 variant that can be used in medical decision making.

Applying NHGRI guidelines and ACMG classification to all 54 EFHC1 variants in literature, we show that EFHC1 is definitely implicated in JME. Table 2 provides evidence used to classify all the variants as “pathogenic” and “likely pathogenic.” Supplementary Table S8a online summarizes the ACMG criteria used for classifying all 54 EFHC1 variants included in this study. Using ACMG combinatorial criteria, we classified 9 EFHC1 variants as “pathogenic,” 14 variants as “likely pathogenic,” and 20 variants “of unknown significance” ( Table 3 ). Eight EFHC1 variants were benign and 3 were likely benign.

Table 3 Combination of evidentiary data, their weight, and strength of evidence used to classify variants into the five-tier American College of Medical Genetics and Genomics classification of sequence variants

Of the “pathogenic” variants, the five original EFHC1 variants discovered in Mexican families16 met two “strong” criteria for pathogenicity; they were found to be statistically increased in JME cases in comparison to controls in at least one study and also demonstrated a significant difference in 4 to 10 experimental measures of neuron function and neurodevelopment.55,56,57 trB:Q277X was found to be statistically increased in disease cases compared with controls, and it was found de novo in a singleton whose parentage was confirmed.17 R276X, a nonsense variant that truncates the final six exons of the primary transcript, was associated with a JME patient and reported in the ClinVar database.15 It was also found as a de novo mutation in a single case of epileptic encephalopathy (personal communication during presentation of a poster by S. Jamuar et al. during 2013 American Epilepsy Society Meeting.), thereby meeting the ACMG “very strong” criteria for pathogenicity (PVS1).

Within the scope of their originally reported individual studies, 20 EFHC1 variants did not reach statistical significance during case–control association because these studies, having an insufficient number of racial/ethnic- and ancestry-matched controls, were statistically underpowered. However, when compared with racial/ethnic-matched ExAC population controls, these same EFHC1 variants have ORs >5 and reach statistical significance.

Fourteen variants classified as “likely pathogenic” would have been classified as “unknown” if only the results within their specific case–control study were used. Although most of these variants were absent in the study controls run, two variants—L9L and E357K—were found in the study ancestry-matched controls but were completely absent from Europeans in ExAC.14 These variants may be examples of population-specific benign variations whose case association would have resulted in false positives if the ancestry-matched population controls were not included in the study and only the public genome databases were used as reference panels. These observations demonstrate the necessity of using both (i) study controls, matched for ancestry and country of residence, who are well screened to be free of epilepsy and febrile convulsions in their families and (ii) large public genome databases that have not been screened for epilepsy.

In the framework of the larger population groups used by ExAC,14 we attempted to quantify the statistical effect size that pathogenic variants would have on JME patients. These estimates were calculated using the number of JME index cases in each population group who were screened for variants in all exons of EFHC1 (see Supplementary Table S2 online). Among Latinos, variants meeting the ACMG standard of “pathogenic” were identified as heterozygous mutations in 4.10% of individuals with JME, and “likely pathogenic” variants were found in another 3.59%. In Caucasians, both “pathogenic” and “likely pathogenic” variants were discovered in 2.59% (each) of individuals with JME. Currently, the variants discovered in India only meet the standard for “likely pathogenic” and account for 1.46% of all its screened JME patients.28 However, we are still awaiting publication of experimental functional studies of the EFHC1 variants discovered in JME patients in India. These studies will probably change their classification. Finally, the JME cohort from Brazil found variants meeting the “pathogenic” standard in only 2.94% of their JME cases (unpublished observations).

Further evidence for a large effect on the phenotype of JME is provided by a nonconsanguineous Moroccan-Jewish family in which three of their seven children afflicted with intractable epilepsy during infancy and who died at 18–36 months.24 Whole-exome sequencing of the family revealed a homozygous mutation of F229L in two of the three affected children (the third child could not be tested). These children began experiencing seizures 6–12 h after birth and subsequently developed severe psychomotor retardation and microcephaly. Brain MRI of one of the children at 2 years of age exhibited decreased cerebellar volume, hypomyelination, and enlarged lateral and third ventricles, consistent with both our variant-specific experiments and gene-level experiments in knockout models of EFHC1 (efhc1−/− KO mice and the loss-of-function fly model).

When a variant’s allele frequency and its population prevalence are greater than the disease prevalence, ACMG recommendations consider this result “strong” criterion for the benign status for the variant. This criterion needs to be revisited by the AMG workgroup and rediscussed in the context of nonmonogenic disorders because of the uncertainty of the prevalence of specific diseases such as a specific epilepsy syndrome like JME. Although this criterion did not change the pathogenic classification of EFHC1 variants in question according to the combinatorial rules of ACMG, 5 of the 7 “pathogenic” variants and 2 of the 16 “likely pathogenic” variants were found to have allele frequencies higher than the expected prevalence of the JME in their respective populations. There are three possible explanations for this observation:

  • 1. The population prevalence studies are poorly matched to the populations in which the variants were found. Most epidemiology studies focused only on the prevalence of “active” epilepsies, a measure that is important for public health policy and estimation of economic impact. However, the lifetime prevalence, which would attempt to also capture individuals with a history of epilepsy prior to the date of ascertainment but who may be in remission or no longer seeking treatment, would be a better measure for comparison in genetic studies. Furthermore, studies of epilepsy prevalence frequently do not capture further information of seizure types or classifications of electroclinical syndromes. When they do, different diagnostic and ascertainment criteria make it difficult to compare results between studies.

  • 2. The ExAC database, our primary reference panel for estimating minor allele frequencies, includes exomes of several disease populations. Notably, these three studies were targeted toward the identification of neurological diseases such as schizophrenia and Tourette syndrome. Although EFHC1 has not been specifically implicated in either of these conditions, other genes implicated in JME and childhood absence epilepsy show overlap with those implicated in schizophrenia, specifically GABRA1, GABRB3, GABRG2 (refs. 70,71), and CHRNA7 (ref. 72), indicating that the ExAC database may be enriched with these alleles.

  • 3. Like variants associated with other diseases that have a complex genetic architecture, some EFHC1 variants may not be sufficient by themselves to cause epilepsy; however, they may have an additive effect toward the pathogenesis of JME disease in conjunction with other alleles associated with epilepsy. This would also explain the discovery of JME disease alleles in screened study controls20,21 as well as in the case of multiple disease alleles in linkage disequilibrium and the autosomal dominant transmission with incomplete penetrance that we see in our large families.

In the case of a fully penetrant monogenic disease with a large statistical effect size (the disease model used when creating both the NHGRI and ACMG standards), the comparison of allele frequencies to population prevalence may provide an appropriate guideline for classifying diseases. However, genetic cases of JME exhibit a high degree of phenotypic heterogeneity, and even within the affected families only 50.94% of all heterozygous carriers actually develop clinical epilepsy. Of all the implicated JME genes, EFHC1 has been replicated by more studies than any other, but “pathogenic” and “likely pathogenic” variants still only account for approximately 3% of JME cases in Caucasians and 8% of those in Hispanics. JME does not perfectly fit the disease model for which these standards were created. However, even with this limitation, the ACMG and NHGRI guidelines enabled us to classify the purported disease-causing variants in EFHC1.

In conclusion, we found the NHGRI and ACMG guidelines to be useful in quantifying the amounts and types of evidence that implicate sequence variants of EFHC1 as disease-causing in JME. Vetting EFHC1 variants through NHGRI guidelines1 definitely implicates these EFHC1 variants in JME. Using ACMG recommendations, scoring rules, and combinatorial criteria to choose a classification from the five-tier system,3 our reanalysis showed that 9 EFHC1 variants are “pathogenic,” 14 are “likely pathogenic,” and 20 are “variants of unknown significance.” (See Supplementary Table S8b online for criteria and classification of all variants.) NHGRI gene-level evidence and variant-level evidence establish EFHC1 as the first non–ion channel microtubule–associated protein53 whose mutations disturb R-type VDCC16 and TRPM2 calcium currents57 in overgrown synapses and dendrites61 within abnormally migrated dislocated neurons,56 thus explaining myoclonic and grand mal CTC convulsions and “microdysgenesis”73,74 neuropathology of JME.

Disclosure

The authors declare no conflict of interest.