GWAS meta-analysis of over 29,000 people with epilepsy identifies 26 risk loci and subtype-specific genetic architecture

Epilepsy is a highly heritable disorder affecting over 50 million people worldwide, of which about one-third are resistant to current treatments. Here we report a multi-ancestry genome-wide association study including 29,944 cases, stratified into three broad categories and seven subtypes of epilepsy, and 52,538 controls. We identify 26 genome-wide significant loci, 19 of which are specific to genetic generalized epilepsy (GGE). We implicate 29 likely causal genes underlying these 26 loci. SNP-based heritability analyses show that common variants explain between 39.6% and 90% of genetic risk for GGE and its subtypes. Subtype analysis revealed markedly different genetic architectures between focal and generalized epilepsies. Gene-set analyses of GGE signals implicate synaptic processes in both excitatory and inhibitory neurons in the brain. Prioritized candidate genes overlap with monogenic epilepsy genes and with targets of current antiseizure medications. Finally, we leverage our results to identify alternate drugs with predicted efficacy if repurposed for epilepsy treatment.

The epilepsies are a heterogeneous group of neurological disorders, characterized by an enduring predisposition to generate unprovoked seizures 1 . It is estimated that over 50 million people worldwide have active epilepsy, with an annual cumulative incidence of 68 per 100,000 persons 2 .
Similar to other common neurodevelopmental disorders, epilepsies have substantial genetic risk contributions from both common and rare genetic variations. Analysis of the epilepsies benefits from deep phenotyping, which allows clinical subtypes to be distinguished 3 , in contrast to other common neurodevelopmental disorders, where phenotypic subtypes are more difficult to define. Differences in the genetic architecture of clinical subtypes of epilepsy are also emerging, complementing the clinical partitioning [4][5][6][7] . The rare but severe epileptic encephalopathies are usually nonfamilial and are largely caused by single de novo dominant variants, often involving genes encoding ion channels or proteins of the synaptic machinery 8 . Both common and rare variants have been shown to contribute to the milder and more common focal and generalized epilepsies. This is particularly true for generalized epilepsy, which primarily consists of genetic generalized epilepsy (GGE) 4,5,9,10 . Nevertheless, previous genetic studies of common epilepsies have explained only a limited proportion of the heritability attributable to common genetic variants (single-nucleotide polymorphism (SNP)-based heritability): 9.2% for focal epilepsy and 32.1% for GGE [4][5][6]10 .
Epilepsy is typically treated using antiseizure medications (ASMs). However, despite the availability of over 25 licensed ASMs worldwide, a third of people with epilepsy experience continuing seizures 11 . Diet, surgery and neuromodulation represent additional treatment options that can be effective in small subgroups of patients 12 . Accurate classification of clinical presentations is an important guiding factor in epilepsy treatment.
Here we report the third epilepsy genome-wide association study (GWAS) meta-analysis by the International League against Epilepsy (ILAE) Consortium on complex epilepsies. It comprises a total of 29,944 deeply phenotyped cases recruited from tertiary referral centers and 52,538 controls, approximately doubling the previous sample size 4 . Results suggest markedly different genetic architectures between focal and generalized forms of epilepsy. Combining these results with those from less-stringently phenotyped biobank and deCODE genetics epilepsy cases did not substantially increase the signal, despite almost doubling the sample size to 51,678 cases and 1,076,527 controls. Our findings shed light on the enigmatic biology of generalized epilepsy and underline the importance of accurate syndromic phenotyping. They may also facilitate drug repurposing for new therapeutic approaches.

Study overview
We performed a GWAS meta-analysis by combining the previously published effort from our consortium 4 with unpublished data from the Epi25 collaborative 10 and four additional cohorts (Supplementary Tables 1 and 2). Our primary mixed-model meta-analysis comprises 4.9 million SNPs tested in 52,538 controls and 29,944 people with epilepsy. Of these cases, 16,384 had neurologist-classified focal epilepsy (FE) and 7,407 had GGE. The epilepsy cases were primarily of European descent (92%), with smaller proportions of African (3%) and Asian (5%) ancestry (Supplementary Table 3). Cases were matched with controls of the same ancestry, and GWAS analyses were performed separately per ancestry. We then performed multi-ancestry meta-analyses for the broad epilepsy phenotypes 'FE' (n = 16,384 cases) and 'GGE' (n = 7,407 cases). We further conducted meta-analyses in individuals of European ancestry for the well-defined GGE subtypes: juvenile myoclonic epilepsy (JME; n = 1,732), childhood absence epilepsy (CAE; n = 1,049), juvenile absence epilepsy (JAE; n = 662) and generalized tonic-clonic seizures alone (GTCSA; n = 485). We did the same for the FE subtypes: FE with hippocampal sclerosis (HS; n = 1,260), FE with other lesions (n = 4,213) and lesion-negative FE (n = 5,778). The same controls (n = 42,436) were shared across the different subphenotypes. We ran a variety of follow-up analyses to identify potential sex-specific signals, obtain biological insights and identify opportunities for drug repurposing. Sample size prevented the inclusion of other ancestries in the subtype analyses.

GWAS for the epilepsies
Our 'all epilepsy' meta-analysis revealed four genome-wide significant loci, of which two are new (Fig. 1). Similar to our previous GWAS 4 , the 2q24.3 locus comprised two independently significant signals (Supplementary Table 4). We used ASSET to assess pleiotropy between FE and GGE. The 2q24.3 and 9q21.13 signals showed pleiotropic effects at a genome-wide significance level, with concordant SNP effect directions for both forms of epilepsy (Supplementary Table 5). The 2p16.1 and 10q24.32 loci were primarily derived from GGE. The FE analysis did not reveal any genome-wide significant signals.
Our 'GGE' meta-analysis uncovered a total of 25 independent genome-wide significant signals across 22 loci, of which 13 loci are new. The strongest signal of association (P = 6.6 × 10⁻²¹), located at 2p16.1, comprises three independently significant signals. Similarly, the new locus 12q13.13 comprised two independently significant signals (Supplementary Table 4). Forest plots and P-M plots of these signals show that they are consistent across all four GGE subphenotypes, with some exceptions (Supplementary Figs. 1 and 2).
We applied multitrait analysis of GWAS (MTAG) 17 to exploit the correlation between FE and GGE, boosting the effective sample size. Results were concordant with our main analysis, and no new signals emerged (Supplementary Fig. 3).
Functional annotation of the 1,082 genome-wide significant SNPs across the 22 GGE loci and the 270 SNPs from the 'all epilepsy' loci revealed that most variants were intergenic or intronic (Supplementary Data 1). Eight of 1,082 (0.7%) GGE SNPs were exonic. Five of these were located in protein-coding genes and were missense variants. We identified one exonic 'all epilepsy' SNP (rs7580482, synonymous), located in SCN1A. Seventy-four percent of 'all epilepsy' SNPs and 64% of GGE SNPs were located in open chromatin regions, as indicated by a minimum chromatin state of 1-7 (ref. 14). Further annotation by Combined Annotation-Dependent Depletion (CADD) scores predicted that 11 'all epilepsy' and 50 GGE SNPs were deleterious (CADD score > 12.37) (ref. 15). LDAK heritability analyses showed significant enrichment of signal in 'super-enhancers' (Supplementary Table 6). This suggests that GGE SNPs regulate clusters of transcriptional enhancers that control the expression of genes that define cell identity 16 .
To assess potential syndrome-specific loci, we performed GWAS on seven well-defined FE and GGE subtypes (Supplementary Fig. 4a-g). We found three genome-wide significant loci associated specifically with JME (n = 1,813). One of these was new (8q23.1), and the other two (4p12 and 16p11.2) had been reported previously 4 . Our analysis of CAE (n = 1,072) consolidated an established genome-wide significant signal at 2p16.1, which was also observed in the GGE and 'all epilepsy' GWAS. We did not find any genome-wide significant loci for JAE (n = 671), GTCSA (n = 499), 'nonlesional FE' (n = 6,367), 'FE with HS' (n = 1,375) or 'FE with other lesions' (n = 4,661).
MTAG 17 analysis of individual GGE subphenotypes showed concordance with the main GGE GWAS and identified no new loci. This analysis also confirmed that the majority of GWAS-significant SNPs in GGE are overlapping (Supplementary Figs. 5 and 6 and Supplementary Table 7).
The vast majority of loci reported in our previous effort 4 remained genome-wide significant. A summary of loci that fell below the genome-wide significance threshold is provided in Supplementary Table 8.
Genomic inflation was comparable to that in our previous GWAS 4 . All linkage-disequilibrium score regression (LDSC) intercepts were lower (Supplementary Table 9), suggesting that the signals are primarily driven by polygenicity. Computation of the attenuation ratio suggested that part of the inflation signal might be due to some form of bias (for example, confounding or population stratification), in particular for FE (0.58) 13 . The attenuation ratio was lowest for GGE (0.11), which includes the vast majority of significant loci (Supplementary Table 9).
We used MAGMA to calculate a gene-based association score that aggregates all SNPs inside each gene (Methods) 19 . After Bonferroni correction for 16,371 tested genes (P < 0.05/16,371; Supplementary Data 3), this analysis yielded 39 significant genic associations. Six were with 'all epilepsy' and 37 with GGE, and four genes overlapped between the two analyses. Thirteen of these 39 genes mapped to regions outside the genome-wide significant loci from the single-SNP analyses.
Next, we performed a transcriptome-wide association study (TWAS) to assess whether epilepsy was associated with differential gene expression in the brain (Methods) 20,21 . These analyses revealed significant associations with 27 genes in total: 13 with 'all epilepsy' and 16 with GGE, two of which were shared by both phenotypes (Supplementary Data 4). Nineteen of the 27 genes mapped outside the 26 loci identified through the GWAS. Using summary-data-based Mendelian randomization (SMR) 22 , we identified a potentially causal relationship between brain expression of RMI1 and 'all epilepsy'. We also identified such relationships between GGE and brain expression of RMI1, CDK5RAP3 and TVP23B (Supplementary Data 5).
Of note, expression of RMI1 was associated with GGE in both TWAS (P = 4.0 × 10⁻¹⁰) and SMR (P = 5.2 × 10⁻⁸), as well as with 'all epilepsy' (TWAS P = 1.3 × 10⁻⁶; SMR P = 2.6 × 10⁻⁶). RMI1 has a crucial role in genomic stability 23 . It has not previously been associated with epilepsy or any other Mendelian trait (OMIM, 610404).
We used a combination of ten different criteria to identify the most likely implicated gene within each of the 26 associated loci from the meta-analysis (Methods). This resulted in a shortlist of 29 genes (Table 1; see Supplementary Data 6 for scores of all mapped genes). Of these, ten are monogenic epilepsy genes, seven are known targets of currently licensed ASMs and 17 are associated with epilepsy for the first time.
The strongest association signal for GGE was found at 2p16.1. This is consistent with our previous results, in which we implicated VRK2 or FANCL 24 . Our gene prioritization analysis suggests the transcription factor BCL11A as the culprit gene, located 2.5 Mb upstream of the lead SNPs at this locus. Two of the three lead SNPs are in enhancer regions (as assessed by chromatin states in brain tissue) that are linked to the BCL11A promoter via 3D chromatin interactions (Supplementary Fig. 8). Rare variants in BCL11A were recently associated with intellectual disability and epileptic encephalopathy 25 . However, interrogation of the MetaBrain expression quantitative trait loci (eQTL) …
… two-sided −log₁₀ P value is on the y axis. New genome-wide significant loci are highlighted in red, and loci previously associated with epilepsy in orange. New loci were those not previously reported as genome-wide significant in epilepsy GWASs. Annotated genes are those implicated by our gene prioritization analyses. See Supplementary Fig. 7 for QQ plots. QQ, quantile-quantile.
… ten criteria/methods, after which the gene with the highest score in the locus was selected as the prioritized gene. Genomic coordinates for each locus (hg19) can be found in Supplementary Table 4. Two-tailed P values and z scores were obtained by fixed-effects meta-analysis weighted by effective sample sizes.
Total, number of satisfied criteria for gene prioritization; missense, the locus contains a missense variant in the gene; TWAS, significant transcriptome-wide association with the gene; SMR, significant summary-based Mendelian randomization association with the gene; MAGMA, significant genome-wide gene-based association; PoPS, gene prioritized by polygenic priority score; brain exp, the gene is preferentially expressed in brain tissue; brain-coX, the gene is prioritized as co-expressed with established epilepsy genes; KO mouse, knockout of the gene causes a neurological phenotype in mouse models; monogenic, the gene is a known cause of monogenic epilepsy.
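The scoring rule in the Table 1 legend counts the satisfied criteria per gene and takes the highest-scoring gene in each locus. It can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline, and it rests on several assumptions:

- Only the nine criteria named in the legend are included, because the tenth is not named there.
- All criteria are binary and equally weighted.
- Every gene tied at the top score is retained. This is one way a locus could yield more than one prioritized gene.

The loci, gene names and flags are hypothetical.

```python
from collections import defaultdict

# Criteria named in the Table 1 legend; the tenth criterion used in the
# study is not named there and is omitted from this sketch.
CRITERIA = ("missense", "twas", "smr", "magma", "pops",
            "brain_exp", "brain_cox", "ko_mouse", "monogenic")


def score_gene(flags):
    """Number of satisfied criteria (binary, equally weighted: an assumption)."""
    return sum(1 for criterion in CRITERIA if flags.get(criterion, False))


def prioritize(candidates):
    """Pick the top-scoring gene(s) per locus.

    candidates: iterable of (locus, gene, flags) tuples, where flags maps
    criterion name -> bool. Returns {locus: (best_score, sorted tied genes)}.
    Keeping every gene tied at the top score is an assumption of this sketch.
    """
    by_locus = defaultdict(list)
    for locus, gene, flags in candidates:
        by_locus[locus].append((score_gene(flags), gene))
    result = {}
    for locus, scored in by_locus.items():
        best = max(score for score, _ in scored)
        result[locus] = (best, sorted(g for score, g in scored if score == best))
    return result


if __name__ == "__main__":
    # Hypothetical loci, genes and criterion flags, for illustration only.
    demo = [
        ("locusA", "GENE1", {"twas": True, "magma": True, "brain_exp": True}),
        ("locusA", "GENE2", {"magma": True}),
        ("locusB", "GENE3", {"pops": True, "monogenic": True}),
        ("locusB", "GENE4", {"smr": True, "ko_mouse": True}),
    ]
    for locus, (score, genes) in prioritize(demo).items():
        print(locus, score, genes)
```

In the demo, locusA resolves to GENE1 (score 3), while locusB keeps both GENE3 and GENE4 (score 2 each).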

The HLA system and common epilepsies
The highly polymorphic HLA region has been associated with various neuropsychiatric and autoimmune neurological disorders. We therefore imputed HLA alleles and amino acid residues using CookHLA v1.0.1 (ref. 26). We tested association with the 'all epilepsy', focal and GGE phenotypes, as well as with the seven subphenotypes (Methods). No SNP, amino acid residue or HLA allele reached genome-wide significance (Supplementary Fig. 9). The most significant signal was an aspartate residue in exon 2 of HLA-B (position 31432494), which had a P value of 3.8 × 10⁻⁷ for GGE.

SNP-based heritability
We calculated SNP-based heritability using LDAK to determine the proportion of epilepsy risk attributable to common genetic variants. Liability-scale SNP-based heritabilities for GGE and its subtypes ranged from 39.6% to 90%, with the highest estimate for JAE (Supplementary Table 10). Power analysis showed that the current genome-wide significant SNPs explain only 1.5% of the phenotypic variance. A sample size of around 2.5 million individuals would be needed to identify the causal SNPs that explain 90% of GGE SNP-based heritability (Supplementary Fig. 10).
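SNP-based heritability estimated from case-control data is on the observed (0/1) scale, so it depends on the proportion of cases in the sample. A widely used transformation (Lee et al., 2011) converts it to the liability scale using two inputs: the population prevalence K and the sample case fraction P. The function below is a sketch of that transformation. This section does not give the prevalence values used for each epilepsy phenotype, so any K passed in is an assumption.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, K, P):
    """Convert observed-scale SNP heritability to the liability scale.

    h2_obs: SNP heritability on the observed case-control (0/1) scale
    K: population prevalence of the phenotype (assumed, 0 < K < 1)
    P: proportion of cases among genotyped samples (0 < P < 1)

    h2_liab = h2_obs * K^2 (1 - K)^2 / (P (1 - P) z^2), where z is the
    standard normal density at the liability threshold t = Phi^-1(1 - K).
    """
    if not (0 < K < 1 and 0 < P < 1):
        raise ValueError("K and P must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - K)   # liability threshold t
    z = std_normal.pdf(threshold)           # normal density at t
    return h2_obs * (K * (1 - K)) ** 2 / (P * (1 - P) * z ** 2)
```

For example, `observed_to_liability(0.2, K=0.003, P=0.12)` uses purely hypothetical values. The conversion is linear in `h2_obs`, so the chosen prevalence rescales every estimate by the same factor. This is one reason why liability-scale estimates for rare subtypes can carry wide confidence intervals.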
To further explore the heritability of the different epilepsy phenotypes, we used LDSC to perform genetic correlation analyses 28 . We found evidence for a strong genetic correlation among all four GGE syndromes (Supplementary Fig. 11 and Supplementary Table 11). We also observed the previously reported significant genetic correlation 4 between the focal nonlesional and JME syndromes. In the present analysis, CAE also showed a significant genetic correlation with the focal nonlesional cohort. Multivariate modeling of genetic correlation using genomic structural equation modeling (genomic SEM) 29 confirmed that most of the heritability signal is shared among the four GGE syndromes, with some subtype-specific signals (Supplementary Fig. 12).

Tissue and cell type enrichment
To further illuminate the underlying biology of the epilepsies, we used MAGMA 19 and data from the Genotype-Tissue Expression (GTEx) consortium (Methods). We assessed whether our GGE-associated genes were enriched for expression in specific tissues and cell types. We identified significant enrichment of associated genes expressed in brain and pituitary tissue (Supplementary Fig. 13). The implication of the pituitary gland in GGE might reflect a hormonal component to seizure susceptibility. Further subanalyses showed that our results were enriched for genes expressed in almost all brain regions, including subcortical structures such as the hypothalamus, hippocampus and amygdala (Supplementary Fig. 14). We did not find enrichment for genes expressed at specific developmental stages in the brain (Supplementary Fig. 15).
Cell-type specificity analyses of GGE data used various single-cell RNA-sequencing reference datasets (Methods). They revealed enrichment in excitatory as well as inhibitory neurons, but not in other brain cells such as astrocytes, oligodendrocytes or microglia (Supplementary Fig. 16). Similarly, stratified LD-score regression using single-cell expression data (Methods) did not reveal a difference between excitatory and inhibitory neurons (P = 0.18).

Gene-set analyses
MAGMA gene-set analyses showed significant associations between GGE and biological processes involving various functions in the synapse (Supplementary Data 7). To further refine the synaptic signal, we performed a gene-set analysis using expert-curated gene sets covering 18 different synaptic functions 30 . GGE was associated with intracellular signal transduction (n = 139 genes, P = 9.6 × 10⁻⁵) and excitability in the synapse (n = 54 genes, P = 0.0074). None of the other 16 synaptic functions showed any association (Supplementary Data 7). Genes involved with excitability include the N-type calcium channel gene CACNA2D2, implicated at the new GGE locus 3p21.31. N-type calcium channel blockers such as levetiracetam and lamotrigine are among the most widely used and effective ASMs for GGE as well as FE 31-33 . Together, these results suggest that the genes associated with GGE are expressed in excitatory as well as inhibitory neurons in various brain regions. There, they affect excitability and intracellular signal transduction at the synapse.
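MAGMA's competitive gene-set analysis is, at its core, a regression of gene-level association Z-scores on a gene-set membership indicator. It asks whether genes in the set are more strongly associated than genes outside it. The sketch below is a deliberately simplified version. It drops the covariates (such as gene size and density) and the gene-gene correlation correction that MAGMA applies, so it will overstate significance on real data. Gene names and Z-scores in any call are hypothetical.

```python
from statistics import NormalDist, fmean


def competitive_set_test(gene_z, gene_set):
    """Competitive gene-set test: are genes in `gene_set` more associated than other genes?

    gene_z: gene -> gene-level association Z-score (e.g., from a gene-based test)
    gene_set: set of gene names
    Regressing Z on a 0/1 set-membership indicator without covariates gives the
    pooled-variance two-sample t statistic computed here. The one-sided P value
    uses a normal approximation, adequate when thousands of genes are tested.
    Genes are treated as independent, unlike in MAGMA.
    """
    inside = [z for g, z in gene_z.items() if g in gene_set]
    outside = [z for g, z in gene_z.items() if g not in gene_set]
    n1, n0 = len(inside), len(outside)
    if n1 < 2 or n0 < 2:
        raise ValueError("need at least two genes inside and outside the set")
    m1, m0 = fmean(inside), fmean(outside)
    ss = sum((z - m1) ** 2 for z in inside) + sum((z - m0) ** 2 for z in outside)
    pooled_var = ss / (n1 + n0 - 2)
    t = (m1 - m0) / (pooled_var * (1 / n1 + 1 / n0)) ** 0.5
    p = 1 - NormalDist().cdf(t)   # one-sided: set genes more associated
    return t, p
```

The test is "competitive" because the reference is the rest of the genome, not a null of no association. A set of 139 genes must therefore beat the average gene, not merely show some signal.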

Sex-specific analyses
There are known sex-related patterns in the epidemiology of epilepsy. Although females have a marginally lower incidence of epilepsy than males, GGE is known to occur more frequently in females 34 . To test whether this sex divergence has a genetic basis, we performed sex-specific GWAS for 'all epilepsy', GGE and FE. These analyses revealed one female-specific genome-wide significant signal at 10q24.32 (lead SNP: rs72845653), containing KCNIP2. This locus was also implicated in our main GGE meta-analysis (lead SNP: rs11191156). However, the lead SNPs of these two signals show low allelic correlation (r² = 0.05; D′ = 0.87). Interestingly, the direction of effect of this signal is opposite in females and males. This sex difference is further corroborated by significant sex heterogeneity (P = 1.54 × 10⁻⁸) and a significant sex-differentiated GWAS test (P = 5.6 × 10⁻⁹) (ref. 35). Sex-related differences in KCNIP2 transcription levels in the human heart have previously been reported (ref. 36). We did not find any sex-divergent signals for 'all epilepsy' or FE. These analyses were limited by the reduced sample size and are prone to random fluctuation.
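Two standard statistics can test for a sex difference from per-sex summary statistics:

- A heterogeneity z-test on the difference between the female and male effects. For two groups, its square equals Cochran's Q with one degree of freedom.
- A 2-df sex-differentiated test that sums the sex-specific chi-squares, in the spirit of Magi et al. (2010).

The sketch below implements both. Whether ref. 35 uses exactly these forms is an assumption, and the female and male samples are assumed not to overlap.

```python
from math import exp, sqrt
from statistics import NormalDist


def sex_heterogeneity(beta_f, se_f, beta_m, se_m):
    """Two-sided test that the female and male effect sizes differ.

    Assumes the female and male GWAS use non-overlapping samples, so the two
    estimates are independent. z**2 equals Cochran's Q for two groups (1 df).
    """
    z = (beta_f - beta_m) / sqrt(se_f ** 2 + se_m ** 2)
    # 1 - cdf underflows for |z| above ~8; fine for the P values discussed here.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def sex_differentiated(beta_f, se_f, beta_m, se_m):
    """2-df association test allowing different effects in females and males.

    Sums the sex-specific chi-square statistics; for 2 df the chi-square
    survival function is exactly exp(-x / 2).
    """
    chi2 = (beta_f / se_f) ** 2 + (beta_m / se_m) ** 2
    return chi2, exp(-chi2 / 2)
```

The heterogeneity test captures the situation described above, where effects point in opposite directions in females and males. A sex-combined analysis would then partly cancel them.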
We used LDSC to assess the genetic correlation between male-only and female-only GWAS. The male and female GWAS of 'all epilepsy', FE and GGE were strongly genetically correlated (all rG > 0.9), and none of these correlations were significantly different from 1 (all P > 0.05). These results suggest that, with the exception of the female-specific 10q24.32 signal, the overall genetic basis of common epilepsy appears largely similar between males and females.

Genetic overlap between epilepsy and other phenotypes
To explore the genetic overlap of epilepsy with other diseases, we first used the GWAS Catalog 37 to cross-reference the 26 genome-wide significant epilepsy loci with other traits. We looked for significant associations (P < 5 × 10⁻⁸) at the same SNP, or at SNPs in strong LD with our lead SNPs (as detailed in Table 1). This analysis revealed 18 likely pleiotropic loci with previous associations across a variety of traits. The most common were cognitive, sleep, psychiatric, coronary and blood cell-related traits (Supplementary Fig. 20). The remaining eight loci appear to be specific to epilepsy (3p22.3, 4p12, 5q31.2, 7p14.1, 8q23.1, 9q21.13, 21q21.1 and 21q22.1).
We then performed genetic correlation analyses between 18 selected traits (Supplementary Table 12) and 'all epilepsy', GGE and FE using LDSC 13 . Traits were selected because epilepsy is a common comorbidity of the trait, because the trait shares pleiotropic loci with epilepsy, or both. Significant correlations (P < 0.05/54 = 0.0009) were found with febrile seizures, stroke, headache, ADHD, type 2 diabetes and intelligence (Fig. 2).
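Both Bonferroni thresholds in this article divide α = 0.05 by the number of tests. There are 18 traits × 3 epilepsy phenotypes = 54 genetic correlation tests here, and 16,371 genes in the MAGMA gene-based analysis. The helper names below are illustrative.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    if n_tests < 1:
        raise ValueError("n_tests must be >= 1")
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Indices of P values below the Bonferroni threshold for this family of tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p < threshold]


# LDSC: 18 traits x 3 epilepsy phenotypes ('all epilepsy', GGE, FE) = 54 tests
print(f"{bonferroni_threshold(0.05, 18 * 3):.6f}")   # 0.000926
# MAGMA gene-based analysis: 16,371 genes
print(f"{bonferroni_threshold(0.05, 16_371):.2e}")   # 3.05e-06
```

The text rounds 0.05/54 (about 0.000926) to 0.0009. The Fig. 2 caption uses the same threshold for its double asterisk.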
Genetic correlation analyses assess the aggregate of shared genetic variants associated with two phenotypes. However, a genetic correlation can be close to zero even when many causal variants are shared, if the SNP effects on the two phenotypes have opposing directions 38 . To explore this further, we applied MiXeR v1.2.0 to quantify the polygenic overlap between GGE and the same 18 selected traits, irrespective of genetic correlation (Methods). Results showed a large polygenic overlap between epilepsy and various other brain traits (Supplementary Fig. 21). For most selected brain traits, the direction of effect was concordant for 40-60% of shared SNPs. This might explain why some LDSC correlations were low, together with other relevant factors including sample size, polygenicity and trait genetic architecture. In combination, these analyses suggest that the SNPs involved with GGE are highly pleiotropic. A large proportion of the ~2,850 causal SNPs underlying GGE seem to underlie the risk of a wide range of other brain diseases and traits, often with opposing directions of effect. These results emphasize that each phenotype has a specific distribution of effect sizes and directions among shared causal variants, which together explain the shared and unique risk for different brain diseases.

Leveraging GWAS for drug repurposing
We next tested the potential of our meta-analysis to inform drug repurposing by predicting the relative efficacy of drugs for epilepsy (Methods). This analysis was based on the predicted ability of each drug to modulate epilepsy-related changes in the function and abundance of proteins, as inferred from the GWAS summary statistics 39 . In our predictions for 'all epilepsy', current ASMs were ranked higher than expected by chance (P < 1 × 10⁻⁶) and higher than drugs used to treat any other human disease (Supplementary Data 8). These observations also held for a 'test set' of ASMs (a randomly selected 50%) when the remaining ASMs (the 'training set') were used to optimize the predictions.
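The claim that ASMs rank higher than expected by chance can be checked with a rank-based permutation test. The sketch below is a generic version under stated assumptions, not the method of ref. 39. It compares the mean predicted rank of a query set, such as licensed ASMs, with the mean ranks of random drug sets of the same size. With the add-one correction, the smallest attainable P value is 1/(n_perm + 1). Reaching P < 1 × 10⁻⁶ therefore needs about a million permutations or more. Drug names and ranks in any call are hypothetical.

```python
import random
from statistics import fmean


def rank_enrichment_pvalue(ranks, query, n_perm=10_000, seed=0):
    """One-sided permutation test that `query` drugs are ranked better than chance.

    ranks: drug name -> predicted efficacy rank (1 = best)
    query: set of drug names to test (e.g., licensed ASMs)
    Compares the mean rank of `query` with the mean rank of random drug sets
    of the same size drawn from all ranked drugs.
    """
    drugs = list(ranks)
    in_query = [d for d in drugs if d in query]
    if not in_query:
        raise ValueError("none of the query drugs are ranked")
    observed = fmean(ranks[d] for d in in_query)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(drugs, len(in_query))
        if fmean(ranks[d] for d in sample) <= observed:
            hits += 1
    # Add-one correction: the smallest attainable P value is 1 / (n_perm + 1).
    return (hits + 1) / (n_perm + 1)
```

The train/test split described above can use the same function: optimize predictions with one half of the ASMs, then pass only the held-out half as `query`.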
For GGE, broad-spectrum ASMs were predicted to be more effective than narrow-spectrum ASMs (P < 1 × 10⁻⁶), consistent with clinical experience 40 . Furthermore, the predicted order of efficacy of individual ASMs for GGE matched their observed order in the largest head-to-head randomized controlled clinical trials for generalized epilepsy 33,41 . This observation is unlikely to occur by chance (P < 1 × 10⁻⁶).
Using this approach, we highlight the top 20 drugs that meet three conditions: they are licensed for conditions other than epilepsy, they are predicted to be efficacious for generalized epilepsy, and they have evidence of antiseizure efficacy from multiple published studies and multiple animal models (Supplementary Table 13). The full list of all predictions can be found in Supplementary Data 9.

GWAS in epilepsies ascertained from population biobanks
Finally, we leveraged data from several large-scale population biobanks and from deCODE genetics to explore the consistency of the epilepsy loci in cohorts that were less deeply phenotyped. These comprised 21,734 cases and 1,023,989 controls in total, phenotyped using International Classification of Diseases (ICD) codes (Methods; Supplementary Table 14). Forest plots showed a consistent direction of effect between the biobanks and our primary GWAS for all biobank-genotyped genome-wide significant top SNPs of the 'all epilepsy' GWAS and for all but one GGE top SNP (Supplementary Figs. 22 and 23). The biobank and deCODE genetics-specific GWAS did not identify any genome-wide significant loci for GGE or 'all epilepsy'. However, one significant locus at 2q22.1 (nearest gene, NXPH2) emerged for FE (Supplementary Fig. 24).
Meta-analysis of the biobank and deCODE genetics summary statistics with those from the primary epilepsy GWAS identified seven significant loci for the 'all epilepsy' phenotype. Six of these signals were previously identified in the primary 'all epilepsy' (n = 4) or 'GGE' (n = 2) GWAS. One locus (2q12.1) was new. The combined biobank and deCODE genetics meta-analysis for GGE identified five new loci, but four loci from our primary GWAS fell below the threshold of significance (Supplementary Fig. 25). The combined FE meta-analysis showed no significant associations. LDSC genetic correlations between the biobank/deCODE genetics and the primary GWAS results ranged from 0.31 to 0.74 (Supplementary Table 15).
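The Table 1 caption states that P values and z scores came from a fixed-effects meta-analysis weighted by effective sample size. One common way to do this is the sample-size-weighted scheme implemented in METAL. It combines signed z-scores with weights √N_eff, where N_eff = 4/(1/N_cases + 1/N_controls). The sketch below assumes that scheme. This section does not give the exact software and weighting used for the biobank meta-analysis.

```python
from math import sqrt
from statistics import NormalDist


def effective_n(n_cases, n_controls):
    """Effective sample size of a case-control study: 4 / (1/N_cases + 1/N_controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def weighted_z_meta(studies):
    """Sample-size-weighted Z meta-analysis.

    studies: iterable of (z, n_cases, n_controls); every z must be signed with
    respect to the same effect allele. Weights are sqrt(N_eff).
    Returns (combined Z, two-sided P).
    """
    num = den = 0.0
    for z, n_cases, n_controls in studies:
        w = sqrt(effective_n(n_cases, n_controls))
        num += w * z
        den += w * w
    if den == 0.0:
        raise ValueError("no studies supplied")
    z_meta = num / sqrt(den)
    # 1 - cdf loses precision for |Z| above ~8; use a log-scale tail for tiny P values.
    p = 2 * (1 - NormalDist().cdf(abs(z_meta)))
    return z_meta, p
```

Because N_eff is dominated by the smaller group, the ~1 million biobank controls add far less weight than their raw count suggests. In addition, ICD-coded case definitions can dilute per-study effects, so a larger total N does not guarantee a larger combined Z.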

Discussion
In this study, we leveraged a substantial increase in sample size to uncover 26 common epilepsy risk loci, of which 16 have not been reported previously. Using a combination of ten post-GWAS analysis methods, we pinpointed 29 genes that most likely underlie these signals of association. These signals showed enrichment throughout the brain and indicate an important role for synapse biology in excitatory as well as inhibitory neurons. Drug prioritization from the genetic data highlighted licensed ASMs, ranked the ASMs broadly in line with clinical experience and pointed to drugs for potential repurposing. These findings further our understanding of the pathophysiology of common epilepsies and provide new leads for therapeutics.
The 26 associated loci included some notable monogenic epilepsy genes. These include the calcium channel gene CACNA2D2, an established epileptic encephalopathy gene 42 that is directly targeted by ten currently licensed drugs. These drugs include two ASMs (gabapentin and pregabalin), the Parkinson's disease drug safinamide and the nonsteroidal anti-inflammatory drug celecoxib. Both safinamide and celecoxib have evidence of antiseizure activity 43,44 . SCN8A, which encodes a voltage-gated sodium channel, is an established epileptic encephalopathy gene and is associated here with common epilepsies. Nav1.6 (encoded by SCN8A) is targeted by commonly used sodium channel-blocking drugs. These are the most efficacious ASMs for people with monogenic SCN8A-related epilepsies, which are often caused by gain-of-function pathogenic variants 45 . Additional drugs targeting Nav1.6 include safinamide and quinidine. RYR2 encodes a ryanodine receptor and is an established cardiac disorder gene. It has recently been implicated in epilepsy 46,47 and is targeted by caffeine as well as simvastatin, atorvastatin and carvedilol. The acetylcholine receptor gene CHRM3 has been previously associated with epilepsy 48 and is targeted by drugs including solifenacin, used to treat urinary incontinence.
Fig. 2 | Genetic correlations of epilepsy with other phenotypes. The genetic correlation coefficient was calculated with LDSC and is denoted by a color scale from −1 (red; negatively (anti-)correlated) to +1 (blue; positively correlated). The square size relates to the absolute value of the corresponding correlation coefficient. A single asterisk indicates two-sided P < 0.05 and a double asterisk indicates two-sided P < 0.0009 (Bonferroni corrected).
We found that GGE, in particular, has a strong contribution from common genetic variation. When analyzing individual GGE syndromes, we found that up to 90% of liability is attributable to common variants in the JAE subtype. This makes it among the highest of over 700 traits reported in a large GWAS atlas 49 (albeit with relatively large CIs; Supplementary Table 10). The heritability estimates decrease to 40% for the collective GGE phenotype, possibly due to increased heterogeneity from combining syndromes with pleiotropic as well as syndrome-specific risk loci. Although statistical power drastically decreased when assessing specific GGE syndromes, three loci appeared specific to JME. These findings highlight the unique genetic architecture of the subtypes of common epilepsies, which are characterized by a high degree of both shared and syndrome-specific genetic risk.
In contrast to GGE, for FEs we found only a minor contribution of common variants, with no variant reaching genome-wide significance. Compared with GGE, FEs as a group may be far more heterogeneous, lack common-variant loci with large effect sizes, have a higher degree of polygenicity and/or have a lower contribution of common heritable risk variation. Subtype analyses intended to reduce this heterogeneity gave results that contrasted with those for GGE, suggesting different genetic architectures. This is consistent with experience from studies of common 9 and rare 5 genetic variation and from polygenic risk score analyses 6 . There is also emerging evidence for a substantial role of noninherited, somatic mutations in FEs 50 .
This work highlights the challenges of working with epilepsy cohorts ascertained through large biobanking initiatives. Accurate classification of epilepsy requires a combination of clinical features, electrophysiology and neuroimaging. Such details were absent from the biobanks we worked with. Instead, phenotypes were generally limited to ICD codes, which are prone to misclassification 51 . Population biobanks are also likely to ascertain milder, treatment-responsive epilepsies, in contrast to the enrichment for refractory epilepsies at tertiary referral centers.
Moreover, a proportion of adults with epilepsy have an acquired brain lesion, such as stroke, tumors or head trauma. Biobanks typically provide self-reported clinical information and codes from primary care and inpatient hospital care episodes. They do not provide the neurological specialist outpatient records that would indicate whether previous brain insults were considered relevant to epilepsy. As a result, the inclusion of the biobank data appeared to introduce more heterogeneity. This contrasts with genetic mapping of other polygenic diseases, such as type 2 diabetes and migraine, which are relatively easy to diagnose and classify reliably. For those diseases, including data from the same biobanks as in our study greatly increased the number of GWAS loci 52,53 .
We found enrichment of GGE variants in brain-expressed genes, involving excitatory and inhibitory neurons, but not any other brain cell type. This contrasts with other neurological diseases. For example, microglia are involved in Alzheimer's disease 54 and multiple sclerosis 55 , whereas migraine does not appear to have brain cell specificity 53 . We further refine this signal by showing the involvement of synapse biology, primarily intracellular signal transduction and synapse excitability. These findings suggest an important role of synaptic processes in excitatory and inhibitory neurons throughout the brain, which could be a potential therapeutic target. Indeed, synaptic vesicle transport is a known target of the ASMs levetiracetam and brivaracetam 56 .
We confirmed that our GWAS-identified genes had substantial overlap with monogenic epilepsy genes. A similar convergence of common and rare variant associations has been observed for other neurological and neuropsychiatric conditions, including schizophrenia 57 and ALS 58 . The genes prioritized in our GWAS signals also overlapped with known targets of current ASMs 4 , and we have provided a list of other drugs that directly target these genes. Moreover, using a systems-based approach 39 , we highlight drugs that are predicted to be efficacious when repurposed for epilepsy. These predictions are based on each drug's predicted ability to modulate epilepsy-related changes in protein function and abundance. Insights from GWAS of epilepsy could accelerate the development of new treatments by identifying promising drug repurposing candidates for clinical trials 59 . We anticipate that follow-up studies of the highlighted drugs in this study could show clinical efficacy in epilepsy treatment.
In summary, these new data reveal markedly different genetic architectures between the milder and more common focal and generalized epilepsies, provide new biological insights to disease etiology and highlight drugs with predicted efficacy when repurposed for epilepsy treatment.

Online content
Any methods, additional references, Nature Portfolio reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-023-01485-w.

Inclusion and ethics statement
Local institutional review boards approved study protocols at each contributing site. All study participants provided written, informed consent for the use of their data in genetic studies of epilepsy. For minors, written informed consent was obtained from their parents or legal guardian.

Sample and phenotype descriptions
This meta-analysis combines previously published datasets with newly genotyped cohorts. Descriptions of the 24 cohorts included in our previous analysis can be found in Supplementary Table 6 of that publication 4 . Here we included five new cohorts (Supplementary Table 1), comprising 14,732 epilepsy cases and 22,362 controls, resulting in a total sample size of 29,944 cases and 52,538 controls. Classification of epilepsy was performed as described previously (see Supplementary Note for a detailed description) 4 . In brief, we assigned people with epilepsy to FE, GGE or unclassified epilepsy. 'All epilepsy' was the combination of GGE, focal and unclassified epilepsy. Where possible, we used EEG, MRI and clinical history to further refine the subphenotypes: JME, CAE, JAE, GTCSA, nonlesional FE, FE with HS and FE with lesions other than HS.

Genotyping, quality control (QC) and imputation
Study participants were genotyped on SNP arrays (see Supplementary Table 1 for an overview of genotyping in new cohorts). QC was performed separately for each cohort. Pre-imputation QC included removal of SNPs with a call rate <98%, differential missingness, duplicated and monomorphic SNPs, SNPs associated with genotyping batch (P < 10⁻⁴) and SNPs violating Hardy-Weinberg equilibrium (P < 10⁻¹⁰). In addition, the Epi25 cohort was split by ancestry based on principal component analysis. Individuals were removed if their heterozygous/homozygous ratio was >4 s.d. from the mean. We also removed one individual from each pair of related samples (identity-by-descent >0.2), and we removed individuals with ambiguous or nonmatching genetically inferred sex. Furthermore, 3,180 duplicates between the Epi25 cohort and the previously published genome-wide mega-analysis 4 were identified based on genotype and removed from the Epi25 cohort. Of these 3,180 duplicates, 1,226 were GGE cases and 1,402 were FE cases. Before imputation, cohorts were cross-referenced to the Haplotype Reference Consortium (HRC) panel to ensure that SNPs matched in strand, position and ref/alt allele assignment. SNPs were also removed if they were absent from the HRC panel, if their allele frequency differed from that in the HRC panel by >20%, or if they were AT/GC SNPs with MAF >40%. These checks used tools available from https://www.well.ox.ac.uk/~wrayner/tools/. Data from Janssen Pharmaceuticals, Austrian GenEpa, Swiss GenEpa, Norwegian GenEpa and BPCCC were then imputed using the Wellcome Sanger Institute's imputation server (https://imputation.sanger.ac.uk/), with EAGLE v2.4.1 (ref. 60) for phasing and the Positional Burrows-Wheeler Transform algorithm 61 v3.1 for imputation. The HRC reference panel r1.1 (n = 32,470) was used as the imputation reference (ref. 62). Similarly, data from the Epi25 cohort were imputed using the Michigan Imputation Server (https://imputationserver.sph.umich.edu/).
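The pre-imputation SNP filters above can be written as a single per-SNP predicate. The sketch below is a minimal illustration only. It assumes a hypothetical `ArraySNP` record that already holds each SNP's call rate, cohort allele frequency, HWE and batch-association P values, and matching HRC allele frequency. It is not the consortium's pipeline and does not reproduce the HRC-check tool. It also omits the differential-missingness, duplicate and strand/position checks.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds as stated in the text; the record type below is hypothetical.
MIN_CALL_RATE = 0.98
BATCH_P_MIN = 1e-4
HWE_P_MIN = 1e-10
MAX_FREQ_DIFF_HRC = 0.20
MAX_AMBIGUOUS_MAF = 0.40


@dataclass
class ArraySNP:
    snp_id: str
    ref: str
    alt: str
    call_rate: float
    alt_freq: float                # alt-allele frequency in the cohort
    hwe_p: float
    batch_p: float
    hrc_alt_freq: Optional[float]  # None if the SNP is absent from HRC


def is_strand_ambiguous(ref: str, alt: str) -> bool:
    """A/T and C/G SNPs cannot be strand-resolved from the alleles alone."""
    return {ref, alt} in ({"A", "T"}, {"C", "G"})


def passes_pre_imputation_qc(snp: ArraySNP) -> bool:
    maf = min(snp.alt_freq, 1.0 - snp.alt_freq)
    if snp.call_rate < MIN_CALL_RATE:
        return False
    if maf == 0.0:                      # monomorphic
        return False
    if snp.batch_p < BATCH_P_MIN:       # associated with genotyping batch
        return False
    if snp.hwe_p < HWE_P_MIN:           # Hardy-Weinberg violation
        return False
    if snp.hrc_alt_freq is None:        # absent from the reference panel
        return False
    if abs(snp.alt_freq - snp.hrc_alt_freq) > MAX_FREQ_DIFF_HRC:
        return False
    if is_strand_ambiguous(snp.ref, snp.alt) and maf > MAX_AMBIGUOUS_MAF:
        return False
    return True
```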
We used HRC r1.1 as the reference panel for individuals of European and Asian ancestry, and 1000 Genomes Phase 3 v5 (n = 2,504) for individuals of African ancestry. Default imputation parameters were used. The Epi25 cohort data are located in the USA and the other cohorts in the European Union, so data-sharing restrictions prevented us from merging the data or using the same imputation server. Postimputation QC was largely similar across all cohorts. The Epi25 cohort used an in-house pipeline in which imputed dosages were used for genome-wide association analyses. In this pipeline, SNPs were retained if they had imputation INFO > 0.3 and removed if they had MAF < 1%, genotype coverage <0.98 or Hardy-Weinberg violations (P < 10⁻⁵). For all other cohorts, we used the same procedures as in our previous study 4 : imputed datasets were converted to hard-called PLINK format. This required more stringent imputation filtering (INFO > 0.9), because hard calls, unlike dosages, do not carry imputation uncertainty into downstream analyses. Furthermore, we removed SNPs with MAF < 5%, genotype coverage <0.98 or Hardy-Weinberg violations (P < 10⁻⁵) (ref. 4). Because SNPs with MAF < 5% were removed from the Janssen Pharmaceuticals, Austrian GenEpa, Swiss GenEpa, Norwegian GenEpa and BPCCC cohorts for QC reasons, study power is correspondingly reduced for lower-frequency SNPs in the 'focal' and 'all epilepsy' analyses.
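The two postimputation QC regimes differ mainly in their INFO and MAF thresholds. A minimal sketch follows. The regime names `"dosage"` (Epi25, analyzed on dosages) and `"hard_call"` (the other cohorts, converted to PLINK hard calls) are hypothetical labels, as is the `ImputedSNP` record.

```python
from dataclasses import dataclass


@dataclass
class ImputedSNP:
    info: float       # imputation INFO score
    maf: float
    coverage: float   # genotype coverage after imputation
    hwe_p: float


# Hypothetical regime labels; thresholds follow the text.
QC_REGIMES = {
    "dosage": {"min_info": 0.3, "min_maf": 0.01},     # Epi25
    "hard_call": {"min_info": 0.9, "min_maf": 0.05},  # other cohorts
}
MIN_COVERAGE = 0.98
HWE_P_MIN = 1e-5


def passes_post_imputation_qc(snp: ImputedSNP, regime: str) -> bool:
    t = QC_REGIMES[regime]
    return (snp.info > t["min_info"]
            and snp.maf >= t["min_maf"]
            and snp.coverage >= MIN_COVERAGE
            and snp.hwe_p >= HWE_P_MIN)
```

For example, a SNP with INFO 0.5 and MAF 3% would pass the dosage regime but fail the hard-call regime on both counts. This is the source of the power loss for lower-frequency SNPs noted above.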

Genome-wide association analyses
GWAS of the Janssen Pharmaceuticals, Swiss GenEpa, Norwegian GenEpa and Austrian GenEpa cohorts was performed as a mega-analysis, as described previously 4 . GWAS of the Epi25 cohort was performed with a generalized linear mixed model using SAIGE v0.38 (ref. 63). SAIGE was run in two steps. First, we fit the null logistic mixed model to estimate the variance component and other model parameters. For this step, SNPs were filtered on call rate >0.98 and MAF > 5%, and then pruned to obtain approximately independent markers (window size of 100 SNPs and r² > 0.3). Second, we tested for association between each genetic variant and the phenotype by applying the saddlepoint approximation (SPA) to the score test statistics. Next, we performed P value-based fixed-effects meta-analyses with METAL v2020-05-05 (ref. 64) for each of the main phenotypes ('all', GGE and FE) and for the subphenotypes. These meta-analyses were weighted by effective sample size (n_eff = 4/(1/n_cases + 1/n_controls)) to account for case-control imbalance. We performed multi-ancestry and European-only meta-analyses for the main phenotypes. The subphenotype analyses were restricted to Europeans because sample sizes in other ancestries were limited. We included all SNPs (~4.9 million, MAF > 1%) that were present in at least the previous mega-analysis and the Epi25 dataset, which together account for 88% of the total sample size. We calculated genomic inflation factors (λ), mean χ² and LD-score regression intercepts to assess potential inflation of the test statistic. Because λ is known to scale with sample size, we also calculated λ1000, which is λ rescaled to an equivalent sample size of 1,000 cases and 1,000 controls 65 . We limited these analyses to participants of European ancestry, because LD structure depends on ancestry and Europeans constituted 92% of cases. For forest plots of genome-wide significant hits, β and SE were estimated from METAL z scores using a previously published formula 22 .
For P-M plots, m values were generated using the default settings of the tool Metasoft v2.0.0 (ref. 66).
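The effective sample size, sample-size-weighted meta-analysis and inflation metrics described above can be sketched in a few lines. The sketch rests on several assumptions. It uses METAL's sample-size scheme, which weights allele-aligned z scores by √N (here N = n_eff). It uses the usual definitions of λ (median observed χ² divided by the median of a 1-df χ², about 0.455) and of λ1000. The z-to-β/SE conversion uses the approximation published with SMR (Zhu et al.); whether this is the exact formula cited as ref. 22 is an assumption. Function names are illustrative.

```python
import math
from statistics import NormalDist, median


def effective_n(n_cases, n_controls):
    """Effective sample size for a case-control study, as defined in the text."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def sample_size_weighted_z(z_scores, n_effs):
    """Weighted Z meta-analysis with weights sqrt(N); z scores must already
    be aligned to the same effect allele."""
    weights = [math.sqrt(n) for n in n_effs]
    numerator = sum(z * w for z, w in zip(z_scores, weights))
    return numerator / math.sqrt(sum(w * w for w in weights))


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Median of a chi-square distribution with 1 df (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(z_scores):
    """lambda = median observed chi-square / median expected under the null."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN


def lambda_1000(lam, n_cases, n_controls):
    """Rescale lambda to an equivalent study of 1,000 cases and 1,000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale


def beta_se_from_z(z, allele_freq, n):
    """Approximate per-allele beta and SE from a z score (assumed formula)."""
    denom = math.sqrt(2.0 * allele_freq * (1.0 - allele_freq) * (n + z * z))
    return z / denom, 1.0 / denom
```

Because λ1000 shrinks (λ − 1) in proportion to 1/n_cases + 1/n_controls, a study with λ = 1.10 in 2,000 cases and 2,000 controls corresponds to λ1000 = 1.05.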

Data sources for the biobank and deCODE genetics GWAS
Summary statistics for epilepsy GWAS were obtained from three population biobanks (UK Biobank 67 , Biobank Japan 68,69 and FinnGen release R6 (ref. 70)) and from deCODE genetics 71 (Iceland). The Biobank Japan, FinnGen and deCODE genetics epilepsy cases were further assigned into either 'focal' or 'generalized' epilepsy, whereas the UK Biobank samples were not subdivided based on seizure localization, as the relevant clinical details were unavailable to facilitate an accurate subdivision (see Supplementary Table 14 for sample sizes per biobank and deCODE genetics). Control data were population-matched samples with no history of epilepsy.
UK Biobank. We identified people with epilepsy in the UK Biobank using self-reported data, inpatient hospital episode statistics, death certificate diagnostic data and primary care diagnostic data, as described elsewhere 72 . This allowed us to interrogate the evidence supporting a diagnosis of epilepsy, rather than relying purely on the UK Biobank-generated data fields 131048 and 131049, which are based on ICD-10 G40 mapping. GWAS fixed-effects meta-analyses were conducted using METAL 64 . To account for case-control imbalance, the effective sample size for each cohort was calculated as n_eff = 4/(1/n_cases + 1/n_controls). GWAS Manhattan plots were generated using the qqman package 73 in R v3.6.0. Genome-wide significant loci were mapped onto genes using the FUMA web platform 18 .
We performed three meta-analyses. As a primary analysis, we meta-analyzed all nonbiobank samples, then we meta-analyzed only biobank/deCODE genetics samples and finally, we performed a combined meta-analysis of biobank/deCODE genetics and nonbiobank samples.

Pleiotropy analysis
ASSET 74 is a meta-analysis-based pleiotropy detection approach that identifies common or shared genetic effects between two or more related but distinct traits. We used ASSET v2.2.0 with a genome-wide significance level of α = 5 × 10⁻⁸. We applied ASSET to the subset of European-ancestry samples. This subset comprised 6,952 (3,244 + 3,708) GGE cases and 14,939 (5,344 + 9,095) FE cases from Epi25 and our consortium, as well as 42,434 partially overlapping controls from both consortia. Note that ASSET accounts for sample overlap in the analysis. Effect sizes, standard errors and effective sample sizes were taken from the main meta-analysis.

HLA association
Given the prior association of the HLA with autoimmune epilepsy 75,76 , we included a specific analysis of the HLA. HLA types and amino acid residues were imputed using CookHLA software v1.0.1 (ref. 26), with the 1000 Genomes Phase 3 used as a reference panel 77 . Samples were grouped by genetic ancestry for imputation.
Following imputation, association analysis was conducted using the HLA Analysis Toolkit (HATK) v1.2 (ref. 78). The following three phenotypes were analyzed: 'all epilepsy', FE and GGE. Samples from the ILAE and Epi25 datasets were analyzed separately, and the association results were meta-analyzed across datasets and ancestries using PLINK v1.9 (ref. 79).

Functional annotation
We annotated all genome-wide significant SNPs and tagged SNPs within the loci from our multi-ancestry meta-analyses. ANNOVAR v2017-07-17 was used to retrieve the location and function of each SNP 80 , the CADD score was used as a measure of predicted deleteriousness 81 and chromatin states were incorporated from the ENCODE and NIH Roadmap Epigenomics Mapping Consortium 14,82 . We used FUMA v1.3.8 to define the independently significant SNPs within loci; that is, SNPs that were genome-wide significant but not in LD (r² < 0.2 in Europeans) with the lead SNP in the locus.
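Defining independently significant SNPs amounts to greedy clumping. Genome-wide significant SNPs are visited in order of increasing P. A SNP is kept only if its r² with every SNP already kept is below the threshold. The sketch below assumes a hypothetical LD lookup that maps unordered SNP pairs to r², with missing pairs treated as unlinked. It illustrates the idea and is not FUMA's implementation.

```python
GWS_P = 5e-8          # genome-wide significance
R2_THRESHOLD = 0.2    # as stated in the text


def independent_significant_snps(pvals, r2):
    """pvals: dict SNP -> P value.
    r2: dict frozenset({snp_a, snp_b}) -> r² (hypothetical LD lookup).
    Returns independently significant SNPs, strongest first."""
    selected = []
    for snp, p in sorted(pvals.items(), key=lambda item: item[1]):
        if p >= GWS_P:
            break  # all remaining SNPs are weaker
        if all(r2.get(frozenset((snp, kept)), 0.0) < R2_THRESHOLD
               for kept in selected):
            selected.append(snp)
    return selected
```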

MTAG
MTAG v1.0.8 (ref. 17) was used with default settings to increase the effective sample size of our European-ancestry GGE subphenotype analyses. It did so by pairing them with the strongly correlated overall GGE GWAS, which has a larger sample size. MTAG accounts for sample overlap between traits. It exploits the fact that the effect-size and standard-error estimates of a primary GWAS (here, the GGE subtypes) can be improved by combining them with those of a genetically correlated secondary GWAS (here, GGE) 17 . Similarly, we applied MTAG to combine FE with GGE.

Gene mapping
To map genome-wide significant loci from our multi-ancestry meta-analyses to specific genes, we used FUMA v1.3.8 (ref. 18) with the same parameters as published previously 4 . We defined a genome-wide significant locus as the region encompassing all SNPs with P < 10⁻⁴ that were in LD (r² > 0.2) with the lead SNP (that is, the SNP with the strongest association within the region). We used a combination of positional mapping (within 250 kb of the locus), eQTL mapping (SNPs with FDR-corrected eQTL P < 0.05 in blood or brain tissue) and 3D chromatin interaction mapping (FDR P < 10⁻⁶ in brain tissue).
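The locus definition and the positional part of the mapping can be sketched as follows. The sketch assumes all SNPs and genes lie on a single chromosome, and it uses hypothetical dictionaries for P values, r² with the lead SNP, SNP positions and gene coordinates. eQTL and chromatin-interaction mapping are not covered.

```python
LOCUS_P = 1e-4
LOCUS_R2 = 0.2
WINDOW_BP = 250_000


def define_locus(lead, pvals, r2_with_lead, positions):
    """Return (start, end) of the locus: the lead SNP plus all SNPs with
    P < 1e-4 and r² > 0.2 with the lead (single chromosome assumed)."""
    members = [snp for snp, p in pvals.items()
               if snp == lead
               or (p < LOCUS_P and r2_with_lead.get(snp, 0.0) > LOCUS_R2)]
    coords = [positions[snp] for snp in members]
    return min(coords), max(coords)


def positional_mapping(locus, genes, window=WINDOW_BP):
    """Genes whose (start, end) overlaps the locus extended by `window` bp."""
    start, end = locus
    return [name for name, (g_start, g_end) in genes.items()
            if g_start <= end + window and g_end >= start - window]
```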

Genome-wide gene-based association study (GWGAS) and gene-set analyses
We performed the GWGAS using the default settings of MAGMA v1.08, as implemented in FUMA v1.3.8. MAGMA calculates a gene-level association P value from the association statistics of all SNPs within each gene 19 . Based on these GWGAS results, we performed competitive gene-set analyses with default MAGMA settings, using 15,483 default gene sets and GO terms from MSigDB. In addition, we specifically assessed 18 curated gene sets involving different synaptic functions 30 .
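A competitive gene-set test asks whether genes in a set are more strongly associated than genes outside it. The sketch below is deliberately simplified. It converts gene P values to Z scores with the probit transform Z = Φ⁻¹(1 − p), as described for MAGMA. It then regresses Z on a set-membership indicator, which is equivalent to a pooled-variance two-sample t test. Unlike MAGMA, it ignores gene-gene correlation and covariates such as gene size. It also approximates the t distribution by a normal distribution, which is reasonable only with many genes.

```python
import math
from statistics import NormalDist, mean


def gene_z(p):
    """Probit transform of a gene-level P value."""
    return NormalDist().inv_cdf(1.0 - p)


def competitive_set_test(gene_p, gene_set):
    """Simplified competitive test (no covariates, genes assumed independent).
    Returns (beta, t statistic, one-sided P, normal approximation)."""
    z = {gene: gene_z(p) for gene, p in gene_p.items()}
    inside = [v for gene, v in z.items() if gene in gene_set]
    outside = [v for gene, v in z.items() if gene not in gene_set]
    n1, n0 = len(inside), len(outside)
    m1, m0 = mean(inside), mean(outside)
    residual_ss = (sum((v - m1) ** 2 for v in inside)
                   + sum((v - m0) ** 2 for v in outside))
    pooled_var = residual_ss / (n1 + n0 - 2)
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n0))
    beta = m1 - m0            # OLS slope on the membership indicator
    t = beta / se
    return beta, t, NormalDist().cdf(-t)
```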

TWAS
TWAS was performed with FUSION v3 using default settings 20 . Because the method relies on LD reference data, we imputed gene expression for our European-only GWAS. Expression was imputed using eQTL data from the PsychENCODE consortium, which include dorsolateral prefrontal cortex tissue from 1,695 individuals 21 . SMR v1.03 provides an additional method to assess the association between epilepsy and the expression of specific genes 22 . Although TWAS and SMR have similar aims, their methods and reference datasets differ, so they provide complementary information. Whereas the FUSION TWAS method uses multi-SNP imputation of gene expression, SMR uses Mendelian randomization to test whether the effect of a SNP on epilepsy is mediated by the expression of specific genes. We performed SMR analyses with default settings, using the European-only GWAS with MetaBrain as the reference. MetaBrain is a new eQTL dataset including 2,970 human brain samples 83 .
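The core test statistics behind the two approaches can be written compactly. For FUSION-style TWAS, the gene's statistic is wᵀz / √(wᵀRw), where w are expression weights, z are GWAS z scores and R is the SNP LD correlation matrix. For SMR, the approximate test statistic is T = z_gwas² z_eqtl² / (z_gwas² + z_eqtl²), compared with a 1-df χ² distribution. This is a minimal sketch. It does not cover weight training, the HEIDI heterogeneity test or any tool-specific options.

```python
import math


def twas_z(weights, gwas_z, ld):
    """TWAS statistic for one gene: w'z / sqrt(w' R w)."""
    k = len(weights)
    numerator = sum(weights[i] * gwas_z[i] for i in range(k))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)


def smr_statistic(z_gwas, z_eqtl):
    """Approximate SMR test statistic and its 1-df chi-square P value."""
    t = (z_gwas ** 2 * z_eqtl ** 2) / (z_gwas ** 2 + z_eqtl ** 2)
    return t, math.erfc(math.sqrt(t / 2.0))
```

The SMR statistic is never larger than either input χ², so a strong eQTL cannot rescue a weak GWAS signal, and vice versa.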

Sex-specific analyses
We performed GWAS, as described above, separately in female and male participants for all epilepsy (13,889 female cases and 19,676 female controls; 12,259 male cases and 18,645 male controls) and for GGE (3,946 female cases and 19,676 female controls; 2,603 male cases and 18,645 male controls). We then performed fixed-effects meta-analyses with METAL to merge the different cohorts. Next, we meta-analyzed the male and female GWAS with GWAMA v2.2.2 (ref. 84) to assess heterogeneity of effect sizes between sexes and sex-differentiated associations 35 . The sex-differentiated analysis is a meta-analysis of the female-only and male-only GWAS that allows effect sizes to differ between sexes. The sex-heterogeneity test, in contrast, assesses for each SNP whether the effect size differs between the female-only and male-only GWAS 35 .
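The two sex-specific tests can be illustrated for a single SNP. The sketch assumes the standard formulations: the sex-differentiated test sums the female and male Wald χ² statistics (2 df), and the heterogeneity test is the squared z of the female-minus-male difference (1 df). Under inverse-variance weighting, the latter equals Cochran's Q for two groups. GWAMA's implementation details may differ.

```python
import math


def sex_specific_tests(beta_f, se_f, beta_m, se_m):
    """Return (P_sex_differentiated, P_sex_heterogeneity) for one SNP."""
    # Sex-differentiated: sum of sex-specific chi-squares, 2 df.
    chi2_diff = (beta_f / se_f) ** 2 + (beta_m / se_m) ** 2
    p_diff = math.exp(-chi2_diff / 2.0)       # chi-square survival, 2 df
    # Heterogeneity: squared z of the female-minus-male difference, 1 df.
    chi2_het = (beta_f - beta_m) ** 2 / (se_f ** 2 + se_m ** 2)
    p_het = math.erfc(math.sqrt(chi2_het / 2.0))  # chi-square survival, 1 df
    return p_diff, p_het
```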

Gene prioritization
We combined ten methods to prioritize the most likely biological candidate gene within each genome-wide significant locus. For each gene in each locus, we assessed the following criteria:
• Missense: we assessed whether the SNPs tagged in the genome-wide significant locus contained an exonic missense variant in the gene, as annotated by ANNOVAR v2017-07-17.
• TWAS: we assessed whether imputed gene expression was significantly associated with the epilepsy phenotype, based on the FUSION TWAS described above, Bonferroni corrected for each mapped gene with expression information.
• SMR: we assessed whether the gene had a significant SMR association with the epilepsy phenotype, based on the SMR analyses described above, Bonferroni corrected for each mapped gene with expression information.
• MAGMA: we assessed whether the gene was significantly associated with the epilepsy phenotype in the GWGAS, Bonferroni corrected for each mapped gene.
• PoPS: we calculated the polygenic priority score (PoPS) 85 , a method that combines GWAS summary statistics with biological pathway, gene expression and protein-protein interaction data to pinpoint the most likely causal genes. We scored the gene with the highest PoPS within each locus.
• Brain expression: for each mapped gene, we calculated the mean expression in all brain and nonbrain tissues based on data from the GTEx project v8 (ref. 86). We then assessed whether the gene was more strongly expressed in brain tissues than in nonbrain tissues by comparing its average expression across all brain tissues with that across all nonbrain tissues.
• Brain-coX: we used the tool brain-coX to assess whether genes were prioritized as co-expressed with established epilepsy genes in more than a third of the brain tissue resources used (Supplementary Fig. 26) 87 .
Similar to previous studies 4,90 , we scored all genes based on the number of criteria met (range: 0-10; all criteria had equal weight).
The gene with the highest score was chosen as the most likely implicated gene (see Supplementary Data 6 for a complete list of scores for all genes in each locus). If two genes shared the highest score, we implicated both. We calculated Pearson correlation coefficients between the ten criteria (Supplementary Table 16). Most correlations were low (range: −0.13 to 0.39), suggesting that the criteria convey complementary information.
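The scoring rule is a count of criteria met, with ties at the top all retained. A minimal sketch with hypothetical gene and criterion names:

```python
def prioritize_genes(criteria_by_gene):
    """criteria_by_gene: dict gene -> dict criterion -> bool (met or not).
    Scores each gene by the number of criteria met (equal weights) and
    returns all genes tied for the top score, plus the full score table."""
    scores = {gene: sum(1 for met in criteria.values() if met)
              for gene, criteria in criteria_by_gene.items()}
    best = max(scores.values())
    top = sorted(gene for gene, score in scores.items() if score == best)
    return top, scores
```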

Long-distance expression regulation of BCL11A
Most eQTL databases, including PsychENCODE and MetaBrain, restrict eQTL analyses to SNP-gene pairs within 1 Mb of each other. We wanted to specifically test the hypothesis that the lead SNPs in the 2p16.1 epilepsy locus regulate BCL11A expression over a long distance. We therefore manually interrogated the MetaBrain database 83 without distance constraints. We then calculated the association of each of the three lead SNPs in the locus (rs11688767, rs77876353 and rs13416557) with BCL11A expression.

Heritability analyses
We calculated SNP-based heritability on the European-only GWAS using LDAK v5.2, because it was recently shown to give more accurate heritability estimates for complex traits than other methods, including LDSC 91,92 . We used default settings in LDAK and precalculated LD weights from 2,000 European (white British) reference samples under the BLD-LDAK SumHer model 92 . SNP-based heritabilities were converted to liability-scale estimates using the formula $h^2_l = h^2_o \times \frac{K^2(1-K)^2}{p(1-p)\,Z^2}$. Here $K$ is the disease prevalence, $p$ is the proportion of cases in the sample and $Z$ is the height of the standard normal density at the liability threshold. To decrease downward bias, we based these calculations on the effective sample sizes (see calculation above), which allows $p = 0.5$ to be assumed 93 . We used the same population prevalences as in our previous study (Supplementary Table 10) 4 . The total number of causal variants (that is, variants with a nonzero additive genetic effect) underlying epilepsy risk was estimated with a causal mixture model (MiXeR) v1.2.0 (ref. 38). MiXeR uses a likelihood-based framework to estimate the number of causal SNPs underlying a trait without needing to pinpoint which specific SNPs are involved. Furthermore, MiXeR supports power calculations that estimate the sample size required for genome-wide significant SNPs to explain a given proportion of SNP-based heritability.
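The liability-scale conversion can be checked numerically. In the sketch below, the threshold is the point on the standard normal scale above which a fraction K of the population lies, and Z is the density at that point. The function name is illustrative. The prevalences actually used are in Supplementary Table 10 and are not reproduced here.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs, prevalence, case_fraction=0.5):
    """Convert observed-scale SNP heritability to the liability scale.

    prevalence: population prevalence K.
    case_fraction: proportion of cases p in the sample (0.5 when
    effective sample sizes are used, as in the text).
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold
    z = std_normal.pdf(threshold)                     # density height there
    k, p = prevalence, case_fraction
    return h2_obs * k ** 2 * (1.0 - k) ** 2 / (p * (1.0 - p) * z ** 2)
```

As a sanity check, with K = p = 0.5 the threshold is 0, Z² = 1/(2π), and the conversion factor reduces to π/2.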

Genomic SEM
Genomic SEM entails two stages of estimation 29 . In the first stage, the empirical genetic covariance matrix and its sampling covariance matrix are estimated using an extension of multivariable LDSC. This matrix is extended to include SNP effects for the multivariate GWAS SEM. In the second stage, an SEM is specified, and its parameters are estimated such that the discrepancy between the model-implied covariance matrix and the empirical covariance matrix is minimized. The Genomic SEM models are specified such that the SNP effect, defined by multiple traits, acts at the level of a latent factor (F_g). Model fit is assessed using the model chi-square, the Akaike information criterion and the standardized root mean square residual. In addition, the method provides evidence of heterogeneity between phenotypes via the Q_SNP statistic. This statistic indicates the extent to which the univariate SNP effects for each phenotype are explained by the common genetic factor. Q_SNP is a chi-square distributed statistic that tests whether SNPs act entirely through the common factor.

Enrichment analyses
We used MAGMA v1.08 (as implemented in FUMA) to perform tissue and cell-type enrichment based on our multi-ancestry meta-analyses. First, we assessed whether our GGE GWAS was enriched for specific tissues from the GTEx database. Similarly, we assessed the enrichment of genes expressed in the brain at 11 general developmental stages, using data from the BrainSpan consortium. Next, we assessed whether GGE was associated with specific cell types, by cross-referencing two single-cell RNA-sequencing databases of human developmental and adult brain samples. The PsychENCODE database contains RNA-sequencing data from 4,249 human brain cells from developmental stages and 27,412 human adult brain cells 94 . The Zhong dataset (GSE104276) contains RNA-sequencing data from 2,309 human brain cells at different stages of development 95 . We performed FDR correction across datasets to assess which cell types were significantly associated with GGE. As a sensitivity analysis, we performed stratified LDSC with default settings using the cell-specific gene expression weights from the PsychENCODE consortium to compare GABAergic with glutamatergic neuron enrichment 96 .
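The text states only that FDR correction was applied across datasets. Assuming this was the Benjamini-Hochberg procedure (an assumption), the adjusted values can be computed as below. Each P value is multiplied by m/rank, and a running minimum is then taken from the largest P value downward so that the adjusted values are monotone.

```python
def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values (q values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # from largest P to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Cell types whose adjusted value falls below the chosen FDR threshold (for example, 0.05) would then be reported as significantly associated.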

Recruitment
Case and control samples were recruited from tertiary hospitals and academic research centres. All cases were diagnosed with an epilepsy syndrome according to the same international guidelines and classification system. However, the application of diagnostic criteria may have differed slightly across cohorts. This ascertainment bias may have reduced the overall power of the study and the generalizability of the results.

Ethics oversight
All contributing case and control sites collected samples following local IRB/ethics committee approval. A full list of approval bodies can be found in Supplementary Table 1.

Field-specific reporting

Life sciences

Life sciences study design

Sample size
Sample size was not predetermined. However, we note that this study is almost twice the size of the previous largest epilepsy GWAS, published in 2018.
Data exclusions
We excluded poorly genotyped SNPs and outlier samples according to the QC parameters described in our methods.