Introduction

The absence epilepsies are a group of idiopathic generalised epilepsies (IGEs), which vary in their age of onset, seizure frequency and pattern of evolution. The typical absence seizure is manifested as a transitory loss of awareness with 2.5–4 Hz spike–wave complexes on ictal EEG. The International League Against Epilepsy (ILAE) classification recognises a number of distinct absence epilepsy syndromes, including childhood absence epilepsy (CAE), juvenile absence epilepsy (JAE), epilepsy with myoclonic absences, eyelid myoclonus with absences and juvenile myoclonic epilepsy.1 However, it is uncertain whether they represent a ‘biological continuum’ or distinct entities. There is some evidence that CAE and JAE share a close genetic relationship, allowing them to be considered as one phenotype in genetic studies.2, 3

Twin studies demonstrate that the IGEs have a significant heritability,4 with regards to both occurrence and type of seizure and syndrome.5 The molecular genetic basis of CAE in humans is presently unknown, but studies on the mechanism by which spike–wave seizures are generated, isolation of genes causing spike–wave seizures in rodents and initial linkage and association studies in humans have allowed candidate genes and chromosomal regions to be identified.

Four mouse models of spike–wave epilepsy are caused by mutations in genes for different subunits of voltage-gated calcium channels (VGCCs): tottering tg, Cacna1a;6 lethargic lh, Cacnb4;7 ducky du, Cacna2d2;8 and stargazer stg, Cacng2.9 There is some evidence that the γ-subunits may function as transmembrane AMPA receptor regulatory proteins (TARPs),10 involved in AMPA receptor trafficking.11

Genome-wide linkage analysis of IGE-multiplex families has demonstrated evidence for susceptibility loci on chromosomes 3q26, 14q23 and 2q36.12 Furthermore, loci for three similar forms of absence epilepsy have been identified on chromosomes 8q24 (ECA1), 5q31.1 (ECA2) and 3q26 (ECA3).13, 14, 15, 16 An association in humans has been documented between polymorphisms in CACNA1A (chromosome 19p13.2–p13.1) and IGE including CAE.17 Finally, 12 missense mutations in CACNA1H (chromosome 16p13.3) have been found in 14 sporadic Chinese Han patients with CAE but not in any of 230 unrelated controls.18 However, Heron et al.19 screened exons 9–11 of CACNA1H (in which 75% of the missense mutations were found) in 192 patients with IGE or generalised epilepsy with febrile seizures plus but did not find any of those identified by Chen et al. Furthermore, we did not find any of the 12 missense mutations in our resource of CAE families and trios; nor did we find any evidence for linkage to the CACNA1H locus.20

Previous analysis of 33 nuclear families with CAE under the assumption of heterogeneity produced evidence supportive of linkage to the CACNG3 locus on 16p12–p13.1,21 with an HLOD score of 0.55 (α=0.35) and a non-parametric linkage (NPL) score of 1.21. Although the HLOD score did not reach statistical significance, this may reflect the lack of power in the family resource and locus heterogeneity. This gene was prioritised for further analysis, because it had the second most positive HLOD score, albeit not statistically significant, and because it is a compelling candidate on biological grounds. The GABA cluster on chromosome 15q, which had the most positive HLOD score, has also been investigated further but the results are not presented here. Those genes with HLOD scores of zero and without any supportive evidence from NPL analysis were not pursued in the larger resource.

Available evidence therefore suggests that genes encoding brain-expressed VGCCs, including CACNG3, may contribute to the aetiology of CAE. The aim of this work was to test this hypothesis in two ways: by linkage analysis using microsatellite loci spanning CACNG3 in a resource of 65 nuclear families, each with a proband with CAE (expanded from the original resource of 33 families); and by association analysis using 23 single nucleotide polymorphisms (SNPs) distributed across CACNG3 in the nuclear families and 217 parent-affected child trios.

Materials and methods

Subjects and samples

The 217 trios (affected child and both parents) and 65 nuclear pedigrees (with a total of 145 absence epilepsy cases, including 25 of the pedigrees originally used by Robinson et al.21) were all of Caucasian origin and ascertained from European populations, including the UK, France, Germany, Austria, the Netherlands, Denmark, Sweden, Finland and Italy. Clinical data on subjects categorised as affected are provided in the Supplementary Data. Appropriate informed consent was obtained from all participants. Diagnostic criteria based on the ILAE classification of absence epilepsies were applied as described in the Supplementary Data.1

Genomic DNA was extracted from whole blood or cheek swab samples according to standard protocols.

Linkage analysis

Linkage analysis was performed using three fluorescently labelled microsatellite markers: D16S420, situated 5’ of the gene; URB036, in the gene; and a novel marker, UCL10321,22 situated 3’ of the gene (chromosome 16: 24416648–24416860). They span a genetic distance of 0.83 cM and a physical distance of 273 kb (Figure 1). This is a smaller distance than originally tested for linkage by Robinson et al., because we are specifically targeting CACNG3, whereas the markers used previously were chosen to include CACNA1H as well.

Figure 1

Chromosome 16p12–13.1 locus linkage analysis combined with a map illustrating the distribution of microsatellite markers used in this analysis with respect to the gene CACNG3. The corresponding genetic map was based on data from the HapMap release 16c.1.

Genotyping was performed on the ABI 373 Sequence Analyser using the Genescan® and Genotyper® software. All pedigrees were checked for Mendelian inheritance using the PedCheck program.23 Any pedigrees that failed this test were re-genotyped. Multipoint linkage analysis was performed using GeneHunter 2.1.24 Parametric analysis was performed under the assumption of autosomal dominant inheritance with a penetrance of 50%. A disease allele frequency of 0.01 and a phenocopy rate of 0.0001 were assumed. These values are compatible with the observed population prevalence and sibling recurrence risk ratio attributable to the locus, based on the original calculations of Risch.25 HLOD scores as well as an estimate of α, which represents the proportion of pedigrees consistent with linkage at a specific locus, were calculated. The NPL statistic, along with the corresponding degree of significance, was also calculated by GeneHunter. The NPL statistic assesses the degree of allele-sharing in affected individuals only and is valuable as it is a ‘model-free’ form of analysis thus bypassing the inherent problems of a parametric analysis such as misspecification of parameters.
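As an illustration of the heterogeneity statistic described above, the following minimal sketch maximises the standard admixture HLOD over α for a set of per-family LOD scores at a single map position. The function name, the grid search and the per-family values are assumptions made for illustration only; this is not GeneHunter's implementation, and the family-level scores from this study are not reproduced.

```python
import math

def hlod(family_lods, grid=1000):
    """Return (max HLOD, alpha) under the admixture model.

    HLOD(alpha) = sum_i log10(alpha * 10**LOD_i + (1 - alpha)),
    maximised over alpha in [0, 1] with a simple grid search.
    """
    best_score, best_alpha = 0.0, 0.0  # alpha = 0 always gives HLOD = 0
    for step in range(1, grid + 1):
        alpha = step / grid
        score = sum(math.log10(alpha * 10 ** lod + (1 - alpha))
                    for lod in family_lods)
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha

# Hypothetical per-family LOD scores (not data from this study).
example_lods = [0.30, 0.28, -0.45, 0.25, -0.60, 0.31]
score, alpha = hlod(example_lods)
```

Because α = 1 reduces the statistic to the ordinary summed LOD score, the HLOD can never be lower than the homogeneity LOD, and it is never negative.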

Re-sequencing of coding exons

Bi-directional re-sequencing of the coding exons (chromosome 16: 24174862–24175823; 24265106–24266056; 24273263–24274226 and 24279814–24281595) was undertaken in 59 cases taken from the 65 nuclear pedigrees. This re-sequencing was performed in conjunction with ABC at Imperial College, London, using standard Sanger dideoxy protocols.

SNP analysis

Genomic DNA was typed for 23 SNPs by KBiosciences using both the Amplifluor™ and Taqman™ chemistries. SNPs were chosen at 10 kb intervals encompassing the putative promoter region through to the 3’UTR (Figure 2). The SNPs have been numbered from 1 to 23 for ease of reference. One of these is a novel SNP identified via previous sequencing of a subset of the nuclear pedigrees; the remaining 22 can be found on the NCBI SNP database (Table 1).

Figure 2

Block structure of CACNG3 locus based on the whole resource as defined by Haploview using a minimum D′ of 0.7. The boxed SNPs are those which demonstrated significant transmission disequilibrium.

Table 1 Details of SNPs used for association analysis

These SNPs were typed in the entire resource and the genotypes were used to construct linkage disequilibrium (LD) blocks with Haploview 3.2.26 Blocks were defined as a solid-spine of LD, that is, the first and last marker in a block are in strong LD with all intermediate markers (one slight mismatch is allowed by the programme), but these intermediate markers are not necessarily in LD with one another. A minimum D’ of 0.7 was used as the cutoff point for strong LD. The program's standard colour scheme was employed, with pairwise D’ values less than 1 shown and the degree of pink/red shading representing a pairwise LOD ≥2. GeneHunter was used to construct haplotypes based on the largest blocks identified. Intrafamilial association analysis was performed on individual SNPs using the PDT.27 The PDT produces two measures of association, the PDT-AVE and the PDT-SUM. The former gives all families equal weight in the analysis, whereas the latter gives more weight to more informative families.28 Association analysis was also performed on the SNP haplotypes. Each haplotype was assigned a single number, so that the analysis could be performed essentially as though each haplotype was a single locus with multiple alleles. This is necessary because the PDT cannot simultaneously analyse multiple loci.
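The pairwise D′ measure used as the cutoff for strong LD can be sketched as follows. This is a minimal illustration that assumes the two-locus haplotype frequency is already known; estimating it from unphased genotypes, as a program such as Haploview must do, is not shown. The function names are hypothetical.

```python
def d_prime(p_ab, p_a, p_b):
    """Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return abs(d) / d_max

STRONG_LD_THRESHOLD = 0.7  # cutoff quoted in the text

def in_strong_ld(p_ab, p_a, p_b, threshold=STRONG_LD_THRESHOLD):
    """True when the pair meets the D' cutoff used for block definition."""
    return d_prime(p_ab, p_a, p_b) >= threshold
```

Under a solid-spine definition, only the first and last markers of a candidate block need to pass this check against every intermediate marker; the intermediate markers are not tested against one another.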

The block structure of the CACNG3 locus was also determined using the HapMap genotyped SNPs (see Supplementary Data).

Re-sequencing and variant analysis

Bi-directional direct re-sequencing of 35 kb of genomic DNA (chromosome 16, 24155960–24190949; Accession number GI 51511732, NCBI Nucleotide Database) encompassing SNPs 1–9 from 24 affected individuals was performed. Cases were chosen from families compatible with linkage to the CACNG3 locus and included individuals whose haplotypes demonstrated the most significant disease association. This re-sequencing work was performed by Polymorphic DNA Technologies Inc., using standard Sanger dideoxy sequencing protocols. The potential functional effect of all identified variants was assessed by searching for predicted regulatory motifs contained within the TransFac and Biobase databases via the Softberry NSITE portal. This website also contains the FPROM program, which predicts the position of potential promoters and enhancers. GeneSplicer29 was used to predict whether any variant might affect the splicing of the gene by identifying exon–intron boundaries and scoring them. ESEfinder30 was used to predict the presence of any exonic splicing enhancers in exon 1. This program identifies putative binding sites for four SR-rich proteins thought to be involved in the control of splicing. Prediction is based on a scoring system developed from weighted matrices for each motif consensus sequence; when a certain threshold score is achieved, the motif is recognised. The default values suggested by the program authors were used throughout. Standard BLAST analyses were performed to check for sequence conservation between species.
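The general idea of threshold-based weight-matrix scoring, and of asking whether a variant creates or abolishes a predicted motif, can be sketched as below. The matrix, threshold and sequences are invented for illustration; they are not the matrices or default thresholds of ESEfinder or of any other program named above.

```python
# Hypothetical 3-position weight matrix and threshold (illustrative only).
TOY_MATRIX = [
    {"A": 1.0, "C": -1.0, "G": 0.5, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": 0.0},
    {"A": 0.5, "C": -1.0, "G": 1.0, "T": -1.0},
]
TOY_THRESHOLD = 2.5

def score_window(window, matrix):
    """Sum the per-position weights for one window of sequence."""
    return sum(column[base] for column, base in zip(matrix, window))

def find_motifs(sequence, matrix, threshold):
    """Return (start, window, score) for every window reaching the threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = score_window(window, matrix)
        if score >= threshold:
            hits.append((start, window, score))
    return hits

def motif_change(ref_seq, alt_seq, matrix, threshold):
    """Return (gained, lost) motif start positions between two alleles."""
    ref = {hit[0] for hit in find_motifs(ref_seq, matrix, threshold)}
    alt = {hit[0] for hit in find_motifs(alt_seq, matrix, threshold)}
    return alt - ref, ref - alt

# Invented reference and variant sequences differing at one base.
ref_hits = find_motifs("TTACGTT", TOY_MATRIX, TOY_THRESHOLD)
gained, lost = motif_change("TTACGTT", "TTACTTT", TOY_MATRIX, TOY_THRESHOLD)
```

In this toy case the single-base change removes the only window that reaches the threshold, which is the kind of difference between alleles that such tools are used to flag.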

Results

Linkage analysis

The maximum HLOD score was 3.54 (α=0.62) located 0.15 cM upstream of the distal marker, UCL10321. The non-parametric analysis is also statistically significant: maximum NPL statistic of 2.87 (P<0.002) occurring at UCL10321 (Figure 1).

Re-sequencing of coding exons

Bi-directional re-sequencing of the coding exons and surrounding intron–exon boundaries in 59 cases identified 34 variants: four were upstream of CACNG3; six in the 5′UTR; five in intron 1; five in intron 2; nine in intron 3; one synonymous SNP in exon 4 (A2121G, Pro307Pro); two in the 3′UTR and two downstream of CACNG3.

LD block structure

Analysis of LD based on the whole resource identified five LD ‘blocks’ (Figure 2). The LD block structure predicted by the HapMap project genotyped SNPs (based on CEPH Caucasian data only) identified 11 blocks of LD across the same region (see Supplementary Data).

SNP-based association analysis

Three SNPs showed significant transmission disequilibrium (P≤0.01) with at least one of the test statistics: SNP3, SNP7 and SNP8 (Table 2). SNP3 is located approximately 2 kb upstream of CACNG3, whereas SNPs 7 and 8 are both located in intron 1. All three SNPs are in the first block of LD (Figure 2).

Table 2 SNPs showing statistically significant disease association (P≤0.01) in at least one PDT test statistic in the entire resource

Haplotype-based association analysis

Block-based haplotype association analysis was performed on the entire data set using the PDT. No single complete haplotype within a block was sufficiently common to allow demonstration of disease association on the global level; however, if a ‘sliding window’ approach was used on each block, associated haplotypes were identified. Using this approach, there are 13 haplotypes in Block 1, composed of combinations of SNPs 2–8, which demonstrate overtransmission and disease association (P≤0.05; Table 3). The individual haplotypes that are overtransmitted within each window together form a larger haplotype composed of the alleles 2211122. This haplotype has a frequency of 26.4% in our parental population.
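A sliding-window analysis of this kind, combined with recoding each window's haplotypes as alleles of a single multi-allelic locus, can be sketched as follows. The window widths actually tested are not stated in the text, so this sketch assumes every contiguous window of two or more SNPs; the helper names and all example haplotypes other than 2211122 are invented.

```python
def sliding_windows(snps, min_width=2):
    """Yield every contiguous run of SNPs of at least min_width markers."""
    for width in range(min_width, len(snps) + 1):
        for start in range(len(snps) - width + 1):
            yield snps[start:start + width]

def recode_haplotypes(haplotypes, snps, window):
    """Map each sub-haplotype within `window` to an integer allele code.

    haplotypes: strings of '1'/'2' alleles, one character per SNP in `snps`.
    Returns (codes, coded): codes maps sub-haplotype -> integer, and coded
    lists the integer assigned to each input haplotype in order.
    """
    positions = [snps.index(snp) for snp in window]
    codes, coded = {}, []
    for hap in haplotypes:
        sub = "".join(hap[i] for i in positions)
        codes.setdefault(sub, len(codes) + 1)
        coded.append(codes[sub])
    return codes, coded

block1 = [2, 3, 4, 5, 6, 7, 8]  # SNPs 2-8, as in the text
windows = list(sliding_windows(block1))

# Invented haplotypes; only 2211122 is taken from the text.
haps = ["2211122", "1211122", "2211122", "1122211"]
codes, coded = recode_haplotypes(haps, block1, [2, 3, 4])
```

Each recoded window can then be passed to a single-locus test such as the PDT, which is why the recoding step is needed.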

Table 3 SNP-based sliding-window analysis of Block 1 showing windows that demonstrated significant (P<0.05) global transmission disequilibrium in the entire resource when analysed using the PDT

The sliding window approach also produces some significant results in Block 2, which runs from SNPs 10 to 13, although these data are not as significant as for Block 1 (see Supplementary Data for details).

Variant detection

Intra-familial association analysis suggested that any functional variant underlying the observed transmission disequilibrium was most likely to be found between SNPs 1 and 9. Consequently, re-sequencing of the 35 kb of genomic sequence in this region was undertaken. Of the 48 chromosomes from 24 affecteds that were sequenced, 19 were of the most common haplotype, 2211122, which also shows the greatest evidence for disease association. The remaining 25 chromosomes that were sequenced were composed of a variety of different haplotypes. A total of 72 sequence variants were identified, including the nine previously typed (Figure 3; full details can be found in the Supplementary Information). Forty-five of these are within 20 kb upstream of the gene, two in the 5’UTR and the remaining 25 are in intron 1.

Figure 3

Map illustrating the location of all variants found via re-sequencing of 24 unrelated cases. The boxed SNPs are those which were considered to have most supporting evidence for being causal.

An initial assessment of which of the identified variants were most likely to be causal was based on whether the minor allele frequency was different in the 24 sequenced cases from that quoted on the NCBI database (if that information was available). Any variants in which this did seem to be the case were typed in our entire resource, so that intrafamilial association analysis could be performed. Three variants (rs392728, rs11860647 and rs8048987) were genotyped across the resource for this reason. However, intrafamilial association analysis with the PDT did not provide any evidence for preferential transmission of either allele (data not shown).

Bioinformatics tools were also used to ascertain which of these 72 variants might be functional. Those considered to be most likely to have a functional effect are summarised in Table 4 (see Supplementary Information for full details). Of these, rs2021512 and rs1494550 are conserved at the nucleotide level in the chimpanzee (see Supplementary Data). rs11646957 has been typed in our resource of pedigrees and trios and intrafamilial association analysis performed. The results were not significant (data not shown). Intrafamilial association analysis had already been performed on rs1494550 and n20 as they are SNP numbers 5 and 6 of the original 23 that were used. Neither demonstrated any disease association in these analyses (see Supplementary Information). However, rs2021512 did demonstrate significant transmission disequilibrium (SUM PDT χ2(1)=7.91, P=0.005; AVE PDT χ2(1)=4.90, P=0.027), with the reference allele being overtransmitted to cases (457 transmitted:422 not transmitted).
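The P-values quoted for the two PDT statistics can be checked from the χ2 values alone. The sketch below uses the fact that a χ2 variable with one degree of freedom is the square of a standard normal variable; the function name is hypothetical, and the PDT statistics themselves are taken from the text rather than recomputed.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom.

    For 1 d.f., P(X > stat) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2.0))

p_sum = chi2_1df_pvalue(7.91)  # SUM PDT statistic quoted in the text
p_ave = chi2_1df_pvalue(4.90)  # AVE PDT statistic quoted in the text
```

Both values round to the figures reported above (0.005 and 0.027).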

Table 4 Details of those variants most likely to have a functional effect based on the bioinformatics analyses performed

Discussion

Optimal strategies for the genetic analysis of ‘complex’ traits such as CAE have been much discussed in the recent literature and the potential problems are well recognised. These include uncertainties surrounding the delineation of a categorical phenotype and the unpredictable relationship between phenotype and genotype exemplified by current ignorance of the genetic architecture of most ‘complex’ traits, and the recognition that even epilepsy phenotypes demonstrating simple Mendelian inheritance display immense heterogeneity and variable expressivity. All these uncertainties preclude confident predictions about the likely power and validity of any strategy selected, but these data arise from approaches that have some reasonable degree of support and expectation of success given certain assumptions.

CAE is a fairly well-defined and homogeneous phenotype with an electrophysiological hallmark and clear evidence of a genetic aetiology. Although there is some debate as to whether a narrower phenotype definition should be adopted – in part to allow clearer prognostic predictions – the clinical criteria adopted here provide the reasonable expectation that the patients ascertained represent a homogeneous clinical phenotype. It is known that a variety of IGE phenotypes may cluster in families with a proband with absence epilepsy, but analysis reveals an increased clustering of CAE and JAE,2, 3 suggesting that they may share susceptibility loci. For this reason, the minority of pedigrees in which first degree relatives of a proband with CAE had a diagnosis of JAE were included and such individuals were categorised as affected.

A further advantage of this phenotype for genetic analysis lies in the existing level of understanding of the molecular neurophysiological basis of the ‘spike–wave’ seizures that are its hallmark.1 A substantial body of evidence implicates VGCC genes in the aetiology of spike–wave seizures in rodents and absence seizures in humans. In particular, the stargazer phenotype arises from mutations in Cacng2, one of a family of so-called γ-subunit genes, which have been further defined as a family of transmembrane AMPA receptor regulatory proteins (TARPs)10 that mediate surface expression of AMPA receptors. Preliminary analysis in a limited family resource provided support for CACNG3 as a CAE susceptibility locus. It is noteworthy that the expression pattern of γ3 is specific to the cortex and hippocampus with low levels in the cerebellum, consistent with a role in epileptogenesis. A candidate gene approach therefore seems justified.

The relative merits of linkage or association in the analysis of a complex trait depend on the genetic architecture of the trait, which cannot of course be known in advance. Linkage can detect a locus of moderate effect in a set of small nuclear pedigrees or sib-pairs provided the proportion of linked families is adequate. It is robust to any allelic heterogeneity. Association has more power to detect loci of small effect, but does of course depend on allelic homogeneity and allele frequencies conferring adequate power. Both parametric and NPL analysis provided significant evidence for linkage, indicating CACNG3 is a susceptibility locus for CAE in a subset of the 65 nuclear pedigrees analysed. Sequencing of coding regions did not, however, identify any plausible causal sequence variants.

The role of CACNG3 was therefore further analysed by intra-familial association analysis using the indirect approach based on genotyping of a set of common SNPs spanning the CACNG3 gene. The pattern of LD across CACNG3 was established and confirmed by HapMap data. A number of assumptions underlie this strategy and restrict its power to identify causal sequence variants. It is assumed that the actual SNPs typed are not causal but are in sufficiently tight LD with causal SNPs of adequately matching allele frequency to permit their detection. Power diminishes rapidly if these conditions are not fulfilled.31 It is also assumed that any causal SNPs will be within the associated interval, although in practice the interval is difficult to define.32

Three SNPs in the first ‘block’ of LD showed significant transmission disequilibrium (SNPs 3, 7 and 8) using the pedigree disequilibrium test in the entire patient resource. Using a ‘sliding window’ approach, 13 haplotypes comprising SNPs 2–8 within this ‘block’ demonstrated transmission disequilibrium. Together, these form an extended haplotype composed of the alleles 2211122. Only two of these SNPs (SNPs 5 and 6) could be implicated on functional grounds. In each case, the minor allele is predicted to lead to creation of a splice acceptor site. However, neither SNP showed significant transmission disequilibrium either alone or in combination. A correction for multiple testing was not applied to these analyses, because although methods for calculating the effective number of independent tests have been developed,33 their validity in the presence of haplotype block structure has been questioned34 and these methods are not yet established tools. Thus, the association evidence described must be considered tentative and requiring independent replication.

These observations suggested that causal variants underlying the observed transmission disequilibrium might lie within the genomic region between SNPs 1 and 9 and were most likely to be found on chromosomes of the haplotype 2211122. Re-sequencing of 35 kb of genomic sequence in 48 chromosomes identified a total of 72 sequence variants. Evaluation of these variants encompassed the use of bioinformatics tools to determine any potential functionality and genotyping in the entire resource to investigate disease association. Four SNPs were predicted to have some functional effect by disrupting exonic splicing enhancer binding motifs or creating novel acceptor sites. One of these, rs2021512, demonstrated significant transmission disequilibrium with the reference sequence allele (G) being overtransmitted in preference to the variant allele (A). This suggests that the variant form is protective. Analysis indicated that the variant allele could potentially create an acceptor splice site, although it is unclear how this might affect the function of the protein as rs2021512 is non-genic and approximately 14 kb upstream of CACNG3. However, it is possible that this SNP has a subtle regulatory effect, which was not identified with the bioinformatics tools used. Indeed, a paper earlier this year demonstrated that a non-genic variant can have a gain-of-function effect on another gene by creating a new transcriptional promoter.35 This is not necessarily what is occurring in this situation, but it is clear that variants some distance from a gene can still exert a powerful effect on it. Furthermore, it is still possible that rs2021512 is not a causal variant but is in LD with an unidentified causal variant.

It is possible that the linkage observed is spurious and CACNG3 is not a susceptibility locus for the CAE trait. A false positive result is of course feasible even with the fairly stringent threshold for significance utilised. The transmission disequilibrium observed could be a false positive result, although not due to population substructure. Alternatively, the observed association is real, but driven by causal variants outside the sequenced region or too infrequent and heterogeneous to be detected in the limited number of chromosomes sequenced. It has been demonstrated that long range LD can exist generating ‘genetically indistinguishable SNPs’, which are many kilobases apart.36 The power to detect a homogeneous causal variant with a population frequency of 5% is approximately 92% when 48 chromosomes are sequenced but of course a heterogeneous collection of low frequency variants might go undetected. Finally, it is possible that the observed SNPs demonstrating transmission disequilibrium have functional consequences, which are not apparent.
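The order of magnitude of the detection power mentioned above can be illustrated with a simple binomial argument: the chance of observing a variant at least once among n sequenced chromosomes, assuming the chromosomes are independent draws from a population in which the variant has a fixed frequency. This is a sketch under those stated assumptions and not necessarily the calculation used by the authors; the function name is hypothetical.

```python
def detection_power(allele_freq, n_chromosomes):
    """Probability of seeing a variant at least once among n chromosomes,
    assuming independent draws at a fixed population frequency."""
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes

# Frequency of 5% and 48 chromosomes, as quoted in the text.
power = detection_power(0.05, 48)
```

Under these assumptions the result is a little over 0.91, in the region of the figure quoted. The same function shows how quickly power falls for rarer variants, which is the concern raised about heterogeneous low-frequency alleles.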

In conclusion, these observations provide genetic evidence that CACNG3 is a susceptibility locus for CAE. Common variants showing transmission disequilibrium have been identified. Definitive evidence to confirm or exclude this locus will require re-sequencing across an extended genomic region encompassing CACNG3 in a larger number of patients. Replication studies in similar resources of CAE patients would demonstrate whether rs2021512 is associated in other patient groups, and functional work to establish what the exact biological mechanism could be is needed.