Genome-wide association analysis identifies new candidate risk loci for familial intracranial aneurysm in the French-Canadian population

Intracranial aneurysm (IA) is a common disease with a worldwide prevalence of 1–3%. In the French-Canadian (FC) population, which has a strong founder effect, the incidence of IA is higher and the disease frequently clusters in families. In this study, we genotyped a cohort of 257 mostly familial FC IA patients and 1,992 FC controls using the Illumina NeuroX SNP-chip. The most strongly associated loci were then tested in 34 Inuit IA families and in whole-exome sequencing (WES) data from 32 FC IA patients and 106 FC controls. After imputation, one locus at 3p14.2 (FHIT, rs1554600, p = 4.66 × 10−9) reached genome-wide significance, and follow-up in the Nunavik Inuit cohort provided further support for the association of FHIT variants (rs780365, FBAT-O, p = 0.002839). Additionally, among the other promising loci (p < 5 × 10−6), the one at 3q13.2 (rs78125721, p = 4.77 × 10−7), which encompasses CCDC80, also showed an increased mutation burden in the WES data (CCDC80, SKAT-O, p = 0.0005). In this study, we identified two new potential IA loci in the FC population: FHIT, which is significantly associated with hypertensive IA, and CCDC80, which has potential genetic and functional relevance to IA pathogenesis. These findings provide evidence for additional risk loci for familial IA. We also replicated the previously reported IA GWAS risk locus 18q11.2 and identified a potential locus at 8p23.1 that warrants further study.

Intracranial aneurysm (IA) has a prevalence of 1–3% in the general population 1,2. The rupture of an IA can lead to subarachnoid hemorrhage (SAH), which has devastating consequences. Environmental and genetic factors, such as hypertension and smoking 3, family history and ethnicity, all contribute to the risk of IA. Because of the complexity of IA, genome-wide association studies (GWAS) have become the predominant strategy for identifying genetic factors associated with IA. These studies used several large cohorts of IA patients, mainly of Finnish, Japanese or European descent. Several risk loci were discovered in these GWAS: 8q11.23 (SOX17), 9p21.3-23.1 (CDKN2A-CDKN2BAS) 4,5 and 2q33.1 from the European and Japanese cohorts; 18q11.2 (RBBP8), 13q13.1 (STARD13-KL) and 10q24.32 6 from the Finnish and Japanese cohorts; 1q23.1, 3p25.2, 7p21.2, 9q31.3 6 and 4q31.22 (EDNRA) 7 from two Japanese cohorts; and 7p21.1 (HDAC9) 8 from a cohort with European ancestry. A more recent Finnish IA study revealed additional GWAS risk loci, including 2q23.3, 5q31.3 and 6q24.2, represented by low-frequency SNPs 9. Multiple GWAS signals suggest that the genetic etiology of IA may be complex and population specific. The risk loci found in each GWAS were estimated to account for only 4.1–6.1% of the heritability in the respective cohort 9.
It has been reported that the French-Canadian (FC) population has a higher IA/SAH incidence and that patients usually aggregate in large pedigrees 10, with 30% of IA patients having a family history (fIA) 11. Like the Finnish, French-Canadians descend from a relatively small founder population and carry population-specific variants owing to the population bottleneck and genetic drift. We therefore hypothesized that population-specific variants of intermediate or low frequency may play an important part in disease risk in FC fIA.

GWAS discovery phase.
After data QC and sample pruning, 621,983 SNPs in 173 FC IA cases and 1,772 FC controls remained in the analysis. The genome-wide significance threshold after Bonferroni correction was set to 5 × 10−8. Marker-wise p values for genotyped variants were computed with the Cochran-Armitage trend test in PLINK 1.9 12. The genomic inflation factor (λ = 1.02) indicated little inflation of the test statistics, as shown in the quantile-quantile (QQ) plot (Supplementary Figure S1).
After imputation using the Haplotype Reference Consortium (HRC) panel and exclusion of low-quality and low-MAF variants, the remaining 7,614,484 variants were tested for association using SNPtest 13. The results are shown in the Manhattan plot (Figure 1). One locus reached genome-wide significance after imputation: 3p14.2 (rs1554600, OR 0.26, p = 4.66 × 10−9), located in the gene FHIT. TaqMan validation of rs1554600 in IA cases and controls suggested that the imputation was accurate (maf.case = 0.0838 and maf.control = 0.026). The 28 most significant SNPs, representing 26 distinct loci that each reached the suggestive level (p < 5 × 10−6), were prioritized for further validation (Table 1). A meta-analysis of 138 SNPs (Supplementary Table S1) located in 23 of the 26 loci, combining the current study and the FIA cohort 8, showed that three SNPs, rs76308736 (8p23.1), rs7084131 (10p14) and rs4867356 (5p13.3), were validated with a decreased p value (Table 2).
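The trend statistic and the inflation factor reported above can be illustrated with a minimal Python sketch; it is not the PLINK implementation. It assumes additive genotype weights (0, 1, 2) for the Cochran-Armitage trend test and estimates λ as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The function names and the genotype counts in the usage lines are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def trend_test_chi2(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend chi-square statistic (1 df).

    cases, controls: genotype counts ordered by minor-allele dosage (0, 1, 2).
    weights: dosage scores; (0, 1, 2) is an assumed additive coding.
    """
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    if n == 0:
        return 0.0
    totals = [r + s for r, s in zip(cases, controls)]
    score = sum(w * (r * n_ctrl - s * n_case)
                for w, r, s in zip(weights, cases, controls))
    sum_w = sum(w * c for w, c in zip(weights, totals))
    sum_w2 = sum(w * w * c for w, c in zip(weights, totals))
    variance = n_case * n_ctrl / n * (n * sum_w2 - sum_w ** 2)
    if variance == 0:  # monomorphic marker or a single group: no information
        return 0.0
    return score ** 2 / variance


def chi2_to_p(chi2):
    """Two-sided p value for a 1-df chi-square statistic."""
    return 2 * NormalDist().cdf(-chi2 ** 0.5)


def genomic_inflation(chi2_values):
    """Lambda: median observed statistic over the null median."""
    return median(chi2_values) / CHI2_1DF_MEDIAN


# Hypothetical genotype counts for one marker.
chi2 = trend_test_chi2(cases=(30, 10, 0), controls=(20, 20, 0))
p_value = chi2_to_p(chi2)
```

In practice `genomic_inflation` would be applied to the statistics of all tested markers; a value close to 1, such as the 1.02 reported here, indicates that the observed median is near its null expectation.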
Replication in exome data and Inuit cohort. We examined the exome sequencing data for the aforementioned 28 genes containing the top GWAS SNPs in our FC WES cohort; in the 23 genes that had exonic variants, a total of 177 exonic and splicing variants were found in 138 FC cases and controls. The Sequence Kernel Association Test (SKAT) 17 showed an excess exonic variant burden in IA cases in four genes: SLC35F3 (p = 0.002), DTNB (p = 0.003), CCDC80 (p = 0.0005) and PABPC3 (p = 0.0001) (Table 3). However, the results for the first two genes were less convincing because only two variants were tested in each. PABPC3 is unlikely to be a risk gene for IA because its expression in humans is testis specific. The Variable Thresholds Test (VT) 18, applied to the selected genes, showed that only CCDC80 (p = 0.01) at 3q13.2 reached statistical significance after accounting for multiple testing.
In the Nunavik Inuit IA cohort, a Family Based Association Test (FBAT) 19 of the 2,429 SNPs within the 28 aforementioned genes and neighboring regions identified 50 SNPs in the FHIT gene region with p < 0.05 (Supplementary Table S2), the most significant being rs780365 (p = 0.002839). These associations did not remain significant after correction for multiple testing (threshold p < 0.00014; 2,429 variants in 353 independent tests), possibly because of the limited sample size, but they still suggest that FHIT variants may be associated with IA.
Replication of previous GWAS risk loci. We also attempted to replicate, in our FC IA cohorts, the 12 IA risk loci identified in previous GWAS. A total of 825 distinct LD blocks were established in these 12 loci in the FC population; the significance level after correction for multiple testing was therefore p < 6.06 × 10−5. Only one SNP, rs35127791 in 18q11.2, was replicated (maf = 0.157, p = 5.05 × 10−5, beta = −0.66, SE = 0.16) (Figure 2). LocusZoom plots covering the rest of the 23 genome-wide significant SNPs from previous GWAS in these 12 loci are shown in Supplementary Figure S3.
We further examined the FC WES data at locus 18q11.2, which comprises four protein-coding genes: GATA6, CTAGE1, RBBP8 and CABLES1. A variant burden test suggested that exonic variants of CABLES1 are associated with IA in FC (p = 0.022, corrected) (Supplementary Table S3).
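The two replication thresholds quoted in this section (0.00014 for 353 independent tests and 6.06 × 10−5 for 825 LD blocks) are consistent with a Bonferroni correction over the number of independent tests. The short Python check below assumes a family-wise α of 0.05, which is not stated explicitly in the text but reproduces both reported thresholds; the helper name is hypothetical.

```python
def bonferroni_threshold(n_independent_tests, alpha=0.05):
    """Per-test significance threshold; alpha = 0.05 is an assumption."""
    return alpha / n_independent_tests


# Independent-test counts taken from the text.
inuit_threshold = bonferroni_threshold(353)          # FBAT in the Inuit cohort
previous_loci_threshold = bonferroni_threshold(825)  # LD blocks in the 12 loci

# Reported p values compared against the thresholds.
rs780365_passes = 0.002839 < inuit_threshold
rs35127791_passes = 5.05e-5 < previous_loci_threshold

print(f"{inuit_threshold:.2e} {previous_loci_threshold:.2e}")
print(rs780365_passes, rs35127791_passes)
```

Under this assumption, rs780365 falls short of the Inuit threshold while rs35127791 passes the threshold for the previously reported loci, in line with the results described above.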

Discussion
In this study, we discovered a new IA-associated region on 3p14.2, which encompasses the FHIT gene, in French-Canadians (Figure 3); intronic variants in FHIT also suggested an association with IA in the Nunavik Inuit population. Additionally, we found evidence suggesting that exonic variants in CCDC80 within the 3q13.2 locus are associated with IA in French-Canadians. Collectively, SNPs in FHIT and CCDC80 could explain approximately 3% of the heritability of IA in French-Canadians, higher than was reported in the Finnish study (2.1%) 9 and in the GWAS replication (2.5%) 6. The underlying reason might be that French-Canadians are a more homogeneous population and that this study mainly included familial cases. On the other hand, we also replicated a previous GWAS risk locus, 18q11.2 20 […] hypertension. Among the three SNPs that were validated in the FIA cohort, rs76308736 is located in the promoter region of SOX7, which has a crucial function in angiogenesis and the establishment of arteriovenous identity. Because rs76308736 showed only suggestive significance in our discovery cohort and did not pass correction for multiple testing in the FIA data, the relationship between this SNP and the risk of IA remains to be explored. Although the number of cases in this study was limited, we sought to improve power by 1) targeting IA patients with a family history, 2) focusing on individuals of French-Canadian ethnicity only, and 3) validating the findings in exome sequencing results and in another founder population. Because French-Canadians originated from a small founder population, we also included intermediate-frequency variants after imputation. The top SNPs that we discovered in FHIT and CCDC80 had rare or intermediate frequencies (2–4%).
The MAF of rs1554600 was higher in French-Canadians (3.3%) than in Europeans (1.9%) and significantly higher than in East Asians (0.1%), which suggests that a bottleneck and/or drift may explain the accumulation of low-frequency variants potentially associated with the risk of IA in French-Canadians.
FHIT (fragile histidine triad) is a tumor suppressor gene that regulates DNA replication and signals stress responses 23, and it encompasses the most active of the common human chromosomal fragile regions (FRA3B). Its expression has an important role in the response to oxidative damage 24, and oxidative stress is known to be a key contributor to IA formation and rupture 25. Interestingly, SNPs in FHIT have been associated with hypertensive traits in populations from the Saguenay-Lac-St-Jean region, who are mainly French-Canadians 26. As over 40% of our French-Canadian IA cases were also affected by hypertension (Table 4), we considered that rs1554600 in FHIT may be a risk variant for hypertensive IA in French-Canadians. A further test of FHIT SNPs between IA patients with and without hypertension revealed several SNPs in LD with rs1554600 to be significantly associated with this trait (e.g., rs73098963, p = 0.002611, GWAS p value = 2.42 × 10−8). FHIT has also been reported in other GWAS to be associated with hypertension: rs6782531, which is located approximately 160 kb upstream of and in high LD with rs1554600, has been reported to be significantly associated with blood pressure 27.
The 3q13.2 locus is a gene-rich region with many functions in inflammatory responses. BTLA (B- and T-lymphocyte attenuator) is involved in inflammatory responses 28 and homeostasis of the immune system 29. A previous study showed BTLA expression to be up-regulated in organs after hemorrhagic shock 30. The 3q13.2 locus also comprises ATG3, which encodes a protein known to induce apoptosis 31 and to act as a regulator of oxidant and inflammatory balance that regulates the endothelial cell stress response 32. The top SNP in ATG3, rs2705520, was also reported in a GWAS to be associated with asthma 33, suggesting a role in inflammatory diseases.
The most interesting gene in the 3q13.2 locus is CCDC80, also known as SSG1 (steroid-sensitive gene 1), which is a cGMP signalling effector and was reported to be widely expressed in vascular smooth muscle cells 34. A previous study also showed that it acts as a modulator of glucose and energy homeostasis 35. Another study highlighted that the product of fibroblast growth factor (FGF) regulates the expression of CCDC80 36, which in turn is involved in cell adhesion during fibroblast differentiation 37. CCDC80 was also reported as a tumor suppressor 38. This evidence suggests a critical function of CCDC80 in vascular formation. CCDC80 harbors a large number of rare variants in French-Canadians (Supplementary Table S4), and the variant burden differed significantly between cases and controls. Both ATG3 and CCDC80 are dosage-sensitive genes 39,40; therefore, differences in the expression levels of these genes may affect the risk of IA.
In conclusion, we have provided evidence for four new loci associated with IA in a French-Canadian IA cohort recruited from Montréal and Québec City, which could explain 3% of the disease heritability. Based on the findings of this study and the functions of their encoded products, two genes (FHIT and CCDC80) are potentially relevant to IA, given the strong aggregation of familial IA cases with high blood pressure in the French-Canadian population. FHIT is more particularly associated with hypertensive IA cases, which may be a consequence of a bottleneck and/or drift that affected the French-Canadian founder population. CCDC80 was shown to carry a large number of rare variants in the French-Canadian cohort, with a significantly different variant burden between IA cases and controls. Both the lack of association of SNPs in FHIT and CCDC80 in previous GWAS and the replication of only the 18q11.2 locus among the previous GWAS hits suggest genetic heterogeneity in IA; additional studies targeting other high-risk populations are therefore needed. However, the limited number of cases available in our study calls for a validation study with access to a larger cohort from the same founder population to increase the power of detection.

Methods
Discovery cohort. The discovery cohort included 257 French-Canadian IA patients, the majority of whom had a family history of IA, recruited in Montréal and Québec City, Canada. Diagnoses were confirmed either by magnetic resonance angiography (MRA) or surgically (clipping or coiling). An additional 1,992 controls, mainly unrelated FC individuals without cerebrovascular disease, were also included in the analysis. Their demographic information is listed in Table 4. Written informed consent was obtained from all participants, and this manuscript contains no identifying information for any participant. This study was approved by the Comité d'éthique de la recherche du Centre hospitalier de l'Université de Montréal and the McGill University research ethics board; all methods were performed in accordance with the relevant guidelines and regulations of McGill University (REB NEU-14-051).
Genotyping and quality control. All patients and controls were genotyped using the Illumina NeuroX SNP-chip, which contains 719,885 markers and is built on the backbone of the Illumina HumanOmniExpress-v24 BeadChip. Raw data were processed with Illumina GenomeStudio software to generate the genotypes. Both markers and samples were passed through a series of quality control (QC) steps. Samples were removed if they were duplicated or had one of the following issues: 1) sex discrepancies; 2) a missing rate exceeding 0.02; 3) ethnic admixture determined by PCA; or 4) cryptic relatedness determined by PLINK. Markers were removed if they met one of the following criteria: 1) a missing rate exceeding 0.02; 2) a minor allele frequency (MAF) lower than 0.01; or 3) deviation from Hardy-Weinberg equilibrium (p < 0.0001).
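The marker-level filters listed above can be sketched as a single predicate. This Python example uses the three thresholds from the text, but it substitutes a 1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium, which is an assumption; the test actually applied in PLINK may differ (for example, an exact test). The function and parameter names are hypothetical.

```python
from math import erfc, sqrt


def marker_passes_qc(n_aa, n_ab, n_bb, n_missing,
                     max_missing=0.02, min_maf=0.01, hwe_p_min=1e-4):
    """Return True if a marker survives the missingness, MAF and HWE filters.

    n_aa, n_ab, n_bb: genotype counts among called samples.
    n_missing: number of samples with no genotype call.
    """
    n_called = n_aa + n_ab + n_bb
    if n_called == 0:
        return False
    # 1) Missing rate above the threshold.
    if n_missing / (n_called + n_missing) > max_missing:
        return False
    # 2) Minor allele frequency below the threshold.
    freq_a = (2 * n_aa + n_ab) / (2 * n_called)
    freq_b = 1.0 - freq_a
    if min(freq_a, freq_b) < min_maf:
        return False
    # 3) HWE deviation, here via a chi-square test (assumed stand-in).
    expected = (freq_a ** 2 * n_called,
                2 * freq_a * freq_b * n_called,
                freq_b ** 2 * n_called)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_hwe = erfc(sqrt(chi2 / 2))  # survival function of chi-square, 1 df
    return p_hwe >= hwe_p_min
```

Because the MAF filter runs before the HWE step, all expected genotype counts are strictly positive when the chi-square statistic is computed.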
Principal Component Analysis (PCA) implemented in the EIGENSOFT 6.0 package 41 was performed to assess the ethnicity of the samples. Three distinct populations (CEU, CHB and YRI) from the 1000 Genomes Project (1KGP) Phase III were used for clustering, and CEU outliers were removed from further analysis. The subsequent association tests on the remaining homogeneous population were also adjusted for the principal components.
Imputation. Imputation was performed on the Sanger Imputation Server (https://imputation.sanger.ac.uk/) using the Haplotype Reference Consortium r1.1 panel 41, after pre-phasing with SHAPEIT2 43. Imputed variants were included in further analyses only if they had MAF >0.01 and an imputation quality score >0.3.

Association analysis.
The frequentist additive association test implemented in SNPtest 13 was applied to the imputed dataset to compare FC IA cases and controls. Five major principal components were used as covariates for ancestry adjustment, along with the sex of the samples. Only autosomal SNPs were analyzed.
Regional associations at suggestive loci were plotted using LocusZoom 44 with LD data from the 1KGP CEU population.
Heritability estimation. We estimated the heritability explained by the original and imputed variants within the most promising loci using the estimation of variance explained by SNPs (GREML) 14 and GREML-LDMS 16 methods implemented in the Genome-wide Complex Trait Analysis (GCTA) package 15.

Meta-analysis.
SNPs identified in the current study that reached the suggestive level of association (p < 5 × 10−6) were compared with summary statistics from the previous FIA GWAS of 2,617 IA cases and 1,416 controls 8. METAL was used to meta-analyze the two GWAS results for these selected SNPs.
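METAL supports more than one weighting scheme, and the text does not say which was used. As an illustration only, the Python sketch below combines two studies with a sample-size-weighted Z-score approach, in which each study's signed Z score is weighted by the square root of its sample size. The function names and the p values in the usage line are hypothetical; the sample sizes are the case and control totals given in the text (after QC for the current study).

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def signed_z(p_value, effect_direction):
    """Convert a two-sided p value to a Z score signed by effect direction."""
    magnitude = -_STD_NORMAL.inv_cdf(p_value / 2)
    return magnitude if effect_direction >= 0 else -magnitude


def sample_size_meta(studies):
    """Sample-size-weighted Z-score meta-analysis (assumed scheme).

    studies: iterable of (p_value, effect_direction, n_samples), with the
    effect direction given relative to the same reference allele.
    Returns (combined_z, combined_two_sided_p).
    """
    studies = list(studies)
    numerator = sum(sqrt(n) * signed_z(p, d) for p, d, n in studies)
    denominator = sqrt(sum(n for _, _, n in studies))
    z = numerator / denominator
    return z, 2 * _STD_NORMAL.cdf(-abs(z))


# Hypothetical p values; sample sizes are 173 + 1,772 and 2,617 + 1,416.
z, p = sample_size_meta([(1e-6, -1, 1945), (0.01, -1, 4033)])
```

When both studies point in the same direction, the combined p value can fall below either input, which is the pattern described as validation with a decreased p value; opposite directions pull the combined Z score toward zero.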
FC and Inuit IA WES data. We examined the loci from the GWAS signals that reached the suggestive level of association in the exome sequencing results of 32 selected FC IA cases and 106 FC controls. The Variable Thresholds Test (VT) 18 implemented in Variant Tools (Vtools) 45 and the Sequence Kernel Association Test (SKAT) 17 were performed to test the exonic variant burden in the genes located in the GWAS significant regions. Thirty-four Nunavik Inuit (Québec, Canada) families comprising 49 IA patients and 124 family controls were also used to follow up the regions of significance. These samples were genotyped on the Illumina HumanOmniExpress-v24 BeadChip, which contains 730,525 SNPs. We examined all the suggestive loci from the FC IA discovery cohort and performed family-based association analysis, implemented in the FBAT package 19, on the Inuit SNP-chip data to test for case-control association in related individuals.
Replication of previous IA GWAS loci. Twelve loci from previous GWAS (Supplementary Figure S3), containing 23 SNPs that reached genome-wide significance, were selected to examine whether they could be replicated in our study. For each of the 12 loci, the region extending from 500 kb upstream of the first to 500 kb downstream of the last genome-wide significant SNP was investigated in our GWAS data. Correction for multiple testing was based on the number of independent tests, defined as the number of LD blocks within these 12 regions. LocusZoom was used for data plotting. The validated loci were further examined in the FC WES data using the Genome Analysis Toolkit (GATK) 46 and VTools.
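The region definition used for each previously reported locus amounts to taking the span of its genome-wide significant SNPs and extending it by 500 kb on each side. A minimal Python sketch follows; the function names, the record layout and the positions are hypothetical.

```python
def locus_window(sig_positions, flank_bp=500_000):
    """Return (start, end) in bp: span of significant SNPs plus flanks."""
    if not sig_positions:
        raise ValueError("at least one significant SNP position is required")
    start = max(1, min(sig_positions) - flank_bp)  # clamp at chromosome start
    end = max(sig_positions) + flank_bp
    return start, end


def snps_in_window(snps, chrom, window):
    """Select (chrom, pos, p_value) records that fall inside the window."""
    start, end = window
    return [s for s in snps if s[0] == chrom and start <= s[1] <= end]


# Hypothetical positions of two genome-wide significant SNPs in one locus.
window = locus_window([20_100_000, 20_350_000])
```

The SNPs selected this way would then be grouped into LD blocks, and the number of blocks across all 12 regions gives the count of independent tests used for the correction.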