Introduction

Intracranial aneurysm (IA) has a prevalence of 1–3% in the general population1,2. The rupture of an IA can lead to subarachnoid hemorrhage (SAH), which has devastating consequences. Environmental and genetic factors, such as hypertension and smoking3, family history and ethnicity, all contribute to the risk of IA. Because of the complexity of IA, genome-wide association studies (GWAS) have become the predominant strategy for identifying genetic factors associated with IA. These studies used several large cohorts of IA patients, mainly of Finnish, Japanese or European descent. Several risk loci were discovered in these GWAS: 8q11.23 (SOX17), 9p21.3-23.1 (CDKN2A-CDKN2BAS)4,5 and 2q33.1 from the European and Japanese cohorts; 18q11.2 (RBBP8), 13q13.1 (STARD13-KL) and 10q24.326 from the Finnish and Japanese cohorts; 1q23.1, 3p25.2, 7p21.2, 9q31.36 and 4q31.22 (EDNRA)7 from two Japanese cohorts, and 7p21.1 (HDAC9)8 from a cohort with European ancestry. A more recent Finnish IA study revealed additional GWAS risk loci, including 2q23.3, 5q31.3 and 6q24.2, represented by low-frequency SNPs9. Multiple GWAS signals suggest that the genetic etiology of IA may be complex and population specific. The risk loci found in each GWAS were estimated to account for only 4.1–6.1% of the heritability in the respective cohort9.

It has been reported that the French-Canadian (FC) population has a higher IA/SAH incidence and that patients usually aggregate in large pedigrees10, with 30% of IA patients having a family history (fIA)11. Similar to the Finnish population, French-Canadians are descended from a relatively small founder population and carry population-specific variants due to the population bottleneck and genetic drift. Therefore, we hypothesized that population-specific variants of intermediate or low frequency may play an important part in disease risk in FC fIA.

Results

GWAS discovery phase

After data QC and sample pruning, 621,983 SNPs in 173 FC IA cases and 1,772 FC controls remained in the analysis. The genome-wide significance threshold after Bonferroni correction was set to 5 × 10−8. Marker-wise p values for genotyped variants were computed with the Cochran–Armitage trend test using PLINK 1.912. The genomic inflation factor λ = 1.02 indicated little inflation of the test statistics, as shown in the quantile-quantile (QQ) plot (Supplementary Figure S1).
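For readers unfamiliar with the genomic inflation factor, the following is a minimal sketch of how λ is commonly computed from a list of marker-wise p values: each p value is converted back to a 1-df chi-square statistic, and the median of those statistics is divided by the expected median under the null. The function name `genomic_inflation` is hypothetical and this is not the code used in the study; it assumes two-sided p values strictly greater than 0.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(p_values):
    """Return lambda = median(observed chi-square) / median(null chi-square, 1 df).

    Hypothetical helper for illustration; assumes every p value is in (0, 1].
    """
    std_normal = NormalDist()
    chi2_stats = []
    for p in p_values:
        # A two-sided p value p corresponds to z = Phi^-1(p / 2); chi-square = z^2.
        z = std_normal.inv_cdf(p / 2.0)
        chi2_stats.append(z * z)
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced p values mimic a null distribution, so lambda is close to 1.
    null_like = [(i + 0.5) / 1000 for i in range(1000)]
    print(round(genomic_inflation(null_like), 3))
```

A value close to 1, such as the 1.02 reported above, indicates that the bulk of the test statistics follows the null distribution.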

The result of the initial trend test showed 3q13.2 as the most significant locus: rs2705520 (p = 6.93 × 10−8), an intronic SNP in ATG3 (autophagy 3), followed by rs1877362 (p = 1.16 × 10−7) in CCDC80 (coiled-coil domain containing 80) and rs1472107 (p = 1.18 × 10−7) in SLC35A5 (Solute Carrier Family 35 member A5) (Supplementary Figure S2).
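As an illustration of the trend test used for the genotyped markers, the sketch below computes the Cochran–Armitage statistic for a single SNP from genotype counts in cases and controls, using additive weights (0, 1, 2). It is a textbook formulation, not PLINK's implementation; the function name and the example counts are hypothetical.

```python
import math


def cochran_armitage_trend(case_counts, control_counts, weights=(0, 1, 2)):
    """Return (chi-square, p value) for the 1-df Cochran-Armitage trend test.

    case_counts / control_counts: genotype counts ordered by copies of the
    tested allele (0, 1, 2). Hypothetical helper for illustration only.
    """
    totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases = sum(case_counts)
    n_controls = sum(control_counts)
    n_all = n_cases + n_controls

    sum_w_cases = sum(w * r for w, r in zip(weights, case_counts))
    sum_w_all = sum(w * n for w, n in zip(weights, totals))
    sum_w2_all = sum(w * w * n for w, n in zip(weights, totals))

    # Trend statistic T and the denominator of its chi-square form.
    t_stat = n_all * sum_w_cases - n_cases * sum_w_all
    denom = n_cases * n_controls * (n_all * sum_w2_all - sum_w_all ** 2)
    if denom == 0:
        return 0.0, 1.0  # no variation in genotype or phenotype

    chi2 = n_all * t_stat ** 2 / denom
    # Upper-tail probability of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: the tested allele is enriched in cases.
    chi2, p = cochran_armitage_trend((150, 20, 3), (1700, 70, 2))
    print(f"chi2 = {chi2:.2f}, p = {p:.3g}")
```

With additive weights the test is sensitive to a per-allele dose effect, which is the same genetic model assumed by the additive test applied later to the imputed data.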

After imputation using the Haplotype Reference Consortium (HRC) panel and the exclusion of low-quality and low-MAF variants, the 7,614,484 remaining variants were tested for association using SNPtest13. The results are shown in the Manhattan plot (Figure 1). One locus reached genome-wide significance after imputation: 3p14.2 (rs1554600, OR 0.26, p = 4.66 × 10−9), located in the FHIT gene. TaqMan validation of rs1554600 in IA cases and controls suggested that the imputation was accurate (MAF in cases = 0.0838; MAF in controls = 0.026). The 28 most significant SNPs, representing 26 distinct loci that each reached the suggestive level (p < 5 × 10−6), were prioritized for further validation (Table 1). A meta-analysis of 138 SNPs (Supplementary Table S1) located in 23 of the 26 loci, combining the current study and the FIA cohort8, showed that three SNPs, rs76308736 (8p23.1), rs7084131 (10p14) and rs4867356 (5p13.3), were validated with decreased p-values (Table 2).

Figure 1

Genome-wide association analysis in the FC IA discovery cohort. After imputation using the HRC panel, the 7,614,484 variants that passed QC were included in the Manhattan plot. The x-axis shows the physical position along the genome. The y-axis shows the −log10 (p-value) for association. The red line indicates the level of genome-wide significant association (p = 5 × 10−8); the blue line indicates the level of suggestive association (p = 5 × 10−6). Green dots indicate FHIT SNPs.

Table 1 Twenty-six loci (28 genes) whose top SNPs reached a suggestive level of association.
Table 2 Three loci with SNPs validated in the FIA cohort.

Using imputed SNPs, the two loci 3p14.2 and 3q13.2 were estimated by GCTA-GREML14,15,16 (Genome-wide Complex Trait Analysis) to account for approximately 3% of the heritability in the FC cohort (standard error (SE) = 0.026).

Replication in exome data and Inuit cohort

We examined the exome sequencing data for the aforementioned 28 genes containing the top GWAS SNPs in our FC WES cohort. Among the 23 genes that had exonic variants, a total of 177 exonic and splicing variants were found in 138 FC cases and controls. Sequence Kernel Association Test (SKAT)17 results showed an excess exonic variation burden in IA cases in four genes: SLC35F3 (p = 0.002), DTNB (p = 0.003), CCDC80 (p = 0.0005) and PABPC3 (p = 0.0001) (Table 3). However, the results for the first two genes were less convincing given the limited number of variants tested (two variants each). PABPC3 was unlikely to be a risk gene for IA because its expression is specific to the human testis. The Variable Thresholds test (VT)18 applied to the selected genes revealed that only CCDC80 (p = 0.01) in 3q13.2 reached statistical significance after accounting for multiple testing.

Table 3 Exonic and splicing variants of 23 GWAS suggestive genes in 138 FC cases and controls from WES.

In the Nunavik Inuit IA cohort, a Family Based Association Test (FBAT)19 of the 2,429 SNPs within the 28 aforementioned genes and neighboring regions identified 50 SNPs in the FHIT gene region with p < 0.05 (Supplementary Table S2), the most significant being rs780365 (p = 0.002839). These associations were no longer significant after correction for multiple testing (threshold p < 0.00014; 2,429 variants in 353 independent tests). This could be due to the limited sample size, and the results still suggest that FHIT variants may be associated with IA.

Replication of previous GWAS risk loci

We also attempted to replicate the 12 IA risk loci identified in previous GWAS in our FC IA cohort. In the FC population, 825 distinct LD blocks were established in these 12 loci; the significance level after multiple-testing correction was therefore p < 6.06 × 10−5. Only one SNP, rs35127791, located in 18q11.2, was replicated (MAF = 0.157, p = 5.05 × 10−5, beta = −0.66, SE = 0.16) (Figure 2). LocusZoom plots covering the rest of the 23 genome-wide significant SNPs from previous GWAS in these 12 loci are shown in Supplementary Figure S3.

Figure 2

Regional association signals of the previous GWAS risk locus 18q11.2. LocusZoom plot showing the regional association of chr18:19.7–20.75 Mb, including the most significant SNP, rs35127791, and the previous GWAS SNP, rs11661542. The purple line indicates the genetic recombination rate (cM/Mb). SNPs in linkage disequilibrium with rs35127791 are shown in a color gradient indicating r2 levels (hg19, 1KGP, Nov 2014, EUR).

We further examined the FC WES data in locus 18q11.2, which comprises four protein-coding genes: GATA6, CTAGE1, RBBP8 and CABLES1. A variant burden test suggested that exonic variants of CABLES1 are associated with IA in FC (p = 0.022, corrected) (Supplementary Table S3).

Discussion

In this study, we discovered a new IA-associated region on 3p14.2, which encompasses the FHIT gene, in French-Canadians (Figure 3). Intronic variants in FHIT also showed suggestive association with IA in the Nunavik Inuit population. Additionally, we found evidence suggesting that exonic variants in CCDC80 within the 3q13.2 locus are associated with IA in French-Canadians. Collectively, SNPs in FHIT and CCDC80 could explain approximately 3% of the heritability of IA in French-Canadians, higher than that reported in the Finnish study (2.1%)9 and in the GWAS replication (2.5%)6. The underlying reason might be that French-Canadians are a more homogeneous population and that this study mainly included familial cases. We also replicated a previous GWAS risk locus, 18q11.220; the top SNP, rs35127791, is located approximately 200 kb upstream of GATA6, close to a H3K24Ac region in HUVEC cells. GATA6 regulates the differentiative state of vascular smooth muscle cells and is an important candidate for cardiovascular development. CABLES1 has been shown to have an important role in cancer and in the development of neurons21, and a recent study also highlighted its function in vascular cell senescence and inflammation through p21 regulation22.

Figure 3

Regional association signals of the 3p14.2 locus. A 4 Mb region around the most significant association, rs1554600 in the 3p14.2 locus, is displayed using imputed data. The variant with the most significant association (rs1554600) is indicated by a purple diamond. The purple line indicates the genetic recombination rate (cM/Mb). SNPs in linkage disequilibrium with rs1554600 are shown in a color gradient indicating r2 levels (hg19, 1KGP, Nov 2014, EUR).

Unfortunately, we could not replicate the most significant SNP in FHIT in another IA cohort. However, this was expected, as FHIT variants have higher MAFs in French-Canadians than in other European populations. The signal may exist only in French-Canadian populations and may correlate with IA cases with hypertension. Among the three SNPs that were validated in the FIA cohort, rs76308736 is located in the promoter region of SOX7, which has a crucial function in angiogenesis and in the establishment of arteriovenous identity. However, rs76308736 showed only suggestive significance in our discovery cohort and did not pass multiple-testing correction in the FIA data, so the relationship between this SNP and the risk of IA remains to be explored.

Although the number of cases in this study was limited, we tried to improve power by: 1) targeting IA patients with a family history, 2) focusing only on individuals of French-Canadian ethnicity, and 3) validating the findings in exome sequencing results and in another founder population. Because French-Canadians originated from a small founder population, we also included intermediate-frequency variants after imputation. The top SNPs that we discovered in FHIT and CCDC80 had rare or intermediate frequencies (2–4%). The MAF of rs1554600 was higher in French-Canadians (3.3%) than in Europeans (1.9%) and significantly higher than in East Asians (0.1%), which suggests that a bottleneck and/or drift may underlie the accumulation of low-frequency variants potentially associated with the risk of IA in French-Canadians.

FHIT (fragile histidine triad) is a tumor suppressor gene that regulates DNA replication and signals stress responses23, and it encompasses the most active of the common human chromosomal fragile regions (FRA3B). Its expression has an important role in the response to oxidative damage24, and oxidative stress is known to be a key contributor to IA formation and rupture25. Interestingly, SNPs in FHIT have been associated with hypertensive traits in populations from the Saguenay-Lac-St-Jean region, which are mainly French-Canadian26. As over 40% of our French-Canadian IA cases were also affected by hypertension (Table 4), we considered that rs1554600 in FHIT may be a risk variant for hypertensive IA in French-Canadians. A further test of FHIT SNPs between IA patients with and without hypertension revealed several SNPs in LD with rs1554600 to be significantly associated with this trait (e.g., rs73098963, p = 0.002611, GWAS p value = 2.42 × 10−8). FHIT has also been associated with hypertension in other GWAS: rs6782531, which is located approximately 160 kb upstream of rs1554600 and in high LD with it, has been reported to be significantly associated with blood pressure27.

Table 4 FC sample demographics and the clinical information of IA cases.

The 3q13.2 locus is a gene-rich region, and several of its genes function in inflammatory responses. BTLA (B- and T-lymphocyte attenuator) is involved in inflammatory responses28 and homeostasis of the immune system29. A previous study showed BTLA expression to be up-regulated in organs after a hemorrhagic shock30. The 3q13.2 locus also comprises ATG3, which encodes a protein known to induce apoptosis31 and to act as a regulator of oxidant and inflammatory balance that regulates the endothelial cell stress response32. The top SNP, rs2705520 in ATG3, was also reported in a GWAS to be associated with asthma33, suggesting a role in inflammatory diseases.

The most interesting gene in the 3q13.2 locus is CCDC80, also known as SSG1 (steroid-sensitive gene 1), which is a cGMP signalling effector and was reported to be widely expressed in vascular smooth muscle cells34. A previous study also showed that it acts as a modulator of glucose and energy homeostasis35. Another study highlighted that fibroblast growth factor (FGF) regulates the expression of CCDC8036, which is in turn involved in cell adhesion during fibroblast differentiation37. CCDC80 has also been reported to be a tumor suppressor38. This evidence suggests a critical function for CCDC80 in vascular formation. CCDC80 harbors a large number of rare variants in French-Canadians (Supplementary Table S4) and showed a significant difference in variation burden between cases and controls. Both ATG3 and CCDC80 are dosage-sensitive genes39,40; therefore, differences in the expression levels of these genes may affect the risk of IA.

In conclusion, we have provided evidence for four new loci associated with IA in a French-Canadian IA cohort recruited from Montréal and Québec City, which could explain 3% of the disease heritability. Based on the findings of this study and the functions of their encoded products, two genes (FHIT and CCDC80) are potentially relevant to IA, given the strong aggregation of familial IA cases with high blood pressure in the French-Canadian population. FHIT is more particularly associated with hypertensive IA cases, and this may be the consequence of a bottleneck and/or drift that affected the French-Canadian founder population. CCDC80 was shown to have a large number of rare variants in the French-Canadian cohort, with a significantly different variation burden between IA cases and controls. Both the lack of association of SNPs in FHIT and CCDC80 and the replication of only the 18q11.2 locus among the previous GWAS hits suggest genetic heterogeneity in IA, and thus additional studies targeting other high-risk populations are needed. However, the limited number of cases available in our study calls for a validation study with access to a larger cohort from the same founder population to increase the power of detection.

Methods

Discovery cohort

The discovery cohort included 257 French-Canadian IA patients, the majority of whom had a family history of IA, recruited in Montréal and Québec City, Canada. The diagnoses were confirmed either by magnetic resonance angiography (MRA) or by surgical confirmation (clipping or coiling). An additional 1,992 controls, mainly unrelated FC individuals without cerebrovascular disease, were also included in the analysis. Their demographic information is listed in Table 4. Written informed consent was obtained from all participants, and this manuscript contains no identifying information for any participant. This study was approved by the Comité d'éthique de la recherche du Centre hospitalier de l'Université de Montréal and by the McGill University research ethics board, and all methods were performed in accordance with the relevant guidelines and regulations of McGill University (REB NEU-14-051).

Genotyping and quality control

All patients and controls were genotyped using the Illumina NeuroX SNP-chip, which contains 719,885 markers and is built on the backbone of the Illumina HumanOmniExpress-v24 BeadChip. Raw data were processed with Illumina GenomeStudio software to generate the genotypes. Both markers and samples were passed through a series of quality control (QC) steps. Samples were first removed if they were duplicated or if they had one of the following issues: 1) sex discrepancies; 2) a missing rate exceeding 0.02; 3) ethnic admixture determined by PCA; or 4) cryptic relatedness determined by PLINK. Markers were removed if they met one of the following criteria: 1) a missing rate exceeding 0.02; 2) a minor allele frequency (MAF) lower than 0.01; 3) deviation from Hardy–Weinberg equilibrium (p < 0.0001).
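The three marker-level criteria can be sketched as a single filter applied to per-marker genotype counts. This is an illustration only, with hypothetical function names and example counts. In particular, the Hardy–Weinberg check below uses a simple 1-df chi-square goodness-of-fit test as a stand-in, whereas PLINK's HWE filter is based on an exact test, so borderline markers could be classified differently.

```python
import math


def hwe_chi2_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p value for Hardy-Weinberg equilibrium.

    Simplified stand-in for an exact test; returns 1.0 for monomorphic markers.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2.0))


def marker_passes_qc(n_aa, n_ab, n_bb, n_missing,
                     max_missing=0.02, min_maf=0.01, hwe_p_min=1e-4):
    """Apply the missingness, MAF and HWE thresholds stated in the text."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if called == 0:
        return False
    if n_missing / total > max_missing:   # 1) missing rate exceeds 0.02
        return False
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    if min(freq_a, 1.0 - freq_a) < min_maf:   # 2) MAF lower than 0.01
        return False
    return hwe_chi2_p(n_aa, n_ab, n_bb) >= hwe_p_min   # 3) HWE p < 0.0001


if __name__ == "__main__":
    # Hypothetical marker: 1,945 samples, 10 missing calls.
    print(marker_passes_qc(n_aa=1500, n_ab=400, n_bb=35, n_missing=10))
```

The default thresholds mirror the values given above; the order of the checks does not affect which markers are retained.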

Principal Component Analysis (PCA) implemented in the package EIGENSOFT 6.041 was performed to assess the ethnicity of the samples. Three distinct populations (CEU, CHB and YRI) from the 1000 Genomes Project (1KGP) Phase III were used for clustering, and CEU outliers were removed from further analysis. The subsequent association tests in the remaining homogeneous population were also adjusted for the principal components.

Imputation

Imputation was performed on the Sanger Imputation Server (https://imputation.sanger.ac.uk/) using the Haplotype Reference Consortium r1.141 panel, after pre-phasing with SHAPEIT243. Imputed variants were included in further analyses only if they had MAF >0.01 and an imputation quality score >0.3.

Association analysis

Frequentist additive association implemented in SNPtest13 was used to test for association of the imputed dataset, between FC IA cases and controls. Five major principal components were used as covariates for ancestry adjustment along with the sex of samples. Only autosomal SNPs were analyzed.

Regional associations of suggestive loci were plotted using LocusZoom44 with LD data from the 1KGP CEU population.

Heritability estimation

We estimated the heritability from the original and imputed variants within the most promising loci using the GREML method for estimating the variance explained by SNPs14 and the GREML-LDMS16 method, both implemented in the Genome-wide Complex Trait Analysis (GCTA) package15.

Meta-analysis

SNPs identified in the current study that reached a suggestive level of association (p < 5 × 10−6) were compared with the previous FIA GWAS summary statistics from 2,617 IA cases and 1,416 controls8. METAL was used to conduct a meta-analysis of the two GWAS results for these selected SNPs.
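To make the combination of two studies concrete, the sketch below shows a fixed-effect, inverse-variance-weighted meta-analysis of per-SNP effect estimates. The text does not state which METAL analysis scheme was used (METAL also offers a sample-size-weighted Z-score scheme), so the inverse-variance approach here is an assumption made for illustration, and the function name and input values are hypothetical.

```python
import math


def inverse_variance_meta(estimates):
    """Combine (beta, standard_error) pairs with inverse-variance weights.

    Returns (pooled beta, pooled SE, z score, two-sided p value).
    Illustrative only; assumes all betas refer to the same effect allele.
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


if __name__ == "__main__":
    # Hypothetical log-odds ratios and standard errors from two cohorts.
    beta, se, z, p = inverse_variance_meta([(0.5, 0.1), (0.3, 0.2)])
    print(f"beta = {beta:.3f}, SE = {se:.4f}, z = {z:.2f}, p = {p:.2g}")
```

A SNP counts as validated in the sense used in the Results when the combined p value is smaller than the p value from the discovery cohort alone, which requires the effect directions in the two cohorts to agree.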

FC and Inuit IA WES data

We examined the loci from the GWAS signals that reached a suggestive level of association in the exome sequencing results of 32 selected FC IA cases and 106 FC controls. The Variable Thresholds test (VT)18 implemented in Variant Tools (Vtools)45 and the Sequence Kernel Association Test (SKAT)17 were performed to test the exonic variation burden in the genes located in the GWAS-significant regions.

Thirty-four Nunavik Inuit (Québec, Canada) families comprising 49 IA patients and 124 family controls were also used to follow up the regions of significance. The samples were genotyped on the Illumina HumanOmniExpress-v24 BeadChip, which contains 730,525 SNPs. We examined all the suggestive loci from the FC IA discovery cohort and performed family-based association analysis (fbat), implemented in the FBAT package19, on the Inuit SNP-chip data to test for case-control association in related individuals.

Replication of previous IA GWAS loci

Twelve loci from previous GWAS (Supplementary Figure S3), with 23 SNPs that reached genome-wide significance, were selected to examine whether they could be replicated in our study. In each of the 12 loci, the region from 500 kb upstream of the first genome-wide significant SNP to 500 kb downstream of the last was investigated in our GWAS data. Multiple-testing correction was based on the number of independent tests, defined as the number of LD blocks within these 12 regions. LocusZoom was used for data plotting. The validated loci were further examined in FC WES data using the Genome Analysis Toolkit (GATK)46 and VTools.
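The correction for the number of independent tests is simple arithmetic, shown here as a short sketch. It assumes a family-wise α of 0.05, which is not stated explicitly in the text but reproduces both thresholds reported in the Results (825 LD blocks in the 12 previous GWAS loci and 353 independent tests in the Inuit follow-up). The function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_independent_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_independent_tests <= 0:
        raise ValueError("n_independent_tests must be positive")
    return alpha / n_independent_tests


if __name__ == "__main__":
    ALPHA = 0.05  # assumed family-wise error rate

    # 825 LD blocks across the 12 previously reported loci.
    print(f"{bonferroni_threshold(ALPHA, 825):.2e}")  # 6.06e-05

    # 353 independent tests in the Inuit follow-up.
    print(f"{bonferroni_threshold(ALPHA, 353):.5f}")  # 0.00014
```

Under these thresholds, rs35127791 (p = 5.05 × 10−5) falls below 6.06 × 10−5, whereas rs780365 (p = 0.002839) does not fall below 0.00014, consistent with the conclusions reported in the Results.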