A replication study of genetic risk loci for ischemic stroke in a Dutch population: a case-control study

We aimed to replicate reported associations of 10 SNPs at eight distinct loci with overall ischemic stroke (IS) and its subtypes in an independent cohort of Dutch IS patients. We included 1,375 IS patients enrolled in a prospective multicenter hospital-based cohort in the Netherlands and 1,533 population-level controls of Dutch descent. We tested these SNPs for association with overall IS and its TOAST-classified subtypes (large artery atherosclerosis, small vessel disease and cardioembolic stroke (CE)) using an additive multivariable logistic regression model adjusted for age and sex. For the risk allele of each SNP we obtained odds ratios (OR) with 95% confidence intervals (95% CI), and exact p-values by permutation. We confirmed 4q25 (PITX2) (OR 1.43; 95% CI, 1.13–1.81; p = 0.029) and 16q22 (ZFHX3) (OR 1.62; 95% CI, 1.26–2.07; p = 0.001) as risk loci for CE. Locus 16q22 was also associated with overall IS (OR 1.24; 95% CI, 1.08–1.42; p = 0.016). Other loci previously associated with IS and/or its subtypes were not confirmed. In conclusion, we validated two loci (4q25, 16q22) associated with CE. In addition, our findings suggest that the association of locus 16q22 may not be limited to CE but may extend to overall IS.

A substantial part of the risk of acute ischemic stroke (IS) is thought to be attributable to common genetic variation 1 . Genome-wide association studies (GWAS) have estimated that the proportion of phenotypic variance of IS explained by common variants ranges from 16% to 40%, depending on subtype 1 . Thus far, GWAS have identified a small number of single nucleotide polymorphisms (SNPs) associated with overall IS or with its subtypes: large artery atherosclerosis (LAA), small vessel disease (SVD) and cardioembolism (CE) 2-11 . These loci have been suggested to be mostly or entirely subtype-specific. Discovery of common variants influencing stroke is hindered by several challenges, including the heterogeneity of the phenotype, the high lifetime risk of stroke, late disease onset, and the limited statistical power of studies performed to date. Replication of putative risk loci in independent cohorts is therefore strongly recommended before initiating fine-mapping efforts to identify causal variants and functional studies to establish their consequences 12 . We aimed to replicate previously reported associations of eight loci with IS and/or its subtypes in an independent set of IS patients drawn from a Dutch cohort.
classified IS subtypes according to the Trial of Org 10172 in Acute Stroke Treatment (TOAST) criteria into LAA, SVD, CE, stroke of other determined cause, and stroke of undetermined cause 14 . We used 1,533 population-level controls of Dutch descent 15 . Information on ancestry in patients and controls was obtained by self-report. The Medical Ethics Committee of the University Medical Center Utrecht approved the study, and all patients provided written informed consent. The research described was conducted in accordance with relevant guidelines and regulations.

Statistical analysis.
We removed individuals with >25% missing genotypes (29 individuals; 7 cases and 22 controls). We tested each SNP for deviation from Hardy-Weinberg equilibrium (p < 0.001) and calculated minor allele frequencies for each SNP in cases and controls. Because the design of the study did not allow principal component analysis to assess ancestral homogeneity, we compared risk allele frequencies with those from the Genome of the Netherlands (GoNL) Project 16 . GoNL provides a comprehensive characterization of genetic variation in 769 individuals of Dutch ancestry, assessed by whole-genome sequencing 16 . Frequencies were calculated in the unrelated subset of GoNL individuals (N = 498). Next, we tested these SNPs for association with IS and its subtypes using an additive logistic regression model, in which each individual carries 0, 1 or 2 copies of the risk allele, adjusted for age and sex. We report odds ratios (OR) with 95% confidence intervals (95% CI) for the risk allele of each SNP as defined in previous studies 2,4 . Exact p-values for the observed associations were obtained by performing 10,000 permutations. To assess whether including samples with missing genotypes affected the results, we performed a sensitivity analysis excluding all individuals with at least one missing genotype. Analyses were performed in PLINK version 1.9b3.38.
Data Availability.
The datasets generated and/or analysed during the current study are available from the corresponding author on reasonable request.
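The additive model and permutation procedure described under Statistical analysis can be sketched in Python as follows. This is a minimal illustration assuming numpy, not the PLINK implementation used in the study. The function names are hypothetical, the OR and 95% CI come from a Wald approximation on the SNP coefficient, and the permutation scheme is one common choice that may differ from PLINK's: risk-allele dosages are shuffled across individuals while each individual's case status, age and sex stay together, and permuted |z| values at least as large as the observed one are counted.

```python
import math
import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return coefficients and standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
        weights = prob * (1.0 - prob)
        information = X.T @ (X * weights[:, None])   # Fisher information matrix
        step = np.linalg.solve(information, X.T @ (y - prob))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
    information = X.T @ (X * (prob * (1.0 - prob))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(information)))
    return beta, se


def additive_snp_test(dosage, case, age, sex):
    """Additive model: dosage = 0, 1 or 2 copies of the risk allele; adjusted for age and sex."""
    X = np.column_stack([np.ones(len(case)), dosage, age, sex]).astype(float)
    beta, se = fit_logistic(X, np.asarray(case, dtype=float))
    b, s = beta[1], se[1]                              # coefficient of the SNP term
    z = b / s
    return {
        "OR": math.exp(b),                             # odds ratio per risk allele
        "CI95": (math.exp(b - 1.96 * s), math.exp(b + 1.96 * s)),
        "z": z,
        "p_wald": math.erfc(abs(z) / math.sqrt(2.0)),  # two-sided normal p-value
    }


def permutation_p(dosage, case, age, sex, n_perm=10_000, seed=0):
    """Empirical p-value from shuffling dosages across individuals: (r + 1) / (n_perm + 1)."""
    rng = np.random.default_rng(seed)
    observed = abs(additive_snp_test(dosage, case, age, sex)["z"])
    r = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(dosage)            # breaks any genotype-phenotype link
        if abs(additive_snp_test(shuffled, case, age, sex)["z"]) >= observed:
            r += 1
    return (r + 1) / (n_perm + 1)
```

Given equal-length arrays of dosages, binary case status, age and sex, `additive_snp_test` returns the per-allele OR with its CI. The default of 10,000 permutations in `permutation_p` mirrors the number used in the study, at a corresponding computational cost.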

Results
After quality control, our data set consisted of 1,368 IS patients (803 (58.7%) men; median age 67.3 years, interquartile range (IQR) 56.5–77.2) and 1,511 controls of Dutch descent (926 (61.3%) men; median age 64.4 years, IQR 58.0–70.3). Baseline characteristics of the patients with IS are presented in Supplementary Table S1. All genotyped SNPs had call rates >98% and were in Hardy-Weinberg equilibrium. Risk allele frequencies of all variants were highly concordant with those reported in the Genome of the Netherlands Project (Table 1) 16 .
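The quality-control steps (removing individuals with >25% missing genotypes, checking call rates, and testing Hardy-Weinberg equilibrium at p < 0.001) can be sketched as below. This assumes a hypothetical genotype matrix with individuals as rows, SNPs as columns, and missing calls coded as -1. For simplicity it uses a 1-df chi-square goodness-of-fit test; PLINK 1.9 uses an exact test for its Hardy-Weinberg filter, so p-values may differ, particularly when genotype counts are small.

```python
import math
import numpy as np

MISSING = -1  # hypothetical coding: 0/1/2 = allele copies, -1 = missing call


def sample_missingness(genotypes):
    """Fraction of missing calls per individual (rows = individuals, columns = SNPs)."""
    return np.mean(genotypes == MISSING, axis=1)


def snp_call_rate(genotypes):
    """Fraction of non-missing calls per SNP."""
    return np.mean(genotypes != MISSING, axis=0)


def hwe_chisq_p(column):
    """1-df chi-square test of Hardy-Weinberg equilibrium for one SNP, ignoring missing calls."""
    called = column[column != MISSING]
    n = called.size
    observed = np.array([np.sum(called == k) for k in (0, 1, 2)], dtype=float)
    q = (observed[1] + 2.0 * observed[2]) / (2.0 * n)   # frequency of the counted allele
    if q <= 0.0 or q >= 1.0:
        return 1.0                                       # monomorphic: nothing to test
    p = 1.0 - q
    expected = n * np.array([p * p, 2.0 * p * q, q * q])
    chi2 = float(np.sum((observed - expected) ** 2 / expected))
    return math.erfc(math.sqrt(chi2 / 2.0))              # upper tail of chi-square, 1 df


def run_qc(genotypes, max_sample_missing=0.25, hwe_alpha=1e-3):
    """Drop individuals with >25% missing calls, then report call rates and HWE p-values."""
    keep = sample_missingness(genotypes) <= max_sample_missing
    kept = genotypes[keep]
    call_rate = snp_call_rate(kept)
    hwe_p = np.array([hwe_chisq_p(kept[:, j]) for j in range(kept.shape[1])])
    return keep, call_rate, hwe_p, hwe_p >= hwe_alpha
```

The last returned array flags SNPs that pass the Hardy-Weinberg filter; call rates are computed only after low-quality individuals have been removed, matching the order of steps described above.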
Results were essentially unchanged when individuals with missing genotypes were excluded (Supplementary Table S2).
Previous studies have consistently demonstrated associations of variants at 4q25 (PITX2) and 16q22 (ZFHX3) with atrial fibrillation, both in patients with and without IS 3,10 , and with cardioembolic stroke 2-4 . At the 4q25 locus, we replicated rs2634074 but not rs2200733, despite moderate linkage disequilibrium between them (r² = 0.51) and an effect size comparable to that established previously 2-4,9 . This discrepancy is likely explained by the difference in power to detect a statistically significant association at each SNP (95% and 56%, respectively), which results from their ~10% difference in allele frequency. After the initial report of the association 10 , other studies found locus 16q22 to be specific for CE 2,4 , whereas our findings point to a possible association with both CE and overall IS. The association with overall IS remained significant after excluding cases with cardioembolic stroke, possibly suggesting a partially shared genetic architecture across stroke subtypes 17 .
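The dependence of power on allele frequency can be illustrated with a standard normal-approximation power calculation for a case-control allelic test, which approximates the additive logistic model. This is a sketch under stated assumptions (a per-allele odds ratio, a two-sided test at a chosen alpha, and hypothetical input values) and is not necessarily the method used to obtain the 95% and 56% estimates.

```python
from statistics import NormalDist


def case_allele_freq(p_controls, odds_ratio):
    """Risk allele frequency in cases implied by a per-allele (allelic) odds ratio."""
    odds = odds_ratio * p_controls / (1.0 - p_controls)
    return odds / (1.0 + odds)


def allelic_power(p_controls, odds_ratio, n_cases, n_controls, alpha=0.05):
    """Approximate two-sided power of a case-control allelic test (normal approximation)."""
    norm = NormalDist()
    p1 = case_allele_freq(p_controls, odds_ratio)
    p0 = p_controls
    a1, a0 = 2 * n_cases, 2 * n_controls             # alleles counted per group
    p_bar = (a1 * p1 + a0 * p0) / (a1 + a0)          # pooled frequency under the null
    se_null = (p_bar * (1 - p_bar) * (1 / a1 + 1 / a0)) ** 0.5
    se_alt = (p1 * (1 - p1) / a1 + p0 * (1 - p0) / a0) ** 0.5
    z_crit = norm.inv_cdf(1 - alpha / 2)
    diff = p1 - p0
    # Probability that the test statistic falls in either rejection region
    return (norm.cdf((diff - z_crit * se_null) / se_alt)
            + norm.cdf((-diff - z_crit * se_null) / se_alt))


if __name__ == "__main__":
    # Hypothetical inputs: same odds ratio and sample sizes, two allele frequencies
    for freq in (0.10, 0.20):
        print(freq, round(allelic_power(freq, 1.4, n_cases=300, n_controls=1511), 2))
```

At a fixed odds ratio and sample size, a risk allele whose frequency is closer to 0.5 yields higher power, which is the mechanism invoked above for two SNPs with comparable effect sizes but different frequencies.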
PITX2, which encodes a transcription factor, has convincingly been implicated in sinoatrial node development and the regulation of cardiac action potentials 18 . Little is known about the role of ZFHX3 in ischemic stroke. Besides atrial fibrillation, this gene has been implicated in the regulation of myogenic and neuronal differentiation and as a tumor suppressor in multiple cancers. Additionally, sequence variants at the locus have been linked to Kawasaki disease 10 . These lines of evidence suggest that the role of this locus in IS may not be restricted to strokes of cardiac origin, which could explain its potential association with overall IS.
Although recent publications have suggested that the implicated variants are likely subtype-specific 2,4 , some genetic overlap between diagnostic IS subtypes has also been reported 17 . Given the repeated discovery of this locus for cardioembolic stroke in large-scale GWAS, it is likely that the association of locus 16q22 with overall IS observed here is driven by a subset of patients assigned to another IS subtype who have undetected atrial fibrillation or an unrecognized cardioembolic source. In addition, significant associations of genetic risk scores for atrial fibrillation with overall IS were recently found to be almost entirely explained by an association with cardioembolic stroke 19 .
Several factors may have prevented us from replicating all associations investigated here. First, our relatively small cohort gave limited power to detect (nominal) associations, particularly for lower-frequency variants or variants with modest effects, a description that fits the vast majority of loci discovered through genome-wide association studies, including stroke loci. It is therefore entirely possible that the non-replicated loci are truly associated with overall IS or its subtypes and would replicate in larger samples. Second, failure to replicate may also reflect phenotypic heterogeneity: subtyping approaches vary across studies, and subtyping is imperfect, as many cases are categorized as 'undetermined', so some cases may be incorrectly subtyped, which in turn reduces power. However, to reduce diagnostic uncertainty we excluded patients with transient ischemic attacks. Despite these limitations, most variants showed effect sizes comparable in magnitude and direction to those reported previously.
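The dilution caused by misclassified cases can be made concrete with a simple mixture calculation. Assuming, hypothetically, that only a fraction of the cases assigned to a subtype truly belong to it and that the remaining cases carry the risk allele at the control frequency, the allele frequency in the case group is a weighted average and the observed odds ratio shrinks toward 1. The same arithmetic describes the scenario discussed for 16q22, in which a subset of patients in another subtype with undetected atrial fibrillation would produce a weaker but nonzero signal in that group. All numbers and helper names below are illustrative.

```python
def freq_from_or(p_controls, odds_ratio):
    """Case allele frequency implied by a per-allele odds ratio (hypothetical helper)."""
    odds = odds_ratio * p_controls / (1.0 - p_controls)
    return odds / (1.0 + odds)


def mixed_case_freq(p_true_cases, p_other, fraction_true):
    """Allele frequency in a case group where only `fraction_true` truly belong to the subtype."""
    return fraction_true * p_true_cases + (1.0 - fraction_true) * p_other


def allelic_odds_ratio(p_cases, p_controls):
    """Allelic odds ratio comparing case and control allele frequencies."""
    return (p_cases / (1.0 - p_cases)) / (p_controls / (1.0 - p_controls))


if __name__ == "__main__":
    p_controls = 0.15        # hypothetical control risk-allele frequency
    true_or = 1.6            # hypothetical odds ratio in correctly classified cases
    p_true = freq_from_or(p_controls, true_or)
    for fraction in (1.0, 0.8, 0.6):
        observed = allelic_odds_ratio(mixed_case_freq(p_true, p_controls, fraction), p_controls)
        print(f"fraction correctly subtyped = {fraction:.1f}: observed OR = {observed:.2f}")
```

Because the observed odds ratio moves toward 1 as the fraction of correctly subtyped cases falls, a fixed sample size has less power to detect the same underlying effect.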
In conclusion, we validated two loci (4q25, 16q22) associated with IS caused by CE. In addition, our study suggests that locus 16q22 may also be associated with overall IS, or with another subtype for which the current study may have lacked power to demonstrate a significant association. Future studies should search for the causal variants underlying these loci by fine-mapping and determine which genes within the loci have functional consequences for disease.