In view of the population-specific heterogeneity in reported genetic risk factors for Parkinson's disease (PD), we conducted a genome-wide association study (GWAS) in a large sample of PD cases and controls from the Netherlands. After quality control (QC), a total of 514 799 SNPs genotyped in 772 PD cases and 2024 controls were included in our analyses. Direct replication of SNPs within SNCA and BST1 confirmed the association of these two genes with PD in the Netherlands (SNCA, rs2736990: P=1.63 × 10⁻⁵, OR=1.325; BST1, rs12502586: P=1.63 × 10⁻³, OR=1.337). Within SNCA, two independent signals were detected, located in two different linkage disequilibrium (LD) blocks at the 3′ and 5′ ends of the gene. In addition, post-hoc analysis confirmed GAK/DGKQ, HLA and MAPT as PD risk loci in the Dutch population (GAK/DGKQ, rs2242235: P=1.22 × 10⁻⁴, OR=1.51; HLA, rs4248166: P=4.39 × 10⁻⁵, OR=1.36; MAPT, rs3785880: P=1.9 × 10⁻³, OR=1.19).
A genetic contribution to Parkinson's disease (PD) is well recognized,1, 2, 3 with ∼10% of cases carrying mutations that cause rare Mendelian forms of the disease.4 Recently, independent genome-wide association studies (GWAS) established an unequivocal role for common genetic variants in the etiology of PD5, 6, 7, 8, 9, 10, 11 and suggested population-specific genetic heterogeneity, with SNCA, PARK16, BST1 and LRRK2 as risk loci shared across populations6, 7, 8, 10, 11 and MAPT as a European-specific risk locus.6, 7 In addition, variation at the GAK/DGKQ locus,5 the HLA region10 and a locus on chromosome 12q24,11 has been proposed to confer risk for PD in different European populations.
To determine the role of these loci in the Dutch population and to identify new genetic risk factors for PD, we carried out what is, to our knowledge, the first GWAS of PD in the Dutch population.
Subjects and methods
As part of a national collaborative effort, a total of 841 PD patients were recruited from four centers in the Netherlands (Scales for Outcomes in Parkinson's disease, SCOPA, http://www.scopa-propark.eu; the Academic Medical Center Amsterdam, AMC, http://www.amc.uva.nl; the Parkinson Centrum Nijmegen, ParC, http://www.umcn.nl; and the VU University medical centre, VUmc, http://www.vumc.nl). All patients were self-reported Caucasians from the Netherlands. The sample consisted of 533 males and 308 females, with age at onset ranging from 16 to 84 years (mean=57.5 years; standard deviation=12 years). More information about these samples is available at the websites listed above.
Genome-wide genotyping data from 2082 control participants of the Rotterdam Study III,12, 13, 14 (ERGO Young), genotyped with Illumina Human610K beadchips (http://www.illumina.com), were used as our control population. Of these, 912 were males and 1116 females. The mean age was 53.75 years (range 45–95 years).
All 841 PD cases were genotyped at 592 839 unique positions with Illumina Human660W-Quad beadchips. More information about this genotyping platform is available at http://www.illumina.com.
Quality control (QC) procedures
After extensive QC (see Supplementary Material for details), the final data set comprised 2796 fully genotyped samples from the Netherlands (772 cases and 2024 controls), each genotyped at 514 799 unique autosomal SNPs.
Power was estimated with the Quanto software (University of Southern California, http://hydra.usc.edu/gxe). Detectable odds ratios (ORs) were calculated for a range of allele frequencies at P=9.71 × 10⁻⁸, the genome-wide significance threshold after Bonferroni correction for 514 799 tests (0.05/514 799) (Supplementary Figure 1).
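As a rough cross-check on power, the sketch below uses a normal approximation for an allelic (per-chromosome) case–control test at the Bonferroni genome-wide threshold, with the post-QC sample sizes given above. It is an assumption-laden stand-in, not Quanto's calculation: Quanto works in a logistic-regression framework on genotypes, whereas this sketch compares allele frequencies directly, ignores covariates and treats the control allele frequency as the population frequency, so its numbers will differ somewhat. Function names are illustrative only.

```python
import math
from statistics import NormalDist

N_CASES, N_CONTROLS = 772, 2024      # post-QC sample sizes from the text
ALPHA_GW = 0.05 / 514_799            # Bonferroni genome-wide threshold (~9.71e-8)


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)


def allelic_power(p_control: float, odds_ratio: float,
                  n_cases: int = N_CASES, n_controls: int = N_CONTROLS,
                  alpha: float = ALPHA_GW) -> float:
    """Approximate power of a two-sided allelic test (2 chromosomes per person).

    Assumption: normal approximation for a difference of two proportions;
    the negligible contribution of the opposite rejection tail is ignored.
    """
    p_case = case_allele_freq(p_control, odds_ratio)
    se = math.sqrt(p_case * (1 - p_case) / (2 * n_cases)
                   + p_control * (1 - p_control) / (2 * n_controls))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_case - p_control) / se - z_crit)


if __name__ == "__main__":
    for maf in (0.1, 0.2, 0.3, 0.4):
        row = [f"{allelic_power(maf, orr):.2f}" for orr in (1.3, 1.4, 1.5, 1.6)]
        print(f"risk-allele freq {maf:.1f}: power at OR 1.3/1.4/1.5/1.6 = {row}")
```

At a threshold this stringent, power rises steeply with both OR and allele frequency, so small changes in either move a variant from essentially undetectable to detectable.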
For each SNP that passed QC, multivariable logistic regression under an additive model was applied. Covariates were gender, age (age at onset for cases and age at ascertainment for controls) and the first two components from multidimensional scaling (MDS) of pairwise identity-by-state (IBS) distances. Although the genomic inflation factor (based on the median χ² statistic) was low (λ=1.06, Supplementary Figure 3), we applied genomic control to correct for possible residual population substructure.
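The genomic inflation factor and the genomic-control correction can be sketched as follows, assuming each SNP yields a 1-df test statistic (for example, the squared Wald z-statistic of the SNP dosage term). The expected median of a 1-df χ² distribution is (Φ⁻¹(0.75))² ≈ 0.455. Leaving statistics unchanged when λ<1 is a common convention assumed here; the text does not state how that case was handled.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df: (Phi^-1(0.75))^2 ~ 0.4549
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(chi2_stats):
    """Genomic inflation factor: observed median chi-square / expected median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide each 1-df chi-square by lambda (assumption: never inflate when lambda < 1)."""
    lam = max(1.0, genomic_inflation(chi2_stats))
    return lam, [x / lam for x in chi2_stats]


def chi2_1df_pvalue(x):
    """Upper-tail P-value of a 1-df chi-square, i.e. P(|Z| > sqrt(x))."""
    return 2.0 * (1.0 - NormalDist().cdf(math.sqrt(x)))


if __name__ == "__main__":
    z_scores = [0.3, -1.1, 0.7, 2.4, -0.2, 1.5, -0.9]   # hypothetical per-SNP Wald z
    lam, adjusted = genomic_control([z * z for z in z_scores])
    print(f"lambda = {lam:.3f}")
    print([round(chi2_1df_pvalue(x), 4) for x in adjusted])
```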
In a first approach, aimed at determining the role of previously identified PD genetic risk factors in the Dutch population, we examined the results for SNCA (chromosome 4q21),5, 6, 7 the MAPT locus (chromosome 17q21.31),5, 7 LRRK2 (chromosome 12q12),6, 7 PARK16 (chromosome 1q32),6, 7 BST1 (chromosome 4p15),6 the GAK/DGKQ locus (chromosome 4p16),5 the HLA region (chromosome 6p21.3)10 and the chromosome 12q24 locus.11 A total of 30 SNPs from these loci were selected for closer scrutiny.
The LD structure of the associated loci was analyzed with Haploview v.4.1,16 (Broad Institute of MIT and Harvard, www.broad.mit.edu/mpg/haploview/), and haplotype blocks were defined with the D'-based confidence-interval method of Gabriel et al17 as implemented in Haploview. Risk haplotypes were counted, and OR values were plotted with R v.2.7.2 (http://cran.r-project.org/).
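The two LD measures reported throughout (D' and r²) can be computed from two-locus haplotype frequencies as sketched below. This assumes the four haplotype frequencies are already known; Haploview instead estimates them from unphased genotypes by expectation–maximization, which is not shown. The function name is illustrative.

```python
def ld_from_haplotypes(f_AB, f_Ab, f_aB, f_ab):
    """Return (D, |D'|, r^2) for two biallelic SNPs from haplotype frequencies.

    Alleles A/a at SNP 1 and B/b at SNP 2; the four frequencies must sum to 1.
    """
    if abs(f_AB + f_Ab + f_aB + f_ab - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = f_AB + f_Ab                      # allele A frequency at SNP 1
    p_B = f_AB + f_aB                      # allele B frequency at SNP 2
    d = f_AB - p_A * p_B                   # raw disequilibrium coefficient
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical case: the rarer allele A always sits on a B background.
    print(ld_from_haplotypes(0.1, 0.0, 0.3, 0.6))   # D' = 1 but r^2 is only ~0.17
```

The example illustrates why D' and r² can diverge, as for the BST1 pair (D'=0.938, r²=0.280): D' measures whether recombination has separated the alleles, whereas r² also depends on how well the allele frequencies match.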
To test whether the signals detected in different blocks were statistically independent, the most associated SNP in each block was analyzed by logistic regression conditioned on the allele dosage of the most associated SNP in the other block, where applicable.
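A minimal sketch of this conditional test follows, under stated assumptions: a hand-written Newton–Raphson logistic regression on grouped (cell-count) data, additive dosage coding, and none of the study covariates (gender, age, MDS components). The counts are hypothetical, constructed so that risk depends only on SNP 1 while SNP 2 is in LD with it. This is not the software the authors used, and all names are illustrative.

```python
import math


def solve(a, b):
    """Solve a x = b for a small dense system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(cells, n_iter=50, tol=1e-10):
    """Newton-Raphson logistic regression on grouped data.

    cells: iterable of (covariates, n_cases, n_controls); an intercept is added.
    Returns [intercept, beta_1, ..., beta_k] on the log-odds scale.
    """
    cells = list(cells)
    k = len(cells[0][0]) + 1
    beta = [0.0] * k
    for _ in range(n_iter):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for covariates, cases, controls in cells:
            x = [1.0, *covariates]
            n = cases + controls
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for i in range(k):
                score[i] += x[i] * (cases - n * p)          # gradient of log-likelihood
                for j in range(k):
                    info[i][j] += n * p * (1.0 - p) * x[i] * x[j]  # Fisher information
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def toy_cells():
    """Hypothetical counts: odds depend only on SNP 1 dosage; SNP 2 tracks SNP 1 (LD)."""
    cells = []
    for g1 in range(3):
        for g2 in range(3):
            controls = {0: 300, 1: 30, 2: 5}[abs(g1 - g2)]
            cases = controls * 2 ** g1       # odds = 2**g1, i.e. log-OR ln 2 per allele
            cells.append(((g1, g2), cases, controls))
    return cells


if __name__ == "__main__":
    cells = toy_cells()
    marginal = fit_logistic([((g2,), ca, co) for (g1, g2), ca, co in cells])
    joint = fit_logistic(cells)
    print(f"SNP 2 alone:          log-OR = {marginal[1]:.3f}")
    print(f"SNP 2 | SNP 1 dosage: log-OR = {joint[2]:.3f} (SNP 1: {joint[1]:.3f})")
```

Here the SNP 2 effect disappears once SNP 1 dosage is in the model, the pattern interpreted in the text as two SNPs tagging one variant; an effect that persisted after conditioning would instead point to independent signals.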
For SNCA, the population attributable risk (PAR) was calculated as PAR=p(RR−1)/(p(RR−1)+1) × 100, where p is the frequency of the risk allele in the population and RR is the relative risk. Because association in the 3′ and 5′ LD blocks proved to be independent (see Results), the combined PAR was calculated as cPAR=1−(1−PAR(3′ block)) × (1−PAR(5′ block)), with each PAR expressed as a proportion.
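The two PAR formulas translate directly into code. The sketch handles PARs as proportions and assumes, as the text does not state, that an allelic OR stands in for RR (a usual approximation when the disease is rare). The input values in the usage example are hypothetical.

```python
def par(p, rr):
    """Population attributable risk (proportion) for risk-allele frequency p and relative risk rr."""
    excess = p * (rr - 1.0)
    return excess / (excess + 1.0)


def combined_par(pars):
    """Combine PARs of independent risk factors: 1 - prod(1 - PAR_i)."""
    remaining = 1.0
    for value in pars:
        remaining *= 1.0 - value
    return 1.0 - remaining


if __name__ == "__main__":
    # Hypothetical inputs; OR used as an approximation of RR (assumption).
    par_3prime = par(0.40, 1.30)
    par_5prime = par(0.60, 1.25)
    print(f"combined PAR = {100 * combined_par([par_3prime, par_5prime]):.2f}%")
```

Combining multiplicatively, rather than summing, avoids double-counting individuals who carry risk alleles in both blocks, which is why the combined PAR is always smaller than the simple sum.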
For the MAPT locus, alleles were assigned to the H1 and H2 haplotypes using rs1981997 as a haplotype-tagging SNP, because the major (G) and minor (A) alleles of this SNP are fixed on the H1 and H2 haplotypes, respectively.9, 18 To determine which of the associated alleles at the MAPT locus lie on H1 (previously associated with PD19, 20, 21, 22, 23, 24), a two-locus haplotype association analysis of rs1981997 and each SNP in the MAPT region was carried out using PLINK.15
We are aware that the limited sample size of this cohort gives little power to detect any associated locus after correction for 514 799 tests; nevertheless, we carried out a genome-wide analysis to search for PD risk loci specific to the Dutch population. For this purpose, each genotyped SNP was tested for association using the multivariable logistic regression model described above.
After QC, a total of 772 Dutch PD cases and 2024 controls from the Rotterdam Study,12, 13, 14 genotyped at 514 799 unique autosomal SNPs, were included in our analyses (Table 1). To assess the homogeneity of our cohort, IBS distances were calculated. This analysis showed that cases and controls share a common ancestry (Supplementary Figure 1A), consistent with that of the CEU sample (CEPH Europeans from Utah) (Supplementary Figure 1B, C). Power calculations showed that our sample has adequate power to detect variants conferring risk with an OR of 1.4–1.5, depending on the minor allele frequency of the variant (Supplementary Figure 2).
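The IBS-based ancestry check can be sketched as below. The text does not say which software computed the IBS distances or the MDS components; this sketch assumes the PLINK-style definition, DST = (IBS2 + 0.5 × IBS1)/N over SNPs typed in both individuals, with genotypes coded as minor-allele counts. The subsequent MDS step (classical scaling of the distance matrix) is not shown, and the function names are illustrative.

```python
def ibs_distance(g1, g2):
    """IBS distance 1 - DST between two individuals (PLINK-style definition assumed).

    Genotypes are minor-allele counts (0, 1, 2) or None for missing;
    DST = (IBS2 + 0.5 * IBS1) / (number of SNPs typed in both).
    """
    ibs_sum = 0
    n = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue                      # skip SNPs missing in either individual
        ibs_sum += 2 - abs(a - b)         # alleles shared identical by state: 0, 1 or 2
        n += 1
    if n == 0:
        raise ValueError("no SNPs typed in both individuals")
    return 1.0 - ibs_sum / (2 * n)


def distance_matrix(genotypes):
    """Pairwise IBS distances; the usual input to multidimensional scaling."""
    k = len(genotypes)
    return [[ibs_distance(genotypes[i], genotypes[j]) for j in range(k)]
            for i in range(k)]


if __name__ == "__main__":
    toy = [[0, 1, 2, None], [0, 2, 2, 1], [2, 1, 0, 1]]   # hypothetical genotypes
    for row in distance_matrix(toy):
        print([round(d, 3) for d in row])
```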
As our sample is too small to detect, at the significance level required in a GWAS, risk loci with the effect sizes previously described for PD in Europeans, the first objective of this project was to determine the role of previously identified genetic risk factors in the Dutch population. Thus, association results for 30 SNPs in SNCA,5, 6, 7 the MAPT locus,5, 7 LRRK2,6, 7 PARK16,6 BST1,6 GAK/DGKQ,5, 10 the HLA region10 and the chromosome 12q24 locus11 were selected for close scrutiny. One SNP in SNCA and one in BST1 were significantly associated with PD in our population after correction for 30 tests (P<0.00166 in both instances, Table 2). Although none of the other SNPs reached this threshold, a trend toward association was detected at the GAK/DGKQ and MAPT loci (Table 2). Post-hoc analysis of these two loci and of the HLA region provided evidence for a role of all three in the pathogenesis of PD in the Dutch population. No association was found at LRRK2, PARK16 or the chromosome 12q24 locus. The results obtained at each locus are described in more detail below.
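The Bonferroni thresholds used at different points in this study follow from dividing a family-wise α by the number of tests; the P<0.00166 criterion above is 0.05/30, truncated. The sketch assumes α=0.05, which reproduces the 9.71 × 10⁻⁸ genome-wide threshold quoted in the Methods, and checks the two directly replicated SNPs against the candidate-SNP and genome-wide thresholds.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


# P-values quoted in the text for the two directly replicated SNPs.
REPLICATION = {"rs2736990 (SNCA)": 1.63e-5, "rs12502586 (BST1)": 1.63e-3}

THRESHOLDS = {
    "30 candidate SNPs": bonferroni_threshold(30),               # ~1.67e-3
    "2 GAK/DGKQ SNPs": bonferroni_threshold(2),                  # 0.025
    "39 SNPs at 13q31": bonferroni_threshold(39),                # ~1.28e-3
    "514 799 genome-wide SNPs": bonferroni_threshold(514_799),   # ~9.71e-8
}

if __name__ == "__main__":
    for family, threshold in THRESHOLDS.items():
        print(f"{family}: P < {threshold:.3g}")
    cutoff = THRESHOLDS["30 candidate SNPs"]
    for snp, p in REPLICATION.items():
        print(f"{snp}: P={p:.3g}, significant after 30-test correction: {p < cutoff}")
```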
Although the association of SNCA with PD is well established, it remains unclear whether it derives from the 3′ or the 5′ end of the gene.6, 7, 8, 25, 26, 27, 28, 29, 30, 31, 32 To delineate the signal at this locus, we examined the LD structure across the region. This analysis revealed two LD blocks with their boundary in intron 4 of SNCA, as previously described6, 7, 8, 25 (Supplementary Figure 4). Although the strongest signal in SNCA was located in the 3′ block (rs2736990, P=1.63 × 10⁻⁵), closer examination showed that SNPs in the 5′ block were also associated with PD in our population (lowest P=1.78 × 10⁻³, rs356188) (Supplementary Table 1). Using the allele dosage of the most associated SNP in each block as a covariate in logistic regression did not change the results for the most associated SNP in the other block, indicating that the two signals are independent (Supplementary Table 2), in disagreement with previous results.7, 25, 32 The low LD between these two SNPs (r²=0.028, D'=0.388) further supports this conclusion.
Analysis of the risk conferred by each haplotype in these blocks confirmed these results: a single haplotype in the 3′ block and two in the 5′ block conferred the largest risk for PD in our population, with ORs of 1.42, 1.41 and 1.48, respectively (Supplementary Figure 5).
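Haplotype ORs such as these compare chromosomes carrying one haplotype with all other chromosomes. The sketch below computes such an OR with a Woolf (log-scale) 95% confidence interval from a 2×2 table of chromosome counts. It assumes phase is known and counts are integers, whereas Haploview and PLINK work from EM-estimated haplotype frequencies; the counts in the usage example are hypothetical, and only their totals (2 × 772 case and 2 × 2024 control chromosomes) follow from the sample sizes.

```python
import math


def odds_ratio_ci(case_with, case_without, ctrl_with, ctrl_without,
                  z=1.959963984540054):
    """Odds ratio and Woolf 95% CI from a 2x2 table of chromosome counts."""
    if min(case_with, case_without, ctrl_with, ctrl_without) <= 0:
        raise ValueError("all cells must be positive (no continuity correction applied)")
    log_or = math.log(case_with * ctrl_without / (case_without * ctrl_with))
    se = math.sqrt(1 / case_with + 1 / case_without + 1 / ctrl_with + 1 / ctrl_without)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    # Hypothetical haplotype counts: 1544 case and 4048 control chromosomes.
    or_, lo, hi = odds_ratio_ci(400, 1144, 800, 3248)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```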
At BST1, rs12502586 was significantly associated with PD in our population (P=1.63 × 10⁻³, OR=1.337). A closer look at the association results at this locus revealed three SNPs with P-values of the order of 10⁻³ (range 1.63 × 10⁻³ to 3.33 × 10⁻³; Supplementary Table 3). Haplotype analysis of this region showed two LD blocks containing haplotypes highly associated with PD (P=6 × 10⁻⁴ and 4 × 10⁻⁴, respectively). These blocks span from LOC285550 to intron 3 of BST1 and from intron 4 to intron 8 of BST1, and the associated haplotypes confer risk for PD with ORs of 1.24 and 1.36, respectively (Supplementary Figure 6). Logistic regression of the most associated SNP in each block (rs4583752 and rs12502586, respectively), conditioned on the most associated SNP in the other block, markedly reduced the association, indicating that both risk haplotypes tag the same risk variant (Supplementary Table 4). The LD between these two SNPs (r²=0.280, D'=0.938) further supports this conclusion.
Only two of the three SNPs at the GAK/DGKQ locus reported to be associated with PD by Pankratz et al5 were genotyped in our cohort. One of these, rs1564282, the most strongly associated SNP in Pankratz's study, was associated with PD in our population (Table 1). Although this association did not survive our conservative Bonferroni correction for 30 tests, post-hoc analysis revealed 11 SNPs with P<0.05 spanning 260.5 kb (lowest P=1.22 × 10⁻⁴, rs2242235) (Supplementary Table 5, Supplementary Figure 7). Because of this, and because Bonferroni correction is unnecessarily punitive when replicating genes already known to be associated with a trait, we consider these results positive.
The two most associated SNPs in this region (rs2242235 and rs4690296, P=1.22 × 10⁻⁴ and 2.02 × 10⁻⁴, respectively) are located in PCGF3, a gene 78 kb from GAK and 188 kb from DGKQ that encodes a protein of unknown function containing a C3HC4-type RING finger, a protein–protein interaction domain. These two SNPs lie in two different haplotype blocks, spanning intron 1 to intron 8 and intron 8 to intron 9 of PCGF3, respectively; a predicted transcript, LOC100118084, also lies within the first block. Each block contains a single haplotype associated with PD in our population (P=2 × 10⁻⁴ and 7.38 × 10⁻⁵; OR=1.47 and 1.52, respectively; Supplementary Figure 8). The most associated SNPs in the two blocks are in strong LD with each other (r²=0.818, D'=0.916), indicating that they most likely tag the same risk variant in the region. Logistic regression using the allele dosage at the other marker as a covariate confirmed this (Supplementary Table 6).
In the GWAS by Hamza et al,10 only one SNP in the HLA region (rs3129882) exceeded the genome-wide significance level. After in silico replication and meta-analysis of the most significant SNPs, six further SNPs in this region were associated with PD in their population.10 Of these SNPs, only rs3129882 was present on our genotyping platform. Although this SNP was not associated with PD in our population (P=0.13), post-hoc analysis of the region revealed 36 SNPs with P<0.05 (lowest P=4.39 × 10⁻⁵, rs4248166) spanning a total of 440.1 kb (Supplementary Table 7, Supplementary Figure 9).
The four most associated SNPs in this region (P=4.39 × 10⁻⁵ to 7.98 × 10⁻⁵) are located in two consecutive LD blocks. Each of these blocks contains a haplotype associated with PD in our population (P=6.90 × 10⁻⁵ and 6.63 × 10⁻⁵; OR=1.32 and 1.28, respectively; Supplementary Figure 10). The most associated SNPs in the two blocks are in strong LD with each other (r²=0.758, D'=0.984), indicating that they tag the same risk variant. Logistic regression using the allele dosage at the other marker as a covariate confirmed this (Supplementary Table 8).
The next association signal at this locus lies at rs17533090. Although this SNP is 223.46 kb away from the signal described above, it is in moderate LD with it (r²=0.376, D'=0.616), suggesting that both most likely tag the same risk variant. Logistic regression using the allele dosage at rs4248166 as a covariate confirmed this (Supplementary Table 9).
As for GAK/DGKQ, Bonferroni correction is unnecessarily conservative when replicating the MAPT locus, which is known to be associated with PD. A total of 20 SNPs at this locus were nominally associated with PD in our population (lowest P=1.9 × 10⁻³, rs3785880) (Supplementary Table 10). Genotypes at rs1981997 allowed us to distinguish between the H1 and H2 haplotypes at this locus, as the major (G) and minor (A) alleles of this SNP are fixed on H1 and H2, respectively.9, 18 This analysis showed that the frequency of the H1 haplotype in the Dutch population is 77.64%.
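Because the G and A alleles of rs1981997 are fixed on H1 and H2, respectively, the H1 frequency equals the G-allele frequency, and each genotype maps directly to an H1/H2 diplotype. The sketch assumes genotypes are given as two-letter strings on the strand on which G tags H1; the sample in the usage example is hypothetical and contains no study data.

```python
from collections import Counter

# rs1981997 genotype -> MAPT diplotype (G fixed on H1, A fixed on H2).
DIPLOTYPE = {"GG": "H1/H1", "AG": "H1/H2", "GA": "H1/H2", "AA": "H2/H2"}


def h1_frequency(genotypes):
    """H1 haplotype frequency = frequency of the rs1981997 G allele."""
    g_alleles = sum(gt.count("G") for gt in genotypes)
    return g_alleles / (2 * len(genotypes))


def diplotype_counts(genotypes):
    """Tally H1/H1, H1/H2 and H2/H2 carriers."""
    return Counter(DIPLOTYPE[gt] for gt in genotypes)


if __name__ == "__main__":
    sample = ["GG"] * 60 + ["AG"] * 35 + ["AA"] * 5   # hypothetical genotypes
    print(f"H1 frequency: {h1_frequency(sample):.4f}")
    print(diplotype_counts(sample))
```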
Two-locus haplotype analysis of rs1981997 and the most associated SNPs in our data set (P<0.01) revealed medium to high r² values for all SNP pairs (Supplementary Figure 11). Two-locus association analysis showed that, in all instances, positive effect sizes were found for haplotypes carrying the G allele of rs1981997, that is, the H1 haplotype (Supplementary Table 11).
As expected, none of the SNPs tested reached genome-wide significance (Figure 1). The strongest signal was at rs7995973 (P=5.41 × 10⁻⁶, OR=0.72). This SNP lies in a locus containing 39 SNPs nominally associated with PD in our population, eight of which had P<10⁻³ (Supplementary Table 1). This locus is on chromosome 13q31 and contains LOC729479 and SPRY2 (Supplementary Figure 12).
In an attempt to replicate these results, logistic regression models were applied to these 39 SNPs in two recently published GWAS data sets from European populations.5, 7 No association survived correction for 39 tests (Supplementary Table 12).
In this study, we present the results of the first GWAS of PD in the Dutch population. Given the population-specific heterogeneity in reported genetic risk factors for PD,6, 7 the first objective of this project was to determine the role, in our population, of the genetic risk factors identified in previous GWAS in European and Japanese populations.5, 6, 7, 9 This approach confirmed SNCA and BST1 as PD risk loci. Associations of the tested SNPs at the GAK/DGKQ, HLA and MAPT loci did not survive our stringent Bonferroni correction for 30 tests. However, given the strong prior evidence for each of these loci, correcting across all the tests carried out is unnecessarily conservative. For this reason, and because post-hoc analyses revealed 11, 36 and 20 SNPs nominally associated with PD at these loci (GAK/DGKQ, HLA and MAPT, respectively), we are confident that the signals detected represent true associations with PD.
Although a role for SNCA in the risk for PD is well established, it is still unclear whether this association derives from the 3′ or the 5′ end of the gene.6, 7, 8, 25, 26, 27, 28, 29, 30, 31, 32 The data presented here support the hypothesis that two different signals, one in each block, are present in the Dutch population, in contrast to previous results.7, 28, 30, 31, 32 Mueller et al25 also detected two association signals, in the 3′ and 5′ blocks of SNCA. Interestingly, when they stratified their cohort by gender and into young and old PD groups (split at the median age of 57 years), they found additional significant association in the intron 4 region of the 5′ block in the female and young patient subgroups. Stratifying our data by gender and by median age at onset/ascertainment did not reproduce such differences between subgroups (data not shown). These results do not support gender or age heterogeneity for PD at this locus in the Dutch population, but they are consistent with two different variants conferring risk for PD in our population.
Collectively, these results indicate that the PAR attributable to variation at the SNCA locus in the Dutch population is larger than previously described in other European populations,7 because it should be calculated by combining the effects of variation in each independent LD block. When the most associated SNP in each block is considered, the combined PAR at the SNCA locus would be 17.56% (6.39% for the 3′ and 11.97% for the 5′ block).
Several groups have focused their association studies on polymorphisms in the promoter region of SNCA, especially a microsatellite marker ∼10 kb upstream of the gene (REP1, D4S3481). Although there are some conflicting results regarding the association of this marker with PD,33, 34, 35, 36, 37 most reports consistently describe a positive association between the longer REP1 alleles and PD.26, 28, 38, 39 Differences in gene expression caused by this repeat have also been described both in vitro35, 40 and in vivo.41, 42 Genotyping this microsatellite in our population would help determine whether the effect detected in the 5′ block is a residual effect of variation at REP1 due to LD between the associated alleles. However, this could not be done because only SNP genotype data were available for the control population used in this study.
The analyses performed here successfully replicate the association at BST1. Although association at this locus was originally presented as specific to the Japanese population,6, 7 it has recently been replicated in British8 and French11 populations as well. This is therefore the third study implicating BST1 in PD risk in a European population.
Although association at this locus was not detected in the GWAS of a combined cohort of German and North American samples,7 stratifying that data set by country of origin shows that rs12502586 is significantly associated with PD in the German samples but not in the North American ones (P=4.82 × 10⁻³ and 0.179, respectively) (Supplementary Table 13).
Together, these results support the hypothesis that variation in BST1 is a risk factor for PD in Japanese6 and European7, 8, 11 populations (British, French, German and Dutch) and that this effect is diluted in North America.7 A recent report by Hamza et al,10 in which a larger cohort of North American samples was assayed (2000 North American PD cases versus 971 in the previous German/North American study), identified two of the six reported SNPs at this locus as nominally associated with PD. These results further suggest that the association of BST1 is indeed diluted in North America and that large data sets are needed to detect the effect of variation at this locus.
Given the known genetic substructure of European populations (especially between northern and southern European population groups)43, 44 and the fact that the North American population represents an admixture of individuals with northern and southern European ancestry,10, 45 one would expect allele frequency differences between North American and European populations at the most heterogeneous loci. For rs12502586, the frequency of the associated allele is 11.24% and 10.41% in the Dutch and German controls, respectively, but only 9.3% in the North American controls (Supplementary Table 14). Given the small effect size of the associated allele, such changes in allele frequency can translate into differences in statistical power to detect association. It is therefore not surprising that power to detect association at the BST1 locus is reduced in the North American population.
The events that trigger neuronal degeneration in PD remain unknown. Some authors argue that PD may result from inflammatory processes within the nervous system.46, 47, 48 From this perspective, it is interesting that BST1 encodes a molecule that facilitates pre-B-cell growth49 and whose amino acid sequence is 33% similar to that of CD38, a key player in inflammatory processes. Although no autoimmune disease has been found to be a convincing risk factor for PD, a possible pathogenic link between rheumatoid arthritis and PD has been proposed.50, 51, 52 Notably, BST1 expression is enhanced in bone marrow stromal cell lines derived from patients with rheumatoid arthritis.49
Interestingly, another region containing genes implicated in neuroinflammation, the HLA region, has recently been associated with PD.10 In that report, the authors found rs3129882 to be associated with PD in their population and named this locus PARK18. Although rs3129882 is not associated with PD in our population, post-hoc analysis revealed 36 SNPs nominally associated with PD in the Dutch (lowest P=4.39 × 10⁻⁵), confirming, for the first time, the role of this locus in PD etiology.
An association between certain HLA antigens and PD has long been suggested, since postmortem analysis of the substantia nigra (SN) of PD patients revealed the presence of activated microglia.46 In addition, the frequency of various HLA antigens was increased in patients with PD and in patients dying from postencephalitic PD.53, 54, 55 Although PD is clearly not an autoimmune disorder, a localized microglial attack during the course of the disease has been demonstrated in epidemiological,56, 57 animal model58, 59 and cell culture60, 61, 62 studies. Zhang et al63 reported that aggregated α-synuclein activates microglia, which in turn become toxic to cultured dopaminergic neurons.
It is thus hypothesized that, in response to α-synuclein accumulation or a related process, microglial cells become activated and initiate a neuroinflammatory process that ultimately kills the dopaminergic neurons of the SN.
In a recent GWAS restricted to familial PD cases, Pankratz et al5 detected evidence of association with several chromosomal regions, including a locus containing GAK, TMEM175 and DGKQ. Of these genes, GAK is of special interest because it was one of the 137 genes shown to be differentially expressed in PD cases in a gene expression profiling study of parkinsonian SN.64 In our cohort, only two of the three SNPs reported to be associated with PD by Pankratz et al5 were successfully genotyped, and one of them (rs1564282) was associated with PD in our population.
Although this association did not survive our conservative Bonferroni correction, the prior evidence for this hypothesis is strong, as the association has been replicated in three independent studies.8, 10, 11 We would therefore correct, at most, for the SNPs tested within this locus, in which case both would be significantly associated with PD in our population (P<0.025 in both instances, Table 2).
Post-hoc analysis revealed 11 SNPs nominally associated with PD at this locus. Interestingly, the two most associated SNPs are located in PCGF3, which lies 78 kb from GAK and 188 kb from DGKQ and encodes a protein of unknown function containing a C3HC4-type RING finger, a protein–protein interaction domain. Although the function of this protein is still unknown, the gene is expressed in the brain. Haplotype analyses also point to this gene as carrying the risk variants for PD in the Dutch population, suggesting population-specific heterogeneity for PD in this genomic region, with different genes conferring risk in different Caucasian populations. Further genotyping of cases and controls from different populations will help clarify the role of this locus in PD.
No significant association was detected at the MAPT locus after correction for multiple testing. This is probably due to the relatively small sample size, as this locus is a well-known PD risk locus among Europeans.5, 7, 9, 23, 65, 66 If the same reasoning as for GAK/DGKQ is applied, five of the nine tested SNPs would be borderline significant (P-values of the order of 10⁻³), supporting a role for this locus in the development of PD among the Dutch.
In our population, the associated alleles at this locus occur on the H1 haplotype. Genotyping a larger cohort of Dutch PD cases and controls will help confirm these results and define which H1 subhaplotypes confer risk for PD in the Dutch.
Finally, we found no new PD risk loci in the Dutch population. The most associated locus in our cohort was located on chromosome 13q31 and contained LOC729479 and SPRY2. As this signal was not significant after correction for multiple testing and could not be replicated in two other Caucasian GWAS data sets,5, 7 we refrained from drawing any further conclusions. For the same reason, no conclusions can be drawn for other clusters of nominally associated SNPs in regions containing interesting biological candidates for PD that did not reach genome-wide significance. The top 100 results are listed in the Supplementary Material (Supplementary Table 10).
In summary, we have presented the results of the first GWAS of PD in the Dutch population. These data directly confirmed SNCA and BST1 as PD risk genes in this population. In addition, post-hoc analysis confirmed a role for the GAK/DGKQ, HLA and MAPT loci in modulating the risk for PD in this population.
We thank the subjects involved in this study for making this work possible. We also thank the Hersenstichting Nederland (http://www.hersenstichting.nl), the Neuroscience Campus Amsterdam, the section of Medical Genomics (MGA) and the Intramural Research Program of the National Institute on Aging (National Institutes of Health, Department of Health and Human Services; project number Z01 AG000949-04) for partly supporting the work presented here. Finally, we thank the Prinses Beatrix Fonds (http://www.prinsesbeatrixfonds.nl) for sponsoring this work.
Supplementary Information accompanies the paper on the European Journal of Human Genetics website (http://www.nature.com/ejhg).