Introduction

The identification of genes involved in monogenic forms of Alzheimer's disease (AD) has significantly contributed to our knowledge of the disease mechanisms. The causal links between mutations, the functions of the mutated genes (APP, PS1 and PS2) and disease development prompted a pathophysiological hypothesis, which radically changed our understanding of AD: the amyloid cascade hypothesis.1 The systematic association of pathogenic mutations with changes in APP metabolism and, more particularly, a relative overproduction of Aβ42 peptides indicates that this metabolism is at the heart of the disease process (at least in the monogenic forms of the disease). The overproduction of these neurotoxic peptides is supposed to lead to or accentuate neuron-to-neuron propagation of the τ pathology (leading to neuronal death) by an unknown mechanism.2

By analogy, it was expected that the characterization of genetic factors involved in the common forms of AD (that is, lacking classical Mendelian inheritance), the most frequent form of the disease, should also help to better understand the AD physiopathological process. However, the characterization of these genetic factors has encountered significant difficulties. Until 2009, the APOE (apolipoprotein E) gene was the only globally valid genetic determinant of AD to have been unambiguously identified in 15 years of intensive research.3, 4

As with other multifactorial diseases, this systematic inability to detect new genetic determinants has prompted more comprehensive investigations using genome-wide association studies (GWASs). We and others performed five large GWASs in this field and reported that the CLU (clusterin), PICALM (phosphatidylinositol-binding clathrin assembly protein), CR1 (complement component (3b/4b) receptor 1), BIN1 (bridging integrator 1), ABCA7 (ATP-binding cassette, sub-family A, member 7), MS4A (membrane spanning 4A) cluster, EPHA1 (ephrin type-A receptor 1), CD33 (differentiation antigen 33) and CD2AP (CD2-associated protein) genes were associated with AD risk.5, 6, 7, 8, 9 Most of these susceptibility genes have already been systematically replicated in Caucasians in large case–control studies and in families.10

However, our understanding of AD genetics is far from complete, and substantial work remains to be done. In this respect, classical GWAS approaches have an important limitation: the systematic application of a conventional, highly conservative Bonferroni correction means that only the most statistically significant associations are selected (commonly, P<1 × 10−8). This carries the risk of rejecting biologically valid hypotheses on purely statistical grounds, that is, of generating false negatives.

To partly address this limitation, a complementary approach consists of extracting pertinent information from single-nucleotide polymorphisms (SNPs) nominally associated with the risk of developing AD in GWASs by using complex statistical and bioinformatics multiple-SNP analyses, such as a genome-wide haplotype association (GWHA) study.11 We applied this GWHA strategy to AD genetic susceptibility through a three-step approach.

Materials and methods

Population description

The main characteristics of the populations used for the GWHA study are described in the Supplementary Notes and Supplementary Table 1. All AD cases met the criteria for either probable AD (NINCDS-ADRDA, DSM-IV)12 or definite AD (CERAD).13 All elderly controls were screened for dementia using the MMSE or the AD Assessment Scale-cognitive subscale and were determined to be free from dementia at neuropathological examination or had a Braak score of 2.5 or below. All subjects or, in those with substantial cognitive impairment, a caregiver, legal guardian, or other proxy gave written informed consent for participation in this study. The study protocols for all populations were reviewed and approved by the appropriate Institutional review boards of each country.

The main characteristics of the populations (nondemented individuals) used for the Aβ plasma study (3C study, Rotterdam and CHS) are described in the Supplementary Notes and Supplementary Table 2.14, 15, 16

Genotyping

Participants in the French GWA study (including the 3C study) were genotyped using an Illumina 610–quad array (Illumina, San Diego, CA, USA). Quality control and analytical parameters have been described in detail elsewhere.5 The GERAD participants were genotyped using an Illumina 610–quad array (Illumina), a HumanHap550 array or a HumanHap300 array (Illumina). Again, QC and analytical parameters have been described in detail elsewhere.6 The Rotterdam and CHS studies were genotyped using the Affymetrix 500K array (Affymetrix, Santa Clara, CA, USA). Again, QC and analytical parameters have been described in detail elsewhere.7

In the European populations (stage 3), genotyping was performed using Sequenom assays, with the exception of the German population, which was genotyped with a 610K Illumina chip (Illumina). The primer and probe sequences used in the genotyping assays are available upon request. In order to avoid any genotyping bias, cases and controls were randomly mixed during genotyping, and laboratory personnel were blinded to case/control status. The genotyping success rate was at least 95%. Departure from Hardy–Weinberg equilibrium was observed for rs2446581 in the Swedish control and case samples (P=9.1 × 10−7 for the whole population). The Swedish sample was consequently excluded from further analysis, as haplotype analyses of data that do not comply with Hardy–Weinberg equilibrium are likely to yield biased observations.
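As an illustration of the Hardy–Weinberg check described above, the following minimal Python sketch computes a one-degree-of-freedom chi-square test from genotype counts. The source does not state which test for Hardy–Weinberg equilibrium was used, so the chi-square form, the function name hwe_chi2_test and the genotype counts are assumptions made for illustration only.

```python
import math


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) for departure from Hardy-Weinberg equilibrium.

    Assumes a biallelic, polymorphic SNP (both alleles observed).
    Returns (chi2 statistic, P-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts: the first set is in exact equilibrium,
# the second shows a marked heterozygote deficit.
print(hwe_chi2_test(100, 200, 100))
print(hwe_chi2_test(150, 100, 150))
```

With rare alleles, an exact test is often preferred to the chi-square approximation; this sketch only shows the principle.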

Statistical analyses

Missing age or gender data

Any individuals with missing age or gender data were excluded. This gave a maximum of 2025 AD cases and 5328 controls in step 1, 2820 AD cases and 6356 controls in step 2 and 5093 AD cases and 4061 controls in step 3.

Detection of haplotype effects using a sliding-window approach

This approach has been fully described elsewhere.11, 17 Briefly, the search for haplotype effects was carried out by applying a sliding-window approach18, 19 to the French GWA data set for each chromosome. After excluding SNPs not in Hardy–Weinberg equilibrium or with a minor allele frequency (MAF)<0.02, the first step of the strategy was to eliminate part of the redundancy between SNPs by using haplotype-tagging SNPs (htSNPs). For this, the same binning procedure as in ref. 11 was used: within each bin of 10 adjacent SNPs, we identified a minimal set of htSNPs that were able to characterize more than 95% of the inferred haplotypes with an estimated frequency greater than 0.02. Once a bin had been characterized by a set of htSNPs, the same strategy was applied to the bin composed of the next 10 adjacent SNPs. The final set of htSNPs (n=287 956) was then fed into the sliding-window approach.

Given a window of 10 htSNPs, the search for the most informative and parsimonious haplotype configuration in terms of disease prediction was performed for all possible combinations of one to four, not necessarily adjacent, SNPs. We used a strategy based on Akaike's information criterion, which has been previously described for candidate gene haplotype analysis.19, 20 It relies on the stochastic expectation–maximisation module21 in THESIAS software.22 If required, missing genotypes were inferred by multiple imputation.21, 23 In all, 37 330 050 combinations were investigated in our genome scan, and this investigation was conducted using the grid technology developed by the European Grid Infrastructure (http://www.egi.eu).20 This technology enables several thousand computations to be run in parallel on a large number of different CPUs. The sliding-window haplotype approach was developed into a GridHaplo grid package for the EGI grid (http://genecanvas.ecgene.net).11
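To give a sense of the search space within a single window, the following Python sketch enumerates every combination of one to four SNPs among 10 htSNPs and ranks candidate models by Akaike's information criterion. It is an illustration only: the SNP labels, the helper names and the log-likelihood values are hypothetical, and the actual haplotype likelihoods in the study were obtained with THESIAS, which this sketch does not reproduce.

```python
from itertools import combinations
from math import comb


def candidate_combinations(window, max_loci=4):
    """Yield every combination of 1..max_loci SNPs in a window.

    The SNPs in a combination do not need to be adjacent.
    """
    for k in range(1, max_loci + 1):
        yield from combinations(window, k)


def aic(log_likelihood, n_params):
    """Akaike's information criterion (lower values are preferred)."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits):
    """Return the SNP combination with the lowest AIC.

    fits: iterable of (combination, log_likelihood, n_params) tuples,
    where the likelihoods come from an external haplotype model.
    """
    return min(fits, key=lambda fit: aic(fit[1], fit[2]))[0]


window = [f"htSNP{i}" for i in range(1, 11)]  # hypothetical labels
combos = list(candidate_combinations(window))
print(len(combos))                             # 385
print(sum(comb(10, k) for k in range(1, 5)))   # 385 = 10 + 45 + 120 + 210

# Hypothetical fits: the two-SNP model gains too little likelihood
# to offset its two extra parameters, so the one-SNP model is kept.
fits = [(("htSNP1",), -100.0, 2), (("htSNP1", "htSNP2"), -99.5, 4)]
print(best_model(fits))
```

Each window therefore involves 385 candidate configurations, which helps to explain why a genome-wide scan of this kind calls for parallel computation.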

Replication of haplotype effects

Regions with ‘window P-values’ (i) below 10−5 and (ii) 100 times smaller than the smallest single-locus P-value (including all SNPs and not only htSNPs) were analyzed in terms of replication in the GERAD1 data set by using THESIAS software, with systematic adjustment for age and gender.
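The two selection criteria can be written as a simple filter. The sketch below is a hypothetical helper (the function name, argument names and example P-values are not from the source) and reads criterion (ii) as "at least 100 times smaller".

```python
def passes_selection(window_p, min_single_snp_p, p_threshold=1e-5, ratio=100.0):
    """Return True if a window meets both selection criteria.

    (i)  the window P-value is below p_threshold;
    (ii) the window P-value is at least `ratio` times smaller than the
         smallest single-locus P-value in the same region.
    """
    below_threshold = window_p < p_threshold
    much_smaller = window_p * ratio <= min_single_snp_p
    return below_threshold and much_smaller


# Hypothetical P-values for three regions.
print(passes_selection(1e-7, 1e-4))  # True: both criteria met
print(passes_selection(1e-6, 5e-5))  # False: only 50 times smaller
print(passes_selection(1e-4, 1e-1))  # False: window P-value above 1e-5
```

The second criterion keeps only regions where the haplotype carries information beyond what any single SNP already provides.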

Association of FRMD4A SNPs with Aβ plasma concentrations

In each center, Aβ plasma variables were normally distributed. Before analysis, we excluded all samples with values more than two s.d. from the mean in order to avoid potential associations driven by extreme observations. Finally, each quantitative variable was transformed into a z-score (equal to (observed value minus the sample mean), divided by the sample s.d.). The association between the Aβ1−40, Aβ1−42 and Aβ1−42/Aβ1−40 z-scores on one hand and imputed FRMD4A SNPs on the other (see below) was assessed using a general linear model under an additive model adjusted for age, center and gender.
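The preprocessing of the plasma variables can be sketched as follows. This is a minimal illustration with hypothetical values; it assumes that the mean and s.d. are recomputed after exclusion of extreme observations and that values lying exactly two s.d. from the mean are kept, neither of which is specified in the source.

```python
from statistics import mean, stdev


def exclude_outliers(values, n_sd=2.0):
    """Drop observations lying more than n_sd sample s.d. from the mean."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * s]


def z_scores(values):
    """Transform values to z-scores: (value - sample mean) / sample s.d."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical plasma concentrations from one center (arbitrary units).
raw = [10, 11, 12, 13, 14, 15, 16, 17, 18, 60]
kept = exclude_outliers(raw)  # the value 60 is removed
z = z_scores(kept)
print(kept)
print([round(x, 2) for x in z])
```

Standardizing within each center places the three cohorts on a common scale before their effect estimates are combined.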

We used inverse-variance weighting (also known as fixed-effects meta-analysis) to investigate the homogeneity of haplotype effects from one study to another and to provide meta-analysed, age- and gender-adjusted ORs for haplotype effect estimates in the seven studies. A similar strategy was used to provide meta-analysed, age- and gender-adjusted association levels between Aβ plasma z-scores and FRMD4A SNPs.
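A minimal Python sketch of inverse-variance, fixed-effects pooling is given below. The per-study log odds ratios and standard errors are hypothetical, and Cochran's Q is shown only as one common way of examining homogeneity across studies; the source does not specify which heterogeneity statistic was used.

```python
import math


def fixed_effects_meta(estimates, std_errors):
    """Inverse-variance weighted (fixed-effects) meta-analysis.

    estimates:  per-study effect estimates (e.g. log odds ratios)
    std_errors: their standard errors
    Returns (pooled estimate, pooled standard error, Cochran's Q).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    w_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / w_sum
    pooled_se = math.sqrt(1.0 / w_sum)
    # Cochran's Q: compare with a chi-square on (number of studies - 1) d.f.
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, q


def odds_ratio_with_ci(log_or, se, z=1.96):
    """Convert a log odds ratio and its s.e. to an OR with a 95% CI."""
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical log odds ratios and standard errors from two studies.
pooled, pooled_se, q = fixed_effects_meta([0.4, 0.6], [0.2, 0.2])
print(pooled, pooled_se, q)
print(odds_ratio_with_ci(pooled, pooled_se))
```

Studies with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precise data sets.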

Imputation analyses

We imputed SNPs by using MaCH (http://www.sph.umich.edu/csg/abecasis/mach/index.html) and minimac software (http://genome.sph.umich.edu/wiki/minimac). The reference haplotype data were provided by the MaCH website and had been built for the combined Caucasian populations as part of the 1000 Genomes Project. In our data set, all individuals were genotyped on the same platform (the Illumina Human660W-Quad Beadchip) and we used 492 941 observed SNP genotypes that passed quality filters with the following thresholds: genotyping call rate, 98%; Hardy–Weinberg equilibrium test P-value, 1 × 10−6; and MAF, 1%. We first inferred the haplotype combinations of each individual using the ‘phase’ option in the MaCH program and then imputed them with minimac. As minimac is a newly developed software tool, we assessed the correlation between the imputed genetic dosages from minimac and those from the standard MaCH program for SNPs on chromosome 22. The results were very similar (data not shown). Doses for 7 704 555 SNPs with a MAF>0.01 were available from the French GWA data set using the 1000 Genomes data set. We selected 2538 SNPs within the Chromosome 10p13 locus of interest (chr10:13 655 705–14 402 866) and evaluated their associations with AD risk in an additive logistic regression model adjusted for age, gender and disease status. A graphic representation was then generated with Locuszoom software (http://csg.sph.umich.edu/locuszoom/).

In the 3C, Rotterdam and CHS cohorts, we used the genotype data to impute to the 2.5 million non-monomorphic, autosomal SNPs described in HapMap II (CEU population) as described elsewhere.7 We selected 1486 SNPs (MAF>0.01) within the Chromosome 10p13 locus of interest (chr10:13 655 705–14 402 866) and evaluated their associations with Aβ plasma concentrations as described above. Again, a graphic representation was then generated with Locuszoom software (http://csg.sph.umich.edu/locuszoom/).

Results

We developed a three-step approach. In the first step, the French GWA study (EADI1 for European Alzheimer's initiative 1),5 including 2025 AD cases and 5328 controls, was used to select regions with potential haplotype associations with AD. Following a sliding-window approach (see Materials and methods), we applied two a priori criteria to select loci of interest as previously described:11 a level of association with a P-value (i) below 10−5 and (ii) at least 100 times smaller than the smallest single-SNP P-value observed in the corresponding locus. We were able to detect loci already known from previous GWASs to be involved in AD (the APOE, BIN1 and CR1 loci).5, 6, 7, 8, 9 The signal obtained was consistently much stronger than the one observed for each SNP taken separately (Table 1). After exclusion of these loci, we retained 91 regions of interest.

Table 1 Best haplotype combination observed in CR1, BIN1 and APOE and comparison with the association obtained for single-SNP analysis within these loci

In the second step, we sought to replicate these haplotype associations in the GWA database from another consortium involved in the study of AD genetic susceptibility, the GERAD1 consortium,6 including 2820 AD cases and 6356 controls. All 91 regions were available for investigation and 9 of them showed nominal association (P<0.05; Table 2). We decided to further investigate two regions showing the same best haplotypes associated with AD risk in both EADI1 and GERAD1, with similar magnitude and direction of association (Table 2). These two loci, not previously detected in single-SNP GWAS analyses, were located on chromosomes 6p21 and 10p13. The AD-associated haplotypes identified at the 6p21 region were tagged by rs2395760, rs991762 and rs4711652. The 10p13 haplotypes were tagged by rs7081208, rs2446581 and rs17314229. For the Chromosome 6p21 locus, the best association was attributable to the GGT haplotype in both GWASs (OR (odds ratio): 1.53; 95% CI: (1.31–1.79); P=8.1 × 10−8 after adjustment for age and gender when both EADI1 and GERAD1 studies were combined). For the Chromosome 10p13 locus, the highest level of association was attributable to the AAC haplotype (OR: 1.76; 95% CI: (1.44–2.15); P=2.3 × 10−8 after adjustment for age and gender when both EADI1 and GERAD1 studies were combined; Table 2). Additional adjustment for the four main principal components did not modify the results (data not shown).

Table 2 List of the haplotype combinations identified in the French GWA study and replicated at P<0.05 in GERAD

In the third step, the six tagging SNPs were further genotyped in five additional AD case–control studies from Flanders-Belgium (842 cases and 489 controls), Finland (560 cases and 623 controls), Germany (728 cases and 961 controls), Italy (1846 cases and 904 controls) and Spain (1117 cases and 1084 controls) (Supplementary Tables 3 and 4). The same common haplotypes were inferred from the six SNPs in the two loci, with the exception of the AGT haplotype in Chromosome 10p13 in the Italian population (frequency<0.01 in controls) (Supplementary Tables 5 and 6).

In a combined analysis of the five replication data sets, the overall difference in haplotype distribution for the 6p21 locus was not significant between AD cases and controls (P=0.13) and the GGT haplotype was not associated with AD risk (OR: 0.89; 95% CI: (0.74–1.08); P=0.23 after adjustment for age and gender). We therefore considered that the haplotype association in this locus was not confirmed (see Supplementary Table 5).

Conversely, an overall significant difference in haplotype distribution between AD cases and controls was observed at the 10p13 locus in the replication sample (P=4.3 × 10−2 after adjustment for age, gender and country) and the AAC haplotype was associated with increased AD risk in the five data sets meta-analysis (OR: 1.55; 95% CI: (1.19–2.00); P=9.2 × 10−4 after adjustment for age and gender). When the seven data sets (EADI1, GERAD1 and the follow-up studies) were analyzed together, the AAC haplotype had a meta-analysed OR of 1.68 (95% CI: (1.43–1.96); P=1.1 × 10−10 adjusted for age and gender) when compared with the most frequent GGC haplotype, with no evidence of heterogeneity across the seven countries (P=0.92) (Figure 1). Among the whole sample, nine cases were homozygous for the AAC haplotype versus only five controls (OR: 2.85; 95% CI: (0.88–9.76); P=0.09 with Yates correction), which is consistent with a dose-dependent effect of the AAC haplotype.

Figure 1

Haplotypic odds ratios (ORs) for Alzheimer's disease (AD) risk with the AAC haplotype derived from rs7081208, rs2446581 and rs17314229 at Chromosome 10p13 in seven independent European populations.

Importantly, none of the individual SNPs was associated with AD risk in the total sample with a P-value lower than 10−5. The best meta-analysed SNP was rs2446581 (OR: 1.15; 95% CI: (1.08–1.24); P=2.1 × 10−5 after adjustment for age and gender; Supplementary Table 7). The hypothesis that rs2446581 alone could explain the association observed with the 10p13 haplotypes was rejected by a likelihood ratio test (P=9.1 × 10−7).

We also tested whether this haplotype association might be explained by one or more untyped SNPs located near the genotyped SNPs. Genotypes for 7 704 555 SNPs with a MAF>0.01 were imputed from the French GWA data by using the 1000 Genomes data set (http://www.1000genomes.org/). None of the single SNPs imputed at this locus (n=2538) showed stronger evidence of association with AD risk than the haplotypes initially identified (Supplementary Figure 1). The 10p13 haplotype region appeared to be fully included within the FERM domain containing 4A (FRMD4A) gene, as indicated by the linkage disequilibrium map of this locus of interest (Figure 2).

Figure 2

Linkage disequilibrium map in the FRMD4A locus and localization of the region defined by the rs7081208, rs2446581 and rs17314229 at this locus (in bold).

We finally explored how FRMD4A might be involved in the AD process. According to a recent report, the FRMD4A gene product interacts with Arf6.24 As this latter protein was reported to control APP processing,25 we postulated that the FRMD4A locus could be associated with endophenotypes likely to reflect modulation of APP metabolism. We accordingly analyzed the association of the FRMD4A locus with Aβ peptide plasma concentrations in three independent populations of nondemented individuals (n=2579) (Supplementary Table 2) for which imputed FRMD4A SNPs and Aβ1−40 and Aβ1−42 plasma concentrations were available. Strong associations were observed between several FRMD4A SNPs and Aβ1−42/Aβ1−40 (nine SNPs reaching significance after Bonferroni correction, that is, a threshold of P=3.4 × 10−5 for 1486 SNPs; Figure 3). The best signal was obtained for rs7921545 (meta-analysed z-score β coefficient: 0.12; 95% CI: (0.07–0.17); P=5.4 × 10−7) and was homogeneous between the three data sets (P for heterogeneity=3.7 × 10−1, Supplementary Figure 2). Of note, rs2446581 showed nominal association with Aβ1−42/Aβ1−40 (P=3.1 × 10−2, see Supplementary Figure 2). Furthermore, only nominal associations between FRMD4A SNPs and Aβ1−40 or Aβ1−42 were detected (see Supplementary Figure 3).
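The Bonferroni threshold quoted for the 1486 SNPs can be checked with a one-line calculation. The sketch below assumes a family-wise significance level of 0.05, which is not stated explicitly in the source but is consistent with the reported value.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# alpha = 0.05 is an assumption; 1486 is the number of SNPs tested.
threshold = bonferroni_threshold(0.05, 1486)
print(f"{threshold:.1e}")  # 3.4e-05

best_p = 5.4e-7  # P-value reported for rs7921545
print(best_p < threshold)  # True
```

The best signal at rs7921545 therefore lies well below the corrected threshold.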

Figure 3

Association of single-nucleotide polymorphisms (SNPs) in the FRMD4A locus with plasma Aβ1−42/Aβ1−40 level following meta-analyses of z-score β coefficients under an additive model adjusted for age and gender using three independent healthy populations. SNPs in red are nominally associated with Aβ peptide levels. The three markers defining the AAC haplotype associated with Alzheimer's disease (AD) risk are also highlighted.

Discussion

From a specific haplotype-based GWAS approach, we were able to detect a new genetic susceptibility factor for AD that could not be identified through usual GWAS analyses. Owing to the high number of association tests performed in this GWHA study (37 330 050, not all of which were independent), a robust replication strategy was necessary. Our three-step approach was thus particularly conservative and we cannot rule out the possibility that we failed to identify other haplotype-based loci associated with AD risk. Nonetheless, our GWHA immediately identified two potential AD susceptibility loci among which the 10p13 locus showed an AAC haplotype that was strongly and consistently associated with AD risk in seven independent European populations. Interestingly, this locus is included in the large AD linkage region regularly identified on chromosome 10.26

Additional work will be necessary to determine whether the AD risk-haplotype association indicates an interaction between SNPs or whether they tag non-genotyped, functional variants. However, our GWHA study may have overcome several of the inherent limitations of GWAS. As the AAC haplotype is rare (with a mean frequency of 2% in our Caucasian populations), our work suggests that rare variants may be responsible for the signal detected within the FRMD4A gene. This might explain why the locus was not detected (i) in our previous GWA studies based on single-SNP analyses5, 6, 7, 8, 9 and (ii) through imputation, as SNPs with low frequency and/or SNPs within specific regions with low LD are poorly imputed even when using the 1000 Genomes data set (Figure 2). Furthermore, we cannot rule out the possibility that the AAC haplotype tags insertion–deletion variants or copy-number variations, neither of which is captured by imputation. Interestingly, several copy-number variations within the FRMD4A locus have been described (http://projects.tcag.ca/variation/).

Little is known about the function of the protein encoded by FRMD4A in animal cells. It belongs to the FERM superfamily, which includes ubiquitous components of the cytocortex involved in cell structure, transport and signalling functions.27 According to a recent report, the FRMD4A gene product may regulate epithelial polarity by interacting with Arf6 and the PAR complex.24 Interestingly, several other lines of evidence suggest that Arf6 modulates cell polarity in various systems, including neurons. Arf6 reportedly regulates dendritic branching in hippocampal neurons and neurite outgrowth in PC12 cells.28, 29 Finally, Arf6 was recently reported to control APP processing,25 suggesting that FRMD4A could also be implicated in this metabolism. This hypothesis is supported by the observed association of the FRMD4A locus with plasma Aβ1−42/Aβ1−40, which reinforces both the plausibility of the association of this gene with AD risk and its potential implication in a subtle control of APP metabolism. Unfortunately, as the SNPs defining the AAC haplotype were only available by imputation in the Rotterdam and CHS studies, it was not possible to search for an association of the AAC haplotype with plasma Aβ1−42/Aβ1−40. Furthermore, it is important to keep in mind that this result is difficult to interpret in terms of the AD pathophysiological process. First, it is not known whether plasma Aβ peptides reflect a dynamic equilibrium between the brain, CSF and plasma compartments.30, 31, 32, 33, 34 Second, the source of plasma Aβ species is not known and the physiological functions of the Aβ peptides are still not fully understood.35, 36

However, taken as a whole, these data suggest that FRMD4A could be a relevant candidate gene for AD risk and the basis for another possible pathophysiological pathway for AD.