Introduction

The association of age-related macular degeneration (AMD) with variants on chromosome 1 (complement factor H (CFH)), chromosome 6 (CFB; C2), chromosome 10 (LOC387715/ARMS2), and chromosome 19 (C3) has clearly identified the primary role of the complement pathway in disease pathogenesis, and is consistent with the linkage peaks observed in several whole-genome linkage studies.1, 2, 3, 4, 5, 6, 7, 8, 9 Follow-up of whole-genome linkage regions with fine-mapping has met with limited success in many other complex diseases. However, the effect sizes of the identified risk variants for AMD have been dramatically larger than most late-onset disease associations.10, 11, 12 AMD, the leading cause of irreversible blindness in older adults,13 has become a model for the genetics of complex disease. In light of these early successes, we selected 1500 single nucleotide polymorphisms (SNPs) using two different criteria: targeting genes in regions under suggestive linkage peaks from a recent meta-analysis1 and genes selected from the complement pathway not in these regions.

Materials and methods

Sample

The study population consisted of 2053 unrelated Caucasian individuals 60 years of age or older diagnosed based on ocular examination and fundus photography. There were 1228 cases of both dry and neovascular (wet) advanced AMD and 825 controls.14 The mean age was 74 years for controls (54% women) and 78 years for affected individuals (55% women).8 Informed consent was obtained in writing from all participants, and procedures were approved by the appropriate institutional review boards. This is largely the same sample set with the same phenotyping criteria that we described in detail previously.8, 9 Importantly, this sample has been previously confirmed to show no inflation of case–control association statistics because of population substructure.8

SNP selection

We selected a total of 1500 SNPs in complement pathway genes and across regions of chromosomes 1, 2, 3, 4, 6, and 16 based on the Fisher et al1 bin rank of a meta-analysis of previous whole-genome linkage studies. We chose SNPs in and around regions of transcription as described by Wiltshire et al.15 However, we built upon the efficiency of this strategy by only selecting SNPs that tag seven or more other SNPs. This SNP selection routine was conducted by using tagger (http://www.broad.mit.edu/mpg/tagger/) and HapMap data from the CEPH population (phase II, http://www.hapmap.org). We selected SNPs with a minor allele frequency (MAF)>10% and with a minimum r2 of 0.8. Within this set of SNPs were nine SNPs in and around the complement factor I (CFI) region. We chose another 20 SNPs in the region to adequately tag the entire region using the same tagging parameters as above. The 29 total SNPs span a 173 kilobase (kb) region and tag 114 out of the 116 HapMap SNPs in the area that have a MAF above 5%. These 29 tag SNPs provide very good information coverage for the 114 HapMap SNPs having a mean r2=0.966.
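The pairwise tagging criteria described above (MAF>10%, minimum r2 of 0.8) can be illustrated with a minimal greedy sketch. This is not Tagger's actual algorithm; the function names, the SNP labels, and the toy r2 values are hypothetical and serve only to show how a small tag set can capture a larger set of SNPs.

```python
def symmetric(pairs):
    """Expand {(a, b): r2} into a symmetric dict-of-dicts lookup."""
    r2 = {}
    for (a, b), value in pairs.items():
        r2.setdefault(a, {})[b] = value
        r2.setdefault(b, {})[a] = value
    return r2


def greedy_tag(r2, maf, min_maf=0.10, min_r2=0.8):
    """Greedy pairwise tagging (illustrative only, not Tagger itself).

    Repeatedly pick the SNP that captures the most still-untagged SNPs
    at r^2 >= min_r2, until every SNP passing the MAF filter is captured.
    """
    untagged = {s for s in maf if maf[s] > min_maf}
    tags = []
    while untagged:
        def n_captured(s):
            return sum(
                1 for t in untagged
                if t == s or r2.get(s, {}).get(t, 0.0) >= min_r2
            )
        # sorted() makes tie-breaking deterministic
        best = max(sorted(untagged), key=n_captured)
        captured = {
            t for t in untagged
            if t == best or r2.get(best, {}).get(t, 0.0) >= min_r2
        }
        tags.append(best)
        untagged -= captured
    return tags


# Hypothetical toy inputs (not data from this study)
toy_maf = {"snpA": 0.30, "snpB": 0.25, "snpC": 0.40, "snpD": 0.05, "snpE": 0.20}
toy_r2 = symmetric({
    ("snpA", "snpB"): 0.90,
    ("snpA", "snpC"): 0.85,
    ("snpB", "snpC"): 0.70,
    ("snpA", "snpE"): 0.10,
})
tags = greedy_tag(toy_r2, toy_maf)
```

In this toy example, snpD is dropped by the MAF filter, snpA captures snpB and snpC, and snpE must tag itself. A stricter rule, such as keeping only tags that capture seven or more other SNPs, could be layered on by filtering on the size of each captured set.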

Genotyping

The 1500 SNPs were genotyped at the Center for Inherited Disease Research (CIDR) using an Illumina OPA, of which 1409 SNPs passed quality control measures as previously outlined.8 The follow-up genotyping and sequencing were performed at the Broad and National Center for Research Resources (NCRR) Center for Genotyping and Analysis using the Sequenom MassARRAY system for iPLEX assays and the ABI 3700XL sequencing system, respectively.

Sequencing

A novel SNP was discovered by sequencing 85 subjects drawn from our case–control cohort. This SNP, labeled Broad13981263, is an A/G SNP with A as the major allele and G as the minor allele (MAF 7.65%). Broad13981263 is on chromosome 4 just 5′ (113 bp) of CFI's exon 12 according to dbSNP, and is located at the coordinate 110 883 313 base pairs on chromosome 4 according to NCBI build 36.1 (Table 1). Broad13981263 has an r2 of 0.057, 0.006, and 0.003 with rs13117504, rs10033900, and rs11726949, respectively. Because the novel SNP is not highly correlated with our two most associated SNPs, and its MAF is not close to theirs, we are fairly certain that Broad13981263 is not the causal SNP driving the association in the region between 110 787 671 and 110 961 059.

Table 1 New SNP found by sequencing the CFI exonic regions

Analysis

All linkage disequilibrium (LD) calculations were performed with Haploview.16 We conducted single-locus and two-marker haplotype association analysis using logistic regression tests implemented in PLINK.17, 18
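For readers unfamiliar with the r2 measure of LD reported throughout, the following minimal sketch computes it from phased two-SNP haplotype counts. It is an illustration under simplifying assumptions, not Haploview's procedure: Haploview estimates haplotype frequencies from unphased genotypes, whereas this sketch assumes the four haplotype counts are already known. The function name and the example counts are hypothetical.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic SNPs from phased haplotype counts.

    D   = p_AB - p_A * p_B
    r^2 = D^2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical counts: 40 AB, 10 Ab, 10 aB, 40 ab haplotypes
example_r2 = ld_r2(40, 10, 10, 40)
```

With these toy counts, D = 0.4 − 0.25 = 0.15 and the denominator is 0.0625, giving r2 = 0.36.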

Liability estimation

To calculate the percent variance accounted for by any risk alleles, we assume a prevalence of late-stage AMD in this older age group to be 5% and that liability is normally distributed in the population, with a mean of 0 and a variance of 1.
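One common way to carry out such a calculation under the liability threshold model is sketched below. It is an approximation written for illustration and is not necessarily the exact procedure of the cited method: it assumes Hardy-Weinberg proportions, a log-additive (per-allele) odds ratio, and unit residual variance within each genotype class. The function names and the example inputs are assumptions.

```python
import math
from statistics import NormalDist


def genotype_penetrances(p, odds_ratio, prevalence):
    """Genotype frequencies and penetrances for 0, 1, 2 copies of an allele.

    Solves (by bisection) for the baseline log-odds so that the
    frequency-weighted mean penetrance equals the population prevalence.
    """
    freqs = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
    beta = math.log(odds_ratio)

    def mean_risk(intercept):
        return sum(
            f / (1 + math.exp(-(intercept + g * beta)))
            for g, f in enumerate(freqs)
        )

    lo, hi = -30.0, 30.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean_risk(mid) < prevalence:
            lo = mid
        else:
            hi = mid
    intercept = (lo + hi) / 2
    pens = [1 / (1 + math.exp(-(intercept + g * beta))) for g in range(3)]
    return freqs, pens


def liability_variance_explained(p, odds_ratio, prevalence=0.05):
    """Approximate fraction of liability variance explained by one SNP."""
    freqs, pens = genotype_penetrances(p, odds_ratio, prevalence)
    nd = NormalDist()  # standard normal: mean 0, variance 1
    # Mean liability of each genotype relative to the threshold,
    # assuming unit residual variance within genotype classes
    z = [nd.inv_cdf(q) for q in pens]
    mean_z = sum(f * x for f, x in zip(freqs, z))
    var_g = sum(f * (x - mean_z) ** 2 for f, x in zip(freqs, z))
    return var_g / (1 + var_g)


# Illustrative inputs only: allele frequency 0.46, per-allele OR 0.7056
example_fraction = liability_variance_explained(0.46, 0.7056, prevalence=0.05)
```

The bisection step is a design convenience: it avoids needing a closed form for the baseline risk and works for any odds ratio and allele frequency.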

Results

The most significantly associated SNP in the experiment, rs10033900 (P=9.11 × 10−8), resides in the chromosome 4 linkage peak region according to Fisher et al1 and is very close to the genome-wide significance levels put forth by Dudbridge and Gusnanto and Pe’er et al19,20 (Supplementary Table 1 shows the full results of the screen). Several nearby SNPs were also associated with P<0.0005, suggesting this association was not because of a sporadic genotyping artifact. This SNP is 2781 bp upstream of the 3′ UTR of CFI.

Given this compelling result, we decided to genotype a much higher density of SNPs in this region. Our originally associated SNP (rs10033900) remained the most highly associated SNP with a P-value of 6.46 × 10−8 (OR=0.7056 referring to lower-risk C allele) (Table 2). This SNP showed a very high level of genotyping concordance between the Illumina assay and the iPLEX assay designs.
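To make the direction of the reported odds ratio concrete (an OR below 1 for the C allele means C is less frequent in cases than in controls), the following sketch computes an allelic odds ratio with a Woolf-type confidence interval from a 2×2 table of allele counts. This is a simpler stand-in for the logistic regression used in this study, and the counts shown are hypothetical, not data from this cohort.

```python
import math


def allelic_odds_ratio(case_ref, case_alt, ctrl_ref, ctrl_alt, z=1.96):
    """Odds ratio for the 'ref' allele with a Woolf (log-scale) interval.

    Counts are allele counts (two per individual), not genotype counts.
    Returns (odds_ratio, lower_bound, upper_bound).
    """
    odds_ratio = (case_ref * ctrl_alt) / (case_alt * ctrl_ref)
    se_log = math.sqrt(1 / case_ref + 1 / case_alt + 1 / ctrl_ref + 1 / ctrl_alt)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical allele counts: cases 30 ref / 70 alt, controls 50 ref / 50 alt
toy_or, toy_lower, toy_upper = allelic_odds_ratio(30, 70, 50, 50)
```

Here the toy odds ratio is (30 × 50)/(70 × 50) = 3/7, so the ref allele would be described as the lower-risk allele.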

Table 2 29 SNPs tested across PLA2G12A and CFI on chromosome 4

We tested 29 SNPs across this region for association, conditioning on our most associated SNP, rs10033900, and observed no significant independent associations. We did, however, observe modest residual association at two neighboring, highly correlated SNPs (Table 2). This result suggests that rs10033900 may not be the causal variant but may be highly correlated with it. Therefore, we applied multimarker haplotype tests in an attempt to refine and isolate the association signal. We tested the two-marker haplotypes formed by rs10033900 and each of its two closest SNPs, one 5′ (rs13117504) and one 3′ (rs11726949). The two-marker haplotype between rs13117504 and rs10033900 shows a somewhat stronger association with AMD than either SNP alone, with a P-value of 1.18 × 10−8 (Table 3). None of these three SNPs appears to be functional, although rs11726949 is in intron 11 of CFI. We also tested for differences in association between the neovascular (‘wet’) and geographic atrophy (‘dry’) forms of advanced AMD and found only a 0.2% difference in MAF (46%) between the two groups.

Table 3 Two-marker haplotype association results for most significant SNPs in CFI region

We conservatively defined the span of LD that encompassed the SNPs of interest. All other HapMap SNPs in this region are correlated with rs10033900 and rs13117504 at an r2>0.35. Multimarker haplotype testing across this region did not show any association stronger than that observed for the two-marker haplotype of rs13117504 and rs10033900. We then sequenced all of the exons in this region to determine whether an obvious functional variant exists that explains this association. This block of LD spans the last two exonic regions at the 3′ end of CFI and all four exons of phospholipase A(2) group 12A (PLA2G12A). We found no SNPs in either gene transcript that could statistically explain the association observed at rs10033900. We did find a novel SNP just 5′ of exon 12 in CFI, but this SNP does not appear to be in high r2 with our associated SNP or haplotype and is therefore unlikely to be the biological source of the association.

We next evaluated the role of epistasis between rs10033900 and rs13117504 and the six variants previously established to be associated with AMD.2, 3, 4, 5, 6, 7, 8, 9 Specifically, two variants at CFH, two variants at the CFB/C2 locus, one at the LOC387715/HTRA1 locus, and one at the C3 locus were established as unequivocally associated to AMD risk in this cohort – the typing and analysis of which was described in Maller et al.8, 9 Using logistic regression, we observed no statistically significant interaction terms between any pair of these SNPs. Although weak interactions cannot be excluded, this result suggests that despite targeting the same pathway, these variants largely confer risk in an independent, log-additive fashion.

Given the independent action of this new variant, we were able to add it to the multilocus model from Maller et al.9 We estimate that this variant accounts for approximately 1% of the population variance of liability.21

Discussion

We identify one more complement pathway gene implicated in AMD pathogenesis. Although the complement pathway has been extensively studied, we are only recently learning about its relationship with AMD. The CFI gene spans 63 kb and contains 13 exons, the first 8 of which encode the heavy chain and the last 5 the light chain, which contains the serine protease domain.22 This serine protease domain is responsible for cleaving and inactivating C4b and C3b.23 C3b inactivation by CFI is regulated by CFH. CFH acts as a cofactor for CFI-mediated cleavage of C3b and also has decay accelerating activity against the alternative pathway C3 convertase, C3bBb. Membrane cofactor protein (or CD46) also acts as a cofactor for CFI-mediated cleavage of C3b, thereby downregulating the complement cascade.24

Although it would be a remarkable coincidence if CFI were not the associated gene, in the absence of a proven causal variant, we cannot formally exclude PLA2G12A from consideration as the source of this association. Secreted PLA2G12A, expressed by the corneal epithelium, could be involved in the normal antibacterial activity in tears and wound healing.25 PLA2G12A has been shown to have bactericidal activity against the Gram-negative bacteria Escherichia coli and Helicobacter pylori in vitro.26 PLA2G12A might also participate in helper T-cell immune response by the release of immediate second signals and generation of downstream eicosanoids.27

With this result we have found two SNPs and their combined haplotype that achieve genome-wide significance.19,20 It is technically possible that the two intergenic SNPs could form a functional haplotype. However, a more likely explanation is that these SNPs tag an undiscovered, biologically relevant, structural variant. The next step will be to more comprehensively evaluate this region with the aim of uncovering this variation.