Natural resistance to Meningococcal Disease related to CFH loci: Meta-analysis of genome-wide association studies

Meningococcal disease (MD) remains an important infectious cause of life threatening infection in both industrialized and resource poor countries. Genetic factors influence both occurrence and severity of presentation, but the genes responsible are largely unknown. We performed a genome-wide association study (GWAS) examining 5,440,063 SNPs in 422 Spanish MD patients and 910 controls. We then performed a meta-analysis of the Spanish GWAS with GWAS data from the United Kingdom (combined cohorts: 897 cases and 5,613 controls; 4,898,259 SNPs). The meta-analysis identified strong evidence of association (P-value ≤ 5 × 10−8) in 20 variants located at the CFH gene. SNP rs193053835 showed the most significant protective effect (Odds Ratio (OR) = 0.62, 95% confidence interval (C.I.) = 0.52–0.73; P-value = 9.62 × 10−9). Five other variants had been previously reported to be associated with susceptibility to MD, including the missense SNP rs1065489 (OR = 0.64, 95% C.I.) = 0.55–0.76, P-value = 3.25 × 10−8). Theoretical predictions point to a functional effect of rs1065489, which may be directly responsible for protection against MD. Our study confirms the association of CFH with susceptibility to MD and strengthens the importance of this link in understanding pathogenesis of the disease.

Meningococcal disease (MD), caused by Neisseria meningitidis, is an important cause of meningitis and septicaemia in children and young adults globally; it is associated with average mortality rates of 10% and long-term disability in survivors despite early diagnosis, antibiotic administration, and intensive care 1,2 . Global N. meningitidis serogroup distribution is varied and dynamic, making trends in disease epidemiology unpredictable 3 . Introduction of conjugate vaccines against serogroup C and A have drastically reduced its incidence in several countries 4 . However failure of wide implementation of the available conjugated vaccines against serogroups A, B, C, Y, and W-135 still makes MD a major challenge worldwide 5 .
There is now strong evidence that host genetic factors influence occurrence of MD, and a number of genes controlling susceptibility and severity of MD have been identified in candidate gene association studies 6,7 . Our previous genome-wide association study (GWAS) on a UK population sample, with validation in Western and southern European cohorts, identified genetic variations in the regulation of complement activation, complement factor H (CFH) and complement factor H related 3 (CFHR3) contributing to MD susceptibility 8 .
In order to identify new genetic associations with susceptibility to MD we carried out a GWAS in a cohort of Southern European (Spanish) MD cases and controls, and then undertook meta-analysis combining our South European GWAS with the previously published UK GWAS.

Results and Discussion
Clinical and demographic data of Southern Europe (Spanish) and UK cohorts are summarized in Table 1.
In the Spanish MD cohort, we genotyped a total of 1,488 samples with 561,880 SNPs using the Illumina Quad 660W array, of which 42 samples had call rates < 90% and were excluded. We detected 45 pairs of samples with suspected first-degree relationships. The sample with the lower call rate among each pair of relatives was removed to break any potential first-degree familial relationship in the cohort. In addition, a total of 69 samples were detected as significant ancestry outliers in the principal component analysis (PCA), and were also excluded, leaving a total of 422 MD cases and 910 controls for further imputation and association analyses. We also filtered the genotyped SNPs, removing those with call rate < 95% (n = 1,258), those where the Hardy-Weinberg equilibrium (HWE) test showed significant deviation at P-value < 10 −7 in controls (n = 12,754 SNPs), as well as SNPs not in human genome build 37 (n = 1,475). The final post QC dataset containing 546,393 SNPs was used to impute genotypes based on the 1000 Genomes Project Phase 1 reference data 9 for an additional 6,749,004 SNPs, with impute probability of 90% and filtered with information score ≥ 0.90. We filtered the post-imputed dataset with the same QC criteria, including a minor allele frequency (MAF) filter of ≥ 1%, to have 5,440,063 SNPs in 422 MD cases and 910 controls. A mild degree of genetic stratification was observed (λ GC value = 1.133) when we performed direct single SNP association analyses on the 5,440,063 SNPs, and hence we adjusted for the top 4 genetic axes of population ancestry on the association analysis, which lowered the λ GC value to 1.03, Supplementary Data Figures S1A and S2A.
Details on, and data analysis for the UK GWAS cohort have been previously described 8 . From a starting number of 547 initially genotyped UK MD cases, a total of 72 samples were excluded from the study because they either failed genotyping (n = 6), were discrepant between clinically assigned and genotyped gender (n = 4), had a genotyping call rate below 95% (n = 17), were duplicates (n = 2), or were identified as a population outlier by PCA (n = 43). This left 475 UK MD cases for genetic association analysis. We used genome-wide genotyping data from UK healthy controls: 2,482 members of the 1958 Birth cohort (58BC), and 2,587 healthy blood donors from the National Blood Service (NBS) from the WTCCC2 project (http://www.wtccc.org.uk/ccc2). Out of 5069 controls, 4,703 passed stringent quality control filters and were included in final analysis. For this GWAS, a total of 516,963 SNPs, in common between the cases and controls, passed stringent quality control filters (per-SNP call rate of > 95%, with HWE P > 10 −7 , present in human genome build 37). These SNPs were used to impute genotypes based on the 1000 Genomes Project Phase 1 reference data 9 , for an additional 7,453,462 SNPs; with impute probability of 90% and filtered with information score ≥ 0.90. We also filter the post imputed dataset with the same QC criteria, including a MAF filter of ≥ 1%, to have 5,365,603 SNPs in 475 UK MD cases and 4703 UK population controls.
Ancestry analysis of the UK sample collection showed that the MD cases and controls were well matched (Supplementary Data Figures S1B and S2B). In this light, the 1-degree of freedom (d.f.) score test for association performed in the UK dataset was not further adjusted for axes of genetic ancestry.
Given that this study has a total of 897 cases and 5614 controls, it has 0.969 power to reject the null hypothesis, considering that the risk allele frequency is at least 10%, the effect size (OR) is 0.6, and an uncorrected alpha threshold P-value = 0.05.
We observed minimal genome-wide inflation of the association test statistics when the UK and Spanish MD collections were analyzed separately (Supplementary Data Figure S1A,B), and thus formal meta-analysis for both sample collections was conducted under the fixed effects model using inverse-variance pooling as well as the random effects model, that were compared to show similar significance in P-values.Single SNP association tests from the meta-analysis between the Spanish and UK collections revealed an excess of significant P-values at the tail end of the distribution (Supplementary Data Figure S1C). Observed against a background of minimal genomic inflation, this suggests that at least some of these significant P-values (< 10 −5 ) could represent true associations with MD.
A total of 20 SNPs surpassed genome-wide significance (P-value ≤ 5 × 10 −8 ) in the meta-analysis study of the two European cohorts that jointly include 897 MD cases and 5,613 controls ( Fig. 1; and Supplementary Data Figure S3A,B for Manhattan plots on individual cohorts). A heterogeneity test showed non-significant values, indicating that the association was in the same direction in the two cohorts. The 20 SNPs fall in the CFH gene (located on chromosome 1q32-q32.1) or very close to this gene (rs72482675).
We found the strongest evidence of association with protection to MD susceptibility at SNP rs193053835 (P-value = 9.62 × 10 −9 per-allele OR = 0.62, 95% C.I. = 0.52-0.73) (Fig. 2) Table 2 and Supplementary Dataset S1. Other SNPs with P-values < 1 × 10 −5 are shown in Supplementary Dataset S2. For the purpose of discovery novel candidate loci, we re-scanned the genome after conditioning on the top SNPs (rs1065489 and rs193053835; see Supplementary Data Figure S4), but we did not find other SNPs significantly associated (P-value ≤ 5 × 10 −8 ) with MD.
We performed a stepwise conditional analysis to assess for the presence of independent signals among the 20 candidate SNPs. After conditioning for the top SNPs, the significant P-values observed in other SNPs disappeared (Supplementary Dataset S3). In addition, many of the SNPs were found to have collinearity, indicating that the 20 SNPs are strongly correlated with each other.
Five out of the 20 CFH associated variants were also reported to be associated with MD in Davila et al. 8 , namely, rs742855, rs1065489, rs11582939, rs11799595, and rs10489456. In addition, the SNP rs1065489 was additionally replicated in a Central European cohort 10 . We did not find an association with rs3753394; a SNP located in the promoter region that was previously reported to be associated with MD susceptibility by Haralambous Characteristics Spanish GWAS UK GWAS n a n b n a n b  et al. 11 . Consistently, all the studies point to a protective effect of these variants, with OR values ranging (in our study) from 0.62 to 0.72. Although the SNP rs12085435, recently associated by Bradley et al. 12 with invasive MD, was not genotyped in the present study, we did not find other SNPs associated within the C8B region. In addition, other SNPs recently associated with Age-related Macular Degeneration (AMD) risk located within the CFH region 13 did not appeared as associated in our meta-analysis. Moreover, we did not find genetic variation associated within CFHR3, as previously reported 8 .
The G allele at rs3753396 reaches the highest frequency in East Asia (50%), but it appears at high frequencies also in Europe and America (~18%); in Africa, however, the G variant reaches only 3.7% of the population (Supplementary Data Table S1). This SNP was also reported to be associated with AMD in Ref. 14.
Similarly, the minor allele T at the rs1065489 locus has a high frequency in East Asia (49.7%), while it reaches moderate frequencies in America (19.1%) and Europe (18.3%), and has a low frequency in Africa (3.7%). Of the 20 candidate SNPs in CFH, the rs1065489, located in exon 19, is the only one that represents a nonsynonymous substitution; and it is therefore more likely to be potentially functional. The data indicate that carriers of allele T have a protective effect against MD, with an OR = 0.64 in the merged cohorts (Table 2 and Supplementary Dataset S1). As already noticed in Davila et al. 8 , the incidence of MD inversely parallels the MAF of this SNP. Thus, sub-Saharan African populations have the highest rates of disease but the lowest frequency of this polymorphism (Supplementary Data Figure S5 and Table S1). Supplementary Data Table S2 shows different functional predictions for this variant; while a number of the predictions indicate that this variant is functionally tolerated, Polyphen2 using the database HDIV (recommended when evaluating loci potentially involved in complex phenotypes) indicates that rs1065489 is possibly damaging. Most interesting, the DANN score of this SNP receives a value of 0.985 (ranging from 0 to 1). It has been reported that this score has the best sensitivity and specificity when compared to comparable scores (e.g. CADD and FATHMM). Moreover, it is worth mentioning that this variant is highly phylogenetically conserved in other mammals (Supplementary Data Figure S6). In addition, Bradley et al. 12 recently described an association of this SNP with genetic susceptibility to MD. It is important to note that all the SNPs in the CFH region are in high linkage disequilibrium ( Fig. 2 and Supplementary Data Figure S7), and therefore, there could be other causal SNPs (in high LD with rs1065489) conferring protection against MD not captured in the present GWAs.
The present study adds further support to the association in CFH to the susceptibility of MD previously reported by Davila et al. 8 . Different polymorphisms at the CFH gene were also found to be associated in a number of diseases, including Atypical Hemolytic Uremic Syndrome 15 , or AMD 16 . The CFH gene encodes for the protein complement factor H, which plays an essential role regulating part of the complement system during an immune response. The complement system involves a cascade of proteins that work together in order to opsonize or kill invading pathogens. According to Schneider et al. 17 N. meningitidis exploits the natural regulators (such as FH) to scape from the complement system and therefore from host immune control. The involvement of CFH and CFHR with MD appears to be through their role in inhibition of complement activation. Complement mediated killing is a fundamental mechanism of immunity to meningococcal invasion. Meningococci express a surface factor H binding protein (fHbp), which binds the host plasma protein FH and possibly FH related proteins to the bacterial surface with high affinity. The bacteria thus appears to use a "Trojan horse" strategy to evade complement mediated killing, using the host CFH to inhibit complement activation on the bacterial surface.
The main limitation of the present study is that the study design lacks external replication stage. Further studies would therefore be needed in order to confirm the present findings. In addition, SNP rs3753396 was found to be protective for MD but it has been reported to be a risk factor in AMD. This interesting finding deserves also further investigation.
Summarizing, the fact that different population-based genome-wide studies show consistent statistical association of the same SNPs strengthens the evidence in favor of an association of the CFH gene in MD susceptibility. All of the significant SNPs were identified to reside within a single haplotype, indicating that the SNPs are in LD and correlated with each other. Further evidence from the stepwise conditional analysis demonstrated that the significant P-values seen in each SNPs are not independent after conditioning for the top SNPs. The risk alleles of the 20 genome wide significant SNPs have odds ratio that are similar within the range of 0.62-0.72, which are protective towards MD. We found that the non-synonymous variant rs1065489 may be directly responsible for a protective effect in MD. However, further studies are needed in order to better understand the molecular and functional mechanisms that relate CFH variants and MD and to identify causal variants within the CFH region. Our findings support complement factor H as an important target for future development of therapeutic and preventive strategies against MD.

Material and Methods
Study participants. Diagnosis of MD was made in patients presenting with a characteristic purpuric or petechial rash, and clinical evidence of meningitis and/or septicaemia 6 Table 2. SNPs located at the CFH gene (chromosome 1) statistically associated with MD in the merged ESIGEM-UK cohort. An asterisk indicates those SNPs that were reported previously as associated with MD susceptibility. Position in base pairs (bp) are given according to GRCh37/hg19. A1: risk allele; Fixed-P: fixedeffects meta-analysis P-value; Fixed.OR: fixed-effects meta-analysis odds ratio estimate with reference to the risk allele; in brackets is the 95% confidence interval; Random-P: random-effects meta-analysis P-value; Random-OR: Random-effects meta-analysis odds ratio estimate with reference to the risk allele; in brackets is the 95% confidence interval. Q: P-value for Cochrane's Q statistic; I 2 : I-squared index between sample collections, with values ranging from 0 (no heterogeneity) to 100 (very high heterogeneity). I 2 value < 50 indicates heterogeneity is not significant. Spanish IS: information on info score for SNPs that were imputed in the Spanish GWAS cohort. UK IS: information score for SNPs that were imputed in the UK GWAS cohort. rs1065489 and rs11582939 were directly genotyped.
Scientific RepoRts | 6:35842 | DOI: 10.1038/srep35842 by culture of N. meningitidis from blood or CSF, or by PCR for meningococcal DNA in blood. Individuals in whom microbiological studies were negative, but presented with a characteristic clinical picture of shock, purpuric rash or meningitis, were also included once other pathogenic bacteria or virus infection were ruled out 20 .  [21][22][23] . Each MD case cohort was accompanied by geographically matched healthy controls. Thus, 992 unrelated healthy Spanish ethnically matched controls were recruited at the same time as the MD cases; a subset of this sample has been previously described 24 . UK controls for GWAS have already been described 8 ; in brief, the genotypic data consisted of 4,703 individuals from the Wellcome Trust Case-Control Consortium 2 (WTCCC2) 8 .
The research was performed in accordance with the ICH Harmonized Tripartite Guidelines for Good Clinical Practice, the convention of the Council of Europe on Human Rights and Biomedicine and with the ethical principles laid down in the Declaration of Helsinki. The project was submitted and approved by the ethics committee of each participating site, namely, Ethics Committee of Galicia (Spain) and NRES Committee London (United Kingdon). Parents information sheets and informed consent forms were also submitted and approved. Informed consent (Informed assent when applicable) was obtained from parents/legal guardians of subjects (subjects when applicable) included in the study prior to any study procedure. The study was performed in compliance with National and European normative that is applicable for human research and data protection. Confidentiality of the identity of the participants was also safeguarded.
Genotyping and data quality control. DNA was extracted from blood samples using established laboratory techniques. The MD cases were genotyped using the Illumina Human 660W Quad BeadChips for the Spanish collection, and the Illumina Human 610 K Quad BeadChips for the UK collection following manufacturer instructions.
The Spanish controls were genotyped using the Illumina 660W Quad BeadChip. The UK controls (drawn from the Wellcome Trust Case-Control Consortium 2; WTCCC2) were genotyped using the Illumina HumanHap 1.2M chip. Access to raw data for this control dataset was granted by The Wellcome Trust Consortium Data Access Committee for our study.
After pre-phasing the Spanish and UK GWAS datasets using SHAPEIT 25 , IMPUTE2 was used to perform imputation for additional SNPs genotype using the 1000 Genomes phase 1 (Feb 2012 release) cosmopolitan populations, as reference; Ref. 9. We used cosmopolitan multi-ethnic groups as reference for imputation because there is evidence performed on cosmopolitan panels versus population specific panels demonstrating that in general, the additional unrelated samples help to improve the imputation accuracy, in particular for the low frequency SNPs. Moreover, this procedure does not seem to penalize or reduce the accuracy, even when additional redundant samples are added as reference in the imputation 26 .
A selection of stringent quality control (QC) filters were applied to the genome-wide genotyping data to remove poorly performing SNPs and samples, using tools implemented in PLINK version 1.07 (Ref. 27). SNPs that had > 5% of missing genotypes, gross departure from HWE (test for HWE showing significant deviation at P -value < 10 −7 in controls) or were of MAF below 1% were excluded from downstream analysis. For QC, samples with an overall genotyping call rate of < 95% were excluded from analysis. Most of the SNPs located in the CFHR1 and CFHR3 genes were filtered due to a poor call rate at this region. We therefore used a more relaxed filter (call rate > 80%) in order to retain SNPs at these genes. However, the combined meta-analysis did not revealed better SNP candidates (Supplementary Data Figure S8A,B,C).
Samples were subjected to biological kinship verification by using the principle of variability in allele sharing according to the degree of relationship. Identity-by-state (IBS) information was derived using PLINK. Pairs of samples predicted to be first-degree relatives based on IBS sharing were identified, and the sample with the lower call rate of each pair was excluded from further analysis.
We combined the Spanish or UK samples with the Africans (YRI), East Asians (CHB, CHS, JPT), and samples with European descent (CEU, GBR, FIN, ISB, TSI) of the 1000 genomes project 28,29 , and used their LD-pruned SNPs to perform PCA as implemented in EIGENSTRAT 30 and extracted the values of PC1 and PC2 to plot Supplementary Data Figure S2A,B. We also performed independent PCA on the Spanish GWAS dataset that consisted of 1332 post-QC samples, and extracted the top 4 genetic axes (PC1-PC4) of population ancestry to include as covariates in the SNP association analysis.

Statistical analysis.
A meta-analysis was carried out on the GWAS data from the Spanish and the UK cohort. Statistical analyses for association were undertaken using an additive model logistic regression as implemented in PLINK v1.07; Ref. 27. Principal component (PC) analysis as implemented in EIGENSTRAT 30 was undertaken to account for spurious genetic associations resulting from ancestral differences of individual SNPs. PC plots were performed using the R statistical program package (www.r-project.org/).
For the independent GWAS data from Spain and UK, analysis of association with MD was undertaken using a 1 d.f. logistic regression test, which allows further adjustment of incorporating additional principal components as covariates of population stratification. The combined meta-analysis on the results of both Spanish and UK collections was performed using both the inverse-variance fixed effects method as well as the random effects model. LocusZoom 31 was used to plot the CFHRs genes regional association plot using the combined fixed effect meta-analysis P value of SNPs located in this region. LD in the plot was estimated based on the 1000 genomes EUR population and using the hg19 genome build coordinates. R 2 correlation values and the D′ CI method of Gabriel et al. 32 as implemented in Haploview 33 were used to plot the pairwise LD map between the 20 genome wide significant SNPs within the Spanish GWAS samples or the UK samples.
Functional annotation of gene variants was carried out using ANNOVAR 34 . ANNOVAR obtains from high-throughput sequencing data, different variant functional predictions, such as: (a) SIFT (Sorting Intolerant From Tolerant) 35 that predicts whether an amino acid substitution is likely to affect protein function based on sequence homology and the physico-chemical similarity between the alternate amino acids, (b) PolyPhen2 36 : prediction of functional effects of human nsSNPs, (c) LRT (Likelihood Ratio Test) 37 identifies a subset of deleterious mutations that disrupt highly conserved amino acids within protein-coding sequences, (d) MutationTaster 38 evaluates the disease-causing potential of DNA sequence alterations, (e) MutationAssesor: predicts the functional impact of amino-acid substitutions in proteins, and (f) FATHMM or Functional Analysis Through Hidden Markov Models 39 . CADD (Combined Annotation Dependent Depletion) 40 was also used for scoring the deleteriousness of single nucleotide variants. The algorithm that runs in the DANN score uses the exact same training and annotation data as CADD, but using a different machine learning approach 41,42 .