Introduction

Adhesion of Neisseria meningitidis to the human nasopharyngeal epithelium is critical for establishment of invasive meningococcal disease,1 a globally important infection that cannot be effectively controlled due to the lack of a comprehensive vaccine.2 In Europe, 1–6 per 100 000 people contract the disease annually, with a case fatality rate of approximately 8%.3 Asymptomatic carriage is more frequent, however, with meningococci present in approximately 10% of the general population and up to 40% of 15–24 year olds.4 Both colonization and invasion require the specific interaction of Opacity-associated adhesin (Opa) proteins on the meningococcal surface with the carcino-embryonic antigen cell adhesion molecule (CEACAM) family of cell surface proteins on the human nasopharyngeal epithelium.5 Most individuals are colonized asymptomatically but, for incompletely understood reasons, adhesion in some hosts is followed by invasion of the mucosa, entry of meningococci into the bloodstream and meningococcal disease. Most global meningococcal disease over the past half century has been caused by fewer than ten genotypes, known as the hyperinvasive lineages, as defined by multilocus sequence typing.6, 7 As these genotypes are also carried asymptomatically, their acquisition alone cannot explain all cases of invasive disease.

CEACAM proteins are encoded by 7 members of a 29-member gene family (the remainder being pseudogenes), in the q13.2 region of chromosome 19.8 Their usual biological functions range from cell adhesion, facilitation and regulation of signal transduction to a possible role in innate immunity.9, 10, 11 Structurally, these proteins belong to the immunoglobulin superfamily of surface proteins, consisting of an IgV-like N-terminal domain followed by a number of IgC-like conserved domains. In some members of the CEACAM family, dependent on variation in splicing, these are linked to the cell surface by a GPI anchor whereas in others, a transmembrane region and a cytosolic signalling domain are present. The extramembranous loop regions of Opa proteins, which exhibit high levels of diversity,12 interact with the nonglycosylated face of a β-pleated-sheet in the N-terminal domain of CEACAM proteins.5, 13, 14 Meningococcal Opa proteins have been shown to bind to HeLa cells expressing CEACAM1 (formerly known as BGPa or CD66a), CEACAM3 (formerly known as CGM1a or CD66d), CEACAM5 (formerly known as CEA or CD66e) and CEACAM6 (formerly known as NCA or CD66c) and to soluble chimaeric CEACAM1 N-terminal domains.14, 15

Despite their diversity, the majority of meningococcal Opa proteins maintain the ability to bind to at least one member of the CEACAM family. In some receptor-ligand pairs this interaction leads only to attachment, whereas in others uptake into the host cells ensues.5, 13, 14, 15, 16 Upregulation of CEACAM expression in vitro leads to increased uptake of meningococci, even when encapsulated,17 further indicating the importance of CEACAM proteins to the meningococcal life cycle and pathogenesis of meningococcal disease.

Host genetics is thought to contribute one-third of meningococcal disease susceptibility18 but little is known about the population genetic diversity of the CEACAM family and its influence on susceptibility, which may be manifested at the level of interaction between bacterium and host. Genetic diversity in CEACAM may result in particular members of the human population being more easily colonized by meningococci, perhaps placing them at an increased risk of invasive disease. Furthermore, diversity in CEACAM may provide an evolutionary selection pressure driving the high diversity of the Opa proteins12 for specificity to individual CEACAM variants. This study was undertaken to determine whether genetic differences in human CEACAM genes are associated with susceptibility to meningococcal disease.

Results

CEACAM genetic diversity

During curation of data from the initial 94 samples from each of the case and control groups, assays that produced genotypes in Hardy–Weinberg equilibrium (HWE) and that failed in fewer than 15% of these initial samples were accepted. A total of 26 samples (18 case, 8 control) were removed due to their high assay failure rates, possibly reflecting DNA quality. Subsequent removal of assays across the remaining 76 cases and 86 controls left 9 single nucleotide polymorphisms (SNPs) in CEACAM1, 10 in CEACAM3 and 16 in each of CEACAM5 and CEACAM6. Monomorphic (noninformative) SNPs were removed, leaving a total of 38 informative SNPs. CEACAM1 was considerably less polymorphic than the other three genes with 6 of 9 SNPs monomorphic compared to 2 of 10 in CEACAM3, 2 of 16 in CEACAM5 and 3 of 16 in CEACAM6. A total of 13 tag SNPs, 2 in CEACAM1, 3 in CEACAM6 and 4 in each of CEACAM3 and CEACAM5, were chosen. These SNPs were used to investigate the genetic diversity in a final total of 384 cases and 190 controls, reflecting removal during data curation of 75 samples (59 case and 16 control) due to their high assay failure rates, again possibly due to poor sample quality. No significant difference in allele or genotype frequency was detected at any SNP between the final set of case and control samples (full data set available from authors upon request). SNPs in the regions encoding the N-terminal domain, which could directly affect Opa binding, were also analysed. In CEACAM5, the SNP rs3815780 was monomorphic in both cohorts, whereas rs1805223 in CEACAM6 was polymorphic, but no significant differences in the allele frequency between cases and controls were observed.
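The sample and SNP bookkeeping in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only counts stated in the text (the 443 case and 206 control samples come from the Materials and methods); the variable names are illustrative and not part of the original analysis.

```python
# Bookkeeping check using only the counts stated in the text.

# Initial screening set: 94 cases and 94 controls, minus samples with high failure rates.
screen_cases = 94 - 18
screen_controls = 94 - 8

# SNPs remaining after assay curation, and how many of those were monomorphic.
passing = {"CEACAM1": 9, "CEACAM3": 10, "CEACAM5": 16, "CEACAM6": 16}
monomorphic = {"CEACAM1": 6, "CEACAM3": 2, "CEACAM5": 2, "CEACAM6": 3}
informative = {gene: passing[gene] - monomorphic[gene] for gene in passing}

# Tag SNPs chosen per gene.
tag_snps = {"CEACAM1": 2, "CEACAM3": 4, "CEACAM5": 4, "CEACAM6": 3}

# Final set: all recruited samples minus those removed during curation.
final_cases = 443 - 59
final_controls = 206 - 16

print("screening set:", screen_cases, "cases,", screen_controls, "controls")
print("informative SNPs per gene:", informative)
print("total informative SNPs:", sum(informative.values()))
print("total tag SNPs:", sum(tag_snps.values()))
print("final set:", final_cases, "cases,", final_controls, "controls")
```

The totals agree with the figures reported above: 76 cases and 86 controls in the screening set, 38 informative SNPs, 13 tag SNPs, and 384 cases and 190 controls in the final set.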

Linkage disequilibrium (LD) analysis

LD within each gene was analysed using the 38 informative polymorphisms in the initial set of samples (Figure 1). The majority of sites in CEACAM1 and CEACAM3 were in complete LD. In CEACAM5, SNPs in cases exhibited apparently higher LD than in controls, especially 3′ of the SNP rs7249230. The most striking feature of the LD plot for CEACAM6 was the appearance of two distinct high LD blocks (SNPs rs6508996-rs4803507 and rs1971787-rs6508997) with the majority of SNPs in either half of the gene not in linkage disequilibrium with those of the other half. In controls but not in cases, SNP rs6508996 was in higher LD with 6 of the 7 SNPs towards the 3′ end of the CEACAM6 gene and in lower LD with its closest neighbours.

Figure 1

Linkage disequilibrium (LD) plots for SNPs in CEACAM3, 5 and 6. LD plots were constructed using the program MARKER and based on absolute D′ values (a standard measure of LD based on the normalized deviation of the observed allele frequencies from the expected).19 Gene direction is 5′–3′ from the top to the bottom of each map. The level of LD is indicated by color with high LD between sites (absolute D′ of >0.9) indicated by red squares, through intermediate to high LD (absolute D′ of 0.7–0.9) indicated in yellow, intermediate LD (absolute D′ of 0.5–0.7) in gray and low LD (absolute D′ below 0.5) in white. An LD map for CEACAM1 was not shown as only three polymorphisms were detected, which were in complete LD (data not shown). Numbers in the left hand column are major allele frequencies.
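As an illustration of the measure described in this legend, the sketch below computes absolute D′ for a pair of biallelic sites from haplotype and allele frequencies, using the standard Lewontin normalization, and maps the value onto the legend's color bins. It is not the MARKER implementation. The function names, the example frequencies and the treatment of values falling exactly on a bin boundary (0.5, 0.7) are assumptions.

```python
def abs_d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Absolute D' for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and allele B at site 2
    p_a:  frequency of allele A at site 1
    p_b:  frequency of allele B at site 2
    """
    d = p_ab - p_a * p_b  # deviation from the frequency expected under independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        # A monomorphic site: D' is undefined; reported as 0 in this sketch.
        return 0.0
    return abs(d) / d_max


def ld_color(value: float) -> str:
    """Map absolute D' to the color bins of the legend (boundary handling assumed)."""
    if value > 0.9:
        return "red"
    if value >= 0.7:
        return "yellow"
    if value >= 0.5:
        return "gray"
    return "white"


# Hypothetical frequencies, for illustration only.
complete = abs_d_prime(p_ab=0.6, p_a=0.6, p_b=0.6)      # alleles always co-occur
intermediate = abs_d_prime(p_ab=0.4, p_a=0.5, p_b=0.5)
independent = abs_d_prime(p_ab=0.36, p_a=0.6, p_b=0.6)  # p_ab equals p_a * p_b

for label, value in [("complete", complete), ("intermediate", intermediate), ("independent", independent)]:
    print(label, round(value, 3), ld_color(value))
```

The sign of D decides which bound is used for normalization, so that absolute D′ always lies between 0 and 1 whatever the allele frequencies.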

Haplotype analysis

Haplotype structure, phylogenetic relationships and frequencies were analysed and compared in the final case/control set (Figures 2a–d). In CEACAM1, four major haplotypes were observed (Figure 2a), accounting for all case and control samples. The haplotypic diversity of CEACAM1 was dominated by haplotype A, observed in 94.1% of cases and 89.9% of controls. Haplotype D clustered together with the low frequency haplotypes B and C, representing a further 5.3% of cases and 8.9% of controls.

Figure 2

(a–d) Genetic diversity of CEACAM1, 3, 5 and 6 (a–d, respectively). Exons appear as solid red boxes on the gene architecture schematic while untranslated regions are shown as solid blue boxes. All SNPs analysed are indicated: solid green lines were polymorphic sites, whereas dashed lines were monomorphic in our cohort. Diversity at tag SNPs is indicated by red boxes. Phylogenetic relationships, indicating topology only, and case vs control frequencies are shown for haplotypes present at greater than 1% frequency in the sample collection. Groups of haplotypes detected by the chosen sets of tag SNPs are indicated by curved black lines.

In CEACAM3, six major haplotypes were observed (Figure 2b), accounting for 99.4% of cases and 99.8% of controls. Haplotype A clustered separately and accounted for 40.3% of cases and 37.6% of controls. Three other haplotypes, B (23.4% of cases, 22.3% of controls), C (13.4% of cases, 21.5% of controls) and F (20.2% of cases, 17.1% of controls) were also common, whereas haplotypes D and E together accounted for 2.1% of cases and 1.3% of controls.

In CEACAM5, nine major haplotypes were observed (Figure 2c) accounting for 95.8% of cases and 94.7% of controls. Two haplotypes in CEACAM5 accounted for approximately half of the haplotypic diversity in both cases and controls. Haplotype A accounted for 26.0% of cases and 24.5% of controls whereas haplotype E was observed in 26.2% of cases and 27.0% of controls. Phylogenetic analysis revealed that the nine haplotypes were grouped into six subclades clustering into two major clades. The first major clade (haplotypes A–D) accounted for 35.7% of case haplotypes and 31.9% of control haplotypes. The second clade (haplotypes E–I) accounted for the remaining 60.1% of case haplotypes and 62.8% of control haplotypes. This phylogenetic structuring was most likely caused by the approximately mirrored allelic differences towards the 3′ end of the gene after SNP rs7249230, consistent with LD analysis of this gene.

In CEACAM6, 12 major haplotypes (Figure 2d) accounted for 99.6% of cases and 99.9% of controls, with haplotype A accounting for 32% of cases and 33% of controls. Haplotypes were grouped into 4–7 subclades forming two major clades. The clade containing haplotypes A–F accounted for 43.2% of case haplotypes and 43.5% of control haplotypes whereas the clade including haplotypes G–L accounted for the remaining 56.4% of haplotypes in both cohorts. As in CEACAM5, the phylogenetic differences in CEACAM6 were consistent with LD analysis of this gene, reflecting the mirrored allelic diversity in its 3′ half, downstream of, and including SNP rs1971787.

Association of CEACAM diversity with meningococcal disease

The frequency distribution of haplotype C in CEACAM6 was significantly different between cases and controls (χ2 P=0.018) and the effect of carrying this haplotype on meningococcal disease was dose dependent (Pearson's χ2 P=0.017). Possession of this haplotype, which was observed in 7.6% of cases, was associated with an OR of 2.01 with 95% CI of 1.13–3.6 (RR of 1.21 with 95% CI of 1.01–1.46) for increased susceptibility to meningococcal disease. The frequency distributions of haplotype B in CEACAM6 (observed in 3% of cases) and haplotype C in CEACAM3 (observed in 13% of cases) were also significantly different between cases and controls (χ2 both P<0.001) and their effects were also dose dependent (Pearson's χ2 both P<0.001). CEACAM6 haplotype B had an OR of 0.29 with 95% CI of 0.14–0.61 and an RR of 0.57 with 95% CI of 0.36–0.9. CEACAM3 haplotype C had an OR of 0.52 with 95% CI of 0.35–0.75 and an RR of 0.79 with 95% CI of 0.64–0.96. No association between susceptibility to meningococcal disease and genetic diversity at either CEACAM1 or CEACAM5 was detected. The P-values presented are uncorrected. Use of the highly conservative Bonferroni correction would be inappropriate here since this correction assumes no linkage disequilibrium between SNPs.
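For readers unfamiliar with the effect measures quoted above, the following sketch shows one common way to obtain an odds ratio and a relative risk with 95% confidence intervals from a 2×2 table of haplotype carriage against case/control status. The study's calculations were performed in SPSS and the exact method is not stated, so the log-scale (Wald) intervals, the function names and the counts below are all assumptions for illustration; the counts are hypothetical and are not the study data.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Wald confidence interval computed on the log scale.

    a: carrier cases      b: non-carrier cases
    c: carrier controls   d: non-carrier controls
    All four cells must be non-zero.
    """
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


def relative_risk_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Relative risk of being a case for carriers versus non-carriers within the sample."""
    risk_carriers = a / (a + c)
    risk_noncarriers = b / (b + d)
    estimate = risk_carriers / risk_noncarriers
    se = math.sqrt(1 / a - 1 / (a + c) + 1 / b - 1 / (b + d))
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts, NOT taken from the study.
a, b, c, d = 30, 70, 15, 85
or_est, or_lo, or_hi = odds_ratio_ci(a, b, c, d)
rr_est, rr_lo, rr_hi = relative_risk_ci(a, b, c, d)
print(f"OR = {or_est:.2f} (95% CI {or_lo:.2f} to {or_hi:.2f})")
print(f"RR = {rr_est:.2f} (95% CI {rr_lo:.2f} to {rr_hi:.2f})")
```

An interval that excludes 1 corresponds to an association at roughly the 5% level. In a case/control sample the relative risk computed this way depends on the ratio of cases to controls recruited, so the odds ratio is usually the more portable of the two measures.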

Discussion

Here we describe the genotypic diversity of four CEACAM genes in a case/control study cohort of Caucasians to determine whether the diversity in these genes influences susceptibility to meningococcal disease. Three haplotypes in two CEACAM genes were found to affect susceptibility, which may exert their effects through two mechanisms. However, our data suggest that diversity in CEACAM is unlikely to drive the high diversity of the Opa proteins, adhesins that are important in mediating meningococcal pathogenesis.

Genotyping of the CEACAM genes revealed differences in the extent of their diversity relative to each other, each with a limited number of high-frequency haplotypes circulating in the Caucasian population. Based on the SNPs typed in this study, CEACAM1 was less polymorphic than CEACAM3, 5 and 6. This could reflect a necessity to maintain the structure and function of the CEACAM1 protein, and explain why the majority of Opa proteins tested bind to CEACAM1, whilst varying in their abilities to bind to other CEACAM proteins. Meningococcal Opa proteins have been shown to bind to HeLa cells expressing CEACAM1, CEACAM3, CEACAM5 and CEACAM6 and to soluble chimaeric CEACAM1 N-terminal domains.5, 14, 15, 16 Furthermore, the comparatively lower level of polymorphism in CEACAM1 may also have led to its targeting by the adhesins of other bacterial pathogens including other Neisseria species (which also express Opa proteins), UspA1 of Moraxella catarrhalis,20 the P5 proteins of Haemophilus influenzae21 and the Afa/Dr adhesins of Escherichia coli22 during adhesion to human hosts.

Our data indicate that CEACAM diversity contributes little in driving the high diversity of meningococcal opa genes,12 suggesting that immunological selection pressures on these antigens, as on other meningococcal outer membrane proteins including the PorA and PorB porins, are more influential.23, 24 Future functional studies aimed at more fully understanding the role of Opa proteins in meningococcal pathogenesis, together with that of CEACAM-binding adhesins in other bacterial species, may be aided by our observation that the genetic variation in their receptors is low in the Caucasian population.

Haplotype C in CEACAM6, observed in 7.6% of cases, was significantly associated with meningococcal disease, while CEACAM3 haplotype C and CEACAM6 haplotype B were significantly associated with protection against meningococcal disease. The effects of these haplotypes were dose dependent, being amplified in homozygous individuals. Although uncorrected P-values were presented for the reasons given in the results section, these data warrant further investigation in a larger set of samples and of the functional role of these CEACAM haplotypes in susceptibility to meningococcal disease. Phylogenetic and LD analyses revealed striking features of the genetic structure of the CEACAM6 gene, with phylogenetically distinct mirror haplotypes and a block of high LD in its 3′ half indicating a likely recombination breakpoint between SNPs rs4803507 and rs1971787. CEACAM5 also exhibited a similar, but less clear, pattern.

CEACAM diversity may influence human susceptibility to meningococcal disease in two ways. CEACAM proteins on the nasopharyngeal epithelium and on cells of the immune system are recognized as receptors for meningococcal Opa proteins,5 so susceptibility may be influenced by the effects of CEACAM diversity on meningococcal adhesion. Since CEACAM is also expressed on endothelia,15 it may play a role in mediating meningococcal entry into the meninges. In this study, there was no evidence of polymorphism in the Opa binding site, encoded by the first and second exons of each CEACAM gene, affecting disease susceptibility. Polymorphisms in the 5′ extragenic regions upstream of CEACAM6 may, however, be associated with effects on the regulation of gene expression but further functional analyses would be required to test this hypothesis. There is clear evidence, however, that increased surface density of CEACAM proteins leads to increased Opa-mediated uptake of even encapsulated meningococci and expression levels of different CEACAMs on the same cells may also be important in differential mediation of cell signalling.17, 25

An alternative means by which CEACAM-mediated disease susceptibility may arise involves subversion of the human immune response. Interaction of gonococcal Opa proteins with CEACAM1 switches off and prevents proliferation of CD4+ T cells.26 The effect of CEACAM diversity on immunity may not reflect a role in binding Opa, but may indicate an underlying immunological defect, perhaps involving intercellular communication or contact. CEACAM3 and CEACAM6, along with other CEACAM proteins are thought to play a role in innate immunity.10, 27 CEACAM3, which consists only of an amino-terminal IgV-like domain, is exclusively expressed on granulocytes and is involved in opsonin-independent phagocytosis and oxidative killing of human-specific bacterial pathogens.11, 28 Taken in conjunction with our data, this may indicate that individuals with protective haplotypes in CEACAM3 and CEACAM6 have improved innate immunity against the meningococcus, whereas the disease-associated haplotype in CEACAM6 may indicate a defect in innate immunity. The underlying molecular mechanisms are unclear however, and would require further analysis of CEACAM polymorphism and function. Different expression levels of alternatively spliced CEACAM isoforms or alternatively posttranslationally processed CEACAM proteins may also play a role in mediating different signalling pathways and in innate immunity29 and it may be useful in the future to investigate levels of CEACAM3 expression on granulocytes and CEACAM6 on other cells of the immune system.

A number of examples of polymorphisms in cell surface receptor-encoding genes affecting human disease susceptibility to a variety of pathogens have been reported. Promoter polymorphisms in CD209, encoding the C-type lectin DC-SIGN, a major Mycobacterium tuberculosis receptor, are associated with decreased risk of developing tuberculosis.30 These polymorphisms appear to be more common in Eurasian populations than Africans, possibly reflecting a longer history of Eurasian exposure to tuberculosis. Homozygosity of tandem repeats in the CLEC4M gene, also known as CD209L and encoding the protein L-SIGN, is protective in severe acute respiratory syndrome coronavirus infection.31 Homozygotes for a 32 bp deletion in CCR5, an HIV1-coreceptor gene, are highly resistant to the virus.32

Among a range of suggested environmental, host and bacterial factors,33 host genetic polymorphism is thought to contribute approximately a third of the total risk for invasive meningococcal disease.18 Previous studies have detected the influence of polymorphisms in genes involved in both the acquired and innate immune response, the inflammatory response and the coagulation/fibrinolysis pathway on aspects of meningococcal disease including susceptibility, severity and outcome.34 Secondary familial cases of meningococcal disease35 are also known to be associated with genetic polymorphism.18, 36 Our data suggest that haplotype C in CEACAM3 and B and C in CEACAM6 could present additional targets for future investigations assessing familial risk.

Most meningococcal disease in the latter half of the 20th century was caused by a small number of genotypes known as the hyperinvasive lineages.6, 7 In developed countries, these represent a small proportion of the asymptomatically carried meningococcal population, causing endemic disease, localized outbreaks and epidemics. In the developing world, hyperinvasive lineages are responsible for periodic, large-scale epidemics and pandemics. It is tempting to speculate that genetic polymorphisms, perhaps including those detected in this study, contribute to the observed epidemiological differences by increasing the risk of meningococcal disease in particular populations.

Materials and methods

Patients and controls

A total of 387 Caucasian patients were recruited either following admission to the pediatric intensive care unit at St Mary's Hospital, London, United Kingdom between 1992 and 2002, or from a UK meningococcal disease study overseen by the Royal College of Paediatrics and Child Health in which all fatal pediatric meningococcal disease cases between 1 December 1997 and 28 February 1999 were investigated. Confirmation of diagnosis was made by positive meningococcal culture from blood or cerebrospinal fluid (CSF), detection of increased meningococcal antibodies, or by PCR detection of meningococcal DNA in the blood or in CSF. In patients with no microbiological confirmation, meningococcal disease was diagnosed clinically upon presentation with petechial or purpuric rash and fever and features of systemic sepsis or meningitis where no other pathogen could be isolated. A further 56 samples from survivors of meningococcal disease were enrolled via the Meningitis Research Foundation (MRF) charity between 1996 and 1999. The patients' general physician or hospital consultant confirmed the diagnosis. There was no overlap between the three sources of patient recruitment, which together yielded 443 samples. Reliable statistical analysis of the influence of individual polymorphisms and haplotypes on the clinical severity of meningococcal disease was prohibited by the low number of samples from cases for which clinical disease severity scores were known.

A total of 206 DNA samples were extracted from the blood of healthy Caucasian individuals for use as controls, enrolled via both St Mary's Hospital (n=42) and the MRF (n=164). These individuals originated from throughout the United Kingdom and were nonrelated contacts of patients at the time of meningococcal disease onset who had not themselves contracted the disease.

All DNA samples were prepared using established techniques as previously described.37 Ethical approval for the investigation of CEACAM diversity in this sample collection was obtained from Oxfordshire Local Research Ethics Committee A (OXREC number 04.OXA.024) and the St Mary's Local Research Ethics Committee (St Mary's LREC number EC3263) under whose approval the samples were originally collected.

Choice of SNPs

A total of 20 known single nucleotide polymorphisms (SNPs) per gene in the CEACAM 1, 3, 5 and 6 genes, which encode proteins that interact with meningococcal Opa proteins,14, 15 were initially chosen from dbSNP (http://www.ncbi.nlm.nih.gov/SNP/) and ENSEMBL (http://www.ensembl.org/) databases covering an ∼832 kb region of chromosome 19. The SNPs appeared only once in the human genome sequence and first round PCR products gave only single bands when analysed by agarose electrophoresis (data not shown). SNPs in CEACAM4, 7 and 8 were not included due to the lack of evidence of their interaction with meningococcal Opa proteins at the time the study was designed. The possibility that some Opa proteins interact with these cannot, however, be discounted completely since the number of Opa proteins investigated is relatively low. CEACAM1 SNPs were located on average once every 1409 bp. CEACAM3 SNPs were located on average once every 1507 bp. CEACAM5 SNPs were located on average once every 1880 bp. CEACAM6 SNPs were located on average once every 1353 bp. A full list of SNPs analysed in this study is available upon request from the authors.

Genotyping

High-throughput genotyping was performed on the Sequenom MassARRAY system using the Homogeneous MassEXTEND assay as previously described.38 Briefly, oligonucleotide primer pairs (Metabion, Martinsried, Germany) were designed using the Sequenom SpectroDESIGNER software and used to amplify short sequences surrounding the chosen SNPs. Nonincorporated dNTPs were removed from amplicons using arctic shrimp alkaline phosphatase, before a third (universal extension) primer specific to each SNP allowed determination of allelic differences by primer extension reaction in the presence of SNP specific dNTP/ddNTP termination mixes. Allelic differences were determined by mass spectrometry using the Sequenom SpectroPOINT/SpectroCHIP system in conjunction with a Bruker Biflex III Mass Spectrometer. Raw data were autocurated electronically using the Sequenom SpectroTYPER software and manually checked to ensure accuracy and consistency of allele assignments.

Data analysis

Initially, a set of 94 samples from each of the case and control cohorts was investigated to choose informative assays for further haplotype and allele frequency analysis of the remaining cohort. Major and minor allele frequencies and overall genotype frequencies were calculated for all SNPs. HWE at each SNP and linkage disequilibrium within each CEACAM gene were analysed using MARKER (http://www.gmap.net/marker). HWE in cases and controls was tested separately and SNPs deviating from HWE at P<0.001 were excluded. The program SNPHAP (written by David Clayton, University of Cambridge, United Kingdom, accessed through a Pise interface at www.gmap.net) was used to construct and analyse haplotypes occurring at greater than 1% frequency in each CEACAM gene. Phylogenetic trees indicating relationships among haplotypes were constructed using the neighbour-joining algorithm implemented in the NEIGHBOUR program, part of the PHYLIP software package (Felsenstein J. 2005. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle, WA, USA).
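The HWE filter described above can be sketched as a one-degree-of-freedom χ2 goodness-of-fit test on genotype counts at a single SNP. Whether MARKER uses this test or an exact test is not stated here, so the choice of test is an assumption, as are the function name and the example genotype counts, which are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed counts of the three genotypes at one biallelic SNP.
    Returns (chi-square statistic, P-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele B
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Genotype classes with zero expectation (monomorphic SNP) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


HWE_THRESHOLD = 0.001  # exclusion threshold stated in the text

# Hypothetical genotype counts, for illustration only.
in_equilibrium = hwe_chi_square(25, 50, 25)
heterozygote_deficit = hwe_chi_square(40, 20, 40)

for label, (chi2, p_value) in [("in equilibrium", in_equilibrium),
                               ("heterozygote deficit", heterozygote_deficit)]:
    verdict = "excluded" if p_value < HWE_THRESHOLD else "retained"
    print(f"{label}: chi2 = {chi2:.2f}, P = {p_value:.3g}, {verdict}")
```

Following the text, the test would be applied to cases and controls separately.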

In conjunction with manual analysis, the program ENTROPY (http://www.well.ox.ac.uk/~rmott/SNPS) in MARKER (www.gmap.net/marker), with default settings, was used to generate ‘tag’ SNPs that were informative of the haplotypic diversity in each CEACAM gene. These SNPs were used to investigate the genetic diversity of CEACAM genes in the remaining case and control samples and to investigate their associations with meningococcal disease.

The program SPSS was used to perform statistical analyses on the full data set. Allele, genotype and haplotype frequencies were compared between case and control cohorts by χ2 analysis. Pearson χ2 analysis was then used to determine whether any associations were dose dependent. Odds ratio (OR) and relative risk (RR), both with 95% confidence intervals (CI), were calculated to determine the degree of association with meningococcal disease.
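The comparisons described above amount to Pearson χ2 tests on contingency tables. As a minimal sketch (the study itself used SPSS), the code below computes the statistic for any table with non-zero margins and a P-value for the two cases relevant here: a 2×2 carriage table (1 degree of freedom) and a 2×3 table of cases and controls by number of haplotype copies (2 degrees of freedom), one way in which dose dependence might be examined. The function names and all counts are hypothetical.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table: list of rows of observed counts; all row and column totals must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi_square_p_value(chi2: float, df: int) -> float:
    """Upper-tail P-value, closed forms for 1 and 2 degrees of freedom only."""
    if df == 1:
        return math.erfc(math.sqrt(chi2 / 2))
    if df == 2:
        return math.exp(-chi2 / 2)
    raise ValueError("this sketch supports only 1 or 2 degrees of freedom")


# Hypothetical counts, NOT taken from the study.
# Rows: cases, controls. Columns: carriers, non-carriers.
carriage = [[30, 70], [15, 85]]
# Rows: cases, controls. Columns: 0, 1 or 2 copies of the haplotype.
dosage = [[60, 30, 10], [80, 18, 2]]

chi2_carriage, df_carriage = pearson_chi_square(carriage)
chi2_dosage, df_dosage = pearson_chi_square(dosage)
p_carriage = chi_square_p_value(chi2_carriage, df_carriage)
p_dosage = chi_square_p_value(chi2_dosage, df_dosage)

print(f"carriage: chi2 = {chi2_carriage:.2f}, df = {df_carriage}, P = {p_carriage:.3g}")
print(f"dosage:   chi2 = {chi2_dosage:.2f}, df = {df_dosage}, P = {p_dosage:.3g}")
```

The closed-form P-values avoid any dependency beyond the standard library; a statistics package would be needed for tables with more degrees of freedom.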