Introduction

Kawasaki disease (KD) is an acute systemic vasculitis of young children characterized by mucous membrane inflammation, cervical lymphadenopathy, rash and fever.1 It mainly affects children younger than 5 years of age, with onset peaking at around 6 months. Although it is usually a self-limited disorder, serious and sometimes life-threatening complications related to coronary artery aneurysms can develop in 15–20% of untreated patients.2 The disease is the leading cause of acquired heart disease in Japan and also in the US.3 Since Kawasaki1 described the disease in 1967, extensive efforts have been made to identify its etiology, which nevertheless remains elusive. Genetic factors are suspected to influence susceptibility to an unknown hypothetical pathogen(s) on the basis of two epidemiological observations. First, siblings and offspring of KD patients are at higher risk for the disease.4, 5 Second, marked ethnic differences in KD susceptibility exist, with an increased prevalence of KD among Japanese Americans in Hawaii.6 The human genome project has provided us with new tools for examining genetic susceptibility to disease. Almost 200 000 well-characterized single-nucleotide polymorphisms (SNPs) across the entire genome of the Japanese population are now available (IMS JSNP database, http://snp.ims.u-tokyo.ac.jp/). Genome-wide microsatellite information is also available (UniSTS, http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unists). With these resources, we decided to search for KD susceptibility genes using a genome-wide scan. Initially, we performed affected sib-pair linkage analysis and identified four possible loci, 12q24, 7p15, 19q13.3 and Xq26 (Onouchi et al, manuscript in preparation). The fact that male children are predominantly affected with KD (male/female=1.4)7 suggests that a gene on the X chromosome could be implicated in KD susceptibility.

Recently, elevated expression of CD40 ligand (CD40L) on CD4+ helper T cells and platelets during acute-phase KD, and significantly higher expression in KD patients with coronary artery lesions (CAL), was reported.8 We found that the CD40L gene is located on Xq26, the same region as the peak identified by our linkage analysis. In addition, CD40L is known to induce endothelial cells to produce cell adhesion molecules and chemokines, and these molecules are suspected to play important roles in the pathophysiology of KD.9, 10, 11, 12, 13, 14, 15 Thus, we hypothesized that CD40L is a strong candidate gene whose genetic alteration could influence KD susceptibility. To test this hypothesis, we performed the following experiments. First, the entire 12.2 kb gene region, including 3.4 kb of 5′ upstream sequence, was sequenced to find SNPs. We identified a total of 22 SNPs: four in the 5′ upstream region, one in exon 1, one in intron 1, one in intron 2, 10 in intron 4 and five in exon 5. In all, 10 of the 22 SNPs were novel. A CA dinucleotide repeat in the 3′ untranslated region of exon 5 was also used as a marker. Second, an association study of these polymorphisms was conducted in 427 Japanese KD patients and 476 healthy Japanese controls. Among these polymorphisms, a novel SNP in intron 4, which was marginally over-represented in KD patients as a whole, was found to be significantly more frequent in male KD patients with CAL than in controls. Our findings may suggest a role for CD40L in the pathogenesis or disease severity of KD. Considering the effect of lyonization on X-chromosome genes and the very low frequency of this SNP in the Caucasian population (0.7%), this polymorphism is a good candidate to explain the sex and/or ethnic differences in KD susceptibility and outcome.

Materials and methods

Patients and samples

All patients were diagnosed with KD by pediatricians according to the criteria established by the Japan Kawasaki Disease Research Committee (http://www.kawasaki-disease.org/diagnostic/index.html). CAL was diagnosed when dilated coronary artery lesions were measured by two-dimensional echocardiography, as defined in the guideline.16 Of the 427 patients studied, 81 were the probands of sib-pair or sib-trio cases and the remainder were sporadic cases. The controls were 476 Japanese volunteers: 338 healthy adults without a history of KD and 138 children visiting hospitals for treatment of illnesses other than KD. The Ethical Committee of RIKEN approved this study, and all patients and Japanese volunteers gave written informed consent for blood sampling and DNA analysis. EB virus-transformed lymphoblastoid cell lines were established for 42 of the sibling cases using previously described methods.17 Genomic DNA was extracted from peripheral leukocytes or from these cell lines by standard procedures.18 In all, 96 DNA samples from unrelated Caucasian individuals were purchased from the Coriell Institute for Medical Research.

Polymerase chain reaction

All polymerase chain reactions (PCRs) were performed on GeneAmp 9700 thermal cyclers (Applied Biosystems, Foster City CA, USA). The thermocycling parameters used for SNP discovery and genotyping were an initial denaturation of 5 min at 94°C, followed by 37 cycles of 30 s each at 94, 60 and 72°C, and a final extension step of 7 min at 72°C.

Discovery of variations

Approximately 16 kb of genomic region containing all exons, introns and 3.4 kb of 5′-flanking sequence of the CD40L gene was sequenced in search of variations. Repetitive sequences were masked using the RepeatMasker program (http://repeatmasker.genome.washington.edu/cgi-bin/RepeatMasker) and excluded from the screening. PCR amplification was performed in a 16-μl reaction volume containing 67 mM Tris–HCl, 16.6 mM (NH4)2SO4, 6.7 mM MgCl2, 10 mM β-mercaptoethanol, 6.7 mM EDTA, 1.5 mM of each dNTP, 0.45 mM of each primer, 1.0 U of Ex Taq™ Hot Start Version (Takara, Tokyo, Japan) and 10 ng of genomic DNA. DNA samples from 24 unrelated females, comprising 18 KD patients and six healthy controls, were analyzed. The sequences, positions and amplicon sizes of each PCR primer pair are summarized in Supplementary Table 1. Purified and diluted PCR products were sequenced using ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kits v2.0 (Applied Biosystems, Foster City CA, USA). Electrophoresis was conducted on an ABI 3700 auto sequencer (Applied Biosystems, Foster City CA, USA). The GAP4 program in the Staden Package (http://www.mrc-lmb.cam.ac.uk/pubseq/) was used to detect variations.

Linkage disequilibrium analysis

Pairwise LD analysis for the 22 SNPs discovered in the screening was performed using the SNPAlyze V3.0 program (Dynacom, Mobara, Japan). Tightly linked (r2>0.5) SNPs were grouped, and a representative SNP from each group was analyzed in the subsequent case–control study.

Genotyping of SNPs

Genotyping of the SNPs was carried out by PCR-RFLP, except for SNP140. Each primer pair included a mismatch primer designed to create a restriction enzyme recognition sequence that contains the SNP. After PCR amplification, products were digested using the appropriate restriction enzyme according to the manufacturers’ instructions. Primers and restriction enzymes used are listed in Supplementary Table 2. Separation was performed on 4% agarose gels in TBE buffer. The gels were stained with ethidium bromide. SNP140 was genotyped by direct sequencing of PCR fragments. Primers used to amplify 357 bp of the SNP140 region were 5′-TCTTTGCGTGCAGTGTCTTTCC-3′ and 5′-TGTTAGAAAGGGGGATTGAGAAG-3′.

Genotyping of dinucleotide repeats

PCR amplification of the dinucleotide repeat located in the last exon of the CD40L gene was performed using the following primers (forward primer 5′-AGTCTCTTCCCTCCCCCAGTCT-3′; reverse primer 5′-GAACTGACTAGCAACGGCCTGA-3′). The forward primer was labeled with VIC (Applied Biosystems, Foster City CA, USA). Amplified samples were diluted and mixed with deionized formamide containing LIZ-labeled molecular size marker (GeneScan 500LIZ, Applied Biosystems, Foster City, CA, USA). Electrophoresis was conducted on an ABI 3700 auto sequencer (Applied Biosystems, Foster City CA, USA). Analysis of the electropherograms and size calling of amplified DNA fragments were performed using Genescan 3.5.2 software and Genotyper 3.7 software (Applied Biosystems, Foster City CA, USA). In all, 473 healthy Japanese individuals (210 males and 263 females), 81 probands of sib-pair cases (40 males and 41 females) and 308 sporadic cases of KD (191 males and 117 females) were genotyped. Subsequently, estimation of allele frequency and analyses of linkage and association were performed.

Statistical analysis

Frequencies of alleles and genotypes in KD cases and controls were compared using χ2 tests. Two-point sib-pair analysis for the dinucleotide repeat was conducted using the MAPMAKER/SIBS program version 2.1 (ftp://ftp-genome.wi.mit.edu/distribution/software/sibs) in the sex-linked mode. In this method, brother–brother, brother–sister and sister–sister sib pairs are analyzed separately.19 It is an extension of Holmans’ method in which the maximization is restricted to the following genetically valid values: 0 ≤ z1bb ≤ 0.5, 0 ≤ z1bs ≤ 0.5, 0 ≤ z1ss ≤ 0.5. Here z1 represents the probability of IBD sharing of the maternal allele, and the subscripts bb, bs and ss denote brother–brother, brother–sister and sister–sister pairs.20
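The allele-frequency comparison described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis: it assumes an uncorrected Pearson χ2 test with one degree of freedom and a Woolf (log-based) 95% confidence interval for the odds ratio, neither of which is specified in the text. The function name and the allele counts in the usage line are hypothetical.

```python
import math

def allele_association(case_minor, case_major, ctrl_minor, ctrl_major):
    """2x2 allele-count test: Pearson chi-square (1 df, no continuity
    correction), its p-value, the odds ratio and a Woolf 95% CI.
    All four counts must be non-zero."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    # Shortcut formula for the Pearson chi-square of a 2x2 table
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR), Woolf's method
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se)
    return chi2, p_value, odds_ratio, (ci_low, ci_high)

# Hypothetical counts, NOT taken from the study
chi2, p_value, odds_ratio, ci = allele_association(20, 80, 10, 90)
print(f"chi2={chi2:.2f} P={p_value:.3f} OR={odds_ratio:.2f} "
      f"95% CI={ci[0]:.2f}-{ci[1]:.2f}")
```

For X-linked markers in males, each individual contributes a single allele, so allele counts equal the number of subjects.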

Results

Identification of variations in CD40L

A total of 22 SNPs were identified by sequencing the entire CD40L gene (12.2 kb), including the 3.4 kb 5′ upstream region that contains the previously described promoter,21 and all five exons and introns (Figure 1 and Table 1). In all, 12 of the 22 SNPs were identical to those in the dbSNP database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=snp). The remaining 10 SNPs were located as follows: SNP95 in the 5′ flanking region, SNP02 in intron 1, SNP04, 05 and 07 in intron 4 and SNP14, 15, 140, 141 and 142 in exon 5. All but one SNP in exon 1, which is synonymous, were located outside the coding region (Table 1). Three SNPs in exon 5, SNP14, 15 and 140, were in close proximity to the previously described CU-rich element,22, 23 where members of the polypyrimidine tract-binding protein (PTB) family bind and play a role in regulating CD40L expression at the level of post-transcriptional mRNA decay (Figure 1). The possible existence of rare deleterious mutations or polymorphisms altering translation of the protein was ruled out by re-sequencing all five exons of CD40L in DNA derived from the 81 probands of sibling cases.

Figure 1

Genomic organization of CD40L and locations of SNPs in the CD40L gene. Open and filled rectangles represent untranslated and coding regions, respectively. A CU-rich element and a CA dinucleotide repeat (XqMS257) in exon 5 are represented by dotted and striped rectangles, respectively. The locations of repetitive elements are indicated as blackened zones in the horizontal bar below the exon numbers.

Table 1 Identified SNPs in the CD40L gene

Linkage disequilibrium (LD) analysis

To evaluate the relationship among these 22 SNPs, pairwise LD analysis was performed. LD coefficients based on the genotype frequencies of 24 female subjects are shown in Supplementary Table 3. The D′ statistics were all 1.0. Strong LD was observed within the following two groups of SNPs (group 1: SNP97, 98, 96, 01, 05, 06, 09 and 13; group 2: SNP08, 10, 11, 12, 14, 15, 141 and 142), with r2 >0.5 between every pair of SNPs in each group. Thus, one SNP from each group was selected for genotyping and further analysis.
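The relationship between the two LD measures reported above can be illustrated with a short sketch. It assumes phased two-locus haplotype counts are already known, whereas the study worked from unphased genotypes of female subjects, for which haplotype frequencies must first be estimated. The function name and the counts are hypothetical. The example shows how D′ can equal 1.0 while r2 is well below 1, which is why r2 rather than D′ was informative for grouping.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', r2) for two biallelic loci from phased
    haplotype counts (A/a at locus 1, B/b at locus 2)."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B
    # Largest |D| attainable given the allele frequencies
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical counts: one of the four haplotypes is absent,
# so D' is 1 even though the two SNPs are not perfect proxies.
d, d_prime, r2 = ld_stats(30, 10, 0, 60)
print(f"D={d:.2f} D'={d_prime:.2f} r2={r2:.3f}")
```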

Case–control study

SNP01 and 08 were chosen as representatives of the two LD groups for genotyping in the case–control study. Six SNPs (SNP95, 02, 03, 04, 07 and 140) that did not belong to the LD groups were also genotyped (Table 2). No significant difference was observed in the allele frequency of these SNPs between KD patients and healthy controls. However, in a stratified analysis by sex and coronary artery status, male patients with CAL had an increased frequency of the minor allele of SNP04 (IVS4+121 A>G) compared with the control group (25.9 vs 15.1%; OR=2.0, 95% CI=1.07–3.66, χ2=4.70, P=0.030). While comparison between male patients with CAL and those without CAL also showed a significant difference (25.9 vs 14.4%; OR=2.1, 95% CI=1.02–4.21, P=0.041) (Table 3), no difference was observed between female patients with CAL and those without CAL (data not shown). To estimate the allele frequency in a Caucasian population, 96 samples (47 males and 49 females; Coriell Institute, Camden, NJ) were genotyped. Only one of the 145 X chromosomes carried the G allele (0.7%) (Table 3).
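The Caucasian allele frequency quoted above follows from counting X chromosomes: each male carries one and each female two. A minimal sketch of that arithmetic, using the sample numbers given in the text (the function names are illustrative):

```python
def x_chromosome_count(n_males, n_females):
    """Males are hemizygous (one X); females carry two X chromosomes."""
    return n_males + 2 * n_females

def allele_frequency(n_carrying, n_chromosomes):
    """Fraction of chromosomes carrying the allele of interest."""
    return n_carrying / n_chromosomes

total = x_chromosome_count(47, 49)   # 47 males, 49 females
freq = allele_frequency(1, total)    # one G allele observed
print(f"{total} X chromosomes, G allele frequency = {freq:.1%}")
# prints: 145 X chromosomes, G allele frequency = 0.7%
```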

Table 2 Allele frequencies of SNPs of the CD40L gene in KD patients and controls
Table 3 Allele frequencies of IVS4+121 A>G polymorphism of CD40L gene in male KD patients with or without CAL, Japanese controls and Caucasians

Analyses of dinucleotide repeat in 3′UTR

A polymorphic dinucleotide repeat sequence (XqMS257) is located in the 3′-UTR of the CD40L gene (Figure 1). We tested the relevance of the dinucleotide repeat by two-point sib-pair analysis. There was no difference in allele frequencies of the polymorphic sequence between 389 KD patients and 473 healthy controls (Supplementary Table 4). However, we observed excess sharing of alleles and nominal evidence of linkage in 80 KD sibling cases (LOD score=1.0) (Table 4).

Table 4 Sib-pair analysis of a CA dinucleotide repeat

Discussion

KD is an acute systemic vasculitis syndrome of unknown etiology. Three KD epidemics in Japan, in 1979, 1982 and 1986, suggest a role for unidentified infectious agent(s) in the pathogenesis of the disease. In addition, genetic factors influencing susceptibility to KD have been postulated.4, 5, 6 In our genome-wide sib-pair analysis, Xq26, where CD40L maps, was one of the loci that showed linkage (Onouchi et al, manuscript in preparation). CD40L is a type II integral membrane protein and belongs to the tumor necrosis factor (TNF) gene family. The protein is expressed on the surface of activated CD4+ T cells and platelets and acts as a ligand for the CD40 molecule expressed on various types of cells. It is well known that CD40L plays an important role in host defense, regulation of autoimmunity and tumor growth.9 Deleterious mutations in CD40L cause X-linked hyperimmunoglobulin M syndrome (HIGM1).24, 25, 26 HIGM1 is characterized by elevated serum IgM in contrast to the absence of IgG, IgA and IgE, and patients with this disorder have increased susceptibility to bacterial infections. Thus, variations modulating the function or quantity of the CD40L protein could be postulated to affect the response to infectious agents. Indeed, an SNP in the 5′ flanking region of CD40L was associated with a reduction in risk for severe malaria.27 KD is thought to develop as a result of excessive activation of the immune system triggered by some infectious event. Wang et al8 reported that CD40L expression on T cells and platelets was significantly higher in acute KD patients than in febrile controls. Moreover, in KD patients, the expression level of CD40L on CD4+ T cells and platelets is correlated with CAL.8 In addition, CD40L expressed on T cells and platelets is known to interact with CD40 on endothelial cells and to induce them to express cell adhesion molecules such as E-selectin, ICAM-1 and VCAM-1,10 and to secrete IL-8 and MCP-1.11 In the acute phase of KD, these molecules are known to be upregulated and are thought to play important roles in the pathogenesis of the disease.12, 13, 14, 15 Therefore, CD40L is a candidate gene relevant to KD susceptibility.

In the present study, we identified 22 SNPs of the CD40L gene in the Japanese population. Although this gene region was known as an ‘SNP desert’28 with low SNP incidence, and was reported to have fewer SNPs in Asians,29 a relatively large number of SNPs were identified in this study. SNP01, the only SNP within the protein-coding region, was synonymous and is recorded as rs1126535 in the dbSNP database. The allele frequency of this SNP in the present Japanese population was 10.5%, about half that reported in the database (22.5%). Among the SNPs identified in this study, the minor allele frequency of SNP04, which was associated with CAL in male patients, was the highest in the Japanese. In contrast, its allele frequency was only 0.7% in Caucasians. The Japanese nationwide surveys of KD incidence have documented that male children are more often affected and have a higher risk for cardiac complications.7 Unlike in adults, differences in lifestyle or environment between the sexes are unlikely in early childhood. Thus, it seems rational to postulate that one of the genes influencing KD susceptibility and disease severity is located on the X chromosome. Although larger numbers of patients with CAL will clearly need to be analyzed to obtain a conclusive result, the finding that SNP04 is very rare in Caucasians is interesting given the low incidence of KD in that population. It is unknown whether this SNP by itself affects the transcription, splicing or stability of CD40L mRNA. It is possible that the observed association with SNP04 is due to a genetic variation at a different locus that is in linkage disequilibrium with this marker.

CU-rich cis-acting elements known to regulate mRNA stability22, 23 exist in the 3′-UTR of the gene. Although a total of five SNPs were identified in the 3′-UTR (Figure 1), none of these polymorphisms is located within the element. In the 5′-flanking region, there are three NFAT transcription factor-binding sites, the most distal one being about 760 bp upstream of the transcription start site.21 Again, no variation likely to affect transcription factor binding was identified despite extensive screening.

Eight haplotypes were determined by genotyping seven SNPs in male subjects. A CA repeat near the CU-rich element was found to be highly polymorphic. Although no positive association was found with the haplotypes or the CA repeat, the new information about polymorphisms from this study provides a valuable tool for future studies of other diseases. Validation of our current findings in larger cohorts of KD patients will define the importance of CD40L SNP04 in disease susceptibility and outcome.