Introduction

Hirschsprung's disease (HSCR) is a congenital disorder characterised by the absence of ganglion cells from variable lengths of the lower digestive tract. The incidence of the disease varies significantly among ethnic groups and is highest among Asians (2.8 per 10 000 live births).1, 2 According to the severity of the phenotype, HSCR patients can be classified as having long-segment (L-HSCR) or short-segment (S-HSCR) aganglionosis. HSCR occurs mostly sporadically, although it can be familial, with a complex pattern of inheritance that includes low, sex-dependent penetrance and phenotypic variability. The male/female (M/F) ratio is ≈4:1 among S-HSCR patients and ≈1:1 among L-HSCR patients. Aganglionosis is attributed to a failure of neural crest cells (the precursors of the enteric ganglia) to migrate, proliferate or survive in the gastrointestinal tract. The RET gene, encoding a tyrosine-kinase receptor, is the major HSCR gene, and its expression is crucial for the development of the enteric ganglia.3 The other HSCR genes identified so far mainly encode members of interrelated signalling pathways involved in the development of the enteric ganglia, namely the RET, endothelin receptor B and SOX10 (a transcriptional regulator) signalling pathways,4, 5, 6, 7 although mutations in genes other than RET account for only 7% of cases. The reduced penetrance of mutations in the RET coding sequence (CDS) and the variable expression of the HSCR phenotype indicate that the disease could result from the combined effect of RET and other genes of these signalling pathways, whereby the outcome would be altered RET expression.8, 9, 10, 11 Mutations in RET regulatory regions have been shown to contribute to HSCR either alone or in combination with variants in the RET CDS or in other susceptibility genes.
Indeed, common RET SNPs spanning the whole gene, including the promoter, have been found to be strongly associated with HSCR, either singly or in combination (haplotypes).12, 13, 14, 15, 16, 17, 18, 19, 20 Importantly, a functionally relevant SNP was identified within an HSCR-associated RET haplotype.21 This 'common' mutation, which lies in a putative enhancer element within RET intron 1, has low penetrance and a small sex-dependent effect, and explains only a small fraction of HSCR cases. Mutations in RET non-coding regions suggest that proteins encoded by as yet unknown loci could modify RET expression and act as disease-promoting or disease-suppressing genes (modifiers).16, 21 HSCR can therefore be defined as a complex disorder with multifactorial inheritance that requires RET and other interacting disease susceptibility alleles. As in many other complex diseases, the phenotype may result from the combination of common variants (SNPs) in several genes. Additional genes are needed to explain not only the incidence of the disease but also its complex pattern of inheritance. Current data indicate that S-HSCR manifestation requires the effect of RET (a major gene with a large effect) and that of two unidentified RET-dependent modifiers mapped to 3p21 and 19q12.22 The 3p21 region lies within the 70 Mb of the p arm of chromosome 3 that was genotyped by the Hong Kong HapMap Group for the HapMap project.23 We identified a gene-rich 6-Mb region of high linkage disequilibrium (LD) within 3p21 and subsequently searched this region for the HSCR-susceptibility locus. Elucidating HSCR is especially relevant in China, where the disease has one of the highest incidences in the world.

Materials and methods

Samples

The institutional review board of The University of Hong Kong together with the Hospital Authority granted ethical approval for this project (IRB: UW 03-227 T/227). Blood samples were drawn from patients, their parents and controls after informed consent had been obtained (parental consent for newborns and children under 7 years of age).

Fifty-eight S-HSCR case–parent trios, 172 S-HSCR cases and 153 unrelated controls were included in this study. Parents were clinically unaffected. The male/female ratio was 5:1.

Marker selection

A total of 1693 SNP markers (approximately 1 SNP per 5 kb) spanning the 3p21 region (from 47 to 53 Mb) were downloaded from HapMap data release 2005-03_16a_phase I (Build 34) for the Han Chinese from Beijing (CHB). Eight hundred and fourteen common SNPs with a minor allele frequency (MAF) of ≥5% were selected and entered into CLUSTAG,24 a clustering programme built in-house, for tag SNP selection. The cluster-merging threshold was set at r2=0.8.
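The idea behind tag SNP selection can be illustrated with a short sketch. CLUSTAG itself clusters SNPs hierarchically on pairwise r2; the code below instead uses a simpler greedy procedure, swapped in purely for illustration, that applies the same two parameters given above (MAF ≥5%, r2 threshold 0.8). The function name `select_tags`, the input layout and the toy r2 values are all hypothetical and are not taken from this study.

```python
def select_tags(snps, maf, r2, maf_min=0.05, r2_min=0.8):
    """Greedy tag-SNP selection; a simplified stand-in for CLUSTAG, not its algorithm.

    snps: SNP identifiers in map order
    maf:  dict SNP -> minor allele frequency
    r2:   dict frozenset({snp1, snp2}) -> pairwise r^2 (pairs not listed count as 0)
    """
    def ld(a, b):
        return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)

    # keep only common SNPs (MAF >= 5%, as in the text)
    untagged = {s for s in snps if maf[s] >= maf_min}
    tags = []
    while untagged:
        # choose the SNP that captures the most still-untagged SNPs at r^2 >= r2_min
        best = max(sorted(untagged),
                   key=lambda s: sum(ld(s, t) >= r2_min for t in untagged))
        tags.append(best)
        # every SNP in strong LD with the new tag is now captured
        untagged -= {t for t in untagged if ld(best, t) >= r2_min}
    return tags


# Toy example (hypothetical values): rs5 is dropped for low MAF,
# rs2 captures rs1 and rs3, and rs4 must tag itself.
snps = ["rs1", "rs2", "rs3", "rs4", "rs5"]
maf = {"rs1": 0.30, "rs2": 0.28, "rs3": 0.25, "rs4": 0.40, "rs5": 0.02}
r2 = {frozenset(("rs1", "rs2")): 0.90, frozenset(("rs2", "rs3")): 0.85,
      frozenset(("rs1", "rs3")): 0.60}
print(select_tags(snps, maf, r2))  # ['rs2', 'rs4']
```

A greedy choice keeps the sketch short; clustering-based methods such as CLUSTAG can yield a different (sometimes smaller) tag set for the same threshold.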

Assay design and genotyping

We used the Sequenom MassARRAY system (Sequenom, San Diego, CA, USA)25 for assay design and genotyping. SNP sites were amplified by multiplex PCR in 384-well microtitre plates, each site with a pair of specifically designed forward and reverse PCR primers. Amplicons for SNP capture ranged from 60 to 120 base pairs (bp) in length. After amplification of the target regions, PCR products were treated with shrimp alkaline phosphatase for 20 minutes at 37 °C to dephosphorylate residual nucleotides, preventing their later incorporation and interference with the primer extension assay. Extension primers, DNA polymerase and a cocktail of deoxynucleotide triphosphates (dNTPs) and dideoxynucleotide triphosphates (ddNTPs) were then added to each reaction, followed by cycles of the homogeneous MassEXTEND™ reaction, in which each SNP was probed by its own extension primer. MassARRAY™ Typer software (version 3.1) was then used to read out the masses of the extension products and assign genotype calls.

Quality control

On each 384-well plate, 20 samples were duplicated and four wells were filled with H2O (blanks) to check for contamination and for the reliability of the system. A whole plate was considered to have failed if: (i) no SNP passed the call-rate threshold of >80%; and/or (ii) the success rate of the duplicate check was <99.5% and that of the blank check <90%; and/or (iii) the success rate of the blank check alone was <75%.
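The three plate-failure rules combine as a simple boolean test, sketched below. The function name and arguments are hypothetical, and the sketch assumes that each rate is supplied as a fraction between 0 and 1, with the blank "success rate" taken to mean the fraction of H2O wells that correctly gave no call.

```python
def plate_failed(snp_call_rates, duplicate_concordance, blank_success):
    """Return True if a 384-well plate fails QC under rules (i)-(iii).

    snp_call_rates:        per-SNP call rates on the plate (fractions, 0-1)
    duplicate_concordance: fraction of duplicated samples with concordant calls
    blank_success:         fraction of blank (H2O) wells behaving as expected (assumption)
    """
    # (i) no SNP on the plate exceeds an 80% call rate
    no_snp_passed = not any(rate > 0.80 for rate in snp_call_rates)
    # (ii) duplicate check below 99.5% together with blank check below 90%
    dup_and_blank_low = duplicate_concordance < 0.995 and blank_success < 0.90
    # (iii) blank check alone below 75%
    blank_very_low = blank_success < 0.75
    return no_snp_passed or dup_and_blank_low or blank_very_low


print(plate_failed([0.95, 0.91], duplicate_concordance=1.0, blank_success=1.0))   # False
print(plate_failed([0.95, 0.91], duplicate_concordance=0.99, blank_success=0.85))  # True
```

Note that a low duplicate concordance alone does not fail a plate under rule (ii); it does so only when the blank check is also below 90%.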

SNPs were removed from the analysis if: (i) they were called in fewer than 90% of individuals; (ii) they were monomorphic; (iii) their genotype frequencies deviated from Hardy–Weinberg equilibrium expectations (P<0.01); or (iv) a Mendelian error rate of >0.01 was detected.

Individuals with less than 90% successful genotype calls were removed from the study.
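The SNP and individual filters above can be sketched as follows. Genotypes are assumed to be coded as 0, 1 or 2 copies of one allele, with `None` for a failed call, and the Mendelian error rate is assumed to be computed beforehand from the trios. The Hardy–Weinberg check here is the simple 1-df chi-square test, chosen for brevity; the programmes used in the study may apply an exact test instead. All function names are hypothetical.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """1-df chi-square P-value for Hardy-Weinberg equilibrium (polymorphic SNPs only)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def keep_snp(genotypes, mendel_error_rate):
    """Apply SNP filters (i)-(iv); genotypes are 0/1/2 allele counts, None = no call."""
    called = [g for g in genotypes if g is not None]
    if len(called) < 0.90 * len(genotypes):       # (i) called in fewer than 90%
        return False
    n_aa, n_ab, n_bb = (called.count(k) for k in (2, 1, 0))
    copies = 2 * n_aa + n_ab
    if copies in (0, 2 * len(called)):            # (ii) monomorphic
        return False
    if hwe_chisq_p(n_aa, n_ab, n_bb) < 0.01:      # (iii) HWE deviation
        return False
    return mendel_error_rate <= 0.01              # (iv) Mendelian errors


def keep_individual(calls):
    """Keep an individual with at least 90% successful genotype calls."""
    return sum(c is not None for c in calls) >= 0.90 * len(calls)


print(keep_snp([2] * 25 + [1] * 50 + [0] * 25, mendel_error_rate=0.0))  # True
print(keep_snp([2] * 50 + [0] * 50, mendel_error_rate=0.0))             # False (no heterozygotes)
```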

Data analysis

The publicly available software Whap,26 Haploview27 and FUGUE (http://www.sph.umich.edu/csg/abecasis/fugue/)28 were used for statistical analyses of single markers and haplotypes for both family-based and case–control association tests. These programmes implement a standard EM approach to estimate haplotype frequencies.
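To show what the EM step does, the sketch below estimates the four two-SNP haplotype frequencies from unphased genotypes under Hardy–Weinberg equilibrium. Only double heterozygotes have ambiguous phase, and EM distributes them between the two possible phases in proportion to the current frequency estimates. This is a minimal illustration, not the implementation used by Whap, Haploview or FUGUE, which handle more markers and family data; the function names and the toy data are hypothetical.

```python
from itertools import product

# Haplotypes as (allele at SNP1, allele at SNP2); 1 = the allele counted in the genotype.
HAPS = list(product((0, 1), repeat=2))


def compatible_pairs(g):
    """All unordered haplotype pairs whose allele counts match genotype g = (g1, g2)."""
    pairs = set()
    for h1, h2 in product(HAPS, repeat=2):
        if h1[0] + h2[0] == g[0] and h1[1] + h2[1] == g[1]:
            pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    freqs = {h: 0.25 for h in HAPS}  # uniform starting values
    for _ in range(n_iter):
        counts = {h: 0.0 for h in HAPS}
        for g in genotypes:
            pairs = compatible_pairs(g)
            # E-step: probability of each phase given current frequencies (HWE assumed)
            weights = [freqs[a] * freqs[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by 2N chromosomes
        new = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        converged = max(abs(new[h] - freqs[h]) for h in HAPS) < tol
        freqs = new
        if converged:
            break
    return freqs


# Toy data: 10 homozygous (1,1)/(1,1), 10 homozygous (0,0)/(0,0), 5 double heterozygotes.
data = [(2, 2)] * 10 + [(0, 0)] * 10 + [(1, 1)] * 5
print({h: round(f, 3) for h, f in em_haplotype_freqs(data).items()})
```

Because the unambiguous individuals carry only the (1,1) and (0,0) haplotypes, EM resolves the double heterozygotes to that phase and both frequencies converge towards 0.5.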

The disease prevalence was set at 0.028. Empirical and global P-values for each test were obtained from 1000 permutations.
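A permutation test of this kind can be sketched as follows: case/control labels are shuffled 1000 times, and the empirical P-value for each SNP is the proportion of permuted statistics at least as large as the observed one. The global P-value is computed here from the maximum statistic across SNPs, which is one common way of correcting for testing several markers; whether the programmes used in the study define it identically is an assumption. The allelic chi-square statistic, function names and toy data are illustrative only, and the prevalence parameter is not used.

```python
import random


def allele_chi2(genos, labels):
    """2x2 allelic chi-square; genos are 0/1/2 allele counts, labels 1=case, 0=control."""
    n_cases = sum(labels)
    a_case = sum(g for g, l in zip(genos, labels) if l == 1)
    a_ctrl = sum(g for g, l in zip(genos, labels) if l == 0)
    n_case, n_ctrl = 2 * n_cases, 2 * (len(labels) - n_cases)
    table = [[a_case, n_case - a_case], [a_ctrl, n_ctrl - a_ctrl]]
    n = n_case + n_ctrl
    chi2 = 0.0
    for i in range(2):
        row = sum(table[i])
        for j in range(2):
            expected = row * (table[0][j] + table[1][j]) / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def permutation_pvalues(snp_genos, labels, n_perm=1000, seed=1):
    """Per-SNP empirical P-values and a max-statistic global P-value."""
    rng = random.Random(seed)
    observed = [allele_chi2(g, labels) for g in snp_genos]
    hits = [0] * len(observed)
    global_hits = 0
    perm = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm)  # break any genotype-phenotype association
        stats = [allele_chi2(g, perm) for g in snp_genos]
        for k, s in enumerate(stats):
            hits[k] += s >= observed[k]
        global_hits += max(stats) >= max(observed)
    empirical = [(h + 1) / (n_perm + 1) for h in hits]
    return empirical, (global_hits + 1) / (n_perm + 1)


# Toy data: SNP 0 perfectly separates 20 cases from 20 controls; SNP 1 is uninformative.
labels = [1] * 20 + [0] * 20
snp_genos = [[2] * 20 + [0] * 20, [1] * 40]
empirical, global_p = permutation_pvalues(snp_genos, labels)
print(empirical, global_p)
```

Adding 1 to both numerator and denominator keeps the empirical P-value above zero, so with 1000 permutations the smallest attainable value is 1/1001.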

Results and discussion

SNP selection, genotyping and quality control

After running CLUSTAG, 218 markers were selected as tags. For four of these, the assay did not pass quality control or SpectroDESIGNER could not design a successful genotyping assay. A total of 214 SNPs were therefore initially genotyped in the 58 trios. Of these, 173 assays passed data evaluation and were considered successful. Genotype call rates were >90% for all individuals included in this study.

Family-based association study

Single marker and haplotype analysis

One-, three- and five-marker 'sliding-window' transmission/disequilibrium test (TDT) analyses showed association with HSCR within a narrow region (Figure 1). The five-marker window was optimal in terms of minimising type I error, and five consecutive markers spanning a 129 059-bp region were therefore identified. Although none of the markers was individually associated with HSCR (Table 1), haplotypes 6 and 7 (CCGGT and CCGGC, respectively) were associated with a decreased risk of HSCR. Haplotype 1 (CCTAT) was transmitted more frequently to affected offspring, although this did not reach statistical significance (Table 2).
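The single-marker TDT underlying these analyses compares, among heterozygous parents, the number of times an allele is transmitted (b) with the number of times it is not transmitted (c) to the affected child; the statistic (b − c)²/(b + c) follows a chi-square distribution with 1 df. The sketch below covers only the single-marker case (haplotype TDT and window averaging are not shown) and assumes genotypes coded as 0, 1 or 2 copies of allele A. Function names and the toy trios are hypothetical.

```python
import math


def count_transmissions(trios):
    """Transmissions (b) and non-transmissions (c) of allele A from heterozygous parents.

    Each trio is (father, mother, child), coded as copies of allele A (0, 1 or 2).
    """
    b = c = 0
    for father, mother, child in trios:
        hets = (father == 1) + (mother == 1)
        if hets == 0:
            continue  # homozygous parents are uninformative for the TDT
        # copies of A necessarily transmitted by any homozygous parent
        fixed = sum(g // 2 for g in (father, mother) if g != 1)
        from_hets = child - fixed  # copies of A transmitted by the heterozygous parent(s)
        if not 0 <= from_hets <= hets:
            raise ValueError("Mendelian inconsistency in trio")
        b += from_hets
        c += hets - from_hets
    return b, c


def tdt(trios):
    b, c = count_transmissions(trios)
    if b + c == 0:
        return 0.0, 1.0  # no informative parents
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))  # P-value from chi-square with 1 df


# Toy trios: three with one heterozygous parent, one with two heterozygous parents.
trios = [(1, 0, 1), (1, 0, 1), (1, 0, 0), (1, 1, 2)]
print(count_transmissions(trios))  # (4, 1)
print(tdt(trios))
```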

Figure 1

Results of one- (a), three- (b) and five-marker (c) 'sliding-window' TDT analyses. At each position, the results of all covering windows are averaged to produce a final statistic. This statistic is tested by permutation only (global significance values are also given for the maximum and summed statistics).

Table 1 SNPs encompassed by the five-marker haplotypes
Table 2 Distribution of the five-marker haplotypes in 58 trios

Case–control association study

To verify the results of the family-based analyses, 172 independent S-HSCR cases and 153 unrelated controls were also genotyped for the five markers making up the HSCR-associated haplotype. Single-marker and five-marker haplotype analyses were performed using Whap, Haploview and FUGUE. All programmes gave similar results, which are presented in Tables 3 and 4. Marker rs747654 (marker 4 in Table 1) was significantly associated with HSCR. This marker had also given the smallest P-value in the family-based study of 58 trios, although there it did not reach statistical significance. Five-marker haplotype analyses showed that the CCTAT haplotype (haplotype 1 in Table 2) was significantly over-represented in patients, hence conferring risk of HSCR. Conversely, the CCGGT haplotype (haplotype 6 in Table 2) was significantly under-represented in affected individuals, presumably conferring protection. The results of the family-based study were thus replicated in an independent sample. Of note, allele A of rs747654 is carried on the 'risk' haplotype, whereas allele G of the same marker is carried on the 'protective' haplotype. Similar analyses were performed after stratifying individuals by sex. No significant differences were found between the sexes, probably because of the small sample size and the fact that most patients were male. This was expected, as no parental-sex bias in the transmission of susceptibility to HSCR at the 3p21 locus had previously been noted.22
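For a single marker such as rs747654, the case–control comparison amounts to a 2×2 table of allele counts in cases and controls. The sketch below computes the allelic odds ratio, a Woolf (log-based) 95% confidence interval and a 1-df chi-square P-value. The allele counts in the example are invented for illustration and are not the counts reported in Tables 3 and 4; the function name is hypothetical.

```python
import math


def allelic_test(case_a, case_b, ctrl_a, ctrl_b):
    """Allelic odds ratio (allele A vs B), Woolf 95% CI and 1-df chi-square P-value.

    Arguments are allele counts (two per individual); all must be > 0.
    """
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    se = math.sqrt(1 / case_a + 1 / case_b + 1 / ctrl_a + 1 / ctrl_b)
    log_or = math.log(odds_ratio)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    n = case_a + case_b + ctrl_a + ctrl_b
    # Pearson chi-square for a 2x2 table (no continuity correction)
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / (
        (case_a + case_b) * (ctrl_a + ctrl_b) * (case_a + ctrl_a) * (case_b + ctrl_b))
    return odds_ratio, ci, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical allele counts: allele A is more frequent in cases than in controls.
print(allelic_test(case_a=60, case_b=40, ctrl_a=40, ctrl_b=60))  # OR = 2.25, P < 0.01
```

An odds ratio above 1 corresponds to a 'risk' allele and one below 1 to a 'protective' allele, mirroring the interpretation of the risk and protective haplotypes above.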

Table 3 SNPs encompassed by the five-marker haplotype in Chinese S-HSCR and controls
Table 4 Distribution of the five-marker haplotype in Chinese S-HSCR and controls

Because differences in LD between patients and controls may help pinpoint the location of the causative locus within a given haplotype, we examined LD in detail across the region spanned by the 3p21 five-marker haplotype in both groups. Figure 2 shows the LD plot for these five markers in HSCR patients and controls, together with the genes located in the region. Pair-wise LD values are higher in HSCR patients than in controls, and consequently the LD blocks differ when the 'confidence interval' method of block definition is used.29 According to HapMap data for CHB (Figure 3a), there is evidence of recombination between rs1841178 and rs3774808 (markers 1 and 2), which is consistent with the generally lower LD values observed in our controls.
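The pair-wise values in such LD plots are usually D′ or r2, both derived from two-locus haplotype frequencies (which, for unphased data, are themselves estimated by EM). The sketch below computes both measures from given haplotype frequencies. It does not implement the 'confidence interval' block-definition method, which additionally requires confidence bounds on D′. The function name and example frequencies are hypothetical.

```python
def ld_stats(f_AB, f_Ab, f_aB, f_ab):
    """Return (D', r^2) from the four two-locus haplotype frequencies (summing to 1).

    Assumes both SNPs are polymorphic (allele frequencies strictly between 0 and 1).
    """
    p_A = f_AB + f_Ab          # frequency of allele A at SNP 1
    p_B = f_AB + f_aB          # frequency of allele B at SNP 2
    q_A, q_B = 1 - p_A, 1 - p_B
    d = f_AB - p_A * p_B       # raw disequilibrium coefficient D
    # D is normalised by its maximum possible value given the allele frequencies
    d_max = min(p_A * q_B, q_A * p_B) if d >= 0 else min(p_A * p_B, q_A * q_B)
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_A * q_A * p_B * q_B)
    return d_prime, r2


print(ld_stats(0.5, 0.0, 0.0, 0.5))      # complete LD: D' = 1, r^2 = 1
print(ld_stats(0.4, 0.1, 0.0, 0.5))      # D' = 1 but r^2 ~ 0.67 (unequal allele frequencies)
print(ld_stats(0.25, 0.25, 0.25, 0.25))  # linkage equilibrium: D' = 0, r^2 = 0
```

The second example shows why the two measures can disagree: D′ reaches 1 whenever one of the four haplotypes is absent, whereas r2 is also limited by differences in allele frequency between the two SNPs.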

Figure 2

Pair-wise linkage disequilibrium diagrams for HSCR patients (a) and controls (b). Values shown are r2.

Figure 3

(a) Linkage disequilibrium plot of the 3p21 region analysed in this study for the CHB population. The five SNPs in the associated haplotypes are circled. (b) Linkage disequilibrium plot of the same region for the CEU population.

Biological implications

To investigate the possible biological relevance of the HSCR-associated haplotype to the disease, we examined the genes spanned by the five markers and the characteristics of the chromosomal region. Six genes and five transcripts were found in the 3p21 region spanning positions 48357638 to 48475671 (NCBI build 35). The locations of markers and genes in the region are detailed in Table 5 and Figure 2. Importantly, some of these genes (TREX1, PLXNB1 and SCOTIN) have been implicated in neurological phenotypes in humans. The SNP most strongly associated with HSCR, rs747654, lies within introns of several transcripts of the three prime repair exonuclease 1 gene (TREX1). Notably, TREX1 is expressed in the developing mouse gut.30 The plexin B1 gene (PLXNB1) encodes a receptor for the transmembrane semaphorin SEMA4D;31 this receptor promotes cell adhesion and neurite outgrowth and plays an important role in the development of the nervous system.32 Interestingly, SEMA4D maps to the 9q31 region, which is known to harbor another RET-dependent modifier.33 SCOTIN plays a crucial role in the p53–p73 pathway governing apoptosis, whose misregulation is implicated in the pathogenesis of neuroblastoma (a tumor derived from neural crest cells).34, 35, 36 Apoptosis of neural crest cells has been suggested as one of the causes of enteric aganglionosis.37, 38, 39

Table 5 Characteristics of the five markers encompassed by the HSCR-associated haplotype

It is therefore tempting to speculate that dysfunction of any of these genes could contribute to HSCR. In fact, the five SNPs lie in conserved, putatively regulatory regions (as indicated by comparative analysis of genomic sequences with VISTA; data not shown) and could therefore overlap binding sites for regulatory proteins governing the transcription of genes in the region. To investigate this possibility, we used rVISTA.40 None of the SNP alleles created or abolished a predicted binding site, although a nucleotide change at these positions could still interfere with neighboring sites by reducing their accessibility. Further analyses are needed to evaluate the effect of these SNPs on the genes in the region. The SNPs could also act as surrogate markers for other functional sites and/or act synergistically with other disease variants.
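Tools such as rVISTA predict transcription factor binding sites by scoring sequences against position weight matrices. The general idea of asking whether a SNP allele creates or abolishes a predicted site can be sketched as follows: build a log-odds matrix, find the best-scoring window around the SNP for each allele, and compare each score with a threshold. This is a simplified illustration of matrix scanning, not rVISTA's actual procedure (which also uses cross-species conservation and its own matrix library and cut-offs). The 4-bp matrix, flanking sequences, threshold and function names are all hypothetical, and only the forward strand is scanned.

```python
import math

# Hypothetical 4-bp position frequency matrix (not a real transcription factor motif).
PFM = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def log_odds_pwm(pfm, pseudocount=0.5, background=0.25):
    """Convert counts to log2 odds against a uniform background."""
    pwm = []
    for col in pfm:
        total = sum(col.values()) + 4 * pseudocount
        pwm.append({b: math.log2((col[b] + pseudocount) / total / background) for b in "ACGT"})
    return pwm


def best_site_score(seq, pwm):
    """Highest log-odds score over all windows of the forward strand."""
    w = len(pwm)
    return max(sum(pwm[i][seq[s + i]] for i in range(w)) for s in range(len(seq) - w + 1))


def allele_effect(left, ref, alt, right, pwm, threshold):
    """Return (site predicted with ref allele, site predicted with alt allele)."""
    ref_score = best_site_score(left + ref + right, pwm)
    alt_score = best_site_score(left + alt + right, pwm)
    return ref_score >= threshold, alt_score >= threshold


pwm = log_odds_pwm(PFM)
# The G allele completes the motif ACGT; the A allele destroys it.
print(allele_effect("TTAC", "G", "A", "TTT", pwm, threshold=5.0))  # (True, False)
```

In this toy case the alternative allele abolishes the predicted site; for the five 3p21 SNPs, rVISTA predicted no such change.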

Outside the five-marker haplotype region lies the ARHGEF3 gene (Rho guanine nucleotide exchange factor 3), which encodes a guanine nucleotide exchange factor for Rho GTPases, proteins that play fundamental roles in numerous cellular processes. ARHGEF3 has been identified as an HSCR candidate locus in mice.30 Additional selected SNPs in the ARHGEF3 region were also genotyped in all HSCR and control samples. Unfortunately, none of them was associated with HSCR.

The initial study reporting 3p21 as the location of a RET-dependent modifier was conducted in patients of European origin.22 Although 3p21 is a gene-rich region with relatively high LD, LD is higher in the individuals of northern and western European ancestry (CEU) genotyped for HapMap than in the CHB sample. This provides grounds for differences in the associated haplotypes between populations (Figure 3), which could be advantageous for identifying the causative locus. Of note, rs747654 is monomorphic in the CEU population (G allele; Table 5), which implies that if allele A of rs747654 were a causative variant, it would be exclusive to the Chinese population. A more parsimonious explanation is that an as yet unknown susceptibility variant lies within the region of LD.

In this study, we investigated the 3p21 region for genes that may be implicated in the pathogenesis of HSCR. Our data suggest that an HSCR locus may lie in this region, even though the statistical significance did not survive stringent adjustment for multiple testing. Indeed, the inclusion of a large number of SNPs creates a substantial multiple-testing problem, especially for detecting anything but the strongest effects. Nevertheless, our preliminary findings in trios were replicated in an independent sample. In follow-up studies, ranking markers by proximity to candidate genes or by expected functional consequences could help to pinpoint the HSCR locus.
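The scale of the multiple-testing burden can be illustrated with two standard corrections. With 173 successfully genotyped SNPs, a Bonferroni correction at α = 0.05 requires P < 0.05/173 (about 2.9 × 10⁻⁴) for each marker, whereas the Benjamini–Hochberg procedure controls the false discovery rate and is less stringent. Neither correction is claimed to be the one applied in this study, which relied on permutation-based P-values; the sketch only shows how the thresholds behave, and the example P-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(pvalues, q=0.05):
    """Indices of tests declared significant at false discovery rate q (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:  # largest rank meeting its threshold wins
            k_max = rank
    return sorted(order[:k_max])


print(bonferroni_threshold(0.05, 173))              # about 2.9e-4
print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))  # [0, 1, 2]
```

Because the tag SNPs in a high-LD region are correlated, both corrections are conservative here, which is one reason permutation-based P-values are often preferred.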