Introduction

Hirschsprung disease (HSCR; MIM 142623) is characterized by the absence of intrinsic ganglion cells in the myenteric and submucosal plexuses of the gastrointestinal tract, resulting in a sometimes life-threatening intestinal obstruction. HSCR is a disorder with a complex pattern of inheritance. A skewed sex ratio with a male predominance (3:1 to 5:1), incomplete penetrance, variable expression and association with a large number of syndromes have been observed.1, 2, 3 Genetic analyses made clear that HSCR is a genetically heterogeneous disease.3 To date, mutations in nine genes have been found that contribute to the disease: RET,4, 5 GDNF,6, 7 NTN,8 EDNRB,9 EDN3,10, 11 ECE1,12 SOX10,13, 14 SIP1,15, 16 and PHOX2B.17 Of these genes, RET (MIM 164761) is thought to be the major gene, since almost all familial cases are linked to 10q11.2, the region in which the RET gene is located.18, 19 However, mutations in RET are detected in only about 50% of all familial cases3, 20 and fail to explain 70–90% of the more commonly observed sporadic HSCR cases.3, 20 The other genes together explain no more than about 5–10% of HSCR cases.21

In this study, we aimed to investigate a possible role of RET in HSCR patients in whom no mutation had been found in the RET coding sequence. To this end, we typed nine SNPs and four microsatellite markers covering the entire genomic region of RET (Figure 1) in a group of 117 Dutch sporadic HSCR patients and their 231 parents. Of these patients, 64 had been screened for RET mutations and no mutation had been identified; in the remaining 53 patients, RET was not screened. We performed single-locus and multilocus association analyses and transmission/disequilibrium test (TDT) analyses to determine whether our patients share an inherited ancestral haplotype coupled to one or more putative ancestral disease mutations. The two subgroups of patients, that is, those screened and found negative for RET mutations and those not screened for mutations, were also analyzed separately to see whether combining these patients was a valid procedure.

Figure 1

Schematic representation of the RET gene and flanking regions. In the upper part of the figure, the positions of the microsatellite markers used in this study are given. In the lower part, the RET gene structure is magnified and approximate positions of typed SNPs are shown.

Materials and methods

Studied patient population

DNA samples for this study were obtained, with written informed consent, from sporadic HSCR patients and their parents. These patients reside throughout the Netherlands and had been referred to clinical geneticists. Almost all our families (n=113) consist of two parents and the affected child (trios). For three patients, only one parent was available for the study, and one patient participated with his child and spouse. Of the patients, 64 had been screened by means of denaturing gradient gel electrophoresis (DGGE) and sequence analysis20 and found negative for RET mutations. These 64 were selected from an initial group of 75 sporadic HSCR patients, 11 of whom proved to carry a RET mutation. The remaining 53 patients did not undergo the screening. As controls, we used the nontransmitted haplotypes of unaffected parents, as in the Haplotype Relative Risk method.22, 23 The two nontransmitted haplotypes within each trio form a pseudocontrol genotype. Under the hypothesis of random mating in our study population, these pseudocontrol genotypes are expected to be similar to genotypes of population controls. We therefore used these genotypes to estimate the numbers of heterozygotes and homozygotes in the population and the corresponding odds ratios (ORs).
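The construction of a pseudocontrol genotype can be illustrated with a minimal sketch for a single locus. The function name and the genotype encoding (pairs of allele labels) are assumptions made for illustration only; the study itself worked with phased multilocus haplotypes, which this sketch does not attempt to reconstruct.

```python
def pseudocontrol_genotype(father, mother, child):
    """Return the two nontransmitted parental alleles at one locus.

    Each genotype is a pair of allele labels, e.g. ("A", "G").
    Returns a sorted tuple, or None if the trio is not Mendelian-consistent.
    Illustrative helper; not taken from the original study.
    """
    child_sorted = sorted(child)
    for f_idx in (0, 1):
        for m_idx in (0, 1):
            # Try each combination of one paternal and one maternal allele.
            if sorted((father[f_idx], mother[m_idx])) == child_sorted:
                # At a single locus, every consistent assignment leaves
                # the same pair of alleles behind, so the first match suffices.
                return tuple(sorted((father[1 - f_idx], mother[1 - m_idx])))
    return None


# Hypothetical trio: the child received A from each parent,
# so the nontransmitted alleles are G (father) and A (mother).
print(pseudocontrol_genotype(("A", "G"), ("A", "A"), ("A", "A")))
```

Pooling these pseudocontrol genotypes over all trios gives the control genotype counts used for the ORs, under the random-mating assumption stated above.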

Genotyping

In all patients and their relatives, we genotyped four microsatellite markers and nine single nucleotide polymorphisms (SNPs) in a 400 kb region flanking and containing the RET locus (Figure 1). The SNPs included two located in the promoter region of the gene (SNP-5; G>A, 202 bp upstream of the start codon, and SNP-1; C>A, 198 bp upstream of the start codon),24 and SNP rs741763 (G>C), located 4 kb upstream of the RET gene transcription start site. Two SNPs were located in intron 1 (IVS+6000A>C/rs2435362 and IVS1–126G>T/rs2565206; http://www.ncbi.nlm.nih.gov/SNP/), and the remaining SNPs were from exon 2 (c135G>A/A45A, rs1800858),25 exon 7 (c1296G>A/rs1800860),25 exon 14 (c2508C>T/S836S, rs1800862)25 and intron 19 (IVS19+47C>T).26 Of the four microsatellite markers, two were located upstream of RET, namely D10S1100 (300 kb upstream)27 and M353 (80 kb upstream);28 one, MRETint5,28 was located within intron 5; and one, s-TCL2,28 was located 38 kb downstream of the RET gene (Figure 1).

Genotyping of the microsatellite markers (D10S1100, M353, MRETint5, s-TCL2) was performed on an ABI 377 DNA sequencer (PE Biosystems). Data were analyzed using the Applied Biosystems software. The SNPs rs2565206, rs1800858, rs1800860, rs1800862 and IVS19+47C>T were genotyped by restriction enzyme digestion. In addition, the two SNPs in the RET promoter region, SNP-1 and SNP-5, were typed using the pyrosequencing method (Isogen Bioscience, Maarssen, The Netherlands). Table 1 lists the characteristics of all genotyped markers.

Table 1 Characteristics of the typed markers around and within the RET locus

Statistical analyses

As a quality control on the scoring of the genotypes, all markers were tested for Hardy–Weinberg equilibrium (HWE) before analysis. For this test, we only used DNA specimens from control individuals. If resulting P-values were smaller than 0.05, this was regarded as a sign of low quality of the genotyping data and the corresponding marker was discarded from further analyses.
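As an illustration of this quality-control step, the sketch below applies a one-degree-of-freedom chi-square goodness-of-fit test for HWE at a biallelic locus. The text does not state which HWE test was used, so the chi-square form is an assumption, the function name is hypothetical, and the genotype counts in the example are invented. Multiallelic microsatellite markers would need a generalized test.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium at a biallelic locus.

    Takes observed genotype counts and returns (chi2, p_value).
    Illustrative helper; the test variant is an assumption.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("locus is monomorphic; HWE cannot be tested")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts; a marker would be discarded if p < 0.05.
chi2, p_value = hwe_chi_square(30, 60, 35)
print(round(chi2, 3), round(p_value, 3))
```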

Linkage disequilibrium (LD) was evaluated in order to investigate the identity-by-descent status of similar haplotypes. If LD is strong, the probability is high that two similar haplotypes are identical by descent rather than similar by coincidence. The strength of LD was measured separately for the case and the control chromosomes using a randomization test that permutes alleles over the haplotypes to determine the significance of D′.
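The idea of a randomization test for D′ can be sketched as follows for two biallelic loci. This is a simplified illustration under stated assumptions: alleles are coded 0/1, the function names, the number of permutations and the example data are all hypothetical, and the software actually used for the analysis is not reproduced here.

```python
import random


def d_prime(haplotypes):
    """Lewontin's D' for two biallelic loci; haplotypes is a list of (a, b) with alleles 0/1."""
    n = len(haplotypes)
    p_a = sum(1 for a, _ in haplotypes if a == 1) / n
    p_b = sum(1 for _, b in haplotypes if b == 1) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else d / d_max


def permutation_p_value(haplotypes, n_perm=2000, seed=0):
    """Permute alleles of the second locus over the haplotypes and
    count how often |D'| is at least as large as the observed value."""
    rng = random.Random(seed)
    observed = abs(d_prime(haplotypes))
    first = [a for a, _ in haplotypes]
    second = [b for _, b in haplotypes]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(second)
        if abs(d_prime(list(zip(first, second)))) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical sample of 60 chromosomes in complete LD.
sample = [(1, 1)] * 30 + [(0, 0)] * 30
print(d_prime(sample), permutation_p_value(sample))
```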

For single-locus association analyses, the frequencies of alleles and of genotypes were compared between patients and controls using a χ2-test. When this test showed a significant result, a Z-test was performed to determine which allele was responsible for the difference. This test assumes that patients and controls are independent samples from the population and that the counts of a specific allele or genotype follow a binomial distribution that can be approximated by a normal distribution. Under the assumption of no assortative mating in the population, transmitted and nontransmitted haplotypes meet these criteria.
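A minimal sketch of a Z-test comparing one allele's frequency between patient and control chromosomes is given below. The pooled standard error and the two-sided p-value are assumptions, since the exact form of the test is not specified in the text; the function name and the counts in the example are hypothetical.

```python
import math


def two_proportion_z(k1, n1, k2, n2):
    """Two-sided Z-test for a difference between two proportions.

    k1 of n1 patient chromosomes and k2 of n2 control chromosomes carry the allele.
    Uses the pooled proportion for the standard error (an assumption).
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: allele on 150 of 220 patient and 55 of 230 control chromosomes.
z, p_value = two_proportion_z(150, 220, 55, 230)
print(round(z, 2), p_value)
```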

Transmission distortion of each allele versus all other alleles together was tested, and the results were combined in a multiallelic transmission/disequilibrium test.29 This test evaluates whether, among heterozygous parents, one or more alleles are transmitted to the affected child more often than the expected 50%.

The program TDTPHASE from the software package UNPHASED was used to estimate haplotype transmissions and to determine the corresponding haplotype risks resulting from overtransmission.30

ORs and 95% confidence intervals (CIs) were calculated without correction for external variables such as age at diagnosis or type of the disease.31
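For illustration, an unadjusted OR with a 95% CI can be computed from a 2×2 table as sketched below. The logit (Woolf) interval used here is an assumption, since the text refers to reference 31 without naming the method; the function name and the counts in the example are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and confidence interval from a 2x2 table.

    a: cases with the genotype, b: cases with the reference genotype,
    c: controls with the genotype, d: controls with the reference genotype.
    Uses the log-OR standard error (Woolf method, an assumption); all cells must be > 0.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from Table 3.
print(odds_ratio_ci(20, 10, 10, 20))
```

A Bonferroni-corrected interval would replace `z=1.96` with the quantile matching the corrected significance level.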

All P-values and CIs were corrected for multiple testing for independent tests, that is, for 13 markers and two groups of patients (all patients and those with unknown or without a RET mutation), using a Bonferroni correction. Owing to LD between markers in the region, this correction is overstated and the results are, therefore, conservative.
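The correction can be sketched as follows. The number of tests used in the example, 26 (13 markers times two groups), is an illustrative assumption; the table legends indicate that the actual count was based on the number of independent alleles with sufficient expected copies. The function names are hypothetical.

```python
from statistics import NormalDist


def bonferroni_p(p_values, n_tests):
    """Multiply each P-value by the number of tests, capped at 1."""
    return [min(1.0, p * n_tests) for p in p_values]


def bonferroni_z(n_tests, alpha=0.05):
    """Normal quantile for a two-sided CI at the Bonferroni-corrected level."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))


n_tests = 26  # assumed for illustration: 13 markers x 2 patient groups
print(bonferroni_p([1e-6, 0.003, 0.2], n_tests))
print(round(bonferroni_z(n_tests), 2))
```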

Results

HWE analysis

None of the markers showed a significant deviation from HWE in the sample of nontransmitted pseudogenotypes. Hence, further analyses were performed on all 13 marker loci.

Linkage disequilibrium

Strong LD is observed for all marker pairs both for the sample of case chromosomes and for the sample of control chromosomes (data not shown).

Association analyses

Table 2 shows that six markers (rs741763, SNP-5, SNP-1, rs2435362, rs2565206 and rs1800858) were strongly associated with the disease, with significant differences in the frequencies of particular alleles between patients and controls: for rs741763, the C allele was present on 81.5% of the patient chromosomes versus 58.3% of the control chromosomes (P=2.0 × 10−7); for SNP-5, the A allele was present on 66.7% of the patient chromosomes versus 22.5% of the control chromosomes (P=6.1 × 10−24); for SNP-1, the C allele was present on 81.2% of the patient chromosomes versus 58.0% of the control chromosomes (P=1.1 × 10−6); for rs2435362, the A allele was present on 67.6% of the patient chromosomes versus 26.2% of the control chromosomes (P=1.4 × 10−22); for rs2565206, the T allele was present on 84.7% of the patient chromosomes versus 67.1% of the control chromosomes (P=1.2 × 10−4); and for rs1800858, the A allele was present on 67.1% of the patient chromosomes versus 23.1% of the control chromosomes (P=2.2 × 10−24). Separate association analyses were performed on the patients known to be RET mutation negative and on those not screened for mutations, in order to exclude possible differences in allelic association between the two groups. Owing to the smaller number of cases, these analyses were less powerful than the analysis of the whole patient group. Nevertheless, significant associations were found for the same alleles at the same marker loci in both groups. Hence, we assumed that the group of patients not tested for RET mutations was similar to the group of patients negative for a RET mutation, and we treated our patient population as a whole in further analyses. Furthermore, we observed that a large proportion of our patients was homozygous for the associated alleles at the six marker loci that showed the strongest association with HSCR, whereas these homozygous genotypes were rare in controls (Table 3a).
This is particularly the case for SNP-5 allele A homozygotes, 52.0% (53/102) among patients versus 6.4% (8/125) among controls; rs2435362 allele A homozygotes, 53.7% (58/108) versus 10.0% (13/130); and rs1800858 allele A homozygotes, 54.3% (57/105) versus 7.1% (9/126). The ORs for the homozygous genotypes at these three SNPs ranged from 16.95 to 26.50, whereas the ORs for the heterozygous genotypes ranged from 2.48 to 2.93 (Table 3a).

Table 2 Frequencies of associated alleles in patients, subgroups and controls with, in brackets, P-values corrected for multiple testing using a Bonferroni correction for two independent subgroups and all independent alleles with an expected number of at least five copies
Table 3 ORs and 95% CIs for the six most significantly associated loci (a) separately and (b) combined in a haplotype

Transmission/disequilibrium test (TDT)

For the TDT, 113 case-parent triads were considered. Table 4 shows the results of the multiallelic TDT, described in detail for the specific associated alleles. Large differences are observed in the transmission of alleles at marker loci in the 5′ region of the RET gene. For rs741763, allele C was transmitted 70 times versus 20 times nontransmitted (P=9.3 × 10−6); for SNP-5, allele A was transmitted 97 times versus 14 times nontransmitted (P=2.3 × 10−13); for SNP-1, allele C was transmitted 63 times versus 17 times nontransmitted (P=1.8 × 10−5); for rs2435362, allele A was transmitted 100 times versus 14 times nontransmitted (P=5.3 × 10−14); for rs2565206, allele T was transmitted 48 times versus 18 times nontransmitted (P=0.015); and for rs1800858, allele A was transmitted 102 times versus 18 times nontransmitted (P=1.2 × 10−12). Allele transmissions in the two subgroups (RET-negative versus not RET screened) did not differ significantly from each other. These findings confirm the association results that we observed.
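For a biallelic SNP, the TDT reduces to a McNemar-type statistic on the transmitted and nontransmitted counts from heterozygous parents. The sketch below applies this simple biallelic form to the SNP-5 counts given above. It is an illustration under assumptions, not the multiallelic test used for Table 4: the function name is hypothetical, and the resulting P-value is uncorrected, so it is not expected to match the Bonferroni-corrected values reported here.

```python
import math


def tdt(transmitted, nontransmitted):
    """Biallelic TDT: chi-square (1 df) on transmissions from heterozygous parents."""
    chi2 = (transmitted - nontransmitted) ** 2 / (transmitted + nontransmitted)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# SNP-5, allele A: transmitted 97 times versus 14 times nontransmitted.
chi2, p_value = tdt(97, 14)
print(round(chi2, 2), p_value)
```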

Table 4 Transmissions of associated alleles in the entire sample and subgroups with P-values corrected for multiple testing for two independent (sub)groups and all independent alleles with a frequency of at least five copies using a Bonferroni correction

Haplotype analyses

Based on the results of the single-locus association analyses, we estimated transmission frequencies for the haplotypes consisting of the six most strongly associated marker loci (Table 5). Of the 23 haplotypes found, the most frequent among patients was the one consisting of the alleles that gave the strongest single-locus associations (rs741763, allele C; SNP-5, allele A; SNP-1, allele C; rs2435362, allele A; rs2565206, allele T; rs1800858, allele A): it accounted for 55.6% of the transmitted and 16.2% of the nontransmitted haplotypes. The corresponding relative risk associated with overtransmission of this haplotype was estimated to be 10.31. The ORs (Table 3b) for this haplotype reveal that the risk of developing HSCR is highly increased, in particular for homozygotes for the associated haplotype (OR=21.90; 95% CI 4.86–98.69), as compared with carriers of one associated haplotype (OR=2.21; 95% CI 0.66–7.39).

Table 5 Frequencies and relative risks (RR) of transmission of haplotypes consisting of the six most associated loci as estimated using the software package UNPHASED22

Figure 2 is a graphical representation of the 13-loci haplotypes after clustering for maximal similarity around rs2435362 in order to visualize the observed association. The most striking difference is the much higher frequency of long haplotypes in patients than in controls, in particular, the haplotypes that appear as a large red block in the patients (left grid) as compared to the smaller red block in the controls (right grid).

Figure 2

Graphical representation of the 13-loci haplotypes after clustering for maximal similarity around rs2435362. Haplotypes of patients are shown in the left half of the figure and those of the family-based controls in the right half. Each horizontal colored line represents one haplotype. On the x-axis, marker loci are indicated. Marker alleles are mapped to colors, white being a missing or phase-unknown allele. Different alleles at a locus have different colors. The colors are chosen such that only minimal changes in color appear from one marker locus to the next in haplotypes that are shared over a long distance. In the direction of the y-axis, haplotypes are clustered for the largest sharing around rs2435362. As a result, each block of a single color represents a preserved haplotype. Clustering around marker rs2435362 was chosen because this locus was the fourth of the six most strongly associated markers. The same clustering algorithm was applied to patient and control haplotypes. Only patient and control haplotypes with at least five phase-known alleles are displayed. The haplotype causing the difference between patients and controls is depicted as a red block in the middle of each grid.

Discussion

In 70–90% of the sporadic HSCR patients, no mutations in any of the nine known HSCR susceptibility genes are identified. This raises the question whether, despite the fact that no pathogenic mutations can be detected in the majority of sporadic patients, the known HSCR susceptibility genes are involved in the development of the disease, or whether as yet unknown genes are responsible. A possible involvement of RET, the major gene in HSCR, is suggested by several studies. Bolk et al18 showed that 11 of 12 families investigated were linked to RET. In six of them, however, no clear RET mutation could be identified in the coding and flanking intronic sequences, as checked by single-stranded conformation polymorphism (SSCP), DGGE and sequence analysis. This suggests an involvement of RET, although not through a clear RET mutation. Several association and haplotype studies also support this hypothesis.26, 32, 33 Conserved haplotypes could be constructed using the alleles identified for different markers in and around the RET gene.24, 34, 35, 36 Carrasquillo et al34 used the highest number of SNPs in and around the RET locus. They found that patients shared common haplotypes with markers in both the proximal and distal part of the gene without having a clear pathogenic mutation. This study was carried out on a Mennonite kindred, an isolated population in which HSCR occurs in one in 500 live births. Therefore, it is likely that their findings apply to this population only and that the associated haplotype is in LD with mutations that are most likely different from those present in other populations. In a recent study, Fitze et al36 showed that the haplotype ACA, comprising alleles from SNPs -5, -1 and rs1800858 (c135G>A), is overrepresented in their patient population (66.9% of 80 cases). Moreover, of 58 HSCR patients, all nonmutation carriers, 62.1% proved homozygous for a two-locus (SNP-5, SNP-1) haplotype.
Sancandi et al24 performed a haplotype analysis on a smaller group of HSCR patients (46 patients and 50 population-matched control individuals). They genotyped two SNPs lying close together in the promoter region (SNP-1 and SNP-5) and SNPs in exon 2 (A45A) and exon 13 (L769L). As relatives of the patients were not available for screening, real haplotypes could not be reconstructed. They estimated frequencies of haplotypes consisting of the markers in the promoter region and those in exons 2 and 13. Two of the haplotypes, differing only in the allele at the last marker, in exon 13, appeared to be much more frequent among HSCR patients (62% of the patient chromosomes versus 22% of the control chromosomes). Assuming that these SNPs are not causative variants themselves, Sancandi et al24 localized the unknown variant upstream of exon 2, although based on their results the region between exons 2 and 13 cannot be excluded. Borrego et al35 described a haplotype analysis on 103 HSCR patients and their parents. They genotyped three SNPs at the end of intron 1 (IVS1−1463T>C; IVS1−1370C>T; IVS1−126G>T) and seven in exons 2, 3, 7, 11, 13, 14 and 15. Significant haplotype frequency differences were found for the three SNPs in intron 1. From the whole pool of genotyped SNPs, they reconstructed one haplotype that was transmitted nine times and never nontransmitted. However, when they reconstructed a haplotype consisting solely of the three markers located at the end of intron 1, a haplotype spanning a region of 1.2 kb was observed that was far more common in HSCR patients (59.2%) than in controls (18.5%). Furthermore, based on extrapolation of the strength of LD at the observed SNPs, they suspected that LD would become stronger upstream of the SNPs in intron 1 and that the susceptibility variant should lie upstream, but probably still within intron 1.

All these studies suggest that the RET gene is involved in sporadic HSCR and that an ancestral mutation is likely to be located in the 5′ region of the gene. We typed 13 markers, including four microsatellites and nine SNPs within and flanking the gene, in order to better define the region in which this ancestral mutation might be located.

In agreement with previous studies, significant associations with HSCR were found for SNPs in the 5′ region of the RET gene. These were even stronger than those published previously. Six successive SNPs, namely rs741763, SNP-5, SNP-1, rs2435362, rs2565206 and rs1800858 (Tables 1 and 2), were found to be strongly associated with HSCR in our population, with the largest frequency increase being from 23.1% among controls to 67.1% among patients. The associated alleles at marker loci that were also genotyped in other studies24, 35, 36 were found to be the same (SNP-5, allele A; SNP-1, allele C; rs2565206, allele T; rs1800858, allele A). All of these associated alleles also showed a significant transmission distortion (Table 4). Furthermore, a large proportion of our patients proved to be homozygous for the associated alleles at these six marker loci, whereas these homozygous genotypes had a very low frequency in controls. We were able to reconstruct haplotypes consisting of these six SNPs and observed one that was very commonly transmitted to the patients (55.6%) and significantly under-represented among controls (16.2%). Homozygosity for this ancestral haplotype was observed in 50.5% of our patients versus 5.5% of our controls. It appears that homozygosity for this haplotype, most likely a European ancestral haplotype, confers a much higher risk of developing sporadic HSCR than heterozygosity; that is, the disease appears dosage-dependent with respect to a mutation on this ancestral haplotype.

The six most strongly associated SNPs are located in the interval spanning from the promoter region to the beginning of exon 2 (27 kb). The region of association may not extend far 5′ of the transcriptional start site of the RET gene, since the two microsatellite markers (M353, D10S1100) located upstream of the RET locus do not show significant association with HSCR. Nevertheless, we cannot exclude that the unknown mutation lies upstream of the promoter region, as microsatellite M353 is located 80 kb upstream of RET. Alleles of marker s-TCL2, which lies downstream of the gene, were associated when the patients were considered as a whole. However, when the two groups of patients (screened for RET and not tested for RET) were analyzed separately, neither an association between HSCR and an allele of this marker nor a transmission distortion was found.

We found very strong associations for three SNPs and, in particular, large differences in the numbers of homozygotes between patients and controls, giving ORs ranging from 16.95 to 26.50 (see Table 3). This might suggest that one or more of these SNPs are themselves causally involved in HSCR. For some of the SNPs included, this has indeed been proposed. For instance, the polymorphism in exon 2 might interfere with correct splicing,35 but experimental proof has not been presented. Polymorphisms in the introns have been suggested to disrupt the binding sites of regulatory proteins and thereby change gene transcription.35 Calculations indicated four possible binding sites for regulatory proteins. Again, this has not been proven experimentally. Fitze et al36 performed a functional study on basal RET promoter sequences carrying different haplotypes at loci -5 and -1. They found that expression of a reporter gene in NMB and Vi-856 cell lines is reduced under the RET promoter carrying the ‘AC’ haplotype. The expression was even lower with the ‘AA’ haplotype; however, this haplotype was not present in patients. They concluded that the -5A variant can alter RET promoter activity and modulate the HSCR phenotype. It should, however, be noted that expression of a gene in different cell lines might give contradictory results. Preliminary results of the Ceccherini group indeed show such differences (Ceccherini et al, paper in preparation). Sequence analysis of the region from the promoter up to intron 1 should reveal all possible SNPs that, when typed on the entire sample, might provide sufficient statistical information to identify the true causal mutation(s).

In conclusion, we observed a very strong association for several SNPs in the promoter and in the 5′ region of the RET gene. An increased risk of HSCR was most evident for patients homozygous for the associated alleles. The haplotype consisting of these markers showed similar results, indicating that a strong founder effect is present in our population. The alleles, and consequently the haplotype, found in our study are similar to those found by others who analyzed other European patient populations. The ancestral haplotype might therefore be very old, making it likely that most European HSCR patients share the same disease-associated variant(s). We therefore expect that these results will eventually allow us to identify the mutation, which appears to play a major role in the susceptibility to HSCR.