Introduction

The inflammatory bowel diseases (IBD), Crohn's disease (CD, MIM# 266600) and ulcerative colitis (UC, MIM# 191390), are chronic inflammatory diseases of the gastrointestinal tract with a peak age of onset in the second and third decades of life and result in symptoms of diarrhea, bleeding, and abdominal pain.1 CD is a transmural inflammatory process that can occur in any part of the gastrointestinal tract with areas of normal mucosa interspersed among areas of affected bowel. UC manifests as a superficial mucosal inflammation, which begins in the anus and rectum and progresses in a contiguous fashion to affect the entire colon in more than one-third of affected individuals. As CD and UC have many overlapping clinical features and there are no specific diagnostic tests, it can occasionally be difficult to distinguish the two, and in those instances a diagnosis of indeterminate colitis (IC) is applied.2

The pathogenesis of IBD is believed to involve exposure to an as yet unidentified environmental trigger in a genetically susceptible individual.3 The importance of genetic factors in the pathophysiology of IBD has been repeatedly established by the substantially increased relative risk to siblings of affected individuals and higher monozygotic versus dizygotic twin concordance rates.4, 5, 6, 7 Although the heritability of CD seems higher than that of UC, the available epidemiological and genetic data suggest that there are common susceptibility factors for both diseases, and it is likely that gene–gene and gene–environment interactions determine the specific phenotype.8 Although significant progress has been made in identifying genetic factors underlying IBD through replicated linkage (IBD1–IBD6),9, 10, 11, 12, 13, 14 only the susceptibility gene within the IBD1 locus has been conclusively identified. Three variants within the caspase recruitment domain-containing protein 15 (CARD15, MIM# 605956) gene15, 16, 17 have been consistently associated with the development of small bowel CD.17 There is important ethnic and geographic heterogeneity for CARD15 (NOD2): allele frequencies and odds ratios (OR) for the major variants differ slightly between Ashkenazi Jewish (AJ) and non-Jewish (NJ) populations, and in Asian individuals affected by CD, CARD15 (NOD2) variants are extremely rare and not associated with CD.17, 18, 19, 20

The IBD5 locus (MIM# 606348) on chromosome 5q31 is the second locus clearly demonstrated to confer increased risk for CD through replicated association. Following an initial finding of significant linkage in a Canadian population,14 subsequent studies identified a 250 kb risk haplotype within the IBD5 locus that is significantly associated with CD.21 As part of this effort, the region was sequenced in eight individuals and was intensively evaluated with dense genotyping in more than 250 trios, which led to the discovery of long segments of complete linkage disequilibrium (LD), described as ‘haplotype blocks’, which were separated by apparent ‘hotspots’ of recombination.22 This original association study described 11 haplotype-tagging SNPs in this region that were equally associated with CD. Important candidate genes located within and near this region include interferon regulatory factor 1 (IRF1); solute carrier family 22, member 4 (SLC22A4), also known as organic cation transporter 1 (OCTN1); solute carrier family 22, member 5 (SLC22A5), also known as organic cation transporter 2 (OCTN2); and prolyl 4-hydroxylase (P4HA2), among others. Since the initial identification of this risk haplotype, numerous studies have replicated the association with CD and a few have demonstrated an association between IBD5 and UC as well,23, 24, 25, 26, 27, 28 establishing this as one of a still limited number of bona fide associations in human complex disease genetics and in IBD specifically.

In 2004 Peltekova et al29 postulated that functional polymorphisms, L503F (rs1050152) and G-207C (rs2631367) in the SLC22A4 (OCTN1) and SLC22A5 (OCTN2) genes, respectively, comprised a two-locus risk haplotype that accounted for the association findings at IBD5. That functional polymorphisms in significant LD with each other in adjacent, highly homologous genes may act as a functional cassette to increase disease risk would represent a novel, important paradigm in complex disorders. In particular, SLC22A5 (OCTN2) transports carnitine, which plays a key role in mitochondrial transport of long chain fatty acids; furthermore, SLC22A5 (OCTN2)-deficient mice develop intestinal lymphocytic infiltration.

Statistical support for a relatively unique contribution of the two-locus OCTN haplotype was initially reported in a German cohort.28 Subsequent to this study, a number of independent populations have been evaluated with somewhat variable results.30, 31, 32 In particular, a study from Edinburgh demonstrated that although the OCTN1 and OCTN2 variants are associated with CD, this association is not independent of the background risk haplotype in the region.30 This study assessed not only the OCTN variants but also a number of surrounding SNPs from the list of SNPs comprising the IBD5 risk haplotype (IGR2096, IGR2198 and IGR2230). Utilizing these SNPs, the authors could not demonstrate independent association of the OCTN variants and suggested that the IGR2078 SNP used in other studies24, 25, 31, 32, 33 is one haplotype block removed from the OCTN variants and therefore was not truly indicative of an independent association (Figure 1). There is also significant inconsistency in the results of phenotypic analyses of the IBD5 locus, with associations described for ileal and perianal disease locations as well as for UC.24, 25, 31, 32, 33 To better evaluate localization of the IBD5 association, ethnic specificity of association, and specific association to subphenotypic groups, we performed genotyping of haplotype-tagging SNPs in the IBD5 region in a much larger North American cohort composed of 1879 IBD-affected offspring and their parents.

Figure 1
IBD5 map demonstrating the location of the SNPs genotyped and regional candidate genes.

Materials and methods

Population

This study was conducted by the NIDDK Inflammatory Bowel Disease Genetics Consortium (NIDDK IBDGC), which is comprised of six Genetic Research Centers (GRCs) located at the Broad Institute/Massachusetts Institute of Technology (J Rioux), Cedars Sinai Medical Center in Los Angeles (H Yang), University of Chicago (J Cho), Johns Hopkins University (S Brant), University of Pittsburgh (R Duerr) and the University of Toronto (M Silverberg), and a Data Coordinating Center (DCC) located at the University of Chicago. DNA samples and clinical data from patients with inflammatory bowel disease recruited for prior studies in IBD genetics by each GRC were utilized. The investigators listed for each site were responsible for ensuring documentation of the diagnosis of IBD as well as the additional clinical information that was collected. Institutional review board approvals were obtained from each of the sites involved in this study.

Patients considered eligible for this study were those with available DNA and clinical data from complete trio (a proband with two parents available for genotyping) and tetrad (affected siblings with two parents available for genotyping) families. A description of the study sample is shown in Table 1. The following clinical parameters were ascertained: diagnosis, age of diagnosis, gender, race, Hispanic origin, Jewish ethnicity, and disease location. The definitions of these categories were deemed to be similar among all sites, and the categories were therefore included in the analysis. Other disease parameters such as disease behavior, surgical history, and smoking were deemed to be insufficiently documented with consistency at all centers and were therefore not included for analysis. Race and Hispanic or Jewish origin were self-reported. There were not a substantial number of Hispanic or non-Caucasian samples, and thus the evaluation of these subgroups individually is not addressed in this study. All clinical variables were documented by review of medical records at each GRC.

Table 1 (a) Population description and (b) clinical characteristics

Genotyping

DNA from all individuals was genotyped at the Broad Institute of Harvard and MIT using the Sequenom MassArray 7K system and MALDI-TOF MS analysis. For this report, the following seven IBD5 SNPs were tested: IGR2096a_1 (rs12521868); IGR2198a_1 (rs11739135); IGR2230a_1 (rs17622208); SLC22A5/OCTN2 G-207C (rs2631367); SLC22A4/OCTN1 L503F (rs1050152); IGR3081a_1; and IGR3178a_1 (rs3844312). The genotypes for IGR3081a_1 deviated substantially from Hardy–Weinberg equilibrium, so IGR3081a_1 was dropped from subsequent analysis. Although such deviation is potentially a sign of non-log-additive association, IGR3081a_1 was less associated than four other markers, and no previous analysis of this association had suggested substantially non-log-additive association, so it appeared that this was simply an unreliable assay. The selected IBD5 markers were chosen as tags for the risk haplotype in the consecutive blocks in the maximally associated region as previously described,21, 22 with the specific addition of the two OCTN variants29 for comparison. The Sequenom platform used at the Broad Institute had been used to run more than 50 000 previous assays and was established to be more than 99.5% accurate in HapMap QC exercises. Genotyping was performed blinded to affection status and family information.
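The Hardy–Weinberg check that led to the exclusion of IGR3081a_1 can be illustrated with a minimal sketch. The text does not state which test or threshold was applied, so the one-degree-of-freedom chi-square test below, the function names and the genotype counts are assumptions chosen purely for illustration.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations (hypothetical helper, not from the paper)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def chi_square_p_value_1df(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Illustrative counts only: a marker with too few heterozygotes.
chi2 = hwe_chi_square(400, 100, 100)
print(chi2, chi_square_p_value_1df(chi2))
```

In family data such a check is usually run on unrelated founders (here, the parents), since offspring genotypes are not independent of them.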

Statistical analysis

Analysis was performed using the standard transmission disequilibrium test (TDT) (as implemented in Haploview v3.3). Results are reported with multiple affected siblings counted independently because (a) the present study was neither conditioned on nor pursuing linkage in these samples; (b) the modest genotype relative risk of the IBD5 risk haplotype, estimated at 1.3,34 suggests a minimal effect on allele sharing owing to this locus (λs <1.02; expected mean allele sharing <50.5%); and (c) given previous replications we fully expected to observe the effect, and the current study is motivated by evaluating subgroups and resolving location rather than establishing whether the factor is truly associated or not. All results were confirmed by running FBAT with the conservative empirical variance adjustment for the presence of affected siblings, and no meaningful differences were seen beyond a slight reduction in statistical significance owing to the conservative nature of the correction. The discordant allele test for affected–unaffected parent pairs described by Purcell et al35 was performed using Haploview v3.3. Conditional logistic regression was performed using WHAP (http://pngu.mgh.harvard.edu/~purcell/whap), which enables flexible stepwise testing with nested models compared using a likelihood-ratio test. HapMap public release 19 (Phase II data with more than 3.9 million total SNPs passing QC in the CEU sample) was used to identify untested SNPs highly correlated to the most associated SNPs in this study. Testing of subgroup specificity (eg, Jewish versus NJ) was performed by permuting subgroup labels at random 10 000 times; the proportion of random subgroups that showed a difference as great as or greater than the observed groups is reported as the significance of the observed subgroup difference.
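For readers unfamiliar with the TDT, the following minimal sketch shows the standard single-marker statistic computed from trios. It is not the Haploview implementation; the genotype coding (number of risk alleles, 0, 1 or 2), the function names and the example trios are assumptions made for illustration.

```python
import math


def count_transmissions(trios):
    """Count risk alleles transmitted (t) and not transmitted (u) by
    heterozygous parents. Each trio is (father, mother, child), with
    genotypes coded as the number of risk alleles (0, 1 or 2)."""
    t = u = 0
    for father, mother, child in trios:
        parents = (father, mother)
        n_het = sum(1 for g in parents if g == 1)
        # Homozygous parents always pass on g // 2 risk alleles.
        from_hom = sum(g // 2 for g in parents if g != 1)
        from_het = child - from_hom
        if not 0 <= from_het <= n_het:
            continue  # Mendelian inconsistency: skip the trio
        t += from_het
        u += n_het - from_het
    return t, u


def tdt(t: int, u: int):
    """McNemar-type TDT chi-square (1 df) and its asymptotic P-value."""
    if t + u == 0:
        raise ValueError("no informative transmissions")
    chi2 = (t - u) ** 2 / (t + u)
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical trios, for illustration only.
example = [(1, 0, 1), (1, 2, 1), (1, 1, 2), (1, 1, 1), (0, 2, 1)]
t, u = count_transmissions(example)
print(t, u, tdt(t, u))
```

Only heterozygous parents are informative, which is why trios with two homozygous parents contribute nothing to either count.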

Results

Association of IBD5 with CD

A description of the study sample is shown in Table 1. Utilizing the TDT, significant association between the six IBD5 markers tested and CD was found (Table 2). Importantly, whereas significant results were found for SLC22A4/OCTN1 L503F (rs1050152) and SLC22A5/OCTN2 G-207C (rs2631367), slightly stronger association was seen for marker IGR2096a_1 (rs12521868), which is located 30 kb telomeric from the end of IRF1 and 60 kb centromeric of SLC22A5 (OCTN2) (Table 2 and Figure 1). As a considerable number of affected parents were present in this data set, we incorporated a discordant allele test among the affected–unaffected parent pairs, which increased the significance of the replication, with the strongest result (P=0.00006) at IGR2096a_1 (Table 2).

Table 2 TDT results for CD

A small number of Toronto samples (n=60) were included in the original sample that identified the IBD5 association.21 Removal of these to define formal replication has very little impact (elevating the P-value from 0.00006 to 0.000075). For the remaining subgroup analyses these samples are included, as the analyses are evaluating previously unexplored hypotheses and thus inclusion of these samples is biased neither for nor against any specific outcome. Furthermore, a Breslow-Day test of heterogeneity showed no significant site-specific heterogeneity of association across the six sites. The estimated effect size observed in this data set is OR=1.21 (1.09, 1.36) (IGR2096a_1).
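The text does not state how the odds ratio and its interval were estimated. One common approach for transmission data, sketched here under that assumption, takes the ratio of transmitted to untransmitted risk alleles from heterozygous parents and builds a Wald interval on the log scale. The function name, the 95% level and the counts are hypothetical.

```python
import math


def tdt_odds_ratio(t: int, u: int, z: float = 1.96):
    """Odds ratio t/u with a Wald confidence interval on the log scale.
    t and u are transmitted and untransmitted risk-allele counts from
    heterozygous parents; z = 1.96 gives an approximate 95% interval."""
    if t <= 0 or u <= 0:
        raise ValueError("both counts must be positive")
    odds_ratio = t / u
    se_log = math.sqrt(1.0 / t + 1.0 / u)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from Table 2.
print(tdt_odds_ratio(550, 450))
```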

Evaluation of the association results stratified by Jewish and NJ ethnicity demonstrated that the IBD5 association with CD comes exclusively from the NJ CD population (Table 3). IGR2096a_1 (rs12521868) and SLC22A4/OCTN1 L503F (rs1050152) are uniformly the most associated variants in this sample. In fact, within the Jewish CD population, there was a small under-transmission of IBD5 risk alleles. This observed difference in transmission distortion is statistically significant (P=0.01) and not simply a factor of the smaller sample of Jewish families (ie, when ethnicity was permuted, only 1% of subgroups of equivalent size to the Jewish sample here would show the observed lack of transmission). As a result of this finding, subsequent subgroup and resolution analyses described here were performed exclusively in the NJ sample. The effect size among NJ only is estimated to be OR=1.30 (1.14, 1.49).
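The label-permutation logic behind this subgroup comparison can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run: each family is reduced to a pair of transmitted and untransmitted counts, the test statistic is the absolute difference in transmission proportion between the two groups (the study may have used a different or one-sided statistic), and all names and data are hypothetical.

```python
import random


def transmission_rate(families):
    """Proportion of risk alleles transmitted across (t, u) family counts."""
    t = sum(f[0] for f in families)
    u = sum(f[1] for f in families)
    return t / (t + u) if t + u else 0.5


def subgroup_permutation_p(families, labels, n_perm=10_000, seed=0):
    """Fraction of random relabelings whose between-group difference in
    transmission rate is at least as large as the observed difference."""
    rng = random.Random(seed)

    def difference(current_labels):
        group_a = [f for f, flag in zip(families, current_labels) if flag]
        group_b = [f for f, flag in zip(families, current_labels) if not flag]
        return abs(transmission_rate(group_a) - transmission_rate(group_b))

    observed = difference(labels)
    shuffled = list(labels)  # shuffling keeps the subgroup sizes fixed
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if difference(shuffled) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical data: 20 families in subgroup A, 80 in subgroup B.
families = [(0, 1)] * 12 + [(1, 0)] * 8 + [(1, 0)] * 50 + [(0, 1)] * 30
labels = [True] * 20 + [False] * 80
print(subgroup_permutation_p(families, labels, n_perm=2000))
```

Because the labels are shuffled rather than resampled, every permuted subgroup has the same size as the observed one, which is what allows the result to be separated from a simple sample-size effect.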

Table 3 TDT results stratified by ethnicity, disease location and age of diagnosis

When evaluated by disease location, we did not observe significant differences between the level of association to IBD5 in patients grouped by disease site (Table 3). When evaluating CD patients by age of diagnosis, both early onset and late onset cases showed significant association.

Localization of IBD5 association

We utilized conditional logistic regression approaches to evaluate which, if any, of the significantly associated IBD5 SNPs could fully explain the association to CD and therefore be potentially implicated as the causal polymorphism(s). We began by adopting the most associated SNP, IGR2096a_1 (rs12521868) as causal, and asked whether significant association was seen with any other SNP when controlling for the effect of this genetic variant. No other SNPs in this region were associated when IGR2096a_1 (rs12521868) was considered causal, indicating that this SNP by itself could explain all reports of association to the region and suggesting that the IBD5 association is due to a single factor (rather than a more complex combination of alleles or haplotypes). Similarly, SLC22A4/OCTN1 L503F (rs1050152) and IGR2198a_1 (rs11739135) were also found to adequately explain the association to all SNPs (Table 1, Figure 1).

The other three SNPs tested, including SLC22A5/OCTN2 G-207C (rs2631367), reported to influence, or to be in LD with an SNP that influences, SLC22A5 (OCTN2) expression, are rejected as causal under this approach. For example, if one conditions on the association to SLC22A5/OCTN2 G-207C (rs2631367), very significant association is still seen at IGR2096a_1 (rs12521868) (P<0.0002). We note that this does not at all compromise the observation that this SNP may alter the expression of SLC22A5 (OCTN2); the data here simply assert that this effect is not the causal factor involved in CD susceptibility. Peltekova et al29 raise the possibility that it is the combination of the SLC22A4/OCTN1 and SLC22A5/OCTN2 alleles that is the causal factor. However, these two SNPs are in very strong LD, and although there are a substantial number of chromosomes (6.5% in our study) that carry the reported SLC22A5/OCTN2 risk allele but not SLC22A4/OCTN1 503F, fewer than 1% of chromosomes carry 503F without the SLC22A5/OCTN2 risk allele. Thus, the set of chromosomes that carry the SLC22A4/OCTN1 503F risk allele is essentially equal to the set carrying both putative risk alleles, and therefore any postulated effects of the 503F allele or the combination of that allele with the SLC22A5/OCTN2 risk allele cannot be distinguished.

Examining LD among the three potentially causal SNPs and the three rejected SNPs shows, not surprisingly, that the three potentially causal variants are highly correlated with each other (all three pairwise r2 values >0.8) but less so with the SNPs excluded as being causal (all nine pairwise r2 values below 0.8). The recently released Phase II HapMap data provides an average of more than one SNP per kb throughout the IBD5 region and using this data, we identify 13 additional SNPs, spanning 243 kb that are highly correlated (r2>0.8) with the maximally associated SNP IGR2096a_1 (rs12521868). This list is far from comprehensive, as the Phase II HapMap has directly examined fewer than half of the common variants in the genome, but suggests that variants near P4HA2 and IRF1, in addition to the two OCTN genes are, from a statistical standpoint, equally likely to be the causal factor.
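The pairwise r2 measure used above to group the SNPs can be computed from phased haplotypes as in the following minimal sketch. The 0/1 allele coding, the function name and the example haplotypes are assumptions for illustration; the values are not drawn from the study or from HapMap.

```python
def ld_r_squared(haplotypes):
    """Squared allelic correlation between two biallelic SNPs.
    haplotypes is a list of (a, b) pairs, each allele coded 0 or 1."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return d * d / denom


# Hypothetical chromosomes: two SNPs that almost always travel together.
haps = [(1, 1)] * 38 + [(0, 0)] * 55 + [(0, 1)] * 6 + [(1, 0)] * 1
print(ld_r_squared(haps))
```

When r2 is close to 1, each SNP predicts the other almost perfectly, so association data alone cannot separate their effects, which is the situation described for the three potentially causal variants.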

IBD5 and UC

A modest but significant association is seen between UC and IBD5 variation, supporting the possible role of the IBD5 locus in UC (Table 3). Although subset analysis identified apparently stronger association between IBD5 and extensive UC (as compared to more distal disease) and early onset UC (data not shown), the modest relative risk of this association and the limited size of the UC sample preclude evaluation of any statistically meaningful subgroup differences until substantially larger study samples are examined.

Discussion

Although common alleles of modest effect play a very important role in the architecture of complex diseases and can be readily detected in some cases with modest sample sizes, it is now widely recognized that extremely large sample sizes are required in order to conclusively confirm these factors, resolve their location through fine-mapping and identify specific subphenotypes that are either preferentially associated or not-associated with disease. With the large sample size aggregated here, the current study has the opportunity to explore age at diagnosis, disease location and ethnicity in more detail than the previously published replication efforts and to evaluate the positional resolution of the IBD5 association.

Among complex disorders, the IBD5 association in CD represents one of the best replicated genomic regions, and therefore warrants comprehensive studies in large, well-phenotyped cohorts to define those population subsets and functional polymorphisms that most likely contribute to disease risk.23, 24, 26, 28 The report by Peltekova et al29 suggested that an independently associated two-allele risk haplotype of the SLC22A4 (OCTN1) and SLC22A5 (OCTN2) genes was responsible for the association findings of the IBD5 risk haplotype. The data evaluated in this large sample do not support the conclusion that these two SNPs can be assigned responsibility for the genetic susceptibility at IBD5.

In our study, the association of IBD5 to CD was unequivocally confirmed, and as the AJ population has a substantially higher incidence of CD and has been reported to have a potentially different allelic architecture at the CARD15 (NOD2) risk locus, we sought to examine whether the evidence for association was similar in Jews and non-Jews. Strikingly, the 340 Jewish trios show no evidence for association to IBD5, and in fact demonstrate a slight undertransmission of the risk alleles, whereas the 986 NJ trios show strong association (OR >1.3), consistent with a recent meta-analysis of this risk haplotype.30 We have therefore found statistically meaningful (after permutation testing) evidence that the IBD5 risk factor, although present at comparable frequencies in AJ and non-Jews alike, may constitute a CD risk factor in the NJ population only. The AJ population likely reflects a unique risk group, and although overall genetic similarity would suggest that most risk factors should be shared with NJ individuals, it is conceivable that there will also be risk alleles specific to each. Although variable LD with a true causal factor can lead to apparent differences in association, this seems unlikely given the general similarity in LD patterns among common variants between AJ and NJ populations.19 Alternatively, an interaction that requires the presence of a remote allele or mutation, which itself is at different frequencies among populations, can give rise to differences in association even when the causal variant is typed and equifrequent among the populations. Studies in other population groups having distinct patterns of LD, such as in African cohorts, may provide additional insight and aid in further resolution of the effect; however, given the relatively modest significance of the observation and the overall genetic similarity between AJ and NJ Caucasians, the effect observed here should be replicated first.

In the context of this study, we also evaluated the relationship of the IBD5 locus to disease location. There has been considerable discrepancy in correlation between IBD5 and phenotype with ileal, ileocolonic, and perianal disease sites all described.23, 24, 26, 28, 31, 33 Association to IBD5 is quite strong in ileal disease with and without colonic involvement and is generally comparable in all phenotypic categories in this data set.

Having explored ethnic and phenotypic specificity of IBD5, we then sought to utilize the large sample, in conjunction with the identification of this higher penetrance subgroup, to evaluate whether the reported OCTN1 or OCTN2 variants were actually the most significantly associated variants in this region and, more generally, whether resolution in this extended region of LD was possible. Using stepwise regression, we identified three SNPs (IGR2096a_1, IGR2198a_1, and SLC22A4/OCTN1 L503F) that were each adequate individually to explain the association to CD in full; that is, if any of these is considered the only causal SNP, no residual association is observed at any other SNP. Therefore, these three SNPs, and by extension SNPs very highly correlated to these SNPs observed on the HapMap, can be considered statistically equivalent in terms of their association to CD. As the list of maximally associated variants does include the OCTN1 L503F variant, functional evaluation of this variant is certainly warranted. However, the extensive LD in this region suggests that caution must be exercised in asserting which of the SNPs in this region is potentially causal. Several conclusions can be drawn from these data. Given the extensive LD, it is likely that many other SNPs in this region might also demonstrate similar results and therefore be considered potentially causal. Interestingly, whereas the SLC22A4 variant (but not the SLC22A5 promoter variant) is included in this group, there is no genetic evidence that SLC22A4 is any more associated than the other IBD5 tagging SNPs, and in fact the distal IGR2096a_1 SNP tested is the most associated SNP and appears uniformly on all overtransmitted haplotypes. These data confirm and extend, in a much larger sample size, prior findings described in the IBD population from Edinburgh.30

Utilizing this large data set, we also examined the reproducibility of previously reported findings regarding UC and IBD5.24, 25, 33, 36 There was modestly significant association between IBD5 and UC in this population. Despite prior findings that the CARD15 (NOD2) and IBD5 loci interact to enhance this association,25 we found no evidence to support an interaction between the two (data not shown). Although these results require further investigation in larger sample sizes, it is not unexpected that there may be some effect of IBD5 on the development of UC. There is an increased risk of UC developing in the relatives of those with CD, and the reverse is also true, suggesting that there are genes which confer overlapping susceptibility.8 In addition, the clinical heterogeneity of the two disorders could result in misspecification of disease type in those with colonic involvement. Moreover, it has been previously noted that there are subsets of CD which have a more ‘UC-like’ appearance, raising further suspicion of overlapping genetic susceptibility factors.37 One might also speculate on the possibility of an ‘IBD5 positive’, genetically and clinically distinct subset of IBD patients, which would require further testing to be better characterized.

Further advancing the understanding of the IBD5 locus and its role in IBD susceptibility will require even larger population studies with adequate power to dissect the high degree of LD found in this region, in conjunction with functional studies that demonstrate mechanisms. These studies should focus on NJ populations given the findings described here. In addition, the findings in UC illustrate the importance of better powered UC genetics studies to confirm and extend the results reported here. Owing to the modest but important genetic relative risk of this locus, large consortia with prospectively collected, well-phenotyped cohorts will be required in order to refine the information regarding this region.