Original Article

Genes and Immunity (2012) 13, 245–252; doi:10.1038/gene.2011.79; published online 15 December 2011

Amino acid position 11 of HLA-DRβ1 is a major determinant of chromosome 6p association with ulcerative colitis

J-P Achkar1,2,12, L Klei3,12, P I W de Bakker4,5,6,12, G Bellone7, N Rebert2, R Scott8, Y Lu9, M Regueiro8, A Brzezinski1, M I Kamboh10, C Fiocchi1,2, B Devlin3,10, M Trucco9, S Ringquist9, K Roeder7,11,13 and R H Duerr8,10,13

  1. Department of Gastroenterology and Hepatology, Digestive Disease Institute, Cleveland Clinic, Cleveland, OH, USA
  2. Department of Pathobiology, Lerner Research Institute, Cleveland Clinic, Cleveland, OH, USA
  3. Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  4. Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
  5. Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  6. Julius Center for Health Sciences and Primary Care and Department of Medical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
  7. Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA
  8. Division of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
  9. Division of Immunogenetics, Department of Pediatrics, Children's Hospital of Pittsburgh of UPMC, Pittsburgh, PA, USA
  10. Department of Human Genetics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, PA, USA
  11. Lane Center for Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA

Correspondence: Dr RH Duerr, Division of Gastroenterology, Hepatology and Nutrition, Department of Medicine, University of Pittsburgh School of Medicine, S704 Biomedical Science Tower, 3500 Terrace Street, Pittsburgh, PA 15261, USA. E-mail: duerr@pitt.edu

12 These authors contributed equally to this work.

13 These authors contributed equally to this work.

Received 22 August 2011; Revised 12 October 2011; Accepted 31 October 2011
Advance online publication 15 December 2011

Abstract

The major histocompatibility complex (MHC) on chromosome 6p is an established risk locus for ulcerative colitis (UC) and Crohn's disease (CD). We aimed to better define MHC association signals in UC and CD by combining data from dense single-nucleotide polymorphism (SNP) genotyping with imputation of classical human leukocyte antigen (HLA) types, their constituent SNPs and the corresponding amino acids in 562 UC, 611 CD and 1428 control subjects. Univariate and multivariate association analyses were performed, controlling for ancestry. In univariate analyses, absence of the rs9269955 C allele was strongly associated with risk for UC (P=2.67 × 10⁻¹³). rs9269955 is a SNP in the codon for amino acid position 11 of HLA-DRβ1, which is located in the P6 pocket of the HLA-DR antigen-binding cleft. This was also the most significantly UC-associated amino acid position in omnibus tests (P=2.68 × 10⁻¹³). Multivariate modeling identified rs9269955-C and 13 other variants as the best set of predictors of UC vs control status. In contrast, there was only suggestive evidence of association between the MHC and CD. Taken together, these data demonstrate that variation at amino acid position 11 of HLA-DRβ1, in the P6 pocket of the HLA-DR antigen-binding cleft, is a major determinant of the chromosome 6p association with UC.

Keywords:

inflammatory bowel disease genetics; major histocompatibility complex; ulcerative colitis

Introduction

The major histocompatibility complex (MHC) on chromosome 6p contains the highly polymorphic human leukocyte antigen (HLA) genes and other immunoregulatory genes.1, 2 Genetic variants in the MHC have been associated with susceptibility to many infectious and immune-mediated diseases, including the inflammatory bowel diseases (IBDs) ulcerative colitis (UC) and Crohn's disease (CD).3, 4 Features of the MHC such as dense gene clustering with broad linkage disequilibrium, extensive polymorphism and heterogeneity among populations have made localization of causal variants challenging.2

HLA polymorphisms were the focus of several IBD candidate gene association studies of relatively small sample size, and meta-analyses of these studies found HLA associations in UC that were mostly different from those found in CD.3, 4, 5 Subsequently, genome-wide linkage scans found linkage between IBD and the IBD3 locus on chromosome 6p.6, 7, 8 Recent genome-wide association studies (GWAS) have confirmed the MHC as one of 47 UC loci and 71 CD loci with significant evidence for association (P<5 × 10⁻⁸).9, 10 In a recent meta-analysis of six UC GWAS that included 6687 UC cases and 19718 controls of European ancestry, the most significant association signal was at a single-nucleotide polymorphism (SNP) in the MHC class II region (rs9268853, P=1.35 × 10⁻⁵⁵).10 In contrast, the most significant MHC association signal in a meta-analysis of six CD GWAS of similar combined sample size (6333 CD cases and 15056 controls) was less significant than the UC signal and was located in the MHC class III region near the lymphotoxin A locus (rs1799964, P=3.98 × 10⁻¹¹).9, 10

Here, we explore the MHC association signal in the discovery stage of a new UC and CD GWAS with excellent coverage (>10000 SNPs) across the extended MHC. We used our MHC SNP data and an existing reference data set to impute classical HLA alleles, their constituent SNPs and the corresponding amino acids in our UC, CD and control samples. This allowed us to evaluate whether the observed SNP associations in the MHC can be explained by variation specifically in the classical HLA genes.

Results

Analysis of genotyped MHC SNPs in IBD

First, we tested 10347 genotyped SNPs in the MHC region (chromosome 6, 29299–33884 kilobases (kb), National Center for Biotechnology Information (NCBI)36/hg18 coordinates) for association with UC and with CD with ileal involvement. Among 35 SNPs that reached genome-wide significance (P<5 × 10⁻⁸) in the UC analysis, the most significant was rs2647025 (odds ratio (OR)=1.95, 95% confidence interval (CI)=(1.62–2.35) for the G allele; P=1.94 × 10⁻¹²), located in the promoter region of HLA-DQB1 (Figure 1a). This SNP is correlated with rs9268853 (r²=0.63 in HapMap 3-CEU¹¹), the MHC region SNP with the most significant association in a recent UC GWAS meta-analysis,10 and with rs2395185 (r²=0.60 in our data set), the MHC region SNP with the most significant association in the NIDDK IBD Genetics Consortium UC GWAS,12 even though both SNPs lie more than 200 kb from rs2647025.
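The r² values above quantify linkage disequilibrium between SNP pairs. HapMap r² is estimated from phased haplotypes; a common unphased approximation, used here only for illustration, is the squared Pearson correlation between individuals' allele dosages (0, 1 or 2 copies). The function name and the genotypes below are hypothetical.

```python
from statistics import fmean


def dosage_r2(x, y):
    """Squared Pearson correlation between two allele-dosage vectors.

    An unphased proxy for LD r^2, not the haplotype-based HapMap estimate.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("monomorphic SNP: r^2 is undefined")
    return sxy * sxy / (sxx * syy)


# Hypothetical dosages for six individuals (not study data).
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 2, 1, 0, 2]  # identical genotypes
snp_c = [2, 1, 0, 1, 2, 0]  # counts the other allele: r^2 is still 1
print(dosage_r2(snp_a, snp_b), dosage_r2(snp_a, snp_c))
```

Because r² squares the correlation, it does not depend on which allele is counted, which is why SNPs can be highly correlated whether risk alleles align or are reversed.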

Figure 1.

MHC regional association plots for UC. (a) Association results for genotyped SNPs from the Illumina HumanOmni1-Quad BeadChip. The intensity of the red shading indicates the strength of the pairwise r² correlation to the most associated SNP, rs2647025. (b) Association results for all polymorphic nucleotide positions within the region of peak association in (a). Horizontal lines represent the classical HLA alleles in this region. The intensity of the red shading indicates the strength of the pairwise r² correlation to the most associated SNP marker, rs9269955-C. (c) Association results for imputed amino acids in HLA-DRβ1.

In contrast, there was only suggestive evidence for association between MHC region SNPs and CD with ileal involvement (Figure 2). The most significant association signal was at rs17880124 (OR=2.23, 95% CI=(1.52–3.27) for the G allele; P=3.82 × 10⁻⁵), located in an exon of the MHC class I polypeptide-related sequence A gene. Of note, the association observed in UC was many orders of magnitude stronger than that in CD with ileal involvement, despite a similar number of cases. We therefore focused on the UC signal, through imputation of classical HLA alleles and their corresponding nucleotide and amino acid sequences.

Figure 2.

MHC regional association plot for CD with ileal involvement. Association results are for genotyped SNPs from the Illumina HumanOmni1-Quad BeadChip. The intensity of the red shading indicates the strength of the pairwise r² correlation to the most associated SNP, rs17880124.

Analysis of imputed classical HLA alleles in UC

The following imputed genetic markers were included in our UC vs control analyses: 156 classical HLA alleles at four-digit resolution, 95 classical HLA allele groups at two-digit resolution, 1765 binary SNP features at 1573 nucleotide positions and 561 binary HLA amino acid features at 357 amino acid positions. The most significant association signal in UC mapped to rs9269955 (Figure 1b), a tri-allelic SNP within the coding region of HLA-DRB1 (position 32660116, NCBI36/hg18 coordinates). Together with the directly adjacent nucleotide position (rs17878703, position 32660115), rs9269955 determines the codon for amino acid position 11 of the HLA-DRβ1 protein, at which six different amino acids are observed in the population at large (Table 1). Chromosome 6 position 32660114 is the third position in this codon and is not known to be polymorphic. rs9269955-C (denoting presence of the C allele) is associated with protection against UC (OR=0.51, 95% CI=(0.43–0.61), P=2.67 × 10⁻¹³). In combination with the adjacent rs17878703 alleles, rs9269955-C encodes three of the six observed amino acids (aspartic acid, valine or glycine) at HLA-DRβ1 amino acid 11 (Table 1). This SNP is correlated with rs2395185 (r²=0.88 in our data set), the MHC region SNP with the most significant association in the NIDDK IBD Genetics Consortium UC GWAS.12
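A short illustration of the codon logic described above. The coordinates place the first codon position at 32660116 (rs9269955), the second at 32660115 (rs17878703) and the third at 32660114, so HLA-DRB1 is read toward decreasing coordinates, that is, from the reverse strand. Assuming the SNP alleles are reported on the forward strand, rs9269955-C then corresponds to G at the first codon position. This is consistent with the text: in the standard genetic code GTN encodes valine, GAT/GAC aspartic acid and GGN glycine, whereas a coding-strand C would begin codons for leucine, proline, histidine or arginine. The fixed third base (T) and the second-position alleles used below are hypothetical; the actual alleles are listed in Table 1.

```python
# Standard genetic code; bases indexed in the order T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def translate(codon):
    """Translate one coding-strand codon (e.g. 'GTT') to a one-letter amino acid."""
    i, j, k = (BASES.index(base) for base in codon.upper())
    return AMINO[16 * i + 4 * j + k]


def codon_from_forward_alleles(first, second, third="T"):
    """Coding-strand codon for a gene read from the reverse strand.

    first/second: forward-strand alleles at codon positions 1 and 2
    (here 32660116 and 32660115). third: coding-strand base at position 3,
    a hypothetical fixed value used only for illustration.
    """
    return COMPLEMENT[first] + COMPLEMENT[second] + third


# Forward-strand C at position 1 combined with hypothetical position-2 alleles.
for second in "ATC":
    codon = codon_from_forward_alleles("C", second)
    print(second, codon, translate(codon))
# A GTT V
# T GAT D
# C GGT G
```

With a third base of C instead of T the same three residues result, since GTC, GAC and GGC also encode valine, aspartic acid and glycine.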


To analyze the role of specific amino acid positions in the HLA genes in UC, we conducted an omnibus test of association at each polymorphic amino acid position, with degrees of freedom equal to the number of distinct residues at that position minus one (Table 2). The most significant finding was for HLA-DRβ1 amino acid 11 (P=2.68 × 10⁻¹³), consistent with the results noted above (Figure 1c). Several other amino acid positions were also highly significant, including other positions in HLA-DRβ1, HLA-DQα1 and HLA-DQβ1 (Table 2).


As these results highlighted HLA-DRβ1 amino acid 11, we further analyzed the six amino acids at this position and the corresponding classical HLA-DRB1 allele groups at two-digit resolution (Table 3). The three amino acids encoded by the rs9269955-C allele in combination with the adjacent rs17878703 alleles (aspartic acid, valine and glycine) are all associated with protection against UC.


Among 28 imputed classical HLA-DRB1 alleles tested at four-digit resolution, three were significantly associated with UC: DRB1*15:01 (OR=1.59, 95% CI=(1.31–1.93), P=3.68 × 10⁻⁶), DRB1*01:03 (OR=38.39, 95% CI=(7.50–196.60), P=1.20 × 10⁻⁵) and DRB1*07:01 (OR=0.61, 95% CI=(0.48–0.77), P=3.38 × 10⁻⁵).
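ORs like these are conventionally obtained from a logistic-regression coefficient β and its standard error as OR = exp(β), with 95% CI = exp(β ± 1.96·SE). The text does not state the interval type, so a Wald interval is assumed here. Such intervals are symmetric on the log scale, so lower × upper ≈ OR², which the reported values satisfy up to rounding (for DRB1*01:03, 7.50 × 196.60 ≈ 1474.5 and 38.39² ≈ 1473.8); the very wide DRB1*01:03 interval therefore reflects a large standard error of the log OR. The sketch below back-calculates that standard error from the reported interval; the function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard-normal quantile


def odds_ratio_ci(beta, se, z=Z95):
    """OR and Wald confidence interval from a log-odds coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lo, hi, z=Z95):
    """Back-calculate the log-scale standard error from a reported Wald CI."""
    return (math.log(hi) - math.log(lo)) / (2 * z)


# Reported DRB1*01:03 values from the text: OR=38.39, 95% CI 7.50-196.60.
se = se_from_ci(7.50, 196.60)
odds, lo, hi = odds_ratio_ci(math.log(38.39), se)
print(f"SE(log OR) ~ {se:.2f}; reconstructed CI {lo:.2f}-{hi:.2f}")
```

The reconstructed interval matches the reported one only to within rounding, because the published OR and limits are themselves rounded to two decimals.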

As the above findings highlighted the HLA-DRB1 association in UC, we then evaluated the quality of our classical HLA-DRB1 allele imputation at two-digit resolution by genotyping HLA-DRB1 with sequence-specific oligonucleotide probes and by next-generation sequencing of genomic DNA from 384 of our study subjects. This analysis showed that our imputation procedure was 98.8% accurate (see Supplementary Materials).

We next determined the most parsimonious model to explain the association of HLA-DRβ1 amino acid 11 with UC, using forward stepwise model selection over the six observed amino acids. The best model included only three of the six amino acids: valine, glycine and aspartic acid (Table 3). The overall P-value for this best model was 3.60 × 10⁻¹³, compared with 2.68 × 10⁻¹³ for the full model that included all six amino acid alleles, suggesting that most of the UC association signal at this position can be accounted for by these three amino acids alone. Of note, valine, glycine and aspartic acid are the same three amino acids encoded by the most significant SNP allele, rs9269955-C, in combination with the adjacent rs17878703 alleles. This agreement between the two analytic approaches provides internal validation and highlights that variation at HLA-DRβ1 amino acid 11 explains much of the HLA association with UC.

UC vs control best multivariate model

When we conditioned on either rs9269955-C or the HLA-DRβ1 amino acid 11 variants, residual UC vs control association signals remained, attributable to other variants in the HLA region. This finding is consistent with prior observations that multiple independent association signals exist in the MHC in UC. We used forward stepwise model selection to choose the best set of markers to predict UC (Table 4). This best model has an overall P-value of 4.28 × 10⁻⁴⁰ and includes rs9269955-C and 13 other markers spanning the chromosome 6 region from 29.45 to 33.81 megabases (Mb).


UC vs CD with ileal involvement best multivariate model

To compare HLA associations between UC and CD with ileal involvement, we performed an analysis using UC subjects as cases and subjects with CD with ileal involvement as controls. We first performed association analyses for all markers in our study and then applied stepwise model selection to determine the best model for the UC vs CD with ileal involvement comparison (Table 5a). The selected model included 11 markers and had an overall P-value of 4.48 × 10⁻³³. Not unexpectedly, none of these markers overlapped with those chosen for the UC vs control best model described above (Table 4).


We then used the 11 markers from the UC vs CD with ileal involvement best model to perform two further analyses: UC vs control and CD with ileal involvement vs control (Tables 5b and c). The model P-value for UC vs control was 1.59 × 10⁻¹⁹, less significant than the P-value of 4.28 × 10⁻⁴⁰ for the unrestricted UC best model (Table 4). The model P-value for CD with ileal involvement vs control was 1.42 × 10⁻⁵. Comparing the ORs for each marker reveals divergent effects in the UC vs control and the CD with ileal involvement vs control analyses.

Discussion

The MHC locus shows the strongest evidence for association with UC among 47 well-established UC loci identified in a GWAS meta-analysis,10 and is also one of 71 well-established CD loci identified by GWAS meta-analysis.9 To better understand MHC association signals in UC and CD, we used dense MHC SNP data from the discovery stage of an ongoing new UC and CD GWAS to impute classical HLA types, their constituent SNPs and the corresponding amino acids, and we performed detailed analyses of the genotyped and imputed data.

Our univariate tests of binary SNP and SNP allele markers, and our omnibus tests of polymorphic HLA amino acid positions, both highlighted amino acid position 11 of HLA-DRβ1 as the MHC feature most significantly associated with UC. The C allele of rs9269955 was the SNP allele most significantly associated with UC (its presence is associated with protection against UC and its absence with risk). In combination with the immediately adjacent SNP, it encodes valine, glycine or aspartic acid at HLA-DRβ1 amino acid 11, all of which were associated with protection against UC. Furthermore, in multivariate analysis, the most parsimonious model to explain the association with UC at amino acid 11 consisted of valine, glycine and aspartic acid as the only terms.

HLA-DRB1 is extensively polymorphic, with 928 alleles encoding 704 proteins (International Immunogenetics Information System/HLA Database: http://www.ebi.ac.uk/imgt/hla).13 Valine at amino acid 11 corresponds to the common DRB1*04 (DR4) or the lower-frequency DRB1*10 (DR10) allele groups, glycine to DRB1*07 (DR7) and aspartic acid to DRB1*09 (DR9). The HLA-DR4, -DR7 and -DR9 allele groups were associated with protection against UC in a meta-analysis of prior studies.3 They almost always occur on haplotypes carrying the HLA-DRB4 gene, which encodes the DR53 antigen, and HLA-DRB4*01:01 has been associated with protection against UC in Japan.14 In addition, consistent with the previously reported association of HLA-DR2 with risk for UC,3, 5 we observed that proline at position 11 of HLA-DRβ1 is associated with risk for UC. Based on the complementary findings from our different analyses and their agreement with results from prior studies, we conclude that variation at amino acid position 11 of HLA-DRβ1 is a major determinant of the chromosome 6p association with UC.

The potential biological significance of the UC association at amino acid position 11 relates to the peptide-binding specificity of HLA class II molecules and their role in antigen presentation to T cells.15, 16 The three-dimensional structure of the class II HLA-DR1 heterodimer (DRA/DRB1*0101) has been well characterized, and its peptide-binding groove has been shown to be shaped by polymorphic residues that form nine pockets with different chemical and size characteristics.15, 17 In one of these pockets (P6), amino acid position 11 appears to be the only variable residue and thus determines the binding specificity of that pocket.18 Of note, hydrophobic amino acid residues at DRβ1 amino acid 11 were found to be associated with protection against sarcoidosis.19 This finding suggests that hydrophobic interactions could affect peptide binding in the P6 pocket.19 We therefore hypothesize that variation at amino acid position 11 of HLA-DRβ1 could affect peptide binding in the HLA-DR antigen-binding cleft in a way that alters risk for UC.

It is important to note that the MHC association signal in UC is complex and not completely explained by amino acid position 11 of HLA-DRβ1. Our forward stepwise model selection identified 13 other terms besides rs9269955-C. This model is highly significant, with an overall P-value of 4.28 × 10⁻⁴⁰, but it will need to be validated in additional large cohorts.

Our model also included another missense SNP allele in HLA-DRB1, the T allele of rs1136759. rs1136759 and two adjacent flanking SNPs encode variation at HLA-DRβ1 amino acid 13, which is located in the P4 pocket of the HLA-DR antigen-binding cleft. That two of the terms in the best model for prediction of UC risk relate to the HLA-DR antigen-binding cleft emphasizes the probable importance of HLA-DRB1 in the pathogenesis of UC. Four other MHC class II locus variants, namely SNPs in HLA-DQB1 (rs1130380-C) and HLA-DRA (rs3135391), between HLA-DQA1 and HLA-DQB1 (rs9273363), and between HLA-DQA2 and HLA-DQB2 (rs6933763), were associated with UC in our multivariate model. The HLA-DRB, -DQB and -DPB genes are all highly polymorphic and encode the β-chains of the class II αβ heterodimer, while the α-chains are encoded by the HLA-DRA, -DQA and -DPA genes.4

Three polymorphisms in MHC class III loci (rs440454, rs28435656 and rs915654) were included as terms in our UC vs control model. The MHC class III region is one of the most gene-dense regions in the human genome. Two of these SNPs, rs440454 and rs28435656, are in linkage disequilibrium (r²=0.54 in HapMap 3-CEU¹¹) and located in an MHC class III segment that contains four genes within 30 kb, including superkiller viralicidic activity 2-like and RD RNA-binding protein.20 rs440454 is in perfect linkage disequilibrium (r²=1.0 in HapMap 3-CEU¹¹) with rs419788, a SNP associated with risk for lupus.21 rs28435656 is located in the complement component 2 gene, which lies immediately adjacent to the region containing superkiller viralicidic activity 2-like and RD RNA-binding protein. Finally, rs915654 is located 5′ of the lymphotoxin A locus, which has been associated with CD and diabetes.22 Together, these findings suggest a role for MHC class III genes in UC pathogenesis that warrants further investigation.

Another association of potential pathogenic interest in our UC vs control model is rs2844677, a synonymous SNP in the coding region of the mucin 21, cell surface associated (MUC21) gene. MUC21 is a recently identified gene that is expressed in normal colon, among other tissues, and encodes a transmembrane mucin involved in cell adhesion.23, 24

In the last part of our analysis, we compared MHC region association signals between UC and CD with ileal involvement. Each of the 11 markers in the UC vs CD with ileal involvement model had ORs in opposite directions for the two IBD phenotypes. Together with our initial association analyses, in which the most significant associations in UC differed from those in ileal CD, this demonstrates that the association signals for UC and ileal CD are quite different. This conclusion agrees with prior studies showing that the only consistent associations with risk for both UC and CD have been for HLA-DRB1*01:03 and HLA-B52.3, 4 In contrast, alleles of the HLA-DR2 split antigen DR15 have been associated in opposite directions, with HLA-DRB1*15:01 associated with protection against CD and HLA-DRB1*15:02 associated with increased risk for UC.3, 5

In summary, we have performed detailed analyses to better understand MHC association signals in UC and CD. Our most significant finding is that variation at amino acid position 11 of HLA-DRβ1, the only variable amino acid in the P6 pocket of the HLA-DR antigen-binding cleft, explains a substantial portion of the MHC association signal and corresponds with several previously established classical HLA class II associations in UC. Variation at this position may affect peptide binding and thereby alter the immune activation that underlies protection against UC. We have also developed a novel multivariate model that further defines the contribution of MHC variation to risk for UC and highlights other genes of potential importance in UC pathogenesis. Finally, our multivariate modeling suggests that MHC polymorphisms have different effects in UC and CD.

Subjects and methods

Study subjects

Our study sample included 574 UC, 630 CD with at least ileal involvement and 1508 control subjects of European ancestry who were recruited for genetic studies at the Cleveland Clinic or the University of Pittsburgh under institutional review board-approved protocols. All subjects provided written informed consent. IBD diagnoses and disease location were confirmed by IBD physicians through review of primary medical records using standard endoscopic, radiographic and histologic criteria.

Genotyping and quality control

Study subjects were genotyped using the Illumina HumanOmni1-Quad BeadChip (Illumina, San Diego, CA, USA) at the Feinstein Institute for Medical Research of the North Shore-Long Island Jewish Health System. Data from samples with preliminary genotype call rates >0.98, obtained with cluster positions provided by Illumina, were reclustered using the Illumina GenomeStudio software, and the new cluster positions were applied to all samples. Initial quality control of the genotyping data included removal of one sample from each pair with an estimated identity-by-descent proportion >0.10; removal of samples with genotype missing rates >0.05, with discordant SNP-determined and reported gender, or with ambiguous SNP-determined gender; and removal of SNPs with genotype missing rates >0.05, minor allele frequencies in controls <0.005 or Hardy–Weinberg P-values in controls <1 × 10⁻⁶. These quality control steps were performed using the PLINK software.25 Subsequently, tag SNPs with genotype missing rates <0.1% and a physical separation of at least 0.4 Mb were used in a spectral analysis of ancestry, which identified 929 controls with a relatively homogeneous ‘European’ ancestral background. SNPs with minor allele frequencies <0.005 or Hardy–Weinberg P<0.001 in these 929 controls were then removed from the data set.
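The SNP-level thresholds above can be sketched as a single filter. This is an illustration, not the PLINK implementation: for brevity it uses the 1-df chi-square goodness-of-fit test for Hardy–Weinberg equilibrium, whereas PLINK's Hardy–Weinberg filter uses an exact test, so P-values will differ when genotype counts are small. Sample-level steps (identity-by-descent pruning, gender checks) and the later filters in the 929 controls are not shown, and the function names are hypothetical.

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """1-df chi-square Hardy-Weinberg P-value from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of chi-square with 1 df: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2))


def snp_passes_qc(ctrl_counts, n_missing, n_total,
                  max_missing=0.05, min_maf=0.005, min_hwe_p=1e-6):
    """Apply the missingness, control-MAF and control-HWE filters to one SNP.

    ctrl_counts: (AA, AB, BB) genotype counts in controls.
    n_missing / n_total: missing and total genotype calls across all samples.
    """
    if n_missing / n_total > max_missing:
        return False
    n_aa, n_ab, n_bb = ctrl_counts
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    if min(freq_a, 1.0 - freq_a) < min_maf:
        return False
    return hwe_chisq_p(n_aa, n_ab, n_bb) >= min_hwe_p


# Hypothetical control genotype counts.
print(snp_passes_qc((25, 50, 25), 0, 100))  # in HWE, common, complete
print(snp_passes_qc((50, 0, 50), 0, 100))   # no heterozygotes: fails HWE
```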

Ancestry matching

To control for potential confounding due to variation in genetic ancestry, study subjects were grouped into 11 approximately homogeneous clusters based on genetic distances derived from GemTools.26, 27 Ancestry was inferred from SNPs with genotype missing rates <0.1% and a physical separation of at least 0.2 Mb. In all association analyses, we controlled for ancestry by including cluster membership as a blocking variable. Inflation across the genome-wide SNP data, measured by genomic control,28 was minimal (lambda=1.02 for UC vs control and 1.03 for CD with ileal involvement vs control), confirming that the samples were well matched.
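The genomic control lambda quoted above is, by its standard definition, the median of the genome-wide 1-df association chi-square statistics divided by the median of the chi-square distribution with 1 df (about 0.4549); values near 1 indicate little inflation. A minimal sketch using only the Python standard library; the P-values are hypothetical and the function names are illustrative.

```python
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def chi2_from_p(p):
    """1-df chi-square statistic equivalent to a two-sided P-value (0 < p <= 1)."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_control_lambda(chi2_stats):
    """Genomic inflation factor: observed median statistic / expected median."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical per-SNP P-values; a real analysis would use every genome-wide SNP.
pvals = [0.9, 0.7, 0.5, 0.3, 0.1]
print(round(genomic_control_lambda([chi2_from_p(p) for p in pvals]), 3))  # -> 1.0
```

Because the statistic is a monotone function of the P-value, lambda depends only on the median P-value: a median of 0.5 gives lambda of 1, and smaller median P-values give lambda above 1.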

Imputation of classical HLA, SNP and amino acid allele dosages

Following a previously described procedure,29 we imputed classical HLA alleles and their corresponding amino acid sequences in our cases and controls, using the genotyped SNPs in our GWAS as input. This imputation procedure, like HLA*IMP,32 uses haplotype information across the region to predict classical HLA alleles from genotyped SNPs. A prior study provided empirical evidence that the imputations are accurate,29 reaching levels of accuracy comparable to those of the work on which HLA*IMP is based.32

As the reference panel, we used a data set of 263 HLA-A, -B, -C, -DRB1, -DQA1, -DQB1, -DPA1 and -DPB1 classical alleles at four-digit resolution, 3852 SNPs and 372 amino acid positions in 2767 unrelated founder individuals of European descent collected by the MHC Working Group of the Type 1 Diabetes Genetics Consortium.30 All variants were encoded as biallelic markers, allowing us to use standard imputation tools. For variants with more than two alleles, each allele was coded as present or absent and analyzed in a separate test. We used default parameters for BEAGLE (http://faculty.washington.edu/browning/beagle/beagle.html): 10 iterations of phasing/imputation, sampling four haplotype pairs for each individual at each iteration. For each variant, we used the posterior probabilities of carrying 0 (AA), 1 (AB) or 2 (BB) copies of allele B to calculate the effective dosage of allele B (=2 × Pr(BB) + Pr(AB)). To obtain allele dosages for MHC region HumanOmni1-Quad SNPs, we used BEAGLECALL.31 Three iterations of BEAGLECALL were run with increasingly stringent genotype-calling filters (callthreshold=0.9 and missingcohort=0.1 in iteration 1, callthreshold=0.98 and missingcohort=0.02 in iteration 2, and callthreshold=0.985 and missingcohort=0.015 in iteration 3). We combined dosage information for markers in the Type 1 Diabetes Genetics Consortium reference panel with dosage information for additional HumanOmni1-Quad SNPs that appeared in both genome builds NCBI36/hg18 and GRCh37/hg19 into a combined set of genetic features in the MHC region from 29299 to 33884 kb on chromosome 6 (NCBI36/hg18 coordinates).
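The dosage formula and the present/absent recoding of multi-allelic variants can be sketched in a few lines. With real data the genotype posteriors would come from BEAGLE and the dosages would be fractional; the function names, probabilities and residues below are hypothetical. Each allele of a multi-allelic variant becomes its own biallelic marker (that allele vs all others), whose copy count per individual is 0, 1 or 2.

```python
def effective_dosage(p_aa, p_ab, p_bb):
    """Expected number of B alleles from genotype posteriors: 2*Pr(BB) + Pr(AB)."""
    if abs(p_aa + p_ab + p_bb - 1.0) > 1e-6:
        raise ValueError("genotype posteriors must sum to 1")
    return 2.0 * p_bb + p_ab


def present_absent_markers(genotypes, alleles):
    """Recode a multi-allelic variant as one biallelic marker per allele.

    genotypes: one (allele1, allele2) pair per individual.
    Returns {allele: per-individual copy counts (0, 1 or 2) of that allele}.
    """
    return {a: [pair.count(a) for pair in genotypes] for a in alleles}


# Hypothetical posteriors for one individual at one variant.
print(effective_dosage(0.1, 0.3, 0.6))  # 2 * 0.6 + 0.3

# Hypothetical residues at one amino acid position in three individuals.
genotypes = [("V", "D"), ("G", "G"), ("V", "S")]
print(present_absent_markers(genotypes, ["V", "D", "G", "S"]))
# {'V': [1, 0, 1], 'D': [1, 0, 0], 'G': [0, 2, 0], 'S': [0, 0, 1]}
```

Each resulting marker can then be tested on its own, which is how a single amino acid position yields several binary features in the association analyses.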

HLA-DRB1 imputation quality at two-digit resolution was assessed by sequence-specific oligonucleotide probes and next-generation sequencing of genomic DNA collected from 384 of our study subjects (see Supplementary Materials).

Association analyses

Association analyses were performed using allele dosage data from 562 UC, 611 CD with ileal involvement and 1428 control samples that passed quality control. We tested the association of binary markers in the HLA region with UC vs control and with CD with ileal involvement vs control status using logistic regression under a log-additive model. Forward stepwise model selection was used to determine a set of markers in the post-imputation data that jointly predicted disease vs control status, without including multiple markers in tight linkage disequilibrium. Markers with an allele frequency <0.001 were excluded. The Bayesian Information Criterion (BIC) was used to find a model that balanced goodness of fit against model complexity. The stepwise procedure started by entering the best marker (lowest P-value) into the regression model and iteratively added markers until the BIC ceased to improve. This procedure was performed in R (http://www.r-project.org) using the ‘glm’ and ‘step’ functions.
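The forward selection described above can be sketched without external dependencies. This is an illustration under stated assumptions, not the authors' R code: logistic regression is fitted by Newton–Raphson, BIC = −2 log L + k ln(n) with k the number of coefficients, and at each step the marker giving the lowest BIC is added until no candidate improves it. Because every candidate at a given step adds one parameter, the lowest-BIC candidate is also the one with the largest likelihood-ratio statistic (smallest P-value), matching the description above. The ancestry-cluster covariate and the exclusion of markers in tight linkage disequilibrium are omitted; covariates could be added as extra columns. In R, `step()` with `direction = "forward"` and `k = log(n)` applies the same BIC penalty. Function names and data are hypothetical.

```python
import math


def _sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def logistic_loglik(X, y, iters=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson; return the maximized log-likelihood."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # observed information X'WX
        for row, yi in zip(X, y):
            mu = _sigmoid(sum(b * x for b, x in zip(beta, row)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    info[j][k] += w * row[j] * row[k]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # log L_i = y*eta - log(1 + e^eta), computed stably.
        ll += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return ll


def forward_bic(markers, y):
    """Greedy forward selection of dosage markers by BIC = -2 log L + k ln(n)."""
    n = len(y)

    def bic(names):
        X = [[1.0] + [markers[m][i] for m in names] for i in range(n)]
        return -2.0 * logistic_loglik(X, y) + (len(names) + 1) * math.log(n)

    chosen, best = [], bic([])
    while True:
        trials = [(bic(chosen + [m]), m) for m in markers if m not in chosen]
        if not trials:
            break
        score, m = min(trials)
        if score >= best:  # BIC no longer improves: stop.
            break
        chosen.append(m)
        best = score
    return chosen


# Toy data (hypothetical): case fraction rises with x1 dosage; x2 is balanced.
x1, x2, y = [], [], []
for d1 in range(3):
    for d2 in range(3):
        for k in range(4):
            x1.append(float(d1))
            x2.append(float(d2))
            y.append(1.0 if k <= d1 else 0.0)  # d1 + 1 cases out of 4
print(forward_bic({"x1": x1, "x2": x2}, y))  # -> ['x1']
```

In the toy data x2 has the same case fraction at every dosage within each x1 stratum, so adding it leaves the likelihood unchanged and only incurs the ln(n) penalty.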

For each polymorphic amino acid position in the HLA region, we also conducted an omnibus test of association using multivariate logistic regression, with degrees of freedom equal to the number of distinct residues at that position minus one. For the position yielding the smallest P-value, we used stepwise regression, restricted to the residues at that position, to select a parsimonious model for the site.
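Assuming a likelihood-ratio test (the text says only that multivariate logistic regression was used), the omnibus statistic for a position with k observed residues compares a model with k − 1 residue terms (one residue serving as reference) against a model without them, and is referred to a chi-square distribution with k − 1 df. The same comparison, with df equal to the number of markers, corresponds to the overall model P-value described in this section. The log-likelihoods would come from logistic fits; the sketch takes them as inputs and supplies a dependency-free chi-square tail function. Function names and log-likelihood values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + sum_{i=1}^{(df-1)/2} h^(i-1/2) e^(-h) / Gamma(i+1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp((i - 0.5) * math.log(h) - h - math.lgamma(i + 0.5))
    return total


def omnibus_p(loglik_without, loglik_with, n_residues):
    """LRT P-value for adding the residue terms at one amino acid position.

    loglik_without: log-likelihood of the model without the position's terms.
    loglik_with: log-likelihood after adding n_residues - 1 residue terms.
    """
    stat = 2.0 * (loglik_with - loglik_without)
    return chi2_sf(stat, n_residues - 1)


# Hypothetical log-likelihoods for a position with six observed residues (5 df).
print(omnibus_p(-812.4, -784.1, 6))
```

The closed-form tail sums avoid a SciPy dependency and remain accurate for very small P-values, because every term is positive and no subtraction from 1 is involved.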

Finally, we used stepwise regression to determine a model differentiating UC from CD with ileal involvement, in which UC subjects served as cases and subjects with CD with ileal involvement served as controls.

For each multivariate model, we report the P-value associated with the best model. This P-value tests the null hypothesis that none of the terms in the model has any explanatory value against the alternative that at least one term is associated with the phenotype. The degrees of freedom for this test equal the number of markers in the multivariate model.

Conflict of interest

The authors declare no conflict of interest.

References

  1. Shiina T, Hosomichi K, Inoko H, Kulski JK. The HLA genomic loci map: expression, interaction, diversity and disease. J Hum Genet 2009; 54: 15–39.
  2. Traherne JA. Human MHC architecture and evolution: implications for disease association studies. Int J Immunogenet 2008; 35: 179–192.
  3. Fernando MM, Stevens CR, Walsh EC, De Jager PL, Goyette P, Plenge RM et al. Defining the role of the MHC in autoimmunity: a review and pooled analysis. PLoS Genet 2008; 4: e1000024.
  4. Cassinotti A, Birindelli S, Clerici M, Trabattoni D, Lazzaroni M, Ardizzone S et al. HLA and autoimmune digestive disease: a clinically oriented review for gastroenterologists. Am J Gastroenterol 2009; 104: 195–217.
  5. Stokkers PC, Reitsma PH, Tytgat GN, van Deventer SJ. HLA-DR and -DQ phenotypes in inflammatory bowel disease: a meta-analysis. Gut 1999; 45: 395–401.
  6. Hampe J, Schreiber S, Shaw SH, Lau KF, Bridger S, Macpherson AJ et al. A genomewide analysis provides evidence for novel linkages in inflammatory bowel disease in a large European cohort. Am J Hum Genet 1999; 64: 808–816.
  7. Hampe J, Shaw SH, Saiz R, Leysens N, Lantermann A, Mascheretti S et al. Linkage of inflammatory bowel disease to human chromosome 6p. Am J Hum Genet 1999; 65: 1647–1655.
  8. van Heel DA, Fisher SA, Kirby A, Daly MJ, Rioux JD, Lewis CM et al. Inflammatory bowel disease susceptibility loci defined by genome scan meta-analysis of 1952 affected relative pairs. Hum Mol Genet 2004; 13: 763–770.
  9. Franke A, McGovern DP, Barrett JC, Wang K, Radford-Smith GL, Ahmad T et al. Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci. Nat Genet 2010; 42: 1118–1125.
  10. Anderson CA, Boucher G, Lees CW, Franke A, D'Amato M, Taylor KD et al. Meta-analysis identifies 29 additional ulcerative colitis risk loci, increasing the number of confirmed associations to 47. Nat Genet 2011; 43: 246–252.
  11. Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F et al. Integrating common and rare genetic variation in diverse human populations. Nature 2010; 467: 52–58.
  12. Silverberg MS, Cho JH, Rioux JD, McGovern DP, Wu J, Annese V et al. Ulcerative colitis-risk loci on chromosomes 1p36 and 12q15 found by genome-wide association study. Nat Genet 2009; 41: 216–220.
  13. Robinson J, Mistry K, McWilliam H, Lopez R, Parham P, Marsh SG. The IMGT/HLA database. Nucleic Acids Res 2011; 39: D1171–D1176.
  14. Yoshitake S, Kimura A, Okada M, Yao T, Sasazuki T. HLA class II alleles in Japanese patients with inflammatory bowel disease. Tissue Antigens 1999; 53: 350–358.
  15. Brown JH, Jardetzky TS, Gorga JC, Stern LJ, Urban RG, Strominger JL et al. Three-dimensional structure of the human class II histocompatibility antigen HLA-DR1. Nature 1993; 364: 33–39.
  16. Janeway C, Travers P, Walport M, Shlomchik M. Antigen recognition by B-cell and T-cell receptors. In: Immunobiology: The Immune System in Health and Disease, 6th edn. Garland Science: New York, 2005 pp 103–134.
  17. Stern LJ, Brown JH, Jardetzky TS, Gorga JC, Urban RG, Strominger JL et al. Crystal structure of the human class II MHC protein HLA-DR1 complexed with an influenza virus peptide. Nature 1994; 368: 215–221.
  18. Sturniolo T, Bono E, Ding J, Raddrizzani L, Tuereci O, Sahin U et al. Generation of tissue-specific and promiscuous HLA ligand databases using DNA microarrays and virtual HLA class II matrices. Nat Biotechnol 1999; 17: 555–561.
  19. Foley PJ, McGrath DS, Puscinska E, Petrek M, Kolek V, Drabek J et al. Human leukocyte antigen-DRB1 position 11 residues are a common protective marker for sarcoidosis. Am J Respir Cell Mol Biol 2001; 25: 272–277.
  20. Yang Z, Shen L, Dangel AW, Wu LC, Yu CY. Four ubiquitously expressed genes, RD (D6S45)-SKI2W (SKIV2L)-DOM3Z-RP1 (D6S60E), are present between complement component genes factor B and C4 in the class III region of the HLA. Genomics 1998; 53: 338–347.
  21. Fernando MM, Stevens CR, Sabeti PC, Walsh EC, McWhinnie AJ, Shah A et al. Identification of two independent risk factors for lupus within the MHC in United Kingdom families. PLoS Genet 2007; 3: e192.
  22. Valdes AM, Thomson G, Barcellos LF. Genetic variation within the HLA class III influences T1D susceptibility conferred by high-risk HLA haplotypes. Genes Immun 2010; 11: 209–218.
  23. Yi Y, Kamata-Sakurai M, Denda-Nagai K, Itoh T, Okada K, Ishii-Schrade K et al. Mucin 21/epiglycanin modulates cell adhesion. J Biol Chem 2010; 285: 21233–21240.
  24. Itoh Y, Kamata-Sakurai M, Denda-Nagai K, Nagai S, Tsuiji M, Ishii-Schrade K et al. Identification and expression of human epiglycanin/MUC21: a novel transmembrane mucin. Glycobiology 2008; 18: 74–83.
  25. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559–575.
  26. Lee AB, Luca D, Klei L, Devlin B, Roeder K. Discovering genetic ancestry using spectral graph theory. Genet Epidemiol 2010; 34: 51–59.
  27. Klei L, Kent B, Melhem N, Devlin B, Roeder K. GemTools: a fast and efficient approach to estimating genetic ancestry. arXiv 2011; http://arxiv.org/abs/1104.1162.
  28. Devlin B, Roeder K. Genomic control for association studies. Biometrics 1999; 55: 997–1004.
  29. Pereyra F, Jia X, McLaren PJ, Telenti A, de Bakker PI, Walker BD et al. The major genetic determinants of HIV-1 control affect HLA class I peptide presentation. Science 2010; 330: 1551–1557.
  30. Brown WM, Pierce J, Hilner JE, Perdue LH, Lohman K, Li L et al. Overview of the MHC fine mapping data. Diabetes Obes Metab 2009; 11(Suppl 1): 2–7.
  31. Browning BL, Yu Z. Simultaneous genotype calling and haplotype phasing improves genotype accuracy and reduces false-positive associations for genome-wide association studies. Am J Hum Genet 2009; 85: 847–861.
  32. Leslie S, Donnelly P, McVean G. A statistical method for predicting classical HLA alleles from SNP data. Am J Hum Genet 2008; 82: 48–56.

Acknowledgements

We acknowledge Leonard Baidoo, MD and David Binion, MD for providing phenotypic information for some of the study subjects, the Feinstein Institute for Medical Research of the North Shore-Long Island Jewish Health System for Illumina Genotyping BeadChip processing, and the University of Pittsburgh Genomics and Proteomics Core Laboratories for technical assistance with HLA-DRB1 sequencing. This work was supported by the National Institutes of Health grants DK068112 (J-PA), AG030653 (MIK), MH057881 (BD and KR), DK062420 (RHD) and DK076025 (RHD); a Crohn's and Colitis Foundation of America Senior Research Award (RHD); Department of Defense Grant W81XWH-07-1-0619 (MT); and funds generously provided by Kenneth and Jennifer Rainin, Gerald and Nancy Goldberg, and Victor and Ellen Cohn.

Supplementary Information accompanies the paper on the Genes and Immunity website.