Introduction

Inflammatory bowel disease (IBD) is a chronic or remission-relapse inflammatory disease of the gastrointestinal tract. IBD includes two types of chronic gut disorders: ulcerative colitis (UC) and Crohn’s disease (CD)1. UC primarily affects the mucous membranes and often forms erosions and ulcers in the colon, which can lead to inflammation of the rectum and extend proximally. There are three UC phenotypes in the clinical setting: proctitis, left-sided colitis, and pancolitis2. CD produces granulomatous inflammation with ulceration and fibrosis mainly in the small intestine, but also anywhere in the digestive tract. Long-term severe intestinal inflammation can result in ulcers, constriction, and perforation of the intestinal tract. Indeed, IBD chronically injures the digestive tract, leading to tissue damage, function loss, disability, and systemic inflammation1.

The exact etiology of IBD remains unknown, although a multifactorial pathogenesis that includes genetic, immunological, environmental, and microbial factors is likely involved3,4,5,6,7. Specifically, the influence of genetic factors on IBD pathogenesis has been supported by the high concordance rate among monozygotic twins as well as the high relative risk in affected siblings8,9.

To date, genome-wide association studies and the candidate-gene approach have identified more than 240 IBD susceptibility genes or loci outside of the human leukocyte antigen (HLA) region, including nucleotide oligomerization domain 2, autophagy related 16-like 1, interleukin-23, PR domain-containing 1, and caspase recruitment domain 93,10,11,12,13,14,15,16,17. Among the many candidate genes proposed, however, the HLA region has been consistently and strongly associated with IBD onset. HLA class I proteins play a crucial role in human immune responses18,19,20. High expression levels of HLA-C molecules, which contribute to viral control in HIV-infected individuals, reportedly have a deleterious effect on CD in patients of European descent21. Moreover, among several single-nucleotide polymorphisms (SNPs) associated with HLA-C expression22,23, a SNP located 35 kb upstream of the coding region of the HLA-C gene in chromosome 6 (-35C/T; rs9264942) was suggested to play an important role in controlling HLA-C cell surface molecule expression as an expression quantitative trait locus (eQTL) in subjects of European ancestry24. However, no such association studies on Japanese IBD patients have been published24.

The HLA class II region, including HLA-DQB1, HLA-DRB1, and HLA-DQA1, has also been associated with IBD onset across human groups7,25,26. A particular haplotype, HLA-C*12:02~B*52:01~DRB1*15:02, was related to an increased risk for UC but a reduced risk for CD in a Japanese population25. Specifically, both HLA-B*52:01 and HLA-DRB1*15:02 alleles were directly associated with UC susceptibility and CD protection in the Japanese, whereas HLA-C genetic involvement in IBD remained unknown25. Since HLA genes are the most highly polymorphic in the human genome, a high-resolution HLA and SNP haplotype map analysis was developed for disease association studies27,28,29. Based on that dataset and Okada’s report25, three SNPs at rs2270191, rs3132550, and rs6915986 have strong linkage disequilibrium (LD) with the HLA-C*12:02 (r² = 1), HLA-B*52:01 (r² = 0.94), and HLA-DRB1*15:02 (r² = 0.89) alleles, respectively, in Japanese populations22,25 (Supplementary Fig. 1). However, no association studies of these tag SNPs in Japanese IBD patients have been reported.

The aim of this study was to determine whether the eQTL SNP at rs9264942 could regulate HLA-C expression in the Japanese and analyze the relationships of SNPs in strong LD of a particular HLA haplotype with UC and CD susceptibility. It sought to uncover important findings on IBD from the in vivo, genetic, and functional aspects in well-defined patient groups and controls.

Results

Comparisons of HLA-C expression on peripheral blood mononuclear cells by the eQTL rs9264942 SNP genotype in healthy Japanese subjects

A total of 32 healthy control subjects were included for the analysis of HLA-C expression on peripheral blood mononuclear cells (PBMC) (Table 1). The gating strategy for PBMC is shown in Fig. 1a. Although the cell surface expression of HLA-C on CD3e+CD8a+ T lymphocytes (Fig. 1b) as detected by flow cytometry in healthy subjects was comparable between males and females (Fig. 1e), it was significantly higher for the CC or CT genotype than for the TT genotype at rs9264942. HLA-C expression on CD3e+CD8a+ T lymphocytes (Fig. 1f), macrophages (Fig. 1c,g), and neutrophils (Fig. 1d,h) was significantly higher for the CC or CT genotype than for the TT genotype. Since another SNP at rs2395471 was also reported to be an eQTL SNP of HLA-C23, we compared HLA-C expression for the AA or AG genotype and the GG genotype at rs2395471 on PBMC. HLA-C expression on CD3e+CD8a+ T lymphocytes (Fig. 1i) was significantly higher for the AA or AG genotype than for the GG genotype at rs2395471, while HLA-C expression on macrophages (Fig. 1j) and neutrophils (Fig. 1k) was comparable between the groups.

Table 1 Age and gender proportions of healthy subjects for HLA-C expression analysis.
Figure 1

HLA-C expression on PBMC by flow cytometry analysis. Representative gating strategy (a). Representative HLA-C expression on CD3e+CD8a+ lymphocytes (b), macrophages (c), and neutrophils (d) (red) in relation to the isotype control (gray). Comparisons of geometric mean fluorescence intensity quantification of the cell surface expression of HLA-C on CD3e+CD8a+ lymphocytes between males and females (e) and between the SNP of the CC or CT genotype and of the TT genotype at rs9264942 on CD3e+CD8a+ lymphocytes (f), macrophages (g), and neutrophils (h) as well as between the SNP of the AA or AG genotype and the GG genotype at rs2395471 on CD3e+CD8a+ T lymphocytes (i), macrophages (j), and neutrophils (k).

Association of the eQTL rs9264942 SNP of HLA-C expression with IBD susceptibility

A total of 160 patients with UC and 275 patients with CD along with 325 healthy subjects were enrolled for this association study (Table 2). The C allele frequency of the eQTL SNP at rs9264942, which has been associated with HLA-C expression, was significantly higher in UC patients than in the healthy control group (UC vs. controls: 49.7% vs. 35.5%, odds ratio [OR] 1.79; pc = 9.52 × 10⁻⁴) (Table 3). In terms of clinical findings, there were no significant differences for UC between the CC or CT genotype and the TT genotype (Supplementary Table 1). There was also no remarkable difference for the C allele frequency of the rs9264942 SNP between the CD and healthy control groups (OR 1.00; pc = 1.000) (Table 3).

Table 2 Demographic and clinical data of UC, CD, PBC and healthy subjects.
Table 3 Association of four SNPs with IBD susceptibility.

Association of three SNPs at rs2270191, rs3132550, and rs6915986 with IBD susceptibility

The allele frequencies of three SNPs at rs2270191, rs3132550, and rs6915986 in patients with UC and CD and in healthy and primary biliary cholangitis (PBC) disease controls are shown in Table 3. The frequencies of the T allele at rs2270191 in strong LD with the HLA-C*12:02 allele (r² = 1), the A allele at rs3132550 in strong LD with the HLA-B*52:01 allele (r² = 0.94), and the C allele at rs6915986 in strong LD with the HLA-DRB1*15:02 allele (r² = 0.89) were significantly higher in UC patients than in the healthy group but significantly lower in CD patients than in healthy controls (Table 3). These allele frequencies were comparable between healthy and PBC disease controls (Table 3).

Haplotype analysis of the SNPs at rs2270191, rs3132550, and rs6915986

Haplotype frequencies were calculated using an expectation–maximization algorithm for the three SNPs (rs2270191, rs3132550, and rs6915986) and compared among UC, CD, PBC, and healthy control groups. The three-SNP haplotype of TAC, which was predicted as associated with the particular HLA haplotype of HLA-C*12:02~B*52:01~DRB1*15:02, showed a strong susceptibility correlation between UC and controls (OR 2.53; pc = 3.92 × 10⁻⁷) but a protective association between CD and controls (OR 0.50; pc = 0.002) (Table 4). In contrast, the three-SNP haplotype of CGT showed a strong protectivity correlation between UC and controls (OR 0.39; pc = 1.58 × 10⁻⁸) but a significant susceptibility association between CD and controls (OR 1.90; pc = 1.19 × 10⁻³) (Table 4). The three-SNP haplotypes of CGT and TAC accounted for the vast majority of the cohort, which indicated that the SNPs were in strong LD with each other, as shown in Fig. 2. Moreover, these three-SNP haplotype frequencies were comparable between healthy and PBC disease controls (Table 4).
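
The expectation–maximization estimation of haplotype frequencies described above can be illustrated with a minimal Python sketch. It is deliberately reduced to two biallelic SNPs, whereas this study analyzed three and four SNPs. The function name `em_two_snp`, the allele labels A/a and B/b, and the genotype counts are hypothetical and are neither the study data nor the software actually used.

```python
def em_two_snp(g, max_iter=200, tol=1e-12):
    """Estimate two-SNP haplotype frequencies by expectation-maximization.

    g[i][j] is the number of subjects carrying i copies of allele A at
    SNP 1 and j copies of allele B at SNP 2 (hypothetical input layout).
    """
    n = sum(sum(row) for row in g)
    # Haplotype counts contributed by genotypes whose phase is unambiguous.
    base = {
        "AB": 2 * g[2][2] + g[2][1] + g[1][2],
        "Ab": 2 * g[2][0] + g[2][1] + g[1][0],
        "aB": 2 * g[0][2] + g[1][2] + g[0][1],
        "ab": 2 * g[0][0] + g[1][0] + g[0][1],
    }
    d = g[1][1]  # double heterozygotes: phase is AB/ab or Ab/aB
    f = {h: 0.25 for h in base}  # uniform starting frequencies
    for _ in range(max_iter):
        # E-step: probability that a double heterozygote is AB/ab.
        cis = f["AB"] * f["ab"]
        trans = f["Ab"] * f["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: update frequencies from expected haplotype counts.
        counts = {
            "AB": base["AB"] + d * w,
            "ab": base["ab"] + d * w,
            "Ab": base["Ab"] + d * (1 - w),
            "aB": base["aB"] + d * (1 - w),
        }
        new_f = {h: c / (2 * n) for h, c in counts.items()}
        converged = max(abs(new_f[h] - f[h]) for h in f) < tol
        f = new_f
        if converged:
            break
    return f


# Hypothetical cohort in which only AB and ab haplotypes are consistent
# with the genotypes: 30 aa/bb, 20 Aa/Bb, and 10 AA/BB subjects.
genotypes = [[30, 0, 0], [0, 20, 0], [0, 0, 10]]
freqs = em_two_snp(genotypes)
```

In this toy cohort the estimate concentrates on the AB and ab haplotypes, which mirrors how two haplotypes (CGT and TAC) can account for most chromosomes when the SNPs are in strong LD.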

Table 4 Association of the three-SNP* haplotype or four-SNP* haplotype with IBD susceptibility.
Figure 2

LD plots of four SNPs at rs2270191, rs3132550, rs9264942, and rs6915986 of healthy controls (a), ulcerative colitis patients (b) and Crohn’s disease patients (c). Values of r² corresponding to each pair are expressed as a percentage and shown within the respective square.

Logistic regression analysis of SNPs associated with IBD susceptibility

The strong LD of the three SNPs at rs2270191, rs3132550, and rs6915986 indicated that simultaneous logistic regression analysis to determine which SNP was independently associated with IBD susceptibility was inappropriate due to multi-collinearity. However, since rs9264942 SNP distribution was different from that of the other three SNPs regardless of being within a haplotype block (Fig. 2 and Supplementary Fig. 1), the SNP at rs9264942 was compared with the representative SNP at rs3132550 by logistic regression analysis. The SNP at rs3132550 was independently associated with UC susceptibility (OR 3.507, 95% confidence interval [95% CI] 2.176–5.654; p < 0.00001), whereas the SNP at rs9264942 did not reach statistical significance (OR 1.236, 95% CI 0.744–2.054; p = 0.413). Moreover, the SNP at rs3132550 was independently associated with CD protectivity (OR 0.478, 95% CI 0.298–0.767; p = 0.00221), unlike the SNP at rs9264942 (OR 1.376, 95% CI 0.966–1.961; p = 0.077).

Haplotype analysis of the SNPs at rs2270191, rs3132550, rs9264942, and rs6915986

Haplotype frequencies were calculated using an expectation–maximization algorithm for the four SNPs (rs2270191, rs3132550, rs9264942, and rs6915986) and compared among UC, CD, PBC, and healthy control groups. The four-SNP haplotype of TACC, which was predicted as associated with the particular HLA haplotype of HLA-C*12:02~B*52:01~DRB1*15:02, showed a strong susceptibility correlation between UC and controls (OR 2.53; pc = 3.96 × 10⁻⁷), but a protective association between CD and controls (OR 0.50; pc = 0.002) (Table 4). The four-SNP haplotype of TATC could not be calculated, which indicated that all subjects carrying the particular HLA haplotype of HLA-C*12:02~B*52:01~DRB1*15:02 had the TACC haplotype, i.e., the SNP at rs9264942 of subjects carrying the HLA haplotype of HLA-C*12:02~B*52:01~DRB1*15:02 was the C allele (CC or CT genotype). The four-SNP haplotype of CGTT in individuals with the SNP T allele at rs9264942 showed a protective correlation between UC and controls (OR 0.56; pc = 2.66 × 10⁻⁵), whereas the four-SNP haplotype of CGCT in individuals with the SNP C allele at rs9264942 displayed no significance (pc = 0.803). On the other hand, the four-SNP haplotype of CGTT showed no significance between CD and controls (pc = 0.828), while the four-SNP haplotype of CGCT exhibited significant susceptibility (OR 1.31; pc = 0.042). The above four-SNP haplotype frequencies were comparable between healthy and PBC disease controls (Table 4).

Comparisons of clinical findings of the 4-SNP haplotype for IBD

There were no significant differences for any clinical findings between UC patients with and without the TACC haplotype (Supplementary Table 2). In contrast, significant differences were observed for male frequency (OR 2.30, 95% CI 1.06–5.01; p = 0.032), disease phenotype location of ileocolonic pattern (OR 3.17, 95% CI 1.44–6.96; p = 0.003), intestinal complication frequency (OR 3.40, 95% CI 1.55–7.45; p = 0.001), and frequency of history of intestinal surgery (OR 3.33, 95% CI 1.50–7.39; p = 0.002) between CD patients with and without the TACC haplotype (Table 5).

Table 5 Relationship between haplotypes from four SNPs (rs2270191, rs3132550, rs9264942, and rs6915986) and clinical findings in CD.

Discussion

This study clearly demonstrated a significant association of an eQTL SNP of HLA-C with the regulation of HLA-C expression on PBMC in Japanese subjects. Moreover, four SNP haplotypes consisting of the eQTL SNP of HLA-C and three SNPs of HLA-C*12:02~B*52:01~DRB1*15:02 were significantly related to susceptibility to UC but protection against CD.

The cell surface expression of HLA-C is regulated through multiple mechanisms30. Located 35 kb upstream of the HLA-C gene (Supplementary Fig. 1), the SNP variant at rs9264942 has been associated with HLA-C surface expression in subjects of European descent24,31,32, but not in an African American population21, suggesting that the function of the eQTL SNP at rs9264942 in HLA-C expression varies among population groups. Therefore, we tested for and identified a significant association of the eQTL SNP at rs9264942 with HLA-C expression on PBMC, specifically on CD3e+CD8a+ lymphocytes, macrophages, and neutrophils, in the Japanese. HLA-C cell surface expression differed among cell populations, which supported previous findings of relatively low HLA-C expression on lymphocytes33,34,35 and high expression on antigen-presenting cells, such as macrophages36. The increased expression of HLA-C molecules promotes antigen presentation and recognition by CD8a+ cytotoxic T cells and natural killer cells for host protection, but in some instances leads to multiple diseases. HLA-C surface expression is also regulated by microRNA miR-148a binding on the 3′ untranslated region (3′UTR) of the HLA-C gene37,38. Therefore, interactions between the eQTL SNP of HLA-C at rs9264942 and miR-148a binding on the 3′UTR of the HLA-C gene merit future consideration to uncover the precise genetic and molecular mechanisms of HLA-C expression. We also compared the HLA-C expression on PBMC between the AA or AG genotype and the GG genotype at rs2395471, which was reported as another eQTL SNP of HLA-C23. HLA-C expression on CD3e+CD8a+ T lymphocytes was significantly higher for the AA or AG genotype than for the GG genotype at rs2395471, but no differences were observed on macrophages and neutrophils. Additional molecular mechanisms of HLA-C expression regulation on PBMC subsets by eQTL SNP differences require assessment in future studies.

Okada et al. reported that the HLA-C*12:02~B*52:01~DRB1*15:02 haplotype was associated with susceptibility to UC as well as protection against CD in the Japanese25. The particular three-SNP haplotype of TAC at rs2270191, rs3132550, and rs6915986 showed an association with UC susceptibility very similar to that of the HLA-C*12:02~B*52:01~DRB1*15:02 haplotype25. Moreover, the TAC haplotype frequency was low in CD patients, which supported a similar finding for the HLA-C*12:02~B*52:01~DRB1*15:02 haplotype frequency in a previous report25. Our results indicated that three-SNP haplotype assignments could serve as surrogate markers for disease susceptibility or protectivity to IBD instead of HLA haplotype assignments. Although the HLA-C eQTL SNP at rs9264942 was located on this haplotype, the r² values between SNP pairs were relatively high except for those between the SNP at rs9264942 and the others, which were below 30%. This indicated that LD was broken only by rs9264942 and raised the possibility that the eQTL SNP at rs9264942 had an independent function in terms of IBD susceptibility. Thus, we analyzed the relationship of four-SNP haplotypes with IBD risk. The SNP at rs9264942 was related to CD susceptibility in European patients by the regulation of HLA-C expression21,24. We detected an association of the TACC haplotype at rs9264942, but not the TATC haplotype, which suggested that IBD susceptibility might not be related to the eQTL SNP at rs9264942 if patients carried the three SNPs of the TAC haplotype, thereby depending on a particular HLA-C*12:02~B*52:01~DRB1*15:02 haplotype25. Reciprocally, the CGTT haplotype in patients with the T allele at rs9264942 (low HLA-C expression group) was associated with UC protection, whereas the CGCT haplotype in patients with the C allele at rs9264942 (high HLA-C expression group) exhibited no remarkable association (Table 4). On the other hand, the CGTT haplotype was not associated with CD, although the CGCT haplotype had a significant association with CD susceptibility. This supported a previous paper describing that high HLA-C expression was related to CD susceptibility21. Attempts to confirm the independent involvement of the SNP at rs9264942 in IBD susceptibility or protection by stratified analysis using the Mantel–Haenszel test were inconclusive (data not shown), which might have been due to a relatively small number of subjects or a stronger association of a particular haplotype. Larger studies are needed to validate our findings.

Regarding disease phenotype, it was interesting that CD patients without the TACC haplotype had significantly more intestinal complications (intestinal stenosis, perforation, and fistula) and history of intestinal surgery, suggesting that the HLA-C*12:02~B*52:01~DRB1*15:02 haplotype might impart a stronger effect on clinical results. Based on the above findings, these four SNPs may be simple but useful surrogate markers for predicting HLA-C*12:02~B*52:01~DRB1*15:02 haplotypes as well as disease outcome. Other HLA class II alleles have been associated with IBD onset among populations, such as HLA-DRB1*01:03 with disease predisposition to CD and UC in Europeans, but not in East Asian backgrounds39,40. Further studies are required to identify the direct mechanisms of involvement of these genetic variants on disease susceptibility and outcome.

The present study had several limitations. First, it evaluated HLA-C expression on the PBMC of healthy controls. As the number of normal controls was too small for a definitive conclusion, a larger validation analysis with more subjects is needed to confirm our results. However, earlier studies have used as few as 30 healthy controls31. HLA-C expression in tissue samples should also be analyzed to confirm the molecular mechanisms of HLA-C involvement in IBD. Second, this investigation was preliminary in nature because the numbers of cases and controls were limited. However, we adopted a PBC disease control cohort for the association analysis, whereby the IBD cohort was significantly different from healthy controls as well as from PBC patient controls regarding genetic frequency. Additional investigation is required to validate these newly discovered associations in a larger number of individuals; however, power calculations based on the study subjects of 160 UC patients and 275 CD patients and an effect size of 0.28 at rs9264942 showed sufficient detection power (0.99) at the 0.05 level of significance41. Moreover, the calculated type I and type II errors were 0.001 and 0.004, respectively, which suggested that statistical error could be avoided. Third, the HLA-C*12:02~B*52:01~DRB1*15:02 haplotype should be examined in more detail to strengthen the results of this study. Fourth, this investigation was retrospective in nature and did not include a segregation analysis; prospective longitudinal studies are needed to clarify the associations of genetic polymorphisms with IBD outcomes.

In conclusion, our findings supported that the eQTL SNP at rs9264942 regulated HLA-C expression in the Japanese. Our association study also implicated four SNPs in strong LD of a particular HLA haplotype, HLA-C*12:02~B*52:01~DRB1*15:02, with IBD susceptibility and outcome, which might be potential surrogate markers for the disease.

Patients and methods

Research ethics considerations

This study was conducted in accordance with the principles of the 1975 Declaration of Helsinki and approved by the ethics committees of each participating institution [Shinshu University School of Medicine, Matsumoto, Japan (approval numbers: 639 and 4533), Japanese Red Cross Society Suwa Red Cross Hospital, Suwa, Japan (approval number: 30-18), and Tokyo Yamate Medical Center, Tokyo, Japan (approval number: 188)]. Informed written consent was obtained from all patients and healthy subjects.

Study subjects

A total of 32 healthy control subjects were included for the analysis of HLA-C expression on PBMC (Table 1). The controls were volunteers from hospital staff who had indicated the absence of any major illnesses and no direct familial relations in a standard questionnaire.

A total of 160 patients with UC and 275 patients with CD who were seen between April 2014 and May 2019 at Shinshu University School of Medicine, Matsumoto, Japan, Japanese Red Cross Society Suwa Red Cross Hospital, Suwa, Japan, or Tokyo Yamate Medical Center, Tokyo, Japan, along with 325 healthy subjects including the above-described 32 healthy control subjects were enrolled for this association study (Table 2). A total of 328 PBC patients who were diagnosed as having PBC at Shinshu University Hospital between 1986 and 2015 were also included as a disease control group in this analysis. No participants were direct relatives of each other, and all were of an East-Asian genetic background.

The diagnoses of UC and CD were based on established Japanese Society of Gastroenterology guidelines using endoscopic, histologic, radiographic, and serologic findings42. We also sub-classified the IBD patients into disease parameters based on the phenotype of disease location (colonic, ileal, ileocolonic, and not determined in CD and proctitis, left-sided colitis, pancolitis, and not determined in UC) (Table 2)2,43,44. The introduction of prednisolone (PSL) and anti-TNF-α was recommended by the guidelines for moderate or severe UC or CD42. Oral or intravenous PSL was advised as a remission induction therapy for left-sided colitis or pancolitis in UC, while anti-TNF-α was advocated for induction or maintenance therapy. A PSL-positive status was designated for patients with a history of PSL administration. A PSL- and anti-TNF-α-positive status was designated for patients who had received both PSL and anti-TNF-α.

The diagnosis of PBC was based on the criteria from the Japan Society of Hepatology45. This sub-cohort has been well characterized in prior genetic and clinical studies46,47,48,49,50,51,52,53,54.

Flow cytometry analysis of HLA-C expression on CD3e+CD8a+ lymphocytes

PBMC were obtained one day after the hemolysis of red blood cells in ACK lysing buffer (Thermo Fisher Scientific K.K., Tokyo, Japan) according to the manufacturer’s instructions. PBMC were incubated with antibodies that included 7-AAD (BioLegend, San Diego, USA), APC-anti-CD3e (BioLegend), FITC-anti-CD8a (BioLegend), and PE-anti-HLA-C (BD Biosciences) or PE-HLA-C-isotype control by IgG1 k (BioLegend) for 30 min on ice, washed twice with staining buffer, and then analyzed using a BD FACSCanto™ II flow cytometer (BD Biosciences, New Jersey, USA). HLA-C expression was determined by median fluorescence intensity on CD3e+CD8a+ lymphocytes, CD3e+CD8a lymphocytes, neutrophils, and macrophages in relation to the isotype control. The gating strategy is shown in Fig. 1a. Obtained data were analyzed by FlowJo software (BD Biosciences). All samples were simultaneously stained and analyzed in a one-day experiment.

SNP determination

Genomic DNA from all participants was isolated from whole blood samples using Quick Gene-610L assays (Fujifilm, Tokyo, Japan). The DNA concentration was adjusted to 10–15 ng/μL for SNP genotyping experiments. Two eQTL SNPs at rs9264942 and rs2395471 were selected based on a previous study22,23. Three SNPs at rs2270191, rs3132550, and rs6915986, which were in strong LD with the HLA-C*12:02~B*52:01~DRB1*15:02 haplotype, were also adopted based on an earlier report22. The genotyping of five SNPs located at rs9264942, rs2395471, rs2270191, rs3132550, and rs6915986 was performed with a TaqMan 5′ exonuclease assay using primers supplied by Applied Biosystems (Foster City, CA, USA). The probe’s fluorescence signal was detected with a StepOne Plus Real-Time PCR System (Thermo Fisher Scientific) according to the manufacturer’s instructions.

SNP haplotype analysis

Haploview (version 4.2)55,56 was used to evaluate the haplotype structure of the four SNPs at rs2270191, rs3132550, rs9264942, and rs6915986 (Fig. 2). Pairwise LD patterns and haplotype frequency analysis for all SNPs in patients and controls were analyzed by the block definition established by Gabriel et al.57.
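
For readers unfamiliar with the pairwise LD measure reported in Fig. 2, the following Python sketch shows the standard definition of r² computed from two-SNP haplotype frequencies. It is an illustration only and not the Haploview implementation; the function name `ld_r2`, the haplotype labels, and the example frequencies are hypothetical.

```python
def ld_r2(f):
    """Return r^2 between two biallelic SNPs.

    f maps the haplotypes "AB", "Ab", "aB", "ab" to frequencies summing
    to 1 (hypothetical labels; A/a are alleles at SNP 1, B/b at SNP 2).
    """
    p_a = f["AB"] + f["Ab"]      # frequency of allele A
    p_b = f["AB"] + f["aB"]      # frequency of allele B
    d = f["AB"] - p_a * p_b      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Hypothetical frequencies: complete LD versus no LD.
complete = ld_r2({"AB": 0.3, "Ab": 0.0, "aB": 0.0, "ab": 0.7})
independent = ld_r2({"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25})
```

Multiplying the returned value by 100 gives the percentage scale used in the LD plots.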

Statistical analysis

The significance of associations was evaluated using chi-squared analysis or Fisher’s exact test. p values were subjected to Bonferroni’s correction by multiplication by the number of different alleles observed in each locus (pc). The Mann–Whitney U-test was used to analyze continuous variables. Stepwise logistic regression analysis with a forward approach was performed to identify independent SNPs associated with IBD susceptibility. A two-sided p value of < 0.05 was considered statistically significant. Association strength was estimated by calculating the OR and 95% CI with StatFlex software (version 7.0.3, Artech, Osaka, Japan) as well as IBM SPSS Statistics version 23.0 (IBM, Chicago, IL, USA).
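
As an illustration of the association measures described above, the Python sketch below computes an allelic OR with a 95% CI and applies Bonferroni's correction. Several points are assumptions rather than details of this study: the CI uses the Woolf (logit) method, which may differ from the method implemented in StatFlex or SPSS; the allele counts are back-calculated from the reported sample sizes (160 UC patients and 325 healthy subjects) and C allele frequencies at rs9264942 (49.7% and 35.5%), assuming complete genotyping; and the function names are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Allelic odds ratio with a Woolf (logit) confidence interval.

    a, b: counts of the test allele and the other allele in cases.
    c, d: counts of the test allele and the other allele in controls.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def bonferroni(p, n_comparisons):
    """Corrected p value (pc): p multiplied by the number of comparisons, capped at 1."""
    return min(1.0, p * n_comparisons)


# Approximate counts (assumption): 320 UC alleles, 49.7% C -> 159 C, 161 T;
# 650 control alleles, 35.5% C -> 231 C, 419 T.
or_uc, ci_low, ci_high = odds_ratio_ci(159, 161, 231, 419)
```

With these approximate counts the OR is close to the reported value of 1.79 for UC versus controls; the interval is shown only to illustrate the calculation.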