Introduction

Tuberculosis (TB), caused by Mycobacterium tuberculosis infection, is an infectious disease that is reemerging as a major killer of humans. It has been estimated that one third of the world's population is infected with TB, with more than nine million people becoming sick each year with “active” TB, which can be spread to healthy people1. Most people infected with Mycobacterium tuberculosis do not develop active disease, a status called “latent tuberculosis infection (LTBI)”; only 5–10% eventually develop active tuberculosis, indicating that the host response plays a critical role in the containment of TB infection. Indeed, both animal and clinical studies have demonstrated that cellular immunity, particularly CD4+ T cell immunity, plays an indispensable role in protective immune responses against TB infection2.

So far, a large number of polymorphisms in the human leucocyte antigens (HLA) have been found to be associated with TB susceptibility in a wide range of ethnic populations3. Variations in cytokines and their receptors have also been shown to influence susceptibility to TB. For example, the A874T polymorphism in the first intron of the interferon (IFN)-γ gene was reported to be associated with TB susceptibility in Sicilians4, South Africans5 and Chinese6. Other cytokines, such as IL-1, IL-2, IL-4, IL-6, IL-10, IL-12, TNF-α and TGF-β, have also been associated with TB3. A recent genome-wide association study identified a single nucleotide polymorphism (SNP) in the gene desert region of 18q11.2 that was associated with tuberculosis with a p-value of 6.8×10⁻⁹ in Ghanaian and Gambian populations7.

It was recently reported that IL-17 and IL-22, produced by T helper type 17 (Th17) cells and NK cells, play important roles in the protective immune response to TB8,9. In vitro experiments showed that IL-22 can act directly on human macrophages, improving the cells' protective immunity against TB infection10. IL-22 has also been shown to induce host defense genes and is thus crucial for controlling the gram-negative pulmonary pathogen Klebsiella pneumoniae11. A knock-out study showed that IL-22-null mice had increased intestinal epithelial damage, systemic bacterial burden and mortality after Citrobacter rodentium infection12. It was also reported that IL-22 can inhibit intracellular mycobacterial growth by enhancing phagolysosomal fusion, which contributes to the immune defenses of NK cells against M. tuberculosis10.

We thus hypothesize that genetic variations in the human IL-22 gene play an important role in TB susceptibility. Based on our previous study13, we are interested in the SNPs in the IL-22 gene region that could affect TB susceptibility, particularly the SNPs that could affect the regulation of this gene. To address this hypothesis, we genotyped SNPs across the gene region and evaluated their relationship with TB susceptibility.

We adopted a two-stage approach to genotype the selected SNPs for the experimental cohort and further validated our discovery in a validation cohort. In the first stage, we scanned a relatively large number of SNPs in a limited number of samples to identify the markers with potential interest. We then selected a small number of SNPs that showed promise and typed them in a large number of independent samples. The two-stage scan has been adopted in many genome-wide association studies and has been shown to be effective. In this study, during the first stage, we typed 48 cases and 48 controls, which were randomly selected from the pools. We then selected 6 SNPs that met our selection criteria and typed the remaining 431 cases and 310 controls. We finally genotyped one SNP (rs2227473) in the independent validation cohort (413 cases and 241 controls) to further confirm our study. The biological functions (target gene IL-22's expression) of different genotypes under specific and nonspecific stimulations were investigated and compared.

Results

Reduced susceptibility of the rs2227473A allele in patients with TB in stage I data

In stage I, with 48 cases and 48 controls, the rs2227473 marker, which is located 1756 bp upstream of the IL-22 gene transcription start site (TSS), was in Hardy-Weinberg equilibrium. The minor allele frequency of 0.103 in the controls is comparable to the annotations in HapMap (0.102 for CHB+JPT, 0.120 for YRI and 0.208 for CEU). The ‘A’ allele is associated with decreased TB susceptibility at a p-value of 0.0275, with an odds ratio (OR) of 0.188 (confidence interval (CI) = 0.037–0.967), as shown in Table 1. Haplotype analysis was then performed in the region surrounding this SNP. We found that the haplotype TCATGA, which is made up of SNPs rs2227485, rs2227484, rs2227483, rs2227478, rs2227473 and rs2227472, shows a trend of association with decreased TB susceptibility at a p-value of 4.89E-5 ( Supplementary Table S1 ). We therefore selected these 6 SNPs for genotyping in the second stage.

Table 1 The association of TB susceptibility and the rs2227473 marker in the promoter region of IL-22 in multiple stages of association studies in Chinese.

Reduced susceptibility of the rs2227473A allele in patients with TB in stage II and pooled data

The association of rs2227473 was confirmed in the stage II experimental dataset with 431 cases and 310 controls. As shown in Table 1, the association is significant at a p-value of 0.0343, with OR = 0.694 (95% CI = 0.494–0.975). Analysis of the pooled data (479 cases and 358 controls) showed an even more significant association at a p-value of 0.0086, with OR = 0.653 (95% CI = 0.449–0.896). This SNP was predicted to alter the protein-DNA interactions of several transcription factors (TFs), including three TFs from the UniPROBE database and two TFs from the Jaspar database ( Supplementary Table S2 ).

Marginal association between rs2227473 and TB in validation cohort

The association was further validated by an independent cohort with 413 cases and 241 controls, which were collected after Feb., 2010 (Table 1). The association was marginally significant in this dataset, with p-value of 0.061, OR = 0.702 (95% CI = 0.484–1.107). However, when we combined the experimental and validation cohorts, the association was very significant at a p-value of 0.001 (OR = 0.663, with 95% CI = 0.518–0.847).

Effects of rs2227473 genotypes on IL-22's protein expression

To test whether different genotypes of rs2227473 affect IL-22 expression levels, we stimulated patients' peripheral blood mononuclear cells (PBMCs) with both antigen-nonspecific (anti-CD3/anti-CD28) and Mtb antigen-specific stimuli. Patients with the A allele (GA+AA, n = 29) at rs2227473 had significantly higher IL-22 protein production than those without the A allele (GG, n = 29) under both non-specific (p value = 0.0095) and specific stimulation (p value = 0.0099, Figure 1).

Figure 1

IL-22 production by PBMCs from individuals with polymorphism in IL-22 (rs2227473).

Subjects were grouped by rs2227473 genotype in IL-22 as carrying (GA+AA, n = 29) or not carrying (GG, n = 29) the A allele. Frozen PBMCs were thawed and cultured for 96 hours in the presence of anti-CD3/anti-CD28 (1 µg/ml each) or killed Mtb (20 µg/ml). Supernatant was collected and the concentration of IL-22 was determined by ELISA. IL-22 protein production by PBMCs in response to (A) anti-CD3/anti-CD28 and (B) Mtb. Data are represented as mean ± SEM. P values are indicated.

Other polymorphisms in the IL-22 promoter region that reduce the susceptibility of TB in stage II and pooled data

Among the other five SNPs we genotyped in stage II of the experimental cohort, four showed significant association with TB susceptibility (Table 2). The locations of the four SNPs relative to the IL-22 TSS are rs2227472 (-1851 bp), rs2227478 (-1340 bp), rs2227483 (-894 bp) and rs2227485 (-431 bp). SNP rs2227472 is 95 bp upstream of our major SNP, rs2227473. Its A allele is the major allele in controls, but the minor allele in cases, in both the stage II and pooled data. Similar patterns were observed for two other SNPs, rs2227478 and rs2227485, which are 416 bp and 1325 bp downstream of rs2227473, respectively. All four SNPs have p-values in the range of 0.021–0.044, with ORs ranging from 0.74 to 0.80 (Table 2), for the stage II data. Similar results were observed for the pooled data.

Table 2 The association of TB susceptibility and polymorphism of four SNPs in the promoter region of IL-22 in the second stage and two stages pooled studies in Chinese.

Haplotypes in the IL-22 promoter region that affect the susceptibility of TB in pooled data

We observed very high linkage disequilibrium among the five SNPs in the control population of the pooled data (Figure 2). We thus performed the association study between the haplotypes of these five SNPs and TB susceptibility. As shown in Table 3, haplotype CTTAA (SNPs in the order of rs2227485, rs2227483, rs2227478, rs2227473 and rs2227472) shows very strong association with decreased TB susceptibility, at p-value of 2.12E-6, with OR at 0.04 (95% CI = 0.01–0.35). On the other hand, the haplotype TATGG, which consists of all major alleles, shows significant association with increased TB susceptibility at p-value of 0.01, with OR at 1.49 (95% CI = 1.05–2.13). In the pooled data of the two stages, the p-value is slightly more significant for rs2227478 and a bit less significant for rs2227472, when comparing with stage II alone. The linkage disequilibriums (LDs) of the five SNPs are given in Figure 2, showing moderate LDs.

Table 3 Association of TB susceptibility and IL22 haplotypes in the pooled study in Chinese (479 cases and 358 controls)
Figure 2

Pattern of linkage disequilibrium across the IL22 promoter region in the pooled Chinese population.

The LD pattern is represented by pairwise D' values between SNPs based on genotypes from the pooled study of 479 cases and 358 controls. D' values (×100) for each comparison are given in the squares. Empty squares represent D' values equal to 1.
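The pairwise D' values in the figure can be illustrated with a minimal sketch. It assumes phased haplotype counts for two biallelic SNPs are already available, whereas Haploview estimates haplotype frequencies from unphased genotypes; the function name and the counts are hypothetical and are not taken from the study data.

```python
def ld_stats(n11, n12, n21, n22):
    """Return (D, D', r^2) for two biallelic SNPs from phased haplotype counts.

    n11: haplotypes carrying allele 1 at both SNPs
    n12: allele 1 at SNP 1, allele 2 at SNP 2
    n21: allele 2 at SNP 1, allele 1 at SNP 2
    n22: allele 2 at both SNPs
    """
    total = n11 + n12 + n21 + n22
    p1 = (n11 + n12) / total          # frequency of allele 1 at SNP 1
    q1 = (n11 + n21) / total          # frequency of allele 1 at SNP 2
    d = n11 / total - p1 * q1         # raw disequilibrium coefficient D

    # D' rescales |D| by the largest value it could take
    # given the observed allele frequencies.
    if d >= 0:
        d_max = min(p1 * (1 - q1), (1 - p1) * q1)
    else:
        d_max = min(p1 * q1, (1 - p1) * (1 - q1))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p1 * (1 - p1) * q1 * (1 - q1)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical counts in which one of the four haplotypes is absent.
d, d_prime, r2 = ld_stats(40, 10, 0, 50)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

With these hypothetical counts D' equals 1 although r² is only about 0.67, which shows why a pair of SNPs can have complete D' and still not be interchangeable as tags under an r² threshold.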

The association between SNPs in the IL-22 promoter region and TB susceptibility in pooled data stratified by age at diagnosis (≤25 and >25)

We further stratified the pooled data of the experimental cohort according to the age at which the patients were admitted to the hospital. As shown in Table 4, of the five SNPs showing significant association in the pooled data, three (rs2227472, rs2227483 and rs2227485) showed more significant association in younger patients but not in older patients, and the other two (rs2227473 and rs2227478) showed significant association in older patients but not in younger patients. Notably, SNP rs2227485, which is marginally significant in the pooled data (p-value = 0.049), showed very significant association with decreased TB susceptibility in younger patients (age ≤25 years) at a p-value of 5.2E-5, with OR = 0.496 (95% CI = 0.303–0.663). For the older group (age >25), however, this SNP did not show any association with TB susceptibility. The same holds for SNP rs2227472.

Table 4 TB susceptibility and polymorphism of IL-22 in the pooled data analysis of association studies for different age groups of Chinese

Discussion

IL-22 has been reported to play a role in host defense against bacterial pathogens11. It is involved in host defense at environmental interfaces, such as the mucosal surfaces of the airways and gastrointestinal tract. IL-22 can activate STAT3, a transcription activator that mediates the expression of a variety of genes in response to various cytokines14. It regulates genes that are involved in antimicrobial defense and cellular differentiation15. IL-22 can restrict the growth of M. tuberculosis in macrophages by enhancing phagolysosomal fusion10. However, associations between genetic variations in the IL-22 gene and its promoter region and TB susceptibility have never been reported. In this study, we carried out an association study between the IL-22 gene and TB susceptibility and found several SNPs in the IL-22 promoter region that influence TB susceptibility.

Our approach to SNP selection differs from traditional SNP tagging, in which SNPs are usually selected based only on the linkage disequilibrium in a genomic region. The tagging process focuses on efficiency: it maximizes coverage while minimizing cost (the number of SNPs to genotype)16. After genotyping, one can search for functional SNPs in close LD with the tagged SNPs that show significant associations, as we did previously in a breast cancer scan17. However, this approach has disadvantages. (1) The LD is usually inferred from a limited set of previously genotyped data, such as HapMap, and might be different in the population under study. (2) It ignores the functional role of the SNP, and many tagged SNPs are in gene desert regions and have little biological meaning18,19. Our SNP selection approach focuses on the functional role of the SNPs, particularly those in the promoter region of the gene that might affect transcription factor binding sites. We found this simple approach very effective in identifying functional SNPs for our IL22 promoter study. Among the thirteen regulatory SNPs we selected, five (38.5%) showed significant association with TB susceptibility. This proportion is very high compared with the non-regulatory SNPs, of which we selected seven, none showing significant association. However, the effectiveness of this approach for other genes is unknown and has yet to be tested.

The SNPs we found are biologically meaningful. The SNP rs2227473 alters the putative binding sites of several transcription factors from multiple databases, including Cgd2_3490 and PF14_0633 from the UniPROBE database and RCGCANGCGY and TCCCRNNRTG from the Jaspar database. Both Cgd2_3490 and PF14_0633 are hypothetical transcription factors containing an AP2 domain. Cgd2_3490 is from Cryptosporidium parvum20,21 and PF14_0633 is from Plasmodium falciparum22,23,24. We propose that the two proteins from pathogens can target host genes and regulate IL-22 expression. The variations in the binding sites of the two proteins provide a differential means of regulation. The two motifs, RCGCANGCGY and TCCCRNNRTG, were discovered by scanning the promoters of all human genes for conserved motifs25. RCGCANGCGY matches the binding consensus of the transcription factor NRF1 (nuclear respiratory factor 1), which binds to the cytochrome c promoter26 and is essential for cell survival under oxidative stress-inducing agents27.

Consistent with previous research showing that patients with tuberculosis have a significantly lower Th22 response than healthy donors28, our functional assay indicated that PBMCs from individuals carrying the rs2227473G allele in IL-22, which is associated with susceptibility to tuberculosis, produce significantly less IL-22 in response to polyclonal and Mtb antigen stimulation. Although we are currently uncertain about the mechanism of predisposition to tuberculosis in individuals with the rs2227473G allele, it probably lies at the level of the macrophage as well as the lung epithelial cell. The macrophage is not only the most important immune effector for killing Mycobacterium tuberculosis, but also the niche that hosts it. It is well recognized that Mycobacterium tuberculosis resides in phagosomes within macrophages and that killing these mycobacteria requires fusion of the phagosome with a lysosome. Importantly, Dhiman et al. demonstrated that human IL-22 inhibits the growth of Mycobacterium tuberculosis by enhancing phagolysosomal fusion10. While there is no evidence that IL-22 acts on epithelial cells to kill Mycobacterium tuberculosis, IL-22 is known to stimulate lung epithelial cells to produce β-defensin15,29, which helps to clear mycobacterial infection30. In addition, several studies have shown that IL-22 participates in protective immunity against tuberculosis28,31,32. Taken together, our findings indicate that the rs2227473 polymorphism controls the IL-22 response upon Mycobacterium tuberculosis infection, which in turn determines, at least in part, the outcome of infection.

In summary, our study demonstrates an association between polymorphisms in the promoter of the IL-22 gene and TB susceptibility. The rs2227473 polymorphism and associated SNPs in the promoter region of the IL-22 gene are associated with decreased susceptibility to pulmonary TB in Chinese. The ‘TGTAG’ haplotype, which is present in 45.6% of the Chinese population, is associated with increased susceptibility to TB. The rs2227473 polymorphism affects IL-22 protein expression. Our SNP selection strategy focusing on regulatory SNPs is effective in identifying susceptibility loci for the IL22 gene.

Methods

Subjects and samples

Patients with different manifestations of active TB were recruited at clinics in the Shenzhen Third People's Hospital and Shenzhen Polytechnic College in Shenzhen, China. Healthy adults with no history of TB disease were also recruited as the control group. All participants had received bacillus Calmette-Guérin (BCG) vaccination at birth. We used M. tuberculosis-specific IFN-γ enzyme-linked immunospot (ELISPOT) assay to exclude LTBI from healthy donors and only those who were ELISPOT negative were selected as our healthy controls. The diagnosis of tuberculosis was based on mycobacterium tuberculosis examination, clinical symptoms and chest X-ray examination as described before13. Samples collected before Feb., 2010 were used as our experimental cohort (479 cases and 358 controls) and samples collected after that date were used as our validation cohort (413 cases and 241 controls). The statistics of age distributions and male/female ratios are listed in Supplementary Table S3 . The study obtained ethical approval from the Institutional Review Board of the Shenzhen Third People's Hospital and informed written consent was obtained from all the patients. Clinical specimens from patients with TB were collected within one week after anti-TB treatment. Whole blood was collected by venipuncture from the populations mentioned above.

SNP selection

Since we focus on SNPs with regulatory roles, we are particularly interested in two types of SNPs: (a) SNPs that lie in a putative transcription factor binding site and whose two alleles alter the binding score and (b) SNPs that lie in a putative microRNA target site and whose two alleles alter the miRNA-target interaction score.

To search for SNPs whose alleles alter putative transcription factor binding sites, we scanned the promoter region of the gene with annotated Position Weight Matrices (PWMs) from the Jaspar33, UniPROBE34 and TRANSFAC35 databases. We first obtained the gene coordinates (txStart and txEnd) for human genome hg18 (NCBI36) from the RefSeq table of the UCSC table browser (http://genome.ucsc.edu/). For the annotated SNPs in dbSNP129 that lie within 2000 bp upstream to 500 bp downstream of the IL-22 gene, we extracted their flanking sequences (±25 bp) from the dbSNP website (http://www.ncbi.nlm.nih.gov/projects/SNP/). For each SNP, we kept the flanking sequence the same and changed the alleles, thus obtaining one sequence for each allele and forming a sequence set. We then used the PWM_SCAN algorithm36 to scan each sequence in the set to test whether it had a putative binding site (PBS), using the method we described previously37. We considered sites with a p-value ≤ 0.001 as PBSs. If the SNP was within the PBS and the two alleles gave different p-values, we calculated the ratio (Sr) of the p-values by dividing the larger p-value by the smaller one. Sr measures how much the two alleles in the PBS change the binding score between the PBS and the putative binding protein. We then converted this Sr into a p-value based on a background distribution from a permutation test, using the FastPval program38. If an SNP had an Sr with p-value < 0.01, we considered the SNP functional and selected it for genotyping.
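The allele-swapping step can be sketched as follows. This is a simplified illustration, not the PWM_SCAN or FastPval procedure: it compares raw log-odds scores on the forward strand only, rather than the site p-values and the permutation-based Sr p-value described above. The three-position matrix, the flanking sequences and all function names are hypothetical.

```python
import math

BASES = "ACGT"


def allele_sequences(left_flank, alleles, right_flank):
    """Build one sequence per allele, keeping the flanks identical."""
    return {allele: left_flank + allele + right_flank for allele in alleles}


def log_odds_pwm(prob_rows, background=0.25):
    """Convert per-position base probabilities into log2-odds scores."""
    return [{b: math.log2(row[b] / background) for b in BASES}
            for row in prob_rows]


def best_site_score(seq, pwm, snp_index):
    """Best PWM score among forward-strand windows that cover the SNP."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    best = -math.inf
    for start in range(first, last + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best


# Hypothetical toy matrix with consensus "AGT" (not a database entry).
toy_pwm = log_odds_pwm([
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
])

# Hypothetical flanks; the SNP sits at index len(left_flank).
left, right = "CCA", "TCC"
seqs = allele_sequences(left, ["G", "A"], right)
scores = {a: best_site_score(s, toy_pwm, len(left)) for a, s in seqs.items()}
for allele, score in scores.items():
    print(allele, round(score, 3))
```

In this toy case the G allele completes the consensus and scores higher than the A allele; in the pipeline described above, the corresponding comparison is made on p-values, and only SNPs whose Sr passes the permutation threshold are retained.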

To search for SNPs whose alleles alter putative microRNA target sites, we first searched the PITA database for microRNAs that could target the 3′ UTR of the IL-22 gene. For each SNP in a target site, we obtained its flanking sequence (±50 bp) from the UCSC genome browser according to its genomic location. We then changed the polymorphic site and obtained one sequence for each of the alleles. First, we used RNAhybrid to test whether the change of alleles also changed the microRNA-target interaction. RNAhybrid calculates the minimum hybridization energy between the microRNA and its target. If the alleles at the polymorphic site change this energy significantly, they might enhance or repress the regulatory function of the microRNA and thus play a role in the disease. Second, we used RNAfold to calculate the thermodynamics of the putative target site. If the alleles change the thermodynamics of the target, this might affect the chance of the microRNA binding to its target and thus affect the function of the microRNA. SNPs with statistical significance in either target thermodynamics or microRNA-target interaction were selected for genotyping.

In addition to the regulatory SNPs, we also used the traditional tagging procedure to select SNPs that cover the IL-22 gene39. We chose the CHB+JPT HapMap panel, included our regulatory SNPs, set r² = 0.8 and used default settings for all other parameters. We also selected SNPs in the IL-22 gene region that are reported to be associated with disease. The functional, tagged and selected SNPs are shown in Supplementary Table S4 . The consensus sequences of the putative binding sites of each selected SNP are shown in Supplementary Table S2 .

SNP genotyping

Genomic DNA was extracted with a DNA isolation kit (Qiagen Inc., Germany), following the manufacturer's instructions. SNPs were typed on a high-throughput Sequenom® genotyping platform (San Diego, CA, USA). The genotypes were determined by a homogeneous MassEXTEND assay. The MassARRAY Assay Design software was used to design allele-specific extension primers. The primers designed for each selected SNP are listed in Supplementary Table S5 . The genotyping for the experimental cohort was carried out in Shenzhen and the genotyping for the validation cohort was carried out in both Shenzhen and Hong Kong, with concordance >98%.

Statistical analysis

We used PLINK v1.07 and Haploview v4.0 for all analyses. At each stage, SNPs that failed the Hardy-Weinberg equilibrium test (p-value < 0.05) were removed from further analysis. We used allelic association tests to compare the frequencies of SNP alleles in cases and controls. ORs and 95% CIs were estimated using logistic regression models. We performed allelic association tests for different age groups (≤25 and >25). The haplotype association test was performed in the region surrounding the rs2227473 marker. SNPs or haplotypes with p-values < 0.05 were considered significant.
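The tests above can be illustrated with a minimal standard-library sketch. It is not the PLINK implementation: the Hardy-Weinberg check here is a chi-square goodness-of-fit test (PLINK's own test may differ, for example by using an exact test), and the odds ratio uses the 2×2 allele table with a Wald interval, which for a single binary predictor coincides with the estimate from a logistic regression. All function names and counts are hypothetical.

```python
import math


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2_1df_pvalue(chi2)


def allelic_test(case_minor, case_major, ctrl_minor, ctrl_major, z=1.96):
    """Allelic chi-square test plus odds ratio with a Wald 95% CI."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return chi2, chi2_1df_pvalue(chi2), odds_ratio, (lower, upper)


# Hypothetical allele counts (not study data): minor/major in cases, controls.
chi2, p_value, odds_ratio, ci = allelic_test(10, 90, 20, 80)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}, "
      f"OR = {odds_ratio:.3f}, 95% CI = {ci[0]:.3f}-{ci[1]:.3f}")
```

An OR below 1 in this layout means the minor allele is less frequent in cases than in controls, which is the direction reported for the rs2227473 A allele.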

Quantifications of IL-22 under specific and non-specific stimuli of PBMCs

Peripheral blood samples (5 ml) were collected from TB patients and PBMCs were isolated with Ficoll lymphocyte separation medium. PBMCs were then seeded into a 96-well plate at a cell density of 4×10⁵. Monoclonal anti-CD3 and anti-CD28 antibodies (1 µg/ml each) were added for non-specific stimulation and Mycobacterium tuberculosis (Mtb) lysate (20 µg/ml) was added for specific stimulation. The stimulated PBMCs were incubated at 37°C, 5% CO2 for 96 h. Cell supernatants were used for IL-22 protein quantification by ELISA.