A rare variant at 11p13 is associated with tuberculosis susceptibility in the Han Chinese population

Genome-wide association studies (GWASs) have yet to be conducted for tuberculosis (TB) susceptibility in China. Two single nucleotide polymorphisms (SNPs) previously identified in tuberculosis GWASs, rs2057178 and rs4331426, were evaluated for TB predisposition. The associations between the SNPs and gene expression levels were analyzed using the genomic data and corresponding whole-genome expression data of the Han Chinese in Beijing, China. Genotyping was successfully completed for 763 pulmonary TB patients and 763 healthy controls. The T allele of the rare variant rs2057178 was significantly associated with TB predisposition (χ2 = 14.07, P = 0.0002). Meanwhile, the CT genotype of rs2057178 was associated with a decreased risk of TB (adjusted OR = 0.52, 95% CI, 0.34–0.78). The CT genotype of rs2057178 was also associated with decreased expression levels of the infection-related gene suppressor of cytokine signaling 2 (SOCS2) and increased expression levels of v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog B (MAFB). No gene expression levels were found to be associated with the genotype of rs4331426. We found that the rare variant rs2057178 was significantly associated with TB in the Han Chinese population. Moreover, the expression levels of MAFB and SOCS2 correlated with rs2057178 and might be potential candidates for assessing TB susceptibility.

Two SNPs, rs2057178 and rs4331426, which were identified in previous GWASs, were selected to verify their association with TB predisposition 10,11 . More importantly, the suggestion of Wilkinson 13 was adopted: the latent TB infection status of the control group was tested to control for exposure to M. tb infection in this study. Meanwhile, the Gene Expression Omnibus database was used to explore the relationship between the selected SNPs and whole-genome mRNA expression levels 14,15 . Lymphoblastoid cell lines, which carry the complete set of germline genetic material, have been instrumental as a source of biomolecules and as a system for carrying out various immunological and epidemiological studies 16 . Dissemination of M. tb in infected persons may be connected with the initiation of adaptive immune responses, which are under strict host genetic control 17 . The transcriptome of lymphoblastoid cell lines may be closely associated with disease-associated genetic variants. Thus, we consider that specific gene expression levels in lymphoblastoid cell lines may be closely regulated by host genomic variants. RNAs extracted from lymphoblastoid cell lines of the 45 unrelated CHB of the HapMap Project were quantified to explore the relationship between the selected SNPs and the expression levels of potential TB susceptibility genes.

Methods
Study subjects. This case-control study was carried out in two designated hospitals with TB control programs in Jiangsu Province, eastern China. One was the TB control center of Danyang County, and the other was the Nanjing Chest Hospital in the province's capital city. Incident TB cases of Han ethnicity registered in these two hospitals from July 1st, 2013 to December 31st, 2014 were recruited for inclusion as cases in this study. All enrolled cases were bacteriologically confirmed by Lowenstein-Jensen (LJ) culture, and M. tb was identified using the p-nitrobenzoic acid (PNB) method. Meanwhile, the healthy controls were recruited from two communities from Danyang County during the same study period. All control candidates underwent X-ray examination. Sputum culture was provided if the potential controls reported having TB-like clinical symptoms. Only subjects with normal X-ray manifestation, negative LJ culture, if tested, and no comorbidity with other infectious diseases (such as HIV/AIDS and hepatitis B virus) were eligible as healthy controls. All of the controls were of Han ethnicity, and they were 1:1 matched to the cases by age (± 5 years) and gender. In total, 764 TB cases and 764 healthy controls were recruited for the study. All experimental protocols in this study were approved by the Institutional Review Board of the Center for Disease Control and Prevention of Jiangsu Province, and written informed consent was obtained from each participant before the study. Additionally, all of the methods in this study were carried out in accordance with the approved guidelines.

Interferon-Gamma Release Assay. The interferon-gamma release assay (QuantiFERON-TB Gold In-Tube [QFT; Qiagen, Valencia, CA, USA]) was used to test the latent TB infection (LTBI) status of the controls. QFT was performed according to the instructions provided by Qiagen 18 .
Genotyping of rs2057178 and rs4331426. The restriction fragment length polymorphism (RFLP) method was used for the genotyping. The sequences of the primers used to amplify the PCR fragment of rs2057178 were as follows: 5′-TCC ATT GGC CTG AAC TGG AT-3′ (forward); 5′-TGG CCT CCA GTT CTT TAG CA-3′ (reverse). A 186 base pair PCR fragment was amplified by the primers. The restriction endonuclease enzyme StuI (New England BioLabs, inc., Ipswich, MA, USA) was used to digest the PCR fragment. The presence of the C allele results in two fragments: one fragment of 125 base pairs in length and one fragment of 61 base pairs in length. The presence of the T allele results in a single fragment of 186 base pairs in length. The PCR amplification fragment for rs4331426 was 250 base pairs in length, and the sequences of the amplification primers were as follows: 5′-AAG GGT GTT GTT CTG TTT CTA GA-3′ (forward), 5′-TGT TGC ACC ACC TCT TGT AGA-3′ (reverse). The restriction endonuclease enzyme HhaI (New England BioLabs, inc., Ipswich, MA, USA) was used to digest the PCR fragment. The presence of the G allele results in two fragments: one fragment of 202 base pairs in length and one fragment of 48 base pairs in length. The presence of the A allele results in a single fragment of 250 base pairs in length.
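The digestion patterns described above translate directly into a genotype-calling rule. The following is a minimal sketch of that rule, not the authors' procedure: the fragment lengths come from the text, while the function name `call_genotype`, the `ASSAYS` table, and the 5 bp sizing tolerance are hypothetical choices made for illustration.

```python
# Fragment lengths (bp) are taken from the text; all names and the
# tolerance value are illustrative assumptions.
ASSAYS = {
    "rs2057178": {"uncut": 186, "cut": (125, 61), "cut_allele": "C", "uncut_allele": "T"},
    "rs4331426": {"uncut": 250, "cut": (202, 48), "cut_allele": "G", "uncut_allele": "A"},
}


def call_genotype(snp, bands, tolerance=5):
    """Return a genotype string from observed band sizes, or None if no call."""
    assay = ASSAYS[snp]

    def seen(size):
        # A band counts as present if any observed size is within tolerance.
        return any(abs(b - size) <= tolerance for b in bands)

    has_uncut = seen(assay["uncut"])
    has_cut = all(seen(s) for s in assay["cut"])

    if has_uncut and has_cut:
        # Both patterns present: heterozygote, alleles in alphabetical order.
        return "".join(sorted(assay["cut_allele"] + assay["uncut_allele"]))
    if has_cut:
        return assay["cut_allele"] * 2
    if has_uncut:
        return assay["uncut_allele"] * 2
    return None  # pattern matches neither allele, treated as a failed call
```

For example, `call_genotype("rs2057178", [186, 125, 61])` yields a heterozygous call, whereas a single 186 bp band yields the TT genotype. In practice the shortest fragments may be faint on a gel, so a real workflow might relax the requirement that every digested fragment be visible.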

Genotypic data of rs2057178 and rs4331426 of the 45 Han Chinese in Beijing (CHB) from the HapMap Project and whole-genome expression levels from the Gene Expression Omnibus of PubMed. The genotypic data of rs2057178 and rs4331426 were extracted from the HapMap Genome Browser Release #28 (phases 1, 2 & 3-merged genotypes and frequencies), and the genotypes of each SNP for the 139 CHB individuals were derived from this database. The DNA samples were prepared from blood samples collected from individuals living in the residential community at Beijing Normal University. All of the samples are from unrelated individuals who identified themselves as having at least three out of four Han Chinese grandparents. Finally, 45 CHB provided validated genotypic data for the two SNPs. Using mRNAs extracted from lymphoblastoid cell lines of the corresponding 45 CHB, the Gene Expression Omnibus (GEO) of PubMed (accession number GSE6536) was used to analyze the relationship between the two SNPs and the whole-genome mRNA expression levels of the 47,293 genes 14,15 .

Statistics. An unpaired Student's t test was applied to numerical variables, whereas differences in categorical variables were tested using the χ2 test. The Cochran-Armitage trend test was used to compare the genotype dosage among the TB cases and controls. Hardy-Weinberg equilibrium (HWE) was assessed by the Pearson χ2 test. The strength of the associations between genotypes and TB was estimated by the odds ratio (OR) and its 95% confidence interval (95% CI) through univariate and multivariate logistic regression analyses adjusted for age and gender. A P value of less than 0.05 was considered statistically significant. The relationships between the genotypes of the SNPs and the gene expression levels of the 45 CHB were analyzed with the online tool GEO2R, which is based on the moderated t test. Meanwhile, the ordinary t test was also applied to analyze the relationship between the genotypes and gene expression. The Benjamini-Hochberg false discovery rate procedure was adopted for multiple comparison correction 19 . The significance level for gene expression among different genotype groups was P < 0.20 20 .
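Because 47,293 expression measurements are tested per SNP, the multiple comparison step matters a great deal for which genes are reported. The sketch below shows the standard Benjamini-Hochberg step-up adjustment in plain Python. It is an illustration of the general procedure only; the function name `benjamini_hochberg` is hypothetical, and the code is not taken from GEO2R or from the authors' analysis.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the rank decreases (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under the threshold stated above, a gene would be retained when its adjusted value falls below 0.20. With the illustrative input `[0.01, 0.04, 0.03, 0.005]`, the adjusted values are 0.02, 0.04, 0.04 and 0.02, respectively.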

Results
A total of 763 TB cases and 763 controls were included in this analysis, with one case and one control failing the genotyping. The mean ages of the TB cases and controls were 49.17 ± 17.48 years and 52.03 ± 17.33 years, respectively. The QFT results showed that the positive rate of LTBI for the controls was 23.1% (176/763). As shown in Table 1, the minor allele (T allele) frequency of rs2057178 was 0.027 in the TB cases and 0.048 in the healthy controls (χ2 = 14.07, P = 0.0002). Because the minor allele frequency of rs2057178 was below 0.05 and the TT genotype was found in only six subjects, the TT genotype showed a decreased risk of TB that did not reach significance (adjusted OR = 0.56, 95% CI, 0.10-3.10). However, the CT genotype was significantly associated with a decreased risk of TB (adjusted OR = 0.52, 95% CI, 0.34-0.78). The dominant model (CT + TT vs. CC) demonstrated a protective effect on TB (adjusted OR = 0.52, 95% CI, 0.35-0.78). Based on the T allele frequency (0.048) of rs2057178 and the estimated TB prevalence (51/100000) in this region 21 , the corresponding power for the dominant OR of rs2057178 was 87.77%. For SNP rs4331426, the minor allele (G allele) frequencies in the TB cases and controls showed no statistically significant difference (χ2 = 0.04, P = 0.8390). The genotypes of each locus in the controls were in Hardy-Weinberg equilibrium (χ2 = 2.79, P = 0.095 for rs2057178 and χ2 = 0.142, P = 0.704 for rs4331426).
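The Hardy-Weinberg check reported above can be illustrated with a short calculation. The sketch below is a generic Pearson χ2 test for a biallelic locus with one degree of freedom, written with the standard library only; the function names are hypothetical, and because the genotype counts of Table 1 are not reproduced here, any counts passed to it are the reader's own.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test of Hardy-Weinberg equilibrium.

    Arguments are the observed counts of the two homozygotes and the
    heterozygote. Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, chi2_sf_1df(chi2)
```

As a consistency check on the reported statistics, `chi2_sf_1df(2.79)` evaluates to roughly 0.095, which agrees with the P value given for rs2057178 under one degree of freedom.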
Then, the controls were classified as QFT-positive or QFT-negative to compare the genotype distributions of the two variants among the TB cases, non-infected controls and LTBI controls. The data in Table 2 show that, when the TB cases were compared with the QFT-positive controls, the CT genotype of rs2057178 was significantly associated with 60% lower odds of TB (adjusted OR = 0.40, 95% CI, 0.22-0.70). The protective effect of the CT genotype was also observed in the comparison with the QFT-negative controls (adjusted OR = 0.57, 95% CI, 0.36-0.90). The Cochran-Armitage trend test showed that the proportion of the CT genotype increased from the QFT-negative controls (8.0%) to the QFT-positive controls (11.4%, P trend = 0.0006). No associations of rs4331426 with TB were found between the TB cases and either subgroup of the controls.
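For readers unfamiliar with the trend test used above, the following is a minimal sketch of the Cochran-Armitage statistic for a binary outcome across ordered groups, using the usual binomial variance and equally spaced scores by default. The function name and its arguments are hypothetical, and the sketch is not the authors' code. No counts from Table 2 are assumed.

```python
import math


def cochran_armitage(events, totals, scores=None):
    """Cochran-Armitage test for trend in proportions across ordered groups.

    events[i] is the number of subjects with the outcome (e.g. carrying a
    given genotype) in group i, totals[i] is the group size, and scores[i]
    is the ordinal score of the group (defaults to 0, 1, 2, ...).
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    if scores is None:
        scores = list(range(len(totals)))
    n_total = sum(totals)
    p_bar = sum(events) / n_total  # pooled proportion under the null
    # Score-weighted sum of observed minus expected events.
    t_stat = sum(s * (r - n * p_bar) for s, r, n in zip(scores, events, totals))
    sum_nt = sum(n * s for s, n in zip(scores, totals))
    sum_nt2 = sum(n * s * s for s, n in zip(scores, totals))
    variance = p_bar * (1.0 - p_bar) * (sum_nt2 - sum_nt ** 2 / n_total)
    chi2 = t_stat ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With purely made-up counts of 10, 20 and 30 events in three groups of 100 subjects each, the statistic is 12.5 and the P value is below 0.001; identical proportions in every group give a statistic of zero.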
Linear regression analysis was conducted between the genotypes of the two SNPs and the whole-genome mRNA expression levels in the lymphoblastoid cell lines. The 45 CHB were classified into two groups based on the genotypic data of rs2057178 and rs4331426. For rs2057178, 42 CHB had the CC genotype, two CHB had the CT genotype and one CHB failed the genotyping (no TT genotype was found). For rs4331426, 39 CHB had the AA genotype, four CHB had the AG genotype and two CHB failed the genotyping (no GG genotype was found). GEO2R analyzed the expression levels of 47,293 genes in each CHB subject, and 28 genes showed significantly different expression levels between the two genotype groups of rs2057178 by the moderated t test after multiple comparison correction (Table 3). However, the ordinary t test found that only the first 20 genes were related to SNP rs2057178. Thus, we retained the first 20 genes, which reached significance in both tests. No gene expression levels were found to be associated with the genotypes of rs4331426 after multiple comparison adjustment (data not shown).

Discussion
In this case-control study, the T allele of SNP rs2057178 was significantly associated with a decreased risk of TB in the Han Chinese population, and 28 mRNA levels were found to be associated with the genotypes of rs2057178 in the 42 CHB, which indicated a potential functional role of rs2057178 in modulating the expression of those genes.
However, no genotype of rs4331426 was found to be associated with TB susceptibility, and no gene expression levels were associated with any genotype of rs4331426. Thye et al. first reported that rs4331426 was associated with TB susceptibility in a GWAS of the African population in 2010 11 . The HapMap data showed that the G allele frequency of rs4331426 in the Han Chinese population was 0.044, whereas the G allele frequency in the African population was 0.51. Given the vast difference in G allele frequencies between the African and Asian populations, the results of replication studies might be expected to differ. A replication study conducted in the Chinese population by Wang et al. found that the G allele of rs4331426 had an effect on TB susceptibility 22 opposite to that reported by Thye et al. The G allele frequency was 0.0338 in the control group of Wang's study and 0.0301 in our control group, both close to the G allele frequency of 0.044 for the CHB in the HapMap data. Based on our data, we did not find any association between the genotypes of rs4331426 and TB predisposition. Two other association studies conducted in the Chinese population also found no relationship between rs4331426 and TB risk 23,24 . As SNP rs4331426 is located in a gene desert region, it is difficult to determine the function of the locus. It has generally been postulated that other functional loci in linkage with rs4331426 are the target loci involved in the mechanism of predisposition to TB. In this study, the analysis of the genotypes of rs4331426 against whole-genome expression levels indicated that no gene expression levels were correlated with the genotypes of rs4331426 after multiple comparison adjustment.
A subsequent GWAS by Thye et al. revealed that rs2057178 was associated with TB susceptibility 10 . In that study, the effect of the T allele on TB susceptibility was further verified in the Gambian and Russian populations. However, the association between rs2057178 and the predisposition to TB failed to replicate in the Indonesian population. Another study conducted in an Asian population also failed to replicate the protective effect of the T allele of rs2057178 on TB susceptibility 23 . Interestingly, a recent GWAS conducted in the African population also revealed a relationship between rs2057178 and TB susceptibility 9 . For the populations discussed above, the HapMap data showed that the T allele frequency of rs2057178 varied widely; it was highest in the African population (0.33) and lowest in the Asian population (0.02). Inter-population heterogeneity therefore cannot be ignored in genetic susceptibility to TB. Nevertheless, in this study, SNP rs2057178 was found to be significantly associated with TB susceptibility in the Han Chinese population. When the control group was stratified into QFT-positive and QFT-negative groups, the proportion of the CT genotype in the QFT-positive group (11.4%) was higher than that in the QFT-negative group (8.0%). The Cochran-Armitage trend test indicated that QFT-positive individuals with the CT genotype would be more resistant to TB, which suggested that the T allele of rs2057178 might protect people latently infected with M. tb from developing TB disease. Although the locus was associated with TB susceptibility, its functional role was not clearly determined because the SNP is located in an intergenic region. SNP rs2057178 lies 45 kb downstream of the Wilms' tumor 1 (WT1) gene, which has been shown to be associated with the occurrence of Wilms' tumor 25 .
It was reported that WT1 variants might play a role in altering the effects of interferon-beta on vitamin D 26 , which had been shown to be beneficial in the treatment of TB 27 . Meanwhile, the WT1 gene was involved in the activation of the vitamin D receptor 28 , which was critically important for binding with 1,25-dihydroxyvitamin D3 to modulate the immune system in fighting M. tb infection 29 .
Although the HapMap genotypic data of rs2057178 and the whole-genome expression levels of the 42 CHB did not reveal a significant association between the genotypes of rs2057178 and WT1 gene expression, the expression levels of 20 other genes were found to be significantly associated with rs2057178. More importantly, two genes implicated in infectious diseases, v-maf avian musculoaponeurotic fibrosarcoma oncogene homolog B (MAFB, Fig. 1) and suppressor of cytokine signaling 2 (SOCS2, Fig. 2), were found to be associated with SNP rs2057178. According to the fold change (FC) in gene expression, MAFB was upregulated by 31%, while SOCS2 was downregulated by 57%. MAFB was first reported to be a candidate gene for TB susceptibility in a GWAS by Mahasirimongkol et al. 30 , and the expression level of MAFB was found to be higher in patients with active TB compared with healthy controls and previous TB cases 31 . Our study provided evidence that rs2057178 modulates the TB susceptibility gene MAFB in trans, and the mechanism of modulation needs to be further explored. Simultaneously, the SOCS2 expression level was significantly decreased in CHB with the CT genotype of rs2057178 compared with those with the CC genotype. A previous study showed that SOCS2 was required to mediate the effects of lipoxin 32 , which was thought to negatively regulate protective Th1 responses against mycobacterial infection in vivo 33 .
Table 2. The genotype distributions of rs2057178 and rs4331426 between tuberculosis cases and IGRA-positive and IGRA-negative controls. * Adjusted by age and gender.
Even though interferon regulatory factor 5 (IRF5, Fig. 3) was not found to be associated with SNP rs2057178 by the ordinary t test, and the FC showed that IRF5 was only slightly decreased (14%) in the CT genotype of rs2057178, IRF5 has an important role in the type 1 interferon response to M. tb 34 . A relationship may exist between the protective effect of the CT genotype of rs2057178 and the decreased expression level of IRF5, and the level of IRF5 needs to be further validated in TB cases and controls.
Several limitations of this study need to be noted. First, the QFT method is an indirect method for detecting latent TB infection, and it may not accurately represent the existence of M. tb in vivo because we do not know how long the immunological reaction to M. tb will last. However, compared with the tuberculin skin test, the QFT is better able to distinguish infection induced by M. tb from infection by other Mycobacteria. Second, the transcriptome varies considerably across different cell populations and developmental stages. A previous study revealed different cell-type-associated gene expression profiles in tuberculosis 35 . Some researchers even found that the interferon-inducible genes were predominantly expressed in neutrophils and, to some extent, in monocytes, but not in T cells 36 . Gene expression levels in other cell types should be evaluated to comprehensively reveal the potentially distinct gene expression profiles of different cell populations. Third, with the limited sample size, the associations between SNPs and gene expression levels may be confounded by other factors, and it is worthwhile to compare the actual mRNA expression levels among TB cases and healthy controls with larger samples in future studies.
In conclusion, we evaluated two loci from TB GWASs in the Han Chinese population. We found that rs2057178 was significantly associated with TB predisposition and that the expression levels of MAFB and SOCS2 were significantly associated with the genotypes of rs2057178. We suggest that MAFB and SOCS2 could be potential candidate genes for TB susceptibility in the Han Chinese population. Further functional studies are required to reveal the mechanism by which host genetics affects TB susceptibility. Additionally, linear regression analysis of the association between SNP genotypes and gene expression levels could be one option for exploring the potential functional role of disease predisposition loci.