Identification of novel genetic loci GAL3ST4 and CHGB involved in susceptibility to leprosy

Leprosy has long been thought to have a strong genetic component. So far, only positional cloning and genome-wide association studies have been used to study genetic susceptibility to leprosy, while whole exome sequencing (WES) has not yet been applied. In this study, we applied WES to four leprosy patients and four healthy control relatives from two leprosy families. We found three new susceptibility loci for leprosy, one in GAL3ST4 and two in CHGB. We then validated the WES findings by Sanger sequencing in 151 leprosy cases and 226 healthy controls. When stratified by gender, GAL3ST4 was a susceptibility gene only in the female population, whereas CHGB48 and CHGB23 were susceptibility loci in the male population. Moreover, the expression levels of the three susceptibility loci were measured by real-time PCR after stimulation with M. leprae antigens in peripheral blood mononuclear cells (PBMCs) from 69 healthy people. Female subjects with the high-frequency genotype in GAL3ST4 had a fivefold elevated expression. We suggest that polymorphisms in GAL3ST4 are associated with increased risk of leprosy in different populations.

Leprosy is a chronic infectious disease caused by Mycobacterium leprae, and about 200,000 cases are reported each year [1]. The clinical features of leprosy differ greatly among individuals, and previous studies showed that the widely different clinical manifestations of leprosy contrast with the low variability of the bacillus [2]. This suggests that host genetic variation may play the more important role in the pathogenesis of the disease [3,4]. One widely used method for exploring differences in the human genetic component is the study of single-nucleotide polymorphism (SNP) variation, and previous studies have identified certain associations between SNP variations and host susceptibility to leprosy [5,6]. Recent genome-wide association studies (GWAS) on leprosy patients have identified several susceptibility loci (CCDC122, LACC1 (C13orf31), NOD2, TNFSF15, RIPK2, HLA-DR, HLA-DQ, IL-23R) and the RAB32 locus in the Chinese population [7][8][9], which indicates the importance of host susceptibility in protection against M. leprae. Although the clinical progression of the disease may be associated with certain genes such as PARK2-PACRG [10], few of these associations have been confirmed in different populations [11].
Whole exome sequencing is a powerful new strategy for discovering causative genes in rare Mendelian disorders [12]. Recently, this technology, combined with a filtering methodology, was demonstrated as an approach to identify susceptibility genes in many genetic diseases [13]. Moreover, many genetic variants of common diseases such as diabetes, hypertension and tumors were found by this approach [14][15][16]. However, this method has rarely been applied to infectious diseases. In this study, we enrolled both leprosy patients and healthy controls within leprosy families to further study host genetic variations and their associations with the disease using whole exome sequencing.
GAL3ST4, CHGB48, CHGB23, GLT8D2 and ANKRD35 were detected more frequently than the other ten variants among the remaining six patients from the two leprosy families by gene sequencing (data not shown). We found no significant difference in the frequency of these five variants between patients and healthy relatives (data not shown). However, three SNP loci (GAL3ST4, CHGB48 and CHGB23) differed significantly in frequency between the leprosy patients and healthy controls from the database of the 1000 Hapmap project (Table 2).

GAL3ST4 polymorphisms in cases and controls.
We further expanded testing of the GAL3ST4 (rs3823646) variant to 151 cases and 226 endemic healthy people to verify whether the difference existed. Although no difference in GAL3ST4 (rs3823646) was found between the patients and controls overall (Table 3), the GAL3ST4 polymorphism differed significantly between leprosy patients and healthy controls in the female population. The frequencies of the AG and GG genotypes in female leprosy patients (50% and 25%, respectively) were higher than in female controls (39.4% and 13.2%, respectively) (OR = 3.16, P = 0.018; OR = 3.75, P = 0.027, respectively). Furthermore, the frequency of the G allele in female leprosy patients (50%) was higher than in female controls (32.9%) (OR = 1.68, P = 0.004) (Table 3). These results indicated that GAL3ST4 might be a susceptibility gene in the female leprosy population. Genotype distributions in all leprosy patients and healthy controls conformed to Hardy-Weinberg equilibrium (p > 0.05), indicating that the sample was representative.

Gene expression of GAL3ST4 in different genotypes of female population.
To assess whether the effect of the three GAL3ST4 genotypes on leprosy susceptibility in female patients is exerted through changes in GAL3ST4 expression, we performed an in vitro M. leprae antigen stimulation assay. PBMCs from 28 female healthy subjects (12 AA, 4 GG, and 12 AG) were stimulated with or without M. leprae antigens for 12 hours, and the expression level of GAL3ST4 mRNA was quantified by real-time PCR. As shown in Fig. 1, significantly higher GAL3ST4 expression was observed following in vitro antigen stimulation in GG homozygotes than in AA homozygotes (p = 0.018) or AG heterozygotes (p = 0.006). These results suggest that women carrying the G allele may be more readily infected with M. leprae through altered expression of GAL3ST4 in monocytes/macrophages. Therefore, we postulate that a positively selected polymorphism in the GAL3ST4 exon region might be associated with susceptibility to M. leprae infection in the female population by upregulating GAL3ST4 expression.

There was no significant difference in CHGB48 polymorphisms between the patients and the controls overall (Table 4). However, the CHGB48 polymorphism differed significantly between leprosy patients and healthy controls in the male population. The frequency of the GC genotype in male leprosy patients (50%) was higher than in male controls (39.4%) (OR = 2.85, P = 0.011). Meanwhile, the frequency of the CC genotype in male leprosy patients (16.3%) was lower than in male controls (22.3%) (OR = 0.38, P = 0.03). These results indicated that the G allele of CHGB48 might be a susceptibility site for leprosy in the male population. The Hardy-Weinberg equilibrium test for all leprosy patients and healthy controls indicated that the samples were representative of the population.
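The G allele frequencies quoted above for GAL3ST4 in women follow directly from the genotype frequencies: each AG heterozygote carries one G allele out of two, and each GG homozygote carries two. The short Python sketch below reproduces that arithmetic using only the percentages reported in the text; the function name `allele_freq` is illustrative and not part of the study.

```python
def allele_freq(het_pct, hom_pct):
    """Allele frequency (%) from heterozygote and homozygote genotype frequencies (%).

    Heterozygotes contribute half of their frequency, homozygotes all of it.
    """
    return 0.5 * het_pct + hom_pct

# Genotype frequencies reported in the text for GAL3ST4 (rs3823646) in women
female_cases = allele_freq(het_pct=50.0, hom_pct=25.0)      # AG 50%, GG 25%
female_controls = allele_freq(het_pct=39.4, hom_pct=13.2)   # AG 39.4%, GG 13.2%

print(round(female_cases, 1), round(female_controls, 1))  # 50.0 32.9
```

Both values agree with the allele frequencies given in the text (50% and 32.9%).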

Gene expression of CHGB48 in different genotypes of male population.
To explore whether the effect of the three CHGB48 genotypes on leprosy susceptibility in male patients is exerted through changes in CHGB48 expression, we conducted an in vitro M. leprae antigen stimulation assay. Specifically, PBMCs from 28 male healthy subjects (12 GG, 5 CC, and 11 CG) were stimulated with or without M. leprae antigens for 12 hours, and the expression level of CHGB48 mRNA was quantified by real-time PCR. As shown in Fig. 2, no difference in CHGB48 expression was observed among the genotypes following in vitro antigen stimulation. Thus, we found no association between CHGB48 genotype and gene expression in the male population.
Table 4. Genotype and allele count of CHGB48 polymorphism in leprosy cases and healthy controls stratified by gender and adjusted by ethnicity and age.
subjects (7 GG, 15 AA, and 14 AG) were stimulated with or without M. leprae antigens for 12 hours. The ratio of RNA expression with antigen stimulation to that without was taken as the gene expression level, and CHGB23 RNA abundance was quantified by real-time PCR. As shown in Fig. 3, no significant difference in CHGB23 expression was observed among the genotypes following in vitro antigen stimulation. Thus, in the male population, the susceptibility associated with the G allele of CHGB23 was not related to gene expression.

C13orf31 polymorphisms in cases and controls.
To verify the reliability of the three leprosy susceptibility loci (GAL3ST4, CHGB48 and CHGB23) found in our experiment, the polymorphism of the known leprosy susceptibility gene C13orf31, discovered in the Chinese population [9], was studied in the same group of leprosy patients and endemic healthy controls. As shown in Table 6, genotype frequencies differed significantly between female multibacillary and paucibacillary patients. The frequency of genotype AG in female multibacillary patients (58.3%) was significantly higher than in female paucibacillary patients (8.3%) (OR = 16, p = 0.03). Additionally, the frequency of the G allele in female multibacillary patients (50.2%) was higher than in female paucibacillary patients (20.8%). Meanwhile, the percentage of genotype GG in all leprosy patients (13.1%) was higher than in healthy controls (7.5%) (OR = 3.3, p = 0.045) (Table 7). Accordingly, the frequency of the G allele in all leprosy patients (36.3%) was higher than in healthy controls (24.3%) (OR = 1.79, P = 0.024). These data indicated that the G allele of C13orf31 was a susceptibility locus not only for female multibacillary leprosy but also for leprosy patients overall. This result is consistent with the finding of the genome-wide association study.
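For readers less familiar with the odds ratios quoted in this section, the following Python sketch shows how an unadjusted odds ratio and its 95% confidence interval (Woolf's log-scale method) are obtained from a 2×2 table of allele counts. It is only an illustration: the counts are hypothetical, the function name is ours, and the ORs reported in this study were adjusted for age and ethnicity by logistic regression, so they would not match a crude calculation of this kind.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a, b: carriers and non-carriers of the allele among cases
    c, d: carriers and non-carriers of the allele among controls
    All four counts must be greater than zero.
    """
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Hypothetical allele counts, NOT data from this study
or_, lower, upper = odds_ratio_ci(a=40, b=60, c=25, d=75)
print(f"OR = {or_:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 corresponds to a significant association at the 5% level in this crude analysis.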

Discussion
Up to now, three methods have been used to identify leprosy susceptibility genes. Positional cloning was the first approach, and PARK2 and PACRG were the first leprosy susceptibility genes identified by it [10].
Genome-wide association studies for leprosy were conducted later and found about eleven leprosy susceptibility genes [7][8][9]. In addition, a few leprosy susceptibility genes were discovered by comparing variant frequencies between leprosy cases and controls [17,18]. Fifty-nine leprosy susceptibility genes have been identified by these three methods [19][20][21][22][23][24][25][26][27][28][29][30][31][32][33]. However, whole exome sequencing has not been used to find susceptibility genes for leprosy. In this study, we report for the first time three susceptibility loci identified by whole exome sequencing. One site is located in the GAL3ST4 gene, and the other two are located in the CHGB gene. Interestingly, all three loci are closely related to gender. After stimulating PBMCs from healthy people with M. leprae antigens, we observed that female individuals with the GG genotype had significantly higher GAL3ST4 expression than those with AA or AG. This result agrees with the effect we observed on genetic susceptibility, suggesting that the GAL3ST4 polymorphism is associated with its expression after exposure to M. leprae antigens.
The GAL3ST4 gene is located on human chromosome 7 and encodes a galactose-3-O-sulfotransferase, which participates in glycoprotein synthesis, metabolism, and cell signal transduction [34]. Studies have shown that mutations of this gene can lead to pectus excavatum [35], and the gene may also be involved in bone maturation during childhood. Our research shows that the gene may be one of the susceptibility loci in female leprosy patients. Women who are homozygous for the G allele in GAL3ST4 are more likely to be infected with M. leprae.
The CHGB gene is situated on chromosome 20 and encodes a tyrosine-sulfated secretory protein, a precursor of regulatory peptides that is abundant in endocrine and nerve cells. This protein can function as a hormone and also participates in protein binding [36]. We found that two polymorphisms of this gene are associated with leprosy. The heterozygous G5903848C variant on chromosome 20 (CHGB48) is a susceptibility locus for leprosy in the male population. However, the GG homozygous genotype at this site may be protective against leprosy in men, indicating that the G-to-C change at this locus is likely important. When G is changed to C at this locus, the amino acid changes from glycine to alanine, which may alter the function of the protein secreted by endocrine cells. The other leprosy susceptibility locus, CHGB23, lies at a different site in the same gene. The GG homozygous genotype at this site is also one of the leprosy susceptibility loci in the male population. When A is changed to G at site 23, glutamine is replaced by arginine. The CHGB gene may play an important role in leprosy susceptibility, and men with the GG homozygous genotype are less likely to suffer from leprosy.
Because of the small number of patients and healthy controls in our study, we selected the known leprosy susceptibility gene C13orf31 as a positive control in the same group of leprosy patients and healthy people. The result confirmed that this gene is indeed a susceptibility locus for leprosy, and that it confers susceptibility specifically in the female population. This finding supports the reliability and accuracy of the other three leprosy susceptibility loci.
In conclusion, GAL3ST4 and the CHGB allele variants 23 and 48 are novel genetic loci involved in susceptibility to leprosy in the female and male populations, respectively. However, these observations need to be confirmed and validated in larger populations. Additionally, our genetic findings, combined with the expression of GAL3ST4 and CHGB in PBMCs, point strongly to an important role in leprosy pathogenesis for secretogranin and for galactose-3-O-sulfotransferase, which is involved in the synthesis and metabolism of glycoproteins. Overall, our study demonstrates a significant association of GAL3ST4 and CHGB polymorphisms with leprosy and suggests that these polymorphisms may contribute to leprosy susceptibility. Further functional characterization of the SNPs may shed light on the association of these polymorphisms with leprosy.

Materials and Methods
Ethics statement. The study was designed and performed according to the Declaration of Helsinki and was approved by the Ethics Committees of the Beijing Tropical Medicine Research Institute. All patients and healthy blood donors provided written informed consent to participate in this study.
Subjects and samples. The clinical data and blood samples were obtained from one Chinese Han family that included 5 leprosy patients and 26 unaffected individuals, and from another family of the Chinese Zhuang ethnic minority that included 11 patients and 52 unaffected individuals. All individuals with leprosy met the World Health Organization diagnostic criteria for the disease. Four affected individuals (II-4, III-6, 8-1 and 11-1) and four unaffected individuals (II-6, III-8, 8-2 and 8-6) from family 1 and family 2 were selected for whole exome sequencing (Fig. 4). The remaining individuals from the two families, other leprosy patients outside the two families, and controls from the same endemic region were recruited for Sanger sequencing. A total of 151 leprosy cases and 226 controls were recruited for this study. All patients were Han Chinese or members of ethnic minorities from Yunnan province, and controls were from the same endemic region. Demographic and other selected characteristics of the cases and controls are presented in Table 8. Cases and controls showed statistically significant differences
Table 8. General characteristics of the subjects.
sequencing platform to ensure that each sample was covered to a depth of at least 50×. Raw image files were processed using Illumina Pipeline version 1.9 for base calling with default parameters, and the sequences of each individual were generated as 90-bp paired-end reads. BWA was used to align the clean reads to the UCSC human reference genome (hg19). On the basis of the BWA alignment results, GATK software was used to assemble the consensus sequence and call genotypes in target regions. Insertions and deletions (indels) in the exome regions were also identified using GATK software. The public databases dbSNP137, the 1000 Genomes Project, and the NHLBI Exome Sequencing Project (ESP 6500) were used for analysis of the results.
Sanger sequencing. Sanger sequencing was performed to confirm the variants found by whole-exome sequencing. PCR primers (sequences and conditions provided in Table 9) were designed to amplify the variants. PCR amplification was carried out on a thermal cycler PCR System (Takara) under standard conditions. PCR products were examined by 1% agarose gel electrophoresis and then sequenced directly by Shanghai Bioshine company.
μM random hexanucleotide primers, 50 μM oligonucleotide, and 0.5 μl RT Enzyme Mix I (TAKARA). cDNA quantification for GAL3ST4, CHGB23, CHGB48 and GAPDH was performed by real-time PCR (7500 PCR system, Applied Biosystems, Foster City, CA, USA). Reactions were performed using a SYBR Green PCR mix (TAKARA). The primers used are listed in Table 9. Results were expressed as ΔCt between the target gene and the GAPDH housekeeping mRNA, and presented as ratios, 2^(−ΔΔCt), between cells stimulated and unstimulated with M. leprae antigens.
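The relative quantification described above can be sketched as follows. This is a minimal Python illustration of the standard 2^(−ΔΔCt) calculation, with GAPDH as the reference gene and the unstimulated sample as the calibrator. The Ct values and the function name `fold_change` are hypothetical and serve only to show the arithmetic.

```python
def fold_change(ct_target_stim, ct_ref_stim, ct_target_unstim, ct_ref_unstim):
    """Relative expression (stimulated vs. unstimulated) by the 2^(-ddCt) method."""
    # dCt: target gene Ct normalised to the reference gene (e.g. GAPDH)
    d_ct_stim = ct_target_stim - ct_ref_stim
    d_ct_unstim = ct_target_unstim - ct_ref_unstim
    # ddCt: stimulated sample relative to the unstimulated calibrator
    dd_ct = d_ct_stim - d_ct_unstim
    return 2.0 ** (-dd_ct)

# Hypothetical Ct values, NOT measurements from this study
ratio = fold_change(ct_target_stim=24.0, ct_ref_stim=18.0,
                    ct_target_unstim=26.0, ct_ref_unstim=18.0)
print(ratio)  # 4.0
```

Here the target gene crosses the threshold two cycles earlier after stimulation while GAPDH is unchanged, which corresponds to a fourfold increase in expression.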
Statistical analysis. The SPSS statistical software package ver. 20.0 (SPSS Inc., Chicago, USA) was used for statistical analysis. The gene polymorphisms were tested for deviation from Hardy-Weinberg equilibrium (HWE) by comparing the observed and expected genotype frequencies with the Pearson chi-square test in HWE 2.1 software. Gene expression levels were compared between cases and controls using the Mann-Whitney U test. The chi-square test was used to compare ethnicity and gender between cases and controls. For SNP analysis, the genotype and allele frequencies of GAL3ST4 and CHGB were compared between groups using multivariate logistic regression analysis. P values, odds ratios (ORs) and 95% confidence intervals (CIs), adjusted for age and ethnicity, were calculated using binary unconditional logistic regression. P values less than 0.05 were considered statistically significant.
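As an illustration of the Hardy-Weinberg check described above, the Python sketch below compares observed genotype counts with the counts expected from the estimated allele frequency, using a Pearson chi-square test with one degree of freedom. It is not the HWE 2.1 software used in the study; the genotype counts are hypothetical and the function name is ours. The p-value uses the identity that, for one degree of freedom, the chi-square survival function equals erfc(sqrt(x/2)).

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test for Hardy-Weinberg equilibrium at a biallelic locus.

    Assumes both alleles are observed, so that all expected counts are positive.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # df = 3 genotype classes - 1 - 1 estimated allele frequency = 1
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts, NOT data from this study
chi2, p_value = hwe_chi_square(n_aa=50, n_ab=40, n_bb=10)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A p-value above 0.05, as in this hypothetical example, means that no deviation from Hardy-Weinberg equilibrium is detected.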