Introduction

Leprosy is a chronic infectious disease caused by Mycobacterium leprae, and about 200,000 cases are reported each year1. The clinical features of leprosy differ greatly among individuals, and previous studies have shown that these widely varying clinical manifestations contrast with the low variability of the bacillus2. This suggests that host genetic variation may play a more important role in the pathogenesis of the disease3,4. One widely used method for exploring differences in the human genetic component is the study of single-nucleotide polymorphism (SNP) variation, and previous studies have identified associations between SNP variants and host susceptibility to leprosy5,6. Recent genome-wide association studies (GWAS) of leprosy patients have identified several susceptibility loci (CCDC122, LACC1 (C13orf31), NOD2, TNFSF15, RIPK2, HLA-DR, HLA-DQ, IL-23R) and the RAB32 locus in the Chinese population7,8,9, which indicates the importance of host susceptibility in protection against M. leprae. Although the clinical progression of the disease may be associated with certain genes such as PARK2-PACRG10, few of these associations have been confirmed in different populations11.

Whole exome sequencing is a powerful new strategy for discovering causative genes in rare Mendelian disorders12. Recently, this technology, combined with a filtering methodology, was demonstrated as an approach for identifying susceptibility genes in many genetic diseases13. Moreover, many genetic variants associated with common diseases such as diabetes, hypertension and tumors have been found by this approach14,15,16. However, this method has rarely been applied to infectious diseases. In this study, we enrolled both leprosy patients and healthy controls within leprosy families to further study host genetic variations and their associations with the disease using whole exome sequencing.

Results

Results of whole exome sequencing and validation

Whole exome sequencing was carried out in four patients and four unaffected individuals from two families with leprosy. The data from patients and controls were compared against the 1000 Genomes Project and ESP 6500 databases. Twenty variants were found, of which fifteen were confirmed by Sanger sequencing (Table 1).

Table 1 Twenty variants identified by exome sequencing.

Comparison of validated variants between the remaining patients from two leprosy families and healthy controls from the 1000 Genomes Project database

We found that five variants (GAL3ST4, CHGB48, CHGB23, GLT8D2 and ANKRD35) were detected more frequently than the other ten variants among the remaining six patients from the two leprosy families by gene sequencing (data not shown). The frequencies of these five variants did not differ significantly between patients and healthy relatives (data not shown). However, three SNP loci (GAL3ST4, CHGB48 and CHGB23) differed significantly in frequency between the leprosy patients and healthy controls from the 1000 Genomes Project database (Table 2).

Table 2 Comparison of validated variants between remaining patients from two leprosy families and healthy controls from the 1000 Genomes Project database.

GAL3ST4 polymorphisms in cases and controls

We further expanded the testing of the GAL3ST4 (rs3823646) variant to 151 cases and 226 endemic healthy controls to verify whether the difference persisted. Although no difference in GAL3ST4 (rs3823646) was found between all patients and all controls (Table 3), the GAL3ST4 polymorphism differed significantly between leprosy patients and healthy controls in the female population. The frequencies of the AG and GG genotypes in female leprosy patients (50% and 25%, respectively) were higher than those in female controls (39.4% and 13.2%, respectively) (OR = 3.16, P = 0.018; OR = 3.75, P = 0.027, respectively). Furthermore, the frequency of the G allele in female leprosy patients (50%) was higher than that in female controls (32.9%) (OR = 1.68, P = 0.004) (Table 3). These results indicate that GAL3ST4 might be a susceptibility gene for leprosy in the female population. Genotype distributions in all leprosy patients and healthy controls conformed to Hardy-Weinberg equilibrium (p > 0.05), indicating that the samples were appropriately representative.
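The G allele percentages quoted above follow directly from the genotype percentages: each GG carrier contributes two G alleles and each AG carrier one. The short sketch below reproduces that arithmetic for the female cases and controls; the function name is illustrative and not part of the study's analysis.

```python
def allele_frequency(het_pct, hom_pct):
    """Allele frequency (%) from heterozygote and homozygote genotype percentages.

    Homozygotes carry two copies of the allele and heterozygotes one, so the
    allele frequency is hom + het / 2 when all values are percentages.
    """
    return hom_pct + het_pct / 2.0


# Genotype percentages reported in the text for GAL3ST4 (rs3823646), females.
g_female_cases = allele_frequency(het_pct=50.0, hom_pct=25.0)     # about 50
g_female_controls = allele_frequency(het_pct=39.4, hom_pct=13.2)  # about 32.9

print(round(g_female_cases, 1), round(g_female_controls, 1))
```

Both values agree with the reported G allele frequencies (50% and 32.9%).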

Table 3 Genotype and allele frequency of GAL3ST4 polymorphism in leprosy cases and healthy controls stratified by gender and adjusted by ethnicity and age.

Gene expression of GAL3ST4 in different genotypes of female population

To assess whether the effect of the three GAL3ST4 genotypes on leprosy susceptibility in female patients is exerted through changes in GAL3ST4 expression, we performed an in vitro M. leprae antigen stimulation assay. PBMCs from 28 healthy female subjects (12 AA, 4 GG, and 12 AG) were cultured with and without M. leprae antigens for 12 hours, and the expression level of GAL3ST4 mRNA was quantified by real-time PCR. As shown in Fig. 1, significantly higher GAL3ST4 expression was observed following in vitro antigen stimulation in GG homozygotes compared with AA homozygotes (p = 0.018) or AG heterozygotes (p = 0.006). These results suggest that females carrying the G allele may be more readily infected with M. leprae through altered expression of GAL3ST4 in monocytes/macrophages. Therefore, we postulate that a positively selected polymorphism in the GAL3ST4 exon region might be associated with susceptibility to M. leprae infection in the female population by upregulating GAL3ST4 expression.

Figure 1

Plot of GAL3ST4 expression in response to stimulation with M. leprae antigens. Data were derived from PBMCs from 35 healthy female subjects stimulated with M. leprae antigens. GAL3ST4 transcript levels are expressed as ratios and shown as median with interquartile range; p values were calculated using the Mann-Whitney U test.
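For readers unfamiliar with the Mann-Whitney U test named in the caption, the following is a minimal pure-Python sketch of a two-sided version using the normal approximation with tie correction. The expression ratios are hypothetical, and the approximation is rough for groups as small as those compared here; statistical packages typically switch to an exact method in that case, so this is an illustration rather than the authors' procedure.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p) where U is the smaller of the two U statistics.
    No continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy 1-based ranks i+1 .. j and share their average.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u_x, n1 * n2 - u_x)
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return u, p


# Hypothetical stimulated/unstimulated expression ratios for two genotype groups.
aa_ratios = [1.2, 1.5, 1.1, 1.4]
gg_ratios = [2.3, 2.8, 2.1, 2.6]
u_stat, p_value = mann_whitney_u(aa_ratios, gg_ratios)
print(u_stat, p_value)
```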

CHGB48 polymorphisms in cases and controls

We further tested whether the difference held for the CHGB48 SNP (rs236132) in 140 previously collected leprosy patients and 190 endemic healthy controls. There was no significant difference in the CHGB48 polymorphism between all patients and all controls (Table 4). However, in the male population the CHGB48 polymorphism differed significantly between leprosy patients and healthy controls. The frequency of the GC genotype in male leprosy patients (50%) was higher than that in male controls (39.4%) (OR = 2.85, P = 0.011). Meanwhile, the frequency of the CC genotype in male leprosy patients (16.3%) was lower than that in male controls (22.3%) (OR = 0.38, P = 0.03). These results indicate that the G allele of CHGB48 might be a susceptibility site for leprosy in the male population. The Hardy-Weinberg equilibrium test for all leprosy patients and healthy controls indicated that the samples were representative of the population.
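The odds ratios reported in this section were adjusted for age and ethnicity by logistic regression, so they cannot be reproduced from genotype percentages alone. As a simplified illustration of what an odds ratio measures, the sketch below computes an unadjusted OR with a Woolf (log-based) 95% confidence interval from a 2 × 2 table. The counts are hypothetical and are not taken from Table 4.

```python
import math


def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval from a 2x2 table.

    'Exposed' means carrying the genotype of interest. All four counts must
    be positive, otherwise the ratio or its standard error is undefined.
    """
    odds_ratio = (case_exposed * ctrl_unexposed) / (case_unexposed * ctrl_exposed)
    se_log_or = math.sqrt(
        1.0 / case_exposed + 1.0 / case_unexposed
        + 1.0 / ctrl_exposed + 1.0 / ctrl_unexposed
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 30 of 60 cases and 20 of 60 controls carry the genotype.
or_value, ci_low, ci_high = odds_ratio_ci(30, 30, 20, 40)
print(or_value, ci_low, ci_high)
```

An interval that spans 1, as in this hypothetical example, would not count as a significant association at the 0.05 level.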

Table 4 Genotype and allele count of CHGB48 polymorphism in leprosy cases and healthy controls stratified by gender and adjusted by ethnicity and age.

Gene expression of CHGB48 in different genotypes of male population

To explore whether the effect of the three CHGB48 genotypes on leprosy susceptibility in male patients is exerted through changes in CHGB48 expression, we conducted an in vitro M. leprae antigen stimulation assay. Specifically, PBMCs from 28 healthy male subjects (12 GG, 5 CC, and 11 CG) were cultured with and without M. leprae antigens for 12 hours, and the expression level of CHGB48 mRNA was quantified by real-time PCR. As shown in Fig. 2, no difference in CHGB48 expression was observed among the different genotypes following in vitro antigen stimulation. Thus, we found no association between susceptibility at the CHGB48 locus and gene expression in the male population.

Figure 2

Plot of CHGB48 expression in response to stimulation with M. leprae antigens. Data were derived from PBMCs from 28 healthy male subjects stimulated with M. leprae antigens. CHGB48 transcript levels are expressed as ratios and shown as median with interquartile range; p values were calculated using the Mann-Whitney U test. ns: not significant.

CHGB23 polymorphisms in cases and controls

A total of 145 leprosy patients and 189 endemic healthy controls were genotyped to test whether the difference held for the CHGB23 SNP (rs910122). The results showed a significant difference in CHGB23 not only between all patients and controls (Table 5) but also between male patients and male controls. The frequency of the AG genotype in leprosy patients (51.7%) was higher than that in controls (46.6%) (OR = 1.69, P = 0.058). Meanwhile, the frequency of the AA genotype in leprosy patients (29.7%) was lower than that in controls (33.9%) (OR = 0.214, P = 0.029). Furthermore, when stratified by gender, the frequency of AA in male leprosy patients (34.4%) was lower than that in male controls (44.7%) (OR = 0.142, P = 0.003). Accordingly, the frequency of the G allele in male leprosy patients (42.7%) was higher than that in male controls (35.1%) (OR = 11.73, P = 0.001). These results suggest that the G allele of CHGB23 might be a susceptibility site for leprosy in the male population. Genotype distributions in cases and healthy controls conformed to Hardy-Weinberg equilibrium (p > 0.05), indicating that the sample selection was appropriately representative.

Table 5 Genotype and allele count of CHGB23 polymorphism in leprosy cases and healthy controls stratified by gender and adjusted by ethnicity and age.

Gene expression of CHGB23 in different genotypes of male population

To examine whether the effect of the three CHGB23 genotypes on leprosy susceptibility in males is exerted through changes in CHGB23 expression, we conducted an in vitro M. leprae antigen stimulation assay. Specifically, PBMCs from 34 healthy male subjects (7 GG, 15 AA, and 14 AG) were cultured with or without M. leprae antigens for 12 hours. CHGB23 RNA abundance was quantified by real-time PCR, and the ratio of RNA expression with antigen stimulation to that without was taken as the gene expression level. As shown in Fig. 3, no significant difference in CHGB23 expression was observed among the different genotypes following in vitro antigen stimulation. Thus, for males carrying the G allele of CHGB23, we found no relationship between leprosy susceptibility and gene expression.

Figure 3

Plot of CHGB23 expression in response to stimulation with M. leprae antigens. Data were derived from PBMCs of 34 healthy male subjects stimulated with M. leprae antigens. CHGB23 transcript levels are expressed as ratios and shown as median with interquartile range; p values were calculated using the Mann-Whitney U test. ns: not significant.

C13orf31 polymorphisms in cases and controls

To verify the reliability of the three leprosy susceptibility loci (GAL3ST4, CHGB48 and CHGB23) found in our experiment, the polymorphism of the known leprosy susceptibility gene C13orf31, discovered in the Chinese population9, was studied in the same group of leprosy patients and endemic healthy controls. As shown in Table 6, genotype frequencies differed significantly between female multibacillary and female paucibacillary patients. The frequency of the AG genotype in female multibacillary patients (58.3%) was significantly higher than that in female paucibacillary patients (8.3%) (OR = 16, P = 0.03). Additionally, the frequency of the G allele in female multibacillary patients (50.2%) was higher than that in female paucibacillary patients (20.8%). Meanwhile, the frequency of the GG genotype in all leprosy patients (13.1%) was higher than that in healthy controls (7.5%) (OR = 3.3, P = 0.045) (Table 7). Accordingly, the frequency of the G allele in all leprosy patients (36.3%) was higher than that in healthy controls (24.3%) (OR = 1.79, P = 0.024). These data indicate that the G allele of C13orf31 is a susceptibility locus not only for multibacillary disease in females but also for leprosy overall. This result is consistent with the finding of the genome-wide association study.

Table 6 Genotype and allele count of C13orf31 polymorphism in multibacillary and paucibacillary patients stratified by gender and adjusted by ethnicity and age.
Table 7 Genotype and allele count of C13orf31 polymorphism in leprosy and healthy people stratified by gender and adjusted by ethnicity and age.

Discussion

To date, three methods have been used to identify susceptibility genes for leprosy. Positional cloning was the first approach, and PARK2 and PACRG were the first leprosy susceptibility genes identified by it10. Genome-wide association studies of leprosy were conducted later and found about eleven leprosy susceptibility genes7,8,9. In addition, a few susceptibility genes were discovered by comparing variant frequencies between leprosy cases and controls17,18. Fifty-nine susceptibility genes for leprosy have been identified by these three methods19,20,21,22,23,24,25,26,27,28,29,30,31,32,33. However, whole exome sequencing has not previously been used to find susceptibility genes for leprosy. In this study, we report for the first time three susceptibility loci identified using whole exome sequencing. One site is located in the GAL3ST4 gene, and the other two sites are located in the CHGB gene.

Interestingly, we found that the three loci are closely related to gender. After stimulating PBMCs from healthy subjects with M. leprae antigens, we observed that females with the GG genotype had significantly higher GAL3ST4 expression levels than those with AA or AG. These results agree with the effect we observed on genetic susceptibility, suggesting that the polymorphisms of these genes are associated with their expression after M. leprae infection.

The GAL3ST4 gene is located on human chromosome 7 and encodes a galactose-3-O-sulfotransferase, which participates in glycoprotein synthesis, metabolism, and cell signal transduction34. Studies have shown that mutations of this gene can lead to pectus excavatum35, and the gene may also be involved in bone maturation during childhood. Our research suggests that the gene may be one of the susceptibility loci in female leprosy patients. Females who are homozygous for the G allele in GAL3ST4 may be more likely to be infected with M. leprae.

The CHGB gene is situated on chromosome 20 and encodes a tyrosine-sulfated secreted protein, a regulatory peptide precursor that is abundant in endocrine and nerve cells. This protein can function as a hormone and also participates in protein binding36. We found that two polymorphisms of this gene are associated with leprosy. The heterozygous mutation G5903848C on chromosome 20 (CHGB48) is a susceptibility locus for leprosy in the male population. However, the GG homozygous genotype at this site may be protective against leprosy in males, indicating that the G to C mutation at this locus is likely important. When G is mutated to C at this locus, the amino acid changes from glycine to alanine, which may alter the function of the protein secreted by endocrine cells. The other leprosy susceptibility locus, CHGB23, is located at a different site in the same gene. The GG homozygous genotype at this site is also one of the leprosy susceptibility loci for the male population. When A is mutated to G at site 23, glutamine changes to arginine. The CHGB gene may play an important role in leprosy susceptibility, and men with the GG homozygous genotype are less likely to suffer from leprosy.

Because of the small number of patients and healthy controls in our study, we used the known leprosy susceptibility gene C13orf31 as a positive control in the same group of leprosy patients and healthy subjects. The result confirmed that this gene is indeed a susceptibility locus for leprosy, with a specific association in the female population. This finding supports the reliability of the other three leprosy susceptibility loci.

In conclusion, GAL3ST4 and the CHGB allele variants 23 and 48 are novel genetic loci involved in susceptibility to leprosy in the female and male populations, respectively. However, these observations need to be confirmed and validated in larger populations. Additionally, our genetic findings, combined with the expression of GAL3ST4 and CHGB in PBMCs, point to an important function of secretogranin and galactose-3-O-sulfotransferase, which are involved in the synthesis and metabolism of glycoproteins, in leprosy pathogenesis. Overall, our study demonstrates a significant association of GAL3ST4 and CHGB polymorphisms with leprosy and suggests that these polymorphisms may be a contributing factor in leprosy susceptibility. Further studies on the functional characterization of these SNPs may shed light on their association with leprosy.

Materials and Methods

Ethics statement

The study was designed and performed according to the Helsinki declaration and was approved by the Ethics Committees of the Beijing Tropical Medicine Research Institute. All patients and healthy blood donors provided written informed consent to participate in this study.

Subjects and samples

Clinical data and blood samples were obtained from one Chinese Han family that included 5 leprosy patients and 26 unaffected individuals, and from another Chinese Zhuang ethnic minority family that included 11 patients and 52 unaffected individuals. All individuals with leprosy met the World Health Organization diagnostic criteria for the disease. Four affected individuals (II-4, III-6, 8-1 and 11-1) and four unaffected individuals (II-6, III-8, 8-2 and 8-6) from family 1 and family 2 were selected for whole exome sequencing (Fig. 4). The remaining individuals from the two families, together with other leprosy patients outside the two families and controls from the same endemic region, were recruited for Sanger sequencing. A total of 151 leprosy cases and 226 controls were recruited for this study. All patients were Han Chinese or ethnic minorities from Yunnan province, and controls were from the same endemic region. Demographic and other selected characteristics of the cases and controls are presented in Table 8. Cases and controls showed statistically significant differences with regard to age, gender and ethnicity (P < 0.05). Of the leprosy patients, 92 were multibacillary and 59 were paucibacillary.

Figure 4

The pedigrees of the two families affected by leprosy included in the present study. Filled-in symbols indicate individuals with leprosy, empty symbols indicate unaffected individuals, and symbols with a slash through them indicate deceased individuals. √, individuals selected for exome sequencing. BL, LL, BT and TT are the clinicopathological types of leprosy. Arrows indicate the probands of the families.

Table 8 General characteristics of the subjects.

Whole-exome sequencing

Genomic DNA was extracted from EDTA-anticoagulated peripheral blood using a QIAamp DNA Blood Mini kit (Qiagen). Purified DNA was analyzed on an ND-8000 spectrophotometer (Nanodrop Technologies, Wilmington, DE) and a Qubit (Invitrogen, CA, USA) to determine the quantity. DNA samples were used only if the 260/280 ratio was above 2.0 and no smear was visible on the agarose gel. High-quality DNA (1 μg) was used as the starting material. The DNA was fragmented with a Bioruptor Sonicator (Diagenode, USA). The TruSeq DNA sample preparation kit was used for end repair, dA tailing, adaptor ligation and enrichment of DNA fragments. The TruSeq Enrichment kit was used to capture exome or custom sequences from the human DNA library. After two rounds of hybridization and washing, the DNA exome library was sequenced on the HiSeq 2500 platform to ensure that each sample was covered to a depth of at least 50×. Raw image files were processed using Illumina Pipeline version 1.9 for base calling with default parameters, and the sequences of each individual were generated as 90-bp paired-end reads. BWA was used to align the clean reads to the UCSC human reference genome (hg19). On the basis of the BWA alignment results, GATK software was used to assemble the consensus sequence and call genotypes in target regions. Insertions and deletions (indels) in the exome regions were identified using GATK software. The public databases dbSNP137, the 1000 Genomes Project, and the NHLBI Exome Sequencing Project (ESP 6500) were used for analysis of the results.

Sanger sequencing

Sanger sequencing was performed to confirm the variants found by whole-exome sequencing. PCR primers (sequences and conditions provided in Table 9) were designed to amplify the variants. PCR amplification was carried out on a thermal cycler PCR System (Takara) using standard conditions. PCR products were examined by 1% agarose gel electrophoresis and then sequenced directly by Shanghai Bioshine company.

Table 9 PCR primers and conditions designed for validation and gene expression of variants identified by exome sequencing.

Gene expression analysis

Whole blood was collected from 48 healthy volunteers by venipuncture in Vacutainer tubes containing EDTA (BD company, USA), and peripheral blood mononuclear cells (PBMCs) were separated on lymphocyte separation medium (CEDARLANE Ltd, Cat. No. CL5020). PBMCs (2 × 10^6 cells/ml) were cultured for 12 hours at 37 °C and 5% CO2 in RPMI 1640 containing M. leprae antigens (10 μg/ml, Whole Cell Sonicate, NR-19329, NIH, USA), or without the antigens as a control. Cells were plated in 12-well cell culture plates, then collected into TRIZOL (Invitrogen) and stored at −80 °C for gene expression analysis. RNA was extracted from cultured PBMCs with or without antigen stimulation using the acid guanidinium thiocyanate–phenol–chloroform method. The RNA was treated with RNase-free water (Shanghai Biotechnology Company). RNA (500 ng) was reverse transcribed into first-strand cDNA in a 10 μl final volume containing 100 μM random hexanucleotide primers, 50 μM oligonucleotide, and 0.5 μl RT Enzyme Mix I (TAKARA). cDNA quantification for GAL3ST4, CHGB23, CHGB48 and GAPDH was performed by real-time PCR (7500 PCR system, Applied Biosystems, Foster City, CA, USA). Reactions were performed using a SYBR Green PCR mix (TAKARA). The primers used are listed in Table 9. Results were expressed as ΔCt between the target gene and the GAPDH housekeeping mRNA, and presented as ratios (2^ΔΔCt) between cells stimulated and unstimulated with M. leprae antigens.
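The expression ratio described above appears to follow the standard comparative Ct calculation. The sketch below shows one way to compute it, assuming ΔCt = Ct(target) − Ct(GAPDH), ΔΔCt = ΔCt(unstimulated) − ΔCt(stimulated), and perfect doubling per PCR cycle. The sign convention and the Ct values are assumptions made for illustration; with this convention the result equals the familiar 2^(−ΔΔCt) form in which ΔΔCt is defined with the opposite sign.

```python
def delta_ct(ct_target, ct_reference):
    """Ct of the target gene normalised to the housekeeping gene (GAPDH)."""
    return ct_target - ct_reference


def stimulation_ratio(ct_target_stim, ct_gapdh_stim, ct_target_unstim, ct_gapdh_unstim):
    """Fold change of target expression in stimulated vs unstimulated cells.

    Assumes 100% amplification efficiency, i.e. one Ct unit corresponds to a
    two-fold difference in template amount.
    """
    d_stim = delta_ct(ct_target_stim, ct_gapdh_stim)
    d_unstim = delta_ct(ct_target_unstim, ct_gapdh_unstim)
    delta_delta_ct = d_unstim - d_stim  # positive when stimulation raises expression
    return 2.0 ** delta_delta_ct


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# after stimulation while GAPDH is unchanged, giving a four-fold increase.
ratio = stimulation_ratio(
    ct_target_stim=24.0, ct_gapdh_stim=18.0,
    ct_target_unstim=26.0, ct_gapdh_unstim=18.0,
)
print(ratio)
```

A ratio above 1 indicates higher expression after antigen stimulation, and a ratio of 1 indicates no change.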

Statistical analysis

The SPSS statistical software package ver. 20.0 (SPSS Inc., Chicago, USA) was used for statistical analysis. The gene polymorphisms were tested for deviation from Hardy-Weinberg equilibrium (HWE) by comparing the observed and expected genotype frequencies using the Pearson chi-square test in HWE 2.1 software. Gene expression levels were compared between groups using the Mann-Whitney U test. The chi-square test was used to compare ethnicity and gender between cases and controls. For SNP analysis, the genotype and allele frequencies of GAL3ST4 and CHGB were compared between groups using multivariate logistic regression analysis. P values, odds ratios (OR) and 95% confidence intervals (CIs) adjusted for age and ethnicity were calculated using binary unconditional logistic regression. P values less than 0.05 were considered statistically significant.
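The Hardy-Weinberg check described above compares observed genotype counts with those expected from the allele frequencies. The following is a minimal sketch of that Pearson chi-square test with one degree of freedom for a biallelic SNP. It is not the HWE 2.1 software used in the study, and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes observed counts of the three genotypes of a biallelic SNP and
    returns (chi2, p_value). Both alleles must be present in the sample,
    otherwise an expected count is zero and the statistic is undefined.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2.0 * n)  # frequency of allele A
    q = 1.0 - p                        # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for 100 individuals.
chi2, p_value = hwe_chi_square(50, 40, 10)
print(round(chi2, 4), round(p_value, 3))
```

A p value above 0.05, as in this hypothetical sample, gives no evidence of deviation from equilibrium.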