Main

Tuberculosis (TB) is a common infectious disease and a serious health issue in the developing world. It is commonly caused by infection with Mycobacterium tuberculosis (MTB). About one-third of the world's population is infected by MTB, and one-tenth of those infected will develop active TB. More than 90% of TB cases occur in developing countries (http://www.who.int/tb/en/). The treatment of TB is difficult and relies on long-course regimens of chemotherapy, and about 5% of TB cases are multidrug resistant. The most widely used vaccine to prevent TB is Bacille Calmette–Guérin (BCG, named after the two pioneers who developed the virulence-reduced live vaccine of Mycobacterium bovis), which has been used for >60 years. However, the effectiveness of the vaccination is limited to the prevention of primary TB in children.1, 2, 3, 4

The etiological mechanisms of TB are unclear, which hinders the development of effective strategies for the treatment or prevention of TB disease. Normal macrophage function and Th1 cellular immunity are critical to contain MTB infection, but are insufficient to clear MTB. About 90% of MTB infections become latent TB infection (LTBI), in which MTB can remain dormant in granulomas for decades. Adaptive Th1 immunity induced by BCG vaccination is not effective in preventing LTBI activation. LTBI activation involves complicated Th2 and Th1 immunity,5 but the molecular mechanisms are mostly unknown.

Human genetics for TB knowledge

Propelled by the completion of the Human Genome Project in 2003, human genetics has achieved tremendous progress in recent years. It has penetrated every branch of biomedical science and become an indispensable approach to understanding the molecular basis of human diseases, including infectious diseases. The theme of human genetics is genes and genetic variations. There are around 20 000–25 000 protein-coding genes encoded in the human nuclear genome (∼3 billion base pairs in length), as well as 37 additional genes encoded by the mitochondrial genome (∼16.6 kilobase pairs in length).6 Throughout human history, spontaneous DNA mutations have continually changed human genes, allowing adaptation to constantly varying environments. As the result of genetic drift or natural selection, gene mutations with relatively neutral or beneficial effects may become common variations after a number of successive generations. There are >11 million common DNA variations with frequencies ⩾1%, that is, DNA polymorphisms, in the human genome. Except for identical (monozygotic) twins, no two people have identical genomes, although the difference between any two persons across the world is <0.1% of the whole genome. The most common form of DNA polymorphism is the single-nucleotide polymorphism (SNP). Because DNA polymorphisms are so widespread, different individuals may have different susceptibility to a common disease.7 This diversity of genetic susceptibility in the human population enables researchers to study the molecular mechanisms of common diseases by the genetic approach.

TB was thought to be hereditary in ancient times because of its obvious family clustering. In the early 1880s, Robert Koch successfully isolated TB bacilli and elicited TB in experimental animals using cultured TB bacilli,8 establishing the communicable nature of TB. Unlike acute infectious diseases, chronic TB infection is acquired by aerosol transmission to intimate contacts. This fact explains the majority of the family clustering of TB. Genetic susceptibility to TB regained attention because of an investigation of the concordance of TB in twins,9 although the initial study highlighted environmental factors in TB in twins.10 The concordance of TB was once reported to be 2.5-fold higher among monozygotic than dizygotic twin pairs.9 However, a recent reanalysis of the same dataset suggested that the increased concordance in monozygotic twins can be better explained by closer physical proximity, and thereby an increased chance of MTB transmission, between monozygotic twin pairs.11 In addition, a twin study suggested that the delayed-type hypersensitivity (DTH) response to mycobacterial antigens was also under genetic regulation,12 though the details remain controversial. The concordance of DTH response to tuberculin after newborn BCG immunization was not found to differ between monozygotic and dizygotic twins.13 Besides increased TB concordance in monozygotic twins, genetic susceptibility to TB was also suggested by ethnic differences in the susceptibility to TB infection. Black people have about a 1.5–2-fold greater risk of becoming infected than white people, as shown by the tuberculin skin test, whereas the risk of progression to clinical TB after MTB infection shows no ethnic difference.14 The ethnic difference may be related to different innate resistance of macrophages to MTB exposure.15 Despite the controversy over the heritability of TB, human genetics represents a unique approach to understanding the molecular mechanisms of host immunity against TB.

To find common DNA variations associated with TB, two approaches have been adopted, that is, linkage analysis and association study. A linkage study examines the sharing of genome regions among affected relatives, whereas an association study examines the co-occurrence of genetic markers with the disease. Comparing these two approaches: (1) A linkage study needs only a few hundred genetic markers to scan the whole human genome, but its statistical power is relatively low, so it is more useful for identifying larger genetic effects. (2) An association study has greater statistical power to identify genetic effects. However, a genome-wide association study (GWAS) needs to genotype nearly one million genetic markers to cover the whole genome. Before high-throughput genotyping technology became available, association studies were usually candidate-based and had to focus on genes with candidate functions. Candidate-based studies performed poorly in some common human diseases and were of little value in revealing the unknown aspects of a disease. The rise of GWAS changed this situation. Because a GWAS is hypothesis free, it has been a great success in uncovering unknown aspects of many common human diseases by examining hundreds of thousands of genetic markers across the human genome for disease association in an unbiased manner.7 As demonstrated by many GWASs, minor genetic aggregation may be explained by numerous association loci.
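
The multiple-testing burden that separates a candidate-based study from a GWAS can be illustrated with a short calculation. The sketch below assumes a simple Bonferroni correction and a family-wise error rate of 0.05, neither of which is specified in this review; the marker counts (10 for a candidate gene, one million for a GWAS) are illustrative assumptions only.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-marker P-value threshold under a Bonferroni correction.

    alpha is the assumed family-wise error rate; n_tests is the number
    of independent markers tested.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Hypothetical marker counts, chosen only to contrast the two designs.
candidate_markers = 10          # a handful of SNPs in one candidate gene
gwas_markers = 1_000_000        # "nearly one million" markers in a GWAS

print(f"candidate-gene threshold: {bonferroni_threshold(candidate_markers):.2e}")
print(f"GWAS threshold:           {bonferroni_threshold(gwas_markers):.2e}")
```

Under these assumptions the per-marker threshold falls by five orders of magnitude when moving from a candidate gene to the whole genome, which is one reason a GWAS requires large cohorts to detect small genetic effects.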

Linkage mapping of TB susceptibility

Numerous efforts have been made to screen the human genome for genetic loci of TB susceptibility by linkage mapping. Reported linkage loci are summarized in Table 1.

Table 1 Human TB susceptibility loci suggested by linkage studies

According to a widely used guideline for the interpretation of linkage results (logarithm of odds, LOD⩾2.2 as suggestive linkage and LOD⩾3.6 as significant linkage),26 most reported linkage loci had weak statistical evidence and were not replicated by any independent study. One explanation for the lack of replication is that the genetic effects on susceptibility to TB disease were seriously overestimated in the linkage studies. Compared with diseases for which linkage analysis identified significant genetic effects, in TB the sharing of environmental factors among affected relatives seems to be much more important than the sharing of genome regions. The twin study has also demonstrated the tremendous effect of environmental factors in TB.11 Therefore, conventional linkage analysis has not identified significant genetic effects on TB disease. Because of their limited statistical power, linkage studies may not be able to identify weak genetic effects on TB.
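
The guideline thresholds quoted above can be expressed as a small helper. This is a minimal sketch: the function names and the example LOD scores are hypothetical and are not taken from Table 1. The only inputs taken from the text are the two cut-offs (2.2 and 3.6), together with the standard definition of the LOD score as the base-10 logarithm of the odds in favor of linkage.

```python
SUGGESTIVE_LOD = 2.2    # suggestive linkage threshold quoted in the text
SIGNIFICANT_LOD = 3.6   # significant linkage threshold quoted in the text


def lod_to_odds(lod: float) -> float:
    """Convert a LOD score back to odds in favor of linkage (10 ** LOD)."""
    return 10.0 ** lod


def classify_lod(lod: float) -> str:
    """Label a LOD score according to the two guideline thresholds."""
    if lod >= SIGNIFICANT_LOD:
        return "significant"
    if lod >= SUGGESTIVE_LOD:
        return "suggestive"
    return "below suggestive threshold"


# Hypothetical LOD scores, for illustration only.
for lod in (1.9, 2.5, 3.8):
    print(f"LOD {lod}: odds about {lod_to_odds(lod):.0f}:1, {classify_lod(lod)}")
```

Because the scale is logarithmic, the step from the suggestive to the significant threshold corresponds to roughly a 25-fold increase in the odds supporting linkage.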

In addition, most reported linkage loci contain a large number of genes, and linkage analysis cannot determine which gene in a locus explains the disease susceptibility. Nevertheless, a few loci are worthy of attention, that is, 2q35 (ref. 18) and 8q12-q13 (ref. 21) with high statistical significance, 20q13 reported by two independent studies in African populations,22, 24 and 5p15 in linkage with DTH with high statistical significance.25 The linkage locus at Chr2q35 (ref. 18) contains an interesting candidate gene, SLC11A1. Besides SLC11A1, no other TB-associated gene has been identified from these linkage loci to date.

Candidate-based association studies on TB susceptibility

A large number of candidate-based association studies of TB were conducted to examine the association of predicted functional DNA variations in candidate genes with TB infection or disease.

SLC11A1 (NRAMP1)

The most studied gene associated with TB is the solute carrier family 11 member 1 gene (SLC11A1) at Chr2q35, which is more popularly known as the natural resistance-associated macrophage protein 1 gene (NRAMP1). In mice, a series of studies on host genetic susceptibility was initiated by the observation that different mouse strains show one of two distinct patterns of response to BCG, that is, either susceptible or resistant. This genetic effect is unrelated to the DTH response.27 Subsequent studies mapped the genetic effect to the Bcg locus in segregating backcross mice.28, 29 Besides BCG infection, Bcg also controls resistance to other intracellular pathogens, for example, Salmonella typhimurium and Leishmania donovani.30 The effect of Bcg is exerted on the ability of macrophages to destroy intracellular pathogens.31 Positional cloning of the Bcg locus identified the macrophage-specific gene Nramp1, the murine ortholog of human SLC11A1,32 explaining resistance to TB infection in mice.

Nramp1 is a proton/divalent cation antiporter. Acting as a divalent-metal efflux pump at the phagosomal membrane of macrophages, Nramp1 depletes divalent metals such as Zn2+, Cu2+, Fe2+ and Mn2+ from bacteria-containing phagosomes. These divalent metals are essential micronutrients for pathogen vitality, and their depletion renders the pathogen more sensitive to killing by oxygen radicals.33 The difference in genetic susceptibility to intracellular infection among mouse strains is explained by an amino acid substitution, Gly169Asp (G169D). The Asp residue accelerates Nramp1 degradation and thus reduces or abrogates Nramp1 function.34

The genetic association of SLC11A1 with TB susceptibility in humans was first reported in 410 TB cases and 417 controls from West Africa.35 Four DNA polymorphisms, that is, the (CA)n microsatellite in the immediate 5′-flanking region, a SNP in intron 4 (469+14G/C, rs3731865), a non-synonymous SNP at codon 543 that changes aspartic acid to asparagine (D543N, rs17235409), and a TGTG deletion in the 3′-untranslated region (1729+55del4, rs17235416), were associated with TB disease with the smallest P-value of 0.004.35 After the initial report, a number of studies replicated the TB association in different populations.36 However, several issues remain: (1) Studies with sample sizes equivalent to or larger than that of the original report are needed to assess the genetic effect in different populations; the reported replication studies had relatively small sample sizes. (2) One well-designed study in a different African population, from Malawi, did not replicate the originally reported association.37 (3) The genotyping technology of early association studies may have been subject to technical bias; a large number of early genetic associations in different human complex diseases were not replicated by recent GWASs that used high-quality DNA genotyping arrays. (4) The Wellcome Trust Case Control Consortium (WTCCC) has not reported the SLC11A1 association in the large case–control cohort from The Gambia, although the study was completed 3 years ago. (5) A decade after the initial report, the specific causative variation that can explain the TB susceptibility has not been identified.

Compared with the genetic association of SLC11A1 in humans, the evidence from mouse studies is more convincing. The functional gene variation Gly169Asp explains the genetic susceptibility in mice. However, no human DNA variation has thus far been identified that is comparable with the Gly169Asp in mice.

Histocompatibility leukocyte antigen (HLA)

The histocompatibility leukocyte antigen (HLA) genes at Chr6p21.3 have essential roles in infectious diseases, autoimmune diseases, graft-versus-host disease and cancer, by presenting antigenic peptides. HLA class II (DR, DQ and DP) molecules present antigenic peptides on the surfaces of antigen-presenting cells to CD4+ T cells; HLA class I (HLA-A, B and C) molecules present foreign peptides on the surfaces of infected cells to activate cytotoxic T cells. In Th1 immunity against MTB infection, HLA class II molecules present MTB antigens on the cell surfaces of macrophages to activate naïve CD4+ T cells, and to induce the Th1 response in conjunction with interleukin-12.38 The HLA genes are by far the most polymorphic in the human genome. Different HLA alleles have different binding properties with antigenic peptides. The evolutionary advantage of heterozygosity has resulted in multiple amino acid substitutions in HLA genes, which permit the presentation of a wider range of antigens from emerging pathogens. This diversity may be essential to protect a population from extinction by a single pathogen infection.

A number of studies have reported the association of HLA class II polymorphisms with TB susceptibility. One interesting report was the association of the polymorphism at residue 57 of the DQβ1 protein. According to the study by Delgado et al.,39 the DQβ1 Asp57 allele was associated with increased risk of progressive pulmonary TB. Compared with non-Asp57 alleles, DQβ1 Asp57 demonstrates reduced ability to bind the immunogenic peptides of MTB,39 which may weaken the Th1 response. It is noteworthy that the residue 57 polymorphism of DQβ1 is also associated with the autoimmune disease type 1 diabetes, which is caused by T-cell-mediated destruction of pancreatic β-cells. Asp57 is protective against type 1 diabetes,40 and the mechanism is related to more efficient presentation of ectopic autoantigens in the thymus to induce central immune tolerance.41, 42 This opposite effect of Asp57 in TB and autoimmune diabetes may partially explain the decreased susceptibility to TB and the increased susceptibility to autoimmune diabetes in the European population. Nevertheless, the Asp57 association with TB susceptibility needs to be validated by an independent study.

Besides the residue 57 polymorphism of DQβ1, some studies reported association of TB disease with DQ and DR haplotypes. Increased TB susceptibility was associated with the DR2 allele in Indonesian43 and Asian Indian,44 the DQB1*0503 allele in Cambodian,45 DQA1*0101 and DQB1*0501 in Mexican,46 DRB1*1501 and DQB1*0601 in Asian Indian47 and DQB1*05 in Polish48 populations. A protective effect was reported to be associated with DQB1*0402, DR4 and DR8 in Mexican,46 and DQB1*02 in Polish48 populations. More studies are still needed to validate these associations.

Toll-like receptor (TLR) genes

TLRs are membrane signaling receptors that have an essential role in the activation of innate immunity against microbial infection, by recognizing specific molecular patterns of microbial components.49 Among the TLR genes, TLR7 and TLR8, located in close proximity on ChrX, are highly expressed in lung, and TLR7 is also highly expressed in leukocytes and macrophages, which suggests potential roles in lung infection.50 To date, TLR7 and TLR8 are known to recognize single-stranded RNA.51 This response has generally been thought to involve single-stranded RNA of viruses; however, TLR7 was recently implicated in interferon (IFN) production following dendritic cell phagocytosis of streptococci.52 This response occurred only with phagocytosed bacteria and was dependent on the TLR response in degradative vacuoles. Such a scenario has not been reported with TB, but seems plausible. TLR7 and TLR8 appear to reside in endosomes and lysosomes, and their response to single-stranded RNA requires sequestration of the RNA in one of these locations.53, 54, 55, 56 Davila et al.57 identified the TB association of the TLR7 and TLR8 locus in an Indonesian cohort including 375 TB patients and 387 controls, and then replicated the association in a Russian cohort including 1837 cases and 1779 controls. This study highlights the potential function of the TLRs in anti-TB immunity, and warrants further exploration of the potential role of these molecules in TB pathogenesis.

Genes involved in the IFN-γ signaling

Previous studies on Mendelian susceptibility to mycobacterial diseases highlight the critical roles of the IFN-γ signaling genes in MTB infection. Mutations of the IFN-γ receptor genes that cause Mendelian susceptibility to mycobacterial diseases confer a specific and lethal susceptibility to mycobacterial infection.58 A number of studies reported that DNA variations of the IFN-γ gene (IFNG) were associated with TB susceptibility, as reviewed in the meta-analysis by Pacheco et al.59 The IFNG +874 polymorphism (dbSNP ID rs2430561) was suggested to be associated with TB susceptibility (P=0.0008) by the meta-analysis of 11 studies.59 The +874 T/A polymorphism is located in a putative nuclear factor-κB-binding site, and the protective T allele is associated with high IFN-γ expression.60 However, the dramatic variability in effect sizes among different studies needs to be addressed by further study.

Studies also investigated the association between IFNGR1 polymorphisms and TB susceptibility. However, the results of different studies are conflicting. Some studies suggested genetic association,61, 62, 63, 64 whereas other studies suggested no association.65, 66

Other genes

Mutations of the interleukin-12 signaling genes that cause Mendelian susceptibility to mycobacterial diseases were also reported; their phenotypes show nonspecificity, low penetrance and better prognosis.67, 68 These mutations suggest the importance of interleukin-12 signaling in MTB infection. Common IL12B variation was reported to be associated with TB susceptibility, but with only nominal statistical significance.62 No replication study of this association has been reported. Polymorphisms of the cytokines interleukin-10 and tumor necrosis factor-α were also reported to be associated with TB susceptibility. The results from several of these studies were controversial, and the joint analysis did not show statistical significance.59 Nitric oxide produced in macrophages may participate in destroying MTB.69 Nitric oxide production in macrophages is catalyzed by the enzyme encoded by the nitric oxide synthase 2A gene (NOS2A). Polymorphisms of NOS2A may also change host TB susceptibility.64, 70 The role of vitamin D goes beyond calcium regulation, and includes mediation of innate immunity against MTB infection.71 Several studies suggested an association between variations of the vitamin D receptor gene (VDR) and TB susceptibility. However, dramatic heterogeneity existed among different populations, and the combined analysis of the genetic association of VDR with TB susceptibility was not statistically significant.72

GWAS studies on TB susceptibility

Recently, Thye et al.73 published the first report of a GWAS on TB. In this study, Thye et al. combined the GWAS results of two large African cohorts, that is, one cohort from Ghana comprising 1740 controls and 921 cases, and another from The Gambia comprising 1377 controls and 1309 cases. This study identified a novel association tagged by the SNP rs4331426 at 18q11.2. The effect size of this association is small, with an odds ratio of 1.19. This is a gene-poor region, and therefore the functional mechanism of this association is still a puzzle. Besides the 18q11.2 locus, this study identified another 16 loci with interesting P-values in the two GWAS cohorts, but these lacked statistical significance in the replication cohorts. None of these 17 loci harbors any gene with a known function that may participate in MTB infection or TB activation. This study by Thye et al. provides a proof-of-principle that the GWAS approach can be used to understand the molecular mechanisms of TB, as it has been successful in other common human diseases.
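
To make the notion of an odds ratio concrete, the sketch below computes an allelic odds ratio and an approximate 95% confidence interval from a 2×2 table of allele counts. The counts are invented for illustration and are not the data of Thye et al.; the function name is hypothetical, and the confidence interval uses the standard log-odds (Woolf) approximation, which is an assumption about method rather than a description of the original analysis.

```python
import math


def odds_ratio_ci(case_risk, case_other, ctrl_risk, ctrl_other, z=1.96):
    """Allelic odds ratio with an approximate confidence interval.

    Arguments are allele counts (all must be positive). The interval is
    built on the log-odds scale (Woolf method); z=1.96 gives about 95%.
    """
    counts = (case_risk, case_other, ctrl_risk, ctrl_other)
    if any(c <= 0 for c in counts):
        raise ValueError("all allele counts must be positive")
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (not from any study discussed here).
or_, lo, hi = odds_ratio_ci(case_risk=2400, case_other=1600,
                            ctrl_risk=2200, ctrl_other=1800)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An odds ratio close to 1, such as the 1.19 reported for rs4331426, means that the risk allele is only slightly more frequent in cases than in controls, so thousands of samples are needed before the confidence interval excludes 1.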

In addition, Thye et al. examined 22 TB candidate genes that had been reported to be associated with TB by previous low-throughput association studies, including all the TB candidate genes discussed in this review article. Nominally significant P-values were seen, but as each gene region contains multiple SNPs, no P-value survived multiple-testing correction within its gene region. The family-wise test of all 22 candidate regions was also not significant (P=0.325). Therefore, the TB GWAS did not support previous candidate-based studies. The genotyping coverage of the TB GWAS might have affected the replication of the candidate gene associations, because DNA polymorphisms in many TB candidate genes were not genotyped directly in the GWAS. However, the overall nonsignificance of the TB candidate genes suggests no major genetic effect from these candidate genes.

Further GWAS efforts on TB may revise the traditional view of TB genetic susceptibility: none of the candidate genes with important roles in MTB infection and Mendelian susceptibility to mycobacterial diseases was found to be associated with active TB in the GWAS, whereas the TB-associated locus and the other loci with nominal significance in the GWAS harbor no gene with a known function in TB.

Summary

The epidemics of acquired immunodeficiency syndrome (AIDS) and diabetes in the developing world, as well as the prevalence of multidrug-resistant TB, are threatening TB control. TB is the most common complication of AIDS. Diabetes increases the risk of TB by two to four times.74 Although great efforts are being made to find cures for AIDS and diabetes, the prevalence of both AIDS and diabetes is still increasing in countries with substantial TB transmission. Without an effective strategy to prevent TB in high-risk individuals, this highly communicable disease may jeopardize the health of the broader population. The prevalence of multidrug-resistant TB (accounting for 5% of all TB cases) is making this situation even more urgent. Knowledge of the molecular mechanisms underlying immunity to TB is critical for the development of effective strategies to control TB.

However, compared with other complex diseases, genetic studies of TB susceptibility have been far less successful because of the serious confounding effect of environmental factors. Most reported TB-associated loci lack replication among different populations. The risk of MTB infection is determined by the combined effect of a number of factors, that is, the virulence of MTB, exposure, the infectious dose and route of infection, and the host susceptibility to MTB infection. A number of non-genetic factors (for example, AIDS, aging and socioeconomic condition) may contribute to the host susceptibility to MTB infection. In addition, the risk of developing active TB from LTBI is also influenced by a number of non-genetic factors, including AIDS, diabetes, smoking, alcoholism, malnutrition, aging and immunosuppressive treatment (for example, anti-tumor necrosis factor-α therapy).75, 76 The hypothesis-free GWAS is an important tool for understanding unknown mechanisms of TB. To make the GWAS approach more successful in TB research, international efforts are required to increase the statistical power by merging multiple datasets and to improve the study design by controlling more strictly for multiple environmental confounding factors (particularly intensity of exposure). Using a mathematical modeling approach, our study (unpublished data) showed that MTB infection and LTBI activation were differently influenced by environmental factors. Improved phenotyping of TB susceptibility will also be critical for future success.