Introduction

Tuberculosis (TB) remains a major global public health burden, with roughly one-quarter to one-third of the world's population infected with the causative bacterium Mycobacterium tuberculosis (Mtb). It is one of the leading infectious causes of death worldwide, in recent years second only to COVID-19 [1]. Despite this public health significance, improved treatment and vaccination strategies are still urgently needed [2], making the identification of susceptibility factors a major focus of basic science research.

After exposure to an infectious TB case, there are three possible outcomes: resistance or early clearance of the bacillus; asymptomatic or latent Mtb infection (LTBI), which can persist for decades; or symptomatic ‘active TB’, which includes pulmonary disease that can result in further transmission. Active TB typically manifests as a severe productive cough (pulmonary TB), though it can also affect other organ systems (extrapulmonary TB), and it is commonly accompanied by fever, weight loss, night sweats, and other clinical features common to respiratory infectious diseases. Resistance to infection is seen in individuals who test negative on standard diagnostic tests, such as the tuberculin skin test (TST) and/or interferon-gamma release assay (IGRA), and then remain negative. Some individuals may test negative initially but convert to an LTBI state within months, or perhaps even longer (Fig. 1). Although both the uninfected and LTBI states are asymptomatic, individuals with LTBI have an approximately 10% lifetime risk of progression to active symptomatic TB.

Fig. 1: Conceptual model and clinical classification of resistance to infection.

After initial exposure to an active TB case, ascertained individuals may present as either infected or uninfected based on the tuberculin skin test (TST) and/or an interferon-gamma release assay (IGRA) such as QuantiFERON. Some individuals then remain uninfected, while others convert to positive; this conversion can happen weeks or even months after the initial exposure.

While previous TB research focused primarily on factors associated with the transition from LTBI to active TB disease, attention has recently turned to factors associated with resistance to developing LTBI after exposure, as this stage would provide an appealing target for prevention, specifically through vaccine development and host-directed therapies [3]. The underlying premise is that targeting this earlier stage of pathogenesis, the progression from the uninfected to the infected state, might prevent LTBI and ultimately progression to disease [4]. This is similar in principle to the identification of biologic mechanisms that prevent HIV infection (as opposed to progression to full-blown disease), which have been harnessed as a treatment modality [5]. A vaccine that prevents infection would also be easier to study in clinical trials, since infection events are more common than disease events, and indeed such studies are ongoing [4]. Epidemiologically, this resistance phenotype has also gained attention for understanding the annual risk of infection [6], accurate estimates of which are important for gauging the global burden of disease and for developing public health strategies to slow TB transmission. Understanding LTBI, and conversely resistance to LTBI, is critical for understanding the complexity of immune responses in TB pathogenesis [3, 7]. Genetics and genomics offer one perspective on the innate biologic factors that influence resistance, and thus one avenue for target discovery, as illustrated in HIV by the original discovery of the protective CCR5 variant.

Identifying epidemiological factors associated with this ‘resistance’ phenotype has been challenging because of the wide variety of phenotype definitions used across studies [8], described in more detail below. Several studies found no clear epidemiologic factors that differentiated resisters from individuals with LTBI [9,10,11,12]. A multinational study identified relationship to the index case and alcohol use as potential factors associated with resistance to infection, though neither remained significant in a multivariable model [13]. One study found that BCG vaccination was associated with resistance [14], though it used a different clinical definition, and it is not clear how this might have affected that finding (see further discussion below). Age might be an important factor, since it is associated both with time to TST conversion [15] and with the potential BCG effect noted above [14], though other studies did not replicate this. Age is noteworthy as a potential confounder because it has also been proposed as a confounder in genetic studies of TB disease [16]. While time on antiretroviral therapy might be a relevant factor [12], another study found no association between HIV infection (regardless of treatment) and resistance [17], so the importance of HIV infection itself is not well understood.

The ‘resister’ phenotype is also becoming the focus of genetic and genomic studies. In fact, the first study of this phenotype was a genetic study [18], conducted before any detailed epidemiological investigations. When designing and interpreting these studies, the definition of this phenotype is critical. In this review, we first focus on phenotype definition and then turn to current findings and potential future directions. While the focus of this paper is tuberculosis, these principles are relevant to the genetic study of infectious diseases in general.

Resistance to infection: study design is critical

Individuals who, despite prolonged exposure to infectious TB cases, continue to have a persistently negative tuberculin skin test (TST) or interferon-gamma release assay (IGRA) are of particular interest, as they may help inform prevention strategies. Such individuals are presumed to be resistant to latent Mtb infection (RSTR) (Fig. 1). Controls (non-RSTR) are then individuals who do not remain TST-/IGRA- after exposure. While this sounds simple, many elements of this definition are critical and sometimes challenging to implement. Epidemiological definitions in a variety of settings, and their immunological implications, have been discussed at length in other reviews [3, 7, 8], so here we summarize them briefly (Table 1).

Table 1 Critical elements of RSTR identification.

Exposure

To be considered an RSTR, an individual must have clearly established exposure to an infectious TB case. As the studies summarized in our review [8] show, exposure has been documented in different ways. In close-contact settings such as households, aspects of the index case's TB diagnosis (culture confirmation, microbial load, etc.) are important, since these variables reflect the potential infectiousness of that index case. Data on proximity to the index case are also important, as they may additionally reflect the risk of high-intensity exposure. By contrast, in high-transmission community-based settings, there is no index case. Nevertheless, a community-based study in South Africa, a hyper-endemic setting, reported that transmission occurs more frequently in the community than in the household, and this transmission was verified by positive antibody titers in TST-/IGRA- subjects [12]. A high degree of exposure may also result from certain employment settings, especially where TB is endemic [11]. However, no design is perfect. The household contact design does not guarantee transmission [19], and in populations of low endemicity, it is not clear that a community cohort would have sufficient exposure to truly characterize TST-/IGRA- individuals as “resistant” to infection.

Age is another proxy for exposure. As described above, some studies suggest that age might be an important risk factor for TST/IGRA conversion, and many of the studies that did not find age to be a factor included only adults. Thus, age might be an appropriate variable to consider as an indicator of lifetime exposure. A recent simulation study has suggested that misclassification of exposure in cases and controls can bias genetic association results [20, 21], so documentation of exposure is not a trivial point.

Duration of follow-up

The duration of follow-up in studies of “resisters” or “early clearance” has varied. Cross-sectional studies are insufficient, because exposure to a TB case might not translate into a positive test (IGRA and/or TST) in the contact until a few weeks later. Some studies have limited follow-up to 3 months after exposure [14], which is potentially when most conversions to a positive TST/IGRA occur. However, our studies [15, 22] show that conversion to a positive test can occur 6 months or more after diagnosis of the TB index case. Other studies have used 12 [10, 12] or 24 [15] months of follow-up. Our long-term follow-up study [17] shows that conversion can occur even several years after the initial exposure (average follow-up of 9–10 years). In that case, infection is likely due to exposure to a TB case other than the original index case, a situation that may be more similar to the high-endemicity setting described above. Either way, the point remains: cross-sectional studies and/or short-term follow-up (less than 6 months) might be insufficient for a strict RSTR definition. Lastly, our studies [17] and others [12] have shown that IGRA results collected over the course of a year may be unstable, so clinical classifications are best made using the entire collection of data (IGRA and TST, if available) over each subject's period of observation. If only one or two measurements were available, a subject might be misclassified because of a sporadic value.
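The principle of classifying each subject from the full series of results, rather than from a single draw, can be sketched in Python. This is a minimal illustration under stated assumptions, not the classification algorithm of any cited study: each result is assumed to be already dichotomized with the study's own TST and IGRA cutoffs, the 6-month minimum follow-up echoes the discussion above, and the rule requiring two positive results before calling LTBI, as well as the names `Measurement` and `classify_subject`, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Measurement:
    """One dichotomized test result (hypothetical record layout)."""
    month: float     # months since exposure or ascertainment
    test: str        # "TST" or "IGRA"
    positive: bool   # already dichotomized with the study's own cutoff


def classify_subject(results: List[Measurement],
                     min_follow_up: float = 6.0,
                     min_positive: int = 2) -> str:
    """Classify one subject from the whole series of TST/IGRA results.

    Hypothetical rule, for illustration only:
      - "RSTR": every result negative and follow-up >= min_follow_up months
      - "LTBI": at least min_positive positive results
      - "indeterminate": anything else (e.g. one isolated positive,
        or all-negative but followed too briefly)
    """
    if not results:
        return "indeterminate"
    n_positive = sum(m.positive for m in results)
    follow_up = max(m.month for m in results)
    if n_positive == 0 and follow_up >= min_follow_up:
        return "RSTR"
    if n_positive >= min_positive:
        return "LTBI"
    return "indeterminate"


# A single sporadic positive IGRA among otherwise negative results is
# flagged for review instead of driving the classification on its own.
series = [
    Measurement(0, "TST", False), Measurement(0, "IGRA", False),
    Measurement(3, "IGRA", True), Measurement(12, "IGRA", False),
    Measurement(12, "TST", False),
]
print(classify_subject(series))
print(classify_subject([m for m in series if not m.positive]))
```

Flagging isolated positives as indeterminate is only one possible design choice; a study could instead adjudicate them with a repeat test or a prespecified majority rule.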

Diagnostic tool

Early studies of the RSTR phenotype used only the TST, because the IGRA had not yet been developed or refined. Thus, many published genetic studies are based on a TST-only phenotype. More recent studies are transitioning to the IGRA. This raises a new question, since discordance between the two tests is well known [10, 17]. Will studies that use only the IGRA appropriately reflect ‘resistance’? Should studies use both tests, which might be logistically challenging, especially for longitudinal designs? Ongoing studies need to tackle this question both epidemiologically and immunologically, to understand how different clinical definitions are reflected biologically.

Lingering questions

The discussion above already mentions some key challenges in defining the RSTR phenotype. The first question is how long follow-up must continue in a cohort study to be sufficiently certain about the characterization of RSTRs. In Stein et al. [17], ~80% of persistently TST- individuals remained TST- and were also IGRA- after an average of 9–10 years of follow-up from initial exposure. While that is a high concordance rate, is it high enough? Certainly, it is logistically challenging and expensive to carry out follow-up studies for a decade or more. Based on their summary of the literature, Gutierrez et al. [8] advocate at least 1 year of follow-up, since that period covers the majority of TST/IGRA conversions after exposure. Kroon et al. [12] additionally advocate recruiting especially high-risk individuals, particularly those with HIV infection, who are more likely to convert their TST/IGRA within a shorter time. Strikingly, antibody signatures identified in HIV-infected resisters with shorter follow-up are very similar to those found in the Ugandan long-term follow-up study [23], suggesting that at least this aspect of the biological mechanism is robust to duration of follow-up as well as to type of exposure (household vs community). Additional genomic studies, building on those described below, may also clarify how phenotypic stringency affects biological interpretation. Further long-term follow-up studies that quantify the conversion rate as a function of time since initial exposure (in household contact studies) or initial ascertainment (in community-based studies) are needed to better understand this issue.
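Quantifying conversion as a function of time since exposure is naturally framed as a time-to-event problem, because subjects who are still negative at their last visit are censored rather than proven resistant. A minimal Kaplan-Meier sketch in Python follows. The function name and toy data are hypothetical; a real analysis would more likely use an established survival package and would need to account for interval censoring, since conversion is only known to have occurred somewhere between two visits.

```python
from typing import List, Tuple


def cumulative_conversion(times: List[float],
                          converted: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of the cumulative proportion converted over time.

    times[i]     -- months from exposure to conversion, or to the last
                    negative test if subject i never converted (censored)
    converted[i] -- True if subject i converted at times[i]
    Returns (time, cumulative proportion converted) at each conversion time.
    """
    data = sorted(zip(times, converted))
    at_risk = len(data)
    still_negative = 1.0  # estimated probability of remaining unconverted
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = n_at_t = 0
        # Group all subjects whose conversion or censoring time equals t.
        while i < len(data) and data[i][0] == t:
            events += data[i][1]
            n_at_t += 1
            i += 1
        if events:
            still_negative *= 1 - events / at_risk
            curve.append((t, 1 - still_negative))
        # Converted and censored subjects both leave the risk set after t.
        at_risk -= n_at_t
    return curve


# Hypothetical toy cohort: months of follow-up and conversion status.
months = [3, 6, 6, 12, 24, 24]
status = [True, True, False, True, False, False]
for t, p in cumulative_conversion(months, status):
    print(f"by month {t}: {p:.2f} converted")
```

Treating still-negative subjects as censored, rather than dropping them or counting them as resisters, is what lets such a curve use every participant's follow-up time.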

Second, the question of TST/IGRA discordance as the field transitions to the IGRA is not trivial. Objections to the TST have been based primarily on concerns about false positives due to BCG vaccination and/or boosting. However, many studies have found no bias in TST conversion rates associated with a history of BCG vaccination, and studies of boosting suggest that this effect is minimal [24].

Third, while we outline above the variables that should be used to define sufficient exposure, little work has examined the degree of exposure and whether it has an impact. The Ugandan studies used a quantitative epidemiologic risk score to define degree of exposure [17, 25], and another large multinational study categorized exposure into low, medium, and high levels [13]; such graded measures of exposure might be helpful for future genetic analyses.

How is TB unique?

From a genetic epidemiological standpoint, it is important to consider which of these aspects of study design and phenotype definition are also relevant to the genetic study of other infectious diseases. The first point to consider is mode of transmission. TB is spread through airborne droplets, an indirect mode of transmission, which is one reason exposure is difficult to quantify; transmission is not direct, as it is for sexually transmitted diseases and other viral infections. Establishing exposure for those infections is certainly also challenging but entails a different strategy, which is beyond the scope of this review. Other infectious diseases with indirect transmission might be vector-borne or vehicle-borne, where establishing exposure is likely even more difficult though, as pointed out by the aforementioned simulation study [20], still important. The prevalence of infection and disease might also be a consideration when assessing exposure. Second, not all infectious diseases have a stage of pathogenesis in which the pathogen is present but does not cause symptoms. Influenza and COVID-19 are examples of viral infections with an asymptomatic state, and resistance in hepatitis C has been examined in terms of “spontaneous clearance” [20] (similar to the “early clearance” term coined in the Mtb resistance field [14]). Likewise, different types of pathogens (bacteria, viruses, parasites) have different stages of pathogenesis, which ultimately could affect clinical definitions in genetic studies.

Recent genetic and genomic findings

Heritability – impact on gene mapping

A recent review by Abel et al. [16] summarizes heritability estimates across a number of studies with a variety of phenotype definitions and study designs, in a variety of global populations. These estimates ranged from as low as 39% (for IGRA results) to 92% (for quantitative TST results). In our recent study, in which we defined RSTR based on our long-term follow-up study [17], we obtained an estimate of 48% when including both HIV-infected and HIV-uninfected individuals in the analysis. Our RSTR definition is in a sense a composite phenotype, requiring negative results on both TST and IGRA, so it is not directly comparable to those of previous studies, but it also has longitudinal stringency. The heritability analysis of other TB phenotypes presented in McHenry et al. [26] illustrates that stricter phenotypes yield higher heritability estimates. Higher heritability, in turn, generally implies greater power for gene mapping, which is important for interpreting the extant literature.

Candidate gene and genome-wide association studies (GWAS)

Most of the initial studies examining LTBI (or its converse, lack of infection) were cross-sectional, sometimes in low-transmission settings (Table 2). Other studies examined the TST longitudinally but did not use the IGRA to characterize infected and uninfected individuals, because the assay was unavailable at the time. Only recently, as IGRAs have become a standard part of epidemiological studies, have genetic studies also characterized subjects according to this measure. Early studies focused primarily on candidate genes and genome-wide linkage, and more recently, as genotyping costs have fallen considerably, genome-wide association studies have been conducted. This contrast in approach makes comparison across papers difficult, as candidate gene studies used less stringent p-value thresholds for declaring significance, and linkage studies also require a lower significance threshold than GWAS. Linkage studies also have low resolution for identifying common variants. The passage of time has thus affected the available phenotyping strategies as well as the genotyping technology. If our goal is to use genetics to identify potential therapeutic targets, replication at the gene level is of primary interest. Indeed, gene-level comparison is the only way to assess replication between linkage and association results, since linkage studies mostly use allele sharing between relative pairs as the statistic of interest and therefore do not identify an effect allele. In addition, targeted candidate gene studies might select single nucleotide polymorphism (SNP) markers differently from GWAS.
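Much of the difference in significance thresholds follows from correcting for the number of tests performed. A Bonferroni correction divides the overall error rate α by the number of independent tests m, and the conventional genome-wide threshold of 5 × 10^-8 corresponds to roughly one million independent common-variant tests. The short Python sketch below uses illustrative test counts (20 SNPs for a hypothetical candidate gene study), not values from any study in Table 2; linkage thresholds are usually stated as LOD scores and are not reproduced here.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold controlling the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Illustrative test counts only, not taken from the cited studies.
candidate_gene = bonferroni_threshold(0.05, 20)        # about 0.0025
genome_wide = bonferroni_threshold(0.05, 1_000_000)    # about 5e-08
print(f"candidate gene: {candidate_gene:.4g}, GWAS: {genome_wide:.1e}")
```

A SNP with p = 10^-4 would therefore be declared significant in the hypothetical 20-SNP candidate gene study but not in a GWAS, which is one reason cross-design comparisons need care.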

Table 2 Summary of genetic studies.

A few observations can be made from the literature summarized in Table 2. This table includes the primary findings of the cited studies and also indicates which of these findings were replicated (at p < 0.05) in a recent GWAS [26] in which the phenotype was based on stringent follow-up and both TST and IGRA [17], enabling examination of replication by phenotype definition. First, one locus, on 5p15, was replicated between the original linkage studies using TST only. This locus appears robust: it was observed in two different populations (South Africa and Uganda) using slightly different TST-based definitions (persistently negative TST and continuous TST induration), and it was most recently replicated in a GWAS that used the stringent definition of persistent TST/IGRA negativity. Second, some of the loci originally identified by genome-wide linkage studies (2q21-q24 and 11p14), again using different variations of the TST-based definition (longitudinal and cross-sectional, respectively), were also replicated by the recent GWAS that used the stringent TST/IGRA definition. Third, the GWAS uncovered significant loci that were not identified by the genome-wide linkage studies. This is quite likely due to the greater power and resolution of association analysis, though it is also possible that the more stringent definitions increased statistical power [26]. However, with only two such studies, additional GWAS are still warranted. The consistency between the older TST-based linkage and association studies and McHenry et al. [26] suggests two things: the high heritability also identified by McHenry et al. [26] suggests that this strict phenotype may have increased power to detect effects; and, since these loci were originally identified with a variety of phenotype definitions but still attained genome-wide significance, effects seen at such a stringent threshold may be robust to phenotype definition.

The non-replication between the McHenry et al. [26] and Quistrebert et al. [27] studies is curious and deserves particular attention. The Quistrebert et al. [27] study was notable for using three different populations for internal replication, with rigorous criteria for TST and IGRA positivity, but two of the three cohorts used TST and IGRA only at baseline, with no longitudinal assessment. The lack of replication between the two GWAS suggests that a stringent phenotype definition might increase power to detect effects but does not guarantee replication; it may also point to genetic heterogeneity underlying RSTR. It is also unclear whether the lack of replication was due to differences in phenotypic characterization (cross-sectional vs long-term follow-up) or perhaps to ethnic background. Certainly, the use of three ethnically diverse populations in the Quistrebert study is a major strength. Because that study focused on internally replicated loci, it is unknown whether its data would replicate some of the other loci listed in Table 2. Furthermore, there may well be additional, as yet undetected genetic associations with the RSTR phenotype. Indeed, the analysis by Dawkins et al. [28] revealed associations with HLA that had not been seen in prior studies, and the authors hypothesized that these associations were detected because of the well-defined phenotypic contrasts.

The validity of these genetic associations is also difficult to assess, for a few reasons. First, as the citation list makes clear, relatively few groups have conducted large-scale studies of this phenotype, so there are relatively few opportunities for replication. This was a stated weakness of the Ugandan GWAS: the analysis contained only a discovery sample, with no replication sample. Second, the functional implications of these loci have not been delineated. If the goal is to use such findings to identify therapeutic targets, this is an important next step. Such functional studies have most likely been attempted and/or are ongoing, so it will be essential to follow the literature closely. Intriguingly, the loci identified by these studies are distinct from those associated with TB susceptibility [29,30,31], suggesting that the biologic underpinnings of resistance are quite different.

Transcriptomic studies

As of this writing, only a few RNA expression studies have been published, though many more are likely ongoing (Table 3). Although few, these studies include functional validation, an important step towards the ultimate goal of identifying biologic targets. The first study used microarray technology, which was cutting-edge at the time, and its phenotype was based on the TST only because, again, QuantiFERON (QFT) was not in use at the time [32]. This study examined macrophage gene expression after in vitro stimulation with Mtb. Several gene sets were differentially expressed, the most notable being the histone deacetylase pathway, and these findings were validated with immunologic and microbiologic experiments.

Table 3 Results of transcriptomic studies.

The other two studies used RNA-seq technology and the stricter RSTR phenotype [17] described above; the first examined basal gene expression levels in macrophages [33], and the second examined expression responses after these macrophages were stimulated with Mtb in vitro [34], similar to the microarray study. The basal gene expression study also included two populations (Ugandan RSTRs and South African miners with the RSTR phenotype) and identified gene sets associated with carbon metabolism and free fatty acid enrichment that differentiated RSTRs from individuals with LTBI in both populations. These hypotheses were validated with a variety of functional experiments as well as candidate gene association analysis using SNP data. In the study of gene expression after in vitro Mtb stimulation, no single-gene differences were significant after multiple testing correction, but gene set and pathway enrichment analysis revealed several pathways of interest, most notably TNF. Validation again used SNP association analysis, which revealed associations with ABCA1 and DUSP2. Note that the RNA-seq study [34] did not replicate the microarray study [32]; it is unknown whether this was due to small sample size, a different phenotype definition, or different technology. These questions can only be resolved with additional studies and functional validation.

As of this writing, all three published transcriptomic studies have been based in Uganda (with one replicating findings in South Africa), and they have differed in technology, phenotype, and experimental condition (stimulated or unstimulated macrophages). Thus, it is not possible to generalize these findings to other populations, whether those with different phenotype definitions or those with different exposure profiles. This will certainly be a focus of future research. It is worth noting, however, that the most novel findings from these transcriptomic studies differ from those identified for TB disease [35,36,37,38,39,40].

Future directions

While no published studies have yet examined epigenetic influences on the RSTR phenotype, another clinical framework provides a premise for such studies. Progression, or non-progression, to TB disease in the setting of HIV infection is another model for resistance. When immunosuppressed individuals living in TB-endemic settings do not develop TB disease or LTBI despite exposure and high risk, this provides a model of resistance that is useful for gene mapping [41, 42]. Two studies under this paradigm have revealed important epigenetic effects underlying resistance to infection. First, a methylome-wide study, examining methylation markers across the genome, identified methylation markers associated with TB susceptibility, as well as SNP markers that interacted with methylation markers to increase susceptibility to TB [43]. Second, a study of chromatin accessibility using ATAC-seq identified differential accessibility between clinical groups [44]. Future studies of the RSTR phenotype will likely use these technologies to better understand the impact of epigenetics in the context of genetic and transcriptomic variation. In addition, expression quantitative trait locus (eQTL) studies may begin to integrate SNP association and transcriptomic data, enabling interpretation of the role of both types of variation in clinical outcome [45]. Integrating SNP and RNA data will provide insight into function, which is essential for developing therapeutic strategies.

In addition, the lack of functional validation of findings from genetic studies illustrates the challenge of going from an association (or linkage) signal to a biologic readout. One promising approach is to use genetic data from a GWAS to validate transcriptomic findings, as in two of our studies [33, 34]. Another is to use genetic variation data within a bioinformatic framework that integrates data generated across other ‘omic platforms. Analyses that examine genetic variants one by one overlook possible epistatic and epigenetic effects, a point also made by another recent review in this area [16]. Systems biology approaches that incorporate genetic and genomic data along with other ‘omic (proteomic, epigenetic, etc.) data may reveal interactions between the various components of a comprehensive biologic model [46,47,48,49,50,51,52,53,54,55,56]. A related approach that may point to disease biology is to examine enriched gene sets and pathways [30]. This has proven fruitful in both our GWAS [26] and transcriptomic studies [32,33,34], especially because the gene expression studies did not reveal any single-gene effects that were significant after multiple testing correction. Ultimately, genomic data are one piece of the puzzle and should be integrated with findings from immunologic and other ‘omic strategies.
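Gene set and pathway analysis can take several forms. One of the simplest, over-representation analysis, asks whether a pathway contains more of the significant genes than expected by chance, using a one-sided hypergeometric test. The Python sketch below illustrates only that idea; it is not necessarily the method used in the cited studies (rank-based approaches such as GSEA are also common), and the function name and gene counts are hypothetical.

```python
from math import comb


def overrepresentation_p(n_genome: int, n_pathway: int,
                         n_hits: int, n_overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= n_overlap).

    n_genome  -- genes tested (the background universe)
    n_pathway -- genes in the pathway
    n_hits    -- genes called significant
    n_overlap -- significant genes that fall in the pathway
    """
    total = comb(n_genome, n_hits)
    upper = min(n_pathway, n_hits)
    # Sum the probabilities of seeing n_overlap or more pathway genes
    # among the hits; exact integer arithmetic until the final division.
    tail = sum(comb(n_pathway, k) * comb(n_genome - n_pathway, n_hits - k)
               for k in range(n_overlap, upper + 1))
    return tail / total


# Hypothetical counts: 20,000 genes tested, a 100-gene pathway,
# 200 significant genes, 8 of which fall in the pathway
# (about 1 would be expected by chance).
print(overrepresentation_p(20_000, 100, 200, 8))
```

Because the test aggregates modest signals across a pathway, it can be informative even when, as in the transcriptomic studies above, no single gene survives multiple testing correction.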

While replication analyses have been conducted independently, another future goal would be a cross-population meta-analysis. The challenge here would be accounting for variability in phenotype definition. Given the replication seen between some studies using cross-sectionally defined phenotypes and those using longitudinally defined phenotypes, it would be interesting to see whether GWAS using cross-sectional phenotypes add insight into the robustness of genetic results to phenotype definition. A recent meta-analysis of TB disease GWAS [57] illustrated the challenges posed by phenotype definitions that differ by geographic location. Still, this goal is worthwhile because it may bring to the fore associations that did not attain significance after multiple testing correction in any single sample; loci at such lower, yet suggestive, levels may still be clinically and biologically meaningful, but they may only be detectable when samples are combined. However, for a meta-analysis to be valuable, genomic studies of the RSTR phenotype first need to be conducted in more diverse populations. As noted above, the currently published findings come from a few well-characterized but globally restricted populations, and for a biologic target to have any translatability, it needs to be transferable across global populations. It is also worth re-analyzing published TB GWAS data (where disease, not resistance to infection, was the focus) to examine whether genes associated with resistance are also associated with disease progression at lower levels of significance and/or with different directions of effect. Lastly, it is of interest to understand whether the regions associated with resistance have arisen due to selective pressure. Signatures of selection have been observed in regions previously associated with TB disease resistance in the context of HIV infection [42]. Addressing this question would entail the collection and analysis of new data and is beyond the scope of this review, but it is a worthy question for future studies.
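The way a cross-population meta-analysis can strengthen a locus that is only suggestive in each sample can be sketched with a fixed-effect, inverse-variance weighted combination of per-study effect estimates (for example, log odds ratios and their standard errors). The Python example below is a minimal sketch: the function names and the three studies and their numbers are hypothetical, and a real cross-population analysis would also need to assess between-study heterogeneity (for example with a random-effects model), given the differing phenotype definitions discussed above.

```python
from math import erfc, sqrt
from typing import List, Tuple


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided normal p-value for the Wald statistic beta / se."""
    return erfc(abs(beta / se) / sqrt(2))


def fixed_effect_meta(betas: List[float],
                      ses: List[float]) -> Tuple[float, float, float]:
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns the pooled effect, its standard error, and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return pooled, pooled_se, two_sided_p(pooled, pooled_se)


# Hypothetical log odds ratios from three cohorts, each only suggestive alone.
betas = [0.30, 0.25, 0.35]
ses = [0.12, 0.11, 0.14]
for b, s in zip(betas, ses):
    print(f"single study: beta={b:.2f}, p={two_sided_p(b, s):.3g}")
pooled, pooled_se, p = fixed_effect_meta(betas, ses)
print(f"pooled: beta={pooled:.3f}, se={pooled_se:.3f}, p={p:.3g}")
```

In this toy example each cohort alone gives p of roughly 0.01 to 0.02, while the pooled estimate has a smaller standard error and a much smaller p-value, though still far from the genome-wide threshold.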

Conclusion

Genetic and genomic studies of resistance to Mtb infection are increasingly important as they become part of multidisciplinary efforts to identify potential vaccine and other therapeutic targets. Relatively few studies have examined this strictly defined phenotype, though recent work suggests that genetic findings may be robust to phenotype definition. Future efforts should use similarly stringent phenotype definitions in order to determine their true impact on the reliability and validity of genetic and transcriptomic associations. In addition, studies are needed in a variety of global settings to better understand the impact of Mtb genetic variation and other environmental factors on the reliability and validity of these genetic associations. These next steps are critical when considering potential translation of these findings to at-risk human populations. Moreover, it is important to understand how these results compare with targets currently being harnessed for host-directed therapies and vaccines that aim to prevent disease progression rather than maintain resistance to infection. Before the findings from these studies can be translated, however, it must be considered how such prevention strategies fit within existing strategies to prevent progression to TB disease. Public health strategies may differ depending on the biologic commonalities revealed by ongoing and future studies, and this will affect how potential host-directed therapies and/or novel vaccine approaches are implemented.