Every minute, another three people in the world die of tuberculosis. With more than 8 million new cases of active disease and nearly 1.5 million deaths annually, tuberculosis is a global health emergency of overwhelming proportions1. The causal pathogen — the bacterium Mycobacterium tuberculosis — is transmitted by inhalation. In the lungs, M. tuberculosis is phagocytosed by macrophages, which are thought to be the predominant host cell for the majority of the infectious life cycle. Internalization by macrophages triggers an immune response, the recruitment of additional monocytes and, ultimately, the formation of a granuloma that effectively contains the infected cells. The success of this microorganism is partly due to its ability to survive within macrophages in a granuloma for months and even decades in an asymptomatic state2,3,4. It has been estimated that one-third of the human population may harbour M. tuberculosis in this state5. For transmission to occur this containment must fail (which can be due to changes in the host immune status) or be overcome. When the granuloma breaks down, infectious bacilli can be released into the airways to be expelled and transmitted to the next host. Although this outline of the life cycle is generally accepted2,3, many details of this cycle and the mechanisms that allow M. tuberculosis to survive in the host remain poorly understood.

M. tuberculosis is one member of a group of organisms known as the M. tuberculosis complex (MTBC), which comprises numerous strains of M. tuberculosis, the human pathogen Mycobacterium africanum and a clade of animal-infecting mycobacteria (including Mycobacterium bovis). Some non-tuberculosis mycobacteria are capable of causing infections in humans or animals, but the majority are common environmental organisms that typically live in soil6. The mycobacteria, in turn, are members of the Gram-positive Actinomycetes, which include numerous common soil and aquatic bacteria. The story of M. tuberculosis is therefore a story of how what was most likely a soil bacterium evolved to become one of the most successful human pathogens in history.

In this Review, I describe how the application of genomics to the study of tuberculosis is accelerating our understanding of this ancient disease. Through the expanding use of genome sequencing and comparative genomic analyses, the research field has gained new insights into the origins of M. tuberculosis. This in turn has clarified how M. tuberculosis evolved to become a 'professional pathogen' of humans and how it continues to evolve to evade our clinical efforts through drug resistance. The extension of sequencing technology to systems biology has further yielded insights into the adaptations that enable M. tuberculosis to survive in the host. Together with an extensive literature on the cellular biology and immunology of tuberculosis, these data give rise to an emerging picture of an organism that has evolved over an extended period of time to adapt to, and possibly orchestrate changes in, the host immune system.

The emergence of a human pathogen

Tuberculosis has affected humans for thousands of years. The ancient association of tuberculosis with humanity is supported by literary descriptions, by morphological evidence in human fossils and by the identification of mycobacterial DNA in human remains using genotyping methods, including PCR of the IS6110 repeat sequence, spoligotyping or PCR amplification, and sequencing of the D1 deletion region (Boxes 1,2; see Supplementary information S1 (table)). The initial analyses of these data led to the hypothesis that human-infecting M. tuberculosis arose through a zoonotic transmission of an ancient M. bovis strain from cows during domestication7,8 (Box 1), which is similar to the zoonotic origins of other human pathogens such as measles and influenza viruses. However, it was the subsequent application of genomics that forced a re-evaluation of the origins of M. tuberculosis in humans.

Fresh phylogenetic insights from genome sequencing. Insights into the emergence of M. tuberculosis as a human pathogen have come from several genomic studies, such as whole-genome sequencing (WGS) of M. tuberculosis, the characterization of the genetic diversity of modern M. tuberculosis strains and comparative genomic analyses with related species (including other members of the MTBC and smooth tubercle bacilli (STBs)).

The WGS of the first strain of M. tuberculosis, the widely used laboratory strain H37Rv, was a landmark in the study of this disease9. A little more than 100 years after the bacterium was isolated by Robert Koch, it was revealed to have a GC-rich genome of 4.4 Mb that contains ~4,000 genes. The genome sequence also enabled the development of DNA microarrays that could be used to probe the gene content of closely related mycobacteria through comparative genome hybridization10. Such studies, complemented by pulsed-field gel electrophoresis approaches11,12, resulted in the identification of variable genomic regions that are present in some but not all MTBC strains. The distribution of these regions led to the construction of a phylogenetic tree of the MTBC species that began to overturn prior beliefs about the origins of tuberculosis13 (Fig. 1).

Figure 1: Evolutionary relationship between selected mycobacteria and members of the Mycobacterium tuberculosis complex.

Mycobacteria are predominantly environmental organisms. Several transitions from the environment to pathogenicity have occurred in these mycobacteria, including transitions for Mycobacterium avium and the Mycobacterium tuberculosis complex (MTBC). The MTBC was thought to arise as a clonal expansion from a smooth tubercle bacillus (STB) progenitor population. The animal-adapted Mycobacterium bovis ecotypes branch from a presumed human-adapted lineage of Mycobacterium africanum that is currently restricted to West Africa. Human-adapted M. tuberculosis strains are grouped into seven main lineages, each of which is primarily associated with distinct geographical distribution. Colour coding for lineages corresponds to the colours in Fig. 2a,b. The dates of branching events are only crude estimates. TbD1 indicates the deletion event specific for M. tuberculosis lineages 2, 3 and 4. Evolutionary distances are not to scale. Detailed phylogenetic trees are provided in Supplementary information S2 (figure). All species shown are from the genus Mycobacterium.


In contrast to the notion that human tuberculosis arose as a zoonosis from domesticated cattle, the phylogeny revealed that animal strains of M. bovis are nested within a tree of human-adapted tuberculosis strains and thus represent derivative species relative to M. tuberculosis13,14 (Fig. 1). This new phylogeny also casts doubt on the idea of a recent origin for human tuberculosis. The original zoonotic theory of M. tuberculosis posited that human tuberculosis arose with the domestication of animals ~10,000 years ago. The new phylogeny predicted that animal strains instead diverged from human strains much earlier.

The notion of an older origin of M. tuberculosis from other human-infecting mycobacterial species was further supported by genomic studies of STBs, which are primarily associated with human tuberculosis in East Africa15,16,17. Initial studies suggested that STB isolates and, particularly, Mycobacterium canettii represent an early branching lineage of human-infecting mycobacteria13,15,18 (Fig. 1). This phylogenetic relationship was recently confirmed by WGS of five STB isolates19. This relationship was also used to attempt to date the origins of human tuberculosis. As both STB and M. tuberculosis strains are associated with human tuberculosis, parsimony can be used to argue that the last common ancestor of the two strains was also adapted to humans18. However, this assertion requires caution, as arguments for an environmental reservoir for STB strains have also been put forward20. Using a molecular clock analysis, one study suggested that the ancestor of M. tuberculosis and STB strains might have existed up to 2.8 million years ago18. Assuming that there was a human-adapted ancestor, this leads to the controversial hypothesis that mycobacterial disease in hominids may predate that in Homo sapiens and may have plagued human ancestors as far back as Homo habilis.

The studies on STBs suggest that the emergence of pathogenicity was associated with a change in evolutionary tempo. STB strains are genetically diverse and show evidence of both recombination and lateral gene transfer (LGT; also known as horizontal gene transfer). By contrast, the MTBC seems to be a single clonal expansion from this progenitor population. MTBC strains display low genetic diversity and no unambiguous evidence of LGT. A similar pattern of a clonal expansion from a recombining progenitor population has also been observed in Mycobacterium avium, which is a non-tuberculosis mycobacterium related to M. tuberculosis and includes both environmental and animal pathogen strains21,22,23 (Fig. 1). This shared pattern suggests potential common constraints in the evolution of pathogenicity within the mycobacteria. Clonality may be either an adaptation to or a consequence of pathogenicity, and it has implications for interpreting the evolution of mycobacteria. Low genetic diversity is consistent with pathogens emerging from a genetic bottleneck. The relative lack of sequence diversity within the MTBC, in particular, is consistent with a bottleneck at the split between the MTBC and the progenitor population.

The degree of genetic diversity within the MTBC has been a topic of debate. Early studies based on selected genes revealed little DNA diversity and led to the notion that strain variability in M. tuberculosis was negligible and clinically unimportant24,25. Subsequent studies using a larger number of markers called into question this conclusion and suggested the presence of genetic differences between M. tuberculosis strains from different parts of the world. The full breadth of such diversity was first revealed by the WGS of 25 M. tuberculosis strains that were selected to span the emerging genetic diversity20,26. The subsequent sequencing and analysis of an additional 229 strains provided an even more detailed view27. The comprehensive picture afforded by WGS has revealed substantially more genetic diversity than previously appreciated, and two strains of M. tuberculosis were found to differ by up to 2,000 single-nucleotide polymorphisms (SNPs) — a genetic distance that is comparable to the interspecies distance between M. tuberculosis and M. bovis20. Although this diversity is substantially smaller than that seen in other bacterial species (see below), there is an increasing amount of evidence that it may nonetheless have functional and perhaps clinical consequences28.
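The SNP-based genetic distance referred to above can be illustrated with a small sketch. The strain names and sequences below are invented toy data, not real M. tuberculosis genomes, and the function names are hypothetical; the sketch only shows how differences between aligned sequences might be counted, assuming that gaps and ambiguous bases are skipped.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count aligned positions at which two sequences carry different bases.

    Positions where either sequence has a gap or an ambiguous base are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_distances(alignment: dict) -> dict:
    """Return the SNP distance for every unordered pair of strains."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Toy alignment (hypothetical strains, 10 aligned positions each).
toy_alignment = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAT",
    "strain_C": "ACCTACGAAT",
    "strain_D": "ACCTAC-AAT",
}

distances = pairwise_distances(toy_alignment)
for pair, d in distances.items():
    print(pair, d)
```

In a real analysis the input would be a whole-genome alignment or a table of variant calls against a reference, and repetitive regions would usually be masked before distances are computed.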

WGS has also enabled the construction of the most reliable phylogenetic tree of MTBC strains so far27. Within this phylogeny seven major lineages of human pathogenic M. tuberculosis and M. africanum can be discerned along with a lineage of the animal-adapted strains of M. bovis (Fig. 1; see Supplementary information S2 (figure)). As previously detected using genotyping and selected sequencing29,30,31, each lineage is associated with a restricted geographical distribution32.

An updated model of M. tuberculosis origins. From the combination of archaeological, sequencing and comparative genomic analyses, a plausible scenario of the evolutionary origins of M. tuberculosis can be proposed (Fig. 2). Building on previous proposals13,29,33, components of this scenario have been described20,27,32,34,35. The deepest origins are most likely to lie in ancestral species that were adapted to soil and other environmental niches6,14,36. From these primordial roots, several transitions of mycobacteria to animal pathogenesis occurred. For M. tuberculosis, this transition began within an STB progenitor population18,19. It remains unclear whether human pathogens arose in this population (and if so, when this happened). From this early population, a successful clone emerged to seed the MTBC. The argument that this emergence occurred in Africa32 is supported by the geographical restriction of M. canettii and other STBs to the Horn of Africa, the restriction of M. africanum to West Africa and the occurrence in Africa of all seven lineages of human-infecting M. tuberculosis32 (Fig. 1). A coalescence analysis of genome sequences from 259 MTBC strains further supports this hypothesis and suggests that the MTBC arose at least 70,000 years ago27. From this origin, M. tuberculosis probably spread with human migration out of Africa. An analysis of the MTBC phylogeny suggests an early dispersal of MTBC lineage 1 that is coincident with the initial migration of humans out of Africa around the Indian Ocean as early as 67,000 years ago. A second dispersal may then have occurred ~46,000 years ago, which is coincident with the second wave of human migration into the Middle East, Europe and Asia27. Early human migration over the Bering Strait may provide an explanation for the archaeological evidence of tuberculosis in the ancient Americas (Box 1). In addition, MTBC lineage 7 seems to have diverged after the primary migration out of Africa. 
MTBC lineage 7 is so far associated strictly with the Horn of Africa37 and may thus have arisen in a population that either stayed in or returned to Africa27. Sometime during this period, a clone of M. tuberculosis that was originally adapted to cause human tuberculosis evolved to infect a non-human mammal and thus began the transition into non-human ecotypes (for example, M. bovis). Such infections then spread to other animals, including cattle, goats, oryx and seals38,39,40. The early branching of the M. bovis clade suggests that its emergence predates animal domestication, although it has been noted that domestication may have contributed to the spread to livestock14. After M. tuberculosis emerged in Africa, its spread with human migration to early population centres in Western Europe, northern India and East Asia would provide an explanation for the early emergence of three main lineages of M. tuberculosis and for the archaeological evidence of tuberculosis in these regions (Box 1; Fig. 2a). Human population expansion could have resulted in a concurrent expansion of these three lineages. Consistent with a co-divergence of MTBC and modern humans41, a comparison of the MTBC phylogeny with a tree constructed from human mitochondrial genomes reveals congruence27. Global exploration, trade, conquest and migration, coupled with the rapid growth in human population in modern times, would then have resulted in the worldwide spread of these initial lineages and led to the current phylogeographical distribution of tuberculosis (Fig. 2b).

Figure 2: Hypothesized evolutionary scenario for Mycobacterium tuberculosis and selected archaeological evidence.

a | Early evolutionary events are shown. Colour coding for geographical locations corresponds to the lineages in Fig. 1. Mycobacterium tuberculosis and the M. tuberculosis complex (MTBC) arose from a smooth tubercle bacillus (STB) progenitor population in Africa through infections of early hominids. Mycobacterium africanum strains spread to West Africa (green arrow). During this period a strain evolved from M. africanum to infect non-human mammals. MTBC strains followed human migration from Africa to the Middle East and Asia (brown, purple, orange, yellow, light blue and dark blue arrows), which gave rise to documented archaeological evidence (pink circles). Potential migration across the Bering Strait may also have spread MTBC to the Americas (grey arrows). b | The spread of M. tuberculosis during modern times is shown. Human population expansions in Western Europe, northern India and East Asia gave rise to concurrent expansion of M. tuberculosis lineages in these regions. Global migration then resulted in the observed geographical distribution of these lineages. c | A timeline of evolutionary events and archaeological data is shown. The location for archaeological evidence is indicated in each box and corresponds to the pink circles in part a. Boxes outlined in black indicate morphological evidence only, whereas boxes outlined in red denote both morphological and molecular evidence. A complete table of citations of archaeological evidence for tuberculosis in the ancient world is provided in Supplementary information S1 (table). M. bovis, Mycobacterium bovis; MYA, million years ago. Parts a and b are modified from Ref. 32.


Many questions remain concerning this hypothesized scenario. A recent analysis of 63 M. tuberculosis genomes supported the concurrent expansion of the M. tuberculosis and human populations but failed to find evidence of co-divergence between population structures42. The recent isolation of an M. tuberculosis strain from a wild chimpanzee in Côte d'Ivoire also complicates the picture43. WGS and phylogenetic analyses indicate that this strain clusters most closely with the M. africanum 2 lineage rather than with the M. bovis clade. However, several lines of evidence suggest that this strain in chimpanzees was not a result of infection from humans. Moreover, two other strains that were adapted to African mongooses and hyraxes have also been reported to cluster with the M. africanum 2 lineage44,45 but seem to be distinct from the strain in chimpanzees43. These interesting strains may imply that the earliest MTBC ancestor had a host range that was wider than the Homo genus. Alternatively, such strains may represent transitions of M. tuberculosis from humans to mammalian hosts that were distinct from the M. bovis transition. Further study into the full genetic and host diversity of the MTBC is required to answer these questions. Their answers, in turn, have important implications for understanding the nature of the co-evolution of M. tuberculosis and humans, the length of time over which this has occurred and the nature of M. tuberculosis pathogenesis, and therefore for the development of drugs and vaccines. In addition, the mechanisms of evolution have an immediate and important consequence for understanding the most recent evolutionary adaptation of M. tuberculosis: the emergence of drug resistance.

Recent evolution and drug resistance

Until ~80 years ago, there was no drug to treat tuberculosis. Since then, we have witnessed the emergence of single-drug resistant, multiple-drug resistant (MDR) and extensively drug resistant (XDR) M. tuberculosis strains. Most recently, four different countries have reported strains that are resistant to all drugs tested, which are known as totally drug resistant (TDR) M. tuberculosis46,47,48,49. Drug resistance is a widespread problem in all bacterial pathogens and is typically driven by a combination of LGT, genome rearrangement and nucleotide mutation. By contrast, evolution of M. tuberculosis seems to be more restricted, which raises challenges for understanding resistance in tuberculosis.

Genetic mechanisms and rates of recent evolution. LGT has long been considered a 'driver' for the evolution of new gene functions. In contrast to other bacteria50,51,52,53,54, evidence of LGT had been lacking within the MTBC. Comparative genomics between M. tuberculosis and related mycobacteria have provided strong evidence for the role of gene gains and losses in the evolution of mycobacterial ancestors55,56,57,58,59, and the progenitor population of STBs and M. tuberculosis shows clear signs of recombination18,19. However, since the emergence of the MTBC from this progenitor population, MTBC evolution seems to be mostly clonal60,61. Consistent with this, only 1% of SNPs in 259 sequenced MTBC strains display evidence of homoplasy27. However, a recent comparative analysis62 has provided evidence for ongoing recombination, albeit for short tracts of DNA. Further work is needed to confirm these intriguing findings.

Similarly, until recently, large-scale genome rearrangements were generally considered to be absent from the MTBC. Studies of larger-scale polymorphisms reported predominantly deleted sequences of small genomic regions that span part of a gene to several genes63,64,65,66. These data have led to the prevailing view that genome-scale evolution in the MTBC has mostly occurred by gene loss6. Such a pattern is often assumed to be consistent with reductive evolution that is associated with an intracellular pathogenic lifestyle36, and a similar pattern was observed in M. avium pathogenic strains relative to environmental strains67. However, two recent studies reveal that large-scale duplications are not altogether absent in M. tuberculosis68,69. WGS reported multiple large-scale duplications of the same genomic region in M. tuberculosis69. Originally identified only in the Beijing lineage68, WGS revealed such occurrences in multiple lineages with substantially different boundaries, which indicates independent originating events. The independent duplication of the same large genomic region in multiple strains suggests instability of this region and/or a selective advantage for these duplications. Along with the identification of a smaller segmental duplication in the laboratory strain H37Rv, these polymorphisms suggest that large-scale duplication events may be more common than previously considered. Gene duplication is an evolutionary driver of new gene functions50,52,54 and has been linked to both drug resistance and virulence in bacteria51,54,70,71. Thus, the results in M. tuberculosis warrant further study.

In spite of the evidence above, genome evolution in M. tuberculosis is thought to be mostly driven by sequential chromosomal nucleotide substitutions. Understanding the rate of nucleotide mutation in M. tuberculosis is thus central to our knowledge of the process of drug resistance evolution. To address this issue, one study carried out WGS on strains of M. tuberculosis isolated from cynomolgus macaques72. By sequencing both the infecting strain of M. tuberculosis and isolates after various times of infection, the authors were able to estimate the mutation rate in monkeys during various stages of tuberculosis infection. The estimate of 2 × 10^−10 mutations per cell division of M. tuberculosis for active disease places the mutation rate at the lower end of the spectrum for bacteria. More surprisingly, the mutation rate per unit time for latent disease, reactivated disease and M. tuberculosis grown in vitro was not significantly different from that for M. tuberculosis during active disease. A subsequent study also confirmed that these rates were similar to those estimated from the sequencing of human isolates73. The finding that the M. tuberculosis per-time mutation rate is similar between different stages of tuberculosis is controversial74 because much evidence suggests that M. tuberculosis replicates at substantially different rates during different stages75,76. During active disease, it is thought that bacteria may exist in subpopulations with heterogeneous phenotypes within a single patient and even within a single granuloma3. During latent disease, M. tuberculosis is thought to enter a dormant state in which the replication rate is substantially slower than that during active growth. In many bacteria, mutation rate is linked to errors during replication. On the basis of their results and the particular mutations observed, the authors propose an alternative model for M. tuberculosis infection. 
Mutations during latent infection might be explained by oxidative damage in the host. The mutation rate would then be linked to the amount of time growing in the host rather than to replication rate. The role of oxidative stress in the development of drug resistance has also been recently described in Escherichia coli77,78,79.
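The tension between a per-division and a per-time mutation rate can be made concrete with a short calculation. This is only an illustrative sketch: it assumes that the quoted rate of 2 × 10^−10 is expressed per base pair, it uses the 4.4 Mb genome size given earlier, and the two generation times (20 hours and 200 hours) are hypothetical placeholders chosen to represent fast and slow replication, not measured values.

```python
GENOME_SIZE_BP = 4.4e6             # genome size stated in the text
RATE_PER_BP_PER_DIVISION = 2e-10   # quoted estimate; "per base pair" is an assumption


def mutations_per_genome_per_division(rate_per_bp: float, genome_size_bp: float) -> float:
    """Expected number of new mutations in one genome per cell division."""
    return rate_per_bp * genome_size_bp


def rate_per_bp_per_day(rate_per_bp_per_division: float, generation_time_hours: float) -> float:
    """Convert a per-division rate into a per-day rate for a given generation time."""
    divisions_per_day = 24.0 / generation_time_hours
    return rate_per_bp_per_division * divisions_per_day


def implied_rate_per_division(per_day_rate: float, generation_time_hours: float) -> float:
    """Per-division rate implied by a fixed per-day rate and a generation time."""
    return per_day_rate * generation_time_hours / 24.0


# Hypothetical generation times (assumptions for illustration only).
ACTIVE_GENERATION_HOURS = 20.0
LATENT_GENERATION_HOURS = 200.0

per_genome = mutations_per_genome_per_division(RATE_PER_BP_PER_DIVISION, GENOME_SIZE_BP)
active_per_day = rate_per_bp_per_day(RATE_PER_BP_PER_DIVISION, ACTIVE_GENERATION_HOURS)

# If the per-day rate were identical in latency but replication were 10 times
# slower, each division would have to accumulate 10 times as many mutations.
latent_per_division = implied_rate_per_division(active_per_day, LATENT_GENERATION_HOURS)

print(f"mutations per genome per division: {per_genome:.2e}")
print(f"per-bp rate per day (active): {active_per_day:.2e}")
print(f"implied per-bp rate per division (latent): {latent_per_division:.2e}")
```

Under these assumptions, an unchanged per-time rate during slow replication implies a proportionally higher per-division rate, which is the kind of discrepancy that a time-dependent source of mutation such as oxidative damage could account for.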

Implications of mutation rates for the emergence of drug resistance. The low mutation rate is consistent with the lack of genetic diversity among M. tuberculosis strains but is surprising in the face of the rapid emergence of multiple-drug resistance and the increasing amount of evidence for the diversity of M. tuberculosis in vivo. WGS of serial patient isolates has documented that resistance mutations to many drugs are often independently acquired several times by different isolates80,81,82,83. In the case of one patient who was infected with an M. tuberculosis Beijing strain82, different drug resistance mutations were found at different times during a nine-year period. Moreover, the evidence suggested that multiple different resistance mutations transiently existed within this patient between selective sweeps. A subsequent study confirmed this possibility by using WGS to directly characterize the genetic diversity of the M. tuberculosis population in individual patients84. Rather than isolating single bacterial colonies for sequencing, the researchers cultured bacteria from sputum samples and sequenced all colonies that were present. By sequencing to a high depth of coverage, they could identify mutations at specific loci that are present only in a proportion of the bacteria sequenced. Using this approach, they confirmed the surprising heterogeneity of mutations in a single patient, albeit in sputa: as many as 4–5 different resistance mutations for a single drug could be detected, although only one was ultimately fixed in the population. Resistance to a single drug can thus arise multiple times, even in an individual patient85.
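The idea of detecting mutations that are present in only a proportion of the sequenced bacteria can be sketched as a calculation on per-position read counts. The counts, thresholds and function names below are hypothetical and are not taken from the cited study; the sketch simply shows how deep coverage allows minority alleles to be separated from the majority base.

```python
def allele_frequencies(base_counts: dict) -> dict:
    """Convert read counts per base at one genomic position into fractions."""
    total = sum(base_counts.values())
    if total == 0:
        return {}
    return {base: count / total for base, count in base_counts.items()}


def minority_variants(
    base_counts: dict,
    reference_base: str,
    min_fraction: float = 0.01,
    min_reads: int = 5,
) -> dict:
    """Return non-reference bases supported by enough reads to be reported.

    Both thresholds are illustrative guards against sequencing error, not
    validated cut-offs.
    """
    freqs = allele_frequencies(base_counts)
    return {
        base: freq
        for base, freq in freqs.items()
        if base != reference_base
        and freq >= min_fraction
        and base_counts[base] >= min_reads
    }


# Toy read counts at one hypothetical resistance-associated position,
# sequenced to a depth of 1,000 reads.
toy_counts = {"C": 900, "T": 80, "G": 18, "A": 2}

variants = minority_variants(toy_counts, reference_base="C")
print(variants)
```

In this toy example two alternative alleles pass the thresholds, whereas the base seen in only two reads is treated as indistinguishable from sequencing error; real pipelines would additionally consider base quality and strand bias.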

Together, these data have raised the question of whether the slow mutation rate of M. tuberculosis is sufficient to provide an explanation for the relatively rapid rate of multiple-drug resistance acquisition observed86. Several factors may play a part in this question74. One factor may be phenotypic: it has been proposed that transient hypermutator phenotypes might arise, as seen in other bacteria87. In M. tuberculosis, deletion of the error-prone DNA polymerase (DnaE2) reduced the rate of rifampicin resistance acquisition during treatment in mice88. This led to the suggestion that changes in the expression of DnaE2 alter the rate of resistance acquisition. Another important factor seems to be the genetic diversity in the different M. tuberculosis lineages. Experimental and epidemiological evidence has long suggested that different strains of M. tuberculosis can vary in their propensity to develop drug resistance. In particular, the Beijing family of strains that belong to the East Asian lineage has been associated with an increased rate of acquiring resistance mutations89,90,91,92. The relative contribution of bacterial genetic background and other epidemiological factors to this association has been unclear. One study showed that M. tuberculosis strains from different lineages differed in their intrinsic mutation rates: strains from the East Asian lineage acquired drug resistance in vitro more rapidly than strains from the Euro–American lineage73. These differences in mutation rates are consistent with known mutations in DNA replication and repair genes in strains from the East Asian lineage93. These data imply that patients infected with strains from different lineages are at different risks for developing drug-resistant disease. Such a possibility requires connecting the measured mutation rates with observed rates of drug resistance acquisition during treatment, and has substantial implications for treatment strategies.

In addition to the acquisition of de novo drug resistance, molecular epidemiological studies (Box 2) have also highlighted the importance of the transmission of strains that are already resistant (that is, primary resistance). The frequency at which resistant strains are transmitted has been a topic of debate94. It was once assumed that such strains would suffer a fitness cost relative to susceptible strains in the absence of treatment. However, several studies revealed that many resistance mutations carried no fitness costs and that these are the most common mutations in clinical isolates95,96,97,98,99,100. Moreover, the fitness cost of deleterious resistance mutations can be offset by compensatory mutations95,100. For example, a WGS study confirmed that mutations in rpoA and rpoC (which encode the α and β′ subunits of RNA polymerase, respectively) seem to offset the fitness cost of mutations in rpoB (which encodes the β subunit of RNA polymerase) that lead to rifampicin resistance101,102. These compensatory mutations are observed in a large proportion of clinical isolates83,101. These data imply that resistant strains may persist in the global M. tuberculosis population even in the absence of drug treatment.

From genomes to function: systems biology

Beyond mapping genome sequence and structure, which is rapidly becoming routine, sequencing-based technologies are now also enabling global profiling of genome function. The combination of genomics with other 'omics' technologies — such as proteomics, metabolomics and lipidomics — has the ability to assay cells with unprecedented breadth and depth, as well as across a range of timescales and experimental contexts. Thus, there is the potential to develop coherent models of the systems themselves that can be used to predict and understand dynamic system behaviour in a range of contexts.

A recent report described the application of this approach for understanding the molecular systems that underlie the ability of M. tuberculosis to survive in the host103. At the heart of this effort was the use of chromatin immunoprecipitation followed by sequencing (ChIP–seq)104,105,106,107 for the characterization of the genome-wide occupancy of M. tuberculosis transcription factors to delineate the global transcription factor regulatory network. The Tuberculosis Systems Biology Consortium was the first to apply ChIP–seq on a large scale to globally map transcription factor binding sites in bacteria103. The consortium recently reported a genome-scale map of the regulatory interactions of 50 transcription factors (which constitute 26% of predicted M. tuberculosis transcription factors), and many more have since been completed (see the Tuberculosis Database).

The M. tuberculosis ChIP–seq data have confirmed several surprises that have also emerged from extensive ChIP–seq mapping of transcription factors in nearly all other organisms107. This confirmation calls into question some of the simplifying assumptions about bacterial transcriptional regulation, as reviewed elsewhere107,108. First, the data are revealing that binding of transcription factors in M. tuberculosis occurs in many more diverse genomic locations than simply the promoter-proximal regions that are expected on the basis of the canonical model of regulation. ChIP–seq data from M. tuberculosis confirm that binding in upstream intergenic regions is enriched over that expected by chance, but this represents less than 40% of the binding events for any transcription factor. The majority of binding events occur outside upstream intergenic regions in either genic or converging intergenic regions, which suggests a potentially more complex role for transcription factor binding107,109. Second, mapping in all organisms has revealed many more binding sites even for well-studied regulators109, and this was also true for M. tuberculosis103,107. The functions of most of these novel binding sites in all organisms remain unknown. In M. tuberculosis these novel binding sites are nearly always associated with an underlying sequence motif, but not all instances of motifs are bound. Depending on the transcription factor, less than 40–70% of potential binding sites are occupied in vivo. Thus, M. tuberculosis transcription factor binding is specific to genomic context103. The determinants of this specificity remain unknown.

The binding data also suggest that the M. tuberculosis regulatory network is far more complex and interconnected than previously assumed. For example, there is substantial binding between transcription factors (Fig. 3a). This complexity mirrors that seen in regulatory networks in other bacteria and eukaryotes110,111,112, and its implications have been the topic of intense study. One key finding is that this connectivity results in many nested feedback and feedforward loops (known as network motifs) that are known to give rise to non-trivial expression dynamics, including gene expression pulses111,113,114. Such dynamics, which are typically invisible to steady-state or single time-point expression measurements107, are essential for understanding complex cellular behaviour51,52,70,115,116.
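To make the notion of a network motif concrete, the sketch below enumerates feedforward loops (X regulates Y, Y regulates Z, and X also regulates Z directly) and two-node feedback loops in a small directed graph. The node names and edges are invented for illustration and do not correspond to any mapped M. tuberculosis interactions. The brute-force enumeration is an assumption made for brevity and is practical only for small networks.

```python
from itertools import permutations


def find_feedforward_loops(edges):
    """Return sorted (x, y, z) triples with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    nodes = {n for edge in edges for n in edge}
    loops = []
    for x, y, z in permutations(nodes, 3):
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set:
            loops.append((x, y, z))
    return sorted(loops)


def find_feedback_pairs(edges):
    """Return sorted pairs (a, b) that regulate each other (a->b and b->a)."""
    edge_set = set(edges)
    pairs = {
        tuple(sorted((a, b)))
        for a, b in edge_set
        if a != b and (b, a) in edge_set
    }
    return sorted(pairs)


# Hypothetical regulator -> target edges (illustrative names only).
toy_edges = [
    ("TF_A", "TF_B"),
    ("TF_B", "gene_C"),
    ("TF_A", "gene_C"),  # closes the feedforward loop TF_A -> TF_B -> gene_C
    ("TF_B", "TF_D"),
    ("TF_D", "TF_B"),    # mutual regulation: a two-node feedback loop
]

if __name__ == "__main__":
    print(find_feedforward_loops(toy_edges))
    print(find_feedback_pairs(toy_edges))
```

On a genome-scale network the same definitions apply, but a dedicated graph library or an adjacency-list search would be a more sensible choice than testing every ordered triple.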

Figure 3: Host–pathogen interactions.

a | Selected regulatory interactions that link responses to stress with changes in lipid metabolism are shown. Interactions were identified by chromatin immunoprecipitation followed by sequencing (ChIP–seq), transcription factor perturbation and transcriptomics103. Links are represented by arrows, the colours of which indicate predicted regulatory effects of the transcription factors on the target genes. b | A simplified model of the interactions between Mycobacterium tuberculosis and the host in an infected granuloma is shown. Interactions include two hypothesized feedback loops that are mediated partly by the regulatory interactions in part a. Host stresses induce metabolic changes in M. tuberculosis, which lead to the catabolism of host lipids (including cholesterol). Degradation of cholesterol leads to propionate build-up in the bacterium, which can be alleviated by the assimilation of other host lipids into the production of immunomodulatory lipids. Certain immunomodulatory lipids induce the formation of foamy macrophages that enclose numerous intracellular lipid-containing bodies. Lipids from these bodies can be accessed by M. tuberculosis. Thus, M. tuberculosis responds to the host environment and the digestion of host lipids, leading to the production of immunomodulatory lipids that shape the host environment to increase the availability of host lipids.

The importance of expression dynamics in M. tuberculosis has recently been demonstrated by an investigation of the response of single cells to treatment with the antibiotic isoniazid (INH)117. Antibiotic treatment of M. tuberculosis and other bacteria typically kills only a proportion of cells, and this persistence in the face of antibiotics is thought to contribute to the difficulties of effectively treating tuberculosis and possibly to the emergence of drug resistance. Using a microfluidic device and time-lapse microscopy, this study showed that the gene encoding catalase-peroxidase-peroxynitritase T (KatG), the enzyme that activates INH, is expressed in stochastic pulses. These pulses were only present in a minority of cells. Moreover, pulsing was negatively correlated with cell survival after treatment with INH. These data suggest that infrequent pulsatile expression of KatG has a role in allowing M. tuberculosis to adapt to drug exposure. The mechanisms that generate KatG pulsing are not yet known. The considerations above suggest that aspects of the complexity of the regulatory network mapped at a global level may well explain this behaviour, which is only visible when M. tuberculosis is observed at a highly granular level.

The regulatory network map also begins to reveal interactions between transcription factors that mediate the complex and dynamic responses of M. tuberculosis to the host environment (Fig. 3a), such as the hypoxic conditions within macrophages. For example, the data show an interconnected subnetwork that links hypoxic adaptation, lipid and cholesterol degradation, and lipid biosynthesis. These processes, which are among the most extensively studied in M. tuberculosis, are often treated as separate, disconnected phenomena. However, they are linked biochemically118, and the emerging M. tuberculosis regulatory network reveals that they may also be linked in terms of gene regulation (Fig. 3a). Consistent with this, systems-level profiling of lipids, proteins, metabolites and mRNA in M. tuberculosis during a time course of hypoxia and subsequent re-aeration in vitro uncovered numerous alterations in lipid content, as well as changes in gene expression and metabolites in corresponding metabolic pathways103. For example, changes in oxygen tension produced rapid and reversible changes in the expression of genes that are necessary for cholesterol degradation. This was surprising because although cholesterol is present in host cells, the culture conditions used for profiling contained no cholesterol, and M. tuberculosis is unable to synthesize cholesterol de novo. These and other observations suggest that shifts in oxygen levels evoke a regulatory and metabolic programme whose outputs are specifically adapted to the host environment. This hypothesis, in turn, is consistent with the emerging picture of M. tuberculosis as a pathogen that has evolved over a substantial period of time to interact specifically with the human immune system.

The host–pathogen interaction

A successful M. tuberculosis infection is a delicate balance. On the one hand, M. tuberculosis requires an immune response that is adequate to establish a granuloma but not so substantial that it leads to complete sterilization. On the other hand, inflammation over time must be sufficient to promote the eventual breakdown of some granulomas for transmission. Growing evidence suggests that striking this balance requires a complex programme of cellular changes in both the pathogen and the host that is partly induced and coordinated by M. tuberculosis.

The mechanisms underlying these changes are many, complex and still poorly understood. However, there is an increasing amount of evidence that part of the explanation involves interactions that lead to changes in lipid metabolism in both the pathogen and the host (Fig. 3b). According to this scenario, host stresses119,120 induce metabolic changes in M. tuberculosis that include a switch to the catabolism of host lipids, particularly cholesterol119,121,122,123,124,125. Alterations in M. tuberculosis lipid metabolism are tightly linked with changes in the production of toxic propionate, the detoxification of which is central to M. tuberculosis host survival126. This programme also results in alterations in immunomodulatory cell wall lipids that can be trafficked through macrophages103,127,128 and induce host changes, including the conversion of macrophages to foam cells that have numerous lipid-containing bodies129,130. The lipids in these bodies, which include cholesterol, can be accessed by M. tuberculosis. Alterations in M. tuberculosis metabolism, including the digestion of host cholesterol, lead to propionate build-up121, which can be alleviated in M. tuberculosis by the assimilation of host lipids into immunomodulatory cell wall lipids118,131,132. This completes the loop through the stimulation of further host cell lipid availability. In short, host cells and M. tuberculosis may interact partly through positive feedback loops, in which responses to the host environment and the digestion of host lipids lead to the production of M. tuberculosis immunomodulatory lipids that shape the host environment to increase the availability of host lipids. This scenario is consistent with the regulatory links discovered in the M. tuberculosis regulatory network described above.
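The qualitative effect of such a positive feedback loop can be illustrated with a deliberately simple two-variable model. This is a toy sketch, not a model taken from the literature cited here: the variables, the linear rate laws, the unit rate constants and the `feedback_gain` parameter are all assumptions introduced for illustration. Host lipid availability is supplied at a basal rate and is increased further by bacterial immunomodulatory lipids, which are in turn produced from host lipids.

```python
def simulate(feedback_gain, steps=4000, dt=0.01,
             basal_host_lipid=1.0, conversion=1.0, decay=1.0):
    """Forward-Euler integration of a linear two-variable feedback loop.

    d(host_lipid)/dt   = basal_host_lipid + feedback_gain * immuno_lipid
                         - decay * host_lipid
    d(immuno_lipid)/dt = conversion * host_lipid - decay * immuno_lipid

    All parameters are hypothetical and dimensionless.
    """
    host_lipid = 0.0
    immuno_lipid = 0.0
    for _ in range(steps):
        d_host = (basal_host_lipid
                  + feedback_gain * immuno_lipid
                  - decay * host_lipid)
        d_immuno = conversion * host_lipid - decay * immuno_lipid
        host_lipid += dt * d_host
        immuno_lipid += dt * d_immuno
    return host_lipid, immuno_lipid


if __name__ == "__main__":
    # Compare the open loop (gain 0) with a moderate positive feedback.
    for gain in (0.0, 0.5):
        host, immuno = simulate(gain)
        print(f"gain={gain}: host lipid={host:.3f}, immuno lipid={immuno:.3f}")
```

With unit conversion and decay rates, the steady-state host lipid level in this toy model is basal_host_lipid / (1 - feedback_gain) for gains below 1, so closing the loop with a gain of 0.5 doubles lipid availability relative to the open loop. Gains of 1 or more make the linear model grow without bound, which a more realistic model would prevent with saturating terms.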

These and other interactions characterize M. tuberculosis as a pathogen that has evolved specific cellular programmes that not only respond to the host environment but also orchestrate changes to it3,132,133,134,135,136. One of the most compelling pieces of evidence that support this view was derived from the sequencing of strains spanning the genetic diversity of M. tuberculosis described above. A common theme for many human pathogens is that genes encoding antigens are frequently highly variable to allow pathogens to escape detection by the immune system. However, an analysis revealed that the situation in M. tuberculosis seems to be different26. The authors examined the conservation of all genes between the sequenced M. tuberculosis strains. As expected, genes that are known to be essential were more conserved than non-essential genes. Surprisingly, the majority of known T cell antigen-encoding genes were also as conserved as essential genes. Moreover, the epitope-encoding regions within these genes were the most conserved regions of these or any genes. This apparent purifying selection of T cell antigens has since been confirmed by a comparative analysis of sequenced M. tuberculosis and STB strains19. These results do not include data on PE–PPE genes, which constitute a highly repetitive antigenic gene family that is typically not captured by short-read sequencing (Box 2). Moreover, more work is required to link specific immune responses to the predicted conserved antigens. Nonetheless, these remarkable data suggest that, unlike other pathogens that seek to avoid host immune responses, M. tuberculosis is potentially under selective pressure to instigate such a response26.


The reports documented in this Review give rise to a picture of M. tuberculosis as a pathogen that is remarkably adapted for life in human host cells, and the evolutionary history of this bacterium suggests that this adaptation may have evolved over thousands, if not millions, of years. This alters our perspective on the disease. As previously noted, the formation of a tuberculosis granuloma is generally thought to be a response of the immune system to the bacterium, whereas breakdown of the granuloma then represents a failure of this response3. However, from the perspective of long-term evolution of M. tuberculosis with humans, the granuloma can also be seen as an adaptation of M. tuberculosis to enable a persistent infection, which is a likely requirement in the face of low human population densities during pre-history3,137. The subsequent expansion of the human population has altered this dynamic, and more recent evolutionary events may be driving additional changes. Co-infection with both M. tuberculosis and HIV amplifies the impact on the host immune system and thus the effects of both tuberculosis and AIDS138. The impact of these and other recent events (for example, the emergence of diabetes139) on the evolution of M. tuberculosis remains to be seen.

This picture of host–pathogen interactions has implications for vaccine development efforts. The challenge facing immunologists is the development of a vaccine against an organism in which infection with the entire organism is insufficient to generate protective immunity. This challenge is illustrated by the failure of vaccination with the attenuated bacille Calmette–Guérin (BCG) strain to induce lifelong immunity. Such a challenge is potentially compounded by the finding that many of the antigens being used to develop vaccines seem to be under strong purifying selection26. As noted by a previous study, this is a double-edged sword: the lack of diversity may simplify vaccine development, but the possibility that the immune response induced by most antigens may have evolved to benefit M. tuberculosis complicates efforts26.

Encouragingly, the expanding application of genomics is providing new means for combating this old foe. Although most antigens seem to be highly conserved, sequencing has revealed a small subset of antigens that do show variation26. These may thus warrant further study in the context of vaccine discovery. The widespread sequencing and analyses of M. tuberculosis strains are also providing new insights into the ongoing evolution of M. tuberculosis during infection, treatment and the acquisition of drug resistance85. These results can also be used to develop more effective strategies for deploying existing drugs, such as by analysing drug resistance mutations in patient-derived populations of M. tuberculosis to predict the drugs that might be most clinically effective for a particular patient. Finally, systems mapping is uncovering the complex regulatory programmes that have evolved to enable the organism to survive in the host103. Disruptions of these programmes may thus represent new avenues for drug discovery.