Mycobacterium tuberculosis (M.TB) is considered one of the most successful pathogens in human history (Galagan 2014). Infection caused by M.TB or other species in the Mycobacterium tuberculosis complex (MTBC), known as tuberculosis (TB), has persistently been among the top ten global causes of death per decade, with at least 8 million new cases every year (World Health Organization 2019). Two important factors have kept TB as the major focus in clinical research: (a) increasing antibiotic resistance in patients since the 1970s (Alanis 2005; Toungoussova et al. 2006; Eldholm et al. 2016), and (b) the rise of human immunodeficiency virus (HIV) infection since the 1980s, and the resulting immunosuppression leading to an increased incidence of M.TB (Centers for Disease Control and Prevention CDC 1989; Müller et al. 2013). More recently, whole-genome sequencing (WGS) technologies have opened new avenues, particularly with regard to drug-resistance surveillance (e.g., Köser et al. 2014; Zignol et al. 2018), epidemiology for infection prevention and outbreak control (e.g., Dheda et al. 2017; Dicks and Stout 2019), and for the more general study of pathogen evolution and transmission dynamics (e.g., Trauner et al. 2017; Payne et al. 2019). Leveraging these data to study the evolutionary forces driving the evolution of M.TB, which will be the focus of this review, has required a population-genetic perspective. Specifically, we explore how M.TB evolves within human hosts, and summarize recent results of relevance related to population-genetic theory and statistical inference. We argue that newly developed null modeling has shed light on earlier discrepancies and paradoxes, and provides an appropriate framework for studying the evolution of drug resistance in a variety of pathogens.

Deep sequencing as a measure of variation

During the last decade, WGS methods have allowed for detailed insight into DNA sequence variation. For pathogen samples, the number of sequences in the alignment is taken to be representative of an entire within-host population, and commonly this diversity is summarized as a consensus sequence for subsequent between-host analyses. Such analyses in M.TB have revealed low genetic diversity, with few SNPs differing between patients in an outbreak. In some cases, there have been fewer than 6 nucleotide differences across more than 100 patient samples (e.g., Walker et al. 2013; Séraphin et al. 2019). Similarly, low variation has been reported for thousands of global samples with only ~2200 SNPs separating any two MTBC genomes, and ~1000 on average differing between any two strains (e.g., Liu et al. 2018; Ruesen et al. 2018). Consistent with this, global phylogenetic levels of genome-wide nucleotide diversity have been observed to be on the order of 10⁻⁴/site (O’Neill et al. 2015, 2019). While the limits of SNP detection naturally differ by sequencing technology (see reviews of Pfeifer 2017; Meehan et al. 2019), reducing the analysis to the most common alleles between hosts (i.e., a consensus sequence), as is common practice, has limited our progress toward understanding M.TB evolution, as the full spectrum of diversity remains underexplored (Fig. 1; see discussion of Renzette et al. 2017). For example, recent studies have revealed that M.TB nucleotide variation is considerably higher within host than previously thought, reaching up to 10⁻³/site (Séraphin et al. 2019; Shockey et al. 2019; Nimmo et al. 2019), particularly if a sample is directly sequenced from the sputum rather than cultured before sequencing (Lee et al. 2020). Additionally, WGS analysis has revealed the large number of rare alleles present in a host population, with a portion being associated with drug resistance (Operario et al. 2017; O’Donnell et al. 2019). Rare alleles are most often excluded in common bioinformatic practice in M.TB, as establishing a minimum allele frequency is necessary for determining unambiguous variants (a threshold that will differ by sequencing technology and processing pipeline, as noted above), and such alleles would by definition be neglected in a consensus. These factors, combined with a lack of standardization in workflow for WGS analysis of M.TB with different coverage requirements and SNP filtering criteria, appear to have led to consistent underestimation of within- and between-host genetic variation (Meehan et al. 2019; Ley et al. 2019).
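
To make the contrast between consensus-based and within-host summaries concrete, the following is a minimal sketch in Python. The per-site read counts (`pileup`), the function names, and the frequency threshold are all hypothetical illustrations rather than part of any published M.TB pipeline. It shows how a consensus call, a minimum-allele-frequency filter, and a per-site diversity estimate each treat the same toy sample.

```python
# Toy per-site base counts from one hypothetical within-host sample.
# All values are invented for illustration only.
pileup = [
    {"A": 100},            # monomorphic site
    {"C": 95, "T": 5},     # low-frequency variant (5%)
    {"G": 99, "A": 1},     # very rare variant (1%)
    {"T": 60, "C": 40},    # intermediate-frequency variant
]

def consensus_base(counts):
    """Most common base at a site (ties broken alphabetically)."""
    return min(counts, key=lambda base: (-counts[base], base))

def minor_allele_frequency(counts):
    """Fraction of reads that do not carry the most common base."""
    depth = sum(counts.values())
    return (depth - max(counts.values())) / depth

def count_variable_sites(sites, min_freq=0.0):
    """Sites with any minor allele at or above min_freq (and above zero)."""
    return sum(
        1 for counts in sites
        if minor_allele_frequency(counts) > 0
        and minor_allele_frequency(counts) >= min_freq
    )

def site_diversity(counts):
    """Probability that two reads drawn without replacement differ."""
    depth = sum(counts.values())
    if depth < 2:
        return 0.0
    same = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same / (depth * (depth - 1))

consensus = "".join(consensus_base(counts) for counts in pileup)
mean_diversity = sum(site_diversity(c) for c in pileup) / len(pileup)

print("consensus:", consensus)
print("variable sites, no filter:", count_variable_sites(pileup))
print("variable sites, 2% filter:", count_variable_sites(pileup, 0.02))
print("mean per-site diversity:", round(mean_diversity, 4))
```

In this toy sample the consensus is a single sequence that carries no trace of the three segregating sites, and a 2% frequency filter removes one of them. Real data would additionally require a sequencing-error model, which this sketch ignores.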

Fig. 1: M.TB population diversity from a host sample.

An example of a clinical sample, demonstrating how a consensus sequence-based analysis neglects low-frequency variants (which are expected to constitute the majority of within-host variation). Each star in the clinical sample represents a unique variant. In this example, only one of the four variants would be represented in the consensus.

We next briefly summarize the primary processes shaping within-patient levels and patterns of variation that have been described to date.

Determinants of M.TB diversity

Within-host spatial structure

Sequentially sampled sputum from individual patients analyzed with WGS has revealed that the genetic separation of bacteria can be sufficiently large to resemble subpopulations (Ford et al. 2012; Liu et al. 2015), a phenomenon also observed in autopsies across different anatomical compartments from HIV patients with M.TB infection (Lieberman et al. 2016). Relatedly, infection experiments performed on nonhuman primates show limited evidence of interlesion mixing of bacterial populations, indicating that granulomas (aggregations of immune cells, a host response for isolating infection) may only rarely exchange bacteria after dissemination (Martin et al. 2017). However, infection is associated with a variety of types of pathological lesions, and the ability of each to effectively contribute to pathogen migration remains an open question (see the review of Lenaerts et al. 2015). Hence, interpretations pertaining to the entire population, based on a single sample per patient, must be made with caution, as different samples may be associated with different variants (Dheda et al. 2018; Cohen et al. 2019). Furthermore, such structuring may effectively serve as a reservoir of variation, an important consideration in designing treatment strategies (Navarro et al. 2017; Cadena et al. 2017).

Purifying selection

The selective removal of deleterious mutations (purifying selection), as well as the resulting background-selection effects (Charlesworth et al. 1993), strongly impacts M.TB populations, given both a coding-dense genome and a lack of recombination (Dos Vultos et al. 2008; Morales-Arce et al. 2020). Furthermore, as M.TB is haploid, there is an expectation that selection may generally act more efficiently, as fitness-impacting mutations may not be masked by dominant alleles (Kondrashov and Crow 1991; Otto and Gerstein 2008). However, claims have also been made of relaxed purifying selection in M.TB, given what appears to be an excess of nonsynonymous variants attributed to a lack of selective constraint (Hershberg et al. 2008; Pepperell et al. 2013; Lee et al. 2015). More recently, WGS data have allowed for the observation that rare alleles are common in patient samples (Trauner et al. 2017; Shockey et al. 2019), and that most of these rare alleles are eventually lost (Séraphin et al. 2019), as would be expected. This excess of rare variation is represented by the strongly left-skewed site-frequency spectra (SFS) generally observed in individual samples (Trauner et al. 2017), and despite the loss of diversity when calling consensus sequences, similar patterns are observed when comparing consensus sequences across patients (Chiner-Oms et al. 2019). In terms of parameter fitting, infection histories combined with a mix of both deleterious and neutrally evolving sites produce the nearest fit to the observed SFS in M.TB populations (Pepperell et al. 2013; Morales-Arce et al. 2020), consistent with the suggestion that purifying selection effects are widespread in the M.TB genome, serving to greatly reduce variation and skew the SFS (Pepperell et al. 2010; Namouchi et al. 2012; Liu et al. 2014; Minias et al. 2018). Further, when considering the relative roles of selective compared with stochastic (e.g., genetic drift) factors in dictating observed variation, it is important to appreciate that this is not simply a question of the underlying selection coefficients themselves, but also of the underlying effective population size (i.e., the relevant quantity is the product of the two), with selection acting more efficiently in large relative to small populations (Ohta 1973). Moreover, the effective population size itself is shaped by population history, progeny skew, and selective effects, resulting in heterogeneity in the efficacy of selection (see Charlesworth and Charlesworth 2010; Walsh and Lynch 2018).
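
The dependence of selection efficacy on the product of effective population size and selection coefficient can be illustrated with the standard diffusion approximation for the fixation probability of a new mutation in a haploid population. The sketch below is illustrative only. The function name, the parameter values, and the assumption of a single panmictic Wright–Fisher population are not taken from any of the cited M.TB analyses.

```python
import math

def fixation_probability(s, n_e, n_census=None):
    """Diffusion approximation for a new mutation in a haploid population.

    s        : selection coefficient (negative = deleterious)
    n_e      : effective population size
    n_census : census size setting the initial frequency 1/N
               (assumed equal to n_e when not given)
    """
    if n_census is None:
        n_census = n_e
    p0 = 1.0 / n_census
    if s == 0:
        return p0                      # neutral expectation
    exponent = -2.0 * n_e * s
    if exponent > 700:                 # avoid overflow; probability ~ 0
        return 0.0
    # (1 - exp(-2*Ne*s*p0)) / (1 - exp(-2*Ne*s)), written with expm1
    return math.expm1(exponent * p0) / math.expm1(exponent)

s = -1e-3  # hypothetical weakly deleterious mutation
for n_e in (100, 100_000):
    relative = fixation_probability(s, n_e) / (1.0 / n_e)
    print(f"Ne={n_e}: Ne*s={n_e * s:g}, fixation relative to neutral={relative:.3g}")
```

With the same selection coefficient, the mutation behaves almost neutrally when the product of the two quantities is well below one in magnitude, and is effectively never fixed when the product is large.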

Population bottlenecks

Experimental data and physiological studies in M.TB and other members of the MTBC have shown that pulmonary infection can be established by as few as 1–3 bacteria (Rich 1946; Sonin 1951; O’Grady and Riley 1963; Dean et al. 2005; see the review of Ryndak and Laal 2019). This infection bottleneck is one avenue through which genetic drift shapes M.TB evolution, the effect of which would be amplified at each transmission. M.TB populations also experience subsequent population bottlenecks after transmission, related to within-host dissemination (Martin et al. 2017) as well as drug treatment (Cohen et al. 2019). Unfortunately, with the severity of the infection bottleneck combined with a lack of recombination, distinguishing selective from demographic signals can be challenging (Thornton and Jensen 2007; Crisci et al. 2013), even under standard models. In sum, both population-size changes and purifying selection likely contribute to the observed low levels of genetic diversity, and to the excess of low-frequency variants.
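
As a rough illustration of how severe such founding events are, the sketch below computes two textbook quantities under simple random sampling of founders. These are the chance that a within-host variant is carried through a bottleneck, and the expected fraction of diversity retained. The function names and the example values are hypothetical, and the calculation ignores mutation, selection, and regrowth between transmissions.

```python
def prob_variant_transmitted(frequency, founders):
    """Chance that at least one of `founders` sampled cells carries the variant."""
    return 1.0 - (1.0 - frequency) ** founders

def diversity_retained(founders, transmissions=1):
    """Expected fraction of heterozygosity surviving repeated haploid bottlenecks."""
    return (1.0 - 1.0 / founders) ** transmissions

# A hypothetical variant at 1% frequency in the donor, 1 to 3 founding bacteria.
for k in (1, 2, 3):
    print(k, "founders:",
          "P(transmitted) =", round(prob_variant_transmitted(0.01, k), 4),
          "| diversity retained =", round(diversity_retained(k), 4))

# A chain of five transmissions, each founded by three bacteria.
print("after 5 transmissions:", round(diversity_retained(3, 5), 4))
```

Under these assumptions, a single founding bacterium removes all standing diversity, and even three founders transmit a 1% variant only about 3% of the time.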

Progeny skew and mutation rate

One underappreciated process in M.TB has been the large variance in progeny distributions, and recent multiple-merger coalescent (MMC) modeling has been proposed as the appropriate population-genetics framework to examine M.TB diversity (Morales-Arce et al. 2020; Menardo et al. 2020). Martin et al. (2017) provided experimental evidence for the appropriateness of MMC for M.TB through digital barcoding experiments, in which individual bacteria were tracked from the moment of infection in macaques. The experiments demonstrated that most diversity was produced inside the granuloma, where for each infection, a single dominant clone generated large population sizes (up to ~53,710 CFU in a 4-week period). As the common Wright–Fisher (WF) model assumes the variance in progeny number to be small, M.TB likely violates this assumption, necessitating the development of alternative inference approaches (Tellier and Lemaire 2014; Irwin et al. 2016; Sackman et al. 2019). Specifically considering MMC modeling of within-host M.TB populations sequenced from serial sputum samples, recent work has demonstrated that such progeny skew (Ψ) is likely an additional and important factor in reducing variation (Fig. 2), and further that a neglect of this parameter has resulted in a downward bias in the estimation of de novo mutation rates (Morales-Arce et al. 2020). In other words, under a null model not accounting for Ψ or background selection, lower mutation rates are inferred in order to fit the paucity of observed variation. However, once these processes were accounted for, inference suggested an underlying de novo mutation rate on the order of ~6 × 10⁻⁸ per site per replication, and a mean progeny distribution strongly differing from WF assumptions (Morales-Arce et al. 2020). Thus, once these biologically relevant, diversity-reducing processes are considered, it is necessary to invoke higher mutation rates in order to fit observed within-patient levels of variation.
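
The diversity-reducing effect of progeny skew can be illustrated with a deliberately simplified forward simulation. In this sketch, which is not the model fitted in the cited studies, a single randomly chosen individual contributes a fraction `psi` of the next generation in every generation, and the remainder is filled by ordinary Wright–Fisher sampling. Setting `psi` to zero recovers the WF case. All names and parameter values are hypothetical.

```python
import random

def next_generation(population, psi, rng):
    """One generation: a single parent supplies a fraction psi of offspring."""
    n = len(population)
    n_skew = round(psi * n)
    lucky_parent = rng.choice(population)
    offspring = [lucky_parent] * n_skew
    # Remaining offspring pick parents uniformly, as under Wright-Fisher.
    offspring += [rng.choice(population) for _ in range(n - n_skew)]
    return offspring

def surviving_lineages(n, psi, generations, rng):
    """Number of founding lineages still present after some generations."""
    population = list(range(n))  # every founder starts as its own lineage
    for _ in range(generations):
        population = next_generation(population, psi, rng)
    return len(set(population))

def mean_lineages(n, psi, generations, replicates, seed):
    """Average surviving-lineage count over independent replicates."""
    rng = random.Random(seed)
    runs = [surviving_lineages(n, psi, generations, rng) for _ in range(replicates)]
    return sum(runs) / replicates

wf_mean = mean_lineages(200, 0.0, 20, 20, seed=1)
skew_mean = mean_lineages(200, 0.5, 20, 20, seed=1)
print("mean surviving lineages, WF (psi=0):", wf_mean)
print("mean surviving lineages, psi=0.5:", skew_mean)
```

Because the skewed population loses founding lineages far faster than the WF population of the same size, a model that ignores skew would need a lower mutation rate to explain the same low level of observed variation, which is the direction of bias described above. In MMC models of the type discussed here, skewed reproduction events are typically occasional rather than occurring every generation, so this sketch exaggerates the effect.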

Fig. 2: Infection dynamics and population-genetic diversity.

(A) The diversity of M.TB within a patient is reduced relative to Wright–Fisher expectations owing to stochastic progeny skew, as shown in the inset in which a single clone in generation t_{i-1} leaves a large proportion of progeny in the following generation t_i. Each colored star represents a unique variant. (B) The transmission bottleneck associated with a new patient infection additionally acts to reduce diversity to a subset of that present in the infecting individual. Further, the cycle of clonality will continue to reduce variation and alter allele frequencies.

However, there are a number of challenges in directly relating these within-patient mutation-rate estimates with those previously made within a phylogenetic context, which are rather based on the comparison of between-patient consensus sequences. First, the latter estimates are generally given per year, whereas the population-genetic estimates are per generation. While there is some support for a generation time of 1 day (Cole et al. 1998), the uncertainty surrounding this conversion makes direct comparisons difficult. Second, the population-genetic estimates concern the total mutation rate, whereas phylogenetic estimates measure the neutral mutation rate (i.e., mutations with an appreciable probability of fixation). As such, determining the fraction of the total distribution of fitness effects that is represented by neutral mutations represents another highly tenuous conversion factor. Finally, consensus sequences by their nature neglect the vast majority of sequence variation (Fig. 1); thus, within-patient genome sequencing measures variation on a much finer scale.
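
The sensitivity of this comparison to the two conversion factors can be sketched numerically. The function below is a hypothetical illustration. It takes the per-replication estimate of ~6 × 10⁻⁸ per site quoted above, converts it to a per-year rate under an assumed generation time, and optionally scales it by an assumed neutral fraction. The generation times and neutral fractions used are placeholders, not estimates.

```python
import math

def per_year_rate(mu_per_generation, generation_time_days, neutral_fraction=1.0):
    """Convert a per-generation rate to a per-year rate.

    neutral_fraction is the assumed share of new mutations that are
    effectively neutral; 1.0 leaves the total rate unchanged.
    """
    generations_per_year = 365.0 / generation_time_days
    return mu_per_generation * generations_per_year * neutral_fraction

mu = 6e-8  # per site per replication, as quoted in the text

for days in (1.0, 2.0, 5.0):           # placeholder generation times
    for neutral in (1.0, 0.1):         # placeholder neutral fractions
        rate = per_year_rate(mu, days, neutral)
        print(f"generation time {days:g} d, neutral fraction {neutral:g}: "
              f"{rate:.3g} per site per year")
```

Even this small grid of placeholder values spans roughly a 50-fold range in the implied per-year rate, which illustrates why direct comparison with phylogenetic estimates is difficult.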

Positive selection

While the above processes are highly significant for understanding M.TB evolution, the identification of positively selected sites remains a major focus owing to the high incidence of drug resistance. As this search has been well reviewed elsewhere (see Kurz et al. 2016; Dookie et al. 2018; Singh et al. 2020), it will not be extensively rediscussed here given length limitations. Briefly, a primary objective of many WGS efforts has been the characterization of de novo mutations and genes specifically involved in mechanisms associated with specific drug treatments (Gygli et al. 2017; Dookie et al. 2018; Ghajavand et al. 2019). This framework has led to the proposal of novel multidrug combinations that may reduce the possibility of adaptive escape (Moreno-Gamez et al. 2015; Trauner et al. 2017). Furthermore, the adaptive potential of M.TB has been partially attributed to the existence of pulmonary cavities and lesions that can effectively protect M.TB populations in the presence of drug treatment, harboring potential resistance variants (Moreno-Gamez et al. 2015).

Yet, differentiating the genomic effects of positive selection from population bottlenecks can be challenging as described, and even more so with the addition of Ψ, and a neglect of an appropriate null model in such genomic scans can often result in extreme false-positive rates (i.e., misidentifying large numbers of SNPs as being positively selected; Crisci et al. 2013; Harris et al. 2018; Jensen et al. 2019). Statistical analyses designed to improve the ability to differentiate these evolutionary processes have largely been focused on the WF model and Kingman coalescent, and only recently have efforts been made to extend this focus to the type of non-WF MMC models of relevance to most human pathogens, and M.TB in particular. For example, Eldon et al. (2015) demonstrated that population-size change and Ψ may be differentiated based on expectations in the SFS; Matuszewski et al. (2018) derived analytical expectations for such SFS and demonstrated an ability to coestimate size change and Ψ in a maximum-likelihood framework, and Sackman et al. (2019) utilized an approximate Bayesian framework to additionally identify positively selected mutations, though this approach requires the presence of time-sampled patient data.

Non-patient-associated M.TB samples—the emerging role of ancient DNA

Importantly, clinical research on M.TB remains necessarily focused on resistant strains. Thus, historical and ancient samples are highly valuable for characterizing levels and patterns of variation from the pre-antibiotic era. Owing to the above-described difficulties in untangling the contributions of various evolutionary processes, ancient DNA is beginning to be utilized to provide a deeper temporal perspective as well. The first ancient genome-level data belonging to members of the MTBC were released in 2014, produced from human remains discovered in the Osmore River valley in Peru, dated to ~1000 years before present (Bos et al. 2014). These and the ancient M.TB genomes produced since (Kay et al. 2015; Sabin et al. 2020) have offered valuable evolutionary insights, for example by potentially better informing mutation rates across the MTBC, though they also present unique challenges. Specifically, pathogen genomes from archeological remains are typically lower coverage than their modern, clinical counterparts. In addition, ancient samples are vulnerable to contamination by environmental microbes, which include many benign species within the genus Mycobacterium that share substantial nucleotide identity with species in the MTBC (Warinner et al. 2017). As such, in addition to the suboptimal nature of working with consensus sequences described above, there is an issue associated with the filtering of multi-allelic sites for which there is no clearly common allele (Bos et al. 2014; Sabin et al. 2020). Furthermore, in many genetic studies of ancient microbes, heterogeneity among variants is considered to be a sign of exogenous contamination (Bos et al. 2019). While the fraction of heterozygous sites and the allele-frequency distribution may act together as a marker of quality under this rationale, this practice also inherently neglects potentially significant population-specific variation. Kay et al. (2015) implemented an alternative approach, in which variant genotypes were treated as true within-individual variation in the form of mixed-strain infections. To orient the strains within the MTBC, sequencing reads were mapped to phylogenetically relevant nodes; however, any variation unique to the samples (i.e., not represented by the existing phylogeny) was not analyzed.

To overcome these difficulties in order to leverage true within-host variation of M.TB from ancient samples, and thus better quantify the above-described variation-determining processes, variant sites must have high coverage and be rigorously authenticated. Though many studies present high-coverage pathogen genomes produced through shotgun sequencing alone, such a strategy may be cost-prohibitive, depending on the metagenomic makeup of the sequencing library. Alternatively, investigators may use a targeted enrichment approach on a pathogen-positive library to achieve sufficient coverage for confident variant analysis. Either strategy carries the risk of including exogenous contamination from closely related taxa, however, which must be detected and excluded if possible. Taxonomic binning tools can be used to assess the overall genetic diversity in the library (Warinner et al. 2017; Bos et al. 2019) and extract taxon-specific reads for downstream analysis, competitive mapping strategies can be used to eliminate reads that better align to known contaminant reference sequences (Andrades Valtueña et al. 2017), and mapping stringency can be relaxed or tightened to determine the impact of poor-quality alignments on the mean coverage of the target reference sequence (Bos et al. 2019; see Pfeifer 2017 for a general overview of these bioinformatic pipelines). Researchers could also use fragment-misincorporation plots to determine if an alignment to a target reference sequence has the damage pattern expected in ancient DNA (i.e., C>T transitions in the sequencing reads due to deamination of cytosine to uracil, which is read by the Illumina sequencer as thymine). One may also, with sufficient coverage, only utilize reads with damage for downstream analysis, clipping the damaged ends for genotyping, as has been done with some ancient human datasets (Skoglund et al. 2014; Posth et al. 2018).
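
The logic behind a fragment-misincorporation profile can be sketched with a few lines of Python. The example assumes hypothetical, already aligned (reference, read) string pairs of equal length without indels, oriented from the 5′ end of each read. Real analyses work from alignment files using dedicated damage-profiling software, and the toy sequences below are invented.

```python
def c_to_t_by_position(alignments, max_pos=5):
    """Fraction of reference C positions read as T, per 5' read position.

    alignments : iterable of (reference_segment, read) string pairs
    Returns a list of length max_pos; entries are None where no
    reference C was observed at that position.
    """
    ref_c = [0] * max_pos
    c_to_t = [0] * max_pos
    for ref, read in alignments:
        for i, (ref_base, read_base) in enumerate(zip(ref[:max_pos], read[:max_pos])):
            if ref_base == "C":
                ref_c[i] += 1
                if read_base == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else None for i in range(max_pos)]

# Invented toy alignments: two reads show a C>T change at their first base.
toy_alignments = [
    ("CACGT", "TACGT"),
    ("CCGTA", "TCGTA"),
    ("CGCAT", "CGCAT"),
    ("ACCGT", "ACCGT"),
]

profile = c_to_t_by_position(toy_alignments)
print(profile)
```

An elevated C>T fraction at the first read positions that decays toward the interior is the pattern expected for authentic ancient DNA, whereas a flat profile would be more consistent with modern contamination. A complete profile would also track the complementary G>A signal at 3′ ends, which is omitted here.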

An additional consideration for future analyses of ancient M.TB DNA is within-host population structure, as discussed above. Successful next-generation sequencing (NGS) investigations of ancient M.TB DNA have largely been restricted to a few sampling sites in the body. In skeletonized remains, which make up the majority of samples that are screened, vertebral bodies with lesions suggestive of spinal TB have been the only positive source (excluding PCR-based studies) published thus far. In mummified remains, ribs, lung tissue, abdominal tissue, or calcified nodules from the lungs have yielded positive results (Kay et al. 2015; Sabin et al. 2020). These samples represent only one compartment per individual (Fig. 3). This sheds light on another difficulty of ancient DNA studies of TB, which is that the stochasticity of preservation of DNA throughout different elements (as demonstrated thus far with human DNA; Damgaard et al. 2015), or of human remains more generally, limits our ability to gather genetic data from the whole infectious community within an individual. This unpredictability is unavoidable when working with archeological remains. However, incorporating the biological reality of structuring between tuberculous granulomas and other lesions into discussions contextualizing genetic variation in ancient specimens will provide more nuance to our understanding of M.TB evolution.

Fig. 3: M.TB structuring and preservation in archeological human remains.

The red and blue dots indicate body sites from which ancient M.TB DNA has been successfully recovered using NGS techniques. Red dots represent successful recoveries from mummified remains where soft tissue had been preserved, with positive samples taken from a rib bone (Kay et al. 2015), a calcified lung nodule (Sabin et al. 2020), soft tissue from the lungs (Kay et al. 2015), and soft tissue taken from the abdominal cavity (Kay et al. 2015). Blue dots represent successful recoveries from skeletonized remains (Bos et al. 2014). Published positive findings from skeletonized remains have been limited to vertebrae as of the writing of this paper. In the majority of cases, only one sampling site is represented per individual. The diversity discovered in an individual sample may not be representative of the total infection population across different subpopulations within a host. In addition, the stochasticity of DNA preservation in terms of ubiquitous contamination, fragmentation, and cytosine-to-uracil deamination poses barriers to accurate reconstruction of bacterial population diversity during life.


The main challenge of studying drug-resistance evolution in M.TB, and other pathogens, is to disentangle the contribution of specific population-level evolutionary processes, including positive selection and selective sweeps, purifying and background selection, population bottlenecks and structuring, and the underlying mutation rates and progeny distributions. Importantly, recent theoretical and statistical results are beginning to explore the abilities and limitations of joint parameter estimation (Eldon et al. 2015; Matuszewski et al. 2018; Sackman et al. 2019), and to construct an appropriate evolutionary null model for M.TB. Yet, these efforts in MMC models are in their infancy compared with the decades of important developments achieved under the more common Kingman coalescent, and continued theory development in this area is greatly needed (Wakeley 2013; Irwin et al. 2016). What has become clear thus far, however, is that background selection and progeny-skew effects are likely much more dominant in shaping the patterns of variation in M.TB than previously appreciated (Morales-Arce et al. 2020; Menardo et al. 2020), and ought to be included in future null modeling. Fortunately, advanced simulation tools (e.g., SLiM; Haller and Messer 2019) readily allow for the modeling of these processes, as well as for the incorporation of population-size change and positive selection. Furthermore, experimental approaches are also beginning to quantify these parameters better, with mutation-accumulation studies better informing the underlying mutation rates (Ford et al. 2011, 2013), and digital barcoding offering insights on progeny distributions (Martin et al. 2017), again highlighting the value of better integrating experimental and natural population studies (Bank et al. 2014). More mutation-accumulation studies in the genus Mycobacterium (e.g., Kucukyildirim et al. 2016) and in members of MTBC specifically could shed light on the controversies surrounding mutation-rate evolution in M.TB. Such experimental evolution approaches could also prove invaluable for better characterizing mutational interactions, the distribution of fitness effects, and progeny distributions. On the empirical side, new sequencing technologies combined with time-sampled data can detect lower-frequency genetic variation and provide improved statistical power to quantify these different evolutionary processes. Importantly, this quantification of within-patient variation will allow the field to move away from the common practice of only comparing per-individual consensus sequences (see Séraphin et al. 2019; Lee et al. 2020), an approach that neglects the vast majority of segregating variation, thus hampering scans for resistance mutations, mixed-strain infections, and transmission chains. In addition to this enhanced view of modern variation, recent studies have also provided insight into ancient within-host variation. As such, future research on the evolution of M.TB will benefit from both shallow-time serial patient sampling and the deep-time serial sampling of ancient genomes.