Eighteenth-century genomes show that mixed infections were common at time of peak tuberculosis in Europe

Tuberculosis (TB) was once a major killer in Europe, but it is unclear how the strains and patterns of infection at ‘peak TB' relate to what we see today. Here we describe 14 genome sequences of M. tuberculosis, representing 12 distinct genotypes, obtained from human remains from eighteenth-century Hungary using metagenomics. All our historic genotypes belong to M. tuberculosis Lineage 4. Bayesian phylogenetic dating, based on samples with well-documented dates, places the most recent common ancestor of this lineage in the late Roman period. We find that most bodies yielded more than one M. tuberculosis genotype and we document an intimate epidemiological link between infections in two long-dead individuals. Our results suggest that metagenomic approaches usefully inform detection and characterization of historical and contemporary infections.

T uberculosis (TB), although still a major global health problem, was once much more common in Europe than it is today-for example, when first describing the bacterial aetiology of TB in 1882, Robert Koch claimed that this infection accounted for one in seven deaths 1 . However, it remains unclear when TB reached peak prevalence in Europe and how the epidemiology of infection differed in a historical high-prevalence context from what we see today. In addition, dates of origin of key TB lineages remain contentious: for example, recent estimates of the M. tuberculosis complex, which contains human-and animalassociated lineages, vary by an order of magnitude from 70,000 to o6000 years ago 2,3 .
Here, we address these questions by analysing 14 historical genome sequences of M. tuberculosis with well-documented dates, obtained from human remains from eighteenth-century Hungary using shotgun metagenomics (direct sequencing of DNA from samples without target-specific capture or amplification) 4,5 . Our samples originate from a crypt in the Dominican church of Vác in Hungary ( Fig. 1) that was used to house the remains of affluent Catholics during the eighteenth and early nineteenth centuries. When re-discovered in 1994, it was found to contain the remains of over 200 individuals. Most of these had undergone natural mummification and for many, names and dates of death were available from written records. Previous pathological and molecular investigations showed that around half those sampled were infected with TB 6 and, in a preliminary analysis, some of us showed that genomic data could be acquired from one Vác sample 5 .
In this study, we show that all the historic genotypes from Vác belong to M. tuberculosis Lineage 4. Bayesian phylogenetic dating places the most recent common ancestor of this lineage in the late Roman period. We find that most bodies yielded more than one M. tuberculosis genotype and we document an intimate epidemiological link between infections in two long-dead individuals.

Results
Genome sequences. We extracted DNA from samples from 26 bodies from the Vác crypt with previous evidence of infection with TB (Table 1 and Supplementary Table 1). We converted the DNA into Illumina libraries, which were then sequenced alongside three blank controls. Sequencing reads were then mapped to the reference genome of M. tuberculosis strain H37Rv (Genbank accession code NC_000962.2) under conditions stringent enough to exclude spurious hits to conserved genes from related environmental organisms (o3 mismatches per 100 bases; exclusion of reads mapping to rRNA genes). In this way, we obtained draft M. tuberculosis genome sequences from eight bodies (Table 1). From five of the eight bodies we recovered multiple M. tuberculosis genome sequences ( Supplementary Figs 1 and 2), so that, in total, we acquired 14 eighteenth-century M. tuberculosis genome sequences, 4 of them at 410X coverage (B68-1, B68-2, B80, B92-1). No significant matches to M. tuberculosis were found in the negative controls. Among the historical M. tuberculosis reads, we found a bias for a purine before the start of reads, consistent with the depurination seen in aged DNA, although, as with medieval leprosy 7 , some signatures of ancient DNA damage were absent, including CT and GA base conversions at the 5 0 and 3 0 ends (Fig. 2).
Phylogenetic analyses. In all 14 historical genomes, we detected a seven base-pair deletion in the pks15/1 gene, characteristic of the Euro-American lineage of M. tuberculosis 8,9 . This lineage, also termed Lineage 4, currently accounts for over a million TB cases a year in Europe and in the Americas 10 . To determine precise relationships between historical and modern strains, we retrieved 1,582 contemporary unassembled Lineage 4 genomes drawn from four collections [11][12][13][14] and one Beijing-lineage genome as outgroup. We mapped reads from the contemporary genomes and from our four high-coverage historical genomes against the H37Rv reference genome. We then generated a phylogenetic tree from single-nucleotide variants (SNVs) found in non-repetitive regions and adapted a recently described hierarchical nomenclature 15 to define nodes and sub-clades within the tree (Fig. 3a and Supplementary . In this way, we established that our four high-coverage historical genomes belonged to phylogenetically distinct genotypes ( Fig. 3a and Table 1).  The tree was rooted using a Beijing genotype (not shown). Ten additional low-coverage historical genotypes were mapped to the tree with MGplacer (red nodes). Lineages and sub-lineages used for dating (Supplementary Table 3 Conventional phylogenetic methods that rely on identification of all trusted SNVs within a genome could not be applied to the ten low-coverage genome sequences we obtained. We thus adapted the technique of phylogenetic placement, whereby lowcoverage genomes are placed on a fixed reference tree, computed from high-coverage genomes. To do this, we reconstructed the sequences of all nodes within the Lineage 4 phylogeny and documented the SNVs that characterized each node. We then devised a new algorithm, MGplacer, capable of mapping lowcoverage genomes, including those from multiple-genotype samples, to successive nodes within the tree. This approach allowed us to perform phylogenetic placements for all ten lowcoverage historical genomes (Fig. 3). These phylogenetic analyses revealed that there were at least 12 distinct strains of M. tuberculosis circulating in eighteenth-century Hungary. This means we can rule out a clonal outbreak caused by a single particularly virulent strain as the explanation for the high prevalence of TB in this population. Furthermore, the deep nesting of our historical M. tuberculosis genotypes within contemporary sub-divisions of the Lineage 4 phylogeny confirms continuity of strain lineages over the past two centuries.
Dating. Next, we estimated divergence times for Lineage 4 and its sub-lineages. For dating, we selected the four high-coverage historical genomes, which had well-documented dates of death for tip calibration, as well as 161 modern genomes from Lineage 4 that spanned a range of isolation dates from 1992 to 2010 (Supplementary Table 2). We detected significant clock-like behaviour for the Lineage 4 tree when data from modern and historical isolates were combined (Fig. 4). We obtained consistent dates of the most recent common ancestors of Lineage 4 and of several internal nodes using two different programmes, Path-O-Gen 16 and BEAST 17 Fig. 4 and Supplementary Fig. 3). The Path-O-Gen analysis reported the date of the most recent common ancestor of Lineage 4 as 470 CE, whereas the median estimate from the best of 12 BEAST models placed the date at 396 CE, that is, within the late Roman period, with a range between models that spanned the Iron Age to the end of Antiquity (Supplementary Table 4). Our dating is consistent with evidence that a strain containing the pks15/1 deletion was present in Britain by the second-fourth centuries CE 18 .
In line with historical epidemiological records 19,20 , a Bayesian Skyline plot 21 shows that the effective population size for Lineage 4 increased continuously from its origin until the twentieth century, when it underwent a precipitous decline (Fig. 5b). Our median estimate of the mutation rate for Lineage 4 is 5.00 Â 10 À 08 substitutions per nucleotide per year (Supplementary Table 4). This mutation rate, calibrated with accurate historical dates, is similar to rates estimated from contemporary M. tuberculosis genomes 11 and from M. pinnepedii genomes that were obtained from samples radiocarbon-dated to 1028-1280 CE 3 . This mutation rate is consistent with the hypothesis that the most recent common ancestor of the M. tuberculosis complex existed o6,000 years ago 3 , but is inconsistent with the recovery of amplification products indicative of sub-lineages within the complex from Neolithic samples [22][23][24] . One potential explanation for this discrepancy is that some assumptions underlying phylogenetic dating (for example, of a stable substitution rate) may be erroneous. Alternatively, ancient DNA studies that rely on PCR amplification may have been subject to contamination. The recovery of additional M. tuberculosis genomes from well-dated historic and prehistoric samples will be needed to settle this issue, including from the Neolithic samples that have generated the sublineage-specific amplification products.
Mixed infections. Microbiological analyses of samples from contemporary TB patients usually report a single strain of M. tuberculosis per patient. By contrast, five of the eight bodies in our study yielded more than one M. tuberculosis genotyperemarkably, from one individual we obtained three distinct genotypes (Table 1). Although this predominance of mixed infections almost certainly reflects a real difference between the   epidemiology of TB today and in this historical setting, mixed infections are still seen in up to a fifth of cases in high-prevalence areas and four distinct M. tuberculosis genotypes have been reported from a single patient [25][26][27] . We thus suspect that multistrain infection was common during peak TB in Europe. However, as culture-based TB microbiology appears to be poorly suited to the detection of mixed infections 28 , the approaches we describe here might deliver improvements in diagnosis and management of contemporary infections 29 .
Within family transmission. Two of the bodies we sampled belonged to a family group: Anna Schöner (body 28) was the mother of Terézia Hausmann (body 68). Our analyses on these bodies provide the first evidence of an intimate epidemiological link between TB infections in two long-dead individuals, supporting mother-child transmission, or vice versa, or infection from a common source. More striking is that we obtained the same two M. tuberculosis genotypes, albeit in different proportions, from samples from both bodies (Table 1). It remains unclear whether this shared within-host diversity in mother and daughter stems from multiple episodes of infection or from a single transmission event of more than one strain. These findings add weight to the claim that within-host diversity poses a challenge when attempting to infer the nature and direction of disease transmission 30 . Interestingly, two samples from Terézia Hausmann's lung yielded different proportions of the two genotypes, perhaps suggesting fine-grained spatial heterogeneity in the distribution of strains ( Supplementary Fig. 2c).

Discussion
Here, we have confirmed the remarkably high prevalence of TB within an affluent, urbanized, but largely pre-industrial, Central  Table 2 summarizes the dating estimates for nodes. Short branches corresponding to four historical genotypes are labelled by name and highlighted by asterisks. Coloured boxes show broad spoligotype groupings for modern isolates, illustrating the paraphyletic nature of these groups (details in Supplementary Fig. 3). (b) Bayesian skyline plot showing changes over time in effective population size, N e (in black) since 396 CE, with 95% confidence intervals in grey.
European population. By showing that historical strains can be accurately mapped to contemporary lineages, we have ruled out, for early modern Europe, the kind of scenario recently proposed for the Americas 3 , that is, wholesale replacement of one major lineage by another (with a different host range and presumed pathogen biology) and have confirmed the genotypic continuity of an infection that has ravaged the heart of Europe since prehistoric times 31 . With TB resurgent in many parts of the world, including Hungary 32 , the struggle to control this ancient infection is far from over. Here, we have shown that metagenomic approaches can document past infections. However, we have also recently shown that metagenomics can identify and characterize pathogens in contemporary samples 29,33 , so such approaches might soon also inform current and future infectious disease diagnosis and control.

Methods
Sample collection and storage. Samples were collected in 1997 and 1999 using a Stortz endoscope and aseptic technique. Sample site, body number and name were recorded and samples were placed into numbered sterile universal bottles, which were then individually wrapped in plastic bags and stored at 4°C. Relevant biographic information (for example, age, sex, family name and relationships, date and cause of death) was retrieved from contemporary written records, including text on the coffin and information in death, baptismal and marriage registers.
Extraction of DNA with library preparation and sequencing. DNA was extracted from mummified tissue following a modification of a published protocol 34 . Mummified tissue (15-20 mg) was added to 400 ml of deproteination solution (0.5 M EDTA pH 8.0 and 20 mg ml À 1 proteinase K) in a sterile 2 ml screw-capped tube containing a minimum of ten glass beads (1-2 mm diameter). The tubes were mixed twice in a mini-bead beater at top speed for 45 s, then incubated shaking at 56°C for 48-72 h (or until the samples were fully dispersed). After deproteination, half the slurry was transferred into sterile 15 ml screwcapped tubes (N-phenacythiazolium bromide (PTB-)), with 4.5 ml lysis buffer (5 M guanidine thiocyanate, 0.1 M tris-HCl pH 6.4, 0.2 M EDTA pH 8.0 and Triton X-100). PTB (40 ml) was added to the remaining residual slurry in the 2-ml tube and incubated at 56°C for 1 h. After incubation, the residual slurry was transferred into sterile 15 ml screw-capped tubes (PTB þ ), with 4.5 ml lysis buffer. Sample tubes were incubated in a water bath at 56°C for 48-72 h, subjected to three rounds of snap freezing in liquid nitrogen, followed by thawing in a 56°C water bath.
Tubes were centrifuged at 2,500 r.p.m. (1,258g) for 15 min and the supernatants transferred to sterile 15 ml screw-capped tubes. Freshly mixed silica (20 ml) was added and the samples incubated at room temperature for 1 h on a rotator, centrifuged at 2,500 r.p.m. (1,258g) for 15 min, the silica supernatants were either stored at 4°C (PTB-samples) or discarded (PTB þ samples). The silica pellets were dislodged by vortexing, resuspended in 200 ml wash buffer (5 M guanidine thiocyanate, 0.1 M Tris-HCl, pH 6.4) and transferred to sterile 2 ml screw-capped tubes. Residual silica was washed out with another 100 ml wash buffer. Tubes were centrifuged at 14,000 r.p.m. (20,817g) for 1 min to pellet the silica and the supernatant was discarded. Silica pellets were washed by resuspending once in 200 ml wash buffer, twice in 200 ml À 20°C filter-sterilized ethanol and once in 200 ml of À 20°C acetone, centrifuging at 14,000 r.p.m. (20,817g) for 1 min and discarding washings. Tubes were drained on clean absorbent paper followed by drying at 56°C for 1 h. Dried preparations were stored at 4°C until examined.
Dried silica supernatants were rehydrated with 80 ml of filter sterilized elution buffer (EB), mixed and incubated at 60°C for 15 min. All samples were centrifuged at 14,000 r.p.m. (20,817g) for 1 min, 70 ml of supernatant was transferred to 1.5 ml sterile low binding tubes. To remove residual slurry, the supernatant was centrifuged for a further 1 min at 14,000 r.p.m. (20,817g) and 60 ml of supernatant was processed. AMPureXP beads (at room temperature) were added to each sample (36 ml) and mixed by pipetting. All samples were incubated for 10 min at room temperature, placed on the magnetic stand until the sample was clear, 94 ml of supernatant was removed and stored for further processing. The remaining beads were washed twice with 80% ethanol for at least 30 s per wash. Beads were air dried for 5 min on the magnetic stand, resuspended in 32.5 ml EB off the magnetic stand and incubated for 5 min at room temperature. After placing on the magnetic stand 30 ml of supernatant was removed and stored at -20°C in sterile 1.5 ml low binding tubes (DNA fragments 4500 bp). The stored 94 ml supernatant was taken and 200 ml AMPure XP beads added, mixed by pipetting and incubated at room temperature for 10 min. Samples were placed on the magnetic stand until clear and 290 ml of supernatant was discarded. The remaining beads were washed twice with 80% ethanol for at least 30 s per wash. Beads were air dried for 10 min on the magnetic stand, resuspended in 65 ml EB off the magnetic stand and incubated for 10 min at room temperature. After placing on the magnetic stand 62.5 ml of supernatant was transferred to sterile 1.5 ml low binding tubes (DNA fragments o500 bp). Extracted DNA (o500 bp) was quantified using HS dsDNA qubit assay (Life Technologies) as per the manufacturer's instructions (2 ml of sample was quantified).
Extracted DNA was converted into TruSeq Nano libraries for sequencing on an Illumina MiSeq according to the manufacturer's low sample protocol with a few modifications. No fragmentation step or size selection after end repair was carried out due to the nature of ancient DNA. Samples were cleaned after end repair with 200 ml sample-purification beads. dA-tailing and adapter ligation were according to the manufacturer's protocol. DNA fragments were enriched using 15 PCR cycles instead of 8. Libraries were quantified using HS dsDNA qubit assay as per the manufacturer's instructions (2 ml of sample was quantified), then stored at À 20°C until preparation for sequencing on the MiSeq.
Libraries were pooled in equimolar amounts (determined by analysis on an Agilent Bioanalyser 2100 and HS dsDNA qubit assay) and 12 pM sequenced on an Illumina MiSeq platform v2 2 Â 250 bp paired end protocol. Body 23, 25, 28 and 121 of 4 nM libraries were pooled and sequenced on an Illumina HiSeq platform (rapid run), TruSeq Rapid SBS kit-HS (200 cycle).
Preventing M. tuberculosis contamination. DNA extraction and library preparation (up until the library amplification step) were carried out in a dedicated pre-PCR laboratory in which no mycobacterial strains had been cultured, no mycobacterial DNA had been PCR-amplified and no Lineage 4 strains had been genome-sequenced. Library preparation was completed in the post-PCR laboratory. All pipettes were ultraviolet treated and all benches and equipment cleaned with hypochlorite and wiped with 80% ethanol before and after these procedures. Gloves were changed between handling different samples. Preparation of sequencing libraries for M. bovis Bacillus Calmette-Guérin (BCG) had been carried out previously in these laboratories, but no contamination of Illumina libraries with BCG sequences was seen in any of the intervening sequencing runs, nor in any of the libraries analysed here.
There are no plausible human or environmental sources for M. tuberculosis DNA in our laboratory. The presence of distinct genotypes of M. tuberculosis in each of the samples, aside from the mother/daughter pair, rules out crosscontamination between samples as the source of M. tuberculosis sequences. In the mother/daughter pair, the marked difference in the proportions of the two M. tuberculosis genotypes suggests cross-contamination is unlikely.
We also excluded reads that matched to rRNA genes. These stringent conditions rule out spurious matches to reads from environmental mycobacteria allowing selection of samples, which had appreciable coverage of TB genomes. Under these conditions, we obtained convincing even coverage for eight samples (Table 1) and uneven, low depth of coverage for the others (Supplementary  Table 1). Aligned BAM files were analysed using MapDamage2 (ref. 36) for evidence of DNA damage associated with ancient DNA, revealing little to no C to T or G to A conversion at the ends of reads (Fig. 2).
Additional single MiSeq and HiSeq runs on the same and/or samples from the eight putatively infected mummies (Supplementary Table 1) were pooled with the original reads for genomic evaluation. The final pooled reads were filtered with Bowtie2 using the relaxed parameters: --local -D 10 -R 3 -N 0 -L 20 -i S,1,0.50 to allow sensitive recovery of reads specific to M. tuberculosis. Specificity was then ensured by filtering the mapping results with the Python script post_filter.py, which retained pairs of reads where the mapped region in at least one read was Z100 bp in length and Z97.5% identical to the H37Rv reference (using a conservative 100 bp cutoff enabled improved specificity of mapping and excluded short spurious matches to sequences from related organisms). For each sample, although we also estimated the number of reads that mapped to the human genome hg19, we found relatively few reads of human origin (Supplementary Table 1).
Phylogenetic analysis of samples from eight bodies. In an initial phylogenetic analysis, we examined all reads that mapped to H37Rv and overlapped position 3,296,371 in the H37Rv genome, which marks a seven base-pair deletion characteristic of Lineage 4 (to which H37Rv belongs) 8,9 . All such reads showed the same sequence as H37Rv, so we concluded that all our historical genotypes belong to Lineage 4. For detailed phylogenetic analysis of historical genotypes, we selected unassembled draft genomes from the European Nucleotide Archive Short Read Archives drawn from four sets of genotypes. These contemporary Lineage 4 genomes were derived from four distinct geographical settings: Malawi (Short Read Archive accession code ERP001072), the UK (ERP000276), Russia (ERP000192) and the Netherlands (ERP000111) [11][12][13][14] . From these collections, we selected unassembled draft genomes belonging to various sub-lineages of Lineage 4, based on lineage-specific IS6110 insertions 29 at the following positions (relative to reference strain 9177-77): 486,196 for Harlem and X clades, 935,456 for LAM clade and 893,641 for T clade. ERR234658 from the Beijing lineage was chosen as an outgroup.
Reads from the contemporary Lineage 4 unassembled draft genomes were mapped to H37Rv using Bowtie2 with default settings 2 . SNVs were called using Samtools 37 with parameters mpileup -AB -Q 0 -I -C20 -h 50 -gf followed by BCFtools 11 with parameters view -g -cp 1 We tagged as ambiguous all sites with o20-fold coverage and where o80% of reads supported a single nucleotide. Within the entire data set, phylogenetic analyses were restricted to sites that were unambiguous in Z90% of the genomes and were outside of the repetitive regions in Supplementary Data 3. Genomes with 410% ambiguous bases were then removed.
We called SNVs for B68 (Terézia Hausmann) in two phases. In the first phase, we binned SNVs to the B68-1 and B68-2 genotypes by depth of coverage. As shown in supplementary Fig. 2A, the frequency of reads supporting each SNV fell into a bimodal distribution with minimal overlap, so that we could call informative SNVs by assigning those with r48% to B68-2 and Z52% to B68-1. After these procedures, eight uninformative heterogeneous sites remained unassigned and were discarded. In a second confirmatory phase, we used a phylogeny-aware approach (MGPlacer, see below) to analyse the B68 reads, using a pre-computed Lineage 4 phylogeny (see below), which gave identical results.
We confirmed that the two high-coverage unmixed historical genomes contained comparable numbers of informative SNVs to the contemporary genomes. When considering all informative sites (SNVs present in other genotypes), the proportion of uncovered sites in B80 was found to be only 1.2% (compared with an average of 0.5% sites for the modern genomes). This means that we can expect to be missing no more than one or two B80-specific SNVs. The B92-1 genome has an extremely high coverage (186X and 495% in the sample), so we see only 0.1% uncovered informative sites. As shown in Supplementary Fig. 2B, all previously known SNVs were supported in B92-1 by 485% of the reads. This means that the SNVs in B92-1 are as good as in any modern genome.
The two historical genomes derived from B68, together with two other historical draft genomes with 410-fold coverage (B80, B92-1) were then combined with the 1,582 modern genomes to create a set of 1,586 Lineage 4 genomes (plus Beijing ERR234658), which were subjected to phylogenetic analysis. A maximumlikelihood tree (Fig. 3a)  MGplacer. Conventional phylogenetic methods that rely on identification of all trusted SNVs within a genome cannot be applied to the low-coverage genome sequences we obtained from most samples. We thus adapted the technique of phylogenetic placement, whereby low-coverage genomes are placed on a fixed reference tree, computed from high-coverage genomes. To do this, we devised a new algorithm, MGplacer, capable of mapping low-coverage genomes, even from samples with mixed genomes. MGplacer provides improved performance when applied to samples with mixed genotypes from the same phlogeny in a monomorphic organism like M. tuberculosis. When using MGplacer, the assignment of reads to the low-coverage genotypes in mixed samples relies on mapping of each read to a chain of nodes in an existing pre-computed phylogeny derived from 41,500 strains. It does not rely on depth of coverage during this process, or does it require all regions of the genome to be recovered. Supplementary Fig. 1 provides a visual of the strength of evidence for the deduced lineage mapping using this approach. The scripts for implementing MGplacer and other scripts described here for public download at https://sourceforge.net/projects/mgplacer/files/.
Reconstructing ancestral states at all phylogenetic nodes (script MGplacer.R). Ancestral states for all nodes in the maximum-likelihood 1,586-Lineage 4 genome phylogeny (Fig. 3a) were determined by a time-reversible Markov process 39 , modified from the ACE function in the APE package in R 40 . This process uses the JC69 model with a fixed substitution rate, calculated as the number of all polymorphic sites divided by the total length of non-repetitive sites in the reference genome. The most likely ancestral state at each node was calculated as the maximum a posteriori state by the Viterbi algorithm, a dynamic programming algorithm commonly used for finding the most likely sequence of hidden states 41 .
Branch locations of mummy genotypes with low coverage (script MGplacer2.py). Reads that covered polymorphic sites in the 1,586-genome phylogeny were extracted from the output of the post_filter.py script. SNVs were classified as 'supported' when the number of supporting reads was Z1/3 of the median coverage of all polymorphic sites within that draft genome and otherwise as 'not supported '. The likelihood of the assignment of a genotype to a branch b in the phylogeny was calculated from a 2 by 2 table:

Consistent SNVs
Inconsistent SNVs where N b refers to the number of all SNVs that are consistent with a branch assignment to b, of which C b are supported. Similarly, N i is the number of all SNVs that are inconsistent with an assignment onto b, of which C i are supported. Assuming a binomial distribution of the ratio of supported/not supported SNVs, the likelihood of a genotype belonging to branch b is: The significance ( Supplementary Fig. 1) of these branch assignments was tested by the rank order of lk b in 10,000 random permutations of SNV support across all branches in the tree. To detect multiple genotypes within each sample, we used MGplacer in an iterative manner, in which iterations were continued until no further significant branch assignment were achieved. After each iteration, only inconsistent SNVs and their supporting reads were retained for the subsequent iterations. Examples of the percentage of reads assigned to the major and minor genotypes in high-coverage samples are shown in Supplementary Figs. 2A and B and examples of the numbers of reads supporting each genotype in low-coverage samples in Supplementary Fig. 1A-D. Examples of low-coverage samples with no significant reads from more than one genotype are shown in Supplementary  Fig. 1E and F.
In silico spoligotyping. SpolPred was used to calculate the spoligotype pattern 42 and TB-lineage used to predict the representative clade from this pattern 43 . The spoligotype was called as 'orphan' if the probability given by TB-lineage was less than 0.8.
Bayesian estimates of age and population fluctuation. Calculation of the root to tip distances versus dates of isolation indicated linear relationships when using only modern isolates (R 2 ¼ 0.063) or both historical and modern isolates (R 2 ¼ 0.21; Fig. 4) using Path-O-Gen (http://tree.bio.ed.ac.uk/software/pathogen/). We estimated the population history of lineage 4 with the Bayesian algorithms in Beast v1.8.0 (ref. 17). The input consisted of four historical genotypes with 410X coverage (Table 1), as well as 161 modern genomes (Supplementary Table 2), which represented the widest range of isolation dates as well as the genetic diversity in the maximum likelihood phylogeny (Supplementary Fig. 2A and Supplementary Data 2). A total of 16,449 SNVs (selected from Supplementary Data 1) in the non-repetitive core genome, supplemented by the numbers of invariant A, C, T and G nucleotides were considered in the Bayesian estimates. The dates of isolation for each strain were included in the Bayesian model as tip dates. Initial comparisons showed that the root positions of all maximum clade credibility trees differed from the root indicated by outgroup analysis in the maximum-likelihood phylogeny, which would reduce the dating accuracy of the MRCA (time to most recent common ancestor (TMRCA)). We therefore assigned lineages 4.1, 4.2 and 4.a to single monophyletic clades in order to ensure that the root was between these three lineages. A total of 12 independent Markov Chain Monte Carlo analyses were run for different combinations of clock rate and population models (Supplementary Table 4) for 100 million states, with sampling every 1,000 iterations. The initial five million samples from the beginning of each run were treated as burn-in because they had significantly lower likelihoods or priors than subsequent samples.
The Bayes factors for all 12 models were evaluated with two methods, path sampling and stepping-stone sampling, both of which are integrated in BEAST v1.8.0 and out-performed other existing methods 44 . Both methods require Markov Chain Monte Carlo sampling from a series of power posteriors. To initiate this sampling, a chain of 5M was first run as burn-in, and then 50 path steps, each of which contains 100 K burn-in and a chain length of 1M, were applied to sample the likelihood every 1,000 iterations. Of all combinations, the Bayesian model with uncorrelated lognormal distribution (UCLD) clock rates and a 30-step Bayesian Skyline population size yielded the highest Bayes factor in the path-sampling method and the second highest Bayes factor in the stepping stone method. Therefore, the samples generated from this model were applied to yield the maximum clade credibility tree and Bayesian Skyline plot in Fig. 5 and Supplementary Fig 3. Supplementary Table 3 presents the median date estimates of the best model, as well as the range of median dates from all 12 models, for the MRCA of lineage 4 and sub-lineages. Supplementary Table 4 presents the Bayes factors inferred by path sampling and stepping-stone sampling methods, as well as mutation rates and TMRCA generated from all 12 samples.