Leprosy is a chronic human disease caused by the yet-uncultured pathogen Mycobacterium leprae. Although readily curable with multidrug therapy (MDT), over 200,000 new cases are still reported annually. Here, we obtain M. leprae genome sequences from DNA extracted directly from patients’ skin biopsies using a customized protocol. Comparative and phylogenetic analysis of 154 genomes from 25 countries provides insight into evolution and antimicrobial resistance, uncovering lineages and phylogeographic trends, with the most ancestral strains linked to the Far East. In addition to known MDT-resistance mutations, we detect other mutations associated with antibiotic resistance, and retrace a potential stepwise emergence of extensive drug resistance in the pre-MDT era. Some of the previously undescribed mutations occur in genes that are apparently subject to positive selection, and two of these (ribD, fadD9) are restricted to drug-resistant strains. Finally, nonsense mutations in the nth excision repair gene are associated with greater sequence diversity and drug resistance.
Mycobacterium leprae is the main causative agent of leprosy, a disease that affects the skin, nerves, and mucosa of the upper respiratory tract in humans1. A second, distantly related leprosy bacillus, Mycobacterium lepromatosis, was recently discovered in humans and red squirrels (Sciurus vulgaris)2. Leprosy is curable with multidrug therapy (MDT), but remains a public health problem in South America, Africa, South and Southeast Asia, and Micronesia, where over 200,000 new leprosy cases are reported each year3. MDT, comprising rifampicin, dapsone, and clofazimine, has been used intensively since the 1980s and a few second-line drugs, ofloxacin, minocycline, and clarithromycin, are sometimes employed as therapeutic agents4. The emergence of drug-resistant (DR) and multidrug-resistant (MDR) M. leprae is increasingly reported5,6,7,8,9,10,11,12. For dapsone, rifampicin and ofloxacin, the resistance mechanism has been attributed to missense mutations in the drug resistance determining regions (DRDR) of the folP1, rpoB, and gyrA genes, respectively.
M. leprae is an obligate intracellular pathogen that has never been cultured axenically but can infect wild or experimental animals. The nine-banded armadillo (Dasypus novemcinctus) or the mouse footpad (MFP) can be used to produce bacilli, but both methods are cumbersome and time-consuming13. The genome of M. leprae is the smallest among mycobacteria (3.3 Mb) with 1614 genes encoding proteins and a remarkable 1300 pseudogenes14. Such reductive evolution is a hallmark of bacteria that have changed their lifestyle from free-living to strictly host-associated15. Due to its 14-day generation time and the absence of horizontal gene transfer, the genome of M. leprae is highly conserved, with <300 single-nucleotide polymorphisms (SNPs) observed between distantly related strains, and only a few SNPs between close relatives5,16,17,18. Four SNP types (branches 1–4) and 16 SNP subtypes (A–P) were defined by surveying 78 informative SNPs and six single-base insertion/deletions (InDels)16,19. Genotyping a large panel of M. leprae strains revealed strong geographical associations and suggested possible routes of dissemination of leprosy16 whereas, a recent phylogenetic analysis of 16 whole-genome sequences of modern and ancient M. leprae strains, implicated the 3K subtype (branch 0) as the most ancestral17.
Leprosy seems to have appeared during the Iron Age (1200–600 BC) and the date of the most recent common ancestor of M. leprae was estimated to be from 2543 BC to 36 AD, based on whole-genome sequence analysis17. Similarly, the earliest accepted written record of leprosy is from 600 BC20, and the earliest osteological evidence dates from around 300 BC21,22,23,24,25. The oldest genomic evidence of leprosy is for samples from 80 to 240 AD in Central Asia26.
In this study, we develop and apply methods to isolate and purify M. leprae DNA that enable whole genome sequences to be obtained directly from human biopsy material, thus removing the necessity for passage through animals. This approach was successfully used to generate 120 new M. leprae genome sequences from drug-susceptible and DR strains from around the world, thereby enabling detailed phylogenetic and phylogeographic comparisons to be performed, new mutations associated with antimicrobial resistance to be detected, and the likely origin of leprosy to be proposed.
Isolating M. leprae DNA from human skin biopsies
Genome sequencing has become routine practice in microbiology27, especially for micro-organisms that can be readily isolated, which is not the case of the leprosy bacillus. For decades, the sole source of M. leprae DNA suitable for genomics was from bacteria isolated 12 months after infection of armadillos or mice. Recently, we have developed and optimized methods that enable M. leprae DNA to be extracted directly from fresh or formalin-fixed skin biopsies from leprosy patients28. These methods include enrichment of M. leprae DNA by array capture17 but this is less practical for large population-based investigations.
The DNA extraction method used in this study was applied directly to punch biopsies from clinically well-characterized patients of known bacillary index (BI) and exploits the fact that M. leprae resides intracellularly. Host cells are first disrupted and their DNA degraded, leaving the bacilli intact. The bacilli are then lysed and their DNA extracted and used for library preparation. This approach was applied to 106 biopsies whose BI ranged from 0 (no bacilli visible) to 6 (>1000 bacilli per microscopic field) thereby enabling a relationship between BI and sequencing efficiency to be established (Fig. 1). As expected, there was a direct correlation between genome coverage and the BI but, surprisingly, successful coverage could even be achieved with some specimens whose BI was as low as 1+.
Genome analysis of patient and animal cohort
We analyzed a total of 154 M. leprae genomes from 25 countries (Fig. 2, Supplementary Data 1), of which 120 were newly sequenced and 34 were previously published (Supplementary Data 1). The cohort comprised 147 human samples, 6 from red squirrels and 1 from an armadillo that were all naturally infected. Genome sequences were obtained directly from 109 human samples, 30 from bacilli passaged in mice, and 8 from armadillos. Thirty of these strains were from patients who had relapsed or not responded to MDT the remainder (124) were from supposedly drug-susceptible strains (87 were from confirmed primary cases, while disease history was unknown for the others).
A total of 3053 SNPs and 219 InDels (excluding tandem repeats) was found (Supplementary Data 2). The average SNP difference among the 154 genomes was 114. We found a total of 988 non-synonymous alleles (0.62 per protein-coding gene, or 0.61 per kb of protein-coding genes) and 530 synonymous SNPs (0.33 per protein-coding gene or 0.33 per kb of protein-coding genes), and 1763 mutations in intergenic regions and pseudogenes (1.07 mutations per kb of intergenic regions and pseudogenes). The SNP density for each gene is given in Supplementary Data 2. Of the 219 InDels, 58 (27%) were in protein-coding genes.
Phylogeny of M. leprae
Phylogenetic analysis using both maximum parsimony (MP) and Bayesian inference resulted in consistent tree topologies and revealed distinct lineages and sublineages of M. leprae (Fig. 3). Strains belonging to the same SNP subtypes16 clustered within single branches, with the exception of SNP subtype 3K, which is represented by a newly discovered ancestral lineage, termed here 3K-1, and the ancestral lineage referred to earlier as branch 017 and termed here 3K-0 (Fig. 3a). All strains from the two most ancestral lineages, 3K-0 and 3K-1, originated from Japan (8), China (1), Korea (1), the Marshall Islands (1), and New Caledonia (1), in agreement with earlier genotyping studies of hundreds of M. leprae strains which confirmed the predominance of the 3K genotype in East Asia, notably in Japan, China, and Korea16,29,30,31
M. leprae in East Africa showed higher diversity with subtypes 2E, 2F, and 2H representing distinct lineages (Fig. 3a). The geographic distribution of those lineages corroborates the findings of earlier studies reporting the presence of SNP type 2 in Medieval Europe, the Middle East and East Africa16. Two Ethiopian isolates, belonging to the 2F subtype, clustered closely with medieval European strains dating from the 11th to 12th century (Fig. 3), which supports the hypothesis that the ancient Greek and Roman routes32 connecting Europe, the Middle East, East Africa, and South Asia16,33,34,35 contributed to the dissemination of SNP type 2 M. leprae.
West Africa, on the other hand, harbors exclusively SNP type 4, suggesting that overland migration between East and West Africa was limited. SNP subtypes 4N, 4O, and 4P, albeit sharing the same ancestor, do not form a monophyletic clade as previously hypothesized16. Rather, the 4O and 4P subtypes cluster together in a branch distinct from 4N (Fig. 3).
Brazil, as expected, contains a great diversity of several M. leprae lineages, with the SNP type 4 and SNP subtype 3I being the most prevalent36. The 3I genotype was common in medieval Europe17,21,37,38, and is still present in red squirrels in the United Kingdom2. The modern Brazilian strain Br2016-45 branched between two medieval strains from Europe (Fig. 3), making it the most ancestral contemporary 3I strain in the Americas to date. The broad diversity of 3I genotypes from Brazil probably derives from multiple introductions from Europe. On the other hand, the strains circulating in the Southern USA and associated with zoonosis from the nine-banded armadillo, I-30, NHDP-55 and NHDP-6318, originated much more recently (Fig. 3b), in agreement with the rapid expansion and spread of the armadillo population since its introduction to this region about 150 years ago39.
Good representation of most M. leprae lineages enabled identification of lineage-specific markers. A set of 235 SNPs and 25 InDels were specific to single lineages or groups of related lineages (Supplementary Data 2), of which 73 non-synonymous SNPs and 5 InDels were within protein-coding genes. These new lineage-specific markers can be used for future genotyping schemes.
Dating analysis of M. leprae (Fig. 3b and Supplementary Figure 1) was done using BEAST v2.4.440 and the results were very similar to those obtained from a ten-fold smaller number of contemporary isolates17. The most recent common ancestor (TMRCA) of all M. leprae strains was estimated to be 3699 years old (95% Highest Posterior Density (HPD) 2731–4838 ya) and the substitution rate was 7.8 × 10−9 per site per year. Overlapping results were obtained when using different models (Supplementary Note and Supplementary Table 1), indicating that the dataset was robust and sufficiently informative.
A striking observation is the relative youth of the SNP type 1 lineage and its association with South Asia (Fig. 3b). Earlier studies revealed a predominance of SNP subtype 1D in India and Nepal, followed by 1C, 1A16, and 2E, 2G, and 2H33,34,35. SNP type 1 predominates in Thailand41, Bangladesh, Indonesia, and Philippines16. The current phylogeography of M. leprae implies that humans brought leprosy to South Asia from other parts of the continent.
Hypermutated M. leprae strains
Eight M. leprae strains (85054, S15, Amami, Zensho-4, Zensho-5, Zensho-9, Br14-3, and Br2016-15), belonging to five different SNP subtypes, had unusually long branches in the MP tree (Fig. 3a) because they contained on average 92 more SNPs than the other strains but approximately the same number of InDels. Comparative analysis revealed one unique feature linking the observed “hypermutated” strains, namely deleterious mutations in the endonuclease III gene nth (ML2301) due to frameshifts and premature stop codons (Table 1).
DR-associated SNPs were detected in the DRDR in 24 strains for folP1, 11 strains for rpoB, and 2 strains for gyrA (in bold in Table 1). Previously described mutations were identified in folP1 at codons 53 (n = 7) and 55 (n = 17), except in one isolate (Bn8-52), which had mutations at codons 55 and 145. Eleven strains had known mutations that confer rifampicin resistance in their rpoB-DRDR, while two strains (Kutatsu-6 and S15) harbor one additional mutation, and one (Br14-3) has two additional mutations in the DRDR (Table 1). One of these additional mutations (G432S) does not confer rifampicin resistance to recombinant Mycobacterium smegmatis42 whereas no information is available for the remaining three (T433I, G448D, and T508I) except that the G448A substitution does confer rifampicin resistance in M. tuberculosis43. Also, 5 of the 11 rifampicin-resistant strains had additional missense mutations in rpoB (85054, Br14-4, Zensho-4, Zensho-5, and Zensho-9) while 2 strains (ARLP_08 and S9) presented non-synonymous SNPs outside the DRDR (Fig. 4). Compensatory mutations in rpoA and rpoC, encoding the alpha and beta-prime subunits of RNA polymerase, can occur in rifampicin-resistant M. tuberculosis44. We found one non-synonymous SNP in rpoA, substitution T187P in the rifampicin-resistant strain Br14-5, and seven non-synonymous SNPs in rpoC (Supplementary Data 2), of which two occurred in the drug-resistant strains S15 (A258T) and Zensho-4 (H1133Y).
Two strains had known quinolone resistance mutations in the DRDR of gyrA and six harbored different single mutations elsewhere in the gene. Three isolates had a missense mutation in gyrB, including two within the DRDR (Table 1). Five strains harbor deleterious mutations in the ethA gene, encoding a mono-oxygenase that activates thioamide prodrugs in M. tuberculosis45,46. Interestingly, in addition to ethA and nth, three genes (fadD9, ribD, pks4) were mutated almost exclusively in MDR strains occurring 18, 19, and 4 times, respectively (Table 1).
Retracing the emergence of drug resistance in leprosy patients
Prior to the introduction of MDT in the 1980s, patients were treated with dapsone or other antimicrobials as monotherapies of varying duration7,8. Since genomics uncovered new mutations that are associated with antimicrobial resistance in other bacteria, such as those in ethA and gyrB, this prompted us to try and retrieve the clinical records of six patients whose strains displayed resistance to three or more drugs (dapsone, rifampicin, quinolones, and thioamides). Four of these extensively drug-resistant (XDR) strains were from multibacillary patients in Japan who had received a succession of monotherapies in the pre-MDT era and our genome analysis enabled the chronology of resistance emergence to be retraced. This is illustrated in Fig. 5, and sadly exemplified by the strain from patient Zensho-4 who was diagnosed in 1963 and first treated with protionamide followed by thiambutosin, both of which show cross-resistance and likely require activation by the EthA mono-oxidase that acquired the A25T missense mutation47; then treatment began with dapsone leading to emergence of the T53I mutation in folP1, followed by rifapentine that selected the S456L mutation in rpoB, and continued with ofloxacin, to which resistance arose from the A91V mutation in gyrA and D464N in gyrB. Molecular drug susceptibility testing was performed in 1998 and the patient finally cured by a regimen comprising clofazimine, minocycline, chloramphenicol and levofloxacin/sparfloxacin. The fifth XDR strain was from a newly diagnosed Brazilian case (Br2016-15) with no history of treatment for leprosy, confirming the ongoing transmission of primary antimicrobial resistance, while details of the sixth case could not be recovered.
Genes under positive selection
We also identified genes containing an unusually high number of polymorphisms, multiple alleles, and homoplasies (Supplementary Figure 2), which could be indicative of positive selection48. Strikingly, the distribution of these polymorphic sites around the genome was not random as they were often clustered, especially proximal to either side of the origin of replication (Supplementary Figure 3). Protein-changing mutations were found in 540 genes, with an average of 1.77 mutations per gene (STD 2.12). Table 2 contains a ranking of genes with at least five non-synonymous mutations or regions with one or more homoplasy (excluding VNTRs). The most polymorphic gene by far was ML041149 encoding the serine-rich antigen, a member of the immunogenic, surface-exposed PPE protein family. Two other known T-cell antigens whose genes display variability are Lsr2 and EsxA (Table 2). Other than nth, three other polymorphic genes (ML1040c, ML1750c, and ML1512c) code for proteins that appear to function in nucleic acid or cyclic nucleotide metabolism (Table 2).
Here we have optimized and applied highly sensitive procedures to extract M. leprae DNA directly from human skin biopsies that is suitable for whole-genome sequencing. The resultant genome sequences were analyzed phylogenetically and used to retrace the origin of the leprosy bacillus, and to identify polymorphisms that had been positively selected during evolution. Such polymorphisms might reflect pressure from the human immune system, from MDT or other forces.
It is striking that the ancestral lineages of M. leprae predominate in East Asia, although we should keep in mind that Central Asia has been understudied, so it would be interesting to sequence more samples spanning the East–West axis of Asia, including the Middle East, where the 3K genotype is also present16. Nevertheless, given the current data on the distribution of the 3K subtype we can deduce that the ancestor of M. leprae originated within Eurasia, probably in the Far East.
Endonuclease III (Nth) and the formamidopyrimidine and endonuclease VIII family (Fpg/Nei) of DNA glycosylases are central to the base excision repair pathway in bacteria50. Mycobacterial genomes usually contain a single nth and two fpg/nei genes but M. leprae has lost both fpg/nei orthologues and retained the nth gene. Nth, Fpg, and Nei may have overlapping functions and, in enteric bacteria, mutator phenotypes were observed when nth was inactivated in combination with the fpg and nei genes51,52,53. In M. smegmatis, deletion of nth and both the nei homologs resulted in elevated spontaneous mutation frequencies and increased sensitivity to oxidative stress54. Therefore, in the absence of Nei, inactivation of nth in M. leprae should lead to increased sequence variability, which is consistent with our results.
Strikingly, all nth mutants were also drug-resistant so Nth loss likely favors emergence of drug resistance, and nth mutations might serve as a surrogate marker for potential drug resistance and treatment failure. A link between a higher mutation rate and drug resistance was observed in strains of M. tuberculosis (which has nth and two fpg/nei genes) belonging to lineage 2, but the molecular basis for this is unknown55. For a pathogen with an extremely reduced genome such as M. leprae, a hypermutator phenotype could be detrimental and ultimately lethal.
Drug resistance is alarming for leprosy control. There is growing evidence for primary quinolone resistance in strains of M. leprae from patients who have never been treated with quinolones for leprosy but may have received this drug for other infections56. Five new GyrA mutations were identified in this study, but their effect on FQ resistance remains to be determined. Since two of them arose independently in the GyrA intein, which is removed by protein splicing, they may not impact quinolone activity (Fig. 4). Three non-synonymous mutations were found in gyrB (Table 2) and experimental evidence exists for two of them conferring quinolone resistance in in vitro assays or in M. tuberculosis57,58. To our knowledge, this is the first report of M. leprae clinical isolates harboring mutations in gyrB. Thus, despite their apparent rarity, mutations in gyrB should be systematically assessed in drug resistance screening.
A range of known and new mutations was detected in the DRDR and elsewhere in rpoB (Fig. 4; Table 1). Some of these might have a compensatory role in restoring fitness, that is known to be reduced to various degrees in rifampicin-resistant mutants of M. tuberculosis59,60. Similarly, compensatory mutations in rpoA and rpoC can occur in rifampicin-resistant M. tuberculosis44. The rpoA substitution T187P in the rifampicin-resistant M. leprae strain Br14-5 was shown to be compensatory in M. tuberculosis44. Rifampicin-resistant isolates of M. tuberculosis harbor more mutations in rpoC compared to rifampicin-susceptible isolates44,61. In our case, we observed no clear correlation between rifampicin resistance and mutations in rpoC, which occurred in two resistant and two wild-type strains.
Arguably the most intriguing finding of the present investigation was the remarkably high frequency of mutations in the fadD9 and ribD genes and in 19/23 cases these occur in strains that have at least one mutation that is associated with resistance to a leprosy drug (Fig. 4; Table 1). Functional information for fadD9 is scarce and the mutations found are predicted to either abolish protein production (8/16) or to cause detrimental amino acid changes (Fig. 4). In the case of ribD, 14 different missense mutations were found in a group of 17 variant alleles, indicating that this is likely an essential function. From studies with M. tuberculosis it is known that ribD encodes an alternative dihydrofolate reductase, with relatively low activity compared to that conferred by the bona fide dihydrofolate reductase gene, dfrA62. In clinical isolates of M. tuberculosis, a promoter mutation causes overexpression of ribD that is associated with resistance to the old drug, para-amino salicylic acid (PAS), and to certain DHFR inhibitors63. This suggests that the mutations detected in the M. leprae ribD gene may also confer resistance to PAS and support for this is provided by the fact that vadrine (2-pyridyl-(4)-1,3,4-oxydiazolone-(5)-p-aminosalicylate) was used as a drug to treat leprosy before dapsone became widely available64. It is thus possible that the ribD mutations we report here arose nearly 60 years ago following treatment with vadrine or another PAS derivative. Our discovery of these mutations and those in fadD9 should encourage further experimentation in order to establish their true role and contribution to antimicrobial resistance, especially to clofazimine.
Samples were taken from leprosy patients as punch biopsies of skin (preserved in 70% ethanol or formalin-fixed and paraffin-embedded (FFPE)), which is standard diagnostic procedure for leprosy, or from mouse foot-pads. Details about the samples used in this study are given in Supplementary Data 1 and below.
Origin of S15: Strain S15 corresponds to strain 9204165, which was isolated from a lepromatous leprosy patient originally from Martinique. The origin of S15 was erroneously attributed to New Caledonia in Monot et al.16, and the error was subsequently propagated in several publications2,17,66,67.
Origin of the case Ng14-3: We mention in the Results that West Africa harbors only SNP-type 4 strains. Strain Ng14-35 (subtype 1D) is an exception, but the patient (a native of Nigeria) developed leprosy during a prolonged stay in Libya.
LRC-1A, an unidentified strain from Japan: A sample obtained from the Leprosy Research Center (LRC) in Tokyo, initially labeled as Airaku-2, did not contain the mutations in folP1 and rpoB previously detected in Airaku-2 by PCR sequencing7. Therefore, we renamed this sample to LRC-1A (standing for SNP subtype 1A sample from LRC).
DNA extraction and library preparation
DNA was extracted from 101 human skin biopsies with known BI using a customized in-house protocol combining host tissue digestion and the QIAmp microbiome kit for host DNA depletion, strong bacterial cell lysis and silica-based purification. Punch biopsies (6 mm) in 70% ethanol were first rehydrated in Hank’s balanced solution prior to mincing with scissors. Cells were detached from the tissue by 30 min incubation at 37 °C with a mixture of 0.5 U of collagenase and dispase, followed by incubation at 56 °C with 10 mg/ml of trypsin until complete digestion. Free cells were then suspended in 1 ml of phosphate-buffered saline (PBS) and DNA was extracted using the QIAmp DNA microbiome extraction kit according to the manufacturer’s recommendations. Each run of extraction included a batch of five to nine samples and one blank control (500 μl of Hank’s balanced solution). The presence of M. leprae was assessed by PCR using RLEP primers2 prior to library preparation. Libraries were prepared from 50 µl of extracted DNA using the Kapa Hyperprep kit as described previously2,5. DNA from FFPE samples was extracted using the truXTRACTM FFPE DNA kit (Covaris) as described previously28. Libraries prepared from the extracted DNA were used directly for shotgun sequencing. M. leprae DNA extraction quality was assessed from the percentage of M. leprae DNA present in the library inferred by alignment to the reference genome sequence, with a minimum threshold set at 1%. This threshold was chosen because it yields an average genome coverage of at least 5× per sample in a multiplexed run of 10 samples on one HiSeq 2500 lane (yielding around 20 million reads per sample and 100 bases per read).
Libraries with low M. leprae content underwent enrichment using whole-genome tiling arrays as described previously17. Briefly, Illumina libraries were hybridized onto custom Agilent SureSelect Capture Arrays containing ca. one million DNA probes (60 bp) spanning the entire M. leprae genome (tiled every 4 bp), followed by elution and PCR amplification.
Sequencing was performed on Illumina Hi-Seq 2000, Hi-Seq 2500, or Mi-Seq instruments.
We took precautions in analyzing the data to avoid false-positive SNP calls. All raw reads were adapter- and quality-trimmed with Trimmomatic v0.3368. The quality settings were “SLIDINGWINDOW:5:15 MINLEN:40”. Paired-end (PE) data were additionally processed with SeqPrep (https://github.com/jstjohn/SeqPrep) to merge overlapping pairs. This increases the accuracy of sequence in the overlapping area, avoids problems in estimating coverage and creates longer reads, which facilitates InDel calling. Duplicate reads were omitted from downstream analyses. This is especially important for libraries with insufficient M. leprae DNA fragments, which is not uncommon for low BI samples or samples that are difficult to process, like FFPE samples. In these cases, library enrichment with array-capture, or very deep sequencing often produce a high number of duplicate reads (DNA fragments that were sequenced multiple times, seemingly increasing the overall genome coverage), with each read having dozens or even hundreds of copies. Such reads will amplify possible artefacts and sequence errors, resulting in false SNP calls.
Preprocessed reads were mapped onto the M. leprae TN reference genome (GenBank AL450380.1) with Bowtie2 v2.2.569. We filtered out all reads with mapping quality below 8 and omitted repetitive regions in the reference sequence. We also omitted ribosomal RNA (rRNA) genes because alignments in these regions tend to be error prone. This is because rRNA genes are highly conserved in bacteria, so sequences from other species could map to the M. leprae reference sequence. This usually happens when the content of M. leprae DNA in a sequencing library is scarce and is even more pronounced when libraries with low M. leprae content undergo array-capture. However, because lineage-specific mutations were previously observed in the M. leprae rrs gene30, we manually checked the alignments corresponding to the rRNA genes and added the curated results to Supplementary Data 2.
SNP calling was done using VarScan v2.3.970. To avoid false-positive SNP calls the following cutoffs were applied: minimum overall coverage of five non-duplicated reads, minimum of three non-duplicated reads supporting the SNP, mapping quality score >8, base quality score >15, and a SNP frequency above 80%. InDel calling was done using Platypus v0.8.171 followed by manual curation. Completed genome sequences of M. leprae Br4923 and Mycobacterium lepromatosis (GenBank JRPY00000000.1) were aligned against the M. leprae TN reference using LAST72 using the default parameters for the former and the gamma-centroid option for the latter.
A large number of missing values, especially in lineage-specific loci, points to the presence of more than one strain in a sequencing library. Although not thoroughly tested, in our opinion mixed data sets are mostly due to technical problems or contamination because in some cases we were able to identify the problematic strains. The possible presence of multiple M. leprae strains in single skin lesions was not tested in this study, but we expect it to be extremely low. Overall, a few mixed data sets were detected and some were removed from this study, except for samples that we deemed important and describe below. Nevertheless, results were not biased because loci with mixed alleles were treated as missing values.
Zensho-4 seems to contain a fraction of another strain (possibly around 40%) that is closely related to it. Only a few loci had mixed alleles, and these include the A91V substitution in gyrA (supported by 62% of reads) and the D464N substitution in gyrB (supported by 41% of reads). The latter was attributed to Zensho-4 for simplicity. Similarly, Zensho-5 seems to contain around 30% of Zensho-4. This is the main reason why we could not detect SNPs specific only to Zensho-5 (Fig. 3a), since such SNPs would be “diluted” with wild-type alleles from Zensho-4 and could not pass the SNP “purity” threshold. We included these two samples in this study because they are multi-drug-resistant and belong to the SNP-type 3K-0. Furthermore, mutations in genes conferring drug resistance from this study match with those from earlier reports of these samples, confirming their identity73.
Thai-311 contains <20% of an unidentified 3K-0 strain that belongs to the Kyoto-1/Zensho-5 cluster of strains (Fig. 3a). SNP calling was not significantly affected. Finally, sample Ye2-3 contained around 25% of an unidentified strain belonging to SNP-type 4. Because we only have few samples from Yemen, we decided to keep Ye2-3 in this study.
Phylogeny and dating analysis
Concatenated SNP alignments were used for the analyses. MP trees were constructed in MEGA674 using 500 bootstrap replicates. Sites with missing data were partially deleted (80% coverage cutoff), resulting in 3046 variable sites used for the tree calculation. The Subtree-Pruning-Regrafting algorithm was used as the MP search method. Dating analysis and discrete phylogeography were done using BEAST2 v2.4.440. Details are given in the Supplementary Note.
Sequence data are available from the NCBI Sequence Read Archive (SRA) under accession number SRP072827. Accession numbers for all samples used in this study are given in the Supplementary Data 1. Other relevant data supporting the findings of the study are available in this published article and its Supplementary Information files, or from the corresponding author upon request.
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Britton, W. J. & Lockwood, D. N. J. Leprosy. Lancet 363, 1209–1219 (2004).
Avanzi, C. et al. Red squirrels in the British Isles are infected with leprosy bacilli. Science 354, 744–747 (2016).
World Health Organization. Global leprosy update 2015: time for action, accountability and inclusion. Wkly Epidemiol. Rec. 91, 405–420 (2016).
Scollard, D. M. et al. The continuing challenges of leprosy. Clin. Microbiol. Rev. 19, 338–381 (2006).
Avanzi, C. et al. Transmission of drug-resistant leprosy in Guinea-Conakry detected using molecular epidemiological approaches. Clin. Infect. Dis. 63, 1482–1484 (2016).
Cambau, E., Perani, E., Guillemin, I., Jamet, P. & Ji, B. Multidrug-resistance to dapsone, rifampicin, and ofloxacin in Mycobacterium leprae. Lancet 349, 103–104 (1997).
Maeda, S. et al. Multidrug resistant Mycobacterium leprae from patients with leprosy. Antimicrob. Agents Chemother. 45, 3635–3639 (2001).
Matsuoka, M., Kashiwabara, Y. & Namisato, M. A. Mycobacterium leprae isolate resistant to dapsone, rifampin, ofloxacin and sparfloxacin. Int. J. Lepr. Mycobact. Dis. 68, 452–455 (2000).
Matsuoka, M., Kashiwabara, Y., Liangfen, Z., Goto, M. & Kitajima, S. A second case of multidrug-resistant Mycobacterium leprae isolated from a Japanese patient with relapsed lepromatous leprosy. Int. J. Lepr. Mycobact. Dis. 71, 240–243 (2003).
Matsuoka, M. et al. The frequency of drug resistance mutations in Mycobacterium leprae isolates in untreated and relapsed leprosy patients from Myanmar, Indonesia and the Philippines. Lepr. Rev. 78, 343–352 (2007).
da Silva Rocha, A. et al. Drug and multidrug resistance among Mycobacterium leprae isolates from Brazilian relapsed leprosy patients. J. Clin. Microbiol. 50, 1912–1917 (2012).
You, E.-Y., Kang, T. J., Kim, S.-K., Lee, S.-B. & Chae, G.-T. Mutations in genes related to drug resistance in Mycobacterium leprae isolates from leprosy patients in Korea. J. Infect. 50, 6–11 (2005).
Lahiri, R. & Adams, L. B. Cultivation and Viability Determination of Mycobacterium leprae. In International Textbook of Leprosy (eds. Scollard, D. M. & Gillis, T. P.) http://www.internationaltextbookofleprosy.org/ (2016).
Cole, S. T. et al. Massive gene decay in the leprosy bacillus. Nature 409, 1007–1011 (2001).
Moya, A., Peretó, J., Gil, R. & Latorre, A. Learning how to live together: genomic insights into prokaryote–animal symbioses. Nat. Rev. Genet. 9, 218–229 (2008).
Monot, M. et al. Comparative genomic and phylogeographic analysis of Mycobacterium leprae. Nat. Genet. 41, 1282–1289 (2009).
Schuenemann, V. J. et al. Genome-wide comparison of medieval and modern Mycobacterium leprae. Science 341, 179–183 (2013).
Truman, R. W. et al. Probable zoonotic leprosy in the southern United States. N. Engl. J. Med. 364, 1626–1633 (2011).
Monot, M. et al. On the origin of leprosy. Science 308, 1040–1042 (2005).
Dharmendra Leprosy in ancient Indian medicine. Int. J. Lepr. 15, 424–430 (1947).
Taylor, G. M. et al. Mycobacterium leprae genotype amplified from an archaeological case of lepromatous leprosy in Central Asia. J. Archaeol. Sci. 36, 2408–2414 (2009).
Donoghue, H. D. et al. Co-infection of Mycobacterium tuberculosis and Mycobacterium leprae in human archaeological samples: a possible explanation for the historical decline of leprosy. Proc. Biol. Sci. 272, 389–394 (2005).
Blau, S. & Yagodin, V. Osteoarchaeological evidence for leprosy from western Central Asia. Am. J. Phys. Anthropol. 126, 150–158 (2005).
Tayles, N. & Buckley, H. R. Leprosy and tuberculosis in Iron Age Southeast Asia? Am. J. Phys. Anthropol. 125, 239–256 (2004).
Mariotti, V., Dutour, O., Belcastro, M. G., Facchini, F. & Brasili, P. Probable early presence of leprosy in Europe in a Celtic skeleton of the 4th–3rd century BC (Casalecchio di Reno, Bologna, Italy). Int. J. Osteoarchaeol. 15, 311–325 (2005).
Donoghue, H. D. et al. A migration-driven model for the historical spread of leprosy in medieval Eastern and Central Europe. Infect. Genet. Evol. 31, 250–256 (2015).
Loman, N. J. & Pallen, M. J. Twenty years of bacterial genome sequencing. Nat. Rev. Microbiol. 13, 787–794 (2015).
Stefani, M. M. A. et al. Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases. PLoS Negl. Trop. Dis. 11, e0005598 (2017).
Weng, X. et al Molecular, ethno-spatial epidemiology of leprosy in China: Novel insights for tracing leprosy in endemic and non endemic provinces. Infect. Genet. Evol. 14, 361–368 (2013).
Yuan, Y. et al. Characterization of Mycobacterium leprae genotypes in China—identification of a new polymorphism C251T in the 16S rRNA gene. PLoS ONE 10, e0133268 (2015).
Kim, J. P. SNP genotypes of Mycobacterium leprae isolated in Korea. Korean Lepr. Bull. 45, 3–19 (2012).
McLaughlin, R. Rome and the Distant East: Trade Routes to the Ancient Lands of Arabia, India and China (Continuum, London, 2010).
Lavania, M. et al. Single nucleotide polymorphisms typing of Mycobacterium leprae reveals focal transmission of leprosy in high endemic regions of India. Clin. Microbiol. Infect. 19, 1058–1062 (2013).
Lavania, M. et al. Genotyping of Mycobacterium leprae strains from a region of high endemic leprosy prevalence in India. Infect. Genet. Evol. 36, 256–261 (2015).
Das, M., Chaitanya, V. S., Kanmani, K., Rajan, L. & Ebenezer, M. Genomic diversity in Mycobacterium leprae isolates from leprosy cases in South India. Infect. Genet. Evol. 45, 285–289 (2016).
Fontes, A. N. B. et al. Genotyping of Mycobacterium leprae present on Ziehl-Neelsen-stained microscopic slides and in skin biopsy samples from leprosy patients in different geographic regions of Brazil. Mem. Inst. Oswaldo Cruz. 107, 143–149 (2012).
Inskip, S. A. et al. Osteological, biomolecular and geochemical examination of an early Anglo-Saxon case of lepromatous leprosy. PLoS ONE 10, e0124282 (2015).
Mendum, T. A. et al. Mycobacterium leprae genomes from a British medieval leprosy hospital: towards understanding an ancient epidemic. BMC Genom. 15, 270 (2014).
Taulman, J. F. & Robbins, L. W. Recent range expansion and distributional limits of the nine-banded armadillo (Dasypus novemcinctus) in the United States. J. Biogeogr. 23, 635–648 (1996).
Bouckaert, R. et al. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 10, e1003537 (2014).
Phetsuksiri, B. et al. SNP genotypes of Mycobacterium leprae isolates in Thailand and their combination with rpoT and TTC genotyping for analysis of leprosy distribution and transmission. Jpn J. Infect. Dis. 65, 52–56 (2012).
Nakata, N., Kai, M. & Makino, M. Mutation analysis of mycobacterial rpoB genes and rifampin resistance using recombinant Mycobacterium smegmatis. Antimicrob. Agents Chemother. 56, 2008–2013 (2012).
Bahrmand, A. R., Titov, L. P., Tasbiti, A. H., Yari, S. & Graviss, E. A. High-level rifampin resistance correlates with multiple mutations in the rpoB gene of pulmonary tuberculosis isolates from the Afghanistan border of Iran. J. Clin. Microbiol. 47, 2744–2750 (2009).
Comas, I. et al. Whole-genome sequencing of rifampicin-resistant Mycobacterium tuberculosis strains identifies compensatory mutations in RNA polymerase genes. Nat. Genet. 44, 106–110 (2012).
DeBarber, A. E., Mdluli, K., Bosman, M., Bekker, L. G. & Barry, C. E. Ethionamide activation and sensitivity in multidrug-resistant Mycobacterium tuberculosis. Proc. Natl Acad. Sci. USA 97, 9677–9682 (2000).
Baulard, A. R. et al. Activation of the pro-drug ethionamide is regulated in mycobacteria. J. Biol. Chem. 275, 28326–28331 (2000).
Pattyn, S. R. & Colston, M. J. Cross-resistance amongst thiambutosine, thiacetazone, ethionamide and prothionamide with Mycobacterium leprae. Lepr. Rev. 49, 324–326 (1978).
Hedge, J. & Wilson, D. J. Practical approaches for detecting selection in microbial genomes. PLoS Comput. Biol. 12, e1004739 (2016).
Kai, M. et al. Characteristic mutations found in the ML0411 gene of Mycobacterium leprae isolated in Northeast Asian countries. Infect. Genet. Evol. 19, 200–204 (2013).
Zharkov, D. O. Base excision DNA repair. Cell. Mol. Life Sci. 65, 1544–1565 (2008).
Jiang, D., Hatahet, Z., Blaisdell, J. O., Melamede, R. J. & Wallace, S. S. Escherichia coli endonuclease VIII: cloning, sequencing, and overexpression of the nei structural gene and characterization of nei and nei nth mutants. J. Bacteriol. 179, 3773–3782 (1997).
Saito, Y. et al. Characterization of endonuclease III (nth) and endonuclease VIII (nei) mutants of Escherichia coli K-12. J. Bacteriol. 179, 3783–3785 (1997).
Suvarnapunya, A. E. & Stein, M. A. DNA base excision repair potentiates the protective effect of Salmonella Pathogenicity Island 2 within macrophages. Microbiol. Read. Engl. 151, 557–567 (2005).
Moolla, N., Goosens, V. J., Kana, B. D. & Gordhan, B. G. The contribution of Nth and Nei DNA glycosylases to mutagenesis in Mycobacterium smegmatis. DNA Repair 13, 32–41 (2014).
Ford, C. B. et al. Mycobacterium tuberculosis mutation rate estimates from different lineages predict substantial differences in the emergence of drug-resistant tuberculosis. Nat. Genet. 45, 784–790 (2013).
Linder, J. A., Huang, E. S., Steinman, M. A., Gonzales, R. & Stafford, R. S. Fluoroquinolone prescribing in the United States: 1995 to 2002. Am. J. Med. 118, 259–268 (2005).
Malik, S., Willby, M., Sikes, D., Tsodikov, O. V. & Posey, J. E. New insights into fluoroquinolone resistance in Mycobacterium tuberculosis: functional genetic analysis of gyrA and gyrB mutations. PLoS ONE 7, e39754 (2012).
Yokoyama, K. et al. Impact of amino acid substitutions in B subunit of DNA gyrase in Mycobacterium leprae on fluoroquinolone resistance. PLoS Negl. Trop. Dis. 6, e1838 (2012).
Gagneux, S. et al. The competitive cost of antibiotic resistance in Mycobacterium tuberculosis. Science 312, 1944–1946 (2006).
Mariam, D. H., Mengistu, Y., Hoffner, S. E. & Andersson, D. I. Effect of rpoB mutations conferring rifampin resistance on fitness of Mycobacterium tuberculosis. Antimicrob. Agents Chemother. 48, 1289–1294 (2004).
de Vos, M. et al. Putative compensatory mutations in the rpoC gene of rifampin-resistant Mycobacterium tuberculosis are associated with ongoing transmission. Antimicrob. Agents Chemother. 57, 827–832 (2013).
Cheng, Y.-S. & Sacchettini, J. C. Structural insights into Mycobacterium tuberculosis Rv2671 protein as a dihydrofolate reductase functional analogue contributing to para-aminosalicylic acid resistance. Biochemistry 55, 1107–1119 (2016).
Zheng, J. et al. Para-aminosalicylic acid is a prodrug targeting dihydrofolate reductase in Mycobacterium tuberculosis. J. Biol. Chem. 288, 23447–23456 (2013).
Jopling, W. H. & Ridley, D. S. Vadrine (S. 131) in the treatment of lepromatous leprosy: a preliminary report. Lepr. Rev. 29, 143–147 (1958).
Honoré, N., Perrani, E., Telenti, A., Grosset, J. & Cole, S. T. A simple and rapid technique for the detection of rifampin resistance in Mycobacterium leprae. Int. J. Lepr. Mycobact. Dis. 61, 600–604 (1993).
Singh, P. et al. Insight into the evolution and origin of leprosy bacilli from the genome sequence of Mycobacterium lepromatosis. Proc. Natl Acad. Sci. USA 112, 4459–4464 (2015).
Singh, P. et al. Genome-wide re-sequencing of multidrug-resistant Mycobacterium leprae Airaku-3. Clin. Microbiol. Infect. 20, O619–O622 (2014).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Koboldt, D. C. et al. VarScan 2: Somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22, 568–576 (2012).
Rimmer, A. et al. Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications. Nat. Genet. 46, 912–918 (2014).
Kiełbasa, S. M., Wan, R., Sato, K., Horton, P. & Frith, M. C. Adaptive seeds tame genomic sequence comparison. Genome Res. 21, 487–493 (2011).
Matsuoka, M. The history and characteristics of isolates maintained at the leprosy research center. Nihon Hansen. Gakkai Zasshi 79, 247–256 (2010).
Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: molecular evolutionary genetics analysis version 6.0. Mol. Biol. Evol. 30, 2725–2729 (2013).
Choi, Y. & Chan, A. P. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 31, 2745–2747 (2015).
We thank the Genomic Technologies Facility at the University of Lausanne for Illumina sequencing and technical support and all the patients and clinical staff who participated in the study. This work was supported by the Fondation Raoul Follereau, the Swiss National Science Foundation grant IZRJZ3_164174, the Swiss Cooperation and Development Center (CODEV), the Heiser Program of the New York Community Trust for Research in Leprosy (grant numbers P15-000827 and P16-000976), and grants CNPq 428964/2016-8 and CAPES PROAMAZONIA 3288/2013.
About this article
Scientific Reports (2019)