Leprosy, one of the oldest recorded diseases, remains a major public health problem. Although prevalence has been reduced extensively by WHO multidrug therapy and vaccination with BCG1, 2, the incidence of the disease remains worrying with more than 690,000 new cases reported annually3. In 1873, in the first convincing association of a microorganism with a human disease, Armauer Hansen4 discovered the leprosy bacillus in skin biopsies but failed to culture Mycobacterium leprae. A century later, the nine-banded armadillo5 was used as a surrogate host, enabling large quantities of the bacillus to be isolated for biochemical and physiological studies. Subsequent efforts to demonstrate multiplication in synthetic media have been equally fruitless, although metabolic activity can be detected6. The exceptionally slow growth of the bacillus, which has a doubling time of ~14 days (ref. 7), may contribute to these failures.
The means of transmission of leprosy is uncertain but, like tuberculosis, the infection is thought to be spread by the respiratory route because lepromatous patients harbour bacilli in their nasal passages. The bacterium accumulates principally in the extremities of the body where it resides within macrophages and infects the Schwann cells of the peripheral nervous system. Lack of myelin production by infected Schwann cells, and their destruction by host-mediated immune reactions, leads to nerve damage, sensory loss and the disfiguration that, sadly, are the hallmarks of leprosy.
Genome features and reductive evolution
Sequence analysis.
The complete genome sequence of M. leprae contains 3,268,203 base pairs (bp), and has an average G+C content of 57.8% (Table 1). These values are much lower than those reported for the M. tuberculosis genome, which comprises ~4,000 genes, 4,411,532 bp and 65.6% G+C (ref. 8). From detailed pairwise comparisons of both genome and proteome sequences8, 9, only 49.5% of the M. leprae genome contains protein-coding genes, whereas 27% contains recognizable pseudogenes (inactive reading frames with functional counterparts in the tubercle bacillus). The remaining 23.5% of the genome does not appear to be coding, and may correspond to regulatory sequences or even gene remnants mutated beyond recognition. The distribution of the 1,116 pseudogenes is essentially random (Fig. 1), and if these are excluded 1,604 potentially active genes remain, of which 1,439 are common to both pathogens. Among the 165 genes with no orthologue in M. tuberculosis are 29 for which we can attribute functions. Many of the 136 residual coding sequences in M. leprae, which show no similarity to known genes, may also represent pseudogenes as they are shorter than average (Table 1) and occur in regions of low gene density (Fig. 1).
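The proportions and gene counts quoted above are mutually consistent, as a short arithmetic check shows. The Python sketch below is purely illustrative: the variable names are our own, and every number is taken directly from the text.

```python
# Genome composition of M. leprae as quoted in the text (percent of genome).
coding_pct = 49.5
pseudogene_pct = 27.0
noncoding_pct = 23.5

# Gene counts quoted in the text.
active_genes = 1604            # potentially active genes
shared_with_mtb = 1439         # common to M. leprae and M. tuberculosis
with_assigned_function = 29    # no orthologue, but function attributable

# Derived values.
total_pct = coding_pct + pseudogene_pct + noncoding_pct
no_orthologue = active_genes - shared_with_mtb
no_known_similarity = no_orthologue - with_assigned_function

print(total_pct)            # 100.0
print(no_orthologue)        # 165
print(no_known_similarity)  # 136
```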
Figure 1: Circular genome map showing the position and orientation of known genes, pseudogenes and repetitive sequences.

From the outside: circles 1 and 2 (clockwise and anticlockwise) genes on the - and + strands, respectively; circles 3 and 4, pseudogenes; 5 and 6, M. leprae specific genes; 7, repeat sequences; 8, G+C content; 9, G/C bias (G-C)/(G+C). Genes are colour coded using the following functional categories8: lipid metabolism (dark grey); intermediary metabolism and respiration (yellow); information pathways (red); regulatory proteins (light blue); conserved hypothetical proteins (orange); proteins of unknown function (light green); insertion sequences and phage related functions (pink); stable RNAs (dark blue); cell wall and cell processes (green); PE and PPE protein families (magenta); virulence, detoxification, adaptation (brown). See http://www.sanger.ac.uk/Projects/M_leprae or http://genolist.pasteur.fr/Leproma/ for additional information about gene functions. The scale in Mb is indicated (numbers on the outside of the genome).
Reductive evolution.
Assuming that the genome of M. leprae was once topologically equivalent and similar in size to those of all other mycobacteria (~4.4 Mb)10, 11, 12, then extensive downsizing and rearrangement must have occurred during evolution. If all the genes in the ~3.3 Mb M. leprae genome were active, one would expect about 3,000 proteins as compared with the roughly 4,000 predicted in M. tuberculosis. Comparative proteome analysis detected only 391 soluble protein species13, compared with ~1,800 in M. tuberculosis14, indicating that the pseudogenes are translationally inert. Thus, since diverging from the last common mycobacterial ancestor, the leprosy bacillus may have lost more than 2,000 genes.
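The expectation of about 3,000 proteins follows from scaling the M. tuberculosis gene count by genome size. The sketch below reproduces that reasoning under one explicit assumption that is ours, not a measured result: that gene density would be the same in both genomes. The constant names are hypothetical; the numbers come from the text.

```python
# Values quoted in the text.
MLEP_BP = 3_268_203        # M. leprae genome size (bp)
MTB_BP = 4_411_532         # M. tuberculosis genome size (bp)
MTB_GENES = 4_000          # approximate M. tuberculosis gene count
MLEP_ACTIVE_GENES = 1_604  # potentially active M. leprae genes

# Assumption: equal gene density in both genomes.
expected_genes = MTB_GENES * MLEP_BP / MTB_BP

# Difference between an M. tuberculosis-like gene set and the active genes.
shortfall = MTB_GENES - MLEP_ACTIVE_GENES

print(round(expected_genes))  # 2963, i.e. about 3,000
print(shortfall)              # 2396, i.e. more than 2,000
```

The shortfall is only a rough guide to gene loss, because it treats the ancestral gene set as equivalent to that of present-day M. tuberculosis.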
Reductive evolution is documented in obligate intracellular parasites, such as Rickettsia and Chlamydia spp., and in some endosymbionts15, because genes become inactivated once their functions are no longer required in highly specialized niches. This process may have naturally defined the minimal gene set for a pathogenic mycobacterium. The most extensive genome degradation reported previously was in Rickettsia prowazekii, the typhus agent, in which only 76% of the potential coding capacity was used and 12 pseudogenes were identified16. In comparison with M. leprae, the level of gene loss detected was modest, and it is striking that elimination of pseudogenes by deletion lags far behind gene inactivation in both pathogens. Intriguingly, the G+C content of M. leprae genes (60.1%) is higher than that of the pseudogenes (56.5%), and the remainder of the genome (54.5%). The high G+C content of M. leprae, and other mycobacteria, is apparently driven by the codon preference of active genes, whereas deamination of cytosines in non-coding regions may have resulted in a more neutral G+C content.
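For readers wishing to reproduce such comparisons, G+C content is simply the percentage of G and C residues among the unambiguous bases of a sequence. The following Python function is a minimal sketch; the function name and the toy sequences are illustrative and are not drawn from the M. leprae genome.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage.

    Only A, C, G and T are counted; ambiguous bases such as N are ignored.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


# Toy sequences, for illustration only.
print(gc_content("ATGC"))   # 50.0
print(gc_content("GGCCN"))  # 100.0 (the N is ignored)
```

Applied separately to annotated genes, pseudogenes and the remaining sequence, a function of this kind would yield the three values compared in the text.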
Mosaic organization and horizontal transfer.
Although the precise mechanism behind pseudogene formation in M. leprae is unclear, the loss of dnaQ-mediated proofreading activities of DNA polymerase III (ref. 17) may have contributed. By contrast, there is extensive evidence for large-scale rearrangements and deletions arising from homologous recombination events. Comparison with M. tuberculosis delineated ~65 segments that show synteny but differ in their relative order and distribution in the M. leprae genome. Breaks in synteny generally correspond to dispersed repeats, transfer RNA genes or gene-poor regions. Copies of all three principal repeats, RLEP, REPLEP and LEPREP, occur at the junctions of discontinuity, suggesting that the mosaic arrangement of the M. leprae genome reflects multiple recombination events between related repetitive sequences. In some cases, aberrant recombination may have occurred as truncated repeats exist.
Although there is little sequence similarity indicating that they are insertion sequences, RLEP is probably capable of transposition as it exists within sequences corresponding to known genes. Unlike M. tuberculosis H37Rv, which contains at least 2 prophages and 56 intact or truncated insertion sequence (IS) elements8, 18, M. leprae has only 3 phage-like genes, all with M. tuberculosis orthologues, and 26 transposase gene fragments. We did, however, detect one possible case of horizontal transfer of genetic material when we examined the aminoacyl-tRNA synthetase genes. With one exception, all of these are more closely related to M. tuberculosis enzymes than to those of any other organism.
Unexpectedly, prolyl-tRNA synthetase, encoded by proS, is more similar to the enzymes of Borrelia burgdorferi and eukaryotes such as Drosophila, humans and yeast. It has been proposed that horizontal transfer of tRNA synthetase genes occurs frequently, and that the pathogen B. burgdorferi may have acquired proS from its host19. Comparing the genetic context provides further support for this hypothesis, as the M. leprae proS is both displaced and inverted with respect to the M. tuberculosis genome (Fig. 2), consistent with a recent acquisition.
Figure 2: Comparison of the proS loci of M. leprae and M. tuberculosis.

a, The M. leprae proS region is shown above that of M. tuberculosis. Genes or operons are depicted by arrows; crosses denote pseudogenes. Note the absence of ugpAEBC and dinF from M. leprae and the presence of proS at this site. b, Domain structures of prolyl-tRNA synthetases of bacterial (M. tuberculosis) or eukaryotic (M. leprae) types after ref. 19. Distinct subdomains are depicted as different shapes.
Multigene families.
Half of the genes (52%) present in M. tuberculosis arose from gene-duplication events leading to extensive functional redundancy9. Many of these are involved in lipid metabolism or belong to the PE and PPE families, encoding unusual glycine-rich proteins of repetitive structure and unknown function. The latter are confined to certain mycobacterial species20, and represent sources of genetic, and possibly antigenic, variation8. The corresponding 167 genes are exceptionally (G+C)-rich and occupy more than 8% of the M. tuberculosis genome21. By contrast, only 9 intact PE and PPE genes were found in M. leprae although 30 pseudogenes were present. No intact members of the PE-PGRS (with multiple Gly-Gly-Ala repeats) subfamily were found. This reduction partly contributes to the smaller genome size and the lower G+C content of M. leprae.
Some PE-PGRS proteins have been shown to be upregulated in Mycobacterium marinum during granuloma formation in frogs22. However, this effect is probably not mediated directly by the PE-PGRS, as granulomas are a prominent cytological feature of all forms of leprosy. Essentially all of the gene families9 in M. leprae have retracted and may now encode 'just enough' activity to permit intracellular growth. Selected examples are given in Table 2 whereas the comprehensive comparison presented in Fig. 3 shows shrinkage of all functional categories.
Figure 3: Distribution of genes by functional category.

The number of complete (blue) and pseudogenes (red) within each category for M. leprae is shown. Data for M. tuberculosis (green) were taken from the published genome sequence8. Functional categories: 1, small-molecule catabolism; 2, energy metabolism; 3, central intermediary metabolism; 4, amino-acid biosynthesis; 5, nucleoside and nucleotide biosynthesis and metabolism; 6, biosynthesis of cofactors, prosthetic groups and carriers; 7, lipid biosynthesis; 8, polyketide and non-ribosomal peptide synthesis; 9, proteins performing regulatory functions; 10, synthesis and modification of macromolecules; 11, degradation of macromolecules; 12, cell-envelope constituents; 13, transport/binding proteins; 14, chaperones/heat-shock proteins; 15, cell-division proteins; 16, protein and peptide secretion; 17, adaptations and atypical conditions; 18, detoxification; 19, virulence determinants; 20, IS elements and phage-derived proteins; 21, PE and PPE families; 22, antibiotic production and resistance; 23, cytochrome P450 enzymes; 24, coenzyme F420-dependent enzymes; 25, miscellaneous transferases; 26, miscellaneous phosphatases, lyases and hydrolases; 27, cyclases; 28, chelatases. Inset, y axis shows the number of genes within each functional category; x axis shows the functional categories: 29, conserved hypothetical proteins; 30, hypothetical proteins that share no significant similarity with any protein currently in the databases.
Metabolic clues
Successive generations of microbiologists have failed to grow M. leprae in axenic culture, leading to the notion that the bacterium lacks certain biosynthetic pathways. Complete genome comparisons shed new light on this. Lipid metabolism is prominent in the biochemical repertoire of M. leprae but to a lesser extent than in the tubercle bacillus, whose cell envelope has a greater diversity of lipids, glycolipids and carbohydrates23.
Envelope biogenesis.
Mycolic acids are structural components of all mycobacteria and include the alpha mycolates, which lack oxygen functions, and the oxygenated keto- and methoxy-forms. Reappraisal of mycolic-acid modification is now possible with the reduced cmaA, mmaA and umaA gene sets encoding the effector methyltransferases found in M. leprae. Mycobacterium leprae contains no methoxy-mycolates23, probably because it has lost the MmaA2 and MmaA3 enzymes that attach the methoxy group in M. tuberculosis10, 24. The mycolic acids also have cyclopropane functions25, which in M. tuberculosis are carried out by MmaA2 and CmaA1. As both the mmaA2 and cmaA1 genes have decayed in M. leprae, cyclopropanation must be encoded by one of the related umaA genes. Both umaA2 and cmaA2 have been shown to be essential for the cyclopropanation function in M. tuberculosis26. The same enzymes also catalyse cyclopropanation in M. leprae, as their duplicate copies are both inactive (Table 2).
Foremost among the outer lipids of the leprosy bacillus is phenolic glycolipid 1 (PGL1), an envelope component not found in M. tuberculosis27. PGL1 is derived from phthiocerol-dimycocerosate (PDIM), an esterified compound lipid generated by mycocerosic-acid synthase and a type I polyketide synthase (PKS), by addition of three O-methylated deoxy sugars23. However, we could not detect the genes for the glycosyltransferases, which modify PDIM to produce PGL1, despite extensive comparisons. PDIM, a virulence factor in M. tuberculosis, requires the RND protein, MmpL7, for its transport across the cytoplasmic membrane9, 28, 29. Of the 18 PKS systems identifiable in M. tuberculosis8, only 6 were predicted in M. leprae; and the number of mmpL genes (often linked to PKS genes) is only 5, as opposed to 16 in M. tuberculosis, possibly because these genes are no longer required for polyketide or lipid export. Deletion of such systems may be reflected in the lack of mycolipenic and hydroxylipenic acids—polyketides esterified to trehalose in M. tuberculosis. Further PKS genes missing from M. leprae include the mbt operon required for production of the salicylate-based mycobactin siderophores. Lipids, polyketides and aromatic compounds are often substrates for cytochrome-P450 monooxygenases30, enzymes that are exceptionally abundant in M. tuberculosis8. Astonishingly, none of these is functional in M. leprae, although a new member of the P450 family is present.
Lipolysis.
Intracellular mycobacteria probably derive much of their energy from the degradation of host-derived lipids31, a process initiated by lipases. In contrast to the 22 lip genes of M. tuberculosis, M. leprae has only 2 lipase genes, of which lipG clusters with mmaA genes and might, therefore, effect fatty-acid remodelling. This appears to leave just one lipase for scavenging fatty acids. In addition to the multifunctional FadA and FadB enzymes, which catalyse β-oxidation, M. tuberculosis has numerous alternative systems for fatty-acid degradation8. Once again, M. leprae has roughly one-third as many potential enzymes; however, there are three times more FadD acyl-CoA synthases than there are FadE acyl-CoA dehydrogenases, whereas these are predicted in equal amounts in M. tuberculosis. This may be explained by the dual role of FadD in β-oxidative and anabolic processes, whereas FadE only participates catabolically.
The acetyl-CoA produced by β-oxidation, or glycolysis, flows into the central pathways of carbon metabolism in M. leprae. However, the pattern of 'just enough' genes for each step is firmly established, so that the redundancy seen in M. tuberculosis almost never occurs. For instance, there is only one isocitrate lyase (with low predicted activity) capable of participating in the glyoxylate shunt (Table 2)32, 33, and one enzyme complex that oxidatively decarboxylates pyruvate to acetyl-CoA, compared with two such systems in M. tuberculosis. In the Krebs cycle, as in glycolysis, replicate genes for the same activity are deleted although differences in expression levels might compensate for some missing copies. Thus, although lack of pdh genes is reflected in a low rate of oxidative decarboxylation of pyruvate, isocitrate dehydrogenase activity is comparable in host-grown leprosy and tubercle bacilli34 even though a duplicate icd gene is inactivated in M. leprae.
Central and energy metabolism.
Despite an active glyoxylate cycle, there seem to be fundamental differences elsewhere in anaplerotic pathways between M. leprae and M. tuberculosis. Here, phosphoenol-pyruvate (PEP) carboxylase replaces the pyruvate carboxylase of M. tuberculosis, and the malic enzyme, which is associated with fast growth in mycobacteria35, is missing. The metabolic implications are that flux between C3 and C4 compounds and the balance between glycolysis and gluconeogenesis will be very different. Another missing link between by-products of lipid metabolism and the Krebs cycle is the production of succinyl-CoA by catabolic acetyl/propionyl CoA carboxylases predicted for M. tuberculosis8.
Other carbon sources lost to M. leprae are acetate, because ackA, pta and acs are all inactive, and galactose—the cell-wall galactan can be produced only from glucose because the galK and galT genes are missing. This might imply that M. leprae is limited to growth on a restricted, or even a specialized, combination of carbon sources, on which it can maintain balanced carbon metabolism. Although a similar range of potential substrates is available to both M. leprae and M. tuberculosis in the host, marked differences in their ability to exploit them are apparent on examination of the systems involved in carbon and nitrogen compound degradation: there are fewer oxidoreductases, oxygenases and short-chain alcohol dehydrogenases, and their probable regulatory genes (Fig. 3). The inescapable conclusion is that catabolism in M. leprae is severely limited.
In the same vein, the leprosy bacillus has lost anaerobic and microaerophilic electron transfer systems, such as formate dehydrogenase, nitrate and fumarate reductase, together with the biosynthetic and transport systems required to produce the cognate prosthetic groups. Likewise, the aerobic respiratory chain of M. leprae is truncated, as only the extreme 3' end of the NADH oxidase operon, nuoA-N, remains. The consequences of this event are far-reaching, because not only has the potential to produce ATP from the oxidation of NADH been lost, but also the regeneration of NAD+ may be limited, as M. leprae must rely heavily on ndh, which is involved only in recycling NAD+. Alternatively, M. leprae may oxidize NADH by either diverting pyruvate to acetate and CO2 using lactate dehydrogenase and lactate oxidase, or diverting PEP to malate or fumarate through oxaloacetate using its PEP carboxylase (an enzyme not found in the tubercle bacillus), which catalyses the reaction only in this direction. Given the loss of genes reviewed above, the acids produced by these two processes cannot be recycled and must be excreted.
Anabolism.
In contrast, all the anabolic pathways seem to be relatively intact. With few exceptions, complete enzyme systems are predicted for synthesis of amino acids, purines, pyrimidines, nucleosides, nucleotides, most vitamins and cofactors. This suggests that the availability of these metabolites in phagosomes is either highly limited or that they cannot be transported. It also sets the biology of the leprosy bacillus apart from that of the other obligate parasites for which genomes have been sequenced15, 16.
Mycobacterium leprae may, however, be auxotrophic for methionine as metC, encoding cystathionine β-lyase, is a pseudogene, whereas the other counterparts of M. tuberculosis met genes are all intact. This requirement for methionine may be dictated by the inactivation of the sulphate transport operon, cysTWA, implying that M. leprae may depend on an organic source of sulphur. Cobinamide auxotrophy is also predicted, as examination of the cob genes shows selective deletion of those required for its synthesis, whereas the genes needed to produce vitamin B12 from cobinamide are retained.
Pathogenesis and disease control
The ability to obtain iron is central to a successful pathogenic lifestyle. Mycobacterium leprae has many genes for haem and iron-based proteins and uses the iron regulatory systems ideR and furB. Compared with M. tuberculosis, however, M. leprae may be severely handicapped as it appears to have lost the mbt operon, which encodes the non-ribosomal peptide synthase required for production of the iron-scavenging siderophores mycobactin/exochelin8, 36, 37. Part of the iron uptake system is functional in M. leprae, however, as it transports exochelinMN from Mycobacterium neoaurum but not those of Mycobacterium smegmatis or M. tuberculosis38. The genes for exochelinMN are unknown and seem unlikely to be present in M. leprae.
As might be expected given the differences in their respective pathologies, M. leprae contains several enzymes that have no counterparts in the tubercle bacillus, including a eukaryotic-like uridine phosphorylase and adenylate cyclase. In addition, there are two transport systems that may have significant physiological roles: an ABC transporter for sugars, and a second Nramp1-like protein39 that could be involved in divalent metal ion uptake. The latter may have been acquired to maintain adequate intracellular iron concentrations despite the lack of mycobactin siderophores.
M. leprae shows a marked tropism for myelin-producing Schwann cells, and a surface-exposed laminin-binding protein (LBP) of relative molecular mass 21,000 (Mr 21K) may be an important virulence factor40, 41, 42. Inspection of the genome sequence revealed a single LBP gene, and this also occurs in M. tuberculosis. We did not detect any other candidates for virulence genes, and many of those present in M. tuberculosis have been inactivated or lost, including three of the Mce operons encoding putative invasins9, 43. Although the leprosy and tubercle bacilli both survive within macrophages, M. leprae has no catalase-peroxidase44, and fewer peroxidoxins and epoxide hydrolases to combat reactive oxygen species. It has, however, retained both superoxide dismutases.
Elimination of leprosy as a public health problem will not only require continued multidrug therapy but also improved detection of infected individuals so that treatment can be implemented earlier, thus reducing contagion and limiting neurological complications. Diagnosis is difficult in patients with few lesions but, thanks to genomics and the identification of potentially specific proteins, new avenues are now open for developing immunodiagnostic tests. Finally, development of new tuberculosis drugs and vaccines should also benefit directly from our comparative genomic analysis through definition of the core mycobacterial genes.
Methods
The whole genome sequence was obtained from a combination of sequenced cosmids45 and 54,000 end sequences (giving 7.1× coverage) from a pUC18 genomic shotgun library using dye terminator chemistry on ABI373 or ABI377 automated sequencers. The sequences of cosmids previously generated by multiplex sequencing46 were used for scaffolding purposes only. The sequence was assembled using Phrap (P. Green, unpublished), finished using GAP4 (ref. 47), and compared with sequences present in public databases using FASTA, BLASTN and BLASTX48. Potential coding sequences were predicted, and gene and protein sequences analysed as described previously8, 49, using Artemis50 to collate data and facilitate annotation. We compared the genome and proteome sequences of M. leprae and M. tuberculosis H37Rv pairwise to identify conserved genes using the Artemis Comparison Tool (K. Rutherford, unpublished; http://www.sanger.ac.uk/Software/ACT/). Pseudogenes had one or more mutations that would ablate expression and were pinpointed by direct comparison with M. tuberculosis. A relational database presenting the findings is available (http://genolist.pasteur.fr/Leproma/).
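Fold coverage relates the number of reads, their mean length and the genome size: coverage = (number of reads × mean read length) / genome size. The sketch below rearranges this relation to show the mean usable read length implied by the figures above. This value is a back-of-envelope inference on our part and is not reported in the text; the constant names are hypothetical.

```python
# Values quoted in the text.
GENOME_BP = 3_268_203  # M. leprae genome size (bp)
N_READS = 54_000       # shotgun end sequences
COVERAGE = 7.1         # fold coverage from the shotgun library

# coverage = N_READS * mean_read_length / GENOME_BP,
# solved here for the mean read length (an inferred value, not a reported one).
implied_read_length = COVERAGE * GENOME_BP / N_READS

print(round(implied_read_length))  # about 430 bp
```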

