Leptospirosis is a widely spread disease of global concern. Infection causes flu-like episodes with frequent severe renal and hepatic damage, such as haemorrhage and jaundice. In more severe cases, massive pulmonary haemorrhages, including fatal sudden haemoptysis, can occur1. Here we report the complete genomic sequence of a representative virulent serovar type strain (Lai)2 of Leptospira interrogans serogroup Icterohaemorrhagiae consisting of a 4.33-megabase large chromosome and a 359-kilobase small chromosome, with a total of 4,768 predicted genes. In terms of the genetic determinants of physiological characteristics, the facultatively parasitic L. interrogans differs extensively from two other strictly parasitic pathogenic spirochaetes, Treponema pallidum3 and Borrelia burgdorferi4, although similarities exist in the genes that govern their unique morphological features. A comprehensive analysis of the L. interrogans genes for chemotaxis/motility and lipopolysaccharide synthesis provides a basis for in-depth studies of virulence and pathogenesis. The discovery of a series of genes possibly related to adhesion, invasion and the haematological changes that characterize leptospirosis has provided clues about how an environmental organism might evolve into an important human pathogen.
Spirochaetes, morphologically unique in their coiled, slender and flexuous shape and related form of motility, form a major phylogenetic lineage (phylum) of eubacteria. Leptospira, an obligately aerobic, tightly coiled spirochaete, is the only genus other than Borrelia, Treponema and Brachyspira that is able to cause significant infection in mammals. The leptospires are physiologically chemoheterotrophic. They include the saprophytic L. biflexa and the pathogenic L. interrogans. The latter is known worldwide to be responsible for the water-borne zoonosis leptospirosis. Although antibiotic therapy is effective against the disease, it remains a serious threat in tropical and subtropical countries as well as in those cities where sanitation is substandard and where wild rats can serve as reservoirs when sewage disposal is poor1.
Molecular and cellular studies on leptospires5 have focused on their dynamics of motility, biosynthesis of amino acids and lipopolysaccharide (LPS), outer-membrane proteins and other potential virulence factors. In contrast to L. biflexa, little in the way of genetic analysis has been reported for L. interrogans, owing to their fastidious cultivation requirements and the lack of genetic systems5. Previously, the genomic sequences of two pathogenic spirochaetes—T. pallidum, responsible for syphilis3, and B. burgdorferi, responsible for Lyme disease4—have been determined. We employed the whole-genome random sequencing method3,4,6 to sequence and analyse the genomic DNA of a representative virulent serovar type strain (Lai)2 of L. interrogans serogroup Icterohaemorrhagiae (see Methods).
The L. interrogans genome (4,691,184 base pairs (bp); Fig. 1, Table 1) is much larger than either of the other two spirochaetes (1,138,006 bp for T. pallidum and 1,519,857 bp for B. burgdorferi, including plasmids). It consists of two circular chromosomes, a large one of 4,332,241 bp (CI) and a small one of 358,943 bp (CII), in good agreement with previous estimates5. More than 30 copies of repetitive DNA elements, including members of the IS1500 and IS1501 families, were distributed throughout the genome but few phage-related sequences were identified.
Both GC nucleotide skew ((G - C)/(G + C)) analysis and comparisons with the ori sequences of other bacteria were employed to locate the replication origin of CI, whereas only GC nucleotide skew analysis was used to identify a putative replication origin on CII, as with Vibrio cholerae6. DnaA boxes were identified on the anticlockwise side of oris for both CI and CII. In addition, parAB operons were identified on each side of the putative replication origins of both chromosomes (Supplementary Information 1).
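The skew statistic named above can be illustrated with a short sketch. This is not the authors' pipeline: it computes (G - C)/(G + C) in fixed windows and reports the point at which the cumulative skew is lowest, a commonly used heuristic for a candidate origin. The function names, the window size and the toy sequence are assumptions made for illustration only.

```python
from itertools import accumulate


def gc_skew(seq, window=1000):
    """Return (G - C)/(G + C) for consecutive non-overlapping windows."""
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C contribute a skew of zero.
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def candidate_origin(seq, window=1000):
    """Hypothetical helper: end coordinate of the window at which the
    cumulative skew reaches its minimum (skew turns positive after it)."""
    cumulative = list(accumulate(gc_skew(seq, window)))
    lowest = cumulative.index(min(cumulative))
    return (lowest + 1) * window


# Toy sequence: a C-rich half followed by a G-rich half.
toy = "ACCGT" * 600 + "ACGGT" * 600
print(candidate_origin(toy, window=1000))
```

On a real chromosome the window size would be far larger, and a skew-based candidate would still need support from other evidence, such as the DnaA boxes and ori comparisons described above.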
In all, 4,768 putative genes were predicted, among them 37 genes for transfer RNAs (Supplementary Information 2-1). Previous reports (ref. 5) indicated that in strains Ictero No. 1, Verdun and RZ11 of L. interrogans, there were two sets each of genes encoding 16S ribosomal RNA (rrs) and 23S rRNA (rrl). However, besides the two rrs genes, we identified only one gene each encoding 5S rRNA (rrf) and 23S rRNA (rrl). The extraordinarily low number of tRNA and rRNA genes might well account for the fastidious growth of L. interrogans.
Among the 4,727 protein-coding sequences (CDSs), 4,360 lie on CI and 367 lie on CII, whereas all of the rRNA and tRNA genes were found on CI (Table 1). Although most of the genes required for growth and viability are located on CI, some essential genes lie on CII. Besides the previously recognized metF (ref. 7; LB002) and asd (ref. 5; LB355), CII carries an ndh gene (LB036), encoding NADH dehydrogenase, and clusters of genes involved in a nearly complete pathway for the de novo biosynthesis of haem. These data, therefore, tend to support the view that CII is an authentic part of the genome that did not originate by lateral transfer.
On the basis of amino acid sequence similarity searches and/or domain analysis, biological functions have been assigned to about 44% of the CDSs (2,060), whereas 15% of the CDSs (715) either encode proteins of unknown function or are similar to unassigned CDSs predicted in other organisms. A total of 1,952 predicted CDSs (41%) failed to exhibit obvious similarity to any protein-coding genes of other organisms (Table 1). In particular, only 315 orthologues were shared by L. interrogans, T. pallidum and B. burgdorferi (Supplementary Information 3).
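The gene counts quoted in the preceding paragraphs can be cross-checked with simple arithmetic. The sketch below uses only figures stated in the text. The rRNA count of four (two rrs, one rrl, one rrf) is our reading of the description above, and the dictionary keys are informal labels rather than terms from Table 1.

```python
# CDS counts per chromosome, as stated in the text.
cds_by_chromosome = {"CI": 4360, "CII": 367}
trna_genes = 37
rrna_genes = 2 + 1 + 1  # assumed reading: two rrs, one rrl, one rrf

# Functional categories of the CDSs (informal labels).
cds_by_category = {
    "function assigned": 2060,
    "unknown function or conserved hypothetical": 715,
    "no similarity to other organisms": 1952,
}

total_cds = sum(cds_by_chromosome.values())
total_genes = total_cds + trna_genes + rrna_genes

# Percentage of all CDSs in each category, rounded to whole numbers.
shares = {name: round(100 * n / total_cds) for name, n in cds_by_category.items()}

print(total_cds, total_genes)
print(shares)
```

Both partitions of the CDSs (by chromosome and by category) sum to the same total, and adding the RNA genes reproduces the overall gene count given in the text.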
Some of the previously identified metabolic characteristics of leptospires, such as the absence of hexokinase (ref. 1), were confirmed by genomic analysis. Complete sets of genes for long-chain fatty-acid utilization, a tricarboxylic acid cycle and a respiratory electron transport chain were identified in L. interrogans, consistent with the notion that the organism generates ATP by oxidative phosphorylation (Fig. 2). In contrast, none of the aforementioned genes is present in T. pallidum or B. burgdorferi, in which ATP can be generated only by sugar fermentation by means of the Embden–Meyerhof pathway (ref. 5). Because L. interrogans cannot utilize sugars as carbon sources, anaplerotic reactions are essential for gluconeogenesis. We failed to identify a gene encoding glucose-6-phosphate dehydrogenase, one of the key enzymes of the phosphogluconate pathway. Neither of the two key enzymes of the glyoxylate pathway, isocitrate lyase and malate synthase, was found, although both enzymes have been detected in L. biflexa (ref. 1). However, we did identify all the genes encoding enzymes for gluconeogenesis from glycerol (Fig. 2), including phosphoglucose isomerase, as previously reported (ref. 1). In addition, genes encoding enzymes likely to be involved in the oxidative carboxylation of acetyl-CoA to succinyl-CoA through the 3-hydroxypropionate pathway (ref. 8) were recognized (Fig. 2). Intermediates of carbohydrate metabolism are therefore likely to be synthesized by means of the tricarboxylic acid cycle and the non-oxidative pentose phosphate pathway (Fig. 2). Genes encoding transhydrogenase (pntA and pntB) were identified. These enzymes could catalyse the formation of sufficient NADPH for anabolic processes at the cost of protonmotive force generated by an NADH dehydrogenase complex (Fig. 2).
In this connection, one should emphasize that glycerol, together with the long-chain fatty acids, is present in EMJH medium (Johnson and Harris modification of the Ellinghausen and McCullough medium)1 for better growth of L. interrogans.
In contrast to B. burgdorferi and T. pallidum, L. interrogans encodes complete metabolic systems for amino acid and nucleotide biosynthesis, which is in agreement with previous work1. Methionine biosynthesis in leptospires is similar to that in yeast1, whereas the final step seems to be catalysed by a B12-dependent homocysteine-N5-methyltetrahydrofolate transmethylase, encoded by metH, rather than by a cobalamin-independent methionine synthase encoded by metE (Fig. 2). In this connection, the absence of several genes of B12 biosynthesis from the L. interrogans genome accounts for the fact that this compound is an essential component of the EMJH semi-synthetic medium1. It was proposed that a pyruvate pathway might be used by leptospires for isoleucine biosynthesis, either alone or together with the conventional threonine deaminase pathway1. Because we failed to identify a gene encoding threonine deaminase but did find three putative leuA genes, we experimentally determined the substrate specificity of these enzymes (see Methods). The enzyme encoded by LA2202 is an isopropylmalate synthase (leuA1), whereas LA2350 encodes citramalate synthase (cimA). Although the enzyme encoded by LA0469 has some citramalate synthase activity, it is primarily an isopropylmalate synthase (leuA2).
The genomic information enhances our understanding of the mechanisms of virulence and pathogenesis in leptospirosis. As with most other pathogenic bacteria, L. interrogans possesses several genes related to the attachment and invasion of eukaryotic cells (mce, invA, atsE and mviN; Supplementary Information 2-2). The unique cellular shape and motility apparatus of spirochaetes provide these organisms with an additional method of achieving effective infection5,9. We found at least 50 genes (not including chemotaxis genes) related to motility, accounting for more than 1% of the deduced CDSs (Fig. 2). Like B. burgdorferi and T. pallidum, L. interrogans uses FlaA sheath protein and FlaB core protein as the essential components of its endoflagellar filament5. Other bacteria5,10 employ FliC for this purpose. L. interrogans also has a complete set of genes (Supplementary Information 2-2) for shape determination. In contrast to B. burgdorferi11, the finely coiled spiral shape of leptospires is likely to be mainly attributable to the murein layer rather than the flagella12.
Chemotaxis is generally acknowledged to be an important virulence factor for pathogenic bacteria. The chemotaxis system of L. interrogans (Fig. 2) is more complex than that of either T. pallidum or B. burgdorferi. The large number of genes (12 CDSs) encoding methyl-accepting chemotaxis proteins (MCPs) presumably reflects the extremely diverse environmental situations that a facultatively parasitic zoonotic bacterium can encounter. On the basis of secondary-structure prediction, 5 of the 15 CDSs with clear CheY-like response domains were designated cheY genes (Supplementary Information 4-1). However, only one such gene was located in a putative chemotaxis operon (cheWABY, LA1250-1253).
Leptospirosis virulence has been attributed in part to the effect of the leptospiral LPS1. The nucleotide sequence of the locus encoding a set of enzymes for the biosynthesis of the O-antigen component of Leptospira LPS (rfb locus) is known for four serovars of two species5. We identified an rfb locus of 103 kilobases (kb) (Supplementary Information 5) in L. interrogans serovar lai. In agreement with findings in other rfb loci of leptospires, almost all of the 97 CDSs (LA1576 to LA1672), except three short ones, are encoded on the same strand (forward). About 30 kb of nucleotide sequence located at the 3′-proximal end of the locus is almost identical to its counterpart in serovar copenhageni (GB: U61226). Unlike L. borgpetersenii serovar hardjo, subtype hardjobovis, no IS elements were found within or flanking the rfb locus. We tentatively assigned a series of genes encoding O-antigen-processing enzymes within and outside the rfb locus by comparisons of predicted transmembrane patterns with genes characterized in other Gram-negative bacteria (Supplementary Information 4-2). This is a strong indication that the biosynthesis of LPS in L. interrogans proceeds through the Rfc (Wzy)-dependent pathway.
In contrast to T. pallidum and B. burgdorferi5, genes encoding enzymes involved in the biosynthesis of the Lipid A backbone and its KDO (2-keto-3-deoxyoctonoic acid) core (Supplementary Information 6) are present in L. interrogans. The LPS of L. interrogans is a structurally unique molecule of relatively low toxicity5 that activates macrophages in a distinct manner13. These characteristics can be rationalized on the basis of structural comparisons between LpxA proteins of different bacterial origins (Supplementary Information 4-3).
Although it is not clear whether the extensively studied sphingomyelin-specific phospholipases have significant roles in the pathogenesis of leptospirosis1, we identified four genes encoding other kinds of haemolysin in addition to five genes coding for sphingomyelinase-like proteins (Supplementary Information 7). All these proteins have been expressed in Escherichia coli, and their haemolytic activities have been demonstrated (Y.X.-Z. and G.-P.Z., unpublished results).
The genome of L. interrogans encodes several proteins bearing homology to animal proteins important in haemostasis (Supplementary Information 8). These include a protein that resembles the mammalian platelet-activating factor (PAF) acetylhydrolase (ref. 14; LA2144, pafAH) and another that is similar to von Willebrand factor type A domains (ref. 15; LB054 and LB055, vwa). No bacterial genomes have hitherto been shown to encode both of these proteins, although they have been separately identified in several bacterial species (Supplementary Information 8). A third gene relevant to haemostasis, so far found only in Leptospira, seems to specify an orthologue of paraoxonase (LA0399, pon). This protein might hydrolyse PAF through its arylesterase activity (ref. 16). Because a colA gene (ref. 17; LA0872) encoding microbial collagenase has been identified, it is reasonable to propose that collagenase-mediated injury to the vascular epithelium during infection and the subsequent combined effects of the Vwa, PafAH and Pon proteins could lead to a loss of haemostasis, in addition to the proposed effects of LPS (refs 1, 13). This model is consistent with the clinical manifestations of leptospirosis, namely damage to the endothelial cell membranes of small blood vessels (ref. 1). It also might explain the observed sequelae of severe infections by serovar lai, such as massive pulmonary haemorrhage and fatal sudden haemoptysis (ref. 1).
Among eubacteria, spirochaetes are evolutionarily primitive9,18. However, the fact that leptospires can survive either as saprophytes or as facultative parasites has presumably afforded them significant growth opportunities, although not without pressure for co-evolution in response to their environment or hosts. A BLAST analysis was performed to compare the best-hit distribution of protein homologues in representative eubacteria with the predicted proteomes of bacteria, virus (phage), archaea and eukarya. The result (Fig. 3) suggests that the genome of L. interrogans surpasses those of other bacteria in terms of the number of proteins with structural similarity to eukaryal and archaeal proteins that it encodes. In this respect, L. interrogans resembles B. burgdorferi and Mycoplasma genitalium. This raises several important evolutionary questions, including the possibility that lateral gene transfer, operating in parallel with standard gene evolution events, contributed to the emergence of an important human pathogen from an environmental bacterium.
Source and culturing of study organism
The Leptospira interrogans serogroup Icterohaemorrhagiae serovar lai type strain 56601 used in this study is maintained by the National Institute for Communicable Disease Control and Prevention, Chinese Center for Disease Control and Prevention (ICDC, China CDC), Beijing, China (ref. 2). For sequencing purposes, a single colony was picked from EMJH (ref. 1) soft agar and cultured in the same medium. The culture thus obtained was then subjected to morphological, serological, genetic and virulence analysis. The properties of the strain were in accordance with those of pathogenic Leptospira. For functional analysis, growth curves for L. interrogans in EMJH or Korthof (ref. 1) medium were measured turbidimetrically, and viable bacterial counts were determined by dark-field microscopy. Culture conditions were then developed to ensure that only mid-exponential-phase bacterial cultures were used for further experimentation.
Genome sequencing and analysis
The genome of strain Lai of L. interrogans was sequenced by a whole-genome random sequencing method previously applied to other microbial genomes (refs 3, 4, 6). Three different libraries were used in this project. The first two, in pUC18, had inserts of either 1.5–3 kb or 8–10 kb. The third was a 40-kb cosmid library. Altogether, 111,402 sequence reads (Phred value >Q20 (refs 19, 20)) were generated, giving an overall genome coverage of 8.5-fold. Of these reads, 1,600 were end sequences of large-insert plasmid (8–10-kb) clones and 1,000 were end sequences of cosmid clones. The Phred/Phrap/Consed software package (refs 19–21) was used for quality assessment and sequence assembly. The initial assembly yielded 805 contigs, which were clustered into 145 groups based on linking information from forward and reverse sequence reads. Some contigs were also located on the physical map by Southern analysis. Sequence and/or physical gaps of the chromosomes were closed by primer walking and PCR. The final assembly was checked against the physical map of restriction sites, mapped genes and end sequences of large plasmid and cosmid clones.
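As a rough consistency check on the sequencing figures, the usual relation coverage = (number of reads × mean read length) / genome size can be rearranged to give the mean usable read length the reported numbers imply. The text does not state a read length, so the value computed below is an inference under that simple model, and the function name is ours.

```python
GENOME_BP = 4_332_241 + 358_943  # CI + CII, from the text
READS = 111_402                  # reads passing the quality filter
COVERAGE = 8.5                   # reported fold coverage


def implied_mean_read_length(genome_bp, n_reads, coverage):
    """Mean read length (bp) implied by coverage = n_reads * length / genome_bp."""
    return coverage * genome_bp / n_reads


length = implied_mean_read_length(GENOME_BP, READS, COVERAGE)
print(f"{length:.0f} bp per read")
```

The result, a few hundred bases of high-quality sequence per read, is in the range expected for capillary sequencing of that period, which suggests the reported read count and coverage are mutually consistent.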
Assignment of CDSs
CDSs were determined with Glimmer 2.0 (ref. 22) and the Z-curve method (ref. 23), and the results were subjected to further manual inspection. A few CDSs were found by manual curation guided by BLAST results. BLAST searches against the NCBI non-redundant protein database (or SwissProt, PIR and COG) were performed to determine similarity. The BLAST search criteria were as follows: (1) an e-value cutoff of 10^-5 and (2) alignment of at least 60% of the subject sequence. If there was no database hit, domain analysis was performed by searching the Pfam, PRINTS, PROSITE, ProDom, Blocks and SMART databases. Transfer RNAs were predicted with tRNAscan-SE (ref. 24). TopPred (ref. 25) was used to identify potential membrane-spanning domains in proteins. The presence of signal peptides and the probable position of a cleavage site in secreted proteins were detected with SignalP. Lipoproteins were identified by scanning for a lipobox ([LV][ASTVI][GAS][C]) in the first 30 amino acids of every protein. Possible metabolic pathways were examined using the KEGG database (ref. 10). Transmembrane helices in proteins were predicted by the TMHMM method (Supplementary Information 4). Predicted biological roles were assigned by the classification scheme in ref. 26. Where tertiary structures of hypothetical proteins were predicted, sequences of CDSs were submitted to the SWISS-MODEL server and the illustrations were prepared with RasMol 2.6.
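Two of the rules above are simple enough to sketch in code: the lipobox scan and the pair of BLAST acceptance criteria. The sketch below is an illustration, not the authors' software. It assumes the lipobox must lie entirely within the first 30 residues and that the e-value criterion is an upper bound. The function names and the toy sequences are hypothetical.

```python
import re

# Lipobox pattern as given in the text: [LV][ASTVI][GAS][C].
LIPOBOX = re.compile(r"[LV][ASTVI][GAS]C")


def has_lipobox(protein, n_terminal=30):
    """True if the lipobox occurs wholly within the first n_terminal residues."""
    return LIPOBOX.search(protein.upper()[:n_terminal]) is not None


def passes_blast_criteria(evalue, aligned_len, subject_len,
                          max_evalue=1e-5, min_subject_fraction=0.60):
    """Apply both stated criteria: e-value cutoff and aligned subject fraction."""
    return (evalue <= max_evalue
            and aligned_len / subject_len >= min_subject_fraction)


# Toy sequences (not real L. interrogans proteins).
print(has_lipobox("MKKILSLLVAGCSSNDE"))   # contains VAGC near the N terminus
print(has_lipobox("M" * 40 + "LAGC"))     # motif lies beyond residue 30
print(passes_blast_criteria(1e-20, aligned_len=240, subject_len=300))
```

A motif match alone is a weak predictor, so in practice a scan like this would be read alongside the signal-peptide prediction mentioned in the same paragraph.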
Deposition of data
In addition to the data deposited at the NCBI database (GB: AE010300 for CI and GB: AE010301 for CII), the L. interrogans genome database is also available at http://www.chgc.sh.cn/lep/ and at http://bioinfo.hku.hk/LeptoList/.
The BLAST analysis comparing the best-hit distribution of protein homologues in representative eubacteria with the predicted proteomes of bacteria, viruses (phages), archaea and eukarya followed, with modifications, the approach described in ref. 27 for studying horizontal gene transfer. The data were retrieved from NCBI TaxMap (http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/micr.html). The CDSs of each bacterium were used in a BLAST search against the database. Only those hits that scored at least 95 bits were collected and ranked. For each query CDS, the ‘most’ similar organism was the one whose homologous protein showed the strongest similarity to the query.
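The best-hit tally described in this paragraph can be sketched as follows. The input layout, one (query, taxonomic group, bit score) tuple per hit, is an assumption, as are the function name and the toy records. The text does not say how ties or self-hits were handled, so this sketch simply keeps the first highest-scoring hit for each query.

```python
from collections import Counter


def best_hit_distribution(hits, min_bits=95.0):
    """Count, per taxonomic group, the queries whose top-scoring hit
    (among hits of at least min_bits) falls in that group."""
    best = {}
    for query, group, bits in hits:
        if bits < min_bits:
            continue  # below the 95-bit threshold stated in the text
        if query not in best or bits > best[query][1]:
            best[query] = (group, bits)
    return Counter(group for group, _ in best.values())


# Toy hit table (hypothetical identifiers and scores).
toy_hits = [
    ("cds_a", "Bacteria", 210.0),
    ("cds_a", "Eukarya", 120.0),
    ("cds_b", "Archaea", 96.5),
    ("cds_c", "Eukarya", 80.0),
]
print(best_hit_distribution(toy_hits))
```

Queries with no hit above the threshold, such as the third toy CDS, drop out of the tally altogether, so the counts describe only CDSs with a reasonably strong homologue.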
1. Faine, S., Adler, B., Bolin, C. & Perolat, P. Leptospira and Leptospirosis (Medisci, Melbourne, 1999)
2. Kmety, E. & Dikken, H. Classification of the species Leptospira interrogans and history of its serovars (Groningen Univ. Press, The Netherlands, 1993)
3. Fraser, C. M. et al. Complete genome sequence of Treponema pallidum, the syphilis spirochete. Science 281, 375–388 (1998)
4. Fraser, C. M. et al. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390, 580–586 (1997)
5. Saier, M. & Garcia-Lara, J. The Spirochetes: Molecular and Cellular Biology (Horizon Scientific Press, Wymondham, 2001)
6. Heidelberg, J. F. et al. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406, 477–483 (2000)
7. Bourhy, P. & Saint Girons, I. Localization of the Leptospira interrogans metF gene on the CII secondary chromosome. FEMS Microbiol. Lett. 191, 259–263 (2000)
8. Herter, S. et al. Autotrophic CO2 fixation by Chloroflexus aurantiacus: Study of glyoxylate formation and assimilation via the 3-hydroxypropionate cycle. J. Bacteriol. 183, 4305–4316 (2001)
9. Charon, N. W. & Goldstein, S. F. Genetics of motility and chemotaxis of a fascinating group of bacteria: The spirochetes. Annu. Rev. Genet. 36, 47–73 (2002)
10. Ogata, H. et al. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 27, 29–34 (1999)
11. Motaleb, M. A. et al. Borrelia burgdorferi periplasmic flagella have both skeletal and motility functions. Proc. Natl Acad. Sci. USA 97, 10899–10904 (2000)
12. Picardeau, M., Brenot, A. & Saint Girons, I. First evidence for gene replacement in Leptospira spp. Inactivation of L. biflexa flaB results in non-motile mutants deficient in endoflagella. Mol. Microbiol. 40, 189–199 (2001)
13. Werts, C. et al. Leptospiral lipopolysaccharide activates cells through a TLR2-dependent mechanism. Nature Immunol. 2, 346–352 (2001)
14. Arai, H. et al. Platelet-activating factor acetylhydrolase (PAF-AH). J. Biochem. (Tokyo) 131, 635–640 (2002)
15. Tuckwell, D. Evolution of von Willebrand factor A (VWA) domains. Biochem. Soc. Trans. 27, 835–840 (1999)
16. Rodrigo, L., Mackness, B., Durrington, P., Hernandez, A. & Mackness, M. Hydrolysis of platelet-activating factor by human serum paraoxonase. Biochem. J. 354, 1–7 (2001)
17. Matsushita, O. et al. Gene duplication and multiplicity of collagenases in Clostridium histolyticum. J. Bacteriol. 181, 923–933 (1999)
18. Wolf, Y., Rogozin, I., Grishin, N., Tatusov, R. & Koonin, E. V. Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol. Biol. 1, 8 (2001)
19. Ewing, B. et al. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998)
20. Ewing, B. & Green, P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194 (1998)
21. Gordon, D., Abajian, C. & Green, P. Consed: A graphical tool for sequence finishing. Genome Res. 8, 195–202 (1998)
22. Delcher, A. L., Harmon, D., Kasif, S., White, O. & Salzberg, S. L. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 27, 4636–4641 (1999)
23. Zhang, C. T. & Wang, J. Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve. Nucleic Acids Res. 28, 2804–2814 (2000)
24. Lowe, T. & Eddy, S. tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997)
25. Claros, M. & von Heijne, G. TopPred II: an improved software for membrane protein structure predictions. Comput. Appl. Biosci. 10, 685–686 (1994)
26. Riley, M. Functions of the gene products of Escherichia coli. Microbiol. Rev. 57, 862–952 (1993)
27. Olendzenski, L., Liu, L., Zhaxybayeva, O., Murphey, R., Shin, D. G. & Gogarten, J. P. Horizontal transfer of archaeal genes into the Deinococcaceae: Detection by molecular and computer-based approaches. J. Mol. Evol. 51, 587–599 (2000)
28. Howell, D. M., Xu, H. & White, R. H. (R)-Citramalate synthase in methanogenic Archaea. J. Bacteriol. 181, 331–333 (1999)
29. Kohlhaw, G. B., Leary, T. R. & Umbarger, H. E. Alpha-isopropylmalate synthase from Salmonella typhimurium. Purification and properties. J. Biol. Chem. 244, 2218–2225 (1969)
We thank L. Bao, B.-M. Dai, J. Yan, C. Werts, M. Picardeau and G. Baranton for suggestions and comments on our research strategy and manuscript preparation; C. Jin and G.-C. Liu of the Institute of Microbiology, Chinese Academy of Science, for help in the attempt at assaying the enzymatic activity of PafAH; Y. Liu and H.-G. Zhu for help in preparing the drawings; X. Mao and G. Cai for help in computer simulation; B.-Y. Hu and Y-X. Nie for help in bacterial culture preparation; and the members of CHGCS for support and encouragement. This work was supported by the National Natural Science Foundation of China, the Chinese National High Technology Development Program (863), the National Key Program for Basic Research (973) and the Sciences and Technology Commission of the People's Government of Shanghai Municipality. It was also supported by the Pôle Sino-Français en Sciences du Vivant et en Génomique and le Programme de Recherches Avancées Franco-Chinois PRA B00-05.
The authors declare that they have no competing financial interests.
Supplementary information 4: Schematic illustration of functional assignment to some hypothetical proteins based on their structural characteristics. Supplementary information 4-1: Schematic illustration of functional assignment to CheY based on secondary structure prediction and comparison on hypothetical proteins. Supplementary information 4-2 (A-E): Schematic illustration of functional assignment to genes encoding key enzymes for LPS biosynthesis based on prediction of protein transmembrane domains. Supplementary information 4-3: Amino acid sequence comparison of the hypothetical UDP-N-acetylglucosamine acyltransferase (LpxA, LA3949) of L. interrogans to those of E. coli and Pseudomonas. (PDF 170 kb)