Introduction

The genus Klebsiella, especially the species Klebsiella pneumoniae, is a common human pathogen that can lead to a wide range of diseases in both hospital and community settings. It causes nosocomial infections, such as septicemia, pneumonia and urinary tract infections1,2 and is also associated with community-acquired infections, including pneumonia, urinary tract infections and pyogenic liver abscess complicated with meningitis and endophthalmitis3,4,5. The capsule is a major virulence factor of K. pneumoniae, and associations between capsular types and particular diseases6,7 or the severity of infections have been documented8,9. At present, a total of 79 capsular types have been identified and associated with different Klebsiella species10, including 77 types from reference strains (recently reclassified into K. pneumoniae, K. variicola, K. oxytoca, K. michiganensis, Raoultella planticola, R. ornithinolytica and R. terrigena10) recognized by serological reactivity tests established during the period 1926 to 197711 and 2 new types of K. pneumoniae (KN1 and KN2) characterized by molecular genotyping and phage typing in recent years12,13.

Serotyping has been used to characterize the K-types of Klebsiella spp. since 192614. However, the limitations of serotyping of Klebsiella spp. have been reported in several studies, including limitations of sensitivity and specificity15,16,17. For this reason, capsular genotyping methods have been developed for discriminating the capsular types of Klebsiella spp.10,12,18,19,20,21,22,23,24. Polymerase chain reaction-based genotyping of the capsular polysaccharide synthesis region, cps-PCR genotyping, was first adopted for the detection of specific wzy genes in Klebsiella spp. type K118,19,20 and subsequently applied to other capsular types related to community-acquired pyogenic liver abscess12,21,22. Recently, wzi or wzc sequencing was also used for Klebsiella spp. capsular typing10,25. However, some types could not be distinguished by these sequences, and determining the capsular type can also be complicated when sequence variation exists within a given type.

Genetic structures of the capsular polysaccharide synthesis (cps) gene cluster in Klebsiella spp. have been determined for some types12,18,26,27,28. A group of six genes (galF, cpsACP, wzi, wza, wzb and wzc) at the 5′ end of the cps regions, which encode proteins involved in CPS translocation and processing at the bacterial surface, are highly conserved among different capsule types, and genes encoding glucose-6-phosphate dehydrogenase (gnd) and UDP-glucose dehydrogenase (ugd) were found at the 3′ end. The middle (variable) region of the cps loci, which comprises genes encoding proteins responsible for polymerization and assembly of specific CPS subunits, was therefore considered crucial for K-type variation18. Generally, the synthesis of the capsular repeat unit is initiated by an initial glycosyltransferase (GT), WbaP or WcaJ28,29, and further catalyzed by specific (non-initial) GTs that add further sugars29. The lipid-linked repeat units are flipped across the plasma membrane by Wzx and then polymerized by Wzy30. Subsequently, the channel Wza, together with the regulators Wzb and Wzc, which control the process of polymerization and transportation, exports the polymer to the surface of the bacteria29.

Of the 79 documented capsular types in Klebsiella spp., the cps gene clusters of 22 types (13 complete cps for K1, K2, K3, K5, K9, K14, K16, K20, K22, K39, K52, K62 and KN2; 9 incomplete cps for K15, K23, K37, K45, K50, K54, K57, K79 and KN1) are available6,12,18,25,28. In order to associate all 79 cps gene clusters with distinct capsular types, we sequenced the cps of 57 capsular types of Klebsiella spp., extended the 3′ ends of the 9 incomplete cps and conducted comparative analysis of the cps gene clusters of the various types. Investigation of the relationships between different capsular gene clusters provided further understanding of capsule biosynthesis. Moreover, with more complete information on the genetic structures of all 79 capsular types, the limitations of current genotyping methods can be more clearly defined and the use of these typing methods can be further improved.

Results

cps gene clusters of 79 Klebsiella spp. capsular types

We obtained all 79 cps gene clusters, which extend from galF to ugd (except for K4 and K50), by retrieving sequences from the GenBank database (13 complete cps and 9 incomplete cps), extending the 3′ cps sequences in the 9 incomplete types and resolving 57 cps of Klebsiella spp. (Supplementary Table S1). In K4, we failed to extend the sequences downstream of gnd; in K50, conserved gnd or ugd genes were not found in this locus although a ~21 kb region from galF to the downstream genes HisA and HisF (which encode enzymes associated with histidine biosynthesis and are generally located downstream of cps gene clusters in Klebsiella spp.18) was resolved. Moreover, we identified K. pneumoniae strain BIDMC 47 as K13 by wzc genotyping25 (100% DNA sequence identity). Thus, the full cps sequence of BIDMC 47 (accession number AB924555) was included to represent the K13 type. For these 78 cps, a total of 1515 coding sequences were annotated, including galF (n = 79), cpsACP (n = 80), wzi (n = 78), wza (n = 81), wzb (n = 78), wzc (n = 78), gnd (n = 79), manB (n = 43), manC (n = 44), rmlA (n = 30), rmlB (n = 29), rmlC (n = 30), rmlD (n = 30), wcaJ (n = 40), wbaP (n = 39), gmd (n = 6), wcaG (n = 6), glf (n = 5), wzx (n = 77), wzy (n = 77) and genes encoding non-initial GTs (n = 318), glycosyl hydrolase (n = 33), acetyltransferases (n = 35), pyruvyltransferases (n = 35), transposases (n = 21), nitroreductase (n = 2), potassium/proton antiporter (n = 2), tail fiber (n = 6), acetylneuraminic acid synthetase (n = 1), UDP galacturonate 4-epimerase (n = 1), carbohydrate lyase (n = 1), CMP-N-acetylneuraminic acid synthetase (n = 1), coenzyme F420 hydrogenase (n = 1) and hypothetical proteins (n = 49) (Supplementary Table S2).
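As a quick consistency check, the per-category counts listed above can be tallied and compared with the reported total of 1515 coding sequences. The sketch below only transcribes the counts from the text; the dictionary name and the shortened category labels are illustrative and are not part of the original analysis.

```python
# Counts transcribed from the annotation summary in the text.
# Keys are shortened labels chosen for this sketch (assumption).
cds_counts = {
    "galF": 79, "cpsACP": 80, "wzi": 78, "wza": 81, "wzb": 78, "wzc": 78,
    "gnd": 79, "manB": 43, "manC": 44,
    "rmlA": 30, "rmlB": 29, "rmlC": 30, "rmlD": 30,
    "wcaJ": 40, "wbaP": 39, "gmd": 6, "wcaG": 6, "glf": 5,
    "wzx": 77, "wzy": 77,
    "non_initial_GT": 318, "glycosyl_hydrolase": 33,
    "acetyltransferase": 35, "pyruvyltransferase": 35,
    "transposase": 21, "nitroreductase": 2, "K_H_antiporter": 2,
    "tail_fiber": 6, "acetylneuraminic_acid_synthetase": 1,
    "UDP_galacturonate_4_epimerase": 1, "carbohydrate_lyase": 1,
    "CMP_NeuAc_synthetase": 1, "coenzyme_F420_hydrogenase": 1,
    "hypothetical": 49,
}

# Sum over all categories and compare with the total stated in the text.
total = sum(cds_counts.values())
print(total)
```

The listed categories add up to the stated total, so the enumeration in the text appears to be exhaustive for the annotated coding sequences.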

By NCBI blast, the cps gene clusters of Klebsiella spp. K31, K47, K61 and K63 were almost identical (>96% DNA identity) to those of Escherichia coli 5-172-05_S1_C3 (JOQS01000075.1), Escherichia coli HS (CP000802), Escherichia coli MS 85-1 (ADWQ01000010.1) and Escherichia coli KTE222 (ASUP01000016.1), respectively. Similarly, previous studies reported that E. coli and Klebsiella spp. possess highly similar cps sequences23,31. The cps sequences of Klebsiella spp. K4 also share high similarity (99% DNA identity) with those of Serratia spp. (AEQT01000901).

General and atypical features of the cps locus in 79 capsular types of Klebsiella spp

The common genetic features of the Klebsiella spp. cps loci have been revealed in previous studies18,28. Conserved genetic organization at the 5′ end of the cps locus extends from galF through cpsACP, wzi, wza, wzb and wzc and at the 3′ end of the cps locus from gnd to ugd. The wzc-gnd region, which usually contains genes encoding GTs, flippase (wzx), polymerase (wzy) and modifying enzymes (acetyltransferase, pyruvyl transferase, etc.), varies among different capsular types18. The gnd-ugd region may contain genes involved in GDP-D-mannose synthesis (manB and manC) or dTDP-L-rhamnose synthesis (rmlA, rmlB, rmlC and rmlD)28. Analysis of the 79 cps gene clusters from Klebsiella spp. indicated that these general features were observed in most capsular types; however, some notably uncommon features were also identified.

In K4, cpsACP was not followed by a wzi gene; instead, one acetyltransferase gene, two potassium/proton antiporter genes and a transposase gene were located between cpsACP and wza (Fig. 1). Similarly, the wzi gene was absent in the K33 and K40 cps loci; instead, three genes encoding hypothetical proteins were identified in this region of K33 (Fig. 1), and three GT genes, three genes encoding hypothetical proteins and three transposase genes were located in this region of K40 (Fig. 1). In addition, two gnd genes were found in the K41 cps region and, most interestingly, the K4 cps was composed of a wza-wzb-wzc region and an additional wzi-wza-wzb-wzc region (wza was interrupted by a transposase gene). The additional three genes and the upstream wzi gene showed high DNA sequence identity with those of K1 (99% for each gene), indicating that the K4 cps included several K1 cps genes (Fig. 1). Another atypical feature is that no wzx-like gene was found in capsular types K11 and K34 and no wzy-like gene was found in capsular types K29 and K50.

Figure 1

Genetic alignment of the K4, K33 and K40 cps gene cluster.

Open reading frames (ORFs) are shown as arrows. The upper panel indicates conserved genetic organization of cps gene cluster. Atypical gene contents are marked in red color. GT, glycosyltransferase.

In addition, we examined the correlation between sugar composition and the presence or absence of related cps genes. Of the 79 documented capsular types, to the best of our knowledge, 74 capsule structures are publicly available (chemical structures are unavailable for five types: K29, K42, K65, KN1 and KN2) (references are provided in Supplementary Table S3). Sugars found in different K-types are mannose for 37 types, fucose for 6 types, rhamnose for 28 types and galactofuranose for 3 types. Among the 37 types with mannose as a structural unit, genes for GDP-D-mannose synthesis (manB and manC) were found in their cps regions, except in K4 (with manC only) and in K50 (both absent). Moreover, even though no mannose is incorporated into their capsule structures, capsular types K1, K16, K54, K58 and K63 harbored manCB genes. As these five types are known to use fucose as one of the components of their capsules and GDP-L-fucose is converted from GDP-D-mannose, mannose would be an intermediate rather than the final product incorporated into the capsule. The six capsular types (K1, K6, K16, K54, K58 and K63) that contain fucose as a structural unit possessed both gmd (encoding GDP-D-mannose 4,6-dehydratase) and wcaG (encoding a nucleotide sugar epimerase/dehydratase with bifunctional activity: GDP-4-dehydro-6-deoxy-D-mannose epimerase and GDP-4-dehydro-6-L-deoxygalactose reductase), which are responsible for the conversion of GDP-D-mannose to GDP-L-fucose32. Conversely, the types with capsules that do not contain fucose lacked both gmd and wcaG in their cps regions.

The rmlA, rmlB, rmlC and rmlD genes are known to be responsible for dTDP-L-rhamnose synthesis33,34 and the four genes were usually found together, with the exception of the K65 cps region, which contained only the rmlA, rmlC and rmlD genes and not rmlB. From the resolved CPS structure, we found that the presence/absence of rhamnose in repeat units was perfectly correlated with presence/absence of rmlBADC genes. Galactofuranose was found in K12, K14 and K41, consistent with the presence of glf genes (encoding UDP-galactopyranose mutase, which catalyzes the conversion of UDP-galactopyranose into UDP-galactofuranose35,36).
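The gene-to-sugar correlations described above can be summarized as a small rule table. The following is a minimal sketch, assuming a cps locus is represented simply as a collection of gene names; the function name, the rule table and the requirement that every gene of a pathway be present are our illustrative assumptions, not a method used in the study.

```python
# Hypothetical rule table distilled from the correlations in the text:
# each nucleotide-sugar precursor is predicted only if all listed genes occur.
PATHWAY_GENES = {
    "GDP-D-mannose": {"manB", "manC"},
    "GDP-L-fucose": {"gmd", "wcaG"},  # converted from GDP-D-mannose
    "dTDP-L-rhamnose": {"rmlA", "rmlB", "rmlC", "rmlD"},
    "UDP-galactofuranose": {"glf"},
}

def predict_precursors(genes):
    """Return the sugar precursors whose synthesis genes are all present."""
    present = set(genes)
    return {sugar for sugar, needed in PATHWAY_GENES.items() if needed <= present}

# Illustrative gene list for a fucose-containing type such as K1
# (manCB, gmd and wcaG are reported in the text; the rest is a toy example).
k1_like = ["galF", "wzi", "wza", "manC", "manB", "gmd", "wcaG", "wcaJ"]
print(sorted(predict_precursors(k1_like)))
```

Under these rules a locus carrying only rmlA, rmlC and rmlD, as reported for K65, would not be predicted to supply dTDP-L-rhamnose; whether that holds biologically cannot be judged from the text, since the K65 capsule structure is unavailable.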

With respect to the correlation between capsule modifications and modifying enzymes (acetyltransferases and pyruvyltransferases), 12 types exhibited acetylated capsules and 10 of them carried genes encoding acetyltransferases (the two exceptions are K33 and K59) (Supplementary Table S3). Sixty-two types express capsules without acetylation, but genes encoding acetyltransferases were found in 19 of these types. Twenty-eight types have pyruvylated capsules and, except for K11, contained genes encoding pyruvyltransferases in their cps regions (Supplementary Table S3). Forty-six types express capsules without pyruvylation, but genes for adding pyruvyl groups were found in 4 of these types (K8, K22, K37 and K66).

Homology group (HG) assignment of cps genes

We used the TribeMCL program to assemble the 1,515 predicted proteins into 361 HGs. The clustering result showed that 143 of the 361 HGs (40%) contained 2 to 81 members each and the remaining 218 were single-member HGs (Supplementary Table S2).
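The size distribution reported above is simple to recompute from any protein-to-HG assignment. The sketch below assumes the clustering output is available as a mapping from protein identifier to HG label; the function name, the toy mapping and the output keys are hypothetical, and only the figures 143, 218 and 361 come from the text.

```python
from collections import Counter

def summarize_hgs(protein_to_hg):
    """Count multi-member and single-member homology groups."""
    sizes = Counter(protein_to_hg.values())  # HG label -> number of members
    return {
        "total": len(sizes),
        "multi_member": sum(1 for n in sizes.values() if n >= 2),
        "single_member": sum(1 for n in sizes.values() if n == 1),
        "largest": max(sizes.values()),
    }

# Toy assignment (hypothetical identifiers) to show the expected input shape.
toy = {"p1": "HG1", "p2": "HG1", "p3": "HG2", "p4": "HG3", "p5": "HG3", "p6": "HG3"}
summary = summarize_hgs(toy)

# Arithmetic check of the figures reported in the text.
reported_multi, reported_single = 143, 218
reported_total = reported_multi + reported_single
multi_fraction = reported_multi / reported_total
print(summary, reported_total, round(multi_fraction * 100))
```

The reported numbers are internally consistent: the multi-member and single-member groups sum to 361, and the multi-member share rounds to 40%.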

The products of galF, wzi, wza, wzb, wzc, gnd, wcaJ, wbaP, manC, manB, rmlA, rmlB, rmlC and rmlD each fell into a single HG, suggesting that these proteins were conserved among different capsular types. In contrast, non-initial GTs, Wzy polymerases and Wzx flippases were clustered into 142, 56 and 28 different HGs, respectively, indicating that they were diverse among types. Intriguingly, proteins for capsule modification (acetyltransferases and pyruvyltransferases) were also classified into multiple groups (26 and 16 HGs, respectively), suggesting that different modifying enzymes were needed for distinct capsule structures.

Applications of cps-PCR genotyping

Due to the limitations of capsular serotyping, polymerase chain reaction-based genotyping of the capsular polysaccharide synthesis region, cps-PCR genotyping, was developed based on available cps sequences to detect specific cps genes in some capsular types of Klebsiella spp.10,12,18,21,23. Because cps-PCR genotyping is a rapid and accurate method for detecting the cps genotype, the availability of cps sequences for all 79 types will be very useful for discriminating capsular types based on capsular type-specific genes. According to the results of the protein clustering, non-initial GTs, Wzx and Wzy were specific to distinct capsular types, indicating that these genes could be selected for genotyping. Because more than one non-initial GT gene was present in a given type, it would be easier to choose wzx or wzy as typing genes. In addition, since the 77 Wzy were classified into 56 HGs compared to the 77 Wzx categorized into 28 HGs, wzy exhibits more diversity than wzx among types. The 78 Wzi were clustered into 4 HGs, suggesting that wzi was less discriminatory than wzx or wzy. Therefore, wzy would be the most specific target for capsular PCR genotyping. We further analyzed the amino acid and DNA sequence identity of the wzy genes that were grouped into the same HGs. Most of the Wzy proteins shared <60% amino acid sequence identity even within a single HG and shared DNA sequence similarity with <600 bp of matching sequence over the ~1.2 kb gene length, except K22 vs. K37, K12 vs. K41, K2 vs. K13, K74 vs. K80, K79 vs. KN1 and K30 vs. K69 (Table 1). Previous studies have documented that K22 and K37 possess identical wzy genes and are only distinguishable by their acetyltransferase-encoding genes23. For the types that exhibit high similarity (>600 bp of matching sequence over the ~1.2 kb gene length), primers should be designed according to the variable region of their wzy genes; alternatively, other cps genes can be used for differentiating these types. Another limitation is that wzy-based genotyping is inapplicable to the capsular types lacking wzy-like genes (K29 and K50).
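The comparison of candidate typing genes above rests on how many HGs each protein family splits into. One crude way to express this is the ratio of HGs to members; this ratio is our illustrative indicator, not a statistic reported in the study, and the variable names are hypothetical. The member and HG counts are taken from the text.

```python
# (number of proteins, number of HGs) for each candidate typing gene, from the text.
hg_counts = {"wzy": (77, 56), "wzx": (77, 28), "wzi": (78, 4)}

def diversity_ratio(members, groups):
    """HGs per member: closer to 1 means nearly every type has its own group."""
    return groups / members

# Rank candidate genes from most to least diverse under this simple indicator.
ranking = sorted(hg_counts, key=lambda g: diversity_ratio(*hg_counts[g]), reverse=True)
for gene in ranking:
    print(gene, round(diversity_ratio(*hg_counts[gene]), 2))
```

The ordering reproduces the argument in the text that wzy is the most discriminatory of the three genes, although the ratio says nothing about sequence similarity within an HG, which still has to be checked pair by pair (Table 1).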

Table 1 Amino acid and DNA sequence identity of the members in Wzy HGs.

Enzymes for synthesis of capsular repeat unit

WbaP and WcaJ, regarded as the initial GTs for capsule synthesis, are UDP-hexose transferase enzymes that transfer galactose-1-phosphate and glucose-1-phosphate, respectively, to undecaprenol phosphate28,37. Additional transferases (non-initial GTs) add further sugars to form repeat units29,38 and the polymerase enzyme, Wzy, subsequently assembles the lipid-linked repeat units29. We found that either wbaP or wcaJ was present in each of the 79 cps loci, and the clustering results showed that the initial GTs (WbaP and WcaJ) were each assembled into a single group, implying that they were conserved among different types. Furthermore, a perfect correlation was observed in the 74 types with available capsule structures: wbaP co-exists with the presence of galactose in the repeat unit and wcaJ co-exists with the presence of glucose. Moreover, the possible polymerization linkage of the repeat unit can be predicted based on which type of initial GT is present. For example, the presence of wcaJ indicates that glucose is the initial sugar of K1 capsular repeat units; therefore, the polymerization linkage of K1 capsular repeat units could be β-D-Glcp(1 → 4)β-D-GlcpA according to the reported chemical structure of its capsule39, and K1 Wzy (MagA) is presumably responsible for the formation of this linkage. In addition, K12 and K41, which share 82% amino acid sequence identity in Wzy, seem to have the same predicted polymerization linkage for their capsular repeat units, i.e., α-D-Galp(1 → 2)β-D-Galf40,41.
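The correspondence between the initial GT and the first sugar of the repeat unit can be written as a lookup. This is a minimal sketch, assuming a locus is given as a collection of gene names and carries exactly one of the two initial GT genes, as observed for the 79 loci; the function name and the error handling are our own choices.

```python
# Initial GT gene -> sugar-1-phosphate it transfers to undecaprenol phosphate,
# as stated in the text.
INITIAL_GT_SUGAR = {"wbaP": "galactose", "wcaJ": "glucose"}

def initial_sugar(genes):
    """Predict the initiating sugar of the repeat unit from the initial GT gene."""
    found = [g for g in INITIAL_GT_SUGAR if g in set(genes)]
    if len(found) != 1:
        # Neither or both initial GT genes present: no prediction is made.
        raise ValueError(f"expected exactly one initial GT gene, found {found}")
    return INITIAL_GT_SUGAR[found[0]]

# K1 carries wcaJ according to the text; the other gene names are placeholders.
print(initial_sugar(["galF", "wzi", "wcaJ", "wzy"]))
```

Knowing the initiating sugar narrows down which linkage in a published repeat-unit structure is formed by Wzy, which is how the K1 polymerization linkage is inferred in the paragraph above.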

The clustering results showed that the 318 non-initial GTs were clustered into 142 different HGs, which provides some information on the possible functions of the GTs. For example, one HG (HG20) contains 14 GTs from K3, K7, K21, K24, K26, K28, K29, K39, K40, K43, K53, K65, K74 and K80 (the GTs show 37-64% amino acid identity to other members of the group). Based on the available CPS structures (all except K29 and K65), 10 of the 12 types (K3, K7, K21, K24, K26, K28, K43, K53, K74 and K80) share the same linkage, α-D-Manp(1 → 2)α-D-Manp. Therefore, we suggest that the GTs grouped into this HG (named WcuE) probably have catalytic activity for this specific sugar linkage. Accordingly, the relationship between GTs and CPS structures lays the foundation for understanding the putative functions of different GTs.

Capsular types with related cps genes and similar capsule structure

According to the protein clustering results, 9 pairs of capsular types (K1 and K58, K2 and K13, K12 and K41, K14 and K64, K10 and K61, K30 and K69, K33 and K35, K74 and K80 and K57 and K68) share 5 or more similar genes (clustered into the same HG) located within the variable region (wzc-ugd, excluding man and rml genes). We therefore compared the CPS structures of these capsular types to identify correlations between genes and products. Below are 6 examples with clearer implications (others are described in Supplementary Fig. S1a–S1c):
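The pairwise screen described above can be sketched as a set intersection over HG labels. The code assumes each capsular type is represented by the set of HGs found in its variable region (man and rml genes already excluded) and that each shared HG corresponds to one shared gene; the type names and HG labels in the toy data are hypothetical placeholders.

```python
from itertools import combinations

def related_pairs(type_to_hgs, min_shared=5):
    """Return (type_a, type_b, n_shared) for pairs sharing at least min_shared HGs."""
    pairs = []
    for a, b in combinations(sorted(type_to_hgs), 2):
        shared = type_to_hgs[a] & type_to_hgs[b]  # HGs present in both loci
        if len(shared) >= min_shared:
            pairs.append((a, b, len(shared)))
    return pairs

# Toy input: typeA and typeB share five variable-region HGs, typeC shares one.
toy = {
    "typeA": {"HG10", "HG11", "HG12", "HG13", "HG14", "HG15"},
    "typeB": {"HG10", "HG11", "HG12", "HG13", "HG14", "HG99"},
    "typeC": {"HG10", "HG50"},
}
result = related_pairs(toy)
print(result)
```

The threshold of 5 follows the text; lowering it would report more pairs, but with weaker grounds for expecting similar capsule structures.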

K1 and K58

The same linkage, β-D-GlcpA(1 → 4)α-L-Fucp, was found in the capsular repeat units of K1 and K5839,42; thus, we suggest that the fucosyl transferase WcaI present in both types is responsible for the synthesis of this linkage. Moreover, the specific GT in K1, wcsS, most likely accounts for the linkage α-L-Fucp(1 → 3)β-D-Glcp, whereas the two GTs in K58, WcqS and WcqT, likely account for the α-L-Fucp(1 → 3)α-D-Glcp linkage or the side chain synthesis (Fig. 2a and Supplementary Table S4). Given the similarity of the genetic organization of the cps clusters, it is not surprising that serological cross-reactions are reported between the two types24.

Figure 2

Comparison of cps gene clusters and capsule structures in capsular types with similar cps gene content.

Open reading frames (ORFs) are shown as arrows. Conserved genes, man genes, rml genes or transposases are shown in black. Other gene products that were clustered into the same HGs are shown in same colors and the amino acid similarities (%) are indicated below the ORFs. Genes only present in either of the two types are shown in white. GT, glycosyltransferase. Enzymes most likely involved in linkage formation are indicated along with their capsular structures. The differences of capsule structures from two types are shown in red. a, K1 and K58; b, K2 and K13; c, K12 and K41; d, K30 and K69; e, K74 and K80; f, K57 and K68.

K2 and K13

K2 and K13, which are known to cross-react by serotyping43,44, share similar capsule structures that only differ in the side chain, i.e., α-D-GlcpA(1 → 3)β-D-Manp in K2 and 3, 4-Pyr-β-D-Galp(1 → 4)α-D-GlcpA(1 → 3)β-D-Manp in K1345. The pyruvyl transferase (WcuL) and the GT (WcoW) present only in K13 but not in K2 may contribute to the addition of the pyruvyl group and the synthesis of the linkage β-D-Galp(1 → 4)α-D-GlcpA, respectively (Fig. 2b and Supplementary Table S4). The function of WcoW was also evidenced by the co-existence of WcoW and the linkage in K7446. In addition, we also found that K2 has an acetyltransferase-encoding gene; however, the previously reported K2 capsule structure is not acetylated45.

K12 and K41

Serological cross-reactions between K12 and K41 are known to occur47,48. The two capsular types exhibit the same repeat unit but distinct side branches40,41. The side chain of K12 was determined to be 5, 6-Pyr-β-D-Galf(1 → 4)β-D-GlcpA(1 → 3)β-D-Galf and that of K41 is β-D-Glcp(1 → 6)α-D-Glcp(1 → 4)β-D-GlcpA(1 → 3)β-D-Galf. A GT (wckG) and a pyruvyl transferase (wckH) were found only in K12, suggesting that these are the key enzymes involved in the synthesis of β-D-Galf(1 → 4)β-D-GlcpA and pyruvylation, respectively (Fig. 2c and Supplementary Table S4). The two GTs (WcpT and WcpU) in K41 are likely involved in the synthesis of β-D-Glcp(1 → 6)α-D-Glcp(1 → 4)β-D-GlcpA.

K30 and K69

Although no cross-reaction has been reported between K30 and K69, the capsule structures of the two types are almost identical with the exception of the linkage between β-D-Galp and the pyruvyl group49,50. The cps regions of the two types were also highly similar (Fig. 2d and Supplementary Table S4). With pyruvylation being the major difference between these two strains, the pyruvyltransferases from K30 and K69, which share 73% amino acid identity (named WcuL), could either catalyze both pyruvylation linkages or differ in ways that are critical for their specificity.

K74 and K80

K74 and K80 exhibit similar capsule structures46,51 and cps genes but do not show serological cross-reactivity. The differences between these two types reside within the side chains: 4, 6-Pyr-β-D-Galp(1 → 4)α-D-GlcpA(1 → 3)α-D-Manp and 3, 4-Pyr-β-L-Rhap(1 → 4)α-D-GlcpA(1 → 3)α-D-Manp in K74 and K80, respectively. Comparing the gene content of the K74 and K80 cps loci, genes for rhamnose synthesis (rmlABCD) were found in K80 but not in K74, which is consistent with the use of rhamnose in the side chain of K80 (Fig. 2e and Supplementary Table S4). Moreover, K74 and K80 each possess a unique pyruvyl transferase and a unique GT, suggesting that WcuL is involved in the synthesis of 4, 6-Pyr-β-D-Galp (the predicted function of WcuL is the same as that proposed for the K69 structure); WcoW is involved in the synthesis of β-D-Galp(1 → 4)α-D-GlcpA (the predicted function of WcoW is the same as that proposed for the K13 structure); WcuN likely accounts for the synthesis of 3, 4-Pyr-β-L-Rhap; and WcuS likely accounts for the synthesis of β-L-Rhap(1 → 4)α-D-GlcpA. Furthermore, because WbaZ is known to catalyze the α-D-Manp-(1 → 3)β-D-Galp glycosidic linkage38, WcuD and WcuE were presumably responsible for the synthesis of the remaining linkages, i.e., α-D-Manp(1 → 2)α-D-Manp or α-D-GlcpA(1 → 3)α-D-Manp.

K57 and K68

K57 and K68 do not exhibit serological cross-reactivity but show similarity in CPS structures45,52 and cps genes. The GT WbaZ, which is known to form the α-D-Manp-(1 → 3)β-D-Galp disaccharide backbone, is present in both strains. The pyruvylation of the capsule in K68 is also indicated by the presence of a pyruvyl transferase gene within its cps locus. In addition, the GTs WckX and WckW from K57 exhibited similarity with those of K68, implying that the two proteins are responsible for the common linkages of the two types: α-D-Manp-(1 → 4)α-D-GalpA on the side chains and the α-D-GalpA(1 → 2)α-D-Manp linkage in the backbone (Fig. 2f and Supplementary Table S4).

Discussion

A notable finding of this study is that the K4 cps region is a mosaic containing K1 cps genes, implying that cps genes can shuffle within Klebsiella spp. Lateral transfer of cps loci, either within or between species, could therefore occur frequently. Consistent with this, capsule switching was evidenced by a recent study, which revealed that the high number of distinct cps variants within K. pneumoniae clonal group CG258 was caused by extensive recombination events between distinct cps53.

We also found that some common cps genes were absent or truncated in a few types. It is possible that gene homologues elsewhere in the genome compensate for these functions, or that mechanisms other than the typical group 1 system are involved in capsule biosynthesis in these types. Another possibility is actual loss of the gene function; for example, the K50 capsular type reference strain has been observed to be non-capsulated25.

Another notable feature of the cps loci is the existence of genes encoding transposases or phage-related proteins, which may be evidence that transposition and horizontal gene transfer have occurred within cps regions. Some chromosomal rearrangements associated with transposition events may lead to gene loss. A previous study found that the wzb-wzc locus of the cps region was replaced by transposase genes in K15 and K50, which resulted in capsule deficiency23. In some cases, transposases most likely modify the cps region instead of disrupting it. For example, it has been documented that the cps gene clusters of Streptococcus pneumoniae serogroup 12 and serotypes 44 and 46 differ only in the presence of transposase genes54. Although we did not find any documented capsular types of Klebsiella spp. that differ only in transposases or transposons, we hypothesize that other strains may display a subtype or new type arising from transposase or transposon integration.

cps genotyping based on wzi sequencing has been used for discriminating the capsular types of Klebsiella spp.10. wzy genes are highly variable, whereas wzi genes are relatively conserved. wzy PCR genotyping requires specific primers designed from each already resolved sequence; however, it is more specific and no sequencing is necessary. In contrast, wzi genotyping can use relatively conserved sequences as primers, but it requires PCR followed by sequencing of the PCR products to obtain final results. Both methods would encounter difficulties in some capsular types unless full cps sequences are available. wzc genotyping25 is similar to wzi genotyping in that it relies on PCR and sequencing; however, it can differentiate many more of the reported genotypes than wzi. Therefore, wzy PCR would be preferable for rapidly identifying a specific genotype, whereas wzc PCR with sequencing would be best for testing isolates when type prevalence is unknown.

Comparative analysis of different capsular types showed their relatedness, and the genetic differences (presence or absence of genes, sequence changes, gene truncation, etc.) can be linked to the various structures of the expressed capsules. Our results indicated that some types exhibit similar capsule structures because of the high similarity in their cps regions. In terms of serological reactions, some of the capsular types that share related cps genes are known to cross-react by serotyping (K1 vs. K58; K2 vs. K13; K12 vs. K41), indicating that anti-sera recognize their common structures; other strains do not exhibit cross-reactivity despite sharing very similar structures (K30 vs. K69; K74 vs. K80; K57 vs. K68), suggesting that distinct epitopes are crucial for serological differentiation. In addition, putative functions of cps genes were indicated according to the presence of specific genes and unique linkages. The existence of genes for capsule modifications in the cps region also revealed the possibility of undefined capsule modifications in certain types. We also provide some evidence of sugar composition in types with unknown CPS structure. Therefore, as all cps gene clusters from the different capsular types of Klebsiella spp. have been resolved, the functions of genes involved in capsule synthesis will become much clearer.

In conclusion, the available cps sequences and the comparative analysis of various capsular types advance the understanding of the functions of cps genes and provide complete information on the relatedness of different capsular types through evolutionary history. Furthermore, these data are an important basis for the application of capsular genotyping as well as new type identification in Klebsiella spp.

Methods

Bacterial strains

A total of 77 K-serotype Klebsiella spp. reference strains were purchased from Statens Serum Institute, Copenhagen, Denmark. Two additional strains with novel type KN1 and KN2 capsules identified in our laboratory were also included12,13.

Sequencing of cps loci

We amplified the cps loci from Klebsiella spp. strains using multiple pairs of conserved primers as previously described12,55 (Supplementary Table S5 and Supplementary Fig. S2). PCR amplifications were performed with the Long and Accurate PCR system and the cycling programs were used in accordance with previously described procedures12. The PCR amplicons were subjected to high-throughput sequencing (Yang-Ming Genome Research Center) using the Illumina/Solexa GAII sequencing platform. When PCR amplifications failed, cps sequences were obtained by previously described inverse PCR and DNA sequencing methods56 based on the available wzc sequences of these types23. The cps sequences (approximately 20–30 kb) were deposited in GenBank (accession numbers are shown in Supplementary Table S1).

Gene annotation and homology group (HG) assignment

Coding sequences were predicted by Vector NTI and annotated by NCBI protein BLAST. Predicted proteins were classified into HGs using the TribeMCL algorithm (Centre for Mathematics and Computer Science and EMBL-EBI)57 with a cut-off of 1e−50. Gene names were assigned to cps genes encoding GTs, acetyltransferases and pyruvyltransferases in accordance with the Bacterial Polysaccharide Gene Database58 if they had not been given names previously. Proteins within the same HG were given the same name, and hypothetical proteins with uncertain roles in capsule synthesis were named according to their HG number. The polymerases (Wzy) that fell into multiple HGs were each assigned a number to indicate the different groups.
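To illustrate how an E-value cut-off turns pairwise BLAST hits into groups, the sketch below links any two proteins joined by a hit at or below 1e−50 and reports the connected components. This is a deliberate simplification and not the TribeMCL procedure itself: TribeMCL applies Markov clustering to the similarity graph, which can split groups that single-linkage merging would join. The function name, the hit tuple layout and the toy identifiers are assumptions made for this example.

```python
def cluster_by_evalue(proteins, hits, cutoff=1e-50):
    """Group proteins into connected components of hits with E-value <= cutoff.

    Simplified stand-in for Markov clustering (single-linkage, union-find).
    hits: iterable of (query, subject, evalue) tuples.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        if evalue <= cutoff and query in parent and subject in parent:
            parent[find(query)] = find(subject)  # merge the two groups

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    # Largest groups first, members sorted for stable output.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))

# Toy data: A-B and B-C pass the cut-off, C-D does not.
toy_hits = [("A", "B", 1e-80), ("B", "C", 1e-60), ("C", "D", 1e-10)]
clusters = cluster_by_evalue(["A", "B", "C", "D"], toy_hits)
print(clusters)
```

In the toy data, D remains a single-member group because its only hit is weaker than the cut-off, which mirrors how the 218 single-member HGs arise in the reported clustering.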

Additional Information

How to cite this article: Pan, Y.-J. et al. Genetic analysis of capsular polysaccharide synthesis gene clusters in 79 capsular types of Klebsiella spp. Sci. Rep. 5, 15573; doi: 10.1038/srep15573 (2015).