Genetic analysis of capsular polysaccharide synthesis gene clusters in 79 capsular types of Klebsiella spp

A total of 79 capsular types have been reported in Klebsiella spp., whereas capsular polysaccharide synthesis (cps) regions were available for only 22 types. Given the limitations of serotyping, a complete repertoire of cps sequences would be helpful for capsular genotyping. We therefore resolved the remaining 57 cps regions and conducted a comparative analysis. Clustering of 1,515 predicted proteins from the cps loci grouped similar proteins into homology groups (HGs) and revealed that 77 Wzy polymerases were classified into 56 HGs, which indicates the high specificity of wzy among different types. Accordingly, wzy-based capsular genotyping could differentiate capsular types except for those lacking wzy (K29 and K50) and those sharing identical wzy (K22 vs. K37), and it should be applied carefully to those exhibiting high similarity (K12 vs. K41, K2 vs. K13, K74 vs. K80, K79 vs. KN1 and K30 vs. K69). Comparison of CPS structures in several capsular types that shared similar gene contents suggests possible functions of glycosyltransferases. Our results therefore provide a complete set of cps sequences for the various types of Klebsiella spp., which improves understanding of the relationship between genes and CPS structures and is useful for identifying documented or new capsular types.

General and atypical features of the cps locus in 79 capsular types of Klebsiella spp. The common genetic features of the Klebsiella spp. cps loci have been described in previous studies 18,28 . The conserved genetic organization at the 5′ end of the cps locus extends from galF through cpsACP, wzi, wza, wzb and wzc, and at the 3′ end of the cps locus from gnd to ugd. The wzc-gnd region, which usually contains genes encoding glycosyltransferases (GTs), flippase (wzx), polymerase (wzy) and modifying enzymes (acetyltransferase, pyruvyltransferase, etc.), varies among different capsular types 18 . The gnd-ugd region may contain genes involved in GDP-D-mannose synthesis (manB and manC) or dTDP-L-rhamnose synthesis (rmlA, rmlB, rmlC and rmlD) 28 . Analysis of the 79 cps gene clusters from Klebsiella spp. indicated that these general features were present in most capsular types; however, some notably uncommon features were also identified.
In K4, cpsACP was not followed by a wzi gene; instead, one acetyltransferase gene, two potassium/proton antiporter genes and a transposase gene were located between cpsACP and wza (Fig. 1). Similarly, the wzi gene was absent from the K33 and K40 cps loci; instead, three genes encoding hypothetical proteins were identified in this region of K33 (Fig. 1), and three GT genes, three genes encoding hypothetical proteins and three transposase genes were located in this region of K40 (Fig. 1). In addition, two gnd genes were found in the K41 cps region and, most interestingly, the K4 cps was composed of a wza-wzb-wzc region and an additional wzi-wza-wzb-wzc region (wza was interrupted by a transposase gene). The additional three genes and the upstream wzi gene showed high DNA sequence identity with those of K1 (99% for each gene), indicating that the K4 cps included several K1 cps genes (Fig. 1). Another atypical feature is that no wzx-like gene was found in capsular types K11 and K34, and no wzy-like gene was found in capsular types K29 and K50.
Scientific Reports | 5:15573 | DOI: 10.1038/srep15573
In addition, we examined the correlation between sugar composition and the presence or absence of related cps genes. Of the 79 documented capsular types, to the best of our knowledge, capsule structures are publicly available for 74 (the chemical structures are unavailable for five types: K29, K42, K65, KN1 and KN2; references are provided in Supplementary Table S3). Sugars found in different K-types are mannose in 37 types, fucose in 6 types, rhamnose in 28 types and galactofuranose in 3 types. Among the 37 types with mannose as a structural unit, genes for GDP-D-mannose synthesis (manB and manC) were found in their cps regions, except in K4 (manC only) and K50 (both absent). Moreover, although no mannose is incorporated into their capsule structures, capsular types K1, K16, K54, K58 and K63 harbored manCB genes. Because these five types are known to use fucose as a capsule component and GDP-L-fucose is converted from GDP-D-mannose, mannose would be an intermediate rather than a final product incorporated into the capsule. The six capsular types (K1, K6, K16, K54, K58 and K63) that contain fucose as a structural unit possessed both gmd (encoding GDP-D-mannose 4,6-dehydratase) and wcaG (encoding a nucleotide sugar epimerase/dehydratase with bifunctional activity: GDP-4-dehydro-6-deoxy-D-mannose epimerase and GDP-4-dehydro-6-L-deoxygalactose reductase), which are responsible for the conversion of GDP-D-mannose to GDP-L-fucose 32 . Conversely, the types whose capsules do not contain fucose lacked both gmd and wcaG in their cps regions.
The rmlA, rmlB, rmlC and rmlD genes are known to be responsible for dTDP-L-rhamnose synthesis 33,34 , and the four genes were usually found together, with the exception of the K65 cps region, which contained only rmlA, rmlC and rmlD and lacked rmlB. Among the resolved CPS structures, the presence or absence of rhamnose in repeat units correlated perfectly with the presence or absence of the rmlBADC genes. Galactofuranose was found in K12, K14 and K41, consistent with the presence of glf genes (encoding UDP-galactopyranose mutase, which catalyzes the conversion of UDP-galactopyranose into UDP-galactofuranose 35,36 ).
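The gene-to-sugar correlations described above can be read as simple presence/absence rules. The following is a minimal sketch of that reading, not the procedure used in the study: the function name, the rule table and the example gene list are illustrative assumptions, and the rule for rhamnose assumes that all four rml genes are required.

```python
# Precursor pathways and the cps genes associated with them in the text.
# The dictionary layout and all names below are illustrative assumptions.
PATHWAY_RULES = {
    "GDP-D-mannose": {"manB", "manC"},
    "GDP-L-fucose": {"gmd", "wcaG"},
    "dTDP-L-rhamnose": {"rmlA", "rmlB", "rmlC", "rmlD"},
    "UDP-galactofuranose": {"glf"},
}

def predict_sugar_pathways(cps_genes):
    """Return the precursor pathways whose genes are all present in a cps locus."""
    genes = set(cps_genes)
    return sorted(name for name, required in PATHWAY_RULES.items()
                  if required <= genes)

# Hypothetical gene content, made up for illustration only.
example_locus = ["galF", "wzi", "wza", "wzb", "wzc",
                 "manC", "manB", "gmd", "wcaG", "gnd", "ugd"]
print(predict_sugar_pathways(example_locus))
```

As noted in the text, a complete GDP-D-mannose pathway does not by itself imply mannose in the capsule, because GDP-D-mannose is also the precursor of GDP-L-fucose.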
With respect to the correlation between capsule modifications and modifying enzymes (acetyltransferases and pyruvyltransferases), 12 types exhibited acetylated capsules, and 10 of them carried genes encoding acetyltransferases (the two exceptions are K33 and K59) (Supplementary Table S3). Sixty-two types express capsules without acetylation, but genes encoding acetyltransferases were found in 19 of them. Twenty-eight types have pyruvylated capsules and, except for K11, contained genes encoding pyruvyltransferases in their cps regions (Supplementary Table S3). Forty-six types express capsules without pyruvylation, but genes for adding pyruvyl groups were found in 4 of them (K8, K22, K37 and K66).
Homology group (HG) assignment of cps genes. We used the TribeMCL program to assemble 1,515 predicted proteins into 361 HGs. The clustering result showed that 143 of the 361 HGs (40%) contained 2 to 81 members each, and the remaining 218 were single-member HGs (Supplementary Table S2).
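The HG size summary above can be reproduced from any protein-to-HG assignment table. The sketch below shows one way to do so and checks that the reported counts are internally consistent; the function name, the input layout and the toy data are assumptions for illustration and are not taken from the study.

```python
from collections import Counter

def summarize_hgs(protein_to_hg):
    """Count multi-member and single-member homology groups (HGs).

    protein_to_hg maps a protein identifier to its HG label; both are
    hypothetical names used only for this illustration.
    """
    sizes = Counter(protein_to_hg.values())
    multi = sum(1 for n in sizes.values() if n >= 2)
    single = sum(1 for n in sizes.values() if n == 1)
    return {"total": len(sizes), "multi": multi, "single": single,
            "largest": max(sizes.values())}

# Toy assignment (made-up data): one three-member HG and two singletons.
toy = {"p1": "HG1", "p2": "HG1", "p3": "HG1", "p4": "HG2", "p5": "HG3"}
print(summarize_hgs(toy))

# Consistency check on the figures reported in the text.
multi_reported, single_reported, total_reported = 143, 218, 361
assert multi_reported + single_reported == total_reported
# Share of multi-member HGs, in percent (reported as 40%).
print(round(100 * multi_reported / total_reported))
```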
Applications of cps-PCR genotyping. Because of the limitations of capsular serotyping, polymerase chain reaction-based genotyping of the capsular polysaccharide synthesis region (cps-PCR genotyping) was developed from the available cps sequences to detect specific cps genes in some capsular types of Klebsiella spp. 10,12,18,21,23 . Because cps-PCR genotyping is a rapid and accurate method for detecting the cps genotype, the availability of cps sequences for all 79 types will be very useful for discriminating capsular types based on type-specific genes. According to the protein clustering results, non-initial GTs, Wzx and Wzy were specific to distinct capsular types, indicating that these genes could be selected for genotyping. Because more than one non-initial GT gene was present in a given type, it would be easier to choose wzx or wzy as the typing gene. In addition, since the 77 Wzy proteins were classified into 56 HGs whereas the 77 Wzx proteins were categorized into 28 HGs, wzy exhibits more diversity than wzx among types. The 78 Wzi proteins clustered into 4 HGs, suggesting that wzi was less discriminatory than wzx or wzy. Therefore, wzy would be the most specific gene for capsular PCR genotyping. We further analyzed the amino acid and DNA sequence identity of the wzy genes that were grouped into the same HGs. Most of the Wzy proteins shared < 60% amino acid sequence identity even within a single HG and shared DNA sequence similarity with < 600 bp of matching sequence over the ~1.2 kb gene length 23 . For the types exhibiting high similarity (> 600 bp of matching sequence over the ~1.2 kb gene length), primers should be designed according to the variable regions of their wzy genes; alternatively, other cps genes can be used to differentiate these types. Another limitation is that the method is inapplicable to the capsular types lacking wzy-like genes (K29 and K50).
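The choice of wzy as the typing gene follows from the protein and HG counts given above. As a rough illustration, the sketch below compares the number of HGs per protein for each candidate gene and applies the ~600 bp matching-sequence threshold mentioned in the text. The HGs-per-protein ratio is an ad hoc measure introduced here for illustration, not a statistic used in the study, and the function names are hypothetical.

```python
# (number of proteins, number of HGs) as reported in the text.
REPORTED_COUNTS = {"wzy": (77, 56), "wzx": (77, 28), "wzi": (78, 4)}

def hgs_per_protein(counts):
    """Ad hoc diversity measure: HGs divided by proteins, per gene."""
    return {gene: hgs / proteins for gene, (proteins, hgs) in counts.items()}

def most_discriminatory(counts):
    """Return the gene with the highest HGs-per-protein ratio."""
    ratios = hgs_per_protein(counts)
    return max(ratios, key=ratios.get)

def needs_variable_region_primers(matching_bp, threshold_bp=600):
    """Flag a wzy pair whose matching sequence exceeds the threshold
    described in the text (> 600 bp over the ~1.2 kb gene)."""
    return matching_bp > threshold_bp

for gene, ratio in sorted(hgs_per_protein(REPORTED_COUNTS).items()):
    print(f"{gene}: {ratio:.2f} HGs per protein")
print("suggested typing gene:", most_discriminatory(REPORTED_COUNTS))
```

A pair flagged by `needs_variable_region_primers` would call for primers placed in the variable region of wzy, or for a different cps gene, as the text recommends.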
Enzymes for synthesis of capsular repeat unit. WbaP and WcaJ, regarded as the initial GTs for capsule synthesis, are UDP-hexose transferases that transfer galactose-1-phosphate and glucose-1-phosphate, respectively, to undecaprenol phosphate 28,37 . Additional transferases (non-initial GTs) then add further sugars to form repeat units 29,38 , and the polymerase Wzy subsequently assembles the lipid-linked repeat units 29 . We found that either wbaP or wcaJ was present in the 79 cps loci, and the clustering results showed that the initial GTs (WbaP and WcaJ) each formed a single group, implying that they were conserved among different types. Furthermore, a perfect correlation was observed in the 74 types with available capsule structures: wbaP co-occurred with galactose in the repeat unit, and wcaJ co-occurred with glucose. Moreover, the possible polymerization linkage of the repeat unit can be predicted from the type of initial GT present. For example, the presence of wcaJ indicates that glucose is the initial sugar of the K1 capsular repeat unit; therefore, the polymerization linkage of K1 capsular repeat units could be β-D-Glcp(1→4)β-D-GlcpA according to the reported chemical structure of its capsule 39 , and K1 Wzy (MagA) is presumed to be responsible for forming this linkage. In addition, K12 and K41, which share 82% amino acid sequence identity in Wzy, appear to have the same predicted polymerization linkage for their capsular repeat units, i.e., α-D-Galp(1→2)β-D-Galf 40,41 .
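The correspondence between the initial GT and the initial sugar of the repeat unit amounts to a two-entry lookup. A minimal sketch follows, assuming a locus is represented as a list of gene names; the function name and the gene lists are hypothetical.

```python
# Initial GT gene and the sugar it is associated with in the text.
INITIAL_GT_TO_SUGAR = {"wbaP": "galactose", "wcaJ": "glucose"}

def predict_initial_sugar(cps_genes):
    """Return the predicted initial sugar, or None if the locus does not
    carry exactly one of the two initial GT genes."""
    genes = set(cps_genes)
    found = [gt for gt in INITIAL_GT_TO_SUGAR if gt in genes]
    if len(found) != 1:
        return None
    return INITIAL_GT_TO_SUGAR[found[0]]

# Gene lists are abbreviated and illustrative; the text states only that
# the K1 locus carries wcaJ.
print(predict_initial_sugar(["wzi", "wza", "wzb", "wzc", "wcaJ"]))
```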
The clustering results showed that 318 non-initial GTs were clustered into 142 different HGs, which provides some information on the possible functions of the GTs. [...] Table S4). Given the similarity of the genetic organization of the cps clusters, it is not surprising that serological cross-reactions are reported between the two types 24 . [...] Table S4). The function of WcoW was also supported by the co-existence of WcoW and the linkage in K74 46 . In addition, we found that K2 has an acetyltransferase-encoding gene; however, the previously reported K2 capsule structure is not acetylated 45 .
K12 and K41. Serological cross-reactions between K12 and K41 are known to occur 47,48 . The two capsular types exhibit the same repeat unit but distinct side branches 40,41 . The side chain of K12 was determined to be 5 [...]
K30 and K69. Although no cross-reaction has been reported between K30 and K69, the capsule structures of the two types are almost identical except for the linkage between β-D-Galp and the pyruvyl group 49,50 . The cps regions of the two types were also highly similar (Fig. 2d and Supplementary Table S4). Because the major difference between these two types is pyruvylation, either the pyruvyltransferases from K30 and K69 (both named WcuL), which share 73% amino acid identity, can each catalyze both pyruvylation linkages, or the differences between the two proteins are critical for their specificity. [...] Table S4).

Discussion
A notable finding of this study is that the K4 cps region is a mosaic containing K1 cps genes, implying that cps genes can be shuffled within Klebsiella spp. Lateral transfer of cps loci, either within or between species, may therefore occur frequently; capsule switching was demonstrated by a recent study which revealed that the high number of distinct cps variants within K. pneumoniae clonal group CG258 was caused by extensive recombination events between distinct cps loci 53 .
We also found that some common cps genes were absent or truncated in a few types. Gene homologues located elsewhere in the genome might compensate for these functions, or mechanisms other than the typical group 1 system might be involved in capsule biosynthesis in these types. Another possibility is actual loss of the gene function; for example, the K50 capsular type reference strain has been observed to be non-capsulated 25 .
Another notable feature of the cps loci is the existence of genes encoding transposases or phage-related proteins, which may be evidence that transposition and horizontal gene transfer have occurred within cps regions. Some chromosomal rearrangements associated with transposition events may lead to gene loss. A previous study found that the wzb-wzc locus of the cps region was replaced by transposase genes in K15 and K50, which resulted in capsule deficiency 23 . In some cases, transposases most likely modify the cps region instead of disrupting it. For example, it has been documented that the cps gene clusters of Streptococcus pneumoniae serogroup 12 and serotypes 44 and 46 differ only in the presence of transposase genes 54 . Although we did not find any documented capsular types of Klebsiella spp. that differ only in transposases or transposons, we hypothesize that other strains may display a subtype or new type arising from transposase or transposon integration.
cps genotyping based on wzi sequencing has been used to discriminate the capsular types of Klebsiella spp. 10 . The wzy genes were highly variable, whereas the wzi genes were relatively conserved. wzy PCR genotyping requires specific primers designed from each resolved sequence; however, it is more specific and requires no sequencing. In contrast, wzi genotyping can use relatively conserved sequences as primers, but it requires PCR followed by sequencing of the PCR products to obtain final results. Both methods would encounter difficulties in some capsular types unless full cps sequences are available. wzc genotyping 25 is similar to wzi PCR and sequencing; however, it can differentiate many more reported genotypes than wzi. Therefore, wzy PCR would be preferable for rapidly identifying a specific genotype, whereas wzc PCR with sequencing would be best for testing isolates with unknown type prevalence.
Comparative analysis of different capsular types showed their relatedness, and the genetic differences (presence or absence of genes, sequence changes, gene truncation, etc.) can be linked to the various structures of the expressed capsules. Our results indicated that some types exhibit similar capsule structures because of the high similarity of their cps regions. In terms of serological reactions, some of the capsular types that share related cps genes are known to cross-react in serotyping (K1 vs. K58; K2 vs. K13; K12 vs. K41), indicating that antisera recognize their common structures; other types do not exhibit cross-reactivity despite sharing very similar structures (K30 vs. K69; K74 vs. K80; K57 vs. K68), suggesting that distinct epitopes are crucial for serological differentiation. In addition, putative functions of cps genes were inferred from the co-occurrence of specific genes and unique linkages. The existence of genes for capsule modification in the cps regions also raised the possibility that certain types carry undefined capsule modifications. We also provide some evidence of sugar composition in types with unknown CPS structures. Therefore, now that all cps gene clusters from the different capsular types of Klebsiella spp. have been resolved, the functions of genes involved in capsule synthesis will become much clearer.
In conclusion, the available cps sequences and the comparative analysis of various capsular types advance understanding of the functions of cps genes and provide complete information on the relatedness of different capsular types through evolutionary history. Furthermore, these data are an important basis for the application of capsular genotyping as well as the identification of new types in Klebsiella spp.

Methods
Bacterial strains. A total of 77 K-serotype Klebsiella spp. reference strains were purchased from Statens Serum Institut, Copenhagen, Denmark. Two additional strains with novel type KN1 and KN2 capsules identified in our laboratory were also included 12,13 .
Sequencing of cps loci. We amplified the cps loci from Klebsiella spp. strains using multiple pairs of conserved primers as previously described 12,55 (Supplementary Table S5 and Supplementary Fig. S2). PCR amplifications were performed with the Long and Accurate PCR system, and the cycling programs were used in accordance with previously described procedures 12 . The PCR amplicons were subjected to high-throughput sequencing (Yang-Ming Genome Research Center) using the Illumina/Solexa GAII sequencing platform. When PCR amplifications failed, cps sequences were obtained by previously described inverse PCR and DNA sequencing methods 56 based on the available wzc sequences of these types 23 . The cps sequences (approximately 20-30 kb) were deposited in GenBank (accession numbers are shown in Supplementary Table S1).
Gene annotation and homology group (HG) assignment. Coding sequences were predicted with Vector NTI and annotated by NCBI protein BLAST. Predicted proteins were classified into HGs using the TribeMCL algorithm (Centre for Mathematics and Computer Science and EMBL-EBI) 57 with a cut-off of 1e−50 . Gene names were assigned to cps genes encoding GTs, acetyltransferases and pyruvyltransferases in accordance with the Bacterial Polysaccharide Gene Database 58 if they had not been given names previously. Proteins within the same HG were given the same name, and hypothetical proteins with uncertain roles in capsule synthesis were named according to their HG numbers. The polymerases (Wzy) that fell into multiple HGs were each assigned a number to indicate the different groups.
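To make the role of the cut-off concrete, the sketch below groups proteins from pairwise similarity hits that pass a threshold. It is a deliberately simplified stand-in: TribeMCL applies Markov clustering to a similarity matrix, whereas this example merges proteins by simple connected components (union-find). It also assumes that the 1e−50 cut-off refers to a BLAST E-value. The function name, input format and data are all hypothetical.

```python
def assign_homology_groups(proteins, hits, evalue_cutoff=1e-50):
    """Group proteins linked by hits at or below the E-value cut-off.

    Simplification: connected components, NOT the Markov clustering
    performed by TribeMCL. `hits` is a list of (query, subject, evalue)
    tuples, a hypothetical input format.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the group root, compressing the path.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for query, subject, evalue in hits:
        if evalue <= evalue_cutoff and query in parent and subject in parent:
            parent[find(query)] = find(subject)

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in groups.values())

# Made-up proteins and E-values, for illustration only.
proteins = ["p1", "p2", "p3", "p4"]
hits = [("p1", "p2", 1e-120), ("p3", "p4", 1e-80), ("p1", "p3", 1e-10)]
print(assign_homology_groups(proteins, hits))
```

In this toy input the weak hit between p1 and p3 fails the cut-off, so the two pairs remain separate groups. Connected components tend to merge groups more aggressively than Markov clustering, which is one reason the real analysis would not be expected to match this simplification exactly.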