Reconstruction and analysis of carbon metabolic pathway of Ketogulonicigenium vulgare SPU B805 by genome and transcriptome

Ketogulonicigenium vulgare has been widely used in the two-step fermentation of vitamin C. Four K. vulgare strains (WSH-001, Y25, Hbe602 and SKV) have been completely sequenced; however, little attention has been paid to explaining the differences in 2-KGA yield at the genetic level. Here, a novel strain, K. vulgare SPU B805, with a higher 2-keto-L-gulonic acid (2-KGA) yield was sequenced and found to harbor one circular chromosome and no plasmid. Comparative genome analyses indicated that the absence of plasmid 2 is an important factor in its high 2-KGA productivity. The amino acid biosynthetic pathways in strain SPU B805 are much more complete than those in other K. vulgare strains. Strain SPU B805 harbors complete pentose phosphate (PPP) and tricarboxylic acid (TCA) pathways and disabled Embden-Meyerhof-Parnas (EMP) and Entner-Doudoroff (ED) pathways, as does strain SKV, whereas strains WSH-001, Y25 and Hbe602 harbor complete PPP, ED and TCA pathways and a nonfunctional EMP pathway. The transcriptome of strain SPU B805 supported the conclusion that cytoplasmic carbon metabolism proceeds mainly through the PPP pathway, whose genes showed higher transcriptional levels. This is the first study to elucidate the mechanism underlying the difference in 2-KGA yield, and it is of great significance for strain improvement in industrial fermentation.

Genomic features of K. vulgare SPU B805. The complete genome of K. vulgare SPU B805 was sequenced and deposited in the GenBank database under accession number CP017622. A schematic of the genome is shown in Supplementary Fig. S2. The genome is 3,032,608 bp in length with a G + C content of 61.7%. Based on 16S rRNA gene phylogenetic analysis, K. vulgare SPU B805 is closely related to the reported 2-KGA-producing strain K. vulgare SKV 5 (see Supplementary Fig. S3). The genome characteristics of K. vulgare SPU B805, K. vulgare Y25 2 , K. vulgare WSH-001 3 , K. vulgare SKV 5 and K. vulgare Hbe602 4 are compared in Table 2. K. vulgare SPU B805 has the largest genome of all the strains, but it does not encode the most genes or proteins. No plasmid was found in K. vulgare SPU B805, whereas the other four strains carry one or two plasmids. The numbers of rRNA (15) and tRNA (58) genes are the same as in the other strains, except that WSH-001 has 59 tRNA genes. The numbers of sorbose dehydrogenase (sdh) and sorbosone dehydrogenase (sndh) genes, the key genes responsible for the bioconversion of L-sorbose to 2-KGA, differ among the strains (Table 2). Strain SPU B805 has 5 sdh genes (KVC_2764, KVC_2744, KVC_0718, KVC_1927, KVC_0337) and 1 sndh gene (KVC_0605).
Furthermore, genome-scale sequence comparison with the LAST software showed that the genome of K. vulgare SPU B805 is most similar to that of K. vulgare SKV (see Supplementary Fig. S4). Strain SPU B805, which shows 99% identity to strain SKV, harbors one circular chromosome and no plasmid. In contrast, strain SKV contains one circular chromosome and one circular plasmid, while each of the other strains (Y25, WSH-001 and Hbe602) contains one circular chromosome and two circular plasmids. SPU B805 is therefore the first 2-KGA-producing strain found to lack plasmids. The genome comparison of K. vulgare SPU B805, SKV, Hbe602, WSH-001 and Y25 is shown schematically in Fig. 1. Most of the genome sequences are highly consistent, apart from some insertions, dislocations and rearrangements. Further analysis revealed that the DNA of plasmid 1 (for example, plasmid pKvSKV1 in SKV) is completely inserted into the circular chromosome of SPU B805 (from 125,709 bp to 393,697 bp), whereas the DNA of plasmid 2 (present in strains Hbe602, WSH-001 and Y25) is completely absent from SPU B805.
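As a rough check on the coordinates reported above, the following Python sketch computes the length of the chromosomal region said to contain the plasmid 1 DNA and its share of the genome. It assumes the coordinates are 1-based and inclusive, which the text does not state, and the function name is illustrative only.

```python
GENOME_LENGTH_BP = 3_032_608  # genome length reported for SPU B805
INSERT_START_BP = 125_709     # reported start of the plasmid 1-derived region
INSERT_END_BP = 393_697       # reported end of the plasmid 1-derived region


def span_length(start: int, end: int) -> int:
    """Length of a region, assuming 1-based inclusive coordinates (an assumption)."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


insert_len = span_length(INSERT_START_BP, INSERT_END_BP)
insert_fraction = insert_len / GENOME_LENGTH_BP
print(f"inserted region: {insert_len} bp ({insert_fraction:.1%} of the genome)")
```

Under these assumptions the region spans about 268 kb, or roughly 8.8% of the chromosome.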
Genome annotation of K. vulgare SPU B805. To facilitate gene function analysis, the genes on the chromosome and plasmid 1 of the four published K. vulgare strains (SKV, Hbe602, WSH-001 and Y25) were selected, and their distribution across COG categories was compared with that of SPU B805 (see Supplementary Fig. S5). In SPU B805, the number of genes related to amino acid transport and metabolism (E) is similar to that in Hbe602 and Y25 and higher than that in SKV and WSH-001. These genes encode many transporters, which absorb nutrients from the environment to compensate for the strain's metabolic defects. In addition, SPU B805 has more genes related to inorganic ion transport and metabolism (P) and to translation, ribosomal structure and biogenesis (J) than the reported K. vulgare strains. Inorganic ions are important cofactors in various physiological reactions in the cell, and a strong translation capacity may favor protein biosynthesis in K. vulgare SPU B805.
On the basis of the COG and RAST annotation, the genes on plasmid 2 of strains Hbe602, WSH-001 and Y25 account for about 7% of the genome. The majority of them are related to amino acid transport and metabolism (E) and are annotated as ABC transporters (Supplementary Fig. S5 and Table S1). In addition, some of the carbohydrate metabolism genes are annotated as involved in gluconate and ketogluconate metabolism. For example, gluconate 2-dehydrogenase (GA2DH) can convert 2-KGA into idonate, which then flows to 6-phosphogluconate (6PG) and enters the PPP for energy and biomass production 6 ; this diverts carbon away from 2-KGA and lowers 2-KGA production. Additionally, one of the L-sorbosone dehydrogenase (SNDH) genes is located on plasmid 2. Over-expression of this plasmid 2-encoded SNDH in K. vulgare Hbe602 produced an obvious byproduct (its chemical structure was not identified in the published paper) and thereby decreased the 2-KGA yield 15 . Moreover, the key genes unique to the ED pathway, 2-keto-3-deoxy-6-phosphogluconate aldolase (Eda) and phosphogluconate dehydratase (Edd), are annotated on plasmid 2. Deletion of the edd and eda genes from Gluconobacter oxydans 621H resulted in lower overall sugar uptake into the cytoplasm, so that more sugar remained in the periplasm and was converted to the end-product (2-keto-gluconate) with high yield 16 . Loss of the ED pathway may have a similar effect on 2-KGA production in K. vulgare. These genes (GA2DH, SNDH, Eda and Edd), located on the indigenous plasmid 2, are all absent from the genome of K. vulgare SPU B805, which may be one reason for its high 2-KGA production. Eliminating these genes, or the indigenous plasmid 2 itself, is therefore presumably an effective way to enhance 2-KGA production by K. vulgare. Constructing an XFP-PTA pathway as described by Wang, et al. 17 , to decrease the carbon flux through the ED pathway with lower carbon loss, is another possible way to improve the 2-KGA yield.
Figure caption (genome comparison diagram). The highly similar sequences of plasmid pKvSKV1 (from SKV), plasmid 1 (from Hbe602), pKVU_100 (from WSH-001) and pYP1 (from Y25) are inserted into the genome of K. vulgare SPU B805 (dark grey parts), while plasmid 2 (from Hbe602), pKVU_200 (from WSH-001) and pYP12 (from Y25) (white fragments) are missing from the K. vulgare SPU B805 genome.
Moreover, a total of 2933 coding sequences were obtained by RAST annotation in strain SPU B805. They belong to 26 subsystems, including cofactors, membrane transport, nucleotide metabolism, protein metabolism, regulation and cell signaling, and amino acid and carbohydrate metabolism. Among them, genes for amino acid metabolism accounted for the highest proportion (15%) (Fig. 2 and Supplementary Table S2). However, the genes coding for histidinol-phosphatase (EC 3.1.3.15), leucine-alanine transaminase (EC 2.6.1.12), alanine transaminase (EC 2.6.1.2) and asparagine synthetase (EC 6.3.1.1) were found to be absent in strain SPU B805, leaving the biosynthesis pathways of histidine, alanine and asparagine incomplete (Fig. 3). In strain WSH-001, by contrast, the biosynthesis pathways of more amino acids, such as histidine, glycine, lysine, proline, threonine, methionine, leucine and isoleucine, are deficient owing to the absence of one or more key enzymes 7 . The amino acid biosynthetic pathways in K. vulgare SPU B805 are thus more complete than those in the reported K. vulgare strain WSH-001. A previous study showed that glycine, serine, threonine and proline are important factors for 2-KGA biosynthesis, and that threonine significantly affects cell growth 8 . Glycine and serine are donors of one-carbon units in one-carbon metabolism, and threonine can be converted into glycine and acetyl-CoA.
Impairment of these amino acid biosynthesis pathways would weaken the biosynthesis of nucleic acids and proteins and the generation of methyl groups. Moreover, proline can reduce the damage to microbial cells caused by osmotic stress at high 2-KGA concentrations 18 . Therefore, the more complete biosynthesis pathways in strain SPU B805 presumably contribute to the growth of K. vulgare and the accumulation of 2-KGA. In addition, membrane transport genes are abundant in strain SPU B805, which might help it to transport and absorb the nutrients and metabolites released by B. megatherium 1 .

Construction of carbohydrate metabolic network.
Generally, prokaryotic carbohydrate metabolic pathways are diverse and include the EMP pathway, the ED pathway and the PPP pathway. However, genome annotation revealed that the genes coding for 6-phosphofructokinase (EC 2.7.1.11, Pfk), 2-keto-3-deoxy-6-phosphogluconate aldolase (EC 4.1.2.14, Eda) and phosphogluconate dehydratase (EC 4.2.1.12, Edd) are missing in strain SPU B805. Because Pfk is specific to the EMP pathway, the absence of pfk leaves the EMP route incomplete even though all the other enzymes are present. Similarly, Eda and Edd are the key enzymes unique to the ED pathway, so the absence of eda and edd implies that the ED pathway is disabled in strain SPU B805. The PPP pathway was therefore inferred to be the main carbohydrate metabolic pathway in strain SPU B805. The EMP pathways of strains WSH-001, Y25 and Hbe602 were also analyzed on the basis of their published genomes and found to be incomplete because the pfk gene is absent (Table 3); we therefore speculate that the ED and PPP pathways are their main carbohydrate metabolic routes. In strain SKV, however, pfk, eda and edd are all absent from the genome, so carbohydrate metabolism presumably proceeds mainly through the PPP pathway. In summary, there are two types of carbohydrate metabolism in K. vulgare: one runs through both the ED and PPP pathways (strains WSH-001, Y25 and Hbe602), and the other runs only through the PPP pathway (strains SKV and SPU B805). The loss of the ED pathway had no negative effect on 2-KGA production (Table 1), which further suggests that the ED pathway is negligible for cell growth and 2-KGA accumulation and that the PPP pathway plays an important role in cytoplasmic carbohydrate metabolism in K. vulgare.

Carbohydrate metabolism analysis based on transcriptome.
The abundance in FPKM (fragments per kilobase of transcript per million mapped fragments) was used to represent the gene transcriptional level 19 .
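To make the FPKM measure concrete, here is a minimal Python sketch of its standard definition: fragments mapped to a gene, normalized by gene length in kilobases and by the total number of mapped fragments in millions. The function name and the example numbers are hypothetical and are not taken from this study's data.

```python
def fpkm(gene_fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and total fragments must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (fragments per million)
    return gene_fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on a 1,000 bp gene in a library of 10 million fragments.
example_fpkm = fpkm(gene_fragments=500, gene_length_bp=1_000, total_mapped_fragments=10_000_000)
print(example_fpkm)
```

Because the measure is normalized for both gene length and library size, values for different genes within one sample can be compared against a common reference such as the median.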
For strain SPU B805, the FPKM values were divided into 11 grades (from Rank −4 to Rank +6), with a median of 195.564 (Fig. 4). The FPKM of the key genes related to the PPP pathway and TCA cycle (Zwf, EC 1. ... Fig. 4, and it is noteworthy that all of the genes related to carbohydrate metabolism (PPP, TCA) were transcribed at levels above the median. However, no transcription signals of pfk, ga2dh, edd and eda were detected, which is consistent with the absence of these genes from the genome of strain SPU B805 and indicates that the EMP and ED pathways are disabled. Because the PPP, ED and EMP pathways are three parallel carbon metabolic routes upstream of the TCA cycle, the PPP pathway, the only complete one, was assigned as the major cytoplasmic carbohydrate catabolic pathway linked to the TCA cycle in strain SPU B805. During fermentation, the average FPKM values of the five sdh genes (KVC_2764, KVC_2744, KVC_0718, KVC_1927 and KVC_0337; 24691.32, 10501.89, 2078.73, 750.78 and 220.99, respectively) were clearly higher than those of the genes for central carbon degradation, which partially explains why the majority of L-sorbose is converted to 2-KGA in strain SPU B805. Although the sndh gene (KVC_0605, average FPKM 479.86) had a relatively low transcriptional level, this did not affect the 2-KGA production rate because SDH is a dual-function enzyme that can oxidize L-sorbose to L-sorbosone and further oxidize it to 2-KGA 20 . The central carbon metabolism and the transcription levels (average FPKM values) are therefore summarized together in Fig. 5. Strain SPU B805 possesses two separate modes of L-sorbose catabolism. The major part of L-sorbose is oxidized to L-sorbosone in the periplasm by the membrane-bound SDH, and further to 2-KGA by SDH and SNDH.
The 2-KGA produced cannot be further converted to idonate because plasmid 2, and with it the ga2dh gene, is absent; 2-KGA production can therefore reach a higher level in SPU B805 than in Hbe602 and WSH-001. A minor part of the L-sorbose is catabolized as a carbon source to F6P and subsequently to 6PG in the cytoplasm, and then enters the PPP pathway for energy and biomass production. In the oxidative branch, 6PG is converted into ribulose 5-phosphate (Ru5P), carbon dioxide and NADPH. NADPH is the major reducing power used to maintain the redox balance under stress 21 . In the non-oxidative branch, the Ru5P formed is converted to R5P by ribose 5-phosphate isomerase (RpiA, EC 5.3.1.6) or to X5P by ribulose 5-phosphate epimerase (Rpe, EC 5.1.3.1). R5P and X5P undergo carbon rearrangement to generate sedoheptulose 7-phosphate (S7P), glyceraldehyde

RT-qPCR validation of the genes related to carbohydrate metabolism.
The key genes of the PPP pathway, zwf (KVC_1674), gnd (KVC_0890), transketolase (KVC_2247; tktA) and transaldolase B (KVC_1789; tal), were analyzed by RT-qPCR. Relative to the internal standard (polA), all four genes showed a relatively high transcription level, and their trends across time points were in accordance with the transcriptome results (see Supplementary Fig. S6). The gene transcription levels measured by RNA-seq and RT-qPCR were highly correlated, indicating the reliability of the RNA-seq analysis.
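The comparison with the internal standard can be illustrated with the common 2^-ΔCt calculation. This is only a sketch under stated assumptions: the section does not say which quantification method was used, perfect amplification efficiency (a doubling per cycle) is assumed, and the Ct values below are hypothetical.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Expression of a target gene relative to a reference gene as 2^-(Ct_target - Ct_reference).

    Assumes 100% amplification efficiency for both genes.
    """
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values; polA is the internal standard named in the text.
hypothetical_ct = {"zwf": 20.0, "polA": 23.0}
zwf_vs_pola = relative_expression(hypothetical_ct["zwf"], hypothetical_ct["polA"])
print(zwf_vs_pola)
```

A target that crosses the threshold three cycles earlier than the reference is, under these assumptions, about eight times more abundant.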

Discussion
For a long time, researchers have concentrated mainly on elucidating the symbiosis mechanism of K. vulgare and B. megatherium, whereas few studies have explored the mechanism underlying the differences in central carbon metabolic pathways and 2-KGA yield among K. vulgare strains. In this study, the genome of K. vulgare SPU B805 was completely sequenced and its metabolic network was reconstructed, which provides new insight into the specificity of its carbohydrate metabolism. K. vulgare SPU B805 is the first 2-KGA-producing strain found without any plasmid. Its genome is similar to that of K. vulgare SKV apart from some insertions, dislocations and rearrangements. The genome annotation showed that, except for histidine, alanine and asparagine, the biosynthetic pathways of the amino acids are complete in strain SPU B805, which is markedly more complete than in WSH-001 (eight amino acid biosynthesis pathways incomplete) 6 . Liu, et al. 7 reported that glycine, proline, threonine and isoleucine play vital roles in K. vulgare growth and 2-KGA production, and that the addition of these amino acids increased 2-KGA productivity by 20.4%, 17.2%, 17.2% and 11.8%, respectively. Zhang, et al. 8 also determined the key factors affecting 2-KGA fermentation by orthogonal design experiments; the results indicated that glycine and threonine were the key components affecting mixed cell growth, while serine, glycine and proline were the key components affecting 2-KGA production. The more complete amino acid biosynthetic pathways therefore likely contribute to the high 2-KGA production of K. vulgare SPU B805. Moreover, the absence of the SNDH gene carried on plasmid 2 avoids formation of the unknown byproduct during 2-KGA production. In addition, the ga2dh gene, responsible for the subsequent conversion of 2-KGA to idonate, is also absent from strain SPU B805, which is very advantageous for 2-KGA accumulation.
Additionally, the high transcriptional levels of the five sdh genes (KVC_2764, KVC_2744, KVC_0718, KVC_1927, KVC_0337) help to explain the high L-sorbose conversion ability of strain SPU B805. The genome and transcriptome analysis showed that the central carbon metabolism of K. vulgare is versatile. Strains WSH-001, Y25 and Hbe602 contain all the genes of the ED pathway, the PPP pathway and the TCA cycle, meaning that these pathways are complete [2][3][4] . However, the pfk gene is absent from the genomes of strains WSH-001, Y25 and Hbe602, indicating that the EMP pathway is nonfunctional in these strains. In strains SPU B805 and SKV, all the genes of the PPP pathway and the TCA cycle were found, meaning that these two pathways are complete 5 . The pfk, eda and edd genes are all missing from the genomes of strains SPU B805 and SKV, which suggests that neither the EMP nor the ED pathway is functional. Therefore, the PPP pathway is the exclusive pathway for cytoplasmic carbohydrate catabolism in strains SPU B805 and SKV, whereas in strains WSH-001, Y25 and Hbe602 the ED and PPP pathways operate in parallel. Although the ED pathway is complete in the genomes of strains WSH-001, Y25 and Hbe602, the carbon flux through it is presumably negligible because of its lower ATP yield 10 . Moreover, in G. oxydans 621H, deletion of gnd to inactivate the PPP pathway resulted in a reduced final biomass and less of the end-product 2-keto-gluconate, whereas inactivation of the ED pathway by deleting the edd-eda genes lowered the overall sugar uptake for biomass and energy formation in the cytoplasm and led to a larger fraction of the sugar being converted to 2-keto-gluconate in the periplasm 16 .
That work on G. oxydans indicated that the PPP pathway is of major importance for cytoplasmic carbohydrate catabolism, whereas the ED pathway is dispensable. Loss of the ED pathway may have a similar effect on 2-KGA production in K. vulgare. Although the EMP pathway has a higher ATP yield than the ED pathway, it requires more enzymes to sustain an equivalent carbon flux 10 . If protein synthesis is a limiting factor for cell growth, a bacterium usually does not synthesize additional protein merely for a little more energy 22,23 .
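The presence/absence reasoning used in this discussion to classify the strains can be sketched in Python. The gene table below only encodes what the text states (pfk absent in all five strains; edd and eda absent in SPU B805 and SKV; the PPP pathway complete in every strain); the data structures and function names are illustrative and do not represent the authors' annotation pipeline.

```python
# Marker genes whose absence disables a pathway, as argued in the text.
PATHWAY_MARKERS = {"EMP": {"pfk"}, "ED": {"edd", "eda"}}

# Presence (True) or absence (False) of the marker genes, as described in the text.
STRAIN_GENES = {
    "SPU B805": {"pfk": False, "edd": False, "eda": False},
    "SKV": {"pfk": False, "edd": False, "eda": False},
    "WSH-001": {"pfk": False, "edd": True, "eda": True},
    "Y25": {"pfk": False, "edd": True, "eda": True},
    "Hbe602": {"pfk": False, "edd": True, "eda": True},
}


def disabled_pathways(genes: dict) -> list:
    """Pathways lacking at least one of their marker genes."""
    return sorted(
        pathway
        for pathway, markers in PATHWAY_MARKERS.items()
        if not all(genes.get(marker, False) for marker in markers)
    )


def cytoplasmic_routes(genes: dict) -> list:
    """Routes left for cytoplasmic sugar catabolism; PPP is complete in all five strains."""
    routes = ["PPP"]
    if "ED" not in disabled_pathways(genes):
        routes.append("ED")
    return routes


for strain, genes in STRAIN_GENES.items():
    print(strain, "disabled:", disabled_pathways(genes), "routes:", cytoplasmic_routes(genes))
```

This reproduces the two groups described in the text: PPP only for SPU B805 and SKV, and PPP plus ED for WSH-001, Y25 and Hbe602.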
The PPP pathway, a fundamental component of cellular metabolism, plays an important role in maintaining carbon homeostasis 13 . It provides various intermediate metabolites, such as G3P, E4P, R5P and S7P, for the biosynthesis of nucleotides and amino acids, and generates NADPH to maintain the redox balance under stress 21 . Because K. vulgare is an aerobic bacterium, its growth and metabolism are inevitably exposed to reactive oxygen species (ROS) 24,25 , so a reducing environment is essential for cell metabolism. It is well known that NADPH is the major reducing power in the cell and plays an important role in glutathione (GSH) and thioredoxin metabolism [26][27][28] . In addition, many antioxidant enzymes use NADPH as a cofactor 29 ; the PPP pathway may thus help K. vulgare SPU B805 to counter intracellular ROS. Moreover, the biomass of K. vulgare is composed of protein, DNA, RNA, lipids, peptidoglycans, lipopolysaccharides, glycogen and a soluble pool 6,30 . It is conceivable that the PPP pathway provides a variety of intermediates as precursors to meet the demands of cellular component biosynthesis and to help the cell withstand different kinds of environmental stress during growth.

Materials and Methods
Bacterial strains. Two strains, K. vulgare SPU B805 (2-KGA-producing strain) and B. megatherium SPU B806 (the helper strain), which were stored in Microbial Resource Center of Shenyang Pharmaceutical University, were used for 2-KGA fermentation in this study.
To calculate the Colony Forming Units (CFUs) of K. vulgare, the fermentation broth was diluted and spread on P plate, and the growth curve was drawn according to the CFU/mL. Meanwhile, the cell density was measured spectrophotometrically at 660 nm.
Genome sequencing and assembly. K. vulgare SPU B805 was cultured in 250 mL flasks with 50 mL S medium at 30 °C for 18 h at 220 rpm. Genomic DNA of strain SPU B805 was extracted with the TIANamp Genomic DNA Kit (Tiangen Biotech (Beijing) Co., Ltd.) according to the manufacturer's instructions. The concentration and purity of the DNA sample were assessed with an Agilent 2100 BioAnalyzer (Agilent Technologies, USA), and the integrity of the genomic DNA was checked by agarose gel electrophoresis. DNA samples with a 260/280 nm absorbance ratio of 1.8-2.0 and a 260/230 nm absorbance ratio of 2.0-2.2 were considered pure and were used for library construction and sequencing.
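The purity criterion above amounts to two range checks on absorbance ratios. The following Python sketch applies the stated thresholds (inclusive bounds are assumed); the function name and the example absorbance readings are hypothetical.

```python
def is_pure_dna(a260: float, a280: float, a230: float) -> bool:
    """Apply the stated purity windows: A260/A280 in 1.8-2.0 and A260/A230 in 2.0-2.2."""
    if a280 <= 0 or a230 <= 0:
        raise ValueError("absorbance readings must be positive")
    ratio_260_280 = a260 / a280
    ratio_260_230 = a260 / a230
    return 1.8 <= ratio_260_280 <= 2.0 and 2.0 <= ratio_260_230 <= 2.2


# Hypothetical readings for two samples.
sample_ok = is_pure_dna(a260=1.9, a280=1.0, a230=0.9)   # ratios 1.90 and about 2.11
sample_bad = is_pure_dna(a260=1.5, a280=1.0, a230=0.9)  # 260/280 ratio of 1.50 is too low
print(sample_ok, sample_bad)
```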
The complete genome sequence was determined by the Chinese National Human Genome Center (Shanghai, China) using 454 single-end sequencing technology, which yielded a total of 116,450 reads with 17.29-fold coverage of the genome. The reads were assembled into 26 large contigs (>500 nucleotides) and 3 small contigs (100-500 nucleotides) with the 454 Newbler assembler (454 Life Sciences, Branford, CT). The relationships between the contigs were determined by ContigScape 31 , and the gaps between the contigs were closed by PCR amplification followed by DNA sequencing. Finally, the contigs were assembled into a circular chromosome, and no plasmid was found. The complete genome of SPU B805 had an error rate of less than 0.2 per 10 kb of sequence, as determined by sequence assembly and quality assessment using the Phred 32 /Phrap 33 /Consed 34 software package.
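The reported read count and fold coverage can be related through the usual definition, coverage = (number of reads × mean read length) / genome length. The Python sketch below inverts this to estimate the mean read length implied by the reported figures; it assumes the coverage was computed against the final 3,032,608 bp genome, which the text does not state explicitly, and the function names are illustrative.

```python
GENOME_LENGTH_BP = 3_032_608  # final genome length reported for SPU B805
READ_COUNT = 116_450          # reads reported for the 454 run
FOLD_COVERAGE = 17.29         # reported fold coverage


def fold_coverage(read_count: int, mean_read_length_bp: float, genome_length_bp: int) -> float:
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return read_count * mean_read_length_bp / genome_length_bp


def implied_mean_read_length(coverage: float, read_count: int, genome_length_bp: int) -> float:
    """Mean read length that would give the stated coverage."""
    return coverage * genome_length_bp / read_count


mean_read_length = implied_mean_read_length(FOLD_COVERAGE, READ_COUNT, GENOME_LENGTH_BP)
print(f"implied mean read length: {mean_read_length:.0f} bp")
```

Under these assumptions the implied mean read length is about 450 bp.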
Genome annotation. Protein-coding genes were predicted by combining the results of Glimmer 3.02 35 and ZCURVE 36 , followed by manual inspection. Each gene was functionally classified by assigning a Clusters of Orthologous Groups (COG) number. Both tRNA and rRNA genes were identified by tRNAscan-SE 37