Biochemical properties of GH94 cellodextrin phosphorylase THA_1941 from a thermophilic eubacterium Thermosipho africanus TCF52B with cellobiose phosphorylase activity

A hypothetical gene (THA_1941) encoding a putative cellobiose phosphorylase (CBP) from Thermosipho africanus TCF52B shares very low amino acid identity (less than 12%) with all known GH94 enzymes. This gene was cloned and over-expressed in Escherichia coli BL21(DE3). The recombinant protein, hypothesized to be a CBP, showed an optimum temperature of 75 °C and an optimum pH of 7.5. Beyond its CBP activity, this enzyme can use cellobiose and long-chain cellodextrins with a degree of polymerization greater than two as glucosyl acceptors, releasing phosphate from glucose 1-phosphate. The catalytic efficiencies (kcat/Km) indicated that cellotetraose and cellopentaose were the best substrates for the phosphorolytic and reverse synthetic reactions, respectively. These results suggest that this is the first enzyme shown to have both cellodextrin and cellobiose phosphorylase activities. Because it preferred cellobiose and cellodextrins to glucose in the synthetic direction, it was categorized as a cellodextrin phosphorylase (CDP). Owing to its unusual capacity for the reverse synthetic reaction, this enzyme is a potential catalyst for the synthesis of various oligosaccharides. The possible function of this CDP in the carbohydrate metabolism of T. africanus TCF52B is also discussed.

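CBP catalyzes the reversible phosphorolysis of cellobiose: cellobiose + Pi ⇌ G-1-P + d-glucose, where G-1-P denotes glucose 1-phosphate and Pi denotes inorganic phosphate. CDP catalyzes the corresponding reaction on longer chains: G(n+1) + Pi ⇌ G-1-P + G(n), where G(n) denotes a β-glucan oligomer of length n (n ≥ 2) and G(n+1) denotes a β-glucan oligomer of length n + 1.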
Although CBP and CDP belong to the same glycoside hydrolase family, they have different substrate specificities 13,14.
GH94 phosphorylases are involved in the intracellular catabolism of β-d-glycosides, enabling direct production of phosphorylated glucose without ATP consumption. Because of this energetic benefit, these enzymes could find broad use in metabolic engineering, biocatalysis, and in vitro synthetic biology. For example, CBPs have been introduced into non-cellulolytic ethanol-producing microbes, such as Saccharomyces cerevisiae and Escherichia coli, so that intracellular phosphorolysis of cellobiose improves the cells' bioenergetic state 15-19. Ha et al. 16 demonstrated that, compared with cells employing a β-glucosidase (BG) for intracellular cellobiose hydrolysis, CBP-containing S. cerevisiae produced more biomass and gave higher ethanol yields under strict anaerobic conditions and under acetate stress. Shin et al. 17 further showed that CBP-containing E. coli cells tolerated acetate better than BG-containing cells under both anaerobic and aerobic conditions. Another promising application of CBP is in vitro cascade biocatalysis. Through its phosphorolytic ability, this enzyme has been used to produce hydrogen in high yield from cellobiose and cellodextrins 20 and from oligoxylosaccharides 21. Also, by combining C. thermocellum CBP with potato alpha-glucan phosphorylase, a fraction of cellulose can be converted to artificial starch 22. Through their reverse synthetic ability, CBP and CDP can be used to synthesize diverse disaccharides, oligosaccharides, and glycolipids 23-25. The great potential of CBP and CDP motivated us to look for new members with more desirable properties, such as high thermostability, high activity, and broad substrate specificity.
In this study, the open reading frame (THA_1941) encoding a putative CBP from Thermosipho africanus TCF52B was cloned and overexpressed in E. coli BL21(DE3), and its basic biochemical properties were characterized. This is the first enzyme reported to have both cellodextrin and cellobiose phosphorylase activities.

Results
Discovery of a putative CBP from T. africanus TCF52B. With potential industrial applications in mind, and because thermophilic proteins expressed in a mesophilic host such as E. coli are easy to purify, we searched genomic databases of thermophilic microorganisms for putative thermostable CBP genes. T. africanus TCF52B, which was isolated from a high-temperature oil reservoir in the North Sea and has an optimal growth temperature of 70 °C, was sequenced and annotated in 2009 26. Although the locus THA_1941 (protein_id ACJ76363.1) was annotated as a hypothetical protein in both KEGG and NCBI, its sequence contains a region, COG3459, annotated as "cellobiose phosphorylase [carbohydrate transport and metabolism]". We therefore speculated that THA_1941 encodes a putative CBP. Signal peptide analysis predicted no signal peptide, indicating that the protein is located intracellularly, consistent with an intracellular function. However, sequence alignment with ClustalW showed that this putative 1,019-amino-acid enzyme had very low sequence identity to all characterized CBPs, as well as to characterized CDPs, LBPs, ChBPs, and CBAPs. Among them, it had the highest identity, 11.9%, with C. thermocellum CBP (GenBank No.: ABN51514.1).

Structural basis for the putative CBP's enzymatic function. A phylogenetic analysis was conducted to examine the relationship of THA_1941, the putative CBP, with characterized GH94 phosphorylases, including CBPs, CDPs, LBPs, ChBPs, and CBAPs. The phylogenetic tree was generated by the neighbour-joining method based on amino acid sequences (Fig. 1). All CBPs fell into one cluster, and ChBPs, CBAPs, and LBPs each formed their own cluster, while CDPs joined the same cluster as CBPs, ChBPs, and CBAPs only at a larger genetic distance. THA_1941, however, sat alone on the earliest-diverging clade, indicating that the putative CBP had the greatest genetic distance from all the others and therefore could not be assigned to any of the known GH94 enzyme groups.
Based on these structural similarities, THA_1941 was very likely to have a 3-D fold similar to that of the modular CBPs, ChBP, and CBAP, to belong to the GH94 family, and to share a similar catalytic mechanism.
Production, purification and CBP identification of THA_1941. Recombinant THA_1941 was over-expressed in E. coli BL21(DE3) harboring the expression plasmid, and was then purified to homogeneity by Ni-chelating column chromatography followed by anion-exchange column chromatography. The purity of the protein was confirmed by SDS-PAGE analysis (Fig. 3). The molecular weight of the purified protein was estimated to be approximately 120 kDa, in agreement with that deduced from its amino acid sequence. The purified protein released Pi from G-1-P with either d-glucose or d-xylose as a glucosyl acceptor. It also showed phosphorolytic activity towards cellobiose. Therefore, the hypothetical protein THA_1941 was validated as a CBP enzyme (denoted TaCBP). TaCBP showed the highest activity at 75 °C (Fig. 4) and pH 7.5 (Fig. 5), and retained 80% of its initial activity after incubation at 75 °C and pH 7.5 for 30 min (Fig. 6), indicating good thermostability.
Substrate specificity of TaCBP in the synthetic reaction. The synthetic reaction rates of TaCBP on various sugars at 10 mM are presented in Table 1. TaCBP showed measurable activity on all nine monosaccharides, among which d-glucose gave the highest rate, 1.86 µmol/min/mg, followed by d-glucosamine. Of the three disaccharides tested, both d-cellobiose and d-gentiobiose acted as glucosyl acceptors, with rates 31.5- and 26.9-fold higher than that of d-glucose, respectively. By comparison, the third disaccharide, d-maltose, showed rather weak activity; moreover, the possibility that this activity arose from d-glucose contaminating the d-maltose has not been excluded. Although a wide range of substrate specificity is common in the synthetic reaction of CBPs 3,7,10,32, no CBP has been reported to use cellobiose as a glucosyl acceptor. This result suggested that TaCBP was the first enzyme having both CBP and CDP activities.
Identification of the CDP function. TaCBP's activities towards cellodextrins of different chain lengths, each at 5 mM, were investigated in both the synthetic and the phosphorolytic reaction. Table 2 shows the activities of TaCBP with cellodextrins from cellobiose to cellopentaose as glucosyl acceptors. The enzyme showed its highest activity, 109.3 µmol/min/mg, on cellotetraose, while retaining 38.6% relative activity on cellobiose. In the phosphorolytic reaction, TaCBP showed measurable activity on all tested cellodextrins, although the reaction rates were much lower than those of the synthetic reaction. As in the synthetic reaction, TaCBP exhibited the highest activity on cellotetraose. The fact that TaCBP can phosphorolyze both cellobiose and cellodextrins with a DP greater than two further validated that TaCBP has the catalytic function of a CDP enzyme.
Figure. Amino acid sequence alignment of THA_1941 with structure-solved GH94 enzymes. Sequence alignment of THA_1941 with structure-solved CBPs from Cellvibrio gilvus (CgCBP, BAA28631.1), Cellulomonas uda (CuCBP, AAQ20920.1), and Clostridium thermocellum (CtCBP, AAL67138.1), ChBP from Vibrio proteolyticus (VpChBP, BAC87867.1), and CBAP from Saccharophagus degradans (SdCBAP, ABD80168.1) was performed using the program ClustalX2 42 and formatted with BioEdit. Secondary structure was predicted with the PSIPRED server 43, and the predicted secondary structural elements are marked above the alignment. Secondary structure elements from CgCBP, VpChBP, and SdCBAP are shown below the sequence alignment (the secondary structure of CgCBP represents that of CuCBP and CtCBP because of their high similarity). Arrows and columns represent β strands and α helices, respectively. Conserved residues in the N-terminal sandwich domain (Glyco_trans_36, PF06165, reclassified into GH94) are highlighted with red rectangles. The catalytic residues are marked with red stars, while the phosphate binding sites and the sugar binding sites are marked with triangles and dots, respectively.
Kinetic parameters. In the synthetic reaction, initial reaction rates were measured at varying concentrations of d-glucose, d-xylose, G-1-P, and cellodextrins, and the kinetic parameters are summarized in Table 3. TaCBP showed a much higher catalytic efficiency on d-glucose (30.3 s−1) than on d-xylose (1.37 s−1). The Km values for cellodextrins decreased with increasing substrate chain length, suggesting that TaCBP has higher affinity for longer-chain substrates. The kcat for cellotetraose was the highest among the cellodextrins tested, yet cellopentaose was the best substrate in terms of kcat/Km.
The kinetic parameters for a series of cellodextrins in the phosphorolytic reaction were determined in the presence of 100 mM inorganic phosphate (Table 4). As in the synthetic reaction, the Km values for cellodextrins decreased with increasing substrate chain length. Cellotetraose was the best substrate for phosphorolysis in terms of kcat/Km, consistent with the fact that TaCBP had the highest phosphorolytic activity towards cellotetraose.
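For readers less familiar with the quantities reported in Tables 3 and 4, the relations below sketch how kcat, Km, and kcat/Km are connected, assuming simple Michaelis-Menten behaviour for the varied substrate while the co-substrate is held at a fixed concentration. The symbols v (initial rate), [S] (substrate concentration), and [E]_0 (total enzyme concentration) are introduced here for illustration only and do not appear in the original text.

```latex
v = \frac{k_{cat}\,[E]_0\,[S]}{K_m + [S]},
\qquad
V_{max} = k_{cat}\,[E]_0,
\qquad
v \approx \frac{k_{cat}}{K_m}\,[E]_0\,[S] \quad \text{for } [S] \ll K_m .
```

Under this assumption a lower Km raises kcat/Km even when kcat is not the largest, which is how cellopentaose can be the best synthetic substrate by kcat/Km while cellotetraose has the highest kcat.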

Discussion
We validated that the hypothetical protein THA_1941 from T. africanus TCF52B is an enzyme having both CDP and CBP activities. The two activities were at a similar level in the phosphorolytic reaction, but in the synthetic reaction the enzyme showed remarkably higher specificity for cellodextrins than for d-glucose and d-xylose, indicating that its CDP activity is higher than its CBP activity. Therefore, this enzyme was categorized as a cellodextrin phosphorylase (EC 2.4.1.49) and designated TaCDP despite its initial name, TaCBP.
Compared with the properties of known CBPs and CDPs (Table 5), TaCDP had the highest optimal temperature (75 °C) among known CDPs, which points to its superior thermostability. As for substrate specificity, no known CDP has been reported to phosphorolyze cellobiose, nor to use monosaccharides such as d-glucose, d-xylose, and d-glucose derivatives as glucosyl acceptors in the reverse synthetic reaction. Unlike known CDPs, TaCDP showed a wide range of substrate specificity in both reaction directions, making it a unique bifunctional enzyme with both CDP and CBP activities. Notably, TaCDP had much lower catalytic efficiency in the phosphorolytic reaction than in the reverse synthetic reaction, indicating a much greater preference for the synthetic reaction. Moreover, TaCDP's synthetic activity was the highest among the known CDPs when their highest kcat values for cellodextrins in the synthetic reaction are compared: the highest kcat of TaCDP (612 s−1, 60 °C) was 13-fold and 38-fold that of RaCDP (47.1 s−1, 37 °C) 33 and CtCDP (16.2 s−1, 37 °C) 13, respectively.
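The fold differences quoted for the kcat comparison follow directly from the reported values; a minimal sketch of the arithmetic is given here. The dictionary layout and function name are illustrative only, and because the values were measured at different temperatures (60 °C versus 37 °C), the ratios are not a like-for-like comparison.

```python
# Highest reported kcat values (s^-1) for cellodextrins in the synthetic
# reaction, copied from the text; assay temperatures differ between enzymes.
KCAT_PER_S = {"TaCDP": 612.0, "RaCDP": 47.1, "CtCDP": 16.2}


def fold_over(reference: str, other: str, table: dict = KCAT_PER_S) -> float:
    """Return kcat(reference) / kcat(other)."""
    return table[reference] / table[other]


fold_vs_racdp = fold_over("TaCDP", "RaCDP")
fold_vs_ctcdp = fold_over("TaCDP", "CtCDP")

print(f"TaCDP / RaCDP: {fold_vs_racdp:.1f}")
print(f"TaCDP / CtCDP: {fold_vs_ctcdp:.1f}")
```

The ratios come out at roughly 13 and 38, matching the figures stated in the text.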
Though TaCDP's enzymatic functions were characterized in vitro, it was difficult to investigate its biological function in T. africanus TCF52B without genetic modification tools. As suggested by Taylor II et al. 34, both CBPs and CDPs belong to the class of "cellulase accessory enzymes", which act on cellulose oligosaccharides, the products of cellulases, i.e. β-1,4-endoglucanases and/or cellobiohydrolases. A search of the genomic sequence of T. africanus TCF52B found only two ORFs (THA_83 and THA_328) that might be endoglucanases related to cellulose degradation, and neither has a putative signal peptide. Hence, we presumed that T. africanus TCF52B is unlikely to be a lignocellulose-degrading bacterium. Given that TaCDP's phosphorolysis efficiency for cellodextrins was much lower than its synthesis efficiency, we speculated that TaCDP might be responsible for cellodextrin formation rather than cellodextrin degradation, and that the cellodextrins formed by TaCDP might serve as an intracellular energy reserve, like the poly-β-hydroxybutyrate accumulated in many bacteria 35 and the glycogen accumulated by C. cellulolyticum 36. A possible carbohydrate metabolism pathway was constructed to illustrate the role of TaCDP (Fig. 7). Here, when T. africanus TCF52B grows under nutrient-rich conditions, cytoplasmic G-6-P is converted to G-1-P by phosphoglucomutase (loci THA_RS09865 and THA_1027), and TaCDP then transfers the glucose unit of G-1-P to d-glucose or other monosaccharides, forming progressively longer cellodextrins. When the bacterium grows under nutrient-poor conditions, the accumulated cellodextrins are hydrolyzed by intracellular β-glucosidases (loci THA_1926 and THA_1942). This scheme for synthesizing energy reserves uses ATP-derived energy more efficiently than glycogen synthesis, as only one ATP is consumed per glucose unit added, whereas glycogen synthesis consumes two.
In fact, we did not find any putative UDP/ADP-glucose pyrophosphorylases, the enzymes that form the activated glucosyl donor for glycogen synthesis, in the T. africanus TCF52B genome, which lends support to our hypothesis about TaCDP's function.
Phosphorolytic enzymes like CBPs and CDPs have advantages over chemical catalysts in oligosaccharide synthesis because they are both stereoselective and regiospecific 37,38. TaCDP had high synthetic activities towards monosaccharides, disaccharides, and long-chain oligosaccharides, making it a valuable biocatalyst for cost-effective enzymatic synthesis of various oligosaccharides.

Methods
Cloning and expression of T. africanus THA_1941. The DNA sequence of THA_1941 can be found in the KEGG database. The gene was amplified by polymerase chain reaction (PCR) from T. africanus TCF52B genomic DNA using 5′-CCTAG CTAGC ATGAA AAAAT TTGAC TTTGT G-3′ and 5′-CCGCT CGAGT TCAAA ATAAC ATATA ACTTC GTC-3′ as the forward and reverse primers, which contain NheI (GCTAGC) and XhoI (CTCGAG) restriction sites, respectively. The PCR product was digested with NheI and XhoI prior to insertion into pET21a(+). The ligation product was transformed into E. coli DH5α competent cells, and the plasmid was verified by DNA sequencing (Genewiz Inc., China). The plasmid was then transformed into E. coli BL21(DE3) competent cells. A single colony was picked and grown in Luria-Bertani (LB) medium supplemented with 50 μg/ml ampicillin. The culture was grown at 37 °C and 220 rpm until the absorbance at 600 nm reached 0.6-0.8. Protein expression was induced by adding isopropyl β-d-1-thiogalactopyranoside (IPTG) to a final concentration of 1 mM, and the culture temperature was lowered to 25 °C for six hours. The cells were harvested by centrifugation at 3,800 × g for 10 min at 4 °C.
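As a quick sanity check on the primer design, the sketch below locates the NheI and XhoI recognition sequences in the two primers and confirms that the forward primer continues with an ATG start codon. The recognition sequences GCTAGC (NheI) and CTCGAG (XhoI) are the standard ones; the helper names `clean` and `find_site` are hypothetical and not part of the original work.

```python
# Standard recognition sequences of the two restriction enzymes.
SITES = {"NheI": "GCTAGC", "XhoI": "CTCGAG"}

# Primers as printed in the Methods (5' to 3'), spaces included.
FORWARD = "CCTAG CTAGC ATGAA AAAAT TTGAC TTTGT G"
REVERSE = "CCGCT CGAGT TCAAA ATAAC ATATA ACTTC GTC"


def clean(primer: str) -> str:
    """Remove grouping spaces and normalise to upper case."""
    return primer.replace(" ", "").upper()


def find_site(primer: str, site: str) -> int:
    """Return the 0-based index of the site in the primer, or -1 if absent."""
    return clean(primer).find(site)


nhe_pos = find_site(FORWARD, SITES["NheI"])
xho_pos = find_site(REVERSE, SITES["XhoI"])

# Codon immediately after the NheI site in the forward primer.
after_nhe = clean(FORWARD)[nhe_pos + len(SITES["NheI"]):][:3]

print("NheI site starts at index", nhe_pos, "in the forward primer")
print("XhoI site starts at index", xho_pos, "in the reverse primer")
print("Codon following the NheI site:", after_nhe)
```

Each site is preceded by a few extra 5′ bases, which is the usual way to help the enzyme cut close to the end of a PCR product.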
Enzyme purification. The cell pellets were re-suspended in 20 mM Tris-HCl buffer (pH 7.0) with 500 mM NaCl (pH 8.0) and disrupted by sonication. The cell lysate was centrifuged at 10,000 × g at 4 °C for 20 min, and the supernatant was applied to a nickel-charged resin column (Takara, Shiga, Japan). The column was washed stepwise with buffer containing 20, 50, 200, and 500 mM imidazole. The fraction eluted with 200 mM imidazole was collected and dialyzed against 20 mM Tris-HCl buffer (pH 7.4). After centrifugation, the enzyme solution was loaded onto a pre-equilibrated HiTrap Q HP column (GE Healthcare), and elution was performed with a linear gradient of NaCl from 100 to 500 mM. Under these conditions, most of the target protein was found in the fractions eluting at 200 mM NaCl. The purity of each fraction was assessed by SDS-PAGE 39. Only fractions showing a single band were pooled, and the concentration of the resulting purified sample was determined to be 1.40 mg/ml using the Bradford method with bovine serum albumin as the standard 40.
Enzyme assays. In the synthetic reaction, CBP activity was assayed by measuring the amount of Pi liberated from G-1-P 32. d-Glucose was routinely used as the glucosyl acceptor unless otherwise noted. A 200 μl reaction mixture contained 50 μl of appropriately diluted enzyme, 40 mM G-1-P, 1 mM MgCl2, 10 mM dithiothreitol (DTT), 20 mM d-glucose, and 50 mM Tris-HCl buffer (pH 7.5). The mixture was incubated for 15 min at 60 °C, and the reaction was then terminated by adding 2 ml of molybdate reagent (15 mM ammonium molybdate, 100 mM zinc acetate [pH 5.0]), followed by 500 μl of ascorbic acid reagent (10% [wt/vol], pH 5.0). This mixture was incubated at 30 °C for 15 min, and the absorbance was measured at 850 nm. One unit of CBP activity in the synthetic reaction was defined as the amount of enzyme that produced 1 μmol of phosphate per min.
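The unit definition above translates into a specific activity (µmol/min/mg) once the amount of phosphate released is known. The sketch below shows that conversion under stated assumptions: the phosphate amount is a hypothetical number chosen for illustration (in practice it would come from a phosphate standard curve, which is not described in the text), the 200 µl volume and 15 min incubation are taken from the assay description, and the 20 µg/ml enzyme concentration is the value given elsewhere in the Methods for d-glucose. The function names are hypothetical.

```python
def enzyme_mass_mg(conc_ug_per_ml: float, volume_ul: float) -> float:
    """Enzyme mass in mg from a final concentration (µg/ml) and a volume (µl)."""
    return conc_ug_per_ml * (volume_ul / 1000.0) / 1000.0


def specific_activity(phosphate_umol: float, minutes: float, enzyme_mg: float) -> float:
    """Specific activity in µmol phosphate released per min per mg enzyme."""
    if minutes <= 0 or enzyme_mg <= 0:
        raise ValueError("incubation time and enzyme mass must be positive")
    return phosphate_umol / minutes / enzyme_mg


# Illustrative inputs only; the phosphate amount is NOT a measured value.
mass_mg = enzyme_mass_mg(conc_ug_per_ml=20.0, volume_ul=200.0)  # 0.004 mg
activity = specific_activity(phosphate_umol=0.1116, minutes=15.0, enzyme_mg=mass_mg)

print(f"enzyme in assay: {mass_mg * 1000:.1f} µg")
print(f"specific activity: {activity:.2f} µmol/min/mg")
```

With these illustrative inputs the result is 1.86 µmol/min/mg, the same order as the rate reported for d-glucose; the point is the unit bookkeeping, not the particular number.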
The phosphorolytic activity of TaCBP was assayed by measuring the formation of G-1-P from cellobiose (or cellodextrins when stated). A 40 μl reaction mixture containing 5 mM cellobiose or cellodextrins, 100 mM sodium phosphate buffer (pH 7.5), and an appropriate amount of enzyme was incubated for 15 min at 60 °C. The reaction was stopped by boiling for 10 min, and the amount of G-1-P produced was determined by a coupled enzyme assay measuring the appearance of NADPH at 340 nm. The assay mixture contained phosphoglucomutase (4.0 U/ml) and glucose-6-phosphate dehydrogenase (2.0 U/ml), both purchased from Sigma, 3 mM NADP+, and 5 μM glucose 1,6-bisphosphate (Sigma) in 80 mM triethanolamine buffer (pH 7.5) containing 4.4 mM MgCl2. One unit of phosphorolytic activity was defined as the amount of enzyme that released 1 μmol of G-1-P per min.
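In the coupled assay each G-1-P is converted to G-6-P and then oxidized with formation of one NADPH, so the G-1-P concentration can be read from the increase in absorbance at 340 nm. The sketch below applies the Beer-Lambert law with the commonly used NADPH extinction coefficient of 6.22 mM−1 cm−1. The 1 cm path length, the dilution factor, and the example absorbance are assumptions for illustration and are not stated in the text.

```python
# Widely used molar extinction coefficient of NADPH at 340 nm.
NADPH_EXT_COEFF_PER_MM_CM = 6.22


def g1p_millimolar(delta_a340: float, path_cm: float = 1.0,
                   dilution_factor: float = 1.0) -> float:
    """G-1-P concentration (mM) in the original sample.

    Assumes 1:1 stoichiometry between G-1-P consumed and NADPH formed.
    dilution_factor is the fold dilution of the sample in the cuvette.
    """
    if path_cm <= 0 or dilution_factor <= 0:
        raise ValueError("path length and dilution factor must be positive")
    nadph_mm = delta_a340 / (NADPH_EXT_COEFF_PER_MM_CM * path_cm)
    return nadph_mm * dilution_factor


# Hypothetical reading: an absorbance increase of 0.311 in a 1 cm cuvette.
example_mm = g1p_millimolar(0.311)
example_diluted_mm = g1p_millimolar(0.311, dilution_factor=10.0)

print(f"undiluted sample: {example_mm:.3f} mM G-1-P")
print(f"10-fold diluted sample: {example_diluted_mm:.3f} mM G-1-P")
```

Dividing the resulting amount of G-1-P by the incubation time and the enzyme mass then gives the phosphorolytic activity in the units defined above.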
Optimum pH, optimum temperature, and thermal stability. All determinations in this section were based on the synthetic reaction. The optimum pH was investigated by measuring the enzyme activity as described above at various pH values (citric acid-sodium citrate buffers for pH 3.0-6.6, Tris-HCl buffers for pH 7.1-8.9, and glycine-NaOH buffers for pH 8.6-10.6). The optimum temperature was measured over a temperature range of 50-100 °C at pH 7.5. To evaluate thermostability, the enzyme (0.092 mg/ml) was incubated at 75 °C and pH 7.5 for different time periods (5-30 min), and the residual enzyme activity was measured. All assays were performed in triplicate.
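Residual-activity data of this kind are often summarized as a half-life. The sketch below does this for the single reported point (80% residual activity after 30 min at 75 °C, from the Results), assuming single-exponential, first-order inactivation. That kinetic model is an assumption made here for illustration; the original text does not fit or claim any inactivation model, so the resulting half-life is only a rough extrapolation.

```python
import math


def first_order_rate_constant(residual_fraction: float, minutes: float) -> float:
    """Inactivation rate constant (min^-1), assuming A(t) = A0 * exp(-k * t)."""
    if not 0.0 < residual_fraction < 1.0 or minutes <= 0:
        raise ValueError("need 0 < residual_fraction < 1 and minutes > 0")
    return -math.log(residual_fraction) / minutes


def half_life(rate_constant: float) -> float:
    """Time (min) for activity to fall to 50% under first-order decay."""
    return math.log(2.0) / rate_constant


# Reported point: 80% of initial activity left after 30 min at 75 °C, pH 7.5.
k_inact = first_order_rate_constant(0.80, 30.0)
t_half = half_life(k_inact)

print(f"k = {k_inact:.5f} per min, extrapolated half-life = {t_half:.0f} min")
```

Under this assumption the extrapolated half-life at 75 °C is on the order of an hour and a half; a full time course would be needed to confirm that the decay is actually first order.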
Substrate specificity. The substrate specificity in the synthetic direction was investigated by measuring initial velocities (μmol/min/mg) for various sugars at 10 mM with 40 mM G-1-P at 60 °C. The final enzyme concentration used for each sugar was adjusted so that the amount of Pi released fell within the range suitable for accurate determination. The final enzyme concentrations were thus 0.67 µg/ml and 2 µg/ml for d-cellobiose and d-gentiobiose, respectively, 20 µg/ml for d-glucose, d-maltose, and 2-deoxy-d-glucose, and 100 µg/ml for all other sugars.
The substrate specificity in the phosphorolytic direction was investigated by measuring the initial velocities for different cellodextrins at 5 mM and 60 °C with 87.5 µg/ml of the enzyme. The other conditions were the same as for the phosphorolytic activity assay described above.

Kinetic analyses. To determine the kinetic parameters for the synthetic reaction, a 200 µl reaction system was used, and the final enzyme concentrations for d-glucose, d-xylose, and cellodextrins were 20, 100, and 0.67 µg/ml, respectively. Initial reaction rates were determined at varying concentrations of d-glucose (0.5-10 mM), d-xylose (5-100 mM), or cellodextrins (0.5-5 mM) with G-1-P fixed at 40 mM. To determine the kinetic parameters for G-1-P, the reaction mixture contained d-glucose fixed at 20 mM, G-1-P at 0.5-10 mM, and 20 µg/ml of the enzyme. To determine the kinetic parameters of the phosphorolytic reaction for cellodextrins, a 40 µl reaction system and an enzyme concentration of 100 µg/ml were used. Initial reaction rates were determined at varying concentrations of cellodextrins in 100 mM sodium phosphate buffer (pH 7.5). Both the synthetic and the phosphorolytic reactions were incubated at 60 °C for 10 min. Each result was the average of at least three repetitions. Km and kcat values were calculated from Hanes-Woolf plots.
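A Hanes-Woolf plot linearizes the Michaelis-Menten equation as [S]/v = [S]/Vmax + Km/Vmax, so the slope gives 1/Vmax and the intercept gives Km/Vmax. The sketch below fits that line by ordinary least squares and converts Vmax to kcat. It is an illustration of the general technique, not the authors' actual analysis: the data are synthetic and noise-free, the "true" Km and Vmax are arbitrary, and the 120 kDa mass is the approximate SDS-PAGE estimate from the Results, taken here as the mass per active site.

```python
def hanes_woolf_fit(substrate_mm, rates):
    """Fit [S]/v = [S]/Vmax + Km/Vmax by least squares; return (Km, Vmax)."""
    if len(substrate_mm) != len(rates) or len(substrate_mm) < 2:
        raise ValueError("need at least two matched ([S], v) points")
    y = [s / v for s, v in zip(substrate_mm, rates)]
    n = len(substrate_mm)
    mean_s = sum(substrate_mm) / n
    mean_y = sum(y) / n
    sxx = sum((s - mean_s) ** 2 for s in substrate_mm)
    sxy = sum((s - mean_s) * (yi - mean_y) for s, yi in zip(substrate_mm, y))
    slope = sxy / sxx                    # equals 1 / Vmax
    intercept = mean_y - slope * mean_s  # equals Km / Vmax
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


def kcat_per_second(vmax_umol_min_mg: float, mass_kda: float) -> float:
    """Convert Vmax (µmol/min/mg) to kcat (s^-1).

    1 mg of a protein of M kDa contains 1/M µmol of enzyme.
    """
    return vmax_umol_min_mg * mass_kda / 60.0


# Synthetic data from assumed Km = 2.0 mM and Vmax = 100 µmol/min/mg.
TRUE_KM, TRUE_VMAX = 2.0, 100.0
s_values = [0.5, 1.0, 2.0, 3.0, 5.0]
v_values = [TRUE_VMAX * s / (TRUE_KM + s) for s in s_values]

km_fit, vmax_fit = hanes_woolf_fit(s_values, v_values)
kcat_fit = kcat_per_second(vmax_fit, mass_kda=120.0)

print(f"Km = {km_fit:.2f} mM, Vmax = {vmax_fit:.1f} µmol/min/mg, kcat = {kcat_fit:.0f} per s")
```

With real, noisy measurements the Hanes-Woolf transform weights points differently from a direct nonlinear fit, so the two approaches can give somewhat different Km and kcat estimates.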