Structural basis for broad substrate specificity of UDP-glucose 4-epimerase in the human milk oligosaccharide catabolic pathway of Bifidobacterium longum

Infant gut-associated bifidobacteria have a metabolic pathway that specifically utilizes lacto-N-biose I (Gal-β1,3-GlcNAc) and galacto-N-biose (Gal-β1,3-GalNAc) from human milk and mucin glycans. UDP-glucose 4-epimerase (GalE) from Bifidobacterium longum (bGalE) catalyzes epimerization reactions of UDP-Gal into UDP-Glc and UDP-GalNAc into UDP-GlcNAc with the same level of activity, which is required to send galacto-hexoses into glycolysis. Here, we determined the crystal structures of bGalE in three ternary complex forms: NAD+/UDP, NAD+/UDP-GlcNAc, and NAD+/UDP-Glc. The broad specificity of bGalE was explained by structural features of the binding pocket for the N-acetyl or C2 hydroxy group of the substrate. Asn200 is located in the pocket for the C2 group, and its side chain adopts different conformations in the complex structures with UDP-Glc and UDP-GlcNAc. On the other side, Cys299 forms a large pocket for the C5 sugar ring atom. The flexible C2 pocket and the large C5 pocket of bGalE are suitable for accommodating both the hydroxy and N-acetyl groups of the substrate during sugar ring rotation in the catalytic cycle. The substrate specificity and active site structure of bGalE were distinct from those of Escherichia coli GalE but similar to those of human GalE.


Results and Discussion
Overall structure. The molecular masses of the purified recombinant bGalE protein (348 amino acids), as deduced from the amino acid sequence and as estimated by SDS-PAGE and gel filtration chromatography, were 38.3, 38, and 68 kDa, respectively, suggesting that this protein is dimeric in solution. Crystals of bGalE belong to the P6₅22 space group and contain one monomer per asymmetric unit. Since the recombinant bGalE protein contains one NAD+ molecule per monomer as a noncovalently bound prosthetic group, we prepared ternary complex form crystals with UDP, UDP-Glc, or UDP-GlcNAc by cocrystallization and determined their crystal structures at 1.8-2.0 Å resolution (Table 2). A plausible dimer assembly formed by a crystallographic 2-fold rotation symmetry is present in the crystal packing (Fig. 3A). A molecular interface analysis using PISA 36 indicates that the whole surface area, buried area, and ΔGint value (the solvation free energy gain upon formation of the assembly) are 23,790 Å², 6,340 Å², and −44.7 kcal/mol, respectively. The dimer interface consists of a bundle of 4 α-helices and contains 8 hydrogen bonds and 4 salt bridges. bGalE has a typical fold of the SDR superfamily 35. The overall monomer protein structure can be roughly divided into an N-terminal NAD+ binding domain (N domain, residues 1-177 and 236-262) and a C-terminal UDP-sugar binding domain (C domain, residues 178-235 and 263-340). The N domain adopts a typical Rossmann fold composed of a seven-stranded parallel β-sheet flanked by nine α-helices. The C domain possesses an α/β motif consisting of four α-helices and four β-strands. The three ternary complex structures have virtually the same main chain structures because the root mean square deviation (RMSD) values for the Cα atoms (no distance cutoff) between them are less than 0.
NAD+ binding site. The electron density of the NAD+ molecule bound in the N domain was clearly observed in all three complex structures. Figure 3B shows the mFo-DFc omit map of NAD+ in the UDP complex. The nicotinamide ribose and adenine ribose moieties are in syn-C2′-endo and anti-C2′-endo conformations, respectively. The ribose conformations of NAD+ are identical to those observed in eGalE 37 and other GalE enzymes studied so far. Figure 4A shows interactions of the NAD+ molecule with bGalE. Residues recognizing the NAD+ cofactor are basically conserved in GalEs. In bGalE, the nicotinamide ribose moiety interacts with Phe101, Lys154, and Tyr150, the pyrophosphate moiety interacts with Phe12, Ile13, and Lys85, and the adenosine base moiety interacts with Asn33, Asp59, Val60, and Asn100. A loop region (Asp32-Ser37, magenta in Fig. 4A) forms several direct hydrogen bonds with the base, ribose, and phosphate groups of the adenosine moiety.
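The dimer assignment in the overall-structure paragraph follows from simple arithmetic on the reported masses and interface areas. The short Python sketch below reproduces that reasoning. The function name and the rounding rule are illustrative assumptions, and reading the PISA "whole surface area" as the denominator for a buried fraction is an interpretation made here, not a value stated in the text.

```python
def oligomer_estimate(native_kda, monomer_kda):
    """Return the native/monomer mass ratio and the nearest subunit count."""
    ratio = native_kda / monomer_kda
    return ratio, max(1, round(ratio))


# Values quoted in the text.
monomer_kda = 38.3   # deduced from the amino acid sequence
native_kda = 68.0    # estimated by gel filtration chromatography

ratio, subunits = oligomer_estimate(native_kda, monomer_kda)

# PISA values quoted in the text (Å^2); the ratio below is an assumed,
# illustrative way of expressing how much surface the interface buries.
whole_surface_area = 23790.0
buried_area = 6340.0
buried_fraction = buried_area / whole_surface_area

print(f"mass ratio = {ratio:.2f}, nearest subunit count = {subunits}")
print(f"buried fraction = {buried_fraction:.1%}")
```

A ratio below 2 is not unusual for gel filtration, which reports on hydrodynamic size rather than mass, so the nearest integer is used here only as a rough guide.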
The cofactor NAD+ was intrinsically bound to bGalE without any supplementation throughout experimental processes in all complexes. It was reported that when NAD+ was removed from the GalE proteins, the proteins were irreversibly denatured 37. The loop region of Asp32-Ser37 in the N domain interacts with the adenosine moiety of NAD+ (magenta in Fig. 3A). The corresponding region (residues 32-43) in P. calidifontis GalE was defined as the "NAD-binding loop" 33. The NAD-binding loop is present in all GalE homologs (Fig. 5). In contrast, l-threonine dehydrogenases (l-ThrDHs), which share a similar structural fold with GalEs, lack this loop and easily release NAD+ from the protein 38. Sakuraba et al. indicated that the NAD-binding loop of GalEs plays a key role in preventing the release of the catalytically relevant cofactor and contributing to the protein stability of P. calidifontis GalE 33.
UDP-sugar binding site. As shown in Fig. 3C-E, the electron densities of UDP, UDP-Glc, and UDP-GlcNAc were clearly observed. Interactions of UDP-Glc and UDP-GlcNAc with bGalE are shown in Fig. 4B. Residues recognizing the UDP moiety are basically conserved in GalEs. In bGalE, the UDP moiety is recognized by stacking and hydrophobic interactions by Tyr291 and Leu201, hydrogen bonds by Asp295 and Asn180, and salt bridges by Arg292 and Arg232. These interactions are identically present in the UDP complex (data not shown). The hydroxy groups of the Glc moiety are recognized by hydrogen bonds from the main chain of Lys85 (O2 and O3) and the side chain atoms of Asn200 (O2), Tyr150 (O3 and O4), Ser125 (O4), and Asn180 (O6). The GlcNAc moiety is similarly recognized, but a significant displacement of the side chain of Asn200 was observed. In the UDP-GlcNAc complex, the side chain of Asn200 adopts a distinct conformation (designated as "swing out") from that in the UDP-Glc complex ("swing in") to accommodate the 2-N-acetyl group. This phenomenon was also reported for the corresponding residue (Asn207) in the UDP-GlcNAc complex of hGalE 30 (discussed below).
The glucopyranose rings of the UDP-sugars in bGalE are both in the "standard" conformation (Fig. 1), in which the H4 atom faces the pro-S side of the NAD+ nicotinamide. The distances between the C4 atom of the sugar ring and the C4 atom of the nicotinamide ring are 3.4 Å and 3.5 Å in the UDP-Glc and UDP-GlcNAc complex structures, respectively. In the proposed reaction mechanism of GalE, a hydride transfer is thought to occur between these carbon atoms, leading to formation of a 4-ketopyranose intermediate and NADH (Fig. 1) 37. The C4-C4 distances reported so far for productive binding modes fall within the range of 3.0-3.7 Å 22,26,29,30,34,39. Tyr150 forms a direct hydrogen bond (2.6 Å) with the O4 atom in both the UDP-Glc and UDP-GlcNAc complex structures. The corresponding residue of hGalE (Tyr157) is the catalytic base (proton acceptor) for the O4 hydroxy group in the first reaction step (Fig. 1) 29. In the subsequent reaction steps, the sugar moiety of the substrate is assumed to flip its sugar ring, and the nicotinamide ring of the reduced cofactor (NADH) transfers back the hydride from the si-face to the opposite face of the C4-keto-intermediate to produce an epimer 19. In the latter step, the catalytic tyrosine residue acts as a general acid (proton donor) to the O4 hydroxy group of the sugar moiety. A comprehensive study using X-ray crystallography, in situ NMR, and steady-state and stopped-flow kinetics revealed a detailed reaction mechanism of human UDP-xylose synthase 1 (hUXS1; EC 4.1.1.35; other name: UDP-glucuronic acid decarboxylase), which also belongs to the SDR1E family of the SDR superfamily 40. In addition to the decarboxylation reaction that forms xylose from glucuronic acid, the proposed reaction mechanism involves base/acid catalysis by Tyr147, which is located at the position (near the C4 atom of the sugar moiety) corresponding to Tyr150 of bGalE.
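The distance criterion used above to judge a productive binding mode can be written as a small check. The following Python sketch is illustrative only: the function names are hypothetical, the 3.0-3.7 Å bounds are the literature range quoted in the text, and the coordinates in the last example are made up to show the distance calculation, not taken from any deposited structure.

```python
import math

# Literature range (Å) quoted in the text for productive C4-C4 distances.
PRODUCTIVE_RANGE_A = (3.0, 3.7)


def c4_c4_distance(sugar_c4, nicotinamide_c4):
    """Euclidean distance (Å) between two atoms given as (x, y, z) tuples."""
    return math.dist(sugar_c4, nicotinamide_c4)


def is_productive(distance_a, bounds=PRODUCTIVE_RANGE_A):
    """True if the distance lies inside the quoted productive range."""
    low, high = bounds
    return low <= distance_a <= high


# Distances reported in the text for the two bGalE complexes.
reported = {"UDP-Glc": 3.4, "UDP-GlcNAc": 3.5}
verdicts = {ligand: is_productive(d) for ligand, d in reported.items()}

# Hypothetical coordinates, used only to demonstrate the calculation.
example_distance = c4_c4_distance((0.0, 0.0, 0.0), (1.0, 2.0, 2.0))

for ligand, ok in verdicts.items():
    print(f"{ligand}: {reported[ligand]} Å, productive = {ok}")
```

With real data, the two coordinate tuples would come from the sugar C4 and nicotinamide C4 atom records of a refined model.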
A completely conserved SYK catalytic triad motif has been recognized in SDR superfamily enzymes 35. In bGalE, Ser125, Tyr150, and Lys154 correspond to the triad (Fig. 5). Lys154 contributes to the binding of nicotinamide ribose, while Ser125 and Tyr150 form direct hydrogen bonds with the O3 and O4 hydroxy groups of the Glc or GlcNAc (Fig. 4), respectively. These residues appear to support precise transfer of hydride and proton, which is the most important step of GalE catalysis. For eGalE, site-directed mutagenesis studies indicated the catalytic importance of the triad residues (Ser124, Tyr149, and Lys153) 41,42. Structural analyses on mutants of the Ser and Tyr residues were also reported 24,26. Tyr149 and Ser124 are in position to participate in the acid-base catalysis required to drive hydride transfer and synergistically contribute to the catalysis by coordinating the reactive O4 hydroxy group 20. Asn180, which recognizes the O6 hydroxy of the sugar moiety, is also completely conserved (Fig. 5). The other three residues involved in forming the sugar binding pocket (Lys85, Asn200, and Cys299) vary among the three GalE groups (1, 2, and 3) and determine their substrate specificity. Figure 6 shows superimpositions of the active site of bGalE + UDP-GlcNAc and representative enzymes in group 2 (hGalE), group 1b (eGalE), group 1a (tGalE), and group 3 (WbpP), presenting both the standard (bGalE + UDP-GlcNAc, hGalE + UDP-GlcNAc, and bGalE + UDP-Glc) and the flipped (tGalE + UDP-4-deoxy-4-fluoro-α-galactose (UDP-4fGal) and WbpP + UDP-GlcNAc) conformations of the sugar moiety. The UDP-GlcNAc complex structures of bGalE and hGalE (both in group 2) superimpose very well, and the residues forming the sugar binding site are completely conserved (Figs 5 and 6A).
It is noteworthy that most UDP sugars observed in the crystal structures of GalE enzymes in groups 1 and 2 were in the standard conformation. For eGalE and tGalE in group 1, the flipped conformation structure was obtained only by using the S124A/Y149F double mutant 26 or by using a substrate analog UDP-4fGal 39 . On the other hand, for a group 3 enzyme WbpP, the wild-type enzyme bound the natural substrate UDP-GlcNAc in the flipped conformation 32 .
Ishiyama et al. proposed a conceptual model of the GalE active site surrounded by a hexagon-shaped box (six walls) and used it to discuss the relationship between sequence (active site structure) and substrate specificity 32. Here, we simplify the concept and focus on two pockets (walls in the original concept) responsible for the substrate specificity between UDP-Glc/Gal and UDP-GlcNAc/GalNAc. As shown in Fig. 7A,B, Asn200 and Cys299 form pockets for the groups connected to the C2 and C5 sugar ring atoms, respectively, in the "standard" conformation. Therefore, we hereafter designate these structural regions as the C2 pocket and the C5 pocket. These two pockets can discriminate the size of the C2 hydroxy or N-acetyl groups during the sugar ring rotation in the catalytic cycle of the GalE reaction.
It has been shown that the size difference of the C5 pocket is the primary factor for the substrate specificity of GalEs 32. Group 2 and 3 enzymes (bGalE, hGalE, and WbpP) have a large C5 pocket with a small Cys or Ser residue at this position, whereas group 1 enzymes (eGalE and tGalE) have a small C5 pocket with a bulky Tyr or Leu residue (Fig. 7). Although the crystal structure of the eGalE Y299C mutant complexed with UDP-GlcNAc was reported, the sugar moiety was bound in a displaced nonproductive mode, and the distance between the C4 atom of GlcNAc and the C4 atom of the nicotinamide ring was 9.4 Å 27. The Y299C mutant of eGalE showed a 5-fold reduction in activity toward UDP-Gal, while it exhibited a 230-fold increase in activity toward UDP-GalNAc (Table 1). For hGalE, the C307Y mutation resulted in a significant loss of activity toward UDP-GalNAc, while the activity toward UDP-Gal was not affected (Table 1) 43.
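The two fold changes quoted for the eGalE Y299C mutant can be combined into a single number describing how far the preference moved toward the N-acetylated substrate. The Python sketch below does this arithmetic. The function name and the definition of "preference shift" as a ratio of fold changes are assumptions introduced here for illustration, not a quantity reported in the text.

```python
from fractions import Fraction


def preference_shift(fold_change_original, fold_change_alternative):
    """Ratio of fold changes: how much the relative preference moved
    toward the alternative substrate."""
    return fold_change_alternative / fold_change_original


# Fold changes quoted in the text for the eGalE Y299C mutant.
y299c_udp_gal = Fraction(1, 5)    # 5-fold reduction toward UDP-Gal
y299c_udp_galnac = Fraction(230)  # 230-fold increase toward UDP-GalNAc

shift = preference_shift(y299c_udp_gal, y299c_udp_galnac)
print(f"relative shift toward UDP-GalNAc: {float(shift):.0f}-fold")
```

Exact fractions are used so the product of the two fold changes is not blurred by floating-point rounding.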
In the C2 pocket, the size and flexibility of the amino acid side chain contributed to accommodation of the N-acetyl group of the substrate. Similar to the case of bGalE observed in the present work, Asn207 of hGalE (group 2) adopts "swing in" and "swing out" conformations in the UDP-Glc and UDP-GlcNAc complex structures, respectively 29,30. eGalE (group 1b) also has an Asn residue (Asn199) at this position (Fig. 7D). However, in the crystal structure of the wild-type eGalE enzyme, the sugar moiety of UDP-GlcNAc was disordered 27, suggesting limited flexibility of the Asn199 side chain. We further investigated the protein structures around the Asn residue and C2 pocket of the group 1b and 2 enzymes, but no significant structural feature that contributes to the flexibility difference between them was found (data not shown). The corresponding residues at the C2 pocket are the large His221 in tGalE (group 1a, specific for UDP-Glc) and the small Ala209 in WbpP (group 3, specific for UDP-GlcNAc/GalNAc) (Fig. 7E,F). Therefore, the C2 pocket size is likely to contribute to the substrate specificity in these groups. In addition to the steric hindrance, a hydrogen-bonded solvent (water) network near Ala209 in WbpP is involved in the recognition of the N-acetyl group of the substrate 32.
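The two-pocket reasoning above can be summarized as a lookup from the residue types at the C2 and C5 pockets to a GalE group. The Python sketch below is a toy encoding of that reasoning for the five enzymes discussed, not a validated predictor. The table and function names are hypothetical, and the C5 residues of tGalE (Leu) and WbpP (Ser) are inferred from the grouping statement in the text rather than quoted with residue numbers.

```python
# (C2 pocket residue, C5 pocket residue) -> GalE group, as described in the text.
POCKET_RULES = {
    ("His", "Leu"): "1a",  # large C2 residue, bulky C5 residue
    ("Asn", "Tyr"): "1b",  # Asn at C2, bulky C5 residue
    ("Asn", "Cys"): "2",   # flexible Asn at C2, small C5 residue
    ("Ala", "Ser"): "3",   # small C2 residue, small C5 residue
}

# Residue types for the enzymes compared in the text.
ENZYMES = {
    "bGalE": ("Asn", "Cys"),  # Asn200, Cys299
    "hGalE": ("Asn", "Cys"),  # Asn207, Cys307
    "eGalE": ("Asn", "Tyr"),  # Asn199, Tyr299
    "tGalE": ("His", "Leu"),  # His221; Leu inferred from the text
    "WbpP": ("Ala", "Ser"),   # Ala209; Ser inferred from the text
}


def predict_group(c2_residue, c5_residue):
    """Return the group label for a residue pair, or None if the pair
    is not covered by the rules above."""
    return POCKET_RULES.get((c2_residue, c5_residue))


predictions = {name: predict_group(*pair) for name, pair in ENZYMES.items()}
for name, group in predictions.items():
    print(f"{name}: group {group}")
```

Returning None for an unlisted pair is deliberate: a combination not covered by the five reference enzymes calls for structural inspection rather than a forced assignment.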
Three-dimensional structures of GalEs from a wide range of organisms are now available in the Protein Data Bank due to structural genomics projects and studies from several groups. Such efforts and advances in protein homology modeling have enabled more precise prediction of GalE specificities than the pioneering work by Ishiyama et al. 32. We previously investigated biochemical characteristics of a paralog GalE enzyme from the same organism, B. longum JCM1217 (BLLJ_1592 corresponding to BL1671 of B. longum NCC2705) 44, whose gene is not accompanied by other galactose metabolic genes. Although we did not measure the activity for UDP-GlcNAc/GalNAc, the specific activity of BLLJ_1592 for UDP-Gal (260 U/mg) was higher than that of bGalE (140 U/mg). Interestingly, BLLJ_1592 was suggested to be inhibited by GalNAc more severely than bGalE because the enzymatic GNB production using BLLJ_1592 under high GalNAc concentration (600 mM) was less efficient than that using bGalE. A phylogenetic tree analysis indicated that BLLJ_1592 is distantly located from any group (Fig. 2), and multiple amino acid sequence alignment could not predict its substrate specificity because of the low sequence homology around the C2 pocket-forming region (Fig. 5). We built a reliable homology model of BLLJ_1592 (GMQE = 0.72 and QMEAN = −2.34) using the SWISS-MODEL server 45 with an unpublished structure of GalE from Burkholderia pseudomallei (PDB ID: 3ENK, sequence identity = 42.3%) as a template. As shown in Fig. 6E, the C2 and C5 pockets of BLLJ_1592 are formed by Asn190 and Leu290, respectively. The inhibition by GalNAc suggests that the side chain of Asn190 is flexible and can accommodate the N-acetyl side chain. Therefore, BLLJ_1592 was predicted to have group 1-like specificity (low activity for UDP-GlcNAc/GalNAc) due to the large side chain of Leu290 in the C5 pocket.
Concluding remarks. In this study, we revealed the structural basis for the broad substrate specificity of a bacterial enzyme that is required for efficient catabolism of HMO and mucin O-glycans. Interestingly, the structure of GalE from the human symbiont (B. longum) was more similar to that of its host animal (H. sapiens) than to the enzyme from a gut microbe living in the same niche (E. coli). GalEs in groups 1b and 2 are closely located in the phylogenetic tree (Fig. 2), and a small number of amino acid mutations in the active site (especially the Cys or Tyr residue at the C5 pocket) can change the substrate specificity. Gal and GalNAc sugars are utilized by a wide range of organisms from microbes to animals for various biological events such as breast feeding (lactose and HMO) and utilization of glycoproteins and glycoconjugates. To metabolize and utilize galacto-type sugars in relationship with other organisms (symbiosis or pathogenesis), organisms may be taking advantage of the GalE enzymes' characteristics, such as the specificity being prone to change. Interestingly, hUXS1, which catalyzes a considerably different reaction from GalEs 40, is located within the clade of group 3 GalEs in a phylogenetic tree (Fig. 2). The UDP-xylose synthase plays a key role in the glycosaminoglycan synthesis on the protein core of extracellular matrix proteoglycans of mammals. Moreover, a study using kinetic isotope effects revealed a more complicated reaction mechanism of UDP-D-apiose/UDP-D-xylose synthase, which is related to hUXS1. In addition to the NAD+-dependent oxidation and base catalysis by Tyr, the UDP-D-apiose/UDP-D-xylose synthase reaction involves decarboxylation, retro-aldol sugar ring opening, rearrangement, and ring contraction 46. The unusually high catalytic potential of SDR superfamily enzymes cannot be estimated from phylogenetic analysis alone; careful inspection based on three-dimensional protein structures is required.

Methods
Protein production and purification. A gene encoding C-terminally His6-tagged GalE (pET-30-lnpD, residues 1-340) was cloned from the genomic DNA of B. longum JCM1217 as previously described 15
Crystallography. Crystals of bGalE were grown at 4 °C using the sitting drop vapor diffusion method by mixing 1.0 µL of a 7 mg/ml protein solution with an equal volume of a reservoir solution containing 30% (v/v) PEG400, 0.2 M MgCl2, and 0.1 M HEPES-NaOH (pH 7.5). Crystals grew in 2 weeks. Each complex was obtained by cocrystallization with a ligand (10 mM UDP, 10 mM UDP-Gal, or 10 mM UDP-GlcNAc). Crystals were flash-cooled at 100 K in a stream of nitrogen gas. The diffraction data sets were collected using synchrotron radiation at beamlines BL5A, BL17A, and AR-NW12A of the Photon Factory at the High Energy Accelerator Research Organization (KEK, Tsukuba, Japan). The bGalE crystals were long pillar-shaped, and we used helical