Structural insights of the enzymes from the chitin utilization locus of Flavobacterium johnsoniae

Chitin is one of the most abundant renewable organic materials found on earth. The chitin utilization locus in Flavobacterium johnsoniae, which encodes necessary proteins for complete enzymatic depolymerization of crystalline chitin, has recently been characterized but no detailed structural information on the enzymes was provided. Here we present protein structures of the F. johnsoniae chitobiase (FjGH20) and chitinase B (FjChiB). FjGH20 is a multi-domain enzyme with a helical domain not before observed in other chitobiases and a domain organization reminiscent of GH84 (β-N-acetylglucosaminidase) family members. The structure of FjChiB reveals that the protein lacks loops and regions associated with exo-acting activity in other chitinases and instead has a more solvent accessible substrate binding cleft, which is consistent with its endo-chitinase activity. Additionally, small angle X-ray scattering data were collected for the internal 70 kDa region that connects the N- and C-terminal chitinase domains of the unique 158 kDa multi-domain chitinase A (FjChiA). The resulting model of the molecular envelope supports bioinformatic predictions of the region comprising six domains, each with similarities to either Fn3-like or Ig-like domains. Taken together, the results provide insights into chitin utilization by F. johnsoniae and reveal structural diversity in bacterial chitin metabolism.

Scientific Reports | (2020) 10:13775 | https://doi.org/10.1038/s41598-020-70749-w www.nature.com/scientificreports/
Analysis of the protein in solution by size exclusion chromatography was consistent with a monomer being the predominant species (≥ 80%), and thus the dimer observed in the crystal structure is likely not of biological relevance. Each protomer consists of three domains: a central domain (residues 145-502) composed of a (β/α)₈ TIM-barrel, an N-terminal domain (residues 1-144) comprising a six-stranded β-sheet and two α-helices where the helices are sandwiched between the β-sheet and the central domain, and a C-terminal domain (residues 504-673) consisting of an eight-helix bundle with two short β-strands (Fig. 1). An α-helix (residues 481-503) extends from and across α-helix 8 of the TIM-barrel to connect the central domain to the C-terminal helical bundle. Most of the polypeptide chain is well defined in the electron density in both protomers, except for residues 243-261, which are not visible, and residues 421-429, which are poorly defined, suggesting a greater degree of flexibility in these regions.
Structural features of FjGH20 and comparison to homologous structures. The FjGH20 N-terminal and central domains, although sharing only up to 35% sequence identity over the domains, are closely related in structure to several GH20 members, such as the β-hexosaminidases from Bacteroides thetaiotaomicron (BT0459; PDB accession 6q63) and Homo sapiens (PDB accession 1o7a; root mean square deviation of Cα atoms ~ 2.5 Å, as determined by DALI 22). The N-terminal domain of GH20 family members is ubiquitous and bears structural resemblance to certain CBMs, but a defined biological role for this domain remains undetermined. The catalytic site for the N-acetylhexosaminidase activity in these enzymes is found in the cleft of the TIM-barrel. Several of the active site residues shown to be important for substrate binding and catalysis in GH20 members are conserved in FjGH20, including the H-x-G-G-D-E motif where a glutamic acid (Glu317) is proposed to act as a general acid/base in the reaction. The aspartic acid in this motif (Asp316), along with a conserved tyrosine residue (Tyr413), is proposed to position and polarize the N-acetyl group for the substrate-assisted catalytic mechanism 20,23,24. The FjGH20 substrate-binding pocket is distinct in one significant way compared to the archetypical GH20 from S. marcescens (SmGH20) 20,25 in that FjGH20 lacks the extended loop that is located between β-strand 7 and α-helix 7 in SmGH20 (Fig. 1). This alteration of the FjGH20 TIM-barrel leads to loss of a conserved active site tryptophan residue (Trp685), which is involved in sugar binding in the +1 subsite 26 of SmGH20. Interestingly, a portion of a loop extending from the FjGH20 helical bundle domain wraps around, and closes off, one side of the active site cleft in the TIM-barrel with a tyrosine residue (Tyr538) that projects its sidechain into a position similar to the Trp685 residue in SmGH20. A similar arrangement occurs in the homologous human β-hexosaminidase, which also has a short loop between β-strand 7 and α-helix 7 of the TIM-barrel and has a tyrosine residue in a position similar to Tyr538 in FjGH20. In the human protein, however, the tyrosine residue originates from a loop in a different protomer 27. It seems unlikely that this difference between the substrate-binding sites significantly affects overall activity, since FjGH20, human β-hexosaminidase, and a GH20 enzyme from B. thetaiotaomicron, which lacks an analogous aromatic residue, all maintain enzymatic activity toward their target substrates 8,28.
Figure 1. Structure of FjGH20. The overall structure of the enzyme is shown with the individual domains annotated by color (a). The active site architectures of FjGH20 (b) and SmGH20 (c; PDB accession 1qbb) are shown with key residues lining the substrate-binding pocket. The structure of SmGH20 20 was determined in complex with chitobiose (orange carbons). A notable difference between the two enzymes is the presence of a tyrosine (Tyr538) from a loop of the helical bundle domain in FjGH20 filling the void left by the absence of a conserved tryptophan (Trp685 in SmGH20), which is involved in substrate binding by ring stacking with a GlcNAc residue in chitobiose.
The FjGH20 C-terminal helical bundle is a distinct domain amongst structurally determined GH20 family members. The GH20 family is distantly related to GH84, which also contains enzymes with N-acetyl-β-hexosaminidase activity and whose members display a similar structural architecture, i.e. an N-terminal β-sheet domain and a (β/α)₈ TIM-barrel catalytic domain containing GH20-like catalytic residues. Some GH84 members, such as GH84C (NagJ) from Clostridium perfringens 29,30, contain a C-terminal helical bundle analogous to that observed in FjGH20 (Supplemental Fig. 1). As in FjGH20, a loop from the helical bundle in NagJ also lines one face of the substrate binding pocket, resulting in a tyrosine side chain being positioned analogously to Tyr538 in FjGH20 and Trp685 in SmGH20. In NagJ, this helical domain appears to act as a bridging domain between the catalytic TIM-barrel and multiple additional domains found closer to the C-terminus. Although the two are only distantly related in sequence (8% sequence identity shared between the full-length FjGH20 and the homologous domains of NagJ), the presence in FjGH20 of a C-terminal domain analogous to that of GH84 enzymes may suggest a closer relationship between the two GH families than has been previously suggested and/or could be a remnant of their common ancestry.
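The structural comparison above is quantified as a root mean square deviation (RMSD) of Cα atoms. As a minimal sketch of what that number measures, the snippet below computes the RMSD between two sets of paired Cα coordinates. It assumes the structures are already superposed and the residues already matched, which is the part DALI performs internally; the function name and the coordinates are hypothetical and are not taken from the FjGH20 structure.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD (in Å) between two equal-length lists of (x, y, z) Cα positions.

    Assumption: the two structures are already superposed and residue i in
    coords_a is paired with residue i in coords_b. No alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical toy coordinates: three Cα atoms, each displaced by 2.5 Å along y.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.0, 2.5, 0.0), (3.8, 2.5, 0.0), (7.6, 2.5, 0.0)]
print(ca_rmsd(a, b))  # 2.5
```

A value of ~ 2.5 Å over the aligned residues therefore means that matched Cα atoms sit, in a root-mean-square sense, about 2.5 Å apart after optimal superposition.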
Structure of FjChiB. Although many GH18 chitinase structures have been determined to date (> 75 from distinct species), FjChiB shares at most 30% sequence identity with enzymes of known structure, and its structure could therefore provide novel insights into the GH18 family as a whole. To this end, we pursued structure determination by X-ray crystallography and solved the structure of FjChiB to 1.63 Å resolution. The asymmetric unit contained one protein molecule without contacts indicative of oligomerization. The overall FjChiB structure is similar to other structurally determined GH18 chitinases, having a (β/α)₈ TIM-barrel fold and containing a CID between strand 7 and helix 7 of the (β/α)₈-barrel (residues 249-286; Fig. 2). The electron density is well defined, with only 13 residues at the N-terminus that could not be resolved in the final model. Notably, electron density for three residues of the C-terminal histidine tag used for affinity chromatography purification was resolved and modelled close to the substrate binding cleft of a symmetry-related molecule.
The catalytic motif (DxxDxDxE) common amongst GH18 family members, which supports a substrate-assisted catalytic mechanism 2, is conserved in FjChiB (Asp146-Val147-Asp148-Leu149-Glu150; Fig. 2). Electron density consistent with a formate molecule, likely from the crystallization solution, was found in the active site, positioned by hydrogen bonds with the hydroxyl moiety of Tyr205 and the carboxyl moiety of Glu150. The orientation and position of the formate molecule are similar to those of the acetyl group of a GlcNAc unit bound in the -1 subsite in several GH18 ligand complex structures. The overall architecture of the substrate-binding cleft of FjChiB is similar to that of other GH18 chitinases.
Relative to SmChiB, a processive exo-acting enzyme that is amongst the best studied GH18 enzymes, the binding cleft of FjChiB has two distinct differences, described in more detail below, which lead to the cleft being more open and exposed to the bulk solvent.
In SmChiB, a small insertion between β-strand 1 and α-helix 1 leads to a capping of the cleft at the -3 site 14 , which likely explains why this enzyme favors exo-binding (at the non-reducing end of a chitin chain), rather than endo-binding 2,31 . FjChiB lacks this insertion and instead shows more similarity to the GH18 ChtII from the insect pest Ostrinia furnacalis, which also lacks this insertion and has been shown to be able to bind longer oligosaccharides beyond the -3 site 32 . Like several GH18 chitinases, FjChiB has a CID inserted between strand 7 and helix 7 of the (β/α) 8 -barrel which folds into a distinct domain and builds up one face of the active site cleft. In SmChiB, this region is large and folds over one end of the cleft effectively forming a tunnel and shielding the − 1 and + 1 sites from the bulk solvent 2,14 . There is significant diversity amongst GH18 chitinase members in the CID region, both in length and sequence, and the CID in FjChiB, while similar in overall structure, is shorter than the one found in SmChiB, leading to a much more open cleft at the + 1 and + 2 sites (Fig. 2). A small or absent CID, leading to a more exposed binding cleft, has been observed in other GH18 chitinases, such as the O. furnacalis ChtII 32 , and is commonly associated with endo-acting activity. Previous work has shown FjChiB to be an endo-acting chitinase 8 and the openness of the active site cleft is consistent with this activity.

Structural investigation of the multi-modular FjChiA.
FjChiA is indispensable for the growth of F. johnsoniae on crystalline chitin. Between its two GH18 chitinase domains, the protein contains a middle domain (FjChiA_M) with carbohydrate-binding functionality, which lacks close similarity to any previously studied proteins 8. Attempts to crystallize either the full-length protein or the middle domain of FjChiA were unsuccessful. However, thanks to high sequence similarity, reliable homology modeling of the FjChiA N- and C-terminal GH18 chitinase domains was possible using PHYRE2 33. The modelling resulted in high-confidence structure predictions of TIM-barrel proteins consistent with both FjChiA_N and FjChiA_C belonging to the GH18 family (Supplemental Fig. 2). Both domains contain the conserved DxxDxDxE catalytic motif. Of the characterized chitinases, FjChiA_N is most similar to ChiW from Paenibacillus sp. str. FPU-7, a chitinase from Bacillus circulans WL-12, and SmChiA, while FjChiA_C is most similar to chitinases from Bacillus cereus NCTU2, Chromobacterium violaceum, and SmChiC (Supplemental Fig. 3). Our previous functional characterization of the individual catalytic domains suggested that FjChiA_N and FjChiA_C were exo- and endo-acting chitinases, respectively 8. As discussed above, in GH18 members the presence of a large CID that partially covers the substrate-binding cleft is associated with a higher degree of exo- and processive characteristics, while a smaller or absent CID, leading to a more open binding cleft, is associated with endo-acting activities 2. As illustrated in Supplemental Figs. 2 and 3, a large and extensive CID is present in FjChiA_N whereas the domain is much smaller in FjChiA_C, consistent with the observed enzyme activities 8.
In the absence of atomic-level structural information, and to gain better insights into the overall structure of the multi-modular FjChiA, we utilized small angle X-ray scattering (SAXS) to determine a solution structure of FjChiA_M. Unfortunately, the full-length protein suffered from both aggregation and radiation damage, even when SAXS measurements were coupled to size exclusion chromatography, and the data could not be used for analysis. The FjChiA_M protein, however, proved to be much more amenable to the technique and allowed for the generation of a low-resolution model of the domain (Table 1). Analysis of the data by a Kratky plot and the pair distance distribution function, P(r), indicated that the protein is modular and elongated with some degree of flexibility 34 (Fig. 3). The ab initio calculation of the SAXS molecular envelopes of FjChiA_M consistently yielded an elongated protein comprising six distinct modules, each 30 to 40 Å in length and ~ 30 Å wide (Fig. 3). The envelope of FjChiA_M is slightly compressed with a small rotation between the third and fourth modules.
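To illustrate the kind of Kratky analysis mentioned above, the following minimal sketch computes a dimensionless Kratky transform, (qRg)² I(q)/I(0) against qRg. For a compact globular particle that follows the Guinier law, this curve is bell-shaped with a maximum near qRg = √3 and a height of about 3/e ≈ 1.1; elongated or flexible particles typically deviate from that shape. The intensity curve, the Rg value, and the function name are hypothetical and are not the FjChiA_M data from Table 1.

```python
import math

def dimensionless_kratky(q, intensity, rg, i0):
    """Return (q*Rg, (q*Rg)^2 * I(q)/I(0)) pairs for a scattering curve."""
    return [(qi * rg, (qi * rg) ** 2 * ii / i0) for qi, ii in zip(q, intensity)]

# Hypothetical reference case: an ideal compact particle obeying the Guinier law.
rg = 50.0   # Å, illustrative value only
i0 = 1.0    # forward scattering, arbitrary units
q = [0.001 * k for k in range(1, 101)]                      # Å^-1
intensity = [i0 * math.exp(-(qi * rg) ** 2 / 3.0) for qi in q]

curve = dimensionless_kratky(q, intensity, rg, i0)
peak_x, peak_y = max(curve, key=lambda point: point[1])
print(f"peak at qRg = {peak_x:.2f}, height = {peak_y:.3f}")
```

Comparing an experimental curve against this reference peak position and height is one common way to judge compactness; it is offered here only as a sketch of the approach, not as the procedure used by the authors.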
The FjChiA_M domain lacks significant sequence similarity to any previously characterized or structurally determined proteins, as determined by NCBI BLAST. However, protein structure predictions using PHYRE2 33 suggest that FjChiA_M is composed of six modules: two Fn3-like domains (residues 471-577 and 578-718) followed by four immunoglobulin (Ig)-like domains (residues 719-821, 822-925, 926-1,030, and 1,031-1,140). Fn3-like and Ig-like domains are similar to each other, with both comprising seven to nine strands arranged into two β-sheets that pack onto each other 35. However, Fn3-like domains tend to have shorter strands and longer intervening loop regions compared to Ig-like domains. The two FjChiA_M Fn3-like domains share 36% identity with each other and are both most closely related (20% and 22% sequence identity, respectively) to the Fn3-like domain of the pilin protein BcpA from Bacillus cereus (PDB accession 3kpt). The first two Ig-like domains share 75% sequence identity, while the last two share only 20 to 30% identity with each other and with the first two. Structure predictions of these last four domains are consistent with each comprising an Ig-like domain, with each sharing 20 to 30% sequence identity with Ig-like domains from the antifreeze protein (MpAFP) from Marinomonas primoryensis 35. MpAFP is a large (1.5 MDa) protein comprised of > 100 tandem Ig-like domains that are proposed to extend and project the ice-binding domain of the protein away from the cell. Both Fn3-like and Ig-like domains are commonly found in extracellular carbohydrate-active enzymes and, while sometimes displaying weak carbohydrate-binding ability, it has been suggested that they can play a role in loosening and exfoliating chains from fibrous polysaccharides 36.
In SmChiA, an Fn3-like domain is connected to the catalytic domain in close proximity to the active site and may have a role in the interaction with polysaccharide substrates 37,38, and it is possible that Ig-like modules could also have substrate interaction roles. Collectively, the solution scattering results support the bioinformatic prediction that the FjChiA_M domain is composed of distinct modules which likely fold similarly to Fn3-like and Ig-like domains. Further, the protein is observed as elongated in solution in a fashion similar to "beads on a string" where each bead may sit on crystalline chitin and exfoliate chains for the two terminal chitinase domains. Our previous biochemical characterization 8 showed that the FjChiA_M domain adheres to several insoluble and crystalline polysaccharides including α- and β-chitin and cellulose. The elongated conformation of the protein may be a feature that is important not only for adhesion/exfoliation but also for tethering FjChiA_N and FjChiA_C together physically for increased cooperativity between the domains. A tentative model of the FjChiA solution structure together with the entire ChiUL machinery is presented in Fig. 4.
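The predicted module boundaries quoted above can be checked for internal consistency with a short calculation. The sketch below tabulates the six predicted modules, confirms that they are contiguous, and estimates the mass of the whole region. The average residue mass of 110 Da is a rule-of-thumb assumption introduced here, not a value from the paper, and the variable names are illustrative.

```python
# Predicted FjChiA_M module boundaries as quoted in the text (PHYRE2 predictions).
modules = [
    ("Fn3-like 1", 471, 577),
    ("Fn3-like 2", 578, 718),
    ("Ig-like 1", 719, 821),
    ("Ig-like 2", 822, 925),
    ("Ig-like 3", 926, 1030),
    ("Ig-like 4", 1031, 1140),
]

AVG_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average residue mass

def module_lengths(mods):
    """Return (name, residue count) for each module; boundaries are inclusive."""
    return [(name, end - start + 1) for name, start, end in mods]

def is_contiguous(mods):
    """True if every module starts one residue after the previous one ends."""
    return all(nxt[1] == prev[2] + 1 for prev, nxt in zip(mods, mods[1:]))

lengths = module_lengths(modules)
total_residues = sum(count for _, count in lengths)
approx_kda = total_residues * AVG_RESIDUE_MASS_DA / 1000.0

for name, count in lengths:
    print(f"{name}: {count} residues")
print(f"total: {total_residues} residues, roughly {approx_kda:.1f} kDa")
```

Under that assumption the six modules span 670 residues, or roughly 74 kDa, which is in line with the approximately 70 kDa internal region described in the abstract.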

Conclusions
Together with our previous characterization, the data presented here represent a holistic structural view of the chitin-interacting proteins of the F. johnsoniae ChiUL, including both carbohydrate-binding proteins and enzymes (Fig. 4). Our results also provide new structural information for both the GH18 and GH20 families. The FjGH20 structure reveals features not previously seen in structures of GH20 members, and the results suggest a stronger connection between families GH20 and GH84. FjChiA is an exceptionally powerful multi-modular chitinolytic enzyme, and our SAXS model of its internal domain, FjChiA_M, showcases how F. johnsoniae has evolved this multidomain chitinase to form a complex enzyme with the terminal chitinase domains separated by an extended protein 'spacer' that also has carbohydrate-binding abilities. The exact mechanism of substrate-binding for the Fn3- and Ig-like domains of FjChiA_M remains elusive, but the current model provides a useful template for future studies. Further insights into the structure, dynamics and functional abilities of multi-catalytic molecular "machines" such as FjChiA may have implications not only for understanding chitin deconstruction, but also for understanding, and eventually designing and optimizing, enzymatic deconstruction of other recalcitrant polysaccharides.
Figure 4. [...] FjChiB are outer membrane-bound lipoproteins where the former two bind oligosaccharides and facilitate import into the periplasm and the latter is an endo-acting chitinase. FjGH20 is a periplasmic chitobiase that cleaves imported oligosaccharides into GlcNAc for further metabolism. A modelled structure of ChiA based on homology models of the terminal GH18 chitinases, FjChiA_N and FjChiA_C, and spanned by homology models of the Fn3- and Ig-like domains of FjChiA_M fitted into the modelled SAXS envelope is visualized bound to, and possibly exfoliating, polysaccharides from insoluble chitin crystals.

Methods
Structure determination of FjGH20. The protein was produced and purified as previously reported 8. [...] 45, as the template. An initial model was built using autobuilding in ARP/wARP 46-49. Inspection of electron density maps was done in Coot 50 with positional refinement in REFMAC 51. The data collection, processing, and refinement statistics for all of the datasets can be found in Table 2.
Structure determination of FjChiB. FjChiB was produced and purified as previously reported 8. [...] 46-49. A subsequent data set diffracting to 1.63 Å was collected at ESRF beamline ID30A-3 (2017-05-03), processed with XDS 39, and the solution defined by rigid body refinement using Phenix Refine 52 and the previously determined FjChiB structure. Since the new data set provided an improvement in resolution, only this dataset was pursued for further refinement and deposition. Coot 50 and Phenix Refine 52 were used in iterative cycles of manual and computational refinement. The data collection, processing, and refinement statistics for all of the datasets can be found in Table 2.
Small-angle X-ray scattering (SAXS) of FjChiA_M. The protein was produced and purified as previously reported 8. X-ray scattering data were obtained at BL4-2 at the Stanford Synchrotron Radiation Lightsource (SSRL) with a Pilatus3 X 1M detector (Dectris) operated at 11.0 keV. Full-length ChiA and ChiA_M were buffer exchanged into 50 mM Tris pH 8.0 with 250 mM NaCl and 250 μM DTT, using a HiPrep 26/10 Desalting column and an ÄKTA Explorer (GE Healthcare). The proteins were concentrated to 20 mg/mL by ultrafiltration using a Vivaspin (GE Healthcare) 10 kDa molecular weight cut-off polyethersulfone spin column, and a protein dilution series was created using the ultrafiltration filtrate. Ten images with 1 s exposure, taken at a distance of 1802.5 mm, were averaged and, after background subtraction of the buffer, were used for analysis in the ATSAS suite version 3.0 53. PRIMUS 54 and GNOM 55 were used to assess the data. The data for the full-length ChiA showed significant indications of aggregation and radiation damage and were not pursued further. Images of ChiA_M at concentrations of 1.25, 2.5, and 5.0 mg/mL yielded Guinier Rg estimates within 3% of each other with no indication of radiation damage, and the data collected at 5 mg/mL were chosen for further analysis. DAMMIF 56 was used to generate 100 models that were subsequently clustered by DAMCLUST 57, and the top cluster was averaged by DAMAVER 58 and then refined by DAMMIN 59. The SAXS data collection and analysis parameters can be found in Table 1.
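The Guinier Rg comparison across the dilution series can be sketched as follows. The Guinier approximation, ln I(q) = ln I(0) − (Rg² q²)/3, holds at low angles (commonly taken as qRg below roughly 1.3 for compact particles), so Rg follows from the slope of a straight-line fit of ln I against q². The code below is a plain least-squares version of that fit, not the PRIMUS implementation; the intensities are synthetic, the Rg values are hypothetical, and "within 3%" is interpreted here as (max − min)/min, which is an assumption.

```python
import math

def guinier_fit(q, intensity):
    """Least-squares line through (q^2, ln I); returns (Rg in Å, I(0))."""
    x = [qi * qi for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("Guinier slope must be negative")
    intercept = mean_y - slope * mean_x
    return math.sqrt(-3.0 * slope), math.exp(intercept)

def rg_spread(rg_values):
    """Relative spread of Rg estimates, defined here as (max - min) / min."""
    return (max(rg_values) - min(rg_values)) / min(rg_values)

# Synthetic low-angle data for three concentrations (mg/mL -> hypothetical Rg in Å).
q = [0.005 + 0.001 * k for k in range(16)]          # Å^-1, keeps q*Rg below 1.3
hypothetical_rg = {1.25: 60.0, 2.5: 60.6, 5.0: 61.2}

fitted = {}
for conc, true_rg in hypothetical_rg.items():
    intensity = [conc * math.exp(-(qi * true_rg) ** 2 / 3.0) for qi in q]
    fitted[conc], _ = guinier_fit(q, intensity)

spread = rg_spread(list(fitted.values()))
print({conc: round(rg, 1) for conc, rg in fitted.items()}, f"spread = {spread:.1%}")
```

A small spread across concentrations, as reported for ChiA_M, argues against concentration-dependent effects such as aggregation or interparticle interference in the chosen data set.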