Characterization and diversity of the complete set of GH family 3 enzymes from Rhodothermus marinus DSM 4253

The genome of Rhodothermus marinus DSM 4253 encodes six glycoside hydrolases (GH) classified under GH family 3 (GH3): RmBgl3A, RmBgl3B, RmBgl3C, RmXyl3A, RmXyl3B and RmNag3. The biochemical function, modelled 3D-structure, gene cluster and evolutionary relationships of each of these enzymes were studied. The six enzymes were clustered into three major evolutionary lineages of GH3: β-N-acetyl-glucosaminidases, β-1,4-glucosidases/β-xylosidases and macrolide β-glucosidases. The RmNag3 with additional β-lactamase domain clustered with the deepest rooted GH3-lineage of β-N-acetyl-glucosaminidases and was active on acetyl-chitooligosaccharides. RmBgl3B displayed β-1,4-glucosidase activity and was the only representative of the lineage clustered with macrolide β-glucosidases from Actinomycetes. The β-xylosidases, RmXyl3A and RmXyl3B, and the β-glucosidases RmBgl3A and RmBgl3C clustered within the major β-glucosidases/β-xylosidases evolutionary lineage. RmXyl3A and RmXyl3B showed β-xylosidase activity with different specificities for para-nitrophenyl (pNP)-linked substrates and xylooligosaccharides. RmBgl3A displayed β-1,4-glucosidase/β-xylosidase activity while RmBgl3C was active on pNP-β-Glc and β-1,3-1,4-linked glucosyl disaccharides. Putative polysaccharide utilization gene clusters were also investigated for both R. marinus DSM 4253 and DSM 4252T (homolog strain). The analysis showed that in the homolog strain DSM 4252T Rmar_1080 (RmXyl3A) and Rmar_1081 (RmXyl3B) are parts of a putative polysaccharide utilization locus (PUL) for xylan utilization.

specific activity). Both RmXyl3A and RmXyl3B showed highest specific activity towards pNP-β-Xyl, but displayed side activity on pNP-α-l-Ara and pNP-β- Glc. temperature and pH optima. The effect of temperature and pH on activity was studied using the pNP-substrate with highest specific activity for the respective enzyme (Supplementary Table 4). Highest optimal temperature was observed for RmNag3 and RmBgl3B at 90 °C, while the remaining enzymes displayed activity optima between 60-80 °C. The pH optima for all six enzymes were between pH 5.0-5.6 except RmBgl3C, which was pH 7.0.
Kinetic analysis. Kinetic parameters on the same substrates ( Table 2), showed that RmBgl3A had a relatively low K M and higher catalytic efficiency (k cat /K M ) for pNP-β-Glc compared with pNP-β-Xyl (Table 2). RmBgl3A also had the highest catalytic efficiency for pNP-β-Glc of the three enzymes with β-glucosidase activity (RmBgl3A, B and C). The K M of RmBgl3C was an order of magnitude higher than the K M of the two other enzymes, showing lower affinity for this substrate. The turnover number for RmBgl3B (k cat ) was in the same range for pNP-β-Glc, pNP-β-Xyl and pNP-α-l-Ara. The higher k cat /K M of 511 s −1 mM −1 for pNP-β-Glc (due to lower K M ) suggests that this is the most preferred synthetic substrate. All three enzymes, RmBgl3A, RmBgl3B and RmBgl3C, showed substrate inhibition while hydrolysing pNP-β-Glc, RmXyl3A and RmXyl3B were active against pNP-β-Glc, pNP-β-Xyl and pNP-α-l-Ara, and comparison of kinetic parameters revealed a definite preference for pNP-β-Xyl with a low K M . RmXyl3A and RmXyl3B showed a similar substrate preference, but RmXyl3B had higher turnover number and catalytic efficiency.
any potential phosphorolytic function for RmNag3. The K M value increased between 0 and 50 mM of phosphate added to HEPES buffer from 0.16 to 0.30 mM, with no further increase at higher phosphate concentration (100 and 200 mM). For k cat a similar trend was observed (Table 3). However, there was no clear trend in k cat /K M values to suggest any significant effect of phosphate on enzyme activity (Table 3).  phylogenetic analysis of the R. marinus GH3 enzymes. A phylogenetic analysis was performed on the amino acid sequences of 100 characterized GH3 proteins, including the six enzymes from R. marinus. Multiple sequence alignment revealed strictly conserved residues within the family and highlights the lack of the otherwise conserved catalytic acid/base among the β-N-acetyl-glucosaminidases ( Supplementary Fig. 2). Based on the maximum likelihood tree (Fig. 2), it can be seen that the β-N-acetyl-glucosaminidase cluster is the deepest rooted group within GH3, separated from others. Characterized enzymes clustering in this lineage have two different domain architectures, single-domain or two-domain enzymes. One subgroup consists of both domain 1 and 2, the other one is missing the C-terminal domain 2. RmNag3 clusters with the lineage of β-N-acetyl-glucosaminidases, but instead of the typical single or two-domain architecture, RmNag3 displays three domains, containing an additional β-lactamase domain at its C-terminus. BLAST search using the BALSTp tool 30 , revealed a number of gene sequences encoding proteins with similar domain architecture (i.e. domain 1, domain 2 and the β-lactamase domain) e.g. genes encoding putative GH3 candidates from Salinibacter altiplanensis (RefSeq: WP_103019701) and Rhodohalobacter halophilus (RefSeq: WP_083750206), indicating a common role. To date, RmNag3 is however the only characterized enzyme with this type of domain architecture. The other five GH3 enzymes cluster with different subgroups in two main evolutionary lineages. The first lineage consists of the three subgroups: thermostable multifunctional β-xylosidases, closely related β-glucosidases and exo-β-1,3-1,4-glucanases from plants and bacteria (Fig. 2). Four of the R. marinus GH3 enzymes cluster with subgroups within this first lineage (RmBgl3A, RmBgl3C, RmXyl3A and RmXyl3B). RmBgl3A is part of a weakly defined cluster formed by three bacterial β-glucosidases (Fig. 2), all clustering within the β-glucosidase subgroup. The enzyme consists of domains 1, 2 and the fibronectin type III (FnIII) domain. The protein RmBgl3C clusters together with exo-β-1,3-1,4-glucanase from plants and bacteria and is a two-domain enzyme consisting of domain 1 and 2. At its C-terminus it has a C-terminal extension, like exo-glucanase (ExoI) from Hordeum vulgare (UniP: Q9XEI3), instead of the domain 3 found in e.g. exo-glucanase (ExoP) from Pseudoalteromonas sp. BB1 (UniP: Q0QJA3). RmXyl3A and RmXyl3B both consist of domain 1, 2 and the FnIII domain and cluster together with the group of thermostable enzymes with predominant β-xylosidase (EC 3.2.1.37) activity with minor α-l-arabinofuranosidase (EC 3.2.1.55) and β-glucosidase (EC 3.2.1.21) activities (Fig. 2). This group is separated from, and therefore not directly phylogenetically related to, other β-xylosidase clusters originating mainly from plants and fungi.

Gene loci Annotation
A second lineage is formed by two large subgroups. The first subgroup comprises fungal and bacterial β-glucosidases, including a group of macrolide β-glucosidases, whereas the second subgroup is formed by β-xylosidases from plants, fungi and bacteria (Fig. 2). RmBgl3B clusters with the macrolide β-glucosidase subgroup together with bacterial (mostly actinobacterial) β-glucosidases involved in activation of secreted antibiotics, activated by the removal of glucosyl moieties from their non-active glycosylated forms. In addition to domains 1, 2 and FnIII, it comprises the PA14 domain (Fig. 3F) inserted between β-strand k and α-helix K.
Structural analysis of the R. marinus GH3 enzymes. Domain organization, tertiary structure and active site arrangement for the six GH3 enzymes were investigated by homology modelling and compared to closely related structures. Sequence similarities with template structures and model validation are presented in Table 5. Homology model validation show an overall good quality of the six models however indicating some parts of low quality. The modelled structure of RmBgl3B have low quality in parts of the PA14 and FnIII domains and in loop l in domain 2 close to the active site. For RmXyl3A and RmXyl3B, several long loops in the modelled Substrate Specific activity (µmol min −1 mg −1 ) www.nature.com/scientificreports www.nature.com/scientificreports/ structures were not present in the templates, making loop predictions in these regions poor. The RmNag3 model had very good quality, despite lower sequence identity with its template.

RmBgl3A RmBgl3B RmBgl3C RmXyl3A RmXyl3B RmNag3
The overall structures of the most common domains in GH3 enzymes: (α/β) 8   Phylogenetic relationship between proteins in GH3. The maximum likelihood phylogenetic tree was calculated using amino acid sequences of biochemically characterized proteins from GH3 together with six protein sequences from Rhodothermus marinus (highlighted in colored boxes). Sequences belonging to particular group are colored in the same color and the schematic representation of domain organization is shown in a circle above the tree. If domain organization of particular protein differs from prevailing domain organization of the group to which it belongs, its domain organization is shown separately next to the protein.  Table 6). The enzyme consists of domain 1 and 2 but only domain 1 builds up the active site ( Supplementary Fig. 3F). Conserved residues important for catalysis were found in positions corresponding to those in structure determined GH3 β-N-acetyl-glucosaminidases: catalytic nucleophile Asp118 on β-strand g 33 , the Asp227-His229 dyad on loop e 34 , and Arg186 and Phe188 on loop d 35,36 (Supplementary Fig. 4F). Similar to other GH3 enzymes, Lys216 and His217 on β-strand e and Asp118 on β-strand c was found hydrogen bonding to GlcNAc in subsite −1. Superimposition with NagZ from Pseudomonas aeruginosa in complex with GlcNAc and l-Ala-1,6-anhydroMurNAc (PDB: 5G3R), showed that Arg60 on β-strand a, Arg126 on loop c and Glu306 on loop g were potentially hydrogen bonding to a sugar unit in subsite +1 (Supplementary Figs. 3F and 4F). All residues important for substrate binding were conserved in the closely related NagA from Streptomyces thermoviolaceus ( Table 6).
The modelled structures of the β-glucosidases and β-xylosidases presented −1 subsite-architectures typical of GH3, with small but important differences (Fig. 4A, Supplementary Figs. 3, 4 and Table 6). The strictly   Table 5. Validation of the homology models of the six GH3 enzymes from Rhodothermus marinus. RmBgl3A, RmXyl3A and RmXyl3B were modelled as dimers, result for chain A and B are showed separately for ProSA and ERRAT analysis. a 80% of the residues in a structure is required at an averaged 3D-1D score of 0.2 to be considered a good model. b Numbers corresponds to percentage of residues in: Most favorable regions; additionally allowed regions; generously allowed regions; disallowed regions. c For high-resolution X-ray structures an overall quality factor of 95% is considered a good protein structure and above 91% is expected for X-ray structures with a resolution between 2.5 and 3 Å. www.nature.com/scientificreports www.nature.com/scientificreports/ conserved catalytic nucleophile on β-strand g 37 , the catalytic acid/base on loop l in domain 2 38 , a tandem Lys and His on β-strand e, Arg on loop d and Met on β-strand f were in positions corresponding to those in other GH3-structures 39 . Subsite −1 of RmXyl3A and RmXyl3B were very similar. This was also the case for subsite −1 of RmBgl3A and RmBgl3C, while RmBgl3B revealed two non-conserved residues: Arg178 on loop e and Ser380 on loop j potentially hydrogen bonding with the glucose in subsite −1 (Fig. 4A). A difference between the β-xylosidases and β-glucosidases was found on β-strand c; the two β-xylosidases possess a Glu and the three β-glucosidases an Asp, potentially hydrogen bonding with the respective sugar in subsite −1 (Fig. 4A). These features were conserved in the closest relative of the respective enzyme (Table 6).
Subsite +1 was studied by superimposition of structure determined enzyme-ligand complexes (PDB: 1IEX, 1J8V, 4ZO9 and 5AE6) (Fig. 4B, Supplementary Figs. 3, 4 and Table 6). Both β-glucosidases and β-xylosidases displayed an aromatic residue on top of subsite +1 on loop d. In RmBgl3A, RmBgl3C and the two β-xylosidases an aromatic residue below subsite +1, was potentially sandwiching the sugar unit. This residue was positioned in different loops in the respective enzyme (Fig. 4B). In RmBgl3A, Tyr614 is located in the linker region between domain 2 and FnIII of the opposite chain. In RmBgl3C, Trp453 is located in loop j in domain 2. In RmXyl3A and RmXyl3B, Tyr545 and Trp547, respectively, are located two positions after the acid/base catalyst on loop l in domain 2. Due to low sequence identity to the model templates, the prediction of loop l in domain 2 is,  Table 6. Comparison of potential substrate interacting residues with closely related characterized GH3 enzymes based on sequence alignment and structure. The identities between the enzymes are indicated in parenthesis. Strictly conserved resides presented in Supplementary Fig. 2   www.nature.com/scientificreports www.nature.com/scientificreports/ however uncertain, including the potential involvement of Tyr545 and Trp547 in subsite +1. In RmBgl3A, an additional residue, Arg606 in the linker between domain 2 and FnIII, is potentially interacting with subsite +1. In RmBgl3B, RmBgl3C, RmXyl3A and RmXyl3B, a Tyr on loop f creates hydrogen bonding possibilities. RmBgl3A has Phe254 in the corresponding position. The interacting residues in subsite +1 were not completely conserved in the closest related structure of the respective enzyme (Table 6). Tyr614 in RmBgl3A corresponded to Glu587 in the β-glucosidase from Elizabethkingia meningoseptica and the aromatic residue on the bottom of subsite +1 in RmXyl3A and RmXyl3B was replaced by Cys524 in Xyl3A from Caldanaerobius polysaccharolyticus.

Discussion
Rhodothermus marinus has served as a source of various thermostable enzymes. The Carbohydrate Active enZymes (CAZy) database (http://www.cazy.org) 19 reports 55 glycoside hydrolase (GH) encoding genes in 33 different GH families and two non-classified genes in the type strain (DSM 4252 T ). Six genes encodes GH family 3 (GH3) candidates and five of them are annotated as β-glucosidases in Kyoto Encyclopedia of Genes and Genomes (KEGG). However, the GH3 members of strain DSM 4253 have different substrate specificities, consistent with differences in their modelled structures. It is very important to mention that the GH3 candidates from both strains of R. marinus DSM 4252 T and 4253 are homologous. The most common domain architecture in GH3 is either a two domain type: domain 1 and 2, with the catalytic nucleophile in domain 1 and acid/base in domain 2 40 or a three domain type, with an FnIII-domain combined with domain 1 and 2 41 . Besides these domain architectures, there are two clusters of proteins in the phylogenetic tree containing an additional domain type, PA14, inserted between β-strand k and α-helix K. Since the two clusters are phylogenetically separated (Fig. 2) and differ in both type and orientation of the PA14 domain, it is likely that they are results of two evolutionary independent insertions.
RmNag3 has a well-conserved −1 subsite and several possible hydrogen bonding residues in subsite +1 (Fig. 4F). Among the interacting residues in subsite +1 Arg126 was found conserved in almost all structure-determined GH3 β-N-acetyl-glucosaminidases. RmNag3 possesses a signal peptide, a β-lactamase domain and showed optimal activity at high temperature. It was earlier proposed that all β-N-acetyl-glucosaminidases were phosphorylases 42 . However, this seems less likely as the presence of phosphate in the kinetic reaction showed little effect on hydrolytic activity of RmNag3. An increase in K M was observed at 50 mM phosphate, but K M remained unaffected with increasing phosphate concentration. A similar trend was also observed for the β-N-acetyl-glucosaminidase from Herbaspirillum seropedicae which does not have phosphorylase activity 43 . For the GH3 phosphorylase from Cellulomonas fimi the presence of phosphate had significant effect on the hydrolytic activity 42 compared to RmNag3. It is also interesting to see that RmNag3 had no activity on pNP-β-Glc while both enzymes from H. seropedicae and C. fimi were active on this substrate 42,43 . This narrow substrate specificity could be due to the presence of the β-lactamase domain in RmNag3. Two main biological roles for β-N-acetyl-glucosaminidases are degradation of chitin and peptidoglycan turnover. R. marinus possesses an extracellular GH18 chitinase 44 and an intracellular GH20 β-N-acetyl-hexosaminidase, suggesting chitin utilization in R. marinus. However, the β-lactamase domain in RmNag3 suggests association with peptidoglycan turnover as the degradation products have been shown to function as inducers for β-lactamase 45 . β-N-acetyl-glucosaminidases involved in peptidoglycan turnover are often but not exclusively intracellular in Gram-negative bacteria 46 . Comparison with peptidoglycan turnover in Escherichia coli strain K12 shows that the R. marinus type strain genome contains orthologous genes for degrading peptidoglycan and release of GlcNAc, including anhMurNAc and tetrapeptide in the periplasmic space (endolytic murein transglycosylase (MltG), membrane-bound lytic murein transglycosylase (MltD), N-acetylmuramoyl-l-alanine amidase (AmiA) and NagZ (ortholog to RmNag3)). No intracellular orthologs to NagZ, the intracellular N-acetylmuramoyl-L-alanine amidase (AmpD) or the permease AmpG for transportation of GlcNAc-1,6-anhydro-MurNAc-peptides across the inner membrane are found in R. marinus. The lack of AmpG makes it likely that RmNag3 is present in the periplasm. This set-up is found in the Gram-positive Bacillus subtilis which lacks AmpG and AmpD, and has a β-N-acetyl-glucosaminidase (NagZ) and an N-acetylmuramoyl-L -alanine amidase (AmiE) in the periplasm 47 . In B. subtilis, transportation of GlcNAc and anhMurNAc is believed to be done by the phosphotransferase system enzymes MurP (anhMurNAc) and NagE (GlcNAc). No orthologs for these proteins were found in R. marinus, however, AnmK (anhydro-N-acetylmuramic acid kinase) and NagK (N-acetyl-d-glucosamine kinase) were present as intracellular proteins.
Both RmBgl3A and RmBgl3B showed specificity for cellobiose but only RmBgl3A could hydrolyse cellohexaose. RmBgl3A is expected to be intracellular while RmBgl3B, which has a higher temperature optimum and a predicted a signal peptide, is likely exported. RmBgl3A is a dimer with a conserved subsite −1 and a well-defined subsite +1 including two aromatic residues sandwiching the sugar and Arg606 on the linker positioned between domain 2 and the FnIII domain on the opposite chain. These residues are also conserved in the closest related structure-determined enzymes (Supplementary Table 1). The corresponding residue to Arg606 have been hypothesis by others to be responsible for specificity towards β-1,2and β-1,3-linked glucose and exclusion of activity on β-1,4-linked glucose 48 . This is not the case, either for RmBgl3A due to lack of activity on laminaribiose (β-1,3), nor the closest structure-determined enzyme, JMB19063 isolated from compost metagenome, which has activity on cellooligosaccharides 49 . RmBgl3B has a well-defined subsite −1 with two additional interacting residues (Arg178 and Ser380) compared to RmBgl3A and a weakly interacting subsite +1 with only two residues, one stacking residue and one hydrogen bonding residue, also conserved in the closest related structure-determined enzymes (Supplementary Table 1). Arg178 on loop e in subsite −1 (Fig. 4A) is conserved in several structure-determined GH3 β-glucosidases, especially within the fungal kingdom. Ser380 is situated on a small α-helix on loop j and is found in several other structures, including many fungal β-glucosidases, but is not as common as Arg178. The inability of RmBgl3B to cleave longer substrate is most likely a result of the very deep active site cleft created by the PA14 domain. The domain is of the same type and is inserted in the same direction as in the main template, www.nature.com/scientificreports www.nature.com/scientificreports/ DesR from Streptomyces venezuelae, which is placed in the same cluster (Fig. 2) and active on glucosylated macrolides 50 . The PA14 domain in other carbohydrate-active proteins, including the closely related KmBglI from Kluyveromyces marxianus, have been associated with oligosaccharide binding 32,51,52 . However, no substrate interaction possibility for the PA14 domain was found, neither in the model of RmBgl3B nor in the structure of DesR.
RmBgl3C has linkage preference towards β-1,3-linked glucose which is found in laminarin, a polysaccharide present in brown algae 53 , in a bacterial polysaccharide called curdlan 54 and in various mixed-linkage glucans. It clusters under the linage that consists of exo-glucanases from plants and bacteria. Despite C-terminal extension differences, the exo-glucanase ExoP from Pseudoalteromonas sp. BB1 and RmBgl3C display similar activity on laminarioligosaccharides 55 . RmBgl3C has a clear signal peptide and higher pH optimum than the other GH3 enzymes in R. marinus. Therefore, it is likely that the enzyme is exported out of the outer membrane. Corresponding pH optima were also observed in RmXyn10A and RmCel12 which are exported 13,56 . RmBgl3C displays a subsite −1 similar to RmBgl3A and a subsite +1 with two aromatic resides sandwiching the sugar, the conserved Tyr311 on loop g and Trp453 on loop j, and Tyr278 with a hydrogen bonding possibility. Trp453 is located in a similar position as Tyr614 of RmBgl3A but the two residues are places on different loops (Fig. 4B). Trp453 is only found in a few structures including the closest structure-determined enzymes (Supplementary Table 1). The two characterised ExoI from barley (Hordeum vulgare) and ExoP, are in similarity with RmBgl3C β-1,3-β-1,4-glucanases. The structural basis for this specificity has been investigated but not clearly understood 57 . Interestingly, loop e which is involved in shaping subsite +1 is longer in RmBgl3C than in any of the closest structure-determined enzymes. An accurate conformation of this loop can be important in understanding the specificity of RmBgl3C, as Arg228 on loop e in ExoP was found to hydrogen bond to the glucose in subsite +1, when laminaribiose was modelled into the structure 58 .
RmXyl3A and RmXyl3B are both bifunctional β-xylosidases/β-1,4-glucosidases but show different activity patterns. They are 93% identical, positioned next to each other on the chromosome, and thus, likely a result of a relatively recent gene duplication. There are two less similar regions identifiable in the sequences. The first region is located in domain 1 in between β-strand g and α-helix H2, whereas the second one is found in domain 2 on an unusually long loop between β-strand k and α-helix K. RmXyl3B has endo-β-xylanase activity while RmXyl3A only hydrolysed xylobiose and, interestingly, cellohexaose. Based on kinetic analysis RmXyl3B also showed specificity for pNP-α-l-Ara. A similar type of bifunctional β-xylosidase/β-1,4-glucosidase has been characterised from Caldanaerobius polysaccharolyticus 59 , in the same phylogenetic cluster as RmXyl3A and RmXyl3B (Fig. 2). Subsite −1 of RmXyl3A and RmXyl3B is similar to that of RmBgl3A and RmBgl3C except for one residue potentially responsible for the difference in specificity. The two β-xylosidases/β-1,4-glucosidases display a Glu instead of an Asp on β-strand c in subsite −1 (Fig. 4A). This difference could affect the preference for glucose or xylose in subsite −1 as the Asp is making no steric hindrance for the additional CH 2 OH group of glucose. The same pattern was found in other GH3 enzymes (Table 6), including ExoI from barley and Xyl3B from Prevotella bryantii where substitution of Glu115 to an Asp increased catalytic activity of Xyl3B on pNP-β-Glc 39 . Another interesting feature is the aromatic residue on loop j in domain 2, two positions away from the potential acid/base catalyst in RmXyl3A and RmXyl3B (Fig. 4B), with the possibility to be in a position analogous to Tyr614 in RmBgl3A. This was not found in the Caldanaerobius polysaccharolyticus β-xylosidase/β-1,4-glucosidase, but comparison with structures of other GH3 enzymes, reveals an aromatic residue two positions after the acid/base catalyst and a loop j of similar length in several fungal β-glucosidases with very low sequence identity to the R. marinus β-xylosidases/β-1,4-glucosidases (see PDB: 4IIB, 5FJI, 5FJJ, 5NBS, 4DOJ and 3ZYZ), and stacking with a glucose in subsite +1 has been shown for AaBGL1 from Aspergillus aculeatus 60 .
The genomic context of the GH3 enzymes was informative to some extent concerning their potential role in utilization of carbohydrate substrates. The genome comparison between R. marinus DSM 4252 T and 4253 showed that all the corresponding GH3 gene loci are identical. In R. marinus DSM 4252 T and DSM 4253, the putative PUL consists of six GHs: the endo-1,4-β-xylanase RmXyn10A 13 with putative location in the extracellular space, an uncharacterized GH10 (a family containing mainly endo-1,4-β-xylanases) and the two GH3 β-xylosidases characterized in the current study, suggesting that the cluster is involved in utilization of 1,4-β-linked xylans. The potential extracellular GH43 belongs to subfamily 15 with no characterized proteins so far. The uncharacterized GH67 is annotated as an α-glucuronidase in NCBI, suggesting that xylan substituted with glucuronic acid or methyl glucuronic acid could be utilised by the PUL. Besides glycan degrading enzymes, a canonical PUL involves a pair of susC and susD homologues and a regulator 25 , components which are all found in the described cluster. SusC and SusD, first described as vital components of a starch utilization system (SUS) in Bacteroides thetaiotaomicron, are co-regulated genes involved in coordinated binding, transport and degradation of carbohydrates from the extracellular space via the outer membrane into the periplasmic space 24 . SusC is a type of TonB-dependent receptor, an outer membrane transporter known to transport ferric chelates, but also shown to transport carbohydrates. TonB-dependent transporters require energy and three inner membrane proteins in a complex, TonB-ExbB-ExbD, for the transportation 61 . SusD is a lipoprotein attached to the outer cell membrane shown to bind carbohydrates. In addition, the putative regulation system identified in the cluster, called trans-envelope signalling, involving an extracytoplasmic function (ECF) σ-factor/anti-σ-factor system is found in PULs from Bacteroidetes 24 . Biochemical and structural characterization of the hydrolytic enzymes involved in the cluster, transcriptome analysis during growth on different substrates as well as knock-out of the genes are necessary to fully understand the mechanisms involved in the PUL described in this study.
In summary, the biochemical and structural characterisation of the GH3 enzymes from R. marinus DSM 4253 (and 4252 T ), shows that the six GH3 enzymes encoded in the genome have non-redundant substrate specificities which are involved in extracellular laminarin, potential macrolide degradation, as well as intracellular cellobiose to glucose conversion, the conversion of xylans, and recycling of peptidoglycans, giving significant insights into structural features important for the specificity of these enzymes as well as the organization of corresponding loci in the R. marinus genome. (2020) 10:1329 | https://doi.org/10.1038/s41598-020-58015-5 www.nature.com/scientificreports www.nature.com/scientificreports/ Materials and Methods chemicals. Laminaribiose, cello-, xylo-and acetyl-chitooligosaccharides were obtained from Megazyme (Wicklow, Ireland). Sodium acetate, laminarin from Laminaria digitata, xylan from birch wood, para-nitrophenol and all para-nitrophenyl-β-d-glycosides were purchased from Sigma-Aldrich (St. Louis, Mo). All other chemicals were of molecular biology or analytical grades and purchsed from VWR International (Stockholm, Sweden).
Bacterial strains, genome sequencing and gene cluster analysis. Rhodothermus marinus strain DSM 4253 was isolated from an intertidal hot spring in Iceland, at a location close to that of the Rhodothermus marinus DSM 4252 T (Type strain) 6 . The genome was sequenced using TrueSeq chemistry for library construction and MiSeq sequencing platform (unpublished data). Sequencing data was assembled using GS De Novo Assembler software (Roche) and annotated using the RAST annotation server at rast.nmpdr.org 62 . Genes, encoding enzymes of glycoside hydrolase (GH) family 3 (GH3), were identified by BlastX 63 using GH3 amino acid sequences from the R. marinus type strain retrieved from the Carbohydrate-Active enZYme (CAZy) database 19 (http://www.cazy.org) as query sequences. Sequence regions harbouring the GH3 genes along with flanking sequences, spanning approximately 15 kb, were extracted from the genome sequence using the Geneious molecular biology tool. The structure of corresponding loci of the GH3 genes was resolved in a genome viewer and the neighbouring genes were analysed by BLAST 63 .
Gene clusters were analysed in the genome of the type strain 12 by annotations presented in the National Center for Biotechnology Information (NCBI) Nucleotide database, Reference Sequence (RefSeq): NC_013501.1, of genes in proximity to the six genes (Rmar_0536, Rmar_0925, Rmar_1080, Rmar_1081, Rmar_2069, and Rmar_2616) encoding the putative GH3 enzymes: RmBgl3A, RmNag3, RmXyl3A, RmXyl3B, RmBgl3B, and RmBgl3C. Additional annotations were done by compiling the information from searches in the Conserved Domain Database (CDD) from NCBI 64 and in the CAZy database in the case of annotated GHs. Operons were predicted by DOOR 2.0 65,66 and Genome2D 67 . Putative promotors were predicted by PePPER 68 and Rho-independent translational terminator sites were predicted by ARNold 69 and DOOR 2.0. Potential co-transcription was based on predicted operons, promotors and transcription terminators. Predictions of signal peptides were done by SignalP versions 3.0 and 4.1 [70][71][72] . BlASTp 30 was used to investigate poorly annotated genes and search for domains. Potential cellular location was based on prediction of signal peptides and the putative por-secretion system C-terminal sorting domain.
Sequence alignment and evolutionary relationships. From 301 characterized protein members classified in GH3 of the CAZy database, 100 with known activity were selected (Supplementary dataset file). Amino acid sequences of these proteins were retrieved mainly from the UniProt database 73 and some were obtained from GenBank 74 and RefSeq 75 databases. All sequences of characterized GH3 proteins together with those of the six proteins from R. marinus were aligned using the programme Clustal-X 76 and then the alignment was further manually fine-tuned.
Maximum likelihood phylogenetic tree 77 was calculated using the PhyML algorithm 78 available through T-REX server (http://www.trex.uqam.ca/ 79 . Gamma shape parameter and proportion of invariable sites were estimated by the PhyML itself. Number of relative substitution rate categories was set to four and WAG substitution model 80 was used. The starting tree was calculated using BIONJ. NNI was used for a tree improvement. Tree topology and branch lengths were optimized. Reliability of tree topologies was evaluated using the bootstrap test 81 with 100 replications. For the phylogenetic analysis, only amino acid sequences of domain 1, which is universally present in all proteins from GH3, was used. Structure homology modelling. Homology modelling of the six GH3 of R. marinus DSM 4253 was carried out using the YASARA program 82,83 with default settings except templates that were manually inserted (Supplementary Table 1). For each enzyme, a BLASTp search 63 in the Protein Data Bank (PDB) was done and the five enzymes with the highest score were chosen. If several structures were available, the structure without mutations, with a high resolution and relevant ligand was chosen. If the sequence identity was more than 10% units lower than the hit with the highest sequence identity, this template was not used. Only relevant ligands were kept in the structure and in some cases ligands were manually modified: in RmXyl3A and RmXyl3B into a xylose and in the case of RmNag3 into a β-N-acetyl-glucosamine before entering the homology modelling. Only ions conserved among the templates were kept in the structures. Only dimerization conserved among the templates were kept and in other cases only chain A was kept. The β-lactamase domain of RmNag3 was removed before homology modelling since no hit was found for the entire protein and no hit with a sequence identity above 30% was found for the domain alone. YASARA homology modelling generates five alignments for each template and builds a model for each alignment. A hybrid model is generated by combining the best part of the models. Each hybrid model was manually checked by superimposition and comparison of templates and other GH3 structures in Chimera 84 . Side chains in the active site and ligand positioning were manually changed based on conserved features in the closest related structures. The peptide bonds linking the conserved residues Lys and His on β-strand e and the following two amino acids were manually modified into cis-conformation, which are conserved within GH3. Modified models were energy minimised and then refined in YASARA. Refinement was done with default setting, only changing temperature and pH to the optima for each enzyme (Supplementary Table 4), with a 500 ps simulation of molecular dynamics with the YASARA2 force field 85 . During the simulation, 20 trajectories were saved, energy minimised and analysed by checking the energy of the system as well as dihedral angles, packag-ing1D and packaging3D. The trajectory with best overall quality was further evaluated. Evaluation of each refined structure was done manually and by several online validation tools. Superimposition in Chimera verified that the active site arrangement was kept and stabilized by the simulation. If not, modifications of the hybrid model were revised and a new version was refined. The quality of each refined structure was evaluated by average 3D-1D score Purification. Cell pellets were resuspended in binding buffer (20 mM sodium phosphate buffer, 500 mM NaCl, 20 mM imidazole, pH 7.4), and lysed by sonication 5 × 3 min, at 60% amplitude and a cycle of 0.5 using a 14-mm titanium probe (UP400 S; Hielscher Ultrasonic GmbH, Teltow, Germany). For Rmar_0925 encoding the RmNag3, the binding buffer contained 20 mM Tris-HCl instead of sodium phosphate buffer Cell debris was removed by centrifugation (14000 × g, 20 min, 4 °C), prior to purification by nickel affinity chromatography using an ÄKTA TM start system (GE Healthcare) with a HisTrap FF crude 1 mL column (GE Healthcare). Bound protein was eluted using gradient of imidazole 20-500 mM and fractions of 1 mL were collected. All purified proteins were stored at 4 °C.

Hydrolysis of para-nitrophenyl glycosides. Enzyme catalysed hydrolysis of para-nitrophenyl
(pNP)-linked substrates was assayed spectrophotometrically at 405 nm using a UV-1650PC spectrophotometer (Shimadzu, Kyoto, Japan) connected to a JulaboMB (Labortechnik GMBH, Germany) temperature-controlled system at 60 °C. Final reaction volume was 600 µL and contained 1 mM substrate dissolved in 20 mM citrate phosphate buffer at pH 5.6. The reaction was initiated by adding 0.04-0.2 µM of enzyme to the pre-incubated reaction solution and monitored for 5 min. 1 U equals the amount of enzyme required to release 1 µmol pNP min −1 . The extinction coefficient for pNP at 60 °C is 1426 M −1 cm −1 and 18072 M −1 cm −1 at pH 5.6 and 6.0 respectively. For kinetic parameters, substrate concentrations were 0.05-20 mM, at reaction conditions and enzyme concentration as above except for RmNg3. Additional the kinetic assays for RmNg3 were performed at pH 6.0 using 50 mM HEPES buffer, while in the presence of phosphate the buffer was supplemented with 50 mM, 100 mM and 200 mM of sodium phosphate. Each reaction was monitored for 10 min and K M , V max and K i values were calculated from GraphPad Prism V6. For RmNag3, the kinetic parameters for determining phosphorolytic activity were obtained by using KinTek Explorer. The pH and temperature optima were determined in the pH-range 3.0-6.0 (50 mM sodium citrate phosphate buffer), and 7.0-8.0 (50 mM sodium phosphate buffer) and at 40-90 °C in assays with a protein concentration of 0.04-0.1 µM and 1 mM of pNP-β-Glc or pNP-β-Xyl. All reactions were run in triplicates.