Introduction

Marine extremophilic biotopes, such as hot springs and hydrothermal vents, harbour diverse microbes hitherto underexploited and unexplored. Recent genomic studies show that many of the species, especially those found in coastal geothermal areas surrounded by profusion of carbohydrate rich biomass (seaweeds as well as terrestrial species), contain a wide array of novel glycoside hydrolases (GHs)1,2. Thermostable GHs have numerous applications in different fields, making marine thermophiles targets for prospecting of industrially interesting enzymes3,4. Rhodothermus marinus are Gram-negative marine thermophilic bacteria, previously classified under the phylum Bacteroidetes, but recently assigned to the new phylum Rhodothermaeota5. The type-species was isolated from a coastal hot spring on the North-West coast of Iceland and has an optimum temperature of 65 °C and is slightly halophilic6. R. marinus can utilize a variety of sugars as carbon sources and produces a wide range of GHs6,7,8,9,10,11,12. Sequence analysis shows that the R. marinus genome contains a large number of genes encoding GH enzymes, many of which are secreted extracellularly; yet, several of them appear to be attached to the cell surface13. These putative enzymes include six members of GH family 3 (GH3) (RmBgl3A, RmBgl3B, RmBgl3C, RmXyl3A, RmXyl3B and RmNag3), which have not yet been studied.

Glycoside hydrolase family 3 (GH3) is intriguing due to possible roles in cellulosic biomass degradation, bacterial and plant cell wall remodelling, recycling and in pathogen defence14,15,16,17. GH3 members are retaining enzymes, capable of hydrolysing the terminal glycosidic bond in the non-reducing end of a number of glycosides and glyco-conjugates18. According to the Carbohydrate-Active enZYme (CAZy) database19 (http://www.cazy.org), this family is widely distributed in bacteria, fungi and plants. It contains more than 27,000 genes (December 2019) encoding putative enzymes and most are of bacterial origin. Members of the family are known for diverse activities, such as β-d-glucosidases, β-d-xylosidases, α-l-arabinofuranosidases and β-N-acetyl-d-glucosaminidases. This activity spectrum makes the family interesting for biotechnological applications such as degradation of renewable resources for biofuel production20,21. As sequenced-based families group together GHs of different specificity22, without biochemical investigation it is difficult to resolve substrate specificities for individual enzymes. Looking at the CAZy database the total number of biochemically characterised enzyme is significantly less compare to total gene numbers23.

In this study, we report biochemical characterisation of the complete set of GH3 enzymes from R. marinus DSM 4253. The hydrolysis data show that the six enzymes encoded in the genome have non-redundant substrate specificities which will help in future characterisation of other enzymes from this family. This study gave us significant insights into novel structural features of the enzymes as well as on corresponding loci in the R. marinus genome.

Results

Gene identification and cluster analysis

Six genes encoding glycoside hydrolase (GH) family 3 (GH3) enzymes were identified in the Rhodothermus marinus DSM 4253 genome by BLAST analysis, and designated according to the locus tags of their homologs in Rhodothermus marinus DSM 4252T (Supplementary Table 2). The positions were in accordance with those of the type strain, and as indicated by the gene number, the four genes Rmar_0536 (encoding RmBgl3A), Rmar_0925 (encoding RmNag3), Rmar_2069 (encoding RmBgl3B) and Rmar_2616 (encoding RmBgl3C) were found at different positions the genome, while Rmar_1080 (encoding RmXyl3A) and Rmar_1081 (encoding RmXyl3B) were located adjacent to each other (Fig. 1). Clustering with other GH-encoding, regulatory or transporter genes, was investigated by analysing neighbouring genes in both strains.

Figure 1
figure 1

Gene cluster analysis in Rhodothermus marinus DSM 4253. The gene cluster corresponding to GH3 loci in strains DSM 4252T and DSM 4253 is identical. (A) Downstream of Rmar_0536 are genes enconding: membrane proteins (yedZ and yedY); a membrane transporter protein (mfs); an arylesterase precursor (arylest.); an ABC transporter ATP-binding protein (yvcR); a hypothetical protein (hyp.). Upstream of Rmar_0536 are genes encoding: a DNA-binding protein (hiran); two hypothetical proteins (hyp.); a potential methyltransferase (duf43). (B) Downstream of Rmar_0925 are genes encoding: a methylmalonyl-CoA mutase (mm_coa_mut); a 1,4-dihydroxy-2-naphthoyl-CoA synthase (menB); a 4′-phosphopantetheinyl transferase (sfp); a superoxide dismutase (sodA); a hypothetical protein (hyp.). Upstream of Rmar_0925 are genes encoding: ATPase (atpase); a 23 S rRNA methyltransferase (rlmN); an O-acetylhomoserine aminocarboxypropyltransferase (acpt); a homoserine O-acetyltransferase (esterase). (C) Downstream of Rmar_1080 and Rmar_1081 are genes encoding: two hypothetical proteins (hyp.); a periplasmic ABC transporter substrate-binding protein (abc), a potential cobalamin binding protein (yvcR.); a cobalamin biosynthesis protein (cbiX); a heme-binding protein (hmuY). Upstream of Rmar_1080 and Rmar_1081 genes encoding: an α-Glucuronidase (α-glucuronidase) and a pair of susC (a TonB-dependent receptor) and susD homologes. (D) Downstream of Rmar_2069 are genes encoding: a tRNA-Gln-CTG,tRNA (gray arrow); a hypothetical protein similar to ABC transporter (hyp. abc); a hypothetical protein (hyp.); a riboflavin synthase (ribD); a phosphohydrolase (hd); a riboflavin. Upstream of Rmar_2069 are genes encoding: an asparagine synthase (asn_synth); a Fe-S oxidoreductase (rad sam); a FAD-dependent oxidoreductase (oxidoreduct.); a hypothetical protein (hyp.). (E) Downstream of Rmar_2616 are genes encoding: a glycoside hydrolase of the GH43_62_32_68 superfamily (gh43_62_32_68); a predicted glycosyl hydrolase of the GH43/DUF377 family (gh43); a protein similar to glycosyltransferase involved in cell wall biosynthesis (rfaB); a short-chain dehydrogenase (sdr); a saccharopine dehydrogenas (sacc). Upstream of Rmar_2616 are genes encoding: a RNA polymerase ECF-type sigma factor gene (sig70); a tRNA (cytosine-5-)-methyltransferase (met_tra); a Fmu domain protein (fmu); a flavin containing amine oxidoreductase (amine oxidase); a bacterioferritin (ferr.); an exodeoxyribonuclease III (exoIII).

For RmBgl3A, RmNag3 and RmBgl3B no adjacent glycoside hydrolase encoding genes were found (Fig. 1A,B,D). RmNag3 is predicted to be co-transcribed with four intracellular proteins including a 23 s rRNA methyltransferase, a putative ATPase and a transcriptional regulator (Fig. 1B). A gene encoding an ABC transporter was located in the vicinity of RmBgl3B that may be of importance for substrate uptake (Fig. 1D). Signal peptides were predicted for RmNag3, RmBgl3B and RmBgl3C but not for RmBgl3A.

RmXyl3A and RmXyl3B were located in a putative polysaccharide utilization locus (PUL)24,25 for xylan utilization (Fig. 1B). This putative xylan utilization locus is also present in the type strain, consisting of six GHs, including the previously cloned and characterized GH10 endo-xylanase RmXyn10A26,27,28,29 (encoded by Rmar_1069) isolated from the type strain. Table 1 describes the cluster which consists of Rmar_1067, annotated as a transcriptional regulator of the LacI family, and the 14 following genes (Rmar_1067 – Rmar_1081). Several versions and settings of SignalP were used to predict presence of signal peptides. No signal peptide was found for Rmar_1075, Rmar_1076 and Rmar_1081 (encoding RmXyl3B) which were concluded to code for intracellular proteins, while Rmar_1071, Rmar_1073 and Rmar_1080 (encoding RmXyl3A) code for proteins containing a signal peptide. The other genes predicted signal peptide patterns with lower score. A putative por-secretion system C-terminal sorting domain (CDD: cl22550 in the Conserved Domain Database (CDD)) was encoded in Rmar_1068, Rmar_1069, Rmar_1071 and Rmar_1073. This domain-type has previously been described for RmXyn10A13. Promotors were predicted upstream Rmar_1067, Rmar_1068, Rmar_1069, Rmar_1074 and Rmar_1075 and termination sites were predicted downstream Rmar_1067, Rmar_1071, Rmar_1078 and Rmar_1081. The cluster includes a TonB-dependent receptor (encoded by Rmar_1070) predicted to be co-transcribed with RmXyn10A, an RNA polymerase σ-factor, the anti-FecI σ-factor FecR, and a pair of susC (a TonB-dependent receptor) and susD homologs (Rmar_1075 – Rmar_1078).

Table 1 Putative locus for degradation and uptake of xylan in Rhodothermus marinus DSM 4252T.

A second smaller potential carbohydrate utilization cluster is seen around RmBgl3C (a β-glucosidase). RmBgl3C is predicted to be co-transcribed with a cluster of genes. Three genes, two encoding GHs, and a homologue to the RfaB glycosyltransferase (a family GT4 glycosyl transferase) are located downstream RmBgl3C (Fig. 1E). The two GHs, of which one contains two domains, are classified into GH43 by CDD, but into GH130 by the CAZy database. A signal peptide was predicted for RmBgl3C, while surrounding genes were predicted to be intracellular (lacking signal peptides).

Enzyme purification

The activity of all six GH3 candidates was confirmed using the respective cell extract after expression in E. coli. Purification by nickel affinity chromatography resulted in enzyme purities of 70–80% for all six enzymes (as judged by SDS-PAGE, Supplementary Fig. 1), which also showed that the molecular mass of each enzyme was in accordance with the theoretical molecular mass. Dependent on the domain composition (see below), the molecular mass of the GH3 enzymes ranged from 66 to 107 kDa (Supplementary Table 2).

Activity screening

Specific activity was first determined using α- and β-pNP-linked sugars at 60 °C (Supplementary Table 3), a temperature where all enzymes were shown to be stable. The results showed that RmNag3 was only active on pNP-β-GlcNAc, which makes it a putative β-N-acetyl-d-glucosaminidases. RmBgl3A showed highest hydrolytic activity towards pNP-β-Glc and pNP-β-cellobioside (pNP-β-Cel) with some activity on pNP-β-Xyl. RmBgl3B was most active on pNP-β-Glc, but with significant activity on pNP-β-Cel and pNP-β-Xyl, and minor activity on pNP-α-l-Ara. In contrast, RmBgl3C only hydrolysed pNP-β-Glc (with low specific activity). Both RmXyl3A and RmXyl3B showed highest specific activity towards pNP-β-Xyl, but displayed side activity on pNP-α-l-Ara and pNP-β-Glc.

Temperature and pH optima

The effect of temperature and pH on activity was studied using the pNP-substrate with highest specific activity for the respective enzyme (Supplementary Table 4). Highest optimal temperature was observed for RmNag3 and RmBgl3B at 90 °C, while the remaining enzymes displayed activity optima between 60–80 °C. The pH optima for all six enzymes were between pH 5.0–5.6 except RmBgl3C, which was pH 7.0.

Kinetic analysis

Kinetic parameters on the same substrates (Table 2), showed that RmBgl3A had a relatively low KM and higher catalytic efficiency (kcat/KM) for pNP-β-Glc compared with pNP-β-Xyl (Table 2). RmBgl3A also had the highest catalytic efficiency for pNP-β-Glc of the three enzymes with β-glucosidase activity (RmBgl3A, B and C). The KM of RmBgl3C was an order of magnitude higher than the KM of the two other enzymes, showing lower affinity for this substrate. The turnover number for RmBgl3B (kcat) was in the same range for pNP-β-Glc, pNP-β-Xyl and pNP-α-l-Ara. The higher kcat/KM of 511 s−1mM−1 for pNP-β-Glc (due to lower KM) suggests that this is the most preferred synthetic substrate. All three enzymes, RmBgl3A, RmBgl3B and RmBgl3C, showed substrate inhibition while hydrolysing pNP-β-Glc, RmXyl3A and RmXyl3B were active against pNP-β-Glc, pNP-β-Xyl and pNP-α-l-Ara, and comparison of kinetic parameters revealed a definite preference for pNP-β-Xyl with a low KM. RmXyl3A and RmXyl3B showed a similar substrate preference, but RmXyl3B had higher turnover number and catalytic efficiency.

Table 2 Kinetic parameters on aryl substrates. Data presented as means ± standard error from three independent experiments.

The kinetic parameters for RmNg3 were determined using pNP-β-GlcNAc and the KM value was 0.1 mM (Table 2), in a reaction run in the presence of 20 mM sodium phosphate buffer. Additional kinetic analysis was then made using different concentrations of sodium phosphate in 50 mM HEPES buffer (Table 3), to investigate any potential phosphorolytic function for RmNag3. The KM value increased between 0 and 50 mM of phosphate added to HEPES buffer from 0.16 to 0.30 mM, with no further increase at higher phosphate concentration (100 and 200 mM). For kcat a similar trend was observed (Table 3). However, there was no clear trend in kcat/KM values to suggest any significant effect of phosphate on enzyme activity (Table 3).

Table 3 Kinetic parameters for aryl substrate hydrolysis by RmNag3.

Activity on natural substrates

Hydrolysis of natural substrates was monitored using a set of oligosaccharides (Table 4). RmNag3 was able to release glucosamine from chitobiose and chitopentaose, consistent with the pNP-β-GlcNAc hydrolysis, confirming it as a β-N-acetyl-glucosaminidase.

Table 4 Hydrolysis of oligosaccharides.

RmBgl3A produced glucose from both cellobiose and cellohexaose, which confirmed exo-glucanase activity. In addition, the enzyme maintained its bifunctional activity by hydrolysing xylobiose to xylose. RmBgl3B hydrolysed cellobiose, but was not active on cellohexaose, limiting the activity to short substrates. RmBgl3C hydrolysed all glucooligosaccharides tested, but showed clear preference for laminaribiose over cellobiose, and thus a clear preference for hydrolysis of β-1,3- over β-1,4-linkages.

Both RmXyl3A and RmXyl3B were hydrolysing xylobiose, but only RmXyl3B showed exo-glycanase activity and released xylose from both xylohexaose and xylan.

Phylogenetic analysis of the R. marinus GH3 enzymes

A phylogenetic analysis was performed on the amino acid sequences of 100 characterized GH3 proteins, including the six enzymes from R. marinus. Multiple sequence alignment revealed strictly conserved residues within the family and highlights the lack of the otherwise conserved catalytic acid/base among the β-N-acetyl-glucosaminidases (Supplementary Fig. 2). Based on the maximum likelihood tree (Fig. 2), it can be seen that the β-N-acetyl-glucosaminidase cluster is the deepest rooted group within GH3, separated from others. Characterized enzymes clustering in this lineage have two different domain architectures, single-domain or two-domain enzymes. One subgroup consists of both domain 1 and 2, the other one is missing the C-terminal domain 2. RmNag3 clusters with the lineage of β-N-acetyl-glucosaminidases, but instead of the typical single or two-domain architecture, RmNag3 displays three domains, containing an additional β-lactamase domain at its C-terminus. BLAST search using the BALSTp tool30, revealed a number of gene sequences encoding proteins with similar domain architecture (i.e. domain 1, domain 2 and the β-lactamase domain) e.g. genes encoding putative GH3 candidates from Salinibacter altiplanensis (RefSeq: WP_103019701) and Rhodohalobacter halophilus (RefSeq: WP_083750206), indicating a common role. To date, RmNag3 is however the only characterized enzyme with this type of domain architecture.

Figure 2
figure 2

Phylogenetic relationship between proteins in GH3. The maximum likelihood phylogenetic tree was calculated using amino acid sequences of biochemically characterized proteins from GH3 together with six protein sequences from Rhodothermus marinus (highlighted in colored boxes). Sequences belonging to particular group are colored in the same color and the schematic representation of domain organization is shown in a circle above the tree. If domain organization of particular protein differs from prevailing domain organization of the group to which it belongs, its domain organization is shown separately next to the protein.

The other five GH3 enzymes cluster with different subgroups in two main evolutionary lineages. The first lineage consists of the three subgroups: thermostable multifunctional β-xylosidases, closely related β-glucosidases and exo-β-1,3-1,4-glucanases from plants and bacteria (Fig. 2). Four of the R. marinus GH3 enzymes cluster with subgroups within this first lineage (RmBgl3A, RmBgl3C, RmXyl3A and RmXyl3B). RmBgl3A is part of a weakly defined cluster formed by three bacterial β-glucosidases (Fig. 2), all clustering within the β-glucosidase subgroup. The enzyme consists of domains 1, 2 and the fibronectin type III (FnIII) domain. The protein RmBgl3C clusters together with exo-β-1,3-1,4-glucanase from plants and bacteria and is a two-domain enzyme consisting of domain 1 and 2. At its C-terminus it has a C-terminal extension, like exo-glucanase (ExoI) from Hordeum vulgare (UniP: Q9XEI3), instead of the domain 3 found in e.g. exo-glucanase (ExoP) from Pseudoalteromonas sp. BB1 (UniP: Q0QJA3). RmXyl3A and RmXyl3B both consist of domain 1, 2 and the FnIII domain and cluster together with the group of thermostable enzymes with predominant β-xylosidase (EC 3.2.1.37) activity with minor α-l-arabinofuranosidase (EC 3.2.1.55) and β-glucosidase (EC 3.2.1.21) activities (Fig. 2). This group is separated from, and therefore not directly phylogenetically related to, other β-xylosidase clusters originating mainly from plants and fungi.

A second lineage is formed by two large subgroups. The first subgroup comprises fungal and bacterial β-glucosidases, including a group of macrolide β-glucosidases, whereas the second subgroup is formed by β-xylosidases from plants, fungi and bacteria (Fig. 2). RmBgl3B clusters with the macrolide β-glucosidase subgroup together with bacterial (mostly actinobacterial) β-glucosidases involved in activation of secreted antibiotics, activated by the removal of glucosyl moieties from their non-active glycosylated forms. In addition to domains 1, 2 and FnIII, it comprises the PA14 domain (Fig. 3F) inserted between β-strand k and α-helix K.

Figure 3
figure 3

Homology models of the six GH3 enzymes from Rhodothermus marinus DSM 4253. Ribbon representation of (A) RmBgl3A, (B) RmBgl3B, (C) RmBgl3C, (D) RmXyl3A, (E) RmXyl3B and (F) RmNag3. Domain 1 is colored in gray, domain 2 in purple, FnIII in blue, PA14 in green and the linker between domain 2 and FnIII in yellow. RmBgl3A, RmBgl3B and RmBgl3C were modelled as dimers, one chain in each dimer is colored in light colors while the other is colored in dark colors. The β-lactamase domain of RmNag3 was not modelled and is not represented in the figure.

Structural analysis of the R. marinus GH3 enzymes

Domain organization, tertiary structure and active site arrangement for the six GH3 enzymes were investigated by homology modelling and compared to closely related structures. Sequence similarities with template structures and model validation are presented in Table 5. Homology model validation show an overall good quality of the six models however indicating some parts of low quality. The modelled structure of RmBgl3B have low quality in parts of the PA14 and FnIII domains and in loop l in domain 2 close to the active site. For RmXyl3A and RmXyl3B, several long loops in the modelled structures were not present in the templates, making loop predictions in these regions poor. The RmNag3 model had very good quality, despite lower sequence identity with its template.

Table 5 Validation of the homology models of the six GH3 enzymes from Rhodothermus marinus.

The overall structures of the most common domains in GH3 enzymes: (α/β)8 barrel of domain 1, (α/β)6 sandwich of domain 2, and β-sandwich of FnIII were conserved, when present in the modelled structures (Fig. 3)31. The FnIII domain is found in RmBgl3A, RmBgl3B, RmXyl3A and RmXyl3B, and is situated on the side of the domain 1 barrel, in contact with domain 2. The PA14 domain of RmBgl3B is situated on top of the barrel in contact with domain 2. RmBgl3A, RmXyl3A and RmXyl3B were modelled as dimers, with the monomers positioned with the surface created by the FnIII domain, domain 2 and domain 1 next to the opposite chain, and with the linker between domain 2 and FnIII positioned on top of the barrel of the opposite chain.

The active sites are situated on top of the barrel (Supplementary Figs. 3 and 4), are pocket shaped and are made up of both domain 1 and 2, the exception being RmNag3, which will be discussed separately below. RmBgl3B has a very deep and narrow active site pocket, and the PA14 domain as well as a long loop g covers the surface over the active site. The PA14 domain of RmBgl3B differs considerably from the PA14 domain present in closely related GH3 enzyme KmBglI from Kluyveromyces marxianus which have been shown to be involved in substrate interaction32. For RmBgl3A, the linker between domain 2 and the FnIII domain on the opposite chain is involved in shaping the active site. This linker makes the active site deeper on one side, compared to the active site of RmBgl3C and the β-xylosidases. However, RmBgl3A displays a shorter loop b which opens up the active site on the opposite side. RmBgl3C has a different conformation of loop j on domain 2, making the active site narrower in this direction.

The homology model of RmNag3, the β-N-acetyl-glucosaminidase, displays a well-conserved tertiary structure as well as active site arrangement (Fig. 3F, Supplementary Figs. 3F, 4.F and Table 6). The enzyme consists of domain 1 and 2 but only domain 1 builds up the active site (Supplementary Fig. 3F). Conserved residues important for catalysis were found in positions corresponding to those in structure determined GH3 β-N-acetyl-glucosaminidases: catalytic nucleophile Asp118 on β-strand g33, the Asp227-His229 dyad on loop e34, and Arg186 and Phe188 on loop d35,36 (Supplementary Fig. 4F). Similar to other GH3 enzymes, Lys216 and His217 on β-strand e and Asp118 on β-strand c was found hydrogen bonding to GlcNAc in subsite −1. Superimposition with NagZ from Pseudomonas aeruginosa in complex with GlcNAc and l-Ala-1,6-anhydroMurNAc (PDB: 5G3R), showed that Arg60 on β-strand a, Arg126 on loop c and Glu306 on loop g were potentially hydrogen bonding to a sugar unit in subsite +1 (Supplementary Figs. 3F and 4F). All residues important for substrate binding were conserved in the closely related NagA from Streptomyces thermoviolaceus (Table 6).

Table 6 Comparison of potential substrate interacting residues with closely related characterized GH3 enzymes based on sequence alignment and structure.

The modelled structures of the β-glucosidases and β-xylosidases presented −1 subsite-architectures typical of GH3, with small but important differences (Fig. 4A, Supplementary Figs. 3, 4 and Table 6). The strictly conserved catalytic nucleophile on β-strand g37, the catalytic acid/base on loop l in domain 238, a tandem Lys and His on β-strand e, Arg on loop d and Met on β-strand f were in positions corresponding to those in other GH3-structures39. Subsite −1 of RmXyl3A and RmXyl3B were very similar. This was also the case for subsite −1 of RmBgl3A and RmBgl3C, while RmBgl3B revealed two non-conserved residues: Arg178 on loop e and Ser380 on loop j potentially hydrogen bonding with the glucose in subsite −1 (Fig. 4A). A difference between the β-xylosidases and β-glucosidases was found on β-strand c; the two β-xylosidases possess a Glu and the three β-glucosidases an Asp, potentially hydrogen bonding with the respective sugar in subsite −1 (Fig. 4A). These features were conserved in the closest relative of the respective enzyme (Table 6).

Figure 4
figure 4

Comparison of active sites of the β-glucosidase and β-xylosidases showing residues involved in (A) subsite −1 and (B) subsite +1. Superimposition of homology models of RmBgl3A (pink), RmBgl3B (blue), RmBgl3C, RmXyl3A (yellow) and RmXyl3B (orange). Cellobiose (gark gray) and laminaribiose (light gray) in subsite −1 and +1 from PDB entries 1IEX and 1J8V respectively, both solved in complex with ExoI from barley. RmXyl3A and RmXyl3B have similar −1 and +1 subsites and RmBgl3C have a similar −1 subsite as RmBgl3A in terms of possible interacting residues.

Subsite +1 was studied by superimposition of structure determined enzyme-ligand complexes (PDB: 1IEX, 1J8V, 4ZO9 and 5AE6) (Fig. 4B, Supplementary Figs. 3, 4 and Table 6). Both β-glucosidases and β-xylosidases displayed an aromatic residue on top of subsite +1 on loop d. In RmBgl3A, RmBgl3C and the two β-xylosidases an aromatic residue below subsite +1, was potentially sandwiching the sugar unit. This residue was positioned in different loops in the respective enzyme (Fig. 4B). In RmBgl3A, Tyr614 is located in the linker region between domain 2 and FnIII of the opposite chain. In RmBgl3C, Trp453 is located in loop j in domain 2. In RmXyl3A and RmXyl3B, Tyr545 and Trp547, respectively, are located two positions after the acid/base catalyst on loop l in domain 2. Due to low sequence identity to the model templates, the prediction of loop l in domain 2 is, however uncertain, including the potential involvement of Tyr545 and Trp547 in subsite +1. In RmBgl3A, an additional residue, Arg606 in the linker between domain 2 and FnIII, is potentially interacting with subsite +1. In RmBgl3B, RmBgl3C, RmXyl3A and RmXyl3B, a Tyr on loop f creates hydrogen bonding possibilities. RmBgl3A has Phe254 in the corresponding position. The interacting residues in subsite +1 were not completely conserved in the closest related structure of the respective enzyme (Table 6). Tyr614 in RmBgl3A corresponded to Glu587 in the β-glucosidase from Elizabethkingia meningoseptica and the aromatic residue on the bottom of subsite +1 in RmXyl3A and RmXyl3B was replaced by Cys524 in Xyl3A from Caldanaerobius polysaccharolyticus.

Discussion

Rhodothermus marinus has served as a source of various thermostable enzymes. The Carbohydrate Active enZymes (CAZy) database (http://www.cazy.org)19 reports 55 glycoside hydrolase (GH) encoding genes in 33 different GH families and two non-classified genes in the type strain (DSM 4252T). Six genes encodes GH family 3 (GH3) candidates and five of them are annotated as β-glucosidases in Kyoto Encyclopedia of Genes and Genomes (KEGG). However, the GH3 members of strain DSM 4253 have different substrate specificities, consistent with differences in their modelled structures. It is very important to mention that the GH3 candidates from both strains of R. marinus DSM 4252T and 4253 are homologous. The most common domain architecture in GH3 is either a two domain type: domain 1 and 2, with the catalytic nucleophile in domain 1 and acid/base in domain 240 or a three domain type, with an FnIII-domain combined with domain 1 and 241. Besides these domain architectures, there are two clusters of proteins in the phylogenetic tree containing an additional domain type, PA14, inserted between β-strand k and α-helix K. Since the two clusters are phylogenetically separated (Fig. 2) and differ in both type and orientation of the PA14 domain, it is likely that they are results of two evolutionary independent insertions.

RmNag3 has a well-conserved −1 subsite and several possible hydrogen bonding residues in subsite +1 (Fig. 4F). Among the interacting residues in subsite +1 Arg126 was found conserved in almost all structure-determined GH3 β-N-acetyl-glucosaminidases. RmNag3 possesses a signal peptide, a β-lactamase domain and showed optimal activity at high temperature. It was earlier proposed that all β-N-acetyl-glucosaminidases were phosphorylases42. However, this seems less likely as the presence of phosphate in the kinetic reaction showed little effect on hydrolytic activity of RmNag3. An increase in KM was observed at 50 mM phosphate, but KM remained unaffected with increasing phosphate concentration. A similar trend was also observed for the β-N-acetyl-glucosaminidase from Herbaspirillum seropedicae which does not have phosphorylase activity43. For the GH3 phosphorylase from Cellulomonas fimi the presence of phosphate had significant effect on the hydrolytic activity42 compared to RmNag3. It is also interesting to see that RmNag3 had no activity on pNP-β-Glc while both enzymes from H. seropedicae and C. fimi were active on this substrate42,43. This narrow substrate specificity could be due to the presence of the β-lactamase domain in RmNag3. Two main biological roles for β-N-acetyl-glucosaminidases are degradation of chitin and peptidoglycan turnover. R. marinus possesses an extracellular GH18 chitinase44 and an intracellular GH20 β-N-acetyl-hexosaminidase, suggesting chitin utilization in R. marinus. However, the β-lactamase domain in RmNag3 suggests association with peptidoglycan turnover as the degradation products have been shown to function as inducers for β-lactamase45. β-N-acetyl-glucosaminidases involved in peptidoglycan turnover are often but not exclusively intracellular in Gram-negative bacteria46. Comparison with peptidoglycan turnover in Escherichia coli strain K12 shows that the R. marinus type strain genome contains orthologous genes for degrading peptidoglycan and release of GlcNAc, including anhMurNAc and tetrapeptide in the periplasmic space (endolytic murein transglycosylase (MltG), membrane-bound lytic murein transglycosylase (MltD), N-acetylmuramoyl-l-alanine amidase (AmiA) and NagZ (ortholog to RmNag3)). No intracellular orthologs to NagZ, the intracellular N-acetylmuramoyl-L-alanine amidase (AmpD) or the permease AmpG for transportation of GlcNAc-1,6-anhydro-MurNAc-peptides across the inner membrane are found in R. marinus. The lack of AmpG makes it likely that RmNag3 is present in the periplasm. This set-up is found in the Gram-positive Bacillus subtilis which lacks AmpG and AmpD, and has a β-N-acetyl-glucosaminidase (NagZ) and an N-acetylmuramoyl-L-alanine amidase (AmiE) in the periplasm47. In B. subtilis, transportation of GlcNAc and anhMurNAc is believed to be done by the phosphotransferase system enzymes MurP (anhMurNAc) and NagE (GlcNAc). No orthologs for these proteins were found in R. marinus, however, AnmK (anhydro-N-acetylmuramic acid kinase) and NagK (N-acetyl-d-glucosamine kinase) were present as intracellular proteins.

Both RmBgl3A and RmBgl3B showed specificity for cellobiose but only RmBgl3A could hydrolyse cellohexaose. RmBgl3A is expected to be intracellular while RmBgl3B, which has a higher temperature optimum and a predicted a signal peptide, is likely exported. RmBgl3A is a dimer with a conserved subsite −1 and a well-defined subsite +1 including two aromatic residues sandwiching the sugar and Arg606 on the linker positioned between domain 2 and the FnIII domain on the opposite chain. These residues are also conserved in the closest related structure-determined enzymes (Supplementary Table 1). The corresponding residue to Arg606 have been hypothesis by others to be responsible for specificity towards β-1,2- and β-1,3-linked glucose and exclusion of activity on β-1,4-linked glucose48. This is not the case, either for RmBgl3A due to lack of activity on laminaribiose (β-1,3), nor the closest structure-determined enzyme, JMB19063 isolated from compost metagenome, which has activity on cellooligosaccharides49. RmBgl3B has a well-defined subsite −1 with two additional interacting residues (Arg178 and Ser380) compared to RmBgl3A and a weakly interacting subsite +1 with only two residues, one stacking residue and one hydrogen bonding residue, also conserved in the closest related structure-determined enzymes (Supplementary Table 1). Arg178 on loop e in subsite −1 (Fig. 4A) is conserved in several structure-determined GH3 β-glucosidases, especially within the fungal kingdom. Ser380 is situated on a small α-helix on loop j and is found in several other structures, including many fungal β-glucosidases, but is not as common as Arg178. The inability of RmBgl3B to cleave longer substrate is most likely a result of the very deep active site cleft created by the PA14 domain. The domain is of the same type and is inserted in the same direction as in the main template, DesR from Streptomyces venezuelae, which is placed in the same cluster (Fig. 2) and active on glucosylated macrolides50. The PA14 domain in other carbohydrate-active proteins, including the closely related KmBglI from Kluyveromyces marxianus, have been associated with oligosaccharide binding32,51,52. However, no substrate interaction possibility for the PA14 domain was found, neither in the model of RmBgl3B nor in the structure of DesR.

RmBgl3C has linkage preference towards β-1,3-linked glucose which is found in laminarin, a polysaccharide present in brown algae53, in a bacterial polysaccharide called curdlan54 and in various mixed-linkage glucans. It clusters under the linage that consists of exo-glucanases from plants and bacteria. Despite C-terminal extension differences, the exo-glucanase ExoP from Pseudoalteromonas sp. BB1 and RmBgl3C display similar activity on laminarioligosaccharides55. RmBgl3C has a clear signal peptide and higher pH optimum than the other GH3 enzymes in R. marinus. Therefore, it is likely that the enzyme is exported out of the outer membrane. Corresponding pH optima were also observed in RmXyn10A and RmCel12 which are exported13,56. RmBgl3C displays a subsite −1 similar to RmBgl3A and a subsite +1 with two aromatic resides sandwiching the sugar, the conserved Tyr311 on loop g and Trp453 on loop j, and Tyr278 with a hydrogen bonding possibility. Trp453 is located in a similar position as Tyr614 of RmBgl3A but the two residues are places on different loops (Fig. 4B). Trp453 is only found in a few structures including the closest structure-determined enzymes (Supplementary Table 1). The two characterised ExoI from barley (Hordeum vulgare) and ExoP, are in similarity with RmBgl3C β-1,3-β-1,4-glucanases. The structural basis for this specificity has been investigated but not clearly understood57. Interestingly, loop e which is involved in shaping subsite +1 is longer in RmBgl3C than in any of the closest structure-determined enzymes. An accurate conformation of this loop can be important in understanding the specificity of RmBgl3C, as Arg228 on loop e in ExoP was found to hydrogen bond to the glucose in subsite +1, when laminaribiose was modelled into the structure58.

RmXyl3A and RmXyl3B are both bifunctional β-xylosidases/β-1,4-glucosidases but show different activity patterns. They are 93% identical, positioned next to each other on the chromosome, and thus, likely a result of a relatively recent gene duplication. There are two less similar regions identifiable in the sequences. The first region is located in domain 1 in between β-strand g and α-helix H2, whereas the second one is found in domain 2 on an unusually long loop between β-strand k and α-helix K. RmXyl3B has endo-β-xylanase activity while RmXyl3A only hydrolysed xylobiose and, interestingly, cellohexaose. Based on kinetic analysis RmXyl3B also showed specificity for pNP-α-l-Ara. A similar type of bifunctional β-xylosidase/β-1,4-glucosidase has been characterised from Caldanaerobius polysaccharolyticus59, in the same phylogenetic cluster as RmXyl3A and RmXyl3B (Fig. 2). Subsite −1 of RmXyl3A and RmXyl3B is similar to that of RmBgl3A and RmBgl3C except for one residue potentially responsible for the difference in specificity. The two β-xylosidases/β-1,4-glucosidases display a Glu instead of an Asp on β-strand c in subsite −1 (Fig. 4A). This difference could affect the preference for glucose or xylose in subsite −1 as the Asp is making no steric hindrance for the additional CH2OH group of glucose. The same pattern was found in other GH3 enzymes (Table 6), including ExoI from barley and Xyl3B from Prevotella bryantii where substitution of Glu115 to an Asp increased catalytic activity of Xyl3B on pNP- β-Glc39. Another interesting feature is the aromatic residue on loop j in domain 2, two positions away from the potential acid/base catalyst in RmXyl3A and RmXyl3B (Fig. 4B), with the possibility to be in a position analogous to Tyr614 in RmBgl3A. This was not found in the Caldanaerobius polysaccharolyticus β-xylosidase/β-1,4-glucosidase, but comparison with structures of other GH3 enzymes, reveals an aromatic residue two positions after the acid/base catalyst and a loop j of similar length in several fungal β-glucosidases with very low sequence identity to the R. marinus β-xylosidases/β-1,4-glucosidases (see PDB: 4IIB, 5FJI, 5FJJ, 5NBS, 4DOJ and 3ZYZ), and stacking with a glucose in subsite +1 has been shown for AaBGL1 from Aspergillus aculeatus60.

The genomic context of the GH3 enzymes was informative to some extent concerning their potential role in utilization of carbohydrate substrates. The genome comparison between R. marinus DSM 4252T and 4253 showed that all the corresponding GH3 gene loci are identical. In R. marinus DSM 4252T and DSM 4253, the putative PUL consists of six GHs: the endo-1,4-β-xylanase RmXyn10A13 with putative location in the extracellular space, an uncharacterized GH10 (a family containing mainly endo-1,4-β-xylanases) and the two GH3 β-xylosidases characterized in the current study, suggesting that the cluster is involved in utilization of 1,4-β-linked xylans. The potential extracellular GH43 belongs to subfamily 15 with no characterized proteins so far. The uncharacterized GH67 is annotated as an α-glucuronidase in NCBI, suggesting that xylan substituted with glucuronic acid or methyl glucuronic acid could be utilised by the PUL. Besides glycan degrading enzymes, a canonical PUL involves a pair of susC and susD homologues and a regulator25, components which are all found in the described cluster. SusC and SusD, first described as vital components of a starch utilization system (SUS) in Bacteroides thetaiotaomicron, are co-regulated genes involved in coordinated binding, transport and degradation of carbohydrates from the extracellular space via the outer membrane into the periplasmic space24. SusC is a type of TonB-dependent receptor, an outer membrane transporter known to transport ferric chelates, but also shown to transport carbohydrates. TonB-dependent transporters require energy and three inner membrane proteins in a complex, TonB-ExbB-ExbD, for the transportation61. SusD is a lipoprotein attached to the outer cell membrane shown to bind carbohydrates. In addition, the putative regulation system identified in the cluster, called trans-envelope signalling, involving an extracytoplasmic function (ECF) σ-factor/anti-σ-factor system is found in PULs from Bacteroidetes24. Biochemical and structural characterization of the hydrolytic enzymes involved in the cluster, transcriptome analysis during growth on different substrates as well as knock-out of the genes are necessary to fully understand the mechanisms involved in the PUL described in this study.

In summary, the biochemical and structural characterisation of the GH3 enzymes from R. marinus DSM 4253 (and 4252T), shows that the six GH3 enzymes encoded in the genome have non-redundant substrate specificities which are involved in extracellular laminarin, potential macrolide degradation, as well as intracellular cellobiose to glucose conversion, the conversion of xylans, and recycling of peptidoglycans, giving significant insights into structural features important for the specificity of these enzymes as well as the organization of corresponding loci in the R. marinus genome.

Materials and Methods

Chemicals

Laminaribiose, cello-, xylo- and acetyl-chitooligosaccharides were obtained from Megazyme (Wicklow, Ireland). Sodium acetate, laminarin from Laminaria digitata, xylan from birch wood, para-nitrophenol and all para-nitrophenyl-β-d-glycosides were purchased from Sigma-Aldrich (St. Louis, Mo). All other chemicals were of molecular biology or analytical grades and purchsed from VWR International (Stockholm, Sweden).

Bacterial strains, genome sequencing and gene cluster analysis

Rhodothermus marinus strain DSM 4253 was isolated from an intertidal hot spring in Iceland, at a location close to that of the Rhodothermus marinus DSM 4252T (Type strain)6. The genome was sequenced using TrueSeq chemistry for library construction and MiSeq sequencing platform (unpublished data). Sequencing data was assembled using GS De Novo Assembler software (Roche) and annotated using the RAST annotation server at rast.nmpdr.org62. Genes, encoding enzymes of glycoside hydrolase (GH) family 3 (GH3), were identified by BlastX63 using GH3 amino acid sequences from the R. marinus type strain retrieved from the Carbohydrate-Active enZYme (CAZy) database19 (http://www.cazy.org) as query sequences. Sequence regions harbouring the GH3 genes along with flanking sequences, spanning approximately 15 kb, were extracted from the genome sequence using the Geneious molecular biology tool. The structure of corresponding loci of the GH3 genes was resolved in a genome viewer and the neighbouring genes were analysed by BLAST63.

Gene clusters were analysed in the genome of the type strain12 by annotations presented in the National Center for Biotechnology Information (NCBI) Nucleotide database, Reference Sequence (RefSeq): NC_013501.1, of genes in proximity to the six genes (Rmar_0536, Rmar_0925, Rmar_1080, Rmar_1081, Rmar_2069, and Rmar_2616) encoding the putative GH3 enzymes: RmBgl3A, RmNag3, RmXyl3A, RmXyl3B, RmBgl3B, and RmBgl3C. Additional annotations were done by compiling the information from searches in the Conserved Domain Database (CDD) from NCBI64 and in the CAZy database in the case of annotated GHs. Operons were predicted by DOOR 2.065,66 and Genome2D67. Putative promotors were predicted by PePPER68 and Rho-independent translational terminator sites were predicted by ARNold69 and DOOR 2.0. Potential co-transcription was based on predicted operons, promotors and transcription terminators. Predictions of signal peptides were done by SignalP versions 3.0 and 4.170,71,72. BlASTp30 was used to investigate poorly annotated genes and search for domains. Potential cellular location was based on prediction of signal peptides and the putative por-secretion system C-terminal sorting domain.

Sequence alignment and evolutionary relationships

From 301 characterized protein members classified in GH3 of the CAZy database, 100 with known activity were selected (Supplementary dataset file). Amino acid sequences of these proteins were retrieved mainly from the UniProt database73 and some were obtained from GenBank74 and RefSeq75 databases. All sequences of characterized GH3 proteins together with those of the six proteins from R. marinus were aligned using the programme Clustal-X76 and then the alignment was further manually fine-tuned.

Maximum likelihood phylogenetic tree77 was calculated using the PhyML algorithm78 available through T-REX server (http://www.trex.uqam.ca/79. Gamma shape parameter and proportion of invariable sites were estimated by the PhyML itself. Number of relative substitution rate categories was set to four and WAG substitution model80 was used. The starting tree was calculated using BIONJ. NNI was used for a tree improvement. Tree topology and branch lengths were optimized. Reliability of tree topologies was evaluated using the bootstrap test81 with 100 replications. For the phylogenetic analysis, only amino acid sequences of domain 1, which is universally present in all proteins from GH3, was used.

Structure homology modelling

Homology modelling of the six GH3 of R. marinus DSM 4253 was carried out using the YASARA program82,83 with default settings except templates that were manually inserted (Supplementary Table 1). For each enzyme, a BLASTp search63 in the Protein Data Bank (PDB) was done and the five enzymes with the highest score were chosen. If several structures were available, the structure without mutations, with a high resolution and relevant ligand was chosen. If the sequence identity was more than 10% units lower than the hit with the highest sequence identity, this template was not used. Only relevant ligands were kept in the structure and in some cases ligands were manually modified: in RmXyl3A and RmXyl3B into a xylose and in the case of RmNag3 into a β-N-acetyl-glucosamine before entering the homology modelling. Only ions conserved among the templates were kept in the structures. Only dimerization conserved among the templates were kept and in other cases only chain A was kept. The β-lactamase domain of RmNag3 was removed before homology modelling since no hit was found for the entire protein and no hit with a sequence identity above 30% was found for the domain alone. YASARA homology modelling generates five alignments for each template and builds a model for each alignment. A hybrid model is generated by combining the best part of the models. Each hybrid model was manually checked by superimposition and comparison of templates and other GH3 structures in Chimera84. Side chains in the active site and ligand positioning were manually changed based on conserved features in the closest related structures. The peptide bonds linking the conserved residues Lys and His on β-strand e and the following two amino acids were manually modified into cis-conformation, which are conserved within GH3. Modified models were energy minimised and then refined in YASARA. Refinement was done with default setting, only changing temperature and pH to the optima for each enzyme (Supplementary Table 4), with a 500 ps simulation of molecular dynamics with the YASARA2 force field85. During the simulation, 20 trajectories were saved, energy minimised and analysed by checking the energy of the system as well as dihedral angles, packaging1D and packaging3D. The trajectory with best overall quality was further evaluated. Evaluation of each refined structure was done manually and by several online validation tools. Superimposition in Chimera verified that the active site arrangement was kept and stabilized by the simulation. If not, modifications of the hybrid model were revised and a new version was refined. The quality of each refined structure was evaluated by average 3D-1D score generated by Verify3D86,87, Ramachandran plot generated by PROCHECK88, Z-score generated by ProSA-web89,90 and overall quality factor generated by ERRAT91. For models containing ions, refinement and evaluation with and without the ions were done. In case of a lower quality with the ions, indicated by ERRAT, the homology modelling and refinement were rerun after deletion of the ion(s) in the templates.

Cloning and expression of GH3 genes

R. marinus DSM 4253 was grown in 500 mL of DIFCO™ Marine Broth (BD, NJ, USA) at 65 °C for 12 hours. The cell pellet was harvested by centrifugation and washed twice with 200 mM sodium phosphate buffer pH 7. Genomic DNA was extracted using ZymoBead™ Genomic DNA Kit (Zymo Research, CA, U.S.A). The GH3 encoding genes, without signal peptide, were amplified using the primers listed (Supplementary Table 2). A C-terminal His-tag was introduced in the primer-designs, except for Rmar_2069. Genes digested with NdeI and BglII were ligated into pJOE307592, a non-commercial vector and transformed to Escherichia coli BL21 C43. Rmar_2069 was digested by NdeI and HindIII, propagated in vector pUC19 followed by sub-cloning into pET-21b(+) (Novagen, Madison, WI) in frame with the C-terminal His-tag and transformed to E. coli BL21 C43. Expression was performed at 0.5 L cultivation scale in Erlenmeyer flasks at 37 °C in LB containing 100 µg/mL ampicillin. After reaching an optical density at 620 nm of 0.5, gene expression was induced by 1 mM IPTG for the pET-construct and 0.2% (w/v) L-rhamnose for pJOE3075-constructs92. Production was continued at 30 °C over-night. Cells were harvested by centrifugation at 5000 × g for 15 min at 4 °C and washed twice with 20 mM sodium phosphate buffer pH 7.0.

Purification

Cell pellets were resuspended in binding buffer (20 mM sodium phosphate buffer, 500 mM NaCl, 20 mM imidazole, pH 7.4), and lysed by sonication 5 × 3 min, at 60% amplitude and a cycle of 0.5 using a 14-mm titanium probe (UP400 S; Hielscher Ultrasonic GmbH, Teltow, Germany). For Rmar_0925 encoding the RmNag3, the binding buffer contained 20 mM Tris-HCl instead of sodium phosphate buffer Cell debris was removed by centrifugation (14000 × g, 20 min, 4 °C), prior to purification by nickel affinity chromatography using an ÄKTATM start system (GE Healthcare) with a HisTrap FF crude 1 mL column (GE Healthcare). Bound protein was eluted using gradient of imidazole 20–500 mM and fractions of 1 mL were collected. All purified proteins were stored at 4 °C.

Hydrolysis of para-nitrophenyl glycosides

Enzyme catalysed hydrolysis of para-nitrophenyl (pNP)-linked substrates was assayed spectrophotometrically at 405 nm using a UV-1650PC spectrophotometer (Shimadzu, Kyoto, Japan) connected to a JulaboMB (Labortechnik GMBH, Germany) temperature-controlled system at 60 °C. Final reaction volume was 600 µL and contained 1 mM substrate dissolved in 20 mM citrate phosphate buffer at pH 5.6. The reaction was initiated by adding 0.04–0.2 µM of enzyme to the pre-incubated reaction solution and monitored for 5 min. 1 U equals the amount of enzyme required to release 1 µmol pNP min−1. The extinction coefficient for pNP at 60 °C is 1426 M−1cm−1 and 18072 M−1cm−1 at pH 5.6 and 6.0 respectively. For kinetic parameters, substrate concentrations were 0.05–20 mM, at reaction conditions and enzyme concentration as above except for RmNg3. Additional the kinetic assays for RmNg3 were performed at pH 6.0 using 50 mM HEPES buffer, while in the presence of phosphate the buffer was supplemented with 50 mM, 100 mM and 200 mM of sodium phosphate. Each reaction was monitored for 10 min and KM, Vmax and Ki values were calculated from GraphPad Prism V6. For RmNag3, the kinetic parameters for determining phosphorolytic activity were obtained by using KinTek Explorer. The pH and temperature optima were determined in the pH-range 3.0–6.0 (50 mM sodium citrate phosphate buffer), and 7.0–8.0 (50 mM sodium phosphate buffer) and at 40–90 °C in assays with a protein concentration of 0.04–0.1 µM and 1 mM of pNP-β-Glc or pNP-β-Xyl. All reactions were run in triplicates.

Hydrolysis of oligosaccharides

Oligosaccharides (10 mg/mL) with a degree of polymerization (DP) of 2 (cellobiose, laminaribiose and xylobiose) and DP 6 (cellohexaose and xylohexaose) in 500 µl, 20 mM citrate phosphate buffer, pH 5.6 were incubated with 5–10 µg/mL enzyme in duplicate reactions in a ThermoMixer (HLC Biotech, Bovenden, Germany) at 60 °C, 600 rpm. Samples (30 µL) were withdrawn at 20 min intervals for 120 min, and diluted with 970 µL 0.5 mM NaOH before analysis on high-performance anion exchange chromatography with pulsed amperometric detection (HPAEC-PAD) (Thermo Fisher Scientific, Waltham, USA), using a CarboPac PA200 column (250 mm × 3 mm, 5.5 µm) and a guard column (50 mm × 3 mm) (Thermo Fisher Scientific, Waltham, USA) of the same material. A mobile phase of 100 mM NaOH at 0.5 mL/min and a linear gradient of sodium acetate (0–120 mM) was used for 25 min. Standards included glucose, cellooligosaccharides (DP 2 to 6), xylose and xylooligosaccharides (DP 2 to 6). An identical method was applied to analyse hydrolysis of acetylated chitooligosaccharides, with N-acetylated chitooligosaccharides (DP 2 to 5) and glucosamine as standards.