Mining for novel cyclomaltodextrin glucanotransferases unravels the carbohydrate metabolism pathway via cyclodextrins in Thermoanaerobacterales

Carbohydrate metabolism via cyclodextrins (CM-CD) is an uncommon starch-converting pathway that thoroughly depends on extracellular cyclomaltodextrin glucanotransferases (CGTases) to transform the surrounding starch substrate to α-(1,4)-linked oligosaccharides and cyclodextrins (CDs). The CM-CD pathway has emerged as a convenient microbial adaptation to thrive under extreme temperatures, as CDs are functional amphipathic toroids with higher heat-resistant values than linear dextrins. Nevertheless, although the CM-CD pathway has been described in a few mesophilic bacteria and archaea, it remains obscure in extremely thermophilic prokaryotes (Topt ≥ 70 °C). Here, a new monophyletic group of CGTases with an exceptional three-domain ABC architecture was detected by (meta)genome mining of extremely thermophilic Thermoanaerobacterales living in a wide variety of hot starch-poor environments on Earth. Functional studies of a representative member, CldA, showed a maximum activity in a thermoacidophilic range (pH 4.0 and 80 °C) with remarkable product diversification that yielded a mixture of α:β:γ-CDs (34:62:4) from soluble starch, as well as G3–G7 linear dextrins and fermentable sugars as the primary products. Together, comparative genomics and predictive functional analysis, combined with data of the functionally characterized key proteins of the gene clusters encoding CGTases, revealed the CM-CD pathway in Thermoanaerobacterales and showed that it is involved in the synthesis, transportation, degradation, and metabolic assimilation of CDs.

, Carboxydocella sp. 53 , and Thermoanaerobacterium thermosulfurigenes 54 . Moreover, since the identification of CGTases for structure-function relationship studies has also been the central focus over the years, their functional role in a putative CM-CD pathway for extremely thermophilic bacteria remains obscure. In this work, a novel group of CGTases from GH13_2 with an exceptional three-domain ABC architecture was detected by (meta)genome mining of microbial communities living in a wide variety of hot environments on Earth. Sequence analysis revealed that this group of CGTases belongs to the extremophilic Thermoanaerobacterales Caldanaerobacter subterraneus ssp., and Thermoanaerobacter spp. and shares ≤ 46% sequence identity with the CGTases characterized thus far. Sequence and comparative genomic analysis also showed that the three-domain ABC CGTase-encoding genes are exceptionally grouped in unrevealed gene clusters that encode the entire CM-CD pathway and several important proteins for prokaryotic cell functions. Together, functional studies of a representative member, CldA, combined with phylogenetic analysis revealed a new evolutionary path among CGTases and shed light on a nonclassical pathway for starch metabolism in Thermoanaerobacterales.

Results
Database mining for novel thermophilic CGTase enzymes. To identify putative CGTases involved in the CM-CD pathway of extremely thermophilic bacteria, a database mining approach was applied to ~ 130 public metagenomes of microbial communities from diverse thermophilic environments (Tables S1 and S2). Notably, a low number of putative CGTases were detected (14 hits in total; Table S1), which seems to be related to the rarity of the CM-CD pathway in extremely thermophilic bacteria living in starch-poor environments. Nevertheless, a CGTase-encoding gene (cldA) from Obsidian Pool hot spring metagenomic data at Yellowstone National Park was distinguished (Tables S1 and S2). Sequence analysis revealed that CldA consists of 524 residues and shares ≤ 42% sequence identity (100% query coverage) with the 51 characterized enzymes from GH13_2. A BLAST search in the nonredundant GenBank database revealed another three CldA-like sequences that share 98% average sequence identity with CldA (100% query coverage) and are annotated as hypothetical glycosidase/α-amylase enzymes in eight available genomes from several Thermoanaerobacterales subspecies of G+ thermophilic Caldanaerobacter subterraneus (Table S3). Although C. subterraneus subspecies (Topt of 60–85 °C) are found in various extremophilic environments 55–57, they natively live in the Obsidian Pool hot spring at Yellowstone National Park 58. Sequence analysis also revealed that CldA exhibits a 21-residue N-terminal signal peptide, MRKNFKAFVALFAAILLFFSGC (residues 1–22), which contains a positively charged tail, RKNFK (residues 2–6), followed by a hydrophobic core region that ends with the conserved Cys22 typical for the cleavage site of signal peptidase type II (SPII) 59. In agreement with this observation, the extracellular glycoside hydrolases of the GH13 family from G+ bacteria are translocated from the cytoplasmic membrane through the general secretion (Sec) system 60,61.
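The signal-peptide features described above can be checked directly on the quoted N-terminal sequence. The following is a minimal sketch, not the prediction method used in the study (which is not specified here): the function name `lipoprotein_signal_summary` and its `tail_end` parameter are hypothetical, and it assumes that SPII cleaves immediately before the conserved cysteine, so that the signal peptide length equals the cysteine position minus one.

```python
def lipoprotein_signal_summary(peptide: str, tail_end: int = 6) -> dict:
    """Summarize N-terminal features of a peptide that ends at, or shortly
    after, the conserved cysteine of a putative SPII cleavage site.

    Assumption (not from the source): cleavage occurs immediately before
    the last Cys in the supplied peptide.
    """
    cys_index = peptide.rfind("C")      # 0-based index of the last Cys
    if cys_index < 0:
        raise ValueError("no cysteine found in peptide")
    cys_position = cys_index + 1        # 1-based residue number
    tail = peptide[1:tail_end]          # residues 2..tail_end (after Met1)
    positive = sum(1 for aa in tail if aa in "KR")  # Lys/Arg count
    return {
        "cys_position": cys_position,
        "signal_peptide_length": cys_position - 1,
        "tail": tail,
        "positive_in_tail": positive,
    }


# N-terminal sequence of CldA as quoted in the text.
CLDA_NTERM = "MRKNFKAFVALFAAILLFFSGC"
summary = lipoprotein_signal_summary(CLDA_NTERM)
print(summary)
```

For the CldA sequence, the sketch places the cysteine at position 22 and reports a 21-residue signal peptide with three positively charged residues in the RKNFK tail, in line with the description in the text.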
Figure 1 (caption, partially recovered). … , and four-domain ABCE CBM20 CGTases (red), which are recognized by CAZy. Note that the novel group of 19 CGTases, (CldA/ThmA)-like enzymes from thermophilic C. subterraneus ssp. and Thermoanaerobacter spp., showed a three-domain ABC architecture (magenta). (B) Multiple amino acid sequence alignment of CGTases from GH13_2 with a conventional five-domain ABCDE CBM20 (blue), five-domain ABCDE arch (orange), four-domain ABCE CBM20 (red), and three-domain ABC distribution (magenta), as well as maltogenic starch-acting enzymes (white). Note the CSR I-VII motifs showing functionally critical residues (asterisk) for the GH13 family. The underline indicates the conserved acidic catalytic triad Asp x, Glu y, and Asp z from CSR II, III, and IV, respectively. The conserved aromatic central Tyr/Phe residue (green sphere) and the hydrophobic pair (Phe/Trp/Tyr)/(Phe/Tyr/Met) (H1 and H2 shadow boxes), which are essential for the cyclization activity of CGTases and distinguish them from α-amylases, are also shown 13,30,31.

Because CldA and CldA-like enzymes displayed an unusual short-form sequence compared to conventional five-domain CGTases (Figs. 1A and S1), a functional domain analysis was conducted. Remark- … 1B), including the highly conserved catalytic triad Asp250, Glu279, and Asp351 from CSR II, III, and IV, respectively (Fig. 1B), which is involved in glycoside bond cleavage 18. Furthermore, both the conserved aromatic central Phe216 residue from CSR V (which is usually replaced by a nonaromatic residue in α-amylases) and the pair of hydrophobic residues Trp204/Met281, which are critical in sugar chain circularization for CD formation, were observed (Fig. 1B) 31,62. Interestingly, while Met281 belongs to CSR III, Trp204 is found in a GSISNWN motif (residues 199–205). Although CldA was found in G+ bacteria, both the GSISNWN and CSR VI motifs were observed in CGTases from archaea and G− bacteria (Fig. 1B).
Hence, the presence of these unique three-domain ABC CGTases in the Caldanaerobacter genus (Table S3) also suggests a putative CM-CD pathway for starch metabolism.
Functional characterization of CldA. The recombinant CldA enzyme was successfully produced in Escherichia coli to evaluate CGTase activity. The mature form of CldA consists of 511 residues with a calculated molecular mass of 58.4 kDa, including a C-terminal His6-tag sequence without the N-terminal signal peptide. Protein purification was performed by a heat treatment procedure and nickel-affinity chromatography followed by size-exclusion chromatography (SEC)-dynamic light scattering (DLS) coupled experiments (Fig. S2A), resulting in a purification yield of ~ 45 mg CldA from 1 L of culture. Purified recombinant CldA showed a molecular mass of 58.5 kDa in the SEC-DLS analysis with an optimal monodispersity (Mw/Mn = 1.02), showing that the biological assembly is monomeric (Fig. S2A). CldA also showed a molecular mass of ~ 58 kDa on SDS-PAGE (Fig. S2B) and a theoretical isoelectric point (pI) of 5.7. CldA displayed cyclization activity over a broad range of temperatures from 40 to 100 °C and pH values from 4 to 8 (Fig. 2A), using soluble starch as the substrate. Furthermore, CldA reached more than 65% relative cyclization activity at acidic pH (4–5) and high temperatures (70–90 °C) (Fig. 2A). CldA also displayed a half-life (t1/2) of 25.5 min at 80 °C and extraordinary thermostability at 70 °C (t1/2 = 63.4 h) (Fig. S3). CD production was monitored over time by incubating CldA with 50 g L−1 soluble starch at 75 °C and pH 4. The production of α-, β-, and γ-CDs increased over time, achieving the maximum yield of total CDs (2.72 ± 0.06 g L−1) after 2 h of incubation (Figs. 2B and S5). The proportion of α- and β-CDs (34:62) was relatively conserved over time with minor γ-CD production (Fig. 2B,C), revealing that the CldA enzyme is a β-CGTase. Nevertheless, while CldA displayed a specific β-cyclization activity of 51.26 ± 6.3 U mg−1, it exhibited an unusually high hydrolytic activity of 405.40 ± 5.4 U mg−1.
Consistent with this high hydrolytic activity, the primary products of CldA were those of soluble starch hydrolysis, namely linear oligosaccharides with different degrees of polymerization (G3-G7) and the fermentable sugars maltose (G2) and glucose (G1) (Figs. 2C and S4). All products synthesized by the action of CldA from soluble starch were confirmed by HPLC and mass spectrometry analysis (Figs. S4 and S5).
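The reported stability and activity figures can be put on a common scale with a few lines of arithmetic. The sketch below is illustrative only: it assumes simple first-order thermal inactivation (a model not stated in the text), and the function and variable names are hypothetical. It derives the inactivation rate constant from each half-life, the ratio of hydrolytic to β-cyclization specific activity, and the fraction of the starch substrate recovered as total CDs.

```python
import math


def rate_constant(half_life: float) -> float:
    """First-order inactivation constant k = ln(2) / t_half.
    Assumption: inactivation follows first-order kinetics."""
    return math.log(2) / half_life


def residual_activity(half_life: float, t: float) -> float:
    """Fraction of activity remaining after time t (same units as half_life),
    under the same first-order assumption."""
    return 0.5 ** (t / half_life)


# Values reported in the text.
T_HALF_80C_MIN = 25.5          # half-life at 80 °C, in minutes
T_HALF_70C_MIN = 63.4 * 60     # half-life at 70 °C, converted from hours
HYDROLYSIS_U_MG = 405.40       # hydrolytic specific activity
CYCLIZATION_U_MG = 51.26       # β-cyclization specific activity
TOTAL_CD_G_L = 2.72            # maximum total CD yield
STARCH_G_L = 50.0              # soluble starch in the assay

k_80 = rate_constant(T_HALF_80C_MIN)                    # per minute
stability_ratio = T_HALF_70C_MIN / T_HALF_80C_MIN       # 70 °C vs 80 °C
activity_ratio = HYDROLYSIS_U_MG / CYCLIZATION_U_MG     # hydrolysis : cyclization
cd_conversion_pct = 100 * TOTAL_CD_G_L / STARCH_G_L     # % (w/w) starch to CDs

print(f"k at 80 °C: {k_80:.4f} per min")
print(f"half-life ratio (70 °C / 80 °C): {stability_ratio:.0f}")
print(f"hydrolysis / cyclization: {activity_ratio:.1f}")
print(f"starch converted to CDs: {cd_conversion_pct:.2f}%")
```

On these numbers, hydrolysis outpaces β-cyclization by roughly eightfold and about 5% (w/w) of the starch ends up as CDs, which is consistent with the observation that hydrolysis products dominate.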

Discovery of a novel group of three-domain ABC CGTases.
To identify additional three-domain ABC CGTases, a database mining approach was also applied to ~ 30 public metagenomes of microbial communities from the Obsidian Pool hot spring (Table S2), using the CldA sequence as a template. The database mining approach revealed a homologous CGTase-encoding gene (thmA) that encodes a 526-residue CGTase (Table S3) sharing 80% sequence identity with CldA (100% query coverage) (Fig. S1). Functional domain analysis showed that ThmA is a three-domain ABC CGTase exhibiting the highly conserved Asp252/Glu281/Asp353 catalytic triad, the conserved aromatic central Phe218, and the pair of hydrophobic residues Trp206/Met283 (Fig. 1B). A BLAST search in the GenBank database of the ThmA enzyme showed 100% sequence identity with a putative glycosidase from Thermoanaerobacter ethanolicus. Furthermore, 14 putative ThmA-like sequences encoded in 16 genomes from several Thermoanaerobacterales subspecies of G+ thermophilic Thermoanaerobacter spp. were also found (Table S3). A subsequent BLAST search in the GenBank database confirmed that the 19 three-domain ABC (CldA/ThmA)-like CGTases (Table S3) belong to C. subterraneus ssp. and Thermoanaerobacter spp., respectively. Furthermore, CldA and ThmA share only 38% average sequence identity with three characterized five-domain ABCDE CBM20 CGTases (100% query coverage for ABC domains) from Thermoanaerobacter spp. 52, confirming that both three-domain CldA/ThmA CGTases are not truncated forms of conventional five-domain CGTases. Accordingly, to determine the evolutionary relationship between this novel group of three-domain CGTases and all characterized CGTases from GH13_2, a phylogenetic analysis was conducted, including seven α-amylases from GH13 as an outgroup. The analysis showed that the CGTases were distributed in five phylogenetic groups that presented a bootstrap value of 100% (Fig. 3).
The four-domain ABCE CBM20 CGTases from G− bacteria, five-domain ABCDE arch CGTases from archaea, and conventional five-domain ABCDE CBM20 CGTases from the well-studied G+ Bacilli class bacteria were observed in three different clades. Nevertheless, it has been shown that the five-domain ABCDE CBM20 configuration is not unique to CGTases from G+ bacteria, as it has also been observed in the thermophilic CGTase from the archaeon Thermococcus sp. B1001 and the halophilic CGTase from the archaeon Haloferax mediterranei. A fourth clade comprises maltogenic starch-acting enzymes from GH13_2, whose sequence and structural homology with CGTases has been described elsewhere 63,64. Notably, the 19 three-domain ABC (CldA/ThmA)-like CGTases were clustered together in a fifth new monophyletic group that is well supported by a bootstrap value of 100%, revealing a novel group of CGTases that is separated from the four conventional GH13_2 clades (Fig. 3). Identical phylogenetic results were obtained using the full amino acid sequence (Fig. 3) or solely the amino acid sequence of the minimal functional core ABC (Fig. S6) for all sequences analyzed.
Database mining for the CM-CD pathway in Thermoanaerobacterales. Because C. subterraneus is the only species formally recognized from the Caldanaerobacter genus 55,65, the eight publicly assembled and draft genomes from the four subspecies of C. subterraneus ssp. were examined (Table S3), focusing on the gene … Table S4). Sequence analysis of the cld gene cluster predicts several proteins of the CM-CD pathway: a putative type I ATP-dependent ABC transporter system, MdxEFG (CldEFG), with the cldEFG gene cassette located immediately downstream of the cldA-like-encoding gene, as well as the three cytoplasmic enzymes CDase, GP, and a glucoamylase from GH15 (GA, EC 3.2.1.3). Predictive functional analysis showed that the cldE-encoding gene from the cldEFG gene cassette encodes a periplasmic MdxE cyclo/maltodextrin-binding protein that shares 40% average sequence identity (100% query coverage) with the MdxE proteins from G+ Thermoactinomyces vulgaris (TvuCMBP, PDB ID: 2DFZ 47) and G+ Alicyclobacillus acidocaldarius (MalE) 46. Sequence analysis also revealed that CldE exhibits a 24-residue N-terminal signal peptide, MKKYSKILALLTAMVFVLSIALTGCG (residues 1–26), containing the conserved Cys25, which is essential to anchor the MdxE proteins from G+ bacteria and archaea to the cytoplasmic membrane outer surface via an N-terminal lipid moiety that is covalently bound to the Cys residue 67. The cldFG-encoding genes from the cldEFG gene cassette (Fig. 4, Table S4) encode two putative ABC transporter permease subunits, CldF and CldG, that share 40% average sequence identity (100% query coverage) with the CymFG/CgtDE/YvfL-YvfM/MalFG permease subunits from the MdxEFG transporter system of K. oxytoca 40, Thermococcus sp. 45, B. subtilis 41, and A. acidocaldarius 46, respectively.
The putative CDase encoded in the cld gene cluster shares 88% sequence identity with the functionally characterized CDase from Thermoanaerobacter thermohydrosulfuricus (NCBI ID: AAA23219.1), which hydrolyzes CDs to yield maltodextrins G2 and G1 68. Thus, while CDase linearizes CDs in the cytoplasm, the resulting dextrins are converted mainly into G1/G1P by the GA/GP enzymes encoded in the cld gene cluster (Fig. 4, Table S4). Both GA and GP enzymes have been functionally characterized elsewhere 69,70. Furthermore, several proteins of the EMP pathway from C. subterraneus ssp., such as phosphoglucose isomerase (Pgi, EC 5.3.1.9), 6-phosphofructokinase (PfkA, EC 2.7.1.11), and the functionally characterized pyruvate kinase (PykF; EC 2.7.1.40) 71, were also found in the cld gene cluster (Fig. 4, Table S4). Similarly, the genomes of all Thermoanaerobacterales were also tested using an expanded cross-family search algorithm to identify additional CM-CD-encoding gene clusters. Remarkably, two gene clusters (thm and thb) involved in the CM-CD pathway were also identified in the assembled genomes from Thermoanaerobacter spp. and Thermoanaerobacterium spp., respectively (Fig. 4, Table S4). Sequence analysis of the thm and thb gene clusters predicts several proteins of the CM-CD pathway: a putative type I ATP-dependent ABC transporter system, MdxEFG (CldEFG), and the three cytoplasmic enzymes CDase, GP, and GA. Nevertheless, while the thm gene cluster contains three-domain ABC ThmA-like CGTases, the thb gene cluster contains a conventional five-domain ABCDE CBM20 CGTase. In addition, although the Pgi-encoding gene was absent in the thb gene cluster, the critical enzymes for the EMP pathway were encoded in both the thm and thb gene clusters (Fig. 4, Table S4).

Figure 4 (caption, partially recovered). … (5) and CDP (6) from G− K. oxytoca (cym) are also blue; the putative msmX-encoding gene is not included. (iii) Degradation: CDase (7), GA (8), and GP (9) in green. (iv) Metabolic assimilation: Pgi (10), PfkA (11), and PykF (12) in orange. AmyB (33) and the AmyEDC transporter system (34–36) from Thermoanaerobacterium spp. (thb), and the putative transcriptional regulator of the ABC transporter system from cym/cyc (37–38) are shown. Note the five groups of protein-encoding genes that are essential for several prokaryotic cell functions: (i) HPr (13), PolIIIα (25), and the CBS domain/Bateman module (24) for carbon catabolite regulation, bacterial genome replication, and sensing cellular energy status, metal ion concentration, and ionic strength. …
Remarkably, sequence analysis of the cld/thm/thb gene clusters also revealed the presence of 18 protein-encoding genes that are essential for prokaryotic cell functions (Fig. 4, Table S4), such as the functionally and structurally characterized phosphotransferase HPr (PDB ID: 3LE5), which is a key enzyme for carbon catabolite regulation in C. subterraneus ssp. tengcongensis 72, Thermoanaerobacter spp. 73, and Thermoanaerobacterium spp. 74, as well as a DNA polymerase III (PolIIIα, EC 2.7.7.7) responsible for bacterial genome replication 75, which is preceded by a putative CBS domain/Bateman module involved in sensing cellular energy status, metal ion concentration, and ionic strength 76,77. The second group of putative proteins of the cld/thm/thb gene clusters is involved in cell wall biogenesis, sporulation, and cell division: (i) UDP-N-acetylmuramate dehydrogenase (MurB, EC 1.3.1.98) is involved in the biosynthesis of bacterial cell wall peptidoglycan 78, (ii) histidinol phosphatase (PHP) is required in the phosphorelay system to regulate the biosynthesis of cell wall-associated polysaccharides 79, (iii) the RapZ regulator is implicated in the RNA-mediated regulatory network of glucosamine biogenesis 80, (iv) the transmembrane RodZ protein is a key protein in cell elongation (elongasome) and cell division 81,82, and (v) the sporulation transcriptional regulator WhiA regulates cell differentiation 83,84. The third group of proteins is essential for oxidative stress defense, degradation of aromatic compounds, and fatty acid metabolism: (i) the functionally characterized feruloyl esterase (EC 3.1.1.73) from C. subterraneus ssp.
tengcongensis, which can hydrolyze esterified phenolic acids from xylan and pectin 85, (ii) 2-phospho-l-lactate transferase (EC 2.7.8.28) involved in the biosynthesis of redox coenzyme F420, which is important for the redox transformations of cell wall lipids, degradation of aromatic/xenobiotic compounds, and neutralization of oxidative and nitrosative stress 86,87, (iii) the two components E1 (activator) and E2 (dehydratase) of the enzyme system (R)-2-hydroxyglutaryl-CoA dehydratase (EC 4.2.1.167), which is involved in glutamate metabolism via butyrate fermentation in G+ bacteria 88, and (iv) putative 4-hydroxybenzoyl-CoA thioesterase, which can hydrolyze fatty acyl-CoA thioesters 89. The fourth group of putative proteins is implicated in amino acid metabolism: (i) signal-transducing protein PII involved in the regulation of nitrogen metabolism via the glutamine/glutamate cycle 90, (ii) methylenetetrahydrofolate reductase (EC 1.5.1.20) and methionine synthase (EC 2.1.1.13), which are both involved in methionine biosynthesis via methyltetrahydrofolate (methyl-THF), and (iii) tripeptide aminopeptidase T (PepT; EC 3.4.11.4), which is preceded by its anaerobic transcriptional activator fnr 91 and is only included in the cld/thm gene clusters. Finally, the putative tRNA (m5U54) methyltransferase (EC 2.1.1.190) and a multiantimicrobial extrusion protein (MATE), which might be involved in tRNA maturation and detoxification, respectively 92–94, are also encoded in the cld/thm/thb gene clusters. Although G− K. oxytoca, archaea Thermococcus sp., and G+ B. subtilis arranged the proteins involved in the CM-CD pathway in three similar gene clusters, cym, cgt, and cyc, respectively (Fig. 4), none of the latter protein-encoding genes for prokaryotic cell functions and the proteins for the EMP pathway are encoded near their CM-CD gene clusters. The proteins encoded in the cld/thm gene clusters (Fig. 4) are shown in Table S4.

Discussion
Traditionally, the five-domain ABCDE CBM20 organization has been considered the central architecture of CGTases, with only a few exceptions, namely the five-domain ABCDE arch CGTases from archaea and the four-domain ABCE CBM20 CGTases from G− bacteria, highlighting the recurrence of both the ABC core structure and the E CBM20/E arch domain in the overall CGTase fold. Here, a database mining approach allowed the identification of a novel group of three-domain ABC (CldA/ThmA)-like CGTases from G+ thermophilic C. subterraneus ssp. and Thermoanaerobacter spp., respectively, which exhibit a unique CGTase domain distribution that is different from that seen in all other CGTases characterized thus far (Fig. 1A). Notably, although the (CldA/ThmA)-like enzymes displayed a distinctive active site for CGTases with the presence of all CSR I-VII motifs from the GH13 family (Fig. 1B), the three-domain ABC architecture is not commonly associated with conventional CGTases. The functional characterization of a representative member, the three-domain ABC CldA, revealed that although β-CD is synthesized as the major cyclization product from the starch substrate under the assay conditions, cyclization does not appear to be the main activity of the enzyme (Fig. 2). Accordingly, the production of fermentable sugars, dextrins, and functional CDs from the starch substrate by the action of extracellular (CldA/ThmA)-like CGTases seems to be a reasonable adaptation to diversify products and increase the probability of survival in extremely hot environments with low starch and nutrient concentrations. As with the CldA enzyme, increased hydrolytic and decreased cyclization products have been observed for several CGTases from archaea and thermophilic bacteria 36,54.
The identification of this novel group of enzymes showed for the first time that the three-domain ABC organization represents the minimal functional core structure for CGTases and confirmed previous studies suggesting that the C-terminal region of CGTases has been acquired through evolutionary processes 15,35,95. Indeed, while the raw starch-binding E CBM20 domain is observed in several GH families 24,96,97, both the E arch domain with an unknown structure-function relationship and the connecting D domain are unique to CGTases 15,35,95. Interestingly, the three-domain CGTases clustered together in a new monophyletic group that diverged as a novel evolutionary path among conventional CGTases. Hence, while the four-domain CGTases from G− bacteria separated early from the rest of the CGTases, the three-domain CGTases and both groups of five-domain CGTases diverged later from a common ancestor. This observation also indicates that three-domain CGTases are not truncated forms of either of the two groups of five-domain CGTases, and the minimal ABC framework of the (CldA/ThmA)-like enzymes from Thermoanaerobacterales is not the common ancestor of all CGTases (Fig. 3). In addition to the phylogenetic analysis, the presence of this novel group of three-domain CGTases suggests a role in starch metabolism. Nevertheless, Thermoanaerobacterales are obligate anaerobic Clostridia class bacteria with low genomic G + C content capable of thriving in various hot environments on Earth, such as geothermal fields, submarine hydrothermal vents, and oil reservoirs 57,98, which are expected to be starch-poor environments. Consequently, genomic gene clustering analysis against 246 Thermoanaerobacterales genomes allowed the identification of only three gene clusters involved in the CM-CD pathway, cld, thm, and thb, from the Thermoanaerobacteraceae family (C. subterraneus ssp. and Thermoanaerobacter spp.)
and from Thermoanaerobacterales family III (Thermoanaerobacterium spp.), respectively, confirming the rarity of the pathway. Thus, while the three-domain (CldA/ThmA)-like-encoding genes belong to the cld and thm gene clusters, respectively, the thb gene cluster contains a conventional five-domain CGTase-encoding gene (Fig. 4). Based on comparisons with G− K. oxytoca, archaea Thermococcus sp., and G+ B. subtilis, which arranged the proteins involved in the CM-CD pathway in three similar gene clusters, cym, cgt, and cyc, respectively (Fig. 4), the first step of the CM-CD pathway in Thermoanaerobacterales involves converting the surrounding starch substrate to CDs, catalyzed by secreted three- and five-domain CGTases (Fig. 5). As previously established by X-ray crystallography studies, the resulting CDs are then internalized into the periplasm by a transmembrane β-barrel CDP in G− K. oxytoca (CymA, PDB ID: 4V3G), which mediates the passive diffusion of CDs through the perturbation of electrostatic interactions of the N-terminal region with the β-barrel wall of CDP. Therefore, the 15 N-terminal residues of CymA are expelled from the barrel through a ligand-expelled gate mechanism, allowing the diffusion of CDs into the periplasmic space 43. As expected, owing to the differences in the cell wall composition between G+ and G− bacteria, the outer-membrane translocation of CDs in G+ bacteria remains uncertain, as no putative CDP was detected in the extensive data mining analysis using the CymA sequence. However, sequence analysis revealed that the putative MdxEFG transporter system, CldEFG, which is present in all three cld/thm/thb gene clusters (Fig. 4), appears to translocate cyclo/maltodextrin molecules through the peptidoglycan layer and subsequently internalize them into the cytoplasm (Fig. 5). Similar MdxEFG transporter systems, which translocate cyclo/maltodextrin molecules into the cytoplasm, have been described in G− K.
oxytoca (CymEFGD) 40 , archaea Thermococcus sp. (CgtCDE) 45 , G+ B. subtilis (CycB-YvfL-YvfM) 41 , and A. acidocaldarius (MalEFG) 46 (Fig. 5). Accordingly, translocation through the MdxEFG transporter system initiates when MdxE binds the cyclo/maltodextrin molecules synthesized by CGTases. The crystal structure of the cyclo/maltodextrin-binding protein MdxE, TvuCMBP, showed the classical architecture of bacterial sugar-binding proteins, consisting of two domains that are joined by a hinge region, which surrounds a sugar-binding site located at a cleft formed by the two domains 47 . Hence, TvuCMBP binds cyclo/maltodextrin molecules and undergoes substantial conformational changes to transit from the open to the sugar transporter closed conformation to release them into a transmembrane protein complex composed of the two putative permease subunits MdxF and MdxG. Notably, it has also been shown that MdxE, MalE, from G+ A. acidocaldarius is anchored to the cytoplasmic membrane outer surface via a lipid moiety that is covalently bound to an N-terminal cysteine residue, so it can be distributed throughout the cell wall to scavenge the surrounding cyclo/maltodextrin molecules that are synthesized by CGTases to release them into the MdxFG system 46 . In contrast, the cyclo/maltodextrin-binding protein CymE from G− K. oxytoca is an untethered component of the periplasmic space that binds the cyclo/maltodextrin molecules diffused through the transmembrane CDP to release them into the MdxFG system 40 (Fig. 5). Owing to modifications in the cell wall composition and the absence of a transmembrane CDP in G+ bacteria, differences between MdxE proteins from G+ and G− bacteria are typical features that distinguish sugar-binding proteins from type I ATP-dependent ABC transporter systems 44 . 
Thus, because CldE also includes the N-terminal Cys25 residue that covalently binds to a lipid moiety for anchoring to the cytoplasmic membrane outer surface, translocation through the CldEFG transporter system encoded in the cld/thm/thb gene clusters appears similar to the translocation mechanism of the MdxEFG transporter system, MalEFG, from G+ A. acidocaldarius (Fig. 5). In the next step, cyclo/maltodextrin translocation into the cytoplasm occurs through a conformational change of the two permease subunits MdxFG triggered by the ATPase activity of MdxX/MsmX. Accordingly, the MdxEFG-X transporter system from G− K. oxytoca includes a dedicated intracellular pair of ATP-binding components encoded in the same cym gene cluster by mdxX (CymD) (Fig. 4), which is coupled to the two permease subunits CymFG 37,40 (Fig. 5). In contrast, the CgtCDE, CycB-YvfL-YvfM, and MalEFG transporter systems include a promiscuous MsmX ATPase with the same function as MdxX but exhibiting different nonspecific hydrophobic interactions with several transmembrane complexes, promiscuously energizing multiple sugar importers 48,99 (Fig. 5). The latter observation is quite common in various carbohydrate ABC transporter systems from G+ bacteria 99 . Notably, additional data mining analysis revealed that C. subterraneus ssp., Thermoanaerobacter spp., and Thermoanaerobacterium spp. encoded a putative MsmX ATPase (NCBI ID: WP_011026113.1, WP_003866589.1, and WP_015311043.1, respectively) (Table S4), which completes the putative type I ATP-dependent ABC transporter system, CldEFG-MsmX, from the Thermoanaerobacterales order (Fig. 5). As expected, the msmX-encoding gene is distally located from the cld/thm/thb gene clusters and shares 64% sequence identity with the functionally and structurally characterized MsmX from B. subtilis (NCBI ID: WP_003242648.1, PDB ID: 6YIR) 100 . The following step of the CM-CD pathway involves several enzymes encoded in the cld/thm/thb gene clusters (Fig. 
4, Table S4), which are essential for the cleavage and degradation of CDs in the cytoplasm through the EMP pathway (Fig. 5). Thus, while the linearization of CDs by CDase produces G1 and G2 molecules for the EMP pathway, dextrins (Gn, n > 3) are converted into either G1 or G1P (with the release of a Gn−1 dextrin) by the action of the GA and GP enzymes, respectively. Both G1 and G1P molecules could be converted into G6P by the action of ADP-dependent hexokinase (HK) and phosphoglucomutase (Pgm), respectively, to also be metabolized through the EMP pathway (Fig. 5). Furthermore, since the putative Pgi and PfkA enzymes and the functionally characterized PykF 71 of the EMP pathway are encoded exceptionally near the protein-encoding genes for (CldA/ThmA)-like CGTases, the CldEFG transporter system, CDase, GP, and GA enzymes (Fig. 4, Table S4), the entire CM-CD pathway from the Thermoanaerobacterales order is revealed (Fig. 5). Thus, while the synthesis of CDs might have a physiological role as functional amphipathic toroids 6,8,10, the resulting G2 and G1 molecules, as well as the G3-G7 dextrins, could serve as a simple carbon source (Fig. 5). Interestingly, the entire CM-CD pathway is encoded along with several essential proteins for G+ cell functions, such as DNA replication, carbon catabolite regulation, tRNA maturation, cell wall biogenesis, sporulation, and cell division (Fig. 4, Table S4), suggesting that extracellular heat-resistant CGTases could play a leading role in the metabolism of Thermoanaerobacterales. Moreover, the presence of protein-encoding genes related to extreme thermophilic metabolism, such as oxidative stress defense, degradation of aromatic compounds, fermentation, and fatty acid and amino acid metabolism (Fig. 4, Table S4), also indicates that the physiological role of heat-resistant CGTases in product diversification seems to be a convenient adaptation to survive in hot starch-poor environments.
Accordingly, the relevance of CGTases during starch metabolism can be supported by early observations of Thermoanaerobacterium spp. 101, in which the secreted thermophilic α-amylase/amylopullulanase AmyB was found to hydrolyze a variety of α-(1,4)- and α-(1,6)-glucans 102, acting together with an ABC maltose/maltotriose importer (amyEDC) 101. Notably, the amyBEDC gene cluster is located immediately upstream of the conventional five-domain CGTase (formerly named AmyA) 54 (Fig. 5) encoded in the thb gene cluster (Fig. 4). Thus, AmyB and the five-domain CGTase seem to play a cooperative role, as it has been shown that the transcription of the amyBEDC gene cluster and the CGTase-encoding gene is induced by maltose or starch as carbon sources 101. Likewise, the deduced promoter sequences of the cldA/thmA genes, 5′-TGCACT-17 bp-TAATAT and 5′-TTTCGA-17 bp-CATATT, showed similarity to the σ-dependent consensus promoters of the amyABEDC gene cluster 101. However, the database mining analysis revealed that the AmyB-like enzyme from C. subterraneus ssp. and Thermoanaerobacter spp. is not encoded near the cld/thm gene clusters (Fig. 4), indicating that the secreted three-domain CGTases are the main starch-acting enzymes of both gene clusters and highlighting their importance for product diversification in these microorganisms. In summary, this is the first identification of a novel group of CGTases with an uncommon three-domain ABC organization, which further established a new evolutionary path among CGTases. These novel enzymes were detected in two gene clusters, cld and thm, from extremely thermophilic Thermoanaerobacterales C. subterraneus ssp. and Thermoanaerobacter spp., as part of a CM-CD pathway involved in the synthesis, transportation, degradation, and metabolic assimilation of CDs from starch.
These findings were extended to Thermoanaerobacterales Thermoanaerobacterium spp., which also showed a CM-CD pathway not previously described but governed by a conventional five-domain CGTase encoded in the thb gene cluster. In contrast to the secondary role of the CM-CD pathway in mesophilic bacteria, the remarkable product diversification catalyzed by the three-domain CGTases suggests that they could play a critical role in the carbohydrate metabolism of C. subterraneus ssp. and Thermoanaerobacter spp. Future X-ray crystal structure determination, structure-based protein engineering, and kinetic studies of CldA will offer an opportunity to gain insights into this particular pathway and the structure-function relationship of this novel group of enzymes.

Materials and methods
Data mining for CGTases. Metagenomes were analyzed from the Joint Genome Institute (JGI) IMG/M database 103 , which contains more than 15,014 metagenomes from different environments (last search, July 2021). Putative CGTases were detected by a BLASTn search in ~130 publicly assembled metagenomes in the IMG/M platform using an E-value cutoff of 1.0e−5. The metagenomes were filtered for those containing different terms from hyperthermophilic ecological niches (e.g., geothermal fumarole, geyser, hot spring, or hydrothermal vent) in the "Genome Name/Sample Name" description (Table S1). The protein query sequences consisted of the complete amino acid sequences of experimentally characterized CGTases, including the CGTase from G+ T. thermosulfurigenes EM1 with a conventional five-domain ABCDE CBM20 distribution (NCBI ID: AAB00845.1, PDB ID: 1CIU) 21 , the solely characterized CGTase from G− K. oxytoca M5a1 with a four-domain ABCE CBM20 distribution (NCBI ID: AAA25059.1) 104 , and a CGTase from the thermophilic archaeon P. furiosus DSM 3638 with a five-domain ABCDE arch distribution (NCBI ID: ABA33720.1) 105 . Putative CGTases that shared > 45% sequence identity with the query sequences were excluded to increase novelty. The best hits were analyzed manually to evaluate the complete scaffold templates and discard truncated sequences. NCBI's Batch Web CD-Search Tool against the Conserved Domain Database (CDD/SPARCLE) 106 was employed to predict the functional domains of selected hits. Hence, a putative CGTase with a unique three-domain ABC distribution (named CldA) was identified in a scaffold containing ~50 genes in a metagenome of thermophilic microbial communities from Obsidian Pool hot spring at Yellowstone National Park (Wyoming, USA) (Table S1). Therefore, a second database mining approach was applied to identify additional three-domain ABC CGTases.
The CldA sequence was then submitted to BLASTn against 30 publicly assembled metagenomes deposited in the IMG/M platform 103 that belong to several microbial communities from the Obsidian Pool hot spring at Yellowstone National Park (Table S2). A second putative three-domain ABC CGTase (named ThmA) was identified in three metagenomes from the Obsidian Pool hot spring (Table S2). Redundant sequences and truncated genes were discarded. The CldA/ThmA sequences, along with the 51 sequences of characterized enzymes from GH13_2, were compiled into a FASTA file and subjected to multiple alignments using Clustal Omega with default parameters 107 . Manual refinement of the multiple alignments was performed to detect key conserved catalytic residues from CGTases 13,31,62 . Finally, a third database mining approach was conducted to identify additional (CldA/ThmA)-like CGTases. Hence, the CldA/ThmA sequences were submitted to BLASTn against publicly assembled genomes deposited in the GenBank database from Caldanaerobacter spp. (NCBI Taxonomy ID: 249529) and Thermoanaerobacter spp. (NCBI Taxonomy ID: 68295). Several (CldA/ThmA)-like sequences were obtained (Table S3), compiled into a FASTA file, and subjected to the bioinformatics pipeline described above. The sequence logo was generated by WebLogo 108 . The three-domain ABC CGTase CldA was selected for further recombinant production and functional studies.
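The two numerical filters used in the first mining step (E-value cutoff of 1.0e−5 and exclusion of hits sharing > 45% identity with any query) can be illustrated with a minimal Python sketch. It assumes the hits were exported in the standard 12-column BLAST tabular layout (query ID, subject ID, percent identity, ..., E-value, bit score); the export format, the function name `select_novel_hits`, and the example scaffold IDs and values are hypothetical and are not taken from the study.

```python
from collections import defaultdict

EVALUE_CUTOFF = 1.0e-5  # E-value cutoff stated in the text
MAX_IDENTITY = 45.0     # hits above this percent identity are excluded


def select_novel_hits(tabular_text):
    """Return subject IDs that pass the E-value cutoff and share
    no more than 45% identity with every query that hit them."""
    best_identity = defaultdict(float)
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        subject = fields[1]
        pident = float(fields[2])
        evalue = float(fields[10])
        if evalue > EVALUE_CUTOFF:
            continue  # not a significant hit
        # Track the highest identity seen for this subject across all queries.
        best_identity[subject] = max(best_identity[subject], pident)
    return sorted(s for s, ident in best_identity.items() if ident <= MAX_IDENTITY)


# Hypothetical example rows (subject IDs and statistics are invented).
EXAMPLE_HITS = "\n".join(
    "\t".join(row)
    for row in [
        ("AAB00845.1", "scaffold_A_gene1", "44.1", "650", "0", "0", "1", "650", "1", "650", "1e-120", "400"),
        ("AAA25059.1", "scaffold_A_gene1", "38.0", "600", "0", "0", "1", "600", "1", "600", "2e-80", "300"),
        ("AAB00845.1", "scaffold_B_gene7", "71.3", "680", "0", "0", "1", "680", "1", "680", "0.0", "900"),
        ("ABA33720.1", "scaffold_C_gene2", "30.2", "120", "0", "0", "1", "120", "1", "120", "0.5", "25"),
    ]
)

if __name__ == "__main__":
    print(select_novel_hits(EXAMPLE_HITS))
```

In this toy input, only the first scaffold survives: the second exceeds the identity threshold and the third fails the E-value cutoff. The manual steps described in the text (scaffold inspection, removal of truncated sequences, domain prediction) are not represented.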
Gene cloning and protein production. A synthetic gene coding for the mature form of CldA, codon-optimized for E. coli expression, was prepared by Integrated DNA Technologies (Iowa, USA). The synthetic cldA gene was cloned into the NdeI and NotI sites of the pET-22b(+) expression vector (Novagen), which contains a sequence coding for six histidines at the C-terminus. The identity of the resulting plasmid pCldA was evaluated by restriction analysis and confirmed by DNA sequencing. Electrocompetent E. coli BL21(DE3)pLysS cells were transformed with pCldA and grown on Luria-Bertani (LB) agar plates containing 100 μg mL −1 ampicillin at 37 °C. A single colony of BL21(DE3)pLysS/pCldA was picked to inoculate 5 mL LB medium overnight with 100 μg mL −1 ampicillin at 37 °C, aliquoted in a sterile solution of 40% (v/v) glycerol and maintained at − 80 °C. For recombinant CldA production, a fraction of a frozen cell aliquot was taken and cultured for 12 h at 37 °C and 200 rev min −1 in 50 mL LB medium containing 200 μg mL −1 ampicillin. This preinoculum was used to inoculate 1 L 2xYT medium with 200 μg mL −1 ampicillin at an initial optical density at 600 nm (OD 600 ) of 0.05 at 37 °C and 200 rev min −1 . After induction by adding a final concentration of 0.1 mM IPTG to the medium (OD 600 of ~ 0.6), the temperature was lowered to 22 °C, and the culture was grown for 12 h at 150 rev min −1 . The cells were harvested by centrifugation (7500g, 10 min, 4 °C) and resuspended in 10 mL buffer A [50 mM sodium phosphate pH 8.0, 500 mM NaCl, 2% (v/v) glycerol, 20 mM imidazole] containing EDTA-free complete protease inhibitor cocktail mini tablet (Roche Molecular Biochemicals) and 1 μg mL −1 DNase. The cell suspension was sonicated on ice for 30 min with an amplitude of 25-29%, and the resulting solution was heated for 20 min at 60 °C to precipitate the thermolabile protein fraction of E. coli. After the heating step, the lysate was centrifuged (19, .
The temperature dependence of β-CGTase activity was determined in the 40-100 °C range. The optimum pH was determined by incubating the enzyme in different 50 mM buffer solutions ranging from pH 3.0 to 9.0. Specifically, glycine-HCl buffer was used at pH 3.0, acetate buffer at pH 4.0 to 5.0, phosphate buffer at pH 6.0 to 7.0, Tris-HCl buffer at pH 8.0, and glycine-NaOH buffer at pH 9.0. The β-CGTase activity was determined spectrophotometrically by the phenolphthalein method described elsewhere 109 with minor modifications. Accordingly, 250 mL of working phenolphthalein solution was prepared by adding ~ 249 mL of 125 mM sodium carbonate pH 10.5 to 1 mL of 3 mM phenolphthalein solution in ethanol. The reaction was stopped by adding 175 µL of 1 mM NaOH to 50 µL aliquots of the reaction mixture. The latter solution was then mixed and vortexed with 100 µL of working phenolphthalein solution and analyzed by the decrease in absorbance at λ = 550 nm owing to β-CD-phenolphthalein complex formation. The β-CD concentration was determined using a standard curve constructed by the phenolphthalein method 109 with commercial β-CD (Sigma-Aldrich). One unit of β-CGTase activity was defined as the amount of enzyme that produced 1 μmol β-CD per min under the defined conditions. The hydrolytic activity was measured as the liberation of reducing sugars from soluble starch by the 3,5-dinitrosalicylic acid (DNS) method 110 using a standard curve constructed with commercial maltose (Sigma-Aldrich). One unit of hydrolytic activity was defined as the amount of enzyme that produced 1 μmol of reducing sugars per min under the defined conditions.
Product analysis. Mass spectrometry analysis of products from 5% (w/v) soluble starch by the action of CldA was obtained from a mixture at 2 h using a QTOF Xevo G2-S (Waters). A direct infusion into the mass spectrometer was used at a flow rate of 5 μL min −1 .
The ionization conditions were as follows: (i) the electrospray source was operated in positive ion mode, and the source and desolvation temperatures were 100 and 250 °C, respectively; (ii) desolvation and cone gas at a flow rate of 800 and 50 L h −1 , respectively; (iii) capillary and cone voltage of 2500 and 10 V, respectively; (iv) acquisition mass range from 50 to 1500 m/z. For HPLC and mass spectrometry analysis, high-purity oligosaccharides from G3 to G7 (Toronto Research Chemical) and G1-G2, α-, β-, and γ-CDs (Sigma-Aldrich) were used as standards.
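As an aid to reading positive-ion spectra of the products named above, the following Python sketch computes theoretical monoisotopic m/z values for the linear dextrins G1-G7 and for α-, β-, and γ-CDs. The text does not state which adduct ions were observed; sodium adducts ([M+Na]+) are assumed here only because they are commonly seen for neutral carbohydrates in positive electrospray mode. The function names are hypothetical.

```python
# Monoisotopic masses in daltons.
GLUCOSYL = 162.05282    # anhydroglucose unit, C6H10O5
WATER = 18.01056        # H2O, added once for a linear (non-cyclic) chain
SODIUM_ION = 22.98922   # Na+ (sodium atom minus one electron)


def linear_dextrin_mass(n_units):
    """Neutral monoisotopic mass of a linear alpha-(1,4)-glucan Gn."""
    return n_units * GLUCOSYL + WATER


def cyclodextrin_mass(n_units):
    """Neutral monoisotopic mass of a cyclodextrin with n glucose units
    (alpha = 6, beta = 7, gamma = 8); a ring has no terminal water."""
    return n_units * GLUCOSYL


def sodium_adduct_mz(neutral_mass):
    """m/z of the singly charged [M+Na]+ ion (assumed adduct)."""
    return neutral_mass + SODIUM_ION


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"G{n}: {sodium_adduct_mz(linear_dextrin_mass(n)):.3f}")
    for name, n in (("alpha-CD", 6), ("beta-CD", 7), ("gamma-CD", 8)):
        print(f"{name}: {sodium_adduct_mz(cyclodextrin_mass(n)):.3f}")
```

Under this assumption, all of the listed species fall inside the stated 50 to 1500 m/z acquisition window, and each CD differs from the linear dextrin of the same length by one water (18.011 Da).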
Phylogenetic analyses. The (Fig. 3). The sequences of 7 α-amylases from GH13 were used as an outgroup. Three starch-acting enzymes from GH13_2 (NCBI ID: AAA22229.1, AID53183.1, and CAJ81031.1) were excluded from the analysis since they are not CGTases. Two phylogenetic trees were built using the full amino acid sequence (Fig. 3) or solely the amino acid sequence of the minimal functional core ABC (Fig. S6) for all 85 sequences mentioned above. The alignment of all amino acid sequences was conducted with the ClustalW algorithm using default parameters. The evolutionary relationship of CGTases was inferred with the maximum likelihood method 111 , setting the best-fit model of amino acid substitution (WAG + G) 112 . The bootstrap method (1000 replicates) was applied to assess the confidence in the phylogenetic analysis. All the implemented algorithms are included in the Molecular Evolutionary Genetics Analysis (MEGA 6.06) package 112 . The consensus tree was visualized and edited in Interactive Tree Of Life iTOL v4 (http://itol.embl.de) 113 .
Scientific Reports | (2022) 12:730 | https://doi.org/10.1038/s41598-021-04569-x
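The resampling step behind the bootstrap method can be sketched in a few lines of Python. This is an illustration of the general column-resampling idea only, not MEGA's implementation: each replicate is built by drawing alignment columns with replacement until the original alignment length is reached, a tree is then inferred from every replicate, and the support for a clade is the fraction of replicate trees that contain it. The function names and the toy alignment are hypothetical, and tree inference itself is not shown.

```python
import random


def bootstrap_alignment(aligned_seqs, rng):
    """Build one bootstrap replicate by sampling alignment columns
    with replacement; the same columns are used for every sequence."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in columns) for seq in aligned_seqs]


def bootstrap_replicates(aligned_seqs, n_replicates=1000, seed=0):
    """Generate n_replicates resampled alignments (1000 in the text)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [bootstrap_alignment(aligned_seqs, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy_alignment = ["MKV-L", "MRVAL", "MKIAL"]  # invented example
    replicates = bootstrap_replicates(toy_alignment, n_replicates=3)
    for rep in replicates:
        print(rep)
```

Sampling whole columns, rather than individual residues, keeps the homology relationships between sequences intact within each replicate.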
Data mining for CM-CD gene clusters. The cld gene clusters where the cldA-like-encoding genes are located were delimited in the complete assembled scaffold from C. subterraneus ssp. yonseiensis KB-1 (NCBI ID: AXDC01000002, location 50928-86345), C. subterraneus ssp. subterraneus 38_43 (NCBI ID: LGEY01000002, location 21575-56994) and T. tengcongensis MB4 (NCBI ID: AE008691.1, location 1749287-1786305). Partial scaffolds of the cld gene clusters were also found in five other genomes from C. subterraneus ssp. (Tables S3, S4). Furthermore, the cld gene clusters involved in the CM-CD pathway were submitted to BLASTn against 246 available genomes deposited in the GenBank database from the Thermoanaerobacterales order (NCBI Taxonomy ID: 68295). Accordingly, the genomes from Carboxydothermus (NCBI Taxonomy ID: 129957), Thermacetogenium (NCBI Taxonomy ID:  respectively, were performed using the PATRIC genus-specific protein families (PLFams) method 114 . Functional prediction of proteins encoded by the cld, thm, and thb gene clusters (Table S4) was carried out using the CDD/SPARCLE 106 , Pfam 115 , and UniProt (https://www.uniprot.org/) databases. Protein subcellular localization and physicochemical property predictions were conducted using the CELLO v.2.5 116 and ProtParam (ExPASy) servers 117 , respectively. The presence of a signal peptide was predicted using the SignalP 5.0 server 118 .