Thermophiles and carbohydrate-active enzymes (CAZymes) in biofilm microbial consortia that decompose lignocellulosic plant litters at high temperatures

The SKY hot spring is a unique site filled with a thick layer of plant litter. With the advancement of next-generation sequencing, it is now possible to mine many new biocatalyst sequences. In this study, we aimed to (i) identify the metataxonomic of prokaryotes and eukaryotes in microbial mats using 16S and 18S rRNA markers, (ii) and explore carbohydrate degrading enzymes (CAZymes) that have a high potential for future applications. Green microbial mat, predominantly photosynthetic bacteria, was attached to submerged or floating leaves litter. At the spring head, the sediment mixture consisted of plant debris, predominantly brownish-reddish gelatinous microbial mat, pale tan biofilm, and grey-white filament biofilm. The population in the spring head had a higher percentage of archaea and hyperthermophiles than the green mat. Concurrently, we cataloged nearly 10,000 sequences of CAZymes in both green and brown biofilms using the shotgun metagenomic sequencing approach. These sequences include β-glucosidase, cellulase, xylanase, α-N-arabinofuranosidase, α-l-arabinofuranosidase, and other CAZymes. In conclusion, this work elucidated that SKY is a unique hot spring due to its rich lignocellulosic material, often absent in other hot springs. The data collected from this study serves as a repository of new thermostable macromolecules, in particular families of glycoside hydrolases.

There were two dominant algae in the green mat, including the solitary green microalgae Heveochlorella hainangensis (phylum Chlorophyta) and unicellular green photosynthetic Trebouxia usneae (class Trebouxiophyceae). The main detected algae phyla in the brown microbial mat were Apicomplexa, Ciliophora, Diatomea, Dinoflagellata, and Protalveolata. In SKY hot spring green microbial mat, we noticed ASVs of Amoebozoa, Ciliophora, Protalveolata, and Ochrophyta.
Taxonomy of shotgun assembled contigs. Samples G1a, G3, B1a, and B3 have individually undergone shotgun metagenomic sequencing and reads were assembled using metaSPAdes into respectively 254,331, 107,238, 81,154 and 125,514 contigs that were larger than 500 bp. After filtering the unassigned taxa, approximately 98% of green mat contigs belonged to bacteria, while the remaining were archaea or eukaryotes. The average contigs for bacteria, archaea, and eukaryote in the brown mat were 77%, 22% and 1%, respectively (Fig. 3).
With ≥ 90% subject coverage and ≥ 90% protein sequence identity threshold, the total non-redundant CE sequences (family CE1, 4, 7, and 9) annotated in the green and brown mat datasets were 23 and 21, respectively. These CE families comprised esterase, polysaccharide deacetylase, acetylxylan esterase, and N-acetylglucosamine-6-phosphate deacetylase, often assist in xylan breakdown. With the same threshold, AA (family AA2, 3, 4, 6, 7, and 12) sequences present in the green and brown mat dataset were 22 and 16, respectively. The identified proteins were catalase-peroxidase, oxidoreductase, glycolate oxidase and sorbosone dehydrogenase.   www.nature.com/scientificreports/ Cellulosic degradation enzymes with high subject coverage but low sequence identity. In an earlier report 3 , the authors suggested that the sequences obtained from a metagenome assembly are considered novel if the primary sequence has ≥ 90% subject coverage and 50-70% identity to the deposited protein sequences. Using this as our threshold, we cataloged 116 and 91 non-redundant GH sequences in the green and brown samples, respectively. Approximately 80% of these sequences were related to families GH1, 3, 5, 10, and 51, while the remaining were from GH6, 8, 9, 12, 16, 26, 30, 43, 44, 48, or 116. We then selected some novel β-xylanase and cellulase and were compared with sequences obtained from various databases.

Discussion
Phyla Bdellovibrionota, Fusobacteriota, and Myxococcota were present in the green microbial mat but in negligible quantities in the brown mat. The unique phyla detected in the brown mat, but not in the green microbial mat, included Caldatribacteriota, Thermodesulfobacteriota, Dictyoglomota, Elusimicrobiota, Thermotogota, Candidatus Calescamantes, Fervidibacteria, Hydrothermae, GAL15 and TA06. The Candidatus Caldatribacterium (phyla Caldatribacteriota), earlier named OP9 was also detected in this work. Using single-cell and metagenome sequencing, data elucidated that Ca. Caldatribacterium conducts anaerobic sugar fermentation and exhibited diverse glycosyl hydrolases, including endoglucanase 37 . www.nature.com/scientificreports/ Cyanobacteria and Chloroflexota were the main identified phyla in the green microbial mat. Because the hot spring is almost stagnant, undisturbed, and the water surface temperature (< 64 °C) is below the maximum threshold of the bacteria photosynthesis process 38 , together these factors favor the growth of the microorganisms. Chloroflexota Thermoflexus hugenholtzii 39 (opt. growth temp. [OGT] 72-75 °C) constituted 31% of B1 microbiota (Fig. 2). The complete genome of T. hugenholtzii JAD2 T and several associated metagenome-assembled genomes are available 40 , and they harbored multiple cellulosic degrading enzymes. When we extracted the DNA materials from sample B3, approximately half of the working materials was reddish-brown jelly-type microbial mat while the remaining were heterogeneous materials. 63% of the total ASVs in the B3 mat were dominated a taxon related to Roseiflexus, another Chloroflexota member. At the time of writing, Roseiflexus castenholzii HLO8 T (DSM 13941) is the only described type strain. Bacterium HLO8 T , a photosynthetic strain, formed a reddish-brown microbial mat in a Japanese hot spring 41 . We anticipate that the Chloroflexota associated taxon that formed reddish-brown jelly-type microbial mat in the spring head of SKY hot spring (71-74 °C) is different from strain HLO8 T (OGT 55 °C) as the latter could not thrive at a higher temperature 41 .
Fervidobacterium, under the Thermotogota phylum, was a major genus in sample B2. Fervidobacterium species, for instance, F. islandicum and F. changbaicum, exhibit a broad range of CAZymes. The percentage of Fervidobacterium in hot spring microbiota would be increased if the water was enriched with switchgrass inoculum 32 . Some Firmicutes ASVs were detected in the brown mats. Firmicutes' members, i.e., Geobacillus and other Firmicutes bacilli thermophiles, may dominate cellulose-degrading consortium in a heated lab setup 42 . Caldicellulosiruptor thermophilum, another member of Firmicutes, has been targeted as one potential thermophile for consolidated bioprocessing of lignocellulose 2 . We detected Caldicellulosiruptor and other Firmicutes in relatively small quantities in SKY hot spring mats.
Crenarchaeota was the dominated Archaea phylum in brown mats, with Nitrososphaeria being the main class and consisted of Ca. Caldiarchaeum and Ca. Nitrosocaldus (Fig. 2). The knowledge on these candidates is very limited [43][44][45] . The remaining classes in brown samples were Bathyarchaeia and Thermoprotei. ASVs stated above were also present in green microbial mats with the exception of Nitrosocaldales which was the main order in green biofilm datasets but existed in relatively smaller quantity in brown mats. www.nature.com/scientificreports/ The study of eukaryotes is scarce for hot springs around the globe and is often neglected compared to prokaryotes. Oliverio et al. examined the presence of protists (microbial eukaryotes) in 160 New Zealand geothermal springs and suggested that the main protists possibly thrived in elevated temperatures are Amoebozoa, Archaeplastida, Alveolata, Excavata, Rhizaria, and Stramenopiles 20 . In SKY hot spring green microbial mat, we noticed ASVs of Amoebozoa, Ciliophora (protozoa algae that feed bacteria), Protalveolata, and Ochrophyta. The Amoebozoa Echinamoeba thermarum was a common thermophilic protist in New Zealand geothermal springs 20 . E. thermarum was positively identified in the green mat but absent in the brown mat. We spotted other Amoebozoa, for instance, taxa from order Euamoebida and Leptomyxida. Additionally, thermophilic protist Protalveolata (mainly from class Syndiniales) was approximately 1% of total eukaryotes ASVs in the B1b sample. Thermophilic protist Ochrophyta (particularly class Chrysophyceae) was detected in very small quantity in brown samples.
18S rRNA metataxonomic sample datasets elucidated that thermophilic fungi were present only as the minority (< 1% ASVs). That included Chaetomium (T max 61 °C), Paecilomyces (55 °C), Chrysosporium (60 °C), www.nature.com/scientificreports/ Trichothecium (57 °C), Paecilomyces (55 °C), Torula (58 °C), Talaromyces (50 °C), Paecilomyces (50 °C), Geosmithia (55 °C), and Thermomyces (60 °C) 46 . We were doubtful if all detected fungi ASVs could grow pleasantly in SKY hot spring. Also, the water level was low during the first field trip, and the green mat was not submerged completely, water temperature was 58 °C, as measured ~ 5 cm below the G1 green mat. The actual temperature was expected to be lower in the floating green microbial mat; therefore, certain thermophilic fungi or some heattolerant mesophilic fungi may survive. However, none of the currently known thermophilic fungi can develop beyond 70 °C; therefore, detected taxa are not likely to survive without sporulation at the SKY brown mat at the spring head. Moreover, it is suspected that most of the detected mesophilic fungi ASVs were originated from fallen plant litter, and we expect them to be in their dormant form. For the first time, microscopic water bear Tardigrada (Macrobiotus hufelandi) was detected in a Malaysian hot spring. This small eight-legged animal feed on microorganisms, decomposed leaf, and survive in extreme temperatures. We also detected high background of plant phyla Phragmoplastophyta that likely originated from plant debris. Other background ASVs included flies, grasshoppers, and fringed winged insects, mites, and ticks. We confirmed that eukaryotes were the minority group in the SKY hot spring microbial mats by putting together the shotgun and amplicon data. We think that 18S rRNA primers excessively amplified the background of plant residuals or chromosomal fragments from the dead organisms, spores, inactivated eggs or larva in particular mesophilic fungi or insects. Collectively, we concluded that relatively high background noise was observed using 18S rRNA primer set. We performed shotgun metagenomic sequencing using two green-and two brown microbial mats. Negligible amounts (~ 0.1%) of virus reads were detected in all the mats, and it is quite common to see a trace quantities of viruses in hot springs 47 . Judged using contigs generated from shotgun sequencing data, a greater percentage of archaea present in the brown mat was probably related to a higher temperature at the spring head (Fig. 3). We also think that temperature is the main abiotic factor that differentiate the microbial profile in green-and brown mats. In addition, data elucidated that bacteria are the main microbiota in green and brown mats, and they are the primary plant-biomass degraders and consumers in SKY hot spring. This observation was also noticed earlier in a separate report 16 . Despite lower abundance and diversity, archaea and some candidate taxa may exhibit some functional role on lignocellulosic decomposition in SKY hot spring. Several sequences of CAZymes from GH families (i.e., GH1, 3, 5, 10, etc.) were identified from Candidatus Bathyarchaeota, Candidatus Brockarchaeota, Thaumarchaeota, Nitrososphaeria archaeon, and Thermoprotei archaeon.
On average, more than 10,000 CAZymes ORFs were found in each type of microbial mat (Table S2). The annotated glycoside hydrolases sequences included cellulases, hemicellulases, CEs, GTs, AAs, and enzymes acting on carbohydrates such as starch. More cellulase and hemicellulase sequences were identified in SKY hot spring than the counterpart numbers detected in an Indian hot spring metagenomic study using water-sediment samples lacking in-situ plant litters 3 . Besides, the metataxonomic described in this current study differed from Deulajhari hot springs and Obsidian Pool that contained Pandanus leaf litters and heat-tolerant plant Juncus tweedyi, respectively 16,28 . The microbial and enzyme diversity in SKY hot springs are far more complex than other heated in-situ or ex-situ studies supplemented with insoluble cellulosic biomass [31][32][33]42 .
Armatimonadota is another phylum spotted with multiple sequences from GH1, 5, 10, 43, and 51, CE7, and AA12 (threshold: ≥ 90% subject coverage and ≥ 90% protein sequence identity). Using the metataxonomy dataset, at least four Armatimonadia ASVs were detected in SKY hot spring mats, and an ASV was closely related to class Chthonomonadetes while the rest were unresolved at the lower taxonomy level. This phylum, earlier designated as candidate division OP10, was initially found in Yellowstone National Park Obsidian Pool 48 . Currently, Chthonomonas calidirosea T49 T is the only thermophilic type strain 48 . The complete genome of strain T49 T harbored 64 glycosyl hydrolases and eight carbohydrate esterases. To the best of our knowledge, the characteristics of these enzymes are still undescribed.
Deinococcota is the third-largest phylum with 28 sequences with ≥ 90% subject coverage and identity to the CAZy database. These putative sequences have high identity to counterpart proteins from Calidithermus, Thermus, and Meiothermus species. The draft genome of Calidithermus timidus DSM17022 indicated that the bacterium harbor seven AA, 38 GH, 16 CE sequences 49 . So far, only a GH57-glycogen branching enzyme 50 and GH13 amylosucrase 51 from this bacterium have been analysed in detail. Meiothermus spp. may help break down plant litter in SKY because a representative, M. taiwanensis WR-220 (PRJNA205607), had enzymes such as xylanase, β-xylosidase, endoglucanase, and polysaccharide deacetylase. More than a dozen Thermus spp. have completely curated genomes in the CAZyme genome database. For an example, the genome of Thermus thermophilus (http:// www. cazy. org/ b12268. html) encoded sequence of 15 types of enzymes particularly from families GH1, 13, 23, 36, 57, 63 and 77; however, the essential enzyme for lignocellulose hydrolysis is missing. Therefore, Thermus spp. are sugar consumers in the SKY community.
We are interested in mining novel CAZymes from the shotgun contigs (Fig. S1). A protein sequence may be considered novel if the primary sequence has ≥ 90% subject coverage and 50-70% identity to the deposited protein sequences. We spotted a 1036-residues β-xylanase B1_109149 with approximately 60% identity with an endo-1,4-β-xylanase (WP_012584062.1) from thermophilic Dictyoglomus turgidum. Both protein sequences formed a cluster with β-xylanase from Thermotoga maritima (Q60037) (Fig. 5) www.nature.com/scientificreports/ a signal peptide, two N-terminal β-sandwich fold CBM4_9, followed by a TIM barrel GH10-catalytic domain consisting of four conserved motifs and with two β-sandwich CBM9_1 at the C-terminal. The putative protein structure of β-xylanase G1_109149 was predicted using AlpaFold v.2 52 and is shown in Fig. 6a. Additionally, another twelve GH10 family putative novel β-xylanase sequences were present in the dataset. These proteins sequences were probably related to phyla Armatimonadota, Bacteroidota/Chlorobiota, Ca. Bipolaricaulota, Ca. Solibacter, Ignavibacteriota, Planctomycetota, or Verrucomicrobiota (Fig. 5). The identified β-xylanase sequences have a single domain of the GH10_2 family where the four conserved motifs were located. The predicted structures of the selected xylanases are shown in Fig. 6b-d.
The primary sequence of β-xylanase G1_35826 is unique because it has a domain related to the secretion system C-terminal sorting domain and is absent in other counterparts displayed in Fig. 5. The other name for that domain is por-secretion system or the T9SS type IX secretion system. The putative protein structure of β-xylanase G1_35826 is displayed in Fig. 6c, and the C-terminal domain resembled a β-sandwich fold structure. There is little research exploring how annotated xylanase is incorporated with a T9SS. We observed such domain in xylanase XynRA2 from halo-thermophilic Roseithermus sp., and xylanase Xyn10A from Rhodothermus marinus, and xylanase Xyl2091 from Melioribacter roseus 53 . All these thermo-halophilic bacteria are from Bacteroidota/ Chlorobiota group. Based on a recent review, certain microorganisms, especially those from Bacteroidota utilise the T9SS system for secreting proteins 54 .
Additionally, we annotated three cellulases that belong to non-GH5 groups (Fig. 7). B3_230401 (238 aa) was 50% homologous with the primary sequence of endoglucanase Cel12A (PDB 2BW8, 227 aa) from Rhodothermus marinus 60 . Both sequences have a single GH12 catalytic domain. Often, that domain is similar to the concanavalin-like glucanase domain superfamily that looks like a sandwich structure with 12-14 β-strands (Fig. 6f). Another putative cellulase G1_45801 sequence was 69% homologous to the sequence of a crystal structure CelM2 (3FW6), and Interproscan indicates that both sequences have identical domain arrangements. The gene of CelM2 was cloned from a metagenomic library 60 . GH44 domain (β-sandwich structure) was found in the N-terminal while a galactose-binding domain (TIM-like barrel structure) was present at the C-terminal, where the acid/base Glu 221 and nucleophilic Glu393 are located 61 . Enzyme CelM2 actively hydrolysed multiple substrates, including birchwood xylan, barley glucan and cellulosic CMC, respectively having β-1,3/4-glucan and β-1,4-glucan linkages 61 . The binding ability of multi-substrates is possibly related to the broad and deep pocket. Due to relatively close sequence identity, G1_45801 may exhibit a similar bifunctional glucanase-xylanase activity as CelM2. Lastly, putative cellulase B3_106662 (615 aa) was detected in a brown microbial dataset. The stretch 42-238 resembles a GH114-family domain having a typical aldolase-type TIM-barrel structure (Fig. 6g). The latter half of the sequence (residue 273-587) is the GH5 family domain, resembling the second TIM-barrel. A short loop (residue 247-254) joint both TIM-barrels. As shown in Fig. 6g, a long loop protruding from the GH5-TIM barrel that points towards the GH114-TIM barrel. We expect that the protruding loop has some structural role. So far, there are no closely related crystal structures to the sequence of B3_106662.

Conclusion
More than a thousand hot springs metataxonomic data are deposited in databases. Only Obsidian Pool at Yellowstone Natural Park, Indian Deulajhari spring cluster, and SKY hot springs have reported plant biomass. Hot springs rich with lignocellulosic materials from natural ecosystem are reservoirs of metagenomic data to harness excellent carbohydrate degrading thermozymes. By incorporating amplicon and shotgun metagenomics, this study elucidated that green and brown microbial mat exhibit different microbial profiles, probably driven by temperature and other factors. Current data suggested that microbial profiles and enzymes involved in the lignocellulosic decomposition are more complex than initially thought and is more intricate than ex-situ heated experiments. Many of the taxa in SKY hot springs have not been cultivated. Green mat was rich with photosynthetic microorganisms i.e., Cyanobacteria and Chloroflexota. Brown microbial mat was highly heterogenous, and the microbial community was relatively more complex. Bacteria's enzymes may play a more prominent role in high-temperature lignocellulosic degradation. Few archaea and Candidatus species, such as Candidatus Bathyarchaeota, Candidatus Brockarchaeota, Thaumarchaeota, Nitrososphaeria archaeon, and Thermoprotei archaeon, made a considerable contribution. Certain taxa, for instance, Thermus, were sugar-consumers. Both green and brown samples were rich with unexplored CAZymes. Few of the interesting putative lignocellulosicglycosyl hydrolases and their counterpart homologous proteins were described in this report. This work expands our understanding on thermophilic lignocellulosic degradation and provides new enzyme targets for future development. Thermostable cellulases, xylanases, CE, and AAs could form a cocktail for biofuel industries.  Amplicon sequencing was conducted using Illumina NovaSeq 6000 platform (Illumina, San Diego, USA) with paired-end 250 base pairs at NovogeneAIT Genomics (Singapore). A minimum sequencing depth of 100 K raw reads was reserved for every sample. The resulting raw paired-end reads were processed using the DADA2 plugin in QIIME 2 pipeline 62,63 . The process includes demultiplexing, quality-filtering, denoising, dereplication, and removal of chimeras, and clustering of the paired-end sequences. Taxonomy classification was carried out using SILVA SSU 138.1 database.
Shotgun sequencing and bioinformatic analyses. DNA samples (G1a, B1a, G3, and B3) were fragmented by a Covaris sonicator (Covaris, Woburn, USA). Then, the fragmented DNA was used for dual-indexed, paired-end library construction following the Illumina DNA Prep kit protocol (Illumina, San Diego, USA). The constructed library samples were run in Qubit 3.0 Fluorometer and Agilent Bioanalyzer 2100 (Agilent Technologies, Palo Alto, USA). Whole metagenome shotgun sequencing was carried out in an Illumina NovaSeq 6000 with the running mode of PE 150 (paired-end 150 bp) conducted in NovogeneAIT Genomics (Singapore). A minimum of 20 Gb (equivalent to approximately 66.5 million paired end reads) output was reserved for each sample. The resulting raw paired-end reads were trimmed and filtered by SOAPnuke v2.1.6 software 64 . Clean paired-end reads were de novo assembled by metaSPAdes assembler v3.15.2 65 . Taxonomy classification on the assembled contigs was carried out by Kraken2 v2.1.2 66 . MetaQUAST v5.0.2 with default MetaGeneMark as gene predictor was used to find all the open reading frames (ORFs) from the assembled contigs 67 . Contigs < 500 bp were excluded. All ORFs were also subjected to Carbohydrate-Active enZYmes (CAZymes) annotation via run_ dbcan v2.0.11 coupled with the latest CAZy database v07312020 35 . Sequences that were positive in at least one in program HMMER, Hotpep, and Diamond 35 were shortlisted and further annotated by NCBI non-reductant (nr) and Uniprot reviewed protein database via Diamond v2.0.11.149 68 . Selected CAZymes were further analysed using InterProScan v5.52-86.0 for predicting domains, motifs, and others 69 . The motifs of Thermotoga maritima β-xylanase was used as the reference (IRGHTLVWHNQTP, VYAWDVVNEAVD, AKLFYNDYNTFE, and EKGLIDGIGMQCH). Protein structure prediction was performed by AlphaFold Colab v2.0 52 using the default parameters and displayed using UCSF ChimeraX v1.2.5.

Data availability
The amplicon and shotgun sequencing data were deposited in the NCBI with BioProject number PRJNA761511 and BioSample accessions SAMN21353065-SAMN21353070. www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.