Exploring the taxonomical and functional profile of As Burgas hot spring focusing on thermostable β-galactosidases

In the present study we investigate the microbial community inhabiting As Burgas geothermal spring, located in Ourense (Galicia, Spain). The approximately 23 Gbp of Illumina sequences generated for each replicate revealed a complex microbial community dominated by Bacteria in which Proteobacteria and Aquificae were the two prevalent phyla. An association between the two most prevalent genera, Thermus and Hydrogenobacter, was suggested by the relationship of their metabolism. The high relative abundance of sequences involved in the Calvin–Benson cycle and the reductive TCA cycle unveils the dominance of an autotrophic population. Important pathways from the nitrogen and sulfur cycle are potentially taking place in As Burgas hot spring. In the assembled reads, two complete ORFs matching GH2 beta-galactosidases were found. To assess their functional characterization, the two ORFs were cloned and overexpressed in E. coli. The pTsbg enzyme had activity towards o-Nitrophenyl-β-d-galactopyranoside (ONPG) and p-Nitrophenyl-β-d-fucopyranoside, with high thermal stability and showing maximal activity at 85 °C and pH 6, nevertheless the enzyme failed to hydrolyze lactose. The other enzyme, Tsbg, was unable to hydrolyze even ONPG or lactose. This finding highlights the challenge of finding novel active enzymes based only on their sequence.

. Characteristics of the paired-end raw sequences obtained after Illumina MiSeq sequencing of As Burgas water before and after quality control (QC) with PRINSEQ. Read 1 and Read 2 correspond to the paired reads. www.nature.com/scientificreports/ Alternatively, some species can oxidize thiosulfate or sulfur as energy sources 32 . Compared with other geothermal springs worldwide, the community structure of As Burgas is very similar to the Mihai-Bravu in Romania, which has similar temperature and pH (65 °C, pH 7.91) 28 , as both of the springs were dominated by phyla Proteobacteria, Aquificae, and Deinococcus-Thermus. This result suggests that chemolitotrophy by oxidation of H 2 and reduced sulfur compounds are important metabolic processes in these springs and that the members of phylum Aquificae play a main role in primary productivity in this community. Focusing on the genus level, the three most abundant genera in As Burgas water were Thermus (21,221 sequences (15.77%)), Hydrogenobacter (11,517 sequences (8.56%)) and Thiobacillus (5659 sequences (4.20%)). Thermus spp. has been traditionally described as heterotrophic thermophilic Gram-negative aerobic bacteria; although most are facultative anaerobes in the absence of oxygen and presence of nitrate 37 but some species from the genera have shown the ability to grow mixotrophically 38,39 . The dominance of Thermus in As Burgas water is consistent with this genus' optimal growth temperature (62-75 °C) 37 , in fact, members of this genus are commonly found in other thermal springs with temperatures above 60 °C. For example, in the hot springs of Heart Lake Geyser Basin in Yellowstone National Park, a shift in the microbial population was detected from several cyanobacterial genera at 44 °C to the observation of Thermus members at 63 °C and finally a predominance of this genus in the 75 °C geysers 40 . Thermus genus was also dominant in the 65 °C Mihai-Bravu spring in Romania 28 and the Rupi Basin geothermal spring in Bulgaria 41 . This genus also dominates the water of the geographically close Lobios hot spring in Ourense 23 .
Hydrogenobacter was the second most abundant genus in As Burgas. These extremely thermophilic representatives of phylum Aquificae are obligate chemolithotrophic organisms with anaerobic anabolism, but aerobic catabolism 42 . High relative abundance and co-existence of Hydrogenobacter with Thermus genera was found in Lobios (Ourense) 23 , Rupi Basin (Bulgaria) 41 , Elegedi (Eritrea) 43 and in Niujie (China) 44 thermal springs. The association between hydrogen-oxidizing Hydrogenobacter with hydrogen-producing Thermus in these hot springs suggests hydrogen metabolism as an essential component of these ecosystems.
In addition to the community analysis, functional analysis was performed with MG-RAST. The sequences that passed MG-RAST quality control produced 347,814 and 368,188 predicted protein-coding features for BW1 and BW2, respectively. From these, 52.1% (181,371) for BW1 and 52.8% (194,410) for sample BW2, were assigned annotation by MG-RAST to SEED functional categories (Subsystems) ( Table 2). Among the functional categories at Level 1 identified by the SEED subsystems annotation, the four most dominant were the clusteringbased subsystems (functional coupling evidence but unknown function; 13.44 ± 0.55%), protein metabolism (10.77 ± 0.17%), carbohydrates (9.55 ± 0.11%) and miscellaneous (6.42 ± 0.24%), based in the relative abundance of assigned reads (Fig. 2). More detailed information is provided in supplementary material (Supplementary  Table S3). Similar results were found in Lobios hot spring water where the clustering-based subsystems were found as the largest category followed by miscellaneous, carbohydrates, and protein metabolism 23 . The predominance of the clustering-based subsystems in both metagenomes shows how limited our knowledge is regarding the functional annotation of the microbial proteome, as the precise functions of most proteins in metabolic pathways are yet to be revealed. Thus, the strategy of discovering new activities by a functional-driven metagenomic approach rises as a valid alternative to overcome such challenges.
Since O 2 concentration is reduced in hot springs due to lower oxygen solubility in heated water, other electron acceptors are important, such as nitrate, elemental S, sulfate, or CO 2 . Thus, an overrepresentation of sequences related to nitrogen and sulfur metabolism could be expected in these kinds of habitats. Consequently, in this study, we specially review those pathways involved in nitrogen and sulfur metabolism.
Analysis of the nitrogen metabolism at subsystem level 3 revealed a high abundance of sequences involved in nitrate and nitrite ammonification, also known as dissimilatory nitrate reduction to ammonium (DNRA) ( Table 3). DNRA is the result of anaerobic respiration by chemoorganoheterotrophic microorganisms using nitrate (NO 3 − ) as a final electron acceptor, producing ammonia (NH 4 + ). This metabolic pathway results in nitrogen (N) conservation in the ecosystems and is favored in habitats where NO 3 − is limiting in relation to organic carbon 45 . Therefore, the low NO 3 content found in As Burgas water in comparison to other proximal geothermal springs such as Outariz, Tinteiro and Chavasqueira 20 might be promoting the prevalence of DNRA bacteria like Proteobacteria 46,47 . This result is in accordance with the dominance of phylum Proteobacteria found in the taxonomical analysis of As Burgas metagenomic sequences. Nevertheless, it is important to remark that the presence www.nature.com/scientificreports/ or relative abundance of a gene in a metagenome does not mean that it is active. Metatranscriptomic studies are necessary to determine if DNRA is an important pathway in this ecosystem. In this aspect, other studies have reported the occurrence of an active DNRA pathway in some hot springs [48][49][50] . A high number of reads with similarity to ammonia assimilation were found in As Burgas water metagenome ( Table 3). The abundance of sequences annotated as glutamine synthetase and glutamate synthase, key enzymes in this metabolic pathway, were already expected as they are widely distributed among microorganisms, playing an important role in nitrogen metabolism 51 .
Reads annotated as Nitrogenase (Nif) genes for nitrogen fixation were also abundant in the metagenome. Although the distribution of these genes seems to be widespread in nature, as they have been described in different environments 52 including hot springs [53][54][55] , active nitrogen fixation has been reported in several thermophilic organisms 56,57 . Nitrogen fixation could be important in As Burgas as this ecosystem harbors phyla with known diazotrophic representatives such as Proteobacteria and the phylum Aquificae in which some members of Hydrogenobacter were recently described as nitrogen-fixing bacteria 58 . Furthermore, nitrogen fixation has been demonstrated in other geothermal springs such as several hot springs from Yellowstone National Park 59,60 and Nakabusa hot springs in Japan 61 , among others.
Nitrification might also take place in As Burgas ecosystem, as sequences matching the ammonia monooxigenase (AMO) enzyme were detected in the two metagenomes. This enzyme catalyzes the oxidation of ammonia to hydroxylamine and it is essential for chemolithotrophic ammonia-oxidizing bacteria. The oxidation of ammonia to nitrite in As Burgas hot spring water could be associated with the abundant Proteobacteria, as several members of this phylum have been described as autotrophic nitrifiers 62,63 .
Another important component in the nitrogen cycle is denitrification, which competes with DNRA, due to the dependence of both metabolic pathways on NO 3 − . Members of the genus Thermus are important denitrifiers in heated ecosystems, as they can perform facultative anaerobic respiration using NO 3 − as the final electron acceptor, producing N 2 or nitrous oxide (N 2 O) 37 . In addition, representatives from another abundant genus in As Burgas, Thiobacillus, also perform denitrification processes 64,65 . Unexpectedly, not many sequences related to denitrification were annotated in the metagenome (771 sequences in BW1 and 692 in BW2), even though these potential denitrifiers were two of the most abundant genera found in As Burgas. At the function level, sequences related to denitrification such as nitrite reductase (nir), nitric-oxide reductase (nor) and nitrous-oxide reductase (nos), were present in both metagenomes, but not in high abundance. www.nature.com/scientificreports/ Functions involved in sulfur oxidation were also abundant in As Burgas water ( Table 3). The high abundance of these sequences can be attributed to the prevalence of Proteobacteria in the microbial community, since this is an important sulfur-oxidizing phyla 66,67 . Numerous members of the abundant phylum Aquificae and Deinococcus-Thermus can oxidize thiosulphate or sulfur as an energy source and thus harbor sox genes 38,39,68 . Moreover, some sulfur-oxidizing bacterial species of the genus Thermus and Thiobacillus are also nitrate-reducing bacteria that accept electrons from the oxidation of reduced inorganic sulfur compounds and have been frequently identified in a diverse range geothermal springs 38,39,64 . Therefore, sulfur oxidation coupled with denitrification could be an important source of energy for carbon fixation in this hot spring, as was previously described for other hot springs 69 and diverse heated habitats like hydrothermal vents 70 .
In relation to carbon-fixation metabolism, a high abundance of sequences associated with the reductive pentose phosphate cycle (Calvin-Benson cycle) ( Table 3) was found. This cycle has been described as the principal pathway of carbon fixation in Cyanobacteria and Proteobacteria 71 and some studies have reported the presence of genes related to this cycle in several Thermus strains 72 .
The number of sequences affiliated to the tricarboxylic acid (TCA) cycle was also representative (1742 for BW1 and 1775 for BW2), but slightly lower than those for the Calvin-Benson cycle. Most enzymes involved in the TCA cycle function in an oxidative way (releasing stored energy through the oxidation of acetyl-CoA into ATP and CO 2 ), but they can be used by some microorganisms in a reductive TCA cycle that is essentially the oxidative TCA cycle running in reverse, leading to the fixation of two molecules of CO 2 and the production of one molecule of acetyl-CoA 73 . Reverse TCA is suggested to be a more ancient pathway for carbon fixation 74 and the main route for primary production at high temperatures (above 70 °C) 75 . The ability to perform the reverse TCA cycle is typical of bacteria from the phylum Aquificae such as Hydrogenobacter 75,76 and was confirmed in a variety of anaerobic and microaerobic bacteria, including several proteobacteria 73 . Moreover, reads annotated as pyruvate: ferredoxin oxidoreductases (POR) were found in the two metagenomes. POR enzyme decarboxylates pyruvate to form acetyl-CoA and is crucial for the reverse TCA cycle, as it is able to act as pyruvate synthase catalyzing the reverse reaction 77,78 . The high abundance of sequences involved in the Calvin-Benson and reverse www.nature.com/scientificreports/ TCA cycles reveals that autotrophy is an important source of energy of the ecosystem, as was expected, in accordance with the low organic content of this kind of thermal habitats. A high relative abundance of reads associated with one-carbon metabolism such as YgfZ, a folate-binding regulatory protein 79 and sequences related to the serine-glyoxylate cycle (Table 3) were identified. Serineglyoxylate cycle is a carbon assimilation pathway found in aerobic methanotrophs belonging to the classes Alpha-, Gammaproteobacteria, and the phylum Verrucomicrobia 80 . Sequences annotated as crucial enzymes for methanotrophic metabolism such as methane monooxygenase, methanol dehydrogenase or hydroxypyruvate reductase 81,82 were present in the two replicates of As Burgas metagenome. A similar result was previously reported for the nearby Lobios hot spring, in which a high abundance of sequences associated with YgfZ and the serine-glyoxylate cycle was also detected. However, Lobios metagenome lacks the methane monooxygenase and methanol dehydrogenase encoding genes 23 . The methanogenic microorganisms frequently found in hot springs microbial mats 83 would be the methane producers for methanotrophs in As Burgas. In fact, sequences annotated to the methanogenic orders Methanobacteriales, Methanocellales, Methanomicrobiales, Methanosarcinales, and Methanopyrales were found among the archaeal reads in the taxonomical analysis of As Burgas. Moreover, sequences matching several proteins involved in methanogenesis such as heterodisulfite reductase, formate dehydrogenase, and carbon monoxide dehydrogenase were found in the metagenome. Nevertheless, the presence of methyl-coenzyme M reductase gene, a key enzyme in methanogenesis 84 , was not detected in the metagenome.
Sequence assembly and screening for sequences annotated as β-galactosidase. From the 873,846 quality paired-end BW2 raw reads, a total of 28,296 contigs with a maximum length of 263,962 bp and an average length of 932 bp (26,379,150 bp) were obtained using SPADes. From these, 26,417 sequences (93.36%) were annotated to the functional level with the MG-RAST. A search for β-galactosidase sequences with this tool resulted in only 2 sequences that harbor complete coding ORFs that were chosen for further study. Both selected ORFs belong to Themus scotoductus SA-01, as their nucleotidic sequence had 100% alignment with the T. scotoductus SA-01 complete genome, deposited in the GenBank by Gounder et al. 85 under the accession number CP001962.1. This result is consistent with the dominance of Thermus genera reported in the taxonomical analysis. The deduced protein sequence of Tsbg and pTsbg consisted of 574 and 690 residues, respectively, and showed 100% homology with two different β-galactosidases from T.scotoductus with GeneBank accession number WP_015717803.1 and WP_015717801.1 for Tsbg and pTsbg respectively. The two proteins have been registered in GeneBank as part of a whole shotgun genome sequencing and annotation, but their cloning and expression have never been reported, therefore we selected both ORFs for further study and characterization. Both protein sequences contain a Glycosyl hydrolases family 2 (GH2) TIM barrel Domain (PF02836) according to Pfam protein database 86 . Therefore they are within the GH2 superfamily, in agreement with other thermostable microbial β-galactosidases like those from Thermotoga maritima 87 or Streptococcus thermophilus 88 . Cloning, expression, and purification of T. scotoductus β-galactosidases. Both sequences were efficiently amplified, cloned in pDEST-527 vector and overexpressed in T7 Express E. coli. As no activity towards ONPG or lactose was detected for Tsbg, the gene was cloned in pDEST-527 without the histidine tag, in an attempt to discard the possibility of an incorrect folding or blocking of the active site due to the tag. Nevertheless, purified Tsbg protein without tag did not show activity using both lactose and ONPG as substrates. The lack of β-galactosidase activity in Tsbg is similar to the results obtained for T. scotoductus DSM 8553, as no β-galactosidase activity was detected in this strain 89,90 , which suggests that the cause is the protein itself rather than the expression host. Therefore, the successive characterization steps were only performed with the pTsbg.
Effect of pH and temperature on activity and stability of recombinant pTsbg. pTsbg showed maximal activity at pH 6.0 in Britton-Robinson buffer using ONPG as substrate (Fig. 3A). This result is slightly lower than the optimum pH reported for other bacteria from Thermus genera like T. thermophilus HB8 91 , T. thermophilus HB27 92 and it is comparable to the optimal pH reported for other thermostable β-galactosidases such as those from Bacillus licheniformis 93 , Caldicellulosiruptor saccharolyticus 94 , Marinomonas sp. BSi20414 95 and much lower than the pH 7.8 reported for T. oshimai DSM 12092 β-galactosidase 96 .
As shown in Fig. 3B, maximal pTsbg β-galactosidase activity towards ONPG was found at 85 °C. This optimal temperature is higher than described using the same substrate for other counterparts of the genus Thermus such as T.thermophilus HB8 91 , T.thermophilus HB27 92 , T. aquaticus YT-1 97 , T. oshimai DSM 12092 96 and is the same reported as optimal to T.thermophilus KNOUC114 β-galactosidase 98 . When compared to other genera of thermophilic bacteria β-galactosidases, pTsbg showed higher optimal temperature than documented for the extremely thermophilic C. saccharolyticus and Marinomonas sp. BSi20414, which showed the optimum temperature at 80 °C and 60 °C respectively 94,95 . Nevertheless, the optimal temperature described for Thermotoga naphthophila RUK-10 β-galactosidase is higher 99 .
In relation to the thermal stability, pTsbg was able to retain up to 60% of its maximal activity towards ONPG after 24 h of incubation at 75 °C (Fig. 4).
Determination of substrate specificity of pTsbg. Although substrate specificity of the enzyme was studied using the eight chromogenic substrates described in the "Methods" section, pTsbg was only active towards ONPG and p-Nitrophenyl-β-d-fucopyranoside. Moreover, the enzyme was unable to hydrolyze lactose and no transgalactosylation was observed in the presence of this substrate, as was determined by HPLC after carrying the reaction with 40% lactose at 70 °C and using a mix of galactose, glucose, lactose, raffinose and stachyose as standard (data not shown). www.nature.com/scientificreports/ The preference for β-linked galactosidic substrates such as ONPG or p-Nitrophenyl-β-d-fucopyranoside over lactose has been frequently described in the characterization of β-galactosidases 99,100 . Similar to our results with pTsbg, other studies have reported β-galactosidases with activity towards ONPG but unable to hydrolyze their natural substrate lactose in vitro such as YesZ β-galactosidase from Bacillus subtilis 101 or the β-Gal II from Bifidobacterium adolescentis DSM 20083 102 . The lack of β-galactosidase activity towards lactose reduces considerably the biotechnological potential of pTsbg, as it could not be applied to produce GOS from lactose and to generate lactose-free dairy products. Nevertheless, more studies focused on the fucosidase activity should be conducted, since pTsbg showed high activity with p-Nitrophenyl-β-d-fucopyranoside and may harbor fucosyltransferase activity that could be used for the synthesis of fucosylated oligosaccharides (FUCOS) with biological interest 103 such as those from human milk.

Conclusions
The taxonomical analysis of As Burgas hot spring metagenome reveals a microbial community dominated by Bacteria in which Proteobacteria (68.25 ± 3.59%) and Aquificae (11.24 ± 1.15%) are the most abundant phyla. The prevalence of the genera Thermus (15.77%) and Hydrogenobacter (8.56%) and the relation of their metabolism suggests an association between these two genera.  www.nature.com/scientificreports/ Moreover, the high relative abundance of sequences involved in the Calvin-Benson cycle and sequences annotated as key for the reductive TCA cycle unveils the dominance of an autotrophic population. Important pathways from the nitrogen and sulfur cycle such as DNRA, nitrification, or sulfur oxidation are potentially taking place in As Burgas hot spring, as was determined by the functional annotation of the metagenomic reads and in accordance with the microbial composition of the ecosystem.
After assembling the metagenomic reads two complete ORFs annotated as β-galactosidases were found. Both of them showed 100% homology with T. scotoductus SA-01 and were cloned and overexpressed in E. coli. The enzyme Tsbg lacked β-galactosidase activity using ONPG and lactose as substrates. On the contrary, pTsbg showed β-galactosidase activity towards ONPG but was not able to hydrolyze lactose; it showed β-fucosidase activity on the substrate p-Nitrophenyl-β-d-fucopyranoside, which suggests a priori unexpected biotechnological application. Once more this result reveals that the presence of a gene in a metagenome does not mean that it is active in the way predicted from the sequence, and highlights the importance of combining both functional and sequence metagenomics to find novel enzymes from metagenomes.
Our culture-independent study has provided an insight into the diversity of the microorganisms that inhabit As Burgas thermal environment, in an attempt to find novel β-galactosidases. Future research should be directed to characterize new environments, which will lead to better understanding of their ecological differences, and to find new enzymes of interest.

Methods
Sampling. Thermal water, with temperature 66.3 °C and pH 7.56 20  Taxonomic and functional assignment of metagenomic sequences. Illumina reads were treated with PRINSEQ software for quality control, removing all artificial duplicate reads and reads shorter than 60 base-pairs 104 . High-quality unassembled reads of both replicates were uploaded into the Metagenomics Rapid Annotation using the Subsystem Technology (MG-RAST) v4.0.3 server 105 and are available under the accession numbers mgm4709017.3 (BW1) and mgm4709018.3 (BW2). MG-RAST is an automated annotation pipeline in which taxonomic assignment is done with BLAT comparisons 106 to the NCBI, and gene functional potential with BLAT comparisons to the SEED protein database 105 . Sequence annotations were performed using the following parameters: cut off e-value 10 −5 , minimum 60% identity, and > 15 bp alignment length, as we have done previously 107 .
To reduce the differences related to library size, relative abundance was calculated as the percentage of reads assigned to a taxon or gene function in proportion to the total number of annotated reads.

Sequence assembly and screening for sequences annotated as β-galactosidase. Paired-end
unassembled high-quality reads were merged using PEAR 108 and assembled with the SPAdes pipeline 109 . Then, assembled reads were uploaded to MG-RAST for functional annotation with the SEED subsystem database (maximum e-value of e−5, minimum identity of 60%, and minimum alignment length of 15). The contigs that contained β-galactosidases sequences were downloaded and analyzed for all possible open reading frames (ORFs) using NCBI ORF finder 110 . The ORFs and the deduced amino acid sequence were compared with other known sequences using nucleotide-nucleotide and protein-protein basic local alignment search tool (BLASTN and BLASTP) search 111 . The Pfam 32.0 web server, based on Pfam family database 86 was used to infer the conserved domains within the amino acid sequences.

Cloning, expression, and purification of T. scotoductus β-galactosidases. Thermus scotoductus
β-galactosidase and putative β-galactosidase ORFs were amplified directly from the metagenomic DNA with the primers listed in Table 4 and both were cloned in the pDONR211 vector using the Invitrogen Gateway Technology (Invitrogen). From the Gateway vector, the gene was shuttled into the His-tagged expression vector pDEST-527, using the Gateway LR recombination reaction (Invitrogen). The constructions were transformed and expressed in T7 Express (C2566) E. coli (NEB). Induction was done with 0.4 mM IPTG for 2 h at 37 °C. Cells were collected by centrifugation (5000 rpm for 15 min 4 °C) and resuspended in 20 mM sodium phosphate buffer 500 mM NaCl (pH 7.2) and Complete Mini Protease Inhibitor Cocktail (Roche; Basel, Switzerland), following the manufacturer instructions. Cell disruption was done by sonication on ice using Vibra Cell sonicator ( Determination of β-galactosidase activity. Enzymatic activity was measured using ortho-Nitrophenylβ-d-galactopyranoside (ONPG). Purified protein preparations were diluted in 150 µL Z buffer (100 mM Na 2 HPO 4 , 40 mM NaH 2 PO 4 , 10 mM KCl, 1.6 mM MgSO 4 , pH 7). Aſter incubation for 5 min at 85 °C, the reaction was started by adding 150 µL of a solution of 4 mg mL −1 ONPG in Z buffer to the enzyme preparation. Aliquots (100 µL) of the reaction mixture were stopped by adding 100 µL 1 M Na 2 CO 3 . Released o-nitrophenol was measured by UV absorbance at 420 nm. β-galactosidase activity is expressed in enzymatic units (U), defined as the amount of enzyme capable of releasing one µmol of the product (o-nitrophenol) per min (µmol min −1 mL −1 ) under the experimental conditions. All measurements were determined in triplicate.
Effect of pH and temperature on activity and stability of recombinant pTsbg. To estimate the effect of pH on enzyme activity, the relative activities against ONPG (4 mg mL −1 ) were measured in the range of pH 5.0-8.5 using 20 mM Britton-Robinson buffer 113 . The influence of temperature was determined by measuring relative enzyme activities at 55-90 °C with ONPG (4 mg mL −1 ) in Z buffer. The thermal stability of the protein was assessed by pre-incubation of the enzyme in Z buffer at a range of 55-85 °C for different times followed by an activity assay against ONPG at 85 °C.