Thermophiles, growing optimally at temperatures over 55 °C, are found in hot environments such as fumaroles, hydrothermal vents, hot springs, or deserts1,2,3,4. Apart from high temperatures, these habitats usually show other harsh conditions like extreme pH or high salt concentration. Therefore, the study of microorganisms inhabiting hot environments and their enzymes has drawn considerable interest from a biotechnological point of view, as these extremophiles have features suitable for industrial processes, in which high stability and activity at elevated temperatures, as well as high tolerance toward various reagents and solvents, are required.

The potential of thermal water as a source of novel thermostable biocatalysts has been demonstrated, since a considerable number of thermozymes such as lipases5, polymerases6, or cellulases7, among others, have been isolated from hot springs. In recent years, metagenomics has become a powerful tool to explore the microbial community composition and activity of extreme environments, like hot springs, whose conditions are difficult to reproduce in the laboratory. The metagenomic approach is based on the study of the whole environmental microbial DNA (metagenome), which is either directly sequenced, in what is called sequence metagenomics, or ligated into a vector and transformed to generate a metagenomic library, in what is known as functional metagenomics. Sequence metagenomics has enabled the study of a large number of hot springs all over the world, such as the Tuwa, Lasundra and Unkeshwar hot springs in India8,9,10, a hot spring in Kamchatka, Russia11, the Sungai Klah hot spring in Malaysia12 or several hot springs in Yellowstone National Park, USA13,14.

β-galactosidases catalyze the hydrolysis of lactose to glucose and galactose, and they have drawn considerable interest from the biotechnological industry for the production of low-lactose milk and the revalorization of whey. Furthermore, some β-galactosidases can transfer the galactosyl residue of lactose, carrying out transgalactosylation reactions, which are frequently used for the synthesis of galacto-oligosaccharides (GOS), with prebiotic effects15, and to synthesize other galactosylated products16. Metagenomics has contributed to the exploration of heated habitats such as hot springs, either for ecological study or for bioprospection of novel enzymes. Some thermal β-galactosidases have been isolated from hot springs using functional metagenomics17,18, but there is only one reported study of thermostable β-galactosidases found in hot springs through sequence metagenomics19.

In the province of Ourense (Spain), there are at least 13 geothermal springs widespread across the region. Because of its accessibility and its historical importance, in this study, we have focused on As Burgas hot spring. Although some authors have previously investigated its water composition20, or its culturable microorganisms21, the present is the first reported metagenomic study of this hot spring. From the unassembled reads obtained through shotgun metagenomic DNA sequencing, we have assessed taxonomical and functional characteristics of As Burgas water population. Then, metagenomic sequences were assembled and annotated, finding two potential β-galactosidases that have been cloned, purified, and characterized.

Results and discussion

Taxonomic and functional assignment of metagenomic sequences

The BW1 and BW2 metagenomes consisted of 747,684 and 761,635 high-quality reads, respectively (Table 1). There were no significant differences between the two samples (data not shown); thus, the relative abundances of reads assigned to each taxon or function were expressed as an average. The taxonomical community analysis revealed a predominance of Bacteria (93.11 ± 1.86%), followed by Archaea (6.18 ± 1.84%), Eukaryota (0.67 ± 0.009%), and Viruses (0.02 ± 0.03%) (Fig. 1A). From the 27 bacterial phyla detected, the most abundant were Proteobacteria (68.25 ± 3.59%), Aquificae (11.24 ± 1.15%), Deinococcus-Thermus (5.26 ± 1.01%), Firmicutes (4.29 ± 0.53%) and Bacteroidetes (1.95 ± 0.19%) (Fig. 1B). More detailed information on the community structure is provided in supplementary material (Supplementary Tables S1, S2).
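
The "average ± deviation" figures reported for the two replicates can be illustrated with a minimal sketch. It assumes the ± values are sample standard deviations, which the text does not state, and the per-replicate percentages below are hypothetical placeholders rather than values from the study.

```python
from statistics import mean, stdev


def summarize_replicates(values):
    """Return (mean, sample standard deviation) of replicate percentages."""
    return mean(values), stdev(values)


# Hypothetical per-replicate relative abundances (%), NOT taken from the study.
replicates = {
    "Bacteria": [94.0, 92.0],
    "Archaea": [5.0, 7.0],
}

for taxon, values in replicates.items():
    avg, sd = summarize_replicates(values)
    print(f"{taxon}: {avg:.2f} ± {sd:.2f}%")
```

With only two replicates, the sample standard deviation reduces to the absolute difference between the two values divided by the square root of 2, so it should be read as a rough indicator of agreement rather than a robust estimate of variability.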

Table 1 Characteristics of the paired-end raw sequences obtained after Illumina MiSeq sequencing of As Burgas water before and after quality control (QC) with PRINSEQ.
Figure 1

(A) Taxonomic assignment of the reads at domain level. The chart represents the percentage of reads assigned to each domain (relative abundance expressed as a percentage from the total assigned reads). (B) Taxonomic assignment of sequences within Bacteria domain. Percentage of reads annotated at phylum level is represented. Others include those phyla with less than 0.7% sequences assigned (Candidatus Poribacteria, Chlamydiae, Chlorobi, Chrysiogenetes, Deferribacteres, Dictyoglomi, Elusimicrobia, Fibrobacteres, Fusobacteria, Gemmatimonadetes, Lentisphaerae, Spirochaetes, Synergistetes, Tenericutes, Thermotogae, unclassified (derived from Bacteria) and Verrucomicrobia). Graphs were created with Excel for Windows version 14.0.0.

The predominance of Bacteria followed by Archaea was also found in the soil and the water of the Lobios hot spring, located in the same Galician region22,23. Nevertheless, in contrast with the significant relative abundance of Proteobacteria found in As Burgas water, Acidobacteria was the major phylum in the Lobios sediment while Deinococcus-Thermus dominated the Lobios water. These differences might be due to the influence of physicochemical parameters, such as pH and temperature, on the microbial community composition. In fact, As Burgas water has a lower temperature (66.3 °C) and pH (7.56)20 than Lobios water (76 °C, pH 8.2)23. It is also important to consider that taxonomical assignment in the study of Lobios water was done using assembled reads rather than the unassembled reads and thus, real phyla abundance might be lost24.

Temperature has been reported as a key factor in the prevalence of Proteobacteria. Dominance of this phylum has been found in geographically distant but moderate-temperature (29–65 °C) geothermal springs like Deulajhari and Tattapani in India25,26, Aguas Calientes in the Amazon rainforest of Perú27, Chiraleu, Ciocaia, and Mihai Bravu in Romania28 or El Coquito in the Colombian Andes29. Moreover, Power et al.30 found that the phyla Proteobacteria and Aquificae dominated in 925 geothermal springs in New Zealand (65.2% total average relative abundance across all springs), especially in hot springs with temperatures below 50 °C, where Proteobacteria were the most abundant phylum. Similar results were found by Najar et al.31, who studied the microbial diversity of the Polok (75–77 °C) and Borong (50–52 °C) hot springs in India and found that the dominance of the phylum Proteobacteria was more pronounced in the Borong hot spring, which had a lower temperature. Another distinctive aspect of Proteobacteria is that they are known to tolerate a higher concentration of sulfur and use reduced compounds of this element as an electron donor during their physiological processes31.

Aquificae is the second most abundant phylum in the As Burgas ecosystem, accounting for 11.24 ± 1.15% of the metagenome. This phylum encompasses strictly thermophilic bacteria with an optimum growth temperature above 65 °C32. A high relative abundance of Aquificae occurs in other hot springs with a broad range of pH and temperatures, including six geothermal springs in the Philippines (60–92 °C, pH 3.72–6.58)33, the Mihai Bravu in Romania28 and the Ganzi Prefecture hot springs in China34. Members of this phylum dominate in environments with limited biomass and low ion concentrations, such as the King-Yu, Nono-Yu Koya, Yamanojo, and Jinata Onsen hot springs in Japan35,36, among others. Most Aquificae representatives are hydrogen-oxidizing bacteria that use hydrogen as electron donor, carbon dioxide as carbon source, and oxygen as the final electron acceptor. Alternatively, some species can oxidize thiosulfate or sulfur as energy sources32. Compared with other geothermal springs worldwide, the community structure of As Burgas is very similar to that of the Mihai-Bravu in Romania, which has a similar temperature and pH (65 °C, pH 7.91)28, as both springs were dominated by the phyla Proteobacteria, Aquificae, and Deinococcus-Thermus. This result suggests that chemolithotrophy by oxidation of H2 and reduced sulfur compounds is an important metabolic process in these springs and that the members of phylum Aquificae play a main role in primary productivity in this community.

Focusing on the genus level, the three most abundant genera in As Burgas water were Thermus (21,221 sequences, 15.77%), Hydrogenobacter (11,517 sequences, 8.56%) and Thiobacillus (5659 sequences, 4.20%). Thermus spp. have traditionally been described as heterotrophic, thermophilic, Gram-negative aerobic bacteria, although most are facultative anaerobes that grow in the absence of oxygen when nitrate is present37, and some species of the genus have shown the ability to grow mixotrophically38,39. The dominance of Thermus in As Burgas water is consistent with this genus’ optimal growth temperature (62–75 °C)37; in fact, members of this genus are commonly found in other thermal springs with temperatures above 60 °C. For example, in the hot springs of Heart Lake Geyser Basin in Yellowstone National Park, a shift in the microbial population was detected from several cyanobacterial genera at 44 °C to the observation of Thermus members at 63 °C and finally a predominance of this genus in the 75 °C geysers40. The genus Thermus was also dominant in the 65 °C Mihai-Bravu spring in Romania28 and the Rupi Basin geothermal spring in Bulgaria41. This genus also dominates the water of the geographically close Lobios hot spring in Ourense23.

Hydrogenobacter was the second most abundant genus in As Burgas. These extremely thermophilic representatives of phylum Aquificae are obligate chemolithotrophic organisms with anaerobic anabolism, but aerobic catabolism42. High relative abundance and co-existence of Hydrogenobacter with Thermus genera was found in Lobios (Ourense)23, Rupi Basin (Bulgaria)41, Elegedi (Eritrea)43 and in Niujie (China)44 thermal springs. The association between hydrogen-oxidizing Hydrogenobacter with hydrogen-producing Thermus in these hot springs suggests hydrogen metabolism as an essential component of these ecosystems.

In addition to the community analysis, functional analysis was performed with MG-RAST. The sequences that passed MG-RAST quality control produced 347,814 and 368,188 predicted protein-coding features for BW1 and BW2, respectively. From these, 52.1% (181,371) for BW1 and 52.8% (194,410) for BW2 were assigned an annotation by MG-RAST to SEED functional categories (Subsystems) (Table 2). Among the functional categories at Level 1 identified by the SEED subsystems annotation, the four most dominant were the clustering-based subsystems (functional coupling evidence but unknown function; 13.44 ± 0.55%), protein metabolism (10.77 ± 0.17%), carbohydrates (9.55 ± 0.11%) and miscellaneous (6.42 ± 0.24%), based on the relative abundance of assigned reads (Fig. 2). More detailed information is provided in supplementary material (Supplementary Table S3). Similar results were found in Lobios hot spring water, where the clustering-based subsystems were the largest category, followed by miscellaneous, carbohydrates, and protein metabolism23. The predominance of the clustering-based subsystems in both metagenomes shows how limited our knowledge is regarding the functional annotation of the microbial proteome, as the precise functions of most proteins in metabolic pathways are yet to be revealed. Thus, the strategy of discovering new activities by a function-driven metagenomic approach emerges as a valid alternative to overcome such challenges.

Table 2 MG-RAST summary of the two replicates of the As Burgas water metagenome (BW1 and BW2 samples; MG-RAST IDs mgm4709017.3 and mgm4709018.3, respectively).
Figure 2

Functional profile of As Burgas hot spring at SEED subsystems level 1. Of 347,814 and 368,188 protein-coding regions predicted from BW1 and BW2 reads by MG-RAST, 52.1% (181,371) and 52.8% (194,410) were assigned by MG-RAST to SEED functional categories (Subsystems). The percentage of reads assigned to each function is represented. Others include those functions with less than 2.11% reads assigned (Cell Division and Cell Cycle; Dormancy and Sporulation; Fatty Acids, Lipids, and Isoprenoids; Iron acquisition and metabolism; Metabolism of Aromatic Compounds; Phages, Prophages, Transposable elements, Plasmids; Phosphorus Metabolism; Photosynthesis; Potassium metabolism; Regulation and Cell signaling; Secondary Metabolism; Sulfur Metabolism; Motility and Chemotaxis). Graph was created with Excel for Windows version 14.0.0.
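
The grouping of minor categories into "Others", used in both Fig. 1B (phyla below 0.7%) and Fig. 2 (functions below 2.11%), can be sketched as follows. The function name `collapse_minor` and the example percentages are hypothetical and chosen only for illustration; they are not the study's data.

```python
def collapse_minor(percentages, threshold):
    """Group categories whose share (%) is below `threshold` into 'Others'."""
    major = {name: pct for name, pct in percentages.items() if pct >= threshold}
    others = sum(pct for pct in percentages.values() if pct < threshold)
    if others > 0:
        major["Others"] = others
    return major


# Hypothetical relative abundances (%), for illustration only.
example = {"Category A": 50.0, "Category B": 48.5, "Category C": 1.0, "Category D": 0.5}
print(collapse_minor(example, threshold=2.11))
```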

Since O2 concentration is reduced in hot springs due to the lower oxygen solubility in heated water, other electron acceptors, such as nitrate, elemental S, sulfate, or CO2, become important. Thus, an overrepresentation of sequences related to nitrogen and sulfur metabolism could be expected in these kinds of habitats. Consequently, in this study, we specifically reviewed the pathways involved in nitrogen and sulfur metabolism.

Analysis of the nitrogen metabolism at subsystem level 3 revealed a high abundance of sequences involved in nitrate and nitrite ammonification, also known as dissimilatory nitrate reduction to ammonium (DNRA) (Table 3). DNRA is the result of anaerobic respiration by chemoorganoheterotrophic microorganisms using nitrate (NO3−) as a final electron acceptor, producing ammonium (NH4+). This metabolic pathway results in nitrogen (N) conservation in the ecosystems and is favored in habitats where NO3− is limiting in relation to organic carbon45. Therefore, the low NO3− content found in As Burgas water in comparison to other proximal geothermal springs such as Outariz, Tinteiro and Chavasqueira20 might be promoting the prevalence of DNRA bacteria like Proteobacteria46,47. This result is in accordance with the dominance of phylum Proteobacteria found in the taxonomical analysis of As Burgas metagenomic sequences. Nevertheless, it is important to remark that the presence or relative abundance of a gene in a metagenome does not mean that it is active. Metatranscriptomic studies are necessary to determine if DNRA is an important pathway in this ecosystem. In this respect, other studies have reported the occurrence of an active DNRA pathway in some hot springs48,49,50.

Table 3 Analysis of Subsystems at level 3. From the 28 subsystems at level 3 registered by MG-RAST, only those subsystems with more than 2000 reads assigned were collected in the table.

A high number of reads with similarity to ammonia assimilation genes were found in the As Burgas water metagenome (Table 3). The abundance of sequences annotated as glutamine synthetase and glutamate synthase, key enzymes in this metabolic pathway, was expected, as they are widely distributed among microorganisms, playing an important role in nitrogen metabolism51.

Reads annotated as Nitrogenase (Nif) genes for nitrogen fixation were also abundant in the metagenome. Although the distribution of these genes seems to be widespread in nature, as they have been described in different environments52 including hot springs53,54,55, active nitrogen fixation has been reported in several thermophilic organisms56,57. Nitrogen fixation could be important in As Burgas as this ecosystem harbors phyla with known diazotrophic representatives such as Proteobacteria and the phylum Aquificae in which some members of Hydrogenobacter were recently described as nitrogen-fixing bacteria58. Furthermore, nitrogen fixation has been demonstrated in other geothermal springs such as several hot springs from Yellowstone National Park59,60 and Nakabusa hot springs in Japan61, among others.

Nitrification might also take place in the As Burgas ecosystem, as sequences matching the ammonia monooxygenase (AMO) enzyme were detected in the two metagenomes. This enzyme catalyzes the oxidation of ammonia to hydroxylamine and is essential for chemolithotrophic ammonia-oxidizing bacteria. The oxidation of ammonia to nitrite in As Burgas hot spring water could be associated with the abundant Proteobacteria, as several members of this phylum have been described as autotrophic nitrifiers62,63.

Another important component in the nitrogen cycle is denitrification, which competes with DNRA, due to the dependence of both metabolic pathways on NO3−. Members of the genus Thermus are important denitrifiers in heated ecosystems, as they can perform facultative anaerobic respiration using NO3− as the final electron acceptor, producing N2 or nitrous oxide (N2O)37. In addition, representatives from another abundant genus in As Burgas, Thiobacillus, also perform denitrification processes64,65. Unexpectedly, not many sequences related to denitrification were annotated in the metagenome (771 sequences in BW1 and 692 in BW2), even though these potential denitrifiers were two of the most abundant genera found in As Burgas. At the function level, sequences related to denitrification such as nitrite reductase (nir), nitric-oxide reductase (nor) and nitrous-oxide reductase (nos), were present in both metagenomes, but not in high abundance.

Functions involved in sulfur oxidation were also abundant in As Burgas water (Table 3). The high abundance of these sequences can be attributed to the prevalence of Proteobacteria in the microbial community, since this is an important sulfur-oxidizing phylum66,67. Numerous members of the abundant phyla Aquificae and Deinococcus-Thermus can oxidize thiosulfate or sulfur as an energy source and thus harbor sox genes38,39,68. Moreover, some sulfur-oxidizing bacterial species of the genera Thermus and Thiobacillus are also nitrate-reducing bacteria that accept electrons from the oxidation of reduced inorganic sulfur compounds and have been frequently identified in a diverse range of geothermal springs38,39,64. Therefore, sulfur oxidation coupled with denitrification could be an important source of energy for carbon fixation in this hot spring, as was previously described for other hot springs69 and diverse heated habitats like hydrothermal vents70.

In relation to carbon-fixation metabolism, a high abundance of sequences associated with the reductive pentose phosphate cycle (Calvin–Benson cycle) (Table 3) was found. This cycle has been described as the principal pathway of carbon fixation in Cyanobacteria and Proteobacteria71 and some studies have reported the presence of genes related to this cycle in several Thermus strains72.

The number of sequences affiliated to the tricarboxylic acid (TCA) cycle was also representative (1742 for BW1 and 1775 for BW2), but slightly lower than those for the Calvin–Benson cycle. Most enzymes involved in the TCA cycle function in an oxidative way (releasing stored energy through the oxidation of acetyl-CoA into ATP and CO2), but they can be used by some microorganisms in a reductive TCA cycle that is essentially the oxidative TCA cycle running in reverse, leading to the fixation of two molecules of CO2 and the production of one molecule of acetyl-CoA73. Reverse TCA is suggested to be a more ancient pathway for carbon fixation74 and the main route for primary production at high temperatures (above 70 °C)75. The ability to perform the reverse TCA cycle is typical of bacteria from the phylum Aquificae such as Hydrogenobacter75,76 and was confirmed in a variety of anaerobic and microaerobic bacteria, including several proteobacteria73. Moreover, reads annotated as pyruvate:ferredoxin oxidoreductases (POR) were found in the two metagenomes. The POR enzyme decarboxylates pyruvate to form acetyl-CoA and is crucial for the reverse TCA cycle, as it is able to act as pyruvate synthase, catalyzing the reverse reaction77,78. The high abundance of sequences involved in the Calvin–Benson and reverse TCA cycles reveals that autotrophy is an important source of energy of the ecosystem, as was expected, in accordance with the low organic content of this kind of thermal habitat.

A high relative abundance of reads associated with one-carbon metabolism such as YgfZ, a folate-binding regulatory protein79, and sequences related to the serine-glyoxylate cycle (Table 3) were identified. The serine-glyoxylate cycle is a carbon assimilation pathway found in aerobic methanotrophs belonging to the classes Alpha- and Gammaproteobacteria and the phylum Verrucomicrobia80. Sequences annotated as crucial enzymes for methanotrophic metabolism such as methane monooxygenase, methanol dehydrogenase or hydroxypyruvate reductase81,82 were present in the two replicates of the As Burgas metagenome. A similar result was previously reported for the nearby Lobios hot spring, in which a high abundance of sequences associated with YgfZ and the serine-glyoxylate cycle was also detected. However, the Lobios metagenome lacks the methane monooxygenase and methanol dehydrogenase encoding genes23. The methanogenic microorganisms frequently found in hot spring microbial mats83 would be the methane producers for methanotrophs in As Burgas. In fact, sequences assigned to the methanogenic orders Methanobacteriales, Methanocellales, Methanomicrobiales, Methanosarcinales, and Methanopyrales were found among the archaeal reads in the taxonomical analysis of As Burgas. Moreover, sequences matching several proteins involved in methanogenesis such as heterodisulfide reductase, formate dehydrogenase, and carbon monoxide dehydrogenase were found in the metagenome. Nevertheless, the gene for methyl-coenzyme M reductase, a key enzyme in methanogenesis84, was not detected in the metagenome.

Sequence assembly and screening for sequences annotated as β-galactosidase

From the 873,846 quality paired-end BW2 raw reads, a total of 28,296 contigs with a maximum length of 263,962 bp and an average length of 932 bp (26,379,150 bp in total) were obtained using SPAdes. From these, 26,417 sequences (93.36%) were annotated to the functional level with MG-RAST. A search for β-galactosidase sequences with this tool yielded only 2 sequences harboring complete coding ORFs, which were chosen for further study. Both selected ORFs belong to Thermus scotoductus SA-01, as their nucleotide sequences aligned with 100% identity to the T. scotoductus SA-01 complete genome, deposited in GenBank by Gounder et al.85 under the accession number CP001962.1. This result is consistent with the dominance of the genus Thermus reported in the taxonomical analysis. The deduced protein sequences of Tsbg and pTsbg consisted of 574 and 690 residues, respectively, and showed 100% identity with two different β-galactosidases from T. scotoductus with GenBank accession numbers WP_015717803.1 and WP_015717801.1 for Tsbg and pTsbg, respectively. The two proteins have been registered in GenBank as part of a whole-genome shotgun sequencing and annotation project, but their cloning and expression have never been reported; therefore, we selected both ORFs for further study and characterization. Both protein sequences contain a Glycosyl hydrolases family 2 (GH2) TIM barrel domain (PF02836) according to the Pfam protein database86. Therefore, they fall within the GH2 superfamily, in agreement with other thermostable microbial β-galactosidases like those from Thermotoga maritima87 or Streptococcus thermophilus88.
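
The assembly figures quoted above are internally consistent, which can be checked with a short sketch. The helper `assembly_stats` is a hypothetical illustration of how such summary values are derived from a list of contig lengths; it is not part of SPAdes or MG-RAST, and the three-contig example is made up.

```python
def assembly_stats(lengths):
    """Summarize a list of contig lengths (bp)."""
    total = sum(lengths)
    return {
        "contigs": len(lengths),
        "max_len": max(lengths),
        "total_bp": total,
        "mean_len": total / len(lengths),
    }


# Toy example with made-up contig lengths.
print(assembly_stats([500, 1200, 1100]))

# Consistency check using only the totals reported in the text.
n_contigs, total_bp, n_annotated = 28_296, 26_379_150, 26_417
print(f"mean contig length: {total_bp / n_contigs:.0f} bp")
print(f"annotated contigs: {100 * n_annotated / n_contigs:.2f}%")
```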

Cloning, expression, and purification of T. scotoductus β-galactosidases

Both sequences were efficiently amplified, cloned into the pDEST-527 vector and overexpressed in T7 Express E. coli. As no activity towards ONPG or lactose was detected for Tsbg, the gene was cloned in pDEST-527 without the histidine tag, in an attempt to rule out incorrect folding or blocking of the active site due to the tag. Nevertheless, the purified Tsbg protein without the tag did not show activity with either lactose or ONPG as substrate. The lack of β-galactosidase activity in Tsbg is similar to the results obtained for T. scotoductus DSM 8553, as no β-galactosidase activity was detected in this strain89,90, which suggests that the cause is the protein itself rather than the expression host. Therefore, the subsequent characterization steps were only performed with pTsbg.

Effect of pH and temperature on activity and stability of recombinant pTsbg

pTsbg showed maximal activity at pH 6.0 in Britton-Robinson buffer using ONPG as substrate (Fig. 3A). This value is slightly lower than the optimum pH reported for β-galactosidases from other bacteria of the genus Thermus, like T. thermophilus HB891 and T. thermophilus HB2792; it is comparable to the optimal pH reported for other thermostable β-galactosidases such as those from Bacillus licheniformis93, Caldicellulosiruptor saccharolyticus94 and Marinomonas sp. BSi2041495, and much lower than the pH 7.8 reported for T. oshimai DSM 12092 β-galactosidase96.

Figure 3

Effect of pH (A) and temperature (B) on the activity of pTsbg in Z Buffer using ONPG (4 mg mL−1) as substrate. Graphic was created using GraphPad Prism 6 for Windows (GraphPad Software, San Diego, California, USA).

As shown in Fig. 3B, maximal pTsbg β-galactosidase activity towards ONPG was found at 85 °C. This optimal temperature is higher than that described using the same substrate for counterparts from the genus Thermus such as T. thermophilus HB891, T. thermophilus HB2792, T. aquaticus YT-197 and T. oshimai DSM 1209296, and is the same as that reported for T. thermophilus KNOUC114 β-galactosidase98. When compared to β-galactosidases from other genera of thermophilic bacteria, pTsbg showed a higher optimal temperature than documented for the extremely thermophilic C. saccharolyticus and Marinomonas sp. BSi20414, which showed optimum temperatures of 80 °C and 60 °C, respectively94,95. Nevertheless, the optimal temperature described for Thermotoga naphthophila RUK-10 β-galactosidase is higher99.

In relation to the thermal stability, pTsbg was able to retain up to 60% of its maximal activity towards ONPG after 24 h of incubation at 75 °C (Fig. 4).
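
For readers who prefer a half-life to a residual-activity figure, the conversion can be sketched under a strong assumption that the source does not make: that thermal inactivation follows simple first-order kinetics. The function name is hypothetical, and the output should be treated as an illustration of the arithmetic, not as a measured property of pTsbg.

```python
import math


def first_order_half_life(residual_fraction, hours):
    """Half-life (h) implied by a residual activity fraction after `hours`,
    ASSUMING single-exponential (first-order) inactivation."""
    if not 0.0 < residual_fraction < 1.0:
        raise ValueError("residual_fraction must be between 0 and 1 (exclusive)")
    k = -math.log(residual_fraction) / hours  # inactivation rate constant (per hour)
    return math.log(2) / k


# 60% residual activity after 24 h, as reported in the text for 75 °C.
print(f"{first_order_half_life(0.60, 24.0):.1f} h")
```

If the inactivation curve in Fig. 4 is not a single exponential, this estimate does not apply and the half-life should be read directly from the measured time course.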

Figure 4

Effect of temperature on the stability of purified pTsbg. Graphic was created using GraphPad Prism 6 for Windows (GraphPad Software, San Diego, California, USA).

Determination of substrate specificity of pTsbg

Although the substrate specificity of the enzyme was studied using the eight chromogenic substrates described in the “Methods” section, pTsbg was only active towards ONPG and p-Nitrophenyl-β-d-fucopyranoside. Moreover, the enzyme was unable to hydrolyze lactose and no transgalactosylation was observed in the presence of this substrate, as was determined by HPLC after carrying out the reaction with 40% lactose at 70 °C and using a mix of galactose, glucose, lactose, raffinose and stachyose as standard (data not shown).

The preference for β-linked galactosidic substrates such as ONPG or p-Nitrophenyl-β-d-fucopyranoside over lactose has been frequently described in the characterization of β-galactosidases99,100. Similar to our results with pTsbg, other studies have reported β-galactosidases with activity towards ONPG but unable to hydrolyze their natural substrate lactose in vitro such as YesZ β-galactosidase from Bacillus subtilis101 or the β-Gal II from Bifidobacterium adolescentis DSM 20083102. The lack of β-galactosidase activity towards lactose reduces considerably the biotechnological potential of pTsbg, as it could not be applied to produce GOS from lactose and to generate lactose-free dairy products. Nevertheless, more studies focused on the fucosidase activity should be conducted, since pTsbg showed high activity with p-Nitrophenyl-β-d-fucopyranoside and may harbor fucosyltransferase activity that could be used for the synthesis of fucosylated oligosaccharides (FUCOS) with biological interest103 such as those from human milk.


Conclusions

The taxonomical analysis of the As Burgas hot spring metagenome reveals a microbial community dominated by Bacteria in which Proteobacteria (68.25 ± 3.59%) and Aquificae (11.24 ± 1.15%) are the most abundant phyla. The prevalence of the genera Thermus (15.77%) and Hydrogenobacter (8.56%) and the relation between their metabolisms suggest an association between these two genera.

Moreover, the high relative abundance of sequences involved in the Calvin–Benson cycle and sequences annotated as key for the reductive TCA cycle unveils the dominance of an autotrophic population. Important pathways from the nitrogen and sulfur cycle such as DNRA, nitrification, or sulfur oxidation are potentially taking place in As Burgas hot spring, as was determined by the functional annotation of the metagenomic reads and in accordance with the microbial composition of the ecosystem.

After assembling the metagenomic reads, two complete ORFs annotated as β-galactosidases were found. Both of them showed 100% identity with T. scotoductus SA-01 sequences and were cloned and overexpressed in E. coli. The enzyme Tsbg lacked β-galactosidase activity using ONPG and lactose as substrates. In contrast, pTsbg showed β-galactosidase activity towards ONPG but was not able to hydrolyze lactose; it showed β-fucosidase activity on the substrate p-Nitrophenyl-β-d-fucopyranoside, which suggests an a priori unexpected biotechnological application. Once more, this result reveals that the presence of a gene in a metagenome does not mean that it is active in the way predicted from the sequence, and highlights the importance of combining both functional and sequence metagenomics to find novel enzymes from metagenomes.

Our culture-independent study has provided an insight into the diversity of the microorganisms that inhabit As Burgas thermal environment, in an attempt to find novel β-galactosidases. Future research should be directed to characterize new environments, which will lead to better understanding of their ecological differences, and to find new enzymes of interest.



Methods

Thermal water, with a temperature of 66.3 °C and pH 7.5620, was collected from the As Burgas hot spring (GPS 42.334626, -7.865332), in Ourense (Galicia, Spain) in December 2015. Briefly, two samples (BW1 and BW2) of 50 L of water each were collected into thermal water bottles, which were prewashed with 70% ethanol. The water samples were stored at room temperature until the next day, when they were filtered through 0.2 µm nitrocellulose filters (25 L per filter and two filters per sample). Filters were preserved at −20 °C until metagenomic DNA extraction.

DNA extraction and shotgun sequencing

Total DNA was isolated from the filters using the Metagenomic DNA Isolation Kit for Water (Epicentre Biotechnologies; Madison, Wisconsin, United States), according to the manufacturer’s protocol. Metagenomic DNA of both replicates was quantified using Qubit dsDNA HS Assay kit (Invitrogen; Waltham, Massachusetts, USA) and prepared for Next Generation Sequencing using the Accel-NGS 2S Plus DNA Library Kit (Swift Biosciences; Ann Arbor, Michigan, USA). The amplified libraries were checked with a Bioanalyzer 2100 (Agilent Technologies; Santa Clara, California, USA) and concentrations were quantified by Qubit dsDNA HS Assay kit (Invitrogen). Paired-end sequencing of the metagenomic DNA libraries was performed with 2 × 300 bp using the MiSeq sequencer (Illumina, San Diego, California, USA) at San Diego State University.

Taxonomic and functional assignment of metagenomic sequences

Illumina reads were treated with PRINSEQ software for quality control, removing all artificial duplicate reads and reads shorter than 60 base pairs104. High-quality unassembled reads of both replicates were uploaded to the Metagenomics Rapid Annotation using the Subsystem Technology (MG-RAST) v4.0.3 server105 and are available under the accession numbers mgm4709017.3 (BW1) and mgm4709018.3 (BW2). MG-RAST is an automated annotation pipeline in which taxonomic assignment is done with BLAT comparisons106 to the NCBI, and gene functional potential with BLAT comparisons to the SEED protein database105. Sequence annotations were performed using the following parameters: cut-off e-value of 10^−5, minimum 60% identity, and > 15 bp alignment length, as we have done previously107.
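
The annotation cut-offs listed above can be expressed as a simple filter. This is only a sketch of the thresholding logic: the `Hit` record and its field names are hypothetical and do not reflect MG-RAST's actual output format or API, and the strict "greater than 15" boundary follows the wording of this paragraph.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical similarity-search hit; not MG-RAST's real record format."""
    read_id: str
    annotation: str
    evalue: float
    identity: float  # percent identity
    aln_len: int     # alignment length


def passes_cutoffs(hit, max_evalue=1e-5, min_identity=60.0, min_aln_len=15):
    """Apply the cut-offs quoted in the text: e-value <= 1e-5,
    identity >= 60%, alignment length > 15."""
    return (
        hit.evalue <= max_evalue
        and hit.identity >= min_identity
        and hit.aln_len > min_aln_len
    )


# Made-up hits for illustration.
hits = [
    Hit("read_1", "beta-galactosidase", 1e-10, 85.0, 40),
    Hit("read_2", "beta-galactosidase", 1e-3, 90.0, 40),  # fails the e-value cut-off
    Hit("read_3", "nitrite reductase", 1e-8, 55.0, 40),   # fails the identity cut-off
]
kept = [h.read_id for h in hits if passes_cutoffs(h)]
print(kept)
```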

To reduce the differences related to library size, relative abundance was calculated as the percentage of reads assigned to a taxon or gene function in proportion to the total number of annotated reads.
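As an illustration of this normalization, the following sketch converts raw read counts into percentages of the total annotated reads. The function name and the example counts are hypothetical and are not data from this study.

```python
def relative_abundance(counts):
    """Convert per-category read counts into percentages of all annotated reads.

    counts: dict mapping a taxon or gene function to its number of assigned reads.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no annotated reads to normalize against")
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical counts for one sample (illustrative only).
example_counts = {"taxon_A": 50, "taxon_B": 30, "taxon_C": 20}
example_percentages = relative_abundance(example_counts)
```

Because each sample is divided by its own total, the percentages of two libraries of different sizes can be compared directly.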

Sequence assembly and screening for sequences annotated as β-galactosidase

Paired-end unassembled high-quality reads were merged using PEAR108 and assembled with the SPAdes pipeline109. Then, assembled reads were uploaded to MG-RAST for functional annotation with the SEED subsystem database (maximum e-value of 10⁻⁵, minimum identity of 60%, and minimum alignment length of 15). The contigs that contained β-galactosidase sequences were downloaded and analyzed for all possible open reading frames (ORFs) using NCBI ORF finder110. The ORFs and the deduced amino acid sequences were compared with other known sequences using nucleotide–nucleotide and protein–protein basic local alignment search tool (BLASTN and BLASTP) searches111. The Pfam 32.0 web server, based on the Pfam family database86, was used to infer the conserved domains within the amino acid sequences.
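The idea behind scanning a contig for all possible ORFs can be sketched as a six-frame search. This is a simplified illustration and not the NCBI ORF finder algorithm: it assumes ATG as the only start codon, the standard stop codons, and an arbitrary minimum length, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_length=90):
    """Scan both strands in all three frames for ATG...stop ORFs.

    Returns (strand, frame, start, end, orf_sequence) tuples. Coordinates are
    0-based, end-exclusive, and refer to the strand that was scanned.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first in-frame start codon
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_length:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None                   # look for the next ORF
    return orfs


# Toy sequence (illustrative only), scanned with a short minimum length.
toy_orfs = find_orfs("CCATGAAATTTGGGTAACC", min_length=15)
```

Each reported ORF would then be translated and compared against known sequences, as described above for BLASTN and BLASTP.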

Cloning, expression, and purification of T. scotoductus β-galactosidases

Thermus scotoductus β-galactosidase and putative β-galactosidase ORFs were amplified directly from the metagenomic DNA with the primers listed in Table 4 and both were cloned into the pDONR211 vector using the Invitrogen Gateway Technology (Invitrogen). From the Gateway vector, each gene was shuttled into the His-tagged expression vector pDEST-527, using the Gateway LR recombination reaction (Invitrogen). The constructs were transformed into and expressed in T7 Express (C2566) E. coli (NEB). Induction was done with 0.4 mM IPTG for 2 h at 37 °C. Cells were collected by centrifugation (5000 rpm for 15 min at 4 °C) and resuspended in 20 mM sodium phosphate buffer with 500 mM NaCl (pH 7.2) containing Complete Mini Protease Inhibitor Cocktail (Roche; Basel, Switzerland), following the manufacturer's instructions. Cell disruption was done by sonication on ice using a Vibra Cell sonicator (100 W, 5 min, 2 s ON/8 s OFF) (Sonics & Materials; Newtown, Connecticut, USA). The resulting crude extract was heated at 70 °C for 10 min to denature E. coli proteins, as suggested by Pessela et al.112. Then, the clear lysate obtained after centrifugation (14,000 rpm for 20 min) was passed through a HisTrap HP column (GE Healthcare; Chicago, Illinois, USA), following the manufacturer's protocol and using an ÄKTA chromatography system (GE Healthcare). Briefly, the column was equilibrated with 20 mM sodium phosphate buffer containing 500 mM NaCl and 20 mM imidazole (pH 7.2), and the bound His-tagged fusion protein was eluted with 20 mM sodium phosphate buffer containing 500 mM NaCl and 500 mM imidazole (pH 7.2). The selected fractions were concentrated and dialyzed using an Amicon Ultra-15 30,000 MWCO column (Millipore; Burlington, Massachusetts, USA). Purified protein concentration was quantified according to the Bio-Rad Protein Assay (Bio-Rad; Hercules, California, USA), employing bovine serum albumin as a standard.
Protein samples from the different stages of the purification were run in a 10% SDS-PAGE gel for molecular weight determination. NZYcolour Protein Marker II (Nzytech; Lisbon, Portugal) was used as the molecular weight standard and proteins were detected by staining with Coomassie Brilliant Blue.

Table 4 Primers used for the amplification of T. scotoductus β-galactosidase and putative β-galactosidase ORFs.

Determination of β-galactosidase activity

Enzymatic activity was measured using ortho-Nitrophenyl-β-d-galactopyranoside (ONPG). Purified protein preparations were diluted in 150 µL Z buffer (100 mM Na2HPO4, 40 mM NaH2PO4, 10 mM KCl, 1.6 mM MgSO4, pH 7). After incubation for 5 min at 85 °C, the reaction was started by adding 150 µL of a solution of 4 mg mL−1 ONPG in Z buffer to the enzyme preparation. Aliquots (100 µL) of the reaction mixture were withdrawn and the reaction was stopped by adding 100 µL of 1 M Na2CO3. Released o-nitrophenol was measured by absorbance at 420 nm. β-galactosidase activity is expressed in enzymatic units (U), where one unit is defined as the amount of enzyme capable of releasing 1 µmol of the product (o-nitrophenol) per min under the experimental conditions; activities are reported per volume (µmol min−1 mL−1). All measurements were performed in triplicate.
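The conversion from absorbance to volumetric activity can be sketched with the Beer–Lambert law. This is a sketch under stated assumptions: the function name is hypothetical, the absorbance is taken to be blank-corrected and within the linear range, and the extinction coefficient of o-nitrophenol must be supplied by the user, since no value is given in the text. The default volumes follow the assay described above (100 µL aliquot plus 100 µL Na2CO3).

```python
def onpg_activity_u_per_ml(a420, minutes, epsilon_mM_cm,
                           path_cm=1.0, aliquot_ml=0.1, stop_ml=0.1):
    """Estimate activity in U per mL of reaction mixture (µmol min^-1 mL^-1).

    a420          : blank-corrected absorbance of the stopped sample at 420 nm
    minutes       : reaction time before the aliquot was stopped
    epsilon_mM_cm : extinction coefficient of o-nitrophenol (mM^-1 cm^-1);
                    an assumption to be supplied by the user
    """
    if minutes <= 0 or epsilon_mM_cm <= 0:
        raise ValueError("minutes and epsilon_mM_cm must be positive")
    # Beer-Lambert: concentration (mM, i.e. µmol/mL) in the stopped sample.
    onp_mM = a420 / (epsilon_mM_cm * path_cm)
    # Amount of product contained in the stopped sample (aliquot + stop solution).
    onp_umol = onp_mM * (aliquot_ml + stop_ml)
    # Rate per minute, referred back to the volume of reaction mixture sampled.
    return onp_umol / minutes / aliquot_ml


# Illustrative numbers only; the extinction coefficient here is a placeholder.
example_activity = onpg_activity_u_per_ml(a420=0.9, minutes=2.0, epsilon_mM_cm=4.5)
```

To express the result per mL of the original enzyme preparation, the value would additionally be multiplied by the dilution factor of the preparation in the reaction mixture.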

Effect of pH and temperature on activity and stability of recombinant pTsbg

To estimate the effect of pH on enzyme activity, the relative activities against ONPG (4 mg mL−1) were measured in the range of pH 5.0–8.5 using 20 mM Britton–Robinson buffer113. The influence of temperature was determined by measuring relative enzyme activities at 55–90 °C with ONPG (4 mg mL−1) in Z buffer. The thermal stability of the protein was assessed by pre-incubating the enzyme in Z buffer at temperatures between 55 and 85 °C for different times, followed by an activity assay against ONPG at 85 °C.
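A minimal sketch of how relative and residual activities could be computed is given below. The normalization is an assumption, since the text does not state it: relative activity is scaled to the highest value in the series, and residual activity is scaled to an untreated control. Function names and values are hypothetical.

```python
def relative_activity(activities):
    """Scale a series of activities so that the highest value equals 100%.

    activities: dict mapping a condition (e.g. pH or temperature) to activity.
    """
    peak = max(activities.values())
    if peak <= 0:
        raise ValueError("at least one activity must be positive")
    return {cond: 100.0 * a / peak for cond, a in activities.items()}


def residual_activity(after_preincubation, untreated_control):
    """Percentage of activity remaining after pre-incubation (assumed control)."""
    if untreated_control <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * after_preincubation / untreated_control


# Illustrative values only, not measurements from this study.
ph_profile = relative_activity({5.0: 2.0, 6.5: 8.0, 8.0: 4.0})
remaining = residual_activity(after_preincubation=3.0, untreated_control=12.0)
```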

Determination of substrate specificity and GOS production

The substrate specificity of the purified pTsbg was determined at 85 °C using 4 mg mL−1 solutions of the following chromogenic substrates in Z buffer (pH 7): ONPG, p-Nitrophenyl-β-d-fucopyranoside, p-Nitrophenyl-β-d-mannoside, p-Nitrophenyl-α-d-mannoside, p-Nitrophenyl-β-d-glucoside, p-Nitrophenyl-α-d-glucoside, p-Nitrophenyl-β-d-xyloside, and p-Nitrophenyl-α-d-xyloside.

GOS and lactose concentrations were determined by HPLC (HPLC Waters Breeze I), using a Waters Sugar-Pak column eluted at 90 °C with 0.1 M EDTA disodium salt in Milli-Q water at a flow rate of 0.5 mL min−1, and a Waters 2414 refractive-index detector. Purified protein was incubated at 70 °C and 650 rpm in 0.1 M phosphate buffer (pH 6.8) supplemented with 40% lactose. Samples were taken at 0, 0.5, 1, 2, 4, 6, and 24 h, immediately transferred to 99 °C for 5 min to inactivate the enzyme, and stored at −20 °C for subsequent analysis. Carbohydrates were quantified by external calibration, using standard solutions of galactose, glucose, lactose, raffinose, and stachyose.
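External calibration can be sketched as a least-squares line fitted to the detector response of the standards, which is then inverted for unknown samples. The sketch assumes a linear response over the calibrated range; the function names and numbers are hypothetical and do not come from this study.

```python
def fit_calibration(concentrations, peak_areas):
    """Ordinary least-squares fit of peak area = slope * concentration + intercept."""
    n = len(concentrations)
    if n < 2 or n != len(peak_areas):
        raise ValueError("need at least two paired standards")
    mean_x = sum(concentrations) / n
    mean_y = sum(peak_areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, peak_areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_area(peak_area, slope, intercept):
    """Invert the calibration line to estimate the concentration of a sample."""
    return (peak_area - intercept) / slope


# Illustrative standards for a single sugar (arbitrary units).
slope, intercept = fit_calibration([1.0, 2.0, 4.0, 8.0], [10.0, 20.0, 40.0, 80.0])
unknown = concentration_from_area(50.0, slope, intercept)
```

A separate calibration line would be fitted for each standard sugar, and samples falling outside the calibrated range would need dilution before quantification.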