Introduction

β-galactosidase (EC 3.2.1.23), commonly known as lactase, is one of the most important enzymes used in the food processing industry1,2. This enzyme catalyzes the hydrolysis of β-galactosides from polymers, oligosaccharides or secondary metabolites by breaking the β-D-galactosidic linkages3. In addition to the hydrolytic activity, some β-galactosidases also possess transgalactosylation activity, in which galactose is transferred to another carbohydrate instead of water4. Thus, this enzyme has two main applications: the removal of lactose from milk products and the production of galactosylated products such as galacto-oligosaccharides (GOS). Lactose-hydrolyzed milk products can meet the needs of lactose-intolerant people, while GOS is an important human prebiotic5.

β-galactosidases are found in a variety of sources including bacteria, fungi, and plants1,6. They are categorized into the glycoside hydrolase (GH) families GH1, GH2, GH35, GH42, GH59, and GH147 based on their sequence similarities2. β-galactosidases from these GH families belong to the superfamily Clan-A and share the (α/β)8 barrel structure7. The well-known Escherichia coli β-galactosidase LacZ belongs to the GH2 family and has been structurally elucidated8,9. The functionally active form of E. coli LacZ is a homotetramer in which each monomer comprises five structural domains, and the third (central) domain (residues 334–627) is an (α/β)8 barrel with an extended active-site cleft. β-galactosidases from different sources differ in their optimum pH and temperature, thermal stability, substrate specificity, and metal ion cofactor sensitivity, providing a diversified selection for application in food processing1,2,10,11,12. Therefore, identification and characterization of new β-galactosidases from natural resources is beneficial for establishing glycosidase libraries and offers a wide variety of candidate glycosidases for application in the food industry. Photosynthetic microorganisms (microalgae and cyanobacteria) are promising resources for mining intra- and extracellular β-galactosidases13,14,15. However, reports on the biochemical characterization of new β-galactosidases from microalgae or cyanobacteria are still relatively few.

Nostoc flagelliforme is a soil surface-dwelling cyanobacterium inhabiting the xeric steppes of western China16. It exhibits a predominantly filamentous (hair-like or cylindrical) colony shape. In previous studies, an acidic water stress protein, WspA, was identified as a novel β-galactosidase in N. flagelliforme and its close relative Nostoc commune17,18. WspA was synthesized in cells under ultraviolet irradiation or desiccation stress and secreted into the extracellular polysaccharide matrix upon rehydration19. It was recently reported that wspA sequences showed high polymorphism in N. flagelliforme colonies20. In the sequenced N. flagelliforme CCNUN1 (NCBI BioProject, PRJNA407846), there are two adjacent wspA genes (COO91_01770 and COO91_01773)21. In the sequenced N. commune HK-02 and Nostoc sphaeroides CCNUC1, there are one (NIES4070_53480) and three (GXM_06477, GXM_06476, and GXM_06474) wspA genes, respectively. The wspA1 gene was first reported in N. commune19 and was also amplified by PCR from N. flagelliforme colonies18. Since the recombinant full-length WspA1 was always expressed as inclusion bodies in the E. coli protein expression system, we generated two truncated proteins of WspA1 (WspA and WspB) for biochemical characterization18. The effects of temperature, pH, and metal ions on the activities of WspA and WspB, as well as their Michaelis constant Km, were characterized in our previous study. The enzymatic activity of WspA was stronger than that of WspB. However, some biochemical features of WspA1 remain uncharacterized, such as the active form (monomer or multimer), enzymatic inhibitors, and the active center. In addition, the potential homologs of the well-known β-galactosidase LacZ in N. flagelliforme have not been characterized so far. In the present study, we identified a LacZ in N. flagelliforme (hereinafter Nf-LacZ) and conducted a comparative biochemical analysis of Nf-LacZ and WspA1. Further, we focused on WspA1 to explore its central active region by using a protein truncation test. In addition, we investigated the possible role of the specific N-terminus of WspA1.

Materials and methods

Cloning of β-galactosidase genes in N. flagelliforme

The potential β-galactosidase LacZ in N. flagelliforme was identified by a local BLAST search (BioEdit software) against the proteome FASTA file of N. flagelliforme CCNUN1, using the well-known E. coli (strain K12) LacZ (JW0335, KEGG)9 as the query. The resulting LacZ homolog is AUB41471 (NCBI), which is encoded by the gene COO91_07519 (KEGG). The Nf-LacZ sequence was amplified by PCR from genomic DNA of the N. flagelliforme culture in our laboratory. The NCBI accession no. for WspA1 is ABA54841 and its complete CDS can be retrieved from NCBI accession no. DQ155425. Various truncated sequences of wspA1 were amplified by PCR from our previously constructed plasmid pMD18-T::wspA122. The PCR primers used in this part (primer no. 1–9) are summarized in Table 1. PCR products were digested with the restriction endonucleases Nde I and BamH I and cloned into the plasmid pET28a between the same restriction sites. All the constructs were verified by Sanger sequencing.

Table 1 The PCR primers used in this study.

In vitro expression and purification

The E. coli BL21(DE3)/pET28a protein expression system (Novagen, USA) was used to express target proteins. The above constructs were transformed into the E. coli strain to produce target proteins with His-tags at the N-terminus. The transformed E. coli strains were grown in 200 ml LB medium (containing 50 μg/ml kanamycin) at 37 °C and 220 rpm until the optical density at 600 nm (OD600) reached 0.5–0.6, and then the cultures were induced for 6 h with 0.2 mM isopropyl β-D-thiogalactoside (IPTG). After centrifugation, the cell pellets were disrupted with a low-temperature high-pressure crusher. The crude proteins were loaded on a Ni His•Bind resin gravity column (Novagen, USA). The column was washed with the buffer (20 mM Tris–HCl, 500 mM NaCl, 80 mM imidazole, 5% glycerol, pH 8.0) to remove unwanted proteins and then the target protein was eluted with the buffer (20 mM Tris–HCl, 500 mM NaCl, 1000 mM imidazole, 5% glycerol, pH 8.0). Protein profiling or separation was examined using 12% sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE)23. If necessary, eluted proteins were further purified by gel filtration with a fast protein liquid chromatography (FPLC) system (AKTA purifier, GE Healthcare, Sweden), which was equipped with an anion exchange column HiTrap Q FF (16 × 25 mm, GE Healthcare)24. Protein concentration was determined with the Bradford assay25.

Protein polymerization assay

Gel filtration with the FPLC system can also be employed to analyze the polymerization state of native or active proteins26. Protein samples were injected into the column, which was equilibrated with the buffer (20 mM Tris–HCl, 150 mM NaCl, 10% glycerol, pH 7.5), for separation. Five molecular weight markers were used: beta-amylase (200 kDa), alcohol dehydrogenase (150 kDa), albumin (66 kDa), carbonic anhydrase (29 kDa), and cytochrome c (12.4 kDa) (GE Healthcare, China agency). Protein separation was monitored by measuring the absorbance at 280 nm. To analyze the effects of acidic or more alkaline conditions on the protein polymerization state, the above-mentioned buffer was adjusted to pH 5.5 and 8.5, respectively.

Galactosidase activity assay

Galactosyl hydrolytic and transgalactosylation activities of the target proteins were assayed as previously described18 with slight modifications. Galactosyl hydrolytic activity was assayed in 1 ml of 0.1 M phosphate-buffered saline (PBS) solution (pH 7.5) with 3 mM o-nitrophenyl-β-D-galactopyranoside (ONPG) as the substrate. The final protein concentration used for the reaction was 10 µg/ml. The reactions were conducted at 37 °C and stopped by supplementation with 100 μl of 1 M Na2CO3 solution. The absorbance of the reaction product o-nitrophenol (ONP) was measured at 405 nm.

Transgalactosylation reactions were performed at 37 °C for 3 h by incubation of 20 μg/ml enzyme (final concentration) with 20 μl of the acceptor glucose (500 mM) and 60 μl of ONPG (50 mM) in 100 μl of 0.1 M PBS buffer (pH 7.5). Products of the transgalactosylation reaction were examined by thin-layer chromatography (TLC)27. A 2 μl aliquot of each reaction solution was spotted onto the silica gel plate for TLC analysis with methanol:chloroform (40:60) as the mobile phase. After the chromatography, the plate was air-dried at room temperature. The chromatogram was visualized by spraying 20% H2SO4 on the silica gel plate and heating at 115 °C for 15 min.
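
For orientation, the final acceptor and donor concentrations implied by the volumes above can be worked out with the usual dilution relation C1V1 = C2V2. The following is only a minimal sketch: it assumes that the 100 μl stated above is the total reaction volume, which the protocol does not state explicitly, and the function name is illustrative.

```python
def final_concentration_mm(stock_mm, stock_volume_ul, total_volume_ul):
    """Dilution relation C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock_mm * stock_volume_ul / total_volume_ul


# Assumption: 100 ul is the total reaction volume (not stated explicitly in the protocol).
TOTAL_VOLUME_UL = 100.0

# Stock concentrations and pipetted volumes are taken from the protocol text.
glucose_final_mm = final_concentration_mm(500.0, 20.0, TOTAL_VOLUME_UL)
onpg_final_mm = final_concentration_mm(50.0, 60.0, TOTAL_VOLUME_UL)

# Molar ratio of acceptor (glucose) to galactosyl donor (ONPG) in the reaction.
acceptor_to_donor = glucose_final_mm / onpg_final_mm

print(f"glucose: {glucose_final_mm:.0f} mM, ONPG: {onpg_final_mm:.0f} mM, "
      f"acceptor:donor = {acceptor_to_donor:.2f}")
```

Under this assumption the acceptor is in roughly threefold molar excess over the donor; a different total volume would scale both concentrations equally and leave the ratio unchanged.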

Enzymatic inhibitor assay

Enzymatic inhibition reaction was conducted in the above-mentioned galactosyl hydrolytic solution by supplementing various concentrations of glycosidase inhibitors. Four inhibitors were used: 4-methylumbelliferyl-beta-D-glucopyranoside (4-MU-Glu), conduritol B epoxide (CBE), acarbose, and galactostatin bisulfite (GBS) (ALFA Chemistry, USA). The reaction was conducted at 37 °C. The protein concentration was 10 µg/ml. Similarly, the absorbance at 405 nm of the reaction product was measured. For evaluating the inhibitory effects of these inhibitors, the concentration range of 0 ~ 200 µM and the extended reaction time of 9 h were considered in the initial test.

The half-maximal inhibitory concentration (IC50) represents the concentration of a substance (e.g. a drug) that is required for 50% inhibition in a specific biological or biochemical function28. The IC50 values of the galactosidases in response to GBS inhibition were assayed as previously described29 with slight modification. For IC50 detection, the reaction was conducted at 37 °C for 3 h, with the concentrations of GBS ranging from 0 to 50 µM.
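
To make the IC50 definition concrete, the sketch below estimates the concentration at which residual activity crosses 50% by log-linear interpolation between the two bracketing inhibitor concentrations. This is an illustration only and not necessarily the procedure of ref. 29: the concentration series is hypothetical, and the activities are synthetic values generated from a simple one-site inhibition model (Hill slope of 1, an assumption) using the IC50 of 0.59 µM reported in the Results for Nf-LacZ.

```python
import math


def residual_activity(inhibitor_um, ic50_um, hill=1.0):
    """Fraction of uninhibited activity under a simple Hill-type inhibition model."""
    if inhibitor_um == 0:
        return 1.0
    return 1.0 / (1.0 + (inhibitor_um / ic50_um) ** hill)


def estimate_ic50(concs_um, activities):
    """Interpolate (on a log10 concentration scale) where activity crosses 0.5."""
    points = sorted(zip(concs_um, activities))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if c_lo > 0 and a_lo >= 0.5 >= a_hi and a_lo != a_hi:
            frac = (a_lo - 0.5) / (a_lo - a_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("activity does not cross 50% within the tested range")


# Hypothetical GBS concentrations (uM) within the 0-50 uM range used for IC50 detection.
gbs_um = [0.01, 0.1, 0.5, 1.0, 5.0, 10.0, 50.0]

# Synthetic, noise-free activities; NOT measured data from this study.
synthetic_activity = [residual_activity(c, ic50_um=0.59) for c in gbs_um]

ic50_estimate = estimate_ic50(gbs_um, synthetic_activity)
print(f"estimated IC50 = {ic50_estimate:.2f} uM")
```

With real, noisy measurements, a nonlinear fit of the full dose-response curve is generally preferable to two-point interpolation; the interpolation is used here only because it needs no external libraries.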

Expression of GFP-fused proteins in Nostoc sp. PCC 7120

Three nucleotide sequences, wspA1, the N-terminal sequence of wspA1 (wspN), and the truncated sequence of wspA1 without wspN (wspB), were amplified by PCR from the plasmid pMD18-T::wspA1 with the primers (primer no. 10–13) shown in Table 1. To generate green fluorescent protein (GFP) gene-fused constructs, the plasmid pRL25C-GFP30 was modified by introducing a petE promoter31 and two adjacent restriction sites, Sma I and Xho I, and then the PCR products were inserted into the modified plasmid. Plasmid transformation and transformant selection were performed as previously described32. GFP fluorescence signals in the transgenic cells were observed by confocal laser-scanning microscopy (Leica, Germany). GFP was excited at 488 nm by an argon-ion laser.

Western blotting analysis

The above transgenic cells in the exponential phase (OD750 of 0.4–0.6) were collected by centrifugation at 6,000 rpm for 5 min. The pelleted cells were subjected to protein extraction and the crude protein extracts were separated on 12% SDS-PAGE gels for western blotting as previously described22. Anti-WspA1 rabbit antiserum was used for the blotting. In addition, the remaining supernatants were filtered through double filter papers to remove residual cells. The initial chlorophyll fluorescence (F0) of possibly residual cells in the filtered solutions was detected by a plant efficiency analyzer (Hansatech Instruments Ltd., UK)33. An F0 value of zero confirmed the absence of cell contamination. The solutions were freeze-dried and the pellets (containing the released proteins) were subjected to western blotting as described above.

Phylogenetic analysis

The Nf-LacZ sequence was used to query the KEGG database with BLAST, and the resulting top 50 hits were retrieved from the database. The LacZ from E. coli MS 85–1 (b0344, KEGG) was used as an outgroup. The amino acid sequences were aligned using mafft34. The resulting sequence alignments were adjusted using trimAl35 by removing spurious sequences. The maximum-likelihood phylogenetic tree was inferred using IQ-Tree 2.1.236 with the LG + F + R5 model and 1,000 bootstraps.

Results

Identification of the β-galactosidase LacZ in N. flagelliforme

The potential homologs of the β-galactosidase LacZ have not yet been identified in N. flagelliforme. In this study, a putative Nf-LacZ (COO91_07519) was identified as described in the methods. Nf-LacZ consists of 619 amino acid residues with a calculated molecular weight of 70.8 kDa. The Pfam domain analysis showed that Nf-LacZ possesses the TIM barrel domain and the sugar-binding domain of GH family 2. Phylogenetic analysis suggested that LacZ homologs from cyanobacterial species form a distinct clade (Fig. 1).

Figure 1

Phylogenetic analysis of Nf-LacZ (COO91_07519) with its top 50 similar homologs from the KEGG database. The nodes with bootstrap values higher than 70% are highlighted with blue dots. The LacZ homologs from Cyanobacteria (green) form a distinct clade. The homologs from Betaproteobacteria are highlighted in blue, those from Deltaproteobacteria in orange, those from Acidobacteria in red, and those from the Deinococcus-Thermus group in purple. The species names for these proteins are listed in Supplemental Table S1.

The recombinant Nf-LacZ was expressed by employing the E. coli expression system. As shown in the SDS-PAGE gel, Nf-LacZ was effectively induced and then separated (Fig. 2A). Gel filtration with FPLC is often used to analyze or purify mixtures of proteins according to size and charge37. Based on the molecular weight markers in this FPLC analysis (Fig. 2B), native Nf-LacZ should be a multimeric protein (at least a trimer). The effects of pH and temperature on the enzymatic activity were also assayed using ONPG as a substrate. The optimum temperature and pH for Nf-LacZ are 40 °C and pH 6.5, respectively (Supplemental Fig. S1). Metal ions may also affect the activity of the β-galactosidase. The metal ions K+, Mg2+, Ca2+, Zn2+, and Mn2+ were all found to enhance the enzymatic activity of Nf-LacZ (Supplemental Fig. S1). In contrast, the optimum temperature and pH for WspA are 45 °C and pH 8.0, respectively, and Ca2+ and Zn2+ are inhibitory for the activity of WspA18. Further, the kinetic parameters Km and Vmax of Nf-LacZ were determined with ONPG as the substrate (Fig. 2C). The Km value was 0.5 mmol/liter for Nf-LacZ, which is close to that of WspA18. Thus, Nf-LacZ has an affinity for ONPG similar to that of WspA under the tested conditions.
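
As an illustration of how Km and Vmax can be estimated from initial-rate data of the kind summarized in Fig. 2C, the following minimal sketch applies a Lineweaver–Burk (double-reciprocal) linear fit. The fitting procedure actually used in this study is not stated, so the method is an assumption; the ONPG concentrations are hypothetical, Vmax is set to an arbitrary 1.0, and the rates are synthetic values generated from the Michaelis–Menten equation with Km = 0.5 mM.

```python
def michaelis_menten(substrate_mm, vmax, km_mm):
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate_mm / (km_mm + substrate_mm)


def lineweaver_burk_fit(substrate_mm, rates):
    """Least-squares line through (1/[S], 1/v): slope = Km/Vmax, intercept = 1/Vmax."""
    xs = [1.0 / s for s in substrate_mm]
    ys = [1.0 / v for v in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


# Hypothetical ONPG concentrations (mM); synthetic, noise-free rates in arbitrary units.
onpg_mm = [0.1, 0.25, 0.5, 1.0, 2.0, 3.0]
rates = [michaelis_menten(s, vmax=1.0, km_mm=0.5) for s in onpg_mm]

km_est, vmax_est = lineweaver_burk_fit(onpg_mm, rates)
print(f"Km = {km_est:.3f} mM, Vmax = {vmax_est:.3f} (arbitrary units)")
```

The double-reciprocal transform is chosen here only because it reduces the fit to a straight line; it weights low-substrate points heavily, so a direct nonlinear fit is usually more robust for noisy measurements.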

Figure 2

In vitro expression and enzymatic analysis of Nf-LacZ protein. (A) In vitro expression of Nf-LacZ by the E. coli BL21/pET28a protein expression system. M marker protein, P total proteins before IPTG induction, IP total proteins after IPTG induction, W1 and W2 washed fractions, E eluted fraction. Blue arrow points to the Nf-LacZ protein. (B) The protein polymerization state analyzed by FPLC. Molecular weight markers: beta-amylase, 200 kDa; alcohol dehydrogenase, 150 kDa; albumin, 66 kDa; carbonic anhydrase, 29 kDa; cytochrome c, 12.4 kDa. (C) Michaelis–Menten kinetic analysis. Vmax and Km values were calculated. ONPG serves as the substrate. Reaction was conducted at 45 °C and pH 8.0.

Analysis of the polymerization of native WspA

WspA consists of 234 amino acid residues with a calculated molecular weight of 24.0 kDa. The polymerization state of native WspA was not explored in our previous study. As shown in the SDS-PAGE gel, the recombinant WspA expressed by the E. coli expression system was separated (Fig. 3A). Subsequently, the polymerization state of native WspA was analyzed by FPLC (Fig. 3B). The FPLC fraction of WspA eluted between the 29 and 66 kDa markers, implying that native WspA is not a monomer but a dimer. Our previous study showed that WspA had a narrow optimal pH range; at pH 5.5, the activity of WspA was reduced to nearly zero, while at pH 8.5 the activity decreased by more than 60% compared to the maximum activity at pH 8.018. However, FPLC analysis showed that the WspA dimer did not dissociate at either pH 5.5 (Fig. 3C) or pH 8.5 (Fig. 3D). Therefore, native WspA forms a stable dimer, although its activity can be affected by an unfavorable acid–base environment.
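
The reasoning from marker bracket to oligomeric state can be sketched as a simple check of which multiples of the calculated monomer mass fall inside the observed mass window. This is a minimal illustration, assuming that the apparent mass from gel filtration scales with subunit number and ignoring the His-tag and any shape effects; the function name and the upper limit of 12 subunits are arbitrary choices.

```python
def consistent_oligomers(monomer_kda, lower_kda, upper_kda=None, max_subunits=12):
    """Return subunit counts n whose mass n * monomer lies in (lower, upper)."""
    states = []
    for n in range(1, max_subunits + 1):
        mass = n * monomer_kda
        if mass > lower_kda and (upper_kda is None or mass < upper_kda):
            states.append(n)
    return states


# WspA: 24.0 kDa monomer, eluting between the 29 and 66 kDa markers.
wspa_states = consistent_oligomers(24.0, lower_kda=29.0, upper_kda=66.0)

# Nf-LacZ: 70.8 kDa monomer, eluting above the 200 kDa marker (no upper bracket).
nf_lacz_states = consistent_oligomers(70.8, lower_kda=200.0)

print("WspA subunit counts:", wspa_states)
print("Nf-LacZ minimum subunit count:", nf_lacz_states[0])
```

Only the dimer (48 kDa) fits the WspA window, whereas for Nf-LacZ the check yields a lower bound of three subunits (212.4 kDa) rather than a unique state, which matches the cautious "at least a trimer" reading.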

Figure 3

In vitro expression of WspA and analysis of the protein polymerization state. (A) In vitro expression of WspA by the E. coli BL21/pET28a protein expression system. M marker protein, E eluted fraction. Blue arrow points to the target protein. (B) The protein polymerization state of native WspA analyzed by FPLC at pH 7.5. (C) The effect of unfavorable acidic condition (pH 5.5) on the dimer of WspA. (D) The effect of unfavorable alkaline condition (pH 8.5) on the dimer of WspA.

Effects of the inhibitors on the activities of Nf-LacZ and WspA

The response of a glycosidase to various inhibitors is an important aspect of its characterization. The influences of glycosidase inhibitors on the activities of Nf-LacZ and WspA were investigated. In total, four inhibitors, 4-MU-Glu, CBE, acarbose, and GBS, were tested. Among them, 4-MU-Glu, CBE, and acarbose did not show obvious inhibitory effects on either enzyme. The inhibitory effects of GBS on the two enzymes were then compared (Fig. 4). The activities of both Nf-LacZ (Fig. 4A) and WspA (Fig. 4B) were markedly inhibited by 0.1 µM GBS. The IC50 value is widely used as an informative measure of an enzyme inhibitor’s efficacy28. The results showed that Nf-LacZ had an IC50 value of 0.59 µM (Fig. 4C), while WspA had an IC50 value of 1.18 µM (Fig. 4D). Thus, WspA was relatively less sensitive to the inhibitor GBS under the tested conditions.

Figure 4

The inhibitory effects of GBS on the activities of Nf-LacZ and WspA. (A,B), the galactosyl hydrolytic reactions of Nf-LacZ and WspA in presence of different concentrations of GBS, respectively. GBS concentration: 0 ~ 20 µM. The absorbance of the reaction product ONP at 405 nm (OD405) was detected. Data shown are means ± SD (n = 3). (C,D), determination of the IC50 values of GBS inhibition on the activities of Nf-LacZ and WspA, respectively. GBS concentration ranged from 0.01 µM to 50 µM in the tests.

Identification of the central activity region of WspA1

The recombinant WspA1 was always expressed as inclusion bodies in E. coli cells, and thus expression of its truncated proteins was one way to explore its biochemical functions18. To investigate the central activity region of WspA1, we designed four truncated WspA1 variants (WspC1, WspC2, WspC3, and WspC4) in this study (Fig. 5A), roughly according to its secondary structure predicted by PSIPRED38. The four truncated proteins were expressed in vitro and purified (Fig. 5B). Their catalytic features as a β-galactosidase were assayed using ONPG and 5-bromo-4-chloro-3-indolyl-β-D-galactoside (X-Gal) as the substrates. The biochemical analysis showed that WspC1, WspC2, and WspC3 had successively decreasing galactosyl hydrolytic activity, while WspC4 showed no activity (Fig. 5C,D). Further, the potential transgalactosylation activities of WspC1, WspC2, and WspC3 were assayed by TLC with glucose as the acceptor (Fig. 5E). The result showed that a disaccharide or oligosaccharide was produced under the catalysis of WspC1 and WspC2, while WspC3 had no such activity. Therefore, WspC2 represents the minimum active region of WspA1 with both hydrolytic and transgalactosylation activities identified so far.

Figure 5

Biochemical analysis of the truncated proteins of WspA1. (A) An illustration of the truncated protein variants. (B) Protein profiles of the in vitro expressed and purified target proteins in SDS-PAGE gels. M marker protein, E eluted fraction. No. 1–6, the purified fractions by FPLC. The red arrows indicate the target proteins. C1, C2, C3 and C4 represent WspC1, WspC2, WspC3, and WspC4, respectively. (C) Comparative analysis of hydrolytic activities in 0.1 M PBS buffer (pH 7.5) with ONPG as the substrate. Data shown are means ± SD (n = 3). Protein concentrations, 10 μg/ml. ONPG, 3 mM. Control, no addition of any protein in the reaction buffer. (D) In vitro activity analysis of the four proteins in X-Gal buffer (pH 7.5). Similar conditions were used as in (C). Reactions were performed for 6 h. (E) TLC analysis of the transgalactosylation activity of the truncated proteins. Glucose was used as the acceptor. Blue arrows indicate the generated products. The pink arrow indicates the solvent diffusion direction.

Analysis of the role of the N-terminus of WspA1 in secretion

As implied by our previous attempts, the N-terminus of WspA1 (WspN) was a potential cause of the formation of inclusion bodies in the E. coli expression system. We further speculated that WspN might have a potential role in facilitating the export of WspA1 from cells, since WspA can be secreted into the extracellular polysaccharide matrix upon rehydration19. WspA1, WspB (the WspA1 protein lacking WspN), and WspN were each fused with GFP, and their features regarding extracellular transport were examined in transgenic cells by confocal microscopy (Fig. 6A–C). In WspA1::GFP cells, sporadic fluorescent foci were observed in the periplasmic space (Fig. 6A), while no such fluorescent foci were observed in WspB::GFP cells (Fig. 6B). In WspN::GFP cells, fluorescent foci were scattered in cells and some of them seemed to be in the process of secretion (Fig. 6C). In addition, the crude proteins extracted from WspA1::GFP and WspB::GFP cells and their culture solutions were subjected to western blotting using anti-WspA1 antibody (Fig. 6D). WspA1 was detected in both its transgenic cells and the culture solution, while WspB was only detected in its transgenic cells. Together, these results imply that WspN has a potential role in facilitating the secretion of WspA1.

Figure 6

Examination of the potential export role of WspN in transgenic Nostoc sp. PCC 7120 cells. Confocal microscopy observation of the GFP-fused proteins, WspA1::GFP (A), WspB::GFP (B), and WspN::GFP (C) in transgenic cells. For panel (A,B), a 40 × objective lens was used; for panel (C), a 60 × objective lens was used. Red arrows point to the secreted fluorescent foci; Red circles indicate the fluorescent foci that are seemingly in the process of secretion. (D) Western blotting analysis of WspA1::GFP and WspB::GFP proteins in transgenic cells and culture solutions. An anti-WspA1 antibody was used. Blue arrows point to the target proteins. WT, wild-type cells.

Discussion

Microbial β-galactosidases hold particular importance due to their wide applications in the food industry. They are also important tools for glycosylation of vital molecules in the medicine and cosmetic industries7. Characterization of new β-galactosidases from natural resources can enrich glycosidase libraries. In this study, we comparatively characterized two β-galactosidases, Nf-LacZ and WspA1, from the terrestrial cyanobacterium N. flagelliforme, with more focus on the latter based on the previous research18. The LacZ homologs from some cyanobacteria form a distinct clade (Fig. 1). Biochemical analysis verified that Nf-LacZ functions as a β-galactosidase. However, Nf-LacZ shares only 25.2% sequence identity (query coverage, 75%; E-value, 4e−21) with E. coli LacZ (JW0335). The Km values for E. coli LacZ with ONPG as the substrate ranged from 0.12 to 0.82 mmol/liter under specific conditions39,40,41. The Km value (0.5 mmol/liter) of Nf-LacZ falls in that range. The active form of E. coli LacZ is a tetramer9. According to the gel filtration assay, the size of native Nf-LacZ was larger than 200 kDa, implying that it is at least a trimer. Its precise active form remains to be clarified.

When E. coli cells are employed as a host, expression and production of recombinant proteins are not always successful and sometimes lead to the formation of inclusion bodies42. This was the case for the full-length WspA1 protein, and thus the protein truncation strategy was adopted for in vitro expression. The truncated proteins of WspA1 without the N-terminus (Fig. 5A) could all be obtained in a soluble state. In most cases, we used WspA for biochemical characterization. As indicated by the gel filtration assay, native WspA1 should be a dimer, and pH alteration cannot dissociate the dimer. Cold-active β-galactosidases are an attractive group identified in low temperature-adapted microorganisms10. Two cold-active β-galactosidases from Paracoccus sp. 32d and Arthrobacter sp. 32cB are also dimers in their native form43,44. WspA1 has no significant sequence similarity to the two enzymes. WspA1 and its homologs are found in some colonial Nostoc species, including N. flagelliforme, N. commune, Nostoc sphaeroides and Nostoc verrucosum18,19,45. Thus, WspA proteins may also represent a novel group of β-galactosidases.

Nf-LacZ and WspA also differ in other biochemical features. The optimum temperatures for the two enzymes are 40 °C and 45 °C, respectively, and both are very sensitive to higher temperatures. The optimum pH values for them are 6.5 and 8.0, respectively, but Nf-LacZ seems more resistant to lower pH than WspA18 (Supplemental Fig. S1). The pH value of the extracellular polysaccharide matrix is around 7.6 in N. flagelliforme46, which may guarantee that the secreted WspA can function effectively in the matrix. In contrast, Nf-LacZ is an intracellular protein in N. flagelliforme, since we did not detect this protein by mass spectrometry analysis of the exoproteins. The activities of both enzymes are stimulated by Mg2+, but Ca2+ is inhibitory for WspA. An in vitro experiment found that WspA could bind the UV-A/B absorbing pigment scytonemin through non-covalent interactions19. This implies that the activity of WspA might also be affected by the scytonemin molecule in the extracellular polysaccharide matrix. In addition, it was found that Nf-LacZ and WspA have similar Km values, but the latter is less sensitive to the inhibitor GBS.

Our previous study showed that the activity of WspB (Fig. 5A) was lower than that of WspA18. A follow-up question is which sequence region or domain in WspA1 is critical for the activity. The truncation test of WspA1 indicated that WspC2 (114 aa) can be recognized as the minimum central region with glycosyl hydrolytic and transgalactosylation activities. The smaller WspC3 (94 aa) has a very weak glycosyl hydrolytic activity, which implies that it might be the primitive sequence for the evolution of WspA1. Searching WspC3 against the NCBI nr database showed that this sequence was highly conserved (Supplemental Fig. S2). The species/strains having WspA homologs share the common feature of a dense extracellular polysaccharide matrix. WspA was suggested to play a crucial role in the regulation of structural dynamics of the polysaccharide matrix for coping with periodic desiccation18. Thus, the present protein truncation analysis of WspA1 should advance our understanding of the evolution and function of WspA in those glycan-rich Nostoc species.

As mentioned above, WspN was prone to cause the formation of inclusion bodies in the E. coli expression system. We speculated that WspN might have a potential role in facilitating the export of WspA1 from cells. Our results showed that WspA1::GFP and WspN::GFP could be secreted from the cell in the form of small particles (fluorescent foci), while WspB::GFP could not (Fig. 6). The formation of secreted particles was also observed in our previous study in which WspA1::GFP transgenic Arabidopsis plants were generated22. WspN is not a typical signal peptide as predicted by SignalP47. Thus, WspN may represent a special or atypical transport route. The membrane-fusion potential of WspN might also be an important reason for the formation of insoluble WspA1 in the E. coli expression system. Longer or shorter sequences similar to WspN can be found in several other WspA homologs, AHB33430, QFS48983, QFS48982, and WP_100897955 (NCBI; Supplemental Fig. S2). However, it was also reported that two WspA proteins (AUB35877 and AUB35880, NCBI) could be released from the cells of an N. flagelliforme culture although both proteins lack the WspN sequence48. Thus, WspN-facilitated export may be a newly evolving route for protein secretion.

The secreted WspA accounts for only a very minor part of the total WspA protein in the cells of N. flagelliforme and N. commune18,19. Also, it can be released from the desiccated colonies upon rehydration. In contrast, Nf-LacZ appears to be a traditional intracellular β-galactosidase, albeit with low sequence similarity to the well-known E. coli LacZ. An illustration of the two β-galactosidases, LacZ and WspA, in the N. flagelliforme cell is shown in Fig. 7.

Figure 7

An illustration of the two β-galactosidases LacZ and WspA in a cell of N. flagelliforme. LacZ is located intracellularly. WspA is stored intracellularly, but can be secreted into the glycan sheath upon rehydration. The activities of both enzymes are promoted by Mg2+, while Ca2+ is inhibitory for WspA. In the glycan sheath, the activity of WspA may also be affected by the extracellular pigment scytonemin and its own hydrolysis. scy scytonemin.

In conclusion, we characterized some biochemical features of the two β-galactosidases Nf-LacZ and WspA1 from N. flagelliforme. They have different enzymatic characteristics and can serve as potential biocatalysts for use in the food industry. Elucidation of the central active region of WspA1 provides a valuable clue for understanding its evolution. The future resolution of their crystal structures will provide more functional information.