Introduction

To date, many classes of carbohydrate binding proteins (CBPs) have been identified, including lectins such as F-type1, C-type2, and Galectins3, and carbohydrate binding modules (CBMs) associated with glycoside hydrolases or glycosidases4 that occur ubiquitously in nature. Lectins regulate numerous crucial biological processes including pattern recognition of pathogens, correct folding of glycoproteins5 and cell-cell communication6. Lectin-oligosaccharide interactions are highly specific because the branching motifs of homo- or heteropolymers of monosaccharide units allow for a structural complexity7,8 that is a crucial determinant of many biological interactions at a cellular level. Over 50% of all newly synthesized proteins and lipids are glycosylated9, resulting in cell surface decoration of glycoproteins and glycolipids that are known to play central roles in cell development, tumour progression and metastasis10,11.

Intensive research aimed at recognizing altered cell surface glycosylation during disease development using CBPs is becoming increasingly important for both biomarker discovery and inhibitor design. As a result it is essential that CBPs encoded within any given genome be identified and their carbohydrate specificity determined. Employing computational function prediction to guide experimental analysis has the potential to significantly aid in this endeavour.

The first-generation techniques used to automate protein function annotation were based on sequence homology. Modern approaches employ machine-learning classifiers based on protein evolutionary data, structural geometry, or other sequence or structure information12. Recently, we introduced a template-based method (SPOT-Struc) that predicts CBPs by structurally aligning a query structure with the template structure of a known CBP. For highly aligned structures, a binding affinity prediction is then used to remove potential false-positive CBPs. Using this method we predicted several structural genomics targets as CBPs, one of which was YesU from Bacillus subtilis (RCSB Protein Data Bank code 1oq1)13.

YesU, even after a decade of genome engineering14, remains an uncharacterized protein with unknown function. In 2005, Structural Classification of Proteins–extended (SCOPe) classed YesU as a beta protein with a galectin-like fold, being a member of the concanavalin-A (ConA) like lectins/glucanases superfamily15 possessing a β-sandwich structure comprising 12–14 strands organised as two sheets to form a jellyroll topology16. Our SPOT-Struc analysis specifically matched YesU to the integral membrane mammalian protein VIP36 (2e6v)13, a known leguminous type lectin with a β-sandwich and jellyroll fold17,18 that recognizes high Man-type glycans19.

Here we describe the experimental verification of YesU as a CBP, and its functional annotation as a new Lewisx (Lex) binding lectin using a combination of glycan array profiling followed by detailed affinity analysis using surface plasmon resonance (SPR). Our data demonstrate the complementary roles of computational prediction and experimental validation in function annotation.

Results

Expression of Recombinant YesU and Glycan Array Analysis

Recombinant B. subtilis hypothetical cytosolic protein 031524 (YesU) was expressed in E. coli BL21 (DE3) cells using the vector pMCSG68 obtained from DNASU (Arizona State University), and purified to homogeneity. Figure 1 illustrates the successful purification of highly pure His-tagged YesU (220 residues, 25.2 kDa) using His-select nickel affinity resin (lane 7, Coomassie stain; lane 8, Western blot), which was used in subsequent analyses. Size exclusion chromatography showed that YesU in solution is predominantly monomeric with a molecular mass of approximately 25 kDa (Fig. S1), which is consistent with its structural annotation as a monomeric protein. The glycan binding potential and specificity of recombinant YesU were initially assessed using glycan array analysis, a now well-established and widely utilised tool for high-throughput screening of CBP specificity in which the relative glycan densities are comparable to those seen in a biological context20,21. Interactions were verified through both visual inspection of scanned images and statistical analysis (P < 0.05). The evaluation of YesU by glycan array analysis revealed a limited glycan binding profile, with YesU exhibiting statistically significant binding only to fucosylated glycans (Fig. 2). Importantly, YesU displayed no significant binding to any Man structures present on the array, even though VIP36, the lectin matched to YesU using SPOT-Struc, exclusively binds high-Man type glycans17. Of the fucosylated structures bound, significant binding was observed to fucosylated glycans that share a lacto-N-neotetraose (LNnT) backbone and at least one non-terminal α1-3 fucose (Fuc) (8C, 8H, 8I, 8L, 8N), and to Lewis (8A, 539, 542) and Blood Group B (363) structures (Fig. 2 and Table S1). Interestingly, similar but truncated fucosylated LNnT structures (7A–7E) were not bound by YesU under the experimental glycan array conditions used.
Grant et al. recently reported the potential for false-negative binding on glycan arrays arising from restricted glycan presentation22. Therefore, in subsequent SPR experiments undertaken to verify and further characterize YesU’s carbohydrate binding specificity, truncated fucosylated LNnT structures were also evaluated. In addition to binding terminal and non-terminal Fuc-containing glycans, interaction with LNnT (383) and N,N′,N″,N″′,N″″,N″″′-Hexaacetyl chitohexaose ((GlcNAc)6, 4D) was also observed (Fig. 2), and as such these glycans were also assessed by SPR.

Figure 1

Coomassie stained gel and subsequent western blot of the expression and purification of B. subtilis hypothetical cytosolic protein 031524 (YesU). Lane 1: Bio-Rad Precision Plus protein standard, Lane 2: uninduced culture, Lane 3: induced culture, Lane 4: slurry mix (protein lysate and resin), Lane 5: wash #1 (binding buffer), Lane 6: wash #2 (wash buffer), Lane 7: purified YesU protein (220 residues) concentrated using an Amicon 3 K 15 mL centrifugal filter (2 mg/mL), Lane 8: western blot of the purified, concentrated YesU protein (220 residues) at the expected 25.2 kDa.

Figure 2

Glycan array analysis of YesU. Binding of YesU to glycan structures present on the array was assessed both for statistical significance using T-tests (p < 0.05, red bars), and by determining fold-change over background. Black bars indicate a statistically insignificant (p > 0.05) fold-change over background. The Symbol Nomenclature for Glycans (SNFG)55 was used to represent glycan structures (yellow circle, Gal; blue square, GlcNAc; yellow square, GalNAc; red triangle, Fuc). The full list of glycans present on our array and their corresponding glycan ID and structure is provided in Table S1.

Surface Plasmon Resonance Analysis

Glycan array analysis strongly suggested that YesU recognises terminal and non-terminal Fuc-containing glycans. To explore the interaction of YesU with these glycans quantitatively, SPR was performed. SPR analysis revealed preferential binding to Lewis type, Blood Group type and fucosylated LNnT glycans, in particular non-terminal Fuc variants (Table 1 and Fig. S3). A comparison of YesU’s binding to Lewis and related structures demonstrates an increased affinity for non-terminal α1-2/3-Fuc, revealed through an up to 3-fold greater affinity for Lewisx (Lex, 7I, KD: 0.15 ± 0.006 μM), lacto-N-fucopentaose III (LNFP III, 7C, KD: 0.14 ± 0.001 μM) and Blood group B trisaccharide (BGBT, 7M, KD: 0.19 ± 0.003 μM) over Lewisa (Lea, 7J, KD: 0.42 ± 0.004 μM) and lacto-N-difucohexaose II (LNDFH II, 7E, KD: 0.34 ± 0.01 μM), both of which possess a non-terminal α1-4 branched Fuc. Similarly, monofucosyl(1-3)-iso-lacto-N-octaose (MFiLNO, 8N, KD: 0.24 ± 0.02 μM) also showed higher affinity than glycans with terminal Fuc, although with a slight reduction in affinity, probably due to the presence of a terminal lacto-N-biose I (Galβ1-3GlcNAc). The observed decrease in affinity for Lewisy (Ley, 7N, KD: 0.34 ± 0.002 μM) could be due to the presence of a terminal α1-2-Fuc. This is supported by the observation that the structurally related histo-blood group antigens ABH(O), namely H-disaccharide (HDi, 7F, KD: 0.36 ± 0.01 μM) and Blood Group H type II trisaccharide (BGHT II, 7O, KD: 0.34 ± 0.004 μM), exhibited a 2–3-fold lower affinity. Similarly, 8H (LNnDFH I), 7A (LNFP I), and 7K (BGAT) displayed a reduced affinity compared to saccharides with only a non-terminal α1-3-Fuc residue. Further, the absence of Fuc, as in 1H (LNnT, Galβ1-4GlcNAcβ1-3Galβ1-4Glc), or the addition of a charged group to the core Lewis type structure, as in 8A (sulpho-Lewisa) and 10B (sialyl-Lewisx), led to a reduction or complete loss of binding.
Glycan array analysis also revealed potential binding of YesU to N,N′,N″,N″′,N″″,N″″′-Hexaacetyl chitohexaose ((GlcNAc)6, 4D). Although this interaction was verified by SPR, it was of significantly weaker affinity (KD: 1.13 ± 0.02 μM) compared to Lex and LNFP III (Table 1). As was also observed by glycan array analysis, no interaction of YesU with α1-2- and α1-6-mannobiose (5C, 5F), α1-3-galactobiose (1N), or LNT (1G) was detected by SPR. Taken together, SPR analysis clearly highlights the importance of a non-terminal Fuc and, to a lesser extent, a terminal Fuc for glycan binding by YesU.
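The fold differences discussed above can be reproduced directly from the quoted dissociation constants. The following is a minimal sketch in Python; the dictionary and function names are illustrative only, and the values are the KD means quoted in the text (errors are ignored).

```python
# KD means (micromolar) as quoted in the text; standard errors are ignored here.
KD_UM = {
    "Lex": 0.15,
    "LNFP III": 0.14,
    "BGBT": 0.19,
    "MFiLNO": 0.24,
    "Lea": 0.42,
    "LNDFH II": 0.34,
    "Ley": 0.34,
    "HDi": 0.36,
    "BGHT II": 0.34,
    "(GlcNAc)6": 1.13,
}


def fold_weaker(kd_um, reference="Lex"):
    """Return KD(glycan) / KD(reference); values above 1 mean weaker binding."""
    ref = kd_um[reference]
    return {name: kd / ref for name, kd in kd_um.items()}


if __name__ == "__main__":
    # Print glycans from tightest to weakest binder relative to Lex.
    for name, fold in sorted(fold_weaker(KD_UM).items(), key=lambda kv: kv[1]):
        print(f"{name:10s} {fold:4.1f}-fold")
```

Because a lower KD corresponds to tighter binding, a ratio above 1 indicates a weaker interaction than the reference glycan.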

Table 1 Dissociation constants (KD) of YesU (50 μg/mL) His tagged to NTA chip for selected glycans.

As previously mentioned, large fucosylated glycans that share an LNnT backbone and at least one non-terminal α1-3 Fuc (e.g. 8C, 8H, 8I, 8L, 8N) were bound by YesU on our glycan array but the corresponding truncated fucosylated LNnT structures (7A–7E) were not. Quantitative analysis by SPR verified the interaction of YesU with monofucosyl(1-3)-iso-lacto-N-octaose (MFiLNO, 8N, KD: 0.24 ± 0.02 μM) and lacto-N-neodifucohexaose I (LNnDFH I, 8H, KD: 1.14 ± 0.03 μM), further highlighting the need to consider restricted glycan presentation when assessing glycan array data22. Despite this, glycan array analysis successfully predicted the glycan binding specificity of YesU that was subsequently verified and refined by SPR.

Structure-based alignment and Lex docking analysis

Glycan array and SPR analyses confirmed the prediction that YesU is a carbohydrate binding protein. However, it was expected that YesU would display a similar Man-binding profile to that described for VIP36, the template that was best matched structurally to YesU (SP-score of 0.80, Table 2). To better understand the discrepancy between the predicted and observed carbohydrate binding specificity of YesU, we used SPalign23 to structurally compare YesU to five Fuc-binding lectins (Table 2 and Fig. 3A): lectin II from Pseudomonas aeruginosa (PA-IIL, pdbID: 1gzt), the Anguilla anguilla agglutinin (AAA, pdbID: 1k12), SP2159 from Streptococcus pneumoniae (pdbID: 2j1u), lectin I from Ulex europaeus (UEA-I, pdbID: 1fx5), and lectin IV from Griffonia simplicifolia (GS-IV, pdbID: 1gsl). This analysis revealed that YesU is more structurally similar to the legume lectins UEA-I and GS-IV than it is to the bacterial lectins PA-IIL and SP2159. Importantly, YesU’s similarity to UEA-I and GS-IV is comparable to that observed with VIP36. As shown in Table 2, a total of 185 and 177 amino acids of YesU were aligned with UEA-I and GS-IV, with average root-mean-squared distances of 2.86 Å (SP-score of 0.75) and 2.68 Å (SP-score of 0.75), respectively. That is, global structural similarity is high enough for function transfer. The structural alignment of YesU and GS-IV is depicted in Fig. 3B, and shows that even though these lectins have a sequence identity of only 13.7% (Table 2), there is a high degree of structural similarity, with key features of the conventional legume lectin fold24,25 being conserved in YesU. Based on the structural alignments in Fig. 3, there is no clear consensus binding site for Lex in YesU. GS-IV was selected as a representative binding template due to its fucosylated glycan binding characteristics, high structural similarity with YesU, and the proximity of its binding site to the VIP36 Man-type glycan binding site.
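For readers unfamiliar with the root-mean-squared distance quoted above, the following minimal Python sketch shows how such a value is obtained from aligned residue pairs. It assumes the two structures have already been superimposed and that the aligned pairs are known; it does not perform the alignment itself and does not compute the SP-score, both of which are handled by SPalign. The function name and the toy coordinates in the usage note are illustrative only.

```python
import math


def aligned_rmsd(coords_a, coords_b):
    """Root-mean-squared distance over paired (x, y, z) coordinates in angstroms.

    Assumes the structures are already superimposed and that coords_a[i]
    is aligned to coords_b[i].
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))
```

For example, `aligned_rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 0, 3)])` gives 3.0, since every aligned pair is displaced by exactly 3 Å.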
A potential Lex-YesU binding site was identified based on a structural alignment with the Leb-GS-IV binding complex (Table 2 and Fig. 3). Docking was subsequently performed by Vina-Carb in a 30 Å box around the predicted site. This docking experiment revealed a potential Lex binding orientation, with predicted binding affinity of −4.3 kcal/mol (Fig. 4A). While the minimised conformation was translated somewhat away from the original, predicted binding site based on the template of the Leb-GS-IV binding complex (Fig. 3A), the docked conformation is consistent with the site independently predicted by a purely sequence-based technique SPRINT-CBH26. Of the top 10 predicted binding residues by SPRINT-CBH, Arg200, His105, Arg133, Tyr101 and Ser115 are all within 7 Å of the docked Lex conformation (Fig. 4B), with His105, Arg133 and Tyr101 also predicted by Vina-Carb to interact with Lex (Fig. 4A). While Arg125, Lys196, Trp51, Ser40 and Tyr158 are also predicted as potential binding residues by SPRINT-CBH, they are largely isolated on the protein surface and are unlikely to form part of the true binding site. Geometric clustering has previously been used to strengthen the signal/noise ratio of structure-based, functional annotation tools. In this case, the geometric distribution of predicted binding residues supports the binding site selected by the docking experiment.
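The 7 Å proximity filter applied to the SPRINT-CBH predictions can be sketched as follows. This is an illustrative Python example only: it assumes that "within 7 Å" means the minimum atom-to-atom distance between a residue and the docked ligand, which the text does not specify, and the function name and input layout are hypothetical.

```python
import math


def residues_near_ligand(residue_atoms, ligand_atoms, cutoff=7.0):
    """Return residue names whose closest atom lies within `cutoff` angstroms
    of any ligand atom.

    residue_atoms: dict mapping residue name -> list of (x, y, z) tuples
    ligand_atoms:  list of (x, y, z) tuples for the docked ligand
    """
    near = []
    for name, atoms in residue_atoms.items():
        closest = min(math.dist(a, l) for a in atoms for l in ligand_atoms)
        if closest <= cutoff:
            near.append(name)
    return sorted(near)
```

Applied to real coordinates, the input would be built from the YesU structure (1oq1) and the docked Lex pose; a different distance definition (for example Cα or side-chain centroid) would change which residues pass the filter.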

Table 2 Structure-based alignment of YesU to five Fuc-binding lectins using SPalign.
Figure 3

Side-by-side representation of YesU with five Fuc-binding lectins (GS-IV, pdbID: 1gsl; UEA-I, pdbID: 1fx5; SP2159 pdbID: 2j1u; AAA, pdbID: 1k12; PA-IIL, pdbID: 1gzt) and VIP36 (pdbid: 2e6v) that were used for structure-based alignment by SPalign. All bound ligands shown (with the exception of YesU (1oq1)) are those associated with the corresponding PDB entries (A). YesU (magenta) aligned to the structure of GS-IV (cyan) by SPalign (B).

Figure 4

The docking of Lex into YesU using Vina-Carb with default parameters revealed a potential Lex binding orientation in the final complex structure model with predicted binding affinity of −4.3 kcal/mol. Predicted interacting residues are highlighted in yellow (A). The YesU-Lex binding site was also independently predicted by the purely sequence-based technique SPRINT-CBH. Predicted binding residues within 7 Å of the docked Lex are highlighted in yellow (B).

B. subtilis adherence to intestinal cells is inhibited by fucosylated glycans

Our data suggest that YesU from B. subtilis preferentially binds fucosylated glycans such as Lex. Fucosylated glycans, particularly Lewis blood group antigens, are known to play a functional role in the adhesion of a number of bacteria to the gastrointestinal tract. To explore the possibility that YesU has a similar function in B. subtilis, we investigated the ability of fucosylated and non-fucosylated glycans to inhibit the adherence of B. subtilis ATCC6633 (expression of YesU in B. subtilis ATCC6633 was confirmed by RT-PCR, data not shown) to the intestinal cell line Caco-2. Figure 5 shows that Lex and Ley at 1 μM significantly inhibited B. subtilis adherence by greater than 50%, whereas the high Man-type glycan, Manα1-3(Manα1-3(Manα1-6)Manα1-6)Man (Man5), showed no significant inhibition at concentrations up to 10 μM, and Galα1-3Gal and BGBT (Galα1-3(Fucα1-2)Gal) only exhibited significant inhibition at 10 μM (approximately 50%). LNnT (Galβ1-4GlcNAcβ1-3Galβ1-4Glc) was also inhibitory but not to the same extent as Lex and Ley, which mirrors the SPR data that showed preferential binding of YesU to Lex and Ley over BGBT and LNnT.

Figure 5

The adherence of B. subtilis to the intestinal cell line Caco-2 was inhibited by a range of fucosylated glycans including BGBT, Lex, and Ley, as well as by LNnT but to a slightly lesser extent. No significant inhibition of B. subtilis adherence to Caco-2 cells was observed for the high Man-type glycan, Manα1-3(Manα1-3(Manα1-6)Manα1-6)Man (Man5). *Significant difference to control containing no glycan, P < 0.05.

Discussion

YesU is located between a rhamnogalacturonan acetylesterase (RhgT) and a DUF624 domain containing protein in the B. subtilis genome. The location of YesU between these two genes is fairly well conserved with 20 complete B. subtilis genomes containing the identical alignment and an additional 32 complete genomes containing the same up and downstream genes around YesU, with overlap between the gene start and finish codons for YesU and the downstream DUF624 protein. YesU in all genomes assessed had its own promoter (BPPROM27) with the upstream RhgT gene possessing a transcription terminator (using RNAmotif28 and Erpin29).

We had previously predicted, using SPOT-Struc, YesU to be a carbohydrate binding protein13 with specificity similar to the Man-binding protein VIP36 (pdbID: 2e6v)17,18. VIP36 exhibits a typical β-sandwich and jellyroll fold17,18 that is a common structural characteristic of the ConA-like lectin/glucanase superfamily15. However, wide-scale glycan array screening of YesU did not reveal any binding to terminal mannosyl structures or even oligosaccharides containing sub-terminal Man. This lack of binding was validated by SPR, with no binding observed to either α1-2-mannobiose or α1-6-mannobiose. Our glycan array and SPR analyses instead suggest that YesU’s carbohydrate specificity is more closely related to that of Fuc-binding lectins than Man-binding lectins.

Fuc-binding lectins have been identified in a range of organisms from bacteria to vertebrates. Interestingly, similar to VIP36 and other members of the ConA-like lectin/glucanase superfamily, F-type lectins (fucolectins) also possess a β-barrel with jellyroll topology that defines a typical F-type fold. Two of the best-characterized F-type lectins, both structurally and biochemically, are AAA (Anguilla anguilla agglutinin) from the European eel, and SP2159 from the Gram-positive pathogen Streptococcus pneumoniae. AAA binds terminal Fuc residues in certain blood group antigens, including H type 1 (Fucα1-2Galβ1-4GlcNAc) and Lea (Galβ1-3(Fucα1-4)GlcNAc)30, while SP2159 binds to a more distinct set of fucosylated oligosaccharides compared to AAA, including the H, A and B blood group antigens and the Ley (Fucα1-2Galβ1-4(Fucα1-3)GlcNAc) epitope, but not Lex (Galβ1-4(Fucα1-3)GlcNAc) or Lea (Galβ1-3(Fucα1-4)GlcNAc)31,32. The glycan binding profile of AAA33, and the glycan binding domain CBM4731 of SP215934 (also referred to as SpGH98), have also been analysed by glycan array through the Consortium for Functional Glycomics (CFG). Interestingly, even though both lectins did bind Fuc, they only bound a very restricted subset of fucosylated glycans on the CFG array. AAA, analysed on the CFG glycan array version 2.1, only bound Fucα1-2GlcNAcβ-Sp8 and Fucα1-3GlcNAcβ-Sp8 with any significance, and SP2159, analysed on CFG glycan array version 3.0, only bound Lewisy (Fucα1-2Galβ1-4(Fucα1-3)GlcNAcβ-Sp2), 2′-fucosyllactose (Fucα1-2Galβ1-4Glcβ–Sp1) and Lewisb (Fucα1-2Galβ1-3(Fucα1-4)GlcNAcβ-Sp2), with any significance. The narrow Fuc-binding profile observed for AAA and SP2159 is similar to that observed for YesU on our glycan array, and as such may also reflect the influence of restricted glycan presentation on the CFG glycan array22.

Further examples of Fuc-binding lectins include UEA-I and GS-IV from the plants Ulex europaeus and Griffonia simplicifolia, respectively. GS-IV is known to bind Leb and Ley with high affinity25, and UEA-I interacts strongly with α1,2-linked Fuc, specifically H type 2 trisaccharides24. CFG glycan array data are only available for UEA-I35, with specific binding on the CFG glycan array version 4.0 observed to blood group H type saccharides possessing the terminal disaccharide Fucα1-2Gal. Structurally, both UEA-I and GS-IV are similar to other leguminous lectins, including members of the ConA-like lectin/glucanase superfamily, possessing a conventional legume lectin fold that comprises three β-sheets: a six-stranded back sheet, a seven-stranded front sheet, and a five-stranded S sheet which connects the front and back sheets24,25. Although leguminous lectins are structurally similar, their carbohydrate binding specificities and binding site location can vary widely. In the case of UEA-I and GS-IV, even though the binding sites are found at different locations of the protein, there is significant homology between the binding site residues (Fig. 3A). Residues involved in carbohydrate binding in the GS-IV lectin are Arg48, Ser49, Asp89, Tyr105, Gly106, Gly107, Phe108, His114, Asn135, Trp138, Tyr22325, and in UEA-I the binding site is a depression made up of residues Glu44, Thr86, Asp87, Gly104-Gly105, Ile130, Val134, Asn135, Trp137, Tyr220, and Arg223. Structure-based alignment of YesU with GS-IV (Fig. 3) followed by Lex docking analysis using Vina-Carb (Fig. 4A) revealed a complex structure model for YesU and Lex with a predicted binding affinity of −4.3 kcal/mol. Lex was located within a binding pocket comprising residues Leu74, Tyr101, Pro102, His105, Arg118, Tyr121, Glu124, Arg133, Phe138 and Met202 with 5/11 residues in common with GS-IV binding residues.
An additional sequence-based carbohydrate-binding site prediction technique, SPRINT-CBH, further supported the binding site predicted by Vina-Carb. Based on the complex structure models of YesU and Lex shown in Fig. 4, we propose that Tyr101, Pro102, Arg133 and Phe138 stabilize the GlcNAc moiety, while Leu74, Tyr101, His105, Tyr121, Glu124 and Met202 are involved in Fuc binding. The stabilization of the GlcNAc moiety by Tyr101, Pro102, Arg133 and Phe138 more than likely accounts for the weak but significant binding to LNnT (1H, Galβ1-4GlcNAcβ1-3Galβ1-4Glc) and (GlcNAc)6 (4D) observed by glycan array and SPR analyses. Further experimental studies are needed to unequivocally validate the proposed binding sites.

Similar to the leguminous lectins GS-IV and UEA-I, AAA and SP2159 are comprised of two main β-sheets and four to five anti-parallel β-strands, two forming a shallow positively charged pocket that forms the Fuc-binding pocket30,36. The Fuc-binding pockets in AAA and SP2159 have been attributed to hydrogen bonding with basic residues and van der Waals contact with hydrophobic residues in the proposed binding pocket36. The ability of some GAGs and GAG fragments to interact with YesU may be due to the anionic sulfate interacting with positively charged YesU residues in the proposed Lex binding pocket.

Given our finding that YesU from B. subtilis preferentially binds fucosylated glycans such as Lex, it is possible that YesU may be involved in the adhesion of B. subtilis to the gastrointestinal tract of mammals. Fucosylated glycans, particularly Lewis blood group antigens, have been proposed to play a functional role in the colonisation and virulence of a number of gastrointestinal pathogens, including Helicobacter pylori37, Campylobacter jejuni38, Salmonella enterica sv. Typhimurium39, and Pseudomonas aeruginosa40. P. aeruginosa is particularly interesting as one of its two soluble lectins, PA-IIL, binds host cell Lewis epitopes41 and, as we have shown here, is structurally similar to YesU (Table 2 and Fig. 3). In addition, even though PA-IIL is abundantly present on the bacterial outer membrane40, it does not possess a signal sequence/peptide that predicts subcellular localization as determined using SignalP 4.1 (Gram-negative organism group)42. Similarly, YesU also does not possess a signal sequence/peptide as determined using SignalP 4.1 (Gram-positive organism group), suggesting that PA-IIL and YesU are secreted in a signal sequence/Sec independent process. Although it is unclear exactly how YesU is secreted in B. subtilis, evidence to support signal sequence/Sec independent secretion in Gram-positive bacteria is available. The carbohydrate binding toxin43, pneumolysin (Ply) from Streptococcus pneumoniae, is secreted in a signal sequence/Sec independent process. This signal sequence/Sec independent pathway is also present in B. subtilis as Ply knock-in mutants can also export Ply in a signal sequence/autolysis independent manner44. It is therefore possible that the export of YesU utilises this signal peptide/Sec independent process in B. subtilis.

The ability of a range of fucosylated glycans including BGBT, Lex, and Ley, but not the high Man-type glycan, Manα1-3(Manα1-3(Manα1-6)Manα1-6)Man, to inhibit the adherence of B. subtilis to intestinal cells (Fig. 5) provides experimental evidence to support the potential role of YesU in the colonisation of the gastrointestinal tract by B. subtilis. Lex, in particular, is a very common glycan motif in mammals, and as such represents an important receptor for pathogenic and commensal gut microbes. Although usually considered soil organisms, members of the genus Bacillus have been found to inhabit the gastrointestinal tract of insects and animals45. In fact there is now significant evidence that species such as B. subtilis should be considered gut commensals in humans rather than purely environmental microorganisms46. Interestingly, a homolog of YesU is also present in B. cereus, strains of which cause foodborne illnesses in humans45, but is not present in B. thuringiensis and B. sphaericus, both of which are gut commensals in insects. Insects do not express Lex, although they do possess a Lewis-like structure, α1,3-fucosylated N-acetylgalactosaminyl-β1,4-N-acetylglucosamine (GalNAcβ1-4(Fucα1-3)GlcNAc)47, that resembles Lex except that a GalNAc is present rather than a Gal. The absence of a YesU homolog in Bacillus species that are insect gut commensals, together with the lack of Lex epitopes in insects, provides further evidence for the potential importance of YesU in the colonisation of the human gut by B. subtilis (and potentially B. cereus).

This paper validates the use of computational prediction from a method like SPOT-Struc as the first step to screen and identify potential new CBPs. We have shown that SPOT-Struc is highly predictive and ideal to uncover novel carbohydrate-binding proteins. However, prediction of the carbohydrate specificities such as reported here for YesU is more challenging for computational techniques as the sidechains rather than backbone structure play a more active role in determining lectin-binding specificity.

Methods

Expression and Purification of B. subtilis YesU Protein

The vector pMCSG68 (clone ID BSCD00606331) was obtained from the DNASU plasmid repository (http://dnasu.org). The vector incorporated the B. subtilis hypothetical cytosolic protein 031524 (YesU; 220 residues) with an N-terminal 6xHis tag under the control of a T7 promoter. It was transformed into competent E. coli BL21 (DE3) cells and used for recombinant protein expression. An overnight culture of BL21 (DE3)/BSCD00606331 was used to inoculate LB broth containing ampicillin (100 µg/mL) and incubated at 37 °C with aeration. Once OD600nm reached 0.4–0.6, protein expression was induced using 1 mM IPTG for a further 4 h, and the resulting cell pellet was resuspended in binding buffer (50 mM NaHPO4, 300 mM NaCl; pH 8.0) containing lysozyme (2 mg/mL), DNaseI and protease inhibitor cocktail mix (50 µL). An additional freeze/thaw step was performed to aid in cell lysis, followed by sonication and removal of the insoluble cell debris by centrifugation at 100,000 × g for 90 min. The clarified supernatant was added to 1 mL of His-select nickel affinity resin (Sigma) and rotated overnight at 4 °C using a rotational mixer. The slurry mix was then packed by gravity into a 10 mL Bio-Rad chromatography column. The column was washed once with binding buffer, then washed with 50 mM NaHPO4, 1 M NaCl; pH 8.0, and the bound His-tagged protein was eluted with 50 mM NaHPO4, 300 mM NaCl, 500 mM imidazole; pH 8.0 in a 1 mL volume. Imidazole was removed from the sample by dialysis in PBS overnight at 4 °C. Purity was confirmed by SDS-PAGE using 12% SDS-polyacrylamide gels and Western blot using a mouse monoclonal anti-His6 antibody (1:10,000 dilution, Cell Signaling Technology) and a goat anti-mouse horseradish peroxidase conjugated antibody (1:10,000 dilution, Bio-Rad Laboratories) as the primary and secondary antibodies, respectively.

Glycan Array Analysis

Glycan arrays consisting of 367 diverse glycans with and without the presence of one of three spacers (sp2, sp3 or sp420,48,49) were prepared from two previously described glycan libraries20,48,49. Amine-containing glycans with spacers sp2, sp3 or sp4 were synthesised as previously described20 and glycans without spacers were amine functionalised as previously published50. All glycans were suspended in 1:1 DMF:DMSO at a concentration of 500 μM and were printed onto SuperEpoxy 2 glass slides (ArrayIt, Sunnyvale, CA) using an ArrayIt SpotBot Extreme array spotter in a six pin subarray print per glass slide format. All glycans were printed in replicates of four, including four FITC control spots, per subarray using 946MP4 pins and a contact time of 1 second at 60% relative humidity, with pins being reloaded after every 12 spots.

The arrays were printed and subsequently neutralized in 1:1 ethanolamine:DMF, followed by blocking with 0.1% BSA in 50 mM phosphate buffered saline (PBS), pH 7.4 for 5 min at 22 °C. After washing with PBS, each slide was dried by placing it in an empty 50 mL tube and centrifuging for 5 min at 200 × g. Recombinant YesU (2 µg) was incubated at a molar ratio of 1:2:4 with anti His-tag mouse monoclonal antibody (10 mg/mL, Cell Signaling Technology), anti-mouse-IgG-Alexa555 conjugated rabbit polyclonal antibody (2 mg/mL, Life Technologies) and goat conjugated anti-rabbit-IgG-Alexa555 polyclonal antibody (2 mg/mL, Life Technologies) in 50 mM Array PBS (PBS with 1.8 mM MgCl2 and 1.8 mM CaCl2), pH 7.4 containing 0.1% BSA for 15 min on ice protected from light. All subarrays on the slide were isolated using a Gene Frame (1.5 × 1.6 cm, 65 µL, Abgene, Epsom, UK) prior to the addition of the YesU-antibody mix to the array. A coverslip was applied to the Gene Frame and array slides were incubated in a humidified incubator for 20 min at 22 °C in the dark. The Gene Frame and coverslip were subsequently removed and the slide gently washed twice with 50 mM PBS, pH 7.4 containing 0.001% TWEEN® 20, and twice with 50 mM PBS, pH 7.4. Slides were dried by centrifugation for 5 min at 200 × g and allowed to air dry for a further 5 min.

Fluorescent Image Acquisition and Data Processing

Fluorescence intensities of the array spots were measured using the ProScanArray microarray reader (Perkin Elmer, Waltham, MA) using the Helium-Neon 543 green excitation laser set to the Alexa Fluor 555 setting (555 nm excitation and 580 nm emission). Image analysis was carried out using the inbuilt ProScanArray imaging software, ScanArray Express (Perkin Elmer). Raw glycan signals were exported into Microsoft Excel. Background was defined as the mean signal of the DMF/DMSO blanks on the array plus three standard deviations. This value was subtracted from each glycan signal to generate an adjusted signal. A one-tailed T-test was performed with the significance level set at p = 0.05. T-test and fold-change values were generated using Microsoft Excel.
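The background correction described above can be sketched in a few lines of Python. This is a hedged illustration rather than the spreadsheet workflow actually used: it assumes the one-tailed T-test is a one-sample test of the four adjusted replicate signals against zero, and that fold-change is the mean raw signal divided by the background threshold. Neither detail is stated in the text, and the function names are hypothetical.

```python
import math
import statistics

# Tabulated one-tailed critical t value for alpha = 0.05 and 3 degrees of
# freedom (four replicate spots per glycan).
T_CRIT_DF3 = 2.353


def background_threshold(blank_signals):
    """Mean of the DMF/DMSO blank spots plus three sample standard deviations."""
    return statistics.mean(blank_signals) + 3 * statistics.stdev(blank_signals)


def analyse_glycan(replicate_signals, blank_signals):
    """Background-adjust four replicate spots and test for signal above zero."""
    if len(replicate_signals) != 4:
        raise ValueError("this sketch assumes four replicate spots per glycan")
    threshold = background_threshold(blank_signals)
    adjusted = [s - threshold for s in replicate_signals]
    mean_adjusted = statistics.mean(adjusted)
    sem = statistics.stdev(adjusted) / math.sqrt(len(adjusted))
    if sem == 0:
        # Identical replicates: the t statistic is unbounded in either direction.
        t_stat = math.inf if mean_adjusted > 0 else -math.inf
    else:
        t_stat = mean_adjusted / sem
    return {
        "adjusted_mean": mean_adjusted,
        # Assumed definition: mean raw signal relative to the threshold.
        "fold_change": statistics.mean(replicate_signals) / threshold,
        "t_statistic": t_stat,
        "significant": t_stat > T_CRIT_DF3,
    }
```

With blanks of 100, 110, 90 and 100 the threshold is about 124.5, so replicate spots near 500 are called significant while spots near 120 are not.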

Surface Plasmon Resonance Detection

Surface plasmon resonance (SPR) experiments were performed using a BIAcore T100 biosensor system (GE Healthcare) at 25 °C in 10 mM PBS-MgCl2 (pH 7.4) at a flow rate of 30 μL/min. Purified His-YesU was diluted to 50 μg/mL in PBS (pH 7.4) and loaded on flow cell 2 (FC2) of a Ni2+-nitrilo-triacetic acid (NTA) Series S sensor chip with 5 min of contact time. This was repeated for FC3 and FC4 to allow for triplicates and to validate the reproducibility of the SPR response. FC1 had no protein loaded and was used as a blank reference. Five-fold serial dilutions of selected glycans were prepared in 10 mM PBS (pH 7.4). The glycan dilutions were loaded onto the sensor chip and assessed using single cycle kinetics (that is, the chip was regenerated with EDTA only after the last injection of the dilution series). Subsequently, the chip was re-loaded with Ni2+ and His-YesU before the injection of the next glycan dilution series. The specificity of the glycan binding was recorded as the response signal difference between each YesU loaded FC and the reference FC1. A 10-min dissociation time was allowed after the addition of each concentration of analyte. SPR signals were analysed using the T100 BIAcore Evaluation software and dissociation constants (KD) were determined from steady-state analysis.

Structure-based alignment and Lex docking analysis

We utilized the structural alignment of YesU (pdbid: 1oq1) over GS-IV (pdbid: 1gsl) to build a candidate complex structure with the Leb/Ley ligand of 1gsl and identified a potential binding site. The selected binding site is supported by binding residue predictions generated by SPRINT-CBH26. SPRINT-CBH is an orthogonal, sequence-based method that employs a trained SVM model to predict carbohydrate-binding residues. The final complex structure model was generated by Vina-Carb51, a modified version of Autodock Vina (version 1.1.2)52 designed to reproduce native glycosidic torsion angle preferences. In order to identify a potential binding mode, Lex was relaxed with default parameters inside a 30 Å box at the predicted binding site. We did not perform a de novo docking over the whole structure as docking methods were found to be less accurate than template-based methods if binding sites are not known a priori53.
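As an illustration of how a search box of this kind might be specified, the Python sketch below centres a cubic box on the centroid of a set of predicted binding-site coordinates and writes it out using the standard AutoDock Vina configuration keys (`receptor`, `ligand`, `center_*`, `size_*`). The file names, the use of the centroid as the box centre, and the reading of "30 Å box" as a 30 Å edge length are assumptions for illustration; no Vina-Carb-specific options are shown.

```python
def vina_box_config(site_coords, box_edge=30.0,
                    receptor="yesu_1oq1.pdbqt", ligand="lex.pdbqt"):
    """Build Vina-style config text for a cubic box centred on the site centroid.

    site_coords: list of (x, y, z) tuples for the predicted binding-site atoms.
    The receptor and ligand file names are hypothetical placeholders.
    """
    if not site_coords:
        raise ValueError("need at least one binding-site coordinate")
    n = len(site_coords)
    cx, cy, cz = (sum(c[i] for c in site_coords) / n for i in range(3))
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {box_edge:.1f}",
        f"size_y = {box_edge:.1f}",
        f"size_z = {box_edge:.1f}",
    ]
    return "\n".join(lines)
```

The returned text could be saved to a file and passed to the docking program through its configuration-file option.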

Caco-2 adherence and glycan inhibition assays

Adherence assays were performed using the Caco-2 human intestinal cell line essentially as previously described54. Briefly, cells were seeded at 10⁵ cells/well in black-walled 96-well cell culture plates in minimal essential medium (MEM) for 48–72 h prior to bacterial challenge. Caco-2 cells were monitored prior to the assay to ensure a confluent monolayer of cells and that the cells had formed tight junctions. Bacillus subtilis ATCC6633 was fluorescently labelled with carboxyfluorescein diacetate, succinimidyl ester (CFDA-SE), and 10⁷ labelled bacteria were applied to the Caco-2 cells in the presence and absence of selected glycans (at final concentrations between 0.1 and 10 mM). Following incubation at 37 °C for 60 min, protected from light, cells were carefully washed 3 times with pre-warmed PBS and the fluorescence was measured at excitation and emission wavelengths of 485 nm and 535 nm, respectively, using an Infinite® 200 PRO (Tecan) fluorescence plate reader.
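Inhibition of adherence can be expressed as the percentage reduction in fluorescence relative to the no-glycan control. The short Python sketch below shows one way to compute this; the optional blank subtraction and the function name are assumptions for illustration and are not described in the protocol above.

```python
import statistics


def percent_inhibition(control_wells, glycan_wells, blank=0.0):
    """Percentage reduction in mean fluorescence relative to the no-glycan control.

    control_wells: fluorescence readings from wells without glycan
    glycan_wells:  fluorescence readings from wells with glycan
    blank:         optional background reading subtracted from both means
    """
    control = statistics.mean(control_wells) - blank
    treated = statistics.mean(glycan_wells) - blank
    if control <= 0:
        raise ValueError("control signal must exceed the blank")
    return 100.0 * (1.0 - treated / control)
```

For example, control wells reading 1000 and glycan-treated wells averaging 500 correspond to 50% inhibition.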