Introduction

The human gut microbiota (HGM) is a key determinant of health1,2,3. Vertical transfer from the mother contributes markedly to the establishment of this community shortly after birth4,5. The HGM develops dynamically during infancy until a resilient adult-like community is formed after 2–3 years of life6,7,8. The early life microbiota plays a role in the maturation of the host’s endocrine, metabolic and immune systems9, and the composition of this consortium is associated with life-long health effects10,11,12. Therefore, understanding the factors that define the HGM structure during infancy is critical for minimizing the risk of a range of metabolic, inflammatory and neurodegenerative disorders, all associated with specific HGM signatures13,14.

Dietary glycans resistant to digestion by human enzymes are a major driver that shapes the developing HGM6,15. This is emphasized by the dominance of Bifidobacterium in breast-fed infants7,8, attributed to the competitiveness of distinct members of this genus in the utilization of human milk oligosaccharides (HMOs)16,17. Indeed, the most prominent changes in the infant microbiota occur during weaning and the introduction of solid food6,7, whereby bifidobacteria are replaced by Firmicutes as the most abundant phylum of the mature HGM. This compositional shift is accompanied by notable longitudinal increases in the concentrations of the short chain fatty acids (SCFAs) propionate and butyrate (from carbohydrate fermentation) during and after weaning18.

Butyrate exerts immune-modulatory activities19 and is associated with a lowered risk of colon cancer, atherosclerosis, and enteric colitis20,21. The production of butyrate is largely ascribed to Firmicutes Clostridium cluster IV and Clostridium cluster XIVa that includes the Roseburia-Eubacterium group (Lachnospiraceae family, Clostridiales order), which are abundant and prevalent in the adult HGM22,23. The abundance of Roseburia spp. is decreased in patients suffering from metabolic, inflammatory and cardiovascular diseases24,25,26,27. Although butyrate producers are established by the first year of life27, the mechanisms underpinning their early appearance (and prevalence) remain unknown.

The evolution of uptake and enzymatic systems that support competitive growth of Bifidobacterium spp. on HMOs17 reflects a successful adaptation to the intestines of breast-fed infants. We hypothesize that other taxonomic groups that possess metabolic capabilities targeting HMOs may have an early advantage in colonizing the infant gut.

The early emergence of Roseburia-Eubacterium in the human gut offers a suitable model group to evaluate this hypothesis. Here, we perform genomic analyses that suggest the presence of putative HMO utilization loci in Roseburia and Eubacterium strains. Growth on selected HMOs or a complex mixture from mother’s milk, combined with differential proteomics, reveals the high upregulation of the protein apparatus encoded by these loci, consistent with their role in mediating HMO utilization.

Further, we characterize enzymes and transport proteins encoded by the HMO loci to elucidate the molecular details of HMO capture and degradation by this protein apparatus. These analyses unveil an enzymatic activity and a structural fold, which have not been previously reported. The HMO catabolic pathways are upregulated during growth with the model mucin degrader Akkermansia muciniphila, suggesting these pathways may support cross-feeding on mucin oligosaccharides made accessible by A. muciniphila. Analyses of the metagenome of Roseburia show a striking conservation and wide occurrence of the HMO utilization pathways across the genus, underscoring their importance for adaptation to the human gut. This study provides insight into pathways that may confer a competitive advantage in the early colonization and the resilience of key butyrate-producing Clostridiales by mediating the catabolism of distinct HMOs and host O-glycans.

Results

HMO loci in Roseburia and Eubacterium

Our aim was to investigate the HMO utilization potential in butyrate producing Clostridiales, which may confer an advantage as the infant HGM matures during weaning. Genomic analyses of butyrate producers from Lachnospiraceae identified distant homologs of the recently discovered glycoside hydrolase family 136 (GH136) in the Carbohydrate Active enZyme (CAZyme) database (www.cazy.org) (Supplementary Fig. 1). This family was assigned based on the lacto-N-biosidase LnbX from Bifidobacterium longum subsp. longum JCM 1217 (ref. 28), which cleaves the key HMO lacto-N-tetraose (LNT) to lacto-N-biose (LNB) and lactose (EC 3.2.1.140; Supplementary Table 1). The activity of the bifidobacterial LnbX was dependent on the co-expression of an adjacent gene, proposed to encode a molecular chaperone (LnbY). Unlike the counterpart from Bifidobacterium, the GH136 orthologues from Roseburia and Eubacterium are organized on a locus harbouring additional CAZyme genes (Supplementary Fig. 1).

We selected two Roseburia strains and one from Eubacterium, all having GH136-like genes, to examine their HMO utilization capabilities.

Significant growth was observed for Roseburia hominis DSM 16839 (p < 4.0 × 10−4) and Roseburia inulinivorans DSM 16841 (p < 1.3 × 10−4) after 24 h on media with HMOs from mother’s milk, but the growth of R. inulinivorans was more efficient (µmax = 0.30 ± 0.01 h−1) (Fig. 1a,b). Next, we carried out growth on building blocks from HMOs and related oligomers from O-glycoconjugates (Fig. 1a–d). None of the strains grew on the top abundant fucosyl-lactose (FL) HMOs, despite good growth on lactose (Fig. 1d). Roseburia strains failed to grow on sialyl lactose (SL) (Fig. 1d), consistent with the lack of encoded sialidases. R. hominis grew efficiently on the HMO LNT (µmax = 0.22 ± 0.02 h−1), its LNB unit (µmax = 0.16 ± 0.01 h−1) and the mucin-derived galacto-N-biose (GNB) (µmax = 0.21 ± 0.02 h−1) (Fig. 1a). Growth on LNT was also shared by the taxonomically related Eubacterium ramulus DSM 15684 from Eubacteriaceae. By contrast, R. inulinivorans grew well only on LNB and GNB, but not LNT (Fig. 1c). R. inulinivorans was further distinguished by growth on sialic acid (Neu5Ac), abundant in HMOs and glycoconjugates (Fig. 1d).
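As an illustration of how a maximum specific growth rate such as the µmax values above can be derived from an OD600 time course, the following minimal sketch takes the steepest least-squares slope of ln(OD600) against time over a sliding window. The window size, the function names and the example curve are assumptions made for illustration; the fitting procedure actually used for the reported values is not described in this section.

```python
import math

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def mu_max(times_h, od600, window=4):
    """Largest slope of ln(OD600) versus time over any run of `window` points."""
    ln_od = [math.log(value) for value in od600]
    best = float("-inf")
    for start in range(len(times_h) - window + 1):
        stop = start + window
        best = max(best, slope(times_h[start:stop], ln_od[start:stop]))
    return best

# Hypothetical curve (not measured data): exponential growth at 0.22 per hour
# for the first 8 h, followed by a plateau.
times = [0, 2, 4, 6, 8, 10, 12, 14]
od = [0.05 * math.exp(0.22 * min(t, 8)) for t in times]
print(round(mu_max(times, od), 2))
```

A sliding window is used here because growth is exponential only for part of the curve; a fit over all time points would underestimate the rate once the culture approaches stationary phase.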

Fig. 1: Growth of Roseburia and Eubacterium spp. on HMOs and upregulation of HMOs utilization loci in Roseburia.

Growth curves of R. hominis (a) and R. inulinivorans (b) on glucose, LNT, GNB, LNB, and/or purified HMOs from mother’s milk compared to no-carbon source controls over 24 h. c Growth levels of R. inulinivorans on LNT, LNB, GNB and of E. ramulus on LNT within 24 h, including glucose and no-carbon source controls. d Growth of R. hominis, R. inulinivorans and E. ramulus on lactose, 2′FL, 3FL, 3′SL and 6′SL as well as on monosaccharides from HMOs and mucin after 24 h, including a no-carbon source control. Growth analyses (a–d) on media supplemented with 0.5% (w/v) carbohydrates (for R. inulinivorans, 1% (w/v) and 4% (w/v) purified HMOs from mother’s milk) were performed in independent biological triplicates. The growth data are presented as mean values with the error bars representing the standard deviations (SD) for a–c. e HMO and mucin oligomeric growth substrates in a–d. The HMO utilization loci in R. hominis (f) and R. inulinivorans (g) identified from proteomic analyses of cells growing on LNT and HMOs from mother’s milk, respectively, relative to glucose. Genes are denoted by their protein products: transcriptional regulator (Trans. R.); ABC transporter solute binding protein (RhLNBBP (f) and RiLea/bBP (g)); ABC transporter permease protein (PP); hypothetical proteins (HP); Glycoside hydrolase 136 (RhLnb136I, RhLnb136II (f) and RiLea/b136I, RiLea/b136II (g)); Glycoside hydrolase 112 (RhGLnbp112 (f) and RiGLnbp112 (g)); Glycoside hydrolase 29 (RiFuc29 (g)); Glycoside hydrolase 95 (RiFuc95 (g)) and histidine kinase sensory protein (His. K.). The proteomic analyses (f, g) were performed in biological triplicates and the log2-fold change from the label-free quantification of upregulated gene products is shown. Glycan structures are presented according to the Symbol Nomenclature for Glycans (SNFG) (https://www.ncbi.nlm.nih.gov/glycans/snfg.html). Source data are provided as a Source data file labelled with the corresponding figure number and panel definition.

Typically, bacteria repress pathways for less preferred substrates in the presence of a preferred carbon source. Xylotetraose from the abundant dietary plant fibre xylan has been shown to be a preferred growth substrate and uptake ligand of the xylo-oligosaccharide importer conserved in Roseburia29. During weaning, the HGM of infants is likely to be exposed to both HMOs and dietary plant fibres, e.g. xylan from cereals and fruits. We tested the growth of R. hominis in the presence of equimolar concentrations of LNT and the similarly sized xylotetraose to evaluate the utilization hierarchy of the HMO versus the plant fibre. Strikingly, monophasic growth was observed, consistent with the simultaneous uptake of both tetraoses from the culture supernatant (Supplementary Fig. 2a–c).

To unravel the basis of growth on HMOs with focus on the Roseburia genus, we analysed the proteomes of R. hominis and R. inulinivorans on LNT and the HMO mixture, respectively, relative to glucose. For R. hominis and R. inulinivorans, 15 and 62 proteins, respectively, were significantly upregulated (log2 fold change > 2). These differential proteomes were dominated by carbohydrate metabolism proteins, especially products of two loci (henceforth referred to as HMO utilization loci), both encoding an ATP-binding cassette (ABC) transporter, GH112 and GH136 enzymes with putative HMO activities, as well as sensory and transcriptional regulators (Fig. 1f, g). The HMO locus of R. inulinivorans is extended with two fucosidases of GH29 and GH95. The specificity-determining solute binding proteins (SBPs) of the ABC transporters of R. hominis (RhLNBBP) and R. inulinivorans (RiLea/bBP) were the first and fifth most upregulated proteins in the HMO proteomes, respectively. In addition, the GH112 LNB/GNB phosphorylases were within the top 3 and top 12 upregulated proteins in R. hominis and R. inulinivorans, respectively. In R. inulinivorans, two additional loci encoding sialic acid and fucose catabolism proteins were also upregulated (Supplementary Fig. 3).
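The log2 fold change criterion used above can be made concrete with a short sketch. It assumes label-free quantification intensities from triplicate cultures, takes the difference of the mean log2 intensities between the HMO and glucose conditions, and keeps proteins above the threshold of 2. The protein names and intensities are hypothetical placeholders, and the statistical test that accompanies such a filter in a real differential proteomics workflow is omitted.

```python
import math

def log2_fold_change(treated, control):
    """Difference of mean log2 intensities (treated minus control)."""
    mean_treated = sum(math.log2(v) for v in treated) / len(treated)
    mean_control = sum(math.log2(v) for v in control) / len(control)
    return mean_treated - mean_control

def upregulated(intensities, threshold=2.0):
    """Map protein name to log2 fold change for proteins above the threshold."""
    selected = {}
    for protein, (treated, control) in intensities.items():
        change = log2_fold_change(treated, control)
        if change > threshold:
            selected[protein] = change
    return selected

# Hypothetical intensities: (HMO triplicate, glucose triplicate) per protein.
example = {
    "SBP_example": ([3.2e8, 2.9e8, 3.5e8], [1.0e7, 1.2e7, 0.9e7]),
    "GH_example": ([8.0e7, 8.0e7, 8.0e7], [1.0e7, 1.0e7, 1.0e7]),
    "housekeeping_example": ([5.0e8, 5.2e8, 4.9e8], [5.1e8, 4.8e8, 5.0e8]),
}
hits = upregulated(example)
print(sorted(hits, key=hits.get, reverse=True))
```

Ranking the retained proteins by their fold change, as in the last line, corresponds to the "top upregulated" ordering referred to in the text.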

Diverse GH136 enzymes mediate initial HMO degradation

The homologs RhLnb136I (LnbY in B. longum) and RhLnb136II (LnbX that harbours the catalytic residues in B. longum) were highly co-upregulated in the LNT proteome of R. hominis (Fig. 1f). Both RhLnb136I and RhLnb136II lack a predicted transmembrane domain and signal peptide, in contrast to the B. longum counterparts (Supplementary Fig. 4a), indicative of the intracellular degradation of LNT in R. hominis. Only co-expression and co-purification of RhLnb136I and RhLnb136II resulted in an active lacto-N-biosidase (henceforth RhLnb136) (Fig. 2b, Supplementary Table 4). These findings and the observed co-upregulation suggested that a heterodimer (or oligomer) of RhLnb136I and RhLnb136II assembles into the catalytically active RhLnb136. Next, we demonstrated phosphorolysis of LNB and GNB to α-d-galactose-1-phosphate and the corresponding N-acetylhexosamines GlcNAc and GalNAc, respectively (Supplementary Fig. 5f), by the GH112 GNB/LNB phosphorylase (RhGLnbp112) located in the same locus (Fig. 1f and Supplementary Fig. 1). This enzyme has comparable specific activities for LNB and GNB (Supplementary Table 5), consistent with the growth on these disaccharides. The functional lacto-N-biosidase and GNB/LNB phosphorylase further support the HMO catabolism role of the locus.

Fig. 2: Specificities of GH136 enzymes that mediate the HMO degradation.

a Activity of RiLea/b136 on fucosylated HMOs. b Activity of RhLnb136 on LNT. c Activity of ErLnb136 on LNT. a–c The hydrolysates were analysed by MALDI-ToF MS without (b, c) or with (a) permethylation. In a, masses of methylated sugars are in parentheses and the ion peaks correspond to the Na+ adducts of the methylated sugars. a–c Relative intensity (percentage intensity) is shown. The MALDI-ToF MS analyses (a–c) were performed on independent triplicates (one analysis from each biological enzymatic reaction replicate) and all analyses yielded similar results.

RiGH136, the GH136 homolog from the HMO-upregulated locus in R. inulinivorans, was predicted to be extracellular, with a signal peptide in RiGH136II, which also possesses two C-terminal putative carbohydrate binding modules (Supplementary Fig. 4a), and a predicted N-terminal transmembrane domain in RiGH136I. Co-expression of RiGH136I and RiGH136II, lacking the transmembrane domain and signal peptide, respectively, resulted in an active enzyme with an unprecedented specificity. This enzyme (RiLea/b136) released Lewis a triose or Lewis b tetraose from fucosylated HMOs including lacto-N-fucopentaose II (LNFP II), lacto-N-difucohexaose I (LNDFH I) and lacto-N-difucohexaose II (LNDFH II) (Fig. 2a and Supplementary Fig. 5a). To our knowledge, cleavage of the bond at the reducing end of a fucosylated GlcNAc has not been reported to date. Next, we characterized the additional CAZymes encoded by the locus, all lacking a signal peptide or transmembrane domain, suggestive of their intracellular localization. We showed that the concerted action of RiFuc29 and RiFuc95, which act on α-(1 → 4)- and α-(1 → 2)-linked l-fucosyl, respectively, mediates the complete defucosylation of the putative products of RiGH136: Leb tetraose, Lea triose and H triose type I (Supplementary Fig. 5b–d). Initial defucosylation by RiFuc29 is required for RiFuc95 to release the 1 → 2 linked l-fucosyl in Leb tetraose. Finally, we showed that the GH112 from R. inulinivorans (RiGLnbp112) phosphorolyzes LNB and GNB equally efficiently (Supplementary Fig. 5e, Supplementary Table 5).

A domain with a new fold is required for GH136 activity

To discern the molecular architecture of GH136 enzymes and explain the requirement of the two subunits for activity, we endeavoured to crystallize both RhLnb136 and RiLea/b136, without success. Hence, we turned our attention to the taxonomically related E. ramulus, which has a GH136 locus similar to the one in R. hominis, except for a substitution of the GH112 phosphorylase with a GH42 β-galactosidase gene (Supplementary Fig. 1) that may confer hydrolysis of the LNB/GNB units30. E. ramulus and R. hominis also shared similar growth profiles on LNT (Fig. 1), which was consistent with the presence of a functional GH136. The N-terminal (ErLnb136I) and the C-terminal (ErLnb136II) regions of the E. ramulus GH136 share homology to RhLnb136I and RhLnb136II (Supplementary Figs. 1 and 4), respectively, suggestive of the fusion of ErLnb136I and ErLnb136II to form an active enzyme (ErLnb136). This is supported by the identical specificity and similar catalytic efficiencies of ErLnb136 and RhLnb136 (Supplementary Table 4). Moreover, intimate interaction of the two ErLnb136 domains is consistent with the cooperative unfolding profile of the enzyme (Supplementary Fig. 4b). These data justified the use of ErLnb136 to study the architecture of the two subunits/domains compulsory for activity within GH136. The crystal structures of selenomethionine (SeMet)-labelled and native ErLnb136 were determined at 1.4 and 2.0 Å resolution, respectively (Supplementary Table 6). The C-terminal catalytic domain (ErLnb136II, from AA 242-663) assumes a β-helix fold (Fig. 3) similar to the bifidobacterial homolog LnbX (Supplementary Table 7). The LNB molecule bound in the active site is recognized by ten potential hydrogen bonds and aromatic stacking of the Gal unit onto W548 (Fig. 3f and Supplementary Fig. 6a). Interestingly, the GlcNAc sugar ring of LNB in ErLnb136 adopts a ⁴E conformation (φ = 232° and ψ = 68°) with the O1-OH in a pseudo-axial position to form a direct hydrogen bond with the acid/base catalyst (D568) (Supplementary Fig. 6a). Moreover, the Oδ2 of the nucleophile D575 is positioned appropriately for nucleophilic attack on the anomeric carbon of the GlcNAc at 3.2 Å (Fig. 3f).

Fig. 3: Crystal structure of the GH136 lacto-N-biosidase from E. ramulus (ErLnb136).

a–c Overall structure and a semitransparent surface of ErLnb136 consisting of an N-terminal domain (ErLnb136I, cyan-blue) and a C-terminal β-helix domain (ErLnb136II, green). The enzyme is shown in a view orthogonal to the C-terminal β-helix domain (a), the view in a rotated by 180° (b) and a view along the axis of the C-terminal β-helix domain (c), to highlight the interaction of ErLnb136I and ErLnb136II. d A molecular surface top view of the active site and e a close-up view to illustrate the contribution of the ErLnb136I domain to the active site architecture, especially the tyrosine (Y145, magenta) that contributes to substrate affinity. f The weighted mFo-DFc omit electron density map (contoured at 4.0 σ) of the LNB unit (yellow sticks) in the active site. The water-mediated (water shown as a red sphere) and direct hydrogen bonds that recognize the LNB are shown as yellow dashed lines. d–f The catalytic nucleophile (D575) and catalytic acid/base residue (D568) are labelled in red. a–c Disordered regions (residues 180–199 and 225–241) are shown as orange dotted lines.

The N-terminal domain (ErLnb136I, from AA 7-224) consists of 8 α-helices (α1-α8) (Fig. 3a–c) and assumes a previously unknown fold, stabilized by the central helix α1. The protein most structurally related to ErLnb136I, a peptidyl-prolyl cis-trans isomerase with a chaperone activity from Helicobacter pylori (5EZ1), shares weak structural similarity restricted to helices α6 and α7 (Supplementary Fig. 6b, Supplementary Table 7). The ErLnb136I domain embraces the sides and back of the β-helix domain (Fig. 3a–c). These extensive inter-domain interactions (solvent inaccessible interface ≈1618 Å²) stabilize the protein structure with ΔG = −17 kcal mol−1. Remarkably, the α6-α7 loop of ErLnb136I forms a part of the active site with the solvent accessible sidechain of Y145 positioned near the active site (5.7 Å to the GlcNAc O1 atom of LNB) (Fig. 3d, e). The Y145A mutant showed a 4.9-fold higher KM (Supplementary Table 4, Supplementary Fig. 4c), suggesting that this residue contributes to substrate interactions, possibly at the +1 subsite.

Capture and uptake of HMOs by Roseburia

The proteomic analyses highlighted the putative protein apparatus required for growth on HMOs. The solute binding proteins (SBPs) of two ABC transporters in R. hominis and R. inulinivorans were among the top 8% of upregulated proteins, hinting at their involvement in the uptake of HMOs. Both SBPs recognized distinct HMOs and ligands from host glycans (Fig. 4, Supplementary Tables 2 and 3, Supplementary Fig. 7). The R. hominis SBP (LNB-binding protein, RhLNBBP) shows a preference for LNB followed by GNB and LNT, suggestive of the uptake of these ligands and their intracellular degradation by RhGLnbp112 and RhLnb136 as described above. By contrast, fucosyl-decorated Lewis b (Leb) tetraose and Lewis a (Lea) triose were the preferred ligands of the Lea/b binding protein (RiLea/bBP) from R. inulinivorans, followed by LNB and GNB, whereas no binding to LNT was detected (Fig. 4, Supplementary Table 3). The loss of the fucosyl unit at the terminal reducing GlcNAc reduced the affinity of RiLea/bBP about 5-fold for blood group H antigen triose type I (H triose type I) relative to Leb tetraose. The specificity of RiLea/bBP is highlighted by the lack of affinity for lacto-N-neotetraose (LNnT), blood group A antigen triose (A triose), lactose and 2′-fucosyllactose (2′-FL). These findings show that the products of RiLea/b136 are the preferred ligands for RiLea/bBP, consistent with the extracellular degradation of fucosylated pentaose and hexaose HMOs and uptake of their products by the ABC transporter. An equimolar mixture of Leb, Lea and H triose type I oligomers promoted the growth of R. inulinivorans to a similar final OD600 as glucose (Fig. 1b, c and Fig. 4b). The uptake profiles of these ligands reflected the preference of RiLea/bBP, consistent with uptake by the associated transporter (Fig. 4). This was also in accord with the utilization of larger fucosylated HMO structures observed during growth on purified HMOs from mother’s milk (Supplementary Fig. 2a–c). Notably, no uptake of LNT was observed, which is in excellent agreement with the poor growth (Fig. 1c) and with the lack of detectable binding to LNT by RiLea/bBP (Fig. 4a).
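The ligand preferences described above follow from comparing dissociation constants: a lower KD means tighter binding, and the ratio of two KD values gives the fold difference in affinity. The sketch below ranks ligands in this way. The KD numbers are arbitrary placeholders chosen only to reproduce the roughly 5-fold weaker binding of H triose type I relative to Leb tetraose stated in the text; they are not the measured values, which are given in Supplementary Table 3.

```python
def rank_by_affinity(kd_by_ligand):
    """Return (ligand, fold weaker than the best binder) pairs, tightest first."""
    best_kd = min(kd_by_ligand.values())
    ranked = sorted(kd_by_ligand.items(), key=lambda item: item[1])
    return [(ligand, kd / best_kd) for ligand, kd in ranked]

# Placeholder KD values in arbitrary concentration units (not measured data).
placeholder_kd = {
    "H triose type I": 5.0,
    "Leb tetraose": 1.0,
    "Lea triose": 1.5,
}
ranking = rank_by_affinity(placeholder_kd)
for ligand, fold in ranking:
    print(f"{ligand}: {fold:.1f}-fold relative to the best ligand")
```

Because only ratios are used, the ranking is independent of the concentration unit, provided all KD values share the same one.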

Fig. 4: Roseburia transport proteins mediate capture of HMOs and related host derived oligosaccharides.

a Binding analysis of HMOs and host derived oligosaccharides to RhLNBBP and RiLea/bBP. b, c Growth and uptake preference of R. inulinivorans on an equimolar mixture of Leb tetraose, Lea triose and H triose type I. b Growth level of R. inulinivorans on an equimolar mixture of Leb tetraose, Lea triose and H triose type I within 24 h, including a no-carbon control. c Time course of the relative percentages of Leb tetraose, Lea triose and H triose type I in culture supernatants from b, based on the HPAEC-PAD analyses presented in d. d Representative HPAEC-PAD chromatograms showing time course analysis of culture supernatants of R. inulinivorans grown on YCFA media supplemented with 1.5 mM Leb tetraose, 1.5 mM Lea triose and 1.5 mM H triose type I. Binding affinities (a) of RhLNBBP were determined by isothermal titration calorimetry (ITC), while binding affinities of RiLea/bBP were determined by surface plasmon resonance (SPR) due to low availability of the ligands, which is justified by the comparability of binding constants from these techniques17,31. Both analyses were performed in independent duplicates (n = 2) and the KD values are reported with error bars representing the error of the fit to the binding isotherms. Growth experiments (b) were performed as independent biological triplicates (n = 3) and triplicate HPAEC-PAD analyses (c, d) were performed (one analysis per biological replicate), whereby all HPAEC-PAD analyses yielded similar results.

These results established the capture of specific HMOs and related ligands by the above SBPs and the differentiation of their specificities, e.g. the preference of RiLea/bBP for ligands fucosylated at the terminal reducing GlcNAc.

Roseburia cross feeding on mucin

HMOs and O-glycans from glycolipids and glycoproteins including mucin share structural motifs. The high affinity of the SBPs from Roseburia for GNB from mucin suggested possible foraging of this substrate (and/or oligomers from glycoconjugates) and thereby a metabolic interplay of Roseburia with mucolytic HGM members. To evaluate possible mechanisms of cross-feeding, we compared Roseburia growth on mucin with and without the model mucin degrader Akkermansia muciniphila DSM 22959 (ref. 32).

A co-culture of R. hominis and R. inulinivorans displayed no growth within 24 h on a mucin mixture and only poor growth after 48 h (Supplementary Fig. 8a, b), in contrast to A. muciniphila, which grew well within 24 h. The co-culture of the two Roseburia species and A. muciniphila grew to a significantly higher OD600 than A. muciniphila alone (p < 3.7 × 10−6 at 24 h, p < 1.3 × 10−3 at 48 h) (Supplementary Fig. 8a). The growth of Roseburia is supported by a 4.5-fold higher butyrate level in the supernatants of the co-culture with A. muciniphila than in those of Roseburia alone (24 h). After 48 h, a slight increase in butyrate concentration was also detected in cultures containing only Roseburia, consistent with the growth data (Supplementary Fig. 8c).

To unveil the basis for the Roseburia growth, the proteomes of R. hominis and R. inulinivorans, both grown on glucose, were compared with those of co-cultures of Roseburia and A. muciniphila grown on mucin. For R. hominis and R. inulinivorans, 31 and 93 proteins, respectively, including several CAZymes, were significantly upregulated (log2 fold change > 2) relative to the glucose cultures. The transport protein RhLNBBP and RhGLnbp112 from the R. hominis HMO locus (Fig. 1f) were the 6th and 10th most upregulated proteins in the mucin proteome of R. hominis, respectively, indicative of a role of this locus in cross-feeding on host glycans (Supplementary Fig. 8e). In R. inulinivorans, the corresponding proteins RiLea/bBP and RiGLnbp112 were also significantly upregulated, with log2 fold changes of 2.77 and 4.71, respectively (Supplementary Fig. 8e). However, the top upregulated protein in the R. inulinivorans proteome was an SBP of an ABC transporter co-localised with genes encoding a blood group A- and B-cleaving endo-β-(1 → 4)-galactosidase (RiGH98), a putative α-galactosidase of GH36 and an α-l-fucosidase (GH29), which was the fourth most upregulated protein in the mucin proteome (Supplementary Fig. 8f). The upregulation of this locus suggested that R. inulinivorans possesses a functional machinery for directly accessing certain mucin oligomers. We expressed the predicted extracellular RiGH98 and demonstrated release of blood group A and B oligomers from mucin and related O-glycans (Supplementary Fig. 8g–h, Supplementary Data 1). The co-upregulation of a locus encoding a fucose utilization pathway (Supplementary Fig. 3a) is in accordance with the release of fucosylated oligomers by RiGH98. Another route of foraging was suggested by the high upregulation of the sialic acid catabolism pathway (Supplementary Fig. 3b), which likely confers the potent growth of R. inulinivorans on this substrate (Fig. 1d). The poor growth of R. inulinivorans on mucin in the absence of A. muciniphila suggests that the latter bacterium enables cross-feeding by releasing sialic acid, as R. inulinivorans lacks sialidases. Together with the ability of RiGH98 to access blood group A and B oligomers in mucin substrates, this may support better growth in the co-culture. These findings are consistent with the role of the HMO utilization machinery and additional functional operons in supporting co-growth with A. muciniphila on mucin.

The HMO utilization loci are prevalent in Roseburia

The HMO loci, defined by the co-occurrence of GH136 and GH112 genes, are conserved in five Roseburia reference genomes (Supplementary Fig. 1). To broadly examine the structure and conservation of these loci, the presence of homologs of the aforementioned genes was mapped across 4599 previously reconstructed Roseburia genomes33. As a reference signature for a central catabolic pathway, the presence of GH10 xylanase genes, compulsory for xylan utilization in R. intestinalis29, was also analysed. Strikingly, the GH112 and GH136 HMO utilization genes are about 2–3-fold more prevalent than the GH10 counterparts (Fig. 5a), indicative of the broader distribution of the HMO loci compared to the xylanase locus, which is mainly conserved in R. intestinalis. The GH136I and GH136II genes have a similar prevalence, which is ~30% lower than that of GH112. This overall trend is reiterated when we analyse individual species-level genome bins (SGBs), with some differences in the co-occurrence patterns of GH136 and GH112 genes (Fig. 5b). For example, while GH112 and GH136 have similar prevalence in R. hominis (SGB 4936), GH112 was 2.6 times more prevalent than GH136 in R. inulinivorans (SGB 4940). We analysed the organization of 818 loci, defined by the presence of a GH112 gene and at least one encoded subunit of the GH136, with a more stringent threshold (70% identity to the GH112 and GH136 sequences present in any of the 5 Roseburia reference genomes, see Supplementary Fig. 1). The gene clusters around the GH112 appeared to be SGB-specific (Fig. 5c), indicative of diversity of the loci within the genus. Analysis of the most representative gene contexts for each SGB (Fig. 5d) shows that genes for ABC transporters, GH136, and transcriptional regulators were those most frequently co-occurring with GH112 genes, which offers a robust signature of the Roseburia HMO utilization loci and validates their broad distribution. Additional CAZymes and carbohydrate metabolic genes were also frequently co-occurring in the vicinity of GH112 genes, suggesting that additional glycan utilization capabilities are clustered around the HMO loci.
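The prevalence and co-occurrence comparisons above reduce to simple counting once each genome is represented by the set of gene families detected in it. The following minimal sketch shows that bookkeeping on a toy collection of eight hypothetical genomes. The genome contents and function names are illustrative assumptions, and the homology search that assigns genes to GH112, GH136I, GH136II or GH10 is taken as already done.

```python
def prevalence(genomes, family):
    """Fraction of genomes that encode at least one member of `family`."""
    return sum(family in genome for genome in genomes) / len(genomes)

def co_occurrence(genomes, anchor, partner):
    """Fraction of genomes encoding `anchor` that also encode `partner`."""
    with_anchor = [genome for genome in genomes if anchor in genome]
    return sum(partner in genome for genome in with_anchor) / len(with_anchor)

# Toy data: each hypothetical genome is the set of gene families found in it.
toy_genomes = [
    {"GH112", "GH136I", "GH136II"},
    {"GH112", "GH136I", "GH136II", "GH10"},
    {"GH112"},
    {"GH112", "GH136I", "GH136II"},
    {"GH10"},
    set(),
    {"GH112", "GH136II"},
    set(),
]

gh112 = prevalence(toy_genomes, "GH112")
gh10 = prevalence(toy_genomes, "GH10")
print(f"GH112 prevalence: {gh112:.3f}")
print(f"GH112 relative to GH10: {gh112 / gh10:.1f}-fold")
print(f"GH136II among GH112 genomes: {co_occurrence(toy_genomes, 'GH112', 'GH136II'):.2f}")
```

Running the same two functions within each species-level genome bin, rather than over the pooled genomes, would give the per-SGB patterns summarized in the heat map.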

Fig. 5: The conservation and structure of HMO utilization loci in Roseburia.

a Global abundance of GH112, GH136I, GH136II and GH10 xylanase genes in 4599 Roseburia genomes illustrating the broad occurrence and conservation of the HMO utilization apparatus. b Heat map showing the segregation of GH112-containing genomes from a into different species-level genome bins (SGBs) and the corresponding relative abundance patterns of HMO utilization genes within each SGB. These data show the frequent co-occurrence of GH136 and GH112 genes, although some Roseburia strains encode only the GNB/LNB degrading GH112. c Principal coordinate analysis of 818 Roseburia gene landscapes defined stringently based on ≥70% identity to the GH112 and GH136 of any of the five reference Roseburia genomes displayed in Supplementary Fig. 1 and including 10 proteins up- and downstream of the GH112. d The most frequently occurring gene landscapes in each Roseburia SGB, anchored by aligning at the 3′ terminal of GH112 genes. The gene landscape analyses provide a signature for the HMO utilization loci that are defined by at least one GH112, a GH136, an ABC transporter, and a transcriptional regulator.

Discussion

Perturbation of the early life HGM assembly is associated with life-long effects on the immune and metabolic homeostasis of the host9,10,11,12. Breastfeeding is a key factor affecting the dynamics of the microbiota during infancy. Weaning marks a dramatic transition towards an adult-like structure of the HGM, which matures at the age of 2–3 and exhibits high resilience throughout adulthood7,8,22.

The critical window that precedes the maturation of the microbiota offers a unique opportunity for therapeutic interventions to address aberrant HGM states and thereby to prevent dysbiosis-related chronic disorders. To date, insight into the compositional transitions of the assembly of the microbiota during infancy6,7,8 is available, but the underpinning mechanisms, especially during weaning, remain elusive. Here, we describe previously unknown pathways that confer the growth of butyrate producing Clostridiales on distinct HMO motifs and related oligomers from host glyco-conjugates. These pathways may promote an early competitive adaptation advantage for Clostridiales that are associated with the healthy HGM and with the protection from metabolic and inflammatory disorders as well as colorectal cancer24,25,26,34.

We demonstrate that key butyrate producing Roseburia and Eubacterium spp. grow on complex HMOs purified from mother’s milk and on defined HMO motifs (Fig. 1a–d). Proteomic analyses revealed two highly upregulated genetic loci that encode distant homologs to a lacto-N-biosidase from B. longum28,35, GNB/LNB phosphorylases and ABC transporters in R. hominis and R. inulinivorans (Fig. 1f–g and Supplementary Fig. 1). The R. hominis locus (Figs. 1, 2 and 4, Supplementary Fig. 5e, Supplementary Tables 2, 4 and 5) supports growth on the HMO motifs LNT and LNB, whereas the R. inulinivorans locus confers growth on more complex HMOs, e.g. single and double fucosylated versions of LNT (Figs. 2a and 4, Supplementary Figs. 2 and 5). The specialization on different, but partially overlapping, HMOs and related Lewis a/b antigen oligomers from glycolipids or glycoproteins creates differential competitive catabolic niches. This specialization is evident from the divergence of the GH136 specificities. Thus, RhLnb136 and ErLnb136 are lacto-N-biosidases, whereas RiLea/b136 displays an unprecedented specificity that requires a Fuc-α-(1 → 4)-GlcNAc at subsite −1 and accommodates additional fucosylation at the −2 and +2 subsites (Fig. 2, Supplementary Fig. 5a and Supplementary Table 4). The preference for fucosylated substrates is consistent with an open active site brought about by the shortening of loops (ErLnb136: Loop 1 AA 330-341, Loop 2 AA 520-543, Supplementary Fig. 6c), which allows the accommodation of bulky fucosylated substrates. Remarkably, the GH136I subunits (or domains in ErGH136-like enzymes) have co-evolved with the GH136II counterparts that possess the catalytic residues (Supplementary Fig. 6d).

Our stability (Supplementary Fig. 4c), structural (Fig. 3 and Supplementary Fig. 6), biochemical (Supplementary Fig. 4, Supplementary Table 4) and phylogenetic analyses (Supplementary Fig. 6d) affirm the crucial role of the GH136I domain in the functionality of GH136 enzymes and provide compelling evidence for the association of the two GH136 domains. The sequence conservation of GH136I and GH136II was mapped on the structure of ErLnb136. Strikingly, highly conserved patches were identified across both domains (Supplementary Fig. 6e). In particular, parts of the α4-α5 loop and of the α5 helix in ErLnb136I that pack extensively onto ErLnb136II display globally conserved residues, together with the complementary co-conserved regions of ErLnb136II (Supplementary Fig. 6e). Moreover, the surface of ErLnb136I is positively charged and apolar at the interface with ErLnb136II, which is notably different from the negative potential on the surface of the rest of the enzyme (Supplementary Fig. 6f) and complementary to the interface surface of ErLnb136II. These results highlight the co-evolution of GH136 subunits or domains.

ABC transporters are a determinant of uptake selectivity and competitiveness in both bifidobacteria17,31,36 and R. intestinalis29. The two SBPs of the ABC importers located in the HMO loci of R. hominis and R. inulinivorans were within the top 5 upregulated proteins in each proteome in response to HMO utilization (Fig. 1), underscoring the critical role of oligosaccharide transport in the competitive gut niche. The preferences of the SBPs and GHs encoded by these loci appear aligned to confer efficient uptake and subsequent catabolism of preferred substrates (Figs. 2 and 4, Supplementary Fig. 5, Supplementary Tables 2, 3, 4 and 5). The LNB/GNB phosphorylases of GH112 are also conserved in the HMO loci (Supplementary Fig. 1). R. inulinivorans possesses additional CAZymes, notably different fucosidases for degradation of internalized fucosylated-oligomers (Supplementary Fig. 1 and 5b–d). Based on the proteomic analyses and the biochemical data, we propose a model for the two distinct routes for uptake and depolymerisation of HMOs in Roseburia and Eubacterium (Fig. 6 and Supplementary Fig. 9).

Fig. 6: Model for HMOs and related host glycan utilization by Roseburia and other Lachnospiraceae.

In R. hominis, LNT, LNB and the mucin derived GNB are captured by RhLNBBP for uptake into the cytoplasm and LNT is subsequently hydrolysed to LNB. Both LNB and GNB are phosphorolyzed by RhGLnbp112 into α-d-galactose-1-phosphate and the corresponding N-acetylhexosamines GlcNAc and GalNAc, respectively. Lactose is likely hydrolysed by a canonical β-galactosidase. In R. inulinivorans, initial hydrolysis of HMOs or O-glycans from glyco-lipids/proteins occurs at the outer cell surface by RiLea/b136, which has two C-terminal putative galactose-binding domains. The import of degradation products is mediated by the RiLea/bBP-associated ABC transporter. Fucosyl decorations are removed by the concerted activity of RiFuc95 and RiFuc29 before RhGLnbp112 phosphorolyzes the resulting LNB or imported GNB into monosaccharides, as described in R. hominis. Galactose and galactose-1-phosphate products are converted via the Leloir pathway to glucose-6-phosphate and N-acetylhexosamine sugars are converted to GlcNAc-6-phosphate before entering glycolysis. The pyruvate generated from glycolysis is partly converted to butyrate46. Roseburia inhabits the outer mucus layer47 together with A. muciniphila. R. inulinivorans cross-feeds on sialic acid and accesses β-(1 → 4)-linked blood group A and B oligosaccharides from mucin and glyco-lipids/proteins via RiGH98. Black solid arrows show enzymatic steps established or confirmed in this study. Black dotted arrows indicate steps based on literature. Grey dotted arrows indicate butyrate production by R. hominis and R. inulinivorans from mucin in co-culture with A. muciniphila. The glycan structure key is the same as in Fig. 1.

Butyrate producing bacteria of the Roseburia-Eubacterium group (Clostridiales order) are early colonizers of the infant gut6,8,37 and are prevalent members of the adult HGM22,23.

The origin of this taxonomic group is enigmatic, but their presence in the human milk microbiome has been reported38,39. Orthogonal transfer from mothers, based on the identification of the same Roseburia strains in mothers' faeces, milk and the infant guts40, has also been proposed. R. intestinalis type strains have been isolated from infant faeces41, hinting at the presence of this taxon before full transition to solid food.

We have previously shown that the abundance of distinct bifidobacteria in guts of breast-fed infants is strongly correlated to efficient ABC transporters that capture the 2′- and 3′-fucosyl-lactose HMOs with high affinity (KD ≈ 5 µM)17. The strains possessing these genes, e.g. from Bifidobacterium longum subspecies infantis, are not detected after weaning, as opposed to counterparts adept at utilizing plant-derived glycans. By contrast, the same Clostridium group XIVa strains that possess plant glycan utilization pathways29,42,43 retain HMO catabolic pathways. The simultaneous growth of R. hominis on LNT and the cereal derived xylotetraose (Supplementary Fig. 2a–c) demonstrates this catabolic plasticity, which likely confers an additional competitive advantage during weaning, when the dominant fucosyl lactose specialized Bifidobacterium community collapses due to sporadic supply of HMOs.

The loci that target HMOs also mediate cross-feeding on mucin or other glyco-conjugate oligomers, e.g. GNB from mucin and blood antigen structures, both captured efficiently by Roseburia transport proteins (Fig. 4a, Supplementary Tables 2 and 3). This is consistent with the significant butyrate production measured in co-cultures of Roseburia and A. muciniphila32 (Supplementary Fig. 8c) and the upregulation of GH136-containing loci in the mucin co-culture and HMO monocultures (Fig. 1 and Supplementary Fig. 8e). R. inulinivorans possesses an extensive mucolytic machinery revealed by the upregulation of fucose and sialic acid catabolism loci (Supplementary Fig. 3) as well as a blood group A and B locus (Supplementary Fig. 8f–g, Supplementary Data 1) that allows the release of β-(1 → 4)-linked blood group oligomers found in mucin and glyco-lipids on the surfaces of enterocytes44,45. This ability to access carbohydrates from mucin and host glyco-conjugates supports growth during periods of nutritional perturbations, which may increase the resilience of this taxonomic group.

Our bioinformatic analysis of the Roseburia genomes indicates that HMO utilization is a core trait within Roseburia, based on the ubiquitous presence of loci harbouring GH112 and GH136 genes (Fig. 5). The occurrence of SGBs that exclusively possess GH112 genes (e.g. SGB 4939, Fig. 5b) suggests that distinct strains are secondary degraders that cross-feed on released simple substrates, e.g. LNB and GNB. By contrast, the co-occurrence of GH112 and GH136 genes (Fig. 5b) offers a signature for primary degraders that are able to access more complex glycans from HMOs or host glyco-conjugates.
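
The gene-content signature described above can be written as a simple rule. The following is a minimal sketch, assuming each genome is summarized as a collection of glycoside hydrolase family labels; the function name, the input format and the labels returned are hypothetical illustrations and not the pipeline used in this study.

```python
def classify_degrader(gh_families):
    """Assign a putative role from the GH112/GH136 signature (illustrative only)."""
    families = set(gh_families)
    has_gh112 = "GH112" in families  # LNB/GNB phosphorylase family
    has_gh136 = "GH136" in families  # lacto-N-biosidase family
    if has_gh112 and has_gh136:
        # Both families present: able to access complex glycans directly.
        return "putative primary degrader"
    if has_gh112:
        # GH112 only: expected to cross-feed on released LNB/GNB.
        return "putative secondary degrader"
    return "unclassified"


# Hypothetical genome summaries, not data from the study.
print(classify_degrader(["GH112", "GH136", "GH29"]))
print(classify_degrader(["GH112"]))
```

The rule deliberately returns "unclassified" for any other combination, since the text only assigns roles to the two signatures named above.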

In conclusion, the present study sets the stage for a mechanistic understanding of the assembly of physiologically important core groups in the early life microbiota and discloses previously unknown roles of HMOs in selection of Clostridiales. Additional studies are required to further address the paramount, but poorly understood maturation of the early life microbiota.

Methods

Chemicals and carbohydrates

Human milk and blood antigen oligosaccharides used in this study are described in Table S1. N-acetylneuraminic acid (Neu5Ac), α-d-galactose-1-phosphate (Gal1P) and α-l-fucose (Fuc) were from Carbosynth and xylotetraose was from Megazyme. Galactose (Gal), glucose (Glc), N-acetylglucosamine (GlcNAc), N-acetylgalactosamine (GalNAc) and porcine gastric mucin type III (PGM) were from Sigma Aldrich. Bovine submaxillary mucin (BSM) was from VWR. 2-aminoanthranilic acid (2-AA) was from Nacalai Tesque and pooled human milk samples were purchased from Hvidøvre hospital (Hvidøvre, Denmark). All chemicals were of analytical grade unless otherwise stated.

Enzymatic production of LNB and GNB

LNB and GNB for growth were produced enzymatically with the GH112 galacto-N-biose/lacto-N-biose phosphorylase (EC 2.4.1.211) from R. hominis (RhGLnbp112). In detail, 100 mM Gal1P and 300 mM of the corresponding N-acetylhexosamine (GlcNAc or GalNAc) in 50 mM MES, 150 mM NaCl, pH 6.5 were incubated with 10 µM RhGLnbp112 for 36 h at 30 °C. After incubation, 2.5 volumes of ice-cold ethanol (99%) were added, samples were incubated at –20 °C for 2 h and centrifuged (10,000×g, 30 min at 4 °C) to remove the enzyme. Supernatants were concentrated by rotary evaporation and disaccharides were desalted in ultrapure water (milliQ) using a HiPrep Desalt column (GE Healthcare, Denmark) on an Äkta avant chromatograph (GE Healthcare). Elution was monitored by measuring A235 nm and pooled fractions were freeze dried. Further purification was accomplished by high-performance liquid chromatography (HPLC) (UltiMate 3000, Dionex) using a TSKgel® Amide 80 column (4.6 × 250 mm) and a TSKgel® Amide 80 guard column (4.6 × 10 mm) (VWR) by loading LNB or GNB dissolved in the mobile phase (75% (v/v) acetonitrile, ACN) and an isocratic elution at 1 mL min−1. Purity of collected fractions (2 mL) was analysed by thin layer chromatography (TLC) using 5 mM standards of GalNAc, GlcNAc, Gal1P and LNB/GNB. Fractions containing pure LNB/GNB were pooled, ACN was removed by speed vacuum evaporation and samples were lyophilized until further use.

Purification of human milk oligosaccharides

HMOs were purified from pooled human milk samples48,49. Milk fat was separated by centrifugation (10,000 × g, 30 min at 4 °C) and proteins were removed by ethanol precipitation (as above). The supernatant was concentrated by rotary evaporation, buffered with 2 volumes 100 mM MES, 300 mM NaCl, pH 6.5 and lactose was digested with β-galactosidase from Kluyveromyces lactis (Sigma Aldrich) (20 U mL−1, 3 h at 37 °C). The enzyme was precipitated with ethanol (as before) and the supernatant was concentrated by rotary evaporation. Residual lactose and monosaccharides were removed by solid-phase extraction (SPE) using 12 mL graphitized Supelclean™ ENVI-Carb™ columns (Supelco) with a bed weight of 1 g. For SPE, columns were activated with 80% (v/v) ACN containing 0.05% (w/v) formic acid (FA) and equilibrated with buffer A (with 4% (v/v) ACN, 0.05% (w/v) FA), which was also used to dilute the samples prior to loading. After sample loading, the columns were washed (6 column volumes of buffer A) to remove lactose and monosaccharides before oligosaccharides were eluted with 40% (v/v) ACN, 0.05% (w/v) FA. Eluted oligosaccharides were concentrated in a speed vacuum concentrator, freeze-dried and dissolved in milliQ prior to usage.

Purity of HMOs was verified by high-performance anion exchange chromatography with pulsed amperometric detection (HPAEC-PAD) on an ICS-5000 (Dionex) system with a 3 × 250 mm CarboPac PA200 column (Thermo Fisher), a 3 × 50 mm CarboPac guard column (Thermo Fisher) and 10 µL injections. HMOs were eluted with a stepwise linear gradient of sodium acetate: 0–7.5 min of 0–50 mM, 7.5–25 min of 50–150 mM and 25–35 min of 150–400 mM, at a flow rate of 0.35 mL min−1 and a mobile phase of constant 0.1 mM NaOH. Standards (0.01–0.5 mM) of lactose, galactose and glucose in milliQ were used to quantify these residual sugars as described above. The analysis was performed in triplicate and the residual content of these sugars was <2% (w/w) of the purified HMO mixture.
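
The stepwise linear sodium acetate gradient can be expressed as a piecewise function of time. The sketch below is illustrative only: the function name and the table layout are assumptions, while the breakpoints are taken from the gradient stated above.

```python
# (start min, end min, start mM, end mM) for each linear segment of the gradient.
GRADIENT_SEGMENTS = [
    (0.0, 7.5, 0.0, 50.0),
    (7.5, 25.0, 50.0, 150.0),
    (25.0, 35.0, 150.0, 400.0),
]


def acetate_mM(t_min):
    """Return the programmed sodium acetate concentration (mM) at time t_min."""
    for t0, t1, c0, c1 in GRADIENT_SEGMENTS:
        if t0 <= t_min <= t1:
            # Linear interpolation within the segment.
            return c0 + (c1 - c0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the 0-35 min gradient")


for t in (0.0, 7.5, 16.25, 35.0):
    print(t, acetate_mM(t))
```

The function returns the programmed concentration only; the concentration actually reaching the column lags behind by the system dwell volume, which is not modelled here.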

Isolation and purification of porcine mucins

The commercial porcine gastric mucin (PGM) was further purified50. In short, 20 g PGM was stirred for 20 h at 25 °C in 20 mM phosphate buffer, 100 mM NaCl, pH 7.8 (adjusted to pH 7.2 after the first 2 h using 2 M NaOH). Insoluble residues were removed by centrifugation (10,000×g, 30 min at 4 °C) and soluble mucin was precipitated by the addition of 3 volumes of ice cold ethanol (99%) and incubation for 18 h at 4 °C. Precipitated mucin was dialyzed 5 times against 200 volumes milliQ for 16 h at 4 °C, using a 50 kDa molecular weight cut off membrane (Spectra, VWR) and afterwards freeze dried.

Porcine colonic mucin was isolated from five fresh pig colons from the slaughterhouse of Danish Crown (Horsens, Denmark). Pig colons were processed on site and immediately placed on dry ice to ensure quick cooling during transport. Colons were opened longitudinally and content was removed mechanically and by washing with ice cold 0.9% (w/v) NaCl until no digesta was visible. The cleaned luminal surface was quickly dried with absorptive paper and the mucosa was scraped off with a blunt metal spatula and subsequently transferred into a pre-cooled glass beaker, whereby visible fat was removed and discarded. Mucin was then purified as previously described51. Isolated mucin was immersed in 10 volumes extraction buffer (10 mM sodium phosphate buffer, 6 M guanidine hydrochloride (GuHCl), 5 mM ethylenediaminetetraacetic acid (EDTA), 5 mM N-ethylmaleimide, pH 6.5) and gently stirred overnight at 4 °C. Soluble impurities and floating fat were separated by centrifugation (10,000×g, 30 min at 4 °C), pelleted mucin was dissolved in 10 volumes extraction buffer and incubated for 3 h at room temperature again. Soluble impurities were removed by centrifugation as described before. Short incubation (3 h) extraction steps were repeated 7 times until the supernatant was clear for at least two repeated extractions. Afterwards insoluble mucin was solubilized by reduction in 0.1 M Tris, 6 M GuHCl, 5 mM EDTA, 25 mM dithiothreitol (DTT) pH 8, for 5 h at 37 °C and subsequent alkylation through the addition of 65 mM iodoacetamide and incubation in the dark for 18 h at 4 °C. Soluble mucin was dialyzed 6 times against 200 volumes milliQ using a 50 kDa MWCO dialysis bag for 6 h at 4 °C and freeze dried.

Cloning, expression and purification of proteins

Open reading frames encoding proteins from R. hominis DSM 16839, R. inulinivorans DSM 16841 and E. ramulus DSM 15684 were cloned without signal peptide or transmembrane domain from genomic DNA using In-Fusion cloning (Takara) and the primers in Table S8 into the EcoRI and NcoI restriction sites of the corresponding plasmids, to encode proteins with either a cleavable N- or C-terminal His6 tag. The pETM11 plasmid was used (from G. Stier, EMBL, Center for Biochemistry, Heidelberg, Germany)52, except for RHOM_04110 (RhLnb136I) and ROSEINA2194_01899 (RiLea/b136I), which were cloned into pET15b (Novagen). Recombinant proteins were expressed in E. coli BL21 ΔlacZ (DE3)/pRARE2 and purified following standard protocols using His-affinity and size-exclusion chromatography. Mutants of E. ramulus HMPREF0373_02965 (ErLnb136) were constructed using QuikChange II Site-Directed Mutagenesis (Agilent) with pETM11_HMPREF0373_02965 as template. Primers used for site-directed mutagenesis are listed in Table S8 and mutants were produced as described above. l-Selenomethionine (l-SeMet) labelled protein expression of ErLnb136 was performed by introducing the corresponding plasmid into E. coli B834 (DE3) and culturing the transformed cells in a synthetic M9 based medium of the SelenoMet labelling Kit (Molecular Dimensions) supplemented either with l-methionine or l-SeMet (both to 50 µg mL−1). The l-SeMet labelled protein was purified as described above.

Growth experiments and single strain proteomics analysis

R. hominis DSM 16839, R. inulinivorans DSM 16841, E. ramulus DSM 15684 and E. ramulus DSM 16296 were grown anaerobically at 37 °C using a Whitley DG250 Anaerobic Workstation (Don Whitley Scientific). R. hominis and R. inulinivorans were propagated in YCFA medium41, while for E. ramulus strains CFA medium (modified YCFA medium lacking yeast extract to minimize E. ramulus growth on yeast extract) was used. Growth media were supplemented with 0.5% (w/v) carbohydrates sterilized by filtration (soluble carbohydrates, 0.45 µm filters) or autoclaving (mucins, 15 min at 121 °C) and cultures were performed in at least biological triplicates unless otherwise indicated. Bacterial growth was monitored by measuring OD600 nm and pH (for co-culture experiments). For growth experiments performed in microtiter plates, a Tecan Infinite F50 microplate reader (Tecan Group Ltd) located in the anaerobic workstation was used and growth was followed by measuring OD595 nm. An unpaired two-tailed Student’s t-test was used to determine the statistical significance of differences in the growth level reached between the different culture conditions and non-carbohydrate controls.
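
For illustration, the test statistic of an unpaired Student's t-test with pooled variance can be computed as sketched below. The OD values are hypothetical placeholders, not data from this study, and the function name is an assumption. The sketch stops at the t statistic and the degrees of freedom; obtaining the two-tailed p-value additionally requires the t-distribution, for example through a statistics library such as SciPy (`scipy.stats.ttest_ind`).

```python
import math
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for an unpaired equal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical maximum OD600 values for a carbohydrate culture and a no-carbohydrate control.
with_carbohydrate = [0.82, 0.79, 0.85]
no_carbohydrate = [0.10, 0.12, 0.11]
t_stat, dof = student_t(with_carbohydrate, no_carbohydrate)
print(round(t_stat, 2), dof)
```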

For differential proteome analyses, R. hominis and R. inulinivorans were grown in 200 µL YCFA (1.5 mL Eppendorf tubes) to mid-late exponential phase (OD600 ~0.5–0.8) in four biological replicates. For R. hominis YCFA was supplemented with 0.5% (w/v) LNT or glucose and for R. inulinivorans 1% (w/v) HMOs or glucose was used as carbon source. Cells were harvested by centrifugation (5000×g, 5 min at 4 °C), washed twice with ice cold 0.9% (w/v) NaCl, resuspended in 20 µL lysis buffer (50 mM HEPES, 6 M GuHCl, 10 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP), 40 mM 2-chloroacetamide (CAA) pH 8.5) and stored at −80 °C for proteomics analysis.

Co-culture experiment and proteomics analyses

R. hominis, R. inulinivorans and A. muciniphila DSM 22959 were grown in 10 mL YCFA to mid-late exponential phase (OD600 ~0.6–0.7). From these pre-cultures, equal amounts of cells (OD600) were used to inoculate 30 mL fresh YCFA medium with 1% (w/v) of a mucin mixture (0.6% (w/w) PGM, 0.2% (w/w) porcine colonic mucin (PCM), 0.2% (w/w) BSM) or 1% (w/v) glucose to a start OD600 ~0.01. All cultures were performed in four biological replicates and growth was followed (OD600 and pH) at 0, 6, 8, 12, 16, 24, and 48 h. Samples (2 mL) were collected for proteomics analyses after 16 h and for SCFA quantification after 24 and 48 h. Samples were immediately cooled on ice and cells were harvested by centrifugation (5000 × g, 10 min at 4 °C). For proteomics, cell pellets were washed twice with ice cold 0.9% (w/v) NaCl, resuspended in 60 µL lysis buffer and stored at −80 °C until proteomics analysis. Collected culture supernatants for SCFA quantification were sterile filtered (0.45 µm filters) and stored at −80 °C for further analysis.

Sample preparation for mass spectrometry

Samples were processed using a previously established protocol53,54. Cells were lysed by boiling (5 min 95 °C) followed by bead beating (3 mm beads, 30 Hz for 1 min) (TissueLyser II, Qiagen) and sonication bath (3 × 10 s at 4 °C) (Bioruptor, Diagenode). Lysates were centrifuged (14,000×g, 10 min at 4 °C) and soluble protein concentrations were determined by a Bradford assay (Thermo Fisher Scientific). For digestion, 20 µg protein were diluted 1:3 with 50 mM HEPES, 10% (v/v) ACN, pH 8.5 and incubated with LysC (MS grade, Wako) in a ratio of 1:50 (LysC:protein) for 4 h at 37 °C. Subsequently, samples were diluted to 1:10 with 50 mM HEPES, 10% (v/v) ACN, pH 8.5 and further digested with trypsin (MS grade, Promega) in a ratio of 1:100 for 18 h at 37 °C. Next, samples were diluted 1:1 with 2% (w/v) trifluoroacetic acid (TFA) to quench enzymatic activity and peptides were processed for mass spectrometry using in house packed stage tips55 as described below.

Peptides from single strain cultures were desalted using three discs of C18 resin packed into a 200 µL tip and activated by successive loading of 40 µL of MeOH and 40 µL of 80% (v/v) ACN, 0.1% (w/v) FA by centrifugation at 1800×g and equilibrated twice with 40 µL of 3% (v/v) ACN, 1% (w/v) FA before samples were loaded in steps of 50 µL. After loading, tips were washed three times with 100 µL 0.1% (w/v) TFA and peptides were eluted in two steps with 40 µL each of 40% (v/v) ACN, 0.1% (w/v) FA into a 0.5 mL Eppendorf LoBind tube. Peptides derived from co-cultures were desalted and fractionated using strong cation exchange (SCX) chromatography filter plugs (3M Empore). Per sample, 6 SCX discs were packed into a 200 µL tip and tips were activated and equilibrated by loading 80 µL (v/v) of ACN and then 80 µL of 0.2% (w/v) TFA. Samples were applied in 50 µL steps and tips were washed twice with 600 µL 0.2% (w/v) TFA. Subsequently, peptides were eluted stepwise in 3 fractions with 60 µL of 125 mM NH4OAc, 20% (v/v) ACN, 0.5% (w/v) FA, then with 60 µL of 225 mM NH4OAc, 20% (v/v) ACN, 0.5% (w/v) FA and lastly with 5% (v/v) NH4OH, 80% (v/v) ACN into 0.5 mL Eppendorf LoBind tubes. Eluted peptides were dried in an Eppendorf Speedvac (3 h at 60 °C) and reconstituted in 2% (v/v) ACN, 1% (w/v) TFA prior to mass spectrometry (MS) analysis.

LC-MS/MS

Peptides from biological triplicates of each culture condition were loaded on the mass spectrometer by reverse phase chromatography through an inline 50 cm C18 column (Thermo EasySpray ES803) connected to a 2 cm long C18 trap column (Thermo Fisher 164705) using a Thermo EasyLc 1000 HPLC system. Peptides were eluted with a gradient of 4.8–48% (v/v) ACN, 0.1% (w/v) FA at 250 nL min−1 over 260 min (samples from single strain cultures) or 140 min (SCX fractionated samples from co-cultures) and analysed on a Q-Exactive instrument (Thermo Fisher Scientific) run in a data-dependent manner using a Top 10 method. Full MS spectra were collected at 70,000 resolution, with an AGC target set to 3 × 10^6 ions or maximum injection time of 20 ms. Peptides were fragmented via higher-energy collision dissociation (normalized collision energy = 25). The intensity threshold was set to 1.7 × 10^6, dynamic exclusion to 60 s and ions with a charge state <2 or unknown species were excluded. MS/MS spectra were acquired at a resolution of 17,500, with an AGC target value of 1 × 10^6 ions or a maximum injection time of 60 ms. The scan range was limited to 300–1750 m/z.

Protein label free quantification in bacterial co-cultures

Proteome Discoverer versions 2.2 and 2.3 were used to process and analyse the raw MS data files and label free quantification was enabled in the processing and consensus steps. The spectra from single strains proteomics were matched against the proteome database of R. hominis DSM 16839 (ID: UP000008178) or R. inulinivorans DSM 16841 (ID: UP000003561) respectively, as obtained from Uniprot. The spectra from co-culture experiments were searched against a constructed database consisting of the reference proteomes of the two Roseburia strains (as above) and A. muciniphila DSM 22959 (ID: UP000001031). For spectral searches, oxidation (M), deamidation (N, Q) and N-terminal acetylation were specified as dynamic modifications and cysteine carbamidomethylation was set as a static modification. Obtained results were filtered to a 1% FDR and protein quantitation was done by using the built-in Minora Feature Detector. For analysis of the label-free quantification data, proteins were considered present if at least two unique peptides (as defined in Proteome Discoverer) were identified and proteins had to be identified in at least two out of the three samples analysed per culture condition with high confidence.
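
The presence criterion used for the label-free data can be sketched as a simple filter. This is an illustrative sketch only: the record layout (a dictionary with a unique peptide count and one high-confidence flag per replicate) and the function name are assumptions and do not reflect the Proteome Discoverer export format.

```python
def is_present(protein, min_unique_peptides=2, min_samples=2):
    """Apply the presence rule: enough unique peptides and enough confident replicates."""
    enough_peptides = protein["unique_peptides"] >= min_unique_peptides
    # Count replicates in which the protein was identified with high confidence.
    enough_samples = sum(protein["high_confidence"]) >= min_samples
    return enough_peptides and enough_samples


# Hypothetical records for one culture condition with three analysed samples.
proteins = [
    {"id": "P1", "unique_peptides": 5, "high_confidence": [True, True, False]},
    {"id": "P2", "unique_peptides": 1, "high_confidence": [True, True, True]},
    {"id": "P3", "unique_peptides": 3, "high_confidence": [True, False, False]},
]
present = [p["id"] for p in proteins if is_present(p)]
print(present)
```

In this toy input only P1 passes: P2 has a single unique peptide and P3 is confidently identified in only one of three samples.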

Relative bacterial abundance in co-cultures was estimated based on strain unique peptides identified with Unipept version 4.056. To exclude peptides shared between closely related strains from the analyses, all peptide sequences quantified via Proteome Discoverer were imported into the Unipept web server and analysed with the settings Equate I and L and Advanced missed cleavage handling activated. The normalized sum of intensities of the resulting taxonomically distinctive peptides was then used for assessing relative abundances of each strain.
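
The abundance estimate can be illustrated with a short sketch. It assumes that normalization means dividing each strain's summed intensity by the total over all strain-specific peptides, which the text does not specify; the peptide sequences, intensities and function names are hypothetical, and the peptide-to-strain mapping stands in for the taxonomic assignment obtained from Unipept.

```python
def equate_il(sequence):
    """Treat isoleucine and leucine as identical, mirroring the 'Equate I and L' setting."""
    return sequence.replace("I", "L")


def relative_abundance(peptide_intensities, peptide_to_strain):
    """Sum intensities of strain-specific peptides and normalise to fractions."""
    totals = {}
    for sequence, intensity in peptide_intensities.items():
        strain = peptide_to_strain.get(equate_il(sequence))
        if strain is None:
            continue  # shared or unassigned peptide: excluded from the estimate
        totals[strain] = totals.get(strain, 0.0) + intensity
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no strain-specific peptide intensity to normalise")
    return {strain: total / grand_total for strain, total in totals.items()}


# Hypothetical peptides and intensities; "EEK" is treated as shared and dropped.
intensities = {"AIK": 30.0, "CCR": 10.0, "DDK": 60.0, "EEK": 5.0}
strain_of = {"ALK": "R. hominis", "CCR": "R. hominis", "DDK": "A. muciniphila"}
abundances = relative_abundance(intensities, strain_of)
print(abundances)
```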

Butyrate quantification

Butyrate in culture supernatants was quantified by HPLC coupled to a refractive index detector (RID) and diode array detector (DAD) on an Agilent HP 1100 system (Agilent). Standards of butyric acid (0.09–50 mM) were prepared in 5 mM H2SO4 for peak identification and quantification. Samples from four biological replicates were analysed by injecting 20 µL of standard or filtered (0.45 µm filter) culture supernatant on a 7.8 × 300 mm Aminex HPX-87H column (Biorad) combined with a 4.6 × 30 mm Cation H guard column (Biorad). Elution was performed with a constant flow rate of 0.6 mL min−1 and a mobile phase of 5 mM H2SO4. Standards were analysed as above in technical triplicates.
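
Quantification against external standards amounts to fitting a calibration line and inverting it for each sample. The sketch below assumes a linear detector response over the standard range and uses hypothetical peak areas; the function names are illustrative and the calculation is not taken from the instrument software.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to obtain a concentration from a peak area."""
    return (area - intercept) / slope


# Hypothetical butyric acid standards (mM) and peak areas (arbitrary units).
standards_mM = [0.09, 1.0, 10.0, 50.0]
peak_areas = [2.0 * c + 1.0 for c in standards_mM]
slope, intercept = fit_line(standards_mM, peak_areas)
print(concentration_from_area(21.0, slope, intercept))
```

Samples whose peak area falls outside the range spanned by the standards would need dilution or additional standards rather than extrapolation.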

Oligosaccharide uptake preference of Roseburia spp

R. hominis was grown anaerobically in 250 µL YCFA medium with 0.5% (w/v) of an equal mixture of xylotetraose and LNT in biological triplicates. Samples (20 µL) were taken after 0, 3.5, 5.5, 6.5, 8, 9.5 and 24 h, diluted 10-fold in ice cold 100 mM NaOH and centrifuged (10 min at 5000×g at 4 °C) before supernatants were stored at −20 °C until the HPAEC-PAD analysis. Standards of 0.5 mM xylotetraose and LNT were prepared in 100 mM NaOH and used to identify corresponding peaks in the chromatograms. Samples or standard were injected (2 µL injections) on a 4 × 250 mm CarboPac PA10 column with a 4 × 50 mm CarboPac guard column and eluted isocratically (0.750 mL min−1, 100 mM NaOH, 10 mM NaOAc). The analysis was performed from a biological triplicate and standards were analysed in technical duplicates.

For determining uptake preference of Leb tetraose, Lea triose and blood group H triose type I, R. inulinivorans was grown anaerobically in 200 µL YCFA medium supplied with an equal mixture of 1.5 mM Leb tetraose, 1.5 mM Lea triose and 1.5 mM blood group H triose type I in biological triplicates. Samples (10 µL) were taken after 0, 3.5, 5.5, 6.5, 8, 9.5 and 24 h, diluted 10-fold in ice cold 20 mM NaOH and centrifuged (10 min at 5000×g at 4 °C) before supernatants were stored at −20 °C until the HPAEC-PAD analysis. Standards of 0.1 mM Leb tetraose, Lea triose and blood group H triose type I were prepared in 20 mM NaOH and used to identify corresponding peaks in the chromatograms. Samples or standard were injected (10 µL injections) on a 4 × 250 mm CarboPac PA10 column with a 4 × 50 mm CarboPac guard column and eluted isocratically (0.750 mL min−1, 50 mM NaOH). The analysis was performed from a biological triplicate and standards were analysed in technical duplicates.

For investigating mixed HMO uptake, R. inulinivorans was grown anaerobically in 300 µL YCFA medium with 0.5% (w/v) of mixed HMOs purified from mother’s milk or 0.5% (w/v) of mixed HMOs purified from mother’s milk but previously digested with RiLea/b136 (0.5 µM RiLea/b136 for 18 h) in biological triplicates. Samples (10 µL) were taken after 0 and 24 h, diluted 10-fold in ice cold 20 mM NaOH and centrifuged (10 min at 5000×g at 4 °C) before supernatants were stored at −20 °C until the HPAEC-PAD analysis. Standards of 0.1 mM Leb tetraose, Lea triose, blood group H triose type I, LNDFH I, Lactose, 2′FL and LNT were prepared in 20 mM NaOH and used to identify corresponding peaks in the chromatograms. Samples or standard were injected (10 µL injections) on a 4 × 250 mm CarboPac PA200 column with a 4 × 50 mm CarboPac guard column and eluted isocratically (0.350 mL min−1, 50 mM NaOH). The analysis was performed from a biological triplicate and standards were analysed in technical duplicates.

Enzyme activity assays

Enzymatic activity assays were carried out in 50 mM MES, 150 mM NaCl, 0.005% (v/v) Triton X-100, pH 6.5 standard assay buffer and in triplicates unless otherwise stated.

Hydrolysis kinetics and specific activities of the GH136 lacto-N-biosidases were measured using a coupled enzymatic assay to monitor lactose release. The lactose was hydrolysed with a β-galactosidase (used above) and the resulting glucose was oxidized with a glucose oxidase (Sigma Aldrich) concomitant with the production of H2O2, measured by coupling to horseradish peroxidase (Sigma Aldrich) oxidation of 4-aminoantipyrine and 3,5-dichloro-2-hydroxybenzenesulfonic acid. Reactions were prepared in 96-well microtiter plates to a final volume of 150 µL, containing substrate, lacto-N-biosidase, β-galactosidase (150 U mL−1), glucose oxidase (150 U mL−1), horseradish peroxidase (150 U mL−1), 10 mM 3,5-dichloro-2-hydroxybenzenesulfonic acid, 1 mM 4-aminoantipyrine in standard assay buffer. Reactions were performed at 37 °C and A515 nm was measured in 5 s intervals for 30 min. Blanks were prepared by substituting lacto-N-biosidase with standard assay buffer in the reaction mixture and a lactose standard (3–500 µM) was used for the quantification.

Hydrolysis kinetics of RhLnb136 (40 nM) and ErLnb136 (10 nM) towards LNT (0.2–5 mM for RhLnb136 and 0.1–2.5 mM for ErLnb136) were determined as described above. The kinetic parameters KM and kcat were calculated by fitting the Michaelis-Menten equation to the initial rate data using OriginPro 2018b and OriginPro 2019b (OriginLab). Lacto-N-biosidase specific activity of RiLea/b136 (1.2 µM) was measured as described above using 3.5 mM LNT. The specific activity was expressed in units (U) mg−1 enzyme, where a unit is defined as the amount of enzyme that releases 1 µmol lactose min−1 quantified as above.
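
As an illustration of how KM and kcat follow from initial rate data, the sketch below estimates the parameters with a Hanes-Woolf linearization, [S]/v = [S]/Vmax + KM/Vmax. This is a swapped-in, simplified technique and not the nonlinear fit performed in OriginPro; the rate data are synthetic, generated from assumed parameters, and all names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hanes_woolf(substrate_mM, rate_uM_per_s):
    """Estimate (KM in mM, Vmax in uM s^-1) from a plot of [S]/v against [S]."""
    ratio = [s / v for s, v in zip(substrate_mM, rate_uM_per_s)]
    slope, intercept = fit_line(substrate_mM, ratio)
    v_max = 1.0 / slope          # slope = 1 / Vmax
    k_m = intercept * v_max      # intercept = KM / Vmax
    return k_m, v_max


# Synthetic, noise-free rates from assumed KM = 1.0 mM and Vmax = 2.0 uM s^-1.
substrate = [0.2, 0.5, 1.0, 2.0, 5.0]
rates = [2.0 * s / (1.0 + s) for s in substrate]
k_m, v_max = hanes_woolf(substrate, rates)
enzyme_uM = 0.04                 # 40 nM enzyme, as stated for RhLnb136
k_cat = v_max / enzyme_uM        # turnover number in s^-1
print(k_m, v_max, k_cat)
```

Linearizations distort the error structure of real measurements, which is one reason a direct nonlinear fit of the Michaelis-Menten equation, as used in the study, is generally preferred.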

Specific activities of RhGLnbp112 and RiGLnbp112 towards LNB and GNB were assayed in 50 mM sodium phosphate buffer, 150 mM NaCl, 0.005% (v/v) Triton X-100, pH 6.5. Reactions (150 µL) were incubated for 10 min at 37 °C with 20 nM enzyme and 2 mM substrate. Aliquots of 15 µL were removed every minute and quenched in 135 µL 0.2 M NaOH. Standards of Gal1P (0.02–5 mM) were prepared in 0.2 M NaOH and were used to quantify the concentrations of released Gal1P in the quenched reaction samples. Both quenched reactions and standards were examined by HPAEC-PAD using a 3 × 250 mm CarboPac PA200 column (Thermo Fisher) in combination with a 3 × 50 mm CarboPac guard column (Thermo Fisher) and 10 µL injections. Elution was performed with a flow of 0.350 mL min−1 and a mobile phase of 150 mM NaOH and 60 mM sodium acetate. The specific activity was expressed in U mg−1 enzyme, where a U is defined as the amount of enzyme that releases 1 µmol Gal1P min−1. The analysis was performed in technical triplicates.
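
The conversion from a measured product formation rate to specific activity in U mg−1 can be sketched as follows. The molecular mass and the rate used here are hypothetical placeholders, since neither is given in this section, and the function names are illustrative.

```python
def enzyme_mg_per_mL(concentration_nM, molecular_mass_kDa):
    """Convert a molar enzyme concentration to mg mL^-1 (numerically equal to g L^-1)."""
    return concentration_nM * 1e-9 * molecular_mass_kDa * 1e3


def specific_activity(rate_uM_per_min, enzyme_mg_mL):
    """Return U mg^-1, where 1 U releases 1 umol product per minute.

    rate_uM_per_min is umol L^-1 min^-1; enzyme_mg_mL * 1000 is mg L^-1,
    so the reaction volume cancels out of the ratio.
    """
    return rate_uM_per_min / (enzyme_mg_mL * 1000.0)


# 20 nM enzyme as in the assay; the 85 kDa mass and 100 uM min^-1 rate are assumed.
enzyme = enzyme_mg_per_mL(20.0, 85.0)
activity = specific_activity(100.0, enzyme)
print(enzyme, activity)
```

Because the aliquots were diluted 10-fold on quenching (15 µL into 135 µL), measured Gal1P concentrations would need to be multiplied by 10 before the rate is calculated.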

Enzyme product profiles

Enzyme assays were performed at 37 °C for 16 h in standard assay buffer or in the phosphate version (instead of MES) for GH112 enzymes, in independent biological triplicates. Degradation products were analysed by thin layer chromatography (TLC) and/or matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF/MS) as described below.

Thin layer chromatography

The TLC was performed by spotting 2 µL of enzymatic reaction on a silica gel 60 F254 plate (Merck). The separation was carried out in butanol:ethanol:milliQ water (5:3:2) (v/v) as mobile phase and sugars were visualized with 5-methylresorcinol:ethanol:sulfuric acid (2:80:10) (v/v) and heat treatment, except for RiLea/b136. The TLC for the latter enzyme was performed in butanol:acetic acid:milliQ (2:1:1) (v/v) and developed with diphenylamine-phosphoric acid reagent57. TLC analyses were performed from independent biological duplicates (one analysis from each biological enzymatic reaction replicate).

MALDI-TOF/MS

MALDI-TOF/MS analysis of RiLea/b136 was according to58, following permethylation of oligosaccharides59. For permethylation, lyophilized oligosaccharides were reconstituted in 200 µL of anhydrous dimethylsulfoxide (DMSO) and mixed for 5 min with 250 µL of NaOH in DMSO and with 150 µL of iodomethane. Next, 2 mL of 5% (w/v) acetic acid was added followed by the addition of 2 mL of CH2Cl2. Subsequently, permethylated oligosaccharides were extracted in the organic phase and dried under a nitrogen stream at 40 °C before loading onto a pre-equilibrated Sep-pak C18 cartridge, washing with water and elution with 85% (v/v) acetonitrile. Eluted fractions were dried under nitrogen as before and stored at −20 °C until further use. After the enzymatic reaction permethylated products were dried, mixed with 2,5-dihydroxybenzoic acid, and spotted onto the MALDI plate. For MALDI-TOF/MS analyses, a Bruker Autoflex III smartbeam in positive ion mode was used. Degradation products of RhLnb136 and ErLnb136 were analysed without initial permethylation of oligosaccharides using 2,5-dihydroxybenzoic acid as matrix and an Ultraflex II TOF/TOF (Bruker Daltonics) instrument operated in positive ion linear mode. Peak analysis of mass spectra was performed using Flexanalysis Version 3.3 (Bruker Daltonics). MALDI-TOF/MS analyses were performed from independent triplicates (one analysis from each biological enzymatic reaction replicate).

LC-MS2 of O-glycan derived oligosaccharides

A homogeneous preparation of porcine gastric mucin, PGM (Sigma), carrying blood group A, was used in the analysis. A total of 0.1 mg mucin per dot was immobilized by dot blotting onto Immobilon-P PVDF membranes (0.45 µm, Millipore, Billerica, MA). RiGH98 was added to one dot to 1.5 µM in 50 µL and incubated for 1 h and 4 h at 37 °C. The reaction supernatants, which contained released free oligosaccharides, were collected and purified by passage through porous graphitized carbon (PGC) particles (Thermo Scientific) packed on top of a C18 Zip-tip (Millipore). Samples were eluted with 65% (v/v) ACN in 0.5% (v/v) trifluoroacetic acid (TFA), dried, resuspended in 10 μL of milliQ water, frozen at −20 °C and stored until further analysis. The residual O-linked glycans (on the dot) were released by reductive β-elimination by incubating the dot in 30 μL of 0.5 M NaBH4 in 50 mM NaOH at 50 °C for 16 h, followed by adding 1.5 μL glacial acetic acid to quench the reaction. The released O-glycans were desalted and dried as described before60. The purified glycans were resuspended in 10 μL of milliQ water and stored at −20 °C for further analysis. Released oligosaccharides from glycosphingolipids as a model substrate carrying blood group B (B5-2 and B6-2)61 were prepared as described above, except for a single incubation time of 2 h.

Purified samples were analysed by LC-MS/MS using a 10 cm × 250 µm I.D. column, packed in-house with PGC 5 µm particles. Glycans were eluted using a linear gradient of 0–40% ACN in 10 mM NH4HCO3 over 40 min at 10 µL min−1. The eluted O-glycans were analysed on an LTQ mass spectrometer (Thermo Scientific) in negative-ion mode with an electrospray voltage of 3.5 kV, capillary voltage of −33.0 V and capillary temperature of 300 °C. Air was used as a sheath gas and mass ranges were defined depending on the specific structure to be analysed. The data were processed using Xcalibur software (version 2.0.7, Thermo Scientific).

Oligosaccharide binding analysis

Binding of LNT, LNB, GNB, H type I triose, Lea triose and Leb tetraose to RiLea/bBP was analyzed by surface plasmon resonance (SPR; Biacore T100, GE Healthcare). RiLea/bBP, diluted in 10 mM NaOAc buffer pH 3.75 to 50 µg mL−1, was immobilized on a CM5 chip using a random amine coupling kit (GE Healthcare) to a final chip density of 3214 and 4559 response units (RU). Analysis comprised 90 s for the association phase and 240 s for the dissociation phase, at a flow rate of 30 µL min−1. Sensorgrams were recorded at 25 °C in 20 mM sodium phosphate buffer, 150 mM NaCl, 0.005% (v/v) P20 (GE Healthcare), pH 6.5. Experiments were performed in duplicate (each consisting of a technical duplicate) in the range of 0.3–50 µM for LNB, 0.78–200 µM for GNB, 0.97–250 µM for Lea, 0.097–100 µM for Leb and 1.5–250 µM for blood H type I triose. To investigate the ligand specificity of RiLea/bBP, binding was further tested towards 0.5 mM LNT, LNnT, lactose, blood A triose, 2′FL and 3′FL. Equilibrium dissociation constants (KD) were calculated by fitting a one binding site model to steady-state sensorgrams, using the Biacore T100 data evaluation software.
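As an illustration of the steady-state analysis, a one binding site model can be written as R = Rmax·C/(KD + C), where C is the ligand concentration and R the equilibrium response. The following is a minimal sketch of such a fit, not the algorithm of the Biacore evaluation software. The function names, the search range for KD and the synthetic data (KD = 8 µM, Rmax = 120 RU) are assumptions made for illustration only.

```python
import math

def one_site(conc, kd, rmax):
    """Steady-state response of a single binding site: R = Rmax*C/(KD + C)."""
    return rmax * conc / (kd + conc)

def fit_one_site(concs, responses, log_kd_min=-3.0, log_kd_max=4.0, step=0.01):
    """Least-squares estimate of (KD, Rmax); KD is in the units of concs.

    For a fixed KD the optimal Rmax has a closed form, so only KD is searched:
    first on a coarse log10 grid, then by golden-section refinement.
    """
    def best_rmax(kd):
        f = [c / (kd + c) for c in concs]
        return sum(r * fi for r, fi in zip(responses, f)) / sum(fi * fi for fi in f)

    def sse(log_kd):
        kd = 10.0 ** log_kd
        rmax = best_rmax(kd)
        return sum((r - one_site(c, kd, rmax)) ** 2 for c, r in zip(concs, responses))

    # Coarse grid search over log10(KD)
    n = int(round((log_kd_max - log_kd_min) / step))
    grid = [log_kd_min + i * step for i in range(n + 1)]
    i_best = min(range(len(grid)), key=lambda i: sse(grid[i]))
    lo = grid[max(i_best - 1, 0)]
    hi = grid[min(i_best + 1, n)]

    # Golden-section refinement inside the bracket around the grid minimum
    g = (math.sqrt(5) - 1) / 2
    for _ in range(80):
        x1 = hi - g * (hi - lo)
        x2 = lo + g * (hi - lo)
        if sse(x1) < sse(x2):
            hi = x2
        else:
            lo = x1
    kd = 10.0 ** ((lo + hi) / 2)
    return kd, best_rmax(kd)

# Synthetic, noise-free example (hypothetical values, concentrations in µM)
concs = [0.39, 0.78, 1.56, 3.125, 6.25, 12.5, 25.0, 50.0]
responses = [one_site(c, 8.0, 120.0) for c in concs]
kd_fit, rmax_fit = fit_one_site(concs, responses)
print(f"KD = {kd_fit:.3f} µM, Rmax = {rmax_fit:.2f} RU")
```

With real data, the responses would be the plateau values read from each sensorgram, and replicate fits would be summarised as described under Quantification and statistical analysis.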

Binding of LNT, LNB, GNB, LNnT, lactose and 2′FL to RhLNBBP was measured using a Microcal ITC200 calorimeter (GE Healthcare). Titrations were performed in duplicates at 25 °C with RhLNBBP (0.1 mM) in the sample cell and 1.5 mM ligand in 10 mM sodium phosphate buffer, pH 6.5 in the syringe. A first injection of 0.4 µL was followed by 19 injections of 2 µL ligand each, separated by 180 s. Heat of dilution was determined from buffer titrations and corrected data were analysed using MicroCal Origin software v7.0. To determine binding thermodynamics a non-linear single binding model was fitted to the normalized integrated binding isotherms.
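The titration schedule above can be tabulated to check how far the ligand-to-protein molar ratio extends. The sketch below uses the injection volumes and concentrations stated in the text; the sample cell volume of 200 µL is an assumption (it is not given here), and dilution of the cell contents by the injected volume is ignored for simplicity.

```python
def injection_volumes(first_ul=0.4, n_main=19, main_ul=2.0):
    """Injection schedule: one small first injection followed by n_main injections."""
    return [first_ul] + [main_ul] * n_main

def molar_ratios(volumes_ul, ligand_mm=1.5, protein_mm=0.1, cell_ul=200.0):
    """Cumulative ligand:protein molar ratio after each injection.

    cell_ul is an assumed cell volume; displacement of cell contents is ignored.
    """
    ratios = []
    injected_ul = 0.0
    for v in volumes_ul:
        injected_ul += v
        ratios.append(injected_ul * ligand_mm / (cell_ul * protein_mm))
    return ratios

volumes = injection_volumes()
ratios = molar_ratios(volumes)
print(f"{len(volumes)} injections, {sum(volumes):.1f} µL in total")
print(f"final molar ratio (assumed 200 µL cell): {ratios[-1]:.2f}")
```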

Differential scanning calorimetry

Differential scanning calorimetry (DSC) analyses were performed at protein concentrations of 1 mg mL−1 in 20 mM sodium phosphate buffer, 150 mM NaCl, pH 6.5, using a Nano DSC (TA instruments). Thermograms were recorded from 10 to 90 °C at a scan speed of 1 °C min−1 using buffer as reference. Baseline corrected data were analysed using the NanoAnalyze software (TA instruments). DSC analyses were performed in duplicate unless otherwise stated.

Crystallization

Crystals of ErLnb136 proteins were grown at 20 °C using the sitting-drop vapor diffusion method, by mixing 0.5 µL of a 10 mg mL−1 protein solution with an equal volume of a reservoir solution. Native crystals were grown in a 20% (w/v) PEG4000, 0.1 M sodium citrate pH 5.6, and 20% isopropanol reservoir solution. SeMet-labelled crystals were grown using a reservoir solution containing 20% (w/v) PEG6000, 0.1 M Tris-HCl pH 8.5, and 1 M lithium chloride. The crystals were cryoprotected in the reservoir solution supplemented with 20% (v/v) glycerol and 25 mM LNB. The crystals were flash-cooled at 100 K (−173.15 °C) in a stream of nitrogen gas. Diffraction data were collected at 100 K on beamlines at SLS X06DA (Swiss Light Source, Switzerland) and the Photon Factory of the High Energy Accelerator Research Organization (KEK, Tsukuba, Japan). The data were processed using HKL200062 and XDS63. Initial phase calculation, phase improvement, and automated model building were performed using PHENIX64. Manual model rebuilding and refinement were achieved using Coot65 and REFMAC566. Because the crystal structures of the SeMet-labelled and native proteins were virtually the same (root mean square deviation for the Cα atoms = 0.14 Å), we used the SeMet-labelled protein structure for the descriptions in the Results and Discussion. Molecular graphics were prepared using PyMOL (Schrödinger, LLC, New York) or UCSF Chimera (University of California, San Francisco).

Bioinformatics

SignalP v.4.167, PSORTb v3.068 and TMHMM v.2.069 were used for prediction of signal peptides and transmembrane domains. InterPro70 and dbCAN271 were used to analyse modular organization using default settings for Gram positive bacteria. Redundancy in biological sequence datasets was reduced using the CD-HIT server (sequence identity cut off = 0.95)72. Protein sequence alignments were performed using MAFFT (BLOSUM62)73. Phylogenetic trees were constructed using the MAFFT server, based on the neighbour-joining algorithm, with bootstrapping performed with 1000 replicates. Phylogenetic trees were visualized and tanglegrams constructed using Dendroscope74. Colouring of protein structures according to amino acid sequence conservation was accomplished in UCSF Chimera, based on structure-based multiple protein alignments from the PROMALS3D server75 and using the AL2CO algorithm76 implemented in UCSF Chimera. The MEME suite web server was used for amino acid sequence motif discovery and evaluation77. Protein structures were compared using the Dali server (http://ekhidna2.biocenter.helsinki.fi/dali/) (PMID: 27131377), and the molecular interface between ErLnb136I and ErLnb136II was analysed (solvent inaccessible interface, Gibbs energy) via the PDBePISA server (https://www.ebi.ac.uk/pdbe/pisa/).

The abundance and distribution of HMO utilization genes encoding GH112, GH136I and GH136II in Roseburia were analysed by a BLAST search of the corresponding DNA reference sequences from R. intestinalis L1-82, R. hominis A2-183 and R. inulinivorans A2-194 against a total of 4599 reconstructed Roseburia genomes, binned into 42 Species-level Genome Bins (SGBs) by Pasolli et al.33. The variability of the Roseburia core xylanase (GH10) was determined similarly by blasting the DNA reference sequences from R. intestinalis L1-82 (ROSINTL182_06494) against the same dataset.

For further analyses, initial BLAST hits were filtered based on 70% identity with any of the 5 conserved Roseburia reference genomes. Additionally, Roseburia genomes were considered only if they had a hit for the GH112 gene and for at least one subunit of the GH136 gene. The resulting 818 genomes were assigned to the respective Roseburia SGBs, based on the assignment of Pasolli et al.33. The retrieved genomes were used to analyse the gene landscape around the GH112 gene. The RAST server78 was used for gene annotation. Based on the annotation and coordinates of the genes, 11 genes upstream and downstream of the GH112 gene were selected for gene landscape analysis. The most conserved gene neighbourhood within each SGB was selected as the representative for that SGB. Principal component analysis was done based on the structure of the GH112-GH136 neighbourhood, considering the presence or absence of the genes in the gene landscape as well as their positions in the loci. We used the function stringdistmatrix (R package stringdist) with the “osa” (optimal string alignment) method to compute the distance matrix.
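To make the distance measure concrete, the following is a minimal Python sketch of the optimal string alignment (OSA) distance, which counts insertions, deletions, substitutions and transpositions of adjacent items. It is not the stringdist implementation used in the analysis. Representing each neighbourhood as a list of gene labels, and the example neighbourhoods themselves, are assumptions made for illustration.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between two sequences.

    Unit cost for insertion, deletion, substitution and for swapping two
    adjacent items (each item may take part in at most one swap).
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent transposition
    return d[n][m]

def distance_matrix(sequences):
    """Symmetric matrix of pairwise OSA distances."""
    return [[osa_distance(s, t) for t in sequences] for s in sequences]

# Hypothetical gene neighbourhoods, one list of gene labels per SGB
neighbourhoods = [
    ["GH112", "ABC", "GH136I", "GH136II"],
    ["ABC", "GH112", "GH136I", "GH136II"],
    ["GH112", "GH136I", "GH136II"],
]
matrix = distance_matrix(neighbourhoods)
for row in matrix:
    print(row)
```

A matrix of this kind can then serve as input to an ordination method; how the gene order was encoded as strings in the original analysis is not specified here.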

Quantification and statistical analysis

Statistically significant differences were determined using an unpaired two-tailed Student’s t-test. Statistical parameters, including values of n and p-values, are reported or indicated in the figures, figure legends and the Results section. The data are expressed as arithmetic means with standard deviations (SD), unless otherwise indicated.
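As a sketch of the quantities involved, the t statistic of an unpaired Student’s t-test with pooled variance, together with the mean and SD of each group, can be computed as follows. The function names and the example values are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption. The two-tailed p-value would then be taken from the t distribution with the returned degrees of freedom, for example via scipy.stats.ttest_ind with equal_var=True.

```python
import math
import statistics

def mean_sd(values):
    """Arithmetic mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)

def student_t_unpaired(a, b):
    """t statistic and degrees of freedom for an unpaired Student's t-test.

    Assumes equal variances in both groups (pooled variance estimate).
    """
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / std_err
    return t, na + nb - 2

# Hypothetical measurements from two conditions (n = 3 each)
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
t_stat, dof = student_t_unpaired(group_a, group_b)
print("mean, SD (a):", mean_sd(group_a))
print("mean, SD (b):", mean_sd(group_b))
print(f"t = {t_stat:.4f}, df = {dof}")
```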

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.