Butyrate producing colonic Clostridiales metabolise human milk oligosaccharides and cross feed on mucin via conserved pathways

The early life human gut microbiota exerts life-long health effects on the host, but the mechanisms underpinning its assembly remain elusive. Particularly, the early colonization of Clostridiales from the Roseburia-Eubacterium group, associated with protection from colorectal cancer, immune- and metabolic disorders is enigmatic. Here, we describe catabolic pathways that support the growth of Roseburia and Eubacterium members on distinct human milk oligosaccharides (HMOs). The HMO pathways, which include enzymes with a previously unknown structural fold and specificity, were upregulated together with additional glycan-utilization loci during growth on selected HMOs and in co-cultures with Akkermansia muciniphila on mucin, suggesting an additional role in enabling cross-feeding and access to mucin O-glycans. Analyses of 4599 Roseburia genomes underscored the preponderance and diversity of the HMO utilization loci within the genus. The catabolism of HMOs by butyrate-producing Clostridiales may contribute to the competitiveness of this group during the weaning-triggered maturation of the microbiota. The assembly and maturation of the early life microbiome has life-long effects on human health. Here, the authors combine omics, functional assays and structural analyses to characterize the catabolic pathways that support the growth of butyrate producing Clostridiales members from the Roseburia and Eubacterium, on distinct human milk oligosaccharides.

T he human gut microbiota (HGM) is a key determinant of health [1][2][3] . Orthogonal transfer from the mother contributes markedly to the establishment of this community shortly after birth 4,5 . The HGM develops dynamically during infancy until a resilient adult-like community is formed after 2-3 years of life [6][7][8] . The early life microbiota plays a role in the maturation of the host's endocrine, metabolic and immune systems 9 , and the composition of this consortium is associated with life-long health effects [10][11][12] . Therefore, understanding the factors that define the HGM structure during infancy is critical for minimizing the risk for a range of metabolic, inflammatory and neurodegenerative disorders, all associated to specific HGM signatures 13,14 .
Dietary glycans resistant to digestion by human enzymes are a major driver that shapes the developing HGM 6,15 . This is emphasized by the dominance of Bifidobacterium in breast-fed infants 7,8 , attributed to the competitiveness of distinct members of this genus in the utilization of human milk oligosaccharides (HMOs) 16,17 . Indeed, the most prominent changes in the infant microbiota occur during weaning and the introduction of solid food 6,7 , whereby bifidobacteria are replaced by Firmicutes as the top abundant phylum of the mature HGM. This compositional shift is accompanied by notable longitudinal increases in concentrations of the short chain fatty acids (SCFAs) propionate and butyrate (from carbohydrate fermentation) during and after weaning 18 .
Butyrate exerts immune-modulatory activities 19 and is associated with a lowered risk of colon cancer, atherosclerosis, and enteric colitis 20,21 . The production of butyrate is largely ascribed to Firmicutes Clostridium cluster IV and Clostridium cluster XIVa that includes the Roseburia-Eubacterium group (Lachnospiraceae family, Clostridiales order), which are abundant and prevalent in the adult HGM 22,23 . The abundance of Roseburia spp. is decreased in patients suffering from metabolic, inflammatory and cardiovascular diseases [24][25][26][27] . Although butyrate producers are established by the first year of life 27 , the mechanisms underpinning their early appearance (and prevalence) remain unknown.
The evolution of uptake and enzymatic systems that support competitive growth of Bifidobacterium spp. on HMOs 17 reflects a successful adaptation to the intestines of breast fed infants. We hypothesize that other taxonomic groups, which possess metabolic capabilities that target HMOs, may have an early advantage in the colonization of the infant gut during infancy.
The early emergence of Roseburia-Eubacterium in the human gut offers a suitable model group to evaluate this hypothesis. Here, we perform genomic analyses that are suggestive of the presence of putative HMO utilization loci in Roseburia and Eubacterium strains. Growth on selected HMOs or a complex mixture from mother's milk combined with differential proteomics reveal the high upregulation of the protein apparatus encoded by these loci, consistent with their role in mediating HMO utilization.
Further, we characterize enzymes and transport proteins encoded by the HMO loci to elucidate the molecular details of HMO capture and degradation by this protein apparatus. These analyses unveil an enzymatic activity and a structural fold, which have not been previously reported. The HMO catabolic pathways are upregulated during growth with the model mucin degrader Akkermansia muciniphila, suggesting these pathways may support cross-feeding on mucin oligosaccharides made accessible by A. muciniphila. Analyses of the metagenome of Roseburia show a striking conservation and wide occurrence of the HMO utilization pathways across the genus, underscoring their importance for adaptation to the human gut. This study provides insight into pathways that may confer a competitive advantage in the early colonization and the resilience of key butyrate-producing Clostridiales by mediating the catabolism of distinct HMOs and host O-glycans.

Results
HMO loci in Roseburia and Eubacterium. Our aim was to investigate the HMO utilizsation potential in butyrate producing Clostridiales, which potentially may confer an advantage during the maturation of the infant HGM during weaning. Genomic analyses of butyrate producers from Lachnospiraceae identified distant homologs of the recently discovered glycoside hydrolase family 136 (GH136) in the Carbohydrate Active enZyme (CAZyme) database (www.cazy.org) (Supplementary Fig. 1). This family was assigned based on the lacto-N-biosidase LnbX from Bifidobacterium longum subsp. longum JCM 1217 28 , which cleaves the key HMO lacto-N-tetraose (LNT) to lacto-N-biose (LNB) and lactose (EC 3.2.1.140; Supplementary Table 1). The activity of the bifidobacterial LnbX was dependent on the coexpression of an adjacent gene, proposed to encode a molecular chaperone (LnbY). The GH136 orthologues from Roseburia and Eubacterium are organized, unlike the counterpart from Bifidobacterium, on a locus harbouring additional CAZyme genes ( Supplementary Fig. 1).
We selected two Roseburia strains and one from Eubacterium, all having GH136-like genes, to examine their HMO utilization capabilities.
Typically, bacteria repress pathways for less preferred substrates in the presence of a favourite carbon source. Xylotetraose from the abundant dietary plant fibre xylan has been shown to be a preferred growth substrate and uptake ligand of the xylooligosaccharide importer conserved in Roseburia 29 . During weaning, the HGM of infants is likely to be exposed to both HMOs and dietary plant fibres, e.g. xylan from cereals and fruits. We tested the growth of R. hominis in the presence of equimolar concentrations of LNT and the similarly sized xylotetraose to evaluate the utilization hierarchy of the HMO versus the plant fibre. Strikingly, monophasic growth was observed consistent with the simultaneous uptake of both tetraoses from culture supernatant ( Supplementary Fig. 2a-c).
To unravel the basis of growth on HMOs with focus on the Roseburia genus, we analysed the proteomes of R. hominis and R. inulinivorans on LNT and the HMO mixture, respectively, relative to glucose. For R. hominis and R. inulinivorans, 15 and 62 proteins, respectively, were significantly upregulated (log 2 fold change > 2). These differential proteomes were dominated by carbohydrate metabolism proteins, especially products of two loci (henceforth referred to as HMO utilization loci), both encoding an ATP-binding cassette (ABC) transporter, GH112 and GH136 enzymes with putative HMO activities, as well as sensory and transcriptional regulators (Fig. 1f, g). The HMO locus of R. inulinivorans is extended with two fucosidases of GH29 and GH95. The specificity-determining solute binding proteins (SBPs) of the ABC transporters of R. hominis (RhLNBBP) and R. inulinivorans (RiLe a/b BP) were the first and fifth topupregulated proteins in the HMO proteomes, respectively. In addition, the GH112 LNB/GNB phosphorylases were within the top 3 and 12 upregulated proteins in R. hominis and R. inulinivorans, respectively. In R. inulinivorans two additional loci encoding sialic acid and fucose catabolism proteins, were also upregulated ( Supplementary Fig. 3).
Diverse GH136 enzymes mediate initial HMO degradation. The homologs RhLnb136 I (LnbY in B. longum) and RhLnb136 II (LnbX that harbours the catalytic residues in B. longum) were highly co-upregulated in the LNT proteome of R. hominis (Fig. 1f) Fig. 1 Growth of Roseburia and Eubacterium spp. on HMOs and upregulation of HMOs utilization loci in Roseburia. Growth curves of R. hominis (a) and R. inulinivorans (b) on glucose, LNT, GNB, LNB, and/or purified HMOs from mother's milk compared to no-carbon source controls over 24 h. c Growth levels of R. inulinivorans on LNT, LNB, GNB and of E. ramulus on LNT within 24 h including glucose and a no-carbon source controls. d, Growth of R. hominis, R. inulinivorans and E. ramulus on lactose, 2′FL, 3FL, 3′SL and 6′SL as well as on monosaccharides from HMOs and mucin after 24 h including a non-carbon source control. Growth analyses (a-d) on media supplemented with 0.5 % (w/v) carbohydrates (for R. inulinivorans on 1% (w/v) and 4% (w/v) purified HMOs from mothers milk) were performed in independent biological triplicates. The growth data are presented as mean values with the error bars representing the standard deviations (SD) for a-c. e HMO and mucin oligomeric growth substrates in a-d. The HMO utilization loci in R. hominis (f) and R. inulinivorans (g) identified from proteomic analyses of cells growing on LNT and HMOs from mother's milk, respectively, relative to glucose. Genes are denoted by their protein products: transcriptional regulator (Trans. R.); ABC transporter solute binding protein (RhLNBBP (f) and RiLe a/b BP (g)); ABC transporter permease protein (PP); hypothetical proteins (HP); Glycoside hydrolase 136 (RhLnb136 I , RhLnb136 II (f) and RiLe a/b 136 I , RiLe a/b 136 II (g)); Glycoside hydrolase 112 (RhGLnbp112 (f) and RiGLnbp112 (g)); Glycoside hydrolase 29 (RiFuc29 (g)); Glycoside hydrolase 95 (RiFuc95 (g)) and histidine kinase sensory protein (His. K.) The proteomic analyses (f-g) were in biological triplicates and the log 2 -fold change from the label free quantification of upregulated gene products is shown. Glycan structures presentation according to Symbol Nomenclature for Glycans (SNFG) (https://www.ncbi.nlm.nih. gov/glycans/snfg.html). Source data are provided as a Source data file labelled with the corresponding figure number and Fig. 4a), indicative of the intracellular degradation of LNT in R. hominis. Only coexpression and co-purification of RhLnb136 I and RhLnb136 II resulted in an active lacto-N-biosidase (henceforth RhLnb136) (Fig. 2b, Supplementary Table 4). These findings and the observed co-upregulation, suggested that a heterodimer (or oligomer) of RhLnb136 I and RhLnb136 II assembles the catalytically active RhLnb136. Next, we demonstrated phosphorolysis of LNB and GNB to α-D-galactose-1-phosphate and the corresponding Nacetylhexosamines GlcNAc and GalNAc, respectively (Supplementary Fig. 5f), by the GH112 GNB/LNB phosphorylase (RhGLnbp112) located in the same locus ( Fig. 1f and Supplementary Fig. 1). This enzyme has comparable specific activities for LNB and GNB (Supplementary Table 5  c Activity of ErLnb136 on LNT. a-c The hydrolysates were analysed by MALDI-ToF MS without (b, c) or with a permethylation. a Masses of methylated sugars are in parentheses and the ion peaks correspond to the Na + adducts of the methylated sugars. a-c relative intensity (percentage intensity) is shown. The MALDI-ToF MS analyses (a-c) were performed from independent triplicates (one analysis from each biological enzymatic reaction replicate) and all analyses yielded similar results.
RiGH136, the GH136 homolog from the HMO-upregulated locus in R. inulinivorans was predicted to be extracellular, with a signal peptide in RiGH136 II that also possesses two C-terminal putative carbohydrate binding modules ( Supplementary Fig. 8a) and a predicted N-terminal transmembrane domain in RiGH136 I . Co-expression of RiGH136 I and RiGH136 II , lacking the transmembrane domain and signal peptide respectively, resulted in an active enzyme with an unprecedented specificity. This enzyme (RiLe a/b 136) released Lewis a triose or Lewis b tetraose from fucosylated HMOs including lacto-N-fucopentaose II (LNFP II), lacto-N-difucohexaose I (LNDFH I) and lacto-N-difucohexaose II (LNDFH II) ( Fig. 2a and Supplementary Fig. 5a). To our knowledge, cleavage of the bond at the reducing end of a fucosylated-GlcNAc has not been reported to date. Next, we characterized the additional CAZymes encoded by the locus, all lacking a signal peptide or transmembrane domain suggestive of their intracellular localization. We showed that the concerted action of RiFuc29 and RiFuc95 that act on α-(1 → 4) and α-(1 → 2)-linked L-fucosyl, respectively mediates the complete defucosylation of putative products of RiGH136, Le b tetraose, Le a triose and H triose type I ( Supplementary Fig. 5b-d). Initial defucosylation by RiFuc29 is required for releasing the 1 → 2 linked L-fucosyl in Le b tetraose by RiFuc95. Finally, we showed that the GH112 from R. inulinivorans (RiGLnbp112) phosphorolyzes LNB and GNB equally efficiently ( Supplementary Fig. 5e, Supplementary Table 5).
A domain with a new fold is required for GH136 activity. To discern the molecular architecture of GH136 enzymes and explain the requirement of the two subunits for activity, we endeavoured to crystallize both RhLnb136 and RiLe a/b 136, without success. Hence, we turned our attention to the taxonomically related E. ramulus, which has a GH136 locus similar to the one in R. hominis, except for a substitution of the GH112 phosphorylase with a GH42 β-galactosidase gene ( Supplementary  Fig. S1) that may confer hydrolysis of the LNB/GNB units 30 . E. ramulus and R. hominis also shared similar growth profiles on LNT ( Fig. 1), which was consistent with the presence of a functional GH136. The N-terminal (ErLnb136 I ) and the C-terminal (ErLnb136II) regions of the E. ramulus GH136 share homology to RhLnb136 I and RhLnb136 II (Supplementary Fig. 1 and 4), respectively, suggestive of the fusion of ErLnb136 I and ErLnb136 II to form an active enzyme (ErLnb136). This is supported by the identical specificity and similar catalytic efficiencies of ErLnb136 and RhLnb136 (Supplementary Table 4). Moreover, intimate interaction of the two ErLnb136 domains is consistent with the cooperative unfolding profile of the enzyme (Supplementary Fig. 4b). These data justified the use ErLnb136 to study the architecture of the two subunits/domains compulsory for activity within GH136. The crystal structures of selenomethionine (SeMet)-labelled and native ErLnb136 were determined at 1.4 and 2.0 Å resolution, respectively (Supplementary Table 6). The C-terminal catalytic domain (ErLnb136 II , from AA 242-663) assumes a β helix fold (Fig. 3) similar to the bifidobacterial homolog LnbX (Supplementary Table 7). The LNB molecule bound in the active site is recognized by ten potential hydrogen bonds and aromatic stacking of the Gal unit onto W548 ( Fig. 3f and Supplementary Fig. 6a). Interestingly, the GlcNAc sugar ring of LNB in ErLnb136 adopts an 4 E conformation (φ = 232°and ψ = 68°) with the O1-OH in a pseudoaxial position to form a direct hydrogen bond with the acid/base catalyst (D568) (Supplementary Fig. 6a). Moreover, the D575 O δ2 of the nucleophile is positioned appropriately for nucleophilic attack on the anomeric carbon of the GlcNAc at 3.2 Å (Fig. 3f).
The N-terminal domain (ErLnb136 I , from AA 7-224) consists of 8 α-helices (α1-α8) (Fig. 3a-c) and assumes a previously unknown fold, stabilized by the central helix α1. The structurally most related protein to ErLnb136 I , a peptidyl-prolyl cis-trans isomerase with a chaperone activity from Helicobacter pylori (5EZ1), shares weak structural similarity restricted to helices α6 and α7 ( Supplementary Fig. 6b, Supplementary Table 7). The ErLnb136 I domain embraces the sides and back of the β helix domain (Fig. 3a-c). These extensive inter-domain interactions (solvent inaccessible interface ≈1618 Å 2 ), stabilize the protein structure with ΔG = −17 kcal mol −1 . Remarkably, the α6-α7 loop of ErLnb136 I forms a part of the active site with the solvent accessible sidechain of Y145 positioned near the active site (5.7 Å to the GlcNAc O1 atom of LNB) (Fig. 3d, e). The Y145A mutant showed a 4.9-fold higher K M (Supplementary Table 4, Supplementary Fig. 4c), suggesting that this residue contributes to substrate interactions, possibly at the +1 subsite.
Capture and uptake of HMOs by Roseburia. The proteomic analyses highlighted the putative protein apparatus required for growth on HMOs. The solute binding proteins (SBPs) of two ABC transporters in R. hominis and R. inulinivorans were within the top 8% upregulated proteins, hinting their involvement in uptake of HMOs. Both SBPs recognized distinct HMOs and ligands from host-glycans ( Fig. 4, Supplementary Tables 2 and 3, Supplementary Fig. 7). The R. hominis SBP (LNB-binding protein, RhLNBBP) shows preference to LNB followed by GNB and LNT, suggestive of the uptake and intracellular degradation of these ligands by RhGLnbp112 and RhLnb136 as described above. By contrast, fucosyl-decorated Lewis b (Le b ) tetraose and Lewis a (Le a ) triose were the preferred ligand of the Le a/b binding protein (RiLe a/b BP) from R. inulinivorans, followed by LNB and GNB, whereas no binding to LNT was detected (Fig. 4 Table 3). The loss of the fucosyl unit at the terminal reducing GlcNAc reduced the affinity of RiLe a/b BP about 5-fold for blood group H antigen triose type I (H triose type I) relative to Le b tetraose. The specificity of RiLe a/b BP is highlighted by the lack of affinity for lacto-N-neotetroase (LNnT), blood group A antigen triose (A triose), lactose and 2′-fucosyllactose (2′-FL). These findings show that the products of RiLe a/b 136 are the preferred ligands for RiLe a/b BP, consistent with the extracellular degradation of fucosylated pentaose and hexaose HMOs and uptake of their products by the ABC transporter. An equimolar mixture of Le b , Le a and H-triose type I oligomers promoted the growth of R. inulinivorans to a similar final OD 600 as glucose (Fig. 1b, c and Fig. 4a). The uptake profiles of these ligands reflected the preference of RiLe a/b BP, consistent with uptake by the associated transporter (Fig. 4). This was also in accord with the utilization of larger fucosylated HMO structures observed during growth on purified HMOs from mother's milk ( Supplementary Fig. 2a-c). Notably, no uptake of LNT was observed, which is in excellent agreement with the poor growth ( Fig. 1c) and with the lack of detectable binding to LNT by RiLe a/b BP (Fig. 4a).

, Supplementary
These results established the capture of specific HMOs and related ligands by the above SBPs and the differentiation of their specificities, e.g. preference of RiLe a/b BP to fucosylated ligands at the terminal reducing GlcNAc.
Roseburia cross feeding on mucin. HMOs and O-glycans from glycolipids and glyco-proteins including mucin share structural motifs. The high affinity of the SBPs from Roseburia for GNB from mucin suggested possible foraging of this substrate (and/or oligomers from glycoconjugates) and thereby a metabolic interplay of Roseburia with mucolytic HGM members. To evaluate possible mechanisms of cross-feeding we compared Roseburia growth on mucin with and without the model mucin degrader Akkermansia muciniphila DSM 22959 32 .
A co-culture of R. hominis and R. inulinivorans displayed no growth within 24 h on a mucin mixture and only poor growth after 48 h ( Supplementary Fig. 8a,b), in contrast to A. muciniphila that grew well within 24 h. The co-culture of the two Roseburia species and A. muciniphila grew to a significantly higher OD 600 than A. muciniphila alone (p < 3.7 × 10 −6 at 24 h, p < 1.3 × 10 −3 at 48 h)( Supplementary Fig. 8a). The growth of Roseburia is supported by a 4.5-fold higher butyrate level in the co-culture supernatants than Roseburia alone (24 h). After 48 h, a slight increase in butyrate concentration was also detected in cultures containing only Roseburia consistent with the growth data ( Supplementary Fig. 8c).
To unveil the basis for the Roseburia growth, the proteomes of R. hominis and R. inulinivorans, both grown on glucose were compared with co-cultures of Roseburia and A. muciniphila grown on mucin. For R. hominis and R. inulinivorans, 31 and 93 proteins, including several CAZymes, were significantly upregulated (log 2 fold change > 2) relative to the glucose co-cultures. The transport protein RhLNBBP and RhGLnbp112 from the R. hominis HMO locus ( Fig. 1f) were the top 6 th and 10 th most upregulated proteins in the mucin proteome of R. hominis, respectively, indicative of a role of this locus in cross-feeding on host glycans ( Supplementary Fig. 8e). In R. inulinivorans, the corresponding enzymes RiLe a/b BP and RiGLnbp112 were also significantly upregulated with log 2 fold changes of 2.77 and 4.71, respectively ( Supplementary Fig. 8e). However, the top upregulated protein in the R. inulinivorans proteome was a SBP of an ABC transporter co-localised with genes encoding a blood group A-and B-cleaving endo-β-(1 → 4)-galactosidase (RiGH98), a putative α-galactosidase of GH36 and an α-L-fucosidase (GH29), which was the top fourth upregulated protein in the mucin proteome ( Supplementary Fig. 8f). The upregulation of this locus suggested that R. inulinivorans possesses a functional machinery for directly accessing certain mucin oligomers. We expressed the predicted extracellular RiGH98 and demonstrated release of blood group A and B oligomers from mucin and related Oglycans (Supplementary Fig. 8g-h, Supplementary Data 1). The co-upregulation of a locus encoding a fucose utilization pathway ( Supplementary Fig. 3a) is in accordance with the release of fucosylated oligomers by RiGH98. Another route of foraging, was suggested by the high upregulation of the sialic acid catabolism pathway ( Supplementary Fig. 3b), which likely confers the potent growth of R. inulinivorans on this substrate (Fig. 1d). The poor growth of R. inulinivorans on mucin in the absence of A. muciniphila suggests that the latter bacterium enables crossfeeding by the release of sialic acid, as R. inulinivorans lacks sialidases, which together with the ability of RiGH98 to access blood group A and B oligomers in mucin substrates, may support a better co-growth in the co-culture. These findings are consistent with the role of HMO utilization machinery and additional functional operons in supporting co-growth with A. muciniphila on mucin.
The HMO utilization loci are prevalent in Roseburia. The HMO loci, defined by the co-occurrence of GH136 and GH112 genes, are conserved in five Roseburia reference genomes (Supplementary Fig. 1). To broadly examine the structure and conservation of these loci, the presence of homologs of the aforementioned genes was mapped across 4599 previously reconstructed Roseburia genomes 33 . As a reference signature for a central catabolic pathway, the presence of GH10 xylanase genes, compulsory for xylan utilization in R. intestinalis 29 , was also analysed. Strikingly, the GH112 and GH136 HMO utilization  genes are about 2-3 fold more prevalent than the GH10 counterparts (Fig. 5a), indicative of the broader distribution of the HMO loci compared to the xylanase locus, which is mainly conserved in R. intestinalis. The GH136 I and GH136 II genes have a similar prevalence, which is~30% lower than that of GH112.This overall trend is reiterated when we analyze individual species-level genome bins (SGBs), with some differences in the co-occurrence patterns of GH136 and GH112 genes (Fig. 5b). For example, while GH112 and GH136 have similar prevalence in R. hominis (SGB 4936), GH112 was 2.6 times more prevalent than GH136 in R. inulinivorans (SGB 4940). We analysed the organization of 818 loci, defined by the presence of a GH112 gene and at least one encoded subunit of the GH136, with a more stringent threshold (70% identity of the GH112 and GH136 sequences present in any of the 5 Roseburia reference genomes, see Supplementary Fig. 1). The gene clusters around the GH112 appeared to be SGBs-specific (Fig. 5c), indicative of diversity of the loci within the genus. Analysis of the most representative gene contexts for each SGB (Fig. 5d) shows that genes for ABC transporters, GH136, and transcriptional regulators were the most frequently co-occurring with GH112 genes, which offers a robust signature of the Roseburia HMO utilization loci (Fig. 5d) and validates their broad distribution. Additional CAZymes and carbohydrate metabolic genes were also frequently co-occurring in the vicinity of GH112 genes, suggesting that additional glycan utilization capabilities are clustered around the HMO loci.

Discussion
Perturbation of the early life HGM assembly is associated with life-long effects on the immune-and metabolic homeostasis of the host 9-12 . Breastfeeding is a key affector of the dynamics of the microbiota during infancy. Weaning marks a dramatic transition towards an adult-like structure of the HGM, which matures at the age of 2-3 and exhibits high resilience throughout adulthood 7,8,22 .
The critical window that precedes the maturation of the microbiota offers a unique opportunity for therapeutic interventions to address aberrant HGM states and thereby to prevent dysbiosis-related chronic disorders. To date, insight into the compositional transitions of the assembly of the microbiota during infancy 6-8 is available, but the underpinning mechanisms, especially during weaning, remain elusive. Here, we describe previously unknown pathways that confer the growth of butyrate producing Clostridiales on distinct HMO motifs and related oligomers from host glyco-conjugates. These pathways may promote an early competitive adaptation advantage for Clostridiales that are associated with the healthy HGM and with the protection from metabolic and inflammatory disorders as well as colorectal cancer [24][25][26]34 .
We uniquely demonstrate that key butyrate producing Roseburia and Eubacterium spp. grow on complex HMOs purified from mother's milk and on defined HMO motifs (Fig. 1a-d).
Proteomic analyses revealed two highly upregulated genetic loci that encode distant homologs to a lacto-N-biosidase from B. longum 28,35 , GNB/LNB phosphorylases and ABC transporters in R. hominis and R. inulinivorans, (Fig. 1f-g and Supplementary  Fig. 1). The R. hominis locus (Figs. 1, 2 and 4, Supplementary  Fig. 5e, Supplementary Tables 2, 4 and 5) supports growth on the HMO motifs LNT and LNB, whereas the R. inulinivorans locus confers growth on more complex HMOs, e.g. single and double fucosylated versions of LNT (Figs. 2a and 4, Supplementary  Figs. 2 and 5). The specialization on different, but partially overlapping, HMOs and related Lewis a/b antigen oligomers from glyco-lipids or glyco-proteins creates differential competitive catabolic niches. This specialization is evident from the divergence of the GH136 specificities. Thus, RhLnb136 and ErLnb136 are lacto-N-biosidases, whereas RiLe a/b 136 displays an unprecedented specificity that requires a Fuc-α-(1 → 4)-GlcNAc at the subsite −1 and accommodates additional fucosylation at the −2, and +2 subsites (Fig. 2, Supplementary Fig. 5a and Supplementary Table 4). The preference to fucosylation is consistent with an open active site effectuated by shortening of loops, (ErLnb136: Loop 1 AA 330-341, Loop 2 AA 520-543, Supplementary Fig. 6c), which allows the accommodation of bulky fucosylated substrates. Remarkably, the GH136 I subunits (or domains in ErGH136-like enzymes) are co-evolved with the GH136 II counterparts that possess the catalytic residues ( Supplementary Fig. 6d).
Our stability ( Supplementary Fig. 4c), structural ( Fig. 3 and Supplementary Fig. 6), biochemical ( Supplementary Fig. 4, Supplementary Table 4) and phylogenetic analyses ( Supplementary  Fig. 6d) affirm the crucial role of the GH136 I domain in the functionality of GH136 enzymes and provide compelling evidence to the association of the two GH136 domains. The sequence conservation of GH136 I and GH136 II was mapped on the structure of ErLnb136. Strikingly, highly conserved patches were identified across both domains (Supplementary Fig. 6e). Particularly, parts of the α4-α5 loop and of the α5 helix in ErLnb136 I that pack extensively onto ErLnb136 II display globally conserved residues, together with the complementary coconserved regions of ErLnb136 II (Supplementary Fig. 6e). Moreover, the surface of ErLnb136 I is positively charged and apolar at the interface with ErLnb136 II , which is notably different from the negative potential on the surface of the rest of the enzyme (Supplementary Fig. 6f) and complementary to the interface surface of ErLnb136 II . These results highlight the coevolution of GH136 subunits or domains.
ABC transporters are a determinant of uptake selectivity and competitiveness in both bifidobacteria 17,31,36 and R. intestinalis 29 . The two SBPs of the ABC importers located in the HMO loci of R. hominis and R. inulinivorans were within the top 5 upregulated proteins in each proteome in response to HMO utilization (Fig. 1), underscoring the critical role of oligosaccharide transport in the competitive gut niche. The preferences of the SBPs and GHs encoded by these loci appear aligned to confer efficient uptake and subsequent catabolism of preferred substrates (Figs. 2 and 4, Supplementary Fig. 5, Supplementary Tables 2, 3, 4 and 5). The LNB/GNB phosphorylases of GH112 are also conserved in the HMO loci ( Supplementary Fig. 1). R. inulinivorans possesses additional CAZymes, notably different fucosidases for degradation of internalized fucosylated-oligomers (Supplementary Fig. 1  and 5b-d). Based on the proteomic analyses and the biochemical data, we propose a model for the two distinct routes for uptake and depolymerisation of HMOs in Roseburia and Eubacterium ( Fig. 6 and Supplementary Fig. 9).
Butyrate producing bacteria of the Roseburia-Eubacterium group (Clostridiales order) are early colonizers of the infant gut 6,8,37 and are prevalent members of the adult HGM 22,23 .
The origin of this taxonomic group is enigmatic, but their presence in the human milk microbiome has been reported 38,39 . Orthogonal transfer from mothers based on the identification of the same Roseburia strains in mothers faeces, milk and the infant guts 40 has also been proposed. R. intestinalis type strains have been isolated from infant faeces 41 , hinting the presence of this taxon before full transition to solid food.
We have previously shown that the abundance of distinct bifidobacteria in guts of breast-fed infants is strongly correlated to efficient ABC transporters that capture the 2′-and 3′-fucosyllactose HMOs with high affinity (K D ≈ 5 µM) 17 . The strains possessing these genes, e.g. from Bifidobacterium longum subspecies infantis, are not detected after weaning, as opposed to counterparts adept at utilizing plant-derived glycans. By contrast, the same Clostridium group XIVa strains that possess plant glycan utilization pathways 29,42,43 retain HMO catabolic pathways. The simultaneous growth of R. hominis on LNT and the cereal derived xylotetraose (Supplementary Fig. 2a-c) demonstrates this catabolic plasticity, which likely confers an additional competitive advantage during weaning, when the dominant fucosyl lactose specialized Bifidobacterium community collapses due to sporadic supply of HMOs.
The loci that target HMOs also mediate cross-feeding on mucin or other glyco-conjugate oligomers, e.g. GNB from mucin and blood antigen structures, both captured efficiently by Roseburia transport proteins (Fig. 4a, Supplementary Tables 2 and 3). This is consistent with the significant butyrate production measured in co-cultures of Roseburia and A. muciniphila 32 (Supplementary Fig. 8c) and the upregulation of GH136-containing loci in the mucin co-culture and HMO monocultures ( Fig. 1 and Supplementary Fig. 8e). R. inulinivorans possesses an extensive mucolytic machinery revealed by the upregulation of fucose and sialic acid catabolism loci ( Supplementary Fig. 3 Fig. 8f-g, Supplementary Data 1) that allows the release of β-(1 → 4)-linked blood group oligomers found in mucin and glyco-lipids on the surfaces of enterocytes 44,45 . This ability to access carbohydrates from mucin and host glyco-conjugates supports growth during periods of nutritional perturbations, which may increase the resilience of this taxonomic group.

) as well as a blood group A and B-locus (Supplementary
Our bioinformatic analysis of the Roseburia genomes establish that HMO utilization appears to be a core trait within Roseburia, based on the ubiquitous presence of loci harbouring GH112 and GH136 genes (Fig. 5). The occurrence of SGBs that exclusively possess GH112 genes (e.g. SGB 4939, Fig. 5b) suggests that distinct strains are secondary degraders that cross-feed on released simple substrates, e.g. LNB and GNB. By contrast, the co-occurrence of GH112 and GH136 genes (Fig. 5b) offers a signature for primary degraders that are able to access more complex glycans from HMOs or host glyco-conjugates.
In conclusion, the present study sets the stage for a mechanistic understanding of the assembly of physiologically important core groups in the early life microbiota and discloses previously unknown roles of HMOs in selection of Clostridiales. Additional studies are required to further address the paramount, but poorly understood maturation of the early life microbiota.
Purification of human milk oligosaccharides. HMOs were purified from pooled human milk samples 48,49 . Milk fat was separated by centrifugation (10,000 × g, 30 min at 4°C) and proteins were removed by ethanol precipitation (as above). The   Fig. 6 Model for HMOs and related host glycan utilization by Roseburia and other Lachnospiraceae. In R. hominis, LNT, LNB and the mucin derived GNB are captured by RhLNBBP for uptake into the cytoplasm and LNT is subsequently hydrolysed to LNB. Both LNB and GNB are phosphorolyzed by RhGLnbp112 into α-D-galactose-1-phosphate and the corresponding N-acetylhexosamines GlcNAc and GalNAc, respectively. Lactose is likely hydrolysed by a canonical β-galactosidase. In R. inulinivorans, initial hydrolysis of HMOs or O-glycans from glyco-lipids/proteins occurs at the outer cell surface by RiLe a/b 136, which has two C-terminal putative galactose-binding domains. The import of degradation products is mediated by the RiLe a/b BP-associated ABC transporter. Fucosyl decorations are removed by the concerted activity of RiFuc95 and RiFuc29 before RhGLnbp112 phosphorolyzes the resulting LNB or imported GNB into monosaccharides, as described in R. hominis. Galactose and galactose-1-phosphate products are converted via the Leloir pathway to glucose-6phosphate and N-acetylhexosamine sugars are converted to GlcNAc-6-phosphate before entering glycolysis. The pyruvate generated from glycolysis is partly converted to butyrate 46 . Roseburia inhabits the outer mucus layer 47 together with A. muciniphilia. R. inulinivorans cross-feeds on sialic acid and accesses β-(1 → 4)-linked blood group A and B oligosaccharides from mucin and glyco-lipids/proteins via RiGH98. Black solid arrows show enzymatic steps established or confirmed in this study. Black dotted arrows indicate steps based on literature. Grey dotted arrows indicate butyrate production by R. hominis and R. inulinivorans from mucin in co-culture with A. muciniphilia. The glycan structure key is the same as in Fig. 1.
supernatant was up concentrated by rotary evaporation, buffered with 2 volumes 100 mM MES, 300 mM NaCl, pH 6.5 and lactose was digested with ß-galactosidase from Kluyvermomyces lactis (Sigma Aldrich) (20 U mL −1 , 3 h at 37°C). The enzyme was precipitated with ethanol (as before) and the supernatant was concentrated by rotary evaporation. Residual lactose and monosaccharides were removed by solid-phase extraction (SPE) using 12 mL graphitized Supelclean™ ENVI-Carb™ columns (Supelco) with a bed weight of 1 g. For SPE, columns were activated with 80% (v/v) ACN containing 0.05% (w/v) formic acid (FA) and equilibrated with buffer A (with 4% (v/v) ACN, 0.05% (w/v) FA), which was also used to dilute the samples prior to loading. After sample loading, the columns were washed (6 column volumes of buffer A) to remove lactose and monosaccharides before oligosaccharides were eluted with 40% (v/v) ACN, 0.05% (w/v) FA. Eluted oligosaccharides were concentrated in a speed vacuum concentrator, freeze-dried and dissolved in milliQ prior to usage. Purity of HMOs was verified by high-performance anion exchange chromatography with pulsed amperometric detection (HPAEC-PAD) on an ICS-5000 (Dionex) system with a 3 × 250 mm CarboPac PA200 column (Theromofisher), a 3 × 50 mm CarboPac guard column (Theromofisher) and 10 µL injections. HMOs were eluted with a stepwise linear gradient of sodium acetate: 0-7.5 min of 0-50 mM, 7.5-25 min of 50-150 mM and 25-35 min of 150-400 mM, at a flow rate of 0.35 mL min −1 and a mobile phase of constant 0.1 mM NaOH. Standards (0.01-0.5 mM) of lactose, galactose and glucose in milliQ were used to quantify these residual sugars as described above. The analysis was performed in triplicates and the residual content of these sugars was <2% (w/w) of the purified HMO mixture.
Isolation and purification of porcine mucins. The commercial porcine gastric mucin (PGM) was further purified 50 . In short, 20 g PGM was stirred for 20 h at 25°C in 20 mM phosphate buffer, 100 mM NaCl, pH 7.8 (adjusted to pH 7.2 after the first 2 h using 2 M NaOH). Insoluble residues were removed by centrifugation (10,000×g, 30 min at 4°C) and soluble mucin was precipitated by the addition of 3 volumes of ice cold ethanol (99%) and incubation for 18 h at 4°C. Precipitated mucin was dialyzed 5 times against 200 volumes milliQ for 16 h at 4°C, using a 50 kDa molecular weight cut off membrane (Spectra, VWR) and afterwards freeze dried.
Porcine colonic mucin was isolated from five fresh pig colons from the slaughterhouse of Danish Crown (Horsens, Denmark). Pig colons were processed at site and immediately placed on dry ice to ensure quick cooling during transport. Colons were opened longitudinally and content was removed mechanically and by washing with ice cold 0.9% (w/v) NaCl until no digesta was visible. Cleaned luminal surface was quickly dried with absorptive paper and the mucosa was scraped off with a blunt metal spatula and subsequently transferred into a precooled glass beaker whereby visible fat was removed and discarded. Mucin was then purified as previously described 51 . Isolated mucin was immersed in 10 volumes extraction buffer (10 mM sodium phosphate buffer, 6 M guanidine hydrochloride (GuHCl), 5 mM ethylenediaminetetraacetic acid (EDTA), 5 mM N-ethylmaleimide, pH 6.5) and gently stirred overnight at 4°C. Soluble impurities and floating fat were separated by centrifugation (10,000×g, 30 min at 4°C), pelleted mucin was dissolved in 10 volumes extraction buffer and incubated for 3 h at room temperature again. Soluble impurities were removed by centrifugation as described before. Short incubation (3 h) extraction steps were repeated 7 times until the supernatant was clear for at least two repeated extractions. Afterwards insoluble mucin was solubilized by reduction in 0.1 M Tris, 6 M GuHCl, 5 mM EDTA, 25 mM dithiotreitol (DTT) pH 8, for 5 h at 37°C and subsequent alkylation through the addition of 65 mM iodoacetamide and incubation in the dark for 18 h at 4°C. Soluble mucin was dialyzed 6 times against 200 volumes milliQ using a 50 kDa MWCO dialysis bag for 6 h at 4°C and freeze dried.
Cloning, expression and purification of proteins. Open reading frames encoding proteins from R. hominis DSM 16839, R. inulinivorans DSM 16841 and E. ramulus DSM 15684 were cloned without signal peptide or transmembrane domain from genomic DNA using In-Fusion cloning (Takara) and the primers in Table S8 into the EcoRI and Ncol restriction sites of the corresponding plasmids, to encode proteins with either a cleavable N-or C-terminal His 6 tag. The pETM 11 plasmid was used (from G. Stier, EMBL, Center for Biochemistry, Heidelberg, Germany) 52 , except for RHOM_04110 (RhLnb136 I ) and ROSEINA2194_01899 (RiLe a/b 136 I ) which were cloned into pET15b (Novagen). Recombinant proteins were expressed in E. coli BL21 ΔlacZ (DE3)/pRARE2 and purified following standard protocols using His-affinity and size-exclusion chromatography. Mutants of E. ramulus HMPREF0373_02965 (ErLnb136) were constructed using QuickChange II Site-Directed Mutagenesis (Agilent) with pETM11_ HMPREF0373_02965 as template. Primers used for site-directed mutagenesis are listed in Table S8 and mutants were produced as described above. L-Selenomethionine (ʟ-SeMet) labelled protein expression of ErLnb136 was performed by introducing the corresponding plasmid into E. coli B834 (DE3) and culturing the transformed cells in a synthetic M9 based medium of the SelenoMet labelling Kit (Molecular Dimensions) supplemented either with L-methionine or L-SeMet (both to 50 µg mL −1 ). The L-SeMet labelled protein was purified as described above.
Growth experiments and single strain proteomics analysis. R. hominis DSM 16839, R. inulinivorans DSM 16841, E. ramulus DSM 15684 and E. ramulus DSM 16296 were grown anaerobically at 37°C using a Whitley DG250 Anaerobic Workstations (Don Whitley Scientific). R. hominis and R. inulinivorans were propagated in YCFA medium 41 while for E. ramulus strains CFA medium (modified YCFA medium lacking yeast extract to minimize E. ramulus growth on yeast extract) was used. Growth media were supplemented with 0.5% (w/v) carbohydrates sterilized by filtration (soluble carbohydrates, 0.45 µm filters) or autoclaving (mucins, 15 min at 121°C) and cultures were performed in at least biological triplicates unless otherwise indicated. Bacterial growth was monitored by measuring OD 600 nm and pH (for co-culture experiments). For growth experiments performed in microtiterplates, a Tecan Infinite F50 microplate reader (Tecan Group Ltd) located in the anaerobic workstation was used and growth was followed by measuring OD 595 nm . An unpaired two-tailed Student's t-test was used to determine the statistical significance between growth level reached between different culture conditions and non-carbohydrate controls.
LC-MS/MS. Peptides from biological triplicates of each culture condition were loaded on the mass spectrometer by reverse phase chromatography through an inline 50 cm C18 column (Thermo EasySpray ES803) connected to a 2 cm long C18 trap column (Thermo Fisher 164705) using a Thermo EasyLc 1000 HPLC system. Peptides were eluted with a gradient of 4.8-48% (v/v) ACN, 0.1% (w/v) FA at 250 nL min −1 over 260 min (samples from single strain cultures) or 140 min (SCX NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-17075-x ARTICLE fractionated samples from co cultures) and analysed on a Q-Exactive instrument (Thermo Fisher Scientific) run in a data-dependent manner using a Top 10 method. Full MS spectra were collected at 70,000 resolution, with an AGC target set to 3 × 10 6 ions or maximum injection time of 20 ms. Peptides were fragmented via higher-energy collision dissociation (normalized collision energy = 25). The intensity threshold was set to 1.7 × 10 6 , dynamic exclusion to 60 s and ions with a charge state <2 or unknown species were excluded. MS/MS spectra were acquired at a resolution of 17,500, with an AGC target value of 1 × 10 6 ions or a maximum injection time of 60 ms. The scan range was limited from 300-1750 m/z.
Protein label free quantification in bacterial co-cultures. Proteome Discoverer versions 2.2 and 2.3 were used to process and analyse the raw MS data files and label free quantification was enabled in the processing and consensus steps. The spectra from single strains proteomics were matched against the proteome database of R. hominis DSM 16839 (ID: UP000008178) or R. inulinivorans DSM 16841 (ID: UP000003561) respectively, as obtained from Uniprot. The spectra from co-culture experiments were searched against a constructed database consisting of the reference proteomes of the two Roseburia strains (as above) and A. muciniphila DSM 22959 (ID: UP000001031). For spectral searches, oxidation (M), deamidation (N, Q) and N-terminal acetylation were specified as dynamic modifications and cysteine carbamidomethylation was set as a static modification. Obtained results were filtered to a 1% FDR and protein quantitation was done by using the built-in Minora Feature Detector. For analysis of the label-free quantification data, proteins were considered present if at least two unique peptides (as defined in Proteome Discoverer) were identified and proteins had to be identified in at least two out of the three samples analysed per culture condition with high confidence.
Relative bacterial abundance in co-cultures was estimated based on strain unique peptides identified with Unipept version 4.0 56 . To exclude peptides shared between closely related strains from the analyses, all peptide sequences quantified via Proteome Discoverer were imported into the Unipept web server and analysed with the settings Equate I and L and Advanced missed cleavage handling activated. The normalized sum of intensities of the resulting taxonomically distinctive peptides was then used for assessing relative abundances of each strain.
Butyrate quantification. Butyrate in culture supernatants was quantified by HPLC coupled to a refracting index detector (RID) and diode array detector (DAD) on an Agilent HP 1100 system (Agilent). Standards of butyric acid (0.09-50 mM) were prepared in 5 mM H 2 SO 4 for peak identification and quantification. Samples from four biological replicates were analysed by injecting 20 µL of standard or filtrated (0.45 µM filter) culture supernatant on a 7.8 × 300 mm Aminex HPX-87H column (Biorad) combined with a 4.6 × 30 mm Cation H guard column (Biorad). Elution of was performed with a constant flow rate of 0.6 mL min −1 and a mobile phase of 5 mM H 2 SO 4 . Standards were analysed as above in technical triplicates.
Oligosaccharide uptake preference of Roseburia spp. R. hominis was grown anaerobically in 250 µL YCFA medium with 0.5% (w/v) of an equal mixture of xylotetraose and LNT in biological triplicates. Samples (20 µL) were taken after 0, 3.5, 5.5, 6.5, 8, 9.5 and 24 h, diluted 10-fold in ice cold 100 mM NaOH and centrifuged (10 min at 5000×g at 4°C) before supernatants were stored at −20°C until the HPAEC-PAD analysis. Standards of 0.5 mM xylotetraose and LNT were prepared in 100 mM NaOH and used to identify corresponding peaks in the chromatograms. Samples or standard were injected (2 µL injections) on a 4 × 250 mm CarboPac PA10 column with a 4 × 50 mm CarboPac guard column and eluted isocratically (0.750 mL min −1 , 100 mM NaOH, 10 mM NaOAc). The analysis was performed from a biological triplicate and standards were analysed in technical duplicates.
For determining uptake preference of Le b tetraose, Le a triose and blood group H triose type I, R. inulinivorans was grown anaerobically in 200 µL YCFA medium supplied with an equal mixture of 1.5 mM Le b tetraose, 1.5 mM Le a triose and 1.5 mM blood group H triose type I in biological triplicates. Samples (10 µL) were taken after 0, 3.5, 5.5, 6.5, 8, 9.5 and 24 h, diluted 10-fold in ice cold 20 mM NaOH and centrifuged (10 min at 5000×g at 4°C) before supernatants were stored at −20°C until the HPAEC-PAD analysis. Standards of 0.1 mM Le b tetraose, Le a triose and blood group H triose type I were prepared in 20 mM NaOH and used to identify corresponding peaks in the chromatograms. Samples or standard were injected (10 µL injections) on a 4 × 250 mm CarboPac PA10 column with a 4 × 50 mm CarboPac guard column and eluted isocratically (0.750 mL min −1 , 50 mM NaOH). The analysis was performed from a biological triplicate and standards were analysed in technical duplicates.
For investigating mixed HMO uptake, R. inulinivorans was grown anaerobically in 300 µL YCFA medium with 0.5% (w/v) of mixed HMOs purified from mother's milk or 0.5% (w/v) of mixed HMOs purified from mother's milk but previously digested with RiLe a/b 136 (0.5 µM RiLe a/b 136 for 18 h) in biological triplicates. Samples (10 µL) were taken after 0 and 24 h, diluted 10-fold in ice cold 20 mM NaOH and centrifuged (10 min at 5000×g at 4°C) before supernatants were stored at −20°C until the HPAEC-PAD analysis. Standards of 0.1 mM Le b tetraose, Le a triose, blood group H triose type I, LNDFH I, Lactose, 2′FL and LNT were prepared in 20 mM NaOH and used to identify corresponding peaks in the chromatograms. Samples or standard were injected (10 µL injections) on a 4 × 250 mm CarboPac PA200 column with a 4 × 50 mm CarboPac guard column and eluted isocratically (0.350 mL min −1 , 50 mM NaOH). The analysis was performed from a biological triplicate and standards were analysed in technical duplicates.
Enzyme activity assays. Enzymatic activity assays were carried out in 50 mM MES, 150 mM NaCl, 0.005% (v/v) Triton X-100, pH 6.5 standard assay buffer and in triplicates unless otherwise stated.
Hydrolysis kinetics and specific activities of the GH136 lacto-N-biosidases were measured using a coupled enzymatic assay to monitor lactose release. The lactose was hydrolysed with a ß-galactosidase (used above) and the resulting glucose was oxidized with a glucose oxidase (Sigma Aldrich) concomitant with the production of H 2 O 2 measured by coupling to horseradish peroxidase (Sigma Aldrich) oxidation of 4-aminoantipyrine and 3,5-dichloro-2-hydroxybensensulfonic acid. Reactions were prepared in 96-well microtiter plates to a final volume of 150 µL, containing substrate, lacto-N-biosidase, ß-galactosidase (150 U mL −1 ), glucose oxidase (150 U mL −1 ), horseradish peroxidase (150 U mL −1 ), 10 mM 3,5-dichloro-2-hydroxybensensulfonic acid, 1 mM 4-aminoantipyrine in standard assay buffer. Reactions were performed at 37°C and A 515 nM was measured in 5 sec intervals for 30 min. Blanks were prepared by substituting lacto-N-biosidase with standard assay buffer in the reaction mixture and a lactose standard (3-500 µM) was used for the quantification.
Hydrolysis kinetics of RhLnb136 (40 nM) and ErLnb136 (10 nM) towards LNT (0.2-5 mM for RhLnb136 and 0.1-2.5 mM for ErLnb136) were determined as described above. The kinetic parameters K M and k cat , were calculated by fitting the Michaelis-Menten equation to the initial rate data using OriginPro 2018b and OriginPro 2019b (OriginLab). Lacto-N-biosidase specific activity of RiLe a/b 136 (1.2 µM) was measured as described above using 3.5 mM LNT. The specific activity was expressed in units (U) mg −1 enzyme, where a unit is defined as the amount of enzyme that releases 1 µmol lactose min −1 quantified as above.
Specific activities of RhGLnbp112 and RiGLnbp112 towards LNB and GNB were assayed 50 mM sodium phosphate buffer, 150 mM NaCl, 0.005% (v/v) Triton X-100, pH 6.5. Reactions (150 µL) were incubated for 10 min at 37°C with 20 nM enzyme and 2 mM substrate. Aliquots of 15 µL were removed every minute and quenched in 135 µL 0.2 M NaOH. Standards of Gal1P (5 mM─0.02 mM) were prepared in 0.2 M NaOH and were used to quantify the concentrations of released Gal1P in the quenched reaction samples. Both, quenched reactions and standards were examined by HPAEC-PAD using a 3 × 250 mm CarboPac PA200 column (Theromofisher) in combination with a 3 × 50 mm CarboPac guard column (Theromofisher) and 10 µL injections. Elution was performed with a flow of 0.350 mL min −1 and a mobile phase of 150 mM NaOH and 60 mM sodium acetate. The specific activity was expressed in U mg −1 enzyme, where a U is defined as the amount of enzyme that releases 1 µmoL Gal1P min −1 . The analysis was performed in technical triplicates.
Enzyme product profiles. Enzyme assays were performed at 37°C for 16 h in standard assay buffer or in the phosphate version (instead of MES) for GH112 enzymes, in independent biological triplicates. Degradation products were analysed by thin layer chromatography (TLC) and or Matrix-assisted laser desorption/ ionization time of flight mass spectroscopy (MALDI-TOF/MS) as described below.
Thin layer chromatography. The TLC was performed by spotting 2 µL of enzymatic reaction on a silica gel 60 F454 plate (Merck), the separation was carried out in butanol: ethanol: milliQ water (5:3:2) (v/v) as mobile phase and sugars were visualized with 5-methylresorcinol:ethanol:sulfuric acid (2:80:10) (v/v) and heat treatment except for RiLe a/b 136. The TLC for the latter enzyme was performed in butanol:acetic acid: milliQ (2:1:1)(v/v) and developed with diphenylaminephosphoric acid reagent 57 . TLC analyses were performed from two independent biological duplicates (one analysis from each biological enzymatic reaction replicate) MALDI-TOF/MS. MALDI-TOF/MS analysis of RiLe a/b 136 was according to 58 , following permethylation of oligosaccharides 59 . For permethylation, lyophilized oligosaccharides were reconstituted in 200 µL of anhydrous dimethylsulfoxide (DMSO) and mixed for 5 min with 250 µL of NaOH in DMSO and with 150 µL of iodomethane. Next, 2 mL of 5 % (w/v) acetic acid was added followed by the addition of 2 mL of CH 2 Cl 2 . Subsequently, permethylated oligosaccharides were extracted in the organic phase, dried under a nitrogen stream at 40°C before loading onto a pre-equilibrated Sep-pak C18 cartridges, washing with water and elution with 85% (v/v) acetonitrile. Eluted fractions were dried under nitrogen as before and stored at −20°C until further use. After the enzymatic reaction permethylated products were dried, mixed with 2,5-dihydroxybenzoic acid, and spotted onto the MALDI plate. For MALFI-TOF/MS analyses, a Bruker Autoflex III smartbeam in positive ion mode was used. Degradation products of RhLnb136 and ErLnb136 were analysed without initial permethylation of oligosaccharides using 2,5-dihydroxybenzoic acid as matrix and an Ultraflex II TOF/TOF (Bruker Daltonics) instrument operated in positive ion linear mode. Peak analysis of mass spectra was performed using Flexanalysis Version 3.3 (Bruker Daltonics). MALDI-TOF/MS analyses where performed from independent triplicates (one analysis from each biological enzymatic reaction replicate).
LC-MS 2 of O-glycan derived oligosaccharides. A homogenous preparation of porcine gastric mucin, PGM (Sigma), carrying blood group A, was used in the analysis. A total of 0.1 mg mucin per dot were immobilized by dot blotting onto an immobilon-P PVDF membranes (Immobilon P membranes, 0.45 µm, Millipore, Billerica, MA). RiGH98 was added to one dot to 1.5 µM in 50 µL and incubated for 1 h and 4 h at 37°C. The reaction supernatants which contained released free oligosaccharides, were collected and purified by passage through porous graphitized carbon (PGC) particles (Thermo Scientific) packed on top of a C18 Zip-tip (Millipore). Samples were eluted with 65% (v/v) ACN in 0.5% trifluoro-acetic acid (TFA, v/v), dried, resuspended in 10 μL of milliQ, frozen at −20°C and stored until further analysis. The residual O-linked glycans (on the dot) were released by reductive β-elimination by incubating the dot in 30 μL of 0.5 M NaBH 4 in 50 mM NaOH at 50°C for 16 h followed by adding 1.5 μL glacial acetic acid to quench the reaction. The released O-glycans were desalted and dried as described before 60 . The purified glycans were resuspended in 10 μL of milliQ and stored at −20°C for further analysis. Released oligosaccharides from glycosphingolipids as a model substrate carrying blood group B (B5-2 and B6-2) 61 were prepared as described above, except for a single incubation time of 2 h.
Purified samples were analysed by LC-MS/MS using 10 cm × 250 µm I.D. column, packed in house with PGC 5 µm particles. Glycans were eluted using a linear gradient of 0-40% ACN in 10 mM NH 4 HCO 3 over 40 min at 10 µl min −1 . The eluted O-glycans were analysed on a LTQ mass spectrometer (Thermo Scientific) in negative-ion mode with an electrospray voltage of 3.5 kV, capillary voltage of −33.0 V and capillary temperature of 300°C. Air was used as a sheath gas and mass ranges were defined depending on the specific structure to be analysed. The data were processed using Xcalibur software (version 2.0.7, Thermo Scientific).
Oligosaccharide binding analysis. Binding of LNT, LNB, GNB, H type I triose, Le a triose and Le b tetraose to RiLe a/b BP was analyzed by surface plasmon resonance (SPR; Biacore T100, GE Healthcare). RiLe a/b BP, diluted in 10 mM NaOAc buffer pH 3.75 to 50 µg mL −1 , was immobilized on a CM5 chip using a random amine coupling kit (GE Healthcare) to a final chip density of 3214 and 4559 response units (RU). Analysis comprised 90 s for association and 240 s for dissociation phase, respectively, at a flow rate of 30 µL min −1 . Sensograms were recorded at 25°C in 20 mM sodium phosphate buffer, 150 mM NaCl, 0.005% (v/v) P20 (GE Healthcare), pH 6.5. Experiments were performed in duplicates (each consisting of a technical duplicate) in the range of 0.3-50 µM for LNB, 0.78-200 µM for GNB, 0.97-250 µM for Le a , 0.097-100 µM for Le b and 1.5-250 µM for blood H type I triose. To investigate ligand specify of RiLe a/ b BP, binding was further tested towards 0.5 mM LNT, LNnT, lactose, blood A triose, 2′FL and 3′FL. Equilibrium dissociation constants (K D ) were calculated by fitting a one binding site model to steady state sensograms, using the Biacore T100 data evaluation software.
Binding of LNT, LNB, GNB, LNnT, lactose and 2′FL to RhLNBBP was measured using a Microcal ITC 200 calorimeter (GE Healthcare). Titrations were performed in duplicates at 25°C with RhLNBBP (0.1 mM) in the sample cell and 1.5 mM ligand in 10 mM sodium phosphate buffer, pH 6.5 in the syringe. A first injection of 0.4 µL was followed by 19 injections of 2 µL ligand each, separated by 180 s. Heat of dilution was determined from buffer titrations and corrected data were analysed using MicroCal Origin software v7.0. To determine binding thermodynamics a non-linear single binding model was fitted to the normalized integrated binding isotherms.
Differential scanning calorimetry. The Differential scanning calorimetry (DSC) analyses was performed at protein concentrations of 1 mg mL −1 in 20 mM sodium phosphate buffer, 150 mM NaCl, pH 6.5, using a Nano DSC (TA instruments). Thermograms were recorded from 10 to 90°C at a scan speed of 1°C min −1 using buffer as reference. Baseline corrected data were analysed using the NanoAnalyze software (TA instruments). DSC analyses were performed in duplicates unless otherwise states.
Crystallization. Crystals of ErLnb136 proteins were grown at 20°C using the sitting-drop vapor diffusion method, by mixing 0.5 µL of a 10 mg mL −1 protein solution with an equal volume of a reservoir solution. Native crystals were grown in a 20% (w/v) PEG4000, 0.1 M sodium citrate pH 5.6, and 20% isopropanol reservoir solution. SeMet-labelled crystals were grown using a reservoir solution containing 20% (w/v) PEG6000, 0.1 M Tris-HCl pH 8.5, and 1 M lithium chloride. The crystals were cryoprotected in the reservoir solution supplemented with 20% (v/v) glycerol and 25 mM LNB. The crystals were flash-cooled at 100 K (−173.15°C) in a stream of nitrogen gas. Diffraction data were collected at 100 K on beamlines at SLS X06DA (Swiss Light Source, Swiss) and Photon Factory of the High Energy Accelerator Research Organization (KEK, Tsukuba, Japan). The data were processed using HKL2000 62 and XDS 63 . Initial phase calculation, phase improvement, and automated model building were performed using PHENIX 64 . Manual model rebuilding and refinement was achieved using Coot 65 and REFMAC5 66 . Because the crystal structures of SeMet-labelled and native protein were virtually the same (root mean square deviations for the Cα atoms = 0.14 Å), we used the SeMetlabelled protein structure for the descriptions in the Results and Discussion. Molecular graphics were prepared using PyMOL (Schrödinger, LLC, New York) or UCSF Chimera (University of California, San Francisco) Bioinformatics. SignalP v.4.1 67 , PSORTb v3.0 68 , TMHMM v.2.0 69 were used for prediction of signal peptides and transmembrane domains. InterPro 70 and dbCAN2 71 were used to analyse modular organization using default settings for Gram positive bacteria. Redundancy in biological sequence datasets was reduced using the CD-HIT server (sequence identity cut off = 0.95) 72 . Protein sequence alignments were performed using MAFFT (BLOSUM62) 73 . Phylogenetic trees were constructed using the MAFFT server, based on the neighbour-joining algorithm, and with bootstraps performed with 1000 replicates. Phylogenetic trees were visualized and tanglegrams constructed using dendroscope 74 . Colouring of protein structures according to amino acid sequence conservation was accomplished in UCSF Chimera, based on protein multiple (structural based) alignments from the PROMALS3D server 75 and by using the in UCSF Chimera implemented AL2CO algorithm 76 . The MEME suite web server was used for amino acid sequence motif discovery and evaluation 77 . Protein structures were compared using the Dali server (http://ekhidna2.biocenter.helsinki.fi/dali/) (PMID: 27131377) and the molecular interface between ErLnb136 I and ErLnb136 II was analysed (solvent inaccessible interface, Gibbs energy) via the PDBePISA server (https://www.ebi.ac.uk/pdbe/pisa/).
The abundance and distribution of HMO utilization genes encoding GH112, GH136 I and GH136 II in Roseburia were analysed by a BLAST search of the corresponding DNA reference sequences from R. intestinalis L1-82, R. hominis A2-183 and R. inulinivorans A2-194 against a total of 4599 reconstructed Roseburia genomes, binned into 42 Species-level Genome Bins (SGBs) by Pasolli et al. 33 . The variability of the Roseburia core xylanase (GH10) was determined similarly by blasting the DNA reference sequences from R. intestinalis L1-82 (ROSINTL182_06494) against the same dataset.
For further analyses, initial blast hits were filtered based on a 70% identity with any of the 5 conserved Roseburia reference genomes. Additionally, Roseburia genomes were considered only if they have a hit with GH112 gene and at least one subunit of the GH136 gene. The resulting 818 genomes were assigned into the respective Roseburia SGBs, based on the assignment of Pasolli et al. 33 . The retrieved genomes were used to analyse the gene landscape around the GH112 gene. The RAST server 78 was used for gene annotation. Based on the annotation and coordinates of the genes, 11 genes upstream and downstream the GH112 were selected for gene landscapes analysis. The most conserved gene neighborhood along each SGB was selected as the representative for each SGB. Principal component analysis was done based on the structure of the GH112-GH136 neighbourhood, considering the present or absent of the genes on the gene landscape and as well its position on the loci. We used the function stringdistmatrix with the "osa" method from R to compute the distance matrix.
Quantification and statistical analysis Statistical significant differences were determined using unpaired two-tailed Student's t-test. Statistical parameters, including values of n and p-values, are reported or indicated in the figures, figure legends and the result section. The data are expressed as arithmetic means with standard deviations (SD), unless otherwise indicated.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD015045. The accession numbers for the atomic coordinates reported in this paper are PDB: 6KQS (Se-Met) (https://doi.org/10.2210/pdb6KQS/pdb) and 6KQT (native), see also Table S6. Mucin glycomics MS/MS data are summarized in Table S9 and raw data files are available upon request. Accession number of the cloned genes are provided in Supplementary Table 8. Data underlying Fig. 1a-d, Supplementary Figs. 2, 4c, 5a-f and 8a-c are provided as Source Data files and the reproducibility of representative experiments is indicated in the corresponding figure legends. All other data are available from the corresponding author upon requests. Source data are provided with this paper.